
# ai-proxy-multi

## Description

The ai-proxy-multi plugin simplifies access to LLM providers and models by defining a standard request format that allows key fields in the plugin configuration to be embedded into the request.

It extends the existing ai-proxy plugin with additional features such as load balancing and retries.

Proxying requests to OpenAI and DeepSeek is currently supported. Other LLM services will be supported soon.

## Request Format

### OpenAI

- Chat API

| Name             | Type   | Required | Description                                   |
| ---------------- | ------ | -------- | --------------------------------------------- |
| messages         | Array  | Yes      | An array of message objects                   |
| messages.role    | String | Yes      | Role of the message (system, user, assistant) |
| messages.content | String | Yes      | Content of the message                        |
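
For example, a request body in this format sent to a route configured with this plugin might look like the following (the message contents are illustrative):

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 1+1?" }
  ]
}
```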

## Plugin Attributes

| Name                         | Required | Type    | Description                                                                                  | Default    |
| ---------------------------- | -------- | ------- | -------------------------------------------------------------------------------------------- | ---------- |
| providers                    | Yes      | array   | List of AI providers, each following the provider schema.                                    |            |
| provider.name                | Yes      | string  | Name of the AI service provider. Allowed values: openai, deepseek.                           |            |
| provider.model               | Yes      | string  | Name of the AI model to execute. Example: gpt-4o.                                            |            |
| provider.priority            | No       | integer | Priority of the provider for load balancing.                                                 | 0          |
| provider.weight              | No       | integer | Load balancing weight.                                                                       |            |
| balancer.algorithm           | No       | string  | Load balancing algorithm. Allowed values: chash, roundrobin.                                 | roundrobin |
| balancer.hash_on             | No       | string  | Defines what to hash on for consistent hashing (vars, header, cookie, consumer, vars_combinations). | vars       |
| balancer.key                 | No       | string  | Key for consistent hashing in dynamic load balancing.                                        |            |
| provider.auth                | Yes      | object  | Authentication details, including headers and query parameters.                              |            |
| provider.auth.header         | No       | object  | Authentication details sent via headers. Header name must match `^[a-zA-Z0-9._-]+$`.         |            |
| provider.auth.query          | No       | object  | Authentication details sent via query parameters. Keys must match `^[a-zA-Z0-9._-]+$`.       |            |
| provider.options.max_tokens  | No       | integer | Defines the maximum tokens for chat or completion models.                                    | 256        |
| provider.options.input_cost  | No       | number  | Cost per 1M tokens in the input prompt. Minimum is 0.                                        |            |
| provider.options.output_cost | No       | number  | Cost per 1M tokens in the AI-generated output. Minimum is 0.                                 |            |
| provider.options.temperature | No       | number  | Defines the model's temperature (0.0 - 5.0) for randomness in responses.                     |            |
| provider.options.top_p       | No       | number  | Defines the top-p probability mass (0 - 1) for nucleus sampling.                             |            |
| provider.options.stream      | No       | boolean | Enables streaming responses via SSE.                                                         | false      |
| provider.override.endpoint   | No       | string  | Custom host override for the AI provider.                                                    |            |
| passthrough                  | No       | boolean | If true, requests are forwarded without processing.                                          | false      |
| timeout                      | No       | integer | Request timeout in milliseconds (1-60000).                                                   | 3000       |
| keepalive                    | No       | boolean | Enables keepalive connections.                                                               | true       |
| keepalive_timeout            | No       | integer | Timeout for keepalive connections (minimum 1000ms).                                          | 60000      |
| keepalive_pool               | No       | integer | Maximum keepalive connections.                                                               | 30         |
| ssl_verify                   | No       | boolean | Enables SSL certificate verification.                                                        | true       |
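
To illustrate how the balancer attributes fit together, here is a sketch of a plugin configuration that uses consistent hashing to pin requests carrying the same session header to the same provider. The X-Session-Id header name is a hypothetical choice, and the API keys are placeholders:

```json
{
  "ai-proxy-multi": {
    "balancer": {
      "algorithm": "chash",
      "hash_on": "header",
      "key": "X-Session-Id"
    },
    "providers": [
      {
        "name": "openai",
        "model": "gpt-4o",
        "weight": 1,
        "auth": { "header": { "Authorization": "Bearer <OPENAI_API_KEY>" } }
      },
      {
        "name": "deepseek",
        "model": "deepseek-chat",
        "weight": 1,
        "auth": { "header": { "Authorization": "Bearer <DEEPSEEK_API_KEY>" } }
      }
    ]
  }
}
```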

## Example usage

Create a route with the ai-proxy-multi plugin like so:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-multi-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"providers": [
{
"name": "openai",
"model": "gpt-4",
"weight": 1,
"priority": 1,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"max_tokens": 512,
"temperature": 1.0
}
},
{
"name": "deepseek",
"model": "deepseek-chat",
"weight": 1,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"max_tokens": 512,
"temperature": 1.0
}
}
],
"passthrough": false
}
},
"upstream": {
"type": "roundrobin",
"nodes": {
"httpbin.org": 1
}
}
}'

In the above configuration, both providers have equal weights, so requests are balanced equally between the openai and deepseek providers.
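
To check the balancing, you can send a few identical requests to the route and inspect the model field in each response, which should alternate between the two providers. This is a sketch; it assumes APISIX listens on its default 9080 data-plane port and that both API keys are valid:

```shell
# Send several identical chat requests; with equal weights, responses
# should come from gpt-4 and deepseek-chat in roughly equal proportion.
for i in 1 2 3 4; do
  curl -s "http://127.0.0.1:9080/anything" -X POST \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"What is 1+1?"}]}'
  echo
done
```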

### Retry and fallback

The priority attribute can be adjusted to implement fallback and retry behavior: providers with lower priority are only tried when higher-priority providers fail.

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-multi-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"providers": [
{
"name": "openai",
"model": "gpt-4",
"weight": 1,
"priority": 1,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"max_tokens": 512,
"temperature": 1.0
}
},
{
"name": "deepseek",
"model": "deepseek-chat",
"weight": 1,
"priority": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"max_tokens": 512,
"temperature": 1.0
}
}
],
"passthrough": false
}
},
"upstream": {
"type": "roundrobin",
"nodes": {
"httpbin.org": 1
}
}
}'

In the above configuration, the priority of the deepseek provider is set to 0, below the openai provider's priority of 1. This means that if the openai provider is unavailable, the ai-proxy-multi plugin retries the request against deepseek on the second attempt.
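
One way to observe the fallback is a sketch like the following; it assumes the openai provider fails (for example, because the configured key is invalid or the endpoint is unreachable) and that APISIX listens on the default 9080 port:

```shell
# With the openai provider failing, the plugin retries against the
# lower-priority deepseek provider; the "model" field in the response
# should then report deepseek-chat instead of gpt-4.
curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 1+1?"}]}'
```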