
# ai-rag

## Description

The ai-rag plugin integrates Retrieval-Augmented Generation (RAG) capabilities with AI models. It allows efficient retrieval of relevant documents or information from external data sources and augments the LLM responses with that data, improving the accuracy and context of generated outputs.

Currently, only Azure OpenAI and Azure AI Search are supported for generating embeddings and performing vector search, respectively. PRs introducing support for other service providers are welcome.

## Plugin Attributes

| Field | Required | Type | Description |
| ----- | -------- | ---- | ----------- |
| embeddings_provider | Yes | object | Configuration of the embedding model provider |
| embeddings_provider.azure_openai | Yes | object | Configuration of Azure OpenAI as the embedding model provider |
| embeddings_provider.azure_openai.endpoint | Yes | string | Azure OpenAI endpoint |
| embeddings_provider.azure_openai.api_key | Yes | string | Azure OpenAI API key |
| vector_search_provider | Yes | object | Configuration of the vector search provider |
| vector_search_provider.azure_ai_search | Yes | object | Configuration of Azure AI Search |
| vector_search_provider.azure_ai_search.endpoint | Yes | string | Azure AI Search endpoint |
| vector_search_provider.azure_ai_search.api_key | Yes | string | Azure AI Search API key |

## Request Body Format

The following fields must be present in the request body.

| Field | Type | Description |
| ----- | ---- | ----------- |
| ai_rag | object | Configuration for AI-RAG (Retrieval-Augmented Generation) |
| ai_rag.embeddings | object | Request parameters required to generate embeddings. Contents will depend on the API specification of the configured provider. |
| ai_rag.vector_search | object | Request parameters required to perform vector search. Contents will depend on the API specification of the configured provider. |
- Parameters of `ai_rag.embeddings`

  - Azure OpenAI

    | Name | Required | Type | Description |
    | ---- | -------- | ---- | ----------- |
    | input | Yes | string | Input text used to compute embeddings, encoded as a string. |
    | user | No | string | A unique identifier representing your end user, which can help in monitoring and detecting abuse. |
    | encoding_format | No | string | The format to return the embeddings in. Can be either `float` or `base64`. Defaults to `float`. |
    | dimensions | No | integer | The number of dimensions the resulting output embeddings should have. Only supported in `text-embedding-3` and later models. |

    For other parameters, please refer to the Azure OpenAI embeddings documentation.

- Parameters of `ai_rag.vector_search`

  - Azure AI Search

    | Field | Required | Type | Description |
    | ----- | -------- | ---- | ----------- |
    | fields | Yes | string | Fields for the vector search |

    For other parameters, please refer to the Azure AI Search documentation.

Example request body:

```json
{
  "ai_rag": {
    "vector_search": { "fields": "contentVector" },
    "embeddings": {
      "input": "which service is good for devops",
      "dimensions": 1024
    }
  }
}
```
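For clients that assemble this body programmatically, here is a minimal Python sketch. The `build_ai_rag_body` helper is hypothetical, not part of the plugin; the embeddings and vector-search parameters are passed through verbatim because their contents depend on the configured provider's API.

```python
import json


def build_ai_rag_body(embeddings: dict, vector_search: dict) -> str:
    """Assemble the request body expected by the ai-rag plugin."""
    return json.dumps({
        "ai_rag": {
            "embeddings": embeddings,
            "vector_search": vector_search,
        }
    })


# Reproduces the example request body above.
body = build_ai_rag_body(
    embeddings={"input": "which service is good for devops", "dimensions": 1024},
    vector_search={"fields": "contentVector"},
)
```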

## Example usage

First initialise these shell variables:

```shell
ADMIN_API_KEY=edd1c9f034335f136f87ad84b625c8f1
AZURE_OPENAI_ENDPOINT=https://name.openai.azure.com/openai/deployments/gpt-4o/chat/completions
VECTOR_SEARCH_ENDPOINT=https://name.search.windows.net/indexes/indexname/docs/search?api-version=2024-07-01
EMBEDDINGS_ENDPOINT=https://name.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15
EMBEDDINGS_KEY=secret-azure-openai-embeddings-key
SEARCH_KEY=secret-azureai-search-key
AZURE_OPENAI_KEY=secret-azure-openai-key
```

Create a route with the ai-rag and ai-proxy plugins as follows:

```shell
curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "uri": "/rag",
    "plugins": {
      "ai-rag": {
        "embeddings_provider": {
          "azure_openai": {
            "endpoint": "'"$EMBEDDINGS_ENDPOINT"'",
            "api_key": "'"$EMBEDDINGS_KEY"'"
          }
        },
        "vector_search_provider": {
          "azure_ai_search": {
            "endpoint": "'"$VECTOR_SEARCH_ENDPOINT"'",
            "api_key": "'"$SEARCH_KEY"'"
          }
        }
      },
      "ai-proxy": {
        "auth": {
          "header": {
            "api-key": "'"$AZURE_OPENAI_KEY"'"
          },
          "query": {
            "api-version": "2023-03-15-preview"
          }
        },
        "model": {
          "provider": "openai",
          "name": "gpt-4",
          "options": {
            "max_tokens": 512,
            "temperature": 1.0
          }
        },
        "override": {
          "endpoint": "'"$AZURE_OPENAI_ENDPOINT"'"
        }
      }
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "someupstream.com:443": 1
      },
      "scheme": "https",
      "pass_host": "node"
    }
  }'
```

The ai-proxy plugin is used here because it simplifies access to LLMs. Alternatively, you may configure the LLM service address in the upstream configuration and update the route URI accordingly.
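The same route payload can also be assembled in Python before sending it to the Admin API; a sketch assuming the shell variables above are exported in the environment (`build_route_config` is a hypothetical helper, not an APISIX API):

```python
import os


def build_route_config() -> dict:
    """Build the same route payload as the curl call above,
    reading the shell variables from the environment."""
    return {
        "uri": "/rag",
        "plugins": {
            "ai-rag": {
                "embeddings_provider": {
                    "azure_openai": {
                        "endpoint": os.environ["EMBEDDINGS_ENDPOINT"],
                        "api_key": os.environ["EMBEDDINGS_KEY"],
                    }
                },
                "vector_search_provider": {
                    "azure_ai_search": {
                        "endpoint": os.environ["VECTOR_SEARCH_ENDPOINT"],
                        "api_key": os.environ["SEARCH_KEY"],
                    }
                },
            },
            "ai-proxy": {
                "auth": {
                    "header": {"api-key": os.environ["AZURE_OPENAI_KEY"]},
                    "query": {"api-version": "2023-03-15-preview"},
                },
                "model": {
                    "provider": "openai",
                    "name": "gpt-4",
                    "options": {"max_tokens": 512, "temperature": 1.0},
                },
                "override": {"endpoint": os.environ["AZURE_OPENAI_ENDPOINT"]},
            },
        },
        "upstream": {
            "type": "roundrobin",
            "nodes": {"someupstream.com:443": 1},
            "scheme": "https",
            "pass_host": "node",
        },
    }
```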

Now send a request:

```shell
curl http://127.0.0.1:9080/rag -X POST -H 'Content-Type: application/json' -d '{"ai_rag":{"vector_search":{"fields":"contentVector"},"embeddings":{"input":"which service is good for devops","dimensions":1024}}}'
```
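Equivalently, the request can be sent from Python using only the standard library; this sketch assumes the gateway is listening at the address used above:

```python
import json
import urllib.request

payload = {
    "ai_rag": {
        "vector_search": {"fields": "contentVector"},
        "embeddings": {"input": "which service is good for devops", "dimensions": 1024},
    }
}

# Build the POST request against the route created above.
req = urllib.request.Request(
    "http://127.0.0.1:9080/rag",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Requires a running APISIX instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```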

You will receive a response like this:

```json
{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "Here are the details for some of the services you inquired about from your Azure search context:\n\n ... <rest of the response>",
        "role": "assistant"
      }
    }
  ],
  "created": 1727079764,
  "id": "chatcmpl-AAYdA40YjOaeIHfgFBkaHkUFCWxfc",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "system_fingerprint": "fp_67802d9a6d",
  "usage": {
    "completion_tokens": 512,
    "prompt_tokens": 6560,
    "total_tokens": 7072
  }
}
```
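A client can unpack such a response in a few lines; here is a sketch using a truncated copy of the JSON above:

```python
import json

# Sample response in the shape returned above (content truncated).
response = json.loads("""
{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "Here are the details for some of the services you inquired about from your Azure search context: ...",
        "role": "assistant"
      }
    }
  ],
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "usage": {"completion_tokens": 512, "prompt_tokens": 6560, "total_tokens": 7072}
}
""")

# The generated answer and token accounting.
answer = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
```

Note that `finish_reason` is `length` here because the completion hit the 512 `max_tokens` limit set in the ai-proxy configuration.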