
Load Balancing AI/ML API with Apache APISIX

Yilia Lin · 5 min read

This blog provides a step-by-step guide to configuring Apache APISIX for AI traffic splitting and load balancing between API versions, covering security setup, canary testing, and deployment monitoring.

Overview

AI/ML API is a one-stop, OpenAI-compatible endpoint trusted by 150,000+ developers for access to 300+ state-of-the-art models (chat, vision, image/video/music generation, embeddings, OCR, and more) from Google, Meta, OpenAI, Anthropic, Mistral, and others.

Apache APISIX is a dynamic, real-time, high-performance API gateway. It provides rich traffic management features and can serve as an AI gateway through its flexible plugin system.

Modern AI workloads often require smooth version migrations, A/B testing, and rolling updates. This guide shows you how to:

  1. Install Apache APISIX with Docker quickstart.
  2. Secure the Admin API with keys and IP whitelisting.
  3. Define separate routes for API versions v1 and v2.
  4. Implement weighted traffic splitting (50/50) via the traffic-split plugin.
  5. Verify the newly created split endpoint functionality.
  6. Load test and monitor distribution accuracy.

To perform authenticated requests, you'll need an AI/ML API key. You can get one at https://aimlapi.com/app/keys/ and use it as a Bearer token in your Authorization headers.

Generate AI/ML API Key
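Before wiring up the gateway, it's worth a quick sanity check that the key works against the upstream directly. The call below is a minimal sketch that assumes the OpenAI-compatible /v1/chat/completions path and the gpt-4 model used later in this guide; replace <AIML_API_KEY> with your key.

# Sanity-check the AI/ML API key directly against the upstream
curl -s -X POST https://api.aimlapi.com/v1/chat/completions \
  -H "Authorization: Bearer <AIML_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}'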

Quickstart Installation

# 1. Download and run the quickstart script (includes etcd + APISIX)
curl -sL https://run.api7.ai/apisix/quickstart | sh

# 2. Confirm APISIX is up and running
curl -I http://127.0.0.1:9080 | grep Server
# ➜ Server: APISIX/3.13.0

Tip: If you encounter port conflicts, adjust Docker host networking or map to different ports in the quickstart script.
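To see which host ports the quickstart containers actually bound (by default 9080 for the gateway and 9180 for the Admin API), you can list them with Docker; this assumes the default container names created by the quickstart script:

# Show the quickstart containers and their host port mappings
docker ps --filter "name=apisix" --format "table {{.Names}}\t{{.Ports}}"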

Secure the Admin API

By default, the quickstart deployment bypasses Admin API authentication. For any non-development environment, enforce security:

1. Set an Admin Key

Edit conf/config.yaml inside the APISIX container or local install directory and replace the example key with a strong admin key of your own. Note that this is the APISIX Admin API key, not the AI/ML API key obtained earlier:

apisix:
  enable_admin: true            # Enable Admin API
  admin_key_required: true      # Reject unauthenticated Admin requests
  admin_key:
    - name: admin
      key: YOUR_ADMIN_KEY_HERE  # Generated admin key - replace with a secure key of your own
      role: admin

Security Best Practice: Use at least 32 characters, mix letters/numbers/symbols, and rotate keys quarterly.
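One way to generate a key that meets these requirements is shown below; openssl is assumed to be available on your machine and is not part of the quickstart itself.

# Generate a 64-character hex admin key
openssl rand -hex 32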

2. Whitelist Management IPs (allow_admin)

Add your management or local networks under the admin: section:

admin:
  allow_admin:
    - 127.0.0.0/24    # Localhost & host network
    - 0.0.0.0/0       # Allow all (temporary/testing only)

Warning: 0.0.0.0/0 opens Admin API to the world! Lock this down to specific subnets in production.

3. Restart APISIX

docker restart apisix-quickstart

Check Logs: docker logs apisix-quickstart --tail 50 to ensure no errors about admin authentication.
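As a quick check that authentication is actually enforced, an Admin API request without the key should now be rejected while one with the key succeeds (assuming the default Admin API port 9180):

# Expect 401 Unauthorized without the admin key
curl -i http://127.0.0.1:9180/apisix/admin/routes

# Expect 200 OK with the admin key
curl -i http://127.0.0.1:9180/apisix/admin/routes \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE"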

Define Basic Routes for v1 and v2

Before splitting traffic, ensure each version works individually.

1. Route for v1

curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v1 \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "uri": "/test/v1",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    }
  }'

Tip: Route IDs like test-v1 (set in the URL or via the id field) make it easy to manage or delete routes later.

2. Route for v2

curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v2 \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "uri": "/test/v2",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    }
  }'
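
With both routes in place, you can confirm they were created (and later remove them by id) through the Admin API:

# Confirm both version routes exist
curl -s http://127.0.0.1:9180/apisix/admin/routes/test-v1 \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE"
curl -s http://127.0.0.1:9180/apisix/admin/routes/test-v2 \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE"

# Delete a route by id once it is no longer needed
curl -i -X DELETE http://127.0.0.1:9180/apisix/admin/routes/test-v1 \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE"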

Implement Traffic Splitting (50/50)

Use the traffic-split plugin for controlled distribution between v1 and v2. In the admin request below, replace YOUR_ADMIN_KEY_HERE with your actual key.

curl -i http://127.0.0.1:9180/apisix/admin/routes/aimlapi-split \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "id": "aimlapi-split",
    "uri": "/chat/completions",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    },
    "plugins": {
      "traffic-split": {
        "rules": [
          {
            "weight": 50,
            "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"},
            "rewrite": {"uri":"/v1/chat/completions"}
          },
          {
            "weight": 50,
            "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"},
            "rewrite": {"uri":"/v2/chat/completions"}
          }
        ]
      }
    }
  }'

Tip: Adjust the weight values to shift traffic ratios (e.g., 80/20 for canary).

Note: rewrite must match the internal API path exactly.
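
For the 80/20 canary mentioned above, re-apply the same route with only the two weight values changed; everything else mirrors the definition you just created.

# Re-apply the split route with 80% of traffic on v1 and 20% on v2
curl -i http://127.0.0.1:9180/apisix/admin/routes/aimlapi-split \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "id": "aimlapi-split",
    "uri": "/chat/completions",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    },
    "plugins": {
      "traffic-split": {
        "rules": [
          {
            "weight": 80,
            "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"},
            "rewrite": {"uri":"/v1/chat/completions"}
          },
          {
            "weight": 20,
            "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"},
            "rewrite": {"uri":"/v2/chat/completions"}
          }
        ]
      }
    }
  }'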

Verify Split Endpoint Functionality

Test the /chat/completions endpoint you just created. Replace <AIML_API_KEY> with the key obtained earlier and use it as a Bearer token:

curl -v -X POST http://127.0.0.1:9080/chat/completions \
  -H "Authorization: Bearer <AIML_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}'

Expected Output:

{"content":"Pong! How can I assist you today?"}

Tip: Use -v for verbose output to troubleshoot headers or TLS issues.

Load Test & Distribution Validation

After configuring the split route, use the following commands to validate distribution. Replace <AIML_API_KEY> with your Bearer token.

# 1. Send 100 test requests
time seq 100 | xargs -I {} curl -s -o /dev/null -X POST http://127.0.0.1:9080/chat/completions \
  -H "Authorization: Bearer <AIML_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}'

# 2. Check APISIX logs for upstream hits (replace IPs with actual resolved IPs)
echo "v1 hits: $(docker logs apisix-quickstart --since 5m | grep -c '188.114.97.3:443')"
echo "v2 hits: $(docker logs apisix-quickstart --since 5m | grep -c '188.114.96.3:443')"

Expected: Approximately 50 requests to each upstream.
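
The IP addresses in the grep commands above are simply whatever api.aimlapi.com resolves to from your environment, so look them up first:

# Resolve the upstream host to the IPs that will show up in the APISIX logs
dig +short api.aimlapi.com
# or: nslookup api.aimlapi.com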

Tip: Use Prometheus or OpenTelemetry plugins for real‑time metrics instead of manual log parsing.
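If you prefer metrics over log grepping, one option is to enable the prometheus plugin through a global rule. The sketch below assumes the default metrics export endpoint on port 9091; the quickstart container must expose that port for the second command to work from the host.

# Enable the prometheus plugin for all routes via a global rule
curl -i http://127.0.0.1:9180/apisix/admin/global_rules/1 \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{"plugins": {"prometheus": {}}}'

# Scrape the metrics endpoint and look at per-route status counters
curl -s http://127.0.0.1:9091/apisix/prometheus/metrics | grep apisix_http_status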

Best Practices & Next Steps

  • Rate Limiting & Quotas: Add the limit-count plugin to protect your upstream from spikes (see the sketch after this list).
  • Authentication: Layer on the key-auth plugin for consumer management.
  • Circuit Breaker: Prevent cascading failures with the api-breaker plugin.
  • Observability: Integrate Prometheus, Skywalking, or Loki for dashboards and alerts.
  • Infrastructure as Code: Consider managing APISIX config via Kubernetes CRDs or ADC for reproducibility.
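
As a starting point for rate limiting, the sketch below re-applies the v1 route from earlier with limit-count attached; the 60-requests-per-minute limit keyed on client IP is an arbitrary example, not a recommendation.

# Re-apply the v1 route with limit-count: at most 60 requests/minute per client IP
curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v1 \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "uri": "/test/v1",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    },
    "plugins": {
      "limit-count": {
        "count": 60,
        "time_window": 60,
        "rejected_code": 429,
        "key_type": "var",
        "key": "remote_addr"
      }
    }
  }'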
