API gateway authentication is the practice of verifying client identity at a centralized entry point before requests reach backend services. By enforcing authentication at the gateway layer, organizations eliminate redundant auth logic across services, reduce attack surface, and gain a single enforcement point for access policies.

What is API Gateway Authentication

In a distributed architecture, every service that exposes an endpoint must answer a fundamental question: who is making this request? Without a gateway, each service independently implements its own authentication stack. This leads to inconsistent enforcement, duplicated code, and a broader attack surface.

An API gateway centralizes this concern. It intercepts every inbound request, validates credentials against a configured identity provider or local store, and either forwards the authenticated request downstream or rejects it immediately. Broken authentication consistently ranks among the top API vulnerability categories, making centralized enforcement critical.

Centralizing authentication at the gateway layer provides three key advantages. First, it significantly reduces per-service authentication code by consolidating auth logic into a single component. Second, it creates a single audit log for every authentication event. Third, it enables credential rotation and policy changes without redeploying individual services.

Authentication Methods

Key Auth

Key authentication is the simplest method. The client includes a static API key in a header or query parameter. The gateway validates the key against a stored registry and maps it to a consumer identity.

Key Auth works well for server-to-server communication where transport security (TLS) is guaranteed and the client population is small. API keys remain common for machine-to-machine authentication, though their share is declining as organizations move toward token-based methods.

Apache APISIX supports Key Auth natively through its key-auth plugin. Configuration requires only defining a consumer and attaching the plugin to a route.
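As a concrete sketch, the following shows what such a configuration might look like in APISIX's standalone (`apisix.yaml`) mode; the consumer name, key, upstream address, and route path are all placeholders, and real keys should never be committed to configuration files in plain text:

```yaml
# apisix.yaml (standalone mode) -- illustrative sketch only
consumers:
  - username: example-consumer
    plugins:
      key-auth:
        key: example-secret-key   # placeholder; manage real keys in a secret store
routes:
  - uri: /api/orders/*
    upstream:
      type: roundrobin
      nodes:
        "orders-svc:8080": 1      # hypothetical upstream service
    plugins:
      key-auth: {}                # clients must present the apikey header
#END
```

Clients then send the key in the `apikey` header by default, and the gateway maps it back to `example-consumer` for logging and rate limiting.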

JWT (JSON Web Tokens)

JWT authentication uses digitally signed tokens that carry claims about the client. The gateway validates the token signature, checks expiration, and optionally verifies audience and issuer claims. Because JWTs are self-contained, the gateway does not need to call an external service on every request.

JWTs dominate modern API authentication. The compact format and stateless verification make JWTs particularly well-suited for high-throughput gateways where microsecond-level latency matters.

APISIX implements JWT validation through its jwt-auth plugin, supporting both HS256 and RS256 algorithms with configurable claim validation.
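To make the validation steps concrete, here is a minimal sketch of HS256 verification using only the Python standard library. The function name `verify_hs256` and the shared secret are illustrative; this covers only the symmetric HS256 case (not RS256), and a production gateway should rely on a maintained JWT library:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_decode(part: str) -> bytes:
    # JWT segments use unpadded base64url; restore padding before decoding
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_hs256(token: str, secret: bytes) -> dict:
    """Check an HS256 JWT's signature and exp claim, returning its claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

Because verification needs only the secret (or public key) and the token itself, no network call is required per request, which is what makes JWT validation cheap at the gateway.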

OAuth 2.0

OAuth 2.0 is an authorization framework that enables third-party applications to obtain limited access to an API on behalf of a resource owner. The gateway validates bearer tokens issued by an authorization server, typically by introspecting the token or verifying a JWT access token locally.

OAuth 2.0 is widely adopted across enterprises for API integrations. The framework's delegation model makes it essential for any API exposed to external developers or partner ecosystems.

OpenID Connect (OIDC)

OpenID Connect extends OAuth 2.0 with a standardized identity layer. It adds an ID token (a JWT) that carries user identity claims alongside the OAuth 2.0 access token. The gateway can validate the ID token to confirm user identity and use the access token for authorization decisions.

OIDC is the de facto standard for single sign-on in API ecosystems. Major identity providers including Okta, Auth0, Azure AD, and Google Identity all implement OIDC. APISIX provides native OIDC support through its openid-connect plugin, which handles the full authorization code flow, token introspection, and token refresh.

mTLS (Mutual TLS)

Mutual TLS requires both the client and server to present certificates during the TLS handshake. The gateway validates the client certificate against a trusted certificate authority, establishing strong machine identity without application-layer tokens.

mTLS adoption has surged alongside zero-trust architecture initiatives. In Kubernetes environments, mTLS between services has become increasingly common. At the gateway level, mTLS is particularly valuable for B2B integrations and internal service-to-service communication where certificate management infrastructure already exists.

HMAC Authentication

HMAC authentication requires the client to compute a hash-based message authentication code over the request content using a shared secret. The gateway independently computes the same HMAC and compares the results. This method provides request integrity verification in addition to authentication.

HMAC is common in financial APIs and webhook verification scenarios where request tampering must be detected. AWS Signature Version 4, used across all AWS API calls, is an HMAC-based scheme processing billions of requests daily.
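A minimal sketch of the sign-and-verify round trip, using Python's standard library; the canonicalization here (method, path, body joined by newlines) is a simplified stand-in for real schemes like AWS SigV4, which also cover headers and timestamps:

```python
import hashlib
import hmac

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over a canonical form of the request."""
    message = b"\n".join([method.encode(), path.encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def gateway_verify(secret: bytes, method: str, path: str, body: bytes,
                   presented_sig: str) -> bool:
    # The gateway recomputes the HMAC independently and compares
    # in constant time, so tampering with any signed part fails.
    expected = sign_request(secret, method, path, body)
    return hmac.compare_digest(expected, presented_sig)
```

Because the body is part of the signed message, modifying the payload in transit invalidates the signature, which is the integrity property that distinguishes HMAC auth from a plain API key.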

Comparison Table

| Method | Complexity | Statefulness | Best For | Token Expiry |
|---|---|---|---|---|
| Key Auth | Low | Stateless (lookup) | Internal services, simple integrations | Manual rotation |
| JWT | Medium | Stateless | High-throughput APIs, mobile clients | Built-in (exp claim) |
| OAuth 2.0 | High | Stateful (auth server) | Third-party access, delegated auth | Access token TTL |
| OIDC | High | Stateful (identity provider) | SSO, user-facing APIs | ID + access token TTL |
| mTLS | High | Stateless (cert validation) | Zero-trust, B2B, service mesh | Certificate validity period |
| HMAC | Medium | Stateless | Financial APIs, webhook verification | Per-key rotation policy |

Best Practices

Layer your authentication. Use mTLS at the transport layer for service identity and JWT or OAuth 2.0 at the application layer for user identity. Defense in depth reduces the impact of any single credential compromise.

Enforce short-lived tokens. Set JWT and OAuth 2.0 access token lifetimes to 15 minutes or less for user-facing flows. Use refresh tokens to obtain new access tokens without re-authentication. Short token lifetimes limit the window of exploitation if a token is leaked.

Centralize consumer management. Define consumers at the gateway level with consistent identity attributes. Map every API key, JWT subject, and OAuth 2.0 client ID to a named consumer entity. This enables unified rate limiting, logging, and access control across authentication methods.

Validate all claims. Do not trust a JWT solely because its signature is valid. Verify the issuer (iss), audience (aud), expiration (exp), and not-before (nbf) claims. Reject tokens with unexpected or missing claims.
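The checks above can be sketched as a small validation helper; the function name and error messages are illustrative, and a real deployment would typically allow for clock skew and delegate this to its JWT library's options:

```python
import time

def validate_claims(claims: dict, issuer: str, audience: str) -> None:
    """Reject a token whose standard claims are missing or wrong."""
    now = time.time()
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    aud = aud if isinstance(aud, list) else [aud]   # aud may be string or list
    if audience not in aud:
        raise ValueError("unexpected audience")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    if claims.get("nbf", 0) > now:
        raise ValueError("token not yet valid")
```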

Log authentication events comprehensively. Record every authentication success and failure with client identity, timestamp, source IP, and the route accessed. These logs are essential for incident response and compliance audits. NIST SP 800-92 recommends retaining authentication logs for a minimum of 90 days.

How Apache APISIX Handles Authentication

Apache APISIX provides a plugin-based authentication architecture that supports all six methods described above. Each authentication plugin runs in the gateway's request processing pipeline before the request reaches any upstream service.

APISIX's consumer abstraction ties authentication credentials to named entities. A single consumer can have multiple authentication methods attached, enabling gradual migration between methods. For example, an organization migrating from Key Auth to JWT can configure both plugins on the same consumer during the transition period.

Key plugins include:

  • key-auth: Static API key validation with header or query parameter extraction.
  • jwt-auth: JWT signature verification with configurable algorithms and claim validation.
  • openid-connect: Full OIDC flow support including authorization code, token introspection, and PKCE.

APISIX also supports chaining authentication plugins with authorization plugins such as consumer-restriction and OPA (Open Policy Agent), enabling fine-grained access control decisions after identity is established.

Performance benchmarks show APISIX processing authenticated requests with sub-millisecond overhead for Key Auth and JWT validation, and under 5ms for OIDC token introspection with a local identity provider. These numbers hold at sustained loads exceeding 10,000 requests per second on modest hardware.

FAQ

Should I use JWT or OAuth 2.0 for my API?

JWT and OAuth 2.0 are not mutually exclusive. OAuth 2.0 is an authorization framework that often uses JWTs as its access token format. If your API serves first-party clients only, standalone JWT authentication may suffice. If third-party developers need delegated access, implement the full OAuth 2.0 framework with JWT access tokens.

Is API key authentication secure enough for production?

API key authentication is secure for server-to-server communication over TLS when keys are rotated regularly and scoped to specific consumers. It is not recommended for client-side applications (browsers, mobile apps) because keys cannot be kept secret on end-user devices. For any client-facing API, prefer OAuth 2.0 or OIDC.

How does mTLS differ from standard TLS at the gateway?

Standard TLS authenticates only the server to the client. The client verifies the server's certificate, but the server accepts any client connection. mTLS adds a second handshake step where the client also presents a certificate that the server validates against a trusted CA. This provides strong machine identity for both parties and is a foundational component of zero-trust network architectures.

Can I combine multiple authentication methods on a single route?

Yes. Apache APISIX supports configuring multiple authentication plugins on a single route. The gateway attempts each configured method in order and accepts the request if any method succeeds. This is useful during migration periods or when a route serves clients with different authentication capabilities.

Microservices architectures need an API gateway to provide a single entry point that abstracts the complexity of a distributed service fleet from API consumers. The gateway handles cross-cutting concerns like authentication, routing, rate limiting, and observability centrally, preventing each microservice from reimplementing these capabilities independently and ensuring consistent behavior across the entire API surface.

Why Microservices Need a Gateway

A microservices architecture decomposes a monolithic application into independently deployable services, each owning a specific business domain. While this approach improves development velocity and scaling flexibility, it introduces operational challenges that compound as the number of services grows.

Without a gateway, every client must know the network location of every service it needs. A single mobile application screen might require data from five different services, forcing the client to manage multiple connections, handle partial failures, and aggregate responses. As organizations scale their microservices fleets, client-side orchestration becomes impractical.

The API gateway pattern, first described by Chris Richardson and widely adopted since, solves this by interposing a single component between clients and the service fleet. The gateway accepts all client requests, routes them to the appropriate services, and returns consolidated responses. The pattern has become a standard component in production microservices architectures.

The gateway also addresses a second fundamental problem: cross-cutting concern duplication. Authentication, logging, rate limiting, CORS handling, and request validation must be applied consistently across all services. Without a gateway, each service team implements these independently, leading to drift, inconsistency, and duplicated effort. Centralizing cross-cutting concerns at the gateway significantly reduces per-service boilerplate code.

Core Gateway Patterns

Request Routing

The most fundamental gateway pattern routes incoming requests to the correct upstream service based on URL paths, headers, methods, or other request attributes. A gateway might route /api/users/* to the user service, /api/orders/* to the order service, and /api/products/* to the catalog service.

Dynamic routing takes this further by reading route configurations from a control plane or configuration store, allowing routes to be updated without restarting the gateway. Apache APISIX supports dynamic route configuration through its Admin API and etcd-backed configuration store, enabling zero-downtime route changes.
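The path-based routing described above might be expressed as follows in APISIX's standalone (`apisix.yaml`) mode; the service hostnames and ports are placeholders:

```yaml
# apisix.yaml (standalone mode) -- illustrative path-based routing
routes:
  - uri: /api/users/*
    upstream:
      type: roundrobin
      nodes:
        "user-svc:8080": 1      # hypothetical user service
  - uri: /api/orders/*
    upstream:
      type: roundrobin
      nodes:
        "order-svc:8080": 1     # hypothetical order service
  - uri: /api/products/*
    upstream:
      type: roundrobin
      nodes:
        "catalog-svc:8080": 1   # hypothetical catalog service
#END
```

In a production deployment the same routes would more commonly be created through the Admin API and stored in etcd, so they can change at runtime without a restart.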

API Composition (Aggregation)

API composition combines responses from multiple microservices into a single response for the client. Instead of requiring a mobile application to make five separate API calls to render a dashboard, the gateway fetches data from all five services in parallel and returns a unified response.

This pattern reduces client-side complexity and network round trips. Consolidating multiple API calls into a single gateway request significantly decreases page load time, especially on mobile networks where each round trip adds noticeable latency.
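The fan-out-and-merge step can be sketched in a few lines; this is a generic illustration of the composition pattern, not APISIX's implementation, and the fetcher callables stand in for real backend calls:

```python
from concurrent.futures import ThreadPoolExecutor

def compose(fetchers: dict) -> dict:
    """Call each backend fetcher in parallel and merge results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        # result() re-raises any backend exception; a production composer
        # would catch per-backend failures and return partial results
        return {name: fut.result() for name, fut in futures.items()}
```

The total latency of the composed response is bounded by the slowest backend rather than the sum of all backends, which is where the round-trip savings come from.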

Backend for Frontend (BFF)

The BFF pattern creates gateway configurations tailored to specific client types. A mobile BFF provides compact responses optimized for bandwidth constraints and small screens. A web BFF returns richer data structures suited to desktop layouts. An internal BFF serves admin tools with elevated access.

Each BFF acts as a specialized gateway layer that transforms and filters upstream service responses for its target client. Netflix, Spotify, and SoundCloud have publicly documented their BFF implementations. The pattern prevents a one-size-fits-all API from forcing compromises on every client type.

Service Mesh Integration

In architectures that deploy both an API gateway and a service mesh, the gateway handles north-south traffic (client to cluster) while the service mesh manages east-west traffic (service to service). The gateway provides external-facing features like API key authentication, rate limiting, and response transformation. The mesh handles internal concerns like mTLS, circuit breaking, and service-to-service load balancing.

Most organizations using a service mesh also deploy an API gateway with clear boundaries between the two components. This separation avoids the complexity of running a full mesh for external traffic while preserving mesh benefits for internal communication.

Key Features for Microservices

Service Discovery

In a microservices environment, services scale dynamically and their network locations change frequently. Static configuration of upstream addresses becomes impractical at scale. Service discovery enables the gateway to automatically detect available service instances and their health status.

Apache APISIX integrates with multiple service discovery systems, documented in its discovery configuration guide. Supported registries include Consul, Eureka, Nacos, and Kubernetes-native service discovery. When a new service instance registers or an existing instance becomes unhealthy, APISIX updates its routing table automatically.

In Kubernetes environments, APISIX can read Service and Endpoint resources directly, eliminating the need for a separate discovery system. Kubernetes-native service discovery typically provides faster routing updates compared to polling-based approaches.

Circuit Breaking

Circuit breaking prevents cascading failures by stopping requests to an unhealthy upstream service. When error rates exceed a configured threshold, the circuit opens and the gateway returns a fast failure response instead of forwarding requests to the struggling service. After a cooldown period, the circuit enters a half-open state and allows a limited number of test requests through. If those succeed, the circuit closes and normal traffic resumes.

Without circuit breaking, a single unhealthy service can consume all available connections and thread pools in the gateway, causing failures to cascade across the entire system. Organizations using circuit breakers typically experience significantly shorter outage durations.
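The closed/open/half-open state machine described above can be sketched as follows; this is a generic illustration (not APISIX's implementation), and the threshold, cooldown, and injectable clock are simplifications:

```python
import time

class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker."""

    def __init__(self, failure_threshold=3, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                   # closed: normal traffic
        if self.clock() - self.opened_at >= self.cooldown:
            return True                                   # half-open: trial request
        return False                                      # open: fail fast

    def record_success(self):
        self.failures = 0
        self.opened_at = None                             # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()                 # open the circuit
```

A real implementation would also cap how many trial requests pass through in the half-open state and track error rates rather than raw counts.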

Canary Deployment

Canary deployment routes a small percentage of production traffic to a new service version while the majority continues to hit the stable version. The gateway controls the traffic split, enabling teams to validate new releases with real traffic before committing to a full rollout.

APISIX supports traffic splitting through weighted upstream configurations. A typical canary deployment starts with 5% of traffic routed to the new version, gradually increasing to 25%, 50%, and finally 100% as metrics confirm stability. If errors spike, the gateway reverts the traffic split without any service redeployment.
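The traffic-split decision itself is simple to illustrate; the sketch below hashes a request attribute into a stable bucket so a given client sees a consistent version during the rollout (a generic technique, not APISIX's internal mechanism):

```python
import hashlib

def pick_version(request_id: str, canary_percent: int) -> str:
    """Deterministically route a stable slice of traffic to the canary."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100   # bucket in [0, 100)
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` from 5 to 25 to 100 moves whole buckets of clients over without flapping individual users between versions.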

Distributed Tracing

In a microservices architecture, a single client request might traverse ten or more services. Distributed tracing tracks the request's path through the entire service chain, recording latency at each hop. The gateway plays a critical role by injecting trace context headers (W3C Trace Context or B3) into every request it forwards.

APISIX supports trace context propagation to observability backends including Zipkin, Jaeger, SkyWalking, and OpenTelemetry. With tracing enabled at the gateway, operations teams gain end-to-end visibility into request flows, enabling faster incident resolution compared to relying solely on logs and metrics.

API Gateway vs Service Mesh

API gateways and service meshes both manage network traffic in a microservices architecture, but they target different communication patterns and offer different feature sets.

| Aspect | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (external to internal) | East-west (internal to internal) |
| Deployment model | Centralized proxy | Distributed sidecar proxies |
| Primary focus | API management, external security | Internal networking, observability |
| Authentication | API keys, JWT, OAuth, OIDC | mTLS (identity-based) |
| Rate limiting | Per-consumer, per-route | Per-service (less granular) |
| Protocol support | HTTP, gRPC, WebSocket, GraphQL | TCP, HTTP, gRPC |
| Request transformation | Yes | Typically no |

The two technologies are complementary, not competitive. Organizations deploying both an API gateway and a service mesh generally report improved overall system reliability compared to using either component alone.

How Apache APISIX Supports Microservices

Apache APISIX is designed for microservices environments, offering dynamic configuration, multi-protocol support, and native integration with cloud-native infrastructure.

Dynamic configuration without restarts. APISIX stores configuration in etcd and applies changes in real time. Routes, upstream definitions, plugins, and consumers can be added, modified, or removed through the Admin API without restarting any gateway node. This is essential in microservices environments where service endpoints change frequently.

Plugin pipeline architecture. APISIX's plugin system runs a configurable pipeline of plugins on each request. For microservices, this means authentication, rate limiting, request transformation, and logging execute as independent pipeline stages. Plugins can be enabled per-route, per-service, or globally, providing fine-grained control over cross-cutting behavior.

Kubernetes-native operation. The APISIX Ingress Controller deploys APISIX as a Kubernetes-native gateway, supporting both the legacy Ingress resource and the newer Gateway API specification. This allows platform teams to manage gateway configuration using familiar Kubernetes declarative workflows.

Service discovery integration. APISIX's service discovery capabilities connect directly to Consul, Nacos, Eureka, and Kubernetes DNS, ensuring the gateway always routes to healthy, available service instances without manual configuration updates.

Observability. Built-in plugins export metrics to Prometheus, traces to Jaeger, Zipkin, and OpenTelemetry, and logs to HTTP endpoints, syslog, or Kafka. This enables a complete observability pipeline with the gateway as the instrumentation point for all external API traffic.

FAQ

Do I need an API gateway if I already use a service mesh?

Yes. A service mesh manages internal service-to-service communication but does not address external API concerns like consumer authentication, API key management, rate limiting per consumer, request transformation, or developer-facing documentation. The API gateway handles the north-south boundary where external clients interact with your microservices. Deploy both for comprehensive traffic management.

How does an API gateway handle partial failures across microservices?

An API gateway can be configured with circuit breakers, timeouts, and retry policies for each upstream service. When using the API composition pattern, the gateway can return partial results with degraded status rather than failing the entire request when one backend is unavailable. APISIX supports configurable timeouts and retry counts per route, and health checks automatically remove unhealthy nodes from the upstream pool.

Should each microservice team manage their own gateway routes?

A decentralized model where service teams own their route configurations works well at scale, provided the platform team controls the global policies (authentication requirements, rate limiting defaults, logging standards). APISIX supports this through its Admin API, which can be integrated into CI/CD pipelines. Service teams declare their routes in version-controlled configuration files, and the deployment pipeline applies changes through the Admin API after policy validation.

What is the performance impact of adding an API gateway to a microservices architecture?

Apache APISIX, built on NGINX and LuaJIT, adds 1-2ms of latency per request with a typical plugin configuration (authentication, rate limiting, logging). For most microservices architectures where end-to-end request latency is 50-500ms, this overhead is under 2% of total latency. The operational benefits of centralized security, observability, and traffic management significantly outweigh the minor latency cost.

API gateway rate limiting is the practice of controlling how many requests a client can make to your API within a defined time window. Implemented at the gateway layer, rate limiting protects backend services from overload, prevents abuse, ensures fair resource allocation across consumers, and maintains predictable service quality under variable traffic conditions.

What is Rate Limiting

Rate limiting enforces a maximum request throughput for API consumers. When a client exceeds its allowed quota, the gateway returns an HTTP 429 (Too Many Requests) response instead of forwarding the request to the upstream service. The response typically includes a Retry-After header indicating when the client can resume making requests.

The need for rate limiting has grown alongside API traffic volumes. API traffic now represents the majority of HTTP requests processed globally, and a significant portion consists of automated requests, many of them abusive or the product of unintentional high-frequency polling.

Without rate limiting, a single misbehaving client can consume disproportionate backend resources, degrading performance for all consumers. Rate limiting is also a contractual tool: it enforces the usage tiers defined in API monetization plans and SLAs.

Why Rate Limit at the Gateway

Implementing rate limiting at the API gateway rather than in individual services provides several structural advantages.

Single enforcement point. When rate limits are defined at the gateway, every request passes through the same throttling logic regardless of which upstream service handles it. This eliminates the risk of inconsistent enforcement across a microservices fleet and reduces availability incidents caused by traffic spikes.

Reduced backend load. Rejected requests never reach the upstream service. This means the gateway absorbs the cost of excess traffic, keeping backend services operating within their designed capacity.

Consistent client experience. Centralized rate limiting ensures all consumers receive the same HTTP 429 responses with standardized headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset), making it straightforward for client developers to implement backoff logic.

Operational visibility. Gateway-level rate limiting produces unified metrics on throttled requests, enabling operations teams to identify abusive clients, undersized quotas, and traffic anomalies from a single dashboard.

Rate Limiting Algorithms

Token Bucket

The token bucket algorithm maintains a bucket of tokens for each rate-limited entity. Tokens are added at a fixed rate up to a maximum capacity. Each request consumes one token. If the bucket is empty, the request is rejected.

Token bucket allows short bursts up to the bucket capacity while enforcing an average rate over time. This makes it well-suited for APIs where occasional traffic spikes are acceptable but sustained overuse is not.

Pros: Permits controlled bursting, simple to implement, low memory footprint.

Cons: Burst size must be tuned carefully; overly generous bursts can still overwhelm backends.
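A minimal sketch of the algorithm, using explicit timestamps so the refill logic is visible (a real limiter would read a monotonic clock and track one bucket per consumer key):

```python
class TokenBucket:
    """Token bucket: refill at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity,
        # then spend one token if any are available.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Here `capacity` is the maximum burst size and `rate` is the sustained average; tuning the two independently is exactly the flexibility the algorithm offers.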

Leaky Bucket

The leaky bucket algorithm processes requests at a fixed rate, queuing excess requests until the queue is full. It smooths traffic into a uniform output rate regardless of input burstiness.

Leaky bucket is ideal for backends that require strictly uniform request rates, such as third-party APIs with their own rate limits or services with fixed connection pools.

Pros: Produces perfectly smooth output, prevents backend overload from bursts.

Cons: Higher latency for bursty traffic due to queuing, queue size requires tuning.

Sliding Window

The sliding window algorithm divides time into overlapping windows and counts requests across the current and previous windows using weighted proportions. This eliminates the boundary problem inherent in fixed windows.

For example, if the window is 60 seconds and the current position is 40 seconds into the window, the algorithm weights 33% of the previous window's count and 100% of the current window's count to determine if the limit is exceeded.

Pros: Accurate rate enforcement without boundary spikes, reasonable memory usage.

Cons: Slightly more complex to implement than fixed window.
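The weighted check reduces to one formula; this sketch takes the counters and timing as arguments, whereas a real limiter would maintain them per consumer key:

```python
def sliding_window_allow(prev_count: int, curr_count: int,
                         elapsed: float, window: float, limit: int) -> bool:
    """Weighted sliding-window check.

    `elapsed` seconds into the current window, the previous window's
    count is weighted by the remaining overlap: 40s into a 60s window
    weights the previous count by (60 - 40) / 60, i.e. about 33%.
    """
    weight = (window - elapsed) / window
    estimated = prev_count * weight + curr_count
    return estimated < limit
```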

Fixed Window

The fixed window algorithm divides time into non-overlapping intervals and counts requests within each interval. When the count exceeds the limit, subsequent requests are rejected until the next window begins.

Fixed window is the simplest algorithm but has a well-known boundary problem: a client can make double the intended rate by clustering requests at the end of one window and the beginning of the next. Despite this limitation, fixed window remains widely deployed due to its simplicity and low overhead.

Pros: Minimal memory and computation, easy to understand and debug.

Cons: Boundary burst problem allows temporary rate doubling.

Algorithm Comparison

| Algorithm | Burst Handling | Output Smoothness | Memory | Complexity | Boundary Accuracy |
|---|---|---|---|---|---|
| Token Bucket | Allows controlled bursts | Moderate | Low | Low | N/A |
| Leaky Bucket | Queues bursts | Very smooth | Medium | Low | N/A |
| Sliding Window | Proportional smoothing | Smooth | Medium | Medium | High |
| Fixed Window | Boundary bursts possible | Low | Very low | Very low | Low |

Rate Limiting Strategies

Per-Consumer

Assign rate limits based on authenticated consumer identity. This is the most common strategy for APIs with tiered pricing plans. A free tier consumer might receive 100 requests per minute while a paid enterprise consumer receives 10,000.

Per-consumer rate limiting requires the rate limiting plugin to execute after authentication so the consumer identity is available. APISIX's consumer abstraction makes this straightforward: attach rate limit configurations directly to consumer definitions.

Per-IP

Throttle requests based on the client's source IP address. This strategy is effective for public APIs that do not require authentication, such as health check endpoints or public data feeds. IP-based rate limiting is a practical first line of defense against volumetric API abuse, especially when combined with reputation scoring.

Per-IP limiting has limitations in environments where many clients share a single IP (corporate NATs, mobile carriers). Use it as a coarse first defense layer, not as the sole rate limiting strategy.

Per-Route

Apply different rate limits to different API endpoints based on their resource cost. A search endpoint that triggers expensive database queries might have a stricter limit than a simple metadata lookup. This strategy protects the most resource-intensive parts of your backend.

Global

Enforce an aggregate rate limit across all consumers and routes. Global limits protect the overall system capacity and are typically set well above individual consumer limits. They serve as a safety net when the sum of individual limits exceeds actual infrastructure capacity.

How Apache APISIX Implements Rate Limiting

Apache APISIX provides three complementary rate limiting plugins, each targeting a different dimension of traffic control.

limit-req (Request Rate Limiting)

The limit-req plugin implements a leaky bucket algorithm that controls the request rate per second. It accepts configuration for the sustained request rate (rate), the burst allowance (burst), and the rejection status code.

This plugin is ideal when you need to smooth traffic to a uniform rate. It supports keying on remote address, consumer name, service, or any variable available in the APISIX context.

limit-count (Request Count Limiting)

The limit-count plugin enforces a maximum number of requests within a configurable time window. It supports both fixed window and sliding window algorithms, with the window size configurable from one second to one day.

limit-count is the best choice for implementing API quota plans (e.g., 10,000 requests per day). It returns standard rate limit headers so clients can track their remaining quota. For distributed deployments, limit-count supports shared counters via Redis, ensuring accurate enforcement across multiple gateway nodes. In benchmarks, Redis-backed distributed counting adds less than 1ms of latency per request at the 99th percentile.

limit-conn (Concurrent Connection Limiting)

The limit-conn plugin restricts the number of concurrent requests being processed simultaneously. Unlike rate-based limits, connection limits protect against slow-client attacks and long-running requests that tie up backend connections.

This plugin is essential for APIs that serve large file downloads, streaming responses, or long-polling connections. It works by counting active connections per key and rejecting new connections when the limit is exceeded.

Combining Plugins

APISIX allows stacking all three plugins on a single route. A typical production configuration might combine limit-count for daily quotas, limit-req for per-second smoothing, and limit-conn for concurrent connection caps. The plugins execute in order, and a request rejected by any plugin does not consume quota in subsequent plugins.
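A sketch of such a stacked configuration in standalone (`apisix.yaml`) mode; the route path, upstream, and all numeric limits are placeholders, and exact field names should be checked against the plugin schemas of your APISIX version:

```yaml
# Illustrative route stacking all three rate limiting plugins
routes:
  - uri: /api/reports/*
    upstream:
      type: roundrobin
      nodes:
        "report-svc:8080": 1    # hypothetical upstream
    plugins:
      limit-count:              # daily quota
        count: 10000
        time_window: 86400
        rejected_code: 429
      limit-req:                # per-second smoothing (leaky bucket)
        rate: 50
        burst: 20
        key_type: var
        key: remote_addr
      limit-conn:               # concurrent connection cap
        conn: 25
        burst: 5
        default_conn_delay: 0.1
        key_type: var
        key: remote_addr
#END
```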

This layered approach mirrors industry best practice. Production APIs benefit from enforcing at least two independent rate limiting dimensions to provide comprehensive protection.

FAQ

What HTTP status code should I return for rate-limited requests?

Return HTTP 429 (Too Many Requests) as defined in RFC 6585. Include a Retry-After header with the number of seconds the client should wait before retrying. Additionally, include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers so clients can proactively manage their request rate. APISIX's limit-count plugin returns these headers automatically.

How do I handle rate limiting in a distributed gateway deployment?

Use a shared counter store such as Redis. APISIX's limit-count plugin natively supports Redis and Redis Cluster backends for distributed counter synchronization. This ensures that rate limits are enforced accurately regardless of which gateway node processes the request. The trade-off is a small latency increase (typically under 1ms) for the Redis round-trip on each request.

Should I rate limit internal service-to-service traffic?#

Yes, but with different thresholds. Internal rate limiting prevents cascading failures when one service sends an unexpectedly high volume of requests to another. Set internal limits based on measured capacity rather than commercial quotas. Circuit breakers complement internal rate limiting by stopping requests entirely when a downstream service is unhealthy.
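A minimal circuit breaker illustrates the complement to internal rate limiting. This is a simplified sketch; real implementations add half-open probing with limited concurrency, jitter, and failure-rate (rather than consecutive-failure) tracking:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: after `threshold` consecutive failures the
    circuit opens and calls are rejected outright until `cooldown`
    seconds pass, keeping load off an unhealthy downstream service."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self, now=None) -> bool:
        now = time.time() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown:
            self.opened_at = None      # half-open: let a probe request through
            self.failures = 0
            return True
        return False

    def record(self, success: bool, now=None) -> None:
        now = time.time() if now is None else now
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now

cb = CircuitBreaker(threshold=2, cooldown=30)
cb.record(success=False, now=0)
cb.record(success=False, now=1)      # second failure trips the circuit
print(cb.allow(now=2))   # False: circuit is open
print(cb.allow(now=40))  # True: cooldown elapsed, probe allowed
```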

How do I communicate rate limits to API consumers?#

Document rate limits in your API reference and include them in onboarding materials. Use standard rate limit response headers on every response (not just 429 responses) so clients can monitor their consumption in real time. Provide a dedicated endpoint or dashboard where consumers can check their current usage against their quota. For paid tiers, send proactive notifications when consumers approach their limits.



API gateway security is the practice of protecting your API infrastructure at the edge by enforcing authentication, authorization, rate limiting, and traffic filtering before requests reach backend services. A properly secured gateway reduces attack surface, prevents data breaches, and ensures compliance across every API endpoint in your organization.

Why API Gateway Security Matters#

APIs have become the primary attack vector for modern applications. According to the OWASP API Security Top 10 (2023 edition), broken object-level authorization and broken authentication remain the two most critical API vulnerabilities, affecting organizations across every industry. The explosive growth of API-first architectures has been matched by a corresponding surge in API-targeted attacks.

The cost of getting API security wrong is substantial, as breaches involving API vulnerabilities tend to take longer to identify and contain and carry significant financial impact. The API gateway sits at a unique vantage point: it processes every inbound request, making it the single most effective location to enforce security policies consistently.

Common API Threats#

Understanding the threat landscape is essential for building an effective defense. The following categories represent the most frequent and damaging attack patterns targeting APIs today.

Broken Object-Level Authorization (BOLA)#

BOLA attacks exploit weak authorization checks to access resources belonging to other users. An attacker modifies object identifiers in API requests (for example, changing /users/123/orders to /users/456/orders) to retrieve unauthorized data. BOLA remains one of the most exploited API vulnerability classes, particularly in organizations where API management and authorization enforcement have not kept pace with API proliferation.

Injection Attacks#

SQL injection, NoSQL injection, and command injection remain persistent threats. Attackers embed malicious payloads in query parameters, headers, or request bodies. Despite being a well-known vulnerability class, injection attacks continue to appear frequently in web application security assessments.

Broken Authentication#

Weak or improperly implemented authentication mechanisms allow attackers to assume legitimate user identities. Common failures include missing token validation, weak password policies, credential stuffing vulnerabilities, and improper session management. Credential stuffing attacks account for billions of login attempts monthly across the internet.

Excessive Data Exposure#

APIs frequently return more data than the client needs, relying on the frontend to filter sensitive fields. Attackers bypass the frontend and consume raw API responses directly, gaining access to data never intended for display. This over-exposure is especially dangerous in mobile applications where API traffic is easily intercepted.

Rate Limit Bypass#

Without proper rate limiting, attackers can launch brute-force attacks, denial-of-service campaigns, and credential enumeration at scale. Automated bot traffic constitutes a significant portion of all internet traffic, and much of it targets API endpoints specifically.

Security Layers at the Gateway#

A defense-in-depth approach applies multiple security controls at the gateway layer, each addressing a distinct category of risk.

Authentication#

The gateway should verify identity before any request reaches a backend service. Common mechanisms include JWT validation, OAuth 2.0 token introspection, API key verification, and mutual TLS (mTLS) for service-to-service communication. Centralizing authentication at the gateway eliminates the risk of inconsistent enforcement across individual services.

Authorization#

Beyond verifying identity, the gateway must enforce access control. Role-based access control (RBAC), attribute-based access control (ABAC), and scope-based authorization ensure that authenticated users can only access resources and operations they are permitted to use. Fine-grained authorization at the gateway prevents BOLA vulnerabilities at scale.
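A gateway-side sketch of the object-level check that blocks the BOLA pattern described earlier. The claim names (`sub`, `scope`) follow common JWT conventions and the URL shape is illustrative:

```python
import re

def authorize(token_claims: dict, path: str) -> bool:
    """Illustrative object-level authorization: a request for
    /users/{id}/... is allowed only if the token's subject matches the
    id in the path, or the caller holds an admin scope. Default deny."""
    m = re.match(r"^/users/(?P<user_id>\d+)(/|$)", path)
    if m is None:
        return False                       # deny paths we don't recognize
    scopes = token_claims.get("scope", "").split()
    if "admin" in scopes:
        return True
    # Object-level check: callers may only touch their own resources.
    return token_claims.get("sub") == m.group("user_id")

print(authorize({"sub": "123", "scope": "read"}, "/users/123/orders"))  # True
print(authorize({"sub": "123", "scope": "read"}, "/users/456/orders"))  # False
```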

Rate Limiting and Throttling#

Rate limiting protects backend services from abuse and ensures fair resource allocation. Effective rate limiting operates at multiple granularities: per consumer, per route, per IP address, and globally. A substantial share of traffic on the average website comes from bots, and rate limiting is the first line of defense against automated abuse.

IP Restriction#

IP allowlists and denylists provide coarse-grained access control. While not sufficient as a sole security measure, IP restriction is valuable for restricting administrative endpoints, limiting partner API access to known address ranges, and blocking traffic from regions associated with attack activity.

WAF and CORS#

A Web Application Firewall (WAF) at the gateway layer inspects request payloads for known attack patterns. CORS policies prevent unauthorized cross-origin requests from browser-based clients. Together, they address both server-side injection attacks and client-side cross-origin abuse.

TLS Termination#

TLS termination at the gateway ensures that all client-to-gateway traffic is encrypted. The gateway handles certificate management, cipher suite configuration, and protocol version enforcement, relieving backend services of this operational burden. The vast majority of web traffic now uses HTTPS, and TLS is considered a baseline requirement for any production API.

Request Validation#

Schema-based request validation rejects malformed or oversized payloads before they reach backend services. Validating request structure, data types, and content length at the gateway prevents injection attacks and reduces the attack surface of downstream services.
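The idea reduces to checking size, field set, and types before forwarding; a minimal sketch (real gateways typically use JSON Schema for this):

```python
def validate_request(body: dict, raw_size: int, schema: dict,
                     max_size: int = 1_000_000):
    """Minimal request validation: reject oversized payloads, unexpected
    fields, and type mismatches. `schema` maps field name -> expected
    type; all fields are treated as required in this sketch."""
    if raw_size > max_size:
        return False, "payload too large"
    unexpected = set(body) - set(schema)
    if unexpected:
        return False, f"unexpected fields: {sorted(unexpected)}"
    for field, expected_type in schema.items():
        if field not in body:
            return False, f"missing field: {field}"
        if not isinstance(body[field], expected_type):
            return False, f"wrong type for {field}"
    return True, "ok"

schema = {"email": str, "quantity": int}
print(validate_request({"email": "a@b.co", "quantity": 2}, 60, schema))
print(validate_request({"email": "a@b.co", "qty": 2}, 60, schema))
```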

Zero-Trust API Architecture#

Zero-trust architecture assumes that no request is inherently trustworthy, regardless of its origin. Every API call must be authenticated, authorized, and validated, whether it arrives from the public internet, an internal service, or a trusted partner.

At the gateway layer, zero-trust principles translate into several concrete practices:

  • Every request carries verifiable identity credentials.
  • Authorization is evaluated per request rather than per session.
  • Network location (internal vs. external) does not confer implicit trust.
  • All traffic is encrypted, including east-west service-to-service communication.

The API gateway enables zero-trust by serving as a policy enforcement point. It validates tokens, checks permissions, and applies security policies uniformly across all traffic, creating a consistent security boundary regardless of the underlying network topology.

Security Best Practices#

The following practices represent a comprehensive approach to API gateway security that organizations should adopt incrementally based on risk profile.

  1. Enforce authentication on every endpoint. No API route should be accessible without verified identity. Use JWTs with short expiration times and validate signatures on every request.

  2. Implement least-privilege authorization. Grant the minimum permissions required for each consumer. Default to deny and require explicit grants for sensitive operations.

  3. Apply rate limiting at multiple levels. Configure per-consumer, per-route, and global rate limits. Use sliding window algorithms to prevent burst abuse while accommodating legitimate traffic spikes.

  4. Validate all request inputs. Enforce request schema validation at the gateway. Reject payloads that exceed expected sizes, contain unexpected fields, or fail type checks.

  5. Use mutual TLS for service-to-service calls. Encrypt and authenticate all internal traffic. Rotate certificates automatically and enforce certificate validation on every connection.

  6. Enable WAF rules for known attack patterns. Deploy rulesets targeting SQL injection, XSS, and command injection. Update rules regularly to address emerging attack vectors.

  7. Log and audit all security events. Capture authentication failures, authorization denials, rate limit triggers, and WAF blocks. Feed security logs into a SIEM for correlation and alerting.

  8. Rotate credentials and secrets regularly. Automate API key rotation, certificate renewal, and token signing key rotation. Never embed secrets in client-side code or version control.

  9. Restrict administrative API access. Protect management APIs with strong authentication, IP restrictions, and separate credentials from data-plane APIs.

  10. Conduct regular security assessments. Perform API-specific penetration testing, not just general web application assessments. The OWASP API Security Testing Guide provides a structured methodology.
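The signature validation named in practice 1 can be sketched with the standard library alone for HS256 tokens. This is illustrative only: production code should use a maintained JWT library and also validate `aud` and `iss` claims:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_hs256(token: str, secret: bytes):
    """Verify an HS256 JWT's signature and expiry; return its claims,
    or None if invalid. A sketch of the mechanics, not a full validator."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    claims = json.loads(b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        return None    # expired: short lifetimes limit token replay
    return claims

# Build a demo token signed with an illustrative shared secret.
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "alice", "exp": int(time.time()) + 60}).encode())
sig = b64url(hmac.new(b"demo-secret", f"{header}.{payload}".encode(),
                      hashlib.sha256).digest())
token = f"{header}.{payload}.{sig}"

print(verify_hs256(token, b"demo-secret")["sub"])  # alice
print(verify_hs256(token, b"wrong-secret"))        # None
```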

How Apache APISIX Secures APIs#

Apache APISIX provides a comprehensive set of security plugins that implement each layer of the defense-in-depth model described above.

For IP-based access control, the ip-restriction plugin supports allowlists and denylists at the route level, enabling fine-grained control over which addresses can reach specific endpoints.

Cross-origin resource sharing is managed through the CORS plugin, which configures allowed origins, methods, and headers to prevent unauthorized cross-origin requests from browser clients.

CSRF protection is available through the CSRF plugin, which generates and validates CSRF tokens to prevent cross-site request forgery attacks on state-changing API operations.

For mutual TLS, APISIX supports mTLS configuration for both client-to-gateway and gateway-to-upstream connections, ensuring encrypted and mutually authenticated communication at every hop.

APISIX also supports JWT authentication, key authentication, OpenID Connect, rate limiting with multiple algorithms, and request body validation. The plugin architecture enables security policies to be composed per route, allowing teams to apply exactly the controls each endpoint requires without over- or under-securing traffic.

FAQ#

What is the difference between API gateway security and API security?#

API security is the broad discipline of protecting APIs across their entire lifecycle, including design, development, testing, and runtime. API gateway security specifically refers to the security controls enforced at the gateway layer during runtime, such as authentication, rate limiting, and input validation. The gateway is one component of a comprehensive API security strategy, not a replacement for secure coding practices and security testing.

Should I terminate TLS at the API gateway or at the backend service?#

Terminate TLS at the gateway for client-facing connections. This centralizes certificate management and offloads cryptographic processing from backend services. For traffic between the gateway and upstream services, use mTLS to maintain encryption and mutual authentication throughout the request path. This approach balances operational simplicity with end-to-end security.

How many rate limiting layers should an API gateway enforce?#

Apply at least three layers: a global rate limit to protect overall infrastructure capacity, a per-consumer limit to prevent any single client from monopolizing resources, and per-route limits for endpoints with expensive backend operations. Use sliding window or leaky bucket algorithms rather than fixed windows to provide smoother throttling behavior and prevent burst abuse at window boundaries.
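The boundary-burst problem mentioned above is concrete: with a fixed 60-second window of 100 requests, a client can send 100 requests at 0:59 and 100 more at 1:01. A sliding-window log avoids this; the sketch below illustrates the general technique (production systems often use the cheaper sliding-window counter approximation):

```python
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window` seconds,
    by keeping a timestamp log per key and evicting stale entries."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = {}

    def allow(self, key: str, now: float) -> bool:
        stamps = self.log.setdefault(key, deque())
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()             # drop requests older than the window
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True

rl = SlidingWindowLog(limit=2, window=60)
print(rl.allow("c1", now=59), rl.allow("c1", now=59.5))  # True True
print(rl.allow("c1", now=61))    # False: still 2 requests in the last 60s
print(rl.allow("c1", now=120))   # True: the 59s entry has aged out
```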


An API gateway and a load balancer serve different primary purposes. A load balancer distributes network traffic across multiple backend servers to maximize throughput and availability. An API gateway operates at the application layer to manage, secure, and transform API traffic with features like authentication, rate limiting, and request routing. In modern architectures, they complement each other and are frequently deployed together.

What is a Load Balancer#

A load balancer sits between clients and a pool of backend servers, distributing incoming requests to ensure no single server becomes overwhelmed. Load balancers operate at either Layer 4 (TCP/UDP) or Layer 7 (HTTP/HTTPS) of the OSI model.

Layer 4 load balancers route traffic based on IP address and port number without inspecting the request content. They are fast, protocol-agnostic, and add minimal latency. Layer 7 load balancers inspect HTTP headers, URLs, and sometimes request bodies to make more intelligent routing decisions.

Load balancers are foundational infrastructure. The vast majority of organizations use some form of load balancing in their production environments. The technology has been a networking staple for over two decades, with the core algorithms (round-robin, least connections, weighted distribution) remaining largely unchanged.

The primary value of a load balancer is availability. By distributing traffic and performing health checks, load balancers ensure that the failure of a single backend instance does not cause a service outage. They also enable horizontal scaling: adding more backend instances to handle increased traffic without changing the client-facing endpoint.

What is an API Gateway#

An API gateway is an application-layer proxy that acts as the single entry point for API consumers. Beyond routing requests to the correct backend service, an API gateway provides a rich set of cross-cutting concerns: authentication, authorization, rate limiting, request and response transformation, caching, logging, and monitoring.

API gateways emerged from the needs of microservices architectures and API-first product strategies. When an organization exposes dozens or hundreds of microservices, a gateway centralizes the operational concerns that would otherwise be duplicated across every service.

An API gateway typically operates exclusively at Layer 7 and understands application-level protocols like HTTP, gRPC, WebSocket, and GraphQL. It makes routing decisions based on URL paths, headers, query parameters, and even request body content.

Feature Comparison#

| Capability | Load Balancer | API Gateway |
|---|---|---|
| Traffic distribution | Yes (core function) | Yes (built-in) |
| Health checks | Yes | Yes |
| SSL/TLS termination | Yes | Yes |
| Layer 4 routing | Yes | Typically no |
| Layer 7 routing | L7 LB only | Yes (core function) |
| Authentication | No | Yes |
| Authorization | No | Yes |
| Rate limiting | Basic (some L7 LBs) | Yes (granular) |
| Request transformation | No | Yes |
| Response transformation | No | Yes |
| API versioning | No | Yes |
| Protocol translation | Limited | Yes (HTTP to gRPC, REST to GraphQL) |
| Caching | Limited | Yes |
| Developer portal | No | Yes (with management layer) |
| Analytics and monitoring | Basic metrics | Detailed API analytics |
| Circuit breaking | Some implementations | Yes |
| Canary/blue-green deploys | Some implementations | Yes |

The table makes the distinction clear: load balancers focus on network-level traffic distribution, while API gateways focus on application-level API management. The overlap exists primarily in Layer 7 load balancers, which have gradually added some application-aware features.

Key Differences Explained#

Scope of Concern#

A load balancer answers the question: which backend server should handle this connection? An API gateway answers a broader set of questions: is this client authenticated? Are they authorized for this endpoint? Have they exceeded their rate limit? Does the request need transformation before forwarding? Should the response be cached?

In practice, most organizations using API gateways configure multiple cross-cutting policies (authentication, rate limiting, logging, and CORS), none of which fall within a traditional load balancer's responsibility.

Protocol Awareness#

Load balancers, especially at Layer 4, are largely protocol-agnostic. They route TCP connections without understanding the application protocol. API gateways are deeply protocol-aware. They parse HTTP methods, match URL patterns, inspect headers, and in many cases understand domain-specific protocols like gRPC and GraphQL.

This protocol awareness enables capabilities that load balancers cannot provide. For example, an API gateway can route GraphQL queries to different backend services based on the query's requested fields, or translate between REST and gRPC protocols transparently.

Configuration Granularity#

Load balancer configuration centers on server pools, health check parameters, and distribution algorithms. API gateway configuration is far more granular: per-route authentication requirements, per-consumer rate limits, request header injection, response body transformation, and conditional plugin execution.

A typical enterprise API gateway configuration manages 50-200 routes with distinct policy combinations, compared to a load balancer managing 10-30 server pools. The operational complexity reflects the difference in scope.

Performance Profile#

Layer 4 load balancers add microsecond-level latency because they operate below the HTTP layer. API gateways add millisecond-level latency because they must parse, inspect, and potentially transform HTTP requests. High-performance gateways like Apache APISIX, built on NGINX and LuaJIT, keep this overhead under 1ms for typical configurations. According to APISIX benchmark data, the gateway processes over 20,000 requests per second per core with authentication and rate limiting enabled.

When to Use Which#

Use a Load Balancer When#

  • You need to distribute TCP or UDP traffic across backend instances.
  • Your primary concern is availability and horizontal scaling.
  • You are load balancing non-HTTP protocols (databases, message queues, custom TCP services).
  • You want minimal latency overhead with no application-layer processing.

Use an API Gateway When#

  • You expose APIs to external consumers who need authentication and rate limiting.
  • You run a microservices architecture and need centralized cross-cutting concerns.
  • You need request or response transformation between clients and services.
  • You require detailed API analytics, logging, and monitoring.
  • You manage multiple API versions or need protocol translation.

Use Both Together#

In most production architectures, load balancers and API gateways coexist at different layers. A common deployment pattern places a Layer 4 or cloud-native load balancer (AWS NLB, Google Cloud Load Balancing) in front of a cluster of API gateway instances. The load balancer distributes traffic across gateway nodes for high availability, while the gateway handles application-level API management.

This separation of concerns allows each component to do what it does best.

How Apache APISIX Combines Both#

Apache APISIX is an API gateway that includes built-in load balancing capabilities, effectively combining both roles into a single component for many use cases.

APISIX supports multiple load balancing algorithms natively, documented in its load balancing guide:

  • Round-robin (weighted): Distributes requests across upstream nodes based on configured weights.
  • Consistent hashing: Routes requests to the same backend based on a configurable key (IP, header, URI), useful for cache-friendly distributions.
  • Least connections: Sends requests to the upstream node with the fewest active connections.
  • EWMA (Exponential Weighted Moving Average): Selects the upstream node with the lowest response latency, adapting to real-time backend performance.
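The consistent-hashing mode can be illustrated with a small hash ring. This sketches the general technique, not APISIX's internal implementation:

```python
import bisect, hashlib

class HashRing:
    """Minimal consistent-hash ring: each node gets `vnodes` points on
    the ring; a key maps to the first node point at or after its hash.
    Adding or removing a node remaps only the keys near its points."""

    def __init__(self, nodes, vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def pick(self, key: str) -> str:
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
# The same hash key (here a client IP) always lands on the same upstream,
# which is what keeps per-node caches warm.
print(ring.pick("203.0.113.7") == ring.pick("203.0.113.7"))  # True
```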

By combining API gateway features with production-grade load balancing, APISIX reduces architectural complexity for many deployments. Organizations that would otherwise deploy a separate load balancer and a separate API gateway can consolidate into a single APISIX layer, reducing operational overhead and network hops.

For large-scale deployments, a dedicated Layer 4 load balancer in front of APISIX nodes still makes sense for TCP-level high availability and DDoS protection. But within the application layer, APISIX handles both traffic distribution and API management without requiring an additional component.

FAQ#

Can an API gateway replace a load balancer entirely?#

For HTTP and gRPC traffic, a modern API gateway like Apache APISIX can replace a Layer 7 load balancer because it includes equivalent load balancing algorithms. However, for non-HTTP protocols (raw TCP, UDP, database connections) or for Layer 4 DDoS protection, a dedicated load balancer remains necessary. The most common production pattern uses both: a Layer 4 load balancer for network-level distribution and an API gateway for application-level management.

Does adding an API gateway increase latency compared to a load balancer alone?#

Yes, but the increase is typically small. A Layer 4 load balancer adds microseconds of latency. An API gateway adds 0.5-2ms depending on the number of active plugins. For most APIs where upstream service response times are 10-500ms, the gateway overhead is negligible. The operational benefits of centralized authentication, rate limiting, and observability far outweigh the minor latency cost.

Should I use a cloud provider's managed API gateway or deploy my own?#

Managed gateways (AWS API Gateway, Google Apigee) reduce operational burden but limit customization and can become expensive at high traffic volumes. AWS API Gateway charges per million requests, which can reach thousands of dollars monthly for high-traffic APIs. Self-managed gateways like Apache APISIX offer full control, unlimited throughput on your infrastructure, and no per-request fees, but require your team to operate the gateway cluster. Evaluate based on your traffic volume, customization needs, and operations capacity.

How does an API gateway differ from a reverse proxy?#

A reverse proxy forwards client requests to backend servers and is the foundation of both load balancers and API gateways. An API gateway is a specialized reverse proxy that adds API-specific features: authentication, rate limiting, request transformation, API versioning, and developer-facing analytics. NGINX, for example, can function as a reverse proxy, load balancer, or (with extensions) an API gateway. Apache APISIX is purpose-built as an API gateway with load balancing built in.


API monetization is the practice of generating revenue from APIs by charging consumers for access, usage, or the value derived from API-powered integrations. Successful API monetization requires aligning a pricing model with how consumers perceive and extract value, backed by technical infrastructure for metering, rate limiting, and billing.

Why API Monetization Matters#

APIs have shifted from internal integration glue to standalone revenue channels. The global API management market continues to grow rapidly, driven by the platformization of business capabilities.

Companies like Stripe and Twilio have built multi-billion-dollar businesses where the API itself is the product. This pattern is maturing, with a growing number of enterprises monetizing external APIs.

The technical challenge is substantial: you need usage metering accurate to the individual request, rate limiting that enforces plan boundaries in real time, and billing integration that translates API consumption into invoices without manual reconciliation.

API Monetization Models#

Comparison Table#

| Model | Revenue Trigger | Best For | Pros | Cons |
|---|---|---|---|---|
| Free / Freemium | Conversion to paid | Developer adoption, market entry | Low barrier, viral growth | Revenue depends on conversion |
| Pay-Per-Call | Each API request | High-volume transactional APIs | Scales with usage, fair pricing | Unpredictable revenue |
| Subscription Tiers | Monthly/annual plan | Predictable workloads | Predictable revenue, simpler billing | Overprovision or underprovision risk |
| Revenue Sharing | Transaction value | Marketplace, payment APIs | Aligned incentives | Complex accounting |
| Transaction-Based | Per business event | Payment processing, shipping APIs | Value-aligned pricing | Requires event definition |

Free and Freemium#

The freemium model offers a free tier with usage caps (typically 1,000-10,000 requests per month) and charges for usage beyond that threshold. Freemium APIs generally see single-digit conversion rates from free to paid plans.

This model works best when the API has broad appeal, a natural expansion path (users start small and grow), and low marginal cost per request. Stripe's original developer onboarding followed this pattern: free to integrate, pay only when processing real transactions.

The risk is subsidizing non-converting users. Effective freemium models set free-tier limits low enough to demonstrate value but high enough to allow meaningful evaluation. Rate limiting at the gateway layer enforces these boundaries without application code changes.

Pay-Per-Call#

Pay-per-call charges a fixed price per API request, typically ranging from $0.001 to $0.05 per call depending on the API's value and computational cost. AWS API Gateway charges $3.50 per million requests; Google Maps Platform charges $5 per 1,000 geocoding requests.

This model aligns cost directly with consumption and is perceived as fair by developers. However, it creates revenue unpredictability for the provider and cost anxiety for the consumer. Unpredictable costs are consistently cited as a top concern among API consumers using pay-per-call pricing.

Implementation requires precise request-level metering. Every API call must be counted, attributed to a consumer, and recorded for billing. API gateways with built-in request counting and consumer identification (via API keys or OAuth tokens) provide this metering layer.
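At its core, pay-per-call billing is attribution plus summation; a toy metering aggregation (the consumer name, price, and billable-status policy are all illustrative):

```python
from collections import Counter

PRICE_PER_CALL = 0.002  # illustrative: $2 per 1,000 requests

class Meter:
    """Count billable requests per consumer. Here only 2xx/3xx responses
    are billable; whether failed or cached responses count is a policy
    decision every pay-per-call API must make explicitly."""

    def __init__(self):
        self.counts = Counter()

    def record(self, consumer: str, status: int) -> None:
        if 200 <= status < 400:
            self.counts[consumer] += 1

    def invoice(self, consumer: str) -> float:
        return round(self.counts[consumer] * PRICE_PER_CALL, 2)

meter = Meter()
for status in (200, 200, 404, 200, 500):
    meter.record("acme-corp", status)
print(meter.counts["acme-corp"])   # 3 billable calls
print(meter.invoice("acme-corp"))
```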

Subscription Tiers#

Tiered subscription pricing offers predefined plans (e.g., Starter at $49/month for 50,000 calls, Professional at $199/month for 500,000 calls, Enterprise at custom pricing) with increasing rate limits, feature access, and support levels.

This is the most common API monetization model. Subscription tiers provide predictable revenue for the provider and predictable costs for the consumer.

The challenge is designing tiers that match actual usage patterns. If 80% of customers cluster in the cheapest tier and 5% need custom enterprise plans, the middle tiers generate minimal revenue. Usage analytics from the API gateway layer inform tier design by revealing actual consumption distributions.

Revenue Sharing#

Revenue sharing takes a percentage of the transaction value facilitated by the API. Stripe charges 2.9% + $0.30 per transaction. Shopify takes a revenue share from apps in its marketplace that use its APIs.

This model aligns provider and consumer incentives because the provider earns more when the consumer's business grows. It works best for APIs that facilitate commerce, payments, or marketplace transactions where the transaction value is clearly attributable.

Revenue-sharing APIs tend to generate higher lifetime customer value compared to flat-rate subscription APIs, though they require more complex accounting and settlement infrastructure.

Transaction-Based#

Transaction-based pricing charges per business event rather than per raw API call. A shipping API might charge per label generated, a payment API per successful charge, or a KYC API per identity verification completed.

This approach captures value more accurately than request counting because a single business transaction may involve multiple API calls (initiate, validate, confirm, webhook). Twilio's pricing model exemplifies this: $0.0075 per SMS sent, regardless of how many API calls the integration makes to send that message.

Implementation requires defining what constitutes a billable event and instrumenting the API to track those events separately from raw request counts. Transaction-based pricing tends to achieve higher gross margins than pay-per-call pricing for APIs with multi-step workflows, because the pricing unit better reflects the value delivered.

Building a Monetization Strategy#

Step 1: Identify the Value Unit#

Determine what unit of value consumers derive from your API. Is it a data record retrieved, a transaction processed, a message sent, or a computation performed? The pricing unit should map to this value unit, not to raw infrastructure metrics.

Step 2: Analyze Consumer Segments#

Different consumers extract different value. A startup making 5,000 API calls per month has different willingness to pay than an enterprise making 50 million. Segment by usage volume, use case, and organizational size. APIs with segment-specific pricing consistently outperform one-size-fits-all pricing in revenue generation.

Step 3: Set Pricing with Data#

Start with competitive analysis (what do comparable APIs charge?), then layer in your cost structure (infrastructure cost per request plus margin) and value-based pricing (what is the consumer's willingness to pay based on the value they derive?). The API gateway's usage analytics provide the data foundation for these calculations.

Step 4: Instrument Metering and Billing#

Technical metering must be accurate, real-time, and attributable to individual consumers. Billing integration must translate metered usage into invoices. These systems must handle edge cases: failed requests (do they count?), cached responses (do they count?), and burst traffic (how is it rated?).

Technical Requirements#

Usage Metering#

Every API request must be captured with consumer identity, endpoint, timestamp, response status, and response size. This data feeds both real-time enforcement (rate limiting) and batch processing (billing). Metering must operate at the gateway layer to capture all traffic regardless of backend implementation.

Most organizations implementing API monetization run metering at the API gateway rather than at the application level. Gateway-level metering is preferred because it provides a single, consistent measurement point.

Rate Limiting#

Rate limiting enforces plan boundaries in real time. A consumer on the Starter plan hitting their 50,000 monthly call limit must receive a clear 429 response with headers indicating their remaining quota and reset time. Rate limiting must be distributed (consistent across multiple gateway nodes), accurate (not approximate), and fast (sub-millisecond decision time).

Usage Analytics#

Raw metering data must be aggregated into dashboards showing consumption trends per consumer, per endpoint, and per time period. These analytics inform tier design, identify upsell opportunities (consumers approaching their limit), and detect anomalies (sudden traffic spikes that may indicate abuse or integration errors).

Billing Integration#

Metering data must flow into a billing system (Stripe Billing, Chargebee, Recurly, or custom) that generates invoices, processes payments, and handles dunning (failed payment recovery). The integration between metering and billing must be reliable: undercounting loses revenue; overcounting erodes trust.
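The metering-to-billing translation for a tiered plan with an included quota and overage pricing can be sketched as follows (the plan names and prices are illustrative, echoing the tier examples earlier in this article):

```python
def invoice_line_items(plan: dict, metered_calls: int) -> list:
    """Translate one month's metered usage into invoice line items:
    a flat plan fee plus overage beyond the plan's included calls."""
    items = [{"description": f"{plan['name']} plan", "amount": plan["base_fee"]}]
    overage = max(metered_calls - plan["included_calls"], 0)
    if overage:
        items.append({
            "description": f"Overage: {overage} calls",
            "amount": round(overage * plan["overage_price"], 2),
        })
    return items

starter = {"name": "Starter", "base_fee": 49.00,
           "included_calls": 50_000, "overage_price": 0.001}
for item in invoice_line_items(starter, metered_calls=62_500):
    print(item)
```

Line items like these would then be pushed to the billing provider's API; keeping the translation deterministic makes reconciliation against raw gateway logs straightforward.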

How Apache APISIX Supports API Monetization#

Apache APISIX provides the gateway-layer infrastructure required for API monetization: consumer management, rate limiting, authentication, and logging for metering.

Consumer Management#

APISIX's consumer abstraction represents an API consumer with associated credentials and plugin configurations. Each consumer can have different rate limits, authentication methods, and access policies. This maps directly to monetization tiers: create a consumer group per pricing plan, assign rate limits and quotas per group, and associate individual API keys or OAuth clients with their respective consumer.

Rate Limiting for Plan Enforcement#

The limit-count plugin enforces request quotas per consumer over configurable time windows. A Starter plan consumer can be limited to 50,000 requests per month with a 429 response and X-RateLimit-Remaining headers when the quota is approached. The plugin supports Redis-backed distributed counting, ensuring consistent enforcement across multiple APISIX nodes.
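As a sketch, the Starter-plan quota described above might look like the following in APISIX's declarative configuration format (the consumer group name, consumer, and Redis host are illustrative assumptions, not values from this article):

```yaml
# Hypothetical Starter-plan quota expressed as an APISIX consumer group.
consumer_groups:
  - id: starter-plan
    plugins:
      limit-count:
        count: 50000               # monthly call quota for the Starter plan
        time_window: 2592000       # 30 days, in seconds
        key_type: var
        key: consumer_name         # count independently per consumer
        rejected_code: 429
        show_limit_quota_header: true   # emit X-RateLimit-* headers
        policy: redis              # shared counter across all gateway nodes
        redis_host: redis.internal # placeholder Redis endpoint
        redis_port: 6379
consumers:
  - username: acme_starter         # hypothetical consumer on the plan
    group_id: starter-plan
```

Attaching the quota at the consumer-group level means every consumer assigned to the Starter plan inherits the same limit without per-consumer duplication.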

For more granular control, the limit-req plugin enforces requests-per-second limits to prevent burst abuse, while limit-conn controls concurrent connection counts. These three plugins together provide comprehensive traffic shaping aligned with monetization tiers.

Authentication for Consumer Identification#

Monetization requires identifying which consumer made each request. APISIX supports key-auth, JWT authentication, and OpenID Connect for consumer identification. Each authentication method binds requests to a consumer entity, enabling per-consumer metering and rate limiting.
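A minimal key-auth setup binding a key to a consumer might look like this (the consumer name, key value, route, and upstream are hypothetical placeholders):

```yaml
# Sketch: bind an API key to a consumer, then require key-auth on a route.
consumers:
  - username: acme_starter
    plugins:
      key-auth:
        key: example-key-123       # client sends this in the "apikey" header
routes:
  - uri: /v1/*
    plugins:
      key-auth: {}                 # requests without a valid key are rejected
    upstream:
      type: roundrobin
      nodes:
        "backend.internal:8080": 1
```

Once the key is validated, APISIX resolves the request to the `acme_starter` consumer, so downstream metering and rate limiting can be attributed per consumer.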

Logging for Usage Metering#

APISIX's logging plugins export request-level data to external systems for metering aggregation. The http-logger sends structured logs to a webhook endpoint, kafka-logger streams to Kafka for high-volume processing, and clickhouse-logger writes directly to ClickHouse for analytical queries. Each log entry includes consumer identity, route, timestamp, status code, and latency, providing the raw data for billing calculations.
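A sketch of attaching kafka-logger to a route for metering export (broker address, topic, and upstream are placeholders):

```yaml
# Hypothetical route streaming access logs to Kafka for metering.
routes:
  - uri: /v1/*
    plugins:
      kafka-logger:
        brokers:
          - host: kafka-1.internal # placeholder broker
            port: 9092
        kafka_topic: apisix-access-logs
        batch_max_size: 1000       # buffer entries before each flush
    upstream:
      type: roundrobin
      nodes:
        "backend.internal:8080": 1
```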

A typical monetization pipeline routes APISIX access logs through Kafka into a metering service that aggregates usage per consumer per billing period and feeds the totals into Stripe Billing or a similar platform. Organizations using this architecture typically achieve very high metering accuracy with sub-second log delivery latency.
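The aggregation step of such a pipeline can be sketched in a few lines of Python. The field names and billing policy here are illustrative; a real metering service would consume from Kafka and push totals to the billing API:

```python
from collections import defaultdict

def aggregate_usage(entries, bill_server_errors=False):
    """Sum billable requests per (consumer, billing period).

    Each entry is a dict like:
        {"consumer": "acme", "period": "2024-06", "status": 200}
    Server-side errors (5xx) are excluded from billing by default,
    while client-side errors (4xx) count against the quota.
    """
    totals = defaultdict(int)
    for e in entries:
        if e["status"] >= 500 and not bill_server_errors:
            continue  # provider failure: do not bill
        totals[(e["consumer"], e["period"])] += 1
    return dict(totals)

logs = [
    {"consumer": "acme", "period": "2024-06", "status": 200},
    {"consumer": "acme", "period": "2024-06", "status": 429},
    {"consumer": "acme", "period": "2024-06", "status": 503},
    {"consumer": "beta", "period": "2024-06", "status": 200},
]
print(aggregate_usage(logs))
# {('acme', '2024-06'): 2, ('beta', '2024-06'): 1}
```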

FAQ#

How do I price my API if I have no usage data yet?#

Start with competitive benchmarking: survey 5-10 comparable APIs and note their pricing structures. Launch with a simple freemium model (generous free tier, one paid tier) to collect usage data. After 90 days, analyze consumption patterns to design informed tiers. Most successfully monetized APIs adjust their pricing model within the first year based on actual usage data.

Should I charge for failed API requests?#

Industry practice varies, but the dominant approach is to not charge for server-side errors (5xx) while counting client-side errors (4xx) against quotas. The rationale is that 4xx errors (bad request, unauthorized, rate limited) result from client behavior, while 5xx errors are provider failures. Document your counting policy clearly in your developer portal. Transparent billing policies consistently rank among the most important factors in API provider selection, alongside documentation quality.

What is a reasonable free-tier limit?#

The free tier should allow a developer to build a proof of concept and demonstrate value to their organization without hitting limits during evaluation. For most APIs, this means 1,000-10,000 requests per month. Data-intensive APIs (maps, AI inference) often set lower limits (100-500 per day) due to higher marginal costs. The key metric is trial-to-paid conversion rate: if your free tier converts below 3%, it may be too generous; above 10%, it may be too restrictive.

How do I handle customers who consistently exceed their tier limits?#

Implement a graduated response: send usage alerts at 80% and 95% of the quota, allow a configurable burst buffer (10-20% over limit) with prorated charges, and only hard-block at a defined overage ceiling. Communicate upsell options proactively when consumers approach limits. APISIX's limit-count plugin supports configurable rejection behavior, and the logging pipeline can trigger automated alerts through webhook integrations when consumers cross threshold percentages.
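The graduated policy above can be sketched as a small decision function. The threshold values are illustrative defaults, not APISIX behavior:

```python
def quota_action(used, limit, ceiling_factor=1.5):
    """Map current usage to a graduated enforcement action.

    Alerts fire at 80% and 95% of quota, overage between 100% and the
    ceiling is allowed but billed pro rata, and requests are hard-blocked
    only beyond the overage ceiling (1.5x the quota by default).
    """
    ratio = used / limit
    if ratio >= ceiling_factor:
        return "block"          # hard 429 beyond the overage ceiling
    if ratio >= 1.0:
        return "bill_overage"   # allow, but meter prorated overage charges
    if ratio >= 0.95:
        return "alert_95"       # final warning before the quota is reached
    if ratio >= 0.80:
        return "alert_80"       # early usage alert, suggest an upgrade
    return "ok"
```

For a 50,000-call quota, 40,000 calls triggers the 80% alert, 55,000 is billed as overage, and 80,000 is blocked.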


Apache APISIX and Kong are the two most widely adopted open-source API gateways, both built on NGINX and Lua. APISIX differentiates itself with a fully dynamic architecture powered by etcd, higher single-core throughput, and a broader protocol support matrix, while Kong offers a mature enterprise ecosystem with extensive third-party integrations and a large plugin marketplace.

Overview#

Both projects serve as high-performance, extensible API gateways for microservices architectures. Kong was open-sourced in 2015 and has built a substantial commercial ecosystem around Kong Gateway Enterprise, Kong Konnect, and the Kong Plugin Hub. Apache APISIX entered the Apache Software Foundation incubator in 2019 and graduated as a top-level project in 2020, with rapid community growth.

Both projects are recognized as production-grade gateways and see active production deployments worldwide.

Architecture Comparison#

The architectural differences between APISIX and Kong are fundamental and affect day-to-day operations, scalability, and deployment complexity.

Apache APISIX Architecture#

APISIX uses NGINX as its data plane with Lua plugins running in the request lifecycle. Configuration is stored in etcd, a distributed key-value store that pushes changes to all gateway nodes in real time via watch mechanisms. This architecture means that route changes, plugin updates, and upstream modifications take effect within milliseconds without requiring restarts or reloads. There is no relational database dependency.

The etcd-based design gives APISIX a stateless data plane: any node can be added or removed without migration steps or database schema changes. This makes horizontal scaling straightforward and reduces operational overhead significantly in Kubernetes environments where pods are ephemeral.

Kong Architecture#

Kong also uses NGINX and Lua for its data plane. Configuration is stored in PostgreSQL or Cassandra (though Cassandra support has been deprecated in newer versions). Kong's DB-mode requires database migrations when upgrading, and configuration changes propagate through a polling mechanism with a configurable cache TTL, which introduces a delay between API calls to the Admin API and actual enforcement at the proxy layer.

Kong also offers a DB-less mode where configuration is loaded from a declarative YAML file, which eliminates the database dependency but sacrifices the ability to modify configuration dynamically through the Admin API at runtime. Kong's commercial offering, Konnect, provides a managed control plane that addresses many of these operational concerns.

Performance Benchmarks#

Performance characteristics matter at scale, where even small per-request overhead compounds into significant infrastructure costs.

Key architectural differences that affect performance:

  • Route matching: APISIX uses a radix tree for route lookup, keeping matching time nearly constant as the route table grows. Kong's router takes a different approach (a traversal-based matcher historically, with an expressions-based engine introduced in Kong 3.x). The matching algorithm determines how lookup time scales with the number of routes.
  • Configuration propagation: APISIX pushes configuration changes from etcd to all nodes in real time. Kong's DB-mode polls the database on a configurable interval, introducing a delay between configuration changes and enforcement.
  • Memory model: Both use NGINX's event-driven architecture, but their plugin execution models differ in per-request allocation patterns.

We recommend benchmarking both gateways with your actual workload, plugin chain, and hardware to get meaningful performance comparisons. Vendor-published benchmarks often test under ideal conditions that may not reflect your production environment.

For many production deployments, both gateways provide sufficient throughput, and the choice often depends on factors beyond raw performance such as ecosystem maturity, plugin availability, and operational familiarity.

Feature Comparison#

| Feature | Apache APISIX | Kong (OSS) |
| --- | --- | --- |
| Plugin count (built-in) | 80+ | 40+ (OSS), 200+ (Enterprise) |
| Protocol support | HTTP/1.1, HTTP/2, HTTP/3, gRPC, WebSocket, TCP/UDP, MQTT, Dubbo | HTTP/1.1, HTTP/2, gRPC, WebSocket, TCP/UDP |
| Dashboard | Apache APISIX Dashboard (OSS) | Kong Manager (Enterprise only) |
| Admin API | Full REST API, fully dynamic | REST API, DB-mode or DB-less |
| Service discovery | Nacos, Consul, Eureka, DNS, Kubernetes | DNS, Consul (others via plugins) |
| Kubernetes ingress | APISIX Ingress Controller (CRD-based) | Kong Ingress Controller (KIC) |
| AI gateway capabilities | ai-proxy plugin, multi-LLM routing | AI Gateway plugins (Enterprise) |
| Multi-language plugin support | Go, Java, Python, Wasm, Lua | Go, JavaScript, Python (PDK) |
| Configuration storage | etcd (distributed, real-time) | PostgreSQL (requires migrations) |
| Canary/traffic splitting | Built-in traffic-split plugin | Canary plugin (Enterprise) |

Both gateways support core functionality like rate limiting, authentication (JWT, OAuth 2.0, API key, LDAP), load balancing, health checks, and circuit breaking. The primary differences lie in the breadth of built-in features available in the open-source edition versus features gated behind enterprise licensing.

Plugin Ecosystem#

APISIX ships with over 80 built-in plugins covering authentication, security, traffic management, observability, and protocol transformation. Notably, plugins for serverless functions (running custom Lua, Java, or Go code inline), AI proxy routing, and advanced traffic management are available in the open-source edition.

Kong's open-source edition includes approximately 40 built-in plugins, with a substantial number of additional plugins available through Kong Plugin Hub and the enterprise edition. Kong's plugin marketplace includes many third-party and partner-contributed plugins, giving it a broader ecosystem for specific vendor integrations like Datadog, PagerDuty, and Moesif.

For custom plugin development, APISIX supports external plugins via gRPC-based plugin runners in Go, Java, and Python, as well as Wasm-based plugins that run in a sandboxed environment. Kong offers a Plugin Development Kit (PDK) supporting Go, JavaScript, and Python alongside native Lua plugins. Both projects accept community-contributed plugins, and their ecosystems continue to grow.

Kubernetes Integration#

Both gateways offer mature Kubernetes ingress controllers, though they differ in design philosophy.

The APISIX Ingress Controller supports both custom resource definitions (CRDs) specific to APISIX and standard Kubernetes Ingress resources. It communicates with the APISIX data plane through the Admin API and supports Gateway API, the emerging Kubernetes standard for traffic management. Configuration changes propagate instantly through etcd.

The Kong Ingress Controller (KIC) also supports CRDs and standard Kubernetes Ingress resources, with Kong-specific annotations for extended functionality. KIC translates Kubernetes resources into Kong configuration, applying them through the Admin API. KIC has a longer track record in production Kubernetes environments and benefits from extensive documentation and community resources.

Both controllers are actively maintained and see regular releases aligned with Kubernetes version updates.

Community and Ecosystem#

| Metric | Apache APISIX | Kong |
| --- | --- | --- |
| License | Apache 2.0 | Apache 2.0 (OSS) |
| Governance | Apache Software Foundation | Kong Inc. |
| First release | 2019 | 2015 |

APISIX benefits from Apache Software Foundation governance, which ensures vendor-neutral development and community-driven roadmap decisions. Kong benefits from the backing of Kong Inc., which provides dedicated engineering resources, enterprise support, and a commercial ecosystem that many large organizations value.

Both projects maintain active community forums, Slack channels, and regular release cadences. Kong's longer market presence gives it an advantage in terms of available tutorials, third-party integrations, and consultant familiarity.

When to Choose Apache APISIX#

APISIX is the stronger choice when your requirements include:

  • Dynamic configuration at scale: Environments where routes and plugins change frequently benefit from etcd-based instant propagation without restarts.
  • Maximum open-source functionality: Teams that need advanced features like traffic splitting, AI proxy, and multi-protocol support without enterprise licensing.
  • High-performance requirements: Workloads where per-request latency and single-core throughput directly impact infrastructure costs.
  • Kubernetes-native deployments: Organizations adopting Gateway API and wanting tight integration with cloud-native service discovery (Nacos, Consul, Eureka).
  • Vendor-neutral governance: Teams that prefer Apache Software Foundation stewardship over single-vendor control.

When to Choose Kong#

Kong is the stronger choice when your requirements include:

  • Mature enterprise ecosystem: Organizations that need commercial support, SLA guarantees, and a proven enterprise deployment track record.
  • Extensive third-party integrations: Environments with specific vendor integration needs covered by Kong's plugin marketplace.
  • Existing Kong investment: Teams already running Kong in production where migration cost outweighs technical advantages.
  • Managed control plane: Organizations that prefer a SaaS-managed control plane (Kong Konnect) to reduce operational burden.
  • Broad hiring market: Teams that can more easily find engineers with Kong experience due to its longer market presence.

FAQ#

Can APISIX and Kong run side by side during a migration?#

Yes. Both gateways can operate in parallel by splitting traffic at the load balancer level. A common migration strategy routes new services through APISIX while existing services continue running through Kong. Gradual traffic shifting with health checks ensures zero-downtime migration. The timeline depends on the number of routes, custom plugins, and testing requirements.

Is APISIX harder to operate because it requires etcd?#

etcd adds a dependency compared to Kong's DB-less mode, but in practice, etcd is a well-understood, battle-tested component already present in most Kubernetes clusters (it is the backing store for Kubernetes itself). Operating etcd requires standard distributed systems practices: run an odd number of nodes (3 or 5), monitor disk latency, and maintain regular snapshots. For teams already running Kubernetes, etcd operational knowledge is typically already available. The operational cost of etcd is generally lower than managing PostgreSQL migrations required by Kong's DB-mode.

Which gateway has better AI and LLM support?#

Both gateways are investing in AI gateway capabilities, but they approach it differently. APISIX provides the ai-proxy plugin in its open-source edition, supporting multi-model routing, token-based rate limiting, and prompt transformation for major LLM providers. Kong offers AI Gateway plugins primarily through its enterprise edition and Konnect platform. For teams building AI-powered applications on an open-source budget, APISIX currently provides more built-in AI functionality without licensing costs.

How do the two gateways compare on gRPC and streaming support?#

APISIX provides native gRPC proxying, gRPC-Web transcoding, and HTTP-to-gRPC transformation out of the box, along with support for HTTP/3 (QUIC), Dubbo, and MQTT protocols. Kong supports gRPC proxying and gRPC-Web through plugins, with HTTP/2 support on both client and upstream connections. For teams heavily invested in gRPC or multi-protocol architectures, APISIX's broader built-in protocol support reduces the need for custom plugins or sidecars.


A Kubernetes API gateway is the component that manages external traffic entering a Kubernetes cluster and routes it to the appropriate services. It translates Kubernetes-native resource definitions (Ingress resources or Gateway API resources) into routing rules, handling TLS termination, path-based routing, authentication, and traffic policies at the cluster edge.

What is a Kubernetes API Gateway#

Kubernetes does not include a built-in data plane for external traffic management. The platform defines APIs (Ingress, Gateway API) that describe how traffic should be routed, but the actual implementation is delegated to third-party controllers. These controllers run as pods within the cluster, watch for resource changes, and configure their underlying proxy accordingly.

This design reflects Kubernetes' philosophy of extensibility. With Kubernetes now the dominant container orchestration platform, the choice of API gateway is one of the most consequential infrastructure decisions a platform team faces.

The Kubernetes gateway landscape has evolved significantly. The original Ingress resource, introduced in Kubernetes 1.1 (2015), provided minimal routing capabilities. The newer Gateway API, which reached GA for core features in 2023, offers a far richer model with support for traffic splitting, header-based routing, and role-oriented configuration. Adoption of Gateway API resources in new Kubernetes deployments has grown rapidly since its GA release.

Kubernetes Ingress vs Gateway API#

Ingress Resource#

The Ingress resource is Kubernetes' original API for defining external HTTP routing rules. An Ingress object specifies host-based and path-based routing rules that map incoming requests to backend Services.
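A minimal Ingress manifest illustrating host- and path-based routing (the resource name, host, and Service are placeholders):

```yaml
# Sketch: route api.example.com/orders to the "orders" Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders-ingress
spec:
  ingressClassName: apisix          # which controller handles this resource
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: orders
                port:
                  number: 8080
```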

Ingress is simple but limited. It supports only HTTP and HTTPS traffic, has no native concept of traffic splitting, and lacks a standard way to express advanced routing (header matching, query parameter routing, request mirroring). To work around these limitations, every ingress controller defines its own annotations, creating vendor lock-in and configuration inconsistency.

Despite its limitations, Ingress remains widely deployed. Most Kubernetes clusters still have at least one Ingress resource defined, though many organizations are migrating to Gateway API for new workloads.

Gateway API#

The Gateway API is a collection of Kubernetes custom resources that provide a more expressive and role-oriented model for traffic management. Its core resources are:

  • GatewayClass: Defines a class of gateway implementations (analogous to StorageClass for volumes).
  • Gateway: Declares a gateway instance with listeners for specific protocols and ports.
  • HTTPRoute: Defines HTTP routing rules with support for path matching, header matching, query parameter matching, request mirroring, traffic splitting, and request/response header modification.
  • GRPCRoute, TCPRoute, TLSRoute, UDPRoute: Protocol-specific route types for non-HTTP traffic.

Gateway API's role-oriented design separates infrastructure concerns (managed by platform teams via GatewayClass and Gateway) from application routing (managed by service teams via HTTPRoute). This separation mirrors real organizational structures where platform engineers control the gateway infrastructure and application teams define their own routes.
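As an illustration of the richer routing model, the following HTTPRoute sketch combines header-based canary matching with weighted traffic splitting (the gateway name, services, and weights are hypothetical):

```yaml
# Sketch: send x-canary traffic to v2, split the rest 90/10.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders-route
spec:
  parentRefs:
    - name: edge-gateway            # Gateway owned by the platform team
  rules:
    - matches:                      # canary: explicit opt-in via header
        - path:
            type: PathPrefix
            value: /orders
          headers:
            - name: x-canary
              value: "true"
      backendRefs:
        - name: orders-v2
          port: 8080
    - matches:                      # everyone else: weighted split
        - path:
            type: PathPrefix
            value: /orders
      backendRefs:
        - name: orders-v1
          port: 8080
          weight: 90
        - name: orders-v2
          port: 8080
          weight: 10
```

Expressing the same behavior with Ingress would require controller-specific annotations; here both matching and splitting are first-class fields of the spec.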

Gateway API implementations can often apply configuration changes faster than equivalent annotation-based Ingress configurations, since the structured resource model removes the need for per-controller annotation parsing and interpretation.

Comparison Table#

| Capability | Ingress | Gateway API |
| --- | --- | --- |
| HTTP host/path routing | Yes | Yes |
| Header-based routing | Via annotations (non-standard) | Native |
| Traffic splitting | Via annotations (non-standard) | Native (HTTPRoute weights) |
| Request mirroring | Via annotations (non-standard) | Native |
| gRPC routing | Via annotations (non-standard) | Native (GRPCRoute) |
| TCP/UDP routing | Not supported | Native (TCPRoute, UDPRoute) |
| TLS passthrough | Via annotations (non-standard) | Native (TLSRoute) |
| Role-based ownership | No separation | GatewayClass/Gateway vs Route |
| Cross-namespace routing | Not supported | Native (ReferenceGrant) |
| Request header modification | Via annotations (non-standard) | Native |
| Status reporting | Basic | Detailed per-route conditions |
| API maturity | Stable (v1, limited scope) | Core features GA, extended features beta |

What is an Ingress Controller#

An ingress controller is a Kubernetes controller that watches Ingress (and optionally Gateway API) resources and configures a reverse proxy to implement the defined routing rules. The controller runs as a Deployment or DaemonSet within the cluster and typically exposes itself via a LoadBalancer or NodePort Service.

Every ingress controller uses a different underlying proxy technology. APISIX Ingress Controller uses Apache APISIX. NGINX Ingress Controller uses NGINX. Traefik and Kong act as both the controller and the proxy. The choice of controller determines the available features, performance characteristics, and operational model.

The ingress controller market has consolidated around several primary options: NGINX Ingress Controller (legacy standard), Apache APISIX Ingress Controller (feature-rich, high performance), Traefik (developer-friendly, auto-discovery), and Kong Ingress Controller (API management focus).

Choosing an Ingress Controller#

Apache APISIX Ingress Controller#

APISIX Ingress Controller pairs a Kubernetes-native control plane with the high-performance Apache APISIX data plane. It supports both Ingress resources and Gateway API, allowing gradual migration. Key differentiators include a rich plugin ecosystem (80+ plugins), dynamic configuration without restarts, and sub-millisecond routing latency.

APISIX is built on NGINX and LuaJIT, delivering throughput exceeding 20,000 requests per second per core in benchmarks. Its plugin architecture means that authentication, rate limiting, request transformation, and observability can be configured through Kubernetes custom resources without modifying application code.

NGINX Ingress Controller#

The NGINX Ingress Controller is the most widely deployed option. It is stable and well-documented but relies heavily on annotations for advanced configuration, which creates verbose and hard-to-maintain manifests as complexity grows.

Traefik#

Traefik provides automatic service discovery and integrates with multiple orchestrators beyond Kubernetes. Its middleware system offers a plugin-like model for cross-cutting concerns. Traefik is popular for smaller deployments and developer environments. Its Go-based architecture makes it lightweight but limits per-core throughput compared to NGINX-based controllers.

Kong Ingress Controller#

Kong pairs its API gateway with a Kubernetes controller and offers a path to Kong's commercial API management platform. It provides a plugin ecosystem comparable to APISIX's but uses a PostgreSQL or Cassandra database for configuration storage, adding operational complexity compared to APISIX's etcd-backed approach.

How Apache APISIX Works as a Kubernetes API Gateway#

The APISIX Ingress Controller deploys Apache APISIX as the data plane and a Kubernetes controller as the control plane within the cluster.

Architecture#

The control plane watches Kubernetes resources (Ingress, Gateway API, and APISIX custom resources) and translates them into APISIX routing configurations via the Admin API. The data plane (APISIX instances) handles actual traffic processing. This separation allows the data plane to scale independently based on traffic volume.

A typical production deployment runs 2-3 APISIX data plane replicas behind a cloud load balancer, with a single controller replica (plus a standby) managing configuration. The data plane stores active configuration in shared memory, enabling sub-millisecond routing decisions without external lookups per request.

Gateway API Support#

APISIX Ingress Controller implements the Gateway API specification, supporting GatewayClass, Gateway, and HTTPRoute resources. Platform teams define GatewayClass and Gateway resources that configure the APISIX data plane. Application teams create HTTPRoute resources that define routing rules for their services.

This role-based model aligns with enterprise organizational structures and helps reduce misconfigurations compared to annotation-based Ingress resources.

Custom Resources#

Beyond standard Kubernetes APIs, APISIX Ingress Controller provides custom resources (ApisixRoute, ApisixUpstream, ApisixPluginConfig) that expose the full power of APISIX's plugin ecosystem. These CRDs allow Kubernetes-native configuration of features like JWT authentication, rate limiting, request transformation, and traffic mirroring without resorting to annotations.

Plugin Configuration#

APISIX's 80+ plugins can be configured through Kubernetes custom resources. For example, enabling JWT authentication on a route requires adding a plugin reference to the ApisixRoute resource. The controller translates this into APISIX plugin configuration automatically. Plugin configurations can be shared across routes using ApisixPluginConfig resources, reducing duplication.
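For example, a hypothetical ApisixRoute combining JWT authentication with a rate limit might look like this (names, hosts, paths, and limit values are placeholders):

```yaml
# Sketch: protect /orders with jwt-auth and a per-minute rate limit.
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: orders-route
spec:
  http:
    - name: orders
      match:
        hosts:
          - api.example.com
        paths:
          - /orders/*
      backends:
        - serviceName: orders
          servicePort: 8080
      plugins:
        - name: jwt-auth
          enable: true
        - name: limit-count
          enable: true
          config:
            count: 100              # 100 requests...
            time_window: 60         # ...per 60 seconds
            rejected_code: 429
```

The controller translates this resource into APISIX route and plugin configuration; no annotations or application changes are involved.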

Deployment Patterns#

Single Cluster Gateway#

The simplest pattern deploys APISIX as the sole ingress point for a single Kubernetes cluster. All external traffic enters through APISIX, which handles TLS termination, routing, authentication, and rate limiting before forwarding requests to cluster services. This pattern suits organizations with a single production cluster handling moderate traffic volumes.

Multi-Cluster with Shared Gateway#

For organizations running multiple Kubernetes clusters (multi-region, staging/production, or domain-separated), a shared APISIX deployment can route traffic across clusters. APISIX's upstream configuration supports endpoints outside the local cluster, enabling cross-cluster routing. Many organizations now operate multiple production Kubernetes clusters, making cross-cluster traffic management a common requirement.

Gateway Per Namespace#

Large organizations with multiple teams sharing a cluster may deploy separate APISIX instances per namespace or per team. Each team manages its own gateway configuration through Gateway API resources scoped to their namespace. ReferenceGrant resources control cross-namespace access. This pattern provides strong isolation between teams while sharing cluster infrastructure.
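A sketch of a ReferenceGrant allowing routes in one team's namespace to reference Services in another (both namespaces are placeholders):

```yaml
# Sketch: team-b explicitly permits team-a's HTTPRoutes to target its Services.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-team-a-routes
  namespace: team-b                 # namespace that owns the target Services
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: team-a             # routes here may cross the boundary
  to:
    - group: ""                     # core API group
      kind: Service
```

Because the grant lives in the target namespace, the owning team retains control over who may route to its Services.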

Sidecar Gateway#

For latency-sensitive workloads, APISIX can be deployed as a sidecar alongside the application pod. This eliminates the network hop to a centralized gateway but increases resource consumption and operational complexity. This pattern is uncommon and typically reserved for specialized use cases where every millisecond of latency matters.

FAQ#

Should I use Ingress or Gateway API for new Kubernetes deployments?#

Use Gateway API for new deployments. Gateway API provides a richer feature set, role-based ownership, and native support for traffic splitting, header matching, and multi-protocol routing. Ingress will continue to work but receives no new features. The Kubernetes SIG-Network has stated that Gateway API is the future of Kubernetes traffic management. APISIX Ingress Controller supports both, so you can migrate incrementally.

How does APISIX Ingress Controller compare to the NGINX Ingress Controller?#

APISIX offers dynamic configuration without reloads, a richer plugin ecosystem (80+ plugins vs annotation-based configuration), native support for Gateway API, and higher throughput per core. NGINX Ingress Controller has broader community adoption and more third-party documentation. If your requirements include advanced authentication, rate limiting, or request transformation, APISIX provides these as native plugins rather than custom annotations.

Can I run multiple ingress controllers in the same Kubernetes cluster?#

Yes. Kubernetes supports multiple ingress controllers differentiated by IngressClass (for Ingress resources) or GatewayClass (for Gateway API resources). A common pattern runs APISIX for external-facing APIs requiring authentication and rate limiting, and a lightweight controller like Traefik for internal developer tools. Each Ingress or HTTPRoute resource specifies which controller should handle it.

What is the resource overhead of running APISIX in Kubernetes?#

A production APISIX data plane replica typically requests 500m CPU and 256Mi memory, handling 10,000-20,000 requests per second depending on plugin configuration. The controller replica requests 200m CPU and 128Mi memory. For most clusters, two data plane replicas and one controller replica provide sufficient capacity and redundancy. These resource requirements are comparable to other Kubernetes ingress controllers and negligible relative to the application workloads they protect.


An AI gateway is a specialized API gateway that manages traffic between applications and large language models (LLMs), enforcing token-based rate limiting, model routing, cost controls, and content safety policies. As AI agents adopt the Model Context Protocol (MCP) to interact with external tools and data sources, AI gateways become essential infrastructure for securing, observing, and scaling these interactions in production environments.

What is an AI Gateway#

An AI gateway sits between your applications and AI model providers (OpenAI, Anthropic, Google, open-source models), routing requests, enforcing policies, and providing observability across all AI interactions. Unlike traditional API gateways that focus on REST and gRPC traffic patterns, AI gateways understand LLM-specific concerns: token consumption, prompt structure, model-specific rate limits, and response streaming.

The market for AI infrastructure is expanding rapidly, with enterprise adoption of generative AI APIs and models accelerating across industries. This growth creates urgent demand for infrastructure that manages AI traffic with the same rigor applied to traditional API traffic. For more on AI gateway capabilities, see the APISIX AI Gateway overview.

The Rise of AI Agents and LLM Traffic#

AI agents represent a shift from simple prompt-response interactions to autonomous, multi-step workflows where LLMs invoke tools, query databases, browse the web, and orchestrate complex tasks. Unlike a single chatbot API call, an agent workflow may generate dozens of LLM invocations, tool calls, and data retrievals to complete a single user request.

Much of the economic value from generative AI will flow through agentic AI systems that operate autonomously on behalf of users and organizations. Developer adoption of AI agent frameworks has accelerated rapidly, as reflected in growing open-source activity and ecosystem investment.

This growth in agentic AI creates a traffic management challenge. A single agent interaction might produce 10-50 API calls across multiple model providers and tool servers. Without gateway-level management, organizations face unpredictable costs, security blind spots, and no centralized observability over AI operations.

What is MCP (Model Context Protocol)#

The Model Context Protocol (MCP) is an open standard introduced by Anthropic that defines how AI assistants connect to external tools, data sources, and services. MCP provides a standardized interface that replaces the fragmented, vendor-specific tool integration patterns that emerged as AI agents proliferated.

Before MCP, every AI application needed custom integration code for each tool and data source. An agent that needed to query a database, search documents, and call an API required three separate integration implementations, each with its own authentication, error handling, and data formatting logic. MCP standardizes this interaction into a single protocol that any AI assistant can use with any MCP-compatible server.

The protocol draws inspiration from the Language Server Protocol (LSP), which standardized how code editors communicate with language-specific tooling. Just as LSP eliminated the need for every editor to implement every language's features independently, MCP aims to eliminate the need for every AI application to implement every tool integration independently. Since its release, MCP adoption has grown significantly, with a large number of community-built MCP servers available and major AI platforms including support for the protocol.

MCP Architecture#

MCP follows a client-server architecture with clear separation of concerns across four components.

Host#

The host is the AI application that initiates interactions. It could be a desktop AI assistant, an IDE with AI capabilities, a chatbot platform, or any application that leverages LLMs. The host creates and manages MCP client instances and controls which servers the AI model can access, enforcing security boundaries.

Client#

The MCP client is a protocol handler embedded within the host application. Each client maintains a one-to-one connection with a single MCP server. The client handles protocol negotiation, capability discovery, and message routing between the host and the server.

Server#

MCP servers expose tools, resources, and prompts to AI clients through a standardized interface. A server might wrap a database, a file system, a web API, a code repository, or any other data source or capability. Servers declare their capabilities during an initialization handshake, allowing clients to discover available tools dynamically.
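The standardized interface a server exposes can be made concrete with a sketch of a `tools/list` response. The JSON-RPC envelope and the `inputSchema` field follow the MCP specification; the `query_orders` tool itself is a hypothetical example:

```python
import json

# Minimal sketch of what an MCP server returns for a "tools/list" request.
# The envelope shape follows the MCP specification; the tool is invented
# here for illustration.
tool = {
    "name": "query_orders",
    "description": "Look up orders for a customer by ID.",
    "inputSchema": {  # standard JSON Schema describing the tool's arguments
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [tool]},
}

print(json.dumps(response, indent=2))
```

Because every server describes its tools in this uniform shape, a client can discover and invoke capabilities without server-specific integration code.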

Transport#

MCP supports multiple transport mechanisms. The stdio transport communicates through standard input/output streams, suitable for local server processes. The Streamable HTTP transport (which supersedes the earlier SSE-based transport) uses HTTP for remote server communication, enabling servers to run as network services accessible across infrastructure boundaries. In production environments, the HTTP-based transport is widely preferred for its flexibility in distributed deployments.
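Regardless of transport, a session begins with an initialization handshake exchanged as JSON-RPC messages. The field names below follow the MCP specification; the protocol version string and client name are illustrative, so use whatever your SDK negotiates:

```python
# Sketch of the MCP initialization handshake. Over stdio these messages are
# written to the server process's stdin; over Streamable HTTP they are sent
# as HTTP request bodies. Version and clientInfo values are placeholders.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # illustrative spec revision
        "capabilities": {},               # client-side capabilities on offer
        "clientInfo": {"name": "example-host", "version": "0.1.0"},
    },
}

# The server answers with its own capabilities, which is how the client
# learns whether tools, resources, or prompts are available.
initialize_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"tools": {}},
        "serverInfo": {"name": "example-server", "version": "0.1.0"},
    },
}

print(initialize_response["result"]["capabilities"])
```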

Why AI Traffic Needs Gateway Management#

AI traffic introduces challenges that traditional API management was not designed to handle.

Security#

AI agents with tool access can potentially reach sensitive systems. Without centralized policy enforcement, an agent might access production databases, execute privileged operations, or leak sensitive data through prompts sent to third-party model providers. Data leakage is widely cited as a primary security concern among organizations deploying AI agents.

Rate Limiting#

LLM providers impose rate limits measured in tokens per minute and requests per minute, and these limits differ by model, tier, and provider. An AI gateway tracks token consumption across all applications and enforces limits locally, rejecting excess traffic before it reaches upstream providers and triggers cascading failures.

Cost Control#

LLM API costs scale with token consumption, and agentic workflows can generate substantial token volumes. A single complex agent task might consume 100,000 tokens across multiple model calls. Without gateway-level cost tracking and budget enforcement, organizations frequently discover unexpected AI spending.
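Gateway-side budget enforcement reduces to multiplying observed token counts by per-model rates. The sketch below uses made-up prices and model names; real deployments would load the provider's published rate card:

```python
# Sketch of gateway-side cost accounting. Prices and model names are
# illustrative placeholders, not real provider rates.
PRICE_PER_1K_TOKENS = {
    # (input, output) USD per 1,000 tokens -- hypothetical values
    "model-large": (0.01, 0.03),
    "model-small": (0.0005, 0.0015),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICE_PER_1K_TOKENS[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

# A 100,000-token agent task split 80/20 between input and output:
cost = estimate_cost("model-large", 80_000, 20_000)
print(f"${cost:.2f}")  # $1.40
```

Accumulating these estimates per consumer and comparing against a configured budget is what lets the gateway refuse traffic before the bill arrives.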

Observability#

Debugging agentic AI workflows requires end-to-end visibility across model calls, tool invocations, and data retrievals. Traditional logging captures individual HTTP requests but misses the logical flow of an agent's reasoning chain. AI gateways correlate related requests into coherent traces, making it possible to understand why an agent made specific decisions.

Multi-Provider Routing#

Organizations increasingly use multiple model providers to optimize for cost, latency, and capability. An AI gateway routes requests to the appropriate provider based on model availability, cost thresholds, latency requirements, and task complexity, functioning as an intelligent load balancer for AI traffic.
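One way to picture that routing decision is a policy that picks the cheapest healthy provider within a latency budget. The providers and numbers below are hypothetical:

```python
# Sketch of policy-based provider routing: choose the cheapest provider
# that is healthy and inside the caller's latency budget.
providers = [
    {"name": "provider-a", "healthy": True,  "p95_latency_ms": 900, "cost_per_1k": 0.010},
    {"name": "provider-b", "healthy": True,  "p95_latency_ms": 400, "cost_per_1k": 0.015},
    {"name": "provider-c", "healthy": False, "p95_latency_ms": 300, "cost_per_1k": 0.008},
]

def route(max_latency_ms: int) -> str:
    candidates = [
        p for p in providers
        if p["healthy"] and p["p95_latency_ms"] <= max_latency_ms
    ]
    if not candidates:
        raise RuntimeError("no provider satisfies the policy")
    # Among acceptable providers, prefer the lowest cost per 1k tokens.
    return min(candidates, key=lambda p: p["cost_per_1k"])["name"]

print(route(max_latency_ms=1000))  # provider-a: cheapest healthy option
print(route(max_latency_ms=500))   # provider-b: a is too slow, c is down
```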

Key AI Gateway Features#

Modern AI gateways provide capabilities specifically designed for LLM and agent traffic patterns.

LLM load balancing distributes requests across multiple model endpoints, providers, or self-hosted instances. This includes weighted routing, failover, and least-latency selection. Organizations running self-hosted models alongside commercial APIs use load balancing to optimize cost and performance simultaneously.

Token-based rate limiting tracks consumption in tokens rather than simple request counts. Since a single LLM request can consume anywhere from 100 to 100,000 tokens depending on context length, request-based rate limiting is insufficient. Token-aware rate limiting provides accurate cost and capacity management.

Prompt caching stores responses for repeated or similar prompts, reducing latency and cost for common queries. Semantic caching extends this by matching prompts based on meaning rather than exact text. Effective prompt caching strategies can meaningfully reduce both latency and cost for common queries.

Model fallback automatically redirects traffic to alternative models when a primary provider experiences outages, rate limit exhaustion, or elevated latency. Fallback chains can be configured with degradation policies (for example, falling back from GPT-4 to GPT-3.5 with a user notification).
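The fallback chain itself is a loop over providers that treats transport errors as a signal to degrade. The `call_model` function below is a stand-in for a real provider client, with the failure hard-coded for illustration:

```python
# Sketch of a model fallback chain: try models in order, degrading to the
# next one when a provider errors out.
def call_model(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise TimeoutError("provider outage")  # simulated failure
    return f"[{model}] response to: {prompt}"

def complete_with_fallback(prompt: str, chain: list[str]) -> tuple[str, str]:
    last_error: Exception | None = None
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc  # log, then degrade to the next model in the chain
    raise RuntimeError("all models in the fallback chain failed") from last_error

model, answer = complete_with_fallback("hello", ["primary-model", "backup-model"])
print(model)  # backup-model
```

A gateway-level implementation would additionally surface which model actually answered, so applications can attach the user notification mentioned above.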

Content moderation inspects prompts and responses for policy violations, sensitive data, prompt injection attempts, and harmful content. Gateway-level moderation ensures consistent enforcement regardless of which application or agent generates the traffic.

How APISIX Supports AI Workloads#

Apache APISIX provides AI gateway capabilities through its plugin architecture, enabling organizations to manage LLM traffic alongside traditional API traffic within a single gateway infrastructure.

The ai-proxy plugin provides a unified interface for routing requests to multiple LLM providers including OpenAI, Anthropic, Azure OpenAI, and self-hosted models. It handles provider-specific authentication, request format translation, and response normalization, allowing applications to switch between providers without code changes.
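As an illustration, a route payload for the Admin API attaching ai-proxy might be assembled like this. The field names and values here are approximate, based on the general shape of APISIX plugin configuration; consult the ai-proxy plugin documentation for the exact schema of your APISIX version:

```python
import json

# Illustrative Admin API payload attaching the ai-proxy plugin to a route.
# Keys and values are approximate, not a verified schema -- check the
# official ai-proxy plugin docs before using.
route = {
    "uri": "/v1/chat/completions",
    "plugins": {
        "ai-proxy": {
            "provider": "openai",  # target LLM provider
            "auth": {"header": {"Authorization": "Bearer ${OPENAI_API_KEY}"}},
            "options": {"model": "gpt-4o"},
        }
    },
}
print(json.dumps(route, indent=2))
```

The point of the shape, regardless of exact key names, is that provider credentials and model selection live in gateway configuration rather than in application code, which is what makes provider switching a config change.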

APISIX supports token-based rate limiting through its rate limiting plugins configured with token consumption metrics, enabling organizations to enforce per-consumer and per-route token budgets. Combined with the logging and metrics plugins, this provides complete visibility into AI spending across all applications and teams.

For MCP-to-HTTP bridging, APISIX can proxy traffic between MCP clients using Streamable HTTP transport and backend MCP servers, applying the same authentication, rate limiting, and observability policies that govern traditional API traffic. This enables organizations to expose MCP servers through a managed gateway layer rather than allowing direct network access from AI agents to tool servers.

APISIX's dynamic configuration through etcd is particularly valuable for AI workloads where model endpoints, rate limits, and routing rules change frequently as new models are released, pricing changes, and usage patterns evolve. Configuration changes take effect in milliseconds without gateway restarts, enabling rapid response to provider outages or cost threshold breaches.

Future of AI Infrastructure#

The convergence of AI gateways and traditional API gateways is accelerating. As AI capabilities become embedded in every application, the distinction between "AI traffic" and "regular API traffic" will blur. Gateways that manage both traffic types within a unified policy framework will have a significant advantage over point solutions.

MCP adoption is likely to grow as more AI platforms and tool providers implement the protocol, creating demand for infrastructure that can manage MCP traffic at enterprise scale. As the protocol evolves toward more sophisticated transport mechanisms, authentication models, and capability negotiation, gateways will need to keep pace to handle that traffic securely.

As worldwide spending on AI infrastructure continues to grow rapidly, a meaningful portion will flow through AI gateway infrastructure that provides the security, observability, and cost management enterprises require before deploying AI agents in production environments.

FAQ#

What is the difference between an AI gateway and a traditional API gateway?#

An AI gateway extends traditional API gateway capabilities with LLM-specific features: token-based rate limiting, prompt inspection, model routing, cost tracking, and response streaming support. A traditional API gateway manages REST and gRPC traffic with request-based rate limiting, authentication, and load balancing. Modern platforms like Apache APISIX blur this distinction by supporting both traditional and AI-specific traffic management within a single gateway, eliminating the need for separate infrastructure.

How does MCP relate to function calling and tool use in LLMs?#

Function calling (also called tool use) is the LLM capability to generate structured outputs that invoke external functions. MCP standardizes the infrastructure layer that connects these function calls to actual tool implementations. Where function calling defines what the model wants to do, MCP defines how the request reaches the tool server and how results return to the model. MCP is complementary to function calling, not a replacement.
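The complementarity is easiest to see in the data: a function-calling definition and an MCP tool descriptor carry the same schema, just at different layers. The tool below is hypothetical; the field names follow the OpenAI-style function-calling shape and the MCP tool shape respectively:

```python
# A function-calling definition (the model-facing layer)...
function_def = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {  # JSON Schema for the arguments the model may emit
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# ...maps directly onto an MCP tool descriptor (the infrastructure layer),
# which uses "inputSchema" for the same JSON Schema.
mcp_tool = {
    "name": function_def["name"],
    "description": function_def["description"],
    "inputSchema": function_def["parameters"],
}
print(mcp_tool["inputSchema"]["required"])
```

The model decides *what* to call using the first shape; MCP standardizes *how* that call reaches a server exposing the second.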

Can I use an AI gateway without adopting MCP?#

Yes. AI gateways manage all types of AI traffic, including direct LLM API calls that do not use MCP. Most organizations start with basic LLM proxy and rate limiting features before adopting MCP for tool integration. The gateway provides value regardless of whether your AI applications use MCP, custom tool integrations, or simple prompt-response patterns.


An open-source API gateway sits between clients and backend services, handling routing, authentication, rate limiting, and observability. Apache APISIX, Kong, Envoy, and Traefik are among the most widely adopted options, each with distinct architectural decisions that affect performance, extensibility, and operational complexity.

Why the Choice of API Gateway Matters#

Organizations running microservices at scale route millions of requests per day through their gateway layer. The gateway you choose determines your latency floor, plugin flexibility, and how much operational overhead your platform team absorbs.

Choosing poorly means rearchitecting under pressure. Choosing well means a gateway that scales with your traffic for years without becoming a bottleneck.

Feature Comparison Table#

| Feature | Apache APISIX | Kong | Envoy | Traefik |
|---|---|---|---|---|
| Language | Lua (NGINX + LuaJIT) | Lua (NGINX + LuaJIT) | C++ | Go |
| Configuration Store | etcd | PostgreSQL / Cassandra | xDS API (control plane) | File / KV stores |
| Admin API | RESTful, fully dynamic | RESTful | xDS gRPC | REST + dashboard |
| Hot Reload | Yes, sub-millisecond | Partial (DB polling) | Yes (xDS push) | Yes (provider watch) |
| Plugin Count | 100+ built-in | 60+ bundled (more in Hub) | ~30 HTTP filters | ~30 middlewares |
| Plugin Languages | Lua, Java, Go, Python, Wasm | Lua, Go (PDK) | C++, Wasm | Go (middleware) |
| gRPC Proxying | Native | Supported | Native | Supported |
| HTTP/3 (QUIC) | Supported | Experimental | Supported | Supported |
| Dashboard | Built-in (APISIX Dashboard) | Kong Manager (Enterprise) | None (third-party) | Built-in |
| License | Apache 2.0 | Apache 2.0 (OSS) / Proprietary (Enterprise) | Apache 2.0 | MIT |

Note: Feature details are based on each project's official documentation as of early 2026. Check the respective project sites for the latest status.

Detailed Breakdown#

Apache APISIX#

Apache APISIX is built on NGINX and LuaJIT, using etcd as its configuration store. This architecture eliminates database dependencies on the data path: route changes propagate to every gateway node within milliseconds without restarts or reloads.

The plugin ecosystem includes over 100 built-in options spanning authentication (JWT, key-auth, OpenID Connect), traffic management (rate limiting, circuit breaking), observability (Prometheus, Zipkin, OpenTelemetry), and transformation (request/response rewriting, gRPC transcoding). Developers can write custom plugins in Lua, Go, Java, Python, or WebAssembly, making it one of the most polyglot gateway runtimes available.

APISIX supports the Kubernetes Ingress Controller pattern natively. The APISIX Ingress Controller watches Kubernetes resources and translates them into APISIX routing configuration, enabling declarative GitOps workflows while preserving the full plugin surface.

As an Apache Software Foundation top-level project, APISIX is community-governed and vendor-neutral.

Kong#

Kong is the longest-established open-source API gateway, with a mature commercial ecosystem. It shares the NGINX + LuaJIT foundation with APISIX but relies on PostgreSQL or Cassandra as its configuration store. This architectural choice introduces a database dependency for configuration storage, which adds operational complexity for HA deployments.

Kong's plugin hub offers approximately 60 bundled plugins in the open-source edition, with additional enterprise-only plugins for advanced features like OAuth2 introspection and advanced rate limiting. The Go Plugin Development Kit (PDK) allows extending Kong in Go, though Lua remains the primary plugin language.

Kong has a strong enterprise support ecosystem with commercial offerings (Kong Gateway Enterprise, Kong Konnect) and a large user community.

Envoy#

Envoy is a high-performance C++ proxy originally built at Lyft, now a CNCF graduated project. It excels as a service mesh data plane and is the foundation for Istio, AWS App Mesh, and other mesh implementations.

Envoy's configuration model uses the xDS (discovery service) API, a gRPC-based protocol that pushes configuration updates from a control plane. This design is powerful but means Envoy does not function as a standalone gateway without a control plane component. Organizations adopting Envoy as an edge gateway typically pair it with a control plane like Gloo Edge or similar tools.

The filter chain model supports around 30 built-in HTTP filters. Custom extensions require C++ or WebAssembly, raising the barrier for teams without C++ expertise. Envoy is most commonly deployed as a sidecar proxy within a service mesh, though it is also used as an edge proxy.

Traefik#

Traefik is written in Go and designed for automatic service discovery. It integrates natively with Docker, Kubernetes, Consul, and other orchestrators, automatically detecting new services and generating routes without manual configuration. This auto-discovery model makes Traefik popular for development environments and smaller-scale production deployments.

Traefik includes built-in Let's Encrypt integration for automatic TLS certificate provisioning, a feature that requires additional tooling in other gateways. Its middleware system offers approximately 30 built-in options covering authentication, rate limiting, headers manipulation, and circuit breaking.

Traefik has a large community and is widely used in Docker-native environments.

Performance Considerations#

Performance varies significantly based on configuration, plugin chains, TLS termination, and upstream complexity. When evaluating gateways, run your own benchmarks with your actual workload patterns rather than relying on vendor-published numbers.

Key factors that affect gateway performance:

  • Architecture: C++ and LuaJIT-based gateways (Envoy, APISIX, Kong) generally achieve lower latency than pure Go implementations
  • Configuration store: Gateways that avoid database queries on the data path (APISIX, Envoy) tend to have more consistent latency
  • Plugin overhead: Each active plugin adds processing time. Test with your actual plugin chain enabled
  • Connection handling: The NGINX event-driven model (APISIX, Kong) handles high concurrency efficiently

We recommend benchmarking the specific gateways you are considering with a representative workload on hardware similar to your production environment.
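A benchmark harness for this does not need to be elaborate. The sketch below measures p50/p99 latency around a stubbed `send_request`; swap the stub for a real HTTP call through each gateway candidate against your own endpoint:

```python
import statistics
import time

# Minimal latency-benchmark sketch for comparing gateway candidates.
def send_request() -> None:
    time.sleep(0.001)  # stub standing in for a real request/response round trip

def benchmark(n: int = 200) -> dict[str, float]:
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        samples.append((time.perf_counter() - start) * 1000)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(n * 0.99) - 1],  # tail latency matters most
    }

results = benchmark()
print(f"p50={results['p50_ms']:.2f}ms p99={results['p99_ms']:.2f}ms")
```

Run it with your real plugin chain enabled and with warm connections, since cold-start and TLS handshake costs can dominate short runs.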

When to Choose Which#

Choose Apache APISIX when you need a large built-in plugin ecosystem, fully dynamic configuration without restarts, multi-language plugin support, and no database dependency. It suits teams building platform-grade API infrastructure. See the getting started guide to evaluate it hands-on.

Choose Kong when you are operating in an enterprise environment with existing Kong deployments, need commercial support, or require specific enterprise-only plugins. Kong's maturity means more third-party integrations and consultants are available.

Choose Envoy when your primary use case is a service mesh data plane, you need advanced load balancing algorithms, or you are already running Istio or a similar mesh. Envoy is less suited as a standalone edge gateway due to its control plane dependency.

Choose Traefik when auto-discovery and zero-configuration routing are priorities, or you need built-in Let's Encrypt integration without additional tooling. Traefik excels in Docker-native and small-to-medium Kubernetes environments.

Migration Considerations#

Migrating between gateways is nontrivial and typically requires careful planning. Key factors include:

  • Plugin compatibility: Not all plugins have equivalents across gateways. Audit your active plugins and identify gaps before migrating.
  • Configuration translation: Each gateway uses a different configuration format. Automated translation tools can help but manual verification is essential.
  • Operational tooling: Monitoring dashboards, CI/CD pipelines, and alerting rules need updating.
  • Canary approach: Running both gateways in parallel behind a load balancer and shifting traffic gradually is the safest migration strategy.
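The canary approach above amounts to weighted selection between the two gateways, with the weight raised as route-by-route equivalence is validated. A deterministic sketch (seeded for reproducibility; gateway names are placeholders):

```python
import random

# Sketch of weighted canary routing during a gateway migration: a fixed
# fraction of traffic goes to the new gateway.
def pick_gateway(new_weight: float, rng: random.Random) -> str:
    return "new-gateway" if rng.random() < new_weight else "old-gateway"

rng = random.Random(42)  # seeded so the illustration is reproducible
counts = {"old-gateway": 0, "new-gateway": 0}
for _ in range(10_000):
    counts[pick_gateway(new_weight=0.10, rng=rng)] += 1

print(counts)  # roughly 10% of requests hit the new gateway
```

In practice the weighting lives in the load balancer or DNS layer rather than in application code, but the rollout logic is the same: shift the weight in steps, watch error rates, and roll back by dropping it to zero.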

Frequently Asked Questions#

Is Apache APISIX production-ready for enterprise workloads?#

Yes. Apache APISIX is an Apache Software Foundation top-level project used in production by organizations worldwide. The etcd-backed architecture provides high availability without single points of failure when deployed with an etcd cluster.

Can I migrate from Kong to APISIX without downtime?#

A zero-downtime migration is achievable using a canary deployment approach: run both gateways in parallel behind a load balancer, gradually shifting traffic from Kong to APISIX as you validate route-by-route equivalence. APISIX supports most Kong plugin equivalents natively, and the Admin API allows automated route provisioning during migration.

How do open-source API gateways compare to cloud-managed options like AWS API Gateway?#

Cloud-managed gateways trade control for convenience. They handle infrastructure operations but impose vendor lock-in, per-request pricing that grows with traffic volume, and limited plugin customization. Open-source gateways like APISIX provide full control over the data plane, support multi-cloud and hybrid deployments, and eliminate per-request platform fees.

Which gateway has the best Kubernetes support?#

All four gateways support Kubernetes, but the depth varies. APISIX and Kong offer dedicated ingress controllers with CRD-based configuration. Envoy integrates through the Kubernetes Gateway API and service mesh deployments. Traefik auto-discovers Kubernetes services natively. The emerging Kubernetes Gateway API standard is supported by all four projects to varying degrees, and is becoming the recommended approach for new deployments.