API Gateway Architecture Done Right
API gateways collect responsibilities the way junk drawers collect batteries. A small transformation added “just this once” for a specific client integration. Business logic for a special case that would be “easier to handle at the edge.” Data aggregation because the mobile client needed it and a proper BFF wasn’t ready yet. You know exactly how this story ends.
Six months later, the gateway has business rules only two engineers understand. One of them just left. There’s aggregation logic that can’t scale on its own. Deployment coupling so tight that any API response change needs a gateway release. It’s the front desk receptionist who started out sorting mail, then took on fixing the printer and everyone’s accounting, and is now the bottleneck for the entire building.
The opposite failure is just as common. Gateway configured as a thin proxy that forwards everything to a single backend. No cross-cutting value. Just a network hop and a deployment dependency for nothing. A front desk that waves everyone through without checking badges.
- Six responsibilities belong in the gateway: TLS termination, token validation, rate limiting, request routing, request ID injection, and access logging. Everything else is a liability.
- BFF (backend-for-frontend) is the correct aggregation layer. Putting aggregation in the gateway is the #1 gateway anti-pattern, and the one with the longest recovery time.
- Rate limit hit rate above 5% means limits are miscalibrated. Below 1% means they’re working. Alert on sudden jumps, not the absolute number.
- Gateway observability slashes mean-time-to-diagnosis. The gateway sees 100% of inbound traffic. No other component has that view.
- Old API versions are attack surface. Build sunset enforcement into routing with 90-day deprecation windows and automatic Sunset headers.
The Gateway’s Actual Job
A narrow responsibility set, strictly kept. The front desk checks badges, directs visitors, and logs who enters. That’s it. Application security at the edge depends on this boundary staying clean.
| Belongs in the gateway | Does NOT belong in the gateway |
|---|---|
| TLS termination | Business logic |
| Token validation (authentication) | Authorization (needs domain context) |
| Rate limiting (global + per-client) | Data aggregation from multiple services |
| Request routing by path/header | Response transformation for specific clients |
| Request ID / trace ID injection | Database queries or caching logic |
| Access logging and metrics | Retry logic for specific service failures |
Complete list. If someone on your team is tempted to add more, the answer is no. Every extra responsibility is a step toward the edge monolith. The moment the receptionist starts doing accounting, the lobby is unattended.
The distinction between authentication (does this token represent a valid user?) and authorization (can this user access this resource?) is where most teams blur the line first. Authentication belongs at the gateway. Authorization needs domain context that only the backend service has. The front desk checks your badge is real. Only the department head decides whether you’re allowed in the meeting.
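To make the split concrete, here is a minimal sketch in Go of where each check lives, assuming a plain net/http gateway and backend; verifyToken and loadDocument are hypothetical stand-ins, not any particular library. The gateway confirms the badge is real and forwards the identity; the service decides access, because ownership context lives in its domain.

```go
// Sketch only: the gateway asks "is this token valid?", the backend asks
// "may this user touch this resource?". verifyToken and loadDocument are
// hypothetical stand-ins for real token verification and data access.
package sketch

import "net/http"

type Claims struct{ Subject string }
type Document struct{ OwnerID string }

func verifyToken(authHeader string) (Claims, error) { return Claims{}, nil } // signature + expiry only
func loadDocument(id string) (Document, error)      { return Document{}, nil }

// Gateway side: authentication, nothing more.
func authn(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		claims, err := verifyToken(r.Header.Get("Authorization"))
		if err != nil {
			http.Error(w, "invalid token", http.StatusUnauthorized)
			return
		}
		r.Header.Set("X-User-ID", claims.Subject) // forward identity; decide nothing
		next.ServeHTTP(w, r)
	})
}

// Backend side: authorization, because ownership lives here.
// Registered against a pattern like "GET /documents/{id}" (Go 1.22+ routing).
func getDocument(w http.ResponseWriter, r *http.Request) {
	doc, err := loadDocument(r.PathValue("id"))
	if err != nil {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}
	if doc.OwnerID != r.Header.Get("X-User-ID") {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	// ... serialize and return the document
}
```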
The BFF Pattern: Aggregation Done Right
Different clients need different data shapes. Mobile needs compact responses with minimal payloads. Web needs rich data for complex UI rendering. Partners need stable contracts that don’t break when internal services change. One front desk, three different tours.
A dedicated BFF per client type aggregates and transforms independently. The mobile team updates their BFF without coordinating with web or gateway teams. If gateway changes need cross-team coordination, the gateway has swallowed too much. Microservice architectures were supposed to fix that.
| When BFF makes sense | When it’s overkill |
|---|---|
| Mobile and web need very different response shapes | All clients consume the same API shape |
| Client teams deploy on independent schedules | One team owns all clients and the backend |
| Partner integrations need stable, versioned contracts | Internal traffic only, no external consumers |
| Aggregation spans 3+ backend services per request | Each client calls one backend service directly |
The BFF adds 10-30ms of latency. Worth it when it kills over-fetching and cuts client-side processing. Not worth it when a single backend already returns exactly what the client needs. Don’t build a personal assistant for someone who only asks for directions.
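For a sense of what a mobile BFF endpoint looks like in practice, here is a sketch in Go with plain net/http; the internal service URLs and field names are placeholders, and auth forwarding, timeouts, and concurrency are elided. One mobile request fans out to several backends and comes back as one compact payload.

```go
// Sketch of a mobile BFF handler: fan out to backends, return only the
// compact shape the mobile client needs. Service URLs are placeholders.
package bff

import (
	"encoding/json"
	"net/http"
)

func fetchJSON(url string, out any) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return json.NewDecoder(resp.Body).Decode(out)
}

func mobileHome(w http.ResponseWriter, r *http.Request) {
	var user struct {
		Name string `json:"name"`
	}
	var unread struct {
		Count int `json:"count"`
	}
	var orders struct {
		Items []struct {
			ID     string `json:"id"`
			Status string `json:"status"`
		} `json:"items"`
	}

	// Three internal calls the mobile client never has to make itself.
	if err := fetchJSON("http://users.internal/v1/me", &user); err != nil {
		http.Error(w, "upstream error", http.StatusBadGateway)
		return
	}
	_ = fetchJSON("http://notifications.internal/v1/unread", &unread) // error handling elided after the first call
	_ = fetchJSON("http://orders.internal/v1/recent?limit=3", &orders)

	// One small response, shaped for the mobile home screen.
	json.NewEncoder(w).Encode(map[string]any{
		"userName":     user.Name,
		"unreadCount":  unread.Count,
		"recentOrders": orders.Items,
	})
}
```

The web BFF would hit the same backends but return richer fields, and the two deploy on their own schedules without touching the gateway.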
Rate Limiting Design
Rate limit by IP for unauthenticated traffic (100 req/min), by API key for authenticated traffic (1,000-10,000 depending on tier). Return 429 with Retry-After, not 503. The status code difference matters because well-behaved clients retry on 429 with backoff but treat 503 as a service failure. The difference between “please wait” and “something’s broken.”
```yaml
# Gateway rate limiting config
rate_limiting:
  unauthenticated:
    limit: 100
    window: 60s
    key: client_ip
    response: 429  # with Retry-After header
  authenticated:
    tiers:
      free:      { limit: 1000,  window: 60s, key: api_key }
      pro:       { limit: 5000,  window: 60s, key: api_key }
      unlimited: { limit: 10000, window: 60s, key: api_key }
```
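The enforcement side can be small. Here is a sketch in Go using golang.org/x/time/rate for a per-key token bucket; it assumes a single gateway instance, so a real deployment would back the counters with a shared store such as Redis to keep limits consistent across replicas.

```go
// Minimal per-key rate limiting sketch. In-memory only: fine for one
// instance, not for a fleet.
package gateway

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

func limiterFor(key string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	if l, ok := limiters[key]; ok {
		return l
	}
	l := rate.NewLimiter(rate.Limit(100.0/60.0), 100) // ~100 req/min, matching the unauthenticated tier above
	limiters[key] = l
	return l
}

func rateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("X-API-Key")
		if key == "" {
			key = r.RemoteAddr // unauthenticated traffic falls back to client IP
		}
		if !limiterFor(key).Allow() {
			w.Header().Set("Retry-After", "60") // "please wait", not "something's broken"
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```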
Below 1% of requests rate limited: limits are set correctly. Above 5%: they’re too tight, or traffic patterns have shifted. Alert on sudden jumps. Connect to your observability stack so rate limit events line up with latency spikes. A surge in 429s that doesn’t match a traffic spike means your limits are wrong, not your users.
| Request Type | Rate Limit Tier | Limit | What Happens on Breach |
|---|---|---|---|
| Unauthenticated | IP-based | 100 requests/minute per IP | 429 Too Many Requests. Retry-After header with backoff |
| Free tier authenticated | API key or token-based | 1,000 requests/minute | 429 with usage dashboard link. Upgrade CTA |
| Paid tier authenticated | Token-based with plan lookup | 5,000-10,000 requests/minute (plan-dependent) | 429 with current usage. Soft notification at 80% threshold |
| Internal service | Service identity | No hard limit. Circuit breaker at anomaly detection | Alert on-call. No 429 (internal services shouldn’t be rate-limited, they should be debugged) |
Don’t: Apply the same rate limits to internal service-to-service traffic and external client traffic. Internal traffic has completely different patterns, security posture, and latency needs. Running both through the same gateway config means either throttling internal calls for no reason or under-protecting external ones. Putting the delivery entrance and the customer entrance through the same revolving door.
Do: Separate gateway configs for internal and external traffic. Internal gateways handle service mesh routing and mTLS validation. External gateways handle rate limiting, WAF, and client authentication.
Gateway Observability
The gateway sees 100% of inbound traffic. The security camera at the front door. Five metrics matter: request rate by endpoint (alert on 2x jump from baseline), error rate by status code (separate 4xx client errors from 5xx server errors), latency at P50/P95/P99, upstream health from backend health check failures, and rate limit hit rate as a percentage of total traffic.
Inject trace IDs at the gateway. One identifier connects the client request through every backend hop to the exact failure point. Structured access logs (path, method, status, latency, client ID, trace ID) resolve most “your API is broken” tickets in minutes instead of hours. Teams with gateway dashboards resolve incidents faster because the gateway is the one point with complete traffic visibility. API integration engineering starts with this visibility layer.
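A sketch of that injection point in Go, standard library only; the X-Request-ID header name is a common convention rather than any gateway’s default, and capturing the response status would need a small ResponseWriter wrapper that is omitted here for brevity.

```go
// Sketch: inject a request/trace ID at the edge and emit a structured
// access log for every request. Status capture omitted (needs a wrapper).
package gateway

import (
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"net/http"
	"time"
)

func newRequestID() string {
	b := make([]byte, 8)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

func accessLog(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID() // injected at the edge so every backend hop shares one ID
		}
		r.Header.Set("X-Request-ID", id)
		w.Header().Set("X-Request-ID", id) // echoed back so clients can quote it in tickets

		start := time.Now()
		next.ServeHTTP(w, r)

		slog.Info("access",
			"method", r.Method,
			"path", r.URL.Path,
			"client", r.RemoteAddr,
			"trace_id", id,
			"duration_ms", time.Since(start).Milliseconds(),
		)
	})
}
```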
Building a gateway observability dashboard
The minimum viable dashboard has four panels: request rate by top-10 endpoints (line chart, 5-minute windows), error rate by status code family (stacked area, 4xx and 5xx separated), P50/P95/P99 latency (line chart, alert when P99 crosses 500ms), and rate limit events by tier. Add a fifth panel showing upstream service response times so you can tell gateway latency from backend latency during incidents. Wire alerts to the on-call rotation for 5xx rate above 1% and P99 above 500ms. Most incident triage starts and ends at this dashboard.
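Behind those panels sit two or three metrics emitted at the gateway. Here is a sketch using prometheus/client_golang; the metric and label names are illustrative assumptions, not a standard.

```go
// Sketch of gateway-side instrumentation feeding the dashboard panels.
package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requests = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "gateway_requests_total", Help: "Requests by endpoint and status family."},
		[]string{"endpoint", "status_family"}, // feeds the request-rate and error-rate panels
	)
	latency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "gateway_request_duration_seconds",
			Help:    "Request latency by endpoint.",
			Buckets: prometheus.DefBuckets, // feeds the P50/P95/P99 panel
		},
		[]string{"endpoint"},
	)
)

type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

func instrument(endpoint string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		family := strconv.Itoa(rec.status/100) + "xx" // 4xx and 5xx separated, as on the dashboard
		requests.WithLabelValues(endpoint, family).Inc()
		latency.WithLabelValues(endpoint).Observe(time.Since(start).Seconds())
	})
}

func main() {
	prometheus.MustRegister(requests, latency)
	http.Handle("/metrics", promhttp.Handler()) // scraped by the dashboard's data source
	// ... wrap each route with instrument(...) and serve
}
```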
Version Routing and API Sunset
Route /v1/users to the v1 service instance, /v2/users to v2. The gateway separates public API versions from internal service versions, which means v1 and v2 can be completely different implementations behind the same hostname. Inject Sunset and Deprecation headers automatically on deprecated versions so clients get advance notice in every response.
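A sketch of that routing plus automatic sunset signaling in Go, using the standard library’s reverse proxy; the upstream hostnames and dates are placeholders. The Sunset header follows RFC 8594, and the exact Deprecation value format depends on which draft or RFC your clients follow.

```go
// Sketch: version routing with automatic deprecation headers on the old path.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func proxyTo(raw string) http.Handler {
	target, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	mux := http.NewServeMux()

	// v2 is current: route straight through.
	mux.Handle("/v2/users/", proxyTo("http://users-v2.internal"))

	// v1 is deprecated: same routing, plus sunset headers on every response.
	v1 := proxyTo("http://users-v1.internal")
	mux.Handle("/v1/users/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Deprecation", "true")
		w.Header().Set("Sunset", "Sat, 01 Nov 2025 00:00:00 GMT") // end of the 90-day window
		w.Header().Set("Link", `</v2/users>; rel="successor-version"`)
		v1.ServeHTTP(w, r)
	}))

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```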
Old versions are attack surface. That 2019 endpoint “nobody uses” still gets 47 requests per day from an undocumented integration. Ghosts in the building. Leaving it running means maintaining security patches for code that should have been torn down years ago.
Security at the Edge
Four controls, layered. WAF blocks OWASP top 10 attacks at the edge with 1-3ms per request overhead. The bouncer who knows what trouble looks like. mTLS between gateway and backends gives you zero-trust internal communication. Automate certificate rotation from day one with short-lived certificates, because manual rotation is the rotation that doesn’t happen. (It never happens.)
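A sketch of the gateway’s client side of that mTLS hop, standard library only; the file paths are placeholders, and in practice the certificates would be short-lived and reloaded automatically rather than read once at startup.

```go
// Sketch: the gateway presents its own cert to backends and only trusts
// backends signed by the internal CA. Paths are placeholders.
package gateway

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
	"os"
)

func mtlsTransport() (*http.Transport, error) {
	// Gateway's own cert + key, presented to backend services.
	cert, err := tls.LoadX509KeyPair("/etc/gateway/tls/client.crt", "/etc/gateway/tls/client.key")
	if err != nil {
		return nil, err
	}
	// Internal CA that signs backend certificates.
	caPEM, err := os.ReadFile("/etc/gateway/tls/internal-ca.pem")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	return &http.Transport{
		TLSClientConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
			RootCAs:      pool,
			MinVersion:   tls.VersionTLS13,
		},
	}, nil
}
```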
JWT validation at the gateway handles authentication. Authorization stays at the backend where domain logic lives. Don’t push authorization into the gateway. It needs context (ownership, roles, resource state) that only the service has. The gateway confirms you work here. Only the service decides whether you belong in the room.
Bot detection analyzes header ordering, TLS fingerprints, and timing patterns. A cloud-native architecture processing heavy request volumes will find a surprising share of traffic is bots, and that share grows with API popularity. Your most loyal “users” might not be human.
What the Industry Gets Wrong About API Gateways
“The gateway should handle aggregation.” The single most common gateway anti-pattern, and the one with the longest recovery time. Once aggregation logic lives in the gateway, extracting it into BFFs means rewriting every affected endpoint while keeping backward compatibility with existing clients. Removing load-bearing walls from a building that’s already occupied. Start with BFFs. There is no shortcut that doesn’t become technical debt.
“One gateway config fits all traffic.” Internal service-to-service traffic and external client traffic have completely different security, rate limiting, and versioning needs. Separate configs for internal and external traffic is the correct architecture. The employee entrance and the customer entrance exist for a reason.
"GraphQL federation replaces the gateway." Federation handles query composition across subgraphs. It does not handle rate limiting, authentication, request tracing, or version lifecycle. The federation layer sits behind the gateway, not instead of it. The conference room doesn’t replace the front desk.
That “just this once” transformation from six months ago? Rip it out. Clear contract: route, authenticate, rate-limit. Everything else belongs somewhere else. The front desk checks badges and points visitors to the right floor. Nothing more.