Microservice Communication Patterns: REST, gRPC, Events
It starts with 3 services talking over REST. Everyone agrees REST is the fastest way to get moving. Fast forward to service 15, and an inventory lookup is taking 800ms under load. The order service calls inventory synchronously. The checkout page calls order synchronously. Users stare at a 2.4-second spinner for what used to be a 200ms page load.
Walking to someone’s desk and waiting for an answer. They’re on the phone. You wait. They need to check with someone else. You wait longer. Three desks deep and you’ve been standing for 2.4 seconds.
The gRPC framework will eventually replace some of these internal calls, but by then the SRE team is already deep into a late night, tracing a cascading failure that started with a 97th-percentile database query three hops away from the service that's actually timing out.
If you’ve been in that room, you already know where this is going.
- Synchronous REST between 15+ services creates a distributed monolith. All the coupling, none of the transactional guarantees. One slow service cascades through the entire call chain.
- Async messaging breaks the coupling but introduces eventual consistency. Not every operation tolerates “your order will appear shortly.”
- Circuit breakers without timeout tuning are ornamental. A 30-second default timeout means 30 seconds of requests piling up before the breaker trips. Set timeouts based on P99 latency, not defaults.
- gRPC cuts payload size hard and adds strong typing, but introduces complexity in load balancing (HTTP/2 connection reuse) and debugging (binary protocol).
- The decision is not sync vs async. It's which operations need which pattern. Reads: sync. Fire-and-forget: async. Commands with confirmation: sync call that triggers async processing (sketched below).
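That last pattern deserves a concrete shape. Here is a minimal plain-Java sketch, with hypothetical names (OrderService, placeOrder, fulfill) and an in-process queue standing in for a real broker: the synchronous call validates and returns an ID immediately, while the slow work happens on a background consumer.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: synchronous command, asynchronous processing.
public class OrderService {
    private final BlockingQueue<String> fulfillmentQueue = new LinkedBlockingQueue<>();

    public OrderService() {
        // In production this consumer would read from Kafka/SQS, not a thread.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    fulfill(fulfillmentQueue.take()); // slow work, off the request path
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Sync command: the caller gets an immediate acknowledgment with an ID,
    // not the final result.
    public String placeOrder(String payload) {
        validate(payload);                 // fail fast, synchronously
        String orderId = UUID.randomUUID().toString();
        fulfillmentQueue.add(orderId);     // processing continues asynchronously
        return orderId;
    }

    private void validate(String payload) { /* reject bad input here */ }
    private void fulfill(String orderId)  { /* downstream calls happen here */ }
}
```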
Communication pattern determines failure blast radius, consistency model, and team coupling. Make these microservice architecture decisions before they calcify.
The Synchronous Coupling Problem
Five services at 99.9% availability each. Chained synchronously, end-to-end availability drops to 99.5%. P99 of 200ms per hop means 1,000ms of network latency before your business logic even runs. One slow dependency exhausts the caller’s thread pool, which cascades upstream through every service in the chain. The chain is only as fast as the slowest link.
The arithmetic is unforgiving. Add a sixth service and availability drops to 99.4%. Add retries without budgets and a single slow leaf service receives 81 requests from one originating user request (3 attempts per layer across 4 layers: 3^4). A partial slowdown becomes a complete outage. Site reliability engineering formalizes the circuit breakers that prevent propagation, but the structural problem is the synchronous chain itself.
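A few lines of plain Java make the arithmetic checkable:

```java
// Back-of-envelope math for synchronous chains: availability compounds
// multiplicatively, retry amplification compounds exponentially.
public class ChainMath {
    public static void main(String[] args) {
        double perService = 0.999; // 99.9% availability per service
        for (int n = 5; n <= 6; n++) {
            // End-to-end availability of n services chained synchronously
            System.out.printf("%d services: %.2f%%%n", n, 100 * Math.pow(perService, n));
        }
        int attemptsPerLayer = 3; // no retry budget: 3 attempts at every layer
        int layers = 4;
        System.out.println("Requests hitting the leaf service: "
                + (int) Math.pow(attemptsPerLayer, layers)); // 81
    }
}
```

It prints 99.50% for five services, 99.40% for six, and 81 requests at the leaf.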
Choosing Between Sync and Async
Producer publishes, moves on. Notification service down? Order still completes. Message waits in the queue. Drop a memo in the inbox. Go back to your desk. They read it when they’re free. The trade-off: eventual consistency. For most cross-domain events, eventual consistency is fine. For payment confirmation before order fulfillment, synchronous is still the right call. Some questions need an answer before you can leave the desk.
| Pattern | Use When | Latency | Coupling | Failure Handling |
|---|---|---|---|---|
| Sync REST | Caller needs an answer now (reads, confirmations) | Low (direct) | High (caller blocks) | Circuit breaker + timeout |
| Sync gRPC | High-frequency internal calls, strong typing needed | Very low (binary, HTTP/2) | High | Same, plus load balancer awareness |
| Async events (Kafka) | Other domains should react, no confirmation needed | Variable (eventual) | Low (temporal decoupling) | DLQ + retry + idempotency |
| Async commands (SQS) | Fire-and-forget tasks, work queues | Variable | Low | Visibility timeout + DLQ |
| Request-reply (async) | Caller needs answer but can wait, no blocking | Medium | Medium | Correlation ID + timeout |
Design for ordering and idempotency from day one. Kafka guarantees ordering within a partition only. Partition keys solve ordering for entity-scoped events, but bolting them on after 50 event types are already live is a multi-sprint migration. Platform engineering codifies these defaults before they become afterthoughts.
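A minimal producer sketch, assuming a broker at localhost:9092 and a hypothetical order-events topic. Keying each record by the entity ID pins all of an order's events to one partition, which is exactly where Kafka's ordering guarantee applies; enabling producer idempotence deduplicates retried sends (consumer-side idempotency still needs its own handling):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Producer-side idempotence: the broker deduplicates retried sends
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "order-42"; // hypothetical entity ID used as partition key
            // All events keyed by this order ID land on the same partition,
            // so they are consumed in the order they were produced.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("order-events", orderId, "{\"status\":\"CONFIRMED\"}");
            producer.send(record);
        }
    }
}
```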
| Question | If Yes | If No |
|---|---|---|
| Does the caller need a response before it can proceed? | Synchronous required (continue below) | Async messaging (Kafka, SQS, RabbitMQ) |
| Is this a high-frequency internal call (>1,000 RPS)? | gRPC (binary protobuf, HTTP/2 multiplexing) | Continue below |
| Must consistency be immediate (e.g. payment confirmation)? | Synchronous REST or gRPC (based on volume) | REST (external APIs, low-frequency calls) |
gRPC for Internal High-Frequency Calls
Protobuf payloads are much smaller than JSON equivalents. HTTP/2 multiplexing means dozens of concurrent RPCs over a single connection. Generated stubs eliminate runtime type mismatches. The intercom system. Faster than walking. Both parties on the line at once. The setup cost: proto compilation, versioning, and schema evolution discipline. Never reuse field numbers after deprecation.
For external APIs: REST. Browser clients expect JSON, and debugging binary protocols requires specialized tooling. For internal calls above 1,000 RPS: gRPC pays back in weeks through reduced bandwidth and stronger contracts. Distributed systems at scale benefit from sustainable proto management.
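A client-side sketch with grpc-java. The channel setup is the real io.grpc API; the stub and message types are left as comments because protoc would generate them from a hypothetical inventory.proto, the same file where deprecated field numbers get marked reserved so they can never be reused:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

public class InventoryClient {
    public static void main(String[] args) throws InterruptedException {
        // One long-lived HTTP/2 channel; all RPCs multiplex over it,
        // which is why gRPC load balancing needs connection-level awareness.
        ManagedChannel channel = ManagedChannelBuilder
            .forAddress("inventory.internal", 50051) // assumed internal host/port
            .usePlaintext()                          // TLS in production
            .build();

        // InventoryServiceGrpc and StockRequest are hypothetical classes
        // that protoc would generate from inventory.proto:
        // var stub = InventoryServiceGrpc.newBlockingStub(channel)
        //         .withDeadlineAfter(200, TimeUnit.MILLISECONDS); // deadline ~ 2x P99
        // var reply = stub.checkStock(StockRequest.newBuilder().setSku("ABC-1").build());

        channel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
    }
}
```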
| Dimension | REST / JSON | gRPC / Protobuf |
|---|---|---|
| Payload format | JSON text. Human-readable, larger wire size | Binary protobuf. Compact, 3-10x smaller than JSON |
| Connection model | HTTP/1.1: one request per connection (HTTP/2 helps but not universal) | HTTP/2 multiplexing: many RPCs per connection |
| Type safety | Runtime validation. Silent mismatches possible between client and server | Compile-time contracts. Generated stubs catch mismatches at build |
| Tooling barrier | Low. curl, Postman, browser devtools | Higher. grpcurl, proto management, codegen pipeline |
| Best for | External APIs, browser clients, <1,000 RPS | Internal service calls, high-frequency, >1,000 RPS |
Don’t: Use gRPC for every internal call because “it’s faster.” A service called 10 times per minute gains nothing from binary serialization. The debugging overhead (no curl, no browser dev tools, binary wire format) outweighs the tiny latency savings.
Do: Reserve gRPC for high-frequency internal paths above 1,000 RPS or where strong typing across team boundaries prevents integration bugs. Use REST everywhere else.
Circuit Breakers and Retry Budgets
A circuit breaker monitors error rates and opens when failures exceed a threshold (typically 50% over a 10-second window). Open state means calls fail fast. No network requests. No thread pool exhaustion. The elevator that stops accepting passengers when the lobby is full. Better than everyone piling in and getting stuck. After a cooldown (30 seconds is a reasonable default), the breaker enters half-open state and allows probe requests to test recovery.
The resilience4j library implements these patterns in Java, and the resilience patterns guide covers tuning in depth. But the breaker is only half the defense. Retry budgets are the other half: 3 retries per layer maximum, exponential backoff starting at 100ms, plus jitter to prevent thundering herds. Without budgets, a 4-service chain amplifies 1 request into 81. One memo becomes 81 copies in the mail room. (The mail room is on fire.)
- Circuit breaker library with configurable thresholds per downstream dependency
- Retry budget capped at 3 attempts with exponential backoff and jitter
- Timeout set to 2x P99 latency of the downstream service, not the library default
- Dead letter queue set up for async paths with alerting on DLQ depth
- Distributed tracing active across all service-to-service calls
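A minimal sketch of the first three checklist items using resilience4j; the "inventory" name, the thresholds, and the downstream call are placeholders to tune per dependency:

```java
import java.time.Duration;
import java.util.function.Supplier;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

public class InventoryCallPolicy {
    public static void main(String[] args) {
        // Breaker: open at 50% failures over a 10-second time window,
        // probe again (half-open) after a 30-second cooldown.
        CircuitBreakerConfig cbConfig = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.TIME_BASED)
            .slidingWindowSize(10)
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .build();
        CircuitBreaker breaker = CircuitBreaker.of("inventory", cbConfig);

        // Retry budget: 3 attempts total, exponential backoff from 100ms with jitter.
        RetryConfig retryConfig = RetryConfig.custom()
            .maxAttempts(3)
            .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(100))
            .build();
        Retry retry = Retry.of("inventory", retryConfig);

        // Placeholder downstream call; its timeout should be ~2x the
        // downstream P99, not the HTTP client's default.
        Supplier<String> call = () -> "stock: 7";

        // Retry wraps the breaker, so every attempt feeds the breaker's failure rate.
        Supplier<String> guarded = Retry.decorateSupplier(retry,
            CircuitBreaker.decorateSupplier(breaker, call));

        System.out.println(guarded.get());
    }
}
```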
Sagas for Distributed Transactions
Distributed transactions across microservices don’t have the luxury of ACID guarantees. The saga pattern breaks a multi-service operation into local transactions, each with a compensating action for rollback. If step 3 of a 5-step saga fails, steps 2 and 1 run their compensating transactions in reverse order. A multi-department process. Shipping fails? Undo the inventory hold and refund the payment. In reverse. Praying the refund department isn’t at lunch.
Orchestration wins beyond 3-4 steps. A central coordinator manages the sequence, tracks state, and triggers compensation. Choreography (each service reacts to events from the previous one) works for simple two-step flows but becomes impossible to reason about when six services are chained. Nobody can see the full picture.
Two hard failure modes that code alone can’t solve: the compensating action itself fails (refund API is down when you need to reverse a charge), and partial success where compensation is impossible (product already shipped before the payment bounced). Both need runbooks and human escalation paths, not just retry loops.
Choreography vs Orchestration: When Each Pattern Fits
Choreography works when the saga has 2-3 steps, each service owns its compensation logic, and the flow is linear. Payment reserved, then inventory reserved. If inventory fails, payment service listens for the failure event and reverses itself. Simple. Decoupled. Two desks. Each knows what to do.
Beyond 3 steps, choreography becomes a distributed state machine that nobody can draw on a whiteboard. Each service must know which events to react to and which compensating events to emit. Six departments each reacting to memos from the previous department. Nobody sees the whole process.

Orchestration centralizes the coordination. The saga coordinator knows the full sequence, tracks progress in a persistent store, and triggers compensation in the correct reverse order. More coupling to the coordinator, but much easier to debug and extend. A project manager who owns the process. More overhead. But someone can actually explain what's happening.
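A minimal in-memory sketch of the orchestration idea: steps run in order, and on failure the completed steps' compensations run in reverse. The step names are hypothetical, and a production coordinator would persist saga state between steps and page a human when a compensation itself fails:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class SagaOrchestrator {
    // Each step pairs a local transaction with its compensating action.
    record Step(String name, Runnable action, Runnable compensation) {}

    static void run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // remember for potential rollback
            } catch (RuntimeException failure) {
                // Compensate completed steps in reverse order.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                throw failure;
            }
        }
    }

    public static void main(String[] args) {
        try {
            run(List.of(
                new Step("payment", () -> System.out.println("charge card"),
                                    () -> System.out.println("refund payment")),
                new Step("inventory", () -> System.out.println("reserve stock"),
                                      () -> System.out.println("release hold")),
                new Step("shipping", () -> { throw new RuntimeException("carrier API down"); },
                                     () -> {})
            ));
        } catch (RuntimeException e) {
            System.out.println("saga failed, compensated: " + e.getMessage());
        }
    }
}
```

Running it prints the two forward steps, then "release hold" and "refund payment" in reverse order before reporting the failure.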
What the Industry Gets Wrong About Microservice Communication
“Start with REST, switch to async later.” Switching from sync to async after 20 services are wired together means renegotiating contracts with every caller. The interface shape is baked into retry logic, error handling, and deployment pipelines across a dozen codebases. Making the right pattern choice early costs a few days of design discussion. Making it late costs quarters of migration. Ripping out the phone system after everyone’s memorized the extensions.
“gRPC is always faster than REST.” gRPC is faster for payload serialization. Binary protobuf is much more compact than JSON. But when the bottleneck is the database query behind the API, a service that takes 400ms to query PostgreSQL returns in 400ms regardless of the transport protocol. Faster intercom doesn’t help when the person on the other end needs 400ms to look up the answer. Measure where latency actually lives before changing protocols.
That 800ms inventory lookup cascading through three synchronous hops? With async events carrying cross-domain state changes and circuit breakers capping the remaining sync calls, the same traffic spike degrades gracefully instead of toppling the checkout page. The services are still talking. They just stopped standing in line. Memos in inboxes. Intercoms for urgent questions. The building runs because people stopped blocking each other.