Microservice Communication Patterns: REST, gRPC, Events
It starts with 3 services talking over REST. Everyone agrees REST is the fastest way to get moving. Fast forward to service 15, and an inventory lookup is taking 800ms under load. The order service calls inventory synchronously. The checkout page calls order synchronously. Users stare at a 2.4-second spinner for what used to be a 200ms page load.
Walking to someone’s desk and waiting for an answer. They’re on the phone. You wait. They need to check with someone else. You wait longer. Three desks deep and you’ve been standing for 2.4 seconds.
The gRPC framework will eventually replace some of these internal calls, but by then the SRE team is already deep into a late night, tracing a cascading failure that started with a 97th-percentile database query three hops away from the service that's actually timing out.
If you’ve been in that room, you already know where this is going.
- Synchronous REST between 15+ services creates a distributed monolith. All the coupling, none of the transactional guarantees. One slow service cascades through the entire call chain.
- Async messaging breaks the coupling but introduces eventual consistency. Not every operation tolerates “your order will appear shortly.”
- Circuit breakers without timeout tuning are ornamental. A 30-second default timeout means 30 seconds of requests piling up before the breaker trips. Set timeouts based on P99 latency, not defaults.
- gRPC cuts payload size hard and adds strong typing, but introduces complexity in load balancing (HTTP/2 connection reuse) and debugging (binary protocol).
- The decision is not sync vs async. It's which operations need which pattern. Reads: sync. Fire-and-forget: async. Commands with confirmation: sync call that triggers async processing (sketched below).
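That last pattern deserves a concrete shape. Here is a minimal plain-Java sketch, with hypothetical names (OrderService, placeOrder, fulfill) and an in-process queue standing in for a real broker: the synchronous call validates and returns an ID immediately, while the slow work happens on a background consumer.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: synchronous command, asynchronous processing.
public class OrderService {
    private final BlockingQueue<String> fulfillmentQueue = new LinkedBlockingQueue<>();

    public OrderService() {
        // In production this consumer would read from Kafka/SQS, not a thread.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    fulfill(fulfillmentQueue.take()); // slow work, off the request path
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Sync command: the caller gets an immediate acknowledgment with an ID,
    // not the final result.
    public String placeOrder(String payload) {
        validate(payload);                 // fail fast, synchronously
        String orderId = UUID.randomUUID().toString();
        fulfillmentQueue.add(orderId);     // processing continues asynchronously
        return orderId;
    }

    private void validate(String payload) { /* reject bad input here */ }
    private void fulfill(String orderId)  { /* downstream calls happen here */ }
}
```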
Communication pattern determines failure blast radius, consistency model, and team coupling. Make these microservice architecture decisions before they calcify.
The Synchronous Coupling Problem
Five services at 99.9% availability each. Chained synchronously, end-to-end availability drops to 99.5%. P99 of 200ms per hop means 1,000ms of network latency before your business logic even runs. One slow dependency exhausts the caller’s thread pool, which cascades upstream through every service in the chain. The chain is only as fast as the slowest link.
The arithmetic is unforgiving. Add a sixth service and availability drops to 99.4%. Add retries without budgets and a single slow leaf service receives 81 requests from one originating user request (3 attempts per layer across 4 layers: 3^4). A partial slowdown becomes a complete outage. Site reliability engineering formalizes the circuit breakers that prevent propagation, but the structural problem is the synchronous chain itself.
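A few lines of plain Java make the arithmetic checkable:

```java
// Back-of-envelope math for synchronous chains: availability compounds
// multiplicatively, retry amplification compounds exponentially.
public class ChainMath {
    public static void main(String[] args) {
        double perService = 0.999; // 99.9% availability per service
        for (int n = 5; n <= 6; n++) {
            // End-to-end availability of n services chained synchronously
            System.out.printf("%d services: %.2f%%%n", n, 100 * Math.pow(perService, n));
        }
        int attemptsPerLayer = 3; // no retry budget: 3 attempts at every layer
        int layers = 4;
        System.out.println("Requests hitting the leaf service: "
                + (int) Math.pow(attemptsPerLayer, layers)); // 81
    }
}
```

It prints 99.50% for five services, 99.40% for six, and 81 requests at the leaf.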
Choosing Between Sync and Async
Producer publishes, moves on. Notification service down? Order still completes. Message waits in the queue. Drop a memo in the inbox. Go back to your desk. They read it when they’re free. The trade-off: eventual consistency. For most cross-domain events, eventual consistency is fine. For payment confirmation before order fulfillment, synchronous is still the right call. Some questions need an answer before you can leave the desk.
| Pattern | Use When | Latency | Coupling | Failure Handling |
|---|---|---|---|---|
| Sync REST | Caller needs an answer now (reads, confirmations) | Low (direct) | High (caller blocks) | Circuit breaker + timeout |
| Sync gRPC | High-frequency internal calls, strong typing needed | Very low (binary, HTTP/2) | High | Same, plus load balancer awareness |
| Async events (Kafka) | Other domains should react, no confirmation needed | Variable (eventual) | Low (temporal decoupling) | DLQ + retry + idempotency |
| Async commands (SQS) | Fire-and-forget tasks, work queues | Variable | Low | Visibility timeout + DLQ |
| Request-reply (async) | Caller needs answer but can wait, no blocking | Medium | Medium | Correlation ID + timeout |
Design for ordering and idempotency from day one. Kafka guarantees ordering within a partition only. Partition keys solve ordering for entity-scoped events, but bolting them on after 50 event types are already live is a multi-sprint migration. Platform engineering codifies these defaults before they become afterthoughts.
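A minimal producer sketch, assuming a broker at localhost:9092 and a hypothetical order-events topic. Keying each record by the entity ID pins all of an order's events to one partition, which is exactly where Kafka's ordering guarantee applies; enabling producer idempotence deduplicates retried sends (consumer-side idempotency still needs its own handling):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Producer-side idempotence: the broker deduplicates retried sends
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "order-42"; // hypothetical entity ID used as partition key
            // All events keyed by this order ID land on the same partition,
            // so they are consumed in the order they were produced.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("order-events", orderId, "{\"status\":\"CONFIRMED\"}");
            producer.send(record);
        }
    }
}
```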
| Question | If Yes | If No |
|---|---|---|
| Does the caller need a response before it can proceed? | Synchronous required (continue below) | Async messaging (Kafka, SQS, RabbitMQ) |
| Is this a high-frequency internal call (>1,000 RPS)? | gRPC (binary protobuf, HTTP/2 multiplexing) | Continue below |
| Must consistency be immediate (e.g. payment confirmation)? | Synchronous REST or gRPC (based on volume) | REST (external APIs, low-frequency calls) |
gRPC for Internal High-Frequency Calls
Protobuf payloads are much smaller than JSON equivalents. HTTP/2 multiplexing means dozens of concurrent RPCs over a single connection. Generated stubs eliminate runtime type mismatches. The intercom system. Faster than walking. Both parties on the line at once. The setup cost: proto compilation, versioning, and schema evolution discipline. Never reuse field numbers after deprecation.
For external APIs: REST. Browser clients expect JSON, and debugging binary protocols requires specialized tooling. For internal calls above 1,000 RPS: gRPC pays back in weeks through reduced bandwidth and stronger contracts. Distributed systems at scale benefit from sustainable proto management.
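A client-side sketch with grpc-java. The channel setup is the real io.grpc API; the stub and message types are left as comments because protoc would generate them from a hypothetical inventory.proto, the same file where deprecated field numbers get marked reserved so they can never be reused:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

public class InventoryClient {
    public static void main(String[] args) throws InterruptedException {
        // One long-lived HTTP/2 channel; all RPCs multiplex over it,
        // which is why gRPC load balancing needs connection-level awareness.
        ManagedChannel channel = ManagedChannelBuilder
            .forAddress("inventory.internal", 50051) // assumed internal host/port
            .usePlaintext()                          // TLS in production
            .build();

        // InventoryServiceGrpc and StockRequest are hypothetical classes
        // that protoc would generate from inventory.proto:
        // var stub = InventoryServiceGrpc.newBlockingStub(channel)
        //         .withDeadlineAfter(200, TimeUnit.MILLISECONDS); // deadline ~ 2x P99
        // var reply = stub.checkStock(StockRequest.newBuilder().setSku("ABC-1").build());

        channel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
    }
}
```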
| Dimension | REST / JSON | gRPC / Protobuf |
|---|---|---|
| Payload format | JSON text. Human-readable, larger wire size | Binary protobuf. Compact, 3-10x smaller than JSON |
| Connection model | HTTP/1.1: one request per connection (HTTP/2 helps but not universal) | HTTP/2 multiplexing: many RPCs per connection |
| Type safety | Runtime validation. Silent mismatches possible between client and server | Compile-time contracts. Generated stubs catch mismatches at build |
| Tooling barrier | Low. curl, Postman, browser devtools | Higher. grpcurl, proto management, codegen pipeline |
| Best for | External APIs, browser clients, <1,000 RPS | Internal service calls, high-frequency, >1,000 RPS |
Don’t: Use gRPC for every internal call because “it’s faster.” A service called 10 times per minute gains nothing from binary serialization. The debugging overhead (no curl, no browser dev tools, binary wire format) outweighs the tiny latency savings.
Do: Reserve gRPC for high-frequency internal paths above 1,000 RPS or where strong typing across team boundaries prevents integration bugs. Use REST everywhere else.
Circuit Breakers and Retry Budgets
A circuit breaker monitors error rates and opens when failures exceed a threshold (typically 50% over a 10-second window). Open state means calls fail fast. No network requests. No thread pool exhaustion. The elevator that stops accepting passengers when the lobby is full. Better than everyone piling in and getting stuck. After a cooldown (30 seconds is a reasonable default), the breaker enters half-open state and allows probe requests to test recovery.
The resilience4j library implements these patterns in Java, and the resilience patterns guide covers tuning in depth. But the breaker is only half the defense. Retry budgets are the other half: 3 retries per layer maximum, exponential backoff starting at 100ms, plus jitter to prevent thundering herds. Without budgets, a 4-service chain amplifies 1 request into 81. One memo becomes 81 copies in the mail room. (The mail room is on fire.)
- Circuit breaker library with configurable thresholds per downstream dependency
- Retry budget capped at 3 attempts with exponential backoff and jitter
- Timeout set to 2x P99 latency of the downstream service, not the library default
- Dead letter queue set up for async paths with alerting on DLQ depth
- Distributed tracing active across all service-to-service calls
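A minimal sketch of the first three checklist items using resilience4j; the "inventory" name, the thresholds, and the downstream call are placeholders to tune per dependency:

```java
import java.time.Duration;
import java.util.function.Supplier;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

public class InventoryCallPolicy {
    public static void main(String[] args) {
        // Breaker: open at 50% failures over a 10-second time window,
        // probe again (half-open) after a 30-second cooldown.
        CircuitBreakerConfig cbConfig = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.TIME_BASED)
            .slidingWindowSize(10)
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .build();
        CircuitBreaker breaker = CircuitBreaker.of("inventory", cbConfig);

        // Retry budget: 3 attempts total, exponential backoff from 100ms with jitter.
        RetryConfig retryConfig = RetryConfig.custom()
            .maxAttempts(3)
            .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(100))
            .build();
        Retry retry = Retry.of("inventory", retryConfig);

        // Placeholder downstream call; its timeout should be ~2x the
        // downstream P99, not the HTTP client's default.
        Supplier<String> call = () -> "stock: 7";

        // Retry wraps the breaker, so every attempt feeds the breaker's failure rate.
        Supplier<String> guarded = Retry.decorateSupplier(retry,
            CircuitBreaker.decorateSupplier(breaker, call));

        System.out.println(guarded.get());
    }
}
```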
Sagas for Distributed Transactions
Distributed transactions across microservices don’t have the luxury of ACID guarantees. The saga pattern breaks a multi-service operation into local transactions, each with a compensating action for rollback. If step 3 of a 5-step saga fails, steps 2 and 1 run their compensating transactions in reverse order. A multi-department process. Shipping fails? Undo the inventory hold and refund the payment. In reverse. Praying the refund department isn’t at lunch.
Orchestration wins beyond 3-4 steps. A central coordinator manages the sequence, tracks state, and triggers compensation. Choreography (each service reacts to events from the previous one) works for simple two-step flows but becomes impossible to reason about when six services are chained. Nobody can see the full picture.
Two hard failure modes that code alone can’t solve: the compensating action itself fails (refund API is down when you need to reverse a charge), and partial success where compensation is impossible (product already shipped before the payment bounced). Both need runbooks and human escalation paths, not just retry loops.
Choreography vs Orchestration: When Each Pattern Fits
Choreography works when the saga has 2-3 steps, each service owns its compensation logic, and the flow is linear. Payment reserved, then inventory reserved. If inventory fails, payment service listens for the failure event and reverses itself. Simple. Decoupled. Two desks. Each knows what to do.
Beyond 3 steps, choreography becomes a distributed state machine that nobody can draw on a whiteboard. Each service must know which events to react to and which compensating events to emit. Six departments each reacting to memos from the previous department. Nobody sees the whole process.

Orchestration centralizes the coordination. The saga coordinator knows the full sequence, tracks progress in a persistent store, and triggers compensation in the correct reverse order. More coupling to the coordinator, but much easier to debug and extend. A project manager who owns the process. More overhead. But someone can actually explain what's happening.
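A minimal in-memory sketch of the orchestration idea: steps run in order, and on failure the completed steps' compensations run in reverse. The step names are hypothetical, and a production coordinator would persist saga state between steps and page a human when a compensation itself fails:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class SagaOrchestrator {
    // Each step pairs a local transaction with its compensating action.
    record Step(String name, Runnable action, Runnable compensation) {}

    static void run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // remember for potential rollback
            } catch (RuntimeException failure) {
                // Compensate completed steps in reverse order.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                throw failure;
            }
        }
    }

    public static void main(String[] args) {
        try {
            run(List.of(
                new Step("payment", () -> System.out.println("charge card"),
                                    () -> System.out.println("refund payment")),
                new Step("inventory", () -> System.out.println("reserve stock"),
                                      () -> System.out.println("release hold")),
                new Step("shipping", () -> { throw new RuntimeException("carrier API down"); },
                                     () -> {})
            ));
        } catch (RuntimeException e) {
            System.out.println("saga failed, compensated: " + e.getMessage());
        }
    }
}
```

Running it prints the two forward steps, then "release hold" and "refund payment" in reverse order before reporting the failure.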
What the Industry Gets Wrong About Microservice Communication
“Start with REST, switch to async later.” Switching from sync to async after 20 services are wired together means renegotiating contracts with every caller. The interface shape is baked into retry logic, error handling, and deployment pipelines across a dozen codebases. Making the right pattern choice early costs a few days of design discussion. Making it late costs quarters of migration. Ripping out the phone system after everyone’s memorized the extensions.
“gRPC is always faster than REST.” gRPC is faster for payload serialization. Binary protobuf is much more compact than JSON. But when the bottleneck is the database query behind the API, a service that takes 400ms to query PostgreSQL returns in 400ms regardless of the transport protocol. Faster intercom doesn’t help when the person on the other end needs 400ms to look up the answer. Measure where latency actually lives before changing protocols.
That 800ms inventory lookup cascading through three synchronous hops? With async events carrying cross-domain state changes and circuit breakers capping the remaining sync calls, the same traffic spike degrades gracefully instead of toppling the checkout page. The services are still talking. They just stopped standing in line. Memos in inboxes. Intercoms for urgent questions. The building runs because people stopped blocking each other.