Microservice Communication Patterns: REST, gRPC, Events

Metasphere Engineering · 10 min read

It starts with 3 services talking over REST. Everyone agrees it’s the fastest way to get moving. Fast forward to service 15, and somebody’s inventory lookup is taking 800ms under load. Because the order service calls inventory synchronously, and the checkout page calls order synchronously, your users are staring at a 2.4-second spinner for what used to be a 200ms page load. The SRE team pulls a late night tracing a cascade failure that started with a 97th-percentile database query in a service three hops away from the one that’s actually timing out. If you’ve been in this room at 11 PM, you already know where this article is going.

[Figure: Synchronous cascade failure across a microservice call chain. Service A (API gateway) calls Service B (order service), which calls Service C (inventory), each hop normally 50ms for a 150ms total. When C's database slows a lookup from 50ms to 800ms, B blocks waiting on C at 850ms, thread pools saturate upstream, and total user latency climbs to 2,400ms. A circuit breaker then trips: B stops calling C and returns a cached fallback in 80ms, breaking the cascade. In sync call chains, latency adds and availability multiplies; one slow service degrades all upstream callers.]

By the time you’ve got 20 services wired together with synchronous HTTP, changing even one service to async means renegotiating the contract with every caller. The interface shape is baked into deployment pipelines, retry logic, and error handling across a dozen codebases. Congratulations. You’ve built a distributed monolith with network hops instead of function calls.

The communication pattern is not a detail to sort out after the services are running. It determines your failure blast radius, your consistency model, and the coupling between teams who should not need to coordinate on every release. Getting the defaults right early is what separates architectures that scale from ones that calcify. Making these microservice architecture decisions before they’re expensive to reverse is the whole game.

The Synchronous Coupling Problem

REST between services means the calling service blocks until it gets a response. In a chain, latency adds and availability multiplies. This is just math. Five services each at 99.9% availability produce a chain with 99.5% end-to-end availability. If each hop in that chain has P99 latency of 200ms, the chain’s worst-case tail is 1,000ms before your own code even runs. These are not implementation bugs. They are the arithmetic consequences of synchronous coupling.
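That arithmetic is quick to verify. A minimal sketch with the numbers from this example (the function names are illustrative):

```python
# Availability multiplies and worst-case latency adds across a
# synchronous call chain. Numbers match the example above.

def chain_availability(per_service: float, hops: int) -> float:
    """End-to-end availability of a synchronous chain."""
    return per_service ** hops

def chain_tail_latency_ms(per_hop_p99_ms: float, hops: int) -> float:
    """Worst-case tail latency when every hop hits its P99 at once."""
    return per_hop_p99_ms * hops

print(round(chain_availability(0.999, 5) * 100, 2))   # -> 99.5
print(chain_tail_latency_ms(200, 5))                  # -> 1000
```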

The more dangerous property is failure coupling. When the inventory service is overloaded and responding in 10 seconds, the order service waits 10 seconds. The API gateway waits 10 seconds. Users see 10-second page loads. Without circuit breakers, one slow dependency grinds every transaction path that touches it to a halt.

Here’s the part most teams learn too late: the failure mode is not one slow service. It’s one slow service that causes thread pool exhaustion in its callers, which causes those callers to become slow, which exhausts thread pools in their callers. A cascade. By the time your monitoring alerts fire, three services are down and the root cause is a database index that dropped on a service nobody was watching. Site reliability engineering practices formalize the circuit breaker and retry budget parameters that prevent this cascade from propagating across service boundaries.

When to Choose Async

Asynchronous messaging through Kafka, RabbitMQ, or SQS decouples services temporally. The producer publishes and moves on. If the notification service is down when an order is placed, the order still completes. The message sits in the queue. When the notification service recovers, it processes the backlog. The order service has no awareness that the notification service even exists. That’s the power of temporal decoupling.

The trade-off is eventual consistency. An order placed at 10:00:00 may not show up in the analytics dashboard until 10:00:05. For most cross-domain events, that’s perfectly fine. For use cases where the caller needs to know the outcome before proceeding (confirming inventory availability before accepting payment, for instance), synchronous communication is the right call.

Here is the heuristic that holds up after dozens of these decisions: use async for things other domains should react to but don’t need to confirm. Use sync for queries and commands where the caller needs a definitive answer before proceeding. Make this choice explicitly per use case. Do not default to one pattern for everything. Cloud-native platform engineering practices codify these defaults so individual teams aren’t reinventing the decision on every new service.

One thing that catches teams off-guard with async, and it will catch you if you’re not deliberate: you need to design for message ordering and idempotency from day one. Kafka guarantees ordering within a partition, but not across partitions. If your order-created and order-cancelled events land in different partitions, a consumer processes them out of order. Partition keys solve this for entity-scoped events, but you need to think about it upfront. Bolting it on after you’ve already got 50 event types in production is a multi-sprint effort that nobody wants to fund.
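Both safeguards fit in a few lines. This is illustrative plain Python, not a real Kafka client API; `partition_for` and `IdempotentConsumer` are hypothetical names standing in for the producer's partitioner and the consumer's dedup store:

```python
import hashlib

NUM_PARTITIONS = 12

def partition_for(entity_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the entity key so every event for one order lands in one
    partition, preserving per-order ordering."""
    digest = hashlib.sha256(entity_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

class IdempotentConsumer:
    """Tracks processed event IDs. In production this set lives in a store
    updated in the same transaction as the side effect."""
    def __init__(self):
        self.seen = set()
        self.applied = []

    def handle(self, event_id: str, payload: str) -> bool:
        if event_id in self.seen:      # duplicate delivery: skip
            return False
        self.seen.add(event_id)
        self.applied.append(payload)   # the actual side effect
        return True

# Same order key -> same partition -> ordered delivery for that order.
assert partition_for("order-42") == partition_for("order-42")

consumer = IdempotentConsumer()
consumer.handle("evt-1", "order.created")
consumer.handle("evt-1", "order.created")  # redelivery is a no-op
print(consumer.applied)                    # -> ['order.created']
```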

gRPC for Internal High-Frequency Calls

For internal service-to-service calls where latency and throughput compound, gRPC is worth the tooling investment. Protocol Buffers produce payloads 30-60% smaller than equivalent JSON. HTTP/2 multiplexing enables multiple concurrent RPCs over a single TCP connection. Strongly typed proto contracts generate client and server stubs in Go, Java, Python, TypeScript, or whatever your teams run. No more runtime type mismatches that JSON-over-REST silently allows.

The setup cost is real, though. Don’t pretend otherwise. Proto files need to be compiled and distributed. Generated code needs to be versioned alongside the proto definitions. Service teams need to understand proto schema evolution rules: field numbering, required vs. optional semantics, and the discipline of never reusing field numbers after deprecation. A team that renames field 3 instead of deprecating it and adding field 8 will produce a wire-compatible but semantically broken contract that passes all tests. This exact mistake happens more often than you’d think.

For external APIs where developer ergonomics matter, browser compatibility is needed, or you want engineers to be able to curl your endpoints, REST is still the right default. For internal calls above roughly 1,000 RPS, the gRPC investment pays back in weeks. Solid distributed systems engineering covers the proto management and API evolution patterns that keep gRPC sustainable at scale.

Circuit Breakers and Retry Budgets

Microservice architectures without circuit breakers are not resilient architectures. They’re architectures that haven’t failed badly enough yet. Wire them in before the first production traffic, not after the first outage.

The pattern itself is straightforward. Track the error rate of calls to each downstream dependency over a rolling window. When the error rate exceeds your threshold (50% over a 10-second window is a reasonable starting point), open the circuit. Subsequent calls fail immediately with a local error rather than making network calls to the failing dependency. The caller returns a degraded response: cached data, a graceful fallback, or an honest error. After a configured cooldown (30 seconds is typical), allow a small number of probe requests through. If they succeed, close the circuit. If they fail, extend the open state.
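The state machine above can be sketched in plain Python. This is an illustrative single-threaded version, not a library; a production breaker also needs thread safety and metrics. The thresholds match the starting points suggested above:

```python
import time

class CircuitBreaker:
    """Closed: track errors over a rolling window. Open: fail fast for a
    cooldown. Half-open: let a few probes through, then close or re-open."""

    def __init__(self, error_threshold=0.5, window_s=10.0,
                 cooldown_s=30.0, probe_limit=3, now=time.monotonic):
        self.error_threshold = error_threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.probe_limit = probe_limit
        self.now = now
        self.state = "closed"
        self.calls = []          # (timestamp, ok) inside the rolling window
        self.opened_at = 0.0
        self.probes = 0

    def allow(self) -> bool:
        if self.state == "open":
            if self.now() - self.opened_at >= self.cooldown_s:
                self.state, self.probes = "half-open", 0
            else:
                return False     # fail fast: no network call is made
        if self.state == "half-open":
            if self.probes >= self.probe_limit:
                return False
            self.probes += 1
        return True

    def record(self, ok: bool):
        t = self.now()
        self.calls = [(ts, r) for ts, r in self.calls if t - ts <= self.window_s]
        self.calls.append((t, ok))
        if self.state == "half-open":
            if ok:               # probe succeeded: close the circuit
                self.state, self.calls = "closed", []
            else:                # probe failed: extend the open state
                self.state, self.opened_at = "open", t
            return
        errors = sum(1 for _, r in self.calls if not r)
        if errors / len(self.calls) > self.error_threshold:
            self.state, self.opened_at = "open", t

cb = CircuitBreaker()
for _ in range(10):
    cb.record(False)             # sustained failures over the window
print(cb.state)                  # -> open
print(cb.allow())                # -> False (caller returns a fallback instead)
```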

The subtlety is tuning, and this is where teams spend real time. Set the threshold too tight and circuits open during normal traffic spikes: every morning when traffic ramps up after the overnight lull, for instance. Set it too loose and the circuit opens only after hundreds of requests have already timed out, which means hundreds of users already had a bad experience.

Retry budgets are the companion control, and they matter just as much. In a 4-service chain where each layer retries 3 times, a single failing leaf service receives 3^4 = 81 requests from one originating request. That amplification turns a struggling service into a dead service. The standard defense: cap retries at 3 attempts per layer, use exponential backoff with jitter starting at 100ms (the jitter prevents synchronized retry storms from all callers hitting the failing service simultaneously), and let the circuit breaker handle sustained failures rather than relying on retries to eventually succeed.
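A minimal sketch of that retry discipline, assuming a generic `call` that raises on failure (the names and the injectable `sleep` are illustrative):

```python
import random

MAX_ATTEMPTS = 3        # retry budget per layer
BASE_DELAY_MS = 100     # starting backoff

def backoff_with_jitter(attempt: int, rng=random.random) -> float:
    """Full jitter: sleep a random amount up to base * 2^attempt, so
    callers don't retry in lockstep against the failing service."""
    cap = BASE_DELAY_MS * (2 ** attempt)   # 100ms, 200ms, 400ms, ...
    return rng() * cap

def call_with_retries(call, sleep=lambda ms: None):
    last_exc = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            return call()
        except Exception as exc:   # real code retries only retryable errors
            last_exc = exc
            sleep(backoff_with_jitter(attempt))
    raise last_exc                 # budget exhausted: surface the failure

# A call that fails twice, then succeeds on the third (and last) attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("downstream slow")
    return "ok"

print(call_with_retries(flaky))    # -> ok
print(attempts["n"])               # -> 3
```

The cap matters more than the backoff curve: with 3 attempts per layer instead of unbounded retries, a 4-deep chain amplifies one request to at most 81 calls on the leaf, and the circuit breaker, not the retry loop, handles sustained failure.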

Sagas for Distributed Transactions

The moment you split a monolith into services, you lose database transactions that span multiple entities. An order that debits a wallet, reserves inventory, and creates a shipment used to be one transaction with ACID guarantees. Now it’s three service calls, each with its own database, and “rollback” does not mean what it used to.

The saga pattern is the standard answer. Each step in a multi-service operation has a corresponding compensating action. If step 3 fails, steps 2 and 1 execute their compensating actions in reverse. Choreography-based sagas use events: the inventory service publishes “inventory.reserved” and the payment service reacts. Orchestration-based sagas use a coordinator service that directs each step explicitly.

In practice, orchestration wins for anything beyond 3-4 steps. Don’t fight this. Choreographed sagas across 6 services become impossible to reason about when you need to answer “what happens if step 4 fails after step 3 succeeded?” The event chain is distributed across 6 codebases and you need to read all of them to understand the rollback sequence. Nobody wants to do that at 3 AM.
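An orchestration-based saga can be sketched as a list of (forward, compensate) pairs run by a coordinator. The service calls below are in-memory stubs with hypothetical names; the shipment step is made to fail so the rollback path runs:

```python
log = []

def reserve_inventory(order):  log.append("inventory.reserved")
def release_inventory(order):  log.append("inventory.released")
def debit_wallet(order):       log.append("wallet.debited")
def refund_wallet(order):      log.append("wallet.refunded")
def create_shipment(order):    raise RuntimeError("shipping service down")
def cancel_shipment(order):    log.append("shipment.cancelled")

ORDER_SAGA = [
    (reserve_inventory, release_inventory),
    (debit_wallet, refund_wallet),
    (create_shipment, cancel_shipment),
]

def run_saga(order, steps):
    completed = []
    for forward, compensate in steps:
        try:
            forward(order)
            completed.append(compensate)
        except Exception:
            # Step failed: run compensations for completed steps in reverse.
            for comp in reversed(completed):
                comp(order)   # real code must retry/queue failed compensations
            return "rolled_back"
    return "committed"

print(run_saga({"id": 42}, ORDER_SAGA))  # -> rolled_back
print(log)
# -> ['inventory.reserved', 'wallet.debited', 'wallet.refunded', 'inventory.released']
```

The coordinator is the one place you read to answer "what happens if step 3 fails after step 2 succeeded?", which is exactly the property choreography loses.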

The two genuinely hard failure modes are the ones nobody thinks about until they’re in production: the compensating action itself fails (the payment refund API is down when you need to compensate), and partial success where compensation is impossible (you shipped the package before the payment bounced). Both require runbooks, not just code. Teams that don’t design compensating actions before implementing the forward path always discover this gap in production. Always.

Communication pattern choices made in the first few months of a microservice architecture become load-bearing walls that are expensive to tear out once multiple teams are building against them. Default to async for cross-domain events. Use sync only when the caller genuinely needs a response to proceed. Wire in circuit breakers before the first production traffic. Design compensating actions before implementing forward paths. Get these four things right early, and the architecture scales. Get them wrong, and you’ll spend a year paying down the debt.

Get Microservice Communication Right Before It Calcifies

Communication pattern choices made early in microservice design are expensive to change once multiple teams are building against them. Metasphere helps you choose and implement the right patterns for your consistency requirements, team capabilities, and failure tolerance before those choices become architectural debt.

Frequently Asked Questions

When should microservices use synchronous vs asynchronous communication?

Use synchronous communication when the caller needs a response before it can proceed, such as confirming inventory before accepting payment. Use asynchronous messaging when the caller does not need an immediate response, such as sending notifications or publishing domain events. In a chain of 5 services each at 99.9% availability, synchronous coupling drops end-to-end availability to 99.5%. Default to async for cross-domain events and reserve sync for queries requiring definitive answers.

What is the retry amplification problem in microservice chains?

In a 4-service chain where each layer retries 3 times, a single slow leaf service can receive 81 requests from 1 originating user request (3^4). This amplification turns a partial degradation into a complete outage under load. The standard defense is retry budgets capped at 3 attempts with exponential backoff starting at 100ms plus jitter, combined with circuit breakers that open after a 50% error rate over a 10-second window.

What is gRPC and when should microservices use it over REST?

gRPC uses HTTP/2 and Protocol Buffers, producing payloads 30-60% smaller than equivalent JSON with strongly typed contracts and generated client stubs. Use gRPC for high-frequency internal calls above roughly 1,000 RPS where payload efficiency compounds. REST remains preferable for external-facing APIs and lower-frequency internal calls where developer ergonomics and browser compatibility outweigh the performance delta.

What is the saga pattern and how does it handle distributed transactions?

The saga pattern decomposes a distributed transaction into local transactions, each with a compensating action for rollback. If step 3 of a 5-step saga fails, steps 2 and 1 execute their compensating transactions in reverse order. Every step must have an idempotent compensating transaction designed before implementation. The two hard failure modes are compensating transactions that themselves fail (requiring retry infrastructure) and partial success where compensation is impossible (requiring manual intervention runbooks).

How does service discovery work in Kubernetes microservice architectures?

Service discovery lets services locate each other without hardcoded addresses. In Kubernetes, DNS-based discovery is built in and resolves within milliseconds. For non-Kubernetes environments, client-side discovery queries a registry like Consul or Eureka. Service meshes like Istio and Linkerd add discovery to the data plane alongside mTLS and circuit breaking without application code changes, but add 50-100MB memory overhead per pod.