Multi-Cloud Networking: Connectivity Without Lock-in
You run workloads on two cloud providers. Compute was straightforward. Kubernetes clusters in both, Terraform modules per provider, deployment pipelines that target each independently. Then someone asks: “Why can’t Service A in AWS talk to Service B in Azure?”
Two countries. Roads built. Cars running. Then someone tries to drive from one country to the other and discovers there’s no bridge.
And suddenly you discover that multi-cloud compute was never the hard part. Multi-cloud networking is.
Different providers use different networking models. The IETF has standardized overlay encapsulations (Geneve, via the NVO3 working group), but each provider’s proprietary model remains the day-to-day reality. Connecting AWS VPCs, Azure VNets, and GCP VPCs means maintaining a translation layer across three stacks, each designed by engineers who assumed you’d never need the other two. Three countries. Three different road systems. Three different address formats. None of them interoperable by default.
- Multi-cloud compute is straightforward. Multi-cloud networking is where the real pain lives. AWS VPCs, Azure VNets, and GCP VPCs have incompatible assumptions about routing, DNS, and firewalls. Three countries, three road systems.
- CIDR range conflicts are the #1 networking blocker. Plan non-overlapping ranges across all providers before the first VPC exists. Both countries using the same house numbers.
- DNS resolution across clouds needs deliberate forwarding architecture. Services in AWS can’t resolve Azure private DNS zones by default. The postal system that doesn’t work across borders.
- Egress charges are the hidden budget killer. Data transfer costs between clouds compound faster than teams expect.
- A second region in one provider often beats a second provider entirely. Geographic redundancy at a fraction of the networking complexity. A second city beats a second country.
The multi-cloud strategy conversation focuses on compute portability. Networking is where the actual operational cost lives.
Transit Gateways and Interconnect Architecture
Each provider built its own routing hub. AWS Transit Gateway (regional router, up to 5,000 attachments per gateway, cross-region requires peering). Azure Virtual WAN (managed hub, simpler operations but less routing flexibility). GCP Network Connectivity Center (treats all locations as spokes through Google’s backbone). Each country built its own highway system. Each operates differently. Connecting them requires a separate bridge.
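A minimal sketch of the AWS side of that hub, assuming boto3 and placeholder VPC and subnet IDs; Azure Virtual WAN and GCP Network Connectivity Center need their own equivalents:

```python
import time
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the regional Transit Gateway that acts as the AWS-side routing hub.
tgw = ec2.create_transit_gateway(
    Description="multi-cloud hub",
    Options={
        "AmazonSideAsn": 64512,                      # private ASN for the AWS side
        "DefaultRouteTableAssociation": "enable",
        "DefaultRouteTablePropagation": "enable",
    },
)
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# Attachments fail until the gateway is available, so poll before attaching.
while ec2.describe_transit_gateways(TransitGatewayIds=[tgw_id])["TransitGateways"][0]["State"] != "available":
    time.sleep(15)

# Attach an existing VPC; the IDs here are placeholders for your environment.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],
)
```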
The interconnection options span a wide range of cost, performance, and provisioning complexity:
| Interconnect Type | Bandwidth | Latency | Provision Time | Best For |
|---|---|---|---|---|
| Site-to-site VPN | ~1.25 Gbps per tunnel | Variable, 10-30ms added | Minutes | Dev/test, low-volume cross-cloud |
| Dedicated interconnect (Direct Connect, ExpressRoute) | 1-100 Gbps | 2-5ms same-metro | 2-8 weeks | Production workloads, predictable latency |
| Cloud exchange (Equinix Fabric, Megaport) | 1-100 Gbps | 2-5ms same-metro | Minutes to hours | Multi-provider setups (3+ clouds) |
VPN tunnels work for basic IP reachability in dev environments. A dirt road between countries. For production latency needs, they’re a non-starter. A cloud exchange provides a single physical point where you set up private connections to all three providers. The international bridge with lanes to every country. New cross-connects provision in minutes instead of weeks, and you pay one port fee instead of three.
IP Address Management: Get It Right on Day One
Both AWS and Azure default new VPCs and VNets to 10.0.0.0/16. Every team discovers this at the worst possible moment: when they try to route between the two networks and realize the address spaces overlap. Both countries using the same house numbering system. 123 Main Street exists in both. Mail goes to the wrong one.
Don’t: Let each team provision VPCs with whatever CIDR range the provider defaults to. By the time you find overlaps, re-addressing means rebuilding the VPC and migrating every workload in it. Renumbering every house on the street while people are living in them.
Do: Allocate non-overlapping supernets before provisioning anything. A clean pattern: 10.0.0.0/12 for AWS, 10.16.0.0/12 for Azure, 10.32.0.0/12 for GCP, 10.48.0.0/12 for on-premises. Document them in a shared registry and enforce allocation through a centralized IPAM tool.
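A minimal sketch of that plan using Python’s standard ipaddress module; the supernets are the ones above, and the per-region carving is illustrative:

```python
import ipaddress

# Supernets from the allocation plan above; keep these in the shared registry.
SUPERNETS = {
    "aws":     ipaddress.ip_network("10.0.0.0/12"),
    "azure":   ipaddress.ip_network("10.16.0.0/12"),
    "gcp":     ipaddress.ip_network("10.32.0.0/12"),
    "on_prem": ipaddress.ip_network("10.48.0.0/12"),
}

# Fail fast if any two supernets overlap.
blocks = list(SUPERNETS.items())
for i, (name_a, net_a) in enumerate(blocks):
    for name_b, net_b in blocks[i + 1:]:
        if net_a.overlaps(net_b):
            raise ValueError(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")

# Carve each provider's /12 into /16 blocks, one per region or environment.
aws_blocks = list(SUPERNETS["aws"].subnets(new_prefix=16))
print(aws_blocks[0], aws_blocks[1])  # 10.0.0.0/16 10.1.0.0/16
```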
Centralized IPAM is not optional at scale. Without it, the third team provisioning infrastructure will overlap with the ranges the first team already claimed. And by the time someone notices, both environments are in production. Two neighborhoods with the same addresses. The ambulance goes to the wrong house. Before the first cross-cloud packet flows:
- Non-overlapping CIDR ranges reserved per provider, per region, per environment before any VPC creation
- Centralized IPAM registry accessible to all infrastructure teams with allocation enforcement
- Transit gateway or cloud exchange provisioned with cross-provider connectivity tested
- DNS forwarding zones configured between all provider pairs for cross-cloud name resolution
- Egress cost baseline measured for current cross-cloud traffic patterns
Service Mesh Across Cloud Boundaries
Istio multi-cluster federation connects meshes through east-west gateways with independent control planes. The shared CA (Vault or cert-manager with a common root) enables cross-cluster mTLS. Workloads in AWS call workloads in Azure as if they were in the same mesh, with encryption, traffic management, and observability spanning the boundary. The international passport system. Different countries, shared identity verification.
The operational cost is real. Consistent certificate authorities across providers. Synchronized service discovery. East-west gateways that add 1-3ms per hop on top of the 2-5ms base interconnect latency. Customs checkpoints at every border crossing. Necessary. Not free.
Consul mesh gateways provide simpler cross-cloud connectivity with lighter operations but fewer L7 traffic management features. Cilium Cluster Mesh connects Kubernetes clusters into a single discovery and policy layer without sidecars, trading L7 features for lower overhead.
The critical design principle: keep hot-path traffic within a single provider. Keep local traffic local. Cross-cloud calls belong in batch processing, failover, and data sync. If your service mesh routes latency-sensitive requests across cloud boundaries, the real problem is which services live where, not how the mesh connects them. Driving across the border for groceries. Maybe the grocery store should be in your country.
DNS and Cross-Cloud Routing
Services in AWS can’t resolve Azure private DNS zones by default. And vice versa. The postal system that doesn’t work across borders. Cross-cloud DNS resolution requires deliberate forwarding architecture.
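One way to wire the AWS-to-Azure direction is a Route 53 Resolver forwarding rule pointed at the Azure-side resolver. A hedged sketch with boto3; the domain name, resolver IP, endpoint ID, and VPC ID are placeholders, and both an outbound Resolver endpoint and an Azure DNS Private Resolver inbound endpoint are assumed to already exist:

```python
import uuid
import boto3

resolver = boto3.client("route53resolver", region_name="us-east-1")

# Forward queries for an Azure private zone to the Azure-side resolver IP,
# reachable over the interconnect. All identifiers below are placeholders.
rule = resolver.create_resolver_rule(
    CreatorRequestId=str(uuid.uuid4()),
    Name="forward-to-azure-private-dns",
    RuleType="FORWARD",
    DomainName="internal.azure.example.com",
    TargetIps=[{"Ip": "10.16.1.4", "Port": 53}],       # inside the Azure supernet
    ResolverEndpointId="rslvr-out-0123456789abcdef0",  # existing outbound endpoint
)

# Associate the rule with each VPC that needs cross-cloud name resolution.
resolver.associate_resolver_rule(
    ResolverRuleId=rule["ResolverRule"]["Id"],
    VPCId="vpc-0123456789abcdef0",
)
```

The reverse direction is the mirror image: Azure-side forwarding of the AWS private zones to a Route 53 Resolver inbound endpoint.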
Weighted DNS routing, health-check failover, and latency-based routing all work across providers without cross-cloud networking infrastructure. Route 53, Azure Traffic Manager, and Cloud DNS support health-checked records. Failover takes 30-60 seconds via TTL propagation. For DR scenarios measured in minutes, DNS failover is simple and effective.
Active-passive with DNS failover is operationally simpler and enough for most DR needs. Active-active requires data replication across clouds and consistent routing, which multiplies complexity fast.
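A hedged sketch of that active-passive pattern using Route 53 failover records via boto3; the hosted zone ID, record name, endpoint IPs, and health check ID are placeholders, and the other providers’ DNS services offer equivalent constructs:

```python
import boto3

route53 = boto3.client("route53")

ZONE_ID = "Z0123456789EXAMPLE"  # placeholder hosted zone

def failover_change(role, ip, health_check_id=None):
    """Build an UPSERT for a failover record; the PRIMARY carries the health check."""
    record = {
        "Name": "api.example.com",
        "Type": "A",
        "SetIdentifier": f"api-{role.lower()}",
        "Failover": role,                 # "PRIMARY" or "SECONDARY"
        "TTL": 60,                        # low TTL keeps failover in the 30-60s range
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={"Changes": [
        failover_change("PRIMARY", "203.0.113.10", "placeholder-health-check-id"),  # AWS endpoint
        failover_change("SECONDARY", "198.51.100.20"),                              # Azure endpoint
    ]},
)
```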
The Egress Cost Trap
Every provider charges for data leaving their network. Ingress is free. Egress is not. Tolls at every border. And the charges compound at every transit point.
A pipeline processing 10 TB daily in AWS that ships that data to Azure incurs transfer charges that can exceed the compute cost over a year. The math is brutal: AWS egress fees, plus NAT gateway processing fees, plus Direct Connect data transfer fees. Each transit hop adds its own per-GB charge. Three toll booths between you and the destination.
The principle that saves money: keep compute close to data. Process where the data lives. Send only the output across the border. A 10 TB dataset produces 50 MB of results? Ship the results, not the dataset. Don’t drive the factory to the customer. Drive the product.
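A back-of-the-envelope sketch of why that matters. The per-GB rates below are assumptions for illustration only, not quoted pricing; plug in your providers’ current rates before trusting any number it prints:

```python
# Assumed per-GB rates purely for illustration; real pricing varies by region,
# tier, and contract. Substitute your own numbers.
EGRESS_PER_GB = 0.09        # assumed AWS data-transfer-out rate
NAT_GW_PER_GB = 0.045       # assumed NAT gateway data processing rate
DX_TRANSFER_PER_GB = 0.02   # assumed Direct Connect data transfer rate

def monthly_cost(gb_per_day: float) -> float:
    """Cross-cloud transfer cost when every hop adds its own per-GB charge."""
    per_gb = EGRESS_PER_GB + NAT_GW_PER_GB + DX_TRANSFER_PER_GB
    return gb_per_day * 30 * per_gb

dataset_gb = 10 * 1024   # ship the 10 TB dataset every day
results_gb = 50 / 1024   # ship only the 50 MB of results

print(f"Ship the dataset: ${monthly_cost(dataset_gb):,.0f}/month")
print(f"Ship the results: ${monthly_cost(results_gb):,.2f}/month")
```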
SD-WAN and Hybrid Connectivity Considerations
SD-WAN provides direct cloud on-ramps from branch offices instead of backhauling all traffic through a centralized data center. Application-aware routing sends AWS-destined traffic through the AWS on-ramp and Azure-destined traffic through the Azure on-ramp. The latency improvement for branch users is real. Direct flights instead of connecting through a single hub airport.
The challenge is security policy consistency. When traffic no longer funnels through a single inspection point, security policy enforcement must move to the edge or to each cloud’s ingress. SASE (Secure Access Service Edge) platforms combine SD-WAN routing with cloud-delivered security, but add another vendor dependency and control plane to manage.
For organizations already running centralized inspection (firewall, IDS/IPS, DLP), moving to SD-WAN means rearchitecting the security model, not just the routing model. Moving from one customs checkpoint to checkpoints at every border crossing.
When Multi-Cloud Networking Is Not Worth It
If you have fewer than 3 genuinely different workloads in the secondary provider, or cross-cloud data movement exceeds 5 TB monthly, the egress costs, operational overhead, and engineering time spent debugging cross-cloud routing issues typically exceed the value of the second provider. A second city in the same country gives geographic redundancy at a fraction of the complexity. A second country adds international customs, border tolls, and a legal system you don’t understand.
| Multi-Cloud Networking Makes Sense | A Second Region Is Probably Better |
|---|---|
| Genuine best-of-breed needs per provider (e.g., GCP for ML, AWS for everything else) | Geographic redundancy is the primary goal |
| Regulatory requirement to avoid single-provider dependency | Fewer than 3 different workloads in the secondary provider |
| Existing production workloads in multiple providers from M&A | Cross-cloud data transfer exceeds 5 TB/month |
| Provider-specific managed services with no equivalent elsewhere | Team lacks multi-cloud networking expertise |
Solid infrastructure architecture starts with this assessment. Multi-cloud and hybrid cloud capabilities become relevant only after the business case clears that bar.
What the Industry Gets Wrong About Multi-Cloud Networking
“VPN tunnels between clouds solve connectivity.” VPN tunnels solve basic IP reachability. A dirt road between countries. They don’t solve DNS resolution across cloud boundaries, overlapping CIDR ranges, incompatible security group models, or the fact that each provider’s load balancer works differently. Connectivity is step one of roughly twelve.
“Multi-cloud networking is a one-time setup.” Networking evolves with every new service, every new region, and every firewall rule change. A transit gateway configured for 5 services handles 5 services. Service 6 requires a new route, a new DNS forwarding rule, and a new firewall entry. International trade agreements that need renegotiating every time a new product ships. Multi-cloud networking is ongoing operational investment.
“Egress costs are small.” Egress charges compound at every transit point. A dataset moving from S3 through NAT Gateway to Direct Connect to Azure Event Hub collects a toll at each hop. Teams that model cross-cloud costs using only the headline per-GB egress rate are consistently surprised by the actual bill. Three toll booths, not one.
“Why can’t Service A talk to Service B?” The answer involves transit gateways, mesh federation, DNS forwarding, IPAM enforcement, and security policy across incompatible stacks. Three countries. Three road systems. Three sets of traffic laws. Whether the bridge is worth building depends on whether the second country earns its keep through genuinely different capabilities, or whether a second city would have done the job with a tenth of the complexity.