
Multi-Cloud Networking: Connectivity Without Lock-in

Metasphere Engineering · 14 min read

You run workloads on two cloud providers. Compute was straightforward. Kubernetes clusters in both, Terraform modules per provider, deployment pipelines that target each independently. Then someone asks: “Why can’t Service A in AWS talk to Service B in Azure?”

Two countries. Roads built. Cars running. Then someone tries to drive from one country to the other and discovers there’s no bridge.

And suddenly you discover that multi-cloud compute was never the hard part. Multi-cloud networking is.

Different providers use different networking models. IETF working groups have standardized overlay encapsulations such as VXLAN and Geneve, but each provider's proprietary model remains the reality. Connecting AWS VPCs, Azure VNets, and GCP VPCs means maintaining a translation layer across three stacks, each designed by engineers who assumed you’d never need the other two. Three countries. Three different road systems. Three different address formats. None of them interoperable by default.

Key takeaways
  • Multi-cloud compute is straightforward. Multi-cloud networking is where the real pain lives. AWS VPCs, Azure VNets, and GCP VPCs have incompatible assumptions about routing, DNS, and firewalls. Three countries, three road systems.
  • CIDR range conflicts are the #1 networking blocker. Plan non-overlapping ranges across all providers before the first VPC exists. Both countries using the same house numbers.
  • DNS resolution across clouds needs deliberate forwarding architecture. Services in AWS can’t resolve Azure private DNS zones by default. The postal system that doesn’t work across borders.
  • Egress charges are the hidden budget killer. Data transfer costs between clouds compound faster than teams expect.
  • A second region in one provider often beats a second provider entirely. Geographic redundancy at a fraction of the networking complexity. A second city beats a second country.

The multi-cloud strategy conversation focuses on compute portability. Networking is where the actual operational cost lives.

Transit Gateways and Interconnect Architecture

Each provider built its own routing hub. AWS Transit Gateway (regional router, 5,000 attachments, cross-region requires peering). Azure Virtual WAN (managed hub, simpler operations but less routing flexibility). GCP Network Connectivity Center (treats all locations as spokes through Google’s backbone). Each country built its own highway system. Each operates differently. Connecting them requires a separate bridge.

The interconnection options span a wide range of cost, performance, and provisioning complexity:

| Interconnect Type | Bandwidth | Latency | Provision Time | Best For |
|---|---|---|---|---|
| Site-to-site VPN | ~1.25 Gbps per tunnel | Variable, 10-30 ms added | Minutes | Dev/test, low-volume cross-cloud |
| Dedicated interconnect (Direct Connect, ExpressRoute) | 1-100 Gbps | 2-5 ms same-metro | 2-8 weeks | Production workloads, predictable latency |
| Cloud exchange (Equinix Fabric, Megaport) | 1-100 Gbps | 2-5 ms same-metro | Minutes to hours | Multi-provider setups (3+ clouds) |

VPN tunnels work for basic IP reachability in dev environments. A dirt road between countries. For production latency needs, they’re a non-starter. A cloud exchange provides a single physical point where you set up private connections to all three providers. The international bridge with lanes to every country. New cross-connects provision in minutes instead of weeks, and you pay one port fee instead of three.
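The trade-offs above can be collapsed into a rule of thumb. This sketch is a hypothetical chooser, not a provider recommendation engine; the thresholds mirror the figures in the table, not any SLA.

```python
# Hypothetical rule-of-thumb interconnect chooser. Thresholds come from the
# table above (10-30 ms VPN penalty, 2-5 ms dedicated, 3+ clouds for exchange).
def pick_interconnect(tolerable_added_latency_ms: int, providers: int,
                      production: bool) -> str:
    if not production and tolerable_added_latency_ms >= 10:
        return "site-to-site VPN"        # dev/test, latency-tolerant, minutes to set up
    if providers >= 3:
        return "cloud exchange"          # one port fee, cross-connects to every provider
    return "dedicated interconnect"      # predictable 2-5 ms, but weeks to provision

assert pick_interconnect(30, providers=2, production=False) == "site-to-site VPN"
assert pick_interconnect(5, providers=3, production=True) == "cloud exchange"
assert pick_interconnect(5, providers=2, production=True) == "dedicated interconnect"
```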

IP Address Management: Get It Right on Day One

Both AWS and Azure default to 10.0.0.0/16 for new VPCs. Every team discovers this at the worst possible moment: when they try to route between the two networks and realize the address spaces overlap. Both countries using the same house numbering system. 123 Main Street exists in both. Mail goes to the wrong one.

Anti-pattern

Don’t: Let each team provision VPCs with whatever CIDR range the provider defaults to. By the time you find overlaps, re-addressing means rebuilding the VPC and migrating every workload in it. Renumbering every house on the street while people are living in them.

Do: Allocate non-overlapping supernets before provisioning anything. A clean pattern: 10.0.0.0/12 for AWS, 10.16.0.0/12 for Azure, 10.32.0.0/12 for GCP, 10.48.0.0/12 for on-premises. Document them in a shared registry and enforce allocation through a centralized IPAM tool.
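The supernet plan above is mechanically checkable. A minimal sketch using Python's standard `ipaddress` module, with the /12 allocations from the text; `find_overlaps` and `validate_vpc_cidr` are illustrative helper names:

```python
import ipaddress

# The supernet plan from the text: one /12 per provider, plus on-premises.
SUPERNETS = {
    "aws":     ipaddress.ip_network("10.0.0.0/12"),
    "azure":   ipaddress.ip_network("10.16.0.0/12"),
    "gcp":     ipaddress.ip_network("10.32.0.0/12"),
    "on-prem": ipaddress.ip_network("10.48.0.0/12"),
}

def find_overlaps(supernets):
    """Return every pair of provider supernets whose address ranges overlap."""
    names = sorted(supernets)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if supernets[a].overlaps(supernets[b])]

def validate_vpc_cidr(provider, cidr, supernets):
    """Check that a proposed VPC CIDR falls inside its provider's supernet."""
    return ipaddress.ip_network(cidr).subnet_of(supernets[provider])

assert find_overlaps(SUPERNETS) == []                 # the plan is conflict-free
assert validate_vpc_cidr("aws", "10.3.0.0/16", SUPERNETS)
assert not validate_vpc_cidr("azure", "10.3.0.0/16", SUPERNETS)  # AWS space, not Azure's
```

Running a check like this in CI before any `terraform apply` is far cheaper than discovering the overlap after both environments are in production.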

[Diagram: cross-cloud routing with IPAM allocation and DNS-based failover. Phase 1: non-overlapping /12 CIDR allocation per provider (AWS 10.0.0.0/12, Azure 10.16.0.0/12, GCP 10.32.0.0/12, 1,048,576 addresses each): zero overlap, no NAT required at transit points, full end-to-end traceability. Phase 2: cloud exchange interconnect (Equinix/Megaport) linking Direct Connect, ExpressRoute, and Interconnect through a single physical PoP at 2-5 ms. Phase 3: DNS health-checked active-active routing for api.example.com (AWS ALB at 70% weight, Azure Application Gateway at 30%). Phase 4: automatic failover when the AWS health check fails (AWS to 0%, Azure to 100%), propagating within the 30-60 s DNS TTL with zero cross-cloud networking required for failover.]

Centralized IPAM is not optional at scale. Without it, the third team to provision infrastructure will pick a range that overlaps with the first team's. And by the time someone notices, both environments are in production. Two neighborhoods with the same addresses. The ambulance goes to the wrong house.

Prerequisites
  1. Non-overlapping CIDR ranges reserved per provider, per region, per environment before any VPC creation
  2. Centralized IPAM registry accessible to all infrastructure teams with allocation enforcement
  3. Transit gateway or cloud exchange provisioned with cross-provider connectivity tested
  4. DNS forwarding zones configured between all provider pairs for cross-cloud name resolution
  5. Egress cost baseline measured for current cross-cloud traffic patterns
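Prerequisites 1 and 2 amount to a registry that hands out blocks and remembers who owns them. A minimal sketch of that idea; the `IpamRegistry` class and its method names are hypothetical, and a real tool (NetBox, a cloud-native IPAM) would add persistence and locking:

```python
import ipaddress

class IpamRegistry:
    """Toy IPAM registry: hands out the next free /16 from a provider's
    supernet and records the (provider, region, env) that owns it."""

    def __init__(self, supernets):
        self._free = {p: net.subnets(new_prefix=16) for p, net in supernets.items()}
        self._allocations = {}   # (provider, region, env) -> network

    def allocate(self, provider, region, env):
        key = (provider, region, env)
        if key in self._allocations:          # idempotent: same key, same block
            return self._allocations[key]
        block = next(self._free[provider])    # next unallocated /16
        self._allocations[key] = block
        return block

registry = IpamRegistry({
    "aws":   ipaddress.ip_network("10.0.0.0/12"),
    "azure": ipaddress.ip_network("10.16.0.0/12"),
})
a = registry.allocate("aws", "us-east-1", "prod")
b = registry.allocate("azure", "westeurope", "prod")
assert not a.overlaps(b)   # cross-provider allocations can never collide
```

Because each provider carves from its own supernet, no two teams can ever collide across providers, which is exactly the guarantee the prerequisites are after.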

Service Mesh Across Cloud Boundaries

Istio multi-cluster federation connects meshes through east-west gateways with independent control planes. The shared CA (Vault or cert-manager with a common root) enables cross-cluster mTLS. Workloads in AWS call workloads in Azure as if they were in the same mesh, with encryption, traffic management, and observability spanning the boundary. The international passport system. Different countries, shared identity verification.

The operational cost is real. Consistent certificate authorities across providers. Synchronized service discovery. East-west gateways that add 1-3ms per hop on top of the 2-5ms base interconnect latency. Customs checkpoints at every border crossing. Necessary. Not free.
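The latency figures above compose into a simple budget. A back-of-envelope sketch, assuming a cross-cloud call traverses one east-west gateway on each side (an assumption, not a universal Istio topology):

```python
# Latency budget for a cross-cloud mesh call, using the figures in the text:
# 2-5 ms interconnect base plus 1-3 ms per east-west gateway hop.
def cross_cloud_latency_ms(interconnect_ms, gateway_hops, per_gateway_ms):
    return interconnect_ms + gateway_hops * per_gateway_ms

# Best case: 2 ms interconnect, two gateways (egress + ingress) at 1 ms each.
assert cross_cloud_latency_ms(2, gateway_hops=2, per_gateway_ms=1) == 4
# Worst case: 5 ms interconnect, two gateways at 3 ms each.
assert cross_cloud_latency_ms(5, gateway_hops=2, per_gateway_ms=3) == 11
```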

Consul mesh gateways provide simpler cross-cloud connectivity with lighter operations but fewer L7 traffic management features. Cilium Cluster Mesh connects Kubernetes clusters into a single discovery and policy layer without sidecars, trading L7 features for lower overhead.

The critical design principle: keep hot-path traffic within a single provider. Keep local traffic local. Cross-cloud calls belong in batch processing, failover, and data sync. If your service mesh routes latency-sensitive requests across cloud boundaries, the real problem is which services live where, not how the mesh connects them. Driving across the border for groceries. Maybe the grocery store should be in your country.

DNS and Cross-Cloud Routing

Services in AWS can’t resolve Azure private DNS zones by default. And vice versa. The postal system that doesn’t work across borders. Cross-cloud DNS resolution requires deliberate forwarding architecture.

Weighted DNS routing, health-check failover, and latency-based routing all work across providers without cross-cloud networking infrastructure. Route 53, Azure Traffic Manager, and Cloud DNS support health-checked records. Failover takes 30-60 seconds via TTL propagation. For DR scenarios measured in minutes, DNS failover is simple and effective.
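The resolution logic behind weighted, health-checked records can be sketched in a few lines. Route 53, Traffic Manager, and Cloud DNS implement this server-side; the `resolve` function, endpoint names, and 70/30 weights below are illustrative assumptions:

```python
# Provider-agnostic sketch of health-checked, weighted DNS resolution.
def resolve(records, health):
    """Return (endpoint, traffic_share) for healthy records; [] if all failed."""
    healthy = [r for r in records if health[r["endpoint"]]]
    total = sum(r["weight"] for r in healthy)
    return [(r["endpoint"], r["weight"] / total) for r in healthy] if total else []

records = [
    {"endpoint": "aws-alb.example.com",  "weight": 70},
    {"endpoint": "azure-ag.example.com", "weight": 30},
]

# Steady state: traffic splits 70/30 across providers.
both_up = {"aws-alb.example.com": True, "azure-ag.example.com": True}
assert resolve(records, both_up) == [("aws-alb.example.com", 0.7),
                                     ("azure-ag.example.com", 0.3)]

# AWS health check fails: Azure absorbs 100% once the 30-60 s TTL expires.
aws_down = {"aws-alb.example.com": False, "azure-ag.example.com": True}
assert resolve(records, aws_down) == [("azure-ag.example.com", 1.0)]
```

Note that none of this requires cross-cloud networking infrastructure: the failover decision lives entirely in DNS.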

[Diagram: DNS-based cross-cloud failover. Route 53 health-checks endpoints every 10 s. Steady state: AWS primary takes 100% of traffic while Azure stands warm at 0%. After three failed health checks, DNS shifts 100% of traffic to Azure. DNS failover is the simplest cross-cloud resilience; TTL determines recovery time.]

Active-passive with DNS failover is operationally simpler and enough for most DR needs. Active-active requires data replication across clouds and consistent routing, which multiplies complexity fast.

The Egress Cost Trap

Every provider charges for data leaving their network. Ingress is free. Egress is not. Tolls at every border. And the charges compound at every transit point.

A pipeline processing 10 TB daily in AWS that sends results to Azure incurs charges that can exceed compute cost over a year. The math is brutal: AWS egress fees, plus NAT gateway processing fees, plus Direct Connect data transfer fees. Each transit hop adds its own per-GB charge. Three toll booths between you and the destination.

[Diagram: data gravity and compounding cross-cloud egress. A 50 TB primary data store in AWS, growing monthly, feeds Azure compute workloads that need the data daily. At $0.09/GB egress, 10 TB/day costs $900/day, or $27,000/month, and replication doubles it to $54,000/month once DR, analytics, and ML pipelines are added. Data gravity is real: moving compute to the data is cheaper than moving data to the compute.]

The principle that saves money: keep compute close to data. Process where the data lives. Send only the output across the border. A 10 TB dataset produces 50 MB of results? Ship the results, not the dataset. Don’t drive the factory to the customer. Drive the product.
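The compounding effect is easy to model. The per-GB rates below are illustrative assumptions, not current price-sheet values; the point is that each transit hop adds its own charge on top of the headline egress rate:

```python
# Illustrative per-GB charges for each hop a dataset crosses leaving AWS.
HOPS_USD_PER_GB = {
    "aws_egress":     0.09,    # data leaving the AWS network
    "nat_gateway":    0.045,   # NAT gateway processing
    "direct_connect": 0.02,    # Direct Connect data transfer out
}

def monthly_egress_cost(gb_per_day, hops=HOPS_USD_PER_GB, days=30):
    """Total monthly cost when every hop charges its own per-GB toll."""
    return gb_per_day * sum(hops.values()) * days

# 10 TB/day through all three toll booths:
assert round(monthly_egress_cost(10_000)) == 46_500
# Modeling only the headline egress rate understates it badly:
assert round(monthly_egress_cost(10_000, {"aws_egress": 0.09})) == 27_000
```

The gap between $27,000 and $46,500 a month is exactly the surprise teams hit when they budget from the headline rate alone.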

SD-WAN and hybrid connectivity considerations

SD-WAN provides direct cloud on-ramps from branch offices instead of backhauling all traffic through a centralized data center. Application-aware routing sends AWS-destined traffic through the AWS on-ramp, Azure traffic through Azure. The latency improvement for branch users is real. Direct flights instead of connecting through a single hub airport.

The challenge is security policy consistency. When traffic no longer funnels through a single inspection point, security policy enforcement must move to the edge or to each cloud’s ingress. SASE (Secure Access Service Edge) platforms combine SD-WAN routing with cloud-delivered security, but add another vendor dependency and control plane to manage.

For organizations already running centralized inspection (firewall, IDS/IPS, DLP), moving to SD-WAN means rearchitecting the security model, not just the routing model. Moving from one customs checkpoint to checkpoints at every border crossing.

When Multi-Cloud Networking Is Not Worth It

If you have fewer than 3 genuinely different workloads in the secondary provider and cross-cloud data movement exceeds 5 TB monthly, the egress costs, operational overhead, and engineering time spent debugging cross-cloud routing typically exceed the value of the second provider. A second city in the same country gives geographic redundancy at a fraction of the complexity. A second country adds international customs, border tolls, and a legal system you don’t know.

| Multi-Cloud Networking Makes Sense | A Second Region Is Probably Better |
|---|---|
| Genuine best-of-breed needs per provider (e.g., GCP for ML, AWS for everything else) | Geographic redundancy is the primary goal |
| Regulatory requirement to avoid single-provider dependency | Fewer than 3 different workloads in the secondary provider |
| Existing production workloads in multiple providers from M&A | Cross-cloud data transfer exceeds 5 TB/month |
| Provider-specific managed services with no equivalent elsewhere | Team lacks multi-cloud networking expertise |

Solid infrastructure architecture starts with this assessment. The multi-cloud and hybrid cloud capabilities become relevant only after the business case clears this bar.
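The assessment reduces to a small decision rule. This sketch encodes the article's own thresholds; the function name and the regulatory short-circuit are illustrative choices:

```python
# Rule of thumb from the text: a second provider needs 3+ genuinely different
# workloads and <= 5 TB/month of cross-cloud data movement to earn its keep,
# unless regulation mandates provider diversity regardless of cost.
def second_provider_worth_it(distinct_workloads, cross_cloud_tb_per_month,
                             regulatory_requirement=False):
    if regulatory_requirement:
        return True                       # the mandate overrides the cost math
    return distinct_workloads >= 3 and cross_cloud_tb_per_month <= 5

assert not second_provider_worth_it(2, 8)    # a second region is probably better
assert second_provider_worth_it(4, 2)
assert second_provider_worth_it(1, 20, regulatory_requirement=True)
```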

The CIDR Collision

Overlapping IP address ranges between cloud providers prevent direct routing. Both AWS and Azure default to 10.0.0.0/16 for new VPCs. Both countries using the same house numbers. Discovering the conflict after both environments are in production requires re-addressing one side, which means rebuilding the VPC and migrating every workload in it. The 30 minutes spent on IP planning before the first VPC is created prevents weeks of re-addressing later.

What the Industry Gets Wrong About Multi-Cloud Networking

“VPN tunnels between clouds solve connectivity.” VPN tunnels solve basic IP reachability. A dirt road between countries. They don’t solve DNS resolution across cloud boundaries, overlapping CIDR ranges, incompatible security group models, or the fact that each provider’s load balancer works differently. Connectivity is step one of roughly twelve.

“Multi-cloud networking is a one-time setup.” Networking evolves with every new service, every new region, and every firewall rule change. A transit gateway configured for 5 services handles 5 services. Service 6 requires a new route, a new DNS forwarding rule, and a new firewall entry. International trade agreements that need renegotiating every time a new product ships. Multi-cloud networking is ongoing operational investment.

“Egress costs are small.” Egress charges compound at every transit point. A dataset moving from S3 through NAT Gateway to Direct Connect to Azure Event Hub collects a toll at each hop. Teams that model cross-cloud costs using only the headline per-GB egress rate are consistently surprised by the actual bill. Three toll booths, not one.

Our take

Plan CIDR ranges across all providers before the first VPC is created. Document them in a shared registry. Reserve non-overlapping ranges for each provider, each region, and each environment. This single planning exercise prevents more multi-cloud pain than any tool, product, or architecture pattern. The street numbering system. Get it wrong and every subsequent decision inherits the mess. Get it right and nobody thinks about it again.

“Why can’t Service A talk to Service B?” The answer involves transit gateways, mesh federation, DNS forwarding, IPAM enforcement, and security policy across incompatible stacks. Three countries. Three road systems. Three sets of traffic laws. Whether the bridge is worth building depends on whether the second country earns its keep through genuinely different capabilities, or whether a second city would have done the job with a tenth of the complexity.

Connect Your Clouds Without the Complexity Tax

Multi-cloud networking done wrong costs more in egress and operational overhead than running two separate environments. Transit architectures, cross-cloud service discovery, and DNS-based failover need to work at production scale, not just on the architecture diagram.


Frequently Asked Questions

What is the latency penalty for cross-cloud traffic versus same-cloud?


Cross-cloud traffic over the public internet usually adds 10-30ms of round-trip latency compared to same-region, same-provider calls. Dedicated interconnects through Equinix or Megaport cut that to 2-5ms by skipping public peering. For latency-sensitive work, that 25ms stacks up across chained service calls. A request hitting 4 services adds 100ms of cross-cloud penalty on public internet versus 8-20ms on dedicated interconnect.

How do egress costs compare between VPN, dedicated interconnect, and cloud exchange?


All three incur per-GB egress charges from the source provider, typically a few cents per GB. The difference is transport cost. Site-to-site VPN uses the public internet, so transport is free. Dedicated interconnects (AWS Direct Connect, Azure ExpressRoute) add a fixed monthly port fee plus per-GB data transfer charges. Cloud exchanges like Equinix Fabric or Megaport add a port fee and cross-connect charge but let you reach multiple providers from one physical connection, cutting total cost for multi-provider setups.

Can Istio service mesh span multiple cloud providers?


Yes. Istio multi-cluster federation supports connecting meshes across providers using an east-west gateway per cluster. The control planes stay independent, but workloads get cross-cluster mTLS and traffic management. The operational cost is real: you need consistent certificate authorities across providers, synchronized service discovery, and east-west gateways that add 1-3ms of latency per hop. Consul mesh gateway achieves similar cross-cloud connectivity with a simpler operational model but fewer L7 traffic management features.

How should IP address management work across multiple cloud providers?


Reserve non-overlapping CIDR blocks per provider before provisioning any infrastructure. A common pattern is assigning 10.0.0.0/12 to AWS, 10.16.0.0/12 to Azure, and 10.32.0.0/12 to GCP. Overlapping CIDRs between providers need NAT translation at every transit point, which breaks service discovery and makes tracing requests across clouds nearly impossible. Use a centralized IPAM tool to enforce allocation rules.

When does multi-cloud networking complexity exceed the benefit?


When you have fewer than 3 genuinely different workloads per provider and cross-cloud data movement tops 5 TB per month. At that point, the egress costs, the overhead of running parallel networking stacks, and the time spent debugging cross-cloud routing typically cost more than the second provider is worth. A second region in the same provider gives you geographic redundancy at a fraction of the complexity.