Zero Trust Architecture: Build It, Not Buy It

Apr 17, 2025 | Metasphere Engineering | 14 min read

You just signed a hefty contract for a “zero trust platform.” The sales engineer configured it in two weeks. Your CISO checked the compliance box on the board report. Everyone feels secure.

A new lock on the front door. Every room inside still unlocked.

Meanwhile, your billing service still talks to 47 other internal services over a flat network using the same database password it’s had since 2021.

The Checkbox Security Illusion

You deploy the vendor product, update the compliance slide deck, and report zero trust adoption to the board. Nothing changes in how services actually talk to each other. The vendor product sits in front of the same flat network. New lock on the front door, same keys to every room.

Gartner found that over 60% of organizations claiming zero trust adoption still trusted internal traffic by default. The vendor product got deployed. The security architecture didn’t change. Same attack surface, same risks.

Key takeaways

Over 60% of organizations claiming zero trust still use implicit trust for internal communication. The vendor product got deployed. The architecture didn’t change.
Zero trust means per-request authentication and authorization. Every service call carries identity. Every call is verified. No exceptions for “internal” traffic.
Microsegmentation replaces flat networks with explicit allow lists. Service A can only reach Services B and C. Everything else is denied by default.
SPIFFE/SPIRE provides cryptographic workload identity without static credentials. Short-lived, automatically rotated, verifiable.
Start with your most critical service boundary, not a full rewrite. Enforce zero trust between your payment service and everything it touches. Expand from there.

What Zero Trust Actually Means

Never trust based on network location. Inside the firewall, on VPN, from your own cloud account. Every request gets treated the same as one from an unknown IP.

A VPN authenticates you once and gives you the run of the house for 8-12 hours. Zero trust verifies every request. A user on the corporate LAN and a user on airport WiFi get the same treatment. There’s no “inside” the network. Just “authenticated for this request” or “denied.”

Perimeter security assumes everything inside the firewall is safe. Pen tests show what happens when that assumption breaks: on a flat network, a compromised web-tier service reaches databases within minutes. The attacker doesn’t need to break through the perimeter again. They’re already inside, and nothing stops lateral movement.

Traditional perimeter	Zero trust
Authenticate once at the edge, access everything	Authenticate per request, access only what’s authorized
Network location implies trust	Network location is irrelevant
Static credentials shared across services	Short-lived, dynamically issued, per-service credentials
Flat internal network, no segmentation	Default-deny, explicit allow between specific services
Breach = full lateral movement	Breach = single service, contained blast radius

Identity as the New Perimeter

Identity replaces the network boundary. Every request must answer three questions: who is making this request, are they allowed this specific action, and is their credential still valid right now?

Human Identity

Phishing-resistant MFA (FIDO2 passkeys, hardware security keys) replaces password-plus-SMS. Device posture checks make sure the machine is managed and patched. Session tokens are tied to specific applications with 1-4 hour lifetimes. An engineer accessing the production dashboard gets a token that works for the dashboard and nothing else, valid for two hours, not eight.
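A minimal sketch of what “a token that works for the dashboard and nothing else” can look like. It is illustrative only: the signing key, the `issue_token` and `verify_token` helpers, and the audience names are hypothetical, and a real deployment would rely on an identity provider and a standard token format rather than this hand-rolled HMAC scheme.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical demo key; a real identity provider manages and rotates its keys.
SIGNING_KEY = b"demo-only-key"


def issue_token(subject: str, audience: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Issue a signed token bound to one audience with a fixed lifetime."""
    issued_at = time.time() if now is None else now
    claims = {"sub": subject, "aud": audience, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    signature = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_token(token: str, audience: str, now: Optional[float] = None) -> bool:
    """Accept only if the signature, the audience, and the expiry all check out."""
    current = time.time() if now is None else now
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no separator, so this is not one of our tokens
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # tampered or signed with another key
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["aud"] == audience and current < claims["exp"]
```

A two-hour token issued for `prod-dashboard` verifies against the dashboard within its lifetime, fails against any other audience, and fails everywhere once it expires.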

Workload Identity

SPIFFE/SPIRE gives every service a cryptographic identity. Each one gets an X.509 certificate (an SVID) with a short TTL, 24 hours in this example, rotated automatically. Compare that with the usual alternative: a service account key created three years ago, shared across 12 services, and stored in a CI/CD variable nobody remembers setting. The secrets management guide covers migrating from static credentials to dynamic issuance.
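To make the workload identity idea concrete, here is a rough sketch of the two checks a service might run on its own certificate metadata: is the SPIFFE ID well formed, and is it time to rotate? This does not use the SPIRE API. The `WorkloadCert` type, the `needs_rotation` helper, and the rotate-at-half-lifetime threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadCert:
    """Hypothetical view of the certificate fields this sketch needs."""
    spiffe_id: str
    not_before: datetime
    not_after: datetime


def parse_spiffe_id(spiffe_id: str) -> tuple:
    """Split a workload SPIFFE ID into (trust_domain, path).

    SPIFFE IDs have the form spiffe://<trust-domain>/<path>. This sketch
    requires a non-empty path, which is a simplification for workload IDs.
    """
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a workload SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc, parsed.path


def needs_rotation(cert: WorkloadCert, now: datetime, threshold: float = 0.5) -> bool:
    """Rotate once the remaining lifetime drops to `threshold` of the total.

    The 0.5 default is an assumption for this sketch, not a quoted setting.
    """
    lifetime = cert.not_after - cert.not_before
    remaining = cert.not_after - now
    return remaining <= lifetime * threshold
```

With a 24-hour certificate, `needs_rotation` returns False six hours after issuance and True after thirteen, so a replacement is in hand long before the old certificate expires.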

Microsegmentation: Containing the Blast Radius

Identity answers “who is making this request.” Microsegmentation answers a different question: can this service even talk to that one? You need both. Without segmentation, an authenticated attacker still has the run of the network. And segmentation alone? The allowed paths carry unverified traffic.

On a flat network, a compromised billing service can reach the analytics database, the user service, the backup system, and everything else on the same subnet. Pen testers prove it every time: compromise one web-facing service, pivot laterally, reach the database within minutes. Microsegmentation enforces default-deny rules at the workload level. Billing can reach the payment service and its own database. It can’t reach analytics, user data, or backups. The path doesn’t exist.

Kubernetes handles this with NetworkPolicies. Cloud environments use scoped security groups for the same thing. The tooling changes by platform, but the principle doesn’t: no communication path exists unless explicitly allowed. Start by mapping every service-to-service communication path in production, then build allow rules only for the paths that actually need to exist. Most teams find they have way more connections than they thought, and pruning unused paths shrinks the attack surface right away.
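The mapping step can be sketched as a comparison between the paths teams expect and the flows actually observed in production. Everything below is hypothetical: the service names, the flow tuples, and the `plan_allow_rules` helper stand in for whatever flow logs and service inventory you have.

```python
from collections import Counter

# Hypothetical (source, destination) pairs pulled from production flow logs.
observed_flows = [
    ("billing", "payments"),
    ("billing", "billing-db"),
    ("billing", "payments"),
    ("web", "billing"),
    ("billing", "user-svc"),  # nobody listed this one
]

# Hypothetical paths that service owners say they need.
expected_paths = {
    ("billing", "payments"),
    ("billing", "billing-db"),
    ("billing", "analytics-db"),  # listed, but never seen in traffic
    ("web", "billing"),
}


def plan_allow_rules(observed, expected):
    """Split paths into rules to allow, unused paths to prune, and surprises to review."""
    seen = Counter(observed)
    allow = sorted(path for path in expected if seen[path] > 0)
    unused = sorted(path for path in expected if seen[path] == 0)
    surprises = sorted(path for path in seen if path not in expected)
    return allow, unused, surprises


allow, unused, surprises = plan_allow_rules(observed_flows, expected_paths)
```

Here `unused` holds the paths that can be pruned, and `surprises` holds the connections nobody knew about, which need an owner's decision before default-deny is enforced.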

Anti-pattern

Don’t: Deploy microsegmentation with overly broad allow rules to avoid breaking services. allow billing -> * defeats the entire purpose. You’ve added operational complexity without reducing attack surface.

Do: Start with default-deny and add specific allow rules one service pair at a time. Monitor denied connections for a week before enforcement to catch legitimate paths your service map missed.
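One way to picture the monitor-then-enforce approach is a small policy object that refuses wildcard rules and records what it would have denied. The `SegmentationPolicy` class is a hypothetical model for reasoning about the rollout, not the API of any real network policy engine.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentationPolicy:
    """Default-deny allow list with a monitor mode (enforce=False)."""
    enforce: bool = False
    allow: set = field(default_factory=set)
    would_deny: list = field(default_factory=list)

    def add_rule(self, source: str, destination: str) -> None:
        # Reject rules like billing -> * that defeat the purpose of segmentation.
        if "*" in (source, destination):
            raise ValueError("wildcard rules are not allowed")
        self.allow.add((source, destination))

    def check(self, source: str, destination: str) -> bool:
        """Return True if the connection may proceed."""
        if (source, destination) in self.allow:
            return True
        # Not on the allow list: always record it, block only when enforcing.
        self.would_deny.append((source, destination))
        return not self.enforce
```

During the monitoring week the policy runs with `enforce=False`, so nothing breaks while `would_deny` fills with candidate paths. After reviewing that list and adding the legitimate pairs, flipping `enforce` to True turns the same checks into real denials.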

Application-Level Authorization

Network segmentation controls which services can talk to each other. But once the connection is open, what can they actually do? Application-level authorization fills that gap. Every API endpoint must answer: is this identity allowed this action on this resource right now?

Open Policy Agent (OPA) lets you write authorization logic as code, version-controlled and unit-tested. Policies live alongside the application code, changes go through pull requests, and every authorization decision can be written to an audit log. The API security guide covers preventing broken object level authorization (BOLA), which tops the OWASP API Security Top 10 and is the one microsegmentation alone can’t prevent.

Step	Actor	Action	What’s Verified
1	Billing Service	Presents workload certificate via mTLS to Identity Provider	Service identity is authentic (not spoofed)
2	Identity Provider	Issues short-lived access token	Token scoped to billing-svc, expires in minutes
3	Billing Service	POST /payments/charge with token + payload	Request reaches Payment API
4	Payment API	Asks Policy Engine (OPA): “Can billing-svc write /payments for tenant-42?”	Three checks: identity valid, action permitted, resource ownership matches
5a	Policy Engine (pass)	ALLOW. Payment API processes the charge	200 OK returned to Billing Service
5b	Policy Engine (fail)	DENY. Expired token, wrong action, or wrong tenant	403 Forbidden. Violation logged for security review

Every request gets checked. No service gets a pass for being “internal.”
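The decision in steps 4 and 5 can be sketched in a few lines. In practice OPA policies are written in Rego. The Python below is only a stand-in to show the shape of the three checks, and the `Request` fields, the `PERMISSIONS` table, and the `authorize` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    service: str             # caller identity taken from the verified token
    action: str              # e.g. "write"
    resource: str            # e.g. "/payments"
    tenant: str              # tenant the request targets
    token_tenant: str        # tenant the token was issued for
    token_expires_at: float  # expiry as a Unix timestamp


# Hypothetical policy data: which (action, resource) pairs each service may use.
PERMISSIONS = {"billing-svc": {("write", "/payments")}}


def authorize(req: Request, now: float) -> tuple:
    """Return (status_code, reason) after the three checks from step 4."""
    if now >= req.token_expires_at:
        return 403, "token expired"
    if (req.action, req.resource) not in PERMISSIONS.get(req.service, set()):
        return 403, "action not permitted"
    if req.tenant != req.token_tenant:
        return 403, "tenant mismatch"
    return 200, "allow"
```

A `billing-svc` request to write `/payments` for its own tenant with a live token returns `(200, "allow")`. Change the tenant, the action, or let the token expire, and the same call returns 403 with the reason a security reviewer would want in the log.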

The Vendor Trap

A vendor platform that sits in front of your architecture without changing the architecture is a VPN with better marketing materials. Three questions tell you if it’s real. Did services change how they authenticate to each other? Are static credentials gone? Is microsegmentation enforced? If all three answers are no, the vendor product added cost without adding security. The compliance checkbox is checked. The blast radius of a breach hasn’t changed since before the contract was signed.

The failure happens the same way every time: buy a product, configure it at the perimeter, leave the internal architecture untouched. The compliance report improves. The actual security posture doesn’t. The vendor dashboard shows green. The flat network behind it is still flat. The IAM at scale guide covers the identity patterns in detail.

Vendor-deployed “zero trust”	Engineering-built zero trust
Product sits at the perimeter	Controls enforce at every service boundary
Static credentials unchanged	Dynamic, short-lived credentials issued per service
Flat internal network	Default-deny microsegmentation between services
Authentication at the edge only	Per-request authentication and authorization
Compliance checkbox satisfied	Actual attack surface measurably reduced

The Adoption Roadmap

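Zero trust is an 18-24 month engineering program, not a product deployment. A first pass through each phase below takes roughly 90 days; reaching full maturity across all services takes the rest. Each phase builds on the last, shrinking the attack surface along the way.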

Phase	Focus	Key Deliverables	Timeline
Phase 1: Identity Foundation	SSO + MFA everywhere. Service identity for workloads	Identity provider consolidated, MFA enforced (no exceptions), service accounts mapped	Month 1-3
Phase 2: Credential Elimination	Remove static credentials. Dynamic secrets, short-lived tokens	Vault for dynamic credentials, certificate-based auth for services, API keys retired	Month 3-6
Phase 3: Microsegmentation	Network segmentation by identity, not IP. East-west controls	Service mesh mTLS, namespace network policies, workload identity verification	Month 6-9
Phase 4: Continuous Verification	Every request evaluated in real-time. Device posture + context	Per-request authorization, device health checks, behavioral anomaly detection	Month 9-12+

Prerequisites

SSO with phishing-resistant MFA enforced for all human access
Service inventory mapping all service-to-service communication paths
Centralized logging capable of recording every authentication event
At least one secrets management backend capable of dynamic credential issuance
Network policy enforcement mechanism available (Kubernetes NetworkPolicies or cloud security groups)

Phase	Duration	Primary outcome	Key risk
Identity foundation	90 days	SSO + MFA for humans, SPIFFE for workloads	Incomplete service inventory delays workload identity
Credential elimination	90 days	Dynamic secrets replace static credentials for top 10 services	Connection pool failures during credential rotation
Microsegmentation	90 days	Default-deny between all production services	Overly broad allow rules negate the benefit
Continuous authorization	90 days	OPA policy engine evaluating every API request	Policy latency adds measurable overhead to request paths

What the Industry Gets Wrong About Zero Trust

“Zero trust is a product you can buy.” Vendors have co-opted the term to sell VPN replacements with better dashboards. Deploying a “zero trust platform” without changing how services authenticate and authorize requests to each other is checkbox security. The vendor check clears. The architecture stays the same. So does the attack surface.

“VPN provides adequate security for internal services.” A VPN guards the perimeter and hopes the inside stays clean. Once inside a flat network, attackers reach other services within minutes. Replace the VPN with identity-aware proxies that verify every request regardless of network origin, and a breach stays contained to the one compromised service.

“Microsegmentation is too complex for most organizations.” Start with three critical services. Enforce default-deny between them and everything they communicate with. Most teams see their attack surface shrink within 90 days of partial implementation. Full maturity takes 18-24 months. The first meaningful reduction takes 90 days.

Our take

Zero trust is an engineering program, not a procurement exercise. Build incrementally: identity first, credential elimination second, microsegmentation third, continuous authorization last. Each phase shrinks the attack surface. Treat it as architecture work and it works. Treat it as a purchasing decision and it doesn’t. Budget and talent are rarely the bottleneck. Engineering commitment is.

That billing service talking to 47 internal services over a flat network with a 2021 database password? It authenticates per-request now with short-lived credentials, communicates only over microsegmented paths, and every call goes through the policy engine. Turns out, the hefty vendor contract was never the problem.

Frequently Asked Questions

What is the fundamental difference between zero trust and a VPN?

A VPN grants broad network access after a single authentication, typically for 8-12 hours across all internal resources. Zero trust verifies every individual request regardless of network origin. A user on the corporate LAN and a user on airport WiFi get identical scrutiny. Organizations that replace VPN with zero trust proxies see lateral movement incidents plummet within the first year.

Can we implement zero trust without replacing our entire architecture?

Yes. Start by eliminating long-lived static credentials on your three most critical services and enforcing phishing-resistant MFA. Add microsegmentation to those same workloads. The exploitable attack surface shrinks fast, often within 90 days of partial implementation. Full zero trust maturity typically takes 18-24 months.

Why are static credentials incompatible with zero trust?

A credential valid for 15 minutes has a 35,040x smaller exposure window than one valid for a year (525,600 minutes divided by 15). Static API keys and database passwords are regularly found in breach dumps and reused across environments. Zero trust requires short-lived, dynamically generated credentials via tools like HashiCorp Vault or AWS IAM Roles Anywhere that expire automatically and become useless if intercepted.
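The exposure-window comparison is plain arithmetic, and it is worth running for your own credential lifetimes. A small sketch, assuming a 365-day year; the `exposure_ratio` helper is just a name chosen for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def exposure_ratio(long_lived_minutes: float, short_lived_minutes: float) -> float:
    """How many times longer the long-lived credential stays usable if stolen."""
    return long_lived_minutes / short_lived_minutes


fifteen_minute_ratio = exposure_ratio(MINUTES_PER_YEAR, 15)  # 35040.0
one_hour_ratio = exposure_ratio(MINUTES_PER_YEAR, 60)        # 8760.0
```

A year-long credential is exposed 35,040 times longer than a 15-minute one, and 8,760 times longer than a one-hour one.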

What is microsegmentation and how does it differ from a perimeter firewall?

Microsegmentation enforces granular security policies at the individual workload level using default-deny rules. A compromised billing service cannot reach the analytics database because no allow rule exists for that path. A perimeter firewall only guards the boundary. Once inside a flat network, attackers typically achieve lateral movement across the vast majority of services within minutes. Microsegmentation contains blast radius to a single workload.

Do we still need network firewalls in a zero trust environment?

Yes, but their role narrows to handling volumetric DDoS attacks and serving as a coarse outer boundary. Primary security controls shift to identity-aware proxies, per-service authorization, and workload identity verification. Organizations that layer firewalls with microsegmentation and continuous authorization shrink breach blast radius drastically compared to perimeter-only defense.