Container Security Beyond the Build
Your on-call gets paged for CPU saturation across every node. Dashboard shows 80% utilization, evenly distributed, zero increase in application traffic. Something is very wrong.
A cryptominer. Deployed through a compromised CI pipeline. The image passed Trivy with zero high-severity findings. Passed Kyverno admission policies. Ran as non-root. Connected to a mining pool over port 443, indistinguishable from normal HTTPS. Every security control did its job. The attack still got through.
The miner ran for six hours before the cost spike triggered investigation. The image scan was clean because the miner was downloaded at runtime via a curl command in the application’s startup script. You inspected the shipping container at the factory. Clean. The contraband was loaded at a port stop along the way.
- Build-time scanning catches known CVEs. Runtime attacks bypass it entirely. Malware downloaded at runtime via curl passes every image scan. The gap between a clean build and a compromised runtime is where real attacks land.
- Behavioral profiling catches what signature matching misses. A container that never made outbound connections suddenly reaching a mining pool on port 443 is suspicious regardless of the binary’s hash.
- Default-deny egress is the highest-impact runtime control. Explicit allowlists block data exfiltration and C2 communication before lateral movement begins.
- Immutable containers (read-only root filesystem) prevent runtime code injection. If nothing can be written, nothing can be downloaded.
- Falco runtime rules catch syscall anomalies like unexpected process execution, sensitive file reads, and privilege escalation attempts with tiny CPU overhead.
Most container security programs are front-loaded at the build-time layer. Teams set up image scanning, celebrate the green build, and move on. Nobody watches what containers actually do once they’re running. All customs at the port of origin. None at the destination. That blind spot is where real attacks land.
Start With the Image
Hardening the container image is the right starting point. Just not the ending point. Locking the shipping container matters, but it doesn’t help if nobody checks what gets added during transit.
Use minimal base images. Alpine-based images cut the package count from 400+ (Ubuntu) to under 50. Distroless images go further: no shell, no package manager, no debugging tools. A distroless Python image means an attacker with code execution can’t install anything, can’t spawn a shell, and can’t use standard tooling to set up camp. The trade-off is real. Debugging needs kubectl debug with an ephemeral container rather than kubectl exec. For production workloads, that trade-off pays for itself. Fewer tools in the container, fewer tools for the attacker. A shipping container with nothing inside worth stealing.
Run as a non-root user. Add USER 1001 to your Dockerfile. One line. It blocks an entire class of container escape scenarios where a kernel exploit inherits root on the host. Pair it with readOnlyRootFilesystem: true in your Kubernetes pod security context. If the application needs temporary files, mount a specific writable emptyDir volume for that purpose only. Everything else stays immutable. A locked container where only one small compartment opens.
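Concretely, the non-root and read-only settings described above look like this in a pod spec. This is a minimal sketch; the pod name, image reference, and UID are illustrative, not taken from any particular deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                               # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp                     # the one writable compartment
  volumes:
  - name: tmp
    emptyDir: {}                          # scratch space, discarded with the pod
```

The emptyDir mount is the "one small compartment": the application can write temporary files to /tmp, and nothing else on the filesystem accepts writes.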
Don’t: Leave CAP_SYS_ADMIN enabled “because the container won’t start without it.” That capability is effectively root and enables mounting filesystems. The actual fix is almost always a log directory permission or volume mount change. Giving the container a skeleton key because you didn’t want to find the right one.
Do: capabilities: { drop: ["ALL"] } in your container security context. Add back only what the application genuinely needs. Most web applications need zero additional capabilities.
| Control | Dockerfile / Build | Kubernetes Pod Security | Impact |
|---|---|---|---|
| Non-root user | USER 1000:1000 | runAsNonRoot: true, runAsUser: 1000 | Prevents container escape to host root. Single most important control |
| Read-only filesystem | N/A | readOnlyRootFilesystem: true | Blocks attackers from writing malware to disk. Mount writable /tmp if needed |
| Minimal base image | FROM gcr.io/distroless/static or alpine | N/A | Fewer packages = fewer CVEs. Distroless has no shell (attackers can’t exec in) |
| No privilege escalation | N/A | allowPrivilegeEscalation: false | Prevents setuid binaries from gaining elevated permissions |
| Drop all capabilities | N/A | drop: ["ALL"] | Linux capabilities default to a dangerous set. Drop all, add back only what’s needed |
| Resource limits | N/A | limits: cpu, memory | Prevents crypto-mining from consuming the node. Also prevents OOM cascade |
| Image scanning | Trivy in CI pipeline | Kyverno/OPA policy: only signed, scanned images deploy | Catches known CVEs before they reach production |
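In CI, the scanning gate from the last row can be a single step that fails the build on high-severity findings. A sketch using a GitHub Actions step (the workflow shape and image name are illustrative; the flags are Trivy's own):

```yaml
# Illustrative CI step: trivy's --exit-code 1 turns the scan into a hard gate,
# so a HIGH or CRITICAL finding fails the pipeline instead of just logging.
- name: Scan image for known CVEs
  run: |
    trivy image --severity HIGH,CRITICAL --exit-code 1 registry.example.com/app:1.0
```

Remember what this gate does and doesn't cover: it blocks images with known CVEs, and nothing downloaded after the container starts.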
Admission Control: The Gate Before Workloads Run
Image hardening is under your direct control. Third-party images, operator-deployed workloads, and developer mistakes are not. Kubernetes admission controllers enforce security policy before any workload reaches the cluster, regardless of how it was created. Port authority checking the manifest before the container enters the dock.
OPA Gatekeeper and Kyverno both intercept resource creation, checking requests against your policy library. The baseline policies for every namespace:
```yaml
# Kyverno: enforce non-root, read-only filesystem, no privilege escalation
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-security-context
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-non-root
    match:
      resources:
        kinds: ["Pod"]
    validate:
      message: "Containers must not run as root"
      pattern:
        spec:
          containers:
          - securityContext:
              runAsNonRoot: true
              readOnlyRootFilesystem: true
              allowPrivilegeEscalation: false
```
Require non-root execution, block hostPID and hostNetwork, require resource limits, restrict registries to your internal registry plus an explicit allowlist.
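The registry restriction can follow the same pattern-matching style. A sketch, with the registry hostname as a placeholder for your internal registry:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: allow-internal-registry-only
    match:
      resources:
        kinds: ["Pod"]
    validate:
      # Denial message tells the developer exactly what to change
      message: "Images must come from registry.example.com. Retag and push to the internal registry."
      pattern:
        spec:
          containers:
          - image: "registry.example.com/*"
```

Note the denial message: it names the allowed registry and the fix, which matters for the adoption point below.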
What most teams get wrong matters more than which engine you pick. Policies that are easy to follow get followed. Policies that block real work without a clear fix get turned off quietly by frustrated engineers. Every policy should include a hint in its denial message telling the developer exactly what to change. A gate that tells you which form to fill out, not just “access denied.” Cloud-native security programs that treat admission control as a teaching tool stick. The ones that just block people get bypassed within weeks. DevSecOps shift-left practices reinforce this.
Runtime Detection With eBPF
Admission controllers stop policy-violating configs from running. They can’t stop an attacker who exploits a vulnerability in an app that’s running legitimately. The attacker isn’t trying to deploy a bad pod. They’re exploiting one that passed every check you built. The cargo cleared customs. The bomb was inside a legitimate shipment.
Falco (CNCF-maintained) uses eBPF to hook the Linux kernel and check every syscall against a rule set. Out of the box it catches the obvious attack patterns: shell spawn, /etc writes, /etc/shadow reads, outbound connections to unknown IPs, download-and-execute at runtime. Alerts fire within milliseconds. Barely any CPU cost per node. Security cameras inside every warehouse on the dock.
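A custom rule for the mining-pool pattern could look roughly like this. The `outbound` and `container` macros ship with Falco's default rules; the allowlist contents are an assumption you would fill in per workload:

```yaml
# Sketch of a Falco rule: flag egress from containers that normally make none.
- list: allowed_egress_ips
  items: [10.0.0.5]    # illustrative internal service IP
- rule: Unexpected Outbound Connection from App Pod
  desc: Detect egress from containers with no expected outbound connections
  condition: >
    outbound and container
    and not fd.sip in (allowed_egress_ips)
  output: >
    Unexpected outbound connection
    (command=%proc.cmdline connection=%fd.name container=%container.name)
  priority: WARNING
```

A rule like this is exactly what catches the opening scenario: the miner's connection to port 443 looks like normal HTTPS to a firewall, but not to a baseline that says this pod never dials out.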
Tetragon (from Cilium) goes further. It enforces in the kernel, blocking bad actions instead of just alerting. For the highest-risk violations (shell spawn in production, writes to sensitive paths, running downloaded binaries), blocking is the right call. You want to stop the cryptominer from running, not get a Slack notification that it started five minutes ago. Catching the thief beats reviewing the security footage.
- Deploy Falco in audit mode for 2-4 weeks before enforcing rules
- Behavioral baseline set up per workload type (which syscalls are “normal”)
- Alert routing configured to security on-call, not dumped into a noisy channel
- Falco rules tuned to suppress known-good patterns for your stack (init containers, sidecar proxies)
- Tetragon enforcement turned on only for highest-risk classes first (shell spawn, binary download-and-execute)
The behavioral baseline is the step most teams skip. Deploy runtime detection without knowing what “normal” looks like for your workloads and you get alert fatigue. Security turns sensitivity down. The tool becomes decoration. Security cameras everywhere, nobody watching the monitors. Back to square one with extra overhead.
Network Policy as the Perimeter That Actually Works
Default Kubernetes networking lets every pod reach every other pod across all namespaces. Convenient for development. Devastating for containment. A compromised analytics pod reaching your payment service is not a sophisticated attack. It’s the default. Every container on the dock with access to every warehouse. No locks on any door.
Default-deny with Cilium or Calico means you set explicit allow rules per namespace. The gap is rarely the tooling. Most cloud-native teams have the CNI installed already. The gap is knowing which namespaces need to talk to which, over which ports, and in which direction.
| When default-deny egress works | When it creates friction |
|---|---|
| Production namespaces with well-defined dependencies | Development environments where engineers explore new integrations |
| Workloads with stable outbound connection patterns | Services connecting with frequently changing third-party APIs |
| High-security namespaces (payment, PII processing) | Early deployment phases before traffic patterns are baselined |
Start with network policies on your highest-value namespaces first. Payment processing, PII storage, authentication services. The most valuable cargo gets the tightest security. Expand outward once the policy definitions stabilize.
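A starting point for one of those high-value namespaces: default-deny egress, with cluster DNS explicitly allowed so pods can still resolve names. The namespace and the DNS pod label are illustrative (the `k8s-app: kube-dns` label is common but varies by distribution):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: payments              # illustrative high-value namespace
spec:
  podSelector: {}                  # applies to every pod in the namespace
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector: {}        # any namespace...
      podSelector:
        matchLabels:
          k8s-app: kube-dns        # ...but only the cluster DNS pods
    ports:
    - protocol: UDP
      port: 53
```

From here, each legitimate dependency gets its own explicit allow rule. Everything else, including a mining pool, is blocked by default.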
The Visibility Gap Across Layers
No single layer catches everything. The value comes from overlap: what image scanning misses, admission control catches. What admission control can’t see, runtime detection watches. What runtime detection flags, network policy blocks. Factory inspection, port authority, dock cameras, warehouse locks. Each layer catches what the others miss.
| Security Layer | What It Catches | What It Misses | When It Runs |
|---|---|---|---|
| Image scanning (Trivy, Grype) | Known CVEs in packages | Runtime-downloaded malware, zero-days | Build time / CI |
| Image signing (cosign, Sigstore) | Tampered or unsigned images | Malicious code in legitimately signed images | Build + admission |
| Admission control (Kyverno, OPA) | Policy violations (root, privileged, no limits) | Behavior after admission | Deploy time |
| Runtime detection (Falco, eBPF) | Anomalous syscalls, unexpected network connections | Attacks that perfectly mimic normal behavior | Continuous |
| Network policy (Cilium, Calico) | Lateral movement, data exfiltration | Attacks within allowed traffic paths | Continuous |
Without eBPF runtime visibility and proper security operations, you find out about incidents when finance asks why costs doubled. The invoice, not the alarm. Secrets management closes another common vector: static credentials baked into environment variables.
How seccomp profiles add a sixth layer
Seccomp limits which of the 300+ Linux system calls a container can make. Most applications use fewer than 50. Enabling Kubernetes’ RuntimeDefault seccomp profile blocks kernel exploits that depend on uncommon syscalls like ptrace, mount, or bpf. One field in the pod’s security context, near-zero overhead, and it prevents an entire class of privilege escalation that would otherwise need a CVE-specific detection rule. Most teams never turn it on because the default (unconfined) doesn’t break anything visible. The lock you never install because the door opens fine without it.
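Enabling it is one stanza in the pod's securityContext (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                               # illustrative
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault                # block syscalls outside the runtime's default profile
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
```

Set at the pod level, the profile applies to every container in the pod unless a container overrides it.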
What the Industry Gets Wrong About Container Security
“Image scanning is container security.” Image scanning catches known CVEs at build time. It can’t catch runtime-downloaded malware, zero-day exploitation, privilege escalation through misconfigured capabilities, or outbound connections to attacker infrastructure. The cryptominer in the opening passed Trivy with zero high-severity findings because it was downloaded at runtime. Inspected at the factory. Loaded on the ship. Build-time scanning is necessary. Calling it sufficient is how six-hour dwell times happen.
“Shift-left means runtime security is unnecessary.” Shifting security left catches misconfigurations before deployment. It can’t stop an attacker exploiting a vulnerability in a legitimately deployed, policy-compliant application. Runtime detection with eBPF is the only layer that watches what containers actually do after they start running. Customs at the origin doesn’t replace customs at the destination.
“Default Kubernetes networking is acceptable for development.” Every pod reaching every other pod across all namespaces makes lateral movement trivial. A compromised analytics pod reaching your payment service is the default network behavior, not a sophisticated attack. Every door in the building unlocked. Default-deny with explicit allow rules is the minimum.
Now replay the opening cryptominer scenario against this stack. Falco flags the mining pool connection in seconds. Network policy blocks egress to unknown IPs. Six hours of dwell time collapse to moments. Factory inspection, port authority, dock cameras, warehouse locks. The contraband never makes it past the dock.