Container Security Beyond the Build
Your on-call gets paged for CPU saturation across every node. Dashboard shows 80% utilization, evenly distributed, zero increase in application traffic. Something is very wrong.
A cryptominer. Deployed through a compromised CI pipeline. The image passed Trivy with zero high-severity findings. Passed Kyverno admission policies. Ran as non-root. Connected to a mining pool over port 443, indistinguishable from normal HTTPS. Every security control did its job. The attack still got through.
The miner ran for six hours before the cost spike triggered investigation. The image scan was clean because the miner was downloaded at runtime via a curl command in the application’s startup script. You inspected the shipping container at the factory. Clean. The contraband was loaded at a port stop along the way.
- Build-time scanning catches known CVEs. Runtime attacks bypass it entirely. Malware downloaded at runtime via curl passes every image scan. The gap between a clean build and a compromised runtime is where real attacks land.
- Behavioral profiling catches what signature matching misses. A container that never made outbound connections suddenly reaching a mining pool on port 443 is suspicious regardless of the binary’s hash.
- Default-deny egress is the highest-impact runtime control. Explicit allowlists block data exfiltration and C2 communication before lateral movement begins.
- Immutable containers (read-only root filesystem) prevent runtime code injection. If nothing can be written, nothing can be downloaded.
- Falco runtime rules catch syscall anomalies like unexpected process execution, sensitive file reads, and privilege escalation attempts with tiny CPU overhead.
Most container security programs are front-loaded at the build-time layer. Teams set up image scanning, celebrate the green build, and move on. Nobody watches what containers actually do once they’re running. All customs at the port of origin. None at the destination. That blind spot is where real attacks land.
Start With the Image
Hardening the container image is the right starting point. Just not the ending point. Locking the shipping container matters, but it doesn’t help if nobody checks what gets added during transit.
Use minimal base images. Alpine-based images cut the package count from 400+ (Ubuntu) to under 50. Distroless images go further: no shell, no package manager, no debugging tools. A distroless Python image means an attacker with code execution can’t install anything, can’t spawn a shell, and can’t use standard tooling to set up camp. The trade-off is real. Debugging needs kubectl debug with an ephemeral container rather than kubectl exec. For production workloads, that trade-off pays for itself. Fewer tools in the container, fewer tools for the attacker. A shipping container with nothing inside worth stealing.
Run as a non-root user. Add USER 1001 to your Dockerfile. One line. It blocks an entire class of container escape scenarios where a kernel exploit inherits root on the host. Pair it with readOnlyRootFilesystem: true in your Kubernetes pod security context. If the application needs temporary files, mount a specific writable emptyDir volume for that purpose only. Everything else stays immutable. A locked container where only one small compartment opens.
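Concretely, the non-root and read-only settings described above look like this in a pod spec. This is a minimal sketch; the pod name, image reference, and UID are illustrative, not taken from any particular deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                               # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp                     # the one writable compartment
  volumes:
  - name: tmp
    emptyDir: {}                          # scratch space, discarded with the pod
```

The emptyDir mount is the "one small compartment": the application can write temporary files to /tmp, and nothing else on the filesystem accepts writes.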
Don’t: Leave CAP_SYS_ADMIN enabled “because the container won’t start without it.” That capability is effectively root and enables mounting filesystems. The actual fix is almost always a log directory permission or volume mount change. Giving the container a skeleton key because you didn’t want to find the right one.
Do: capabilities: { drop: ["ALL"] } in your container security context. Add back only what the application genuinely needs. Most web applications need zero additional capabilities.
| Control | Dockerfile / Build | Kubernetes Pod Security | Impact |
|---|---|---|---|
| Non-root user | USER 1000:1000 | runAsNonRoot: true, runAsUser: 1000 | Prevents container escape to host root. Single most important control |
| Read-only filesystem | N/A | readOnlyRootFilesystem: true | Blocks attackers from writing malware to disk. Mount writable /tmp if needed |
| Minimal base image | FROM gcr.io/distroless/static or alpine | N/A | Fewer packages = fewer CVEs. Distroless has no shell (attackers can’t exec in) |
| No privilege escalation | N/A | allowPrivilegeEscalation: false | Prevents setuid binaries from gaining elevated permissions |
| Drop all capabilities | N/A | drop: ["ALL"] | Linux capabilities default to a dangerous set. Drop all, add back only what’s needed |
| Resource limits | N/A | limits: cpu, memory | Prevents crypto-mining from consuming the node. Also prevents OOM cascade |
| Image scanning | Trivy in CI pipeline | Kyverno/OPA policy: only signed, scanned images deploy | Catches known CVEs before they reach production |
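In CI, the scanning gate from the last row can be a single step that fails the build on high-severity findings. A sketch using a GitHub Actions step (the workflow shape and image name are illustrative; the flags are Trivy's own):

```yaml
# Illustrative CI step: trivy's --exit-code 1 turns the scan into a hard gate,
# so a HIGH or CRITICAL finding fails the pipeline instead of just logging.
- name: Scan image for known CVEs
  run: |
    trivy image --severity HIGH,CRITICAL --exit-code 1 registry.example.com/app:1.0
```

Remember what this gate does and doesn't cover: it blocks images with known CVEs, and nothing downloaded after the container starts.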
Admission Control: The Gate Before Workloads Run
Image hardening is under your direct control. Third-party images, operator-deployed workloads, and developer mistakes are not. Kubernetes admission controllers enforce security policy before any workload reaches the cluster, regardless of how it was created. Port authority checking the manifest before the container enters the dock.
OPA Gatekeeper and Kyverno both intercept resource creation, checking requests against your policy library. The baseline policies for every namespace:
```yaml
# Kyverno: enforce non-root, read-only filesystem, no privilege escalation
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-security-context
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-non-root
    match:
      resources:
        kinds: ["Pod"]
    validate:
      message: "Containers must not run as root"
      pattern:
        spec:
          containers:
          - securityContext:
              runAsNonRoot: true
              readOnlyRootFilesystem: true
              allowPrivilegeEscalation: false
```
Require non-root execution, block hostPID and hostNetwork, require resource limits, restrict registries to your internal registry plus an explicit allowlist.
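The registry restriction can follow the same pattern-matching style. A sketch, with the registry hostname as a placeholder for your internal registry:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: allow-internal-registry-only
    match:
      resources:
        kinds: ["Pod"]
    validate:
      # Denial message tells the developer exactly what to change
      message: "Images must come from registry.example.com. Retag and push to the internal registry."
      pattern:
        spec:
          containers:
          - image: "registry.example.com/*"
```

Note the denial message: it names the allowed registry and the fix, which matters for the adoption point below.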
What most teams get wrong matters more than which engine you pick. Policies that are easy to follow get followed. Policies that block real work without a clear fix get turned off quietly by frustrated engineers. Every policy should include a hint in its denial message telling the developer exactly what to change. A gate that tells you which form to fill out, not just “access denied.” Cloud-native security programs that treat admission control as a teaching tool stick. The ones that just block people get bypassed within weeks. DevSecOps shift-left practices reinforce this.
Runtime Detection With eBPF
Admission controllers stop policy-violating configs from running. They can’t stop an attacker who exploits a vulnerability in an app that’s running legitimately. The attacker isn’t trying to deploy a bad pod. They’re exploiting one that passed every check you built. The cargo cleared customs. The bomb was inside a legitimate shipment.
Falco (CNCF-maintained) uses eBPF to hook the Linux kernel and check every syscall against a rule set. Out of the box it catches the obvious attack patterns: shell spawn, /etc writes, /etc/shadow reads, outbound connections to unknown IPs, download-and-execute at runtime. Alerts fire within milliseconds. Barely any CPU cost per node. Security cameras inside every warehouse on the dock.
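A custom rule for the mining-pool pattern could look roughly like this. The `outbound` and `container` macros ship with Falco's default rules; the allowlist contents are an assumption you would fill in per workload:

```yaml
# Sketch of a Falco rule: flag egress from containers that normally make none.
- list: allowed_egress_ips
  items: [10.0.0.5]    # illustrative internal service IP
- rule: Unexpected Outbound Connection from App Pod
  desc: Detect egress from containers with no expected outbound connections
  condition: >
    outbound and container
    and not fd.sip in (allowed_egress_ips)
  output: >
    Unexpected outbound connection
    (command=%proc.cmdline connection=%fd.name container=%container.name)
  priority: WARNING
```

A rule like this is exactly what catches the opening scenario: the miner's connection to port 443 looks like normal HTTPS to a firewall, but not to a baseline that says this pod never dials out.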
Tetragon (from Cilium) goes further. It enforces in the kernel, blocking bad actions instead of just alerting. For the highest-risk violations (shell spawn in production, writes to sensitive paths, running downloaded binaries), blocking is the right call. You want to stop the cryptominer from running, not get a Slack notification that it started five minutes ago. Catching the thief beats reviewing the security footage.
- Deploy Falco in audit mode for 2-4 weeks before enforcing rules
- Behavioral baseline set up per workload type (which syscalls are “normal”)
- Alert routing configured to security on-call, not dumped into a noisy channel
- Falco rules tuned to suppress known-good patterns for your stack (init containers, sidecar proxies)
- Tetragon enforcement turned on only for highest-risk classes first (shell spawn, binary download-and-execute)
The behavioral baseline is the step most teams skip. Deploy runtime detection without knowing what “normal” looks like for your workloads and you get alert fatigue. Security turns sensitivity down. The tool becomes decoration. Security cameras everywhere, nobody watching the monitors. Back to square one with extra overhead.
Network Policy as the Perimeter That Actually Works
Default Kubernetes networking lets every pod reach every other pod across all namespaces. Convenient for development. Devastating for containment. A compromised analytics pod reaching your payment service is not a sophisticated attack. It’s the default. Every container on the dock with access to every warehouse. No locks on any door.
Default-deny with Cilium or Calico means you set explicit allow rules per namespace. The gap is rarely the tooling. Most cloud-native teams have the CNI installed already. The gap is knowing which namespaces need to talk to which, over which ports, and in which direction.
| When default-deny egress works | When it creates friction |
|---|---|
| Production namespaces with well-defined dependencies | Development environments where engineers explore new integrations |
| Workloads with stable outbound connection patterns | Services connecting with frequently changing third-party APIs |
| High-security namespaces (payment, PII processing) | Early deployment phases before traffic patterns are baselined |
Start with network policies on your highest-value namespaces first. Payment processing, PII storage, authentication services. The most valuable cargo gets the tightest security. Expand outward once the policy definitions stabilize.
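A starting point for one of those high-value namespaces: default-deny egress, with cluster DNS explicitly allowed so pods can still resolve names. The namespace and the DNS pod label are illustrative (the `k8s-app: kube-dns` label is common but varies by distribution):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: payments              # illustrative high-value namespace
spec:
  podSelector: {}                  # applies to every pod in the namespace
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector: {}        # any namespace...
      podSelector:
        matchLabels:
          k8s-app: kube-dns        # ...but only the cluster DNS pods
    ports:
    - protocol: UDP
      port: 53
```

From here, each legitimate dependency gets its own explicit allow rule. Everything else, including a mining pool, is blocked by default.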
The Visibility Gap Across Layers
No single layer catches everything. The value comes from overlap: what image scanning misses, admission control catches. What admission control can’t see, runtime detection watches. What runtime detection flags, network policy blocks. Factory inspection, port authority, dock cameras, warehouse locks. Each layer catches what the others miss.
| Security Layer | What It Catches | What It Misses | When It Runs |
|---|---|---|---|
| Image scanning (Trivy, Grype) | Known CVEs in packages | Runtime-downloaded malware, zero-days | Build time / CI |
| Image signing (cosign, Sigstore) | Tampered or unsigned images | Malicious code in legitimately signed images | Build + admission |
| Admission control (Kyverno, OPA) | Policy violations (root, privileged, no limits) | Behavior after admission | Deploy time |
| Runtime detection (Falco, eBPF) | Anomalous syscalls, unexpected network connections | Attacks that perfectly mimic normal behavior | Continuous |
| Network policy (Cilium, Calico) | Lateral movement, data exfiltration | Attacks within allowed traffic paths | Continuous |
Without eBPF runtime visibility and proper security operations, you find out about incidents when finance asks why costs doubled. The invoice, not the alarm. Secrets management closes another common vector: static credentials baked into environment variables.
How seccomp profiles add a sixth layer
Seccomp limits which of the 300+ Linux system calls a container can make. Most applications use fewer than 50. Enabling Kubernetes’ RuntimeDefault seccomp profile blocks kernel exploits that depend on uncommon syscalls like ptrace, mount, or bpf. One field in the pod’s security context, near-zero overhead, and it prevents an entire class of privilege escalation that would otherwise need a CVE-specific detection rule. Most teams never turn it on because the default (unconfined) doesn’t break anything visible. The lock you never install because the door opens fine without it.
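Enabling it is one stanza in the pod's securityContext (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                               # illustrative
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault                # block syscalls outside the runtime's default profile
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
```

Set at the pod level, the profile applies to every container in the pod unless a container overrides it.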
What the Industry Gets Wrong About Container Security
“Image scanning is container security.” Image scanning catches known CVEs at build time. It can’t catch runtime-downloaded malware, zero-day exploitation, privilege escalation through misconfigured capabilities, or outbound connections to attacker infrastructure. The cryptominer in the opening passed Trivy with zero high-severity findings because it was downloaded at runtime. Inspected at the factory. Loaded on the ship. Build-time scanning is necessary. Calling it sufficient is how six-hour dwell times happen.
“Shift-left means runtime security is unnecessary.” Shifting security left catches misconfigurations before deployment. It can’t stop an attacker exploiting a vulnerability in a legitimately deployed, policy-compliant application. Runtime detection with eBPF is the only layer that watches what containers actually do after they start running. Customs at the origin doesn’t replace customs at the destination.
“Default Kubernetes networking is acceptable for development.” Every pod reaching every other pod across all namespaces makes lateral movement trivial. A compromised analytics pod reaching your payment service is the default network behavior, not a sophisticated attack. Every door in the building unlocked. Default-deny with explicit allow rules is the minimum.
Now replay the opening cryptominer scenario against this stack. Falco flags the mining pool connection in seconds. Network policy blocks egress to unknown IPs. Six hours of dwell time collapse to moments. Factory inspection, port authority, dock cameras, warehouse locks. The contraband never makes it past the dock.