GitOps Beyond Kubernetes: Terraform, DBs, and Policy
Your on-call engineer gets paged for a connectivity issue between two services. She adds an inbound rule to a security group directly in the cloud console. The incident resolves. The Slack thread goes quiet. Everyone goes back to sleep. Nobody files a follow-up ticket.
An off-the-books transaction. No receipt. No entry in the ledger.
Three weeks later, someone on the infrastructure team runs terraform apply as part of a routine change. Terraform doesn’t know about the console fix. It sees drift from the declared state and quietly reverts the security group to its pre-incident configuration. The connectivity between those two services breaks again. Another page, another scramble, and this time the on-call engineer has no idea what changed because the Terraform apply log shows a “normal” change. She’s debugging a ghost. The accountant “corrected” a transaction they didn’t know was intentional. Same problem. Nobody connects the two events.
- Console fixes get reverted by terraform apply. Any manual change outside the declared state is a ghost waiting to cause the same incident twice.
- GitOps extends beyond Kubernetes. Terraform, network policies, DNS, IAM, database schemas. Anything with a declared state can be reconciled.
- Drift detection must run continuously, not just on deploy. Infrastructure that drifts between applies piles up invisible risk.
- PR-based infrastructure changes create audit trails automatically. Who changed what, when, why, and who approved it. Every compliance framework asks for this.
- Emergency console access needs a break-glass process that immediately creates a PR to codify the change. The receipt goes in the ledger within the hour.
ArgoCD and Flux taught the industry GitOps for Kubernetes. The reconciliation loop (compare desired to actual, close the gap) works for everything declarative: Terraform, database schemas, OPA policies, network config. Limiting GitOps to Kubernetes leaves the majority of the operational surface unreconciled. Keeping perfect books for one department while the rest of the company runs on napkin math.
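The reconciliation loop is small enough to sketch in a few lines. The Python below is a minimal illustration, not how ArgoCD or Flux are implemented: the `Action` type, the `reconcile` function, and the security group names are all hypothetical, and state is modeled as plain dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "delete"
    name: str


def reconcile(desired: dict, actual: dict) -> list[Action]:
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    for name in actual:
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


# Desired state as declared in Git (hypothetical security group rules).
desired = {
    "sg-web-443": {"port": 443, "cidr": "0.0.0.0/0"},
    "sg-db-5432": {"port": 5432, "cidr": "10.0.0.0/16"},
}
# Actual state as reported by the cloud API, including a console-added rule.
actual = {
    "sg-web-443": {"port": 443, "cidr": "0.0.0.0/0"},
    "sg-db-5432": {"port": 5432, "cidr": "10.0.1.0/24"},
    "sg-hotfix-8080": {"port": 8080, "cidr": "10.0.0.0/16"},
}

for action in reconcile(desired, actual):
    print(action.kind, action.name)
```

The console-added rule shows up as a pending delete. That is exactly the ghost revert from the opening story, unless something flags it for a human first.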
Infrastructure GitOps with Terraform
Running terraform plan locally and then terraform apply is not GitOps. It’s infrastructure-as-code without the reconciliation loop. Atlantis and Spacelift implement the real pattern: PR triggers plan, merge triggers apply, scheduled runs detect drift.
| GitOps Layer | Tool | Reconciliation | Drift Detection |
|---|---|---|---|
| Kubernetes workloads | ArgoCD / Flux | Continuous (every 3 min) | Built-in, visual diff |
| Cloud infrastructure | Atlantis / Spacelift + Terraform | PR-triggered apply | Scheduled terraform plan |
| Database schemas | Flyway / Liquibase in CI | Migration on merge | Schema comparison tools |
| DNS / CDN config | Terraform or OctoDNS | PR-triggered apply | Drift detection via API diff |
| IAM / RBAC policies | OPA / Kyverno in Git | Admission-time enforcement | Policy audit logs |
The Terraform workflow looks different from Kubernetes GitOps because human review stays in the loop. No auto-apply on drift detection. Every plan gets a PR.
Drift Detection: The Part Everyone Skips
A handful of manual console changes every month during incidents. “I’ll codify it tomorrow.” (Tomorrow never comes. It never does.) Those changes pile up as drift until terraform apply either reverts a critical fix or produces a plan with 47 unexpected changes nobody wants to untangle. Infrastructure-as-code gets abandoned while the actual infrastructure evolves through the console.
The fix is mechanical, not cultural. Run terraform plan in detect mode on a 30-minute schedule. Any non-empty plan opens a PR automatically and alerts the team. Within weeks, console drift drops steeply. Not because engineers changed their habits out of principle, but because the tool changed the feedback loop. Drift gets flagged before anyone forgets about it.
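One way to build the scheduled check is around `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 1 on error, and 2 when the plan is non-empty. The sketch below assumes that approach. The workspace paths and the helpers `run_plan`, `check_drift`, and `needs_attention` are hypothetical, and opening the PR and alerting are left to whatever scheduler or CI system runs it.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Iterable

# Exit codes of `terraform plan -detailed-exitcode`:
# 0 = no changes, 1 = error, 2 = changes present (drift).
STATUS_BY_EXIT_CODE = {0: "in_sync", 1: "error", 2: "drift"}


def run_plan(workdir: str) -> int:
    """Run a non-interactive plan in `workdir` and return Terraform's exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode


def check_drift(workdirs: Iterable[str], plan: Callable[[str], int] = run_plan) -> dict:
    """Map each workspace directory to "in_sync", "drift", or "error"."""
    return {w: STATUS_BY_EXIT_CODE.get(plan(w), "error") for w in workdirs}


def needs_attention(report: dict) -> list[str]:
    """Workspaces that should open a PR or page someone."""
    return sorted(w for w, status in report.items() if status != "in_sync")


# Demo with canned exit codes so the sketch runs without Terraform installed.
fake_codes = {"envs/prod-network": 0, "envs/prod-db": 2, "envs/prod-dns": 1}
report = check_drift(fake_codes, plan=fake_codes.__getitem__)
print(needs_attention(report))
```

Treating an unknown exit code as an error rather than as "in sync" is deliberate here. A drift check that fails silently is worse than no check.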
Database Schema as Code
Database migrations are the exception that proves the GitOps rule. Kubernetes manifests and Terraform configs are bidirectional: sync to any commit, forward or rollback. Database migrations are forward-only. Reverting a commit doesn’t drop a column or un-rename a table.
- All schema migrations stored in version control with sequential numbering or timestamps
- CI pipeline validates every migration against a test database before merge approval
- Expand/contract pattern adopted for all breaking schema changes (add new column, migrate data, drop old column in a later migration)
- Migration execution time logged and alerted when exceeding expected bounds
- Rollback migrations written and tested for every change that modifies existing columns
Flyway and Liquibase store migrations in version control with CI validation against a test database before production apply. The safety net: run every migration against a test database on every PR. Fast enough to finish in seconds for most schemas. Catches syntax errors, constraint violations, and data type mismatches before they hit production.
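A rough idea of what that PR check does, using an in-memory SQLite database as a stand-in for the test database. This is a simplified sketch, not Flyway's or Liquibase's own validation. The `validate` and `ordered` helpers are hypothetical, the filename pattern covers only integer versions of the Flyway-style `V<version>__<description>.sql` convention, and a real pipeline would test against the same engine as production.

```python
from __future__ import annotations

import re
import sqlite3

# Simplified Flyway-style versioned names: V<integer>__<description>.sql
NAME = re.compile(r"^V(\d+)__\w+\.sql$")


def ordered(migrations: dict) -> list[tuple[str, str]]:
    """Return (name, sql) pairs sorted by version; reject bad or duplicate versions."""
    by_version = {}
    for name, sql in migrations.items():
        match = NAME.match(name)
        if match is None:
            raise ValueError(f"bad migration name: {name}")
        version = int(match.group(1))
        if version in by_version:
            raise ValueError(f"duplicate version {version}: {name}")
        by_version[version] = (name, sql)
    return [by_version[v] for v in sorted(by_version)]


def validate(migrations: dict) -> list[str]:
    """Apply every migration to a fresh test database; return error messages."""
    try:
        plan = ordered(migrations)
    except ValueError as exc:
        return [str(exc)]
    db = sqlite3.connect(":memory:")
    try:
        for name, sql in plan:
            try:
                db.executescript(sql)
            except sqlite3.Error as exc:
                return [f"{name}: {exc}"]  # stop at the first failing migration
    finally:
        db.close()
    return []


migrations = {
    "V2__add_email.sql": "ALTER TABLE users ADD COLUMN email TEXT;",
    "V1__create_users.sql": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);",
    # Deliberate typo in the column name to show a failure caught before merge.
    "V3__index_email.sql": "CREATE INDEX idx_users_email ON users (emial);",
}
for error in validate(migrations):
    print(error)
```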
For breaking changes, the expand/contract pattern is mandatory. Add the new column in one migration, backfill data, update application code to write to both columns, then drop the old column in a separate migration after all consumers have migrated. More ceremony than a direct rename. Far fewer incidents.
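The sequence is easier to follow when each step is written out. The sketch below walks a hypothetical rename of `fullname` to `display_name` through expand, backfill, dual-write, and contract on an in-memory SQLite database. The table, the column names, and the `create_user` and `safe_to_contract` helpers are illustrative assumptions. The contract step rebuilds the table because older SQLite versions lack `DROP COLUMN`. On PostgreSQL or MySQL it would be a plain `ALTER TABLE ... DROP COLUMN`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
db.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Migration 1 (expand): add the new column next to the old one.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Migration 2 (backfill): copy existing data into the new column.
db.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")


def create_user(conn: sqlite3.Connection, name: str) -> None:
    """Application code during the transition: write to both columns."""
    conn.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )


def safe_to_contract(conn: sqlite3.Connection) -> bool:
    """Gate for the final step: no row may still depend on the old column alone."""
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE display_name IS NULL AND fullname IS NOT NULL"
    ).fetchone()
    return missing == 0


create_user(db, "Grace Hopper")

# Migration 3 (contract), shipped later and only once the gate passes.
if safe_to_contract(db):
    db.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT);
        INSERT INTO users_new (id, display_name) SELECT id, display_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )

columns = [row[1] for row in db.execute("PRAGMA table_info(users)")]
rows = db.execute("SELECT id, display_name FROM users ORDER BY id").fetchall()
print(columns, rows)
```

Each numbered step would live in its own migration file and its own release. Nothing in this sketch enforces that separation, but the separation is what makes the pattern safe.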
Policy-as-Code: Security Without Review Queues
OPA Gatekeeper evaluates admission requests in under 5ms. Container running as root? Rejected with a specific fix message in the kubectl output. No waiting for a security review. No spreadsheet of exceptions. The policy is code, the enforcement is real-time, and the audit trail is a Git commit. Automated enforcement that never takes a break and never misses one.
Start with a handful of critical rules: no privileged containers, no public load balancers without approval, required resource limits on every pod. Grow into the hundreds as the team gains confidence. When an auditor asks “when was this policy enacted and who approved it?” the answer is a Git commit with timestamp, diff, and reviewed PR. Compliance audits get shorter. Security posture improves. Both from the same mechanism.
Don’t: Keep security policies in a wiki or spreadsheet and rely on manual review to catch violations. Policies that exist only in documentation are hypotheses about what production looks like. They’re rarely accurate.
Do: Express policies as OPA Rego or Kyverno YAML in Git. Deploy through the same PR review process as application code. Enforce at admission time. A policy violation is rejected before the resource ever exists in the cluster.
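    # OPA policy: deny containers running as root.
    # Written in pre-1.0 Rego syntax; OPA 1.0+ expects `deny contains msg if { ... }`.
    package kubernetes.admission

    deny[msg] {
        # Only evaluate Pod admission requests.
        input.request.kind.kind == "Pod"
        # Iterate over every container in the Pod spec.
        container := input.request.object.spec.containers[_]
        # Matches only an explicit runAsUser of 0.
        container.securityContext.runAsUser == 0
        msg := sprintf(
            "Container '%s' runs as root (UID 0). Set runAsNonRoot: true",
            [container.name]
        )
    }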
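Because the policy lives in Git, the same rule can also run against sample manifests before merge. The Python below mirrors the Rego check for that purpose. It is a hypothetical sketch, not a replacement for evaluating the Rego itself with tools such as `opa test` or conftest. The `root_violations` helper is an assumption of this example. Unlike the Rego rule, it also falls back to the pod-level `securityContext` when a container does not set its own `runAsUser`.

```python
from __future__ import annotations


def root_violations(pod: dict) -> list[str]:
    """Return one message per container that explicitly runs as UID 0."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    messages = []
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        # A container-level runAsUser overrides the pod-level value.
        uid = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        if uid == 0:
            messages.append(
                f"Container '{container['name']}' runs as root (UID 0). "
                "Set runAsNonRoot: true"
            )
    return messages


# Hypothetical manifest: the pod-level context sets UID 0, one container overrides it.
pod = {
    "kind": "Pod",
    "spec": {
        "securityContext": {"runAsUser": 0},
        "containers": [
            {"name": "app"},
            {"name": "sidecar", "securityContext": {"runAsUser": 1000}},
        ],
    },
}
for message in root_violations(pod):
    print(message)
```

Like the Rego rule, this only catches an explicit UID 0. A container with no `runAsUser` anywhere still runs as whatever user its image defaults to.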
The Incident Response Shift
When an incident demands a configuration change, the instinct is to fix it in the console. Faster. More direct. But that console fix becomes a ghost the next time terraform apply runs.
Committing the fix to Git instead of the console takes 3-5 minutes longer during the incident. In exchange: an audit trail, pipeline validation, no silent revert. For DevOps teams deploying 10+ times per day, committing-first becomes muscle memory within 2-3 weeks. Scheduled drift detection flags unauthorized console changes within 30 minutes, and each one has to be codified or reverted through a PR. After a few of those, engineers discover that Git-first is the path of least resistance. Writing the receipt is faster than explaining why there isn’t one.
| When to commit-first | When break-glass is justified |
|---|---|
| Configuration change with known fix | Customer-impacting outage with unknown root cause |
| Change that can wait 3-5 minutes for PR | Security incident needing immediate action |
| Non-time-critical infrastructure modification | Complete service failure where minutes of downtime compound |
For break-glass scenarios, the process must include an automatic follow-up: the console change triggers a PR within the hour to codify it. Without that follow-up mechanism, break-glass becomes the default and GitOps degrades to “infrastructure in Git, sometimes.” Within the hour. No exceptions.
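The follow-up rule can be checked mechanically. The sketch below assumes break-glass events are already being recorded somewhere, for example from cloud audit logs. The `BreakGlassEvent` type, the `overdue` helper, and the resource names are hypothetical, and wiring the result to an alert or an auto-generated PR is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

FOLLOW_UP_WINDOW = timedelta(hours=1)  # "within the hour"


@dataclass
class BreakGlassEvent:
    resource: str
    changed_at: datetime
    pr_opened_at: Optional[datetime] = None  # None until a codifying PR exists


def overdue(events: list[BreakGlassEvent], now: datetime) -> list[str]:
    """Resources changed via break-glass that still have no PR after the window."""
    return [
        event.resource
        for event in events
        if event.pr_opened_at is None and now - event.changed_at > FOLLOW_UP_WINDOW
    ]


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
events = [
    # Changed 90 minutes ago, never codified: overdue.
    BreakGlassEvent("security-group-rule", now - timedelta(minutes=90)),
    # Changed 20 minutes ago: still inside the window.
    BreakGlassEvent("dns-record", now - timedelta(minutes=20)),
    # Changed 3 hours ago, PR opened 30 minutes later: fine.
    BreakGlassEvent(
        "iam-role",
        now - timedelta(hours=3),
        pr_opened_at=now - timedelta(hours=2, minutes=30),
    ),
]
print(overdue(events, now))
```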
Extend gradually: Kubernetes first (months 1-2), Terraform with drift detection (3-4), database migrations (5-6), policy-as-code through platform engineering (7-8). By month 8, “what changed?” always has an answer in Git.
Push vs. pull model: when each applies
Push model: CI/CD pushes changes to the target system on every commit. Simpler to set up. Works well for Terraform applies and database migrations where the change is triggered by a merge event. Downside: the CI system needs credentials to the production environment, expanding the attack surface.
Pull model: an agent running in the target environment (ArgoCD, Flux) polls Git every 3-5 minutes and pulls changes. The agent authenticates outbound to Git, which means no inbound credentials to production from CI. Continuous reconciliation handles network gaps gracefully. If the cluster was unreachable when a change was pushed, the pull model catches up automatically on the next poll.
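The catch-up behavior falls out of how the agent compares state, as this minimal sketch suggests. It is not ArgoCD's or Flux's implementation: the `PullAgent` class, its callbacks, and the commit hashes are hypothetical, and the network gap is simulated as Git being unreachable from the agent.

```python
from __future__ import annotations

from typing import Callable, Optional


class PullAgent:
    """Polls Git for the desired commit and applies it when it differs from what runs."""

    def __init__(self, fetch_head: Callable[[], str], apply: Callable[[str], None]) -> None:
        self.fetch_head = fetch_head
        self.apply = apply
        self.applied: Optional[str] = None

    def poll(self) -> bool:
        """One reconciliation pass. Returns True if a change was applied."""
        try:
            head = self.fetch_head()
        except ConnectionError:
            return False  # Git unreachable: keep current state, retry on the next poll
        if head == self.applied:
            return False
        self.apply(head)
        self.applied = head
        return True


# Simulated Git: two polls fail during a network gap, then a newer commit is visible.
responses = iter(["a1f", ConnectionError("gap"), ConnectionError("gap"), "c3d", "c3d"])


def fetch_head() -> str:
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


deployed = []
agent = PullAgent(fetch_head, deployed.append)
results = [agent.poll() for _ in range(5)]
print(results, deployed)
```

Because each pass compares against the last applied commit rather than reacting to a one-time event, a missed poll costs nothing. The next successful poll converges on the current head.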
For Kubernetes, pull is the clear winner because of the security model and continuous reconciliation. For Terraform and database migrations, push is more practical because the “reconciliation” is a plan-and-apply cycle that shouldn’t run on an automatic schedule without human review.
A console fix made during an incident is silently undone the next time terraform apply runs. The engineer who made the fix doesn’t know it was undone. The engineer running Terraform doesn’t know the fix existed. An off-the-books payment that the accountant “corrected” during reconciliation. The incident recurs weeks later and nobody connects the two events. Drift detection on a 30-minute schedule eliminates the ghost by flagging the console change before anyone forgets about it.
What the Industry Gets Wrong About GitOps
“GitOps is Kubernetes.” GitOps is a principle: declared state in Git, automated reconciliation, drift detection. Kubernetes is one target. Terraform, DNS, IAM policies, database schemas, and network configuration all benefit from the same principle. Limiting GitOps to Kubernetes leaves everything else in the manual-apply-and-hope model.
“Drift detection is optional.” Drift detection is the entire point. Without it, GitOps is just “infrastructure in Git” with no guarantee that Git reflects reality. Infrastructure that drifts between applies piles up risk nobody can see until the next terraform plan produces a wall of unexpected changes.
“GitOps means no manual access to production.” GitOps means every change flows through Git as the source of truth. Break-glass access for emergencies is still necessary. The difference is that break-glass changes get codified in a PR within the hour, not forgotten until the next terraform apply reverts them.
Run terraform plan in detect mode on a 30-minute schedule against every production workspace. Alert on any drift. Require a PR to codify or revert every detected change within 48 hours. This single practice eliminates ghost reverts, undocumented console fixes, and the “what changed?” question that derails incident investigations. One practice. The books always match reality.
That security group rule, added in the console during an incident, quietly reverted three weeks later by terraform apply? With drift detection on a 30-minute schedule and every change flowing through a PR, the console fix gets flagged within the hour and codified before anyone forgets about it. No ghost revert. No repeat incident. Git and reality finally match.