GitOps Beyond Kubernetes: Terraform, DBs, and Policy

Metasphere Engineering 12 min read

Your on-call engineer gets paged for a connectivity issue between two services. She adds an inbound rule to a security group directly in the cloud console. The incident resolves. The Slack thread goes quiet. Everyone goes back to sleep. Nobody files a follow-up ticket.

An off-the-books transaction. No receipt. No entry in the ledger.

Three weeks later, someone on the infrastructure team runs terraform apply as part of a routine change. Terraform doesn’t know about the console fix. It sees drift from the declared state and quietly reverts the security group to its pre-incident configuration. The connectivity between those two services breaks again. Another page, another scramble, and this time the on-call engineer has no idea what changed because the Terraform apply log shows a “normal” change. She’s debugging a ghost. The accountant “corrected” a transaction they didn’t know was intentional. Same problem. Nobody connects the two events.

Key takeaways
  • Console fixes get reverted by terraform apply. Any manual change outside the declared state is a ghost waiting to cause the same incident twice.
  • GitOps extends beyond Kubernetes. Terraform, network policies, DNS, IAM, database schemas. Anything with a declared state can be reconciled.
  • Drift detection must run continuously, not just on deploy. Infrastructure that drifts between applies piles up invisible risk.
  • PR-based infrastructure changes create audit trails automatically. Who changed what, when, why, and who approved it. Every compliance framework asks for this.
  • Emergency console access needs a break-glass process that immediately creates a PR to codify the change. The receipt goes in the ledger within the hour.

ArgoCD and Flux taught the industry GitOps for Kubernetes. The reconciliation loop (compare desired to actual, close the gap) works for everything declarative: Terraform, database schemas, OPA policies, network config. Limiting GitOps to Kubernetes leaves the majority of the operational surface unreconciled. Keeping perfect books for one department while the rest of the company runs on napkin math.
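The reconciliation loop described above can be sketched in a few lines. This is a minimal illustration, not how ArgoCD, Flux, or Terraform implement it: state is modeled as plain dictionaries, and the names `diff_state`, `reconcile`, `sg-web`, and `sg-temp` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

State = Dict[str, Any]  # resource name -> declared configuration (assumed shape)


def diff_state(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to make actual match desired."""
    actions: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in actual:
            actions.append(("create", key))
        elif actual[key] != desired[key]:
            actions.append(("update", key))
    for key in sorted(actual):
        if key not in desired:
            actions.append(("delete", key))
    return actions


def reconcile(desired: State, actual: State) -> State:
    """Close the gap: return a new actual state that equals the desired state."""
    converged = dict(actual)
    for action, key in diff_state(desired, actual):
        if action == "delete":
            del converged[key]
        else:
            converged[key] = desired[key]
    return converged


if __name__ == "__main__":
    desired = {"sg-web": {"ingress": [443]}}
    # A console fix added port 8080 and a temporary group that Git never saw.
    actual = {"sg-web": {"ingress": [443, 8080]}, "sg-temp": {"ingress": [22]}}
    print(diff_state(desired, actual))
```

Note that the diff treats the console fix as drift to be removed. That is exactly the revert from the opening story, which is why the fix has to reach Git before the next reconcile.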

Infrastructure GitOps with Terraform

Running terraform plan locally and then terraform apply is not GitOps. It’s infrastructure-as-code without the reconciliation loop. Atlantis and Spacelift implement the real pattern: PR triggers plan, merge triggers apply, scheduled runs detect drift.

| GitOps Layer | Tool | Reconciliation | Drift Detection |
|---|---|---|---|
| Kubernetes workloads | ArgoCD / Flux | Continuous (every 3 min) | Built-in, visual diff |
| Cloud infrastructure | Atlantis / Spacelift + Terraform | PR-triggered apply | Scheduled terraform plan |
| Database schemas | Flyway / Liquibase in CI | Migration on merge | Schema comparison tools |
| DNS / CDN config | Terraform or OctoDNS | PR-triggered apply | Drift detection via API diff |
| IAM / RBAC policies | OPA / Kyverno in Git | Admission-time enforcement | Policy audit logs |

Figure: The drift time bomb. Code and live infrastructure start in sync. During an incident, an engineer adds a security group rule via the AWS console. Time passes and nobody updates the code, so drift accumulates. A routine terraform apply sees the drift and reverts the security group to the code state. The manual fix is silently undone, services break again, and the on-call has no idea what changed.

The Terraform workflow looks different from Kubernetes GitOps because human review stays in the loop. No auto-apply on drift detection. Every plan gets a PR.

Figure: Terraform GitOps: PR triggers plan, merge triggers apply. An engineer changes .tf files in a branch and opens a PR. terraform plan runs automatically and the diff is posted as a PR comment, reviewable before merge. The team reviews the plan, approves the diff, and merges. Atlantis / Spacelift runs terraform apply so infrastructure matches code. Drift detection runs every 30 min. Git is the single source of truth for infrastructure.

Drift Detection: The Part Everyone Skips

A handful of manual console changes every month during incidents. “I’ll codify it tomorrow.” (Tomorrow never comes. It never does.) Those changes pile up as drift until terraform apply either reverts a critical fix or produces a plan with 47 unexpected changes nobody wants to untangle. Infrastructure-as-code gets abandoned while the actual infrastructure evolves through the console.

The fix is mechanical, not cultural. Run terraform plan as a read-only drift check on a 30-minute schedule (the -detailed-exitcode flag reports a non-empty plan through the exit code). Any non-empty plan opens a PR automatically and alerts the team. Within weeks, console drift drops steeply. Not because engineers changed their habits out of principle, but because the tool changed the feedback loop. Drift gets flagged before anyone forgets about it.
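A scheduled drift check along these lines could look roughly like the sketch below. It assumes the check is built on `terraform plan -detailed-exitcode`, whose exit codes are 0 (no changes), 1 (error), and 2 (changes present). The function names, the `on_drift` hook, and the `envs/prod` path are hypothetical; the scheduler and the step that opens the PR are left to whatever CI system runs this.

```python
import subprocess
from typing import Callable

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_CLEAN, PLAN_ERROR, PLAN_DRIFT = 0, 1, 2


def classify_plan(exit_code: int) -> str:
    """Map a plan exit code to a drift status."""
    if exit_code == PLAN_CLEAN:
        return "in_sync"
    if exit_code == PLAN_DRIFT:
        return "drift"
    return "error"


def run_terraform_plan(workdir: str) -> int:
    """Run a read-only plan in workdir and return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         "-lock=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # exit code 2 means drift, not a failure to raise on
    )
    return result.returncode


def check_workspace(
    workdir: str,
    plan: Callable[[str], int] = run_terraform_plan,
    on_drift: Callable[[str], None] = print,
) -> str:
    """Check one workspace; call on_drift (alert, open PR) when the plan is non-empty."""
    status = classify_plan(plan(workdir))
    if status == "drift":
        on_drift(f"drift detected in {workdir}")
    return status
```

Passing the plan runner in as a parameter keeps the classification logic testable without a Terraform binary, for example `check_workspace("envs/prod", plan=lambda _: 2)`.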

Figure: GitOps continuous reconciliation loop. The Git repository holds the desired state (source of truth). The reconciler (ArgoCD / Flux) compares desired vs actual every 3 minutes. On a match, the system is in sync. On drift, the reconciler applies the Git state and the manual change is reverted. Git always wins.

Database Schema as Code

Database migrations are the exception that proves the GitOps rule. Kubernetes manifests and Terraform configs are bidirectional: sync to any commit, forward or rollback. Database migrations are forward-only. Reverting a commit doesn’t drop a column or un-rename a table.

Prerequisites
  1. All schema migrations stored in version control with sequential numbering or timestamps
  2. CI pipeline validates every migration against a test database before merge approval
  3. Expand/contract pattern adopted for all breaking schema changes (add new column, migrate data, drop old column in a later migration)
  4. Migration execution time logged and alerted when exceeding expected bounds
  5. Rollback migrations written and tested for every change that modifies existing columns

Flyway and Liquibase store migrations in version control with CI validation against a test database before production apply. The safety net: run every migration against a test database on every PR. Fast enough to finish in seconds for most schemas. Catches syntax errors, constraint violations, and data type mismatches before they hit production.
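The "run every migration against a test database" check can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database as a stand-in, whereas a real pipeline would run Flyway or Liquibase against the same engine as production. The function name, the version labels, and the sample migrations are hypothetical, and versions are assumed to be zero-padded so that string sorting matches the intended order.

```python
import sqlite3
from typing import List, Optional, Tuple

Migration = Tuple[str, str]  # (version label, SQL script)


def first_failing_migration(migrations: List[Migration]) -> Optional[Tuple[str, str]]:
    """Apply migrations in version order to a throwaway database.

    Returns (version, error message) for the first migration that fails,
    or None when every migration applies cleanly.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for version, sql in sorted(migrations):
            try:
                conn.executescript(sql)
            except sqlite3.Error as exc:
                return version, str(exc)
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    migrations = [
        ("V001", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);"),
        ("V002", "ALTER TABLE users ADD COLUMN display_name TEXT;"),
        # Typo in the column name: caught here instead of in production.
        ("V003", "INSERT INTO users (id, emial) VALUES (1, 'a@example.com');"),
    ]
    print(first_failing_migration(migrations))
```

A CI job would fail the PR whenever the function returns something other than None.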

For breaking changes, the expand/contract pattern is mandatory. Add the new column in one migration, backfill data, update application code to write to both columns, then drop the old column in a separate migration after all consumers have migrated. More ceremony than a direct rename. Far fewer incidents.
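One way to keep the expand/contract rule from depending on reviewer attention is a small lint step in CI. The sketch below is a heuristic under stated assumptions, not a SQL parser: it pattern-matches on migration text, so comments or unusual formatting can fool it. The function name and the messages are hypothetical.

```python
import re
from typing import List

_RENAME_COLUMN = re.compile(r"\brename\s+column\b", re.IGNORECASE)
_ADD_COLUMN = re.compile(r"\badd\s+column\b", re.IGNORECASE)
_DROP_COLUMN = re.compile(r"\bdrop\s+column\b", re.IGNORECASE)


def expand_contract_violations(sql: str) -> List[str]:
    """Flag migrations that make a breaking change in a single step."""
    problems: List[str] = []
    if _RENAME_COLUMN.search(sql):
        problems.append(
            "in-place column rename: add the new column now, drop the old one later"
        )
    if _ADD_COLUMN.search(sql) and _DROP_COLUMN.search(sql):
        problems.append(
            "adds and drops columns together: split the expand and contract steps"
        )
    return problems


if __name__ == "__main__":
    print(expand_contract_violations(
        "ALTER TABLE users RENAME COLUMN name TO full_name;"
    ))
```

An expand-only migration (`ADD COLUMN`) and a later contract-only migration (`DROP COLUMN`) both pass; only the single-step versions are flagged.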

Figure: GitOps reconciliation, three models. Kubernetes: bidirectional reconciliation, with Git as desired state, the cluster as actual state, and ArgoCD/Flux converging continuously (every 3 min) in a self-healing loop. Terraform: forward-only apply, where a PR triggers plan, merge triggers apply, and drift is detected by a scheduled plan rather than a continuous loop. Database schemas: a hybrid, with migration files in Git applied forward-only (Flyway) and rollback scripts required, because state is cumulative rather than declarative and steps are ordered and irreversible. Not everything reconciles like Kubernetes. Match the model to the resource.

Policy-as-Code: Security Without Review Queues

OPA Gatekeeper evaluates admission requests in under 5ms. Container running as root? Rejected with a specific fix message in the kubectl output. No waiting for a security review. No spreadsheet of exceptions. The policy is code, the enforcement is real-time, and the audit trail is a Git commit. Automated enforcement that never takes a break and never lets a violation slip through.

Start with a handful of critical rules: no privileged containers, no public load balancers without approval, required resource limits on every pod. Grow into the hundreds as the team gains confidence. When an auditor asks “when was this policy enacted and who approved it?” the answer is a Git commit with timestamp, diff, and reviewed PR. Compliance audits get shorter. Security posture improves. Both from the same mechanism.
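To make two of those starter rules concrete, here is a plain-Python illustration of what they check on a Pod manifest. This is not how OPA or Kyverno evaluate policy; in practice the rules would be written in Rego or Kyverno YAML. The function name and message wording are hypothetical, and the manifest is assumed to be a dictionary parsed from YAML.

```python
from typing import Any, Dict, List


def pod_violations(pod: Dict[str, Any]) -> List[str]:
    """Check a Pod manifest for privileged containers and missing resource limits."""
    violations: List[str] = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext") or {}
        if security.get("privileged") is True:
            violations.append(f"container '{name}' is privileged")
        limits = (container.get("resources") or {}).get("limits") or {}
        for resource in ("cpu", "memory"):
            if resource not in limits:
                violations.append(f"container '{name}' has no {resource} limit")
    return violations


if __name__ == "__main__":
    pod = {
        "kind": "Pod",
        "spec": {"containers": [
            {"name": "app", "securityContext": {"privileged": True}},
        ]},
    }
    for violation in pod_violations(pod):
        print(violation)
```

A check like this can run in a pre-merge job so engineers see violations in the PR, with admission-time enforcement as the backstop.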

Anti-pattern

Don’t: Keep security policies in a wiki or spreadsheet and rely on manual review to catch violations. Policies that exist only in documentation are hypotheses about what production looks like. They’re rarely accurate.

Do: Express policies as OPA Rego or Kyverno YAML in Git. Deploy through the same PR review process as application code. Enforce at admission time. A policy violation is rejected before the resource ever exists in the cluster.

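# OPA policy: deny containers that explicitly run as root.
# Note: this rule only matches an explicit runAsUser of 0. A container that
# omits securityContext is not matched by it.
package kubernetes.admission

deny[msg] {
  # Only evaluate Pod admission requests.
  input.request.kind.kind == "Pod"
  # Iterate over every container in the Pod spec.
  container := input.request.object.spec.containers[_]
  container.securityContext.runAsUser == 0
  msg := sprintf(
    "Container '%s' runs as root (UID 0). Set runAsNonRoot: true",
    [container.name]
  )
}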

The Incident Response Shift

When an incident demands a configuration change, the instinct is to fix it in the console. Faster. More direct. But that console fix becomes a ghost the next time terraform apply runs.

Committing the fix to Git instead of the console takes 3-5 minutes longer during the incident. In exchange: an audit trail, pipeline validation, no silent revert. For DevOps teams deploying 10+ times per day, committing first becomes muscle memory within 2-3 weeks. Scheduled drift detection flags unauthorized console changes within 30 minutes. After a few of those alerts, engineers discover that Git-first is the path of least resistance. Writing the receipt is faster than explaining why there isn’t one.

| When to commit-first | When break-glass is justified |
|---|---|
| Configuration change with known fix | Customer-impacting outage with unknown root cause |
| Change that can wait 3-5 minutes for PR | Security incident needing immediate action |
| Non-time-critical infrastructure modification | Complete service failure where minutes of downtime compound |

For break-glass scenarios, the process must include an automatic follow-up: the console change triggers a PR within the hour to codify it. Without that follow-up mechanism, break-glass becomes the default and GitOps degrades to “infrastructure in Git, sometimes.” Within the hour. No exceptions.
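The "within the hour" rule is easy to check mechanically once break-glass changes are recorded somewhere. The sketch below assumes such a record exists (for example, derived from cloud audit logs); the `BreakGlassChange` type, its fields, and `overdue_changes` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

FOLLOW_UP_WINDOW = timedelta(hours=1)  # "within the hour"


@dataclass
class BreakGlassChange:
    resource: str
    changed_at: datetime
    pr_opened_at: Optional[datetime] = None  # None until a codifying PR exists


def overdue_changes(
    changes: List[BreakGlassChange], now: datetime
) -> List[BreakGlassChange]:
    """Return changes whose codifying PR is missing or was opened too late."""
    overdue: List[BreakGlassChange] = []
    for change in changes:
        deadline = change.changed_at + FOLLOW_UP_WINDOW
        if change.pr_opened_at is None:
            if now > deadline:
                overdue.append(change)
        elif change.pr_opened_at > deadline:
            overdue.append(change)
    return overdue


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 2, 0)
    changes = [
        BreakGlassChange("sg-a", t0, t0 + timedelta(minutes=20)),
        BreakGlassChange("sg-b", t0),
    ]
    for change in overdue_changes(changes, now=t0 + timedelta(hours=4)):
        print(f"no timely PR for {change.resource}")
```

Alerting on the output turns the follow-up from a promise into a tracked obligation.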

Figure: GitOps adoption roadmap, Kubernetes to full platform coverage over 8 months. Months 1-2: Kubernetes (ArgoCD or Flux, app manifests in Git, 3-min sync interval). Months 3-4: infrastructure (Atlantis or Spacelift, Terraform in Git, 30-min drift detection). Months 5-6: database schemas (Flyway or Liquibase, migrations in Git, CI-validated deploys). Months 7-8: policy (OPA Gatekeeper, Rego policies in Git, real-time enforcement). Each phase builds on the previous foundation.

Extend gradually: Kubernetes first (months 1-2), Terraform with drift detection (3-4), database migrations (5-6), policy-as-code through platform engineering (7-8). By month 8, “what changed?” always has an answer in Git.

Push vs. pull model: when each applies

Push model: CI/CD pushes changes to the target system on every commit. Simpler to set up. Works well for Terraform applies and database migrations where the change is triggered by a merge event. Downside: the CI system needs credentials to the production environment, expanding the attack surface.

Pull model: an agent running in the target environment (ArgoCD, Flux) polls Git every 3-5 minutes and pulls changes. The agent authenticates outbound to Git, which means no inbound credentials to production from CI. Continuous reconciliation handles network gaps gracefully. If the cluster was unreachable when a change was pushed, the pull model catches up automatically on the next poll.

For Kubernetes, pull is the clear winner because of the security model and continuous reconciliation. For Terraform and database migrations, push is more practical because the “reconciliation” is a plan-and-apply cycle that shouldn’t run on an automatic schedule without human review.
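The catch-up behavior of the pull model can be illustrated with a toy agent. This is a sketch under simplifying assumptions, not ArgoCD or Flux internals: `PullAgent`, `fetch_head`, and `apply` are hypothetical names, and an unreachable Git server is modeled as `fetch_head` returning None.

```python
from typing import Callable, Optional


class PullAgent:
    """Polls Git for the latest commit and applies it when it changes."""

    def __init__(
        self,
        fetch_head: Callable[[], Optional[str]],
        apply: Callable[[str], None],
    ) -> None:
        self._fetch_head = fetch_head
        self._apply = apply
        self.applied_commit: Optional[str] = None

    def poll_once(self) -> bool:
        """Run one poll cycle. Returns True if a new commit was applied."""
        head = self._fetch_head()
        if head is None or head == self.applied_commit:
            # Git unreachable or nothing new: keep current state, retry next poll.
            return False
        self._apply(head)
        self.applied_commit = head
        return True


if __name__ == "__main__":
    # The second poll fails (network gap); the third catches up to the latest commit.
    heads = iter(["c1", None, "c3"])
    agent = PullAgent(lambda: next(heads), lambda sha: print(f"applied {sha}"))
    for _ in range(3):
        agent.poll_once()
```

The agent never needs to know which commits it missed. It only compares the current head with what it last applied, which is why a network gap resolves itself on the next successful poll.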

The Ghost Revert: A console fix made during an incident that gets quietly reverted the next time terraform apply runs. The engineer who made the fix doesn’t know it was undone. The engineer running Terraform doesn’t know the fix existed. An off-the-books payment that the accountant “corrected” during reconciliation. The incident recurs weeks later and nobody connects the two events. Drift detection on a 30-minute schedule eliminates the ghost by flagging the console change before anyone forgets about it.

What the Industry Gets Wrong About GitOps

“GitOps is Kubernetes.” GitOps is a principle: declared state in Git, automated reconciliation, drift detection. Kubernetes is one target. Terraform, DNS, IAM policies, database schemas, and network configuration all benefit from the same principle. Limiting GitOps to Kubernetes leaves everything else in the manual-apply-and-hope model.

“Drift detection is optional.” Drift detection is the entire point. Without it, GitOps is just “infrastructure in Git” with no guarantee that Git reflects reality. Infrastructure that drifts between applies piles up risk nobody can see until the next terraform plan produces a wall of unexpected changes.

“GitOps means no manual access to production.” GitOps means every change flows through Git as the source of truth. Break-glass access for emergencies is still necessary. The difference is that break-glass changes get codified in a PR within the hour, not forgotten until the next terraform apply reverts them.

Our take: Run terraform plan in detect mode on a 30-minute schedule against every production workspace. Alert on any drift. Require a PR to codify or revert every detected change within 48 hours. This single practice eliminates ghost reverts, undocumented console fixes, and the “what changed?” question that derails incident investigations. One practice. The books always match reality.

That security group rule, added in the console during an incident, quietly reverted three weeks later by terraform apply? With drift detection on a 30-minute schedule and every change flowing through a PR, the console fix gets flagged within the hour and codified before anyone forgets about it. No ghost revert. No repeat incident. Git and reality finally match.

Your Console Fixes Are Getting Silently Reverted

GitOps applied only to Kubernetes is a partial solution. Extending GitOps principles to Terraform, cloud resources, DNS, and IAM policies creates complete auditability and drift prevention across your entire operational surface.

Frequently Asked Questions

What is the core principle that makes GitOps valuable?

GitOps makes desired state declarative and version-controlled, with an automated reconciler comparing desired to actual state every 3-5 minutes. Every change is a Git commit with author, timestamp, and rationale. Configuration drift incidents drop. Mean time to recovery shrinks to minutes via git revert.

How does GitOps apply to Terraform infrastructure?

Terraform code in Git with plan/apply automated by CI/CD is GitOps for infrastructure. The key addition is drift detection: running terraform plan on a 15-30 minute schedule and alerting on non-empty plans. Atlantis and Spacelift handle the full loop. Teams using Atlantis catch most unauthorized console changes within one detection cycle.

Can database schema changes be managed with GitOps?

Yes, with a key difference. Flyway and Liquibase store migrations in version control with CI validation against a test database before production apply. But database migrations are forward-only and often irreversible. You can’t revert to a previous commit and expect the schema to roll back. Teams run migration validation in seconds against test databases, catching the overwhelming majority of errors before production.

What is policy-as-code in the GitOps model?

Policy-as-code means security and compliance rules expressed as OPA Rego or Kyverno YAML, stored in Git and deployed through PR review. OPA Gatekeeper evaluates admission requests in under 5ms, rejecting non-compliant resources in real time. Mature organizations enforce hundreds of rules, catching most misconfigurations at deploy time that would otherwise reach production.

What is the difference between push and pull model GitOps?

Push model means CI/CD pushes changes on commit. Pull model means an agent like ArgoCD or Flux polls Git every 3 minutes and pulls changes. Pull models are preferred because the agent authenticates outbound, removing the need for inbound CI credentials to production. This sharply reduces credential exposure and provides continuous reconciliation that handles network gaps gracefully.