Terraform Tutorial: Drift Detection Strategies

Written by Bits Lovers

Teams sometimes assume their infrastructure-as-code templates are the final word on what’s running. That’s rarely true for long. Configuration drift — the gap between what your code says and what’s actually deployed — is one of those problems that creeps in no matter how careful your DevOps workflow is.

Drift happens whenever someone changes, adds, or removes infrastructure outside of Terraform. Maybe a teammate tweaked an instance type in the AWS console. Maybe an auto-scaling event changed a resource. Maybe a runbook script modified something directly. The reasons vary, but the result is the same: your state file no longer reflects reality.

Terraform state

Before talking about drift, a quick primer on Terraform state. State is basically Terraform’s memory: it’s what Terraform thinks your infrastructure looks like right now. Each resource in your .tf files gets mapped to a real object in AWS, Azure, or wherever, and that mapping lives in the state file.

That binding is how Terraform connects your code to actual cloud resources. Here’s the thing: if someone changes a real resource without going through Terraform, the state file is none the wiser. It keeps the values Terraform last recorded and still thinks everything matches. That mismatch is drift.
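
To make the mapping concrete, here is a minimal Python sketch that lists which remote object each resource address is bound to. It assumes the JSON shape printed by terraform show -json (values.root_module.resources). The sample data and the helper name list_bindings are made up for illustration, and resources inside child modules are ignored.

```python
import json

# Hypothetical, trimmed-down sample of `terraform show -json` output.
SAMPLE_STATE = json.loads("""
{
  "values": {
    "root_module": {
      "resources": [
        {
          "address": "aws_instance.web",
          "type": "aws_instance",
          "name": "web",
          "values": {"id": "i-0abc1234", "instance_type": "t3.micro"}
        }
      ]
    }
  }
}
""")


def list_bindings(state: dict) -> dict:
    """Map each root-module resource address to the remote object ID in state."""
    resources = state.get("values", {}).get("root_module", {}).get("resources", [])
    # Most providers expose the remote identifier as an "id" attribute;
    # fall back to None when a resource does not have one.
    return {r["address"]: r.get("values", {}).get("id") for r in resources}


print(list_bindings(SAMPLE_STATE))
```

The point of the sketch is only that state stores recorded values, not live ones: nothing in this data changes when someone edits the instance in the console.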

How drift creeps in

Changes made outside Terraform — through a cloud console, a CLI tool, or an automation script — are invisible to the state file. For example, if you change a VM size through the Azure portal, Terraform won’t detect that change until the next plan or apply.

Some resources are more prone to drift than others. Virtual machines managed through Terraform often have configuration options that teams also tweak manually. The provider reads these attributes back on every refresh, so Terraform can report the difference, but the state only stays accurate between runs if all changes go through Terraform.

Automation that runs outside Terraform, such as a scheduled script or another provisioning tool, has the same effect: the state goes stale. Running terraform plan catches most of this, but some changes, such as a resource that was deleted and recreated under a new ID, need manual fixes like terraform import or terraform state rm.

Be careful when making changes outside Terraform. Even small modifications can cause surprises: the next apply may quietly revert the manual change, or fail because the resource no longer matches what Terraform expects.

Drift detection strategies

You can detect drift by comparing the state file against the current state reported by the provider’s API. There are several approaches:

terraform plan -refresh-only

If you’re on Terraform v0.15.4 or later (released May 2021), this is the way to go. The -refresh-only flag replaced the old terraform refresh command.

terraform plan -refresh-only

What this does: it calls the provider API, fetches the current state of every managed resource, and compares that against what’s in your state file. You only see drift — no config-driven changes show up in the plan. You review first, then decide.

To write the updated state back:

terraform apply -refresh-only

The workflow:

  1. terraform init
  2. terraform plan -refresh-only — see what’s drifted
  3. Check the output
  4. terraform apply -refresh-only — update state
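
Steps 1 and 2 are easy to script. The following Python sketch is one way to do it, not an official recipe. It assumes terraform is on the PATH and that plan's -detailed-exitcode flag reports drift in refresh-only mode with exit code 2 (0 means no changes, 1 means error). The function names are hypothetical. Steps 3 and 4 are deliberately left to a human.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
EXIT_MEANING = {0: "no drift", 1: "error", 2: "drift detected"}


def interpret_exit_code(code: int) -> str:
    """Translate a plan exit code into a short status string."""
    return EXIT_MEANING.get(code, "unknown")


def check_drift(workdir: str) -> str:
    """Run init and a refresh-only plan in workdir, then report the result."""
    # Step 1: make sure providers and the backend are initialized.
    subprocess.run(["terraform", "init", "-input=false"], cwd=workdir, check=True)
    # Step 2: refresh-only plan; no check=True because exit code 2 is expected.
    plan = subprocess.run(
        ["terraform", "plan", "-refresh-only", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return interpret_exit_code(plan.returncode)
```

Calling check_drift(".") from a directory containing your configuration returns one of the three status strings. Nothing here writes to state, so the review step stays intact.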

The older terraform refresh command (deprecated)

Before -refresh-only existed, teams used terraform refresh to detect drift. This command reads the current state of remote objects and updates the state file directly.

This command is now deprecated. The problem with it: it writes state changes automatically without giving you a chance to review them first. That’s risky. If provider credentials are misconfigured, for example, Terraform can conclude that your managed objects were deleted and drop them from state without asking.

If you’re still using terraform refresh, switch to -refresh-only. The old command is effectively the same as terraform apply -refresh-only -auto-approve — it skips the review step that the new workflow gives you.

Running a drift detection plan

When you run terraform plan -refresh-only, Terraform compares the state file against the provider API responses. The output uses standard plan notation:

  • ~ indicates a changed attribute
  • - indicates something was removed
  • + indicates something new was detected

For example, if someone changed a VM from Standard_DS2_v2 to Standard_B2ms outside Terraform, the plan output would show that difference and propose updating the state to match.
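
If you want the same information in machine-readable form, you can save the plan (terraform plan -refresh-only -out=drift.tfplan) and convert it with terraform show -json drift.tfplan. The sketch below assumes that JSON contains a resource_drift list whose entries carry change.before and change.after, as described in Terraform’s JSON output format docs. The sample data and the function name changed_attributes are hypothetical, and only top-level attributes are compared.

```python
# Hypothetical sample mirroring the VM size example above.
SAMPLE_PLAN = {
    "resource_drift": [
        {
            "address": "azurerm_linux_virtual_machine.app",
            "change": {
                "actions": ["update"],
                "before": {"name": "app", "size": "Standard_DS2_v2"},
                "after": {"name": "app", "size": "Standard_B2ms"},
            },
        }
    ]
}


def changed_attributes(plan: dict) -> dict:
    """Return {address: {attribute: (recorded, live)}} for drifted resources."""
    drift = {}
    for rc in plan.get("resource_drift", []):
        change = rc.get("change", {})
        # "after" is null when the object was deleted outside Terraform.
        before = change.get("before") or {}
        after = change.get("after") or {}
        diff = {
            key: (before.get(key), after.get(key))
            for key in sorted(set(before) | set(after))
            if before.get(key) != after.get(key)
        }
        if diff:
            drift[rc["address"]] = diff
    return drift


print(changed_attributes(SAMPLE_PLAN))
```

Here the first value in each pair is what the state file recorded and the second is what the provider API reported.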

Third-party drift detection tools

Beyond Terraform’s built-in capabilities, a few external tools have existed for drift detection. Their status has changed significantly since this article was first published.

CloudQuery

CloudQuery started as an open-source cloud asset inventory powered by SQL. It could extract resources from cloud providers, load them into PostgreSQL, and run drift detection queries on top.

Since then, CloudQuery has pivoted to become a broader cloud security and compliance platform (CSPM and FinOps). It’s still maintained, but drift detection is no longer its focus. If you need a general cloud asset inventory, it’s worth a look. For drift detection specifically, you’re better off with Terraform’s native -refresh-only workflow.

Driftctl

Driftctl was an open-source tool from CloudSkiff that tracked drift across managed and unmanaged resources on AWS, Azure, GCP, and GitHub. It worked well for what it did.

Then CloudSkiff got acquired by Snyk in late 2021, and the project stalled. It’s now in maintenance mode on Snyk’s GitHub: no new features, just critical fixes. It never made it to 1.0. We’d skip it for anything new.

What changed since 2021

A lot has happened. Here’s the drift-relevant timeline:

Version             What changed
v0.15.4 (May 2021)  Added the -refresh-only flag
v1.0 (June 2021)    Stabilized that workflow, deprecated terraform refresh
v1.1 (Dec 2021)     moved blocks: refactor without messing with state
v1.5 (June 2023)    import blocks: bring unmanaged resources under Terraform control declaratively
v1.7 (Jan 2024)     removed blocks: stop managing a resource without destroying it
v1.10 (Nov 2024)    Ephemeral resources (not stored in state); S3 backend got native state locking

One more thing: HashiCorp changed Terraform’s license in August 2023, and the community forked it into OpenTofu (now a CNCF sandbox project). Same drift detection, same -refresh-only workflow. If you want an open-source Terraform, that’s the one.

Practical recommendations

For most teams, the built-in -refresh-only workflow is sufficient. Here’s what works in practice:

  1. Run terraform plan -refresh-only on a schedule (daily or weekly) to catch drift early
  2. If you use Terraform Cloud (now called HCP Terraform), it has managed drift detection that can run periodic checks on your workspaces; check the current HashiCorp docs for setup details
  3. Treat drift as a signal: if the same resources keep drifting, something in your workflow is pushing changes outside Terraform
  4. Use moved and import blocks to keep your state clean, which reduces false positives
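
Point 3 is easy to automate once you keep the results of each scheduled check. This Python sketch assumes you store, per run, the list of resource addresses that drifted; the function name recurring_drift and the sample addresses are hypothetical.

```python
from collections import Counter


def recurring_drift(runs: list, threshold: int = 2) -> list:
    """Return addresses that drifted in at least `threshold` separate runs."""
    # set(run) ensures a resource is counted once per run, even if listed twice.
    counts = Counter(addr for run in runs for addr in set(run))
    return sorted(addr for addr, n in counts.items() if n >= threshold)


# Hypothetical results from three scheduled checks.
history = [
    ["aws_instance.web", "aws_security_group.db"],
    ["aws_instance.web"],
    ["aws_instance.web", "aws_s3_bucket.logs"],
]

print(recurring_drift(history))
```

A resource that shows up here repeatedly is usually being changed by some process you have not yet moved into Terraform, which is a better lead than any single drift report.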

Detecting drift is straightforward. Preventing it is the harder problem. The best strategy is to restrict direct infrastructure changes and make Terraform the only path to modify resources. That’s easier said than done, especially in larger teams, but every manual change you eliminate is one less drift event to chase down.
