Terraform Stacks: Multi-Environment State Management
Terraform workspaces seemed like the solution to multi-environment management — one configuration, many states. Then teams discovered the problems: workspace sprawl, no isolation between environments at the module level, and workspace state files in the same backend bucket with no clear separation. Terraform Stacks, GA in HCP Terraform since late 2025, takes a different approach: you declare components and deployments explicitly in .tfstack.hcl and .tfdeploy.hcl files, and HCP Terraform manages the orchestration, dependency resolution, and per-deployment state.
This guide covers the Stacks model, the file syntax, multi-environment deployment patterns, and what to do with the traditional S3 backend approach for teams not on HCP Terraform or using OpenTofu.
What Terraform Stacks Solves
Traditional Terraform state management at scale runs into three problems:
State file proliferation. A team with 5 environments and 10 infrastructure components ends up with 50 state files, each in a different S3 prefix, each with its own DynamoDB lock entry. Managing the cross-references between them — the VPC ID that the EKS module needs from the networking module — means terraform_remote_state data sources scattered throughout every module.
Workspace limitations. Workspaces share the same Terraform configuration. If production needs a different instance type or CIDR range, you end up with workspace-specific conditionals that make the code harder to read and impossible to test cleanly.
Orchestration gaps. When you update the VPC module, all five environment stacks that depend on it need to re-run in the right order. Nothing in vanilla Terraform coordinates this. Teams write shell scripts or CI pipelines to handle the sequencing.
Stacks addresses all three by treating the full component graph as a first-class resource.
Stack File Syntax
A Stack lives in a directory with .tfstack.hcl files (components) and .tfdeploy.hcl files (deployments). Components are Terraform root modules wired together; deployments are instances of the stack with specific variable values.
# stacks/eks-platform/networking.tfstack.hcl
component "networking" {
  source = "./components/networking"

  inputs = {
    vpc_cidr           = var.vpc_cidr
    availability_zones = var.availability_zones
    cluster_name       = var.cluster_name
  }
}
component "eks" {
  source = "./components/eks"

  inputs = {
    vpc_id          = component.networking.outputs.vpc_id
    subnet_ids      = component.networking.outputs.private_subnet_ids
    cluster_version = var.kubernetes_version
    cluster_name    = var.cluster_name
  }
}

component "addons" {
  source = "./components/eks-addons"

  inputs = {
    cluster_name     = component.eks.outputs.cluster_name
    cluster_endpoint = component.eks.outputs.cluster_endpoint
    oidc_provider    = component.eks.outputs.oidc_provider_arn
  }
}
The component.networking.outputs.vpc_id reference creates an explicit dependency: HCP Terraform knows eks must wait for networking to complete before it can apply. No manual orchestration needed.
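Stacks also moves provider configuration up to the stack level: providers are declared once in a .tfstack.hcl file and passed into components explicitly rather than configured inside each root module. A sketch of what that looks like (the file name and the `var.aws_region` variable are illustrative; verify the exact block syntax against the HCP Terraform Stacks documentation for your Terraform version):

```hcl
# stacks/eks-platform/providers.tfstack.hcl — a single provider instance
# declared at the stack level, shared by every component that needs it
required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.0"
  }
}

provider "aws" "this" {
  config {
    region = var.aws_region  # assumes an aws_region variable is declared
  }
}

# each component block then wires the instance in explicitly:
#   providers = {
#     aws = provider.aws.this
#   }
```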
Variables in Stacks are declared separately:
# stacks/eks-platform/variables.tfstack.hcl
variable "cluster_name" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "kubernetes_version" {
  type    = string
  default = "1.32"
}

variable "availability_zones" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
Deployment Configuration
If a Stack is the blueprint, a Deployment is the specific build. Staging and production are both deployments of the same Stack, differing only in their input values. Each deployment gets completely isolated state — destroying the staging deployment touches nothing in production, and a plan run in one can’t interfere with an apply running in the other:
# stacks/eks-platform/deployments.tfdeploy.hcl
deployment "staging" {
  inputs = {
    cluster_name       = "staging-eks"
    vpc_cidr           = "10.1.0.0/16"
    kubernetes_version = "1.32"
  }

  orchestration_policy = run_all # Apply all components in dependency order
}

deployment "production" {
  inputs = {
    cluster_name       = "production-eks"
    vpc_cidr           = "10.0.0.0/16"
    kubernetes_version = "1.31" # Production trails staging by one version
  }

  orchestration_policy = serial # Apply one component at a time in production
}
With run_all, HCP Terraform runs independent components in parallel and respects the dependency graph. With serial, each component waits for the previous to complete regardless of dependencies — safer for production but slower.
# HCP Terraform CLI commands for Stacks
terraform stacks list
# STACK          LAST DEPLOYED   STATUS
# eks-platform   2 hours ago     Applied

terraform stacks plan eks-platform --deployment staging
terraform stacks apply eks-platform --deployment staging
terraform stacks apply eks-platform --deployment production
State Management in Stacks
State isolation is where Stacks genuinely improves on workspaces. Each deployment keeps a separate state file per component — staging/eks has no connection to production/eks beyond sharing the same HCL definition. Concurrent plans in different deployments can’t corrupt each other’s state. The structure HCP Terraform uses internally:
eks-platform/
  deployments/
    staging/
      components/
        networking/   ← independent state
        eks/          ← independent state
        addons/       ← independent state
    production/
      components/
        networking/
        eks/
        addons/
This isolation is the key advantage over workspaces, which keep every environment's state under one shared backend configuration. Here, staging's EKS component state and production's are entirely separate objects with separate locks, so a mistake in one environment has no path into the other.
Traditional S3 State Management (Non-HCP / OpenTofu)
For teams on OpenTofu or self-hosted Terraform without HCP Terraform, the S3 backend with DynamoDB locking is still the standard approach. The Stacks feature is HCP Terraform-only; OpenTofu is working on a comparable feature (called “Stacks-compatible modules”) but it’s not GA as of this writing.
# backend.tf — shared across environments via partial configuration
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
With partial backend configuration, the key (state file path) is injected at init time:
# Each component/environment combo gets its own state file
terraform init \
  -backend-config="key=production/networking/terraform.tfstate"

terraform init \
  -backend-config="key=production/eks/terraform.tfstate"

terraform init \
  -backend-config="key=staging/networking/terraform.tfstate"
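Nothing enforces that key convention, so it is worth centralizing it in one place that every pipeline sources. A minimal sketch (the helper name is illustrative):

```shell
# build the backend key for an environment/component pair, so every
# pipeline derives state paths from one function instead of hand-typing them
state_key() {
  printf '%s/%s/terraform.tfstate' "$1" "$2"
}

# usage: terraform init -backend-config="key=$(state_key production eks)"
state_key production eks
```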
Set up the S3 bucket and DynamoDB table once:
# bootstrap/main.tf — run this manually once, state stored locally
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Purpose = "Terraform state locking"
  }
}
Cross-State References Without Stacks
Without Stacks, components share outputs via terraform_remote_state:
# eks/main.tf — reading a networking output from a separate state file
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "${var.environment}/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_eks_cluster" "this" {
  name     = "${var.environment}-eks"
  role_arn = aws_iam_role.cluster.arn

  vpc_config {
    subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
  }
}
The downside: terraform_remote_state requires the state file to exist and be readable before any plan or apply in the dependent module. In a fresh environment, you must apply networking first, manually, then eks, then addons. Stacks automates this; the S3 approach requires a scripted pipeline or GitOps tooling like the Flux CD + OpenTofu guide to handle the ordering.
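The simplest version of such a pipeline hard-codes the dependency order. A sketch (component names match the layout used throughout this guide; the function prints the commands it would run rather than invoking terraform, so treat it as a starting point, not a finished pipeline):

```shell
# walk the components in hard-coded dependency order for one environment;
# this sketch echoes each command instead of executing it
apply_sequence() {
  env="$1"
  for component in networking eks addons; do
    echo "terraform -chdir=${component} init -backend-config=key=${env}/${component}/terraform.tfstate"
    echo "terraform -chdir=${component} apply"
  done
}

apply_sequence staging
```

A real pipeline would replace the `echo` calls with the commands themselves and abort on the first non-zero exit, since applying eks against a half-built networking layer is exactly the failure mode the ordering exists to prevent.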
State Operations You’ll Need
# List all resources in a state file
terraform state list
# Move a resource to a different state file (common during refactoring)
terraform state mv \
  -state-out=../new-module/terraform.tfstate \
  aws_vpc.main \
  module.networking.aws_vpc.main
# Import an existing resource into state (without recreating it)
terraform import aws_eks_cluster.this production-eks
# Remove a resource from state without destroying it
terraform state rm aws_security_group.temporary
# Pull remote state to local for inspection
terraform state pull > state-backup.json
# Push local state to remote (use with extreme caution)
terraform state push state-backup.json
# Force-unlock a stuck lock (DynamoDB lock ID from the error message)
terraform force-unlock LOCK_ID_FROM_ERROR_MESSAGE
The force-unlock command is for the case where a terraform apply died mid-run and left a DynamoDB lock entry. The lock ID is in the error: Error: Error acquiring the state lock: Lock ID: 12345678-1234-1234-1234-123456789012. Only run this when you’re certain no apply is actively running.
State Security
Terraform state contains sensitive values — database passwords, private keys, connection strings. Anyone with read access to the S3 bucket can extract every secret from your infrastructure.
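To see why, look at the shape of a state file: resource attributes, secrets included, sit in plain JSON with no additional encryption layer inside the file. A self-contained demonstration using a trimmed, entirely hypothetical state fragment:

```shell
# write a minimal, fake state fragment to illustrate the structure;
# real state files hold the same plain-JSON attributes at much larger scale
cat > /tmp/demo.tfstate <<'EOF'
{"resources":[{"type":"aws_db_instance","name":"main",
"instances":[{"attributes":{"password":"s3cr3t-db-pass"}}]}]}
EOF

# no decryption step required — a plain grep recovers the secret
grep -o '"password":"[^"]*"' /tmp/demo.tfstate
```

Server-side encryption on the bucket protects the object at rest, but anyone who can read the object gets cleartext, which is why the bucket policy below denies access to all but the roles that need it.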
# Restrict state bucket access to specific IAM roles only
aws s3api put-bucket-policy \
  --bucket my-terraform-state \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "DenyNonCIAccess",
        "Effect": "Deny",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::my-terraform-state/*",
        "Condition": {
          "StringNotLike": {
            "aws:PrincipalArn": [
              "arn:aws:iam::123456789012:role/terraform-ci-role",
              "arn:aws:iam::123456789012:role/terraform-admin-role"
            ]
          }
        }
      }
    ]
  }'
For HCP Terraform, Stacks manages state access through HCP’s IAM model — team access controls apply per workspace, and state is never stored in your own bucket. State is encrypted at rest and in transit within HCP’s infrastructure.
For the CI/CD pipeline that triggers Terraform/OpenTofu applies, the GitHub Actions with Terraform guide covers the OIDC-based authentication that gives your CI runner access to the state bucket without storing long-lived credentials.