Flux CD + OpenTofu: GitOps for Kubernetes and Infrastructure

Written by Bits Lovers

HashiCorp switched Terraform to the Business Source License in August 2023. Within weeks, the OpenTofu fork was announced under the Linux Foundation and had a 1.6 release by early 2024. By mid-2026, OpenTofu 1.9 is the production-ready, drop-in replacement for Terraform 1.x — same HCL syntax, same provider ecosystem, same state format.

Flux CD v2 is the CNCF-graduated GitOps operator for Kubernetes, handling both application deployments and, via the tofu-controller, OpenTofu runs triggered from Git commits. Together they cover the full infrastructure lifecycle from a single Git workflow: push a change to an OpenTofu module, Flux detects it, runs tofu apply, and reconciles the resulting state back into the cluster. This guide covers Flux installation, the four Flux controllers, deploying applications with Kustomize and Helm, and wiring up the tofu-controller for infrastructure-as-code GitOps on EKS.

How Flux Works

Flux runs four controllers as pods in your cluster:

  • Source Controller: watches Git repos, Helm repos, and OCI registries for changes. Fetches and caches artifacts.
  • Kustomize Controller: applies Kustomize overlays and raw Kubernetes manifests from Sources.
  • Helm Controller: installs and upgrades Helm charts declared as HelmRelease objects.
  • Notification Controller: sends alerts to Slack, PagerDuty, or GitHub commit statuses on reconciliation events.

The reconciliation loop runs every few minutes by default. When you push a commit, the Source Controller detects it within one poll interval (default: 1 minute for Git). To apply a change immediately, run flux reconcile manually.

Installing Flux on EKS

The flux bootstrap command does two things at once: it installs the controllers into your cluster and commits the component manifests into your Git repository, so Flux immediately starts managing itself. From that point forward, changing Flux configuration means opening a PR — not running kubectl:

# Install the Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Verify prerequisites
flux check --pre

# Bootstrap Flux with GitHub (creates flux-system namespace and commits manifests)
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
export GITHUB_USER=my-org

flux bootstrap github \
  --owner=$GITHUB_USER \
  --repository=fleet-infra \
  --branch=main \
  --path=./clusters/production \
  --personal=false \
  --components-extra=image-reflector-controller,image-automation-controller

# Verify all controllers are running
kubectl get pods -n flux-system
# NAME                                        READY   STATUS    RESTARTS
# helm-controller-5f7b8c9d6-xxxxx             1/1     Running   0
# kustomize-controller-7c6b4d8f9-xxxxx        1/1     Running   0
# notification-controller-8d9c5e7f6-xxxxx     1/1     Running   0
# source-controller-6e8f7d4c5-xxxxx           1/1     Running   0

After bootstrap, the fleet-infra repository at clusters/production/flux-system/ contains the Flux component manifests. Flux is now reconciling itself from Git — any change to those manifests is automatically applied.
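For example, bumping a controller's memory limit becomes a PR against the bootstrapped kustomization.yaml — a sketch assuming the default gotk-components.yaml / gotk-sync.yaml layout that flux bootstrap commits:

```yaml
# clusters/production/flux-system/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      # Raise the memory limit (value here is illustrative)
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 1Gi
```

Once merged, Flux applies the patch to its own kustomize-controller Deployment on the next reconciliation — no kubectl involved.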

Deploying Applications with Kustomize

Define a GitRepository source and a Kustomization that points to your app manifests:

# clusters/production/apps/source.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-api
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/my-org/my-api
  ref:
    branch: main
  secretRef:
    name: github-token   # Secret with type: Opaque, data.username + data.password
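The github-token Secret referenced above can be a plain Opaque Secret with a personal access token as the password (token value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: github-token
  namespace: flux-system
type: Opaque
stringData:
  username: git
  password: ghp_xxxxxxxxxxxx   # fine-grained PAT with read access to the repo
```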

# clusters/production/apps/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-api
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: my-api
  path: "./k8s/overlays/production"
  prune: true          # Delete resources removed from Git
  wait: true           # Wait for resources to become Ready
  timeout: 5m
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: my-api
      namespace: my-api

The prune: true field is what makes Flux genuinely GitOps rather than just apply-on-commit: resources deleted from Git are deleted from the cluster. Without it, you’d accumulate orphaned resources.

# Watch reconciliation status
flux get kustomizations --watch
# NAME     REVISION             SUSPENDED  READY  MESSAGE
# my-api   main@sha1:abc1234    False      True   Applied revision: main@sha1:abc1234

# Force immediate reconciliation
flux reconcile kustomization my-api --with-source

# Check health
flux get all -n flux-system

Deploying Helm Charts with HelmRelease

The Helm Controller manages chart installations declared as HelmRelease objects:

# clusters/production/monitoring/prometheus.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: flux-system
spec:
  interval: 12h
  url: https://prometheus-community.github.io/helm-charts

---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 1h
  chart:
    spec:
      chart: kube-prometheus-stack
      version: ">=58.0.0 <59.0.0"
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: flux-system
  values:
    grafana:
      adminPassword: "${GRAFANA_ADMIN_PASSWORD}"   # Substituted via Kustomization postBuild from a Secret
    prometheus:
      prometheusSpec:
        retention: 15d
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: gp3
              resources:
                requests:
                  storage: 50Gi
  install:
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    remediation:
      retries: 3
      strategy: rollback

The version range ">=58.0.0 <59.0.0" lets Flux pick up new releases automatically within the 58.x line but never cross into the next major version without a deliberate manifest change. The upgrade.remediation.strategy: rollback means a failed upgrade automatically rolls back to the last successful release.
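If you'd rather not rely on the ${GRAFANA_ADMIN_PASSWORD} substitution, HelmRelease also supports valuesFrom, which reads a value straight from a Secret at render time (Secret name and key below are assumptions):

```yaml
  # Added under spec: of the HelmRelease above
  valuesFrom:
    - kind: Secret
      name: grafana-admin
      valuesKey: admin-password
      targetPath: grafana.adminPassword
```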

OpenTofu on EKS via tofu-controller

The tofu-controller is a Flux-compatible controller that runs OpenTofu inside your cluster, reading .tf files from a Git source and reconciling infrastructure state. Install it alongside Flux:

# Install tofu-controller
flux create source helm tofu-controller \
  --url=https://flux-iac.github.io/tofu-controller/ \
  --namespace=flux-system

flux create helmrelease tofu-controller \
  --chart=tofu-controller \
  --source=HelmRepository/tofu-controller \
  --namespace=flux-system \
  --chart-version=">=0.16.0"

# Verify
kubectl get pods -n flux-system | grep tofu
# tofu-controller-5f9b8c7d-xxxxx   1/1   Running   0

A note on naming: the CRD is still called Terraform in the tofu-controller, not OpenTofu. The controller runs OpenTofu binaries underneath, but the Kubernetes API object kept its original name for backwards compatibility. Don’t let that confuse you when reading status output:

# clusters/production/infrastructure/vpc.yaml
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: vpc
  namespace: flux-system
spec:
  interval: 15m
  approvePlan: "auto"      # auto-approve plans; set to "" for manual approval
  path: ./terraform/vpc
  sourceRef:
    kind: GitRepository
    name: fleet-infra
    namespace: flux-system
  vars:
    - name: cluster_name
      value: production-eks
    - name: vpc_cidr
      value: "10.0.0.0/16"
  backendConfig:
    customConfiguration: |
      backend "s3" {
        bucket         = "my-tfstate-bucket"
        key            = "production/vpc/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-state-lock"
        encrypt        = true
      }
  serviceAccountName: tofu-runner     # SA with IRSA permissions to manage VPC resources

When you push a commit that modifies terraform/vpc/, Flux detects the change via the Source Controller, the tofu-controller generates a plan, and with approvePlan: "auto" it applies immediately. For production infrastructure you’ll want approvePlan: "" instead — this pauses after the plan step and requires a human to set the approval field:

# Review the generated plan
kubectl get terraform vpc -n flux-system -o jsonpath='{.status.plan.message}'

# Approve the plan (after review)
kubectl patch terraform vpc -n flux-system \
  --type=merge -p '{"spec":{"approvePlan":"plan-abc123"}}'
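Module outputs can also be surfaced back into the cluster: the Terraform CRD supports writeOutputsToSecret, which writes selected outputs into a Secret that other workloads or Terraform objects can consume (Secret name and output names here are illustrative):

```yaml
  # Added under spec: of the Terraform object above
  writeOutputsToSecret:
    name: vpc-outputs
    outputs:
      - vpc_id
      - private_subnets
```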

IRSA for the tofu-controller

When the tofu-controller runs tofu apply to provision an EKS cluster or VPC, the runner pod needs AWS credentials. The wrong approach is mounting an access key pair as a Secret — that key needs rotation, it can leak, and it’s not auditable at the API call level. IRSA gives the runner pod short-lived credentials tied to a specific IAM role without any secrets in the cluster:

# Create the IRSA role
aws iam create-role \
  --role-name tofu-controller-runner \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:flux-system:tofu-runner"
        }
      }
    }]
  }'
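Hand-editing the account ID and OIDC issuer into that JSON invites typos; a small sketch that templates the trust policy from shell variables and sanity-checks the result (IDs are the same placeholders as above):

```shell
ACCOUNT_ID=123456789012
OIDC_PROVIDER="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
SA="system:serviceaccount:flux-system:tofu-runner"

cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {"${OIDC_PROVIDER}:sub": "${SA}"}
    }
  }]
}
EOF

# Validate the JSON before passing it to aws iam create-role
python3 -m json.tool trust-policy.json > /dev/null && echo "trust policy OK"
```

The same file then feeds the create-role call via --assume-role-policy-document file://trust-policy.json.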

# Attach appropriate policies (scope to what your TF modules actually need)
aws iam attach-role-policy \
  --role-name tofu-controller-runner \
  --policy-arn arn:aws:iam::aws:policy/AmazonVPCFullAccess
# The ServiceAccount for the runner pod
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tofu-runner
  namespace: flux-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/tofu-controller-runner

Multi-Environment Setup

Most teams run at least two clusters (staging and production) with shared application code but different configuration values. The convention the Flux community settled on puts cluster-specific paths under clusters/ and reusable app manifests under apps/. This keeps staging and production separated at the cluster layer without duplicating application manifests:

fleet-infra/
├── clusters/
│   ├── production/
│   │   ├── flux-system/        # Flux components (bootstrapped)
│   │   ├── infrastructure/     # Kyverno, cert-manager, ingress, tofu resources
│   │   └── apps/               # Application kustomizations
│   └── staging/
│       ├── flux-system/
│       ├── infrastructure/
│       └── apps/
└── apps/
    ├── base/                   # Shared app manifests
    └── overlays/
        ├── production/         # Prod-specific patches
        └── staging/            # Staging-specific patches

Each cluster directory contains a kustomization.yaml that references the flux-system, infrastructure, and apps directories:

# clusters/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - flux-system
  - infrastructure
  - apps

The resources list alone doesn't guarantee reconciliation order, so use dependsOn on the Flux Kustomizations to make infrastructure (cert-manager, networking) reconcile before apps start deploying:

# clusters/production/apps/kustomization.yaml (Flux CRD)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure    # Apps wait for infrastructure Kustomization to be Ready
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  path: ./apps/overlays/production
  prune: true
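The infrastructure Kustomization that apps depends on is shaped the same way — a sketch assuming the repository layout above:

```yaml
# clusters/production/infrastructure/kustomization.yaml (Flux CRD)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  path: ./clusters/production/infrastructure
  prune: true
  wait: true   # Ready only once cert-manager, ingress, etc. are healthy
```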

Notifications

The Notification Controller sends Slack or GitHub status updates when reconciliation completes or fails:

apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack-ops
  namespace: flux-system
spec:
  type: slack
  channel: "#deployments"
  secretRef:
    name: slack-webhook-url

---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: on-call-alert
  namespace: flux-system
spec:
  providerRef:
    name: slack-ops
  eventSeverity: error
  eventSources:
    - kind: Kustomization
      name: "*"
    - kind: HelmRelease
      name: "*"
    - kind: Terraform
      name: "*"
  summary: "Flux reconciliation failure in production"

For the ArgoCD alternative to Flux — both are CNCF GitOps operators with different UX tradeoffs — the ArgoCD on EKS guide covers ArgoCD’s app-of-apps pattern and its UI. For the Helm charts Flux manages here, the Helm Charts on EKS guide covers chart structure and OCI registry publishing. For the OpenTofu state bucket and DynamoDB lock table setup, the GitHub Actions with Terraform guide shows the S3 backend configuration pattern that the tofu-controller’s backendConfig references.
