EKS Karpenter Autoscaling: Faster Nodes, Smarter Scheduling
Karpenter hit v1.0 in 2024, and for most EKS clusters it's now the better choice over Cluster Autoscaler. The performance difference alone justifies the switch: Cluster Autoscaler typically takes 2-5 minutes to add a node, while Karpenter does it in under 60 seconds. But the more interesting reason to switch is how Karpenter thinks about nodes. It doesn't manage Auto Scaling Groups; it provisions EC2 instances directly, based on what your pending pods actually need, right now.
What Karpenter Does Differently
Cluster Autoscaler operates by scaling existing ASG node groups up or down. You define node groups in advance — one for on-demand, one for spot, maybe one for GPU workloads — and Cluster Autoscaler picks which group to expand when pods are pending. If none of your groups fit the pod’s requirements, the pod stays pending. Cluster Autoscaler can’t create new instance types or change what a node group offers.
Karpenter bypasses all of that. When a pod is unschedulable, Karpenter reads the pod’s resource requests, node selectors, tolerations, and affinity rules, then directly calls the EC2 API to launch an instance that satisfies those requirements. It considers hundreds of instance types simultaneously and picks the most appropriate one. The ASG middleman is gone.
This matters practically. A Cluster Autoscaler setup typically needs multiple carefully tuned node groups covering your expected workload mix. Karpenter replaces that with a single NodePool that describes constraints (instance families, zones, capacity types) and figures out which specific instance to launch per workload.
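As a concrete sketch (names and image are illustrative), everything Karpenter needs to pick an instance type is already in a pending pod's spec:

```yaml
# Hypothetical workload. Karpenter reads the resource requests, the
# nodeSelector, and any tolerations/affinity, then launches an instance
# that satisfies all of them -- e.g. something with at least 2 vCPU and 4Gi.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                    # illustrative name
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
    karpenter.sh/capacity-type: spot    # well-known Karpenter label
  containers:
  - name: worker
    image: my-registry/batch-worker:latest   # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
```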
Installing Karpenter on EKS
Karpenter runs as a Deployment in your cluster and needs two things: an IAM role with permissions to call EC2 and describe the cluster, and a way to get those credentials. The standard approach is IRSA (IAM Roles for Service Accounts) — the Karpenter service account gets annotated with an IAM role ARN, and the pod receives short-lived credentials automatically.
Setting up via Helm:
# Set cluster variables
export CLUSTER_NAME="my-eks-cluster"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION="us-east-1"
# Create the Karpenter IAM role (using eksctl for IRSA)
eksctl create iamserviceaccount \
  --cluster $CLUSTER_NAME \
  --namespace karpenter \
  --name karpenter \
  --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --attach-policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME} \
  --approve
# Install Karpenter from its OCI registry (the legacy charts.karpenter.sh
# Helm repo is deprecated; v1.x charts are published to OCI only)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version "1.0.6" \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=Karpenter-${CLUSTER_NAME}" \
  --set serviceAccount.create=false \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --wait
The interruptionQueue setting is an SQS queue name that Karpenter uses to receive EC2 Spot interruption notices and scheduled rebalance events. When AWS is about to terminate a Spot instance, Karpenter cordons and drains the node before the termination happens, giving pods time to reschedule cleanly.
Karpenter’s IAM policy is wider than you’d expect. It needs to call RunInstances, TerminateInstances, CreateFleet, and a dozen Describe actions to do its job. The precise set of actions changes between Karpenter versions, so use the policy from the official Karpenter docs rather than hand-crafting it. The IAM roles and policies guide covers how IRSA wires the service account to the IAM role so Karpenter picks up credentials without static keys.
NodePool: The Core Resource
A NodePool defines what kinds of nodes Karpenter can provision. It’s the replacement for manually managed node groups:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["2"]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
The requirements section defines constraints. instance-category: c, m, r covers compute, general purpose, and memory-optimized families. instance-generation Gt 2 excludes older generation instances. Karpenter will choose from any instance type within those constraints that fits the pending pod.
limits caps total cluster capacity. When your cluster hits 1,000 CPU across Karpenter-managed nodes, no more nodes get provisioned. Useful for cost control during runaway scaling.
disruption.consolidationPolicy: WhenEmptyOrUnderutilized enables node consolidation — Karpenter actively looks for underutilized nodes and replaces them with fewer, fuller ones. An overprovisioned cluster of t3.medium instances gets consolidated into fewer c5.large instances automatically.
EC2NodeClass: Instance Configuration
The NodePool references an EC2NodeClass that holds AWS-specific configuration:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  role: KarpenterNodeRole-my-eks-cluster
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-eks-cluster
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-eks-cluster
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      volumeSize: 50Gi
      volumeType: gp3
      iops: 3000
      throughput: 125
      deleteOnTermination: true
amiSelectorTerms with alias: al2023@latest tells Karpenter to always use the latest Amazon Linux 2023 EKS-optimized AMI for your cluster version. No manual AMI updates needed — Karpenter tracks the latest AMI automatically and uses drift detection to replace nodes running outdated AMIs.
Subnets and security groups are discovered by tag. Tag your private subnets and the cluster’s node security group with karpenter.sh/discovery: <cluster-name> and Karpenter finds them automatically.
Spot + On-Demand Strategy
The capacity-type requirement in the NodePool controls which capacity types Karpenter uses. Setting it to ["spot", "on-demand"] means Karpenter prefers Spot when available and falls back to on-demand. Spot instances can be 60-90% cheaper than on-demand for the same instance type.
For workloads that can’t tolerate interruption (stateful services, long-running jobs with no checkpointing), use a separate NodePool that only allows on-demand:
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      taints:
      - key: workload-type
        value: stateful
        effect: NoSchedule
Workloads targeting this pool need a matching toleration. Everything else goes to the mixed-capacity default pool. The taint ensures stateful workloads don’t accidentally land on spot nodes.
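A workload aimed at this pool would carry the matching toleration; since a toleration only permits scheduling rather than forcing it, a nodeSelector on the well-known capacity-type label pins it there. A sketch (Deployment fragment, names mirror the taint above):

```yaml
# Illustrative pod template fragment for the on-demand-only pool
spec:
  template:
    spec:
      tolerations:
      - key: workload-type        # matches the NodePool taint
        operator: Equal
        value: stateful
        effect: NoSchedule
      nodeSelector:
        karpenter.sh/capacity-type: on-demand   # keep it off spot nodes
```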
Karpenter’s Spot strategy by default uses price-capacity-optimized allocation, which balances cost savings with availability. It selects from instance types where AWS has high Spot capacity, reducing interruption probability while still getting significant discounts over on-demand.
Consolidation in Practice
Consolidation is where Karpenter saves money passively. When a node’s utilization drops (pods move away, requests decrease), Karpenter evaluates whether it can be replaced with a smaller node or removed entirely.
WhenEmptyOrUnderutilized triggers consolidation when a node is empty (no pods except DaemonSets) or when the pods could fit on other existing nodes. Karpenter cordons the node, drains it (respecting PodDisruptionBudgets), and terminates it.
consolidateAfter: 1m means Karpenter waits one minute after a node becomes eligible before consolidating. Too short and you get thrashing during brief load spikes. One to five minutes is a reasonable range for most workloads.
Pods that have karpenter.sh/do-not-disrupt: "true" annotation won’t be evicted during consolidation. Use this for critical pods you want to protect from disruption windows, but be conservative — over-annotating prevents consolidation from working.
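The annotation goes on the pod itself (or the pod template), for example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-processor            # illustrative name
  annotations:
    # Blocks voluntary disruption (consolidation, drift replacement)
    # for as long as this pod is running on the node.
    karpenter.sh/do-not-disrupt: "true"
```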
Drift Detection
Drift detection is Karpenter’s mechanism for keeping nodes up to date. A node is “drifted” when it no longer matches the current NodePool or EC2NodeClass specification. Common drift causes: AMI updates, security group changes, subnet configuration changes, NodePool requirement updates.
When a node drifts, Karpenter schedules a replacement. It provisions the new node first, then drains and terminates the old one. Zero-downtime rolling updates happen automatically.
This replaces the painful manual process of rolling node group updates. With Cluster Autoscaler, updating an AMI means updating the launch template, triggering a node group update, waiting for nodes to cycle through. With Karpenter, update the EC2NodeClass AMI selector and nodes refresh themselves over the next disruption window.
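If you'd rather control when AMI-driven drift rolls your nodes, you can pin the alias to a specific AL2023 release instead of @latest and bump it deliberately; the version string below is a hypothetical example:

```yaml
# EC2NodeClass fragment: nodes only drift when you change this value.
spec:
  amiSelectorTerms:
  - alias: al2023@v20240915   # illustrative pin -- use a real AMI release version
```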
disruption.budgets controls how aggressively Karpenter disrupts nodes:
disruption:
  budgets:
  - nodes: "20%"
  - nodes: "0"
    schedule: "0 9 * * 1-5"
    duration: 8h
The first budget allows disrupting up to 20% of nodes at any time; the second blocks all disruption for eight hours starting at 09:00 UTC, Monday through Friday. Budget schedules use standard cron syntax in UTC, and a budget with a schedule must also specify a duration.
Monitoring Karpenter
Karpenter exposes Prometheus metrics at :8080/metrics. The ones worth watching:
karpenter_nodes_total — total nodes managed by Karpenter, labeled by NodePool. Sudden drops might indicate consolidation running too aggressively.
karpenter_pods_state — count of pods in each scheduling state (pending, running, etc.). Sustained unschedulable count means Karpenter can’t satisfy pod requirements — likely a NodePool constraint issue.
karpenter_provisioner_scheduling_duration_seconds — how long it takes Karpenter to make a scheduling decision. Should stay under a few seconds.
karpenter_nodepool_usage — resource usage against NodePool limits. Alert when this approaches 90% so you can expand limits before new pods start pending.
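With kube-prometheus-stack, that limit alert could be expressed as a PrometheusRule like the sketch below; the metric and label names follow Karpenter v1 and should be verified against your own /metrics output, since they have changed across versions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: karpenter-alerts          # illustrative name
  namespace: karpenter
spec:
  groups:
  - name: karpenter
    rules:
    - alert: KarpenterNodePoolNearLimit
      # Usage vs. configured limit, per NodePool and resource type.
      expr: karpenter_nodepool_usage / karpenter_nodepool_limit > 0.9
      for: 10m
      labels:
        severity: warning
```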
Connect these to CloudWatch Container Insights for centralized alerting. The Karpenter Helm chart includes a ServiceMonitor for Prometheus if you’re running the kube-prometheus-stack.
Karpenter vs Cluster Autoscaler: When to Use Each
Karpenter is better in most cases, but not all.
Use Karpenter when you want flexible instance selection, fast scale-out (under 60 seconds), automatic consolidation, and simplified node management with fewer node groups. It handles mixed workloads — spot and on-demand, different instance families, GPU nodes — through a single NodePool with appropriate requirements.
Stick with Cluster Autoscaler if you have specific compliance requirements that mandate using predefined ASGs, if your organization’s security policies require all infrastructure to be managed through Auto Scaling Groups for audit purposes, or if you’re running a Kubernetes distribution other than EKS (Karpenter has limited support outside EKS). Also consider Cluster Autoscaler if your team is already deeply familiar with it and the operational overhead of migration isn’t justified by your scale.
For new EKS clusters, Karpenter is the default choice. For existing clusters with Cluster Autoscaler, the migration path is straightforward: install Karpenter alongside Cluster Autoscaler, create NodePools for your workload types, gradually migrate node groups by removing them from Cluster Autoscaler’s management and letting Karpenter take over. Run both simultaneously during the transition — they don’t conflict.
The GitOps workflows for EKS with ArgoCD post covers how to manage Karpenter NodePool and EC2NodeClass configurations through GitOps, which is the recommended approach for keeping Karpenter configuration version-controlled and auditable.