EC2 Auto Scaling Groups: Complete Guide to Scaling Policies and Launch Templates
EC2 Auto Scaling has been around since 2009, but teams still misconfigure it in ways that cost them money or reliability. The most common mistake: using simple scaling policies instead of target tracking, which means the group is always reacting to alarms that fired 60 seconds ago rather than continuously adjusting to current load. The second most common: not using launch templates, which blocks access to newer features like mixed instance types and capacity rebalancing for Spot.
This guide covers every component you need for a properly configured Auto Scaling Group — launch templates, all four scaling policy types, lifecycle hooks, instance refresh for rolling deployments, warm pools for faster scale-out, and the mixed instances policy that blends On-Demand and Spot to cut costs 60-70%.
Launch Templates vs Launch Configurations
Launch configurations are the original way to define what instances an ASG launches. They’re effectively deprecated — AWS stopped adding features to them in 2022 and recommends migrating everything to launch templates. If you’re creating a new ASG, use a launch template. If you have existing ASGs using launch configurations, migrate before your next major change.
Launch templates support versioning (track changes over time), mixed instance types (required for Spot diversification), instance requirements-based selection (let AWS pick the best matching instance type), and all current EC2 features including newer instance families.
# Create a launch template with IMDSv2 required (security best practice)
LT_ID=$(aws ec2 create-launch-template \
--launch-template-name my-app-lt \
--version-description "Initial version" \
--launch-template-data '{
"ImageId": "ami-0abcdef1234567890",
"InstanceType": "m6i.large",
"KeyName": "my-keypair",
"SecurityGroupIds": ["sg-0abc123"],
"IamInstanceProfile": {
"Arn": "arn:aws:iam::123456789012:instance-profile/MyAppInstanceProfile"
},
"MetadataOptions": {
"HttpTokens": "required",
"HttpPutResponseHopLimit": 1
},
"UserData": "'"$(base64 -w 0 << 'EOF'
#!/bin/bash
yum update -y
systemctl enable --now my-app.service  # start the app (placeholder unit name)
EOF
)"'",
"TagSpecifications": [{
"ResourceType": "instance",
"Tags": [{"Key": "Name", "Value": "my-app"}, {"Key": "Env", "Value": "production"}]
}]
}' \
--query 'LaunchTemplate.LaunchTemplateId' \
--output text)
echo "Launch Template: $LT_ID"
# Create a new version when AMI changes
aws ec2 create-launch-template-version \
--launch-template-id $LT_ID \
--source-version 1 \
--version-description "Updated AMI" \
--launch-template-data '{"ImageId": "ami-0newami123456789"}'
# Set default version
aws ec2 modify-launch-template \
--launch-template-id $LT_ID \
--default-version 2
HttpTokens: required enforces IMDSv2 — every request to the instance metadata service must use a session token. This prevents SSRF attacks that could steal the instance’s IAM credentials through the metadata endpoint. Set it on every new launch template.
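To see what the setting changes in practice, here is what metadata access looks like from inside an instance launched with this template. These curl calls only work on the instance itself, since 169.254.169.254 is the link-local metadata endpoint:

```shell
# IMDSv1-style request: with HttpTokens=required this is rejected (HTTP 401)
curl -s -o /dev/null -w "%{http_code}\n" http://169.254.169.254/latest/meta-data/

# IMDSv2: obtain a session token, then present it on every request
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/
```

The HttpPutResponseHopLimit of 1 matters here too: the token PUT response cannot cross a network hop, so a process one hop away (a container behind a bridge network, for example) cannot obtain a token at all.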
Creating an Auto Scaling Group
# Create ASG using the launch template
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name my-app-asg \
--launch-template "LaunchTemplateId=$LT_ID,Version=\$Default" \
--min-size 2 \
--max-size 20 \
--desired-capacity 4 \
--vpc-zone-identifier "subnet-1a,subnet-1b,subnet-1c" \
--target-group-arns "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123" \
--health-check-type ELB \
--health-check-grace-period 300 \
--default-cooldown 300 \
--tags "Key=Name,Value=my-app-asg,PropagateAtLaunch=false"
Two health check types: EC2 (default) checks whether the instance is running and passing EC2 status checks. ELB additionally marks an instance unhealthy when its target fails the load balancer's health check. For any ASG behind a load balancer, use ELB health checks. If you use EC2, an instance can be running but returning 503s and the ASG won't replace it.
HealthCheckGracePeriod: 300 gives new instances 5 minutes to finish bootstrapping before health checks start. Set this to whatever your application startup time actually is — if your app takes 90 seconds to start, set 180 seconds. If you set it too short, the ASG terminates instances before they’re ready. If you set it too long, a truly broken instance stays in the group too long.
Target Tracking Scaling (Preferred)
Target tracking is the right default for most workloads. You specify a target metric value, and the ASG adjusts capacity to maintain it. No alarm configuration, no step definitions — you just say “keep average CPU at 60%” and the scaling logic handles the rest.
# Target tracking on CPU utilization
aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name cpu-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"TargetValue": 60.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 300,
"DisableScaleIn": false
}'
# Target tracking on ALB request count per target (better for web APIs)
aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name request-count-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"TargetValue": 1000.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ALBRequestCountPerTarget",
"ResourceLabel": "app/my-alb/abc123/targetgroup/my-tg/def456"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 300
}'
Scale-out cooldown of 60 seconds lets the ASG add capacity quickly when load spikes. Scale-in cooldown of 300 seconds prevents the ASG from removing instances too eagerly when load briefly dips — you usually want to keep capacity around after a spike in case it returns.
ALBRequestCountPerTarget is often a better target than CPU for web services. Requests per target directly represents how busy each instance is, regardless of how CPU-intensive each request happens to be. A target of 1000 means scale-out triggers once each instance is receiving more than roughly 1000 requests per minute; the metric counts requests routed to each target over the period, not concurrent connections.
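The scaling math behind target tracking is roughly proportional: desired capacity moves toward current capacity times (actual metric / target). A quick sketch with made-up numbers, 4 instances each seeing 1500 requests/minute against the 1000 target:

```shell
current=4; actual=1500; target=1000
# ceil(current * actual / target), written as integer ceiling division
needed=$(( (current * actual + target - 1) / target ))
echo "$needed"  # 6: the group scales out by 2
```

Scale-in works the same way in reverse, subject to the longer scale-in cooldown.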
Step Scaling for Complex Policies
Step scaling gives you fine-grained control over how much capacity to add or remove based on alarm breach severity. When CPU hits 70%, add 2 instances. When it hits 85%, add 4. When it reaches full saturation, add 8.
# Create the scale-out step policy first and capture its ARN —
# the CloudWatch alarm needs it as an action
POLICY_ARN=$(aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name cpu-step-out \
--policy-type StepScaling \
--adjustment-type ChangeInCapacity \
--step-adjustments '[
{"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 15, "ScalingAdjustment": 2},
{"MetricIntervalLowerBound": 15, "MetricIntervalUpperBound": 30, "ScalingAdjustment": 4},
{"MetricIntervalLowerBound": 30, "ScalingAdjustment": 8}
]' \
--estimated-instance-warmup 120 \
--query 'PolicyARN' \
--output text)
# Then create the CloudWatch alarm that invokes it
aws cloudwatch put-metric-alarm \
--alarm-name asg-cpu-high \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 60 \
--evaluation-periods 2 \
--threshold 70 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=AutoScalingGroupName,Value=my-app-asg \
--alarm-actions "$POLICY_ARN"
The intervals are offsets from the alarm threshold. With the alarm firing at 70% CPU, a lower bound of 0 covers 70-85% (add 2 instances), 15 covers 85-100% (add 4), and 30 fires only at full saturation, 100% CPU (add 8). Step scaling doesn't wait out a cooldown between adjustments when a higher step triggers, which helps with sudden large load spikes.
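The mapping can be sketched as a small function. The 70% threshold and the adjustments mirror the policy above; the real evaluation happens in CloudWatch, this is just the arithmetic:

```shell
# Map a CPU reading to the step adjustment the policy above would apply.
# Bounds are offsets from the 70% alarm threshold: lower bound inclusive, upper exclusive.
step_adjustment() {
  local breach=$(( $1 - 70 ))
  if   [ "$breach" -lt 0 ];  then echo 0   # below threshold: no scale-out
  elif [ "$breach" -lt 15 ]; then echo 2   # 70-85%
  elif [ "$breach" -lt 30 ]; then echo 4   # 85-100%
  else                            echo 8   # saturation
  fi
}
step_adjustment 78   # 2
step_adjustment 92   # 4
step_adjustment 100  # 8
```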
Scheduled Scaling
For workloads with predictable patterns — business hours spike, nightly batch jobs, weekly report generation — scheduled scaling pre-positions capacity before you need it rather than reacting after load arrives.
# Scale up for business hours (Monday-Friday 8am UTC)
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-app-asg \
--scheduled-action-name scale-up-business-hours \
--recurrence "0 8 * * MON-FRI" \
--min-size 4 \
--max-size 20 \
--desired-capacity 8
# Scale down overnight
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-app-asg \
--scheduled-action-name scale-down-overnight \
--recurrence "0 20 * * MON-FRI" \
--min-size 2 \
--max-size 20 \
--desired-capacity 2
# Scale down for weekends
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-app-asg \
--scheduled-action-name scale-down-weekend \
--recurrence "0 20 * * FRI" \
--min-size 1 \
--max-size 4 \
--desired-capacity 1
Combine scheduled with target tracking: use scheduled actions to set minimum capacity at appropriate levels, then let target tracking handle variable load within those bounds. The scheduled action raises the floor; target tracking handles the day-to-day variation above it.
Mixed Instances Policy with Spot
The most cost-effective production setup: a base of On-Demand instances for stability, supplemented by Spot instances for variable load. Spot can be 60-70% cheaper than On-Demand for the same instance size.
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name my-app-spot-asg \
--min-size 2 \
--max-size 20 \
--desired-capacity 6 \
--vpc-zone-identifier "subnet-1a,subnet-1b,subnet-1c" \
--mixed-instances-policy '{
"LaunchTemplate": {
"LaunchTemplateSpecification": {
"LaunchTemplateId": "'"$LT_ID"'",
"Version": "$Default"
},
"Overrides": [
{"InstanceType": "m6i.large"},
{"InstanceType": "m6a.large"},
{"InstanceType": "m5.large"},
{"InstanceType": "m5a.large"},
{"InstanceType": "m4.large"}
]
},
"InstancesDistribution": {
"OnDemandBaseCapacity": 2,
"OnDemandPercentageAboveBaseCapacity": 20,
"SpotAllocationStrategy": "capacity-optimized",
"SpotInstancePools": 0
}
}' \
--capacity-rebalance \
--health-check-type ELB \
--health-check-grace-period 300
OnDemandBaseCapacity: 2 keeps 2 On-Demand instances always running. Above that baseline, 20% of new capacity comes from On-Demand, 80% from Spot. At 6 instances: 2 On-Demand base + 0.8 On-Demand variable + 3.2 Spot = roughly 3 On-Demand, 3 Spot.
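The arithmetic generalizes. A sketch of how the distribution settings translate a desired capacity into an On-Demand/Spot split, assuming fractional On-Demand counts round up, consistent with the 6-instance example above:

```shell
# Split desired capacity into On-Demand and Spot, per the policy above
# (base=2 On-Demand, 20% On-Demand above base; fractional On-Demand assumed to round up)
split_capacity() {
  local desired=$1 base=2 pct=20
  local above=$(( desired - base ))
  local od_above=$(( (above * pct + 99) / 100 ))   # ceiling of 20% of above-base capacity
  echo "on-demand=$(( base + od_above )) spot=$(( above - od_above ))"
}
split_capacity 6    # on-demand=3 spot=3
split_capacity 20   # on-demand=6 spot=14
```

At the 20-instance maximum, 14 of 20 instances run at Spot prices, which is where the 60-70% savings figure comes from.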
SpotAllocationStrategy: capacity-optimized picks Spot instances from the pools with the most available capacity, which reduces interruption probability. The alternative lowest-price picks the cheapest Spot pool (SpotInstancePools applies only to that strategy): lower cost but higher interruption rates. For production workloads, capacity-optimized is the better default; the newer price-capacity-optimized strategy, which weighs both pool depth and price, is also worth evaluating for new groups.
Diversifying across 5 instance types (all roughly equivalent in CPU/memory) is what makes Spot reliable. If AWS reclaims m6i.large capacity, the ASG requests m6a.large or m5.large instead. A Spot ASG with only one instance type is fragile.
--capacity-rebalance enables proactive replacement: when AWS signals an instance will be interrupted, the ASG launches a replacement before the interruption happens, then drains the at-risk instance. Without this, you wait for the interruption to happen before replacement starts.
Lifecycle Hooks
Lifecycle hooks pause instance launch or termination to let you run custom actions — registering the instance with a configuration management tool, draining connections, taking a final snapshot.
# Hook on launch: pause new instances before putting them in service
aws autoscaling put-lifecycle-hook \
--auto-scaling-group-name my-app-asg \
--lifecycle-hook-name instance-launch-hook \
--lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
--default-result CONTINUE \
--heartbeat-timeout 300 \
--notification-target-arn arn:aws:sqs:us-east-1:123456789012:instance-launch-queue \
--role-arn arn:aws:iam::123456789012:role/AutoScalingNotificationRole
# Hook on termination: drain gracefully before instance dies
aws autoscaling put-lifecycle-hook \
--auto-scaling-group-name my-app-asg \
--lifecycle-hook-name instance-termination-hook \
--lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
--default-result CONTINUE \
--heartbeat-timeout 120 \
--notification-target-arn arn:aws:sqs:us-east-1:123456789012:instance-termination-queue \
--role-arn arn:aws:iam::123456789012:role/AutoScalingNotificationRole
While in a lifecycle hook, the instance is in the Pending:Wait state (launch) or Terminating:Wait state (terminate). Your automation reads the SQS message, does its work, then signals completion:
# Signal the hook from your automation (or the instance itself)
aws autoscaling complete-lifecycle-action \
--auto-scaling-group-name my-app-asg \
--lifecycle-hook-name instance-launch-hook \
--instance-id i-0abc123 \
--lifecycle-action-result CONTINUE
If your automation fails or times out, --default-result CONTINUE moves the instance forward anyway. Use ABANDON to terminate a failed launch. Set --heartbeat-timeout to slightly longer than your automation takes — if the automation gets stuck, you don’t want instances waiting forever.
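If the drain can outlast the heartbeat timeout, the automation can extend the deadline rather than everyone raising the timeout. A sketch of a drain loop run by your tooling; the instance ID, hook name, and drain check are placeholders, and each heartbeat resets the timeout clock:

```shell
# Drain an instance held in Terminating:Wait, extending the deadline while work remains
INSTANCE_ID=i-0abc123   # in practice, taken from the lifecycle SQS message
while ! connection_count_is_zero "$INSTANCE_ID"; do   # placeholder drain check
  aws autoscaling record-lifecycle-action-heartbeat \
    --auto-scaling-group-name my-app-asg \
    --lifecycle-hook-name instance-termination-hook \
    --instance-id "$INSTANCE_ID"
  sleep 60
done
aws autoscaling complete-lifecycle-action \
  --auto-scaling-group-name my-app-asg \
  --lifecycle-hook-name instance-termination-hook \
  --instance-id "$INSTANCE_ID" \
  --lifecycle-action-result CONTINUE
```

Heartbeats can't extend the wait forever; the overall lifecycle action is still capped (48 hours at most).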
Instance Refresh for Rolling Updates
When you update the launch template (new AMI, new user data), Instance Refresh replaces running instances in a rolling fashion without downtime.
# Start an instance refresh: keep at least 80% of capacity in service,
# replacing up to 20% of instances at a time
aws autoscaling start-instance-refresh \
--auto-scaling-group-name my-app-asg \
--strategy Rolling \
--preferences '{
"MinHealthyPercentage": 80,
"InstanceWarmup": 300,
"CheckpointPercentages": [20, 50, 100],
"CheckpointDelay": 3600
}'
# Monitor progress
aws autoscaling describe-instance-refreshes \
--auto-scaling-group-name my-app-asg \
--query 'InstanceRefreshes[0].{Status:Status,PercentageComplete:PercentageComplete}'
CheckpointPercentages adds gates: the refresh pauses at 20% replaced, waits CheckpointDelay seconds (1 hour here), then continues to 50%, pauses again, then finishes. Use checkpoints to verify the new AMI works correctly before committing to a full rollout. If something’s wrong, cancel the refresh before it completes.
# Cancel if something is wrong
aws autoscaling cancel-instance-refresh \
--auto-scaling-group-name my-app-asg
A cancelled refresh leaves instances where they are: anything already replaced stays on the new AMI, the rest keep the old one. To roll back, start another refresh against the previous template version (recent CLI versions also offer rollback-instance-refresh to automate this).
Warm Pools
Scale-out normally takes 3-5 minutes: launch instance, run user data, pass health checks. Warm pools pre-initialize instances in a stopped state, so scale-out becomes 30-60 seconds instead.
# Create a warm pool — keep 2 pre-initialized stopped instances ready
aws autoscaling put-warm-pool \
--auto-scaling-group-name my-app-asg \
--pool-state Stopped \
--min-size 2 \
--max-group-prepared-capacity 5
When scale-out triggers, warm pool instances start (fast) instead of launching from scratch (slow). You pay EC2 stopped instance rates (storage only, no compute) for warm pool instances. The trade-off: paying for pre-initialized capacity that might not be needed, in exchange for much faster scale response.
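To put "storage only" in numbers, a back-of-envelope sketch. Prices are approximate us-east-1 list rates at the time of writing; verify against current pricing before relying on them:

```shell
# Monthly cost in millidollars: a Stopped warm-pool instance vs a running one
hours=730; root_volume_gb=30
stopped=$(( root_volume_gb * 80 ))   # gp3 ~ $0.080/GB-month -> 2400 ($2.40/month)
running=$(( 96 * hours ))            # m6i.large On-Demand ~ $0.096/hour -> 70080 (~$70/month)
echo "stopped=$stopped running=$running  (millidollars/month)"
```

Roughly $2-3/month per warm instance against ~$70/month to keep it running, which is why a Stopped pool is usually an easy trade.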
For workloads where slow scale-out causes user impact — checkout flows, payment APIs, anything interactive — warm pools are worth the cost. For background processing where a 5-minute delay is acceptable, they’re probably not necessary.
Scale-out latency remains the most common EC2 Auto Scaling complaint. Warm pools solve it at the EC2 layer; App Runner and Lambda solve it by abstracting the instance layer entirely. Which approach fits depends on whether you need the control that EC2 provides. For cost optimization across your compute fleet, the AWS Savings Plans guide and Compute Optimizer guide cover how to right-size and pre-purchase capacity for the stable On-Demand baseline in mixed-instance ASGs.