What no one tells you about AWS Auto Scaling Groups!
Most people know that Auto Scaling Groups monitor your servers and adjust capacity based on traffic. That’s the basic pitch, anyway.
But there’s a feature most tutorials skip over: Lifecycle Hooks. I’ve used these to solve some real problems with production instances, and I want to show you how they work.
What Lifecycle Hooks Actually Do
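Lifecycle Hooks let you pause an instance at a specific point in its lifecycle. When an instance is launching or terminating, the Auto Scaling Group holds it in a wait state (Pending:Wait or Terminating:Wait) so you can run custom actions before the transition completes, instead of just letting AWS handle it.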
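To make the wait states concrete, here is a small Python sketch of the state sequence an instance follows when a hook is attached to each transition. The transition names and state names are the ones Auto Scaling uses; the `HOOK_STATES` table and `next_state` helper are illustrations only, not part of any AWS SDK.

```python
from typing import Optional

# State path for each hookable transition; the ...:Wait state is where the hook pauses.
HOOK_STATES = {
    "autoscaling:EC2_INSTANCE_LAUNCHING": [
        "Pending", "Pending:Wait", "Pending:Proceed", "InService",
    ],
    "autoscaling:EC2_INSTANCE_TERMINATING": [
        "Terminating", "Terminating:Wait", "Terminating:Proceed", "Terminated",
    ],
}


def next_state(transition: str, current: str) -> Optional[str]:
    """Return the state that follows `current`, or None once the path is finished."""
    path = HOOK_STATES[transition]
    i = path.index(current)
    return path[i + 1] if i + 1 < len(path) else None


print(next_state("autoscaling:EC2_INSTANCE_TERMINATING", "Terminating"))  # Terminating:Wait
```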
Save Logs from Your EC2 Instance Before It’s Gone
Here’s the problem I kept running into: instances would fail health checks, get marked unhealthy, and then get terminated by the ASG. When that happened, I’d lose all my logs. No trace of what went wrong. It’s frustrating.
The fix is to use a Lifecycle Hook that fires when termination is imminent. The hook’s event triggers a Lambda function through an EventBridge rule, and the function uses SSM to run commands on the instance that package the logs and upload them to S3 before the instance disappears.
When This Saved My Bacon
I had an application that would randomly fail ELB health checks. Every few days, an instance would get terminated and I’d have no idea why. After I set up Lifecycle Hooks, I could actually download the logs afterward and see the real error messages. Previously, they were just gone.
What You Need
The setup uses a few AWS services together:
- Auto Scaling Group (you probably already have this)
- IAM Role for the EC2 instance
- EventBridge Rule (formerly CloudWatch Events, in case you were confused)
- Lambda Function
- IAM Role for Lambda
- SSM Document
- Run Command via Systems Manager
- Optional: SNS for notifications
- Optional: Load Balancer (if you’re not already using one)
It’s simpler than it looks at first. Most of these you might already have.
How It Works
Here’s the flow:
- Instance fails health check and gets marked unhealthy
- ASG initiates termination but pauses thanks to the Lifecycle Hook
- EventBridge catches the termination event
- EventBridge triggers a Lambda function
- Lambda runs a script on the instance via SSM Run Command
- The script tars up the logs and uploads to S3
- Script sends an optional SNS notification
- Script calls CompleteLifecycleAction to tell the ASG to finish termination
The Lambda function is written in Python using boto3. It checks if the SSM document exists, sends the command to the instance, then monitors the command status. If something fails, it still completes the lifecycle action so you’re not stuck with orphaned instances.
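The repository’s handler isn’t reproduced here, but a minimal sketch of the same flow with boto3 might look like this. Assumptions: the event is the EventBridge lifecycle event (its `detail` carries `EC2InstanceId`, `LifecycleHookName`, and `AutoScalingGroupName`); the SSM document parameter names (`S3Bucket`, `ASGName`, `LifecycleHookName`, `InstanceId`, `SNSTarget`) are hypothetical and must match your document; and the script on the instance completes the hook itself on success. The clients are injectable so the logic can be exercised without AWS.

```python
import os
import time

# Terminal failure values of the Status field returned by get_command_invocation.
FAILED_STATUSES = {"Cancelled", "Cancelling", "Failed", "TimedOut"}


def handler(event, context, ssm=None, autoscaling=None, poll_seconds=5, max_polls=60):
    """Hypothetical sketch: check the document, run it, watch it, release on failure."""
    if ssm is None or autoscaling is None:
        import boto3  # real clients inside Lambda; tests can inject fakes instead
        ssm = ssm or boto3.client("ssm")
        autoscaling = autoscaling or boto3.client("autoscaling")

    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]
    hook_name = detail["LifecycleHookName"]
    asg_name = detail["AutoScalingGroupName"]
    doc_name = os.environ["SSM_DOCUMENT_NAME"]

    # Parameter names are assumptions; they must match the SSM document you created.
    params = {
        "S3Bucket": [os.environ["S3BUCKET"]],
        "ASGName": [asg_name],
        "LifecycleHookName": [hook_name],
        "InstanceId": [instance_id],
    }
    if os.environ.get("SNSTARGET"):
        params["SNSTarget"] = [os.environ["SNSTARGET"]]

    try:
        ssm.describe_document(Name=doc_name)  # raises if the document is missing
        command_id = ssm.send_command(
            InstanceIds=[instance_id], DocumentName=doc_name, Parameters=params
        )["Command"]["CommandId"]
        for _ in range(max_polls):
            time.sleep(poll_seconds)
            try:
                status = ssm.get_command_invocation(
                    CommandId=command_id, InstanceId=instance_id
                )["Status"]
            except Exception:
                continue  # the invocation can take a moment to appear
            if status == "Success":
                return "Success"  # the script on the instance completed the hook
            if status in FAILED_STATUSES:
                break
    except Exception as exc:
        print(f"Log backup failed for {instance_id}: {exc}")

    # Failure or polling timeout: release the instance so it doesn't sit in Terminating:Wait.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return "Failed"
```

Keep `poll_seconds * max_polls` below the function’s Lambda timeout (15 minutes at most). If polling runs out while the command is still going, the sketch releases the instance anyway, which matches the “don’t leave it stuck” behavior described above.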
Setting Up the IAM Role for EC2
You’ll need a role that allows the instance to:
- Call autoscaling:CompleteLifecycleAction
- Publish to SNS (if using notifications)
- Upload to S3
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"autoscaling:CompleteLifecycleAction",
"sns:Publish",
"s3:*"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
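Also attach the AmazonSSMManagedInstanceCore managed policy so the instance works with Systems Manager (the older AmazonEC2RoleforSSM policy still shows up in tutorials, but it is deprecated). And make sure the instance can write to your S3 bucket. Don’t forget this, or your logs won’t go anywhere.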
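The policy above works, but `s3:*` on `*` grants far more than uploading a tarball needs. Here is a sketch of a narrower version, generated in Python. The bucket name and topic ARN are placeholders, and the `arn:aws:` prefix assumes the standard commercial partition.

```python
import json


def instance_policy(bucket: str, sns_topic_arn: str = "") -> dict:
    """Least-privilege variant of the EC2 instance policy (names are placeholders)."""
    statements = [
        # Could be narrowed further to the ASG's ARN.
        {"Effect": "Allow", "Action": "autoscaling:CompleteLifecycleAction", "Resource": "*"},
        # Uploads only, and only into this bucket ("aws" partition assumed).
        {"Effect": "Allow", "Action": "s3:PutObject", "Resource": f"arn:aws:s3:::{bucket}/*"},
    ]
    if sns_topic_arn:
        statements.append({"Effect": "Allow", "Action": "sns:Publish", "Resource": sns_topic_arn})
    return {"Version": "2012-10-17", "Statement": statements}


print(json.dumps(instance_policy("my-log-bucket"), indent=2))
```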
Setting Up the Lambda Role
Create a role for Lambda with these policies:
- AWSLambdaBasicExecutionRole
- AmazonSSMFullAccess
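If you’re using SNS notifications, add an inline policy allowing sns:Publish. Also allow autoscaling:CompleteLifecycleAction: the function completes the lifecycle action itself when the command fails, and AmazonSSMFullAccess doesn’t cover that.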
Creating the Lifecycle Hook
In the AWS Console, find your Auto Scaling Group and go to the “Instance Management” tab. Click “Create lifecycle hook.”
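Give it a name. For “Lifecycle transition,” pick “Instance Terminate” to catch termination events. Set “Default Result” to “CONTINUE”; this is the action the ASG takes if the heartbeat timeout expires before anything completes the hook. For a termination hook, both CONTINUE and ABANDON end with the instance terminated, but CONTINUE still lets any other lifecycle hooks run.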
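If you’d rather script this step, the same hook can be created with boto3’s `put_lifecycle_hook`. A minimal sketch; the hook name and the 900-second heartbeat are hypothetical defaults, not values from the post.

```python
def create_termination_hook(autoscaling, asg_name,
                            hook_name="backup-logs-on-terminate",
                            heartbeat_seconds=900):
    """Programmatic equivalent of the console steps above (names are hypothetical)."""
    autoscaling.put_lifecycle_hook(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        # Pause instances in Terminating:Wait before they are terminated.
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
        HeartbeatTimeout=heartbeat_seconds,
        # What the ASG does if nobody completes the hook before the timeout.
        DefaultResult="CONTINUE",
    )
```

Call it with a real client, for example `create_termination_hook(boto3.client("autoscaling"), "your-asg-name")`. No notification target is set because EventBridge receives lifecycle action events without one.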
Heartbeat Timeout Matters
This is where people mess up. The heartbeat timeout determines how long the instance stays in the wait state while your script runs. If you’re backing up large log files, you need more time. Set it too low and the instance terminates before your upload finishes.
I learned this the hard way.
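There’s no exact formula, but a rough estimate helps. The sketch below sizes the timeout from the archive size and an assumed upload throughput, adds a hypothetical fixed overhead and safety factor, then clamps the result to the 30 to 7200 second range that HeartbeatTimeout accepts.

```python
import math

MIN_HEARTBEAT, MAX_HEARTBEAT = 30, 7200  # HeartbeatTimeout limits, in seconds


def suggest_heartbeat_timeout(log_bytes, upload_bytes_per_sec,
                              overhead_sec=60, safety_factor=2.0):
    """Rough sizing: upload time plus fixed overhead, padded, then clamped."""
    upload_sec = log_bytes / upload_bytes_per_sec
    estimate = math.ceil((upload_sec + overhead_sec) * safety_factor)
    return max(MIN_HEARTBEAT, min(MAX_HEARTBEAT, estimate))
```

For example, 3 GB at an assumed 25 MB/s gives 120 s of upload; add 60 s of overhead and double it, and you get 360 seconds. If the job length is unpredictable, the script can instead call `aws autoscaling record-lifecycle-action-heartbeat` periodically to reset the timer; AWS caps the total wait at 48 hours or 100 times the heartbeat timeout, whichever is smaller.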
Creating the SSM Document
Systems Manager Documents define what commands run on your instances. AWS provides pre-built documents, but you’ll create your own for this task.
Go to Systems Manager > Documents > Create document. Give it a name and paste the document content. The document should accept parameters like your S3 bucket name and the log paths you want to back up.
Our document defines the steps: find the logs, create a tar archive, upload to S3, optionally notify via SNS, then signal completion.
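The post doesn’t include the document itself, so here is a hypothetical Command document (schema 2.2) that follows those steps, built as a Python dict. The parameter names, the `/var/log` default, and the archive naming are assumptions. It also assumes the AWS CLI is installed on the instance and can resolve a region.

```python
import json


def build_backup_document() -> dict:
    """Hypothetical SSM Command document: archive logs, upload, notify, complete the hook."""
    run_command = [
        'ARCHIVE="/tmp/logs-{{InstanceId}}-$(date +%s).tar.gz"',
        'tar -czf "$ARCHIVE" {{LogPaths}}',
        'aws s3 cp "$ARCHIVE" "s3://{{S3Bucket}}/{{ASGName}}/$(basename "$ARCHIVE")"',
        'if [ -n "{{SNSTarget}}" ]; then aws sns publish --topic-arn "{{SNSTarget}}"'
        ' --message "Logs for {{InstanceId}} uploaded"; fi',
        # No "set -e", so this runs even if a step above failed: the instance is always released.
        'aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE'
        ' --lifecycle-hook-name "{{LifecycleHookName}}"'
        ' --auto-scaling-group-name "{{ASGName}}" --instance-id "{{InstanceId}}"',
    ]
    return {
        "schemaVersion": "2.2",
        "description": "Archive logs to S3, then complete the terminate lifecycle hook.",
        "parameters": {
            "S3Bucket": {"type": "String", "description": "Destination bucket"},
            "LogPaths": {"type": "String", "default": "/var/log", "description": "Space-separated paths"},
            "ASGName": {"type": "String", "description": "Auto Scaling Group name"},
            "LifecycleHookName": {"type": "String", "description": "Lifecycle hook name"},
            "InstanceId": {"type": "String", "description": "Instance being terminated"},
            "SNSTarget": {"type": "String", "default": "", "description": "Optional SNS topic ARN"},
        },
        "mainSteps": [
            {"action": "aws:runShellScript", "name": "backupLogs", "inputs": {"runCommand": run_command}},
        ],
    }


print(json.dumps(build_backup_document(), indent=2))
```

You can paste the printed JSON into the console, or create the document with `boto3.client("ssm").create_document(Content=..., Name=..., DocumentType="Command", DocumentFormat="JSON")`. The script completes the hook with CONTINUE even when the upload fails, because the instance is terminated either way.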
Creating the Lambda Function
Create a new Lambda function with a Python 3.9 or later runtime (the handler targets 3.9, but newer runtimes should work fine). Paste the handler code from the GitHub repository.
Set these environment variables:
- S3BUCKET: Your bucket name
- SNSTARGET: ARN of your SNS topic (optional)
- SSM_DOCUMENT_NAME: The document you created above
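To catch a missing variable when you first test the function rather than in the middle of a real termination, the handler can validate its configuration up front. A sketch; the `HandlerConfig` and `load_config` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HandlerConfig:
    s3_bucket: str
    ssm_document_name: str
    sns_target: Optional[str]  # None when notifications are off


def load_config(env: Mapping[str, str] = os.environ) -> HandlerConfig:
    """Read the three variables, failing fast if a required one is missing or empty."""
    missing = [k for k in ("S3BUCKET", "SSM_DOCUMENT_NAME") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return HandlerConfig(
        s3_bucket=env["S3BUCKET"],
        ssm_document_name=env["SSM_DOCUMENT_NAME"],
        sns_target=env.get("SNSTARGET") or None,
    )
```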
Creating the EventBridge Rule
Go to EventBridge > Rules > Create rule.
Choose “Rule with an event pattern.” For the event source, select “Other.” Use this pattern:
{
"source": ["aws.autoscaling"],
"detail-type": ["EC2 Instance-terminate Lifecycle Action"],
"detail": {
"AutoScalingGroupName": ["your-asg-name"]
}
}
Attach your Lambda function as the target.
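The same rule can be created through the API. One difference from the console: when you pick a Lambda target in the console, it adds the invoke permission for you, but through the API you grant it yourself with `add_permission`. A sketch with boto3-style clients; the rule name, target ID, and statement ID are hypothetical. Omitting `EventBusName` puts the rule on the default event bus, which is where Auto Scaling events arrive.

```python
import json


def create_terminate_rule(events, lambda_client, asg_name, function_arn,
                          rule_name="asg-terminate-backup-logs"):
    """API equivalent of the console steps above (names are hypothetical)."""
    pattern = {
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
        "detail": {"AutoScalingGroupName": [asg_name]},
    }
    rule_arn = events.put_rule(
        Name=rule_name, EventPattern=json.dumps(pattern), State="ENABLED"
    )["RuleArn"]
    events.put_targets(
        Rule=rule_name, Targets=[{"Id": "backup-logs-lambda", "Arn": function_arn}]
    )
    # Allow EventBridge (this rule only) to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```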
Testing It
Set your ASG’s Desired Capacity and Minimum Capacity to 0. This forces termination of all instances, which triggers the lifecycle hooks. Watch the Instance Management tab, where you’ll see instances go into the “Terminating:Wait” state.
Check CloudWatch Logs for the Lambda output. Then verify that your S3 bucket has the uploaded log files. You can also check the Run Command history in the Systems Manager console to see whether the commands executed properly.
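If you prefer the terminal to the console, here is a small sketch that scales the group to zero and prints each instance’s `LifecycleState` until the group is empty. The function name and polling values are assumptions; point it only at a test ASG, since it terminates every instance in the group.

```python
import time


def drain_and_watch(autoscaling, asg_name, polls=20, poll_seconds=15):
    """Scale the group to zero, then print lifecycle states until it is empty."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name, MinSize=0, DesiredCapacity=0
    )
    for _ in range(polls):
        groups = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[asg_name]
        )["AutoScalingGroups"]
        instances = groups[0]["Instances"] if groups else []
        if not instances:
            return True  # every instance has finished terminating
        for inst in instances:
            # Expect Terminating:Wait while the log backup runs.
            print(inst["InstanceId"], inst["LifecycleState"])
        time.sleep(poll_seconds)
    return False  # instances still present after the polling budget
```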
Troubleshooting
“Not authorized to perform: autoscaling:CompleteLifecycleAction”
This means your EC2 instance role is missing permissions. Add the IAM policy from earlier to the instance’s role.
Run Command failing
Check CloudWatch Logs from the Lambda function. Also look at SSM Command History in Systems Manager—you can see the exact parameters that were passed and any error messages.
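To pull the same information programmatically, here is a sketch using `list_commands` (for the parameters that were sent) and `get_command_invocation` (for the status and output on one instance). The helper name and tail length are hypothetical.

```python
def explain_command(ssm, command_id, instance_id, tail_chars=2000):
    """Summarize one Run Command invocation for troubleshooting."""
    commands = ssm.list_commands(CommandId=command_id)["Commands"]
    inv = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
    return {
        "parameters": commands[0]["Parameters"] if commands else {},
        "status": inv["Status"],
        "details": inv.get("StatusDetails", ""),
        # Keep only the end of each stream, where errors usually appear.
        "stdout": inv.get("StandardOutputContent", "")[-tail_chars:],
        "stderr": inv.get("StandardErrorContent", "")[-tail_chars:],
    }
```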
Lifecycle hook not firing
Verify that your EventBridge rule is on the default event bus (that’s where AWS service events such as Auto Scaling lifecycle actions are delivered) and that the ASG name in your filter matches exactly.
What Else You Can Do
Lifecycle Hooks aren’t just for logs. I’ve used them to:
- Gracefully drain GitLab Runners before termination
- Deregister instances from service registries
- Snapshot databases before shutdown
- Move persistent data to another instance
The possibilities are wider than most articles suggest. Once you understand the pattern—pause, do something, resume—you can adapt it to all kinds of scenarios.
The Terraform Shortcut
If you don’t want to click through all these steps manually, there’s a Terraform module that deploys this entire setup. You just provide the ASG name and S3 bucket. It takes about 30 seconds to provision instead of an hour of console work. Check the GitHub repo linked in the post for details.
Conclusion
Auto Scaling Lifecycle Hooks, Lambda, and Run Command together let you react to scaling events in ways AWS doesn’t support out of the box. It’s not glamorous, but it’s saved me more than once when instances died unexpectedly and I needed to figure out why.
The setup takes a bit of effort, but once it’s running, you stop losing logs to terminations. That’s worth it in my book.