Spot Instances in AWS [Complete Guide]

Written by Bits Lovers

I have been running workloads on AWS for years, and one of the easiest ways to cut your EC2 bill is Spot Instances. In this post I will walk through what they are, when they make sense, how the pricing works, what happens when AWS takes them back, and how Spot Fleets help you manage everything at scale.

Spot Instances vs On-Demand

A Spot Instance lets you use spare EC2 capacity at a discount. The hourly price is set by AWS and adjusts gradually based on supply and demand. The discount can reach 90% off the On-Demand rate. The tradeoff is that AWS can reclaim the instance with a two-minute warning whenever it needs the capacity back.
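To make the discount concrete, here is a small sketch of the arithmetic. The two hourly prices are hypothetical numbers I made up for illustration, not real AWS quotes.

```python
def spot_savings_percent(on_demand_hourly: float, spot_hourly: float) -> float:
    """Return the percentage saved by paying the Spot price instead of On-Demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return (on_demand_hourly - spot_hourly) / on_demand_hourly * 100


# Hypothetical prices in USD per hour, chosen only to illustrate the formula.
on_demand = 0.40
spot = 0.10
print(f"Savings: {spot_savings_percent(on_demand, spot):.0f}%")  # prints "Savings: 75%"
```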

When to Use Spot Instances

Spot Instances are a good fit for workloads that tolerate interruptions. Think batch processing, containerized tasks, CI/CD builds, big data jobs, and high-performance computing. I also use them for test and dev environments where losing an instance is not a crisis.

They are a bad fit for anything that needs to stay up 24/7. A web server that serves live traffic, a database, or any critical process that cannot be interrupted should run on On-Demand or Reserved Instances instead.

How Spot Pricing Works

The Spot price changes based on long-term supply and demand for each instance type in each Availability Zone. AWS adjusts it gradually, and you pay the current Spot price for as long as your instance runs. You do not need to bid anymore. AWS used to require you to set a maximum price, but these days the default behavior is simpler: your Spot Instance runs whenever capacity is available and the Spot price does not exceed your maximum price, which defaults to the On-Demand price.

If you want, you can still set a maximum price as a safety cap. If the Spot price exceeds your maximum, AWS will interrupt your instance. You get a two-minute warning before that happens.

You can view the price history in the EC2 console by opening Spot Requests and choosing Pricing history. It breaks down by instance type, operating system, and Availability Zone, and you can filter by date range. This is useful for comparing AZs and picking the one with the lowest historical price.
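If you would rather compare AZs in code than in the console, a minimal sketch could look like the following. It assumes records shaped like the "SpotPriceHistory" list that boto3's describe_spot_price_history returns. The helper name cheapest_zone and the sample prices are my own, purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def cheapest_zone(price_history):
    """Return (zone, average_price) for the AZ with the lowest mean Spot price."""
    prices_by_zone = defaultdict(list)
    for record in price_history:
        # SpotPrice arrives as a string in the API response, so convert it.
        prices_by_zone[record["AvailabilityZone"]].append(float(record["SpotPrice"]))
    if not prices_by_zone:
        raise ValueError("price_history is empty")
    averages = {zone: mean(prices) for zone, prices in prices_by_zone.items()}
    zone = min(averages, key=averages.get)
    return zone, averages[zone]


# Hypothetical sample data. In a real script this list would come from
# ec2.describe_spot_price_history(InstanceTypes=[...])["SpotPriceHistory"].
sample = [
    {"AvailabilityZone": "us-east-1a", "InstanceType": "m5.large", "SpotPrice": "0.0400"},
    {"AvailabilityZone": "us-east-1a", "InstanceType": "m5.large", "SpotPrice": "0.0500"},
    {"AvailabilityZone": "us-east-1b", "InstanceType": "m5.large", "SpotPrice": "0.0350"},
    {"AvailabilityZone": "us-east-1b", "InstanceType": "m5.large", "SpotPrice": "0.0370"},
]
print(cheapest_zone(sample))
```

A low average is only one signal. A zone that is cheap on average can still be volatile, so it is worth looking at the spread as well.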

Figure: Spot Instances price history

When Not to Use Spot Instances

Do not use Spot Instances for persistent workloads like web servers or databases. You do not want your primary database living on hardware that can disappear with two minutes of notice. For those cases, stick with On-Demand, Reserved Instances, or EC2 Savings Plans.

Spot Instance Interruptions

Figure: Spot Instances lifecycle

When you create a Spot request, you specify your launch configuration (AMI, instance type, subnet, and so on). The request type can be one-time or persistent. A persistent request will automatically relaunch your instance after an interruption if capacity is still available.
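As a rough sketch, here is how the request parameters could be assembled for boto3's request_spot_instances call. The helper name build_spot_request and the placeholder IDs are hypothetical. The check that stop and hibernate need a persistent request reflects my understanding of the EC2 rules, so verify it against the current documentation.

```python
def build_spot_request(ami_id, instance_type, subnet_id,
                       request_type="one-time",
                       interruption_behavior="terminate",
                       max_price=None):
    """Build keyword arguments for boto3's ec2.request_spot_instances()."""
    if request_type not in ("one-time", "persistent"):
        raise ValueError("request_type must be 'one-time' or 'persistent'")
    if interruption_behavior not in ("terminate", "stop", "hibernate"):
        raise ValueError("unknown interruption behavior")
    # Assumption: stop and hibernate are only accepted on persistent requests.
    if interruption_behavior != "terminate" and request_type != "persistent":
        raise ValueError("stop and hibernate require a persistent request")
    params = {
        "InstanceCount": 1,
        "Type": request_type,
        "InstanceInterruptionBehavior": interruption_behavior,
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "SubnetId": subnet_id,
        },
    }
    if max_price is not None:
        params["SpotPrice"] = str(max_price)  # the API takes the price as a string
    return params


# Placeholder IDs for illustration only.
params = build_spot_request("ami-0123456789abcdef0", "m5.large",
                            "subnet-0123456789abcdef0",
                            request_type="persistent")
# With credentials configured, this could be sent with:
#   boto3.client("ec2").request_spot_instances(**params)
print(params["Type"])  # prints "persistent"
```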

The two-minute warning

When AWS needs the capacity back, you receive an interruption notice. EC2 makes the notice available through the instance metadata service and via an EventBridge event. From that moment you have about two minutes to wrap up. What happens to the instance next depends on the interruption behavior you chose when you created the request:

  • Stop: EC2 stops the instance and starts it again later, once capacity is available at a price within your maximum.
  • Terminate: EC2 terminates the instance, and any cleanup is up to you. This is the default.
  • Hibernate: EC2 saves the in-memory state to the EBS root volume, provided you configured the instance for hibernation.
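The notice in the metadata service is a small JSON document with an action and a time. Here is a minimal sketch of how I would parse it and work out how much time is left. The function names are my own, and the sample body uses a made-up timestamp.

```python
import json
from datetime import datetime, timezone


def parse_instance_action(body: str):
    """Parse the spot/instance-action document into (action, deadline)."""
    doc = json.loads(body)
    # The time field is UTC in ISO 8601 form, for example "2017-09-18T08:22:00Z".
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], deadline.replace(tzinfo=timezone.utc)


def seconds_remaining(deadline, now=None):
    """Return the seconds left until the deadline, never below zero."""
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (deadline - now).total_seconds())


# Made-up notice body and clock reading, for illustration only.
body = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
action, deadline = parse_instance_action(body)
now = datetime(2017, 9, 18, 8, 20, 30, tzinfo=timezone.utc)
print(action, seconds_remaining(deadline, now))  # prints "terminate 90.0"
```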

AWS also provides a rebalance recommendation signal that can arrive ahead of the two-minute warning, although an early arrival is not guaranteed. It tells you the instance is at elevated risk of interruption, which gives you time to proactively replace it without waiting for the hard cutoff.
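If you consume these signals through EventBridge, the two events can be told apart by their detail-type. The sketch below shows one way a handler might classify them. The function name and the sample event are hypothetical, and the event is trimmed to the fields the code reads.

```python
INTERRUPTION = "EC2 Spot Instance Interruption Warning"
REBALANCE = "EC2 Instance Rebalance Recommendation"


def classify_spot_event(event: dict):
    """Return (kind, instance_id), where kind is 'interrupt', 'rebalance', or 'ignore'."""
    if event.get("source") != "aws.ec2":
        return "ignore", None
    instance_id = event.get("detail", {}).get("instance-id")
    detail_type = event.get("detail-type")
    if detail_type == INTERRUPTION:
        return "interrupt", instance_id
    if detail_type == REBALANCE:
        return "rebalance", instance_id
    return "ignore", None


# Trimmed, made-up event for illustration.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance Rebalance Recommendation",
    "detail": {"instance-id": "i-0123456789abcdef0"},
}
print(classify_spot_event(sample_event))  # prints "('rebalance', 'i-0123456789abcdef0')"
```

A rebalance result is the cue to start a replacement early. An interrupt result means the clock is already running.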

The persistent request trap

Here is something that catches people off guard. If you have a persistent Spot request that is still open and you terminate your instances manually, the Spot service will notice that your target capacity is unmet and it will launch replacement instances automatically. You end up in a loop.

The fix is to cancel the Spot request first, and then terminate the instances. Canceling the request alone does not terminate running instances, so you need both steps.
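In code, the order of the two calls is the whole point. Here is a minimal sketch that takes any object exposing the two boto3 EC2 client methods. RecordingClient is a stand-in I wrote so the example runs without AWS credentials, and the IDs are placeholders.

```python
def shut_down_spot_request(ec2, request_ids, instance_ids):
    """Cancel the Spot requests first, then terminate their instances."""
    # Step 1: cancel, so the Spot service stops launching replacements.
    ec2.cancel_spot_instance_requests(SpotInstanceRequestIds=list(request_ids))
    # Step 2: terminate, because canceling leaves running instances alone.
    ec2.terminate_instances(InstanceIds=list(instance_ids))


class RecordingClient:
    """Stand-in for a boto3 EC2 client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def cancel_spot_instance_requests(self, SpotInstanceRequestIds):
        self.calls.append(("cancel", SpotInstanceRequestIds))

    def terminate_instances(self, InstanceIds):
        self.calls.append(("terminate", InstanceIds))


# Dry run with placeholder IDs. In real use, pass boto3.client("ec2") instead.
fake = RecordingClient()
shut_down_spot_request(fake, ["sir-example1"], ["i-0123456789abcdef0"])
print([name for name, _ in fake.calls])  # prints "['cancel', 'terminate']"
```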

Figure: Spot Instances request state

What Are Spot Fleets?

A Spot Fleet is a collection of Spot Instances plus optional On-Demand instances, managed as a single unit. You set a target capacity (say, 50 instances) and the fleet tries to maintain that number within your price constraints.

The advantage of a Spot Fleet over individual Spot requests is that you can define multiple launch pools with different instance types, operating systems, and Availability Zones. The fleet then picks the best combination based on the allocation strategy you choose. If an instance gets interrupted, the fleet automatically requests a replacement from one of your pools.
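To show what multiple launch pools look like in practice, here is a sketch that builds a SpotFleetRequestConfig dictionary for boto3's request_spot_fleet. Each combination of instance type and subnet becomes one launch specification. The helper name, the role ARN, and all the IDs are placeholders I made up.

```python
from itertools import product


def build_fleet_config(fleet_role_arn, ami_id, instance_types, subnet_ids,
                       target_capacity, on_demand_capacity=0,
                       allocation_strategy="capacityOptimized"):
    """Build a SpotFleetRequestConfig with one launch spec per (type, subnet) pair."""
    launch_specs = [
        {"ImageId": ami_id, "InstanceType": itype, "SubnetId": subnet}
        for itype, subnet in product(instance_types, subnet_ids)
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        "OnDemandTargetCapacity": on_demand_capacity,
        "AllocationStrategy": allocation_strategy,
        "Type": "maintain",  # replace interrupted instances to hold the target
        "LaunchSpecifications": launch_specs,
    }


# Placeholder values for illustration only.
config = build_fleet_config(
    fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",
    ami_id="ami-0123456789abcdef0",
    instance_types=["m5.large", "c5.large"],
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    target_capacity=50,
)
# With credentials configured, this could be sent with:
#   boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
print(len(config["LaunchSpecifications"]))  # prints "4"
```

Two instance types across two subnets give the fleet four pools to choose from, and adding a third instance type would bring that to six.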

Allocation Strategies for Spot Fleets

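When you configure a Spot Fleet, you choose how it picks which launch pool to draw from. I cover three allocation strategies here, followed by one parameter that tunes lowestPrice. AWS also offers priceCapacityOptimized, which weighs both price and capacity.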

capacityOptimized

The fleet launches instances from the pools with the most available spare capacity at the time of the request. This is the strategy I recommend for most workloads because it minimizes the chance of interruption. You still get Spot pricing, but from pools that are less likely to be reclaimed.

lowestPrice

The fleet picks the pool with the lowest Spot price. This used to be the default, but it has a downside: the cheapest pool is often the one with the least capacity, which means your instances are more likely to get interrupted. You can combine it with InstancePoolsToUseCount to spread across the N cheapest pools instead of just one.

diversified

The fleet distributes instances evenly across all pools you defined. This is useful when you want maximum availability, since the chance of all pools being interrupted at once is low.

InstancePoolsToUseCount

This is a parameter rather than a strategy of its own, and it is used alongside lowestPrice. Instead of picking the single cheapest pool, the fleet spreads your instances across the N cheapest pools, where N is the value you set. This gives you some diversity while still optimizing for price.
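To see how these choices differ, here is a toy model in Python. It is not the algorithm AWS runs. It is a simplified sketch in which each pool has a made-up price and a made-up count of spare instances, and every name in it is hypothetical.

```python
def split_evenly(target, names):
    """Spread target instances across names, giving any remainder to the first ones."""
    base, extra = divmod(target, len(names))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(names)}


def lowest_price(pools, target, pools_to_use=1):
    """Toy lowestPrice: use the N cheapest pools (N plays the role of InstancePoolsToUseCount)."""
    cheapest = sorted(pools, key=lambda p: p["price"])[:pools_to_use]
    return split_evenly(target, [p["name"] for p in cheapest])


def diversified(pools, target):
    """Toy diversified: spread evenly across every pool."""
    return split_evenly(target, [p["name"] for p in pools])


def capacity_optimized(pools, target):
    """Toy capacityOptimized: put everything in the pool with the most spare capacity."""
    deepest = max(pools, key=lambda p: p["spare"])
    return {deepest["name"]: target}


# Made-up pools: price in USD per hour, spare as a pretend count of free instances.
pools = [
    {"name": "m5.large/us-east-1a", "price": 0.035, "spare": 20},
    {"name": "m5.large/us-east-1b", "price": 0.038, "spare": 400},
    {"name": "c5.large/us-east-1a", "price": 0.041, "spare": 150},
]
print(lowest_price(pools, 10))        # everything lands in the cheapest, shallowest pool
print(lowest_price(pools, 10, 2))     # split across the two cheapest pools
print(diversified(pools, 10))         # split across all three pools
print(capacity_optimized(pools, 10))  # everything lands in the deepest pool
```

In this toy data the cheapest pool is also the one with the least spare capacity, which is exactly the situation where lowestPrice on its own is risky.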

Practical Tips

Here are a few things I have learned running Spot workloads in production:

  • Use multiple instance types. The more flexible you are on instance type, the more pools the fleet can draw from, and the less likely you are to be interrupted.
  • Spread across Availability Zones. Spot pricing and availability vary by AZ. Diversifying reduces risk.
  • Handle the interruption signal. Poll the instance metadata service at http://169.254.169.254/latest/meta-data/spot/instance-action or listen for the EventBridge event. Use the two-minute window to checkpoint work, drain connections, and shut down cleanly.
  • Use Auto Scaling groups with capacity rebalancing. When enabled, ASG proactively replaces Spot Instances that receive a rebalance recommendation, reducing the impact of interruptions.
  • Check the Spot placement score. AWS provides an API that scores how likely a Spot request is to succeed for a given configuration. Use it before launching large fleets.
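For the interruption signal, a polling loop might look roughly like this. It is a sketch under a few assumptions: it uses an IMDSv2 session token, it treats a 404 response as "no notice yet", and the five-second interval is an arbitrary choice of mine. The function names are hypothetical, and the fetch function is passed in as a parameter so the loop can be tried off-instance.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the interruption notice as a dict, or None if there is none yet."""
    # IMDSv2: obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption is scheduled
            return None
        raise


def wait_for_interruption(fetch=fetch_instance_action, interval=5.0,
                          max_polls=None, sleep=time.sleep):
    """Poll until a notice appears; return it, or None once max_polls is used up."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = fetch()
        if notice is not None:
            return notice
        polls += 1
        sleep(interval)
    return None


# On an instance: notice = wait_for_interruption(), then checkpoint and drain.
```

Keeping the loop separate from the HTTP call makes it easy to swap in a fake fetch function in tests, or a different signal source later.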

Conclusion

Spot Instances can cut your EC2 costs by up to 90%, but they only make sense for workloads that can handle being interrupted. Do not put databases or production web servers on Spot. Instead, use them for batch jobs, CI/CD, containers, and other flexible workloads.

Spot Fleets give you a way to manage a mixed pool of Spot and On-Demand instances across multiple instance types and AZs. The capacityOptimized strategy is a solid default because it reduces interruption risk by drawing from pools with the most spare capacity.

Handle the two-minute interruption notice in your application, use persistent requests carefully, and always cancel the request before terminating instances to avoid the re-launch loop.
