AWS Auto Scaling Groups
An Auto Scaling Group (ASG) is the mechanism that turns a static EC2 fleet into an elastic, self-healing one. ASGs launch and terminate instances on demand, replace unhealthy ones, and distribute them across Availability Zones (AZs). Together with a load balancer and a launch template, this is the canonical “web tier” on AWS.
What an ASG is
An ASG is a controller that maintains a defined number of running EC2 instances matching a template, across a set of subnets (AZs). You give it:
- What to launch → a Launch Template (or the older Launch Configuration)
- Where to launch → subnets (each in one AZ)
- How many → min / desired / max capacity
- How to decide → scaling policies
- How to check → health check config
The ASG then reconciles continually: if an instance dies, it launches a replacement; if load grows, it scales out; if load shrinks, it scales in.
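The reconcile loop can be made concrete with a toy model. This is a sketch, not AWS code: the `Group` class, the instance IDs and the healthy flag are hypothetical stand-ins for real EC2 state. Scale-in here simply removes the newest ID, whereas a real ASG applies its termination policy.

```python
from dataclasses import dataclass, field


# Toy model of the ASG reconcile loop; not AWS code. Instance IDs and the
# healthy/unhealthy flag are hypothetical stand-ins for real EC2 state.
@dataclass
class Group:
    desired: int
    instances: dict[str, bool] = field(default_factory=dict)  # instance id -> healthy?
    next_id: int = 0

    def launch(self) -> str:
        iid = f"i-{self.next_id:04d}"
        self.next_id += 1
        self.instances[iid] = True
        return iid


def reconcile(group: Group) -> list[str]:
    """One pass of the control loop; returns the actions it took."""
    actions: list[str] = []
    # Self-healing: terminate anything marked unhealthy.
    for iid, healthy in list(group.instances.items()):
        if not healthy:
            del group.instances[iid]
            actions.append(f"terminate {iid}")
    # Scale out: launch until the running count reaches Desired.
    while len(group.instances) < group.desired:
        actions.append(f"launch {group.launch()}")
    # Scale in: this toy removes the newest ID; a real ASG applies its termination policy.
    while len(group.instances) > group.desired:
        victim = max(group.instances)
        del group.instances[victim]
        actions.append(f"terminate {victim}")
    return actions
```

For example, `reconcile(Group(desired=3))` launches three instances. Marking one unhealthy and calling it again produces one termination and one replacement launch.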
The three capacity numbers
| Setting | Meaning |
|---|---|
| Min | Lower bound — never scale below this |
| Desired | Current target count |
| Max | Upper bound — never scale above this |
Scaling policies adjust Desired; the ASG then launches or terminates to match. Min/Max act as guardrails.
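The guardrail behaviour can be sketched as two small functions. This assumes a manual capacity change outside Min/Max is rejected, mirroring the validation error the ASG API returns. It also assumes a policy-driven change is clamped into range. The function names are hypothetical.

```python
def validate_capacity(min_size: int, desired: int, max_size: int) -> None:
    """Reject a manual setting outside the guardrails (the ASG API returns a validation error)."""
    if not 0 <= min_size <= desired <= max_size:
        raise ValueError(f"desired {desired} must be between min {min_size} and max {max_size}")


def apply_adjustment(min_size: int, desired: int, max_size: int, change: int) -> int:
    """A scaling policy moves Desired, but the result is clamped into [Min, Max]."""
    return max(min_size, min(max_size, desired + change))
```

With Min 2 and Max 10, a policy asking for +5 from 9 lands on 10, and one asking for -5 from 3 lands on 2.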
Launch Templates vs Launch Configurations
- Launch Template: modern, versioned, supports mixed instance types, Spot, and the newest features. Use this.
- Launch Configuration: legacy and immutable (you can’t edit one; you create a new one and swap it in), with limited features. It doesn’t support EC2 instance types released after 2022, and AWS recommends migrating to launch templates.
A launch template captures:
- AMI ID, instance type(s), key pair, security groups
- IAM instance profile
- User data (first-boot script)
- EBS volume specs
- Network interface config, metadata options (IMDSv2 enforcement)
Templates are versioned — point the ASG at a specific version or at $Latest / $Default.
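Here is a sketch of the `LaunchTemplateData` payload for EC2’s CreateLaunchTemplate call (boto3 `ec2.create_launch_template`). It covers the fields listed above, including IMDSv2 enforcement. The AMI, security group, profile name, device name and bootstrap script are placeholders, and the instance type is only an example.

```python
import base64

# Placeholder first-boot script; replace with your real bootstrap.
USER_DATA = """#!/bin/bash
echo "bootstrap started" >> /var/log/bootstrap.log
"""


def launch_template_data(ami_id: str, sg_id: str, profile_name: str) -> dict:
    """LaunchTemplateData for ec2.create_launch_template (all values are placeholders)."""
    return {
        "ImageId": ami_id,
        "InstanceType": "m6i.large",
        "SecurityGroupIds": [sg_id],
        "IamInstanceProfile": {"Name": profile_name},
        # Launch templates expect user data that is already base64-encoded.
        "UserData": base64.b64encode(USER_DATA.encode()).decode(),
        # Root device name depends on the AMI; /dev/xvda is an assumption here.
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 20, "VolumeType": "gp3"}}
        ],
        # Enforce IMDSv2: metadata requests must carry a session token.
        "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"},
    }
```

A call would look like `boto3.client("ec2").create_launch_template(LaunchTemplateName="web-lt", LaunchTemplateData=launch_template_data(...))`, where the name is hypothetical. The ASG then references it as `{"LaunchTemplateName": "web-lt", "Version": "$Latest"}`.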
Multi-AZ by default
You specify multiple subnets across AZs, and the ASG keeps instances balanced evenly across them (the AZRebalance process corrects any drift). If an AZ becomes impaired, the ASG launches replacements in the remaining AZs, which absorb the load; once the AZ recovers, rebalancing spreads instances back across all of them.
This is the fundamental HA unit on AWS. Combined with a load balancer, it gives zero-touch recovery from instance and AZ failures.
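The “spread evenly” goal amounts to per-AZ counts that differ by at most one. The sketch below assumes that goal. Which AZ receives the remainder is arbitrary here, since AWS doesn’t promise an order. The function name and AZ names are illustrative.

```python
def spread(desired: int, azs: list[str]) -> dict[str, int]:
    """Split `desired` across AZs so per-AZ counts differ by at most one."""
    if not azs:
        raise ValueError("need at least one AZ")
    base, extra = divmod(desired, len(azs))
    # The first `extra` AZs get one more; AWS doesn't guarantee this ordering.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}
```

With six instances over three AZs, each AZ holds two. If one AZ is impaired, calling `spread(6, <the two healthy AZs>)` gives three each, which is the failover behaviour described above.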
Health checks — what “unhealthy” means
Two sources:
- EC2 status checks — hypervisor-level (hardware, reachability). On by default.
- ELB health checks — the load balancer’s view of the target’s health. Must enable the “ELB” health check type on the ASG for this to propagate.
If a check fails, the ASG marks the instance unhealthy and terminates + replaces it. That’s the self-healing loop.
Health check grace period — how long after launch before checks count. Gives instances time to boot and bootstrap before being judged. Set it long enough for user-data + app startup.
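This is a simplified model of how the two health sources and the grace period combine into a verdict. It assumes failing checks are not acted on during the grace period, and that ELB health only counts when the ASG’s health check type is `"ELB"`. The function is hypothetical, not an AWS API.

```python
from datetime import datetime, timedelta


def is_unhealthy(
    launched_at: datetime,
    now: datetime,
    grace_period_s: int,
    ec2_ok: bool,
    elb_ok: bool,
    health_check_type: str = "EC2",
) -> bool:
    """Simplified: would the ASG treat this instance as unhealthy right now?"""
    # Inside the grace period, failing checks are not acted on yet.
    if now - launched_at < timedelta(seconds=grace_period_s):
        return False
    # EC2 status checks always count.
    if not ec2_ok:
        return True
    # The load balancer's opinion only counts when the ASG uses the "ELB" type.
    return health_check_type == "ELB" and not elb_ok
```

To turn on ELB health checks with a 5-minute grace period, one option is `boto3.client("autoscaling").update_auto_scaling_group(AutoScalingGroupName="web-asg", HealthCheckType="ELB", HealthCheckGracePeriod=300)`, where the group name is a placeholder.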
Scaling policies
Target tracking (recommended default)
You set a metric and a target value; ASG does the math.
“Keep average CPU at 50%.” → ASG scales out when average climbs above, in when it drops below.
Works with: CPU, network in/out, ALB request count per target, custom metrics. Simplest, most predictable, handles the cool-downs internally.
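The “keep average CPU at 50%” policy can be expressed as keyword arguments for boto3’s `autoscaling.put_scaling_policy`. The group and policy names are placeholders. The `rough_capacity` helper is only an intuition for why the policy scales proportionally; it is not AWS’s published algorithm.

```python
import math


def target_tracking_policy(asg_name: str, target_cpu: float = 50.0) -> dict:
    """Keyword arguments for boto3 autoscaling.put_scaling_policy."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-{int(target_cpu)}",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }


def rough_capacity(current: int, metric: float, target: float) -> int:
    """Intuition only: capacity needed to bring the average back to target."""
    return math.ceil(current * metric / target)
```

Usage: `boto3.client("autoscaling").put_scaling_policy(**target_tracking_policy("web-asg"))`. By the rough estimate, four instances averaging 75% CPU against a 50% target need about six.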
Step scaling
Finer control — “if CPU > 70%, add 2 instances; if > 85%, add 4.” Useful when you want asymmetric behaviour or multiple threshold bands.
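Here is the two-band example as a step scaling policy. Step bounds in `put_scaling_policy` are offsets from the CloudWatch alarm threshold, so with the alarm at 70%, “70 to 85” becomes 0 to 15 and “85 and above” becomes 15 and up. The `adjustment_for` helper is a hypothetical local model of band selection for values above the threshold: lower bound inclusive, upper bound exclusive.

```python
ALARM_THRESHOLD = 70.0  # CloudWatch alarm: CPU > 70%
# Bounds are offsets from the alarm threshold: 70-85% -> +2, 85%+ -> +4.
STEP_ADJUSTMENTS = [
    {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 15.0, "ScalingAdjustment": 2},
    {"MetricIntervalLowerBound": 15.0, "ScalingAdjustment": 4},
]


def step_policy(asg_name: str) -> dict:
    """Keyword arguments for boto3 autoscaling.put_scaling_policy."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-steps",
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": STEP_ADJUSTMENTS,
    }


def adjustment_for(cpu: float) -> int:
    """Which band fires for a given CPU reading (lower bound inclusive, upper exclusive)."""
    if cpu <= ALARM_THRESHOLD:
        return 0  # alarm is not in ALARM state, so the policy doesn't run
    breach = cpu - ALARM_THRESHOLD
    for step in STEP_ADJUSTMENTS:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:
            return step["ScalingAdjustment"]
    return 0
```

A step policy only runs when an alarm invokes it. `put_scaling_policy` returns a `PolicyARN`, which you pass in `AlarmActions` when creating the CPU alarm with `cloudwatch.put_metric_alarm`.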
Simple scaling
“If CPU > 70%, add 1 instance.” One rule, no bands. Older style; step scaling is strictly more flexible.
Scheduled actions
“At 8am weekdays, set desired to 20. At 8pm, set to 5.” Cron-like capacity changes. Stack with dynamic policies.
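The 8am/8pm example maps onto two recurring scheduled actions (boto3 `autoscaling.put_scheduled_update_group_action`), using Unix cron syntax in `Recurrence`. The action names and the `Europe/London` time zone are assumptions. The evening action is also limited to weekdays here. The sketch assumes the group’s Max is at least 20.

```python
def scheduled_actions(asg_name: str, tz: str = "Europe/London") -> list[dict]:
    """Keyword-argument dicts for put_scheduled_update_group_action (names are placeholders)."""
    return [
        {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": "weekday-morning",
            "Recurrence": "0 8 * * 1-5",   # 08:00 Monday-Friday
            "DesiredCapacity": 20,
            "TimeZone": tz,
        },
        {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": "weekday-evening",
            "Recurrence": "0 20 * * 1-5",  # 20:00 Monday-Friday
            "DesiredCapacity": 5,
            "TimeZone": tz,
        },
    ]
```

Each dict is passed as `put_scheduled_update_group_action(**action)`. Dynamic policies keep adjusting Desired between the scheduled changes.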
Predictive scaling
ML-driven — looks at 14 days of history to forecast and pre-warm capacity. Good for predictable daily/weekly patterns. Combines with reactive policies.
Instance types — single vs mixed
Modern ASGs support mixed instance types and mixed purchase options in one group:
- Pick several instance types (e.g. m5.large, m6i.large, m5a.large); the ASG launches whichever has capacity
- Split between On-Demand and Spot with a base count plus a percentage split
- Capacity Rebalancing — proactively replace Spot instances before they’re reclaimed
Typical pattern for cost efficiency: “2 On-Demand baseline, everything else Spot, 4 instance types permitted.”
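The “2 On-Demand baseline, everything else Spot, 4 instance types” pattern corresponds to a `MixedInstancesPolicy` argument for `create_auto_scaling_group` / `update_auto_scaling_group`. The fourth instance type (`m6a.large`) and the template name are illustrative. The split helper rounds the On-Demand share up, which is an assumption and not documented rounding behaviour.

```python
import math


def mixed_instances_policy(lt_name: str) -> dict:
    """MixedInstancesPolicy argument for create/update_auto_scaling_group."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {"LaunchTemplateName": lt_name, "Version": "$Latest"},
            # m6a.large is an illustrative fourth type, not from the text.
            "Overrides": [
                {"InstanceType": t}
                for t in ("m5.large", "m6i.large", "m5a.large", "m6a.large")
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 2,                 # 2 On-Demand baseline
            "OnDemandPercentageAboveBaseCapacity": 0,  # everything above it is Spot
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


def on_demand_spot_split(desired: int, base: int, pct_on_demand: int) -> tuple[int, int]:
    """Approximate (on_demand, spot) counts; rounding the On-Demand share up is an assumption."""
    above = max(0, desired - base)
    on_demand = min(desired, base) + math.ceil(above * pct_on_demand / 100)
    return on_demand, desired - on_demand
```

Capacity Rebalancing is a separate top-level flag (`CapacityRebalance=True`) on the same create/update call.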
Lifecycle hooks
Custom logic at launch and termination:
- EC2_INSTANCE_LAUNCHING: pause before marking the instance InService (e.g. bootstrap config, register with service discovery)
- EC2_INSTANCE_TERMINATING: pause before termination (e.g. deregister, drain connections, snapshot logs)
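Hooks hold instances in a Pending:Wait or Terminating:Wait state until the hook’s heartbeat timeout expires (1 hour by default; the worker can extend it by recording heartbeats). A consumer, typically a Lambda function triggered through EventBridge or a worker polling SQS, performs the work, then calls CompleteLifecycleAction to release the instance.
Without lifecycle hooks, termination is abrupt: an unhealthy instance is killed with no chance to run cleanup beyond the load balancer’s deregistration delay.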
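This is a sketch of a termination-hook consumer, assuming the hook’s events arrive via EventBridge with `EC2InstanceId`, `LifecycleHookName` and `LifecycleActionToken` in `detail`. The `drain` callable is a hypothetical stand-in for your cleanup. `client` is an Auto Scaling client, injected so it can be faked in tests. For a terminating hook, both `CONTINUE` and `ABANDON` still end in termination; `ABANDON` just skips any remaining hooks.

```python
from typing import Callable


def handle_terminating(event: dict, client, drain: Callable[[str], None]) -> dict:
    """Hypothetical Lambda-style handler: drain, then complete the lifecycle action."""
    detail = event["detail"]
    if detail.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return {}  # not a termination hook; nothing to do
    try:
        drain(detail["EC2InstanceId"])  # hypothetical cleanup: deregister, flush logs...
        result = "CONTINUE"
    except Exception:
        result = "ABANDON"  # still terminates, but skips any remaining hooks
    params = {
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleHookName": detail["LifecycleHookName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        "LifecycleActionResult": result,
    }
    client.complete_lifecycle_action(**params)
    return params
```

In a real Lambda, `client` would be `boto3.client("autoscaling")`. If the work may outlast the heartbeat timeout, the consumer should call `record_lifecycle_action_heartbeat` periodically.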
Termination policy — who gets killed?
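When scaling in, the ASG picks which instance to terminate. The Default policy works roughly in this order:
- Pick from the AZ with the most instances (among AZs with at least one instance not protected from scale-in), keeping the group balanced
- For mixed instances groups, prefer instances whose removal brings the On-Demand/Spot mix back in line with the allocation strategy
- Prefer instances launched from the oldest launch template or launch configuration
- Then the instance closest to the next billing hour (a cost optimisation from when billing was hourly; less relevant with per-second), then a random pick
You can override with specific policies: Default, AllocationStrategy, OldestLaunchTemplate, OldestLaunchConfiguration, ClosestToNextInstanceHour, NewestInstance, OldestInstance. OldestInstance is the one that helps with “rolling” an ASG; it is not part of Default.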
Instance Scale-In Protection prevents a specific instance from being chosen for scale-in — useful for a “leader” or a long-running job.
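This is a simplified model of the Default termination policy’s first steps. It skips the mixed-instances and billing-hour steps and stands in a version number for “oldest launch template”. The final random pick is replaced by “first candidate” so results are repeatable. The `Instance` type is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    iid: str
    az: str
    lt_version: int          # launch template version the instance came from
    protected: bool = False  # Instance Scale-In Protection


def pick_for_scale_in(instances: list[Instance]) -> Instance:
    """Simplified Default termination policy (ignores mixed-instances and billing-hour steps)."""
    counts = Counter(i.az for i in instances)
    eligible = [i for i in instances if not i.protected]
    if not eligible:
        raise ValueError("every instance is scale-in protected")
    # 1. Balance AZs: only look in the fullest AZ that still has an unprotected instance.
    top = max(counts[i.az] for i in eligible)
    candidates = [i for i in eligible if counts[i.az] == top]
    # 2. Prefer instances on the oldest launch template version.
    oldest = min(i.lt_version for i in candidates)
    candidates = [i for i in candidates if i.lt_version == oldest]
    # 3. AWS would next consider the billing hour, then pick at random;
    #    this sketch takes the first remaining candidate so results are repeatable.
    return candidates[0]
```

Note how protection interacts with AZ balancing. A protected instance still counts toward its AZ’s size, but it is never itself the victim.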
Instance refresh — rolling updates
Instance Refresh is the built-in way to roll a fleet after a launch-template change. You set a minimum healthy percentage, and the ASG replaces instances in batches without dropping below that floor. Optionally:
- Checkpoints: pause mid-refresh for validation
- Skip matching: skip instances already on the desired configuration
Warm pools are a separate ASG feature rather than a refresh option: pre-initialized, stopped instances that start quickly, trading EBS storage costs for scale-out speed.
This is how you deploy a new AMI across hundreds of instances without downtime.
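Here is a sketch of the request for boto3’s `autoscaling.start_instance_refresh`, with checkpoints and skip matching enabled. The names and percentages are placeholders, and the last checkpoint is 100 so the refresh covers the whole fleet. `max_batch` is a rough, hypothetical estimate of how many instances can be out at once under the healthy floor. It assumes at least one at a time.

```python
import math


def instance_refresh_request(asg_name: str, lt_name: str) -> dict:
    """Keyword arguments for boto3 autoscaling.start_instance_refresh (values are placeholders)."""
    return {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        "DesiredConfiguration": {
            "LaunchTemplate": {"LaunchTemplateName": lt_name, "Version": "$Latest"}
        },
        "Preferences": {
            "MinHealthyPercentage": 90,          # never drop below 90% healthy
            "InstanceWarmup": 300,               # seconds before a new instance counts
            "CheckpointPercentages": [20, 50, 100],
            "CheckpointDelay": 3600,             # pause at each checkpoint for validation
            "SkipMatching": True,                # leave already-updated instances alone
        },
    }


def max_batch(fleet_size: int, min_healthy_pct: int) -> int:
    """Rough upper bound on instances replaced at once without breaching the floor."""
    return max(1, fleet_size - math.ceil(fleet_size * min_healthy_pct / 100))
```

With 100 instances and a 90% floor, this estimate allows batches of about 10.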
Interaction with Elastic Load Balancing
Attach the ASG to one or more target groups (ALB/NLB). As instances are launched, the ASG auto-registers them with the target group; terminations trigger deregistration with the target group’s deregistration delay (connection draining).
The self-healing chain runs end to end. The target group marks an instance unhealthy, and the ASG (with ELB health checks enabled) terminates it and launches a replacement. The ASG registers the replacement with the target group, and once the replacement passes its health checks it starts receiving traffic.
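Wiring the ASG to a target group and tuning connection draining involves two separate calls. One is boto3 `autoscaling.attach_load_balancer_target_groups`; the other is `elbv2.modify_target_group_attributes` for the deregistration delay. The ARNs and the 60-second delay are placeholders.

```python
def attach_target_groups_request(asg_name: str, target_group_arns: list[str]) -> dict:
    """Keyword arguments for boto3 autoscaling.attach_load_balancer_target_groups."""
    return {"AutoScalingGroupName": asg_name, "TargetGroupARNs": list(target_group_arns)}


def deregistration_delay_request(target_group_arn: str, seconds: int = 60) -> dict:
    """Keyword arguments for boto3 elbv2.modify_target_group_attributes (drain time)."""
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be 0-3600 seconds")
    return {
        "TargetGroupArn": target_group_arn,
        # ELBv2 attribute values are passed as strings.
        "Attributes": [{"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)}],
    }
```

The deregistration delay belongs to the target group, not the ASG. It is therefore set once per target group, and every ASG attached to that group inherits it.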
Cooldowns — preventing flapping
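A cooldown period after a scaling activity stops another scale action from firing before the new instances have had time to affect the metric. Cooldowns apply to simple scaling (the group’s default cooldown is 300s); target tracking and step scaling instead use an instance warmup period, so instances that are still starting don’t skew the metric.
Too short a cooldown or warmup causes oscillation (flapping); too long a one delays the response to real load. Tuned well, the group settles into a stable equilibrium.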
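The cooldown idea can be illustrated as a gate that refuses a new scaling action until the cooldown has elapsed since the last one. This is a toy model with a hypothetical class and times in plain seconds, not AWS’s implementation.

```python
class CooldownGate:
    """Toy cooldown gate: at most one scaling action per cooldown window."""

    def __init__(self, cooldown_s: float = 300.0):
        self.cooldown_s = cooldown_s
        self.last_action_s: float | None = None

    def try_scale(self, now_s: float) -> bool:
        """Return True (and record the action) only if the cooldown has elapsed."""
        if self.last_action_s is not None and now_s - self.last_action_s < self.cooldown_s:
            return False  # still cooling down: ignore this alarm
        self.last_action_s = now_s
        return True
```

With the default 300s, an alarm at t=0 scales and one at t=120 is ignored. The next action can fire at t=300.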
Cost
- No charge for the ASG itself — you pay only for the instances it launches (plus the LB it’s attached to).
- Savings Plans / Reserved Instances still apply to the matching On-Demand usage the ASG launches; the ASG only decides which instances run.
- Spot via mixed instances policy is the biggest lever for cost-sensitive stateless tiers.
Common pitfalls
- Single-AZ ASG — defeats the primary purpose. Always span 2+ AZs.
- Grace period too short — new instances get killed mid-bootstrap because ELB health checks fail before the app is up.
- Max set too low. Scaling policies can’t push Desired past Max; the ASG caps there without complaint, and a manual Desired > Max is rejected outright. Check Max first when the group is unexpectedly “not scaling.”
- User-data failures are invisible unless you look for them. Ship bootstrap logs to CloudWatch Logs via the CloudWatch agent and alarm on failures.
- State on instances — ASGs terminate freely; don’t store anything durable on instance volumes. Persist to S3, EFS, RDS, or DynamoDB.
- IAM instance profile missing. Instance boots without expected access; SSM doesn’t work; CloudWatch Agent can’t write.
- Slow bootstrapping. A 4-minute user-data script means 4 minutes of load on surviving instances while the new one spins up. Bake more into the AMI (golden image pattern) or use a warm pool.
- IMDSv1 still enabled. Set the launch template’s metadata options to require tokens (HttpTokens: required) so only IMDSv2 works.
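Several of these pitfalls are mechanical enough to lint before deploying. The sketch below checks a hypothetical, simplified config dict rather than a real AWS API response. Its keys (`subnet_azs`, `grace_period_s`, `bootstrap_s`, `instance_profile`, `http_tokens`) are invented for illustration, and you would map them from your own IaC or `describe_*` output.

```python
def lint_asg(cfg: dict) -> list[str]:
    """Flag a few of the pitfalls above in a hypothetical, simplified config dict."""
    findings = []
    if len(set(cfg.get("subnet_azs", []))) < 2:
        findings.append("single-AZ: span at least two AZs")
    if cfg.get("grace_period_s", 0) < cfg.get("bootstrap_s", 0):
        findings.append("grace period shorter than measured bootstrap time")
    if not cfg.get("instance_profile"):
        findings.append("no IAM instance profile")
    if cfg.get("http_tokens") != "required":
        findings.append("IMDSv1 still allowed: set HttpTokens to 'required'")
    return findings
```

A config spanning two AZs with a 600s grace period, a 240s bootstrap, an instance profile and `http_tokens="required"` produces no findings.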
Mental model
- ASG = a control loop. Desired count is the setpoint; scaling policies adjust the setpoint; health checks + termination policy keep the actuals aligned.
- Launch template = the “shape” of an instance. Versioned, declarative.
- Target groups + ELB = the traffic layer that the ASG feeds.
- Lifecycle hooks = the escape hatch for custom behaviour at transitions.
- Instance Refresh = the “rolling deploy” primitive.
Used together (ASG + LT + ELB + health checks), you have a self-healing, elastic, multi-AZ compute tier that survives most failure modes without operator intervention — the AWS answer to “stateless horizontal scaling.”