AWS Auto Scaling Groups

An Auto Scaling Group (ASG) is the mechanism that turns a static EC2 fleet into an elastic, self-healing one. ASGs launch and terminate instances on demand, replace unhealthy ones, and distribute them across AZs. Together with a load balancer and a launch template, this is the canonical “web tier” on AWS.

What an ASG is

An ASG is a controller that maintains a defined number of running EC2 instances matching a template, across a set of subnets (AZs). You give it:

  1. What to launch → a Launch Template (or the older Launch Configuration)
  2. Where to launch → subnets (each in one AZ)
  3. How many → min / desired / max capacity
  4. How to decide → scaling policies
  5. How to check → health check config

And the ASG continually reconciles: if an instance dies, launch a replacement. If load grows, scale out. If load shrinks, scale in.
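
The reconciliation idea can be sketched in a few lines of plain Python. This is a toy model, not AWS's implementation; the `Instance` class and `reconcile` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def reconcile(instances: list[Instance], desired: int) -> tuple[list[str], int]:
    """Return (ids_to_terminate, number_to_launch) for one reconciliation pass."""
    # Unhealthy instances are always replaced.
    to_terminate = [i.instance_id for i in instances if not i.healthy]
    healthy = [i for i in instances if i.healthy]
    # Surplus healthy instances are removed (scale in).
    surplus = max(0, len(healthy) - desired)
    to_terminate += [i.instance_id for i in healthy[:surplus]]
    # Any shortfall is made up with new launches (scale out or replacement).
    to_launch = max(0, desired - len(healthy))
    return to_terminate, to_launch
```

Calling `reconcile` repeatedly with the current fleet and the current desired count is the whole loop; everything else in an ASG decides what `desired` should be and which instances count as healthy.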

The three capacity numbers

Setting   Meaning
Min       Lower bound: never scale below this
Desired   Current target count
Max       Upper bound: never scale above this

Scaling policies adjust Desired; the ASG then launches or terminates to match. Min/Max act as guardrails.
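
The guardrail behaviour amounts to clamping. A minimal sketch, assuming a policy expresses its decision as a signed change in capacity (`apply_adjustment` is a hypothetical helper, not an AWS API):

```python
def apply_adjustment(desired: int, change: int, min_size: int, max_size: int) -> int:
    """Apply a policy's requested change, then clamp the result to [min_size, max_size]."""
    return max(min_size, min(max_size, desired + change))
```

With Min=2 and Max=8, a request to add 10 instances to a group of 4 lands at 8, and a request to remove 10 lands at 2.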

Launch Templates vs Launch Configurations

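  • Launch Template: modern, versioned, supports mixed instance types, Spot, and the newest features. Use this.
  • Launch Configuration: legacy and immutable (you can’t edit one; you create a new one and swap it in), with limited features. AWS recommends migrating to launch templates and no longer adds support for new features or instance types to launch configurations.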

A launch template captures:

  • AMI ID, instance type(s), key pair, security groups
  • IAM instance profile
  • User data (first-boot script)
  • EBS volume specs
  • Network interface config, metadata options (IMDSv2 enforcement)

Templates are versioned — point the ASG at a specific version or at $Latest / $Default.
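
As an illustration, the fields above map onto the `LaunchTemplateData` structure of the EC2 `CreateLaunchTemplate` API roughly as follows. The helper name and all argument values are placeholders; the sketch only builds the dictionary and makes no AWS call.

```python
import base64


def build_launch_template_data(ami_id, instance_type, security_group_ids,
                               instance_profile_name, user_data_script):
    """Build a LaunchTemplateData dict for EC2 CreateLaunchTemplate."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "SecurityGroupIds": list(security_group_ids),
        "IamInstanceProfile": {"Name": instance_profile_name},
        # Launch templates expect user data as base64-encoded text.
        "UserData": base64.b64encode(user_data_script.encode("utf-8")).decode("ascii"),
        # HttpTokens="required" enforces IMDSv2.
        "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"},
    }
```

With boto3 installed and credentials configured, the result could be passed as `LaunchTemplateData=` to `boto3.client("ec2").create_launch_template(...)` together with a `LaunchTemplateName`.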

Multi-AZ by default

You specify multiple subnets across AZs. The ASG spreads instances evenly across them, using AZ-rebalance internally. If an AZ goes down, surviving AZs pick up the load; new instances skip the failed AZ until it returns.

This is the fundamental HA unit on AWS. Combined with a load balancer, it gives zero-touch recovery from instance and AZ failures.
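
A rough picture of the even spread, assuming a simple round-robin split (AWS balances on a best-effort basis, so treat this as a model rather than the real algorithm; `spread_across_azs` is a hypothetical helper):

```python
def spread_across_azs(desired: int, azs: list[str]) -> dict[str, int]:
    """Split `desired` instances as evenly as possible over the given AZs."""
    base, extra = divmod(desired, len(azs))
    # The first `extra` AZs each take one additional instance.
    return {az: base + (1 if idx < extra else 0) for idx, az in enumerate(azs)}
```

Dropping a failed AZ from the list and calling the function again shows how the surviving AZs absorb its share.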

Health checks — what “unhealthy” means

Two sources:

  • EC2 status checks — hypervisor-level (hardware, reachability). On by default.
  • ELB health checks — the load balancer’s view of the target’s health. Must enable the “ELB” health check type on the ASG for this to propagate.

If a check fails, the ASG marks the instance unhealthy and terminates + replaces it. That’s the self-healing loop.

Health check grace period — how long after launch before checks count. Gives instances time to boot and bootstrap before being judged. Set it long enough for user-data + app startup.
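
The interaction between the two check sources and the grace period can be modelled as below. This is a simplified sketch with hypothetical names; the real service has additional rules (for example, around instances that are stopped or terminated).

```python
from datetime import datetime, timedelta


def should_replace(launch_time: datetime, now: datetime, grace_period_seconds: int,
                   ec2_ok: bool, elb_ok: bool, use_elb_checks: bool) -> bool:
    """Decide whether an instance should be marked unhealthy and replaced."""
    # Inside the grace period, failing checks are ignored.
    if now - launch_time < timedelta(seconds=grace_period_seconds):
        return False
    # EC2 status checks always count.
    if not ec2_ok:
        return True
    # ELB health only counts when the ELB health check type is enabled.
    return use_elb_checks and not elb_ok
```

The third argument is where a too-short grace period bites: an instance still running user data fails its ELB check and is replaced before it ever serves traffic.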

Scaling policies

Target tracking

You set a metric and a target value; the ASG does the math.

“Keep average CPU at 50%.” → ASG scales out when average climbs above, in when it drops below.

Works with: CPU, network in/out, ALB request count per target, custom metrics. Simplest, most predictable, handles the cool-downs internally.
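
A sketch of what such a policy looks like as request parameters for the Auto Scaling `PutScalingPolicy` API, plus a proportional estimate of the capacity it aims for. The estimate is an approximation for intuition only (the service's actual calculation is not spelled out here), and the function names and policy name are made up for the example.

```python
import math


def target_tracking_policy(group_name: str, target_cpu_percent: float) -> dict:
    """Parameters for a CPU target-tracking policy (PutScalingPolicy)."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-target-tracking",  # placeholder name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def estimate_desired(current_instances: int, metric_value: float, target_value: float) -> int:
    """Rough proportional estimate: capacity needed to bring the metric back to target."""
    return max(1, math.ceil(current_instances * metric_value / target_value))
```

For example, four instances averaging 75% CPU against a 50% target suggest about six instances; the same four at 25% suggest two.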

Step scaling

Finer control — “if CPU > 70%, add 2 instances; if > 85%, add 4.” Useful when you want asymmetric behaviour or multiple threshold bands.
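
The banded rule above can be sketched as a lookup over offsets from the alarm threshold, which is also how the API expresses step adjustments (lower and upper bounds relative to the threshold). The function and the tuple layout are hypothetical, and the sketch assumes the alarm has already fired.

```python
def step_adjustment(metric: float, threshold: float, steps: list[tuple]) -> int:
    """Return the capacity change for the band that the metric falls into.

    Each step is (lower_offset, upper_offset_or_None, adjustment), with offsets
    measured from the alarm threshold; None means "no upper bound".
    """
    breach = metric - threshold
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0  # below the threshold: no change


# Threshold 70%: add 2 in the 70-85% band, add 4 at 85% and above.
CPU_STEPS = [(0, 15, 2), (15, None, 4)]
```

Here `step_adjustment(75, 70, CPU_STEPS)` yields 2 and `step_adjustment(90, 70, CPU_STEPS)` yields 4.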

Simple scaling

“If CPU > 70%, add 1 instance.” One rule, no bands. Older style; step scaling is strictly more flexible.

Scheduled actions

“At 8am weekdays, set desired to 20. At 8pm, set to 5.” Cron-like capacity changes. Stack with dynamic policies.
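
The two actions in that example might be expressed as parameters for the `PutScheduledUpdateGroupAction` API along these lines. The action names are placeholders, the evening action is assumed to run on weekdays too, and the cron expressions are evaluated in UTC unless a `TimeZone` is supplied.

```python
def weekday_schedule(group_name: str) -> list[dict]:
    """Scheduled actions: 20 instances at 08:00 and 5 at 20:00, Monday to Friday."""
    return [
        {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": "weekday-morning-scale-up",  # placeholder
            "Recurrence": "0 8 * * 1-5",
            "DesiredCapacity": 20,
        },
        {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": "weekday-evening-scale-down",  # placeholder
            "Recurrence": "0 20 * * 1-5",
            "DesiredCapacity": 5,
        },
    ]
```

Each dictionary could be passed as keyword arguments to `boto3.client("autoscaling").put_scheduled_update_group_action(...)`.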

Predictive scaling

ML-driven: uses up to 14 days of metric history to forecast load and launch capacity ahead of it. Good for predictable daily/weekly patterns. Combines with reactive policies.

Instance types — single vs mixed

Modern ASGs support mixed instance types and mixed purchase options in one group:

  • Pick several instance types (e.g. m5.large, m6i.large, m5a.large) — ASG picks what’s available
  • Split between On-Demand and Spot with a base count + percentage split
  • Capacity Rebalancing — proactively replace Spot instances before they’re reclaimed

Typical pattern for cost efficiency: “2 On-Demand baseline, everything else Spot, 4 instance types permitted.”
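
The base-plus-percentage split works out as in the sketch below. The two parameters mirror the `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity` settings of a mixed instances policy; the rounding direction is an assumption made for the example, and `split_capacity` is a hypothetical helper.

```python
import math


def split_capacity(desired: int, on_demand_base: int, on_demand_pct_above_base: int) -> tuple[int, int]:
    """Return (on_demand_count, spot_count) for a given desired capacity."""
    # The base is filled with On-Demand first.
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Assumption: the On-Demand share above the base is rounded up.
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand
```

The “2 On-Demand baseline, everything else Spot” pattern corresponds to a base of 2 and 0% On-Demand above it, so a group of 10 runs 2 On-Demand and 8 Spot.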

Lifecycle hooks

Custom logic at launch and termination:

  • EC2_INSTANCE_LAUNCHING — pause before marking InService (e.g. bootstrap config, register with service discovery)
  • EC2_INSTANCE_TERMINATING — pause before termination (e.g. deregister, drain connections, snapshot logs)

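Hooks hold instances in a Pending:Wait or Terminating:Wait state until the hook is completed or its heartbeat timeout expires (one hour by default; the timeout is configurable and can be extended by recording heartbeats). A Lambda function or SQS consumer performs the work, then completes the hook.

Without a terminating hook, termination is abrupt: an instance selected for termination, including an unhealthy one, is killed immediately with no chance to drain or clean up.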
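
Completing a hook means calling `CompleteLifecycleAction` with the token from the notification. A minimal sketch of how a handler might derive the call parameters, assuming the notification arrives as an EventBridge event whose `detail` carries the fields used below (`completion_params` is a hypothetical helper, and no AWS call is made here):

```python
def completion_params(event: dict, succeeded: bool) -> dict:
    """Build CompleteLifecycleAction parameters from a lifecycle event."""
    detail = event["detail"]
    return {
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleHookName": detail["LifecycleHookName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        "InstanceId": detail["EC2InstanceId"],
        # CONTINUE lets the transition proceed; ABANDON aborts a launch.
        "LifecycleActionResult": "CONTINUE" if succeeded else "ABANDON",
    }
```

In a Lambda function, the work (draining, deregistering, shipping logs) would run first, and the resulting dictionary would then be passed to `boto3.client("autoscaling").complete_lifecycle_action(...)`.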

Termination policy — who gets killed?

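When scaling in, the ASG picks which instance to terminate. Instances with scale-in protection are skipped. The default policy then works through these criteria in order:

  1. The AZ with the most instances (keeps the group balanced across AZs)
  2. Instances on the oldest launch configuration or launch template version (helps “rolling” an ASG)
  3. The instance closest to the next billing hour (a cost optimisation from when billing was hourly; less relevant with per-second billing), with any remaining tie broken at random

You can override with specific policies: Default, AllocationStrategy, OldestLaunchTemplate, OldestLaunchConfiguration, ClosestToNextInstanceHour, NewestInstance, OldestInstance. Instance age is not part of the default; choose OldestInstance explicitly if you want age-based rotation.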

Instance Scale-In Protection prevents a specific instance from being chosen for scale-in — useful for a “leader” or a long-running job.
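
A simplified model of scale-in selection, assuming the documented default order of AZ balance first and oldest template version second. The final tie-break by instance ID is a stand-in for the billing-hour and random steps, and `Candidate` and `pick_for_scale_in` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    instance_id: str
    az: str
    template_version: int
    protected: bool = False


def pick_for_scale_in(instances: list[Candidate]) -> Optional[str]:
    """Pick one instance to terminate, or None if every instance is protected."""
    eligible = [i for i in instances if not i.protected]
    if not eligible:
        return None
    # 1. Prefer the AZ that currently holds the most instances.
    az_counts = Counter(i.az for i in instances)
    busiest = max(az_counts[i.az] for i in eligible)
    pool = [i for i in eligible if az_counts[i.az] == busiest]
    # 2. Within that AZ, prefer the oldest launch template version.
    oldest = min(i.template_version for i in pool)
    pool = [i for i in pool if i.template_version == oldest]
    # 3. Deterministic stand-in for the remaining tie-breaks.
    return min(pool, key=lambda i: i.instance_id).instance_id
```

Marking the “leader” as `protected=True` removes it from consideration entirely, which is the effect of Instance Scale-In Protection.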

Instance refresh — rolling updates

Instance Refresh is the built-in way to roll a fleet after a launch-template change. You set a minimum healthy percentage; the ASG replaces instances in batches without dropping below it. Optionally:

  • Checkpoints: pause mid-refresh for validation
  • Skip matching: skip instances already on the desired version
  • Warm Pool: a separate ASG feature that pairs well with refreshes; it keeps pre-initialised stopped instances that start up fast (trading EBS costs for scale-out speed)

This is how you deploy a new AMI across hundreds of instances without downtime.
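
The effect of the minimum healthy percentage on batch size can be estimated as follows. This is a back-of-the-envelope model with hypothetical helper names; the service's exact batching may differ. The request dictionary uses the `StartInstanceRefresh` parameter names.

```python
def max_batch_size(group_size: int, min_healthy_percentage: int) -> int:
    """Largest number of instances that can be out of service at once."""
    # Integer ceiling of group_size * pct / 100, avoiding float rounding.
    must_stay_healthy = -(-group_size * min_healthy_percentage // 100)
    # Even at 100%, replacement proceeds one instance at a time.
    return max(1, group_size - must_stay_healthy)


def refresh_request(group_name: str, min_healthy_percentage: int) -> dict:
    """Parameters for a rolling instance refresh (StartInstanceRefresh)."""
    return {
        "AutoScalingGroupName": group_name,
        "Strategy": "Rolling",
        "Preferences": {"MinHealthyPercentage": min_healthy_percentage},
    }
```

Under this model, a group of 10 at 90% minimum healthy is replaced one instance at a time, while a group of 20 at 50% can be replaced ten at a time.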

Interaction with Elastic Load Balancing

Attach the ASG to one or more target groups (ALB/NLB). As instances are launched, the ASG auto-registers them with the target group; terminations trigger deregistration with the target group’s deregistration delay (connection draining).

The health loop runs end to end: the target group marks an instance unhealthy; the ASG (with the ELB health check type enabled) terminates it and launches a replacement; the replacement registers with the target group, passes its health check, and begins receiving traffic. That is end-to-end self-healing.

Cooldowns — preventing flapping

A cooldown period after a scaling activity prevents another scale action from firing immediately. The cooldown (default 300s) applies to simple scaling. Target tracking and step scaling do not wait on it; they rely on an instance warm-up time instead, so set that deliberately.

Poor cooldowns → scaling oscillation. Good cooldowns → stable equilibrium.
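
The cooldown itself is just a time gate. A minimal sketch, with times in seconds and a hypothetical function name:

```python
from typing import Optional


def can_scale(now_s: float, last_activity_s: Optional[float], cooldown_s: float = 300) -> bool:
    """True when no scaling activity has happened within the cooldown window."""
    return last_activity_s is None or now_s - last_activity_s >= cooldown_s
```

If the cooldown is shorter than the time a new instance needs to start absorbing load, the alarm is still firing when the gate reopens and the group overshoots; that is the oscillation described above.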

Cost

  • No charge for the ASG itself: you pay only for the instances it launches (plus the LB it’s attached to).
  • Savings Plans / Reserved Instances still apply: the discounts are matched to whatever instances the ASG launches, provided they fit the commitment’s terms.
  • Spot via mixed instances policy is the biggest lever for cost-sensitive stateless tiers.

Common pitfalls

  1. Single-AZ ASG — defeats the primary purpose. Always span 2+ AZs.
  2. Grace period too short — new instances get killed mid-bootstrap because ELB health checks fail before the app is up.
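  3. Max set too low. Scaling policies cannot push Desired past Max, and they stop there without raising an error (a manual request for Desired above Max is rejected outright). Check Max first when the group is unexpectedly “not scaling.”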
  4. User-data failures are invisible without inspection. Log to CloudWatch via the agent; surface failures.
  5. State on instances — ASGs terminate freely; don’t store anything durable on instance volumes. Persist to S3, EFS, RDS, or DynamoDB.
  6. IAM instance profile missing. Instance boots without expected access; SSM doesn’t work; CloudWatch Agent can’t write.
  7. Slow bootstrapping. A 4-minute user-data script means 4 minutes of load on surviving instances while the new one spins up. Bake more into the AMI (golden image pattern) or use a warm pool.
  8. IMDS v1 still enabled. Set metadata options in the launch template to enforce IMDSv2.

Mental model

  • ASG = a control loop. Desired count is the setpoint; scaling policies adjust the setpoint; health checks + termination policy keep the actuals aligned.
  • Launch template = the “shape” of an instance. Versioned, declarative.
  • Target groups + ELB = the traffic layer that the ASG feeds.
  • Lifecycle hooks = the escape hatch for custom behaviour at transitions.
  • Instance Refresh = the “rolling deploy” primitive.

Used together (ASG + LT + ELB + health checks), you have a self-healing, elastic, multi-AZ compute tier that survives most failure modes without operator intervention — the AWS answer to “stateless horizontal scaling.”
