AWS VPC Fundamentals
A VPC is your own private L3 network inside AWS. For a network engineer, it’s a familiar landscape with a few AWS-specific surprises. This note focuses on the surprises — the parts where AWS networking diverges from physical networking.
What a VPC is
A Virtual Private Cloud (VPC) is an isolated virtual network in one AWS region. It’s defined by:
- A CIDR block (e.g. 10.0.0.0/16) — your address space
- Subnets within the CIDR, each in exactly one Availability Zone
- Route tables controlling L3 forwarding
- Gateways for connectivity to the outside (IGW, NAT, VPN, TGW)
- Security groups and NACLs for filtering
The closest traditional analogue is a VRF in a data center fabric — your own routing and address space, isolated from everyone else’s.
CIDR sizing
- You pick the primary CIDR when creating the VPC (minimum /28, maximum /16)
- Can add secondary CIDRs later (up to 4 additional blocks)
- AWS reserves 5 addresses per subnet: the first four and the last (network, VPC router, DNS, future use, broadcast — yes, even though broadcast doesn’t exist in a VPC)
- Don’t overlap with other VPCs you might peer with, or with on-prem networks you’ll connect via VPN/Direct Connect. This causes painful migrations later.
Common pattern: size the VPC /16, subnets /24. Gives you 256 subnets × 251 usable hosts.
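The arithmetic above can be checked with Python's standard `ipaddress` module; the CIDRs are just the example values from this section:

```python
import ipaddress

# Hypothetical VPC CIDR matching the common pattern above.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the /16 into /24 subnets.
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets))  # 256 subnets

# A /24 holds 256 addresses; AWS reserves 5 per subnet
# (network, VPC router, DNS, future use, broadcast).
AWS_RESERVED = 5
usable = subnets[0].num_addresses - AWS_RESERVED
print(usable)  # 251 usable hosts per subnet
```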
Subnets — AZ-scoped L3 segments
A subnet is a piece of the VPC CIDR, bound to exactly one AZ. You can’t stretch a subnet across AZs. This is the fundamental HA unit — to be multi-AZ, your workload must live in subnets in multiple AZs.
Public vs private subnets
The distinction is purely about routing — there’s no “public subnet” checkbox. A subnet is “public” if its route table sends 0.0.0.0/0 to an Internet Gateway. Otherwise it’s “private.”
Public subnet route table:
10.0.0.0/16 → local
0.0.0.0/0 → igw-xxxx ← makes it "public"
Private subnet route table:
10.0.0.0/16 → local
0.0.0.0/0 → nat-yyyy ← outbound-only via NAT
Instances in a public subnet can have public IPs and be reached from the internet. Instances in a private subnet have no direct inbound path — they reach out via NAT.
Subnet surprises for a network engineer
- No broadcast, no multicast (except via a special Transit Gateway multicast domain)
- No L2 visibility — you don’t see your AZ-mates on the wire; packets are routed from the first hop
- Implicit first-hop router — the first host address of every subnet (.1 in a /24) is always the VPC router. There’s no HSRP/VRRP; it’s just always up.
- ARP is synthetic — AWS forges ARP responses; MAC spoofing is blocked at the hypervisor
- You can’t run protocols that need L2 adjacency (OSPF on broadcast, VRRP, classic multicast) without workarounds (GRE/VXLAN overlays, Transit Gateway Connect)
Route tables — per-subnet policy
Each subnet is associated with exactly one route table. Multiple subnets can share a table. The table controls where the VPC router forwards traffic.
Possible route targets:
| Target | Destination example |
|---|---|
| local | The VPC CIDR — automatic, can’t be removed |
| Internet Gateway (igw-*) | 0.0.0.0/0 for public subnets |
| NAT Gateway (nat-*) | 0.0.0.0/0 for private subnets (outbound only) |
| VPC Peering (pcx-*) | Another VPC’s CIDR |
| Transit Gateway (tgw-*) | Multiple destinations |
| Virtual Private Gateway (vgw-*) | On-prem CIDRs via site-to-site VPN |
| Network Interface (eni-*) | For traffic steering through an NVA (firewall) |
Longest-prefix-match applies, like any routing table.
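A minimal sketch of that lookup, using Python's `ipaddress` module. The routes and target IDs (`nat-yyyy`, `pcx-zzzz`) mirror the illustrative examples above and are not real AWS resources:

```python
import ipaddress

# A toy route table resembling the private-subnet example above.
routes = [
    (ipaddress.ip_network("10.0.0.0/16"), "local"),
    (ipaddress.ip_network("0.0.0.0/0"), "nat-yyyy"),
    (ipaddress.ip_network("192.168.0.0/16"), "pcx-zzzz"),  # peered VPC
]

def lookup(dst: str) -> str:
    """Return the target of the most specific matching route."""
    ip = ipaddress.ip_address(dst)
    matches = [(net, tgt) for net, tgt in routes if ip in net]
    # Longest prefix wins, exactly like a physical router's FIB.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.0.5.9"))    # local: covered by the VPC CIDR
print(lookup("192.168.1.1")) # pcx-zzzz: peered VPC route
print(lookup("8.8.8.8"))     # nat-yyyy: falls through to the default route
```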
Internet Gateway (IGW)
- A VPC-attached component providing 1:1 NAT between an instance’s public IP and its private IP
- One per VPC, horizontally scaled, no maintenance
- Does not cost anything on its own (unlike NAT Gateway)
- Without an IGW attached + a route to it, no traffic leaves the VPC to the internet
An instance with a public IP doesn’t actually have a public IP on its NIC — the NIC has the private IP. The IGW translates inbound/outbound. This is why ifconfig on an EC2 instance shows only the private IP.
NAT Gateway
For private subnets that need outbound internet (apt updates, API calls) without being reachable from the internet.
- Per-AZ managed service (one NAT Gateway per AZ for true HA — don’t share across AZs or cross-AZ charges apply)
- Charged per hour and per GB processed — expensive at scale; a common bill shocker
- Alternatives: NAT Instance (self-managed EC2 with NAT enabled — cheaper but you babysit it), VPC Endpoints (see below) for AWS services
Security Groups vs NACLs
The two filtering layers:
| | Security Group | NACL |
|---|---|---|
| Scope | Per ENI (instance-level) | Per subnet |
| State | Stateful — return traffic implicitly allowed | Stateless — allow both directions explicitly |
| Action | Allow only (implicit deny) | Allow or deny, ordered rules |
| Rule evaluation | All rules evaluated (permissive union) | First-match by rule number |
Default rule of thumb: use Security Groups for almost everything. Treat NACLs as belt-and-suspenders for coarse subnet-level blocks (e.g. block a known-bad IP range at the subnet level).
See AWS Security Groups vs NACLs for the detailed comparison.
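The two evaluation models can be sketched in a few lines of Python. This is a toy model over ports only, not AWS's actual rule engine:

```python
# Security group: all allow rules are unioned; anything unmatched is denied.
def sg_allows(port: int, allow_rules: list[range]) -> bool:
    return any(port in r for r in allow_rules)

# NACL: rules evaluated in rule-number order, first match wins,
# with an implicit final deny. Rule tuple: (number, port_range, action).
def nacl_allows(port: int, rules: list[tuple[int, range, str]]) -> bool:
    for _, port_range, action in sorted(rules, key=lambda r: r[0]):
        if port in port_range:
            return action == "allow"
    return False

sg = [range(80, 81), range(443, 444)]
print(sg_allows(443, sg))  # True: matched by an allow rule
print(sg_allows(22, sg))   # False: implicit deny

nacl = [
    (100, range(22, 23), "deny"),     # block SSH at the subnet
    (200, range(0, 65536), "allow"),  # allow everything else
]
print(nacl_allows(22, nacl))   # False: rule 100 matches first
print(nacl_allows(443, nacl))  # True: falls through to rule 200
```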
VPC connectivity options
How a VPC talks to other things:
| Need | Use |
|---|---|
| Connect two VPCs (same or different accounts) | VPC Peering (non-transitive, point-to-point) |
| Connect many VPCs + on-prem | Transit Gateway (hub-and-spoke, transitive) |
| Site-to-site to on-prem over internet | Site-to-Site VPN (IPsec over internet) |
| Dedicated to on-prem | Direct Connect (private fiber circuit) |
| Reach AWS services from a VPC without traversing the internet | VPC Endpoints (Gateway for S3/DynamoDB, Interface for others via PrivateLink) |
| Cross-region private | Transit Gateway peering or Cloud WAN |
VPC Peering is not transitive. If A peers with B and B peers with C, A cannot reach C via B. Transit Gateway solves this at scale.
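The non-transitivity can be illustrated with a toy reachability check (the VPC names are hypothetical):

```python
# Peering is a set of point-to-point edges with no transit.
peerings = {frozenset({"A", "B"}), frozenset({"B", "C"})}

def peered(x: str, y: str) -> bool:
    # Only a direct peering connection gives reachability.
    return frozenset({x, y}) in peerings

print(peered("A", "B"))  # True
print(peered("A", "C"))  # False: no transit through B

# A Transit Gateway acts as a hub between all of its attachments.
tgw_attachments = {"A", "B", "C"}

def tgw_reachable(x: str, y: str) -> bool:
    # Any two attached VPCs can reach each other via the hub
    # (assuming the TGW route tables permit it).
    return x in tgw_attachments and y in tgw_attachments

print(tgw_reachable("A", "C"))  # True
```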
The default VPC
Every AWS account starts with a default VPC in every region: 172.31.0.0/16, one public subnet per AZ, IGW attached, public IPs auto-assigned. Convenient for learning, dangerous for production because anything you launch is internet-reachable by default. For real workloads, create your own VPCs and ignore the default one.
Flow logs — the observability layer
VPC Flow Logs capture metadata (5-tuple + action) for every flow in/out of ENIs. Written to CloudWatch Logs or S3. Essential for:
- Security analysis (what connected where)
- Troubleshooting (“why can’t A reach B?”)
- Cost attribution (cross-AZ traffic analysis)
Flow logs don’t capture packet payloads — that’s VPC Traffic Mirroring, a separate feature.
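A record in the default (version 2) format can be split into its fields with a few lines of Python. The record below is fabricated for illustration:

```python
# Field order of the default version-2 flow log format.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport "
          "dstport protocol packets bytes start end action log_status").split()

# A made-up sample record (space-separated, one flow per line).
record = ("2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.8 "
          "49152 443 6 10 8400 1690000000 1690000060 ACCEPT OK")

flow = dict(zip(FIELDS, record.split()))
# Protocol 6 is TCP; addresses + ports + protocol form the 5-tuple.
print(flow["srcaddr"], "->", flow["dstaddr"], flow["action"])
```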
Common pitfalls
- Overlapping CIDRs — blocks VPC peering, TGW attachment, and on-prem integration. Plan before you build.
- NAT Gateway billing — processing charges on outbound data add up. S3 Gateway Endpoint bypasses NAT for S3 traffic (and it’s free). Use it.
- SG references across VPCs — SGs can reference other SGs only within the same VPC (or peered VPCs with config). Across TGW, you use CIDRs.
- Subnet sizing — too small (/27), you run out of IPs; too large, you waste space and can’t split later without migration.
- Public IP confusion — elastic IPs persist; auto-assigned public IPs change on stop/start. For anything needing a stable public endpoint, use an EIP or a load balancer.
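A quick pre-build overlap check with Python's `ipaddress` module (all CIDRs here are illustrative):

```python
import ipaddress

# Check a prospective VPC CIDR against everything it must coexist with.
existing = [
    ipaddress.ip_network("10.0.0.0/16"),    # current VPC
    ipaddress.ip_network("10.1.0.0/16"),    # peered VPC
    ipaddress.ip_network("172.16.0.0/12"),  # on-prem range
]

candidate = ipaddress.ip_network("10.1.128.0/17")

conflicts = [net for net in existing if net.overlaps(candidate)]
print(conflicts)  # [IPv4Network('10.1.0.0/16')], so pick a different block
```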