Site-to-Site VPN vs Direct Connect
Two ways to connect on-prem (or another cloud) to AWS. One rides the public internet with IPsec; the other is a physical private circuit into an AWS router. They solve different problems — and production-grade hybrid often uses both.
Quick verdict
| Need | Choice |
|---|---|
| Fast to turn up, tolerant of internet jitter, cost-sensitive | Site-to-Site VPN |
| Consistent throughput, predictable latency, high volume | Direct Connect |
| “Never goes down” | Direct Connect + VPN backup |
| Compliance requires not touching the public internet | Direct Connect (but note: DX alone is unencrypted — add MACsec or IPsec-over-DX) |
Side-by-side
| Aspect | Site-to-Site VPN | Direct Connect (DX) |
|---|---|---|
| Transport | IPsec tunnel over the public internet | Dedicated fibre from your DC to an AWS DX location |
| Setup time | Minutes | Weeks to months (circuit order, cross-connect, physical install) |
| Encryption | Built-in (IPsec) | None by default — add MACsec (L2) or IPsec-over-DX |
| Throughput | Up to ~1.25 Gbps per tunnel; scale with multiple tunnels + ECMP | 1, 10, 100 Gbps dedicated; 50 Mbps to 10 Gbps as hosted connections via a partner |
| Latency / jitter | Internet-variable | Deterministic (your provider’s SLA) |
| Pricing | Per-hour + data transfer out | Port-hour + data transfer out (DX egress is cheaper than internet egress) |
| Redundancy model | Two tunnels per connection (automatic) | Need 2 ports, ideally at 2 DX locations |
| BGP | Supported (dynamic) or static | BGP required on every VIF |
| Terminates on | VGW or TGW | Private VIF → VGW or DXGW; Transit VIF → DXGW → TGW |
The VPN side
A Site-to-Site VPN is an IPsec tunnel between an on-prem customer gateway (your firewall/router) and an AWS managed endpoint:
- Terminates on a Virtual Private Gateway (VGW) attached to a single VPC, or on a Transit Gateway (scales to many VPCs)
- AWS provisions 2 tunnels per VPN in different AWS AZs. Redundancy is baked in.
- Routing: static (explicit prefixes) or BGP (dynamic, preferred at scale). ECMP across tunnels on a TGW also requires BGP.
- Maximum throughput is per-tunnel — to exceed one tunnel’s limit, use multiple VPN connections with ECMP on TGW (active/active).
When VPN is right
- Bootstrapping a new hybrid setup
- Sites that don’t move much data
- Disaster-recovery backup (below)
- Multi-vendor SD-WAN overlays terminating into TGW Connect
VPN quirks
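- IPsec rekeys can cause brief traffic drops; running BGP across both tunnels smooths over them
- MTU: the default is 1500, but IPsec overhead reduces the effective tunnel MTU to roughly 1436 to 1446 bytes depending on the algorithms negotiated. Clamp TCP MSS on the customer gateway rather than relying on PMTUD.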
- Cross-tunnel failover uses BGP; without BGP you’re static-route pinned to one tunnel unless you script failover
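The MSS-clamp arithmetic behind the MTU quirk can be sketched as follows. This is a minimal illustration, not a device configuration: it assumes plain IPv4 and TCP headers with no options, and the 60-byte IPsec overhead is a hypothetical round figure, since the real overhead depends on the negotiated algorithms.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tunnel_mtu(link_mtu: int, ipsec_overhead: int) -> int:
    """MTU left for inner packets after IPsec encapsulation overhead."""
    return link_mtu - ipsec_overhead


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in a single IPv4 packet of size `mtu`."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry a TCP segment")
    return mtu - IPV4_HEADER - TCP_HEADER


# Untunnelled Ethernet path.
print(mss_for_mtu(1500))  # 1460

# Hypothetical 60-byte IPsec overhead (assumption, varies by cipher suite).
inner_mtu = tunnel_mtu(1500, 60)
print(inner_mtu, mss_for_mtu(inner_mtu))  # 1440 1400
```

Clamping the MSS to the second value on the customer gateway keeps TCP segments small enough to cross the tunnel without fragmentation.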
The Direct Connect side
DX is a physical, dedicated 802.1Q-trunked fibre link between your equipment and an AWS router at a “DX location” (typically a carrier-neutral facility such as Equinix). Once established, you attach one or more Virtual Interfaces (VIFs):
| VIF type | Purpose |
|---|---|
| Private VIF | Reach a VPC — terminates on a VGW or Direct Connect Gateway (DXGW) |
| Public VIF | Reach AWS public services (S3, DynamoDB, etc.) over the DX link using public IP addressing, with no internet traversal |
| Transit VIF | Terminates on a DXGW that fronts a TGW — scales to many VPCs, multi-region |
BGP is mandatory on every VIF. You run a BGP session with AWS over each VIF: you advertise your prefixes, and AWS advertises VPC or public prefixes.
Connection types
- Dedicated Connection — 1, 10, or 100 Gbps physical port allocated solely to you. Order directly from AWS.
- Hosted Connection — fractional (50 Mbps up to 10 Gbps) provided by a DX Partner. Provisioned through the partner’s portal. Faster to get, more partners available.
DX is unencrypted by default
The fibre is private to your carrier, but AWS doesn’t encrypt at L2/L3. If regulatory rules demand encryption:
- MACsec — L2 encryption on 10/100 Gbps dedicated connections where supported
- IPsec over DX — run a Site-to-Site VPN inside a Public VIF (effectively encrypted DX). Popular pattern for HIPAA/PCI.
Redundancy & hybrid patterns
Pattern 1 — DX primary, VPN backup
Most common production pattern:
    on-prem router ──DX──▶ DXGW ──▶ TGW ──▶ VPCs
          │
          └───── Internet ──VPN──▶ TGW ◀── backup path
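BGP local preference (for your outbound traffic) and AS-path prepending (for AWS's return traffic) make DX the preferred path; the VPN takes over automatically if the DX session drops. Failover is bounded by BGP timers, or is faster with BFD on the DX session, so the RTO is typically seconds rather than zero.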
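How path preference and failover interact in this pattern can be sketched with a toy best-path function. It is a deliberate simplification: it models only the two attributes named above (highest local preference, then shortest AS path) and ignores the rest of the real BGP decision process. The `Route` class, the local-preference values, and the private ASN 64512 are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    via: str          # label for the path, e.g. "DX" or "VPN"
    local_pref: int   # higher is preferred
    as_path: tuple    # shorter is preferred


def best_path(routes: Sequence[Route]) -> Optional[Route]:
    """Pick the route with the highest local-pref, then the shortest AS path."""
    if not routes:
        return None
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Hypothetical values: DX gets a higher local-pref, VPN is prepended.
dx = Route(via="DX", local_pref=200, as_path=(64512,))
vpn = Route(via="VPN", local_pref=100, as_path=(64512, 64512, 64512))

print(best_path([dx, vpn]).via)  # DX  (both paths up)
print(best_path([vpn]).via)      # VPN (DX route withdrawn)
```

When the DX session drops, its route is withdrawn and the same selection logic falls through to the VPN route with no manual intervention.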
Pattern 2 — DX with two locations
For DX-only HA, provision connections at two DX locations (e.g. Ashburn + Atlanta), ideally terminating on separate routers at each site. AWS recommends this for critical workloads (the Maximum Resiliency model in the Direct Connect Resiliency Toolkit).
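The reasoning behind dual locations can be made concrete with a rough availability calculation. The sketch below assumes the paths fail independently, which is the whole point of separate locations but is never perfectly true in practice, and the 99.9% per-path figure is a hypothetical input, not an AWS SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(per_path: float, paths: int) -> float:
    """Availability of `paths` redundant links, assuming independent failures."""
    return 1.0 - (1.0 - per_path) ** paths


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes per year with no working path."""
    return (1.0 - availability) * MINUTES_PER_YEAR


single = 0.999  # hypothetical per-path availability (assumption)
dual = combined_availability(single, 2)

print(f"single path: {downtime_minutes_per_year(single):.1f} min/year")
print(f"two paths:   {downtime_minutes_per_year(dual):.2f} min/year")
```

Correlated failures (shared conduit, shared carrier, shared power) erode this gain, which is why the two circuits should differ in as many respects as possible.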
Pattern 3 — VPN-only with multiple connections + ECMP
A single IPsec tunnel caps at ~1.25 Gbps. On TGW with ECMP enabled, multiple VPNs load-balance — a cost-effective way to push a few Gbps without DX.
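A quick sizing sketch for this pattern, using the ~1.25 Gbps per-tunnel figure from the text. It assumes both tunnels of every VPN connection carry traffic under ECMP and that load spreads evenly, which is optimistic: ECMP hashes per flow, so any single flow is still limited to one tunnel.

```python
import math

TUNNEL_GBPS = 1.25           # approximate per-tunnel cap quoted in the text
TUNNELS_PER_CONNECTION = 2   # AWS provisions two tunnels per VPN connection


def tunnels_needed(target_gbps: float, per_tunnel_gbps: float = TUNNEL_GBPS) -> int:
    """Minimum number of tunnels whose combined cap covers the target."""
    if target_gbps <= 0:
        raise ValueError("target throughput must be positive")
    return math.ceil(target_gbps / per_tunnel_gbps)


def vpn_connections_needed(target_gbps: float) -> int:
    """Minimum VPN connections, assuming both tunnels per connection are active."""
    return math.ceil(tunnels_needed(target_gbps) / TUNNELS_PER_CONNECTION)


print(tunnels_needed(4.0))          # 4
print(vpn_connections_needed(4.0))  # 2
```

Treat the result as a lower bound; uneven flow hashing usually means provisioning an extra connection or two.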
Pricing mental model
- VPN: flat hourly charge per VPN connection + data-transfer-out (at standard internet rates). Cheap to stand up.
- DX: port-hour charge per connection + data-transfer-out at DX rates (noticeably cheaper than internet egress), plus whatever the carrier or partner charges for the circuit itself.
At very high egress volumes (~terabytes/month), DX’s cheaper per-GB egress can offset the port fee. That’s the usual business case when DX is chosen for cost rather than performance.
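The break-even point can be estimated with one division. Every number in the example below is a placeholder chosen for easy arithmetic, not an AWS list price; substitute the current rates for your region and your carrier's circuit fee.

```python
def break_even_gb(fixed_monthly_usd: float,
                  internet_rate_per_gb: float,
                  dx_rate_per_gb: float) -> float:
    """Monthly egress (GB) at which DX's per-GB saving pays for its fixed cost."""
    saving_per_gb = internet_rate_per_gb - dx_rate_per_gb
    if saving_per_gb <= 0:
        raise ValueError("DX egress must be cheaper per GB for a break-even to exist")
    return fixed_monthly_usd / saving_per_gb


# Hypothetical placeholder prices (assumptions, not real quotes):
fixed = 210.0          # port-hours plus circuit, USD per month
internet_rate = 0.09   # USD per GB over the internet
dx_rate = 0.02         # USD per GB over DX

gb = break_even_gb(fixed, internet_rate, dx_rate)
print(f"break-even at about {gb:.0f} GB per month")
```

Below that volume the fixed DX cost dominates and VPN is cheaper; above it, each additional gigabyte widens the saving.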
When to choose which
Choose Site-to-Site VPN when:
- You need connectivity this week
- Throughput fits within a few hundred Mbps
- Latency variability is tolerable
- It’s a backup path, not the primary
Choose Direct Connect when:
- Sustained high throughput (> 1 Gbps)
- Jitter-sensitive workloads (VoIP, interactive terminals, replication)
- Predictable monthly data-transfer bills
- Compliance mandates non-internet paths
- Hybrid is permanent architecture, not transitional
Use both when: the workload truly can’t tolerate DX downtime. Circuits do fail; VPN-as-backup is cheap insurance.
Common pitfalls
- DX ordered, but no redundant circuit. A single DX is a single point of failure. AWS SLAs only apply to the DX service, not the carrier circuit.
- VPN without BGP. Static routing will not fail over cleanly between the two AWS-provided tunnels; use BGP.
- Asymmetric routing between DX (primary) and VPN (backup). Stateful firewalls hate this. Align BGP metrics in both directions.
- Forgetting DX is unencrypted. Dev team assumes “private circuit = secure”; auditors disagree.
- MTU mismatches. DX supports jumbo frames (up to 9001 on private VIFs, 8500 on transit VIFs); every hop on the on-prem side must agree end-to-end or you get black holes.
- VIF ownership and hosted VIFs. On hosted connections, the partner owns the port; VIFs are sliced from that port. Organisational seams sometimes create delays.
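The MTU pitfall above comes down to the smallest MTU on the path. The sketch below is a simplified model with hypothetical hop values; it assumes packets are sent with the don't-fragment bit set and that the ICMP "fragmentation needed" replies are filtered somewhere, which is the usual cause of a silent black hole.

```python
from typing import Sequence


def path_mtu(hop_mtus: Sequence[int]) -> int:
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    if not hop_mtus:
        raise ValueError("path must contain at least one hop")
    return min(hop_mtus)


def is_black_holed(packet_size: int, hop_mtus: Sequence[int],
                   dont_fragment: bool = True) -> bool:
    """True if the packet is silently dropped (DF set, ICMP assumed filtered)."""
    return dont_fragment and packet_size > path_mtu(hop_mtus)


# Hypothetical path: jumbo frames on DX, one 1500-byte hop on-prem.
hops = [9001, 9001, 1500]

print(path_mtu(hops))              # 1500
print(is_black_holed(9001, hops))  # True
print(is_black_holed(1400, hops))  # False
```

Small packets (pings, TCP handshakes) pass while bulk transfers stall, which is why this failure is easy to misdiagnose.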
Mental model
- VPN = best-effort overlay over public internet with built-in IPsec. Fast to deploy, good enough for many.
- DX = a dedicated WAN circuit terminated on AWS — think of it as a leased line into AWS’s backbone.
- DXGW = the DX transit layer — fronts multiple VPCs / regions behind one DX link.
- Together = the resiliency story AWS recommends for real production hybrid.