AWS CloudTrail Fundamentals
CloudTrail is AWS’s audit log. It records API activity (console clicks, CLI commands, SDK calls, and actions AWS services take on your behalf) with identity, source, timestamp, and parameters. It’s the service you reach for during security incidents, compliance audits, and “who deleted our production bucket” forensics. In every AWS account, CloudTrail should be enabled, delivered to a separate log-archive account, and immutable.
What CloudTrail records
Every API call has a structured event like:
{
  "eventTime": "2026-04-24T02:35:00Z",
  "eventName": "TerminateInstances",
  "eventSource": "ec2.amazonaws.com",
  "awsRegion": "us-east-1",
  "userIdentity": {
    "type": "AssumedRole",
    "arn": "arn:aws:sts::123:assumed-role/AdminRole/alice",
    "accessKeyId": "ASIA..."
  },
  "sourceIPAddress": "1.2.3.4",
  "userAgent": "aws-cli/2.13.0",
  "requestParameters": { "instanceId": "i-abc" },
  "responseElements": { ... }
}
This answers the forensic questions: who, what, when, where from, to what resource, with what result.
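As a minimal sketch of how those fields map to the forensic questions, the following Python snippet pulls them out of an event shaped like the one above. The `summarize_event` helper and the sample values are illustrative assumptions, not part of any AWS SDK.

```python
import json

# Sample event mirroring the structure shown above (values are illustrative).
SAMPLE_EVENT = json.loads("""
{
  "eventTime": "2026-04-24T02:35:00Z",
  "eventName": "TerminateInstances",
  "eventSource": "ec2.amazonaws.com",
  "awsRegion": "us-east-1",
  "userIdentity": {
    "type": "AssumedRole",
    "arn": "arn:aws:sts::123:assumed-role/AdminRole/alice"
  },
  "sourceIPAddress": "1.2.3.4",
  "requestParameters": {"instanceId": "i-abc"}
}
""")


def summarize_event(event: dict) -> dict:
    """Map CloudTrail fields to who / what / when / where-from / target / result."""
    identity = event.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "what": f'{event.get("eventSource")}:{event.get("eventName")}',
        "when": event.get("eventTime"),
        "where_from": event.get("sourceIPAddress"),
        "target": event.get("requestParameters"),
        # errorCode is only present when the call failed.
        "result": event.get("errorCode", "success"),
    }


summary = summarize_event(SAMPLE_EVENT)
print(summary["who"], summary["what"], summary["result"])
```

Treating a missing `errorCode` as success is a simplification that works for triage; a fuller tool would also look at `responseElements`.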
Event types
CloudTrail captures three categories, each billed and configured separately:
| Type | What it captures | Default |
|---|---|---|
| Management events | Control-plane ops: RunInstances, PutBucketPolicy, CreateUser, etc. | ✅ On by default in the 90-day Event History; the first trail copy is free |
| Data events | Data-plane ops: S3 GetObject / PutObject, Lambda Invoke, DynamoDB item-level ops | ❌ Off by default — high-volume, costs per event |
| Insights events | Anomaly-detected spikes in write management events (e.g. 10x surge in CreateUser) | ❌ Opt-in |
Why data events aren’t on by default: volume. An active S3 bucket can generate billions of GetObject events/day. You enable data events selectively on buckets/Lambda/DynamoDB tables that matter (critical data stores, sensitive buckets).
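Selective scoping is typically expressed with advanced event selectors. The sketch below builds one such selector for a single bucket as a plain Python dictionary; the helper name `s3_data_event_selector` and the bucket name `payroll-exports` are hypothetical, and the field names follow the CloudTrail advanced event selector format as I understand it.

```python
import json


def s3_data_event_selector(bucket_name: str, write_only: bool = False) -> dict:
    """Build one advanced event selector scoped to a single S3 bucket's objects."""
    field_selectors = [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
        # Trailing slash limits the match to objects in this bucket only.
        {"Field": "resources.ARN", "StartsWith": [f"arn:aws:s3:::{bucket_name}/"]},
    ]
    if write_only:
        # readOnly=false keeps PutObject/DeleteObject and drops GetObject volume.
        field_selectors.append({"Field": "readOnly", "Equals": ["false"]})
    return {"Name": f"s3-data-{bucket_name}", "FieldSelectors": field_selectors}


# "payroll-exports" is a hypothetical bucket name.
selectors = [s3_data_event_selector("payroll-exports")]
print(json.dumps(selectors, indent=2))
```

A list like this could be passed as `AdvancedEventSelectors` to boto3's `put_event_selectors`, or as JSON to `aws cloudtrail put-event-selectors --advanced-event-selectors`. Note that `write_only=True` cuts cost but would also hide read-based exfiltration.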
Trails — the delivery mechanism
A trail is the configuration that persists CloudTrail events to S3 and optionally CloudWatch Logs / EventBridge. Without a trail, you only have the 90-day Event History in the console (read-only, management events only, and no ongoing delivery to storage you control).
Create a trail for every account. The trail writes JSON log files to an S3 bucket (encrypted, integrity-validated).
Scope options
- Single-region: events from one region. Rarely the right choice.
- Multi-region (all regions): the default for trails created in the console. Captures global and all regional events.
- Organization trail: one trail at the Organization level that applies automatically to every member account. Member accounts cannot modify or delete it. This is the correct default for a multi-account environment.
Log file integrity
CloudTrail can digitally sign log files — enable “log file integrity validation.” You can prove cryptographically that logs haven’t been modified. For compliance (SOC, HIPAA, PCI), this is usually required.
Recommended baseline
Every production AWS org should have, at minimum:
- Organization Trail, multi-region, management + insights events → writes to a dedicated log-archive account S3 bucket
- Log file integrity validation on
- S3 bucket in the log account with:
  - Bucket policy restricting access (only the log account)
  - Object Lock (WORM) for tamper-proofing
  - MFA-delete
  - KMS-encrypted
- Trail sending copy to CloudWatch Logs → metric filter + alarms on sensitive events (root login, policy changes, IAM user creation, CloudTrail changes themselves)
- Data events enabled on critical buckets / Lambda functions / DynamoDB tables — scoped, not global
Many orgs layer AWS CloudTrail Lake (a managed queryable data store, SQL-style) or ship to a SIEM (Splunk, Sentinel, Chronicle) for broader correlation.
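The trail-level part of the baseline above can be sketched as a settings dictionary plus a small check for missing controls. This is an illustration under assumptions: the key names mirror the parameters of CloudTrail's `CreateTrail` API, while the helper names, trail name, bucket name, and ARNs are hypothetical. Bucket-side controls (Object Lock, MFA-delete, bucket policy) are configured on the bucket and are not covered here.

```python
REQUIRED_FLAGS = ("IsOrganizationTrail", "IsMultiRegionTrail", "EnableLogFileValidation")


def baseline_trail_settings(bucket, kms_key_arn, log_group_arn, logs_role_arn):
    """Return keyword arguments shaped like CloudTrail's CreateTrail request."""
    return {
        "Name": "org-baseline-trail",  # hypothetical trail name
        "S3BucketName": bucket,
        "IsOrganizationTrail": True,
        "IsMultiRegionTrail": True,
        "IncludeGlobalServiceEvents": True,
        "EnableLogFileValidation": True,
        "KmsKeyId": kms_key_arn,
        "CloudWatchLogsLogGroupArn": log_group_arn,
        "CloudWatchLogsRoleArn": logs_role_arn,
    }


def missing_controls(settings: dict) -> list:
    """List baseline controls that a settings dict fails to enable."""
    missing = [flag for flag in REQUIRED_FLAGS if not settings.get(flag)]
    if not settings.get("KmsKeyId"):
        missing.append("KmsKeyId")
    if not settings.get("CloudWatchLogsLogGroupArn"):
        missing.append("CloudWatchLogsLogGroupArn")
    return missing


# All identifiers below are placeholders, not real resources.
settings = baseline_trail_settings(
    bucket="example-org-cloudtrail-logs",
    kms_key_arn="arn:aws:kms:us-east-1:111111111111:key/EXAMPLE",
    log_group_arn="arn:aws:logs:us-east-1:111111111111:log-group:cloudtrail:*",
    logs_role_arn="arn:aws:iam::111111111111:role/CloudTrailToCloudWatchLogs",
)
print(missing_controls(settings))

weak = dict(settings, IsMultiRegionTrail=False, KmsKeyId=None)
print(missing_controls(weak))
```

With boto3 available, the dictionary could be passed as `boto3.client("cloudtrail").create_trail(**settings)` from the Organizations management account. Insights and data events are configured separately, through `put_insight_selectors` and `put_event_selectors`.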
Global vs regional service events
Most services are regional — events land in the region of the API call. A few are global and CloudTrail records their events in us-east-1 specifically:
- IAM
- STS
- Route 53
- CloudFront
- Organizations
- Support
- WAF Classic
Implication: your multi-region trail captures these automatically. A single-region trail in eu-west-1 would miss IAM activity entirely — one more reason “multi-region trail” is the default.
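The implication can be made concrete with a deliberately simplified model of which trail sees which event. The event source names below are my assumption of how the listed services appear in `eventSource`, and the functions are hypothetical; real behavior has exceptions (for example, calls to regional STS endpoints).

```python
# Assumed eventSource values for the global services listed above.
GLOBAL_EVENT_SOURCES = {
    "iam.amazonaws.com",
    "sts.amazonaws.com",
    "route53.amazonaws.com",
    "cloudfront.amazonaws.com",
    "organizations.amazonaws.com",
    "support.amazonaws.com",
    "waf.amazonaws.com",
}


def recorded_region(event_source: str, call_region: str) -> str:
    """Region in which CloudTrail records the event, under this simplified model."""
    return "us-east-1" if event_source in GLOBAL_EVENT_SOURCES else call_region


def trail_sees_event(trail_region: str, multi_region: bool,
                     event_source: str, call_region: str) -> bool:
    """A multi-region trail sees everything; a single-region trail sees its own region."""
    if multi_region:
        return True
    return recorded_region(event_source, call_region) == trail_region


# Single-region trail in eu-west-1: sees regional EC2 calls, misses IAM.
print(trail_sees_event("eu-west-1", False, "ec2.amazonaws.com", "eu-west-1"))
print(trail_sees_event("eu-west-1", False, "iam.amazonaws.com", "eu-west-1"))
```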
Alerting on CloudTrail
Two common patterns:
1. CloudWatch Logs metric filter + alarm
Trail → CloudWatch Logs → Metric Filter (pattern: ConsoleLogin with "Failure")
→ Custom metric → Alarm → SNS → PagerDuty
Classic playbook alarms (from the CIS AWS Foundations Benchmark):
- Root account usage
- Unauthorized API calls (errorCode = AccessDenied)
- IAM policy changes
- CloudTrail configuration changes (tampering!)
- Network ACL / SG changes
- S3 bucket policy changes
- Disabling/deletion of KMS CMKs
- Route table changes
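In production these checks are written as CloudWatch Logs metric filter patterns. To show the matching logic itself, here is a local Python emulation of a few of the alarms listed above. The function name, alarm labels, and the exact sets of event names are illustrative assumptions, not the benchmark's official definitions.

```python
TRAIL_CHANGE_EVENTS = {"StopLogging", "DeleteTrail", "UpdateTrail"}
BUCKET_POLICY_EVENTS = {"PutBucketPolicy", "DeleteBucketPolicy"}


def matched_alarms(event: dict) -> list:
    """Return the names of the playbook alarms this CloudTrail event would trigger."""
    alarms = []
    identity = event.get("userIdentity", {})
    name = event.get("eventName", "")
    source = event.get("eventSource", "")

    if identity.get("type") == "Root":
        alarms.append("root-account-usage")
    # Prefix match also covers variants such as AccessDeniedException.
    if str(event.get("errorCode", "")).startswith("AccessDenied"):
        alarms.append("unauthorized-api-call")
    if source == "cloudtrail.amazonaws.com" and name in TRAIL_CHANGE_EVENTS:
        alarms.append("cloudtrail-config-change")
    if source == "s3.amazonaws.com" and name in BUCKET_POLICY_EVENTS:
        alarms.append("s3-bucket-policy-change")
    return alarms


root_stop = {
    "userIdentity": {"type": "Root"},
    "eventSource": "cloudtrail.amazonaws.com",
    "eventName": "StopLogging",
}
print(matched_alarms(root_stop))
```

A single event can trip several alarms at once, which is why the function returns a list instead of the first match.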
2. EventBridge rules
AWS API call via CloudTrail → EventBridge pattern → Lambda/SNS/Step Functions
More flexible than metric filters; can match on detailed event structure. Often the modern choice for real-time automated response.
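As a hedged illustration, the pattern below targets trail tampering. The pattern follows the EventBridge convention for CloudTrail-sourced events; the `matches` function is a hypothetical local helper that implements only exact-value matching (a small subset of what EventBridge supports) so the pattern can be tried without an AWS account.

```python
import json

TRAIL_TAMPERING_PATTERN = {
    "source": ["aws.cloudtrail"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudtrail.amazonaws.com"],
        "eventName": ["StopLogging", "DeleteTrail", "UpdateTrail"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Exact-value subset of pattern matching: lists mean OR, nested dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Simplified envelope; real events carry more fields (account, time, region, ...).
sample = {
    "source": "aws.cloudtrail",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "cloudtrail.amazonaws.com", "eventName": "StopLogging"},
}
print(matches(TRAIL_TAMPERING_PATTERN, sample))
print(json.dumps(TRAIL_TAMPERING_PATTERN))
```

The JSON string printed at the end is the form that a rule's `EventPattern` expects, for example in boto3's `events.put_rule`.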
CloudTrail Lake
A managed event data store that keeps CloudTrail events (up to 10 years) with SQL querying:
SELECT eventTime, userIdentity.arn, eventName
FROM <event-data-store-id>
WHERE eventName = 'DeleteObject'
AND eventTime > timestamp '2026-04-01'
ORDER BY eventTime DESC
LIMIT 100;
Good when you want historical queries without shipping to a SIEM. Pricing: per GB ingested, plus per GB of data scanned by queries.
Validating tamper-free logs
When integrity validation is on, CloudTrail delivers a digest file every hour that lists the hash of each log file from that hour and references the previous digest. Each digest is signed with an AWS-managed key. CLI:
aws cloudtrail validate-logs --trail-arn <arn> \
  --start-time 2026-04-24T00:00:00Z
This validates the hash chain and signatures, and reports any log file that has been altered or deleted.
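To build intuition for what a hash chain buys you, here is a toy model in Python. It is not the real CloudTrail digest format: the field names and helpers are invented for illustration, and signature verification is left out entirely. It only shows how per-file hashes plus a link to the previous digest expose modified or deleted files.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_digest(log_files: dict, previous_digest_hash):
    """Toy digest: hashes of this hour's log files plus a link to the prior digest."""
    return {
        "previous_digest_hash": previous_digest_hash,
        "log_hashes": {name: sha256_hex(body) for name, body in log_files.items()},
    }


def digest_hash(digest: dict) -> str:
    # Canonical JSON so the same digest always hashes to the same value.
    return sha256_hex(json.dumps(digest, sort_keys=True).encode())


def validate_chain(digests: list, stored_logs: dict) -> list:
    """Return a list of problems; an empty list means the chain and files check out."""
    problems = []
    previous = None
    for index, digest in enumerate(digests):
        if digest["previous_digest_hash"] != previous:
            problems.append(f"digest {index}: chain broken")
        for name, expected in digest["log_hashes"].items():
            body = stored_logs.get(name)
            if body is None:
                problems.append(f"{name}: missing")
            elif sha256_hex(body) != expected:
                problems.append(f"{name}: modified")
        previous = digest_hash(digest)
    return problems


hour1 = {"log-0001.json": b'{"eventName": "RunInstances"}'}
hour2 = {"log-0002.json": b'{"eventName": "StopLogging"}'}
d1 = make_digest(hour1, None)
d2 = make_digest(hour2, digest_hash(d1))

stored = {**hour1, **hour2}
print(validate_chain([d1, d2], stored))

tampered = dict(stored, **{"log-0002.json": b'{"eventName": "DescribeTrails"}'})
print(validate_chain([d1, d2], tampered))
```

In the real service each digest is also signed, so an attacker cannot simply regenerate a consistent chain after editing a log file.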
What CloudTrail doesn’t see
- Data plane traffic beyond what “data events” captures — e.g. actual SQL queries to RDS are not CloudTrail-visible
- Network traffic inside VPCs — that’s VPC Flow Logs territory
- Guest-OS actions on EC2 — use SSM, OSQuery, auditd, CloudWatch Agent
- AWS support interactions with your account — separately logged
For full-picture auditing you need CloudTrail + VPC Flow Logs + Config + GuardDuty + OS-level audit, depending on compliance scope.
CloudTrail vs CloudWatch — the frequent confusion
| | CloudTrail | CloudWatch |
|---|---|---|
| Primary data | Discrete API-call events | Numeric time-series + strings |
| Use for | Audit, forensics, compliance | Monitoring, alerting, dashboards |
| Cardinality | Millions of distinct events | Aggregated metrics |
| Retention | Long-term (S3 / Lake), years | Metrics: up to 15 months; logs: per log-group retention setting |
They complement. CloudTrail feeds CloudWatch Logs when you want to alarm on audit events; CloudWatch alone can’t tell you “who made this change.”
Common pitfalls
- No Organization trail. Member accounts can disable their own trails, and that is often an attacker’s first move. Organization trails cannot be modified or deleted from a member account.
- Trail logging into the same account. A compromised account can tamper with its own trail. Use a dedicated log-archive account with Object Lock.
- Forgetting data events on sensitive S3 buckets. Someone exfiltrates petabytes; CloudTrail says “bucket configuration didn’t change.” Data events would have recorded the GetObject calls.
- No alarms on CloudTrail itself. If the trail is disabled, you should get paged. Alarm on StopLogging and DeleteTrail events.
- KMS-encrypted S3 bucket with the wrong key policy. Trail writes fail silently. Monitor trail write errors.
- Assuming 90-day Event History = enough. It’s read-only, short, management-only. Real audit lives in a trail-to-S3.
- IAM events missed because of single-region trail. Multi-region or bust.
Mental model
- CloudTrail = append-only ledger of every control-plane (and optionally data-plane) API call.
- Trails = the storage and delivery config that makes the ledger durable and queryable.
- Organization + log-archive account + Object Lock + KMS = the tamper-proofing recipe.
- Metric filters / EventBridge = the real-time alarming layer on top.
- CloudTrail Lake / SIEM = the long-term analytics layer.
- First thing an attacker tries to kill. Design accordingly.