Observability
Definition
Three pillars: metrics, logs, traces. Observability = ability to ask new questions about your system without shipping new code.
Where it appears
🌐 Networking
- SNMP — counters, interface stats
- NetFlow / sFlow / IPFIX — flow records
- Packet capture — tcpdump, Wireshark
- Streaming telemetry — gNMI, OpenConfig
🐧 Linux
- journald / syslog — structured logs
- Prometheus node_exporter — host metrics
- eBPF — bpftrace, bcc, Pixie
☁️ Cloud
- AWS — CloudWatch, VPC Flow Logs, X-Ray, CloudTrail (audit)
- Azure — Monitor, Log Analytics, Network Watcher, Application Insights
📦 Containers
- Prometheus + Grafana
- OpenTelemetry — unified metrics/logs/traces
- Jaeger / Tempo — distributed tracing
- Loki / Elasticsearch — log aggregation
🔐 Cybersecurity
- SIEM — Splunk, Sentinel, Elastic, Wazuh
- SOAR — security orchestration and automated response (playbooks driven by SIEM alerts)
- IDS logs — Suricata, Zeek