Fastify Observability Playbook: Logs, Metrics, Traces
Observability is not a matter of buying three tools and wiring up dashboards. It means designing your API so that every user-facing failure maps to a measurable signal.
1. Structured logs must be queryable by intent
Include the request id, guild id, route id, and actor id as fields on every log event. Free-text logs look useful during development, then collapse during incidents, when you need to filter by field rather than search for remembered phrasing.
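As a minimal sketch of that rule, the dependency-free logger below binds the four ids once per request and stamps them on every event as JSON fields. The names `LogContext` and `createContextLogger` and the sample id values are assumptions made for illustration. Fastify's request logger is a pino instance, so a real service would usually get the same effect from a child logger rather than hand-rolled code like this.

```typescript
// Hypothetical field names: the text only requires that these four ids
// appear on every log event.
interface LogContext {
  requestId: string;
  guildId: string;
  routeId: string;
  actorId: string;
}

type LogLevel = "info" | "warn" | "error";
type LogFields = Record<string, unknown>;

// Returns a logger whose every event carries the bound context as
// top-level JSON fields, so events can be filtered by field.
function createContextLogger(
  context: LogContext,
  write: (line: string) => void = (line) => console.log(line),
) {
  const emit = (level: LogLevel, msg: string, extra: LogFields = {}): void => {
    // Call-site fields go first so they can never overwrite the level,
    // the message, or the bound ids.
    const event = { ...extra, level, time: new Date().toISOString(), msg, ...context };
    write(JSON.stringify(event));
  };
  return {
    info: (msg: string, extra?: LogFields) => emit("info", msg, extra),
    warn: (msg: string, extra?: LogFields) => emit("warn", msg, extra),
    error: (msg: string, extra?: LogFields) => emit("error", msg, extra),
  };
}

// Usage: bind the context once at the start of the request, then log freely.
const log = createContextLogger({
  requestId: "req-1",
  guildId: "guild-42",
  routeId: "POST /guilds/:id/messages",
  actorId: "user-7",
});
log.error("message publish failed", { reason: "queue timeout" });
```

Binding the context in one place means a handler cannot forget a field, and an incident query such as "all errors for this guild on this route" becomes a field filter instead of a text search.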
2. Metrics should answer operational questions
Start with route-level latency histograms, error-rate counters, and queue backlog gauges. Skip vanity graphs that never influence production decisions.
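The three starting metrics can be sketched with a small in-memory recorder, shown here only to make the shapes concrete. The class name `RouteMetrics`, the bucket bounds, and the choice to count only 5xx responses as errors are all assumptions; a production service would more likely use an established metrics library than this hand-rolled version.

```typescript
// Bucket upper bounds in milliseconds. Illustrative values, not a recommendation.
const LATENCY_BUCKETS_MS = [50, 100, 250, 500, 1000];

class RouteMetrics {
  // Per route: cumulative bucket counts, with a final slot acting as +Inf.
  private latency = new Map<string, number[]>();
  private requests = new Map<string, number>();
  private errors = new Map<string, number>();

  // Gauge: set to the current queue depth on each sample, never incremented.
  queueBacklog = 0;

  observe(routeId: string, durationMs: number, statusCode: number): void {
    const buckets =
      this.latency.get(routeId) ?? new Array<number>(LATENCY_BUCKETS_MS.length + 1).fill(0);
    // Cumulative histogram: bump every bucket whose bound covers the observation.
    LATENCY_BUCKETS_MS.forEach((bound, i) => {
      if (durationMs <= bound) buckets[i] += 1;
    });
    buckets[LATENCY_BUCKETS_MS.length] += 1; // the +Inf bucket counts everything
    this.latency.set(routeId, buckets);

    this.requests.set(routeId, (this.requests.get(routeId) ?? 0) + 1);
    // Assumption: only server-side failures (5xx) count toward the error rate.
    if (statusCode >= 500) {
      this.errors.set(routeId, (this.errors.get(routeId) ?? 0) + 1);
    }
  }

  bucketCounts(routeId: string): number[] {
    return [...(this.latency.get(routeId) ?? [])];
  }

  errorRate(routeId: string): number {
    const total = this.requests.get(routeId) ?? 0;
    return total === 0 ? 0 : (this.errors.get(routeId) ?? 0) / total;
  }
}

// Usage: record one observation per response and sample the backlog periodically.
const metrics = new RouteMetrics();
metrics.observe("GET /guilds/:id", 80, 200);
metrics.observe("GET /guilds/:id", 300, 500);
metrics.queueBacklog = 12;
```

Each of the three answers an operational question: the histogram shows whether a route is getting slower, the error rate shows whether it is failing, and the backlog gauge shows whether workers are keeping up.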
3. Tracing closes the gap between API and workers
Propagate trace context from the HTTP request to the queue publish to the worker execution. Without that unbroken chain, diagnosing a failure that spans the API and a worker becomes guesswork.
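The sketch below illustrates that chain using the W3C Trace Context `traceparent` header format (`version-traceId-parentId-flags`). The message envelope `QueueMessage` and every function name here are hypothetical, and ids are generated with `Math.random` purely to keep the example dependency-free. In practice a tracing SDK such as OpenTelemetry would normally handle parsing, id generation, and propagation.

```typescript
interface TraceContext {
  traceId: string; // 32 hex chars, shared by every span in the trace
  spanId: string;  // 16 hex chars, unique per unit of work
  flags: string;   // 2 hex chars, e.g. "01" = sampled
}

// Hypothetical queue envelope: headers travel alongside the payload.
interface QueueMessage<T> {
  headers: Record<string, string>;
  payload: T;
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Simplification: Math.random is fine for a sketch, not for production ids.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) out += Math.floor(Math.random() * 16).toString(16);
  return out;
}

function parseTraceparent(header: string | undefined): TraceContext | null {
  const match = header ? TRACEPARENT_RE.exec(header) : null;
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero ids are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, flags };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.flags}`;
}

// Continue the caller's trace with a fresh span id, or start a new trace.
function childOf(parent: TraceContext | null): TraceContext {
  return {
    traceId: parent ? parent.traceId : randomHex(32),
    spanId: randomHex(16),
    flags: parent ? parent.flags : "01",
  };
}

// API side: attach the current span's context to the outgoing message.
function publishWithTrace<T>(payload: T, current: TraceContext): QueueMessage<T> {
  return { headers: { traceparent: formatTraceparent(current) }, payload };
}

// Worker side: resume the same trace from the message headers.
function workerContext(message: QueueMessage<unknown>): TraceContext {
  return childOf(parseTraceparent(message.headers["traceparent"]));
}

// Usage: HTTP request -> queue publish -> worker execution.
const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const requestSpan = childOf(parseTraceparent(incoming));
const message = publishWithTrace({ job: "send-digest" }, requestSpan);
const workerSpan = workerContext(message);
```

The trace id stays constant across all three hops while each hop gets its own span id, which is what lets a tracing backend stitch the API request and the worker job into one timeline.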
4. Set SLO alerts with clear action owners
Alerting should map to people who can execute a fix. Alerts with no owner become noise. Tie each alert rule to a runbook and an on-call role.
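One way to enforce this is to make ownership part of the alert definition and check it before rules are deployed. The sketch below assumes a hypothetical `AlertRule` shape. The field names, rule names, condition strings, and the example.com runbook URL are placeholders for illustration, not the schema of any particular alerting tool.

```typescript
// Hypothetical rule shape: every alert names a runbook and an on-call role.
interface AlertRule {
  name: string;
  condition: string;  // human-readable SLO condition, placeholder syntax
  runbookUrl: string; // where the responder finds the fix procedure
  onCallRole: string; // who gets paged and can execute that procedure
}

// Returns the names of rules that would page nobody or point at no runbook.
function findUnownedAlerts(rules: AlertRule[]): string[] {
  return rules
    .filter((rule) => rule.runbookUrl.trim() === "" || rule.onCallRole.trim() === "")
    .map((rule) => rule.name);
}

// Usage: run this in CI so an ownerless alert never reaches production.
const rules: AlertRule[] = [
  {
    name: "api-error-rate-high",
    condition: "5xx rate above SLO threshold for 10 minutes",
    runbookUrl: "https://example.com/runbooks/api-error-rate", // placeholder URL
    onCallRole: "api-oncall",
  },
  {
    name: "queue-backlog-growing",
    condition: "backlog gauge rising for 15 minutes",
    runbookUrl: "",
    onCallRole: "",
  },
];

const unowned = findUnownedAlerts(rules);
if (unowned.length > 0) {
  console.log(`Alerts missing a runbook or owner: ${unowned.join(", ")}`);
}
```

Failing the build on a non-empty result turns "alerts with no owner become noise" from a review guideline into a checked invariant.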
5. Review observability debt weekly
Every unresolved incident should produce at least one instrumentation upgrade: a log field, metric, or trace span that would have shortened the diagnosis. This feedback loop is how your system gets easier to operate over time.
Good observability is cumulative. Treat every incident as a design review for your telemetry.