Building Observability into Cloud Applications
November 15, 2023
11 min read

Monitoring tells you if something is wrong. Observability lets you ask why. In a complex distributed system, you can't predict every failure mode, so you need the ability to explore your system's state from the outside. That ability is commonly described as resting on three pillars: metrics, logs, and traces.
1. Metrics
Metrics are numerical measurements over time, perfect for dashboards and alerts.
- What to Track: Follow the RED method for services (Rate, Errors, Duration) and the USE method for resources (Utilization, Saturation, Errors).
- Tools: Prometheus is a widely used open-source system for collecting and querying metrics. Grafana is commonly paired with it for dashboards.
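To make the RED method concrete, here is a minimal in-memory sketch using only the Python standard library. The RedMetrics class and the /checkout endpoint are hypothetical names chosen for illustration. In a real service you would more likely export counters and histograms to a backend such as Prometheus and let it derive rates over time.

```python
import math
from collections import defaultdict


class RedMetrics:
    """Tracks Rate, Errors, and Duration per endpoint, in memory (illustrative only)."""

    def __init__(self):
        self.requests = defaultdict(int)    # Rate: running count of requests
        self.errors = defaultdict(int)      # Errors: running count of failed requests
        self.durations = defaultdict(list)  # Duration: seconds taken by each request

    def observe(self, endpoint, duration_s, ok=True):
        """Record one finished request."""
        self.requests[endpoint] += 1
        if not ok:
            self.errors[endpoint] += 1
        self.durations[endpoint].append(duration_s)

    def error_ratio(self, endpoint):
        """Fraction of requests that failed (0.0 if none were seen)."""
        total = self.requests[endpoint]
        return self.errors[endpoint] / total if total else 0.0

    def percentile(self, endpoint, q):
        """Nearest-rank percentile of recorded durations, q in (0, 100]."""
        samples = sorted(self.durations[endpoint])
        if not samples:
            return None
        rank = max(1, math.ceil(q / 100 * len(samples)))
        return samples[rank - 1]


metrics = RedMetrics()
for duration_s, ok in [(0.10, True), (0.20, True), (0.30, False), (0.40, True)]:
    metrics.observe("/checkout", duration_s, ok)

print(metrics.requests["/checkout"])        # 4
print(metrics.error_ratio("/checkout"))     # 0.25
print(metrics.percentile("/checkout", 95))  # 0.4
```

Alerting on a high percentile of duration rather than the average is a common choice, because averages tend to hide the slow requests that users actually notice.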
2. Logs
Logs are immutable records of discrete events. Free-form text logs are hard to parse and query at scale; structured logs are essential.
- Best Practice: Log in a structured format like JSON. Include contextual data like a trace_id to correlate the log with a specific request.
- Tools: The ELK Stack (Elasticsearch, Logstash, Kibana) or Promtail/Loki are powerful for aggregating and searching logs.
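As a rough sketch of the structured-logging practice above, the following uses only Python's standard logging and json modules. The JsonFormatter and build_logger names, the field names, and the "checkout" logger are assumptions made for illustration. The trace_id here is generated locally, whereas in practice it would come from the incoming request.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Renders each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id is attached per call through the `extra` argument.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def build_logger(name="checkout", stream=None):
    """Return a logger that writes one JSON object per line (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


logger = build_logger()
trace_id = uuid.uuid4().hex  # hypothetical; normally read from the incoming request
logger.info("payment authorized", extra={"trace_id": trace_id})
```

Because every entry is a JSON object carrying the same trace_id field, a log aggregator can filter on that field to pull up everything that happened during one request.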
3. Traces
Traces show the journey of a single request as it flows through multiple services. They are indispensable for debugging latency issues in a microservices architecture.
- How it Works: Use libraries that support OpenTelemetry to automatically instrument your code and propagate context between services.
- Tools: Jaeger and Zipkin are popular open-source tools for collecting and visualizing traces.
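To show what "propagating context" means in practice, here is a hand-rolled sketch of the W3C Trace Context traceparent header, the format OpenTelemetry propagates by default. This is not the OpenTelemetry API. The helper names new_traceparent and child_traceparent are hypothetical, and a real service would let an instrumentation library do this work.

```python
import re
import secrets

# traceparent layout: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Start a new trace with a fresh 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"  # flags 01 marks the trace as sampled


def child_traceparent(incoming):
    """Continue the caller's trace: keep its trace ID, mint a new span ID."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        # Missing or malformed header: start a new trace instead of failing.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Service A starts the trace and sends the header with its outgoing call.
outgoing_headers = {"traceparent": new_traceparent()}

# Service B reads the header and creates its own span within the same trace.
downstream_headers = {
    "traceparent": child_traceparent(outgoing_headers["traceparent"])
}

print(outgoing_headers["traceparent"])
print(downstream_headers["traceparent"])
```

The two printed headers share the same trace ID but carry different span IDs, which is what lets a tracing backend stitch spans from separate services into one request timeline.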