
Building Observability into Cloud Applications

November 15, 2023



Monitoring tells you whether something is wrong. Observability lets you ask why. In a complex distributed system, you can't predict every failure mode, so you need the ability to explore your system's state from the outside. That ability rests on three pillars: metrics, logs, and traces.

1. Metrics

Metrics are numerical measurements over time, perfect for dashboards and alerts.

What to Track: Follow the RED method for services (Rate, Errors, Duration) and the USE method for resources (Utilization, Saturation, Errors).
Tools: Prometheus is the de facto open-source standard for collecting metrics. Grafana is the usual choice for visualizing them.
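
To make the RED method concrete, here is a minimal sketch in plain Python that tracks rate, errors, and duration for a single service. The `RedMetrics` class and its method names are hypothetical and exist only for illustration; in a real service you would more likely use a Prometheus client library's counters and histograms than keep raw durations in memory.

```python
import math


class RedMetrics:
    """Hypothetical in-memory tracker for the RED method (illustration only)."""

    def __init__(self):
        self.requests = 0    # Rate: total requests observed
        self.errors = 0      # Errors: requests that failed
        self.durations = []  # Duration: latency of each request, in seconds

    def observe(self, duration_seconds, is_error=False):
        """Record one finished request."""
        self.requests += 1
        if is_error:
            self.errors += 1
        self.durations.append(duration_seconds)

    def rate(self, window_seconds):
        """Requests per second over the given observation window."""
        return self.requests / window_seconds

    def error_ratio(self):
        """Fraction of requests that failed."""
        return self.errors / self.requests if self.requests else 0.0

    def duration_percentile(self, pct):
        """Nearest-rank percentile of the recorded durations."""
        if not self.durations:
            return 0.0
        ordered = sorted(self.durations)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


# Four requests observed over a two-second window; the slowest one failed.
metrics = RedMetrics()
for duration, failed in [(0.1, False), (0.2, False), (0.3, False), (0.4, True)]:
    metrics.observe(duration, is_error=failed)

print(metrics.rate(window_seconds=2))   # requests per second
print(metrics.error_ratio())            # share of failed requests
print(metrics.duration_percentile(95))  # tail latency in seconds
```

Alerting on the error ratio and a high latency percentile, rather than on averages, tends to surface the failures users actually feel.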

2. Logs

Logs are immutable records of discrete events. Free-form text logs are hard to parse and query at scale, so structured logging is essential.

Best Practice: Log in a structured format like JSON. Include contextual data like a trace_id to correlate the log with a specific request.
Tools: The ELK Stack (Elasticsearch, Logstash, Kibana) and Grafana Loki with Promtail are common choices for aggregating and searching logs.
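
As a rough sketch of the structured-logging practice above, the following uses only Python's standard `logging` and `json` modules to emit one JSON object per log line, with a `trace_id` field attached through the `extra` argument. The `JsonFormatter` class, the `checkout` logger name, and the field names are assumptions chosen for illustration; in practice the trace ID would come from your tracing library rather than a fresh UUID.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id is supplied per call via `extra`; None if it is absent.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # assumed service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory stream so the output can be inspected afterwards.
stream = io.StringIO()
logger = build_logger(stream)

trace_id = uuid.uuid4().hex  # stand-in for an ID provided by a tracer
logger.info("payment authorized", extra={"trace_id": trace_id})

entry = json.loads(stream.getvalue())
print(entry["message"], entry["trace_id"])
```

Because every line is valid JSON with a consistent `trace_id` key, a log aggregator can filter all events for one request with a single field query instead of a regular expression.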

3. Traces

Traces show the journey of a single request as it flows through multiple services. They are indispensable for debugging latency issues in a microservices architecture.

How it Works: Use libraries that support OpenTelemetry to automatically instrument your code and propagate context between services.
Tools: Jaeger and Zipkin are popular open-source tools for collecting and visualizing traces.
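
To show what "propagating context" means in practice, here is a hand-rolled sketch of the W3C Trace Context `traceparent` header, which is the format OpenTelemetry uses by default for HTTP propagation. The helper names (`inject`, `extract`, `new_trace_id`, `new_span_id`) are hypothetical; an OpenTelemetry SDK does this for you, so treat the code as an illustration of the mechanism rather than something to ship.

```python
import re
import secrets

# traceparent: version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def new_trace_id():
    """Random 16-byte trace ID as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def new_span_id():
    """Random 8-byte span ID as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def inject(headers, trace_id, span_id, sampled=True):
    """Write the caller's trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Read trace context from incoming headers; None if missing or malformed."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    _version, trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 1),  # lowest bit is the sampled flag
    }


# Service A starts a trace and calls service B.
trace_id = new_trace_id()
span_a = new_span_id()
outgoing_headers = inject({}, trace_id, span_a)

# Service B continues the same trace with its own span, parented to A's span.
context = extract(outgoing_headers)
span_b = new_span_id()
print(context["trace_id"] == trace_id, context["parent_span_id"] == span_a)
```

Every service keeps the same trace ID and records the caller's span ID as its parent, which is what lets a backend such as Jaeger or Zipkin reassemble the individual spans into one request timeline.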
