Understanding Observability using OpenTelemetry and Jaeger for traces

Wednesday, September 18, 2024

OpenTelemetry and Tracing

OpenTelemetry is an observability framework providing tools, APIs, and SDKs to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) for analyzing software performance and behavior.

Traces

Traces represent the journey of a request through various components of a distributed system. They're composed of spans, which represent individual operations within the trace.

[Figure: Tracing]

In this diagram:

  • The entire sequence represents a single trace
  • Each arrow represents a span within that trace
  • Spans can have child spans, creating a hierarchical structure
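
The parent-child structure described above can be sketched with a small data model. This is a toy illustration using only the Python standard library, not the OpenTelemetry SDK; the `Span` class, its `child` helper, and the operation names are assumptions made up for this example.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Toy span: one operation inside a trace (not the OpenTelemetry SDK class)."""
    name: str
    trace_id: str                      # shared by every span in the same trace
    parent_id: Optional[str] = None    # None marks the root span
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))

    def child(self, name: str) -> "Span":
        # A child inherits the trace ID and points back at its parent.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)


# Hypothetical operation names, chosen only for illustration.
root = Span(name="GET /products", trace_id=secrets.token_hex(16))
lookup = root.child("product-service.lookup")
query = lookup.child("db.query")

for span in (root, lookup, query):
    print(span.name, "parent:", span.parent_id)
```

Every span carries the same trace ID, and following the `parent_id` values back to the root reconstructs the hierarchy.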

How OpenTelemetry Works Internally

  1. Instrumentation: Add OpenTelemetry instrumentation to your code
  2. Data Collection: Instrumentation collects telemetry data as the application runs
  3. Processing: OpenTelemetry SDK processes this data, adding timestamps and trace IDs
  4. Exporting: Processed data is sent to a backend system, such as Jaeger, for storage and analysis
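
The four steps above can be mimicked in a few lines of plain Python. This is a simplified sketch of the flow as the list describes it, not how the OpenTelemetry SDK is implemented; `traced`, `process`, `export`, and the in-memory `exported` list (standing in for a backend) are hypothetical names.

```python
import time
from contextlib import contextmanager

finished = []   # step 2: records collected while the application runs
exported = []   # stands in for a backend such as Jaeger


@contextmanager
def traced(name):
    """Step 1: instrumentation wraps an operation and records its timing."""
    record = {"name": name, "start_ns": time.monotonic_ns()}
    try:
        yield record
    finally:
        record["end_ns"] = time.monotonic_ns()
        finished.append(record)


def process(records, trace_id):
    """Step 3: enrich each record (here: a trace ID and a duration)."""
    return [
        {**r, "trace_id": trace_id, "duration_ns": r["end_ns"] - r["start_ns"]}
        for r in records
    ]


def export(records):
    """Step 4: hand processed records to the backend (an in-memory list here)."""
    exported.extend(records)


with traced("load-cart"):
    sum(range(1000))  # placeholder for real work

export(process(finished, trace_id="trace-1"))
print(exported[0]["name"], exported[0]["trace_id"])
```

Keeping collection, processing, and export as separate functions is the point of the sketch: the exporting step can be replaced without touching the instrumented code.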

OpenTelemetry Workflow

Data flows through the OpenTelemetry system as follows:

  1. Application code instrumented with OpenTelemetry API
  2. OpenTelemetry SDK collects telemetry data
  3. Span Processor handles spans (adding timestamps, batching)
  4. Exporter sends data to chosen backend system
  5. Configuration customizes SDK behavior
  6. Context Propagation maintains trace context across service boundaries
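
Context propagation (item 6) is the least obvious part of this flow, so here is a minimal sketch of it. It assumes the W3C Trace Context `traceparent` header, which OpenTelemetry uses by default. The `inject` and `extract` functions are hand-written stand-ins for illustration, not the OpenTelemetry propagator API, and the validation is deliberately simplified.

```python
import re
import secrets

# traceparent: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
# Simplified: does not reject the all-zero IDs that the specification forbids.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Write the caller's trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Read trace context from incoming headers; None if absent or malformed."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": int(flags, 16) & 1 == 1,
    }


# Gateway side: IDs are random hex, sized as the header format requires.
trace_id = secrets.token_hex(16)
gateway_span_id = secrets.token_hex(8)
outgoing = inject({}, trace_id, gateway_span_id)

# Downstream service: continue the same trace with a new child span.
ctx = extract(outgoing)
child_span = {
    "trace_id": ctx["trace_id"],
    "parent_id": ctx["parent_id"],
    "span_id": secrets.token_hex(8),
}
print(outgoing["traceparent"])
```

Because the downstream service reuses the incoming trace ID and records the caller's span ID as its parent, the backend can stitch spans from both services into one trace.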

Key Concepts

  1. Span: Unit of work or operation; building block of a trace
  2. Trace: Collection of spans representing a complete request flow
  3. Attributes: Key-value pairs adding context to spans (e.g., user ID, HTTP method)
  4. Events: Time-stamped logs attached to a span
  5. Links: Connections between spans not in a parent-child relationship
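
To make attributes, events, and links concrete, the sketch below models them as plain fields on a span. It is a toy data model rather than the OpenTelemetry SDK; the class and method names, the attribute keys, and the example operations are all illustrative assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Event:
    """A time-stamped note attached to a span."""
    name: str
    timestamp_ns: int
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (trace_id, span_id)

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def add_event(self, name, **attributes):
        self.events.append(Event(name, time.time_ns(), attributes))

    def add_link(self, other):
        # A link references another span without making it a parent.
        self.links.append((other.trace_id, other.span_id))


checkout = Span("POST /cart", trace_id="t1", span_id="s1")
checkout.set_attribute("http.method", "POST")
checkout.set_attribute("user.id", "u-123")
checkout.add_event("cache.miss", key="cart:u-123")

# A later background job links back to the request that triggered it.
email_job = Span("send-confirmation", trace_id="t2", span_id="s9")
email_job.add_link(checkout)

print(checkout.attributes, [e.name for e in checkout.events], email_job.links)
```

Note that the linked job lives in a different trace (`t2`), which is the typical case for links: the two spans are related, but neither is the parent of the other.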

Benefits of OpenTelemetry

  • Standardization: Single set of APIs and libraries for cross-platform instrumentation
  • Vendor Neutrality: Switch backends without changing instrumentation code
  • Correlation: Correlate traces, metrics, and logs for better observability
  • Distributed Tracing: Trace across service boundaries in distributed systems

In a microservices application, for example, OpenTelemetry can trace a request from the API Gateway through the Product and Cart Services, providing visibility into the entire request lifecycle and helping identify performance bottlenecks or errors.

Example: Adding a Product to Favorites and Cart

Here's how a trace might look when adding a product to both favorites and cart:

[Figure: Complex Trace Example]

This example demonstrates how a single trace can contain multiple spans, each recording one specific operation performed while handling the request.
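
As a rough textual stand-in for such a trace, the sketch below renders a span list as an indented tree. The service names follow the API Gateway, Product Service, and Cart Service mentioned earlier; the span IDs, operation names, and durations are invented purely for illustration and do not come from a real system.

```python
from collections import defaultdict

# Hypothetical spans: (span_id, parent_id, name, duration_ms).
SPANS = [
    ("a", None, "api-gateway: POST /products/42/save", 120),
    ("b", "a", "product-service: add to favorites", 45),
    ("c", "b", "db: INSERT favorites", 20),
    ("d", "a", "cart-service: add to cart", 60),
    ("e", "d", "db: INSERT cart_items", 25),
]


def render(spans):
    """Return one indented line per span, children nested under parents."""
    children = defaultdict(list)
    for span_id, parent_id, name, duration_ms in spans:
        children[parent_id].append((span_id, name, duration_ms))

    lines = []

    def walk(parent_id, depth):
        for span_id, name, duration_ms in children[parent_id]:
            lines.append(f"{'  ' * depth}{name} ({duration_ms} ms)")
            walk(span_id, depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


print("\n".join(render(SPANS)))
```

Reading the tree top to bottom shows which service handled each step and where the time was spent, which is the same information a tracing backend such as Jaeger presents in its trace view.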