Let’s discuss OpenTelemetry – a vendor-neutral standard way to collect telemetry data.
Offering better observability into an application is a big challenge for any developer because they need to capture telemetry data of the application. The Cambridge dictionary defines telemetry as the science or process of collecting information about objects that are far away and sending the information somewhere electronically.
For example, a single user click or session on a website generates many requests and traces that flow between networks, microservices, databases, etc.
OpenTelemetry is an observability platform, a set of well-factored components that can be used together or a la carte. Furthermore, developers of frameworks and libraries that we all use today now have a standard way to bake telemetry data into those libraries and frameworks, giving the end-users many out-of-the-box insights into what those frameworks are doing under the hood.
To understand OpenTelemetry, you first need to know what distributed tracing is.
What is Distributed Tracing?
As our applications become more complex and more services are involved in serving user traffic and completing transactions, it becomes more and more critical to understand how requests traverse our services and how each service contributes to overall latency. This is what distributed tracing does. It captures the latency of user requests and how long it takes each microservice in the path to return a response.
When a user request comes in, we want to create a trace, i.e., the complete record of how our system responds to that request. Traces are composed of spans, and each span represents a specific request and response pair involved in serving the user request. The parent span describes the latency as observed by the end-user. Child spans show how a particular service in the distributed system was called and how long it took to respond.
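The parent/child relationship can be sketched in a few lines of plain Python. This is only an illustrative model, not the OpenTelemetry API: the `Span` class, the `self_time_ms` helper, the service names, and the timings are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One request/response pair; times are in milliseconds (illustrative)."""
    name: str
    start_ms: float
    end_ms: float
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: List["Span"] = field(default_factory=list, repr=False, compare=False)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

    def child(self, name: str, start_ms: float, end_ms: float) -> "Span":
        """Create a child span and link it to this span."""
        span = Span(name, start_ms, end_ms, parent=self)
        self.children.append(span)
        return span


def self_time_ms(span: Span) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes the direct children do not overlap in time.
    """
    return span.duration_ms - sum(c.duration_ms for c in span.children)


# The parent (root) span is the latency the end user observes.
root = Span("GET /checkout", 0.0, 120.0)
auth = root.child("auth-service", 5.0, 25.0)
cart = root.child("cart-service", 30.0, 100.0)
db = cart.child("cart-db query", 40.0, 90.0)

for span in (root, auth, cart, db):
    print(f"{span.name}: {span.duration_ms:.0f} ms total, "
          f"{self_time_ms(span):.0f} ms self")
```

Here the whole trace is the tree rooted at `root`, and the "self" time shows how much each service contributes to the overall latency once its downstream calls are subtracted.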
What is OpenTelemetry?
OpenTelemetry is an open-source project hosted by the CNCF that provides a standard way to generate telemetry data. It was created by the merger of OpenTracing, a standard for generating trace data, and OpenCensus, a set of libraries for collecting metrics and traces.
OpenTelemetry offers a single set of APIs, agents, collector services, and libraries to capture distributed traces and metrics from your application. OpenTelemetry standardizes how you collect telemetry data and send it to a back-end of your choice. This gives you a vendor-neutral path to instrumentation and the flexibility to change your back-end without instrumenting your code again.
So, you can instrument your applications using a vendor-agnostic agent while still sending your metrics and traces to a SaaS vendor like Datadog. Then if you want to switch vendors (e.g., from Datadog to Dynatrace), you can do it without changing your application code.
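The idea of switching vendors without touching application code can be sketched with a small exporter interface. This is a simplified stand-in rather than OpenTelemetry's real SDK: `Telemetry`, `VendorAExporter`, `VendorBExporter`, and `handle_order` are hypothetical names invented for this illustration.

```python
from typing import Dict, List, Protocol

Record = Dict[str, object]


class Exporter(Protocol):
    """Anything that can ship a batch of records to some back-end."""

    def export(self, records: List[Record]) -> None: ...


class VendorAExporter:
    """Hypothetical back-end A: keeps records as-is."""

    def __init__(self) -> None:
        self.received: List[Record] = []

    def export(self, records: List[Record]) -> None:
        self.received.extend(records)


class VendorBExporter:
    """Hypothetical back-end B: wants every record wrapped in an envelope."""

    def __init__(self) -> None:
        self.received: List[Record] = []

    def export(self, records: List[Record]) -> None:
        self.received.extend({"payload": r} for r in records)


class Telemetry:
    """Vendor-neutral facade that application code talks to."""

    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter
        self._buffer: List[Record] = []

    def record(self, name: str, **attributes: object) -> None:
        self._buffer.append({"name": name, **attributes})

    def flush(self) -> None:
        self._exporter.export(self._buffer)
        self._buffer = []


def handle_order(telemetry: Telemetry, order_id: int) -> None:
    """Application code: instrumented once, no vendor mentioned."""
    telemetry.record("order.received", order_id=order_id)
    telemetry.record("order.completed", order_id=order_id)


# Switching vendors is a one-line change at startup, not in handle_order.
vendor_a, vendor_b = VendorAExporter(), VendorBExporter()
for exporter in (vendor_a, vendor_b):
    telemetry = Telemetry(exporter)
    handle_order(telemetry, order_id=7)
    telemetry.flush()

print(len(vendor_a.received), len(vendor_b.received))
```

The vendor-specific shape of the data lives entirely inside the exporter, which is why the instrumented function stays the same for both back-ends.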
The OpenTelemetry project aims to provide a single set of APIs, libraries, and agents to capture metrics and distributed traces from your applications. This applies across many languages and platforms. The OpenTelemetry project also includes an optional collector service and has a dedicated repository for specifications. To be clear, OpenTelemetry is not a replacement for Jaeger or Prometheus, which are observability back-ends. Instead, it helps export data to open-source and commercial back-ends.
Below are the features that OpenTelemetry provides:
- Standardization on collecting telemetry data that organizations can follow, which makes it easy to move between vendors
- A vendor-agnostic, open-standard semantic convention for the process of data collection
- A Collector that can be deployed as an agent, as a gateway, or in many other ways
- Supports multiple context propagation formats for migration
- An end-to-end solution to generate, emit, collect, process, and export telemetry data
- The ability to send data to various destinations in parallel, with complete control over it
Below are the core components of OpenTelemetry:
- Proto: This component defines the language-independent interface types used by collectors, instrumentation libraries, etc.
- Collector: The Collector is used to receive, process, and export telemetry data. Its implementation has to be vendor-agnostic. By default, instrumentation libraries export all their telemetry data to the Collector.
- Specification: This component describes the requirements and expectations for implementations in different languages, covering the API, the SDK, and data. The API generates the telemetry data, while the SDK implements the API and provides the processing and exporting capabilities. The data part holds the semantic conventions, which support all kinds of vendors without any code changes.
- Instrumentation Libraries: These are available in multiple languages as a part of the OpenTelemetry project. They make other libraries, and the applications built on them, observable by calling the OpenTelemetry API on their behalf.
At a high level, OpenTelemetry consists of three main pieces:
- A set of APIs to instrument applications, libraries, and frameworks.
- An SDK that implements the APIs.
- An optional collector that can ingest, aggregate, and export telemetry data wherever you need it.
The purpose of the API is to enable the creation of instrumentation for libraries and the application code. The API has four main sections: tracing, meters, a shared context, and semantic conventions.
- The tracer API supports creating, annotating, and completing spans.
- The meter API consists of multiple metric instruments. Examples of these instruments are observers, value recorders, and counters.
- The context API tracks the currently executing span context and propagates it both within your system and across its boundaries.
- The semantic conventions contain the guidelines and rules, mainly about naming: how to name spans, attributes, labels, and metric instruments. These conventions ensure consistency across different language implementations and external instrumentations.
In a shared context, the context implementation sits between the tracer and the meter, so that all non-observer metric recordings occur in the context of an executing span. This allows SDKs to capture exemplar spans for metric values. You can customize the context with propagators, which carry the span context into and out of the system and so enable true distributed tracing.
The Collector is an essential part of OpenTelemetry architecture. It is a standalone service that can receive, process, and export telemetry data from various sources, including OpenCensus, Zipkin, Jaeger, and the OpenTelemetry protocol. Using collectors, you can export spans and metrics to multiple vendors and open-source telemetry systems.
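The receive, process, and export flow can be pictured with a small pipeline. This is a conceptual sketch only; the real Collector is a separate service driven by configuration, and the `Collector` class, the two processors, and the in-memory "back-ends" below are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Processor = Callable[[List[Record]], List[Record]]
Exporter = Callable[[List[Record]], None]


class Collector:
    """Toy pipeline: receive a batch, run processors, fan out to exporters."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]) -> None:
        self.processors = processors
        self.exporters = exporters

    def receive(self, batch: Iterable[Record]) -> None:
        records = [dict(r) for r in batch]  # copy so callers are not mutated
        for process in self.processors:
            records = process(records)
        for export in self.exporters:
            export(list(records))  # every destination gets the full batch


def drop_health_checks(records: List[Record]) -> List[Record]:
    """Filter out noisy health-check spans."""
    return [r for r in records if r.get("name") != "GET /health"]


def add_environment(records: List[Record]) -> List[Record]:
    """Enrich every record with a deployment attribute."""
    return [{**r, "env": "staging"} for r in records]


# Two lists stand in for two different telemetry back-ends.
backend_a: List[Record] = []
backend_b: List[Record] = []

collector = Collector(
    processors=[drop_health_checks, add_environment],
    exporters=[backend_a.extend, backend_b.extend],
)
collector.receive([
    {"name": "GET /health", "duration_ms": 1},
    {"name": "GET /cart", "duration_ms": 42},
])

print(backend_a)
print(backend_b)
```

Both destinations end up with the same filtered and enriched record, which mirrors how one Collector can feed several vendors at once.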
The OpenTelemetry architecture offers a complete telemetry solution out of the box. You can also do customization by using multiple extension points as per the need.
How Does OpenTelemetry Work?
Inside every service in your deployment, install the OpenTelemetry client. The client is the SDK; the SDK, in turn, has an API. Your applications, frameworks, and libraries use this instrumentation API to describe the work that they are doing. The SDK then exports the collected observations to a data pipelining service called the Collector.
OpenTelemetry has its own data protocol, OTLP, but the collector can translate OTLP into various formats, including Zipkin, Jaeger, and Prometheus. Notably, OpenTelemetry does not provide its own back end or analysis tool; this is because OpenTelemetry is, at its heart, a standardization effort. The goal is to come up with a universal language for describing the operations of computers in a cloud environment. The goal is not to standardize how we analyze that data. Instead, we hope that OpenTelemetry will help push the world of observability forward by allowing new analysis tools to get started quickly without rebuilding this entire ecosystem of telemetry software.
When you are sending a lot of data across the system, there is a lot to consider. Luckily, OpenTelemetry has thought about these questions and offers a solution to each of them. First and foremost, OpenTelemetry is flexible, and it handles multiple context propagation formats. This means that even though there is a standard, there is still room for choice within that standard. So, whether you are using the W3C Trace Context format or B3 propagation, these are different standards within the standard that allow your services to connect the dots.
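As an illustration of what a propagation format carries, the sketch below writes and reads the W3C Trace Context `traceparent` header, which has the shape `version-traceid-parentid-flags`. The helper names (`inject`, `extract`, `child_of`, `SpanContext`) are hypothetical, only version `00` is handled, and the companion `tracestate` header is ignored.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# version "00", 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the span context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the span context from incoming headers, or None if invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are not valid in the W3C format.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Same trace, fresh span ID: this is how services connect the dots."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


incoming = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
}
ctx = extract(incoming)

outgoing: Dict[str, str] = {}
inject(child_of(ctx), outgoing)
print(outgoing["traceparent"])
```

The downstream service sees the same trace ID with a new parent ID, so a back-end can stitch both spans into one trace.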
OpenTelemetry collects a variety of observations, with distributed traces, metrics, and system resources being the most important. Rather than treating these as separate signals, OpenTelemetry braids them together and provides indexing and context that allow you to aggregate and cross-index all of these signals on the back end.
In addition to data collection, OpenTelemetry provides a data processing and pipelining facility that allows you to change data formats and manipulate your data, giving you the tools you need to build a robust telemetry pipeline in a modern system.
So, that was all about OpenTelemetry. Go ahead and try it out.