Distributed Tracing

A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A distributed trace contains events that cross process, network and security boundaries. A distributed trace may be initiated when someone presses a button to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed.


Traces in OpenTelemetry are defined implicitly by their Spans. In particular, a Trace can be thought of as a directed acyclic graph (DAG) of Spans, where the edges between Spans are defined as parent/child relationships.

For example, the following is a Trace made up of 6 Spans:

Causal relationships between Spans in a single Trace

        [Span A]  ←←←(the root span)
     |             |
 [Span B]      [Span C] ←←←(Span C is a `child` of Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] 
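
The parent/child structure in the diagram above can be sketched in a few lines of Python. This is only an illustration, not an OpenTelemetry API: the `PARENT_OF` mapping and the helper names are hypothetical, and each span is reduced to a single-letter name.

```python
from collections import defaultdict

# Child -> parent references mirroring the diagram; None marks the root span.
# (Hypothetical representation, chosen only for this illustration.)
PARENT_OF = {
    "A": None,
    "B": "A",
    "C": "A",
    "D": "B",
    "E": "C",
    "F": "C",
}


def children_by_parent(parent_of):
    """Invert child -> parent references into parent -> sorted children."""
    children = defaultdict(list)
    for span, parent in parent_of.items():
        if parent is not None:
            children[parent].append(span)
    return {parent: sorted(kids) for parent, kids in children.items()}


def find_root(parent_of):
    """Return the single span that has no parent."""
    roots = [span for span, parent in parent_of.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root span, found %d" % len(roots))
    return roots[0]


def depth_first(parent_of):
    """Return span names in depth-first order, starting from the root."""
    children = children_by_parent(parent_of)
    order = []
    stack = [find_root(parent_of)]
    while stack:
        span = stack.pop()
        order.append(span)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children.get(span, [])))
    return order
```

Because every span stores only a reference to its parent, the whole Trace can be reconstructed from the spans alone, which is what "defined implicitly by their Spans" means in practice.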

Sometimes it's easier to visualize Traces with a time axis as in the diagram below:

Temporal relationships between Spans in a single Trace

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··]


Each Span encapsulates the following state:

  • An operation name
  • A start and finish timestamp
  • A set of zero or more key:value Attributes. The keys must be strings. The values may be strings, bools, or numeric types.
  • A set of zero or more Events, each of which is itself a key:value map paired with a timestamp. The keys must be strings, though the values may be of the same types as Span Attributes.
  • Parent's Span identifier.
  • Links to zero or more causally-related Spans (via the SpanContext of those related Spans).
  • SpanContext identification of a Span. See below.
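
As a rough sketch of the state listed above, a Span could be modeled as a plain data class. The class, field, and method names here are assumptions made for illustration and do not reflect any language's actual OpenTelemetry API; a SpanContext is reduced to a `(trace_id, span_id)` tuple to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# Allowed value types for attributes, per the list above.
AttributeValue = Union[str, bool, int, float]
# Stand-in for a SpanContext: (trace_id, span_id). Hypothetical simplification.
SpanContextRef = Tuple[bytes, bytes]


def _check_pair(key, value):
    """Enforce string keys and string/bool/numeric values."""
    if not isinstance(key, str):
        raise TypeError("keys must be strings")
    if not isinstance(value, (str, bool, int, float)):
        raise TypeError("values must be strings, bools, or numeric types")


@dataclass
class Event:
    timestamp: float
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class Span:
    name: str                                  # operation name
    start_time: float                          # start timestamp
    end_time: Optional[float] = None           # finish timestamp, unset while running
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)
    parent_span_id: Optional[bytes] = None     # None for a root span
    links: List[SpanContextRef] = field(default_factory=list)
    context: Optional[SpanContextRef] = None   # identifies this span

    def set_attribute(self, key, value):
        _check_pair(key, value)
        self.attributes[key] = value

    def add_event(self, timestamp, attributes=None):
        attributes = dict(attributes or {})
        for key, value in attributes.items():
            _check_pair(key, value)
        self.events.append(Event(timestamp, attributes))

    def finish(self, end_time):
        if end_time < self.start_time:
            raise ValueError("a span cannot finish before it starts")
        self.end_time = end_time
```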


A SpanContext represents all the information that identifies a Span in the Trace and MUST be propagated to child Spans and across process boundaries. A SpanContext contains the tracing identifiers and the options that are propagated from parent to child Spans.

  • TraceId is the identifier for a trace. It consists of 16 randomly generated bytes, which makes it globally unique with practically sufficient probability. TraceId is used to group all spans for a specific trace together across all processes.
  • SpanId is the identifier for a span. It consists of 8 randomly generated bytes, which makes it globally unique with practically sufficient probability. When passed to a child Span, this identifier becomes the parent span id for the child Span.
  • TraceOptions represents the options for a trace. It is represented as 1 byte (bitmap).
    • Sampling bit - Bit to represent whether trace is sampled or not (mask 0x1).
  • Tracestate carries tracing-system specific context in a list of key value pairs. Tracestate allows different vendors to propagate additional information and inter-operate with their legacy Id formats. For more details see [this][].
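
The fields above can be illustrated with a minimal Python sketch. The class and function names are hypothetical rather than taken from an OpenTelemetry library. Only the sizes (16 and 8 random bytes) and the sampling mask `0x1` come from the text. Copying the options and Tracestate unchanged from parent to child is a simplifying assumption.

```python
import secrets
from dataclasses import dataclass
from typing import Tuple

SAMPLED_MASK = 0x1  # sampling bit within the 1-byte TraceOptions bitmap


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes                                # 16 random bytes
    span_id: bytes                                 # 8 random bytes
    trace_options: int = 0                         # 1-byte bitmap
    tracestate: Tuple[Tuple[str, str], ...] = ()   # ordered key/value pairs

    @property
    def is_sampled(self):
        return bool(self.trace_options & SAMPLED_MASK)


def new_root_context(sampled):
    """Create the context for a root span, starting a new trace."""
    options = SAMPLED_MASK if sampled else 0
    return SpanContext(secrets.token_bytes(16), secrets.token_bytes(8), options)


def new_child_context(parent):
    """Return (child context, parent span id) for a span created under `parent`.

    The child keeps the parent's TraceId and gets a fresh SpanId; the
    parent's SpanId becomes the child's parent span id. Options and
    tracestate are copied unchanged here (an assumption of this sketch).
    """
    child = SpanContext(
        parent.trace_id,
        secrets.token_bytes(8),
        parent.trace_options,
        parent.tracestate,
    )
    return child, parent.span_id
```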

Links between spans

A Span may be linked to zero or more other Spans (defined by SpanContext) that are causally related. Links can point to SpanContexts inside a single Trace or across different Traces. Links can be used to represent batched operations where a Span has multiple parents, each representing a single incoming item being processed in the batch. Another example of using a Link is to declare the relationship between the originating and restarted trace. This can be used when a Trace enters the trusted boundaries of a service and service policy requires generating a new Trace instead of trusting the incoming Trace context.
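
Both uses of Links described above can be sketched as follows. All names (`start_batch_span`, `restart_trace`, and the reduced `SpanContext` and `Span` classes) are hypothetical and exist only for this illustration.

```python
import secrets
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes
    span_id: bytes


@dataclass
class Span:
    name: str
    context: SpanContext
    links: List[SpanContext] = field(default_factory=list)


def new_context():
    """Start a brand-new trace with fresh random identifiers."""
    return SpanContext(secrets.token_bytes(16), secrets.token_bytes(8))


def start_batch_span(name, item_contexts):
    """Batch case: one span linked to the context of every incoming item."""
    span = Span(name, new_context())
    span.links.extend(item_contexts)
    return span


def restart_trace(name, incoming):
    """Trust-boundary case: begin a new trace but keep a link to the
    untrusted incoming context so the two traces can still be correlated."""
    return Span(name, new_context(), links=[incoming])


def linked_trace_ids(span):
    """Set of trace ids that the span's links point at."""
    return {link.trace_id for link in span.links}
```

In the batch case the linked contexts may belong to several different traces, which is why a Link carries a full SpanContext rather than just a span id.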


TODO: Describe metrics terminology open-telemetry#45


DistributedContext is an abstract data type that represents a collection of entries. Each key of DistributedContext is associated with exactly one value. DistributedContext is serializable, to facilitate propagating it not only inside the process but also across process boundaries.

DistributedContext is used to annotate telemetry with name:value pairs, each called an Entry. Those values can be used to add dimensions to metrics or additional context properties to logs and traces.

DistributedContext is a recommended name but languages can have more language-specific names like dctx.


An Entry is used to label anything that is associated with a specific operation, such as an HTTP request. It consists of EntryKey, EntryValue and EntryMetadata.

  • EntryKey is the name of the Entry. EntryKey along with EntryValue can be used to aggregate and group stats, annotate traces and logs, etc. EntryKey is a string that contains only printable ASCII (codes between 32 and 126 inclusive) and with a length greater than zero and less than 256.
  • EntryValue is a string that contains only printable ASCII (codes between 32 and 126).
  • EntryMetadata contains properties associated with an Entry. For now only the property EntryTTL is defined.
  • EntryTTL is an integer that represents the number of hops an entry can propagate. Each time a sender serializes an entry and sends it over the wire, and a receiver deserializes it, the entry is considered to have travelled one hop.
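
The constraints on EntryKey and EntryValue, and the hop counting of EntryTTL, can be sketched as below. The names are hypothetical. The text does not spell out the exact TTL semantics, so this sketch assumes the TTL is a non-negative count of remaining hops, decremented on each hop, and that an entry with no hops left is simply not propagated.

```python
from dataclasses import dataclass
from typing import Dict


def _is_printable_ascii(text):
    """True if every character has a code between 32 and 126 inclusive."""
    return all(32 <= ord(ch) <= 126 for ch in text)


@dataclass(frozen=True)
class Entry:
    key: str
    value: str
    ttl: int  # remaining hops (assumed semantics, see the lead-in)

    def __post_init__(self):
        if not (0 < len(self.key) < 256) or not _is_printable_ascii(self.key):
            raise ValueError("EntryKey must be 1-255 printable ASCII characters")
        if not _is_printable_ascii(self.value):
            raise ValueError("EntryValue must be printable ASCII")
        if self.ttl < 0:
            raise ValueError("EntryTTL must not be negative in this sketch")


def propagate_one_hop(context: Dict[str, Entry]) -> Dict[str, Entry]:
    """Return the entries that survive one serialize/send/deserialize hop.

    Entries with no hops left are dropped; the rest lose one hop.
    """
    return {
        key: Entry(entry.key, entry.value, entry.ttl - 1)
        for key, entry in context.items()
        if entry.ttl > 0
    }
```

Keying the context by EntryKey reflects the rule that each key of a DistributedContext is associated with exactly one value.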


Resource captures information about the entity for which telemetry is recorded. For example, metrics exposed by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod, and container name.

Resource may capture an entire hierarchy of entity identification. It may describe the host in the cloud and a specific container or application running in the process.

Note that some of the process identification information can be associated with telemetry automatically by the OpenTelemetry SDK or a specific exporter. See OpenTelemetry proto for an example.

TODO: Better describe the difference between the resource and a Node open-telemetry/opentelemetry-proto#17


The OpenTelemetry service is a set of components that can collect traces, metrics and eventually other telemetry data (e.g. logs) from processes instrumented by OpenTelemetry or other monitoring/tracing libraries (Jaeger, Prometheus, etc.), perform aggregation and smart sampling, and export traces and metrics to one or more monitoring/tracing backends. The service will also allow collected telemetry to be enriched and transformed (e.g. by adding attributes or scrubbing personal information).

The OpenTelemetry service has two primary modes of operation: Agent (a locally running daemon) and Collector (a standalone running service).

Read more at OpenTelemetry Service Long-term Vision.

Instrumentation adapters

The inspiration of the project is to make every library and application manageable out of the box by instrumenting it with OpenTelemetry. However, on the way to this goal there will be a need to enable instrumentation by plugging instrumentation adapters into the library of choice. These adapters can wrap library APIs, subscribe to library-specific callbacks, or translate telemetry exposed in other formats into the OpenTelemetry model.

Instrumentation adapters may be called by different names. They are often referred to as plugins, collectors or auto-collectors, telemetry modules, bridges, etc. It is always recommended to follow the library and language standards. For instance, if an instrumentation adapter is implemented as a "log appender", it will probably be called an appender, not an instrumentation adapter. However, if there is no established name, the recommendation is to call packages "Instrumentation Adapter" or simply "Adapter".
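
To make the "wrapping library APIs" style of adapter more concrete, here is a minimal Python sketch. It does not use any real OpenTelemetry API: `RECORDED` stands in for an exporter, and `traced` and `instrument` are hypothetical helper names.

```python
import functools
import time

# Stand-in for a real exporter: finished "spans" are appended here.
RECORDED = []


def traced(operation_name):
    """Decorator that records a simple span-like dict around each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                RECORDED.append({
                    "name": operation_name,
                    "start": start,
                    "end": time.time(),
                    "error": error,
                })
        return wrapper
    return decorator


def instrument(library, attribute, operation_name):
    """Replace `library.<attribute>` with a traced wrapper, in place.

    This is the adapter step: the library itself is left unmodified and
    unaware of the instrumentation.
    """
    original = getattr(library, attribute)
    setattr(library, attribute, traced(operation_name)(original))
```

Calling `instrument(some_library, "fetch", "some_library.fetch")` once at startup would cause every later call to `some_library.fetch` to be recorded, without changes to the library or to its callers.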

Code injecting adapters

TODO: fill out as a result of SIG discussion.