diff --git a/website_docs/instrumentation.md b/website_docs/instrumentation.md
new file mode 100644
index 00000000..57433436
--- /dev/null
+++ b/website_docs/instrumentation.md
@@ -0,0 +1,378 @@
+---
+title: "Instrumentation"
+weight: 30
+---
+
+Instrumentation is the act of adding observability code to your
+application. This can be done with direct calls to the OpenTelemetry API within
+your code or by including a dependency that calls the API and hooks into your
+project, like a middleware for an HTTP server.
+
+# TracerProvider and Tracers
+
+In OpenTelemetry, each service being traced has at least one `TracerProvider`
+that holds configuration about the name/version of the service, what
+sampler to use, and how to process/export the spans. A `Tracer` is created by a
+`TracerProvider` and has a name and version. In OpenTelemetry for Erlang/Elixir,
+the name and version of each `Tracer` are the same as the name and version of the
+OTP Application the module using the `Tracer` is in. If the call to use a
+`Tracer` is not in a module, for example when using the interactive shell, the
+default `Tracer` is used.
+
+Each OTP Application has a `Tracer` registered for it when the `opentelemetry`
+Application boots. This can be disabled by setting the Application environment
+variable `register_loaded_applications` to `false`. If you want a more specifically
+named `Tracer`, or have disabled the automatic registration, you can register a `Tracer`
+either with a name and version or with an Application name, in which case
+`opentelemetry` will get the version from the loaded Application.
+Examples:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+opentelemetry:register_tracer(test_tracer, <<"0.1.0">>),
+opentelemetry:register_application_tracer(myapp).
+{{< /tab >}}
+
+{{< tab >}}
+OpenTelemetry.register_tracer(:test_tracer, "0.1.0")
+OpenTelemetry.register_application_tracer(:myapp)
+{{< /tab >}}
+
+{{< /tabs >}}
+
+Giving names to each `Tracer`, and in the case of Erlang/Elixir having that name
+be the name of the Application, makes it possible to disable traces from
+a particular Application. This can be useful if, for example, a dependency turns
+out to be generating too many spans, or spans that are problematic in some way,
+and you want to disable their generation.
+
+Additionally, the name and version of the `Tracer` are exported as the
+[`InstrumentationLibrary`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/glossary.md#instrumentation-library)
+component of spans. This allows users to group and search spans by the
+Application they came from.
+
+You can look up a `Tracer` by name with `get_tracer/1` and use that `Tracer`
+variable to call the tracing API through `otel_tracer` in Erlang or
+`OpenTelemetry.Tracer` in Elixir:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+Tracer = opentelemetry:get_tracer(my_app),
+SpanCtx = otel_tracer:start_span(Tracer, <<"hello-world">>, #{}),
+...
+otel_tracer:end_span(SpanCtx).
+{{< /tab >}}
+
+{{< tab >}}
+tracer = OpenTelemetry.get_tracer(:my_app)
+span_ctx = OpenTelemetry.Tracer.start_span(tracer, "hello-world", %{})
+...
+OpenTelemetry.Tracer.end_span(span_ctx)
+{{< /tab >}}
+
+{{< /tabs >}}
+
+In most cases you will not need to manually register or look up a
+`Tracer`. Simply use the macros provided, which are covered in the following
+section, and the `Tracer` for the Application the macro is used in will be used
+automatically.
+
+# Starting Spans
+
+A trace is a tree of spans, starting with a root span that has no parent.
+To represent this tree, each span after the root has a parent span associated with
+it. When a span is started the parent is set based on the `context`. A `context`
+can either be implicit, meaning your code does not have to pass a `Context`
+variable to track the active `context`, or explicit, where your code must pass
+the `Context` as an argument not only to the OpenTelemetry functions but to any
+function that needs to propagate the `context` so that spans started in those
+functions have the proper parent.
+
+For implicit context propagation across functions within a process the [process
+dictionary](http://erlang.org/doc/reference_manual/processes.html#process-dictionary)
+is used to store the context. When you start a span with the macro `with_span`,
+the context in the process dictionary is updated to make the newly started span
+the currently active span, and this span is ended when the block or
+function completes. Additionally, starting a new span within the body of
+`with_span` will use the active span as the parent of the new span, and the
+parent becomes the active span again when the child's block or function body
+completes:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+parent_function() ->
+    ?with_span(<<"parent">>, #{}, fun child_function/0).
+
+child_function() ->
+    %% this is the same process, so the span <<"parent">> set as the active
+    %% span in the with_span call above will be the active span in this function
+    ?with_span(<<"child">>, #{},
+               fun() ->
+                   %% do work here. when this function returns, <<"child">> will complete.
+                   ok
+               end).
+
+{{< /tab >}}
+
+{{< tab >}}
+require OpenTelemetry.Tracer
+
+def parent_function() do
+  OpenTelemetry.Tracer.with_span "parent" do
+    child_function()
+  end
+end
+
+def child_function() do
+  ## this is the same process, so the span "parent" set as the active
+  ## span in the with_span call above will be the active span in this function
+  OpenTelemetry.Tracer.with_span "child" do
+    ## do work here.
+    ## when this function returns, "child" will complete.
+  end
+end
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Cross Process Propagation
+
+The examples in the previous section were spans with a child-parent relationship
+within the same process where the parent is available in the process dictionary
+when creating a child span. Using the process dictionary this way isn't possible
+when crossing processes, either by spawning a new process or sending a message
+to an existing process. Instead, the context must be manually passed as a variable.
+
+### Creating Spans for New Processes
+
+To pass spans across processes we need to start a span that isn't connected to
+a particular process. This can be done with the macro `start_span`. Unlike
+`with_span`, the `start_span` macro does not set the new span as the currently
+active span in the context of the process dictionary.
+
+Connecting a span as a parent to a child in a new process can be done by attaching
+the context and setting the new span as currently active in the process. The
+whole context should be attached in order to not lose other telemetry data like
+[baggage](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md).
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+SpanCtx = ?start_span(<<"child">>),
+Ctx = otel_ctx:get_current(),
+
+proc_lib:spawn_link(fun() ->
+                        otel_ctx:attach(Ctx),
+                        ?set_current_span(SpanCtx),
+
+                        %% do work here
+
+                        ?end_span(SpanCtx)
+                    end),
+{{< /tab >}}
+
+{{< tab >}}
+span_ctx = OpenTelemetry.Tracer.start_span("child")
+ctx = OpenTelemetry.Ctx.get_current()
+
+task = Task.async(fn ->
+  OpenTelemetry.Ctx.attach(ctx)
+  OpenTelemetry.Tracer.set_current_span(span_ctx)
+  # do work here
+
+  # end span here or after `await` returns
+end)
+
+_ = Task.await(task)
+OpenTelemetry.Tracer.end_span(span_ctx)
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Linking the New Span
+
+If the work being done by the other process is better represented as a `link` --
+see [the `link` definition in the
+specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans)
+for more on when that is appropriate
+-- then the `SpanCtx` returned by `start_span` is passed to `link/1` to create
+a `link` that can be passed to `with_span` or `start_span`:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+Parent = ?current_span_ctx,
+proc_lib:spawn_link(fun() ->
+                        %% a new process has a new context so the span created
+                        %% by the following `with_span` will have no parent
+                        Link = opentelemetry:link(Parent),
+                        ?with_span(<<"other-process">>, #{links => [Link]},
+                                   fun() -> ok end)
+                    end),
+{{< /tab >}}
+
+{{< tab >}}
+parent = OpenTelemetry.current_span_ctx()
+task = Task.async(fn ->
+  # a new process has a new context so the span created
+  # by the following `with_span` will have no parent
+  link = OpenTelemetry.link(parent)
+  Tracer.with_span "my-task", %{links: [link]} do
+    :hello
+  end
+end)
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Attributes
+
+Attributes are key-value pairs that are applied as metadata to your spans and
+are useful for aggregating, filtering, and grouping traces.
+Attributes can be added at span creation, or at any other time during the life
+cycle of a span before it has completed.
+
+The key can be an atom or a UTF-8 string (a regular string in Elixir and a
+binary, `<<"..."/utf8>>`, in Erlang). The value can be of any type. If necessary
+the key and value are converted to strings when the attribute is exported in a
+span.
+
+The following example shows both ways of setting attributes on a span:
+first in the start options, and then again with `set_attributes`
+in the body of the span operation:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+?with_span(<<"my-span">>, #{attributes => [{<<"start-opts-attr">>, <<"start-opts-value">>}]},
+           fun() ->
+               ?set_attributes([{<<"my-attribute">>, <<"my-value">>},
+                                {another_attribute, <<"value-of-attribute">>}])
+           end)
+{{< /tab >}}
+
+{{< tab >}}
+Tracer.with_span "span-1", %{attributes: [{"start-opts-attr", "start-opts-value"}]} do
+  Tracer.set_attributes([{"my-attribute", "my-value"},
+                         {:another_attribute, "value-of-attribute"}])
+end
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Semantic Attributes
+
+Semantic Attributes are attributes that are defined by the OpenTelemetry
+Specification in order to provide a shared set of attribute keys across multiple
+languages, frameworks, and runtimes for common concepts like HTTP methods,
+status codes, user agents, and more. These attribute keys and values are
+available in the header `opentelemetry_api/include/otel_resource.hrl`.
+
+Tracing semantic conventions can be found [in this document](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions).
+
+## Events
+
+An event is a human-readable message on a span that represents "something
+happening" during its lifetime. For example, imagine a function that requires
+exclusive access to a resource like a database connection from a pool.
+An event could be created at two points: once when the connection is checked out
+from the pool, and again when it is checked in.
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+?with_span(<<"my-span">>, #{},
+           fun() ->
+               ?add_event(<<"checking out connection">>),
+               %% acquire connection from connection pool
+               ?add_event(<<"got connection, doing work">>),
+               %% do some work with the connection and then return it to the pool
+               ?add_event(<<"checking in connection">>)
+           end)
+{{< /tab >}}
+
+{{< tab >}}
+Tracer.with_span "my-span" do
+  Span.add_event("checking out connection")
+  ## acquire connection from connection pool
+  Span.add_event("got connection, doing work")
+  ## do some work with the connection and then return it to the pool
+  Span.add_event("checking in connection")
+end
+{{< /tab >}}
+
+{{< /tabs >}}
+
+
+A useful characteristic of events is that their timestamps are displayed as
+offsets from the beginning of the span, allowing you to easily see how much time
+elapsed between them.
+
+Events can also have attributes of their own:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+?add_event(<<"Process exited with reason">>, [{pid, Pid}, {reason, Reason}])
+{{< /tab >}}
+
+{{< tab >}}
+Span.add_event("Process exited with reason", pid: pid, reason: reason)
+{{< /tab >}}
+
+{{< /tabs >}}
+
+# Cross Service Propagators
+
+Distributed traces extend beyond a single service, meaning some context must be
+propagated across services to create the parent-child relationship between
+spans. This requires cross service [_context
+propagation_](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#context-propagation),
+a mechanism where identifiers for a trace are sent to remote processes.
+
+In order to propagate trace context over the wire, a propagator must be
+registered with OpenTelemetry.
+This can be done through configuration of the
+`opentelemetry` application:
+
+{{< tabs Erlang Elixir >}}
+
+{{< tab >}}
+%% sys.config
+...
+{text_map_propagators, [fun otel_baggage:get_text_map_propagators/0,
+                        fun otel_tracer_default:w3c_propagators/0]},
+...
+{{< /tab >}}
+
+{{< tab >}}
+# runtime.exs
+...
+text_map_propagators:
+  [&:otel_baggage.get_text_map_propagators/0,
+   &:otel_tracer_default.w3c_propagators/0],
+...
+{{< /tab >}}
+
+{{< /tabs >}}
+
+If you instead need to use the [B3
+specification](https://github.com/openzipkin/b3-propagation), originally from
+the [Zipkin project](https://zipkin.io/), then replace
+`fun otel_tracer_default:w3c_propagators/0` and
+`&:otel_tracer_default.w3c_propagators/0` with
+`fun otel_tracer_default:b3_propagators/0` and
+`&:otel_tracer_default.b3_propagators/0` for Erlang and Elixir respectively.
+
+# Library Instrumentation
+
+Library instrumentation, broadly speaking, refers to instrumentation code that
+you didn't write but instead include through another library. OpenTelemetry for
+Erlang/Elixir supports this process through wrappers and helper functions around
+many popular frameworks and libraries. You can find them in the
+[opentelemetry-erlang-contrib
+repo](https://github.com/open-telemetry/opentelemetry-erlang-contrib/) and the [registry](/registry).
+
+# Creating Metrics
+
+The metrics API, found in `apps/opentelemetry-experimental-api` of the
+`opentelemetry-erlang` repository, is currently unstable; documentation is TBA.