From e54a60bd820c7a97a14f371be84345c9c92466e3 Mon Sep 17 00:00:00 2001
From: patrickhuie19
Date: Thu, 4 Jan 2024 12:13:29 -0500
Subject: [PATCH 1/3] Adding guide for product teams implementing tracing

---
 .github/tracing/README.md | 47 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/.github/tracing/README.md b/.github/tracing/README.md
index feba31feb65..173dddac72b 100644
--- a/.github/tracing/README.md
+++ b/.github/tracing/README.md
@@ -48,4 +48,49 @@ This folder contains the following config files:
 
 These config files are for an OTEL collector, grafana Tempo, and a grafana UI instance to run as containers on the same network.
 `otel-collector-dev.yaml` is the configuration for dev (i.e. your local machine) environments, and forwards traces from the otel collector to the grafana tempo instance on the same network.
-`otel-collector-ci.yaml` is the configuration for the CI runs, and exports the trace data to the artifact from the github run.
\ No newline at end of file
+`otel-collector-ci.yaml` is the configuration for the CI runs, and exports the trace data to the artifact from the github run.
+
+## Instrumenting Custom Traces
+
+Adding custom traces requires identifying an observability gap in a related group of code executions. This is intuitive for the developer:
+
+- "What's the flow of component interaction in this distributed system?"
+- "What's a ballpark latency of funcOne?"
+- "What's the behavior of the JobProcessorOne component when jobs with [x, y, z] attributes are processed?"
+
+Once an execution flow is desired to be traced, the developer will decide which parts of the execution flow should be measured separately. Each logically separate measure will be a span. Spans can have one parent span and multiple children span. The relations between parent and child spans will make a directed acyclic graph.
+
+The most trivial application of a span is measuring latency performance at high levels of granularity (individual func executions). There is much more you can do, including creating human readable and timestamped events within a span (useful for monitoring concurrent access to resources), recording errors, linking parent and children spans through large parts of an application, and even extending a span beyond a single process.
+
+Spans are created by tracers and passed through go applications by Contexts. A tracer must be initialized first. Both core and plugin developers will initialize a tracer from the globally registered trace provider:
+
+```
+tracer := otel.GetTracerProvider().Tracer("example.com/foo")
+```
+
+The globally registered tracer provider is available for plugins after they are initialized, and available in core after configuration is processed (`initGlobals`).
+
+Add spans by:
+```
+    func interestingFunc() {
+        // Assuming there is an appropriate parentContext
+        ctx, span := tracer.Start(parentContext, "hello-span")
+        defer span.End()
+
+        // do some work to track with hello-span
+    }
+```
+As implied by the example, span is a child span of its parent span captured by parentContext.
+
+
+Note that in certain situations, there are 3rd party libraries that will setup spans. For instance:
+
+```
+import (
+	"github.com/gin-gonic/gin"
+	"go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin"
+)
+
+router := gin.Default()
+router.Use(otelgin.Middleware("service-name"))
+```
\ No newline at end of file

From 7e7e2f04e1eeaa5eacfc1a59eda30616313535e1 Mon Sep 17 00:00:00 2001
From: patrickhuie19
Date: Thu, 4 Jan 2024 18:44:11 -0500
Subject: [PATCH 2/3] addressing comments

---
 .github/tracing/README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/.github/tracing/README.md b/.github/tracing/README.md
index 173dddac72b..f99c73ab97a 100644
--- a/.github/tracing/README.md
+++ b/.github/tracing/README.md
@@ -55,10 +55,9 @@ These config files are for an OTEL collector, grafana Tempo, and a grafana UI in
 Adding custom traces requires identifying an observability gap in a related group of code executions. This is intuitive for the developer:
 
 - "What's the flow of component interaction in this distributed system?"
-- "What's a ballpark latency of funcOne?"
 - "What's the behavior of the JobProcessorOne component when jobs with [x, y, z] attributes are processed?"
 
-Once an execution flow is desired to be traced, the developer will decide which parts of the execution flow should be measured separately. Each logically separate measure will be a span. Spans can have one parent span and multiple children span. The relations between parent and child spans will make a directed acyclic graph.
+Given an execution flow, the developer must decide which subset of the flow to measure. Each logically separate measure will be a span. Spans can have one parent span and multiple children span. The relations between parent and child spans will make a directed acyclic graph, called a trace.
 
 The most trivial application of a span is measuring latency performance at high levels of granularity (individual func executions). There is much more you can do, including creating human readable and timestamped events within a span (useful for monitoring concurrent access to resources), recording errors, linking parent and children spans through large parts of an application, and even extending a span beyond a single process.
 
@@ -93,4 +92,10 @@ import (
 
 router := gin.Default()
 router.Use(otelgin.Middleware("service-name"))
-```
\ No newline at end of file
+```
+
+Some quick checks to know you're aligning with Tracing best practices:
+- It measures critical paths
+- All paths are measured end to end (Context is wired all the way through)
+- Emphasizing broadness of measurement over depth
+- Using automatic instrumentation if possible
\ No newline at end of file

From 1617c71415962a4706761620a2ceaff942f93d42 Mon Sep 17 00:00:00 2001
From: patrickhuie19
Date: Thu, 4 Jan 2024 19:11:00 -0500
Subject: [PATCH 3/3] edits

---
 .github/tracing/README.md | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/.github/tracing/README.md b/.github/tracing/README.md
index f99c73ab97a..04f0216e25f 100644
--- a/.github/tracing/README.md
+++ b/.github/tracing/README.md
@@ -50,18 +50,19 @@ These config files are for an OTEL collector, grafana Tempo, and a grafana UI in
 `otel-collector-dev.yaml` is the configuration for dev (i.e. your local machine) environments, and forwards traces from the otel collector to the grafana tempo instance on the same network.
 `otel-collector-ci.yaml` is the configuration for the CI runs, and exports the trace data to the artifact from the github run.
 
-## Instrumenting Custom Traces
+## Adding Traces to Plugins and to core
 
-Adding custom traces requires identifying an observability gap in a related group of code executions. This is intuitive for the developer:
+Adding traces requires identifying an observability gap in a related group of code executions or a critical path in your application. This is intuitive for the developer:
 
 - "What's the flow of component interaction in this distributed system?"
 - "What's the behavior of the JobProcessorOne component when jobs with [x, y, z] attributes are processed?"
+- "Is this critical path workflow behaving the way we expect?"
 
-Given an execution flow, the developer must decide which subset of the flow to measure. Each logically separate measure will be a span. Spans can have one parent span and multiple children span. The relations between parent and child spans will make a directed acyclic graph, called a trace.
+The developer will measure a flow of execution from end to end in one trace. Each logically separate measure of this flow is called a span. Spans have either one or no parent span and can have multiple child spans. The relationships between parent and child spans in aggregate form a directed acyclic graph. The trace begins at the root of this graph.
 
-The most trivial application of a span is measuring latency performance at high levels of granularity (individual func executions). There is much more you can do, including creating human readable and timestamped events within a span (useful for monitoring concurrent access to resources), recording errors, linking parent and children spans through large parts of an application, and even extending a span beyond a single process.
+The most trivial application of a span is measuring top level performance in one critical path. There is much more you can do, including creating human readable and timestamped events within a span (useful for monitoring concurrent access to resources), recording errors, linking parent and children spans through large parts of an application, and even extending a span beyond a single process.
 
-Spans are created by tracers and passed through go applications by Contexts. A tracer must be initialized first. Both core and plugin developers will initialize a tracer from the globally registered trace provider:
+Spans are created by `tracers` and passed through Go applications by `Context`s. A tracer must be initialized first. Both core and plugin developers will initialize a tracer from the globally registered trace provider:
 
 ```
 tracer := otel.GetTracerProvider().Tracer("example.com/foo")
@@ -79,7 +80,7 @@ Add spans by:
         // do some work to track with hello-span
     }
 ```
-As implied by the example, span is a child span of its parent span captured by parentContext.
+As implied by the example, `span` is a child of its parent span captured by `parentContext`.
 
 
 Note that in certain situations, there are 3rd party libraries that will setup spans. For instance:
@@ -94,8 +95,8 @@ router := gin.Default()
 router.Use(otelgin.Middleware("service-name"))
 ```
 
-Some quick checks to know you're aligning with Tracing best practices:
-- It measures critical paths
-- All paths are measured end to end (Context is wired all the way through)
-- Emphasizing broadness of measurement over depth
-- Using automatic instrumentation if possible
\ No newline at end of file
+The developer aligns with best practices when they:
+- Start with critical paths
+- Measure paths from end to end (Context is wired all the way through)
+- Emphasize breadth of measurement over depth
+- Use automatic instrumentation if possible
\ No newline at end of file