Experimental support for Apollo tracing over OTLP #4982

timbotnik · 2024-04-18T13:36:36Z

https://apollographql.atlassian.net/browse/ROUTER-348

As the ecosystem around OpenTelemetry (OTel) has been expanding rapidly, we are evaluating a migration of Apollo's internal tracing system to use an OTel-based protocol.

In the short term, the benefits include:

A comprehensive way to visualize the Router execution path in Studio.
Additional spans that were previously not included in Studio traces, such as query parsing, planning, execution, and more.
Additional metadata such as subgraph fetch details, Router idle / busy timing, and more.

Long-term, we see this as a strategic enhancement to consolidate these two disparate tracing systems.
This will pave the way for future enhancements to plug into the Studio trace visualizer more easily.

Configuration

This change adds a new configuration option, experimental_otlp_tracing_sampler, which can be used to send a percentage of traces via OTLP instead of the native Apollo Usage Reporting protocol. Supported values:

always_off (default): send all traces via the Apollo Usage Reporting protocol.
always_on: send all traces via OTLP.
0.0 - 1.0: the ratio of traces to send via OTLP (0.5 sends half via OTLP and half via Apollo Usage Reporting).

Note that this sampler is applied only after the common tracing sampler, so it acts on traces that have already been sampled. For example, to sample 1% of traces and send all of those sampled traces via OTLP:

telemetry:
  apollo:
    # Send all traces via OTLP
    experimental_otlp_tracing_sampler: always_on

  exporters:
    tracing:
      common:
        # Sample traces at 1% of all traffic
        sampler: 0.01
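
Because the OTLP sampler runs after the common sampler, the share of total traffic that ends up on each protocol is roughly the product of the two rates. The following is a minimal arithmetic sketch of that interaction, not the Router's implementation. The names OtlpSampler, otlp_share and effective_rates are hypothetical, and the sketch assumes the two sampling decisions are independent.

```rust
/// Hypothetical mirror of the `experimental_otlp_tracing_sampler` values.
/// This type is an illustration only and is not taken from the Router code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OtlpSampler {
    AlwaysOff,
    AlwaysOn,
    Ratio(f64),
}

impl OtlpSampler {
    /// Fraction of already-sampled traces that would be routed to OTLP.
    fn otlp_share(self) -> f64 {
        match self {
            OtlpSampler::AlwaysOff => 0.0,
            OtlpSampler::AlwaysOn => 1.0,
            OtlpSampler::Ratio(r) => r.clamp(0.0, 1.0),
        }
    }
}

/// Returns the expected fraction of all requests traced via OTLP and the
/// expected fraction traced via Apollo Usage Reporting, in that order.
/// Assumes the common sampler and the OTLP sampler decide independently.
fn effective_rates(common_sampler: f64, otlp: OtlpSampler) -> (f64, f64) {
    let sampled = common_sampler.clamp(0.0, 1.0);
    let via_otlp = sampled * otlp.otlp_share();
    (via_otlp, sampled - via_otlp)
}
```

Under these assumptions, the configuration above (common sampler 0.01, always_on) puts about 1% of all requests on OTLP and none on Apollo Usage Reporting, while a value of 0.5 would split that 1% evenly between the two.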


router-perf · 2024-04-18T13:46:10Z


garypen

It looks generally good, but I'm concerned about the lock around the export future. Maybe worth taking another look at that or at least convincing me that it works as you'd expect.
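
One common way to address this kind of concern is to hold the lock only long enough to take the buffered data, and to run the slow export step after the guard is dropped. The following is a synchronous, std-only sketch of that pattern under stated assumptions. It is not the code in this PR, which deals with an async export future. BufferingExporter, record, export and send are hypothetical names.

```rust
use std::sync::Mutex;

/// Hypothetical exporter that buffers span names behind a mutex.
struct BufferingExporter {
    pending: Mutex<Vec<String>>,
}

impl BufferingExporter {
    fn new() -> Self {
        Self { pending: Mutex::new(Vec::new()) }
    }

    /// Adds one span name to the pending batch.
    fn record(&self, span: &str) {
        self.pending.lock().unwrap().push(span.to_string());
    }

    /// Drains the buffer and "exports" it, returning how many spans were sent.
    fn export(&self) -> usize {
        // Explicit scope: the guard is dropped at the closing brace, so the
        // lock is released before the (potentially slow) send step runs.
        let batch = {
            let mut guard = self.pending.lock().unwrap();
            std::mem::take(&mut *guard)
        };
        send(&batch)
    }
}

/// Stand-in for a network export; it only counts the spans it was given.
fn send(batch: &[String]) -> usize {
    batch.len()
}
```

Keeping the guard in its own block makes the lock and unlock points visible at a glance, which is the property the review asks to be convinced of.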


BrynCooke · 2024-06-03T14:25:57Z

apollo-router/src/plugins/telemetry/apollo_otlp_exporter.rs

+            resource_template: Resource::new([
+                KeyValue::new(
+                    "apollo.router.id",
+                    ROUTER_ID.get_or_init(Uuid::new_v4).to_string(),


This is initialised in a couple of places now. Let's have a fn router_id() that contains ROUTER_ID.get_or_init(||Uuid::new_v4().to_string())
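
A shared initializer along the lines suggested here could look like the sketch below. It uses std::sync::OnceLock so that it needs only the standard library. generate_id is a hypothetical stand-in for Uuid::new_v4().to_string(), and the real ROUTER_ID in the Router may be declared differently.

```rust
use std::sync::OnceLock;
use std::time::{SystemTime, UNIX_EPOCH};

/// Process-wide identifier, initialised at most once.
static ROUTER_ID: OnceLock<String> = OnceLock::new();

/// Hypothetical stand-in for `Uuid::new_v4().to_string()`, used so the
/// sketch has no external dependencies. It is not a real UUID.
fn generate_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    format!("{:x}-{:x}", std::process::id(), nanos)
}

/// Single entry point for the router id: every caller gets the same value,
/// and the initialisation logic lives in exactly one place.
fn router_id() -> &'static str {
    ROUTER_ID.get_or_init(generate_id).as_str()
}
```

Call sites would then use router_id() instead of repeating the get_or_init expression, so the id generation cannot drift between them.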

BrynCooke · 2024-06-03T14:30:26Z

apollo-router/tests/apollo_otel_traces.rs

@@ -0,0 +1,519 @@
+//! Be aware that this test file contains some fairly flaky tests which embed a number of


If the integration test runner was used rather than the test harness then all the tests could be isolated.

timbotnik (reply): Actually I haven't seen any flakiness TBH - I'll remove this line.


During testing, we found that some spans were missing at the top level. This means that our trace viewer cannot connect the full trace tree, so the trace is rendered as “invalid”. I’ll raise a discussion about whether users can inject additional spans between the request and the supergraph / execution spans, which would end up breaking our tree.


timbotnik changed the title ~~Timbotnik/apollo otlp/initial support~~ WIP: initial support for Apollo over OTLP Apr 18, 2024

timbotnik changed the title ~~WIP: initial support for Apollo over OTLP~~ WIP: experiment to support for Apollo over OTLP Apr 18, 2024

timbotnik changed the title ~~WIP: experiment to support for Apollo over OTLP~~ WIP: experiment to support Apollo tracing over OTLP Apr 18, 2024

timbotnik requested a review from bnjjj April 18, 2024 13:38

timbotnik commented Apr 18, 2024

apollo-router/src/plugins/telemetry/apollo_otlp_exporter.rs (outdated)

timbotnik commented Apr 18, 2024

apollo-router/src/plugins/telemetry/tracing/apollo_telemetry.rs (outdated)

bnjjj reviewed Apr 18, 2024

timbotnik added 3 commits April 19, 2024 01:32

Add a new configuration for setting the tracing protocol for Apollo Studio

5cf5750

Modify some exports that we’ll need

d1024dc

Add the OTLP path

360f8fb

Note that this is very much unvetted and doesn’t quite compile yet.

timbotnik force-pushed the timbotnik/apollo-otlp/initial-support branch from ea8c4a2 to b0755b2 Compare April 18, 2024 15:32

Fix up some loose ends, review notes

376f480

timbotnik force-pushed the timbotnik/apollo-otlp/initial-support branch from b0755b2 to 376f480 Compare April 18, 2024 15:41

bnjjj and others added 6 commits April 19, 2024 10:21

fixes

f4973e4

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

Turn the ApollOtlpExporter into a SpanExporter, try direct ref instead of Arc

2d5c59b

Use interior mutability with Arcs to improve lifetimes

738e2b3

Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/initial-support

0728090

Add shutdown handling

e35b241

Code cleanup

2004a6e

bonnici reviewed Apr 24, 2024

apollo-router/src/plugins/telemetry/apollo.rs (outdated)

timbotnik added 8 commits April 24, 2024 23:13

Fix an issue where we’d be stealing the spans away from the Apollo exporter

0730668

Which with my current skills means we’ll have to clone all of the spans. Possibly there is a way to use shared span pointers here instead but I’ll leave that for an optimization pass.

Prepare SpanData’s during collection, prevents making additional copies of LightSpanData’s

2ce4d1a

Introduce “OTLP” only option, refactor to support peek vs. pop on the cache

9a5fa87

Run cargo fmt

51c43bf

Run xtask lint --fmt

e7ac75d

Manual lint fixes

6b08eb8

Clippy is sometimes wrong

c418e2a

Stop filtering spans for now, can revisit this later.

02246e8

bnjjj requested review from Geal, BrynCooke, o0Ignition0o and garypen June 3, 2024 07:56

Merge branch 'dev' into timbotnik/apollo-otlp/initial-support

71a03f6

bnjjj requested review from a team as code owners June 3, 2024 08:30

garypen reviewed Jun 3, 2024

docs/source/configuration/telemetry/instrumentation/standard-instruments.mdx (outdated)

apollo-router/src/plugins/telemetry/apollo_otlp_exporter.rs

apollo-router/tests/apollo_otel_traces.rs

BrynCooke reviewed Jun 3, 2024

timbotnik added 6 commits June 4, 2024 10:35

Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/initial-support

a0f33f3

Review notes: make mutex lock / unlock more explicit

c99082b

Review notes: set the "apollo_private.operation.subtype" during span creation

78ea80d

Review notes: refactor ROUTER_ID usage to shared initializer function

7d4a2ae

Review notes: clean up integration test comments

4ad4315

Lint fix

27b4c81

timbotnik requested review from garypen, BrynCooke and bnjjj June 4, 2024 05:27

garypen approved these changes Jun 4, 2024

bnjjj approved these changes Jun 4, 2024

timbotnik added 3 commits June 6, 2024 13:29

Parochial router tests: invert “send_trace” logic to only send traces once we detect a signature

f7ef42e

Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/initial-support

a1ff1a1

BrynCooke approved these changes Jun 6, 2024

timbotnik added 2 commits June 7, 2024 11:15

Parochial router tests: drop span filtering altogether. #nofilter

0dba236

Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/initial-support

e182e18

bnjjj approved these changes Jun 10, 2024

timbotnik merged commit cee539e into dev Jun 10, 2024
19 checks passed

timbotnik deleted the timbotnik/apollo-otlp/initial-support branch June 10, 2024 08:37

lrlna mentioned this pull request Jun 18, 2024

prep release: v1.49.0 #5473

Merged
