feat(llmobs): llmobs context provider #10138

Yun-Kim · 2024-08-08T15:48:17Z

tl;dr: This PR adds an additional context provider to the LLMObs service to keep track of the active LLMObs-type span.

This PR also adds a convenience LLMObs.current_span() method to get the current LLMObs type span, and uses this in our existing public methods that involve checking for the current span if not provided by the user as an argument (ex: LLMObs.export_span(span=None, ...), LLMObs.annotate(span=None, ...).

Workflow

When an LLMObs span gets started, (i.e. LLMObs._start_span(...), BaseLLMIntegration.trace(submit_to_llmobs=True, ...), BedrockIntegration.set_llmobs_tags(...)), we look at the current active LLMObs span, use that to set the parent_id field of the started span, and then activate the started span as the new current active LLMObs span.

The LLMObsContextProvider updates the current active LLMObs span by iterating through the span's ancestor tree and stopping at the closest LLMObs-type span which is not finished.

For distributed tracing scenarios, we inject the request headers with an LLMObs parent ID. We then create a Context object using the extracted LLMObs parent ID header and activate that in the new process.

Context

Spans submitted to LLM Observability are technically APM spans (but with enhanced metadata/metrics and in a different span structure) marked with an llm span_type. This means APM span/parent/trace IDs are all reused in LLMObs spans.

Problem: if APM-specific spans (via integration/manual) are intermixed with LLMObs spans in a trace, this causes problems with traces submitted to LLM Observability, where spans may have parent_id values referencing spans that do not exist in LLM Observability storage, because those APM-specific spans are not ingested to LLM Observability. (example trace structure below)

Span 1 (LLMObs) --> Span 2 (APM) --> Span 3 (LLMObs)
# Span 3's parent_id is equal to span 2's span_id, but span 2 does not exist in LLM Obs ==> broken LLM Obs trace

Initial Solution: Use Span Context

In an effort to reuse as much tracer-internals as possible, internal LLMObs trace processing involved traversing a span's parent tree to find the nearest LLMObs type ancestor span, and then using that span's span_id as the parent_id. For distributed cases, the LLMObs parent ID was injected into request headers, and then extracted into the underlying span Context object's context._meta dictionary field.
This works fine for the most part, except for multithreading applications as the Context._meta field is a shallow copy and thus shared between threads, making it susceptible to race conditions where the propagated LLMObs parent ID would get overwritten by other threads starting.

Current Approach: Manage our own LLMObs contexts

This approach is similar to the first, but instead of attempting to use the Context._meta field in a use case it was not designed for, we just create our own context provider for tracking active LLMObs span/contexts. Since we're relying on ContextVar which is thread-safe, this does not have the same problems as previously regarding multithreaded applications.

Since we are relying on the new LLMObsContextProvider to manage and update the active LLMObs span (which reuses the same logic of traversing the parent tree for the nearest LLMObs span), we can get rid of all of the special casing required for dealing with Context-propagated tags.

The only changes we need to make to shared library code is adding distributed injection code into HTTPPropagator.inject(), activating the propagated request headers into trace_utils.activate_distributed_headers(), and context propagation/activation in the threading patch code.

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2024-08-08T15:48:52Z

CODEOWNERS have been resolved as:

ddtrace/llmobs/_context.py                                              @DataDog/ml-observability
ddtrace/contrib/internal/futures/threading.py                           @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/trace_utils.py                                          @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_integrations/base.py                                    @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_trace_processor.py                                      @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
ddtrace/propagation/http.py                                             @DataDog/apm-sdk-api-python

pr-commenter · 2024-08-08T16:21:27Z

Benchmarks

Benchmark execution time: 2024-08-09 20:55:39

Comparing candidate commit 79303da in PR branch yunkim/llmobs-context-management with baseline commit 6f46cc7 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 214 metrics, 2 unstable metrics.

Yun-Kim · 2024-08-08T21:23:47Z

ddtrace/llmobs/_constants.py

@@ -3,8 +3,7 @@
 METADATA = "_ml_obs.meta.metadata"
 METRICS = "_ml_obs.metrics"
 ML_APP = "_ml_obs.meta.ml_app"
-PROPAGATED_PARENT_ID_KEY = "_dd.p.llmobs_parent_id"
-PARENT_ID_KEY = "_ml_obs.llmobs_parent_id"
+PARENT_ID_KEY = "_dd.p.llmobs_parent_id"


Unsure if we need to rename this to a different key name. This format _dd.p.* was required because we were injecting the key-value into the context._meta, but if we are directly injecting/extracting from the request headers this might not be needed.
Note 2: are there any requirements for key-naming for injecting into request headers?

ddtrace/contrib/internal/futures/threading.py

ddtrace/llmobs/_llmobs.py

Yun-Kim · 2024-09-23T20:52:43Z

Closing this due to annoying merge conflicts from upstream changes. Replacing this with #10767.

WIP: llmobs context provider

86dc15e

Yun-Kim force-pushed the yunkim/llmobs-context-management branch from 3af4255 to 86dc15e Compare August 8, 2024 18:27

Add current_span() method to LLMObs

58cfd1e

Yun-Kim commented Aug 8, 2024

View reviewed changes

Yun-Kim added 3 commits August 8, 2024 17:30

fmt

6af65b5

fmt

1201499

Merge branch 'main' into yunkim/llmobs-context-management

79303da

lievan reviewed Aug 9, 2024

View reviewed changes

ddtrace/contrib/internal/futures/threading.py Show resolved Hide resolved

ddtrace/llmobs/_llmobs.py Show resolved Hide resolved

Yun-Kim closed this Sep 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(llmobs): llmobs context provider #10138

feat(llmobs): llmobs context provider #10138

Yun-Kim commented Aug 8, 2024 •

edited

Loading

github-actions bot commented Aug 8, 2024 •

edited

Loading

pr-commenter bot commented Aug 8, 2024 •

edited

Loading

Yun-Kim Aug 8, 2024 •

edited

Loading

Yun-Kim commented Sep 23, 2024

feat(llmobs): llmobs context provider #10138

feat(llmobs): llmobs context provider #10138

Conversation

Yun-Kim commented Aug 8, 2024 • edited Loading

Workflow

Context

Initial Solution: Use Span Context

Current Approach: Manage our own LLMObs contexts

Checklist

Reviewer Checklist

github-actions bot commented Aug 8, 2024 • edited Loading

pr-commenter bot commented Aug 8, 2024 • edited Loading

Benchmarks

Yun-Kim Aug 8, 2024 • edited Loading

Choose a reason for hiding this comment

Yun-Kim commented Sep 23, 2024

Yun-Kim commented Aug 8, 2024 •

edited

Loading

github-actions bot commented Aug 8, 2024 •

edited

Loading

pr-commenter bot commented Aug 8, 2024 •

edited

Loading

Yun-Kim Aug 8, 2024 •

edited

Loading