Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLOB-1561] LLM Observability SDK API #4773

Merged
merged 36 commits into from
Oct 24, 2024

Conversation

sabrenner
Copy link
Collaborator

@sabrenner sabrenner commented Oct 11, 2024

What does this PR do?

Adds an LLM Observability SDK API onto the tracer.

Important Call-Outs

ML Observability reviewers:

  • index.d.ts contains the TypeScript type definitions which will constitute our API. I think this is most relevant to what we have been solidifying across our SDKs
  • sdk.js is the main file for the SDK, which includes handling different values passed into the API functions and verifying the required data is met

APM Reviewers:

  • packages/dd-trace/src/llmobs/index.js houses the enablement of the LLMObs module. However, since the SDK is always initialized even when LLMObs isn't enabled (it still acts in a no-op state although it is not the no-op SDK), any writers and channel subscribers happen in the module, which is only enabled when a) enable(config) is called from the tracer proxy because LLMObs was enabled during init, or b) llmobs.enable(llmobsConfig) is called after init is called. This way, no writers/periodic flushing and span processing/injection subscribers are registered if LLMObs isn't explicitly enabled.

Motivation

Part of the LLM Observability SDK release. This is the last PR for the main SDK. A follow-up PR will be for an OpenAI LLM Obs plugin.

Copy link

github-actions bot commented Oct 11, 2024

Overall package size

Self size: 7.7 MB
Deduped: 62.44 MB
No deduping: 62.72 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/native-appsec | 8.1.1 | 18.67 MB | 18.68 MB | | @datadog/native-iast-taint-tracking | 3.1.0 | 12.27 MB | 12.28 MB | | @datadog/pprof | 5.3.0 | 9.85 MB | 10.22 MB | | protobufjs | 7.2.5 | 2.77 MB | 5.16 MB | | @datadog/native-iast-rewriter | 2.5.0 | 2.51 MB | 2.59 MB | | @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB | | @datadog/native-metrics | 2.0.0 | 898.77 kB | 1.3 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB | | msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB | | semver | 7.6.3 | 95.82 kB | 95.82 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | lru-cache | 7.14.0 | 74.95 kB | 74.95 kB | | ignore | 5.3.1 | 51.46 kB | 51.46 kB | | int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB | | shell-quote | 1.8.1 | 44.96 kB | 44.96 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.3.1 | 25.21 kB | 25.21 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | path-to-regexp | 0.1.10 | 6.38 kB | 6.38 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@pr-commenter
Copy link

pr-commenter bot commented Oct 11, 2024

Benchmarks

Benchmark execution time: 2024-10-17 23:55:17

Comparing candidate commit 7c7a328 in PR branch sabrenner/llmobs-sdk-sdk with baseline commit b6452ad in branch sabrenner/llmobs-sdk-release.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 259 metrics, 7 unstable metrics.

@sergey-mamulchenko
Copy link

Hi, we're interested to try out monitoring LLMs used in our Node application. Do you have an idea when JS SDK is planned to be shipped?

index.d.ts Outdated Show resolved Hide resolved
@sabrenner sabrenner marked this pull request as ready for review October 16, 2024 17:04
@sabrenner sabrenner requested a review from a team as a code owner October 16, 2024 17:04
@sabrenner
Copy link
Collaborator Author

Hi @sergey-mamulchenko! We're looking to release this SDK by the end of the month. Feel free to follow along with #4742, as that will be the PR that lands containing the SDK 😄

Copy link

@lievan lievan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking comments, ty!

index.d.ts Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
Copy link
Contributor

@Yun-Kim Yun-Kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a small comment about exposing active() and some clarification questions, no major blocking comments from the MLObs team perspective! Great work @sabrenner 🎉

docs/test.ts Show resolved Hide resolved
docs/test.ts Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
index.d.ts Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Show resolved Hide resolved

if (result && typeof result.then === 'function') {
return result.then(value => {
if (value && kind !== 'retrieval' && !LLMObsTagger.tagMap.get(span)?.[OUTPUT_VALUE]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this still attempt to auto annotate the output for an llm-kind span? Asking because LLM span outputs are in message format so we should just avoid auto-annotating here entirely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh yeah, i actually missed that our _model_decorators do not auto annotate at all. i'll add a check here for llm, and then input annotation check for llm and embedding, and just write a couple small regression tests!

packages/dd-trace/src/llmobs/sdk.js Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
}

// extracts the argument names from a function string
function parseArgumentNames (str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is parsing the argument and function signature from the entire serialized function? 💀

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah 💀 but i memoized it with a weak reference to the function, so if the same function is invoked a million times, it'll use the parsed argument names from the first time and map over the arguments accordingly. tbh this might need some iteration, because i tried to assume user cases in writing tests but i'm sure i didn't think of everything. try/catched it so we will never crash, but will probably iterate on it accordingly from user reports.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice if JS had some built-in parsers/helpers like Python, but all pretty basic string operations here so it shouldn't be too time consuming, its just all linear

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
Copy link
Member

@Kyle-Verhoog Kyle-Verhoog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't speak too much to the llm-specific stuff since i'm still ramping up to everything. Main concern is the soft fails instead of hard. Feel free to merge and address in the bigger PR.

docs/test.ts Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
index.d.ts Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
packages/dd-trace/src/llmobs/sdk.js Outdated Show resolved Hide resolved
@sabrenner
Copy link
Collaborator Author

Will merge this PR in to the feature branch, it looks like a serverless benchmark is failing but I will resolve that in the final PR if needed

@sabrenner sabrenner merged commit 7e8e0f7 into sabrenner/llmobs-sdk-release Oct 24, 2024
200 of 202 checks passed
@sabrenner sabrenner deleted the sabrenner/llmobs-sdk-sdk branch October 24, 2024 15:34
sabrenner added a commit that referenced this pull request Oct 29, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
rochdev pushed a commit that referenced this pull request Oct 31, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
rochdev pushed a commit that referenced this pull request Oct 31, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
rochdev pushed a commit that referenced this pull request Oct 31, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
rochdev pushed a commit that referenced this pull request Nov 6, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
rochdev pushed a commit that referenced this pull request Nov 6, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants