
fix(langchain): improve callbacks #1317

Closed

Conversation


@tibor-reiss tibor-reiss commented Jun 16, 2024

Follow-up to #1170

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

#1170 hooks into __init__, which does not seem too robust. Browsing through the langchain code, it seems that the API prefers functions like invoke, stream, etc.

@tibor-reiss tibor-reiss marked this pull request as draft June 16, 2024 19:33
@tibor-reiss tibor-reiss changed the title WIP - feat(langchain): improve callbacks feat(langchain): improve callbacks Jun 16, 2024
@tibor-reiss
Contributor Author

tibor-reiss commented Jun 16, 2024

Hi @nirga, I had the feeling that the previous PR was not the best approach, given how difficult it was to hook into/extend. Given the difficulty with RunnableSequence - namely that it's not possible to add callbacks in __init__ - I looked for an alternative: it seems to me that there are some functions which should be used for running chains, namely invoke and stream (and probably batch). This is underlined by the deprecation warnings on __call__ and run.

The difficulty is to identify the correct span in the callback. For this I used a tag which is nicely propagated in the on_* callbacks. Note: metadata does not show up in e.g. on_chain_end.

This is a first attempt at utilizing the aforementioned functions (currently only invoke). Please have a look and let me know what you think. 10/11 tests still pass without even touching them, which is not that bad :) The real test will be to see if RunnableSequence also behaves nicely - tbc
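Roughly, the idea looks like this (a simplified sketch, not the actual PR code - the handler name, tag format, and config handling are illustrative):

from uuid import uuid4

from langchain_core.callbacks import BaseCallbackHandler
from wrapt import wrap_function_wrapper


class TagSpanCallbackHandler(BaseCallbackHandler):
    def __init__(self, tag: str):
        self.tag = tag  # used to match the on_* callbacks back to this instance

    def on_chain_start(self, serialized, inputs, *, tags=None, **kwargs):
        if self.tag in (tags or []):
            ...  # start a span for the wrapped instance here


def invoke_wrapper(wrapped, instance, args, kwargs):
    # Simplified: a config passed positionally is not handled here.
    tag = f"otel-{uuid4()}"
    config = dict(kwargs.get("config") or {})
    config.setdefault("tags", []).append(tag)
    config.setdefault("callbacks", []).append(TagSpanCallbackHandler(tag))
    kwargs["config"] = config
    return wrapped(*args, **kwargs)


wrap_function_wrapper(
    "langchain_core.runnables.base", "RunnableSequence.invoke", invoke_wrapper
)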

@nirga
Member

nirga commented Jun 16, 2024

@tibor-reiss that's an elegant solution - except for the usage of the tag. If we must use it - so be it. But we're basically adding a tag (which is something the user can also add and use) without the user's consent, which seems a bit meh. I wonder if we could maybe create a new callback object for each callback and then have that object hold all the span details.

@tibor-reiss
Contributor Author

tibor-reiss commented Jun 17, 2024

@nirga Yes, I am with you on the additional tag. However, I could not figure out (yet) how the individual spans can be distinguished. To sum up, here are the issues I have encountered so far:

  • current implementation with __init__: attributes - e.g. callbacks - are not available
  • pydantic disallows adding additional attributes
  • in some cases the chain members inherit callbacks and/or the callback manager; this can lead to multiple calls to on_chain_start, and also to multiple calls to span.end(), where the parent span would be ended before the child span
  • the serialized argument of the on_*_start methods lacks a unique identifier (see also the json serializer: it produces an id attribute, but this also lacks a unique identifier)
  • the on_*_end methods receive only the kwargs argument

I will keep looking for a solution which can match the objects to the appropriate spans...

@nirga
Member

nirga commented Jun 17, 2024

@tibor-reiss what about the run_id? That seems to be something that is sent on most callbacks

@tibor-reiss
Contributor Author

@tibor-reiss what about the run_id? That seems to be something that is sent on most callbacks

@nirga It is indeed sent; however, there are two issues:

  • keeping track of the order of the callbacks (maybe via parent_run_id, but it is not guaranteed to be filled and can be None)
  • at the time the callback class is instantiated there is no information about the callbacks, and the callbacks don't send the object they belong to.

Even though the tag solution is ugly in the sense that there is an extra tag, the logic itself becomes clean and simple. With run_id/parent_run_id, on the other hand, there seem to be many assumptions to make, which also means more complex logic.
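For comparison, the run_id-based bookkeeping would look roughly like this (assumed names, not code from this PR):

from langchain_core.callbacks import BaseCallbackHandler
from opentelemetry.trace import set_span_in_context


class RunIdSpanHandler(BaseCallbackHandler):
    def __init__(self, tracer):
        self.tracer = tracer
        self.spans = {}  # run_id -> span: the global state mentioned above

    def on_chain_start(self, serialized, inputs, *, run_id, parent_run_id=None, **kwargs):
        context = None
        if parent_run_id is not None and parent_run_id in self.spans:
            context = set_span_in_context(self.spans[parent_run_id])
        # if parent_run_id is None, the nesting information is simply lost
        self.spans[run_id] = self.tracer.start_span("chain", context=context)

    def on_chain_end(self, outputs, *, run_id, **kwargs):
        if run_id in self.spans:
            self.spans.pop(run_id).end()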

@nirga
Member

nirga commented Jun 18, 2024

@tibor-reiss hmm not sure I completely understood the issue with the run ID, wanna chat over slack / zoom?

@tibor-reiss tibor-reiss force-pushed the langchain-callbacks-revisited branch 2 times, most recently from 4d07483 to 4fd020a Compare June 19, 2024 19:05
@tibor-reiss
Contributor Author

@nirga Based on our conversation, here is a first try. The logic in callback_wrapper() still feels clunky, together with class_name and kind...

Re: the python3.12 error: it seems related to a pydantic release: pydantic/pydantic#9637

@nirga
Member

nirga commented Jun 19, 2024

Yeah, I get what you're saying @tibor-reiss, but I think it's reasonable (and I can't think of a better solution for now). Re: the failing test - yeah, I've already fixed it in main

if not any(isinstance(c, SyncSpanCallbackHandler) for c in kwargs["callbacks"]):
    # Avoid adding the same callback twice, e.g. SequentialChain is also a Chain
    kwargs["callbacks"].append(cb)
kind = to_wrap.get("kind")
Member


One thing I would do here is extract all this logic (except for the actual call to wrapped) into a separate method, so you can decorate it with @dont_throw (and the same for the callback methods below)
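Something like this (a sketch, assuming the repo's shared @dont_throw decorator; the handler construction is simplified):

@dont_throw
def _add_callback(kwargs, cb):
    if not any(isinstance(c, SyncSpanCallbackHandler) for c in kwargs["callbacks"]):
        # Avoid adding the same callback twice, e.g. SequentialChain is also a Chain
        kwargs["callbacks"].append(cb)


def callback_wrapper(tracer, to_wrap, wrapped, instance, args, kwargs):
    cb = SyncSpanCallbackHandler(tracer)  # construction simplified for the sketch
    _add_callback(kwargs, cb)  # a failure here can no longer break the user's chain
    return wrapped(*args, **kwargs)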

@tibor-reiss tibor-reiss force-pushed the langchain-callbacks-revisited branch from 4fd020a to b22b0a1 Compare June 22, 2024 20:16
@tibor-reiss tibor-reiss force-pushed the langchain-callbacks-revisited branch from b22b0a1 to ec21773 Compare June 22, 2024 20:26
@tibor-reiss
Contributor Author

@nirga Notes on the current state:

The Achilles' heel is the logic in add_instance: as I mentioned for the first implementation with a tag, any other solution - including the current one - does not guarantee uniqueness. As far as I can see, the serializer does not give back a unique object, so when it comes to calling the callbacks, there is no longer a chance to uniquely identify the instance. I might reach out to the langchain devs to get some more insight...

I would recommend removing _set_chat_request, because all the relevant information is available in the other attributes. I provided it to be more in line with the past implementation, e.g. custom_chat_wrapper/_handle_request.

I don't know why, but there is no on_chat_model_end.

I'll have a look at the failing task asap...

@nirga
Member

nirga commented Jun 22, 2024

@tibor-reiss thanks for the update! Why do you need to uniquely identify the object? Can't we just generate a new callback object each time?

@tibor-reiss
Contributor Author

@nirga Not sure if I understand this correctly: "Can't we just generate a new callback object each time?"

Creating a new instance of the SyncSpanCallbackHandler at every invoke:

  • The callback manager would always have to be modified to contain just one instance. This is inefficient and might cause issues with closing spans, especially in cases where there is no on_*_end method.
  • How would the spans be connected (without a global state which stores run_ids and parent_run_ids)? E.g. using just the code below completely ignores the parent_run_id:
current_context = set_span_in_context(self.span)
context_api.attach(current_context)

A good example is test_simple_lcel.

One instance of the SyncSpanCallbackHandler (as is now):

  • If the chain consists of multiple objects of the same class, how are they identified? E.g. multiple prompts which only differ in their names - atm add_instance would overwrite with the last one. Even if one stored the instance id, the on_* callback functions don't have the necessary information to identify the correct instance.

TLDR: the current implementation works in many cases so far, but it's good to keep its limitations in mind.

@tibor-reiss tibor-reiss marked this pull request as ready for review June 23, 2024 09:07
@nirga nirga changed the title feat(langchain): improve callbacks fןס(langchain): improve callbacks Jun 23, 2024
@nirga nirga changed the title fןס(langchain): improve callbacks fix(langchain): improve callbacks Jun 23, 2024
Member

@nirga nirga left a comment


Got it! Thanks @tibor-reiss 🙏 . Left a small comment. Did you manage to create a fake example that "fails" this current solution?

"ChatCohere.langchain.task",
"LLMChain.langchain.task",
"StuffDocumentsChain.langchain.task",
"LLMChain.langchain.workflow",
Member


Ideally, this should be a task inside the StuffDocumentsChain workflow

Contributor Author


Both LLMChain and StuffDocumentsChain inherit from Chain (as does SequentialChain - see test_sequential_chain), whose kind is defined as WORKFLOW in WRAPPED_METHODS.

One possibility would be to get rid of the "kind" here and add some logic inside callback_wrapper, e.g. if there is a parent_run_id, it's a TASK (see the sketch at the end of this comment).

Another possibility would be to list the classes and assign them their kind explicitly.

I went with the current solution because it was simple and had the least coupling to the langchain implementation. Please let me know which you prefer!
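Sketch of the parent_run_id option mentioned above (the import path is an assumption):

from opentelemetry.semconv_ai import TraceloopSpanKindValues  # assumed import


def _get_kind(parent_run_id):
    # The root run (no parent) is the workflow; every nested run is a task.
    if parent_run_id is None:
        return TraceloopSpanKindValues.WORKFLOW
    return TraceloopSpanKindValues.TASK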

Member


I think the solution you suggested, where you check for parent_run_id sounds good and better. The expectation is for the workflow to only be that root span.

Contributor Author


@nirga Thanks for the feedback, added this logic in commit 27024dd.

Additionally, I added a second commit (5fa7689) which removes "kind" from init and from being passed to the constructor, because imho it is a good simplification now. Let me know if you agree with this one; otherwise I can revert.

@tibor-reiss
Contributor Author

Got it! Thanks @tibor-reiss 🙏 . Left a small comment. Did you manage to create a fake example that "fails" this current solution?

I still have not found one with a name collision, but there is this other one: test_chains/test_sequential_chain -> name any of the LLMChains, e.g. synopsis_chain = LLMChain(..., name="chain1"). The test fails because instance.get_name() == "chain1", but the class_name obtained from serialized is still LLMChain. One could use serialized["kwargs"]["name"], but the point is that imho using these names (name, run_name, etc.) is not very robust.

@nirga
Member

nirga commented Jun 27, 2024

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

@tibor-reiss
Contributor Author

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

Correct, not at the moment. The reason being that the name is not always defined/propagated the same way: there is run_name, and there is also kwargs["name"].

@tibor-reiss
Contributor Author

@nirga I have also started to explore wrapping stream and transform, similar to invoke. Would you like me to add that work into this PR as well, or shall we try to finish up this first and make a separate PR later on? There are some additional challenges which require some rewriting - so that would be in favor to add now. On the other hand, I am a big proponent of small PRs, and adding it separately shows the evolution of the ideas. Let me know please!

@nirga
Member

nirga commented Jun 29, 2024

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

Correct, not at the moment. The reason being that the name is not always defined/propagated the same way: there is run_name, and there is also kwargs["name"].

Can we maybe use whatever's available? I know folks rely on that

@nirga
Member

nirga commented Jun 29, 2024

@nirga I have also started to explore wrapping stream and transform, similar to invoke. Would you like me to add that work into this PR as well, or shall we try to finish up this first and make a separate PR later on? There are some additional challenges which require some rewriting - so that would be in favor to add now. On the other hand, I am a big proponent of small PRs, and adding it separately shows the evolution of the ideas. Let me know please!

I think I'd love to merge this PR today / tomorrow so depending on that. Unless it's almost done I'd do a separate PR

@tibor-reiss
Contributor Author

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

Correct, not at the moment. The reason being that the name is not always defined/propagated the same way: there is run_name, and there is also kwargs["name"].

Can we maybe use whatever's available? I know folks rely on that

Absolutely, see now _get_instance_name_from_callback. I hope this covers most of the cases, including setting the name in the constructor and setting the run_name via with_config. For examples, see the updated tests.
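Roughly, the fallback order looks like this (a simplified sketch - field names and order are approximations of _get_instance_name_from_callback, not the exact code):

def _get_name(serialized: dict, **kwargs) -> str:
    # Assumed fallback order; details may differ from the PR.
    if kwargs.get("name"):  # e.g. run_name set via with_config(...)
        return kwargs["name"]
    if serialized.get("kwargs", {}).get("name"):  # name=... in the constructor
        return serialized["kwargs"]["name"]
    if serialized.get("name"):
        return serialized["name"]
    return serialized.get("id", ["unknown"])[-1]  # class name, e.g. "LLMChain"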

Member

@nirga nirga left a comment


Great work @tibor-reiss! Overall looks good, left a small question


assert openai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
assert openai_span.attributes[SpanAttributes.LLM_REQUEST_MODEL] == "gpt-3.5-turbo"
assert (
Member


Did you remove these because they fail now? Cause we still expect the OpenAI span to be there (as it reports the token usage etc. - although it might have been better to rely on the langchain instrumentation to provide that, but that's probably out of scope for now).

Contributor Author


TLDR: yes, they would fail because I left this out of the new callback_wrapper. However, I have similar asserts on the input and output.

See an earlier comment regarding this: "I would recommend removing _set_chat_request, because all the relevant information is available in the other attributes. I provided it to be more in line with the past implementation, e.g. custom_chat_wrapper/_handle_request."

In custom_chat_wrapper there are 2 functions which populate these kinds of fields: _handle_request and _handle_response. I did not replicate the should_send_prompts parts because the output from the models is model-specific; furthermore, all the information can be found in SpanAttributes.TRACELOOP_ENTITY_INPUT and SpanAttributes.TRACELOOP_ENTITY_OUTPUT. One could write a parser for the callback arguments inputs / outputs / messages, but this could end up being an if-else spaghetti :)

Please let me know if you would like this parser to populate span attributes like {SpanAttributes.LLM_PROMPTS}.{idx}.role.

Member


But @tibor-reiss they should be generated from the OpenAI instrumentation, so they should always be there, no matter what you change in the LangChain instrumentation

Contributor Author

@tibor-reiss tibor-reiss Jun 30, 2024


Yes, they are there. And they are passed into TRACELOOP_ENTITY_INPUT and TRACELOOP_ENTITY_OUTPUT (see the callbacks, e.g. on_chain_start / inputs and on_chain_end / outputs); they are just not parsed further into LLM_PROMPTS.*, yet. Let me know if you want me to add the further parsing.

Member


Not sure I follow @tibor-reiss - what happens if you re add those assertions? I expect the instrumentation for OpenAI to specify everything needed by these assertions

Contributor Author


Those are indeed provided by OpenAI, but as part of the callbacks - see the inputs, outputs and messages arguments.

If I add back the deleted assertions, the tests fail. The attributes starting with LLM_PROMPTS were previously filled (the code is still there) in custom_chat_wrapper / _handle_request. This is what I call the "parser", and I can add it to callback_wrapper if you would like (a basic version is already in _set_chat_request).

However, here is the outputs argument from on_chain_end for test_openai:
AIMessage(content="Why did OpenTelemetry break up with its significant other? Because it couldn't handle the constant tracing and monitoring of their relationship!", response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 24, 'total_tokens': 50}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_d9767fc5b9', 'finish_reason': 'stop', 'logprobs': None}, id='run-18c68937-8f10-4385-9cd5-1e279a0529c6-0', usage_metadata={'input_tokens': 24, 'output_tokens': 26, 'total_tokens': 50})

And here is the outputs from test_anthropic:
AIMessage(content="Why can't a bicycle stand up by itself? Because it's two-tired!", response_metadata={'id': 'msg_017fMG9SRDFTBhcD1ibtN1nK', 'model': 'claude-2.1', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 19, 'output_tokens': 22}}, id='run-4e13a0a2-c199-44ac-b0ab-b807ed023c62-0', usage_metadata={'input_tokens': 19, 'output_tokens': 22, 'total_tokens': 41})

They are similar, but there are still differences; a striking one is model vs model_name.

So my fear is that any parser we write would just create a lot of maintenance burden with newer versions. At the same time, all the inputs/messages/outputs are already provided in the SpanAttributes (in TRACELOOP_ENTITY_INPUT and TRACELOOP_ENTITY_OUTPUT), so the parser would be just a pretty printer :)

I don't have a strong preference though; I can implement it if you would like.
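For illustration, the kind of per-provider normalization such a parser would need (not actual code from this PR):

def _get_model_name(message):
    # Illustrative: OpenAI reports "model_name", Anthropic reports "model".
    meta = getattr(message, "response_metadata", None) or {}
    return meta.get("model_name") or meta.get("model")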

@tibor-reiss
Contributor Author

@nirga I have also started to explore wrapping stream and transform, similar to invoke. Would you like me to add that work into this PR as well, or shall we try to finish up this first and make a separate PR later on? There are some additional challenges which require some rewriting - so that would be in favor to add now. On the other hand, I am a big proponent of small PRs, and adding it separately shows the evolution of the ideas. Let me know please!

I think I'd love to merge this PR today / tomorrow so depending on that. Unless it's almost done I'd do a separate PR

Hi @nirga, check out #1426: I managed to remove all the "instance" manipulations. Let me know your thoughts!

@tibor-reiss tibor-reiss closed this Jul 4, 2024
@tibor-reiss tibor-reiss deleted the langchain-callbacks-revisited branch July 4, 2024 19:19