
fix(langchain): improve callbacks #1317

Closed

Conversation


@tibor-reiss tibor-reiss commented Jun 16, 2024

Follow-up to #1170

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

#1170 hooks into __init__, which does not seem too robust. Browsing through the langchain code, it seems that the API prefers functions like invoke, stream, etc.

@tibor-reiss tibor-reiss marked this pull request as draft June 16, 2024 19:33
@tibor-reiss tibor-reiss changed the title WIP - feat(langchain): improve callbacks feat(langchain): improve callbacks Jun 16, 2024
@tibor-reiss
Contributor Author

tibor-reiss commented Jun 16, 2024

Hi @nirga, I had the feeling that the previous PR was not the best approach, given how difficult it was to hook into/extend. Given the difficulty with RunnableSequence - namely that it's not possible to add callbacks in __init__ - I looked for an alternative: it seems to me that there are some functions which should be used for running chains, namely invoke and stream (and probably batch). This is underlined by the deprecation warnings on __call__ and run.

The difficulty is to identify the correct span in the callback. For this I used a tag which is nicely propagated in the on_* callbacks. Note: metadata does not show up in e.g. on_chain_end.

This is a first attempt at utilizing the aforementioned functions (currently only invoke). Please have a look and let me know what you think. 10/11 tests still pass without even touching them, which is not that bad :) The real test will be to see if RunnableSequence also behaves nicely - tbc
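Roughly, the idea looks like this (a simplified sketch, not the actual PR code - the handler name, tag format, and config handling are illustrative):

from uuid import uuid4

from langchain_core.callbacks import BaseCallbackHandler
from wrapt import wrap_function_wrapper


class TagSpanCallbackHandler(BaseCallbackHandler):
    def __init__(self, tag: str):
        self.tag = tag  # used to match the on_* callbacks back to this instance

    def on_chain_start(self, serialized, inputs, *, tags=None, **kwargs):
        if self.tag in (tags or []):
            ...  # start a span for the wrapped instance here


def invoke_wrapper(wrapped, instance, args, kwargs):
    # Simplified: a config passed positionally is not handled here.
    tag = f"otel-{uuid4()}"
    config = dict(kwargs.get("config") or {})
    config.setdefault("tags", []).append(tag)
    config.setdefault("callbacks", []).append(TagSpanCallbackHandler(tag))
    kwargs["config"] = config
    return wrapped(*args, **kwargs)


wrap_function_wrapper(
    "langchain_core.runnables.base", "RunnableSequence.invoke", invoke_wrapper
)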

@nirga
Member

nirga commented Jun 16, 2024

@tibor-reiss that's an elegant solution - except for the usage of the tag. If we must use it - so be it. But we're basically adding a tag (which is something the user can also add and use) without the user's consent, which seems a bit meh. I wonder if we could maybe create a new callback object for each callback and then have that object hold all the span details.

@tibor-reiss
Contributor Author

tibor-reiss commented Jun 17, 2024

@nirga Yes, I am with you on the additional tag. However, I could not figure out (yet) how the individual spans can be distinguished. To sum up, here are the issues I have encountered so far:

  • current implementation with __init__: attributes - e.g. callbacks - are not available
  • pydantic disallows adding additional attributes
  • in some cases the chain members inherit callbacks and/or the callback manager; this can lead to multiple calls to on_chain_start, and also to multiple calls to span.end(), where the parent span would be ended before the child span
  • the serialized argument of the on_*_start methods lacks a unique identifier (see also the json serializer: it produces an id attribute, but this also lacks a unique identifier)
  • the on_*_end methods receive only the kwargs argument

I will keep looking for a solution which can match the objects to the appropriate spans...

@nirga
Member

nirga commented Jun 17, 2024

@tibor-reiss what about the run_id? That seems to be something that is sent on most callbacks

@tibor-reiss
Contributor Author

@tibor-reiss what about the run_id? That seems to be something that is sent on most callbacks

@nirga It is indeed sent; however, there are two issues:

  • keeping track of the order of the callbacks (maybe via parent_run_id, but it is not guaranteed to be filled and can be None)
  • at the time the callback class is instantiated there is no information about the callbacks, and the callbacks don't send the object they belong to.

Even though the tag solution is ugly in the sense that there is an extra tag, the logic itself becomes clean and simple. With run_id/parent_run_id, on the other hand, there seem to be many assumptions to make, which also means more complex logic.
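For comparison, the run_id-based bookkeeping would look roughly like this (assumed names, not code from this PR):

from langchain_core.callbacks import BaseCallbackHandler
from opentelemetry.trace import set_span_in_context


class RunIdSpanHandler(BaseCallbackHandler):
    def __init__(self, tracer):
        self.tracer = tracer
        self.spans = {}  # run_id -> span: the global state mentioned above

    def on_chain_start(self, serialized, inputs, *, run_id, parent_run_id=None, **kwargs):
        context = None
        if parent_run_id is not None and parent_run_id in self.spans:
            context = set_span_in_context(self.spans[parent_run_id])
        # if parent_run_id is None, the nesting information is simply lost
        self.spans[run_id] = self.tracer.start_span("chain", context=context)

    def on_chain_end(self, outputs, *, run_id, **kwargs):
        if run_id in self.spans:
            self.spans.pop(run_id).end()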

@nirga
Member

nirga commented Jun 18, 2024

@tibor-reiss hmm not sure I completely understood the issue with the run ID, wanna chat over slack / zoom?

@tibor-reiss tibor-reiss force-pushed the langchain-callbacks-revisited branch 2 times, most recently from 4d07483 to 4fd020a Compare June 19, 2024 19:05
@tibor-reiss
Contributor Author

@nirga Based on our conversation, here is a first try. The logic in callback_wrapper() still feels clunky, together with class_name and kind...

Re: the python3.12 error: it seems related to a pydantic release: pydantic/pydantic#9637

@nirga
Member

nirga commented Jun 19, 2024

Yeah, I get what you're saying @tibor-reiss, but I think it's reasonable (and I can't think of a better solution for now). Re: the failing test - yeah, I've already fixed it in main

if not any(isinstance(c, SyncSpanCallbackHandler) for c in kwargs["callbacks"]):
    # Avoid adding the same callback twice, e.g. SequentialChain is also a Chain
    kwargs["callbacks"].append(cb)
kind = to_wrap.get("kind")
Member


One thing I would do here is extract all this logic (except for the actual call to wrapped) into a separate method, so you can decorate it with @dont_throw (and the same for the callback methods below)
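Something like this (a sketch, assuming the repo's shared @dont_throw decorator; the handler construction is simplified):

@dont_throw
def _add_callback(kwargs, cb):
    if not any(isinstance(c, SyncSpanCallbackHandler) for c in kwargs["callbacks"]):
        # Avoid adding the same callback twice, e.g. SequentialChain is also a Chain
        kwargs["callbacks"].append(cb)


def callback_wrapper(tracer, to_wrap, wrapped, instance, args, kwargs):
    cb = SyncSpanCallbackHandler(tracer)  # construction simplified for the sketch
    _add_callback(kwargs, cb)  # a failure here can no longer break the user's chain
    return wrapped(*args, **kwargs)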

@tibor-reiss tibor-reiss force-pushed the langchain-callbacks-revisited branch from 4fd020a to b22b0a1 Compare June 22, 2024 20:16
@tibor-reiss tibor-reiss force-pushed the langchain-callbacks-revisited branch from b22b0a1 to ec21773 Compare June 22, 2024 20:26
@tibor-reiss
Contributor Author

@nirga Notes on the current state:

The Achilles' heel is the logic in add_instance: as I mentioned for the first implementation with a tag, any other solution - including the current one - does not guarantee uniqueness. As far as I can see, the serializer does not give back a unique object, so when it comes to calling the callbacks, there is no longer a chance to uniquely identify the instance. I might reach out to the langchain devs to get some more insight...

I would recommend removing _set_chat_request, because all the relevant information is available in the other attributes. I provided it to be more in line with the past implementation, e.g. custom_chat_wrapper/_handle_request.

I don't know why, but there is no on_chat_model_end.

I'll have a look at the failing task asap...

@nirga
Member

nirga commented Jun 22, 2024

@tibor-reiss thanks for the update! Why do you need to uniquely identify the object? Can't we just generate a new callback object each time?

@tibor-reiss
Contributor Author

@nirga Not sure if I understand this correctly: "Can't we just generate a new callback object each time?"

Creating a new instance of the SyncSpanCallbackHandler at every invoke:

  • The callback manager would always have to be modified to contain just one instance. This is inefficient and might cause issues with closing spans, especially in cases where there is no on_*_end method.
  • How would the spans be connected (without a global state which stores run_ids and parent_run_ids)? E.g. using just the code below completely ignores the parent_run_id:
current_context = set_span_in_context(self.span)
context_api.attach(current_context)

A good example is test_simple_lcel.

One instance of the SyncSpanCallbackHandler (as is now):

  • If the chain consists of multiple objects of the same class, how are they identified? E.g. multiple prompts which only differ in their names - atm add_instance would overwrite with the last one. Even if one stored the instance id, the on_* callback functions don't have the necessary information to identify the correct instance.

TLDR: the current implementation works in many cases so far, but it's good to keep its limitations in mind.

@tibor-reiss tibor-reiss marked this pull request as ready for review June 23, 2024 09:07
@nirga nirga changed the title feat(langchain): improve callbacks fןס(langchain): improve callbacks Jun 23, 2024
@nirga nirga changed the title fןס(langchain): improve callbacks fix(langchain): improve callbacks Jun 23, 2024
Member

@nirga nirga left a comment


Got it! Thanks @tibor-reiss 🙏 . Left a small comment. Did you manage to create a fake example that "fails" this current solution?

"ChatCohere.langchain.task",
"LLMChain.langchain.task",
"StuffDocumentsChain.langchain.task",
"LLMChain.langchain.workflow",
Member


Ideally, this should be a task inside the StuffDocumentsChain workflow

Contributor Author


Both LLMChain and StuffDocumentsChain inherit from Chain (as does SequentialChain - see test_sequential_chain), whose kind is defined as WORKFLOW in WRAPPED_METHODS.

One possibility would be to get rid of the "kind" here and add some logic inside callback_wrapper, e.g. if there is a parent_run_id, it's a TASK (see the sketch at the end of this comment).

Another possibility would be to list the classes and assign them their kind explicitly.

I went with the current solution because it was simple and had the least coupling to the langchain implementation. Please let me know which you prefer!
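Sketch of the parent_run_id option mentioned above (the import path is an assumption):

from opentelemetry.semconv_ai import TraceloopSpanKindValues  # assumed import


def _get_kind(parent_run_id):
    # The root run (no parent) is the workflow; every nested run is a task.
    if parent_run_id is None:
        return TraceloopSpanKindValues.WORKFLOW
    return TraceloopSpanKindValues.TASK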

Member


I think the solution you suggested, where you check for parent_run_id sounds good and better. The expectation is for the workflow to only be that root span.

Contributor Author


@nirga Thanks for the feedback, added this logic in commit 27024dd.

Additionally, I added a second commit (5fa7689) which removes "kind" from init and from being passed to the constructor, because imho it is a good simplification now. Let me know if you agree with this one; otherwise I can revert.

@tibor-reiss
Contributor Author

Got it! Thanks @tibor-reiss 🙏 . Left a small comment. Did you manage to create a fake example that "fails" this current solution?

I still have not found one with a name collision, but there is this other one: test_chains/test_sequential_chain -> name any of the LLMChains, e.g. synopsis_chain = LLMChain(..., name="chain1"). The test fails because instance.get_name() == "chain1", but the class_name obtained from serialized is still LLMChain. One could use serialized["kwargs"]["name"], but the point is that imho using these names (name, run_name, etc.) is not very robust.

@nirga
Member

nirga commented Jun 27, 2024

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

@tibor-reiss
Contributor Author

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

Correct, not at the moment. The reason being that the name is not always defined/propagated the same way: there is run_name, and there is also kwargs["name"].

@tibor-reiss
Contributor Author

@nirga I have also started to explore wrapping stream and transform, similar to invoke. Would you like me to add that work into this PR as well, or shall we try to finish up this first and make a separate PR later on? There are some additional challenges which require some rewriting - so that would be in favor to add now. On the other hand, I am a big proponent of small PRs, and adding it separately shows the evolution of the ideas. Let me know please!

@nirga
Member

nirga commented Jun 29, 2024

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

Correct, not at the moment. The reason being that the name is not always defined/propagated the same way: there is run_name, and there is also kwargs["name"].

Can we maybe use whatever's available? I know folks rely on that

@nirga
Member

nirga commented Jun 29, 2024

@nirga I have also started to explore wrapping stream and transform, similar to invoke. Would you like me to add that work into this PR as well, or shall we try to finish up this first and make a separate PR later on? There are some additional challenges which require some rewriting - so that would be in favor to add now. On the other hand, I am a big proponent of small PRs, and adding it separately shows the evolution of the ideas. Let me know please!

I think I'd love to merge this PR today / tomorrow so depending on that. Unless it's almost done I'd do a separate PR

@tibor-reiss
Contributor Author

@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?

Correct, not at the moment. The reason being that the name is not always defined/propagated the same way: there is run_name, and there is also kwargs["name"].

Can we maybe use whatever's available? I know folks rely on that

Absolutely, see now _get_instance_name_from_callback. I hope this covers most of the cases, including setting the name in the constructor and setting the run_name via with_config. For examples, see the updated tests.
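Roughly, the fallback order looks like this (a simplified sketch - field names and order are approximations of _get_instance_name_from_callback, not the exact code):

def _get_name(serialized: dict, **kwargs) -> str:
    # Assumed fallback order; details may differ from the PR.
    if kwargs.get("name"):  # e.g. run_name set via with_config(...)
        return kwargs["name"]
    if serialized.get("kwargs", {}).get("name"):  # name=... in the constructor
        return serialized["kwargs"]["name"]
    if serialized.get("name"):
        return serialized["name"]
    return serialized.get("id", ["unknown"])[-1]  # class name, e.g. "LLMChain"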

Member

@nirga nirga left a comment


Great work @tibor-reiss! Overall looks good, left a small question


assert openai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
assert openai_span.attributes[SpanAttributes.LLM_REQUEST_MODEL] == "gpt-3.5-turbo"
assert (
Member


Did you remove these because they fail now? Cause we still expect the OpenAI span to be there (as it reports the token usage etc. - although it might have been better to rely on the langchain instrumentation to provide that, but that's probably out of scope for now).

Contributor Author


TLDR: yes, they would fail because I left this out of the new callback_wrapper. However, I have similar asserts on the input and output.

See an earlier comment regarding this: "I would recommend removing _set_chat_request, because all the relevant information is available in the other attributes. I provided it to be more in line with the past implementation, e.g. custom_chat_wrapper/_handle_request."

In custom_chat_wrapper there are 2 functions which populate these kinds of fields: _handle_request and _handle_response. I did not replicate the should_send_prompts parts because the output from the models is model-specific; furthermore, all the information can be found in SpanAttributes.TRACELOOP_ENTITY_INPUT and SpanAttributes.TRACELOOP_ENTITY_OUTPUT. One could write a parser for the callback arguments inputs / outputs / messages, but this could end up being an if-else spaghetti :)

Please let me know if you would like this parser to populate span attributes like {SpanAttributes.LLM_PROMPTS}.{idx}.role.

Member


But @tibor-reiss they should be generated from the OpenAI instrumentation, so they should always be there, no matter what you change in the LangChain instrumentation

Contributor Author

@tibor-reiss tibor-reiss Jun 30, 2024


Yes, they are there. And they are passed into TRACELOOP_ENTITY_INPUT and TRACELOOP_ENTITY_OUTPUT (see the callbacks, e.g. on_chain_start / inputs and on_chain_end / outputs); they are just not parsed further into LLM_PROMPTS.*, yet. Let me know if you want me to add the further parsing.

Member


Not sure I follow @tibor-reiss - what happens if you re add those assertions? I expect the instrumentation for OpenAI to specify everything needed by these assertions

Contributor Author


Those are indeed provided by OpenAI, but as part of the callbacks - see the inputs, outputs and messages arguments.

If I add back the deleted assertions, the tests fail. The attributes starting with LLM_PROMPTS were previously filled (the code is still there) in custom_chat_wrapper / _handle_request. This is what I call the "parser", and I can add it to callback_wrapper if you would like (a basic version is already in _set_chat_request).

However, here is the outputs argument from on_chain_end for test_openai:
AIMessage(content="Why did OpenTelemetry break up with its significant other? Because it couldn't handle the constant tracing and monitoring of their relationship!", response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 24, 'total_tokens': 50}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_d9767fc5b9', 'finish_reason': 'stop', 'logprobs': None}, id='run-18c68937-8f10-4385-9cd5-1e279a0529c6-0', usage_metadata={'input_tokens': 24, 'output_tokens': 26, 'total_tokens': 50})

And here is the outputs from test_anthropic:
AIMessage(content="Why can't a bicycle stand up by itself? Because it's two-tired!", response_metadata={'id': 'msg_017fMG9SRDFTBhcD1ibtN1nK', 'model': 'claude-2.1', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 19, 'output_tokens': 22}}, id='run-4e13a0a2-c199-44ac-b0ab-b807ed023c62-0', usage_metadata={'input_tokens': 19, 'output_tokens': 22, 'total_tokens': 41})

They are similar, but there are still differences; a striking one is model vs model_name.

So my fear is that any parser we write would just create a lot of maintenance burden with newer versions. At the same time, all the inputs/messages/outputs are already provided in the SpanAttributes (in TRACELOOP_ENTITY_INPUT and TRACELOOP_ENTITY_OUTPUT), so the parser would be just a pretty printer :)

I don't have a strong preference though; I can implement it if you would like.
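For illustration, the kind of per-provider normalization such a parser would need (not actual code from this PR):

def _get_model_name(message):
    # Illustrative: OpenAI reports "model_name", Anthropic reports "model".
    meta = getattr(message, "response_metadata", None) or {}
    return meta.get("model_name") or meta.get("model")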

@tibor-reiss
Contributor Author

@nirga I have also started to explore wrapping stream and transform, similar to invoke. Would you like me to add that work into this PR as well, or shall we try to finish up this first and make a separate PR later on? There are some additional challenges which require some rewriting - so that would be in favor to add now. On the other hand, I am a big proponent of small PRs, and adding it separately shows the evolution of the ideas. Let me know please!

I think I'd love to merge this PR today / tomorrow so depending on that. Unless it's almost done I'd do a separate PR

Hi @nirga, check out #1426: I managed to remove all the "instance" manipulations. Let me know your thoughts!

@tibor-reiss tibor-reiss closed this Jul 4, 2024
@tibor-reiss tibor-reiss deleted the langchain-callbacks-revisited branch July 4, 2024 19:19