fix(langchain): improve callbacks #1317
Conversation
Hi @nirga, I had the feeling that the previous PR was not the best approach, given the difficulty of hooking it up / extending it. The difficulty is to identify the correct span in the callback. For this I used a tag, which is nicely propagated in the … This is a first attempt towards utilizing the former functions (currently only …).
@tibor-reiss that's an elegant solution - except for the usage of the additional tag.
@nirga Yes, I am with you on the additional tag. However, I could not figure out (yet) how the individual spans can be distinguished. To sum up, here are the issues I have encountered so far: …
I will keep looking for a solution that can match the objects to the appropriate spans...
@tibor-reiss what about the run_id?
@nirga It is indeed sent; however, there are two issues: …
Even though the tag solution is ugly in the sense that there is an extra tag, the logic itself becomes clean and simple, whereas with the run_id/parent_run_id there seem to be many assumptions to be made, which also means more complex logic.
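For readers following along, here is a minimal sketch of the run_id-based bookkeeping being debated, assuming hypothetical handler and span-creation details; it illustrates the idea, not the PR's actual code:

```python
from typing import Any, Dict, Optional
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler
from opentelemetry import trace


class SyncSpanCallbackHandler(BaseCallbackHandler):
    """Illustrative only: opens a span per run_id, closes it on chain end."""

    def __init__(self, tracer: trace.Tracer) -> None:
        self._tracer = tracer
        self._spans: Dict[UUID, trace.Span] = {}  # run_id -> open span

    def on_chain_start(
        self,
        serialized: Dict[str, Any],
        inputs: Dict[str, Any],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        **kwargs: Any,
    ) -> None:
        # Every callback invocation carries a unique run_id, so it can key
        # the span belonging to this particular (possibly nested) execution.
        name = kwargs.get("name") or "chain"
        self._spans[run_id] = self._tracer.start_span(name)

    def on_chain_end(self, outputs: Dict[str, Any], *, run_id: UUID, **kwargs: Any) -> None:
        span = self._spans.pop(run_id, None)
        if span is not None:
            span.end()
```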
@tibor-reiss hmm, not sure I completely understood the issue with the run ID - wanna chat over Slack / Zoom?
Force-pushed from 4d07483 to 4fd020a
@nirga Based on our conversation, here is a first try. The logic in … Re the python3.12 error: it seems related to a pydantic release: pydantic/pydantic#9637
Yeah, I get what you're saying @tibor-reiss, but I think it's reasonable (and I can't think of a better solution for now). Reg. the test failing - yeah, I've already fixed it in …
```python
if not any(isinstance(c, SyncSpanCallbackHandler) for c in kwargs["callbacks"]):
    # Avoid adding the same callback twice, e.g. SequentialChain is also a Chain
    kwargs["callbacks"].append(cb)
kind = to_wrap.get("kind")
```
One thing I would do here is extract all this logic (except for the actual call to wrapped) into a separate method, so you can decorate it with @dont_throw (and likewise for the callback methods below).
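A quick sketch of the suggested refactor; the dont_throw import path is assumed to mirror the utils modules used elsewhere in openllmetry, and _handle_callbacks is a hypothetical name:

```python
from opentelemetry.instrumentation.langchain.utils import dont_throw  # assumed path


@dont_throw
def _handle_callbacks(to_wrap, cb, kwargs):
    # Everything that can fail is isolated here: if the instrumentation has
    # a bug, the decorator swallows it and user code still runs.
    if not any(isinstance(c, SyncSpanCallbackHandler) for c in kwargs["callbacks"]):
        # Avoid adding the same callback twice, e.g. SequentialChain is also a Chain
        kwargs["callbacks"].append(cb)
    return to_wrap.get("kind")


def callback_wrapper(to_wrap, cb, wrapped, instance, args, kwargs):
    _handle_callbacks(to_wrap, cb, kwargs)
    # The actual call stays outside @dont_throw so user exceptions propagate.
    return wrapped(*args, **kwargs)
```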
Force-pushed from 4fd020a to b22b0a1
Force-pushed from b22b0a1 to ec21773
@nirga Notes on the current state:
- The Achilles heel is the logic in …
- I would recommend removing _set_chat_request, because all the relevant information is available in the other attributes. I provided this to be more in line with the past implementation, e.g. custom_chat_wrapper/_handle_request.
- I don't know why, but there is no …
- I'll have a look at the failing task asap...
@tibor-reiss thanks for the update! Why do you need to uniquely identify the object? Can't we just generate a new callback object each time?
@nirga Not sure if I understand this correctly: "Can't we just generate a new callback object each time?" Creating a new instance of the …
A good example is … One instance of the …
TLDR: the current implementation works in many cases so far, but it's good to keep in mind its limitations.
Got it! Thanks @tibor-reiss 🙏. Left a small comment. Did you manage to create a fake example that "fails" the current solution?
"ChatCohere.langchain.task", | ||
"LLMChain.langchain.task", | ||
"StuffDocumentsChain.langchain.task", | ||
"LLMChain.langchain.workflow", |
Ideally, this should be a task inside the StuffDocumentsChain workflow.
Both LLMChain and StuffDocumentsChain inherit from Chain (as does SequentialChain; see test_sequential_chain), whose kind is defined as WORKFLOW in WRAPPED_METHODS.
One possibility would be to get rid of the "kind" here and add some logic inside callback_wrapper, e.g. if there is a parent_run_id, it's a TASK.
Another possibility would be to list the classes and assign them the kind explicitly.
I went with the current solution because it was simple and had the least coupling to the langchain implementation. Let me know your preference, please!
I think the solution you suggested, where you check for parent_run_id, sounds good and better. The expectation is for the workflow to only be that root span.
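Something like this minimal sketch of the check (the import path is assumed from the opentelemetry-semantic-conventions-ai package; the function name is hypothetical):

```python
from opentelemetry.semconv.ai import TraceloopSpanKindValues  # assumed path


def _get_kind(parent_run_id):
    # The root callback of a run has no parent, so it becomes the workflow
    # span; every nested chain (e.g. LLMChain inside StuffDocumentsChain)
    # is a task under it.
    if parent_run_id is None:
        return TraceloopSpanKindValues.WORKFLOW
    return TraceloopSpanKindValues.TASK
```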
I still have not found one with a name collision, but there is this other one: …
@tibor-reiss not sure I got that comment - you're saying the chain name isn't propagated to the span name?
Correct, not at the moment. The reason is that the …
@nirga I have also started to explore wrapping …
Can we maybe use whatever's available? I know folks rely on that …
I think I'd love to merge this PR today / tomorrow, so it depends on that. Unless it's almost done, I'd do it in a separate PR.
Absolutely, see now.
Great work @tibor-reiss! Overall looks good, left a small question
```python
assert openai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
assert openai_span.attributes[SpanAttributes.LLM_REQUEST_MODEL] == "gpt-3.5-turbo"
assert (
```
Did you remove these because they fail now? Cause we still expect the OpenAI span to be there, as it reports the token usage etc. (although it might have been better to rely on the langchain instrumentation to provide that, but that's probably out of scope for now).
TLDR: yes, they would fail, because I left this out of the new callback_wrapper. However, I have similar asserts on the input and output.
See an earlier comment regarding this: "I would recommend removing _set_chat_request, because all the relevant information is available in the other attributes. I provided this to be more in line with the past implementation, e.g. custom_chat_wrapper/_handle_request."
In custom_chat_wrapper there are 2 functions which populate these kinds of fields: _handle_request and _handle_response. I did not replicate the should_send_prompts parts because the output from the models is model-specific; furthermore, all the information can be found in SpanAttributes.TRACELOOP_ENTITY_INPUT and SpanAttributes.TRACELOOP_ENTITY_OUTPUT. One could write a parser for the callback arguments inputs / outputs / messages, but this could end up being an if-else spaghetti :)
Let me know please if you would like to have this parser populating the span attributes like {SpanAttributes.LLM_PROMPTS}.{idx}.role.
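For concreteness, a hedged sketch of what that parser could look like for chat messages; the helper name is hypothetical, and the message shape follows langchain_core's BaseMessage (`type` / `content`):

```python
from opentelemetry.semconv.ai import SpanAttributes


def _set_prompt_attributes(span, messages):
    # Hypothetical "pretty printer": mirror what custom_chat_wrapper /
    # _handle_request used to populate, but fed from the callback's messages.
    for idx, message in enumerate(messages):
        prefix = f"{SpanAttributes.LLM_PROMPTS}.{idx}"
        # langchain_core messages expose `type` ("human", "ai", "system", ...)
        # and `content`.
        span.set_attribute(f"{prefix}.role", getattr(message, "type", "unknown"))
        span.set_attribute(f"{prefix}.content", str(message.content))
```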
But @tibor-reiss they should be generated from the OpenAI instrumentation, so they should always be there, no matter what you change in the LangChain instrumentation
Yes, they are there. And they are passed into TRACELOOP_ENTITY_INPUT and TRACELOOP_ENTITY_OUTPUT (see the callbacks, e.g. on_chain_start / inputs and on_chain_end / outputs); they are just not parsed further into LLM_PROMPTS.*, yet. Let me know if you want to add the further parsing.
Not sure I follow @tibor-reiss - what happens if you re-add those assertions? I expect the instrumentation for OpenAI to specify everything needed by these assertions.
Those are indeed provided by OpenAI, but as part of the callbacks; see the inputs, outputs and messages arguments.
If I add back the deleted assertions, the tests fail. The attributes starting with LLM_PROMPTS were previously (the code is still there) filled in custom_chat_wrapper / _handle_request. This is what I call the "parser", and I can add it to callback_wrapper if you would like (a basic version is already in _set_chat_request).
However, here is the outputs argument from on_chain_end for test_openai:

```python
AIMessage(content="Why did OpenTelemetry break up with its significant other? Because it couldn't handle the constant tracing and monitoring of their relationship!", response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 24, 'total_tokens': 50}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_d9767fc5b9', 'finish_reason': 'stop', 'logprobs': None}, id='run-18c68937-8f10-4385-9cd5-1e279a0529c6-0', usage_metadata={'input_tokens': 24, 'output_tokens': 26, 'total_tokens': 50})
```
And here is the outputs from test_anthropic:

```python
AIMessage(content="Why can't a bicycle stand up by itself? Because it's two-tired!", response_metadata={'id': 'msg_017fMG9SRDFTBhcD1ibtN1nK', 'model': 'claude-2.1', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 19, 'output_tokens': 22}}, id='run-4e13a0a2-c199-44ac-b0ab-b807ed023c62-0', usage_metadata={'input_tokens': 19, 'output_tokens': 22, 'total_tokens': 41})
```
They are similar, but there are still differences; one striking one is model vs model_name.
So my fear is that any parser we write would just cause a lot of maintenance burden with newer versions. At the same time, all the inputs/messages/outputs are already provided in the SpanAttributes (in TRACELOOP_ENTITY_INPUT and TRACELOOP_ENTITY_OUTPUT), so the parser would be just a pretty printer :)
I don't have a strong preference though; I can implement it if you would like.
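To make the maintenance concern concrete, here is a sketch of the provider-specific fallbacks such a parser would need from day one; the keys come from the two outputs quoted above, everything else is an assumption:

```python
from typing import Optional


def _extract_model(response_metadata: dict) -> Optional[str]:
    # OpenAI reports "model_name", Anthropic reports "model".
    return response_metadata.get("model_name") or response_metadata.get("model")


def _extract_usage(message) -> dict:
    # usage_metadata is shaped the same for both providers above, so prefer
    # it over response_metadata["token_usage"] (OpenAI) / ["usage"] (Anthropic).
    return getattr(message, "usage_metadata", None) or {}
```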
Hi @nirga, check out #1426: I managed to remove all the "instance" manipulations. Let me know your thoughts!
Follow-up to #1170.
#1170 hooks into __init__, which does not seem to be too robust. Browsing through the langchain code, it seems that the API prefers the following functions: invoke, stream, etc.
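A minimal sketch of the resulting shape, assuming a wrapt-style wrapper signature as used elsewhere in openllmetry; names are illustrative, not the merged code:

```python
def callback_wrapper(tracer, to_wrap, wrapped, instance, args, kwargs):
    # Instead of patching __init__, intercept invoke/stream and inject the
    # span-creating handler; LangChain then propagates it to nested runs.
    callbacks = kwargs.setdefault("callbacks", [])
    if not any(isinstance(c, SyncSpanCallbackHandler) for c in callbacks):
        callbacks.append(SyncSpanCallbackHandler(tracer))
    return wrapped(*args, **kwargs)
```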