
Streaming final agent response callbacks #5937

Conversation

@thaiminhpv (Contributor) commented Jun 9, 2023

This PR adds a comprehensive set of callbacks for streaming only the last response of the agent.

It fixes issue #2483, but in a more comprehensive way. It is NOT a duplicate of #4630.

Great work has already been done by @UmerHA on the FinalStreamingStdOutCallbackHandler class, and this PR is inspired by it, but I want that class to remain the simple use case for final streaming. This PR provides a comprehensive version of final streaming with the following additional features:

  • Multiple answer prefix phrases: just as OpenAI allows multiple stop sequences, this class can detect any of several answer prefix phrases before it begins streaming (see the sketch after this list).
  • Post-processing on-the-fly: this reuses the same prefix-matching algorithm, so it fits naturally here.
  • An easy-to-use callback invoked on each new token while the final agent response is streaming.
  • Abnormal streaming detection: if the stream contains unexpected tokens generated by the model, an exception is raised.
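To make the multiple-prefix feature concrete, here is a minimal sketch (my illustration, not the PR's actual code) of the detection idea: keep a rolling character buffer of recent output and begin streaming once it ends with any configured prefix phrase.

from typing import Callable, List

def make_prefix_detector(answer_prefix_phrases: List[str]) -> Callable[[str], bool]:
    """Feed tokens one at a time; returns True once any prefix phrase has just been generated."""
    buffer = ""
    max_len = max(len(p) for p in answer_prefix_phrases)

    def feed(token: str) -> bool:
        nonlocal buffer
        buffer = (buffer + token)[-max_len:]  # only the last max_len chars can still match
        return any(buffer.endswith(p) for p in answer_prefix_phrases)

    return feed

detect = make_prefix_detector(["Final Answer:", "Now I know the final answer:"])
for tok in ["I can respond now.\n", "Final", " Answer", ":"]:
    if detect(tok):
        print("start streaming after this token")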

Quick setup

import threading  # used by the iterator examples below
from typing import List  # used by the post-processing example below

from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.llms import OpenAI
from langchain.callbacks import StreamingLastResponseCallbackHandler

llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)

Usage 1 - use as a callback for each new token of the last response

stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

@stream.on_last_response_new_token()
def on_new_token(token: str):
    if token is StopIteration:  # sentinel marking the end of the final response
        print("\n[Done]")
        return
    print(token, end="", flush=True)

agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?", callbacks=[stream])

Usage 2 - use as an iterator

stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

def _run():
    agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?", callbacks=[stream])

threading.Thread(target=_run).start()  # agent.run blocks, so run it in a background thread

for token in stream:
    print(token, end="", flush=True)
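For context, here is a minimal sketch (an assumption about the mechanism, not the PR's exact code) of how such an iterator can be built: the callback pushes each token onto a thread-safe queue, and the consuming thread pops tokens until a sentinel arrives. This is also why agent.run is started on a background thread: it blocks until the agent finishes.

import queue

class TokenIterator:
    """Bridges a producer (callback) thread and a consumer thread."""

    def __init__(self):
        self._queue: queue.Queue = queue.Queue()

    def put(self, token) -> None:
        # Called by the callback handler on each new token of the final answer.
        self._queue.put(token)

    def __iter__(self):
        while True:
            token = self._queue.get()  # blocks until the producer pushes a token
            if token is StopIteration:  # sentinel: the final response is complete
                return
            yield token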

Usage 3 - post-process on-the-fly

import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

@stream.postprocess(sliding_window_step=1, window_size=3)
def postprocess_func(tokens: List[str]) -> List[str]:
    sentence = "".join(tokens).replace("LLM", "LangChain")
    words = [enc.decode([t]) for t in enc.encode(sentence)]  # postprocess output can have different size!
    return words

def _run():
    agent.run("Say 'Large Language Model (LLM) is great!'", callbacks=[stream])
threading.Thread(target=_run).start()

for token in stream:
    print(token, end="", flush=True)
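As a standalone illustration of the post-processing function above (independent of the handler; assumes tiktoken is installed), note how the replacement changes the number of tokens in the window:

import tiktoken
from typing import List

enc = tiktoken.get_encoding("cl100k_base")

def postprocess_func(tokens: List[str]) -> List[str]:
    sentence = "".join(tokens).replace("LLM", "LangChain")
    return [enc.decode([t]) for t in enc.encode(sentence)]

# The joined window "Large Language Model (LLM)" becomes
# "Large Language Model (LangChain)" and re-tokenizes to a different length.
print(postprocess_func(["Large", " Language", " Model", " (", "LL", "M", ")"]))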

Who can review?

Hi @hwchase17 and @agola11, please review my PR. There is a usage example in the docstring of the StreamingLastResponseCallbackHandler class.

I appreciate any feedback.

P.S.: Updated to use the decorator approach.

@thaiminhpv thaiminhpv force-pushed the thaiminhpv/streaming-last-response-callbacks branch 3 times, most recently from ee75e0a to 4ff2718 on June 12, 2023 17:14
@thaiminhpv (Contributor Author) commented:

Dear @hwchase17 and @agola11, I have fixed the linting and improved the usability, please check it!

@thaiminhpv thaiminhpv force-pushed the thaiminhpv/streaming-last-response-callbacks branch from 1828015 to aea3734 on June 12, 2023 20:24
vercel bot commented Jun 17, 2023

@thaiminhpv is attempting to deploy a commit to the LangChain Team on Vercel.

A member of the Team first needs to authorize it.

@UmerHA (Contributor) commented Jun 17, 2023

@thaiminhpv cool! Especially love the postprocessing on the fly.
One question: In stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION), why do we need the agent type to initialize this callback?

@UmerHA (Contributor) commented Jun 17, 2023

@thaiminhpv Also a suggestion: Consider renaming the class to make this PR backward-compatible. If people are already using FinalStreamingStdOutCallbackHandler, they'll get errors.

@thaiminhpv (Contributor Author) commented Jun 18, 2023

@UmerHA thank you!

One question: In stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION), why do we need the agent type to initialize this callback?

It is just a convenient shorthand for StreamingLastResponseCallbackHandler(answer_prefix_phrases=["Final Answer:"])

We can also use this class with custom answer_prefix_phrases, like this:

stream = StreamingLastResponseCallbackHandler(answer_prefix_phrases=["Now I know the final answer:"])

As shown here:

https://github.com/hwchase17/langchain/blob/e208414a9ae3cc9a6d84b6ebfe5b2c7e4c0cea77/langchain/callbacks/streaming_last_response_callback.py#L308-L323
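For illustration, a hedged sketch of roughly what the from_agent_type shorthand amounts to (a hypothetical helper, not the PR's exact implementation):

from langchain.agents import AgentType

# Hypothetical mapping from agent type to its default answer prefix phrases.
DEFAULT_ANSWER_PREFIXES = {
    AgentType.ZERO_SHOT_REACT_DESCRIPTION: ["Final Answer:"],
}

def handler_from_agent_type(agent: AgentType) -> StreamingLastResponseCallbackHandler:
    return StreamingLastResponseCallbackHandler(
        answer_prefix_phrases=DEFAULT_ANSWER_PREFIXES[agent]
    )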


Also a suggestion: Consider renaming the class to make this PR backward-compatible. If people are already using FinalStreamingStdOutCallbackHandler, they'll get errors.

@UmerHA IMO, I prefer to keep both classes in separate files, as each class serves a different purpose.

"This is needed in order to calculate detection_windows_size for StreamingLastResponseCallbackHandler"
"Please install it with `pip install tiktoken`."
)
self._enc = tiktoken.get_encoding(tiktoken_encoding)
@baskaryan (Collaborator) commented on the lines above:
is it worth letting folks pass in whatever tokenizer they want (can still default to tiktoken "cl100k_base")?

@thaiminhpv (Contributor Author) commented Aug 13, 2023

is it worth letting folks pass in whatever tokenizer they want (can still default to tiktoken "cl100k_base")?

@baskaryan Thank you for the suggestion, I have implemented it, please check it!

super().__init__()
if isinstance(tokenizer, str):
    try:
        import tiktoken
    except ImportError:
        raise ImportError(
            "Could not import tiktoken python package. "
            "This is needed in order to calculate detection_windows_size "
            "for StreamingLastResponseCallbackHandler. "
            "Please install it with `pip install tiktoken`."
        )
    tokenizer = tiktoken.get_encoding(tokenizer)
else:
    try:
        from transformers import PreTrainedTokenizerBase

        if not isinstance(tokenizer, PreTrainedTokenizerBase):
            raise ValueError(
                "Tokenizer received was neither a string nor a "
                "PreTrainedTokenizerBase from transformers."
            )
    except ImportError:
        raise ValueError(
            "Could not import transformers python package. "
            "Please install it with `pip install transformers`."
        )

def _huggingface_tokenizer_length(text: str) -> int:
    # Works for both paths: tiktoken encodings and HuggingFace tokenizers
    # each expose an .encode() method.
    return len(tokenizer.encode(text))

self._get_length_in_tokens = _huggingface_tokenizer_length
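A hedged usage sketch for the updated constructor (the tokenizer parameter name is taken from the snippet above; the HuggingFace model name is only illustrative):

# Pass a tiktoken encoding name as a string...
stream = StreamingLastResponseCallbackHandler(tokenizer="cl100k_base")

# ...or pass a HuggingFace tokenizer object.
from transformers import AutoTokenizer
stream = StreamingLastResponseCallbackHandler(
    tokenizer=AutoTokenizer.from_pretrained("gpt2")
)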

@baskaryan added the label "03 enhancement (Enhancement of existing functionality)" on Jul 13, 2023
vercel bot commented Jul 13, 2023

1 Ignored Deployment: langchain (Ignored), updated Aug 13, 2023 7:37pm

@baskaryan (Collaborator) commented:

Thanks for the PR @thaiminhpv! Could we add a note explaining the difference between this and FinalStreamingStdOutCallbackHandler? Imagine it might otherwise confuse a lot of folks.

@thaiminhpv (Contributor Author) commented:

@baskaryan
I think the differences are as follows:

The FinalStreamingStdOutCallbackHandler, as the name suggests, just outputs the final response of the agent to stdout. Users can implement anything, such as post-processing or complex logic, on top of that.

The StreamingLastResponseCallbackHandler is the batteries-included version of FinalStreamingStdOutCallbackHandler, with the additional features that I mentioned earlier in this PR.

In other words:

  • If users need to build custom, complex logic themselves -> go for FinalStreamingStdOutCallbackHandler.

  • If users want to post-process on-the-fly, use multiple answer prefix phrases, or use the abnormal streaming detection feature -> go for StreamingLastResponseCallbackHandler.

I'm not sure where to document the explanation of the difference. Do you have any ideas?

@thaiminhpv thaiminhpv force-pushed the thaiminhpv/streaming-last-response-callbacks branch from 946a9e2 to f18dafe on August 13, 2023 19:25
@thaiminhpv (Contributor Author) commented:

BTW, I've rebased my branch (manually) on top of the master branch, followed by a force push.

This means my branch can be merged without any conflicts, for now.

@thaiminhpv thaiminhpv requested a review from baskaryan August 16, 2023 14:24
@leo-gan (Collaborator) commented Sep 15, 2023

@baskaryan Could you please review it?

@hinetabi commented Sep 24, 2023

This PR is so helpful, please merge it. I want this in the next version of LangChain.

@ilianherzi commented:

+2 would love to see this!

@leo-gan leo-gan requested a review from hwchase17 September 25, 2023 15:29
@IdkwhatImD0ing (Contributor) commented:

Is there a way to make this work for OpenAI functions?

@leo-gan (Collaborator) commented Oct 16, 2023

@baskaryan FYI

@leo-gan leo-gan requested a review from efriis October 16, 2023 01:18
@hwchase17 hwchase17 closed this Jan 30, 2024
@thaiminhpv (Contributor Author) commented:

This pull request was created prior to the introduction of OpenAI functions and the LangChain Expression Language (LCEL). I think it is now obsolete.

@thaiminhpv thaiminhpv deleted the thaiminhpv/streaming-last-response-callbacks branch February 24, 2024 14:27
@thaiminhpv thaiminhpv restored the thaiminhpv/streaming-last-response-callbacks branch February 24, 2024 14:31
@alexgg278 commented:
Isn't it possible to adapt it to LCEL and OpenAI functions? It would be useful to have.

@Tesax123 commented:
+1, this would be very useful.
