
Streaming final agent response callbacks #5937

Conversation

@thaiminhpv (Contributor) commented Jun 9, 2023

This PR adds a comprehensive set of callbacks for streaming only the last response of the agent.

It fixes issue #2483, but in a more comprehensive way. It is NOT a duplicate of #4630.

Great work has already been done by @UmerHA on the FinalStreamingStdOutCallbackHandler class, and this PR is inspired by it, but I want that class to remain the simple use case for final streaming. This PR provides a comprehensive version of final streaming with the following additional features:

  • Multiple answer prefix phrases: just as OpenAI allows multiple stop sequences, this class can detect any of several answer prefix phrases before it begins streaming (see the sketch after this list).
  • Post-processing on-the-fly: this reuses the same prefix-matching algorithm, so it fits naturally here.
  • An easy-to-use callback invoked on each new token while the final agent response is streaming.
  • Abnormal streaming detection: if the stream contains unexpected tokens generated by the model, an exception is raised.
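To make the multiple-prefix feature concrete, here is a minimal sketch (my illustration, not the PR's actual code) of the detection idea: keep a rolling character buffer of recent output and begin streaming once it ends with any configured prefix phrase.

from typing import Callable, List

def make_prefix_detector(answer_prefix_phrases: List[str]) -> Callable[[str], bool]:
    """Feed tokens one at a time; returns True once any prefix phrase has just been generated."""
    buffer = ""
    max_len = max(len(p) for p in answer_prefix_phrases)

    def feed(token: str) -> bool:
        nonlocal buffer
        buffer = (buffer + token)[-max_len:]  # only the last max_len chars can still match
        return any(buffer.endswith(p) for p in answer_prefix_phrases)

    return feed

detect = make_prefix_detector(["Final Answer:", "Now I know the final answer:"])
for tok in ["I can respond now.\n", "Final", " Answer", ":"]:
    if detect(tok):
        print("start streaming after this token")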

Quick setup

import threading  # used by the iterator examples below
from typing import List  # used by the post-processing example below

from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.llms import OpenAI
from langchain.callbacks import StreamingLastResponseCallbackHandler

llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)

Usage 1 - use as a callback for each new token of the last response

stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

@stream.on_last_response_new_token()
def on_new_token(token: str):
    if token is StopIteration:  # sentinel marking the end of the final response
        print("\n[Done]")
        return
    print(token, end="", flush=True)

agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?", callbacks=[stream])

Usage 2 - use as an iterator

stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

def _run():
    agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?", callbacks=[stream])

threading.Thread(target=_run).start()  # agent.run blocks, so run it in a background thread

for token in stream:
    print(token, end="", flush=True)
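For context, here is a minimal sketch (an assumption about the mechanism, not the PR's exact code) of how such an iterator can be built: the callback pushes each token onto a thread-safe queue, and the consuming thread pops tokens until a sentinel arrives. This is also why agent.run is started on a background thread: it blocks until the agent finishes.

import queue

class TokenIterator:
    """Bridges a producer (callback) thread and a consumer thread."""

    def __init__(self):
        self._queue: queue.Queue = queue.Queue()

    def put(self, token) -> None:
        # Called by the callback handler on each new token of the final answer.
        self._queue.put(token)

    def __iter__(self):
        while True:
            token = self._queue.get()  # blocks until the producer pushes a token
            if token is StopIteration:  # sentinel: the final response is complete
                return
            yield token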

Usage 3 - post-process on-the-fly

import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

@stream.postprocess(sliding_window_step=1, window_size=3)
def postprocess_func(tokens: List[str]) -> List[str]:
    sentence = "".join(tokens).replace("LLM", "LangChain")
    words = [enc.decode([t]) for t in enc.encode(sentence)]  # postprocess output can have different size!
    return words

def _run():
    agent.run("Say 'Large Language Model (LLM) is great!'", callbacks=[stream])
threading.Thread(target=_run).start()

for token in stream:
    print(token, end="", flush=True)
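As a standalone illustration of the post-processing function above (independent of the handler; assumes tiktoken is installed), note how the replacement changes the number of tokens in the window:

import tiktoken
from typing import List

enc = tiktoken.get_encoding("cl100k_base")

def postprocess_func(tokens: List[str]) -> List[str]:
    sentence = "".join(tokens).replace("LLM", "LangChain")
    return [enc.decode([t]) for t in enc.encode(sentence)]

# The joined window "Large Language Model (LLM)" becomes
# "Large Language Model (LangChain)" and re-tokenizes to a different length.
print(postprocess_func(["Large", " Language", " Model", " (", "LL", "M", ")"]))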

Who can review?

Hi @hwchase17 and @agola11, please review my PR. There is a usage example in the docstring of the StreamingLastResponseCallbackHandler class.

I appreciate any feedback.

P.S.: Updated to use the decorator approach.

@thaiminhpv thaiminhpv force-pushed the thaiminhpv/streaming-last-response-callbacks branch 3 times, most recently from ee75e0a to 4ff2718 on June 12, 2023 17:14
@thaiminhpv (Contributor Author) commented:

Dear @hwchase17 and @agola11, I have fixed the linting and improved the usability, please check it!

@thaiminhpv thaiminhpv force-pushed the thaiminhpv/streaming-last-response-callbacks branch from 1828015 to aea3734 on June 12, 2023 20:24
vercel bot commented Jun 17, 2023

@thaiminhpv is attempting to deploy a commit to the LangChain Team on Vercel.

A member of the Team first needs to authorize it.

@UmerHA (Contributor) commented Jun 17, 2023

@thaiminhpv cool! Especially love the postprocessing on the fly.
One question: In stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION), why do we need the agent type to initialize this callback?

@UmerHA (Contributor) commented Jun 17, 2023

@thaiminhpv Also a suggestion: Consider renaming the class to make this PR backward-compatible. If people are already using FinalStreamingStdOutCallbackHandler, they'll get errors.

@thaiminhpv (Contributor Author) commented Jun 18, 2023

@UmerHA thank you!

One question: In stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION), why do we need the agent type to initialize this callback?

It is just a convenient shorthand for StreamingLastResponseCallbackHandler(answer_prefix_phrases=["Final Answer:"])

We can also use this class with custom answer_prefix_phrases, like this:

stream = StreamingLastResponseCallbackHandler(answer_prefix_phrases=["Now I know the final answer:"])

As shown here:

https://github.com/hwchase17/langchain/blob/e208414a9ae3cc9a6d84b6ebfe5b2c7e4c0cea77/langchain/callbacks/streaming_last_response_callback.py#L308-L323
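For illustration, a hedged sketch of roughly what the from_agent_type shorthand amounts to (a hypothetical helper, not the PR's exact implementation):

from langchain.agents import AgentType

# Hypothetical mapping from agent type to its default answer prefix phrases.
DEFAULT_ANSWER_PREFIXES = {
    AgentType.ZERO_SHOT_REACT_DESCRIPTION: ["Final Answer:"],
}

def handler_from_agent_type(agent: AgentType) -> StreamingLastResponseCallbackHandler:
    return StreamingLastResponseCallbackHandler(
        answer_prefix_phrases=DEFAULT_ANSWER_PREFIXES[agent]
    )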


Also a suggestion: Consider renaming the class to make this PR backward-compatible. If people are already using FinalStreamingStdOutCallbackHandler, they'll get errors.

@UmerHA IMO, I prefer to keep both classes in separate files, as each class serves a different purpose.

"This is needed in order to calculate detection_windows_size for StreamingLastResponseCallbackHandler"
"Please install it with `pip install tiktoken`."
)
self._enc = tiktoken.get_encoding(tiktoken_encoding)
@baskaryan (Collaborator) commented on the lines above:
is it worth letting folks pass in whatever tokenizer they want (can still default to tiktoken "cl100k_base")?

@thaiminhpv (Contributor Author) commented Aug 13, 2023

is it worth letting folks pass in whatever tokenizer they want (can still default to tiktoken "cl100k_base")?

@baskaryan Thank you for the suggestion, I have implemented it, please check it!

super().__init__()
if isinstance(tokenizer, str):
    try:
        import tiktoken
    except ImportError:
        raise ImportError(
            "Could not import tiktoken python package. "
            "This is needed in order to calculate detection_windows_size "
            "for StreamingLastResponseCallbackHandler. "
            "Please install it with `pip install tiktoken`."
        )
    tokenizer = tiktoken.get_encoding(tokenizer)
else:
    try:
        from transformers import PreTrainedTokenizerBase

        if not isinstance(tokenizer, PreTrainedTokenizerBase):
            raise ValueError(
                "Tokenizer received was neither a string nor a "
                "PreTrainedTokenizerBase from transformers."
            )
    except ImportError:
        raise ValueError(
            "Could not import transformers python package. "
            "Please install it with `pip install transformers`."
        )

def _huggingface_tokenizer_length(text: str) -> int:
    # Works for both paths: tiktoken encodings and HuggingFace tokenizers
    # each expose an .encode() method.
    return len(tokenizer.encode(text))

self._get_length_in_tokens = _huggingface_tokenizer_length
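A hedged usage sketch for the updated constructor (the tokenizer parameter name is taken from the snippet above; the HuggingFace model name is only illustrative):

# Pass a tiktoken encoding name as a string...
stream = StreamingLastResponseCallbackHandler(tokenizer="cl100k_base")

# ...or pass a HuggingFace tokenizer object.
from transformers import AutoTokenizer
stream = StreamingLastResponseCallbackHandler(
    tokenizer=AutoTokenizer.from_pretrained("gpt2")
)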

@baskaryan added the label "03 enhancement (Enhancement of existing functionality)" on Jul 13, 2023
vercel bot commented Jul 13, 2023

1 Ignored Deployment: langchain (Ignored), updated Aug 13, 2023 7:37pm

@baskaryan (Collaborator) commented:

Thanks for the PR @thaiminhpv! Could we add a note explaining the difference between this and FinalStreamingStdOutCallbackHandler? Imagine it might otherwise confuse a lot of folks.

@thaiminhpv (Contributor Author) commented:

@baskaryan
I think the differences are as follows:

The FinalStreamingStdOutCallbackHandler, as the name suggests, just outputs the final response of the agent to stdout. Users can implement anything, such as post-processing or complex logic, on top of that.

The StreamingLastResponseCallbackHandler is the batteries-included version of FinalStreamingStdOutCallbackHandler, with the additional features that I mentioned earlier in this PR.

In other words:

  • If users need to build custom, complex logic themselves -> go for FinalStreamingStdOutCallbackHandler.

  • If users want to post-process on-the-fly, use multiple answer prefix phrases, or use the abnormal streaming detection feature -> go for StreamingLastResponseCallbackHandler.

I'm not sure where to document the explanation of the difference. Do you have any ideas?

@thaiminhpv thaiminhpv force-pushed the thaiminhpv/streaming-last-response-callbacks branch from 946a9e2 to f18dafe on August 13, 2023 19:25
@thaiminhpv (Contributor Author) commented:

BTW, I've rebased my branch (manually) on top of the master branch, followed by a force push.

This means my branch can be merged without any conflicts, for now.

@thaiminhpv thaiminhpv requested a review from baskaryan August 16, 2023 14:24
@leo-gan (Collaborator) commented Sep 15, 2023

@baskaryan Could you please review it?

@hinetabi commented Sep 24, 2023

This PR is so helpful, please merge it. I want this in the next version of LangChain.

@ilianherzi commented:

+2 would love to see this!

@leo-gan leo-gan requested a review from hwchase17 September 25, 2023 15:29
@IdkwhatImD0ing (Contributor) commented:

Is there a way to make this work for OpenAI functions?

@leo-gan (Collaborator) commented Oct 16, 2023

@baskaryan FYI

@leo-gan leo-gan requested a review from efriis October 16, 2023 01:18
@hwchase17 hwchase17 closed this Jan 30, 2024
@thaiminhpv (Contributor Author) commented:

This pull request was created prior to the introduction of OpenAI functions and the LangChain Expression Language (LCEL). I think it is now obsolete.

@thaiminhpv thaiminhpv deleted the thaiminhpv/streaming-last-response-callbacks branch February 24, 2024 14:27
@thaiminhpv thaiminhpv restored the thaiminhpv/streaming-last-response-callbacks branch February 24, 2024 14:31
@alexgg278 commented:
Isn't it possible to adapt it to LCEL and OpenAI functions? It would be useful to have.

@Tesax123 commented:
+1, this would be very useful.
