Using an Agent and wanted to stream just the final response #2483
Comments
Me too.
Ticked the top reply, but commenting to bump the issue.
Me too.
Do you have any solution for this?
Do you want to just get the final response, without the intermediate output? If you're looking for something else, feel free to describe it. I'd be happy to implement it :)
I think the main question here is not just getting the final answer, but the streaming part: the user still has to wait 20-30 seconds for the full answer, but it arrives word by word, so the response feels faster.
Is there a solution to this?
As far as I'm concerned, no.
Done ✅ - see #4630 (demo: Recording.mp4)
Can we have agent.run (or any other method) return a generator, so that we can do some custom processing on the outputs? @UmerHA Thanks!
You can subclass FinalStreamingStdOutCallbackHandler.
In fact, I need to return each char just like the OpenAI stream: I need to wrap each char in JSON with some additional information and pass it to a downstream service. @UmerHA
LLMs return the output at a token level, not a character level; making the characters appear one after another is a UI trick. Still, you can get your desired behavior by subclassing FinalStreamingStdOutCallbackHandler. You can use it like this:
Edit: formatting
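(The code block from that comment didn't survive in this copy of the thread. As a rough illustration rather than the original snippet, a subclass that wraps each final-answer token in JSON could look like the sketch below; it assumes the FinalStreamingStdOutCallbackHandler from #4630 and its append_to_last_tokens / check_if_answer_reached helpers, which is how the handler detects the answer prefix internally.)

```python
import json
import sys

from langchain.callbacks.streaming_stdout_final_only import (
    FinalStreamingStdOutCallbackHandler,
)


class JsonFinalStreamingHandler(FinalStreamingStdOutCallbackHandler):
    """Emit each token of the final answer as a small JSON payload."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Keep feeding the rolling window the parent class uses to detect
        # the answer prefix (e.g. "Final Answer:").
        self.append_to_last_tokens(token)

        if self.check_if_answer_reached():
            self.answer_reached = True
            return

        if self.answer_reached:
            # Wrap the token in JSON with whatever extra fields the
            # downstream service expects (placeholder field names here).
            payload = json.dumps({"token": token, "meta": "additional info"})
            sys.stdout.write(payload + "\n")
            sys.stdout.flush()
```

Instead of writing to stdout, the payload could just as well be pushed onto a queue or sent over a websocket, depending on how the downstream service consumes it.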
I updated LangChain to the latest version (v0.0.169), but I didn't find the FinalStreamingStdOutCallbackHandler.
The PR (#4630) hasn't been merged yet, so it's not part of the code base. Waiting for the LangChain team to merge it.
Have been facing the same challenge! Eagerly waiting to benefit from this PR. Hoping they'll merge it into master soon.
I have created a PR #5937 to support returning a generator using simple syntax.

Basic setup:

```python
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.llms import OpenAI
from langchain.callbacks import StreamingLastResponseCallbackHandler

llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
```

Then this is all you need to do:

```python
import threading

stream = StreamingLastResponseCallbackHandler.from_agent_type(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

def _run():
    agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?", callbacks=[stream])

threading.Thread(target=_run).start()

for token in stream:  # <-- generator here
    print(token, end="", flush=True)
```

And if you want to do it the decorator way instead:

```python
@stream.on_last_response_new_token()
def on_new_token(token: str):
    print(f"Next token: '{token}'")
```
Have you noticed that not all types of agents can stream the response? Or is it just me???
I posted a similar question here: https://stackoverflow.com/questions/76663943/how-to-use-openai-in-streaming-mode-with-gradio
Me too.
For what it's worth, if you're using AgentType.CONVERSATIONAL_REACT_DESCRIPTION, this is the callback setup you need: `callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=['AI', ':'])]`
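For reference, a minimal end-to-end sketch of that setup (the tool list and question are just placeholders; the conversational agent needs a memory keyed on chat_history):

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks.streaming_stdout_final_only import (
    FinalStreamingStdOutCallbackHandler,
)
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=ConversationBufferMemory(memory_key="chat_history"),
)

# Only tokens that come after the "AI:" prefix (the conversational agent's
# final answer) are written to stdout; intermediate thoughts are suppressed.
agent.run(
    "What is 2 raised to the 10th power?",
    callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=["AI", ":"])],
)
```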
I am not a fan of the approaches above, so here is what I use instead: a streaming callback handler that exposes the tokens as an async iterator.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Literal, Union, cast

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema.agent import AgentAction, AgentFinish
from langchain.schema.output import LLMResult


class CustomAsyncCallbackHandler(AsyncCallbackHandler):
    """
    Streaming callback handler that returns an async iterator. This supports
    both streaming LLMs and agent executors.

    :param is_agent: Whether this is an agent executor.
    """

    queue: asyncio.Queue[str]
    done: asyncio.Event

    @property
    def always_verbose(self) -> bool:
        return True

    def __init__(self, is_agent: bool = False) -> None:
        self.queue = asyncio.Queue()
        self.done = asyncio.Event()
        self.is_agent = is_agent

    async def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        print("================== LLM Start! ==========================")
        self.done.clear()

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        if token is not None and token != "":
            self.queue.put_nowait(token)

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Do not close the queue here, as it may still be used by the agent."""
        if not self.is_agent:
            self.done.set()
        else:
            print("================== LLM finished! ==========================")
            print(response)
            generation_info = response.generations[-1][-1].generation_info
            if generation_info is not None:
                print(
                    "================== LLM finish reason! =========================="
                )
                # Figured out through trial and error
                if generation_info.get("finish_reason") == "stop":
                    self.done.set()

    async def on_llm_error(self, error: BaseException, **kwargs: Any) -> None:
        if not self.is_agent:
            self.done.set()

    # async def on_agent_action(self, action: AgentAction, **kwargs: Any) -> None:
    #     """Run on agent action."""

    # async def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
    #     """Run on agent end."""
    #     self.done.set()

    # async def on_chain_start(
    #     self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    # ) -> None:
    #     self.done.clear()

    # async def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
    #     """Run when chain ends running."""
    #     self.done.set()

    async def aiter(self) -> AsyncIterator[str]:
        """Return an async iterator that yields tokens from the LLM."""
        while not self.queue.empty() or not self.done.is_set():
            # Wait for the next token in the queue or for the done event to be set
            done, other = await asyncio.wait(
                [
                    asyncio.ensure_future(self.queue.get()),
                    asyncio.ensure_future(self.done.wait()),
                ],
                return_when=asyncio.FIRST_COMPLETED,
            )
            # Cancel the other task
            if other:
                other.pop().cancel()
            # Extract the value of the first completed task
            token_or_done = cast(Union[str, Literal[True]], done.pop().result())
            # If the extracted value is the boolean True, the done event was set
            if token_or_done is True:
                break
            # Otherwise, the extracted value is a token, which we yield
            yield token_or_done
```

Edit: did some more detailed testing. For completeness, here's a FastAPI route:

```python
# Request, get_chat_memory and router are defined elsewhere in the application;
# json, StreamingResponse, OpenAI, ConversationChain and
# ConversationSummaryBufferMemory need to be imported as well.
@router.post("/")
async def general(input: Request):
    # Create the memory out of the provided messages list
    prompt = input.prompt
    memory = ConversationSummaryBufferMemory(
        llm=OpenAI(model="gpt-3.5-turbo-instruct"),
        chat_memory=get_chat_memory(input.messages or []),
    )

    # Create the conversation and the async callback handler
    handler = CustomAsyncCallbackHandler()
    conversation = ConversationChain(
        llm=OpenAI(model="gpt-3.5-turbo-instruct", streaming=True, callbacks=[handler]),
        memory=memory,
    )

    async def ask_question_async():
        asyncio.create_task(conversation.apredict(input=prompt))
        async for chunk in handler.aiter():
            yield f"data: {json.dumps({'content': chunk, 'tokens': 0})}\n\n"

    return StreamingResponse(ask_question_async(), media_type="text/plain")
```
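To round this out, here is a hedged sketch of consuming that route from a client; it assumes the app runs locally on port 8000 and that the Request model has the prompt and messages fields used above.

```python
import json

import httpx

# Stream the "data: {...}" lines the FastAPI route yields and print the
# content of each chunk as it arrives.
with httpx.stream(
    "POST",
    "http://localhost:8000/",
    json={"prompt": "Tell me a short joke", "messages": []},
    timeout=None,
) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            chunk = json.loads(line[len("data: "):])
            print(chunk["content"], end="", flush=True)
```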
Trying the above, but I want to include the intermediate steps in terms of tool usage. It's really difficult, as I don't have a way to distinguish between the final output and the intermediate logs / thoughts...
You might consider using the new LangChain Expression Language. I also recommend the new take on Agents from LangChain 0.1.0 onwards. You can do so much more now, from streaming the output out of the box to even streaming the intermediate steps themselves. I might follow up with a project on GitHub (not in the near future though).
@usersina can you point to the documentation explaining that you can directly stream the final output? I have only found the same stream method that streams intermediate steps and actions.
This page has it all: https://python.langchain.com/docs/expression_language/streaming#using-stream-events
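For a rough idea of what that looks like with an agent (not taken from the docs verbatim): astream_events emits a typed event for every step of the run, so you can keep only the chat-model token events and drop the tool and chain events.

```python
from langchain.agents import AgentExecutor


async def stream_final_answer(agent_executor: AgentExecutor, question: str) -> None:
    # Requires LangChain >= 0.1 and a streaming-capable chat model in the agent.
    async for event in agent_executor.astream_events({"input": question}, version="v1"):
        # "on_chat_model_stream" events carry the streamed token chunks; tool
        # and chain events are the intermediate steps we skip here. With a
        # tool-calling agent the intermediate chunks have empty content, so
        # effectively only the final answer is printed.
        if event["event"] == "on_chat_model_stream":
            print(event["data"]["chunk"].content, end="", flush=True)
```

Call it with any agent_executor built from a streaming chat model, e.g. asyncio.run(stream_final_answer(agent_executor, "Where is the Eiffel Tower?")).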
I successfully resolved this issue using the langchain-openai-api-bridge library. You can find the library here: https://github.com/samuelint/langchain-openai-api-bridge

Example usage:

```python
# Client
openai_client = OpenAI(
    base_url="http://my-langchain-server/my-custom-path/openai/v1",
)

chat_completion = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": 'Say "This is a test"',
        }
    ],
)

print(chat_completion.choices[0].message.content)
#> "This is a test"
```

Hope this helps :)
This is how I solved this issue using Streamlit and GPT-4o
Why am I passing the callback through the
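The Streamlit snippet itself didn't make it into this copy of the thread, so as a generic illustration (not the poster's code): the usual pattern is a callback handler that appends each new token to a Streamlit placeholder, passed to the model when it is constructed.

```python
import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler
from langchain_openai import ChatOpenAI


class StreamToContainer(BaseCallbackHandler):
    """Render each new LLM token into a Streamlit placeholder as it arrives."""

    def __init__(self, container) -> None:
        self.container = container
        self.text = ""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        self.container.markdown(self.text)


prompt = st.chat_input("Ask something")
if prompt:
    with st.chat_message("assistant"):
        placeholder = st.empty()
        llm = ChatOpenAI(
            model="gpt-4o",
            streaming=True,
            callbacks=[StreamToContainer(placeholder)],
        )
        llm.invoke(prompt)
```

Passing the handler via the constructor works for a bare model; when wrapping the model in an agent, passing callbacks at invocation time (e.g. agent.run(..., callbacks=[handler])) combined with a final-answer filter like the ones above keeps the intermediate steps out of the UI.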
I am using an Agent and wanted to stream just the final response. Do you know if that is supported already, and how to do it?