
[Roadmap] Streaming support #217

Closed
2 of 4 tasks
sonichi opened this issue Oct 12, 2023 · 20 comments · Fixed by #1551
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), in-progress (Roadmap is actively being worked on), roadmap (Issues related to roadmap of AutoGen)

Comments

@sonichi
Contributor

sonichi commented Oct 12, 2023

Stream messages from the agent's reply.

Tasks

  1. ui/deploy
  2. streaming
@sonichi sonichi added enhancement New feature or request roadmap Issues related to roadmap of AutoGen labels Oct 12, 2023
@sonichi sonichi added the help wanted Extra attention is needed label Oct 22, 2023
@sonichi
Contributor Author

sonichi commented Oct 22, 2023

No one is working on this issue as far as I know. Any volunteer to take the lead?

@maxim-saplin
Collaborator

maxim-saplin commented Oct 23, 2023

A bit of criticism... Is streaming needed?

AutoGen doesn't seem to be a user-facing tool where UI/UX is a top priority, while streaming mainly addresses user-experience concerns (interactivity and visible progress). After all, the performance (total time to complete the request) is the same whether or not streaming is enabled.

On the other hand, implementing streaming looks like a large piece of work that affects many parts and will be hard to maintain:

  1. Working with chunks asynchronously might require creating two flavours of every API that uses completions.
  2. Cost accounting will break right away: streaming APIs (at least OpenAI's) don't return token stats for streamed responses. OpenAI suggests using tiktoken to do your own accounting, but in my experience tiktoken always shows a ~1% discrepancy with what OpenAI returns, so costing will become less accurate (see the sketch below).

IMO, it is a Large piece of work for a Small value (speaking in terms of S-M-L sizing and prioritizing).
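
For concreteness, here is a minimal sketch of the kind of client-side accounting tiktoken makes possible, assuming the post-1.0 openai SDK; the model name is illustrative, and as noted above the count will drift ~1% from OpenAI's own numbers:

import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4")  # encoding for the target model

chunks = []
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # streamed responses carry no usage stats
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        chunks.append(delta)

completion_text = "".join(chunks)
# Approximate; expect a small discrepancy vs. what OpenAI bills.
print(f"~{len(enc.encode(completion_text))} completion tokens")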

@ragyabraham
Collaborator

@sonichi I'm actually working on this already. Happy to pick this up.

@sonichi
Contributor Author

sonichi commented Oct 23, 2023

Thanks. One thing to pay attention to is #203. If your work uses the streaming feature from openai, it's better to target the newer version.
I'm currently working on #203 without the streaming part.

@ragyabraham
Collaborator

@sonichi I've also done some work to enable messages to be sent over sockets rather than printing in a terminal. Do you think that is something that I should open a PR for? Would that be useful?

@sonichi
Contributor Author

sonichi commented Oct 24, 2023

That sounds useful, because I've heard of many different people trying to do that. @victordibia @AaronWard may be interested; feel free to pull in others who are also interested.

@ragyabraham
Collaborator

ragyabraham commented Oct 24, 2023

OK, sounds good. I'll start a new issue and push the changes I have so far. @victordibia @AaronWard please refer to #394.

@victordibia
Collaborator

victordibia commented Oct 24, 2023

I replied on #394.
One thing to note here is that the work by @ragyabraham is more focused on streaming completed responses from each agent within an active conversation, not on directly streaming the tokens from each agent as they are generated by an LLM. The latter is more involved and has unclear use cases/benefits (as mentioned by @maxim-saplin above).
@ragyabraham, could you kindly confirm that this is your focus here?

@ragyabraham
Collaborator

Hi @victordibia, I intend to utilise streaming to chunk all responses from the LLM. The approach we are thinking of is:

  1. We chunk the response and utilise some sort of messaging framework to emit messages to a server (e.g. socketio, which sends messages to the FE).
  2. Chunks are aggregated in memory (e.g. string += chunk).
  3. Once all chunks have been consumed, the complete message can be sent to the intended team member/recipient.

Please let me know what you think.
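
A rough sketch of this flow, assuming a python-socketio server and OpenAI-style chunk objects; the sio instance and the llm_chunk/llm_message event names are illustrative, not part of AutoGen:

import socketio

sio = socketio.Server()  # assumed to be attached to a WSGI/ASGI app elsewhere

def relay_streamed_reply(stream):
    # Step 1: emit each chunk to the frontend as it arrives.
    # Step 2: aggregate the chunks in memory.
    # Step 3: return the complete message for the intended recipient.
    buffer = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            sio.emit("llm_chunk", {"content": delta})
            buffer.append(delta)
    full_message = "".join(buffer)
    sio.emit("llm_message", {"content": full_message})
    return full_message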

@Alvaromah
Collaborator

Hi!

I've just created PR #465 to introduce streaming support in a straightforward and non-intrusive manner.

Usage:

llm_config = {
    "config_list": config_list,
    # Enable/disable streaming (defaults to False)
    "stream": True,
}

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)

Please feel free to review the code and make suggestions.

@lianghsun

Hi @Alvaromah, thank you for your contribution, which has enabled autogen to stream in the terminal. However, is there a way to also support streaming to an external output? When autogen is integrated with other UI frameworks, it would be desirable to see a streaming effect there too. I've tried modifying some parts of the source code to use 'yield', but it doesn't seem to have made any difference. Thank you for your help. 😀

@ragyabraham
Collaborator

You'll need to use websockets.
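
For instance, a minimal sketch using the third-party websockets package; generate_chunks is a hypothetical stand-in for whatever yields the LLM deltas:

import asyncio
import json
import websockets

async def generate_chunks():
    # Placeholder for the real LLM delta stream.
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0.1)
        yield token

async def stream_to_client(websocket):
    # Forward each delta to the connected UI client as it is produced.
    async for delta in generate_chunks():
        await websocket.send(json.dumps({"type": "chunk", "content": delta}))
    await websocket.send(json.dumps({"type": "done"}))

async def main():
    # Requires a recent websockets version (single-argument handler).
    async with websockets.serve(stream_to_client, "localhost", 8765):
        await asyncio.Future()  # serve until cancelled

asyncio.run(main())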

@Joaoprcf

Joaoprcf commented Dec 21, 2023

> Hi! I've just created PR #465 to introduce streaming support in a straightforward and non-intrusive manner. [...] Please feel free to review the code and make suggestions.

Instead of just stream: True, maybe also allow stream to be a callable, where True would point to sys.stdout.write by default. It is a small change that would make a big difference.
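
A sketch of how the consuming code might normalize that option; this is purely illustrative of the proposal, not AutoGen's actual API:

import sys
from typing import Callable, Optional, Union

def resolve_stream(stream: Union[bool, Callable[[str], None]]) -> Optional[Callable[[str], None]]:
    # True -> write chunks to stdout; False/None -> no streaming;
    # a callable -> use it directly as the chunk sink.
    if stream is True:
        return sys.stdout.write
    if not stream:
        return None
    return stream

# Each arriving chunk is passed to the resolved sink.
sink = resolve_stream(True)
for chunk in ["Hel", "lo", "!\n"]:
    if sink:
        sink(chunk)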

@sonichi
Contributor Author

sonichi commented Jan 1, 2024

@Joaoprcf Good suggestion! Please feel free to make a PR and add @Alvaromah @ragyabraham as a reviewer.

@bitnom
Contributor

bitnom commented Jan 2, 2024

> I've just created PR #465 to introduce streaming support in a straightforward and non-intrusive manner. [...]
>
> Instead of just stream: True, maybe also allow stream to be a callable, where True would point to sys.stdout.write by default. It is a small change that would make a big difference.

I think it's best to add a new parameter. Let's not introduce a weird boolean that isn't a boolean. What we should probably have is:

llm_config = {
    "config_list": config_list,
    "stream": True,
    "response_callback": my_cb_func,
}

which would fire for both stream: True (chunks) and stream: False (full message). I don't think there's a need to separate them, since multiple chunks give you the full message anyway.

Note: we must ensure that the finish reason (stop_reason, or whatever the response model calls it) is always passed to the callback.
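
Illustratively, the contract could look like the sketch below; the signature and the "stop" value are hypothetical, not an agreed API:

def my_cb_func(content, finish_reason=None):
    # While streaming, fires per chunk with finish_reason=None, then once
    # at the end carrying the finish reason; with stream: False it fires
    # a single time with the full message and the finish reason.
    if finish_reason is None:
        print(content, end="", flush=True)
    else:
        print(f"\n[done: {finish_reason}]")

my_cb_func("Hello, ")
my_cb_func("world!")
my_cb_func("", finish_reason="stop")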

Also, see #1118 about function streams.

@matbee-eth

Any idea when these PRs can land?

@sonichi
Contributor Author

sonichi commented Feb 11, 2024

@matbee-eth if you'd like to help accelerate it, please participate in #1551

@jackgerrits jackgerrits assigned ekzhu and davorrunje and unassigned ragyabraham Mar 18, 2024
@jackgerrits jackgerrits added the in-progress Roadmap is actively being worked on label Mar 18, 2024
@jackgerrits jackgerrits changed the title streaming support [Roadmap] Streaming support Mar 18, 2024
@vinodvarma24

It would be great to have streaming enabled so that, for end-user production applications, the UX will be better.

@vinodvarma24

> A bit of criticism... Is streaming needed? [...] IMO, it is a Large piece of work for a Small value (speaking in terms of S-M-L sizing and prioritizing).

It's a big value for end users of the agents.

@davorrunje
Collaborator

> It would be great to have streaming enabled so that, for end-user production applications, the UX will be better.

@vinodvarma24 Streaming via websockets is implemented in #1551; please take a look at it and let us know what you think.
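
For reference, a minimal sketch of the websocket IO that #1551 adds, based on the IOWebsockets interface in pyautogen 0.2; the port and handler body are illustrative, so check the PR and docs for the authoritative API:

from autogen.io.websockets import IOWebsockets

def on_connect(iostream: IOWebsockets) -> None:
    # Agent output produced during this connection is forwarded over the
    # websocket, so a UI client sees messages as they are generated.
    initial_msg = iostream.input()  # first message sent by the client
    print(f"Received: {initial_msg}")

with IOWebsockets.run_server_in_thread(on_connect=on_connect, port=8765) as uri:
    print(f"Websocket server listening on {uri}")
    # ...connect a frontend client to uri and run the chat...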
