Add `stream` param for inference APIs #198646
Conversation
/ci
/ci
/ci
/ci
Pinging @elastic/appex-ai-infra (Team:AI Infra)
kibana-presentation changes LGTM
code review only
Thanks for all the docs, this looks really good. I left a few questions, and some minor suggestions.
LGTM!
💚 Build Succeeded
Metrics [docs]
- Public APIs missing comments
- Async chunks
- Public APIs missing exports
- Page load bundle
@elasticmachine merge upstream
Starting backport for target branches: 8.x https://github.com/elastic/kibana/actions/runs/11686697979
💔 All backports failed
Manual backport
To create the backport manually run:
Questions? Please refer to the Backport tool documentation
# Backport

This will backport the following commits from `main` to `8.x`:
- Add `stream` param for inference APIs (#198646) (fe16822)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
## Summary

Fix #198644

Add a `stream` parameter to the `chatComplete` and `output` APIs, defaulting to `false`, to switch between "full content response as promise" and "event observable" responses.

Note: at the moment, in non-stream mode, the implementation is simply constructing the response from the observable. It should be possible later to improve this by having the LLM adapters handle the stream/no-stream logic, but this is out of scope of the current PR.

### Normal mode

```ts
const response = await chatComplete({
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [
    { role: MessageRole.User, content: "Some question?" },
  ],
});

const { content, toolCalls } = response;
// do something
```

### Stream mode

```ts
const events$ = chatComplete({
  stream: true,
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [
    { role: MessageRole.User, content: "Some question?" },
  ],
});

events$.subscribe((event) => {
  // do something
});
```
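As a rough illustration of the split described above, here is a hypothetical, simplified signature sketch showing how a single `stream` flag can select between the two return types. The names and shapes below are placeholders for illustration, not the plugin's actual types.

```ts
import { Observable } from 'rxjs';

// Hypothetical, simplified shapes for illustration only.
interface ChatCompleteOptions {
  connectorId: string;
  system?: string;
  messages: Array<{ role: string; content: string }>;
}

interface ChatCompleteResponse {
  content: string;
  toolCalls: unknown[];
}

type ChatCompleteEvent = { type: string };

// `stream: true` selects the event observable; omitting the flag
// (it defaults to false) selects the promise of the full response.
interface ChatCompleteAPI {
  (options: ChatCompleteOptions & { stream: true }): Observable<ChatCompleteEvent>;
  (options: ChatCompleteOptions & { stream?: false }): Promise<ChatCompleteResponse>;
}
```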
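The note above mentions that non-stream mode currently just builds its response by collecting the event observable. A minimal sketch of that collapse, using RxJS and hypothetical event/response shapes (the plugin's real event types will differ):

```ts
import { lastValueFrom, Observable } from 'rxjs';
import { reduce } from 'rxjs/operators';

// Hypothetical event and response shapes, for illustration only.
type ChatCompleteEvent =
  | { type: 'chunk'; content: string }
  | { type: 'complete'; toolCalls: unknown[] };

interface ChatCompleteResponse {
  content: string;
  toolCalls: unknown[];
}

// Collapse the streaming events into the single response that the
// non-stream (`stream: false`) call resolves with.
function eventsToResponse(
  events$: Observable<ChatCompleteEvent>
): Promise<ChatCompleteResponse> {
  return lastValueFrom(
    events$.pipe(
      reduce<ChatCompleteEvent, ChatCompleteResponse>(
        (acc, event) =>
          event.type === 'chunk'
            ? { ...acc, content: acc.content + event.content }
            : { ...acc, toolCalls: event.toolCalls },
        { content: '', toolCalls: [] }
      )
    )
  );
}
```

Under that assumption, awaiting `chatComplete({ ... })` and collapsing `chatComplete({ stream: true, ... })` resolve to the same shape, which is why moving the stream/no-stream decision down into the LLM adapters can be deferred.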