
# Add stream param for inference APIs #198646

Merged: 17 commits into elastic:main on Nov 5, 2024

Conversation

@pgayvallet (Contributor) commented Nov 1, 2024

## Summary

Fix #198644

Add a `stream` parameter to the `chatComplete` and `output` APIs, defaulting to `false`, to switch between "full content response as promise" and "event observable" responses.
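
One way such a dual-mode signature can be expressed in TypeScript is with overloads on the `stream` option. The sketch below uses simplified, hypothetical types for illustration, not the plugin's actual declarations:

```ts
import type { Observable } from 'rxjs';

// Hypothetical, simplified shapes for illustration only.
interface ChatCompleteOptions {
  connectorId: string;
  system?: string;
  messages: Array<{ role: string; content: string }>;
}
interface ChatCompleteResponse {
  content: string;
  toolCalls: unknown[];
}
interface ChatCompletionEvent {
  type: string;
}

// `stream: true` yields an event observable; `stream: false` (or omitting
// the flag) yields the full response as a promise.
declare function chatComplete(
  options: ChatCompleteOptions & { stream: true }
): Observable<ChatCompletionEvent>;
declare function chatComplete(
  options: ChatCompleteOptions & { stream?: false }
): Promise<ChatCompleteResponse>;
```

With overloads like these, the compiler infers the correct return type at each call site, so callers never need a cast.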

Note: at the moment, in non-stream mode, the implementation simply constructs the response from the event observable (see the sketch after the examples below). It should be possible to improve this later by having the LLM adapters handle the stream/no-stream logic themselves, but that is out of scope for this PR.

### Normal mode

```ts
const response = await chatComplete({
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [
    { role: MessageRole.User, content: "Some question?" },
  ],
});

const { content, toolCalls } = response;
// do something
```

### Stream mode

```ts
const events$ = chatComplete({
  stream: true,
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [
    { role: MessageRole.User, content: "Some question?" },
  ],
});

events$.subscribe((event) => {
  // do something
});
```
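
As noted in the summary, non-stream mode is currently derived from the same event stream. The sketch below shows one way an event observable can be collapsed into the promise-based response; the event shapes are hypothetical stand-ins, not the plugin's actual types.

```ts
import { filter, lastValueFrom } from 'rxjs';
import type { Observable } from 'rxjs';

// Hypothetical event shapes for illustration only.
interface ChatCompletionChunkEvent {
  type: 'chatCompletionChunk';
  content: string;
}
interface ChatCompletionMessageEvent {
  type: 'chatCompletionMessage';
  content: string;
  toolCalls: unknown[];
}
type ChatCompletionEvent = ChatCompletionChunkEvent | ChatCompletionMessageEvent;

// Wait for the terminal "message" event, then map it to the promise shape.
// Rejects (EmptyError) if the stream completes without a message event.
async function eventsToResponse(events$: Observable<ChatCompletionEvent>) {
  const message = await lastValueFrom(
    events$.pipe(
      filter(
        (event): event is ChatCompletionMessageEvent =>
          event.type === 'chatCompletionMessage'
      )
    )
  );
  return { content: message.content, toolCalls: message.toolCalls };
}
```

Deriving the promise variant from the stream like this keeps a single code path per adapter, which is why adapter-level no-stream handling can be deferred as the note says.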

@pgayvallet added the `release_note:skip` (Skip the PR/issue when compiling release notes), `backport:prev-minor` (Backport to (9.0) the previous minor version, i.e. one version back from main), `Team:AI Infra` (AppEx AI Infrastructure Team), and `v8.17.0` labels on Nov 1, 2024
@pgayvallet (Contributor, Author) commented (×4):

/ci

@pgayvallet marked this pull request as ready for review November 1, 2024 12:47
@pgayvallet requested review from a team as code owners November 1, 2024 12:47
@elasticmachine (Contributor) commented:

Pinging @elastic/appex-ai-infra (Team:AI Infra)

@nreese (Contributor) left a comment:

kibana-presentation changes LGTM
code review only

@legrego (Member) left a comment:

Thanks for all the docs, this looks really good. I left a few questions, and some minor suggestions.

@pgayvallet requested a review from legrego November 5, 2024 11:05
@legrego (Member) left a comment:

LGTM!

@elasticmachine (Contributor) commented:

💚 Build Succeeded

Metrics

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/inference-common | 39 | 38 | -1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| dashboard | 647.2KB | 647.3KB | +42.0B |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats exports` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/inference-common | 0 | 1 | +1 |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| inference | 5.7KB | 6.0KB | +307.0B |
Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/inference-common | 99 | 121 | +22 |

History

@pgayvallet (Contributor, Author) commented:

@elasticmachine merge upstream

@pgayvallet enabled auto-merge (squash) November 5, 2024 13:14
@pgayvallet merged commit fe16822 into elastic:main Nov 5, 2024
21 checks passed
@kibanamachine (Contributor) commented:

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11686697979

@kibanamachine (Contributor) commented:

💔 All backports failed

| Branch | Result |
| --- | --- |
| 8.x | Backport failed because of merge conflicts |

Manual backport

To create the backport manually, run:

```
node scripts/backport --pr 198646
```

Questions?

Please refer to the Backport tool documentation

pgayvallet added a commit to pgayvallet/kibana that referenced this pull request Nov 6, 2024
(cherry picked from commit fe16822)
pgayvallet added a commit that referenced this pull request Nov 6, 2024
# Backport

This will backport the following commits from `main` to `8.x`:
 - Add `stream` param for inference APIs (#198646) (fe16822)


### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

mgadewoll pushed a commit to mgadewoll/kibana that referenced this pull request Nov 7, 2024
Labels

`backport:prev-minor`, `release_note:skip`, `Team:AI Infra`, `v8.17.0`, `v9.0.0`
Development

Successfully merging this pull request may close these issues.

[inference] Add non-stream versions of the chatComplete and output APIs.