
[Obs AI Assistant] Improve recall speed #176428

Merged · 16 commits · Feb 8, 2024

Conversation

@dgieselaar (Member) commented Feb 7, 2024

Improves recall speed by outputting as CSV with zero-indexed document "ids". Previously, the output was a JSON object with the real document ids, which causes the LLM to "think" for longer, for whatever reason. I didn't actually see a difference in completion speed, but emitting the first value took significantly less time with the CSV output. I also tried sending a single document per request using the old format; while that certainly improves things, the slowest request becomes the bottleneck. These are results from about 10 tries per strategy (I'd love to see others reproduce at least the `batch` vs `csv` strategy results):

`batch`: 24.7s
`chunk`: 10s
`csv`: 4.9s
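
For illustration, the difference between the two serialization strategies looks roughly like this; the helper names and document shape below are made up for the sketch and are not the actual implementation:

interface Suggestion {
  id: string; // the real Elasticsearch document id
  text: string;
}

// Old approach (illustrative): a JSON array carrying the real document ids,
// which the LLM then has to echo back verbatim in its answer.
function toJsonPayload(suggestions: Suggestion[]): string {
  return JSON.stringify(suggestions.map(({ id, text }) => ({ id, text })));
}

// New approach (illustrative): CSV rows keyed by a zero-based index, so the
// LLM only has to emit short numeric ids alongside its scores.
function toCsvPayload(suggestions: Suggestion[]): string {
  return suggestions
    .map((suggestion, index) => `${index},${suggestion.text}`)
    .join('\n');
}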

@apmmachine (Contributor) commented:

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@miltonhultgren (Contributor) commented:

(It's late so maybe I'm just being stupid but...)
It feels like the PR is missing some changes?
Like I don't see the change to the CSV format, and it looks like you're running 1 request per document in parallel in this version.

@dgieselaar (Member Author) commented:

@miltonhultgren lol, thanks - committed the wrong stash 😭 I ran the chunk one last and forgot to revert to the csv strategy. Will fix.

@dgieselaar (Member Author) commented:

@miltonhultgren should be ready now. One thing to add here: I'm now sending over the entire conversation as context for the LLM to evaluate. I think that leads to better results than just sending over the current question (e.g. consider follow-up questions).

@sorenlouv (Member) commented:

Previously, it was a JSON object, with the real document ids. This causes the LLM to "think" for longer, for whatever reason.

Does this mean we are optimizing for implementation details of OpenAI's model that could change at any time?
Perhaps we should invest in automation to monitor response times from the LLM?

@sorenlouv (Member) left a review:

Mostly nits and questions

@@ -87,12 +87,17 @@ export function registerRecallFunction({
messages.filter((message) => message.message.role === MessageRole.User)
@sorenlouv (Member) commented Feb 8, 2024:

nit: this is a good use case for findLast (well-supported and not nearly used enough :D )

Suggested change
messages.filter((message) => message.message.role === MessageRole.User)
messages.findLast((message) => message.message.role === MessageRole.User)

@dgieselaar (Member Author) replied Feb 8, 2024:

I did not know this exists 😄 unfortunately TS doesn't know about it either (??)

@sorenlouv (Member) replied Feb 8, 2024:

unfortunately TS doesn't know about it either (??)

Argh, I see that now. Odd since CanIUse reports mainstream support in all browsers we care about since at least 2022.

Edit: seems like we need to wait until TypeScript 5.
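
For reference, the findLast typings ship with the ES2023 lib in TypeScript 5.0+, at which point the filter-based lookup can become a single reverse search. A minimal sketch, with stand-in types for the real Kibana message shapes:

// Requires "lib": ["ES2023"] (or newer) in tsconfig.json, available from TypeScript 5.0.
enum MessageRole {
  User = 'user',
  Assistant = 'assistant',
}

interface Message {
  message: { role: MessageRole; content?: string };
}

function lastUserMessage(messages: Message[]): Message | undefined {
  // findLast walks the array from the end and returns the first match it finds.
  return messages.findLast((message) => message.message.role === MessageRole.User);
}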


const queriesOrUserPrompt = nonEmptyQueries.length
? nonEmptyQueries
: compact([userMessage?.message.content]);
(Member) commented:

Why not always filter by the user query combined with the query from the LLM? Is that too restrictive?

(Member Author) replied:

What I'd like to investigate is having the LLM decide when to recall (other than for the first message). E.g., if somebody asks "what does the following error mean: ..." and then "what are the consequences of this error", the latter doesn't really need a recall. If they ask "how does this affect my checkout service", it does. The LLM should be able to classify this and rewrite the question in a way that does not require us to send over the entire conversation. In that case, the query it chooses should be the only thing we use. So I'm preparing for that future.
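
Purely as an illustration of that future direction (not part of this PR), the LLM's output could take a shape along these lines:

// Illustrative only: a possible shape for an LLM-driven recall decision.
interface RecallDecision {
  // Whether this turn actually needs a knowledge-base recall.
  shouldRecall: boolean;
  // A standalone rewrite of the user's question (e.g. "how does this error
  // affect the checkout service"), so the whole conversation no longer has
  // to be sent along with the recall request.
  rewrittenQuery?: string;
}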

Comment on lines 181 to 182
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 7,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
@sorenlouv (Member) commented Feb 8, 2024:

I've been wondering about this before: why is the scoring interval 0-7? Is it something magic?
Btw, it looks like you expect a score between 0 and 10.

Suggested change
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 7,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 10,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in

(Member Author) replied:

@miltonhultgren any thoughts here on 0 to 7 vs 10?

@@ -250,18 +229,27 @@ async function scoreSuggestions({
(
await client.chat('score_suggestions', {
connectorId,
messages: [extendedSystemMessage, newUserMessage],
messages: [...messages.slice(-1), newUserMessage],
@sorenlouv (Member) commented Feb 8, 2024:

nit: Can we use last instead? (my head spins when doing array math)

Suggested change
messages: [...messages.slice(-1), newUserMessage],
messages: [last(messages), newUserMessage],

(Member Author) replied:

It's a typo 😄 - it's supposed to be "everything except the last", so `.slice(0, -1)`. Will correct.
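
For clarity, the difference between the two calls (illustrative values):

const messages = ['system', 'user', 'assistant', 'user'];

messages.slice(-1);    // ['user']  (only the last element)
messages.slice(0, -1); // ['system', 'user', 'assistant']  (everything except the last)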

Comment on lines +244 to +251
const scores = scoresAsString.split('\n').map((line) => {
const [index, score] = line
.split(',')
.map((value) => value.trim())
.map(Number);

return { id: suggestions[index].id, score };
});
@sorenlouv (Member) commented Feb 8, 2024:

How confident are we that this is the format the LLM will respond with (seeing that we'll support multiple LLMs soon)? Should we handle cases where the LLM responds in a non-CSV format?

(Member Author) replied:

I'll have a look in the Bedrock PR, but I don't think we should expect too much from or invest too much in LLMs other than OpenAI, until there is an LLM that has similar performance.
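
That said, a more defensive version of the parsing step above could look roughly like this; it is only a sketch (not what the PR implements) that drops any line that doesn't parse as a valid index,score pair:

interface Suggestion {
  id: string;
}

interface SuggestionScore {
  id: string;
  score: number;
}

// Illustrative: tolerate stray prose or malformed rows in the LLM response by
// keeping only lines that parse as "index,score" with an index that exists.
function parseScores(scoresAsString: string, suggestions: Suggestion[]): SuggestionScore[] {
  return scoresAsString
    .split('\n')
    .map((line) => line.split(',').map((value) => Number(value.trim())))
    .filter(
      ([index, score]) =>
        Number.isInteger(index) &&
        index >= 0 &&
        index < suggestions.length &&
        Number.isFinite(score)
    )
    .map(([index, score]) => ({ id: suggestions[index].id, score }));
}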

Comment on lines 538 to 541
.then(
() => {},
() => {}
)
@sorenlouv (Member) commented Feb 8, 2024:

Is this `then` necessary?

(Member Author) replied:

No, I'm just doing this to catch the error so it doesn't result in an unhandled promise rejection. But I don't need a `then`; I can just use a `catch`.
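
In other words, the two no-op callbacks above collapse into a single catch. An illustrative stand-in:

// Illustrative stand-in for the fire-and-forget call in the PR: swallow the
// rejection so it never surfaces as an unhandled promise rejection.
const fireAndForget: Promise<void> = Promise.reject(new Error('boom'));
fireAndForget.catch(() => {});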

@sorenlouv (Member) commented Feb 8, 2024:

I'd love to see others reproduce at least the batch vs csv strategy results

How can I do that? I'm already capturing traces to a private cluster. How can I benchmark batch vs csv?

@dgieselaar (Member Author) commented:

I'd love to see others reproduce at least the batch vs csv strategy results

How can I do that? I'm already capturing traces to a private cluster. How can I benchmark batch vs csv?

In the chat function, add the following line:

span?.setLabel('recall_strategy', 'batch') // or 'csv'

depending on which strategy you are using. Revert the commits that change the strategy. You can then either inspect the logs or go to https://kibana-cloud-apm.elastic.dev/ to inspect your spans.

@dgieselaar (Member Author) commented:

Previously, it was a JSON object, with the real document ids. This causes the LLM to "think" for longer, for whatever reason.

Does this mean we are optimizing for implementation details in openai's model that could change at any time? Perhaps we should invest in automation to monitor response time from the LLM?

I'm hoping we can use APM for this, or do you want to run a performance test at set intervals outside of regular usage?

@dgieselaar (Member Author) commented:

@elasticmachine merge upstream

.finally(() => {
this.dependencies.logger.debug(
`Received first value after ${Math.round(performance.now() - now)}ms${
spanId ? ` (${spanId})` : ''
@klacabane (Contributor) commented Feb 8, 2024:

Are these intermediate metrics only available in the logs, or can we somehow store them in the span?

(Member Author) replied:

We can, as labels, but I'm not sure it has tremendous value - adding labels changes the mapping, so I'm a little wary of adding more. Let's see if we need it.

@kibana-ci (Collaborator) commented:

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dgieselaar merged commit fc58a0d into elastic:main Feb 8, 2024
17 checks passed
@kibanamachine (Contributor) commented:

💔 All backports failed

Branch: 8.12 - Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 176428

Questions?

Please refer to the Backport tool documentation

@dgieselaar (Member Author) commented:

💚 All backports created successfully

Branch: 8.12

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Feb 9, 2024
(cherry picked from commit fc58a0d)

# Conflicts:
#	x-pack/plugins/observability_ai_assistant/server/functions/recall.ts
#	x-pack/plugins/observability_ai_assistant/server/service/client/index.ts
dgieselaar added a commit that referenced this pull request Feb 9, 2024
# Backport

This will backport the following commits from `main` to `8.12`:
- [Obs AI Assistant] Improve recall speed (#176428)
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024