[Obs AI Assistant] Improve recall speed #176428
Conversation
(It's late so maybe I'm just being stupid but...)
@miltonhultgren lol, thanks - committed the wrong stash 😭 I ran the
@miltonhultgren should be ready now. One thing to add here: I'm sending over the entire conversation now as context for the LLM to evaluate. I think that leads to better results than just sending over the current question (e.g. consider follow-up questions).
Does this mean we are optimizing for implementation details in OpenAI's model that could change at any time?
Mostly nits and questions
x-pack/plugins/observability_ai_assistant/server/functions/recall.ts
@@ -87,12 +87,17 @@ export function registerRecallFunction({
messages.filter((message) => message.message.role === MessageRole.User)
nit: this is a good use case for `findLast` (well-supported and not nearly used enough :D)
- messages.filter((message) => message.message.role === MessageRole.User)
+ messages.findLast((message) => message.message.role === MessageRole.User)
I did not know this exists 😄 unfortunately TS doesn't know about it either (??)
> unfortunately TS doesn't know about it either (??)
Argh, I see that now. Odd, since CanIUse reports mainstream support in all browsers we care about since at least 2022.
Edit: seems like we need to wait until TypeScript 5.
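For context, a minimal sketch of the two options (it assumes the `messages` array and `MessageRole` enum from this plugin; it is not code from the PR): `Array.prototype.findLast` is only typed from TypeScript 5.0 onwards via the es2023 lib, so until then a reversed copy plus `find` gets the same result.

```ts
// Works on today's toolchain: copy, reverse, and take the first match,
// i.e. the last user message in the conversation.
const lastUserMessage = [...messages]
  .reverse()
  .find((message) => message.message.role === MessageRole.User);

// With TypeScript 5.0+ and "lib": ["ES2023"] in tsconfig:
// const lastUserMessage = messages.findLast(
//   (message) => message.message.role === MessageRole.User
// );
```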
const queriesOrUserPrompt = nonEmptyQueries.length
  ? nonEmptyQueries
  : compact([userMessage?.message.content]);
Why not always filter by the user query combined with the query from the LLM? Is that too restrictive?
What I'd like to investigate is having the LLM decide when to recall (other than for the first message). E.g., if somebody asks "what does the following error mean: ..." and then "what are the consequences of this error", the latter doesn't really need a recall. If they ask "how does this affect my checkout service", it does. The LLM should be able to classify this, and rewrite it in a way that does not require us to send over the entire conversation. In that case, the query it chooses should be the only thing we use. So I'm preparing for that future.
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 7,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
I've been wondering about this before: why is the scoring interval 0-7? Is it something magic?
Btw, it looks like you expect a score between 0 and 10.
- dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 7,
- 0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
+ dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 10,
+ 0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
@miltonhultgren any thoughts here on 0 to 7 vs 10?
@@ -250,18 +229,27 @@ async function scoreSuggestions({
  (
    await client.chat('score_suggestions', {
      connectorId,
-     messages: [extendedSystemMessage, newUserMessage],
+     messages: [...messages.slice(-1), newUserMessage],
nit: Can we use `last` instead? (my head spins when doing array math)
- messages: [...messages.slice(-1), newUserMessage],
+ messages: [last(messages), newUserMessage],
It's a typo 😄, it's supposed to be "everything except the last", so `.slice(0, -1)`. Will correct.
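A standalone sketch of the difference (not code from this PR; the lodash helpers are just the ones mentioned above):

```ts
import { dropRight, last } from 'lodash';

const items = ['system', 'user', 'assistant', 'user'];

items.slice(-1);    // ['user']: only the last element (the typo)
items.slice(0, -1); // ['system', 'user', 'assistant']: everything except the last (the intent)
dropRight(items);   // same as slice(0, -1), without the array math
last(items);        // 'user': a single element, not an array
```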
const scores = scoresAsString.split('\n').map((line) => {
  const [index, score] = line
    .split(',')
    .map((value) => value.trim())
    .map(Number);

  return { id: suggestions[index].id, score };
});
How confident are we that this is the format the LLM will respond with (seeing we support multiple LLMs soon)? Should we handle the cases where the LLM will respond in non-csv format?
I'll have a look in the Bedrock PR, but I don't think we should expect too much from or invest too much in LLMs other than OpenAI, until there is an LLM that has similar performance.
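If we do end up wanting a guard, something along these lines could work. This is a hedged sketch, not code from the PR; `parseScores` and the argument shapes are made up for illustration:

```ts
// Ignore any line that isn't a plain "<index>,<score>" pair, so surrounding
// prose, markdown fences or headers from the LLM don't break scoring.
function parseScores(
  scoresAsString: string,
  suggestions: Array<{ id: string }>
): Array<{ id: string; score: number }> {
  return scoresAsString
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /^\d+\s*,\s*\d+$/.test(line))
    .map((line) => {
      const [index, score] = line.split(',').map((value) => Number(value.trim()));
      return { id: suggestions[index]?.id, score };
    })
    .filter((entry): entry is { id: string; score: number } => entry.id !== undefined);
}
```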
.then(
  () => {},
  () => {}
)
Is this `then` necessary?
No, just doing this to catch an error so it doesn't result in an unhandled promise rejection. But I don't need a `then`, I can just use a `catch`.
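For anyone reading along, the two forms behave the same here; a quick sketch with a stand-in promise rather than the actual chat call:

```ts
// Stand-in for the fire-and-forget call discussed above.
const somePromise = Promise.reject(new Error('boom'));

// Current form: the second argument to `then` swallows the rejection.
somePromise.then(
  () => {},
  () => {}
);

// Equivalent and more direct: only a rejection handler is needed.
somePromise.catch(() => {});
```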
How can I do that? I'm already capturing traces to a private cluster. How can I benchmark
In the chat function, add the following line:
depending on which strategy you are using. Revert the commits that change the strategy. You can then either inspect the logs or go to https://kibana-cloud-apm.elastic.dev/ to inspect your spans.
I'm hoping we can use APM for this, or do you want to run a performance test at set intervals outside of regular usage?
@elasticmachine merge upstream
.finally(() => {
  this.dependencies.logger.debug(
    `Received first value after ${Math.round(performance.now() - now)}ms${
      spanId ? ` (${spanId})` : ''
Are these intermediate metrics only available in logs, or can we somehow store them in the span?
We can, as labels, but I'm not sure it has tremendous value - adding labels changes the mapping, so I'm a little wary of adding more. Let's see if we need it.
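If we ever do want the timing on the span, a rough sketch using the elastic-apm-node agent API (how Kibana actually exposes the agent may differ, so treat this as an assumption):

```ts
import apm from 'elastic-apm-node';

const start = performance.now();

// ...once the first value has been received:
const timeToFirstValueMs = Math.round(performance.now() - start);

// Attach the timing as a label on the active span. As noted above, labels are
// indexed, so every new label key grows the mapping.
apm.currentSpan?.addLabels({ time_to_first_value_ms: timeToFirstValueMs });
```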
💚 Build Succeeded
💔 All backports failed
Manual backport: to create the backport manually, run:
Questions? Please refer to the Backport tool documentation
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation
Improves recall speed by outputting as CSV with zero-indexed document "ids". Previously, it was a JSON object, with the real document ids. This causes the LLM to "think" for longer, for whatever reason. I didn't actually see a difference in completion speed, but emitting the first value took significantly less time when using the CSV output. I also tried sending a single document per request using the old format, and while that certainly improves things, the slowest request becomes the bottleneck. These are results from about 10 tries per strategy (I'd love to see others reproduce at least the `batch` vs `csv` strategy results):

`batch`: 24.7s
`chunk`: 10s
`csv`: 4.9s

---------

Co-authored-by: Søren Louv-Jansen <sorenlouv@gmail.com>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

(cherry picked from commit fc58a0d)

# Conflicts:
# x-pack/plugins/observability_ai_assistant/server/functions/recall.ts
# x-pack/plugins/observability_ai_assistant/server/service/client/index.ts
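To make the change concrete, an illustration of the two output shapes (field names and document ids are invented; only the CSV shape reflects the PR):

```ts
// Old style: a JSON object that echoes the real document ids back.
const jsonStyleResponse = `{
  "scores": [
    { "id": "doc-f81a3c", "score": 7 },
    { "id": "doc-09bd21", "score": 2 }
  ]
}`;

// New style: zero-indexed document ids, one "index,score" pair per line.
// Fewer tokens to generate, and the first line is usable as soon as it streams in.
const csvStyleResponse = `0,7
1,2`;
```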
# Backport

This will backport the following commits from `main` to `8.12`:
- [[Obs AI Assistant] Improve recall speed (#176428)](#176428)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)