
[Obs AI Assistant] Improve recall speed #176428

Merged · 16 commits · Feb 8, 2024

Conversation

@dgieselaar (Member) commented Feb 7, 2024

Improves recall speed by outputting as CSV with zero-indexed document "ids". Previously, the output was a JSON object with the real document ids, which causes the LLM to "think" for longer, for whatever reason. I didn't actually see a difference in completion speed, but emitting the first value took significantly less time with the CSV output. I also tried sending a single document per request using the old format; while that certainly improves things, the slowest request becomes the bottleneck. These are results from about 10 tries per strategy (I'd love to see others reproduce at least the `batch` vs `csv` strategy results):

`batch`: 24.7s
`chunk`: 10s
`csv`: 4.9s
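
For illustration, the difference between the two serialization strategies looks roughly like this; the helper names and document shape below are made up for the sketch and are not the actual implementation:

interface Suggestion {
  id: string; // the real Elasticsearch document id
  text: string;
}

// Old approach (illustrative): a JSON array carrying the real document ids,
// which the LLM then has to echo back verbatim in its answer.
function toJsonPayload(suggestions: Suggestion[]): string {
  return JSON.stringify(suggestions.map(({ id, text }) => ({ id, text })));
}

// New approach (illustrative): CSV rows keyed by a zero-based index, so the
// LLM only has to emit short numeric ids alongside its scores.
function toCsvPayload(suggestions: Suggestion[]): string {
  return suggestions
    .map((suggestion, index) => `${index},${suggestion.text}`)
    .join('\n');
}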

@apmmachine (Contributor) commented:

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@miltonhultgren (Contributor) commented:

(It's late so maybe I'm just being stupid but...)
It feels like the PR is missing some changes?
Like I don't see the change to the CSV format, and it looks like you're running 1 request per document in parallel in this version.

@dgieselaar (Member Author) commented:

@miltonhultgren lol, thanks - committed the wrong stash 😭 I ran the chunk one last and forgot to revert to the csv strategy. Will fix.

@dgieselaar (Member Author) commented:

@miltonhultgren should be ready now. One thing to add here: I'm now sending over the entire conversation as context for the LLM to evaluate. I think that leads to better results than just sending over the current question (e.g. consider follow-up questions).

@sorenlouv (Member) commented:

Previously, it was a JSON object, with the real document ids. This causes the LLM to "think" for longer, for whatever reason.

Does this mean we are optimizing for implementation details of OpenAI's model that could change at any time?
Perhaps we should invest in automation to monitor response times from the LLM?

@sorenlouv (Member) left a review:

Mostly nits and questions

@@ -87,12 +87,17 @@ export function registerRecallFunction({
messages.filter((message) => message.message.role === MessageRole.User)
@sorenlouv (Member) commented Feb 8, 2024:

nit: this is a good use case for findLast (well-supported and not nearly used enough :D )

Suggested change
messages.filter((message) => message.message.role === MessageRole.User)
messages.findLast((message) => message.message.role === MessageRole.User)

@dgieselaar (Member Author) replied Feb 8, 2024:

I did not know this exists 😄 unfortunately TS doesn't know about it either (??)

@sorenlouv (Member) replied Feb 8, 2024:

unfortunately TS doesn't know about it either (??)

Argh, I see that now. Odd since CanIUse reports mainstream support in all browsers we care about since at least 2022.

Edit: seems like we need to wait until TypeScript 5.
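
For reference, the findLast typings ship with the ES2023 lib in TypeScript 5.0+, at which point the filter-based lookup can become a single reverse search. A minimal sketch, with stand-in types for the real Kibana message shapes:

// Requires "lib": ["ES2023"] (or newer) in tsconfig.json, available from TypeScript 5.0.
enum MessageRole {
  User = 'user',
  Assistant = 'assistant',
}

interface Message {
  message: { role: MessageRole; content?: string };
}

function lastUserMessage(messages: Message[]): Message | undefined {
  // findLast walks the array from the end and returns the first match it finds.
  return messages.findLast((message) => message.message.role === MessageRole.User);
}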


const queriesOrUserPrompt = nonEmptyQueries.length
? nonEmptyQueries
: compact([userMessage?.message.content]);
(Member) commented:

Why not always filter by the user query combined with the query from the LLM? Is that too restrictive?

(Member Author) replied:

What I'd like to investigate is having the LLM decide when to recall (other than for the first message). E.g., if somebody asks "what does the following error mean: ..." and then "what are the consequences of this error", the latter doesn't really need a recall. If they ask "how does this affect my checkout service", it does. The LLM should be able to classify this and rewrite the question in a way that does not require us to send over the entire conversation. In that case, the query it chooses should be the only thing we use. So I'm preparing for that future.
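
Purely as an illustration of that future direction (not part of this PR), the LLM's output could take a shape along these lines:

// Illustrative only: a possible shape for an LLM-driven recall decision.
interface RecallDecision {
  // Whether this turn actually needs a knowledge-base recall.
  shouldRecall: boolean;
  // A standalone rewrite of the user's question (e.g. "how does this error
  // affect the checkout service"), so the whole conversation no longer has
  // to be sent along with the recall request.
  rewrittenQuery?: string;
}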

Comment on lines 181 to 182
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 7,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
@sorenlouv (Member) commented Feb 8, 2024:

I've been wondering about this before: why is the scoring interval 0-7? Is it something magic?
Btw, it looks like you expect a score between 0 and 10.

Suggested change
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 7,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in
dedent(`Given the following question, score the documents that are relevant to the question. on a scale from 0 to 10,
0 being completely relevant, and 10 being extremely relevant. Information is relevant to the question if it helps in

(Member Author) replied:

@miltonhultgren any thoughts here on 0 to 7 vs 10?

@@ -250,18 +229,27 @@ async function scoreSuggestions({
(
await client.chat('score_suggestions', {
connectorId,
messages: [extendedSystemMessage, newUserMessage],
messages: [...messages.slice(-1), newUserMessage],
@sorenlouv (Member) commented Feb 8, 2024:

nit: Can we use last instead? (my head spins when doing array math)

Suggested change
messages: [...messages.slice(-1), newUserMessage],
messages: [last(messages), newUserMessage],

(Member Author) replied:

It's a typo 😄 - it's supposed to be "everything except the last", so `.slice(0, -1)`. Will correct.
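
For clarity, the difference between the two calls (illustrative values):

const messages = ['system', 'user', 'assistant', 'user'];

messages.slice(-1);    // ['user']  (only the last element)
messages.slice(0, -1); // ['system', 'user', 'assistant']  (everything except the last)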

Comment on lines +244 to +251
const scores = scoresAsString.split('\n').map((line) => {
const [index, score] = line
.split(',')
.map((value) => value.trim())
.map(Number);

return { id: suggestions[index].id, score };
});
@sorenlouv (Member) commented Feb 8, 2024:

How confident are we that this is the format the LLM will respond with (seeing that we'll support multiple LLMs soon)? Should we handle cases where the LLM responds in a non-CSV format?

(Member Author) replied:

I'll have a look in the Bedrock PR, but I don't think we should expect too much from or invest too much in LLMs other than OpenAI, until there is an LLM that has similar performance.
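
That said, a more defensive version of the parsing step above could look roughly like this; it is only a sketch (not what the PR implements) that drops any line that doesn't parse as a valid index,score pair:

interface Suggestion {
  id: string;
}

interface SuggestionScore {
  id: string;
  score: number;
}

// Illustrative: tolerate stray prose or malformed rows in the LLM response by
// keeping only lines that parse as "index,score" with an index that exists.
function parseScores(scoresAsString: string, suggestions: Suggestion[]): SuggestionScore[] {
  return scoresAsString
    .split('\n')
    .map((line) => line.split(',').map((value) => Number(value.trim())))
    .filter(
      ([index, score]) =>
        Number.isInteger(index) &&
        index >= 0 &&
        index < suggestions.length &&
        Number.isFinite(score)
    )
    .map(([index, score]) => ({ id: suggestions[index].id, score }));
}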

Comment on lines 538 to 541
.then(
() => {},
() => {}
)
@sorenlouv (Member) commented Feb 8, 2024:

Is this `then` necessary?

(Member Author) replied:

No, I'm just doing this to catch the error so it doesn't result in an unhandled promise rejection. But I don't need a `then`; I can just use a `catch`.
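
In other words, the two no-op callbacks above collapse into a single catch. An illustrative stand-in:

// Illustrative stand-in for the fire-and-forget call in the PR: swallow the
// rejection so it never surfaces as an unhandled promise rejection.
const fireAndForget: Promise<void> = Promise.reject(new Error('boom'));
fireAndForget.catch(() => {});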

@sorenlouv (Member) commented Feb 8, 2024:

I'd love to see others reproduce at least the batch vs csv strategy results

How can I do that? I'm already capturing traces to a private cluster. How can I benchmark batch vs csv?

@dgieselaar (Member Author) commented:

I'd love to see others reproduce at least the batch vs csv strategy results

How can I do that? I'm already capturing traces to a private cluster. How can I benchmark batch vs csv?

In the chat function, add the following line:

span?.setLabel('recall_strategy', 'batch') // or 'csv'

depending on which strategy you are using. Revert the commits that change the strategy. You can then either inspect the logs or go to https://kibana-cloud-apm.elastic.dev/ to inspect your spans.

@dgieselaar (Member Author) commented:

Previously, it was a JSON object, with the real document ids. This causes the LLM to "think" for longer, for whatever reason.

Does this mean we are optimizing for implementation details in openai's model that could change at any time? Perhaps we should invest in automation to monitor response time from the LLM?

I'm hoping we can use APM for this, or do you want to run a performance test at set intervals outside of regular usage?

@dgieselaar (Member Author) commented:

@elasticmachine merge upstream

.finally(() => {
this.dependencies.logger.debug(
`Received first value after ${Math.round(performance.now() - now)}ms${
spanId ? ` (${spanId})` : ''
@klacabane (Contributor) commented Feb 8, 2024:

Are these intermediate metrics only available in the logs, or can we somehow store them in the span?

(Member Author) replied:

We can, as labels, but I'm not sure it has tremendous value - adding labels changes the mapping, so I'm a little wary of adding more. Let's see if we need it.

@kibana-ci (Collaborator) commented:

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dgieselaar merged commit fc58a0d into elastic:main Feb 8, 2024
17 checks passed
@kibanamachine (Contributor) commented:

💔 All backports failed

Branch: 8.12 - Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 176428

Questions?

Please refer to the Backport tool documentation

@dgieselaar (Member Author) commented:

💚 All backports created successfully

Branch: 8.12

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Feb 9, 2024
(cherry picked from commit fc58a0d)

# Conflicts:
#	x-pack/plugins/observability_ai_assistant/server/functions/recall.ts
#	x-pack/plugins/observability_ai_assistant/server/service/client/index.ts
dgieselaar added a commit that referenced this pull request Feb 9, 2024
# Backport

This will backport the following commits from `main` to `8.12`:
- [Obs AI Assistant] Improve recall speed (#176428)
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024