[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) #194166
Conversation
1. Make sure that we check the last appearance of `'"action_input": "'`.
2. Make sure that we do not cut streaming in case the expected `\` before the `*` character was already added to the final message in the previous round.
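For illustration, here is a minimal TypeScript sketch of those two rules, assuming the parser receives the raw streamed text and the final message accumulated in previous rounds; the names and shapes are hypothetical and the actual streaming parser in this PR may differ:

```ts
const ACTION_INPUT_MARKER = '"action_input": "';

// Rule 1: anchor on the LAST occurrence of the marker, in case the model
// echoed the marker earlier in the streamed output.
function actionInputStart(streamed: string): number {
  const idx = streamed.lastIndexOf(ACTION_INPUT_MARKER);
  return idx === -1 ? -1 : idx + ACTION_INPUT_MARKER.length;
}

// Rule 2: do not cut the stream at a `*` when the escaping `\` was already
// appended to the final message in the previous round; otherwise the escape
// sequence `\*` would be split and the output truncated.
function shouldKeepStreaming(finalMessage: string, nextChunk: string): boolean {
  return finalMessage.endsWith('\\') && nextChunk.startsWith('*');
}
```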
@elasticmachine merge upstream
```diff
@@ -332,7 +337,7 @@ export const postEvaluateRoute = (
       return output;
     };

-    const evalOutput = await evaluate(predict, {
+    const evalOutput = evaluate(predict, {
```
Removing this `await` solves the issue where different OSS models fail multiple tests with the error:

Error: ActionsClientChatOpenAI: an error occurred while running the action - Status code: undefined. Message: Unexpected API Error: ERR_CANCELED - canceled

This mostly happens with the Llama model, but occasionally with Mistral as well.

Here is an example of those failures: https://smith.langchain.com/o/b739bf24-7ba4-4994-b632-65dd677ac74e/datasets/261dcc59-fbe7-4397-a662-ff94042f666c/compare?selectedSessions=f26a13b9-73ba-4e6d-992b-3b4eda61bb20&baseline=undefined&activeSession=f26a13b9-73ba-4e6d-992b-3b4eda61bb20

I suspect it has something to do with the slowness of the model (not proven yet), which leads to timeouts.

@spong How important is it for us to make sure that all experiments are finished before we respond? Can we just schedule those experiments, return, and log the evaluation results, handling errors differently here? Something like this:
```ts
evaluate(predict, {
  data: datasetName ?? '',
  evaluators: [], // Evals to be managed in LangSmith for now
  experimentPrefix: name,
  client: new Client({ apiKey: langSmithApiKey }),
  // prevent rate limiting and unexpected multiple experiment runs
  maxConcurrency: 5,
}).then((evalOutput) => {
  logger.debug(`runResp:\n ${JSON.stringify(evalOutput, null, 2)}`);
}).catch(() => {
  // handle error here
});
```
Also, we would need to add better error handling in this case; see the sketch below.
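A sketch of what that error handling could look like, assuming `predict`, `datasetName`, `name`, `langSmithApiKey`, and `logger` are in scope exactly as in the route above, and that `evaluate`/`Client` come from the LangSmith JS SDK:

```ts
import { Client } from 'langsmith';
import { evaluate } from 'langsmith/evaluation';

evaluate(predict, {
  data: datasetName ?? '',
  evaluators: [], // Evals to be managed in LangSmith for now
  experimentPrefix: name,
  client: new Client({ apiKey: langSmithApiKey }),
  // prevent rate limiting and unexpected multiple experiment runs
  maxConcurrency: 5,
})
  .then((evalOutput) => {
    logger.debug(`runResp:\n ${JSON.stringify(evalOutput, null, 2)}`);
  })
  .catch((err: unknown) => {
    // Log the failure instead of silently dropping the scheduled experiment.
    logger.error(
      `Evaluation experiment failed: ${err instanceof Error ? err.message : String(err)}`
    );
  });
```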
Yeah, that should be fine. We used to use the result output and write it to the local ES, but we don't do that anymore, so we just debug-log the output since we've been using LangSmith for analysis after the fact. Really we just need to know from the API perspective whether the eval was successful.
That said, I'm curious why this `await` would cause internal issues with the chat clients. It shouldn't change anything about concurrency or execution, but with it, it seems previous runs are getting cancelled? Perhaps there's some shared state between the chat clients, like @stephmilovic fixed with `createLlmInstance` over in #190004?
If you turn down `maxConcurrency` to `1` and run with the `await`, do you still see it? Curious if rate limiting or something might be cancelling the previous runs?
So, I ran some simple stress testing against our Llama deployment and was able to reproduce this error, which I suspect is the reason our evaluations sometimes fail:

```json
{
  "statusCode": 429,
  "message": "Rate Limit on Number of Tokens is exceeded. Retry after 8 seconds."
}
```

I don't think this is something we can control in code; it will need to be configured in the LLM deployment if we need it for evaluations. I found this article https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest#assign-quota which shows how to address that in the deployment settings (sent to James as well). There we have an option to assign a Tokens-Per-Minute (TPM) quota to the LLM deployment.
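Purely for illustration, handling the 429 in code would mean something like a retry-with-backoff wrapper around each prediction call. This is a hedged sketch, not something the PR adds (the decision below was to avoid this complexity), and the parsing of the "Retry after N seconds" hint is an assumption based on the payload above:

```ts
async function withRateLimitRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const message = String(err?.message ?? '');
      const isRateLimited = err?.statusCode === 429 || /rate limit/i.test(message);
      if (!isRateLimited || attempt >= maxRetries) {
        throw err;
      }
      // Honor the advertised "Retry after N seconds" hint, else back off exponentially.
      const match = /after (\d+) second/i.exec(message);
      const delayMs = (match ? Number(match[1]) : 2 ** attempt) * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```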
Btw, lowering `maxConcurrency` did not help with the issue. Setting it to `1` made requests fail even faster, for whatever reason.
To lower the chances of hitting rate-limit failures, I removed the `await` and restructured the code to log results and errors at the end of execution. This should allow users to avoid 429 errors in most cases.

After talking to @jamesspi, we agreed that this is something users can fix on their end by increasing the quota and resources. We would prefer to keep things simple rather than complicate our code with a workaround for 429s during evaluations.
@spong what do you think?
@elasticmachine merge upstream

@elasticmachine merge upstream
```ts
isOpeAiType &&
  (!connectorApiUrl ||
    connectorApiUrl === OPENAI_CHAT_URL ||
    connectorApiProvider === OpenAiProviderType.AzureAi);
```
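For context, here is a rough reconstruction of how a condition like the one quoted above could drive the `isOpenSourceModel` determination. The enum values, URL constant, and negation logic are assumptions; the real code in `utils.ts` may differ:

```ts
enum OpenAiProviderType {
  OpenAi = 'OpenAI',
  AzureAi = 'Azure OpenAI',
}

const OPENAI_CHAT_URL = 'https://api.openai.com/v1/chat/completions';

function isOpenSourceModel(args: {
  isOpenAiType: boolean;
  connectorApiUrl?: string;
  connectorApiProvider?: OpenAiProviderType;
}): boolean {
  const { isOpenAiType, connectorApiUrl, connectorApiProvider } = args;
  // An OpenAI-type connector pointing at the official OpenAI or Azure endpoints
  // is treated as a "real" OpenAI model...
  const isOfficialOpenAi =
    isOpenAiType &&
    (!connectorApiUrl ||
      connectorApiUrl === OPENAI_CHAT_URL ||
      connectorApiProvider === OpenAiProviderType.AzureAi);
  // ...anything else reached through the OpenAI connector is assumed to be an
  // OSS model behind an OpenAI-compatible API.
  return isOpenAiType && !isOfficialOpenAi;
}
```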
Did you consider adding a new `apiProvider` of `Open Source` and relying on that for the `isOpenSourceModel` determination?
Yes, my second PR is about exactly that: #194831.

We will add another provider to cover "other" OpenAI-compatible services. It will work for all newly added connectors, though we would also like to support already created connectors, i.e. OSS models configured via the OpenAI provider.
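A hypothetical sketch of the shape of that follow-up change (#194831); the actual provider name and where the enum lives are not confirmed here:

```ts
enum OpenAiProviderType {
  OpenAi = 'OpenAI',
  AzureAi = 'Azure OpenAI',
  // New provider for "other" OpenAI-compatible services hosting OSS models.
  Other = 'Other',
}
```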
@elasticmachine merge upstream
Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>
💛 Build succeeded, but was flaky

cc @e40pud
LGTM! Thank you for the improvement and for your patience fixing the timeout error. Nice work @e40pud
Starting backport for target branches: 8.x https://github.com/elastic/kibana/actions/runs/11224445157
…lastic#10416) (elastic#194166) (cherry picked from commit 1ee648d)
💚 All backports created successfully

Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation
…els (#10416) (#194166) (#195324)

# Backport

This will backport the following commits from `main` to `8.x`:

- [[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) (#194166)](#194166)

### Questions?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Ievgen Sorokopud <ievgen.sorokopud@elastic.co>
Summary
This PR improves OSS model integration via the OpenAI connector (https://github.com/elastic/security-team/issues/10416).
OSS models do not support "native" tool calling, so we need to do that via prompting. For that we use a structured chat agent (`LangChain.createStructuredChatAgent`) and an appropriate prompt (see `STRUCTURED_SYSTEM_PROMPT` in the `x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/translations.ts` file).

One important note: since we are using the OpenAI connector to connect to an OSS model, we need to recognise and distinguish that case, which is done in a new utils function `isOpenSourceModel` in the `x-pack/plugins/elastic_assistant/server/routes/utils.ts` file.

To avoid parsing the tool's output in streaming mode, we enable the response step for OSS models, similar to what we do for Bedrock. This simplifies parsing during streaming and saves us headaches from fighting the JSON formatting issues returned by OSS models (especially Llama). I noticed that with Llama the final answer was returned with extra escape characters, which breaks the markdown.
Tested OSS models
Both models work well with the existing ES tools. Mistral showed slightly better results than Llama.
Credentials
Use these credentials for testing `LLama 3.1 70B` and `Mistral Large 2407`: https://p.elstc.co/paste/ofqxUGiW#RR1Pedserj9hWKm3vOw5oVEHWuwbX7i9Jl2rS7q7MRP
Notes about ES|QL generation
By default this works with the `ESQLKnowledgeBaseTool` tool. The new `NaturalLanguageESQLTool` provides better results. To enable it use:

Example of ES|QL generation using the Mistral model and the `NaturalLanguageESQLTool` tool:

mistral.mov
Evaluation
Experiment 1

Suite: ES|QL Generation Regression Suite
Models: `Llama 3.1 405B` and `Mistral Large 2407`
Tool: `ESQLKnowledgeBaseTool`

Correctness results:

Experiment 2

Suite: ES|QL Generation Regression Suite
Models: `Llama 3.1 405B` and `Mistral Large 2407`
Tool: `NaturalLanguageESQLTool`

Results: