
[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) #194166

Merged: 25 commits merged into main on Oct 7, 2024

Conversation

e40pud (Contributor) commented Sep 26, 2024

Summary

This PR improves the integration of OSS models via the OpenAI connector (https://github.com/elastic/security-team/issues/10416).

OSS models do not support "native" tool calling, so we implement it via prompting. To do that we use a structured chat agent (LangChain.createStructuredChatAgent) with an appropriate prompt (see STRUCTURED_SYSTEM_PROMPT in x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/translations.ts).
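
For illustration, here is a minimal sketch of how a structured chat agent is wired up with the LangChain JS API. The graph in this PR differs; llm, tools, and the prompt text below are placeholders rather than the real STRUCTURED_SYSTEM_PROMPT:

import { AgentExecutor, createStructuredChatAgent } from 'langchain/agents';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import type { BaseChatModel } from '@langchain/core/language_models/chat_models';
import type { StructuredToolInterface } from '@langchain/core/tools';

// Assumed to be in scope: the connector-backed chat model and the assistant tools.
declare const llm: BaseChatModel;
declare const tools: StructuredToolInterface[];

// The system prompt teaches the model a JSON "action"/"action_input" format
// and must expose the {tools} and {tool_names} placeholders that
// createStructuredChatAgent fills in. This prompt is a stand-in, not the
// PR's STRUCTURED_SYSTEM_PROMPT.
const prompt = ChatPromptTemplate.fromMessages([
  [
    'system',
    'Respond to the human. You have access to the following tools:\n\n{tools}\n\n' +
      'To use a tool, reply with a JSON blob with "action" (one of {tool_names}) and "action_input".',
  ],
  ['human', '{input}\n\n{agent_scratchpad}'],
]);

const agent = await createStructuredChatAgent({ llm, tools, prompt });
const executor = new AgentExecutor({ agent, tools });
const { output } = await executor.invoke({ input: 'Which hosts triggered alerts today?' });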

One important note: since we use the OpenAI connector to connect to OSS models, we need to recognise and distinguish such connectors. This is done in a new utils function, isOpenSourceModel, in x-pack/plugins/elastic_assistant/server/routes/utils.ts.
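
A rough sketch of what such a check can look like. The connector field names and the model-name heuristic here are assumptions for illustration, not the exact implementation:

// Illustrative only: treat an OpenAI-type connector as an OSS model when it
// points at a non-OpenAI, non-Azure endpoint, or when its default model name
// looks like a known OSS family (Llama, Mistral, ...).
const OPENAI_CHAT_URL = 'https://api.openai.com/v1/chat/completions';

interface ConnectorConfig {
  apiUrl?: string;
  apiProvider?: string;
  defaultModel?: string;
}

interface Connector {
  actionTypeId: string; // '.gen-ai' is the OpenAI connector type in Kibana
  config?: ConnectorConfig;
}

export const isOpenSourceModel = (connector?: Connector): boolean => {
  if (connector == null || connector.actionTypeId !== '.gen-ai') return false;

  const { apiUrl, apiProvider, defaultModel } = connector.config ?? {};
  const isOpenAiHosted =
    !apiUrl || apiUrl === OPENAI_CHAT_URL || apiProvider === 'Azure OpenAI';
  const looksLikeOssModel = /llama|mistral/i.test(defaultModel ?? '');

  return !isOpenAiHosted || looksLikeOssModel;
};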

To avoid parsing the tool's output in streaming mode, we enable a response step for OSS models, similar to what we do for Bedrock. This simplifies parsing during streaming and saves us the headache of fighting the JSON formatting issues returned by OSS models (especially Llama). I noticed that with Llama the final answer was returned with extra escape characters, which broke the markdown.
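
To make that concrete, here is a rough LangGraph-style sketch of the idea. Node names, the state shape, and the helpers are illustrative, not the PR's actual graph:

// Illustrative state shape: the agent step records its outcome, and
// returnValues is only present once the agent has produced a final answer.
interface AgentState {
  agentOutcome?: { returnValues?: Record<string, unknown> };
}

const isAgentFinished = (state: AgentState) =>
  state.agentOutcome?.returnValues != null;

// Assumed to be computed earlier, e.g. via isOpenSourceModel(connector).
declare const isOssModel: boolean;

// After the agent node: keep looping through tools until the agent is done,
// then send OSS models through an extra "respond" node (as Bedrock already
// gets) so the streamed final answer is plain LLM text, not agent JSON that
// would need to be parsed mid-stream.
const routeAfterAgent = (state: AgentState): 'tools' | 'respond' | 'end' =>
  !isAgentFinished(state) ? 'tools' : isOssModel ? 'respond' : 'end';

// Wiring would then look something like (END from '@langchain/langgraph'):
// graph.addConditionalEdges('agent', routeAfterAgent, { tools: 'tools', respond: 'respond', end: END });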

Tested OSS models

  • Mistral Large 2407
  • Llama 3.1 70B
  • Llama 3.1 405B (NOTE: for some reason the Azure deployment does not work right now. Working on it.)

Both model families work well with the existing ES tools. Mistral showed slightly better results than Llama.

Credentials

Use these credentials for testing Llama 3.1 70B and Mistral Large 2407:

https://p.elstc.co/paste/ofqxUGiW#RR1Pedserj9hWKm3vOw5oVEHWuwbX7i9Jl2rS7q7MRP

Notes about ES|QL generation

By default, ES|QL generation works with the ESQLKnowledgeBaseTool. The new NaturalLanguageESQLTool provides better results. To enable it, use:

xpack.securitySolution.enableExperimental:
  - 'assistantNaturalLanguageESQLTool'

Example of ES|QL generation using the Mistral model and the NaturalLanguageESQLTool:

mistral.mov

Evaluation

Experiment 1

Suite: ES|QL Generation Regression Suite
Models: Llama 3.1 405B and Mistral Large 2407
Tool: ESQLKnowledgeBaseTool
Correctness results:

Experiment 2

Suite: ES|QL Generation Regression Suite
Models: Llama 3.1 405B and Mistral Large 2407
Tool: NaturalLanguageESQLTool
Results:


e40pud added the release_note:skip, Feature:Security Assistant, and Team:Security Generative AI labels on Sep 26, 2024
e40pud self-assigned this on Sep 26, 2024
e40pud (Contributor, Author) commented Sep 27, 2024

@elasticmachine merge upstream

e40pud marked this pull request as ready for review on September 27, 2024 09:48
e40pud requested review from a team as code owners on September 27, 2024 09:48
e40pud added the backport:version, v9.0.0, and v8.16.0 labels on Sep 27, 2024
@@ -332,7 +337,7 @@ export const postEvaluateRoute = (
       return output;
     };

-    const evalOutput = await evaluate(predict, {
+    const evalOutput = evaluate(predict, {
e40pud (Contributor, Author) commented:
Removing this await solves the issue where different OSS models fail multiple tests with this error:

Error: ActionsClientChatOpenAI: an error occurred while running the action - Status code: undefined. Message: Unexpected API Error: ERR_CANCELED - canceled

This mostly happens with Llama model, but occasionally with Mistral as well.

Here is an example of those failures: https://smith.langchain.com/o/b739bf24-7ba4-4994-b632-65dd677ac74e/datasets/261dcc59-fbe7-4397-a662-ff94042f666c/compare?selectedSessions=f26a13b9-73ba-4e6d-992b-3b4eda61bb20&baseline=undefined&activeSession=f26a13b9-73ba-4e6d-992b-3b4eda61bb20

I suspect it has something to do with the slowness of the model (not proven yet), which leads to timeouts.

@spong How important is it for us to make sure that all experiments are finished before we respond? Could we instead schedule those experiments, return immediately, and log the evaluation results, handling errors differently? Something like this:

import { Client } from 'langsmith';
import { evaluate } from 'langsmith/evaluation';

evaluate(predict, {
  data: datasetName ?? '',
  evaluators: [], // Evals to be managed in LangSmith for now
  experimentPrefix: name,
  client: new Client({ apiKey: langSmithApiKey }),
  // prevent rate limiting and unexpected multiple experiment runs
  maxConcurrency: 5,
})
  .then((evalOutput) => {
    logger.debug(`runResp:\n ${JSON.stringify(evalOutput, null, 2)}`);
  })
  .catch((err) => {
    // log instead of failing the route, since the experiment runs in the background
    logger.error(`evaluation error:\n ${JSON.stringify(err, null, 2)}`);
  });

We would also need to add better error handling in that case.

Member commented:
Yeah that should be fine. We used to use the result output and write that to the local ES, but don't do that anymore, and so just debug log the output since we've been using LangSmith for analysis after the fact. Really we just need to know if the eval was successful from the API perspective.

That said, I'm curious why this await would cause internal issues with the chat clients. It shouldn't change anything about the concurrency or executions, but it seems that with it, previous runs are getting cancelled? Perhaps there's some shared state between the chat clients, like @stephmilovic fixed with createLlmInstance over in #190004?

Member commented:
If you turn maxConcurrency down to 1 and run with the await, do you still see it? Curious whether rate limiting or something similar might be cancelling the previous runs.

e40pud (Contributor, Author) commented Oct 2, 2024:

So, I ran some simple stress testing against our Llama deployment and reproduced this error, which I suspect is the reason our evaluations sometimes fail:

{
    "statusCode": 429,
    "message": "Rate Limit on Number of Tokens is exceeded. Retry after 8 seconds."
}


I don't think this is something we can control in code; it needs to be configured in the LLM deployment if we need it for evaluations. I found this article, https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest#assign-quota, which shows how to address that in the deployment settings (sent to James as well). It provides an option to assign a Tokens-Per-Minute (TPM) quota to the LLM deployment.

Btw, lowering maxConcurrency did not help with the issue; for whatever reason, setting it to 1 made requests fail even faster.

e40pud (Contributor, Author) commented:

To lower the chance of hitting rate-limit failures, I removed the await and restructured the code to log results and errors at the end of execution. This should allow users to avoid 429 errors in most cases.

After talking to @jamesspi, we agreed that this is something users can fix on their end by increasing the quota and resources. We would prefer to keep things simple rather than complicating our code with a special solution for 429s during evaluations.

@spong what do you think?

e40pud (Contributor, Author) commented Sep 30, 2024

@elasticmachine merge upstream

e40pud (Contributor, Author) commented Oct 2, 2024

@elasticmachine merge upstream

isOpenAiType &&
  (!connectorApiUrl ||
    connectorApiUrl === OPENAI_CHAT_URL ||
    connectorApiProvider === OpenAiProviderType.AzureAi);
Contributor commented:
Did you consider adding a new apiProvider of Open Source and relying on that for the isOpenSourceModel determination?

e40pud (Contributor, Author) replied:
Yes, my second PR is about that: #194831

We will add another provider to cover "other" OpenAI-compatible services. It will work for all newly added connectors, but we would also like to support already created connectors, i.e. OSS models via the OpenAI provider.
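
For context, a sketch of what the extended provider type could look like; the actual enum lives in the stack connectors plugin, and the value names here are assumptions based on the follow-up PR:

// Sketch: adding an "Other" provider to the OpenAI connector lets users mark
// OpenAI-compatible OSS endpoints explicitly, so new connectors no longer
// rely on URL-based inference like the check above.
export enum OpenAiProviderType {
  OpenAi = 'OpenAI',
  AzureAi = 'Azure OpenAI',
  Other = 'Other',
}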

e40pud (Contributor, Author) commented Oct 7, 2024

@elasticmachine merge upstream

kibana-ci (Collaborator) commented:
💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id                    before  after  diff
integrationAssistant  562     563    +1
securitySolution      5925    5926   +1
total                                +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id                before  after  diff
elasticAssistant  37      38     +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id                before  after   diff
securitySolution  20.6MB  20.6MB  +255.0B

Unknown metric groups

API count

id                before  after  diff
elasticAssistant  52      53     +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @e40pud

stephmilovic (Contributor) left a comment:
LGTM! Thank you for the improvement and for your patience fixing the timeout error. Nice work @e40pud

elasticmachine (Contributor) commented:
💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id                    before  after  diff
integrationAssistant  562     563    +1
securitySolution      5925    5926   +1
total                                +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id                before  after  diff
elasticAssistant  37      38     +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id                before  after   diff
securitySolution  20.6MB  20.6MB  +285.0B

Unknown metric groups

API count

id                before  after  diff
elasticAssistant  52      53     +1

cc @e40pud

stephmilovic merged commit 1ee648d into elastic:main on Oct 7, 2024
41 checks passed
kibanamachine (Contributor) commented:

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11224445157

kibanamachine (Contributor) commented:

💚 All backports created successfully

Branch: 8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation.

kibanamachine added a commit that referenced this pull request Oct 7, 2024
[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) (#194166) (#195324)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) (#194166)](#194166)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Ievgen Sorokopud <ievgen.sorokopud@elastic.co>