[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) #194166
Conversation
1. Make sure that we check the last appearance of `'"action_input": "'`.
2. Make sure that we do not cut streaming in case the expected `\` before the `*` character was already added to the final message in the previous round.
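For illustration, here is a minimal TypeScript sketch of those two rules, assuming the parser receives the raw streamed text and the final message accumulated in previous rounds; the names and shapes are hypothetical and the actual streaming parser in this PR may differ:

```ts
const ACTION_INPUT_MARKER = '"action_input": "';

// Rule 1: anchor on the LAST occurrence of the marker, in case the model
// echoed the marker earlier in the streamed output.
function actionInputStart(streamed: string): number {
  const idx = streamed.lastIndexOf(ACTION_INPUT_MARKER);
  return idx === -1 ? -1 : idx + ACTION_INPUT_MARKER.length;
}

// Rule 2: do not cut the stream at a `*` when the escaping `\` was already
// appended to the final message in the previous round; otherwise the escape
// sequence `\*` would be split and the output truncated.
function shouldKeepStreaming(finalMessage: string, nextChunk: string): boolean {
  return finalMessage.endsWith('\\') && nextChunk.startsWith('*');
}
```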
@elasticmachine merge upstream
```diff
@@ -332,7 +337,7 @@ export const postEvaluateRoute = (
       return output;
     };

-    const evalOutput = await evaluate(predict, {
+    const evalOutput = evaluate(predict, {
```
Removing this `await` solves the issue where different OSS models fail multiple tests with the error:

Error: ActionsClientChatOpenAI: an error occurred while running the action - Status code: undefined. Message: Unexpected API Error: ERR_CANCELED - canceled

This mostly happens with the Llama model, but occasionally with Mistral as well.

Here is an example of those failures: https://smith.langchain.com/o/b739bf24-7ba4-4994-b632-65dd677ac74e/datasets/261dcc59-fbe7-4397-a662-ff94042f666c/compare?selectedSessions=f26a13b9-73ba-4e6d-992b-3b4eda61bb20&baseline=undefined&activeSession=f26a13b9-73ba-4e6d-992b-3b4eda61bb20

I suspect it has something to do with the slowness of the model (not proven yet), which leads to timeouts.

@spong How important is it for us to make sure that all experiments are finished before we respond? Can we just schedule those experiments, return, and log the evaluation results, handling errors differently here? Something like this:
```ts
evaluate(predict, {
  data: datasetName ?? '',
  evaluators: [], // Evals to be managed in LangSmith for now
  experimentPrefix: name,
  client: new Client({ apiKey: langSmithApiKey }),
  // prevent rate limiting and unexpected multiple experiment runs
  maxConcurrency: 5,
}).then((evalOutput) => {
  logger.debug(`runResp:\n ${JSON.stringify(evalOutput, null, 2)}`);
}).catch(() => {
  // handle error here
});
```
Also, we would need to add better error handling in this case; see the sketch below.
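A sketch of what that error handling could look like, assuming `predict`, `datasetName`, `name`, `langSmithApiKey`, and `logger` are in scope exactly as in the route above, and that `evaluate`/`Client` come from the LangSmith JS SDK:

```ts
import { Client } from 'langsmith';
import { evaluate } from 'langsmith/evaluation';

evaluate(predict, {
  data: datasetName ?? '',
  evaluators: [], // Evals to be managed in LangSmith for now
  experimentPrefix: name,
  client: new Client({ apiKey: langSmithApiKey }),
  // prevent rate limiting and unexpected multiple experiment runs
  maxConcurrency: 5,
})
  .then((evalOutput) => {
    logger.debug(`runResp:\n ${JSON.stringify(evalOutput, null, 2)}`);
  })
  .catch((err: unknown) => {
    // Log the failure instead of silently dropping the scheduled experiment.
    logger.error(
      `Evaluation experiment failed: ${err instanceof Error ? err.message : String(err)}`
    );
  });
```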
Yeah, that should be fine. We used to use the result output and write it to the local ES, but we don't do that anymore, so we just debug-log the output since we've been using LangSmith for analysis after the fact. Really we just need to know from the API perspective whether the eval was successful.
That said, I'm curious why this `await` would cause internal issues with the chat clients. It shouldn't change anything about concurrency or execution, but with it, it seems previous runs are getting cancelled? Perhaps there's some shared state between the chat clients, like @stephmilovic fixed with `createLlmInstance` over in #190004?
If you turn down `maxConcurrency` to `1` and run with the `await`, do you still see it? Curious if rate limiting or something might be cancelling the previous runs?
So, I ran some simple stress testing against our Llama deployment and was able to reproduce this error, which I suspect is the reason our evaluations sometimes fail:

```json
{
  "statusCode": 429,
  "message": "Rate Limit on Number of Tokens is exceeded. Retry after 8 seconds."
}
```

I don't think this is something we can control in code; it will need to be configured in the LLM deployment if we need it for evaluations. I found this article https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest#assign-quota which shows how to address that in the deployment settings (sent to James as well). There we have an option to assign a Tokens-Per-Minute (TPM) quota to the LLM deployment.
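Purely for illustration, handling the 429 in code would mean something like a retry-with-backoff wrapper around each prediction call. This is a hedged sketch, not something the PR adds (the decision below was to avoid this complexity), and the parsing of the "Retry after N seconds" hint is an assumption based on the payload above:

```ts
async function withRateLimitRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const message = String(err?.message ?? '');
      const isRateLimited = err?.statusCode === 429 || /rate limit/i.test(message);
      if (!isRateLimited || attempt >= maxRetries) {
        throw err;
      }
      // Honor the advertised "Retry after N seconds" hint, else back off exponentially.
      const match = /after (\d+) second/i.exec(message);
      const delayMs = (match ? Number(match[1]) : 2 ** attempt) * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```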
Btw, lowering `maxConcurrency` did not help with the issue. Setting it to `1` made requests fail even faster, for whatever reason.
To lower the chances of hitting rate-limit failures, I removed the `await` and restructured the code to log results and errors at the end of execution. This should allow users to avoid 429 errors in most cases.

After talking to @jamesspi, we agreed that this is something users can fix on their end by increasing the quota and resources. We would prefer to keep things simple rather than complicate our code with a workaround for 429s during evaluations.
@spong what do you think?
@elasticmachine merge upstream

@elasticmachine merge upstream
```ts
isOpeAiType &&
  (!connectorApiUrl ||
    connectorApiUrl === OPENAI_CHAT_URL ||
    connectorApiProvider === OpenAiProviderType.AzureAi);
```
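For context, here is a rough reconstruction of how a condition like the one quoted above could drive the `isOpenSourceModel` determination. The enum values, URL constant, and negation logic are assumptions; the real code in `utils.ts` may differ:

```ts
enum OpenAiProviderType {
  OpenAi = 'OpenAI',
  AzureAi = 'Azure OpenAI',
}

const OPENAI_CHAT_URL = 'https://api.openai.com/v1/chat/completions';

function isOpenSourceModel(args: {
  isOpenAiType: boolean;
  connectorApiUrl?: string;
  connectorApiProvider?: OpenAiProviderType;
}): boolean {
  const { isOpenAiType, connectorApiUrl, connectorApiProvider } = args;
  // An OpenAI-type connector pointing at the official OpenAI or Azure endpoints
  // is treated as a "real" OpenAI model...
  const isOfficialOpenAi =
    isOpenAiType &&
    (!connectorApiUrl ||
      connectorApiUrl === OPENAI_CHAT_URL ||
      connectorApiProvider === OpenAiProviderType.AzureAi);
  // ...anything else reached through the OpenAI connector is assumed to be an
  // OSS model behind an OpenAI-compatible API.
  return isOpenAiType && !isOfficialOpenAi;
}
```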
Did you consider adding a new `apiProvider` of `Open Source` and relying on that for the `isOpenSourceModel` determination?
Yes, my second PR is about exactly that: #194831.

We will add another provider to cover "other" OpenAI-compatible services. It will work for all newly added connectors, though we would also like to support already created connectors, i.e. OSS models configured via the OpenAI provider.
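A hypothetical sketch of the shape of that follow-up change (#194831); the actual provider name and where the enum lives are not confirmed here:

```ts
enum OpenAiProviderType {
  OpenAi = 'OpenAI',
  AzureAi = 'Azure OpenAI',
  // New provider for "other" OpenAI-compatible services hosting OSS models.
  Other = 'Other',
}
```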
@elasticmachine merge upstream
Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>
💛 Build succeeded, but was flaky

cc @e40pud
LGTM! Thank you for the improvement and for your patience fixing the timeout error. Nice work @e40pud
Starting backport for target branches: 8.x https://github.com/elastic/kibana/actions/runs/11224445157
…lastic#10416) (elastic#194166) (cherry picked from commit 1ee648d)
💚 All backports created successfully

Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation
…els (#10416) (#194166) (#195324)

# Backport

This will backport the following commits from `main` to `8.x`:

- [[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) (#194166)](#194166)

### Questions?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Ievgen Sorokopud <ievgen.sorokopud@elastic.co>
Summary
This PR improves OSS model integration via the OpenAI connector (https://github.com/elastic/security-team/issues/10416).
OSS models do not support "native" tool calling, so we need to do that via prompting. For that we use a structured chat agent (`LangChain.createStructuredChatAgent`) and an appropriate prompt (see `STRUCTURED_SYSTEM_PROMPT` in the `x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/translations.ts` file).

One important note: since we are using the OpenAI connector to connect to an OSS model, we need to recognise and distinguish that case, which is done in a new utils function `isOpenSourceModel` in the `x-pack/plugins/elastic_assistant/server/routes/utils.ts` file.

To avoid parsing the tool's output in streaming mode, we enable the response step for OSS models, similar to what we do for Bedrock. This simplifies parsing during streaming and saves us headaches from fighting the JSON formatting issues returned by OSS models (especially Llama). I noticed that with Llama the final answer was returned with extra escape characters, which breaks the markdown.
Tested OSS models
Both models work well with the existing ES tools. Mistral showed slightly better results than Llama.
Credentials
Use these credentials for testing `LLama 3.1 70B` and `Mistral Large 2407`: https://p.elstc.co/paste/ofqxUGiW#RR1Pedserj9hWKm3vOw5oVEHWuwbX7i9Jl2rS7q7MRP
Notes about ES|QL generation
By default this works with the `ESQLKnowledgeBaseTool` tool. The new `NaturalLanguageESQLTool` provides better results. To enable it use:

Example of ES|QL generation using the Mistral model and the `NaturalLanguageESQLTool` tool:

mistral.mov
Evaluation
Experiment 1

Suite: ES|QL Generation Regression Suite
Models: `Llama 3.1 405B` and `Mistral Large 2407`
Tool: `ESQLKnowledgeBaseTool`

Correctness results:

Experiment 2

Suite: ES|QL Generation Regression Suite
Models: `Llama 3.1 405B` and `Mistral Large 2407`
Tool: `NaturalLanguageESQLTool`

Results: