[Security Solution] Benchmark performance of importing a large number of prebuilt rules #195632

Closed
Tracked by #174168
banderror opened this issue Oct 9, 2024 · 4 comments
Labels: 8.18 candidate, Feature:Prebuilt Detection Rules, Feature:Rule Import/Export, Team:Detection Rule Management, Team:Detections and Resp, Team: SecuritySolution, v8.18.0

banderror commented Oct 9, 2024

Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168

Summary

With #180168, we're going to introduce additional logic to the import endpoint for calculating the rule source object. Some of this logic will be run once for a given import call, some of it will be run multiple times for each rule being imported. Some of it can be IO-heavy (installing the package, fetching historical rule versions and ids), some of it can be CPU-heavy (calculating a diff for each rule).

Based on prior observations, the rules import endpoint times out when importing a large number of rules; the threshold is likely somewhere around 2,000 to 3,000 rules. With the additional logic, the endpoint will become even heavier and may start timing out with fewer rules in the ndjson payload.
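To probe where the endpoint starts timing out, it helps to generate payloads of arbitrary size. Below is a minimal TypeScript sketch that builds an ndjson payload (one JSON document per line) of N synthetic custom rules. The rule shape, field values, and the `makeRule` / `buildNdjsonPayload` helpers are hypothetical and for illustration only; the real import schema has more fields, and prebuilt-rule payloads would come from an actual rule export rather than from this generator.

```typescript
// Hypothetical, minimal rule shape for load testing. The real import schema
// requires more fields; treat these as placeholders.
interface TestRule {
  rule_id: string;
  name: string;
  description: string;
  type: "query";
  query: string;
  language: "kuery";
  risk_score: number;
  severity: "low";
  version: number;
}

// Builds one synthetic custom rule with a unique rule_id.
function makeRule(i: number): TestRule {
  return {
    rule_id: `perf-test-rule-${i}`,
    name: `Perf test rule ${i}`,
    description: "Synthetic rule for import benchmarking",
    type: "query",
    query: "host.name: *",
    language: "kuery",
    risk_score: 21,
    severity: "low",
    version: 1,
  };
}

// NDJSON: one JSON document per line, each line newline-terminated.
// An empty payload (count = 0) is an empty string.
function buildNdjsonPayload(count: number): string {
  const lines: string[] = [];
  for (let i = 0; i < count; i++) {
    lines.push(JSON.stringify(makeRule(i)) + "\n");
  }
  return lines.join("");
}
```

Varying `count` while keeping everything else fixed isolates the effect of payload size on import duration.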

We would like to:

  • Test the performance of the endpoint with the feature flag turned on, in ESS and Serverless.
  • Profile the import endpoint locally to see if there are any inefficient code paths that could be optimized. There may be some low-hanging fruit there.

If we find some easy performance optimizations to do, we might reconsider working on #195633.

Testing performance

  • I suggest using a few different "extreme" payloads, such as "all existing prebuilt rules", "all existing prebuilt rules where all of them are customized", and "all existing prebuilt rules + some custom rules + exceptions/connectors".
  • We need to measure how much time it takes to import the payloads above, in both ESS and Serverless, and find the number of rules in the payload that causes the endpoint to time out.
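Finding the number of rules at which the endpoint starts timing out can be done with a binary search over payload size, assuming success is monotonic (if N rules import within the timeout, so does any smaller payload). In this sketch, `findMaxImportableCount` and the `succeeds` callback are hypothetical: `succeeds` is where you would build a payload of `count` rules, send it to the import endpoint, and return `false` on a timeout or error.

```typescript
// Hypothetical helper: returns the largest count in [low, high] for which
// `succeeds(count)` resolves to true, assuming success is monotonic in count.
// Returns low - 1 if even `low` fails.
async function findMaxImportableCount(
  low: number,
  high: number,
  succeeds: (count: number) => Promise<boolean>,
): Promise<number> {
  let best = low - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (await succeeds(mid)) {
      // mid rules imported in time: remember it and search larger payloads.
      best = mid;
      low = mid + 1;
    } else {
      // mid rules timed out: search smaller payloads.
      high = mid - 1;
    }
  }
  return best;
}
```

Each probe is a full import, so each call should start from the same cluster state (for example, after removing previously imported rules); otherwise earlier probes can skew later ones. Searching 1 to 10,000 rules takes about 14 probes.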

Profiling

Profiling can be done by sending APM data from your local Kibana to your personal remote Elastic APM and then using APM as a profiler. This remote Elastic APM can be spun up in the production cloud as a normal deployment with APM. The staging cloud is known to be problematic for this use case.
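As a lightweight complement to APM (not a replacement for it), per-phase timings can also be collected in-process to separate the once-per-call costs from the per-rule costs described in the summary. The sketch below is hypothetical: `PhaseTimer`, `importWithTiming`, and the phase names `installPackage` and `calculateRuleSource` are illustrative and not taken from the Kibana codebase.

```typescript
// Hypothetical helper: accumulates wall-clock time (Date.now(), millisecond
// resolution) and call counts per named phase.
class PhaseTimer {
  private readonly totals = new Map<string, number>();
  private readonly counts = new Map<string, number>();

  async time<T>(phase: string, fn: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      return await fn();
    } finally {
      // Recorded even if fn throws, so failed phases still show up.
      const elapsed = Date.now() - start;
      this.totals.set(phase, (this.totals.get(phase) ?? 0) + elapsed);
      this.counts.set(phase, (this.counts.get(phase) ?? 0) + 1);
    }
  }

  summary(): Array<{ phase: string; calls: number; totalMs: number }> {
    return Array.from(this.totals.entries()).map(([phase, totalMs]) => ({
      phase,
      calls: this.counts.get(phase) ?? 0,
      totalMs,
    }));
  }
}

// Hypothetical import flow; the callbacks stand in for the real logic.
async function importWithTiming(
  rules: unknown[],
  installPackage: () => Promise<void>,
  calculateRuleSource: (rule: unknown) => Promise<void>,
  timer: PhaseTimer,
): Promise<void> {
  // Runs once per import call (IO-heavy in the real flow).
  await timer.time("installPackage", installPackage);
  // Runs once per rule, so its total cost grows with payload size.
  for (const rule of rules) {
    await timer.time("calculateRuleSource", () => calculateRuleSource(rule));
  }
}
```

Comparing `totalMs` and `totalMs / calls` for each phase indicates whether an import is dominated by one-off IO or by per-rule work that scales with the number of rules.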

banderror commented Nov 27, 2024

Performance testing has been completed and the results discussed with @rylnd and @xcrzx. We haven't found any major issues that would block the release of prebuilt rule import/export within Milestone 3:

  • No excessively long import calls: importing 4000 prebuilt rules takes 3 to 4 minutes
  • No OOMs or timeouts in Serverless during the import
  • Enough room to grow in terms of import payloads and number of prebuilt rules
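As a rough sanity check on these numbers, the average cost per rule follows from the reported totals. This treats the whole call duration as spread evenly across rules, so it also folds in the once-per-call work and is an upper bound on the pure per-rule cost:

```math
\frac{180\ \text{s}}{4000\ \text{rules}} = 45\ \text{ms/rule}, \qquad \frac{240\ \text{s}}{4000\ \text{rules}} = 60\ \text{ms/rule}
```

That corresponds to a throughput of roughly 17 to 22 rules per second.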

Non-blocking performance optimizations that we will consider for Milestone 4:

Thanks @rylnd and @xcrzx!
