[Detection Engine] Adds Alert Suppression to ML Rules #181926

rylnd · 2024-04-26T22:04:44Z

Summary

This PR introduces Alert Suppression for ML Detection Rules. This feature is behaviorally similar to alert suppression for other Detection Engine rule types, and nearly identical to the analogous feature for EQL rules.

There are some additional UI behaviors introduced here as well, mainly intended to cover the shortcomings discovered in #183100. Those behaviors are:

Populating the suppression field list with fields from the anomaly index(es).
Disabling the suppression UI if no selected ML jobs are running (because we cannot populate the list of fields on which they'll be suppressing).
Warning the user if some selected ML jobs are not running (because the list of suppression fields may be incomplete).

See screenshots below for more info.
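The job-status behaviors listed above reduce to a small pure function. This is a minimal sketch, assuming a hypothetical `MlJobSummary` shape with an `isRunning` flag; the real form derives job state and anomaly-index fields through its own hooks.

```typescript
// Hypothetical shape of a selected ML job; not the actual Kibana type.
interface MlJobSummary {
  id: string;
  isRunning: boolean;
}

interface SuppressionUiState {
  disabled: boolean; // no selected job is running, so no fields can be listed
  showWarning: boolean; // some, but not all, selected jobs are running
}

// Derives the suppression UI state from the currently selected ML jobs.
function getSuppressionUiState(jobs: MlJobSummary[]): SuppressionUiState {
  const runningCount = jobs.filter((job) => job.isRunning).length;
  const noneRunning = runningCount === 0;
  const someNotRunning = runningCount < jobs.length;
  return {
    disabled: noneRunning,
    // Only warn when the field list is partially populated; a disabled
    // control already signals that no fields are available at all.
    showWarning: !noneRunning && someNotRunning,
  };
}
```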

Intermediate Serverless Deployment

As per the "intermediate deployment" requirements for serverless, while the schema (and declared alert SO mappings) will be extended to allow this functionality, the user-facing features are currently hidden behind a feature flag. Once this is merged and released, we can issue a "final" deployment in which the feature flag is enabled, and the feature effectively released.

Screenshots

Overview of new UI fields
Example of Anomaly fields in suppression combobox
Suppression disabled due to no jobs running
Warning due to not all jobs running

Steps to Review

Review the Test Plan for an overview of behavior
Review Integration tests for an overview of implementation and edge cases
Review Cypress tests for an overview of UX changes
Testing on Demo Instance (elastic/changeme)
1. This instance has the relevant feature flag enabled, and contains some sample auditbeat data as well as the anomalies archive data, for exercising an ML rule against "real" anomalies.
2. There are a few example rules in the default space:
  1. A simple query rule against auditbeat data
  2. An ML rule with per-execution suppression on both by_field_name and by_field_value (which ends up not actually suppressing anything)
  3. An ML rule with per-execution suppression on by_field_name (which suppresses all anomalies into a single alert)
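To see why the two example ML rules behave so differently, here is a minimal sketch of per-execution grouping: anomalies that share the same values for every suppression field collapse into one alert. The function is hypothetical (not the Detection Engine implementation), and the anomaly records are made up for illustration, assuming each anomaly has a distinct `by_field_value`.

```typescript
// Hypothetical, simplified anomaly record (field name -> value).
type Anomaly = Record<string, string | undefined>;

interface SuppressedAlert {
  terms: Record<string, string | null>; // suppression field values shared by the group
  docsCount: number; // number of anomalies represented by this alert
}

// Collapses anomalies with identical values for every suppression field into
// a single alert. Missing values become null and are grouped together (a
// simplification of the configurable missing-fields behavior).
function groupBySuppressionFields(anomalies: Anomaly[], fields: string[]): SuppressedAlert[] {
  const groups = new Map<string, SuppressedAlert>();
  for (const anomaly of anomalies) {
    const terms: Record<string, string | null> = {};
    for (const field of fields) {
      terms[field] = anomaly[field] ?? null;
    }
    // Stable key built from the values in field order.
    const key = JSON.stringify(fields.map((field) => terms[field]));
    const existing = groups.get(key);
    if (existing) {
      existing.docsCount += 1;
    } else {
      groups.set(key, { terms, docsCount: 1 });
    }
  }
  return Array.from(groups.values());
}

// Made-up anomalies: same by_field_name, distinct by_field_value.
const anomalies: Anomaly[] = [
  { by_field_name: "process.name", by_field_value: "sshd" },
  { by_field_name: "process.name", by_field_value: "cron" },
  { by_field_name: "process.name", by_field_value: "bash" },
];

const onBoth = groupBySuppressionFields(anomalies, ["by_field_name", "by_field_value"]);
const onName = groupBySuppressionFields(anomalies, ["by_field_name"]);
console.log(onBoth.length, onName.length, onName[0].docsCount); // 3 1 3
```

Suppressing on both fields yields one alert per distinct value pair (nothing is suppressed here), while suppressing on `by_field_name` alone folds all three anomalies into a single alert.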

Related Issues

This feature was temporarily blocked by [Security Solution][Detection Engine] ML Rule forms have incorrect autocomplete fields #183100, but those changes are now in this PR.

Checklist

Functional changes are hidden behind a feature flag. If not hidden, the PR explains why these changes are being implemented in a long-living feature branch.
Functional changes are covered with a test plan and automated tests.
- Test Plan
Stability of new and changed tests is verified using the Flaky Test Runner in both ESS and Serverless. By default, use 200 runs for ESS and 200 runs for Serverless.
Comprehensive manual testing is done by two engineers: the PR author and one of the PR reviewers. Changes are tested in both ESS and Serverless.
Mapping changes are accompanied by a technical design document. It can be a GitHub issue or an RFC explaining the changes. The design document is shared with and approved by the appropriate teams and individual stakeholders.
(OPTIONAL) OpenAPI spec changes include detailed descriptions and examples of usage and are ready to be released on https://docs.elastic.co/api-reference. NOTE: This is optional because at the moment we don't yet have any OpenAPI specs that would be fully "documented" and "GA-ready" for publishing on https://docs.elastic.co/api-reference.
Functional changes are communicated to the Docs team. A ticket is opened in https://github.com/elastic/security-docs using the Internal documentation request (Elastic employees) template. The following information is included: feature flags used, target ESS version, planned timing for ESS and Serverless releases.

This adds all the parameters necessary to invoke this method (if relevant) in the ML rule executor. Given the relative simplicity of the ML rule type, I'm guessing that many of these values are irrelevant/unused in this case, but I haven't yet investigated that. Next step is to exercise this implementation against the FTR tests, and see if the behavior is what we expect. Once that's done, we can try to pare down what we need/use. I also added some TODOs in the course of this work to check some potential bugs I noticed.
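Conceptually, the executor only needs to branch once on whether suppression applies to the current rule. The sketch below is a hypothetical stand-in, not the signature of `getIsSuppressionActive`: it assumes suppression is active only when the ML suppression feature flag is enabled and the rule declares at least one group-by field. The `alertSuppression.groupBy` shape is an assumption for illustration.

```typescript
// Hypothetical, simplified ML rule params; field names are illustrative.
interface MlRuleParams {
  machineLearningJobIds: string[];
  alertSuppression?: {
    groupBy: string[];
  };
}

// Hypothetical stand-in for the executor's suppression check (not the real
// getIsSuppressionActive signature).
function isSuppressionActive(params: MlRuleParams, isFeatureFlagEnabled: boolean): boolean {
  const groupBy = params.alertSuppression?.groupBy ?? [];
  return isFeatureFlagEnabled && groupBy.length > 0;
}

// Rules without suppression keep the existing alert-creation path untouched.
function selectAlertPath(params: MlRuleParams, isFeatureFlagEnabled: boolean): "suppressed" | "standard" {
  return isSuppressionActive(params, isFeatureFlagEnabled) ? "suppressed" : "standard";
}
```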

I realized that most of these tests were using es_archiver to insert anomalies into an index, but our tests were only ever using a single one of those anomalies. To ensure these tests are independent of the data in that archive, I've created a helper that deletes all the persisted anomalies, and then used existing tooling to manually insert only the anomalies needed for our tests. All of the current tests are green; there are just a few more permutations that still need to be implemented.
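The isolation pattern described above (clear every persisted anomaly, then insert only the ones a test needs) can be sketched against a hypothetical `AnomalyStore` interface. The in-memory store stands in for the real anomalies index and the es_archiver tooling; none of these names come from the PR.

```typescript
// Hypothetical, minimal anomaly document.
interface AnomalyDoc {
  id: string;
  jobId: string;
  recordScore: number;
}

// Hypothetical abstraction over the anomalies index used by the tests.
interface AnomalyStore {
  deleteAll(): Promise<void>;
  insert(docs: AnomalyDoc[]): Promise<void>;
  count(): Promise<number>;
}

// In-memory stand-in so the sketch runs without Elasticsearch.
class InMemoryAnomalyStore implements AnomalyStore {
  private docs: AnomalyDoc[] = [];

  async deleteAll(): Promise<void> {
    this.docs = [];
  }

  async insert(docs: AnomalyDoc[]): Promise<void> {
    this.docs.push(...docs);
  }

  async count(): Promise<number> {
    return this.docs.length;
  }
}

// Resets the store so each test sees exactly the anomalies it declares,
// regardless of what an archive loaded earlier.
async function seedAnomalies(store: AnomalyStore, docs: AnomalyDoc[]): Promise<void> {
  await store.deleteAll();
  await store.insert(docs);
}
```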

rylnd · 2024-05-08T18:00:22Z

/ci

Since some of these fields won't be mapped in the alerts index, we can't always do the dynamic filter generation based on the suppression fields. Until we have direction on how to handle that, we can at least display the current alert by _id, and allow the analyst to expand the timeline from there.
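One way to express this fallback is a small filter builder: when every suppression field is mapped in the alerts index, filter on the field values; otherwise fall back to the alert's `_id`. This is a sketch under assumptions: the `mappedFields` set and the simplified filter objects are hypothetical, loosely modeled on Elasticsearch `term` and `ids` query shapes.

```typescript
// Simplified clauses modeled on Elasticsearch `term` and `ids` queries.
type Filter =
  | { term: Record<string, string> }
  | { ids: { values: string[] } };

interface AlertForTimeline {
  _id: string;
  suppressionTerms: Record<string, string>; // suppression field -> value
}

// Builds timeline filters for an alert, falling back to its _id when any
// suppression field is not mapped in the alerts index.
function buildInvestigationFilters(alert: AlertForTimeline, mappedFields: Set<string>): Filter[] {
  const fields = Object.keys(alert.suppressionTerms);
  const allMapped = fields.length > 0 && fields.every((field) => mappedFields.has(field));
  if (!allMapped) {
    return [{ ids: { values: [alert._id] } }];
  }
  return fields.map((field) => ({ term: { [field]: alert.suppressionTerms[field] } }));
}
```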

pmuellr

ResponseOps changes LGTM. The only changes there were in our schema-change test, indicating new parameters in the rules, which could cause backward-compatibility (BWC) / zero-downtime (ZDT) issues in rollbacks. The conversation in thread #181926 (comment) sounds like this will be handled appropriately.

kibana-ci · 2024-07-01T21:28:00Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: a9b410d

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`securitySolution`	5582	5585	+3

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`securitySolution`	15.6MB	15.6MB	+3.3KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`securitySolution`	83.7KB	83.8KB	+68.0B

Unknown metric groups

References to deprecated APIs

id	before	after	diff
`securitySolution`	571	574	+3

History

💔 Build #218877 failed 1924596
💔 Build #218588 failed d666eb0
💔 Build #218583 failed d152289
💛 Build #218513 was flaky 6ce66d1
💛 Build #218338 was flaky 307e964

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @rylnd

michaelolo24

Investigations code owner changes. Nice work!

vitaliidm

Great work, @rylnd

I tested suppression on rule interval and during rule execution and did not find issues.

Consider adding FTR tests as per #181926 (comment), since we have had issues in the past with enrichment not working with suppression.

Have you also had a chance to check whether the tests are flaky?
I see it's checked in the description, but the links lead to the PR itself.

Stability of new and changed tests is verified using the Flaky Test Runner in both ESS and Serverless. By default, use 200 runs for ESS and 200 runs for Serverless.
ESS - Cypress x 200
Serverless - Cypress x 200
ESS - API x 200
Serverless - API x 200

vitaliidm · 2024-07-02T15:43:37Z

@rylnd

I also think that we should disable duration and missing fields checkboxes, when suppression fields controller is disabled.
Otherwise, the user can still edit them and save the rule with the new configuration.

Screen.Recording.2024-07-02.at.16.38.16.mov
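The suggestion above amounts to deriving the disabled state of every suppression control from one group-level condition, so the duration and missing-fields options cannot be edited while the group-by selector is disabled. This is a hypothetical sketch: the control names are illustrative, and the extra rule that dependent options stay disabled until a group-by field is chosen is an assumption, not taken from the rule form.

```typescript
// Hypothetical names for the suppression controls in the rule form.
type SuppressionControl = "groupBy" | "duration" | "missingFields";

// Derives per-control disabled flags from a single group-level condition.
function getDisabledControls(groupDisabled: boolean, groupBy: string[]): Record<SuppressionControl, boolean> {
  // Assumption: duration and missing-fields options only apply once at
  // least one suppression field has been selected.
  const dependentsDisabled = groupDisabled || groupBy.length === 0;
  return {
    groupBy: groupDisabled,
    duration: dependentsDisabled,
    missingFields: dependentsDisabled,
  };
}
```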

rylnd · 2024-07-02T19:32:25Z

I also think that we should disable duration and missing fields checkboxes, when suppression fields controller is disabled.

@vitaliidm I'm looking into this but we need to revisit all of those suppression conditions in the rule form. I'm going to merge this PR as is, we can determine if there are any bugs with non-ML rules during testing tomorrow, and address the form fixes holistically as a followup.

kibanamachine · 2024-07-02T20:42:46Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6447

[❌] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/ess.config.ts: 161/200 tests passed.

see run history

kibanamachine · 2024-07-02T21:03:28Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6448

[❌] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/serverless.config.ts: 156/200 tests passed.

see run history

kibanamachine · 2024-07-03T09:24:12Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6449

[❌] Security Solution Detection Engine - Cypress: 173/200 tests passed.

see run history

kibanamachine · 2024-07-03T13:08:49Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6450

[❌] [Serverless] Security Solution Detection Engine - Cypress: 168/200 tests passed.

see run history
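For a quick comparison across the four Flaky Test Runner results above, the reported pass counts translate into pass rates as follows. This only restates the reported numbers; the suite labels are shortened for readability.

```typescript
interface FlakyRun {
  suite: string;
  passed: number;
  total: number;
}

// Pass counts as reported by the Flaky Test Runner comments.
const runs: FlakyRun[] = [
  { suite: "ESS API (#6447)", passed: 161, total: 200 },
  { suite: "Serverless API (#6448)", passed: 156, total: 200 },
  { suite: "ESS Cypress (#6449)", passed: 173, total: 200 },
  { suite: "Serverless Cypress (#6450)", passed: 168, total: 200 },
];

// Percentage of passing runs, rounded to one decimal place.
function passRate(run: FlakyRun): number {
  return Math.round((run.passed / run.total) * 1000) / 10;
}

for (const run of runs) {
  console.log(`${run.suite}: ${passRate(run)}%`);
}
// ESS API (#6447): 80.5%
// Serverless API (#6448): 78%
// ESS Cypress (#6449): 86.5%
// Serverless Cypress (#6450): 84%
```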

## Summary

This PR is a followup to #181926. It includes the following changes:

- Refactoring some Rule Form logic with `useMemo`
  - Requested [in this discussion](#181926 (comment))
  - Addressed in a5fcf4d
- Adds FTR tests validating ML Suppression supports alert enrichment
  - Requested [during previous review](#181926 (comment))
  - Addressed in d5aa551
- Disables ML Suppression fields as a group
  - Requested in [this comment](#181926 (comment))
  - Addressed by 983945b

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [x] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))

(cherry picked from commit e2150de)

rylnd added 5 commits April 25, 2024 21:52

Add outline of integration test scenarios

ad461bb

This is mostly based on the current test plan. It's not wired up yet, nor are there any actual implementations.

Fleshing out more of our suppression execution tests

8c0f6c1

These now have type errors, since ML rules don't yet accept suppression fields. We have our next task!

Declare alert suppression fields as optional for ML rules

01bcf8e
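Declaring the suppression parameters as optional keeps existing ML rules valid, which is what the backward-compatibility discussion in review hinges on. A minimal sketch of that shape, assuming field names modeled on the `alert_suppression` parameter of other rule types (`group_by`, `duration`, `missing_fields_strategy`); treat the exact names, units, and allowed values as assumptions rather than the generated schema.

```typescript
// Assumed shape, modeled on the alert_suppression parameter of other rule
// types; names and allowed values are assumptions, not the generated schema.
interface AlertSuppression {
  group_by: string[];
  duration?: { value: number; unit: "s" | "m" | "h" };
  missing_fields_strategy?: "suppress" | "doNotSuppress";
}

// Hypothetical subset of ML rule params.
interface MachineLearningRuleParams {
  type: "machine_learning";
  machine_learning_job_id: string[];
  anomaly_threshold: number;
  // Optional, so ML rules created before this change remain valid as-is.
  alert_suppression?: AlertSuppression;
}

// Summarizes the suppression mode; no duration means per-execution suppression.
function describeSuppression(params: MachineLearningRuleParams): string {
  const suppression = params.alert_suppression;
  if (!suppression) {
    return "no suppression";
  }
  const fields = suppression.group_by.join(", ");
  if (!suppression.duration) {
    return `per execution on ${fields}`;
  }
  return `every ${suppression.duration.value}${suppression.duration.unit} on ${fields}`;
}
```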

Generated new types from new schema

c42b339

`node scripts/openapi/generate`

First legitimately failing test

b78c531

We're now asserting that suppression fields are present on the generated alerts, which they're not, because we haven't implemented them yet. That's the next step!

rylnd added the Feature:ML Rule (Security Solution Machine Learning rule type), Feature:Alert Suppression (Security Solution Alert Suppression feature), Team:Detection Engine (Security Solution Detection Engine Area), and 8.15 candidate labels Apr 26, 2024

rylnd self-assigned this Apr 26, 2024

rylnd added 13 commits May 1, 2024 17:29

Extract executor params to interface

cad4183

Merge branch 'main' into ml_rule_alert_suppression

5377c6d

Adding more ML suppression functionality as typescript and tests dictate

6dbb88f

* Adds a call to getIsSuppressionActive in our rule executor, along with the necessary dependencies
* Adds suppression fields to the ML rule schema
* Adds a feature flag for ML suppression

Declare ML rule to be suppressible

e29c3d7

Add ML rule to general suppression schema tests

12ad5f5

Add placeholder for ML executor functionality

c8b7c6a

I noticed that it doesn't look like we're including a lot of timing info in the ML executor; adding this to validate that, and document what we _are_ recording.

Declare our new executor parameters needed for rule suppression

7f317cf

This will light up the paths that we need to implement. Next!

Enable feature flag in FTR tests

703084f

Handle ML suppression params in rule converters

b9de69e

Tests were failing as rules were being created without suppression params. Fixed!

First passing integration test

7e63b4d

We've got suppression fields making it into ML alerts for the first time! Now, to test the various suppression conditions.

Flesh out remaining API integration tests

99aaffe

This tests all of the interesting permutations of alert suppression for ML rules, both with per-execution and interval suppression durations. I added a few TODOs noting unexpected (to me) behavior; we'll see what others think.

rylnd commented May 8, 2024

...tion_logic/trial_license_complete_tier/execution_logic/machine_learning_alert_suppression.ts (outdated)

rylnd commented May 8, 2024

...tion_logic/trial_license_complete_tier/execution_logic/machine_learning_alert_suppression.ts (outdated)

Update test description in response to feedback

ee86fb1

The behavior demonstrated in this test is in fact expected, as the suppression duration window applies to the alert creation time, not the original anomaly time.
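To make the expected behavior concrete: with an interval suppression duration, whether a new candidate is folded into an existing alert depends on when the new alert is being created relative to the existing alert, not on the anomaly's own timestamp. The sketch below is hypothetical and simplifies the window to "created less than `durationMs` after the existing alert".

```typescript
interface ExistingAlert {
  createdAt: number; // epoch ms when the existing alert was written
}

// Simplified, hypothetical window check: suppress into the existing alert if
// the new alert is being created within durationMs of that alert's creation.
// The anomaly timestamp is accepted but deliberately ignored.
function isWithinSuppressionWindow(
  existing: ExistingAlert,
  creationTimeMs: number,
  durationMs: number,
  _anomalyTimeMs: number
): boolean {
  return creationTimeMs - existing.createdAt < durationMs;
}
```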

rylnd added 3 commits May 8, 2024 14:34

Add non-destructive form filling task for ML rules

b5e809d

Most other rule types have both a "fill" task and a "fillAndContinue" task; this adds that pattern for ML rules on the Define step.
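The fill/fillAndContinue split can be sketched as two composable helpers: one that only enters values (so a test can keep editing the step, e.g. to add suppression fields) and one that calls it and then advances. The `FormDriver` interface, function names, and selector strings are hypothetical stand-ins for the Cypress tasks and page selectors, not the actual test helpers.

```typescript
// Hypothetical minimal driver; in Cypress these would be element type/click calls.
interface FormDriver {
  type(selector: string, value: string): void;
  click(selector: string): void;
}

interface MlRuleDefinition {
  jobIds: string[];
  anomalyThreshold: number;
}

// Non-destructive: fills the Define step but stays on it.
function fillDefineMachineLearningRule(form: FormDriver, rule: MlRuleDefinition): void {
  for (const jobId of rule.jobIds) {
    form.type("[data-test-subj=mlJobSelect]", jobId);
  }
  form.type("[data-test-subj=anomalyThresholdInput]", String(rule.anomalyThreshold));
}

// Fills the step, then advances to the next one.
function fillDefineMachineLearningRuleAndContinue(form: FormDriver, rule: MlRuleDefinition): void {
  fillDefineMachineLearningRule(form, rule);
  form.click("[data-test-subj=define-continue]");
}
```

Building the "and continue" variant on top of the plain fill keeps both paths in sync when the Define step gains new inputs.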

Remove unused helper

10a9a42

Add cypress tests around creating/editing ML rules with suppression

f010c1c

These are failing because I haven't yet enabled the suppression UI for ML rules. Once that's done, we can start validating these tests.

rylnd added 5 commits July 1, 2024 13:13

Merge branch 'main' into ml_rule_alert_suppression

1924596

Test prebuilt rule workflows with ML Alert Suppression

96141dd

Abstract ML-related logic into helper hook

3aa105d

This mainly just composes some existing hooks that were previously pieced together in the form itself (step_define_rule) into a new hook, which is agnostic of the form itself.

Remove unused parameter

a9b410d

Not sure where this came from, whether it was a bad conflict or just some weird autocompletion that I missed.

pmuellr approved these changes Jul 1, 2024

MadameSheema reviewed Jul 2, 2024

...tion_logic/trial_license_complete_tier/execution_logic/machine_learning_alert_suppression.ts (outdated)

michaelolo24 reviewed Jul 2, 2024

x-pack/plugins/security_solution/public/detections/components/alerts_table/actions.tsx

michaelolo24 approved these changes Jul 2, 2024

vitaliidm approved these changes Jul 2, 2024

Skip FTR suppression tests in MKI

8e07763

As they rely on a feature flag to function.

MadameSheema approved these changes Jul 2, 2024

rylnd merged commit 2aa94a2 into elastic:main Jul 2, 2024
38 checks passed

rylnd deleted the ml_rule_alert_suppression branch July 2, 2024 19:33

kibanamachine added the v8.15.0 and backport:skip (This commit does not require backporting) labels Jul 2, 2024

nastasha-solomon mentioned this pull request Jul 5, 2024

[Request][8.15 & Serverless] Alert suppression for ML rules elastic/security-docs#5517

Closed

rylnd added a commit to rylnd/kibana that referenced this pull request Jul 12, 2024

Add tests validating enrichment behavior for ML suppression

d5aa551

This was requested during review of elastic#181926, and I'm circling back to that now.

rylnd mentioned this pull request Jul 16, 2024

[Detection Engine] ML Rule Alert Suppression - Followup #188267

Merged

Mikaayenson mentioned this pull request Aug 14, 2024

[FR] Add Alert Suppression for Addtional Rule Types elastic/detection-rules#3986

Merged
