Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Handle dupes when processing threshold rules #83062

Merged
merged 41 commits into from
Nov 30, 2020
Merged
Show file tree
Hide file tree
Changes from 40 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
7ac4f7d
Fix threshold rule synthetic signal generation
madirey Nov 3, 2020
fba2636
Merge branch 'master' of github.com:elastic/kibana into cidr-house-rules
madirey Nov 3, 2020
352a139
Merge branch 'master' of github.com:elastic/kibana into cidr-house-rules
madirey Nov 5, 2020
0c9d5e3
Use top_hits aggregation
madirey Nov 5, 2020
2f61b30
Find signals and aggregate over search terms
madirey Nov 6, 2020
56e558a
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 9, 2020
4610643
Exclude dupes
madirey Nov 9, 2020
3fa0e69
Fixes to algorithm
madirey Nov 10, 2020
0c3c1e7
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 10, 2020
35e2119
Sync timestamps with events/signals
madirey Nov 10, 2020
29641b6
Merge branch 'master' of github.com:elastic/kibana into cidr-house-rules
madirey Nov 10, 2020
5914e99
Add timestampOverride
madirey Nov 10, 2020
26c7c0d
Merge branch 'cidr-house-rules' into threshold-dupes
madirey Nov 10, 2020
7bbd12a
Revert changes in signal creation
madirey Nov 10, 2020
399d886
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 10, 2020
960ed9b
Simplify query, return 10k buckets
madirey Nov 10, 2020
8591886
Account for when threshold.field is not supplied
madirey Nov 10, 2020
a90af11
Merge branch 'master' of github.com:elastic/kibana into cidr-house-rules
madirey Nov 10, 2020
bf910ab
Ensure we're getting the last event when threshold.field is not provided
madirey Nov 10, 2020
2396190
Merge branch 'cidr-house-rules' into threshold-dupes
madirey Nov 10, 2020
83f81cb
Add missing import
madirey Nov 10, 2020
3052975
Merge branch 'cidr-house-rules' into threshold-dupes
madirey Nov 10, 2020
8d6c81d
Handle case where threshold field not supplied
madirey Nov 10, 2020
a087823
Fix type errors
madirey Nov 11, 2020
2e424f1
Merge master
madirey Nov 11, 2020
c898480
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 11, 2020
925acc7
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 12, 2020
ae24022
Handle non-ECS fields
madirey Nov 12, 2020
2c8dcfa
Regorganize
madirey Nov 12, 2020
8a7c84a
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 13, 2020
49044a6
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 20, 2020
974a5a3
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 23, 2020
23e4ec3
Address comments
madirey Nov 23, 2020
2bc904f
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 23, 2020
1e8d122
Fix type error
madirey Nov 24, 2020
3a54495
Add unit test for buildBulkBody on threshold results
madirey Nov 24, 2020
e5f2593
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 24, 2020
80a582a
Add threshold_count back to mapping (and deprecate)
madirey Nov 25, 2020
fe956ad
Merge branch 'master' of github.com:elastic/kibana into threshold-dupes
madirey Nov 25, 2020
2c61c26
Timestamp fixes
madirey Nov 25, 2020
a855a19
Merge branch 'master' into threshold-dupes
kibanamachine Nov 30, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
import signalsMapping from './signals_mapping.json';
import ecsMapping from './ecs_mapping.json';

export const SIGNALS_TEMPLATE_VERSION = 2;
export const SIGNALS_TEMPLATE_VERSION = 3;
export const MIN_EQL_RULE_INDEX_VERSION = 2;

export const getSignalsTemplate = (index: string) => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,16 @@
"threshold_count": {
"type": "float"
},
"threshold_result": {
rylnd marked this conversation as resolved.
Show resolved Hide resolved
"properties": {
"count": {
"type": "long"
},
"value": {
"type": "keyword"
}
}
},
"depth": {
"type": "integer"
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,115 @@ describe('buildBulkBody', () => {
expect(fakeSignalSourceHit).toEqual(expected);
});

test('bulk body builds well-defined body with threshold results', () => {
const sampleParams = sampleRuleAlertParams();
const baseDoc = sampleDocNoSortId();
const doc: SignalSourceHit = {
...baseDoc,
_source: {
...baseDoc._source,
threshold_result: {
count: 5,
value: 'abcd',
},
},
};
delete doc._source.source;
const fakeSignalSourceHit = buildBulkBody({
doc,
ruleParams: sampleParams,
id: sampleRuleGuid,
name: 'rule-name',
actions: [],
createdAt: '2020-01-28T15:58:34.810Z',
updatedAt: '2020-01-28T15:59:14.004Z',
createdBy: 'elastic',
updatedBy: 'elastic',
interval: '5m',
enabled: true,
tags: ['some fake tag 1', 'some fake tag 2'],
throttle: 'no_actions',
});
// Timestamp will potentially always be different so remove it for the test
// @ts-expect-error
delete fakeSignalSourceHit['@timestamp'];
const expected: Omit<SignalHit, '@timestamp'> & { someKey: 'someValue' } = {
someKey: 'someValue',
event: {
kind: 'signal',
},
signal: {
parent: {
id: sampleIdGuid,
type: 'event',
index: 'myFakeSignalIndex',
depth: 0,
},
parents: [
{
id: sampleIdGuid,
type: 'event',
index: 'myFakeSignalIndex',
depth: 0,
},
],
ancestors: [
{
id: sampleIdGuid,
type: 'event',
index: 'myFakeSignalIndex',
depth: 0,
},
],
original_time: '2020-04-20T21:27:45+0000',
status: 'open',
rule: {
actions: [],
author: ['Elastic'],
building_block_type: 'default',
id: '04128c15-0d1b-4716-a4c5-46997ac7f3bd',
rule_id: 'rule-1',
false_positives: [],
max_signals: 10000,
risk_score: 50,
risk_score_mapping: [],
output_index: '.siem-signals',
description: 'Detecting root and admin users',
from: 'now-6m',
immutable: false,
index: ['auditbeat-*', 'filebeat-*', 'packetbeat-*', 'winlogbeat-*'],
interval: '5m',
language: 'kuery',
license: 'Elastic License',
name: 'rule-name',
query: 'user.name: root or user.name: admin',
references: ['http://google.com'],
severity: 'high',
severity_mapping: [],
tags: ['some fake tag 1', 'some fake tag 2'],
threat: [],
throttle: 'no_actions',
type: 'query',
to: 'now',
note: '',
enabled: true,
created_by: 'elastic',
updated_by: 'elastic',
version: 1,
created_at: fakeSignalSourceHit.signal.rule?.created_at,
updated_at: fakeSignalSourceHit.signal.rule?.updated_at,
exceptions_list: getListArrayMock(),
},
threshold_result: {
count: 5,
value: 'abcd',
},
depth: 1,
},
};
expect(fakeSignalSourceHit).toEqual(expected);
});

test('bulk body builds original_event if it exists on the event to begin with', () => {
const sampleParams = sampleRuleAlertParams();
const doc = sampleDocNoSortId();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ export const buildBulkBody = ({
...buildSignal([doc], rule),
...additionalSignalFields(doc),
};
delete doc._source.threshold_result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this merit a comment about preventing the "meta signals pollution"? Or should we make this more implicit by moving this into a simple filtering function, for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really about persisting values that belong to the signal before the the signal object has been created yet. I like the idea of using a filter, or even maybe architecting this a little differently so that the signal fields can be merged into the signal object, rather than having this function create the signal object. Will look at this in a follow-up PR.

const event = buildEventTypeSignal(doc);
const signalHit: SignalHit = {
...doc._source,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -96,9 +96,9 @@ export const buildSignal = (docs: BaseSignalHit[], rule: RulesSchema): Signal =>
export const additionalSignalFields = (doc: BaseSignalHit) => {
return {
parent: buildParent(removeClashes(doc)),
original_time: doc._source['@timestamp'],
original_time: doc._source['@timestamp'], // This field has already been replaced with timestampOverride, if provided.
original_event: doc._source.event ?? undefined,
threshold_count: doc._source.threshold_count ?? undefined,
threshold_result: doc._source.threshold_result,
original_signal:
doc._source.signal != null && !isEventTypeSignal(doc) ? doc._source.signal : undefined,
};
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,10 @@ const getTransformedHits = (

const source = {
'@timestamp': get(timestampOverride ?? '@timestamp', hit._source),
threshold_count: totalResults,
threshold_result: {
count: totalResults,
value: ruleId,
},
...getThresholdSignalQueryFields(hit, filter),
};

Expand All @@ -176,7 +179,10 @@ const getTransformedHits = (

const source = {
'@timestamp': get(timestampOverride ?? '@timestamp', hit._source),
threshold_count: docCount,
threshold_result: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that these "temporary" fields (@timestamp, threshold_result) are not intended to be persisted in this format to the final signal, but instead are used to pass context from the query to the signal generation. Had you considered making this process more explicit by nesting these fields in some declarative key like searchContext ?

This seems indicative of some broader architectural issues: getThresholdSignalQueryFields seems analogous to additionalSignalFields except that it exists at a different level, but I could absolutely see it being moved up as additionalSourceFields and similarly taking from _source.searchContext.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this isn't new code so definitely a NIT on the above; just curious what your thoughts are here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think I can address this with the signal properties merging code discussed above. getThresholdSignalQueryFields is going away, per our discussion last week. Those fields are just copies of the data that can be reconstructed using signal.rule.query and signal.rule.filter combined with the "searchContext" contained in threshold_result. It may be cumbersome to get them all into one searchContext object, but I'll take a look at it. The query and filter already exist in the signal.rule section of the signal, while threshold_result belongs to the signal itself (it's computed after the data are aggregated). I think this can be cleaned up a bit though, so I'll see what I can do, and then run it by you. Thanks!

count: docCount,
value: get(threshold.field, hit._source),
},
...getThresholdSignalQueryFields(hit, filter),
};

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

import { TimestampOverrideOrUndefined } from '../../../../common/detection_engine/schemas/common/schemas';
import { singleSearchAfter } from './single_search_after';

import { AlertServices } from '../../../../../alerts/server';
import { Logger } from '../../../../../../../src/core/server';
import { SignalSearchResponse } from './types';
import { BuildRuleMessage } from './rule_messages';

interface FindPreviousThresholdSignalsParams {
from: string;
to: string;
indexPattern: string[];
services: AlertServices;
logger: Logger;
ruleId: string;
bucketByField: string;
timestampOverride: TimestampOverrideOrUndefined;
buildRuleMessage: BuildRuleMessage;
}

export const findPreviousThresholdSignals = async ({
from,
to,
indexPattern,
services,
logger,
ruleId,
bucketByField,
timestampOverride,
buildRuleMessage,
}: FindPreviousThresholdSignalsParams): Promise<{
searchResult: SignalSearchResponse;
searchDuration: string;
searchErrors: string[];
}> => {
const aggregations = {
threshold: {
terms: {
field: 'signal.threshold_result.value',
},
aggs: {
lastSignalTimestamp: {
max: {
field: 'signal.original_time', // timestamp of last event captured by bucket
},
},
},
},
};

const filter = {
bool: {
must: [
{
term: {
'signal.rule.rule_id': ruleId,
},
},
{
term: {
'signal.rule.threshold.field': bucketByField,
},
},
],
},
};

return singleSearchAfter({
aggregations,
searchAfterSortId: undefined,
timestampOverride,
index: indexPattern,
from,
to,
services,
logger,
filter,
pageSize: 0,
buildRuleMessage,
});
};
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ export const findThresholdSignals = async ({
terms: {
field: threshold.field,
min_doc_count: threshold.value,
size: 10000, // max 10k buckets
},
aggs: {
// Get the most recent hit per bucket
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

import { Logger, KibanaRequest } from 'src/core/server';

import { Filter } from 'src/plugins/data/common';
import {
SIGNALS_ID,
DEFAULT_SEARCH_AFTER_PAGE_SIZE,
Expand All @@ -29,6 +30,7 @@ import {
RuleAlertAttributes,
EqlSignalSearchResponse,
BaseSignalHit,
ThresholdQueryBucket,
} from './types';
import {
getGapBetweenRuns,
Expand All @@ -46,6 +48,7 @@ import { signalParamsSchema } from './signal_params_schema';
import { siemRuleActionGroups } from './siem_rule_action_groups';
import { findMlSignals } from './find_ml_signals';
import { findThresholdSignals } from './find_threshold_signals';
import { findPreviousThresholdSignals } from './find_previous_threshold_signals';
import { bulkCreateMlSignals } from './bulk_create_ml_signals';
import { bulkCreateThresholdSignals } from './bulk_create_threshold_signals';
import {
Expand Down Expand Up @@ -300,6 +303,46 @@ export const signalRulesAlertType = ({
lists: exceptionItems ?? [],
});

const {
searchResult: previousSignals,
searchErrors: previousSearchErrors,
} = await findPreviousThresholdSignals({
indexPattern: [outputIndex],
from,
to,
services,
logger,
ruleId,
bucketByField: threshold.field,
timestampOverride,
buildRuleMessage,
});

previousSignals.aggregations.threshold.buckets.forEach((bucket: ThresholdQueryBucket) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this logic to a function? We've got similar filter-generating functions and this file is already chonky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally!

esFilter.bool.filter.push(({
bool: {
must_not: {
bool: {
must: [
{
term: {
[threshold.field ?? 'signal.rule.rule_id']: bucket.key,
},
},
{
range: {
[timestampOverride ?? '@timestamp']: {
lte: bucket.lastSignalTimestamp.value_as_string,
},
},
},
],
},
},
},
} as unknown) as Filter);
});

const { searchResult: thresholdResults, searchErrors } = await findThresholdSignals({
inputIndexPattern: inputIndex,
from,
Expand Down Expand Up @@ -349,7 +392,7 @@ export const signalRulesAlertType = ({
}),
createSearchAfterReturnType({
success,
errors: [...errors, ...searchErrors],
errors: [...errors, ...previousSearchErrors, ...searchErrors],
createdSignalsCount: createdItemsCount,
bulkCreateTimes: bulkCreateDuration ? [bulkCreateDuration] : [],
}),
Expand Down
Loading