[alerting] sorted limit of groups in index threshold alert #58905
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
Where this gets interesting is when the aggregation function changes - Somehow near the

request/response with count() over top 42 host.name.keyword

request
{
  "index": [
    "es-apm-sys-sim"
  ],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "filter": {
          "range": {
            "@timestamp": {
              "gte": "2020-03-12T17:57:36.229Z",
              "lt": "2020-03-12T17:58:26.229Z",
              "format": "strict_date_time"
            }
          }
        }
      }
    },
    "aggs": {
      "groupAgg": {
        "terms": {
          "field": "host.name.keyword",
          "size": 42
        },
        "aggs": {
          "dateAgg": {
            "date_range": {
              "field": "@timestamp",
              "ranges": [
                {
                  "from": "2020-03-12T17:57:36.229Z",
                  "to": "2020-03-12T17:58:26.229Z"
                }
              ]
            }
          }
        }
      }
    }
  },
  "ignoreUnavailable": true,
  "allowNoIndices": true,
  "ignore": [
    404
  ]
}

response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 196,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "groupAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "host-A",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T17:57:36.229Z-2020-03-12T17:58:26.229Z",
                "from": 1584035856229,
                "from_as_string": "2020-03-12T17:57:36.229Z",
                "to": 1584035906229,
                "to_as_string": "2020-03-12T17:58:26.229Z",
                "doc_count": 49
              }
            ]
          }
        },
        {
          "key": "host-B",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T17:57:36.229Z-2020-03-12T17:58:26.229Z",
                "from": 1584035856229,
                "from_as_string": "2020-03-12T17:57:36.229Z",
                "to": 1584035906229,
                "to_as_string": "2020-03-12T17:58:26.229Z",
                "doc_count": 49
              }
            ]
          }
        },
        {
          "key": "host-C",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T17:57:36.229Z-2020-03-12T17:58:26.229Z",
                "from": 1584035856229,
                "from_as_string": "2020-03-12T17:57:36.229Z",
                "to": 1584035906229,
                "to_as_string": "2020-03-12T17:58:26.229Z",
                "doc_count": 49
              }
            ]
          }
        },
        {
          "key": "host-D",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T17:57:36.229Z-2020-03-12T17:58:26.229Z",
                "from": 1584035856229,
                "from_as_string": "2020-03-12T17:57:36.229Z",
                "to": 1584035906229,
                "to_as_string": "2020-03-12T17:58:26.229Z",
                "doc_count": 49
              }
            ]
          }
        }
      ]
    }
  }
}

request/response with avg(system.cpu.total.norm.pct) over top 42 host.name.keyword

request
{
  "index": [
    "es-apm-sys-sim"
  ],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "filter": {
          "range": {
            "@timestamp": {
              "gte": "2020-03-12T18:04:00.650Z",
              "lt": "2020-03-12T18:04:50.650Z",
              "format": "strict_date_time"
            }
          }
        }
      }
    },
    "aggs": {
      "groupAgg": {
        "terms": {
          "field": "host.name.keyword",
          "size": 42
        },
        "aggs": {
          "dateAgg": {
            "date_range": {
              "field": "@timestamp",
              "ranges": [
                {
                  "from": "2020-03-12T18:04:00.650Z",
                  "to": "2020-03-12T18:04:50.650Z"
                }
              ]
            },
            "aggs": {
              "metricAgg": {
                "avg": {
                  "field": "system.cpu.total.norm.pct"
                }
              }
            }
          }
        }
      }
    }
  },
  "ignoreUnavailable": true,
  "allowNoIndices": true,
  "ignore": [
    404
  ]
}

response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 196,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "groupAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "host-A",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T18:04:00.650Z-2020-03-12T18:04:50.650Z",
                "from": 1584036240650,
                "from_as_string": "2020-03-12T18:04:00.650Z",
                "to": 1584036290650,
                "to_as_string": "2020-03-12T18:04:50.650Z",
                "doc_count": 49,
                "metricAgg": {
                  "value": 0.8295918362481254
                }
              }
            ]
          }
        },
        {
          "key": "host-B",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T18:04:00.650Z-2020-03-12T18:04:50.650Z",
                "from": 1584036240650,
                "from_as_string": "2020-03-12T18:04:00.650Z",
                "to": 1584036290650,
                "to_as_string": "2020-03-12T18:04:50.650Z",
                "doc_count": 49,
                "metricAgg": {
                  "value": 0.608163266765828
                }
              }
            ]
          }
        },
        {
          "key": "host-C",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T18:04:00.650Z-2020-03-12T18:04:50.650Z",
                "from": 1584036240650,
                "from_as_string": "2020-03-12T18:04:00.650Z",
                "to": 1584036290650,
                "to_as_string": "2020-03-12T18:04:50.650Z",
                "doc_count": 49,
                "metricAgg": {
                  "value": 0.44183673238267707
                }
              }
            ]
          }
        },
        {
          "key": "host-D",
          "doc_count": 49,
          "dateAgg": {
            "buckets": [
              {
                "key": "2020-03-12T18:04:00.650Z-2020-03-12T18:04:50.650Z",
                "from": 1584036240650,
                "from_as_string": "2020-03-12T18:04:00.650Z",
                "to": 1584036290650,
                "to_as_string": "2020-03-12T18:04:50.650Z",
                "doc_count": 49,
                "metricAgg": {
                  "value": 0.11428571690102013
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Here's where to instrument the code to get this kind of data dump: kibana/x-pack/plugins/alerting_builtins/server/alert_types/index_threshold/lib/time_series_query.ts, lines 107 to 116 in 73d1013
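For reading dumps like the ones above, here is a minimal sketch of flattening the response into one value per group per date range. The interfaces model only the fields visible in the responses in this thread, and `extractGroupPoints` is a hypothetical helper, not a function from the Kibana source. It assumes that for `count()` there is no `metricAgg`, so the date bucket's `doc_count` is the value.

```typescript
// Shapes cover only the fields visible in the dumped responses above.
interface DateBucket {
  to_as_string: string;
  doc_count: number;
  metricAgg?: { value: number | null };
}

interface GroupBucket {
  key: string;
  doc_count: number;
  dateAgg: { buckets: DateBucket[] };
}

interface ThresholdResponse {
  aggregations?: { groupAgg?: { buckets: GroupBucket[] } };
}

interface GroupPoint {
  group: string;
  to: string;
  value: number | null;
}

// Hypothetical helper: one point per (group, date range) pair.
function extractGroupPoints(resp: ThresholdResponse): GroupPoint[] {
  const groups = resp.aggregations?.groupAgg?.buckets ?? [];
  const points: GroupPoint[] = [];
  for (const group of groups) {
    for (const range of group.dateAgg.buckets) {
      // count() requests carry no metricAgg, so fall back to doc_count.
      const value = range.metricAgg ? range.metricAgg.value : range.doc_count;
      points.push({ group: group.key, to: range.to_as_string, value });
    }
  }
  return points;
}
```

Applied to the two responses above, this would yield 49 for every host in the `count()` case and the per-host averages in the `avg()` case.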
Here is some relevant text from the doc on using
In our case, the aggs path includes a multi-bucket date_range agg, so we can't reference the existing "leaf" metric agg this way. We'd need to create a new agg over the entire date range, independent of the date_range agg.
So that means the sort would work fine for
Seems worth noting as well that we're using the same query in the alert executor AND in the time series query that renders the viz graph in the alert UI. It's far more important to get the former right than the latter. The alert executor only ends up with a single date_range bucket, so it seems like we could optimize on that, or not optimize and use the whole date range.

Add a new agg to get the calculated metric with a single-bucket agg based on the last/only date range being requested (or the whole date range), and then reference that agg in the order param. For

This doesn't take into account the comparators
Maybe? Like you'd want to sort ascending for the first, but descending for the second. That leads to quandaries like how would you sort
That would involve sorting by the distance of a bucket's average from a total average, or something? Seems hard-to-impossible, and beyond the scope of just getting basic ordering in, so I will defer working on that part for the first PR to address this issue.
The current index threshold alert uses a `size` limit on the terms aggregation when grouping, but does not sort the buckets, so it falls back to descending count on the grouped buckets to determine what to return. The watcher API for the index threshold notes this as "top N of", implying a sort. This PR applies sorting when using `groupBy: top` and `aggType != count`. For count, ES is already sorting the way we want. The sort value is calculated by a separate agg beside the date_range aggregation, using the same metrics agg specified in the query - `aggType(aggField)`. That agg is then referenced in a new `order` property in the terms agg, using 'asc' sorting for `min`, and 'desc' sorting for `avg`, `max`, and `sum`. This doesn't change the shape of the output at all; it only changes which term buckets are returned when there are more term buckets than requested with the `termSize` parameter.
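A minimal sketch of the aggregation shape this describes, under stated assumptions: `buildGroupAgg`, `QueryParams`, and the agg name `sortValueAgg` are hypothetical names chosen for illustration and are not taken from the PR. The `groupAgg`, `dateAgg`, and `metricAgg` names follow the request dumps in this thread. The sort agg sits beside `dateAgg` as a direct child of the terms agg, so the `order` path points at a single-value metric and never passes through the multi-bucket date_range agg.

```typescript
type AggType = "count" | "avg" | "min" | "max" | "sum";

// Hypothetical parameter bag; field names are illustrative only.
interface QueryParams {
  aggType: AggType;
  aggField?: string;
  termField: string;
  termSize: number;
  dateStart: string;
  dateEnd: string;
}

// Assumed name for the sibling sort agg (not confirmed by the source).
const SORT_AGG = "sortValueAgg";

function buildGroupAgg(p: QueryParams): Record<string, unknown> {
  const dateAgg: Record<string, unknown> = {
    date_range: {
      field: "@timestamp",
      ranges: [{ from: p.dateStart, to: p.dateEnd }],
    },
  };
  const terms: Record<string, unknown> = { field: p.termField, size: p.termSize };
  const groupSubAggs: Record<string, unknown> = { dateAgg };

  // For count, the default terms ordering (descending doc_count) is kept.
  if (p.aggType !== "count") {
    if (!p.aggField) throw new Error(`aggField is required for ${p.aggType}`);
    const metric = { [p.aggType]: { field: p.aggField } };
    // Leaf metric per date range, as in the avg() request dump.
    dateAgg["aggs"] = { metricAgg: metric };
    // Same metric again as a sibling of dateAgg, computed over every
    // document the query's range filter lets through.
    groupSubAggs[SORT_AGG] = metric;
    // Lowest values first for min, highest first for avg, max and sum.
    terms["order"] = { [SORT_AGG]: p.aggType === "min" ? "asc" : "desc" };
  }

  return { groupAgg: { terms, aggs: groupSubAggs } };
}
```

Because the sibling agg only influences which term buckets survive the `size` cut, the `dateAgg` buckets in the response keep the same shape as before.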
* master: (30 commits)
  [TSVB] fix text color when using custom background color (elastic#60261)
  Fix import to timefilter from in TSVB (elastic#60296)
  [NP] Get rid of usage redirectWhenMissing service (elastic#59777)
  [SIEM] Fix Timeline footer styling (elastic#59587)
  [ML] Fixes to error handling for analytics jobs and file data viz (elastic#60249)
  Give better stack traces for Unhandled Promise Rejection warnings (elastic#60235)
  resolves elastic#58905 (elastic#60120)
  Added variables button for text fields in Pagerduty component. (elastic#60189)
  adds test that action vars are rendered for alert action parms (elastic#60310)
  Closes 59786 by removing the update toast (elastic#60172)
  [EPM] Packages list tabs (elastic#60167)
  Added message variables button for Webhook body form field (elastic#60174)
  Revert "adds new test (elastic#60064)"
  [Maps] move MapSavedObject type out of telemetry (elastic#60127)
  [Reporting] Fix error handling for job handler in route (elastic#60161)
  [Endpoint] TEST: verify alerts page header says 'Alerts' (elastic#60206)
  EMT-248: implement ack resource to accept event payload to acknowledge agent actions (elastic#60218)
  Migrate dual validated range (elastic#59689)
  Embeddable triggers (elastic#58440)
  [Endpoint] Sample data generator CLI script (elastic#59952)
  ...
The watcher index threshold alert, which the new Kibana alerting index threshold alert is based on, has an option to limit the number of "groups" returned (when using `groupField`). The Kibana alert supports this, but the watcher one labels it as "Top n of ...", implying that the groups are somehow sorted before limiting, presumably showing you the most relevant groups.

It's not quite clear how this works, given all the aggregation functions. I think for count, average, max and sum, you'd basically want to pick the groups that have the highest values being processed. For min, you'd want the lowest.
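To make the intended "top n" semantics concrete, here is a small client-side illustration. It is only a sketch of the selection rule suggested above (highest values for every aggregation except min, lowest for min), not how the alert computes anything; `topGroups` and `GroupValue` are hypothetical names, and the sample values are rounded from the `avg()` response dump in this thread.

```typescript
type AggType = "count" | "avg" | "min" | "max" | "sum";

interface GroupValue {
  group: string;
  value: number;
}

// Hypothetical helper: pick the n most "relevant" groups for an agg type.
function topGroups(groups: GroupValue[], aggType: AggType, n: number): GroupValue[] {
  // Ascending for min, descending for everything else.
  const direction = aggType === "min" ? 1 : -1;
  return [...groups]
    .sort((a, b) => direction * (a.value - b.value))
    .slice(0, n);
}

// Per-host averages, rounded from the avg() response dump.
const cpuByHost: GroupValue[] = [
  { group: "host-A", value: 0.83 },
  { group: "host-B", value: 0.61 },
  { group: "host-C", value: 0.44 },
  { group: "host-D", value: 0.11 },
];

const highest = topGroups(cpuByHost, "avg", 2).map((g) => g.group); // ["host-A", "host-B"]
const lowest = topGroups(cpuByHost, "min", 2).map((g) => g.group); // ["host-D", "host-C"]
```

All four hosts have the same `doc_count` of 49 in the dumps, so a count-based ordering cannot tell them apart, while the two metric-based selections above pick different pairs.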
For between? And I added a "notBetween" to the Kibana alert. I think maybe we just don't sort for those.

note: between is a comparator, not an aggregation

We'll need to figure out how to work this into the query DSL that we are sending. I could see some sorting done with the size limiter; not quite sure if that's still applicable given we're doing a different query than watcher did, but it seems like a start.