[Metrics UI] Alerts fail when hitting the bucket limit #68492
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

Issue persists in 7.8.0.
The filters and grouping work fine in both Inventory and Metrics Explorer. However, no alert instances are created if we apply the same filter in an alert. We have also noticed discrepancies between the chart shown on the alert flyout and the Metrics Explorer charts. I am not sure if this is related, but please refer to the following screenshots.

[Screenshot: Metrics Explorer]
[Screenshot: Alert flyout]

Based on the above screenshots, you may notice that we have 2 hosts with CPU usage exceeding 50% most of the time. However, the chart in the alert flyout only plots the usage for one host. Does this mean the chart in the alert flyout does not support grouping in 7.8 yet?

Now, based on the same Metrics Explorer charts and alert configuration, we know that there should be at least 1 alert instance every time, because the 2nd chart on the first row shows that host CPU usage consistently hovers around 56%-58%, but we are seeing nothing in the alert instance list. If we remove the 2nd condition:
Still trying to figure out how to reproduce this result on my end, but in the meantime I can answer:
Correct, we only show you one sample group on the chart and don't yet have a way to paginate through all the rest of them. We do have #67684 coming up in 7.9 which can at least tell you if some of your matched groups will cause the alert to fire, but we can talk about adding pagination if that'd be a good UX improvement. Will update this issue once we get closer to figuring out what's causing your problem.
Yup, sure, my initial post was based on our customer's production and DR environment, while my subsequent post was based on our demo/test environment. We are able to replicate the same in both environments.
Noticed the following query error in ES; it looks like the query failed to execute. Tried to run the query, but I encountered:

[Collapsed query and error output]
Exceeding a bucket limit could explain why @Zacqary was unable to reproduce this. I believe the index threshold does some kind of calculation to determine how many buckets could be created, and does something if it's over some limit (e.g. 10K), such as reducing the number of date ranges. Also looking at that query, I'm wondering if the filter can be moved out of the […].

Should also note that the index threshold only does elaborate date range aggregations like this for generating the data for the graph it shows; the query used when the alert runs only looks at one specific date range. I wonder if this query was actually for a graph visualization rather than what the alert ran.
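As a rough illustration of the kind of guard described above (hypothetical names, and an illustrative 10K limit that mirrors Elasticsearch's default `search.max_buckets`; this is not the actual index threshold code):

```ts
// Hypothetical sketch, not Kibana source: estimate how many date-range buckets a
// chart query would create and widen the interval when the total would blow past
// a bucket budget.
const MAX_BUCKETS = 10_000;

interface ChartQueryPlan {
  intervalMs: number; // width of each date-range bucket
  rangeCount: number; // number of date ranges per group actually requested
}

function planDateRanges(
  windowMs: number,
  desiredIntervalMs: number,
  groupCount: number
): ChartQueryPlan {
  // Total buckets ≈ (date ranges per group) × (number of groups).
  let intervalMs = desiredIntervalMs;
  let rangeCount = Math.ceil(windowMs / intervalMs);

  // Widen the interval until the estimate fits the budget (or we're down to one range).
  while (rangeCount > 1 && rangeCount * groupCount > MAX_BUCKETS) {
    intervalMs *= 2;
    rangeCount = Math.ceil(windowMs / intervalMs);
  }
  return { intervalMs, rangeCount };
}

// Example: a 24h chart at 1-minute resolution across 4000 groups would need
// 1440 × 4000 buckets, so the interval gets widened before querying.
console.log(planDateRanges(24 * 60 * 60 * 1000, 60 * 1000, 4000));
```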
What I noticed was that this error occurs regularly at a fixed interval, so I don't think it is caused by the graph visualization. The alert flyout was also not open when this error appeared in the ES log.
@hendry-lim Thanks for that error, that's helpful! Metric threshold alerts don't have a "too many buckets" handler, so it looks like that's what we'll need to add.
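A minimal sketch of what such a handler could look like, assuming a generic search callback; the error-shape check is simplified and the message is illustrative, not the actual fix that landed:

```ts
// Hypothetical sketch of a "too many buckets" guard around the alert's ES search.
async function runAlertQuery(
  search: (body: object) => Promise<unknown>,
  queryBody: object
) {
  try {
    return await search(queryBody);
  } catch (err: any) {
    // Simplified error-shape inspection; real client errors carry more structure.
    const reason = JSON.stringify(err?.meta?.body ?? err?.message ?? '');
    if (reason.includes('too_many_buckets_exception')) {
      // Surface an actionable error instead of silently creating no alert instances.
      throw new Error(
        'Metric threshold query exceeded the Elasticsearch bucket limit; ' +
          'try a narrower time range or fewer groups.'
      );
    }
    throw err;
  }
}
```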
So actually, on closer investigation, it looks like your query is missing a `range` filter. It would still be valuable to handle the bucket limit in case you wanted to alert on 4000 groups at once, but in this query's case it shouldn't be hitting that bucket limit at all. So this is two bugs for us to fix.
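For illustration, this is the kind of `range` clause being referred to, written as a plain search body; the `@timestamp` field, metric field, and 5-minute window are assumptions, and this is not the exact query the metric threshold alert generates:

```ts
// Illustrative only — not the query Kibana actually builds. Without a range
// filter like this, the terms/date aggregations fan out over every document in
// the index and can hit Elasticsearch's bucket limit.
const searchBody = {
  size: 0,
  query: {
    bool: {
      filter: [
        // Bound the query to the alert's evaluation window (assumed 5 minutes).
        { range: { '@timestamp': { gte: 'now-5m', lte: 'now' } } },
      ],
    },
  },
  aggs: {
    groups: {
      terms: { field: 'host.name', size: 100 },
      aggs: {
        avg_memory: { avg: { field: 'system.memory.used.pct' } },
      },
    },
  },
};
```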
#70672 should fix the root cause of this issue. @simianhacker told me that the […]
Looking good in […]
Maintainer Edit

Queries generated by this alert are missing a `range` filter and query too much data, triggering a Too Many Buckets exception.

Original Submitted Issue
Kibana version: 7.7.1
Elasticsearch version: 7.7.1
Server OS version: RHEL 8
Browser version: 83.0.4103.97
Browser OS version: Windows 10
Original install method (e.g. download page, yum, from source, etc.): Docker
Describe the bug:
Alert instances were not created with the following filter in a Metric Threshold alert:
NOT host.name:dv* and NOT host.name:ts*
However, alert instances were created if we only used the following:
NOT host.name:dv*
There are other hosts that exceeded the memory threshold besides those that matched `dv*` and `ts*`.

Steps to reproduce:
1. Create a Metric Threshold alert where `system.memory.used.pct` is above or equals `0.8` for the last `5 minutes`.
2. Apply the filter `NOT host.name:dv* and NOT host.name:ts*` (see the sketch after these steps).
3. Group by `host.name`.
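As referenced in step 2, here is a rough hand-written approximation of what that KQL filter means in Elasticsearch query DSL (not the exact output of Kibana's KQL conversion):

```ts
// Approximate DSL equivalent of the KQL `NOT host.name:dv* and NOT host.name:ts*`.
const filterQuery = {
  bool: {
    must_not: [
      { wildcard: { 'host.name': { value: 'dv*' } } },
      { wildcard: { 'host.name': { value: 'ts*' } } },
    ],
  },
};
```

Hosts matching neither pattern should still come back from the group-by aggregation, which is consistent with the root cause identified above being the generated query itself (the missing `range` filter) rather than the filter expression.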
Expected behavior:
Alert instances should be created with either/both filters applied as long as there are hosts that exceed the memory threshold.