Cumulative Sum with correct initial value #60672
Pinging @elastic/kibana-app (Team:KibanaApp)
Pinging @elastic/kibana-app-arch (Team:AppArch)
@elastic/es-analytics-geo I wonder if there would be a way to achieve such a feature inside Elasticsearch? Does the cumulative sum have enough knowledge that it could basically calculate the sum (or whatever metric it runs over) for all documents "before the date histogram starts" (which would usually be determined by the overall date range filter)? I am still on the fence whether that, even though a common use case in Kibana, requires knowledge too specific to handle directly in Elasticsearch. If it doesn't make sense, it would at least be good if we could manually specify an initial value for the cumulative_sum that gets added before the first bucket, so we could do a two-query approach for that.
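To make the two-query idea concrete, here is a rough sketch with placeholder index and field names (`timestamp`, `price`), written as illustrative TypeScript object literals rather than actual Kibana code; since `cumulative_sum` has no initial-value option today, the result of the first request would have to be added to the buckets afterwards:

```ts
// Sketch only: index and field names are placeholders, not real Kibana code.

// Request 1: total of everything before the visible time range; this would act
// as the "initial value" for the cumulative sum.
const beforeRangeRequest = {
  size: 0,
  query: { range: { timestamp: { lt: '2020-03-01' } } },
  aggs: { total_before: { sum: { field: 'price' } } },
};

// Request 2: the usual date_histogram + cumulative_sum over the visible range.
const inRangeRequest = {
  size: 0,
  query: { range: { timestamp: { gte: '2020-03-01', lte: '2020-03-31' } } },
  aggs: {
    over_time: {
      date_histogram: { field: 'timestamp', calendar_interval: '1d' },
      aggs: {
        sales: { sum: { field: 'price' } },
        cumulative_sales: { cumulative_sum: { buckets_path: 'sales' } },
      },
    },
  },
};
```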
Not easily, no. :( The main issue is that aggs get their values from whatever matches the query, so if the query is filtering out a portion of time, none of the aggs will ever get a chance to see it. And aggs don't have any influence over the query, so it's entirely up to the user (or Kibana) to configure it so that the right data is aggregated. That makes supporting it tricky, because we don't want aggs to be dependent on the user setting up the agg tree correctly (e.g. this functionality only works if you have ABC in XYZ places). Supporting something like an …

Theoretically the data could be collected with a few different combinations, but none of them are super clean:

Option 1
Option 2
Option 3
Option 4
Unclear which would be best.

- The first option is one search/collector execution, but the …
- The second essentially executes two searches (…).
- The third is probably the fastest, since it's two exclusive queries hitting the minimal amount of data, but it does require an msearch and two entire search executions, including all the extra overhead.
- The fourth is probably faster than 1 or 2, but requires Kibana to take all the "pre-date-histo" buckets and merge them together before presenting them to the user.

And in all cases, it requires Kibana to intervene and munge the results into something usable for the user.
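The snippets for the four options are not shown above, but just to illustrate the general shape of the single-search family of approaches (not necessarily identical to any option above, and with placeholder index and field names), it could look something like a query covering all data up to the end of the range, with one `filter` agg summing the pre-histogram portion and another wrapping the date_histogram:

```ts
// Illustration only: not the original snippets; names are placeholders.
const singleSearchRequest = {
  size: 0,
  // The query must not exclude the earlier data, otherwise no agg can see it.
  query: { range: { timestamp: { lte: '2020-03-31' } } },
  aggs: {
    // Everything before the histogram's start; becomes the starting offset.
    before_range: {
      filter: { range: { timestamp: { lt: '2020-03-01' } } },
      aggs: { total_before: { sum: { field: 'price' } } },
    },
    // The visible range, bucketed as usual.
    in_range: {
      filter: { range: { timestamp: { gte: '2020-03-01' } } },
      aggs: {
        over_time: {
          date_histogram: { field: 'timestamp', calendar_interval: '1d' },
          aggs: {
            sales: { sum: { field: 'price' } },
            cumulative_sales: { cumulative_sum: { buckets_path: 'sales' } },
          },
        },
      },
    },
  },
};
```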
Thanks for the detailed explanation of the different possibilities. In general it's not a problem to munge results later in Kibana; it would just be nice if we could ideally solve it in one query, since that simplifies the infrastructure for us quite a lot. Thanks also for the performance hints here. My feeling around those: …
Since we also have a chance of solving that in one request, I think we would rather prefer solving it in one request and then summing up on the Kibana side, than making 2 requests and needing an "initial value" option for the cumulative sum aggregation.
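A sketch of what that summing up on the Kibana side could look like, assuming a response shaped like the single-search illustration above (the names here are made up for the example, not actual Kibana code):

```ts
// Shape of the buckets we care about from the hypothetical response above.
interface HistoBucket {
  key: number;
  key_as_string?: string;
  cumulative_sales: { value: number };
}

// Add the "before the range" total to every cumulative_sum value so that the
// first bucket starts from the correct offset instead of 0.
function applyInitialValue(
  totalBefore: number,
  buckets: HistoBucket[]
): Array<{ key: number; value: number }> {
  return buckets.map((bucket) => ({
    key: bucket.key,
    value: bucket.cumulative_sales.value + totalBefore,
  }));
}

// Usage against the hypothetical response shape:
// const totalBefore = response.aggregations.before_range.total_before.value;
// const buckets = response.aggregations.in_range.over_time.buckets;
// const corrected = applyInitialValue(totalBefore, buckets);
```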
👍 that seems reasonable to me, especially as this isn't likely to be a feature that shows up on all dashboards across all parts of Kibana, so being a little slower relative to other operations is probably acceptable :)

Between 1 and 2, it's hard to really pin down which would be better; it might be worthwhile testing both against a decent-sized dataset. Filter aggs work by fetching the bitset of matches against the filter, and then checking that bitset for each document that they see. This is in contrast to normal query filters, which "advance" to the first matching document.
Hard to say. :)
Thank you for contributing to this issue; however, we are closing it due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen it with a comment.
It would be nice if the user had a way to use cumulative sum, but have the first bucket start not at 0, but basically at the value the aggregation had for all documents "before" the date_histogram started. Currently there is no real way of using that value as a starting point.
Users want to achieve charts like the one shown in this discuss post.
I think that feature would mainly make sense for cumulative_sums that run over date_histogram buckets, though technically the same idea could apply to histograms, where the initial value would be the "sum" (or whatever metric) of all documents smaller than the left-most bucket in the histogram.
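As a tiny worked example with made-up numbers, this is the difference between what cumulative_sum produces today and what is being asked for:

```ts
// Made-up numbers, sketch only.
const bucketSums = [10, 5, 20]; // per-bucket sums inside the queried range
const sumBeforeRange = 120;     // sum over all documents before the first bucket

// Running total starting from a given offset.
function cumulative(values: number[], start: number): number[] {
  const out: number[] = [];
  let running = start;
  for (const value of values) {
    running += value;
    out.push(running);
  }
  return out;
}

console.log(cumulative(bucketSums, 0));              // today:   [10, 15, 35]
console.log(cumulative(bucketSums, sumBeforeRange)); // desired: [130, 135, 155]
```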