Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lens] Summarize a break down by (collapse bucket column) #94619

Closed
flash1293 opened this issue Mar 15, 2021 · 7 comments · Fixed by #131748
Closed

[Lens] Summarize a break down by (collapse bucket column) #94619

flash1293 opened this issue Mar 15, 2021 · 7 comments · Fixed by #131748
Assignees
Labels
discuss enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@flash1293
Copy link
Contributor

In some cases aggregating calculations spanning a whole series should be kept along with the individual values (#94597), but in some cases they are not relevant and should be "collapsed" - e.g. in the resulting table passed to the chart only the aggregated value is included. This will reduce the number of rows in the table.

Description

by_terms(field, size, metric, reduce op, order by, order direction)
by_toplevel_terms(field, size, metric, reduce op, order by, order direction)

Reduce op can be: min, max, sum, avg, last, first
Order by can be: term, metric, count
Example: by_toplevel_terms(“host”, 100, last(out_bytes), sum)

by_terms is the function to do min/max/sum/... buckets on the current level of bucket nesting, by using by_toplevel_terms it's doing the same thing, but changing the order of columns, putting the inner terms agg to the top level.

This syntax is limiting the collapsed bucket columns to use the top values function only, but I think this is also by far the most common use case. If this wouldn't cover some case, we could make it more flexible later on and allow other buckets. This can always be worked around by the user adding a runtime field which calculates the desired bucket (e.g. for bucketing by year)

Implementation

If there are multiple by_terms or by_toplevel_terms , we have to do separate esaggs calls anyway, merging together the resulting tables using an outer join on the shared bucket columns. This makes the logic of to_expression much more complex, because for each by_*_terms a separate esaggs has to be made (as some of the columns will be collapsed but others won't). The resulting tables can be joined together by doing an outer join on the bucket columns all esaggs calls have in common.

Use case

If the last state of multiple entities has to be aggregated, e.g. the cumulative load of all hosts: by_toplevel_terms(host, 100, last_value(load), sum)

@flash1293 flash1293 added discuss enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Mar 15, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@ghudgins
Copy link
Contributor

this case got +1'd by a customer who had to do series aggregation in TSVB to get the proper calculation per server to then derive an overall calculation.

@ghudgins
Copy link
Contributor

+1 - summary of a metric across hosts without displaying the hosts

@flash1293
Copy link
Contributor Author

flash1293 commented Apr 4, 2022

The issue with the idea presented above is that it will only work for "top values". Maybe we should think about another approach to allow the full flexibility (and even going beyond) of min/max/avg/sum bucket aggs.

One idea is to not solving this within formula but instead having a separate dimension group:
Screenshot 2022-04-04 at 13 10 05
Screenshot 2022-04-04 at 13 12 01

In this example the "Count of records" dimension would actually be broken down by the additional "Top values of geo.dest", then summed up because of the "Collapse by" selection.

This would limit the user to apply the same "collapsed" dimensions per layer, but maybe that's OK. This way it would be straightforward to break down by multiple nested dimensions and collapse them all at once (like "give me the average day for the average host" - daily date histogram + top values for host in the "breakdown and collapse" group)

I would like to avoid to provide a way to encode all bucket dimensions into formula as well.

@flash1293
Copy link
Contributor Author

After offline discussion, we decided to allow the user to collapse certain existing breakdown dimensions:
Screenshot 2022-04-19 at 15 05 21
Screenshot 2022-04-19 at 15 05 35

If these are selected, the dimension is displayed as hidden and not shown in the chart anymore.

On the technical layer, the chart toExpression function is calling a collapse_column expression function to do the collapsing, then it's constructing the expression to not include the breakdown dimension.

This will be added for pie, table and xy - later once we are going to add gauge and metric breakdown, it will be added as an option to these as well.

@flash1293
Copy link
Contributor Author

flash1293 commented Apr 21, 2022

The approach described above is missing one feature which might be common (but is also not supported in TSVB today) - calculating the ratio of two series aggs. However, if we would implement #94609 , then it would be possible to do the calculation with the given primitives:

  • There's a series with dimension fields dim1, dim2 and dim3
  • There are two gauge metrics (m1 and m2)
  • The required calculation is to sum up the last value for each series (defined by dim1-3) for both m1 and m2, then building the ratio (the values for m1 and m2 are not necessarily defined on the same documents, but they share their dimensions)
  • Do a breakdown dimension on dim1-3
  • Create a formula like this: last_value(m1) / overall_sum(m2, group_by=reference(x_axis))
  • Collapse the breakdown dimension using sum
  • The resulting single series is the percentage of the sum of last values of m1 by the sum of last values of m2

We can add the same functionality to TSVB relatively easily by adding a new variable to the math context: #130766

@ghudgins ghudgins changed the title [Lens] Formula: Collapse bucket column [Lens] Summarize a break down by (collapse bucket column) May 5, 2022
@flash1293 flash1293 self-assigned this May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants