Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lens] Allow "count of field" by supporting the value count aggregation #74910

Closed
wylieconlon opened this issue Aug 12, 2020 · 8 comments · Fixed by #136385
Closed

[Lens] Allow "count of field" by supporting the value count aggregation #74910

wylieconlon opened this issue Aug 12, 2020 · 8 comments · Fixed by #136385
Assignees
Labels
enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@wylieconlon
Copy link
Contributor

wylieconlon commented Aug 12, 2020

The value count aggregation lets users count the total number of indexed values for a field, and is mostly useful when dealing with arrays. This metric is supported in TSVB, but not Visualize.

For example, when using kibana_sample_data_ecommerce, you can use the value count aggregation to compare the number of products to the number of orders. Each order contains one or more products, so the value count is higher than the order count:

Screen Shot 2020-08-12 at 6 06 36 PM

Proposal:

Because the name "value count" is potentially confusing, I'm proposing that we combine Count and Value count into one aggregation that can support any field.

Dependencies:

#67403

@wylieconlon wylieconlon added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Aug 12, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@wylieconlon wylieconlon added the enhancement New value added to drive a business result label Oct 22, 2020
@wylieconlon
Copy link
Contributor Author

I've opened a related issue for the Elasticsearch team about array fields: elastic/elasticsearch#64077

@wylieconlon
Copy link
Contributor Author

This seems especially relevant for formulas, thanks @ghudgins for raising that connection. Part of the question is whether at the formula level we would want count(field), value_count(field), or maybe a more SQL-inspired syntax like count(all field)

@flash1293
Copy link
Contributor

Not sure how far we should reach into SQL territory with this. If we go with count(field) / count(all field), we should also go with count(distinct field) for cardinality, but that's called unique_count(field) already.

@ghudgins
Copy link
Contributor

ghudgins commented Jun 17, 2021

I think count(distinct field) could be an interesting convenience function (like we let that work how the user expects but don't document it) but it's probably too complex...we'd have to make sure it works with time shifts and KQL and it just gets weirder from there so it's probably too much syntax mashing. I'm always open to shortcuts from what a user expects, though. we could show a message if we hit this case for distinct count.

count(field) on the other hand mirrors all of the other aggregations for the most part and sticks out as inconsistent. To be clear though i'm not proposing we return the value count...I would expect count() and count(field) to be the same (as it is in SQL)

Another one to 🤔 is count(*) which my hands do from years of BI / RDBMS automatically.

@flash1293
Copy link
Contributor

flash1293 commented Jun 17, 2021

I think the danger here is to make both the implementation and the tool itself more complex in order to make it easier for people to pick up formula, which will in the long term do more harm than good. Let's say we would implement count(distinct field) to do the same as unique_count(field) - people getting used to the first because it's what SQL does would expect everything to work like SQL and be confused in these places, as well as when looking at other formula using unique_count.

I would expect count() and count(field) to be the same (as it is in SQL)

It's not the same - count(field) in SQL only counts rows which have a value for field - it's closest to count(kql="field: *") in formula.

This is precisely the thing I'm worried about - all of these things have slightly different semantics and if we try to provide a familiar syntax targeting SQL users, we will break these subtle cases while also making the feature more complex. IMHO it's not worth it and a consistent translation of quick function to formula is more valuable than trying to look like SQL in some special cases.

@ghudgins
Copy link
Contributor

just closing the loop - I retract the tangent about count following a discussion. We agreed it's a slippery slope to support some SQL syntax and not others in custom formula. Thanks for the clarification

We can continue the discussion about the value count aggregation in Lens. Appreciate the conversation!

@ghudgins
Copy link
Contributor

discussed in meeting - count(field) seems like the best path forward without adding a new function. similar to excel which was the inspiration for formula

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants