single row implied aggregates default to values instead of records #4420
This commit changes the semantics of aggregate functions in an implied summarize op to emit the values of the aggregation directly instead of putting them in a record. This makes the UX for aggregations more Zed-like and less SQL-like when forming simple aggregations that produce a single result without group-by keys. Many of us found ourselves typing the pattern 'agg() | yield agg' and realized this approach would simplify things. This way, `echo "1 2 3" | zq 'sum(this)' -` results in 6 instead of {sum:6}. The change here is rather small but the ramifications on tests are non-trivial.
I did just confirm that this change breaks a couple of the end-to-end Zui tests, but the fix looks trivial and is something I'll be able to take care of.
@mccanne: I think we should do this only if we have a single aggregation. I say that because I think
$ echo 1 2 3 | zq 'count(),sum(this)' -
{count:3(uint64),sum:6}
is both more intuitive and more useful than
$ echo 1 2 3 | zq 'count(),sum(this)' -
3(uint64)
6
@@ -18,7 +18,7 @@ echo 'true false true' | zq -z 'and(this)' -
```
=>
```mdtest-output
-{and:false}
+false
This PR can effectively be seen as one big bug fix. I see now that all this time the usage guidance for all our aggregate functions has looked like:
and(bool) -> bool
i.e., it has been showing the primitive return type this PR introduces and not the record type the functions have always returned. So now it finally matches. 😂
Whatever we ultimately decide, somewhere in the docs (probably in docs/language/operators/summarize.md) I think we should have some brief-but-explicit text explaining when they can expect what's returned to be a value vs. a record.
Ok, just pushed an update to summarize.md
Having read through the updated examples, I think I agree with @nwt's proposal of only doing it with a single aggregation.
Agreed! Will fix.
Looks good!
docs/language/operators/summarize.md
Outdated
@@ -30,6 +30,11 @@ A key may be either an expression or a field. If the key field is omitted,
it is inferred from the expression, e.g., the field name for `by lower(s)`
is `lower`.

When the result of `summarize` is a single value (e.g., a single aggregate
function without group-by keys) and there is no field name specified, then
output is the Zed value of that result rather than a single-field record
-output is the Zed value of that result rather than a single-field record
+the output is that single value rather than a single-field record
docs/language/operators/summarize.md
Outdated
```

To format the output of a single-valued aggregation into a record, simply specify
an explicit field for the output as in:
-an explicit field for the output as in:
+an explicit field for the output:
docs/language/operators/summarize.md
Outdated
```

When multiple aggregate functions are specified, even without explicit field names,
a record result is generated from the implied fields names of the functions:
-a record result is generated from the implied fields names of the functions:
+a record result is generated with field names implied by the functions:
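For readers skimming the thread, the value-vs-record rule that the summarize.md text settles on can be modeled in a few lines. This is only an illustrative Python sketch, not Zed's implementation: the function `shape_output` and its tuple layout are hypothetical names chosen for this example, and it assumes the rule is exactly "one aggregate, no group-by keys, no explicit field name yields a bare value; everything else yields a record".

```python
def shape_output(aggs, keys=None):
    """Model of the output shape of an implied summarize (hypothetical helper).

    aggs: list of (implied_name, explicit_field_or_None, value) tuples.
    keys: dict of group-by key names to values, or None when there are none.
    """
    keys = keys or {}
    # Single aggregate, no group-by keys, no explicit field: emit the bare value.
    if len(aggs) == 1 and not keys:
        _, explicit_field, value = aggs[0]
        if explicit_field is None:
            return value
    # Otherwise build a record, preferring an explicit field name over the
    # name implied by the aggregate function.
    record = dict(keys)
    for implied_name, explicit_field, value in aggs:
        record[explicit_field or implied_name] = value
    return record


print(shape_output([("sum", None, 6)]))                      # bare value
print(shape_output([("sum", "total", 6)]))                   # explicit field
print(shape_output([("count", None, 3), ("sum", None, 6)]))  # multiple aggregates
```

The three calls correspond to the three cases discussed in the review: a bare value, a record with an explicit field, and a record with implied field names.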
With #4420, the DAG for "from a | count()" now contains a nested sequential operator, which prevents Optimizer.parallelizeTrunk from parallelizing the flowgraph. Fix this by inlining nested sequential operators in parallelizeTrunk.
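The inlining step mentioned in that follow-up can be pictured as flattening a nested operator list. The sketch below is a loose Python model rather than the actual Go optimizer code: the `Op` and `Sequential` classes and the `inline_sequential` function are hypothetical, and the assumption that the implied aggregate compiles to a nested "summarize then yield" pair is inferred from the 'agg() | yield agg' pattern in the PR description.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Op:
    """A leaf operator in the flowgraph (hypothetical stand-in)."""
    name: str


@dataclass
class Sequential:
    """A nested sequence of operators (hypothetical stand-in)."""
    ops: List[Union["Op", "Sequential"]] = field(default_factory=list)


def inline_sequential(ops):
    """Return a flat operator list with nested Sequential nodes spliced in place."""
    flat = []
    for op in ops:
        if isinstance(op, Sequential):
            # Recurse so that arbitrarily deep nesting is flattened too.
            flat.extend(inline_sequential(op.ops))
        else:
            flat.append(op)
    return flat


# Assumed shape of the DAG for "from a | count()" after this PR.
dag = [Op("from a"), Sequential([Op("summarize count()"), Op("yield count")])]
print([op.name for op in inline_sequential(dag)])
```

Once the trunk is a flat list again, a parallelization pass that only inspects top-level operators can see the summarize directly.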
This commit changes the semantics of aggregate functions of an implied summarize op to simply emit the values of the aggregation instead of putting them in a record. This changes the UX for aggregations to be more Zed-like and less SQL-like when trying to form simple aggregations that result in a single result without one or more group-by keys.
Many of us found ourselves typing the pattern 'agg() | yield agg' and realized this approach would simplify things. This way, `echo "1 2 3" | zq 'sum(this)' -` results in 6 instead of {sum:6}.
The change here is rather small but the ramifications on tests are non-trivial.
Fixes #4418