single row implied aggregates default to values instead of records #4420

mccanne · 2023-03-03T00:33:37Z (edited by philrz)

This commit changes the semantics of aggregate functions in an implied summarize op so that they simply emit the value of the aggregation instead of wrapping it in a record. This makes the UX for aggregations more Zed-like and less SQL-like when forming simple aggregations that produce a single result and have no group-by keys.

Many of us found ourselves typing the pattern 'agg() | yield agg' and realized this approach would simplify things. This way, 'echo "1 2 3" | sum(this)' results in 6 instead of {sum:6}.
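To make the before/after concrete, here is a minimal Python sketch. It is not Zed code: it treats a Zed record as a dict, and the names summarize_old, summarize_new and yield_field are hypothetical. It only models the output shape for a single aggregate with no group-by keys, and shows that the new result equals what the old 'agg() | yield agg' idiom produced.

```python
# Hypothetical model of the #4420 change for a lone aggregate (not Zed code).
# A Zed record is modeled as a dict; a Zed value as a plain Python value.

def summarize_old(values, name, agg):
    """Pre-#4420 shape: the result is wrapped in a single-field record
    named after the aggregate function."""
    return {name: agg(values)}

def summarize_new(values, agg):
    """Post-#4420 shape for a single implied aggregate: the bare value."""
    return agg(values)

def yield_field(record, name):
    """Models 'yield <name>': pull one field's value out of a record."""
    return record[name]

values = [1, 2, 3]
old = summarize_old(values, "sum", sum)   # {'sum': 6}
unwrapped = yield_field(old, "sum")       # the old 'sum(this) | yield sum' idiom
new = summarize_new(values, sum)          # 6

print(old, unwrapped, new)  # {'sum': 6} 6 6
```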

The change here is rather small but the ramifications on tests are non-trivial.

Fixes #4418


philrz · 2023-03-03T01:00:19Z

I did just confirm that this change breaks a couple of the end-to-end Zui tests, but the fix looks trivial and is something I'll be able to take care of.

nwt · 2023-03-03T01:29:31Z

@mccanne: I think we should do this only if we have a single aggregation. I say that because I think

$ echo 1 2 3 | zq 'count(),sum(this)' -
{count:3(uint64),sum:6}

is both more intuitive and more useful than

$ echo 1 2 3 | zq 'count(),sum(this)' -
3(uint64)
6

philrz · 2023-03-03T01:15:07Z

docs/language/aggregates/and.md

@@ -18,7 +18,7 @@ echo 'true false true' | zq -z 'and(this)' -
 ```
 =>
 ```mdtest-output
-{and:false}
+false


This PR can effectively be seen as one big bug fix. I see now that all this time the usage guidance for all our aggregate functions has looked like:

and(bool) -> bool

i.e., they've been showing the primitive return type this PR introduces and not the record type they've always returned. So now it finally matches. 😂

Whatever we ultimately decide, somewhere in the docs (probably in docs/language/operators/summarize.md) I think we should have some brief-but-explicit text explaining when they can expect what's returned to be a value vs. a record.

Ok, just pushed an update to summarize.md

philrz · 2023-03-03T01:56:41Z

Having read through the updated examples, I think I agree with @nwt's proposal of only doing it with a single aggregation.

Whatever we ultimately decide, somewhere in the docs (probably in docs/language/operators/summarize.md) I think we should have some brief-but-explicit text explaining when they can expect what's returned to be a value vs. a record.

mccanne · 2023-03-03T02:03:50Z

Agreed! Will fix.

nwt

Looks good!

nwt · 2023-03-03T16:27:05Z

docs/language/operators/summarize.md

@@ -30,6 +30,11 @@ A key may be either an expression or a field.  If the key field is omitted,
 it is inferred from the expression, e.g., the field name for `by lower(s)`
 is `lower`.

+When the result of `summarize` is a single value (e.g., a single aggregate
+function without group-by keys) and there is no field name specified, then
+output is the Zed value of that result rather than a single-field record


Suggested change

-output is the Zed value of that result rather than a single-field record
+the output is that single value rather than a single-field record

nwt · 2023-03-03T16:27:54Z

docs/language/operators/summarize.md

+```
+
+To format the output of a single-valued aggregation into a record, simply specify
+an explicit field for the output as in:


Suggested change

-an explicit field for the output as in:
+an explicit field for the output:

nwt · 2023-03-03T16:29:36Z

docs/language/operators/summarize.md

+```
+
+When multiple aggregate functions are specified, even without explicit field names,
+a record result is generated from the implied fields names of the functions:


Suggested change

-a record result is generated from the implied fields names of the functions:
+a record result is generated with field names implied by the functions:
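Taken together, the summarize.md text under review states one rule: a bare value is emitted only for a single aggregate function with no group-by keys and no explicit field name; an explicit field name, multiple aggregates, or group-by keys all yield records. The Python sketch below models that rule under stated assumptions. Agg and summarize are hypothetical names, a single (key_name, key_func) pair stands in for a group-by key, and Zed types such as uint64 counts are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Agg:
    func_name: str                   # implied field name, e.g. "sum" for sum(this)
    func: Callable[[list], object]   # computes the aggregate over one group
    field: Optional[str] = None      # explicit output field, as in s:=sum(this)

def summarize(values: list, aggs: List[Agg],
              by: Optional[Tuple[str, Callable]] = None) -> list:
    """Hypothetical model of summarize's output shape (not Zed's code)."""
    if by is None:
        groups = {None: list(values)}
    else:
        key_name, key_func = by
        groups = {}
        for v in values:
            groups.setdefault(key_func(v), []).append(v)

    # A bare value only for: one aggregate, no keys, no explicit field name.
    bare = by is None and len(aggs) == 1 and aggs[0].field is None

    results = []
    for key, group in groups.items():
        if bare:
            results.append(aggs[0].func(group))
            continue
        rec = {}
        if by is not None:
            rec[key_name] = key
        for a in aggs:
            # Explicit field name wins; otherwise use the function's name.
            rec[a.field or a.func_name] = a.func(group)
        results.append(rec)
    return results

vals = [1, 2, 3]
print(summarize(vals, [Agg("sum", sum)]))                      # [6]
print(summarize(vals, [Agg("sum", sum, field="s")]))           # [{'s': 6}]
print(summarize(vals, [Agg("count", len), Agg("sum", sum)]))   # [{'count': 3, 'sum': 6}]
print(summarize(vals, [Agg("sum", sum)], by=("odd", lambda v: v % 2 == 1)))
# [{'odd': True, 'sum': 4}, {'odd': False, 'sum': 2}]
```

The multi-aggregate case mirrors nwt's `count(),sum(this)` example earlier in the thread, which stays a record.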

Follow-up in #4442 ("compiler: inline nested sequential operators in parallelizeTrunk"): with #4420, the DAG for "from a | count()" now contains a nested sequential operator, which prevents Optimizer.parallelizeTrunk from parallelizing the flowgraph. The fix is to inline nested sequential operators in parallelizeTrunk.
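The inlining step for the nested-sequential problem just described can be sketched in Python. This is only an illustration of flattening nested sequences, not the Go code in the Zed compiler: Op, Sequential and inline_sequentials are hypothetical names, and the operators inside the nested Sequential in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Op:
    name: str  # a leaf operator, e.g. "from a"

@dataclass
class Sequential:
    ops: List[Union[Op, "Sequential"]] = field(default_factory=list)

def inline_sequentials(seq: Sequential) -> Sequential:
    """Return a copy of seq in which every nested Sequential is replaced,
    recursively, by the operators it contains."""
    flat: List[Union[Op, Sequential]] = []
    for op in seq.ops:
        if isinstance(op, Sequential):
            flat.extend(inline_sequentials(op).ops)
        else:
            flat.append(op)
    return Sequential(flat)

# Hypothetical trunk: the contents of the nested Sequential are illustrative.
trunk = Sequential([Op("from a"), Sequential([Op("summarize"), Op("yield")])])
flat = inline_sequentials(trunk)
print([op.name for op in flat.ops])  # ['from a', 'summarize', 'yield']
```

Once the trunk is a single flat sequence, an optimizer pass that expects leaf operators at the top level no longer sees a nested operator that blocks it.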

mccanne requested review from jameskerr, philrz and a team March 3, 2023 00:33

philrz reviewed Mar 3, 2023


philrz requested a review from a team March 3, 2023 01:57

philrz mentioned this pull request Mar 3, 2023

Add links from aggregate function README #4421

Merged

mccanne added 3 commits March 2, 2023 18:20

address PR feedback

11c6346

address PR feedback to update summarize description in docs

84e2d4b

delete trailing space

f3cb4d2

nwt approved these changes Mar 3, 2023


address PR feedback

750f649

mccanne merged commit 739e271 into main Mar 3, 2023

mccanne deleted the single-row-agg branch March 3, 2023 16:41

This was referenced Mar 3, 2023

Advance Zed pointer and fix tests brimdata/zui#2690

Merged

Restore a couple aggregate function docs examples #4423

Merged

Crash when attempting to output top-level value as Zeek TSV #4424

Closed

philrz mentioned this pull request Mar 15, 2023

Mention collect() with "zq -j" #4437

Merged

nwt mentioned this pull request Mar 19, 2023

compiler: inline nested sequential operators in parallelizeTrunk #4442

Merged

philrz mentioned this pull request Jun 30, 2023

Change value of several fields based on a field name pattern #4050

Open
