Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

single row implied aggregates default to values instead of records #4420

Merged
merged 5 commits into from
Mar 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cmd/zed/manage/ztests/index.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,4 @@ inputs:
outputs:
- name: stdout
data: |
{count:10(uint64)}
10(uint64)
6 changes: 2 additions & 4 deletions cmd/zed/ztests/query-stats.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ script: |
zed init -q
zed create -q test
zed load -q -use test babble.zson
zed query -s -Z "from test | count()"
zed query -s -z "from test | count()"

inputs:
- name: babble.zson
Expand All @@ -12,9 +12,7 @@ inputs:
outputs:
- name: stdout
data: |
{
count: 1000 (uint64)
}
1000(uint64)
- name: stderr
data: |
{bytes_read:32889,bytes_matched:32889,records_read:1000,records_matched:1000}
2 changes: 1 addition & 1 deletion cmd/zq/ztests/aggmem.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ inputs:
outputs:
- name: stdout
data: |
{collect:null}
null
- name: stderr
data: |
aggmem value must be greater than zero
2 changes: 1 addition & 1 deletion cmd/zq/ztests/http-simple.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,4 @@ inputs:
outputs:
- name: stdout
data: |
{count:1000(uint64)}
1000(uint64)
2 changes: 1 addition & 1 deletion cmd/zq/ztests/s3-parquet.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@ inputs:
outputs:
- name: stdout
data: |
{count:10(uint64)}
10(uint64)
2 changes: 1 addition & 1 deletion cmd/zq/ztests/s3-simple.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@ inputs:
outputs:
- name: stdout
data: |
{count:1000(uint64)}
1000(uint64)
2 changes: 1 addition & 1 deletion compiler/parser/ztests/summarize.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,4 @@ input: |
output-flags: -pretty=0

output: |
{count:2(uint64)}
2(uint64)
4 changes: 2 additions & 2 deletions compiler/parser/ztests/where-expr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,6 @@ inputs:
outputs:
- name: stdout
data: |
{count:2(uint64)}
2(uint64)
===
{count:2(uint64)}
2(uint64)
4 changes: 2 additions & 2 deletions compiler/parser/ztests/where-search.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,6 @@ inputs:
outputs:
- name: stdout
data: |
{count:2(uint64)}
2(uint64)
===
{count:2(uint64)}
2(uint64)
43 changes: 42 additions & 1 deletion compiler/semantic/op.go
Original file line number Diff line number Diff line change
Expand Up @@ -307,6 +307,11 @@ func semOp(ctx context.Context, scope *Scope, o ast.Op, ds *data.Source, head *l
if err != nil {
return nil, err
}
if len(keys) == 0 && len(o.Aggs) == 1 {
if op := singletonAgg(scope, o.Aggs[0]); op != nil {
return op, nil
}
}
aggs, err := semAssignments(scope, o.Aggs, true)
if err != nil {
return nil, err
Expand Down Expand Up @@ -629,6 +634,34 @@ func semOver(ctx context.Context, scope *Scope, in *ast.Over, ds *data.Source, h
}, nil
}

func singletonAgg(scope *Scope, agg ast.Assignment) dag.Op {
if agg.LHS != nil {
return nil
}
out, err := semAssignment(scope, agg, true)
if err != nil {
return nil
}
yield := &dag.Yield{
Kind: "Yield",
}
this, ok := out.LHS.(*dag.This)
if !ok || len(this.Path) != 1 {
return nil
}
yield.Exprs = append(yield.Exprs, this)
return &dag.Sequential{
Kind: "Sequential",
Ops: []dag.Op{
&dag.Summarize{
Kind: "Summarize",
Aggs: []dag.Assignment{out},
},
yield,
},
}
}

func semDecls(scope *Scope, decls []ast.Decl) ([]dag.Def, []*dag.Func, error) {
var consts []dag.Def
var fds []*ast.FuncDecl
Expand Down Expand Up @@ -804,7 +837,7 @@ func isBool(e dag.Expr) bool {

func semCallOp(scope *Scope, call *ast.Call) (dag.Op, error) {
if agg, err := maybeConvertAgg(scope, call); err == nil && agg != nil {
return &dag.Summarize{
summarize := &dag.Summarize{
Kind: "Summarize",
Aggs: []dag.Assignment{
{
Expand All @@ -813,6 +846,14 @@ func semCallOp(scope *Scope, call *ast.Call) (dag.Op, error) {
RHS: agg,
},
},
}
yield := &dag.Yield{
Kind: "Yield",
Exprs: []dag.Expr{&dag.This{Kind: "This", Path: field.New(call.Name)}},
}
return &dag.Sequential{
Kind: "Sequential",
Ops: []dag.Op{summarize, yield},
}, nil
}
if !function.HasBoolResult(call.Name) {
Expand Down
4 changes: 2 additions & 2 deletions compiler/ztests/from-fork.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,5 +9,5 @@ inputs:
outputs:
- name: stdout
data: |
{count:2(uint64)}
{count:2(uint64)}
2(uint64)
2(uint64)
2 changes: 1 addition & 1 deletion docs/language/aggregates/and.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo 'true false true' | zq -z 'and(this)' -
```
=>
```mdtest-output
{and:false}
false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR can effectively be seen as one big bug fix. I see now that all this time the usage guidance for all our aggregate functions have looked like:

and(bool) -> bool

i.e., they've been showing the primitive return type this PR introduces and not the record type they've always returned. So now it finally matches. 😂

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever we ultimately decide, somewhere in the docs (probably in docs/language/operators/summarize.md) I think we should have some brief-but-explicit text explaining when they can expect what's returned to be a value vs. a record.

Ok, just pushed an update to summarize.md

```

Continuous AND of simple sequence:
Expand Down
4 changes: 2 additions & 2 deletions docs/language/aggregates/any.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ echo '1 2 3 4' | zq -z 'any(this)' -
```
=>
```mdtest-output
{any:1}
1
```

Continuous any over a simple sequence:
Expand All @@ -39,5 +39,5 @@ echo '"foo" 1 2 3 ' | zq -z 'any(this)' -
```
=>
```mdtest-output
{any:"foo"}
"foo"
```
4 changes: 2 additions & 2 deletions docs/language/aggregates/avg.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo '1 2 3 4' | zq -z 'avg(this)' -
```
=>
```mdtest-output
{avg:2.5}
2.5
```

Continuous average of simple sequence:
Expand All @@ -38,5 +38,5 @@ echo '1 2 3 4 "foo"' | zq -z 'avg(this)' -
```
=>
```mdtest-output
{avg:2.5}
2.5
```
4 changes: 2 additions & 2 deletions docs/language/aggregates/collect.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ echo '1 2 3 4' | zq -z 'collect(this)' -
```
=>
```mdtest-output
{collect:[1,2,3,4]}
[1,2,3,4]
```

Continuous collection over a simple sequence:
Expand All @@ -40,5 +40,5 @@ echo '1 2 3 4 "foo"' | zq -z 'collect(this)' -
```
=>
```mdtest-output
{collect:[1,2,3,4,"foo"]}
[1,2,3,4,"foo"]
```
2 changes: 1 addition & 1 deletion docs/language/aggregates/count.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo '1 2 3' | zq -z 'count()' -
```
=>
```mdtest-output
{count:3(uint64)}
3(uint64)
```

Continuous count of simple sequence:
Expand Down
2 changes: 1 addition & 1 deletion docs/language/aggregates/dcount.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ echo '1 2 2 3' | zq -z 'dcount(this)' -
```
=>
```mdtest-output
{dcount:3(uint64)}
3(uint64)
```

Continuous count of simple sequence:
Expand Down
2 changes: 1 addition & 1 deletion docs/language/aggregates/fuse.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ echo '{a:1,b:2}{a:2,b:"foo"}' | zq -z 'fuse(this)' -
```
=>
```mdtest-output
{fuse:<{a:int64,b:(int64,string)}>}
<{a:int64,b:(int64,string)}>
```
Fuse records with a group-by key:
```mdtest-command
Expand Down
2 changes: 1 addition & 1 deletion docs/language/aggregates/map.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ echo '{stock:"APPL",price:145.03} {stock:"GOOG",price:87.07}' | zq -z 'map(|{sto
```
=>
```mdtest-output
{map:|{"APPL":145.03,"GOOG":87.07}|}
|{"APPL":145.03,"GOOG":87.07}|
```

Continuous collection over a simple sequence:
Expand Down
4 changes: 2 additions & 2 deletions docs/language/aggregates/max.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo '1 2 3 4' | zq -z 'max(this)' -
```
=>
```mdtest-output
{max:4}
4
```

Continuous maximum of simple sequence:
Expand All @@ -38,5 +38,5 @@ echo '1 2 3 4 "foo"' | zq -z 'max(this)' -
```
=>
```mdtest-output
{max:4}
4
```
4 changes: 2 additions & 2 deletions docs/language/aggregates/min.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo '1 2 3 4' | zq -z 'min(this)' -
```
=>
```mdtest-output
{min:1}
1
```

Continuous minimum of simple sequence:
Expand All @@ -38,5 +38,5 @@ echo '1 2 3 4 "foo"' | zq -z 'min(this)' -
```
=>
```mdtest-output
{min:1}
1
```
2 changes: 1 addition & 1 deletion docs/language/aggregates/or.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo 'false true false' | zq -z 'or(this)' -
```
=>
```mdtest-output
{or:true}
true
```

Continuous OR of simple sequence:
Expand Down
4 changes: 2 additions & 2 deletions docs/language/aggregates/sum.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ echo '1 2 3 4' | zq -z 'sum(this)' -
```
=>
```mdtest-output
{sum:10}
10
```

Continuous sum of simple sequence:
Expand All @@ -38,5 +38,5 @@ echo '1 2 3 4 "foo"' | zq -z 'sum(this)' -
```
=>
```mdtest-output
{sum:10}
10
```
2 changes: 1 addition & 1 deletion docs/language/aggregates/union.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ echo '1 2 3 3' | zq -z 'union(this)' -
```
=>
```mdtest-output
{union:|[1,2,3]|}
|[1,2,3]|
```

Continuous average of simple sequence:
Expand Down
8 changes: 4 additions & 4 deletions docs/language/operators/over.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,20 +115,20 @@ echo '{a:[1,2]} {a:[3,4,5]}' | zq -z 'over a | sum(this)' -
```
=>
```mdtest-output
{sum:15}
15
```
_Aggregate the traversed values in a lateral query_
```mdtest-command
echo '{a:[1,2]} {a:[3,4,5]}' | zq -z 'over a => ( sum(this) )' -
```
=>
```mdtest-output
{sum:3}
{sum:12}
3
12
```
_Access the outer values in a lateral query_
```mdtest-command
echo '{a:[1,2],s:"foo"} {a:[3,4,5],s:"bar"}' | zq -z 'over a with s => (sum(this) | yield {s,sum})' -
echo '{a:[1,2],s:"foo"} {a:[3,4,5],s:"bar"}' | zq -z 'over a with s => (sum(this) | yield {s,sum:this})' -
```
=>
```mdtest-output
Expand Down
29 changes: 27 additions & 2 deletions docs/language/operators/summarize.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,11 @@ A key may be either an expression or a field. If the key field is omitted,
it is inferred from the expression, e.g., the field name for `by lower(s)`
is `lower`.

When the result of `summarize` is a single value (e.g., a single aggregate
function without group-by keys) and there is no field name specified, then
the output is that single value rather than a single-field record
containing that value.

If the cardinality of group-by keys causes the memory footprint to exceed
a limit, then each aggregate's partial results are spilled to temporary storage
and the results merged into final results using an external merge sort.
Expand All @@ -44,7 +49,27 @@ echo '1 2 3 4' | zq -z 'summarize avg(this)' -
```
=>
```mdtest-output
{avg:2.5}
2.5
```

To format the output of a single-valued aggregation into a record, simply specify
an explicit field for the output:
```mdtest-command
echo '1 2 3 4' | zq -z 'summarize mean:=avg(this)' -
```
=>
```mdtest-output
{mean:2.5}
```

When multiple aggregate functions are specified, even without explicit field names,
a record result is generated with field names implied by the functions:
```mdtest-command
echo '1 2 3 4' | zq -z 'summarize avg(this),sum(this),count()' -
```
=>
```mdtest-output
{avg:2.5,sum:10,count:4(uint64)}
```

Sum the input sequence, leaving out the `summarize` keyword:
Expand All @@ -53,7 +78,7 @@ echo '1 2 3 4' | zq -z 'sum(this)' -
```
=>
```mdtest-output
{sum:10}
10
```

Create integer sets by key and sort the output to get a deterministic order:
Expand Down
2 changes: 1 addition & 1 deletion docs/language/operators/yield.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ echo '1 2 3' | zq -z 'yield this*2+1' -
```
_Yield is often used to transform records_
```mdtest-command
echo '{a:1,b:2}{a:3,b:4}' | zq -z 'yield [a,b],[b,a] | collect(this) | yield collect' -
echo '{a:1,b:2}{a:3,b:4}' | zq -z 'yield [a,b],[b,a] | collect(this)' -
```
=>
```mdtest-output
Expand Down
Loading