-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: fix handling of empty table by aggregates #39007
Conversation
I think that what I did here is an overkill since the figuring out how to treat an empty table happens only once per lifetime of an aggregate, so templating it out is not necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this over from me! This looks all good to me. I do think that it would be less code without doing the templating, and since like you said, the isScalar check only needs to happen once, I don't think there's any performance reason to template it. My vote is to keep it as a normal code branch like in count_agg, but if it's a lot of effort to go back and change, it's fine with me to merge.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review carefully, and will defer to Rafi on this, but any chance we could avoid having 2 impls of all aggregates by teaching the aggregate runner thing (exec/aggregator.go
) to deal with this instead?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)
098d5e1
to
9b01cd7
Compare
Yeah, I also didn't like having two different implementations that basically did the same thing except for this edge case. I decided to go with an extra argument to |
9b01cd7
to
aab58f4
Compare
Just to double check, I ran all the benchmarks, and there is no performance difference. |
aab58f4
to
b032f42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @solongordon and @yuzefovich)
pkg/sql/exec/aggregator.go, line 173 at r2 (raw file):
aggTypes: aggTypes, groupCol: groupCol, isScalar: len(groupCols) == 0,
I don't think this is correct—it's possibly to have a non-scalar group-by with an empty set of grouping columns (say, if you're grouping on a set of constant set of columns which the optimizer then simplifies). groupNode
has an isScalar
flag which denotes whether a group-by is scalar or not.
This is an example of a non-scalar group by with an empty grouping column set:
root@127.0.0.1:49438/defaultdb> create table x (a int primary key, b int);
CREATE TABLE
Time: 5.94ms
root@127.0.0.1:49438/defaultdb> explain (opt, verbose) select sum(a) from (select 1 as const, a from x) group by const;
text
+---------------------------------------+
group-by
├── columns: sum:4
├── cardinality: [0 - 1]
├── stats: [rows=1]
├── cost: 1040.04
├── key: ()
├── fd: ()-->(4)
├── prune: (4)
├── scan x
│ ├── columns: a:1
│ ├── stats: [rows=1000]
│ ├── cost: 1030.02
│ ├── key: (1)
│ ├── prune: (1)
│ └── interesting orderings: (+1)
└── aggregations
└── sum [outer=(1)]
└── variable: a
(18 rows)
b032f42
to
f2983c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @solongordon)
pkg/sql/exec/aggregator.go, line 173 at r2 (raw file):
Previously, justinj (Justin Jaffray) wrote…
I don't think this is correct—it's possibly to have a non-scalar group-by with an empty set of grouping columns (say, if you're grouping on a set of constant set of columns which the optimizer then simplifies).
groupNode
has anisScalar
flag which denotes whether a group-by is scalar or not.This is an example of a non-scalar group by with an empty grouping column set:
root@127.0.0.1:49438/defaultdb> create table x (a int primary key, b int); CREATE TABLE Time: 5.94ms root@127.0.0.1:49438/defaultdb> explain (opt, verbose) select sum(a) from (select 1 as const, a from x) group by const; text +---------------------------------------+ group-by ├── columns: sum:4 ├── cardinality: [0 - 1] ├── stats: [rows=1] ├── cost: 1040.04 ├── key: () ├── fd: ()-->(4) ├── prune: (4) ├── scan x │ ├── columns: a:1 │ ├── stats: [rows=1000] │ ├── cost: 1030.02 │ ├── key: (1) │ ├── prune: (1) │ └── interesting orderings: (+1) └── aggregations └── sum [outer=(1)] └── variable: a (18 rows)
Indeed, thanks for the catch! I adjusted the code to use Type
of the spec, so it should be good now.
f2983c8
to
1fad672
Compare
I added a second commit here that makes the tracing test more flexible (it used to have a hard-coded table id, and when I added creation of |
test change looks good to me |
d04b666
to
1d87a7f
Compare
Second commit was removed (I didn't think through carefully enough). I just left a clarifying comment if someone runs into the same failure later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)
pkg/sql/exec/count_agg.go, line 68 at r3 (raw file):
if a.curIdx >= 0 { a.curIdx++ } else {
Is there no way we can abstract this out? It seems very hard to remember that this has to get done for new aggregates, and very error prone that every aggregate has to do it perfectly.
c8025cf
to
4ad75d9
Compare
Alright, I gave it another shot. Now everything is handled in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this much better than the previous approach. I think you could fix the internal Nulls
exposing too, if you wanted. Add another interface method to the aggregate functions called emitEmptyScalarGroup
or something, that either sets NULL or sets 0 in case of countOp.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)
pkg/sql/exec/aggregator.go, line 295 at r5 (raw file):
// functions except for count aggregates (for which it should be zero). for _, fn := range a.aggregateFuncs { if c, isCountAgg := fn.(*countAgg); isCountAgg {
I don't love this special case, but it seems like count
is just a very very special snowflake and this can't be avoided.
4ad75d9
to
436b2ee
Compare
Done. At some point, I used the same approach, but it's not exactly "handling everything in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could pull the scratch
buffer into a shared aggregator base at some point, and then implement EmitEmptyScalarGroup
once on that shared base to avoid the remaining duplication. But, I don't want to hold up the fix on that, of course.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)
pkg/sql/exec/avg_agg_tmpl.go, line 168 at r6 (raw file):
} func (a *avg_TYPEAgg) HandleEmptyInputScalar() {
Last suggestion - you could embed an empty struct called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)
pkg/sql/exec/avg_agg_tmpl.go, line 168 at r6 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Last suggestion - you could embed an empty struct called
Whoops, moved this to the main respond field and forgot to delete it.
436b2ee
to
5a98a70
Compare
This commit adds special behavior for a case when the input to an aggregate is empty. There is also a subtle difference between a scalar and non-scalar case: in scalar context, an aggregate function needs to emit either null or zero, but in non-scalar context, all functions have no output. Release note: None
5a98a70
to
82bb6a3
Compare
Thanks everyone for the input! bors r+ |
39007: exec: fix handling of empty table by aggregates r=yuzefovich a=yuzefovich This commit adds special behavior for a case when the input to an aggregate is empty. There is also a subtle difference between a scalar and non-scalar case: if GROUP BY clause is omitted, the aggregate is scalar, so it should emit null or zero, but if GROUP BY is present, the aggregate is non-scalar and should emit no output. These two cases are templated out. Based on #38872. Fixes: #38858. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Build succeeded |
This commit adds special behavior for a case when the input to an
aggregate is empty. There is also a subtle difference between a
scalar and non-scalar case: if GROUP BY clause is omitted, the
aggregate is scalar, so it should emit null or zero, but if GROUP
BY is present, the aggregate is non-scalar and should emit no
output. These two cases are templated out.
Based on #38872.
Fixes: #38858.
Release note: None