exec: fix handling of empty table by aggregates #39007

yuzefovich · 2019-07-20T00:06:33Z

This commit adds special behavior for a case when the input to an
aggregate is empty. There is also a subtle difference between a
scalar and non-scalar case: if GROUP BY clause is omitted, the
aggregate is scalar, so it should emit null or zero, but if GROUP
BY is present, the aggregate is non-scalar and should emit no
output. These two cases are templated out.

Based on #38872.

Fixes: #38858.

Release note: None
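
As a rough illustration of the behavior described above, here is a minimal Go sketch of what a single aggregate might emit for an empty input. The `result` type and the `emptyInputOutput` helper are hypothetical names made up for this example and are not part of the `exec` package.

```go
package main

import "fmt"

// result models one output row of a single-aggregate query.
// Null marks SQL NULL. (Hypothetical type, for illustration only.)
type result struct {
	Null  bool
	Value int64
}

// emptyInputOutput returns the rows an aggregate would emit when its input
// has no rows, following the rule stated in the PR description.
// (Hypothetical helper, not the actual exec code.)
func emptyInputOutput(fn string, isScalar bool) []result {
	if !isScalar {
		// Non-scalar aggregation: no groups exist, so nothing is emitted.
		return nil
	}
	if fn == "count" {
		// Scalar count over an empty input is zero.
		return []result{{Value: 0}}
	}
	// Every other scalar aggregate emits NULL.
	return []result{{Null: true}}
}

func main() {
	fmt.Println(len(emptyInputOutput("sum", false))) // number of rows, non-scalar
	fmt.Printf("%+v\n", emptyInputOutput("sum", true))
	fmt.Printf("%+v\n", emptyInputOutput("count", true))
}
```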

cockroach-teamcity · 2019-07-20T00:06:40Z

This change is

yuzefovich · 2019-07-20T15:20:57Z

I think what I did here is overkill: figuring out how to treat an empty table happens only once per lifetime of an aggregate, so templating it out is not necessary.

rafiss

Thanks for taking this over from me! This all looks good to me. I do think that it would be less code without the templating, and since, as you said, the isScalar check only needs to happen once, I don't think there's any performance reason to template it. My vote is to keep it as a normal code branch like in count_agg, but if it's a lot of effort to go back and change, it's fine with me to merge.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)

jordanlewis

Didn't review carefully, and will defer to Rafi on this, but any chance we could avoid having 2 impls of all aggregates by teaching the aggregate runner thing (exec/aggregator.go) to deal with this instead?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)

yuzefovich · 2019-07-22T17:30:49Z

Yeah, I also didn't like having two different implementations that basically did the same thing except for this edge case. I decided to go with an extra argument to the Compute method. PTAL.

yuzefovich · 2019-07-22T18:51:33Z

Just to double check, I ran all the benchmarks, and there is no performance difference.

justinj

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @solongordon and @yuzefovich)

pkg/sql/exec/aggregator.go, line 173 at r2 (raw file):

		aggTypes: aggTypes,
		groupCol: groupCol,
		isScalar: len(groupCols) == 0,

I don't think this is correct: it's possible to have a non-scalar group-by with an empty set of grouping columns (say, if you're grouping on a set of constant columns which the optimizer then simplifies away). groupNode has an isScalar flag which denotes whether a group-by is scalar or not.

This is an example of a non-scalar group by with an empty grouping column set:

root@127.0.0.1:49438/defaultdb> create table x (a int primary key, b int);
CREATE TABLE

Time: 5.94ms

root@127.0.0.1:49438/defaultdb> explain (opt, verbose) select sum(a) from (select 1 as const, a from x) group by const;
                  text
+---------------------------------------+
  group-by
   ├── columns: sum:4
   ├── cardinality: [0 - 1]
   ├── stats: [rows=1]
   ├── cost: 1040.04
   ├── key: ()
   ├── fd: ()-->(4)
   ├── prune: (4)
   ├── scan x
   │    ├── columns: a:1
   │    ├── stats: [rows=1000]
   │    ├── cost: 1030.02
   │    ├── key: (1)
   │    ├── prune: (1)
   │    └── interesting orderings: (+1)
   └── aggregations
        └── sum [outer=(1)]
             └── variable: a
(18 rows)

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @solongordon)

pkg/sql/exec/aggregator.go, line 173 at r2 (raw file):

Indeed, thanks for the catch! I adjusted the code to use Type of the spec, so it should be good now.
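
To make the distinction concrete, here is a small Go sketch contrasting the two ways of deciding whether an aggregation is scalar. The `aggSpec` and `aggType` types below are simplified, hypothetical stand-ins for the real aggregator spec, introduced only for illustration.

```go
package main

import "fmt"

// aggType is a hypothetical stand-in for the Type field of the spec.
type aggType int

const (
	scalarAgg aggType = iota
	nonScalarAgg
)

// aggSpec is a hypothetical, simplified aggregator spec.
type aggSpec struct {
	Type      aggType
	GroupCols []uint32
}

// isScalarFromCols is the inference questioned in the review: it treats any
// aggregation without grouping columns as scalar.
func isScalarFromCols(s aggSpec) bool { return len(s.GroupCols) == 0 }

// isScalarFromType trusts the type recorded in the spec instead.
func isScalarFromType(s aggSpec) bool { return s.Type == scalarAgg }

func main() {
	// Models `SELECT sum(a) FROM (SELECT 1 AS const, a FROM x) GROUP BY const`
	// after the optimizer has dropped the constant grouping column: the
	// aggregation is non-scalar, yet has no grouping columns left.
	s := aggSpec{Type: nonScalarAgg, GroupCols: nil}
	fmt.Println(isScalarFromCols(s), isScalarFromType(s)) // prints "true false"
}
```

With the column-count inference, this query would wrongly emit a NULL row on an empty table; with the spec's type, it emits nothing.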

yuzefovich · 2019-07-22T20:33:30Z

I added a second commit here that makes the tracing test more flexible (it used to have a hard-coded table ID, and when I added the creation of an empty table, the IDs shifted).

rohany · 2019-07-22T20:35:06Z

test change looks good to me

yuzefovich · 2019-07-22T21:06:47Z

I removed the second commit (I hadn't thought it through carefully enough). I just left a clarifying comment in case someone runs into the same failure later.

jordanlewis

Is there any way

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)

pkg/sql/exec/count_agg.go, line 68 at r3 (raw file):

		if a.curIdx >= 0 {
			a.curIdx++
		} else {

Is there no way we can abstract this out? It seems very hard to remember that this has to get done for new aggregates, and very error-prone that every aggregate has to do it perfectly.

yuzefovich · 2019-07-23T06:52:38Z

Alright, I gave it another shot. Now everything is handled in exec/aggregator, but I exposed the internal Nulls of the aggregate functions, which I don't like. Let me know if you guys like this approach better.

jordanlewis

I like this much better than the previous approach. I think you could avoid exposing the internal Nulls too, if you wanted. Add another interface method to the aggregate functions, called emitEmptyScalarGroup or something, that either sets NULL or sets 0 in the case of countOp.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)

pkg/sql/exec/aggregator.go, line 295 at r5 (raw file):

				// functions except for count aggregates (for which it should be zero).
				for _, fn := range a.aggregateFuncs {
					if c, isCountAgg := fn.(*countAgg); isCountAgg {

I don't love this special case, but it seems like count is just a very very special snowflake and this can't be avoided.
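
A minimal Go sketch of the interface-method idea suggested above, which would let the aggregator drop the type assertion on the count aggregate. The interface and struct definitions are simplified and hypothetical; the method is named `HandleEmptyInputScalar` here to match the name that appears later in this thread for `avg_agg_tmpl.go`, and the `emitEmptyScalarGroup` helper is an assumption for illustration.

```go
package main

import "fmt"

// aggregateFunc is a hypothetical, pared-down aggregate interface.
type aggregateFunc interface {
	// HandleEmptyInputScalar sets the single output value for a scalar
	// aggregate whose input had no rows.
	HandleEmptyInputScalar()
	// Output returns the first output value and whether it is NULL.
	Output() (int64, bool)
}

// sumAgg is a simplified sum aggregate (hypothetical).
type sumAgg struct {
	val  int64
	null bool
}

func (a *sumAgg) HandleEmptyInputScalar() { a.null = true }
func (a *sumAgg) Output() (int64, bool)   { return a.val, a.null }

// countAgg is a simplified count aggregate (hypothetical).
type countAgg struct {
	val int64
}

func (a *countAgg) HandleEmptyInputScalar() { a.val = 0 }
func (a *countAgg) Output() (int64, bool)   { return a.val, false }

// emitEmptyScalarGroup is what the aggregator could call once it sees that a
// scalar aggregation received no input. No type assertion is needed because
// each function knows its own empty-input value.
func emitEmptyScalarGroup(fns []aggregateFunc) {
	for _, fn := range fns {
		fn.HandleEmptyInputScalar()
	}
}

func main() {
	fns := []aggregateFunc{&sumAgg{}, &countAgg{}}
	emitEmptyScalarGroup(fns)
	for _, fn := range fns {
		v, isNull := fn.Output()
		fmt.Printf("%T: value=%d null=%t\n", fn, v, isNull)
	}
}
```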

yuzefovich · 2019-07-23T15:22:43Z

Done. At some point, I used the same approach, but it's not exactly "handling everything in exec/aggregator", so I discarded it, yet it is probably the cleanest. PTAL.

jordanlewis

Perhaps we could pull the scratch buffer into a shared aggregator base at some point, and then implement EmitEmptyScalarGroup once on that shared base to avoid the remaining duplication. But, I don't want to hold up the fix on that, of course.
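
One way the shared-base idea might look in Go, using struct embedding so that the NULL default is written once and only the count aggregate overrides it. All names here (`aggBase`, `newAggBase`, `minAgg`, the buffer fields) are hypothetical and chosen for illustration; this is a sketch of the suggestion, not the code that was merged.

```go
package main

import "fmt"

// aggBase is a hypothetical shared base that owns the output scratch buffers.
type aggBase struct {
	vals  []int64
	nulls []bool
}

// newAggBase allocates room for a single output value.
func newAggBase() aggBase {
	return aggBase{vals: make([]int64, 1), nulls: make([]bool, 1)}
}

// HandleEmptyInputScalar is the default behavior: emit NULL.
func (b *aggBase) HandleEmptyInputScalar() { b.nulls[0] = true }

// minAgg gets the default through the embedded aggBase.
type minAgg struct{ aggBase }

// countAgg shadows the promoted method to emit zero instead of NULL.
type countAgg struct{ aggBase }

func (a *countAgg) HandleEmptyInputScalar() {
	a.vals[0] = 0
	a.nulls[0] = false
}

func main() {
	m := &minAgg{aggBase: newAggBase()}
	c := &countAgg{aggBase: newAggBase()}
	m.HandleEmptyInputScalar()
	c.HandleEmptyInputScalar()
	fmt.Println(m.nulls[0], c.vals[0], c.nulls[0]) // prints "true 0 false"
}
```

The trade-off is that every aggregate would have to store its output NULLs in the base, which is the refactoring the comment above defers.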

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)

pkg/sql/exec/avg_agg_tmpl.go, line 168 at r6 (raw file):

}

func (a *avg_TYPEAgg) HandleEmptyInputScalar() {

Last suggestion - you could embed an empty struct called

jordanlewis

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)

pkg/sql/exec/avg_agg_tmpl.go, line 168 at r6 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Last suggestion - you could embed an empty struct called

Whoops, moved this to the main response field and forgot to delete it.

This commit adds special behavior for a case when the input to an aggregate is empty. There is also a subtle difference between a scalar and non-scalar case: in scalar context, an aggregate function needs to emit either null or zero, but in non-scalar context, all functions have no output.

Release note: None

yuzefovich · 2019-07-23T16:38:22Z

Thanks everyone for the input!

bors r+

39007: exec: fix handling of empty table by aggregates r=yuzefovich a=yuzefovich

This commit adds special behavior for a case when the input to an aggregate is empty. There is also a subtle difference between a scalar and non-scalar case: if GROUP BY clause is omitted, the aggregate is scalar, so it should emit null or zero, but if GROUP BY is present, the aggregate is non-scalar and should emit no output. These two cases are templated out.

Based on #38872.

Fixes: #38858.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

craig · 2019-07-23T16:57:54Z

Build succeeded

GitHub CI (Cockroach)

yuzefovich requested review from solongordon, rafiss and a team July 20, 2019 00:06

rafiss approved these changes Jul 22, 2019

jordanlewis reviewed Jul 22, 2019

yuzefovich force-pushed the exec-agg-empty-table branch from 098d5e1 to 9b01cd7 Compare July 22, 2019 17:29

yuzefovich force-pushed the exec-agg-empty-table branch from 9b01cd7 to aab58f4 Compare July 22, 2019 17:31

solongordon mentioned this pull request Jul 22, 2019

exec: tracking issue for unit test failures #38935

Closed

13 tasks

yuzefovich force-pushed the exec-agg-empty-table branch from aab58f4 to b032f42 Compare July 22, 2019 18:53

justinj reviewed Jul 22, 2019

yuzefovich force-pushed the exec-agg-empty-table branch from b032f42 to f2983c8 Compare July 22, 2019 19:52

yuzefovich requested review from a team July 22, 2019 19:52

yuzefovich commented Jul 22, 2019

yuzefovich force-pushed the exec-agg-empty-table branch from f2983c8 to 1fad672 Compare July 22, 2019 20:23

yuzefovich force-pushed the exec-agg-empty-table branch 2 times, most recently from d04b666 to 1d87a7f Compare July 22, 2019 21:05

jordanlewis reviewed Jul 23, 2019

yuzefovich force-pushed the exec-agg-empty-table branch 2 times, most recently from c8025cf to 4ad75d9 Compare July 23, 2019 06:49

jordanlewis reviewed Jul 23, 2019

yuzefovich force-pushed the exec-agg-empty-table branch from 4ad75d9 to 436b2ee Compare July 23, 2019 15:18

jordanlewis approved these changes Jul 23, 2019

jordanlewis reviewed Jul 23, 2019

yuzefovich force-pushed the exec-agg-empty-table branch from 436b2ee to 5a98a70 Compare July 23, 2019 15:54

yuzefovich force-pushed the exec-agg-empty-table branch from 5a98a70 to 82bb6a3 Compare July 23, 2019 15:56

craig bot merged commit 82bb6a3 into cockroachdb:master Jul 23, 2019

yuzefovich deleted the exec-agg-empty-table branch July 23, 2019 17:01

knz mentioned this pull request Nov 10, 2019

User-facing changes in 19.2 that were not picked up in release notes cockroachdb/docs#5819

Closed

rafiss mentioned this pull request Feb 26, 2020

COUNT(*) on empty table in GROUP BY is different from Postgres #45453

Closed
