Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: fix handling of empty table by aggregates #39007

Merged
merged 1 commit into from
Jul 23, 2019

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jul 20, 2019

This commit adds special behavior for a case when the input to an
aggregate is empty. There is also a subtle difference between a
scalar and non-scalar case: if GROUP BY clause is omitted, the
aggregate is scalar, so it should emit null or zero, but if GROUP
BY is present, the aggregate is non-scalar and should emit no
output. These two cases are templated out.

Based on #38872.

Fixes: #38858.

Release note: None

@yuzefovich yuzefovich requested review from solongordon, rafiss and a team July 20, 2019 00:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

I think that what I did here is an overkill since the figuring out how to treat an empty table happens only once per lifetime of an aggregate, so templating it out is not necessary.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this over from me! This looks all good to me. I do think that it would be less code without doing the templating, and since like you said, the isScalar check only needs to happen once, I don't think there's any performance reason to template it. My vote is to keep it as a normal code branch like in count_agg, but if it's a lot of effort to go back and change, it's fine with me to merge.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review carefully, and will defer to Rafi on this, but any chance we could avoid having 2 impls of all aggregates by teaching the aggregate runner thing (exec/aggregator.go) to deal with this instead?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @solongordon)

@yuzefovich
Copy link
Member Author

Yeah, I also didn't like having two different implementations that basically did the same thing except for this edge case. I decided to go with an extra argument to Compute method. PTAL.

@yuzefovich
Copy link
Member Author

Just to double check, I ran all the benchmarks, and there is no performance difference.

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @solongordon and @yuzefovich)


pkg/sql/exec/aggregator.go, line 173 at r2 (raw file):

		aggTypes: aggTypes,
		groupCol: groupCol,
		isScalar: len(groupCols) == 0,

I don't think this is correct—it's possibly to have a non-scalar group-by with an empty set of grouping columns (say, if you're grouping on a set of constant set of columns which the optimizer then simplifies). groupNode has an isScalar flag which denotes whether a group-by is scalar or not.

This is an example of a non-scalar group by with an empty grouping column set:

root@127.0.0.1:49438/defaultdb> create table x (a int primary key, b int);
CREATE TABLE

Time: 5.94ms

root@127.0.0.1:49438/defaultdb> explain (opt, verbose) select sum(a) from (select 1 as const, a from x) group by const;
                  text
+---------------------------------------+
  group-by
   ├── columns: sum:4
   ├── cardinality: [0 - 1]
   ├── stats: [rows=1]
   ├── cost: 1040.04
   ├── key: ()
   ├── fd: ()-->(4)
   ├── prune: (4)
   ├── scan x
   │    ├── columns: a:1
   │    ├── stats: [rows=1000]
   │    ├── cost: 1030.02
   │    ├── key: (1)
   │    ├── prune: (1)
   │    └── interesting orderings: (+1)
   └── aggregations
        └── sum [outer=(1)]
             └── variable: a
(18 rows)

@yuzefovich yuzefovich requested review from a team July 22, 2019 19:52
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @solongordon)


pkg/sql/exec/aggregator.go, line 173 at r2 (raw file):

Previously, justinj (Justin Jaffray) wrote…

I don't think this is correct—it's possibly to have a non-scalar group-by with an empty set of grouping columns (say, if you're grouping on a set of constant set of columns which the optimizer then simplifies). groupNode has an isScalar flag which denotes whether a group-by is scalar or not.

This is an example of a non-scalar group by with an empty grouping column set:

root@127.0.0.1:49438/defaultdb> create table x (a int primary key, b int);
CREATE TABLE

Time: 5.94ms

root@127.0.0.1:49438/defaultdb> explain (opt, verbose) select sum(a) from (select 1 as const, a from x) group by const;
                  text
+---------------------------------------+
  group-by
   ├── columns: sum:4
   ├── cardinality: [0 - 1]
   ├── stats: [rows=1]
   ├── cost: 1040.04
   ├── key: ()
   ├── fd: ()-->(4)
   ├── prune: (4)
   ├── scan x
   │    ├── columns: a:1
   │    ├── stats: [rows=1000]
   │    ├── cost: 1030.02
   │    ├── key: (1)
   │    ├── prune: (1)
   │    └── interesting orderings: (+1)
   └── aggregations
        └── sum [outer=(1)]
             └── variable: a
(18 rows)

Indeed, thanks for the catch! I adjusted the code to use Type of the spec, so it should be good now.

@yuzefovich
Copy link
Member Author

I added a second commit here that makes the tracing test more flexible (it used to have a hard-coded table id, and when I added creation of empty table, the IDs shifted).

@rohany
Copy link
Contributor

rohany commented Jul 22, 2019

test change looks good to me

@yuzefovich yuzefovich force-pushed the exec-agg-empty-table branch 2 times, most recently from d04b666 to 1d87a7f Compare July 22, 2019 21:05
@yuzefovich
Copy link
Member Author

Second commit was removed (I didn't think through carefully enough). I just left a clarifying comment if someone runs into the same failure later.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)


pkg/sql/exec/count_agg.go, line 68 at r3 (raw file):

		if a.curIdx >= 0 {
			a.curIdx++
		} else {

Is there no way we can abstract this out? It seems very hard to remember that this has to get done for new aggregates, and very error prone that every aggregate has to do it perfectly.

@yuzefovich yuzefovich force-pushed the exec-agg-empty-table branch 2 times, most recently from c8025cf to 4ad75d9 Compare July 23, 2019 06:49
@yuzefovich
Copy link
Member Author

Alright, I gave it another shot. Now everything is handled in exec/aggregator, but I exposed an internal Nulls of aggregate functions which I don't like. Let me know if you guys like this approach better.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this much better than the previous approach. I think you could fix the internal Nulls exposing too, if you wanted. Add another interface method to the aggregate functions called emitEmptyScalarGroup or something, that either sets NULL or sets 0 in case of countOp.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)


pkg/sql/exec/aggregator.go, line 295 at r5 (raw file):

				// functions except for count aggregates (for which it should be zero).
				for _, fn := range a.aggregateFuncs {
					if c, isCountAgg := fn.(*countAgg); isCountAgg {

I don't love this special case, but it seems like count is just a very very special snowflake and this can't be avoided.

@yuzefovich
Copy link
Member Author

Done. At some point, I used the same approach, but it's not exactly "handling everything in exec/aggregator", so I discarded it, yet it is probably the cleanest. PTAL.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Perhaps we could pull the scratch buffer into a shared aggregator base at some point, and then implement EmitEmptyScalarGroup once on that shared base to avoid the remaining duplication. But, I don't want to hold up the fix on that, of course.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)


pkg/sql/exec/avg_agg_tmpl.go, line 168 at r6 (raw file):

}

func (a *avg_TYPEAgg) HandleEmptyInputScalar() {

Last suggestion - you could embed an empty struct called

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @solongordon, and @yuzefovich)


pkg/sql/exec/avg_agg_tmpl.go, line 168 at r6 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Last suggestion - you could embed an empty struct called

Whoops, moved this to the main respond field and forgot to delete it.

This commit adds special behavior for a case when the input to an
aggregate is empty. There is also a subtle difference between a
scalar and non-scalar case: in scalar context, an aggregate
function needs to emit either null or zero, but in non-scalar
context, all functions have no output.

Release note: None
@yuzefovich
Copy link
Member Author

Thanks everyone for the input!

bors r+

craig bot pushed a commit that referenced this pull request Jul 23, 2019
39007: exec: fix handling of empty table by aggregates r=yuzefovich a=yuzefovich

This commit adds special behavior for a case when the input to an
aggregate is empty. There is also a subtle difference between a
scalar and non-scalar case: if GROUP BY clause is omitted, the
aggregate is scalar, so it should emit null or zero, but if GROUP
BY is present, the aggregate is non-scalar and should emit no
output. These two cases are templated out.

Based on #38872.

Fixes: #38858.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jul 23, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exec: min/max do not return nulls over an empty table
6 participants