Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: fix bug with incorrect results produced by paired left inverted join #59279

Merged
merged 1 commit into from
Jan 26, 2021

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Jan 22, 2021

Prior to this patch, it was possible for a paired join to produce incorrect
results for a left inverted join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the lookup join)
are projected, not columns from the first (the inverted join).

Fixes #58892

Release note (bug fix): Fixed an issue where a left inverted join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue has only existed in alpha releases of 21.1 so
far, and it is now fixed.

@rytaft rytaft requested a review from a team as a code owner January 22, 2021 03:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft rytaft force-pushed the new-col-ids branch 2 times, most recently from f4ea1e1 to da1ba5f Compare January 23, 2021 14:31
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks for fixing this. :lgtm:

Reviewed 15 of 15 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde, @rytaft, and @sumeerbhola)


pkg/sql/logictest/testdata/logic_test/inverted_join_multi_column, line 161 at r1 (raw file):

# This query performs a left inverted join on ij_idx with no matching rows on
# the right.
# TODO(mgartner): This test is broken, see #58892.

❤️


pkg/sql/opt/xform/join_funcs.go, line 710 at r1 (raw file):

	invertedJoin.Cols = invertedJoin.Cols.Difference(indexCols).Union(newIndexCols)
	invertedJoin.ConstFilters = c.MapFilterCols(invertedJoin.ConstFilters, srcCols, dstCols)
	invertedJoin.On = c.MapFilterCols(invertedJoin.On, srcCols, dstCols)

I suppose invertedJoin.PrefixKeyCols do not need to be mapped because those are columns from the input of the join? Maybe add to the mapInvertedJoin function comment that columns from the input are not mapped. You could call out PrefixKeyCols specifically if you think that's appropriate.


pkg/sql/opt/xform/select_funcs.go, line 1691 at r1 (raw file):

}

func (c *CustomFuncs) mapScalarExprCols(scalar opt.ScalarExpr, src, dst opt.ColSet) opt.ScalarExpr {

[nit] you could move this to general_funcs.go now that select and join funcs rely on it.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this will easily extend to pairs of lookup joins as done in #58261 what do you think?

Reviewed 3 of 15 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/ordering/lookup_join.go, line 35 at r1 (raw file):

		child := expr.Child(0).(memo.RelExpr)
		res := projectOrderingToInput(child, required)
		res = trimColumnGroups(&res, &child.Relational().FuncDeps)

why this additional code? The remapping in this PR should not have changed what ordering the first join can provide since their orderings are based on their input side, which is not being remapped.
Could you add a code comment?


pkg/sql/opt/xform/join_funcs.go, line 593 at r1 (raw file):

		// Create a new ScanPrivate, which will be used below for the inverted join.
		// Note: this must happen before the continuation column is created to ensure
		// that the continuation column will have the highest column ID.

Could you add a code comment about paired joins that explains why we are doing this remapping?


pkg/sql/opt/xform/join_funcs.go, line 645 at r1 (raw file):

		indexJoin.Table = scanPrivate.Table
		indexJoin.Index = cat.PrimaryIndex
		indexJoin.KeyCols = c.getPkCols(invertedJoin.Table)

Am I correct in understanding that invertedJoin.Table is the mapped table?


pkg/sql/opt/xform/join_funcs.go, line 654 at r1 (raw file):

		// If this is a semi- or anti-join, ensure the columns do not include any
		// unneeded right-side columns.
		if joinType == opt.SemiJoinOp || joinType == opt.AntiJoinOp {

I don't quite understand the simplification that happened here. Could you explain, if possible with a simple example?

…join

Prior to this patch, it was possible for a paired join to produce incorrect
results for a left inverted join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the lookup join)
are projected, not columns from the first (the inverted join).

Fixes cockroachdb#58892

Release note (bug fix): Fixed an issue where a left inverted join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue has only existed in alpha releases of 21.1 so
far, and it is now fixed.
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

I'm wondering if this will easily extend to pairs of lookup joins as done in #58261 what do you think?

I think so! We'll just need to define a similar function mapLookupJoin and map the first join.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @RaduBerinde, and @sumeerbhola)


pkg/sql/opt/ordering/lookup_join.go, line 35 at r1 (raw file):

Previously, sumeerbhola wrote…

why this additional code? The remapping in this PR should not have changed what ordering the first join can provide since their orderings are based on their input side, which is not being remapped.
Could you add a code comment?

I was getting an error that the ordering passed to CanProvide included some columns not provided by the input. I've copied some of the comments from lookupOrIndexJoinBuildChildReqOrdering below and added a bit more.

This is needed because we represent required orderings as an OrderingChoice which basically allows you to order by any of several different columns. For example, if the required ordering is on a, but we happen to know that a=b and c is constant, then the ordering would be represented as +(a|b) opt(c), indicating that we can order by a, b, (c,a), or (c,b). We can remove one of the required columns and all of the optional columns, and still have it be valid. (Not going to include all of this info since it's documented in the comment for OrderingChoice).


pkg/sql/opt/xform/join_funcs.go, line 593 at r1 (raw file):

Previously, sumeerbhola wrote…

Could you add a code comment about paired joins that explains why we are doing this remapping?

Done.


pkg/sql/opt/xform/join_funcs.go, line 645 at r1 (raw file):

Previously, sumeerbhola wrote…

Am I correct in understanding that invertedJoin.Table is the mapped table?

Yes, that's right.


pkg/sql/opt/xform/join_funcs.go, line 654 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't quite understand the simplification that happened here. Could you explain, if possible with a simple example?

The test case that caused us to require the added projection was this one:

SELECT lk FROM ltable
WHERE NOT EXISTS (
  SELECT * FROM rtable WHERE ST_Covers(ltable.geom1, rtable.geom) AND lk > 5 AND rk1 > 12
)

Normally, the execbuilder can add a projection on top of the anti join to remove the columns that should not be output:

if join.JoinType == opt.SemiJoinOp || join.JoinType == opt.AntiJoinOp {
But the problem was that it could not tell the difference between columns from the inverted join and true input columns. In the above example, it was not effectively removing rk1. That's why I added this special logic in #50993.

This PR removes the need for the special logic, since columns from the inverted join will never be part of the indexJoin.On condition, and therefore won't be included in indexJoin.Cols. Now the logic in the execbuilder is effective (I updated the comment there).


pkg/sql/opt/xform/join_funcs.go, line 710 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I suppose invertedJoin.PrefixKeyCols do not need to be mapped because those are columns from the input of the join? Maybe add to the mapInvertedJoin function comment that columns from the input are not mapped. You could call out PrefixKeyCols specifically if you think that's appropriate.

Done.


pkg/sql/opt/xform/select_funcs.go, line 1691 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

[nit] you could move this to general_funcs.go now that select and join funcs rely on it.

Done.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!
:lgtm:

Reviewed 12 of 15 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

bors r+

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)

@craig
Copy link
Contributor

craig bot commented Jan 26, 2021

Build succeeded:

@craig craig bot merged commit 081f813 into cockroachdb:master Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: left inverted join pair produces incorrect results
4 participants