-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: fix bug with incorrect results produced by paired left inverted join #59279
Conversation
f4ea1e1
to
da1ba5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks for fixing this.
Reviewed 15 of 15 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde, @rytaft, and @sumeerbhola)
pkg/sql/logictest/testdata/logic_test/inverted_join_multi_column, line 161 at r1 (raw file):
# This query performs a left inverted join on ij_idx with no matching rows on # the right. # TODO(mgartner): This test is broken, see #58892.
❤️
pkg/sql/opt/xform/join_funcs.go, line 710 at r1 (raw file):
invertedJoin.Cols = invertedJoin.Cols.Difference(indexCols).Union(newIndexCols) invertedJoin.ConstFilters = c.MapFilterCols(invertedJoin.ConstFilters, srcCols, dstCols) invertedJoin.On = c.MapFilterCols(invertedJoin.On, srcCols, dstCols)
I suppose invertedJoin.PrefixKeyCols
do not need to be mapped because those are columns from the input of the join? Maybe add to the mapInvertedJoin
function comment that columns from the input are not mapped. You could call out PrefixKeyCols
specifically if you think that's appropriate.
pkg/sql/opt/xform/select_funcs.go, line 1691 at r1 (raw file):
} func (c *CustomFuncs) mapScalarExprCols(scalar opt.ScalarExpr, src, dst opt.ColSet) opt.ScalarExpr {
[nit] you could move this to general_funcs.go
now that select and join funcs rely on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this will easily extend to pairs of lookup joins as done in #58261 what do you think?
Reviewed 3 of 15 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/ordering/lookup_join.go, line 35 at r1 (raw file):
child := expr.Child(0).(memo.RelExpr) res := projectOrderingToInput(child, required) res = trimColumnGroups(&res, &child.Relational().FuncDeps)
why this additional code? The remapping in this PR should not have changed what ordering the first join can provide since their orderings are based on their input side, which is not being remapped.
Could you add a code comment?
pkg/sql/opt/xform/join_funcs.go, line 593 at r1 (raw file):
// Create a new ScanPrivate, which will be used below for the inverted join. // Note: this must happen before the continuation column is created to ensure // that the continuation column will have the highest column ID.
Could you add a code comment about paired joins that explains why we are doing this remapping?
pkg/sql/opt/xform/join_funcs.go, line 645 at r1 (raw file):
indexJoin.Table = scanPrivate.Table indexJoin.Index = cat.PrimaryIndex indexJoin.KeyCols = c.getPkCols(invertedJoin.Table)
Am I correct in understanding that invertedJoin.Table
is the mapped table?
pkg/sql/opt/xform/join_funcs.go, line 654 at r1 (raw file):
// If this is a semi- or anti-join, ensure the columns do not include any // unneeded right-side columns. if joinType == opt.SemiJoinOp || joinType == opt.AntiJoinOp {
I don't quite understand the simplification that happened here. Could you explain, if possible with a simple example?
…join Prior to this patch, it was possible for a paired join to produce incorrect results for a left inverted join. In particular, some output rows had non-NULL values for right-side columns when the right-side columns should have been NULL. This commit fixes the issue by updating the optimizer to ensure that only columns from the second join in the paired join (the lookup join) are projected, not columns from the first (the inverted join). Fixes cockroachdb#58892 Release note (bug fix): Fixed an issue where a left inverted join could have incorrect results. In particular, some output rows could have non-NULL values for right-side columns when the right-side columns should have been NULL. This issue has only existed in alpha releases of 21.1 so far, and it is now fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
I'm wondering if this will easily extend to pairs of lookup joins as done in #58261 what do you think?
I think so! We'll just need to define a similar function mapLookupJoin
and map the first join.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @RaduBerinde, and @sumeerbhola)
pkg/sql/opt/ordering/lookup_join.go, line 35 at r1 (raw file):
Previously, sumeerbhola wrote…
why this additional code? The remapping in this PR should not have changed what ordering the first join can provide since their orderings are based on their input side, which is not being remapped.
Could you add a code comment?
I was getting an error that the ordering passed to CanProvide
included some columns not provided by the input. I've copied some of the comments from lookupOrIndexJoinBuildChildReqOrdering
below and added a bit more.
This is needed because we represent required orderings as an OrderingChoice
which basically allows you to order by any of several different columns. For example, if the required ordering is on a
, but we happen to know that a=b
and c
is constant, then the ordering would be represented as +(a|b) opt(c)
, indicating that we can order by a
, b
, (c,a)
, or (c,b)
. We can remove one of the required columns and all of the optional columns, and still have it be valid. (Not going to include all of this info since it's documented in the comment for OrderingChoice
).
pkg/sql/opt/xform/join_funcs.go, line 593 at r1 (raw file):
Previously, sumeerbhola wrote…
Could you add a code comment about paired joins that explains why we are doing this remapping?
Done.
pkg/sql/opt/xform/join_funcs.go, line 645 at r1 (raw file):
Previously, sumeerbhola wrote…
Am I correct in understanding that
invertedJoin.Table
is the mapped table?
Yes, that's right.
pkg/sql/opt/xform/join_funcs.go, line 654 at r1 (raw file):
Previously, sumeerbhola wrote…
I don't quite understand the simplification that happened here. Could you explain, if possible with a simple example?
The test case that caused us to require the added projection was this one:
SELECT lk FROM ltable
WHERE NOT EXISTS (
SELECT * FROM rtable WHERE ST_Covers(ltable.geom1, rtable.geom) AND lk > 5 AND rk1 > 12
)
Normally, the execbuilder
can add a projection on top of the anti join to remove the columns that should not be output:
if join.JoinType == opt.SemiJoinOp || join.JoinType == opt.AntiJoinOp { |
rk1
. That's why I added this special logic in #50993.
This PR removes the need for the special logic, since columns from the inverted join will never be part of the indexJoin.On
condition, and therefore won't be included in indexJoin.Cols
. Now the logic in the execbuilder
is effective (I updated the comment there).
pkg/sql/opt/xform/join_funcs.go, line 710 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I suppose
invertedJoin.PrefixKeyCols
do not need to be mapped because those are columns from the input of the join? Maybe add to themapInvertedJoin
function comment that columns from the input are not mapped. You could call outPrefixKeyCols
specifically if you think that's appropriate.
Done.
pkg/sql/opt/xform/select_funcs.go, line 1691 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
[nit] you could move this to
general_funcs.go
now that select and join funcs rely on it.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 15 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)
Build succeeded: |
Prior to this patch, it was possible for a paired join to produce incorrect
results for a left inverted join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.
This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the lookup join)
are projected, not columns from the first (the inverted join).
Fixes #58892
Release note (bug fix): Fixed an issue where a left inverted join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue has only existed in alpha releases of 21.1 so
far, and it is now fixed.