sql: left inverted join pair produces incorrect results #58892
Comments
I think there is something broken with the projection taken from the The The There is another related problem: the cockroach/pkg/sql/rowexec/joinreader.go Line 609 in d1702fe
j2.k column is nil (though the rowFetcher does have it, since it needs to compute the key string in the preceding call to PartialKey, which will be used in the map lookup). I think this is because the join expects that the j2.k from the left side suffices, and j2.j is the only column in the neededRightCols, which is used to initialize the rowFetcher (cockroach/pkg/sql/rowexec/joinreader.go, line 279 in d1702fe).
I am surprised that this breakage hasn't shown up in tests before. Maybe our logic tests are not testing the case of only false positives for the original row from the first join.
I looked into this issue for a bit, and I think the problem is a broken assumption in how unmatched tuples are handled by most joiners (those that embed

In the non-paired joiner approach it works as follows: for LEFT/FULL outer joins, if the left tuple doesn't have a match, we create a "combined" row in

Now that we have a paired joiner approach, from the perspective of the second lookup joiner, columns

I'm not sure what the correct fix is. My first thought is why do we project

Alternatively, we could maintain a special set of columnIDs coming from the right side in the first inverted joiner so that at the second lookup joiner we know what the "true" right columns are. This seems ugly and annoying though.
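To make the unmatched-tuple handling described above concrete, here is a minimal sketch, not the actual joiner code. `Datum`, `Row`, and `renderUnmatched` are hypothetical names, and the value 44 is made up for illustration. It assumes the joiner emits the left row followed by NULLs for the right columns, and shows how a right-side value already carried in the left input survives.

```go
package main

import "fmt"

// Datum is a hypothetical nullable column value; nil stands for SQL NULL.
type Datum interface{}

// Row is a hypothetical flat row of column values.
type Row []Datum

// renderUnmatched builds the output row for a left row that found no match
// in a LEFT OUTER join: the left columns are copied as-is and every right
// column is set to NULL.
func renderUnmatched(left Row, numRightCols int) Row {
	out := make(Row, 0, len(left)+numRightCols)
	out = append(out, left...)
	for i := 0; i < numRightCols; i++ {
		out = append(out, nil)
	}
	return out
}

func main() {
	// Non-paired join: the left row is (j1.k) and the right side is
	// (j2.k, j2.j), so both right columns come out as NULL.
	fmt.Println(renderUnmatched(Row{1}, 2))

	// Paired join: the second joiner's "left" input is the first joiner's
	// output, which already carries a j2.k value (44, made up here) from a
	// false positive. Only j2.j is treated as a right column, so the stale
	// j2.k is copied through instead of being NULL.
	fmt.Println(renderUnmatched(Row{1, 44}, 1))
}
```

In the second call the j2.k slot keeps its non-NULL value, which matches the symptom reported in this issue.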
I don't think this is true, But I am also suspicious of the outer left lookup join having
I suspect we are saying the same things, but just to be sure: we need

The problem is that I expected that we would subsequently keep the j2 key columns from the right side of the lookup, but we use the ones from the left side. Usually this does not matter since they are equal, except when nothing matched, where the right ones are going to be NULL for LEFT OUTER JOIN, and the left ones may not be. For ANTI JOIN this does not matter since the left ones will be projected away.

It sounds like this may not be a trivial fix in the optimizer, so I tried a fix in the join reader itself. Since it knows which columns are the right cols in the left side (they are the lookup columns), it can explicitly set them to NULL for this unmatched case: sumeerbhola@888777e
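The execution-side idea in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked prototype commit: `Datum`, `Row`, `nullOutLookupCols`, and the `lookupCols` index list are all hypothetical names, and the value 44 is made up.

```go
package main

import "fmt"

// Datum is a hypothetical nullable column value; nil stands for SQL NULL.
type Datum interface{}

// Row is a hypothetical flat row of column values.
type Row []Datum

// nullOutLookupCols returns a copy of the left row in which every position
// listed in lookupCols is replaced by NULL. In a paired join the lookup
// columns of the left input are really right-side columns, so they must be
// NULL when the lookup finds no match.
func nullOutLookupCols(left Row, lookupCols []int) Row {
	out := make(Row, len(left))
	copy(out, left)
	for _, idx := range lookupCols {
		out[idx] = nil
	}
	return out
}

func main() {
	// Left input to the second joiner: (j1.k, j2.k), where j2.k is a
	// made-up value from a false positive of the first join.
	left := Row{1, 44}

	// Position 1 (j2.k) is the lookup column, so it is reset to NULL for
	// the unmatched output row.
	fmt.Println(nullOutLookupCols(left, []int{1}))
}
```

Copying the row rather than mutating it in place is a choice made for this sketch only, so the input stays intact.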
Oh, I see, thanks. Your prototype fix seems reasonable to me. Overall, the complexity of the paired joiner approach keeps on increasing, so it's pretty hard to follow what's going on, but I guess we have to accept it for performance reasons, and the fix doesn't make things much worse complexity-wise.
I think the way to fix this in the optimizer would be to prevent both sides of the lookup join from having the same column IDs. Let me see if I can do that -- it might be cleaner than fixing it on the execution side. But if it's not possible then the approach in the prototype seems like a good alternative.
I've confirmed that using different column IDs in the optimizer fixes the issue. Although it's a bit involved, I think it's probably still preferable to the changes on the execution side. I'll clean up the code and submit a PR shortly.
59279: opt: fix bug with incorrect results produced by paired left inverted join r=rytaft a=rytaft

Prior to this patch, it was possible for a paired join to produce incorrect results for a left inverted join. In particular, some output rows had non-NULL values for right-side columns when the right-side columns should have been NULL.

This commit fixes the issue by updating the optimizer to ensure that only columns from the second join in the paired join (the lookup join) are projected, not columns from the first (the inverted join).

Fixes #58892

Release note (bug fix): Fixed an issue where a left inverted join could have incorrect results. In particular, some output rows could have non-NULL values for right-side columns when the right-side columns should have been NULL. This issue has only existed in alpha releases of 21.1 so far, and it is now fixed.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
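As a rough illustration of why distinct column IDs help, consider the toy model below. It is a sketch under assumptions, not the optimizer's real data structures: `ColumnID`, `Cols`, `project`, and the three ID constants are hypothetical, the value 44 is made up, and it assumes a requested ID is resolved against the left input before the right one.

```go
package main

import "fmt"

// ColumnID is a hypothetical stand-in for an optimizer column identifier.
type ColumnID int

// Datum is a hypothetical nullable value; nil stands for SQL NULL.
type Datum interface{}

// Cols maps column IDs to values for one side of a join.
type Cols map[ColumnID]Datum

// Hypothetical column IDs used only in this sketch.
const (
	j1K       ColumnID = 1 // j1.k from the original left input
	j2KFirst  ColumnID = 2 // j2.k as produced by the first (inverted) join
	j2KSecond ColumnID = 3 // j2.k as produced by the second (lookup) join
)

// project resolves each requested ID against the left side first and the
// right side second, so an ID present on both sides is served by the left.
func project(left, right Cols, want []ColumnID) []Datum {
	out := make([]Datum, len(want))
	for i, id := range want {
		if v, ok := left[id]; ok {
			out[i] = v
			continue
		}
		out[i] = right[id] // a missing or NULL entry yields nil
	}
	return out
}

func main() {
	// Unmatched case: the first join passed along a false positive with a
	// made-up j2.k of 44, and the lookup join found no row, so its side is NULL.
	left := Cols{j1K: 1, j2KFirst: 44}

	// Shared ID: the lookup side reuses j2KFirst, so the stale left value wins.
	fmt.Println(project(left, Cols{j2KFirst: nil}, []ColumnID{j1K, j2KFirst}))

	// Distinct ID: projecting the lookup join's own column yields NULL.
	fmt.Println(project(left, Cols{j2KSecond: nil}, []ColumnID{j1K, j2KSecond}))
}
```

With a shared ID the projected j2.k is the stale value from the first join. With a separate ID for the lookup join's column, the projection picks up the NULL that a LEFT OUTER JOIN should produce.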
Describe the problem
See the logictest with the NOTE below. The output row's j2.k value should be NULL because no rows on the right match the ON condition.
When I remove the join and index hints, a cross join is performed with a different result:
Environment:
I'm currently seeing this on master@0d6f0ddd.
cc @rytaft @sumeerbhola