Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: left inverted join pair produces incorrect results #58892

Closed
mgartner opened this issue Jan 12, 2021 · 7 comments · Fixed by #59279
Closed

sql: left inverted join pair produces incorrect results #58892

mgartner opened this issue Jan 12, 2021 · 7 comments · Fixed by #59279
Assignees
Labels
A-sql-execution Relating to SQL execution. A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@mgartner
Copy link
Collaborator

Describe the problem

See the logictest with the NOTE below. The output row's j2.k value should be NULL because no rows on the right match the ON condition.

statement ok
CREATE TABLE j1 (
  k INT PRIMARY KEY,
  j JSON
)

statement ok
INSERT INTO j1 VALUES (1, '{"a": "b"}')

statement ok
CREATE TABLE j2 (
  k INT PRIMARY KEY,
  j JSON,
  INVERTED INDEX j_idx (j)
)

statement ok
INSERT INTO j2 VALUES (1, '{"a": "b"}')

query T
EXPLAIN (OPT, VERBOSE)
SELECT j1.*, j2.*
FROM j1 LEFT INVERTED JOIN j2@j_idx
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
sort (segmented)
 ├── columns: k:1 j:2 k:5 j:6
 ├── immutable
 ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
 ├── cost: 102714.07
 ├── key: (1,5)
 ├── fd: (1)-->(2), (1,5)-->(6)
 ├── ordering: +1,+5
 ├── prune: (1,5)
 ├── interesting orderings: (+1) (+5)
 └── left-join (lookup j2)
      ├── columns: j1.k:1 j1.j:2 j2.k:5 j2.j:6
      ├── key columns: [5] = [5]
      ├── lookup columns are key
      ├── immutable
      ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      ├── cost: 102684.06
      ├── key: (1,5)
      ├── fd: (1)-->(2), (1,5)-->(6)
      ├── ordering: +1
      ├── prune: (1,5)
      ├── interesting orderings: (+1) (+5)
      ├── left-join (inverted j2@j_idx)
      │    ├── columns: j1.k:1 j1.j:2 j2.k:5 continuation:11
      │    ├── flags: force inverted join (into right side)
      │    ├── inverted-expr
      │    │    └── j2.j:6 @> j1.j:2
      │    ├── stats: [rows=10000, distinct(1)=1000, null(1)=0]
      │    ├── cost: 41984.03
      │    ├── key: (1,5)
      │    ├── fd: (1)-->(2), (5)-->(11)
      │    ├── ordering: +1
      │    ├── scan j1
      │    │    ├── columns: j1.k:1 j1.j:2
      │    │    ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      │    │    ├── cost: 1084.02
      │    │    ├── key: (1)
      │    │    ├── fd: (1)-->(2)
      │    │    ├── ordering: +1
      │    │    ├── prune: (1,2)
      │    │    ├── interesting orderings: (+1)
      │    │    └── unfiltered-cols: (1-4)
      │    └── filters (true)
      └── filters
           ├── j2.j:6 @> j1.j:2 [outer=(2,6), immutable]
           └── j2.j:6 = '"foo"' [outer=(6), immutable, constraints=(/6: [/'"foo"' - /'"foo"']; tight), fd=()-->(6)]

# NOTE: The output should be:
#
#   1  {"a": "b"}  NULL  NULL
#
query ITIT
SELECT j1.*, j2.*
FROM j1 LEFT INVERTED JOIN j2@j_idx
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
1  {"a": "b"}  1  NULL

When I remove the join and index hints, a cross join is performed with a different result:

query T
EXPLAIN (OPT, VERBOSE)
SELECT j1.*, j2.*
FROM j1 LEFT JOIN j2
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
sort
 ├── columns: k:1 j:2 k:5 j:6
 ├── immutable
 ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
 ├── cost: 2521.04647
 ├── key: (1,5)
 ├── fd: (1)-->(2), (1,5)-->(6)
 ├── ordering: +1,+5
 ├── prune: (1,5)
 ├── interesting orderings: (+1) (+5)
 └── left-join (cross)
      ├── columns: j1.k:1 j1.j:2 j2.k:5 j2.j:6
      ├── immutable
      ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      ├── cost: 2290.755
      ├── key: (1,5)
      ├── fd: (1)-->(2), (1,5)-->(6)
      ├── prune: (1,5)
      ├── interesting orderings: (+1) (+5)
      ├── scan j1
      │    ├── columns: j1.k:1 j1.j:2
      │    ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      │    ├── cost: 1084.02
      │    ├── key: (1)
      │    ├── fd: (1)-->(2)
      │    ├── prune: (1,2)
      │    ├── interesting orderings: (+1)
      │    └── unfiltered-cols: (1-4)
      ├── select
      │    ├── columns: j2.k:5 j2.j:6
      │    ├── immutable
      │    ├── stats: [rows=10, distinct(6)=1, null(6)=0]
      │    ├── cost: 1094.04
      │    ├── key: (5)
      │    ├── fd: ()-->(6)
      │    ├── prune: (5)
      │    ├── interesting orderings: (+5)
      │    ├── scan j2
      │    │    ├── columns: j2.k:5 j2.j:6
      │    │    ├── stats: [rows=1000, distinct(5)=1000, null(5)=0, distinct(6)=100, null(6)=10]
      │    │    ├── cost: 1084.02
      │    │    ├── key: (5)
      │    │    ├── fd: (5)-->(6)
      │    │    ├── prune: (5,6)
      │    │    └── interesting orderings: (+5)
      │    └── filters
      │         └── j2.j:6 = '"foo"' [outer=(6), immutable, constraints=(/6: [/'"foo"' - /'"foo"']; tight), fd=()-->(6)]
      └── filters
           └── j2.j:6 @> j1.j:2 [outer=(2,6), immutable]

query ITIT
SELECT j1.*, j2.*
FROM j1 LEFT JOIN j2
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
1  {"a": "b"}  NULL  NULL

Environment:

I'm currently seeing this on master@0d6f0ddd.

cc @rytaft @sumeerbhola

@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. A-sql-execution Relating to SQL execution. labels Jan 12, 2021
@sumeerbhola
Copy link
Collaborator

I think there is something broken with the projection taken from the joinReader output.
The output columns listed above for the inverted join and left join are:
columns: j1.k:1 j1.j:2 j2.k:5 continuation:11
columns: j1.k:1 j1.j:2 j2.k:5 j2.j:6

The joinReader produces the union of the left and right side, i.e., it includes the continuation column from the left and includes the key used in lookup from both the left and right side. This is described in the example here https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/execinfrapb/processors_sql.proto#L240-L254
The projection after that is supposed to remove the continuation column and the lookup key included in the left side, since the lookup key from the right side is the one that will indicate whether a match really occurred.

The joinReader in this case is producing
1 (j1.k), <json value> (j1.j), 1 (j2.k), false (continuation:11), null (right side j2.k), null (j2.j)
as promised by the JoinReaderSpec but the projection is using the j2.k from the left side.

There is another related problem: the rowFetcher does not provide a value for the j2.k column in NextRow here

lookedUpRow, _, _, err := jr.fetcher.NextRow(jr.Ctx)
. As expected, there are 2 columns in that row, but the j2.k column is nil (though the rowFetcher does have it, since it needs to compute the key string in the preceding call to PartialKey which will be used in the map lookup). I think this is because the join expects that the j2.k from the left-side suffices, and j2.j is the only column in the neededRightCols which is used to initialize the rowFetcher
neededRightCols := jr.neededRightCols()

I am surprised that this breakage hasn't shown up in tests before. Maybe our logic tests are not testing the case of only false positives for the original row from the first join. The joinReader unit tests do test it, but of course don't cover this projection problem, and specify a PostProcessSpec that includes the PK columns of the primary index being looked up, so the rowFetcher also behaves as expected.

@yuzefovich
Copy link
Member

yuzefovich commented Jan 20, 2021

I looked into this issue for a bit, and I think the problem is a broken assumption in how unmatched tuples are handled by most joiners (those that embed joinerBase) when they are used as the second in the joiner pair.

In non-paired joiner approach it works as follows: for LEFT/FULL outer joins, if the left tuple doesn't have a match, we create a "combined" row in joinerBase.renderUnmatchedRow by copying the left row as is and putting DNulls into all columns corresponding to the right side.

Now that we have a paired joiner approach, from the perspective of the second lookup joiner, columns j1.k:1 j1.j:2 j2.k:5 continuation:11 came from the left, so they are part of the "left tuple" and are copied as is while only j2.j:6 gets DNull value. However, that is not quite correct because in the first inverted joiner j2.k:5 came from the right side and should be treated as such in the second lookup joiner too.

I'm not sure what the correct fix is. My first thought is why do we project j2.k:5 from the first inverted joiner in the first place? We know that j2.k:5 must equal j1.k:1, so we could omit emitting it from the inverted joiner, then we will have to fetch it from the right side in the second lookup joiner, and it will be treated correctly for the "unmatched tuples handling" purposes. I imagine the changes would need to be made somewhere in the optimizer.

Alternatively, we could maintain a special set of columnIDs coming from the right side in the first inverted joiner so that at the second lookup joiner we know what are the "true" right columns. This seems ugly and annoying though.

@mgartner
Copy link
Collaborator Author

mgartner commented Jan 20, 2021

I'm not sure what the correct fix is. My first thought is why do we project j2.k:5 from the first inverted joiner in the first place? We know that j2.k:5 must equal j1.k:1.

I don't think this is true, j2.k:5 and j1.k:1 are not equal after the first join.

But I am also suspicious of the outer left lookup join having j2.k:5 as an input column AND a column fetched during the lookup. Producing a new t2.k column with a different column ID as the lookup column in the outer left lookup join might fix the issue, and be safer in general.

@sumeerbhola
Copy link
Collaborator

I suspect we are saying the same things, but just to be sure: we need j2.k:5 from the first join since it is needed for the second join (for the lookup, since it is the key).

The problem is that I expected that we would subsequently keep the j2 key columns from the right side of the lookup, but we use the ones from the left side. Usually this does not matter since they are equal, except when nothing matched, where the right ones are going to be NULL for LEFT OUTER JOIN, and the left ones may not be. For ANTI JOIN this does not matter since the left ones will be projected away.

It sounds like this may not be a trivial fix in the optimizer so I tried a fix in the join reader itself. Since it knows which columns are the right cols in the left side (they are the lookup columns), it can explicitly set them to NULL for this unmatched case sumeerbhola@888777e
Let me know if this sounds reasonable, and I'll clean it up for review.

@yuzefovich
Copy link
Member

Oh, I see, thanks. Your prototype fix seems reasonable to me.

Overall, the complexity of the paired joiner approach keeps on increasing, so it's pretty hard to follow what's going on, but I guess we have to accept it for the performance reasons, and the fix doesn't make things much worse complexity-wise.

@rytaft
Copy link
Collaborator

rytaft commented Jan 20, 2021

I think the way to fix this in the optimizer would be to prevent both sides of the lookup join from having the same column IDs. Let me see if I can do that -- it might be cleaner than fixing it on the execution side. But if it's not possible then the approach in the prototype seems like a good alternative.

@rytaft
Copy link
Collaborator

rytaft commented Jan 21, 2021

I've confirmed that using different Column IDs in the optimizer fixes the issue. Although it's a bit involved, I think it's probably still preferable to the changes on the execution side. I'll clean up the code and submit a PR shortly.

rytaft added a commit to rytaft/cockroach that referenced this issue Jan 23, 2021
…join

Prior to this patch, it was possible for a paired join to produce incorrect
results for a left inverted join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the lookup join)
are projected, not columns from the first (the inverted join).

Fixes cockroachdb#58892

Release note (bug fix): Fixed an issue where a left inverted join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue has only existed in alpha releases of 21.1 so
far, and it is now fixed.
craig bot pushed a commit that referenced this issue Jan 26, 2021
59279: opt: fix bug with incorrect results produced by paired left inverted join r=rytaft a=rytaft

Prior to this patch, it was possible for a paired join to produce incorrect
results for a left inverted join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the lookup join)
are projected, not columns from the first (the inverted join).

Fixes #58892

Release note (bug fix): Fixed an issue where a left inverted join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue has only existed in alpha releases of 21.1 so
far, and it is now fixed.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in 7e2d119 Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants