sql: left inverted join pair produces incorrect results #58892

mgartner · 2021-01-12T23:24:24Z

Describe the problem

See the logictest with the NOTE below. The output row's j2.k value should be NULL because no rows on the right match the ON condition.

statement ok
CREATE TABLE j1 (
  k INT PRIMARY KEY,
  j JSON
)

statement ok
INSERT INTO j1 VALUES (1, '{"a": "b"}')

statement ok
CREATE TABLE j2 (
  k INT PRIMARY KEY,
  j JSON,
  INVERTED INDEX j_idx (j)
)

statement ok
INSERT INTO j2 VALUES (1, '{"a": "b"}')

query T
EXPLAIN (OPT, VERBOSE)
SELECT j1.*, j2.*
FROM j1 LEFT INVERTED JOIN j2@j_idx
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
sort (segmented)
 ├── columns: k:1 j:2 k:5 j:6
 ├── immutable
 ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
 ├── cost: 102714.07
 ├── key: (1,5)
 ├── fd: (1)-->(2), (1,5)-->(6)
 ├── ordering: +1,+5
 ├── prune: (1,5)
 ├── interesting orderings: (+1) (+5)
 └── left-join (lookup j2)
      ├── columns: j1.k:1 j1.j:2 j2.k:5 j2.j:6
      ├── key columns: [5] = [5]
      ├── lookup columns are key
      ├── immutable
      ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      ├── cost: 102684.06
      ├── key: (1,5)
      ├── fd: (1)-->(2), (1,5)-->(6)
      ├── ordering: +1
      ├── prune: (1,5)
      ├── interesting orderings: (+1) (+5)
      ├── left-join (inverted j2@j_idx)
      │    ├── columns: j1.k:1 j1.j:2 j2.k:5 continuation:11
      │    ├── flags: force inverted join (into right side)
      │    ├── inverted-expr
      │    │    └── j2.j:6 @> j1.j:2
      │    ├── stats: [rows=10000, distinct(1)=1000, null(1)=0]
      │    ├── cost: 41984.03
      │    ├── key: (1,5)
      │    ├── fd: (1)-->(2), (5)-->(11)
      │    ├── ordering: +1
      │    ├── scan j1
      │    │    ├── columns: j1.k:1 j1.j:2
      │    │    ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      │    │    ├── cost: 1084.02
      │    │    ├── key: (1)
      │    │    ├── fd: (1)-->(2)
      │    │    ├── ordering: +1
      │    │    ├── prune: (1,2)
      │    │    ├── interesting orderings: (+1)
      │    │    └── unfiltered-cols: (1-4)
      │    └── filters (true)
      └── filters
           ├── j2.j:6 @> j1.j:2 [outer=(2,6), immutable]
           └── j2.j:6 = '"foo"' [outer=(6), immutable, constraints=(/6: [/'"foo"' - /'"foo"']; tight), fd=()-->(6)]

# NOTE: The output should be:
#
#   1  {"a": "b"}  NULL  NULL
#
query ITIT
SELECT j1.*, j2.*
FROM j1 LEFT INVERTED JOIN j2@j_idx
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
1  {"a": "b"}  1  NULL

When I remove the join and index hints, a cross join is performed with a different result:

query T
EXPLAIN (OPT, VERBOSE)
SELECT j1.*, j2.*
FROM j1 LEFT JOIN j2
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
sort
 ├── columns: k:1 j:2 k:5 j:6
 ├── immutable
 ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
 ├── cost: 2521.04647
 ├── key: (1,5)
 ├── fd: (1)-->(2), (1,5)-->(6)
 ├── ordering: +1,+5
 ├── prune: (1,5)
 ├── interesting orderings: (+1) (+5)
 └── left-join (cross)
      ├── columns: j1.k:1 j1.j:2 j2.k:5 j2.j:6
      ├── immutable
      ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      ├── cost: 2290.755
      ├── key: (1,5)
      ├── fd: (1)-->(2), (1,5)-->(6)
      ├── prune: (1,5)
      ├── interesting orderings: (+1) (+5)
      ├── scan j1
      │    ├── columns: j1.k:1 j1.j:2
      │    ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
      │    ├── cost: 1084.02
      │    ├── key: (1)
      │    ├── fd: (1)-->(2)
      │    ├── prune: (1,2)
      │    ├── interesting orderings: (+1)
      │    └── unfiltered-cols: (1-4)
      ├── select
      │    ├── columns: j2.k:5 j2.j:6
      │    ├── immutable
      │    ├── stats: [rows=10, distinct(6)=1, null(6)=0]
      │    ├── cost: 1094.04
      │    ├── key: (5)
      │    ├── fd: ()-->(6)
      │    ├── prune: (5)
      │    ├── interesting orderings: (+5)
      │    ├── scan j2
      │    │    ├── columns: j2.k:5 j2.j:6
      │    │    ├── stats: [rows=1000, distinct(5)=1000, null(5)=0, distinct(6)=100, null(6)=10]
      │    │    ├── cost: 1084.02
      │    │    ├── key: (5)
      │    │    ├── fd: (5)-->(6)
      │    │    ├── prune: (5,6)
      │    │    └── interesting orderings: (+5)
      │    └── filters
      │         └── j2.j:6 = '"foo"' [outer=(6), immutable, constraints=(/6: [/'"foo"' - /'"foo"']; tight), fd=()-->(6)]
      └── filters
           └── j2.j:6 @> j1.j:2 [outer=(2,6), immutable]

query ITIT
SELECT j1.*, j2.*
FROM j1 LEFT JOIN j2
  ON j2.j @> j1.j AND j2.j = '"foo"'
ORDER BY j1.k, j2.k
----
1  {"a": "b"}  NULL  NULL

Environment:

I'm currently seeing this on master@0d6f0ddd.

cc @rytaft @sumeerbhola

The text was updated successfully, but these errors were encountered:

sumeerbhola · 2021-01-13T19:29:10Z

I think there is something broken with the projection taken from the joinReader output.
The output columns listed above for the inverted join and left join are:
columns: j1.k:1 j1.j:2 j2.k:5 continuation:11
columns: j1.k:1 j1.j:2 j2.k:5 j2.j:6

The joinReader produces the union of the left and right side, i.e., it includes the continuation column from the left and includes the key used in lookup from both the left and right side. This is described in the example here https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/execinfrapb/processors_sql.proto#L240-L254
The projection after that is supposed to remove the continuation column and the lookup key included in the left side, since the lookup key from the right side is the one that will indicate whether a match really occurred.

The joinReader in this case is producing
1 (j1.k), <json value> (j1.j), 1 (j2.k), false (continuation:11), null (right side j2.k), null (j2.j)
as promised by the JoinReaderSpec but the projection is using the j2.k from the left side.

There is another related problem: the rowFetcher does not provide a value for the j2.k column in NextRow here

cockroach/pkg/sql/rowexec/joinreader.go

Line 609 in d1702fe

lookedUpRow, _, _, err := jr.fetcher.NextRow(jr.Ctx)

. As expected, there are 2 columns in that row, but the j2.k column is nil (though the rowFetcher does have it, since it needs to compute the key string in the preceding call to PartialKey which will be used in the map lookup). I think this is because the join expects that the j2.k from the left-side suffices, and j2.j is the only column in the neededRightCols which is used to initialize the rowFetcher

cockroach/pkg/sql/rowexec/joinreader.go

Line 279 in d1702fe

neededRightCols := jr.neededRightCols()

I am surprised that this breakage hasn't shown up in tests before. Maybe our logic tests are not testing the case of only false positives for the original row from the first join. The joinReader unit tests do test it, but of course don't cover this projection problem, and specify a PostProcessSpec that includes the PK columns of the primary index being looked up, so the rowFetcher also behaves as expected.

yuzefovich · 2021-01-20T02:33:22Z

I looked into this issue for a bit, and I think the problem is a broken assumption in how unmatched tuples are handled by most joiners (those that embed joinerBase) when they are used as the second in the joiner pair.

In non-paired joiner approach it works as follows: for LEFT/FULL outer joins, if the left tuple doesn't have a match, we create a "combined" row in joinerBase.renderUnmatchedRow by copying the left row as is and putting DNulls into all columns corresponding to the right side.

Now that we have a paired joiner approach, from the perspective of the second lookup joiner, columns j1.k:1 j1.j:2 j2.k:5 continuation:11 came from the left, so they are part of the "left tuple" and are copied as is while only j2.j:6 gets DNull value. However, that is not quite correct because in the first inverted joiner j2.k:5 came from the right side and should be treated as such in the second lookup joiner too.

I'm not sure what the correct fix is. My first thought is why do we project j2.k:5 from the first inverted joiner in the first place? ~~We know that j2.k:5 must equal j1.k:1, so~~ we could omit emitting it from the inverted joiner, then we will have to fetch it from the right side in the second lookup joiner, and it will be treated correctly for the "unmatched tuples handling" purposes. I imagine the changes would need to be made somewhere in the optimizer.

Alternatively, we could maintain a special set of columnIDs coming from the right side in the first inverted joiner so that at the second lookup joiner we know what are the "true" right columns. This seems ugly and annoying though.

mgartner · 2021-01-20T04:29:40Z

I'm not sure what the correct fix is. My first thought is why do we project j2.k:5 from the first inverted joiner in the first place? We know that j2.k:5 must equal j1.k:1.

I don't think this is true, j2.k:5 and j1.k:1 are not equal after the first join.

But I am also suspicious of the outer left lookup join having j2.k:5 as an input column AND a column fetched during the lookup. Producing a new t2.k column with a different column ID as the lookup column in the outer left lookup join might fix the issue, and be safer in general.

sumeerbhola · 2021-01-20T17:03:44Z

I suspect we are saying the same things, but just to be sure: we need j2.k:5 from the first join since it is needed for the second join (for the lookup, since it is the key).

The problem is that I expected that we would subsequently keep the j2 key columns from the right side of the lookup, but we use the ones from the left side. Usually this does not matter since they are equal, except when nothing matched, where the right ones are going to be NULL for LEFT OUTER JOIN, and the left ones may not be. For ANTI JOIN this does not matter since the left ones will be projected away.

It sounds like this may not be a trivial fix in the optimizer so I tried a fix in the join reader itself. Since it knows which columns are the right cols in the left side (they are the lookup columns), it can explicitly set them to NULL for this unmatched case sumeerbhola@888777e
Let me know if this sounds reasonable, and I'll clean it up for review.

yuzefovich · 2021-01-20T18:05:07Z

Oh, I see, thanks. Your prototype fix seems reasonable to me.

Overall, the complexity of the paired joiner approach keeps on increasing, so it's pretty hard to follow what's going on, but I guess we have to accept it for the performance reasons, and the fix doesn't make things much worse complexity-wise.

rytaft · 2021-01-20T19:17:51Z

I think the way to fix this in the optimizer would be to prevent both sides of the lookup join from having the same column IDs. Let me see if I can do that -- it might be cleaner than fixing it on the execution side. But if it's not possible then the approach in the prototype seems like a good alternative.

rytaft · 2021-01-21T15:49:26Z

I've confirmed that using different Column IDs in the optimizer fixes the issue. Although it's a bit involved, I think it's probably still preferable to the changes on the execution side. I'll clean up the code and submit a PR shortly.

…join Prior to this patch, it was possible for a paired join to produce incorrect results for a left inverted join. In particular, some output rows had non-NULL values for right-side columns when the right-side columns should have been NULL. This commit fixes the issue by updating the optimizer to ensure that only columns from the second join in the paired join (the lookup join) are projected, not columns from the first (the inverted join). Fixes cockroachdb#58892 Release note (bug fix): Fixed an issue where a left inverted join could have incorrect results. In particular, some output rows could have non-NULL values for right-side columns when the right-side columns should have been NULL. This issue has only existed in alpha releases of 21.1 so far, and it is now fixed.

59279: opt: fix bug with incorrect results produced by paired left inverted join r=rytaft a=rytaft Prior to this patch, it was possible for a paired join to produce incorrect results for a left inverted join. In particular, some output rows had non-NULL values for right-side columns when the right-side columns should have been NULL. This commit fixes the issue by updating the optimizer to ensure that only columns from the second join in the paired join (the lookup join) are projected, not columns from the first (the inverted join). Fixes #58892 Release note (bug fix): Fixed an issue where a left inverted join could have incorrect results. In particular, some output rows could have non-NULL values for right-side columns when the right-side columns should have been NULL. This issue has only existed in alpha releases of 21.1 so far, and it is now fixed. Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. A-sql-execution Relating to SQL execution. labels Jan 12, 2021

mgartner mentioned this issue Jan 13, 2021

opt: do not pass along non-inverted prefix columns from left inverted join #58163

Closed

yuzefovich assigned sumeerbhola Jan 20, 2021

rytaft mentioned this issue Jan 22, 2021

opt: fix bug with incorrect results produced by paired left inverted join #59279

Merged

craig bot closed this as completed in 7e2d119 Jan 26, 2021

rytaft mentioned this issue Feb 22, 2021

opt,sql: use paired-joins with non-covering indexes for left joins #58261

Merged

rytaft mentioned this issue May 30, 2022

opt: left lookup join pair does not correctly null-extend unmatched rows #81968

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: left inverted join pair produces incorrect results #58892

sql: left inverted join pair produces incorrect results #58892

mgartner commented Jan 12, 2021

sumeerbhola commented Jan 13, 2021

yuzefovich commented Jan 20, 2021 •

edited

Loading

mgartner commented Jan 20, 2021 •

edited

Loading

sumeerbhola commented Jan 20, 2021

yuzefovich commented Jan 20, 2021

rytaft commented Jan 20, 2021

rytaft commented Jan 21, 2021

sql: left inverted join pair produces incorrect results #58892

sql: left inverted join pair produces incorrect results #58892

Comments

mgartner commented Jan 12, 2021

sumeerbhola commented Jan 13, 2021

yuzefovich commented Jan 20, 2021 • edited Loading

mgartner commented Jan 20, 2021 • edited Loading

sumeerbhola commented Jan 20, 2021

yuzefovich commented Jan 20, 2021

rytaft commented Jan 20, 2021

rytaft commented Jan 21, 2021

yuzefovich commented Jan 20, 2021 •

edited

Loading

mgartner commented Jan 20, 2021 •

edited

Loading