Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: left lookup join pair does not correctly null-extend unmatched rows #81968

Closed
50U10FCA7 opened this issue May 27, 2022 · 5 comments · Fixed by #82054
Closed

opt: left lookup join pair does not correctly null-extend unmatched rows #81968

50U10FCA7 opened this issue May 27, 2022 · 5 comments · Fixed by #82054
Assignees
Labels
branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory O-community Originated from the community release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. T-sql-queries SQL Queries Team

Comments

@50U10FCA7
Copy link

50U10FCA7 commented May 27, 2022

Describe the problem

Found a problem with subqueries which is using columns of the main query table.

To Reproduce

MRE:

CREATE TABLE items (
    id        INT NOT NULL PRIMARY KEY,
    chat_id   INT NOT NULL,
    author_id INT NOT NULL,
    INDEX chat_id_idx (chat_id) -- without this index everything works fine
);

CREATE TABLE views (
    chat_id INT NOT NULL,
    user_id INT NOT NULL,
    PRIMARY KEY (chat_id, user_id)
);

-- create view of user(1) to chat(1)
INSERT INTO views(chat_id, user_id) VALUES (1, 1);

-- user(1) inserted items to chat(1)
INSERT INTO items(id, chat_id, author_id) VALUES
(1, 1, 1),
(2, 1, 1),
(3, 1, 1);

-- following request should return 0 records, because there is no items related to any other user.
-- but result is 1
SELECT (SELECT COUNT(items.id)
        FROM items
        WHERE items.chat_id = views.chat_id
          AND items.author_id != views.user_id)
FROM views
WHERE chat_id = 1
  AND user_id = 1;

-- flatten version of the same query works fine
SELECT COUNT(id)
FROM items
WHERE chat_id = 1
  AND author_id != 1;

Expected behavior

Select returns 0.

Additional data / screenshots

cockroach sql session log:

sh-4.4# cockroach sql --insecure
#
# Welcome to the CockroachDB SQL shell.
# All statements must be terminated by a semicolon.
# To exit, type: \q.
#
# Server version: CockroachDB CCL v22.1.0 (x86_64-pc-linux-gnu, built 2022/05/23 16:27:47, go1.17.6) (same version as client)
# Cluster ID: ddfc36f6-1cca-48ad-b1b2-da3f90493b5e
#
# Enter \? for a brief introduction.
#
root@:26257/defaultdb> CREATE TABLE items (id INT NOT NULL PRIMARY KEY, chat_id INT NOT NULL, author_id INT NOT NULL, INDEX chat_id_idx (chat_id));                                                                                                                                                              CREATE TABLE views (chat_id INT NOT NULL, user_id INT NOT NULL, PRIMARY KEY (chat_id, user_id));
CREATE TABLE


Time: 57ms total (execution 55ms / network 2ms)

CREATE TABLE


Time: 16ms total (execution 16ms / network 0ms)

root@:26257/defaultdb> INSERT INTO views(chat_id, user_id) VALUES (1, 1);
INSERT 1


Time: 75ms total (execution 74ms / network 1ms)

root@:26257/defaultdb> INSERT INTO items(id, chat_id, author_id) VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1);
INSERT 3


Time: 47ms total (execution 46ms / network 1ms)

root@:26257/defaultdb> SELECT (SELECT COUNT(items.id) FROM items WHERE items.chat_id = views.chat_id AND items.author_id != views.user_id) FROM views WHERE chat_id = 1 AND user_id = 1;
  count
---------
      1
(1 row)


Time: 55ms total (execution 52ms / network 3ms)

root@:26257/defaultdb> SELECT COUNT(id) FROM items WHERE chat_id = 1 AND author_id != 1;
  count
---------
      0
(1 row)


Time: 23ms total (execution 22ms / network 1ms)

Log of the same execution but without chat_id_idx index:

root@:26257/defaultdb> CREATE TABLE items (id INT NOT NULL PRIMARY KEY, chat_id   INT NOT NULL, author_id INT NOT NULL);
CREATE TABLE


Time: 53ms total (execution 52ms / network 1ms)

root@:26257/defaultdb> CREATE TABLE views (chat_id INT NOT NULL, user_id INT NOT NULL, PRIMARY KEY (chat_id, user_id));
CREATE TABLE


Time: 43ms total (execution 42ms / network 1ms)

root@:26257/defaultdb> INSERT INTO views(chat_id, user_id) VALUES (1, 1);
INSERT 1


Time: 47ms total (execution 46ms / network 1ms)

root@:26257/defaultdb> INSERT INTO items(id, chat_id, author_id) VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1);
INSERT 3


Time: 43ms total (execution 42ms / network 1ms)

root@:26257/defaultdb> SELECT (SELECT COUNT(items.id) FROM items WHERE items.chat_id = views.chat_id AND items.author_id != views.user_id) FROM views WHERE chat_id = 1 AND user_id = 1;
  count
---------
      0
(1 row)


Time: 62ms total (execution 61ms / network 1ms)

Environment:

  • CockroachDB version [v22.1.0]
  • Server OS: [cockroachdb/cockroach docker image]
  • Client app [cockroach sql]

Additional context

The problem is not reproducible on the previous versions of CockroachDB (latest-v21.2 image tag), so I think some bug in the last update.

Jira issue: CRDB-16146

@50U10FCA7 50U10FCA7 added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 27, 2022
@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels May 27, 2022
@yuzefovich yuzefovich removed the X-blathers-untriaged blathers was unable to find an owner label May 27, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label May 27, 2022
@yuzefovich
Copy link
Member

Thanks for the report and the detailed repro steps! I think #58261 is to blame (which was merged in 22.1 time frame).

@cockroachdb cockroachdb deleted a comment from blathers-crl bot May 27, 2022
@yuzefovich
Copy link
Member

yuzefovich commented May 27, 2022

Here is the plan:

  • render
  │ columns: (count)
  │ estimated row count: 1 (missing stats)
  │ render count: count
  │
  └── • group (streaming)
      │ columns: (count)
      │ estimated row count: 1 (missing stats)
      │ aggregate 0: count(chat_id)
      │
      └── • project
          │ columns: (chat_id)
          │
          └── • project
              │ columns: (chat_id, user_id, chat_id, author_id)
              │ estimated row count: 3 (missing stats)
              │
              └── • lookup join (left outer)
                  │ columns: (chat_id, user_id, id, chat_id, cont, author_id)
                  │ table: items@items_pkey
                  │ equality: (id) = (id)
                  │ equality cols are key
                  │ pred: author_id != user_id
                  │
                  └── • lookup join (left outer)
                      │ columns: (chat_id, user_id, id, chat_id, cont)
                      │ estimated row count: 9 (missing stats)
                      │ table: items@chat_id_idx
                      │ equality: (chat_id) = (chat_id)
                      │
                      └── • scan
                            columns: (chat_id, user_id)
                            estimated row count: 1 (missing stats)
                            table: views@views_pkey
                            spans: /1/1/0

I believe what's happening is the following:

  • we use a paired joiner to execute LEFT OUTER lookup join on items table
  • we read a single row (1, 1) from the views table
  • then the first lookup join in the pair returns (1, 1, 1, 1, false), (1, 1, 2, 1, true), (1, 1, 3, 1, true) (the fifth value is the "continuation" column)
  • the second lookup join (prior to evaluation ON condition) also fetches three rows (1, 1, 1, 1, false, 1), (1, 1, 2, 1, true, 1), (1, 1, 3, 1, true, 1)
  • however, all three rows fail the ON condition
  • yet, since it is a LEFT OUTER join, we take the last input row to the second joiner (1, 1, 3, 1, true), add a NULL value for author_id column, and return (1, 1, 3, 1, true, NULL).

This is where the bug is - we should have returned (1, 1, NULL, NULL, NULL, NULL). The problem is that the second joiner should have taken the input row to the first joiner for the unmatched output row, without all columns added by the first joiner.

@msirek msirek self-assigned this May 27, 2022
@rytaft rytaft added S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 labels May 29, 2022
@rytaft rytaft changed the title Invalid subqueries filtering opt: left lookup join pair does not correctly null-extend unmatched rows May 30, 2022
@rytaft
Copy link
Collaborator

rytaft commented May 30, 2022

Thank you for the report, @50U10FCA7, and for the initial investigation, @yuzefovich.

It looks like this is the same situation as we were seeing in #58892 for inverted joins. Since #58892 was already fixed, we see correct results if the example from above is tweaked slightly to use an inverted join:

CREATE TABLE items (
    id        INT NOT NULL PRIMARY KEY,
    chat_id   INT NOT NULL,
    author_id INT NOT NULL,
    j JSON,
    INVERTED INDEX chat_id_j_idx (chat_id, j) -- without this index everything works fine
);
CREATE TABLE views (
    chat_id INT NOT NULL,
    user_id INT NOT NULL,
    j JSON,
    PRIMARY KEY (chat_id, user_id)
);
INSERT INTO views(chat_id, user_id, j) VALUES (1, 1, '["a"]');
INSERT INTO items(id, chat_id, author_id, j) VALUES
(1, 1, 1, '["a"]'),
(2, 1, 1, '["a"]'),
(3, 1, 1, '["a"]');

SELECT (SELECT COUNT(items.id)
        FROM items
        WHERE items.chat_id = views.chat_id
          AND items.j @> views.j
          AND items.author_id != views.user_id)
FROM views
WHERE chat_id = 1
  AND user_id = 1;
  count
---------
      0
(1 row)

EXPLAIN (opt) SELECT (SELECT COUNT(items.id)
        FROM items
        WHERE items.chat_id = views.chat_id
          AND items.j @> views.j
          AND items.author_id != views.user_id)
FROM views
WHERE chat_id = 1
  AND user_id = 1;
                                         info
--------------------------------------------------------------------------------------
  project
   ├── group-by (streaming)
   │    ├── left-join (lookup items)
   │    │    ├── lookup columns are key
   │    │    ├── second join in paired joiner
   │    │    ├── left-join (inverted items@chat_id_j_idx)
   │    │    │    ├── first join in paired joiner; continuation column: continuation
   │    │    │    ├── inverted-expr
   │    │    │    │    └── items.j @> views.j
   │    │    │    ├── scan views
   │    │    │    │    └── constraint: /1/2: [/1/1 - /1/1]
   │    │    │    └── filters (true)
   │    │    └── filters
   │    │         ├── items.j @> views.j
   │    │         └── author_id != user_id
   │    └── aggregations
   │         └── count
   │              └── items.chat_id
   └── projections
        └── count

The fix for this issue will be the same as the fix for #58892, but applied to GenerateLookupJoins. Basically, we need to ensure that the index used for the first join in the paired left join has different column IDs than the index used for the second join.

I will submit a PR to fix this shortly.

@rytaft rytaft assigned rytaft and unassigned msirek May 30, 2022
rytaft added a commit to rytaft/cockroach that referenced this issue May 30, 2022
Prior to this patch, it was possible for a paired join to produce incorrect
results for a left lookup join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the index join)
are projected, not columns from the first (the lookup join).

Fixes cockroachdb#81968

Release note (bug fix): Fixed an issue where a left lookup join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue only exists in 22.1.0 and prior development
releases of 22.1.
craig bot pushed a commit that referenced this issue May 30, 2022
82054: opt: fix bug with incorrect results produced by paired left lookup join r=rytaft a=rytaft

Prior to this patch, it was possible for a paired join to produce incorrect
results for a left lookup join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the index join)
are projected, not columns from the first (the lookup join).

Fixes #81968

Release note (bug fix): Fixed an issue where a left lookup join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue only exists in 22.1.0 and prior development
releases of 22.1.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in 0cb0dca May 30, 2022
rytaft added a commit to rytaft/cockroach that referenced this issue May 30, 2022
Prior to this patch, it was possible for a paired join to produce incorrect
results for a left lookup join. In particular, some output rows had
non-NULL values for right-side columns when the right-side columns should
have been NULL.

This commit fixes the issue by updating the optimizer to ensure that
only columns from the second join in the paired join (the index join)
are projected, not columns from the first (the lookup join).

Fixes cockroachdb#81968

Release note (bug fix): Fixed an issue where a left lookup join could
have incorrect results. In particular, some output rows could have non-NULL
values for right-side columns when the right-side columns should
have been NULL. This issue only exists in 22.1.0 and prior development
releases of 22.1.
@rytaft
Copy link
Collaborator

rytaft commented Jun 1, 2022

The fix for this issue will be available as part of v22.1.1, which is scheduled to be released next week. In the meantime, users who think they may be affected by this issue should take the following steps:

To determine whether your queries may be affected by this issue on v22.1.0, you should examine the query plans of any queries with left joins or correlated subqueries by using EXPLAIN (opt). If the resulting plan shows two nested left lookup joins as part of a “paired joiner”, these queries may produce incorrect results. The EXPLAIN (opt) output for affected plans will look something like this:

   └── left-join (lookup table2 [as=t2])
        ├── lookup columns are key
        ├── second join in paired joiner
        ├── left-join (lookup table2@table2_idx [as=t2])
        │    ├── first join in paired joiner; continuation column: continuation

To mitigate this problem for affected queries on v22.1.0, ensure that tables used by queries with a left join or correlated subquery have only covering secondary indexes. That is, any secondary indexes on these tables should include (either as a key column or a STORING column) all columns used in the ON or WHERE conditions.

Alternatively, non-covering indexes can be dropped.

@rytaft
Copy link
Collaborator

rytaft commented Jun 22, 2022

@50U10FCA7 Thank you so much for finding and filing this issue, we really appreciate it. We're always looking to improve our testing coverage, so we'd love to understand your use cases and workload. If you are willing to share, would you mind reaching out to support@cockroachlabs.com (please mention this GitHub issue in the email)? Thank you!

@rytaft rytaft added the C-technical-advisory Caused a technical advisory label Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory O-community Originated from the community release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants