Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Support batched nested loop on conjunction clauses #14068

Closed
tanujnay112 opened this issue Sep 19, 2022 · 1 comment
Closed

[YSQL] Support batched nested loop on conjunction clauses #14068

tanujnay112 opened this issue Sep 19, 2022 · 1 comment
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue

Comments

@tanujnay112
Copy link
Contributor

tanujnay112 commented Sep 19, 2022

Jira Link: DB-3562

Description

We currently don't support batched nested loop joins on any clauses that are conjunctions of simple clauses such as inner_table.c1 = outer_table.c1 AND inner_table.c2 = outer_table.c2. In order to do this we need to also support inner table requests of the form (inner_table.c1, inner_table.c2) IN ((outer_table.c1_1, outer_table.c2_1),(outer_table.c1_2, outer_table.c2_2), (outer_table.c1_3, outer_table.c2_3)....).

@tanujnay112 tanujnay112 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Sep 19, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Sep 19, 2022
@tanujnay112 tanujnay112 self-assigned this Sep 19, 2022
@yugabyte-ci yugabyte-ci added kind/new-feature This is a request for a completely new feature and removed kind/bug This issue is a bug labels Sep 19, 2022
tanujnay112 added a commit that referenced this issue Dec 15, 2022
…ScanChoices

Summary:
This change alters scan_choices and doc_pgsql/ql_scanspec to allow for IN conditions on tuples. They use the data structure in col_group.h to achieve this. The column groups introduced to the ScanChoices class in this change encode which columns should be "grouped" together when processing the request filter. For example, if we have a filter of the form (r0, r2) IN ((4,6), (2,6), (10, 12)) AND r1 <= 24 where the primary key of the table in question is (r0 asc, r1 asc, r2 asc), the newly constructed groups would be {0,2} and {1}.

This change is meant as a lead up to the introduction of batched nested loop joins on conjunctive join clauses.

Test Plan: ./yb_build.sh --cxx-test scan_choices-test

Reviewers: rskannan, kpopali, mtakahara

Reviewed By: kpopali, mtakahara

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D21581
tanujnay112 added a commit that referenced this issue Jan 2, 2023
Summary:
This change allows for batching on nested loop join clauses that are conjunctions. A join clause of the form
`inner_var1 = f1(outer_vars) AND inner_var2 = f2(outer_vars)` is converted into `ROW(inner_var1, inner_var2) IN (ROW(f1(outer_vars_$1),  f2(outer_vars_$1)), ROW(f1(outer_vars_$2),  f2(outer_vars_$2)), ROW(f1(outer_vars_$3), f2(outer_vars_$3))...)` when batching. Before this change DocDB and PgGate could not handle such IN conditions on RowExpressions. This change adds those necessary changes to to PgGate and the DocDB iterator as well.

Summary of changes:
- indxpath.c has been edited to allow batching conjunction clauses as above.
- A mechanism to take a bunch of individually batched clauses (inner_var1 = BatchedExpr(f1(outer_vars)), inner_var2 = BatchedExpr(f2(outer_vars))) and "zip them up" to (ROW(inner_var1, inner_var2) = BatchedExpr(ROW(f1(outer_vars), f2(outer_vars))) has been added to restrictinfo.c.
- nodeindexscan.c has had its ScanKey building mechanism augmented to allow for ScalarArrayOperations on RowExpressions.
- yb_scan.c has been edited in the way it processes RowExpressions. Previously, all the ScanKeys of a `Scan` object was isomorphically translated into the ScanKey array of the `YbScanDesc` object in `ybcBeginScan`. This was the case for RowExpression ScanKeys as well. Their subsidiary ScanKeys in the sk_argument array were not flattened into the ScanKey array of the `YbScanDesc`. This would cause a lot of unnecessary casing in order to process Row keys. This has been alleviated. This file also binds ScalaryArrayOperations on Row expressions via the `ybcBindTupleExprCondIn` method which is the Row counterpart for `ybcBindColumnCondIn`.
- Row expressions are bound as tuples in Pggate. Changes have been made throughout pggate to account for this notion.
- pg_doc_op.cc has been altered to be able to deal with bound tuples that have a mix of hash key and range key columns inside them.
- doc_rowwise_iterator and doc_pgsql_scanspec have been altered to allow for IN conditions on tuples. They use the data structure in col_group.h to achieve this. Further details of these changes are in the addendum at the end of the following doc:

https://docs.google.com/document/d/1kKtFEYtT0xIlvuXKD18wgWuqkiIGan8dKl0M9KoFI_Y/edit?usp=sharing

Test Plan: ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressJoin'

Reviewers: mtakahara, amartsinchyk

Reviewed By: amartsinchyk

Subscribers: dmitry, yql, smishra

Differential Revision: https://phabricator.dev.yugabyte.com/D20059
@jameshartig
Copy link
Contributor

@tanujnay112 Can this get backported to 2.16?

@rthallamko3 rthallamko3 removed the status/awaiting-triage Issue awaiting triage label Feb 28, 2023
jasonyb pushed a commit that referenced this issue Jul 24, 2023
Summary:
Commit 12e23ad, titled

    [#14068] YSQL: Allow Batched Nested Loop joins for conjunction clauses

introduces bugs for lsm IndexScan/IndexOnlyScan when, during
execution, scan keys are transferred to YbScanDesc.  The main reason is
that it flattens subkeys but doesn't modify bounds checks accordingly.
This can lead to segfaults or wrong results.

1. yb_hash_code and regular keys: both types of keys share the same
   array, one writing forward in the array and the other writing
   backwards, with a calculation to make sure they don't overlap.
   Flattening causes more keys to be written than anticipated, so
   regular keys go beyond the calculated midpoint into the same area
   that yb_hash_code keys are written (or even further beyond that).
   Fix by separating yb_hash_code keys into a separate List to avoid
   having to deal with calculations and sharing space for two separate
   collections of keys.
1. Overall array bounds check: YbScanDesc->keys is a fixed length array,
   and the logic to prevent writing outside the array boundaries doesn't
   account for flattening, resulting in potential buffer overflow.  Fix
   the check.

Add tests to exercise these cases and more.  Uncover more bugs #18347
and #18360 in the process.
Jira: DB-7311

Test Plan:
    ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressDml

Backport-through: 2.18
Close: #18322

Reviewers: dmitry, tnayak

Reviewed By: tnayak

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D27169
jasonyb pushed a commit that referenced this issue Jul 31, 2023
Summary:
Commit 12e23ad, titled

    [#14068] YSQL: Allow Batched Nested Loop joins for conjunction clauses

introduces bugs for lsm IndexScan/IndexOnlyScan when, during
execution, scan keys are transferred to YbScanDesc.  The main reason is
that it flattens subkeys but doesn't modify bounds checks accordingly.
This can lead to segfaults or wrong results.

1. yb_hash_code and regular keys: both types of keys share the same
   array, one writing forward in the array and the other writing
   backwards, with a calculation to make sure they don't overlap.
   Flattening causes more keys to be written than anticipated, so
   regular keys go beyond the calculated midpoint into the same area
   that yb_hash_code keys are written (or even further beyond that).
   Fix by separating yb_hash_code keys into a separate List to avoid
   having to deal with calculations and sharing space for two separate
   collections of keys.
1. Overall array bounds check: YbScanDesc->keys is a fixed length array,
   and the logic to prevent writing outside the array boundaries doesn't
   account for flattening, resulting in potential buffer overflow.  Fix
   the check.

Add tests to exercise these cases and more.  Uncover more bugs #18347
and #18360 in the process.
Jira: DB-7311

Test Plan:
    ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressDml

Backport-through: 2.18
Original commit: bfe385d / D27169

Reviewers: tnayak

Reviewed By: tnayak

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D27415
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

4 participants