-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Support batched nested loop on conjunction clauses #14068
Labels
area/ysql
Yugabyte SQL (YSQL)
kind/new-feature
This is a request for a completely new feature
priority/medium
Medium priority issue
Comments
tanujnay112
added
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
Sep 19, 2022
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Sep 19, 2022
8 tasks
yugabyte-ci
added
kind/new-feature
This is a request for a completely new feature
and removed
kind/bug
This issue is a bug
labels
Sep 19, 2022
tanujnay112
added a commit
that referenced
this issue
Dec 15, 2022
…ScanChoices Summary: This change alters scan_choices and doc_pgsql/ql_scanspec to allow for IN conditions on tuples. They use the data structure in col_group.h to achieve this. The column groups introduced to the ScanChoices class in this change encode which columns should be "grouped" together when processing the request filter. For example, if we have a filter of the form (r0, r2) IN ((4,6), (2,6), (10, 12)) AND r1 <= 24 where the primary key of the table in question is (r0 asc, r1 asc, r2 asc), the newly constructed groups would be {0,2} and {1}. This change is meant as a lead up to the introduction of batched nested loop joins on conjunctive join clauses. Test Plan: ./yb_build.sh --cxx-test scan_choices-test Reviewers: rskannan, kpopali, mtakahara Reviewed By: kpopali, mtakahara Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D21581
tanujnay112
added a commit
that referenced
this issue
Jan 2, 2023
Summary: This change allows for batching on nested loop join clauses that are conjunctions. A join clause of the form `inner_var1 = f1(outer_vars) AND inner_var2 = f2(outer_vars)` is converted into `ROW(inner_var1, inner_var2) IN (ROW(f1(outer_vars_$1), f2(outer_vars_$1)), ROW(f1(outer_vars_$2), f2(outer_vars_$2)), ROW(f1(outer_vars_$3), f2(outer_vars_$3))...)` when batching. Before this change DocDB and PgGate could not handle such IN conditions on RowExpressions. This change adds those necessary changes to to PgGate and the DocDB iterator as well. Summary of changes: - indxpath.c has been edited to allow batching conjunction clauses as above. - A mechanism to take a bunch of individually batched clauses (inner_var1 = BatchedExpr(f1(outer_vars)), inner_var2 = BatchedExpr(f2(outer_vars))) and "zip them up" to (ROW(inner_var1, inner_var2) = BatchedExpr(ROW(f1(outer_vars), f2(outer_vars))) has been added to restrictinfo.c. - nodeindexscan.c has had its ScanKey building mechanism augmented to allow for ScalarArrayOperations on RowExpressions. - yb_scan.c has been edited in the way it processes RowExpressions. Previously, all the ScanKeys of a `Scan` object was isomorphically translated into the ScanKey array of the `YbScanDesc` object in `ybcBeginScan`. This was the case for RowExpression ScanKeys as well. Their subsidiary ScanKeys in the sk_argument array were not flattened into the ScanKey array of the `YbScanDesc`. This would cause a lot of unnecessary casing in order to process Row keys. This has been alleviated. This file also binds ScalaryArrayOperations on Row expressions via the `ybcBindTupleExprCondIn` method which is the Row counterpart for `ybcBindColumnCondIn`. - Row expressions are bound as tuples in Pggate. Changes have been made throughout pggate to account for this notion. - pg_doc_op.cc has been altered to be able to deal with bound tuples that have a mix of hash key and range key columns inside them. - doc_rowwise_iterator and doc_pgsql_scanspec have been altered to allow for IN conditions on tuples. They use the data structure in col_group.h to achieve this. Further details of these changes are in the addendum at the end of the following doc: https://docs.google.com/document/d/1kKtFEYtT0xIlvuXKD18wgWuqkiIGan8dKl0M9KoFI_Y/edit?usp=sharing Test Plan: ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressJoin' Reviewers: mtakahara, amartsinchyk Reviewed By: amartsinchyk Subscribers: dmitry, yql, smishra Differential Revision: https://phabricator.dev.yugabyte.com/D20059
@tanujnay112 Can this get backported to 2.16? |
1 task
jasonyb
pushed a commit
that referenced
this issue
Jul 24, 2023
Summary: Commit 12e23ad, titled [#14068] YSQL: Allow Batched Nested Loop joins for conjunction clauses introduces bugs for lsm IndexScan/IndexOnlyScan when, during execution, scan keys are transferred to YbScanDesc. The main reason is that it flattens subkeys but doesn't modify bounds checks accordingly. This can lead to segfaults or wrong results. 1. yb_hash_code and regular keys: both types of keys share the same array, one writing forward in the array and the other writing backwards, with a calculation to make sure they don't overlap. Flattening causes more keys to be written than anticipated, so regular keys go beyond the calculated midpoint into the same area that yb_hash_code keys are written (or even further beyond that). Fix by separating yb_hash_code keys into a separate List to avoid having to deal with calculations and sharing space for two separate collections of keys. 1. Overall array bounds check: YbScanDesc->keys is a fixed length array, and the logic to prevent writing outside the array boundaries doesn't account for flattening, resulting in potential buffer overflow. Fix the check. Add tests to exercise these cases and more. Uncover more bugs #18347 and #18360 in the process. Jira: DB-7311 Test Plan: ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressDml Backport-through: 2.18 Close: #18322 Reviewers: dmitry, tnayak Reviewed By: tnayak Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D27169
jasonyb
pushed a commit
that referenced
this issue
Jul 31, 2023
Summary: Commit 12e23ad, titled [#14068] YSQL: Allow Batched Nested Loop joins for conjunction clauses introduces bugs for lsm IndexScan/IndexOnlyScan when, during execution, scan keys are transferred to YbScanDesc. The main reason is that it flattens subkeys but doesn't modify bounds checks accordingly. This can lead to segfaults or wrong results. 1. yb_hash_code and regular keys: both types of keys share the same array, one writing forward in the array and the other writing backwards, with a calculation to make sure they don't overlap. Flattening causes more keys to be written than anticipated, so regular keys go beyond the calculated midpoint into the same area that yb_hash_code keys are written (or even further beyond that). Fix by separating yb_hash_code keys into a separate List to avoid having to deal with calculations and sharing space for two separate collections of keys. 1. Overall array bounds check: YbScanDesc->keys is a fixed length array, and the logic to prevent writing outside the array boundaries doesn't account for flattening, resulting in potential buffer overflow. Fix the check. Add tests to exercise these cases and more. Uncover more bugs #18347 and #18360 in the process. Jira: DB-7311 Test Plan: ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressDml Backport-through: 2.18 Original commit: bfe385d / D27169 Reviewers: tnayak Reviewed By: tnayak Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D27415
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/ysql
Yugabyte SQL (YSQL)
kind/new-feature
This is a request for a completely new feature
priority/medium
Medium priority issue
Jira Link: DB-3562
Description
We currently don't support batched nested loop joins on any clauses that are conjunctions of simple clauses such as
inner_table.c1 = outer_table.c1 AND inner_table.c2 = outer_table.c2
. In order to do this we need to also support inner table requests of the form(inner_table.c1, inner_table.c2) IN ((outer_table.c1_1, outer_table.c2_1),(outer_table.c1_2, outer_table.c2_2), (outer_table.c1_3, outer_table.c2_3)....)
.The text was updated successfully, but these errors were encountered: