Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Batching queries that use index lookup of elements inside an ANY array which uses only hash key component #7836

Closed
ramsrivatsa opened this issue Mar 28, 2021 · 1 comment
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@ramsrivatsa
Copy link
Contributor

ramsrivatsa commented Mar 28, 2021

Jira Link: DB-3200

CREATE TABLE t1 (id int primary key, val int);
INSERT INTO t1 SELECT i, i FROM (SELECT generate_series(1, 100) i) t;
set yb_debug_log_docdb_requests=true;
SELECT * FROM t1 where t1.id=ANY(ARRAY[10,20,30,40,50]);

If the following set of queries are executed 5 docDB requests are being sent. These requests can be batched into 1 docDB request.

Note: docDB requests can be viewed in this file ~/yugabyte-data/node-1/disk-1/yb-data/tserver/logs/postgresql*.log

@ramsrivatsa ramsrivatsa self-assigned this Mar 28, 2021
@ramsrivatsa
Copy link
Contributor Author

ramsrivatsa commented Mar 28, 2021

@ramsrivatsa ramsrivatsa added the kind/enhancement This is an enhancement of an existing feature label Mar 28, 2021
@ramsrivatsa ramsrivatsa changed the title [YSQL] Batching queries that use index lookup of elements inside an ANY array [YSQL] Batching queries that use index lookup of elements inside an ANY array which uses only hash key component Mar 29, 2021
@sushantrmishra sushantrmishra added the area/ysql Yugabyte SQL (YSQL) label Aug 15, 2022
@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label Aug 15, 2022
tanujnay112 added a commit that referenced this issue Mar 17, 2023
Summary:
Before this change, IN conditions bound to hash key columns produce one request per possible values of the hash keys. For example, consider a query `SELECT * FROM sample_table WHERE h1 IN (1,4,6,8);` where sample_table has a primary index with `h1` as its full hash component. We send 4 requests, one per each in element, of the form `SELECT * FROM sample_table WHERE h1 = 1;`, `SELECT * FROM sample_table WHERE h1 = 4;` etc. If the IN condition was bound to a range column, we would send the entire filter at once as a singular condition and send just one request per partition of `sample_table_pkey`. The reason why we couldn't do this with hash column IN filters was because the `DocRowwiseIterator` could not perform skip scans over hash columns so it did not have the necessary infrastructure to process IN conditions on hash columns.

This diff fixes the above issue by having IN conditions on hash columns behave similar to those on range columns. In order to do this, we did the following changes:
  - We adjusted pgsql/qlscanspec and ScanChoices to be able to carry out skip scans on hash column IN conditions.
  - We added infrastructure in pg_doc_op.h to convert IN filters of the form `h1 IN (v1,v2,...,vn)` a condition expression of the form `(yb_hash_code(h1), h1) IN ((yb_hash_code(v1), v1), (yb_hash_code(v2), v2), (yb_hash_code(v3), v3), ..., (yb_hash_code(vn), vn))`. If we have multiple hash partitions on the table we form one request per partition and the RHS of the hash condition on each partition request is ensured to only have values from (v1,v2,...vn) that are relevant to it. This feature also works similarly for multicolumn hash keys.

This feature is disabled when serializable isolation level is used for now as there isn't infrastructure to lock multiple non-contiguous rows as such filters would require.

This feature's enablement is controlled by the autoflag GUC `yb_enable_hash_batch_in`.
We also added a tserver flag `ysql_hash_batch_permutation_limit` that specifies a limit on the number of hash permutations a query must produce in order to be eligible to use this feature. Without this check, we can materialize an unbounded number of hash permutations in memory and cause an OOM crash.

Test Plan:
```
./yb_build.sh release --java-test org.yb.pgsql.TestPgRegressIndex
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressHashInQueries'
```

Reviewers: smishra, neil, amartsinchyk, kpopali

Reviewed By: kpopali

Subscribers: mbautin, kpopali, kannan, ssong, yql, mihnea, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D19672
@tanujnay112 tanujnay112 self-assigned this Mar 27, 2023
@github-project-automation github-project-automation bot moved this to Done in YQL-beta Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
Status: Done
Development

No branches or pull requests

5 participants