[YSQL] Index cost estimates should account for index and scan type #4494

m-iancu · 2020-05-18T23:07:10Z

Among other things, when evaluating the per-row and index cost we should consider:

- Index uniqueness (e.g. a fully specified key guarantees single-row results for unique indexes, but not for non-unique indexes).
- Included columns (i.e. Index Scan vs Index Only Scan). An Index Scan requires an additional read from the main table to retrieve the needed columns (these reads are batched, so the cost is somewhat amortized).
- Scan direction -- in DocDB, reverse scans are (somewhat) slower than forward scans, so we should prefer forward scans if all else is equal.
- Partial indexes -- the partial index predicate means the index has fewer rows than the table (how many fewer depends on how restrictive the predicate is). The cost estimate should account for the predicate filter in addition to the index condition filter.
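
To make the four cases above concrete, here is a minimal sketch of the kind of schema and queries each one applies to. The `devices` table, its columns, and all index names are hypothetical and are not part of the original report. The `ASC` range-sharded index in case 3 assumes a YSQL version that supports it. No plans are shown because the plan actually chosen depends on the version and the data.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE devices(
    device_id   int PRIMARY KEY,
    device_name text,
    supplier_id int,
    active      boolean
);

-- 1. Index uniqueness: with a unique index, an equality condition on the
--    full key can match at most one row; a non-unique index gives no such bound.
CREATE UNIQUE INDEX devices_name_uidx ON devices(device_name);
EXPLAIN SELECT * FROM devices WHERE device_name = 'a';

-- 2. Included columns: if the index carries every column the query needs,
--    an Index Only Scan can avoid the extra read from the main table.
CREATE INDEX devices_supplier_incl_idx ON devices(supplier_id) INCLUDE (device_name);
EXPLAIN SELECT device_name FROM devices WHERE supplier_id = 1;

-- 3. Scan direction: the same ordered index can serve both orderings,
--    one as a forward scan and the other as a reverse scan.
CREATE INDEX devices_supplier_asc_idx ON devices(supplier_id ASC);
EXPLAIN SELECT supplier_id FROM devices ORDER BY supplier_id ASC  LIMIT 10;
EXPLAIN SELECT supplier_id FROM devices ORDER BY supplier_id DESC LIMIT 10;

-- 4. Partial indexes: the index only holds rows satisfying its predicate,
--    so it can be much smaller than the table.
CREATE INDEX devices_active_supplier_idx ON devices(supplier_id) WHERE active;
EXPLAIN SELECT * FROM devices WHERE supplier_id = 1 AND active;
```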

Example:

CREATE TABLE test(device_id int primary key, device_name text UNIQUE, supplier_id int);
CREATE INDEX ON test(supplier_id);

INSERT INTO test (SELECT generate_series, generate_series::text, 1 FROM generate_series(1, 100000));

EXPLAIN ANALYZE SELECT * FROM test WHERE device_name = '1' AND supplier_id = 1;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_supplier_id_idx on test  (cost=0.00..4.12 rows=1 width=40) (actual time=82.820..842.473 rows=1 loops=1)
   Index Cond: (supplier_id = 1)
   Filter: (device_name = '1'::text)
   Rows Removed by Filter: 99999
 Planning Time: 0.116 ms
 Execution Time: 842.639 ms
(6 rows)

The query takes ~850 ms, even though it could be executed as a single-key read with the right index (the unique index on device_name, not the one on supplier_id).
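
One way to check that the index choice, rather than the lookup itself, accounts for the latency is to temporarily remove the low-cardinality index and re-run the query. This is only a diagnostic sketch, not something from the original report. The index name is taken from the plan above, and the resulting plan and timing will vary by version and cluster.

```sql
-- Remove the low-cardinality index so the planner cannot choose it.
DROP INDEX test_supplier_id_idx;

-- Re-run the same query and compare the chosen index and execution time.
EXPLAIN ANALYZE SELECT * FROM test WHERE device_name = '1' AND supplier_id = 1;

-- Restore the index afterwards.
CREATE INDEX ON test(supplier_id);
```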

Summary:
Improve index cost estimates by considering the following when computing the costs:

1. Index uniqueness -- For instance, a fully specified key guarantees single-row results for unique indexes, but not for non-unique indexes, so unique indexes now have a lower cost in such cases.
2. Included columns (i.e. `Index Scan` vs `Index Only Scan`) -- Index scans require an additional read from the main table to retrieve the needed columns (these are batched, so costs are somewhat amortized), so index-only scans are preferred (have lower cost).
3. Scan direction -- In DocDB, backward (reverse) scans are (somewhat) slower than forward scans, so we prefer forward scans if all else is equal.
4. Partial indexes -- The partial index predicate means the index has fewer rows than the table (how many fewer depends on how restrictive the predicate is). The cost estimate now accounts for the predicate filter in addition to the index condition filter.

Additionally, disable merge joins for cases where they are not supported because `ammarkpos` and `amrestrpos` are not yet implemented by the `ybc` access method. Choosing the unsupported index would have led to a runtime error during execution (#4496).

Example:
For a pathological case with an inefficient (low-cardinality) index:

```
CREATE TABLE test(device_id int primary key, device_name text UNIQUE, supplier_id int);
CREATE INDEX ON test(supplier_id);

INSERT INTO test (SELECT generate_series, generate_series::text, 1 FROM generate_series(1, 100000));

EXPLAIN ANALYZE SELECT * FROM test WHERE device_name = '1' AND supplier_id = 1;
```

```
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_device_name_key on test  (cost=0.00..4.13 rows=1 width=40) (actual time=1.271..1.273 rows=1 loops=1)
   Index Cond: (device_name = '1'::text)
   Filter: (supplier_id = 1)
 Planning Time: 0.072 ms
 Execution Time: 1.302 ms
(5 rows)
```

Query now takes `1.302 ms` (down from `~850ms`).

Test Plan: TestPgRegressPlanner (yb_planner_indexes, yb_planner_joins)

Reviewers: neil, alex, neha
Reviewed By: neil
Subscribers: dmitry, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D8445

m-iancu · 2020-06-01T08:06:51Z

Fixed by 6a0dd26.

m-iancu added this to the v2.2 milestone May 18, 2020

m-iancu self-assigned this May 18, 2020

m-iancu closed this as completed Jun 1, 2020

m-iancu added this to Done in YSQL via automation Jun 1, 2020

ndeodhar mentioned this issue Jun 8, 2020

[YSQL] Select on column used in partial index does not use the index #2275

Closed

rkarthik007 mentioned this issue Jul 28, 2020

Improve YSQL query performance #5242

Open

ajcaldera1 mentioned this issue Aug 11, 2020

Allow overriding the RPC communication port for masters/tservers in Platform #3806

Closed

sushantrmishra mentioned this issue Jan 15, 2022

[YSQL] Support TwoPhase commit #11084

Open
