Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Index cost estimates should account for index and scan type #4494

Closed
m-iancu opened this issue May 18, 2020 · 1 comment
Closed

[YSQL] Index cost estimates should account for index and scan type #4494

m-iancu opened this issue May 18, 2020 · 1 comment
Assignees
Projects
Milestone

Comments

@m-iancu
Copy link
Contributor

m-iancu commented May 18, 2020

Among other things, when evaluating the per-row and index cost we should consider:

  1. Index uniqueness (e.g. fully-specified key guarantee single-row results for unique indexes, but not for non-unique indexes).
  2. Included columns (i.e. Index Scan vs Index Only Scan). Index scans will require an additional read from the main table to retrieve the needed columns (these are batched so costs are somewhat amortized).
  3. Scan direction -- in DocDB reverse scans are (somewhat) slower than forward scans, so we should prefer fwd scans if all else is equal.
  4. Partial indexes -- the partial index predicate will mean the index size (number of rows) will be less than the table (depending on how restrictive the predicate is). The cost estimate should account for the predicate filter in addition to the index condition filter.

Example:

CREATE TABLE test(device_id int primary key, device_name text UNIQUE, supplier_id int);
CREATE INDEX ON test(supplier_id);

INSERT INTO test (SELECT generate_series, generate_series::text, 1 FROM generate_series(1, 100000));

EXPLAIN ANALYZE SELECT * FROM test WHERE device_name = '1' AND supplier_id = 1;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_supplier_id_idx on test  (cost=0.00..4.12 rows=1 width=40) (actual time=82.820..842.473 rows=1 loops=1)
   Index Cond: (supplier_id = 1)
   Filter: (device_name = '1'::text)
   Rows Removed by Filter: 99999
 Planning Time: 0.116 ms
 Execution Time: 842.639 ms
(6 rows)

Query takes ~850ms, even though it could be executed as a single-key read if using the right index (on device_name, not supplier_id).

@m-iancu m-iancu added this to the v2.2 milestone May 18, 2020
@m-iancu m-iancu self-assigned this May 18, 2020
m-iancu added a commit that referenced this issue Jun 1, 2020
Summary:
Improve index cost estimates by considering the following when computing the costs:
1. Index uniqueness -- For instance fully-specified key guarantee single-row results for unique indexes, but not for
    non-unique indexes, so it now has lower costs for such cases.
2. Included columns (i.e. `Index Scan` vs `Index Only Scan`) -- Index scans will require an additional read from the
    main table to retrieve the needed columns (these are batched so costs are somewhat amortized).
    So index-only scans are preferred (have lower cost).
3. Scan direction -- in DocDB backwards (reverse) scans are (somewhat) slower than forward scans, so we prefer
    forward scans if all else is equal.
4. Partial indexes -- the partial index predicate will mean the index size (number of rows) will be less than the table
    (depending on how restrictive the predicate is). The cost estimate now account for the predicate filter in addition
    to the index condition filter.

Additionally, disable merge joins for cases where they are not supported because the `ammarkpos` and `amrestrpos`
are not yet implemented by the `ybc` access method.
Choosing the unsupported index would have lead to a runtime error during execution (#4496).

Example:
For a pathological case with an inefficient (low-cardinality) index:
```
CREATE TABLE test(device_id int primary key, device_name text UNIQUE, supplier_id int);
CREATE INDEX ON test(supplier_id);

INSERT INTO test (SELECT generate_series, generate_series::text, 1 FROM generate_series(1, 100000));

EXPLAIN ANALYZE SELECT * FROM test WHERE device_name = '1' AND supplier_id = 1;
```
```
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_device_name_key on test  (cost=0.00..4.13 rows=1 width=40) (actual time=1.271..1.273 rows=1 loops=1)
   Index Cond: (device_name = '1'::text)
   Filter: (supplier_id = 1)
 Planning Time: 0.072 ms
 Execution Time: 1.302 ms
(5 rows)
```
Query now takes `1.302 ms` (down from `~850ms`).

Test Plan: TestPgRegressPlanner (yb_planner_indexes, yb_planner_joins)

Reviewers: neil, alex, neha

Reviewed By: neil

Subscribers: dmitry, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D8445
@m-iancu
Copy link
Contributor Author

m-iancu commented Jun 1, 2020

Fixed by 6a0dd26.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
YSQL
  
Done
Development

No branches or pull requests

1 participant