Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filter condition push down to IndexScan for prefix index. #21145

Open
coocood opened this issue Nov 19, 2020 · 4 comments
Open

Allow filter condition push down to IndexScan for prefix index. #21145

coocood opened this issue Nov 19, 2020 · 4 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@coocood
Copy link
Member

coocood commented Nov 19, 2020

Feature Request

Allow filter condition push down to IndexScan for prefix index.

mysql> create table t (a text, b text, index a_b (a(255), b(255)));
Query OK, 0 rows affected (0.01 sec)

mysql> explain select * from t where a between 'a' and 'b' and b = 'b';
+-------------------------------+---------+-----------+--------------------------+---------------------------------------------------------+
| id                            | estRows | task      | access object            | operator info                                           |
+-------------------------------+---------+-----------+--------------------------+---------------------------------------------------------+
| IndexLookUp_11                | 0.25    | root      |                          |                                                         |
| ├─IndexRangeScan_8(Build)     | 250.00  | cop[tikv] | table:t, index:a_b(a, b) | range:["a","b"], keep order:false, stats:pseudo         |
| └─Selection_10(Probe)         | 0.25    | cop[tikv] |                          | eq(test.t.b, "b"), ge(test.t.a, "a"), le(test.t.a, "b") |
|   └─TableRowIDScan_9          | 250.00  | cop[tikv] | table:t                  | keep order:false, stats:pseudo                          |
+-------------------------------+---------+-----------+--------------------------+---------------------------------------------------------+

The condition b = 'b' should be able to pushed down to IndexScan phase to reduce the lookup table cost.

We only need to make sure the equal condition value of b is less than the index column length 255.

The ideal plan should be

mysql> explain select * from t where a between 'a' and 'b' and b = 'b';
+--------------------------+---------+-----------+--------------------------+-------------------------------------------------+
| id                       | estRows | task      | access object            | operator info                                   |
+--------------------------+---------+-----------+--------------------------+-------------------------------------------------+
| IndexReader_7            | 0.25    | root      |                          | index:Selection_6                               |
| └─Selection_6            | 0.25    | cop[tikv] |                          | eq(test.t.b, "b")                               |
|   └─IndexRangeScan_5     | 250.00  | cop[tikv] | table:t, index:a_b(a, b) | range:["a","b"], keep order:false, stats:pseudo |
+--------------------------+---------+-----------+--------------------------+-------------------------------------------------+

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@xuyifangreeneyes
Copy link
Contributor

The optimization may cause some correctness problems when new collation is enabled. For example, using #38215, we can get the following wrong result.

mysql> create table t1 (
    ->  id int not null,
    ->  city varchar(20) not null,
    ->  key (city(7),id)
    -> ) character set=utf8;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into t1 values (1,'Durban North');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (2,'Durban');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1 where city = 'Durban';
+----+---------+
| id | city    |
+----+---------+
|  1 | Durban  |
|  2 | Durban  |
+----+---------+
2 rows in set (0.00 sec)

mysql> explain select * from t1 where city = 'Durban';
+------------------------+---------+-----------+--------------------------------+-----------------------------------------------------------+
| id                     | estRows | task      | access object                  | operator info                                             |
+------------------------+---------+-----------+--------------------------------+-----------------------------------------------------------+
| IndexReader_6          | 0.00    | root      |                                | index:IndexRangeScan_5                                    |
| └─IndexRangeScan_5     | 0.00    | cop[tikv] | table:t1, index:city(city, id) | range:["Durban","Durban"], keep order:false, stats:pseudo |
+------------------------+---------+-----------+--------------------------------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

'Durban' and 'Durban ' is equal under utf8mb4_bin collation. When inserting 'Durban North' into index, we first cut it to 'Durban ', then trim it to 'Durban' and fill into index. When we apply city = 'Durban' on index, we don't know whether the index key 'Durban' has been trimmed or not. Therefore, we must apply city = 'Durban' again on table.

@xuyifangreeneyes
Copy link
Contributor

There are some differences between MySQL and TiDB when handling prefix column. MySQL pads space to prefix length while TiDB trims space. In both implementations we cannot distinguish whether the real value length is less than the index prefix column length. MySQL also needs double scan for select * from t1 where city = 'Durban' and applies city = 'Durban' on table.

@xuyifangreeneyes
Copy link
Contributor

There are still two optimizations we can do:

  1. col is null can be pushed to prefix index. See planner: avoid double scan for index prefix col is (not) null #38555 for details.
  2. Filter conditions on index prefix column can be applied twice on both index and table. Even if the const length exceeds the prefix column length, we can cut the const and then apply the filter condition on index.

@rverma-dev
Copy link

Is this supported now? Can we also do push down to greater and less than scenario too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
4 participants