-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
semi join performance issue #10226
Comments
To guarantee correctness, we moved the equal condition TiDB(root@127.0.0.1:test) > desc select * from t1 where id not in (select id from t1 where ctime < '2019-04-22 18:09:27');
+------------------------+----------+------+-----------------------------------------------------------------------------+
| id | count | task | operator info |
+------------------------+----------+------+-----------------------------------------------------------------------------+
| HashLeftJoin_9 | 8000.00 | root | anti semi join, inner:TableReader_14, other cond:eq(test.t1.id, test.t1.id) |
| ├─TableReader_11 | 10000.00 | root | data:TableScan_10 |
| │ └─TableScan_10 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_14 | 3323.33 | root | data:Selection_13 |
| └─Selection_13 | 3323.33 | cop | lt(test.t1.ctime, 2019-04-22 18:09:27.000000) |
| └─TableScan_12 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
+------------------------+----------+------+-----------------------------------------------------------------------------+
6 rows in set (0.01 sec) As you can see, it performs a cartesian product first and then performs the filter To avoid this transformation, you can add a create table t1 (id bigint not null, ctime timestamp); Then the query execution plan is: TiDB(root@127.0.0.1:test) > desc select * from t1 where id not in (select id from t1 where ctime < '2019-04-22 18:09:27');
+------------------------+----------+------+--------------------------------------------------------------------------+
| id | count | task | operator info |
+------------------------+----------+------+--------------------------------------------------------------------------+
| HashLeftJoin_9 | 8000.00 | root | anti semi join, inner:TableReader_14, equal:[eq(test.t1.id, test.t1.id)] |
| ├─TableReader_11 | 10000.00 | root | data:TableScan_10 |
| │ └─TableScan_10 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_14 | 3323.33 | root | data:Selection_13 |
| └─Selection_13 | 3323.33 | cop | lt(test.t1.ctime, 2019-04-22 18:09:27.000000) |
| └─TableScan_12 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
+------------------------+----------+------+--------------------------------------------------------------------------+
6 rows in set (0.00 sec) Because In the future, we are planning to derive TiDB(root@127.0.0.1:test) > desc select * from t1 where id not in (select id from t1 where ctime < '2019-04-22 18:09:27' and id is not null);
+------------------------+----------+------+-----------------------------------------------------------------------------+
| id | count | task | operator info |
+------------------------+----------+------+-----------------------------------------------------------------------------+
| HashLeftJoin_9 | 8000.00 | root | anti semi join, inner:TableReader_14, other cond:eq(test.t1.id, test.t1.id) |
| ├─TableReader_11 | 10000.00 | root | data:TableScan_10 |
| │ └─TableScan_10 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_14 | 3320.01 | root | data:Selection_13 |
| └─Selection_13 | 3320.01 | cop | lt(test.t1.ctime, 2019-04-22 18:09:27.000000), not(isnull(test.t1.id)) |
| └─TableScan_12 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
+------------------------+----------+------+-----------------------------------------------------------------------------+
6 rows in set (0.00 sec) |
this behavior is different from MySQL, due to historical reasons, now we use in production, we have 1000w+ records, with |
Could you please illustrate more about this comment?
|
That's interesting.
To make it clear, actually the SQL "behavior" of TiDB is compatible with MySQL now, but the implementation is different. In MySQL,
Another interesting thing is that we can learn from the blog to optimize |
Bug Report
Please answer these questions before submitting your issue. Thanks!
What did you do?
create table t1 (id bigint, ctime timestamp);
generate 10w rows data
select * from t1 where id not in (select id from t1 where ctime < '2019-04-22 18:09:27');
What did you expect to see?
response quickly as tidb-v2.1.4
What did you see instead?
very slow since plan/executor: make semi joins null and empty aware (#9051) #9449 merged
What version of TiDB are you using (
tidb-server -V
or runselect tidb_version();
on TiDB)?tidb-v2.1.8
The text was updated successfully, but these errors were encountered: