Semi Join should be NULL-Aware #8844
For hash join, we cannot use it for semi-joins, at least for now, because for queries like
I believe the original cases are all fixed, but the hash-join case is not. Original test case:
drop table if exists t;
create table t(a bigint, b bigint, c bigint);
insert into t values(null, 1, 1), (2, 2, 2), (3, null, 3), (4, 4, 3);
select a, b, a in (select b from t) from t;
select a, c, a in (select c from t) from t;
select a, b, a not in (select b from t) from t;
select a, c, a not in (select c from t) from t;
select tidb_version()\G
..
mysql> select a, b, a in (select b from t) from t; # correct
+------+------+------------------------+
| a | b | a in (select b from t) |
+------+------+------------------------+
| NULL | 1 | NULL |
| 2 | 2 | 1 |
| 3 | NULL | NULL |
| 4 | 4 | 1 |
+------+------+------------------------+
4 rows in set (0.00 sec)
mysql> select a, c, a in (select c from t) from t; # correct
+------+------+------------------------+
| a | c | a in (select c from t) |
+------+------+------------------------+
| NULL | 1 | NULL |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 3 | 0 |
+------+------+------------------------+
4 rows in set (0.00 sec)
mysql> select a, b, a not in (select b from t) from t; # correct
+------+------+----------------------------+
| a | b | a not in (select b from t) |
+------+------+----------------------------+
| NULL | 1 | NULL |
| 2 | 2 | 0 |
| 3 | NULL | NULL |
| 4 | 4 | 0 |
+------+------+----------------------------+
4 rows in set (0.00 sec)
mysql> select a, c, a not in (select c from t) from t; # correct
+------+------+----------------------------+
| a | c | a not in (select c from t) |
+------+------+----------------------------+
| NULL | 1 | NULL |
| 2 | 2 | 0 |
| 3 | 3 | 0 |
| 4 | 3 | 1 |
+------+------+----------------------------+
4 rows in set (0.00 sec)
mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-893-g4e829aaee
Edition: Community
Git Commit Hash: 4e829aaee7b656aa807814708ae05af5233302af
Git Branch: master
UTC Build Time: 2020-08-04 12:40:52
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)
Hash join case:
DROP TABLE IF EXISTS ss, tt;
create table ss (
a bigint,
b bigint
);
create table tt (
a bigint,
b bigint
);
INSERT INTO ss VALUES (1,NULL),(2,NULL),(2,2);
INSERT INTO tt VALUES (1,1),(1,NULL),(2,NULL);
SELECT tt.a, tt.b, (tt.a, tt.b) in (select a,b from ss) from tt;
..
mysql> SELECT tt.a, tt.b, (tt.a, tt.b) in (select a,b from ss) from tt;
+------+------+--------------------------------------+
| a | b | (tt.a, tt.b) in (select a,b from ss) |
+------+------+--------------------------------------+
| 1 | 1 | 0 |
| 1 | NULL | 0 |
| 2 | NULL | 0 |
+------+------+--------------------------------------+
3 rows in set (0.01 sec)
The third column should be
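To make the expected behaviour of the row-valued IN concrete, here is a minimal Python sketch of SQL three-valued logic applied to the ss/tt data above. It is only an illustration of the standard semantics, not TiDB code: the helper names (eq3, and3, or3, row_in) are hypothetical, and SQL NULL is modelled as Python None, with True/False standing in for 1/0.

```python
from typing import Iterable, Optional, Sequence

Tri = Optional[bool]  # True, False, or None (SQL unknown / NULL)


def eq3(x, y) -> Tri:
    """Three-valued equality: any comparison with NULL is unknown."""
    if x is None or y is None:
        return None
    return x == y


def and3(values: Iterable[Tri]) -> Tri:
    """Three-valued AND: False dominates, then unknown, else True."""
    result: Tri = True
    for v in values:
        if v is False:
            return False
        if v is None:
            result = None
    return result


def or3(values: Iterable[Tri]) -> Tri:
    """Three-valued OR: True dominates, then unknown, else False."""
    result: Tri = False
    for v in values:
        if v is True:
            return True
        if v is None:
            result = None
    return result


def row_in(outer_row: Sequence, inner_rows: Iterable[Sequence]) -> Tri:
    """(a, b) IN (subquery): OR over rows of the AND over column equalities."""
    return or3(
        and3(eq3(x, y) for x, y in zip(outer_row, inner_row))
        for inner_row in inner_rows
    )


# Data from the hash join test case above.
ss = [(1, None), (2, None), (2, 2)]
tt = [(1, 1), (1, None), (2, None)]
results = [row_in(row, ss) for row in tt]
```

Under these rules every row of tt evaluates to unknown (None here, NULL in SQL) rather than 0, because each one has at least one ss row where the comparison is unknown and no ss row where it is definitely true.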
@fzhedu PTAL
Please edit this comment or add a new comment to complete the following information
Not a bug
Duplicate bug
Bug
Note: Make sure that 'component' and 'severity' labels are added
1. Root Cause Analysis (RCA) (optional)
2. Symptom (optional)
3. All Trigger Conditions (optional)
4. Workaround (optional)
5. Affected versions
6. Fixed versions
v5.0.0-rc
( AffectedVersions ) fields are empty.
Bug Report
Take this as an example:
LeftOuterSemiJoin
The join result should be OuterRow + NULL if:
- the outer side join key is NULL (row count from inner is not empty), or:
- the outer side join key is not NULL, no inner join key has the same value as the outer side, and there is a NULL value in the inner side join key.
MySQL:
While in TiDB the result is:
The join result should be OuterRow + 1 if:
- the outer side join key is not NULL and at least one inner join key has the same value as the outer side.
MySQL:
While in TiDB, the result is:
The join result should be OuterRow + 0 if:
- the outer side join key is not NULL, no inner join key has the same value as the outer side, and there is no NULL value in the inner side join key.
MySQL:
While in TiDB, the result is:
Anti LeftOuterSemiJoin
The join result should be OuterRow + NULL if:
- the outer side join key is NULL, or:
- the outer side join key is not NULL, no inner join key has the same value as the outer side, and there is a NULL value in the inner side join key.
MySQL:
While in TiDB:
The join result should be OuterRow + 0 if:
- the outer side join key is not NULL and at least one inner join key has the same value as the outer side.
MySQL:
While in TiDB:
The join result should be OuterRow + 1 if:
- the outer side join key is not NULL, no inner join key has the same value as the outer side, and there is no NULL value in the inner side join key.
MySQL:
While in TiDB:
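The rules for LeftOuterSemiJoin and Anti LeftOuterSemiJoin can be checked against the original test case with a small Python sketch. This is an illustrative reading of the cases listed above, not TiDB's implementation: the function names are hypothetical, NULL is modelled as None, and the empty-inner case is assumed to return 0 for IN, following the "row count from inner is not empty" condition.

```python
from typing import Iterable, Optional


def left_outer_semi(outer_key, inner_keys: Iterable) -> Optional[int]:
    """Value appended to the outer row for `outer_key IN (inner_keys)`."""
    inner = list(inner_keys)
    if not inner:
        return 0  # IN over an empty set is false, even for a NULL outer key
    if outer_key is None:
        return None  # NULL outer key with a non-empty inner side
    if any(k is not None and k == outer_key for k in inner):
        return 1  # at least one matching inner key
    if any(k is None for k in inner):
        return None  # no match, but a NULL inner key makes the result unknown
    return 0  # no match and no NULL on the inner side


def anti_left_outer_semi(outer_key, inner_keys: Iterable) -> Optional[int]:
    """Value appended for `outer_key NOT IN (inner_keys)`: three-valued negation."""
    result = left_outer_semi(outer_key, inner_keys)
    return None if result is None else 1 - result


# Rows (a, b, c) of table t from the original test case.
t = [(None, 1, 1), (2, 2, 2), (3, None, 3), (4, 4, 3)]
col_a = [row[0] for row in t]
col_b = [row[1] for row in t]
col_c = [row[2] for row in t]

a_in_b = [left_outer_semi(a, col_b) for a in col_a]
a_in_c = [left_outer_semi(a, col_c) for a in col_a]
a_not_in_b = [anti_left_outer_semi(a, col_b) for a in col_a]
a_not_in_c = [anti_left_outer_semi(a, col_c) for a in col_a]
```

The four lists line up with the third column of the four query results marked "# correct" earlier in the thread.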
Summary
For LeftOuterSemiJoin and Anti LeftOuterSemiJoin, TiDB cannot correctly produce the OuterRow + NULL result.
There are another two semi join types in TiDB:
- SemiJoin: outputs the outer row only when the LeftOuterSemiJoin result is OuterRow + 1, not OuterRow + 0 or OuterRow + NULL
- Anti SemiJoin: outputs the outer row only when the Anti LeftOuterSemiJoin result is OuterRow + 1, not OuterRow + 0 or OuterRow + NULL
Suggestions
In Planner
NOT NULL filters on the inner side of the join key if the join type is the four semi joins.
If possible, we can remove SemiJoin and Anti SemiJoin. LeftOuterSemiJoin and Anti LeftOuterSemiJoin carry the whole information of the join result, so we actually only need these two semi joins:
In Executor
For Hash Join and NestLoopedApply, we should:
- For NULL outer join keys in semi joins, return OuterRow + NULL.
- If there is a NULL value in the inner join key, return OuterRow + NULL if the outer row is not NULL and there is no matched inner join key.
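The executor suggestions can be sketched as a build/probe structure. The Python below is a rough model under stated assumptions, not TiDB's hash join: the class NullAwareHashSemiJoin and its methods are hypothetical, the join key is a single column, and NULL is modelled as None. The build phase records whether the inner side is empty and whether it contains a NULL key, so the probe phase can return OuterRow + NULL where needed. It also shows SemiJoin and Anti SemiJoin as plain filters over the two outer semi join results, which is the reason given above for needing only those two.

```python
from typing import Hashable, Iterable, List, Optional


class NullAwareHashSemiJoin:
    """Hypothetical NULL-aware hash semi join over a single join key."""

    def __init__(self, inner_keys: Iterable[Optional[Hashable]]) -> None:
        # Build phase: hash the non-NULL keys and remember NULL / emptiness.
        self.non_null_keys = set()
        self.has_null_key = False
        self.inner_is_empty = True
        for key in inner_keys:
            self.inner_is_empty = False
            if key is None:
                self.has_null_key = True
            else:
                self.non_null_keys.add(key)

    def left_outer_semi(self, outer_key: Optional[Hashable]) -> Optional[int]:
        """Probe phase for LeftOuterSemiJoin: returns 1, 0, or None (NULL)."""
        if self.inner_is_empty:
            return 0
        if outer_key is None:
            return None  # NULL outer join key
        if outer_key in self.non_null_keys:
            return 1
        # No match: a NULL inner key makes the answer unknown.
        return None if self.has_null_key else 0

    def anti_left_outer_semi(self, outer_key: Optional[Hashable]) -> Optional[int]:
        """Probe phase for Anti LeftOuterSemiJoin: three-valued negation."""
        result = self.left_outer_semi(outer_key)
        return None if result is None else 1 - result

    def semi_join(self, outer_keys: Iterable[Optional[Hashable]]) -> List:
        """SemiJoin: keep outer keys whose LeftOuterSemiJoin result is 1."""
        return [k for k in outer_keys if self.left_outer_semi(k) == 1]

    def anti_semi_join(self, outer_keys: Iterable[Optional[Hashable]]) -> List:
        """Anti SemiJoin: keep outer keys whose anti result is 1."""
        return [k for k in outer_keys if self.anti_left_outer_semi(k) == 1]


# Columns a, b, c of table t from the original test case.
outer_a = [None, 2, 3, 4]
join_b = NullAwareHashSemiJoin([1, 2, None, 4])
join_c = NullAwareHashSemiJoin([1, 2, 3, 3])
```

For example, join_c.anti_semi_join(outer_a) keeps only the key 4, while join_b.anti_semi_join(outer_a) keeps nothing because the NULL in column b turns every non-matching probe into an unknown result.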