Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

value change caused by DISTINCT #38756

Closed
qaqcatz opened this issue Oct 31, 2022 · 11 comments · Fixed by #50020
Closed

value change caused by DISTINCT #38756

qaqcatz opened this issue Oct 31, 2022 · 11 comments · Fixed by #50020
Assignees
Labels
affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@qaqcatz
Copy link

qaqcatz commented Oct 31, 2022

Bug Report

1. Minimal reproduce step (Required)

drop table if exists t;
create table t (c1 int);
insert into t values (1), (2), (3);

(SELECT SQRT(1) FROM t); -- sql1

(SELECT DISTINCT SQRT(1) FROM t); -- sql2

2. What did you expect to see? (Required)

In theory, the result of sql2(DISTINCT) ⊆ the result of sql1.

3. What did you see instead (Required)

However, the value 1 changed to 5e-324 after adding DISTINCT, seems like a logical bug.

mysql> (SELECT SQRT(1) FROM t); -- sql1
+---------+
| SQRT(1) |
+---------+
|       1 |
|       1 |
|       1 |
+---------+
3 rows in set (0.00 sec)

mysql> (SELECT DISTINCT SQRT(1) FROM t); -- sql2
+---------+
| SQRT(1) |
+---------+
|  5e-324 |
+---------+
1 row in set (0.00 sec)

4. What is your TiDB version? (Required)

Release Version: v6.3.0-alpha-567-g312155afb
Edition: Community
Git Commit Hash: 312155afbd0f91a7526e9b2910cf2beb09bfc6f5
Git Branch: master
UTC Build Time: 2022-10-31 03:20:57
GoVersion: go1.19.1
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: unistore
@qaqcatz qaqcatz added the type/bug The issue is confirmed as a bug. label Oct 31, 2022
@aytrack aytrack added sig/execution SIG execution severity/critical affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 labels Oct 31, 2022
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. labels Oct 31, 2022
@aytrack aytrack removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. labels Oct 31, 2022
@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Nov 1, 2022

minimal case:

tidb> desc SELECT DISTINCT cast(1 as double) FROM t;
+---------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id                        | estRows | task      | access object | operator info                                         |
+---------------------------+---------+-----------+---------------+-------------------------------------------------------+
| HashAgg_8                 | 0.00    | root      |               | group by:Column#5, funcs:firstrow(Column#5)->Column#3 |
| └─TableReader_9           | 0.00    | root      |               | data:HashAgg_4                                        |
|   └─HashAgg_4             | 0.00    | cop[tikv] |               | group by:1,                                           |
|     └─TableFullScan_7     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+---------------------------+---------+-----------+---------------+-------------------------------------------------------+
4 rows in set (0.00 sec)

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Nov 1, 2022

It seems we miss a root projection for this plan as the following sqls show:

tidb> desc select cast(1 as double) from t group by cast(1 as double);
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id                          | estRows | task      | access object | operator info                                         |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| Projection_4                | 1.00    | root      |               | 1->Column#3                                           |
| └─HashAgg_9                 | 0.00    | root      |               | group by:Column#7, funcs:firstrow(Column#7)->Column#6 |
|   └─TableReader_10          | 0.00    | root      |               | data:HashAgg_5                                        |
|     └─HashAgg_5             | 0.00    | cop[tikv] |               | group by:1,                                           |
|       └─TableFullScan_8     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
5 rows in set (0.00 sec)

tidb> desc (SELECT  1 FROM t group by 1); -- sql2
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id                          | estRows | task      | access object | operator info                                         |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| Projection_4                | 1.00    | root      |               | 1->Column#3                                           |
| └─HashAgg_9                 | 0.00    | root      |               | group by:Column#7, funcs:firstrow(Column#7)->Column#6 |
|   └─TableReader_10          | 0.00    | root      |               | data:HashAgg_5                                        |
|     └─HashAgg_5             | 0.00    | cop[tikv] |               | group by:1,                                           |
|       └─TableFullScan_8     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
5 rows in set (0.00 sec)

tidb> desc (SELECT  1 FROM t group by c1); -- sql2
+-----------------------------+---------+-----------+---------------+--------------------------------------------------------+
| id                          | estRows | task      | access object | operator info                                          |
+-----------------------------+---------+-----------+---------------+--------------------------------------------------------+
| Projection_4                | 2.40    | root      |               | 1->Column#3                                            |
| └─HashAgg_9                 | 0.00    | root      |               | group by:test.t.c1, funcs:firstrow(Column#7)->Column#6 |
|   └─TableReader_10          | 0.00    | root      |               | data:HashAgg_5                                         |
|     └─HashAgg_5             | 0.00    | cop[tikv] |               | group by:test.t.c1, funcs:firstrow(1)->Column#7        |
|       └─TableFullScan_8     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                         |
+-----------------------------+---------+-----------+---------------+--------------------------------------------------------+
5 rows in set (0.00 sec)

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Nov 1, 2022

I think this may be a bug of the optimizer, I'll change the label to sig/planner.
BTW, I think the severity can be changed to severity/major. This is a corner case and there is a workaround for this issue

@XuHuaiyu XuHuaiyu added sig/planner SIG: Planner and removed sig/execution SIG execution labels Nov 1, 2022
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. labels Nov 1, 2022
@fixdb
Copy link
Contributor

fixdb commented Nov 1, 2022

We also have the same issue when TiFlash is enabled:

mysql> explain  SELECT DISTINCT SQRT(1) FROM foo;
+---------------------------+---------+-------------------+---------------+-------------------------------------------------------+
| id                        | estRows | task              | access object | operator info                                         |
+---------------------------+---------+-------------------+---------------+-------------------------------------------------------+
| HashAgg_9                 | 1.00    | root              |               | group by:Column#7, funcs:firstrow(Column#7)->Column#5 |
| └─TableReader_10          | 1.00    | root              |               | data:HashAgg_4                                        |
|   └─HashAgg_4             | 1.00    | batchCop[tiflash] |               | group by:1,                                           |
|     └─TableFullScan_8     | 5.00    | batchCop[tiflash] | table:foo     | keep order:false, stats:pseudo                        |
+---------------------------+---------+-------------------+---------------+-------------------------------------------------------+
4 rows in set (0.00 sec)

mysql> SELECT DISTINCT SQRT(1) FROM foo;
+---------+
| SQRT(1) |
+---------+
|  5e-324 |
+---------+
1 row in set (0.00 sec)

@VelocityLight VelocityLight added the affects-6.5 This bug affects the 6.5.x(LTS) versions. label Dec 2, 2022
@qaqcatz
Copy link
Author

qaqcatz commented Dec 18, 2022

Hope these can be helpful for your debugging:
We look for the first version of the bug in all docker images (with format x.x.x) of https://hub.docker.com/r/pingcap/tidb/tags
We found that the bug first occurred in pingcap/tidb:v5.1.0, it cannot be reproduced in pingcap/tidb:v5.0.5 (we can't find more intermediate versions in dockerhub):

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v5.1.0 |
+--------------------+
1 row in set (0.00 sec)

mysql> (SELECT SQRT(1) FROM t); -- sql1
+---------+
| SQRT(1) |
+---------+
|       1 |
|       1 |
|       1 |
+---------+
3 rows in set (0.00 sec)

mysql> (SELECT DISTINCT SQRT(1) FROM t); -- sql2
+---------+
| SQRT(1) |
+---------+
|  5e-324 |
+---------+
1 row in set (0.00 sec)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v5.0.5 |
+--------------------+
1 row in set (0.00 sec)

mysql> (SELECT SQRT(1) FROM t); -- sql1
+---------+
| SQRT(1) |
+---------+
|       1 |
|       1 |
|       1 |
+---------+
3 rows in set (0.00 sec)

mysql> (SELECT DISTINCT SQRT(1) FROM t); -- sql2
+---------+
| SQRT(1) |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)

@VelocityLight VelocityLight added the affects-7.1 This bug affects the 7.1.x(LTS) versions. label Apr 20, 2023
@ti-chi-bot ti-chi-bot added the affects-7.5 This bug affects the 7.5.x(LTS) versions. label Oct 23, 2023
@fixdb
Copy link
Contributor

fixdb commented Oct 28, 2023

/assign @hi-rustin

@Rustin170506 Rustin170506 moved this to 📋 Backlog in 🎒My Work Oct 31, 2023
@Rustin170506 Rustin170506 moved this from 📋 Backlog to 🏊WIP in 🎒My Work Jan 3, 2024
@Rustin170506 Rustin170506 removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. labels Jan 3, 2024
@Rustin170506
Copy link
Member

I found that: If we analyze the table after we insert some data into the table immediately. Then we can get the correct result.

@Rustin170506
Copy link
Member

Rustin170506 commented Jan 8, 2024

Before analyze:

mysql> explain (SELECT DISTINCT SQRT(1) FROM t);
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| id                        | estRows  | task      | access object | operator info                                         |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| HashAgg_8                 | 1.00     | root      |               | group by:Column#5, funcs:firstrow(Column#5)->Column#3 |
| └─TableReader_9           | 1.00     | root      |               | data:HashAgg_4                                        |
|   └─HashAgg_4             | 1.00     | cop[tikv] |               | group by:1,                                           |
|     └─TableFullScan_7     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
4 rows in set (0.01 sec)
logical p: DataScan(tdda)->Projection->Aggr(firstrow(Column#3))
logic: DataScan(tdda)->Aggr(firstrow(1))
physical: TableReader(Table(tdda)->HashAgg)->HashAgg
finalPlan: TableReader(Table(tdda)->HashAgg)->HashAgg

After analyze:

mysql> explain (SELECT DISTINCT SQRT(1) FROM t);
+--------------------------+---------+-----------+---------------+-----------------------------------------+
| id                       | estRows | task      | access object | operator info                           |
+--------------------------+---------+-----------+---------------+-----------------------------------------+
| HashAgg_6                | 1.00    | root      |               | group by:1, funcs:firstrow(1)->Column#3 |
| └─TableReader_11         | 3.00    | root      |               | data:TableFullScan_10                   |
|   └─TableFullScan_10     | 3.00    | cop[tikv] | table:t       | keep order:false                        |
+--------------------------+---------+-----------+---------------+-----------------------------------------+
3 rows in set (0.00 sec)
logical p: DataScan(tdda)->Projection->Aggr(firstrow(Column#3))
logic: DataScan(tdda)->Aggr(firstrow(1))
physical: TableReader(Table(tdda))->HashAgg
finalPlan: TableReader(Table(tdda))->HashAgg

@Rustin170506
Copy link
Member

So the issue happens if we do the HashAgg operation twice.

@wshwsh12
Copy link
Contributor

wshwsh12 commented Jan 8, 2024

When the value is 2, tikv will return 2 columns and get the correct result.
So why is the function firstrow lost when the value is 1?

tidb> desc SELECT DISTINCT cast(1 as double) FROM t;
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| id                        | estRows  | task      | access object | operator info                                         |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| HashAgg_8                 | 1.00     | root      |               | group by:Column#5, funcs:firstrow(Column#5)->Column#3 |
| └─TableReader_9           | 1.00     | root      |               | data:HashAgg_4                                        |
|   └─HashAgg_4             | 1.00     | cop[tikv] |               | group by:1,                                           |
|     └─TableFullScan_7     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
4 rows in set (0.000 sec)

tidb> desc SELECT DISTINCT cast(2 as double) FROM t;
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| id                        | estRows  | task      | access object | operator info                                         |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| HashAgg_8                 | 1.00     | root      |               | group by:Column#5, funcs:firstrow(Column#6)->Column#3 |
| └─TableReader_9           | 1.00     | root      |               | data:HashAgg_4                                        |
|   └─HashAgg_4             | 1.00     | cop[tikv] |               | group by:1, funcs:firstrow(2)->Column#6               |
|     └─TableFullScan_7     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
4 rows in set (0.000 sec)

@Rustin170506
Copy link
Member

(SELECT DISTINCT SQRT(2) FROM tdda)
image
(SELECT DISTINCT SQRT(1) FROM tdda)
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants