Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong firstrow aggregation result when there are regions returning empty result #30923

Closed
time-and-fate opened this issue Dec 21, 2021 · 7 comments · Fixed by tikv/tikv#11708, #31392, pingcap/tipb#254 or pingcap/tiflash#3822
Assignees
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. severity/critical sig/execution SIG execution type/bug The issue is confirmed as a bug.

Comments

@time-and-fate
Copy link
Member

time-and-fate commented Dec 21, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create table t(a int, b int);
insert into t value(1,1);
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
split table t between (0) and (30000) regions 20;
insert into t value(2, 1);
select any_value(a), sum(b) from t where a = 2;

2. What did you expect to see? (Required)

+--------------+--------+
| any_value(a) | sum(b) |
+--------------+--------+
|            2 |      1 |
+--------------+--------+

3. What did you see instead (Required)

+--------------+--------+
| any_value(a) | sum(b) |
+--------------+--------+
|         NULL |      1 |
+--------------+--------+

4. What is your TiDB version? (Required)

master
Also reproducible on 4.0.12.

@time-and-fate time-and-fate added the type/bug The issue is confirmed as a bug. label Dec 21, 2021
@ichn-hu
Copy link
Contributor

ichn-hu commented Dec 22, 2021

I can reproduce it on master tidb with tikv and pd

@lzmhhh123
Copy link
Contributor

/assign @ichn-hu

@ichn-hu
Copy link
Contributor

ichn-hu commented Dec 23, 2021

the error on master seems from the erroneous plan

When given wrong result, the plan is:

explain select any_value(a), sum(b) from t where a = 2;                                                                  
+--------------------------------+----------+-----------+---------------+--------------------------------------------------------------------+
| id                             | estRows  | task      | access object | operator info                                                      |
+--------------------------------+----------+-----------+---------------+--------------------------------------------------------------------+
| Projection_5                   | 1.00     | root      |               | any_value(test.t.a)->Column#5, Column#4                            |
| └─StreamAgg_20                 | 1.00     | root      |               | funcs:sum(Column#9)->Column#4, funcs:firstrow(Column#10)->test.t.a |
|   └─TableReader_21             | 1.00     | root      |               | data:StreamAgg_9                                                   |
|     └─StreamAgg_9              | 1.00     | cop[tikv] |               | funcs:sum(test.t.b)->Column#9, funcs:firstrow(test.t.a)->Column#10 |
|       └─Selection_19           | 32.77    | cop[tikv] |               | eq(test.t.a, 2)                                                    |
|         └─TableFullScan_18     | 32768.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo                                     |
+--------------------------------+----------+-----------+---------------+--------------------------------------------------------------------+

And after a while, when it gives the correct result, the plan is

explain select any_value(a), sum(b) from t where a = 2;
+--------------------------------+----------+-----------+---------------+---------------------------------------------------------------------+
| id                             | estRows  | task      | access object | operator info                                                       |
+--------------------------------+----------+-----------+---------------+---------------------------------------------------------------------+
| Projection_5                   | 1.00     | root      |               | any_value(test.t.a)->Column#5, Column#4                             |
| └─StreamAgg_10                 | 1.00     | root      |               | funcs:sum(Column#11)->Column#4, funcs:firstrow(Column#12)->test.t.a |
|   └─Projection_20              | 1.00     | root      |               | cast(test.t.b, decimal(32,0) BINARY)->Column#11, test.t.a           |
|     └─TableReader_17           | 1.00     | root      |               | data:Selection_16                                                   |
|       └─Selection_16           | 1.00     | cop[tikv] |               | eq(test.t.a, 2)                                                     |
|         └─TableFullScan_15     | 32769.00 | cop[tikv] | table:t       | keep order:false                                                    |
+--------------------------------+----------+-----------+---------------+---------------------------------------------------------------------+

@windtalker windtalker assigned windtalker and unassigned ichn-hu Dec 23, 2021
@windtalker
Copy link
Contributor

windtalker commented Dec 23, 2021

The root cause is

  1. any_value is rewrite to first_row during aggregation
  2. the aggregation is converted to 2 stage
    • partial agg, which is push down to tikv, and since there are 20 regions, it will be executed 20 times
    • final agg, which is executed in TiDB
  3. according to SQL standard, an aggregation without group by should at least return 1 row
  4. there is only one row with a = 2, it means only 1 region will return the value 2, 1, and for all the other 19 regions, it will return NULL, NULL
  5. the first_row function in the TiDB only save the first value it meet, and if it meet NULL, NULL first, then the any_value will return NULL, sum(b) is not affected because NULL will be ignored in sum

@windtalker
Copy link
Contributor

Possible solution:
Since we can make sure that the aggregation executed in TiKV is always the first stage agg, when there is no input for aggregation, it is safe to not return any thing even if the aggregion does not have group by columns.

@time-and-fate time-and-fate added affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. labels Dec 23, 2021
@windtalker
Copy link
Contributor

And this issue also exists when running the same query in TiDB + TiFlash.

ti-chi-bot added a commit to tikv/tikv that referenced this issue Feb 18, 2022
…ation (#11708) (#11744)

ref #11708, close #11735, close pingcap/tidb#30923

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: xufei <xufei@pingcap.com>
Co-authored-by: Yilin Chen <sticnarf@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
ti-chi-bot added a commit to pingcap/tiflash that referenced this issue Feb 23, 2022
@VelocityLight VelocityLight added affects-5.4 This bug affects the 5.4.x(LTS) versions. may-affects-5.4 This bug maybe affects 5.4.x versions. and removed affects-5.4 This bug affects the 5.4.x(LTS) versions. labels Jun 21, 2022
@VelocityLight VelocityLight added affects-5.4 This bug affects the 5.4.x(LTS) versions. and removed may-affects-5.4 This bug maybe affects 5.4.x versions. labels Jul 4, 2022
ti-chi-bot added a commit to pingcap/tiflash that referenced this issue Jul 7, 2022
ti-chi-bot added a commit to pingcap/tiflash that referenced this issue Jul 11, 2022
ti-chi-bot added a commit to pingcap/tiflash that referenced this issue Jul 11, 2022
ti-chi-bot added a commit to pingcap/tiflash that referenced this issue Jul 14, 2022
ti-chi-bot pushed a commit to tikv/tikv that referenced this issue Jul 14, 2022
…ation (#11708) (#11741)

ref #11708, close #11735, close pingcap/tidb#30923

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

Co-authored-by: xufei <xufei@pingcap.com>
Co-authored-by: xufei <xufeixw@mail.ustc.edu.cn>
ti-chi-bot pushed a commit to tikv/tikv that referenced this issue Jul 14, 2022
…ation (#11708) (#11743)

ref #11708, close #11735, close pingcap/tidb#30923

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: xufei <xufei@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. severity/critical sig/execution SIG execution type/bug The issue is confirmed as a bug.
Projects
None yet
5 participants