Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TABLESAMPLE REGIONS() with unsigned bigint primary key returns unordered result #48253

Closed
kennytm opened this issue Nov 2, 2023 · 6 comments · Fixed by #48315
Closed

TABLESAMPLE REGIONS() with unsigned bigint primary key returns unordered result #48253

kennytm opened this issue Nov 2, 2023 · 6 comments · Fixed by #48315
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.3 affects-7.5 This bug affects the 7.5.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. found/gs found by gs severity/major type/bug The issue is confirmed as a bug.

Comments

@kennytm
Copy link
Contributor

kennytm commented Nov 2, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Populate data:

CREATE TABLE a (pk bigint unsigned primary key clustered, v text);
INSERT INTO a WITH RECURSIVE b(pk) AS (SELECT 1 UNION ALL SELECT pk+1 FROM b WHERE pk < 1000) SELECT pk, 'a' FROM b;
INSERT INTO a WITH RECURSIVE b(pk) AS (SELECT 1 UNION ALL SELECT pk+1 FROM b WHERE pk < 1000) SELECT pk + (1<<63), 'b' FROM b;
SPLIT TABLE a BY (500);

Perform the tablesample regions() query:

SELECT * FROM a TABLESAMPLE REGIONS() ORDER BY pk;

2. What did you expect to see? (Required)

+---------------------+------+
| pk                  | v    |
+---------------------+------+
|                 500 | a    |
| 9223372036854775809 | b    |
+---------------------+------+

3. What did you see instead (Required)

+---------------------+------+
| pk                  | v    |
+---------------------+------+
| 9223372036854775809 | b    |
|                 500 | a    |
+---------------------+------+
The `pk` is ordered like a signed integer.

EXPLAINing this shows the ORDER BY is eliminated. So the fix should be applied to the TableSample object itself.

mysql> explain SELECT * FROM a TABLESAMPLE REGIONS() ORDER BY pk;
+---------------+---------+------+---------------+---------------+
| id            | estRows | task | access object | operator info |
+---------------+---------+------+---------------+---------------+
| TableSample_9 | 1.00    | root |               |               |
+---------------+---------+------+---------------+---------------+

4. What is your TiDB version? (Required)

v7.6.0-alpha-nightly-20231101

Release Version: v7.6.0-alpha
Edition: Community
Git Commit Hash: c652a92df890196ad1e956da3a6afa7abf71adfb
Git Branch: heads/refs/tags/v7.6.0-alpha
UTC Build Time: 2023-11-01 14:21:14
GoVersion: go1.21.3
Race Enabled: false
Check Table Before Drop: false
Store: tikv 

(Also reproduced on v7.3.0 and v6.5.2. Should affect every TiDB version since TABLESAMPLE REGIONS() was implemented)

@kennytm kennytm added type/bug The issue is confirmed as a bug. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.3 found/gs found by gs labels Nov 2, 2023
@kennytm
Copy link
Contributor Author

kennytm commented Nov 2, 2023

BTW is it ok to add a hidden environmental switch or something to br/pkg/version/version.go or Dumpling CLI and other the tools (primarily sync-diff-inspector) to treat the database as if vanilla MySQL? The fragile usage of TABLESAMPLE REGIONS() (e.g. #27349) and stats-bucket decoding (e.g. pingcap/tidb-tools#623) and other TiDB-specific APIs for chunking has caused quite a significant number of bugs on the tools, and currently there is no escape hatch to force Dumpling falling back to the generic, slower but reliable algorithms.

@kennytm
Copy link
Contributor Author

kennytm commented Nov 3, 2023

BTW this can be workedaround on the client side by forcing a Sort operation:

SELECT * FROM a TABLESAMPLE REGIONS() ORDER BY coalesce(pk)
--                                             ^~~~~~~~
+-------------------------+---------+------+---------------+----------------------------------------------------+
| id                      | estRows | task | access object | operator info                                      |
+-------------------------+---------+------+---------------+----------------------------------------------------+
| Projection_7            | 2000.00 | root |               | test.a.pk, test.a.v                                |
| └─Sort_4                | 2000.00 | root |               | Column#3                                           |
|   └─Projection_8        | 1.00    | root |               | test.a.pk, test.a.v, coalesce(test.a.pk)->Column#3 |
|     └─TableSample_6     | 1.00    | root |               |                                                    |
+-------------------------+---------+------+---------------+----------------------------------------------------+

@ti-chi-bot ti-chi-bot added affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. labels Nov 3, 2023
@jebter jebter added severity/major component/ddl This issue is related to DDL of TiDB. labels Nov 3, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Nov 3, 2023
@tangenta tangenta removed may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Nov 6, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 8, 2023
@JasonWu0506
Copy link

/type regression

@JasonWu0506
Copy link

/type remove regression

Copy link

ti-chi-bot bot commented Jan 18, 2024

@JasonWu0506: The label(s) type/remove cannot be applied, because the repository doesn't have them.

In response to this:

/type remove regression

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@JasonWu0506
Copy link

/remove-type regression

@ti-chi-bot ti-chi-bot added affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. labels Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.3 affects-7.5 This bug affects the 7.5.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. found/gs found by gs severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants