Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](judge-partition) Fix incorrect logic in determining partitioned table #27515

Merged
merged 1 commit into from
Nov 27, 2023

Conversation

whutpencil
Copy link
Contributor

Proposed changes

Issue Number: close #xxx

The old logic used to determine whether it was a partition table based on the number of buckets, but if I had a partition table with only one partition and the number of buckets in that partition was 1, it would be mistakenly recognized as a non partition table.

Table[test_load_doris_to_hive_2] is not partitioned

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@hello-stephen
Copy link
Contributor

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.38 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.6 seconds inserted 10000000 Rows, about 337K ops/s
storage size: 17099390689 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 53b97c7b1d251c065f1d2d73b2e09c13cdb806b0, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4909	4689	4651	4651
q2	373	141	159	141
q3	2037	1909	1860	1860
q4	1385	1262	1220	1220
q5	3941	3909	4045	3909
q6	245	129	126	126
q7	1452	879	868	868
q8	2770	2769	2760	2760
q9	9917	12702	9399	9399
q10	3476	3531	3525	3525
q11	396	246	245	245
q12	445	294	294	294
q13	4564	3806	3746	3746
q14	333	294	289	289
q15	590	531	528	528
q16	679	588	580	580
q17	1144	932	913	913
q18	7827	7446	7383	7383
q19	1674	1675	1695	1675
q20	530	304	319	304
q21	4404	3954	3942	3942
q22	478	372	366	366
Total cold run time: 53569 ms
Total hot run time: 48724 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4562	4560	4585	4560
q2	356	229	264	229
q3	4020	3995	4008	3995
q4	2698	2716	2695	2695
q5	9713	9754	9618	9618
q6	243	117	118	117
q7	3010	2459	2489	2459
q8	4418	4449	4439	4439
q9	12908	12866	12893	12866
q10	4076	4147	4196	4147
q11	749	672	644	644
q12	976	821	808	808
q13	4288	3565	3616	3565
q14	402	366	350	350
q15	572	521	521	521
q16	743	671	682	671
q17	3931	3935	3868	3868
q18	9476	9059	9106	9059
q19	1833	1767	1779	1767
q20	2393	2046	2036	2036
q21	8718	8616	8507	8507
q22	939	870	829	829
Total cold run time: 81024 ms
Total hot run time: 77750 ms

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Nov 27, 2023
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit 36a528b into apache:master Nov 27, 2023
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
…is a partitioned table (apache#27515)

The old logic used to determine whether it was a partition table based on the number of buckets, but if I had a partition table with only one partition and the number of buckets in that partition was 1, it would be mistakenly recognized as a non partition table.

```
Table[test_load_doris_to_hive_2] is not partitioned
```
morningman added a commit that referenced this pull request Dec 8, 2023
This PR #27515 change the logic if Table's `isPartitioned()` method.
But this method has 2 usages:

1. To check whether a table is range or list partitioned, for some DML operation such as Alter, Export.

    For this case, it should return true if the table is range or list partitioned. even if it has only
    one partition and one buckets.

2. To check whether the data is distributed (either by partitions or by buckets), for query planner.

    For this case, it should return true if table has more than one bucket. Even if this table is not
    range or list partitioned, if it has more than one bucket, it should return true.

So we should separate this method into 2, for different usages.
Otherwise, it may cause some unreasonable plan shape
@wm1581066 wm1581066 added the usercase Important user case type label label Dec 11, 2023
morningman pushed a commit to morningman/doris that referenced this pull request Dec 13, 2023
…is a partitioned table (apache#27515)

The old logic used to determine whether it was a partition table based on the number of buckets, but if I had a partition table with only one partition and the number of buckets in that partition was 1, it would be mistakenly recognized as a non partition table.

```
Table[test_load_doris_to_hive_2] is not partitioned
```
morningman added a commit to morningman/doris that referenced this pull request Dec 13, 2023
This PR apache#27515 change the logic if Table's `isPartitioned()` method.
But this method has 2 usages:

1. To check whether a table is range or list partitioned, for some DML operation such as Alter, Export.

    For this case, it should return true if the table is range or list partitioned. even if it has only
    one partition and one buckets.

2. To check whether the data is distributed (either by partitions or by buckets), for query planner.

    For this case, it should return true if table has more than one bucket. Even if this table is not
    range or list partitioned, if it has more than one bucket, it should return true.

So we should separate this method into 2, for different usages.
Otherwise, it may cause some unreasonable plan shape
@xiaokang xiaokang added the p0_b label Dec 13, 2023
morningman added a commit to morningman/doris that referenced this pull request Dec 13, 2023
This PR apache#27515 change the logic if Table's `isPartitioned()` method.
But this method has 2 usages:

1. To check whether a table is range or list partitioned, for some DML operation such as Alter, Export.

    For this case, it should return true if the table is range or list partitioned. even if it has only
    one partition and one buckets.

2. To check whether the data is distributed (either by partitions or by buckets), for query planner.

    For this case, it should return true if table has more than one bucket. Even if this table is not
    range or list partitioned, if it has more than one bucket, it should return true.

So we should separate this method into 2, for different usages.
Otherwise, it may cause some unreasonable plan shape
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…is a partitioned table (apache#27515)

The old logic used to determine whether it was a partition table based on the number of buckets, but if I had a partition table with only one partition and the number of buckets in that partition was 1, it would be mistakenly recognized as a non partition table.

```
Table[test_load_doris_to_hive_2] is not partitioned
```
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
This PR apache#27515 change the logic if Table's `isPartitioned()` method.
But this method has 2 usages:

1. To check whether a table is range or list partitioned, for some DML operation such as Alter, Export.

    For this case, it should return true if the table is range or list partitioned. even if it has only
    one partition and one buckets.

2. To check whether the data is distributed (either by partitions or by buckets), for query planner.

    For this case, it should return true if table has more than one bucket. Even if this table is not
    range or list partitioned, if it has more than one bucket, it should return true.

So we should separate this method into 2, for different usages.
Otherwise, it may cause some unreasonable plan shape
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.4-merged p0_b reviewed usercase Important user case type label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants