[fix](compaction) fix mismatch between segment key and value column rows during compaction #37960

Merged
dataroaring merged 7 commits into apache:master on Jul 22, 2024

Conversation

@luwei16 luwei16 commented Jul 16, 2024

When a block is split across 3 segments, the old code handles only the first 2, so the rows belonging to the last segment overflow.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@luwei16
Copy link
Contributor Author

luwei16 commented Jul 16, 2024

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 40049 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bf709f55be03f0785e24b380d8a8c16d08e06aec, data reload: false

------ Round 1 ----------------------------------
q1	17595	4706	4316	4316
q2	2015	191	181	181
q3	10450	1187	1112	1112
q4	10180	815	771	771
q5	7553	2703	2681	2681
q6	218	136	138	136
q7	952	600	597	597
q8	9279	2090	2067	2067
q9	8745	6616	6562	6562
q10	8781	3775	3738	3738
q11	450	237	243	237
q12	433	225	226	225
q13	17966	2949	2994	2949
q14	266	234	255	234
q15	520	476	482	476
q16	504	373	386	373
q17	974	781	782	781
q18	7997	7411	7405	7405
q19	7784	1424	1435	1424
q20	695	316	325	316
q21	4905	3185	3336	3185
q22	344	291	283	283
Total cold run time: 118606 ms
Total hot run time: 40049 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4427	4251	4253	4251
q2	365	263	250	250
q3	3081	2962	2910	2910
q4	1969	1693	1742	1693
q5	5591	5543	5432	5432
q6	232	136	131	131
q7	2191	1883	1848	1848
q8	3264	3397	3405	3397
q9	8818	8799	8873	8799
q10	4128	3883	3786	3786
q11	599	482	511	482
q12	811	633	620	620
q13	17129	3206	3189	3189
q14	316	290	295	290
q15	532	493	481	481
q16	492	424	440	424
q17	1835	1562	1519	1519
q18	7969	8051	7642	7642
q19	1747	1678	1578	1578
q20	2144	1868	1864	1864
q21	5135	4790	4695	4695
q22	579	500	514	500
Total cold run time: 73354 ms
Total hot run time: 55781 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 173552 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bf709f55be03f0785e24b380d8a8c16d08e06aec, data reload: false

query1	922	388	365	365
query2	6436	1929	1840	1840
query3	6641	210	224	210
query4	28422	17522	17200	17200
query5	3622	482	491	482
query6	295	185	164	164
query7	4582	284	288	284
query8	240	196	195	195
query9	8557	2394	2376	2376
query10	423	285	263	263
query11	10599	10063	10267	10063
query12	113	82	81	81
query13	1652	366	359	359
query14	10153	7654	7565	7565
query15	217	165	167	165
query16	7711	313	310	310
query17	1759	544	517	517
query18	1855	287	272	272
query19	193	147	151	147
query20	89	82	79	79
query21	203	132	120	120
query22	4322	4040	3979	3979
query23	33986	33906	33630	33630
query24	11197	2971	2868	2868
query25	662	439	436	436
query26	910	154	150	150
query27	2396	283	286	283
query28	6553	2057	2071	2057
query29	911	624	644	624
query30	257	155	156	155
query31	944	759	785	759
query32	98	54	56	54
query33	755	292	303	292
query34	1045	506	520	506
query35	715	583	599	583
query36	1175	995	1000	995
query37	145	86	81	81
query38	2995	2876	2832	2832
query39	893	824	835	824
query40	201	120	120	120
query41	45	43	43	43
query42	121	100	98	98
query43	495	481	477	477
query44	1211	746	719	719
query45	191	162	161	161
query46	1082	746	734	734
query47	1855	1755	1803	1755
query48	379	293	289	289
query49	844	393	407	393
query50	778	395	389	389
query51	6892	6862	6813	6813
query52	99	96	91	91
query53	357	296	289	289
query54	862	451	440	440
query55	78	74	75	74
query56	282	263	267	263
query57	1138	1061	1042	1042
query58	255	246	263	246
query59	2885	2631	2748	2631
query60	342	270	276	270
query61	106	92	95	92
query62	818	649	651	649
query63	321	281	292	281
query64	9487	2208	1648	1648
query65	3193	3121	3114	3114
query66	759	323	327	323
query67	15361	15016	15042	15016
query68	4529	540	546	540
query69	565	429	347	347
query70	1204	1124	1150	1124
query71	393	289	281	281
query72	7415	5910	5525	5525
query73	761	333	328	328
query74	6061	5702	5651	5651
query75	3488	2684	2721	2684
query76	2197	928	978	928
query77	415	302	300	300
query78	9708	9040	9053	9040
query79	3248	523	523	523
query80	2917	466	477	466
query81	599	222	218	218
query82	1065	140	135	135
query83	321	173	172	172
query84	276	88	90	88
query85	2056	312	307	307
query86	474	298	311	298
query87	3304	3132	3110	3110
query88	4671	2396	2392	2392
query89	483	385	403	385
query90	1769	187	190	187
query91	128	98	101	98
query92	63	49	50	49
query93	4210	517	502	502
query94	1138	210	209	209
query95	416	313	313	313
query96	611	282	277	277
query97	3247	3026	3009	3009
query98	218	202	194	194
query99	1655	1247	1271	1247
Total cold run time: 284685 ms
Total hot run time: 173552 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.82 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bf709f55be03f0785e24b380d8a8c16d08e06aec, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.22	0.05	0.04
query4	1.67	0.07	0.08
query5	0.49	0.49	0.50
query6	1.14	0.73	0.72
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.55	0.50	0.49
query10	0.53	0.53	0.53
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.59	0.58	0.59
query14	0.76	0.76	0.77
query15	0.86	0.83	0.81
query16	0.35	0.37	0.39
query17	0.94	0.95	0.96
query18	0.22	0.22	0.22
query19	1.83	1.72	1.70
query20	0.01	0.01	0.01
query21	15.39	0.74	0.66
query22	4.75	6.66	2.12
query23	18.24	1.42	1.27
query24	2.11	0.24	0.24
query25	0.15	0.08	0.09
query26	0.29	0.21	0.20
query27	0.45	0.23	0.24
query28	13.25	1.02	0.99
query29	12.58	3.29	3.29
query30	0.24	0.06	0.05
query31	2.87	0.40	0.38
query32	3.26	0.48	0.47
query33	2.92	2.90	2.96
query34	17.02	4.39	4.42
query35	4.40	4.42	4.50
query36	0.66	0.46	0.48
query37	0.18	0.16	0.15
query38	0.15	0.15	0.15
query39	0.05	0.03	0.04
query40	0.15	0.12	0.13
query41	0.08	0.05	0.04
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.94 s
Total hot run time: 30.82 s

}
// when splitting segment, need to make rows align between key columns and value columns
if (num_rows_written + limit >= num_rows_key_group &&
_cur_writer_idx < _segment_writers.size() - 1) {

to_written = num_rows_written + limit >= num_rows_key_group
        ? num_rows_key_group - num_rows_written
        : limit;
_segment_writers[_cur_writer_idx]->append_block(block, start_offset, to_written);
start_offset += to_written;
limit -= to_written;
if (limit > 0) {
    ++_cur_writer_idx;
    // switch to next writer
    RETURN_IF_ERROR(_segment_writers[_cur_writer_idx]->init(col_ids, is_key));
    num_rows_written = 0;
    num_rows_key_group = _segment_writers[_cur_writer_idx]->row_count();
}
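
The splitting discussed in this thread can be illustrated with a small standalone sketch. It is not the Doris implementation: `FakeSegmentWriter`, `SplitState`, `append_value_rows` and `value_rows_per_segment` are hypothetical names, and the writers only count rows instead of writing data. The assumption is that each segment already knows how many key rows it holds, and that value rows must be distributed so every segment ends up with the same count. The point is the loop: one block may cross any number of segment boundaries, not just one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a segment writer; it only counts rows.
struct FakeSegmentWriter {
    std::size_t key_rows;    // rows already written for the key columns
    std::size_t value_rows;  // rows appended so far for the value columns
};

// Hypothetical cursor that persists from one block to the next.
struct SplitState {
    std::size_t cur_writer_idx = 0;
    std::size_t num_rows_written = 0;  // value rows in the current writer
};

// Append `limit` value rows, switching writers as often as needed so that
// each non-final segment receives exactly as many value rows as key rows.
void append_value_rows(std::vector<FakeSegmentWriter>& writers, SplitState& st,
                       std::size_t limit) {
    assert(!writers.empty());
    while (limit > 0) {
        FakeSegmentWriter& w = writers[st.cur_writer_idx];
        const bool is_last = st.cur_writer_idx + 1 == writers.size();
        std::size_t to_write = limit;
        if (!is_last) {
            // Never write past the key row count of the current segment.
            to_write = std::min(limit, w.key_rows - st.num_rows_written);
        }
        w.value_rows += to_write;
        st.num_rows_written += to_write;
        limit -= to_write;
        if (!is_last && st.num_rows_written == w.key_rows) {
            // Current segment is full: switch to the next writer.
            ++st.cur_writer_idx;
            st.num_rows_written = 0;
        }
    }
}

// Helper for experiments: feed blocks of the given sizes and report how many
// value rows each segment received.
std::vector<std::size_t> value_rows_per_segment(
        const std::vector<std::size_t>& key_rows,
        const std::vector<std::size_t>& block_sizes) {
    std::vector<FakeSegmentWriter> writers;
    for (std::size_t rows : key_rows) {
        FakeSegmentWriter w;
        w.key_rows = rows;
        w.value_rows = 0;
        writers.push_back(w);
    }
    SplitState st;
    for (std::size_t n : block_sizes) {
        append_value_rows(writers, st, n);
    }
    std::vector<std::size_t> result;
    for (const FakeSegmentWriter& w : writers) {
        result.push_back(w.value_rows);
    }
    return result;
}
```

With key segments of 3, 3 and 4 rows, a single 10-row block crosses two boundaries, and `value_rows_per_segment({3, 3, 4}, {10})` yields 3, 3 and 4 value rows. A single `if` that switches writers once would instead put the remaining 7 rows into the second segment.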

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@luwei16
Copy link
Contributor Author

luwei16 commented Jul 19, 2024

run buildall

@luwei16
Copy link
Contributor Author

luwei16 commented Jul 19, 2024

run performance

dataroaring
dataroaring previously approved these changes Jul 19, 2024
Copy link
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 19, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jul 20, 2024
@luwei16
Copy link
Contributor Author

luwei16 commented Jul 20, 2024

run buildall

Copy link
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 20, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 7d2e431 into apache:master Jul 22, 2024
26 of 28 checks passed
dataroaring pushed a commit that referenced this pull request Jul 24, 2024
dataroaring pushed a commit that referenced this pull request Jul 24, 2024
…ows during compaction (#37960)

When a block is split across 3 segments, the old code handles only the first 2, so the
rows belonging to the last segment overflow.
dataroaring pushed a commit that referenced this pull request Jul 24, 2024
gavinchou pushed a commit that referenced this pull request Jul 25, 2024
dataroaring pushed a commit that referenced this pull request Jul 27, 2024
luwei16 added a commit to luwei16/incubator-doris that referenced this pull request Aug 4, 2024
…ows during compaction (apache#37960)

luwei16 added a commit to luwei16/incubator-doris that referenced this pull request Aug 4, 2024
luwei16 added a commit to luwei16/incubator-doris that referenced this pull request Aug 4, 2024
dataroaring pushed a commit that referenced this pull request Aug 5, 2024
luwei16 added a commit to luwei16/incubator-doris that referenced this pull request Aug 13, 2024
…ows during compaction (apache#37960)

luwei16 added a commit to luwei16/incubator-doris that referenced this pull request Aug 13, 2024
luwei16 added a commit to luwei16/incubator-doris that referenced this pull request Aug 13, 2024
dataroaring pushed a commit that referenced this pull request Aug 14, 2024
GoGoWen pushed a commit to GoGoWen/incubator-doris that referenced this pull request Aug 27, 2024
4 participants