[Fix](LZ4 compression) Fix wrong LZ4 compression max input size limit #41239

Merged (1 commit) on Sep 26, 2024

Yukang-Lian
Collaborator

Proposed changes

The maximum input size that LZ4 compression supports is LZ4_MAX_INPUT_SIZE, which is 0x7E000000 (2,113,929,216 bytes). Doris checks the input size against the wrong limit, INT_MAX (2,147,483,647). If the input size falls between these two values, it passes the check but LZ4 compression fails.

This PR fixes the check to use LZ4_MAX_INPUT_SIZE.
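
To make the gap between the two limits concrete, here is a minimal sketch of the old and new checks. It is not the actual Doris code: the helper names `passes_old_check`, `passes_new_check`, and `in_failure_window` are hypothetical, and the constant `kLz4MaxInputSize` is a local copy of the `LZ4_MAX_INPUT_SIZE` value quoted above so that the sketch compiles without the LZ4 headers. Whether the real check uses `<=` or `<` is an assumption here.

```cpp
#include <climits>
#include <cstdint>

// Local copy of LZ4_MAX_INPUT_SIZE (0x7E000000), redefined here only so the
// sketch does not depend on lz4.h. Real code should use the macro itself.
constexpr std::uint64_t kLz4MaxInputSize = 0x7E000000;  // 2,113,929,216 bytes

// Hypothetical helper: the old, too-permissive check against INT_MAX.
constexpr bool passes_old_check(std::uint64_t input_size) {
    return input_size <= static_cast<std::uint64_t>(INT_MAX);
}

// Hypothetical helper: the corrected check against the LZ4 limit.
constexpr bool passes_new_check(std::uint64_t input_size) {
    return input_size <= kLz4MaxInputSize;
}

// Sizes in this window passed the old check even though LZ4 rejects them.
constexpr bool in_failure_window(std::uint64_t input_size) {
    return passes_old_check(input_size) && !passes_new_check(input_size);
}

// The window spans INT_MAX - 0x7E000000 = 33,554,431 bytes (just under 32 MiB).
static_assert(static_cast<std::uint64_t>(INT_MAX) - kLz4MaxInputSize == 33554431ULL,
              "size of the window between the two limits");
```

With the corrected limit, an oversized input is rejected up front by the size check instead of reaching the LZ4 call and failing there.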

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.31% (9622/25789)
Line Coverage: 28.73% (79644/277263)
Region Coverage: 28.17% (41183/146201)
Branch Coverage: 24.80% (20986/84622)
Coverage Report: http://coverage.selectdb-in.cc/coverage/202faed6c1b7a705098d0a35e3b6b0a659345774_202faed6c1b7a705098d0a35e3b6b0a659345774/report/index.html

@doris-robot

TPC-H: Total hot run time: 40566 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 202faed6c1b7a705098d0a35e3b6b0a659345774, data reload: false

------ Round 1 ----------------------------------
q1	17600	7369	7223	7223
q2	2009	292	282	282
q3	12261	1045	1178	1045
q4	10558	738	773	738
q5	7755	2881	2854	2854
q6	237	150	149	149
q7	991	626	618	618
q8	9527	1880	1899	1880
q9	7576	6460	6338	6338
q10	6986	2280	2321	2280
q11	437	244	237	237
q12	400	213	209	209
q13	17773	2979	3003	2979
q14	243	215	212	212
q15	593	531	528	528
q16	696	607	629	607
q17	961	540	526	526
q18	7089	6656	6691	6656
q19	1386	982	1039	982
q20	583	300	278	278
q21	3897	3090	2979	2979
q22	1097	966	983	966
Total cold run time: 110655 ms
Total hot run time: 40566 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7240	7209	7193	7193
q2	325	224	237	224
q3	2974	2945	2942	2942
q4	1998	1832	1799	1799
q5	5687	5612	5650	5612
q6	239	150	142	142
q7	2226	1843	1816	1816
q8	3361	3485	3501	3485
q9	8857	8925	8935	8925
q10	3595	3809	3564	3564
q11	591	503	484	484
q12	815	624	631	624
q13	8246	3194	3139	3139
q14	330	272	277	272
q15	589	540	529	529
q16	708	686	666	666
q17	1819	1621	1607	1607
q18	8200	7708	7851	7708
q19	1718	1591	1460	1460
q20	2125	1894	1885	1885
q21	5427	5278	5329	5278
q22	1114	996	1013	996
Total cold run time: 68184 ms
Total hot run time: 60350 ms

@doris-robot

TPC-DS: Total hot run time: 191894 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 202faed6c1b7a705098d0a35e3b6b0a659345774, data reload: false

query1	943	381	396	381
query2	6404	2084	2043	2043
query3	8683	194	202	194
query4	34197	23592	23505	23505
query5	3448	482	487	482
query6	283	173	169	169
query7	4194	303	313	303
query8	316	234	226	226
query9	9564	2677	2656	2656
query10	454	277	273	273
query11	17923	15110	15198	15110
query12	159	98	95	95
query13	1530	473	423	423
query14	10586	7276	7377	7276
query15	281	173	182	173
query16	7870	498	518	498
query17	1737	592	578	578
query18	2096	314	312	312
query19	371	148	151	148
query20	114	112	114	112
query21	214	102	106	102
query22	4741	4407	4640	4407
query23	35258	34656	34395	34395
query24	11071	2797	2828	2797
query25	579	384	388	384
query26	768	159	153	153
query27	2132	289	287	287
query28	6687	2432	2421	2421
query29	734	423	409	409
query30	256	150	153	150
query31	1008	794	770	770
query32	99	51	53	51
query33	753	294	296	294
query34	898	499	503	499
query35	854	729	740	729
query36	1116	961	940	940
query37	148	92	83	83
query38	4012	3864	3937	3864
query39	1474	1431	1413	1413
query40	206	97	100	97
query41	52	47	48	47
query42	113	97	94	94
query43	535	500	496	496
query44	1228	818	793	793
query45	198	165	169	165
query46	1143	731	726	726
query47	1894	1788	1831	1788
query48	464	368	369	368
query49	880	417	402	402
query50	834	423	412	412
query51	7099	6997	7006	6997
query52	99	90	86	86
query53	259	195	183	183
query54	1234	473	474	473
query55	78	81	84	81
query56	280	272	268	268
query57	1238	1065	1120	1065
query58	258	240	235	235
query59	3110	2985	2967	2967
query60	315	288	281	281
query61	130	127	125	125
query62	871	659	678	659
query63	217	188	185	185
query64	3884	727	718	718
query65	3262	3211	3227	3211
query66	772	314	334	314
query67	16069	15499	15438	15438
query68	4486	575	572	572
query69	626	302	318	302
query70	1207	1173	1136	1136
query71	418	277	281	277
query72	7789	3948	3981	3948
query73	770	342	352	342
query74	10504	8949	8906	8906
query75	4137	2668	2665	2665
query76	3355	936	968	936
query77	735	304	287	287
query78	10134	9252	10032	9252
query79	1322	607	601	601
query80	2441	436	437	436
query81	596	246	241	241
query82	475	141	140	140
query83	320	141	132	132
query84	297	74	77	74
query85	1446	292	275	275
query86	426	303	289	289
query87	4555	4316	4287	4287
query88	3314	2459	2418	2418
query89	393	296	294	294
query90	2049	188	189	188
query91	182	161	140	140
query92	61	51	49	49
query93	1090	538	543	538
query94	1094	300	284	284
query95	359	254	256	254
query96	604	286	285	285
query97	3222	3097	3114	3097
query98	222	203	199	199
query99	1661	1271	1292	1271
Total cold run time: 300967 ms
Total hot run time: 191894 ms

@doris-robot

ClickBench: Total hot run time: 33.73 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 202faed6c1b7a705098d0a35e3b6b0a659345774, data reload: false

query1	0.04	0.04	0.04
query2	0.06	0.03	0.03
query3	0.23	0.07	0.06
query4	1.65	0.10	0.10
query5	0.52	0.51	0.51
query6	1.14	0.73	0.72
query7	0.01	0.01	0.01
query8	0.04	0.03	0.03
query9	0.55	0.50	0.50
query10	0.56	0.58	0.55
query11	0.13	0.10	0.10
query12	0.14	0.11	0.10
query13	0.61	0.60	0.60
query14	2.96	2.98	3.00
query15	0.91	0.83	0.83
query16	0.38	0.38	0.42
query17	1.04	1.04	1.06
query18	0.19	0.20	0.20
query19	1.91	1.84	1.96
query20	0.01	0.01	0.01
query21	15.36	0.58	0.58
query22	2.42	2.09	2.10
query23	17.22	0.85	0.94
query24	2.77	1.41	1.79
query25	0.23	0.09	0.09
query26	0.54	0.15	0.13
query27	0.03	0.04	0.04
query28	9.97	1.09	1.06
query29	12.51	3.26	3.26
query30	0.24	0.06	0.05
query31	2.88	0.37	0.38
query32	3.29	0.48	0.46
query33	2.96	3.02	3.00
query34	16.65	4.46	4.46
query35	4.49	4.49	4.61
query36	0.69	0.48	0.49
query37	0.09	0.07	0.05
query38	0.04	0.03	0.03
query39	0.03	0.02	0.02
query40	0.15	0.13	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.02	0.03
Total cold run time: 105.79 s
Total hot run time: 33.73 s

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved and reviewed labels on Sep 24, 2024
Contributor

PR approved by anyone and no changes requested.

@wm1581066 wm1581066 added the p0_c label Sep 25, 2024
Collaborator

@TangSiyang2001 TangSiyang2001 left a comment

LGTM

@dataroaring dataroaring merged commit cfc887d into apache:master Sep 26, 2024
26 of 31 checks passed
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Sep 30, 2024
…apache#41239)
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Sep 30, 2024
…apache#41239)
yiguolei pushed a commit that referenced this pull request Oct 1, 2024
…compression max input size limit (#41239)" (#41505)
Labels
approved (indicates a PR has been approved by one committer), dev/2.0.x, dev/2.1.x, dev/3.0.x, reviewed
6 participants