[feature](hive) support hive write text table #38549

Merged

@suxiaogang223 (Contributor) commented Jul 30, 2024

Proposed changes

  1. Support writing Hive text tables
  2. Add the session variable hive_text_compression for writing compressed Hive text tables
  3. Supported compression types: gzip, bzip2, snappy, lz4, zstd

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@suxiaogang223 marked this pull request as draft July 30, 2024 15:21

@github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

be/src/util/block_compression.h

@github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

be/src/util/block_compression.cpp (outdated)

@github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

be/src/util/block_compression.cpp (outdated)
be/src/util/block_compression.cpp (outdated)

@suxiaogang223 marked this pull request as ready for review August 2, 2024 08:33
@suxiaogang223 force-pushed the hive_text_write_and_compression branch from c7b1696 to 321ec9b August 12, 2024 12:53

@suxiaogang223 (Contributor Author)

run buildall

4 similar comments: one from @morningman (Contributor) and three from @suxiaogang223 (Contributor Author)

@doris-robot

TPC-H: Total hot run time: 38864 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f69e65833c13b4aa935d6f9e0b1eb0ba77f1a1d0, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	18610	4510	4355	4355
q2	2064	221	226	221
q3	11666	966	1087	966
q4	10542	789	817	789
q5	7813	2904	2891	2891
q6	275	160	160	160
q7	1038	673	653	653
q8	9384	2151	2176	2151
q9	7166	6595	6638	6595
q10	7076	2218	2295	2218
q11	496	289	285	285
q12	445	261	265	261
q13	18865	3036	3038	3036
q14	326	266	264	264
q15	569	539	534	534
q16	525	420	422	420
q17	1052	726	706	706
q18	7541	6837	6905	6837
q19	6529	1079	1145	1079
q20	729	385	378	378
q21	3915	3086	3025	3025
q22	1113	1040	1054	1040
Total cold run time: 117739 ms
Total hot run time: 38864 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4593	4295	4320	4295
q2	415	318	311	311
q3	2900	2675	2725	2675
q4	2002	1716	1738	1716
q5	5657	5719	5686	5686
q6	248	155	156	155
q7	2177	1807	1757	1757
q8	3350	3580	3514	3514
q9	8934	8845	8788	8788
q10	3671	3329	3304	3304
q11	648	522	553	522
q12	853	687	648	648
q13	17366	3200	3076	3076
q14	326	306	282	282
q15	562	518	518	518
q16	508	467	497	467
q17	1832	1588	1562	1562
q18	8403	7933	7651	7651
q19	7928	1713	1666	1666
q20	2206	1938	1882	1882
q21	14117	5393	5319	5319
q22	1197	1082	1070	1070
Total cold run time: 89893 ms
Total hot run time: 56864 ms
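
The bot does not label the columns of these tables. Reading them as query name, cold run, two hot runs, and the best hot run reproduces the totals reported for Round 1 (cold 117739 ms, hot 38864 ms). The following is a minimal sketch of that arithmetic, not the bot's actual script; the class and method names are hypothetical, and it handles only the integer millisecond tables.

```java
import java.util.Arrays;

// Hypothetical helper: recomputes the "Total cold/hot run time" lines
// from rows shaped like "q1 <cold> <hot1> <hot2> [<best>]".
class BenchTotals {
    // Returns {coldTotal, hotTotal}. The hot time of a query is taken as
    // min(hot1, hot2); any non-data line (separators, totals) is skipped.
    static long[] totals(String table) {
        long cold = 0;
        long hot = 0;
        for (String raw : table.split("\n")) {
            String line = raw.trim();
            // A data row is a name followed by three or four integer columns.
            if (!line.matches("\\S+(\\s+\\d+){3,4}")) {
                continue;
            }
            String[] f = line.split("\\s+");
            cold += Long.parseLong(f[1]);
            hot += Math.min(Long.parseLong(f[2]), Long.parseLong(f[3]));
        }
        return new long[] {cold, hot};
    }

    public static void main(String[] args) {
        String rows = "q1\t18610\t4510\t4355\t4355\n"
                + "q2\t2064\t221\t226\t221\n"
                + "q3\t11666\t966\t1087\t966\n";
        System.out.println(Arrays.toString(totals(rows))); // [32340, 5542]
    }
}
```

Taking the minimum of the two hot runs, rather than their mean, is what matches the reported hot totals.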

be/src/util/slice.h
@@ -1111,6 +1113,9 @@ public class SessionVariable implements Serializable, Writable {
"set the number of sort phases 1 or 2. if set other value, let cbo decide the sort type"})
public int sortPhaseNum = 0;

@VariableMgr.VarAttr(name = HIVE_TEXT_COMPRESSION, needForward = true)
private String hiveTextCompression = "uncompressed";
Contributor

What is this variable for?

Contributor Author

It determines the compression format of the files written to a Hive text table.
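
To make the answer concrete, here is a minimal sketch of how a string value such as the default "uncompressed" could be mapped onto the codecs listed in the PR description. The enum and its parse method are hypothetical and are not taken from the Doris code base; case-insensitive matching is also an assumption.

```java
import java.util.Locale;

// Hypothetical enum: the constants mirror the compression types named in this PR
// plus the "uncompressed" default shown in the diff.
enum HiveTextCodec {
    UNCOMPRESSED, GZIP, BZIP2, SNAPPY, LZ4, ZSTD;

    // Parses a session-variable value, ignoring case and surrounding whitespace
    // (an assumption), and rejects anything outside the supported set.
    static HiveTextCodec parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("hive_text_compression must not be null");
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported hive_text_compression: " + value);
        }
    }
}

class HiveTextCodecDemo {
    public static void main(String[] args) {
        System.out.println(HiveTextCodec.parse("uncompressed")); // UNCOMPRESSED
        System.out.println(HiveTextCodec.parse("zstd"));         // ZSTD
    }
}
```

Rejecting unknown values up front means a misspelled codec name fails when it is parsed instead of surfacing later while files are being written.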

@suxiaogang223 force-pushed the hive_text_write_and_compression branch from f69e658 to 3ecc571 August 23, 2024 07:22

@suxiaogang223 (Contributor Author)

run buildall

1 similar comment

@doris-robot

TPC-H: Total hot run time: 38232 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d02eade5909ff740c7b144aa388ba4ac7c90e935, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17937	4570	4323	4323
q2	2028	187	179	179
q3	11816	941	1131	941
q4	10518	806	720	720
q5	7732	2864	2852	2852
q6	227	140	140	140
q7	972	619	608	608
q8	9354	2090	2078	2078
q9	7286	6530	6569	6530
q10	7002	2264	2280	2264
q11	443	245	242	242
q12	401	226	228	226
q13	17993	3046	3035	3035
q14	272	238	240	238
q15	525	492	499	492
q16	495	391	386	386
q17	998	691	684	684
q18	7626	6805	6944	6805
q19	1390	1006	1059	1006
q20	682	346	338	338
q21	4037	3118	3155	3118
q22	1114	1045	1027	1027
Total cold run time: 110848 ms
Total hot run time: 38232 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4350	4298	4259	4259
q2	374	272	278	272
q3	2931	2694	2686	2686
q4	1965	1649	1714	1649
q5	5667	5688	5870	5688
q6	227	136	135	135
q7	2265	1916	1838	1838
q8	3308	3523	3504	3504
q9	8848	8814	8832	8814
q10	3599	3334	3304	3304
q11	600	518	538	518
q12	838	672	713	672
q13	12711	3192	3338	3192
q14	345	287	287	287
q15	533	496	489	489
q16	484	444	451	444
q17	1835	1551	1531	1531
q18	8220	7851	7803	7803
q19	1777	1566	1707	1566
q20	2145	1926	1937	1926
q21	5906	5528	5811	5528
q22	1130	1046	1065	1046
Total cold run time: 70058 ms
Total hot run time: 57151 ms

@doris-robot

TPC-DS: Total hot run time: 191595 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d02eade5909ff740c7b144aa388ba4ac7c90e935, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1272	888	861	861
query2	6347	1932	1891	1891
query3	10604	4029	3944	3944
query4	59624	26350	23227	23227
query5	5365	499	492	492
query6	409	161	160	160
query7	5740	296	308	296
query8	291	218	213	213
query9	8960	2481	2466	2466
query10	483	286	261	261
query11	17838	15021	15192	15021
query12	146	106	107	106
query13	1530	419	414	414
query14	10955	6774	6536	6536
query15	247	174	174	174
query16	7536	448	484	448
query17	1116	592	590	590
query18	2082	323	307	307
query19	333	159	156	156
query20	124	113	192	113
query21	214	102	106	102
query22	4745	4302	4721	4302
query23	34116	33635	34460	33635
query24	5956	2868	2938	2868
query25	515	398	404	398
query26	672	159	160	159
query27	1762	284	288	284
query28	3948	2051	2131	2051
query29	660	437	432	432
query30	239	153	150	150
query31	963	744	774	744
query32	78	60	59	59
query33	486	299	299	299
query34	878	476	485	476
query35	887	744	727	727
query36	1069	917	912	912
query37	139	90	89	89
query38	4111	3878	3822	3822
query39	1472	1382	1417	1382
query40	209	119	119	119
query41	47	47	46	46
query42	121	104	98	98
query43	510	468	478	468
query44	1106	749	753	749
query45	200	165	169	165
query46	1100	740	733	733
query47	1884	1796	1828	1796
query48	375	313	311	311
query49	777	439	443	439
query50	823	425	425	425
query51	7178	7038	6928	6928
query52	99	89	91	89
query53	255	185	180	180
query54	572	462	463	462
query55	84	79	78	78
query56	284	261	275	261
query57	1205	1053	1061	1053
query58	227	235	248	235
query59	3091	3092	3081	3081
query60	286	273	266	266
query61	100	98	107	98
query62	750	662	654	654
query63	218	182	184	182
query64	3569	2048	1746	1746
query65	3169	3202	3154	3154
query66	668	355	329	329
query67	15725	15377	15201	15201
query68	4357	572	560	560
query69	663	363	284	284
query70	1118	1078	1056	1056
query71	459	287	275	275
query72	2591	2125	2060	2060
query73	732	328	329	328
query74	9327	8774	8775	8774
query75	3407	2778	2681	2681
query76	2430	1038	973	973
query77	680	328	320	320
query78	9761	9102	8970	8970
query79	1045	545	528	528
query80	760	507	491	491
query81	460	224	223	223
query82	289	137	141	137
query83	174	172	156	156
query84	263	78	80	78
query85	798	285	276	276
query86	306	302	293	293
query87	4482	4308	4312	4308
query88	3076	2386	2379	2379
query89	396	292	285	285
query90	2093	196	195	195
query91	124	100	100	100
query92	67	52	54	52
query93	1104	543	547	543
query94	783	298	294	294
query95	357	268	270	268
query96	601	280	273	273
query97	3163	3088	3034	3034
query98	218	201	203	201
query99	1814	1287	1288	1287
Total cold run time: 306442 ms
Total hot run time: 191595 ms

@doris-robot

ClickBench: Total hot run time: 31.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d02eade5909ff740c7b144aa388ba4ac7c90e935, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.23	0.06	0.06
query4	1.65	0.08	0.08
query5	0.50	0.50	0.51
query6	1.13	0.73	0.73
query7	0.01	0.01	0.01
query8	0.05	0.05	0.04
query9	0.55	0.49	0.49
query10	0.58	0.55	0.54
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.61	0.60
query14	0.77	0.80	0.79
query15	0.84	0.83	0.85
query16	0.37	0.39	0.38
query17	1.07	1.03	1.06
query18	0.21	0.20	0.21
query19	1.95	1.78	1.87
query20	0.01	0.01	0.01
query21	15.40	0.67	0.67
query22	3.51	8.53	2.03
query23	18.28	1.38	1.33
query24	2.14	0.23	0.23
query25	0.14	0.09	0.08
query26	0.27	0.18	0.17
query27	0.08	0.09	0.07
query28	13.20	1.03	1.01
query29	12.66	3.34	3.27
query30	0.25	0.06	0.06
query31	2.88	0.42	0.40
query32	3.23	0.49	0.49
query33	2.98	3.01	3.01
query34	17.17	4.44	4.38
query35	4.48	4.42	4.39
query36	0.66	0.46	0.49
query37	0.20	0.16	0.16
query38	0.16	0.15	0.15
query39	0.05	0.04	0.03
query40	0.16	0.13	0.13
query41	0.08	0.04	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.05 s
Total hot run time: 31.04 s

@suxiaogang223 (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 38151 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cd950118a8da1725b067e9bb2665069847305c16, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	18444	4731	4375	4375
q2	2022	188	179	179
q3	11318	970	1021	970
q4	10242	763	652	652
q5	7741	2854	2780	2780
q6	229	142	139	139
q7	985	608	626	608
q8	9332	2104	2134	2104
q9	7333	6585	6609	6585
q10	7005	2267	2208	2208
q11	460	242	241	241
q12	398	220	226	220
q13	17769	3074	3023	3023
q14	285	243	239	239
q15	531	496	508	496
q16	495	408	386	386
q17	1008	700	720	700
q18	7401	6976	6906	6906
q19	1403	1083	989	989
q20	658	334	324	324
q21	4375	3083	3031	3031
q22	1151	996	1018	996
Total cold run time: 110585 ms
Total hot run time: 38151 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4382	4335	4291	4291
q2	380	268	274	268
q3	2918	2699	2700	2699
q4	2034	1664	1681	1664
q5	5657	5700	5793	5700
q6	233	132	135	132
q7	2181	1832	1809	1809
q8	3334	3475	3543	3475
q9	8879	8822	8800	8800
q10	3553	3424	3415	3415
q11	600	506	516	506
q12	856	679	679	679
q13	14028	3043	3222	3043
q14	328	292	289	289
q15	550	503	494	494
q16	509	450	430	430
q17	1847	1578	1551	1551
q18	8179	7879	8025	7879
q19	1773	1662	1572	1572
q20	2157	1899	1929	1899
q21	5839	5453	5548	5453
q22	1127	1020	1002	1002
Total cold run time: 71344 ms
Total hot run time: 57050 ms

@doris-robot

TPC-DS: Total hot run time: 191796 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cd950118a8da1725b067e9bb2665069847305c16, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1266	890	854	854
query2	6316	1984	1955	1955
query3	10663	4065	3996	3996
query4	59549	26056	23123	23123
query5	5400	509	512	509
query6	400	157	156	156
query7	5779	294	299	294
query8	282	207	222	207
query9	8913	2515	2503	2503
query10	490	275	272	272
query11	17981	15091	15472	15091
query12	161	107	104	104
query13	1542	393	395	393
query14	11142	7314	6607	6607
query15	244	180	175	175
query16	7535	480	449	449
query17	1138	570	561	561
query18	2043	300	293	293
query19	284	147	150	147
query20	120	108	134	108
query21	206	101	103	101
query22	4674	4445	4416	4416
query23	34522	33835	33568	33568
query24	6022	2853	2895	2853
query25	545	401	400	400
query26	692	159	158	158
query27	1788	288	284	284
query28	3719	2100	2069	2069
query29	707	429	425	425
query30	241	154	148	148
query31	944	764	765	764
query32	83	56	58	56
query33	475	296	303	296
query34	868	483	473	473
query35	840	715	728	715
query36	1076	906	953	906
query37	139	84	85	84
query38	3953	3873	3848	3848
query39	1481	1398	1420	1398
query40	198	123	118	118
query41	48	49	47	47
query42	117	101	99	99
query43	509	480	482	480
query44	1117	763	757	757
query45	197	167	169	167
query46	1089	758	767	758
query47	1890	1797	1792	1792
query48	388	307	304	304
query49	771	435	447	435
query50	833	421	425	421
query51	7173	7084	6922	6922
query52	104	92	88	88
query53	257	182	182	182
query54	582	460	474	460
query55	81	77	75	75
query56	287	265	266	265
query57	1170	1063	1049	1049
query58	222	234	234	234
query59	3053	2711	2711	2711
query60	301	280	272	272
query61	126	123	126	123
query62	774	666	660	660
query63	222	189	186	186
query64	4345	2339	1936	1936
query65	3208	3157	3170	3157
query66	672	330	328	328
query67	15461	15218	15259	15218
query68	5656	571	554	554
query69	460	279	280	279
query70	1206	1124	1114	1114
query71	491	280	276	276
query72	6486	2293	2035	2035
query73	1109	322	326	322
query74	9596	8876	8892	8876
query75	3426	2678	2785	2678
query76	3230	1031	1022	1022
query77	571	318	321	318
query78	9735	9017	8997	8997
query79	1646	535	541	535
query80	1084	531	509	509
query81	527	231	225	225
query82	427	138	135	135
query83	274	147	146	146
query84	262	73	76	73
query85	894	288	319	288
query86	406	313	280	280
query87	4504	4315	4256	4256
query88	3080	2336	2329	2329
query89	398	282	284	282
query90	2062	201	197	197
query91	124	101	100	100
query92	61	51	54	51
query93	2187	549	550	549
query94	762	270	304	270
query95	349	269	266	266
query96	601	272	271	271
query97	3252	3076	3063	3063
query98	225	210	213	210
query99	1570	1287	1290	1287
Total cold run time: 316079 ms
Total hot run time: 191796 ms

@doris-robot

ClickBench: Total hot run time: 31.01 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cd950118a8da1725b067e9bb2665069847305c16, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.04
query2	0.08	0.05	0.04
query3	0.23	0.06	0.05
query4	1.65	0.08	0.09
query5	0.50	0.50	0.48
query6	1.12	0.74	0.73
query7	0.03	0.02	0.02
query8	0.05	0.05	0.05
query9	0.55	0.48	0.49
query10	0.56	0.53	0.55
query11	0.15	0.12	0.11
query12	0.16	0.12	0.12
query13	0.63	0.59	0.59
query14	0.76	0.80	0.77
query15	0.85	0.83	0.84
query16	0.36	0.38	0.37
query17	1.01	1.05	1.02
query18	0.21	0.21	0.21
query19	1.84	1.76	1.74
query20	0.02	0.01	0.01
query21	15.40	0.66	0.65
query22	4.26	7.07	2.10
query23	18.32	1.44	1.27
query24	2.10	0.24	0.22
query25	0.16	0.08	0.09
query26	0.27	0.18	0.18
query27	0.07	0.07	0.08
query28	13.28	1.02	1.01
query29	12.55	3.38	3.39
query30	0.24	0.06	0.05
query31	2.85	0.40	0.40
query32	3.26	0.48	0.48
query33	3.01	2.96	2.99
query34	17.07	4.44	4.43
query35	4.44	4.40	4.40
query36	0.66	0.49	0.46
query37	0.19	0.16	0.15
query38	0.17	0.16	0.16
query39	0.04	0.04	0.04
query40	0.16	0.13	0.14
query41	0.09	0.04	0.05
query42	0.06	0.04	0.04
query43	0.05	0.05	0.04
Total cold run time: 109.5 s
Total hot run time: 31.01 s

@morningman merged commit 8439376 into apache:master Aug 28, 2024
28 of 32 checks passed
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Aug 28, 2024
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd
yiguolei pushed a commit that referenced this pull request Aug 29, 2024
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd

pick from #38549
yiguolei pushed a commit to yiguolei/incubator-doris that referenced this pull request Aug 29, 2024
yiguolei added a commit that referenced this pull request Aug 30, 2024
#40157)

…0063)"

This reverts commit c6df7c2.

Co-authored-by: yiguolei <yiguolei@gmail.com>
morningman pushed a commit that referenced this pull request Sep 4, 2024
followup #38549
If the large_block_len is 0, should not continue reading the block_len.
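
The follow-up message above is terse. Assuming it refers to Hadoop-style block framing, where a 4-byte big-endian large_block_len (the uncompressed size) is followed by one or more chunks that each begin with a 4-byte block_len, the guard can be sketched as below. This is an illustration in Java, not the Doris backend code; BlockFrameReader and ChunkDecoder are hypothetical names, and the decoder stands in for a real lz4 or snappy call.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical hook standing in for a real codec (lz4, snappy, ...).
interface ChunkDecoder {
    byte[] decode(byte[] compressed) throws IOException;
}

class BlockFrameReader {
    // Reads one large block laid out as
    //   [int large_block_len][int block_len][block bytes]...
    // and returns the decoded bytes. When large_block_len is 0 there is no
    // block_len to read, so the method returns immediately.
    static byte[] readLargeBlock(DataInputStream in, ChunkDecoder decoder) throws IOException {
        int largeBlockLen = in.readInt(); // big-endian uncompressed length
        if (largeBlockLen == 0) {
            return new byte[0]; // the guard: do not go on to read a block_len
        }
        if (largeBlockLen < 0) {
            throw new IOException("negative large_block_len: " + largeBlockLen);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(largeBlockLen);
        while (out.size() < largeBlockLen) {
            int blockLen = in.readInt();
            if (blockLen < 0) {
                throw new IOException("negative block_len: " + blockLen);
            }
            byte[] chunk = new byte[blockLen];
            in.readFully(chunk);
            byte[] decoded = decoder.decode(chunk);
            if (decoded.length == 0) {
                // Without progress the loop would never terminate.
                throw new IOException("chunk decoded to zero bytes");
            }
            out.write(decoded);
        }
        return out.toByteArray();
    }
}
```

Without the zero check, a reader in this layout would interpret whatever follows an empty block as a block_len, or hit end-of-file while trying to read one.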
@suxiaogang223 deleted the hive_text_write_and_compression branch September 9, 2024 06:17
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Sep 9, 2024
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Sep 9, 2024
followup apache#38549
If the large_block_len is 0, should not continue reading the block_len.
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Sep 25, 2024
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd
morningman pushed a commit that referenced this pull request Sep 27, 2024
## Proposed changes
pick prs:
#38549
#40183
#40315

---------

Co-authored-by: Calvin Kirs <kirs@apache.org>
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Oct 10, 2024
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Oct 10, 2024
followup apache#38549
If the large_block_len is 0, should not continue reading the block_len.
morningman pushed a commit that referenced this pull request Oct 10, 2024
6 participants