[Feature](multi-catalog) Add memory tracker for orc reader/writer and arrow parquet writer. #37234

Merged
merged 1 commit into from
Jul 30, 2024

@kaka11chen kaka11chen commented Jul 3, 2024

Proposed changes

[Feature] (multi-catalog) Add memory tracker for orc reader/writer and arrow parquet writer.
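
For readers unfamiliar with the approach, here is a minimal sketch of how a tracking memory pool can work. It assumes an interface shaped like Apache ORC's `orc::MemoryPool` (`malloc(uint64_t)` and `free(char*)`). `PoolInterface`, `TrackingPool`, and `tracked_bytes()` are hypothetical names used only for illustration; this is not the code in this PR, and the size header is just one possible way to learn the allocation size at free time.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for an interface shaped like orc::MemoryPool.
class PoolInterface {
public:
    virtual ~PoolInterface() = default;
    virtual char* malloc(uint64_t size) = 0;
    virtual void free(char* p) = 0;
};

// Illustrative pool that records how many payload bytes are currently live.
class TrackingPool final : public PoolInterface {
public:
    char* malloc(uint64_t size) override {
        // Reserve a header in front of the payload to remember its size.
        void* raw = std::malloc(kHeader + size);
        if (raw == nullptr) {
            throw std::bad_alloc();
        }
        std::memcpy(raw, &size, sizeof(size));
        _tracked_bytes.fetch_add(static_cast<int64_t>(size), std::memory_order_relaxed);
        return static_cast<char*>(raw) + kHeader;
    }

    void free(char* p) override {
        if (p == nullptr) {
            return;
        }
        char* raw = p - kHeader;
        uint64_t size = 0;
        std::memcpy(&size, raw, sizeof(size));
        assert(tracked_bytes() >= static_cast<int64_t>(size));
        _tracked_bytes.fetch_sub(static_cast<int64_t>(size), std::memory_order_relaxed);
        std::free(raw);
    }

    // Bytes handed out by malloc() and not yet returned through free().
    int64_t tracked_bytes() const { return _tracked_bytes.load(std::memory_order_relaxed); }

private:
    // Header sized so the payload stays aligned for any fundamental type.
    static constexpr std::size_t kHeader = alignof(std::max_align_t);
    static_assert(kHeader >= sizeof(uint64_t), "header must hold the size");
    std::atomic<int64_t> _tracked_bytes {0};
};
```

A reader or writer that accepts a pool pointer would then allocate through such a pool, and the counter could be reported to whatever tracker the engine uses.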

Future work

  • Since the Parquet reader is implemented in-house and does not use the Arrow third-party library, its memory usage still needs to be added to the memory tracker.
  • Add operator-level read and write memory trackers to the profile.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


#include <map>

#include "orc/MemoryPool.hh"

warning: 'orc/MemoryPool.hh' file not found [clang-diagnostic-error]

#include "orc/MemoryPool.hh"
         ^

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 6bb4b3f to 61ac8ab Compare July 3, 2024 12:16
@kaka11chen

run buildall

@doris-robot

TPC-H: Total hot run time: 40162 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 61ac8ab547a9d9f5da165f1ac79af7b44b5fbc37, data reload: false

------ Round 1 ----------------------------------
q1	17603	4499	4393	4393
q2	2014	198	184	184
q3	10464	1206	1224	1206
q4	10199	766	820	766
q5	7480	2703	2673	2673
q6	220	137	135	135
q7	976	606	615	606
q8	9225	2106	2099	2099
q9	8864	6548	6493	6493
q10	9022	3720	3709	3709
q11	450	231	238	231
q12	464	233	234	233
q13	17917	2980	3023	2980
q14	269	234	217	217
q15	531	463	492	463
q16	528	381	369	369
q17	985	706	761	706
q18	8149	7424	7321	7321
q19	3356	1504	1529	1504
q20	732	301	327	301
q21	5103	3232	4093	3232
q22	414	353	341	341
Total cold run time: 114965 ms
Total hot run time: 40162 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4413	4278	4316	4278
q2	367	269	261	261
q3	2972	2740	2830	2740
q4	2006	1780	1713	1713
q5	5637	5543	5538	5538
q6	225	134	130	130
q7	2216	1814	1953	1814
q8	3278	3454	3439	3439
q9	8729	8690	8821	8690
q10	4096	3907	3833	3833
q11	582	479	498	479
q12	818	629	654	629
q13	17380	3149	3180	3149
q14	321	293	277	277
q15	534	478	516	478
q16	490	437	430	430
q17	1844	1508	1515	1508
q18	8058	7822	7721	7721
q19	2742	1493	1545	1493
q20	2172	1848	1900	1848
q21	5165	5020	5085	5020
q22	625	525	553	525
Total cold run time: 74670 ms
Total hot run time: 55993 ms
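
A note on reading these tables: the column meanings are not stated in the report, but the Round 1 totals for commit 61ac8ab are consistent with the first column being the cold run and the "hot" total being the sum of the faster of the two following runs (which is what the fourth column repeats). The sketch below checks that reading against the Round 1 numbers. The names `QueryRuns`, `Totals`, and `sum_runs` are made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// One row of the report: cold run and two hot runs, in milliseconds.
struct QueryRuns {
    int64_t cold;
    int64_t hot1;
    int64_t hot2;
};

struct Totals {
    int64_t cold;
    int64_t hot;
};

// Round 1 timings for q1..q22, copied from the TPC-H report for commit 61ac8ab.
static const std::array<QueryRuns, 22> kRound1 = {{
    {17603, 4499, 4393}, {2014, 198, 184},    {10464, 1206, 1224}, {10199, 766, 820},
    {7480, 2703, 2673},  {220, 137, 135},     {976, 606, 615},     {9225, 2106, 2099},
    {8864, 6548, 6493},  {9022, 3720, 3709},  {450, 231, 238},     {464, 233, 234},
    {17917, 2980, 3023}, {269, 234, 217},     {531, 463, 492},     {528, 381, 369},
    {985, 706, 761},     {8149, 7424, 7321},  {3356, 1504, 1529},  {732, 301, 327},
    {5103, 3232, 4093},  {414, 353, 341},
}};

// Sum the cold runs, and sum the faster of the two hot runs per query.
inline Totals sum_runs(const std::array<QueryRuns, 22>& rows) {
    Totals t {0, 0};
    for (const QueryRuns& r : rows) {
        t.cold += r.cold;
        t.hot += std::min(r.hot1, r.hot2);
    }
    return t;
}

// Compare against the totals printed in the report.
inline void check_reported_totals() {
    const Totals t = sum_runs(kRound1);
    assert(t.cold == 114965);
    assert(t.hot == 40162);
}
```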

@doris-robot

TPC-DS: Total hot run time: 175352 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 61ac8ab547a9d9f5da165f1ac79af7b44b5fbc37, data reload: false

query1	927	387	380	380
query2	6499	2394	2321	2321
query3	6636	210	215	210
query4	19194	17603	17362	17362
query5	3672	511	479	479
query6	250	162	170	162
query7	4589	289	292	289
query8	339	294	303	294
query9	8702	2385	2366	2366
query10	597	301	302	301
query11	10375	9910	9997	9910
query12	118	89	86	86
query13	1647	365	364	364
query14	10307	7950	7935	7935
query15	235	191	198	191
query16	7842	279	273	273
query17	1885	560	558	558
query18	1978	283	277	277
query19	204	160	155	155
query20	95	83	84	83
query21	217	132	132	132
query22	4170	4003	4242	4003
query23	33820	33782	33817	33782
query24	10867	3056	2940	2940
query25	613	428	392	392
query26	710	158	156	156
query27	2237	325	334	325
query28	5935	2143	2151	2143
query29	923	671	678	671
query30	261	159	162	159
query31	1023	778	785	778
query32	98	53	58	53
query33	673	314	315	314
query34	873	466	491	466
query35	812	669	665	665
query36	1165	991	983	983
query37	149	92	84	84
query38	2883	2886	2776	2776
query39	892	853	866	853
query40	214	136	128	128
query41	60	55	55	55
query42	122	103	109	103
query43	593	563	574	563
query44	1118	736	718	718
query45	200	170	172	170
query46	1073	721	752	721
query47	1827	1737	1756	1737
query48	368	308	302	302
query49	857	515	412	412
query50	757	375	391	375
query51	6838	6838	6799	6799
query52	104	93	90	90
query53	357	291	301	291
query54	891	454	445	445
query55	85	77	75	75
query56	279	268	270	268
query57	1110	1047	1058	1047
query58	251	247	251	247
query59	3650	3264	3152	3152
query60	303	279	278	278
query61	104	113	87	87
query62	612	430	441	430
query63	324	298	294	294
query64	8832	2276	1777	1777
query65	3177	3153	3108	3108
query66	757	320	337	320
query67	15645	14989	14906	14906
query68	8623	542	533	533
query69	759	476	363	363
query70	1463	1169	1117	1117
query71	521	294	290	290
query72	8977	5473	5401	5401
query73	2276	324	326	324
query74	5870	5405	5454	5405
query75	5345	2655	2676	2655
query76	5163	908	972	908
query77	780	305	314	305
query78	10514	9916	9859	9859
query79	8531	511	508	508
query80	1022	478	476	476
query81	551	239	220	220
query82	453	108	107	107
query83	322	173	169	169
query84	266	83	85	83
query85	928	279	268	268
query86	356	315	326	315
query87	3309	3096	3059	3059
query88	4322	2410	2350	2350
query89	534	402	399	399
query90	1999	194	193	193
query91	129	101	105	101
query92	59	49	48	48
query93	6624	511	504	504
query94	1310	188	189	188
query95	413	315	317	315
query96	617	270	264	264
query97	3171	3021	3031	3021
query98	208	197	195	195
query99	1118	840	861	840
Total cold run time: 291453 ms
Total hot run time: 175352 ms

@doris-robot

ClickBench: Total hot run time: 30.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 61ac8ab547a9d9f5da165f1ac79af7b44b5fbc37, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.69	0.09	0.08
query5	0.50	0.48	0.50
query6	1.13	0.72	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.55	0.47	0.49
query10	0.54	0.54	0.53
query11	0.15	0.11	0.12
query12	0.14	0.12	0.12
query13	0.60	0.60	0.59
query14	0.75	0.76	0.79
query15	0.85	0.81	0.83
query16	0.35	0.35	0.36
query17	0.96	1.03	0.96
query18	0.23	0.22	0.29
query19	1.83	1.78	1.80
query20	0.01	0.00	0.00
query21	15.44	0.76	0.66
query22	4.01	7.57	2.06
query23	18.35	1.41	1.27
query24	2.14	0.24	0.22
query25	0.16	0.09	0.08
query26	0.26	0.18	0.17
query27	0.08	0.08	0.08
query28	13.22	1.02	1.01
query29	12.65	3.34	3.33
query30	0.25	0.05	0.06
query31	2.88	0.39	0.40
query32	3.26	0.49	0.47
query33	2.84	2.94	2.94
query34	16.85	4.42	4.40
query35	4.49	4.53	4.56
query36	0.64	0.46	0.47
query37	0.18	0.15	0.16
query38	0.16	0.17	0.14
query39	0.04	0.04	0.04
query40	0.17	0.16	0.15
query41	0.08	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 108.92 s
Total hot run time: 30.89 s

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 61ac8ab to 671de96 Compare July 3, 2024 16:13

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

#endif // #if defined(USE_JEMALLOC) && defined(USE_MEM_TRACKER)
}

void free(char* p) override {

warning: pointer parameter 'p' can be pointer to const [readability-non-const-parameter]

Suggested change
- void free(char* p) override {
+ void free(const char* p) override {
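
One caveat about this suggestion, sketched under the assumption that the base class declares `free(char* p)` (as the Apache ORC `orc::MemoryPool` interface does): changing the parameter to `const char*` changes the signature, so the function would no longer override the base declaration and `override` would be rejected by the compiler. `PoolBase`, `CountingPool`, and `frees_seen_through_base` are hypothetical names for illustration only.

```cpp
#include <cassert>

// Hypothetical base class whose free() takes a non-const pointer.
class PoolBase {
public:
    virtual ~PoolBase() = default;
    virtual void free(char* p) = 0;
};

class CountingPool final : public PoolBase {
public:
    // Matches the base signature exactly, so `override` is accepted.
    void free(char* p) override {
        assert(p != nullptr);
        ++free_calls;
    }

    // Would not compile: no base-class function has this signature.
    // void free(const char* p) override {}

    int free_calls = 0;
};

// Calls free() through the base interface and reports how many calls
// reached the derived class.
inline int frees_seen_through_base() {
    CountingPool pool;
    PoolBase& base = pool;
    char byte = 0;
    base.free(&byte);
    return pool.free_calls;
}
```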

@kaka11chen

run buildall

@doris-robot

TPC-H: Total hot run time: 41072 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 671de96b026b8b7a5fb44615d60a3589f3a94417, data reload: false

------ Round 1 ----------------------------------
q1	17772	4613	4425	4425
q2	2640	207	205	205
q3	11508	1174	1136	1136
q4	10583	803	799	799
q5	8125	2722	2713	2713
q6	223	147	144	144
q7	977	603	609	603
q8	9228	2103	2110	2103
q9	9027	6557	6493	6493
q10	8944	3723	3778	3723
q11	457	243	243	243
q12	406	240	230	230
q13	17769	3002	2977	2977
q14	275	231	215	215
q15	516	477	492	477
q16	511	374	370	370
q17	970	699	741	699
q18	8167	7587	7453	7453
q19	7760	1493	1459	1459
q20	662	334	332	332
q21	4926	3929	3980	3929
q22	402	344	344	344
Total cold run time: 121848 ms
Total hot run time: 41072 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4398	4275	4290	4275
q2	385	269	260	260
q3	3028	2768	2735	2735
q4	1906	1603	1586	1586
q5	5260	5360	5302	5302
q6	223	137	136	136
q7	2163	1704	1724	1704
q8	3190	3370	3359	3359
q9	8424	8383	8375	8375
q10	3940	3636	3613	3613
q11	594	477	502	477
q12	808	617	607	607
q13	17574	2979	2989	2979
q14	280	268	275	268
q15	515	488	494	488
q16	469	410	421	410
q17	1802	1460	1459	1459
q18	7664	7444	7384	7384
q19	1708	1690	1488	1488
q20	1982	1800	1769	1769
q21	4873	4741	4792	4741
q22	663	549	552	549
Total cold run time: 71849 ms
Total hot run time: 53964 ms

@doris-robot

TPC-DS: Total hot run time: 172392 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 671de96b026b8b7a5fb44615d60a3589f3a94417, data reload: false

query1	926	385	377	377
query2	6461	2368	2405	2368
query3	6651	209	209	209
query4	19542	17657	17363	17363
query5	4185	503	506	503
query6	294	186	149	149
query7	4606	291	288	288
query8	294	282	278	278
query9	8588	2478	2449	2449
query10	617	311	276	276
query11	10494	10104	10073	10073
query12	135	89	80	80
query13	1628	375	360	360
query14	9550	7776	6988	6988
query15	242	191	183	183
query16	7839	308	294	294
query17	1826	572	512	512
query18	1968	282	280	280
query19	198	161	164	161
query20	95	86	83	83
query21	211	140	130	130
query22	4226	4118	3956	3956
query23	33689	33212	33016	33016
query24	12011	2876	2840	2840
query25	667	361	370	361
query26	1815	155	151	151
query27	2990	307	315	307
query28	7493	2121	2099	2099
query29	1182	642	610	610
query30	279	149	147	147
query31	938	744	740	740
query32	97	56	52	52
query33	770	281	294	281
query34	986	470	476	470
query35	749	614	612	612
query36	1073	938	945	938
query37	285	79	76	76
query38	2901	2759	2793	2759
query39	887	803	825	803
query40	282	127	125	125
query41	59	50	53	50
query42	115	99	105	99
query43	610	553	553	553
query44	1247	748	735	735
query45	200	156	165	156
query46	1084	712	732	712
query47	1846	1749	1791	1749
query48	383	298	298	298
query49	1202	425	422	422
query50	770	388	396	388
query51	6912	6901	6837	6837
query52	110	93	97	93
query53	359	303	301	301
query54	941	451	458	451
query55	75	74	74	74
query56	302	279	282	279
query57	1157	1054	1049	1049
query58	255	261	249	249
query59	3498	3280	3077	3077
query60	366	319	310	310
query61	117	116	120	116
query62	704	424	450	424
query63	324	291	291	291
query64	9911	2265	1757	1757
query65	3239	3104	3097	3097
query66	1376	340	326	326
query67	15307	15327	15098	15098
query68	4541	555	561	555
query69	459	327	317	317
query70	1148	1161	1092	1092
query71	407	283	283	283
query72	7138	5627	5019	5019
query73	752	325	327	325
query74	5933	5573	5614	5573
query75	3422	2685	2671	2671
query76	2836	939	938	938
query77	472	321	306	306
query78	9324	9133	8852	8852
query79	2782	527	524	524
query80	2177	472	483	472
query81	585	222	216	216
query82	911	112	106	106
query83	282	173	171	171
query84	281	91	87	87
query85	2074	316	366	316
query86	490	317	312	312
query87	3262	3074	3146	3074
query88	4164	2390	2395	2390
query89	466	376	384	376
query90	1855	189	200	189
query91	129	100	101	100
query92	63	50	52	50
query93	2786	524	514	514
query94	1277	213	218	213
query95	411	328	322	322
query96	616	272	278	272
query97	3186	2988	3029	2988
query98	212	202	204	202
query99	1200	838	829	829
Total cold run time: 277780 ms
Total hot run time: 172392 ms

@doris-robot

ClickBench: Total hot run time: 30.95 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 671de96b026b8b7a5fb44615d60a3589f3a94417, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.67	0.08	0.08
query5	0.53	0.48	0.48
query6	1.13	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.49	0.51
query10	0.55	0.56	0.55
query11	0.16	0.12	0.12
query12	0.16	0.12	0.12
query13	0.60	0.60	0.59
query14	0.80	0.78	0.80
query15	0.83	0.82	0.81
query16	0.37	0.38	0.37
query17	1.04	0.95	1.04
query18	0.24	0.24	0.24
query19	1.90	1.73	1.68
query20	0.01	0.01	0.01
query21	15.40	0.78	0.66
query22	4.14	7.01	2.25
query23	18.26	1.47	1.24
query24	2.14	0.23	0.23
query25	0.15	0.09	0.08
query26	0.26	0.17	0.18
query27	0.08	0.09	0.08
query28	13.25	1.03	1.01
query29	12.60	3.29	3.31
query30	0.25	0.07	0.05
query31	2.86	0.39	0.38
query32	3.28	0.48	0.47
query33	2.89	2.96	2.87
query34	17.10	4.47	4.43
query35	4.53	4.55	4.50
query36	0.63	0.48	0.46
query37	0.18	0.15	0.16
query38	0.15	0.15	0.15
query39	0.04	0.03	0.03
query40	0.18	0.15	0.15
query41	0.09	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.52 s
Total hot run time: 30.95 s

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 671de96 to 1fa9e9f Compare July 4, 2024 01:44
@kaka11chen

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -84,7 +84,7 @@ class VOrcTransformer final : public VFileFormatTransformer {
TFileCompressType::type compression,
const iceberg::Schema* iceberg_schema = nullptr);

- ~VOrcTransformer() = default;
+ ~VOrcTransformer();

warning: annotate this function with 'override' or (rarely) 'final' [modernize-use-override]

Suggested change
- ~VOrcTransformer();
+ ~VOrcTransformer() override;

@doris-robot

TPC-H: Total hot run time: 39593 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1fa9e9fa96f7db8b5b59ba2b027e8d6bf4556cf8, data reload: false

------ Round 1 ----------------------------------
q1	17639	4333	4216	4216
q2	2008	195	185	185
q3	10445	1205	1021	1021
q4	10200	786	756	756
q5	7486	2651	2584	2584
q6	216	136	135	135
q7	941	594	600	594
q8	9244	2049	2065	2049
q9	8846	6461	6438	6438
q10	9050	3705	3694	3694
q11	464	238	229	229
q12	470	229	221	221
q13	18022	2950	2980	2950
q14	266	219	217	217
q15	509	483	475	475
q16	497	375	370	370
q17	951	712	682	682
q18	8019	7373	7390	7373
q19	6245	1550	1564	1550
q20	655	325	323	323
q21	4793	3197	3252	3197
q22	388	334	337	334
Total cold run time: 117354 ms
Total hot run time: 39593 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4519	4174	4277	4174
q2	364	269	262	262
q3	2971	2800	2871	2800
q4	2018	1734	1668	1668
q5	5634	5481	5465	5465
q6	220	130	128	128
q7	2216	1823	1877	1823
q8	3224	3386	3392	3386
q9	8727	8634	8792	8634
q10	4029	3926	3717	3717
q11	573	496	511	496
q12	814	663	669	663
q13	15860	3145	3212	3145
q14	303	278	278	278
q15	543	488	471	471
q16	480	436	416	416
q17	1798	1524	1515	1515
q18	8132	8011	7755	7755
q19	1767	1629	1491	1491
q20	2136	1867	1820	1820
q21	5051	4940	4841	4841
q22	620	565	557	557
Total cold run time: 71999 ms
Total hot run time: 55505 ms

@doris-robot

TPC-DS: Total hot run time: 173515 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1fa9e9fa96f7db8b5b59ba2b027e8d6bf4556cf8, data reload: false

query1	940	402	393	393
query2	6453	2429	2487	2429
query3	6637	205	212	205
query4	22007	17776	17258	17258
query5	3677	471	490	471
query6	262	177	191	177
query7	4599	305	293	293
query8	342	294	303	294
query9	8822	2452	2456	2452
query10	594	304	296	296
query11	10546	9996	10035	9996
query12	117	91	85	85
query13	1674	366	360	360
query14	10422	7183	7806	7183
query15	230	194	184	184
query16	7781	307	302	302
query17	1751	533	522	522
query18	1953	287	276	276
query19	201	150	172	150
query20	92	87	82	82
query21	216	155	136	136
query22	4364	4192	4022	4022
query23	33992	33644	33529	33529
query24	11069	2907	2845	2845
query25	616	393	404	393
query26	1107	160	161	160
query27	2535	322	329	322
query28	7017	2145	2154	2145
query29	939	648	662	648
query30	261	167	165	165
query31	963	761	779	761
query32	99	54	58	54
query33	751	297	296	296
query34	1016	488	489	488
query35	782	641	641	641
query36	1122	1007	989	989
query37	156	84	84	84
query38	2932	2812	2907	2812
query39	949	884	827	827
query40	217	152	128	128
query41	53	51	54	51
query42	111	102	109	102
query43	628	542	557	542
query44	1280	753	733	733
query45	202	169	166	166
query46	1089	703	720	703
query47	1859	1774	1741	1741
query48	369	300	294	294
query49	869	425	422	422
query50	778	389	379	379
query51	6923	6911	6785	6785
query52	111	94	100	94
query53	363	289	296	289
query54	938	455	448	448
query55	75	72	76	72
query56	298	269	272	269
query57	1157	1040	1066	1040
query58	274	252	252	252
query59	3544	3308	3169	3169
query60	331	277	274	274
query61	93	94	92	92
query62	616	467	428	428
query63	325	292	292	292
query64	8925	2299	1778	1778
query65	3207	3185	3151	3151
query66	818	335	334	334
query67	15428	15086	14906	14906
query68	6194	542	552	542
query69	715	414	328	328
query70	1199	1090	1147	1090
query71	463	289	281	281
query72	7602	5211	5840	5211
query73	793	321	321	321
query74	5842	5442	5540	5442
query75	3958	2714	2686	2686
query76	3223	951	911	911
query77	653	310	301	301
query78	9816	9102	9004	9004
query79	2666	533	530	530
query80	2022	484	503	484
query81	610	235	231	231
query82	742	112	107	107
query83	275	176	173	173
query84	271	91	90	90
query85	1273	288	269	269
query86	414	312	314	312
query87	3322	3103	3118	3103
query88	3722	2370	2425	2370
query89	486	386	384	384
query90	1774	194	189	189
query91	128	100	102	100
query92	59	49	50	49
query93	3955	532	523	523
query94	1031	211	210	210
query95	403	318	321	318
query96	610	267	266	266
query97	3200	3024	3013	3013
query98	218	214	197	197
query99	1182	821	846	821
Total cold run time: 279688 ms
Total hot run time: 173515 ms

@doris-robot

ClickBench: Total hot run time: 30.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1fa9e9fa96f7db8b5b59ba2b027e8d6bf4556cf8, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.22	0.05	0.04
query4	1.69	0.07	0.07
query5	0.50	0.50	0.50
query6	1.14	0.72	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.55	0.50	0.51
query10	0.55	0.55	0.53
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.59	0.58	0.60
query14	0.75	0.78	0.79
query15	0.85	0.81	0.82
query16	0.35	0.34	0.36
query17	0.97	0.95	0.99
query18	0.22	0.26	0.24
query19	1.81	1.77	1.73
query20	0.02	0.01	0.01
query21	15.39	0.78	0.66
query22	4.66	6.88	2.16
query23	18.34	1.31	1.17
query24	2.09	0.24	0.20
query25	0.16	0.09	0.09
query26	0.28	0.18	0.18
query27	0.08	0.08	0.08
query28	13.26	1.00	0.99
query29	12.65	3.30	3.31
query30	0.25	0.06	0.06
query31	2.86	0.38	0.38
query32	3.29	0.46	0.47
query33	2.83	2.96	2.90
query34	17.00	4.37	4.41
query35	4.48	4.47	4.51
query36	0.65	0.48	0.49
query37	0.19	0.16	0.16
query38	0.15	0.14	0.14
query39	0.04	0.03	0.03
query40	0.17	0.15	0.14
query41	0.08	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.04	0.05
Total cold run time: 109.69 s
Total hot run time: 30.69 s

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 1fa9e9f to 6679a62 Compare July 24, 2024 01:39
@kaka11chen

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

std::mutex RecordSizeMemoryAllocator::_mutex;

template <bool clear_memory_, bool mmap_populate, bool use_mmap, typename MemoryAllocator>
void Allocator<clear_memory_, mmap_populate, use_mmap, MemoryAllocator>::sys_memory_check(

warning: function 'sys_memory_check' exceeds recommended size/complexity thresholds [readability-function-size]

void Allocator<clear_memory_, mmap_populate, use_mmap, MemoryAllocator>::sys_memory_check(
                                                                         ^
Additional context

be/src/vec/common/allocator.cpp:46: 112 lines including whitespace and comments (threshold 80)

void Allocator<clear_memory_, mmap_populate, use_mmap, MemoryAllocator>::sys_memory_check(
                                                                         ^

@@ -23,6 +23,10 @@
// TODO: Readable

#include <fmt/format.h>

warning: 'fmt/format.h' file not found [clang-diagnostic-error]

#include <fmt/format.h>
         ^


void* alloc(size_t size) {
if (size <= N) {
if constexpr (Base::clear_memory) memset(stack_memory, 0, N);

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if constexpr (Base::clear_memory) memset(stack_memory, 0, N);
+ if constexpr (Base::clear_memory) { memset(stack_memory, 0, N);
+ }

if (size > N) Base::free(buf, size);
}
void free(void* buf, size_t size) {
if (size > N) Base::free(buf, size);

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (size > N) Base::free(buf, size);
+ if (size > N) { Base::free(buf, size);
+ }

if (new_size <= N) return buf;
void* realloc(void* buf, size_t old_size, size_t new_size) {
/// Was in stack_memory, will remain there.
if (new_size <= N) return buf;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (new_size <= N) return buf;
+ if (new_size <= N) { return buf;
+ }

/// Already was big enough to not fit in stack_memory.
if (old_size > N) return Base::realloc(buf, old_size, new_size, Alignment);
/// Already was big enough to not fit in stack_memory.
if (old_size > N) return Base::realloc(buf, old_size, new_size, Alignment);

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (old_size > N) return Base::realloc(buf, old_size, new_size, Alignment);
+ if (old_size > N) { return Base::realloc(buf, old_size, new_size, Alignment);
+ }
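
For context, the four brace warnings above all touch the same small-buffer pattern: sizes up to `N` are served from an inline array, and larger sizes go to the base allocator. The sketch below shows that pattern with the braces applied. It is an illustration under assumptions, not the Doris source: `HeapBase` and `StackMemoryAllocator` are hypothetical names, the `Alignment` argument from the snippets is omitted, the branch that moves a buffer from stack to heap is not shown in the review and is filled in here as one plausible completion, and the sketch assumes buffers only grow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical heap-backed base; stands in for the real allocator.
struct HeapBase {
    static constexpr bool clear_memory = true;

    void* alloc(std::size_t size) {
        void* p = clear_memory ? std::calloc(1, size) : std::malloc(size);
        if (p == nullptr) {
            throw std::bad_alloc();
        }
        return p;
    }

    void free(void* buf, std::size_t /*size*/) { std::free(buf); }

    void* realloc(void* buf, std::size_t old_size, std::size_t new_size) {
        void* p = std::realloc(buf, new_size);
        if (p == nullptr) {
            throw std::bad_alloc();
        }
        if (clear_memory && new_size > old_size) {
            std::memset(static_cast<char*>(p) + old_size, 0, new_size - old_size);
        }
        return p;
    }
};

// Serves requests of up to N bytes from an inline buffer, larger ones from Base.
template <typename Base, std::size_t N>
class StackMemoryAllocator : private Base {
public:
    void* alloc(std::size_t size) {
        if (size <= N) {
            if constexpr (Base::clear_memory) {
                std::memset(stack_memory, 0, N);
            }
            return stack_memory;
        }
        return Base::alloc(size);
    }

    void free(void* buf, std::size_t size) {
        if (size > N) {
            Base::free(buf, size);
        }
    }

    void* realloc(void* buf, std::size_t old_size, std::size_t new_size) {
        assert(new_size >= old_size); // this sketch only supports growth
        // Was in stack_memory, will remain there.
        if (new_size <= N) {
            return buf;
        }
        // Already was big enough to not fit in stack_memory.
        if (old_size > N) {
            return Base::realloc(buf, old_size, new_size);
        }
        // Was in stack_memory, but no longer fits there.
        void* new_buf = Base::alloc(new_size);
        std::memcpy(new_buf, buf, old_size);
        return new_buf;
    }

    bool on_stack(const void* buf) const { return buf == stack_memory; }

private:
    alignas(std::max_align_t) char stack_memory[N];
};
```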

return arrow::Status::OK();
}

void ArrowAllocator::deallocate_aligned(uint8_t* ptr, int64_t size, int64_t alignment) {

warning: pointer parameter 'ptr' can be pointer to const [readability-non-const-parameter]

Suggested change
- void ArrowAllocator::deallocate_aligned(uint8_t* ptr, int64_t size, int64_t alignment) {
+ void ArrowAllocator::deallocate_aligned(const uint8_t* ptr, int64_t size, int64_t alignment) {

be/src/vec/exec/format/parquet/arrow_memory_pool.h:47:

-     void deallocate_aligned(uint8_t* ptr, int64_t size, int64_t alignment);
+     void deallocate_aligned(const uint8_t* ptr, int64_t size, int64_t alignment);


#pragma once

#include "arrow/memory_pool.h"

warning: 'arrow/memory_pool.h' file not found [clang-diagnostic-error]

#include "arrow/memory_pool.h"
         ^

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 6679a62 to fe66e7f Compare July 24, 2024 01:59
@kaka11chen kaka11chen changed the title [Feature](multi-catalog) Add memory tracker for orc reader and writer. [Feature](multi-catalog) Add memory tracker for orc reader/writer and arrow parquet writer. Jul 24, 2024
@kaka11chen

run buildall

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from fe66e7f to ad0e5b4 Compare July 24, 2024 04:41
@kaka11chen

run buildall

PR approved by anyone and no changes requested.

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 766cffa to 1bce0fe Compare July 24, 2024 06:16
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jul 24, 2024
@kaka11chen

run buildall

@kaka11chen kaka11chen force-pushed the orc_parquet_memory_tracking branch from 1bce0fe to af452ad Compare July 24, 2024 08:25
@kaka11chen

run buildall

@doris-robot

TPC-H: Total hot run time: 39933 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit af452adeae2acb7416232c649cf4cbeab62ad55b, data reload: false

------ Round 1 ----------------------------------
q1	17861	4769	4408	4408
q2	2530	211	190	190
q3	12198	1273	1058	1058
q4	10576	762	784	762
q5	7578	2801	2710	2710
q6	228	146	147	146
q7	997	613	613	613
q8	9341	2100	2128	2100
q9	8818	6555	6579	6555
q10	8727	3819	3758	3758
q11	433	238	235	235
q12	394	232	217	217
q13	17749	2987	2957	2957
q14	270	232	227	227
q15	514	498	491	491
q16	499	381	379	379
q17	973	740	692	692
q18	8114	7616	7383	7383
q19	7507	1333	1368	1333
q20	683	313	334	313
q21	4898	3126	3290	3126
q22	348	280	283	280
Total cold run time: 121236 ms
Total hot run time: 39933 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4437	4247	4216	4216
q2	380	250	265	250
q3	3015	2734	2706	2706
q4	1936	1609	1584	1584
q5	5300	5327	5333	5327
q6	215	130	131	130
q7	2088	1677	1709	1677
q8	3237	3350	3336	3336
q9	8442	8423	8405	8405
q10	3909	3724	3668	3668
q11	585	472	474	472
q12	770	604	638	604
q13	16378	2989	2952	2952
q14	308	261	266	261
q15	518	480	471	471
q16	481	415	438	415
q17	1806	1477	1482	1477
q18	7594	7485	7457	7457
q19	1714	1603	1543	1543
q20	2006	1798	1799	1798
q21	4814	4606	4622	4606
q22	604	488	494	488
Total cold run time: 70537 ms
Total hot run time: 53843 ms

@doris-robot

TPC-DS: Total hot run time: 173202 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit af452adeae2acb7416232c649cf4cbeab62ad55b, data reload: false

query1	929	373	365	365
query2	6443	1946	1831	1831
query3	6673	205	212	205
query4	28440	17460	17140	17140
query5	4185	474	486	474
query6	254	166	162	162
query7	4586	290	285	285
query8	235	189	185	185
query9	8539	2427	2394	2394
query10	447	281	272	272
query11	11784	10037	10052	10037
query12	133	82	83	82
query13	1650	372	376	372
query14	9886	7707	7657	7657
query15	221	166	164	164
query16	7731	477	507	477
query17	1522	566	544	544
query18	1937	293	281	281
query19	206	150	154	150
query20	91	83	82	82
query21	212	136	130	130
query22	4262	4130	4243	4130
query23	34045	33275	33352	33275
query24	12047	2886	2823	2823
query25	686	384	387	384
query26	1847	148	148	148
query27	2978	272	280	272
query28	7673	2010	1997	1997
query29	1220	602	601	601
query30	276	144	147	144
query31	972	728	742	728
query32	95	53	53	53
query33	788	327	339	327
query34	910	470	482	470
query35	856	748	728	728
query36	1085	945	923	923
query37	281	74	79	74
query38	2878	2803	2774	2774
query39	879	842	798	798
query40	279	121	116	116
query41	47	46	44	44
query42	120	102	103	102
query43	516	469	475	469
query44	1198	719	740	719
query45	201	158	159	158
query46	1084	699	731	699
query47	1861	1763	1794	1763
query48	378	290	295	290
query49	1203	422	417	417
query50	789	387	393	387
query51	6920	6644	6751	6644
query52	112	89	95	89
query53	366	290	283	283
query54	1008	453	451	451
query55	74	73	75	73
query56	286	288	265	265
query57	1157	1026	1034	1026
query58	254	244	252	244
query59	2771	2687	2669	2669
query60	313	270	276	270
query61	96	93	96	93
query62	844	630	645	630
query63	327	292	290	290
query64	10485	2232	1671	1671
query65	3157	3103	3091	3091
query66	1376	322	328	322
query67	15302	15251	15051	15051
query68	6805	554	543	543
query69	666	426	356	356
query70	1245	1156	1170	1156
query71	524	281	276	276
query72	7176	5541	5916	5541
query73	809	343	327	327
query74	6306	5701	5740	5701
query75	3979	2636	2708	2636
query76	4503	914	992	914
query77	670	308	314	308
query78	9777	9128	9065	9065
query79	3191	520	530	520
query80	1256	470	465	465
query81	582	213	229	213
query82	765	136	142	136
query83	212	164	163	163
query84	329	82	94	82
query85	1442	322	302	302
query86	387	301	314	301
query87	3382	3092	3051	3051
query88	4170	2417	2360	2360
query89	496	368	387	368
query90	1884	193	191	191
query91	135	99	102	99
query92	64	48	50	48
query93	4294	508	514	508
query94	1052	293	296	293
query95	405	310	320	310
query96	592	275	275	275
query97	3212	2994	3036	2994
query98	220	243	196	196
query99	1583	1279	1240	1240
Total cold run time: 294519 ms
Total hot run time: 173202 ms

@doris-robot

ClickBench: Total hot run time: 30.8 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit af452adeae2acb7416232c649cf4cbeab62ad55b, data reload: false

query1	0.05	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.08
query5	0.50	0.48	0.50
query6	1.12	0.73	0.72
query7	0.01	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.48	0.50
query10	0.54	0.54	0.55
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.60	0.58	0.59
query14	0.77	0.77	0.77
query15	0.86	0.80	0.81
query16	0.37	0.37	0.36
query17	0.95	0.96	0.94
query18	0.23	0.22	0.21
query19	1.81	1.67	1.67
query20	0.01	0.01	0.00
query21	15.43	0.77	0.66
query22	4.15	7.07	2.18
query23	18.39	1.38	1.28
query24	2.11	0.23	0.22
query25	0.16	0.08	0.08
query26	0.29	0.20	0.21
query27	0.45	0.22	0.23
query28	13.33	1.00	1.01
query29	12.63	3.34	3.33
query30	0.24	0.06	0.05
query31	2.86	0.39	0.38
query32	3.29	0.48	0.47
query33	2.92	2.96	2.91
query34	17.20	4.32	4.35
query35	4.43	4.47	4.41
query36	0.66	0.47	0.47
query37	0.19	0.15	0.15
query38	0.15	0.16	0.16
query39	0.05	0.04	0.04
query40	0.16	0.13	0.13
query41	0.10	0.04	0.05
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 109.99 s
Total hot run time: 30.8 s

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 25, 2024

PR approved by at least one committer and no changes requested.

morningman pushed a commit that referenced this pull request Jul 25, 2024
… arrow parquet writer. (#37257)

## Proposed changes

backport #37234

@xinyiZzz xinyiZzz left a comment

LGTM

@morningman morningman merged commit c5a1998 into apache:master Jul 30, 2024
27 of 30 checks passed
dataroaring pushed a commit that referenced this pull request Jul 31, 2024
… arrow parquet writer. (#37234)

## Proposed changes

[Feature] (multi-catalog) Add memory tracker for orc reader/writer and
arrow parquet writer.

## Future work

- Since the Parquet reader is implemented in-house and does not use the
Arrow third-party library, its memory usage still needs to be added to
the memory tracker.
- Add operator-level read and write memory trackers to the profile.
zy-kkk pushed a commit that referenced this pull request Jul 31, 2024
Fix the allocator.h compilation failure on Mac introduced by #37234.
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Aug 1, 2024
Fix the allocator.h compilation failure on Mac introduced by apache#37234.
feiniaofeiafei pushed a commit to feiniaofeiafei/doris that referenced this pull request Aug 9, 2024
Fix the allocator.h compilation failure on Mac introduced by apache#37234.
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
Fix the allocator.h compilation failure on Mac introduced by #37234.
dataroaring pushed a commit that referenced this pull request Aug 16, 2024
Fix the allocator.h compilation failure on Mac introduced by #37234.
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.1-merged reviewed
5 participants