[fix](memory) Fix Jemalloc Cache Memory Tracker #37464

Merged
xinyiZzz merged 3 commits into apache:master from 20240707_fix_allocator_cache on Jul 11, 2024

Conversation

xinyiZzz
Contributor

@xinyiZzz xinyiZzz commented Jul 8, 2024

Proposed changes

Doris uses Jemalloc as its default allocator. The Jemalloc cache consists of two parts:

  • Thread Cache: a specified number of pages cached in the thread cache.
  • Dirty Pages: memory pages that can be reused across all arenas.

This PR makes two fixes:

  1. Metadata is no longer counted as cache. Counting it inflated the cache size, which delayed memory GC and could lead to BE OOM.
  2. The Jemalloc dirty page memory size is corrected. The previous code used dirty page count * page size (4K on x86), which is much smaller than the actual memory. The fix uses the sum of the dirty page memory across all extent size classes.
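
The accounting change described above can be illustrated with a small model. This is only a sketch, not the code from this PR: the class names, field names, and all numbers are hypothetical stand-ins for allocator statistics, invented to mirror the discrepancy the description reports. It contrasts the previous estimate (dirty page count * page size) with the sum of dirty bytes over extent size classes, and leaves metadata out of the cache total.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 4096  # 4K page on x86, as stated in the description


@dataclass
class ExtentClassStats:
    """Hypothetical dirty-memory figure for one extent size class."""
    dirty_bytes: int


@dataclass
class ArenaStats:
    """Hypothetical per-arena statistics used only for this illustration."""
    pdirty: int                      # dirty page counter
    extents: List[ExtentClassStats]  # one entry per extent size class


def dirty_bytes_previous(arenas: List[ArenaStats]) -> int:
    """Previous estimate: dirty page count multiplied by the page size."""
    return sum(a.pdirty for a in arenas) * PAGE_SIZE


def dirty_bytes_fixed(arenas: List[ArenaStats]) -> int:
    """Fixed estimate: sum of dirty bytes over every extent size class."""
    return sum(e.dirty_bytes for a in arenas for e in a.extents)


def jemalloc_cache_bytes(thread_cache_bytes: int, arenas: List[ArenaStats]) -> int:
    """Cache = thread cache + dirty pages. Metadata is deliberately excluded."""
    return thread_cache_bytes + dirty_bytes_fixed(arenas)


# Made-up numbers, chosen only to show the two estimates diverging.
example_arenas = [
    ArenaStats(pdirty=10, extents=[ExtentClassStats(64 * 1024),
                                   ExtentClassStats(2 * 1024 * 1024)]),
    ArenaStats(pdirty=5, extents=[ExtentClassStats(1024 * 1024)]),
]

print(dirty_bytes_previous(example_arenas))             # 61440
print(dirty_bytes_fixed(example_arenas))                # 3211264
print(jemalloc_cache_bytes(1_000_000, example_arenas))  # 4211264
```

Keeping metadata out of `jemalloc_cache_bytes` reflects the first fix: metadata is not reusable cache, so adding it would only inflate the figure.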

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@xinyiZzz
Contributor Author

xinyiZzz commented Jul 8, 2024

run buildall

Contributor

github-actions bot commented Jul 8, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40937 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 20e8053fc9ec0f1f9e37b24f487c46d115ccdfcc, data reload: false

------ Round 1 ----------------------------------
q1	17692	4487	4347	4347
q2	2038	200	201	200
q3	10519	1177	1082	1082
q4	10223	775	840	775
q5	7508	2695	2683	2683
q6	218	136	140	136
q7	966	600	611	600
q8	9256	2138	2114	2114
q9	8997	6563	6527	6527
q10	8994	3761	3737	3737
q11	474	243	240	240
q12	484	238	259	238
q13	18916	2995	3001	2995
q14	271	235	225	225
q15	525	483	472	472
q16	492	372	384	372
q17	985	738	779	738
q18	8115	7440	7517	7440
q19	1939	1476	1454	1454
q20	667	324	325	324
q21	5049	3928	3905	3905
q22	405	333	339	333
Total cold run time: 114733 ms
Total hot run time: 40937 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4944	4299	4241	4241
q2	370	270	265	265
q3	3052	2937	2880	2880
q4	2084	1727	1730	1727
q5	5755	5525	5506	5506
q6	227	133	134	133
q7	2225	1980	1900	1900
q8	3293	3431	3462	3431
q9	8765	8800	8854	8800
q10	4097	3799	3741	3741
q11	585	525	491	491
q12	821	643	664	643
q13	15900	3183	3156	3156
q14	332	289	284	284
q15	522	487	499	487
q16	502	439	447	439
q17	1831	1547	1510	1510
q18	8072	7925	7845	7845
q19	5999	1708	1581	1581
q20	2159	1874	1872	1872
q21	5251	4932	4818	4818
q22	625	528	550	528
Total cold run time: 77411 ms
Total hot run time: 56278 ms

@doris-robot

TPC-DS: Total hot run time: 174027 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 20e8053fc9ec0f1f9e37b24f487c46d115ccdfcc, data reload: false

query1	923	375	371	371
query2	7481	2433	2438	2433
query3	6635	209	216	209
query4	28558	17490	17258	17258
query5	3744	487	473	473
query6	253	178	164	164
query7	4591	299	295	295
query8	300	297	282	282
query9	8576	2390	2376	2376
query10	574	304	287	287
query11	10856	9993	10218	9993
query12	123	86	92	86
query13	1663	383	390	383
query14	10912	7659	7556	7556
query15	232	185	186	185
query16	7737	336	311	311
query17	1548	560	548	548
query18	1981	282	282	282
query19	195	149	154	149
query20	89	85	86	85
query21	210	129	130	129
query22	4517	4244	3995	3995
query23	33965	33948	33724	33724
query24	11155	2888	2857	2857
query25	663	400	392	392
query26	1547	155	151	151
query27	3085	320	317	317
query28	7307	2155	2141	2141
query29	1011	633	622	622
query30	258	156	157	156
query31	962	741	764	741
query32	95	51	53	51
query33	740	318	286	286
query34	981	498	512	498
query35	721	621	617	617
query36	1147	957	984	957
query37	152	82	77	77
query38	2947	2911	2848	2848
query39	898	856	873	856
query40	256	145	124	124
query41	55	52	51	51
query42	118	97	101	97
query43	608	563	576	563
query44	1220	733	735	733
query45	190	164	156	156
query46	1082	700	720	700
query47	1862	1776	1800	1776
query48	371	301	309	301
query49	926	412	424	412
query50	766	382	382	382
query51	6922	6853	6768	6768
query52	106	94	95	94
query53	360	287	287	287
query54	863	452	457	452
query55	74	73	70	70
query56	279	259	266	259
query57	1176	1053	1038	1038
query58	256	239	248	239
query59	3529	3108	3190	3108
query60	297	275	269	269
query61	124	92	92	92
query62	617	455	445	445
query63	312	305	288	288
query64	10254	2162	1616	1616
query65	3168	3144	3106	3106
query66	1025	335	329	329
query67	15586	15224	15105	15105
query68	4592	557	540	540
query69	608	339	327	327
query70	1140	1127	1145	1127
query71	445	276	277	276
query72	8856	5444	5852	5444
query73	745	331	328	328
query74	5961	5543	5522	5522
query75	4312	2655	2668	2655
query76	3245	903	940	903
query77	702	294	299	294
query78	9649	9003	8841	8841
query79	2612	516	524	516
query80	1337	487	471	471
query81	584	217	220	217
query82	887	111	104	104
query83	351	174	167	167
query84	272	89	92	89
query85	2093	372	299	299
query86	444	306	313	306
query87	3287	3165	3095	3095
query88	4540	2457	2486	2457
query89	478	393	389	389
query90	1965	188	182	182
query91	126	100	105	100
query92	64	49	49	49
query93	4045	543	522	522
query94	1301	215	207	207
query95	415	304	323	304
query96	601	269	267	267
query97	3228	3033	2987	2987
query98	225	192	195	192
query99	1300	865	849	849
Total cold run time: 292144 ms
Total hot run time: 174027 ms

@doris-robot

ClickBench: Total hot run time: 30.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 20e8053fc9ec0f1f9e37b24f487c46d115ccdfcc, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.03
query3	0.22	0.05	0.04
query4	1.67	0.09	0.09
query5	0.49	0.50	0.49
query6	1.13	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.05	0.05
query9	0.54	0.50	0.50
query10	0.54	0.55	0.55
query11	0.15	0.12	0.11
query12	0.14	0.12	0.12
query13	0.59	0.60	0.58
query14	0.76	0.78	0.78
query15	0.83	0.83	0.83
query16	0.37	0.37	0.36
query17	0.95	1.00	1.02
query18	0.23	0.24	0.25
query19	1.79	1.66	1.66
query20	0.02	0.01	0.01
query21	15.40	0.74	0.66
query22	4.36	7.85	1.42
query23	18.30	1.37	1.18
query24	2.08	0.23	0.22
query25	0.16	0.09	0.09
query26	0.29	0.20	0.20
query27	0.45	0.23	0.23
query28	13.33	1.01	0.99
query29	12.64	3.30	3.27
query30	0.25	0.06	0.06
query31	2.89	0.38	0.38
query32	3.27	0.48	0.46
query33	2.82	2.92	2.97
query34	17.02	4.34	4.35
query35	4.41	4.41	4.38
query36	0.66	0.50	0.51
query37	0.18	0.16	0.15
query38	0.15	0.15	0.15
query39	0.04	0.04	0.03
query40	0.14	0.12	0.13
query41	0.09	0.04	0.04
query42	0.06	0.04	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.63 s
Total hot run time: 30.07 s

@xinyiZzz xinyiZzz force-pushed the 20240707_fix_allocator_cache branch from 20e8053 to aa7d8fa Compare July 10, 2024 13:24
@xinyiZzz
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40002 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aa7d8fa2185f245e9a617986f5fd79f990defead, data reload: false

------ Round 1 ----------------------------------
q1	17606	5129	4331	4331
q2	2022	190	185	185
q3	10516	1158	1098	1098
q4	10184	867	791	791
q5	7507	2684	2635	2635
q6	218	137	133	133
q7	972	591	614	591
q8	9240	2089	2116	2089
q9	8907	6550	6504	6504
q10	8982	3682	3758	3682
q11	450	233	235	233
q12	438	230	222	222
q13	17758	2976	3018	2976
q14	271	227	231	227
q15	544	496	495	495
q16	513	372	371	371
q17	972	732	672	672
q18	8190	7501	7549	7501
q19	6080	1470	1500	1470
q20	671	330	326	326
q21	4968	3133	3913	3133
q22	401	340	337	337
Total cold run time: 117410 ms
Total hot run time: 40002 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4475	4279	4262	4262
q2	373	273	259	259
q3	3019	2841	2890	2841
q4	1907	1729	1724	1724
q5	5570	5591	5443	5443
q6	222	131	133	131
q7	2235	1894	1850	1850
q8	3291	3487	3445	3445
q9	8726	8876	8804	8804
q10	4087	3766	3808	3766
q11	589	508	511	508
q12	806	639	641	639
q13	16744	3183	3184	3183
q14	304	292	273	273
q15	525	489	476	476
q16	488	417	459	417
q17	1835	1536	1510	1510
q18	8187	7808	7794	7794
q19	3612	1657	1507	1507
q20	2122	1873	1830	1830
q21	5141	4770	4828	4770
q22	601	573	543	543
Total cold run time: 74859 ms
Total hot run time: 55975 ms

@doris-robot

TPC-DS: Total hot run time: 174466 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aa7d8fa2185f245e9a617986f5fd79f990defead, data reload: false

query1	908	373	373	373
query2	6433	2508	2347	2347
query3	6626	207	214	207
query4	26177	17393	17351	17351
query5	3589	502	492	492
query6	279	173	164	164
query7	4586	295	289	289
query8	321	300	304	300
query9	8506	2355	2354	2354
query10	439	268	262	262
query11	10948	10156	10058	10058
query12	115	86	86	86
query13	1629	394	367	367
query14	10240	8084	7616	7616
query15	237	193	181	181
query16	7156	294	304	294
query17	1825	547	532	532
query18	1182	272	286	272
query19	200	160	154	154
query20	100	84	82	82
query21	209	127	136	127
query22	4395	4140	4235	4140
query23	33839	33531	33550	33531
query24	11033	2827	2903	2827
query25	625	412	404	404
query26	1192	151	153	151
query27	2555	270	285	270
query28	7467	2119	2124	2119
query29	880	636	635	635
query30	248	151	147	147
query31	990	757	744	744
query32	99	54	56	54
query33	753	301	295	295
query34	971	499	481	481
query35	698	560	582	560
query36	1125	975	992	975
query37	142	88	86	86
query38	2940	2853	2841	2841
query39	909	843	846	843
query40	204	121	126	121
query41	55	53	55	53
query42	114	105	99	99
query43	617	559	559	559
query44	1200	737	741	737
query45	201	161	160	160
query46	1086	752	750	750
query47	1841	1777	1789	1777
query48	391	304	295	295
query49	843	404	402	402
query50	783	387	398	387
query51	6824	6800	6752	6752
query52	114	88	99	88
query53	357	291	284	284
query54	923	448	466	448
query55	76	72	73	72
query56	305	306	277	277
query57	1166	1063	1053	1053
query58	247	243	262	243
query59	3405	3291	3129	3129
query60	305	275	269	269
query61	95	93	122	93
query62	809	653	658	653
query63	319	288	285	285
query64	9642	2175	1699	1699
query65	3173	3114	3114	3114
query66	765	321	326	321
query67	15901	14988	15160	14988
query68	4498	545	552	545
query69	558	457	340	340
query70	1169	1164	1125	1125
query71	425	279	273	273
query72	7398	5538	5284	5284
query73	757	323	322	322
query74	6164	5467	5432	5432
query75	3389	2681	2683	2681
query76	2859	993	890	890
query77	610	301	299	299
query78	9573	8905	8922	8905
query79	2880	505	514	505
query80	977	462	477	462
query81	594	217	227	217
query82	799	136	135	135
query83	262	167	171	167
query84	233	89	87	87
query85	1345	309	299	299
query86	439	324	332	324
query87	3245	3100	3097	3097
query88	4487	2470	2478	2470
query89	467	381	399	381
query90	1761	193	192	192
query91	127	121	111	111
query92	59	50	53	50
query93	2434	510	505	505
query94	1081	211	215	211
query95	408	318	313	313
query96	612	287	282	282
query97	3218	3014	3025	3014
query98	230	206	194	194
query99	1507	1249	1235	1235
Total cold run time: 278892 ms
Total hot run time: 174466 ms

@doris-robot

ClickBench: Total hot run time: 30.52 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit aa7d8fa2185f245e9a617986f5fd79f990defead, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.09	0.08
query5	0.50	0.47	0.49
query6	1.14	0.72	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.55	0.50	0.48
query10	0.53	0.55	0.54
query11	0.14	0.11	0.11
query12	0.14	0.12	0.12
query13	0.58	0.58	0.59
query14	0.77	0.76	0.78
query15	0.86	0.81	0.82
query16	0.37	0.35	0.37
query17	1.01	1.02	1.02
query18	0.23	0.23	0.22
query19	1.90	1.70	1.69
query20	0.01	0.01	0.01
query21	15.38	0.72	0.65
query22	3.64	8.41	1.94
query23	18.32	1.35	1.27
query24	2.08	0.21	0.22
query25	0.16	0.09	0.09
query26	0.29	0.21	0.21
query27	0.46	0.23	0.23
query28	13.33	1.02	1.00
query29	12.59	3.35	3.30
query30	0.25	0.05	0.06
query31	2.87	0.39	0.38
query32	3.28	0.46	0.47
query33	2.90	2.90	2.92
query34	16.98	4.28	4.30
query35	4.41	4.43	4.40
query36	0.66	0.47	0.46
query37	0.19	0.15	0.16
query38	0.15	0.15	0.14
query39	0.05	0.03	0.04
query40	0.15	0.13	0.12
query41	0.08	0.04	0.04
query42	0.06	0.04	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.16 s
Total hot run time: 30.52 s

@xinyiZzz
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 11, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 40519 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f1bbb9d2edf70d615a1a2948c7307db573d9f99b, data reload: false

------ Round 1 ----------------------------------
q1	17610	4495	4336	4336
q2	2015	191	193	191
q3	10533	1247	1188	1188
q4	10189	813	811	811
q5	7587	2727	2705	2705
q6	222	137	141	137
q7	965	605	622	605
q8	9212	2098	2082	2082
q9	8888	6577	6600	6577
q10	8912	3817	3833	3817
q11	449	242	233	233
q12	411	237	233	233
q13	17769	3004	3019	3004
q14	283	241	236	236
q15	526	476	495	476
q16	493	393	373	373
q17	973	777	746	746
q18	8165	7520	7476	7476
q19	6981	1532	1440	1440
q20	681	333	324	324
q21	4944	3198	3285	3198
q22	385	331	339	331
Total cold run time: 118193 ms
Total hot run time: 40519 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4412	4263	4292	4263
q2	371	253	266	253
q3	3125	2928	3040	2928
q4	2026	1766	1741	1741
q5	5597	5527	5476	5476
q6	232	142	146	142
q7	2283	1880	1832	1832
q8	3273	3476	3440	3440
q9	8766	8911	8803	8803
q10	4206	3800	3867	3800
q11	601	497	509	497
q12	820	668	646	646
q13	16106	3161	3195	3161
q14	331	285	292	285
q15	539	489	498	489
q16	512	424	427	424
q17	1828	1552	1514	1514
q18	8291	8123	8043	8043
q19	1736	1552	1581	1552
q20	2173	1876	1874	1874
q21	7599	4821	5034	4821
q22	619	554	565	554
Total cold run time: 75446 ms
Total hot run time: 56538 ms

@doris-robot

TPC-DS: Total hot run time: 175224 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f1bbb9d2edf70d615a1a2948c7307db573d9f99b, data reload: false

query1	922	391	370	370
query2	6382	2580	2278	2278
query3	6640	208	227	208
query4	26908	17464	17447	17447
query5	3667	473	479	473
query6	287	171	161	161
query7	4576	301	285	285
query8	302	292	285	285
query9	8472	2409	2398	2398
query10	431	304	264	264
query11	12215	10210	10004	10004
query12	117	92	79	79
query13	1637	369	370	369
query14	10229	7725	7857	7725
query15	237	185	183	183
query16	7279	305	309	305
query17	1605	550	521	521
query18	1652	273	271	271
query19	194	150	151	150
query20	85	78	79	78
query21	205	129	121	121
query22	4546	4147	3994	3994
query23	34197	33545	33741	33545
query24	11010	2908	2879	2879
query25	610	384	395	384
query26	1281	159	146	146
query27	2241	271	284	271
query28	6848	2161	2157	2157
query29	892	679	617	617
query30	259	154	147	147
query31	989	774	782	774
query32	84	52	58	52
query33	713	297	293	293
query34	964	489	494	489
query35	719	594	587	587
query36	1121	992	957	957
query37	156	90	89	89
query38	2987	2951	2895	2895
query39	895	844	869	844
query40	209	129	125	125
query41	58	54	49	49
query42	117	108	107	107
query43	601	561	555	555
query44	1228	736	742	736
query45	200	168	176	168
query46	1088	742	725	725
query47	1856	1767	1751	1751
query48	372	298	298	298
query49	846	422	429	422
query50	785	398	387	387
query51	6877	6827	6806	6806
query52	118	99	91	91
query53	371	290	297	290
query54	997	465	454	454
query55	75	75	75	75
query56	302	287	281	281
query57	1126	1075	1056	1056
query58	241	253	282	253
query59	3520	3257	3121	3121
query60	310	297	298	297
query61	120	115	116	115
query62	783	661	627	627
query63	324	288	297	288
query64	9236	2334	1766	1766
query65	3233	3152	3112	3112
query66	849	339	346	339
query67	15784	15034	15084	15034
query68	6205	548	547	547
query69	759	504	387	387
query70	1180	1158	1168	1158
query71	485	276	279	276
query72	9309	5925	5563	5563
query73	773	322	325	322
query74	5989	5489	5576	5489
query75	4524	2678	2704	2678
query76	4236	1005	922	922
query77	688	300	302	300
query78	10053	9120	9018	9018
query79	8824	529	522	522
query80	2332	480	473	473
query81	591	222	218	218
query82	1181	138	136	136
query83	284	173	170	170
query84	273	86	87	86
query85	1312	317	296	296
query86	459	334	304	304
query87	3301	3094	3208	3094
query88	5185	2349	2370	2349
query89	545	381	387	381
query90	1946	196	194	194
query91	133	105	107	105
query92	67	49	50	49
query93	6953	510	505	505
query94	1209	215	210	210
query95	416	314	312	312
query96	612	284	270	270
query97	3260	2990	3028	2990
query98	230	200	200	200
query99	1605	1282	1277	1277
Total cold run time: 301099 ms
Total hot run time: 175224 ms

@doris-robot

ClickBench: Total hot run time: 31.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f1bbb9d2edf70d615a1a2948c7307db573d9f99b, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.08
query5	0.50	0.48	0.48
query6	1.12	0.73	0.72
query7	0.03	0.02	0.01
query8	0.06	0.04	0.04
query9	0.55	0.50	0.50
query10	0.54	0.54	0.54
query11	0.15	0.11	0.11
query12	0.14	0.12	0.13
query13	0.61	0.61	0.59
query14	0.76	0.77	0.77
query15	0.86	0.81	0.82
query16	0.36	0.36	0.37
query17	0.99	1.02	1.00
query18	0.22	0.21	0.21
query19	1.82	1.75	1.65
query20	0.01	0.01	0.01
query21	15.40	0.75	0.66
query22	4.05	6.63	2.44
query23	18.29	1.36	1.36
query24	2.13	0.24	0.23
query25	0.16	0.10	0.08
query26	0.29	0.22	0.21
query27	0.45	0.23	0.23
query28	13.27	1.01	0.99
query29	12.67	3.26	3.25
query30	0.25	0.06	0.05
query31	2.90	0.39	0.38
query32	3.26	0.48	0.46
query33	2.87	2.96	2.95
query34	17.09	4.35	4.46
query35	4.39	4.43	4.40
query36	0.65	0.46	0.47
query37	0.18	0.15	0.16
query38	0.15	0.14	0.15
query39	0.05	0.03	0.04
query40	0.16	0.11	0.12
query41	0.09	0.05	0.04
query42	0.06	0.04	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.58 s
Total hot run time: 31.16 s

Contributor

@HappenLee HappenLee left a comment

LGTM

@xinyiZzz xinyiZzz merged commit 5453e4a into apache:master Jul 11, 2024
25 of 29 checks passed
xinyiZzz added a commit to xinyiZzz/incubator-doris that referenced this pull request Jul 16, 2024
seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.1-merged reviewed
5 participants