[Fix]add min scan thread num for workload group's scan thread #38096

Merged 1 commit on Jul 19, 2024

@wangbo wangbo commented Jul 18, 2024

Proposed changes

Set a minimum thread count for the remote scan thread pools of both workload groups and non-workload-group queries. This reduces the total number of threads and prevents the BE from core dumping due to thread exhaustion.
Before: image
After: image
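
To make the effect of a minimum thread count concrete, here is a minimal sketch in Python. It is not the Doris implementation, which lives in the C++ BE code. `ScanPoolLimits`, `idle_threads`, and `threads_under_load` are hypothetical names, and the model assumes each remote scan pool keeps `min_threads` alive when idle and grows on demand up to `max_threads`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanPoolLimits:
    """Hypothetical min/max thread bounds for one remote scan thread pool."""

    min_threads: int
    max_threads: int

    def __post_init__(self) -> None:
        # A pool cannot keep more threads alive than it is allowed to create.
        if not 0 <= self.min_threads <= self.max_threads:
            raise ValueError("require 0 <= min_threads <= max_threads")


def idle_threads(limits: ScanPoolLimits, pool_count: int) -> int:
    """Threads kept alive across all pools when no scan work is queued."""
    return limits.min_threads * pool_count


def threads_under_load(limits: ScanPoolLimits, pending_tasks: int) -> int:
    """Threads one pool runs for a given backlog, clamped to [min, max]."""
    return max(limits.min_threads, min(pending_tasks, limits.max_threads))
```

With illustrative numbers (not taken from this PR), ten pools that each hold 512 threads at all times would keep 5120 threads alive, while the same pools with a minimum of 16 would idle at 160 and still scale to 512 each under load.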

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@wangbo wangbo force-pushed the 0718_set_min_scan_thread_num branch from 22e4664 to 5eb3cd3 Compare July 18, 2024 12:41

wangbo commented Jul 18, 2024

run buildall

clang-tidy review says "All clean, LGTM! 👍"

yiguolei previously approved these changes Jul 18, 2024
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 18, 2024
PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 40062 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5eb3cd37b38cc917bf88e71867dd8322153c0836, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17601	4418	4308	4308
q2	2042	196	194	194
q3	10504	1276	1102	1102
q4	10554	849	840	840
q5	7966	2882	2674	2674
q6	226	140	139	139
q7	984	613	615	613
q8	9341	2126	2143	2126
q9	9289	6688	6628	6628
q10	8822	3836	3781	3781
q11	464	236	244	236
q12	399	224	222	222
q13	17762	2941	2983	2941
q14	278	238	244	238
q15	528	500	483	483
q16	506	387	377	377
q17	1027	718	703	703
q18	8226	7422	7391	7391
q19	5237	1430	1380	1380
q20	687	324	312	312
q21	4835	3088	3268	3088
q22	351	291	286	286
Total cold run time: 117629 ms
Total hot run time: 40062 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4417	4285	4351	4285
q2	367	270	260	260
q3	3035	2799	2725	2725
q4	1910	1606	1544	1544
q5	5283	5351	5309	5309
q6	224	130	130	130
q7	2088	1742	1712	1712
q8	3236	3369	3334	3334
q9	8464	8407	8425	8407
q10	3934	3707	3722	3707
q11	621	489	484	484
q12	762	584	590	584
q13	17474	2935	2975	2935
q14	300	273	274	273
q15	509	476	474	474
q16	474	423	404	404
q17	1782	1504	1460	1460
q18	7631	7541	7540	7540
q19	1643	1536	1559	1536
q20	2009	1755	1774	1755
q21	4820	4784	4722	4722
q22	555	500	487	487
Total cold run time: 71538 ms
Total hot run time: 54067 ms

@doris-robot

TPC-DS: Total hot run time: 172267 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5eb3cd37b38cc917bf88e71867dd8322153c0836, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	911	376	366	366
query2	6469	1844	1770	1770
query3	6664	202	217	202
query4	28510	17582	17322	17322
query5	4220	479	500	479
query6	260	171	163	163
query7	4600	284	287	284
query8	251	185	182	182
query9	8424	2381	2365	2365
query10	440	277	263	263
query11	10981	10036	10064	10036
query12	130	81	78	78
query13	1630	376	373	373
query14	9499	7747	7745	7745
query15	232	170	166	166
query16	7602	314	322	314
query17	1339	559	511	511
query18	1794	269	268	268
query19	193	143	146	143
query20	91	85	84	84
query21	202	130	124	124
query22	4447	4051	3893	3893
query23	33729	33123	32907	32907
query24	12177	2820	2802	2802
query25	665	359	365	359
query26	1818	145	144	144
query27	2947	267	269	267
query28	7349	1969	1987	1969
query29	1088	621	616	616
query30	287	152	151	151
query31	964	762	737	737
query32	94	50	52	50
query33	762	289	282	282
query34	1001	473	475	473
query35	692	618	555	555
query36	1058	919	907	907
query37	192	80	78	78
query38	2875	2784	2767	2767
query39	872	801	807	801
query40	281	118	117	117
query41	44	45	43	43
query42	118	102	99	99
query43	479	463	461	461
query44	1259	731	733	731
query45	195	160	160	160
query46	1102	746	752	746
query47	1859	1778	1778	1778
query48	392	288	295	288
query49	1171	407	398	398
query50	777	383	401	383
query51	6921	6776	6728	6728
query52	102	90	90	90
query53	362	286	286	286
query54	950	445	449	445
query55	74	72	72	72
query56	285	264	265	264
query57	1147	1034	1090	1034
query58	262	240	258	240
query59	2810	2620	2641	2620
query60	305	289	290	289
query61	118	113	115	113
query62	850	631	665	631
query63	326	288	298	288
query64	10572	2187	1681	1681
query65	3230	3128	3126	3126
query66	1359	339	322	322
query67	15368	14831	14925	14831
query68	4604	547	547	547
query69	621	485	383	383
query70	1169	1060	1138	1060
query71	414	274	285	274
query72	7003	5330	6065	5330
query73	768	325	330	325
query74	6267	5693	5588	5588
query75	3521	2660	2692	2660
query76	2877	1006	949	949
query77	714	298	309	298
query78	9503	9652	9372	9372
query79	4850	507	522	507
query80	1004	473	476	473
query81	552	221	218	218
query82	877	134	132	132
query83	193	170	167	167
query84	275	91	90	90
query85	1476	316	296	296
query86	334	321	315	315
query87	3364	3118	3182	3118
query88	4582	2451	2532	2451
query89	497	378	387	378
query90	1823	200	200	200
query91	131	102	102	102
query92	62	48	50	48
query93	5198	513	496	496
query94	970	208	209	208
query95	412	309	314	309
query96	606	271	279	271
query97	3193	2977	3049	2977
query98	229	206	188	188
query99	1758	1303	1277	1277
Total cold run time: 290024 ms
Total hot run time: 172267 ms

@doris-robot

ClickBench: Total hot run time: 30.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5eb3cd37b38cc917bf88e71867dd8322153c0836, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.04	0.03
query3	0.22	0.05	0.04
query4	1.69	0.07	0.07
query5	0.47	0.47	0.48
query6	1.14	0.72	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.56	0.50	0.49
query10	0.54	0.55	0.54
query11	0.15	0.11	0.12
query12	0.14	0.12	0.13
query13	0.59	0.59	0.58
query14	0.75	0.79	0.78
query15	0.85	0.81	0.82
query16	0.36	0.36	0.37
query17	0.97	1.01	0.97
query18	0.23	0.22	0.21
query19	1.90	1.75	1.71
query20	0.01	0.01	0.02
query21	15.42	0.74	0.66
query22	4.23	7.61	2.12
query23	18.28	1.39	1.28
query24	2.21	0.22	0.23
query25	0.16	0.10	0.08
query26	0.29	0.21	0.20
query27	0.45	0.23	0.22
query28	13.15	1.01	0.98
query29	12.65	3.25	3.24
query30	0.25	0.06	0.05
query31	2.86	0.38	0.38
query32	3.30	0.47	0.45
query33	2.88	2.90	2.91
query34	16.99	4.39	4.36
query35	4.40	4.42	4.44
query36	0.64	0.46	0.47
query37	0.19	0.16	0.15
query38	0.14	0.15	0.14
query39	0.04	0.04	0.03
query40	0.16	0.13	0.11
query41	0.09	0.05	0.05
query42	0.05	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.63 s
Total hot run time: 30.68 s

clang-tidy review says "All clean, LGTM! 👍"

@wangbo wangbo force-pushed the 0718_set_min_scan_thread_num branch from de33794 to fe8d7a9 Compare July 19, 2024 03:53
clang-tidy review says "All clean, LGTM! 👍"


wangbo commented Jul 19, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 40092 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fe8d7a96c98a4fa56e1d04cfb939f025ee3c26af, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17622	4858	4332	4332
q2	2019	194	193	193
q3	10427	1250	1139	1139
q4	10181	813	815	813
q5	7536	2719	2713	2713
q6	216	140	140	140
q7	969	605	605	605
q8	9207	2086	2105	2086
q9	8592	6573	6560	6560
q10	8846	3752	3761	3752
q11	444	240	253	240
q12	481	232	231	231
q13	17771	2974	2998	2974
q14	272	234	236	234
q15	525	473	486	473
q16	496	387	376	376
q17	983	705	749	705
q18	8138	7553	7374	7374
q19	5493	1428	1373	1373
q20	675	328	315	315
q21	4881	3184	3205	3184
q22	347	281	280	280
Total cold run time: 116121 ms
Total hot run time: 40092 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4423	4243	4285	4243
q2	397	265	265	265
q3	2989	2895	2873	2873
q4	2015	1766	1701	1701
q5	5660	5545	5505	5505
q6	222	138	131	131
q7	2231	1851	1862	1851
q8	3294	3479	3414	3414
q9	8816	8790	8915	8790
q10	4102	3905	3734	3734
q11	629	502	496	496
q12	798	664	627	627
q13	15993	3212	3177	3177
q14	305	292	278	278
q15	556	495	487	487
q16	498	436	435	435
q17	1841	1529	1501	1501
q18	8215	8028	7784	7784
q19	1882	1552	1526	1526
q20	2181	1902	1842	1842
q21	5248	4944	4966	4944
q22	562	490	509	490
Total cold run time: 72857 ms
Total hot run time: 56094 ms

@doris-robot

TPC-DS: Total hot run time: 172845 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fe8d7a96c98a4fa56e1d04cfb939f025ee3c26af, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	908	365	365	365
query2	6451	1839	1833	1833
query3	6631	208	219	208
query4	21591	17378	17013	17013
query5	3673	468	506	468
query6	280	181	164	164
query7	4585	287	285	285
query8	245	198	201	198
query9	8632	2466	2440	2440
query10	447	298	272	272
query11	12531	10043	9987	9987
query12	113	80	82	80
query13	1629	362	358	358
query14	9590	7682	7709	7682
query15	216	175	166	166
query16	7558	470	430	430
query17	1289	540	537	537
query18	1880	274	269	269
query19	206	152	153	152
query20	94	80	79	79
query21	202	123	119	119
query22	4278	4184	4010	4010
query23	34219	33885	33654	33654
query24	11249	2889	2885	2885
query25	626	413	415	413
query26	702	155	152	152
query27	2262	284	293	284
query28	6247	2094	2078	2078
query29	943	665	707	665
query30	256	149	152	149
query31	975	783	754	754
query32	94	55	60	55
query33	765	327	341	327
query34	924	482	495	482
query35	862	718	751	718
query36	1144	995	944	944
query37	139	79	82	79
query38	2935	2909	2853	2853
query39	935	867	824	824
query40	202	118	117	117
query41	50	45	43	43
query42	113	96	103	96
query43	497	457	463	457
query44	1213	733	738	733
query45	198	163	159	159
query46	1088	741	719	719
query47	1860	1765	1759	1759
query48	381	286	286	286
query49	838	405	400	400
query50	778	382	393	382
query51	6932	6850	6630	6630
query52	98	93	93	93
query53	360	297	285	285
query54	921	450	447	447
query55	76	73	73	73
query56	284	259	269	259
query57	1132	1076	1046	1046
query58	247	247	265	247
query59	2951	2845	2576	2576
query60	311	271	273	271
query61	98	90	90	90
query62	787	663	642	642
query63	319	292	283	283
query64	9174	2216	1643	1643
query65	3162	3099	3127	3099
query66	754	331	325	325
query67	15327	14957	14734	14734
query68	4468	536	545	536
query69	457	371	355	355
query70	1120	1097	1161	1097
query71	430	276	297	276
query72	6995	5423	5991	5423
query73	745	325	321	321
query74	6143	5670	5612	5612
query75	3437	2726	2709	2709
query76	2732	965	902	902
query77	469	311	306	306
query78	9570	8898	10242	8898
query79	2431	508	517	508
query80	1149	471	471	471
query81	586	223	231	223
query82	806	132	137	132
query83	231	164	167	164
query84	236	84	88	84
query85	1769	303	302	302
query86	333	305	321	305
query87	3265	3051	3079	3051
query88	3759	2343	2357	2343
query89	477	377	372	372
query90	1701	197	188	188
query91	128	98	99	98
query92	56	50	51	50
query93	2282	512	509	509
query94	736	293	282	282
query95	395	321	305	305
query96	591	273	277	273
query97	3180	3019	2981	2981
query98	222	202	252	202
query99	1554	1242	1265	1242
Total cold run time: 270910 ms
Total hot run time: 172845 ms

@doris-robot

ClickBench: Total hot run time: 30.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fe8d7a96c98a4fa56e1d04cfb939f025ee3c26af, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.04
query2	0.07	0.04	0.04
query3	0.23	0.05	0.04
query4	1.68	0.08	0.08
query5	0.49	0.49	0.48
query6	1.13	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.55	0.49	0.48
query10	0.54	0.54	0.54
query11	0.15	0.11	0.12
query12	0.14	0.11	0.12
query13	0.59	0.59	0.58
query14	0.76	0.78	0.76
query15	0.86	0.81	0.81
query16	0.37	0.37	0.36
query17	1.03	0.98	0.99
query18	0.22	0.21	0.20
query19	1.77	1.74	1.73
query20	0.01	0.01	0.01
query21	15.41	0.75	0.67
query22	4.96	6.21	2.21
query23	18.29	1.34	1.16
query24	2.11	0.22	0.22
query25	0.15	0.10	0.08
query26	0.29	0.21	0.20
query27	0.45	0.23	0.23
query28	13.28	1.01	1.00
query29	12.70	3.38	3.35
query30	0.24	0.06	0.06
query31	2.86	0.39	0.38
query32	3.28	0.46	0.48
query33	2.89	2.98	2.94
query34	17.20	4.33	4.38
query35	4.50	4.44	4.37
query36	0.65	0.48	0.47
query37	0.19	0.16	0.15
query38	0.15	0.16	0.14
query39	0.04	0.03	0.04
query40	0.15	0.11	0.12
query41	0.09	0.05	0.04
query42	0.06	0.05	0.05
query43	0.04	0.03	0.04
Total cold run time: 110.68 s
Total hot run time: 30.81 s

@yiguolei yiguolei merged commit af2d93c into apache:master Jul 19, 2024
26 of 31 checks passed
yiguolei pushed a commit that referenced this pull request Jul 19, 2024
dataroaring pushed a commit that referenced this pull request Jul 22, 2024
## Proposed changes
Set a minimum thread count for the remote scan thread pools of both
workload groups and non-workload-group queries. This reduces the total
number of threads and prevents the BE from core dumping due to thread
exhaustion.
before:
<img width="582" alt="image"
src="https://github.com/user-attachments/assets/3a861191-c5a9-4b73-8a08-0aec0bed1cd5">
after:
<img width="522" alt="image"
src="https://github.com/user-attachments/assets/4024bbc8-d9d3-45bd-a895-07a6d87a6fd8">