[Enhancement](group commit)Optimize be select for group commit #35558

@Yukang-Lian Yukang-Lian commented May 28, 2024

  1. Stream load and INSERT INTO requests that are batched and sent to the master FE should select a BE with the same strategy (previously, INSERT INTO reused the first selected BE, while stream load used round robin). A first map, <table id, be id>, pins each table to one BE. The first time data is imported into a table, a BE is selected at random and the table id and be id are recorded in the map. All later imports into that table go to the BE recorded for its table id, so that as much data as possible is batched on a single BE.
    To keep a single BE from being overloaded, a variable similar to a bvar window tracks the total data volume sent to a specific BE for a specific table during the batch interval (default 10 seconds). A second map, <be id, window variable>, holds these counters. When a new import arrives and the window variable of its BE is below a threshold (e.g., 1 GB), the import is still sent to that BE according to the first map. If the threshold is exceeded, the import is sent to the BE with the smallest window variable, and the first map is updated. If every BE exceeds the threshold, the one with the smallest value is still chosen.

  2. For stream load, if the request is batched and sent to a BE, the data is batched directly on that BE and the transaction is committed at the end of the import. At that point, a request is sent to the FE, which records the size of the import and adds it to the window variable.

  3. Stream load and INSERT INTO requests sent to an observer FE follow the logic in 1 through an RPC: the table id is passed to the master FE, which returns the selected be id.
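
The selection logic in 1 can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the code in this PR: the names `SlidingWindowCounter`, `GroupCommitBackendSelector`, `select` and `record_load` are hypothetical, the window is keyed by be id only (as the second map is described), and the 1 GB threshold and 10 second interval are just the example values from the description.

```python
import random
import time
from collections import deque


class SlidingWindowCounter:
    """Sum of the byte counts added during the last `interval_s` seconds.

    Hypothetical stand-in for the "variable similar to a bvar window".
    """

    def __init__(self, interval_s=10.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.events = deque()  # (timestamp, nbytes), oldest first
        self.total = 0

    def _expire(self):
        # Drop every event that has fallen out of the window.
        cutoff = self.clock() - self.interval_s
        while self.events and self.events[0][0] <= cutoff:
            _, nbytes = self.events.popleft()
            self.total -= nbytes

    def add(self, nbytes):
        self._expire()
        self.events.append((self.clock(), nbytes))
        self.total += nbytes

    def value(self):
        self._expire()
        return self.total


class GroupCommitBackendSelector:
    """Pins each table to one BE and moves it when that BE is overloaded."""

    def __init__(self, backend_ids, threshold_bytes=1 << 30,
                 interval_s=10.0, clock=time.monotonic, rng=None):
        self.backend_ids = list(backend_ids)
        self.threshold_bytes = threshold_bytes
        self.table_to_be = {}  # map 1: table id -> be id
        self.be_window = {     # map 2: be id -> window variable
            be: SlidingWindowCounter(interval_s, clock)
            for be in self.backend_ids
        }
        self.rng = rng or random.Random()

    def select(self, table_id):
        be = self.table_to_be.get(table_id)
        if be is None:
            # First import into this table: pick a BE at random and pin it.
            be = self.rng.choice(self.backend_ids)
            self.table_to_be[table_id] = be
            return be
        if self.be_window[be].value() < self.threshold_bytes:
            return be
        # Pinned BE is over the threshold: switch to the BE with the
        # smallest window value, even if every BE is over the threshold.
        least = min(self.backend_ids, key=lambda b: self.be_window[b].value())
        self.table_to_be[table_id] = least
        return least

    def record_load(self, be_id, nbytes):
        # Called when the size of a finished import is reported.
        self.be_window[be_id].add(nbytes)
```

In this sketch, `record_load` plays the role of the FE adding a reported import size to the window variable (point 2), and an observer FE would obtain a be id by asking the master to run `select(table_id)` on its behalf (point 3). Whether the real implementation structures it this way is an assumption.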

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Yukang-Lian Yukang-Lian marked this pull request as draft May 28, 2024 15:20
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian Yukang-Lian marked this pull request as ready for review May 29, 2024 09:39
@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41637 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8cc9df72907d98faba25c5f80301c9d013fdc9e9, data reload: false

------ Round 1 ----------------------------------
q1	17619	4397	4260	4260
q2	2035	209	213	209
q3	10421	1226	1194	1194
q4	10210	886	800	800
q5	7494	2720	2727	2720
q6	214	135	139	135
q7	974	623	609	609
q8	9224	2166	2132	2132
q9	9483	6772	6831	6772
q10	9170	3896	3954	3896
q11	435	255	245	245
q12	419	223	234	223
q13	17340	3221	3282	3221
q14	249	214	222	214
q15	506	479	476	476
q16	493	398	400	398
q17	996	704	693	693
q18	8449	7852	7884	7852
q19	4649	1620	1614	1614
q20	659	332	316	316
q21	5188	3329	3414	3329
q22	382	329	329	329
Total cold run time: 116609 ms
Total hot run time: 41637 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4554	4439	4414	4414
q2	378	266	281	266
q3	3177	2879	2914	2879
q4	2003	1583	1647	1583
q5	5393	5531	5518	5518
q6	214	127	128	127
q7	2179	1785	1872	1785
q8	3239	3451	3396	3396
q9	8625	8731	8693	8693
q10	4108	3722	3803	3722
q11	592	498	502	498
q12	816	630	634	630
q13	16026	3147	3141	3141
q14	303	282	263	263
q15	546	492	480	480
q16	515	444	421	421
q17	1819	1512	1510	1510
q18	7844	7510	7499	7499
q19	1681	1577	1585	1577
q20	1992	1768	1775	1768
q21	4852	4687	4710	4687
q22	574	514	502	502
Total cold run time: 71430 ms
Total hot run time: 55359 ms

@doris-robot

TPC-DS: Total hot run time: 172013 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8cc9df72907d98faba25c5f80301c9d013fdc9e9, data reload: false

query1	929	375	368	368
query2	6445	2430	2287	2287
query3	6659	212	211	211
query4	20707	17282	17188	17188
query5	4101	421	414	414
query6	242	153	155	153
query7	4581	296	285	285
query8	301	265	272	265
query9	8581	2421	2393	2393
query10	453	288	262	262
query11	10560	10170	10089	10089
query12	155	87	88	87
query13	1636	359	366	359
query14	9465	8302	7477	7477
query15	235	190	191	190
query16	7444	265	257	257
query17	1309	517	514	514
query18	1890	273	271	271
query19	194	148	147	147
query20	94	82	83	82
query21	190	124	128	124
query22	4259	4037	3847	3847
query23	33453	33042	33327	33042
query24	5037	2803	2846	2803
query25	470	349	364	349
query26	701	152	149	149
query27	1845	321	331	321
query28	3705	2097	2087	2087
query29	839	616	607	607
query30	225	150	154	150
query31	917	762	738	738
query32	59	55	54	54
query33	432	270	258	258
query34	852	479	476	476
query35	698	605	593	593
query36	1039	898	920	898
query37	107	67	72	67
query38	2854	2796	2781	2781
query39	846	804	793	793
query40	200	127	122	122
query41	52	51	50	50
query42	101	95	95	95
query43	578	525	524	524
query44	1063	731	755	731
query45	184	170	170	170
query46	1049	704	760	704
query47	1824	1782	1759	1759
query48	367	291	294	291
query49	764	365	384	365
query50	760	385	374	374
query51	6900	6693	6697	6693
query52	108	90	93	90
query53	352	283	292	283
query54	521	431	417	417
query55	77	72	75	72
query56	266	274	248	248
query57	1119	1058	1036	1036
query58	219	211	210	210
query59	3442	3095	3220	3095
query60	268	259	259	259
query61	88	84	83	83
query62	558	461	443	443
query63	317	284	286	284
query64	2632	1740	1686	1686
query65	3172	3079	3068	3068
query66	698	324	319	319
query67	15242	15002	14787	14787
query68	4514	543	538	538
query69	429	271	270	270
query70	1202	1077	1145	1077
query71	403	279	274	274
query72	7446	5204	5441	5204
query73	717	317	318	317
query74	6018	5580	5651	5580
query75	3292	2632	2618	2618
query76	2112	991	961	961
query77	373	270	268	268
query78	10300	9797	9752	9752
query79	2480	516	511	511
query80	904	425	420	420
query81	510	228	222	222
query82	806	101	88	88
query83	249	169	169	169
query84	247	85	84	84
query85	947	266	261	261
query86	453	337	305	305
query87	3307	3071	3165	3071
query88	4235	2332	2331	2331
query89	492	405	387	387
query90	2029	180	188	180
query91	123	97	93	93
query92	66	50	49	49
query93	1943	506	496	496
query94	1146	179	179	179
query95	402	301	306	301
query96	578	272	261	261
query97	3218	3052	3002	3002
query98	248	219	208	208
query99	1175	851	842	842
Total cold run time: 249794 ms
Total hot run time: 172013 ms

@doris-robot

ClickBench: Total hot run time: 30.38 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8cc9df72907d98faba25c5f80301c9d013fdc9e9, data reload: false

query1	0.03	0.03	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.69	0.06	0.06
query5	0.52	0.48	0.50
query6	1.12	0.71	0.71
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.53	0.47	0.50
query10	0.54	0.54	0.53
query11	0.16	0.11	0.11
query12	0.14	0.12	0.11
query13	0.60	0.59	0.59
query14	0.78	0.76	0.76
query15	0.84	0.81	0.80
query16	0.37	0.37	0.37
query17	1.02	1.02	0.96
query18	0.23	0.25	0.27
query19	1.88	1.67	1.78
query20	0.01	0.01	0.01
query21	15.50	0.69	0.68
query22	4.95	6.53	1.77
query23	18.24	1.36	1.26
query24	1.95	0.22	0.21
query25	0.16	0.09	0.09
query26	0.26	0.17	0.16
query27	0.08	0.08	0.08
query28	13.31	1.01	0.99
query29	13.27	3.42	3.35
query30	0.24	0.05	0.06
query31	2.87	0.38	0.38
query32	3.30	0.47	0.47
query33	2.86	2.93	2.85
query34	17.23	4.54	4.41
query35	4.54	4.54	4.62
query36	0.65	0.46	0.45
query37	0.18	0.15	0.15
query38	0.16	0.14	0.14
query39	0.05	0.04	0.03
query40	0.16	0.14	0.15
query41	0.08	0.04	0.04
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.95 s
Total hot run time: 30.38 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.89% (9122/25417)
Line Coverage: 27.36% (74954/273905)
Region Coverage: 26.60% (38805/145874)
Branch Coverage: 23.44% (19749/84240)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8cc9df72907d98faba25c5f80301c9d013fdc9e9_8cc9df72907d98faba25c5f80301c9d013fdc9e9/report/index.html

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 42133 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 69a7be4f6a203de575602ef22a242f8f2a310176, data reload: false

------ Round 1 ----------------------------------
q1	17602	4495	4252	4252
q2	2020	195	193	193
q3	10467	1166	1239	1166
q4	10191	811	906	811
q5	7472	2750	2755	2750
q6	218	134	134	134
q7	985	622	592	592
q8	9244	2161	2123	2123
q9	9692	6717	6757	6717
q10	9175	3971	3929	3929
q11	464	254	240	240
q12	433	230	241	230
q13	18422	3212	3232	3212
q14	260	217	211	211
q15	503	466	471	466
q16	497	388	401	388
q17	1001	656	799	656
q18	8468	7920	7772	7772
q19	4234	1620	1590	1590
q20	659	316	315	315
q21	5166	4057	4096	4057
q22	410	329	333	329
Total cold run time: 117583 ms
Total hot run time: 42133 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4585	4509	4393	4393
q2	392	279	270	270
q3	3164	2963	2860	2860
q4	1937	1639	1612	1612
q5	5522	5490	5542	5490
q6	217	125	121	121
q7	2174	1813	1794	1794
q8	3289	3508	3391	3391
q9	8704	8795	8727	8727
q10	3995	3735	3874	3735
q11	610	525	514	514
q12	832	622	629	622
q13	16133	3179	3203	3179
q14	307	278	271	271
q15	542	488	471	471
q16	483	439	448	439
q17	1821	1523	1470	1470
q18	7654	7668	7623	7623
q19	1684	1522	1594	1522
q20	2050	1811	1805	1805
q21	10878	4720	4673	4673
q22	589	544	541	541
Total cold run time: 77562 ms
Total hot run time: 55523 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.99% (9148/25419)
Line Coverage: 27.43% (75143/273912)
Region Coverage: 26.64% (38891/145961)
Branch Coverage: 23.49% (19801/84310)
Coverage Report: http://coverage.selectdb-in.cc/coverage/69a7be4f6a203de575602ef22a242f8f2a310176_69a7be4f6a203de575602ef22a242f8f2a310176/report/index.html

@doris-robot

TPC-DS: Total hot run time: 168549 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 69a7be4f6a203de575602ef22a242f8f2a310176, data reload: false

query1	926	375	369	369
query2	7563	2414	2388	2388
query3	6642	225	214	214
query4	19240	17321	16909	16909
query5	4113	418	427	418
query6	246	164	169	164
query7	4589	304	285	285
query8	303	277	277	277
query9	8657	2400	2384	2384
query10	444	281	260	260
query11	10513	10032	10002	10002
query12	139	90	88	88
query13	1663	352	351	351
query14	8623	8333	6840	6840
query15	226	196	198	196
query16	7727	262	258	258
query17	1549	520	513	513
query18	1943	273	269	269
query19	202	152	148	148
query20	97	86	103	86
query21	202	139	135	135
query22	4118	3976	3914	3914
query23	33462	32826	33117	32826
query24	5689	2852	2844	2844
query25	465	368	351	351
query26	693	156	151	151
query27	1847	316	312	312
query28	3819	2061	2059	2059
query29	842	594	582	582
query30	227	147	150	147
query31	929	754	766	754
query32	58	52	52	52
query33	493	272	277	272
query34	886	475	485	475
query35	726	585	593	585
query36	1051	932	912	912
query37	102	65	64	64
query38	2874	2759	2747	2747
query39	858	786	782	782
query40	196	128	120	120
query41	48	44	45	44
query42	103	97	97	97
query43	581	551	535	535
query44	1107	724	733	724
query45	180	162	164	162
query46	1054	714	710	710
query47	1835	1765	1785	1765
query48	365	290	287	287
query49	764	373	391	373
query50	766	376	380	376
query51	6919	6825	6870	6825
query52	100	94	88	88
query53	361	288	287	287
query54	558	421	417	417
query55	73	73	71	71
query56	256	237	242	237
query57	1099	1071	1060	1060
query58	234	226	222	222
query59	3379	3394	3103	3103
query60	277	254	251	251
query61	93	89	89	89
query62	533	470	446	446
query63	308	287	287	287
query64	2658	1720	1726	1720
query65	3165	3073	3123	3073
query66	791	336	332	332
query67	15211	14738	14927	14738
query68	4560	532	522	522
query69	442	268	266	266
query70	1203	1155	1127	1127
query71	377	276	309	276
query72	7525	4168	2575	2575
query73	710	321	318	318
query74	5990	5523	5600	5523
query75	3458	2583	2614	2583
query76	2277	1009	1017	1009
query77	405	263	267	263
query78	10315	9692	9830	9692
query79	2453	520	517	517
query80	1090	427	423	423
query81	516	227	222	222
query82	708	92	90	90
query83	247	165	169	165
query84	250	89	82	82
query85	1422	305	311	305
query86	496	317	317	317
query87	3287	3144	3136	3136
query88	4043	2350	2345	2345
query89	475	399	403	399
query90	2064	191	190	190
query91	204	94	101	94
query92	58	48	48	48
query93	2323	501	487	487
query94	1179	179	185	179
query95	399	309	303	303
query96	594	266	265	265
query97	3137	2954	3043	2954
query98	248	211	214	211
query99	1224	841	843	841
Total cold run time: 251439 ms
Total hot run time: 168549 ms

@doris-robot

ClickBench: Total hot run time: 30.77 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 69a7be4f6a203de575602ef22a242f8f2a310176, data reload: false

query1	0.05	0.03	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.06	0.06
query5	0.50	0.49	0.51
query6	1.12	0.72	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.05
query9	0.55	0.48	0.49
query10	0.54	0.56	0.55
query11	0.16	0.11	0.11
query12	0.14	0.12	0.13
query13	0.60	0.60	0.59
query14	0.77	0.79	0.79
query15	0.82	0.81	0.82
query16	0.35	0.37	0.39
query17	0.95	1.00	0.97
query18	0.22	0.25	0.22
query19	1.82	1.73	1.68
query20	0.02	0.01	0.01
query21	15.74	0.66	0.66
query22	4.31	6.65	2.11
query23	18.29	1.32	1.25
query24	1.30	0.42	0.22
query25	0.15	0.08	0.08
query26	0.26	0.17	0.16
query27	0.07	0.07	0.08
query28	13.42	1.01	0.99
query29	13.74	3.32	3.32
query30	0.24	0.06	0.06
query31	2.89	0.39	0.38
query32	3.25	0.47	0.47
query33	2.92	2.89	2.93
query34	16.99	4.44	4.46
query35	4.66	4.48	4.60
query36	0.66	0.46	0.49
query37	0.18	0.15	0.16
query38	0.15	0.15	0.14
query39	0.04	0.04	0.03
query40	0.18	0.14	0.14
query41	0.09	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.03	0.03
Total cold run time: 110.3 s
Total hot run time: 30.77 s

@morrySnow
Contributor

add description

@Yukang-Lian Yukang-Lian changed the title Optimize be select for group commit [Enhancement](group commit)Optimize be select for group commit May 31, 2024
@dataroaring dataroaring force-pushed the Optimize_Be_Select_For_Group_Commit_Step1 branch from 69a7be4 to 3b7a372 Compare June 5, 2024 00:51
Contributor

github-actions bot commented Jun 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian
Collaborator Author

run buildall

@Yukang-Lian
Collaborator Author

run buildall

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40456 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9d208b3d8e6533e24895c3b716d4a34ebf9b77f6, data reload: false

------ Round 1 ----------------------------------
q1	18506	4510	4347	4347
q2	2037	192	188	188
q3	10526	1222	1077	1077
q4	10209	896	865	865
q5	7545	2743	2669	2669
q6	223	143	142	142
q7	970	609	611	609
q8	9228	2094	2105	2094
q9	8626	6624	6592	6592
q10	8774	3814	3802	3802
q11	450	241	248	241
q12	403	236	234	234
q13	18697	2964	3017	2964
q14	279	233	238	233
q15	543	480	492	480
q16	521	386	385	385
q17	982	647	706	647
q18	8213	7581	7494	7494
q19	5479	1470	1411	1411
q20	663	333	346	333
q21	4909	3363	3447	3363
q22	366	286	296	286
Total cold run time: 118149 ms
Total hot run time: 40456 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4422	4267	4270	4267
q2	382	286	277	277
q3	3028	2956	2963	2956
q4	1993	1767	1684	1684
q5	5551	5569	5450	5450
q6	231	134	140	134
q7	2255	1900	1837	1837
q8	3269	3451	3441	3441
q9	8791	8844	8815	8815
q10	4197	3773	3882	3773
q11	599	538	502	502
q12	809	647	629	629
q13	16563	3168	3204	3168
q14	325	296	295	295
q15	523	498	494	494
q16	511	445	448	445
q17	1833	1544	1543	1543
q18	8160	7895	7848	7848
q19	1721	1603	1580	1580
q20	2094	1874	1845	1845
q21	5173	4999	4645	4645
q22	587	510	512	510
Total cold run time: 73017 ms
Total hot run time: 56138 ms

@doris-robot

TPC-DS: Total hot run time: 173255 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9d208b3d8e6533e24895c3b716d4a34ebf9b77f6, data reload: false

query1	914	381	364	364
query2	6430	1844	1785	1785
query3	6656	208	221	208
query4	24558	17634	17291	17291
query5	3659	486	478	478
query6	266	200	167	167
query7	4579	295	295	295
query8	243	200	206	200
query9	8523	2448	2438	2438
query10	428	274	296	274
query11	10757	10152	10012	10012
query12	120	87	86	86
query13	1636	380	370	370
query14	10192	7747	7599	7599
query15	226	170	175	170
query16	7703	463	499	463
query17	1591	568	574	568
query18	1958	283	281	281
query19	194	155	152	152
query20	86	81	80	80
query21	197	130	129	129
query22	4225	4031	3888	3888
query23	33915	33810	33538	33538
query24	11125	2951	2862	2862
query25	597	390	380	380
query26	715	151	150	150
query27	2196	279	279	279
query28	6493	2075	2077	2075
query29	871	651	655	651
query30	257	156	147	147
query31	986	764	748	748
query32	99	50	56	50
query33	750	333	360	333
query34	923	493	513	493
query35	854	773	719	719
query36	1151	998	990	990
query37	147	78	79	78
query38	2975	2890	2829	2829
query39	855	846	878	846
query40	205	127	122	122
query41	47	45	42	42
query42	115	98	100	98
query43	496	457	470	457
query44	1202	739	730	730
query45	191	159	157	157
query46	1063	707	740	707
query47	1839	1770	1771	1770
query48	380	296	290	290
query49	843	403	413	403
query50	776	415	401	401
query51	6989	6725	6776	6725
query52	98	96	93	93
query53	361	294	290	290
query54	921	448	444	444
query55	75	73	73	73
query56	280	269	267	267
query57	1143	1019	1008	1008
query58	244	249	261	249
query59	2872	2613	2708	2613
query60	310	271	273	271
query61	96	91	92	91
query62	798	639	611	611
query63	323	294	291	291
query64	9145	2221	1661	1661
query65	3158	3135	3083	3083
query66	753	336	323	323
query67	15316	14895	14778	14778
query68	4595	549	541	541
query69	500	338	350	338
query70	1201	1175	1065	1065
query71	430	291	280	280
query72	7954	5417	5794	5417
query73	742	326	325	325
query74	6057	5639	5650	5639
query75	3834	2677	2681	2677
query76	2777	942	941	941
query77	651	302	302	302
query78	9541	9050	9012	9012
query79	2901	526	530	526
query80	1584	539	461	461
query81	591	223	223	223
query82	911	136	135	135
query83	336	169	171	169
query84	278	83	93	83
query85	1983	319	295	295
query86	465	328	307	307
query87	3297	3078	3082	3078
query88	4027	2376	2359	2359
query89	479	387	391	387
query90	1861	196	200	196
query91	129	102	100	100
query92	57	47	51	47
query93	3124	525	512	512
query94	1190	285	288	285
query95	411	318	324	318
query96	600	332	271	271
query97	3180	3004	3066	3004
query98	218	202	195	195
query99	1726	1253	1281	1253
Total cold run time: 277719 ms
Total hot run time: 173255 ms

@doris-robot

ClickBench: Total hot run time: 30.5 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9d208b3d8e6533e24895c3b716d4a34ebf9b77f6, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.04
query4	1.69	0.07	0.07
query5	0.48	0.48	0.47
query6	1.12	0.72	0.72
query7	0.03	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.50	0.50
query10	0.55	0.53	0.54
query11	0.14	0.12	0.11
query12	0.16	0.11	0.12
query13	0.59	0.60	0.58
query14	0.77	0.76	0.79
query15	0.85	0.82	0.82
query16	0.37	0.37	0.36
query17	0.94	1.01	1.00
query18	0.23	0.22	0.22
query19	1.80	1.81	1.76
query20	0.01	0.01	0.01
query21	15.42	0.76	0.65
query22	4.05	7.13	1.84
query23	18.31	1.29	1.34
query24	2.12	0.22	0.23
query25	0.16	0.08	0.08
query26	0.30	0.21	0.21
query27	0.45	0.23	0.23
query28	13.30	1.04	1.00
query29	12.60	3.33	3.26
query30	0.27	0.06	0.06
query31	2.88	0.38	0.39
query32	3.27	0.48	0.46
query33	2.86	2.95	2.90
query34	16.80	4.29	4.32
query35	4.40	4.40	4.42
query36	0.64	0.48	0.46
query37	0.18	0.16	0.16
query38	0.15	0.14	0.14
query39	0.04	0.03	0.04
query40	0.14	0.13	0.14
query41	0.10	0.04	0.06
query42	0.06	0.05	0.06
query43	0.04	0.05	0.04
Total cold run time: 109.21 s
Total hot run time: 30.5 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 20, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 99d5bcc into apache:master Jul 20, 2024
25 of 27 checks passed
dataroaring pushed a commit that referenced this pull request Jul 22, 2024
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Jul 22, 2024
…e#35558)

dataroaring pushed a commit that referenced this pull request Jul 24, 2024
dataroaring pushed a commit that referenced this pull request Aug 28, 2024
## Proposed changes

In #35558, we optimized BE selection for group commit. However, we forgot
to apply this strategy to cloud. This PR applies it.
dataroaring pushed a commit that referenced this pull request Aug 28, 2024
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Sep 3, 2024
…#39986)

In apache#35558, we optimized BE selection for group commit. However, we forgot
to apply this strategy to cloud. This PR applies it.
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.1-merged meta-change reviewed