Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](split) remove retry when fetch split batch failed #37636

Merged
merged 1 commit into from
Jul 12, 2024

Conversation

AshinGau
Copy link
Member

Proposed changes

We need to remove the retry logic for failed to fetch split batch. Originally, this was implemented to handle cases where the cached client connection might have been lost and needed to be reestablished. However, this retry mechanism can lead to data loss. For instance, if a batch of data has already been sent, retrying can cause this batch to be lost without the receiver being aware of it.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@AshinGau
Copy link
Member Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 39956 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7cd0053447c39970c595d5b78be362aeace87f48, data reload: false

------ Round 1 ----------------------------------
q1	17605	4348	4236	4236
q2	2015	195	180	180
q3	10457	1169	1134	1134
q4	10181	755	776	755
q5	7547	2700	2781	2700
q6	225	145	139	139
q7	946	603	605	603
q8	9204	2081	2056	2056
q9	8807	6614	6597	6597
q10	8840	3775	3781	3775
q11	454	242	233	233
q12	494	223	231	223
q13	18655	2990	3001	2990
q14	289	230	248	230
q15	529	494	475	475
q16	510	375	375	375
q17	969	705	745	705
q18	8037	7493	7303	7303
q19	7201	1450	1470	1450
q20	681	311	315	311
q21	4873	3162	3310	3162
q22	401	327	324	324
Total cold run time: 118920 ms
Total hot run time: 39956 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4389	4296	4280	4280
q2	386	282	259	259
q3	3063	2961	2906	2906
q4	2059	1765	1684	1684
q5	5551	5492	5456	5456
q6	237	142	134	134
q7	2269	1838	1807	1807
q8	3258	3403	3384	3384
q9	8745	8858	8790	8790
q10	4177	3829	3946	3829
q11	612	495	510	495
q12	841	727	660	660
q13	17270	3188	3165	3165
q14	313	309	297	297
q15	526	493	485	485
q16	482	432	436	432
q17	1772	1520	1504	1504
q18	8189	7849	7838	7838
q19	1822	1683	1539	1539
q20	2175	1861	1853	1853
q21	5089	4874	4883	4874
q22	654	576	571	571
Total cold run time: 73879 ms
Total hot run time: 56242 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 173842 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7cd0053447c39970c595d5b78be362aeace87f48, data reload: false

query1	931	376	383	376
query2	6472	2556	2374	2374
query3	6639	211	217	211
query4	28360	17631	17146	17146
query5	3773	501	491	491
query6	301	174	176	174
query7	4598	292	287	287
query8	336	305	314	305
query9	8462	2416	2398	2398
query10	422	290	265	265
query11	11987	10116	10026	10026
query12	121	84	85	84
query13	1631	365	367	365
query14	11564	7760	6883	6883
query15	233	182	191	182
query16	7368	315	338	315
query17	1697	556	516	516
query18	1168	274	272	272
query19	192	156	144	144
query20	91	80	80	80
query21	210	125	126	125
query22	4273	4106	4015	4015
query23	34164	33770	33476	33476
query24	11119	2926	2931	2926
query25	593	415	392	392
query26	782	161	146	146
query27	2296	273	272	272
query28	6901	2192	2164	2164
query29	871	623	625	623
query30	259	151	150	150
query31	941	791	766	766
query32	100	55	55	55
query33	755	293	294	293
query34	1054	489	509	489
query35	715	578	641	578
query36	1183	959	983	959
query37	153	87	86	86
query38	3026	2824	2861	2824
query39	851	784	796	784
query40	209	121	117	117
query41	52	50	51	50
query42	123	98	94	94
query43	597	544	553	544
query44	1220	723	731	723
query45	207	167	172	167
query46	1085	724	734	724
query47	1834	1792	1777	1777
query48	366	300	296	296
query49	860	397	417	397
query50	795	381	391	381
query51	6837	6871	6792	6792
query52	103	91	90	90
query53	369	288	289	288
query54	879	440	447	440
query55	75	76	76	76
query56	277	261	263	261
query57	1113	1050	1072	1050
query58	238	256	270	256
query59	3596	3144	3397	3144
query60	299	274	280	274
query61	97	95	96	95
query62	800	614	648	614
query63	326	292	288	288
query64	9502	2228	1794	1794
query65	3217	3072	3142	3072
query66	783	344	341	341
query67	15690	14941	14868	14868
query68	5624	547	534	534
query69	707	411	347	347
query70	1128	1165	1179	1165
query71	471	284	273	273
query72	7645	5888	5676	5676
query73	773	319	323	319
query74	6003	5540	5506	5506
query75	3488	2683	2713	2683
query76	3169	952	939	939
query77	628	304	297	297
query78	11792	9312	8939	8939
query79	6912	523	525	523
query80	1582	485	467	467
query81	595	219	220	219
query82	791	132	127	127
query83	278	168	161	161
query84	275	88	86	86
query85	841	372	302	302
query86	468	282	330	282
query87	3293	3161	3130	3130
query88	4690	2354	2353	2353
query89	489	382	391	382
query90	1803	187	193	187
query91	130	101	101	101
query92	64	51	50	50
query93	5192	509	504	504
query94	1106	208	212	208
query95	412	313	308	308
query96	617	280	268	268
query97	3221	3036	3024	3024
query98	234	196	200	196
query99	1531	1256	1256	1256
Total cold run time: 294093 ms
Total hot run time: 173842 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.02 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7cd0053447c39970c595d5b78be362aeace87f48, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.05
query3	0.22	0.06	0.06
query4	1.66	0.11	0.09
query5	0.50	0.49	0.49
query6	1.14	0.73	0.72
query7	0.02	0.01	0.01
query8	0.06	0.05	0.05
query9	0.55	0.49	0.49
query10	0.54	0.54	0.55
query11	0.14	0.11	0.12
query12	0.15	0.12	0.13
query13	0.60	0.60	0.57
query14	0.74	0.80	0.76
query15	0.86	0.83	0.81
query16	0.37	0.36	0.36
query17	0.94	0.96	0.95
query18	0.22	0.22	0.21
query19	1.78	1.76	1.77
query20	0.02	0.01	0.01
query21	15.47	0.77	0.66
query22	4.59	6.58	2.26
query23	18.32	1.41	1.32
query24	2.07	0.23	0.22
query25	0.16	0.10	0.08
query26	0.28	0.20	0.22
query27	0.45	0.22	0.22
query28	13.28	1.01	0.99
query29	12.67	3.35	3.28
query30	0.26	0.06	0.06
query31	2.87	0.39	0.39
query32	3.27	0.49	0.48
query33	2.84	2.95	2.94
query34	16.87	4.36	4.32
query35	4.43	4.36	4.38
query36	0.65	0.48	0.47
query37	0.19	0.15	0.16
query38	0.16	0.15	0.15
query39	0.04	0.04	0.04
query40	0.15	0.12	0.11
query41	0.09	0.05	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.84 s
Total hot run time: 31.02 s

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit 83a734e into apache:master Jul 12, 2024
27 of 30 checks passed
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 12, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
## Proposed changes

We need to remove the retry logic for failed to fetch split batch.
Originally, this was implemented to handle cases where the cached client
connection might have been lost and needed to be reestablished. However,
this retry mechanism can lead to data loss. For instance, if a batch of
data has already been sent, retrying can cause this batch to be lost
without the receiver being aware of it.
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
## Proposed changes

We need to remove the retry logic for failed to fetch split batch.
Originally, this was implemented to handle cases where the cached client
connection might have been lost and needed to be reestablished. However,
this retry mechanism can lead to data loss. For instance, if a batch of
data has already been sent, retrying can cause this batch to be lost
without the receiver being aware of it.
@yiguolei yiguolei mentioned this pull request Jul 19, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.1-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants