Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix](load) Fix the channel leak when close wait has been cancelled #38031

Merged
merged 2 commits into from
Jul 18, 2024

Conversation

liaoxin01
Copy link
Contributor

When the close_wait is called, the NodeChannel has already been marked as cancelled, but close_wait will set _is_closed to true. When it actually sends a cancel request to the downstream LoadChannel, it finds that _is_closed has already been set to true, so it will not send an RPC request, causing a LoadChannel leak.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@liaoxin01
Copy link
Contributor Author

run buildall

dataroaring
dataroaring previously approved these changes Jul 17, 2024
Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Jul 17, 2024
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@liaoxin01
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jul 17, 2024
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 39552 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f3d6a879460c611825f142e8b1686b08fb0aefe, data reload: false

------ Round 1 ----------------------------------
q1	17624	4400	4251	4251
q2	2015	192	188	188
q3	10448	1231	1091	1091
q4	10189	801	780	780
q5	7582	2681	2622	2622
q6	220	142	140	140
q7	961	598	619	598
q8	9188	2039	2073	2039
q9	8671	6545	6505	6505
q10	8732	3719	3802	3719
q11	453	238	240	238
q12	418	229	223	223
q13	17762	2961	3013	2961
q14	274	248	231	231
q15	544	470	464	464
q16	477	392	375	375
q17	948	630	658	630
q18	8106	7486	7389	7389
q19	6863	1373	1448	1373
q20	691	323	326	323
q21	4890	3128	3288	3128
q22	352	298	284	284
Total cold run time: 117408 ms
Total hot run time: 39552 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4366	4247	4276	4247
q2	367	268	262	262
q3	3040	2831	2888	2831
q4	1971	1705	1676	1676
q5	5692	5574	5581	5574
q6	218	136	135	135
q7	2195	1797	1872	1797
q8	3282	3426	3389	3389
q9	8828	8829	8885	8829
q10	4105	3965	3734	3734
q11	602	497	532	497
q12	793	653	612	612
q13	15969	3169	3197	3169
q14	317	297	293	293
q15	556	479	502	479
q16	484	438	451	438
q17	1801	1576	1514	1514
q18	8026	8047	7786	7786
q19	1780	1596	1502	1502
q20	2217	1917	1854	1854
q21	5102	4798	4748	4748
q22	595	508	527	508
Total cold run time: 72306 ms
Total hot run time: 55874 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 174661 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6f3d6a879460c611825f142e8b1686b08fb0aefe, data reload: false

query1	917	375	365	365
query2	6461	1910	1804	1804
query3	6634	207	216	207
query4	27946	17478	17346	17346
query5	3689	494	493	493
query6	266	179	193	179
query7	4590	298	290	290
query8	244	195	197	195
query9	8688	2467	2428	2428
query10	458	283	292	283
query11	11811	10101	10194	10101
query12	115	81	83	81
query13	1657	356	359	356
query14	10115	7767	7714	7714
query15	213	167	164	164
query16	7736	319	305	305
query17	1813	554	531	531
query18	1973	278	280	278
query19	197	147	149	147
query20	90	81	86	81
query21	208	133	132	132
query22	4259	4006	3980	3980
query23	34017	33536	33498	33498
query24	10959	2916	2919	2916
query25	601	383	371	371
query26	709	152	161	152
query27	2202	277	278	277
query28	6275	2077	2064	2064
query29	909	652	639	639
query30	251	157	162	157
query31	975	770	786	770
query32	99	54	54	54
query33	732	330	289	289
query34	921	497	496	496
query35	676	595	598	595
query36	1152	989	999	989
query37	150	83	92	83
query38	2940	2852	2836	2836
query39	918	862	879	862
query40	206	125	120	120
query41	46	46	44	44
query42	113	108	98	98
query43	514	475	467	467
query44	1177	730	730	730
query45	193	162	160	160
query46	1086	710	758	710
query47	1840	1758	1739	1739
query48	360	287	286	286
query49	831	424	434	424
query50	783	397	400	397
query51	6915	6835	6829	6829
query52	113	97	102	97
query53	359	295	302	295
query54	916	449	444	444
query55	75	74	74	74
query56	295	265	282	265
query57	1132	1046	1040	1040
query58	236	250	270	250
query59	2798	2810	2835	2810
query60	298	288	306	288
query61	95	94	91	91
query62	808	639	667	639
query63	320	290	290	290
query64	9154	2254	7086	2254
query65	3171	3110	3106	3106
query66	756	327	333	327
query67	15312	15087	14852	14852
query68	4524	541	530	530
query69	573	419	371	371
query70	1170	1113	1144	1113
query71	397	282	281	281
query72	7099	6152	5619	5619
query73	749	322	318	318
query74	6164	5636	5623	5623
query75	3380	2691	2712	2691
query76	2564	940	907	907
query77	443	315	310	310
query78	9590	9031	9022	9022
query79	3002	513	506	506
query80	2168	471	463	463
query81	587	226	218	218
query82	802	140	142	140
query83	300	170	167	167
query84	269	90	88	88
query85	1924	385	365	365
query86	475	310	310	310
query87	3235	3096	3094	3094
query88	3905	2386	2425	2386
query89	483	391	385	385
query90	1722	199	196	196
query91	144	116	111	111
query92	64	53	55	53
query93	3126	494	502	494
query94	1060	223	225	223
query95	416	329	334	329
query96	628	276	275	275
query97	3174	2999	3058	2999
query98	235	203	196	196
query99	1503	1244	1283	1244
Total cold run time: 280343 ms
Total hot run time: 174661 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6f3d6a879460c611825f142e8b1686b08fb0aefe, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.66	0.08	0.08
query5	0.49	0.48	0.49
query6	1.13	0.73	0.72
query7	0.02	0.01	0.02
query8	0.06	0.04	0.04
query9	0.54	0.48	0.49
query10	0.53	0.56	0.55
query11	0.15	0.12	0.12
query12	0.16	0.12	0.13
query13	0.59	0.58	0.59
query14	0.76	0.79	0.77
query15	0.86	0.83	0.81
query16	0.37	0.36	0.36
query17	1.05	0.98	1.00
query18	0.24	0.22	0.21
query19	1.77	1.82	1.73
query20	0.01	0.01	0.01
query21	15.40	0.73	0.64
query22	4.10	8.03	1.88
query23	18.26	1.44	1.28
query24	2.08	0.23	0.22
query25	0.15	0.09	0.08
query26	0.29	0.22	0.21
query27	0.46	0.23	0.22
query28	13.34	1.02	1.00
query29	12.62	3.36	3.34
query30	0.25	0.06	0.05
query31	2.87	0.39	0.40
query32	3.26	0.49	0.48
query33	2.87	2.91	2.95
query34	17.05	4.33	4.34
query35	4.45	4.44	4.51
query36	0.65	0.46	0.48
query37	0.19	0.15	0.16
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.15	0.12	0.13
query41	0.09	0.06	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.57 s
Total hot run time: 30.69 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 17, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dataroaring dataroaring merged commit a9a633c into apache:master Jul 18, 2024
27 of 30 checks passed
dataroaring pushed a commit that referenced this pull request Jul 19, 2024
…38031)

When the close_wait is called, the NodeChannel has already been marked
as cancelled, but close_wait will set _is_closed to true. When it
actually sends a cancel request to the downstream LoadChannel, it finds
that _is_closed has already been set to true, so it will not send an RPC
request, causing a LoadChannel leak.
liaoxin01 added a commit to liaoxin01/doris that referenced this pull request Jul 19, 2024
…pache#38031)

When the close_wait is called, the NodeChannel has already been marked
as cancelled, but close_wait will set _is_closed to true. When it
actually sends a cancel request to the downstream LoadChannel, it finds
that _is_closed has already been set to true, so it will not send an RPC
request, causing a LoadChannel leak.
liaoxin01 added a commit to liaoxin01/doris that referenced this pull request Jul 22, 2024
…pache#38031)

When the close_wait is called, the NodeChannel has already been marked
as cancelled, but close_wait will set _is_closed to true. When it
actually sends a cancel request to the downstream LoadChannel, it finds
that _is_closed has already been set to true, so it will not send an RPC
request, causing a LoadChannel leak.
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants