[fix](nereids) refine row count estimation for mark join #38270

Merged: 2 commits, Jul 24, 2024

@xzj7019 xzj7019 commented Jul 23, 2024

Proposed changes

Issue Number: close #xxx

The current semi/anti join stats estimation doesn't consider the mark join case. For a mark join, the estimated row count should follow the stats of one input side (left or right, depending on the join type) without change.
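To make the intended behavior concrete, here is a minimal sketch of the rule as we read it from the description above. It is not the Doris implementation: `JoinType`, `Stats`, `estimate_semi_or_anti`, and the `match_ratio` placeholder are all hypothetical names introduced for illustration, and the non-mark branch is a stand-in for whatever selectivity logic the real estimator uses.

```python
from dataclasses import dataclass
from enum import Enum


class JoinType(Enum):
    # Hypothetical enum; only the semi/anti variants matter for this sketch.
    LEFT_SEMI = "LEFT_SEMI"
    LEFT_ANTI = "LEFT_ANTI"
    RIGHT_SEMI = "RIGHT_SEMI"
    RIGHT_ANTI = "RIGHT_ANTI"


@dataclass(frozen=True)
class Stats:
    # Hypothetical container holding only the estimated row count.
    row_count: float


def estimate_semi_or_anti(join_type, left, right, is_mark_join, match_ratio=0.5):
    """Estimate the output row count of a semi/anti join.

    match_ratio is an assumed placeholder for the fraction of preserved-side
    rows that find a match; a real estimator would derive it from column stats.
    """
    left_preserved = join_type in (JoinType.LEFT_SEMI, JoinType.LEFT_ANTI)
    preserved = left if left_preserved else right

    if is_mark_join:
        # Mark join case: the row count follows the preserved side unchanged.
        return Stats(preserved.row_count)

    # Ordinary semi join keeps matching rows; anti join keeps the rest.
    is_semi = join_type in (JoinType.LEFT_SEMI, JoinType.RIGHT_SEMI)
    kept_fraction = match_ratio if is_semi else 1.0 - match_ratio
    return Stats(preserved.row_count * kept_fraction)
```

With a left input of 1000 rows and a right input of 40 rows, the mark variant of a left semi join would keep the estimate at 1000, whereas the ordinary variant scales it by the assumed match ratio.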

@doris-robot

Thank you for your contribution to Apache Doris.
Not sure what to do next? See "How to process your PR".

Since 2024-03-18, the documentation has been moved to doris-website.
See the Doris documentation.

xzj7019 commented Jul 23, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39496 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 35e797938640501abdd8a69932abff333888f213, data reload: false

------ Round 1 ----------------------------------
q1	17638	4336	4194	4194
q2	2013	191	183	183
q3	10462	1171	1053	1053
q4	10182	718	789	718
q5	7530	2637	2666	2637
q6	222	136	134	134
q7	946	595	593	593
q8	9218	2072	2079	2072
q9	8739	6583	6581	6581
q10	8860	3774	3799	3774
q11	463	229	245	229
q12	521	229	225	225
q13	18740	2956	2959	2956
q14	292	242	233	233
q15	502	493	479	479
q16	527	393	377	377
q17	976	594	641	594
q18	7959	7383	7374	7374
q19	6376	1385	1402	1385
q20	690	319	317	317
q21	4916	3110	3243	3110
q22	357	283	278	278
Total cold run time: 118129 ms
Total hot run time: 39496 ms
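
The bot output does not label its columns, but the two totals above can be reproduced from the Round 1 rows: summing the first numeric column gives the cold total, and summing the last column (which equals the smaller of the two middle columns in every row) gives the hot total. The sketch below checks that arithmetic; the column names `cold`, `hot1`, `hot2`, and `best` are assumptions, not labels taken from the benchmark scripts.

```python
# Round 1 rows copied from the table above: query, then four timings in ms.
ROUND_1 = """
q1 17638 4336 4194 4194
q2 2013 191 183 183
q3 10462 1171 1053 1053
q4 10182 718 789 718
q5 7530 2637 2666 2637
q6 222 136 134 134
q7 946 595 593 593
q8 9218 2072 2079 2072
q9 8739 6583 6581 6581
q10 8860 3774 3799 3774
q11 463 229 245 229
q12 521 229 225 225
q13 18740 2956 2959 2956
q14 292 242 233 233
q15 502 493 479 479
q16 527 393 377 377
q17 976 594 641 594
q18 7959 7383 7374 7374
q19 6376 1385 1402 1385
q20 690 319 317 317
q21 4916 3110 3243 3110
q22 357 283 278 278
"""


def summarize(table_text):
    """Return (cold_total, hot_total) for a block of benchmark rows."""
    cold_total = 0
    hot_total = 0
    for line in table_text.strip().splitlines():
        _query, cold, hot1, hot2, best = line.split()
        # Assumed meaning of the last column: the faster of the two hot runs.
        if int(best) != min(int(hot1), int(hot2)):
            raise ValueError(f"unexpected last column in row: {line}")
        cold_total += int(cold)
        hot_total += int(best)
    return cold_total, hot_total


print(summarize(ROUND_1))  # (118129, 39496)
```

The same reading appears to hold for the other tables in this thread, although only the Round 1 figures are checked here.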

----- Round 2, with runtime_filter_mode=off -----
q1	4375	4238	4239	4238
q2	384	253	278	253
q3	2970	2851	2890	2851
q4	1961	1684	1696	1684
q5	5639	5496	5514	5496
q6	212	134	128	128
q7	2157	1822	1852	1822
q8	3260	3405	3387	3387
q9	8699	8760	8854	8760
q10	4077	3948	3718	3718
q11	590	487	474	474
q12	815	666	646	646
q13	15915	3137	3126	3126
q14	310	289	281	281
q15	528	506	480	480
q16	479	435	417	417
q17	1812	1496	1515	1496
q18	8052	8003	7827	7827
q19	1701	1638	1501	1501
q20	2068	1865	1847	1847
q21	7932	5010	4669	4669
q22	595	521	498	498
Total cold run time: 74531 ms
Total hot run time: 55599 ms

@doris-robot

TPC-DS: Total hot run time: 173491 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 35e797938640501abdd8a69932abff333888f213, data reload: false

query1	908	369	362	362
query2	6436	1805	1781	1781
query3	6627	203	217	203
query4	25902	17671	17217	17217
query5	3561	485	500	485
query6	281	185	160	160
query7	4572	293	299	293
query8	257	195	195	195
query9	8494	2434	2404	2404
query10	445	301	269	269
query11	10790	9971	10106	9971
query12	112	82	82	82
query13	1653	373	363	363
query14	10257	6963	7785	6963
query15	229	167	170	167
query16	7773	500	498	498
query17	1793	571	559	559
query18	1786	292	289	289
query19	205	160	162	160
query20	92	84	90	84
query21	201	131	129	129
query22	4529	4122	3931	3931
query23	34026	33785	33607	33607
query24	11002	2923	3022	2923
query25	659	409	400	400
query26	1216	158	155	155
query27	2865	272	278	272
query28	7618	2065	2065	2065
query29	972	640	640	640
query30	251	150	149	149
query31	983	760	768	760
query32	93	55	58	55
query33	753	350	321	321
query34	956	500	504	500
query35	884	756	737	737
query36	1146	983	990	983
query37	151	81	86	81
query38	2956	2929	2839	2839
query39	924	842	888	842
query40	220	126	119	119
query41	46	43	45	43
query42	113	111	99	99
query43	485	481	478	478
query44	1203	728	731	728
query45	199	168	172	168
query46	1085	730	724	724
query47	1835	1758	1767	1758
query48	359	295	280	280
query49	836	400	426	400
query50	770	392	388	388
query51	7054	6662	6722	6662
query52	99	95	93	93
query53	355	286	295	286
query54	882	441	448	441
query55	76	74	75	74
query56	298	283	279	279
query57	1149	1058	1054	1054
query58	253	250	263	250
query59	2830	2664	2493	2493
query60	340	279	278	278
query61	111	91	96	91
query62	795	658	650	650
query63	326	292	284	284
query64	9427	2188	5640	2188
query65	3178	3162	3172	3162
query66	789	324	332	324
query67	15537	14916	14787	14787
query68	6198	557	549	549
query69	729	425	344	344
query70	1199	1169	1137	1137
query71	483	275	281	275
query72	8605	5674	5882	5674
query73	765	322	324	322
query74	6120	5669	5654	5654
query75	4865	2707	2669	2669
query76	3983	973	903	903
query77	759	308	307	307
query78	9656	9820	8946	8946
query79	7775	543	536	536
query80	1090	535	486	486
query81	580	225	226	225
query82	1370	142	135	135
query83	308	166	167	166
query84	268	91	86	86
query85	1436	312	298	298
query86	446	310	321	310
query87	3307	3053	3028	3028
query88	4697	2402	2416	2402
query89	513	391	383	383
query90	1912	202	194	194
query91	134	102	100	100
query92	65	51	52	51
query93	6158	518	512	512
query94	1284	292	268	268
query95	406	318	326	318
query96	622	275	271	271
query97	3203	3047	3050	3047
query98	227	191	190	190
query99	1600	1239	1230	1230
Total cold run time: 295771 ms
Total hot run time: 173491 ms

@doris-robot

ClickBench: Total hot run time: 30.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 35e797938640501abdd8a69932abff333888f213, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.04
query3	0.22	0.06	0.04
query4	1.67	0.07	0.07
query5	0.48	0.49	0.47
query6	1.12	0.71	0.73
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.55	0.47	0.50
query10	0.54	0.55	0.55
query11	0.16	0.11	0.12
query12	0.15	0.12	0.13
query13	0.60	0.59	0.59
query14	0.75	0.78	0.78
query15	0.85	0.82	0.84
query16	0.37	0.35	0.37
query17	0.96	0.96	1.02
query18	0.23	0.21	0.21
query19	1.77	1.68	1.66
query20	0.01	0.01	0.00
query21	15.39	0.76	0.64
query22	4.38	7.20	2.09
query23	18.30	1.34	1.30
query24	2.11	0.24	0.22
query25	0.16	0.09	0.09
query26	0.29	0.20	0.20
query27	0.45	0.23	0.23
query28	13.22	1.01	0.99
query29	12.60	3.29	3.28
query30	0.25	0.07	0.06
query31	2.84	0.38	0.38
query32	3.28	0.48	0.47
query33	2.92	2.89	2.97
query34	17.23	4.34	4.38
query35	4.45	4.46	4.43
query36	0.66	0.47	0.46
query37	0.19	0.16	0.15
query38	0.15	0.16	0.14
query39	0.05	0.03	0.04
query40	0.16	0.12	0.13
query41	0.09	0.04	0.04
query42	0.06	0.04	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.9 s
Total hot run time: 30.66 s

xzj7019 commented Jul 24, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39656 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit a41efd4a83b0ee35ea2d758eada093c464015358, data reload: false

------ Round 1 ----------------------------------
q1	17636	4327	4263	4263
q2	2015	190	193	190
q3	10449	1141	1169	1141
q4	10186	828	717	717
q5	7646	2695	2556	2556
q6	220	153	136	136
q7	958	600	594	594
q8	9211	2058	2066	2058
q9	8853	6574	6551	6551
q10	8831	3763	3765	3763
q11	473	234	233	233
q12	503	223	218	218
q13	18703	2966	2978	2966
q14	283	226	236	226
q15	512	478	484	478
q16	479	397	377	377
q17	959	679	693	679
q18	8004	7475	7445	7445
q19	4960	1391	1361	1361
q20	666	333	307	307
q21	4980	3114	3299	3114
q22	359	284	283	283
Total cold run time: 116886 ms
Total hot run time: 39656 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4339	4297	4224	4224
q2	362	261	257	257
q3	2975	2725	2821	2725
q4	1992	1718	1718	1718
q5	5676	5534	5535	5534
q6	233	133	131	131
q7	2161	1819	1891	1819
q8	3254	3371	3433	3371
q9	8813	8772	8957	8772
q10	4096	3959	3741	3741
q11	584	492	499	492
q12	823	634	646	634
q13	17245	3130	3173	3130
q14	316	295	280	280
q15	522	479	493	479
q16	508	431	444	431
q17	1836	1544	1505	1505
q18	8084	7963	7880	7880
q19	1816	1627	1465	1465
q20	2180	1864	1859	1859
q21	9372	4824	4663	4663
q22	592	501	512	501
Total cold run time: 77779 ms
Total hot run time: 55611 ms

@doris-robot

TPC-DS: Total hot run time: 174386 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a41efd4a83b0ee35ea2d758eada093c464015358, data reload: false

query1	905	368	368	368
query2	6429	1941	1871	1871
query3	6638	206	214	206
query4	27863	17442	17276	17276
query5	3677	482	496	482
query6	279	167	158	158
query7	4585	289	295	289
query8	246	199	192	192
query9	8467	2400	2405	2400
query10	418	285	267	267
query11	10596	9949	10103	9949
query12	126	81	82	81
query13	1645	382	368	368
query14	10234	7842	7648	7648
query15	228	174	175	174
query16	7635	480	477	477
query17	1373	547	522	522
query18	1761	271	269	269
query19	193	153	146	146
query20	92	81	79	79
query21	203	124	122	122
query22	4295	4034	3988	3988
query23	34170	33978	33535	33535
query24	11106	2952	2969	2952
query25	645	405	421	405
query26	1012	153	157	153
query27	2368	280	283	280
query28	6343	2071	2061	2061
query29	903	654	623	623
query30	261	165	158	158
query31	1001	755	770	755
query32	96	53	56	53
query33	752	348	328	328
query34	942	501	514	501
query35	863	764	790	764
query36	1152	998	999	998
query37	149	88	87	87
query38	2941	2816	2801	2801
query39	910	876	835	835
query40	223	125	123	123
query41	54	47	48	47
query42	119	105	103	103
query43	511	484	485	484
query44	1265	725	762	725
query45	215	179	173	173
query46	1089	736	759	736
query47	1860	1774	1793	1774
query48	390	304	294	294
query49	855	420	431	420
query50	779	395	388	388
query51	6792	6701	6617	6617
query52	105	97	96	96
query53	386	299	295	295
query54	940	463	469	463
query55	80	76	77	76
query56	312	286	307	286
query57	1146	1078	1047	1047
query58	258	254	266	254
query59	3021	2618	2784	2618
query60	333	299	302	299
query61	115	114	110	110
query62	806	660	658	658
query63	329	304	301	301
query64	9412	2308	1755	1755
query65	3195	3103	3150	3103
query66	756	345	338	338
query67	15438	15096	14922	14922
query68	8250	564	575	564
query69	739	453	366	366
query70	1164	1095	1080	1080
query71	532	284	287	284
query72	9100	6077	5839	5839
query73	1112	327	327	327
query74	6237	5683	5671	5671
query75	4960	2667	2741	2667
query76	5119	978	1004	978
query77	796	305	306	305
query78	10688	9099	8841	8841
query79	12308	542	516	516
query80	1116	476	474	474
query81	581	226	226	226
query82	755	138	133	133
query83	309	168	170	168
query84	276	87	86	86
query85	1393	312	289	289
query86	364	318	302	302
query87	3317	3126	3171	3126
query88	5764	2442	2376	2376
query89	520	392	425	392
query90	2029	197	190	190
query91	128	100	100	100
query92	59	48	49	48
query93	7433	525	511	511
query94	910	285	289	285
query95	405	315	317	315
query96	637	265	266	265
query97	3227	3047	3060	3047
query98	239	196	200	196
query99	1567	1298	1293	1293
Total cold run time: 306320 ms
Total hot run time: 174386 ms

@doris-robot

ClickBench: Total hot run time: 31.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a41efd4a83b0ee35ea2d758eada093c464015358, data reload: false

query1	0.05	0.03	0.03
query2	0.07	0.04	0.04
query3	0.22	0.05	0.06
query4	1.67	0.09	0.09
query5	0.49	0.48	0.48
query6	1.14	0.74	0.73
query7	0.02	0.01	0.01
query8	0.05	0.05	0.04
query9	0.55	0.49	0.50
query10	0.55	0.54	0.53
query11	0.16	0.11	0.11
query12	0.14	0.13	0.12
query13	0.60	0.59	0.58
query14	0.76	0.80	0.78
query15	0.85	0.80	0.81
query16	0.36	0.36	0.37
query17	1.00	0.98	1.04
query18	0.22	0.22	0.22
query19	1.86	1.68	1.73
query20	0.01	0.01	0.01
query21	15.40	0.75	0.66
query22	3.83	7.54	2.56
query23	18.43	1.33	1.28
query24	2.12	0.26	0.22
query25	0.15	0.09	0.08
query26	0.30	0.21	0.21
query27	0.45	0.24	0.23
query28	13.19	1.01	0.99
query29	12.63	3.37	3.33
query30	0.25	0.06	0.05
query31	2.86	0.40	0.39
query32	3.28	0.47	0.47
query33	2.92	2.94	2.90
query34	16.97	4.39	4.36
query35	4.46	4.43	4.43
query36	0.65	0.46	0.50
query37	0.19	0.16	0.15
query38	0.15	0.15	0.15
query39	0.04	0.03	0.04
query40	0.15	0.12	0.13
query41	0.10	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.38 s
Total hot run time: 31.34 s

@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Jul 24, 2024

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@morrySnow merged commit d110859 into apache:master on Jul 24, 2024
27 of 29 checks passed
dataroaring pushed a commit that referenced this pull request Jul 24, 2024
Current semi/anti stats estimation doesn't consider the mark join case,
whose row count should follow either side's stats without change.
@yiguolei mentioned this pull request on Sep 5, 2024
3 tasks
Labels: approved (indicates a PR has been approved by one committer), dev/2.1.6-merged, dev/3.0.1-merged, reviewed
5 participants