[enhance](auth)support cache ranger datamask and row filter #37723

Merged
merged 5 commits into from
Jul 30, 2024

Conversation


@zddr zddr commented Jul 12, 2024

The Ranger plugin is relatively slow to resolve datamask and row filter policies, and the lookup takes longer as the number of policies configured in Ranger increases.

optimized logic:
cache the policy results in Doris memory; the cache is invalidated when the Ranger plugin detects a policy update

before:

mysql> select * from zd.user;
Empty set (0.19 sec)

after:

mysql> select * from zd.user;
Empty set (0.06 sec)
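
The caching idea described above can be sketched as follows. This is a minimal illustration and not the code merged in this PR: the class name `PolicyResultCache`, the `slowLookup` function, and the `onPoliciesUpdated` callback are all hypothetical names introduced here. The sketch assumes the Ranger plugin offers some hook that fires when it picks up new policies, and that "no policy applies" is a result worth caching too.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: memoizes datamask / row filter lookups in memory and
// drops everything when the policy source reports an update.
public class PolicyResultCache<K, V> {
    // Optional.empty() records "no policy applies", so negative results are cached as well.
    private final Map<K, Optional<V>> results = new ConcurrentHashMap<>();
    // Stand-in for the slow call into the Ranger plugin (assumption).
    private final Function<K, Optional<V>> slowLookup;

    public PolicyResultCache(Function<K, Optional<V>> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Returns the cached result, calling the slow lookup only on a miss.
    public Optional<V> get(K key) {
        return results.computeIfAbsent(key, slowLookup);
    }

    // Assumed to be called from whatever hook signals a policy refresh.
    public void onPoliciesUpdated() {
        results.clear();
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        Function<String, Optional<String>> lookup = key -> {
            lookups.incrementAndGet();   // count how often the slow path runs
            return Optional.empty();     // pretend no mask is configured
        };
        PolicyResultCache<String, String> cache = new PolicyResultCache<>(lookup);

        cache.get("zd.user.name");
        cache.get("zd.user.name");
        System.out.println(lookups.get()); // prints 1: second call was a cache hit

        cache.onPoliciesUpdated();
        cache.get("zd.user.name");
        System.out.println(lookups.get()); // prints 2: cache was cleared, lookup ran again
    }
}
```

Clearing the whole map on any policy change is the simplest invalidation strategy; it trades some repeated lookups after an update for never having to work out which cached entries a given policy change affects.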

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zddr

zddr commented Jul 12, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39972 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c28dc75b39149e79584cdc08b99851efad384957, data reload: false

------ Round 1 ----------------------------------
q1	17603	4346	4202	4202
q2	2015	198	190	190
q3	10457	1271	1067	1067
q4	10195	834	803	803
q5	7617	2745	2703	2703
q6	228	146	143	143
q7	979	613	618	613
q8	9628	2093	2117	2093
q9	9150	6566	6557	6557
q10	8845	3797	3759	3759
q11	461	239	247	239
q12	401	224	227	224
q13	18467	2968	3003	2968
q14	287	232	238	232
q15	523	483	488	483
q16	487	381	374	374
q17	959	661	619	619
q18	7993	7468	7496	7468
q19	4890	1388	1434	1388
q20	663	333	346	333
q21	4930	3231	3355	3231
q22	345	283	290	283
Total cold run time: 117123 ms
Total hot run time: 39972 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4399	4251	4302	4251
q2	378	275	272	272
q3	2989	2732	2704	2704
q4	1896	1609	1598	1598
q5	5284	5305	5329	5305
q6	217	136	132	132
q7	2114	1755	1752	1752
q8	3194	3314	3334	3314
q9	8384	8382	8340	8340
q10	3928	3707	3696	3696
q11	587	498	488	488
q12	773	604	579	579
q13	17472	3039	2985	2985
q14	302	284	269	269
q15	531	484	485	484
q16	485	414	425	414
q17	1765	1455	1481	1455
q18	7781	7532	7331	7331
q19	1688	1462	1517	1462
q20	1959	1775	1790	1775
q21	4822	4674	4800	4674
q22	569	499	475	475
Total cold run time: 71517 ms
Total hot run time: 53755 ms

@doris-robot

TPC-DS: Total hot run time: 172600 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c28dc75b39149e79584cdc08b99851efad384957, data reload: false

query1	912	375	353	353
query2	6464	1842	1816	1816
query3	6654	206	224	206
query4	23323	17733	17503	17503
query5	4187	499	523	499
query6	301	164	165	164
query7	4591	297	293	293
query8	245	216	194	194
query9	8315	2479	2470	2470
query10	444	283	267	267
query11	11812	10118	10185	10118
query12	135	85	86	85
query13	1650	393	390	390
query14	10080	7413	7512	7413
query15	214	170	166	166
query16	7815	313	317	313
query17	1802	534	531	531
query18	1949	270	272	270
query19	193	162	147	147
query20	97	80	84	80
query21	213	124	125	124
query22	4351	4049	3965	3965
query23	33696	33329	33013	33013
query24	12153	2934	2834	2834
query25	674	358	364	358
query26	1830	150	146	146
query27	3000	280	274	274
query28	7855	2093	2088	2088
query29	1163	653	611	611
query30	284	149	146	146
query31	937	747	754	747
query32	93	52	57	52
query33	773	310	283	283
query34	987	486	492	486
query35	655	583	580	580
query36	1107	918	925	918
query37	290	80	83	80
query38	2869	2772	2776	2772
query39	860	786	793	786
query40	276	120	118	118
query41	47	48	44	44
query42	130	97	102	97
query43	489	463	493	463
query44	1229	751	748	748
query45	198	164	160	160
query46	1087	716	728	716
query47	1868	1818	1789	1789
query48	374	297	296	296
query49	1202	414	407	407
query50	784	389	385	385
query51	6892	6806	6848	6806
query52	104	89	97	89
query53	352	285	286	285
query54	904	457	449	449
query55	72	76	73	73
query56	284	274	273	273
query57	1153	1060	1026	1026
query58	257	241	248	241
query59	2793	2505	2641	2505
query60	295	272	279	272
query61	99	94	97	94
query62	834	649	631	631
query63	314	289	288	288
query64	10490	2189	1663	1663
query65	3190	3072	3127	3072
query66	1299	336	328	328
query67	15388	15137	15109	15109
query68	4599	558	547	547
query69	446	329	332	329
query70	1183	1171	1176	1171
query71	407	277	273	273
query72	7011	5172	5768	5172
query73	760	328	328	328
query74	5975	5611	5596	5596
query75	3374	2689	2677	2677
query76	2761	968	886	886
query77	471	319	388	319
query78	9691	9040	9539	9040
query79	2547	521	525	521
query80	2280	495	468	468
query81	579	218	220	218
query82	794	133	130	130
query83	292	173	170	170
query84	277	91	87	87
query85	2151	320	300	300
query86	489	312	305	305
query87	3313	3167	3191	3167
query88	3817	2461	2446	2446
query89	477	382	384	382
query90	1837	191	193	191
query91	126	102	107	102
query92	68	49	52	49
query93	2593	517	531	517
query94	1145	268	215	215
query95	405	319	317	317
query96	602	282	270	270
query97	3232	2995	3039	2995
query98	227	192	194	192
query99	1572	1252	1295	1252
Total cold run time: 282878 ms
Total hot run time: 172600 ms

@doris-robot

ClickBench: Total hot run time: 31.38 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c28dc75b39149e79584cdc08b99851efad384957, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.05
query3	0.23	0.04	0.05
query4	1.66	0.08	0.08
query5	0.49	0.49	0.47
query6	1.14	0.73	0.73
query7	0.02	0.02	0.01
query8	0.06	0.05	0.05
query9	0.55	0.51	0.49
query10	0.53	0.54	0.55
query11	0.15	0.11	0.12
query12	0.15	0.13	0.12
query13	0.59	0.59	0.59
query14	0.76	0.79	0.83
query15	0.86	0.82	0.83
query16	0.37	0.36	0.37
query17	1.07	1.00	1.05
query18	0.23	0.22	0.23
query19	1.86	1.69	1.72
query20	0.01	0.01	0.01
query21	15.42	0.75	0.65
query22	4.23	6.44	2.60
query23	18.31	1.44	1.26
query24	2.08	0.25	0.22
query25	0.15	0.08	0.08
query26	0.28	0.21	0.20
query27	0.45	0.23	0.24
query28	13.23	1.03	1.00
query29	12.58	3.39	3.33
query30	0.27	0.06	0.06
query31	2.86	0.39	0.40
query32	3.27	0.48	0.47
query33	2.88	2.92	2.91
query34	16.86	4.32	4.36
query35	4.47	4.42	4.46
query36	0.65	0.46	0.47
query37	0.18	0.16	0.16
query38	0.15	0.16	0.15
query39	0.05	0.03	0.04
query40	0.15	0.12	0.12
query41	0.10	0.05	0.04
query42	0.07	0.05	0.06
query43	0.05	0.04	0.04
Total cold run time: 109.59 s
Total hot run time: 31.38 s

@wm1581066 wm1581066 requested a review from morrySnow July 15, 2024 00:13
@wm1581066 wm1581066 added dev/2.1.x usercase Important user case type label labels Jul 15, 2024

@morrySnow morrySnow left a comment

I read the Ranger-related code in Hive. I think Ranger could return all column masks for a table in a single request if setResourceMatchingScope is set to SELF_OR_DESCENDANTS.

@zddr

zddr commented Jul 22, 2024

run buildall

@zddr zddr requested a review from morrySnow July 22, 2024 09:00
@doris-robot

TPC-H: Total hot run time: 39987 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 41f5256f6ec4294c09656568ddf22d1b8b3dee74, data reload: false

------ Round 1 ----------------------------------
q1	18006	4536	4381	4381
q2	2684	201	206	201
q3	11952	1209	1058	1058
q4	10479	778	862	778
q5	7581	2739	2824	2739
q6	225	146	148	146
q7	972	612	601	601
q8	9584	2057	2071	2057
q9	8652	6603	6550	6550
q10	8770	3824	3791	3791
q11	457	244	243	243
q12	393	222	232	222
q13	17758	2988	3008	2988
q14	272	239	238	238
q15	525	480	490	480
q16	485	410	375	375
q17	978	623	670	623
q18	8110	7487	7344	7344
q19	3277	1449	1402	1402
q20	678	331	320	320
q21	4968	3167	3301	3167
q22	350	283	285	283
Total cold run time: 117156 ms
Total hot run time: 39987 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4393	4285	4331	4285
q2	368	264	254	254
q3	3006	2770	2686	2686
q4	1871	1616	1630	1616
q5	5259	5315	5347	5315
q6	221	130	133	130
q7	2132	1748	1690	1690
q8	3203	3329	3320	3320
q9	8473	8402	8470	8402
q10	3925	3746	3663	3663
q11	586	492	494	492
q12	773	641	583	583
q13	17035	2971	2971	2971
q14	314	269	291	269
q15	512	482	474	474
q16	464	414	416	414
q17	1751	1480	1496	1480
q18	7672	7544	7452	7452
q19	1669	1525	1484	1484
q20	1990	1803	1752	1752
q21	4806	4785	4748	4748
q22	614	500	485	485
Total cold run time: 71037 ms
Total hot run time: 53965 ms

@doris-robot

TPC-DS: Total hot run time: 174268 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 41f5256f6ec4294c09656568ddf22d1b8b3dee74, data reload: false

query1	906	372	356	356
query2	6456	1971	1837	1837
query3	6673	206	216	206
query4	28601	17480	17243	17243
query5	4241	489	482	482
query6	304	181	177	177
query7	4611	287	286	286
query8	251	196	196	196
query9	8527	2432	2410	2410
query10	437	281	283	281
query11	10950	10061	10046	10046
query12	141	89	80	80
query13	1628	365	361	361
query14	9524	7968	7900	7900
query15	225	167	172	167
query16	7749	492	509	492
query17	1583	584	554	554
query18	1839	287	282	282
query19	202	159	158	158
query20	96	83	85	83
query21	210	133	130	130
query22	4262	3985	3985	3985
query23	33752	33194	33167	33167
query24	12086	2897	2921	2897
query25	702	387	390	387
query26	1870	157	152	152
query27	2757	277	275	275
query28	7515	1990	1976	1976
query29	1148	641	641	641
query30	286	148	149	148
query31	1001	730	739	730
query32	97	57	57	57
query33	782	355	349	349
query34	938	484	507	484
query35	863	722	767	722
query36	1126	936	964	936
query37	292	83	81	81
query38	2830	2772	2739	2739
query39	892	831	820	820
query40	282	125	124	124
query41	49	48	50	48
query42	118	100	104	100
query43	514	476	471	471
query44	1266	717	721	717
query45	194	170	169	169
query46	1093	744	712	712
query47	1829	1754	1760	1754
query48	369	295	292	292
query49	1207	446	417	417
query50	807	405	405	405
query51	6759	6794	6658	6658
query52	104	93	98	93
query53	371	303	301	301
query54	969	462	464	462
query55	78	76	76	76
query56	310	287	299	287
query57	1175	1067	1067	1067
query58	273	265	269	265
query59	2963	2773	2640	2640
query60	439	282	287	282
query61	98	101	98	98
query62	855	645	664	645
query63	320	295	301	295
query64	10528	2247	5320	2247
query65	3175	3113	3100	3100
query66	1370	343	339	339
query67	15631	14772	14877	14772
query68	9018	550	545	545
query69	742	467	371	371
query70	1393	1142	1138	1138
query71	521	282	286	282
query72	9189	5855	5474	5474
query73	1854	330	329	329
query74	6287	5655	5690	5655
query75	4871	2670	2691	2670
query76	5229	968	919	919
query77	782	312	306	306
query78	9774	10412	9074	9074
query79	12325	525	528	525
query80	1361	487	479	479
query81	592	236	229	229
query82	287	136	138	136
query83	347	168	167	167
query84	273	86	89	86
query85	966	312	297	297
query86	364	314	330	314
query87	3295	3139	3081	3081
query88	5309	2461	2460	2460
query89	522	393	376	376
query90	2442	202	202	202
query91	134	103	102	102
query92	61	51	51	51
query93	7127	502	498	498
query94	1643	290	290	290
query95	416	325	327	325
query96	628	280	273	273
query97	3146	3026	3032	3026
query98	224	200	195	195
query99	1542	1293	1284	1284
Total cold run time: 314163 ms
Total hot run time: 174268 ms

@doris-robot

ClickBench: Total hot run time: 30.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 41f5256f6ec4294c09656568ddf22d1b8b3dee74, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.08
query5	0.50	0.50	0.50
query6	1.13	0.73	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.50	0.49
query10	0.54	0.55	0.54
query11	0.15	0.11	0.11
query12	0.15	0.12	0.13
query13	0.60	0.60	0.58
query14	0.75	0.76	0.77
query15	0.85	0.81	0.82
query16	0.36	0.36	0.36
query17	1.00	0.96	0.95
query18	0.23	0.23	0.22
query19	1.80	1.71	1.81
query20	0.01	0.01	0.01
query21	15.40	0.77	0.64
query22	4.33	6.75	1.64
query23	18.31	1.31	1.21
query24	2.08	0.23	0.23
query25	0.16	0.09	0.08
query26	0.29	0.21	0.21
query27	0.45	0.23	0.23
query28	13.22	1.02	0.99
query29	12.62	3.40	3.40
query30	0.25	0.06	0.05
query31	2.85	0.38	0.38
query32	3.29	0.48	0.47
query33	2.86	2.94	2.92
query34	17.04	4.29	4.38
query35	4.42	4.42	4.48
query36	0.66	0.47	0.45
query37	0.19	0.15	0.15
query38	0.15	0.15	0.14
query39	0.04	0.04	0.03
query40	0.15	0.13	0.13
query41	0.10	0.05	0.05
query42	0.05	0.04	0.04
query43	0.04	0.05	0.05
Total cold run time: 109.66 s
Total hot run time: 30.34 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 30, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit f38a67b into apache:master Jul 30, 2024
27 of 29 checks passed
zddr added a commit to zddr/incubator-doris that referenced this pull request Jul 31, 2024
dataroaring pushed a commit that referenced this pull request Aug 6, 2024
feiniaofeiafei pushed a commit to feiniaofeiafei/doris that referenced this pull request Aug 9, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.1-merged reviewed usercase Important user case type label

7 participants