Database: Fixed slow permission query in folder/dashboard search #17427

Merged 4 commits on Jun 5, 2019

aocenas (Member) commented Jun 4, 2019

Fixes: #17336

The problem appears to be a single join that adds the default permissions: it creates an N x N set of rows that then has to be filtered (for ~9k dashboards that is around 80m rows). Moving the default permissions into a separate select and unioning the two results seems to fix the issue while preserving the same results. The UI is still not super snappy, but that seems to be more a matter of rendering too many dashboards in the search. I think we could limit the number of returned dashboards quite a bit.
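To make the rewrite concrete, here is a minimal sketch of the two query shapes, run against SQLite from Python. It is not the actual Grafana query or schema: the tables contain only the columns that appear in the EXPLAIN ANALYZE filters in this description, both queries are reconstructed from those filter conditions, and the sample rows, user id 7 and team id 10 are invented for illustration.

```python
import sqlite3

# Hypothetical, cut-down schema: only columns visible in the query plans.
SCHEMA = """
CREATE TABLE dashboard (id INTEGER PRIMARY KEY, org_id INTEGER,
                        folder_id INTEGER, has_acl INTEGER);
CREATE TABLE dashboard_acl (org_id INTEGER, dashboard_id INTEGER, user_id INTEGER,
                            team_id INTEGER, role TEXT, permission INTEGER);
CREATE TABLE team_member (team_id INTEGER, user_id INTEGER);
"""

# Old shape: one join whose ON clause also matches the default permissions
# (org_id = -1), so every dashboard is paired with every candidate ACL row.
OLD_QUERY = """
SELECT DISTINCT d.id
FROM dashboard AS d
LEFT JOIN dashboard AS folder ON folder.id = d.folder_id
LEFT JOIN dashboard_acl AS da ON
       da.dashboard_id = d.id
    OR da.dashboard_id = d.folder_id
    OR (da.org_id = -1 AND (
            (folder.id IS NOT NULL AND NOT folder.has_acl)
         OR (folder.id IS NULL AND NOT d.has_acl)))
LEFT JOIN team_member AS ugm ON ugm.team_id = da.team_id
WHERE d.org_id = :org AND da.permission >= :perm
  AND (da.user_id = :user OR ugm.user_id = :user OR da.role = :role)
"""

# New shape: explicit ACL rows in one select, default permissions in another,
# combined with UNION (which also removes duplicates).
NEW_QUERY = """
SELECT d.id
FROM dashboard AS d
JOIN dashboard_acl AS da
  ON da.dashboard_id = d.id OR da.dashboard_id = d.folder_id
LEFT JOIN team_member AS ugm ON ugm.team_id = da.team_id
WHERE d.org_id = :org AND da.permission >= :perm
  AND (da.user_id = :user OR ugm.user_id = :user OR da.role = :role)
UNION
SELECT d.id
FROM dashboard AS d
LEFT JOIN dashboard AS folder ON folder.id = d.folder_id
JOIN dashboard_acl AS da ON da.org_id = -1
WHERE d.org_id = :org AND da.permission >= :perm
  AND (da.user_id = :user OR da.role = :role)
  AND ((folder.id IS NOT NULL AND NOT folder.has_acl)
    OR (folder.id IS NULL AND NOT d.has_acl))
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dashboard VALUES (?, ?, ?, ?)", [
    (1, 1, None, 0),  # folder without its own ACL
    (2, 1, 1, 0),     # dashboard inside folder 1
    (3, 1, None, 1),  # folder with its own ACL
    (4, 1, 3, 0),     # dashboard inside folder 3
    (5, 1, None, 1),  # ACL grants access to user 7
    (6, 1, None, 1),  # ACL grants access to team 10
    (7, 1, None, 1),  # ACL grants access to user 99 only
    (8, 2, None, 0),  # dashboard in a different org
])
conn.executemany("INSERT INTO dashboard_acl VALUES (?, ?, ?, ?, ?, ?)", [
    (-1, -1, None, None, "Viewer", 1),  # default permission rows
    (-1, -1, None, None, "Editor", 2),
    (1, 3, 99, None, None, 1),
    (1, 5, 7, None, None, 1),
    (1, 6, None, 10, None, 2),
    (1, 7, 99, None, None, 1),
])
conn.executemany("INSERT INTO team_member VALUES (?, ?)", [(10, 7), (11, 99)])

params = {"org": 1, "perm": 1, "user": 7, "role": "Viewer"}
old_ids = sorted(row[0] for row in conn.execute(OLD_QUERY, params))
new_ids = sorted(row[0] for row in conn.execute(NEW_QUERY, params))
print(old_ids, new_ids)
```

On this toy data both shapes return the same dashboard ids. That only illustrates the equivalence for one ACL setup; it does not replace testing other combinations of teams, users and roles.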

explain analyze before:

"Unique  (cost=1383690.10..1383785.98 rows=9550 width=4) (actual time=32111.894..32113.638 rows=9551 loops=1)"
"  ->  Sort  (cost=1383690.10..1383738.04 rows=19176 width=4) (actual time=32111.893..32112.383 rows=9553 loops=1)"
"        Sort Key: d.id"
"        Sort Method: quicksort  Memory: 832kB"
"        ->  Hash Left Join  (cost=2490.09..1382326.02 rows=19176 width=4) (actual time=22.440..32101.798 rows=9553 loops=1)"
"              Hash Cond: (d.folder_id = folder.id)"
"              Filter: ((da.dashboard_id = d.id) OR (da.dashboard_id = d.folder_id) OR ((da.org_id = '-1'::integer) AND (((folder.id IS NOT NULL) AND (NOT folder.has_acl)) OR ((folder.id IS NULL) AND (NOT d.has_acl)))))"
"              Rows Removed by Filter: 90906416"
"              ->  Nested Loop  (cost=1887.21..1142251.46 rows=91159577 width=29) (actual time=8.669..16795.006 rows=90915969 loops=1)"
"                    ->  Index Scan using dashboard_pkey1 on dashboard d  (cost=0.29..686.43 rows=9550 width=13) (actual time=0.006..24.889 rows=9551 loops=1)"
"                          Filter: (org_id = 1)"
"                    ->  Materialize  (cost=1886.93..2035.14 rows=9546 width=16) (actual time=0.001..0.475 rows=9519 loops=9551)"
"                          ->  Merge Left Join  (cost=1886.93..1987.41 rows=9546 width=16) (actual time=8.658..13.496 rows=9519 loops=1)"
"                                Merge Cond: (da.team_id = ugm.team_id)"
"                                Filter: ((da.user_id = 1) OR (ugm.user_id = 1) OR ((da.role)::text = 'Viewer'::text))"
"                                Rows Removed by Filter: 9518"
"                                ->  Sort  (cost=1811.71..1859.21 rows=18999 width=39) (actual time=8.634..9.761 rows=19037 loops=1)"
"                                      Sort Key: da.team_id"
"                                      Sort Method: quicksort  Memory: 2256kB"
"                                      ->  Seq Scan on dashboard_acl da  (cost=0.00..461.49 rows=18999 width=39) (actual time=0.008..5.796 rows=19037 loops=1)"
"                                            Filter: (permission >= 1)"
"                                ->  Sort  (cost=75.21..77.91 rows=1080 width=16) (actual time=0.017..0.018 rows=1 loops=1)"
"                                      Sort Key: ugm.team_id"
"                                      Sort Method: quicksort  Memory: 25kB"
"                                      ->  Seq Scan on team_member ugm  (cost=0.00..20.80 rows=1080 width=16) (actual time=0.010..0.011 rows=1 loops=1)"
"              ->  Hash  (cost=483.50..483.50 rows=9550 width=5) (actual time=3.851..3.851 rows=9551 loops=1)"
"                    Buckets: 16384  Batches: 1  Memory Usage: 474kB"
"                    ->  Seq Scan on dashboard folder  (cost=0.00..483.50 rows=9550 width=5) (actual time=0.007..2.415 rows=9551 loops=1)"
"Planning Time: 0.422 ms"
"Execution Time: 32114.044 ms"

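The row counts in the plan above can be checked with a little arithmetic. The sketch below only multiplies figures copied from that plan; the variable names are mine. The nested loop pairs every dashboard with every row of the materialized dashboard_acl / team_member join, and almost all of those pairs are then thrown away by the filter.

```python
# Figures copied from the "before" plan.
dashboards = 9551             # Index Scan on dashboard (org_id = 1)
acl_rows = 9519               # Materialize node, rescanned once per dashboard
removed_by_filter = 90906416  # "Rows Removed by Filter" on the Hash Left Join

joined = dashboards * acl_rows       # rows produced by the Nested Loop
kept = joined - removed_by_filter    # rows that survive the join filter

print(joined)  # 90915969
print(kept)    # 9553
```

So roughly 91 million intermediate rows are built in order to keep fewer than ten thousand, which is the N x N blow-up mentioned in the description.
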
after:

"HashAggregate  (cost=51663.40..51665.40 rows=200 width=4) (actual time=62.122..63.198 rows=9551 loops=1)"
"  Group Key: d.id"
"  ->  HashAggregate  (cost=51231.94..51423.70 rows=19176 width=4) (actual time=58.638..60.376 rows=9551 loops=1)"
"        Group Key: d.id"
"        ->  Append  (cost=48960.62..51184.00 rows=19176 width=4) (actual time=41.872..56.794 rows=9552 loops=1)"
"              ->  Merge Left Join  (cost=48960.62..49156.70 rows=19141 width=4) (actual time=41.872..46.071 rows=9517 loops=1)"
"                    Merge Cond: (da.team_id = ugm.team_id)"
"                    Filter: ((da.user_id = 2) OR (ugm.user_id = 2) OR ((da.role)::text = 'Viewer'::text))"
"                    Rows Removed by Filter: 9518"
"                    ->  Sort  (cost=48885.41..48980.65 rows=38097 width=27) (actual time=41.852..42.920 rows=19035 loops=1)"
"                          Sort Key: da.team_id"
"                          Sort Method: quicksort  Memory: 1661kB"
"                          ->  Nested Loop  (cost=0.65..45986.72 rows=38097 width=27) (actual time=0.115..39.421 rows=19035 loops=1)"
"                                ->  Seq Scan on dashboard d  (cost=0.00..507.38 rows=9550 width=12) (actual time=0.005..2.358 rows=9551 loops=1)"
"                                      Filter: (org_id = 1)"
"                                ->  Bitmap Heap Scan on dashboard_acl da  (cost=0.65..4.72 rows=4 width=31) (actual time=0.003..0.003 rows=2 loops=9551)"
"                                      Recheck Cond: ((dashboard_id = d.id) OR (dashboard_id = d.folder_id))"
"                                      Filter: (permission >= 1)"
"                                      Heap Blocks: exact=9518"
"                                      ->  BitmapOr  (cost=0.65..0.65 rows=4 width=0) (actual time=0.002..0.002 rows=0 loops=9551)"
"                                            ->  Bitmap Index Scan on ""IDX_dashboard_acl_dashboard_id""  (cost=0.00..0.33 rows=2 width=0) (actual time=0.001..0.001 rows=2 loops=9551)"
"                                                  Index Cond: (dashboard_id = d.id)"
"                                            ->  Bitmap Index Scan on ""IDX_dashboard_acl_dashboard_id""  (cost=0.00..0.33 rows=2 width=0) (actual time=0.001..0.001 rows=0 loops=9551)"
"                                                  Index Cond: (dashboard_id = d.folder_id)"
"                    ->  Sort  (cost=75.21..77.91 rows=1080 width=16) (actual time=0.012..0.012 rows=1 loops=1)"
"                          Sort Key: ugm.team_id"
"                          Sort Method: quicksort  Memory: 25kB"
"                          ->  Seq Scan on team_member ugm  (cost=0.00..20.80 rows=1080 width=16) (actual time=0.005..0.005 rows=1 loops=1)"
"              ->  Nested Loop  (cost=602.88..1739.66 rows=35 width=4) (actual time=3.514..9.936 rows=35 loops=1)"
"                    ->  Seq Scan on dashboard_acl da_1  (cost=0.00..603.98 rows=1 width=0) (actual time=0.011..2.126 rows=1 loops=1)"
"                          Filter: ((permission >= 1) AND (org_id = '-1'::integer) AND ((user_id = 2) OR ((role)::text = 'Viewer'::text)))"
"                          Rows Removed by Filter: 19036"
"                    ->  Hash Left Join  (cost=602.88..1135.33 rows=35 width=4) (actual time=3.501..7.804 rows=35 loops=1)"
"                          Hash Cond: (d_1.folder_id = folder.id)"
"                          Filter: (((folder.id IS NOT NULL) AND (NOT folder.has_acl)) OR ((folder.id IS NULL) AND (NOT d_1.has_acl)))"
"                          Rows Removed by Filter: 9516"
"                          ->  Seq Scan on dashboard d_1  (cost=0.00..507.38 rows=9550 width=13) (actual time=0.003..2.072 rows=9551 loops=1)"
"                                Filter: (org_id = 1)"
"                          ->  Hash  (cost=483.50..483.50 rows=9550 width=5) (actual time=3.481..3.481 rows=9551 loops=1)"
"                                Buckets: 16384  Batches: 1  Memory Usage: 474kB"
"                                ->  Seq Scan on dashboard folder  (cost=0.00..483.50 rows=9550 width=5) (actual time=0.002..2.242 rows=9551 loops=1)"
"Planning Time: 0.527 ms"
"Execution Time: 63.711 ms"

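For a rough sense of the improvement, the two "Execution Time" lines can be compared directly. This is a small sketch; `execution_time_ms` is a hypothetical helper, not part of any tool.

```python
import re


def execution_time_ms(plan_text: str) -> float:
    """Extract the 'Execution Time' value (in ms) from EXPLAIN ANALYZE output."""
    match = re.search(r"Execution Time: ([\d.]+) ms", plan_text)
    if match is None:
        raise ValueError("no Execution Time line found")
    return float(match.group(1))


before_ms = execution_time_ms('"Execution Time: 32114.044 ms"')
after_ms = execution_time_ms('"Execution Time: 63.711 ms"')
speedup = before_ms / after_ms
print(f"{speedup:.0f}x faster")  # 504x faster
```

Note that the two plans were captured with different user ids in the filter (1 before, 2 after), so the ratio is indicative rather than a controlled benchmark.
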
TODO:

  • a bit of testing whether the output is really the same for other ACL setups (teams, users, roles, etc.)
  • a benchmark test, maybe; will probably need a bit of help from @markelog

torkelo (Member) commented Jun 4, 2019

Did you test it with this test data? #17422

Did you notice that the GetSystemStats query also gets slow?

aocenas (Member, Author) commented Jun 4, 2019

@torkelo I did not see that one; I created my own script to create the dashboards, but I will check yours too.
I noticed a few things being slow; for example, logging in as a non-admin user took 20s. It seems the query update fixed that too, but I will check the system stats query. I assume a slowdown there would not be visible in the UI.

torkelo (Member) commented Jun 4, 2019

I also noticed the slow login time

aocenas (Member, Author) commented Jun 4, 2019

Added unit tests (also run against the previous implementation), so it seems to work the same so far. In any case, some more review is probably in order, as this is a rather sensitive query that is added to multiple other queries.

torkelo (Member) commented Jun 5, 2019

Tested this now, and it does fix both the login and search issues for me and seems to keep the old behavior / logic.

aocenas (Member, Author) commented Jun 5, 2019

@torkelo Regarding the GetSystemStats query, I do not see a huge slowdown with lots of dashboards. The queries there do not use the ACL check subquery. There are some dashboard - dashboard_acl joins that could get slower as the number of dashboards grows, but probably not significantly, and they should be easily cached by the DB.

torkelo (Member) commented Jun 5, 2019

Regarding GetSystemStats, I was reading the output wrong: it took 18ms, not 18 seconds.

@aocenas aocenas marked this pull request as ready for review June 5, 2019 08:53
@aocenas aocenas merged commit 1c3ad78 into master Jun 5, 2019
@aocenas aocenas deleted the acl-query-fix branch June 5, 2019 08:55
@marefr marefr modified the milestones: 6.3, 6.2.2 Jun 5, 2019
@marefr marefr changed the title Perf: Fix slow ACL query Database: Fixed slow permission query in folder/dashboard search Jun 5, 2019
aocenas added a commit that referenced this pull request Jun 5, 2019
Fix slow ACL query for dashboards that was used as subquery on multiple places slowing down search and login in instances with many dashboards.

(cherry picked from commit 1c3ad78)
@aocenas aocenas mentioned this pull request Jun 5, 2019
ryantxu added a commit to ryantxu/grafana that referenced this pull request Jun 6, 2019
* grafana/master:
  Prometheus: Use overridden panel range as $_range instead of dashboard range (grafana#17352)
  Update latest (grafana#17456)
  NavModel: Fixed page header ui tabs issues for some admin pages (grafana#17444)
  Update changelog for 6.2.2 (grafana#17452)
  PluginConfig: Fixed plugin config page navigation when using subpath (grafana#17364)
  Tracing: allow propagation with Zipkin headers (grafana#17009)
  Perf: Fix slow dashboards ACL query (grafana#17427)
  Explore: Fixes crash when parsing date math string with whitespace (grafana#17446)
  Cloudwatch: Add AWS DocDB metrics (grafana#17241)
  Provisioning: Support folder that doesn't exist yet in dashboard provisioning (grafana#17407)
  Codestyle: Fix govet issues (grafana#17178)
Successfully merging this pull request may close these issues.

Dashboard acl check is slow on large instances