-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: remove unnecessary dataset queries from dashboard requests #16110
refactor: remove unnecessary dataset queries from dashboard requests #16110
Conversation
Codecov Report
@@ Coverage Diff @@
## master #16110 +/- ##
=======================================
Coverage 76.83% 76.84%
=======================================
Files 995 995
Lines 52884 52881 -3
Branches 6721 6721
=======================================
Hits 40636 40636
+ Misses 12023 12020 -3
Partials 225 225
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
|
I think it's also safe to remove superset/superset/models/dashboard.py Lines 164 to 167 in b3e699b
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks familiar :)
…pache#16110) * refactor: remove unnecessary dataset queries from dashboard requests * fix comments
…pache#16110) * refactor: remove unnecessary dataset queries from dashboard requests * fix comments
SUMMARY
In airbnb we did some investigation for server-side processing time for dashboard requests, try to find perf bottleneck and improve the dashboard performance, especially for those large ones with multiple datasets and tabs.
Currently when user opens a dashboard we need 3 different API to fetch data:
GET /api/v1/dashboard/<id_or_slug>
: to get dashboard metadataGET /api/v1/dashboard/<id_or_slug>/charts
: to get charts metadata for a given dashboard idGET /api/v1/dashboard/<id_or_slug>/datasets
: to get dataset metadata for a given dashboard idSince this PR, we found the queries to
tables
is perf bottleneck, so we already made datasets request non-blocking for dashboard render. And given we already split big blob of dashboard data into 3 different APIs, it seems not necessary for each of request to querytables
. This PR is to remove unnecessary dataset queries from these 2 requests.We run some profiling for
/dashboard/id
and/dashboard/id/charts
requests inDatadog
. I used a heavy dashboard in airbnb as example, which has 80 datasets and about 300 charts.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before:
You will see that even each query to
tables
table only takes 1.2 ~ 1.9 ms, but the queries was called a few hundred times, it cause the overall response time took ~ 1 seconds (for both dashboard and dashboard/charts request)After:
After removing unnecessary queries to datasets,
pymysql.query
operation is reduced a lot.TESTING INSTRUCTIONS
CI and manual test.
ADDITIONAL INFORMATION