-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Implement model specific lookups by id to improve performance #20974
Conversation
@@ -491,6 +491,8 @@ def get_viz_annotation_data( | |||
chart = ChartDAO.find_by_id(annotation_layer["value"]) | |||
if not chart: | |||
raise QueryObjectValidationError(_("The chart does not exist")) | |||
if not chart.datasource: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addresses mypy issue in
viz_obj = get_viz(
datasource_type=chart.datasource.type,
datasource_id=chart.datasource.id,
form_data=form_data,
force=force,
)
ce579da
to
b5da0be
Compare
a8b2d63
to
28db76e
Compare
9f56905
to
eeb4fa5
Compare
Codecov Report
@@ Coverage Diff @@
## master #20974 +/- ##
==========================================
- Coverage 66.38% 66.34% -0.04%
==========================================
Files 1767 1767
Lines 67232 67317 +85
Branches 7138 7138
==========================================
+ Hits 44633 44664 +31
- Misses 20773 20827 +54
Partials 1826 1826
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
eeb4fa5
to
a695fc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this perf issue!
superset/dao/base.py
Outdated
cls, | ||
model_id: Union[str, int], | ||
session: Session = None, | ||
no_filter: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no_filter
feels slightly ambiguous. Would skip_filter
or ignore_filter
be more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's go with skip_filter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit nit: can we maybe call this skip_base_filter
to make it even more explicit? I was confused for a while why get_by_id
would need still need a filter.
from superset.connectors.sqla.models import SqlaTable | ||
from superset.models.core import Database | ||
|
||
engine = session.get_bind() | ||
SqlaTable.metadata.create_all(engine) # pylint: disable=no-member | ||
|
||
db = Database(database_name="my_database", sqlalchemy_uri="sqlite://") | ||
sqla_table = SqlaTable( | ||
table_name="my_sqla_table", | ||
columns=[], | ||
metrics=[], | ||
database=db, | ||
) | ||
|
||
session.add(db) | ||
session.add(sqla_table) | ||
session.flush() | ||
yield session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note about missing cleanup as previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I wonder if the slowness in your check permission query is because postgres did not automatically create indexes on the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
new relic states that a lot of time is spent in python code, not sql execution. I have 2 more places I want to optimized for better perf: https://apache-superset.slack.com/archives/G013HAE6Y0K/p1659722615103529?thread_ts=1659395755.971919&cid=G013HAE6Y0K and will have PRs coming up soon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice polishing in the last commit 👍
…20974) * Implement model specific lookups by id to improve performance * Address comments e.g. better variable names and test cleanup * commit after cleanup * even better name and test cleanup via rollback Co-authored-by: Bogdan Kyryliuk <bogdankyryliuk@dropbox.com>
SUMMARY
Currently the access checks are happening 2ce in the explore flow, once during the object lookup and later doing an explicit check. Existing implementation was spending > 500 ms in our deployment on doing object by id lookup and that contributed to ~1 - 1.5 second slowdown of the explore endpoint.
The proposed solution skips the filter on the chart / datasource lookup and let's explore to handle the access checks and return 403 if it was denied. Prior behavior was to return 404.
Out of scope
superset/superset/views/utils.py
Line 104 in f0ca158
superset/superset/views/utils.py
Line 72 in f0ca158
Before
After
TESTING INSTRUCTIONS
[x] Deployed and tested on dropbox staging
ADDITIONAL INFORMATION
More granular breakdown