Import and initialization optimizations #2368
This involved splitting validators/graph object classes back into separate files
… object belongs to these packages If pandas isn't loaded, we don't need to check whether a value is a DataFrame, and this way we don't pay the pandas import time.
…reated. Create them lazily, and cache them for use across graph objects of the same type
Saves 200ms of startup time!
cc @alexcjohnson @chriddyp. These changes should help improve the responsiveness of Dash hot-reload, and should significantly reduce the performance cost of using graph objects and px to generate Figures in Dash callbacks.
Very nice! Quarter second speed up on import & half second speed up when creating
Regarding review, is there any way to re-order/squash some commits so that it is possible to review independently the non-codegened part?
Good point, @emmanuelle and @nicolaskruchten. Sorry for not providing a commit overview. All of the codegen changes are in 4ee88bc. For that commit, the hand-edited changes are in the following files (everything else is codegen output):
The rest of the commits can be reviewed individually and do not include codegen changes. Thanks!
I don't have a full grasp of the changes made here but the descriptions make sense, the tests pass, the docs build and most CI jobs seem to go faster, so I'd call this a win :)
💃 unless objections
WOW...
# Check for submodule
if import_name in module_names:
    # print(parent_name, import_name)
please remove commented out print statements
So in this branch I cannot get the tab completion to work for
Also I generated the API doc on this branch and some links to
Can you provide some more details on the exact scenario you're trying, including versions of various things? I'm on Python 3.7.7 / JupyterLab 2.1 and I'm trying
This also works at the command-line with
@nicolaskruchten what you describe works for me too. What does not work is to do
Ahh, I think I have a solution. Looks like I'll add these functions to the codegen and update the PR. Hopefully this will also solve the documentation generation issue @emmanuelle mentioned.
Cool! So in principle this shouldn't be different in 3.6?
I also have
@jonmmease it'd be awesome if it's possible to have the best of both worlds :-). @nicolaskruchten I tried with conda envs and specific versions of Python, and this branch: py3.6, tab completion works well with
Well if it’s broken only on 3.7+ then maybe Jon’s upcoming fix will resolve it! Thanks for checking! My fear was that it would also be broken in 3.6 but if that’s not the case then we’re in luck :)
OK, IPython tab completion seems to be working well for me now with Python 3.7. Please let me know what you see in your environments! Thanks
So there is some progress for me in Python 3.7 (pip env), since I can now do
…ts is not installed
@jonmmease are you able to replicate Emma's issues at all locally? Just in a shell with
Huh, this is working for me on Python 3.7 with ipython 7.13 😕 (Despite the environment name, this is Python 3.7 🙈) @emmanuelle any improvement in the behavior of documentation generation? @nicolaskruchten how do things look for you?
@jonmmease is this Linux? I'll check locally in a bit
no change for me, I can tab-complete to
If I instantiate
Did you mean to use a capital I was looking at submodule imports like this
Yeah I tried both and I'm seeing what I would expect.
Does that percy failure look familiar? A missing mapbox colorbar only on chrome? Should I resubmit the job?
Let's rerun it, yeah.
Ok, merging. Let's open new issues on the tab completion situation as they arise. Thanks all!
Overview
This PR contains a variety of optimizations targeted at improving plotly.py's import and Figure creation/serialization speed.
Lazy submodule imports in Python 3.7+
PEP 562 in Python 3.7 introduced a nice approach for implementing lazy loading of submodules. The top-level plotly/__init__.py, plotly/io/__init__.py, and the full graph_objs hierarchy have been updated to use lazy submodule importing for Python 3.7+. For older Python versions, all submodule imports are still performed immediately.
Part of this process involved codegen updates to split graph object and validator classes into their own files.
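To illustrate the PEP 562 mechanism mentioned above, here is a minimal sketch of lazy submodule loading through a module-level `__getattr__`. It is not the code generated by this PR: the helper `lazy_module`, the module name `demo`, and the use of the standard-library `colorsys` module as the lazily loaded target are all assumptions made for the example. The `__dir__` hook is also part of PEP 562 and is included here so that lazy names show up in `dir()`.

```python
import importlib
import types


def lazy_module(name, lazy_imports):
    """Build a module whose listed attributes are imported on first access.

    `lazy_imports` maps an attribute name to an absolute module path.
    Hypothetical helper for illustration only.
    """
    module = types.ModuleType(name)

    def _lazy_getattr(attr):
        # Python only calls this when normal attribute lookup fails (PEP 562).
        if attr in lazy_imports:
            value = importlib.import_module(lazy_imports[attr])
            setattr(module, attr, value)  # cache so later lookups skip this hook
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    def _lazy_dir():
        # Advertise lazy names alongside the attributes that already exist.
        return sorted(set(vars(module)) | set(lazy_imports))

    module.__getattr__ = _lazy_getattr
    module.__dir__ = _lazy_dir
    return module


demo = lazy_module("demo", {"colorsys": "colorsys"})
print("colorsys" in vars(demo))  # the submodule is not attached yet
print(demo.colorsys.__name__)    # first access triggers the import
print("colorsys" in vars(demo))  # now cached on the module
```

In a real package the two hooks would simply be defined as functions named `__getattr__` and `__dir__` at the top level of `__init__.py`; attaching them to a `types.ModuleType` instance here just keeps the sketch in a single file.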
Lazy creation of validators
Previously, each graph object would instantiate a set of validators (one per property) in the constructor. Now, validators are constructed when first used, and they are stored in a global cache (plotly/validator_cache.py).
Lazy creation of child graph objects
Previously, child graph objects were created in the constructor, and they were initialized for every possible property. Now, graph objects are initialized either on property access or when the property is set to a non-None value (if validation is enabled, see below).
Avoid loading numpy and pandas when not in use
In several places in the codebase, we attempt to import numpy/pandas using our get_module function, and then use the pandas/numpy module handle to check whether an argument is a data structure from that library. The get_module function now has a should_load option. When set to False, get_module will only return the module if it is already loaded. This is useful because if pandas isn't loaded, then we don't need to check whether a value is a DataFrame. This keeps us from having to pay the pandas/numpy import cost when these libraries are installed but not in use, which saves ~200ms.
Avoid dynamic docstring generation
This PR removes the dynamic docstring generation that was used to populate the docstrings for the Figure methods corresponding to plotly.io functions (e.g. the Figure.show docstring was created by transforming that of plotly.io.show). These docstrings are now added statically. This saves ~200ms on import time.
Support optional validation
This PR adds support for disabling property validation using the go.validation object. This can be used as a callable to enable/disable validation for the session (e.g. go.validation(False)), or it can be used as a context manager to enable/disable validation within a block of code (e.g. with go.validation(False):).
API inspired by Bokeh's implementation in bokeh/bokeh#6042.
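As a rough sketch of how a single object can serve both as a callable and as a context manager, consider the following. It is not plotly's implementation of go.validation: the class name `ValidationSwitch`, its `enabled` attribute, and the stack used to restore earlier settings are assumptions made for illustration.

```python
class ValidationSwitch:
    """Hypothetical stand-in for an object like go.validation."""

    def __init__(self):
        self.enabled = True
        self._previous = []  # earlier settings, restored when a with-block exits

    def __call__(self, enabled):
        # Calling the object switches validation on or off immediately.
        self._previous.append(self.enabled)
        self.enabled = bool(enabled)
        return self  # returning self lets the call double as a with-target

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore whatever setting was active before the matching call.
        self.enabled = self._previous.pop()
        return False  # never swallow exceptions raised inside the block


validation = ValidationSwitch()

with validation(False):
    print(validation.enabled)  # validation is off inside the block
print(validation.enabled)      # and back to the earlier setting afterwards

validation(False)              # used as a plain call, the change persists
print(validation.enabled)
```

Keeping the earlier settings on a stack is one possible design choice; it allows with-blocks to nest and each block to restore exactly the state it found.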
Results
Here are some before/after performance results on Python 3.7 with this PR:
top-level import
Version 4.6: 239 ms
PR: 2.5 ms
95x speedup
Import, create empty figure, and serialize to JSON
Version 4.6: 696 ms
PR: 27 ms
25x speedup
Repeatedly create empty figure and serialize to json (after import)
Version 4.6: 68 ms
PR: 1.5 ms
45x speedup
Import, load data, create animated plotly express figure, serialize to json
Version 4.6: 1530 ms
PR: 550 ms
2.7x speedup
Repeatedly create px plot after import and data are loaded
Version 4.6: 663 ms
PR: 167 ms
4x speedup
Import, load data, create animated plotly express figure, serialize to json, skip validation
PR (no validation): 449 ms
PR (with validation): 550 ms
Version 4.6: 1530 ms
Repeatedly import, load data, create animated plotly express figure, serialize to json, skip validation
PR (no validation): 127 ms
PR (with validation): 167 ms
Version 4.6: 663 ms
cc @nicolaskruchten @emmanuelle