Import and initialization optimizations #2368

Merged: 14 commits, Apr 16, 2020

Conversation

jonmmease
Contributor

@jonmmease jonmmease commented Apr 9, 2020

Overview

This PR contains a variety of optimizations targeted at improving plotly.py's import and Figure creation/serialization speed.

Lazy submodule imports in Python 3.7+

PEP-562 in Python 3.7 introduced a nice approach for implementing lazy loading of submodules. The top-level plotly/__init__.py, plotly/io/__init__.py and the full graph_objs hierarchy have been updated to use lazy submodule importing for Python 3.7+. For older Python versions, all submodule imports are still performed immediately.

Part of this process involved codegen updates to split graph object and validator classes into their own files.
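
A minimal sketch of the PEP-562 mechanism described above, not the code generated in this PR. The helper name install_lazy_submodules and the demo_pkg module are hypothetical, and stdlib modules stand in for the expensive submodules.

```python
import importlib
import types


def install_lazy_submodules(module, submodule_paths):
    """Attach a PEP 562 ``__getattr__`` hook to ``module`` (hypothetical helper).

    ``submodule_paths`` maps an attribute name to the absolute import path
    that is imported the first time that attribute is accessed.
    """

    def __getattr__(name):
        if name in submodule_paths:
            loaded = importlib.import_module(submodule_paths[name])
            # Cache on the module so later lookups bypass this hook entirely.
            setattr(module, name, loaded)
            return loaded
        raise AttributeError(
            "module {!r} has no attribute {!r}".format(module.__name__, name)
        )

    # Python 3.7+ looks this hook up in the module's own namespace.
    module.__getattr__ = __getattr__
    return module


# Hypothetical stand-in for a package __init__.py.
demo_pkg = install_lazy_submodules(
    types.ModuleType("demo_pkg"),
    {"wave": "wave", "decoder": "json.decoder"},
)

print("wave" in vars(demo_pkg))  # nothing has been imported into demo_pkg yet
print(demo_pkg.wave.__name__)    # first access triggers the import
print("wave" in vars(demo_pkg))  # the module is now cached on demo_pkg
```

In a real package the hook would be a plain function named __getattr__ at the top of __init__.py; attaching it to a ModuleType instance here only keeps the sketch in one file.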

Lazy creation of validators

Previously, each graph object would instantiate a set of validators (one per property) in the constructor. Now, validators are constructed when first used, and they are stored in a global cache (plotly/validator_cache.py).
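
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the contents of plotly/validator_cache.py: NumberValidator, get_validator, and the cache key layout are all hypothetical.

```python
class NumberValidator:
    """Hypothetical stand-in for a generated validator class."""

    instances_created = 0  # lets us observe how often construction happens

    def __init__(self, parent_name, plotly_name):
        NumberValidator.instances_created += 1
        self.parent_name = parent_name
        self.plotly_name = plotly_name

    def validate_coerce(self, value):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(
                "Invalid value for {}.{}: {!r}".format(
                    self.parent_name, self.plotly_name, value
                )
            )
        return value


# One cache shared by all graph objects: (parent path, property) -> validator.
_validator_cache = {}


def get_validator(parent_name, plotly_name, validator_class=NumberValidator):
    """Return the cached validator, constructing it only on first use."""
    key = (parent_name, plotly_name)
    if key not in _validator_cache:
        _validator_cache[key] = validator_class(parent_name, plotly_name)
    return _validator_cache[key]
```

With this shape, constructing a graph object costs nothing for properties that are never set, and two objects of the same type share one validator per property.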

Lazy creation of child graph objects

Previously, child graph objects were created in the constructor, and they were initialized for every possible property. Now, child graph objects are initialized either on property access or when the property is set to a non-None value (if validation is enabled, see below).
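
A rough sketch of this pattern, using hypothetical Scatter and Marker classes rather than the real graph object base classes. It ignores the validation caveat mentioned above and only shows creation on access or on assignment of a non-None value.

```python
class Marker:
    """Hypothetical child graph object."""

    def __init__(self, **props):
        self.props = dict(props)


class Scatter:
    """Hypothetical parent graph object with one compound property."""

    def __init__(self, marker=None):
        # No child objects are created up front.
        self._children = {}
        if marker is not None:
            self.marker = marker

    @property
    def marker(self):
        # Create the child the first time it is read.
        if "marker" not in self._children:
            self._children["marker"] = Marker()
        return self._children["marker"]

    @marker.setter
    def marker(self, value):
        if value is None:
            # Setting None does not create a child.
            self._children.pop("marker", None)
        elif isinstance(value, Marker):
            self._children["marker"] = value
        else:
            self._children["marker"] = Marker(**value)


empty = Scatter()
print(len(empty._children))   # no children until something asks for one
empty.marker                  # property access creates the child
print(len(empty._children))
```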

Avoid loading numpy and pandas when not in use

In several places in the codebase, we attempt to import numpy/pandas using our get_module function, and then use the module handle to check whether an argument is a data structure from that library. The get_module function now has a should_load option. When it is set to False, get_module returns the module only if it is already loaded. This is useful because if pandas isn't loaded, a value cannot be a DataFrame, so there is nothing to check. This saves the ~200ms pandas/numpy import cost when these libraries are installed but not in use.
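
A sketch of how such a should_load option could behave, based only on the description above. The body of get_module here is an assumption, not the library's implementation, and is_dataframe is a hypothetical caller.

```python
import importlib
import sys


def get_module(name, should_load=True):
    """Return the named module, or None if it is unavailable.

    With should_load=False the module is returned only if something else
    has already imported it; no import is attempted.
    """
    if name in sys.modules:
        return sys.modules[name]
    if not should_load:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def is_dataframe(value):
    """Hypothetical check that never triggers a pandas import."""
    pd = get_module("pandas", should_load=False)
    # If pandas was never imported, value cannot be a DataFrame.
    return pd is not None and isinstance(value, pd.DataFrame)
```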

Avoid dynamic docstring generation

This PR removes the dynamic docstring generation that was used to populate the docstrings of the Figure methods corresponding to plotly.io functions (e.g. the Figure.show docstring was created by transforming the plotly.io.show docstring). These docstrings are now added statically. This saves ~200ms on import time.

Support optional validation

This PR adds support for disabling property validation using the go.validation object. This can be used as a callable to enable/disable validation for the session (e.g. go.validation(False)), or it can be used as a context manager to enable/disable validation within a block of code (e.g. with go.validation(False):).

API inspired by Bokeh's implementation in bokeh/bokeh#6042.
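
One way an object can serve as both a session-wide switch and a context manager is sketched below. The names ValidationSwitch, validation_switch, and set_property are hypothetical and this is not the PR's implementation; the call applies the setting immediately and returns a helper that restores the previous setting if it is used in a with statement.

```python
class _RestoreOnExit:
    """Context manager that restores the previous setting when the block ends."""

    def __init__(self, switch, previous):
        self._switch = switch
        self._previous = previous

    def __enter__(self):
        return self._switch

    def __exit__(self, exc_type, exc_value, traceback):
        self._switch.enabled = self._previous
        return False  # never swallow exceptions


class ValidationSwitch:
    """Callable switch; the call result may optionally be used with ``with``."""

    def __init__(self):
        self.enabled = True

    def __call__(self, enabled):
        previous = self.enabled
        self.enabled = bool(enabled)  # takes effect immediately
        return _RestoreOnExit(self, previous)


validation_switch = ValidationSwitch()


def set_property(props, name, value):
    """Hypothetical setter that validates only while the switch is on."""
    if validation_switch.enabled and not isinstance(value, (int, float)):
        raise ValueError("Invalid value for {}: {!r}".format(name, value))
    props[name] = value


props = {}
with validation_switch(False):
    set_property(props, "size", "not checked")  # accepted inside the block
print(validation_switch.enabled)  # restored after the block
```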

Results

Here are some before/after performance results on Python 3.7 with this PR:

top-level import

%%time
import plotly

Version 4.6: 239 ms
PR: 2.5ms
95x speedup
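
Outside a notebook, a comparable measurement can be approximated by timing the import in a fresh interpreter, so that already-imported modules do not hide the cost. This is a sketch with a hypothetical helper name; unlike %%time, the figure includes interpreter start-up, which can be estimated by timing a trivial import such as sys and subtracting it.

```python
import subprocess
import sys
import time


def time_fresh_import(module_name, repeats=3):
    """Best wall-clock time in seconds to start Python and import a module."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, "-c", "import {}".format(module_name)],
            check=True,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example usage (requires the module to be installed):
#   startup = time_fresh_import("sys")
#   total = time_fresh_import("plotly")
#   print("import cost: {:.0f} ms".format((total - startup) * 1000))
```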

Import, create empty figure, and serialize to JSON

%%time
import plotly.graph_objects as go
go.Figure().to_json()

Version 4.6: 696 ms
PR: 27ms
25x speedup

Repeatedly create empty figure and serialize to json (after import)

%%timeit
go.Figure().to_json()

Version 4.6: 68 ms
PR: 1.5ms
45x speedup

Import, load data, create animated plotly express figure, serialize to json

%%time
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country", facet_col="continent",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 1530 ms
PR: 550 ms
2.7x speedup

Repeatedly create px plot after import and data are loaded

%%timeit
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country", facet_col="continent",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 663 ms
PR: 167 ms
4x speedup

Import, load data, create animated plotly express figure, serialize to json, skip validation

%%time
import plotly.express as px
import plotly.graph_objects as go
df = px.data.gapminder()
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
               size="pop", color="continent", hover_name="country", facet_col="continent",
               log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
    fig.to_json()

PR (no validation): 449 ms
PR (with validation): 550 ms
Version 4.6: 1530 ms

Repeatedly create animated plotly express figure after import and data are loaded, skip validation

%%timeit
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
               size="pop", color="continent", hover_name="country", facet_col="continent",
               log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])

PR (no validation): 127ms
PR (with validation): 167ms
Version 4.6: 663 ms

cc @nicolaskruchten @emmanuelle

This involved splitting validators/graph object classes back into separate files
… object belongs to these packages

If pandas isn't loaded, we don't need to check whether a value is a DataFrame, and this way we
don't pay the pandas import time.
…reated.

Create them lazily, and cache them for use across graph objects of the same type
@jonmmease changed the title from "Import init optimization" to "Import and initialization optimizations" on Apr 10, 2020
@jonmmease
Contributor Author

cc @alexcjohnson @chriddyp. These changes should help improve the responsiveness of Dash hot-reload, and should significantly reduce the performance cost of using graph objects and px to generate Figures in Dash callbacks.

@chriddyp
Member

Very nice! Quarter second speed up on import & half second speed up when creating px figures.. really impressive! 🐎

@nicolaskruchten
Contributor

Very nice! Might take me a sec to review ;)

image

(j/k I know most of those are codegen'ed)

@emmanuelle
Contributor

Regarding review, is there any way to re-order/squash some commits so that it is possible to review independently the non-codegened part?

@jonmmease
Contributor Author

Good point, @emmanuelle and @nicolaskruchten. Sorry for not providing a commit overview.

All of the codegen changes are in 4ee88bc. For that commit, the hand-edited changes are in the following files (everything else is codegen output):

The rest of the commits can be reviewed individually and do not include codegen changes.

Thanks!

@nicolaskruchten
Contributor

I don't have a full grasp of the changes made here but the descriptions make sense, the tests pass, the docs build and most CI jobs seem to go faster, so I'd call this a win :)

@nicolaskruchten
Contributor

💃 unless objections

@nicolaskruchten
Contributor

WOW... pytest plotly/tests/test_core/test_px/ goes from 43 seconds to 8 seconds on my machine! 5x speedup!


# Check for submodule
if import_name in module_names:
# print(parent_name, import_name)
Contributor

please remove commented out print statements

@emmanuelle
Contributor

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter, probably because of lazy loading. Is there any way to keep the performance improvement but get the tab completion back? Maybe populating the __all__ variable of each submodule would work (haven't checked).

@emmanuelle
Contributor

Also I generated the API doc on this branch and some links to go classes are broken, I need to understand why and how it can be fixed.

@nicolaskruchten
Contributor

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter

Can you provide some more details on the exact scenario you're trying, including versions of various things? I'm on Python 3.7.7 / JupyterLab 2.1 and I'm trying fig = px.scatter(x=[1,2,3]) then fig.lay<tab> and I can complete layout and then .hov<tab> and I can complete hoverlabel and then .b<tab> completes bgcolor so this appears to work 'all the way down'. This works in both /lab and /notebooks.

@nicolaskruchten
Contributor

This also works at the command-line with ipython, for me.

@emmanuelle
Contributor

emmanuelle commented Apr 14, 2020

@nicolaskruchten what you describe works for me too. What does not work is to do go.La + TAB, go.Layout.bar + TAB, go.Choro + TAB, etc. Python 3.7.3 here, ipython 7.8.0 and notebook server 5.7.4. (after doing import plotly.graph_objects as go, of course 😁 )

@jonmmease
Contributor Author

Ahh, I think I have a solution. Looks like ipython honors the module-level __dir__() function that was defined in PEP-562 (https://www.python.org/dev/peps/pep-0562/) along with __getattr__().

I'll add this function to the codegen and update the PR. Hopefully this will also solve the documentation generation issue @emmanuelle mentioned.
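
A minimal sketch of the module-level __dir__() hook, assuming tab completion is driven by dir(). The helper install_lazy_dir and the demo_pkg module are hypothetical; the names Figure, Bar, and bar are borrowed from this thread purely as labels.

```python
import types


def install_lazy_dir(module, lazy_names):
    """Attach a PEP 562 ``__dir__`` hook so dir() also reports lazy names."""

    def __dir__():
        # Names already defined on the module plus names that would be
        # imported lazily on first attribute access.
        return sorted(set(vars(module)) | set(lazy_names))

    # Python 3.7+ looks this hook up in the module's own namespace.
    module.__dir__ = __dir__
    return module


demo_pkg = install_lazy_dir(types.ModuleType("demo_pkg"), ["Figure", "Bar", "bar"])

# None of the lazy names exist as real attributes yet, but dir() lists them.
print([name for name in dir(demo_pkg) if not name.startswith("_")])
```

Without the hook, dir() on the module reports only the names that have already been imported, which would be consistent with the missing completions reported above.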

@nicolaskruchten
Contributor

Cool! So in principle this shouldn't be different in 3.6?

@emmanuelle
Contributor

I also have jedi installed, version 0.15.1.

@emmanuelle
Contributor

@jonmmease it'd be awesome if it's possible to have the best of both worlds :-).

@nicolaskruchten I tried with conda envs and specific versions of Python, and this branch: py3.6, tab completion works well with go. TAB, but with py3.8, it does not work. Is it a linux thing then? To be continued...

@nicolaskruchten
Contributor

Well if it’s broken only on 3.7+ then maybe Jon’s upcoming fix will resolve it! Thanks for checking! My fear was that it would also be broken in 3.6 but if that’s not the case then we’re in luck :)

@jonmmease
Contributor Author

OK, IPython tab completion seems to be working well for me now with Python 3.7. Please let me know what you see in your environments! Thanks

@emmanuelle
Contributor

so there is some progress for me in python 3.7 (pip env), since i can now do go.F TAB to get go.Figure, or go.B TAB for go.Bar, but I cannot go deeper in the hierarchy, for example go.bar.M + TAB does not return anything.

@nicolaskruchten
Contributor

@jonmmease are you able to replicate Emma's issues at all locally? Just in a shell with ipython ?

@jonmmease
Contributor Author

Huh, this is working for me on Python 3.7 with ipython 7.13 😕

Screenshot_20200416_133605

(Despite the environment name, this is Python 3.7 🙈)

@emmanuelle any improvement in the behavior of documentation generation?

@nicolaskruchten how do things look for you?

@nicolaskruchten
Contributor

@jonmmease is this Linux? I'll check locally in a bit

@jonmmease
Contributor Author

jonmmease commented Apr 16, 2020

Yeah, I'm on Linux. It's also working in plain vanilla python repl

Screenshot_20200416_134612

@nicolaskruchten
Contributor

no change for me, I can tab-complete to go.Layout.hoverlabel but not through to .bgcolor. Which, BTW, is fine by me.

@nicolaskruchten
Contributor

If I instantiate fig = go.Figure() then I seem to be able to drill down arbitrarily deeply: fig.layout.hoverlabel.font.color. I can also go arbitrarily deeply with go.bar.marker.coloraxis so long as I stay lower-cased and not to e.g. go.bar.Marker.whatever

@jonmmease
Contributor Author

jonmmease commented Apr 16, 2020

go.Layout.hoverlabel

Did you mean to use a capital L there? In that case hoverlabel would be the method object and that method object wouldn't be expected to have completion.

I was looking at submodule imports like this go.layout.hoverlabel
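
The distinction can be illustrated with hypothetical classes, assuming for the sketch that hoverlabel is defined as a property on the class. Accessed on the class, the attribute is the descriptor itself, which has none of the child's attributes to complete on; accessed on an instance, it is the child object.

```python
class Hoverlabel:
    """Hypothetical child object with one attribute."""

    bgcolor = None


class Layout:
    """Hypothetical parent class exposing the child through a property."""

    @property
    def hoverlabel(self):
        return Hoverlabel()


class_attr = Layout.hoverlabel        # the descriptor, not a Hoverlabel
instance_attr = Layout().hoverlabel   # an actual Hoverlabel

print(type(class_attr).__name__)
print(hasattr(class_attr, "bgcolor"))
print(hasattr(instance_attr, "bgcolor"))
```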

@nicolaskruchten
Contributor

Yeah I tried both and I'm seeing what I would expect.

@jonmmease
Contributor Author

Does that percy failure look familiar? A missing mapbox colorbar only on chrome? Should I resubmit the job?

@nicolaskruchten
Contributor

let's rerun it yeah.

@jonmmease
Contributor Author

Ok, merging. Let's open new issues on the tab completion situation as they arise. Thanks all!


4 participants