Import and initialization optimizations #2368

jonmmease · 2020-04-09T22:38:36Z

Overview

This PR contains a variety of optimizations aimed at improving plotly.py's import speed and its Figure creation/serialization speed.

Lazy submodule imports in Python 3.7+

PEP-562 in Python 3.7 introduced module-level __getattr__ (and __dir__) functions, which provide a nice approach for implementing lazy loading of submodules. The top-level plotly/__init__.py, plotly/io/__init__.py and the full graph_objs hierarchy have been updated to use lazy submodule importing for Python 3.7+. For older Python versions, all submodule imports are still performed immediately.

Part of this process involved codegen updates to split graph object and validator classes into their own files.
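As a rough illustration of the PEP 562 mechanism (a minimal sketch, not the code this PR generates), a package can expose submodules through a module-level `__getattr__` that imports them only on first access. The helper name `lazy_submodules` and the demo module are hypothetical; the demo uses a throwaway module and stdlib modules instead of a real package `__init__.py`. On Python < 3.7 a module-level `__getattr__` is simply ignored, which is why eager imports are kept there.

```python
import importlib
import sys
import types


def lazy_submodules(parent_name, submodules):
    """Return (__all__, __getattr__) for a package-style module (hypothetical helper).

    `submodules` maps an attribute name to the absolute path of the module
    to import. Nothing is imported until the attribute is first accessed.
    """
    def __getattr__(name):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if name in submodules:
            module = importlib.import_module(submodules[name])
            # Cache on the parent so later lookups bypass __getattr__ entirely.
            setattr(sys.modules[parent_name], name, module)
            return module
        raise AttributeError(f"module {parent_name!r} has no attribute {name!r}")

    return sorted(submodules), __getattr__


# Demo: a throwaway module standing in for a package __init__.py.
demo = types.ModuleType("lazy_demo")
sys.modules["lazy_demo"] = demo
demo.__all__, demo.__getattr__ = lazy_submodules(
    "lazy_demo", {"decoder": "json.decoder", "parse": "urllib.parse"}
)
```

In a real `__init__.py` the call would look something like `__all__, __getattr__ = lazy_submodules(__name__, {"io": __name__ + ".io"})`, guarded by `sys.version_info >= (3, 7)`. Note that a `__getattr__` alone does not make the lazy names visible to `dir()`, which matters for tab completion, as the discussion below shows.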

Lazy creation of validators

Previously, each graph object would instantiate a set of validators (one per property) in the constructor. Now, validators are constructed when first used, and they are stored in a global cache (plotly/validator_cache.py) so they can be reused across graph objects of the same type.
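A minimal sketch of a lazily populated, shared validator cache, assuming a hypothetical `get_validator(parent_path, prop_name, factory)` helper and a stand-in `NumberValidator` class. plotly's actual `plotly/validator_cache.py` may be organized differently; the point is only that a validator is built on first request and then shared.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical global cache: (parent path, property name) -> validator instance.
_VALIDATOR_CACHE: Dict[Tuple[str, str], Any] = {}


def get_validator(parent_path: str, prop_name: str,
                  factory: Callable[[str, str], Any]) -> Any:
    """Build a validator the first time it is requested, then reuse it."""
    key = (parent_path, prop_name)
    validator = _VALIDATOR_CACHE.get(key)
    if validator is None:
        validator = factory(parent_path, prop_name)
        _VALIDATOR_CACHE[key] = validator
    return validator


class NumberValidator:
    """Stand-in validator used only for this illustration."""
    construct_count = 0  # counts constructions so the caching is observable

    def __init__(self, parent_path: str, prop_name: str):
        NumberValidator.construct_count += 1
        self.name = f"{parent_path}.{prop_name}"

    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise ValueError(f"Invalid value {value!r} for {self.name}")
        return value
```

With this layout, creating a thousand graph objects of the same type builds each validator at most once, and properties that are never set never pay for a validator at all.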

Lazy creation of child graph objects

Previously, child graph objects were created in the constructor, one for every possible property. Now, a child graph object is created either when its property is first accessed or when the property is set to a non-None value (if validation is enabled, see below).
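The lazy-child pattern can be sketched with a plain Python property. The `Scatter` and `Marker` classes below are hypothetical stand-ins, not plotly's generated classes: the constructor stores nothing for an unset child, the getter builds an empty child on first access, and the setter only builds one for a non-None value.

```python
class Marker:
    """Hypothetical child object."""

    def __init__(self, size=None):
        self.size = size


class Scatter:
    """Hypothetical parent object that creates its child on demand."""

    def __init__(self, marker=None):
        self._marker = None          # nothing is built for an unset child
        if marker is not None:
            self.marker = marker     # route through the setter

    @property
    def marker(self):
        if self._marker is None:     # first access: build an empty child
            self._marker = Marker()
        return self._marker

    @marker.setter
    def marker(self, value):
        if value is None:
            self._marker = None
        elif isinstance(value, Marker):
            self._marker = value
        else:                        # e.g. a dict of child properties
            self._marker = Marker(**value)
```

For a figure with many possible nested properties, this means only the branches a user actually touches are ever instantiated.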

Avoid loading numpy and pandas when not in use

In several places in the codebase, we attempt to import numpy/pandas using our get_module function, and then use the returned module handle to check whether an argument is a data structure from that library. The get_module function now has a should_load option. When it is set to False, get_module only returns the module if it has already been imported. This is useful because if pandas isn't loaded, then a value can't be a DataFrame and there is nothing to check. This avoids paying the pandas/numpy import cost (~200ms) when these libraries are installed but not in use.
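A simplified sketch of the behavior described above, written from the description rather than copied from plotly's `get_module`; the `is_dataframe` helper is hypothetical. The key step is consulting `sys.modules` before importing anything.

```python
import importlib
import sys


def get_module(name, should_load=True):
    """Return the named module, or None if it is unavailable.

    With should_load=False, only a module that is already imported
    (present in sys.modules) is returned; nothing new is imported.
    """
    if name in sys.modules:
        return sys.modules[name]
    if not should_load:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def is_dataframe(value):
    # If pandas was never imported, value cannot be a pandas DataFrame,
    # so skip paying pandas' import cost just to run this check.
    pd = get_module("pandas", should_load=False)
    return pd is not None and isinstance(value, pd.DataFrame)
```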

Avoid dynamic docstring generation

This PR removes the dynamic docstring generation that was used to populate the docstrings of the Figure methods that correspond to plotly.io functions (e.g. the Figure.show docstring, which was created by transforming the plotly.io.show docstring). These docstrings are now written statically. This saves ~200ms on import time.

Support optional validation

This PR adds support for disabling property validation using the go.validate object. It can be used as a callable to enable/disable validation for the session (e.g. go.validate(False)), or as a context manager to enable/disable validation within a block of code (e.g. with go.validate(False):).

API inspired by Bokeh's implementation in bokeh/bokeh#6042.
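One way to get an object that works both as a plain call and as a `with` statement is sketched below. The `ValidationSwitch` and `_RestoreOnExit` classes are hypothetical and only modeled on the `go.validate(False)` usage shown in the benchmarks; they are not plotly's implementation. The call changes the setting immediately and returns a small context manager that restores the previous value on exit, so nested blocks also unwind correctly.

```python
class _RestoreOnExit:
    """Context manager returned by ValidationSwitch.__call__."""

    def __init__(self, switch, previous):
        self._switch = switch
        self._previous = previous

    def __enter__(self):
        return self._switch

    def __exit__(self, exc_type, exc, tb):
        # Restore whatever setting was active before the call.
        self._switch._enabled = self._previous
        return False  # never swallow exceptions


class ValidationSwitch:
    """Hypothetical session-wide on/off switch for property validation."""

    def __init__(self):
        self._enabled = True

    @property
    def enabled(self):
        return self._enabled

    def __call__(self, enabled):
        # Takes effect immediately, so a bare call changes the session setting;
        # the returned object also lets the call be used in a `with` statement.
        previous = self._enabled
        self._enabled = bool(enabled)
        return _RestoreOnExit(self, previous)


validate = ValidationSwitch()
```

A property setter would then check `validate.enabled` before running its validator. As the thread records later, this global state was subsequently replaced by a Figure-level validate flag (#2403).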

Results

Here are some before/after performance results on Python 3.7 with this PR:

top-level import

%%time
import plotly

Version 4.6: 239 ms
PR: 2.5 ms
95x speedup
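The %%time figure above only reflects a cold import when it runs in a fresh kernel, because a second `import plotly` in the same process is served from `sys.modules`. Outside IPython, a comparable measurement can be sketched by timing the import in a new interpreter process. The `cold_import_seconds` helper is hypothetical; it measures wall-clock time around a single import statement.

```python
import subprocess
import sys


def cold_import_seconds(module_name):
    """Time `import <module_name>` in a brand-new interpreter process.

    A fresh process has not imported the module yet, so the measurement
    includes its transitive imports (minus anything the interpreter loads
    at startup). `module_name` must be a trusted, valid dotted module name
    because it is pasted into source code.
    """
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())
```

For example, `cold_import_seconds("plotly")`. For a per-module breakdown, `python -X importtime -c "import plotly"` (Python 3.7+) prints import timings to stderr.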

Import, create empty figure, and serialize to JSON

%%time
import plotly.graph_objects as go
go.Figure().to_json()

Version 4.6: 696 ms
PR: 27 ms
25x speedup

Repeatedly create empty figure and serialize to json (after import)

%%timeit
go.Figure().to_json()

Version 4.6: 68 ms
PR: 1.5 ms
45x speedup
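To reproduce a repeated-creation measurement outside IPython, the standard `timeit` module can be used. The `best_per_call_ms` helper below is a hypothetical sketch: it returns the fastest trial's average per-call time, which the `timeit` documentation suggests as the more meaningful figure, whereas IPython's %%timeit reports mean ± std. dev., so the numbers will differ somewhat.

```python
import timeit


def best_per_call_ms(fn, number=100, repeat=5):
    """Run `fn` `number` times per trial for `repeat` trials and return the
    fastest trial's average time per call, in milliseconds."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1000.0
```

For example, after `import plotly.graph_objects as go`, `best_per_call_ms(lambda: go.Figure().to_json())` approximates the benchmark above.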

Import, load data, create animated plotly express figure, serialize to json

%%time
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country", facet_col="continent",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 1530 ms
PR: 550 ms
2.7x speedup

Repeatedly create px plot after import and data are loaded

%%timeit
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country", facet_col="continent",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 663 ms
PR: 167 ms
4x speedup

Import, load data, create animated plotly express figure, serialize to json, skip validation

%%time
import plotly.express as px
import plotly.graph_objects as go
df = px.data.gapminder()
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
               size="pop", color="continent", hover_name="country", facet_col="continent",
               log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
    fig.to_json()

PR (no validation): 449 ms
PR (with validation): 550 ms
Version 4.6: 1530 ms

Repeatedly create animated plotly express figure and serialize to json, skip validation (after import and data are loaded)

%%timeit
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
               size="pop", color="continent", hover_name="country", facet_col="continent",
               log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
    fig.to_json()

PR (no validation): 127 ms
PR (with validation): 167 ms
Version 4.6: 663 ms

cc @nicolaskruchten @emmanuelle

jonmmease · 2020-04-10T18:22:35Z

cc @alexcjohnson @chriddyp. These changes should help improve the responsiveness of Dash hot-reload, and should significantly reduce the performance cost of using graph objects and px to generate Figures in Dash callbacks.

chriddyp · 2020-04-10T18:33:20Z

Very nice! Quarter second speed up on import & half second speed up when creating px figures... really impressive! 🐎

nicolaskruchten · 2020-04-11T01:40:05Z

Very nice! Might take me a sec to review ;)

(j/k I know most of those are codegen'ed)

emmanuelle · 2020-04-11T08:20:31Z

Regarding review, is there any way to re-order/squash some commits so that the non-codegened part can be reviewed independently?

jonmmease · 2020-04-11T16:08:28Z

Good point, @emmanuelle and @nicolaskruchten. Sorry for not providing a commit overview.

All of the codegen changes are in 4ee88bc. For that commit, the hand-edited changes are in the following files (everything else is codegen output):

packages/python/plotly/_plotly_utils/importers.py (4ee88bc)
packages/python/plotly/codegen/__init__.py (4ee88bc#diff-a08c8c3dc3faeb46a8a7a7eabb8da789)
packages/python/plotly/codegen/datatypes.py (4ee88bc#diff-9621264aeec2343e3c66b9920e449d85)
packages/python/plotly/codegen/figure.py (4ee88bc#diff-6cda9ae3edaeb3c80537c48218709489)
packages/python/plotly/codegen/utils.py (4ee88bc#diff-686bfdad0d0b0d7117728f3ead360ccb)
packages/python/plotly/codegen/validators.py (4ee88bc#diff-e48bce2a16745a3b4ea9dc7f242d09f1)

The rest of the commits can be reviewed individually and do not include codegen changes.

Thanks!

nicolaskruchten · 2020-04-14T00:54:41Z

I don't have a full grasp of the changes made here but the descriptions make sense, the tests pass, the docs build and most CI jobs seem to go faster, so I'd call this a win :)

nicolaskruchten · 2020-04-14T00:54:50Z

💃 unless objections

nicolaskruchten · 2020-04-14T03:21:33Z

WOW... pytest plotly/tests/test_core/test_px/ goes from 43 seconds to 8 seconds on my machine! 5x speedup!

emmanuelle · 2020-04-14T09:29:13Z

packages/python/plotly/_plotly_utils/importers.py

+
+        # Check for submodule
+        if import_name in module_names:
+            # print(parent_name, import_name)


please remove commented out print statements

emmanuelle · 2020-04-14T10:01:16Z

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter, probably because of lazy loading. Is there any way to keep the performance improvement but to get the tab completion back ? Maybe populating the __all__ variable of each submodule would work (haven't checked).

emmanuelle · 2020-04-14T10:23:03Z

Also I generated the API doc on this branch and some links to go classes are broken, I need to understand why and how it can be fixed.

nicolaskruchten · 2020-04-14T12:26:40Z

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter

Can you provide some more details on the exact scenario you're trying, including versions of various things? I'm on Python 3.7.7 / JupyterLab 2.1 and I'm trying fig = px.scatter(x=[1,2,3]) then fig.lay<tab> and I can complete layout and then .hov<tab> and I can complete hoverlabel and then .b<tab> completes bgcolor so this appears to work 'all the way down'. This works in both /lab and /notebooks.

nicolaskruchten · 2020-04-14T12:27:47Z

This also works at the command-line with ipython, for me.

emmanuelle · 2020-04-14T12:45:53Z

@nicolaskruchten what you describe works for me too. What does not work is to do go.La + TAB, go.Layout.bar + TAB, go.Choro + TAB, etc. Python 3.7.3 here, ipython 7.8.0 and notebook server 5.7.4. (after doing import plotly.graph_objects as go, of course 😁 )

jonmmease · 2020-04-15T14:03:34Z

Ahh, I think I have a solution. Looks like ipython honors the module-level __dir__() function that was defined in PEP-562 (https://www.python.org/dev/peps/pep-0562/) along with __getattr__().

I'll add these functions to the codegen and update the PR. Hopefully this will also solve the documentation generation issue @emmanuelle mentioned.
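The effect of a module-level `__dir__` can be seen with `dir()`, which completers such as IPython's commonly consult. The sketch below builds two hypothetical modules with the same lazy `__getattr__`; only the one that also defines `__dir__` (PEP 562) advertises the lazy names. The placeholder string values stand in for the submodules or classes a real package would import.

```python
import types


def make_lazy_module(name, lazy_names, with_dir):
    """Build a module whose `lazy_names` are resolved on demand via __getattr__.

    Resolved values are placeholder strings here; a real package would
    import a submodule or class instead.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        if attr in lazy_names:
            return f"<lazy {attr}>"
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    def __dir__():
        # Advertise lazy names alongside whatever is already defined.
        return sorted(set(lazy_names) | set(vars(module)))

    module.__getattr__ = __getattr__
    if with_dir:
        # PEP 562: dir(module) calls a module-level __dir__ when present,
        # instead of just listing the module's __dict__.
        module.__dir__ = __dir__
    return module
```

With only `__getattr__`, `plain.Figure` still resolves but never shows up in `dir(plain)`, which matches the missing `go.F` + TAB completions reported above.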

nicolaskruchten · 2020-04-15T14:15:00Z

Cool! So in principle this shouldn't be different in 3.6?

emmanuelle · 2020-04-15T14:57:53Z

I also have jedi 0.15.1 installed.

emmanuelle · 2020-04-15T21:52:23Z

@jonmmease it'd be awesome if it's possible to have the best of both worlds :-).

@nicolaskruchten I tried with conda envs and specific versions of Python, and this branch: py3.6, tab completion works well with go. TAB, but with py3.8, it does not work. Is it a linux thing then? To be continued...

nicolaskruchten · 2020-04-15T22:52:48Z

Well if it’s broken only on 3.7+ then maybe Jon’s upcoming fix will resolve it! Thanks for checking! My fear was that it would also be broken in 3.6 but if that’s not the case then we’re in luck :)

jonmmease · 2020-04-16T14:04:07Z

OK, IPython tab completion seems to be working well for me now with Python 3.7. Please let me know what you see in your environments! Thanks

emmanuelle · 2020-04-16T14:48:20Z

So there is some progress for me in Python 3.7 (pip env), since I can now do go.F + TAB to get go.Figure, or go.B + TAB for go.Bar, but I cannot go deeper in the hierarchy; for example, go.bar.M + TAB does not return anything.

nicolaskruchten · 2020-04-16T17:39:21Z

@jonmmease are you able to replicate Emma's issues at all locally? Just in a shell with ipython ?

jonmmease · 2020-04-16T17:40:31Z

Huh, this is working for me on Python 3.7 with ipython 7.13 😕

(Despite the environment name, this is Python 3.7 🙈)

@emmanuelle any improvement in the behavior of documentation generation?

@nicolaskruchten how do things look for you?

nicolaskruchten · 2020-04-16T17:41:36Z

@jonmmease is this Linux? I'll check locally in a bit

jonmmease · 2020-04-16T17:44:35Z

Yeah, I'm on Linux. It's also working in plain vanilla python repl

nicolaskruchten · 2020-04-16T18:13:23Z

no change for me, I can tab-complete to go.Layout.hoverlabel but not through to .bgcolor. Which, BTW, is fine by me.

nicolaskruchten · 2020-04-16T18:16:04Z

If I instantiate fig = go.Figure() then I seem to be able to drill down arbitrarily deeply: fig.layout.hoverlabel.font.color. I can also go arbitrarily deeply with go.bar.marker.coloraxis so long as I stay lower-cased and don't switch to e.g. go.bar.Marker.whatever

jonmmease · 2020-04-16T18:31:39Z

go.Layout.hoverlabel

Did you mean to use a capital L there? In that case hoverlabel would be the method object and that method object wouldn't be expected to have completion.

I was looking at submodule imports like go.layout.hoverlabel

nicolaskruchten · 2020-04-16T18:32:24Z

Yeah I tried both and I'm seeing what I would expect.

jonmmease · 2020-04-16T18:33:25Z

Does that percy failure look familiar? A missing mapbox colorbar only on chrome? Should I resubmit the job?

nicolaskruchten · 2020-04-16T19:12:17Z

let's rerun it yeah.

jonmmease · 2020-04-16T20:28:37Z

Ok, merging. Let's open new issues on the tab completion situation as they arise. Thanks all!

jonmmease added 8 commits April 10, 2020 10:39

Lazy imports of graph object hierarchy for Python 3.7+

4ee88bc

This involved splitting validators/graph object classes back into separate files

lazy submodule imports for plotly and plotly.io modules

cb70a40

Don't force import pandas and numpy if we're only checking whether an object belongs to these packages

ceb9523

If pandas isn't loaded, we don't need to check whether a value is a DataFrame, and this way we don't pay the pandas import time.

Don't auto-import all trace type classes

5974cf5

Don't construct validators up front for every object when object is created

ffc277d

Create them lazily, and cache them for use across graph objects of the same type

Test fixes

181e427

Add optional validation using go.validate object as callable or context manager

aec7e9a

Delay additional imports

1084ba0

jonmmease force-pushed the import_init_optimization branch from ecdfca2 to 4a8ea51 April 10, 2020 16:18

dynamic to static docstrings for Figure io methods

7e50c44

Saves 200ms of startup time!

jonmmease force-pushed the import_init_optimization branch from 4a8ea51 to 7e50c44 April 10, 2020 16:24

jonmmease changed the title from "Import init optimization" to "Import and initialization optimizations" Apr 10, 2020

This was referenced Apr 10, 2020

[Feature Request] Allow direct import of utils.PlotlyJSONEncoder for faster Dash startup time #2174

Closed

Very slow performance of create_annotated_heatmap for small dataset #2299

Closed

jonmmease mentioned this pull request Apr 10, 2020

Importing plotly takes a lot of time #740

Closed

emmanuelle reviewed Apr 14, 2020

packages/python/plotly/_plotly_utils/importers.py (outdated, resolved)

emmanuelle reviewed Apr 14, 2020

Use module-level __dir__ function to restore IPython tab completion with Python 3.7+

f7e672b

jonmmease added 2 commits April 16, 2020 10:15

Don't fail hierarchy test when ipywidgets not installed

20fc393

cut commented print statements

f979814

Test fix, FigureWidget is not expected to be importable when ipywidgets is not installed

7600eaf

jonmmease merged commit 7fcb95c into master Apr 16, 2020

jonmmease mentioned this pull request Apr 16, 2020

Remove plotly express from delayed submodule import. #2390

Merged

nicolaskruchten mentioned this pull request Apr 20, 2020

Add CHANGELOG step to PR template #2396

Merged

emmanuelle mentioned this pull request Apr 21, 2020

Plotly express Scatter: slow to setup with animations #2400

Closed

jonmmease mentioned this pull request Apr 21, 2020

Replace global validate state with Figure level validate flag #2403

Merged

nicolaskruchten added this to the 4.7.0 milestone Apr 27, 2020

nicolaskruchten deleted the import_init_optimization branch June 19, 2020 16:17

KyleKing mentioned this pull request Mar 24, 2021

PyInstaller + Plotly + Missing Optional Imports #3122

Closed
