Delay import #17710

TomAugspurger · 2017-09-28T20:16:25Z

Improves performance by delaying the import of matplotlib, s3fs, pytest, and openpyxl.

Also removes our old MPL style sheet. The option was deprecated and removed, but the stylesheet hung around.

Master:

In [1]: %time import numpy
CPU times: user 36.3 ms, sys: 13.2 ms, total: 49.5 ms
Wall time: 68.4 ms

In [2]: %time import pandas
CPU times: user 306 ms, sys: 52.2 ms, total: 358 ms
Wall time: 392 ms

head:

In [1]: %time import numpy
CPU times: user 37.7 ms, sys: 12.4 ms, total: 50 ms
Wall time: 69 ms

In [2]: %time import pandas
CPU times: user 166 ms, sys: 40.9 ms, total: 207 ms
Wall time: 245 ms

I can shave off more by hardcoding pandas.__version__. Maybe that's worthwhile for releases?

In [2]: %time import pandas
CPU times: user 133 ms, sys: 30 ms, total: 163 ms
Wall time: 173 ms
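
For anyone reproducing these numbers: `%time import pandas` is only a cold measurement the first time it runs in a session, since later imports are served from `sys.modules`. Below is a minimal sketch of timing a statement in a fresh interpreter so every run is cold. The helper name `time_import` is hypothetical and not part of this PR, and the result includes interpreter startup, so it should be compared against a `pass` baseline.

```python
import subprocess
import sys
import time


def time_import(statement="pass", repeats=3):
    """Best wall time (seconds) to run `statement` in a fresh interpreter.

    The figure includes interpreter startup, so subtract
    time_import("pass") to estimate the cost of the import alone.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process guarantees nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", statement], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    baseline = time_import("pass")
    cost = time_import("import json") - baseline
    print(f"import json: ~{cost * 1000:.1f} ms over baseline")
```

On Python 3.7 and later, `python -X importtime -c "import pandas"` gives a per-module breakdown, which helps find which dependency dominates.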

TomAugspurger · 2017-09-28T20:18:07Z

I may have messed up some of the excel config stuff... I think all the defaults are still the same, and the tests pass, but that's a bit messy. AFAICT the defaults are:

xls: xlwt (even if not installed)
xlsm: openpyxl (even if not installed)
xlsx: xlsxwriter if installed, else openpyxl
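
The defaults listed above could be resolved without importing any Excel library, roughly as in the sketch below. This is an illustration of the rule as stated in this comment, not the code in the PR; `default_excel_writer` and `_is_installed` are hypothetical names.

```python
import importlib.util


def _is_installed(module_name):
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def default_excel_writer(ext, is_installed=_is_installed):
    """Pick a default writer engine name for a file extension.

    Hypothetical helper that mirrors the defaults listed in the comment.
    """
    ext = ext.lower().lstrip(".")
    if ext == "xls":
        return "xlwt"  # chosen even if xlwt is not installed
    if ext == "xlsm":
        return "openpyxl"  # chosen even if openpyxl is not installed
    if ext == "xlsx":
        # Only this extension depends on what is available.
        return "xlsxwriter" if is_installed("xlsxwriter") else "openpyxl"
    raise ValueError(f"no default writer for extension {ext!r}")
```

Passing `is_installed` as a parameter keeps the rule testable on machines where none of the engines are present.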

TomAugspurger · 2017-09-28T20:19:57Z

The next big one would be avoiding numexpr (~21 ms), since it imports pkg_resources, which takes a while. That got a little messy though.

TomAugspurger · 2017-09-28T20:24:16Z

I'm thinking about tests for this. We should be able to whip something up with ModuleFinder
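
A rough sketch of what a `ModuleFinder`-based check might look like, using only the standard library. The function name `statically_imported` is hypothetical. One caveat worth knowing up front: `ModuleFinder` analyses bytecode instead of running the script, and it recurses into function bodies, so an import that has been moved inside a function is still reported.

```python
from modulefinder import ModuleFinder


def statically_imported(script_path, search_path):
    """Names of modules that `script_path` refers to in any import statement.

    ModuleFinder inspects bytecode rather than executing the script, so
    imports nested inside function bodies are reported as well.
    """
    finder = ModuleFinder(path=list(search_path))
    finder.run_script(script_path)
    # finder.modules maps module name -> Module for everything it located.
    return set(finder.modules)
```

Because of that caveat, a static scan cannot by itself confirm that an import was deferred; that needs a runtime look at `sys.modules` after the import has actually run.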

max-sixty · 2017-09-28T20:31:49Z

I can shave off more by hardcoding pandas.__version__. Maybe that's worthwhile for releases?

I don't know versioneer in detail - at first glance it looks more feature-rich. But we've had success with setuptools_scm, and it can write a file on install (https://github.com/pypa/setuptools_scm#configuration-parameters)

jreback · 2017-09-28T20:40:35Z

I can shave off more by hardcoding pandas.__version__. Maybe that's worthwhile for releases?

NO, this is exactly the purpose of versioneer:
to avoid the hardcoding mess we had over the years

jreback · 2017-09-28T20:41:25Z

The next big one would be avoiding numexpr (~21 ms), since it imports pkg_resources, which takes a while. That got a little messy though.

this could certainly be deferred till first use

TomAugspurger · 2017-09-28T20:46:51Z

But we've had success with setuptools_scm, and it can write a file on install

I have as well. It seems to be the "recommended" way to do automatic versioning.

NO, this is exactly the purpose of versioneer

To be clear, this would just be done for releases, as part of making the release commit. Immediately after tagging, the release manager would add a commit going back to versioneer.

this could certainly be deferred till first use

I'm trying it out now. It's not too bad. Will push a commit in a bit.

codecov · 2017-09-28T20:59:39Z

Codecov Report

Merging #17710 into master will decrease coverage by 0.04%.
The diff coverage is 72.28%.

@@            Coverage Diff             @@
##           master   #17710      +/-   ##
==========================================
- Coverage   91.25%    91.2%   -0.05%     
==========================================
  Files         163      163              
  Lines       49823    49831       +8     
==========================================
- Hits        45464    45450      -14     
- Misses       4359     4381      +22

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `89% <72.28%> (-0.03%)` | ⬇️ |
| #single | `40.26% <39.75%> (-0.13%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/util/testing.py | `100% <ø> (ø)` | ⬆️ |
| pandas/plotting/_style.py | `74.28% <ø> (-0.25%)` | ⬇️ |
| pandas/core/computation/expressions.py | `0% <ø> (ø)` | ⬆️ |
| pandas/core/frame.py | `97.73% <100%> (-0.1%)` | ⬇️ |
| pandas/core/config_init.py | `98.34% <100%> (+2.22%)` | ⬆️ |
| pandas/core/panel.py | `96.91% <100%> (ø)` | ⬆️ |
| pandas/core/internals.py | `94.37% <100%> (ø)` | ⬆️ |
| pandas/core/computation/eval.py | `97.02% <100%> (+0.06%)` | ⬆️ |
| pandas/io/common.py | `68.64% <60%> (ø)` | ⬆️ |
| pandas/core/ops.py | `91.76% <75%> (-0.13%)` | ⬇️ |
| ... and 9 more | | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2781b18...9d0f74a. Read the comment docs.

chris-b1 · 2017-09-28T21:07:35Z

Thanks @TomAugspurger! I'll look at the Excel stuff later, I had a half-finished branch where I think I was fighting the same config stuff you are.

I'm not sure you need to do anything with the version stuff? It is very slow on dev versions, but doesn't it get effectively hardcoded in releases already?

~\AppData\Local\Continuum\Anaconda3\envs\py36\lib\site-packages\pandas\_version.py

# This file was generated by 'versioneer.py' (0.15) from
# revision-control system data, or from the parent directory name of an
# unpacked source archive. Distribution tarballs contain a pre-generated copy
# of this file.

from warnings import catch_warnings
with catch_warnings(record=True):
    import json
import sys

version_json = '''
{
 "dirty": false,
 "error": null,
 "full-revisionid": "3a7f956c30528736beaae5784f509a76d892e229",
 "version": "0.20.3"
}
'''  # END VERSION_JSON


def get_versions():
    return json.loads(version_json)

TomAugspurger · 2017-09-28T21:17:53Z

I'm not sure you need to do anything with the version stuff? It is very slow on dev versions, but doesn't it get effectively hardcoded in releases already?

Oh cool. I wasn't aware of that. With numexpr delayed:

In [1]: %time import numpy
CPU times: user 35.4 ms, sys: 11.1 ms, total: 46.5 ms
Wall time: 66.7 ms

In [2]: %time import pandas
CPU times: user 125 ms, sys: 24.2 ms, total: 150 ms
Wall time: 175 ms

Down from ~240ms

pep8speaks · 2017-09-28T21:28:10Z

Hello @TomAugspurger! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 02, 2017 at 11:38 Hours UTC

TomAugspurger · 2017-09-28T22:00:17Z

I added a simple script in https://github.com/pandas-dev/pandas/pull/17710/files#diff-b3b48041ab2d614f95206404de64cdfe to check for certain modules whose import is triggered by pandas. This will run on Travis.
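
For readers who do not want to open the diff, the general idea of such a check can be sketched as follows. This is not the script from the PR; `modules_loaded_by` is a hypothetical helper. It runs a statement in a fresh interpreter and reports which of the named top-level modules ended up in `sys.modules`.

```python
import subprocess
import sys


def modules_loaded_by(statement, candidates):
    """Subset of `candidates` found in sys.modules after running `statement`.

    The statement runs in a fresh interpreter so earlier imports in the
    calling process cannot hide or cause a hit. The statement itself must
    not print to stdout, since stdout carries the result.
    """
    probe = (
        "import sys\n"
        f"{statement}\n"
        f"wanted = set({sorted(candidates)!r})\n"
        "loaded = {name.split('.')[0] for name in sys.modules}\n"
        "print('\\n'.join(sorted(wanted & loaded)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.split())
```

In CI this would amount to calling something like `modules_loaded_by("import pandas", {"matplotlib", "s3fs", "pytest", "openpyxl"})` and failing the build if the result is not empty.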

jorisvandenbossche · 2017-09-28T22:04:37Z

pandas/plotting/__init__.py

@@ -4,12 +4,6 @@

 # flake8: noqa

-try:  # mpl optional
-    from pandas.plotting import _converter
-    _converter.register()  # needs to override so set_xlim works with str/number


One problem with this is I think this will affect users who just use matplotlib for plotting but have pandas imported (now they will get nicer datetime formatting)

Not sure this is a reason not to do this, though, as exactly this side-effect also has come up before as a problem. But it would change the moment when the side-effect is activated (on pandas import or on first pandas plot), and not really solve the side-effect

Yep. I was about to make a PR to update their documentation. I think this is unavoidable.

I guess we could make an option, and if users really wanted they could set something in their ipython config. But I'm tempted to wait.

In #16764 (comment) you had a patch with a global registered flag. Not sure how important it is to not again register on each plotting call.

In eg a case where a user manually removed entries from the registry to not have pandas things there, this could help.

I still have that flag https://github.com/pandas-dev/pandas/pull/17710/files#diff-2b118bda866f3d626ceb6b529c62cd1aR46

Is the desired behavior that we should only attempt to register once? So if a user removes the pandas converters, they don't get re-added on each call? That's what happens currently.

Ah, I must have overlooked that in the diff.

Do we need to expose a user-oriented function to register the converters? (eg pandas.plotting.register_converters())

Ah, I see that in the matplotlib docs they already use the one through pandas.tseries.converter (https://github.com/matplotlib/matplotlib/pull/9251/files) (although the location has changed)
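
The "register only once" behavior discussed in this thread can be illustrated with a small sketch. It assumes a plain dict standing in for the converter registry; `register_once` and `_registered` are hypothetical names, and nothing here calls matplotlib.

```python
_registered = False  # module-level flag: has registration been attempted?


def register_once(registry, converters):
    """Add `converters` to `registry` on the first call only.

    Returns True if this call performed the registration.
    """
    global _registered
    if _registered:
        return False
    registry.update(converters)
    _registered = True
    return True
```

With this guard, a user who deletes the entries from the registry after the first plot does not get them re-added on the next plotting call, which is the case raised above.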

jreback · 2017-09-30T00:21:51Z

lgtm

maybe run some asv (or a subset) to make sure nothing changes substantially

jorisvandenbossche

This is really nice!

chris-b1

lgtm

jreback · 2017-09-30T13:54:42Z

.travis.yml

@@ -121,6 +121,8 @@ script:
  - ci/script_single.sh
  - ci/script_multi.sh
  - ci/lint.sh
+  - echo "checking imports"


maybe make a checking_imports.sh just to make style similar to existing

jreback · 2017-09-30T13:58:54Z

pandas/core/computation/check.py

+    _NUMEXPR_INSTALLED = ver >= LooseVersion(_MIN_NUMEXPR_VERSION)
+
+    if not _NUMEXPR_INSTALLED:
+        warnings.warn(


we do the same type of thing with bottleneck

jreback · 2017-09-30T14:00:41Z

pandas/core/ops.py

    def na_op(x, y):
+        import pandas.core.computation.expressions as expressions


if these imports become an issue i can do a global check (like u do with matplotlib)

jreback · 2017-10-02T11:29:14Z

@TomAugspurger this needs a rebase; I had merged something else.

jreback · 2017-10-02T13:32:14Z

@TomAugspurger lgtm. thanks!

pandas-dev/pandas#17710

* COMPAT: pandas TimeGrouper xref pandas-dev/pandas#16747
* COMPAT: For pandas 0.21 CategoricalDtype
* COMPAT: For pandas 0.21.0 HTML repr changes pandas-dev/pandas#16879
* COMPAT: For pandas 0.21.0 numexpr import pandas-dev/pandas#17710
* COMPAT: Default for inplace is now False in eval pandas-dev/pandas#11149

closes pandas-dev#16764

gerritholl · 2017-10-25T22:28:40Z

This PR includes commit 2310faa which unfortunately triggers a problem in xarray plotting, pydata/xarray#1661 . I don't know nearly enough about pandas to know what the right approach is to mitigate or resolve this.

TomAugspurger added the Performance label Sep 28, 2017

TomAugspurger added this to the 0.21.0 milestone Sep 28, 2017

TomAugspurger force-pushed the delay-import branch from 0e4fb69 to 08c5b8a Compare September 28, 2017 21:29

jorisvandenbossche reviewed Sep 28, 2017

TomAugspurger mentioned this pull request Sep 28, 2017

DOC: Update instructions on pandas converters matplotlib/matplotlib#9251

Merged

1 task

jorisvandenbossche approved these changes Sep 30, 2017

chris-b1 approved these changes Sep 30, 2017

jreback reviewed Sep 30, 2017

TomAugspurger added 6 commits October 2, 2017 06:32

Delay matplotlib import

86050cd

delay pytest

be3d13e

Delay excel, probably broken

cd2226c

Matplotlib cleanup

95d96fc

excel configuration

bede1f9

PERF: delay numexpr

a1a15bb

TomAugspurger added 8 commits October 2, 2017 06:36

fixup! Matplotlib cleanup

028bb8a

PERF: delay import of py.path, Pathlib

85f8baa

Add a script to test imports

748ea2d

CI: Check for accidental imports

52295b8

whatsnew

067b561

Added release note

b9f6a14

Fix script call

9bbd9c6

pep8

9d0f74a

TomAugspurger force-pushed the delay-import branch from 99291df to 9d0f74a Compare October 2, 2017 11:37

jreback merged commit 2310faa into pandas-dev:master Oct 2, 2017

TomAugspurger mentioned this pull request Oct 3, 2017

COMPAT: pandas 0.21.0 dask/dask#2736

Closed

TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Oct 3, 2017

COMPAT: For pandas 0.21.0 numexpr import

e3557fd

pandas-dev/pandas#17710

TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Oct 5, 2017

COMPAT: For pandas 0.21.0 numexpr import

f3b4394

pandas-dev/pandas#17710

TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Oct 5, 2017

COMPAT: For pandas 0.21.0 numexpr import

d8aa1fc

pandas-dev/pandas#17710

mrocklin mentioned this pull request Oct 16, 2017

startup time with Client is long dask/distributed#1399

Closed

TomAugspurger deleted the delay-import branch October 16, 2017 14:09

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017

Delay import (pandas-dev#17710)

d434ad2

closes pandas-dev#16764

TomAugspurger mentioned this pull request Oct 23, 2017

PERF: pandas import is too slow #7282

Closed

TomAugspurger mentioned this pull request Nov 7, 2017

DatetimeIndex not formatted for plots #18153

Closed

jklymak mentioned this pull request Nov 9, 2017

provide converters for datetime64 types matplotlib/matplotlib#9610

Closed

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

Delay import (pandas-dev#17710)

d6d62ab

closes pandas-dev#16764

This was referenced Nov 14, 2017

[MRG+1] Remove nose from CIs and documentation scikit-learn/scikit-learn#9840

Merged

don't register pandas mpl unit converters upon import #2579

Closed

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Delay import (pandas-dev#17710)

9463730

closes pandas-dev#16764
