Delay import #17710
I may have messed up some of the Excel config stuff... I think all the defaults are still the same, and the tests pass, but it's a bit messy. AFAICT the defaults are
|
The next big one would be avoiding numexpr (~21 ms), since it imports pkg_resources, which takes a while. That got a little messy though. |
I'm thinking about tests for this. We should be able to whip something up with ModuleFinder |
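A rough sketch of what a ModuleFinder-based check might look like, using only the standard-library `modulefinder` module. The helper name `statically_reachable` and the probe sources are hypothetical, not part of this PR. One caveat, as far as I can tell: ModuleFinder scans function bodies as well as module-level code, so it also reports imports that are deferred inside a function, which may limit its usefulness for verifying delayed imports.

```python
import os
import tempfile
from modulefinder import ModuleFinder


def statically_reachable(source: str) -> set:
    """Return the names of modules ModuleFinder believes `source` can import."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.py")
        with open(path, "w") as f:
            f.write(source)
        finder = ModuleFinder()
        finder.run_script(path)  # static analysis; the script is not executed
        return set(finder.modules)


# Hypothetical probes: one imports at module level, one inside a function.
top_level = statically_reachable("import colorsys\n")
deferred = statically_reachable("def f():\n    import colorsys\n")
```

Because both probes report `colorsys`, a runtime check of `sys.modules` in a fresh interpreter is probably a better fit for testing that an import really is delayed.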
I don't know |
No, this is exactly the purpose of versioneer |
this could certainly be deferred till first use |
I have as well. It seems to be the "recommended" way to do automatic versioning.
To be clear, this would just be done for releases, as part of making the release commit. Immediately after tagging, the release manager would add a commit going back to versioneer.
I'm trying it out now. It's not too bad. Will push a commit in a bit. |
Codecov Report
@@ Coverage Diff @@
## master #17710 +/- ##
==========================================
- Coverage 91.25% 91.2% -0.05%
==========================================
Files 163 163
Lines 49823 49831 +8
==========================================
- Hits 45464 45450 -14
- Misses 4359 4381 +22
Continue to review full report at Codecov.
|
Thanks @TomAugspurger! I'll look at the Excel stuff later, I had a half-finished branch where I think I was fighting the same config stuff you are. I'm not sure you need to do anything with the version stuff? It is very slow on dev versions, but doesn't it get effectively hardcoded in releases already?
# This file was generated by 'versioneer.py' (0.15) from
# revision-control system data, or from the parent directory name of an
# unpacked source archive. Distribution tarballs contain a pre-generated copy
# of this file.
from warnings import catch_warnings
with catch_warnings(record=True):
    import json
import sys
version_json = '''
{
"dirty": false,
"error": null,
"full-revisionid": "3a7f956c30528736beaae5784f509a76d892e229",
"version": "0.20.3"
}
''' # END VERSION_JSON
def get_versions():
    return json.loads(version_json)
Oh cool. I wasn't aware of that. With numexpr delayed:
Down from ~240ms |
Hello @TomAugspurger! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on October 02, 2017 at 11:38 Hours UTC |
Force-pushed from 0e4fb69 to 08c5b8a
I added a simple script in https://github.com/pandas-dev/pandas/pull/17710/files#diff-b3b48041ab2d614f95206404de64cdfe to check for certain modules whose import is triggered by pandas. This will run on Travis. |
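For readers who don't want to follow the link, here is a minimal sketch of the idea, not the script from the PR: run the import in a fresh interpreter and report which modules from a blocklist ended up in `sys.modules`. The function name `triggered_imports` and its signature are assumptions made for illustration.

```python
import subprocess
import sys


def triggered_imports(statement: str, blocklist: set) -> set:
    """Run `statement` in a fresh interpreter and return the blocklisted
    modules that were loaded as a side effect."""
    code = (
        f"{statement}\n"
        "import sys\n"
        f"print(','.join(sorted(set(sys.modules) & {blocklist!r})))\n"
    )
    out = subprocess.check_output([sys.executable, "-c", code], text=True)
    return {name for name in out.strip().split(",") if name}


# Hypothetical usage, assuming pandas is installed:
#   bad = triggered_imports("import pandas",
#                           {"matplotlib", "s3fs", "pytest", "openpyxl"})
#   if bad:
#       raise SystemExit(f"pandas import pulled in: {sorted(bad)}")
```

Using a subprocess matters here: checking `sys.modules` in the current process would be polluted by whatever the test runner itself has already imported.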
@@ -4,12 +4,6 @@

# flake8: noqa

try:  # mpl optional
    from pandas.plotting import _converter
    _converter.register()  # needs to override so set_xlim works with str/number
One problem with this is I think this will affect users who just use matplotlib for plotting but have pandas imported (now they will get nicer datetime formatting)
Not sure this is a reason not to do this, though, as exactly this side-effect also has come up before as a problem. But it would change the moment when the side-effect is activated (on pandas import or on first pandas plot), and not really solve the side-effect
Yep. I was about to make a PR to update their documentation. I think this is unavoidable.
I guess we could make an option, and if users really wanted they could set something in their ipython config. But I'm tempted to wait.
In #16764 (comment) you had a patch with a global registered flag. Not sure how important it is to not again register on each plotting call.
In eg a case where a user manually removed entries from the registry to not have pandas things there, this could help.
I still have that flag https://github.com/pandas-dev/pandas/pull/17710/files#diff-2b118bda866f3d626ceb6b529c62cd1aR46
Is the desired behavior that we should only attempt to register once? So if a user removes the pandas converters, they don't get re-added on each call? That's what happens currently.
Ah, I must have overlooked that in the diff.
Do we need to expose a user-oriented function to register the converters (e.g. pandas.plotting.register_converters())?
Ah, I see that in the matplotlib docs they already use the one through pandas.tseries.converter (https://github.com/matplotlib/matplotlib/pull/9251/files) (although the location has changed)
lgtm. Maybe run some asv (or a subset) to make sure nothing changes substantially. |
This is really nice!
lgtm
@@ -121,6 +121,8 @@ script:
  - ci/script_single.sh
  - ci/script_multi.sh
  - ci/lint.sh
  - echo "checking imports"
maybe make a checking_imports.sh just to make style similar to existing
_NUMEXPR_INSTALLED = ver >= LooseVersion(_MIN_NUMEXPR_VERSION)

if not _NUMEXPR_INSTALLED:
    warnings.warn(
we do the same type of thing with bottleneck
def na_op(x, y):
    import pandas.core.computation.expressions as expressions
If these imports become an issue, I can do a global check (like you do with matplotlib)
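The function-level import pattern in the diff above can be illustrated with a standard-library module. This is a generic sketch, with `to_fraction` as a hypothetical example function: the module is loaded the first time the function runs, and later calls only pay for a lookup in `sys.modules`.

```python
import sys


def to_fraction(text: str):
    # Deferred import: `fractions` is loaded on the first call,
    # not when the enclosing module is imported.
    import fractions

    return fractions.Fraction(text)


result = to_fraction("3/4")
loaded = "fractions" in sys.modules  # the module is cached after the first call
```

The repeated `import` statement is cheap after the first call, which is likely why a global "already imported" check is only worth adding if profiling shows it matters.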
@TomAugspurger this needs a rebase; I had merged something else. |
Force-pushed from 99291df to 9d0f74a
@TomAugspurger lgtm. thanks! |
* COMPAT: pandas TimeGrouper xref pandas-dev/pandas#16747
* COMPAT: For pandas 0.21 CategoricalDtype
* COMPAT: For pandas 0.21.0 HTML repr changes pandas-dev/pandas#16879
* COMPAT: For pandas 0.21.0 numexpr import pandas-dev/pandas#17710
* COMPAT: Default for inplace is now False in eval pandas-dev/pandas#11149
This PR includes commit 2310faa which unfortunately triggers a problem in xarray plotting, pydata/xarray#1661 . I don't know nearly enough about pandas to know what the right approach is to mitigate or resolve this. |
Closes #16764
Improves performance by delaying the import of matplotlib, s3fs, pytest, and openpyxl.
Also removes our old MPL style sheet. The option was deprecated and removed, but the stylesheet hung around.
Master:
head:
I can shave off more by hardcoding pandas.__version__. Maybe that's worthwhile for releases?
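One way to reproduce this kind of import-time comparison is sketched below; it is not necessarily how the numbers in this PR were measured. The helper `import_seconds` is hypothetical and times a fresh interpreter, so the result includes interpreter startup.

```python
import subprocess
import sys
import time


def import_seconds(module: str, repeats: int = 3) -> float:
    """Best-of-N wall time to start Python and import `module`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.check_call([sys.executable, "-c", f"import {module}"])
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage, assuming pandas is installed; subtracting a trivial
# import gives a rough estimate of the cost attributable to pandas itself:
#   cost = import_seconds("pandas") - import_seconds("sys")
```

On Python 3.7 and later, `python -X importtime -c "import pandas"` gives a per-module breakdown, which is handy for finding the next import worth delaying.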