REF/API: DatetimeTZDtype #23990
Conversation
* Remove magic constructor from string
* Remove caching

The remaining changes in the DatetimeArray PR will be to:
1. Inherit from ExtensionDtype
2. Implement construct_array_type
3. Register
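As a rough illustration of those three remaining steps, here is a minimal extension-dtype skeleton. The MyTZDtype and MyTZArray names are made up for this sketch; it is not the actual DatetimeArray code.

```python
# Illustrative sketch only: the three steps listed above for a generic
# extension dtype (hypothetical names, not the DatetimeArray PR itself).
from pandas.api.extensions import (
    ExtensionArray,
    ExtensionDtype,
    register_extension_dtype,
)


class MyTZArray(ExtensionArray):
    """Placeholder array class; a real one implements the ExtensionArray interface."""


@register_extension_dtype            # 3. Register, so pandas can resolve the dtype by name
class MyTZDtype(ExtensionDtype):     # 1. Inherit from ExtensionDtype
    name = "my_tz"
    type = object

    @classmethod
    def construct_array_type(cls):   # 2. Tell pandas which array class backs this dtype
        return MyTZArray
```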
Hello @TomAugspurger! Thanks for submitting the PR.
|
pls perf test things. Caching is essential, as these are created quite a lot and they are the same virtually every time. |
On microbenchmarks, things are fine.

master:

In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 39 µs, sys: 31 µs, total: 70 µs
Wall time: 75.6 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 15 µs, sys: 0 ns, total: 15 µs
Wall time: 18.8 µs

PR:

In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 29 µs, sys: 23 µs, total: 52 µs
Wall time: 56 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 16 µs, sys: 0 ns, total: 16 µs
Wall time: 19.8 µs

ASV for the timeseries, timestamps, and offsets
Line-profiling reveals that basically all the time is spent on the timezone check.
I switched the properties to |
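For anyone who wants to reproduce the comparison above, repeated construction can be timed with something like the following sketch (illustrative only; numbers will vary by machine and pandas version):

```python
# Illustrative timing of repeated DatetimeTZDtype construction, the pattern
# the caching discussion in this thread is about.
import timeit

import pandas as pd

stmt = "pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz='UTC')"
n = 100_000
per_call = timeit.timeit(stmt, globals={"pd": pd}, number=n) / n
print(f"{per_call * 1e6:.1f} µs per construction")
```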
pandas/core/dtypes/dtypes.py (Outdated)

    _metadata = ('unit', 'tz')
    _match = re.compile(r"(datetime64|M8)\[(?P<unit>.+), (?P<tz>.+)\]")
    _cache = {}
    # TODO: restore caching? who cares though? It seems needlessly complex.
    # np.dtype('datetime64[ns]') isn't a singleton
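For context, the _match pattern shown in that snippet is what splits a full dtype string into its unit and tz parts. A small, illustrative example of how it behaves:

```python
# How the _match regex shown above splits a dtype string into unit and tz.
import re

_match = re.compile(r"(datetime64|M8)\[(?P<unit>.+), (?P<tz>.+)\]")

m = _match.match("datetime64[ns, US/Central]")
print(m.group("unit"), m.group("tz"))   # -> ns US/Central
```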
this is a huge performance penalty w/o caching
Did you see the perf numbers I posted in #23990 (comment)? It seems to be slightly faster without caching (though within noise).
try running a good part of the test suite. It's the repeated construction that's a problem, not the single construction, which is fine. W/o caching you end up creating a huge number of these
guess you could remove the comment now
Are you worried about memory usage? The time to create a new one from scratch is identical or faster than looking it up from the cache.
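For readers following the thread, the kind of caching being debated is roughly the following pattern. This is a simplified, hypothetical sketch (CachedTZDtype is a made-up name), not pandas's actual implementation:

```python
# Simplified sketch of a class-level dtype cache of the kind being removed:
# construction goes through __new__ and reuses an existing instance for the
# same (unit, tz) key instead of building a new object each time.
class CachedTZDtype:
    """Hypothetical stand-in for a dtype class; not the pandas code."""
    _cache = {}

    def __new__(cls, unit="ns", tz=None):
        key = (unit, str(tz))
        try:
            # Repeated construction with the same arguments is a dict lookup.
            return cls._cache[key]
        except KeyError:
            obj = super().__new__(cls)
            obj.unit = unit
            obj.tz = tz
            cls._cache[key] = obj
            return obj


a = CachedTZDtype(unit="ns", tz="UTC")
b = CachedTZDtype(unit="ns", tz="UTC")
assert a is b  # the second call returns the cached singleton
```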
|
maybe things changed since I created this originally, but w/o caching perf was not great. |
Codecov Report

@@            Coverage Diff             @@
##           master    #23990      +/-   ##
===========================================
+ Coverage    42.38%    92.25%   +49.87%
===========================================
  Files          161       161
  Lines        51701     51689       -12
===========================================
+ Hits         21914     47687    +25773
+ Misses       29787      4002    -25785

Continue to review full report at Codecov.
|
I've removed |
don't you need to remove the |
7ab2a74 removed all the |
Do we typically add release notes for something like this (deprecations that aren't part of the public API, but are possibly in use downstream)? |
Added to the deprecation log. |
this IS certainly part of the public API as |
I know |
Ah, I didn't realize it was exported in |
Added a release note. Going to follow up on api.rst later, since we don't have an |
thanks @TomAugspurger lgtm. let's merge on green. can you add a ref into #6581 |
@datapythonista any idea what's up with the azure CI? https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4597. Seems to have hit https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4596 too. |
Haven't seen those errors before; I guess something is failing on their side. @davidstaheli do you mind taking a look at https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4597 |
@TomAugspurger needs a rebase as well |
@TomAugspurger I was trying to create an incident for Azure, but the URL does some weird redirect with the language and it crashes. Not sure if it's Linux specific; can you check whether the "Create incident" button works for you: https://azure.microsoft.com/en-us/support/create-ticket/ (the Basic technical support one) Or @vtbassmatt, maybe you can help with the above? See the error in https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4597, we are getting that often |
I was checking, and numba is getting the same error from azure. I guess something is down for the whole azure-pipelines. https://dev.azure.com/numba/numba/_build/results?buildId=471 |
I suspect it's a little bit of abuse fighting that was turned up a little too high. I think we've addressed it now. |
thanks @TomAugspurger |
A cleanup of DatetimeTZDtype

* Remove the magic constructor from string. Previously, unit accepted either a unit or a full datetime64[ns, tz] string. This required changing DatetimeTZDtype.construct_from_string, and changing a number of places to use construct_from_string rather than the main __new__.
* Change __new__ to __init__ and remove caching. It seemed to be causing more problems than it was worth. You could too easily create nonsense DatetimeTZDtypes like […].
* Change .tz and .unit to properties instead of attributes. I've not provided setters. We could in theory, since we're getting rid of caching, but I'd rather wait till there's a demand.

The remaining changes in the DatetimeArray PR will be to:
1. Inherit from ExtensionDtype
2. Implement construct_array_type
3. Register
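To make the described API concrete, usage after this change looks roughly like the sketch below. This is based on the description above; exact behaviour and error messages may differ between pandas versions.

```python
# Sketch of the API shape described above (not authoritative).
from pandas import DatetimeTZDtype

# Explicit keyword construction goes through __init__ (no caching, no magic strings).
dtype = DatetimeTZDtype(unit="ns", tz="UTC")

# Parsing a full dtype string is the job of construct_from_string.
same = DatetimeTZDtype.construct_from_string("datetime64[ns, UTC]")
print(dtype == same)          # True: equal by (unit, tz), not by identity

# .unit and .tz are read-only properties; no setters are provided.
print(dtype.unit, dtype.tz)   # e.g. "ns UTC"
try:
    dtype.tz = "US/Central"
except AttributeError:
    print("tz is read-only")
```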