ENH: implement DatetimeLikeArray #19902

jbrockmendel · 2018-02-26T05:26:43Z

The medium-term goal: refactor out of DatetimeIndexOpsMixin/DatetimeIndex/TimedeltaIndex/PeriodIndex the bare minimum subset of functionality to implement arithmetic+comparisons for DatetimeArray/TimedeltaArray/PeriodArray. This PR does not do that.

What it does do is refactor out the subset of those methods that can be transplanted directly into the Array classes (i.e. cut/paste).

On its own this PR is not very useful, so think of it as a Proof of Concept/discussion piece.

cc @TomAugspurger since this is a precursor to getting a "real" PeriodArray.

TomAugspurger

Seems like a reasonable organization at first glance. Will look closer / think more about it later.

TomAugspurger · 2018-02-26T11:58:49Z

pandas/core/indexes/datetimelike.py

+    # ------------------------------------------------------------------
+    # Null Handling
+
+    @property  # NB: override with cache_readonly in immutable subclasses


Did you have a PR started that made something like a @maybe_cache_readonly? That'll look more appealing with this in place, else we'll just be overriding these just to mark them as cached.

Your PR probably did this, but ideally we would have a class attribute that indicates whether the class is immutable, and a single decorator for both.

Yah the PR you're thinking of did exactly that. At the time there was only one property/cache_readonly affected, but this was the motivation. It'll be easy to revive if it becomes necessary.
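For readers following along, the "class attribute plus a single decorator" idea could be sketched roughly as below. This is only an illustration: `maybe_cache_readonly` here is a hypothetical descriptor, `_immutable` is an assumed flag name, and `hasnans` is just an example null-handling property. None of it is taken from the PR being referred to.

```python
class maybe_cache_readonly:
    """Hypothetical descriptor: cache only when the owner class is immutable."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # `_immutable` is an assumed class-level flag, not a pandas attribute.
        if not getattr(type(obj), "_immutable", False):
            return self.func(obj)  # mutable: recompute on every access
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]


class ArrayLike:
    _immutable = False

    def __init__(self, values):
        self._data = list(values)
        self.n_computed = 0  # counts how often the property body runs

    @maybe_cache_readonly
    def hasnans(self):
        self.n_computed += 1
        return any(v is None for v in self._data)


class IndexLike(ArrayLike):
    _immutable = True  # same property definition, now cached
```

With something like this, the immutable subclass would only flip the flag instead of overriding each property to mark it as cached.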

TomAugspurger · 2018-02-26T12:01:01Z

pandas/core/indexes/datetimelike.py

    """ common ops mixin to support a unified interface datetimelike Index """
+    inferred_freq = cache_readonly(DatetimeLikeArray.inferred_freq.fget)


Ah, this isn't so bad...
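The pattern in the diff line above, taking a plain property's `fget` and re-wrapping it in a caching descriptor, can be shown in isolation. The sketch below uses a minimal stand-in `cache_readonly` (not the pandas implementation) and an illustrative `step` property with made-up class names.

```python
class cache_readonly:
    """Minimal stand-in for a caching read-only property (not pandas' own)."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]


class ArrayMixin:
    def __init__(self, values):
        self._data = list(values)

    @property  # recomputed on each access, since the array may be mutated
    def step(self):
        diffs = {b - a for a, b in zip(self._data, self._data[1:])}
        return diffs.pop() if len(diffs) == 1 else None


class IndexLike(ArrayMixin):
    # Reuse the array implementation, but cache it for the immutable index.
    step = cache_readonly(ArrayMixin.step.fget)
```

`ArrayMixin.step` accessed on the class returns the `property` object itself, so `.fget` is the undecorated function and no logic has to be duplicated.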

TomAugspurger · 2018-02-26T12:02:11Z

pandas/core/indexes/datetimes.py

@@ -174,8 +174,92 @@ def _new_DatetimeIndex(cls, d):
    return result


-class DatetimeIndex(DatelikeOps, TimelikeOps, DatetimeIndexOpsMixin,
-                    Int64Index):
+class DatetimeArray(DatetimeLikeArray):


I don't think we've discussed this anywhere yet, but I'm not sure whether we want a plain DatetimeArray or just a DatetimeTZArray. We'll need to hash that out somewhere. That discussion probably depends on how public these EAs are going to be.

This may be a place where our goals overlap imperfectly. My goal is to ensure that Index/Series/DataFrame comparison/arithmetic behavior is consistent by having shared implementations of those methods. For that purpose I expect it'll be easier to have a single DatetimeArray for both aware/naive than to juggle DatetimeTZArray/ndarray[datetime64[ns]]

I may also be confused about what the "Extension" in Extension Array is for. I'm thinking of it largely as "extending numpy arrays", whereas the canonical usage may be for downstream users to extend pandas.

Regardless, if we reach consensus on this part of the diff, the next step is to move over a handful of methods that require only a 1-line change to wrap an Index object (e.g. DatetimeIndex.to_julian_dates)

> For that purpose I expect it'll be easier to have a single DatetimeArray

Completely agreed that a single implementation is the only sane way to achieve that.

> I'm thinking of it largely as "extending numpy arrays",

That's right, but NumPy's datetime64[ns] is I think sufficient for us as far as tz-naive datetimes go (I may be wrong about us).

numpy's impl has served us, but it is immensely easier to have a real DTI be the actual underlying implementation, as we can easily extend this. So I am on board with @jbrockmendel here to have a combined DatetimeArray. We actually discussed this, I think, in 0.17.0 when I created this originally, but it was rejected for compat with numpy. We could still have that (the issue is what .values outputs), but it makes the code much better to have this than not.

@jreback glad to hear you're on board. Any thoughts on the appropriate size/scope per PR to make reviewers' task easier?

I think we could move much of this to core/array/datetime.py

doing straight moves first, then changes, is good

codecov · 2018-02-26T12:54:58Z

Codecov Report

Merging #19902 into master will increase coverage by <.01%.
The diff coverage is 96.85%.

@@            Coverage Diff             @@
##           master   #19902      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         154      158       +4     
  Lines       49659    49701      +42     
==========================================
+ Hits        45640    45680      +40     
- Misses       4019     4021       +2

Flag	Coverage Δ
#multiple	`90.28% <96.85%> (ø)`	⬆️
#single	`41.94% <74.84%> (+0.04%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/timedelta.py	`100% <100%> (ø)`
pandas/core/indexes/timedeltas.py	`91.19% <100%> (-0.06%)`	⬇️
pandas/core/indexes/datetimes.py	`95.48% <100%> (-0.18%)`	⬇️
pandas/core/indexes/period.py	`92.58% <100%> (-0.1%)`	⬇️
pandas/core/indexes/datetimelike.py	`96.72% <100%> (+0.24%)`	⬆️
pandas/core/arrays/__init__.py	`100% <100%> (ø)`	⬆️
pandas/core/arrays/period.py	`100% <100%> (ø)`
pandas/core/arrays/datetimes.py	`100% <100%> (ø)`
pandas/core/arrays/datetimelike.py	`92.75% <92.75%> (ø)`
pandas/core/indexes/multi.py	`94.96% <0%> (-0.01%)`	⬇️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ad76ffc...a684c2d.

TomAugspurger · 2018-02-26T17:10:26Z

pandas/core/indexes/datetimelike.py

@@ -121,8 +121,149 @@ def ceil(self, freq):
        return self._round(freq, np.ceil)


-class DatetimeIndexOpsMixin(object):
+class DatetimeLikeArray(object):


Maybe append Mixin to the name here, to indicate that this still can't be constructed and used on its own?

For testing purposes I was planning to implement a bare-bones __new__. That actually raises an important question: what is the canonical attribute to assign the values input to? For DTI/TDI/PI right now it is self._data, but for the Block subclasses it's self.values. Has a convention been established for ExtensionArrays?

(none of which is mutually exclusive with Mixin being a good suggestion)

jbrockmendel · 2018-02-27T04:11:42Z

Moved the new classes to core.arrays directory. The set of cut/paste-able methods will be expanded when #19912 and #19903 go through.

jbrockmendel · 2018-03-02T01:50:23Z

Now that #19800 is in, the follow-up to this can include all of the comparison methods (the Index ops will need a ~2 line wrapper around the array ops).
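To make the "~2 line wrapper" concrete, here is a rough sketch under stated assumptions: the class names are hypothetical, plain lists stand in for ndarrays, and `BoolIndexLike` is an invented container for the boxed result. It is not the code from the follow-up PR.

```python
import operator


class ArrayOpsMixin:
    """Hypothetical array-level class holding the shared comparison logic."""

    def __init__(self, values):
        self._data = list(values)

    def _cmp(self, other, op):
        other_values = getattr(other, "_data", other)
        if len(other_values) != len(self._data):
            raise ValueError("Lengths must match to compare")
        # A plain list of bools stands in for an ndarray[bool] result.
        return [op(a, b) for a, b in zip(self._data, other_values)]

    def __lt__(self, other):
        return self._cmp(other, operator.lt)


class BoolIndexLike:
    """Hypothetical Index-style container for the wrapped result."""

    def __init__(self, values):
        self.values = list(values)


class IndexLike(ArrayOpsMixin):
    def __lt__(self, other):
        # The short wrapper: delegate to the array op, then box the result.
        result = ArrayOpsMixin.__lt__(self, other)
        return BoolIndexLike(result)
```

All validation and elementwise logic lives in the array-level class; the Index-level override only changes the container type of the return value.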

jbrockmendel · 2018-03-02T17:36:56Z

Thoughts on where to go with this? The steps after this are going to require a lot of work to carefully port the appropriate tests, so I'd like to keep slow-and-steady momentum.

TomAugspurger

Sorry for the delay. Overall this looks good.

@jbrockmendel can you sketch out your next few steps here? Can you edit the OP in #19696? If not make a list and I'll add it, maybe as a sublist.

Can you add DatetimeArray, PeriodArray, and TimedeltaArray to pandas.core.arrays.__init__?

TomAugspurger · 2018-03-16T13:43:48Z

pandas/core/arrays/datetimelike.py

+from pandas.core.algorithms import checked_add_with_arr
+
+
+class DatetimeLikeArray(object):


Append Mixin to the class name?

TomAugspurger · 2018-03-16T13:49:08Z

pandas/core/indexes/datetimelike.py

    """ common ops mixin to support a unified interface datetimelike Index """
+    inferred_freq = cache_readonly(DatetimeLikeArray.inferred_freq.fget)


Comment here to note why we do it like this (array is mutable, index is immutable).

jbrockmendel · 2018-03-16T16:20:47Z

> can you sketch out your next few steps here?

This is ~everything that can be cut/paste directly. The next step I have in mind is a handful of methods that need a 1-line wrapping in the Index classes. e.g. comparison methods are non-Index-specific right up until the last line when they wrap an ndarray in an Index. Following that are the arithmetic methods, for which the wrapping is less trivial. Does that answer the question?

> Can you edit the OP in #19696?

Sure.

BTW let me know if time is an issue on this or other PRs. I've been distracted for the last couple of weeks with a bugfix-fork of statsmodels that's left a bunch of stuff here on hiatus.

TomAugspurger · 2018-03-16T16:24:51Z

> BTW let me know if time is an issue on this or other PRs.

It's starting to be. I think we want to do a release candidate in the next couple weeks. It'd be nice to have as much of the EA stuff done as possible. My plan is for groupby to be the last bit of API that we ensure works, and then pick up moving our other extension types over to the new interface. If you're able to take on any of that it'd be great.

> bugfix-fork of statsmodels

That's unfortunate, but understandable :/ I'm hoping to push on a statsmodels release sometime shortly after pandas 0.23.0.

jbrockmendel · 2018-03-18T17:11:12Z

Making suggested changes now. Will push shortly.

re arrays.__init__, any thoughts about defining __all__ and removing the # noqas? Not a big deal, just a slight preference.

re statsmodels: see sm2. The vague hope is that it gets enough community traction to convince jpkt to take technical debt seriously, at which point fixes can be upstreamed and it can become unnecessary.

TomAugspurger · 2018-03-20T13:49:35Z

> re arrays.__init__, any thoughts about defining __all__ and removing the # noqas? Not a big deal, just a slight preference.

No preference at all.

@jorisvandenbossche / @jreback this LGTM. Any concerns?

jorisvandenbossche · 2018-03-20T13:50:46Z

How should I view this PR?
Is this a first actual stab at implementing the array types for datetime/period/timedelta? Or more a reorganisation of code?

Because if it is the former, I probably have a bunch of comments on what exactly we want to put in the array classes. And also, if that is the case, I am not sure we want the Index to subclass them? I thought we would rather go for composition?

TomAugspurger · 2018-03-20T13:56:20Z

Reorganization. DatetimeLikeArray isn't an ExtensionArray yet.

> And also, if that is the case, I am not sure we want the Index to subclass them? I thought we would rather go for composition?

I've been going back and forth on which approach is best here. I'm slightly coming around to the idea of subclassing, but haven't 100% settled yet. I think that the changes here are going to be helpful either way, correct @jbrockmendel? At some point we'll either make DatetimeLikeArray inherit from ExtensionArray, or the Index classes will change to compose it.
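The two designs being weighed here can be contrasted in a few lines. The sketch below is purely illustrative: the class names are stand-ins, lists replace real arrays, and only one forwarded method is shown.

```python
class DatetimeLikeArrayMixin:
    """Stand-in for the shared array-level logic (names are illustrative)."""

    def __init__(self, values):
        self._data = list(values)

    def min(self):
        return min(self._data)


# Option 1: subclassing, as in this PR. The Index *is* an array-like.
class SubclassedIndex(DatetimeLikeArrayMixin):
    pass


# Option 2: composition. The Index *has* an array and forwards to it.
class ComposedIndex:
    def __init__(self, values):
        self._data = DatetimeLikeArrayMixin(values)

    def min(self):
        return self._data.min()
```

Either way the shared logic sits in one place, which is why moving methods into the array class first is useful regardless of which option is eventually chosen. The difference is that composition needs explicit forwarding for each method, while subclassing exposes everything on the Index automatically.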

jbrockmendel · 2018-05-25T19:43:41Z

ResourceWarning in TestMangleDupes appears unrelated

jbrockmendel · 2018-06-11T16:07:10Z

gentle ping

jreback

lgtm. @jorisvandenbossche @TomAugspurger

jorisvandenbossche

I still have the feeling that this single step rather complicates things (eg the inheritance scheme), but since the goal is that this is only temporary, I suppose I don't really care :-)

jorisvandenbossche · 2018-06-22T15:31:32Z

pandas/core/arrays/__init__.py

@@ -1,2 +1,5 @@
 from .base import ExtensionArray  # noqa
 from .categorical import Categorical  # noqa
+from .datetimes import DatetimeArrayMixin  # noqa
+from .periods import PeriodArrayMixin  # noqa


can you make this period? and below timedelta
I am fine with keeping the conflicting one as plural, if you want that

which is "the conflicting one"?

Sorry, I was speaking about the file name, not the order. So, use singular period instead of periods (and the same for timedelta), as we discussed before: https://github.com/pandas-dev/pandas/pull/19902/files/3a67bce8169663430005b2a36673132fc1e79f4c#r175774283

jbrockmendel · 2018-06-28T21:32:45Z

@jreback just rebased. If we can push this through I can get the next step up over the weekend and we'll have a shot at finishing the transition at the sprint.

jreback · 2018-06-29T00:21:14Z

going to merge #21261 then have you rebase (not that I expect conflicts). Then can merge.

jreback · 2018-06-29T02:02:25Z

pls rebase

jbrockmendel · 2018-06-29T22:50:27Z

ping

jreback · 2018-07-02T23:58:48Z

thanks!

jbrockmendel added 2 commits February 25, 2018 21:14

implement DatetimeLikeArray

df9f894

docstring

004f137

TomAugspurger reviewed Feb 26, 2018

jbrockmendel added 2 commits February 26, 2018 18:49

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

db72494

…arrays3

move classes to array directory

80b525e

jbrockmendel added 4 commits February 27, 2018 07:40

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

54009d4

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

65dc829

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

4c5a05c

…arrays3

flake8 fixup remove unused import

b91ddac

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

080e477

…arrays3

gfyoung added the Enhancement and Internals labels Mar 2, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

e19f70a

…arrays3

TomAugspurger mentioned this pull request Mar 16, 2018

ENH: Sorting of ExtensionArrays #19957

Merged

TomAugspurger reviewed Mar 16, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

47d365e

…arrays3

jbrockmendel mentioned this pull request Mar 18, 2018

ExtensionArray meta-issue #19696

Closed

15 tasks

comments for cache_readonlys, append Mixin to name, add imports to __…

3a67bce

…init__

TomAugspurger approved these changes Mar 20, 2018

jbrockmendel added 6 commits May 26, 2018 11:27

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

8cee92c

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

375329e

…arrays3

fixup missing import

71dfe08

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

59c60a2

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

9db2b78

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

308c25b

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

94bdfcb

…arrays3

jreback approved these changes Jun 21, 2018

jorisvandenbossche reviewed Jun 22, 2018

jbrockmendel added 4 commits June 22, 2018 11:03

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

d589e2a

…arrays3

reorder imports

0d4f48a

De-pluralize

1b910c7

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

cece116

…arrays3

jreback added this to the 0.24.0 milestone Jun 29, 2018

jreback changed the title ~~WIP: implement DatetimeLikeArray~~ ENH: implement DatetimeLikeArray Jun 29, 2018

jbrockmendel added 3 commits June 28, 2018 20:14

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

828022a

…arrays3

fixup remove unused import

ed83046

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

c1934db

…arrays3

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

a684c2d

…arrays3

jreback merged commit 7cd2679 into pandas-dev:master Jul 2, 2018

jbrockmendel deleted the dtarrays3 branch July 2, 2018 23:59

jbrockmendel mentioned this pull request Jul 3, 2018

Move Unchanged arith methods to EA mixins #21712

Merged

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

ENH: implement DatetimeLikeArray (pandas-dev#19902)

5b2bdbf