CLN/INT: remove Index as a sub-class of NDArray #7891
We will call you Mr Anti-NDArray :)
cc @Komnomnomnom if you could have a look at the json issues would be great!
amen to that composition ftw
No problem guys, I'll take a look this weekend.
would you mind taking a look at this PR and see if you can figure this out? seems the PeriodIndex is getting converted to underlying (and not preserved) like in master..... thanks
@jreback I think following lines should be changed not to pass One difference is matplotlib axis no longer can hold
https://github.com/pydata/pandas/blob/master/pandas/tseries/plotting.py#L64
@sinhrks ahh..ok, let me try to replace that and see.
@hayd I certainly hope so! I am super stoked about this change.
@hayd there is already a lot of work on RangeIndex here: https://github.com/jtratner/pandas/tree/add-range-index I think it should be a bit easier. However, I have to think about how to fix the real problem, which is that That will have to wait though. @shoyer when you are ready to integrate
- pickles <= 0.8.0 may not work if they contain MultiIndexes.
- you may need to unpickle < 0.15.0 pickles using ``pd.read_pickle`` rathen than ``pickle.loads``. See :ref:`pickle docs <io.pickle>`
- boolean comparisons of ``DatetimeIndex`` that have ``NaT`` with ``ndarray`` ONLY work if the ndarray is on the right-handle side. An example of this limited case is:
Is this something you could fix (at least for standard ndarrays) by setting __array_priority__ > 1?
http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#special-attributes-and-methods
(someone should really update those docs to discourage subclassing!)
hmm maybe if I add __array_prepare__
the issue is that ndarray defines '__lt__' for example and I don't know any way to have it reverse the args and call '__ge__' on the index instead
do you?
@shoyer I have determined that this CAN be done by intercepting (and interpreting the context) in the __array_prepare__ call (similar to what is done in core/series.py/__array_prepare__). However IMHO this is pretty complicated (as you would need to translate the ufunc and reverse the arguments). Leaving it off for now with just the docs warning. I think this is a very limited case anyhow.
I don't think use of __array_prepare__ is necessary. Numpy will call __gt__ instead of __lt__ if you set a higher array priority for the second argument:
    import numpy as np

    class ArrayLike(object):
        # Wraps an ndarray and reports which comparison method numpy dispatched to.
        def __init__(self, array, priority):
            self.array = array
            self.__array_priority__ = priority

        def __array__(self):
            return self.array

        def __lt__(self, other):
            print('subclass used lt')
            return self.array < other

        def __le__(self, other):
            print('subclass used le')
            return self.array <= other

        def __eq__(self, other):
            print('subclass used eq')
            return self.array == other

        def __ne__(self, other):
            print('subclass used ne')
            return self.array != other

        def __gt__(self, other):
            print('subclass used gt')
            return self.array > other

        def __ge__(self, other):
            print('subclass used ge')
            return self.array >= other

Examples:
In [3]: np.array([0, 0]) < ArrayLike(np.array([1, 1]), priority=None)
Out[3]: array([ True, True], dtype=bool)
In [4]: np.array([0, 0]) <= ArrayLike(np.array([1, 1]), priority=2)
subclass used ge
Out[4]: array([ True, True], dtype=bool)
In [5]: 0 <= ArrayLike(np.array([1, 1]), priority=2)
subclass used ge
Out[5]: array([ True, True], dtype=bool)
@sinhrks I have made the changes for plotting here. jreback@1fc0a1f can you confirm that the graphs act/look the same? (esp when resampled/zoomed and such) as an aside this now makes all lines whether
@shoyer that was a good idea to use you have to catch
@sinhrks I updated this commit a couple of times to put in some period construction optimizations.... pls lmk about the plotting (if you think it's ok). (all tests now pass).
@jreback I've checked some plots, and these worked the same as before. If anything comes up, I'll confirm again.
hey @jreback just taking a look at the json stuff now
result = func(other)
if result is NotImplemented:
OK, so I finally figured out what is happening here.
Numpy is not performing (ndarray, Index) comparisons here (returning NotImplemented) because Index has a higher array priority.
The fix is to always coerce the right-side argument into a plain ndarray, e.g., result = func(np.asarray(other)) (better to use asarray than array to avoid unnecessary copies). If you do that, you will be able to skip the NotImplemented check.
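To illustrate the suggestion above, here is a minimal sketch of a wrapper class that sets a high __array_priority__ and coerces the other operand with np.asarray before comparing. MiniIndex and its _cmp helper are hypothetical names made up for this example; this is not the pandas Index implementation, only an approximation of the idea under those assumptions.

```python
import operator

import numpy as np


class MiniIndex:
    """Hypothetical stand-in for an Index that wraps (not subclasses) ndarray."""

    # Higher than ndarray's default priority, so ndarray comparisons
    # return NotImplemented and Python falls back to our reflected method.
    __array_priority__ = 1000

    def __init__(self, data):
        self._data = np.asarray(data)

    def _cmp(self, other, op):
        # Coerce the other operand to a plain ndarray (no copy if it already
        # is one) so numpy performs the comparison itself.
        return op(self._data, np.asarray(other))

    def __lt__(self, other):
        return self._cmp(other, operator.lt)

    def __le__(self, other):
        return self._cmp(other, operator.le)

    def __gt__(self, other):
        return self._cmp(other, operator.gt)

    def __ge__(self, other):
        return self._cmp(other, operator.ge)


# ndarray on the left: numpy defers, so MiniIndex.__gt__ handles the comparison.
left_ndarray = np.array([0, 5]) < MiniIndex([1, 1])
# MiniIndex on the left: __le__ is called directly.
left_index = MiniIndex([1, 2]) <= [1, 1]
```

With the ndarray on the left, the reflected method runs on the wrapper, so both operand orders go through the same coercion path.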
done. also I have put in (well a little) structure on the Index testing for more generic testing, e.g. jreback@d1c4fbb so prob worthwhile after this PR is merged to 'fix' the index tests to make it more class based (how Float64/Int64 are done), to make it a bit more generic. E.g.
ok monster is ready to merge. any further comments.
@jorisvandenbossche @cpcloud @shoyer @sinhrks @hayd I think I understand pickle and all of its evils now :) (not sure if that is a net benefit to society though)
__array_priority__ = 1000

def __array_prepare__(self, result, context=None):
I don't think this does anything.
oh __array_prepare__, yes I know.... sort of left it in there... will take it out
@@ -2137,14 +2137,14 @@ def copy(self, deep=True):
----------
deep : boolean, default True
Make a deep copy, i.e. also copy data
axes : string or None, default None
View copy of the axes
I don't really understand what this means?
I agree, maybe it's better to merge this with the deep argument, e.g.
- deep=False: shallow copy
- deep=True: deep copy of values, shallow copy of axes
- deep='withaxes': deep copy of everything (withaxes could be any token that clarifies the meaning)
Would be nice to have deep=True to deep-copy everything and deep='values'/deep='axes' to pick only one component, but that seems non-backward compatible.
or maybe accept deep='values' as an alias for deep=True and deep=('values', 'axes') to deep-copy everything
and if we keep the axes arg, I would rather make it a bool like deep
this was all a kludge to avoid repeating code; there is exactly 1 case where I need this: reduce.pyx/Reducer. Basically need to make a complete copy of an object including a deep copy of its index.
deep=True does not copy the actual data, rather it is a view on it. This preserves numpy semantics so memory is shared. We never actually need to copy index data memory as these are immutable and so cannot be changed. We always just create a new object (with possibly shared memory).
Meta-data is a different story (e.g. .name), where we almost always want/need to copy this (e.g. .view uses ._shallow_copy for this purpose).
However, in this reducer because of how it actually messes with the pointers, I do actually need to copy the memory.
I needed a 'private' way of doing that. So either make axes private, or just overload deep (default is still always True). Will change to deep=True|False|'all'.
The user never needs to actually copy the index data as it is a view and numpy takes care of that. This is an internal usage.
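As a rough illustration of the deep=True|False|'all' idea described above, the following sketch uses a toy container. MiniFrame is a hypothetical class invented for this example, and the mapping of each option to values/index behaviour is my reading of the comment, not the actual pandas code.

```python
import numpy as np


class MiniFrame:
    """Hypothetical container holding a values array and an index array."""

    def __init__(self, values, index):
        self.values = values
        self.index = index

    def copy(self, deep=True):
        # deep=False : share the values memory
        # deep=True  : copy values, but only take a view of the index
        # deep='all' : copy values and the index memory as well
        values = self.values.copy() if deep else self.values
        index = self.index.copy() if deep == "all" else self.index.view()
        return MiniFrame(values, index)


frame = MiniFrame(np.arange(3.0), np.array([10, 20, 30]))
shallow = frame.copy(deep=False)
default = frame.copy(deep=True)
everything = frame.copy(deep="all")
```

np.shares_memory can be used to check which buffers each variant shares with the original; only the 'all' variant gets its own index memory, which matches the single internal use case mentioned in the comment.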
@jreback Added a bunch of comments (mainly on docs and public API, not familiar enough to comment on technical details). Further, I wondered, are there things we have learned from the "Series -> NDFrame subclass and no longer ndarray subclass" move that can be relevant here? Issues that came up afterwards (where we had to say "series is no longer an ndarray subclass, so this will not work anymore") that we can now warn about beforehand?
@@ -1894,8 +2008,75 @@ def drop(self, labels):
raise ValueError('labels %s not contained in axis' % labels[mask])
return self.delete(indexer)

@classmethod
def _add_numeric_methods_disabled(cls):
I wonder if _add_numeric_methods_disabled and _add_numeric_methods could be put into pandas.core.ops module to reuse/be reused from such methods implemented for other containers.
I started doing this, but ops.py was a bit too specific. It could/should be fixed I think, but would require some dedicated effort. Feel free!
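For context, a shared helper of the kind being discussed might look roughly like the sketch below. The function names, the default list of operators, and the LabelIndex class are all hypothetical and are not taken from pandas.core.ops or from this PR; the sketch only shows the general pattern of attaching "disabled" operators to a class.

```python
def _make_disabled_op(name):
    """Build a method that rejects the named operation with a TypeError."""

    def disabled(self, *args, **kwargs):
        raise TypeError(
            f"cannot perform {name} with this index type: {type(self).__name__}"
        )

    disabled.__name__ = name
    return disabled


def add_numeric_methods_disabled(cls, names=("__mul__", "__floordiv__", "__truediv__")):
    """Attach TypeError-raising numeric operators to any container class."""
    for name in names:
        setattr(cls, name, _make_disabled_op(name))
    return cls


class LabelIndex:
    """Hypothetical non-numeric container that should refuse arithmetic."""

    def __init__(self, labels):
        self.labels = list(labels)


add_numeric_methods_disabled(LabelIndex)

try:
    LabelIndex(["a", "b"]) * 2
except TypeError as exc:
    message = str(exc)
```

Keeping the helper at module level, rather than as a classmethod on one container, is what would let other containers reuse it.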
@jorisvandenbossche I just update the properties directly, better I think because then don't have to clutter with the full ndarray definitions.
OK, it is a bit of a compromise between both, as for some functions it can also be useful information, for some it is too much clutter ..
ahh ok, that makes sense
@jorisvandenbossche ok added a lot of see alsos (both series and index), and put doc-strings on lots of attributes.
ok, think this is ready. I put back the MultiIndex support for really old pickles (wasn't hard). though not sure anyone really has them around. any final comments
@jorisvandenbossche added the rest of the properties (and now consolidated in 1 place)
CLN: add searchsorted to core/base (GH6712, GH7447, GH6469)
fixup tests in test_timeseries for reverse ndarray/datetimeindex comparisons
fix algos / multi-index repeat (essentially this is a bug-fix)
ENH: add NumericIndex and operators, related (GH7439)
DOC: indexing/v0.15.0 docs
TST: fixed up plotting issues
COMPAT/API: use __array_priority__ to facility proper comparisons of DatetimeIndex with ndarrays
fixup to do actual views in copy (except in reduce where its needed)
COMPAT: numpy compat with 1.6 for np.may_share_memory
FIX: access values attr in JSON code to support index that's not an ndarry subclass
COMPAT: numpy compat with array_priority fix
CLN: remove constructor pickle compat code as not necessary
COMPAT: fix pickle in sparse
CLN: clean up shallow_copy/simple_new
COMPAT: pickle compat remove __array_prepare__
COMPAT: tests & compat for numeric operation support only on supported indexes
DOC: fixup for comments
COMPAT: allow older MultiIndex pickles again
CLN: combine properties from index/series for ndarray compat
ok, bombs away...
CLN/INT: remove Index as a sub-class of NDArray
Nice!
That was huge, great job
Bravo!
make Index now subclass PandasObject/IndexOpsMixin rather than ndarray
should allow much easier new Index classes (e.g. #7640)
This doesn't change the public API at all, and provides compat
closes #5080
back compat for pickles is now way simpler
ToDo:
- .repeat on MultiIndex (broken in master)
- DatetimeIndex with NaT vs ndarrays
- Index now doesn't have implicit ops (aside from __sub__/__add__); these need to be added in (e.g. __mul__, __div__, __truediv__).
closes #5155 (perf fix for Period creation), slight increase on the plotting because of the plotting routines holding array of Periods (rather than PeriodIndex).