REF: Internal / External values #19558
Changes from 18 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -89,6 +89,21 @@ not check (or care) whether the levels themselves are sorted. Fortunately, the
 constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
 if you compute the levels and labels yourself, please be careful.

+Values
+~~~~~~
+
+Pandas extends NumPy's type system in a few places, so we have multiple notions of "values" floating around.
Review comment: The first sentence is totally not clear to a new reader
+For 1-D containers (``Index`` classes and ``Series``) we have the following convention:
+
+* ``cls._ndarray_values`` is *always* an ``ndarray``
+* ``cls._values`` is the "best possible" array. This could be an ``ndarray``, ``ExtensionArray``, or
+  an ``Index`` subclass (note: we're in the process of removing the index subclasses here so that it's
Review comment: you are using internal (pun intended) jargon here
+  always an ``ndarray`` or ``ExtensionArray``).
+
+So, for example, ``Series[category]._values`` is a ``Categorical``, while ``Series[category]._ndarray_values`` is
+the underlying ndarray.
Review comment: Not sure what
Reply: yes
+
+
 .. _ref-subclassing-pandas:

 Subclassing pandas Data Structures
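To make the convention in the documentation hunk above concrete, here is a minimal sketch in plain Python. It is a toy model and not pandas code: `ToyCategorical`, `ToySeries`, and the use of plain lists in place of NumPy arrays are all assumptions made for illustration. It shows `_values` returning the richest array-like object, while `_ndarray_values` returns a plain and possibly lossy representation (the integer codes, for categorical data).

```python
class ToyCategorical:
    """Hypothetical stand-in for a categorical array: categories plus integer codes."""

    def __init__(self, data):
        # Categories are kept in order of first appearance.
        self.categories = list(dict.fromkeys(data))
        lookup = {cat: i for i, cat in enumerate(self.categories)}
        self.codes = [lookup[x] for x in data]

    def to_list(self):
        # Rebuild the original values from categories and codes.
        return [self.categories[c] for c in self.codes]


class ToySeries:
    """Hypothetical 1-D container following the convention described above."""

    def __init__(self, data):
        self._data = data

    @property
    def _values(self):
        # The "best possible" array: keeps all type information.
        return self._data

    @property
    def _ndarray_values(self):
        # Always a plain array (a list in this toy model); may lose information.
        if isinstance(self._data, ToyCategorical):
            return self._data.codes
        return list(self._data)


s = ToySeries(ToyCategorical(["a", "b", "a"]))
print(type(s._values).__name__)  # ToyCategorical
print(s._ndarray_values)         # [0, 1, 0]
```

The codes alone cannot be mapped back to `"a"` and `"b"` without the categories, which is the sense in which the plain representation loses information.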
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -7,12 +7,14 @@
 import numpy as np

 from pandas.core.dtypes.missing import isna
-from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries, ABCIndexClass
+from pandas.core.dtypes.generic import (
+    ABCDataFrame, ABCSeries, ABCIndexClass, ABCDatetimeIndex)
 from pandas.core.dtypes.common import (
     is_object_dtype,
     is_list_like,
     is_scalar,
     is_datetimelike,
+    is_categorical_dtype,
     is_extension_type)

 from pandas.util._validators import validate_bool_kwarg
@@ -710,7 +712,7 @@ def transpose(self, *args, **kwargs):
     @property
     def shape(self):
         """ return a tuple of the shape of the underlying data """
-        return self._values.shape
+        return self._ndarray_values.shape

     @property
     def ndim(self):
@@ -738,22 +740,22 @@ def data(self):
     @property
     def itemsize(self):
         """ return the size of the dtype of the item of the underlying data """
-        return self._values.itemsize
+        return self._ndarray_values.itemsize

     @property
     def nbytes(self):
         """ return the number of bytes in the underlying data """
-        return self._values.nbytes
+        return self._ndarray_values.nbytes
Review comment: Should
Reply: I think this caused issues for CI, but re-running tests now with this change.

     @property
     def strides(self):
         """ return the strides of the underlying data """
-        return self._values.strides
+        return self._ndarray_values.strides

     @property
     def size(self):
         """ return the number of elements in the underlying data """
-        return self._values.size
+        return self._ndarray_values.size

     @property
     def flags(self):
@@ -768,8 +770,21 @@ def base(self):
         return self.values.base

     @property
-    def _values(self):
-        """ the internal implementation """
+    def _ndarray_values(self):
+        """The data as an ndarray, possibly losing information.
+
+        The expectation is that this is cheap to compute.
+
+        - categorical -> codes
+
+        See '_values' for more.
+        """
+        # type: () -> np.ndarray
+        from pandas.core.dtypes.common import is_categorical_dtype
Review comment: this does NOT belong here. You already have a sub-class EA for Categorical that can simply override this.
Reply: This code is to return This raises the question for me, though, what this will return for external extension types. Since it is is
Reply: I've moved it all the way to
+
+        if is_categorical_dtype(self):
Review comment: why do we have this at all, e.g.
Reply: Because But I agree it points to something we should think about how to organize this, as also e.g. for periods there will be a special case here in the future. So maybe we need a property on our own extension arrays that gives back this ndarray? (which is not necessarily part of the external interface for extension arrays)
+            return self._values.codes
+
         return self.values

     @property
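The review thread above contrasts an `if is_categorical_dtype(...)` branch in the base class with letting the specialised type override the property. The following is a minimal sketch of the override approach in plain Python, as one possible reading of that suggestion. `BaseContainer` and `CategoricalContainer` are hypothetical names, lists stand in for NumPy arrays, and nothing here is taken from the pandas code base or from what the PR finally did.

```python
class BaseContainer:
    """Hypothetical base class: by default the stored values are already a plain array."""

    def __init__(self, values):
        self.values = values

    @property
    def _ndarray_values(self):
        # Default behaviour, with no type-specific branching here.
        return self.values

    @property
    def size(self):
        # Shared helpers are written once against _ndarray_values.
        return len(self._ndarray_values)


class CategoricalContainer(BaseContainer):
    """Hypothetical subclass that overrides the property instead of being special-cased."""

    def __init__(self, categories, codes):
        super().__init__([categories[c] for c in codes])
        self.categories = categories
        self.codes = codes

    @property
    def _ndarray_values(self):
        # The cheap, plain representation for categorical data is the codes.
        return self.codes


plain = BaseContainer([1.5, 2.5])
cat = CategoricalContainer(["x", "y"], [1, 0, 1])
print(plain._ndarray_values)  # [1.5, 2.5]
print(cat._ndarray_values)    # [1, 0, 1]
print(cat.size)               # 3
```

With this layout, adding another special case (the thread mentions periods) means adding another subclass override rather than another branch in the base class.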
@@ -819,8 +834,10 @@ def tolist(self):

         if is_datetimelike(self):
Review comment: this should be overridden in EA, rather than specific dispatching via if/else here, IOW it should be a part of the interface, or be defined as
Reply: I am not sure it is needed to add a
             return [com._maybe_box_datetimelike(x) for x in self._values]
+        elif is_categorical_dtype(self):
+            return self.values.tolist()
         else:
-            return self._values.tolist()
+            return self._ndarray_values.tolist()

     def __iter__(self):
         """
@@ -978,7 +995,9 @@ def value_counts(self, normalize=False, sort=True, ascending=False,
     def unique(self):
         values = self._values

+        # TODO: Make unique part of the ExtensionArray interface.
         if hasattr(values, 'unique'):
+
             result = values.unique()
         else:
             from pandas.core.algorithms import unique1d
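The `unique` hunk above relies on duck typing: if the underlying array provides its own `unique` method, that method is used, and otherwise a generic routine takes over. A small plain-Python sketch of that dispatch pattern follows. `unique_values` and `TaggedArray` are hypothetical names, and the `dict.fromkeys` fallback is only a stand-in for the generic routine (it assumes hashable elements); it is not how `unique1d` is implemented.

```python
def unique_values(values):
    """Return the distinct elements of `values`, preserving first-seen order."""
    if hasattr(values, "unique"):
        # The array type knows how to deduplicate itself.
        return values.unique()
    # Generic fallback for anything without its own unique().
    return list(dict.fromkeys(values))


class TaggedArray:
    """Hypothetical array type that supplies its own unique()."""

    def __init__(self, items):
        self.items = list(items)

    def unique(self):
        # Returns the same type, so type information is preserved.
        return TaggedArray(dict.fromkeys(self.items))


print(unique_values([3, 1, 3, 2, 1]))             # [3, 1, 2]
print(unique_values(TaggedArray("abca")).items)   # ['a', 'b', 'c']
```

Making `unique` a required part of the array interface, as the TODO in the hunk proposes, would remove the need for the `hasattr` check.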
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -480,20 +480,22 @@ def _concat_datetimetz(to_concat, name=None):

 def _concat_index_same_dtype(indexes, klass=None):
     klass = klass if klass is not None else indexes[0].__class__
-    return klass(np.concatenate([x._values for x in indexes]))
+    return klass(np.concatenate([x._ndarray_values for x in indexes]))
Review comment: This one is only used for numeric indices, so


 def _concat_index_asobject(to_concat, name=None):
     """
     concat all inputs as object. DatetimeIndex, TimedeltaIndex and
     PeriodIndex are converted to object dtype before concatenation
     """
+    from pandas import Index
+    from pandas.core.arrays import ExtensionArray

-    klasses = ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex
+    klasses = (ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex,
+               ExtensionArray)
     to_concat = [x.astype(object) if isinstance(x, klasses) else x
                  for x in to_concat]

-    from pandas import Index
     self = to_concat[0]
     attribs = self._get_attributes_dict()
     attribs['name'] = name
Review comment: you could add section tags