ExtensionArray.take default implementation #20814

TomAugspurger · 2018-04-24T19:03:37Z

Implements a take interface that's compatible with NumPy and optionally pandas'
NA semantics.

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import *

In [3]: arr = DecimalArray(['1.1', '1.2', '1.3'])

In [4]: arr.take([0, 1, -1])
Out[4]: DecimalArray(array(['1.1', '1.2', '1.3'], dtype=object))

In [5]: arr.take([0, 1, -1], fill_value=float('nan'))
Out[5]: DecimalArray(array(['1.1', '1.2', Decimal('NaN')], dtype=object))

Closes #20640

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Implements a take interface that's compatible with NumPy and optionally pandas' NA semantics. ```python In [1]: import pandas as pd In [2]: from pandas.tests.extension.decimal.array import * In [3]: arr = DecimalArray(['1.1', '1.2', '1.3']) In [4]: arr.take([0, 1, -1]) Out[4]: DecimalArray(array(['1.1', '1.2', '1.3'], dtype=object)) In [5]: arr.take([0, 1, -1], fill_value=float('nan')) Out[5]: DecimalArray(array(['1.1', '1.2', Decimal('NaN')], dtype=object)) ``` Closes pandas-dev#20640

pep8speaks · 2018-04-24T19:03:48Z

Hello @TomAugspurger! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 27, 2018 at 11:03 Hours UTC

TomAugspurger

Marked as WIP for now, because something is going strange with JSONArray. NumPy doesn't like us trying to shove dictionaries into scalar spots of an ndarray I think. Still debugging.

TomAugspurger · 2018-04-24T19:04:14Z

pandas/compat/__init__.py

@@ -451,3 +451,5 @@ def is_platform_mac():

 def is_platform_32bit():
    return struct.calcsize("P") * 8 < 64
+
+_default_fill_value = object()


This probably isn't the right place for this, but it has to be importable from pandas/arrays/base.py and core/algorithms.py

pandas.core.dtypes.missing or pandas.core.missing

TomAugspurger · 2018-04-24T19:05:09Z

pandas/core/algorithms.py

@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
 take_1d = take_nd


+def take_ea(arr, indexer, fill_value=_default_fill_value):


I threw this in core/algorithms.py. Does it make more sense in core/arrays/base.py, either as a standalone method or as a staticmethod on ExtensionArray?

I would move it. you need a set of methods that are developer exposed for EA authors that are useful in building the EA, but are not pandas internals. core/arrays/base.py with an core/arrays/api.py for exposing them would be appropriate

this ended up exposed in pd.api.extensions but it isn't clear to me why. are we sure its needed there?

TomAugspurger · 2018-04-24T19:05:25Z

pandas/core/algorithms.py

@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
 take_1d = take_nd


+def take_ea(arr, indexer, fill_value=_default_fill_value):
+    """Extension-array compatible take.


Just copy-pased the docstring for now. Could do a shared doc if moved.

TomAugspurger · 2018-04-24T19:06:41Z

pandas/core/arrays/base.py

+        Returns
+        -------
+        array-like
+            Must satisify NumPy's `take` semantics.


n.b.: this doesn't have to be an ndarray. Anything that satisfies NumPy's take (and I suppose masking) semantics will work.

TomAugspurger · 2018-04-24T19:08:10Z

pandas/core/arrays/base.py

+        from pandas.core.algorithms import take_ea
+        from pandas.core.missing import isna
+
+        if isna(fill_value):


I think this if condition could be removed if we changed all the callers of pandas.core.algorithms.take* to specify what kind they want (NumPy or pandas-style). But that's a bit of work right now.

TomAugspurger · 2018-04-24T19:09:16Z

pandas/tests/extension/base/getitem.py

@@ -134,12 +134,29 @@ def test_take(self, data, na_value, na_cmp):

    def test_take_empty(self, data, na_value, na_cmp):
        empty = data[:0]
-        result = empty.take([-1])
-        na_cmp(result[0], na_value)
+        # result = empty.take([-1])


Question: should this raise now? Since this is doing NumPy style, as na_value isn't passed to take, then I think it should raise.

Yes, it should then raise (as the indexer is out of bounds?). Numpy and pandas do allow empty takes on empty values:

In [93]: np.array([]).take([]) Out[93]: array([], dtype=float64) In [94]: pd.Series([]).take([]) Out[94]: Series([], dtype: float64) In [95]: np.array([]).take([0]) ... IndexError: cannot do a non-empty take from an empty axes.

But we should keep a test like this for the case of allow_fill=True

TomAugspurger · 2018-04-24T19:09:36Z

pandas/tests/extension/category/test_categorical.py

@@ -81,18 +81,32 @@ def test_merge(self, data, na_value):


 class TestGetitem(base.BaseGetitemTests):
+    skip_take = pytest.mark.skip(reason="GH-20664.")


this is a WIP. Need to fix categorical to warn.

Probably doing that in a followup PR...

TomAugspurger · 2018-04-24T19:10:25Z

cc @jorisvandenbossche

jreback · 2018-04-24T19:15:36Z

pandas/compat/__init__.py

@@ -451,3 +451,5 @@ def is_platform_mac():

 def is_platform_32bit():
    return struct.calcsize("P") * 8 < 64
+
+_default_fill_value = object()


pandas.core.dtypes.missing or pandas.core.missing

jreback · 2018-04-24T19:16:58Z

pandas/core/algorithms.py

@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
 take_1d = take_nd


+def take_ea(arr, indexer, fill_value=_default_fill_value):


I would move it. you need a set of methods that are developer exposed for EA authors that are useful in building the EA, but are not pandas internals. core/arrays/base.py with an core/arrays/api.py for exposing them would be appropriate

jreback · 2018-04-24T19:19:20Z

pandas/core/algorithms.py

+    fill_value : any, optional
+        Fill value to use for NA-indicies. This has a few behaviors.
+
+        * fill_value is not specified : triggers NumPy's semantics


hmm, re-reading this, I find this very confusing. Why are you overloading fill_value here and not making this a separate parameter?

IOW, (terrible name), but use_numpy_semanatics=False, fill_value=None why not this?

I find this very confusing

Yeah, it's not just you 😆 I'll see how much work it is to implement your suggestion.

Although, there are two things going on here:

np.nan being swapped for ExtensionArray.dtype.na_value.

fill_value being overloaded to switch the meaning of negative indexers.

Which bit is confusing? If I'm able to change things so that fill_value is simply passed through, would that be OK? Then someone specifying ExtensionArray.take(indexer, fill_value=thing) would mean -1 is an NA marker. ExtensionArray.take(indexer) would mean -1 is the last element.

Not overloading fill_value is how it is currently, with both allow_fill and fill_value. If we find that clearer, I am fine with going back to that (although it would be nice to only need one keyword for this).

I have more an objection on the np.nan swapping. It would be nice if we could prevent this, but didn't look in detail enough to assess whether this would be possible or not.

Although, if we make sure to not pass through a default fill_value, you would in most cases only need to specify allow_fill=True to get pandas semantics, instead of needing to specify both allow_fill and fill_value. So maybe that is not that bad.

Currenlty, eg Series.reindex calls under the hood (via Block.take_nd) take_1d on the Block values, by passing through the Block.fill_value as default fill_value. Could we make sure to not pass anything through in case of ExtensionBlocks, or to let the ExtensionBlock.fill_value be the appropriate value? (to avoid the swapping of np.nan)

TomAugspurger · 2018-04-24T19:22:55Z

Oh fun

In [3]: np.array([{'a': 10}])
Out[3]: array([{'a': 10}], dtype=object)

In [4]: np.array([UserDict({'a': 10})])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-400145814467> in <module>()
----> 1 np.array([UserDict({'a': 10})])

/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/collections/__init__.py in __getitem__(self, key)
    989         if hasattr(self.__class__, "__missing__"):
    990             return self.__class__.__missing__(self, key)
--> 991         raise KeyError(key)
    992     def __setitem__(self, key, item): self.data[key] = item
    993     def __delitem__(self, key): del self.data[key]

KeyError: 0

(we use UserDict since pandas had some troubles with just regular dicts.

jorisvandenbossche · 2018-04-24T19:43:16Z

Quick general comment: why make a EA specific take method instead of just a general one that works for both ndarrays and EAs? (and for EA it would just dispatch to the EA).

I am also not sure if it is worth to add the _values_for_take. If we have a general purpose take function, an ExtensionArray.take implementation could simply be (assuming self.data is the underlying numpy array):

           def take(self, indexer, fill_value=None):
               ... somehow handle default fill_value ...
               result = take(self.data, indexer, fill_value=fill_value)
               return self.__class__(result)

I think it is not too much asked for an ExtensionArray author to implement this themselves.

There is also a problem with what a public na_value on ExtensionDtype/ExtensionArray means. Is it the value that is used "publicly"? Or is it the value that is used in the underlying array? Eg for geopandas the one is None, the other is 0 (but the 0 nobody should care about, as that is an implementation detail)

TomAugspurger · 2018-04-24T19:51:32Z

Quick general comment: why make a EA specific take method instead of just a general one that works for both ndarrays and EAs?

I think the way take_ea is written, it works on either.

In [1]: from pandas.core.arrays.base import take_ea

In [2]: import numpy as np

In [3]: arr = np.array([1, 2, 3])

In [4]: take_ea(arr, [0, 1, -1])
Out[4]: array([1, 2, 3])

Suggestions for a better name?

I think it is not too much asked for an ExtensionArray author to implement this themselves.

That seems fine.

There is also a problem with what a public na_value on ExtensionDtype/ExtensionArray means.

This can be private.

jorisvandenbossche · 2018-04-24T19:54:59Z

I think the way take_ea is written, it works on either.

The _from_sequence in there is only for EAs

jorisvandenbossche · 2018-04-24T19:57:33Z

How I had it in my head (but which doesn't mean it is better!), was that such a take method that you now wrote in algorithms.py would actually only work on numpy arrays (and on EAs as well, but for those we would dispatch), and then EA.take can call that function again but now with a numpy array of data.
Then the take method in algorithms.py would be "cleaner" in the sense that it doesn't need to be aware of EA's semantics.
But trying to look now in more detail to what you did to better see advantages/disadvantages.

jorisvandenbossche · 2018-04-24T20:04:04Z

pandas/core/algorithms.py

+            return arr._from_sequence([fill_value] * len(indexer))
+
+        result = arr.take(indexer)
+        result[mask] = fill_value


Is there a reason to not use the take_1d machinery here? (maybe there is, or maybe there is no reason to use it since the above is simpler and works fine, but was just wondering)

TomAugspurger · 2018-04-24T20:51:43Z

We should be able to have a clean implementation of take_ea (or take or whatever).

The tricky part is fixing the callers like Block.reindex_indexer / Block.get_reindexed_values / NDFrame.align, to call things correctly. These need to call algos.take(thing, fill_value=fill_value) with the right fill value for that dtype. Things should work, but it's a bit tricky.

TomAugspurger · 2018-04-25T11:04:03Z

Pushed another WIP if anyone has some thoughts on the general approach. The extension tests all pass but I'm still breaking things elsewhere, and I think things can be simplified.

The main idea is to have a sentinel _default_na_value that's used as the default for all reindexing operations (align, etc.). Then in internals, once we have a single-dtype block, we can figure out what default value to actually use based on the block type. In principle, this should work, but I'm struggling with various things, especially around upcasting. Hoping to clean it up today.

jorisvandenbossche · 2018-04-25T11:12:45Z

Regarding _default_fill_value, I have the feeling (again, from only taking a quick look) that we are mixing two different concepts: for take it means "no filling", for reindex it means "default fill value"

TomAugspurger · 2018-04-25T11:19:14Z

that we are mixing two different concepts

That feeling is correct. Is that not what we want? I can re-add the allow_fill argument.

jorisvandenbossche · 2018-04-25T11:29:04Z

That feeling is correct. Is that not what we want? I can re-add the allow_fill argument.

I am going a bit back and forth on it, not sure ...

Regarding what I said above about the dubious meaning of "na_value" (the physical value that is stored, or the public one), I think this is actually not really a problem. It should just be the public one, so the value that you get if you get an item of an EA that is missing (type of arr[0] if the first item is missing). So it is fine to keep this public.

jorisvandenbossche · 2018-04-25T12:08:10Z

Trying to get my head around the current logic. Some pseudo code flow path in case of a Series reindex operation with EA values:

s.reindex(self, ..., fill_value=_default_fill_value)
-> ... -> BlockManager.reindex_indexer(..., fill_value=fill_value)
-> Block.take_nd(..., 
        fill_tuple=(fill_value if fill_value is not _default_fill_value else Block.fill_value,)
-> algos.take_nd(values, ... allow_fill=True, fill_value=fill_tuple[1])
-> [in case of EA] values.take(..., fill_value=fill_value)
   -> if fill_value == _default_fill_value -> no filling
   -> if fill_value == Block.fill_value -> default filling behaviour -> algos.take with passed fill_value

(but in case of a reindex/aligment operation, the final fill_value actively passed to values.take(..) will never be _default_fill_value I think?)

So what I find complex about this: a) as said above different meaning of _default_fill_value throughout the code base in reindex and take, b) the fact that Block is responsible of passing an appropriate filling value to EA.take, while the EA could perfectly know and do that itself and c) that an EA.take implementation needs to be aware of the _default_fill_value object (which means we need to make that public).

TomAugspurger · 2018-04-25T12:41:40Z

EA.take implementation needs to be aware of the _default_fill_value

I think if we use allow_fill, then we can just switch to fill_value=None. The issue with fill_value=None as the default is that an EA may want to use None as their NA value, and so wouldn't be able to tell if it's explicit or the default.

This should be able to be simplified greatly. Let's see...

jorisvandenbossche · 2018-04-25T12:42:31Z

Just brainstorming on an alternative:

s.reindex(self, ..., fill_value=_default_fill_value)
-> ... -> BlockManager.reindex_indexer(..., fill_value=fill_value)
-> Block.take_nd(..., fill_tuple=(fill_value,)
-> algos.take_nd(values, ... allow_fill=True, fill_value=fill_tuple[1])
  -> if fill_value == _default_fill_value -> values.take(..., allow_fill=True) (let EA use its default)
  -> else -> values.take(..., allow_fill=True, fill_value=fill_value) (override EA default)
-> values.take(..., allow_fill=False, fill_value=None) # can use their own default for fill_value
   -> if allow_fill == False -> no filling
   -> if allow_fill == True -> filling behaviour 
      -> if fill_value == None -> fill_value = EA.dtype.na_value

(I don't know if this scheme is anywhere near clear for somebody that did not write it ...?)

This introduces again the second keyword, but eliminates double usecase of _default_fill_value, and EA.take does not need to be aware of it (the EA author can still use a similar trick if they want to have None as a valid fill_value instead of denoting the default, but that is then up to the EA author).

But of course this is only a single code path, I don't know how it interacts with all other usages of the take machinery.

TomAugspurger · 2018-04-25T12:58:45Z

I think if we use allow_fill, then we can just switch to fill_value=None. The issue with fill_value=None as the default is that an EA may want to use None as their NA value, and so wouldn't be able to tell if it's explicit or the default.

Actually, I don't think that's true. There would still be the ambiguity. Anyway, we can leave that up to the EA author I think...

Your proposal looks sensible. I'll try to implement it.

TomAugspurger · 2018-04-26T15:58:21Z

can you add there a test case with a different fill_value (not None or nan), to make sure it is passed through correctly to EA.take

I don't think we can do that generically, since we don't know what a valid na_value is.

I'll add one in decimal though.

jorisvandenbossche · 2018-04-26T16:03:40Z

I don't think we can do that generically, since we don't know what a valid na_value is.

If you fill it with an existing scalar (like pd.Series(data[:2]).reindex([1, 2], fill_value=data[2])), then we can expect that to work I think.

But it is actually a good question what should happen when you pass a value that is not accepted by the ExtensionArray. I suppose it is up to the EA to decide on that? (raise an error, or coerce to object)

jorisvandenbossche · 2018-04-26T16:35:38Z

One additional point of feedback from geopandas:

From running the extension tests in geopandas, it seems .take(..) still receives fill_value=np.nan quite a few times, so the

if allow_fill and fill_value is None:
    fill_value = self.dtype.na_value (in my case: fill_value = 0

snippet was not sufficient and I needed to add or pd.isna(fill_value) to make the tests pass.

But I don't understand why it is not failing for DecimalArray, as you added an assert to the class constructor that all values passed are actually decimals (so also Decimal('NaN') instead of np.nan), which should catch this problem.

TomAugspurger · 2018-04-26T16:44:51Z

Huh... I'll see what I can cook up. Maybe a dtype thing, since DecimalArray is fundamentally object dtype, but geopandas is integer?

For cyberpandas, I can't quite use this take with my structured array. Fixing that will be a decent bit of work though.

This does raise the question, should fill_value be the high-level NA value (ipaddres.IPv4Address(0)) or the low-level memory storage one (0, 0). I think it'll have to be the low-level one.

jorisvandenbossche · 2018-04-26T16:49:11Z

OK, found the reason: I forgot to set the na_value on the dtype.
But now I did that, I have another failure in pd.merge, which casts my geometries to object. It might be this is because our na_value is None, and somewhere this is then seen as object. But will try to figure that out, you don't need to fix that here in this PR :-)

jorisvandenbossche · 2018-04-26T16:54:26Z

This does raise the question, should fill_value be the high-level NA value (ipaddres.IPv4Address(0)) or the low-level memory storage one (0, 0). I think it'll have to be the low-level one.

So that's the question I raised before (and therefore also questioning whether it should be public), but I have now come to a somewhat conclusion that I think this should be the user-facing scalar ipaddres.IPv4Address(0) and not the low-level (0, 0) (so in case of geopandas: None and not 0). Although it makes it not very useful for actual implementations, it's then more kind of an "informative attribute".
That's also the reason I commented that the "na_value is used in EA.take" explanation in the na_value docstring comment is a bit misleading. As many actual EAs will not use this in the take implementation, they will rather pass the low-level fill_value to take (I suppose your case is mainly complicated because take_1d does not handle struct arrays? For me the new take method works nicely by passing my integer pointers and 0 as the fill_value)

TomAugspurger · 2018-04-26T17:45:52Z

I think this should be the user-facing scalar and not the low-level

That's my conclusion too. I will document this more fully in ExtensionArray.take, as algos.take doesn't really need to worry about that.

I suppose your case is mainly complicated because take_1d does not handle struct arrays?

Yeah. In my case, all of the fields are the same dtype, so I'm able reshape & view as a unit64 ndarray, do the take, and the convert back (without copies). I'll make sure everything works, but all that reshaping is a bit complicated :)

* indexer -> indices * doc user-facing vs physical * assert na_cmps * test reindex w/ non-NA fill_value

TomAugspurger · 2018-04-26T18:39:38Z

I don't think we can do that generically, since we don't know what a valid na_value is.

If you fill it with an existing scalar (like pd.Series(data[:2]).reindex([1, 2], fill_value=data[2])), then we can expect that to work I think.

https://github.com/pandas-dev/pandas/pull/20814/files#diff-0267ba650641bdaf3930400c2478b084R203

I think all your comments were addressed.

jorisvandenbossche · 2018-04-26T21:56:40Z

pandas/core/algorithms.py

@@ -1504,7 +1506,7 @@ def take(arr, indexer, allow_fill=False, fill_value=None):
    >>> from pandas.api.extensions import take

    With the default ``allow_fill=False``, negative numbers indicate
-    slices from the right.
+    positional indicies from the right.


indicies -> indices

jreback · 2018-04-26T22:22:52Z

pandas/core/algorithms.py

+        arr = np.asarray(arr)
+
+    # Do we require int64 or intp here?
+    indices = np.asarray(indices, dtype='int')


should be intp, which is what take accepts for indexers

jreback · 2018-04-26T22:23:05Z

pandas/core/arrays/base.py

-        fill_value : any, default None
-            Fill value to replace -1 values with. If applicable, this should
-            use the sentinel missing value for this type.
+        indices : sequence of integers


can you share this doc?

we want people to look at the source code of this (as we explicitly say the code of this base class is kind of the documentation), so I would certainly try to keep this one here (not sure if it would be easy to reuse it for the take function)

Sharing docs between these two are a bit difficult because of circular import issues. The docstring would have to go in a third module, which isn't desirable.

jreback · 2018-04-26T22:24:22Z

pandas/core/indexing.py

    mask = (indices >= n) | (indices < 0)
    if mask.any():
        raise IndexError("indices are out-of-bounds")
    return indices


+def validate_indices(indices, n):


why is this needed, as opposed to maybe_convert_indices? (maybe refactor that a bit?)

TomAugspurger · 2018-04-27T01:49:11Z

Apparently the random failure in test_merge was not fixed, at least not for all numpy arrays. https://circleci.com/gh/pandas-dev/pandas/13593#tests/containers/2 had a failures.

Hopefully eb43fa4, which is ugly, fixes it for good. I ran test_merge 1000 times both before and after that commit and things pass, so who knows :/

jreback · 2018-04-27T10:38:19Z

pandas/core/dtypes/base.py

+    # na_value is the default NA value to use for this type. This is used in
+    # e.g. ExtensionArray.take. This should be the user-facing "boxed" version
+    # of the NA value, not the physical NA vaalue for storage.
+    na_value = np.nan


maybe even show an example (from JSON?)

jreback · 2018-04-27T10:42:22Z

pandas/core/indexing.py

    mask = (indices >= n) | (indices < 0)
    if mask.any():
        raise IndexError("indices are out-of-bounds")
    return indices


+def validate_indices(indices, n):


so now we have 2 different code paths which do very similar type of validation, I would try to simplify Series._take and NDFrame._take to use these new helper functions, removing the need for maybe_convert_indices entirely.

jorisvandenbossche · 2018-04-27T11:46:38Z

@jreback do you have any other comments?

jreback · 2018-04-27T12:01:59Z

I guess this is fine. I am sure this will spawn many more bugs and would like this to live in master for a while.

jorisvandenbossche · 2018-04-27T13:07:17Z

I am sure this will spawn many more bugs

To what are you referring?

jbrockmendel · 2020-05-01T23:44:47Z

pandas/core/internals.py

@@ -5457,6 +5471,12 @@ def get_empty_dtype_and_na(join_units):
        if blk is None:
            return np.float64, np.nan

+    if is_uniform_reindex(join_units):
+        # XXX: integrate property


@TomAugspurger long shot any recollection what this comment is asking for?

Removing the need for that is_uniform_reindex check by including that case in the main if / elif block (which may be done on master now?)

TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves ExtensionArray Extending pandas with custom dtypes or arrays. labels Apr 24, 2018

TomAugspurger added this to the 0.23.0 milestone Apr 24, 2018

TomAugspurger commented Apr 24, 2018

View reviewed changes

jreback requested changes Apr 24, 2018

View reviewed changes

Moves

dacd98e

jorisvandenbossche reviewed Apr 24, 2018

View reviewed changes

TomAugspurger added 3 commits April 24, 2018 18:02

non-internals changes

0be9ec6

Fixup JSON take

08f2479

More internals hacking

eba137f

TomAugspurger added 3 commits April 25, 2018 07:42

more with default fill value

67ba9dd

Merge remote-tracking branch 'upstream/master' into ea-take

37915e9

Removed default_fill_value

c721915

Updates

fbc4425

* indexer -> indices * doc user-facing vs physical * assert na_cmps * test reindex w/ non-NA fill_value

TomAugspurger added 2 commits April 26, 2018 13:43

doc fixup

f3b91ca

Skip categorical take tests

eecd632

jorisvandenbossche approved these changes Apr 26, 2018

View reviewed changes

jreback requested changes Apr 26, 2018

View reviewed changes

TomAugspurger added 3 commits April 26, 2018 20:04

Doc updates

9a6c7d4

Merge remote-tracking branch 'upstream/master' into ea-take

7c4f625

Really truly fix it hopefully.

eb43fa4

TomAugspurger mentioned this pull request Apr 27, 2018

BUG: Categorical.take and Series([Categorical]).take is inconsistent with other dtypes #20664

Closed

jreback requested changes Apr 27, 2018

View reviewed changes

TomAugspurger added 2 commits April 27, 2018 05:48

Added note

6858409

Updates

ec0cecd

TomAugspurger mentioned this pull request Apr 27, 2018

BUG/DEPR: deprecate Categorical take default behaviour + fix Series[categorical].take #20841

Merged

jreback approved these changes Apr 27, 2018

View reviewed changes

jorisvandenbossche merged commit 2cbdd9a into pandas-dev:master Apr 27, 2018

TomAugspurger deleted the ea-take branch May 9, 2018 16:08

jbrockmendel reviewed May 1, 2020

View reviewed changes

tswast mentioned this pull request Mar 10, 2022

fix: correct TypeError and comparison issues discovered in DateArray compliance tests googleapis/python-db-dtypes-pandas#79

Merged

5 tasks

		@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
		take_1d = take_nd


		def take_ea(arr, indexer, fill_value=_default_fill_value):

		@@ -81,18 +81,32 @@ def test_merge(self, data, na_value):


		class TestGetitem(base.BaseGetitemTests):
		skip_take = pytest.mark.skip(reason="GH-20664.")

ExtensionArray.take default implementation #20814

ExtensionArray.take default implementation #20814

Conversation

TomAugspurger commented Apr 24, 2018

pep8speaks commented Apr 24, 2018 • edited Loading

Comment last updated on April 27, 2018 at 11:03 Hours UTC

TomAugspurger left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Apr 24, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger Apr 24, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Apr 24, 2018

jorisvandenbossche commented Apr 24, 2018

TomAugspurger commented Apr 24, 2018

jorisvandenbossche commented Apr 24, 2018

jorisvandenbossche commented Apr 24, 2018

Choose a reason for hiding this comment

TomAugspurger commented Apr 24, 2018

TomAugspurger commented Apr 25, 2018

jorisvandenbossche commented Apr 25, 2018

TomAugspurger commented Apr 25, 2018

jorisvandenbossche commented Apr 25, 2018

jorisvandenbossche commented Apr 25, 2018

TomAugspurger commented Apr 25, 2018

jorisvandenbossche commented Apr 25, 2018

TomAugspurger commented Apr 25, 2018

TomAugspurger commented Apr 26, 2018

jorisvandenbossche commented Apr 26, 2018

jorisvandenbossche commented Apr 26, 2018

TomAugspurger commented Apr 26, 2018

jorisvandenbossche commented Apr 26, 2018

jorisvandenbossche commented Apr 26, 2018 • edited Loading

TomAugspurger commented Apr 26, 2018 • edited Loading

TomAugspurger commented Apr 26, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Apr 27, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche commented Apr 27, 2018

jreback commented Apr 27, 2018

jorisvandenbossche commented Apr 27, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Apr 24, 2018 •

edited

Loading

TomAugspurger Apr 24, 2018 •

edited

Loading

jorisvandenbossche commented Apr 26, 2018 •

edited

Loading

TomAugspurger commented Apr 26, 2018 •

edited

Loading

TomAugspurger commented Apr 26, 2018 •

edited

Loading