Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtensionArray.take default implementation #20814

Merged
merged 32 commits into from
Apr 27, 2018

Conversation

TomAugspurger
Copy link
Contributor

Implements a take interface that's compatible with NumPy and optionally pandas'
NA semantics.

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import *

In [3]: arr = DecimalArray(['1.1', '1.2', '1.3'])

In [4]: arr.take([0, 1, -1])
Out[4]: DecimalArray(array(['1.1', '1.2', '1.3'], dtype=object))

In [5]: arr.take([0, 1, -1], fill_value=float('nan'))
Out[5]: DecimalArray(array(['1.1', '1.2', Decimal('NaN')], dtype=object))

Closes #20640

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Implements a take interface that's compatible with NumPy and optionally pandas'
NA semantics.

```python
In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import *

In [3]: arr = DecimalArray(['1.1', '1.2', '1.3'])

In [4]: arr.take([0, 1, -1])
Out[4]: DecimalArray(array(['1.1', '1.2', '1.3'], dtype=object))

In [5]: arr.take([0, 1, -1], fill_value=float('nan'))
Out[5]: DecimalArray(array(['1.1', '1.2', Decimal('NaN')], dtype=object))
```

Closes pandas-dev#20640
@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves ExtensionArray Extending pandas with custom dtypes or arrays. labels Apr 24, 2018
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Apr 24, 2018
@pep8speaks
Copy link

pep8speaks commented Apr 24, 2018

Hello @TomAugspurger! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 27, 2018 at 11:03 Hours UTC

Copy link
Contributor Author

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked as WIP for now, because something is going strange with JSONArray. NumPy doesn't like us trying to shove dictionaries into scalar spots of an ndarray I think. Still debugging.

@@ -451,3 +451,5 @@ def is_platform_mac():

def is_platform_32bit():
return struct.calcsize("P") * 8 < 64

_default_fill_value = object()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably isn't the right place for this, but it has to be importable from pandas/arrays/base.py and core/algorithms.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.core.dtypes.missing or pandas.core.missing

@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
take_1d = take_nd


def take_ea(arr, indexer, fill_value=_default_fill_value):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I threw this in core/algorithms.py. Does it make more sense in core/arrays/base.py, either as a standalone method or as a staticmethod on ExtensionArray?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move it. you need a set of methods that are developer exposed for EA authors that are useful in building the EA, but are not pandas internals. core/arrays/base.py with an core/arrays/api.py for exposing them would be appropriate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ended up exposed in pd.api.extensions but it isn't clear to me why. are we sure its needed there?

@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
take_1d = take_nd


def take_ea(arr, indexer, fill_value=_default_fill_value):
"""Extension-array compatible take.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just copy-pased the docstring for now. Could do a shared doc if moved.

Returns
-------
array-like
Must satisify NumPy's `take` semantics.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b.: this doesn't have to be an ndarray. Anything that satisfies NumPy's take (and I suppose masking) semantics will work.

from pandas.core.algorithms import take_ea
from pandas.core.missing import isna

if isna(fill_value):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this if condition could be removed if we changed all the callers of pandas.core.algorithms.take* to specify what kind they want (NumPy or pandas-style). But that's a bit of work right now.

@@ -134,12 +134,29 @@ def test_take(self, data, na_value, na_cmp):

def test_take_empty(self, data, na_value, na_cmp):
empty = data[:0]
result = empty.take([-1])
na_cmp(result[0], na_value)
# result = empty.take([-1])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: should this raise now? Since this is doing NumPy style, as na_value isn't passed to take, then I think it should raise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should then raise (as the indexer is out of bounds?). Numpy and pandas do allow empty takes on empty values:

In [93]: np.array([]).take([])
Out[93]: array([], dtype=float64)

In [94]: pd.Series([]).take([])
Out[94]: Series([], dtype: float64)

In [95]: np.array([]).take([0])
...
IndexError: cannot do a non-empty take from an empty axes.

But we should keep a test like this for the case of allow_fill=True

@@ -81,18 +81,32 @@ def test_merge(self, data, na_value):


class TestGetitem(base.BaseGetitemTests):
skip_take = pytest.mark.skip(reason="GH-20664.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a WIP. Need to fix categorical to warn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doing that in a followup PR...

@TomAugspurger
Copy link
Contributor Author

cc @jorisvandenbossche

@@ -451,3 +451,5 @@ def is_platform_mac():

def is_platform_32bit():
return struct.calcsize("P") * 8 < 64

_default_fill_value = object()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.core.dtypes.missing or pandas.core.missing

@@ -1558,6 +1559,81 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
take_1d = take_nd


def take_ea(arr, indexer, fill_value=_default_fill_value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move it. you need a set of methods that are developer exposed for EA authors that are useful in building the EA, but are not pandas internals. core/arrays/base.py with an core/arrays/api.py for exposing them would be appropriate

fill_value : any, optional
Fill value to use for NA-indicies. This has a few behaviors.

* fill_value is not specified : triggers NumPy's semantics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, re-reading this, I find this very confusing. Why are you overloading fill_value here and not making this a separate parameter?

IOW, (terrible name), but use_numpy_semanatics=False, fill_value=None why not this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this very confusing

Yeah, it's not just you 😆 I'll see how much work it is to implement your suggestion.

Copy link
Contributor Author

@TomAugspurger TomAugspurger Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, there are two things going on here:

  1. np.nan being swapped for ExtensionArray.dtype.na_value.
  2. fill_value being overloaded to switch the meaning of negative indexers.

Which bit is confusing? If I'm able to change things so that fill_value is simply passed through, would that be OK? Then someone specifying ExtensionArray.take(indexer, fill_value=thing) would mean -1 is an NA marker. ExtensionArray.take(indexer) would mean -1 is the last element.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not overloading fill_value is how it is currently, with both allow_fill and fill_value. If we find that clearer, I am fine with going back to that (although it would be nice to only need one keyword for this).

I have more an objection on the np.nan swapping. It would be nice if we could prevent this, but didn't look in detail enough to assess whether this would be possible or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, if we make sure to not pass through a default fill_value, you would in most cases only need to specify allow_fill=True to get pandas semantics, instead of needing to specify both allow_fill and fill_value. So maybe that is not that bad.

Currenlty, eg Series.reindex calls under the hood (via Block.take_nd) take_1d on the Block values, by passing through the Block.fill_value as default fill_value. Could we make sure to not pass anything through in case of ExtensionBlocks, or to let the ExtensionBlock.fill_value be the appropriate value? (to avoid the swapping of np.nan)

@TomAugspurger
Copy link
Contributor Author

Oh fun

In [3]: np.array([{'a': 10}])
Out[3]: array([{'a': 10}], dtype=object)

In [4]: np.array([UserDict({'a': 10})])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-400145814467> in <module>()
----> 1 np.array([UserDict({'a': 10})])

/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/collections/__init__.py in __getitem__(self, key)
    989         if hasattr(self.__class__, "__missing__"):
    990             return self.__class__.__missing__(self, key)
--> 991         raise KeyError(key)
    992     def __setitem__(self, key, item): self.data[key] = item
    993     def __delitem__(self, key): del self.data[key]

KeyError: 0

(we use UserDict since pandas had some troubles with just regular dicts.

@jorisvandenbossche
Copy link
Member

Quick general comment: why make a EA specific take method instead of just a general one that works for both ndarrays and EAs? (and for EA it would just dispatch to the EA).

I am also not sure if it is worth to add the _values_for_take. If we have a general purpose take function, an ExtensionArray.take implementation could simply be (assuming self.data is the underlying numpy array):

           def take(self, indexer, fill_value=None):
               ... somehow handle default fill_value ...
               result = take(self.data, indexer, fill_value=fill_value)
               return self.__class__(result)

I think it is not too much asked for an ExtensionArray author to implement this themselves.

There is also a problem with what a public na_value on ExtensionDtype/ExtensionArray means. Is it the value that is used "publicly"? Or is it the value that is used in the underlying array? Eg for geopandas the one is None, the other is 0 (but the 0 nobody should care about, as that is an implementation detail)

@TomAugspurger
Copy link
Contributor Author

Quick general comment: why make a EA specific take method instead of just a general one that works for both ndarrays and EAs?

I think the way take_ea is written, it works on either.

In [1]: from pandas.core.arrays.base import take_ea

In [2]: import numpy as np

In [3]: arr = np.array([1, 2, 3])

In [4]: take_ea(arr, [0, 1, -1])
Out[4]: array([1, 2, 3])

Suggestions for a better name?

I think it is not too much asked for an ExtensionArray author to implement this themselves.

That seems fine.

There is also a problem with what a public na_value on ExtensionDtype/ExtensionArray means.

This can be private.

@jorisvandenbossche
Copy link
Member

I think the way take_ea is written, it works on either.

The _from_sequence in there is only for EAs

@jorisvandenbossche
Copy link
Member

How I had it in my head (but which doesn't mean it is better!), was that such a take method that you now wrote in algorithms.py would actually only work on numpy arrays (and on EAs as well, but for those we would dispatch), and then EA.take can call that function again but now with a numpy array of data.
Then the take method in algorithms.py would be "cleaner" in the sense that it doesn't need to be aware of EA's semantics.
But trying to look now in more detail to what you did to better see advantages/disadvantages.

return arr._from_sequence([fill_value] * len(indexer))

result = arr.take(indexer)
result[mask] = fill_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not use the take_1d machinery here? (maybe there is, or maybe there is no reason to use it since the above is simpler and works fine, but was just wondering)

@TomAugspurger
Copy link
Contributor Author

We should be able to have a clean implementation of take_ea (or take or whatever).

The tricky part is fixing the callers like Block.reindex_indexer / Block.get_reindexed_values / NDFrame.align, to call things correctly. These need to call algos.take(thing, fill_value=fill_value) with the right fill value for that dtype. Things should work, but it's a bit tricky.

@TomAugspurger
Copy link
Contributor Author

Pushed another WIP if anyone has some thoughts on the general approach. The extension tests all pass but I'm still breaking things elsewhere, and I think things can be simplified.

The main idea is to have a sentinel _default_na_value that's used as the default for all reindexing operations (align, etc.). Then in internals, once we have a single-dtype block, we can figure out what default value to actually use based on the block type. In principle, this should work, but I'm struggling with various things, especially around upcasting. Hoping to clean it up today.

@jorisvandenbossche
Copy link
Member

Regarding _default_fill_value, I have the feeling (again, from only taking a quick look) that we are mixing two different concepts: for take it means "no filling", for reindex it means "default fill value"

@TomAugspurger
Copy link
Contributor Author

that we are mixing two different concepts

That feeling is correct. Is that not what we want? I can re-add the allow_fill argument.

@jorisvandenbossche
Copy link
Member

That feeling is correct. Is that not what we want? I can re-add the allow_fill argument.

I am going a bit back and forth on it, not sure ...

Regarding what I said above about the dubious meaning of "na_value" (the physical value that is stored, or the public one), I think this is actually not really a problem. It should just be the public one, so the value that you get if you get an item of an EA that is missing (type of arr[0] if the first item is missing). So it is fine to keep this public.

@jorisvandenbossche
Copy link
Member

Trying to get my head around the current logic. Some pseudo code flow path in case of a Series reindex operation with EA values:

s.reindex(self, ..., fill_value=_default_fill_value)
-> ... -> BlockManager.reindex_indexer(..., fill_value=fill_value)
-> Block.take_nd(..., 
        fill_tuple=(fill_value if fill_value is not _default_fill_value else Block.fill_value,)
-> algos.take_nd(values, ... allow_fill=True, fill_value=fill_tuple[1])
-> [in case of EA] values.take(..., fill_value=fill_value)
   -> if fill_value == _default_fill_value -> no filling
   -> if fill_value == Block.fill_value -> default filling behaviour -> algos.take with passed fill_value

(but in case of a reindex/aligment operation, the final fill_value actively passed to values.take(..) will never be _default_fill_value I think?)

So what I find complex about this: a) as said above different meaning of _default_fill_value throughout the code base in reindex and take, b) the fact that Block is responsible of passing an appropriate filling value to EA.take, while the EA could perfectly know and do that itself and c) that an EA.take implementation needs to be aware of the _default_fill_value object (which means we need to make that public).

@TomAugspurger
Copy link
Contributor Author

EA.take implementation needs to be aware of the _default_fill_value

I think if we use allow_fill, then we can just switch to fill_value=None. The issue with fill_value=None as the default is that an EA may want to use None as their NA value, and so wouldn't be able to tell if it's explicit or the default.

This should be able to be simplified greatly. Let's see...

@jorisvandenbossche
Copy link
Member

Just brainstorming on an alternative:

s.reindex(self, ..., fill_value=_default_fill_value)
-> ... -> BlockManager.reindex_indexer(..., fill_value=fill_value)
-> Block.take_nd(..., fill_tuple=(fill_value,)
-> algos.take_nd(values, ... allow_fill=True, fill_value=fill_tuple[1])
  -> if fill_value == _default_fill_value -> values.take(..., allow_fill=True) (let EA use its default)
  -> else -> values.take(..., allow_fill=True, fill_value=fill_value) (override EA default)
-> values.take(..., allow_fill=False, fill_value=None) # can use their own default for fill_value
   -> if allow_fill == False -> no filling
   -> if allow_fill == True -> filling behaviour 
      -> if fill_value == None -> fill_value = EA.dtype.na_value

(I don't know if this scheme is anywhere near clear for somebody that did not write it ...?)

This introduces again the second keyword, but eliminates double usecase of _default_fill_value, and EA.take does not need to be aware of it (the EA author can still use a similar trick if they want to have None as a valid fill_value instead of denoting the default, but that is then up to the EA author).

But of course this is only a single code path, I don't know how it interacts with all other usages of the take machinery.

@TomAugspurger
Copy link
Contributor Author

I think if we use allow_fill, then we can just switch to fill_value=None. The issue with fill_value=None as the default is that an EA may want to use None as their NA value, and so wouldn't be able to tell if it's explicit or the default.

Actually, I don't think that's true. There would still be the ambiguity. Anyway, we can leave that up to the EA author I think...

Your proposal looks sensible. I'll try to implement it.

@TomAugspurger
Copy link
Contributor Author

can you add there a test case with a different fill_value (not None or nan), to make sure it is passed through correctly to EA.take

I don't think we can do that generically, since we don't know what a valid na_value is.

I'll add one in decimal though.

@jorisvandenbossche
Copy link
Member

I don't think we can do that generically, since we don't know what a valid na_value is.

If you fill it with an existing scalar (like pd.Series(data[:2]).reindex([1, 2], fill_value=data[2])), then we can expect that to work I think.

But it is actually a good question what should happen when you pass a value that is not accepted by the ExtensionArray. I suppose it is up to the EA to decide on that? (raise an error, or coerce to object)

@jorisvandenbossche
Copy link
Member

One additional point of feedback from geopandas:

From running the extension tests in geopandas, it seems .take(..) still receives fill_value=np.nan quite a few times, so the

if allow_fill and fill_value is None:
    fill_value = self.dtype.na_value (in my case: fill_value = 0 

snippet was not sufficient and I needed to add or pd.isna(fill_value) to make the tests pass.

But I don't understand why it is not failing for DecimalArray, as you added an assert to the class constructor that all values passed are actually decimals (so also Decimal('NaN') instead of np.nan), which should catch this problem.

@TomAugspurger
Copy link
Contributor Author

Huh... I'll see what I can cook up. Maybe a dtype thing, since DecimalArray is fundamentally object dtype, but geopandas is integer?

For cyberpandas, I can't quite use this take with my structured array. Fixing that will be a decent bit of work though.

This does raise the question, should fill_value be the high-level NA value (ipaddres.IPv4Address(0)) or the low-level memory storage one (0, 0). I think it'll have to be the low-level one.

@jorisvandenbossche
Copy link
Member

OK, found the reason: I forgot to set the na_value on the dtype.
But now I did that, I have another failure in pd.merge, which casts my geometries to object. It might be this is because our na_value is None, and somewhere this is then seen as object. But will try to figure that out, you don't need to fix that here in this PR :-)

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Apr 26, 2018

This does raise the question, should fill_value be the high-level NA value (ipaddres.IPv4Address(0)) or the low-level memory storage one (0, 0). I think it'll have to be the low-level one.

So that's the question I raised before (and therefore also questioning whether it should be public), but I have now come to a somewhat conclusion that I think this should be the user-facing scalar ipaddres.IPv4Address(0) and not the low-level (0, 0) (so in case of geopandas: None and not 0). Although it makes it not very useful for actual implementations, it's then more kind of an "informative attribute".
That's also the reason I commented that the "na_value is used in EA.take" explanation in the na_value docstring comment is a bit misleading. As many actual EAs will not use this in the take implementation, they will rather pass the low-level fill_value to take (I suppose your case is mainly complicated because take_1d does not handle struct arrays? For me the new take method works nicely by passing my integer pointers and 0 as the fill_value)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Apr 26, 2018

I think this should be the user-facing scalar and not the low-level

That's my conclusion too. I will document this more fully in ExtensionArray.take, as algos.take doesn't really need to worry about that.

I suppose your case is mainly complicated because take_1d does not handle struct arrays?

Yeah. In my case, all of the fields are the same dtype, so I'm able reshape & view as a unit64 ndarray, do the take, and the convert back (without copies). I'll make sure everything works, but all that reshaping is a bit complicated :)

* indexer -> indices
* doc user-facing vs physical
* assert na_cmps
* test reindex w/ non-NA fill_value
@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Apr 26, 2018

I don't think we can do that generically, since we don't know what a valid na_value is.

If you fill it with an existing scalar (like pd.Series(data[:2]).reindex([1, 2], fill_value=data[2])), then we can expect that to work I think.

https://github.com/pandas-dev/pandas/pull/20814/files#diff-0267ba650641bdaf3930400c2478b084R203

I think all your comments were addressed.

@@ -1504,7 +1506,7 @@ def take(arr, indexer, allow_fill=False, fill_value=None):
>>> from pandas.api.extensions import take

With the default ``allow_fill=False``, negative numbers indicate
slices from the right.
positional indicies from the right.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicies -> indices

arr = np.asarray(arr)

# Do we require int64 or intp here?
indices = np.asarray(indices, dtype='int')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be intp, which is what take accepts for indexers

fill_value : any, default None
Fill value to replace -1 values with. If applicable, this should
use the sentinel missing value for this type.
indices : sequence of integers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you share this doc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want people to look at the source code of this (as we explicitly say the code of this base class is kind of the documentation), so I would certainly try to keep this one here (not sure if it would be easy to reuse it for the take function)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing docs between these two are a bit difficult because of circular import issues. The docstring would have to go in a third module, which isn't desirable.

mask = (indices >= n) | (indices < 0)
if mask.any():
raise IndexError("indices are out-of-bounds")
return indices


def validate_indices(indices, n):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed, as opposed to maybe_convert_indices? (maybe refactor that a bit?)

@TomAugspurger
Copy link
Contributor Author

Apparently the random failure in test_merge was not fixed, at least not for all numpy arrays. https://circleci.com/gh/pandas-dev/pandas/13593#tests/containers/2 had a failures.

Hopefully eb43fa4, which is ugly, fixes it for good. I ran test_merge 1000 times both before and after that commit and things pass, so who knows :/

# na_value is the default NA value to use for this type. This is used in
# e.g. ExtensionArray.take. This should be the user-facing "boxed" version
# of the NA value, not the physical NA vaalue for storage.
na_value = np.nan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe even show an example (from JSON?)

mask = (indices >= n) | (indices < 0)
if mask.any():
raise IndexError("indices are out-of-bounds")
return indices


def validate_indices(indices, n):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now we have 2 different code paths which do very similar type of validation, I would try to simplify Series._take and NDFrame._take to use these new helper functions, removing the need for maybe_convert_indices entirely.

@jorisvandenbossche
Copy link
Member

@jreback do you have any other comments?

@jreback
Copy link
Contributor

jreback commented Apr 27, 2018

I guess this is fine. I am sure this will spawn many more bugs and would like this to live in master for a while.

@jorisvandenbossche jorisvandenbossche merged commit 2cbdd9a into pandas-dev:master Apr 27, 2018
@jorisvandenbossche
Copy link
Member

I am sure this will spawn many more bugs

To what are you referring?

@TomAugspurger TomAugspurger deleted the ea-take branch May 9, 2018 16:08
@@ -5457,6 +5471,12 @@ def get_empty_dtype_and_na(join_units):
if blk is None:
return np.float64, np.nan

if is_uniform_reindex(join_units):
# XXX: integrate property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger long shot any recollection what this comment is asking for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the need for that is_uniform_reindex check by including that case in the main if / elif block (which may be done on master now?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: take interface for (Extension)Array-likes
6 participants