Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Internal / External values #19558

Merged
merged 42 commits into from
Feb 13, 2018
Merged
Show file tree
Hide file tree
Changes from 39 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
41f09d8
REF/Clean: Internal / External values
TomAugspurger Feb 3, 2018
29cfd7c
Move to index base
TomAugspurger Feb 6, 2018
3185f4e
Cleanup unique handling
TomAugspurger Feb 7, 2018
5a59591
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 7, 2018
476f75d
Simplify object concat
TomAugspurger Feb 7, 2018
b15ee5a
Use values for intersection
TomAugspurger Feb 7, 2018
659073f
hmm
TomAugspurger Feb 7, 2018
7accb67
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 8, 2018
9b8d2a5
Additional testing
TomAugspurger Feb 8, 2018
9fbac29
More tests
TomAugspurger Feb 8, 2018
55305dc
ndarray_values
TomAugspurger Feb 8, 2018
0e63708
API: Default ExtensionArray.astype
TomAugspurger Feb 8, 2018
fbbbc8a
Simplify concat_as_object
TomAugspurger Feb 8, 2018
46a0a49
Py2 compat
TomAugspurger Feb 8, 2018
2c4445a
Set-ops ugliness
TomAugspurger Feb 8, 2018
5612cda
better docstrings
TomAugspurger Feb 8, 2018
b012c19
tolist
TomAugspurger Feb 8, 2018
d49e6aa
linting
TomAugspurger Feb 8, 2018
d7d31ee
Moved dtypes
TomAugspurger Feb 9, 2018
7b89f1b
clean
TomAugspurger Feb 9, 2018
b0dbffd
cleanup
TomAugspurger Feb 9, 2018
66b936f
NumPy compat
TomAugspurger Feb 9, 2018
32ee0ef
Use base _values for CategoricalIndex
TomAugspurger Feb 9, 2018
a9882e2
Update dev docs
TomAugspurger Feb 9, 2018
f53652a
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 9, 2018
2425621
cleanup
TomAugspurger Feb 9, 2018
512fb89
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 9, 2018
170d0c7
Linting
TomAugspurger Feb 9, 2018
402620f
Precision in tests
TomAugspurger Feb 9, 2018
d9e8dd6
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 9, 2018
815d202
Push _ndarray_values to ExtensionArray
TomAugspurger Feb 11, 2018
a727b21
Clean up tolist
TomAugspurger Feb 11, 2018
f368c29
Move test locations
TomAugspurger Feb 11, 2018
d74c5c9
Fixed test
TomAugspurger Feb 12, 2018
8104ee5
REF: Update per comments
TomAugspurger Feb 12, 2018
f8e29b9
lint
TomAugspurger Feb 12, 2018
0cd9faa
REF: Use _values for size and shape
TomAugspurger Feb 12, 2018
8fcdb70
PERF: Implement size, shape for IntervalIndex
TomAugspurger Feb 12, 2018
34a6a22
PERF: Avoid materializing values for PeriodIndex shape, size
TomAugspurger Feb 12, 2018
c233c28
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 13, 2018
d6e8051
Cleanup
TomAugspurger Feb 13, 2018
3af8a21
Override nbytes
TomAugspurger Feb 13, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions doc/source/internals.rst
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,25 @@ not check (or care) whether the levels themselves are sorted. Fortunately, the
constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
if you compute the levels and labels yourself, please be careful.

Values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add section tags

~~~~~~

Pandas extends NumPy's type system with custom types, like ``Categorical`` or
datetimes with a timezone, so we have multiple notions of "values". For 1-D
containers (``Index`` classes and ``Series``) we have the following convention:

* ``cls._ndarray_values`` is *always* a NumPy ``ndarray``. Ideally,
``_ndarray_values`` is cheap to compute. For example, for a ``Categorical``,
this returns the codes, not the array of objects.
* ``cls._values`` refers is the "best possible" array. This could be an
``ndarray``, ``ExtensionArray``, or in ``Index`` subclass (note: we're in the
process of removing the index subclasses here so that it's always an
``ndarray`` or ``ExtensionArray``).

So, for example, ``Series[category]._values`` is a ``Categorical``, while
``Series[category]._ndarray_values`` is the underlying codes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section does not belong in the same document as "how to subclass", as this part is really internals for contributors to pandas.
So I would maybe split this in two: external vs internal details, where the future documentation on how to use and define ExtensionArrays would also go into the external details part (maybe "Extending Pandas" would be a good title, with then information about the different strategies: accessor, extension array, subclassing, ..)

But that's for a separate PR (I can start with that), so for here the above is fine for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal-internals.rst and external-internals.rst ;)

I think that "How to subclass", and the eventual "extending pandas with custom array types" would be better in developer.rst, which is labeled as "This section will focus on downstream applications of pandas.".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I didn't see that the accessor documentation is actually already there (although, I personally don't find the parquet section that fitting in there, as it is not something typical you need to know when extending pandas. I can already start with moving it to the bottom of the file :-))



.. _ref-subclassing-pandas:

Subclassing pandas Data Structures
Expand Down
12 changes: 12 additions & 0 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -266,3 +266,15 @@ def _can_hold_na(self):
Setting this to false will optimize some operations like fillna.
"""
return True

@property
def _ndarray_values(self):
# type: () -> np.ndarray
"""Internal pandas method for lossy conversion to a NumPy ndarray.

This method is not part of the pandas interface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit strange that we say this is not part of the interface, but still provide here a default implementation and say what it is.

If it is not part of the interface, we could also define it on our own subclasses without defining it here.
But on the other hand, that will raise errors when an external extension array does not have this method. Eg some attributes on Series will call into this, which makes it somehow part of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit strange that we say this is not part of the interface, but still provide here a default implementation and say what it is.

This simplifies the implementation since

a.) we don't have to have a PandasExtensionArray that subclasses ExtensionArray, just to define this method.
b.) we can safely call ._values anywhere in our code without checking whether it's an extension array or a pandas extension array.

My preference is to leave it out of the interface until someone sees an actual need for it. I suspect this could come up if / when we start allowing custom indexes with their own indexing engines, but that seems like a ways down the road...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand those reasons, but that means it is part of the interface (in the sense that if somebody would be stupid to implement a _ndarray_values property that returns something differently, it will break code?)
It's just a part of the interface that we ask not to implement?

To give an example, for GeoPandas, I was wondering if I would overwrite this property to return my self.data (the integer pointers) instead of a materialized array (which can be costly).
For things like Series.strides, Series.itemsize, .. this will be OK (and in principle no user should call this anyway .. so maybe I should not worry about it), but if it is used in other places as well that I am not really sure about, such an implementation might actually give a wrong result (but at the same time, if this is called in certain places for my column with an extension array, I want to know that, because materializing can be costly and I want to avoid this as much as possible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Your GeoPandas concern is a good one. But I'm not sure how to proceeded :/ Do you have a preference for

a.) Adding it to the interface
b.) Leaving it out of the interface, with a default implementation on EA
c.) Making a pandas-specific EA that defines _ndarray_values, so uses of _ndarray_values will need to check for that attr.

._values / ._ndarray_values are meant to be internal, and I don't want to limit future refactorings with backwards compatibility concerns. I'm also not even sure how to document it beyond what's already there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the best option is to leave it for now? :)

Ideally, the Series machinery should not use _ndarray_values too often, as this would require knowledge of what those values mean (codes, ordinals, ...), which Series does not need to know (if there is knowledge required, it should somehow dispatch the operation to the underlying (extension) array itself).

And this is already the case I think. From a quick scan, currently in the PR, I see ndarray_values is mainly used for:

  • indexing (values used in the engine) -> for now we don't support extension arrays to be the data for Index (apart from our own), so this is a worry for later if we want to tackle that
  • Index object operations (eg the setops) -> same as above
  • in some specific cases where we know the parent object (eg we know it is a PeriodIndex, we just want to get the ordinals)
  • for algos (eg take, unique) -> either they dispatch to EA (eg take) or for now the fallback is object array anyhow (eg value_counts), and in general this is something we still need to discuss how to handle in general (factorize, unique, ..)
  • some Series attributes (shape, itemsize, strides, size) -> shape can also be derived from _values directly (like nbytes) as it is part of the EA interface. And from the others, I think size would be nice to not have to materialize the EA (itemsize and strides I don't care about). Can we let this return len(self._values), which is part of the interface? (I am not sure to what extent .size and __len__ on numpy arrays are different / have different performance)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your assessment looks correct. Nothing that's series specific currently uses _ndarray_values, so it's really just code that is Index-specific or index / series agnostic.

I can change .shape and .size to use ._values instead of instead of ._ndarary values. I'll just have to override it in DatetimeIndex to avoid a recursion error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with @jorisvandenbossche assessment here. maybe if we can ultimatley remove this would be good, and simply dispatch to the array object. If Index is a proper EA then this would be possible.


The expectation is that this is cheap to compute, and is primarily
used for interacting with our indexers.
"""
return np.array(self)
4 changes: 4 additions & 0 deletions pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -410,6 +410,10 @@ def dtype(self):
"""The :class:`~pandas.api.types.CategoricalDtype` for this instance"""
return self._dtype

@property
def _ndarray_values(self):
return self.codes

@property
def _constructor(self):
return Categorical
Expand Down
24 changes: 18 additions & 6 deletions pandas/core/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@
is_list_like,
is_scalar,
is_datetimelike,
is_extension_type)
is_extension_type,
is_extension_array_dtype)

from pandas.util._validators import validate_bool_kwarg

Expand Down Expand Up @@ -738,17 +739,17 @@ def data(self):
@property
def itemsize(self):
""" return the size of the dtype of the item of the underlying data """
return self._values.itemsize
return self._ndarray_values.itemsize

@property
def nbytes(self):
""" return the number of bytes in the underlying data """
return self._values.nbytes
return self.values.nbytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not ._values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what we do with PeriodIndex.values and IntervalIndex.values.

If we change the public .values to be an ExtensionArray then there's no difference.

If we leave the public .values as an ndarray of object, then we prefer ._values since it doesn't materialize the ndarray.

So far we've put off that decision, but we'll need to make a choice soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to override this in DatetimeIndex (tz) then for now, this is the reason CI is failing (recursion error)

On the original comment: I think it is better for now to use ._values instead of .values in such cases. .values is the public one that might not always be consistent for backwards compatibility, so let's internally use always _values to have something certainly consistent? (and if we decide .values to return extension arrays, it is just a public wrapper around ._values, and if we decide not, we don't have to change anything internally)


@property
def strides(self):
""" return the strides of the underlying data """
return self._values.strides
return self._ndarray_values.strides

@property
def size(self):
Expand All @@ -768,8 +769,17 @@ def base(self):
return self.values.base

@property
def _values(self):
""" the internal implementation """
def _ndarray_values(self):
"""The data as an ndarray, possibly losing information.

The expectation is that this is cheap to compute, and is primarily
used for interacting with our indexers.

- categorical -> codes
"""
# type: () -> np.ndarray
if is_extension_array_dtype(self):
return self.values._ndarray_values
return self.values

@property
Expand Down Expand Up @@ -978,7 +988,9 @@ def value_counts(self, normalize=False, sort=True, ascending=False,
def unique(self):
values = self._values

# TODO: Make unique part of the ExtensionArray interface.
if hasattr(values, 'unique'):

result = values.unique()
else:
from pandas.core.algorithms import unique1d
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -927,7 +927,7 @@ def try_timedelta(v):
# will try first with a string & object conversion
from pandas import to_timedelta
try:
return to_timedelta(v)._values.reshape(shape)
return to_timedelta(v)._ndarray_values.reshape(shape)
except Exception:
return v.reshape(shape)

Expand Down
2 changes: 1 addition & 1 deletion pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -1709,7 +1709,7 @@ def is_extension_array_dtype(arr_or_dtype):
from pandas.core.arrays import ExtensionArray

# we want to unpack series, anything else?
if isinstance(arr_or_dtype, ABCSeries):
if isinstance(arr_or_dtype, (ABCIndexClass, ABCSeries)):
arr_or_dtype = arr_or_dtype._values
return isinstance(arr_or_dtype, (ExtensionDtype, ExtensionArray))

Expand Down
6 changes: 4 additions & 2 deletions pandas/core/dtypes/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -488,12 +488,14 @@ def _concat_index_asobject(to_concat, name=None):
concat all inputs as object. DatetimeIndex, TimedeltaIndex and
PeriodIndex are converted to object dtype before concatenation
"""
from pandas import Index
from pandas.core.arrays import ExtensionArray

klasses = ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex
klasses = (ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex,
ExtensionArray)
to_concat = [x.astype(object) if isinstance(x, klasses) else x
for x in to_concat]

from pandas import Index
self = to_concat[0]
attribs = self._get_attributes_dict()
attribs['name'] = name
Expand Down
108 changes: 83 additions & 25 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,12 +31,14 @@
is_object_dtype,
is_categorical_dtype,
is_interval_dtype,
is_period_dtype,
is_bool,
is_bool_dtype,
is_signed_integer_dtype,
is_unsigned_integer_dtype,
is_integer_dtype, is_float_dtype,
is_datetime64_any_dtype,
is_datetime64tz_dtype,
is_timedelta64_dtype,
needs_i8_conversion,
is_iterator, is_list_like,
Expand Down Expand Up @@ -412,7 +414,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):
values = np.array(values, copy=False)
if is_object_dtype(values):
values = cls(values, name=name, dtype=dtype,
**kwargs)._values
**kwargs)._ndarray_values

result = object.__new__(cls)
result._data = values
Expand Down Expand Up @@ -594,6 +596,40 @@ def values(self):
""" return the underlying data as an ndarray """
return self._data.view(np.ndarray)

@property
def _values(self):
# type: () -> Union[ExtensionArray, Index]
# TODO: remove index types as they become is extension arrays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove 'is'

"""The best array representation.

This is an ndarray, ExtensionArray, or Index subclass. This differs
from ``_ndarray_values``, which always returns an ndarray.

Both ``_values`` and ``_ndarray_values`` are consistent between
``Series`` and ``Index``.

It may differ from the public '.values' method.

index | values | _values | _ndarray_values |
----------------- | -------------- -| ----------- | --------------- |
CategoricalIndex | Categorical | Categorical | codes |
DatetimeIndex[tz] | ndarray[M8ns] | DTI[tz] | ndarray[M8ns] |

For the following, the ``._values`` is currently ``ndarray[object]``,
but will soon be an ``ExtensionArray``

index | values | _values | _ndarray_values |
----------------- | --------------- | ------------ | --------------- |
PeriodIndex | ndarray[object] | ndarray[obj] | ndarray[int] |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the _values, shouldn't this be PeriodIndex for now? Similar as for DatetimeTZ now has DTI[tz], and so in the future it would become PeriodArray.

(but not sure where this is currently actually used)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think so, since Series[Period]._values returns an ndarray of objects. Series[datetime-with-TZ]._values is the only special one, and I'm only changing Index._values to match Series._values for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that is because we don't have a PeriodBlock that is backed by a PeriodIndex, as is the case for DatetimeTZ, so at the moment you cannot really have Period values in a series.
I am not saying that this should be the case now, only that I think it might be more logical. But, it might well be that we then first need to add a working "PeriodBlock" (ExtensionBlock backed by PeriodIndex/PeriodArray) to have this work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused what your proposing / saying then. Is it just that ._values being a PeriodIndex would be more consistent with ._values returning a DatetimeIndex?

What we have now "works", but is internally inconsistent. I'm updating Index._values to be consistent with Series._values, and changing uses of .values / ._values / ._ndarray_values to be the correct one for that use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just that ._values being a PeriodIndex would be more consistent with ._values returning a DatetimeIndex?

Yes. So regardless of what ._values currently does or doesn't return for Index/Series. Just from looking at your table, returning PeriodIndex would make more sense IMO, as this is (at this moment) the closest thing to an array-like that preserves the information.
But, I don't know what implication it would have for Series ops that ._values would start returning a PeriodIndex instead of object array of Periods, so it was really just a question, not a request for change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory these should return an Index type for ._values for PI and II, and it might work, but would be disruptive w/o a Block type to hold it. This is why for example we have the is_period_arraylike methods to detect an array of Periods. The holder ATM is a ndarray[object].

IntervalIndex | ndarray[object] | ndarray[obj] | ndarray[object] |

See Also
--------
values
_ndarray_values
"""
return self.values

def get_values(self):
""" return the underlying data as an ndarray """
return self.values
Expand Down Expand Up @@ -664,7 +700,7 @@ def ravel(self, order='C'):
--------
numpy.ndarray.ravel
"""
return self._values.ravel(order=order)
return self._ndarray_values.ravel(order=order)

# construction helpers
@classmethod
Expand Down Expand Up @@ -1597,7 +1633,7 @@ def _constructor(self):
@cache_readonly
def _engine(self):
# property, for now, slow to look up
return self._engine_type(lambda: self._values, len(self))
return self._engine_type(lambda: self._ndarray_values, len(self))

def _validate_index_level(self, level):
"""
Expand Down Expand Up @@ -2228,27 +2264,37 @@ def union(self, other):
other = other.astype('O')
return this.union(other)

# TODO: setops-refactor, clean all this up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this if/then needs to all be completely removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't be removed until setops are fix more generally. Like I said, I have a PR started, but it's a bigger issue than what's on the critical path for ExtensionArray.

Take CategoricalIndex.intersection We relied on CategoricalIndex._outer_indexer(self._values, other._values) raising a TypeError to get the correct result. If I pass CategoricalIndex._ndarray_values, we don't get a type error any more, but we get the wrong result.

There's no way to avoid the if statements until set ops are completely refactored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's do this next then. these if/then/else are really smelly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if might be worth it to put this in a single method now (not sure if its possible) to isolate this code, maybe

def _get_internal_type(other):
     if is_period_dtype(other) ......
          ...
    else:
         ...

if is_period_dtype(self) or is_datetime64tz_dtype(self):
lvals = self._ndarray_values
else:
lvals = self._values
if is_period_dtype(other) or is_datetime64tz_dtype(other):
rvals = other._ndarray_values
else:
rvals = other._values

if self.is_monotonic and other.is_monotonic:
try:
result = self._outer_indexer(self._values, other._values)[0]
result = self._outer_indexer(lvals, rvals)[0]
except TypeError:
# incomparable objects
result = list(self._values)
result = list(lvals)

# worth making this faster? a very unusual case
value_set = set(self._values)
result.extend([x for x in other._values if x not in value_set])
value_set = set(lvals)
result.extend([x for x in rvals if x not in value_set])
else:
indexer = self.get_indexer(other)
indexer, = (indexer == -1).nonzero()

if len(indexer) > 0:
other_diff = algos.take_nd(other._values, indexer,
other_diff = algos.take_nd(rvals, indexer,
allow_fill=False)
result = _concat._concat_compat((self._values, other_diff))
result = _concat._concat_compat((lvals, other_diff))

try:
self._values[0] < other_diff[0]
lvals[0] < other_diff[0]
except TypeError as e:
warnings.warn("%s, sort order is undefined for "
"incomparable objects" % e, RuntimeWarning,
Expand All @@ -2260,7 +2306,7 @@ def union(self, other):
result.sort()

else:
result = self._values
result = lvals

try:
result = np.sort(result)
Expand Down Expand Up @@ -2311,20 +2357,30 @@ def intersection(self, other):
other = other.astype('O')
return this.intersection(other)

# TODO: setops-refactor, clean all this up
if is_period_dtype(self):
lvals = self._ndarray_values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

else:
lvals = self._values
if is_period_dtype(other):
rvals = other._ndarray_values
else:
rvals = other._values

if self.is_monotonic and other.is_monotonic:
try:
result = self._inner_indexer(self._values, other._values)[0]
result = self._inner_indexer(lvals, rvals)[0]
return self._wrap_union_result(other, result)
except TypeError:
pass

try:
indexer = Index(other._values).get_indexer(self._values)
indexer = Index(rvals).get_indexer(lvals)
indexer = indexer.take((indexer != -1).nonzero()[0])
except Exception:
# duplicates
indexer = algos.unique1d(
Index(other._values).get_indexer_non_unique(self._values)[0])
Index(rvals).get_indexer_non_unique(lvals)[0])
indexer = indexer[indexer != -1]

taken = other.take(indexer)
Expand Down Expand Up @@ -2700,7 +2756,7 @@ def get_indexer(self, target, method=None, limit=None, tolerance=None):
raise ValueError('limit argument only valid if doing pad, '
'backfill or nearest reindexing')

indexer = self._engine.get_indexer(target._values)
indexer = self._engine.get_indexer(target._ndarray_values)

return _ensure_platform_int(indexer)

Expand All @@ -2716,12 +2772,13 @@ def _get_fill_indexer(self, target, method, limit=None, tolerance=None):
if self.is_monotonic_increasing and target.is_monotonic_increasing:
method = (self._engine.get_pad_indexer if method == 'pad' else
self._engine.get_backfill_indexer)
indexer = method(target._values, limit)
indexer = method(target._ndarray_values, limit)
else:
indexer = self._get_fill_indexer_searchsorted(target, method,
limit)
if tolerance is not None:
indexer = self._filter_indexer_tolerance(target._values, indexer,
indexer = self._filter_indexer_tolerance(target._ndarray_values,
indexer,
tolerance)
return indexer

Expand Down Expand Up @@ -2812,7 +2869,7 @@ def get_indexer_non_unique(self, target):
self = Index(self.asi8)
tgt_values = target.asi8
else:
tgt_values = target._values
tgt_values = target._ndarray_values

indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
return _ensure_platform_int(indexer), missing
Expand Down Expand Up @@ -3247,16 +3304,17 @@ def _join_multi(self, other, how, return_indexers=True):
def _join_non_unique(self, other, how='left', return_indexers=False):
from pandas.core.reshape.merge import _get_join_indexers

left_idx, right_idx = _get_join_indexers([self._values],
[other._values], how=how,
left_idx, right_idx = _get_join_indexers([self._ndarray_values],
[other._ndarray_values],
how=how,
sort=True)

left_idx = _ensure_platform_int(left_idx)
right_idx = _ensure_platform_int(right_idx)

join_index = np.asarray(self._values.take(left_idx))
join_index = np.asarray(self._ndarray_values.take(left_idx))
mask = left_idx == -1
np.putmask(join_index, mask, other._values.take(right_idx))
np.putmask(join_index, mask, other._ndarray_values.take(right_idx))

join_index = self._wrap_joined_index(join_index, other)

Expand Down Expand Up @@ -3403,8 +3461,8 @@ def _join_monotonic(self, other, how='left', return_indexers=False):
else:
return ret_index

sv = self._values
ov = other._values
sv = self._ndarray_values
ov = other._ndarray_values

if self.is_unique and other.is_unique:
# We can perform much better than the general case
Expand Down Expand Up @@ -3756,7 +3814,7 @@ def insert(self, loc, item):
item = self._na_value

_self = np.asarray(self)
item = self._coerce_scalar_to_index(item)._values
item = self._coerce_scalar_to_index(item)._ndarray_values
idx = np.concatenate((_self[:loc], item, _self[loc:]))
return self._shallow_copy_with_infer(idx)

Expand Down
Loading