ENH - Index set operation modifications to address issue #23525 #23538

Merged
merged 75 commits into from
May 21, 2019

Conversation

aa1371
Contributor

@aa1371 aa1371 commented Nov 7, 2018

This is a first pass at addressing the associated issue. Needs plenty of discussion, feedback and testing.

…empty indexes, and allow more cross index operations
@pep8speaks

pep8speaks commented Nov 7, 2018

Hello @ArtinSarraf! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-21 00:53:39 UTC

@gfyoung added the Dtype Conversions, API Design, Period, and Indexing labels Nov 7, 2018
@jreback
Contributor

jreback commented Nov 7, 2018

i would write tests before writing any code

@aa1371
Contributor Author

aa1371 commented Nov 11, 2018

@jreback / @TomAugspurger / @gfyoung - Added tests for combinations of mismatched types, as well as for compatible inconsistent pairs (i.e. RangeIndex/Int64Index). Existing tests have been updated to account for the change in behavior, and the full pytest pandas/tests run passes.

Any feedback on whether the changes are on the right track?

Contributor

@jreback jreback left a comment

@ArtinSarraf on good track.

pandas/core/indexes/category.py (outdated, resolved)
pandas/core/indexes/base.py (outdated, resolved)
pandas/core/indexes/base.py (outdated, resolved)

# if is_dtype_equal(self.dtype, other.dtype):
Contributor

can you put some comments here on the decisions here?

Contributor Author

All subclass implementations started with this check, so I thought it would make sense to pull it out into the common union method and implement index-specific overridden behavior in _union. I do still override the union methods in subclasses to account for docstring changes, though. I was planning on coming up with a better way to override the docstring without having to override the method and call super. Is there a good way to do this that is already used within pandas?
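The split described in this reply (a shared dtype check in union, type-specific work in _union) can be sketched in plain Python. This is only an illustration of the pattern: BaseIndex and SortedIntIndex are hypothetical stand-ins, not pandas classes, and the fallback to an "object" dtype is an assumption made for the sketch rather than the exact logic in this PR.

```python
class BaseIndex:
    """Hypothetical stand-in for a base index class (not pandas' Index)."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def union(self, other):
        """Public entry point: shared dtype check, then dispatch to _union."""
        if self.dtype != other.dtype:
            # Assumed fallback for mismatched dtypes: treat both sides as
            # generic "object" indexes and use the base-class behaviour.
            left = BaseIndex(self.values, "object")
            right = BaseIndex(other.values, "object")
            return left._union(right)
        # Same dtype: let the subclass decide how to combine the values.
        return self._union(other)

    def _union(self, other):
        """Default same-dtype behaviour: keep first-seen order, drop duplicates."""
        merged = list(dict.fromkeys(self.values + other.values))
        return type(self)(merged, self.dtype)


class SortedIntIndex(BaseIndex):
    """Hypothetical subclass that overrides only the type-specific step."""

    def __init__(self, values, dtype="int64"):
        super().__init__(values, dtype)

    def _union(self, other):
        # Integer-specific behaviour: return a sorted, de-duplicated result.
        return SortedIntIndex(sorted(set(self.values) | set(other.values)))
```

With this layout a subclass never repeats the dtype check, and a mixed-type call such as SortedIntIndex([3, 1]).union(BaseIndex(["x"], "str")) gives an "object" result regardless of which side the call is made on.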

pandas/core/indexes/interval.py (outdated, resolved)
pandas/core/indexes/period.py (resolved)
pandas/core/indexes/range.py (outdated, resolved)
return pd.Index([])


INDEXES = dict(
Contributor

we already have fixtures in pandas/tests/indexes/conftest.py pls use them instead of creating new ones like this. you may need to create derived fixtures which is ok.

Contributor Author

will look into it, thanks

Contributor Author

The existing fixture used pre-instantiated indexes. Added a fixture for the index factories.

pandas/tests/indexes/test_setops.py (resolved)
pandas/tests/reshape/test_concat.py (resolved)
@jreback
Contributor

jreback commented Nov 11, 2018

when you push always rebase against master.

@aa1371
Contributor Author

aa1371 commented Nov 13, 2018

@jreback - regarding moving setop tests into the base level:
Right now all the setop tests are specific to an Index type and are segregated into their own respective index subdirectory. Combining them all together would create a 1000+ line test module with mixed test styles which might be harder to parse through than the separate files. Should we leave as is, or would one module with clear section headers be sufficient?

Contributor

@jreback jreback left a comment

@ArtinSarraf looks really good. some small comments. pls merge master and ping on green.

cases = [klass(second.values) for klass in [np.array, Series, list]]
for case in cases:
    print('hi')
Contributor

remove the print!

Contributor Author

done

@@ -30,10 +30,14 @@ def test_union2(self, sort):
tm.assert_index_equal(union, everything)

# GH 10149
expected = first.astype('O').union(
pd.Index(second.values, dtype='O')
).astype('O')
cases = [klass(second.values) for klass in [np.array, Series, list]]
Contributor

ideally can parametrize this here

Contributor Author

done

@@ -120,10 +120,6 @@ def test_union_misc(self, sort):
with pytest.raises(period.IncompatibleFrequency):
index.union(index2, sort=sort)

msg = 'can only call with other PeriodIndex-ed objects'
Contributor

do we have cases where we would raise ever?

Contributor Author

I do not believe so. Only the incompatible frequency error which is covered in the line above.

pandas/tests/indexes/test_setops.py (resolved)
COMPATIBLE_INCONSISTENT_PAIRS.values())
def test_compatible_inconsistent_pairs(idx_fact1, idx_fact2):
# GH 23525
idx1 = idx_fact1(10)
Contributor

were you able to remove any existing tests that are duplicative here? (maybe in test_base? or indexes/test_setops?)

Contributor Author

Considering that the only "compatible inconsistent pair" currently is RangeIndex/Int64Index, it seems that indexes/test_range is much more thorough than the test I have here. It would make more sense to remove the test here.

However, I find this test to be convenient since it is very clear what pairs make up special cases, and adds very little overhead.

@TomAugspurger
Contributor

TomAugspurger commented Mar 21, 2019 via email

@jschendel
Member

What does np.concatenate do here?

Looks like it casts to float64 in every scenario:

In [2]: a1 = np.array([1], dtype='int64')

In [3]: a2 = np.array([2], dtype='uint64')

In [4]: np.concatenate([a1, a2]).dtype
Out[4]: dtype('float64')

In [5]: a3 = np.array([-1], dtype='int64')

In [6]: a4 = np.array([np.iinfo(np.uint64).max], dtype='uint64')

In [7]: np.concatenate([a3, a4]).dtype
Out[7]: dtype('float64')

@aa1371
Contributor Author

aa1371 commented Mar 27, 2019

I feel like this supports the current behavior of single consistent type. However, I guess it also raises the possibility of casting non-consistent numeric types to float64.

Thoughts?
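One consideration for the float64 option raised above is that float64 cannot represent every 64-bit integer exactly. A minimal sketch in plain Python, assuming the built-in float is an IEEE 754 double (the same representation as float64); the helper name survives_float64 is made up for this illustration:

```python
# Same values as np.iinfo(np.uint64).max and np.iinfo(np.int64).max
UINT64_MAX = 2**64 - 1
INT64_MAX = 2**63 - 1


def survives_float64(n: int) -> bool:
    """Return True if integer n converts to a float and back unchanged."""
    return int(float(n)) == n


# Small values round-trip, so the first example (1 and 2) loses nothing.
small_ok = survives_float64(1) and survives_float64(2) and survives_float64(-1)

# Values near the top of either integer range do not round-trip.
uint_max_ok = survives_float64(UINT64_MAX)
int_max_ok = survives_float64(INT64_MAX)
```

Here small_ok is True while uint_max_ok and int_max_ok are both False, so a blanket cast to float64 would silently change large index labels such as the uint64 maximum used in the second example.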

@TomAugspurger
Contributor

TomAugspurger commented Mar 27, 2019 via email

@aa1371
Contributor Author

aa1371 commented Mar 28, 2019

@TomAugspurger - what does “deprecation cycle” mean in this context? Also, do you think there should be any action related to that in this PR?

@TomAugspurger
Contributor

TomAugspurger commented Mar 28, 2019 via email

Contributor

@jreback jreback left a comment

just a couple of very minor comments (except the last one), ping on green.

@@ -29,11 +29,20 @@ def test_union2(self, sort):
union = first.union(second, sort=sort)
tm.assert_index_equal(union, everything)

@pytest.mark.parametrize("klass", [np.array, Series, list])
Contributor

minor nit, we usually call this box rather than klass
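For reference, a parametrized test using the box name might look like the sketch below. It assumes pandas, NumPy, and pytest are installed; the test name and the sample data are illustrative and are not the test added in this PR.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("box", [np.array, pd.Series, list])
def test_union_accepts_boxed_other(box):
    # Illustrative data, not the fixtures used in the PR's test module.
    first = pd.Index([1, 2, 3])
    second = pd.Index([3, 4, 5])

    # Wrap the right-hand values in each container type before the union.
    result = first.union(box(second.values))

    expected = pd.Index([1, 2, 3, 4, 5])
    pd.testing.assert_index_equal(result, expected)
```

Each container type then shows up as its own test case in the pytest report, which a loop over cases inside a single test does not give you.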

@pytest.fixture(params=list(it.combinations(indices_list, 2)),
ids=lambda x: type(x[0]).__name__ + type(x[1]).__name__)
def index_pair(request):
'''
Contributor

can you use triple-double quotes

raises=InvalidIndexError,
strict=True)),
ops.rxor,
])
Contributor

is this orthogonal to this PR?

Contributor Author

This is due to the discussion here:
#23538 (comment)

Contributor

ok fair enough.

@jreback
Contributor

jreback commented May 16, 2019

lgtm. if any comments @TomAugspurger

@jreback jreback added this to the 0.25.0 milestone May 16, 2019
@aa1371
Contributor Author

aa1371 commented May 17, 2019

@jreback - tests passed.
@TomAugspurger / @jschendel any other comments?

@jreback
Contributor

jreback commented May 19, 2019

lgtm. @ArtinSarraf can you merge master and ping on green

@aa1371
Contributor Author

aa1371 commented May 21, 2019

@jreback - merged and green

@TomAugspurger
Contributor

Alrighty, let's merge this. Thanks for sticking with it @ArtinSarraf!

Labels
API Design, Dtype Conversions, Indexing, Period
Development

Successfully merging this pull request may close these issues.

Why is pd.Index.union not commutative?
7 participants