-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH - Index set operation modifications to address issue #23525 #23538
Conversation
…empty indexes, and allow more cross index operaions
Hello @ArtinSarraf! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-05-21 00:53:39 UTC |
i would write tests before writing any code |
…coerce mismatched closings on interval indexes
…ng cast to dtype object
@jreback / @TomAugspurger / @gfyoung - Added tests for combinations of mismatched types. As well as for compatible inconsistent pairs (i.e. Range/Int64Index). Existing tests have been updated to account for the change in behavior and all Any feedback on whether the changes are on the right track? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArtinSarraf on good track.
pandas/core/indexes/base.py
Outdated
|
||
# if is_dtype_equal(self.dtype, other.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put some comments here on the decisions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All subclass implementations started with this check, so thought it would make sense to pull this out to be common among all, and implement index specific overriden behavior in the _union
. I do override the union
methods in subclasses to account for docstring changes though. Was planning on coming up with a better way to override the docstring without having to override and make a call to super. Is there any good way to do this already used within pandas?
pandas/tests/indexes/test_setops.py
Outdated
return pd.Index([]) | ||
|
||
|
||
INDEXES = dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have fixtures in pandas/tests/indexes/conftest.py pls use them instead of creating new ones like this. you may need to create derived fixtures which is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will look into it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing fixture used pre-instantiated indexes. Added a fixture for the index factories.
when you push always rebase again master. |
@jreback - regarding moving setop tests into the base level: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArtinSarraf looks really good. some small comments. pls merge master and ping on green.
cases = [klass(second.values) for klass in [np.array, Series, list]] | ||
for case in cases: | ||
print('hi') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the print!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -30,10 +30,14 @@ def test_union2(self, sort): | |||
tm.assert_index_equal(union, everything) | |||
|
|||
# GH 10149 | |||
expected = first.astype('O').union( | |||
pd.Index(second.values, dtype='O') | |||
).astype('O') | |||
cases = [klass(second.values) for klass in [np.array, Series, list]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally can parametrize this here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -120,10 +120,6 @@ def test_union_misc(self, sort): | |||
with pytest.raises(period.IncompatibleFrequency): | |||
index.union(index2, sort=sort) | |||
|
|||
msg = 'can only call with other PeriodIndex-ed objects' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have cases where we would raise ever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not believe so. Only the incompatible frequency error which is covered in the line above.
COMPATIBLE_INCONSISTENT_PAIRS.values()) | ||
def test_compatible_inconsistent_pairs(idx_fact1, idx_fact2): | ||
# GH 23525 | ||
idx1 = idx_fact1(10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
were you able to remove any existing tests that are duplicative here? (maybe in test_base? or indexes/test_setops?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the only "compatible inconsistent pair" currently is RangeIndex/Int64Index it seems that indexes/test_range is much more thorough then this test I have here. It would make more sense to remove the test here.
However, I find this test to be convenient since it is very clear what pairs make up special cases, and adds very little overhead.
What was the resolution of the different, but sometimes compatible dtypes like uint and int? What does np.concatenate do here?
… On Mar 20, 2019, at 20:59, ArtinSarraf ***@***.***> wrote:
@ArtinSarraf commented on this pull request.
In pandas/tests/indexes/test_setops.py:
> + # This is true before this PR as well.
+
+ # Union with a non-unique, non-monotonic index raises error
+ # This applies to the boolean index
+ idx1 = idx1.sort_values()
+ idx2 = idx2.sort_values()
+
+ assert idx1.union(idx2).dtype == np.dtype('O')
+ assert idx2.union(idx1).dtype == np.dtype('O')
+
+
***@***.***('idx_fact1,idx_fact2',
+ COMPATIBLE_INCONSISTENT_PAIRS.values())
+def test_compatible_inconsistent_pairs(idx_fact1, idx_fact2):
+ # GH 23525
+ idx1 = idx_fact1(10)
Considering the only "compatible inconsistent pair" currently is RangeIndex/Int64Index it seems that indexes/test_range is much more thorough then this test I have here. It would make more sense to remove the test here.
However, I find this test to be convenient since it is very clear what pairs make up special cases, and adds very little overhead.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Looks like it casts to In [2]: a1 = np.array([1], dtype='int64')
In [3]: a2 = np.array([2], dtype='uint64')
In [4]: np.concatenate([a1, a2]).dtype
Out[4]: dtype('float64')
In [5]: a3 = np.array([-1], dtype='int64')
In [6]: a4 = np.array([np.iinfo(np.uint64).max], dtype='uint64')
In [7]: np.concatenate([a3, a4]).dtype
Out[7]: dtype('float64') |
I feel like this supports the current behavior of single consistent type. However, I guess it also raises the possibility of casting non-consistent numeric types to float64. Thoughts? |
We typically use object as the “default” type, where numpy uses float. I think for these cases like uint and int, we’ll just want object for the dtype stability. May need a depreciation cycle.
… On Mar 26, 2019, at 23:40, ArtinSarraf ***@***.***> wrote:
I feel like this supports the current behavior of single consistent type. However, I guess it also raises the possibility of casting non-consistent numeric types to float64.
Thoughts?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@TomAugspurger - what does “depreciation cycle” mean in this context. Also do you think there should be any action related to that in this PR? |
Avoiding the behavior change for those dtypes. May need to require users to astype ahead of time.
I haven’t looked at this PR recently.
… On Mar 28, 2019, at 17:56, ArtinSarraf ***@***.***> wrote:
@TomAugspurger - what does “depreciation cycle” mean in this context. Also do you think there should be any action related to that in this PR?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a couple of very minor comments (except the last one), ping on green.
@@ -29,11 +29,20 @@ def test_union2(self, sort): | |||
union = first.union(second, sort=sort) | |||
tm.assert_index_equal(union, everything) | |||
|
|||
@pytest.mark.parametrize("klass", [np.array, Series, list]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, we usually call this box rather than klass
pandas/tests/indexes/test_setops.py
Outdated
@pytest.fixture(params=list(it.combinations(indices_list, 2)), | ||
ids=lambda x: type(x[0]).__name__ + type(x[1]).__name__) | ||
def index_pair(request): | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use triple-double quotes
raises=InvalidIndexError, | ||
strict=True)), | ||
ops.rxor, | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this orthogonal to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the discussion here:
#23538 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fair enough.
lgtm. if any comments @TomAugspurger |
@jreback - tests passed. |
lgtm. @ArtinSarraf can you merge master and ping on green |
@jreback - merged and green |
Alrighty, let's merge this. Thanks for sticking with it @ArtinSarraf! |
This is a first pass at addressing the associated issue. Needs plenty of discussion, feedback and testing.
git diff upstream/master -u -- "*.py" | flake8 --diff