Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is pd.Index.union not commutative? #23525

Closed
aa1371 opened this issue Nov 6, 2018 · 9 comments · Fixed by #23538
Closed

Why is pd.Index.union not commutative? #23525

aa1371 opened this issue Nov 6, 2018 · 9 comments · Fixed by #23538
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type
Milestone

Comments

@aa1371
Copy link
Contributor

aa1371 commented Nov 6, 2018

Code Sample, a copy-pastable example if possible

>>> ix = pd.Index([1,2])
>>> eix = pd.Index([])
>>> pi = pd.PeriodIndex(['19910905', '19910906'], freq='D')

# Pair 1
>>> pi.union(eix)
ValueError: can only call with other PeriodIndex-ed objects
>>> eix.union(pi)
PeriodIndex(['1991-09-05', '1991-09-06'], dtype='period[D]', freq='D')

# Pair 2
>>> pi.union(ix)
ValueError: can only call with other PeriodIndex-ed objects
>>> ix.union(pi)
Index([1, 2, 1991-09-05, 1991-09-06], dtype='object')

Problem description

Conceptually I would imagine a union operation to be commutative. I was just wondering if there was an deliberate rationale behind not implementing pd.Index._assert_can_do_setop to only fail if the complementary self._assert_can_do_setop also fails.

This behavior also leads to some unexpected behaviors in pd.concat. For example:

>>> df1 = df1 = pd.DataFrame([[1,2,3],[1,2,3]], index=pd.PeriodIndex(['19910905', '19910906'], freq='D'))
>>> df2 = pd.DataFrame()
>>> pd.concat([df1, df2], axis=1, keys=['a', 'b'])
ValueError: can only call with other PeriodIndex-ed objects
>>> pd.concat([df2, df1], axis=1, keys=['a', 'b'])
Works!

Additionally (and perhaps this should be raised as a separate issue) should the specific implementation of pd.PeriodIndex._assert_can_do_setop not raise if the other index is empty? Since pd.Index([]).union(<instance of pd.PeriodIndex>) results in an instance of pd.PeriodIndex.

@gfyoung gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves API Design Period Period data type labels Nov 6, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 6, 2018

cc @jreback

@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

we don't ignore empties. actually these should all convert to object dtype (and work).

@TomAugspurger
Copy link
Contributor

Agreed.

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Nov 6, 2018
@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Nov 6, 2018
@aa1371
Copy link
Contributor Author

aa1371 commented Nov 7, 2018

@jreback - just to clarify - are you suggesting that:

  1. the union of an empty index with any other index should not result in an index of the same type as the "other" index.
    e.g.
>>> pd.Index([]).union(pd.period_range('19910905', periods=2))
Current: PeriodIndex(['1991-09-05', '1991-09-06'], dtype='period[D]', freq='D')
Desired: Index([1991-09-05, 1991-09-06], dtype='object')
>>> pd.Index([]).union(pd.interval_range(start=0, end=5))
Current: IntervalIndex([(0, 1], (1, 2]]
                       closed='right',
                       dtype='interval[int64]')
Desired: Index([(0, 1], (1, 2]], dtype='object')
  1. Any index should be able to form a union with any other type, as long as the resultant dtype is object?
    e.g.
>>> pd.period_range('19910905', periods=2).union(pd.Index([1,2,3]))
Current: ValueError
Desired: Index([1991-09-05, 1991-09-06, 1, 2, 3], dtype='object')

2b) And should this only be the case for which the _assert_can_do_setop is valid for at least one of the indexes?
e.g.
Consider the following pairs:
Index/PeriodIndex - Index can do setop with PeriodIndex, but not vice-versa.
IntervalIndex/PeriodIndex - neither index can do a setop with the other - should this still work and result in an object dtype?

@jreback
Copy link
Contributor

jreback commented Nov 7, 2018

yes to both

@aa1371
Copy link
Contributor Author

aa1371 commented Nov 7, 2018

Thoughts on 2b?

@jreback
Copy link
Contributor

jreback commented Nov 7, 2018

no

like indexes are combined to form the same type
unlike are coerced to object and then combine

it’s likely that the coercion logic is not fully handling things

we should not have any special cases - these are general ops

@aa1371
Copy link
Contributor Author

aa1371 commented Nov 9, 2018

@jreback - how should the union of Int64Index and RangeIndex behave ideally? Should they also result in an object dtype or behave the same as they do now?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 9, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants