Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: harmonize Index.to_flat_index to MultiIndex.to_flat_index #25094

Closed

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Feb 2, 2019

After "implementing" it (that is, just unifying the method), I'm even more convinced about my idea... but @WillAyd and @h-vetinari feel free to consider this just a stimulus to resurrect (and conclude) the discussion ;-)

@codecov
Copy link

codecov bot commented Feb 2, 2019

Codecov Report

Merging #25094 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25094      +/-   ##
==========================================
+ Coverage   92.37%   92.37%   +<.01%     
==========================================
  Files         166      166              
  Lines       52420    52421       +1     
==========================================
+ Hits        48423    48424       +1     
  Misses       3997     3997
Flag Coverage Δ
#multiple 90.79% <100%> (ø) ⬆️
#single 42.87% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.59% <ø> (-0.01%) ⬇️
pandas/core/indexes/base.py 96.55% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb43726...9070163. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 2, 2019

Codecov Report

Merging #25094 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25094      +/-   ##
==========================================
+ Coverage   92.34%   92.35%   +<.01%     
==========================================
  Files         166      166              
  Lines       52314    52315       +1     
==========================================
+ Hits        48311    48313       +2     
+ Misses       4003     4002       -1
Flag Coverage Δ
#multiple 90.78% <100%> (ø) ⬆️
#single 42.84% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.59% <ø> (-0.01%) ⬇️
pandas/core/indexes/base.py 96.55% <100%> (ø) ⬆️
pandas/util/testing.py 88.05% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 683c7b5...f7935c2. Read the comment docs.

@@ -69,6 +69,9 @@ Fixed Regressions

- Fixed the warning for implicitly registered matplotlib converters not showing. See :ref:`whatsnew_0211.converters` for more (:issue:`24963`).

**MultiIndex**

- :meth:`Index.to_flat_index` now always returns an :class:`Index` containing tuples, even in the case the original object was already a flat :class:`Index` (:issue:`23670`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a significant api change, so would go in 0.25.0. IF we do this (need discussions).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been part of #22866, or at least of 0.24.0, when the method was introduced from scratch. Too late for that, but let's at least not expose for long a behavior which is inconsistent. Sure, this needs discussion ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean to say Index.to_flat_index is not idempotent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a good thing though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the reasons in the discussion, plus one: one_level_multi_index.to_flat_index() is not idempotent, either.

@h-vetinari
Copy link
Contributor

@toobaz: [...] but @WillAyd and @h-vetinari feel free to consider this just a stimulus to resurrect (and conclude) the discussion ;-)

A bit late to the party here, but I still don't understand the need for flattening MIs. I looked at the original issue and the usecase there (squashing MI-columns for csv output) could IMO more idiomatically be solved by something like MI.str.cat(', ')

Of course, the latter current does not work yet (see also #23679), but I continue to think that tuples should not be usable in indexes. They are already necessary to "talk" to MIs, which makes their use ambiguous, plus it's incredibly hard to get right (and we don't, for large parts of the code base, including basic indexing), not least because it needs so much special casing for interacting with numpy (which treats tuples as arrays by default). See #24688

@toobaz
Copy link
Member Author

toobaz commented Feb 4, 2019

I continue to think that tuples should not be usable in indexes

Will comment on #24688 in few seconds. But this PR assumes that to_flat_index exists, and tries to fix it. This is a simple PR, worth doing even if to_flat_index disappears in a couple of releases. Doesn't need all that discussion.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz pls in this issue lay out why you think this is a good think. len 1 MultiIndexes, while supported are not very useful. This elevates them to a first class citizen, not a good thing.

@toobaz
Copy link
Member Author

toobaz commented Feb 4, 2019

len 1 MultiIndexes, while supported are not very useful. This elevates them to a first class citizen, not a good thing.

No, this PR doesn't change 1 level MultiIndexes in any way.

@toobaz pls in this issue lay out why you think this is a good think

This is quite clear in the issue: #23670

Anyway, in short, it's really simple:

  1. Index.to_flat_index() doesn't make much sense per se - the index is already flat
  2. we support it only for compatibility with MultiIndex
  3. ... but in the current form, it is not compatible. The output for MultiIndex with n levels is an Index with tuples of length n
  4. So the output for a flat Index, which has 1 level, should be an Index with tuples of length 1
  5. .. which is incidentally the output on a MultiIndex with 1 level. Which is good, because 1-level MultiIndexes should behave like flat Indexes (I couldn't care less about whether they are popular, encouraged or blamed: as long as they are allowed, we want them to behave consistently)

This is really a simple thing, even @WillAyd is now neutral. Please provide any reason not to merge this, or let's proceed.

@toobaz toobaz force-pushed the flat_index_to_flat_index_23670 branch from 9070163 to f7935c2 Compare February 7, 2019 17:22
@WillAyd
Copy link
Member

WillAyd commented Apr 10, 2019

Closing for now - I think it's worth continuing conversation in issue before updating any PRs on this

@WillAyd WillAyd closed this Apr 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rethinking to_flat_index on flat Index
4 participants