-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: RangeIndex redux #11892
ENH: RangeIndex redux #11892
Conversation
0eaeaa5
to
771e8fb
Compare
@shoyer if you can review when you have a chance. as you had a number of comments on the original. |
In [2]: s.index | ||
Out[2]: Int64Index([0, 1, 2, 3, 4], dtype='int64') | ||
|
||
.. ipython:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add "New behavior:"
4d6739b
to
6824bd3
Compare
@shoyer all fixed up |
raise TypeError('Invalid to pass a non-int64 dtype to RangeIndex') | ||
|
||
# RangeIndex | ||
if isinstance(start, RangeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe pretty strongly that it's a bad idea to make the public API for the constructor this flexible. You can use Index(range(...))
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean the dtype
part? or accepting range
part?
note that I already had removed not sure why 1) but having a nice constructor is a problem. |
Great. This is like how
Well, we could make the constructor only accept a single array/index/range argument instead, and require another dedicated method |
another attempt to change
And if the length is |
@kawochen can you add a commit for this (and some addtl tests)? |
Oh OK. |
@jreback submitted some tests at jreback#15 |
@TomAugspurger @kawochen incorporated your changes thanks! ok, ready to go on this. |
LGTM (assuming tests pass). Thanks. |
@TomAugspurger I added the floordiv enhancement to #12034 but low priority :> |
`RangeIndex(1, 10, 2)` is a memory saving alternative to `Index(np.arange(1, 10,2))`: c.f. pandas-dev#939. This re-implementation is compatible with the current `Index()` api and is a drop-in replacement for `Int64Index()`. It automatically converts to Int64Index() when required by operations. At present only for a minimum number of operations the type is conserved (e.g. slicing, inner-, left- and right-joins). Most other operations trigger creation of an equivalent Int64Index (or at least an equivalent numpy array) and fall back to its implementation. This PR also extends the functionality of the `Index()` constructor to allow creation of `RangeIndexes()` with ``` Index(20) Index(2, 20) Index(0, 20, 2) ``` in analogy to ``` range(20) range(2, 20) range(0, 20, 2) ``` restore Index() fastpath precedence Various fixes suggested by @jreback and @shoyer Cache a private Int64Index object the first time it or its values are required. Restore Index(5) as error. Restore its test. Allow Index(0, 5) and Index(0, 5, 1). Make RangeIndex immutable. See start, stop, step properties. In test_constructor(): check class, attributes (possibly including dtype). In test_copy(): check that copy is not identical (but equal) to the existing. In test_duplicates(): Assert is_unique and has_duplicates return correct values. fix slicing fix view Set RangeIndex as default index * enh: set RangeIndex as default index * fix: pandas.io.packers: encode() and decode() for RangeIndex * enh: array argument pass-through * fix: reindex * fix: use _default_index() in pandas.core.frame.extract_index() * fix: pandas.core.index.Index._is() * fix: add RangeIndex to ABCIndexClass * fix: use _default_index() in _get_names_from_index() * fix: pytables tests * fix: MultiIndex.get_level_values() * fix: RangeIndex._shallow_copy() * fix: null-size RangeIndex equals() comparison * enh: make RangeIndex.is_unique immutable enh: various performance optimizations * optimize argsort() * optimize tolist() * comment clean-up
ok, anything else I suppose can just add to the enhancements list. bombs away. |
💥 |
@jreback it's great to get this in, but I'm a little disappointed that you merged this over my API objections... next time, we can please discuss such things more thoroughly and actually reach consensus before merging? |
well your objections would be to make an API change which does not preserve idempotency - so you can certainly raise that change in another issue but it would be a major break with the current Index design this keeps this enhancement quite straightforward further this change need to live in master for a while - since planning on doing an rc in s couple of weeks further this was discussed quite a bit - last thing that we need is to have endless debates |
@jreback just made a new issue: #12067 I agree with your other points (especially testing), just would have appreciated a chance to raise the issue with a broader audience (like I did now) before you merged over my objection. I understand that this is not final (given that we haven't made a release yet) and that you could possibly be convinced on this (depending on how others chime in), but that wouldn't be obvious to new contributors. Just more generally, I would appreciate it if you tried harder to operate by consensus, per our new governance docs -- as opposed to "s/he who presses the merge button makes all final decisions!" :) |
I think we can better avoid miscommunication in the future with a "-1, until we resolve X Y Z". It's too bad that GitHub doesn't have a voting system (cf https://github.com/dear-github/dear-github). As a matter of process I've seen in other projects, usually it's someone other than the person who proposed the patch who presses the Merge button, or the Merge button cannot be pressed into a code review explicitly gives a +1 for the patch. For large patches we can try to be better about this. |
closes #939
replaces #9977
ToDo:
Much commentary on the original issue #9977
but in essence
RangeIndex
is a complete replacement forInt64Index
, which all indexing semantics and interop. This is now the default indexer upon construction. It should be completely transparent to the end user.It provides a constant memory footprint for any size of index. Their is a tiny penalty for < about 10 elements (which is actually trivial to fix, e.g. we could simply instantiate an
Int64Index
for these cases). But I think it is more natural to always get aRangeIndex
.One other change here is to
assert_index_equal
theexact
kw now takesequiv
as the default (in addition to a boolean) to allow for exact comparisions except forInt64Index
/RangeIndex
are considered equivalent (as are string/unicode as inferred types, this was pre-existing).