ENH: RangeIndex redux #11892

jreback · 2015-12-23T22:10:54Z

closes #939
replaces #9977

ToDo:

test for packers.py
more code review

There is much commentary on the original issue #9977, but in essence RangeIndex is a complete replacement for Int64Index, with all indexing semantics and interop. It is now the default index upon construction. It should be completely transparent to the end user.

It provides a constant memory footprint for any size of index. There is a tiny penalty for indexes of fewer than about 10 elements (which is actually trivial to fix, e.g. we could simply instantiate an Int64Index for these cases). But I think it is more natural to always get a RangeIndex.

One other change here is to assert_index_equal: the exact kw now takes 'equiv' as the default (in addition to a boolean). This allows exact comparisons, except that Int64Index and RangeIndex are considered equivalent (as are string/unicode as inferred types; this was pre-existing).
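
A minimal sketch of the relaxed comparison described above. It assumes a recent pandas release, where the helper is exposed as `pandas.testing.assert_index_equal` (at the time of this PR it lived in `pandas.util.testing`) and where a plain int64 `Index` plays the role of Int64Index. The helper name `strict_comparison_fails` is made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_index_equal

lazy = pd.RangeIndex(3)                      # stores only start/stop/step
dense = pd.Index([0, 1, 2], dtype="int64")   # materialized integer labels

# With exact="equiv" the two index classes are treated as interchangeable,
# so this call is expected to pass silently.
assert_index_equal(lazy, dense, exact="equiv")


def strict_comparison_fails(left, right):
    """Return True if assert_index_equal rejects the pair under exact=True."""
    try:
        assert_index_equal(left, right, exact=True)
    except AssertionError:
        return True
    return False


print(strict_comparison_fails(lazy, dense))
```

Passing `exact=True` keeps the stricter class check for tests that care about which index type they get back.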

In [1]: s = Series(range(5))

In [2]: s.index
Out[2]: RangeIndex(start=0, stop=5, step=1)

In [3]: s.nbytes
Out[3]: 40

In [4]: s.index.nbytes
Out[4]: 72

In [5]: s.index.astype(int).nbytes
Out[5]: 40

In [6]: s = Series(range(100))

In [7]: s.index.astype(int).nbytes
Out[7]: 800

In [8]: s.index.nbytes
Out[8]: 72
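
The session above can be reproduced as a small script. This is a sketch that assumes a recent pandas release, where a plain int64 `Index` stands in for Int64Index. The exact byte count reported for the RangeIndex may differ from the 72 shown above depending on the pandas and Python versions. The function name `index_footprints` is made up for illustration.

```python
import numpy as np
import pandas as pd


def index_footprints(n):
    """Return (RangeIndex nbytes, materialized int64 Index nbytes) for n labels."""
    lazy = pd.RangeIndex(n)
    dense = pd.Index(np.arange(n, dtype=np.int64))
    return lazy.nbytes, dense.nbytes


for n in (5, 100, 1_000_000):
    lazy_bytes, dense_bytes = index_footprints(n)
    # The dense index grows by 8 bytes per label; the RangeIndex only
    # has to describe start, stop and step.
    print(n, lazy_bytes, dense_bytes)
```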

jreback · 2015-12-23T22:11:44Z

cc @jorisvandenbossche @shoyer @sinhrks

jreback · 2015-12-29T21:42:05Z

@shoyer if you can review when you have a chance. as you had a number of comments on the original.

shoyer · 2015-12-29T22:24:49Z

doc/source/whatsnew/v0.18.0.txt

+   In [2]: s.index
+   Out[2]: Int64Index([0, 1, 2, 3, 4], dtype='int64')
+
+.. ipython:: python


add "New behavior:"

jreback · 2015-12-30T02:29:03Z

@shoyer all fixed up

shoyer · 2015-12-30T17:56:26Z

pandas/core/index.py

+            raise TypeError('Invalid to pass a non-int64 dtype to RangeIndex')
+
+        # RangeIndex
+        if isinstance(start, RangeIndex):


I still believe pretty strongly that it's a bad idea to make the public API for the constructor this flexible. You can use Index(range(...)) for this.

you mean the dtype part? or accepting range part?

jreback · 2016-01-13T18:49:10Z

note that I already had removed start,stop,step from the public API as this is an implementation detail (and more like range)

not sure why 1) but having a nice constructor is a problem.

shoyer · 2016-01-13T18:55:48Z

note that I already had removed start,stop,step from the public API as this is an implementation detail (and more like range)

Great. This is like how xrange works on Python 2, but Python 3's range does have start, stop, step as attributes.

not sure why 1) but having a nice constructor is a problem.

Well, we could make the constructor only accept a single array/index/range argument instead, and require another dedicated method from_steps (?) for start, stop, step -- or even skip the dedicated method entirely, requiring passing in a range object to parse it that way. But I think I would prefer the other way around.
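
For reference, the `Index(range(...))` spelling mentioned in this review can be sketched as follows. This assumes a recent pandas release, where passing a `range` to `Index` yields a `RangeIndex`; behavior at the time of this PR may have differed.

```python
import pandas as pd

# Build the index from a range object instead of passing start/stop/step
# to the RangeIndex constructor directly.
from_range = pd.Index(range(0, 10, 2))

# The explicit constructor call that the range-based spelling corresponds to.
explicit = pd.RangeIndex(0, 10, 2)

print(type(from_range).__name__)
print(list(from_range))
print(from_range.equals(explicit))
```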

kawochen · 2016-01-13T20:11:18Z

another attempt to change equals 👊

In [75]: range(0, 9, 2) == range(0, 10, 2)
Out[75]: True

And if the length is 1 you should only use _start
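
The rule hinted at here can be sketched in plain Python. This is not the code from the PR, only an illustration of comparing two ranges by the values they produce, in the same way Python 3's `range.__eq__` behaves. The function name `ranges_equal` is made up for illustration.

```python
def ranges_equal(a: range, b: range) -> bool:
    """Compare two ranges by the sequence of values they generate."""
    if len(a) != len(b):
        return False
    if len(a) == 0:
        # All empty ranges are equal, whatever their start/stop/step.
        return True
    if a.start != b.start:
        return False
    if len(a) == 1:
        # A single element is fully determined by start; stop and step
        # are irrelevant.
        return True
    # Same length and same start: the values match only if the step does.
    return a.step == b.step


print(ranges_equal(range(0, 9, 2), range(0, 10, 2)))
```

Note that `stop` never needs to be compared directly, since it only matters through the length.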

jreback · 2016-01-13T20:54:17Z

@kawochen can you add a commit for this (and some addtl tests)?

kawochen · 2016-01-14T13:24:37Z

Oh OK.

TomAugspurger · 2016-01-15T03:33:47Z

@jreback submitted some tests at jreback#15

jreback · 2016-01-15T15:00:43Z

@TomAugspurger @kawochen incorporated your changes thanks!

ok, ready to go on this.

TomAugspurger · 2016-01-15T15:03:37Z

LGTM (assuming tests pass). Thanks.

jreback · 2016-01-15T15:10:27Z

@TomAugspurger I added the floordiv enhancement to #12034 but low priority :>

@jreback

`RangeIndex(1, 10, 2)` is a memory saving alternative to `Index(np.arange(1, 10,2))`: c.f. pandas-dev#939.

This re-implementation is compatible with the current `Index()` api and is a drop-in replacement for `Int64Index()`. It automatically converts to Int64Index() when required by operations.

At present only for a minimum number of operations the type is conserved (e.g. slicing, inner-, left- and right-joins). Most other operations trigger creation of an equivalent Int64Index (or at least an equivalent numpy array) and fall back to its implementation.

This PR also extends the functionality of the `Index()` constructor to allow creation of `RangeIndexes()` with

```
Index(20)
Index(2, 20)
Index(0, 20, 2)
```

in analogy to

```
range(20)
range(2, 20)
range(0, 20, 2)
```

restore Index() fastpath precedence

Various fixes suggested by @jreback and @shoyer

Cache a private Int64Index object the first time it or its values are required.
Restore Index(5) as error. Restore its test. Allow Index(0, 5) and Index(0, 5, 1).
Make RangeIndex immutable. See start, stop, step properties.
In test_constructor(): check class, attributes (possibly including dtype).
In test_copy(): check that copy is not identical (but equal) to the existing.
In test_duplicates(): Assert is_unique and has_duplicates return correct values.

fix slicing

fix view

Set RangeIndex as default index

* enh: set RangeIndex as default index
* fix: pandas.io.packers: encode() and decode() for RangeIndex
* enh: array argument pass-through
* fix: reindex
* fix: use _default_index() in pandas.core.frame.extract_index()
* fix: pandas.core.index.Index._is()
* fix: add RangeIndex to ABCIndexClass
* fix: use _default_index() in _get_names_from_index()
* fix: pytables tests
* fix: MultiIndex.get_level_values()
* fix: RangeIndex._shallow_copy()
* fix: null-size RangeIndex equals() comparison
* enh: make RangeIndex.is_unique immutable

enh: various performance optimizations

* optimize argsort()
* optimize tolist()
* comment clean-up

jreback · 2016-01-16T17:36:50Z

ok, anything else I suppose can just add to the enhancements list.

bombs away.

ENH: RangeIndex redux

hayd · 2016-01-16T18:38:01Z

💥

shoyer · 2016-01-17T00:18:26Z

@jreback it's great to get this in, but I'm a little disappointed that you merged this over my API objections... next time, can we please discuss such things more thoroughly and actually reach consensus before merging?

jreback · 2016-01-17T00:54:42Z

@shoyer

well your objections would be to make an API change which does not preserve idempotency - so you can certainly raise that change in another issue but it would be a major break with the current Index design

this keeps this enhancement quite straightforward

further this change needs to live in master for a while - since planning on doing an rc in a couple of weeks
letting us see how actual users interact is beneficial

further this was discussed quite a bit - last thing that we need is to have endless debates

shoyer · 2016-01-17T01:14:44Z

@jreback just made a new issue: #12067

I agree with your other points (especially testing), just would have appreciated a chance to raise the issue with a broader audience (like I did now) before you merged over my objection. I understand that this is not final (given that we haven't made a release yet) and that you could possibly be convinced on this (depending on how others chime in), but that wouldn't be obvious to new contributors. Just more generally, I would appreciate it if you tried harder to operate by consensus, per our new governance docs -- as opposed to "s/he who presses the merge button makes all final decisions!" :)

wesm · 2016-01-17T01:56:47Z

I think we can better avoid miscommunication in the future with a "-1, until we resolve X Y Z". It's too bad that GitHub doesn't have a voting system (cf https://github.com/dear-github/dear-github). As a matter of process I've seen in other projects, usually it's someone other than the person who proposed the patch who presses the Merge button, or the Merge button cannot be pressed until a code reviewer explicitly gives a +1 for the patch. For large patches we can try to be better about this.

jreback added the Enhancement, Indexing, and Performance labels Dec 23, 2015

jreback added this to the 0.18.0 milestone Dec 23, 2015

jreback mentioned this pull request Dec 23, 2015

Introduction of RangeIndex #9977

Closed

25 tasks

jreback force-pushed the ri branch 5 times, most recently from 0eaeaa5 to 771e8fb Compare December 27, 2015 15:45

shoyer reviewed Dec 29, 2015

jreback force-pushed the ri branch 4 times, most recently from 4d6739b to 6824bd3 Compare December 30, 2015 02:24

shoyer reviewed Dec 30, 2015

jreback force-pushed the ri branch from cce47f3 to 6d16bd5 Compare January 15, 2016 13:49

This was referenced Jan 15, 2016

oracle tests jreback/pandas#15

Closed

fixed equals, added test cases, shortcut from_range if PY3 jreback/pandas#16

Closed

ARF and others added 6 commits January 16, 2016 10:37

test fixes, enhancements, and code review

1419e8e

DOC: documentation

56cef1b

fixed equals, added test cases, shortcut from_range if PY3

c5255da

floordiv addtl tests

0407502

make floordiv return int64index always

fab291b

jreback force-pushed the ri branch from 9a5dbe4 to fab291b Compare January 16, 2016 15:42

jreback added a commit that referenced this pull request Jan 16, 2016

Merge pull request #11892 from jreback/ri

723a147

ENH: RangeIndex redux

jreback merged commit 723a147 into pandas-dev:master Jan 16, 2016

shoyer mentioned this pull request Jan 17, 2016

API: should the RangeIndex constructor accept start=range_index? #12067

Closed

kawochen mentioned this pull request Feb 9, 2016

ENH: Make use of RangeIndex in DatetimeIndex #12272

Closed

jreback mentioned this pull request Sep 21, 2016

Improve speed and memory usage with simple integer indexes #2420

Closed
