Add a more memory-efficient RangeIndex-sort of thing to avoid large arange(N) indexes in some cases #939

wesm · 2012-03-18T23:53:09Z

No description provided.

hayd · 2014-09-02T00:43:53Z

Here's @jtratner's tree for this https://github.com/jtratner/pandas/tree/add-range-index (based off Wes').

(I keep struggling to find it. Perhaps I'll rebase and open a PR; it would be interesting to experiment with this.)

immerrr · 2014-09-02T08:54:20Z

And there's likely some code that can be salvaged from the BlockPlacement class I added when refactoring the block managers.

ARF1 · 2015-04-20T23:05:31Z

I am interested in RangeIndexes as well. Does anybody know what the state of @jtratner's tree is? Is there anything I can do to help push this towards a PR?

jreback · 2015-04-20T23:13:36Z

It's in a reasonable state; I would welcome an updated PR for it.

hayd · 2015-04-20T23:20:40Z

I linked to it above, it's here https://github.com/jtratner/pandas/tree/add-range-index

IIRC it ought to rebase pretty cleanly (it's mainly new files). +1 to a resurrection!

ARF1 · 2015-04-21T13:36:17Z

Ok, I rebased jtratner's tree: https://github.com/ARF1/pandas/tree/range_index

Of course, it was too much to hope that it would pass all the tests. One can always dream...

Having not worked on pandas internals before, I am probably not the best person to move this forward efficiently. If somebody else feels better suited, I would be happy to contribute individual fixes to their tree.

If there are no takers, I will give it a shot myself. Though I will probably need a fair amount of hand-holding to get started.

Running the already written tests, it appears the Index API has changed. Is that possible? Can anybody point me to the PR / issue to help me understand how to adapt the code appropriately?

`pandas/core/range.py#L61-L76` has the following segment in its `__init__()`, which no longer seems possible:

```python
self = np.array([], dtype='int64').view(RangeIndex)
...
self.left = left
```

As I understand it, RangeIndex, as a subclass of Index, used to be an ndarray subclass but no longer is, right?

jreback · 2015-04-21T13:50:42Z

Yes, Index is no longer a subclass of ndarray since 0.15.0. You can look at the recently merged CategoricalIndex for the API.
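Since Index is no longer an ndarray subclass, there is no empty array for `__init__` to `.view()` and then decorate with attributes like `self.left`. A minimal sketch of the alternative, assuming composition instead of subclassing: the hypothetical `RangeIndexSketch` below (not the pandas class) stores only its range parameters, exposes them as read-only properties, and builds the full list of values lazily, caching it on first use. It uses plain Python and the built-in `range` rather than pandas internals, so it illustrates the idea only.

```python
from functools import cached_property


class RangeIndexSketch:
    """Hypothetical range-backed index; not the pandas implementation."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:  # RangeIndexSketch(n) behaves like range(n)
            start, stop = 0, start
        # The built-in range validates its arguments (e.g. rejects step=0)
        # and stores only start, stop and step.
        self._range = range(start, stop, step)

    # Read-only properties: with no setter defined, assigning to them
    # raises AttributeError.
    @property
    def start(self):
        return self._range.start

    @property
    def stop(self):
        return self._range.stop

    @property
    def step(self):
        return self._range.step

    def __len__(self):
        return len(self._range)

    def get_loc(self, key):
        # Position lookup on the range itself; nothing is materialized.
        # Raises ValueError if key is not in the index.
        return self._range.index(key)

    @cached_property
    def values(self):
        # Built only when first requested, then cached on the instance,
        # loosely mirroring the "cache a private Int64Index" idea.
        return list(self._range)


idx = RangeIndexSketch(1, 10, 2)
print(len(idx), idx.get_loc(7))  # 5 3
print("values" in idx.__dict__)  # False: nothing materialized yet
print(idx.values)                # [1, 3, 5, 7, 9]
```

For integer keys, `range.index` computes the position arithmetically, so lookups never trigger materialization; only code that reads `values` pays for the full list.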

jorisvandenbossche · 2015-04-21T13:50:57Z

@ARF1 The easiest thing to do is probably to just open a pull request based on your branch. Then it is easier to comment on the code and give feedback.

ARF1 · 2015-04-21T14:08:26Z

@jorisvandenbossche Ok, I did not want to "pollute" the PR list with unfinished code. The rebased tree is available as PR #9961.

@jreback Thanks. I will take a look and see if I can find my way around...

@jreback

`RangeIndex(1, 10, 2)` is a memory-saving alternative to `Index(np.arange(1, 10, 2))`: cf. pandas-dev#939. This re-implementation is compatible with the current `Index()` API and is a drop-in replacement for `Int64Index()`. It automatically converts to `Int64Index()` when an operation requires it. At present the type is preserved for only a minimal set of operations (e.g. slicing and inner, left and right joins). Most other operations create an equivalent `Int64Index` (or at least an equivalent numpy array) and fall back to its implementation.

This PR also extends the `Index()` constructor to allow creating a `RangeIndex` with

```
Index(20)
Index(2, 20)
Index(0, 20, 2)
```

in analogy to

```
range(20)
range(2, 20)
range(0, 20, 2)
```

* restore Index() fastpath precedence
* Various fixes suggested by @jreback and @shoyer
* Cache a private Int64Index object the first time it or its values are required.
* Restore Index(5) as error. Restore its test. Allow Index(0, 5) and Index(0, 5, 1).
* Make RangeIndex immutable. See start, stop, step properties.
* In test_constructor(): check class, attributes (possibly including dtype).
* In test_copy(): check that the copy is not identical (but equal) to the existing index.
* In test_duplicates(): assert that is_unique and has_duplicates return correct values.
* fix slicing
* fix view
* Set RangeIndex as default index
  * enh: set RangeIndex as default index
  * fix: pandas.io.packers: encode() and decode() for RangeIndex
  * enh: array argument pass-through
  * fix: reindex
  * fix: use _default_index() in pandas.core.frame.extract_index()
  * fix: pandas.core.index.Index._is()
  * fix: add RangeIndex to ABCIndexClass
  * fix: use _default_index() in _get_names_from_index()
  * fix: pytables tests
  * fix: MultiIndex.get_level_values()
  * fix: RangeIndex._shallow_copy()
  * fix: null-size RangeIndex equals() comparison
  * enh: make RangeIndex.is_unique immutable
* enh: various performance optimizations
  * optimize argsort()
  * optimize tolist()
  * comment clean-up
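The description above says slicing and inner joins keep the range type. A minimal sketch of why that is cheap, using Python's built-in `range` as a stand-in for the index: the helper names `slice_range` and `intersect_ranges` are hypothetical, only the equal, positive-step case of the intersection is handled, and this is not the pandas implementation.

```python
def slice_range(r, key):
    # Slicing a built-in range yields another range, so a sliced
    # range-backed index can stay range-backed at O(1) memory.
    return r[key]


def intersect_ranges(a, b):
    """Values shared by two ranges, as in an inner join (hypothetical).

    Simplified: handles only ranges with the same positive step and
    returns None otherwise, standing in for "fall back to a
    materialized integer index".
    """
    if a.step != b.step or a.step <= 0:
        return None
    step = a.step
    # Both ranges sit on arithmetic grids with the same spacing; they
    # can only share values if the two grids are aligned.
    if (b.start - a.start) % step != 0:
        return range(0)
    # Aligned grids: the overlap is just the tighter pair of bounds.
    # An empty result comes out naturally when start >= stop.
    return range(max(a.start, b.start), min(a.stop, b.stop), step)


print(slice_range(range(0, 20, 2), slice(2, 5)))          # range(4, 10, 2)
print(intersect_ranges(range(0, 20, 2), range(6, 30, 2)))  # range(6, 20, 2)
print(intersect_ranges(range(0, 10, 2), range(1, 10, 2)))  # range(0, 0)
```

A general version would also need to handle differing or negative steps. For differing steps, the shared values (if any) form another arithmetic progression whose step is the least common multiple of the two steps, so the result can in principle still be represented as a range.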

jreback mentioned this issue Oct 12, 2013

Unify index and multindex (and possibly others) API #3268

Closed

jreback closed this as completed Jan 3, 2014

jreback reopened this Jan 3, 2014

jtratner mentioned this issue Feb 1, 2014

Reducing memory footprint for catagoricals #6219

Closed

jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014

immerrr mentioned this issue Apr 1, 2014

CLN: revisit & simplify core data structures #6744

Closed

jreback modified the milestones: 0.15.0, 0.15.1 Jul 7, 2014

jreback modified the milestones: 0.16, 0.15.0 Sep 23, 2014

jreback modified the milestones: 0.16, 0.15.1 Oct 7, 2014

jreback mentioned this issue Jan 9, 2015

No way to construct mixed dtype DataFrame without total copy, proposed solution #9216

Closed

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

ARF1 mentioned this issue Apr 21, 2015

Introduction of RangeIndex #9961

Closed

jreback modified the milestones: 0.17.0, Next Major Release Apr 24, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Aug 15, 2015

jreback added Prio-high and removed Prio-medium labels Aug 22, 2015

jreback mentioned this issue Dec 23, 2015

ENH: RangeIndex redux #11892

Merged

jreback closed this as completed in #11892 Jan 16, 2016

jreback mentioned this issue Jan 19, 2016

BUG: GH12071 .reset_index() should create a RangeIndex #12080

Closed
