Add a more memory-efficient RangeIndex-sort of thing to avoid large arange(N) indexes in some cases #939
Comments
Here's @jtratner's tree for this: https://github.com/jtratner/pandas/tree/add-range-index (based off Wes'). (I keep struggling to find it. Perhaps I'll rebase and open a PR; it would be interesting to experiment with this.)
And there's likely some code that can be salvaged from the BlockPlacement class I added when refactoring block managers.
I am interested in RangeIndexes as well. Does anybody know what the state of @jtratner's tree is? Is there anything I can do to help push this towards a PR?
It's in a reasonable state.
I linked to it above; it's here: https://github.com/jtratner/pandas/tree/add-range-index IIRC it ought to rebase pretty cleanly (it's mainly new files). +1 to a resurrection!
Ok, I rebased jtratner's tree: https://github.com/ARF1/pandas/tree/range_index Of course it was too much to hope it would pass all the tests. One can always dream... Having not worked on pandas internals before, I am probably not the best person to move this forward efficiently. If somebody else feels they are better suited, I would be happy to contribute individual fixes to their tree. If there are no takers, I will give it a shot myself, though I will probably need a fair amount of hand-holding to get started. Running the already written tests, it appears the pandas/core/range.py#L61-L76 has the following segment in its
As I understand this,
Yes, Index is no longer a subclass of ndarray since 0.15.0. You can look at the recently merged CategoricalIndex for the API.
@ARF1 The easiest thing to do is probably to just open a pull request based on your branch. Then it is easier to comment on the code and give feedback.
@jorisvandenbossche Ok, I did not want to "pollute" the PR list with unfinished code. The rebased tree is available as PR #9961. @jreback Thanks. I will take a look and see if I can find my way around...
`RangeIndex(1, 10, 2)` is a memory-saving alternative to `Index(np.arange(1, 10, 2))`: cf. pandas-dev#939.

This re-implementation is compatible with the current `Index()` API and is a drop-in replacement for `Int64Index()`. It automatically converts to `Int64Index()` when required by operations.

At present the type is preserved for only a small number of operations (e.g. slicing, inner, left and right joins). Most other operations trigger creation of an equivalent Int64Index (or at least an equivalent numpy array) and fall back to its implementation.

This PR also extends the functionality of the `Index()` constructor to allow creation of `RangeIndexes()` with

```
Index(20)
Index(2, 20)
Index(0, 20, 2)
```

in analogy to

```
range(20)
range(2, 20)
range(0, 20, 2)
```

restore Index() fastpath precedence

Various fixes suggested by @jreback and @shoyer

Cache a private Int64Index object the first time it or its values are required.
Restore Index(5) as error. Restore its test. Allow Index(0, 5) and Index(0, 5, 1).
Make RangeIndex immutable. See start, stop, step properties.
In test_constructor(): check class, attributes (possibly including dtype).
In test_copy(): check that copy is not identical (but equal) to the existing.
In test_duplicates(): Assert is_unique and has_duplicates return correct values.

fix slicing

fix view

Set RangeIndex as default index

* enh: set RangeIndex as default index
* fix: pandas.io.packers: encode() and decode() for RangeIndex
* enh: array argument pass-through
* fix: reindex
* fix: use _default_index() in pandas.core.frame.extract_index()
* fix: pandas.core.index.Index._is()
* fix: add RangeIndex to ABCIndexClass
* fix: use _default_index() in _get_names_from_index()
* fix: pytables tests
* fix: MultiIndex.get_level_values()
* fix: RangeIndex._shallow_copy()
* fix: null-size RangeIndex equals() comparison
* enh: make RangeIndex.is_unique immutable

enh: various performance optimizations

* optimize argsort()
* optimize tolist()
* comment clean-up
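For readers new to the idea, the design described in these commit messages (store only start, stop and step; keep the type under slicing; materialize and cache the values only when they are first needed) can be sketched in plain Python. This is a toy illustration, not the code from the PR: the class `ToyRangeIndex` and its attribute `_cached_values` are hypothetical names, a Python list stands in for the cached Int64Index, and the single-argument form simply mirrors `range(20)`.

```python
class ToyRangeIndex:
    """Hypothetical, simplified stand-in for the RangeIndex idea (not pandas code)."""

    def __init__(self, start, stop=None, step=1):
        # Mirror range(): a single argument is interpreted as the stop value.
        if stop is None:
            start, stop = 0, start
        # Only three integers are stored, regardless of the index length.
        self._range = range(start, stop, step)
        # Assumption: a list stands in for the cached Int64Index.
        self._cached_values = None

    @property
    def start(self):
        return self._range.start

    @property
    def stop(self):
        return self._range.stop

    @property
    def step(self):
        return self._range.step

    def __len__(self):
        return len(self._range)

    def __getitem__(self, key):
        # Slicing preserves the type: a slice of a range is still a range.
        if isinstance(key, slice):
            r = self._range[key]
            return ToyRangeIndex(r.start, r.stop, r.step)
        return self._range[key]

    @property
    def values(self):
        # Materialize the values on first access only, then reuse them.
        if self._cached_values is None:
            self._cached_values = list(self._range)
        return self._cached_values

    def equals(self, other):
        # Python compares ranges as sequences, so two empty ranges are equal
        # even if their start/stop/step differ.
        return self._range == other._range


idx = ToyRangeIndex(1, 10, 2)   # represents 1, 3, 5, 7, 9
sub = idx[1:3]                  # still a ToyRangeIndex, nothing materialized
```

Here `sub` stays a range-backed object, and neither `idx` nor `sub` allocates a values list until `.values` is read. The real implementation has to cover the full `Index` API, which is why the commit messages describe falling back to an Int64Index for most operations.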
No description provided.