This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

Is this project advancing? #79

Open
sursu opened this issue Sep 25, 2019 · 5 comments

Comments

@sursu

sursu commented Sep 25, 2019

Just a question:

I see that the latest commit in this repository was more than 2 years ago. Is this project meant to replace pandas, and if so, is it advancing?
This discussion on Reddit offers few answers.

The ideas proposed seem really appealing to me. Especially the "judicious and responsible use of modern C++".

If not, I suggest adding a note that redirects potential enthusiasts to similar projects that need contributions.

@toobaz

toobaz commented Sep 25, 2019

> If not, I suggest adding a note that redirects potential enthusiasts to similar projects that need contributions.

pandas is definitely one ;-)

@datapythonista

datapythonista commented Sep 25, 2019 via email

@sursu
Author

sursu commented Sep 25, 2019

My understanding is that these proposed changes will gradually be implemented in pandas, and there won't be a pandas2.

While I see that rewriting the Block Manager is on the roadmap, I don't see "Building “libpandas” in C++11/14 for lowest level implementation tier" listed there.

I am curious to know whether there was a decision to stick with Cython.

If I am wrong and C++ is being embraced, I imagine that some implementations will have to coexist: methods in C++ and methods in Cython. Are there already examples of that?

@jorisvandenbossche
Contributor

A big part of the ideas listed in this repo (certainly the page on the "data structure changes") evolved into Wes starting Arrow. So for now that is where the C++ work is being done; there are no short-term plans to do that in pandas itself (but we might start using Arrow).

For the BlockManager rewrite, there is currently no concrete decision whatsoever (so also no decision to stick with Cython), except that it could be beneficial. That's a roadmap item that needs to be discussed and detailed further.

We should probably update the README of this repo to reflect this status better.

> I imagine that some implementations will have to coexist: methods in C++ and methods in Cython

pyarrow is an example of that.

@wesm
Owner

wesm commented Sep 25, 2019

@sursu indeed one of my primary motivations in developing the Apache Arrow project (which has more or less been my primary focus since sometime in 2015) is to develop next-generation data frame internals, and to do so in a way that doesn't create another large codebase owned by a small Python-only core development team. We're developing Arrow with the help of a much larger core community.

pandas has millions of users, so advancing the goals from the "pandas2" discussion without disrupting existing users will take years of work. There is also the very important question of who will pay for the work.
