Alternative DataFrame class(es) for OOC + speed #137

Open
ivirshup opened this issue Mar 6, 2023 · 4 comments

Comments

@ivirshup
Contributor

ivirshup commented Mar 6, 2023

Hey all,

I was wondering whether you had considered supporting alternative dataframe classes in this library. In particular, I was thinking of the lazy/accelerated ones built on Arrow (e.g. polars, datafusion).

I would hope that the current API could be amenable to this by dispatching functions to different backends with @singledispatch. It could also be nice to take advantage of a backend that can work with out-of-core amounts of data and do optimizations based on column order.

I've also been having a good time interacting with annotation resources via ibis, which could integrate nicely with this kind of approach.
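
For illustration, the @singledispatch idea above could look roughly like the sketch below. It is a minimal example, not bioframe's actual API: `EagerFrame`, `LazyFrame` and `count_overlaps` are hypothetical names, and the two frame classes are toy stand-ins for real backends such as a pandas or polars dataframe.

```python
from dataclasses import dataclass
from functools import singledispatch


# Hypothetical stand-ins for two dataframe backends. A real version would
# register the backend's own classes instead of these toy ones.
@dataclass
class EagerFrame:
    rows: list  # (chrom, start, end) tuples held fully in memory


@dataclass
class LazyFrame:
    source: object  # zero-argument callable returning an iterator of rows


def _overlaps(row, chrom, start, end):
    # Half-open interval overlap test on the same chromosome.
    c, s, e = row
    return c == chrom and s < end and e > start


@singledispatch
def count_overlaps(df, chrom, start, end):
    """Count intervals in df that overlap [start, end) on chrom."""
    raise NotImplementedError(f"no backend registered for {type(df).__name__}")


@count_overlaps.register
def _(df: EagerFrame, chrom, start, end):
    # In-memory backend: scan the materialized rows.
    return sum(_overlaps(r, chrom, start, end) for r in df.rows)


@count_overlaps.register
def _(df: LazyFrame, chrom, start, end):
    # Lazy backend: pull rows from the source only when asked.
    return sum(_overlaps(r, chrom, start, end) for r in df.source())
```

The public function keeps one signature, and each backend only has to register its own implementation, so callers would not need to change when a new dataframe class is supported.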

@ivirshup ivirshup changed the title Alternative DataFrame classe(s) for OOC + speed Alternative DataFrame class(es) for OOC + speed Mar 6, 2023
@Phlya
Member

Phlya commented Mar 6, 2023

BTW pandas 2.0 will have a pyarrow backend... I wonder how that will work for bioframe.

@ivirshup
Contributor Author

ivirshup commented Mar 9, 2023

> BTW pandas 2.0 will have a pyarrow backend

Yup, I've already opened issues around the release candidate 😅. I'm not actually sure how much the current pyarrow backend is changing, or whether it's just no longer experimental.

But while pyarrow will probably have better performance than pandas (especially with strings), I think backends like duckdb or polars have the much larger benefit of being able to work efficiently with out-of-core data.
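
To make "out-of-core" concrete, here is a rough standard-library sketch of the underlying idea: stream the data and keep only a small aggregate in memory. This is not how duckdb or polars are implemented, and the function name and the `chrom` column are assumptions made for the example.

```python
import csv
from collections import Counter


def count_intervals_per_chrom(path):
    """Count rows per chromosome in a CSV file that has a 'chrom' column.

    Rows are read one at a time, so memory use depends on the number of
    distinct chromosomes rather than on the size of the file.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row["chrom"]] += 1
    return counts
```

A lazy backend can apply the same principle to whole query plans (filters, joins, group-bys) instead of a single hand-written loop.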

@endrebak

I am collaborating with the bioframe authors on this project (not in a usable state yet): https://github.com/endrebak/poranges

@ivirshup
Contributor Author

Related to this: a request for input on defining a dataframe standard: https://data-apis.org/blog/dataframe_standard_rfc/
