Add slice_rows to interchange protocol #349
In the case of something like pandas or another dataframe library that doesn't use the Arrow memory layout under the hood, they'd presumably materialize Arrow on the [...] Additionally, it makes it a bit hard to reason about when the producer versus the consumer should do row selection. For example, if Polars is consuming data from, say, PyArrow, I imagine Polars would rather handle row slicing itself (assuming you'll hit a situation where it's not pure pointer arithmetic). In the situation of pandas consuming data from, say, Polars, you'd probably want Polars to handle the row slicing. Arrow interchange protocols handle the slicing case (ignoring step size) by letting you specify an offset and a size. Maybe we can do something similar here? |
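To make the offset-and-size idea above concrete, here is a minimal pure-Python sketch. It is only an illustration: `ColumnView`, its `slice_rows(offset, length)` signature, and `to_list` are hypothetical names, not part of the interchange protocol or of any Arrow library. The point it shows is that a slice can be described by adjusting two integers while the underlying buffer stays shared, so a producer that already stores contiguous data never has to copy it.

```python
class ColumnView:
    """Hypothetical view over a shared buffer, described by offset and length."""

    def __init__(self, buffer, offset=0, length=None):
        available = len(buffer) - offset
        if offset < 0 or available < 0:
            raise ValueError("offset is out of range")
        if length is None:
            length = available
        if not 0 <= length <= available:
            raise ValueError("length is out of range")
        self._buffer = buffer  # shared with every view derived from this one
        self.offset = offset
        self.length = length

    def slice_rows(self, offset, length):
        """Return a narrower view; only the offset/length bookkeeping changes."""
        if offset < 0 or length < 0 or offset + length > self.length:
            raise ValueError("slice is out of range")
        return ColumnView(self._buffer, self.offset + offset, length)

    def to_list(self):
        """Materialize the selected rows (this is the only step that copies)."""
        return list(self._buffer[self.offset:self.offset + self.length])


col = ColumnView([10, 20, 30, 40, 50])
view = col.slice_rows(1, 3)
print(view.to_list())  # [20, 30, 40]
```

Under this sketch, a consumer that prefers to slice by itself can simply read `offset` and `length` from the view, which is one possible way to sidestep the producer-versus-consumer question raised above.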
sounds good, thanks |
Do we expect / want to encourage developers using dataframe libraries to explicitly call
to:
My 2c is that this is just highlighting the lack of standard API here and that the experience should be something along the lines of (ignoring API names for column selection and row slicing):
|
Would be good to have others chime in here: this interchange protocol is already being adopted, so we probably don't want to introduce something and later decide to change or remove it. |
It's what plotly already does to not have to convert the entire dataframe |
Any updates here, please? This is the only thing I plan to try adding to the interchange protocol, I promise. I think of the interchange protocol as being useful for converting between libraries and doing some preselection in a standardised way:
|
gentle ping (would really like to get this in for pandas 3.0 tbh, and this topic actually has a real world use case microsoft/vscode-jupyter#13951)
the "standard api" solution would be:
does that really look any less clunky? |
The ability to select a subset of rows in addition to selecting columns seems harmonious. Implementation in Modin should not be a problem. +1 |
Any updates please? |
closing due to lack of interest (this PR has been open for 5 months), thanks all for comments |
closes #204