Add DF.transform
#912
This operation is doing three things at the moment:
It also has the limitation that it computes only a single column. For example, we could instead have:
We could also emit a custom row struct that accepts both string and atom keys and converts fields as necessary. For example, imagine we had a
However, we should benchmark the approaches. The lazy one may end up being less efficient if we make too many trips to Rust. We should certainly have a single operation to access a given column and row.
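To make the lazy row struct idea concrete, here is a minimal sketch in plain Elixir. Everything in it is an assumption for illustration: `LazyRow` is a hypothetical module name, and a plain map of lists stands in for the dataframe, so the `Enum.at/2` call marks the place where a single column-and-row lookup into Rust would go. It only shows how the `Access` behaviour lets a row accept both atom and string keys while fetching just the fields that are touched.

```elixir
defmodule LazyRow do
  # Hypothetical sketch: `columns` is a plain map of column name => list of
  # values (a stand-in for a dataframe); `index` is the row being pointed at.
  @behaviour Access
  defstruct [:columns, :index]

  @impl Access
  def fetch(%LazyRow{columns: columns, index: index}, key)
      when is_atom(key) or is_binary(key) do
    # Normalize atom keys to strings so row[:a] and row["a"] behave the same.
    name = to_string(key)

    case columns do
      # Stand-in for a single "column + row" access operation.
      %{^name => values} -> {:ok, Enum.at(values, index)}
      _ -> :error
    end
  end

  @impl Access
  def get_and_update(_row, _key, _fun), do: raise(ArgumentError, "rows are read-only")

  @impl Access
  def pop(_row, _key), do: raise(ArgumentError, "rows are read-only")
end

columns = %{"a" => [1, 2, 3], "b" => ["x", "y", "z"]}
row = %LazyRow{columns: columns, index: 1}

# Only the fields the function actually touches are looked up.
IO.inspect({row[:a], row["b"]})
```

With this shape, a row function that reads one column out of fifty pays for one lookup rather than for deserializing the whole row. The concern about too many trips to Rust still applies, since each access becomes a separate call.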
José, I didn't want to use my brain today :P
👍 Yeah, we could definitely get multiple columns with
EDIT: removed a comment about validation.
I don't think so but I'm not sure. I couldn't find a definite answer in the docs. They seem to support several kinds of index-based access and I'm not sure which is the "right" one. Following some source code led me to this file: If this is the right place, I see several references to binary searches. That makes me think it's
Yeah, definitely some benchmarks are in order. I suspect the most expensive part is the deserialization step required to feed the Elixir functions. I'll try your lazy approach and get back with some numbers. I also want to try and leverage Arrow's chunking. If deserializing a single chunk is fast, it may be worth parallelizing over chunks on the Elixir side rather than trying to trick Polars into doing what we want. IDK how easy that level of control will be though.
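As a rough illustration of the "parallelize over chunks on the Elixir side" idea, the sketch below uses `Task.async_stream/3` from the standard library. The `chunks` list and `row_fun` are hypothetical stand-ins: each chunk is assumed to be already deserialized into a list of row maps, and how to get per-chunk access from the dataframe is the open question above, so that part is not shown.

```elixir
# Hypothetical stand-in: each chunk is a list of row maps that has already
# been deserialized; a real implementation would pull chunks from the dataframe.
chunks = [
  [%{"a" => 1}, %{"a" => 2}],
  [%{"a" => 3}],
  [%{"a" => 4}, %{"a" => 5}]
]

# The user-supplied row function (an assumption for this example).
row_fun = fn row -> row["a"] * 10 end

results =
  chunks
  # Run the row function over each chunk in its own task. With `ordered: true`
  # the per-chunk results come back in input order.
  |> Task.async_stream(fn chunk -> Enum.map(chunk, row_fun) end,
    ordered: true,
    timeout: :infinity
  )
  # Each successful task yields {:ok, values}; concatenate them into one list.
  |> Enum.flat_map(fn {:ok, values} -> values end)

IO.inspect(results)
```

Whether this wins in practice depends on the per-chunk deserialization cost, which is what the benchmarks would need to show.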
My understanding from the Rust code is that they do a binary search only if there are several chunks. What we may want to do is to rechunk the dataframe before using it. Another potential concern here is doing the bounds check on every operation, but they do have an
Description
Adds DF.transform/3, which is the analogous function to S.transform/2. I've needed a version of this function many times in my own work.
Example