Support converting to NumPy masked arrays #16398
Hey, I'd be interested in taking a stab at this issue if it's available!
Sure, go ahead!
Hey, I'm beginning to look into this and just want to make sure I'm clear about what the source for the masked buffer is. Is this something that you envision being passed to the to_numpy function? For example, would I be able to write x = pl.Series([1, 2, -1, 4]).to_numpy(mask=[0, 0, 1, 0]), or is there some other input where users should define the mask?
The mask is the validity buffer of the Series. The user doesn't define it manually.
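To make that answer concrete: NumPy's mask uses the opposite convention to a validity buffer (True means "masked", i.e. invalid), so the mask is simply the negated validity buffer. This is a minimal NumPy-only sketch that assumes the values and validity buffers have already been extracted from the Series as plain arrays. The variable names and data are illustrative, and the value stored in the null slot is an arbitrary placeholder.

```python
import numpy as np

# Values buffer of a hypothetical Int64 Series [1, 2, null, 4].
# The slot behind the null holds an arbitrary placeholder (here 0).
values = np.array([1, 2, 0, 4], dtype=np.int64)

# Validity buffer (Polars/Arrow convention): True means the value is valid.
validity = np.array([True, True, False, True])

# NumPy convention: True in the mask means the value is masked (invalid),
# so the mask is the negated validity buffer.
mask = ~validity

arr = np.ma.MaskedArray(values, mask=mask)
print(arr)  # [1 2 -- 4]
```

The resulting mask, [False, False, True, False], is the same as the mask in the question; the difference is that it is derived from the Series rather than supplied by the user.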
Hey, I've been very slow to get started on this, but I finally have some time. A quick question about the … Also, I wanted to ask about the behavior for arrays whose validity bitmask is null (i.e. absent): I assume this means that all entries are valid, and that we should construct the Python array accordingly?
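On the second question: in the Arrow memory format, which Polars follows, an array with no nulls may omit its validity bitmap entirely, and that means every entry is valid. NumPy has a matching sentinel, np.ma.nomask, so no mask array needs to be allocated in that case. This is a minimal sketch. The helper name to_masked is hypothetical and not part of the Polars API.

```python
from typing import Optional

import numpy as np


def to_masked(values: np.ndarray, validity: Optional[np.ndarray]) -> np.ma.MaskedArray:
    """Build a masked array from a values buffer and an optional validity buffer."""
    if validity is None:
        # No validity bitmap: every entry is valid, so use NumPy's
        # "nothing is masked" sentinel instead of allocating a mask.
        return np.ma.MaskedArray(values, mask=np.ma.nomask)
    # Invert: Polars validity True = valid, NumPy mask True = invalid.
    return np.ma.MaskedArray(values, mask=~validity)


no_nulls = to_masked(np.array([1, 2, 3], dtype=np.int64), None)
with_nulls = to_masked(
    np.array([1, 0, 3], dtype=np.int64),
    np.array([True, False, True]),
)
```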
Hi,
NumPy has a masked array concept:
https://numpy.org/doc/stable/reference/maskedarray.html
This type of array consists of a values buffer and a boolean mask, where True marks an invalid (masked) entry; the mask is therefore the inverse of a validity buffer. This closely matches how data is represented in Polars, so it would be good to support.
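As a rough illustration of why this matters: a plain NumPy array has no way to mark a missing integer, so a list containing None falls back to the object dtype. A masked array, in contrast, keeps the native dtype and records the missing entries in its separate mask. Reductions then skip the masked entries. This is a NumPy-only sketch with made-up data.

```python
import numpy as np

# Without a mask, a missing integer forces NumPy onto the object dtype.
plain = np.array([1, 2, None, 4])
print(plain.dtype)  # object

# A masked array keeps the native uint8 values plus a separate boolean mask
# (True = masked). The 0 in the masked slot is an arbitrary placeholder.
masked = np.ma.MaskedArray(
    np.array([1, 2, 0, 4], dtype=np.uint8),
    mask=[False, False, True, False],
)
print(masked.dtype)  # uint8

# Reductions ignore masked entries: (1 + 2 + 4) / 3.
print(masked.mean())
```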
There are two main benefits:
- … uint8 instead of object types

Note that we will still not be able to convert nulls without a copy, since NumPy boolean arrays use one byte per element (byte-packed, like uint8), while Polars validity buffers are bit-packed.

API
The desired API would be to add a masked parameter to DataFrame/Series.to_numpy, defaulting to False.

Implementation
We would have to convert the values buffer and the validity buffer separately, and afterwards pass both to the masked array constructor. We should do this in Rust, as there we have direct access to the values buffer.
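Converting the validity buffer is the source of the copy noted above. Polars stores validity as a bit-packed bitmap (one bit per element, least significant bit first, following the Arrow format), while NumPy's mask needs one byte per element. The proposal is to do this step in Rust. The NumPy-only sketch below only illustrates the transformation, and it assumes the bitmap starts at bit offset 0. A sliced Polars array can carry a non-zero offset, which this sketch ignores. The helper name bitmap_to_mask is hypothetical.

```python
import numpy as np


def bitmap_to_mask(bitmap: np.ndarray, length: int) -> np.ndarray:
    """Expand an LSB-first validity bitmap into a byte-per-element NumPy mask.

    Assumes the bitmap starts at bit offset 0 and that a set bit means
    "valid", as in the Arrow format.
    """
    # Allocates a new array with one byte per element: this is the copy.
    validity = np.unpackbits(bitmap, count=length, bitorder="little").astype(bool)
    # NumPy masks use the opposite convention: True means "masked" (invalid).
    return ~validity


# Hypothetical Int64 column [1, 2, null, 4]:
# bits 0, 1 and 3 are set, bit 2 is clear -> 0b00001011.
values = np.array([1, 2, 0, 4], dtype=np.int64)
bitmap = np.array([0b00001011], dtype=np.uint8)

arr = np.ma.MaskedArray(values, mask=bitmap_to_mask(bitmap, len(values)))
```

The values buffer can be handed over as-is, but the mask always has to be materialized as a new array. This is why nulls cannot be converted zero-copy even with masked arrays.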