-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement __array__
for sequence types
#615
Conversation
Pull Request Test Coverage Report for Build 2521772788
💛 - Coveralls |
__array__
for sequence types__array__
for sequence types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather see this fixed in pyo3 since numpy can automatically handle nested sequence objects but in general the approach here looks good.
src/iterators.rs
Outdated
($($t:ty)*) => ($( | ||
impl PyConvertToPyArray for Vec<$t> { | ||
fn convert_to_pyarray(&self, py: Python) -> PyResult<PyObject> { | ||
Ok(self.clone().to_pyarray(py).into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any downside of using IntoPyArray
which doesn't allocate more memory? https://docs.rs/numpy/latest/numpy/array/struct.PyArray.html#memory-location
Ok(self.clone().to_pyarray(py).into()) | |
Ok(self.clone().into_pyarray(py).into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it depends on what we want to do here, if we want to return a copy of the inner data structure I would say lets drop the clone()
and just call to_pyarray()
. That will copy the data into a python/numpy allocated array so numpy will have full read/write on it. If we want the fastest return but at the cost of having some function restricted on the array we should drop the clone()
and just do into_pyarray(py)
. I'm not sure what is more expected from numpy's __array__
I'm thinking probably the to_pyarray()
path so it's an array copy of the data. But either way I don't think we need a clone()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, talking this over with @jakelishman I think we should just do self.into_pyarray(py).into()
here and return a numpy view directly into the buffer without copying. If the numpy function requires writing inplace or something they can use the copy()
method explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not drop clone()
if we use IntoPyArray
since it needs to take ownership of the vector but __array__
receives a ref &self
(and pyo3 doesn't allow to receive self
and for a good reason since you can not move data from Python).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the C API level of Numpy, you can have an array wrap a raw pointer, and tell Numpy that some other Python-managed object owns the lifetime of that array (so it won't attempt to free the memory). You can also set flags in Numpy to make an array non-writeable. I guess it likely would need unsafe
Rust to achieve that from an &self
borrow because you can't tell the Rust compiler about Python/Numpy's guarantees, but that behaviour (with appropriate flags set) is sort-of compatible with an &self
borrow, if you use the base
attribute to tie the lifetime of the array to the lifetime of the underlying Rust object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the underlying trait method into_pyarray
(which is part of the IntoPyArray
@georgios-ts linked to) is doing exactly that for us. It uses an unsafe block to get the pointer: https://github.com/PyO3/rust-numpy/blob/19bfc9d2d0e72bb3ed30c08244ee81abd02d7386/src/convert.rs#L67
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's missing the lifetime/read-only ties, though, since it takes self
not &mut self
/&self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakelishman Yeah, this is indeed possible and rust-numpy provides the unsafe borrow_from_array
(which results in a writeable array but you can't resize it) so we can avoid cloning the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to using into_pyarray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, I agree with @georgios-ts having a fix in PyO3 would be the better path forward here. But having this in place now will avoid us from blocking the 0.12 release on this issue. We can revisit if we want this after the PYO3 issue is fixed and included in a release (I meant to try and tackle it but haven't had the time lately). The only thing I was on the fence about is having a release note to document we natively implement __array__
now on the custom return types. But I think I'm ok without it as I don't think most users will notice vs older releases where having the full sequence protocol implemented implicitly exposed an array form.
Pull Request Test Coverage Report for Build 2826944969
💛 - Coveralls |
Fixes #614
Implements an explicit conversion from our custom types to Numpy arrays
This a quick fix to support converting using
np.array
andnp.asarray
using the API specified by NumPy. The solution is short and not too hard to maintain, the only burden will be remembering to addpy_convert_to_py_array_obj_impl!
for each new sequence return type we create.Tasks: