-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic storage #3
Conversation
Wow, this is very interesting! The whole thing is just inspired from numpy's ndarray, so that's where the copy on write comes from. For sure I have contemplated how to create array views and making the shared data optional. Rust has a different model than Python for sure, and our slices and shared references may be the saner way to have cheap array views. I wouldn't want to rip out Rc-sharing myself, but I don't think I need it. We can probably do something like this.. If you look around, I bet you can see that the whole crate needs modernization. It's pretty much all pre-Rust 1.0, with only changes to keep it compiling. Apart from cheap view types like this, my other big priority would be efficient iteration. I don't think the current is quite where we want it to be. |
Yes I guessed this came from numpy, though I haven't witnessed copy on write behavior in numpy yet.
There's probably one point that feels tricky to me: right now arrays have a pointer member, which was probably necessary with an rc storage, but I don't think it's necessary once the Rc storage is removed (slice indexing can be used instead). That would remove quite a lot of unsafe code.
I haven't looked at it yet, what are the problems with the current implementation? |
The pointer is needed for general slicing / stride computations. The pointer is recalculated if any dimension is sliced with a negative stride. It also avoids double indirection in case of a
It calculates indices for each lap of an iteration, ideally iteration over a regular layout 2d array should be just as fast as over a 1d array, as a first measure. |
I see, I had not thought about negative strides. For iteration, it seems there are two different cases: when the storage is contiguous (or not contiguous only on the dimension the array is iterated on), it can be done by simply offseting a pointer. |
You understand that I can't outright merge this, since it would break all client code. I'll make sure to publish the current version on crates.io so that it's possible for us and clients to version our changes properly. |
Of course, no problem with that. Also I've been thinking about it some more, and I see that some nice to have features coming from numpy can't be achieved without reference counting. So maybe I could enhance the state of this pull request by re-introducing the Rc based storage (and maybe an Arc based storage as well?) to take care of these cases. For instance:
That way maybe client code can opt into Rc for the more ergonomic API, and drop down to finer grained control where needed. If that's ok with you I'll perform the changes and append them to the pull request. |
I'm not convinced the Rc features are that useful, but they are interesting. They provide the versatility of the current solution ( I'm thinking it would be nice to generalize this with traits and explicit view types if it's possible. The |
In my opinion, the parts where the owner/view dichotomy make the API complicated are tricky points in the numpy API as well. For instance, in numpy writing to a reshaped array can either overwrite the original data, or write into a copy, silently. This means that for lots of programs, there is a need to be sure whether a reshape will cause a copy or not. With a reshape returning an array view, this is made explicit in the API. I've been experimenting with more storages in the
Apart from the pain point above, I find the |
I'm looking at your PR while remaking it to a large part. I'll go the route of preserving the shared / cow array, so that it's a smooth changeover. |
FWIW, using Deref is not possible anyway due to the usual trait trap: |
Thank you for all the ideas. I do believe this idea proved itself as soon as I tried implementing it. The view types also have some benefits of representing states that the Cow array could not, for example, the read only broadcasted to larger size array view, or a mutable subview. |
Improve pairwise summation
Hi,
I've been experimenting with ndarrays in rust recently, and I figured I should probably try and contribute to your crate rather than create yet another experimental library in rust.
I'm a bit concerned about requiring the storage for ndarrays to be
Rc<Vec>
. Indeed, most usages of ndarrays don't require shared ownership. Also it makes working with views over foreign data a bit harder in my opinion. Also, I find the implicit copy on write semantics surprising, and would prefer having an explicit Cow if I need these semantics.So here's my proposal: have the Array struct generic over its storage, provided that the storage can be Deref to a slice.
This then gives three main Array types:
type ArrayOwned = Array<Vec>
,type ArrayView = Array<&[A]>
, andtype ArrayViewMut = Array<&mut [A]>
.This makes the ownership clear, and it doesn't seem to overcomplicate the API.
What do you think?