-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
append, append_row, append_column: methods for appending an array or single rows and columns #932
Conversation
Note to those following along: there already exists other ways to build arrays row-by-row or column-by-column: the stack and concat functions. |
Functionality like this would be really nice. I think a slightly different API would be more generally useful: impl<A, D> Array<A, D>
where
D: Dimension,
{
/// Appends the subview to the specified axis.
///
/// The memory layout of the array will be changed if necessary (by copying the data)
/// so that subsequent calls to `.append_to_axis()` with the same axis will be fast.
/// This occurs in at least the following cases:
///
/// - If the outermost (i.e. largest absolute stride) axis of the array is not `axis`.
/// - If the array is discontiguous.
/// - If the stride for `axis` is negative.
///
/// So, calling `.append_to_axis()` with varying values of `axis` will likely be very slow, but
/// after one call to `.append_to_axis()`, subsequent calls to `.append_to_axis()` with the
/// *same* axis will be relatively fast.
///
/// **Errors** if the axis lengths of `subview` do not match the correponding axis lengths of `self`.
pub fn append_to_axis<S2>(&mut self, axis: Axis, subview: &ArrayBase<S2, D::Smaller>) -> Result<(), ShapeError>
where
A: Clone,
S2: Data<Elem = A>,
{}
} In other words, it would be nice for the operation to work for n-dimensional arrays of arbitrary layout. The layout of the array would be changed as necessary so that the specified axis would be the outermost axis, so that repeated calls to |
Yeah that would be more useful, but the complexity is higher. Do we have an easy way to get to the implementation? How to do the trick where we make a.swap_axis(0, axis);
let a = a.as_standard_layout().into_owned();
a.swap_axis(0, axis);
// axis is now the largest stride axis; and with that axis ignored, the rest is c-contig |
I suppose I'd suggest something like the following: impl<A, D> Array<A, D>
where
D: Dimension,
{
/// Rearranges the layout if necessary (by copying the data) such that:
///
/// - if the specified axis has length > 1, it is the outermost axis and
/// has a positive stride
/// - the other axes are in standard layout
/// - the array is contiguous
///
/// # Example
///
/// ```rust
/// use ndarray::{Array4, Axis};
///
/// let mut arr = Array4::<u8>::zeros([2, 3, 4, 5]);
/// assert_eq!(arr.shape(), &[2, 3, 4, 5]);
/// assert_eq!(arr.strides(), &[60, 20, 5, 1]);
///
/// arr.make_axis_outer_pos(Axis(2));
/// assert_eq!(arr.shape(), &[2, 3, 4, 5]);
/// assert_eq!(arr.strides(), &[15, 5, 30, 1]);
/// ```
pub(crate) fn make_axis_outer_pos(&mut self, axis: Axis)
where
A: Clone,
D: RemoveAxis,
{
if self.is_contiguous()
&& is_standard_layout(&self.dim.remove_axis(axis), &self.strides.remove_axis(axis))
&& (self.len_of(axis) <= 1
|| axis == self.dim.max_stride_axis(&self.strides) && self.stride_of(axis) > 0)
{
return;
}
let ax = axis.index();
self.dim.slice_mut()[..=ax].rotate_right(1);
self.strides.slice_mut()[..=ax].rotate_right(1);
*self = Array::from_shape_vec(self.dim.clone(), self.iter().cloned().collect()).unwrap();
self.dim.slice_mut()[..=ax].rotate_left(1);
self.strides.slice_mut()[..=ax].rotate_left(1);
}
} (The existing In principle, we could move the elements instead of cloning them, but that would require some unsafe code and would be much more complex to implement (because all the elements in the data which aren't moved would need to be dropped). This is basically the same approach as what you wrote above, but uses a rotate instead of a swap so that the other axes are in standard layout. This would make it easy to append the subview since the layout would match the order of |
I've looked at the append array case, but I'm stuck at the usual, place, handling non-copy elements. I'm stubbornly preferring to not forget/leak values in case of panic, I don't think it's the right thing to do for a Rust data structure. (Stack/concat also have this problem, while |
I'd suggest something like the following:
This way, the tricky stuff is handled by |
Hm, my implementation uses Zip to make sure shape is traversed in the right order. To use Vec::extend we need to guarantee the array is C-contig except for the axis we are extending, which is a further requirement (or, at least, whatever order The tricky stuff is our nd shape 🙂 |
The I guess you were thinking of preserving the original layout of the array as long as the axis being appended to is the outermost one with a positive stride? That would be possible, although as you pointed out, the implementation would be tricky due to iteration order / leaking issues. It would be nice to do something like that in the future, but I don't think it's necessary for an initial implementation.
I'm curious about this. I'd think we'd want to write an allocator which implements the |
I just think I'm following my own implementation, not writing down your ideas here :) So that's the reason. I was standing still at that problem in my implementation, not using your implementation idea from your comment. Just to clarify. I was thinking of more ndarray specific allocation needs, like aligned allocation or possibly the idea to be compatible with the arrow format. The Allocator trait is kind of not really even solving that problem, I think? Since we could want to tweak allocation alignment and not which allocator it calls, for example. |
Oh, okay, I understand. Regarding the allocator stuff, I was thinking of something like this: pub struct MinAlignAllocator<A: Allocator, const MIN_ALIGN: usize>(A);
impl<A: Allocator, const MIN_ALIGN: usize> Allocator for MinAlignAllocator {
fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
self.0.allocate(layout.align_to(MIN_ALIGN).unwrap_or(AllocError)?)
}
fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
self.0.deallocate(ptr, layout.align_to(MIN_ALIGN).unwrap())
}
} We'd add an [Edit: Another option for the API if we always want to force a minimum alignment would be to add an A custom allocator like this is the cleanest way I see to handle the minimum-alignment issue, since the In current Rust, I suppose we could change |
0019cd4
to
7b97f84
Compare
I solved my predicament with the "memory fill order" with Zip. Using a technique I think we have discussed before but I don't recall that it's in the source anywhere: (I think it's cool 🙂) (Edit: Maybe we did not. I realized it's something I'm using in my under cover by value/IntoIterator implementation that's not yet finished.) To recover the desired traversal order for array views
Thanks @jturner314 for the reminder to fix up strides. I didn't quite realize how involved it might be. |
32fc3a3
to
dd353b8
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Thanks a lot for the comments and bugfixes 🙂. I don't know about append versus push, it's quite arbitrary to me. I looked into std collections and they are consistent with append always being something like My only thought is that it's not inconceivable to generalize our |
67afa5b
to
470b26a
Compare
470b26a
to
7564ad2
Compare
Now without the tricky memory layout failure mode, these methods can just have names without try_ prefix. It's expected that they work successfully as long as the user matches up array sizes correctly.
This method showed up in profiling because it wasn't inlinable.
This speeds up the benches/append.rs benchmarks by 5-10%.
The 1D case is a simpler gather operation since we only select 1 element per index. Special-case it. The performance increase for this benchmark is that the benchmark runtime changes by -92% with this change.
Co-authored-by: Jim Turner <github@turner.link>
Co-authored-by: Jim Turner <github@turner.link>
If the innermost dimension is contiguous, we can skip it in one go, and save some work in the dropping loop in move_into. Co-authored-by: Jim Turner <github@turner.link>
Benchmarks (append benchmarks) show that this is a worthwhile optimization, and that plain self.data.set_len() in the loop performs worse.
Imagine that the array-to-append has length 1 along the growing axis. With this memory layout, the resulting assignment when copying the array elements into `self`, can often be a contiguous traversal. Co-authored-by: Jim Turner <github@turner.link>
Fix two bugs that Jim found in how we calculate the new stride for the growing axis. Tests by Jim Turner. Co-authored-by: Jim Turner <github@turner.link>
This .max_by_key() call showed up in profiling, and this change improves the select_axis0 benchmark by reducing runtime by 15%
7564ad2
to
d946360
Compare
Merging to finish this work. Renaming can still be done in follow-ups. Naming suggestions welcome - append or extend_with_array? append_row or push_row? etc |
this is awesome! really helpful to have clone supported on these methods |
I'm glad you share my enthusiasm. 🙂 I'm happy to finally have arrays that can be appended to. |
I'm really looking forward to this functionality too. 🎉 There are a number of places where it'll simplify my code. To clarify my comment about the naming question -- I think that if we name the row/column methods and the general methods similarly, then the general method should also take // On 2-D arrays
fn append_row(&mut self, row: ArrayView<A, D::Smaller>) {}
fn append_column(&mut self, column: ArrayView<A, D::Smaller>) {}
// On n-D arrays
fn the_general_method(&mut self, axis: Axis, array: ArrayView<A, D::Smaller>) {} I agree that it's helpful to name the methods similarly to So, I'd actually suggest four methods: // On 2-D arrays
fn push_row(&mut self, row: ArrayView<A, D::Smaller>) {}
fn push_column(&mut self, column: ArrayView<A, D::Smaller>) {}
// On n-D arrays
fn push(&mut self, axis: Axis, array: ArrayView<A, D::Smaller>) {}
fn append(&mut self, axis: Axis, array: ArrayView<A, D>) {} // or "extend_from_view" The two separate methods on n-D arrays are both useful. The primary way in which this analogy isn't quite right is that |
That sounds good. I think you have convinced me. (however, I don't see the push_clone/push_view etc names being needed, and I hope not they will be.) |
For owned arrays specifically (Array), allow appending rows and/or columns or
whole arrays (along a given axis).
New methods:
.append_row(ArrayView1<A>) -> Result
.append_column(ArrayView1<A>) -> Result
.append(axis, ArrayView<A, D>) -> Result
.move_into(impl Into<ArrayViewMut>)
New features:
stack
,concatenate
and.select(...)
now support Clone elements(previously only Copy).
The axis we append along should be a "growing axis", i.e the axis with greatest,
and positive, stride. However if that axis is of length 0 or 1, we can always
convert it to the new growing axis.
These methods automatically move the whole array to a compatible memory
layout for appending, if needed.
The examples show that empty arrays are valid both ways (both memory
layouts) - and that might be the easiest way to use these methods for the users.
There were quite a few corner cases in the implementation in this PR, but hopefully
they should all be dealt with in a moderately clean way and with little duplication.
Fixes #269