Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an in-place rotate method for slices to libcore #41670

Merged
merged 6 commits into from
Jun 2, 2017

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented May 1, 2017

A helpful primitive for moving chunks of data around inside a slice.

For example, if you have a range selected and are drag-and-dropping it somewhere else (Example from Sean Parent's talk).

(If this should be an RFC instead of a PR, please let me know.)

Edit: changed example

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@kennytm
Copy link
Member

kennytm commented May 1, 2017

If the motivation is inserting an iterator in the middle of a vector, why not use .splice?

#![feature(splice)]
fn main() {
    let mut v = (0..10).collect::<Vec<_>>();
    v.splice(7..7, 100..104);
    assert_eq!(&v, &[0, 1, 2, 3, 4, 5, 6, 100, 101, 102, 103, 7, 8, 9]);
}

@leonardo-m
Copy link

Having an efficient rotate algorithm in the standard library of a system language is quite good, even expected.

@carols10cents carols10cents added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 1, 2017
@scottmcm
Copy link
Member Author

scottmcm commented May 1, 2017

@kennytm Well, that'll teach me not to only read the stable docs when picking example use cases 😅 I updated the doctest & the first post above to a different example instead.

That said, it would be nice to have a rotate method when implementing splice, as today it works by allocating a separate buffer for items that don't fit in the drain range instead of using excess capacity already in the vector. (Of course, for I:TrustedLen it can do even better than rotate.) Back when I first wrote this, I just did it the easy allocate+3*copy way, but benchmarking to propose for std showed that to actually be slower than using flipping hands for large rotations, presumably due to cache effects.

@Diggsey
Copy link
Contributor

Diggsey commented May 1, 2017

This would be useful for me. Currently I just use the following method:

// In-place rotation algorithm (shifts to the right)
fn rotate_slice<T>(slice: &mut [T], places: usize) {
    // Rotation can be implemented by reversing the slice,
    // splitting the slice in two, and then reversing the
    // two sub-slices.
    slice.reverse();
    let (a, b) = slice.split_at_mut(places);
    a.reverse();
    b.reverse();
}

How does this compare performance wise?

@scottmcm
Copy link
Member Author

scottmcm commented May 2, 2017

How does this compare performance wise?

The implementation in this PR is about 10% faster in general, 50% faster for certain mid/len combinations, and 90% faster for certain types.

  • The most obvious advantage is for small rotates, where it uses ptr::copy and a few stack variables. But that optimization can also apply after a few rounds of flipping, and it turns out [u64; 5*1024*1024].rotate(9199) is one of the cases that happens to be about twice as fast as the three-reverses algorithm.

  • [u8].reverse() has terrible performance. It's reordering single bytes, so LLVM can do some unrolling, but it's still byte load and byte store. Swapping slices, however, is always moving forward and never rearranging, so LLVM vectorizes it to deal in <16 x i8>, which is much faster. (This vectorizing can also help other small types, but is of course most noticeable for bytes.) Playground link to look at LLVM IR and ASM: https://is.gd/CS2skq

  • The flipping algorithm in the PR is also more memory friendly than the reversing. While both do about n swaps total, the reversing is two sequential reads over the whole slice. The PR does n/d rounds of d swaps, and in each round, puts d elements into their final location so they won't be needed to be looked at again. The other d elements that were loaded are needed again soon, often in the very next round, so when 2*d < cache < n, things go faster than using the reverses.

Raw benchmark numbers: https://gist.github.com/scottmcm/36a6d8db52870f1a60612f811b5ff69a

@leonardo-m
Copy link

[u8].reverse() has terrible performance.

This can be fixed with a specialization in another PR?

@scottmcm
Copy link
Member Author

scottmcm commented May 5, 2017

@leonardo-m I found a way: PR #41764

But reverse will always be slower than memcpy, since it's a more complex task. And that PR doesn't help things like [[u8;3]]::reverse(), which is also pretty slow.

@scottmcm scottmcm mentioned this pull request May 5, 2017
}
}

unsafe fn ptr_swap_n<T>(a: *mut T, b: *mut T, n: usize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to suffer from the same undef/padding issues as #41335
In this case it would be possible to overcome them by always doing ptr_swap_u8, but that would likely result in worse performance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look, @ranma42.

I expected this to be fine as ptr::swap is memcpy+memmove+memcpy. I figured that if memcpy'ing padding was invalid, then ptr::read::<(u8, u16)> would always be invalid, since it's also just memcpy. Is that logic incorrect?

Also, if this code hits those problems, does #40454 hit them too? (Ideally I'd take the swap-slices optimizations out of this and use a new ptr::swap_nonoverlapping_n-style function that just worked well.)

Copy link
Contributor

@ranma42 ranma42 May 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#40454 should be fine, because it only uses u8 values. Some of them might be undef, but as long as they are only copied and never used as condition guards, the worst that can happen is that the padding bytes are not guaranteed to be copied (as LLVM might see them as undef and use arbitrary values instead of accessing the memory).

ptr::read::<(u8, u16)> is not invalid as long as you do not try to interpret the result as another type. The problem arises, for example, if you have a (u8, u16) and read it as a u32 value (assuming that there is a 1-byte padding between the u8 and the u16). In this case, the u32 result would be undefined because of the padding.

@carols10cents
Copy link
Member

ping @sfackler, looks like this might have fallen through the cracks? (pinging you on IRC too)

@sfackler
Copy link
Member

sfackler commented May 9, 2017

The API itself seems pretty reasonable to me, though it sounds like there are still some unanswered questions in the implementation?

cc @rust-lang/libs

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 9, 2017
@scottmcm
Copy link
Member Author

scottmcm commented May 9, 2017

I've pushed an update removing the optimization, which I believe was the only question in the implementation.

(Hopefully #40454 will make it unnecessary, and if not it's probably better to share code with that that to have a different optimization here anyway.)

[Edit] Changed the commit message to make travis re-run, since it failed in git fetch. Is there a better way to handle that in future?

@BurntSushi
Copy link
Member

The API seems reasonable to me too.

@aturon
Copy link
Member

aturon commented May 9, 2017

Libs team briefly discussed the API in triage meeting, seems good on that front.

@scottmcm
Copy link
Member Author

And travis failed to git fetch in the xcode 8.2 config again: https://travis-ci.org/rust-lang/rust/jobs/230284608 Poking it...

@kennytm
Copy link
Member

kennytm commented May 10, 2017

@bors
Copy link
Contributor

bors commented May 10, 2017

☔ The latest upstream changes (presumably #41764) made this pull request unmergeable. Please resolve the merge conflicts.

@scottmcm
Copy link
Member Author

Merge conflict with myself 😆 Rebased and made a real tracking issue.

@carols10cents
Copy link
Member

sooooo sounds like everything is ready to go? wdyt @sfackler?

Copy link
Member

@sfackler sfackler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of little things. Looks reasonable otherwise.

@@ -1337,6 +1337,61 @@ impl<T> [T] {
core_slice::SliceExt::sort_unstable_by_key(self, f);
}

/// Permutes the slice in-place such that `self[mid..]` move to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"moves"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess I was mentally inserting "the elements in". Considering it as "the subslice" is probably better; will update.

@@ -1337,6 +1337,61 @@ impl<T> [T] {
core_slice::SliceExt::sort_unstable_by_key(self, f);
}

/// Permutes the slice in-place such that `self[mid..]` move to the
/// beginning of the slice while `self[..mid]` move to the end of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"moves"

rotate::ptr_rotate(mid, p.offset(mid as isize), k);
}

k
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of returning k?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self[..mid] moves to self[k..] and self[mid..] moves to self[..k], so when, say, rotating around a partition point, it gives the new partition point.

I return it because Stepanov returns the equivalent iterator, but I've definitely found the index less useful that the iterator. Obviously it's easy to calculate yourself, but the more annoying problem is that I'm usually rotating a subslice, so the returned index needs to be fixed-up since it's not slice-independent. (With iterators, in contrast, the new split point is non-trivial to calculate for non-random-access iterators, and iterators are independently-meaningful, so the returned split point can directly be used to construct ranges in the original container.)

So I don't have particularly strong feelings; let me know what you'd prefer.

@leonardo-m
Copy link

Regarding the k return value, also take a look at the API of how the D standard library performs an analogous operation (and also the name of the function):
https://dlang.org/phobos/std_algorithm_mutation.html#bringToFront

@Mark-Simulacrum Mark-Simulacrum added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 20, 2017
@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 20, 2017
A helpful primitive for moving chunks of data around inside a slice.
In particular, adding elements to the end of a Vec then moving them
somewhere else, as a way to do efficient multiple-insert.  (There's
drain for efficient block-remove, but no easy way to block-insert.)

Talk with another example: <https://youtu.be/qH6sSOr-yk8?t=560>
Batch-insert is better done with Vec::splice
It can be revisted later after the mem::swap optimizations land.
@scottmcm
Copy link
Member Author

Thanks for the link, @leonardo-m; good to see another example of returning usize with the same meaning. It's a more useful result there, though, as it working on forward iterators means it's O(k) to recalculate it, unlike here.

I thought some more about my earlier comment and decided to just remove the return value. It's easy to calculate and could plausibly be added later, so might as well just be left off for now.

@aidanhs aidanhs added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 25, 2017
@carols10cents
Copy link
Member

ping @sfackler, sounds like this is ready for another look!

@alexcrichton
Copy link
Member

@bors: r+

Thanks again @scottmcm!

@bors
Copy link
Contributor

bors commented Jun 1, 2017

📌 Commit 094d61f has been approved by alexcrichton

@alexcrichton alexcrichton added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 1, 2017
@bors
Copy link
Contributor

bors commented Jun 2, 2017

⌛ Testing commit 094d61f with merge 558cd1e...

bors added a commit that referenced this pull request Jun 2, 2017
Add an in-place rotate method for slices to libcore

A helpful primitive for moving chunks of data around inside a slice.

For example, if you have a range selected and are drag-and-dropping it somewhere else (Example from [Sean Parent's talk](https://youtu.be/qH6sSOr-yk8?t=560)).

(If this should be an RFC instead of a PR, please let me know.)

Edit: changed example
@bors
Copy link
Contributor

bors commented Jun 2, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 558cd1e to master...

@bors bors merged commit 094d61f into rust-lang:master Jun 2, 2017
@scottmcm scottmcm deleted the slice-rotate branch June 3, 2017 06:58
@RalfJung
Copy link
Member

This adds a function which says "Safety: The type T must have non-zero size", and then proceeds to call it in SliceExt::rotate without checking the size. Am I missing something or is this grossly unsafe?

kennytm added a commit to kennytm/rust that referenced this pull request Jul 20, 2018
fix unsafety: don't call ptr_rotate for ZST

`rotate::ptr_rotate` has a comment saying
```
/// # Safety
///
/// The specified range must be valid for reading and writing.
/// The type `T` must have non-zero size.
```
So we better make sure we don't call it on ZST...

Cc @scottmcm (author of rust-lang#41670)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.