Optimize behavior of vec.split_off(0) (take all) #76682
Conversation
Optimization improvement to `split_off()` so the performance meets the intuitively expected behavior when `at == 0`, avoiding the current behavior of copying the entire vector. The change honors the documented behavior that the method leaves the original vector's "previous capacity unchanged".

This improvement better supports the pattern of building and flushing a buffer of elements, such as the following:

```rust
let mut vec = Vec::new();
loop {
    vec.push(something);
    if condition_is_met {
        process(vec.split_off(0));
    }
}
```

`Option` wrapping is the first alternative I thought of, but it is much less obvious and more verbose:

```rust
let mut capacity = 1;
let mut vec: Option<Vec<Stuff>> = None;
loop {
    vec.get_or_insert_with(|| Vec::with_capacity(capacity)).push(something);
    if condition_is_met {
        capacity = vec.as_ref().unwrap().capacity();
        process(vec.take().unwrap());
    }
}
```

Directly applying `mem::replace()` could work, but `mem::` functions are typically a last resort, used when a developer is actively seeking better performance than the standard library provides.

The benefit of this approach is that it does not change the existing API contract, but improves the performance of `split_off(0)` for `Vec`, `String` (which delegates `split_off()` to `Vec`), and any other existing use cases.

This change adds tests to validate the behavior of `split_off()` with regard to capacity, as originally documented, and to confirm that behavior still holds when `at == 0`.

The change is an implementation detail and does not require a documentation change, but documenting the new behavior as part of the API contract may benefit future users. (Let me know if I should make that documentation update.)
Note, for future consideration: I think it would be helpful to introduce an additional method to `Vec` (if not also to `String`):

```rust
pub fn take_all(&mut self) -> Self {
    self.split_off(0)
}
```

This would make it clearer how `Vec` supports the pattern, and make it easier to find, since the behavior is similar to other `take()` methods in the Rust standard library.
Do we have some numbers on how much this helps `split_off` with a constant argument of zero? (That's the best case for this optimization.) I'd also be interested in benchmarks showing how much of a regression this is for a non-constant, non-zero-argument `split_off`, i.e., how much the branch costs (I imagine not much, given that we're almost guaranteed to call into the allocator).
Here's an ad hoc benchmark test, with results below. I could adjust this to capture the elapsed time only when calling the method under test. I also include, for comparison, the performance of ensuring the capacity is retained vs. allocating a new `Vec`. (This uses an unmodified standard library.)

time_split_off_0.rs:
```rust
use std::time::Instant;

fn main() {
    benchmark_suite(/*bigger_elems=*/1.0, /*buffer_more_before_flush=*/1.0, /*more_elems=*/1.0);
    for growth in 2..=5 {
        let factor = growth as f64;
        benchmark_suite(/*bigger_elems=*/1.0, /*buffer_more_before_flush=*/factor, /*more_elems=*/1.0);
        benchmark_suite(/*bigger_elems=*/1.0, /*buffer_more_before_flush=*/1.0, /*more_elems=*/factor);
        benchmark_suite(/*bigger_elems=*/factor, /*buffer_more_before_flush=*/1.0, /*more_elems=*/1.0);
    }
}

fn benchmark_suite(bigger_elems: f64, buffer_more_before_flush: f64, more_elems: f64) {
    let mut elem_size = (10.0 * bigger_elems) as usize;
    let mut max_buffer = (1000.0 * buffer_more_before_flush) as usize;
    let mut loops = max_buffer * (100.0 * more_elems) as usize;
    for _ in 1..=5 {
        println!("========================== BENCHMARK TEST ==========================");
        println!(
            "pushing {} elements of size {} bytes into a Vec buffer flushed at max {} elements",
            loops, elem_size, max_buffer,
        );
        println!("--------------------------------------------------------------------");
        benchmark(elem_size, loops, max_buffer, false, false);
        benchmark(elem_size, loops, max_buffer, /*replace_with_capacity=*/true, false);
        benchmark(elem_size, loops, max_buffer, false, /*replace_with_new=*/true);
        elem_size *= 2;
        loops *= 2;
        max_buffer *= 2;
    }
}

fn benchmark(elem_size: usize, loops: usize, max_buffer: usize, replace_with_capacity: bool, replace_with_new: bool) {
    let now = Instant::now();
    let mut buffer = Vec::new();
    let mut flushed = 0;
    for _ in 0..loops {
        if buffer.len() > max_buffer {
            if replace_with_capacity {
                let capacity = buffer.capacity();
                flush(std::mem::replace(&mut buffer, Vec::with_capacity(capacity)));
            } else if replace_with_new {
                flush(std::mem::replace(&mut buffer, Vec::new()));
            } else {
                flush(buffer.split_off(0));
            }
            flushed += 1;
        }
        buffer.push(vec![b'\0'; elem_size]);
    }
    if !buffer.is_empty() {
        flush(buffer);
        flushed += 1;
    }
    let elapsed_usec = now.elapsed().as_micros();
    println!(
        "{} microseconds {}, called flush {} times",
        elapsed_usec,
        if replace_with_capacity {
            "replaced with `with_capacity()`"
        } else if replace_with_new {
            "replaced with `new()`"
        } else {
            "WITH EXISTING split_off()"
        },
        flushed,
    );
}

fn flush(buffer: Vec<Vec<u8>>) {
    // Dropping the parameter drops and frees this `Vec`.
    let _ = buffer;
}
```
Here are a few more results, this time including values for "microseconds flushing and dropping": the total elapsed time spent in the flush calls only. Looking at either result set, you can see the optimization has some benefit in almost all cases. (Since I'm running this on Linux and the results are in wall-clock time, there are some nondeterministic factors, but on average the results are better with the optimization.)
Sorry, one more. I thought I should try timing ONLY the method call, and it turns out the method call by itself is a much smaller percentage of the total time spent flushing. The performance difference is more pronounced in this version. So is the difference between allocating the replacement buffer `Vec` with `with_capacity()` vs. `new()`.

At first I didn't know if it would be worth changing the behavior (say, in a new, alternative method). One might assume the use cases that would benefit from higher performance here probably use a similar capacity most of the time, so re-using the same capacity seems to make sense. But what if there is an outlier? If a buffer grows unusually large once, the replacement would retain that outsized capacity. Maybe it would be good to have a way for the caller to choose. The downside there is, we'd lose an opportunity to replace a buffer with a vector with a specified default capacity.
Or another possibility is to take a different approach that's almost as concise, but more flexible: essentially a slightly more convenient wrapper around `mem::replace()`.
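The snippet that originally accompanied this comment wasn't captured here, but a wrapper along these lines could work (the name `take_replacing` is hypothetical, just to illustrate the shape of the idea):

```rust
// Hypothetical convenience wrapper around `mem::replace()`: take the current
// contents, leaving behind a replacement vector chosen by the caller
// (empty, pre-sized to recycle capacity, etc.).
fn take_replacing<T>(vec: &mut Vec<T>, replacement: Vec<T>) -> Vec<T> {
    std::mem::replace(vec, replacement)
}

fn main() {
    let mut buffer = vec![1, 2, 3];
    // Recycle the capacity by pre-sizing the replacement.
    let cap = buffer.capacity();
    let taken = take_replacing(&mut buffer, Vec::with_capacity(cap));
    assert_eq!(taken, vec![1, 2, 3]);
    assert!(buffer.is_empty());
}
```

This keeps the "take everything without copying" behavior while letting the caller decide what capacity the replacement buffer should have.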
Okay, I think it would be helpful to have a summary of the benchmark results -- there's a lot of data in those text files :) I am inclined to say that we should land this, though, since it avoids the copy, and compared to the allocation and copy the additional branch is essentially free. I thought about whether there could be room to copy less in other cases, but I think the answer is no -- even `split_off(1)`, for example, still needs to copy the whole buffer, because we can't shift our existing allocation.
Sounds great. Here is a summary of the savings* after running the benchmarks: [*By "savings" I mean, if the old way took 100 microseconds and the new way takes 90 microseconds, the savings is 10%.]
(The summary is just comparing the version using `with_capacity()` to the original implementation.)
@bors r+
📌 Commit 79aa9b1 has been approved by |
☀️ Test successful - checks-actions, checks-azure |
Remove special-case handling of `vec.split_off(0)`

rust-lang#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity.

That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in rust-lang#119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector. In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths.

I believe it's better to remove the special-case code, and treat `at == 0` just like any other value:

- The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`.
- If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not.
- If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`.

Fixes rust-lang#119913.
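To illustrate the last point: a caller that specifically wants to steal the whole buffer without copying can write that explicitly. This is not code from the PR, just a sketch of the `mem::replace` pattern it refers to:

```rust
fn main() {
    let mut buffer: Vec<u8> = Vec::with_capacity(1024);
    buffer.extend_from_slice(b"hello");

    // Take all elements without copying them: swap in an empty Vec and
    // keep the old allocation in `taken`.
    let taken = std::mem::replace(&mut buffer, Vec::new());

    assert_eq!(taken, b"hello");
    assert!(taken.capacity() >= 1024); // `taken` keeps the original allocation
    assert!(buffer.is_empty());
}
```

Unlike the removed special case in `split_off(0)`, this makes the capacity-stealing behavior an explicit choice at the call site.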
Rollup merge of rust-lang#119917 - Zalathar:split-off, r=cuviper
r? @wesleywiser
FYI: @tmandry