-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up String::from_utf16 #55530
Speed up String::from_utf16 #55530
Conversation
r? @aidanhs (rust_highfive has picked a reviewer for you, use r? to override) |
I created more benchmarks to check which initial capacity provides the best results in different scenarios. The results aren't too surprising:
The good news is that any choice is at least twice as fast as the current solution, so it boils down to how often we can expect to see non-ASCII characters and code units above The conservative approach would be to just go with |
Ping from triage @aidanhs / @rust-lang/libs: This PR requires your review. |
@bors r+ Thanks! |
📌 Commit 19aa101 has been approved by |
…r=SimonSapin Speed up String::from_utf16 Collecting into a `Result` is idiomatic, but not necessarily fast due to rustc not being able to preallocate for the resulting collection. This is fine in case of an error, but IMO we should optimize for the common case, i.e. a successful conversion. This changes the behavior of `String::from_utf16` from collecting into a `Result` to pushing to a preallocated `String` in a loop. According to [my simple benchmark](https://gist.github.com/ljedrz/953a3fb74058806519bd4d640d6f65ae) this change makes `String::from_utf16` around **twice** as fast.
let mut ret = String::with_capacity(v.len()); | ||
for c in decode_utf16(v.iter().cloned()) { | ||
if let Ok(c) = c { | ||
ret.push(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a comment explaining why the code works this way instead of the more "obvious" collect
call.
Basically, what you wrote in the PR should be in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea; since it is already rolled up, I can add this comment afterwards, along with some other assorted code adjustments.
Rollup of 16 pull requests Successful merges: - #54906 (Reattach all grandchildren when constructing specialization graph.) - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55827 (A few tweaks to iterations/collecting) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55935 (appveyor: Use VS2017 for all our images) - #55936 (save-analysis: be even more aggressive about ignorning macro-generated defs) - #55948 (submodules: update clippy from d8b4269 to 7e0ddef) - #55956 (add tests for some fixed ICEs)
…r=SimonSapin Speed up String::from_utf16 Collecting into a `Result` is idiomatic, but not necessarily fast due to rustc not being able to preallocate for the resulting collection. This is fine in case of an error, but IMO we should optimize for the common case, i.e. a successful conversion. This changes the behavior of `String::from_utf16` from collecting into a `Result` to pushing to a preallocated `String` in a loop. According to [my simple benchmark](https://gist.github.com/ljedrz/953a3fb74058806519bd4d640d6f65ae) this change makes `String::from_utf16` around **twice** as fast.
Rollup of 17 pull requests Successful merges: - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55750 (Make `NodeId` and `HirLocalId` `newtype_index`) - #55778 (Wrap some query results in `Lrc`.) - #55781 (More precise spans for temps and their drops) - #55785 (Add mem::forget_unsized() for forgetting unsized values) - #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint) - #55865 (Unix RwLock: avoid racy access to write_locked) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55956 (add tests for some fixed ICEs) Failed merges: r? @ghost
@ljedrz Is this a size_hint limitation when collecting into Result<,>? Otherwise I see no reason why the slice iterator couldn't propagate the hint to String::from_iter. |
@arthurprs Yes; the implementation of |
Collecting into a
Result
is idiomatic, but not necessarily fast due to rustc not being able to preallocate for the resulting collection. This is fine in case of an error, but IMO we should optimize for the common case, i.e. a successful conversion.This changes the behavior of
String::from_utf16
from collecting into aResult
to pushing to a preallocatedString
in a loop.According to my simple benchmark this change makes
String::from_utf16
around twice as fast.