-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify CharIndices.next #61070
Simplify CharIndices.next #61070
Conversation
Char.len_utf8 is stable since rust-lang#49698, making this a little easier to follow.
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @rkruppe (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
@bors rollup |
cc @SimonSapin |
Thanks, LGTM! @bors r+ |
📌 Commit 7af83dc has been approved by |
|
@bors r- (let's wait with merging until this performance question is cleared up) |
I don’t read assembly fluently enough to conclude anything from this: |
I created a sample benchmark to compare the two implementations: benchmark#![feature(test)]
extern crate test;
use std::str::Chars;
pub struct CharIndices<'a> {
front_offset: usize,
iter: Chars<'a>,
}
pub fn before(self_: &mut CharIndices) -> Option<(usize, char)> {
let pre_len = self_.iter.as_str().len();
match self_.iter.next() {
None => None,
Some(ch) => {
let index = self_.front_offset;
let len = self_.iter.as_str().len();
self_.front_offset += pre_len - len;
Some((index, ch))
}
}
}
pub fn after(self_: &mut CharIndices) -> Option<(usize, char)> {
let ch = self_.iter.next()?;
let index = self_.front_offset;
self_.front_offset += ch.len_utf8();
Some((index, ch))
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
#[bench]
fn before(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.chars().count();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some(_) = super::before(&mut chars) {
i += 1;
}
assert_eq!(i, len);
});
}
#[bench]
fn after(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.chars().count();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some(_) = super::after(&mut chars) {
i += 1;
}
assert_eq!(i, len);
});
}
}
It appears they're roughly the same speed? It might be that I'm just doing something wrong, the assembly definitely looks more complex the the after output above. Before: https://rust.godbolt.org/z/sIJTXu |
Nevermind, when I updated the benchmark to actually use the index, it's slower: benchmark using index#![feature(test)]
extern crate test;
use std::str::Chars;
pub struct CharIndices<'a> {
front_offset: usize,
iter: Chars<'a>,
}
pub fn before(self_: &mut CharIndices) -> Option<(usize, char)> {
let pre_len = self_.iter.as_str().len();
match self_.iter.next() {
None => None,
Some(ch) => {
let index = self_.front_offset;
let len = self_.iter.as_str().len();
self_.front_offset += pre_len - len;
Some((index, ch))
}
}
}
pub fn after(self_: &mut CharIndices) -> Option<(usize, char)> {
let ch = self_.iter.next()?;
let index = self_.front_offset;
self_.front_offset += ch.len_utf8();
Some((index, ch))
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
#[bench]
fn before(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.len();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some((index, _)) = super::before(&mut chars) {
i = index;
}
assert_eq!(i + 1, len);
});
}
#[bench]
fn after(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.len();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some((index, _)) = super::after(&mut chars) {
i = index;
}
assert_eq!(i + 1, len);
});
}
}
|
Char.len_utf8
is stable since #49698, making this a little easier to follow.