Add Writeable::write_cmp_bytes and use it in DataLocale and Locale #4402
Merged
Commits (5)
48aafab Add benches for DataLocale (sffc)
b5033d5 Add write_cmp function (sffc)
9822dbe Implement the comparison with bytes instead of strings (sffc)
8f1571e Use the new comparison code in DataLocale::strict_cmp (sffc)
c67e585 Use the new comparison code in icu_locid and deprecate the old stuff (sffc)
@@ -0,0 +1,86 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

extern crate alloc;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use icu_provider::prelude::*;
use std::str::FromStr;
use writeable::Writeable;

static BCP47_STRINGS: &[&str] = &[
    "ca",
    "ca-ES",
    "ca-ES-u-ca-buddhist",
    "ca-ES-valencia",
    "ca-ES-x-gbp",
    "ca-ES-x-gbp-short",
    "ca-ES-x-usd",
    "ca-ES-xyzabc",
    "ca-x-eur",
    "cat",
    "pl-Latn-PL",
    "und",
    "und-fonipa",
    "und-u-ca-hebrew",
    "und-u-ca-japanese",
    "und-x-mxn",
    "zh",
];

fn overview_bench(c: &mut Criterion) {
    c.bench_function("data_locale/overview", |b| {
        b.iter(|| {
            for s in black_box(BCP47_STRINGS).iter() {
                let loc = DataLocale::from_str(s).unwrap();
                let loc = loc.clone();
                let s = loc.write_to_string();
                loc.strict_cmp(s.as_bytes());
            }
        });
    });

    #[cfg(feature = "bench")]
    data_locale_bench(c);
}

#[cfg(feature = "bench")]
fn data_locale_bench(c: &mut Criterion) {
    c.bench_function("data_locale/parse", |b| {
        b.iter(|| {
            for s in black_box(BCP47_STRINGS).iter() {
                DataLocale::from_str(s).unwrap();
            }
        });
    });

    let data_locales: Vec<DataLocale> = BCP47_STRINGS.iter().map(|s| s.parse().unwrap()).collect();

    c.bench_function("data_locale/write_to_string", |b| {
        b.iter(|| {
            for loc in black_box(&data_locales).iter() {
                loc.write_to_string();
            }
        });
    });
    c.bench_function("data_locale/clone", |b| {
        b.iter(|| {
            for loc in black_box(&data_locales).iter() {
                let _ = loc.clone();
            }
        });
    });
    c.bench_function("data_locale/strict_cmp", |b| {
        b.iter(|| {
            for loc in black_box(&data_locales).iter() {
                for s in black_box(BCP47_STRINGS).iter() {
                    loc.strict_cmp(s.as_bytes());
                }
            }
        });
    });
}

criterion_group!(benches, overview_bench,);
criterion_main!(benches);
@@ -0,0 +1,99 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use core::cmp::Ordering;
use core::fmt;

pub(crate) struct WriteComparator<'a> {
    string: &'a [u8],
    result: Ordering,
}

/// This is an infallible impl. Functions always return Ok, not Err.
impl<'a> fmt::Write for WriteComparator<'a> {
    #[inline]
    fn write_str(&mut self, other: &str) -> fmt::Result {
        if self.result != Ordering::Equal {
            return Ok(());
        }
        let cmp_len = core::cmp::min(other.len(), self.string.len());
        let (this, remainder) = self.string.split_at(cmp_len);
        self.string = remainder;
        self.result = this.cmp(other.as_bytes());
        Ok(())
    }
}

impl<'a> WriteComparator<'a> {
    #[inline]
    pub fn new(string: &'a (impl AsRef<[u8]> + ?Sized)) -> Self {
        Self {
            string: string.as_ref(),
            result: Ordering::Equal,
        }
    }

    #[inline]
    pub fn finish(self) -> Ordering {
        if matches!(self.result, Ordering::Equal) && !self.string.is_empty() {
            // Self is longer than Other
            Ordering::Greater
        } else {
            self.result
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use core::fmt::Write;

    mod data {
        include!("../tests/data/data.rs");
    }

    #[test]
    fn test_write_char() {
        for a in data::KEBAB_CASE_STRINGS {
            for b in data::KEBAB_CASE_STRINGS {
                let mut wc = WriteComparator::new(a);
                for ch in b.chars() {
                    wc.write_char(ch).unwrap();
                }
                assert_eq!(a.cmp(b), wc.finish(), "{a} <=> {b}");
            }
        }
    }

    #[test]
    fn test_write_str() {
        for a in data::KEBAB_CASE_STRINGS {
            for b in data::KEBAB_CASE_STRINGS {
                let mut wc = WriteComparator::new(a);
                wc.write_str(b).unwrap();
                assert_eq!(a.cmp(b), wc.finish(), "{a} <=> {b}");
            }
        }
    }

    #[test]
    fn test_mixed() {
        for a in data::KEBAB_CASE_STRINGS {
            for b in data::KEBAB_CASE_STRINGS {
                let mut wc = WriteComparator::new(a);
                let mut first = true;
                for substr in b.split('-') {
                    if first {
                        first = false;
                    } else {
                        wc.write_char('-').unwrap();
                    }
                    wc.write_str(substr).unwrap();
                }
                assert_eq!(a.cmp(b), wc.finish(), "{a} <=> {b}");
            }
        }
    }
}
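For orientation, here is a minimal sketch of how a comparator like this might be driven by a caller that wants to order a formatted value against a byte string without allocating. The `WriteComparator` body is copied from the diff so the snippet compiles on its own (with `new` simplified to take `&[u8]`); `display_cmp_bytes` is a hypothetical helper written for this illustration, not the `write_cmp_bytes` method the PR adds, whose exact body is not shown on this page.

```rust
use core::cmp::Ordering;
use core::fmt::{self, Write};

/// Copy of the comparator from the diff, so that this sketch is self-contained.
struct WriteComparator<'a> {
    string: &'a [u8],
    result: Ordering,
}

impl<'a> fmt::Write for WriteComparator<'a> {
    fn write_str(&mut self, other: &str) -> fmt::Result {
        // Once a difference has been found, later chunks cannot change it.
        if self.result != Ordering::Equal {
            return Ok(());
        }
        let cmp_len = core::cmp::min(other.len(), self.string.len());
        let (this, remainder) = self.string.split_at(cmp_len);
        self.string = remainder;
        self.result = this.cmp(other.as_bytes());
        Ok(())
    }
}

impl<'a> WriteComparator<'a> {
    fn new(string: &'a [u8]) -> Self {
        Self { string, result: Ordering::Equal }
    }

    fn finish(self) -> Ordering {
        if self.result == Ordering::Equal && !self.string.is_empty() {
            // Bytes are left over, so the stored string is the longer one.
            Ordering::Greater
        } else {
            self.result
        }
    }
}

/// Hypothetical helper (an assumption, not part of the PR): orders the
/// formatted output of `value` relative to `bytes` without building a `String`.
fn display_cmp_bytes(value: &impl fmt::Display, bytes: &[u8]) -> Ordering {
    let mut wc = WriteComparator::new(bytes);
    // The comparator itself never returns Err, so the result is ignored here.
    let _ = write!(wc, "{value}");
    // `finish` orders `bytes` relative to the written text; reverse it to
    // get the ordering of `value` relative to `bytes`.
    wc.finish().reverse()
}
```

The `reverse()` call is the one design point worth noting: the comparator holds the byte string as its "self" side, so its result has to be flipped when the caller thinks of the written value as the left-hand operand.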
If `other` is longer than `self`, we should be capping out, right?
If `other` is longer, then the only effect is that on the line below the remainder becomes empty and we compare the whole of `other` to the whole of `self.string`.
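The case discussed in this thread can be checked in isolation. The sketch below restates a single `write_str` step as a free function; `compare_step` is a hypothetical name introduced only for this illustration, and its body mirrors the three slice operations from the diff.

```rust
use core::cmp::Ordering;

/// One comparison step, mirroring the body of `write_str` in the diff.
/// Returns the ordering for this chunk and the unconsumed tail of `remaining`.
/// (Hypothetical helper for illustration; not part of the PR.)
fn compare_step<'a>(remaining: &'a [u8], other: &str) -> (Ordering, &'a [u8]) {
    // Never take more bytes than are left in `remaining`.
    let cmp_len = core::cmp::min(other.len(), remaining.len());
    let (this, remainder) = remaining.split_at(cmp_len);
    // If `other` is longer, `this` is all of `remaining`, `remainder` is
    // empty, and the slice comparison sees the whole of `other`.
    (this.cmp(other.as_bytes()), remainder)
}
```

When `other` is longer than what remains, the prefix taken is the entire remaining slice, so the ordinary slice comparison already yields `Less` if the shorter side is a prefix of `other`; no extra capping is needed.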