Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename _legacy to _adjust_to_cased #3826

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion components/casemap/benches/casemap.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ fn overview_bench(c: &mut Criterion) {
c.bench_function("icu_casemap/titlecase_segment", |b| {
b.iter(|| {
for s in TEST_STRING_EN.split(' ') {
black_box(casemapper.titlecase_segment_legacy_to_string(
black_box(casemapper.titlecase_segment_adjust_to_cased_to_string(
black_box(s),
&root,
Default::default(),
Expand Down
62 changes: 33 additions & 29 deletions components/casemap/src/casemapper.rs
Original file line number Diff line number Diff line change
Expand Up @@ -146,16 +146,16 @@ impl CaseMapper {
/// as a `LanguageIdentifier` (usually the `id` field of the `Locale`) if available, or
/// `Default::default()` for the root locale.
///
/// This function performs legacy head adjustment behavior when [`HeadAdjustment::Adjust`] is set. See
/// This function performs "adjust to cased" head adjustment behavior when [`HeadAdjustment::Adjust`] is set. See
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"head adjustment" is confusing. I have never heard "head" used in discussion of titlecasing.
How about "titlecasing index adjustment" or similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been calling this HeadAdjustment and TailCasing which I think has a nice symmetry. "Index adjustment" feels more internal, I think "head" is quite clear personally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed:

  • Head/Tail
  • Lead/Trail
  • Start/Remainder

/// the docs of [`TitlecaseMapper`] for more information on what this means. There is no difference between
/// the behavior of this function and the equivalent ones on [`TitlecaseMapper`] when the head adjustment mode
/// is [`HeadAdjustment::NoAdjust`].
///
/// See [`Self::titlecase_segment_legacy_to_string()`] for the equivalent convenience function that returns a String,
/// See [`Self::titlecase_segment_adjust_to_cased_to_string()`] for the equivalent convenience function that returns a String,
/// as well as for an example.
///
/// [`TitlecaseMapper`]: crate::TitlecaseMapper
pub fn titlecase_segment_legacy<'a>(
pub fn titlecase_segment_adjust_to_cased<'a>(
&'a self,
src: &'a str,
langid: &LanguageIdentifier,
Expand Down Expand Up @@ -315,7 +315,7 @@ impl CaseMapper {
/// the behavior of this function and the equivalent ones on [`TitlecaseMapper`] when the head adjustment mode
/// is [`HeadAdjustment::NoAdjust`].
///
/// See [`Self::titlecase_segment_legacy()`] for the equivalent lower-level function that returns a [`Writeable`]
/// See [`Self::titlecase_segment_adjust_to_cased()`] for the equivalent lower-level function that returns a [`Writeable`]
///
/// # Examples
///
Expand All @@ -330,30 +330,30 @@ impl CaseMapper {
///
/// // note that the subsequent words are not titlecased, this function assumes
/// // that the entire string is a single segment and only titlecases at the beginning.
/// assert_eq!(cm.titlecase_segment_legacy_to_string("hEllO WorLd", &root, default_options), "Hello world");
/// assert_eq!(cm.titlecase_segment_legacy_to_string("Γειά σου Κόσμε", &root, default_options), "Γειά σου κόσμε");
/// assert_eq!(cm.titlecase_segment_legacy_to_string("नमस्ते दुनिया", &root, default_options), "नमस्ते दुनिया");
/// assert_eq!(cm.titlecase_segment_legacy_to_string("Привет мир", &root, default_options), "Привет мир");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("hEllO WorLd", &root, default_options), "Hello world");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("Γειά σου Κόσμε", &root, default_options), "Γειά σου κόσμε");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("नमस्ते दुनिया", &root, default_options), "नमस्ते दुनिया");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("Привет мир", &root, default_options), "Привет мир");
///
/// // Some behavior is language-sensitive
/// assert_eq!(cm.titlecase_segment_legacy_to_string("istanbul", &root, default_options), "Istanbul");
/// assert_eq!(cm.titlecase_segment_legacy_to_string("istanbul", &langid!("tr"), default_options), "İstanbul"); // Turkish dotted i
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("istanbul", &root, default_options), "Istanbul");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("istanbul", &langid!("tr"), default_options), "İstanbul"); // Turkish dotted i
///
/// assert_eq!(cm.titlecase_segment_legacy_to_string("և Երևանի", &root, default_options), "Եւ երևանի");
/// assert_eq!(cm.titlecase_segment_legacy_to_string("և Երևանի", &langid!("hy"), default_options), "Եվ երևանի"); // Eastern Armenian ech-yiwn ligature
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("և Երևանի", &root, default_options), "Եւ երևանի");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("և Երևանի", &langid!("hy"), default_options), "Եվ երևանի"); // Eastern Armenian ech-yiwn ligature
///
/// assert_eq!(cm.titlecase_segment_legacy_to_string("ijkdijk", &root, default_options), "Ijkdijk");
/// assert_eq!(cm.titlecase_segment_legacy_to_string("ijkdijk", &langid!("nl"), default_options), "IJkdijk"); // Dutch IJ digraph
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("ijkdijk", &root, default_options), "Ijkdijk");
/// assert_eq!(cm.titlecase_segment_adjust_to_cased_to_string("ijkdijk", &langid!("nl"), default_options), "IJkdijk"); // Dutch IJ digraph
/// ```
///
/// [`TitlecaseMapper`]: crate::TitlecaseMapper
pub fn titlecase_segment_legacy_to_string(
pub fn titlecase_segment_adjust_to_cased_to_string(
&self,
src: &str,
langid: &LanguageIdentifier,
options: TitlecaseOptions,
) -> String {
self.titlecase_segment_legacy(src, langid, options)
self.titlecase_segment_adjust_to_cased(src, langid, options)
.write_to_string()
.into_owned()
}
Expand Down Expand Up @@ -621,37 +621,41 @@ mod tests {
);
// but the YPOGEGRAMMENI should not titlecase
assert_eq!(
cm.titlecase_segment_legacy_to_string("α\u{0313}\u{0345}", &root, default_options),
cm.titlecase_segment_adjust_to_cased_to_string(
"α\u{0313}\u{0345}",
&root,
default_options
),
"Α\u{0313}\u{0345}"
);

// U+1F80 GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
assert_eq!(
cm.titlecase_segment_legacy_to_string("ᾀ", &root, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("ᾀ", &root, default_options),
"ᾈ"
);
assert_eq!(cm.uppercase_to_string("ᾀ", &root), "ἈΙ");

// U+1FFC GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
assert_eq!(cm.lowercase_to_string("ῼ", &root), "ῳ");
assert_eq!(
cm.titlecase_segment_legacy_to_string("ῼ", &root, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("ῼ", &root, default_options),
"ῼ"
);
assert_eq!(cm.uppercase_to_string("ῼ", &root), "ΩΙ");

// U+1F98 GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI
assert_eq!(cm.lowercase_to_string("ᾘ", &root), "ᾐ");
assert_eq!(
cm.titlecase_segment_legacy_to_string("ᾘ", &root, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("ᾘ", &root, default_options),
"ᾘ"
);
assert_eq!(cm.uppercase_to_string("ᾘ", &root), "ἨΙ");

// U+1FB2 GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI
assert_eq!(cm.lowercase_to_string("ᾲ", &root), "ᾲ");
assert_eq!(
cm.titlecase_segment_legacy_to_string("ᾲ", &root, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("ᾲ", &root, default_options),
"Ὰ\u{345}"
);
assert_eq!(cm.uppercase_to_string("ᾲ", &root), "ᾺΙ");
Expand All @@ -667,11 +671,11 @@ mod tests {
assert_eq!(cm.lowercase_to_string("İ", &tr), "i");
assert_eq!(cm.lowercase_to_string("İ", &az), "i");
assert_eq!(
cm.titlecase_segment_legacy_to_string("İ", &tr, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("İ", &tr, default_options),
"İ"
);
assert_eq!(
cm.titlecase_segment_legacy_to_string("İ", &az, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("İ", &az, default_options),
"İ"
);
assert_eq!(cm.uppercase_to_string("İ", &tr), "İ");
Expand All @@ -681,11 +685,11 @@ mod tests {
assert_eq!(cm.lowercase_to_string("I\u{0307}", &tr), "i");
assert_eq!(cm.lowercase_to_string("I\u{0307}", &az), "i");
assert_eq!(
cm.titlecase_segment_legacy_to_string("I\u{0307}", &tr, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("I\u{0307}", &tr, default_options),
"I\u{0307}"
);
assert_eq!(
cm.titlecase_segment_legacy_to_string("I\u{0307}", &az, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("I\u{0307}", &az, default_options),
"I\u{0307}"
);
assert_eq!(cm.uppercase_to_string("I\u{0307}", &tr), "I\u{0307}");
Expand All @@ -695,11 +699,11 @@ mod tests {
assert_eq!(cm.lowercase_to_string("I", &tr), "ı");
assert_eq!(cm.lowercase_to_string("I", &az), "ı");
assert_eq!(
cm.titlecase_segment_legacy_to_string("I", &tr, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("I", &tr, default_options),
"I"
);
assert_eq!(
cm.titlecase_segment_legacy_to_string("I", &az, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("I", &az, default_options),
"I"
);
assert_eq!(cm.uppercase_to_string("I", &tr), "I");
Expand All @@ -709,11 +713,11 @@ mod tests {
assert_eq!(cm.lowercase_to_string("i", &tr), "i");
assert_eq!(cm.lowercase_to_string("i", &az), "i");
assert_eq!(
cm.titlecase_segment_legacy_to_string("i", &tr, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("i", &tr, default_options),
"İ"
);
assert_eq!(
cm.titlecase_segment_legacy_to_string("i", &az, default_options),
cm.titlecase_segment_adjust_to_cased_to_string("i", &az, default_options),
"İ"
);
assert_eq!(cm.uppercase_to_string("i", &tr), "İ");
Expand Down
30 changes: 15 additions & 15 deletions components/casemap/src/titlecase.rs
Original file line number Diff line number Diff line change
Expand Up @@ -54,15 +54,15 @@ pub struct TitlecaseOptions {
}

/// A wrapper around [`CaseMapper`] that can compute titlecasing stuff, and is able to load additional data
/// to support the non-legacy "head adjustment" behavior.
/// to support the non-"adjust to cased" "head adjustment" behavior.
///
///
/// By default, [`Self::titlecase_segment()`] and [`Self::titlecase_segment_to_string()`] perform "head adjustment",
/// where they wait till the first relevant character to begin titlecasing. For example, in the string `'twixt`, the apostrophe
/// is ignored because the word starts at the first "t", which will get titlecased (producing `'Twixt`). Other punctuation will
/// also be ignored, like in the string `«hello»`, which will get titlecased to `«Hello»`.
///
/// Opinions on exactly what is a "relevant" character may differ. In "legacy" mode the first cased character is considered "relevant",
/// Opinions on exactly what is a "relevant" character may differ. In "adjust to cased" mode the first cased character is considered "relevant",
/// whereas in the normal mode, it is the first character that is a letter, number, symbol, or private use character. This means
/// that the strings `49ers` and `«丰(abc)»` will titlecase to `49Ers` and `«丰(Abc)»`, whereas in the normal mode they stay unchanged.
/// This difference largely matters for things that mix numbers and letters, or mix writing systems, within a single segment.
Expand Down Expand Up @@ -111,7 +111,7 @@ pub struct TitlecaseOptions {
/// use icu_locid::langid;
///
/// let cm_normal = TitlecaseMapper::new();
/// let cm_legacy = TitlecaseMapper::new_legacy();
/// let cm_adjust_to_cased = TitlecaseMapper::new_adjust_to_cased();
/// let root = langid!("und");
///
/// let default_options = Default::default();
Expand All @@ -120,17 +120,17 @@ pub struct TitlecaseOptions {
///
/// // Exhibits head adjustment when set:
/// assert_eq!(cm_normal.titlecase_segment_to_string("«hello»", &root, default_options), "«Hello»");
/// assert_eq!(cm_legacy.titlecase_segment_to_string("«hello»", &root, default_options), "«Hello»");
/// assert_eq!(cm_adjust_to_cased.titlecase_segment_to_string("«hello»", &root, default_options), "«Hello»");
/// assert_eq!(cm_normal.titlecase_segment_to_string("«hello»", &root, no_adjust), "«hello»");
///
/// // Only changed in legacy mode:
/// assert_eq!(cm_normal.titlecase_segment_to_string("丰(abc)", &root, default_options), "丰(abc)");
/// assert_eq!(cm_legacy.titlecase_segment_to_string("丰(abc)", &root, default_options), "丰(Abc)");
/// assert_eq!(cm_adjust_to_cased.titlecase_segment_to_string("丰(abc)", &root, default_options), "丰(Abc)");
/// assert_eq!(cm_normal.titlecase_segment_to_string("丰(abc)", &root, no_adjust), "丰(abc)");
///
/// // Only changed in legacy mode:
/// assert_eq!(cm_normal.titlecase_segment_to_string("49ers", &root, default_options), "49ers");
/// assert_eq!(cm_legacy.titlecase_segment_to_string("49ers", &root, default_options), "49Ers");
/// assert_eq!(cm_adjust_to_cased.titlecase_segment_to_string("49ers", &root, default_options), "49Ers");
/// assert_eq!(cm_normal.titlecase_segment_to_string("49ers", &root, no_adjust), "49ers");
/// ```
/// <div class="stab unstable">
Expand All @@ -146,7 +146,7 @@ pub struct TitlecaseMapper<CM> {
}

impl TitlecaseMapper<CaseMapper> {
/// A constructor which creates a [`TitlecaseMapper`] using compiled data, with the normal (non-legacy) head adjustment behavior.
/// A constructor which creates a [`TitlecaseMapper`] using compiled data, with the normal (non-"adjust to cased") head adjustment behavior.
/// See struct docs on [`TitlecaseMapper`] for more information on head adjustment behavior and usage examples.
///
/// ✨ *Enabled with the `compiled_data` Cargo feature.*
Expand All @@ -159,14 +159,14 @@ impl TitlecaseMapper<CaseMapper> {
gc: Some(icu_properties::maps::general_category().static_to_owned()),
}
}
/// A constructor which creates a [`TitlecaseMapper`] using compiled data, with the legacy head adjustment behavior.
/// A constructor which creates a [`TitlecaseMapper`] using compiled data, with the "adjust to cased" head adjustment behavior.
/// See struct docs on [`TitlecaseMapper`] for more information on head adjustment behavior and usage examples.
///
/// ✨ *Enabled with the `compiled_data` Cargo feature.*
///
/// [📚 Help choosing a constructor](icu_provider::constructors)
#[cfg(feature = "compiled_data")]
pub const fn new_legacy() -> Self {
pub const fn new_adjust_to_cased() -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: on second thought, we don't really need this constructor. People can use CaseMapper::titlecase_segment_adjust_to_cased if they don't want to load the extra data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they want to still have it be the same type, though? Might make it clunky

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind-of confusing that HeadAdjustment has 2 choices but there are 3 behaviors. We can add all 3 behaviors to that enum. Over in CaseMapper::titlecase_segment_adjust_to_cased, we decide what to do if given the enum variant which we don't support.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but the choice of behaviors affects data; we had kinda discussed this model and the 2x2 chart with the merged cells

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People who care about their data have a path; people who don't care as much about their data have a cleaner API. I think that's slightly better than a confusing API that affects everyone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you proposing we remove specifically the compiled_data adjust_to_cased ctor? That seems okay.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do over FFI, then?

Self {
cm: CaseMapper::new(),
gc: None,
Expand All @@ -185,10 +185,10 @@ impl TitlecaseMapper<CaseMapper> {
icu_provider::gen_any_buffer_data_constructors!(locale: skip, options: skip, error: DataError,
#[cfg(skip)]
functions: [
new_legacy,
try_new_legacy_with_any_provider,
try_new_legacy_with_buffer_provider,
try_new_legacy_unstable,
new_adjust_to_cased,
try_new_adjust_to_cased_with_any_provider,
try_new_adjust_to_cased_with_buffer_provider,
try_new_adjust_to_cased_unstable,
Self,
]);

Expand All @@ -207,7 +207,7 @@ impl TitlecaseMapper<CaseMapper> {
Ok(Self { cm, gc })
}
#[doc = icu_provider::gen_any_buffer_unstable_docs!(UNSTABLE, Self::new)]
pub fn try_new_legacy_unstable<P>(provider: &P) -> Result<Self, DataError>
pub fn try_new_adjust_to_cased_unstable<P>(provider: &P) -> Result<Self, DataError>
where
P: DataProvider<CaseMapV1Marker> + ?Sized,
{
Expand Down Expand Up @@ -247,7 +247,7 @@ impl<CM: AsRef<CaseMapper>> TitlecaseMapper<CM> {
/// See struct docs on [`TitlecaseMapper`] for more information on head adjustment behavior.
///
/// [📚 Help choosing a constructor](icu_provider::constructors)
pub const fn new_with_mapper_legacy(casemapper: CM) -> Self {
pub const fn new_with_mapper_adjust_to_cased(casemapper: CM) -> Self {
Self {
cm: casemapper,
gc: None,
Expand Down
Loading
Loading