Fix NFKD for accented digraph followed by accent (unicode-org#4530)
hsivonen committed Jan 23, 2024
1 parent 40b418c commit 768235d
Showing 3 changed files with 29 additions and 3 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -4,11 +4,15 @@
- [Remove icu_datagen's dep on `fractional`](https://github.com/unicode-org/icu4x/pull/4472)
- `icu_datagen@1.4.1`

- Fix normalization of character whose decomposition contains more than one starter and ends with a non-starter followed by a non-starter
with a lower Canonical Combining Class than the last character of the decomposition. (https://github.com/unicode-org/icu4x/pull/4530)
- `icu_normalizer@1.4.1`

## icu4x 1.4 (Nov 16, 2023)

- General
- MSRV is now 1.67

- Components
- Compiled data updated to CLDR 44 and ICU 74 (https://github.com/unicode-org/icu4x/pull/4245)
- `icu_calendar`
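
The new changelog entry above packs several conditions into one sentence. As a rough illustration only (this is not code from the commit), the predicate below spells them out one by one. The names `ccc` and `hits_fixed_case` are hypothetical, and `ccc` is a hand-written Canonical Combining Class lookup that covers just the code points used in this commit's tests.

```rust
/// Hypothetical Canonical Combining Class lookup. It covers only the code
/// points used here; everything else is treated as a starter (class 0).
fn ccc(c: char) -> u8 {
    match c {
        '\u{0334}' => 1,                // COMBINING TILDE OVERLAY
        '\u{0DCA}' => 9,                // SINHALA SIGN AL-LAKUNA
        '\u{0323}' => 220,              // COMBINING DOT BELOW
        '\u{0301}' | '\u{030C}' => 230, // COMBINING ACUTE ACCENT, COMBINING CARON
        _ => 0,
    }
}

/// Returns true when a full decomposition plus the character that follows it
/// matches the case described in the changelog entry (illustrative name).
fn hits_fixed_case(decomposition: &[char], next: char) -> bool {
    // "contains more than one starter"
    let starters = decomposition.iter().filter(|&&c| ccc(c) == 0).count();
    // "ends with a non-starter"
    let last_class = decomposition.last().map_or(0, |&c| ccc(c));
    // "followed by a non-starter with a lower Canonical Combining Class"
    let next_class = ccc(next);
    starters > 1 && last_class != 0 && next_class != 0 && next_class < last_class
}

fn main() {
    // U+01C4 under NFKD (D, Z, caron) followed by dot below.
    assert!(hits_fixed_case(&['D', 'Z', '\u{030C}'], '\u{0323}'));
    // U+0DDD under NFD followed by tilde overlay.
    assert!(hits_fixed_case(&['\u{0DD9}', '\u{0DCF}', '\u{0DCA}'], '\u{0334}'));
    // Same class as the caron: no reordering is needed.
    assert!(!hits_fixed_case(&['D', 'Z', '\u{030C}'], '\u{0301}'));
    // Only one starter in the decomposition: outside the described case.
    assert!(!hits_fixed_case(&['Z', '\u{030C}'], '\u{0323}'));
}
```

Both inputs exercised by the tests added in this commit satisfy the predicate, which is consistent with the changelog wording.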
4 changes: 2 additions & 2 deletions components/normalizer/src/lib.rs
@@ -637,7 +637,7 @@ where
     i += 1;
     // Half-width kana and iota subscript don't occur in the tails
     // of these multicharacter decompositions.
-    if decomposition_starts_with_non_starter(trie_value) {
+    if !decomposition_starts_with_non_starter(trie_value) {
         combining_start = i;
     }
 }
@@ -676,7 +676,7 @@ where
     i += 1;
     // Half-width kana and iota subscript don't occur in the tails
     // of these multicharacter decompositions.
-    if decomposition_starts_with_non_starter(trie_value) {
+    if !decomposition_starts_with_non_starter(trie_value) {
         combining_start = i;
     }
 }
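
To see why negating the condition matters, here is a minimal standalone sketch of the `combining_start` bookkeeping. It is a simplified model, not the `icu_normalizer` implementation: `ccc` is a hypothetical hand-written class table, `reorder_sketch` is an invented name, the `fixed` flag merely switches between the old and new condition, and `following` is assumed to contain only non-starters.

```rust
/// Hypothetical Canonical Combining Class lookup for the code points used
/// below; everything else is treated as a starter (class 0).
fn ccc(c: char) -> u8 {
    match c {
        '\u{0334}' => 1,   // COMBINING TILDE OVERLAY
        '\u{0DCA}' => 9,   // SINHALA SIGN AL-LAKUNA
        '\u{0323}' => 220, // COMBINING DOT BELOW
        '\u{030C}' => 230, // COMBINING CARON
        _ => 0,
    }
}

/// Pushes `decomposition`, tracking where the trailing run of non-starters
/// begins, then appends `following` (assumed to be non-starters) and sorts
/// the run by combining class. `fixed` selects the new or the old condition.
fn reorder_sketch(decomposition: &[char], following: &[char], fixed: bool) -> String {
    let mut buffer: Vec<char> = Vec::new();
    let mut combining_start = 0;
    let mut i = 0;
    for &c in decomposition {
        buffer.push(c);
        i += 1;
        let non_starter = ccc(c) != 0;
        // New condition: move the boundary past each starter.
        // Old condition: moved it past each non-starter instead.
        let update = if fixed { !non_starter } else { non_starter };
        if update {
            combining_start = i;
        }
    }
    buffer.extend_from_slice(following);
    // `sort_by_key` is stable, so marks with equal classes keep their order.
    buffer[combining_start..].sort_by_key(|&c| ccc(c));
    buffer.into_iter().collect()
}

fn main() {
    let digraph = ['D', 'Z', '\u{030C}'];
    // With the new condition the caron and the dot below are reordered.
    assert_eq!(reorder_sketch(&digraph, &['\u{0323}'], true), "DZ\u{0323}\u{030C}");
    // With the old condition the caron is left outside the sorted run.
    assert_eq!(reorder_sketch(&digraph, &['\u{0323}'], false), "DZ\u{030C}\u{0323}");
}
```

In this model the old condition places the boundary after the caron, so the dot below is sorted alone and never moves in front of it.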
22 changes: 22 additions & 0 deletions components/normalizer/tests/tests.rs
@@ -1308,6 +1308,28 @@ fn test_utf16_basic() {
     );
 }

+#[test]
+fn test_accented_digraph() {
+    let normalizer: DecomposingNormalizer = DecomposingNormalizer::new_nfkd();
+    assert_eq!(
+        normalizer.normalize("\u{01C4}\u{0323}"),
+        "DZ\u{0323}\u{030C}"
+    );
+    assert_eq!(
+        normalizer.normalize("DZ\u{030C}\u{0323}"),
+        "DZ\u{0323}\u{030C}"
+    );
+}
+
+#[test]
+fn test_ddd() {
+    let normalizer: DecomposingNormalizer = DecomposingNormalizer::new_nfd();
+    assert_eq!(
+        normalizer.normalize("\u{0DDD}\u{0334}"),
+        "\u{0DD9}\u{0DCF}\u{0334}\u{0DCA}"
+    );
+}
+
 #[test]
 fn test_is_normalized() {
     let nfd: DecomposingNormalizer = DecomposingNormalizer::new_nfd();