Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the case of "centi-meter" and "100-kilometer" #4418

Merged
merged 30 commits into from
Jan 5, 2024
Merged
Changes from 6 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
bd44037
Fix the case of "centi-meter" and "100-kilometer"
younies Dec 7, 2023
4413060
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Dec 12, 2023
ddf1121
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Dec 14, 2023
49e161d
fix after merge.
younies Dec 14, 2023
1deb947
fix fmt
younies Dec 14, 2023
bec7a3c
Merge branch 'main' into fix-extracting-units
younies Dec 17, 2023
7e57267
fix the cases of "centi-meter" and "100-kilometer"
younies Dec 18, 2023
11f6d4f
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Dec 18, 2023
ef2444a
try to convert from from `&ZeroTrie<ZeroVec<'_, u8>>` to &ZeroTrieSim…
younies Dec 18, 2023
2499082
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Dec 19, 2023
69a9327
Use const TRIE for powers
younies Dec 19, 2023
7e42334
Make constants TRIEs
younies Dec 19, 2023
7eee337
silent the test
younies Dec 19, 2023
e46cc25
Use ZeroVec instead of Vec
Manishearth Dec 19, 2023
a975af3
Update experimental/unitsconversion/src/measureunit.rs
younies Dec 29, 2023
28a750c
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Dec 29, 2023
8eff0c7
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Jan 3, 2024
4a3458a
Merge branch 'main' of github.com:unicode-org/icu4x into fix-extracti…
younies Jan 4, 2024
b4b1c72
add more test cases
younies Jan 4, 2024
89a3a8d
fix fmt
younies Jan 4, 2024
3c0c3f7
fix clippy
younies Jan 4, 2024
14d9162
fix all the cases
younies Jan 4, 2024
25fbd09
fix clippy
younies Jan 4, 2024
07982db
fix clippy
younies Jan 4, 2024
e1e91a1
fix clippy
younies Jan 4, 2024
2754835
fix fmt
younies Jan 4, 2024
92577dc
add tests for the non parsable units
younies Jan 4, 2024
e9232c9
update bakkeddata
younies Jan 4, 2024
61c9de3
cargo make download-repo-sources
younies Jan 4, 2024
8e2a1e6
cargo make testdata
younies Jan 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions experimental/unitsconversion/src/measureunit.rs
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,27 @@ impl<'data> MeasureUnitParser<'data> {
/// NOTE:
/// if the unit id is found, the function will return (unit id, part without the unit id and without `-` at the beginning of the remaining part if it exists).
/// if the unit id is not found, the function will return None.
fn get_unit_id(&self, part: &'data str) -> Option<usize> {
self.payload.get(part.as_bytes())
fn get_unit_id(
&self,
part: &str,
identifier_split: &mut std::str::Split<'data, char>,
) -> Option<usize> {
let mut part = part.to_string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a Cow to avoid allocation in the general case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ZeroTrie cursor is available now so you should just use that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I will do that right away.

In order to add the test cases, could you reply to this comment:
https://github.com/unicode-org/icu4x/pull/4422/files#r1419262946

I need to be able to read the provider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use the ZeroTrie cursor, this means that we need to change the algorithm in getting the power and the si prefix.

Shall we do it in another PR and test the performance. who knows which implementation is better in performance ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't allocate memory in hot library code like this. We have clients who are sensitive to it. Also, we have a lot of experiential evidence that memory allocations are one of the biggest single contributors to slow code, so I have no doubt that ZeroTrieCursor will be faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, no problem. actually, I am implementing it right now.

loop {
if let Some(unit_id) = self.payload.get(&part) {
return Some(unit_id);
}

match identifier_split.next() {
Some(next_part) => {
if !part.is_empty() {
part.push('-');
}
part.push_str(next_part);
}
None => return None,
}
}
}

/// Process a part of an identifier.
Expand Down Expand Up @@ -100,7 +119,7 @@ impl<'data> MeasureUnitParser<'data> {

let (si_prefix, identifier_after_si) = Self::get_si_prefix(part);
let unit_id = self
.get_unit_id(identifier_after_si)
.get_unit_id(identifier_after_si, &mut identifier_split)
.ok_or(ConversionError::InvalidUnit)?;

result.push(MeasureUnitItem {
Expand Down