Implement MeasureUnit #4360

Merged 56 commits on Dec 7, 2023
Showing changes from 2 commits
Commits
ebcc3fe
Use ZeroTrie and remove dimension
younies Nov 17, 2023
aefef1a
Remove unneeded data in the data provider
younies Nov 17, 2023
c2edbd6
fix ci-job-msrv-features-3
younies Nov 17, 2023
5ea721a
Merge branch 'main' of github.com:unicode-org/icu4x into units-trie
younies Nov 22, 2023
dd9000a
use ZeroTrieSimpleAscii instead
younies Nov 22, 2023
63c44ce
Merge branch 'main' of github.com:unicode-org/icu4x into units-trie
younies Nov 23, 2023
e282232
Implement MeasureUnit
younies Nov 23, 2023
5861684
fix error
younies Nov 23, 2023
a2e5600
Implement measureunit
younies Nov 23, 2023
427b008
small fix
younies Nov 24, 2023
3440254
fix the analyzer
younies Nov 24, 2023
82bcd7c
Ready for the new data model
younies Nov 24, 2023
534fa73
silent test cases temporary
younies Nov 24, 2023
55ea5e1
add small output
younies Nov 24, 2023
27b65a0
process the bases units
younies Nov 24, 2023
2d2bc47
Fix test cases
younies Nov 24, 2023
77e459d
fix clippy
younies Nov 24, 2023
8dd3c86
use strip_prefix
younies Nov 24, 2023
dff30b7
fix get_unit_id
younies Nov 24, 2023
6427f4e
fix clippy
younies Nov 24, 2023
28fe962
Add comments
younies Nov 24, 2023
4b0a395
add comments
younies Nov 24, 2023
42ebe7f
Create a struct for `SiPrefix`
younies Nov 28, 2023
64ba8e5
use optional si prefix
younies Nov 28, 2023
f13d1cb
add "ronto" and "quecto"
younies Nov 28, 2023
a4b923c
return option si prefix
younies Nov 28, 2023
e839685
Merge branch 'main' of github.com:unicode-org/icu4x into units-trie
younies Nov 28, 2023
754d17e
fix clippy
younies Nov 28, 2023
dbd3e9c
Merge branch 'main' of github.com:unicode-org/icu4x into units-trie
younies Nov 29, 2023
01f4009
add more powers and si prefixes
younies Nov 29, 2023
e01e211
Fix the search for the id
younies Nov 29, 2023
bc6fcb8
SiPrefix without i8 instead of u8.
younies Nov 30, 2023
193ec7a
Instead of making the SiPrefix field optional, we can establish the d…
younies Nov 30, 2023
1002837
add ronna and quetta
younies Nov 30, 2023
7b19d2d
Update experimental/unitsconversion/src/measureunit.rs
younies Nov 30, 2023
d320904
fix
younies Nov 30, 2023
9fcf51a
reduce the created vectors.
younies Nov 30, 2023
bfff7dd
fix naming
younies Nov 30, 2023
d0d1a5d
Merge branch 'main' of github.com:unicode-org/icu4x into units-trie
younies Nov 30, 2023
1931b07
use split instead
younies Nov 30, 2023
9df4bac
improve
younies Nov 30, 2023
7c0916f
Merge branch 'main' of github.com:unicode-org/icu4x into units-trie
younies Nov 30, 2023
705355b
improve
younies Nov 30, 2023
5c6f1ee
small fix
younies Nov 30, 2023
c288a8a
improve
younies Nov 30, 2023
894bb60
fix clippy
younies Nov 30, 2023
bb144d4
fix comments
younies Dec 1, 2023
4ed7efb
move the static function
younies Dec 1, 2023
03ee92f
rename functions
younies Dec 1, 2023
6cc4f86
fix comment
younies Dec 1, 2023
0eff919
add todo
younies Dec 1, 2023
0acd58f
Merge branch 'main' into units-trie
younies Dec 1, 2023
fe3382d
use smallvec
younies Dec 1, 2023
e4fc883
Merge branch 'main' into units-trie
younies Dec 1, 2023
20d2013
Merge branch 'main' into units-trie
younies Dec 1, 2023
70c9f31
Merge branch 'main' into units-trie
younies Dec 4, 2023
86 changes: 36 additions & 50 deletions experimental/unitsconversion/src/measureunit.rs
@@ -43,29 +43,17 @@ impl MeasureUnit<'_> {
/// NOTE:
/// if the power is found, the function will return (power, part without the power).
/// if the power is not found, the function will return (1, part).
Member:
Nit: These docs seem out of date

Member Author:
done

-    fn get_power(part: &str) -> (u8, &str) {
-        if let Some(part) = part.strip_prefix("square-") {
-            (2, part)
-        } else if let Some(part) = part.strip_prefix("pow2-") {
-            (2, part)
-        } else if let Some(part) = part.strip_prefix("cubic-") {
-            (3, part)
-        } else if let Some(part) = part.strip_prefix("pow3-") {
-            (3, part)
-        } else if let Some(part) = part.strip_prefix("pow4-") {
-            (4, part)
-        } else if let Some(part) = part.strip_prefix("pow5-") {
-            (5, part)
-        } else if let Some(part) = part.strip_prefix("pow6-") {
-            (6, part)
-        } else if let Some(part) = part.strip_prefix("pow7-") {
-            (7, part)
-        } else if let Some(part) = part.strip_prefix("pow8-") {
-            (8, part)
-        } else if let Some(part) = part.strip_prefix("pow9-") {
-            (9, part)
-        } else {
-            (1, part)
+    fn get_power(part: &str) -> Option<i8> {
Member:
Nit (optional): I prefer free functions instead of static functions in Rust except for constructors.

Member Author:
I was thinking about this too. I have moved them to si_prefix.rs and power.rs.

+        match part {
+            "square" | "pow2" => Some(2),
+            "cubic" | "pow3" => Some(3),
+            "pow4" => Some(4),
+            "pow5" => Some(5),
+            "pow6" => Some(6),
+            "pow7" => Some(7),
+            "pow8" => Some(8),
+            "pow9" => Some(9),
+            _ => None,
         }
     }
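
For readers following the hunk above, here is a minimal sketch of how the contract of `get_power` changes, written as free functions in the spirit of the review nit. The names `get_power_by_prefix` and `get_power_token` are hypothetical, and both bodies are abbreviated to the `square`/`cubic` cases; this is an illustration, not the code that landed.

```rust
// Old contract (abbreviated): strip a known "power" prefix from the whole
// identifier and return the power together with the remaining text.
fn get_power_by_prefix(part: &str) -> (u8, &str) {
    if let Some(rest) = part.strip_prefix("square-") {
        (2, rest)
    } else if let Some(rest) = part.strip_prefix("cubic-") {
        (3, rest)
    } else {
        // No power prefix found: the power defaults to 1.
        (1, part)
    }
}

// New contract (abbreviated): the caller has already split the identifier
// on '-' and passes exactly one token. `None` means "not a power token".
fn get_power_token(part: &str) -> Option<i8> {
    match part {
        "square" | "pow2" => Some(2),
        "cubic" | "pow3" => Some(3),
        _ => None,
    }
}
```

With the token-based form, `get_power_token("square-meter")` is `None`, so the decision about the default power of 1 moves to the caller.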

@@ -197,26 +185,16 @@ impl MeasureUnit<'_> {
}
}

// TODO: consider using a sufficient trie search for finding the unit id.
/// Get the unit id.
/// NOTE:
/// if the unit id is found, the function will return (unit id, part without the unit id and without `-` at the beginning of the remaining part if it exists).
/// if the unit id is not found, the function will return None.
-    fn get_unit_id<'data>(
-        part: &'data str,
-        trie: &ZeroTrie<ZeroVec<'data, u8>>,
-    ) -> Option<(usize, &'data str)> {
-        // TODO(#4379): this is inefficient way to search for an item in a trie.
-        // we must implement a way to search for a prefix in a trie.
-        let mut result = None;
-        for (index, _) in part.char_indices() {
-            let identifier = &part[..=index];
-            if let Some(value) = trie.get(identifier.as_bytes()) {
-                result = Some((value, &part[identifier.len()..]));
-            }
+    fn get_unit_id<'data>(part: &'data str, trie: &ZeroTrie<ZeroVec<'data, u8>>) -> Option<usize> {
+        if let Some(unit_id) = trie.get(part.as_bytes()) {
+            Some(unit_id)
+        } else {
+            None
         }
Member:
Suggested change
-        if let Some(unit_id) = trie.get(part.as_bytes()) {
-            Some(unit_id)
-        } else {
-            None
-        }
+        trie.get(part.as_bytes())

or just inline

Member Author:
done


-        result
}
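
The suggested change in this thread relies on the fact that re-wrapping an `Option` arm by arm is the same as returning it directly. A small sketch of that equivalence follows, assuming a `HashMap` as a stand-in for the `ZeroTrie` lookup; the function names and the map are hypothetical and only serve the illustration.

```rust
use std::collections::HashMap;

// Verbose form, mirroring the shape flagged in the review:
// unwrap the Option only to wrap it again.
fn get_unit_id_verbose(part: &str, units: &HashMap<&str, usize>) -> Option<usize> {
    if let Some(unit_id) = units.get(part).copied() {
        Some(unit_id)
    } else {
        None
    }
}

// Simplified form: the lookup already returns the Option we want.
fn get_unit_id_direct(part: &str, units: &HashMap<&str, usize>) -> Option<usize> {
    units.get(part).copied()
}
```

Both functions return the same value for every input, which is why the wrapper can also be inlined at the call site.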

/// Process a part of an identifier.
@@ -228,23 +206,30 @@ impl MeasureUnit<'_> {
result: &mut Vec<MeasureUnitItem>,
trie: &ZeroTrie<ZeroVec<'_, u8>>,
sffc marked this conversation as resolved.
) -> Result<(), ConversionError> {
-        let mut identifier = identifier_part;
-        while !identifier.is_empty() {
-            let (power, identifier_after_power) = Self::get_power(identifier);
-            let (si_prefix, identifier_after_si) = Self::get_si_prefix(identifier_after_power);
-            let (unit_id, identifier_after_unit) =
-                match Self::get_unit_id(identifier_after_si, trie) {
-                    Some((unit_id, identifier_after_unit)) => (unit_id, identifier_after_unit),
-                    None => return Err(ConversionError::InvalidUnit),
-                };
+        if identifier_part.is_empty() {
+            return Ok(());
+        }
+        let mut identifier_split = identifier_part.split('-');
+        while let Some(mut part) = identifier_split.next() {
+            let power = match Self::get_power(part) {
+                Some(power) => {
+                    part = identifier_split
+                        .next()
+                        .ok_or(ConversionError::InvalidUnit)?;
+                    power
+                }
+                None => 1,
+            };

+            let (si_prefix, identifier_after_si) = Self::get_si_prefix(part);
+            let unit_id =
+                Self::get_unit_id(identifier_after_si, trie).ok_or(ConversionError::InvalidUnit)?;

             result.push(MeasureUnitItem {
-                power: power as i8 * sign,
+                power: sign * power,
                 si_prefix,
                 unit_id: unit_id as u16,
             });

-            identifier = identifier_after_unit.get(1..).unwrap_or("");
         }

         Ok(())
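
The split-based loop in this hunk can be tried in isolation with a rough, self-contained sketch. It makes several assumptions that are not part of the PR: a `HashMap` stands in for the trie, SI-prefix handling is left out entirely, `get_power` is abbreviated, and the names `Item`, `InvalidUnit`, and `analyze_part` are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Item {
    power: i8,
    unit_id: u16,
}

#[derive(Debug, PartialEq)]
struct InvalidUnit;

// Abbreviated power lookup on a single, already split token.
fn get_power(part: &str) -> Option<i8> {
    match part {
        "square" | "pow2" => Some(2),
        "cubic" | "pow3" => Some(3),
        _ => None,
    }
}

// Parses one side of an identifier. `sign` is 1 for the numerator
// and -1 for the denominator.
fn analyze_part(
    identifier_part: &str,
    sign: i8,
    units: &HashMap<&str, u16>,
    result: &mut Vec<Item>,
) -> Result<(), InvalidUnit> {
    // "".split('-') still yields one empty token, so handle the empty
    // side up front instead of failing the unit lookup on "".
    if identifier_part.is_empty() {
        return Ok(());
    }
    let mut parts = identifier_part.split('-');
    while let Some(mut part) = parts.next() {
        let power = match get_power(part) {
            // A power token must be followed by the unit it applies to.
            Some(power) => {
                part = parts.next().ok_or(InvalidUnit)?;
                power
            }
            None => 1,
        };
        let unit_id = *units.get(part).ok_or(InvalidUnit)?;
        result.push(Item { power: sign * power, unit_id });
    }
    Ok(())
}
```

Under these assumptions, a dangling power token such as `"square"` and an unknown unit both surface as `InvalidUnit`, while an empty side succeeds without pushing anything.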
@@ -262,6 +247,7 @@ impl MeasureUnit<'_> {
.unwrap_or((identifier, ""));

let mut measure_unit_items = Vec::<MeasureUnitItem>::new();

Self::analyze_identifier_part(num_part, 1, &mut measure_unit_items, trie)?;
Self::analyze_identifier_part(den_part, -1, &mut measure_unit_items, trie)?;
Ok(measure_unit_items)
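
The start of this hunk is cut off, so the exact expression that produces `num_part` and `den_part` is not visible here. One plausible shape, assuming the identifier is split once on a `"-per-"` separator, is sketched below; the function name `split_num_den` and the separator are assumptions for illustration only.

```rust
// Splits an identifier into (numerator, denominator).
// Assumption: the two sides are separated by the first "-per-".
// When there is no separator, the whole identifier is the numerator
// and the denominator is empty.
fn split_num_den(identifier: &str) -> (&str, &str) {
    identifier
        .split_once("-per-")
        .unwrap_or((identifier, ""))
}
```

The empty denominator returned in the no-separator case is what the early `is_empty()` return in `analyze_identifier_part` would then absorb.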