PluralRuleStringsV1 should be parsed at data load time #615
I'd suggest taking an approach similar to Pattern and Skeleton, where the JSON representation is human-readable and the Bincode representation is pre-parsed. When JSON is read in, the strings should be parsed; when Bincode is read in, you just need to point to the data.
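A minimal sketch of the "parse when the human-readable form is read" half of this idea. It assumes a simplified rule grammar (relations of the form `operand [% modulo] (= | !=) list`, joined by ` and ` / ` or `, with no `@integer`/`@decimal` samples). `Relation`, `parse_relation`, and `parse_rule` are hypothetical names, not the ICU4X API; the Bincode path would skip this step and reinterpret the stored data directly.

```rust
/// Hypothetical parsed relation, e.g. `n % 100 != 10..19,70..79`.
#[derive(Debug, PartialEq)]
struct Relation {
    /// True if this relation starts a new `or` group (the first one always does).
    starts_or_group: bool,
    /// Single-letter operand such as `n`, `i`, `v`, `f`.
    operand: char,
    /// 0 means "no modulo" in this sketch.
    modulo: u32,
    /// True for `=`, false for `!=`.
    positive: bool,
    /// Inclusive ranges; a single value `v` is stored as `(v, v)`.
    ranges: Vec<(u32, u32)>,
}

fn parse_relation(s: &str, starts_or_group: bool) -> Option<Relation> {
    // Check for `!=` first so its `=` is not mistaken for equality.
    let (lhs, rhs, positive) = match s.split_once("!=") {
        Some((l, r)) => (l, r, false),
        None => {
            let (l, r) = s.split_once('=')?;
            (l, r, true)
        }
    };
    let (operand_str, modulo): (&str, u32) = match lhs.split_once('%') {
        Some((op, m)) => (op.trim(), m.trim().parse().ok()?),
        None => (lhs.trim(), 0),
    };
    let mut chars = operand_str.chars();
    let operand = chars.next()?;
    if chars.next().is_some() {
        return None; // operands are single letters
    }
    let mut ranges = Vec::new();
    for item in rhs.trim().split(',') {
        let item = item.trim();
        let range: (u32, u32) = match item.split_once("..") {
            Some((a, b)) => (a.parse().ok()?, b.parse().ok()?),
            None => {
                let v: u32 = item.parse().ok()?;
                (v, v)
            }
        };
        ranges.push(range);
    }
    Some(Relation { starts_or_group, operand, modulo, positive, ranges })
}

/// Parses a rule string into a flat list: `and` binds tighter than `or`,
/// so only the first relation of each `or` group is flagged.
fn parse_rule(rule: &str) -> Option<Vec<Relation>> {
    let mut out = Vec::new();
    for or_part in rule.split(" or ") {
        for (j, and_part) in or_part.split(" and ").enumerate() {
            out.push(parse_relation(and_part, j == 0)?);
        }
    }
    Some(out)
}

fn main() {
    let rule = "n % 10 = 3..4,9 and n % 100 != 10..19,70..79,90..99 or n = 0";
    println!("{:#?}", parse_rule(rule));
}
```

Because `and` binds tighter than `or`, the result is a flat list with a per-relation group flag rather than a nested tree.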
We'll work on #663 first so that we know what FFI footguns are in the offing with this. It's possible to work on this now (and folks should feel free to pick it up!), but FFI may break.
Here's a model for how to represent plural rules as a zero-copy data structure. It covers all cases, is compatible with UTS 35, and does not require infinite nesting.

Rust Structures

Here are the types:

enum AndOr { And, Or }
enum Polarity { Positive, Negative }
struct AndOrRelation {
    and_or: AndOr,          // first entry is Or
    operand: PluralOperand, // n, i, v, w, f, t, ...
    modulo: u32,
    polarity: Polarity,
    range_list: ZeroVec<RangeOrValue>,
}

And

enum RangeOrValue {
    Range(u32, u32),
    Value(u32),
}

Algorithm

I claim that the algorithm is just as fast as a more highly nested AST structure. Pseudo-code:
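As a stand-in for the pseudo-code, here is a minimal Rust sketch of evaluating the flat relation list in one pass. It mirrors the types above, with some assumptions: `Vec` replaces `ZeroVec`, `PluralOperand` is a local stand-in enum, modulo 0 means "no modulo", and only integer inputs are modeled. `Operands`, `relation_matches`, and `rule_matches` are hypothetical names.

```rust
#[derive(Clone, Copy, PartialEq)]
enum AndOr { And, Or }

#[derive(Clone, Copy)]
enum Polarity { Positive, Negative }

/// Local stand-in for the operand type; only integer inputs are modeled.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum PluralOperand { N, I, V, F, T }

#[derive(Clone, Copy)]
enum RangeOrValue { Range(u32, u32), Value(u32) }

struct AndOrRelation {
    and_or: AndOr,                 // first entry is Or
    operand: PluralOperand,
    modulo: u32,                   // 0 = no modulo (assumption of this sketch)
    polarity: Polarity,
    range_list: Vec<RangeOrValue>, // Vec stands in for ZeroVec
}

/// Operands for an integer input: n = i = x, no visible fraction digits.
struct Operands { n: u64, i: u64, v: u64, f: u64, t: u64 }

impl Operands {
    fn from_integer(x: u64) -> Self {
        Operands { n: x, i: x, v: 0, f: 0, t: 0 }
    }

    fn get(&self, op: PluralOperand) -> u64 {
        match op {
            PluralOperand::N => self.n,
            PluralOperand::I => self.i,
            PluralOperand::V => self.v,
            PluralOperand::F => self.f,
            PluralOperand::T => self.t,
        }
    }
}

fn relation_matches(rel: &AndOrRelation, ops: &Operands) -> bool {
    let mut value = ops.get(rel.operand);
    if rel.modulo != 0 {
        value %= u64::from(rel.modulo);
    }
    let in_list = rel.range_list.iter().any(|r| match *r {
        RangeOrValue::Range(lo, hi) => u64::from(lo) <= value && value <= u64::from(hi),
        RangeOrValue::Value(v) => value == u64::from(v),
    });
    match rel.polarity {
        Polarity::Positive => in_list,
        Polarity::Negative => !in_list,
    }
}

/// One left-to-right pass: the rule matches if every relation
/// in some `or` group matches.
fn rule_matches(relations: &[AndOrRelation], ops: &Operands) -> bool {
    let mut group_ok = false;
    for (idx, rel) in relations.iter().enumerate() {
        if idx == 0 || rel.and_or == AndOr::Or {
            if group_ok {
                return true; // the previous `or` group was fully satisfied
            }
            group_ok = true; // start a new group
        }
        // `&&` skips the remaining relations of a group that already failed.
        group_ok = group_ok && relation_matches(rel, ops);
    }
    group_ok
}

/// "n % 10 = 3..4,9 and n % 100 != 10..19,70..79,90..99 or n = 0"
fn example_rule() -> Vec<AndOrRelation> {
    use RangeOrValue::{Range, Value};
    vec![
        AndOrRelation { and_or: AndOr::Or, operand: PluralOperand::N, modulo: 10,
            polarity: Polarity::Positive, range_list: vec![Range(3, 4), Value(9)] },
        AndOrRelation { and_or: AndOr::And, operand: PluralOperand::N, modulo: 100,
            polarity: Polarity::Negative, range_list: vec![Range(10, 19), Range(70, 79), Range(90, 99)] },
        AndOrRelation { and_or: AndOr::Or, operand: PluralOperand::N, modulo: 0,
            polarity: Polarity::Positive, range_list: vec![Value(0)] },
    ]
}

fn main() {
    let rule = example_rule();
    for x in [0u64, 3, 13, 79, 109] {
        println!("{} -> {}", x, rule_matches(&rule, &Operands::from_integer(x)));
    }
}
```

Each relation is visited at most once, and a failed relation short-circuits the rest of its group, which is the property the "just as fast as a nested AST" claim relies on.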
Byte Representation
* The modulo could likely be compacted further, given that virtually all modulos are powers of 10.
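One way to act on this note, sketched under an assumption: store the exponent instead of the value, so the 4-byte `u32` modulo shrinks to one byte. `encode_modulo` and `decode_modulo` are hypothetical names, and reserving 0 for "no modulo" is a choice made for this sketch.

```rust
/// Hypothetical 1-byte modulo: 0 = no modulo, k = 10^k for k in 1..=9.
/// Returns None when the modulo is not a power of 10 (a real format
/// would need an escape for that case).
fn encode_modulo(modulo: u32) -> Option<u8> {
    if modulo == 0 {
        return Some(0);
    }
    let (mut m, mut exp) = (modulo, 0u8);
    while m % 10 == 0 {
        m /= 10;
        exp += 1;
    }
    if m == 1 && exp > 0 { Some(exp) } else { None }
}

fn decode_modulo(byte: u8) -> Option<u32> {
    if byte == 0 {
        return Some(0);
    }
    10u32.checked_pow(u32::from(byte)) // None if 10^byte overflows u32
}

fn main() {
    for m in [10u32, 100, 1000, 12] {
        println!("{} -> {:?}", m, encode_modulo(m));
    }
}
```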
Example

Rule string: "n % 10 = 3..4,9 and n % 100 != 10..19,70..79,90..99 or n = 0"

This rule string contains 3 relations. A JSON-like expansion into the above schema would be:
The bytes:
Total: 75 bytes. For comparison, the string is 60 bytes. So we are a bit bigger, but not too much bigger, and there are opportunities to optimize the byte length:
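One byte layout that reproduces the 75-byte total, assumed here rather than taken from the original listing: a 7-byte header per relation (1 byte `and_or`, 1 byte operand, 4 bytes `u32` modulo, 1 byte polarity) plus a fixed 9 bytes per `RangeOrValue` (1-byte tag and two little-endian `u32`s, with `Value` padded), since `ZeroVec` elements are fixed-width. That gives (7 + 2·9) + (7 + 3·9) + (7 + 1·9) = 25 + 34 + 16 = 75. The types and functions below are hypothetical stand-ins, and the `VarZeroVec` header is not counted.

```rust
/// Simplified stand-in for RangeOrValue.
enum Item {
    Range(u32, u32),
    Value(u32),
}

/// Simplified stand-in for AndOrRelation (operand stored as an ASCII letter).
struct Relation {
    is_or: bool,
    operand: u8,
    modulo: u32,
    positive: bool,
    items: Vec<Item>,
}

/// Appends one relation using the assumed unpacked layout:
/// 7-byte header + 9 bytes per item. The item count is not stored, since
/// the enclosing VarZeroVec already knows each element's byte length.
fn encode_relation(rel: &Relation, out: &mut Vec<u8>) {
    out.push(rel.is_or as u8);                        // and_or: 1 byte
    out.push(rel.operand);                            // operand: 1 byte
    out.extend_from_slice(&rel.modulo.to_le_bytes()); // modulo: 4 bytes
    out.push(rel.positive as u8);                     // polarity: 1 byte
    for item in &rel.items {
        // Fixed-width element: 1-byte tag + two u32s (Value pads the second).
        let (tag, a, b) = match *item {
            Item::Range(lo, hi) => (0u8, lo, hi),
            Item::Value(v) => (1u8, v, 0),
        };
        out.push(tag);
        out.extend_from_slice(&a.to_le_bytes());
        out.extend_from_slice(&b.to_le_bytes());
    }
}

/// "n % 10 = 3..4,9 and n % 100 != 10..19,70..79,90..99 or n = 0"
fn example_rule() -> Vec<Relation> {
    vec![
        Relation { is_or: true, operand: b'n', modulo: 10, positive: true,
            items: vec![Item::Range(3, 4), Item::Value(9)] },
        Relation { is_or: false, operand: b'n', modulo: 100, positive: false,
            items: vec![Item::Range(10, 19), Item::Range(70, 79), Item::Range(90, 99)] },
        Relation { is_or: true, operand: b'n', modulo: 0, positive: true,
            items: vec![Item::Value(0)] },
    ]
}

fn encode_rule(rule: &[Relation]) -> Vec<u8> {
    let mut out = Vec::new();
    for rel in rule {
        encode_relation(rel, &mut out);
    }
    out
}

fn main() {
    let bytes = encode_rule(&example_rule());
    let s = "n % 10 = 3..4,9 and n % 100 != 10..19,70..79,90..99 or n = 0";
    // prints "encoded: 75 bytes, string: 60 bytes"
    println!("encoded: {} bytes, string: {} bytes", bytes.len(), s.len());
}
```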
With these optimizations, the byte length would become:
Total: 38 bytes! Smaller than even the string representation. Note: The above size does not include the VarZeroVec's own header, which will likely incur another 16ish bytes.
This sounds like it could benefit from the custom derive, though there are a couple of issues: it's harder to achieve bitpacking with a custom derive, and I don't think a custom derive for AsULE can handle enums. So it might be better to write some custom packed ULE types.
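A minimal sketch of the hand-written bitpacking a custom ULE type could do, assuming at most 8 operand kinds (3 bits). The bit assignment and the names `PackedHeader`, `pack`, and `unpack` are hypothetical; wiring this into zerovec's `ULE`/`AsULE` traits is not shown.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum AndOr { And, Or }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Polarity { Positive, Negative }

/// Hypothetical operand index; 3 bits allow up to 8 operand kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operand { N = 0, I = 1, V = 2, W = 3, F = 4, T = 5 }

const OR_BIT: u8 = 0b1000_0000;
const NEGATIVE_BIT: u8 = 0b0100_0000;
const OPERAND_MASK: u8 = 0b0000_0111;

/// One byte replacing three separate 1-byte fields.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
struct PackedHeader(u8);

impl PackedHeader {
    fn pack(and_or: AndOr, polarity: Polarity, operand: Operand) -> Self {
        let mut b = operand as u8; // always fits in OPERAND_MASK
        if and_or == AndOr::Or {
            b |= OR_BIT;
        }
        if polarity == Polarity::Negative {
            b |= NEGATIVE_BIT;
        }
        PackedHeader(b)
    }

    /// Rejects bit patterns that `pack` never produces, which is the kind
    /// of validation a ULE implementation must perform on raw bytes.
    fn unpack(self) -> Option<(AndOr, Polarity, Operand)> {
        let b = self.0;
        if b & !(OR_BIT | NEGATIVE_BIT | OPERAND_MASK) != 0 {
            return None;
        }
        let and_or = if b & OR_BIT != 0 { AndOr::Or } else { AndOr::And };
        let polarity = if b & NEGATIVE_BIT != 0 { Polarity::Negative } else { Polarity::Positive };
        let operand = match b & OPERAND_MASK {
            0 => Operand::N,
            1 => Operand::I,
            2 => Operand::V,
            3 => Operand::W,
            4 => Operand::F,
            5 => Operand::T,
            _ => return None,
        };
        Some((and_or, polarity, operand))
    }
}

fn main() {
    let h = PackedHeader::pack(AndOr::Or, Polarity::Negative, Operand::F);
    println!("{:#010b} -> {:?}", h.0, h.unpack());
}
```

Packing `and_or`, polarity, and operand together saves 2 bytes per relation compared with storing them as three separate bytes.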
@zbraniecki #1078 is resolved; hopefully that unblocks this.
Fixed in #1240
Currently they need to be further parsed into a PluralRuleList. It would be nice if that were handled by the data provider itself. We would potentially have to provide utility functions for converting between a PluralRulesStrings and a final parsed PluralRuleList, or perhaps make it so that data providers that can provide PluralRulesStrings can automatically provide a PluralRuleList.

It would also be good if PluralRuleList and RulesSelector could use Cows and borrow from the data provider such that

cc @sffc
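A minimal sketch of the Cow idea, assuming a hypothetical `RuleList<'data>` (not ICU4X's actual PluralRuleList or RulesSelector) whose relations are either borrowed directly from provider-owned data (the pre-parsed path) or owned after parsing a string (the human-readable path). `Relation` is a simplified, hypothetical type holding a single range per relation.

```rust
use std::borrow::Cow;

/// Hypothetical flat relation, simplified to a single range.
/// `Clone` is required so that `[Relation]` can be used in a `Cow`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Relation {
    is_or: bool,
    modulo: u32,
    lo: u32,
    hi: u32,
}

/// Borrows when the provider already holds parsed data; owns when it had to parse.
#[derive(Debug)]
struct RuleList<'data> {
    relations: Cow<'data, [Relation]>,
}

impl<'data> RuleList<'data> {
    /// Pre-parsed path: no parsing and no allocation, just a pointer to the data.
    fn borrowed(data: &'data [Relation]) -> Self {
        RuleList { relations: Cow::Borrowed(data) }
    }

    /// Human-readable path: the freshly parsed relations are owned.
    fn owned(parsed: Vec<Relation>) -> Self {
        RuleList { relations: Cow::Owned(parsed) }
    }

    fn is_borrowed(&self) -> bool {
        matches!(self.relations, Cow::Borrowed(_))
    }
}

fn main() {
    // Pretend this vector lives in a provider-owned buffer.
    let provider_data = vec![Relation { is_or: true, modulo: 10, lo: 3, hi: 4 }];
    let from_binary = RuleList::borrowed(&provider_data);
    let from_json = RuleList::owned(vec![Relation { is_or: true, modulo: 10, lo: 3, hi: 4 }]);
    // prints "true false"
    println!("{} {}", from_binary.is_borrowed(), from_json.is_borrowed());
}
```

Both paths yield the same type, so code that evaluates rules does not need to know which format the data came from.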
The lower level API in #575 will also need to be updated to handle this.