Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch PluralRules to use FixedDecimal #190

Closed
zbraniecki opened this issue Jul 31, 2020 · 18 comments
Closed

Switch PluralRules to use FixedDecimal #190

zbraniecki opened this issue Jul 31, 2020 · 18 comments
Assignees
Labels
C-pluralrules Component: Plural rules T-core Type: Required functionality
Milestone

Comments

@zbraniecki
Copy link
Member

At landing time PluralRules create operands from primitive types. We'd like to switch that to support FixedDecimal.

@zbraniecki zbraniecki added T-core Type: Required functionality C-pluralrules Component: Plural rules labels Jul 31, 2020
@zbraniecki
Copy link
Member Author

@sffc how do you envision this working?

FixedDecimal accepts u*/i* and PluralRules accepts Into<PluralOperands>, should we have Into<PluralOperands> for FixedDecimal?

@sffc
Copy link
Member

sffc commented Aug 12, 2020

I envision an impl From<&FixedDecimal> for PluralOperands, because there's no reason that the FixedDecimal needs to be consumed.

To implement the conversion, you'd use digit_at(x) for x from 0 to the most significant digit to construct i; digit_at(x) for x from the least significant digit to -1 to construct f; and so forth.

@sffc sffc self-assigned this Aug 14, 2020
@sffc sffc mentioned this issue Sep 11, 2020
@sffc sffc added this to the ICU4X 0.1 milestone Sep 11, 2020
@sffc sffc assigned filmil and unassigned sffc Sep 28, 2020
@filmil
Copy link
Contributor

filmil commented Sep 28, 2020

Is there a reason to remove support for From<primitive_type>?

@zbraniecki
Copy link
Member Author

I don't think so.

@zbraniecki
Copy link
Member Author

we can compare performance and if the primitive -> FixedDecimal -> PluralOperands is not a huge cost compared to primitive -> PluralOperands then we can remove. For now, I'd keep it.

@sffc
Copy link
Member

sffc commented Sep 28, 2020

It might be nice to remove FromStr for PluralOperands, since we are developing a more thoroughly tested code path for that feature in #270, but I think it's harmless to keep From<int-like>.

@filmil
Copy link
Contributor

filmil commented Sep 28, 2020

Thanks for explaining. Another question: is there a reason that FixedDecimal does not have a way to extract the integer part and the fractional part, but can rather extract only digit-wise?

@sffc
Copy link
Member

sffc commented Sep 28, 2020

This ticket is all about adding a way to convert the FixedDecimal to PluralOperands, which implies separating the integer and fraction part. FixedDecimal is a super simple representation that maps from magnitudes to digits. See #190 (comment):

To implement the conversion, you'd use digit_at(x) for x from 0 to the most significant digit to construct i; digit_at(x) for x from the least significant digit to -1 to construct f; and so forth.

@filmil
Copy link
Contributor

filmil commented Sep 28, 2020

This ticket is all about adding a way to convert the FixedDecimal to PluralOperands, which implies separating the integer and fraction part. FixedDecimal is a super simple representation that maps from magnitudes to digits. See #190 (comment):

Understood, but that is not quite the answer to my question. I was interested as to why FixedDecimal doesn't have functions, say, named integer_part and fractional_part. Is there something specific about the intended use of FixedDecimal that makes this undesirable?

filmil added a commit to filmil/icu4x that referenced this issue Sep 28, 2020
This will allow us to convert FixedDecimal representation without loss
of precision into PluralOperands, for correct plural selection.

For example, a number `25` may sometimes be pluralized differently from
`25.0`.

Added the benchmarks for the conversion, though without a baseline it is
not just as useful yet.  (Until we try to reimplement.)

See issue unicode-org#190.
@sffc
Copy link
Member

sffc commented Sep 29, 2020

I was interested as to why FixedDecimal doesn't have functions, say, named integer_part and fractional_part. Is there something specific about the intended use of FixedDecimal that makes this undesirable?

What type do you envision that those functions would return?

  • Another FixedDecimal? What scale would you set on the FixedDecimal? Why would you not just use the first FixedDecimal?
  • An integer? What if the number is too big to fit in an integer?

@filmil
Copy link
Contributor

filmil commented Sep 29, 2020

  • Another FixedDecimal? What scale would you set on the FixedDecimal? Why would you not just use the first FixedDecimal?

I would use a compatible scale. It is possible to set the internal representation of a FixedDecimal such that the fractional part is zero. Move other internal representation bits to match.

I wanted to use this modified FixedDecimal because I'd be able to use the internal structures of FixedDecimal to compute the integer and fractional parts, faster than relying on digit_at to iterate over specific digits.

  • An integer? What if the number is too big to fit in an integer?

We could signal an overflow. That said, for purposes of plural rules matching, most of the time only significant digits matter, and the exponents matter less so. It would be a different story for spellout, for example.

Either way, overflow is a realistic possibility for FixedDecimal even today, given that digits are stored in an 8-byte array.

@filmil
Copy link
Contributor

filmil commented Sep 29, 2020

I would use a compatible scale. It is possible to set the internal representation of a FixedDecimal such that the fractional part is zero. Move other internal representation bits to match.

That said, let me check in a benchmark first, then I can do an experiment to figure out if this even makes sense.

Top level info that I'd appreciate is whether FixedDecimal is required to have a specific API, or if you'd be supportive of adding more methods to it.

@sffc
Copy link
Member

sffc commented Sep 29, 2020

I wanted to use this modified FixedDecimal because I'd be able to use the internal structures of FixedDecimal to compute the integer and fractional parts, faster than relying on digit_at to iterate over specific digits.

digit_at is intended to be very fast. It is the operation that FixedDecimal is designed to be very good at doing. I think the function is like 2 or 3 lines of code. If we were to write some other helper function, it would use digit_at under the hood.

Top level info that I'd appreciate is whether FixedDecimal is required to have a specific API, or if you'd be supportive of adding more methods to it.

As usual, new APIs should solve problems. I don't see what problem integer_part and fractional_part are solving, other than being more convenient for the conversion to PluralOperands. I want to push back on efforts to make FixedDecimal more than what it is: a fast, low-level representation of decimal digits at a range of magnitudes (powers of 10).

@filmil
Copy link
Contributor

filmil commented Sep 29, 2020

than being more convenient for the conversion to PluralOperands. I want to push back on efforts to make FixedDecimal more than what it is: a fast, low-level representation of decimal digits at a range of magnitudes (powers of 10).

If you don't want to widen the API on FixedDecimal, that is OK and that answers my question.

@sffc
Copy link
Member

sffc commented Sep 29, 2020

There's really only one algorithm you need, which is:

int result = 0;
for (i in high..=low) {
    result = result * 10 + fixed_decimal.digit_at(i);
}
return result;

My mental model has been that this pattern is short and simple enough to put at the call site. It doesn't need its own method in FixedDecimal.

filmil added a commit to filmil/icu4x that referenced this issue Oct 1, 2020
This will allow us to convert FixedDecimal representation without loss
of precision into PluralOperands, for correct plural selection.

For example, a number `25` may sometimes be pluralized differently from
`25.0`.

Added the benchmarks for the conversion, though without a baseline it is
not just as useful yet.  (Until we try to reimplement.)

See issue unicode-org#190.
filmil added a commit that referenced this issue Oct 1, 2020
* Implements From<FixedDecimal> for PluralOperands

This will allow us to convert FixedDecimal representation without loss
of precision into PluralOperands, for correct plural selection.

For example, a number `25` may sometimes be pluralized differently from
`25.0`.

Added the benchmarks for the conversion, though without a baseline it is
not just as useful yet.  (Until we try to reimplement.)

See issue #190.

* Rewrite eq() to branchless

* Adds a test for eq()

We needed to add a custom implementation for eq() to account
for loss of precision in PluralOperands.

* fixup:first

* Clean up the code for From

* Adds more illustrative naming in From

* Makes clippy happy

* Pulls num_fractional_digits out of the loop

It is possible to compute it based on the low end of the magnitude
range.  No performance change per benchmark.

* fixup: benchmark was wrong

* fixup: moves new operand tests to json

Defines new tests for the data model for conversion, and places them
into the JSON files instead of inline with the tests.

* fixup: moves the benchmark loop in

This allows criterion to run the benchmark loop
with a specific time limit.

* fixup: formatting

* fixup: adds individual sample measurements

Helps smoke out specific performance regressions.
@zbraniecki
Copy link
Member Author

Fixed by #278.

@sffc
Copy link
Member

sffc commented Oct 20, 2020

I'm having trouble linking #278 to this issue. Can someone else try?

@zbraniecki
Copy link
Member Author

I also cannot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-pluralrules Component: Plural rules T-core Type: Required functionality
Projects
None yet
Development

No branches or pull requests

3 participants