-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decimal quantization #494
Decimal quantization #494
Conversation
Codecov Report
@@ Coverage Diff @@
## master #494 +/- ##
==========================================
+ Coverage 90.04% 90.06% +0.01%
==========================================
Files 24 24
Lines 4039 4046 +7
==========================================
+ Hits 3637 3644 +7
Misses 402 402
Continue to review full report at Codecov.
|
This PR was previously based on #491 and was partially duplicating it. Now that the later was merged, I rebased and squashed the whole code on top of the current This PR is ready to be reviewed and merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Better docs needed
- Improved commit message also needed
Not easy to review, but I think I have found some bugs using some data from ICU4J. The '#E0' pattern which is used in many locales for scientific notation should not truncate the fractional part.
Ex: for the 'es' locale: ICU gives "1,23445678E3", while babel gives "1E3".
I checked with other libraries, and the Elixir CLDR one gives the same result as ICU. I think it's a special case but I cannot find it in the specs.
There are other differences but all of them seem to stem from the fact that we ignore any tag with the 'draft' attribute.
babel/numbers.py
Outdated
def format_decimal(number, format=None, locale=LC_NUMERIC): | ||
def format_decimal( | ||
number, format=None, locale=LC_NUMERIC, | ||
decimal_quantization=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc for decimal_quantization is missing. What does it do, exactly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d89365a.
currency_digits=True, format_type='standard'): | ||
def format_currency( | ||
number, currency, format=None, locale=LC_NUMERIC, currency_digits=True, | ||
format_type='standard', decimal_quantization=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc for decimal_quantization is missing here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d89365a.
babel/numbers.py
Outdated
def format_percent(number, format=None, locale=LC_NUMERIC): | ||
|
||
def format_percent( | ||
number, format=None, locale=LC_NUMERIC, decimal_quantization=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, undocumented parameter of a public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d89365a.
|
||
|
||
def format_scientific(number, format=None, locale=LC_NUMERIC): | ||
def format_scientific( | ||
number, format=None, locale=LC_NUMERIC, decimal_quantization=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc for param missing, same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d89365a.
babel/numbers.py
Outdated
detected in the pattern. Default is to not mess with the scale at all. | ||
""" | ||
scale = 0 | ||
if '%' in ''.join(self.prefix + self.suffix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to join is useless here. A simple concatenation would be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 7e15be9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to join
is indeed required as prefix
and suffix
are lists of strings. Removing the join
introduce a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007
I reverted the change in 4887fdf. All unittests passes now.
babel/numbers.py
Outdated
scale = 0 | ||
if '%' in ''.join(self.prefix + self.suffix): | ||
scale = 2 | ||
elif u'‰' in ''.join(self.prefix + self.suffix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless join also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 7e15be9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to join
is indeed required as prefix
and suffix
are lists of strings. Removing the join
introduce a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007
I reverted the change in 4887fdf. All unittests passes now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sorry about this one !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! :)
Thanks @benselme for the reviewing effort! I'll address all your comments in one or two weeks as I'm on holidays right now! 😎 |
I just addressed all coding and document issues. The PR has been squashed into one commit. @benselme : I know this PR is not easy to review. To ease the pain I took the time to write expansive tests, searching for edge-cases and undefined situations. I'm really happy with the results as I found a way to handle generically any rendering pattern. Now the only remaining issue concerns the |
Thanks a lot for your work. What's here looks good to me. But I think the I would also love to have @etanol 's opinion on all this since he's the other person who recently did some work on numbers. PS: the CLDR-users mailing list is a very useful resource. |
Sorry to respond so late, but I haven't been involved in Python development for a while. The changes look good, although they are a bit difficult to review because refactoring and new functionality are not in separate commits. If you have the time, please consider splitting this PR in to two commits, otherwise it's very confusing to tell apart code movements from actual new functionality or bugfixing. I also have a couple of other comments for specific snippets in case you want to address them. Overall, I'm okay with the feature as long as it doesn't break backwards compatibility. |
babel/numbers.py
Outdated
def apply(self, value, locale, currency=None, force_frac=None): | ||
frac_prec = force_frac or self.frac_prec | ||
@property | ||
def scale(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turning a precomputed field into a property is going to have a small performance impact. My first motivation to contribute to this module was a use case we had, where currency formatting was consuming a lot of CPU time. This is not Java, so we don't have a JIT that can inline this method call.
If you have time, try to do some trivial benchmarking by comparing against the parent commit. Think about formatting hundreds of thousands of values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use the @cached_property
decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want to add other module dependencies so I abandoned the idea of reusing a @cached_property
decorator.
Instead I simply kept the original method to keep the code readable and documented. See how the scaling value is now computed once since commit 7de4329.
babel/numbers.py
Outdated
Forced decimal quantization is active by default so we'll produce a | ||
number string that is strictly following CLDR pattern definitions. | ||
""" | ||
# Manipulates decimal instances only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the one-line comments in this method are not very useful. I understand and appreciate the aesthetics, but in practice they don't provide any value. I think it's better to add small paragraphs near the complex chunks of code, explaining the abstract ideas that lead to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stand by the principle that you can't have too much documentation, but I don't really want to nit-pick here. Also note that this commenting style was originally there and I simply tried to keep the spirit of the current code base.
Anyway I made the necessary changes to remove one-line comments in commit 22b8f81.
babel/numbers.py
Outdated
if not decimal_quantization: | ||
# Bump decimal precision to the natural precision of the number if | ||
# it exceeds the one we're about to use. | ||
frac_prec = (frac_prec[0], max([frac_prec[1], self.precision(value)])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the core of the PR, right? As I mentioned in the discussion, it would be much better to separate this (and all dependant changes) to a separate commit. This will allow reviewers to choose granularity (since the GitHub UI allows to review all commits combined), and will ease potential bisections in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the core of the PR which is about bypassing decimal quantization. And I'll be happy to split it into another PR.
Because the code is currently highly intertwined, I'd rather complete the review process of the whole monolithic PR in its present form before attempting a code split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are ready to see that split now. And I'm going to insist more on separating refactoring from the real decimal quantization optional behavior code. It will help potential future bisections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
babel/numbers.py
Outdated
def apply(self, value, locale, currency=None, force_frac=None): | ||
frac_prec = force_frac or self.frac_prec | ||
@property | ||
def scale(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use the @cached_property
decorator.
babel/numbers.py
Outdated
return scale | ||
|
||
@staticmethod | ||
def precision(number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think method names should contain a verb – perhaps this should be get_precision()
?
Also, instead of a static method, consider making it a free function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the method and made it a free function in commit 8c70916.
babel/numbers.py
Outdated
return abs(decimal_tuple.exponent) | ||
|
||
@staticmethod | ||
def quantum(precision): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments for the precision
method apply here too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit dba70a5.
tests/test_numbers.py
Outdated
@@ -16,7 +16,10 @@ | |||
|
|||
from datetime import date | |||
|
|||
from babel import numbers | |||
from babel.numbers import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a repeat of line 23, below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Removed in commit 7a8c1f2.
tests/test_numbers.py
Outdated
|
||
def test_format_decimal_quantization(): | ||
# Test precision conservation. | ||
test_data = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use @pytest.mark.parametrize
here:
@pytest.mark.parametrize('input_value, expected', [
('10000', '10,000'),
# (etc, etc.)
])
def test_format_decimal_quantization(input_value, expected):
# ...
This has the advantage that each of these cases will show up (and be addressable) as a separate test case, so if one of those fails, the reader will not have to read the locals to figure out which particular case failed. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit b882589.
tests/test_numbers.py
Outdated
@@ -328,8 +383,43 @@ def test_format_currency_format_type(): | |||
== u'1.099,98') | |||
|
|||
|
|||
def test_format_currency_quantization(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same parametrize
comment applies here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit b882589.
tests/test_numbers.py
Outdated
@@ -339,14 +429,82 @@ def test_format_percent(): | |||
== u'25,123\u2030') | |||
|
|||
|
|||
def test_scientific_exponent_displayed_as_integer(): | |||
assert numbers.format_scientific(100000, locale='en_US') == u'1E5' | |||
def test_format_percent_quantization(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And parametrize this too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit b882589.
tests/test_numbers.py
Outdated
assert numbers.format_scientific(42, u'00000.000000E0000', locale='en_US') == u'42000.000000E-0003' | ||
|
||
|
||
def test_format_scientific_quantization(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this oughta get parametrized too :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit b882589.
babel/numbers.py
Outdated
value = abs(value).normalize() | ||
|
||
# Prepare scientific notation metadata. | ||
if self.exp_prec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's big chunks of scientific notation code interspersed with more "regular" rendering here – would you think the exp_prec
bits could maybe be refactored into separate functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit e66a4c1.
tests/test_numbers.py
Outdated
@@ -124,7 +127,7 @@ def test_scientific_notation(self): | |||
self.assertEqual(fmt, '1.2E3') | |||
# Exponent grouping | |||
fmt = numbers.format_scientific(12345, '##0.####E0', locale='en_US') | |||
self.assertEqual(fmt, '12.345E3') | |||
self.assertEqual(fmt, '1.2345E4') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So was this a bug, or a deliberate change in output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a deliberate change in output to fix what I think is a bug. The ##0.####E0
pattern implies only the first digit before the decimal separator is required.
I just finished addressing all pending comments of this PR. What needs to be done now: |
babel/numbers.py
Outdated
# triggered if the decimal quantization is disabled or if a scientific | ||
# notation pattern has a missing mandatory fractional part (as in the | ||
# default '#E0' pattern). This special case has been extensively | ||
# disccused at https://github.com/python-babel/babel/pull/494#issuecomment-307649969 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo here discussed
not disccused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Fixed in 6d0eabc.
Thanks ! I'll try and have a look at your changes this week. |
Sorry, weeks go by and I've been quite busy. Not forgetting about this PR though. Thanks for your patience. |
Thanks @benselme for the feedback! I know this PR is kind of tough so take your time. |
Any news on that one? :) |
I just finished splitting this PR. #538 is now waiting to be merged so I can finally rebase that one to add proper decimal quantization control. |
This PR has been rebased on top of the latest |
I think three years is enough proof of patience and perseverance. Thanks @kdeldycke for having both. |
Thanks @etanol for the merge! 😃 |
This PR add a
decimal_quantization
parameter toformat_decimal()
,format_currency()
,format_percent()
andformat_scientific()
, to bypass the forced quantization on the fractional part. This PR definitely address #90 and is a continuation of #410.This PR has been extensively reviewed and is waiting for #538 to be merged, so I can rebase that one on top of it.