Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decimal quantization #494

Merged
merged 1 commit into from
Oct 23, 2017
Merged

Decimal quantization #494

merged 1 commit into from
Oct 23, 2017

Conversation

kdeldycke
Copy link
Contributor

@kdeldycke kdeldycke commented Apr 20, 2017

This PR add a decimal_quantization parameter to format_decimal(), format_currency(), format_percent() and format_scientific(), to bypass the forced quantization on the fractional part. This PR definitely address #90 and is a continuation of #410.

This PR has been extensively reviewed and is waiting for #538 to be merged, so I can rebase that one on top of it.

@codecov-io
Copy link

codecov-io commented May 18, 2017

Codecov Report

Merging #494 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #494      +/-   ##
==========================================
+ Coverage   90.04%   90.06%   +0.01%     
==========================================
  Files          24       24              
  Lines        4039     4046       +7     
==========================================
+ Hits         3637     3644       +7     
  Misses        402      402
Impacted Files Coverage Δ
babel/numbers.py 98.01% <100%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf03578...f5e0a3a. Read the comment docs.

@kdeldycke
Copy link
Contributor Author

This PR was previously based on #491 and was partially duplicating it. Now that the later was merged, I rebased and squashed the whole code on top of the current master branch.

This PR is ready to be reviewed and merged.

Copy link
Member

@benselme benselme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Better docs needed
  • Improved commit message also needed

Not easy to review, but I think I have found some bugs using some data from ICU4J. The '#E0' pattern which is used in many locales for scientific notation should not truncate the fractional part.

Ex: for the 'es' locale: ICU gives "1,23445678E3", while babel gives "1E3".

I checked with other libraries, and the Elixir CLDR one gives the same result as ICU. I think it's a special case but I cannot find it in the specs.

There are other differences but all of them seem to stem from the fact that we ignore any tag with the 'draft' attribute.

babel/numbers.py Outdated
def format_decimal(number, format=None, locale=LC_NUMERIC):
def format_decimal(
number, format=None, locale=LC_NUMERIC,
decimal_quantization=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc for decimal_quantization is missing. What does it do, exactly ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d89365a.

currency_digits=True, format_type='standard'):
def format_currency(
number, currency, format=None, locale=LC_NUMERIC, currency_digits=True,
format_type='standard', decimal_quantization=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc for decimal_quantization is missing here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d89365a.

babel/numbers.py Outdated
def format_percent(number, format=None, locale=LC_NUMERIC):

def format_percent(
number, format=None, locale=LC_NUMERIC, decimal_quantization=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, undocumented parameter of a public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d89365a.



def format_scientific(number, format=None, locale=LC_NUMERIC):
def format_scientific(
number, format=None, locale=LC_NUMERIC, decimal_quantization=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc for param missing, same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d89365a.

babel/numbers.py Outdated
detected in the pattern. Default is to not mess with the scale at all.
"""
scale = 0
if '%' in ''.join(self.prefix + self.suffix):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to join is useless here. A simple concatenation would be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7e15be9.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to join is indeed required as prefix and suffix are lists of strings. Removing the join introduce a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007

I reverted the change in 4887fdf. All unittests passes now.

babel/numbers.py Outdated
scale = 0
if '%' in ''.join(self.prefix + self.suffix):
scale = 2
elif u'‰' in ''.join(self.prefix + self.suffix):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless join also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7e15be9.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to join is indeed required as prefix and suffix are lists of strings. Removing the join introduce a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007

I reverted the change in 4887fdf. All unittests passes now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sorry about this one !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! :)

@kdeldycke
Copy link
Contributor Author

Thanks @benselme for the reviewing effort! I'll address all your comments in one or two weeks as I'm on holidays right now! 😎

@kdeldycke
Copy link
Contributor Author

I just addressed all coding and document issues. The PR has been squashed into one commit.

@benselme : I know this PR is not easy to review. To ease the pain I took the time to write expansive tests, searching for edge-cases and undefined situations.

I'm really happy with the results as I found a way to handle generically any rendering pattern.

Now the only remaining issue concerns the #E0 situation, with breaks the consistency of the pattern rendering. What should I do here to have this PR accepted?

@benselme
Copy link
Member

Thanks a lot for your work.

What's here looks good to me. But I think the #E0 issue needs to be solved before a merge can take place, since it makes scientific formatting useless for dozens of languages. I would write a special case for it, with proper comments and test cases. I think that's what the cldr elixir package does, but I very well might be wrong. I had a quick look in ICU4J but could not locate the relevant code.

I would also love to have @etanol 's opinion on all this since he's the other person who recently did some work on numbers.

PS: the CLDR-users mailing list is a very useful resource.

@etanol
Copy link
Contributor

etanol commented Jun 12, 2017

Sorry to respond so late, but I haven't been involved in Python development for a while.

The changes look good, although they are a bit difficult to review because refactoring and new functionality are not in separate commits. If you have the time, please consider splitting this PR in to two commits, otherwise it's very confusing to tell apart code movements from actual new functionality or bugfixing.

I also have a couple of other comments for specific snippets in case you want to address them.

Overall, I'm okay with the feature as long as it doesn't break backwards compatibility.

babel/numbers.py Outdated
def apply(self, value, locale, currency=None, force_frac=None):
frac_prec = force_frac or self.frac_prec
@property
def scale(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turning a precomputed field into a property is going to have a small performance impact. My first motivation to contribute to this module was a use case we had, where currency formatting was consuming a lot of CPU time. This is not Java, so we don't have a JIT that can inline this method call.

If you have time, try to do some trivial benchmarking by comparing against the parent commit. Think about formatting hundreds of thousands of values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use the @cached_property decorator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to add other module dependencies so I abandoned the idea of reusing a @cached_property decorator.

Instead I simply kept the original method to keep the code readable and documented. See how the scaling value is now computed once since commit 7de4329.

babel/numbers.py Outdated
Forced decimal quantization is active by default so we'll produce a
number string that is strictly following CLDR pattern definitions.
"""
# Manipulates decimal instances only.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the one-line comments in this method are not very useful. I understand and appreciate the aesthetics, but in practice they don't provide any value. I think it's better to add small paragraphs near the complex chunks of code, explaining the abstract ideas that lead to them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand by the principle that you can't have too much documentation, but I don't really want to nit-pick here. Also note that this commenting style was originally there and I simply tried to keep the spirit of the current code base.

Anyway I made the necessary changes to remove one-line comments in commit 22b8f81.

babel/numbers.py Outdated
if not decimal_quantization:
# Bump decimal precision to the natural precision of the number if
# it exceeds the one we're about to use.
frac_prec = (frac_prec[0], max([frac_prec[1], self.precision(value)]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the core of the PR, right? As I mentioned in the discussion, it would be much better to separate this (and all dependant changes) to a separate commit. This will allow reviewers to choose granularity (since the GitHub UI allows to review all commits combined), and will ease potential bisections in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the core of the PR which is about bypassing decimal quantization. And I'll be happy to split it into another PR.

Because the code is currently highly intertwined, I'd rather complete the review process of the whole monolithic PR in its present form before attempting a code split.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are ready to see that split now. And I'm going to insist more on separating refactoring from the real decimal quantization optional behavior code. It will help potential future bisections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @etanol and sorry for the delay.

I just spliced it up to #494.

babel/numbers.py Outdated
def apply(self, value, locale, currency=None, force_frac=None):
frac_prec = force_frac or self.frac_prec
@property
def scale(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use the @cached_property decorator.

babel/numbers.py Outdated
return scale

@staticmethod
def precision(number):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think method names should contain a verb – perhaps this should be get_precision()?

Also, instead of a static method, consider making it a free function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the method and made it a free function in commit 8c70916.

babel/numbers.py Outdated
return abs(decimal_tuple.exponent)

@staticmethod
def quantum(precision):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments for the precision method apply here too :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit dba70a5.

@@ -16,7 +16,10 @@

from datetime import date

from babel import numbers
from babel.numbers import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a repeat of line 23, below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Removed in commit 7a8c1f2.


def test_format_decimal_quantization():
# Test precision conservation.
test_data = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use @pytest.mark.parametrize here:

@pytest.mark.parametrize('input_value, expected', [
 ('10000', '10,000'),
 # (etc, etc.)
])
def test_format_decimal_quantization(input_value, expected):
  # ...

This has the advantage that each of these cases will show up (and be addressable) as a separate test case, so if one of those fails, the reader will not have to read the locals to figure out which particular case failed. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit b882589.

@@ -328,8 +383,43 @@ def test_format_currency_format_type():
== u'1.099,98')


def test_format_currency_quantization():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same parametrize comment applies here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit b882589.

@@ -339,14 +429,82 @@ def test_format_percent():
== u'25,123\u2030')


def test_scientific_exponent_displayed_as_integer():
assert numbers.format_scientific(100000, locale='en_US') == u'1E5'
def test_format_percent_quantization():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And parametrize this too :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit b882589.

assert numbers.format_scientific(42, u'00000.000000E0000', locale='en_US') == u'42000.000000E-0003'


def test_format_scientific_quantization():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this oughta get parametrized too :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit b882589.

babel/numbers.py Outdated
value = abs(value).normalize()

# Prepare scientific notation metadata.
if self.exp_prec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's big chunks of scientific notation code interspersed with more "regular" rendering here – would you think the exp_prec bits could maybe be refactored into separate functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit e66a4c1.

@@ -124,7 +127,7 @@ def test_scientific_notation(self):
self.assertEqual(fmt, '1.2E3')
# Exponent grouping
fmt = numbers.format_scientific(12345, '##0.####E0', locale='en_US')
self.assertEqual(fmt, '12.345E3')
self.assertEqual(fmt, '1.2345E4')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So was this a bug, or a deliberate change in output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a deliberate change in output to fix what I think is a bug. The ##0.####E0 pattern implies only the first digit before the decimal separator is required.

@kdeldycke
Copy link
Contributor Author

@benselme: I just addressed the issue of the default #E0 rendering pattern in commit b275690. The current PR is now rendering scientific notation like the CLDR elixir package and ICU4J does.

@kdeldycke
Copy link
Contributor Author

kdeldycke commented Jun 29, 2017

I just finished addressing all pending comments of this PR.

What needs to be done now:

  • Have @etanol, @akx and @benselme do a final pass on the PR and eventually lift their review process.
  • Done: Squash all commits into one.
  • Split the PR into two: one for decimal quantization, and one for all the other refactoring and bug fixes.

babel/numbers.py Outdated
# triggered if the decimal quantization is disabled or if a scientific
# notation pattern has a missing mandatory fractional part (as in the
# default '#E0' pattern). This special case has been extensively
# disccused at https://github.com/python-babel/babel/pull/494#issuecomment-307649969 .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo here discussed not disccused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed in 6d0eabc.

@benselme
Copy link
Member

benselme commented Jul 3, 2017

Thanks ! I'll try and have a look at your changes this week.

@benselme
Copy link
Member

Sorry, weeks go by and I've been quite busy. Not forgetting about this PR though. Thanks for your patience.

@kdeldycke
Copy link
Contributor Author

Thanks @benselme for the feedback! I know this PR is kind of tough so take your time.

@kdeldycke
Copy link
Contributor Author

Any news on that one? :)

@kdeldycke kdeldycke mentioned this pull request Oct 16, 2017
@kdeldycke
Copy link
Contributor Author

kdeldycke commented Oct 16, 2017

I just finished splitting this PR. #538 is now waiting to be merged so I can finally rebase that one to add proper decimal quantization control.

@kdeldycke
Copy link
Contributor Author

This PR has been rebased on top of the latest master branch and is ready to be merged! :)

@etanol etanol merged commit 1539c8a into python-babel:master Oct 23, 2017
@etanol
Copy link
Contributor

etanol commented Oct 23, 2017

I think three years is enough proof of patience and perseverance. Thanks @kdeldycke for having both.

@kdeldycke
Copy link
Contributor Author

Thanks @etanol for the merge! 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants