Bypass number rounding #90

Closed
kdeldycke opened this issue Apr 24, 2014 · 14 comments · Fixed by #410

Comments

@kdeldycke
Contributor

babel.numbers.format_decimal() and babel.numbers.format_currency() have banker's rounding built in. See:

babel/babel/numbers.py

Lines 648 to 649 in 3ec7bb1

a, b = split_number(bankersround(abs(value),
self.frac_prec[1]))

If we need to localize numbers without rounding, we can't use these methods.
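To make the rounding concrete, here is a minimal sketch of half-even ("banker's") rounding using only the standard decimal module. It does not call Babel, and the helper name half_even is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def half_even(value, digits):
    """Round to `digits` fractional digits; ties go to the even neighbour."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 gives Decimal('0.01')
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(half_even("0.125", 2))   # a tie: resolved towards the even digit 2
print(half_even("0.135", 2))   # a tie: resolved towards the even digit 4
print(half_even("0.9999", 2))  # not a tie, but two digits are still lost
```

Whatever the tie-breaking rule, any value carrying more fractional digits than the target precision comes out altered, which is the problem reported here.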

The default decimal precision per method is:

  • babel.numbers.format_decimal() => at most 3 digits after the dot (6 in some locales)
  • babel.numbers.format_currency() => exactly 2 digits after the dot

Proof:

>>> import babel
>>> set([babel.Locale.parse(l).decimal_formats._data[None].frac_prec
...      for l in babel.localedata.locale_identifiers()])
set([(0, 6), (0, 3)])
>>>
>>> import babel
>>> set([babel.Locale.parse(l).currency_formats._data[None].frac_prec
...      for l in babel.localedata.locale_identifiers()])
set([(2, 2)])
>>>

So if you have monetary amounts to localize, and they're already rounded to 2 trailing digits, then you're lucky. Using any of the format_*() methods will have no side effects.

But for other numbers with higher precision, using format_*() as-is is dangerous: it introduces unwanted rounding. For example, calling format_currency(0.9999, 'EUR', locale='fr') returns 1,00 €, whereas I would expect the pristine 0,9999 € string.
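One way to tell the "lucky" amounts from the dangerous ones is to count the significant fractional digits before formatting. The sketch below uses only the standard library. The helper name loses_precision is hypothetical, and the limit of 2 is the currency precision quoted above.

```python
from decimal import Decimal


def loses_precision(value, max_frac_digits):
    """Return True if `value` has more fractional digits than allowed."""
    # normalize() strips trailing zeros, so 12.50 counts as one digit.
    exponent = Decimal(str(value)).normalize().as_tuple().exponent
    frac_digits = -exponent if exponent < 0 else 0
    return frac_digits > max_frac_digits


print(loses_precision("12.50", 2))  # already fits in two digits
print(loses_precision(0.9999, 2))   # four digits, would be rounded
```

The check assumes finite numbers; NaN and infinities have a non-numeric exponent and would need separate handling.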

I think there must be a clean and documented way to bypass arbitrary rounding when localizing numbers.

@kdeldycke
Contributor Author

In the meantime, to bypass the rounding, I use a dirty workaround in the form of two helpers:

from babel import Locale
from babel.numbers import format_decimal, format_currency, LC_NUMERIC


def unrounding_format_decimal(number, formatstr=None, locale=LC_NUMERIC):
    """ Patched version of babel.numbers.format_decimal() bypassing rounding.
    """
    locale = Locale.parse(locale)
    if not formatstr:
        # Update default locale pattern with a stupidly high number of decimals
        # after the dot. This will prevent Babel's internal rounding messing
        # with our already rounded decimals.
        pattern = locale.decimal_formats.get(formatstr).pattern
        formatstr = pattern.replace('.#', '.' + '#' * 42)
    return format_decimal(number, format=formatstr, locale=locale)


def unrounding_format_currency(number, currency, formatstr=None,
                               locale=LC_NUMERIC):
    """ Patched version of babel.numbers.format_currency() bypassing rounding.
    """
    locale = Locale.parse(locale)
    if not formatstr:
        # Update default locale pattern with a stupidly high number of decimals
        # after the dot. This will prevent Babel's internal rounding messing
        # with our already rounded decimals.
        pattern = locale.currency_formats.get(formatstr).pattern
        formatstr = pattern.replace('.00', '.00' + '#' * 42)
    return format_currency(number, currency, format=formatstr, locale=locale)

@kdeldycke
Contributor Author

Percent formatting patterns simply don't feature the fractional part of a number:

>>> import babel
>>> patterns = set([babel.Locale.parse(l).percent_formats.get(None).pattern
...      for l in babel.localedata.locale_identifiers()])
>>> for p in patterns:
...   print p
... 
‎#0%
%#,##0
#,##,##0 %
% #,##0
'‪'#,##0%'‬'
#,##0%
#0%
#,##0 %
#,##,##0%
>>> 

So the hack to bypass artificial rounding is a slight variation of those above:

from babel import Locale
from babel.numbers import format_percent, LC_NUMERIC


def unrounding_format_percent(number, formatstr=None, locale=LC_NUMERIC):
    """ Patched version of babel.numbers.format_percent() bypassing rounding.
    """
    locale = Locale.parse(locale)
    if not formatstr:
        # Update default locale pattern with a stupidly high number of decimals
        # after the dot. This will prevent Babel's internal rounding messing
        # with our already rounded decimals.
        pattern = locale.percent_formats.get(formatstr).pattern
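        # Append a single dot followed by 42 optional digits; repeating '.#'
        # would produce several dots and an invalid pattern.
        formatstr = pattern.replace('#0', '#0.' + '#' * 42)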
    return format_percent(number, format=formatstr, locale=locale)
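The three workarounds boil down to plain string surgery on the locale pattern. The sketch below shows that transformation in isolation, without calling Babel. The three patterns are illustrative examples of the CLDR shapes involved (the percent one appears in the listing above; the other two are assumed), and the percent line appends one dot followed by '#' placeholders, which is the shape a number pattern expects.

```python
# -*- coding: utf-8 -*-
# 42 optional fraction digits, as in the helpers above.
EXTRA = "#" * 42

# Decimal: the first optional digit is replaced by 42 of them.
decimal_pattern = "#,##0.###".replace(".#", "." + EXTRA)

# Currency: the two mandatory digits are kept, optional ones are appended.
currency_pattern = "¤#,##0.00".replace(".00", ".00" + EXTRA)

# Percent: there is no fractional part, so one is added after the last zero.
percent_pattern = "#,##0%".replace("#0", "#0." + EXTRA)

for pattern in (decimal_pattern, currency_pattern, percent_pattern):
    print(pattern)
```

A placeholder '#' only renders a digit when there is one to show, so the extended patterns do not pad short numbers with zeros.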

@etanol
Contributor

etanol commented Sep 26, 2015

The CLDR specification section about number rounding states that half-even should be the default algorithm. However, it seems that there is room for alternative rounding modes at the implementer's discretion.

This can be easily implemented once I manage to land my new code that relies on the decimal standard package to perform the rounding.

@kdeldycke
Contributor Author

Good! I didn't know there was a refactor in progress. I especially welcome decimal-based code, as I might finally get rid of some dirty workarounds! :)

@etanol etanol mentioned this issue Oct 9, 2015
@akx
Member

akx commented Jan 14, 2016

@etanol, @kdeldycke: Is this issue still valid?

@etanol
Contributor

etanol commented Jan 14, 2016

Yes, in fact, now that #272 has been merged, it is easier to implement. Although I have a design dilemma:

  1. Adding a parameter to optionally specify the rounding mode
  2. Forcing the use of decimal contexts to override decimal parameters

The first option should be implemented by using a string identifier, since the numeric values that decimal and cdecimal assign to rounding modes don't match. But it would be fairly straightforward.

The second option is a bit more complex to implement, especially when trying to detect defaults because ROUND_HALF_EVEN is not Python's default rounding mode. However, it would give users much more control over decimal manipulation (i.e. precision, exponent limits, clamp behavior, etc).
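The two options can be sketched with the standard decimal module alone. Nothing below is Babel's API: the identifiers in ROUNDING_MODES and both function names are assumptions made for illustration.

```python
import decimal
from decimal import Decimal

# Hypothetical string identifiers mapped to the decimal module's constants.
ROUNDING_MODES = {
    "half-even": decimal.ROUND_HALF_EVEN,
    "half-up": decimal.ROUND_HALF_UP,
    "down": decimal.ROUND_DOWN,
}


def round_by_name(value, digits, mode="half-even"):
    """Option 1: the caller names the rounding mode explicitly."""
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(str(value)).quantize(quantum, rounding=ROUNDING_MODES[mode])


def round_in_context(value, digits):
    """Option 2: defer to whichever decimal context is currently active."""
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(str(value)).quantize(quantum)


print(round_by_name("2.665", 2, "half-up"))
with decimal.localcontext() as ctx:
    ctx.rounding = decimal.ROUND_DOWN
    print(round_in_context("2.675", 2))
```

Option 1 keeps the signature explicit but only exposes rounding, while option 2 picks up every setting of the active context at the cost of behavior that is harder to see from the call site.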

I haven't made up my mind yet.

@kdeldycke
Contributor Author

All my hacks above work with Babel 2.1 but not with Babel 2.2.

While waiting for @etanol's progress, I've updated all my code above to bypass Babel 2.2's rounding. The result is quite convoluted but that's the only way I found to patch the original methods.

I even wrote unit-tests and was pondering the release of a dedicated Python package to monkey-patch Babel's defaults. In the end I was too lazy so I'll just post the code here.

# -*- coding: utf-8 -*-
""" Babel's formatting methods patched to bypass rounding. """
from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals
)

import decimal
import re
from decimal import Decimal

from babel import Locale, localedata
from babel.numbers import (
    LC_NUMERIC,
    format_decimal,
    format_percent,
    parse_pattern
)

# Regular expression to grab the trailing fractional part of formatting
# pattern, starting with a dot (.) and followed by a series of zeros (0) or
# sharps (#).
TRAILING_PRECISION = r'\.[0#]+'


def list_locale():
    """ Return a list of normalized locale codes supported by Babel. """
    return localedata.locale_identifiers()


def get_precision(value):
    """ Return the maximum precision of the fractional part of a decimal. """
    decimal_tuple = value.normalize().as_tuple()
    # Precision is extracted from the fractional part only.
    if decimal_tuple.exponent >= 0:
        return 0
    return abs(decimal_tuple.exponent)


def unround_pattern(pattern, max_prec=None):
    """ Update a format string pattern to remove artificial rounding.

    The strategy consists of updating the rendering pattern with a ridiculously
    high number of decimals after the dot. This will prevent Babel's internal
    rounding from messing with our already clean and tidy Decimals.
    """
    # Search for the fractional definition in the pattern.
    matches = re.findall(TRAILING_PRECISION, pattern)

    # The pattern is going to be parsed by the decimal module, so get
    # contextual precision to not exceed the limits.
    if max_prec is None:
        max_prec = decimal.getcontext().prec

    # Extend existing fractional part of the pattern.
    if matches:
        assert len(matches) == 1
        match = matches[0]
        pattern = pattern.replace(
            match, match + '#' * (max_prec - len(match) + 1))

    # Add missing fractional part to the pattern.
    else:
        # Find position of the last zero (0).
        split_point = pattern.rfind('0')
        if split_point < 0:
            raise ValueError(
                "Can't find fractional split-point of a rendering pattern.")

        # Inject our made-up fractional part at the split-point where we found
        # the last zero.
        pattern = pattern[:split_point] + '0.' + '#' * max_prec + pattern[
            split_point + 1:]

    return pattern


def unrounding_format_decimal(number, pattern=None, locale=LC_NUMERIC):
    """ Patched version of babel.numbers.format_decimal() bypassing rounding.
    """
    # Get default format pattern from the locale if not explicitly provided.
    if not pattern:
        pattern = Locale.parse(locale).decimal_formats.get(pattern).pattern

    # Provide number precision to not bump into Decimal module limits.
    if not isinstance(number, Decimal):
        number = Decimal(str(number))
    pattern = unround_pattern(pattern, get_precision(number))

    return format_decimal(number, format=pattern, locale=locale)


def unrounding_format_currency(
        number, currency, pattern=None, locale=LC_NUMERIC,
        currency_digits=True, format_type='standard'):
    """ Patched version of babel.numbers.format_currency() bypassing rounding.

    Unlike ``unrounding_format_decimal()`` and ``unrounding_format_percent()``,
    we do not wrap and reuse the original ``format_currency()``, as the latter
    always overrides the precision based on the provided currency.
    """
    locale = Locale.parse(locale)

    # currency_digits parameter is provided in the method's signature to keep
    # compatibility with the original format_currency() method.
    if not currency_digits:
        raise ValueError(
            "You want to use my unrounding currency l10n helper and still want"
            " to truncate digits? What's wrong with you?!")

    # Get default format pattern from the locale if not explicitly provided.
    if not pattern:
        pattern = locale.currency_formats.get(format_type).pattern

    # Provide number precision to not bump into Decimal module limits.
    if not isinstance(number, Decimal):
        number = Decimal(str(number))
    pattern = unround_pattern(pattern, get_precision(number))

    # Do not force fractional precision. Let the pattern compute it from its
    # extended format string.
    return parse_pattern(pattern).apply(
        number, locale, currency=currency, force_frac=None)


def unrounding_format_percent(number, pattern=None, locale=LC_NUMERIC):
    """ Patched version of babel.numbers.format_percent() bypassing rounding.
    """
    # Get default format pattern from the locale if not explicitly provided.
    if not pattern:
        pattern = Locale.parse(locale).percent_formats.get(pattern).pattern

    # Provide number precision to not bump into Decimal module limits.
    if not isinstance(number, Decimal):
        number = Decimal(str(number))
    # Reduce max precision by 2 digits as percentages are provided as a ratio
    # but rendered as a fraction of 100, hence the shift.
    max_prec = get_precision(number) - 2
    pattern = unround_pattern(pattern, max_prec)

    return format_percent(number, format=pattern, locale=locale)
# -*- coding: utf-8 -*-
""" Unit-tests for Babel's rounding bypass methods. """
from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals
)

import re
import unittest
from decimal import Decimal
from itertools import chain, product
from operator import attrgetter

from babel import Locale, localedata

from ocs.utils.i18n import (
    TRAILING_PRECISION,
    get_precision,
    list_currency,
    unround_pattern,
    unrounding_format_currency,
    unrounding_format_decimal,
    unrounding_format_percent
)


class TestI18nMetadata(unittest.TestCase):
    """ Check structure of Babel's locale metadata.

    Ensure the layout of metadata we rely on hasn't changed in new versions of
    Babel. Any changes will require us to revisit our hackish i18n utilities,
    especially the unrounding methods extending format patterns.
    """

    def test_decimal_formats_keys(self):
        """ Check that all locales share the same set of decimal formats. """
        self.assertEqual(
            set([
                frozenset(Locale.parse(l).decimal_formats.keys())
                for l in localedata.locale_identifiers()]),
            set([
                frozenset([None, 'long', 'short']),
                frozenset([None, 'short']),
            ]))

    def test_decimal_formats_precision(self):
        """ Check all unique decimal precision format. """
        self.assertEqual(
            set(chain.from_iterable([
                map(
                    attrgetter('frac_prec'),
                    Locale.parse(l).decimal_formats.values())
                for l in localedata.locale_identifiers()])),
            set([(0, 0), (0, 3), (0, 6)]))

    def test_currency_formats_keys(self):
        """ Check that all locales share the same set of currency formats. """
        self.assertEqual(
            set([
                frozenset(Locale.parse(l).currency_formats.keys())
                for l in localedata.locale_identifiers()]),
            set([frozenset(['accounting', 'standard', 'standard:short'])]))

    def test_currency_formats_precision(self):
        """ Check all unique currency precision format. """
        self.assertEqual(
            set(chain.from_iterable([
                map(
                    attrgetter('frac_prec'),
                    Locale.parse(l).currency_formats.values())
                for l in localedata.locale_identifiers()])),
            set([(0, 0), (2, 2)]))

    def test_percent_formats_keys(self):
        """ Check that all locales share the same set of percent formats. """
        self.assertEqual(
            set([
                frozenset(Locale.parse(l).percent_formats.keys())
                for l in localedata.locale_identifiers()]),
            set([frozenset([None])]))

    def test_percent_formats_precision(self):
        """ Check all unique percent precision format. """
        self.assertEqual(
            set(chain.from_iterable([
                map(
                    attrgetter('frac_prec'),
                    Locale.parse(l).percent_formats.values())
                for l in localedata.locale_identifiers()])),
            set([(0, 0)]))


class TestL10nRendering(unittest.TestCase):
    """ Check rendering of l10n helpers. """

    def test_get_precision(self):
        test_data = [
            ('10000', 0),
            ('1', 0),
            ('1.0', 0),
            ('1.1', 1),
            ('1.11', 2),
            ('1.110', 2),
            ('1.001', 3),
            ('1.00100', 3),
            ('01.00100', 3),
            ('101.00100', 3),
            ('00000', 0),
            ('0', 0),
            ('0.0', 0),
            ('0.1', 1),
            ('0.11', 2),
            ('0.110', 2),
            ('0.001', 3),
            ('0.00100', 3),
            ('00.00100', 3),
            ('000.00100', 3),
        ]
        for input_value, expected_value in test_data:
            self.assertEqual(
                get_precision(Decimal(input_value)),
                expected_value)

    def test_decimal_pattern_unrounding(self):
        """ All unrounded patterns must end up with a fractional part. """
        all_patterns = set(chain.from_iterable([
            map(
                attrgetter('pattern'),
                Locale.parse(l).decimal_formats.values())
            for l in localedata.locale_identifiers()]))

        for pattern in all_patterns:
            unrounded_pattern = unround_pattern(pattern)
            matches = re.findall(TRAILING_PRECISION, unrounded_pattern)
            self.assertEqual(len(matches), 1)
            self.assertTrue(matches[0].startswith('.'))
            self.assertTrue(matches[0].endswith('##########'))
            # Subsequent transformations are stable.
            self.assertEqual(
                unround_pattern(unrounded_pattern), unrounded_pattern)

    def test_unrounding_format_decimal(self):
        """ Test preservation of precision with unrounding decimal l10n helper.
        """
        # Test precision conservation.
        test_data = [
            ('10000', '10,000'),
            ('1', '1'),
            ('1.0', '1'),
            ('1.1', '1.1'),
            ('1.11', '1.11'),
            ('1.110', '1.11'),
            ('1.001', '1.001'),
            ('1.00100', '1.001'),
            ('01.00100', '1.001'),
            ('101.00100', '101.001'),
            ('00000', '0'),
            ('0', '0'),
            ('0.0', '0'),
            ('0.1', '0.1'),
            ('0.11', '0.11'),
            ('0.110', '0.11'),
            ('0.001', '0.001'),
            ('0.00100', '0.001'),
            ('00.00100', '0.001'),
            ('000.00100', '0.001'),
        ]
        for input_value, expected_value in test_data:
            self.assertEqual(
                unrounding_format_decimal(
                    Decimal(input_value), locale='en_US'),
                expected_value)

        # Test all locales.
        for locale_code in localedata.locale_identifiers():
            self.assertTrue(
                unrounding_format_decimal(
                    '0.9999999999', locale=locale_code).endswith('9999999999'))

    def test_unrounding_format_currency(self):
        """ Test preservation of precision with unrounding currency l10n helper.
        """
        locales_and_currencies = product(
            localedata.locale_identifiers(),
            list_currency())
        for locale_code, currency_code in locales_and_currencies:
            self.assertGreater(
                unrounding_format_currency(
                    '0.9999999999',
                    currency_code,
                    locale=locale_code).find('9999999999'), -1)

    def test_unrounding_format_percent(self):
        """ Test preservation of precision with unrounding percent l10n helper.
        """
        for locale_code in localedata.locale_identifiers():
            rendered_percent = unrounding_format_percent(
                '0.9999999999', locale=locale_code)
            self.assertEqual(rendered_percent.find('9999999999'), -1)
            self.assertGreater(rendered_percent.find('99999999'), -1)
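The two-digit shift applied in unrounding_format_percent() above can be checked with the decimal module alone: multiplying a ratio by 100 moves two fractional digits in front of the dot. This is a small sketch; fraction_digits is a stand-in written for this example that mirrors get_precision().

```python
from decimal import Decimal


def fraction_digits(value):
    """Count significant digits after the dot, ignoring trailing zeros."""
    exponent = value.normalize().as_tuple().exponent
    return -exponent if exponent < 0 else 0


ratio = Decimal("0.12345")
as_percent = ratio * 100  # the value actually rendered, i.e. 12.345 %

print(fraction_digits(ratio))       # digits carried by the ratio
print(fraction_digits(as_percent))  # digits left once shown as a percentage
```

For ratios with fewer than two fractional digits, such as 0.5, the subtraction goes negative. Clamping it with max(fraction_digits(ratio) - 2, 0) would make that case explicit.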

@akx
Member

akx commented Jan 25, 2016

Hey @kdeldycke, would you be interested in making a PR that folds the unrounding versions in as, say, a rounding=True kwarg for the formatting functions? And would @etanol be okay with that?

@kdeldycke
Contributor Author

@akx Why not. The thing is, my unrounding method is quite hackish, as it consists of updating the CLDR pattern definitions on the fly, in a non-destructive way. I'm quite certain @etanol had a cleaner implementation in mind, based solely on Python's decimal module.

@akx
Member

akx commented Jan 25, 2016

@kdeldycke Hmm. Well, if you can come up with a less hackish way, that'd be nice too? :D

@aandis

aandis commented Apr 6, 2016

+1 from gratipay/gratipay.com#3966

@etanol
Contributor

etanol commented Apr 7, 2016

All right, fellows, I think I know how to solve this.

I've reached the conclusion that the most versatile solution to this problem is to add a new optional parameter to format_decimal and format_currency to be filled with a decimal.Context instance.

While remaining backwards compatible, this solution will allow full control over decimal number operations. Users will be able not only to change the rounding mode, but also to control precision, exponent ranges and so on.
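As a rough illustration of that idea, the sketch below threads an optional decimal.Context through a rounding helper. It is not the signature that ended up in Babel: the function name and the context keyword are assumptions, and only the standard library is used.

```python
import decimal
from decimal import Decimal


def quantize_with_context(value, digits, context=None):
    """Round `value` to `digits` fractional digits under an optional context.

    Hypothetical helper: with no context given, half-even rounding is used,
    so existing callers would see no change in behavior.
    """
    if context is None:
        context = decimal.Context(rounding=decimal.ROUND_HALF_EVEN)
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(str(value)).quantize(quantum, context=context)


print(quantize_with_context("0.9999", 2))
truncating = decimal.Context(rounding=decimal.ROUND_DOWN)
print(quantize_with_context("0.9999", 2, context=truncating))
```

Because the context object also carries precision and exponent limits, a caller can tighten or relax those in the same place as the rounding mode.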

I'll give it a try this weekend and will submit a pull request.

@sublee
Contributor

sublee commented May 24, 2016

@etanol Is this still a work in progress?

@kdeldycke
Contributor Author

This issue is addressed by #494.
