Proper support for IEEE 1541 definitions of units?
Refer to issue 4 and pull requests 8 and 9 on GitHub for details:
 - #4
 - #8
 - #9
xolox committed Sep 29, 2016
1 parent 6c3a989 commit 78f729b
Showing 3 changed files with 123 additions and 101 deletions.
60 changes: 32 additions & 28 deletions README.rst
@@ -26,6 +26,9 @@ text interfaces more user friendly. Some example features:
The `humanfriendly` package is currently tested on Python 2.6, 2.7, 3.4, 3.5
and PyPy.

.. contents::
   :local:

Getting started
---------------

@@ -91,34 +94,30 @@ Human friendly input/output (text formatting) on the command line based on the P

.. [[[end]]]
Note on units used
------------------

This package uses the traditional units based on powers of two. These units are
still used in Microsoft Windows' graphical user interface and in other
software.

+--------+----------------+
| Unit   | Value in bytes |
+--------+----------------+
| ``KB`` | 1024           |
+--------+----------------+
| ``MB`` | 1048576        |
+--------+----------------+
| ``GB`` | 1073741824     |
+--------+----------------+
| ``TB`` | 1099511627776  |
+--------+----------------+
| etc    |                |
+--------+----------------+

The standard IEEE 1541, used by many hardware and software vendors today,
contradicts this definition, using powers of 10 instead for ``kB``,
``MB``, ``GB`` and so on. These definitions are often referred to as SI
formatting, due to their similarity with the metric system. Thankfully, IEEE
1541 also unambiguously defines ``KiB``, ``MiB`` (etc.) as the values based on
powers of 2. This module does not yet support these units.

A note about size units
-----------------------

When I originally published the `humanfriendly` package I went with binary
multiples of bytes (powers of two). It was pointed out several times that this
was a poor choice (see issue `#4`_ and pull requests `#8`_ and `#9`_) and thus
the new default became decimal multiples of bytes (powers of ten):

+------+---------------+---------------+
| Unit | Binary value  | Decimal value |
+------+---------------+---------------+
| KB   | 1024          | 1000          |
+------+---------------+---------------+
| MB   | 1048576       | 1000000       |
+------+---------------+---------------+
| GB   | 1073741824    | 1000000000    |
+------+---------------+---------------+
| TB   | 1099511627776 | 1000000000000 |
+------+---------------+---------------+
| etc  |               |               |
+------+---------------+---------------+

Binary multiples of bytes remain available by passing the keyword argument
`binary=True` to the `format_size()`_ and `parse_size()`_ functions.
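The table can be recomputed directly, which also shows how far the two conventions drift apart in practice. The following standalone Python sketch is illustrative only and not part of `humanfriendly`; `size_table()` is a hypothetical helper, and the 1 TB drive is an arbitrary example value.

```python
def size_table(max_exponent=5):
    """Hypothetical helper: (symbol, binary value, decimal value) rows like the table above."""
    symbols = ['KB', 'MB', 'GB', 'TB', 'PB']
    return [(symbols[i - 1], 1024 ** i, 1000 ** i) for i in range(1, max_exponent + 1)]


for symbol, binary_value, decimal_value in size_table():
    print('%s  %d  %d' % (symbol, binary_value, decimal_value))

# Example value: a drive sold as "1 TB" (decimal) holds 10**12 bytes,
# which is noticeably less when expressed in binary multiples (GiB).
drive_bytes = 1000 ** 4
gib = float(drive_bytes) / 1024 ** 3
print('1 TB = %.1f GiB' % gib)
```

The `float()` call keeps the division exact-ish on Python 2 as well, which this package still supported at the time.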

Contact
-------
@@ -136,8 +135,13 @@ This software is licensed under the `MIT license`_.
© 2016 Peter Odding.

.. External references:
.. _#4: https://github.com/xolox/python-humanfriendly/issues/4
.. _#8: https://github.com/xolox/python-humanfriendly/pull/8
.. _#9: https://github.com/xolox/python-humanfriendly/pull/9
.. _format_size(): https://humanfriendly.readthedocs.io/en/latest/#humanfriendly.format_size
.. _GitHub: https://github.com/xolox/python-humanfriendly
.. _MIT license: http://en.wikipedia.org/wiki/MIT_License
.. _parse_size(): https://humanfriendly.readthedocs.io/en/latest/#humanfriendly.parse_size
.. _peter@peterodding.com: peter@peterodding.com
.. _PyPI: https://pypi.python.org/pypi/humanfriendly
.. _Read the Docs: https://humanfriendly.readthedocs.org
126 changes: 70 additions & 56 deletions humanfriendly/__init__.py
@@ -1,12 +1,13 @@
# Human friendly input/output in Python.
#
# Author: Peter Odding <peter@peterodding.com>
# Last Change: September 28, 2016
# Last Change: September 29, 2016
# URL: https://humanfriendly.readthedocs.org

"""The main module of the `humanfriendly` package."""

# Standard library modules.
import collections
import decimal
import multiprocessing
import numbers
@@ -38,7 +39,7 @@
from humanfriendly.compat import is_string

# Semi-standard module versioning.
__version__ = '1.44.9'
__version__ = '2.0a1'

# Spinners are redrawn at most this many seconds.
minimum_spinner_interval = 0.2
@@ -51,21 +52,17 @@
hide_cursor_code = '\x1b[?25l'
show_cursor_code = '\x1b[?25h'

# Common disk size units, used for formatting and parsing.
disk_size_units = (dict(prefix='b', divider=1, singular='byte', plural='bytes'),
dict(prefix='k', divider=1024**1, singular='KB', plural='KB'),
dict(prefix='m', divider=1024**2, singular='MB', plural='MB'),
dict(prefix='g', divider=1024**3, singular='GB', plural='GB'),
dict(prefix='t', divider=1024**4, singular='TB', plural='TB'),
dict(prefix='p', divider=1024**5, singular='PB', plural='PB'))

# Common disk size units based on IEEE 1541.
disk_size_units_ieee = (dict(prefix='b', divider=1, singular='byte', plural='bytes'),
dict(prefix='k', divider=1000**1, singular='KB', plural='KB'),
dict(prefix='m', divider=1000**2, singular='MB', plural='MB'),
dict(prefix='g', divider=1000**3, singular='GB', plural='GB'),
dict(prefix='t', divider=1000**4, singular='TB', plural='TB'),
dict(prefix='p', divider=1000**5, singular='PB', plural='PB'))
SizeUnit = collections.namedtuple('SizeUnit', 'divider, symbol, name')
CombinedUnit = collections.namedtuple('CombinedUnit', 'decimal, binary')

# Common disk size units in binary (base-2) and decimal (base-10) multiples.
disk_size_units = (
CombinedUnit(SizeUnit(1000**1, 'KB', 'kilobyte'), SizeUnit(1024**1, 'KiB', 'kibibyte')),
CombinedUnit(SizeUnit(1000**2, 'MB', 'megabyte'), SizeUnit(1024**2, 'MiB', 'mebibyte')),
CombinedUnit(SizeUnit(1000**3, 'GB', 'gigabyte'), SizeUnit(1024**3, 'GiB', 'gibibyte')),
CombinedUnit(SizeUnit(1000**4, 'TB', 'terabyte'), SizeUnit(1024**4, 'TiB', 'tebibyte')),
CombinedUnit(SizeUnit(1000**5, 'PB', 'petabyte'), SizeUnit(1024**5, 'PiB', 'pebibyte')),
)
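Because each `CombinedUnit` pairs a decimal unit with its binary counterpart, the relative gap between the two can be computed per pair. The sketch below copies the `SizeUnit`, `CombinedUnit` and `disk_size_units` definitions from this commit; `binary_excess()` is a hypothetical helper added only for illustration.

```python
import collections

# Copied from this commit: one unit of measure, and a decimal/binary pair.
SizeUnit = collections.namedtuple('SizeUnit', 'divider, symbol, name')
CombinedUnit = collections.namedtuple('CombinedUnit', 'decimal, binary')

disk_size_units = (
    CombinedUnit(SizeUnit(1000**1, 'KB', 'kilobyte'), SizeUnit(1024**1, 'KiB', 'kibibyte')),
    CombinedUnit(SizeUnit(1000**2, 'MB', 'megabyte'), SizeUnit(1024**2, 'MiB', 'mebibyte')),
    CombinedUnit(SizeUnit(1000**3, 'GB', 'gigabyte'), SizeUnit(1024**3, 'GiB', 'gibibyte')),
    CombinedUnit(SizeUnit(1000**4, 'TB', 'terabyte'), SizeUnit(1024**4, 'TiB', 'tebibyte')),
    CombinedUnit(SizeUnit(1000**5, 'PB', 'petabyte'), SizeUnit(1024**5, 'PiB', 'pebibyte')),
)


def binary_excess(unit):
    """Hypothetical helper: how much larger the binary unit is, in percent."""
    return (float(unit.binary.divider) / unit.decimal.divider - 1) * 100


for unit in disk_size_units:
    print('%s vs %s: %.1f%%' % (unit.binary.symbol, unit.decimal.symbol, binary_excess(unit)))
```

Running it reports a gap of about 2.4% for KiB vs KB, growing to about 12.6% for PiB vs PB, so the choice of default matters more the larger the sizes being formatted or parsed.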

# Common length size units, used for formatting and parsing.
length_size_units = (dict(prefix='nm', divider=1e-09, singular='nm', plural='nm'),
@@ -112,19 +109,19 @@ def coerce_boolean(value):
return bool(value)


def format_size(num_bytes, keep_width=False, correct=False):
def format_size(num_bytes, keep_width=False, binary=False):
"""
Format a byte count as a human readable file size.
:param num_bytes: The size to format in bytes (an integer).
:param keep_width: ``True`` if trailing zeros should not be stripped,
``False`` if they can be stripped.
:param keep_width: :data:`True` if trailing zeros should not be stripped,
:data:`False` if they can be stripped.
:param binary: :data:`True` to use binary multiples of bytes (base-2),
:data:`False` to use decimal multiples of bytes (base-10).
:returns: The corresponding human readable file size (a string).
This function supports ranges from kilobytes to terabytes. It only supports
the definitions that are based on powers of 2.
Some examples:
This function knows how to format sizes in bytes, kilobytes, megabytes,
gigabytes, terabytes and petabytes. Some examples:
>>> from humanfriendly import format_size
>>> format_size(0)
@@ -133,62 +130,79 @@ def format_size(num_bytes, keep_width=False, correct=False):
'1 byte'
>>> format_size(5)
'5 bytes'
>>> format_size(1024 ** 2)
'1 MB'
>>> format_size(1024 ** 3 * 4)
'4 GB'
>>> format_size(1000 ** 3 * 4, correct=True)
>>> format_size(1000)
'1 KB'
>>> format_size(1024, binary=True)
'1 KiB'
>>> format_size(1000 ** 3 * 4)
'4 GB'
"""
units = disk_size_units
if correct:
units = disk_size_units_ieee
for unit in reversed(units):
if num_bytes >= unit['divider']:
number = round_number(float(num_bytes) / unit['divider'], keep_width=keep_width)
return pluralize(number, unit['singular'], unit['plural'])
for unit in reversed(disk_size_units):
if num_bytes >= unit.binary.divider and binary:
number = round_number(float(num_bytes) / unit.binary.divider, keep_width=keep_width)
return pluralize(number, unit.binary.symbol, unit.binary.symbol)
elif num_bytes >= unit.decimal.divider and not binary:
number = round_number(float(num_bytes) / unit.decimal.divider, keep_width=keep_width)
return pluralize(number, unit.decimal.symbol, unit.decimal.symbol)
return pluralize(num_bytes, 'byte')
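The unit selection in the new `format_size()` boils down to: pick the decimal or binary half of each pair, then walk from the largest unit down and use the first one that fits. The following is a simplified, standalone sketch of that loop, not the package's code: `round_number_sketch()` is an assumed approximation of `humanfriendly.round_number()` (two decimals, trailing zeros stripped), and the byte fallback uses simplified pluralization in place of `pluralize()`.

```python
import collections

# Same shape as the namedtuples introduced in this commit.
SizeUnit = collections.namedtuple('SizeUnit', 'divider, symbol, name')
CombinedUnit = collections.namedtuple('CombinedUnit', 'decimal, binary')

# (decimal symbol, decimal name, binary symbol, binary name) for 1000**i / 1024**i.
_UNIT_NAMES = [
    ('KB', 'kilobyte', 'KiB', 'kibibyte'),
    ('MB', 'megabyte', 'MiB', 'mebibyte'),
    ('GB', 'gigabyte', 'GiB', 'gibibyte'),
    ('TB', 'terabyte', 'TiB', 'tebibyte'),
    ('PB', 'petabyte', 'PiB', 'pebibyte'),
]
disk_size_units = tuple(
    CombinedUnit(SizeUnit(1000 ** i, dsym, dname), SizeUnit(1024 ** i, bsym, bname))
    for i, (dsym, dname, bsym, bname) in enumerate(_UNIT_NAMES, start=1)
)


def round_number_sketch(value, keep_width=False):
    """Assumed approximation of round_number(): two decimals, zeros stripped."""
    text = '%.2f' % float(value)
    if not keep_width:
        # '1.50' -> '1.5', '4.00' -> '4'; rstrip('0') stops at the dot.
        text = text.rstrip('0').rstrip('.')
    return text


def format_size_sketch(num_bytes, keep_width=False, binary=False):
    """Format a byte count using decimal (default) or binary multiples."""
    # Walk from the largest unit (PB / PiB) down and use the first that fits.
    for unit in reversed(disk_size_units):
        chosen = unit.binary if binary else unit.decimal
        if num_bytes >= chosen.divider:
            number = round_number_sketch(float(num_bytes) / chosen.divider, keep_width)
            return '%s %s' % (number, chosen.symbol)
    # Smaller than 1 KB (or 1 KiB): report plain bytes.
    return '%s %s' % (num_bytes, 'byte' if num_bytes == 1 else 'bytes')


print(format_size_sketch(1000))
print(format_size_sketch(1024))
print(format_size_sketch(1024, binary=True))
```

In this sketch, 1024 bytes formats as `1.02 KB` under the decimal default and as `1 KiB` only with `binary=True`; conversely, 1000 bytes stays below 1 KiB and is reported in bytes when `binary=True`.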


def parse_size(size, correct=False):
def parse_size(size, binary=False):
"""
Parse a human readable data size and return the number of bytes.
:param size: The human readable file size to parse (a string).
:param binary: :data:`True` to use binary multiples of bytes (base-2) for
ambiguous unit symbols and names, :data:`False` to use
decimal multiples of bytes (base-10).
:returns: The corresponding size in bytes (an integer).
:raises: :exc:`InvalidSize` when the input can't be parsed.
This function only supports the definitions that are based on powers of 2.
Some examples:
This function knows how to parse sizes in bytes, kilobytes, megabytes,
gigabytes, terabytes and petabytes. Some examples:
>>> from humanfriendly import parse_size
>>> parse_size('42')
42
>>> parse_size('13b')
13
>>> parse_size('5 bytes')
5
>>> parse_size('1 KB')
1000
>>> parse_size('1 kilobyte')
1000
>>> parse_size('1 KiB')
1024
>>> parse_size('1 KB', binary=True)
1024
>>> parse_size('5 kilobyte')
5120
>>> parse_size('1.5 GB')
1610612736
>>> parse_size('1.5 GB', correct=True)
1500000000
>>> parse_size('1.5 GB', binary=True)
1610612736
"""
tokens = tokenize(size)
if tokens and isinstance(tokens[0], numbers.Number):
# If the input contains only a number, it's assumed to be the number of bytes.
if len(tokens) == 1:
# Get the normalized unit (if any) from the tokenized input.
normalized_unit = tokens[1].lower() if len(tokens) == 2 and is_string(tokens[1]) else ''
# If the input contains only a number, it's assumed to be the number of
# bytes. The second token can also explicitly reference the unit bytes.
if len(tokens) == 1 or normalized_unit.startswith('b'):
return int(tokens[0])
# Otherwise we expect to find two tokens: A number and a unit.
if len(tokens) == 2 and is_string(tokens[1]):
normalized_unit = tokens[1].lower()
# Try to match the first letter of the unit.
units = disk_size_units
if correct:
units = disk_size_units_ieee
for unit in units:
if normalized_unit.startswith(unit['prefix']):
return int(tokens[0] * unit['divider'])
# Otherwise we expect two tokens: A number and a unit.
if normalized_unit:
for unit in disk_size_units:
# First we check for unambiguous symbols (KiB, MiB, GiB, etc)
# and names (kibibyte, mebibyte, gibibyte, etc) because their
# handling is always the same.
if normalized_unit in (unit.binary.symbol.lower(), unit.binary.name.lower()):
return int(tokens[0] * unit.binary.divider)
# Now we will deal with ambiguous prefixes (K, M, G, etc),
# symbols (KB, MB, GB, etc) and names (kilobyte, megabyte,
# gigabyte, etc) according to the caller's preference.
if (normalized_unit in (unit.decimal.symbol.lower(), unit.decimal.name.lower()) or
normalized_unit.startswith(unit.decimal.symbol[0].lower())):
return int(tokens[0] * (unit.binary.divider if binary else unit.decimal.divider))
# We failed to parse the size specification.
msg = "Failed to parse size! (input %r was tokenized as %r)"
raise InvalidSize(msg % (size, tokens))
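The new `parse_size()` resolves units in a fixed order: a bare number or a unit starting with `b` means bytes, unambiguous binary symbols and names (`KiB`, `kibibyte`, etc.) always mean base-2, and ambiguous prefixes, symbols and names (`k`, `KB`, `kilobyte`, etc.) follow the `binary` argument. The standalone sketch below mirrors that order under stated assumptions: the regular expression is a simplified stand-in for the package's `tokenize()`, and `InvalidSize` is a local placeholder for `humanfriendly.InvalidSize`.

```python
import collections
import re

# Same shape as the namedtuples introduced in this commit.
SizeUnit = collections.namedtuple('SizeUnit', 'divider, symbol, name')
CombinedUnit = collections.namedtuple('CombinedUnit', 'decimal, binary')

_UNIT_NAMES = [
    ('KB', 'kilobyte', 'KiB', 'kibibyte'),
    ('MB', 'megabyte', 'MiB', 'mebibyte'),
    ('GB', 'gigabyte', 'GiB', 'gibibyte'),
    ('TB', 'terabyte', 'TiB', 'tebibyte'),
    ('PB', 'petabyte', 'PiB', 'pebibyte'),
]
disk_size_units = tuple(
    CombinedUnit(SizeUnit(1000 ** i, dsym, dname), SizeUnit(1024 ** i, bsym, bname))
    for i, (dsym, dname, bsym, bname) in enumerate(_UNIT_NAMES, start=1)
)


class InvalidSize(Exception):
    """Local placeholder for humanfriendly.InvalidSize."""


# Simplified stand-in for tokenize(): a number followed by an optional
# alphabetic unit, with optional whitespace around both.
_SIZE_PATTERN = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$')


def parse_size_sketch(size, binary=False):
    """Parse a human readable size into a number of bytes."""
    match = _SIZE_PATTERN.match(size)
    if not match:
        raise InvalidSize('Failed to parse size! (input %r)' % size)
    number_text, unit_text = match.groups()
    number = float(number_text) if '.' in number_text else int(number_text)
    normalized_unit = unit_text.lower()
    # A bare number, or a unit starting with "b", is a plain byte count.
    if not normalized_unit or normalized_unit.startswith('b'):
        return int(number)
    for unit in disk_size_units:
        # Unambiguous binary symbols and names (KiB, kibibyte, ...) are always base-2.
        if normalized_unit in (unit.binary.symbol.lower(), unit.binary.name.lower()):
            return int(number * unit.binary.divider)
        # Ambiguous prefixes, symbols and names (k, KB, kilobyte, ...) follow `binary`.
        if (normalized_unit in (unit.decimal.symbol.lower(), unit.decimal.name.lower())
                or normalized_unit.startswith(unit.decimal.symbol[0].lower())):
            return int(number * (unit.binary.divider if binary else unit.decimal.divider))
    raise InvalidSize('Failed to parse size! (input %r)' % size)


print(parse_size_sketch('1 KB'))
print(parse_size_sketch('1 KiB'))
print(parse_size_sketch('1 KB', binary=True))
```

Checking the unambiguous binary forms first means `1 mebibyte` is never mistaken for a decimal megabyte just because it starts with `m`.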
38 changes: 21 additions & 17 deletions humanfriendly/tests.py
@@ -4,7 +4,7 @@
# Tests for the `humanfriendly' package.
#
# Author: Peter Odding <peter.odding@paylogic.eu>
# Last Change: September 28, 2016
# Last Change: September 29, 2016
# URL: https://humanfriendly.readthedocs.org

"""Test suite for the `humanfriendly` package."""
@@ -185,30 +185,34 @@ def test_format_size(self):
self.assertEqual('0 bytes', humanfriendly.format_size(0))
self.assertEqual('1 byte', humanfriendly.format_size(1))
self.assertEqual('42 bytes', humanfriendly.format_size(42))
self.assertEqual('1 KB', humanfriendly.format_size(1024 ** 1))
self.assertEqual('1 MB', humanfriendly.format_size(1024 ** 2))
self.assertEqual('1 GB', humanfriendly.format_size(1024 ** 3))
self.assertEqual('1 TB', humanfriendly.format_size(1024 ** 4))
self.assertEqual('1 PB', humanfriendly.format_size(1024 ** 5))
self.assertEqual('1 byte', humanfriendly.format_size(1, correct=True))
self.assertEqual('45 KB', humanfriendly.format_size(1000 * 45, correct=True))
self.assertEqual('1 GB', humanfriendly.format_size(1000 ** 3, correct=True))
self.assertEqual('2.9 TB', humanfriendly.format_size(1000 ** 4 * 2.9, correct=True))
self.assertEqual('1 KB', humanfriendly.format_size(1000 ** 1))
self.assertEqual('1 MB', humanfriendly.format_size(1000 ** 2))
self.assertEqual('1 GB', humanfriendly.format_size(1000 ** 3))
self.assertEqual('1 TB', humanfriendly.format_size(1000 ** 4))
self.assertEqual('1 PB', humanfriendly.format_size(1000 ** 5))
self.assertEqual('1 KiB', humanfriendly.format_size(1024 ** 1, binary=True))
self.assertEqual('1 MiB', humanfriendly.format_size(1024 ** 2, binary=True))
self.assertEqual('1 GiB', humanfriendly.format_size(1024 ** 3, binary=True))
self.assertEqual('1 TiB', humanfriendly.format_size(1024 ** 4, binary=True))
self.assertEqual('1 PiB', humanfriendly.format_size(1024 ** 5, binary=True))
self.assertEqual('45 KB', humanfriendly.format_size(1000 * 45))
self.assertEqual('2.9 TB', humanfriendly.format_size(1000 ** 4 * 2.9))

def test_parse_size(self):
"""Test :func:`humanfriendly.parse_size()`."""
self.assertEqual(0, humanfriendly.parse_size('0B'))
self.assertEqual(42, humanfriendly.parse_size('42'))
self.assertEqual(42, humanfriendly.parse_size('42B'))
self.assertEqual(1024, humanfriendly.parse_size('1k'))
self.assertEqual(1024, humanfriendly.parse_size('1 KB'))
self.assertEqual(1024, humanfriendly.parse_size('1 kilobyte'))
self.assertEqual(1024 ** 3, humanfriendly.parse_size('1 GB'))
self.assertEqual(1024 ** 3 * 1.5, humanfriendly.parse_size('1.5 GB'))
self.assertEqual(1000, humanfriendly.parse_size('1k'))
self.assertEqual(1024, humanfriendly.parse_size('1k', binary=True))
self.assertEqual(1000, humanfriendly.parse_size('1 KB'))
self.assertEqual(1000, humanfriendly.parse_size('1 kilobyte'))
self.assertEqual(1024, humanfriendly.parse_size('1 kilobyte', binary=True))
self.assertEqual(1000 ** 2 * 69, humanfriendly.parse_size('69 MB'))
self.assertEqual(1000 ** 3, humanfriendly.parse_size('1 GB'))
self.assertEqual(1000 ** 3 * 1.5, humanfriendly.parse_size('1.5 GB'))
self.assertRaises(humanfriendly.InvalidSize, humanfriendly.parse_size, '1z')
self.assertRaises(humanfriendly.InvalidSize, humanfriendly.parse_size, 'a')
self.assertEqual(1000, humanfriendly.parse_size('1 KB', correct=True))
self.assertEqual(1000 ** 2 * 69, humanfriendly.parse_size('69 MB', correct=True))

def test_format_length(self):
"""Test :func:`humanfriendly.format_length()`."""