Humanized representation of a number of bytes

Convert an integer number of bytes to a humanized string representation. Example: 1024 -> '1.0 kB'

Based quite heavily on http://mail.python.org/pipermail/python-list/2008-August/1171178.html

from __future__ import division
import doctest


def humanize_bytes(bytes, precision=1):
    """Return a humanized string representation of a number of bytes.

    Assumes `from __future__ import division`.

    >>> humanize_bytes(1)
    '1 byte'
    >>> humanize_bytes(1024)
    '1.0 kB'
    >>> humanize_bytes(1024*123)
    '123.0 kB'
    >>> humanize_bytes(1024*12342)
    '12.1 MB'
    >>> humanize_bytes(1024*12342,2)
    '12.05 MB'
    >>> humanize_bytes(1024*1234,2)
    '1.21 MB'
    >>> humanize_bytes(1024*1234*1111,2)
    '1.31 GB'
    >>> humanize_bytes(1024*1234*1111,1)
    '1.3 GB'
    """
    abbrevs = (
        (1<<50, 'PB'),
        (1<<40, 'TB'),
        (1<<30, 'GB'),
        (1<<20, 'MB'),
        (1<<10, 'kB'),
        (1, 'bytes')
    )
    if bytes == 1:
        return '1 byte'
    for factor, suffix in abbrevs:
        if bytes >= factor:
            break
    return '%.*f %s' % (precision, bytes / factor, suffix)


if __name__ == '__main__':
    doctest.testmod()

      

I use this frequently to convert the result of os.path.getsize() into a more meaningful form.
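
As a rough illustration of that os.path.getsize() use, the sketch below writes a temporary file of known size and humanizes the result. The compact humanize_bytes here only restates the recipe so the block runs on its own (it uses float() instead of the __future__ import); humanized_file_size, the temporary file, and its 2048-byte size are assumptions made for the example.

```python
import os
import tempfile


def humanize_bytes(num_bytes, precision=1):
    """Compact restatement of the recipe above (base 1024)."""
    abbrevs = (
        (1 << 50, 'PB'),
        (1 << 40, 'TB'),
        (1 << 30, 'GB'),
        (1 << 20, 'MB'),
        (1 << 10, 'kB'),
        (1, 'bytes'),
    )
    if num_bytes == 1:
        return '1 byte'
    for factor, suffix in abbrevs:
        if num_bytes >= factor:
            break
    return '%.*f %s' % (precision, num_bytes / float(factor), suffix)


def humanized_file_size(path, precision=1):
    """Hypothetical helper: size of the file at `path` as a humanized string."""
    return humanize_bytes(os.path.getsize(path), precision)


# Demo on a throwaway file: write 2048 bytes, measure, then clean up.
handle, demo_path = tempfile.mkstemp()
try:
    os.write(handle, b'\0' * 2048)
finally:
    os.close(handle)
demo_size = humanized_file_size(demo_path)
os.remove(demo_path)
print(demo_size)  # 2.0 kB
```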

Tags: bytes, humanize

8 comments

Michael Grünewald 14 years, 1 month ago # | flag

Wouldn't the correct metric abbreviation be calculated with base 1000 instead of 1024? Although widely used, using kB, MB, etc. with base 1024 is not correct. For base 1024 binary prefixes like ki, Mi, etc. should be used (see WP:Binary Prefix).

Doug Latornell (author) 14 years, 1 month ago # | flag

You are correct, Michael, that binary prefixes like ki, Mi, etc. would be more correct. However, as the Wikipedia article you cite mentions, widespread adoption of binary prefixes has been spotty. I have removed the reference to "metric abbreviation" from the description of this recipe.

Beyond that, I'll fall back on "practicality beats purity" and the fact that this recipe can easily be forked by anyone who wants to change it to use binary prefixes.
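
For anyone who does want such a fork, here is one possible sketch. The function name humanize_bytes_units and the binary flag are illustrative assumptions, not part of the recipe: with binary=True it uses base 1024 with IEC suffixes (KiB, MiB, ...), and with binary=False it uses base 1000 with SI suffixes (kB, MB, ...).

```python
def humanize_bytes_units(num_bytes, precision=1, binary=True):
    """Humanize a byte count with IEC (base 1024) or SI (base 1000) prefixes.

    Hypothetical variant of the recipe; the name and the `binary` flag
    are assumptions made for this example.
    """
    if binary:
        base, suffixes = 1024, ('bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB')
    else:
        base, suffixes = 1000, ('bytes', 'kB', 'MB', 'GB', 'TB', 'PB')
    if num_bytes == 1:
        return '1 byte'
    # Walk from the largest unit down and stop at the first one that fits.
    # If nothing fits (num_bytes == 0), the loop ends on plain bytes.
    for power in range(len(suffixes) - 1, -1, -1):
        factor = base ** power
        if num_bytes >= factor:
            break
    return '%.*f %s' % (precision, num_bytes / float(factor), suffixes[power])


print(humanize_bytes_units(1024))                # 1.0 KiB
print(humanize_bytes_units(1000, binary=False))  # 1.0 kB
```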

Rogier Steehouder 14 years, 1 month ago # | flag

You have a lot of code to do string formatting. Lines 40-47 could be changed into:

return '%.*f %s' % (precision, bytes / factor, suffix)

return '{0:.{1}f} {2}'.format(bytes / factor, precision, suffix)
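
As a quick check, assuming Python 2.6 or later, the two forms can be compared side by side. The values below are made up for the comparison (1024*1234 bytes expressed in MB); both styles take the precision as a run-time argument and should give the same string.

```python
precision = 2
value = 1024 * 1234 / float(1 << 20)  # about 1.205
suffix = 'MB'

# printf-style: '*' pulls the precision from the argument tuple.
percent_style = '%.*f %s' % (precision, value, suffix)

# str.format: the nested {1} field supplies the precision.
format_style = '{0:.{1}f} {2}'.format(value, precision, suffix)

print(percent_style)  # 1.21 MB
print(format_style)   # 1.21 MB
```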

Doug Latornell (author) 14 years, 1 month ago # | flag

Rogier:

Your second suggestion is only valid for Python 2.6 or later, since that is when str.format() was introduced.

Your first suggestion is definitely more compact, but it does impose a slightly different meaning of precision. With your version the doctests give:

$ python humanize_bytes.py
**********************************************************************
File "humanize_bytes.py", line 12, in __main__.humanize_bytes
Failed example:
    humanize_bytes(1024)
Expected:
    '1 kB'
Got:
    '1.0 kB'
**********************************************************************
File "humanize_bytes.py", line 14, in __main__.humanize_bytes
Failed example:
    humanize_bytes(1024*123)
Expected:
    '123 kB'
Got:
    '123.0 kB'
**********************************************************************
File "humanize_bytes.py", line 16, in __main__.humanize_bytes
Failed example:
    humanize_bytes(1024*12342)
Expected:
    '12 MB'
Got:
    '12.1 MB'
**********************************************************************
File "humanize_bytes.py", line 20, in __main__.humanize_bytes
Failed example:
    humanize_bytes(1024*1234,2)
Expected:
    '1.20 MB'
Got:
    '1.21 MB'
**********************************************************************
File "humanize_bytes.py", line 22, in __main__.humanize_bytes
Failed example:
    humanize_bytes(1024*1234*1111,2)
Expected:
    '1.30 GB'
Got:
    '1.31 GB'
**********************************************************************
1 items had failures:
   5 of   8 in __main__.humanize_bytes
***Test Failed*** 5 failures.

Probably not a big deal, considering that the goal of the recipe is humanized output, so I have updated the recipe to use your first suggestion.

Stephen Chappell 14 years, 1 month ago # | flag

If you want an exact and concise string output for your number of bytes, there is also the following option that can be used:

def convert(number):
    "Convert bytes into human-readable representation."
    assert 0 < number < 1 << 110, 'Number Out Of Range'
    ordered = reversed(tuple(format_bytes(partition_number(number, 1 << 10))))
    cleaned = ', '.join(item for item in ordered if item[0] != '0')
    return cleaned

################################################################################

def partition_number(number, base):
    "Continually divide number by base until zero."
    div, mod = divmod(number, base)
    yield mod
    while div:
        div, mod = divmod(div, base)
        yield mod

def format_bytes(parts):
    "Format partitioned bytes into human-readable strings."
    for power, number in enumerate(parts):
        yield '{} {}'.format(number, format_suffix(power, number))

def format_suffix(power, number):
    "Compute the suffix for a certain power of bytes."
    return (PREFIX[power] + 'byte').capitalize() + ('s' if number != 1 else '')

################################################################################

PREFIX = ' kilo mega giga tera peta exa zetta yotta bronto geop'.split(' ')

Stephen Chappell 14 years, 1 month ago # | flag

The full code from the previous excerpt comes from Recipe 576924.
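
The excerpt rests on repeated divmod by 1 << 10, which splits a byte count into base-1024 "digits". A minimal sketch of just that step, with a made-up sample value, may make the idea easier to see; partition_number is copied from the excerpt and only its docstring is expanded.

```python
def partition_number(number, base):
    """Yield the base-`base` digits of `number`, least significant first."""
    div, mod = divmod(number, base)
    yield mod
    while div:
        div, mod = divmod(div, base)
        yield mod


# Sample value chosen for the example: 1 gigabyte + 5 kilobytes + 3 bytes.
sample = (1 << 30) + 5 * (1 << 10) + 3
parts = list(partition_number(sample, 1 << 10))
print(parts)  # [3, 5, 0, 1]: bytes, kilobytes, megabytes, gigabytes
```

Each position in the list corresponds to one entry of PREFIX, which is why the excerpt can pair them up with enumerate() and drop the zero entries.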

T 12 years, 5 months ago # | flag

I wrote a compact function to do this (up to terabytes) and thought I would share:

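def GetHumanReadable(size,precision=2):
    suffixes=['B','KB','MB','GB','TB']
    suffixIndex = 0
    # stop at the last suffix so very large sizes cannot index past 'TB'
    while size >= 1024 and suffixIndex < len(suffixes) - 1:
        suffixIndex += 1 #increment the index of the suffix
        size = size/1024.0 #apply the division
    # the suffix is a string, so it is formatted with %s rather than %d
    return "%.*f %s"%(precision,size,suffixes[suffixIndex])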
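
A loop-free alternative is also possible for integer sizes. The sketch below is not from the comment above: it assumes int.bit_length() is available (Python 2.7 / 3.1 or later) and uses the hypothetical name human_readable. Every 10 bits of the size correspond to one step up the suffix list.

```python
def human_readable(size, precision=2):
    """Hypothetical loop-free variant: pick the suffix from the bit length."""
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB']
    if size > 0:
        # (bit_length - 1) // 10 is floor(log2(size) / 10), i.e. how many
        # whole factors of 1024 fit; clamp it to the last suffix.
        index = min((int(size).bit_length() - 1) // 10, len(suffixes) - 1)
    else:
        index = 0
    scaled = size / float(1 << (10 * index))
    return '%.*f %s' % (precision, scaled, suffixes[index])


print(human_readable(1536, 1))  # 1.5 KB
```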

Giampaolo Rodolà 10 years, 1 month ago # | flag

Humanized representation of a number of bytes (Python recipe) by Doug Latornell
ActiveState Code (http://code.activestate.com/recipes/577081/)
