
While preparing to create a large data structure, it seemed helpful to give users an idea of how much memory (RAM in particular) a large, multidimensional array would consume. The following code was devised to convert the calculated size into a precise, human-readable format. To convert a number of bytes into that representation, call the "convert" function with the number as its argument. The other functions are public in case anyone else finds them useful.
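
As a rough illustration of the kind of size being estimated, the sketch below multiplies the dimensions of an array by an assumed per-element size. The helper name estimate_array_bytes and the 8-byte element size are assumptions made for this example, not part of the recipe, and real containers add overhead on top of the raw element storage.

```python
def estimate_array_bytes(shape, item_size=8):
    """Return the raw storage needed for an array of the given shape.

    shape     -- sequence of dimension lengths (hypothetical input)
    item_size -- assumed bytes per element; 8 is only an illustrative default
    """
    total = item_size
    for length in shape:
        total *= length
    return total

# A 1000 x 1000 x 100 array of 8-byte elements needs 800,000,000 bytes,
# which is the kind of number one would then pass to convert().
print(estimate_array_bytes((1000, 1000, 100)))
```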

Python, 56 lines
"""Module for byte-to-string conversion.

This module provides several utility functions along with another
function that can convert numbers into byte-size representations."""

################################################################################

__version__ = "$Revision: 2 $"
__date__ = "8 October 2009"
__author__ = "Stephen Chappell <Noctis.Skytower@gmail.com>"
__credits__ = """\
T. Hansen, for his encouraging example as an excellent programmer.
S. Spencer, for reminding me to strive for quality in all things.
J. Sparks, for helping to reignite a dedication to writing code."""

################################################################################

import sys as _sys

################################################################################

def convert(number):
    "Convert bytes into human-readable representation."
    assert 0 < number < 1 << 110, 'number out of range'
    ordered = reversed(tuple(format_bytes(partition_number(number, 1 << 10))))
    cleaned = ', '.join(item for item in ordered if item[0] != '0')
    return cleaned

################################################################################

def partition_number(number, base):
    "Continually divide number by base until zero."
    div, mod = divmod(number, base)
    yield mod
    while div:
        div, mod = divmod(div, base)
        yield mod

def format_bytes(parts):
    "Format partitioned bytes into human-readable strings."
    for power, number in enumerate(parts):
        yield '{} {}'.format(number, format_suffix(power, number))

def format_suffix(power, number):
    "Compute the suffix for a certain power of bytes."
    return (PREFIX[power] + 'byte').capitalize() + 's'[number == 1:]

################################################################################

PREFIX = ' kilo mega giga tera peta exa zetta yotta bronto geop'.split(' ')

################################################################################

if __name__ == '__main__':
    _sys.stdout.write('Content-Type: text/plain\n\n')
    _sys.stdout.write(open(_sys.argv[0]).read())
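
To make the flow through convert easier to follow, here is a small standalone trace of its two central ideas: repeated divmod by 1024, and the 's'[number == 1:] slice that drops the plural "s" when the count is exactly one. The names split_by_1024 and plural are made up for this sketch; they mirror, but are not, the recipe's functions.

```python
def split_by_1024(number):
    """Return the base-1024 digits of number, least significant first."""
    parts = []
    while True:
        number, remainder = divmod(number, 1024)
        parts.append(remainder)
        if not number:
            return parts

def plural(word, count):
    """Append 's' unless count is exactly 1.

    's'[True:] is 's'[1:], the empty string; 's'[False:] is 's'[0:], 's'.
    """
    return word + 's'[count == 1:]

print(split_by_1024(10000000))   # [640, 549, 9]
print(plural('Megabyte', 9))     # Megabytes
print(plural('Megabyte', 1))     # Megabyte
```

Read from the most significant end, those parts are what convert(10000000) reports: 9 Megabytes, 549 Kilobytes, 640 Bytes.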

5 comments

Gabriel Genellina 14 years, 5 months ago

Those last three lines should not be there...

Stephen Chappell (author) 14 years, 5 months ago

The last three lines are for servers that run *.py files as CGI scripts.

If this code is running on a server, you can examine it as open source.

Tyler Mitchell 12 years, 8 months ago

Trying this for the first time, am I forgetting something?

>>> convert(10000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "byte_to_string.py", line 25, in convert
    ordered = reversed(tuple(format_bytes(partition_number(number, 1 << 10))))
  File "byte_to_string.py", line 42, in format_bytes
    yield '{} {}'.format(number, format_suffix(power, number))
ValueError: zero length field name in format

Stephen Chappell (author) 12 years, 8 months ago

If you are not using the latest version of Python (3.2.1), try changing yield '{} {}'.format(number, format_suffix(power, number)) to yield '{0} {1}'.format(number, format_suffix(power, number)) instead.
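
For reference, automatic field numbering (a bare '{}') was added to str.format in Python 2.7 and 3.1; earlier versions that have str.format need explicit indices. A minimal check that the two spellings agree on a version supporting both (the sample values are arbitrary):

```python
number, suffix = 549, 'Kilobytes'

# Explicit indices work on every Python version that has str.format.
explicit = '{0} {1}'.format(number, suffix)

# Auto-numbered fields raise "zero length field name in format" on
# interpreters older than 2.7 / 3.1.
automatic = '{} {}'.format(number, suffix)

print(explicit)               # 549 Kilobytes
print(explicit == automatic)  # True
```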

Alexander 11 years, 11 months ago

another variant

__author__ = 'egorov'

INT64_BITS_COUNT = 64
BINARY_THOUSAND = 1024

UNITS = [
    "bytes",
    "Kb",
    "Mb",
    "Gb",
    "Tb",
    "Pb",
    "Eb",
    "Zb",
    "Yb"
]

def ilog(x):
    """Calculates integer logarithm

    Args:
        x: int, the number to calculate logarithm of.
    """
    n = INT64_BITS_COUNT
    c = INT64_BITS_COUNT // 2  # floor division keeps c an int on Python 3
    while True:
        y = x >> c
        if y:
            n -= c
            x = y
        c >>= 1
        if not c:
            break
    n -= x >> (INT64_BITS_COUNT - 1)
    return (INT64_BITS_COUNT - 1) - (n - x)

def normalize(bytes):
    if not bytes:
        return 0, bytes
    # Floor division so that units can index UNITS on Python 3 as well.
    units = ilog(bytes) // ilog(BINARY_THOUSAND)
    if not units:
        value = bytes
    else:
        value = float(bytes) / pow(BINARY_THOUSAND, units)
    return units, value

def formatToHumanSize(bytes, precision=2):
    units, value = normalize(bytes)
    f = '{0:.{2}f} {1}'
    if not units:
        f = '{0} {1}'
    return f.format(value, UNITS[units], precision)
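
On Python 2.7 and 3.1 or later, the integer logarithm can also be obtained from the built-in int.bit_length(), since x.bit_length() - 1 equals floor(log2(x)) for positive x. The sketch below uses that in place of ilog; the function name human_size is hypothetical, the unit labels are copied from the variant above, and it is offered as an alternative under those assumptions rather than as a drop-in replacement.

```python
UNITS = ["bytes", "Kb", "Mb", "Gb", "Tb", "Pb", "Eb", "Zb", "Yb"]

def human_size(size, precision=2):
    """Format a non-negative byte count with a binary (1024-based) unit."""
    if size < 1024:
        # Small values are shown exactly, without a fractional part.
        return '{0} {1}'.format(size, UNITS[0])
    # floor(log2(size)) // 10 is the power of 1024; clamp to the last unit.
    power = min((size.bit_length() - 1) // 10, len(UNITS) - 1)
    value = float(size) / (1024 ** power)
    return '{0:.{2}f} {1}'.format(value, UNITS[power], precision)

print(human_size(1023))      # 1023 bytes
print(human_size(1024))      # 1.00 Kb
print(human_size(10000000))  # 9.54 Mb
```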