
Convert an integer number of bytes to a human-readable string representation. Example: 1024 -> 1.0 kB

Based quite heavily on http://mail.python.org/pipermail/python-list/2008-August/1171178.html

Python, 44 lines
```python
from __future__ import division

import doctest

def humanize_bytes(bytes, precision=1):
    """Return a humanized string representation of a number of bytes.

    Assumes `from __future__ import division`.

    >>> humanize_bytes(1)
    '1 byte'
    >>> humanize_bytes(1024)
    '1.0 kB'
    >>> humanize_bytes(1024*123)
    '123.0 kB'
    >>> humanize_bytes(1024*12342)
    '12.1 MB'
    >>> humanize_bytes(1024*12342,2)
    '12.05 MB'
    >>> humanize_bytes(1024*1234,2)
    '1.21 MB'
    >>> humanize_bytes(1024*1234*1111,2)
    '1.31 GB'
    >>> humanize_bytes(1024*1234*1111,1)
    '1.3 GB'
    """
    abbrevs = (
        (1<<50L, 'PB'),
        (1<<40L, 'TB'),
        (1<<30L, 'GB'),
        (1<<20L, 'MB'),
        (1<<10L, 'kB'),
        (1, 'bytes')
    )
    if bytes == 1:
        return '1 byte'
    for factor, suffix in abbrevs:
        if bytes >= factor:
            break
    return '%.*f %s' % (precision, bytes / factor, suffix)


if __name__ == '__main__':
    doctest.testmod()
```

I use this frequently to convert the result of `os.path.getsize()` into a more meaningful form.
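
A minimal sketch of that usage, assuming Python 3 (where `/` is already true division and the `L` suffix no longer exists). The function here is a condensed port of the recipe rather than the original, `humanized_file_size` is a hypothetical helper name, and the scratch file exists only to keep the example self-contained.

```python
import os
import tempfile


def humanize_bytes(num_bytes, precision=1):
    """Condensed Python 3 port of the recipe (an assumption, not the original)."""
    abbrevs = (
        (1 << 50, 'PB'),
        (1 << 40, 'TB'),
        (1 << 30, 'GB'),
        (1 << 20, 'MB'),
        (1 << 10, 'kB'),
        (1, 'bytes'),
    )
    if num_bytes == 1:
        return '1 byte'
    # Pick the largest unit that does not exceed the value.
    for factor, suffix in abbrevs:
        if num_bytes >= factor:
            break
    return '%.*f %s' % (precision, num_bytes / factor, suffix)


def humanized_file_size(path, precision=1):
    """Hypothetical helper: size of the file at `path` as a humanized string."""
    return humanize_bytes(os.path.getsize(path), precision)


# Write a 3000-byte scratch file so the example does not depend on a real path.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b'x' * 3000)
    scratch_path = scratch.name

try:
    print(humanized_file_size(scratch_path))  # 2.9 kB
finally:
    os.remove(scratch_path)
```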

Michael Grünewald 14 years, 1 month ago

Wouldn't the correct metric abbreviation be calculated with base 1000 instead of 1024? Although widely used, kB, MB, etc. with base 1024 are not correct. For base 1024, binary prefixes like Ki, Mi, etc. should be used (see WP:Binary Prefix).

Doug Latornell (author) 14 years, 1 month ago

You are correct, Michael, that binary prefixes like Ki, Mi, etc. would be more correct. However, as the Wikipedia article you cite mentions, adoption of binary prefixes has been spotty. I have removed the reference to "metric abbreviation" from the description of this recipe.

Beyond that, I'll fall back on "practicality beats purity" and the fact that this recipe can easily be forked by anyone who wants to change it to use binary prefixes.
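
As a sketch of what such a fork might look like, assuming Python 3: the hypothetical `humanize_bytes_iec` below keeps the recipe's base-1024 factors but labels them with IEC binary prefixes (KiB, MiB, ...), while `humanize_bytes_si` uses base 1000 with the metric prefixes. Neither name, nor the shared `_humanize` helper, is part of the recipe.

```python
def _humanize(num_bytes, precision, base, suffixes):
    """Hypothetical shared helper: scale by the largest fitting power of `base`."""
    if num_bytes == 1:
        return '1 byte'
    # Walk from the largest unit down to plain bytes.
    for power in range(len(suffixes) - 1, -1, -1):
        factor = base ** power
        if num_bytes >= factor:
            break
    return '%.*f %s' % (precision, num_bytes / factor, suffixes[power])


def humanize_bytes_iec(num_bytes, precision=1):
    """Base 1024 with IEC binary prefixes."""
    return _humanize(num_bytes, precision, 1024,
                     ('bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB'))


def humanize_bytes_si(num_bytes, precision=1):
    """Base 1000 with metric (SI) prefixes."""
    return _humanize(num_bytes, precision, 1000,
                     ('bytes', 'kB', 'MB', 'GB', 'TB', 'PB'))


print(humanize_bytes_iec(1024 * 12342))  # 12.1 MiB
print(humanize_bytes_si(1024 * 12342))   # 12.6 MB
```

The same byte count reads differently under the two conventions, which is the ambiguity the binary prefixes were introduced to remove.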

Rogier Steehouder 14 years, 1 month ago

You have a lot of code to do string formatting. Lines 40-47 could be changed into:

``````return '%.*f %s' % (precision, bytes / factor, suffix)
``````

or

``````return '{0:.{1}f} {2}'.format(bytes / factor, precision, suffix)
``````
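
Both forms take the number of decimal places from an argument instead of hard-coding it. A quick check, assuming Python 3 and sample values chosen only for illustration:

```python
precision = 2
value = 1024 * 1234 / (1 << 20)   # 1.205078125
suffix = 'MB'

# printf-style: the `*` pulls the precision from the argument tuple.
percent_style = '%.*f %s' % (precision, value, suffix)

# str.format: the nested `{1}` field supplies the precision.
format_style = '{0:.{1}f} {2}'.format(value, precision, suffix)

print(percent_style)  # 1.21 MB
print(format_style)   # 1.21 MB
```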
Doug Latornell (author) 14 years, 1 month ago

Rogier:

Your second suggestion is only valid for Python 2.6 or later, which is when `str.format()` was introduced.

Your first suggestion is definitely more compact, but it does impose a slightly different meaning of precision. With your version the doctests give:

``````$ python humanize_bytes.py
**********************************************************************
File "humanize_bytes.py", line 12, in __main__.humanize_bytes
Failed example:
humanize_bytes(1024)
Expected:
'1 kB'
Got:
'1.0 kB'
**********************************************************************
File "humanize_bytes.py", line 14, in __main__.humanize_bytes
Failed example:
humanize_bytes(1024*123)
Expected:
'123 kB'
Got:
'123.0 kB'
**********************************************************************
File "humanize_bytes.py", line 16, in __main__.humanize_bytes
Failed example:
humanize_bytes(1024*12342)
Expected:
'12 MB'
Got:
'12.1 MB'
**********************************************************************
File "humanize_bytes.py", line 20, in __main__.humanize_bytes
Failed example:
humanize_bytes(1024*1234,2)
Expected:
'1.20 MB'
Got:
'1.21 MB'
**********************************************************************
File "humanize_bytes.py", line 22, in __main__.humanize_bytes
Failed example:
humanize_bytes(1024*1234*1111,2)
Expected:
'1.30 GB'
Got:
'1.31 GB'
**********************************************************************
5 of   8 in __main__.humanize_bytes
***Test Failed*** 5 failures.
``````

Probably not a big deal, considering that the goal of the recipe is humanized output, so I have updated the recipe to use your first suggestion.

Stephen Chappell 14 years, 1 month ago

If you want an exact and concise string output for your number of bytes, there is also the following option that can be used:

``````def convert(number):
    assert 0 < number < 1 << 110, 'Number Out Of Range'
    ordered = reversed(tuple(format_bytes(partition_number(number, 1 << 10))))
    cleaned = ', '.join(item for item in ordered if item[0] != '0')
    return cleaned

################################################################################

def partition_number(number, base):
    "Continually divide number by base until zero."
    div, mod = divmod(number, base)
    yield mod
    while div:
        div, mod = divmod(div, base)
        yield mod

def format_bytes(parts):
    "Format partitioned bytes into human-readable strings."
    for power, number in enumerate(parts):
        yield '{} {}'.format(number, format_suffix(power, number))

def format_suffix(power, number):
    "Compute the suffix for a certain power of bytes."
    return (PREFIX[power] + 'byte').capitalize() + ('s' if number != 1 else '')

################################################################################

PREFIX = ' kilo mega giga tera peta exa zetta yotta bronto geop'.split(' ')
``````
Stephen Chappell 14 years, 1 month ago

The full code from the previous excerpt comes from Recipe 576924.
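
To make the core idea of that excerpt easier to see, here is a reduced sketch, assuming Python 3: repeated `divmod` by 1024 splits a byte count into per-unit "digits", least significant first. The `partition` name and the sample value are illustrative only.

```python
def partition(number, base=1 << 10):
    """Yield the base-1024 'digits' of number, least significant first."""
    div, mod = divmod(number, base)
    yield mod
    while div:
        div, mod = divmod(div, base)
        yield mod


# 3 * 2**20 + 2 * 2**10 + 1 bytes
sample = 3 * (1 << 20) + 2 * (1 << 10) + 1
print(list(partition(sample)))  # [1, 2, 3]
```

Each digit is then paired with its unit name and the list is reversed, so this sample would render as `3 Megabytes, 2 Kilobytes, 1 Byte` with the formatting functions from the excerpt.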

T 12 years, 5 months ago

I wrote a compact function to do this (up to terabytes) and thought I would share:

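``````def GetHumanReadable(size, precision=2):
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB']
    suffixIndex = 0
    # Stop at the last suffix so very large sizes do not index past 'TB'.
    while size >= 1024 and suffixIndex < len(suffixes) - 1:
        suffixIndex += 1      # move to the next larger unit
        size = size / 1024.0  # scale the value down to that unit
    # The suffix is a string, so it needs %s rather than %d.
    return "%.*f %s" % (precision, size, suffixes[suffixIndex])
``````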
Giampaolo Rodolà 10 years, 1 month ago
 Created by Doug Latornell on Tue, 2 Mar 2010 (MIT)
