Convert an integer number of bytes to a string representation. Example: 1024 -> '1.0 kB'
Based quite heavily on http://mail.python.org/pipermail/python-list/2008-August/1171178.html
```python
from __future__ import division  # true division on Python 2; a no-op on Python 3
import doctest


def humanize_bytes(bytes, precision=1):
    """Return a humanized string representation of a number of bytes.

    Assumes `from __future__ import division`.

    >>> humanize_bytes(1)
    '1 byte'
    >>> humanize_bytes(1024)
    '1.0 kB'
    >>> humanize_bytes(1024*123)
    '123.0 kB'
    >>> humanize_bytes(1024*12342)
    '12.1 MB'
    >>> humanize_bytes(1024*12342,2)
    '12.05 MB'
    >>> humanize_bytes(1024*1234,2)
    '1.21 MB'
    >>> humanize_bytes(1024*1234*1111,2)
    '1.31 GB'
    >>> humanize_bytes(1024*1234*1111,1)
    '1.3 GB'
    """
    # (factor, suffix) pairs, largest first: the loop stops at the first
    # factor that does not exceed `bytes`. If none matches (e.g. 0), the
    # loop ends on the last pair, (1, 'bytes').
    abbrevs = (
        (1 << 50, 'PB'),
        (1 << 40, 'TB'),
        (1 << 30, 'GB'),
        (1 << 20, 'MB'),
        (1 << 10, 'kB'),
        (1, 'bytes')
    )
    if bytes == 1:
        return '1 byte'
    for factor, suffix in abbrevs:
        if bytes >= factor:
            break
    # '%.*f' takes the number of decimal places from the argument tuple.
    return '%.*f %s' % (precision, bytes / factor, suffix)


if __name__ == '__main__':
    doctest.testmod()
```
I use this frequently to convert the result of os.path.getsize() into a more meaningful form.
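A minimal sketch of that use, assuming nothing beyond the standard library: the recipe's function is repeated in condensed form so the example runs on its own, and `describe_file` is a hypothetical helper, not part of the recipe.

```python
from __future__ import division
import os
import tempfile


def humanize_bytes(bytes, precision=1):
    # Condensed copy of the recipe above so this example is self-contained.
    abbrevs = ((1 << 50, 'PB'), (1 << 40, 'TB'), (1 << 30, 'GB'),
               (1 << 20, 'MB'), (1 << 10, 'kB'), (1, 'bytes'))
    if bytes == 1:
        return '1 byte'
    for factor, suffix in abbrevs:
        if bytes >= factor:
            break
    return '%.*f %s' % (precision, bytes / factor, suffix)


def describe_file(path, precision=1):
    """Hypothetical helper: return 'path: <humanized size>' for an existing file."""
    return '%s: %s' % (path, humanize_bytes(os.path.getsize(path), precision))


if __name__ == '__main__':
    # Write a 5000-byte scratch file, report its size, then clean up.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'\0' * 5000)
    try:
        print(describe_file(tmp.name))  # size part: '4.9 kB'
    finally:
        os.remove(tmp.name)
```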
Wouldn't the correct metric abbreviation be calculated with base 1000 instead of 1024? Although widely used, kB, MB, etc. with base 1024 are not correct. For base 1024, binary prefixes like Ki, Mi, etc. should be used (see WP:Binary Prefix).
You are correct, Michael, that binary prefixes like Ki, Mi, etc. would be more correct. However, as the Wikipedia article you cite mentions, adoption of binary prefixes has been spotty. I have removed the reference to "metric abbreviation" from the description of this recipe.
Beyond that, I'll fall back on "practicality beats purity" and the fact that this recipe can easily be forked by anyone who wants to change it to use binary prefixes.
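A minimal sketch of such a fork, assuming a hypothetical `binary` keyword: by default it uses base 1000 with SI prefixes (kB, MB, ...), and with `binary=True` it uses base 1024 with IEC prefixes (KiB, MiB, ...). This is an illustration, not part of the recipe.

```python
from __future__ import division


def humanize_bytes(num_bytes, precision=1, binary=False):
    """Humanize a byte count with SI (base 1000) or IEC (base 1024) prefixes.

    `binary` is a hypothetical parameter added for this sketch.
    """
    if num_bytes == 1:
        return '1 byte'
    base = 1024 if binary else 1000
    if binary:
        prefixes = ('Ki', 'Mi', 'Gi', 'Ti', 'Pi')
    else:
        prefixes = ('k', 'M', 'G', 'T', 'P')
    # Try the largest power first; the first factor not exceeding num_bytes wins.
    for power in range(len(prefixes), 0, -1):
        factor = base ** power
        if num_bytes >= factor:
            return '%.*f %sB' % (precision, num_bytes / factor, prefixes[power - 1])
    # Below one kilo(bi)byte: report a whole number of bytes.
    return '%d bytes' % num_bytes
```

One design choice differs from the recipe: counts below the first prefix are printed as whole numbers ('999 bytes') rather than with decimals ('999.0 bytes').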
You have a lot of code to do string formatting. Lines 40-47 could be changed into:
or
Rogier:
Your second suggestion is only valid for Python 2.6 (when str.format() was introduced) or later. Your first suggestion is definitely more compact, but it does impose a slightly different meaning of precision. With your version the doctests give:
Probably not a big deal, considering that the goal of the recipe is humanized output, so I have updated the recipe to use your first suggestion.
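For comparison, the recipe's `'%.*f %s'` expression can be written with `str.format()` (Python 2.6 or later) by nesting the precision as a replacement field inside the format spec. This only illustrates the technique; it is not a reconstruction of Rogier's snippet, and `format_size` is a hypothetical name.

```python
from __future__ import division


def format_size(value, precision, suffix):
    """Hypothetical helper: same output as '%.*f %s' % (precision, value, suffix)."""
    # The nested {1} in the format spec is replaced by `precision` before
    # `value` is formatted; explicit indices keep this valid on Python 2.6.
    return '{0:.{1}f} {2}'.format(value, precision, suffix)


if __name__ == '__main__':
    for value, precision, suffix in [(12.0527, 2, 'MB'), (1.0, 1, 'kB')]:
        assert format_size(value, precision, suffix) == '%.*f %s' % (precision, value, suffix)
    print(format_size(1403877376 / (1 << 30), 2, 'GB'))  # 1.31 GB
```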
If you want an exact and concise string output for your number of bytes, there is also the following option that can be used:
The full code from the previous excerpt comes from Recipe 576924.
I wrote a compact function to do this (up to terabytes) and thought I would share it:
Similar recipe: http://code.activestate.com/recipes/578019-bytes-to-human-human-to-bytes-converter/