In writing an application to display the file sizes of a set of files, I wanted to provide a human-readable size rather than just displaying a byte count (which can get rather big).
I developed this useful short recipe, which extends the format specifier mini-language to add the S presentation type. It converts the value to be displayed into a known human-readable size format, i.e. b, Kb, Mb, Gb etc. It honours the rest of the format specification language (http://docs.python.org/2/library/string.html#format-string-syntax).
It uses a factor of 1024 at each stage.
    import math

    class size(long):
        """Define a size class to allow custom formatting.

        Implements a format specifier of S for the size class, which
        displays a human readable size in b, Kb, Mb etc.
        """
        def __format__(self, fmt):
            if fmt == "" or fmt[-1] != "S":
                # Not our presentation type: fall back to standard formatting.
                # The emptiness check avoids an IndexError on an empty spec.
                if fmt != "" and fmt[-1].lower() in ['b','c','d','o','x','n','e','f','g','%']:
                    # Numeric format.
                    return long(self).__format__(fmt)
                else:
                    return str(self).__format__(fmt)

            val, s = float(self), ["b ","Kb","Mb","Gb","Tb","Pb"]
            if val < 1:
                # Can't take log(0) in any base.
                i, v = 0, 0
            else:
                i = int(math.log(val, 1024)) + 1
                v = val / math.pow(1024, i)
                # Step back one unit if the scaled value is too small.
                v, i = (v, i) if v > 0.5 else (v*1024, i-1)
            return ("{0:{1}f}" + s[i]).format(v, fmt[:-1])

    if __name__ == "__main__":
        # Example usages
        # You can use normal format specifiers as expected - just use S as the
        # presentation type (instead of f, d etc) and cast the integer byte
        # count to type size.
        # Example format specifications
        print "{0:.1f}".format(4386)        # output - 4386.0
        print "{0:.1S}".format(size(4386))  # output 4.3Kb
        print "{0:.2S}".format(size(86247)) # output 84.23Kb
It tries to honour the rest of the format specifier, but it is naive: for instance, if you specify a field width and alignment, then that width and padding are applied to the numeric part only, not to the unit suffix.
Revision 5 - Slight code change to remove nested if/else blocks; functionality not impacted. Corrected a known issue: you can now use standard numeric conversion types in the format field for arguments of type size, since the value is coerced to a long before the format is applied.
Comments are more than welcome.
Very nice recipe. I really liked the idea of "creating" a new format specification (even if it would probably be a little more pythonic to just override the __str__ conversion and let the size be written correctly when represented as str).
I would change the last if-else into a one-liner version:
The explicit if-else is cleaner, I just love the inlined if-else too much :p
I must admit I love using the inline if/else as well - so long as I can walk away from the code, come back 30 mins later and still understand what it means :-)
Glad you like the recipe, and that it proves either useful or an inspiration for other recipes of your own.
I did think about overriding __str__ as well, but with __str__ you don't have a format string, so the user is "stuck" with a fixed field width and precision conversion, whereas the recipe allows the user to choose whatever they wish. The only thing that does not work is the alignment piece, i.e.
which is less than ideal. I think I know how to fix it, but the current version does not have that fix - will have to play tonight.
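To make the alignment limitation above concrete, here is a minimal sketch in Python 3 (an assumption; the recipe itself targets Python 2). `NaiveSize` mirrors the recipe's approach of handing the whole spec to the number, while `PaddedSize` shows one possible fix: format the number with the precision only, append the unit, then pad the combined text. The names `scale`, `NaiveSize`, `PaddedSize` and the `SPEC` pattern are hypothetical and not part of the recipe, and this is not necessarily the fix the author had in mind. The sketch only understands fill, alignment, width and precision.

```python
import math
import re

UNITS = ["b ", "Kb", "Mb", "Gb", "Tb", "Pb"]  # unit labels as used in the recipe

# Hypothetical pattern: [[fill]align][width][.precision]S
SPEC = re.compile(r"^(?:(.)?([<>^]))?(\d+)?(?:\.(\d+))?S$")


def scale(n):
    """Return (value, unit) using the recipe's factor-of-1024 scaling."""
    val = float(n)
    if val < 1:
        return 0.0, UNITS[0]
    i = int(math.log(val, 1024)) + 1
    v = val / math.pow(1024, i)
    if v <= 0.5:
        v, i = v * 1024, i - 1
    return v, UNITS[i]


class NaiveSize(int):
    """Applies the whole spec to the number, then appends the unit."""

    def __format__(self, fmt):
        if not fmt.endswith("S"):
            return int(self).__format__(fmt)
        v, unit = scale(self)
        return ("{0:{1}f}" + unit).format(v, fmt[:-1])


class PaddedSize(int):
    """Applies precision to the number, then pads number and unit together."""

    def __format__(self, fmt):
        m = SPEC.match(fmt)
        if m is None:
            return int(self).__format__(fmt)
        fill, align, width, precision = m.groups()
        v, unit = scale(self)
        number = format(v, "." + precision + "f") if precision else format(v, "f")
        # Default to right alignment, as for plain numbers.
        pad_spec = (fill or "") + (align or ">") + (width or "")
        return format(number + unit, pad_spec)


print("{0:<10.1S}|".format(NaiveSize(4386)))   # 4.3       Kb|
print("{0:<10.1S}|".format(PaddedSize(4386)))  # 4.3Kb     |
```

With the naive version the padding lands between the number and its unit; padding the combined string keeps them together.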
I like the idea of using a format string for this. Thank you for the recipe, I will study it in detail.
Unfortunately it has a couple of problems.
Firstly, you are using the symbol for bits ("b") when the quantities are bytes ("B"). You might think this is a trivial issue, but it isn't. Get the units wrong, and people can die, or tens of millions of dollars worth of spacecraft can disintegrate in the Martian atmosphere.
Confusion of units is, at best, a bloody nuisance, and at worst, an utter disaster. Your code gets the units wrong by a factor of 8.
Likewise, please don't mix SI prefixes (K, M, G, etc.) with binary multiplicative factors. There are two official standards for memory sizes:
SI prefixes and powers of 10, e.g. 1 GB = 1000**3 bytes
binary prefixes and powers of 2, e.g. 1 GiB = 1024**3 bytes
The mixed notation you use is off by over 7% for GB, and the error gets proportionally larger for larger prefixes. If you're going to support mixed notation, at least warn that it violates both standards and should be discouraged.
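The size of that discrepancy is easy to check. The short Python 3 sketch below (the helper name `binary_excess_percent` is made up for this illustration) computes how much larger each power of 1024 is than the matching power of 1000:

```python
PREFIXES = ["K", "M", "G", "T", "P"]


def binary_excess_percent(power):
    """Percentage by which 1024**power exceeds 1000**power."""
    return (1024.0 ** power / 1000.0 ** power - 1.0) * 100.0


for power, prefix in enumerate(PREFIXES, start=1):
    print("{0}: {1:.2f}%".format(prefix, binary_excess_percent(power)))
# K: 2.40%
# M: 4.86%
# G: 7.37%
# T: 9.95%
# P: 12.59%
```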
You may also be interested in my library for working with byte-sizes:
http://pypi.python.org/pypi/byteformat
Steve, as an engineer by training I certainly get your point about getting your units right. I will correct it, although I do think that the KiB usage is clumsy.
Tony, the NIST document that defines the SI units mentions:
It is important to recognize that the new prefixes for binary multiples are not part of the International System of Units (SI), the modern metric system.
It then goes on and tries to justify it. However, I think you're right about it being awkward, so please make using it optional if possible.
I have created a new fork: http://code.activestate.com/recipes/578323-human-readable-filememory-sizes-v2/ which allows a choice of the SI, IEC or common formats by use of different conversion specifiers. It also fully honours the width and alignment parts of the format spec.
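As a rough illustration of selecting a unit scheme through the conversion specifier, here is a minimal Python 3 sketch. The letters `S`, `I` and `C`, the `SCHEMES` table and the `ByteSize` class are assumptions made for this example and may well differ from what the fork actually uses. It also uses a plain divide-until-small loop rather than the recipe's logarithm, and it does not attempt the width and alignment handling.

```python
# Hypothetical mapping from a presentation-type letter to (base, unit labels).
# These letters are assumptions for this sketch, not the fork's actual API.
SCHEMES = {
    "S": (1000, ["B", "kB", "MB", "GB", "TB", "PB"]),       # SI, powers of 10
    "I": (1024, ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]),  # IEC, powers of 2
    "C": (1024, ["B", "KB", "MB", "GB", "TB", "PB"]),       # common mixed usage
}


class ByteSize(int):
    """Integer byte count whose format spec may end in a scheme letter."""

    def __format__(self, fmt):
        if not fmt or fmt[-1] not in SCHEMES:
            # Anything else is handled as a normal integer format.
            return int(self).__format__(fmt)
        base, units = SCHEMES[fmt[-1]]
        value, index = float(self), 0
        # Divide down until the value fits the current unit.
        while value >= base and index < len(units) - 1:
            value /= base
            index += 1
        return format(value, fmt[:-1] + "f") + units[index]


n = ByteSize(1024 ** 3)
print("{0:.2S} {0:.2I} {0:.2C}".format(n))  # 1.07GB 1.00GiB 1.00GB
```

The same byte count reads as 1.07GB in SI units and 1.00GiB in IEC units, which is the roughly 7% gap raised earlier in the thread.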