
While writing an application to display the sizes of a set of files, I wanted to show a human-readable size rather than just a byte count (which can get rather big).

I developed this short recipe, which extends the format specification mini-language with an S presentation type. S converts the value being displayed into a human-readable size format, i.e. b, Kb, Mb, Gb, etc. It honours the rest of the format specification language (http://docs.python.org/2/library/string.html#format-string-syntax).

It uses a factor of 1024 at each stage

Python, 34 lines
import math

class size( long ):
    """ define a size class to allow custom formatting
        Implements a format specifier of S for the size class - which displays a human readable size in b, Kb, Mb etc
    """
    def __format__(self, fmt):
        if fmt == "" or fmt[-1] != "S":
            # Not an S spec (fmt may be empty): defer to the built-in formatting.
            if fmt and fmt[-1].lower() in ['b','c','d','o','x','n','e','f','g','%']:
                # Numeric format.
                return long(self).__format__(fmt)
            else:
                return str(self).__format__(fmt)

        val, s = float(self), ["b ","Kb","Mb","Gb","Tb","Pb"]
        if val<1:
            # Can't take log(0) in any base.
            i,v = 0,0
        else:
            # Pick the unit index, then step down one unit if the scaled value is 0.5 or less.
            i = int(math.log(val,1024))+1
            v = val / math.pow(1024,i)
            v,i = (v,i) if v > 0.5 else (v*1024,i-1)
        return ("{0:{1}f}"+s[i]).format(v, fmt[:-1])

if __name__ == "__main__":
    # Example usages

    # You can use normal format specifiers as expected - just use S as the presentation type (instead of f, d etc)
    # and cast the integer byte count to type size.

    # Example format specifications
    print "{0:.1f}".format(4386) # output - 4386.0
    print "{0:.1S}".format(size(4386)) # output 4.3Kb
    print "{0:.2S}".format(size(86247))# output 84.23Kb

It tries to honour the rest of the format specifier, but it is naive: for instance, if you specify a field width and alignment, the width and padding are applied to the numeric part only, not to the number and unit together.
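
The limitation described above arises because the width and alignment are handed to the float conversion before the unit is appended. One way around it might look like the minimal sketch below, written in Python 3 syntax (so it subclasses int rather than long). The class name AlignedSize, the _SPEC pattern and the UNITS list are hypothetical names introduced for illustration, the pattern deliberately ignores the sign, '#', '0' and ',' options, and this is not necessarily how the author addressed the issue.

```python
import math
import re

# Hypothetical pattern: splits "[[fill]align][width][.precision]S".
# Sign, '#', '0' and ',' are ignored to keep the sketch short.
_SPEC = re.compile(
    r"^(?:(?P<fill>.)?(?P<align>[<>^]))?(?P<width>\d+)?(?P<prec>\.\d+)?S$"
)

# Unit labels copied from the recipe.
UNITS = ["b ", "Kb", "Mb", "Gb", "Tb", "Pb"]


class AlignedSize(int):
    """Hypothetical variant of the recipe's size class (assumes Python 3)."""

    def __format__(self, fmt):
        m = _SPEC.match(fmt)
        if m is None:
            # Not an S spec: fall back to ordinary integer formatting.
            return format(int(self), fmt)
        val = float(self)
        if val < 1:
            # Can't take log(0) in any base.
            i, v = 0, 0.0
        else:
            i = int(math.log(val, 1024)) + 1
            v = val / math.pow(1024, i)
            v, i = (v, i) if v > 0.5 else (v * 1024, i - 1)
        # Format the number with the precision only, then append the unit ...
        text = format(v, (m.group("prec") or "") + "f") + UNITS[i]
        # ... and apply fill, alignment and width to the combined string.
        outer = (m.group("fill") or "") + (m.group("align") or "") + (m.group("width") or "")
        return format(text, outer)


print(repr("{0:>10.1S}".format(AlignedSize(4386))))  # '     4.3Kb'
print(repr("{0:<10.0S}".format(AlignedSize(23))))    # '23b       '
```

The design choice here is to treat the number and its unit as one string before padding, so the field width covers both.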

Revision 5: slight code change to remove nested if/else blocks; functionality is not impacted. Corrected a known issue: standard numeric conversion types can now be used in the format field for arguments of type size, because the value is coerced to a long before the format is applied.

Comments are more than welcome.


Enrico Giampieri 11 years, 3 months ago

Very nice recipe. I really liked the idea of "creating" a new format specification (even if it would probably be a little more Pythonic to just override the __str__ conversion and let the size be written correctly when represented as str).

I would change the last if-else into a one-liner version:

i = 0 if val<1 else int(math.log(val,1024))+1
v = val / math.pow(1024,i)
v,i = (v,i) if v > 0.5 else (v*1024,(i-1 if i else 0))

The explicit if-else is cleaner; I just love the inlined if-else too much :p
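
As a quick sanity check, the inlined version can be compared against the recipe's explicit branch. The sketch below assumes Python 3, and the function names explicit and compact, along with the sample values, are hypothetical and chosen only for illustration.

```python
import math


def explicit(val):
    """The recipe's branch: returns (scaled value, unit index)."""
    if val < 1:
        # Can't take log(0) in any base.
        return 0, 0
    i = int(math.log(val, 1024)) + 1
    v = val / math.pow(1024, i)
    return (v, i) if v > 0.5 else (v * 1024, i - 1)


def compact(val):
    """The inlined version suggested in this comment."""
    i = 0 if val < 1 else int(math.log(val, 1024)) + 1
    v = val / math.pow(1024, i)
    return (v, i) if v > 0.5 else (v * 1024, (i - 1 if i else 0))


# Hypothetical sample byte counts, including the unit boundaries.
samples = [0, 1, 23, 512, 513, 1023, 1024, 4386, 86247, 1024 ** 3, 5 * 1024 ** 4]
assert all(explicit(float(n)) == compact(float(n)) for n in samples)
```

The two agree for zero and positive values. They differ for negative inputs (the explicit branch returns 0, the inlined one returns the value multiplied by 1024), a case neither version handles meaningfully.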

Tony Flury (author) 11 years, 3 months ago

I must admit I love using the inline if/else as well, so long as I can walk away from the code, come back 30 minutes later and still understand what it means :-)

Glad you like the recipe. I hope it proves either useful or an inspiration for other recipes of your own.

I did think about overriding __str__ as well, but with __str__ you don't have a format string, so the user is "stuck" with a fixed field width and precision conversion, whereas the recipe allows the user to choose whatever they wish. The only thing that does not work is the alignment piece, i.e.

 "{0:<10S}".format(size(23)) = '23        b '

which is less than ideal. I think I know how to fix it, but the current version does not have that fix. I will have to play with it tonight.

Steven D'Aprano 11 years, 3 months ago

I like the idea of using a format string for this. Thank you for the recipe, I will study it in detail.

Unfortunately it has a couple of problems.

Firstly, you are using the symbol for bits ("b") when the quantities are bytes ("B"). You might think this is a trivial issue, but it isn't. Get the units wrong, and people can die, or tens of millions of dollars' worth of spacecraft can disintegrate in the Martian atmosphere.

Confusion of units is, at best, a bloody nuisance, and at worst, an utter disaster. Your code gets the units wrong by a factor of 8.

Likewise, please don't mix SI prefixes (K, M, G, etc.) with binary multiplicative factors. There are two official standards for memory sizes:

  • SI prefixes and powers of 10, e.g. 1 GB = 1000**3 bytes

  • binary prefixes and powers of 2, e.g. 1 GiB = 1024**3 bytes

The mixed notation you use is off by over 7% for GB, and the error gets proportionally larger for larger prefixes. If you're going to support mixed notation, at least warn that it violates both standards and should be discouraged.
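
The size of that discrepancy is simple arithmetic, sketched below in Python 3 syntax. The function names mixed_notation_error and human_size, and the two-decimal output format, are hypothetical choices made for illustration; they are not part of the recipe or of any library mentioned here.

```python
PREFIXES = ["K", "M", "G", "T", "P"]


def mixed_notation_error(power):
    """Relative gap between 1024**power and 1000**power, as a fraction."""
    return 1024.0 ** power / 1000.0 ** power - 1.0


for power, prefix in enumerate(PREFIXES, start=1):
    print("{0}: {1:.1%}".format(prefix, mixed_notation_error(power)))
# K: 2.4%
# M: 4.9%
# G: 7.4%
# T: 10.0%
# P: 12.6%


def human_size(nbytes, binary=True):
    """Format a byte count with binary (KiB, 1024) or SI (kB, 1000) units."""
    if binary:
        base, units = 1024.0, ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    else:
        base, units = 1000.0, ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(nbytes)
    for unit in units[:-1]:
        if abs(value) < base:
            return "{0:.2f} {1}".format(value, unit)
        value /= base
    # Anything larger than the last step stays in the largest unit.
    return "{0:.2f} {1}".format(value, units[-1])


print(human_size(1024 ** 3))                # 1.00 GiB
print(human_size(1024 ** 3, binary=False))  # 1.07 GB
```

Keeping the base and the unit labels together in one branch makes it hard to pair a power-of-1024 factor with an SI label by accident.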

You may also be interested in my library for working with byte-sizes:


Tony Flury (author) 11 years, 3 months ago

Steve, as an engineer by training I certainly get your point about getting your units right. I will correct it, although I do think that the KiB usage is clumsy.

Martin Miller 11 years, 3 months ago

Tony, the NIST document that defines the SI units mentions:

It is important to recognize that the new prefixes for binary multiples are not part of the International System of Units (SI), the modern metric system.

It then goes on and tries to justify it. However, I think you're right about it being awkward, so please make using it optional if possible.

Tony Flury (author) 11 years, 3 months ago

I have created a new fork: http://code.activestate.com/recipes/578323-human-readable-filememory-sizes-v2/ which allows a choice of the SI, IEC or common formats by use of different conversion specifiers. It also fully honours the width and alignment parts of the format spec.

Created by Tony Flury on Sun, 4 Nov 2012 (MIT)