
While writing an application to display the sizes of a set of files, I wanted to provide a human-readable size rather than just a byte count (which can get rather big).

I developed this short recipe, which extends the format specification mini-language with new two-character presentation types (em, eM, sm, sM, cm, cM). These convert the value to be displayed into a known human-readable size format, e.g. b, Kb, Mb, Gb, B, KB and so on. It honours the rest of the format specification language (http://docs.python.org/2/library/string.html#format-string-syntax).

It uses a factor of 1024 for the IEC and common formats, and a factor of 1000 for SI units.
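To make the two factors concrete, here is a small sketch that scales the byte count used in the examples by one step in each base. It is written for Python 3, and the helper name `scale` is hypothetical, not part of the recipe.

```python
def scale(nbytes, base):
    """Express nbytes in units of `base` bytes (one step up the prefix scale)."""
    return nbytes / float(base)

# The same byte count, one prefix step up in each base.
print("{0:.2f}".format(scale(86247, 1024)))  # IEC and common formats: 84.23
print("{0:.2f}".format(scale(86247, 1000)))  # SI format: 86.25
```

The difference between the two results grows with each further prefix step, which is why the suffix needs to say which base was used.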

Python, 58 lines
import math
import string

class size( long ):
    """ Define a size class to allow custom formatting.
        Format specifiers supported:
            em : formats the size as bits in IEC format, i.e. 1024 bits (128 bytes) = 1Kib
            eM : formats the size as bytes in IEC format, i.e. 1024 bytes = 1KiB
            sm : formats the size as bits in SI format, i.e. 1000 bits = 1Kb
            sM : formats the size as bytes in SI format, i.e. 1000 bytes = 1KB
            cm : formats the size as bits in the common format, i.e. 1024 bits (128 bytes) = 1Kb
            cM : formats the size as bytes in the common format, i.e. 1024 bytes = 1KB
    """
    def __format__(self, fmt):
        # An empty format, or one that is not a special size format, falls
        # back to the standard numeric or string formatting.
        if fmt == "" or fmt[-2:].lower() not in ["em","sm","cm"]:
            # Guard against the empty format, where fmt[-1] would raise IndexError.
            if fmt != "" and fmt[-1].lower() in ['b','c','d','o','x','n','e','f','g','%']:
                # Numeric format.
                return long(self).__format__(fmt)
            else:
                return str(self).__format__(fmt)

        # work out the scale, suffix and base        
        factor, suffix = (8, "b") if fmt[-1] in string.lowercase else (1,"B")
        base = 1024 if fmt[-2] in ["e","c"] else 1000

        # Add the i for the IEC format
        suffix = "i"+ suffix if fmt[-2] == "e" else suffix

        mult = ["","K","M","G","T","P"]

        val = float(self) * factor
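        # Pick the prefix one step above the value's magnitude (capped at the
        # largest known prefix), then step back down unless the value is more
        # than half of that larger unit.
        i = 0 if val < 1 else min(int(math.log(val, base))+1, len(mult)-1)
        v = val / math.pow(base,i)
        # Never step below the unprefixed unit (i == 0), e.g. for size(0).
        v,i = (v,i) if v > 0.5 or i == 0 else (v*base,i-1)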

        # Identify if there is a width and extract it
        width = "" if fmt.find(".") == -1 else fmt[:fmt.index(".")]        
        precis = fmt[:-2] if width == "" else fmt[fmt.index("."):-2]

        # do the precision bit first, so width/alignment works with the suffix
        t = ("{0:{1}f}"+mult[i]+suffix).format(v, precis) 

        return "{0:{1}}".format(t,width) if width != "" else t

if __name__ == "__main__":
    # Example usages

    # You can use normal format specifiers as expected - just use the correct presentation type (instead of f, e, G etc)
    # and cast the integer byte count to type size.

    # Example format specifications
    print "{0:.1f}".format(4386) # output - 4386.0
    print "{0:.1f}".format(size(4386)) # output 4386.0 - default numeric presentations respected.
    print "{0:.2eM}".format(size(86247)) # output 84.23KiB - base 1024
    print "{0:.2sM}".format(size(86247)) # output 86.25KB - base 1000
    print "{0:.2cM}".format(size(86247)) # output 84.23KB - base 1024
    print "{0:.2cm}".format(size(86247)) # output 0.66Mb - base 1024 in bits.

Fork of http://code.activestate.com/recipes/578321-human-readable-filememory-sizes/?in=lang-python

This fork reworks the original to honour more of the format string and to allow more formatting options:

The only parts of the format language not honoured are those related to signed numbers, but signs don't make much sense with file and memory sizes. Using the sign options of the format will generally raise an error.
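As far as I can tell from the code, the error appears when the spec contains both a sign and a precision. In that case everything before the "." is applied in a second pass to text that has already been formatted, and Python's string formatting rejects sign options. A minimal sketch for Python 3, using a literal as a stand-in for the recipe's intermediate value:

```python
formatted = "84.23KiB"  # stand-in for the text built before the width is applied

try:
    # A sign option is only valid for numbers, so this raises ValueError.
    result = format(formatted, "+12")
except ValueError as exc:
    result = "rejected: {0}".format(exc)

print(result)
```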

It also addresses many of the comments and issues raised on the previous recipe:

  1. Correctly uses "B" for bytes and "b" for bits.
  2. Allows the user to choose between the SI, IEC or common formats, by use of different format specifiers.
  3. Honours field widths, with the width applied to the whole value including its suffix.
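The width handling in point 3 amounts to a two-pass format: the precision is applied to the number, the suffix is appended, and only then are the fill, alignment and width applied to the combined text. A rough sketch for Python 3, where the helper name `pad_with_suffix` is hypothetical:

```python
def pad_with_suffix(value, precision, suffix, width_spec):
    # Pass 1: apply only the precision to the number, then append the suffix.
    text = "{0:{1}f}".format(value, precision) + suffix
    # Pass 2: apply fill, alignment and width to the whole string.
    return "{0:{1}}".format(text, width_spec)

print(repr(pad_with_suffix(84.2256, ".2", "KiB", ">12")))   # '    84.23KiB'
print(repr(pad_with_suffix(84.2256, ".2", "KiB", "*<12")))  # '84.23KiB****'
```

Doing it in one pass would pad the number first and leave the suffix hanging outside the field.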

Footnote: It occurred to me that you could also support the SI, IEC and common formats by subclassing, passing the appropriate factors, suffixes and bases to the superclass; that way you could have single-character type specifiers. That seems to add complexity for no obvious benefit for this problem, though it would make it easy to extend the solution to other units, such as mass, time, length or any other unit you wish.
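A rough sketch of what that subclassing approach might look like, written for Python 3 (where `long` has been merged into `int`). All class names here are hypothetical, and the single-character type "s" is an assumption. The sketch scales by simple repeated division rather than the recipe's logarithm and 0.5 threshold, so the bits example prints Kb where the recipe prints 0.66Mb. For brevity it passes the rest of the spec straight to the float formatter, so a width would be applied before the suffix.

```python
class ScaledSize(int):
    """Hypothetical base class: subclasses supply base, factor and suffix."""
    base = 1024    # step between prefixes
    factor = 1     # multiplier applied to the raw value (8 gives bits)
    suffix = "B"   # unit text appended after the prefix
    prefixes = ["", "K", "M", "G", "T", "P"]

    def __format__(self, fmt):
        # Anything not ending in the single-character type "s" is left to int.
        if not fmt.endswith("s"):
            return int(self).__format__(fmt)
        value = float(self) * self.factor
        index = 0
        # Step up one prefix at a time while the value is at least one full unit.
        while value >= self.base and index < len(self.prefixes) - 1:
            value /= self.base
            index += 1
        # Reuse the float formatter for the precision part of the spec.
        return format(value, fmt[:-1] + "f") + self.prefixes[index] + self.suffix


class IECBytes(ScaledSize):
    suffix = "iB"


class SIBytes(ScaledSize):
    base = 1000


class CommonBits(ScaledSize):
    factor = 8
    suffix = "b"


print("{0:.2s}".format(IECBytes(86247)))    # 84.23KiB
print("{0:.2s}".format(SIBytes(86247)))     # 86.25KB
print("{0:.2s}".format(CommonBits(86247)))  # 673.80Kb
```

Other units would then only need a subclass with its own `base`, `suffix` and `prefixes`.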