When starting to compress a file, or when studying one in order to break certain forms of encryption, it helps to know how many times each byte value occurs in it. This recipe is a simple frequency analysis tool: it counts the occurrences of each of the 256 possible byte values in a file and prints every nonzero count. It can serve as a starting point for anyone interested in writing tools for those fields.

Python, 20 lines
import os
import sys

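def main():
    # Only a missing argument should print the usage text; a bare except
    # would also hide real I/O errors and typos in the code.
    if len(sys.argv) < 2:
        sys.stdout.write('Usage: %s <filename>\n' % os.path.basename(sys.argv[0]))
        return
    table = [0] * 256  # table[b] is the number of times byte value b occurs
    with open(sys.argv[1], 'rb') as data:
        buff = data.read(2 ** 20)  # read the file in 1 MiB chunks
        while buff:
            # bytearray yields integers on both Python 2 and Python 3,
            # so no ord() call is needed
            for c in bytearray(buff):
                table[c] += 1
            buff = data.read(2 ** 20)
    # Print only the byte values that occur, as "hex value = count"
    sys.stdout.write('\n'.join('%02X = %d' % (i, c) for i, c in enumerate(table) if c) + '\n')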


if __name__ == '__main__':
    main()
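
For anyone who wants to build on the recipe rather than run it from the command line, the same counting can be wrapped in a function that returns the table. The following is only a sketch, not part of the original recipe: the names byte_frequencies, format_table and chunk_size are made up for illustration, and it swaps the 256-element list for collections.Counter from the standard library.

```python
import collections
import os
import tempfile


def byte_frequencies(path, chunk_size=2 ** 20):
    """Return a Counter mapping each byte value (0-255) to its count in the file.

    Hypothetical helper, not part of the original recipe.
    """
    counts = collections.Counter()
    with open(path, 'rb') as handle:
        chunk = handle.read(chunk_size)
        while chunk:
            # bytearray yields integers on both Python 2 and Python 3
            counts.update(bytearray(chunk))
            chunk = handle.read(chunk_size)
    return counts


def format_table(counts):
    """Format the counts as 'XX = n' lines, the layout the recipe prints."""
    return '\n'.join('%02X = %d' % (value, counts[value]) for value in sorted(counts))


# Small demonstration on a throwaway file that holds the bytes of b'banana'.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'banana')
demo_counts = byte_frequencies(tmp.name)
os.remove(tmp.name)
print(format_table(demo_counts))
# 61 = 3
# 62 = 1
# 6E = 2
```

Returning the counts instead of printing them makes it easier to feed them into later steps, such as sorting by frequency or comparing two files.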

1 comment

beni hess 11 years, 4 months ago

Others might just call it histogram ;)