When starting to compress a file or studying it to break certain forms of encryption, it is sometimes helpful to know how often each byte value occurs in the file. This recipe is a simple frequency analysis tool that may help towards that end, and it can serve as a starting point for those interested in writing tools for such fields.
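import os
import sys

def main():
    try:
        table = [0] * 256  # one counter per possible byte value
        with open(sys.argv[1], 'rb') as data:
            # Read in 1 MiB chunks so large files are never loaded whole.
            buff = data.read(2 ** 20)
            while buff:
                # bytearray yields integers on both Python 2 and Python 3.
                for c in bytearray(buff):
                    table[c] += 1
                buff = data.read(2 ** 20)
        # Print only the byte values that occur, as "hex value = count".
        sys.stdout.write('\n'.join('%02X = %d' % (i, c) for i, c in enumerate(table) if c) + '\n')
    except (IndexError, IOError):
        # Missing argument or unreadable file: show usage instead of a traceback.
        sys.stdout.write('Usage: %s <filename>\n' % os.path.basename(sys.argv[0]))

if __name__ == '__main__':
    main()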
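For anyone who wants to reuse the counting step from other code rather than from the command line, the same idea could be sketched with `collections.Counter`. This is only one possible arrangement, not part of the original recipe: the names `byte_histogram`, `read_chunks`, and `format_table` are hypothetical, and the 1 MiB chunk size is simply carried over from the recipe.

```python
from collections import Counter


def read_chunks(path, size=2 ** 20):
    """Yield successive chunks of a file opened in binary mode (hypothetical helper)."""
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(size)
            if not chunk:
                break
            yield chunk


def byte_histogram(chunks):
    """Count byte values across an iterable of bytes-like chunks (hypothetical helper)."""
    counts = Counter()
    for chunk in chunks:
        # bytearray yields integers on both Python 2 and Python 3.
        counts.update(bytearray(chunk))
    return counts


def format_table(counts):
    """Render counts in the recipe's 'hex value = count' style, sorted by byte value."""
    return '\n'.join('%02X = %d' % (value, counts[value]) for value in sorted(counts))
```

A call such as `print(format_table(byte_histogram(read_chunks('some_file.bin'))))` should then produce the same kind of listing as the script, where `some_file.bin` stands in for whatever file is being examined.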
Tags: demonstration
Others might just call it a histogram ;)