One way to read files that contain binary fields is to use the struct module. However, to do this properly one must learn struct's format characters, which can look especially cryptic when sprinkled around the code. So instead, I use a wrapper object that presents a simple interface and uses type names that are more in line with many IDLs.

    import struct

    class BinaryReaderEOFException(Exception):
        def __init__(self):
            pass
        def __str__(self):
            return 'Not enough bytes in file to satisfy read request'

    class BinaryReader:
        # Map well-known type names into struct format characters.
        typeNames = {
            'int8'   : 'b',
            'uint8'  : 'B',
            'int16'  : 'h',
            'uint16' : 'H',
            'int32'  : 'i',
            'uint32' : 'I',
            'int64'  : 'q',
            'uint64' : 'Q',
            'float'  : 'f',
            'double' : 'd',
            'char'   : 's'}

        def __init__(self, fileName):
            self.file = open(fileName, 'rb')

        def read(self, typeName):
            # Look up the format character, then read exactly that many bytes.
            typeFormat = BinaryReader.typeNames[typeName.lower()]
            typeSize = struct.calcsize(typeFormat)
            value = self.file.read(typeSize)
            if typeSize != len(value):
                raise BinaryReaderEOFException
            return struct.unpack(typeFormat, value)[0]

        def __del__(self):
            # self.file does not exist if open() failed in __init__.
            if getattr(self, 'file', None) is not None:
                self.file.close()

For example, if we were to decode a binary packet from a file it might look something like this:

    binaryReader = BinaryReader('secret.bin')
    try:
        packetId = binaryReader.read('uint8')
        timestamp = binaryReader.read('uint64')
        secretCodeLen = binaryReader.read('uint32')
        secretCode = []
        while secretCodeLen > 0:
            secretCode.append(binaryReader.read('uint8'))
            secretCodeLen = secretCodeLen - 1
    except BinaryReaderEOFException:
        # One of our attempts to read a field went beyond the end of the file.
        print('Error: File seems to be corrupted.')

Note that exceptions resulting from file operations are simply forwarded as-is. Also, it goes without saying that the type names can be changed to something more familiar to you or your project.
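To try the example above without a real secret.bin, a test file can be written with struct.pack. The following is a minimal sketch; the field values and the temporary file location are made up for illustration. It uses the '=' prefix so that the fields are packed back to back with no alignment padding, which matches how BinaryReader reads one field at a time. It then decodes the header with raw struct calls, which shows the format-string style the wrapper is meant to hide.

```python
import os
import struct
import tempfile

# Hypothetical field values, chosen only for illustration.
packet_id = 7
timestamp = 1234567890
secret_code = [10, 20, 30]

# '=' means native byte order, standard sizes, and no padding between fields.
# B = uint8, Q = uint64, I = uint32, followed by one uint8 per code byte.
header = struct.pack('=BQI', packet_id, timestamp, len(secret_code))
body = struct.pack('=%dB' % len(secret_code), *secret_code)

path = os.path.join(tempfile.mkdtemp(), 'secret.bin')
with open(path, 'wb') as f:
    f.write(header + body)

# Decoding the same header directly with struct, format characters and all.
with open(path, 'rb') as f:
    data = f.read()
header_size = struct.calcsize('=BQI')
decoded_id, decoded_ts, decoded_len = struct.unpack('=BQI', data[:header_size])
decoded_code = list(struct.unpack('=%dB' % decoded_len, data[header_size:]))
```

Pointing BinaryReader at the generated path instead of 'secret.bin' should yield the same field values.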
There are various improvements that would make this object even more useful; you can experiment with these to your heart's content:
- Modify read() to take a list of types and unpack them all together. This would be more efficient than making multiple function calls. In fact, this would be especially useful when reading strings.
- Consider making BinaryReader inherit from the file object. This would give the user access to file manipulation functions, making the object more flexible.
- Add '__enter__' and '__exit__' methods to make BinaryReader compatible with Python's "with" statement (Python v2.5 and later).
- The implementation above assumes that when the user tries to read beyond the end of file, it's okay to raise an exception and throw away whichever bytes were read during the offending call. Perhaps this isn't always the case?
- Come up with a snazzier name than BinaryReader :-)
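Two of the suggestions above can be sketched together: "with" statement support, and keeping the bytes from a short read. This is a minimal sketch rather than a drop-in replacement; the names ManagedBinaryReader and ShortReadError, the partial attribute, and the trimmed type table are all assumptions made for illustration.

```python
import struct


class ShortReadError(Exception):
    """Hypothetical exception that keeps the bytes read before EOF."""

    def __init__(self, partial):
        Exception.__init__(
            self, 'Not enough bytes in file to satisfy read request')
        self.partial = partial


class ManagedBinaryReader(object):
    # Trimmed type table for brevity; the full mapping works the same way.
    typeNames = {'uint8': 'B', 'uint32': 'I', 'uint64': 'Q'}

    def __init__(self, fileName):
        self.file = open(fileName, 'rb')

    def read(self, typeName):
        typeFormat = self.typeNames[typeName.lower()]
        typeSize = struct.calcsize(typeFormat)
        value = self.file.read(typeSize)
        if len(value) != typeSize:
            # Hand the partial bytes to the caller instead of dropping them.
            raise ShortReadError(value)
        return struct.unpack(typeFormat, value)[0]

    def close(self):
        self.file.close()

    def __enter__(self):
        return self

    def __exit__(self, excType, excValue, traceback):
        self.close()
        return False  # do not suppress exceptions raised inside the block
```

Used as "with ManagedBinaryReader('secret.bin') as reader:", the file is closed when the block exits, even if one of the reads raises.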
Won't this slow down the code a lot, since each call to 'read' is now a double call with some extra processing?
Most of the extra processing has to happen anyway when reading a binary file in order to calculate size, verify read operation, etc. The only real extra here is the mapping of struct format characters to the more common type names. Admittedly, there is overhead related to calling read() for each field. As I mentioned in the list of suggested improvements, this function can be made more efficient by taking a list of types instead, and combining them when performing the actual file operation. Having both versions would give your user the ability to choose between efficiency and code clarity. Cheers.
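The list-of-types variant mentioned above could look roughly like this. It is a sketch under assumptions: readMany is a hypothetical name, the type table is trimmed, and the '=' prefix is my own choice. Without a prefix, struct uses native alignment and may insert padding between fields in a combined format, so the result would no longer match reading each field on its own.

```python
import struct

# Trimmed copy of the type-name table; extend as needed.
TYPE_NAMES = {'uint8': 'B', 'uint16': 'H', 'uint32': 'I', 'uint64': 'Q'}


def readMany(fileObj, typeNames):
    """Read several fields with one file read and one struct.unpack call.

    Hypothetical helper: fileObj is any file-like object opened in binary
    mode, typeNames is a sequence such as ['uint8', 'uint64', 'uint32'].
    Returns a tuple with one value per requested type.
    """
    # '=' selects native byte order with standard sizes and no alignment
    # padding, so the combined layout equals the fields read back to back.
    fmt = '=' + ''.join(TYPE_NAMES[name.lower()] for name in typeNames)
    size = struct.calcsize(fmt)
    data = fileObj.read(size)
    if len(data) != size:
        raise EOFError('Not enough bytes in file to satisfy read request')
    return struct.unpack(fmt, data)
```

The packet header from the earlier example would then be read in one call: packetId, timestamp, secretCodeLen = readMany(f, ['uint8', 'uint64', 'uint32']).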