I am writing a log-parsing script that needs to run every 10 minutes or so to update some stats in a database. This subclass of the 'file' object looks for a '.filename.pkl' file, which contains the seek offset of the previous end of the file, and seeks to that offset before returning the file. On closing the file or on StopIteration, it writes the new max offset to this pickle file.
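import os
import cPickle as pickle


class fileUpdate(file):
    """File that resumes reading where the previous run stopped."""

    def __init__(self, name, mode, bufsize=-1):
        file.__init__(self, name, mode, bufsize)
        # Keep the offset in a hidden '.<filename>.pkl' next to the log file,
        # even when 'name' includes a directory component.
        dirname, basename = os.path.split(name)
        self.pkl_path = os.path.join(dirname, '.%s.pkl' % basename)
        self.offset = None
        if os.path.exists(self.pkl_path):
            pkl_file = open(self.pkl_path)
            try:
                self.offset = pickle.load(pkl_file)
            finally:
                pkl_file.close()
            # Skip everything that a previous run already consumed.
            self.seek(self.offset, 0)

    def close(self):
        self.recordExitOffset()
        file.close(self)

    def recordExitOffset(self):
        # Save the current position so the next run can start from here.
        pkl_file = open(self.pkl_path, 'w')
        try:
            pickle.dump(self.tell(), pkl_file)
        finally:
            pkl_file.close()

    def next(self):
        try:
            return file.next(self)
        except StopIteration:
            # End of file reached: remember the offset, then re-raise.
            self.recordExitOffset()
            raise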
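The built-in 'file' type and the cPickle module exist only in Python 2, so the class above will not run under Python 3. What follows is a rough sketch of the same idea for Python 3, not a drop-in replacement: the name ResumingReader is made up for illustration, it wraps a file opened in binary mode instead of subclassing, and it yields lines as bytes rather than str.

```python
import os
import pickle


class ResumingReader:
    """Iterate over the lines added to a file since the previous run.

    Hypothetical Python 3 counterpart of fileUpdate; the class name and
    the choice of binary mode are assumptions made for this sketch.
    """

    def __init__(self, name):
        dirname, basename = os.path.split(name)
        self.pkl_path = os.path.join(dirname, '.%s.pkl' % basename)
        # Binary mode keeps tell()/seek() offsets as plain byte counts.
        self._fh = open(name, 'rb')
        if os.path.exists(self.pkl_path):
            with open(self.pkl_path, 'rb') as pkl_file:
                self._fh.seek(pickle.load(pkl_file))

    def __iter__(self):
        return self

    def __next__(self):
        line = self._fh.readline()
        if not line:
            # End of file: remember where we stopped, then stop iterating.
            self.record_exit_offset()
            raise StopIteration
        return line

    def record_exit_offset(self):
        with open(self.pkl_path, 'wb') as pkl_file:
            pickle.dump(self._fh.tell(), pkl_file)

    def close(self):
        if not self._fh.closed:
            self.record_exit_offset()
            self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Used as a context manager ("with ResumingReader(path) as log:" followed by a for loop over log), the first run yields every line and later runs yield only the lines appended since the last recorded offset.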
One problem I see with this right now is that the script has to have write access to the directory where the file resides. This isn't a problem in my situation, but I can see it coming up in the future. This should be useful in all sorts of log-parsing scripts. I tried it with a decent-sized logfile (600 MB) and it was fast enough for the script I am writing.
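One possible way around the write-access limitation is to keep the offset files in a separate directory that the script can write to. The helper below is only a sketch in Python 3: the function name offset_path and the state_dir parameter are hypothetical, and hashing the full log path is just one way to keep two logs with the same basename from sharing an offset file.

```python
import hashlib
import os


def offset_path(log_path, state_dir):
    """Return the offset-file path for log_path inside state_dir.

    Hypothetical helper: neither the name nor the naming scheme comes
    from the original recipe.
    """
    full_path = os.path.abspath(log_path)
    # Hash the absolute path so that e.g. two different 'access.log'
    # files map to two different offset files.
    digest = hashlib.sha1(full_path.encode('utf-8')).hexdigest()[:12]
    filename = '%s.%s.pkl' % (os.path.basename(full_path), digest)
    return os.path.join(state_dir, filename)
```

The result could then be used in place of the '.filename.pkl' path, so that only state_dir needs to be writable.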