I am writing a log-parsing script that needs to run every 10 minutes or so to update some stats in a database. This subclass of the built-in 'file' type looks for a '.filename.pkl' file, which contains the seek offset where the previous run reached the end of the file, and seeks to that offset before any lines are read. When the file is closed, or when iteration raises StopIteration, it writes the new offset back to this pickle file.
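```python
import os
import cPickle as pickle


class fileUpdate(file):
    """A file that resumes reading where the previous run stopped (Python 2)."""

    def __init__(self, name, mode='r', bufsize=-1):
        file.__init__(self, name, mode, bufsize)
        # Keep the pickle next to the file, even when name includes a directory
        # ('.%s.pkl' % '/var/log/x.log' would point somewhere else entirely).
        directory, base = os.path.split(name)
        self.pkl_path = os.path.join(directory, '.%s.pkl' % base)
        self.offset = None
        if os.path.exists(self.pkl_path):
            with open(self.pkl_path, 'rb') as pkl_file:
                self.offset = pickle.load(pkl_file)
            self.seek(self.offset, 0)

    def close(self):
        # Guard against a second close(): tell() fails on a closed file.
        # Note: mid-iteration, the read-ahead buffer used by next() can make
        # tell() run ahead of the last line returned.
        if not self.closed:
            self.recordExitOffset()
        file.close(self)

    def recordExitOffset(self):
        # Save the current position so the next run starts from here.
        with open(self.pkl_path, 'wb') as pkl_file:
            pickle.dump(self.tell(), pkl_file)

    def next(self):
        try:
            return file.next(self)
        except StopIteration:
            # End of file: the whole file was read, so tell() is exact here.
            self.recordExitOffset()
            raise
```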
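The 'file' type used above exists only in Python 2; Python 3 removed it, so there is nothing to subclass. Below is a minimal sketch of the same idea for Python 3, written as a generator instead of a class (the names `state_path_for` and `read_new_lines` are hypothetical). It opens the log in binary mode and reads with readline(), because tell() is reliable there, whereas a text-mode file refuses tell() while it is being iterated with next(). Two behaviors are assumptions of this sketch and not part of the recipe above: a trailing line without a newline is left for the next run (the logger may still be writing it), and a log smaller than the saved offset is treated as rotated and read from the start.

```python
import os
import pickle


def state_path_for(log_path):
    """Hidden '.<name>.pkl' file in the same directory as the log."""
    directory, name = os.path.split(log_path)
    return os.path.join(directory, '.%s.pkl' % name)


def read_new_lines(log_path):
    """Yield complete lines (as bytes) appended since the previous run.

    The offset just past the last line handed to the caller is saved when
    the generator finishes or is closed early.
    """
    pkl_path = state_path_for(log_path)
    offset = 0
    if os.path.exists(pkl_path):
        with open(pkl_path, 'rb') as f:
            offset = pickle.load(f)
    with open(log_path, 'rb') as log:
        # Assumption: a log shorter than the saved offset was rotated or
        # truncated, so start again from the beginning.
        if offset > os.fstat(log.fileno()).st_size:
            offset = 0
        log.seek(offset)
        try:
            for line in iter(log.readline, b''):
                if not line.endswith(b'\n'):
                    break  # partial line still being written; retry next run
                offset = log.tell()  # count the line as consumed once yielded
                yield line
        finally:
            with open(pkl_path, 'wb') as f:
                pickle.dump(offset, f)
```

Lines come back as bytes, so decode them as needed (for example `line.decode('utf-8')`). If you stop iterating early, call close() on the generator (or wrap it in contextlib.closing) so the finally block records the offset right away rather than whenever the generator is garbage-collected.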
One problem I see with this right now is that the script needs write access to the directory where the log file resides. That isn't a problem in my situation, but I can see it coming up in the future. This should be useful in all sorts of log-parsing scripts. I tried it with a decent-sized log file (600 MB), and it was fast enough for the script I am writing.
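The write-access requirement comes only from storing the offset file next to the log. One possible workaround, sketched here under the assumption that the script can use a separate writable state directory (the `state_dir` argument and the helper names are hypothetical), is to map each log path to a unique file name inside that directory by percent-encoding its absolute path with urllib.parse.quote, and to write through a temporary file plus os.replace so an interrupted run never leaves a half-written pickle. The sketch targets Python 3; in Python 2 the equivalent of quote is urllib.quote, and os.replace and os.makedirs(..., exist_ok=True) are not available.

```python
import os
import pickle
from urllib.parse import quote


def state_path_in(state_dir, log_path):
    """Return a per-log offset file inside state_dir (hypothetical helper).

    Percent-encoding the absolute path keeps '/a/b_c' and '/a_b/c'
    from mapping to the same file.
    """
    name = quote(os.path.abspath(log_path), safe='') + '.pkl'
    return os.path.join(state_dir, name)


def load_offset(state_path):
    """Return the saved offset, or 0 if this log has never been read."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path, 'rb') as f:
        return pickle.load(f)


def save_offset(state_path, offset):
    """Write the offset via a temp file so a crash can't leave a partial pickle."""
    os.makedirs(os.path.dirname(state_path), exist_ok=True)
    tmp_path = state_path + '.tmp'
    with open(tmp_path, 'wb') as f:
        pickle.dump(offset, f)
    os.replace(tmp_path, state_path)  # atomic rename on POSIX
```

With helpers like these, the only directory the script must be able to write to is the state directory; the log itself can be opened read-only.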