
Using time "slices" to categorize events by period (Python recipe) by Chris McDonough
ActiveState Code (http://code.activestate.com/recipes/104734/)

Break all of time up into "slices" in order to categorize events.

import math
import sys
import time

import weblog.combined

def getTimeslice(period, utime):
    """Return the start of the period-second slice containing utime.

    The result is the largest multiple of period that is <= utime.
    """
    seconds = int(math.floor(utime))
    return seconds - seconds % period

def main(files):
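    # Only count hits inside this window (local time): 2001-11-12, 09:00 to 10:00.
    # time.mktime expects a tuple (or struct_time), not a list.
    START = time.mktime((2001, 11, 12, 9, 0, 0, 0, 0, 0))
    END   = time.mktime((2001, 11, 12, 10, 0, 0, 0, 0, 0))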
    t = 0
    slices = {}
    for name in files:
        print(name)
        log = weblog.combined.Parser(open(name))
        i = 0  # hits counted in this file
        while log.getlogent():
            # Skip entries outside the window of interest.
            if log.utime < START or log.utime > END:
                continue
            ts = getTimeslice(60, log.utime)  # start of this hit's minute
            slices[ts] = slices.get(ts, 0) + 1
            i = i + 1
        print(i)
        t = t + i

    # Find the busiest slice and the mean number of hits per non-empty slice.
    total = 0
    peak = 0
    peak_ts = 0
    for ts in slices.keys():
        total = total + slices[ts]
        if slices[ts] > peak:
            peak = slices[ts]
            peak_ts = ts
    avg = None
    if slices:
        avg = float(total) / len(slices)

    print("Total: %s" % t)
    print("Average: %s" % avg)
    print("Peak: %s (at %s seconds)" % (peak, peak_ts))

if __name__ == '__main__':
    files = sys.argv[1:]
    main(files)

      

When analyzing logs such as web server access logs, it is often useful to attribute "hits" to "time buckets" in order to answer questions like "what is the busiest hour of the day for my website?"

The above script uses the "weblog" web log analysis framework by Mark Nottingham (which seems to be usable only with Python 1.5, due to backwards incompatibilities with Python 2.1), available from http://www.mnot.net/scripting/python/WebLog/. It analyzes a set of Apache web server access logs for a fixed time period (a one-hour window in the listing). It outputs the total number of "hits" as well as the peak and average number of hits per minute. The main function builds on the weblog framework: for each log entry it calls the getTimeslice function to obtain an integer that identifies the 60-second period containing that entry. It then uses this timeslice as a key in a dictionary that maps timeslice to number of hits, which allows the script to report a "peak" 60-second period.
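
The counting step itself does not need the weblog framework. Here is a minimal sketch of the same idea, assuming the event times have already been extracted as Unix timestamps. The names `timeslice` and `summarize` are hypothetical helpers written for this illustration (they are not part of weblog), the sample timestamps are made up, and the sketch targets Python 3.

```python
import math
from collections import Counter

def timeslice(period, utime):
    """Return the start of the period-second slice containing utime."""
    seconds = int(math.floor(utime))
    return seconds - seconds % period

def summarize(timestamps, period=60):
    """Count events per slice.

    Returns (total, mean per non-empty slice, peak count, peak slice start).
    """
    counts = Counter(timeslice(period, t) for t in timestamps)
    if not counts:
        return 0, None, 0, None
    total = sum(counts.values())
    peak_ts, peak = max(counts.items(), key=lambda item: item[1])
    return total, total / len(counts), peak, peak_ts

# Made-up timestamps: three hits in the minute starting at 120, one at 180.
hits = [120.0, 130.5, 179.9, 180.0]
print(summarize(hits))  # (4, 2.0, 3, 120)
```

Note that the mean here is taken over slices that received at least one event. If empty minutes should count toward the average, divide by the number of slices in the window instead.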

I've also successfully used this strategy for things like opportunistic garbage collection, where it's useful to be able to place collections of items into "buckets" that are represented by a timeslice, dumping them only when the bucket is expired.
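
The bucket-expiry use could look roughly like the sketch below. This is not the author's implementation: the `ExpiringBuckets` class, its method names, and the `max_age` policy are all assumptions chosen for illustration, and the sketch targets Python 3.

```python
import math
import time
from collections import defaultdict

class ExpiringBuckets:
    """Group items by timeslice and discard whole buckets once they are old."""

    def __init__(self, period, max_age):
        self.period = period    # slice width in seconds
        self.max_age = max_age  # how far (seconds) a bucket may lag the current slice
        self.buckets = defaultdict(list)

    def _slice(self, utime):
        seconds = int(math.floor(utime))
        return seconds - seconds % self.period

    def add(self, item, now=None):
        """File item under the slice containing now (defaults to the current time)."""
        now = time.time() if now is None else now
        self.buckets[self._slice(now)].append(item)

    def collect(self, now=None):
        """Remove buckets that start more than max_age before the current slice.

        Returns the items they held, oldest bucket first.
        """
        now = time.time() if now is None else now
        cutoff = self._slice(now) - self.max_age
        dumped = []
        for ts in sorted(ts for ts in self.buckets if ts < cutoff):
            dumped.extend(self.buckets.pop(ts))
        return dumped

# Explicit "now" values keep the example deterministic.
b = ExpiringBuckets(period=60, max_age=120)
b.add("a", now=10)
b.add("b", now=70)
b.add("c", now=200)
print(b.collect(now=200))  # ['a']
print(b.collect(now=400))  # ['b', 'c']
```

Because items are only ever dropped a bucket at a time, the collector never has to inspect individual items, only the bucket keys.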

Tags: algorithms

Created by Chris McDonough on Thu, 27 Dec 2001 (PSF)


Licensed under the PSF License