ActiveState Code

Recipe 520589: Tallying of objects (hashables)


This class can be used to tally objects, for example when analyzing log files. It has two separate ways of calculating the "score board", one quicker for Python >=2.4 and one that works with older versions (though I doubt it works with anything <2.0).

As the class uses a dictionary, only hashables can be counted. Other than that, you can mix and match them however you want.

Python
from sys import version_info as pyver


class Tally:
    """ 
    A tally or histogram class

    Counts the occurrences of objects (hashables only) given to it.
    The number of seen samples is stored in the object's samples 
    instance variable.

    """
    def __init__(self):
        self.samples = 0L
        self.values = {}

    def incr(self, k, amount = 1):
        """
        Increase count for k
        """
        self.samples += 1
        self.values[k] = self.values.get(k, 0) + amount

    def decr(self, k, amount = 1):
        """
        Decrease count for k
        """
        self.samples += 1
        self.values[k] = self.values.get(k, 0) - amount

    def val(self, k):
        """
        Return count for k
        """
        return self.values.get(k, 0)

    def getkeys(self):
        """
        Return all histogram keys
        """
        return self.values.keys()

    def counts(self, desc = False):
        '''
        Return a list of [count, key] pairs, sorted by count
        (ties are ordered by key).
        If desc is True, return a descending sort.
        '''
        if (pyver[0] < 2) or (pyver[0] == 2 and pyver[1] < 4):
            # This is for Python versions <2.4
            i = map(lambda items: list(items), self.values.items())
            map(lambda rev: rev.reverse(), i)
            i.sort()
            if desc:
                i.reverse()
            return i

        else:
            # This only works with Python >=2.4 but is much
            # faster than the code above.

            return sorted(( list(reversed(items)) \
                for items in self.values.iteritems() ), reverse = desc)

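The listing above targets Python 2: the `0L` literal and `dict.iteritems()` do not exist in Python 3. As a rough sketch only (not the author's code), the same interface might be ported along these lines. The version check is dropped on the assumption that only Python 3 is targeted, and the sample IP addresses are made up for illustration.

```python
class Tally:
    """Python 3 sketch of the recipe's Tally interface (an assumed port)."""

    def __init__(self):
        self.samples = 0  # plain int; Python 3 has no separate long type
        self.values = {}

    def incr(self, k, amount=1):
        self.samples += 1
        self.values[k] = self.values.get(k, 0) + amount

    def decr(self, k, amount=1):
        # As in the original, a decrement still counts as a seen sample.
        self.samples += 1
        self.values[k] = self.values.get(k, 0) - amount

    def val(self, k):
        return self.values.get(k, 0)

    def getkeys(self):
        return list(self.values.keys())

    def counts(self, desc=False):
        # [count, key] pairs sorted by count, then by key, as in the original.
        return sorted(([v, k] for k, v in self.values.items()), reverse=desc)


# Hypothetical usage with made-up client addresses.
t = Tally()
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    t.incr(ip)
board = t.counts(desc=True)
```

One caveat with this sketch: when two counts tie, the sort falls back to comparing the keys, and Python 3 raises `TypeError` for keys of mixed, unorderable types (for example a string and an integer).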
Discussion

I keep using this class when writing log file analyzers. Usually, a count of clients, IPs, messages, events, etc. is at least part of the analysis. Using a dictionary makes adding items relatively fast. If this class is too slow, I use the code inline, removing all function/method calls (at the cost of less readable code). I haven't taken any measurements, but the class may benefit from Psyco.
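Using the code inline might look roughly like the sketch below, which applies the same `get(k, 0) + 1` update as `incr` directly to a plain dictionary. The log lines and the assumption that the client IP is the first whitespace-separated field are hypothetical, chosen only for illustration.

```python
# Hypothetical log lines; the "client IP first" layout is an assumption.
log_lines = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /about.html",
    "10.0.0.1 POST /login",
]

hits = {}  # plain dict used directly instead of a Tally instance
for line in log_lines:
    ip = line.split()[0]
    hits[ip] = hits.get(ip, 0) + 1  # same update that Tally.incr performs
```

This avoids one method call per sample, at the cost of repeating the update expression wherever a count is needed.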

Comments

  1. At 6:59 a.m. on 21 May 2007, thanos vassilakis said:

    A simple counter. I and others use something like this:

    class Counter(dict):
        def count(self, what):
            try:
                self[what] +=1
            except KeyError:
                self[what] =1
            return self
    

    For the sample count I just use

    sum(someCounter.values())
    

    Also, your sort could be a lot simpler; see other recipes.

  2. At 2:24 a.m. on 22 May 2007, Matteo Dell'Amico said:

    Look at recipes for bag. There is a recipe for bags that solves the same problem, so you might want to check it.

    Also, in Python 2.5, defaultdict(int) does pretty much what you need.

  3. At 9:43 a.m. on 13 Jun 2007, Anonymous (the author) said:

    thx. Thanks for the comments.

    The recipe is a typical result of being entrenched. I originally wrote it in the times of 1.5.2, when most of the fancier stuff in Python either wasn't there or was slow. As it worked for me, I never bothered to rethink the entire approach.

    Lesson learned.
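The suggestions in the comments above can be combined into a short sketch: `defaultdict(int)` for the counting, `sum()` over the values for the sample total, and a key function for the sort. The commenter does not say which simpler sort was meant, so `operator.itemgetter` is only one possible reading, and the event names are made-up sample data.

```python
from collections import defaultdict
from operator import itemgetter

events = ["login", "logout", "login", "error", "login"]  # made-up sample data

tally = defaultdict(int)  # missing keys start at 0
for event in events:
    tally[event] += 1

total = sum(tally.values())  # number of samples seen

# (key, count) pairs, highest count first; keys are never compared,
# so mixed key types do not matter here.
ranking = sorted(tally.items(), key=itemgetter(1), reverse=True)
```

Later Python versions (2.7 and 3.1 onward) also ship `collections.Counter`, whose `most_common()` method covers much the same ground.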
