ActiveState Code

Recipe 277600: one liner frequency count


You often see frequency counts done with dicts, requiring serveral lines of code. Here is a way to do it in one line using itertools and list comprehensions. This revised version was suggested by Raymon Hettinger. It is O(n log n).

Python
1
2
3
>>> from itertools import groupby
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('1', 4), ('2', 1), ('3', 2), ('4', 1)]

Comments

  1. 1. At 9:54 a.m. on 11 apr 2004, David Eppstein said:

    Wrong tradeoff. Your implementation takes a linear-time call to count for each member in the set. The somewhat less concise dictionary implementation takes only a single linear-time pass through a dictionary. Since when has conciseness been a virtue in Python?

  2. 2. At 2:07 p.m. on 11 apr 2004, Jason Whitlark (the author) said:

    For readability, not conciseness. True, it is less efficient, but as a practical matter, I was using it for frequency counts on a list of ~5000 strings, and could not perceive a slowdown. Personally, I’d prefer to have 1 line instead of 6, knowing that I can replace it if performance ever becomes a problem. Especially since this version is understandable at a glance.

    As Guido says in his optimization anecdote: “Rule number one: only optimize when there is a proven speed bottleneck.”

  3. 3. At 7:39 a.m. on 8 jul 2004, Samuel Reynolds said:

    Create a function. So create a histogram function

    def histogram( A, flAsList=False ):
        """Return histogram of values in array A."""
        H = {}
        for val in A:
            H[val] = H.get(val,0) + 1
        if flAsList:
            return H.items()
        return H
    

    Then the call is self-documenting:

    freq = histogram( myList )
    
  4. 4. At 9:17 a.m. on 14 oct 2005, Raymond Hettinger said:

    Channeling Guido. Guido almost certainly would find the OP's code to be abhorrent. Admonitions against premature optimization do not equate to a directive to write insanely inefficient code that does not scale. Compression to a single line is only a worthy goal if there is no sacrifice in clarity or performance.

  5. 5. At 9:26 a.m. on 14 oct 2005, Raymond Hettinger said:

    O(n log n) variation. Equivalent to: sort myList | uniq -c.

    >>> from itertools import groupby
    >>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
    [('1', 4), ('2', 1), ('3', 2), ('4', 1)]
    
  6. 6. At 5:12 p.m. on 16 nov 2006, Jason Whitlark (the author) said:

    Thanks Raymond, for the improvement. I have revised the recipe to match your implementation.

Sign in to comment