one liner frequency count « Python recipes

You often see frequency counts done with dicts, requiring serveral lines of code. Here is a way to do it in one line using itertools and list comprehensions. This revised version was suggested by Raymon Hettinger. It is O(n log n).

      >>> from itertools import groupby
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('1', 4), ('2', 1), ('3', 2), ('4', 1)]

      

Tags: shortcuts

7 comments

David Eppstein 20 years ago # | flag

Wrong tradeoff. Your implementation takes a linear-time call to count for each member in the set. The somewhat less concise dictionary implementation takes only a single linear-time pass through a dictionary. Since when has conciseness been a virtue in Python?

Jason Whitlark (author) 20 years ago # | flag

For readability, not conciseness. True, it is less efficient, but as a practical matter, I was using it for frequency counts on a list of ~5000 strings, and could not perceive a slowdown. Personally, Id prefer to have 1 line instead of 6, knowing that I can replace it if performance ever becomes a problem. Especially since this version is understandable at a glance.

As Guido says in his optimization anecdote: Rule number one: only optimize when there is a proven speed bottleneck.

Samuel Reynolds 19 years, 9 months ago # | flag

Create a function. So create a histogram function

def histogram( A, flAsList=False ):
    """Return histogram of values in array A."""
    H = {}
    for val in A:
        H[val] = H.get(val,0) + 1
    if flAsList:
        return H.items()
    return H

Then the call is self-documenting:

freq = histogram( myList )

Raymond Hettinger 18 years, 6 months ago # | flag

Channeling Guido. Guido almost certainly would find the OP's code to be abhorrent. Admonitions against premature optimization do not equate to a directive to write insanely inefficient code that does not scale. Compression to a single line is only a worthy goal if there is no sacrifice in clarity or performance.

Raymond Hettinger 18 years, 6 months ago # | flag

O(n log n) variation. Equivalent to: sort myList | uniq -c.

>>> from itertools import groupby
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('1', 4), ('2', 1), ('3', 2), ('4', 1)]

Jason Whitlark (author) 17 years, 5 months ago # | flag

Thanks Raymond, for the improvement. I have revised the recipe to match your implementation.

Nick Galbreath 14 years, 2 months ago # | flag

At least on my computer, this can be made ~2x faster by eliminating the temporary list.

from itertools import groupby
def ilen(iterable):
    count = 0
    for i in iterable:
        count += 1
    return count
[(k, ilen(g)) for k, g in groupby(sorted(myList))]

I'm sure there is a better implementation for "ilen". Yeah, i know the recipe said "one-liner", but hopes this helps anyways.

◄	Python recipes (4591)	►
◄	Jason Whitlark's recipes (4)	►

one liner frequency count (Python recipe) by Jason Whitlark
ActiveState Code (http://code.activestate.com/recipes/277600/)

7 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

one liner frequency count (Python recipe) by Jason Whitlark ActiveState Code (http://code.activestate.com/recipes/277600/)

7 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

one liner frequency count (Python recipe) by Jason Whitlark
ActiveState Code (http://code.activestate.com/recipes/277600/)