Welcome, guest | Sign In | My Account | Store | Cart

You often see frequency counts done with dicts, requiring serveral lines of code. Here is a way to do it in one line using itertools and list comprehensions. This revised version was suggested by Raymon Hettinger. It is O(n log n).

Python, 3 lines
1
2
3
>>> from itertools import groupby
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('1', 4), ('2', 1), ('3', 2), ('4', 1)]

7 comments

David Eppstein 17 years, 8 months ago  # | flag

Wrong tradeoff. Your implementation takes a linear-time call to count for each member in the set. The somewhat less concise dictionary implementation takes only a single linear-time pass through a dictionary. Since when has conciseness been a virtue in Python?

Jason Whitlark (author) 17 years, 8 months ago  # | flag

For readability, not conciseness. True, it is less efficient, but as a practical matter, I was using it for frequency counts on a list of ~5000 strings, and could not perceive a slowdown. Personally, I’d prefer to have 1 line instead of 6, knowing that I can replace it if performance ever becomes a problem. Especially since this version is understandable at a glance.

As Guido says in his optimization anecdote: “Rule number one: only optimize when there is a proven speed bottleneck.”

Samuel Reynolds 17 years, 5 months ago  # | flag

Create a function. So create a histogram function

def histogram( A, flAsList=False ):
    """Return histogram of values in array A."""
    H = {}
    for val in A:
        H[val] = H.get(val,0) + 1
    if flAsList:
        return H.items()
    return H

Then the call is self-documenting:

freq = histogram( myList )
Raymond Hettinger 16 years, 1 month ago  # | flag

Channeling Guido. Guido almost certainly would find the OP's code to be abhorrent. Admonitions against premature optimization do not equate to a directive to write insanely inefficient code that does not scale. Compression to a single line is only a worthy goal if there is no sacrifice in clarity or performance.

Raymond Hettinger 16 years, 1 month ago  # | flag

O(n log n) variation. Equivalent to: sort myList | uniq -c.

>>> from itertools import groupby
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('1', 4), ('2', 1), ('3', 2), ('4', 1)]

Thanks Raymond, for the improvement. I have revised the recipe to match your implementation.

Nick Galbreath 11 years, 9 months ago  # | flag

At least on my computer, this can be made ~2x faster by eliminating the temporary list.

from itertools import groupby
def ilen(iterable):
    count = 0
    for i in iterable:
        count += 1
    return count
[(k, ilen(g)) for k, g in groupby(sorted(myList))]

I'm sure there is a better implementation for "ilen". Yeah, i know the recipe said "one-liner", but hopes this helps anyways.

Created by Jason Whitlark on Sun, 11 Apr 2004 (PSF)
Python recipes (4591)
Jason Whitlark's recipes (4)

Required Modules

  • (none specified)

Other Information and Tasks