This generator will sort only the requested part of a list. Useful for getting just the first few elements of a list and avoiding the overhead of sorting the whole thing.
1 2 3 4 5 6 7 8 | import heapq
def isorted(iterable):
lst = list(iterable)
heapq.heapify(lst)
pop = heapq.heappop
while lst:
yield pop(lst)
|
This second version is by Raymond Hettiger, and is both simpler and faster than my original one, based on a generator-based quicksort.
The name isorted follows the convention from the itertools package and mimics the new python 2.4 "sorted" builtin.
Hope this is useful for someone, I needed more than once something like this.
generator-based quicksort. I modified Matteo's original script so that isorted can take any iterable as an argument.
Note : my generator-based quicksort does not construct a list.
I used itertools.tee, which will appear in Python 2.4.
stable version. This version guarantees stability by selecting the first element as the pivot (this results in poor performance if the list is already sorted, but is no worse than the original version). Also, unlike the original version, it never modifies the list that is passed into it.
George and Daniel, thanks for your comments. I did some test, and saw that the problem in all these versions is performance: python's builtin list.sort is so good that there are really few cases in which these solutions are better. The real performance bonus would be implementing this in C.
If someone who has "the right to speak" :-) thinks this is worth the itertools package, I'd be happy to study how to implement it.
Simplification and speedup with heapq. In terms of use cases, lazy sorting is valuable only when you don't plan on consuming all of the output (otherwise, you might as well use a regular sort). Choosing the n smallest items out of a list would be a good application.
Keeping this in mind, a heap is a natural choice of underlying data structures:
isorted() is not an especially good candidate for itertools because it immediately manifests the whole input into memory and because most its use cases tend to be dominated by list.sort() and heapq.
Raymond, you're absolutely right. Although, I argue that a well-written lazy sort could be almost as efficient as list.sort. Not that it would matter, since heapsort is just fine. I'm adopting your modification for version 2. :-)
For 2.4 and later, consider using heapq's nsmallest or nlargest functions. The timsort implementation used by sorted and sort is great. But laziness can still win out. In my experience, lazy sorting is significantly more efficient when a percentage of the query is being used as seen in top N type queries. Such queries very popular for user interfaces! Consequently I like the fact that 2.4 added this functionality in heapq.nlargest and heapq.nsmallest.
There should be no surprise that lazy sorting can be more efficient, we are comparing O(n + k logn) = O(n) -- if k is a constant as it is with top 50 -- vs O(n logn).
Of course that percentage will vary with the constant overhead you will see in your code. As usual, try it both ways if the sorting is in a key piece of code, as it was in mine.