ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups
Submit Recipe
My Recipes

All Recipes
All Cookbooks


View by Category

Title: Lazy sorting
Submitter: Matteo Dell'Amico (other recipes)
Last Updated: 2004/05/03
Version no: 1.1
Category: Searching

 

5 stars 2 vote(s)


Description:

This generator will sort only the requested part of a list. Useful for getting just the first few elements of a list and avoiding the overhead of sorting the whole thing.

Source: Text Source

import heapq

def isorted(iterable):
    lst = list(iterable)
    heapq.heapify(lst)
    pop = heapq.heappop
    while lst:
        yield pop(lst)

Discussion:

This second version is by Raymond Hettiger, and is both simpler and faster than my original one, based on a generator-based quicksort.

The name isorted follows the convention from the itertools package and mimics the new python 2.4 "sorted" builtin.

Hope this is useful for someone, I needed more than once something like this.



Add comment

Number of comments: 6

generator-based quicksort, George Yoshida, 2004/04/28
I modified Matteo's original script so that isorted can take any iterable as an argument.

from itertools import ifilter, count

try:
    from itertools import tee
except ImportError, err:
    def tee(iterable):
        def gen(next, data={}, cnt=[0]):
            for i in count():
                if i == cnt[0]:
                    item = data[i] = next()
                    cnt[0] += 1
                else:
                    item = data.pop(i)
                yield item
        it = iter(iterable)
        return (gen(it.next), gen(it.next))

def isorted(iterable):
    iterable = iter(iterable)
    try:
        pivot = iterable.next()
    except:
        return

    a, b = tee(iterable)
    for x in isorted(ifilter(lambda item:item<pivot, a)):
        yield x
    yield pivot
    for x in isorted(ifilter(lambda item:item>=pivot, b)):
        yield x
Note : my generator-based quicksort does not construct a list.
I used itertools.tee, which will appear in Python 2.4.
Add comment

stable version, Daniel Harding, 2004/04/28
This version guarantees stability by selecting the first element as the pivot (this results in poor performance if the list is already sorted, but is no worse than the original version). Also, unlike the original version, it never modifies the list that is passed into it.

MIN_SIZE = 128

def isorted(lst):
    if len(lst) <= MIN_SIZE:
        lst = lst[:]
        lst.sort()
        for x in lst:
            yield x
    else:
        pivot = lst[0]
        lst = lst[1:]
        lt = [x for x in lst if x < pivot]
        for x in isorted(lt):
            yield x
        yield pivot
        ge = [x for x in lst if x >= pivot]
        for x in isorted(ge):
            yield x

Add comment

Matteo Dell'Amico, 2004/04/29
George and Daniel, thanks for your comments. I did some test, and saw that the problem in all these versions is performance: python's builtin list.sort is so good that there are really few cases in which these solutions are better. The real performance bonus would be implementing this in C.
If someone who has "the right to speak" :-) thinks this is worth the itertools package, I'd be happy to study how to implement it.
Add comment

Simplification and speedup with heapq, Raymond Hettinger, 2004/05/01
In terms of use cases, lazy sorting is valuable only when you don't plan on consuming all of the output (otherwise, you might as well use a regular sort). Choosing the n smallest items out of a list would be a good application.
Keeping this in mind, a heap is a natural choice of underlying data structures:

import heapq

def isorted(iterable):
    lst = list(iterable)
    heapq.heapify(lst)
    pop = heapq.heappop
    while lst:
        yield pop(lst)
isorted() is not an especially good candidate for itertools because it immediately manifests the whole input into memory and because most its use cases tend to be dominated by list.sort() and heapq.
Add comment

Matteo Dell'Amico, 2004/05/03
Raymond, you're absolutely right. Although, I argue that a well-written lazy sort could be almost as efficient as list.sort. Not that it would matter, since heapsort is just fine. I'm adopting your modification for version 2. :-)
Add comment

For 2.4 and later, consider using heapq's nsmallest or nlargest functions, Jim Baker, 2006/08/31
The timsort implementation used by sorted and sort is great. But laziness can still win out. In my experience, lazy sorting is significantly more efficient when a percentage of the query is being used as seen in top N type queries. Such queries very popular for user interfaces! Consequently I like the fact that 2.4 added this functionality in heapq.nlargest and heapq.nsmallest.
There should be no surprise that lazy sorting can be more efficient, we are comparing O(n + k logn) = O(n) -- if k is a constant as it is with top 50 -- vs O(n logn).
Of course that percentage will vary with the constant overhead you will see in your code. As usual, try it both ways if the sorting is in a key piece of code, as it was in mine.
Add comment



Highest rated recipes:

1. A simple XML-RPC server

2. Web service accessible ...

3. Wrapping template engine ...

4. Assignment in expression

5. SOLVING THE METACLASS ...

6. Povray for python

7. Calling Windows API ...

8. Generic filter logic ...

9. Function Decorators by ...

10. MS SQL Server log monitor




Privacy Policy | Email Opt-out | Feedback | Syndication
© 2006 ActiveState Software Inc. All rights reserved.