ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups
Submit Recipe
My Recipes

All Recipes
All Cookbooks


View by Category

Title: Closures For Highly Readable Sequence Sorting Customization
Submitter: Zoran Isailovski (other recipes)
Last Updated: 2006/01/25
Version no: 1.5
Category: OOP

 

5 stars 1 vote(s)


Description:

Whenever I feel the impulse to write a class, I pause for a moment and think whether I can get away with a closure. This time I will be using closures to readably and flexibly customize the sorting of sequences (for Pythons 2.1 through 2.3).

Source: Text Source

# -- use cases: --

def usecases():
   from pprint import pprint

   print '\n--- Sorting Simple Lists ---\n'
   data = ['Jim', 'Will', 'Tom']
   print data
   data.sort( By(len) )
   print data

   print '\n--- Sorting Simple Tables ---\n'
   data = [
      ('Jim', 21),
      ('Will', 24),
      ('Tom', 20),
      ]
   pprint(data)
   data.sort( ByColumn(0) ) # using straight forward comparator closure
   pprint(data)
   data.sort( By(Column(1)) ) # comparator/accessor closure combination
   pprint(data)

   print '\n--- Sorting Object Tables ---\n'
   class Record:
      def __init__(self, **kw): self.__dict__.update(kw)
      def __repr__(self): return '<record %r>' % self.__dict__
   data = [
      Record(name='Jim', age=21),
      Record(name='Will', age=24),
      Record(name='Tom', age=20),
      ]
   pprint(data)
   data.sort( ByAttribute('name') ) # using straight forward comparator closure
   pprint(data)
   data.sort( By(Attribute('age')) ) # comparator/accessor closure combination
   pprint(data)


# -- a straight forward approach: --

def ByColumn(key):
   def compare(obj1, obj2): return cmp(obj1[key], obj2[key])
   return compare

def ByAttribute(name):
   def compare(obj1, obj2): return cmp(getattr(obj1, name), getattr(obj2, name))
   return compare

# -- a somewhat more generic approach: --

def By(accessor):
   def compare(left, right): return cmp( accessor(left), accessor(right) )
   return compare
def Attribute(name):
   def get(obj): return getattr(obj, name)
   return get
def Column(key):
   def get(obj): return obj[key]
   return get

# -- demo: --

if __name__ == '__main__':
   usecases()

Discussion:

Innumerous programming cases involve the sorting of tabular data. Such data is most commonly represented as a sequence, where each element represents a table row. There are 3 major representations of a row:

- A tuple that contains the column values. Preserves column ordering.
- A dictionary that maps column names to column values. Assumes named columns.
- A python object that contains column values in attributes named by the colmns. Assumes columns names are valid python identifiers.

Usually, the sorting needs to occur in more then one way, and sorting by a table column (any table column) is a typical requirement.

The list.sort method supports custom sort ordering by accepting a comparator function. A typical idiom for customized sorting is the use of lambda. For example:

listOfTuples.sort(lambda p, q: cmp(p[2], q[2]))
listOfDicts.sort(lambda p, q: cmp(p['name'], q['name']))
listOfObjects.sort(lambda p, q: cmp(p.name, q.name))
What I don't like here is that lambda expression magic in the middle of the code. I'd like to have it more readable:
listOfTuples.sort( ByColumn(2) )
listOfDicts.sort( ByColumn('name') )
listOfObjects.sort( ByAttribute('name') )
This finally leads me to the closures in the listing.

By coding the closures, it hits me that, for homogeneous sequences, THE COMPARISON METHOD IS NOT THE VARYING CONCEPT. It is the method that selects values for comparison. This leads me to a more generic comparator closure, By, and two accessor closures, Attribute and Column. In conjunction with appropriate naming, they facilitate sorting code that reads almost like a book:
names.sort( By(len) )
cities.sort( By(Attribute('location')) )
matrix.sort( By(Column(5)) )
Closures to the max, right?


Cheers and happy sorting!



Add comment

Number of comments: 7

key, itemgetter, attrgetter, bearophile -, 2006/01/23
With Python V.2.4 you can do your examples as:
data.sort(key=len)
Or:
sorted(data, key=len)

from operator import itemgetter, attrgetter
sorted(data, key=itemgetter(0))
sorted(data, key=itemgetter(1))

print sorted(data, key=attrgetter("name"))
print sorted(data, key=attrgetter("age"))

Or:
print sorted(data, key=lambda r: r.name)
print sorted(data, key=lambda r: r.age)
Add comment

Right ..., Zoran Isailovski, 2006/01/24
... I did this for a py2.3 app. Nice to see that python is going where I need it ;-)
Add comment

what closures?, tim lawrence, 2006/07/08
I'm missing something here. Where is there a closure in any of this code? Over what are you closing? When is some part of the environment being captured into a function?
Add comment

no closures, tim lawrence, 2006/07/08
Been over it again... Looks to me like you think 'ByAttribute' etc are closing over the objects they will access, but they aren't. They are just producing functions that will take those objcts as arguments, and 'sort' is passing the arguments to them on every call. These are just plain vanilla functions, taking args and returning values. No part of the environment is closed over at the time of function creation. In fact, if there were closures, sort wouldn't need to keep passing the objects in each time for comparison, because the closures would have captured the values of those objects at the time they were created.
Add comment

and they are..., Zoran Isailovski, 2006/09/26
As Eric Dobs points out, they ARE closures. (Thanks Eric)

See

http://en.wikipedia.org/wiki/Closure_%28computer_science%29

for a definition expressed better then I could ;)
Add comment

those are definitely closures, Eric Dobbs, 2006/09/20
ByAttribute(name) and ByColumn(key) are both returning functions that enclose the 'name' and the 'key' respectively -- hence closures. Same goes for the more generic By(accessor), Attribute(name) and Column(key) functions.
Add comment

for example, Eric Dobbs, 2006/09/20
If I say "cmp_fn = ByAttribute('age')", then cmp_fn knows that it's job is to compare two dicts by their 'age' as opposed to comparing by some other key. cmp_fn is closed around the string 'age'.
Add comment



Highest rated recipes:

1. A simple XML-RPC server

2. Web service accessible ...

3. IPy Notify

4. Treat the Win32 Registry ...

5. a friendly mkdir()

6. Wrapping template engine ...

7. Assignment in expression

8. Changing return value ...

9. Implementation of sets ...

10. bag collection class




Privacy Policy | Email Opt-out | Feedback | Syndication
© 2006 ActiveState Software Inc. All rights reserved.