|
|
 |
|
Title: Closures For Highly Readable Sequence Sorting Customization
Submitter: Zoran Isailovski
(other recipes)
Last Updated: 2006/01/25
Version no: 1.5
Category:
OOP
|
|
1 vote(s)
|
|
|
|
Description:
Whenever I feel the impulse to write a class, I pause for a moment and think whether I can get away with a closure. This time I will be using closures to readably and flexibly customize the sorting of sequences (for Pythons 2.1 through 2.3).
Source: Text Source
def usecases():
from pprint import pprint
print '\n--- Sorting Simple Lists ---\n'
data = ['Jim', 'Will', 'Tom']
print data
data.sort( By(len) )
print data
print '\n--- Sorting Simple Tables ---\n'
data = [
('Jim', 21),
('Will', 24),
('Tom', 20),
]
pprint(data)
data.sort( ByColumn(0) )
pprint(data)
data.sort( By(Column(1)) )
pprint(data)
print '\n--- Sorting Object Tables ---\n'
class Record:
def __init__(self, **kw): self.__dict__.update(kw)
def __repr__(self): return '<record %r>' % self.__dict__
data = [
Record(name='Jim', age=21),
Record(name='Will', age=24),
Record(name='Tom', age=20),
]
pprint(data)
data.sort( ByAttribute('name') )
pprint(data)
data.sort( By(Attribute('age')) )
pprint(data)
def ByColumn(key):
def compare(obj1, obj2): return cmp(obj1[key], obj2[key])
return compare
def ByAttribute(name):
def compare(obj1, obj2): return cmp(getattr(obj1, name), getattr(obj2, name))
return compare
def By(accessor):
def compare(left, right): return cmp( accessor(left), accessor(right) )
return compare
def Attribute(name):
def get(obj): return getattr(obj, name)
return get
def Column(key):
def get(obj): return obj[key]
return get
if __name__ == '__main__':
usecases()
Discussion:
Innumerous programming cases involve the sorting of tabular data. Such data is most commonly represented as a sequence, where each element represents a table row. There are 3 major representations of a row:
- A tuple that contains the column values. Preserves column ordering.
- A dictionary that maps column names to column values. Assumes named columns.
- A python object that contains column values in attributes named by the colmns. Assumes columns names are valid python identifiers.
Usually, the sorting needs to occur in more then one way, and sorting by a table column (any table column) is a typical requirement.
The list.sort method supports custom sort ordering by accepting a comparator function. A typical idiom for customized sorting is the use of lambda. For example:
listOfTuples.sort(lambda p, q: cmp(p[2], q[2]))
listOfDicts.sort(lambda p, q: cmp(p['name'], q['name']))
listOfObjects.sort(lambda p, q: cmp(p.name, q.name))
What I don't like here is that lambda expression magic in the middle of the code. I'd like to have it more readable:
listOfTuples.sort( ByColumn(2) )
listOfDicts.sort( ByColumn('name') )
listOfObjects.sort( ByAttribute('name') )
This finally leads me to the closures in the listing.
By coding the closures, it hits me that, for homogeneous sequences, THE COMPARISON METHOD IS NOT THE VARYING CONCEPT. It is the method that selects values for comparison. This leads me to a more generic comparator closure, By, and two accessor closures, Attribute and Column. In conjunction with appropriate naming, they facilitate sorting code that reads almost like a book:
names.sort( By(len) )
cities.sort( By(Attribute('location')) )
matrix.sort( By(Column(5)) )
Closures to the max, right?
Cheers and happy sorting!
|
|
Add comment
|
|
Number of comments: 7
key, itemgetter, attrgetter, bearophile -, 2006/01/23
With Python V.2.4 you can do your examples as:
data.sort(key=len)
Or:
sorted(data, key=len)
from operator import itemgetter, attrgetter
sorted(data, key=itemgetter(0))
sorted(data, key=itemgetter(1))
print sorted(data, key=attrgetter("name"))
print sorted(data, key=attrgetter("age"))
Or:
print sorted(data, key=lambda r: r.name)
print sorted(data, key=lambda r: r.age)
Add comment
Right ..., Zoran Isailovski, 2006/01/24
... I did this for a py2.3 app. Nice to see that python is going where I need it ;-)
Add comment
what closures?, tim lawrence, 2006/07/08
I'm missing something here. Where is there a closure in any of this code? Over what are you closing? When is some part of the environment being captured into a function?
Add comment
no closures, tim lawrence, 2006/07/08
Been over it again... Looks to me like you think 'ByAttribute' etc are closing over the objects they will access, but they aren't. They are just producing functions that will take those objcts as arguments, and 'sort' is passing the arguments to them on every call. These are just plain vanilla functions, taking args and returning values. No part of the environment is closed over at the time of function creation. In fact, if there were closures, sort wouldn't need to keep passing the objects in each time for comparison, because the closures would have captured the values of those objects at the time they were created.
Add comment
and they are..., Zoran Isailovski, 2006/09/26
As Eric Dobs points out, they ARE closures. (Thanks Eric)
See
http://en.wikipedia.org/wiki/Closure_%28computer_science%29
for a definition expressed better then I could ;)
Add comment
those are definitely closures, Eric Dobbs, 2006/09/20
ByAttribute(name) and ByColumn(key) are both returning functions that enclose the 'name' and the 'key' respectively -- hence closures. Same goes for the more generic By(accessor), Attribute(name) and Column(key) functions.
Add comment
for example, Eric Dobbs, 2006/09/20
If I say "cmp_fn = ByAttribute('age')", then cmp_fn knows that it's job is to compare two dicts by their 'age' as opposed to comparing by some other key. cmp_fn is closed around the string 'age'.
Add comment
|
|
|
|
|
 |
|