Whenever I feel the impulse to write a class, I pause for a moment and think whether I can get away with a closure. This time I will be using closures to readably and flexibly customize the sorting of sequences (for Pythons 2.1 through 2.3).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 | # -- use cases: --
def usecases():
from pprint import pprint
print '\n--- Sorting Simple Lists ---\n'
data = ['Jim', 'Will', 'Tom']
print data
data.sort( By(len) )
print data
print '\n--- Sorting Simple Tables ---\n'
data = [
('Jim', 21),
('Will', 24),
('Tom', 20),
]
pprint(data)
data.sort( ByColumn(0) ) # using straight forward comparator closure
pprint(data)
data.sort( By(Column(1)) ) # comparator/accessor closure combination
pprint(data)
print '\n--- Sorting Object Tables ---\n'
class Record:
def __init__(self, **kw): self.__dict__.update(kw)
def __repr__(self): return '<record %r>' % self.__dict__
data = [
Record(name='Jim', age=21),
Record(name='Will', age=24),
Record(name='Tom', age=20),
]
pprint(data)
data.sort( ByAttribute('name') ) # using straight forward comparator closure
pprint(data)
data.sort( By(Attribute('age')) ) # comparator/accessor closure combination
pprint(data)
# -- a straight forward approach: --
def ByColumn(key):
def compare(obj1, obj2): return cmp(obj1[key], obj2[key])
return compare
def ByAttribute(name):
def compare(obj1, obj2): return cmp(getattr(obj1, name), getattr(obj2, name))
return compare
# -- a somewhat more generic approach: --
def By(accessor):
def compare(left, right): return cmp( accessor(left), accessor(right) )
return compare
def Attribute(name):
def get(obj): return getattr(obj, name)
return get
def Column(key):
def get(obj): return obj[key]
return get
# -- demo: --
if __name__ == '__main__':
usecases()
|
Innumerous programming cases involve the sorting of tabular data. Such data is most commonly represented as a sequence, where each element represents a table row. There are 3 major representations of a row:
- A tuple that contains the column values. Preserves column ordering.
- A dictionary that maps column names to column values. Assumes named columns.
- A python object that contains column values in attributes named by the colmns. Assumes columns names are valid python identifiers.
Usually, the sorting needs to occur in more then one way, and sorting by a table column (any table column) is a typical requirement.
The list.sort method supports custom sort ordering by accepting a comparator function. A typical idiom for customized sorting is the use of lambda. For example: <pre>listOfTuples.sort(lambda p, q: cmp(p[2], q[2])) listOfDicts.sort(lambda p, q: cmp(p['name'], q['name'])) listOfObjects.sort(lambda p, q: cmp(p.name, q.name)) </pre>What I don't like here is that lambda expression magic in the middle of the code. I'd like to have it more readable: <pre> listOfTuples.sort( ByColumn(2) ) listOfDicts.sort( ByColumn('name') ) listOfObjects.sort( ByAttribute('name') ) </pre>This finally leads me to the closures in the listing.
By coding the closures, it hits me that, for homogeneous sequences, THE COMPARISON METHOD IS NOT THE VARYING CONCEPT. It is the method that selects values for comparison. This leads me to a more generic comparator closure, By, and two accessor closures, Attribute and Column. In conjunction with appropriate naming, they facilitate sorting code that reads almost like a book: <pre> names.sort( By(len) ) cities.sort( By(Attribute('location')) ) matrix.sort( By(Column(5)) ) </pre>Closures to the max, right?
Cheers and happy sorting!
key, itemgetter, attrgetter. With Python V.2.4 you can do your examples as:
data.sort(key=len)
Or:
sorted(data, key=len)
from operator import itemgetter, attrgetter
sorted(data, key=itemgetter(0))
sorted(data, key=itemgetter(1))
print sorted(data, key=attrgetter("name"))
print sorted(data, key=attrgetter("age"))
Or:
print sorted(data, key=lambda r: r.name)
print sorted(data, key=lambda r: r.age)
Right ... ... I did this for a py2.3 app. Nice to see that python is going where I need it ;-)
what closures? I'm missing something here. Where is there a closure in any of this code? Over what are you closing? When is some part of the environment being captured into a function?
no closures. Been over it again... Looks to me like you think 'ByAttribute' etc are closing over the objects they will access, but they aren't. They are just producing functions that will take those objcts as arguments, and 'sort' is passing the arguments to them on every call. These are just plain vanilla functions, taking args and returning values. No part of the environment is closed over at the time of function creation. In fact, if there were closures, sort wouldn't need to keep passing the objects in each time for comparison, because the closures would have captured the values of those objects at the time they were created.
those are definitely closures. ByAttribute(name) and ByColumn(key) are both returning functions that enclose the 'name' and the 'key' respectively -- hence closures. Same goes for the more generic By(accessor), Attribute(name) and Column(key) functions.
for example. If I say "cmp_fn = ByAttribute('age')", then cmp_fn knows that it's job is to compare two dicts by their 'age' as opposed to comparing by some other key. cmp_fn is closed around the string 'age'.
and they are... As Eric Dobs points out, they ARE closures. (Thanks Eric)
See
http://en.wikipedia.org/wiki/Closure_%28computer_science%29
for a definition expressed better then I could ;)