Welcome, guest | Sign In | My Account | Store | Cart

Whenever I feel the impulse to write a class, I pause for a moment and think whether I can get away with a closure. This time I will be using closures to readably and flexibly customize the sorting of sequences (for Pythons 2.1 through 2.3).

Python, 65 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
# -- use cases: --

def usecases():
   from pprint import pprint

   print '\n--- Sorting Simple Lists ---\n'
   data = ['Jim', 'Will', 'Tom']
   print data
   data.sort( By(len) )
   print data

   print '\n--- Sorting Simple Tables ---\n'
   data = [
      ('Jim', 21),
      ('Will', 24),
      ('Tom', 20),
      ]
   pprint(data)
   data.sort( ByColumn(0) ) # using straight forward comparator closure
   pprint(data)
   data.sort( By(Column(1)) ) # comparator/accessor closure combination
   pprint(data)

   print '\n--- Sorting Object Tables ---\n'
   class Record:
      def __init__(self, **kw): self.__dict__.update(kw)
      def __repr__(self): return '<record %r>' % self.__dict__
   data = [
      Record(name='Jim', age=21),
      Record(name='Will', age=24),
      Record(name='Tom', age=20),
      ]
   pprint(data)
   data.sort( ByAttribute('name') ) # using straight forward comparator closure
   pprint(data)
   data.sort( By(Attribute('age')) ) # comparator/accessor closure combination
   pprint(data)


# -- a straight forward approach: --

def ByColumn(key):
   def compare(obj1, obj2): return cmp(obj1[key], obj2[key])
   return compare

def ByAttribute(name):
   def compare(obj1, obj2): return cmp(getattr(obj1, name), getattr(obj2, name))
   return compare

# -- a somewhat more generic approach: --

def By(accessor):
   def compare(left, right): return cmp( accessor(left), accessor(right) )
   return compare
def Attribute(name):
   def get(obj): return getattr(obj, name)
   return get
def Column(key):
   def get(obj): return obj[key]
   return get

# -- demo: --

if __name__ == '__main__':
   usecases()

Innumerous programming cases involve the sorting of tabular data. Such data is most commonly represented as a sequence, where each element represents a table row. There are 3 major representations of a row:

  • A tuple that contains the column values. Preserves column ordering.
  • A dictionary that maps column names to column values. Assumes named columns.
  • A python object that contains column values in attributes named by the colmns. Assumes columns names are valid python identifiers.

Usually, the sorting needs to occur in more then one way, and sorting by a table column (any table column) is a typical requirement.

The list.sort method supports custom sort ordering by accepting a comparator function. A typical idiom for customized sorting is the use of lambda. For example: <pre>listOfTuples.sort(lambda p, q: cmp(p[2], q[2])) listOfDicts.sort(lambda p, q: cmp(p['name'], q['name'])) listOfObjects.sort(lambda p, q: cmp(p.name, q.name)) </pre>What I don't like here is that lambda expression magic in the middle of the code. I'd like to have it more readable: <pre> listOfTuples.sort( ByColumn(2) ) listOfDicts.sort( ByColumn('name') ) listOfObjects.sort( ByAttribute('name') ) </pre>This finally leads me to the closures in the listing.

By coding the closures, it hits me that, for homogeneous sequences, THE COMPARISON METHOD IS NOT THE VARYING CONCEPT. It is the method that selects values for comparison. This leads me to a more generic comparator closure, By, and two accessor closures, Attribute and Column. In conjunction with appropriate naming, they facilitate sorting code that reads almost like a book: <pre> names.sort( By(len) ) cities.sort( By(Attribute('location')) ) matrix.sort( By(Column(5)) ) </pre>Closures to the max, right?

Cheers and happy sorting!

7 comments

bearophile - 15 years, 10 months ago  # | flag

key, itemgetter, attrgetter. With Python V.2.4 you can do your examples as:

data.sort(key=len)

Or:

sorted(data, key=len)

from operator import itemgetter, attrgetter

sorted(data, key=itemgetter(0))

sorted(data, key=itemgetter(1))

print sorted(data, key=attrgetter("name"))

print sorted(data, key=attrgetter("age"))

Or:

print sorted(data, key=lambda r: r.name)

print sorted(data, key=lambda r: r.age)

Zoran Isailovski (author) 15 years, 10 months ago  # | flag

Right ... ... I did this for a py2.3 app. Nice to see that python is going where I need it ;-)

tim lawrence 15 years, 5 months ago  # | flag

what closures? I'm missing something here. Where is there a closure in any of this code? Over what are you closing? When is some part of the environment being captured into a function?

tim lawrence 15 years, 5 months ago  # | flag

no closures. Been over it again... Looks to me like you think 'ByAttribute' etc are closing over the objects they will access, but they aren't. They are just producing functions that will take those objcts as arguments, and 'sort' is passing the arguments to them on every call. These are just plain vanilla functions, taking args and returning values. No part of the environment is closed over at the time of function creation. In fact, if there were closures, sort wouldn't need to keep passing the objects in each time for comparison, because the closures would have captured the values of those objects at the time they were created.

Eric Dobbs 15 years, 2 months ago  # | flag

those are definitely closures. ByAttribute(name) and ByColumn(key) are both returning functions that enclose the 'name' and the 'key' respectively -- hence closures. Same goes for the more generic By(accessor), Attribute(name) and Column(key) functions.

Eric Dobbs 15 years, 2 months ago  # | flag

for example. If I say "cmp_fn = ByAttribute('age')", then cmp_fn knows that it's job is to compare two dicts by their 'age' as opposed to comparing by some other key. cmp_fn is closed around the string 'age'.

Zoran Isailovski (author) 15 years, 2 months ago  # | flag

and they are... As Eric Dobs points out, they ARE closures. (Thanks Eric)

See

http://en.wikipedia.org/wiki/Closure_%28computer_science%29

for a definition expressed better then I could ;)

Created by Zoran Isailovski on Sun, 22 Jan 2006 (PSF)
Python recipes (4591)
Zoran Isailovski's recipes (13)

Required Modules

Other Information and Tasks