Lists of data grouped by a key value are common; obvious examples are spreadsheets and other tabular arrangements of information. In many cases, the groupby function added to the itertools module in Python 2.4 provides an easy way to generate summaries of such information.
from itertools import groupby
from operator import itemgetter

def summary(data, key=itemgetter(0), value=itemgetter(1)):
    """Summarise the supplied data.

    Produce a summary of the data, grouped by the given key (default: the
    first item), and giving totals of the given value (default: the second
    item).

    The key and value arguments should be functions which, given a data
    record, return the relevant value.
    """
    # groupby only merges consecutive records with equal keys, so the
    # data is expected to arrive already ordered by key.
    for k, group in groupby(data, key):
        yield (k, sum(value(row) for row in group))

if __name__ == "__main__":
    # Example: given a set of sales data for city within region,
    # produce a sales report by region
    sales = [('Scotland', 'Edinburgh', 20000),
             ('Scotland', 'Glasgow', 12500),
             ('Wales', 'Cardiff', 29700),
             ('Wales', 'Bangor', 12800),
             ('England', 'London', 90000),
             ('England', 'Manchester', 45600),
             ('England', 'Liverpool', 29700)]

    for region, total in summary(sales, key=itemgetter(0), value=itemgetter(2)):
        # Parenthesised form works as a print statement and as a print call.
        print("%10s: %d" % (region, total))
In many situations, data is available in tabular form, where the information is naturally grouped by a subset of the data values. Examples include results from database queries or data from spreadsheets. It is often useful to produce summaries of the detailed data.
The groupby function (new in the Python 2.4 itertools module) is designed for handling such grouped data. It takes as input any iterable, along with an optional function to extract the "key" value from each record. It yields the key of each run of consecutive records that share a key, along with a new iterator which runs through the records in that run. Each of these group iterators shares the underlying input with groupby, so it should be consumed before moving on to the next group.
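As a minimal sketch of this behaviour, the following groups a small list of (key, value) pairs and copies each group into a list so it can be inspected. The records list and the groups name are hypothetical, chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records: (key, value) pairs, already ordered by key.
records = [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]

# groupby yields (key, group-iterator) pairs. Each group iterator shares
# the underlying input, so it is copied into a list before moving on.
groups = [(k, list(g)) for k, g in groupby(records, key=itemgetter(0))]

for k, rows in groups:
    print("%s: %r" % (k, rows))
```

Copying each group into a list is only needed when the records are used after the loop has advanced; a single pass such as sum() can consume the group iterator directly.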
A common use of the groupby function is to generate summary totals for a data set. The summary function defined above shows one way of doing this. For a summary report, two extraction functions are required: one to extract the "key", which is passed to the groupby function, and one to extract the values to be summarised.
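The two extraction functions need not be itemgetter calls. As a sketch, assuming records stored as dictionaries (the orders data and its field names are hypothetical), ordinary lambdas can supply both the key and a computed value. The summary generator is repeated here without defaults so the example stands alone.

```python
from itertools import groupby

def summary(data, key, value):
    # Same shape as the recipe's summary function, repeated here so the
    # sketch is self-contained.
    for k, group in groupby(data, key):
        yield (k, sum(value(row) for row in group))

# Hypothetical records held as dictionaries, already ordered by customer.
orders = [{'customer': 'acme', 'qty': 3, 'price': 10},
          {'customer': 'acme', 'qty': 1, 'price': 25},
          {'customer': 'zenith', 'qty': 2, 'price': 40}]

# The key function picks the grouping field; the value function computes
# the amount to be totalled for each record.
report = list(summary(orders,
                      key=lambda row: row['customer'],
                      value=lambda row: row['qty'] * row['price']))

for customer, total in report:
    print("%10s: %d" % (customer, total))
```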
Note that groupby does not sort its input. With unsorted data, the same key can therefore appear in several separate groups. If this is not appropriate, the list.sort method (or the sorted builtin) can be used to pre-sort the data, and the key function supplied to groupby can also be passed as the key argument to the sort.
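The effect of unsorted input, and the fix of sorting with the same key function, can be sketched as follows. The unsorted_sales data and the totals helper are hypothetical names used only for this illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sales records (region, city, amount), not ordered by region.
unsorted_sales = [('Wales', 'Cardiff', 29700),
                  ('Scotland', 'Edinburgh', 20000),
                  ('Wales', 'Bangor', 12800),
                  ('Scotland', 'Glasgow', 12500)]

by_region = itemgetter(0)
amount = itemgetter(2)

def totals(rows):
    # Total the amounts for each run of consecutive rows with equal region.
    return [(k, sum(amount(r) for r in g)) for k, g in groupby(rows, by_region)]

# Without sorting, each region appears once per run, so keys repeat.
fragmented = totals(unsorted_sales)

# Sorting with the same key function brings each region's rows together.
combined = totals(sorted(unsorted_sales, key=by_region))

print(fragmented)
print(combined)
```

Because the sort is stable, rows within each region keep their original relative order, which matters if the group contents are reported as well as totalled.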
This recipe provides a good illustration of how the new Python 2.4 features work well together. In addition to the groupby function, the operator.itemgetter convenience function is used to provide natural defaults for the summary function, and a generator expression is used as the argument to the sum() function. When sorted input is required, the new key argument to list.sort provides a convenient means to reuse an existing key function, and the sorted() builtin extends this to any iterable, not just lists.