ActiveState Code

Recipe 334695: Pivot/Crosstab/Denormalization of a Normalized List


This function creates a cross-tab or pivot table from a normalised input table. Use this function to 'denormalize' a table of normalized records.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
def pivot(table, left, top, value):
    """
    Creates a cross-tab or pivot table from a normalised input table. Use this
    function to 'denormalize' a table of normalized records.
    
    * The table argument can be a list of dictionaries or a Table object.
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
    * The left argument is a tuple of headings which are displayed down the 
    left side of the new table.
    * The top argument is a tuple of headings which are displayed across the 
    top of the new table.
    Tuples are used so that multiple element headings and columns can be used.
    
    Eg. To transform the list (listOfDicts):
    
    Name,   Year,  Value
    -----------------------
    'Simon', 2004, 32
    'Russel', 2004, 64
    'Simon', 2005, 128
    'Russel', 2005, 32
    
    into the new list:
    
    'Name',   2004, 2005
    ------------------------
    'Simon',  32,     128
    'Russel',  64,     32
    
    you would call pivot with the arguments:
    
    newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')
    
    """
    rs = {}
    ysort = []
    xsort = []
    for row in table:
        yaxis = tuple([row[c] for c in left])
        if yaxis not in ysort: ysort.append(yaxis)
        xaxis = tuple([row[c] for c in top])
        if xaxis not in xsort: xsort.append(xaxis)
        try:
            rs[yaxis]
        except KeyError:
            rs[yaxis] = {}
        if xaxis not in rs[yaxis]:
            rs[yaxis][xaxis] = 0
        rs[yaxis][xaxis] += row[value]
            
    headings = list(left)
    headings.extend(xsort)
    
    t = []
    #If you want a list of dictionaries returned, use a cheaper alternative,
    #the Table class at:
    #       http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621
    #and replace the above line with this code:
    #t = Table(*headings)

    for left in ysort:
        row = list(left)
        row.extend([rs[left][x] for x in rs[left]])
        t.append(dict(zip(headings,row)))
    return t


if __name__ == "__main__":
    import random
    #Build a list of dictionaries
    c = "Employee","Year","Month","Value"
    d = []
    for y in xrange(2003,2005):
        for m in xrange(1,13):
            for e in xrange(1,6):
                d.append(dict(zip(c,(e,y,m,random.randint(10,90)))))
    #pivot the list contents using the 'Employee' field for the left column,
    #and the 'Year' field for the top heading and the 'Value' field for each
    #cell in the new table.
    t = pivot(d,["Employee"],["Year"],"Value")
    for row in t:
        print row

Discussion

This is a common operation, esp. when dealing with result sets returned from a SQL query.

Comments

  1. 1. At 3:32 a.m. on 8 feb 2005, Giovanni Ulivi said:

    how to take care of missing or unordered data.

    Hi, I tried the pivot function on a csv file that has both missing data and unordered fields, and I found that it misplaces all the values.
    I modified the function to take care of these problems.
    I commented the few lines that I have added or modified, and changed the example at the top.
    I hope this can help.
    Giovanni Ulivi
    
    
    def pivot(table, left, top, value):
        """
        Creates a cross-tab or pivot table from a normalised input table. Use this
        function to 'denormalize' a table of normalized records.
    
        * The table argument can be a list of dictionaries or a Table object.
        (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
        * The left argument is a tuple of headings which are displayed down the
        left side of the new table.
        * The top argument is a tuple of headings which are displayed across the
        top of the new table.
        Tuples are used so that multiple element headings and columns can be used.
    
        E.g. To transform the list (listOfDicts):
    
        Name,   Year,  Value
        -----------------------
        'Simon', 2004, 32
        'Simon', 2005, 128
        'Russel', 2004, 64
        'Eric', 2004, 52
        'Russel', 2005, 32
    
        into the new list:
    
        'Name',   2004, 2005
        ------------------------
        'Simon',  32,     128
        'Russel',  64,     32
        'Eric',   52,     NA
    
        you would call pivot with the arguments:
    
        newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')
    
        """
        rs = {}
        ysort = []
        xsort = []
        for row in table:
            yaxis = tuple([row[c] for c in left])       # e.g. yaxis = ('Simon',)
            if yaxis not in ysort: ysort.append(yaxis)
            xaxis = tuple([row[c] for c in top])        # e.g. xaxis = ('2004',)
            if xaxis not in xsort: xsort.append(xaxis)
            try:
                rs[yaxis]
            except KeyError:
                rs[yaxis] = {}
            if xaxis not in rs[yaxis]:
                rs[yaxis][xaxis] = 0
            rs[yaxis][xaxis] += row[value]
    
        """
        In the following loop we take care of missing data,
        e.g 'Eric' has a value in 2004 but not in 2005
        """
        for key in rs:
            if len(rs[key]) <pre>
    Hi, I tried the pivot function on a csv file that has both missing data and unordered fields, and I found that it misplaces all the values.
    I modified the function to take care of these problems.
    I commented the few lines that I have added or modified, and changed the example at the top.
    I hope this can help.
    Giovanni Ulivi
    

    (comment continued...)

  2. 2. At 3:32 a.m. on 8 feb 2005, Giovanni Ulivi said:

    (...continued from previous comment)

    def pivot(table, left, top, value):
        """
        Creates a cross-tab or pivot table from a normalised input table. Use this
        function to 'denormalize' a table of normalized records.
    
        * The table argument can be a list of dictionaries or a Table object.
        (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
        * The left argument is a tuple of headings which are displayed down the
        left side of the new table.
        * The top argument is a tuple of headings which are displayed across the
        top of the new table.
        Tuples are used so that multiple element headings and columns can be used.
    
        E.g. To transform the list (listOfDicts):
    
        Name,   Year,  Value
        -----------------------
        'Simon', 2004, 32
        'Simon', 2005, 128
        'Russel', 2004, 64
        'Eric', 2004, 52
        'Russel', 2005, 32
    
        into the new list:
    
        'Name',   2004, 2005
        ------------------------
        'Simon',  32,     128
        'Russel',  64,     32
        'Eric',   52,     NA
    
        you would call pivot with the arguments:
    
        newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')
    
        """
        rs = {}
        ysort = []
        xsort = []
        for row in table:
            yaxis = tuple([row[c] for c in left])       # e.g. yaxis = ('Simon',)
            if yaxis not in ysort: ysort.append(yaxis)
            xaxis = tuple([row[c] for c in top])        # e.g. xaxis = ('2004',)
            if xaxis not in xsort: xsort.append(xaxis)
            try:
                rs[yaxis]
            except KeyError:
                rs[yaxis] = {}
            if xaxis not in rs[yaxis]:
                rs[yaxis][xaxis] = 0
            rs[yaxis][xaxis] += row[value]
    
        """
        In the following loop we take care of missing data,
        e.g 'Eric' has a value in 2004 but not in 2005
        """
        for key in rs:
            if len(rs[key])
    

    </pre>

  3. 3. At 3:53 a.m. on 8 feb 2005, Giovanni Ulivi said:

    how to take care of missing or unordered data.

    Sorry for the badly formatted previous post.
    I didn't think of the minus sign in the middle of the program.
    
    I tried the pivot function on a csv file that has both missing data and unordered fields,
    and I found that it misplaces all the values.
    I modified the function to take care of these problems.
    I commented the few lines that I have added or modified, and changed the example at the top.
    I hope this can help.
    Giovanni Ulivi
    
    
    def pivot(table, left, top, value):
        """
        Creates a cross-tab or pivot table from a normalised input table. Use this
        function to 'denormalize' a table of normalized records.
    
        * The table argument can be a list of dictionaries or a Table object.
        (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
        * The left argument is a tuple of headings which are displayed down the
        left side of the new table.
        * The top argument is a tuple of headings which are displayed across the
        top of the new table.
        Tuples are used so that multiple element headings and columns can be used.
    
        E.g. To transform the list (listOfDicts):
    
        Name,   Year,  Value
        -----------------------
        'Simon', 2004, 32
        'Simon', 2005, 128
        'Russel', 2004, 64
        'Eric', 2004, 52
        'Russel', 2005, 32
    
        into the new list:
    
        'Name',   2004, 2005
        ------------------------
        'Simon',  32,     128
        'Russel',  64,     32
        'Eric',   52,     NA
    
        you would call pivot with the arguments:
    
        newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')
    
        """
        rs = {}
        ysort = []
        xsort = []
        for row in table:
            yaxis = tuple([row[c] for c in left])       # e.g. yaxis = ('Simon',)
            if yaxis not in ysort: ysort.append(yaxis)
            xaxis = tuple([row[c] for c in top])        # e.g. xaxis = ('2004',)
            if xaxis not in xsort: xsort.append(xaxis)
            try:
                rs[yaxis]
            except KeyError:
                rs[yaxis] = {}
            if xaxis not in rs[yaxis]:
                rs[yaxis][xaxis] = 0
            rs[yaxis][xaxis] += row[value]
    
        """
        In the following loop we take care of missing data,
        e.g 'Eric' has a value in 2004 but not in 2005
        """
        for key in rs:
            if len(rs[key]) &lt; len(xsort):
                for var in xsort:
                    if var not in rs[key].keys():
                        rs[key][var] = ''
    
        headings = list(left)
        headings.extend(xsort)
    
        t = []
    

    (comment continued...)

  4. 4. At 3:53 a.m. on 8 feb 2005, Giovanni Ulivi said:

    (...continued from previous comment)

        """
        The lists 'sortedkeys' and 'sortedvalues' make sure that
         even if the field 'top' is unordered, data will be transposed correctly.
        E.g. in the example above the table rows are not ordered by the year
        """
    
        for left in ysort:
            row = list(left)
            sortedkeys = rs[left].keys()
            sortedkeys.sort()
            sortedvalues = map(rs[left].get, sortedkeys)
            row.extend(sortedvalues)
            t.append(dict(zip(headings,row)))
        return t
    
  5. 5. At 2:13 p.m. on 1 feb 2008, Billy Earney said:

    still a bug in your code. I tried using your pivot function and it works well except that the x axis headers are messed up. I think these need to be sorted..

    Try adding xsort.sort() between the 2 heading lines as shown below..

    headings = list(left)
    xsort.sort()
    headings.extend(xsort)
    

    Thanks for a great function!

Sign in to comment