Pivot/Crosstab/Denormalization of a Normalized List « Python recipes

This function creates a cross-tab or pivot table from a normalised input table. Use this function to 'denormalize' a table of normalized records.

      def pivot(table, left, top, value):
    """
    Creates a cross-tab or pivot table from a normalised input table. Use this
    function to 'denormalize' a table of normalized records.
    
    * The table argument can be a list of dictionaries or a Table object.
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
    * The left argument is a tuple of headings which are displayed down the 
    left side of the new table.
    * The top argument is a tuple of headings which are displayed across the 
    top of the new table.
    Tuples are used so that multiple element headings and columns can be used.
    
    Eg. To transform the list (listOfDicts):
    
    Name,   Year,  Value
    -----------------------
    'Simon', 2004, 32
    'Russel', 2004, 64
    'Simon', 2005, 128
    'Russel', 2005, 32
    
    into the new list:
    
    'Name',   2004, 2005
    ------------------------
    'Simon',  32,     128
    'Russel',  64,     32
    
    you would call pivot with the arguments:
    
    newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')
    
    """
    rs = {}
    ysort = []
    xsort = []
    for row in table:
        yaxis = tuple([row[c] for c in left])
        if yaxis not in ysort: ysort.append(yaxis)
        xaxis = tuple([row[c] for c in top])
        if xaxis not in xsort: xsort.append(xaxis)
        try:
            rs[yaxis]
        except KeyError:
            rs[yaxis] = {}
        if xaxis not in rs[yaxis]:
            rs[yaxis][xaxis] = 0
        rs[yaxis][xaxis] += row[value]
            
    headings = list(left)
    headings.extend(xsort)
    
    t = []
    #If you want a list of dictionaries returned, use a cheaper alternative,
    #the Table class at:
    #       http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621
    #and replace the above line with this code:
    #t = Table(*headings)

    for left in ysort:
        row = list(left)
        row.extend([rs[left][x] for x in rs[left]])
        t.append(dict(zip(headings,row)))
    return t


if __name__ == "__main__":
    import random
    #Build a list of dictionaries
    c = "Employee","Year","Month","Value"
    d = []
    for y in xrange(2003,2005):
        for m in xrange(1,13):
            for e in xrange(1,6):
                d.append(dict(zip(c,(e,y,m,random.randint(10,90)))))
    #pivot the list contents using the 'Employee' field for the left column,
    #and the 'Year' field for the top heading and the 'Value' field for each
    #cell in the new table.
    t = pivot(d,["Employee"],["Year"],"Value")
    for row in t:
        print row

      

This is a common operation, esp. when dealing with result sets returned from a SQL query.

Tags: database

5 comments

Giovanni Ulivi 19 years, 2 months ago # | flag

how to take care of missing or unordered data.

Hi, I tried the pivot function on a csv file that has both missing data and unordered fields, and I found that it misplaces all the values.
I modified the function to take care of these problems.
I commented the few lines that I have added or modified, and changed the example at the top.
I hope this can help.
Giovanni Ulivi


def pivot(table, left, top, value):
    """
    Creates a cross-tab or pivot table from a normalised input table. Use this
    function to 'denormalize' a table of normalized records.

    * The table argument can be a list of dictionaries or a Table object.
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
    * The left argument is a tuple of headings which are displayed down the
    left side of the new table.
    * The top argument is a tuple of headings which are displayed across the
    top of the new table.
    Tuples are used so that multiple element headings and columns can be used.

    E.g. To transform the list (listOfDicts):

    Name,   Year,  Value
    -----------------------
    'Simon', 2004, 32
    'Simon', 2005, 128
    'Russel', 2004, 64
    'Eric', 2004, 52
    'Russel', 2005, 32

    into the new list:

    'Name',   2004, 2005
    ------------------------
    'Simon',  32,     128
    'Russel',  64,     32
    'Eric',   52,     NA

    you would call pivot with the arguments:

    newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')

    """
    rs = {}
    ysort = []
    xsort = []
    for row in table:
        yaxis = tuple([row[c] for c in left])       # e.g. yaxis = ('Simon',)
        if yaxis not in ysort: ysort.append(yaxis)
        xaxis = tuple([row[c] for c in top])        # e.g. xaxis = ('2004',)
        if xaxis not in xsort: xsort.append(xaxis)
        try:
            rs[yaxis]
        except KeyError:
            rs[yaxis] = {}
        if xaxis not in rs[yaxis]:
            rs[yaxis][xaxis] = 0
        rs[yaxis][xaxis] += row[value]

    """
    In the following loop we take care of missing data,
    e.g 'Eric' has a value in 2004 but not in 2005
    """
    for key in rs:
        if len(rs[key]) <pre>
Hi, I tried the pivot function on a csv file that has both missing data and unordered fields, and I found that it misplaces all the values.
I modified the function to take care of these problems.
I commented the few lines that I have added or modified, and changed the example at the top.
I hope this can help.
Giovanni Ulivi

(comment continued...)

Giovanni Ulivi 19 years, 2 months ago # | flag

(...continued from previous comment)

def pivot(table, left, top, value):
    """
    Creates a cross-tab or pivot table from a normalised input table. Use this
    function to 'denormalize' a table of normalized records.

    * The table argument can be a list of dictionaries or a Table object.
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
    * The left argument is a tuple of headings which are displayed down the
    left side of the new table.
    * The top argument is a tuple of headings which are displayed across the
    top of the new table.
    Tuples are used so that multiple element headings and columns can be used.

    E.g. To transform the list (listOfDicts):

    Name,   Year,  Value
    -----------------------
    'Simon', 2004, 32
    'Simon', 2005, 128
    'Russel', 2004, 64
    'Eric', 2004, 52
    'Russel', 2005, 32

    into the new list:

    'Name',   2004, 2005
    ------------------------
    'Simon',  32,     128
    'Russel',  64,     32
    'Eric',   52,     NA

    you would call pivot with the arguments:

    newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')

    """
    rs = {}
    ysort = []
    xsort = []
    for row in table:
        yaxis = tuple([row[c] for c in left])       # e.g. yaxis = ('Simon',)
        if yaxis not in ysort: ysort.append(yaxis)
        xaxis = tuple([row[c] for c in top])        # e.g. xaxis = ('2004',)
        if xaxis not in xsort: xsort.append(xaxis)
        try:
            rs[yaxis]
        except KeyError:
            rs[yaxis] = {}
        if xaxis not in rs[yaxis]:
            rs[yaxis][xaxis] = 0
        rs[yaxis][xaxis] += row[value]

    """
    In the following loop we take care of missing data,
    e.g 'Eric' has a value in 2004 but not in 2005
    """
    for key in rs:
        if len(rs[key])

</pre>

Giovanni Ulivi 19 years, 2 months ago # | flag

how to take care of missing or unordered data.

Sorry for the badly formatted previous post.
I didn't think of the minus sign in the middle of the program.

I tried the pivot function on a csv file that has both missing data and unordered fields,
and I found that it misplaces all the values.
I modified the function to take care of these problems.
I commented the few lines that I have added or modified, and changed the example at the top.
I hope this can help.
Giovanni Ulivi


def pivot(table, left, top, value):
    """
    Creates a cross-tab or pivot table from a normalised input table. Use this
    function to 'denormalize' a table of normalized records.

    * The table argument can be a list of dictionaries or a Table object.
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334621)
    * The left argument is a tuple of headings which are displayed down the
    left side of the new table.
    * The top argument is a tuple of headings which are displayed across the
    top of the new table.
    Tuples are used so that multiple element headings and columns can be used.

    E.g. To transform the list (listOfDicts):

    Name,   Year,  Value
    -----------------------
    'Simon', 2004, 32
    'Simon', 2005, 128
    'Russel', 2004, 64
    'Eric', 2004, 52
    'Russel', 2005, 32

    into the new list:

    'Name',   2004, 2005
    ------------------------
    'Simon',  32,     128
    'Russel',  64,     32
    'Eric',   52,     NA

    you would call pivot with the arguments:

    newList = pivot(listOfDicts, ('Name',), ('Year',), 'Value')

    """
    rs = {}
    ysort = []
    xsort = []
    for row in table:
        yaxis = tuple([row[c] for c in left])       # e.g. yaxis = ('Simon',)
        if yaxis not in ysort: ysort.append(yaxis)
        xaxis = tuple([row[c] for c in top])        # e.g. xaxis = ('2004',)
        if xaxis not in xsort: xsort.append(xaxis)
        try:
            rs[yaxis]
        except KeyError:
            rs[yaxis] = {}
        if xaxis not in rs[yaxis]:
            rs[yaxis][xaxis] = 0
        rs[yaxis][xaxis] += row[value]

    """
    In the following loop we take care of missing data,
    e.g 'Eric' has a value in 2004 but not in 2005
    """
    for key in rs:
        if len(rs[key]) &lt; len(xsort):
            for var in xsort:
                if var not in rs[key].keys():
                    rs[key][var] = ''

    headings = list(left)
    headings.extend(xsort)

    t = []

(comment continued...)

Giovanni Ulivi 19 years, 2 months ago # | flag

(...continued from previous comment)

    """
    The lists 'sortedkeys' and 'sortedvalues' make sure that
     even if the field 'top' is unordered, data will be transposed correctly.
    E.g. in the example above the table rows are not ordered by the year
    """

    for left in ysort:
        row = list(left)
        sortedkeys = rs[left].keys()
        sortedkeys.sort()
        sortedvalues = map(rs[left].get, sortedkeys)
        row.extend(sortedvalues)
        t.append(dict(zip(headings,row)))
    return t

Billy Earney 16 years, 2 months ago # | flag

still a bug in your code. I tried using your pivot function and it works well except that the x axis headers are messed up. I think these need to be sorted..

Try adding xsort.sort() between the 2 heading lines as shown below..

headings = list(left)
xsort.sort()
headings.extend(xsort)

Thanks for a great function!

◄	Python recipes (4591)	►
◄	S W's recipes (20)	►

Pivot/Crosstab/Denormalization of a Normalized List (Python recipe) by S W
ActiveState Code (http://code.activestate.com/recipes/334695/)

5 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Pivot/Crosstab/Denormalization of a Normalized List (Python recipe) by S W ActiveState Code (http://code.activestate.com/recipes/334695/)

5 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Pivot/Crosstab/Denormalization of a Normalized List (Python recipe) by S W
ActiveState Code (http://code.activestate.com/recipes/334695/)