
Using 'reduce' and 'map', this code shows how a matrix-vector multiplication can be reduced to a single loop.

Python, 20 lines
#!/usr/bin/env python

""" Matrix-vector multiplication """

def matmult(m, v):
    nrows = len(m)
    w = [None] * nrows
    for row in range(nrows):
        w[row] = reduce(lambda x,y: x+y, map(lambda x,y: x*y, m[row], v))
    return w
   

#................................................
if __name__=='__main__':
    m, n = 2, 3
    vec = range(1, n+1)
    mat = [map(lambda x: i*n+x+1, range(n)) for i in range(m)]
    print 'vec=', vec
    print 'mat=', mat
    print 'mat . vec=', matmult(mat, vec)
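This recipe predates Python 3. There, `reduce` has moved into `functools`, `map` returns an iterator rather than a list, and `print` is a function, so a direct port of the recipe (a sketch, not part of the original) would look like this:

```python
from functools import reduce
import operator

def matmult(m, v):
    """Matrix-vector product via reduce/map (Python 3 port of the recipe)."""
    # operator.add/operator.mul replace the lambdas of the original
    return [reduce(operator.add, map(operator.mul, row, v)) for row in m]

mat = [[1, 2, 3], [4, 5, 6]]
vec = [1, 2, 3]
print(matmult(mat, vec))  # [14, 32]
```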

There are numerous ways to compute a matrix-vector product. The method above is compact and elegant. However, it is not the fastest. For some reason, the following brute-force approach is faster by about 10%:

def matmult2(m, v):
    "this is faster"
    nrows = len(m)
    ncols = len(m[0])
    w = [None] * nrows
    for row in range(nrows):
        sum = 0
        for col in range(ncols):
            sum += m[row][col]*v[col]
        w[row] = sum
    return w

This is somewhat counter-intuitive since there is an additional nested loop.

It shows that performance in Python is a tricky affair. Indeed, for large n the test driver can be accelerated by a further factor of 5 to 10 simply by filling the matrix with floats: mat = [map(lambda x: float(i*n+x+1), range(n)) for i in range(m)]
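The timings quoted here are from 2002-era interpreters and will differ on a modern one; a minimal Python 3 harness for reproducing the comparison yourself (the matrix size and repeat count below are arbitrary choices of mine) might look like:

```python
import timeit
from functools import reduce

def matmult(m, v):
    """Functional style, as in the recipe (ported to Python 3)."""
    return [reduce(lambda x, y: x + y,
                   map(lambda x, y: x * y, row, v)) for row in m]

def matmult2(m, v):
    """Brute-force nested loops, as in the recipe."""
    w = [0] * len(m)
    for row in range(len(m)):
        s = 0
        for col in range(len(v)):
            s += m[row][col] * v[col]
        w[row] = s
    return w

n = 200
vec = [float(i) for i in range(n)]
mat = [[float(i * n + j) for j in range(n)] for i in range(n)]

for f in (matmult, matmult2):
    t = timeit.timeit(lambda: f(mat, vec), number=20)
    print('%-9s %.4f s' % (f.__name__, t))
```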

--Alex.

4 comments

Xunning Yue 22 years ago  # | flag

A faster version. I think the reason the brute-force code is faster than the functional-style one is that 'map' on a list generates a new intermediate list, which the direct implementation avoids. Following this rule, the matrix-vector multiplication can be accelerated a little, like this:

def inner_prod(v1, v2):
    'inner product of two vectors.'
    sum = 0
    for i in xrange(len(v1)):
        sum += v1[i] * v2[i]
    return sum

def matmult3(m, v):
    'multiply matrix by vector via the inner product.'
    return [inner_prod(r, v) for r in m]
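In Python 3, xrange is gone; the same idea can be written with zip and the builtin sum (a sketch of mine, not part of the original comment):

```python
def inner_prod(v1, v2):
    """Inner product of two vectors via zip/sum (Python 3 sketch)."""
    return sum(a * b for a, b in zip(v1, v2))

def matmult3(m, v):
    """Multiply matrix by vector, one inner product per row."""
    return [inner_prod(r, v) for r in m]

print(matmult3([[1, 0], [0, 2]], [3, 4]))  # [3, 8]
```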
Raymond Hettinger 22 years ago  # | flag

More Speed. Xunning Yue's code is hard to beat for its clarity. Here are a couple more code snippets for your speed tests.

Remember, map and reduce are generally fast but were slowed down in the first example by lambda and the unnecessary intermediate lists. The operator module takes care of the lambda problem. Xunning's approach takes care of the intermediate lists. Map also helps by pre-allocating the right amount of space, which is something a list comprehension cannot do.

import operator
def matmult4(m, v):
    return [reduce(operator.add, map(operator.mul, r, v)) for r in m]

def matmult5(m, v):
    return map(lambda r: reduce(operator.add, map(operator.mul, r, v)), m)
Alexander Pletzer (author) 21 years, 11 months ago  # | flag

And the winner is... I love Raymond's code (matmult4), though Xunning's remains the fastest, by a small margin. These are my test results obtained on a PIII 600 MHz for an 800x800 matrix. I give here the ratios of CPU time to the CPU time taken by matmult: matmult2=>0.76 matmult3=>0.68 matmult4=>0.72 matmult5=>0.73. Well done!

N/A Glinsvad 17 years ago  # | flag

Another suggestion. Looping is apparently faster. Keeping this in mind, one merely has to ensure no computational power is wasted by repeatedly performing the same tasks. For instance, initializing a list of indices for the nested loop saves about 5%, and extracting the current row of the matrix once saves roughly another 5%. Introducing local variables therefore pays off, hence you will find this to be faster:

def matmult6(m, v):
    rows = len(m)
    w = [0]*rows
    irange = range(len(v))
    sum = 0
    for j in range(rows):
        r = m[j]
        for i in irange:
            sum += r[i]*v[i]
        w[j],sum = sum,0
    return w
Created by Alexander Pletzer on Fri, 19 Apr 2002 (PSF)