Using 'reduce' and 'map', this code shows how a matrix vector multiplication can be reduced to a single loop.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | ```
#!/usr/bin/env python
""" Matrix-vector multiplication """
def matmult(m, v):
nrows = len(m)
w = [None] * nrows
for row in range(nrows):
w[row] = reduce(lambda x,y: x+y, map(lambda x,y: x*y, m[row], v))
return w
#................................................
if __name__=='__main__':
m, n = 2, 3
vec = range(1, n+1)
mat = [map(lambda x: i*n+x+1, range(n)) for i in range(m)]
print 'vec=', vec
print 'mat=', mat
print 'mat . vec=', matmult(mat, vec)
``` |

There are numerous methods to compute the matrix vector operation. The above method is compact and elegant. However, it is not the fastest. For some reason, the following brute force approach is faster by about 10%:

def matmult2(m, v): " this is faster " nrows = len(m) ncols = len(m[0]) w = [None] * nrows for row in range(nrows): sum = 0 for col in range(ncols): sum += m[row][col]*v[col] w[row] = sum return w

This is somewhat counter-intuitive since there is an additional nested loop.

It shows that performance in Python is a tricky affair. Indeed, the test driver can be further accelerated by a factor 5 to 10 simply by initializing mat = [map(lambda x: float(i*n+x+1), range(n)) for i in range(m)] with floats for large n!

--Alex.

a faster code.I think the reason that the brute-force code is faster than the functional-style one is in that 'map' on a list would generate a new intermediate list, which is avoided in the direct implementation. Following this rule, the matrix multiplication could be accelerated a little bit like this:More Speed.Xunning Yue's code is hard to beat for its clarity. Here a couple more code snippets for your speed tests.Remember, map and reduce are generally fast but were slowed down in the first example by lambda and the unnecessary intermediate lists. The operator module takes care of the lambda problem. Xunning's approach takes care of the intermediate lists. Map also helps by pre-allocating the right amount of space which is something a list comprehension cannot do.

And the winner is...I love Raymond's code (matmult4) though Xunnin's remains the fastest, by a small margin. These are my test results obtained on PIII 600Mhz for a 800x800 matrix. I give here the ratios of CPU time to the CPU time taken by matmult: matmult2=>0.76 matmult3=>0.68 matmult4=>0.72 matmult5=>0.73. Well done!Another suggestion.Looping is apparently faster. Keeping this in mind, one has merely to ensure no computational power is wasted by repeatedly performing the same tasks multiple times. For instance initializing a list of indices for the nested loop saves 5% and extracting the current row of the matrix once also roughly saves 5%. Introducing local variables is therefor optimal, hence you will find this to be faster: