Built-in "sum" function, as well as add.reduce functions in Numeric/numarray introduce a large error when summing large arrays of like elements. I got relative error of about 1e-9 after summing 10 million doubles between 0 and 1. Function below has error less than 1e-15, doesn't use any additional memory (although it destroys the data array), and also runs asymptotically faster for unlimited precision numbers.

1 2 3 4 5 6 7 8 9 10 11 12 | ```
import math
def sum_with_partials(arr):
size = len(arr)
iters = math.ceil(math.log(size)/math.log(2))
for itr in range(int(iters)):
step = 2**itr
for i in xrange(0, size, 2**(itr+1)):
next_i = i+step
if next_i<size:
arr[i]+=arr[next_i]
return arr[0]
``` |

You can see that sum([0.123456789012345]*10000000) will be off by almost a 0.001 from the true answer. The solution above will give exact answer for this array. I also got absolute errors less than 1e-15 when trying it on arrays of random numbers. For unlimited precision numbers it also works asymptotically faster. More precisely regular sum takes O(n(d+log d)) while function above takes O(nd) where n is number of elements in array, d is number of digits in each element.

On my machine the function above takes 28% faster for adding 10,000 Decimals than built-in sum. For longs, it takes 10 times longer, presumably because the built-in is optimized for longs

That only works for lists, the Python implementation works for arbitrary sequences (anything iterable).

Note on asymptotic time.The O(n(d+log d)) time is forexactdecimal/binary arithmetic.In some cases, floating point manipulations are done with a fixed maximum precision. After this precision, digits are truncated. This could be called

truncated arithmetic, and it would take O(nd) time to perform a regular Python sum() in this mode.Simpler, more precise, but slower version.Nice recipe!A simpler, more precise (with slowly varying numbers), but slower and more memory-consuming version would be:

The advantage of this scheme is that only numbers with similar values are summed together, for smoothly varying elements; this precedure minimizes numerical errors, in floating-point arithmetic. In the original recipe, lists with 2**n+1 elements especially suffer from a loss of precision when the last element is added to the sum of all the previous ones.

(Side remarks: Speaking of speed, the original recipe is about 5x slower than the built-in sum with Python 2.4.2, on my machine--but the cost in time is certainly worth the price, in some situations.)

Speed tips.I haven't checked to see what impact this has on the whole algorithm's performace, but 1<<10 is about 30% faster than 2**10 for me (python ought to optimise that one!) Also, you might save a bit by importing ceil and log into the module's namespace, rather than looking up two symbols each time.