Welcome, guest | Sign In | My Account | Store | Cart

Built-in "sum" function, as well as add.reduce functions in Numeric/numarray introduce a large error when summing large arrays of like elements. I got relative error of about 1e-9 after summing 10 million doubles between 0 and 1. Function below has error less than 1e-15, doesn't use any additional memory (although it destroys the data array), and also runs asymptotically faster for unlimited precision numbers.

Python, 12 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import math

def sum_with_partials(arr):
  size = len(arr)
  iters = math.ceil(math.log(size)/math.log(2))
  for itr in range(int(iters)):
    step = 2**itr
    for i in xrange(0, size, 2**(itr+1)):
      next_i = i+step
      if next_i<size:
        arr[i]+=arr[next_i]
  return arr[0]

You can see that sum([0.123456789012345]*10000000) will be off by almost a 0.001 from the true answer. The solution above will give exact answer for this array. I also got absolute errors less than 1e-15 when trying it on arrays of random numbers. For unlimited precision numbers it also works asymptotically faster. More precisely regular sum takes O(n(d+log d)) while function above takes O(nd) where n is number of elements in array, d is number of digits in each element.

On my machine the function above takes 28% faster for adding 10,000 Decimals than built-in sum. For longs, it takes 10 times longer, presumably because the built-in is optimized for longs

4 comments

Bob Ippolito 19 years, 8 months ago  # | flag

That only works for lists, the Python implementation works for arbitrary sequences (anything iterable).

Connelly Barnes 19 years, 8 months ago  # | flag

Note on asymptotic time. The O(n(d+log d)) time is for exact decimal/binary arithmetic.

In some cases, floating point manipulations are done with a fixed maximum precision. After this precision, digits are truncated. This could be called truncated arithmetic, and it would take O(nd) time to perform a regular Python sum() in this mode.

  • Connelly
Eric-Olivier LE BIGOT 18 years, 5 months ago  # | flag

Simpler, more precise, but slower version. Nice recipe!

A simpler, more precise (with slowly varying numbers), but slower and more memory-consuming version would be:

def sum_with_partials(elements):
  if len(elements) == 1: return elements[0]
  return sum_with_partials(elements[::2]) + sum_with_partials(elements[1::2])

The advantage of this scheme is that only numbers with similar values are summed together, for smoothly varying elements; this precedure minimizes numerical errors, in floating-point arithmetic. In the original recipe, lists with 2**n+1 elements especially suffer from a loss of precision when the last element is added to the sum of all the previous ones.

(Side remarks: Speaking of speed, the original recipe is about 5x slower than the built-in sum with Python 2.4.2, on my machine--but the cost in time is certainly worth the price, in some situations.)

Chris Fuller 17 years, 5 months ago  # | flag

Speed tips. I haven't checked to see what impact this has on the whole algorithm's performace, but 1<<10 is about 30% faster than 2**10 for me (python ought to optimise that one!) Also, you might save a bit by importing ceil and log into the module's namespace, rather than looking up two symbols each time.

Created by Yaroslav Bulatov on Wed, 4 Aug 2004 (PSF)
Python recipes (4591)
Yaroslav Bulatov's recipes (1)

Required Modules

Other Information and Tasks