Completely eliminates rounding errors and loss of significance from catastrophic cancellation during summation. Achieves exactness by keeping full-precision intermediate subtotals. Offers several alternative approaches, each using a different technique to store the exact subtotals.

Python, 109 lines
from __future__ import division     # needed so that lsum_26() below ends with true division

def msum(iterable):
    "Full precision summation using multiple floats for intermediate values"
    # Rounded x+y stored in hi with the round-off stored in lo.  Together
    # hi+lo are exactly equal to x+y.  The inner loop applies hi/lo summation
    # to each partial so that the list of partial sums remains exact.
    # Depends on IEEE-754 arithmetic guarantees.  See proof of correctness at:
    # www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps

    partials = []               # sorted, non-overlapping partial sums
    for x in iterable:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    return sum(partials, 0.0)


from math import frexp

def lsum(iterable):
    "Full precision summation using long integers for intermediate values"
    # Transform (exactly) a float to m * 2 ** e where m and e are integers.
    # Adjust (tmant,texp) and (mant,exp) to make texp the common exponent.
    # Given a common exponent, the mantissas can be summed directly.

    tmant, texp = 0L, 0
    for x in iterable:
        mant, exp = frexp(x)
        mant, exp = long(mant * 2.0 ** 53), exp-53
        if texp > exp:
            tmant <<= texp - exp
            texp = exp
        else:
            mant <<= exp - texp
        tmant += mant
    return float(str(tmant)) * 2.0 ** texp

from itertools import imap

def lsum_26(iterable):
    "Full precision summation using long integers for intermediate values"
    # Py2.6 and later version of lsum() relying on float.as_integer_ratio().
    # Saves work by accumulating separate sums for each power-of-two
    # denominator and then doing the bit shifts on the combined totals.
    fractions = {}
    fractions_get = fractions.get
    # Coerce to float first so that integer inputs work with float.as_integer_ratio.
    for n, d in imap(float.as_integer_ratio, imap(float, iterable)):
        fractions[d] = fractions_get(d, 0) + n
    tn, td = 0, 1
    for d, n in sorted(fractions.items()):
        while td < d:
            td <<= 1
            tn <<= 1
        tn += n
    return tn / td

from decimal import getcontext, Decimal, Inexact
getcontext().traps[Inexact] = True

def dsum(iterable):
    "Full precision summation using Decimal objects for intermediate values"
    # Transform (exactly) a float to m * 2 ** e where m and e are integers.
    # Convert (mant, exp) to a Decimal and add to the cumulative sum.
    # If the precision is too small for exact conversion and addition,
    # then retry with a larger precision.

    total = Decimal(0)
    for x in iterable:
        mant, exp = frexp(x)
        mant, exp = int(mant * 2.0 ** 53), exp-53
        while True:
            try:
                total += mant * Decimal(2) ** exp
                break
            except Inexact:
                getcontext().prec += 1
    return float(total)

from fractions import Fraction

def frsum(iterable):
    "Full precision summation using fractions for itermediate values"
    return float(sum(imap(Fraction.from_float, iterable)))


from random import random, gauss, shuffle

def test():
    for j in xrange(1000):
        vals = [7, 1e100, -7, -1e100, -9e-20, 8e-20] * 10
        s = 0
        for i in range(200):
            v = gauss(0, random()) ** 7 - s
            s += v
            vals.append(v)
        shuffle(vals)
        assert msum(vals) == lsum(vals) == dsum(vals) == frsum(vals) == lsum_26(vals)
        print '.',
    print 'Tests Passed'

if __name__ == '__main__':
    test()

There are other recipes that mitigate round-off errors during floating point summation (see recipe 298339 for example). This one goes beyond mitigation and is provably exact. Other features include an O(n) typical runtime, a tiny memory footprint, and accepting any iterable input.
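
As a quick illustration of both the exactness and the iterable input (an interactive sketch in a Python 2.7 session, assuming ordinary IEEE-754 doubles):

>>> sum([0.1] * 10)                   # naive summation drifts
0.9999999999999999
>>> msum(0.1 for i in xrange(10))     # correctly rounded, and a generator works as input
1.0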

Tim Peters provided a good test case that defeats some other attempts at accurate summation:

>>> print msum([1, 1e100, 1, -1e100] * 10000)
20000.0

The msum() function maintains an array of partial sums stored as floats. The array is kept in increasing order of magnitude, except for the last entry, which can be zero. The partial sums are non-overlapping (the lowest non-zero bit of the larger value is greater than the highest bit of the smaller value). Together, the array components span the full range of precision in the exact sum. In practice, the list of partial sums rarely has more than a few entries.
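
To see the partials at work, here is a hypothetical helper (msum_partials(), not part of the recipe) that copies msum()'s loop but returns the list instead of sum(partials, 0.0):

def msum_partials(iterable):
    "Like msum(), but return the list of non-overlapping partial sums."
    partials = []
    for x in iterable:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    return partials

>>> msum_partials([1.0, 1e100, 1.0, -1e100])
[2.0, 0.0]

The 2.0 carries the low-order information that naive summation would lose to cancellation, and the trailing zero matches the note above about the last entry.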

For proof of msum()'s correctness, see Shewchuk's "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates" at http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps . The exact summation of two floats, x+y, is proved in theorem 6. Keeping the overall sum exact through hi/lo summations is proved in theorem 10. The results follow from IEEE-754 arithmetic guarantees such as subtraction being exact whenever the two components are within a factor of two of each other. The algorithms work under various rounding modes including round-to-nearest (the 754 default).
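
To see theorem 6's hi/lo decomposition in isolation (an interactive sketch in a Python 2.7 session, assuming ordinary IEEE-754 doubles), note that lo captures exactly what rounding discarded from hi:

>>> x, y = 1e100, 3.14159
>>> hi = x + y                # rounded: y is completely absorbed
>>> lo = y - (hi - x)         # the round-off error of hi
>>> hi, lo
(1e+100, 3.14159)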

The paper includes a caveat (in section 5) that the computation of lo may be incorrectly optimized away by some C compilers. This does not occur in Python code which has no algebraic optimizations beyond constant folding. Running dis.dis(msum) demonstrates that each addition and subtraction has its own opcode and has not been algebraically reduced.

A second function, lsum(), employs an alternative approach and provides an independent cross-check on msum(). It also computes a full precision floating point sum but keeps the exact intermediate subtotals in the form of an integer exponent and a full precision mantissa stored as a long integer. At each step, either the addend or cumulative sum is left shifted until the two have a common exponent, then the mantissas are summed directly. The running total is always exactly equal to: tmant * 2.0 ** texp.
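
A quick interactive check of the exact decomposition that lsum() relies on (Python 2 session); the identity x == mant * 2.0 ** exp holds at every step:

>>> from math import frexp
>>> mant, exp = frexp(3.5)
>>> mant, exp
(0.875, 2)
>>> mant, exp = long(mant * 2.0 ** 53), exp - 53
>>> mant, exp
(7881299347898368L, -51)
>>> mant * 2.0 ** exp
3.5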

Another approach is shown in a third function, dsum(), which uses Decimal objects for the exact intermediate subtotals. The math.frexp() function exactly splits the float into a mantissa and exponent. The mantissas are doubled and exponents are lowered until the two components are both integers. From these two integers, the float's value is exactly computed as a Decimal. The context precision grows as needed to accommodate all significant digits.
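
The same decomposition can be turned into an exact Decimal by hand (a small sketch; it sets a generous precision up front instead of using dsum()'s grow-and-retry loop):

from math import frexp
from decimal import getcontext, Decimal, Inexact

getcontext().traps[Inexact] = True      # any rounding now raises an exception
getcontext().prec = 80                  # generous; dsum() grows the precision on demand instead

x = 0.1
mant, exp = frexp(x)
mant, exp = int(mant * 2.0 ** 53), exp - 53
d = mant * Decimal(2) ** exp            # the exact Decimal value of the float 0.1
print float(d) == x                     # True -- nothing was lost in the conversion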

All five functions give identical answers, accurate to full precision. They each take a different approach to storing the exact intermediate subtotals (multiple floats, a long integer, a Decimal object, or a Fraction). After the summation, the exact result is rounded to a regular float, accurate to within about one half unit in the last place.

----- Additional notes ------------------------

The code for dsum() and lsum() assumes 53-bit precision floats. To remove that assumption, replace this line:

mant, exp = long(mant * 2.0 ** 53), exp-53

with:

while mant != long(mant):
    mant *= 2.0
    exp -= 1
mant = long(mant)
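
Equivalently, the precision-independent decomposition can be packaged as a small helper (a sketch with a hypothetical name, exact_mant_exp(); it is not part of the recipe):

from math import frexp

def exact_mant_exp(x):
    "Split a float exactly into (mantissa, exponent) without assuming 53-bit floats."
    mant, exp = frexp(x)
    while mant != long(mant):       # keep doubling until the mantissa is integral
        mant *= 2.0
        exp -= 1
    return long(mant), exp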

Also, dsum() and lsum() build the result float value using float(str(x)) where x is either a Decimal or a long integer. In both cases, CPython employs the platform strtod() function whose round-off/truncation behavior is not fully specified by C89 or IEEE-754's string-to-float conversion rules (special thanks to Tim Peters for this information). For exact subtotals falling almost exactly between two representable floating point values, the least significant bit can be misrounded by some systems (hence, the error bound is very slightly more than ½ unit in the last place). On my system, the 1 bit misrounding occurs infrequently (once out of every several thousand calls to lsum()). Regardless, since rounding occurs after exact summation, the result is consistent for all data orderings (i.e. the commutative law of addition holds exactly).

The lsum() function runs on Py2.0 and later (when augmented assignment was introduced). The msum() function runs on Py2.3 and later (when the builtin sum() function was introduced). Both msum() and lsum() are easily modified to work on very old versions of Python. In contrast, the dsum() function depends completely on Python 2.4's decimal module, and it does not run nearly as fast as msum() and lsum().

20 comments

hemanth sethuram 19 years ago  # | flag

Do you need list indexing? In the recipe, where you have used partials[i] and partials[i:], couldn't you have used partials.append()? Am I missing something subtle here?

--Hemanth Sethuram

gyro funch 19 years ago  # | flag

Credit earlier recipes. Please credit or note earlier recipes (e.g. 298339) that perform similar functions.

Reason for not using append. p[i:] = [x] appends after deleting all elements beyond i. If you print the partials array after each iteration, you will see that sometimes it grows, sometimes it shrinks, and sometimes it keeps the same length. All of that logic is captured by the slice assignment.
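
A tiny interactive illustration of that slice assignment (hypothetical values):

>>> p = [0.25, 2.0, 8.0, 64.0]
>>> i = 1
>>> p[i:] = [99.0]        # delete everything from index 1 onward, then append 99.0
>>> p
[0.25, 99.0]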

Dozens of variations of the code were tried out -- this was by far the simplest and fastest.

Why there is no reference to other recipes. Those recipes are not exact. They mitigate round-off issues but do not eliminate them. Certain datasets will completely defeat those efforts. In contrast, this recipe is provably exact in all cases.

This recipe was not derived from those other recipes. It is a direct translation of the cited research paper, which pre-dates the other postings.

gyro funch 19 years ago  # | flag

References are useful for people who want to discriminate between recipes. Your points are well taken (and your implementation of the cited work is excellent), but as the cookbook grows (and functionally-similar recipes multiply), it becomes more and more difficult for non-experts to determine which recipe to choose to carry out a particular task. Analogous to journal articles, it would be highly useful to users of the cookbook to see (in the recipe explanation) citations of existing related recipes and the improvements provided by the present code.

John Machin 18 years, 11 months ago  # | flag

I don't understand the point of having a non-zero "start" arg; it doesn't participate in the round-off error elimination. Why not initialise partials = [start] ?

Keith Briggs 18 years, 11 months ago  # | flag

provably exact? "This one goes beyond mitigation and is provably exact." This is subject to some quite delicate assumptions about the underlying FP arithmetic. I have done similar things in C which either work or don't work depending on the CPU rounding mode setting and compiler optimization flags. How do we know python has all the settings correct? In particular, all intermediate results must be correctly rounded to the 53-bit mantissa length. How do we know the analog of this for decimal arithmetic holds?

Raymond Hettinger (author) 18 years, 11 months ago  # | flag

Exactness. Exactness is proved in the referenced paper:

http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps

See the FAST-TWO-SUM primitive (theorem 6) and the GROW-EXPANSION algorithm (theorem 10). These theorems follow from IEEE-754 arithmetic guarantees.

Tim Peters 18 years, 11 months ago  # | flag

Because dsum() doesn't round at all. "How do we know the analog of this for decimal arithmetic holds?"

dsum() increases the precision, and retries, so long as the inexact exception is raised. Therefore no info is lost.

Tim Peters 18 years, 11 months ago  # | flag

754 is assumed. "This is subject to some quite delicate assumptions about the underlying FP arithmetic"

All the routines assume Python floats store no more than 53 significant mantissa bits, although fancier code could remove that assumption for the methods based on Decimal and longs.

The routine using native fp arithmetic also assumes IEEE-754 nearest/even rounding is in effect. Python does nothing to force that, but 754 requires that nearest/even be the starting rounding mode in a conforming system, and Python inherits that from the platform C (if the latter is in fact conforming).

Raymond Hettinger (author) 18 years, 11 months ago  # | flag

Keith had made his comment on an earlier revision of the recipe which erroneously claimed that msum() worked on Decimal inputs as well as binary floating point. That claim was subsequently removed from the description.

Nicky McLean 17 years, 8 months ago  # | flag

Why use decimal arithmetic and incur conversion costs? Why not stay with binary arithmetic, taking the mantissa as being exactly as presented? The multi-precision array spanning the dynamic range of the floating-point numbers could work in base 2 (bits), base 16 (four bits), or in bytes; whatever is least inconvenient when aligning each incoming number's bits ready for the addition.

FWIW, here's a faster version of lsum() written for Py2.6 and later. It takes advantage of the new as_integer_ratio() method. Running sums are kept for each possible denominator, then those are combined to a common denominator.

from __future__ import division
from itertools import imap

def lsum(iterable):
    "Full precision summation using long integers for intermediate values"
    fractions = {}
    fractions_get = fractions.get
    for n, d in imap(float.as_integer_ratio, iterable):
        fractions[d] = fractions_get(d, 0) + n
    tn, td = 0, 1
    for d, n in sorted(fractions.items()):
        while td < d:
            td <<= 1
            tn <<= 1
        tn += n
    return tn / td
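
For example, under the same assumptions it reproduces the Tim Peters test case:

>>> lsum([1.0, 1e100, 1.0, -1e100] * 10000)
20000.0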

Also, here's a Py2.6 version using rational arithmetic.

from fractions import Fraction

def rsum(iterable):
    "Full precision summation using fractions for intermediate values"
    return float(sum(imap(Fraction.from_float, iterable)))

Doug Cook 11 years, 11 months ago  # | flag

msum is not actually correct, even if proper rounding modes are in effect. The final partials array is exactly correct (as was proven in the paper you reference), but you cannot simply sum the partials and expect the sum to also be accurately rounded. Converting from the array of partials back to a single floating-point value was not discussed in the paper.

As a case in point, let's use decimal instead of float, and assume 3 digits of precision. Given that, add together the following 3 numbers:

1.00
0.00500
0.00000000000100

The exact sum is 1.00500000000100, which rounded to 3 digits is 1.01.

Partials would end up as [0.00000000000100, 0.00500, 1.00], which is exactly accurate but no better than the original 3 values (well, it sorted them for you, but very inefficiently). Applying sum to partials will give you 1.00, which is incorrect in the last digit. To be more general, the problem occurs when partials[n-2] is non-zero but smaller in magnitude than the epsilon of partials[n-1], and when partials[n-1] is exactly equal in magnitude to the epsilon of partials[n]. Rounded to working precision, partials[n-2] + partials[n-1] is then the same as partials[n-1], and partials[n-1] + partials[n] is the same as partials[n].
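
That failure is easy to reproduce with the decimal module (a sketch in a fresh interpreter, default half-even rounding, precision forced down to 3 digits):

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 3
>>> partials = [Decimal('1.00E-12'), Decimal('0.00500'), Decimal('1.00')]
>>> sum(partials, Decimal(0))          # naive left-to-right sum of the partials
Decimal('1.00')

The correctly rounded result would be Decimal('1.01').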

As an aside, the original paper gives some optimizations for doing the sums that would convert msum() from O(n**2) to O(n log n), but the code is much more complicated.

Mark Dickinson 11 years, 6 months ago  # | flag

@Doug Cook. Agreed. The version implemented in C in Python's math module has the appropriate fix for this, and the Python code in a comment in that module refers to 'sum_exact' rather than 'sum' in the last line for exactly this reason.
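
For reference, that fixed C version is available as math.fsum() in Python 2.6 and later:

>>> from math import fsum
>>> fsum([1, 1e100, 1, -1e100] * 10000)
20000.0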

Oscar Benjamin 10 years, 7 months ago  # | flag

Adding a special case in msum() for Decimals seems to make it work for me:

if partials and isinstance(partials[0], Decimal):
    fresult = sum(map(Fraction.from_decimal, partials))
    # Is the line below guaranteed to round correctly?
    return Decimal(fresult.numerator) / Decimal(fresult.denominator)
else:
    return sum(partials, 0.0)

Or am I missing something?

It would be nice to duck-type that somehow e.g.:

return type(partials[0])(sum(map(Fraction, partials)))

which almost works in 3.3 except that Decimal.__new__ rejects Fractions.

Dmitry Pesegov 8 years ago  # | flag

A note for those who will try to port this code to Delphi.

  1. Don't even start to implement lsum(). It looks like there are no denormalized values to worry about in Python. For example, if you execute this Python 3 code (https://repl.it/BxV1/1):

import sys
from math import frexp

x = sys.float_info.epsilon
t_mant, t_exp = frexp(x)
print('t_mant=', t_mant)
print('t_exp=', t_exp)

the result will be:

t_mant= 0.5
t_exp= -51

But if you want to handle denormalized values, you will need a much smaller epsilon in absolute value. And then, if in Delphi you are using the 10-byte Extended type with a 64-bit mantissa, the binary exponent will fall outside the 64-bit mantissa within which you can do the shifting used by the algorithm. For the 10-byte Extended type, frexp() of its denormal epsilon gives a binary exponent of -16444. So there are two further options: 1. drop all denormalized values (discard anything whose exponent after frexp() is lower than -64); 2. use many 64-bit mantissa blocks. Neither option is usable, in my opinion.

  2. msum() is fine to implement, because it uses the Kahan principle of recovering the lost low-order digits, and it does not matter how many bits are in the mantissa of your float type. But it is better to use a presorted array and feed the elements from the zero point outward, alternating left and right like a swing: then you avoid overflow (as long as MinTypeValue <= (sum_of_pos_elements + sum_of_neg_elements) <= MaxTypeValue) and you minimize the lost low-order digits, because the magnitude of the elements grows monotonically. So, two steps: 1. presort; 2. swing-like msum (I am implementing msum at the moment; the presort and swing-like element feeding are already implemented).

michael.reshko 6 years, 10 months ago  # | flag

Isn't there a problem with the lsum() (and lsum_26()) algorithms at "tmant += mant"? What happens when this overflows? For example, try adding up a 1-million-element array of 1e-6.