Kahan summation

Note to language implementers

This task touches on the details of number representation that is often hidden for convenience in many programming languages. You may need to understand the difference between precision in calculations and precision in presenting answers for example; or understand decimal arithmetic as distinct from the usual binary based floating point schemes used in many languages.

The Kahan summation algorithm is a method of summing a series of numbers represented in a limited precision in a way that minimises the loss of precision in the result.

The task is to follow the previously linked Wikipedia articles algorithm and its worked example by:

Do all arithmetic in decimal to a precision of six digits.
Write a function/method/procedure/subroutine/... to perform Kahan summation on an ordered collection of numbers, (such as a list of numbers).
Create the three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 respectively.
Show that the simple left-to-right summation, equivalent to (a + b) + c gives an answer of 10005.8
Show that the Kahan function applied to the sequence of values a, b, c results in the more precise answer of 10005.9

If your language does not have six digit decimal point, but does have some fixed precision floating point number system then state this and try this alternative task using floating point numbers:

Follow the same steps as for the main task description above but for the numbers a, b, c use the floating-point values 1.0, epsilon, -epsilon respectively where epsilon is determined by its final value after the following:

<lang python>epsilon = 1.0 while 1.0 + epsilon != 1.0:

   epsilon = epsilon / 2.0</lang>

The above should ensure that (a + b) + c != 1.0 whilst their Kahan sum does equal 1.0 . Only if this is not the case are you then free to find three values with this property of your own.

If your language has decimals, but not six digits of precision then you may follow the outline given in Kahan_summation#Python:_Arbitrary_precision_Decimal which uses ideas from the floating point case above.

In general

Show all output on this page.
If the floating point calculations are used then answers may depend on the hardware platform or compiler/interpreter tool-chain used for a language.
Slight deviations from the task description should be explained and the the subsequent difference in output explained.
All examples should have constants chosen to clearly show the benefit of Kahan summing!

F#

This example is incomplete. Slight deviations from the task description should be explained and the the subsequent difference in output explained. All examples should have constants chosen to clearly show the benefit of Kahan summing! - The J-language example for example explains its differences well. Please ensure that it meets all task requirements and remove this message.

Solving the alternative task with an recursive algorithm and IEEE 754 double precision datatype. <lang fsharp>let ε =

   let rec δ λ = 
       match λ+1.0 with 
       | 1.0 -> λ
       | _ -> δ(λ/2.0)
   δ(1.0)

let Σ numbers =

   let rec Σ numbers σ c = 
       match numbers with
       | [] -> σ
       | number :: rest -> 
           let y = number + c
           let t = σ + y
           let z = ((t - σ) - y)
           Σ rest t z
   Σ numbers 0.0 0.0

[<EntryPoint>] let main argv =

   let (a, b, c) = ( 1.0, ε, -ε )
   let input = [ a; b; c ]
   printfn "ε:     %e" ε
   printfn "Sum:   %e" (input |> List.sum)
   printfn "Kahan: %e" (input |> Σ)
   0</lang>

Output:

ε:     1.110223e-016
Sum:   1.000000e+000
Kahan: 1.000000e+000

Go

Go has no floating point decimal type. Its floating point types are float64 and float32, implementations of IEEE 754 binary64 and binary32 formats. Alternative task: <lang go>package main

import "fmt"

type float float64

func kahan(s ...float) float {

   var tot, c float
   for _, x := range s {
       y := x - c
       t := tot + y
       c = (t - tot) - y
       tot = t
   }
   return tot

}

func epsilon() float {

   ε := float(1)
   for 1+ε != 1 {
       ε /= 2
   }
   return ε

}

func main() {

   a := float(1)
   b := epsilon()
   c := -b
   fmt.Println("Left associative:", a+b+c)
   fmt.Println("Kahan summation: ", kahan(a, b, c))
   fmt.Println("Epsilon:         ", b)

}</lang>

Output:

Left associative: 0.9999999999999999
Kahan summation:  1
Epsilon:          1.1102230246251565e-16

With float defined as float32,

Left associative: 0.99999994
Kahan summation:  1
Epsilon:          5.9604645e-08

J

J does not implement 6 digit floating point addition (though the programmer could of course emulate such a thing - if it were specified precisely enough). But J does have IEEE-754 floating point numbers. So...

Required code (for the current draft of this task):

<lang J>epsilon=:3 :0

 epsilon=. 1.0
 while. 1.0 ~: 1.0 + epsilon do.
   epsilon=. epsilon % 2.0
 end.

)

a=: 1.0 b=: epsilon c=: -epsilon

KahanSum=:3 :0

 input=. y
 c=. 0.0
 sum=. 0.0
 for_i. i.#input do.
   y=. (i{input) - c
   t=. sum + y
   c=. (t - sum) - y
   sum=. t
 end.

)</lang>

Required results:

<lang J> (a+b)+c 1

  KahanSum a,b,c

1</lang>

There are several issues here, but the bottom line is that several assumptions behind the design of this task conflict with the design of J.

But let's take them one by one.

First, the value of epsilon:

<lang J> epsilon 5.68434e_14</lang>

This is because J's concept of equality, in the context of floating point numbers, already includes an epsilon. Floating point numbers are inherently imprecise, and this task has assumed that the language has assumed that floating point numbers are precise. So let's implement something closer to this previously unstated assumption:

<lang J>epsilon=:3 :0

 epsilon=. 1.0
 while. 1.0 (~:!.0) 1.0 + epsilon do.
   epsilon=. epsilon % 2.0
 end.

)</lang>

Now we have a smaller value for this explicit value of epsilon:

<lang J> epsilon 1.11022e_16</lang>

With this new value for epsilon, let's try the problem again:

<lang J> (a+b)+c 1

  KahanSum a,b,c

1</lang>

Oh dear, we still are not failing in the manner clearly specified as a task requirement.

Well, again, the problem is that the task has assumed that the language is treating floating point values as precise when their actual accuracy is less than their precision. This time, the problem is in the display of the result. So let's also override that mechanism and ask J to display results to 16 places of precision:

<lang J> ":!.16 (a+b)+c 0.9999999999999999

  ":!.16 KahanSum a,b,c

1</lang>

Voila!

PHP

Script

<lang php><?php // Main task

// 1. Do all arithmetic in decimal to a precision of six digits. // PHP does not have a way of restricting overall precision

// 2. Write a function/method/procedure/subroutine/... to perform Kahan // summation on an ordered collection of numbers, (such as a list of numbers). ) function kahansum($input) {

   $sum = $c = 0;
   foreach($input as $item) {
       $y = $item + $c;
       $t = $sum + $y;
       $c = ($t - $sum) - $y;
       $sum = $t;
   }
   return $sum;

}

// 3. Create the three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 // respectively. $input = array(10000.0, 3.14159, 2.71828); list($a, $b, $c) = $input;

// 4. show that the simple left-to-right summation, equivalent to (a + b) + c // gives an answer of 10005.8 ) $sumabc = ($a + $b) + $c; echo "Main task - Left to right summation: "; echo sprintf("%6.1f", $sumabc).PHP_EOL; // 10005.9

// 5. Show that the Kahan function applied to the sequence of values a, b, c // results in the more precise answer of 10005.9 echo "Main task - Kahan summation: "; echo sprintf("%6.1f", kahansum($input)).PHP_EOL; // 10005.9

// Let's use the substask $epsilon = 1.0; while ((1.0 + $epsilon) != 1.0) {

   $epsilon /= 2.0;

} echo "Trying the subtask".PHP_EOL."epsilon: "; echo $epsilon.PHP_EOL; // 1.1102230246252E-16

$a = 1.0; $b = $epsilon; $c = -$epsilon;

$input = array($a, $b, $c);

// left-to-right summation $sumabc = ($a + $b) + $c; echo "Sub task - Left to right summation: "; echo sprintf("%.1f", $sumabc).PHP_EOL;

// kahan summation echo "Sub task - Kahan summation: "; echo sprintf("%.1f", kahansum($input)).PHP_EOL;

// but, are they really the same or is an artifact? echo "Are the results the same?".PHP_EOL; var_dump((($a + $b) + $c) === kahansum($input));

// ok, then what is the difference? echo "Difference between the operations: "; $diff = (($a + $b) + $c) - kahansum($input); echo $diff.PHP_EOL; </lang>

Script output

<lang bash> $ php kahansum.php Main task - Left to right summation: 10005.9 Main task - Kahan summation: 10005.9 Trying the subtask epsilon: 1.1102230246252E-16 Sub task - Left to right summation: 1.0 Sub task - Kahan summation: 1.0 Are the results the same? bool(false) Difference between the operations: 1.1102230246252E-16 </lang>

Python

Python: Decimal

<lang python>>>> from decimal import * >>> >>> getcontext().prec = 6 >>> >>> def kahansum(input):

   summ = c = 0
   for num in input:
       y = num - c
       t = summ + y
       c = (t - summ) - y
       summ = t
   return summ

>>> a, b, c = [Decimal(n) for n in '10000.0 3.14159 2.71828'.split()] >>> a, b, c (Decimal('10000.0'), Decimal('3.14159'), Decimal('2.71828')) >>> >>> (a + b) + c Decimal('10005.8') >>> kahansum([a, b, c]) Decimal('10005.9') >>> >>> >>> sum([a, b, c]) Decimal('10005.8') >>> # it seems Python's sum() doesn't make use of this technique. >>> >>> # More info on the current Decimal context: >>> getcontext() Context(prec=6, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[Inexact, Rounded], traps=[InvalidOperation, DivisionByZero, Overflow]) >>> >>> >>> ## Lets try the simple summation with more precision for comparison >>> getcontext().prec = 20 >>> (a + b) + c Decimal('10005.85987') >>> </lang>

Python: Floats

This was written as proof that the floating-point sub-task could work and is better off displayed, so...

<lang python>>>> eps = 1.0 >>> while 1.0 + eps != 1.0: eps = eps / 2.0

>>> eps 1.1102230246251565e-16 >>> (1.0 + eps) - eps 0.9999999999999999 >>> kahansum([1, eps, -eps]) 1.0 >>> >>> >>> # Info on this implementation of floats >>> import sys >>> sys.float_info sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1) >>> </lang> Note that the kahansum function from the decimal example is duck typed and adjusts to work with the number type used in its argument list.

Note also that the math.fsum function employs an even more precise algorithm for the summation of floating point numbers.

Python: Arbitrary precision Decimal

Some languages have a decimal type, but cannot alter its precision to six digits. This example mimmicks this case byusing the default Python decimal precision and a variant of epsilon finding that divides by ten instead of two. <lang python>>>> from decimal import localcontext, Decimal >>> >>> with localcontext() as ctx: one, ten = Decimal('1.0'), Decimal('10') eps = one while one + eps != one: eps = eps / ten print('eps is:', eps) print('Simple sum is:', (one + eps) - eps) print('Kahan sum is:', kahansum([one, eps, -eps]))

eps is: 1E-28 Simple sum is: 0.9999999999999999999999999999 Kahan sum is: 1.000000000000000000000000000 >>> </lang>

R

This example is incorrect. Please fix the code and remove this message.

Details: Not as clear and understandable as it should be. See talk page.

Try using the task rules

1. Do all arithmetic in decimal to a precision of six digits.

R can only limit the number of digits being displayed, not the ones used in calculation, all operations are floating point

2. Write a function/method/procedure/subroutine/... to perform Kahan summation
on an ordered collection of numbers, (such as a list of numbers).

kahansum <- function(x) {

   ks <- 0
   c <- 0
   for(i in 1:length(x)) {
       y <- x[i] - c
       kt <- ks + y
       c = (kt - ks) - y
       ks = kt
   }
   ks

}

3. Create the three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 respectively.

a <- 10000.0 b <- 3.14159 c <- 2.71828

4. Show that the simple left-to-right summation, equivalent to (a + b) + c gives an answer of 10005.8

(a + b) + c

[1] 10005.86

5. Show that the Kahan function applied to the sequence of values a, b, c results in the more precise answer of 10005.9

input <- c(a, b, c) kahansum(input)

[1] 10005.86

as there is no difference let's try the subtask

epsilon = 1.0 while ((1.0 + epsilon) != 1.0) {

   epsilon = epsilon / 2.0

} epsilon

[1] 1.11022e-16

a <- 1.0 b <- epsilon c <- -epsilon

left-to-right summation

(a + b) + c

[1] 1

kahan summation

kahansum(c(a, b, c))

[1] 1

but, are they really the same or is an artifact?

(a + b) + c == kahansum(c(a, b, c))

FALSE

ok, then what is the difference?

((a + b) + c) - kahansum(c(a, b, c))

[1] -1.110223e-16

</lang>