Talk:Kahan summation: Difference between revisions

==Task - The problem is badly formulated==
 
Rounding errors in sums do not occur because we use a certain number of digits after the decimal point. The source of such errors is the '''floating point''' representation of decimal fractions. As you can see, when using a '''fixed point''' representation, there are no rounding errors. Of course, the calculated sum of a / 3 + a / 3 + a / 3 is less than a, but this error is caused by the rounding in the division.
<lang C>
/*
 * RosettaCode: Kahan summation, C (needs a 64-bit long long, e.g. C99)
 *
 * The C language has no fixed-point decimal type. Nevertheless, to obtain
 * "six-digit precision" we can use ordinary fractions with a fixed denominator.
 * Values are stored as a count of millionths. long long is used because
 * 10000 * 1000000 does not fit into a 32-bit long.
 */
#include <stdio.h>

#define DECIMAL_FRACTION long long
#define DENOMINATOR 1000000LL
/* FRACTIONAL is given in millionths, so 3.14159 is written as (3, 141590) */
#define DECIMAL_TO_FIXED(WHOLE,FRACTIONAL) ((WHOLE) * DENOMINATOR + (FRACTIONAL))
#define FIXED_TO_WHOLE(VALUE) ((VALUE) / DENOMINATOR)
#define FIXED_TO_FRACT(VALUE) ((VALUE) % DENOMINATOR)

int main(void)
{
    DECIMAL_FRACTION a = DECIMAL_TO_FIXED(10000, 0);
    DECIMAL_FRACTION b = DECIMAL_TO_FIXED(3, 141590);
    DECIMAL_FRACTION c = DECIMAL_TO_FIXED(2, 718280);

    DECIMAL_FRACTION leftSum;
    DECIMAL_FRACTION rightSum;
    DECIMAL_FRACTION kahanSum;

    leftSum = a;
    leftSum += b;
    leftSum += c;

    rightSum = c;
    rightSum += b;
    rightSum += a;

    {
        /*
         * Actually we sum only a+b+c with an un-rolled implementation
         * of Kahan algorithm. KISS
         */

        DECIMAL_FRACTION correction = 0;
        DECIMAL_FRACTION inputMinusCorrection = 0;
        DECIMAL_FRACTION updatedSum = 0;

        kahanSum = a;

        inputMinusCorrection = b - correction;
        updatedSum = kahanSum + inputMinusCorrection;
        correction = updatedSum - kahanSum;
        correction -= inputMinusCorrection;
        kahanSum = updatedSum;

        inputMinusCorrection = c - correction;
        updatedSum = kahanSum + inputMinusCorrection;
        correction = updatedSum - kahanSum;
        correction -= inputMinusCorrection;
        kahanSum = updatedSum;
    }

/* Adjacent string literals are concatenated; %06lld keeps leading zeros of the fraction. */
#define PRINT(S,V) printf(S " = %lld.%06lld\n", FIXED_TO_WHOLE(V), FIXED_TO_FRACT(V))

    PRINT("a", a);
    PRINT("b", b);
    PRINT("c", c);
    putchar('\n');

    PRINT(" (a+b)+c", leftSum);
    PRINT(" a+(b+c)", rightSum);
    PRINT("Kahan sum", kahanSum);

    if ( leftSum == kahanSum && rightSum == kahanSum )
        puts("\nC can compute on fixed point numbers without round-off errors");

    getchar(); /* keep the console window open until a key is pressed */
    return 0;
}
</lang>
{{Out}}
<pre>
a = 10000.000000
b = 3.141590
c = 2.718280

 (a+b)+c = 10005.859870
 a+(b+c) = 10005.859870
Kahan sum = 10005.859870

C can compute on fixed point numbers without round-off errors
</pre>
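
For contrast, here is a minimal sketch of the same three values in six-digit ''floating-point'' decimal. It assumes Python's <code>decimal</code> module with the context precision set to 6; the names <code>kahan_sum</code>, <code>left</code>, <code>right</code> and <code>kahan</code> are illustrative only. In this representation the order of addition and the compensation do make a difference.

```python
from decimal import Decimal, localcontext

def kahan_sum(values):
    """Compensated (Kahan) summation over Decimal values."""
    total = Decimal(0)
    comp = Decimal(0)            # running compensation for lost low-order digits
    for x in values:
        y = x - comp             # apply the correction from the previous step
        t = total + y            # low-order digits of y may be rounded away here
        comp = (t - total) - y   # what was actually added minus what should have been
        total = t
    return total

a, b, c = Decimal("10000.0"), Decimal("3.14159"), Decimal("2.71828")

with localcontext() as ctx:
    ctx.prec = 6                 # six significant digits, floating point (assumed setup)
    left = (a + b) + c
    right = a + (b + c)
    kahan = kahan_sum([a, b, c])

print("(a+b)+c   =", left)
print("a+(b+c)   =", right)
print("Kahan sum =", kahan)
```

Under these assumptions the left-to-right sum comes out as 10005.8, while a+(b+c) and the Kahan sum both give 10005.9.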
 
==Task==
The idea of showing the Kahan summation on Rosettacode is good, but the Task is not good yet. I suggest removing the requirement to use a Decimal module and making it optional, so that many languages can use normal floating point values. I also suggest increasing the number of input values and the number of their digits, to better show why the Kahan summation is useful.
 
::::OK, indeed the options() call just restricts the number of digits being displayed, not the actual number used in calculations. In that respect it is in the same boat as PHP. AFAIK there is a package (not a base component) that can deal with arbitrary precision (http://cran.r-project.org/web/packages/Rmpfr/), but that is not what this particular task is aiming to show. I have redone the operations and results on a Windows 7 64-bit machine from a friend, will check again on my Ubuntu 64-bit box to corroborate the results. --[[User:Jmcastagnetto|jmcastagnetto]] ([[User talk:Jmcastagnetto|talk]]) 02:11, 17 December 2014 (UTC)
 
:For a clear example of the differences in summation methods, try adding the first billion terms of the harmonic series 1/i. It will also be significantly different adding forward or backward. In 32 bit floats, the results are:
<pre> forward: 15.4036827
backward: 18.8079185
forward compensated: 21.3004818
backward compensated: 21.3004818</pre>
--[[User:Andy a|Andy a]] ([[User talk:Andy a|talk]]) 04:56, 21 April 2020 (UTC)
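
A scaled-down sketch of this experiment is given below. It is not the code that produced the figures above. It assumes that 32-bit floats can be emulated in Python by rounding every intermediate result through <code>struct</code>, and it uses far fewer than a billion terms so that it finishes quickly. The helper names <code>f32</code>, <code>naive_sum32</code> and <code>kahan_sum32</code> are made up for this illustration.

```python
import math
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum32(terms):
    """Plain running sum, rounded to binary32 after every addition."""
    total = 0.0
    for x in terms:
        total = f32(total + x)
    return total

def kahan_sum32(terms):
    """Kahan summation with every operation rounded to binary32."""
    total = 0.0
    comp = 0.0
    for x in terms:
        y = f32(x - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)
        total = t
    return total

N = 100_000                          # assumption: small N to keep the run short
terms = [f32(1.0 / i) for i in range(1, N + 1)]
reference = math.fsum(terms)         # accurately rounded sum of the same terms

results = {
    "forward": naive_sum32(terms),
    "backward": naive_sum32(reversed(terms)),
    "forward compensated": kahan_sum32(terms),
    "backward compensated": kahan_sum32(reversed(terms)),
}
for name, value in results.items():
    print(f"{name:>21}: {value:.7f}  (error {value - reference:+.2e})")
```

The printed error column is measured against <code>math.fsum</code> applied to the same rounded terms, so it isolates the error of the summation itself.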
 
::::: The above topic (summation methods) is also mentioned in the &nbsp; ''talk/discussion'' &nbsp; page at &nbsp; [http://rosettacode.org/wiki/Talk:Sum_of_a_series#summing_backwards (Talk) &nbsp; Sum of a series, summing backwards]. &nbsp; &nbsp; &nbsp; &nbsp; -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 09:24, 21 April 2020 (UTC)
 
==So far, in J and R==
 
:But what about the issues we have already uncovered in several languages? What about languages that have much better control of number representation and calculations? There is a chance for languages that have these extra capabilities to shine and I would not want to lose that by fixing on one representation. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 16:03, 20 December 2014 (UTC)
 
::There is no chance of solving this task unless the representation is known. Kahan summation is only meaningful for fixed-precision floating-point formats. (I know this issue hasn't come up, but I expect some people to hear "decimal" and try to use a fixed-point decimal type, which I think may be more common than floating-point decimal.) For an appropriate type, then, observing a difference from simple summation is only possible when results are displayed with sufficient precision. For the epsilon example in R, for instance, you must know that you have nearly 16 significant decimal digits; then, since epsilon is a difference in the 16th decimal place, you must display 16 significant figures. R can do this with sprintf. I suspect most languages will have a way of displaying numbers at either full or specified precision. Even in the case where full resolution is not displayed, a difference between Kahan and simple summation could still be shown if you at least knew how much precision is displayed (or, equivalently, how much precision is suppressed). Then you could contrive data so that the difference would ultimately be apparent at whatever (known) precision is displayed. Not knowing if you are doing fixed-precision floating-point math, not knowing the precision you are working with, or not knowing what precision is displayed are showstoppers to illustrating Kahan summation. Some task requirements to demonstrate these capabilities would help avoid much floundering. For example,
::*State the precision and base used, e.g. 6 digit decimal, or 53 bit binary.
::*Determine the unit of least precision, or unit of last place, or ULP. In general this is base**(-precision).
::*Compute and show 1, ULP, and 1-ULP in sufficient precision to show all three different.
::*Now compute and show 1+ULP at full precision. It must show 1.000... with zeros covering the full precision. Examples,
<pre>
6 digit decimal
1: 1.00000
ULP: .000001
1-ULP: .999999
1+ULP: 1.00000
</pre>
<pre>
IEEE 754 binary64, base 2, precision 53.
1: 1.000000000000000e+00
ULP: 1.110223024625157e-16
1-ULP: 9.999999999999999e-01
1+ULP: 1.000000000000000e+00
</pre>
:If you can't reproduce 1-ULP showing something less than one and 1+ULP showing exactly one, you can't do the task. It means you don't have a suitable representation, you don't have the ULP, you can't display full precision, or something.
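
:The check described above can be sketched as follows. This assumes Python, whose <code>float</code> is taken to be IEEE 754 binary64 and whose <code>decimal</code> module stands in for 6-digit floating-point decimal; the variable names are illustrative only.

```python
from decimal import Decimal, localcontext

# Binary case: base 2, precision 53 (assumed for Python floats).
ulp = 2.0 ** -53
print("IEEE 754 binary64, base 2, precision 53.")
for label, value in [("1", 1.0), ("ULP", ulp),
                     ("1-ULP", 1.0 - ulp), ("1+ULP", 1.0 + ulp)]:
    print(f"{label:>5}: {value:.15e}")
binary_ok = (1.0 - ulp < 1.0) and (1.0 + ulp == 1.0)

# Decimal case: base 10, precision 6.
with localcontext() as ctx:
    ctx.prec = 6
    one = Decimal(1)
    dulp = Decimal("0.000001")   # base**(-precision) = 10**-6
    one_minus = one - dulp       # still representable in 6 digits
    one_plus = one + dulp        # 1.000001 needs 7 digits, so it is rounded
print("6 digit decimal")
for label, value in [("1", one), ("ULP", dulp),
                     ("1-ULP", one_minus), ("1+ULP", one_plus)]:
    print(f"{label:>5}: {value}")
decimal_ok = (one_minus < 1) and (one_plus == 1)

print("suitable binary representation: ", binary_ok)
print("suitable decimal representation:", decimal_ok)
```

:If either flag is false, the representation, the ULP, or the display precision is not what was assumed.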
 
:The epsilon technique should find ULP if the divisor is the base, and if equality can be tested without fuzzing. If you count iterations, it will tell you the precision as well. Really though I suggest the epsilon technique not be required. You should know your precision and base without computing them.
<lang go>package main

import (
	"fmt"
	"math"
)

// defining constants for IEEE 754 binary64
const (
	base = 2
	prec = 53
)

// "epsilon": halve u until adding it to 1 no longer changes the result
func ulp() (u float64, p int) {
	u = 1.
	for 1+u > 1 {
		p++
		u /= 2
	}
	return
}

func main() {
	u, p := ulp()
	fmt.Println("ulp by definition: ", math.Pow(base, -prec))
	fmt.Println("ulp computed:      ", u)
	fmt.Println("computed precision:", p)
}</lang>
{{out}}
<pre>
ulp by definition:  1.1102230246251565e-16
ulp computed:       1.1102230246251565e-16
computed precision: 53
</pre>
::Kahan summation of 1, +ULP, -ULP does technically show that it works, but I think the example is a little abstract and doesn't illustrate the practical value of Kahan summation well. My examples adding 10 copies of pi or adding a triangle of numbers were attempts at doing this and showing accumulated discrepancies greater than just 1 ULP. Here's one more attempt, this time summing a bunch of random numbers and accumulating a reference result.
<lang go>package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

func kahan(s []float64) float64 {
	var tot, c float64
	for _, x := range s {
		y := x - c
		t := tot + y
		c = (t - tot) - y
		tot = t
	}
	return tot
}

func seq(s []float64) float64 {
	tot := 0.
	for _, x := range s {
		tot += x
	}
	return tot
}

// defining constants for IEEE 754 binary64
const (
	base = 2
	prec = 53
)

var ulp = math.Pow(base, -prec)

func main() {
	rand.Seed(time.Now().UnixNano())
	n := make([]float64, 10001)
	n[0] = 1
	refSum := 0.
	for i := 1; i < len(n); i++ {
		r := rand.Float64() * 1.1 * ulp
		n[i] = r
		refSum += r
	}

	fmt.Printf("Sequential: %.15f\n", seq(n))
	fmt.Printf("Kahan:      %.15f\n", kahan(n))
	fmt.Printf("Reference:  %.18f\n", refSum)
}</lang>
{{out}}
<pre>
Sequential: 1.000000000000215
Kahan: 1.000000000000614
Reference: 0.000000000000613562
</pre>
:It's a little strange because that 1.1 is a fudge factor needed for satisfying results. A multiple of a little more than half the base works well...because rounding... Anyway, it's nice to see the Kahan sum contain a rounding of the reference sum and see the sequential sum lose a few decimal places. Here's similar code for six decimal places,
<lang python>from decimal import *
from random import random

getcontext().prec = 6

def kahan(vals, start = 0):
    tot = start
    c = 0
    for x in vals:
        y = x - c
        t = tot + y
        c = (t - tot) - y
        tot = t
    return tot

# "epsilon" technique in base 10: finds the ULP and counts the digits
ulp = Decimal('1')
prec = 0
while 1+ulp > 1:
    prec += 1
    ulp /= 10
print(ulp, prec)

n = []
refSum = Decimal('0')
for i in range(10000):
    r = Decimal(random() * 6) * ulp
    n.append(Decimal(r))
    refSum += r

print("Sequential: ", sum(n, Decimal('1')))
print("Kahan       ", kahan(n, Decimal('1')))
print("Reference:  ", refSum)</lang>
{{out}}
<pre>
0.000001 6
Sequential: 1.01615
Kahan 1.02979
Reference: 0.0297927
</pre>
::&mdash;[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 01:53, 21 December 2014 (UTC)
 
 
==Epsilon computation==
The "Epsilon computation around 1" sub task should be a task by itself.
<lang python>epsilon = 1.0
while 1.0 + epsilon != 1.0:
    epsilon = epsilon / 2.0</lang>
The "Epsilon computation around 1" sub task is a nice indicator of whether, and at which precision, IEEE 754 floating point is used by a language implementation (a specific compiler). Most of the time, languages are not explicit about the precision of their floating point data types.
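
As a small sketch of how the loop can serve as such an indicator, the version below also counts the iterations and compares the result with what the implementation itself reports. It assumes CPython, where <code>sys.float_info</code> describes the underlying C <code>double</code>; the counter <code>n</code> is an addition for this illustration.

```python
import sys

epsilon = 1.0
n = 0                                # loop count
while 1.0 + epsilon != 1.0:
    epsilon = epsilon / 2.0
    n += 1

print("loop count N              :", n)
print("epsilon found by the loop :", epsilon)
# sys.float_info.epsilon is the gap between 1.0 and the next float,
# which is twice the value at which the loop stops.
print("sys.float_info.epsilon / 2:", sys.float_info.epsilon / 2)
print("mantissa bits (mant_dig)  :", sys.float_info.mant_dig)
```

If the loop count matches <code>mant_dig</code>, the arithmetic is being carried out at the precision the language reports for its float type.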
===IEEE 754===
IEEE 754 (1985 & 2008) floating point has 4 major data types:
<pre>
Precision Name      Bits  Mantissa   Decimal precision
------------------  ----  ---------  -----------------
single precision      32  24 bits    5.9604E-8
double precision      64  53 bits    1.1102E-16
extended precision    80  64 bits    5.4210E-20
decimal128           128  34 digits  1.0000E-34
</pre>
C99, Fortran 77 and other languages have data types corresponding to the IEEE 754 formats:
<pre>
IEEE 754 name       GNU C        Intel C      Visual C  Fortran 77  Fortran 95              VB .Net
------------------  -----------  -----------  --------  ----------  ----------------------  -------
single precision    float        float        float     real*4      SELECTED_REAL_KIND(8)   Single
double precision    double       double       double    real*8      SELECTED_REAL_KIND(16)  Double
extended precision  long double  long double  n/a       real*10     SELECTED_REAL_KIND(20)  n/a
decimal128          __float128   __Quad       n/a       real*16     SELECTED_REAL_KIND(34)  n/a
</pre>
In Microsoft Visual C long double is treated as double.
===Compilers===
The epsilon computation using different language implementations and different data types gives the
following results:
<pre>
Language    Compiler    Declaration   N  Epsilon                   IEEE-754
----------  ----------  -----------  --  ------------------------  ---------
C++         VC++ 6.0    float        53  1.110223E-16              Double
C++         VC++ 6.0    double       53  1.110223E-16              Double
C++         VC++ 6.0    long double  53  1.110223E-16              Double
Fortran     Plato       real*4       64  5.421011E-20              Extended
Fortran     Plato       real*8       64  5.421010862428E-20        Extended
Fortran     Plato       real*10      64  5.42101086242752217E-20   Extended
Fortran     Plato       real*16      64  5.42101086242752217E-20   Extended
Pascal      Free        real         64  5.42101086242752E-020     Extended
Pascal      Free        double       64  5.42101086242752E-020     Extended
Pascal      Free        extended     64  5.4210108624275222E-0020  Extended
Perl        Strawberry               53  1.11022302462516e-016     Double
Python      v335                     53  1.1102230246251565e-16    Double
SmallBasic  1.2                      94  1.0e-28                   Fixed128*
VB6         VB 6.0      Single       24  5.960464E-08              Single
VB6         VB 6.0      Double       53  1.11022302462516E-16      Double
VBA         VBA 7.1     Single       24  5.960464E-08              Single
VBA         VBA 7.1     Double       53  1.11022302462516E-16      Double
VBScript    Win 10                   53  1.110223E-16              Double
VB.Net      VS 2013     Single       53  1.110223E-16              Double
VB.Net      VS 2013     Double       53  1.11022302462516E-16      Double
VB.Net      VS 2013     Decimal      94  1.0e-28                   Fixed128*
N is the loop count.<br>
Fixed128 : Use of IEEE Decimal128 floating point to emulate fixed point arithmetic.<br>
 
It is interesting to see that several compilers do not use the different IEEE-754 precisions to implement the different data types.
The trade-off is between compiler simplicity and runtime efficiency: why bother with different floating point precisions and all the implied cross-conversion routines, when one can use only the highest precision?<br>
===Kahan summation===
The Kahan summation algorithm task is a good idea, but the example numbers 10000.0, 3.14159, 2.71828
are a bad choice, because they show no visible rounding error when the language uses IEEE 754 double precision (64-bit) floating point, which unfortunately is now the standard. Let's note that William Kahan is a father of the original IEEE 754 and its revisions.<br>
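
A minimal sketch of this observation, assuming Python floats are IEEE 754 double precision. <code>kahan_sum</code> is an illustrative helper and <code>math.fsum</code> is used only as a reference value.

```python
import math

def kahan_sum(values):
    """Kahan summation on ordinary Python floats."""
    total = 0.0
    comp = 0.0
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

values = [10000.0, 3.14159, 2.71828]
left = (values[0] + values[1]) + values[2]
right = values[0] + (values[1] + values[2])
kahan = kahan_sum(values)
exact = math.fsum(values)            # accurately rounded reference sum

for name, v in [("(a+b)+c", left), ("a+(b+c)", right), ("Kahan sum", kahan)]:
    # Five decimals is the precision of the input data.
    print(f"{name:>9}: {v:.5f}  (difference from reference {v - exact:+.1e})")
```

At five decimal places all three sums print identically; any differences from the reference are many orders of magnitude below the last displayed digit.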
<br>
--[[User:PatGarrett|PatGarrett]] ([[User talk:PatGarrett|talk]]) 16:43, 16 February 2019 (UTC)