Pathological floating point problems: Difference between revisions

Content added Content deleted

Inline

Revision as of 09:08, 23 February 2016

Most programmers are familiar with the inexactness of floating point calculations in a binary processor. The classic example being:

0.1 + 0.2 =  0.30000000000000004

In many situations the amount of error in such calculations is very small and can be overlooked or eliminated with rounding.

There are pathological problems however, where seemingly simple, straight-forward calculations are extremely sensitive to even tiny amounts of imprecision.

Show how your language deals with such classes of problems.

A sequence that seems to converge to a wrong limit. Consider the sequence:

   v₁ = 2
   v₂ = -4
   v_n = 111 - 1130/v_n-1 + 3000/(v_n-1 * v_n-2)

As n grows larger, the series should converge to 6 but small amounts of error will cause it to approach 100.

Display the values of the sequence where n = 3, 4, 5, 6, 7, 8, 20, 30, 50 & 100 to at least 16 decimal places.

   n = 3   18.5
   n = 4   9.378378
   n = 5   7.801153
   n = 6   7.154414
   n = 7   6.806785
   n = 8   6.5926328
   n = 20  6.0435521101892689
   n = 30  6.006786093031205758530554
   n = 50  6.0001758466271871889456140207471954695237
   n = 100 6.000000019319477929104086803403585715024350675436952458072592750856521767230266

The Chaotic Bank Society is offering a new investment account to their customers. You first deposit $e-1 where e is 2.7182818... the base of natural logarithms.

After each year, your account balance will be multiplied by the number of years that have passed, and $1 in service charges will be removed. So after 1 year, your balance will be multiplied by 1 and $1 will be removed for service charges, after 2 years your balance will be doubled and $1 removed, after 3 years your balance will be tripled and $1 removed, after 10 years, multiplied by 10 and $1 removed and so on. What will your balance be after 25 years?

   Starting balance: $e-1
   Balance = (Balance * year) - 1 for 25 years
   Balance after 25 years: $0.0399387296732302

Siegfried Rump's example. Consider the following function, designed by Siegfried Rump in 1988.

   f(a,b) = 333.75b⁶ + a²( 11a²b² - b⁶ - 121b⁴ - 2) + 5.5b⁸ + a/2b
   compute f(a,b) where a = 77617.0 and b = 33096.0.
   f(77617.0, 33096.0) = -0.827396059946821

Demonstrate how to solve at least one of the first two problems, or both, and the third if you're feeling particularly jaunty.

See Floating-Point Arithmetic Section 1.3.2 Difficult problems.

J

A sequence that seems to converge to a wrong limit.

Implementation of vn:

Example using IEEE-754 floating point:

<lang J> 3 21j16 ":"1] 3 4 5 6 7 8 20 30 50 100 ([,.{) (,vn)^:100(2 _4)

 3   9.3783783783783861
 4   7.8011527377522611
 5   7.1544144809765555
 6   6.8067847369419638
 7   6.5926327689743687
 8   6.4494659378910058
20  99.9934721906444960
30 100.0000000000000000
50 100.0000000000000000

100 100.0000000000000000</lang>

Example using exact arithmetic:

<lang J> 3 21j16 ":"1] 3 4 5 6 7 8 20 30 50 100 ([,.{) (,vn)^:100(2 _4x)

 3   9.3783783783783784
 4   7.8011527377521614
 5   7.1544144809752494
 6   6.8067847369236330
 7   6.5926327687044384
 8   6.4494659337902880
20   6.0360318810818568
30   6.0056486887714203
50   6.0001465345613879

100 6.0000000160995649</lang>

The Chaotic Bank Society

Let's start this example by using exact arithmetic, to make sure we have the right algorithm. We'll go a bit overboard, in representing e, so we don't have to worry too much about that.

<lang J> e=: +/%1,*/\1+i.100x

  81j76":e
  2.7182818284590452353602874713526624977572470936999595749669676277240766303535
  21j16":+`*/,_1,.(1+i.-25),e
  0.0399387296732302</lang>

(Aside: here, we are used the same mechanism for adding -1 to e that we are using to add -1 to the product of the year number and the running balance.)

Next, we will use $6157974361033\div 2265392166685$ for e, to represent the limit of what can be expressed using 64 bit IEEE 754 floating point.

That's clearly way too low, so let's try instead using $(1+6157974361033)\div 2265392166685$ for e <lang J> 31j16":+`*/,_1,.(1+i.-25),6157974361034%2265392166685x

4793054977300.3491517765983265</lang>

So, our problem seems to be that there's no way we can express enough bits of e, using 64 bit IEEE-754 floating point arithmetic. Just to confirm:

<lang J> 1x1 2.71828

  +`*/,_1,.(1+i.-25),1x1

_2.24237e9</lang>

Now let's take a closer look using our rational approximation for e:

<lang J> 21j16":+`*/,_1,.(1+i.-25),+/%1,*/\1+i.40x

  0.0399387296732302
  21j16":+`*/,_1,.(1+i.-25),+/%1,*/\1+i.30x
  0.0399387277260840
  21j16":+`*/,_1,.(1+i.-25),+/%1,*/\1+i.26x
  0.0384615384615385
  21j16":+`*/,_1,.(1+i.-25),+/%1,*/\1+i.25x
  0.0000000000000000
  21j16":+`*/,_1,.(1+i.-25),+/%1,*/\1+i.24x
 _1.0000000000000000</lang>

Things go haywire when our approximation for e uses the same number of terms as our bank's term. So, what does that look like, in terms of precision?

<lang J> 41j36":+/%1,*/\1+i.26x

  2.718281828459045235360287471257428715
  41j36":+/%1,*/\1+i.25x
  2.718281828459045235360287468777832452
  41j36":+/%1,*/\1+i.24x
  2.718281828459045235360287404308329608</lang>

In other words, we go astray when our approximation for e is inaccurate in the 26th position after the decimal point. But IEEE-754 floating point arithmetic can only represent approximately 16 decimal digits of precision.

Siegfried Rump's example.

Again, we use exact arithmetic to see if we have the algorithm right. That said, we'll also do this in small steps, to make sure we're being exact every step of the way, and to keep from building overly long lines:

<lang J>rump=:4 :0

 NB. enforce exact arithmetic
 add=. +&x:
 sub=. -&x:
 mul=. *&x:
 div=. %&x:

 a=. x
 a2=. a mul a

 b=. y
 b2=. b mul b
 b4=. b2 mul b2
 b6=. b2 mul b4
 b8=. b4 mul b4

 c333_75=. 1335 div 4 NB. 333.75
 term1=. c333_75 mul b6

 t11a2b2=. 11 mul a2 mul b2
 tnb6=. 0 sub b6
 tn121b4=. 0 sub 121 mul b4
 term2=. a2*(t11a2b2 + tnb6 + tn121b4 sub 2)

 c5_5=. 11 div 2 NB. 5.5
 term3=. c5_5 mul b8

 term4=. a div 2 mul b

 term1 add term2 add term3 add term4

)</lang>

Example use:

<lang J> 61j16": 77616 rump 33096

          204003255494337230059803902062081.1725888324873096</lang>

Because we have been using exact arithmetic here, either the task's description is correct, or we have made an arithmetic mistake. So, since we were careful to make our intermediate results persistent values, let's take a look at them:

<lang J> 61j16":a

                                      77616.0000000000000000
  61j16":a2
                                 6024243456.0000000000000000
  61j16":b
                                      33096.0000000000000000
  61j16":b2
                                 1095345216.0000000000000000
  61j16":b4
                        1199781142214086656.0000000000000000
  61j16":b6
               1314174534371215466459037696.0000000000000000
  61j16":b8
      1439474789212538429291115400277262336.0000000000000000
  61j16":c333_75
                                        333.7500000000000000
  61j16":term1
             438605750846393161930703831040.0000000000000000
  61j16":t11a2b2
                       72584888745037971456.0000000000000000
  61j16":tnb6
              _1314174534371215466459037696.0000000000000000
  61j16":tn121b4
                     _145173518207904485376.0000000000000000
  61j16":term2
     _7916907776019217870264236828326711808.0000000000000000
  61j16":c5_5
                                          5.5000000000000000
  61j16":term3
      7917111340668961361101134701524942848.0000000000000000
  61j16":term4
                                          1.1725888324873096</lang>

If any of these are mistaken, it should be simple to show which value is incorrect.

Perl 6

Works with: Rakudo version 2016-01

The simple solution to doing calculations where floating point numbers exhibit pathological behavior is: don't do floating point calculations. :-) Perl 6 is just as susceptible to floating point error as any other C based language, however, it offers built-in rational Types; where numbers are represented as a ratio of two integers. For normal precision it uses Rats - accurate to 1/2^64, and for arbitrary precision, FatRats, which can grow as large as available memory. Rats don't require any special special setup to use. Any decimal number within its limits of precision is automatically stored as a Rat. FatRats require explicit coercion and are "sticky". Any FatRat operand in a calculation will cause all further results to be stored as FatRats. <lang perl6>say '1st: Convergent series'; my @series = 2.FatRat, -4, { 111 - 1130 / $^v + 3000 / ( $^v * $^u ) } ... *; for flat 3..8, 20, 30, 50, 100 -> $n {say "n = {$n.fmt("%3d")} @series[$n-1]"};

say "\n2nd: Chaotic banking society"; sub postfix:<!> (Int $n) { [*] 2..$n } # factorial operator my $years = 25; my $balance = sum map { 1 / FatRat.new($_!) }, 1 .. $years + 15; # Generate e-1 to sufficient precision with a Taylor series put "Starting balance, \$(e-1): \$$balance"; for 1..$years -> $i { $balance = $i * $balance - 1 } printf("After year %d, you will have \$%1.16g in your account.\n", $years, $balance);

print "\n3rd: Rump's example: f(77617.0, 33096.0) = "; sub f (\a, \b) { 333.75*b⁶ + a²*( 11*a²*b² - b⁶ - 121*b⁴ - 2) + 5.5*b⁸ + a/(2*b) } say f(77617.0, 33096.0).fmt("%0.16g");</lang>

Output:

1st: Convergent series
n =   3 18.5
n =   4 9.378378
n =   5 7.801153
n =   6 7.154414
n =   7 6.806785
n =   8 6.5926328
n =  20 6.0435521101892689
n =  30 6.006786093031205758530554
n =  50 6.0001758466271871889456140207471954695237
n = 100 6.000000019319477929104086803403585715024350675436952458072592750856521767230266

2nd: Chaotic banking society
Starting balance, $(e-1): $1.7182818284590452353602874713526624977572470936999
After year 25, you will have $0.0399387296732302 in your account.

3rd: Rump's example: f(77617.0, 33096.0) = -0.827396059946821

zkl

zkl doesn't have a big rational or big float library (as of this writing) but does have big ints (via GNU GMP). <lang zkl>Series:=Walker(fcn(vs){ // just keep appending new values to a list

  vs.append(111.0 - 1130.0/vs[-1] + 3000.0/(vs[-1]*vs[-2])) }.fp(List(2,-4)));

series:=Series.drop(100).value;</lang> We'll use the convenient formula given in the referenced paper to create a fraction with big ints <lang zkl>var BN=Import("zklBigNum"), digits=BN(10).pow(64); fcn u(n){ // use formula to create a fraction of big ints

  const b=-3, y=4;
  N:=BN(6).pow(n+1)*b + BN(5).pow(n+1)*y;
  D:=BN(6).pow(n)*b   + BN(5).pow(n)*y;
  r,m:=(N*digits/D).toString(), (r.len()-64).max(0);
  String(r[0,m],".",r[m,32]);

}

println("1st: Convergent series"); foreach n in (T(3,4,5,6,7,8,20,30,50,100)){

  "n =%3d; %3.20F  %s".fmt(n,series[n-1],u(n-1)).println();

}</lang>

Output:

1st: Convergent series
n =  3;  18.50000000000000000000  18.50000000000000000000000000000000
n =  4;   9.37837837837837895449  9.37837837837837837837837837837837
n =  5;   7.80115273775216877539  7.80115273775216138328530259365994
n =  6;   7.15441448097533339023  7.15441448097524935352789065386036
n =  7;   6.80678473692481134094  6.80678473692363298394175659627200
n =  8;   6.59263276872179204702  6.59263276870443839274200277636599
n = 20;  98.34950312216535905918  6.04355211018926886777747736409754
n = 30;  99.99999999999893418590  6.00678609303120575853055404795323
n = 50; 100.00000000000000000000  6.00017584662718718894561402074719
n =100; 100.00000000000000000000  6.00000001931947792910408680340358

For Rump's example, multiple the formula by 10ⁿ so we can use integer math. <lang zkl>fcn rump(a,b){ b=BN(b);

  b2,b4,b6,b8:=b.pow(2),b.pow(4),b.pow(6),b.pow(8);
  a2:=BN(a).pow(2);
  r:=( b6*33375 + a2*(a2*b2*11 - b6 - b4*121 - 2)*100 + b8*550 )*digits;
  r+=BN(a)*digits*100/(2*b);
  s,m:=r.toString(), (s.len()-66).max(0);
  String(s[0,m],".",s[m,32])

} println("\n3rd: Rump's example: f(77617.0, 33096.0) = ",rump(77617,33096));</lang>

Output:

3rd: Rump's example: f(77617.0, 33096.0) = -.82739605994682136814116509547981