Generalised floating point addition: Difference between revisions

Content added Content deleted

Inline

Revision as of 08:15, 30 October 2011

If possible, define addition for floating point numbers where the digits are stored in an arbitrary base. eg. the digits cab be stored as binary, decimal, Binary-coded decimal, or even Balanced ternary.

Implement the code in a generalised form (such as a Template, Module or Mixin etc) that permits reusing of the code for different Bases.

If it is not possible to implement code in syntax of the specific language then:

note the reason.
perform test case using a built-in or external library.

Test case:

Use the Template to define Arbitrary precision addition on numbers stored in Binary Coded Decimal. Calculate the terms for -8 to 20 in this sequence of calculations:

12345679e63 × 81 + 1e63
12345679012345679e54 × 81 + 1e54
12345679012345679012345679e45 × 81 + 1e45
12345679012345679012345679012345679e36 × 81 + 1e36
etc. (The final calculation will be over 256 digits wide)

Perform the multiplication of 81 by repeated additions. The results will always be 1e72

Kudos for a Template that successfully handles Balanced ternary to perform the above test case.

ALGOL 68

Works with: ALGOL 68 version Revision 1 - one minor extension to language used - PRAGMA READ, similar to C's #include directive.

Works with: ALGOL 68G version Any - tested with release [1].

Note: This code stores the digits as array of digits with the "most significant digit" on the "left" as per normal "human" form. The net effect is that whole numbers (such as 100) are stored in the negative array positions, eg -2, -1 & 0, or [-2:0], And the fractional part of the floating point numbers are stored from index 1, eg. 1, 2, 3 etc. or [1:].

File: Mixin_Big_float_base.a68 <lang algol68>################################################

Define the basic operators and routines for #
manipulating DIGITS in a generalised base #

MODE STRUCTDIGIT = STRUCT(DIGIT digit); MODE DIGITS = STRUCT(FLEX[0]STRUCTDIGIT digits); BOOL balanced arithmetic = FALSE; INT digit order = -1;

OP ZERO = (STRUCTDIGIT skip)STRUCTDIGIT: INITSTRUCTDIGIT 0; OP ONE = (STRUCTDIGIT skip)STRUCTDIGIT: INITSTRUCTDIGIT 1;

STRUCTDIGIT OPerators #

OP INITSTRUCTDIGIT = (#LONG# INT i)STRUCTDIGIT: (STRUCTDIGIT out; digit OF out := #SHORTEN# i; out);

OP INITSTRUCTDIGIT = (INT i)STRUCTDIGIT: INITSTRUCTDIGIT LENG i;#

OP /= = (STRUCTDIGIT a,b)BOOL: digit OF a /= digit OF b;

Define additive and multiplicative identities #

OP ZERO = (DIGITS skip)DIGITS: INITDIGITS []DIGIT(0),

  ONE = (DIGITS skip)DIGITS: INITDIGITS []DIGIT(1);

Define OPerators for Least and Most Significant DIGIT #

OP MSD = (DIGITS t)INT: LWB digits OF t,

  LSD = (DIGITS t)INT: UPB digits OF t;

Define the requitred coercion/casting operators #

OP INITDIGITS = ([]DIGIT in)DIGITS:

 (STRUCT(FLEX[LWB in:UPB in]STRUCTDIGIT digits)out; digit OF digits OF out := in; out);

OP INITDIGITS = (DIGIT in)DIGITS: INITDIGITS []DIGIT(in);

OP INITDIGITS = ([]STRUCTDIGIT in)DIGITS:

 (DIGITS out; digits OF out := in; out);

A routine for removing leadng and trailing zeros #

PROC digits normalise = ([]DIGIT in)[]DIGIT: (

   DIGIT zero = 0 # ZERO LOC DIGIT#;
   INT highest := LWB in, lowest := UPB in;
   FOR place FROM highest BY -digit order TO lowest DO
     IF in[place] /= zero THEN
       highest := place;
       done highest
     FI
   OD;
   highest := lowest+digit order;
 done highest:
   FOR place FROM lowest BY digit order TO highest DO
     IF in[place] /= zero THEN
       lowest := place;
       done lowest
     FI
   OD;
   lowest := highest+digit order;
 done lowest:
   IF highest=lowest+digit order THEN # normalise zero's exponent #
     in[highest:lowest][@0]
   ELSE
     in[highest:lowest][@highest]
   FI
 );</lang>File: Mixin_Big_float_addition.a68

<lang algol68>########################################

Define the basic addition operators #
for the generalised base #

Note: +:= returns carry #

OP +:= = (REF STRUCTDIGIT lhs, STRUCTDIGIT arg)STRUCTDIGIT: (

 STRUCTDIGIT carry;
 digit OF carry := digit OF lhs +:= digit OF arg;
 carry

);

OP +:= = (REF DIGITS lhs, DIGITS arg)DIGITS: lhs := lhs + arg;

OP + = (DIGITS arg)DIGITS: arg;

OP + = (DIGITS in a, in b)DIGITS: (

Note: []STRUCTDIGIT(~) cast removes FLEX #

 []DIGIT a = digit OF []STRUCTDIGIT(digits OF in a);
 []DIGIT b = digit OF []STRUCTDIGIT(digits OF in b);
 INT extreme highest = MSD in a MIN MSD in b,
     overlap highest = MSD in a MAX MSD in b,
     overlap lowest  = LSD in a MIN LSD in b,
     extreme lowest  = LSD in a MAX LSD in b;

 DIGIT zero = 0 # ZERO LOC DIGIT#; # "a" can be zero length #

DIGITS out; #

 IF overlap highest > overlap lowest THEN # Either: NO overlapping digits #

   IF MSD in a > LSD in a THEN # a = 0 #
     # out := # in b
   ELIF MSD in b > LSD in b THEN # b = 0 #
     # out := # in a
   ELSE
     [extreme highest:extreme lowest]DIGIT a plus b;

First: simply insert the known digits #

     a plus b[MSD in a:LSD in a] := a[:];
     a plus b[MSD in b:LSD in b] := b[:];

Next: Zero any totally non overlapping digit #

     FOR place FROM overlap highest+digit order BY digit order TO overlap lowest-digit order
     DO a plus b[place] := zero OD;

Finally: normalise by removing leading & trailing "zero" digit #

     # out := # INITDIGITS digits normalise(a plus b)
   FI

 ELSE # Or: Add ALL overlapping digits #

   [extreme highest+(digit carry|digit order|0):extreme lowest]DIGIT a plus b;

First: Deal with the non overlapping Least Significant Digits #

   a plus b[overlap lowest-digit order:] := (LSD in a > LSD in b|a|b) [overlap lowest-digit order:];

Or: Add any overlapping digits #

   DIGIT carry := zero;
   FOR place FROM overlap lowest BY digit order TO overlap highest DO
     DIGIT digit a = a[place],
           digit b = b[place];
     REF DIGIT result = a plus b[place];
     IF digit carry THEN # used in big float #
       result := carry;
       carry := ( result +:= digit a );
       MOID( carry +:= ( result +:= (digit b) ) )
     ELSE
       result  := digit a;
       MOID( result +:= digit b )
     FI
   OD;

Next: Deal with the non overlapping Most Significant digits #

   FOR place FROM overlap highest+digit order BY digit order TO extreme highest DO
     REF DIGIT result = a plus b[place];
     IF digit carry THEN
       result := carry;
       carry := ( result +:= (MSD in a < MSD in b|a|b)[place] )
     ELSE
       result := (MSD in a < MSD in b|a|b)[place]
     FI
   OD;

Next: Deal with the carry #

   IF digit carry THEN
     a plus b[extreme highest+digit order] := carry
   FI;

Finally: normalise by removing leading & trailing "zero" digits #

   # out := # INITDIGITS digits normalise(a plus b)

 FI
 # out EXIT #

);</lang>File: Mixin_Big_float_subtraction.a68 <lang algol68>OP - = (DIGITS arg)DIGITS: (

 DIGITS out := arg;
 FOR digit FROM LSD arg TO MSD arg DO digit OF (digits OF out)[digit]:=-digit OF (digits OF arg)[digit] OD;
 out

);

OP SIGN = (DIGITS arg)INT:

 IF LSD arg - MSD arg = digit order THEN 0 ELSE SIGN digit OF (digits OF arg)[MSD arg] FI;

OP ABS = (DIGITS arg)DIGITS:

 IF SIGN arg < 0 THEN -arg ELSE arg FI;

OP - = (DIGITS a, b)DIGITS: a + -b; OP -:= = (REF DIGITS a, DIGITS b)DIGITS: a := a + -b;</lang> Start of test case:

File: Big_float_BCD_base.a68 <lang algol68>################################################

Define the basic operators and routines for #
manipulating DIGITS specific to a BCD base #

define the basic axioms of the number system you are extending #

MODE DIGIT = #LONG# INT; DIGIT digit base = #LONG# 10 ** digit width;

mixin the Big_float base definitions #

PR READ "Mixin_Big_float_base.a68" PR

MODE BIGREAL = DIGITS; BOOL digit carry = TRUE;

Most Significant Digit of the left = -1 #

OP ZERO = (DIGIT skip)DIGIT: 0; OP ONE = (DIGIT skip)DIGIT: 1;

Define the basic coersion/casting rules between types. #

OP INITLONGREAL = (BIGREAL a)LONG REAL:

 IF ABS(MSD a - LSD a) < 0 THEN 0
 ELSE
   LONG REAL out := digit OF (digits OF a)[MSD a];
   FOR place FROM MSD a - digit order BY -digit order TO LSD a DO
     out := out * digit base + digit OF (digits OF a)[place]
   OD;
   out * LONG REAL(digit base) ** -LSD a
 FI;

CO OP INITBIGREAL = (INT in int)BIGREAL:

 INITBIGREAL #LENG# in int;

END CO

OP INITBIGREAL = (#LONG# INT in int)BIGREAL: (

 PROC num digits = (#LONG# INT i)INT: (
   INT remainder := i, count := 0;
   WHILE remainder /= 0 DO
     remainder %:= digit base;
     MOID(count +:= 1)
   OD;
   count
 );

 DIGIT zero = 0;

 INT max = num digits(in int);
 [1-max:0]STRUCTDIGIT out;
 BIGREAL bigreal;

 #LONG# INT int := ABS in int;
 INT sign = SIGN in int;

 FOR i FROM UPB out BY digit order TO LWB out DO
   #LONG# INT digit := int MOD digit base;
   int := (int-digit) OVER digit base;
   (digit OF out)[i] := sign * digit;
   IF int = 0 THEN done FI
 OD;

done:

 digits OF bigreal := out;
 bigreal

);

OP INITBIGREAL = (REAL in real)BIGREAL:

 INITBIGREAL LENG in real;

OP INITBIGREAL = (LONG REAL in real)BIGREAL:

 SKIP; # TODO #

OP INITBIGREAL = ([]DIGIT digits)BIGREAL: (

Note: assumes digits have been normalised ! #

 STRUCT(FLEX[LWB digits:UPB digits]STRUCTDIGIT digits)out;
 digit OF digits OF out := digits;
 out

);

Important: Operator +:= is required by Mixin_Big_float_addition. #
Note: +:= returns carry #

OP +:= = (REF DIGIT lhs, DIGIT arg)DIGIT: (

 DIGIT out := lhs + arg;
 lhs := ABS out %* digit base * SIGN out;
 (out-lhs) % digit base * SIGN out

);

Important: Operator *:= is required by Mixin_Big_float_multiplication. #
Note: *:= returns carry #

OP *:= = (REF DIGIT lhs, DIGIT arg)DIGIT: (

 DIGIT out = lhs * arg;
 lhs := ABS out %* digit base * SIGN out;
 (out-lhs) % digit base * SIGN out

);

FORMAT big digit fmt = $n(digit width)(d)$; FORMAT big real fmt = $g(-digit width)","$;

OP REPR = (STRUCTDIGIT digit)STRING:

 whole(ABS digit OF digit,-digit width);

OP REPR = (BIGREAL big real)STRING:(

 STRING out;
 FOR place FROM MSD big real BY -digit order TO LSD big real DO
   out +:= REPR (digits OF big real)[place];
   IF place = 0 AND place /=  LSD big real OR
      place = 1 AND place =  MSD big real THEN out +:= "." FI
 OD;
 IF out = "" THEN out := "0" FI;
 IF SIGN big real<0 THEN "-" +=: out FI;
 IF MSD big real > 1 AND LSD big real > 1 OR
    MSD big real < 0 AND LSD big real < 0 THEN
 # No decimal point yet, so add an exponent #
   out+"e"+whole(digit order*LSD big real*digit width,0)
 ELSE
   out
 FI

);</lang>File: test_Big_float_BCD_addition.a68 <lang algol68>#!/usr/local/bin/a68g --script #

A program to test abritary length BCD floating point addition. #

PR READ "prelude/general.a68" PR

INT digit width := 1; # define how decimal digits each "big" DIGIT is #

include the basic axioms of the digits being used #

PR READ "Big_float_BCD_base.a68" PR PR READ "Mixin_Big_float_addition.a68" PR PR READ "Mixin_Big_float_subtraction.a68" PR # need SIGN #

test: (

 BIGREAL pattern = INITBIGREAL 012345679,
 INT pattern width = 9;

 BIGREAL
   sum := INITBIGREAL 0,
   shifted pattern := pattern,
   shifted tiny := INITBIGREAL 1; # typically 0.000.....00001 #

 FOR term FROM -8 TO 20 DO

 # First make shifted pattern smaller by shifting right by the pattern width #
   digits OF shifted pattern := (digits OF shifted pattern)[@term*pattern width+2];
   digits OF shifted tiny := (digits OF shifted tiny)[@(term+1)*pattern width];
   
   MOID(sum +:= shifted pattern);

 # Manually multiply by 81 by repeated addition #
   BIGREAL prod := sum + sum + sum;
   MOID(prod +:= prod + prod);
   MOID(prod +:= prod + prod);
   MOID(prod +:= prod + prod);

   BIGREAL total = prod + shifted tiny;

   IF term < -4 THEN 
     print(( REPR sum," x 81 gives: ", REPR prod, ", Plus ",REPR shifted tiny," gives: ")) 
   ELSE
     print((LSD prod - MSD prod + 1," digit test result: "))
   FI;
   printf(($g$, REPR total, $" => "b("Passed","Failed")"!"$, LSD total = MSD total, $l$))
 OD

)</lang> Output:

12345679e63 x 81 gives: 999999999e63, Plus 1e63 gives: 1e72 => Passed!
12345679012345679e54 x 81 gives: 999999999999999999e54, Plus 1e54 gives: 1e72 => Passed!
12345679012345679012345679e45 x 81 gives: 999999999999999999999999999e45, Plus 1e45 gives: 1e72 => Passed!
12345679012345679012345679012345679e36 x 81 gives: 999999999999999999999999999999999999e36, Plus 1e36 gives: 1e72 => Passed!
        +45 digit test result: 1e72 => Passed!
        +54 digit test result: 1e72 => Passed!
        +63 digit test result: 1e72 => Passed!
        +72 digit test result: 1e72 => Passed!
        +81 digit test result: 1e72 => Passed!
        +90 digit test result: 1e72 => Passed!
        +99 digit test result: 1e72 => Passed!
       +108 digit test result: 1e72 => Passed!
       +117 digit test result: 1e72 => Passed!
       +126 digit test result: 1e72 => Passed!
       +135 digit test result: 1e72 => Passed!
       +144 digit test result: 1e72 => Passed!
       +153 digit test result: 1e72 => Passed!
       +162 digit test result: 1e72 => Passed!
       +171 digit test result: 1e72 => Passed!
       +180 digit test result: 1e72 => Passed!
       +189 digit test result: 1e72 => Passed!
       +198 digit test result: 1e72 => Passed!
       +207 digit test result: 1e72 => Passed!
       +216 digit test result: 1e72 => Passed!
       +225 digit test result: 1e72 => Passed!
       +234 digit test result: 1e72 => Passed!
       +243 digit test result: 1e72 => Passed!
       +252 digit test result: 1e72 => Passed!
       +261 digit test result: 1e72 => Passed!

J

I am not currently able to implement the task exactly because I do not quite understand what is being asked for (nor why it would be useful).

That said, the task does specify some calculations to be performed.

Given

<lang j>e=: 2 : 0

  m * 10 ^ x: n

)</lang>

<lang> 1 e 63 + 12345679 e 63 * 81 1000000000000000000000000000000000000000000000000000000000000000000000000

  1 e 54 + 12345679012345679 e 54 * 81

1000000000000000000000000000000000000000000000000000000000000000000000000

  1 e 45 + 12345679012345679012345679x e 45 * 81

1000000000000000000000000000000000000000000000000000000000000000000000000

  1 e 36 + 12345679012345679012345679012345679x e 36 * 81

1000000000000000000000000000000000000000000000000000000000000000000000000</lang>