Numeric separator syntax: Difference between revisions

Revision as of 18:16, 30 August 2019

Several programming languages allow separators in numerals in order to group digits together.

Task

Show the numeric separator syntax and describe its specification. For example: Which separators are eligible? Can there be multiple consecutive separators? In what positions may a separator appear?

Factor

Factor allows the comma , as a separator character in number literals.

<lang factor>USE: prettyprint

12,345 .   ! 12345

! commas may be used at arbitrary intervals
1,23,456,78910 .   ! 12345678910

! a comma at the beginning or end will parse as a word, likely causing an error
! ,123 .   ! No word named “,123” found in current vocabulary search path
! 123, .   ! No word named “123,” found in current vocabulary search path

! likewise, two commas in a row will parse as a word
! 1,,23 .   ! No word named “1,,23” found in current vocabulary search path

! There are no exceptions to which numbers may have separators
! binary/octal/decimal/hexadecimal integers and floats are supported
0b1,000,001 .   ! 65
-1,234e-4,5 .   ! -1.234e-42
0x1.4,4p3 .     ! 10.125

! as are ratios
45,2+1,1/43,2 .   ! 452+11/432
1,1/1,7 .         ! 11/17

! and complex numbers
C{ 5.225,312 2.0 } .   ! C{ 5.225312 2.0 }</lang>

If one desires different separator rules, it is possible to define a syntax for them:

<lang factor>USING: lexer math.parser prettyprint sequences sets ;

<< SYNTAX: PN: scan-token "_" without string>number suffix! ; >>

! permissive numbers
PN: _1_2_3_ .             ! 123
PN: 1__234___567 .        ! 1234567
PN: 0b0___10.100001p3 .   ! 20.125</lang>

Since Factor's parser is exposed, one could even modify the number parser itself, obviating the need for a parsing word.

<lang factor>USING: eval prettyprint ;

<<
"IN: math.parser.private USE: combinators
: @pos-digit-or-punc ( i number-parse n char -- n/f )
    {
        { 95 [ [ @pos-digit ] require-next-digit ] }   ! underscore (95) replaces the default comma (44)
        { 43 [ ->numerator ] }
        { 47 [ ->denominator ] }
        { 46 [ ->mantissa ] }
        [ [ @pos-digit ] or-exponent ]
    } case ; inline" eval( -- )
>>

3_333_333 .   ! 3333333</lang>

Perl 6

Perl 6 allows underscore as a grouping / separator character in numeric inputs, though there are a few restrictions.

<lang perl6># Any numeric input value may use an underscore as a grouping/separator character.
# May occur in nearly any position, in any* number. * See restrictions below.

# Int
say 1_2_3; # 123

# Binary Int
say 0b1_0_1_0_1; # 21

# Hexadecimal Int
say 0xa_bc_d; # 43981

# Rat
say 1_2_3_4.2_5; # 1234.25

# Num
say 6.0_22e4; # 60220

# There are some restrictions on the placement.
# An underscore may not be on an edge boundary, or next to another underscore.
# The following are all syntax errors (commented out so the program still runs).

# say _1234.25;
# say 1234_.25;
# say 1234._25;
# say 1234.25_;
# say 12__34.25;</lang>
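The placement rule stated in the comments above (an underscore must have a digit on both sides) can be modelled outside Perl 6. The following Python sketch is an illustration only: the function name `is_valid_separated` and the regular expression are assumptions made for this example, they cover plain decimal literals only, and they are not Perl 6's actual grammar.

```python
import re

# One or more digit runs joined by single underscores, e.g. "1_2_3".
_DIGIT_GROUPS = r"[0-9]+(?:_[0-9]+)*"

# Integer part, optional fraction, optional exponent (decimal literals only).
_LITERAL = re.compile(
    rf"{_DIGIT_GROUPS}(?:\.{_DIGIT_GROUPS})?(?:[eE][+-]?{_DIGIT_GROUPS})?"
)


def is_valid_separated(literal: str) -> bool:
    """Return True if every underscore in `literal` sits between two digits."""
    return _LITERAL.fullmatch(literal) is not None


if __name__ == "__main__":
    samples = ["1_2_3", "1_2_3_4.2_5", "6.0_22e4",
               "_1234.25", "1234_.25", "1234._25", "1234.25_", "12__34.25"]
    for text in samples:
        print(text, is_valid_separated(text))
```

Under this model the first three samples are accepted and the last five, which mirror the syntax errors listed above, are rejected.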

Racket

Vanilla Racket does not have numeric separator syntax. However, it can be defined by users. For instance:

<lang racket>#lang racket

(require syntax/parse/define
         (only-in racket [#%top racket:#%top])
         (for-syntax racket/string))

(define-syntax-parser #%top
  [(_ . x)
   #:do [(define s (symbol->string (syntax-e #'x)))
         (define num (string->number (string-replace s "_" "")))]
   #:when num
   #`#,num]
  [(_ . x) #'(racket:#%top . x)])

1_234_567.89
1_234__567.89</lang>

Output:

1234567.89
1234567.89

In the above implementation of the syntax, _ is the separator. It allows multiple consecutive separators, and allows the separator anywhere in the numeral (front, middle, and back).

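Implementation details: in vanilla Racket, a token that contains _ (such as 1_234) reads as an identifier rather than a number. Unless it has already been defined, that identifier is unbound, and references to unbound identifiers go through #%top. We can therefore redefine #%top to intercept them: if the token is a number once every _ is removed, expand it to that number; otherwise defer to Racket's original #%top.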

If we wish to, for example, disallow multiple consecutive separators like 1_234__567.89, we could do so easily:

<lang racket>#lang racket

(require syntax/parse/define
         (only-in racket [#%top racket:#%top])
         (for-syntax racket/string))

(define-syntax-parser #%top
  [(_ . x)
   #:do [(define s (symbol->string (syntax-e #'x)))
         (define num (string->number (string-replace s "_" "")))]
   #:when num
   (syntax-parse #'x
     [_ #:fail-when (string-contains? s "__") "invalid multiple consecutive separators"
        #`#,num])]
  [(_ . x) #'(racket:#%top . x)])

1_234_567.89
1_234__567.89</lang>

Output:

1_234__567.89: invalid multiple consecutive separators in: 1_234__567.89
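
The strip-then-parse idea behind both Racket programs is not specific to Racket. As a rough sketch, here is the same logic in Python. The function name `parse_with_separators` and its `allow_consecutive` flag are hypothetical, introduced only for this illustration, and Python's `int` and `float` stand in for Racket's `string->number`, so the set of accepted numerals differs.

```python
from typing import Optional, Union

Number = Union[int, float]


def parse_with_separators(token: str, sep: str = "_",
                          allow_consecutive: bool = True) -> Optional[Number]:
    """Remove every `sep` from `token` and try to read the rest as a number.

    Returns None when the stripped token is not a number, which corresponds
    to the Racket version deferring to the original #%top.
    """
    stripped = token.replace(sep, "")
    number: Optional[Number] = None
    for convert in (int, float):
        try:
            number = convert(stripped)
            break
        except ValueError:
            continue
    if number is None:
        return None
    # Like the second Racket program, only complain once the token is
    # known to be a number.
    if not allow_consecutive and sep * 2 in token:
        raise ValueError(f"{token}: invalid multiple consecutive separators")
    return number


if __name__ == "__main__":
    print(parse_with_separators("1_234_567.89"))
    print(parse_with_separators("1_234__567.89"))
    print(parse_with_separators("not_a_number"))
```

Calling `parse_with_separators("1_234__567.89", allow_consecutive=False)` raises `ValueError`, mirroring the stricter variant.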