Numeric separator syntax
Several programming languages allow separators in numerals in order to group digits together.
- Task
Show the numeric separator syntax and describe its specification. E.g., what separators are eligible? Can there be multiple consecutive separators? What position can a separator be in? Etc.
Factor
Factor allows the comma (,) as a numeric separator/grouping character in number literals.
<lang factor>USE: prettyprint

12,345 .   ! 12345

! commas may be used at arbitrary intervals
1,23,456,78910 .   ! 12345678910

! a comma at the beginning or end will parse as a word, likely causing an error
! ,123 .   ! No word named “,123” found in current vocabulary search path
! 123, .   ! No word named “123,” found in current vocabulary search path

! likewise, two commas in a row will parse as a word
! 1,,23 .   ! No word named “1,,23” found in current vocabulary search path

! There are no exceptions to which numbers may have separators
! binary/octal/decimal/hexadecimal integers and floats are supported
0b1,000,001 .   ! 65
-1,234e-4,5 .   ! -1.234e-42
0x1.4,4p3 .     ! 10.125

! as are ratios
45,2+1,1/43,2 .   ! 452+11/432
1,1/1,7 .         ! 11/17

! and complex numbers
C{ 5.225,312 2.0 } .   ! C{ 5.225312 2.0 }</lang>
If one desires to define a syntax for different grouping rules, that is possible:
<lang factor>USING: lexer math.parser prettyprint sequences sets ;

<< SYNTAX: PN: scan-token "_" without string>number suffix! ; >>

! permissive numbers
PN: _1_2_3_ .             ! 123
PN: 1__234___567 .        ! 1234567
PN: 0b0___10.100001p3 .   ! 20.125</lang>
Since Factor's parser is exposed, one could even make changes to the number parser, obviating the need for parsing words.
<lang factor>USE: prettyprint

<<
"IN: math.parser.private USE: combinators
: @pos-digit-or-punc ( i number-parse n char -- n/f )
    {
        { 95 [ [ @pos-digit ] require-next-digit ] } ! normally 44
        { 43 [ ->numerator ] }
        { 47 [ ->denominator ] }
        { 46 [ ->mantissa ] }
        [ [ @pos-digit ] or-exponent ]
    } case ; inline" eval( -- )
>>

3_333_333 .   ! 3333333</lang>
Perl 6
Perl 6 allows underscore as a grouping / separator character in numeric inputs, though there are a few restrictions.
<lang perl6># Any numeric input value may use an underscore as a grouping/separator character.
# May occur in nearly any position, in any* number. * See restrictions below.

# Int
say 1_2_3; # 123

# Binary Int
say 0b1_0_1_0_1; # 21

# Hexadecimal Int
say 0xa_bc_d; # 43981

# Rat
say 1_2_3_4.2_5; # 1234.25

# Num
say 6.0_22e4; # 60220

# There are some restrictions on the placement.
# An underscore may not be on an edge boundary, or next to another underscore.
# The following are all syntax errors.

# say _1234.25;
# say 1234_.25;
# say 1234._25;
# say 1234.25_;
# say 12__34.25;</lang>
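The placement restrictions stated in the comments above (no underscore on an edge boundary, none next to another underscore) amount to requiring a digit or letter on both sides of every underscore. As an illustration only, here is a minimal Python sketch of that check. It is not Perl 6's actual grammar: the name `has_valid_separators` is hypothetical, and the function looks only at underscore placement in a token that is otherwise assumed to be a numeric literal.

```python
import re

# An underscore is rejected when it is not preceded, or not followed, by an
# alphanumeric character. Start and end of the string count as "not alphanumeric",
# and so does a neighbouring underscore or decimal point.
_BAD_UNDERSCORE = re.compile(r"(?<![0-9A-Za-z])_|_(?![0-9A-Za-z])")


def has_valid_separators(token: str) -> bool:
    """Return True if no underscore is on an edge boundary or next to another underscore."""
    return _BAD_UNDERSCORE.search(token) is None


if __name__ == "__main__":
    samples = [
        "1_2_3", "0b1_0_1_0_1", "0xa_bc_d", "1_2_3_4.2_5", "6.0_22e4",   # accepted above
        "_1234.25", "1234_.25", "1234._25", "1234.25_", "12__34.25",     # syntax errors above
    ]
    for token in samples:
        print(token, has_valid_separators(token))
```

The sketch accepts the five literals that the entry prints and rejects the five that it lists as syntax errors; it makes no claim about inputs the entry does not mention.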
Racket
Vanilla Racket does not have numeric separator syntax. However, it can be defined by users. For instance:
<lang racket>#lang racket

(require syntax/parse/define
         (only-in racket [#%top racket:#%top])
         (for-syntax racket/string))

(define-syntax-parser #%top
  [(_ . x)
   #:do [(define s (symbol->string (syntax-e #'x)))
         (define num (string->number (string-replace s "_" "")))]
   #:when num
   #`#,num]
  [(_ . x) #'(racket:#%top . x)])

1_234_567.89
1_234__567.89</lang>
- Output:
1234567.89
1234567.89
In the above implementation of the syntax, _ is the separator. It allows multiple consecutive separators, and allows the separator anywhere in the numeral (front, middle, and back).
Implementation details: any token containing _ is read as an identifier in vanilla Racket. If no such identifier has been defined, it is unbound. We can therefore define #%top to control these unbound identifiers: if the token is a number after removing every _, expand it to that number; otherwise fall back to Racket's own #%top.
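The fallback idea described above (take a token that would otherwise be an unbound identifier, delete the separators, and accept it as a numeral if what remains parses as a number) can also be sketched outside Racket. The following Python sketch is illustrative only: `parse_with_separators` is a hypothetical name, and Python's `int` and `float` stand in for Racket's `string->number`, so the set of accepted numerals differs from Racket's.

```python
from typing import Optional, Union

Number = Union[int, float]


def parse_with_separators(token: str, sep: str = "_") -> Optional[Number]:
    """Return the number spelled by token once every separator is removed, else None."""
    stripped = token.replace(sep, "")
    # Try the narrower type first so that "1_000" becomes an int, not a float.
    for convert in (int, float):
        try:
            return convert(stripped)
        except ValueError:
            continue
    # Not a number: the caller would treat the token as an ordinary identifier.
    return None


if __name__ == "__main__":
    for token in ["1_234_567.89", "1_234__567.89", "_12_", "some_name"]:
        print(token, parse_with_separators(token))
```

Because the separators are removed before any parsing happens, this sketch, like the Racket macro, places no restriction on where or how often the separator appears.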
If we wish to, for example, disallow multiple consecutive separators like 1_234__567.89, we could do so easily:
<lang racket>#lang racket

(require syntax/parse/define
         (only-in racket [#%top racket:#%top])
         (for-syntax racket/string))

(define-syntax-parser #%top
  [(_ . x)
   #:do [(define s (symbol->string (syntax-e #'x)))
         (define num (string->number (string-replace s "_" "")))]
   #:when num
   (syntax-parse #'x
     [_ #:fail-when (string-contains? s "__") "invalid multiple consecutive separators"
        #`#,num])]
  [(_ . x) #'(racket:#%top . x)])

1_234_567.89
1_234__567.89</lang>
- Output:
1_234__567.89: invalid multiple consecutive separators in: 1_234__567.89