Numeric separator syntax

From Rosetta Code
Numeric separator syntax is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Several programming languages allow separators in numerals in order to group digits together.


Task

Show the numeric separator syntax and describe its specification.


For example:
  •   What separators are eligible?
  •   Can there be multiple consecutive separators?
  •   What position can a separator be in?
  •   Etc.



11l

The apostrophe, ', is used as a digit separator. 3-digit groups [from the right] should be used in decimal numeric literals.

print(100'000) // correct numeric literal
print(1'00000) // wrong numeric literal

Ada

The Ada language uses the underscore '_' as a digit separator. The underscore separator must be between digits.

with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;

procedure Main is
   type u64 is mod 2**64;
   pi       : constant Float := 
     3.14159_26535_89793_23846_26433_83279_50288_41971_69399_37511;
   Trillion : u64            := 1_000_000_000_000;
begin
   Put ("pi : ");
   Put (Item => pi, Exp => 0, Aft => 7);
   New_Line;
   Put_Line ("Trillion : " & Trillion'Image);
end Main;
Output:
pi :  3.141593
Trillion :  1000000000000

ALGOL 68

In Algol 68, spaces are not significant in identifiers or numeric literals. This allows spaces to be used as numeric separators.
Single or multiple spaces can be used as desired; it is not necessary to group the digits into blocks of three.

BEGIN
    INT  a = 1 234 567;
    REAL b = 3      .    1 4159 26 5 359;
    print( ( whole( a, 0 ), "  ", fixed( b, - 14, 11 ), newline ) )
END
Output:
1234567   3.14159265359

Arturo

Arturo does not have numeric separator syntax.

Numbers can be either normal integers (without any separators whatsoever) or floating-point numbers.

a: 1234567
b: 3.14

print a
print b
Output:
1234567
3.14

AWK

# syntax: GAWK -f NUMERIC_SEPARATOR_SYNTAX.AWK
# converted from ALGOL 68
BEGIN {
# AWK lacks numeric separators but can be simulated using white space.
    a = 1 234 567
    b = 3  "."  1 4159 26 5 359
    print(a,b)
    exit(0)
}
Output:
1234567 3.14159265359

C

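The C Standard Library header locale.h provides localization functions, which can be used to print numbers with digit grouping; the ' (apostrophe) flag in the printf format below is a POSIX extension. Separating digits in source-code literals was not possible until C23.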

#include <locale.h>
#include <stdio.h>

int main()
{
  unsigned long long int trillion = 1000000000000;

  setlocale(LC_NUMERIC,"");

  printf("Locale : %s, One Trillion : %'llu\n", setlocale(LC_CTYPE,NULL),trillion);

  return 0;
}

Output :

pi@raspberrypi:~/doodles $ ./a.out 
Locale : C, One Trillion : 1,000,000,000,000

C++

C++14 introduced the apostrophe (') as a numeric separator, specified in proposal N3781. The code below will therefore compile only with a C++14 or later compiler:

//Aamrun, 4th October 2021

#include <iostream>
using namespace std;

int main()
{
    long long int a = 30'00'000;

    std::cout <<"And with the ' in C++ 14 : "<< a << endl;

    return 0;
}
Output:
And with the ' in C++ 14 : 3000000

Delphi

Works with: Delphi version 6.0

Delphi doesn't support any alternate number separators in compiler source code. However, Delphi does support virtually any character as the thousands and decimal separators when formatting and parsing numbers at run time. The following code shows how to override Windows' thousands and decimal separator conventions so that only the US standard is supported. This is useful when reading and writing files: if you don't do this, files written in one country may be unreadable in another country. The same technique can be used to set any decimal and thousands separator you want.

procedure SetInternational(Flag: boolean);
{Enable/Disable International Support }
var DefaultLCID: Integer;
begin
InternationalFlag:=Flag;
DefaultLCID := GetThreadLocale;
if Flag then
	begin
	{This gets a "platform" warning}
	{$WARNINGS OFF}
	IndSystemData.DecimalSeparator := GetLocaleChar(DefaultLCID, LOCALE_SDECIMAL, '.');
	IndSystemData.ThousandSeparator := GetLocaleChar(DefaultLCID, LOCALE_STHOUSAND, ',');
	{$WARNINGS ON}
	end
else
	begin
	IndSystemData.DecimalSeparator:='.';
	{No thousands separator so we can parse comma separated data}
	IndSystemData.ThousandSeparator:=#0;
	end;
end;

Factor

Factor allows the comma , as a separator character in number literals.

USE: prettyprint

12,345 .   ! 12345

! commas may be used at arbitrary intervals
1,23,456,78910 .  ! 12345678910

! a comma at the beginning or end will parse as a word, likely causing an error
! ,123 .   ! No word named “,123” found in current vocabulary search path
! 123, .   ! No word named “123,” found in current vocabulary search path

! likewise, two commas in a row will parse as a word
! 1,,23 .   ! No word named “1,,23” found in current vocabulary search path

! There are no exceptions to which numbers may have separators
! binary/octal/decimal/hexadecimal integers and floats are supported
0b1,000,001 .   ! 65
-1,234e-4,5 .   ! -1.234e-42
0x1.4,4p3 .   ! 10.125

! as are ratios
45,2+1,1/43,2 .   ! 452+11/432
1,1/1,7 .   ! 11/17

! and complex numbers
C{ 5.225,312 2.0 } .   ! C{ 5.225312 2.0 }

If one desires to define a syntax for different separator rules, that is possible:

USING: lexer math.parser prettyprint sequences sets ;

<< SYNTAX: PN: scan-token "_" without string>number suffix! ; >>

! permissive numbers
PN: _1_2_3_ .   ! 123
PN: 1__234___567 .   ! 1234567
PN: 0b0___10.100001p3 .   ! 20.125

Since Factor's parser is exposed, one could even make changes to the number parser, obviating the need for parsing words.

USING: eval prettyprint ;

<<

"IN: math.parser.private
USE: combinators
: @pos-digit-or-punc ( i number-parse n char -- n/f )
    {
        { 95 [ [ @pos-digit ] require-next-digit ] }   ! normally 44
        { 43 [ ->numerator ] }
        { 47 [ ->denominator ] }
        { 46 [ ->mantissa ] }
        [ [ @pos-digit ] or-exponent ]
    } case ; inline" eval( -- )

>>

3_333_333 .   ! 3333333

FreeBASIC

FreeBASIC does not have numeric separator syntax; in particular, it does not allow the underscore _ as a digit separator.

However, you could, for example, define a macro to remove underscores.

Function Remove(Byval Text As String, Char As String="_") As String
    Dim As Long i
    For n As Long = 0 To Len(Text)-1
        If Text[n] <> Asc(char) Then Text[i] = Text[n]: i += 1
    Next n
    Return Left(Text,i)
End Function

#macro __(t,b...)
    Vallng(Remove(#t,b))
#endmacro

Print __(1_234_567)
Print __(&h__D__E__A__D__B__E__E__F)
Print __(&hFF_BB_00_00 Or &h_FFBB_0000)
Print __(&b_0101_0001_1110_0000)
Print __(26-10-48,"-")  'not a dash
Print Hex(__(&hFF_BB_00_01) Or __(&h_FFBB_0010))
Output:
 1234567
 3735928559
 4290445312
 20960
 261048
FFBB0011

Go

From version 1.13, Go supports underscores as digit separators for numeric literals. An underscore may appear between any two digits or between the literal prefix (0b, 0o, 0x) and the first digit.

Using the Raku examples plus a few more which Go allows:

package main

import "fmt"

func main() {
    integers := []int{1_2_3, 0b1_0_1_0_1, 0xa_bc_d, 0o4_37, 0_43_7, 0x_beef}
    for _, integer := range integers {
        fmt.Printf("%d  ", integer)
    }
    floats := []float64{1_2_3_4.2_5, 6.0_22e4, 0x_1.5p-2}
    for _, float := range floats {
        fmt.Printf("%g  ", float)
    }
    fmt.Println()
    // none of these compile
    // floats2 := []float64{_1234.25, 1234_.25, 1234._25, 1234.25_, 12__23.25}
}
Output:
123  21  43981  287  287  48879  1234.25  60220  0.328125

Java

Underscores must be located between digits. The number of consecutive underscores and the positions where they appear between digits are not restricted.

public class NumericSeparatorSyntax {

    public static void main(String[] args) {
        runTask("Underscore allowed as separator", 1_000);
        runTask("Multiple consecutive underscores allowed", 1__0_0_0);
        runTask("Many multiple consecutive underscores allowed", 1________________________00);
        runTask("Underscores allowed in multiple positions", 1__4__4);
        runTask("Underscores allowed in negative number", -1__4__4);
        runTask("Underscores allowed in floating point number", 1__4__4e-5);
        runTask("Underscores allowed in floating point exponent", 1__4__440000e-1_2);
        //runTask(_100);  does not compile - cannot be before first digit
        //runTask(100_);  does not compile - cannot be after last digit
        //runTask(144_.25);  does not compile - must be within digits
        //runTask(144._25);  does not compile - must be within digits
    }

    private static void runTask(String description, long n) {
        runTask(description, n, "%d");
    }

    private static void runTask(String description, double n) {
        runTask(description, n, "%3.7f");
    }

    private static void runTask(String description, Number n, String format) {
        System.out.printf("%s:  " + format + "%n", description, n);
    }

}
Output:
Underscore allowed as separator:  1000
Multiple consecutive underscores allowed:  1000
Many multiple consecutive underscores allowed:  100
Underscores allowed in multiple positions:  144
Underscores allowed in negative number:  -144
Underscores allowed in floating point number:  0.0014400
Underscores allowed in floating point exponent:  0.0000144

jq

Works with jq and gojq, the C and Go implementations of jq

jq does not support any separator syntax for numbers, and does not provide any built-in filters for formatting them with a thousands-separator, or for "decommatizing" strings that could be interpreted as numbers.

The following definitions, however, can be used to commatize integers, whether expressed as strings or as (JSON) numbers. Exponential notation is supported, as illustrated by some of the examples below.

Note that since both gojq and sufficiently recent versions of jq support indefinitely large numeric integers, some of the examples assume such support.

# The def of _nwise/1 can be omitted if using the C implementation of jq.
def _nwise($n):
  def n: if length <= $n then . else .[0:$n] , (.[$n:] | n) end;
  n;

# commatize/0 and commatize/1 are intended for integers or integer-valued strings,
# where integers of the form [-]?[0-9]*[Ee][+]?[0-9]+ are allowed.
# Notice that a leading '+' is disallowed, as is an exponent of the form '-0'.
# Output: a string
def commatize($comma):
  def c: [explode | reverse | _nwise(3) | reverse | implode] | reverse | join($comma);
  def e: "unable to commatize: " + tostring | error;

  if type == "string"
  then if test("^[0-9]+$") then c
       elif test("^-[0-9]+$") then "-" + (.[1:] | c)
       else (capture("(?<s>[-])?(?<i>[0-9]*)[Ee][+]?(?<e>[0-9]+)$") // null) 
       | if .
         then if .i == "" then .i="1" else . end
         | .s |= (if . == null then "" else . end)
         | .s + ((.i + (.e|tonumber) * "0") | c)
         else e
         end
       end
  elif type == "number" and . == floor
  then if . >= 0
       then tostring|commatize($comma)
       else "-" + (-. | tostring | commatize($comma) )
       end
  else e
  end;

def commatize:
  commatize(",");

Examples

[ 1e6, 1e9, 123456789, -123456789012, 1e20, "e20", -10e19, 123456789123456789123456789 ] as $nums
| [",", ".", " ", "*"] as $seps
| range(0;$nums|length) as $i
| $nums[$i] | commatize($seps[$i] // ",")
Output:
1,000,000
1.000.000.000
123 456 789
-123*456*789*012
100,000,000,000,000,000,000
100,000,000,000,000,000,000
-100,000,000,000,000,000,000
123,456,789,123,456,789,123,456,789

Julia

Julia allows use of the underscore _ as a digit separator. The _ separator must be preceded and followed by a digit. Commas are not allowed in numeric literals.

    
    julia> 2_9
    29
    
    julia> 2_9_9_0
    2990
    
    julia> 2_9_9.0_01
    299.001
    
    julia> 1._01
    ERROR: syntax: invalid numeric constant "1._"
    
    julia> -1_0
    -10
    
    julia> -_10
    ERROR: UndefVarError: _10 not defined
    Stacktrace:
     [1] top-level scope at none:0
    
    julia> 0x34_ff
    0x34ff

    julia> 0x_34ff
    ERROR: syntax: invalid numeric constant "0x_"

    julia> 10_000_000
    10000000
    
    julia> 10__000__000
    ERROR: UndefVarError: __000__000 not defined

Nim

Nim allows underscores _ in numeric literals. An underscore must be preceded and followed by a digit, which means that it cannot be placed at the start or end of a literal and that consecutive underscores are forbidden.

Using Julia examples, for instance in "inim" REPL:

const a = 2_9             # 29
const b = 2_9_9_0         # 2990
const c = 2_9_9.0_01      # 299.001
const d = 1._01           # Error: invalid token: _ (\95)
const e = -1_0            # -10
const f = -_10            # Error: invalid token: _ (\95)
const g = 0x34_ff         # 0x34ff
const h = 0x_34ff         # Error: invalid number: '0x_34ff'
const i = 10_000_000      # 10000000
const j = 10__000__000    # Error: only single underscores may occur in a token and token may not end with an underscore: e.g. '1__1' and '1_' are invalid

OCaml

Underscores can be used as separators in integer or floating-point literals, and they are ignored. Underscores can be in any position except at the beginning, and you can use consecutive underscores.

Printf.printf "%d\n" 1_2_3;; (* 123 *)
Printf.printf "%d\n" 0b1_0_1_0_1;; (* 21 *)
Printf.printf "%d\n" 0xa_bc_d;; (* 43981 *)
Printf.printf "%d\n" 12__34;; (* 1234 *)
Printf.printf "%f\n" 1_2_3_4.2_5;; (* 1234.250000 *)
Printf.printf "%f\n" 6.0_22e4;; (* 60220.000000 *)
Printf.printf "%f\n" 1234_.25;; (* 1234.250000 *)
Printf.printf "%f\n" 1234._25;; (* 1234.250000 *)
Printf.printf "%f\n" 1234.25_;; (* 1234.250000 *)

Pascal

Works with FPC (currently only version 3.3.1).

An underscore can be used as a digit separator. This is enabled by default in {$mode delphi}; in other modes it is activated using {$modeswitch underscoreisseparator}.

program test;
{$mode fpc}
{$modeswitch underscoreisseparator}
begin
  WriteLn(%1001_1001);
  WriteLn(&121_102);
  WriteLn(-1_123_123);
  WriteLn($1_123_123);
  WriteLn(-1_123___123.000_000);
  WriteLn(1_123_123.000_000e1_2);
end.

Perl

Perl allows underscore as a grouping / separator character in numeric inputs, as long as you use it between digits, and you do not use two underscores in a row:

# Int
print 1_2_3, "\n";  # 123

# Binary Int
print 0b1_0_1_0_1, "\n"; # 21

# Hexadecimal Int
print 0xa_bc_d, "\n"; # 43981

# Rat
print 1_2_3_4.2_5, "\n"; # 1234.25

# Num
print 6.0_22e4, "\n"; # 60220

Phix

Phix simply ignores underscores in numeric literals; however, a leading underscore signifies a normal identifier, much like a123 or tmp2.
Commas are not allowed in numeric literals, since they delimit sequence elements, routine parameters, and such like, for example {1,2,3,4}.

with javascript_semantics
? 1_2_3          -- 123
--? _1234.25    -- undefined identifier _1234
? 0b1_0_1_0_1   -- 21
? 0b_1_0_1_0_1  -- 21
? 0xa_bc_d      -- 43981
? #_DEAD_BEEF_  -- 3735928559.0
? 0x_dead_beef  -- 3735928559.0
? 3.14_15_93    -- 3.141593
? 1_2_3_4.2_5   -- 1234.25
? 1234_.25      -- 1234.25
? 1234._25      -- 1234.25
? 1234.25_      -- 1234.25
? 12__34.25     -- 1234.25
? 6.0_22e4      -- 60220

Python

Works with: Python version 3.6+

The syntax for separators in numeric literals is given in the Python documentation; it was introduced in Python 3.6 by PEP 515.

  • The underscore, '_', is used as a separator.
  • Single underscores can be used to separate digits or can occur after base specifiers.
  • E.g. 100_000_000_000, 0x_dead_beef, 3.14_15_93
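The rules above can be checked with a short script. This is a minimal sketch, assuming Python 3.6 or later; the helper name `accepts` is hypothetical and introduced only for illustration. It uses `ast.literal_eval` to ask the parser whether a piece of text is a valid numeric literal.

```python
import ast

# Valid literals: single underscores between digits or right after a base prefix.
assert 100_000_000_000 == 100000000000
assert 0x_dead_beef == 0xdeadbeef
assert 3.14_15_93 == 3.141593

# int() and float() follow the same underscore rules when parsing strings.
assert int("1_000_000") == 1000000
assert float("1_000.000_1") == 1000.0001


def accepts(text):
    """Return True if `text` parses as a numeric literal (hypothetical helper)."""
    try:
        value = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # SyntaxError: malformed literal; ValueError: parsed as something else, e.g. a name.
        return False
    return isinstance(value, (int, float)) and not isinstance(value, bool)


for candidate in ["1_000", "0x_ff", "1__000", "1000_", "_1000", "1_.5", "1._5"]:
    print(f"{candidate:8} {accepts(candidate)}")
# Only the first two candidates are accepted; a leading underscore makes an
# identifier, and doubled, trailing, or point-adjacent underscores are errors.

# The format mini-language can also emit separators on output.
print(f"{10**9:_}")         # 1_000_000_000
print(f"{10**9:,}")         # 1,000,000,000
print(f"{0xdeadbeef:_x}")   # dead_beef
```

Only "_" is eligible as a separator inside literals; the "," shown in the last lines is available for formatted output only.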

Quackery

Quackery does not have numeric separator syntax. However, as the compiler is extensible one could, for example, define a builder (i.e. a compiler directive) n to strip commas from the space delimited number following it in the Quackscript.

 [ nextword
   [] swap witheach
     [ dup char , = iff
         drop else join ]
   swap join ]            builds n ( [ $ --> [ $ )
Output:

Testing in the Quackery shell.

/O>  [ nextword
...    [] swap witheach
...      [ dup char , = iff
...          drop else join ]
...    swap join ]            builds n ( [ $ --> [ $ )
... 

Stack empty.

/O> n 123,456,789
... 

Stack: 123456789 

/O>


Racket

Vanilla Racket does not have numeric separator syntax. However, it can be defined by users. A quick solution is to use #%top:

#lang racket

(require syntax/parse/define
         (only-in racket [#%top racket:#%top])
         (for-syntax racket/string))

(define-syntax-parser #%top
  [(_ . x)
   #:do [(define s (symbol->string (syntax-e #'x)))
         (define num (string->number (string-replace s "_" "")))]
   #:when num
   #`#,num]
  [(_ . x) #'(racket:#%top . x)])

1_234_567.89
1_234__567.89
Output:
1234567.89
1234567.89

In the above implementation of the syntax, _ is the separator. It allows multiple consecutive separators, and allows the separator anywhere in the numeral (front, middle, and back).

Implementation details: any token with _ is considered an identifier in vanilla Racket. If it's not defined already, it would be unbound. We therefore can define #%top to control these unbound identifiers: if the token is a number after removing _, expand it to that number.

If we wish to, for example, disallow multiple consecutive separators like 1_234__567.89, we could do so easily:

#lang racket

(require syntax/parse/define
         (only-in racket [#%top racket:#%top])
         (for-syntax racket/string))

(define-syntax-parser #%top
  [(_ . x)
   #:do [(define s (symbol->string (syntax-e #'x)))
         (define num (string->number (string-replace s "_" "")))]
   #:when num
   (syntax-parse #'x
     [_ #:fail-when (string-contains? s "__") "invalid multiple consecutive separators"
        #`#,num])]
  [(_ . x) #'(racket:#%top . x)])

1_234_567.89
1_234__567.89
Output:
1_234__567.89: invalid multiple consecutive separators in: 1_234__567.89

A more complicated solution is to create a new language that changes Racket's reader. One approach is to adjust the readtable to recognize the new number literals so that we don't need to change the whole reader. While slightly more complicated, this solution is better in the sense that (read) will also recognize the new number literals.

Raku

Raku (formerly Perl 6) allows underscore as a grouping / separator character in numeric inputs, though there are a few restrictions.

# Any numeric input value may use an underscore as a grouping/separator character.
# May occur in nearly any position, in any* number. * See restrictions below.

# Int
say 1_2_3;  # 123

# Binary Int
say 0b1_0_1_0_1; # 21

# Hexadecimal Int
say 0xa_bc_d; # 43981

# Rat
say 1_2_3_4.2_5; # 1234.25

# Num
say 6.0_22e4; # 60220

# There are some restrictions on the placement.
# An underscore may not be on an edge boundary, or next to another underscore.
# The following are all syntax errors.

# say _1234.25;
# say 1234_.25;
# say 1234._25;
# say 1234.25_;
# say 12__34.25;

REXX

The REXX language doesn't allow commas (or other separators) in decimal numbers (for input); commas are considered argument separators (if used from within a program, or as (passed/invoked) arguments from any program).

However, for binary and hexadecimal numbers, (multiple) blanks are allowed in appropriate places.


For binary numbers, blanks are allowed between groups of four binary digits.

For example:

   '1101 1001'B
   '1101 1001'b
   "1111 0101 0011 0010"B
    '111 0101 1110'b       is the same as  '0111 0101 1110'b   


For hexadecimal numbers, blanks are allowed between pairs of hexadecimal digits.

For example:

   'de ad    be ef 'x
   "08 09 0A"X
   '789 cc'x               is the same as   '07 89 CC'x


For decimal numbers, blanks are allowed between the sign (if present) and the numeric part of the number.
Optionally, blanks are allowed before the sign, and also after the number.

For example:

   + 4500
   -   1719


With a bit of coding, blanks, commas, or underscores can be used as separators within a REXX program:

pi= 3 . 14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74945
pi= 3 . 14159_26535_89793_23846_26433_83279_50288_41971_69399_37510_58209_74945
pi = space( translate(pi, , ",_"), 0)

─── where the last REXX statement will translate (change) any number of separator characters into blanks, and
remove all blanks from the "number".

Ruby

Ruby supports one separator, the underscore. It behaves like Perl's underscore.

Scala

Since Scala 2.13.0, the Scala Language Specification has stated: "The digits of a numeric literal may be separated by arbitrarily many underscores for purposes of legibility." Let's see how it works in a Scala REPL session:

Welcome to Scala 2.13.0 (Java HotSpot(TM) 64-Bit Server VM, Java 12.0.2).
Type in expressions for evaluation. Or try :help.

scala> // Integer Literals

scala> // Using _ as a digit separator (neither leading nor trailing) it can be placed anywhere in the number.

scala> 1_2_3
res0: Int = 123

scala> 0xa_bc_d
res1: Int = 43981

scala> 0x_dead_beef
res2: Int = -559038737

scala> 1_2_3_4.2_5
res3: Double = 1234.25

scala> 6.0_22e4
res4: Double = 60220.0

scala> 12__34.25
res5: Double = 1234.25

scala>

Sidef

Sidef allows underscores as a separator character in numeric inputs.

# Int
say 1_2_3;  # 123

# Binary Int
say 0b1_0_1_0_1; # 21

# Hexadecimal Int
say 0xa_bc_d; # 43981

# Rational
say 1_2_3_4.2_5; # 1234.25

# Rational in exponential notation
say 6.0_22e4; # 60220

# Apart from the start of the number, an underscore can be placed anywhere in the number.

say 1234_.25;       # 1234.25
say 1234._25;       # 1234.25
say 1234.25_;       # 1234.25
say 12__34.25;      # 1234.25
# say _1234.25;     # syntax error

V (Vlang)

Vlang also supports writing numbers with _ as a separator.

fn main() {
    numbers := [1_000_000, 2_882, 3_122, 0b1_0_0_0_1, 0xa_bc_d]
    for number in numbers {println(number)}
}
Output:
1000000
2882
3122
17
43981

XPL0

Numbers can contain underlines, which is useful for making long strings of digits easier to recognize. Underlines in coded constants are simply ignored by the parser. Underlines in numbers typed in to a running program are also ignored.

def Meg = 1_000_000;
[IntOut(0, Meg);  CrLf(0);
RlOut(0, 123__45.67_89_);  CrLf(0);
HexOut(0, $ABCD_EF01);  CrLf(0);
HexOut(0, %1010_1011_1100_1101_1110_1111_0000_0001);  CrLf(0);
IntOut(0, IntIn(0));
]
Output:
1000000
12345.67890
ABCDEF01
ABCDEF01
-321_00__0_
-321000

Wren

Library: Wren-fmt

Consistent with its C heritage, Wren doesn't support any form of separator in numeric literals. However, it's possible using the Wren-fmt module to add any single character 'thousands' separator when 'stringifying' an integer as the example below shows.

As currently written, this just supports separation of decimal integers into 3 digit groups from the right though it could be extended to deal with other scenarios as well.

import "./fmt" for Fmt

var nums = [1e6, 1e9, 123456789, -123456789012]
var seps = [",", ".", " ", "*"]
for (i in 0...nums.count) System.print(Fmt.commatize(nums[i], seps[i]))
Output:
1,000,000
1.000.000.000
123 456 789
-123*456*789*012

zkl

For source code, integers and floats allow a "_" between digits (or trailing)
and completely ignore them: 
   1_000 == 1_000_ == 1_0_0_0 == 1__________000
   1_2.3_4 == 12.34
For hex, both "_" and "|" are allowed: 0x12|34
For printing, the String.fmt method will add separators for %d (integer: ","), 
%f (float: ","), %x (hex: "|") and %2B (binary: "|").
"%,d  %,.0f  %,x  %,.2B".fmt(1234, 1234.0, 0x1234, 0x1234).println();
   --> "1,234  1,234  12|34  1|0010|0011|0100"
Each object's toString method has optional parameters to specify a separator 
and "column width". This method is called (by fmt) for the above tags.