Non-decimal radices/Input: Difference between revisions

From Rosetta Code

Revision as of 08:44, 20 June 2009

Task
Non-decimal radices/Input
You are encouraged to solve this task according to the task description, using any language you may know.

It is common to have a string containing a number written in some format, the most common being decimal, hexadecimal, octal and binary. Such strings are found in many places (user interfaces, configuration files, XML data, network protocols, etc.).

This task requires parsing such a string (which may be assumed to contain nothing else) using the language's built-in facilities if possible. Parsing of decimal strings is required; parsing of other formats is optional but should be shown (i.e., if the language can parse base-19 numbers, that should be illustrated).

The solutions may assume that the base of the number in the string is known.

The reverse operation is in task Common number base formatting.

For general number base conversion, see Number base conversion.
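
As a rough illustration of what parsing in a known base involves when no built-in facility is available, here is a minimal Python sketch. The names parse_in_base and DIGITS are invented for this example, and the digit alphabet (0-9 followed by a-z, case-insensitive, bases 2 to 36) is an assumption that matches common convention rather than anything the task mandates.

```python
import string

# Assumed digit alphabet for bases 2..36: '0'-'9' followed by 'a'-'z'.
DIGITS = string.digits + string.ascii_lowercase

def parse_in_base(text, base):
    """Parse an unsigned numeral `text` written in `base` (2..36) into an int."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if not text:
        raise ValueError("empty string")
    value = 0
    for char in text.lower():
        digit = DIGITS.find(char)
        if digit < 0 or digit >= base:
            raise ValueError("bad digit %r for base %d" % (char, base))
        # Shift the accumulated value one place to the left, then add the digit.
        value = value * base + digit
    return value

print(parse_in_base("200", 16))  # 512
print(parse_in_base("255", 19))  # 822
print(parse_in_base("abc", 20))  # 4232
```

The sketch rejects signs, whitespace and prefixes such as "0x"; a real solution would normally prefer the language's own parser, as the entries below do.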

AutoHotkey

There is no built-in support for generic base parsing.
<lang AutoHotkey>MsgBox % number2base(200, 16) ; 12
MsgBox % parse(200, 16)  ; 512

number2base(number, base) {

 While, base < digit := Floor(number / base)
 {
   result := Mod(number, base) . result
   number := digit
 }
 result := digit . result
 Return result

}

parse(number, base) {

 result = 0
 pos := StrLen(number) - 1
 Loop, Parse, number 
 {
   result := ((base ** pos) * A_LoopField) + result
   pos -= 1  ; step down to the next lower place value
 }
 Return result

}</lang>
An alternative implementation, contributed by Laszlo on the AHK forum:
<lang AutoHotkey>MsgBox % BC("FF",16,3) ; -> 100110 in base 3 = FF in hex = 255 in base 10

BC(NumStr,InputBase=8,OutputBase=10) {

 Static S = 12345678901234567890123456789012345678901234567890123456789012345
 DllCall("msvcrt\_i64toa","Int64",DllCall("msvcrt\_strtoui64","Str",NumStr,"Uint",0,"UInt",InputBase,"CDECL Int64"),"Str",S,"UInt",OutputBase,"CDECL")
 Return S

} </lang>

Common Lisp

<lang lisp>(parse-integer "abc" :radix 20 :junk-allowed t) ; => 4232</lang>

If :radix is omitted, it defaults to 10. If :junk-allowed is omitted, it defaults to nil, causing #'parse-integer to signal an error of type parse-error rather than just returning nil whenever the input string isn't a numeral possibly surrounded by whitespace.

E

<lang e>? __makeInt("200", 16)
# value: 512

? __makeInt("200", 10)
# value: 200</lang>

Forth

Arbitrary base 2-36 parsing is supported by the same mechanism as decimal parsing: set the user variable BASE to the desired base, then scan the number. There are two convenience words, DECIMAL and HEX, for setting the base to 10 or 16.
<lang forth>: parse# ( str len -- u true | false )
   0. 2SWAP DUP >R >NUMBER NIP NIP 
   R> <> DUP 0= IF NIP THEN ;
: base# ( str len base -- u true | false )
  BASE @ >R  BASE !  parse#  R> BASE ! ;</lang>

Java

Works with: Java version 1.5+

You must know the base that the String is in before you scan it. Create a Scanner in the usual way, then set its radix to that base (the default is 10):
<lang java5>Scanner sc = new Scanner(System.in); //or any other InputStream or String
sc.useRadix(base); //any number from Character.MIN_RADIX (2) to Character.MAX_RADIX (36)
sc.nextInt(); //read in a value</lang>
Later you can call sc.useRadix(10) to undo this change, or sc.reset(), which requires Java 1.6 or later.

Another option uses the Integer class:
<lang java>Integer number = Integer.valueOf(stringNum, base);</lang>
The base here has the same restrictions as in the Scanner example. A similar method is available in the Long class. Omit the second argument for base 10.

OCaml

The int_of_string function can parse hexadecimal, octal, and binary numbers that carry the same prefixes used for OCaml constants ("0x", "0o", and "0b", respectively):
<lang ocaml># int_of_string "123459";;
- : int = 123459
# int_of_string "0xabcf123";;
- : int = 180154659
# int_of_string "0o7651";;
- : int = 4009
# int_of_string "0b101011001";;
- : int = 345</lang>
The Int32.of_string, Int64.of_string, and Nativeint.of_string functions also understand the above prefixes when parsing into their respective types.

Unfortunately, the Big_int.big_int_of_string function does not understand these prefixes.

You could also use the Scanf module to parse un-prefixed hexadecimal, decimal, and octal numbers (binary is not supported):
<lang ocaml># Scanf.sscanf "123459" "%d" (fun x -> x);;
- : int = 123459
# Scanf.sscanf "abcf123" "%x" (fun x -> x);;
- : int = 180154659
# Scanf.sscanf "7651" "%o" (fun x -> x);;
- : int = 4009</lang>

Perl

The hex() function parses hexadecimal strings. The oct() function parses octal strings, as well as hexadecimal, octal, or binary strings with the appropriate prefix ("0x", "0", and "0b", respectively). There is no need to parse decimal strings because in Perl decimal strings and numbers are interchangeable.
<lang perl>my $dec = "0123459";
my $hex_noprefix = "abcf123";
my $hex_withprefix = "0xabcf123";
my $oct_noprefix = "7651";
my $oct_withprefix = "07651";
my $bin_withprefix = "0b101011001";

print 0 + $dec, "\n"; # => 123459
print hex($hex_noprefix), "\n"; # => 180154659
print hex($hex_withprefix), "\n"; # => 180154659
print oct($hex_withprefix), "\n"; # => 180154659
print oct($oct_noprefix), "\n"; # => 4009
print oct($oct_withprefix), "\n"; # => 4009
print oct($bin_withprefix), "\n"; # => 345
# nothing for binary without prefix
</lang>

PHP

The hexdec(), octdec(), and bindec() functions parse hexadecimal, octal, and binary strings, respectively. They skip any invalid characters, so a prefix will be ignored. There is no need to parse decimal strings because in PHP decimal strings and numbers are interchangeable.
<lang php><?php
echo 0 + "0123459"; // prints 123459
echo hexdec("abcf123"); // prints 180154659
echo octdec("7651"); // prints 4009
echo bindec("101011001"); // prints 345
?></lang>

Python

The int function will interpret strings as numbers expressed to some base:
<lang python>>>> text = '100'
>>> for base in range(2,21):
    print ("String '%s' in base %i is  %i in base 10" 
           % (text, base, int(text, base)))

	
String '100' in base 2 is  4 in base 10
String '100' in base 3 is  9 in base 10
String '100' in base 4 is  16 in base 10
String '100' in base 5 is  25 in base 10
String '100' in base 6 is  36 in base 10
String '100' in base 7 is  49 in base 10
String '100' in base 8 is  64 in base 10
String '100' in base 9 is  81 in base 10
String '100' in base 10 is  100 in base 10
String '100' in base 11 is  121 in base 10
String '100' in base 12 is  144 in base 10
String '100' in base 13 is  169 in base 10
String '100' in base 14 is  196 in base 10
String '100' in base 15 is  225 in base 10
String '100' in base 16 is  256 in base 10
String '100' in base 17 is  289 in base 10
String '100' in base 18 is  324 in base 10
String '100' in base 19 is  361 in base 10
String '100' in base 20 is  400 in base 10</lang>
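
The int function can also infer the base from a literal-style prefix when the base argument is 0, much as the Perl and OCaml entries rely on prefixes. The following is a small sketch assuming Python 3; parse_prefixed is a name made up for this example and is not a standard function.

```python
def parse_prefixed(text):
    """Hypothetical helper: let int() infer the base from a 0x/0o/0b prefix."""
    return int(text, 0)

# Base 0 means "read the string like an integer literal".
for text in ("0x1f", "0o17", "0b101", "42"):
    print("%s -> %d" % (text, parse_prefixed(text)))

# A matching prefix is also accepted when the base is given explicitly.
print(int("0xabcf123", 16))

# A digit that is not valid in the requested base raises ValueError.
try:
    int("0123459", 8)
except ValueError as err:
    print("rejected:", err)
```

With an explicit base, int accepts any base from 2 to 36; with base 0, an unprefixed string is read as decimal.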

Ruby

The String class has methods to coerce a string into another form:
<lang ruby>dec1 = "0123459"
hex2 = "abcf123"
oct3 = "7651"
bin4 = "101011001"

dec1.to_i # => 123459
hex2.hex  # => 180154659
oct3.oct  # => 4009
# nothing for binary</lang>

The Integer class can parse a string, provided the string has the right prefix:
<lang ruby>Integer(dec1)                  # => ArgumentError: invalid value for Integer: "0123459"
Integer(dec1.sub(/^0+/,""))    # => 123459
Integer("0x" + hex2)           # => 180154659
Integer("0" + oct3)            # => 4009
Integer("0b" + bin4)           # => 345</lang>

And then there's the poorly documented Scanf module in the Ruby stdlib, which returns the matched values in an array:
<lang ruby>require 'scanf'
dec1.scanf("%d") # => [123459]
hex2.scanf("%x") # => [180154659]
oct3.scanf("%o") # => [4009]
# no scanf specifier for binary numbers.</lang>

Standard ML

<lang sml>- Int.fromString "0123459";
val it = SOME 123459 : int option
- StringCvt.scanString (Int.scan StringCvt.HEX) "0xabcf123";
val it = SOME 180154659 : int option
- StringCvt.scanString (Int.scan StringCvt.HEX) "abcf123";
val it = SOME 180154659 : int option
- StringCvt.scanString (Int.scan StringCvt.OCT) "7651";
val it = SOME 4009 : int option
- StringCvt.scanString (Int.scan StringCvt.BIN) "101011001";
val it = SOME 345 : int option</lang>

Tcl

<lang tcl>package require Tcl 8.6; # For easy scanning of binary

# The strings to parse
set dec1 "0123459"
set hex2 "abcf123"
set oct3 "7651"
set bin4 "101011001"

# Parse the numbers
scan $dec1 "%d" v1
scan $hex2 "%x" v2
scan $oct3 "%o" v3
scan $bin4 "%b" v4; # Only 8.6-specific operation; others work in all versions

# Print out what happened
puts "$dec1->$v1 $hex2->$v2 $oct3->$v3 $bin4->$v4"</lang>
This produces the following output:

0123459->123459 abcf123->180154659 7651->4009 101011001->345

For a general parser up to base 36, a little function can be written:
<lang Tcl>proc scanbase {str base} {

  set res 0
  set digits {0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v w x y z}
  foreach char [split [string tolower $str] ""] {
     set value [lsearch [lrange $digits 0 [expr {$base - 1}]] $char]
     if {$value < 0} {error "bad digit $char"}
     set res [expr {$res*$base + $value}]
  }
  return $res

}</lang>
Example:

% scanbase 255 19
822
% scanbase $dec1 8
bad digit 9