Names to numbers

From Rosetta Code
Revision as of 04:16, 16 May 2013 by rosettacode>2Powers (Created page with "{{draft task|Names to numbers}} Translate the spelled-out English name of a number to a number. You can use a preexisting implementation or roll your own, but you should supp...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Names to numbers is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Translate the spelled-out English name of a number to a number. You can use a preexisting implementation or roll your own, but you should support inputs up to at least one million (or the maximum value of your language's default bounded integer type, if that's less).

Support for inputs other than positive integers (like zero, negative integers, fractions and floating-point numbers) is optional.

The reverse operation is Number names.

Perl

The following code reads a file line-by-line. It echos comment lines starting with a hashmark and blank lines. Remaining lines are output followed by an arrow "=>" and any non-negative integer number names translated into numbers, e.g., a line with "ninety-nine" is output like this: "ninety-nine => 99". <lang perl> use strict; use List::Util qw(sum);

our %nums = (

   zero      => 0,	one	=> 1,	two	=> 2,	three	=> 3,
   four      => 4,	five	=> 5,	six	=> 6,	seven	=> 7,
   eight     => 8,	nine	=> 9,	ten	=> 10,	eleven	=> 11,
   twelve    => 12,	thirteen => 13,	fourteen => 14,	fifteen	=> 15,
   sixteen   => 16,	seventeen => 17, eighteen => 18, nineteen => 19,
   twenty    => 20,	thirty	=> 30,	forty	=> 40,	fifty	=> 50,
   sixty     => 60,	seventy	=> 70,	eighty	=> 80,	ninety	=> 90,
   hundred   => 100,	thousand => 1_000, million => 1_000_000,
   billion   => 1_000_000_000,         trillion => 1_000_000_000_000,
   # My ActiveState Win32 Perl uses e-notation after 999_999_999_999_999
   quadrillion => 1e+015,              quintillion =>  1e+018);
  1. Groupings for thousands, millions, ..., quintillions

our $groups = qr/\d{4}|\d{7}|\d{10}|\d{13}|1e\+015|1e\+018/;

  1. Numeral or e-notation

our $num = qr/\d+|\d+e\+\d+/;

while (<>) {

       # skip blank lines
       if(/^\s*$/) { print; next; }
       # echo comment lines
       if( /^\s*#.*$/ ) { print; next; }

chomp; my $orig = $_; s/-/ /g; # convert hyphens to spaces s/\s\s+/ /g; # remove duplicate whitespace, convert ws to space s/ $//g; # remove trailing blank s/^ //g; # remove leading blank $_ = lc($_); # convert to lower case # tokenize sentence boundaries s/([\.\?\!]) / $1\n/g; s/([\.\?\!])$/ $1\n/g; # tokenize other punctuation and symbols s/\$(.)/\$ $1/g; # prefix s/(.)([\;\:\%',])/$1 $2/g; # suffix

foreach my $key (keys %nums) { s/\b$key\b/$nums{$key}/eg; }

s/(\d) , (\d)/$1 $2/g; s/(\d) and (\d)/$1 $2/g;

s/\b(\d) 100 (\d\d) (\d) (${groups})\b/($1 * 100 + $2 + $3) * $4/eg;

s/\b(\d) 100 (\d\d) (${groups})\b/($1 * 100 + $2) * $3/eg; s/\b(\d) 100 (\d) (${groups})\b/($1 * 100 + $2) * $3/eg; s/\b(\d) 100 (${groups})\b/$1 * $2 * 100/eg;

       s/\b100 (\d\d) (\d) (${groups})\b/(100 + $1 + $2) * $3/eg;
       s/\b100 (\d\d) (${groups})\b/(100 + $1) * $2/eg;
       s/\b100 (\d) (${groups})\b/(100 + $1) * $2/eg;
       s/\b100 (${groups})\b/$1 * 100/eg;

s/\b(\d\d) (\d) (${groups})\b/($1 + $2) * $3/eg; s/\b(\d{1,2}) (${groups})\b/$1 * $2/eg;

s/\b(\d\d) (\d) 100\b/($1 + $2) * 100/eg; s/\b(\d{1,2}) 100\b/$1 * 100/eg;

       # Date anomolies: nineteen eighty-four and twenty thirteen

s/\b(\d{2}) (\d{2})\b/$1 * 100 + $2/eg;

s/((?:${num} )*${num})/sum(split(" ",$1))/eg;

print $orig, " => ", $_, "\n"; } </lang> Here is a sample input file:

# For numbers between 21 and 99 inclusive, we're supposed to use a hyphen,
# but the bank still cashes our check without the hyphen:
Seventy-two dollars
Seventy two dollars

# For numbers bigger than 100, we're not supposed to use "and,"
# except we still use "and" anyway, e.g.,
One Hundred and One Dalmatians
A Hundred and One Dalmatians
One Hundred One Dalmatians
Hundred and One Dalmatians
One Thousand and One Nights
Two Thousand and One: A Space Odyssey

# Date anomolies
Twenty Thirteen
Nineteen Eighty-Four

# Maximum value an "unsigned long int" can hold on a 32-bit machine = 2^32 - 1
# define ULONG_MAX	4294967295
four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety five

# Max positive integer on 32-bit Perl is 9_007_199_254_740_992 = 2^53
# Use Math::BigInt if you need more.
# Note Perl usually stringifies to 15 digits of precision and this has 16
Nine quadrillion, seven trillion, one hundred ninety-nine billion, two hundred fifty-four million, seven hundred forty thousand, nine hundred ninety two

Nine Hundred Ninety-Nine
One Thousand One Hundred Eleven
Eleven Hundred Eleven
Eight Thousand Eight Hundred Eighty-Eight
Eighty-Eight Hundred Eighty-Eight
Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven
Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine

ninety-nine
three hundred
three hundred and ten
one thousand, five hundred and one
twelve thousand, six hundred and nine
five hundred and twelve thousand, six hundred and nine
forty-three million, one hundred and twelve thousand, six hundred and nine
two billion, one hundred

zero
eight
one hundred
one hundred twenty three
one thousand one
ninety nine thousand nine hundred ninety nine
one hundred thousand
nine billion one hundred twenty three million four hundred fifty six thousand seven hundred eighty nine
one hundred eleven billion one hundred eleven

And here is the resulting output file:

# For numbers between 21 and 99 inclusive, we're supposed to use a hyphen,
# but the bank still cashes our check without the hyphen:
Seventy-two dollars => 72 dollars
Seventy two dollars => 72 dollars

# For numbers bigger than 100, we're not supposed to use "and,"
# except we still use "and" anyway, e.g.,
One Hundred and One Dalmatians => 101 dalmatians
A Hundred and One Dalmatians => a 101 dalmatians
One Hundred One Dalmatians => 101 dalmatians
Hundred and One Dalmatians => 101 dalmatians
One Thousand and One Nights => 1001 nights
Two Thousand and One: A Space Odyssey => 2001 : a space odyssey

# Date anomolies
Twenty Thirteen => 2013
Nineteen Eighty-Four => 1984

# Maximum value an "unsigned long int" can hold on a 32-bit machine = 2^32 - 1
# define ULONG_MAX	4294967295
four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety five => 4294967295

# Max positive integer on 32-bit Perl is 9_007_199_254_740_992 = 2^53
# Use Math::BigInt if you need more.
# Note Perl usually stringifies to 15 digits of precision and this has 16
Nine quadrillion, seven trillion, one hundred ninety-nine billion, two hundred fifty-four million, seven hundred forty thousand, nine hundred ninety two => 9.00719925474099e+015

Nine Hundred Ninety-Nine => 999
One Thousand One Hundred Eleven => 1111
Eleven Hundred Eleven => 1111
Eight Thousand Eight Hundred Eighty-Eight => 8888
Eighty-Eight Hundred Eighty-Eight => 8888
Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven => 7777777
Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine => 99999999999999

ninety-nine => 99
three hundred => 300
three hundred and ten => 310
one thousand, five hundred and one => 1501
twelve thousand, six hundred and nine => 12609
five hundred and twelve thousand, six hundred and nine => 512609
forty-three million, one hundred and twelve thousand, six hundred and nine => 43112609
two billion, one hundred => 2000000100

zero => 0
eight => 8
one hundred => 100
one hundred twenty three => 123
one thousand one => 1001
ninety nine thousand nine hundred ninety nine => 99999
one hundred thousand => 100000
nine billion one hundred twenty three million four hundred fifty six thousand seven hundred eighty nine => 9123456789
one hundred eleven billion one hundred eleven => 111000000111