Character codes: Difference between revisions


Revision as of 09:46, 12 November 2009

Given a character value in your language, print its code (could be ASCII code, Unicode code, or whatever your language uses). For example, the character 'a' (lowercase letter A) has a code of 97 in ASCII (as well as Unicode, as ASCII forms the beginning of Unicode). Conversely, given a code, print out the corresponding character.

Ada

<lang ada> with Ada.Text_IO; use Ada.Text_IO;

procedure Char_Code is begin

  Put_Line (Character'Val (97) & " =" & Integer'Image (Character'Pos ('a')));

end Char_Code; </lang> The predefined language attributes S'Pos and S'Val are defined for every discrete subtype, and Character is such a type. S'Pos yields the position number of a value, and S'Val yields the value at a given position. Sample output:

a = 97

ALGOL 68

In ALGOL 68 the FORMAT $g$ is type aware, hence the type conversion operators ABS & REPR are used to set the type. <lang algol> main:(

  printf(($gl$, ABS "a")); # for ASCII this prints "+97" EBCDIC prints "+129" #
  printf(($gl$, REPR 97))  # for ASCII this prints "a"; EBCDIC prints "/" #
)</lang>

Character conversions may be available in the standard prelude so that when a foreign tape is mounted, the characters will be converted transparently as the tape's records are read. <lang algol> FILE tape;

INT errno = open(tape, "/dev/tape1", stand out channel);
make conv(tape, ebcdic conv);
FOR record DO getf(tape, ( ~ )) OD; ~ # etc ... #</lang>

Every CHANNEL has an associated standard character conversion, which can be determined using the stand conv query routine and then applied to a particular file or tape, e.g. <lang algol> make conv(tape, stand conv(stand out channel))</lang>

AutoHotkey

<lang AutoHotkey>MsgBox % Chr(97)
MsgBox % Asc("a")</lang>

AWK

AWK has no built-in way to convert a character into its ASCII (or other) code, but a function that does so is easily built using an associative array whose keys are the characters. The opposite conversion can be done using printf (or sprintf) with %c.

<lang awk>function ord(c) {

 return chmap[c]

} BEGIN {

 for(i=0; i < 256; i++) {
   chmap[sprintf("%c", i)] = i
 }
 print ord("a"), ord("b")
 printf "%c %c\n", 97, 98
 s = sprintf("%c%c", 97, 98)
 print s

}</lang>

BASIC

Works with: QuickBasic version 4.5

<lang qbasic>charCode = 97
char$ = "a"
PRINT CHR$(charCode) 'prints a
PRINT ASC(char$) 'prints 97</lang>

C

char is already an integer type in C, and it gets automatically promoted to int. So you can use a character where you would otherwise use an integer. Conversely, you can use an integer where you would normally use a character, except you may need to cast it, as char is smaller.

<lang c>#include <stdio.h>

int main() {

 printf("%d\n", 'a'); /* prints "97" */
 printf("%c\n", 97); /* prints "a"; we don't have to cast because printf is type agnostic */
 return 0;

}</lang>

C++

char is already an integer type in C++, and it gets automatically promoted to int. So you can use a character where you would otherwise use an integer. Conversely, you can use an integer where you would normally use a character, except you may need to cast it, as char is smaller.

In this case, the output operator << is overloaded to handle integer (outputs the decimal representation) and character (outputs just the character) types differently, so we need to cast it in both cases. <lang cpp>#include <iostream>

int main() {

 std::cout << (int)'a' << std::endl; // prints "97"
 std::cout << (char)97 << std::endl; // prints "a"
 return 0;

}</lang>

C#

C# represents strings and characters internally as UTF-16, so casting a char to an int returns its UTF-16 code unit, which equals the Unicode code point for characters in the Basic Multilingual Plane. <lang csharp> using System;

namespace RosettaCode.CharacterCode {

   class Program
   {
       static void Main(string[] args)
       {
           Console.WriteLine((int) 'a');   //Prints "97"
           Console.WriteLine((char) 97);   //Prints "a"
       }
   }

} </lang>

Common Lisp

<lang lisp>(princ (char-code #\a)) ; prints "97"
(princ (code-char 97)) ; prints "a"</lang>

D

Could be treated like C, but since the standard string type is UTF-8, let's be verbose. <lang d>import std.stdio, std.utf;

void main() {

 string test = "a";
 size_t index = 0;   
 // get four-byte utf32 value for index 0
 // this returns dchar, so cast it to numeric
 writeln(cast(uint) test.decode(index));
 // index has moved to next character position in input
 assert(index == 1);

} </lang>

E

<lang e>? 'a'.asInteger()

value: 97

? <import:java.lang.makeCharacter>.asChar(97)

value: 'a'</lang>

Erlang

In Erlang, lists and strings are the same, only the representation changes. Thus: <lang erlang>1> F = fun([X]) -> X end.
#Fun<erl_eval.6.13229925>
2> F("a").
97</lang>

If entered manually, one can also get ASCII codes by prefixing characters with $: <lang erlang>3> $a.
97</lang>

Unicode is fully supported only since release R13A.

FALSE

'A."
"65,

Forth

As with C, characters are just integers on the stack which are treated as ASCII. <lang forth>char a
dup .    \ 97
emit     \ a</lang>

Fortran

Functions ACHAR and IACHAR specifically work with the ASCII character set, while the results of CHAR and ICHAR will depend on the default character set being used. <lang fortran>WRITE(*,*) ACHAR(97), IACHAR("a")
WRITE(*,*) CHAR(97), ICHAR("a")</lang>

Groovy

Groovy does not have a character literal at all, so one-character strings have to be coerced to char. Groovy printf (like Java, but unlike C) is not type-agnostic, so the cast or coercion from char to int is also required. The reverse direction is considerably simpler. <lang groovy>printf ("%d\n", ('a' as char) as int)
printf ("%c\n", 97)</lang>

Output:

97
a

Haskell

<lang haskell> import Data.Char

main = do

 print (ord 'a') -- prints "97"
 print (chr 97) -- prints "'a'"
 print (ord 'π') -- prints "960"
 print (chr 960) -- prints "'\960'"

</lang>

J

   4 u: 97 98 99 9786

abc☺

  3 u: 7 u: 'abc☺'

97 98 99 9786

Java

char is already an integer type in Java, and it gets automatically promoted to int. So you can use a character where you would otherwise use an integer. Conversely, you can use an integer where you would normally use a character, except you may need to cast it, as char is smaller.

In this case, the println method is overloaded to handle integer (outputs the decimal representation) and character (outputs just the character) types differently, so we need to cast it in both cases. <lang java>public class Foo {

   public static void main(String[] args) {
       System.out.println((int)'a'); // prints "97"
       System.out.println((char)97); // prints "a"
   }

}</lang>

Java characters support Unicode: <lang java>public class Bar {

   public static void main(String[] args) {
       System.out.println((int)'π'); // prints "960"
       System.out.println((char)960); // prints "π"
   }

}</lang>

JavaScript

Here character is just a string of length 1 <lang javascript>document.write('a'.charCodeAt(0)); // prints "97"
document.write(String.fromCharCode(97)); // prints "a"</lang>

Joy

Logo

Logo characters are words of length 1. <lang logo>print ascii "a    ; 97
print char 97     ; a</lang>

Metafont

Metafont handles only ASCII (even though codes beyond 127 can be given and used as real ASCII codes)

<lang metafont>message "enter a letter: ";
string a;
a := readstring;
message decimal (ASCII a); % writes the decimal number of the first character
                           % of the string a
message "enter a number: ";
num := scantokens readstring;
message char num;  % num can be anything between 0 and 255; what will be seen
                   % on output depends on the encoding used by the "terminal"; e.g.
                   % any code beyond 127 when UTF-8 encoding is in use will give
                   % a bad encoding; e.g. to see correctly an "è", we should write
message char10;  % (this adds a newline...)
message char hex"c3" & char hex"a8";  % since C3 A8 is the UTF-8 encoding for "è"
end</lang>

Modula-3

The built in functions ORD and VAL work on characters, among other things. <lang modula3>ORD('a')       (* Returns 97 *)
VAL(97, CHAR); (* Returns 'a' *)</lang>

OCaml

<lang ocaml>Printf.printf "%d\n" (int_of_char 'a'); (* prints "97" *)
Printf.printf "%c\n" (char_of_int 97); (* prints "a" *)</lang>

Pascal

<lang pascal>writeln(ord('a'));
writeln(chr(97));</lang>

Perl

Here character is just a string of length 1 <lang perl>print ord('a'), "\n"; # prints "97"
print chr(97), "\n"; # prints "a"</lang>

Perl 6

As Perl 5.

PHP

Here character is just a string of length 1 <lang php>echo ord('a'), "\n"; // prints "97"
echo chr(97), "\n"; // prints "a"</lang>

PowerShell

PowerShell does not allow for character literals directly, so to get a character one first needs to convert a single-character string to a char: <lang powershell>$char = [char] 'a'</lang>
Then a simple cast to int yields the character code: <lang powershell>$charcode = [int] $char # => 97</lang>
This also works with Unicode: <lang powershell>[int] [char] '☺' # => 9786</lang>
For converting an integral character code into the actual character, a cast to char suffices: <lang powershell>[char] 97    # a
[char] 9786  # ☺</lang>

Python

2.x

Here character is just a string of length 1

8-bit characters: <lang python>print ord('a') # prints "97"
print chr(97) # prints "a"</lang>

Unicode characters: <lang python>print ord(u'π') # prints "960"
print unichr(960) # prints "π"</lang>

3.x

Here character is just a string of length 1 <lang python>print(ord('a')) # prints "97"
print(ord('π')) # prints "960"
print(chr(97)) # prints "a"
print(chr(960)) # prints "π"</lang>

R

<lang R>ascii <- as.integer(charToRaw("hello world")); ascii
text <- rawToChar(as.raw(ascii)); text</lang>

Ruby

1.8

In Ruby 1.8 characters are usually represented directly as their integer character code. Ruby has a syntax for "character literal" which evaluates directly to the integer code: ?a evaluates to the integer 97. Subscripting a string also gives just the integer code for the character.

1.9

In Ruby 1.9 characters are represented as length-1 strings; same as in Python. The previous "character literal" syntax ?a is now the same as "a". Subscripting a string also gives a length-1 string. There is now an "ord" method of strings to convert a character into its integer code.
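
The 1.9 behaviour described above can be sketched roughly as follows. This is a minimal illustration, assuming Ruby 1.9 or later; the helper names char_code and code_char are hypothetical and not part of Ruby itself.

```ruby
# Assumes Ruby 1.9 or later.
# char_code and code_char are hypothetical helper names used only for illustration.

# Character (a length-1 string) -> integer code.
def char_code(ch)
  ch.ord
end

# Integer code -> length-1 string.
# Passing an encoding lets codes above 255 be converted as well.
def code_char(code)
  code.chr(Encoding::UTF_8)
end

puts char_code("a")        # prints 97
puts code_char(97)         # prints a
puts char_code("abc"[0])   # subscripting yields "a", so this prints 97
puts ?a == "a"             # prints true
puts char_code("\u03C0")   # prints 960
puts code_char(960)        # prints π
```

The explicit encoding in code_char is a design choice for this sketch: called without an argument, chr generally rejects codes above 255, so the Unicode case would not work.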

Scheme

<lang scheme>(display (char->integer #\a)) (newline) ; prints "97"
(display (integer->char 97)) (newline) ; prints "a"</lang>

Slate

<lang slate>$a code.
97 as: String Character.</lang>

Smalltalk

<lang smalltalk>($a asInteger) displayNl. "output 97"
(Character value: 97) displayNl. "output a"</lang>

Standard ML

<lang sml>print (Int.toString (ord #"a") ^ "\n"); (* prints "97" *)
print (Char.toString (chr 97) ^ "\n"); (* prints "a" *)</lang>

Tcl

<lang tcl># ASCII
puts [scan "a" %c]   ;# ==> 97
puts [format %c 97]  ;# ==> a

# Unicode is the same
puts [scan "π" %c]   ;# ==> 960
puts [format %c 960] ;# ==> π</lang>

TI-89 BASIC

The TI-89 uses an 8-bit charset/encoding which is similar to ISO-8859-1, but with more mathematical symbols and Greek letters. At least codes 14-31, 128-160, 180 differ. The ASCII region is unmodified. (TODO: Give a complete list.)

The TI Connect X desktop software converts between this unique character set and Unicode characters, though sometimes in a consistent but inappropriate fashion.

The program below displays the character and code for any key pressed. Some keys do not correspond to characters and have codes greater than 255. The portion of the program that actually implements the task is marked with a line of “©”s.

Prgm
  Local k, s
  ClrIO
  Loop
    Disp "Press a key, or ON to exit."
    getKey() © clear buffer
    0 → k : While k = 0 : getKey() → k : EndWhile
    ClrIO
    If k ≥ 256 Then
      Disp "Not a character."
      Disp "Code: " & string(k)
    Else

      char(k) → s                           ©
      © char() and ord() are inverses.      ©
      Disp "Character: " & s                ©
      Disp "Code: " & string(ord(s))        ©

    EndIf
  EndLoop
EndPrgm

Visual Basic .NET

<lang vbnet>Console.WriteLine(Chr(97)) 'Prints a
Console.WriteLine(Asc("a")) 'Prints 97</lang>

Ursala

Character code functions are not built in but easily defined as reifications of the character table. <lang Ursala>
#import std
#import nat

chr = -: num characters
asc = -:@rlXS num characters

#cast %cnX

test = (chr97,asc`a)</lang>
output:

(`a,97)

@@ Line 188: / Line 188: @@
 =={{header|J}}==
-<lang j>   4 u: 97 98 99 9786
+   4 u: 97 98 99 9786
 abc☺
    3 u: 7 u: 'abc☺'
 97 98 99 9786
-</lang>
 =={{header|Java}}==