String length

In this task, the goal is to find the character and byte length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters. For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16.

4D

Byte Length

$length:=Length("Hello, world!")

ActionScript

Character Length

myStrVar.length()

Ada

Byte Length

Compiler: GCC 4.1.2

Str    : String := "Hello World";
Length : constant Natural := Str'Size / System.Storage_Unit;

The 'size attribute returns the size of an object in bits. System.Storage_Unit is the number of bits in a byte on the current machine.

Character Length

Compiler: GCC 4.1.2

Str    : String := "Hello World";
Length : constant Natural := Str'Length;

ALGOL 68

Character Length

STRING str := "hello, world";
INT length := UPB str;
printf(($"Length of """g""" is "g(3)$,str,length))

Result:

Length of "hello, world" is +12

AppleScript

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

count of "Hello World"

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

count of "Hello World"

Or:

count "Hello World"

AWK

Byte Length

From within any code block:

w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example

Ad hoc program from command line:

echo "Hello, wørld!" | awk '{print length($0)}'   # 14

From executable script: (prints for every line arriving on stdin)

#!/usr/bin/awk -f
{print"The length of this line is "length($0)}

C

Byte Length

Standard: ANSI C (AKA C89):

Compiler: GCC 3.3.3

 #include <string.h>

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = strlen(string);
          
   return 0;
 }

or by hand:

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = 0;
   
   char *p = (char *) string;
   while (*p++ != '\0') length++;                                         
   
   return 0;
 }

or (for arrays of char only)

 #include <stdlib.h>
 
 int main(void)
 {
   char const s[] = "Hello, world!";
   size_t length = sizeof s - 1;
   
   return 0;
 }

Character Length

Compiler: ???

For wide character strings (usually Unicode uniform-width encodings such as UCS-2 or UCS-4):

 #include <stdio.h>
 #include <wchar.h>
 
 int main(void) 
 {
    wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
    size_t length;
 
    length = wcslen(s);
    printf("Length in characters = %d\n", length);
    printf("Length in bytes      = %d\n", sizeof(s) * sizeof(wchar_t));
    
    return 0;
 }

TODO: non-standard library calls for system multi-byte encodings, such as _mbcslen()

C++

Byte Length

Standard: ISO C++ (AKA C++98):

Compiler: g++ 4.0.2

 #include <string> // note: not <string.h>
 
 int main()
 {
   std::string s = "Hello, world!";
   std::string::size_type length = s.length(); // option 1: In Characters/Bytes
   std::string::size_type size = s.size();     // option 2: In Characters/Bytes
   // In bytes same as above since sizeof(char) == 1
   std::string::size_type bytes = s.length() * sizeof(std::string::value_type); 
 }

For wide character strings:

 #include <string>
 
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length() * sizeof(std::wstring::value_type); // in bytes
 }

Character Length

Standard: ISO C++ (AKA C++98):

Compiler: g++ 4.0.2

For wide character strings:

 #include <string>
 
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length();
}

TODO: similar calls for variable length encodings like UTF-8

C#

Byte Length

Platform: .NET Language Version: 1.0+

string s = "Hello, world!";
int blength = System.Text.Encoding.GetBytes(s).length; // In Bytes.

Character Length

Platform: .NET Language Version: 1.0+

string s = "Hello, world!";
int clength = s.Length;  // In characters

Clean

Byte Length

Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

import StdEnv

strlen :: String -> Int
strlen string = size string 

Start = strlen "Hello, world!"

ColdFusion

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

  #len("Hello World")#

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

  #len("Hello World")#

Common Lisp

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

  (length "Hello World")

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

  (length "Hello World")

Component Pascal

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

  LEN("Hello, World!")

E

Character Length

"Hello World".size()

Forth

Byte Length

Interpreter: ANS Forth

Strings in Forth come in two forms, neither of which are the null-terminated form commonly used in the C standard library.

Counted string

A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.

 CREATE s ," Hello world" \ create string "s"
 s C@ ( -- length=11 )

Stack string

A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word COUNT converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, easy representation of substrings, and not being limited to 255 characters.

S" string" ( addr len)
DUP .   \ 6

Character Length

The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)

Interpreter: ANS Forth

The following code will count the number of UTF-8 characters in a null-terminated string. It relies on the fact that all bytes of a UTF-8 character except the first have the the binary bit pattern "10xxxxxx".

2 base !
: utf8+ ( str -- str )
  begin
    char+
    dup c@
    11000000 and
    10000000 <>
  until ;
decimal

: count-utf8 ( zstr -- n )
  0
  begin
    swap dup c@
  while
    utf8+
    swap 1+
  repeat drop ;

Haskell

Byte Length

It is not possible to determine the "byte length" of an ordinary string, because in Haskell, a string is a boxed list of unicode characters. So each character in a string is represented as whatever the compiler considers as the most efficient representation of a cons-cell and a unicode character, and not as a byte.

For efficient storage of sequences of bytes, there's Data.ByteString, which uses Word8 as a base type. Byte strings have an additional Data.ByteString.Char8 interface, which will truncate each Unicode Char to 8 bits as soon as it is converted to a byte string. However, this is not adequate for the task, because truncation simple will garble characters other than Latin-1, instead of encoding them into UTF-8, say.

There are several (non-standard, so far) Unicode encoding libraries available on Hackage. As an example, we'll use encoding-0.2, as Data.Encoding:

import Data.Encoding
import Data.ByteString as B

strUTF8  :: ByteString 
strUTF8  = encode UTF8  "Hello World!"

strUTF32 :: ByteString 
strUTF32 = encode UTF32 "Hello World!"

strlenUTF8  = B.length strUTF8
strlenUTF32 = B.length strUTF32

Character Length

Interpreter: GHCi 6.6, Hugs

Compiler: GHC 6.6

The base type Char defined by the standard is already intended for (plain) Unicode characters.

strlen = length "Hello, world!"

IDL

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Compiler: any IDL compiler should do

 length = strlen("Hello, world!")

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Compiler: any IDL compiler should do

 length = strlen("Hello, world!")

Java

Byte Length

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

String s = "Hello, world!";
int byteCount = s.length() * 2;

Another way to know the byte length of a string is to explicitly specify the charset we desire.

String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length;
int byteCountUTF8  = s.getBytes("UTF-8").length;

Character Length

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

The length method of String objects gives the number of 16-bit values used to encode a string.

String s = "Hello, world!";
int length = s.length();

Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.

String str = "\uD834\uDD2A"; //U+1D12A
int length1 = str.length(); //2
int length2 = str.codePointCount(0, str.length()); //1

JavaScript

Byte Length

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

var s = "Hello, world!";
var byteCount = s.length * 2; //26

Character Length

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.

var str1 = "Hello, world!";
var len1 = str1.length; //13

var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2

JudoScript

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 //Store length of hello world in length and print it
 . length = "Hello World".length();

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 //Store length of hello world in length and print it
 . length = "Hello World".length();

LSE64

Byte Length

LSE stores strings as arrays of characters in 64-bit cells plus a count.

" Hello world" @ 1 + 8 * ,   # 96 = (11+1)*(size of a cell) = 12*8

Character Length

LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.

" Hello world" @ ,   # 11

Lua

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter: Lua 5.0 or later.

 string="Hello world"
 length=#string

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter: Lua 5.0 or later.

 string="Hello world"
 length=#string

MAXScript

Character Length

"Hello world".count

mIRC Scripting Language

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

Objective-C

Character Length

// Return the length in unicode characters
unsigned length = [@"Hello Wørld!" length];  // 12 (13 UTF-8 bytes)

OCaml

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter/Compiler: Ocaml 3.09

String.length "Hello world";;

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter/Compiler: Ocaml 3.09

String.length "Hello world";;

Perl

Byte Length

Interpreter: perl 5.8

Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).

use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);

print length encode 'UTF-8', "Hello, world! ☺";
# 17. The last character takes 3 bytes, the others 1 byte each.

print length encode 'UTF-16', "Hello, world! ☺";
# 32. 2 bytes for the BOM, then 15 byte pairs for each character.

Character Length

Interpreter: Perl any 5.X

 my $length = length "Hello, world!";

PHP

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 $length = strlen('Hello, world!');

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 $length = strlen('Hello, world!');

PL/SQL

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;

Pop11

Byte Length

Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so user can for example use UTF-8 (however builtin procedures will treat each byte as a single character). The length function for strings returns length in bytes:

lvars str = 'Hello, world!';
lvars len = length(str);

Python

Byte Length

Interpreter: Python 2.x

Byte length depends on the encoding. Python use 2 or 4 bytes per character internally for unicode strings, depending on how it was built. The internal representation is not interesting for the user.

# The letter Alef
>>> len(u'\u05d0'.encode('utf-8'))
2
>>> len(u'\u05d0'.encode('iso-8859-8'))
1

Example from the problem statement:

#!/bin/env python
# -*- coding: UTF-8 -*-
s = u"møøse"
assert len(s) == 5
assert len(s.encode('UTF-8')) == 7
assert len(s.encode('UTF-16')) == 12 # The extra character is probably a leading Unicode byte-order mark (BOM).

Character Length

Interpreter: Python 2.4

len() returns the number of characters in a unicode string or plain ascii string. To get the length of encoded string, you have to decode it first:

>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1

Ruby

Byte Length

 string="Hello world"
 print string.length

or

 puts "Hello World".length

Character Length

Library: active_support

 require 'active_support'
 puts "Hello World".chars.length

Scheme

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 (string-length "Hello world")

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 (string-length "Hello world")

Seed7

Character Length

 length("Hello, world!")

Smalltalk

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 string := 'Hello, world!".
 string size.

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

 string := 'Hello, world!".
 string size.

Standard ML

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)

Compiler: MLton 20061107

val strlen = size "Hello, world!";

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)

Compiler: MLton 20061107

val strlen = size "Hello, world!";

Tcl

Byte Length

Basic version:

 string bytelength "Hello, world!"

or more elaborately, needs Interpreter any 8.X. Tested on 8.4.12.

 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in bytes is %d"  $s1 [string bytelength $s1]]
 puts [format "length of \"%s\" in bytes is %d"  $s2 [string bytelength $s2]]

Character Length

Basic version:

 string length "Hello, world!"

or more elaborately, needs Interpreter any 8.X. Tested on 8.4.12.

 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in characters is %d"  $s1 [string length $s1]]
 puts [format "length of \"%s\" in characters is %d"  $s2 [string length $s2]]

Toka

Byte Length

 " hello, world!" string.getLength

UNIX Shell

Byte Length

With external utilities:

Interpreter: any bourne shell

 string='Hello, world!'
 length=`echo -n "$string" | wc -c | tr -dc '0-9'`
 echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:

Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell

 string='Hello, world!'
 length="${#string}"
 echo $length # if you want it printed to the terminal

VBScript

Byte Length

LenB(string|varname)

Returns the number of bytes required to store a string in memory. Returns null if string|varname is null.

Character Length

Len(string|varname)

Returns the length of the string|varname . Returns null if string|varname is null.

XSLT

Character Length

<?xml version="1.0" encoding="UTF-8"?>
...
<xsl:value-of select="string-length('møøse')" />

xTalk

Byte Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter: HyperCard

 put the length of "Hello World"

or

 put the number of characters in "Hello World"

Character Length

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

Interpreter: HyperCard

 put the length of "Hello World"

or

 put the number of characters in "Hello World"