String length
You are encouraged to solve this task according to the task description, using any language you may know.
In this task, the goal is to find the character and byte length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters. For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16.
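The numbers quoted above can be checked with a minimal sketch. It assumes Python 3, and the function name `lengths` is made up for illustration. Characters are counted as code points with `len`, and bytes are counted after encoding explicitly; `utf-16-le` is used so that no byte-order mark is added.

```python
def lengths(text):
    """Return (characters, UTF-8 bytes, UTF-16 bytes) for a str."""
    chars = len(text)                      # code points in a Python 3 str
    utf8 = len(text.encode("utf-8"))       # 1 to 4 bytes per code point
    utf16 = len(text.encode("utf-16-le"))  # 2 or 4 bytes per code point, no BOM
    return chars, utf8, utf16


print(lengths("møøse"))  # (5, 7, 10)
```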
4D
Byte Length
$length:=Length("Hello, world!")
ActionScript
Character Length
myStrVar.length
Ada
Byte Length
Compiler: GCC 4.1.2
Str    : String := "Hello World";
Length : constant Natural := Str'Size / System.Storage_Unit;
The 'size attribute returns the size of an object in bits. System.Storage_Unit is the number of bits in a byte on the current machine.
Character Length
Compiler: GCC 4.1.2
Str    : String := "Hello World";
Length : constant Natural := Str'Length;
ALGOL 68
Character Length
STRING str := "hello, world";
INT length := UPB str;
printf(($"Length of """g""" is "g(3)$, str, length))
Result:
Length of "hello, world" is +12
AppleScript
Byte Length
count of "Hello World"
Character Length
count of "Hello World"
Or:
count "Hello World"
AWK
Byte Length
From within any code block:
w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example
Ad hoc program from command line:
echo "Hello, wørld!" | awk '{print length($0)}' # 14
From executable script: (prints for every line arriving on stdin)
#!/usr/bin/awk -f
{print"The length of this line is "length($0)}
C
Byte Length
Compiler: GCC 3.3.3
#include <string.h>

int main(void)
{
    const char *string = "Hello, world!";
    size_t length = strlen(string);
    return 0;
}
or by hand:
#include <stddef.h>

int main(void)
{
    const char *string = "Hello, world!";
    size_t length = 0;
    const char *p = string; /* no cast needed; keeps the const qualifier */
    while (*p++ != '\0') length++;
    return 0;
}
or (for arrays of char only)
#include <stdlib.h>

int main(void)
{
    char const s[] = "Hello, world!";
    size_t length = sizeof s - 1; /* sizeof counts the terminating '\0' */
    return 0;
}
Character Length
Compiler: ???
For wide character strings (usually Unicode uniform-width encodings such as UCS-2 or UCS-4):
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    const wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
    size_t length = wcslen(s);

    printf("Length in characters = %zu\n", length);
    /* sizeof(s) is only the size of the pointer, so multiply the length instead */
    printf("Length in bytes = %zu\n", length * sizeof(wchar_t));
    return 0;
}
TODO: non-standard library calls for system multi-byte encodings, such as _mbcslen()
C++
Byte Length
Standard: ISO C++ (AKA C++98):
Compiler: g++ 4.0.2
#include <string> // note: not <string.h>

int main()
{
    std::string s = "Hello, world!";
    std::string::size_type length = s.length(); // option 1: in characters/bytes
    std::string::size_type size = s.size();     // option 2: in characters/bytes
    // In bytes: same as above, since sizeof(char) == 1
    std::string::size_type bytes = s.length() * sizeof(std::string::value_type);
}
For wide character strings:
#include <string>

int main()
{
    std::wstring s = L"\u304A\u306F\u3088\u3046";
    std::wstring::size_type length = s.length() * sizeof(std::wstring::value_type); // in bytes
}
Character Length
Standard: ISO C++ (AKA C++98):
Compiler: g++ 4.0.2
For wide character strings:
#include <string>

int main()
{
    std::wstring s = L"\u304A\u306F\u3088\u3046";
    std::wstring::size_type length = s.length();
}
TODO: similar calls for variable length encodings like UTF-8
C#
Byte Length
Platform: .NET Language Version: 1.0+
string s = "Hello, world!";
int blength = System.Text.Encoding.UTF8.GetBytes(s).Length; // in bytes, when encoded as UTF-8
Character Length
Platform: .NET Language Version: 1.0+
string s = "Hello, world!";
int clength = s.Length; // in characters
Clean
Byte Length
Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.
import StdEnv

strlen :: String -> Int
strlen string = size string

Start = strlen "Hello, world!"
ColdFusion
Byte Length
#len("Hello World")#
Character Length
#len("Hello World")#
Common Lisp
Byte Length
(length "Hello World")
Character Length
(length "Hello World")
Component Pascal
Byte Length
LEN("Hello, World!")
E
Character Length
"Hello World".size()
Forth
Byte Length
Interpreter: ANS Forth
Strings in Forth come in two forms, neither of which is the null-terminated form commonly used in the C standard library.
Counted string
A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.
CREATE s ," Hello world" \ create string "s"
s C@ ( -- length=11 )
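The layout of a counted string can be mimicked outside Forth. The following Python 3 sketch only illustrates the idea; the helper names `make_counted` and `counted_length` are hypothetical, not Forth words. The first byte holds the count, so reading the length is a single byte fetch, and the count cannot exceed 255.

```python
def make_counted(data: bytes) -> bytes:
    """Prefix data with a one-byte count, like a Forth counted string."""
    if len(data) > 255:
        raise ValueError("a counted string holds at most 255 bytes")
    return bytes([len(data)]) + data


def counted_length(counted: bytes) -> int:
    """Read the length from the first byte (the role C@ plays above)."""
    return counted[0]


s = make_counted(b"Hello world")
print(counted_length(s))  # 11
```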
Stack string
A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word COUNT converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, easy representation of substrings, and not being limited to 255 characters.
S" string" ( addr len) DUP . \ 6
Character Length
The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)
Interpreter: ANS Forth
The following code counts the number of UTF-8 characters in a null-terminated string. It relies on the fact that every byte of a UTF-8 character except the first has the binary bit pattern "10xxxxxx".
2 base !
: utf8+ ( str -- str )
  begin
    char+
    dup c@
    11000000 and
    10000000 <>
  until ;
decimal

: count-utf8 ( zstr -- n )
  0
  begin
    swap dup c@
  while
    utf8+
    swap 1+
  repeat drop ;
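The same bit-pattern test can be sketched in Python 3 for comparison. This is an illustration under assumptions, not a translation of the Forth words: the function name `count_utf8` is made up, the input is a `bytes` object of known length rather than a null-terminated string, and the data is assumed to be valid UTF-8.

```python
def count_utf8(data: bytes) -> int:
    """Count UTF-8 characters by counting the bytes that start a character."""
    # Continuation bytes look like 10xxxxxx, i.e. (byte & 0b11000000) == 0b10000000.
    # Every other byte is the first byte of a character.
    return sum(1 for byte in data if (byte & 0b11000000) != 0b10000000)


print(count_utf8("møøse".encode("utf-8")))  # 5
```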
Haskell
Byte Length
It is not possible to determine the "byte length" of an ordinary string, because in Haskell a string is a boxed list of Unicode characters. Each character in a string is therefore stored in whatever form the compiler considers the most efficient representation of a cons-cell and a Unicode character, not as a byte.
For efficient storage of sequences of bytes, there's Data.ByteString, which uses Word8 as a base type. Byte strings have an additional Data.ByteString.Char8 interface, which truncates each Unicode Char to 8 bits as soon as it is converted to a byte string. However, this is not adequate for the task, because truncation will simply garble characters outside Latin-1 instead of encoding them into, say, UTF-8.
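To make the difference between truncating and encoding concrete, here is a small Python 3 sketch (Python is used only for illustration; it does not call the Haskell libraries). It keeps the low 8 bits of each code point, which is the kind of truncation described above, and compares the result with a real UTF-8 encoding. The sample text is an arbitrary choice.

```python
text = "møøse ☺"

# Keep only the low 8 bits of each code point (truncation, not encoding).
truncated = bytes(ord(ch) & 0xFF for ch in text)

# Encode properly as UTF-8 instead.
encoded = text.encode("utf-8")

print(len(truncated))               # 7: one byte per character
print(len(encoded))                 # 11
print(truncated.decode("latin-1"))  # møøse :  (U+263A was cut down to 0x3A, a colon)
```

The Latin-1 letters survive truncation, but the smiley is silently replaced by an unrelated character, so the truncated byte count says nothing about any real encoding.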
There are several (non-standard, so far) Unicode encoding libraries available on Hackage. As an example, we'll use encoding-0.2, as Data.Encoding:
import Data.Encoding
import Data.ByteString as B

strUTF8 :: ByteString
strUTF8 = encode UTF8 "Hello World!"

strUTF32 :: ByteString
strUTF32 = encode UTF32 "Hello World!"

strlenUTF8 = B.length strUTF8
strlenUTF32 = B.length strUTF32
Character Length
Compiler: GHC 6.6
The base type Char defined by the standard is already intended for (plain) Unicode characters.
strlen = length "Hello, world!"
IDL
Byte Length
Compiler: any IDL compiler should do
length = strlen("Hello, world!")
Character Length
Compiler: any IDL compiler should do
length = strlen("Hello, world!")
Java
Byte Length
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.
String s = "Hello, world!";
int byteCount = s.length() * 2;
Another way to know the byte length of a string is to explicitly specify the charset we desire.
String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length; // 28: includes a 2-byte byte-order mark
int byteCountUTF8 = s.getBytes("UTF-8").length;   // 13
Character Length
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
The length method of String objects gives the number of 16-bit values used to encode a string.
String s = "Hello, world!";
int length = s.length();
Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.
String str = "\uD834\uDD2A"; //U+1D12A
int length1 = str.length(); //2
int length2 = str.codePointCount(0, str.length()); //1
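The gap between 16-bit values and characters can also be reproduced outside Java. The following Python 3 sketch is an illustration only; the function names `utf16_units` and `code_points` are made up. It encodes to UTF-16 without a byte-order mark and divides the byte count by two to get the number of 16-bit values, then compares that with the number of code points.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit UTF-16 code units needed for text."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text: str) -> int:
    """Number of Unicode code points in text."""
    return len(text)


s = "\U0001D12A"  # U+1D12A, outside the Basic Multilingual Plane
print(utf16_units(s), code_points(s))  # 2 1
```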
JavaScript
Byte Length
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.
var s = "Hello, world!";
var byteCount = s.length * 2; //26
Character Length
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
The length property counts 16-bit values, not characters. If the string contains only commonly used characters, the two numbers are equal. Since ECMAScript 2015, iterating over a string (for example with the spread syntax [...str]) yields one element per code point, so [...str].length gives the number of characters.
var str1 = "Hello, world!";
var len1 = str1.length; //13

var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2
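Counting characters from 16-bit values comes down to not counting the second half of each surrogate pair. Here is a minimal Python 3 sketch of that idea; the function name `count_code_points` is hypothetical, the input is a plain list of 16-bit integers, and the data is assumed to be well-formed UTF-16.

```python
def count_code_points(units):
    """Count characters in a sequence of UTF-16 code units.

    Assumes well-formed UTF-16: every high surrogate is followed by a low one.
    """
    # Low (trailing) surrogates lie in 0xDC00..0xDFFF and never start a character.
    return sum(1 for unit in units if not 0xDC00 <= unit <= 0xDFFF)


print(count_code_points([0xD834, 0xDD2A]))                   # 1
print(count_code_points([ord(c) for c in "Hello, world!"]))  # 13
```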
JudoScript
Byte Length
// Store the length of "Hello World" in length.
length = "Hello World".length();
Character Length
// Store the length of "Hello World" in length.
length = "Hello World".length();
LSE64
Byte Length
LSE stores strings as arrays of characters in 64-bit cells plus a count.
" Hello world" @ 1 + 8 * , # 96 = (11+1)*(size of a cell) = 12*8
Character Length
LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.
" Hello world" @ , # 11
Lua
Byte Length
Interpreter: Lua 5.1 or later (the # length operator was added in 5.1).
str = "Hello world"
length = #str
Character Length
Interpreter: Lua 5.1 or later (the # length operator was added in 5.1).
str = "Hello world"
length = #str
MAXScript
Character Length
"Hello world".count
mIRC Scripting Language
Byte Length
alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }
Character Length
alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }
Objective-C
Character Length
// Return the length in UTF-16 code units (equal to the number of characters for this string)
unsigned length = [@"Hello Wørld!" length]; // 12 (13 UTF-8 bytes)
OCaml
Byte Length
Interpreter/Compiler: OCaml 3.09
String.length "Hello world";;
Character Length
Interpreter/Compiler: OCaml 3.09
String.length "Hello world";;
Perl
Byte Length
Interpreter: perl 5.8
Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).
use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);

print length encode 'UTF-8', "Hello, world! ☺";  # 17. The last character takes 3 bytes, the others 1 byte each.
print length encode 'UTF-16', "Hello, world! ☺"; # 32. 2 bytes for the BOM, then one byte pair for each of the 15 characters.
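The effect of the byte-order mark on the byte count can be seen in other languages too. As an illustration, this Python 3 sketch encodes the same text with and without a BOM; in Python, the plain `utf-16` codec prepends a BOM, while `utf-16-le` fixes the byte order and adds none.

```python
import codecs

text = "Hello, world! ☺"

utf8 = text.encode("utf-8")
utf16_bom = text.encode("utf-16")    # BOM in native byte order, then the text
utf16_le = text.encode("utf-16-le")  # explicit byte order, no BOM

print(len(utf8))       # 17
print(len(utf16_bom))  # 32
print(len(utf16_le))   # 30
print(utf16_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
```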
Character Length
Interpreter: Perl any 5.X
my $length = length "Hello, world!";
PHP
Byte Length
$length = strlen('Hello, world!');
Character Length
$length = mb_strlen('Hello, world!', 'UTF-8'); // needs the mbstring extension; strlen() counts bytes
PL/SQL
Byte Length
DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := lengthb( string ); -- LENGTHB counts bytes; LENGTH counts characters
END;
Character Length
DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;
Pop11
Byte Length
Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so the user can, for example, store UTF-8 in them (however, built-in procedures will treat each byte as a single character). The length function for strings returns the length in bytes:
lvars str = 'Hello, world!';
lvars len = length(str);
Python
Byte Length
Interpreter: Python 2.x
Byte length depends on the encoding. Python uses 2 or 4 bytes per character internally for unicode strings, depending on how it was built. The internal representation is not of interest to the user.
# The letter Alef
>>> len(u'\u05d0'.encode('utf-8'))
2
>>> len(u'\u05d0'.encode('iso-8859-8'))
1
Example from the problem statement:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

s = u"møøse"
assert len(s) == 5
assert len(s.encode('UTF-8')) == 7
assert len(s.encode('UTF-16')) == 12 # 10 bytes of text plus a leading 2-byte byte-order mark (BOM)
Character Length
Interpreter: Python 2.4
len() returns the number of characters in a unicode string or plain ASCII string. To get the character length of an encoded string, you have to decode it first:
>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1
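For readers on Python 3, where text (`str`) and encoded data (`bytes`) are separate types, the same decode-then-measure step might look like the sketch below. It is an assumption that Python 3 is in use; the samples above target Python 2.

```python
raw = b"\xd7\x90"           # the letter Alef, encoded as UTF-8
text = raw.decode("utf-8")  # bytes -> str

print(len(raw))          # 2 (bytes)
print(len(text))         # 1 (character)
print(text == "\u05d0")  # True
```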
Ruby
Byte Length
string = "Hello world"
print string.length
or
puts "Hello World".length
Character Length
Library: active_support
require 'active_support'
puts "Hello World".chars.length
Scheme
Byte Length
(string-length "Hello world")
Character Length
(string-length "Hello world")
Seed7
Character Length
length("Hello, world!")
Smalltalk
Byte Length
string := 'Hello, world!'.
string size.
Character Length
string := 'Hello, world!'.
string size.
Standard ML
Byte Length
Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)
Compiler: MLton 20061107
val strlen = size "Hello, world!";
Character Length
Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)
Compiler: MLton 20061107
val strlen = size "Hello, world!";
Tcl
Byte Length
Basic version:
string bytelength "Hello, world!"
Or, more elaborately (requires any Tcl 8.x interpreter; tested on 8.4.12):
fconfigure stdout -encoding utf-8; # so that the Unicode string prints correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in bytes is %d" $s1 [string bytelength $s1]]
puts [format "length of \"%s\" in bytes is %d" $s2 [string bytelength $s2]]
Character Length
Basic version:
string length "Hello, world!"
Or, more elaborately (requires any Tcl 8.x interpreter; tested on 8.4.12):
fconfigure stdout -encoding utf-8; # so that the Unicode string prints correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]
puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]
Toka
Byte Length
" hello, world!" string.getLength
UNIX Shell
Byte Length
With external utilities:
Interpreter: any bourne shell
string='Hello, world!'
# printf '%s' is used because the behavior of echo -n differs between shells
length=`printf '%s' "$string" | wc -c | tr -dc '0-9'`
echo $length # if you want it printed to the terminal
With SUSv3 parameter expansion modifier:
Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell
string='Hello, world!'
length="${#string}"
echo $length # if you want it printed to the terminal
VBScript
Byte Length
LenB(string|varname)
Returns the number of bytes required to store a string in memory. Returns null if string|varname is null.
Character Length
Len(string|varname)
Returns the length of the string|varname . Returns null if string|varname is null.
XSLT
Character Length
<?xml version="1.0" encoding="UTF-8"?> ... <xsl:value-of select="string-length('møøse')" />
xTalk
Byte Length
Interpreter: HyperCard
put the length of "Hello World"
or
put the number of characters in "Hello World"
Character Length
Interpreter: HyperCard
put the length of "Hello World"
or
put the number of characters in "Hello World"