UTF-8: Difference between revisions

Revision as of 04:18, 22 January 2008

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width encoding of Unicode code points into sequences of eight-bit octets. It was designed in 1992 by Ken Thompson (co-creator of Unix) and Rob Pike for Bell Labs' Plan 9 operating system. It is widely used on Unix-like systems and for XML documents.
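
As an illustration of how a code point maps to octets, here is a minimal sketch of an encoder following the bit layout standardized in RFC 3629 (one to four octets, code points up to U+10FFFF). The function name `encode_code_point` is made up for this example; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (RFC 3629 range)."""
    # Reject values outside Unicode and the UTF-16 surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For any valid code point, the result should agree with `chr(cp).encode("utf-8")`.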

Some advantages of UTF-8:

byte-order independent
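a superset of 7-bit ASCII: every ASCII string is already valid UTF-8
self-synchronizing: the start of each character can be detected from any position in the byte stream
can encode every Unicode code point (up to U+10FFFF) in at most four octets; the original design allowed sequences of up to six octets, covering 31-bit values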
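
Two of these properties can be checked in a few lines. The sketch below relies on the fact that continuation octets always have the bit pattern 10xxxxxx, so any other octet begins a character; the helper names `is_continuation` and `char_starts` are invented for illustration.

```python
def is_continuation(byte: int) -> bool:
    """True for octets of the form 10xxxxxx, which never start a character."""
    return (byte & 0xC0) == 0x80

def char_starts(data: bytes) -> list[int]:
    """Offsets at which an encoded code point begins."""
    return [i for i, b in enumerate(data) if not is_continuation(b)]

# "a", "é" and "€" take one, two and three octets respectively.
sample = "a\u00e9\u20ac".encode("utf-8")
starts = char_starts(sample)          # offsets 0, 1 and 3

# ASCII text encodes to the same octets in both encodings.
ascii_same = "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")
```

Because a reader can land on any octet and scan backwards to the nearest non-continuation octet, a corrupted or truncated stream only costs the damaged characters rather than everything after them.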

Challenges:

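characters do not have a fixed size, so one must walk the entire string to count its characters, and finding the n-th character takes linear time.
biased towards European scripts: ASCII characters take one octet and most other European letters two, whereas most Chinese, Japanese and Korean characters take three. Japanese text is therefore stored more compactly in other encodings, such as UTF-16 or UCS-2, which use two octets for those characters.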
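
Both challenges are easy to observe. The following is a rough sketch, with the hypothetical helper `count_code_points` walking every octet to count characters, and a three-character Japanese string compared across encodings using Python's standard codecs.

```python
def count_code_points(data: bytes) -> int:
    """Count code points by skipping continuation octets (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

text = "\u65e5\u672c\u8a9e"           # the three characters of "Nihongo"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")      # big-endian, no byte-order mark

n_chars = count_code_points(utf8)     # requires a pass over all octets
sizes = (len(utf8), len(utf16))       # 9 octets versus 6 octets
```

The count is linear in the length of the byte string, and the same text needs half again as many octets in UTF-8 as in UTF-16.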

Revision as of 20:53, 6 January 2008, by rosettacode>Mwn3d (minor edit: "Added to the encyclopedia.")
Revision as of 04:18, 22 January 2008, by rosettacode>Mwn3d (minor edit: "Replaced encyclopedic tag")