Talk:Determine if a string has all the same characters: Difference between revisions



Revision as of 14:06, 30 October 2019

What is a character?

For old-style strings, where one character equals one byte, this is not really a problem. Nowadays, with Unicode and multibyte characters, it is, and Unicode equivalence makes it much worse. How are languages with Unicode support expected to deal with this?
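To illustrate the multibyte part of the problem, here is a minimal Python sketch. The helper names `all_same_bytes` and `all_same_code_points` are hypothetical, chosen only for this example. A byte-oriented check gives the wrong answer for "ññ" once it is UTF-8 encoded, while a check over the code points of a Python `str` gets it right:

```python
def all_same_bytes(data: bytes) -> bool:
    """Byte-oriented check (hypothetical helper): only correct when one character is one byte."""
    return len(set(data)) <= 1


def all_same_code_points(text: str) -> bool:
    """Code-point-oriented check (hypothetical helper) on a Python str."""
    return len(set(text)) <= 1


s = "\u00f1\u00f1"  # "ññ" written with the precomposed U+00F1 twice
print(all_same_bytes(s.encode("utf-8")))  # False: the UTF-8 bytes are C3 B1 C3 B1
print(all_same_code_points(s))            # True
```

Note that counting code points is still not the same as counting user-perceived characters, which is where Unicode equivalence comes in.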

Wikipedia gives the example of the character "ñ", which can be encoded as U+00F1, or alternatively as U+006E followed by U+0303. In Python, the latter would be a two-"character" string by default, though it can be normalized with the unicodedata.normalize function. However, Notepad++ or MS Word correctly display both as ñ.
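A minimal sketch of the normalization approach, assuming that NFC (canonical composition) is the desired notion of "same character". The function name `all_same_characters` is hypothetical. Normalizing first makes the composed and decomposed spellings of "ñ" compare equal:

```python
import unicodedata


def all_same_characters(text: str, form: str = "NFC") -> bool:
    """Hypothetical helper: normalize, then compare code points."""
    normalized = unicodedata.normalize(form, text)
    return len(set(normalized)) <= 1


composed = "\u00f1"      # U+00F1 LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"   # U+006E followed by U+0303 COMBINING TILDE

print(len(composed), len(decomposed))                        # 1 2
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(len(set(composed + decomposed)) <= 1)                  # False without normalization
print(all_same_characters(composed + decomposed))            # True with NFC
```

NFC only helps when a precomposed code point exists; a base letter with a combining mark that has no precomposed form stays two code points, so a fully general answer would have to compare grapheme clusters instead.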

And while we are at it, note that while "EEE" is a string which has all the same characters, "EΕЕ" is not, even though its three letters look identical: they come from different scripts.
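A short sketch showing that look-alike letters from different scripts are distinct code points and stay distinct even under the more aggressive NFKC normalization. It assumes the look-alikes are U+0395 GREEK CAPITAL LETTER EPSILON and U+0415 CYRILLIC CAPITAL LETTER IE, written here as escapes so the example does not depend on how the text was copied:

```python
import unicodedata

s = "E\u0395\u0415"  # Latin E, Greek Epsilon, Cyrillic Ie (assumed homoglyphs)

for ch in s:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Normalization does not unify homoglyphs across scripts.
distinct_after_nfkc = len(set(unicodedata.normalize("NFKC", s)))
print(distinct_after_nfkc)  # 3
```

So a normalizing check would still (correctly) report that "EΕЕ" does not have all the same characters.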

Of course, the same comment applies to the other task.

Eoraptor (talk) 13:45, 30 October 2019 (UTC)
