Talk:Determine if a string has all the same characters

From Rosetta Code
Revision as of 13:46, 30 October 2019 by Eoraptor

What is a character here?

For old-style strings, where one character equals one byte, this is not really a problem. With Unicode and multibyte characters it is, and Unicode equivalence makes it much worse. How are languages with Unicode support expected to deal with this?

Wikipedia gives the example of the character "ñ", which can be encoded as U+00F1 or, alternatively, as U+006E followed by U+0303. In Python, the latter is by default a string of two "characters" (code points), which can be normalized to the single-code-point form with the unicodedata.normalize function. Notepad++ and MS Word, however, correctly display both as ñ.
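To make the "ñ" case concrete, here is a minimal Python sketch. The helper name all_same_characters and the choice of NFC as the normalization form are my own assumptions for illustration; the task itself does not prescribe either.

```python
import unicodedata

def all_same_characters(s: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare code points after Unicode normalization."""
    normalized = unicodedata.normalize(form, s)
    # An empty or one-character string trivially has "all the same" characters.
    return len(set(normalized)) <= 1

precomposed = "\u00f1"       # ñ as a single code point
decomposed = "\u006e\u0303"  # n followed by COMBINING TILDE

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Without normalization, "ññ" built from both encodings has three distinct code points.
mixed = precomposed + decomposed
print(len(set(mixed)) <= 1)        # False
print(all_same_characters(mixed))  # True
```

Note that normalization only helps where a precomposed form exists. A base letter with a combining mark that has no precomposed equivalent would still count as two code points, so a fully general answer would have to compare grapheme clusters instead.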

And while we are at it, note that while "EEE" is a string in which all characters are the same, "EΕЕ" is not: it mixes look-alike letters from different scripts.
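A short Python sketch of the look-alike case, assuming the three letters in the second string are U+0045 (Latin), U+0395 (Greek) and U+0415 (Cyrillic). They are written as escapes here so that the difference is visible in the source.

```python
import unicodedata

# Assumed code points: Latin E, Greek Epsilon, Cyrillic Ie.
lookalikes = "\u0045\u0395\u0415"

for ch in lookalikes:
    # Print each code point with its official Unicode name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(set("EEE")) == 1)       # True
print(len(set(lookalikes)) == 1)  # False

# Normalization does not merge letters from different scripts,
# so even the compatibility form NFKC leaves three distinct code points.
print(len(set(unicodedata.normalize("NFKC", lookalikes))))  # 3
```

Unlike the "ñ" case, these are not canonically or compatibly equivalent, so normalization cannot make them equal. Treating them as "the same" would need a separate confusables mapping, which is arguably outside the scope of the task.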

Eoraptor (talk) 13:45, 30 October 2019 (UTC)