Talk:Determine if a string has all the same characters: Difference between revisions

From Rosetta Code
Revision as of 13:47, 30 October 2019

What is a character?

For old-style strings where one character equals one byte, this is not really a problem. With Unicode and multibyte characters it is, and Unicode equivalence makes it much worse. How are languages with Unicode support expected to deal with this?

Wikipedia gives the example of the character "ñ", which can be encoded as U+00F1 or, alternatively, as U+006E followed by U+0303. In Python, the latter is a two-"character" string by default, which can be normalized with the unicodedata.normalize function. However, Notepad++ and MS Word correctly display both as ñ.
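
To make the "ñ" case concrete, here is a minimal Python sketch. The helper name all_same_characters and the choice of NFC as the normalization form are my own assumptions for illustration, not something the task prescribes. Note also that normalization only helps when a precomposed form exists; a fuller treatment would probably have to compare grapheme clusters instead of code points.

```python
import unicodedata

composed = "\u00f1"      # "ñ" as a single code point
decomposed = "n\u0303"   # "n" followed by a combining tilde

def all_same_characters(s, form="NFC"):
    """Hypothetical helper: True if s, once normalized, is one repeated code point.

    The empty string is treated as trivially "all the same".
    """
    normalized = unicodedata.normalize(form, s)
    return len(set(normalized)) <= 1

# The two encodings have different lengths and do not compare equal...
print(len(composed), len(decomposed))
print(composed == decomposed)

# ...but they become equal once both are in the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == composed)

# A string mixing both encodings passes only because of the normalization step.
mixed_encodings = composed + decomposed
print(len(set(mixed_encodings)) <= 1)        # naive code point comparison
print(all_same_characters(mixed_encodings))  # comparison after NFC
```

Whether a solution should normalize at all, and with which form, is exactly the question the task description would have to settle.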

And while we are at it, note that while "EEE" is a string which has all the same characters, "EΕЕ" is not, although the two look identical in most fonts.
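
The look-alike case can be checked the same way. This sketch assumes the second string is made of Latin E (U+0045), Greek capital epsilon (U+0395) and Cyrillic capital Ie (U+0415); I spell the code points out as escapes so the example does not depend on how the page is rendered. The helper name is hypothetical.

```python
import unicodedata

latin_only = "EEE"
look_alikes = "\u0045\u0395\u0415"  # assumed: Latin E, Greek Epsilon, Cyrillic Ie

def same_code_points(s):
    """Hypothetical helper: True if every code point in s is identical."""
    return len(set(s)) <= 1

print(same_code_points(latin_only))
print(same_code_points(look_alikes))

# Listing the code points shows why the second string fails the test.
for ch in look_alikes:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Normalization does not help here: these are distinct letters,
# not equivalent encodings of the same one.
print(unicodedata.normalize("NFKC", look_alikes) == look_alikes)
```

So unlike the "ñ" example, this one cannot be fixed by normalizing; a program can only report that the characters differ.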

Of course, the same comment applies to the related task, Determine if a string has all unique characters.

Eoraptor (talk) 13:45, 30 October 2019 (UTC)