Talk:Determine if a string has all the same characters

:: Yes, I was aware. &nbsp; ASCII &nbsp; '03'x &nbsp; is so primitive. &nbsp; &nbsp; <big><big><big> ☻ </big></big></big> &nbsp; &nbsp; -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 16:47, 30 October 2019 (UTC)
 
:Specifying a "character" without also specifying an encoding is pretty vague, but I don't think that is necessarily a bad thing. In this particular task, I think it is somewhat useful to leave it open to interpretation, since that way it doesn't lock out languages that may not be aware of modern encodings (Unicode), but also doesn't unnecessarily constrain the ones that are. It might be useful to encourage some verbiage in each language's task entry ''about'' any such constraints or abilities, but I am somewhat against enforcing any particular encoding. I'd rather err on the side of inclusivity and deal with some fuzzy definitions than enforce rigid compliance and remove room to explore. As a point of fact, I took some pains in these tasks to demonstrate how my particular favorite language deals with some thorny issues that arise with multi-byte UTF-8 encoded Unicode (such as Unicode equivalence :-) ).
 
:<blockquote>''Quote'' As an aside, I always thought of ''Thundergnat'' as quite the character, but he is much more than 8 bits... ''End Quote''</blockquote>
:I resemble that remark. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 23:47, 30 October 2019 (UTC)
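To make the point about encodings concrete, here is a minimal Go sketch (Go only because it comes up later in this thread; the function names `firstDiffByte` and `firstDiffRune` are hypothetical and not taken from any task entry). It checks the same UTF-8 string twice, once treating a "character" as a single byte and once as a Unicode code point (a Go rune), and returns the byte offset of the first mismatch, or -1 if all characters are the same.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstDiffByte treats each byte as a "character" (hypothetical helper).
// It returns the offset of the first byte that differs from byte 0,
// or -1 if every byte is the same (including the empty string).
func firstDiffByte(s string) int {
	for i := 1; i < len(s); i++ {
		if s[i] != s[0] {
			return i
		}
	}
	return -1
}

// firstDiffRune treats each Unicode code point as a "character" (hypothetical helper).
// It returns the byte offset of the first rune that differs from the first rune,
// or -1 if every rune is the same (including the empty string).
func firstDiffRune(s string) int {
	first, _ := utf8.DecodeRuneInString(s)
	for i, r := range s { // range over a string decodes UTF-8 into runes
		if r != first {
			return i
		}
	}
	return -1
}

func main() {
	s := "\u00e9\u00e9\u00e9" // three precomposed é; each is the two UTF-8 bytes C3 A9

	fmt.Println(firstDiffByte(s)) // 1: byte 0xA9 differs from byte 0xC3
	fmt.Println(firstDiffRune(s)) // -1: all three code points are U+00E9
}
```

For a string of three precomposed é characters the byte view reports a mismatch at offset 1, while the code-point view reports none, so the answer to the task depends entirely on what "character" is taken to mean.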
 
::As far as Unicode is concerned, and bearing in mind that it isn't needed for the 'compulsory' examples Gerard has set anyway, I agree with the thrust of what Thundergnat has said: a 'character' should be defined in whatever way seems most natural for the language you're using.
 
::In the case of Go, a character (or 'rune' as we prefer to call it) is simply a Unicode code point expressed as a 4-byte integer. String literals are encoded as UTF-8 and are not normalized by default (though there is a supplemental package which can do this). Consequently, a precomposed accented character is not the same as the corresponding unaccented character followed by a combining accent. Also, unlike Perl 6, there appears to be no easy way to deal with emoji ZWJ sequences at the present time. I've therefore had to be careful in the Go examples to use only emojis which are complete in themselves. --[[User:PureFox|PureFox]] ([[User talk:PureFox|talk]]) 17:11, 31 October 2019 (UTC)
:::OK, I'm fine with that. It means that different programs will give different results for the same input, but it seems to be the consensus, and we are not going to reimplement ICU, nor dumb down languages which are able to deal with Unicode. By the way, the languages I mostly use (Python, R, Stata) don't normalize by default either. [[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 18:23, 31 October 2019 (UTC)