Talk:Unicode variable names

==Why!==
I just saw [https://www.rosettacode.org/wiki/Bitmap/Bresenham%27s_line_algorithm?diff=94540&oldid=94539 this] edit and wondered ...? --[[User:Paddy3118|Paddy3118]] 07:23, 1 July 2011 (UTC)
:Yeah, that is not good at all. It means that the source code needs conversion to be usable on ASCII-based platforms. Maybe we will need a unidecoder for such sources. That is probably a future task. [[User:Markhobley|Markhobley]] 20:57, 6 July 2011 (UTC)
::The change was made to Perl 6 code, and it is my understanding that that's correct code for that language. Usage of Unicode glyphs in Perl 6 code is pretty normal. --[[User:Short Circuit|Michael Mol]] 21:58, 6 July 2011 (UTC)
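:::For illustration only, the "unidecoder" idea mentioned above could be as small as replacing each non-ASCII character with a backslash escape. The sketch below is in Python rather than Perl 6, the helper name <code>to_ascii_escapes</code> is hypothetical, and the escaped text is only an ASCII-safe transport form; it is not claimed to be valid source in any particular language.

```python
def to_ascii_escapes(source: str) -> str:
    """Replace every non-ASCII character with a Python-style backslash escape.

    Hypothetical helper: ASCII characters pass through unchanged.
    """
    return source.encode("ascii", "backslashreplace").decode("ascii")


sample = "π = 3.14159"
escaped = to_ascii_escapes(sample)
print(escaped)  # \u03c0 = 3.14159
```

:::The result shows the readability cost that comes up later in this thread: the identifier survives on an ASCII-only platform, but nobody can read it any more.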
::::: How humanist. So a Russian wishing to type some text in, I don't know, Russian, should use a US keyboard instead of a Cyrillic one and type a whole lot of escape sequences, so that not even a Russian can read it, but some outdated compiler will be happy? Do we serve the computers, or do they serve us? --[[User:Ledrug|Ledrug]] 01:30, 10 July 2011 (UTC)
:::::: He can type text using a Russian keyboard. We are not talking about text, we are talking about source code. He can have ASCII symbols on the keyboard, so that he can still enter source code. [[User:Markhobley|Markhobley]] 07:16, 10 July 2011 (UTC)
:::::: Suppose a person in Thailand has written the source code using Thai characters, and the Russian wants to edit it. He has not got the Thai symbols on his keyboard, so the job is not going to be easy. It would be better if the code was in ASCII, and both keyboards carried the ASCII symbols. [[User:Markhobley|Markhobley]] 08:05, 10 July 2011 (UTC)
 
::::::: You can assume a Russian or Thai is better off typing in escape sequences than his natural language, he might even enjoy it, who knows. You can stick with ASCII, or go back to punch cards, or directly wire 1s and 0s into your computer with a soldering iron for all I care, it's really not my problem. The rest of the world does see the benefit of a large unified character set and will move towards it, and I personally would rather get along with it -- but no more arguing here from me, you win. --[[User:Ledrug|Ledrug]] 08:43, 10 July 2011 (UTC)
::::“You just need to speak the language of the compiler or interpreter.” Isn't it nice that a number of languages are happy to support non-ASCII in identifiers then? People can use (variations on) their own (human) language when communicating with the computer, and it will all be semantically sound too. Moreover, if the language supports them, it'd be a poor implementation of that language that didn't. ;–) –[[User:Dkf|Donal Fellows]] 21:45, 9 July 2011 (UTC)
:::::: Just to be clear, I meant keyboards should carry ASCII symbols in addition to native language symbols, not instead of native language symbols. :) [[User:Markhobley|Markhobley]] 09:30, 10 July 2011 (UTC)
 
== The wrong triangle ==
 
:They could do with grouping together similar looking symbols, and making them interchangeable, so that substitution can occur. I was just looking at the Unicode table:    (five identical looking pipe signs), પ and ૫, ඤ and ඥ, ረ and ሪ, ⬦ and ⬨. That was interesting. These last two appeared filled in one browser window, but hollowed out, when I pasted them here. Some symbols have no visual difference whatsoever. It is going to be a right nightmare trying to spot mismatched characters. [[User:Markhobley|Markhobley]] 01:12, 10 July 2011 (UTC)
 
:: They're not ''semantically'' interchangeable. They might not have the same glyph in all fonts. They might participate in ligatures differently (Latin-based writing systems are largely simple that way, but other writing systems are very much not). And anyway, it tends to not be such a huge problem in practice; individual (human) languages don't have to deal with the problem in the first place, it's only when you try to support all known writing systems that you run into problems. (If you want an area where there ''are'' problems due to the sorts of issues you mention, try unicode domain names; that's a totally different problem from source code though.) –[[User:Dkf|Donal Fellows]] 23:45, 10 July 2011 (UTC)
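:::A small sketch of the point above, using Python's standard <code>unicodedata</code> module: the look-alike pairs quoted earlier are distinct code points, and compatibility normalization does not merge them. The names <code>PAIRS</code>, <code>describe</code> and <code>same_after_nfkc</code> are made up for this example, and what each glyph looks like still depends on the font.

```python
import unicodedata

# Look-alike pairs quoted in the discussion above.
PAIRS = [("પ", "૫"), ("ඤ", "ඥ"), ("ረ", "ሪ"), ("⬦", "⬨")]


def describe(ch: str) -> str:
    """Return code point, Unicode name and general category of one character."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name} [{unicodedata.category(ch)}]"


def same_after_nfkc(a: str, b: str) -> bool:
    """True if NFKC normalization maps both strings to the same text."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


for a, b in PAIRS:
    print(describe(a), "|", describe(b), "| merged by NFKC:", same_after_nfkc(a, b))
```

:::So a compiler that compares identifiers code point by code point, or even after NFKC, treats each pair as two different symbols; spotting the difference by eye is a separate problem.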
 
==Security considerations==
I want to add an overview of which approach each language takes:
 
* ASCII only
* None at all (such as PHP, D, Nim, Crystal, ...)
* Only certain code point ranges (e.g. ALtId in C11), as in C and C++; prone to e.g. RTL attacks
* Only TR31 (letters, numbers, ...)
* Disallow excluded and limited-use scripts
* Which security profile (2-6), especially how mixed scripts are checked
* Whether normalization is required, and which form (such as the odd choice of NFKC in Python 3)
 
I'm working on such an overview and on tools for all mainstream languages (and filesystems and everything else with names), because 99% of them are insecure and do not follow the Unicode security guidelines for identifiers, leading to unidentifiable identifiers. https://github.com/rurban/libu8ident --[[User:ReiniUrban|ReiniUrban]] ([[User talk:ReiniUrban|talk]]) 09:34, 30 December 2021 (UTC)
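:To make the normalization item concrete, here is a minimal sketch of what NFKC does to a Python 3 identifier. It assumes CPython's documented behaviour of normalizing identifiers to NFKC while parsing; the variable names are chosen for the example only.

```python
import unicodedata

# 'file' spelled with U+FB01 LATIN SMALL LIGATURE FI instead of "f" + "i".
ligature_name = "\ufb01le"
plain_name = "file"

# The two spellings are different strings...
print(ligature_name == plain_name)  # False

# ...but NFKC folds the ligature into plain ASCII letters.
normalized = unicodedata.normalize("NFKC", ligature_name)
print(normalized == plain_name)  # True

# The parser applies the same folding, so assigning to the ligature
# spelling binds the plain ASCII name in the namespace.
namespace = {}
exec(f"{ligature_name} = 42", namespace)
print("file" in namespace)  # True
```

:Whether this folding helps or hurts is the debatable part: two spellings that look different in an editor can end up naming the same variable.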