Idiomatically determine all the characters that can be used for symbols

We enforce the whitespace restriction to prevent insanity in the readers of programs.
That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. <tt>:-)</tt>

=={{header|Phix}}==
<lang Phix>function run(string ident)
    -- write a one-line program declaring ident, then try to compile it
    integer fn = open("test.exw","w")
    printf(fn,"object %s",ident)
    close(fn)   -- make sure the file is complete before invoking the compiler
    return system_exec("p -batch test.exw")
end function

string ok1 = "", ok2 = ""
integer ng1 = 0, ng2 = 0
for ch=0 to 255 do
    printf(1,"checking %d/255...\r",ch)
    if find(ch,"\t\r\n ") then
        -- whitespace is counted as no good without invoking the compiler
        ng1 += 1
        ng2 += 1
    else
        string c = sprintf("%c",ch)
        if run(c)==0 then ok1 &= c else ng1 += 1 end if
        if run("_"&c)==0 then ok2 &= c else ng2 += 1 end if
    end if
end for
printf(1,"1st character: %d no good, %d OK %s\n",{ng1,length(ok1),ok1})
printf(1,"2nd..nth char: %d no good, %d OK %s\n",{ng2,length(ok2),ok2})</lang>
{{out}}
<pre>
1st character: 194 no good, 62 OK ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇêöÜú╗╬¤Ô
2nd..nth char: 181 no good, 75 OK �0123456789;ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇêöÜú╗╬¤Ô
</pre>
Note that ptok.e (part of the compiler) currently contains the following:
<lang Phix>charset[#80] = LETTER -- more unicode
charset[#88] = LETTER -- more unicode
charset[#94] = LETTER -- for rosettacode/unicode (as ptok.e is not stored in utf8)
charset[#9A] = LETTER -- for rosettacode/unicode
charset[#A3] = LETTER -- for rosettacode/unicode
charset[#BB] = LETTER -- for rosettacode/unicode
charset[#CE] = LETTER -- for rosettacode/unicode
charset[#CF] = LETTER
charset[#E2] = LETTER</lang>
If that table is extended (with more UTF-8 handling), the output above will change accordingly.
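
The way such a table drives identifier checking can be sketched as follows. This is an illustration only, not the code in ptok.e: the constants <tt>ILLEGAL</tt>, <tt>LETTER</tt> and <tt>DIGIT</tt>, the 255-entry layout, and the helper <tt>valid_ident()</tt> are all assumptions made for the example. The nine high bytes are the ones listed above. They appear to be what shows up as <tt>ÇêöÜú╗╬¤Ô</tt> when the output is viewed in a DOS code page such as CP850. Because the table works on single bytes, a UTF-8 character is accepted only if every one of its bytes is flagged, for instance λ (U+03BB), which is encoded as #CE #BB.
<lang Phix>-- Illustrative sketch only: the constants and valid_ident() are hypothetical
-- stand-ins, not routines or values taken from ptok.e.
constant ILLEGAL = 0, LETTER = 1, DIGIT = 2
sequence charset = repeat(ILLEGAL,255)      -- indexed by byte value 1..255
for ch='A' to 'Z' do charset[ch] = LETTER end for
for ch='a' to 'z' do charset[ch] = LETTER end for
for ch='0' to '9' do charset[ch] = DIGIT end for
charset['_'] = LETTER
-- the extra bytes flagged as LETTER in the excerpt above
sequence extra = {#80,#88,#94,#9A,#A3,#BB,#CE,#CF,#E2}
for i=1 to length(extra) do
    charset[extra[i]] = LETTER
end for

function valid_ident(sequence s)
    -- first byte must be a LETTER; later bytes may be LETTER or DIGIT
    if length(s)=0 then return false end if
    for i=1 to length(s) do
        integer ch = s[i]
        if ch=0 then return false end if    -- byte 0 has no table entry
        integer ct = charset[ch]
        if ct=ILLEGAL or (i=1 and ct=DIGIT) then return false end if
    end for
    return true
end function

?valid_ident("abc1")            -- letters, then a digit
?valid_ident("1abc")            -- a digit cannot lead
?valid_ident("_"&#CE&#BB)       -- "_" followed by the UTF-8 bytes of λ</lang>
A byte-wise table like this keeps the check simple, at the cost of also accepting byte sequences that are not well-formed UTF-8; whether the real tokeniser shares that trade-off is not something this sketch can confirm.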
