Idiomatically determine all the characters that can be used for symbols: Difference between revisions

Revision as of 18:28, 23 March 2014

Idiomatically determine all the characters that can be used for symbols. The word symbols is meant to cover things like the names of variables, procedures (i.e., named fragments of programs, functions, subroutines, routines), statement labels, events or conditions, and in general, anything a computer programmer can choose to name; this list is not exhaustive. Identifiers might be another name for symbols.

The method should find the characters regardless of the hardware architecture that is being used (ASCII, EBCDIC, or other).

Task requirements

Display the set of all the characters that are allowed in symbols by the computer program. You may want to mention what hardware architecture is being used, and if applicable, the operating system.

Note that most languages have additional restrictions on which characters can be used as the first character of a variable or statement label, for instance. These types of restrictions needn't be addressed here (but can be mentioned).

See also

Idiomatically determine all the lowercase and uppercase letters.

ooRexx

<lang oorexx>/*REXX program determines what characters are valid for REXX symbols.*/
/* copied/adjusted from REXX */
a=                                    /*set symbol characters " " */
   do j=0  for 2**8                   /*traipse through all the chars. */
   _=d2c(j)                           /*convert decimal number to char.*/
   if datatype(_,'S')  then a=a || _  /*Symbol char?  Then add to list.*/
   end   /*j*/                        /* [↑] put some chars into a list*/

say ' symbol characters: ' a          /*display all symbol characters.*/</lang>

Output:

     symbol characters: !.0123456789?ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

Perl 6

Any Unicode character or combination of characters can be used for symbols in Perl 6. Here are some counting rods and some cuneiform: <lang perl6>sub postfix:<𒋦>($n) { say "$n trilobites" }

sub term:<𝍧> { unival('𝍧') }

𝍧𒋦</lang>

Output:

8 trilobites

Of course, as in other languages, most of the characters you'll typically see in names are going to be alphanumerics from ASCII (or maybe Unicode), but that's a convention, not a limitation, due to the syntactic category notation demonstrated above, which can introduce any sequence of characters as a term or operator.

Actually, the above is a slight prevarication. The syntactic category notation does not allow you to use whitespace in the definition of a new symbol. But that leaves many more characters allowed than not allowed. Hence, it is much easier to enumerate the characters that cannot be used in symbols: <lang perl6>say .fmt("%4x"),"\t", uniname($_)

   if uniprop($_,'Z')
       for 0..0x1ffff;</lang>

Output:

  20	SPACE
  a0	NO-BREAK SPACE
1680	OGHAM SPACE MARK
2000	EN QUAD
2001	EM QUAD
2002	EN SPACE
2003	EM SPACE
2004	THREE-PER-EM SPACE
2005	FOUR-PER-EM SPACE
2006	SIX-PER-EM SPACE
2007	FIGURE SPACE
2008	PUNCTUATION SPACE
2009	THIN SPACE
200a	HAIR SPACE
2028	LINE SEPARATOR
2029	PARAGRAPH SEPARATOR
202f	NARROW NO-BREAK SPACE
205f	MEDIUM MATHEMATICAL SPACE
3000	IDEOGRAPHIC SPACE

We enforce the whitespace restriction to prevent insanity in the readers of programs. That being said, even the whitespace restriction is arbitrary, and can be bypassed by deriving a new grammar and switching to it. We view all other languages as dialects of Perl 6, even the insane ones. :-)

Python

See String class isidentifier.
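
As a minimal sketch of how isidentifier can be applied to this task: the helper name identifier_chars and its limit parameter below are illustrative assumptions, not part of any library. A character is treated as usable at the start of a name if it is an identifier on its own, and as usable anywhere after the first position if it is still an identifier when prefixed with an underscore. The demonstration is limited to the ASCII range; passing no limit scans every code point. Python also normalizes identifiers (NFKC) when parsing, so a few non-ASCII characters reported this way may end up as a different character in the actual name.

```python
import sys


def identifier_chars(limit=sys.maxunicode + 1):
    """Return (start, cont): characters below `limit` that may begin an
    identifier, and characters that may appear after the first position.
    (Hypothetical helper written for illustration.)"""
    start, cont = [], []
    for code_point in range(limit):
        ch = chr(code_point)
        if ch.isidentifier():            # valid as a one-character name
            start.append(ch)
        if ("_" + ch).isidentifier():    # valid after a leading underscore
            cont.append(ch)
    return start, cont


# Only the ASCII range here; call identifier_chars() for all of Unicode.
start, cont = identifier_chars(128)
print("may start a name:   ", "".join(start))
print("may continue a name:", "".join(cont))
```

The two lists differ only in the digits for the ASCII range, which is the kind of first-character restriction the task description mentions.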

REXX

<lang rexx>/*REXX program determines what characters are valid for REXX symbols.*/
@=                                    /*set symbol characters " " */
   do j=0  for 2**8                   /*traipse through all the chars. */
   _=d2c(j)                           /*convert decimal number to char.*/
   if datatype(_,'S')  then @=@ || _  /*Symbol char?  Then add to list.*/
   end   /*j*/                        /* [↑] put some chars into a list*/

say ' symbol characters: ' @          /*display all symbol characters.*/
                                      /*stick a fork in it, we're done.*/</lang>

Programming note: REXX allows any symbol to begin a (statement) label, but variables can't begin with a period (.) or a numeric digit.

All examples below were executed on an (ASCII) PC using Windows/XP and Windows/7 with code page 437 in a DOS window.

Using PC/REXX, Personal REXX, and Regina (versions 3.2 ───► 3.7)

Output:

     symbol characters:  !#$.0123456789?@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

Using R4

Output:

     symbol characters:  !#$.0123456789?@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£áíóúñÑ╡╢╖─╞╟╨╤╥╙╘╒╓╫╪▐αßΓπΣσµτΦΘΩδ∞φ

Using ROO

Output:

     symbol characters:  !#$.0123456789?@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£áíóúñÑ╡╢╖╞╟╨╤╥╙╘╒╓╫╪▐αßΓπΣσµτΦΘΩδ∞φ

@@ Line 1: / Line 1: @@
 {{draft task}}
 Idiomatically determine all the characters that can be used for ''symbols''.
+The word ''symbols'' is meant things like names of variables, procedures (i.e., named fragments of programs, functions, subroutines, routines), statement labels, events or conditions, and in general, anything a computer programmer can choose to ''name'', but not being restricted to this list. ''Identifiers'' might be another name for ''symbols''.
+The method should find the characters regardless of the hardware architecture that is being used  (ASCII, EBCDIC, or other).
-The word ''symbols'' is meant things like names of variables, procedures/programs/functions/subroutines/routines, statement labels, events or conditions, and in general, anything a computer programmer can choose to ''name'', but not being restricted to this list. &nbsp; ''Identifiers'' might be another name for ''symbols''.
+;Task requirements
-The method should find the characters regardless of the hardware architecture that is being used &nbsp; (ASCII, EBCDIC, or other).
-;task requirements
 Display the set of all the characters that can be used for symbols which can be used (allowed) by the computer program.
+You may want to mention what hardware architecture is being used, and if applicable, the operating system.
+Note that most languages have additional restrictions on what characters can't be used for the first character of a variable or statement label, for instance.  These type of restrictions needn't be addressed here (but can be mentioned).
-<br>You may want to mention what hardware architecture is being used, and if applicable, the operating system.
-Note that most languages have additional restrictions on what characters can't be used for the first character of a variable or statement label, for instance. &nbsp; These type of restrictions needn't be addressed here &nbsp; (but can be mentioned).
+;See also
-;Cf:
 * [[Idiomatically_determine_all_the_lowercase_and_uppercase_letters|Idiomatically determine all the lowercase and uppercase letters]].
-<br>
 =={{header|ooRexx}}==