Naming conventions: Difference between revisions


Revision as of 23:41, 29 March 2016

Many languages have de-facto and/or de-jure naming conventions for names used in the language and its libraries. These may take the form of prefixes, suffixes or a combination of upper-case and lower-case characters. Often the conventions are a bit haphazard, especially where the language or library has gone through periods of evolution. (In this case: give a brief example and description.)

Document as best you can (with simple examples where possible) the evolution and current status of these naming conventions. For example, name conventions for:

Procedure and operator names. (Intrinsic or external)
Class, Subclass and instance names.
Built-in versus library names.

If possible, indicate where the naming conventions are implicit, explicit, mandatory or discretionary. Mention any tools that enforce the naming conventions, and any cases where a naming convention is commonly violated.

If possible, indicate where a convention is used to hint at other issues. For example, the C standard library uses a prefix of "_" to "hide" raw operating system calls from the non-systems programmer, whereas Python uses a leading "__" on the name of a member function to make it "private".

See also

Wikipedia: Naming convention (programming)

ALGOL 68

In the Formal Specification

The revised report used the "shorthand" character ℵ to indicate that a MODE was "private" to the language specification. The character ℒ was used to indicate that the name could be repeated for every precision, e.g. ℒ INT could mean: ... SHORT SHORT INT, SHORT INT, INT, LONG INT, LONG LONG INT etc., and ℓ cos could mean: short short cos, short cos, cos, long cos, long long cos etc. <lang algol68>MODE ℵ SIMPLEOUT = UNION (≮ℒ INT≯, ≮ℒ REAL≯, ≮ℒ COMPL≯, BOOL, ≮ℒ BITS≯, CHAR, [ ] CHAR);

PROC ℓ cos = (ℒ REAL x) ℒ REAL: ¢ a ℒ real value close to the cosine of 'x' ¢;

PROC ℓ complex cos = (ℒ COMPL z) ℒ COMPL: ¢ a ℒ complex value close to the cosine of 'z' ¢;

PROC ℓ arccos = (ℒ REAL x) ℒ REAL: ¢ if ABS x ≤ ℒ 1, a ℒ real value close

     to the inverse cosine of 'x', ℒ 0 ≤ ℒ arccos (x) ≤ ℒ pi ¢; </lang>

For LONG LONG MODEs this would be coded as: <lang algol68>PROC long long cos = (LONG LONG REAL x) LONG LONG REAL: ¢ a ℒ real value close to the cosine of 'x' ¢;

PROC long long complex cos = (LONG LONG COMPL z) LONG LONG COMPL: ¢ a ℒ complex value close to the cosine of 'z' ¢;

PROC long long arccos = (LONG LONG REAL x) LONG LONG REAL: ¢ if ABS x ≤ ℒ 1, a ℒ real value close

     to the inverse cosine of 'x', ℒ 0 ≤ ℒ arccos (x) ≤ ℒ pi ¢; </lang>

Note: The type returned by the procedure is generally prefixed to the procedure name.
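
The ℒ/ℓ shorthand above behaves like a template that is expanded once per precision. The following is a minimal sketch of that expansion, written in Python rather than ALGOL 68. The helper name expand_precisions and the default of two levels of SHORT and LONG are assumptions made for illustration; the number of precisions actually available is implementation specific.

```python
def expand_precisions(template, depth=2):
    """Expand the report's ℓ / ℒ placeholders into one name per precision.

    `depth` (an assumption of this sketch) is how many SHORTs or LONGs
    to generate on either side of the plain precision.
    """
    results = []
    # Walk from the shortest precision ("short short") to the longest.
    for level in range(-depth, depth + 1):
        word = "short" if level < 0 else "long"
        lower = " ".join([word] * abs(level))
        text = template.replace("ℓ", lower).replace("ℒ", lower.upper())
        # Collapse the gap left behind when the prefix is empty.
        results.append(" ".join(text.split()))
    return results


print(expand_precisions("ℓ cos"))
print(expand_precisions("ℒ INT"))
```

With the default depth this reproduces the two lists quoted above, from "short short cos" to "long long cos" and from SHORT SHORT INT to LONG LONG INT.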

Standard language

Because ALGOL 68 had to be usable with 6-bit and 7-bit character codes, yet could also take advantage of wider character sets, the naming convention could be mechanically varied across platforms. In a 7-bit environment, reserved words, modes and operators were typically upper-case, while constant, variable and procedure names were typically lower-case.

The more peculiar convention was for reserved words, modes and operators to appear in bold typeface, or even underlined, when code was published.

For example:

Algol68 "strict"
as typically published

¢ underline or 
  bold typeface ¢
mode xint = int;
xint sum sq:=0;
for i while
  sum sq≠70×70
do
  sum sq+:=i↑2
od

Quote stropping
(like wikitext)

<lang algol68>'pr' quote 'pr'
'mode' 'xint' = 'int';
'xint' sum sq:=0;
'for' i 'while'
  sum sq≠70×70
'do'
  sum sq+:=i↑2
'od'</lang>

For a 7-bit character code compiler

<lang algol68>.PR UPPER .PR
MODE XINT = INT;
XINT sum sq:=0;
FOR i WHILE
  sum sq/=70*70
DO
  sum sq+:=i**2
OD</lang>

For a 6-bit character code compiler

<lang algol68>.PR POINT .PR
.MODE .XINT = .INT;
.XINT SUM SQ:=0;
.FOR I .WHILE
  SUM SQ .NE 70*70
.DO
  SUM SQ .PLUSAB I .UP 2
.OD</lang>

Algol68 using res stropping
(reserved word)

<lang algol68>.PR RES .PR
mode .xint = int;
.xint sum sq:=0;
for i while
  sum sq≠70×70
do
  sum sq+:=i↑2
od</lang>

Note that spaces are permitted in constant, variable and procedure names.

Various other prefixes and suffixes (grouped by type function) can be found in the standard prelude:

To query file capabilities: get possible, put possible, bin possible, reset possible, set possible, reidf possible
Standard files and channels: stand in, stand out, stand back, stand in channel, stand out channel, stand back channel
File procedures: print, write, put, read, get; printf, writef, putf, readf, getf; print bin, put bin, read bin, get bin; print ℓ int, put ℓ int, read ℓ int, get ℓ int; print ℓ real, put ℓ real, read ℓ real, get ℓ real; etc.
Exception handling procedures: on logical file end, on physical file end, on line end, on page end, on format end, on value error, on open error, on transput error, on format error
Implementation-specific precisions: int lengths, int shorths, real lengths, real shorths, bits lengths, bits shorths, bytes lengths, bytes shorths
Mode limits and sizes: ℓ bits width, ℓ bytes width, ℓ int width, ℓ real width, ℓ exp width, ℓ max int, ℓ max real, ℓ small real
Special characters: error char, exp char, formfeed char, newline char, null character, tab char

AWK

<lang awk># Field names begin with $ so $1 is the first field, $2 the second and $NF the
# last. $0 references the entire input record.
#
# Function and variable names are case sensitive and begin with an alphabetic
# character or underscore followed by any number of: a-z, A-Z, 0-9, _
#
# The awk language is typeless; variables are either string or number
# depending upon usage. Variables can be coerced to string by concatenating ""
# or to number by adding zero. For example:
#   str = x ""
#   num = x + 0
#
# Below are the names of the built-in functions, built-in variables and other
# reserved words in the awk language separated into categories. Also shown are
# the names of gawk's enhancements.
#
# patterns:
#   BEGIN END
#   BEGINFILE ENDFILE (gawk)
# actions:
#   break continue delete do else exit for if in next return while
#   case default switch (gawk)
# arithmetic functions:
#   atan2 cos exp int log rand sin sqrt srand
# bit manipulation functions:
#   and compl lshift or rshift xor (gawk)
# i18n functions:
#   bindtextdomain dcgettext dcngettext (gawk)
# string functions:
#   gsub index length match split sprintf sub substr tolower toupper
#   asort asorti gensub patsplit strtonum (gawk)
# time functions:
#   mktime strftime systime (gawk)
# miscellaneous functions:
#   isarray (gawk)
# variables:
#   ARGC ARGV CONVFMT ENVIRON FILENAME FNR FS NF NR OFMT OFS ORS RLENGTH RS RSTART SUBSEP
#   ARGIND BINMODE ERRNO FIELDWIDTHS FPAT FUNCTAB IGNORECASE LINT PREC PROCINFO ROUNDMODE RT SYMTAB TEXTDOMAIN (gawk)
# function definition:
#   func function
# input-output:
#   close fflush getline nextfile print printf system
# pre-processor directives:
#   @include @load (gawk)
# special files:
#   /dev/stdin /dev/stdout /dev/stderr
</lang>

BASIC

BASIC is case-insensitive, although keywords are generally written entirely in uppercase.

A variable or function name can have a suffix to indicate its type (which types are available depends on the implementation in use): ! for single-precision, @ for fixed-point, # for double-precision, $ for strings, % for short integers, & for long integers.

It is also possible to use DEFtype commands to base a variable's type on its first letter (similar to FORTRAN). The default for Microsoft BASIC is: DEFSNG A-Z
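
The suffix rule can be read as a simple lookup on the last character of a name. The following Python sketch (not BASIC) illustrates this; the function name basic_type and the sample names are hypothetical. It assumes that a name without a suffix falls back to single precision, mirroring the DEFSNG A-Z default mentioned above.

```python
# Suffix characters and the types they denote, as listed above.
SUFFIX_TYPES = {
    "!": "single-precision",
    "@": "fixed-point",
    "#": "double-precision",
    "$": "string",
    "%": "short integer",
    "&": "long integer",
}


def basic_type(name, default="single-precision"):
    """Return the type implied by the suffix of a BASIC name.

    Names without a suffix get `default`, mirroring DEFSNG A-Z
    (an assumption; other DEFtype settings would change this).
    """
    return SUFFIX_TYPES.get(name[-1:], default)


for name in ("total#", "name$", "count%", "x"):  # hypothetical names
    print(name, "->", basic_type(name))
```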

BBC BASIC

Commands and keywords have to be entered in upper case. Variable names are case-sensitive: FOO, Foo, and foo are all different variables. Further, the same name can be used with different suffixes to refer to variables of different types: foo is a float, foo% is an integer, foo$ is a string, foo() is an array of floats, etc. There is nothing to prevent all these names being used in the same program.

The names of user-defined functions (which return exactly one value) and procedures (which may have no return value, or one, or several) must begin with FN or PROC, respectively. Many users find it convenient to follow this prefix with an underscore—so a procedure that takes a float, an array of strings, and an integer and then returns two integers might be defined as follows: <lang bbcbasic>DEF PROC_foo(bar, baz$(), quux%, RETURN fred%, RETURN jim%)</lang> Names like PROCfoo and FNbar are sometimes used, and even PROCFOO and FNBAR are entirely legal; but they are probably less readable.

TitleCase and camelCase are not much used in BBC BASIC, perhaps not used at all; lower_case_with_underscores is preferred for long names. In general, using lower case for user-defined names helps maintain a visual contrast with reserved words and the names of system variables.

The twenty-six integer variables A% to Z% (capitalized) are 'static': that is to say, they persist throughout an interpreter session and are unaffected by the commands NEW and CLEAR. They can thus be used to pass a small amount of data from one program to another.

If the first line of the program is a comment line of the form REM >myprog, the SAVE command can be used with no filename and the program will be saved as (in this case) myprog. Otherwise, it would be necessary to use SAVE "myprog".

C

Base language

All reserved words and named operators are lower-case, e.g. while, for, if, sizeof and return.

Libraries

Constants that appear in C "header" files are typically in upper-case: <lang c>O_RDONLY, O_WRONLY, or O_RDWR. O_CREAT, O_EXCL, O_NOCTTY, and O_TRUNC</lang> Note also that there are remnants of some historic naming conventions in C where constants were required to be 8 characters or less. The "O_CREAT" constant is an example.

Types are often suffixed with a "_t", e.g. size_t, and "private" types and arguments are prefixed with "__": <lang c>extern size_t fwrite (__const void *__restrict __ptr, size_t __size,

                     size_t __n, FILE *__restrict __s) __wur;</lang>

However there are some instances where types use all upper-case. The classic is the type FILE.

In C, the standard floating-point library is focused on double precision, hence the function "cos" is for double precision, while the suffixes "f" and "l" indicate float (single precision) and long double respectively. <lang c>#include <math.h>

double cos(double x);
float cosf(float x);
long double cosl(long double x);</lang>

For complex arguments, a prefix of "c" is added. <lang c>#include <complex.h>

double complex ccos(double complex z);
float complex ccosf(float complex z);
long double complex ccosl(long double complex z);</lang>

This prefix/suffix convention extends to other standard C library functions. For example, in the following the "f" suffix indicates that an argument is a format string, while the prefixes "f", "s", "v" and "n" hint at the other arguments (a FILE stream, a string buffer, a va_list and a size limit respectively): <lang c>#include <stdio.h>

int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);

#include <stdarg.h>

int vprintf(const char *format, va_list ap);
int vfprintf(FILE *stream, const char *format, va_list ap);
int vsprintf(char *str, const char *format, va_list ap);
int vsnprintf(char *str, size_t size, const char *format, va_list ap);</lang>

Quirks

The Unix C standard library uses a prefix of "_" to "hide" raw operating system calls from the non-systems programmer.

Fortran

Every Fortran variable has an implicit type determined by the first letter of its name. The implicit types are as follows. <lang fortran>IMPLICIT REAL(A-H,O-Z), INTEGER(I-N)</lang>

Implicit declaration sometimes leads to problems: misspelled variables (and typos) are silently declared, resulting in hard-to-find bugs. For example, the output from this program isn't all the integers from 1 to 10, because the missing comma turns the DO statement into an assignment to the implicitly declared variable DO999I: <lang fortran> DO 999 I=1 10

       PRINT *,I

999 CONTINUE</lang>

The net effect is that loop variables are typically one of I, J, K, L, M, N.
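
The first-letter rule is easy to state as code. The following Python sketch (not Fortran) uses a hypothetical helper, implicit_type, that applies the default rule with I through N as INTEGER and every other letter as REAL. It assumes no IMPLICIT statement has changed the defaults.

```python
def implicit_type(name):
    """Return the default Fortran type implied by the first letter of a name.

    Assumes the default rule only: I..N are INTEGER, everything else REAL.
    """
    first = name[0].upper()
    return "INTEGER" if "I" <= first <= "N" else "REAL"


# Typical loop variables are INTEGER by default.
print([implicit_type(v) for v in "IJKLMN"])

# Fixed-form Fortran ignores blanks, so "DO 999 I=1 10" assigns to DO999I,
# which starts with D and is therefore silently declared REAL.
print(implicit_type("DO999I"))
```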

Functions

"D" is often used to indicate that an INTRINSIC FUNCTION returns a DOUBLE PRECISION REAL number, e.g. "cosine" in DOUBLE precision is DCOS.
"Q" is often used to indicate that an INTRINSIC FUNCTION returns a QUAD PRECISION REAL number, e.g. "cosine" in QUAD precision is QCOS.
"C" is often used to indicate that an INTRINSIC FUNCTION returns a COMPLEX number, e.g. the COMPLEX "cosine" is CCOS.

And combinations can be applied...

"CQ" is often used to indicate that an INTRINSIC FUNCTION returns a QUAD COMPLEX number, e.g. the QUAD COMPLEX "cosine" is CQCOS.

Quirks

In Fortran 77, <lang fortran>IMPLICIT NONE</lang> was available (as a widely supported extension; it was standardized in Fortran 90) to disable implicit typing. Before this, code could use <lang fortran>IMPLICIT LOGICAL (A-Z)</lang> in the hope that the compiler would detect an undeclared LOGICAL variable used in a numerical context and report a semantic type error.

Haskell

Most keywords are in lowercase. Of punctuation marks, only the colon is considered as uppercase and all others that are valid are considered as lowercase.

Haskell requires that names of types, constructors, classes, and modules start with an uppercase letter, while names of constants, variables, and fields of record types must start with a lowercase letter.

It is common to use camel case although not required. Sometimes the name of something ends with an apostrophe to represent a mathematical "prime" mark.

J

The nice thing about conventions is much like the nice thing about standards: there are so many to choose from.

Classic J tends to favor terse names. One influence, here, is that it's rather dismaying when the name of your procedure is longer than its implementation. This matches the style of classic works on mathematics, and also makes it easier to type, and easier to keep code near to related code. This style is especially popular with local variables.

J also sometimes borrows from C's conventions (ALL CAPS constant names, for example).

Another convention describes the transformation being done using the convention afterFromBefore. This matches the right to left style of assignment operations (which much of J's syntax also adopts). When combined with the "terse naming" convention, you get things like hfd (meaning hexadecimal from decimal).

Another convention, when dealing with external code, involves simply using the foreign names. You can see this, for example, in the opengl support. This makes it a bit easier to use the original documentation.

Other conventions are also in use.

OASYS Assembler

Prefixes and suffixes are required on names. It is allowed for a name to consist of only the prefix and/or suffix without any letters or numbers.

The prefix is one of:

no prefix = Built-in opcode or a macro
! = Static object
% = Global variable
, = Local variable or argument
. = Property
& = Method
? or * = Class
: = Label
' = Vocabulary word

The suffix specifies the data type of a variable or property or argument or the type of the return value of a method, and it is one of:

no suffix = Void; used for methods which do not return a value
@ = Object
# = Integer
$ = String
^ = Pointer; may be followed by another suffix
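
The prefix and suffix tables above can be applied mechanically to any name. The following Python sketch does so; the function name describe and the sample names are hypothetical, and the handling of "^" followed by another suffix (read here as "pointer to ...") is an assumption based only on the description above.

```python
# Prefix and suffix meanings, taken from the two lists above.
PREFIXES = {
    "!": "static object",
    "%": "global variable",
    ",": "local variable or argument",
    ".": "property",
    "&": "method",
    "?": "class",
    "*": "class",
    ":": "label",
    "'": "vocabulary word",
}
SUFFIXES = {"@": "object", "#": "integer", "$": "string", "^": "pointer"}


def describe(name):
    """Split a name into (kind, type) according to its prefix and suffix."""
    has_prefix = name[:1] in PREFIXES
    kind = PREFIXES[name[0]] if has_prefix else "built-in opcode or macro"
    rest = name[1:] if has_prefix else name
    # The trailing run of suffix characters gives the type;
    # "^" may be followed by another suffix.
    stem = rest.rstrip("".join(SUFFIXES))
    suffix = rest[len(stem):]
    type_ = " to ".join(SUFFIXES[c] for c in suffix) or "no suffix (void)"
    return kind, type_


for name in ("%count#", "&run", ",p^$", "move"):  # hypothetical names
    print(name, "->", describe(name))
```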

Terse names are generally preferred.

Perl 6

Perl 6 is written in Unicode, and has consistent Unicode semantics regardless of the underlying text representations. By default Perl 6 presents Unicode in "NFG" formation, where each grapheme counts as one character. A grapheme is what the novice user would think of as a character in their normal everyday life, including any diacritics.

Built-in object types start with an uppercase letter. This includes immutable types (e.g. Int, Num, Complex, Rat, Str, Bit, Regex, Set, Block, Iterator), as well as mutable (container) types, such as Scalar, Array, Hash, Buf, Routine, Module, and non-instantiable Roles such as Callable and Integral. The names may extend to CamelCase for compound words: IntStr, CaptureCursor, BagHash, SoftRoutine.

Non-object (native) types are lowercase: int, num, complex, rat, buf, bit.

Nearly all built-in subroutines, functions, methods and pragmas included in Perl 6 CORE are lowercase or lower kebab-case (compound words joined with hyphens rather than underscores or camelCase): .grep, .pairs, .log, .defined, .subst-rw. The few notable exceptions are those which can radically change the behaviour of the executing code. They are in all-caps kebab-case to make them stand out: EVAL, MONKEY-TYPING.

All upper case names are semi-reserved. You are free to use them, but are warned that you may encounter future collisions with internal usage. Upper case names are used for pseudo-packages: MY, OUR, CORE, GLOBAL, etc., for relative scope identifiers: CALLER, OUTER, SETTING, PARENT, etc. and other things.

Variables in Perl 6 CORE tend to be lower kebab-case for lexical variables and upper case for special or package globals. They have an attached prefix sigil (or twigil) to indicate what type of object they hold and what methods are available to operate on them.

In user space, there are very few restrictions on how things are named. Identifiers of any type cannot contain white space. Subroutine names must start with a letter, that is, any Unicode character that has a "letter" property. Variable names can't contain any of the sigil, twigil or comment characters ($, @, %, *, ?, =, :, #). Outside of those few restrictions, it's pretty much a free-for-all.

That being said, there are some community conventions which are encouraged, though not enforced. Descriptiveness is favoured over terseness, though this should be scaled to the scope of the object. It is perfectly fine to name an index variable in a three-line loop $i. An object in global scope with dozens of methods may be better off with a more descriptive name. It is encouraged to name subroutines for what they do, to make it easier for others to follow your logic. Nouny things should have nouny names. Verby things should be verby. If you aren't going to follow convention, at least be consistent.

Python

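Class names are typically in CamelCase; this is often reflected in the module name.
Names that begin and end with "__" (such as __init__) are reserved for special methods. A leading "__" without a trailing one makes a member "private" to its class through name mangling, and a single leading "_" marks a name as private by convention.
Variables are generally lower-case.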
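
A minimal sketch of these conventions in one place; the class BankAccount and its members are hypothetical names chosen for illustration.

```python
class BankAccount:                      # class names: CamelCase
    def __init__(self, balance):        # __name__: special method
        self._balance = balance         # one underscore: private by convention
        self.__pin = "0000"             # two leading underscores: name-mangled

    def balance(self):
        return self._balance


account = BankAccount(10)               # variables: lower-case
print(account.balance())
print(hasattr(account, "__pin"))              # False: stored under a mangled name
print(hasattr(account, "_BankAccount__pin"))  # True: the mangled name
```

Name mangling only renames the attribute; it discourages accidental access rather than preventing it.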

Racket

For more details, read the explanation in the Name section of the Style Guide: http://docs.racket-lang.org/style/Textual_Matters.html#%28part._names%29 .

The convention is to use full English lowercase words separated by dashes:

<lang Racket>#lang racket
render-game-state
send-message-to-client
traverse-forest</lang>

Usually _ is used only as the name of a dummy argument.

Most function names are prefixed with the data type of their main argument. Some notable exceptions are the functions for lists and boxes, for backward compatibility.

<lang Racket>#lang racket
(string-ref "1234" 2)
(string-length "123")
(string-append "12" "34")

; exceptions
(append (list 1 2) (list 3 4))
(unbox (box 7))</lang>

This convention generalizes the selector-style naming scheme of structs.

<lang Racket>#lang racket
(struct pair (x y) #:transparent #:mutable)
(define p (pair 1 2))
(pair-x p)         ; ==> 1
(set-pair-y! p 3)
p                  ; ==> (pair 1 3)</lang>

Conversion procedures are usually named from->to: <lang Racket>#lang racket
(list->vector '(1 2 3 4))
(number->string 7)</lang>

In addition to regular alphanumeric characters, some special characters are used by convention to indicate something about the name. The most common are:

<lang Racket>; predicates and boolean-valued functions: ?
(boolean? 5)
(list? "123")

; setters and field mutators: !
(set! x 5)
(vector-set! v 2 "x")

; classes: %
game-state%
button-snip%

; interfaces: <%>
dc<%>
font-name-directory<%></lang>

REXX

implicit types

The (Classic) REXX language has no implicit types for variable (names) or function (names) except
that all variables' values are of the type character.

So, it can be thought that the implicit type for everything is character.

numbers

Numbers are stored as characters: decimal digits, with/without signs, decimal points, and exponents
(and blanks where permitted).

Values that conform to the REXX definition of a number (below) are treated as a number:

{sign} {blanks} {digits} {.} {digits} {e|E} {sign} {exponent}

and it may also have any number of leading and/or trailing blanks as well.

The E or e (above) signifies that the decimal number following it is an exponent (a power of ten by
which the preceding digits are multiplied).

Everything (for a number) is optional, but there must be at least one decimal digit.

The sign(s) (for a number) if present, may be a minus sign (-) or a plus sign (+).
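
The number format described above can be approximated with a regular expression. The following Python sketch encodes only what this section states (optional sign, optional blanks after the sign, digits with an optional decimal point, optional exponent, surrounding blanks, at least one digit). It is not a full statement of any particular interpreter's rules, and the name is_rexx_number is hypothetical.

```python
import re

# A pattern following the description above; blanks are taken to be spaces.
REXX_NUMBER = re.compile(
    r"""
    [ ]*                          # leading blanks
    [+-]?                         # optional sign
    [ ]*                          # optional blanks after the sign
    (?:[0-9]+\.?[0-9]*|\.[0-9]+)  # digits with an optional decimal point
    (?:[eE][+-]?[0-9]+)?          # optional exponent with its own sign
    [ ]*                          # trailing blanks
    """,
    re.VERBOSE,
)


def is_rexx_number(text):
    """True if the whole string has the shape of a number as described above."""
    return REXX_NUMBER.fullmatch(text) is not None


for sample in ("12", " - 3.5e+2 ", ".5", "1.", "e5", ".", "1 2"):
    print(repr(sample), is_rexx_number(sample))
```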

variable names

As far as capitalization is concerned, variable names may be written in any case; by the REXX language
definition, variable names are stored internally in capital letters.
Other characters may also be used for variable names (see below).

So: AbcXyz, abcxyz, ABCxyz all refer to the same variable.

function names

The names of the REXX BIFs (built-in functions) are all in uppercase, but they can be
coded in lowercase (or mixed case) for ease of use and readability.

For example: w=length(abc)

--- where length is a REXX BIF that returns the length of the value of the variable ABC

label names

Labels in Classic REXX can use any of the (Latin) letters, as well as other characters such as:

. (a period or decimal point)
! (exclamation point)
_ (underscore or underbar)
$ (dollar sign)
? (question mark)
# (pound sign or hash)
@ (commercial at sign)

Some Classic REXX interpreters allow additional characters [such as the ¢ (cent sign)].

Note that REXX keeps the label names as capitalized letters, but either lowercase/mixed/upper
case may be used interchangeably.

Tcl

This example is in need of improvement:

A concise example or two would do wonders here. Perhaps some of the existing challenge pages can be linked?

Tcl leaves nearly all matters of naming up to the programmer, so styles vary a bit. A few conventions are common:

variable names typically only use a-zA-Z0-9_ as these can be used without {} (Rule #7)
command names are typically lower_case or camelCase
namespaces are usually named in lowercase, starting with a letter ({[a-z][a-z0-9_]*}).
TitleCase names are typically used for private members. TclOO's default export pattern {[a-z]*} supports this convention
options/flags are typically spelled in all lower case with no internal punctuation: -nonewline
Tk window names start with . and must not have a capital letter in the next position
by convention, it's common to name hidden commands (eg: those that have been renamed and wrapped) with a leading underscore _
the name unknown is special in some contexts: it can be used to handle the "no such method" or "no such command" case
the array names _ and {} (the empty string) are used quite commonly for private state
the variable args is used for variadic functions, and typically this name is never used for anything else
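
To illustrate the TclOO export convention from the list above: the default export pattern [a-z]* is a glob, so a method is exported only when its name starts with a lower-case letter. The following sketch mimics that check in Python (not Tcl), using fnmatch, whose glob syntax is close enough for this particular pattern. The method names are hypothetical.

```python
from fnmatch import fnmatchcase

EXPORT_PATTERN = "[a-z]*"  # the default export pattern quoted above, as a glob


def is_exported(method_name):
    """True if a method name matches the default export pattern."""
    return fnmatchcase(method_name, EXPORT_PATTERN)


for name in ("push", "popAll", "Validate", "_reset"):  # hypothetical names
    print(name, "exported" if is_exported(name) else "private")
```

Under this pattern a TitleCase name such as Validate is not exported, which is why TitleCase works as a marker for private members.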

zkl

Conventions are for the user to [create and] follow, no enforcement.
The compiler uses the "__" prefix as its "name space". For example: __DATE__, __sGet. A program can also use that format.
The compiler will put a "#" (comment in source code) in a name to mark it as "out of bounds". For example "__fcn#1_2" is the first lambda function and is located at source line 2.
Names must be unique. For example, a variable can not have the same name as a function. This is a confusion reducer.
Names are restricted to 80 characters of [A-Za-z0-9_], plus "#" when bypassing the tokenizer.
