Compiler/Simple file inclusion pre processor

From Rosetta Code
Compiler/Simple file inclusion pre processor is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Many programming languages allow file inclusion, so that, for example, standard declarations or code snippets can be stored in a separate source file and included at compile time into the programs that require them, without the need for cut-and-paste.

The C pre-processor is probably the best-known example.


Create a simple pre-processor that implements file-inclusion for your language.
The pre-processor need not implement macros, conditional compilation, etc.

The syntax accepted by your pre-processor should follow the standard for your language, e.g. for C the pre-processor must recognise and handle "#include" directives, for PL/1 %include statements would be processed, for COBOL COPY statements, etc.

If your language does not have a standard facility for file-inclusion, implement that used by a popular compiler for the language.
If there is no such feature (e.g. more recent OO languages use import/using/etc. statements to include pre-compiled class definitions), either use the C style #include directive or choose something of your own invention.

State the syntax expected and any limitations, including whether nested includes are supported and if so, how deep the nesting can be.


If possible, implement your pre-processor as a filter, i.e. read the main source file from standard input and write the pre-processed source to standard output.
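As a rough illustration of the requirements above, here is a minimal Python sketch of a C-style `#include "file"` filter. It is not an entry for any particular language. Its conventions are assumptions made only for this sketch. Only whole-line directives with a quoted file name are recognised, so `<...>` headers and directives inside comments or strings are not handled. Nested file names are resolved relative to the including file's directory. Nesting is capped at an arbitrary `MAX_DEPTH` of 10.

```python
import os
import re
import sys

# Whole-line directive with a quoted name, e.g.  #include "defs.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"\s*$')

MAX_DEPTH = 10  # arbitrary nesting limit chosen for this sketch


def preprocess(lines, out, base_dir=".", depth=0):
    """Copy lines to out, replacing each #include "name" with name's text.

    Included files are themselves pre-processed, so nested includes work.
    """
    for line in lines:
        match = INCLUDE_RE.match(line)
        if match is None:
            # ordinary line: copy it, making sure it ends with a newline so a
            # file lacking a final newline is not glued to the following line
            out.write(line if line.endswith("\n") else line + "\n")
            continue
        if depth >= MAX_DEPTH:
            raise RuntimeError("includes nested too deeply: " + match.group(1))
        path = os.path.join(base_dir, match.group(1))
        with open(path) as included:
            # names inside the included file are relative to its own directory
            preprocess(included, out, os.path.dirname(path), depth + 1)


def main():
    # filter mode: main source from standard input, result to standard output
    preprocess(sys.stdin, sys.stdout)
```

Calling `main()` from a script's entry point gives the stdin-to-stdout filter behaviour. A file that includes itself fails with the depth error rather than looping forever.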

NOTE to task implementors: the task is about implementing a pre-processor for your language, not just describing its features.

NOTE to anyone who uses the pre-processors on this page: They are supplied as-is, with no warranty - use at your own peril : )


ALGOL 68

Should work with any Algol 68 implementation that uses upper-stropping.
Implements file inclusion via pragmatic comments as in ALGOL 68G.
A pragmatic comment such as PR read "somefile.incl.a68" PR or PR include "somefile.incl.a68" PR can appear anywhere in the source and will cause the text of somefile.incl.a68 to be included at that point (note that ALGOL 68G itself does not support "include" as an alternative to "read").
The PR...PR will not be recognised inside comments or string literals and cannot appear inside a symbol, i.e. 1PR...PR2 is 1 followed by a pragmatic comment followed by 2.
PR can also be written as PRAGMA.
In ALGOL 68G, PR read ... only includes the file if it has not already been included. This implementation does not check for this, so it includes the file every time it is referenced.
Includes can be nested to a depth of 10.

<lang algol68># Algol 68 pre-processor #
# Processes read/include pragmas in an Algol 68 upper stropping source #
# It is assumed _ is allowed in tags and bold words #
# It is assumed that {} are alternatives for () as in ALGOL 68G #
# to use {} as ALGOL 68RS/algol68toc style nestable brief-comments, #
# change rs style brief comments to TRUE #
# ALGOL 68G ( and probably other compilers ) allow quote-stropped bold #
# words to appear in an otherwise upper-stropped source, #
# e.g. BEGIN 'SKIP' END would be a valid program #
# this is not supported here so pragmatic comments such as: #
# 'PR' read "someFile" 'PR' will cause problems #
# pragmatic comments should be disabled by PR pragmats off PR, #
# this is not implemented #
# the read/include must be in lower case #
# ALGOL 68G's read pragmatic comment only includes the file the first time #
# it is mentioned in a read pragmatic comment - this is not implemented #
# here, the file is included each time #

BEGIN

   # TRUE if {} delimits a nestable brief comment, as in ALGOL 68RS and      #
   #      algol68toc, FALSE if {} are alternatives to () as in ALGOL 68G     #
   BOOL rs style brief comments = FALSE;
   # input file information                                                  #
   MODE INFILE = STRUCT( REF FILE src         # actual source file           #
                       , STRING   line        # latest source line           #
                       , INT      pos         # character position in line   #
                       );
   # initialises the INFILE f to be associated with the FILE src             #
   PRIO INIT = 9;
   OP   INIT = ( REF FILE src, REF INFILE f )REF INFILE:
        BEGIN
           line OF f := "";
           pos  OF f := 1 + UPB line OF f;
           src  OF f := src;
           set eof handler( f );
           f
        END # INIT # ;
   # TRUE if EOF has been reached, FALSE otherwise                           #
   BOOL at eof := FALSE;
   CHAR c      := " ";
   # newline character                                                       #
   CHAR nl      = REPR 10;
   # maximum number of include files that can be nested                      #
   INT max include depth = 10;
   # source file stack                                                       #
   [ 0 : max include depth ]INFILE in stack;
   # current include depth                                                   #
   INT include depth := 0;
   # number of errors reported                                               #
   INT error count   := 0;
   # sets the logical file end procedure of the specified file to a routine  #
   # that allows us to detect EOF on a source file                           #
   PROC set eof handler = ( REF INFILE inf )VOID:
        on logical file end( src OF inf
                           , ( REF FILE f )BOOL:
                             BEGIN
                                 # note that we reached EOF on the          #
                                 # latest read                              #
                                 IF NOT at eof
                                 THEN
                                     # first time we have spotted eof,      #
                                     # we need to call newline so that      #
                                     # if the last line didn't have a       #
                                     # newline at the end, it is still read #
                                     # however that will call this routine  #
                                     # so we have to ensure we only do it   #
                                     # once                                 #
                                     at eof := TRUE;
                                     newline( f )
                                 FI;
                                 # return TRUE so processing can continue   #
                                 TRUE
                             END
                           );
   # reports an error                                                        #
   PROC error = ( STRING message )VOID:
        BEGIN
           error count +:= 1;
           print( ( newline, newline, "**** ", message, newline ) )
        END # error # ;
   # gets the next source character, handling end-of-file on include files   #
   # the source character is stored in c                                     #
   PROC next char = VOID:
        BEGIN
           WHILE
               BOOL read again := FALSE;
               REF INFILE s = in stack[ include depth ];
               IF pos OF s <= UPB line OF s THEN
                   # not past the end of the source line                     #
                   c := ( line OF s )[ pos OF s ];
                   pos OF s +:= 1
               ELIF
                   # past the end of the current source line - get the next  #
                   at eof := FALSE;
                   get( src OF s, ( line OF s, newline ) );
                   NOT at eof
               THEN
                   # got a new line                                          #
                   line OF s +:= nl;
                   pos  OF s  := LWB line OF s;
                   read again := TRUE
               ELIF include depth = 0 THEN
                   # eof on the main source                                  #
                   line OF s := ""
               ELSE
                   # got eof on an include file                              #
                   include depth -:= 1;
                   read again     := TRUE;
                   at eof         := FALSE;
                   close( src OF s )
               FI;
               read again
           DO SKIP OD
        END # next char # ;
   # returns TRUE if the current character is whitespace                     #
   PROC have whitespace = BOOL: c <= " ";
   # returns TRUE if the current character is a string delimiter             #
   PROC have string delimiter = BOOL: c = """";
   # returns TRUE if the current character can start a bold word             #
   PROC have bold = BOOL: c >= "A" AND c <= "Z";
   # returns TRUE if the current character can start a brief tag             #
   PROC have tag  = BOOL: c >= "a" AND c <= "z";
   # reports an unterminated construct ( e.g. string, comment )              #
   PROC unterminated = ( STRING construct )VOID:
        error( "Unterminated " + construct );
   # outputs ch to stand out                                                 #
   PROC put char = ( CHAR ch )VOID:
        IF ch = nl THEN print( ( newline ) ) ELSE print( ch ) FI;
   # outputs str to stand out                                                #
   PROC put string = ( STRING str )VOID: print( ( str ) );
   # outputs a brief comment to stand out                                    #
   #    end char is the closing delimiter,                                   #
   #    nested char is the opening delimiter for nestable brief comments     #
   #        if nested char is blank, the brief comment does not nest         #
   #    this handles ALGOL 68RS and algol68toc style {} comments             #
   PROC skip brief comment = ( CHAR end char, CHAR nested char )VOID:
        BEGIN
           put char( c );
           WHILE next char;
                 NOT at eof AND c /= end char
           DO
               IF c = nested char AND NOT have whitespace THEN
                   # nested brief comment                                    #
                   skip brief comment( end char, nested char )
               ELSE
                    # normal comment char                                     #
                   put char( c )
               FI
           OD;
           IF at eof THEN
               # unterminated comment                                        #
               unterminated( """" + end char + """ comment" );
               c := end char
           FI;
           put char( c );
           next char
        END # skip brief comment # ;
   # gets a string of spaces from the source                                 #
   PROC get whitespace = STRING:
        BEGIN
           STRING result := "";
           WHILE NOT at eof AND have whitespace DO result +:= c; next char OD;
           result
        END # get whitespace # ;
   # gets a string denotation from the source                                #
   PROC get string = STRING:
        BEGIN
           STRING result := "";
           # within a string denotation, "" denotes the " character          #
           WHILE have string delimiter DO
               WHILE result +:= c;
                     next char;
                     NOT at eof AND NOT have string delimiter
               DO SKIP OD;
               IF NOT have string delimiter THEN
                   # unterminated string                                     #
                   unterminated( "string" );
                   c := """"
               FI;
               result +:= c;
               next char
           OD;
           result
        END # get string # ;
   # returns s unquoted                                                      #
   PROC unquote string = ( STRING s )STRING:
        BEGIN
           STRING result := "";
           # within a string denotation, "" denotes the " character          #
           INT c pos := LWB s + 1;
           WHILE cpos < UPB s DO
               CHAR ch = s[ c pos ];
               IF ch = """" THEN
                   # have an embedded quote - it will be doubled             #
                   c pos +:= 1
               FI;
               result +:= ch;
               c pos  +:= 1
           OD;
           result
        END # unquote string # ;
   # gets a bold word from the source                                        #
   PROC get bold word = STRING:
        BEGIN
           STRING result := "";
           WHILE have bold OR c = "_" DO result +:= c; next char OD;
           result
        END # get bold word # ;
   # gets a brief tag from the source                                        #
   PROC get tag = STRING:
        BEGIN
           STRING result := "" ;
           WHILE have tag OR c = "_" DO result +:= c; next char OD;
           result
        END # get tag # ;
   # copies the source to the output until a bold word is encountered        #
   PROC skip to bold = STRING:
        IF at eof
        THEN ""
        ELSE STRING result := "";
             WHILE put char( c );
                   next char;
                   NOT at eof
               AND NOT have bold
             DO SKIP OD;
             IF NOT at eof THEN result := get bold word FI;
             result
        FI # skip to bold # ;
   # handles a bold PRAGMA, COMMENT or other bold word                       #
   PROC bold word or pragment = VOID:
        IF STRING bold word := get bold word;
           bold word = "CO" OR bold word = "COMMENT"
        THEN
           # have a bold comment                                             #
           STRING delimiter = bold word;
           WHILE put string( bold word );
                 bold word := skip to bold;
                 NOT at eof
             AND bold word /= delimiter
           DO SKIP OD;
           IF at eof THEN
                # unterminated comment                                        #
               unterminated( "'" + delimiter + "' comment" )
           FI;
           put string( delimiter )
        ELIF bold word = "PR" OR bold word = "PRAGMA"
        THEN
           # have a pragmatic comment - could be file inclusion              #
           STRING delimiter  = bold word;
           STRING pragment  := bold word;
           STRING op        := "";
           STRING file name := "";
           # skip spaces after the PR/PRAGMA                                 #
           pragment +:= get whitespace;
            # get the operation, if there is a tag                            #
           IF have tag THEN
               # have an operation                                           #
               op        := get tag;
               pragment +:= op + get whitespace
           FI;
           # get the file name, if there is one                              #
           IF have string delimiter THEN
               # have a file name                                            #
               file name := get string;
               pragment +:= file name + get whitespace;
               file name := unquote string( file name )
           FI;
           # should now have the closing delimiter                           #
           IF NOT have bold THEN
                # no bold word in/at-the-end-of the pragment                  #
               bold word := ""
           ELSE
               # have a bold word - could be the delimiter                   #
               pragment +:= ( bold word := get bold word )
           FI;
           IF ( op /= "read" AND op /= "include" )
           OR file name  = ""
           OR bold word /= delimiter
           THEN
               # not a read/include pragmatic comment                        #
               put string( pragment );
               IF bold word /= delimiter THEN
                   # haven't got the closing delimiter yet                   #
                   WHILE bold word := skip to bold;
                         NOT at eof
                     AND bold word /= delimiter
                   DO SKIP OD;
                   IF at eof THEN
                        # unterminated comment                                #
                        unterminated( "'" + delimiter + "' pragmatic comment" )
                   FI;
                   put string( delimiter )
               FI
           ELIF
               # attempt to include the file                                 #
               include depth >= UPB in stack
           THEN
               # max include depth exceeded                                  #
               put string( pragment );
                error( "Include files nested too deeply: " + file name )
           ELIF REF FILE inc := HEAP FILE;
                open( inc, file name, stand in channel ) /= 0
           THEN
               # couldn't open the file                                      #
               put string( pragment );
               error( "Unable to include: " + file name )
           ELSE
               # file opened OK                                              #
               in stack[ include depth +:= 1 ] := inc INIT HEAP INFILE
           FI
        ELSE
           # some other bold word                                            #
           put string( bold word )
        FI # bold word or pragment # ;
   # copy the source to stand out, expanding read/include pragmas            #
   in stack[ include depth := 0 ] := stand in INIT HEAP INFILE;
   next char;
   WHILE NOT at eof DO
       IF   c = "#" THEN
           # brief comment                                                   #
           skip brief comment( "#", " " )
       ELIF c = "{" AND rs style brief comments THEN
           # nestable brief comment ( ALGOL 68RS and algol68toc )            #
           skip brief comment( "}", "{" )
       ELIF have string delimiter THEN
           # STRING or CHAR denotation                                       #
           put string( get string )
       ELIF have bold THEN
           # have a bold word                                                #
           bold word or pragment
       ELSE
           # anything else                                                   #
           put char( c );
           next char
       FI
   OD;
   IF error count > 0 THEN
       # had errors processing the source                                    #
       print( ( "**** ", whole( error count, 0 ), " errors", newline ) )
   FI

END</lang>

Output:

Pre-processing the following program:

PR include "ex1.a68" PR

where ex1.a68 contains:

BEGIN
    PR precision 200 PR
    INT x := 1;
    PR read "in1.incl.a68" PR
END

and in1.incl.a68 contains:

    IF x > 0 THEN print( ( x, newline ) ) FI

Produces the following output:

BEGIN
    PR precision 200 PR
    INT x := 1;

    IF x > 0 THEN print( ( x, newline ) ) FI
END
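The recognition rule described at the start of this entry can be condensed into a small, hypothetical Python helper. It is shown purely as an illustration and is not part of the ALGOL 68 program. Given the text of one pragmatic comment, `included_file` returns the file name if three conditions hold: the operation is a lower-case `read` or `include`, the file name is quoted, and the closing symbol (`PR` or `PRAGMA`) matches the opening one. It decodes `""` to `"` as in an ALGOL 68 string denotation, and returns `None` otherwise. Locating pragmatic comments in the source, and skipping comments and string literals, is left to a real scanner such as the one above.

```python
import re

# PR or PRAGMA, lower-case read/include, a quoted name ("" stands for "),
# then a closing PR or PRAGMA; whitespace between the parts is optional.
PRAGMENT_RE = re.compile(
    r'^(PR|PRAGMA)\s*(read|include)\s*("(?:[^"]|"")*")\s*(PR|PRAGMA)$')


def included_file(pragment):
    """Return the file a read/include pragmatic comment names, else None."""
    match = PRAGMENT_RE.match(pragment)
    if match is None or match.group(1) != match.group(4):
        return None  # not a file-inclusion pragmatic comment
    quoted = match.group(3)
    # drop the enclosing quotes and undouble any embedded ones
    return quoted[1:-1].replace('""', '"')
```

For example, `included_file('PR precision 200 PR')` returns `None`, so such a pragmatic comment would simply be copied through, as in the output above.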

Phix

Standard feature. Phix ships with a bunch of standard files in a builtins directory, most of which it knows how to "autoinclude", but some must be explicitly included (full docs). You can explicitly specify the builtins directory or not (without it, the project directory is searched first), and use the same mechanism for files you have written yourself. There is no limit to the number or depth of files that can be included. Relative directories are honoured, so if you specify a (partial) directory, that is where it will look first for any sub-includes. You can also use single-line "stub includes" to redirect include statements to different directories/versions. Note that namespaces are not supported by pwa/p2js. You can optionally use double quotes, but may then need to escape backslashes. Includes occur at compile time, as opposed to dynamically.

<lang Phix>include builtins/complex.e
include complex.e             -- also valid
include "builtins\\complex.e" -- ditto</lang>

If the compiler detects that some file has already been included (from the same directory), it does not include it again; two or more files of the same name can still be included from different directories. I should perhaps also state that include handling is part of normal compilation/interpretation, as opposed to a separate "preprocessing" step, and that each file is granted a new private scope. While there is of course only one "global" scope, the compiler uses the implicit include hierarchy to resolve any clashes that might arise to the most appropriate one, aka "if it works standalone it should work exactly the same when included as part of a larger application".
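The include-once rule described above skips a repeat of the same file but still includes a same-named file from another directory. That amounts to keying on the file's resolved path rather than its bare name. A minimal Python sketch of that idea follows. The `IncludeOnce` class is hypothetical and is not how the Phix compiler actually implements it.

```python
import os


class IncludeOnce:
    """Hypothetical tracker that admits each distinct file only once."""

    def __init__(self):
        self.seen = set()

    def should_include(self, path):
        # Resolve "./", "..", symlinks and (on Windows) letter case, so that
        # different spellings of the same file share one key, while files of
        # the same name in different directories get different keys.
        key = os.path.normcase(os.path.realpath(path))
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

A pre-processor would call `should_include` before opening each file and drop the directive when it returns `False`.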

And so on to the task as specified: since the task states that it "is about implementing a pre-processor for your language, not just describing its features", and given the discussions on the talk page and the above, a "preprocessor" for Phix would fail in so many ways that it is simply not worth attempting, and should certainly never actually be used.
The following will replace include statements with file contents, but do not expect it to work or do anything useful on any existing Phix code.

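<lang Phix>-- Replaces any line that starts "include " with the (recursively
-- pre-processed) lines of the named file; other lines are copied as-is.
function preprocess(string filename)
    sequence inlines = get_text(filename,GT_LF_STRIPPED),
             outlines = {}
    for l=1 to length(inlines) do
        string line = inlines[l]
        if match("include ",line)=1 then
            -- drop "include ", any trailing "--" comment, blanks and quotes
            -- (no "--" gives match()=0, so line[9..-1] runs to the end)
            line = trim(line[9..match("--",line)-1],{' ','\t','"'})
            outlines &= preprocess(line)      -- splice in the included lines
        else
            outlines = append(outlines,line)  -- add the line as one element
        end if
    end for
    return outlines
end function

-- e.g. puts(1,join(preprocess("main.exw"),"\n"))  ("main.exw" is a hypothetical file)</lang>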