Compiler/Simple file inclusion pre processor: Difference between revisions

Revision as of 13:25, 8 June 2021 PureFox (→‎{{header|Wren}}: Revised following latest clarification.)
Latest revision as of 16:07, 20 November 2023 PureFox m (→‎{{header|Wren}}: Changed to Wren S/H)
(18 intermediate revisions by 5 users not shown)
Line 3:
;Task:
<br>
<b>Introduction</b>
<br><br>
Many programming languages allow file inclusion, so that, for example, standard declarations or code-snippets can be stored in a separate source file and be included into the programs that require them at compile-time, without the need for cut-and-paste.
<br>
Line 13 ⟶ 15:
Other languages, on the other hand, do have file inclusion, e.g.: C, COBOL, PL/1.
<br>
<br>
<br>The distinction between compiled and interpreted languages should be irrelevant - this is a specialised text processing exercise - reading a source file and producing a modified source file that also contains the contents of one or more other files.
<br>
<br>
Line 19 ⟶ 22:
<br>
<br>
<b>The pre-processor</b>
~~So...~~
<br><br>
Create a simple pre-processor that implements file-inclusion for your language.
<br>
<br>
~~The pre-processor need not implement macros, conditional compilation, etc. (E.g. for COBOL, the REPLACING option need not be implemented ).~~
The pre-processor need not check the validity of the resultant source. The pre-processor's job is purely to insert the specified files into the source at the required points. Whether the result is syntactically valid or not is a matter for the compiler/interpreter when it is presented with the source.
<br>
<br>
The syntax accepted for your pre-processor should be as per the standard for your language, should your language have such a facility. E.g. for C the pre-processor must recognise and handle "#include" directives. For PL/1, %include statements would be processed and for COBOL, COPY statements, ~~etc~~ and so on.
<br>
<br>
If your language does not have a standard facility for file-inclusion, implement that used by a popular compiler/interpreter for the language.
<br>
If there is no such feature, either use the C style #include directive or choose something of your own invention (for example, #include would be problematic for languages where # introduces a comment).
<br>
<br>
Line 38 ⟶ 42:
<br>
<br>
<b>Minimum requirements</b>
<br><br>
As a minimum, your pre-processor must be able to process a source file (read from a file or standard input, as you prefer) and generate another source file (written to a file or standard output, as you prefer). The file-inclusion directives in the source should be replaced by the contents of the specified files. Implementing nested file inclusion directives (i.e., if an included file contains another file-inclusion directive) is optional.
<br>
~~If possible, implement your pre-processor as a filter, i.e. read the main source file from standard input and write the pre-processed source to standard output.~~
<br>
Pre-processors for some languages offer additional facilities, such as macro expansion and conditional compilation. Your pre-processor need not implement such things.
<br>
~~NOTE to task implementors: The Task is about implementing a pre-processor for your language, not just describing its features.~~
<br>
<b>Notes</b>
* implementors: The Task is about implementing a pre-processor for your language, not just describing its features. Just as the task [https://www.rosettacode.org/wiki/Calculating_the_value_of_e Calculating the value of e] is not about using your language's in-built exp function but showing how e could be calculated, this is about showing how file inclusion could be implemented - even if the compiler/interpreter you are using already has such a facility.
~~NOTE to anyone who uses the pre-processors on this page: They are supplied as-is, with no warranty - use at your own peril : )~~
* the pre-processors on this page are supplied as-is, with no warranty - use at your own peril : )
;See Also
* [[include a file]]
<br>
<br>
=={{header|ALGOL 68}}==
Should work with any Algol 68 implementation that uses upper-stropping.
<br><br>Implements file inclusion via pragmatic comments as in ALGOL 68G.
<br><br>A pragmatic comment such as <code>PR read "somefile.incl.a68" PR</code> or <code>PR include "somefile.incl.a68" PR</code> can appear anywhere in the source and will cause the text of somefile.incl.a68 to be included at that point (Note, ALGOL 68G does not support "include" as an alternative to "read").
<br>The PR...PR will not be recognised inside comments or string literals and cannot appear inside a symbol, i.e. 1PR...PR2 is 1 followed by a pragmatic comment followed by 2.
<br> PR can also be written as PRAGMA.
<br> In ALGOL 68G, <code>PR read ...</code> only includes the file if it has not already been included. ~~This implementation does not check for this and so includes the file everytime it is referenced.~~ This is handled by this implementation, but at most 200 different files can be included.
<br> Includes can be nested to a depth of 10.
<br><br>
When run, the pre-processor will read source from standard input and write the resultant source to standard output. If standard output is re-directed to a temporary source file, it can then be compiled/interpreted with the actual Algol 68 compiler.
<br>(NB: The source must end with a line-feed)
<syntaxhighlight lang="algol68"># Algol 68 pre-processor #
# Processes read/include pragmas in an Algol 68 upper stropping source #
# It is assumed _ is allowed in tags and bold words #
Line 74 ⟶ 90:
# the read/include must be in lower case #
# ALGOL 68G's read pragmatic comment only includes the file the first time #
# it is mentioned in a read pragmatic comment - this is implemented by #
# keeping a list of the included files - the list is limited to 200 #
# entries #
BEGIN
Line 110 ⟶ 127:
    # number of errors reported #
    INT error count := 0;
    # number of included files #
    INT include count := 0;
    # names of previously included files #
    [ 1 : 200 ]STRING included files;
    # sets the logical file end procedure of the specified file to a routine #
Line 304 ⟶ 325:
    IF at eof THEN
        # unterminated comment #
        unterminated( "'""" + delimiter + "'"" comment" )
    FI;
    put string( delimiter )
Line 350 ⟶ 371:
    DO SKIP OD;
    IF at eof THEN
        # unterminated comment #
        unterminated( """" + delimiter + """" )
    FI;
    put string( delimiter )
    FI
    ELIF
        # check for an already included file and add the name to #
        # the list if it hasn't been included before #
        BOOL already included := FALSE;
        FOR file pos TO include count
        WHILE NOT ( already included := included files[ file pos ] = file name )
        DO SKIP OD;
        IF NOT already included THEN
            # first time this file has been included #
            # - add it to the list #
            IF include count < UPB included files THEN
                # room to include this file #
                included files[ include count +:= 1 ] := file name
            ELSE
                # too many include files #
                error( "Too many include files: " + file name )
            FI
        FI;
        op = "read" AND already included
    THEN
        # the file is already included and the operation is "read" so #
        # the pragma should be ignored #
        SKIP
    ELIF
        # check the include depth #
        include depth >= UPB in stack
    THEN
Line 406 ⟶ 449:
    FI
END</syntaxhighlight>
{{out}}
Pre-processing
the following program:
Line 433 ⟶ 476:
END
</pre>
=={{header|AWK}}==
AWK does not have file-inclusion as standard; however, some implementations, particularly GNU Awk, do provide file inclusion.
<br>This uses <code>@include</code> as the file inclusion directive, as in GAWK.
<br>It differs from GAWK syntax in that the include directive can appear inside or outside functions. The file name can be quoted or not. Nested includes are not supported.
<br>The source can be a named file or read from stdin. If it is read from stdin, <code>-v srcName=sourceName</code> can be specified on the AWK command line to name the file. The pre-processed source is written to stdout.
<syntaxhighlight lang="awk"># include.awk: simple file inclusion pre-processor
#
# the command line can specify:
#    -v srcName=<source file path>

BEGIN {
    srcName = srcName "";
} # BEGIN

{
    if( $1 == "@include" )
    {
        # must include a file
        includeFile( $0 );
    }
    else
    {
        # normal line
        printf( "%s\n", $0 );
    }
}

function includeFile( includeLine,    fileName, ioStat, line )
{
    # get the file name from the @include line
    fileName = includeLine;
    sub( /^ *@include */, "", fileName );
    sub( / *$/, "", fileName );
    sub( / *#.*$/, "", fileName );
    if( fileName ~ /^"/ )
    {
        # quoted file name
        sub( /^"/, "", fileName );
        sub( /"$/, "", fileName );
        gsub( /""/, "\"", fileName );
    }
    printf( "#line 1 %s\n", fileName );
    while( ( ioStat = ( getline line < fileName ) ) > 0 )
    {
        # have a source line
        printf( "%s\n", line );
    }
    if( ioStat < 0 )
    {
        # I/O error
        printf( "@include %s # not found or I/O error\n", fileName );
    }
    close( fileName );
    printf( "#line %d %s\n", NR, ( srcName != "" ? srcName : FILENAME ) );
} # includeFile</syntaxhighlight>
=={{header|J}}==
Preprocessing is a task which, if used at all, would more likely be tackled in J through monkey-patching (wrapping library definitions for names, overriding their prior definition).
Instead, here, we handle literal statements of the form <code>load'script references'</code> where 'load' appears at the beginning of the line (not indented), 'script references' is a literal string, and nothing else appears on the line. We replace any such line(s) with the content of the referenced script(s). This approach is not recursive (and while it seems to offer little advantage over the native implementation of 'load', it does support 'load' inside multi-line string constants, as long as the reference(s) to the content being loaded would be supportable as a J script reference).
<syntaxhighlight lang="j">preproc=: {{
  lines=. <;.2 LF,~CR-.~fread y
  for_ndx. |.I.'load'-:"1 (4&{.@>) lines do.
    line=. ndx{::lines
    try. parse=. ;:line catch. continue. end.
    if. 3~:#parse do. continue. end.
    if. (<'load')~:{.parse do. continue. end.
    if. ''''~:(1;0){::parse do. continue. end.
    lines=. lines ndx}~ <;fread each getscripts_j_ ".1{::parse
  end.
  0!:0;lines
}}</syntaxhighlight>
=={{header|Julia}}==
Line 442 ⟶ 565:
the standard Julia syntax. Calls to the <code>include</code> function that contain a single argument which is a string in parentheses will be preprocessed. Other calls to <code>include</code> with different arguments will not be preprocessed by <code>preprocess.jl</code>.
<syntaxhighlight lang="julia"># preprocess.jl convert includes to file contents

infile = length(ARGS) > 0 ? ARGS[1] : stdin
Line 452 ⟶ 575:
        return m.captures[1] * read(m.captures[2], String) * m.captures[3]
    catch y
        @warn(y)
        return s
    end
Line 460 ⟶ 583:
output = replace(input, r"\s*include\(\"[^\"]+\"\)\s*" => includefile)
write(outfile, output)
</syntaxhighlight>
=={{header|Phix}}==
Standard feature. Phix ships with a bunch of standard files in a builtins directory, most of which it knows how to "autoinclude", but some must be explicitly included ([http://phix.x10.mx/docs/html/include.htm full docs]).
You can explicitly specify the builtins directory or not (obviously without it will look in the project directory first), and use the same mechanism for files you have written yourself. There is no limit to the number or depth of files that can be included. Relative directories are honoured, so if you specify a (partial) directory that is where it will look first for any sub-includes. You can also use single line "stub includes" to redirect include statements to different directories/versions. Note that namespaces are '''not''' supported by pwa/p2js. You can optionally use double quotes, but may then need to escape backslashes. Includes occur at compile time, as opposed to dynamically.
<!--<syntaxhighlight lang="phix">-->
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">/</span><span style="color: #004080;">complex</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
<span style="color: #008080;">include</span> <span style="color: #004080;">complex</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span> <span style="color: #000080;font-style:italic;">-- also valid</span>
<span style="color: #008080;">include</span> <span style="color: #008000;">"builtins\\complex.e"</span> <span style="color: #000080;font-style:italic;">-- ditto</span>
<!--</syntaxhighlight>-->
If the compiler detects that some file has already been included it does not do it again (from the same directory, two or more files of the same name ''can'' be included from different directories).
I should perhaps also state that include handling is part of normal compilation/interpretation, as opposed to a separate "preprocessing" step, and that each file is granted a new private scope. While of course there is only one "global" scope, it will use the implicit include hierarchy to automatically resolve any clashes that might arise to the most appropriate one, aka "if it works standalone it should work exactly the same when included as part of a larger application". And so on to the task as specified: since the task specifies it "is about implementing a pre-processor for ''your language'', not just describing its features", and as per discussions on the talk page and the above, a "preprocessor" for Phix would fail in so many ways that it is simply not worth attempting, and should certainly never actually be used.<br> The following will replace include statements with file contents, but do '''not''' expect it to work or do anything useful on any existing [Phix] code. Mutually recursive includes will cause infinite recursion until you run out of memory, unless some kind of "already done" stack is added to the following.
<!--<syntaxhighlight lang="phix">-->
<span style="color: #008080;">function</span> <span style="color: #000000;">preprocess</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">filename</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">inlines</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">get_text</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filename</span><span style="color: #0000FF;">,</span><span style="color: #004600;">GT_LF_STRIPPED</span><span style="color: #0000FF;">),</span>
Line 489 ⟶ 612:
<span style="color: #008080;">return</span> <span style="color: #000000;">outlines</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
As the Wren entry eloquently puts it: The above code is limited in the sense that all top-level variables of the imported module (not just the specifically "global" ones) are now visible in the outer scope. Consequently, the code will no longer compile if there are any name clashes. In fact pwa/p2js contains some code (see insert_dollars() in p2js.exw) that renames selected pre-known top-level variables in the standard includes, e.g. base64.e `aleph` -> `$aleph`, to minimise such disruption; however, there is (as yet) no such mechanism for user-written include files, though it is on the to-do list.
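The "already done" guard mentioned above for mutually recursive includes could look something like the following Python sketch. It is not Phix code and not part of the entry: the directive pattern <code>INCLUDE_RE</code>, the function <code>expand</code> and the <code>read_text</code> callback are hypothetical names chosen for illustration, and the sketch tracks the files currently being expanded rather than every file ever included.

```python
import re

# Hypothetical directive syntax for this sketch: a line such as
#     include some/file.e     or     include "some/file.e"
INCLUDE_RE = re.compile(r'^\s*include\s+"?([^"\s]+)"?\s*$')


def expand(path, read_text, active=None):
    """Return the lines of `path` with include lines replaced by file contents.

    `read_text` maps a file name to its text (an assumption of this sketch,
    so that it can be tried without touching the file system).
    `active` holds the names of the files currently being expanded, so a
    cycle produces a marker line instead of unbounded recursion.
    """
    active = set() if active is None else active
    if path in active:
        return [f"-- skipped recursive include of {path}"]
    active.add(path)
    out = []
    for line in read_text(path).splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            # splice in the (recursively expanded) included file
            out.extend(expand(m.group(1), read_text, active))
        else:
            out.append(line)
    active.discard(path)
    return out


if __name__ == "__main__":
    # two files that include each other
    files = {"a.e": "include b.e\nA", "b.e": "include a.e\nB"}
    print("\n".join(expand("a.e", files.__getitem__)))
```

Because a name is removed from <code>active</code> once its expansion finishes, the same file can still be included more than once from different places; only a file that is in the middle of being expanded is refused.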
=={{header|Raku}}==
Line 506 ⟶ 632:
A Raku script to do source filtering / preprocessing: save it and call it 'include'
<syntaxhighlight lang="raku" line>unit sub MAIN ($file-name);
my $file = slurp $file-name;
put $file.=subst(/[^^|['{{' \s*]] '#include' \s+ (\S+) \s* '}}'?/, {run(«$*EXECUTABLE-NAME $*PROGRAM-NAME $0», :out).out.slurp(:close).trim}, :g);</syntaxhighlight>
This will find: any line starting with '#include' followed by a (absolute or relative) path to a file, or #include ./path/to/file.name enclosed in double curly brackets anywhere in the file.
Line 517 ⟶ 643:
Top level named... whatever, let's call it 'preprocess.raku'
<syntaxhighlight lang="raku" line># Top level test script file for #include Rosettacode
# 'Compiler/Simple file inclusion pre processor' task
Line 525 ⟶ 651:
#include ./include1.file
</syntaxhighlight>
include1.file
Line 614 ⟶ 740:
Nevertheless, it is possible to write a limited pre-processor in Wren (the VM itself is written in C):
<syntaxhighlight lang="wren">import "io" for File

var source = File.read("source.wren")
Line 638 ⟶ 764:
}

File.create("source2.wren") { |file| file.writeBytes(source) }</syntaxhighlight>
<br>
The above code is limited in the sense that '''all''' top-level variables of the imported module (not just the specifically imported ones) are now visible in the outer scope. Consequently, the code will no longer compile if there are any name clashes.

The obvious solution of placing the imported code in a block and then 'lifting' the specifically imported variables into the outer scope does not work because of Wren's rather strange scoping rules. If you did this, then the imported module's top level variables would no longer be top-level relative to the code as a whole and hence would no longer be visible to classes defined within the module itself!
For example, if you try to run the following script, you get the error shown:
<syntaxhighlight lang="wren">var B

{ // block starts here
Line 657 ⟶ 783:
} // end of block

B.method()</syntaxhighlight>
Other problems include dealing with '''import''' statements which have been commented out (not catered for in the above pre-processor) and resolving import file paths, which is not actually set in stone but depends on how Wren is being embedded. The above pre-processor code only deals with paths which are relative to the current directory.
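
One way the commented-out '''import''' problem might be approached is to strip comments before looking for directives. The Python sketch below is an illustration only, not the Wren entry's code: <code>live_imports</code> and <code>IMPORT_RE</code> are hypothetical names. It assumes <code>//</code> line comments and nestable <code>/* ... */</code> block comments, and as a simplification it does not recognise comment markers that occur inside string literals.

```python
import re

# An import line of the form used in the Wren code above:  import "io" for File
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"')


def live_imports(source):
    """Return the module names of import lines that are not commented out."""
    found = []
    depth = 0  # current nesting depth of /* ... */ comments
    for line in source.splitlines():
        code = []  # characters of this line that lie outside any comment
        i = 0
        while i < len(line):
            pair = line[i:i + 2]
            if pair == "/*":
                depth += 1
                i += 2
            elif pair == "*/" and depth > 0:
                depth -= 1
                i += 2
            elif depth == 0 and pair == "//":
                break  # rest of the line is a line comment
            else:
                if depth == 0:
                    code.append(line[i])
                i += 1
        m = IMPORT_RE.match("".join(code))
        if m:
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    sample = 'import "a" for A\n// import "b" for B\n/* import "c" for C */'
    print(live_imports(sample))
```

A pre-processor could run a scan of this kind first and expand only the imports it reports, leaving commented-out lines untouched.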