Compiler/lexical analyzer: Difference between revisions
m (→{{header|ALGOL 68}}: more helpful diagnostics) |
Thundergnat (talk | contribs) m (syntax highlighting fixup automation) |
Line 158:
For example, the following two program fragments are equivalent, and should produce the same token stream except for the line and column positions:
* <
print ( n , " " ) ;
count = count + 1 ; /* number of primes found so far */
}</syntaxhighlight>
* <
;Complete list of token names
Line 237:
| style="vertical-align:top" |
Test Case 1:
<
Hello world
*/
print("Hello, World!\n");</syntaxhighlight>
| style="vertical-align:top" |
Line 255:
| style="vertical-align:top" |
Test Case 2:
<
Show Ident and Integers
*/
phoenix_number = 142857;
print(phoenix_number, "\n");</syntaxhighlight>
| style="vertical-align:top" |
Line 280:
| style="vertical-align:top" |
Test Case 3:
<
All lexical tokens - not syntactically correct, but that will
have to wait until syntax analysis
Line 301:
/* character literal */ '\n'
/* character literal */ '\\'
/* character literal */ ' '</syntaxhighlight>
| style="vertical-align:top" |
Line 344:
| style="vertical-align:top" |
Test Case 4:
<
print(42);
print("\nHello World\nGood Bye\nok\n");
print("Print a slash n - \\n.\n");</syntaxhighlight>
| style="vertical-align:top" |
Line 388:
=={{header|Ada}}==
<
Ada.Exceptions;
use Ada.Strings, Ada.Strings.Unbounded, Ada.Streams, Ada.Exceptions;
Line 648:
when error : others => IO.Put_Line("Error: " & Exception_Message(error));
end Main;
</syntaxhighlight>
{{out}} Test case 3:
<pre>
Line 691:
In addition, it emits a diagnostic when an integer literal is too large.
<
# implement C-like getchar, where EOF and EOLn are "characters" (-1 and 10 resp.). #
INT eof = -1, eoln = 10;
Line 828:
OD;
output("End_Of_Input")
END</syntaxhighlight>
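The overflow diagnostic mentioned for this entry can be illustrated in a language-neutral way. The following is a minimal Python sketch, not a translation of the ALGOL 68 code: the name scan_integer and the 32-bit limit MAX_INT are assumptions chosen for illustration, since the task itself does not fix a limit.

```python
# Hypothetical limit (an assumption): largest signed 32-bit integer.
MAX_INT = 2**31 - 1

DIGITS = "0123456789"


def scan_integer(text, pos, line, col):
    """Scan a run of decimal digits starting at text[pos].

    Returns (value, next_pos). Raises ValueError carrying the source
    position if there is no digit or the literal exceeds MAX_INT.
    """
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        raise ValueError(f"({line},{col}) expected a digit")
    value = int(text[pos:end])
    if value > MAX_INT:
        # The diagnostic: report where the oversized literal starts.
        raise ValueError(f"({line},{col}) integer literal too big: {text[pos:end]}")
    return value, end
```

A lexer would call this when it sees a digit and either emit an Integer token or print the error message and stop.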
=={{header|ALGOL W}}==
<
%lexical analyser %
% Algol W strings are limited to 256 characters in length so we limit source lines %
Line 1,124:
while nextToken not = tEnd_of_input do writeToken;
writeToken
end.</syntaxhighlight>
{{out}} Test case 3:
<pre>
Line 1,169:
(One point of note: the C "EOF" pseudo-character is detected in the following code by looking for a negative number. That EOF has to be negative and the other characters non-negative is implied by the ISO C standard.)
<
(* Usage: lex [INPUTFILE [OUTPUTFILE]]
If INPUTFILE or OUTPUTFILE is "-" or missing, then standard input
Line 2,041:
end
(********************************************************************)</syntaxhighlight>
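The convention described in the note above (end of input is the only negative "character") can be sketched briefly in Python. This is an illustration only, with hypothetical names make_getchar and count_chars; it is not part of the ATS solution.

```python
import io

EOF = -1  # same convention as C's getchar: a negative value means end of input


def make_getchar(stream):
    """Return a getchar-like function over a text stream.

    Each call returns the code point of the next character (always >= 0),
    or EOF once the input is exhausted.
    """
    def getchar():
        ch = stream.read(1)
        return ord(ch) if ch else EOF
    return getchar


def count_chars(stream):
    """Count characters, detecting the end of input by sign alone."""
    getchar = make_getchar(stream)
    n = 0
    while getchar() >= 0:  # the EOF test is simply "is it negative?"
        n += 1
    return n
```

Because real characters are never negative, the scanner needs no separate end-of-file flag.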
{{out}}
Line 2,082:
=={{header|AWK}}==
Tested with gawk 4.1.1 and mawk 1.3.4.
<syntaxhighlight lang="awk">
BEGIN {
all_syms["tk_EOI" ] = "End_of_input"
Line 2,288:
}
}
</syntaxhighlight>
{{out|case=count}}
<b>
Line 2,325:
=={{header|C}}==
Tested with gcc 4.81 and later; compiles warning-free with -Wpedantic -pedantic -Wall -Wextra.
<
#include <stdio.h>
#include <stdarg.h>
Line 2,557:
run();
return 0;
}</syntaxhighlight>
{{out|case=test case 3}}
Line 2,601:
=={{header|C sharp|C#}}==
Requires C# 6.0 because of the use of null-coalescing operators.
<
using System;
using System.IO;
Line 2,951:
}
}
</syntaxhighlight>
{{out|case=test case 3}}
Line 2,995:
=={{header|C++}}==
Tested with GCC 9.3.0 (g++ -std=c++17)
<
#include <fstream> // file_to_string, string_to_file
#include <functional> // std::invoke
Line 3,380:
});
}
</syntaxhighlight>
{{out|case=test case 3}}
Line 3,425:
Using GnuCOBOL 2. By Steve Williams (with one change to get around a Rosetta Code code highlighter problem).
<
*> this code is dedicated to the public domain
*> (GnuCOBOL) 2.3-dev.0
Line 3,831:
end-if
.
end program lexer.</syntaxhighlight>
{{out|case=test case 3}}
Line 3,873:
Lisp has a built-in reader, which can be customized by modifying its readtable. This solution also uses Gray streams, a widely supported though not formally standard extension of Common Lisp, to count lines and columns.
<
(:use #:cl #:sb-gray)
(:export #:main))
Line 4,086:
(defun main ()
(lex *standard-input*))</syntaxhighlight>
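The idea of wrapping the input so that lines and columns are counted as a side effect of reading can be sketched outside Lisp as well. The Python class below is a hypothetical illustration (PositionReader is not a name from the solution above); it plays the role that the Gray stream plays there.

```python
class PositionReader:
    """Reads characters from a string while tracking the source position.

    After each read, (line, col) is the 1-based position of the NEXT
    character to be read.
    """

    def __init__(self, text):
        self.text = text
        self.index = 0
        self.line = 1
        self.col = 1

    def read_char(self):
        """Return the next character, or '' at end of input."""
        if self.index >= len(self.text):
            return ""
        ch = self.text[self.index]
        self.index += 1
        if ch == "\n":
            # A newline moves to the first column of the following line.
            self.line += 1
            self.col = 1
        else:
            self.col += 1
        return ch
```

A lexer built on such a reader records (line, col) just before it reads the first character of each token.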
{{out|case=test case 3}}
<pre> 5 16 KEYWORD-PRINT
Line 4,127:
{{trans|ATS}}
<
# -*- elixir -*-
Line 4,595:
end ## module Lex
Lex.main(System.argv)</syntaxhighlight>
{{out}}
Line 4,641:
<
;;
;; The Rosetta Code lexical analyzer in GNU Emacs Lisp.
Line 5,059:
(scan-text t))
(main)</syntaxhighlight>
Line 5,105:
<
%%%-------------------------------------------------------------------
Line 5,610:
%%% erlang-indent-level: 3
%%% end:
%%%-------------------------------------------------------------------</syntaxhighlight>
Line 5,652:
=={{header|Euphoria}}==
Tested with Euphoria 4.05.
<
include std/map.e
include std/types.e
Line 5,877:
end procedure
main(command_line())</syntaxhighlight>
{{out|case=test case 3}}
Line 5,921:
=={{header|Flex}}==
Tested with Flex 2.5.4.
<syntaxhighlight lang="c">%{
#include <stdio.h>
#include <stdlib.h>
Line 6,094:
} while (tok != tk_EOI);
return 0;
}</syntaxhighlight>
{{out|case=test case 3}}
Line 6,138:
=={{header|Forth}}==
Tested with Gforth 0.7.3.
<
CREATE COLUMN# 0 ,
CREATE LINE# 1 ,
Line 6,260:
THEN THEN ;
: TOKENIZE BEGIN CONSUME AGAIN ;
TOKENIZE</syntaxhighlight>
{{out}}
Line 6,274:
The author has placed this Fortran code in the public domain.
<syntaxhighlight lang="fortran">!!!
!!! An implementation of the Rosetta Code lexical analyzer task:
!!! https://rosettacode.org/wiki/Compiler/lexical_analyzer
Line 7,352:
end subroutine print_usage
end program lex</syntaxhighlight>
{{out}}
Line 7,393:
=={{header|FreeBASIC}}==
Tested with FreeBASIC 1.05
<
tk_EOI
tk_Mul
Line 7,679:
print : print "Hit any key to end program"
sleep
system</syntaxhighlight>
{{out|case=test case 3}}
<b>
Line 7,720:
=={{header|Go}}==
{{trans|FreeBASIC}}
<
import (
Line 8,097:
initLex()
process()
}</syntaxhighlight>
{{out}}
Line 8,140:
=={{header|Haskell}}==
Tested with GHC 8.0.2
<
import Control.Monad.State.Lazy
import Control.Monad.Trans.Maybe (MaybeT, runMaybeT)
Line 8,444:
where (Just t, s') = runState (runMaybeT lexer) s
(txt, _, _) = s'
</syntaxhighlight>
{{out|case=test case 3}}
Line 8,496:
Global variables are avoided except for some constants that require initialization.
<syntaxhighlight lang="icon">#
# The Rosetta Code lexical analyzer in Icon with co-expressions. Based
# upon the ATS implementation.
Line 8,994:
procedure max(x, y)
return (if x < y then y else x)
end</syntaxhighlight>
Line 9,043:
Implementation:
<
ch=: {{1 0+x[symbols=: x (a.i.y)} symbols}}
'T0 token' =: 0 ch '%+-!(){};,<>=!|&'
Line 9,163:
keep=. (tokens~:<,'''')*-.comments+.whitespace+.unknown*a:=values
keep&#each ((1+lines),.columns);<names,.values
}}</syntaxhighlight>
Test case 3:
<syntaxhighlight lang="j">
flex=: {{
'A B'=.y
Line 9,233:
21 28 Integer 92
22 27 Integer 32
23 1 End_of_input </syntaxhighlight>
Here, it seems expedient to retain a structured representation of the lexical result. As shown, it's straightforward to produce a "pure" textual result for a hypothetical alternative implementation of the syntax analyzer, but the structured representation will be easier to deal with.
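The trade-off described here, keeping tokens as structured values and producing text only on demand, can be sketched in Python. The names Token and to_text are hypothetical and the column widths are arbitrary; this is an illustration, not a rendering of the J code.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """Structured form of one lexical token."""
    line: int
    col: int
    name: str
    value: Optional[object] = None  # only Integer, Identifier and String carry a value


def to_text(tokens):
    """Render tokens in the line/column/name/value text layout."""
    rows = []
    for t in tokens:
        row = f"{t.line:5d} {t.col:6d} {t.name:<15}"
        if t.value is not None:
            row += f" {t.value}"
        rows.append(row.rstrip())
    return "\n".join(rows)
```

A later stage written in the same language can consume the Token values directly, while to_text serves stages that read the textual format.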
=={{header|Java}}==
<
// Translated from python source
Line 9,479:
}
}
</syntaxhighlight>
=={{header|JavaScript}}==
{{incorrect|Javascript|Please show output. Code is identical to [[Compiler/syntax_analyzer]] task}}
<
/*
Token: type, value, line, pos
Line 9,696:
l.printTokens()
})
</syntaxhighlight>
=={{header|Julia}}==
<
startline::Int
startcol::Int
Line 9,854:
println(lpad(tok.startline, 3), lpad(tok.startcol, 5), lpad(tok.name, 18), " ", tok.value != nothing ? tok.value : "")
end
</syntaxhighlight>
Line Col Name Value
5 16 Keyword_print
Line 9,893:
=={{header|Kotlin}}==
{{trans|Java}}
<
// three character console input of digits followed by a new line will be
// checked for an integer between zero and twenty-five to select a fixed test
Line 10,566:
System.exit(1)
} // try
} // main</syntaxhighlight>
{{out|case=test case 3: All Symbols}}
<b>
Line 10,614:
The first module is simply a table defining the names of tokens which don't have an associated value.
<
local token_name = {
['*'] = 'Op_multiply',
Line 10,643:
['putc'] = 'Keyword_putc',
}
return token_name</syntaxhighlight>
This module exports a function <i>find_token</i>, which attempts to find the next valid token from a specified position in a source line.
<
local M = {} -- only items added to M will be public (via 'return M' at end)
local table, concat = table, table.concat
Line 10,729:
end
return M</syntaxhighlight>
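For readers unfamiliar with LPeg, the contract of such a token finder (given a source line and a position, return the next token and where it ends) can be sketched with Python's standard re module. This is a hypothetical illustration covering only a handful of token kinds; it does not reproduce the Lua module's interface.

```python
import re

# A deliberately tiny subset of the task's tokens, for illustration only.
# Longer operators are listed before their prefixes so that "<=" wins over "<".
_TOKEN_RE = re.compile(r"""
    (?P<Integer>[0-9]+)
  | (?P<Identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<Op_lessequal><=)
  | (?P<Op_less><)
  | (?P<Op_assign>=)
  | (?P<Semicolon>;)
""", re.VERBOSE)


def find_token(line, pos):
    """Find the next token in `line` at or after index `pos` (0-based).

    Leading whitespace is skipped. Returns (name, text, start, end),
    or None when only whitespace remains. Raises ValueError if the
    next character does not start any known token.
    """
    while pos < len(line) and line[pos].isspace():
        pos += 1
    if pos >= len(line):
        return None
    m = _TOKEN_RE.match(line, pos)
    if m is None:
        raise ValueError(f"unrecognized character at column {pos + 1}: {line[pos]!r}")
    return m.lastgroup, m.group(), m.start(), m.end()
```

Calling find_token repeatedly, each time passing the previous end index as the new position, walks through all tokens on the line.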
The <i>lexer</i> module uses <i>finder.find_token</i> to produce an iterator over the tokens in a source.
<
local M = {} -- only items added to M will be publicly available (via 'return M' at end)
local string, io, coroutine, yield = string, io, coroutine, coroutine.yield
Line 10,811:
-- M._INTERNALS = _ENV
return M
</syntaxhighlight>
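The same "iterator over tokens" design can be sketched in Python, where a generator takes the place of the Lua coroutine. The function name tokenize_text is borrowed from the text for readability, but the code below is an independent, simplified illustration: it lumps all punctuation into one hypothetical Symbol kind and assumes the source ends with a newline when placing End_of_input.

```python
import re

# Simplified token pattern (an assumption for this sketch).
_TOKEN_RE = re.compile(
    r"(?P<Integer>[0-9]+)"
    r"|(?P<Identifier>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<Symbol>[-+*/=;(),{}<>])"
)


def tokenize_text(text):
    """Lazily yield (line, column, name, lexeme) tuples, 1-based positions.

    The final item is always an End_of_input token.
    """
    line_no = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        pos = 0
        while True:
            # Skip whitespace between tokens.
            while pos < len(line) and line[pos].isspace():
                pos += 1
            if pos >= len(line):
                break
            m = _TOKEN_RE.match(line, pos)
            if m is None:
                raise ValueError(f"({line_no},{pos + 1}) unrecognized character")
            yield line_no, pos + 1, m.lastgroup, m.group()
            pos = m.end()
    yield line_no + 1, 1, "End_of_input", ""
```

Because the tokens are produced on demand, a consumer such as a parser can stop early without the whole source having been scanned.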
This script uses <i>lexer.tokenize_text</i> to show the token sequence produced from a source text.
<
format, gsub = string.format, string.gsub
Line 10,853:
-- etc.
end
</syntaxhighlight>
===Using only standard libraries===
This version replaces the <i>lpeg_token_finder</i> module of the LPeg version with this <i>basic_token_finder</i> module, altering the <i>require</i> expression near the top of the <i>lexer</i> module accordingly. Tested with Lua 5.3.5. (Note that <i>select</i> is a standard function as of Lua 5.2.)
<
local M = {} -- only items added to M will be public (via 'return M' at end)
local table, string = table, string
Line 10,988:
-- M._ENV = _ENV
return M</syntaxhighlight>
=={{header|M2000 Interpreter}}==
<syntaxhighlight lang="m2000 interpreter">
Module lexical_analyzer {
a$={/*
Line 11,247:
}
lexical_analyzer
</syntaxhighlight>
{{out}}
Line 11,292:
<
%
% Compile with maybe something like:
Line 12,022:
:- func eof = int is det.
eof = -1.</syntaxhighlight>
{{out}}
Line 12,071:
Tested with Nim v0.19.4. Both examples are tested against all programs in [[Compiler/Sample programs]].
===Using string with regular expressions===
<
import re, strformat, strutils
Line 12,263:
echo input.tokenize.output
</syntaxhighlight>
===Using stream with lexer library===
<
import lexbase, streams
from strutils import Whitespace
Line 12,576:
echo &"({l.lineNumber},{l.getColNumber l.bufpos + 1}) {l.error}"
main()
</syntaxhighlight>
===Using nothing but system and strutils===
<
type
Line 12,799:
stdout.write('\n')
if token.kind == tokEnd:
break</syntaxhighlight>
=={{header|ObjectIcon}}==
Line 12,809:
<
#
# The Rosetta Code lexical analyzer in Object Icon. Based upon the ATS
Line 13,306:
write!([FileStream.stderr] ||| args)
exit(1)
end</syntaxhighlight>
Line 13,354:
(Much of the extra complication in the ATS comes from arrays being a linear type (whose "views" need tending), and from values of linear type having to be local to any function using them. This limitation could have been worked around, and arrays more similar to OCaml arrays could have been used, but at a cost in safety and efficiency.)
<
(* The Rosetta Code lexical analyzer, in OCaml. Based on the ATS. *)
Line 13,881:
main ()
(*------------------------------------------------------------------*)</syntaxhighlight>
{{out}}
Line 13,924:
Note: for simplicity, we do not print the line and column position of each token.
<
(import (owl parse))
Line 14,048:
(if (null? (cdr stream))
(print 'End_of_input))))
</syntaxhighlight>
==== Testing ====
Testing function:
<
(define (translate source)
(let ((stream (try-parse token-parser (str-iter source) #t)))
Line 14,059:
(if (null? (force (cdr stream)))
(print 'End_of_input))))
</syntaxhighlight>
====== Testcase 1 ======
<
(translate "
/*
Line 14,069:
*/
print(\"Hello, World!\\\\n\");
")</syntaxhighlight>
{{Out}}
<pre>
Line 14,082:
====== Testcase 2 ======
<
(translate "
/*
Line 14,089:
phoenix_number = 142857;
print(phoenix_number, \"\\\\n\");
")</syntaxhighlight>
{{Out}}
<pre>
Line 14,108:
====== Testcase 3 ======
<
(translate "
/*
Line 14,132:
/* character literal */ '\\\\'
/* character literal */ ' '
")</syntaxhighlight>
{{Out}}
<pre>
Line 14,173:
====== Testcase 4 ======
<
(translate "
/*** test printing, embedded \\\\n and comments with lots of '*' ***/
Line 14,180:
print(\"Print a slash n - \\\\\\\\n.\\\\n\");
")
</syntaxhighlight>
{{Out}}
<pre>
Line 14,203:
=={{header|Perl}}==
<
use strict;
Line 14,342:
($line, $col)
}
}</syntaxhighlight>
{{out|case=test case 3}}
Line 14,385:
===Alternate Perl Solution===
Tested on perl v5.26.1
<
use strict; # lex.pl - source to tokens
Line 14,421:
1 + $` =~ tr/\n//, 1 + length $` =~ s/.*\n//sr, $^R;
}
printf "%5d %7d %s\n", 1 + tr/\n//, 1, 'End_of_input';</syntaxhighlight>
=={{header|Phix}}==
Line 14,428:
form. If required, demo\rosetta\Compiler\extra.e (below) contains some code that achieves the latter.
Code to print the human readable forms is likewise kept separate from any re-usable parts.
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\core.e
Line 14,588:
<span style="color: #008080;">return</span> <span style="color: #000000;">s</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
For running under pwa/p2js, we also have a "fake file/io" component:
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\js_io.e
Line 14,692:
<span style="color: #008080;">return</span> <span style="color: #000000;">EOF</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
The main lexer is also written to be reusable by later stages.
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\lex.e
Line 14,881:
<span style="color: #008080;">return</span> <span style="color: #000000;">toks</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
Optional: use this if you need human-readable output/input at each (later) stage, so that the stages can be connected with pipes.
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\extra.e
Line 14,936:
<span style="color: #008080;">return</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">n_type</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">left</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">right</span><span style="color: #0000FF;">}</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
Finally, a simple test driver for the specific task:
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\lex.exw
Line 14,966:
<span style="color: #000080;font-style:italic;">--main(command_line())</span>
<span style="color: #000000;">main</span><span style="color: #0000FF;">({</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"test4.c"</span><span style="color: #0000FF;">})</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 14,989:
=={{header|Prolog}}==
<
Test harness for the analyzer, not needed if we are actually using the output.
*/
Line 15,149:
% anything else is an error
tok(_,_,L,P) --> { format(atom(Error), 'Invalid token at line ~d,~d', [L,P]), throw(Error) }.</syntaxhighlight>
{{out}}
<pre>
Line 15,190:
=={{header|Python}}==
Tested with Python 2.7 and 3.x
<
import sys
Line 15,371:
if tok == tk_EOI:
break</syntaxhighlight>
{{out|case=test case 3}}
Line 15,415:
=={{header|QB64}}==
Tested with QB64 1.5
<
dim shared line_n as integer, col_n as integer, text_p as integer, err_line as integer, err_col as integer, errors as integer
Line 15,655:
end
end sub
</syntaxhighlight>
{{out|case=test case 3}}
<b>
Line 15,695:
=={{header|Racket}}==
<
#lang racket
(require parser-tools/lex)
Line 15,851:
"TEST 5"
(display-tokens (string->tokens test5))
</syntaxhighlight>
=={{header|Raku}}==
Line 15,861:
{{works with|Rakudo|2016.08}}
<syntaxhighlight lang="raku"
rule TOP { ^ <.whitespace>? <tokens> + % <.whitespace> <.whitespace> <eoi> }
Line 15,954:
my $tokenizer = tiny_C.parse(@*ARGS[0].IO.slurp);
parse_it( $tokenizer );</syntaxhighlight>
{{out|case=test case 3}}
Line 16,000:
<
#
# The Rosetta Code scanner in Ratfor 77.
Line 17,230:
end
######################################################################</syntaxhighlight>
Line 17,336:
The following code implements a lexical analyzer that is configured by a symbol map and a keyword map passed as parameters.
<
package xyz.hyperreal.rosettacodeCompiler
Line 17,597:
}
</syntaxhighlight>
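The table-driven design described for this entry can be shown in miniature. The Python sketch below is a hypothetical illustration (make_lexer and its parameter names are assumptions, and comments, strings and character literals are omitted); it is not a translation of the Scala code.

```python
def make_lexer(symbols, keywords):
    """Build a tokenizer from two tables.

    symbols:  maps operator/punctuation text (non-empty) to a token name
    keywords: maps reserved words to a token name
    """
    # Try longer symbols first so that "<=" is preferred over "<".
    by_length = sorted(symbols, key=len, reverse=True)
    digits = "0123456789"

    def lex(text):
        tokens = []
        pos = 0
        while pos < len(text):
            ch = text[pos]
            if ch.isspace():
                pos += 1
            elif ch.isalpha() or ch == "_":
                end = pos
                while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                    end += 1
                word = text[pos:end]
                # A word is a keyword only if the keyword table says so.
                tokens.append((keywords.get(word, "Identifier"), word))
                pos = end
            elif ch in digits:
                end = pos
                while end < len(text) and text[end] in digits:
                    end += 1
                tokens.append(("Integer", text[pos:end]))
                pos = end
            else:
                for sym in by_length:
                    if text.startswith(sym, pos):
                        tokens.append((symbols[sym], sym))
                        pos += len(sym)
                        break
                else:
                    raise ValueError(f"unrecognized character {ch!r} at offset {pos}")
        tokens.append(("End_of_input", ""))
        return tokens

    return lex
```

Supplying different tables yields a lexer for a different operator and keyword set without changing the scanning code.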
=={{header|Scheme}}==
<
(import (scheme base)
(scheme char)
Line 17,798:
(display-tokens (lexer (cadr (command-line))))
(display "Error: provide program filename\n"))
</syntaxhighlight>
{{out}}
Line 17,816:
<
(* The Rosetta Code lexical analyzer, in Standard ML. Based on the ATS
and the OCaml. The intended compiler is Mlton or Poly/ML; there is
Line 18,622:
(* sml-indent-args: 2 *)
(* end: *)
(*------------------------------------------------------------------*)</syntaxhighlight>
Line 18,676:
{{libheader|Wren-fmt}}
{{libheader|Wren-ioutil}}
<
import "/str" for Char
import "/fmt" for Fmt
Line 19,025:
lineCount = lines.count
initLex.call()
process.call()</syntaxhighlight>
{{out}}
Line 19,067:
=={{header|Zig}}==
<
const std = @import("std");
Line 19,476:
return result.items;
}
</syntaxhighlight>