Ed Davis

{{task}}Description of the task
Lexical Analyzer
----------------
 
From [https://en.wikipedia.org/wiki/Lexical_analysis Wikipedia]
 
Lexical analysis is the process of converting a sequence of characters (such as in a
computer program or web page) into a sequence of tokens (strings with an identified
"meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,
or scanner (though "scanner" is also used to refer to the first stage of a lexer).
 
==The Task==
 
Create a lexical analyzer for the Tiny programming language. The
program should read input from a file and/or stdin, and write
output to a file and/or stdout.
 
==Specification==
 
===Operators===
 
{| class="wikitable"
|-
! Characters !! Common name !! Name
|-
| '*' || multiply || Mul
|-
| '/' || divide || Div
|-
| '+' || plus || Add
|-
| '-' || minus and unary minus || Sub and Uminus
|-
| '<' || less than || Lss
|-
| '<=' || less than or equal || Leq
|-
| '>' || greater than || Gtr
|-
| '!=' || not equal || Neq
|-
| '=' || assign || Assign
|-
| '&&' || and || And
|}
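
The operator table can be scanned with a longest-match rule, so that '<=' is not split into '<' and '='. The following is a minimal Python sketch, not the reference implementation; the names OPERATORS and match_operator are hypothetical. It always reports Sub for '-', on the assumption that telling Sub from Uminus is left to a later stage that knows the context.

```python
# Hypothetical table built from the operator list above.
OPERATORS = {
    "<=": "Leq", "!=": "Neq", "&&": "And",
    "*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
    "<": "Lss", ">": "Gtr", "=": "Assign",
}

def match_operator(text, pos):
    """Return (token_name, length) for the operator at text[pos:], or None."""
    # Try two-character operators before one-character ones (longest match).
    for length in (2, 1):
        candidate = text[pos:pos + length]
        if len(candidate) == length and candidate in OPERATORS:
            return OPERATORS[candidate], length
    return None
```

A caller would normally check for the comment opener '/*' before calling match_operator, since '/' on its own is Div.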
 
===Symbols===
 
{| class="wikitable"
|-
! Characters !! Common name !! Name
|-
| '(' || left parenthesis || Lparen
|-
| ')' || right parenthesis || Rparen
|-
| '{' || left brace || Lbrace
|-
| '}' || right brace || Rbrace
|-
| ';' || semicolon || Semi
|-
| ',' || comma || Comma
|}
 
===Keywords===
 
{| class="wikitable"
|-
! Characters !! Name
|-
| "if" || If
|-
| "while" || While
|-
| "print" || Print
|-
| "putc" || Putc
|}
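
Keywords have the same shape as identifiers, so one common approach is to scan a whole word first and then look it up. A small Python sketch under that assumption follows; KEYWORDS and classify_word are hypothetical names.

```python
# Keyword spellings and token names taken from the table above.
KEYWORDS = {"if": "If", "while": "While", "print": "Print", "putc": "Putc"}

def classify_word(word):
    """Return (token_name, value); keywords carry no value, identifiers keep their text."""
    if word in KEYWORDS:
        return KEYWORDS[word], None
    return "Ident", word
```

Because the lookup is done on the complete word, a name such as printer is reported as Ident rather than as Print followed by er.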
 
===Other entities===
 
{| class="wikitable"
|-
! Characters !! Regular expression !! Name
|-
| integers || [0-9]+ || Integer
|-
| char literal || 'x' || Integer
|-
| identifiers || [_a-zA-Z][_a-zA-Z0-9]* || Ident
|-
| string literal || "[^"\n]*" || String
|}
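
As an illustration, integers and identifiers can be matched with Python's re module. This is a sketch only; match_entity is a hypothetical helper. One assumption is made explicit: the identifier pattern ends in '*' so that one-letter names such as x are accepted.

```python
import re

# Patterns adapted from the table above. Assumption of this sketch: the
# identifier pattern ends in '*' so that one-letter names such as "x" match.
INTEGER_RE = re.compile(r"[0-9]+")
IDENT_RE = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

def match_entity(text, pos):
    """Return (token_name, lexeme) for an integer or identifier at pos, or None."""
    m = INTEGER_RE.match(text, pos)
    if m:
        return "Integer", m.group()
    m = IDENT_RE.match(text, pos)
    if m:
        return "Ident", m.group()
    return None
```

The returned lexeme gives the token's length, so the caller can advance its position and column by len(lexeme).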
 
Notes: In char literals, '\n' denotes the newline character and '\\'
denotes a backslash. \n may also be used in strings to print a
newline. No other escape sequences are supported.
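
Since a char literal is reported as an Integer token, its value has to be computed from the quoted text. The Python sketch below shows one way to do that and to reject malformed literals; LexError and char_literal_value are hypothetical names, and the error wording is illustrative.

```python
class LexError(Exception):
    """Hypothetical error type used only in this sketch."""

def char_literal_value(lexeme):
    """Return the integer code of a char literal given with its quotes."""
    body = lexeme[1:-1]              # strip the surrounding single quotes
    if body == "":
        raise LexError("empty character constant")
    if body[0] == "\\":
        if body == "\\n":
            return 10                # newline
        if body == "\\\\":
            return 92                # backslash
        raise LexError("unknown escape sequence " + body)
    if len(body) > 1:
        raise LexError("multi-character constant")
    return ord(body)
```

For example, the literal 'a' yields 97 and the literal '\n' yields 10.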
 
'''Comments''' /* ... */ (multi-line)
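
Comments produce no token, so the lexer only needs to find where one ends. A minimal Python sketch, with skip_comment as a hypothetical helper:

```python
def skip_comment(text, pos):
    """Given pos at the opening '/*', return the index just past the closing '*/'."""
    # Start searching after the opener so that "/*/" is not taken as a full comment.
    end = text.find("*/", pos + 2)
    if end == -1:
        raise ValueError("end-of-file in comment")
    return end + 2
```

The caller still has to count the newlines inside the skipped text, because the line and column of later tokens depend on them.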
 
====Complete list of token names====
 
'''EOI, Print, Putc, If, While, Lbrace, Rbrace, Lparen, Rparen,
Uminus, Mul, Div, Add, Sub, Lss, Gtr, Leq, Neq, And, Semi, Comma,
Assign, Integer, String, Ident'''
 
==Program output==
 
For each token, the program should output the line and column
where the token starts, followed by the token name. For Integer,
Ident and String tokens, the integer value, identifier name, or
string literal should follow.
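
One possible way to print such a row is sketched below in Python. The column widths and the name format_token are assumptions of this sketch; the task text does not fix an exact layout.

```python
def format_token(line, col, name, value=None):
    """Format one output row: line, column, token name, then the optional value."""
    row = f"{line:5d} {col:5d} {name:<10}"
    if value is not None:
        row += f" {value}"
    # Drop the padding after the name when there is no value.
    return row.rstrip()
```

Tokens without a value, such as Lparen or Semi, are printed with the first three fields only.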
 
===Test Cases===
 
<lang c>
/*
Hello world
*/
print("Hello, World!\n");
</lang>
 
===Output===
 
{| class="wikitable"
|-
| line || 4 || col || 1 || Print || &nbsp;
|-
| line || 4 || col || 6 || Lparen || &nbsp;
|-
| line || 4 || col || 7 || String || "Hello, World!\n"
|-
| line || 4 || col || 24 || Rparen || &nbsp;
|-
| line || 4 || col || 25 || Semi || &nbsp;
|-
| line || 5 || col || 1 || EOI || &nbsp;
|}
 
<lang c>
/*
Show Ident and Integers
*/
phoenix_number = 142857;
print(phoenix_number, "\n");
</lang>
 
===Output===
 
{| class="wikitable"
|-
| line || 4 || col || 1 || Ident || phoenix_number
|-
| line || 4 || col || 16 || Assign || &nbsp;
|-
| line || 4 || col || 18 || Integer || 142857
|-
| line || 4 || col || 24 || Semi || &nbsp;
|-
| line || 5 || col || 1 || Print || &nbsp;
|-
| line || 5 || col || 6 || Lparen || &nbsp;
|-
| line || 5 || col || 7 || Ident || phoenix_number
|-
| line || 5 || col || 21 || Comma || &nbsp;
|-
| line || 5 || col || 23 || String || "\n"
|-
| line || 5 || col || 27 || Rparen || &nbsp;
|-
| line || 5 || col || 28 || Semi || &nbsp;
|-
| line || 6 || col || 1 || EOI || &nbsp;
|}
 
==Diagnostics==
The following error conditions should be caught:
 
* Empty character constant. Example: ''
* Unknown escape sequence. Example: '\r'
* Multi-character constant. Example: 'xx'
* End-of-file in comment. Closing comment characters not found.
* End-of-file while scanning string literal. Closing string character not found.
* End-of-line while scanning string literal. Closing string character not found before end-of-line.
* Unrecognized character. Example: |
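
The two string-literal diagnostics above can be sketched as follows in Python. scan_string is a hypothetical helper, and the use of ValueError and the message wording are assumptions of this sketch rather than required output.

```python
def scan_string(text, pos):
    """Given pos at the opening quote, return (lexeme, next_pos)."""
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Closing quote found: the lexeme includes both quotes.
            return text[pos:i + 1], i + 1
        if ch == "\n":
            raise ValueError("end-of-line while scanning string literal")
        i += 1
    raise ValueError("end-of-file while scanning string literal")
```

The two-character sequence \n inside a string is left as written in the lexeme; only a real line break in the source triggers the end-of-line error.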
 
Refer additional questions to the C and Python implementations.