Talk:Compiler/lexical analyzer

 
Another small clarification. The table of valid tokens refers to a "char literal" but the error examples reference "char constants". Are these the same token? --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 14:44, 14 August 2016 (UTC)
 
==Response==
 
This first task is part of a larger example. In the works, I have syntax analysis,
code generation (for a stack-based virtual machine) and a virtual machine emulator. The
code is already complete in C and Python, but there are no write-ups yet.
 
Many things are missing from this simple compiler, because I deliberately weeded out
features to keep the implementations down to a manageable size: things like
'''else''', '''>=''', '''==''', data declarations, functions, and so on.
 
The goal was to be able to run simple programs like the prime number generator in the
white-space example.
 
'''1) encoding (overall):'''
 
Latin-1
 
'''2) encoding - string and char literals:'''
 
ASCII
 
Thinking about it a bit more: for a hand-written scanner, I am not aware of anything
that prevents string literals and comments from containing UTF-8. Character literals are
the exception, since there the code would have to be UTF-8 aware.
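To illustrate the difference, here is a minimal Python sketch (not taken from the C or Python implementations mentioned above) that applies byte-oriented versions of the string and char literal patterns to UTF-8 input. It assumes the scanner works on raw bytes, treating each byte as one "character".

```python
import re

# Byte-oriented versions of the two patterns, as a scanner working on raw bytes
# would apply them (an assumption for illustration).
CHAR_LITERAL = re.compile(rb"'([^'\n]|\\n|\\\\)'")
STRING_LITERAL = re.compile(rb'"[^"\n]*"')

source_string = '"naïve café"'.encode("utf-8")
source_char = "'é'".encode("utf-8")

# Every byte of a multibyte UTF-8 sequence is >= 0x80, so no '"' or '\n' byte can
# appear in the middle of an encoded character: the string pattern still works.
print(STRING_LITERAL.fullmatch(source_string) is not None)    # True

# 'é' encodes to two bytes (0xC3 0xA9), so the one-byte char literal pattern rejects it.
print(len(source_char), CHAR_LITERAL.fullmatch(source_char))  # 4 None
```

The same reasoning applies to comments: the scanner only has to find the closing delimiter, which is always a single ASCII byte.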
 
'''3) char literal regex:'''
 
The (new) definition I'm using for Flex:
 
<pre>\'([^'\n]|\\n|\\\\)\'</pre>
 
Page has been updated.
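For anyone porting the scanner, the Flex pattern carries over almost unchanged to Python's <code>re</code> module. The helper <code>scan_char_literal</code> below is hypothetical; it assumes the only escapes are <code>\n</code> and <code>\\</code> (the two the pattern accepts) and, as an illustrative choice, returns the character's ordinal value together with the end position.

```python
import re

# Python translation of the Flex rule \'([^'\n]|\\n|\\\\)\'
# Alternatives: any single char except quote or newline, or the escapes \n and \\.
CHAR_LITERAL = re.compile(r"'([^'\n]|\\n|\\\\)'")

# Hypothetical escape table: only \n and \\ are recognized by the pattern.
ESCAPES = {r"\n": "\n", r"\\": "\\"}


def scan_char_literal(text, pos=0):
    """Match a char literal at text[pos:]; return (ordinal value, end position) or None."""
    m = CHAR_LITERAL.match(text, pos)
    if m is None:
        return None
    body = m.group(1)
    ch = ESCAPES.get(body, body)  # decode the escape, or keep the plain character
    return ord(ch), m.end()


print(scan_char_literal("'a'"))     # (97, 3)
print(scan_char_literal(r"'\n'"))   # (10, 4)
print(scan_char_literal(r"'\\'"))   # (92, 4)
print(scan_char_literal("'''"))     # None: embedded single quote not supported
```

Because <code>[^'\n]</code> also admits a backslash, this pattern accepts <code>'\'</code> as a literal backslash, in both Flex and Python.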
 
'''4) char literals: embedded single quote?'''
 
Not supported. It is one of the features I arbitrarily removed.
 
'''5) string literals: regex:'''
 
The (new) definition I'm using for Flex: <pre>\"[^"\n]*\"</pre>
(Thanks for the new definition!)
 
Page has been updated.
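The same translation works for the string literal pattern. Again, <code>scan_string_literal</code> is a hypothetical helper for illustration; it returns the text between the quotes plus the end position, and does not interpret any escapes.

```python
import re

# Python translation of the Flex rule \"[^"\n]*\"
# A string literal may contain anything except a double quote or a newline.
STRING_LITERAL = re.compile(r'"[^"\n]*"')


def scan_string_literal(text, pos=0):
    """Return (contents without quotes, end position) or None if no literal starts at pos."""
    m = STRING_LITERAL.match(text, pos)
    if m is None:
        return None
    return m.group(0)[1:-1], m.end()


print(scan_string_literal('"Hello" x'))   # ('Hello', 7)
print(scan_string_literal('"a\nb"'))      # None: a newline ends the match early
```

Since embedded double quotes are not supported, input such as <code>"a\"b"</code> is scanned as the literal <code>"a\"</code> followed by whatever comes next.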
 
'''6) string literals: embedded double quotes?'''
 
Not supported. It is one of the features I arbitrarily removed.
 
'''7) Whitespace:'''
 
I have updated the description.
 
'''8) Operators: Sub vs. Uminus'''
 
'''Uminus''' cannot be recognized by the scanner. It is recognized by
the syntax analyzer, i.e., the parser. The token type is there since
it will turn up in the parser and the code generator.
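To make the division of labor concrete, here is a minimal recursive-descent sketch in Python, assuming the scanner emits a single hypothetical <code>Sub</code> token for every <code>-</code>. The parser decides: a <code>-</code> found where an operand is expected becomes <code>Uminus</code>, while a <code>-</code> found after a complete operand is binary <code>Sub</code>. Token shapes and tree node names are illustrative only, not those of the reference implementations.

```python
def tokenize(src):
    """Toy scanner: emits ("Sub", "-") for every minus; it cannot tell unary from binary."""
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("Integer", int(src[i:j])))
            i = j
        elif c == "-":
            tokens.append(("Sub", c))
            i += 1
        elif c in "()":
            tokens.append((c, c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    tokens.append(("EOI", None))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        # Binary position: a Sub seen after a complete operand is subtraction.
        node = self.unary()
        while self.peek() == "Sub":
            self.advance()
            node = ("Sub", node, self.unary())
        return node

    def unary(self):
        # Operand position: a Sub seen where an operand is expected is Uminus.
        if self.peek() == "Sub":
            self.advance()
            return ("Uminus", self.unary())
        return self.primary()

    def primary(self):
        kind, value = self.advance()
        if kind == "Integer":
            return value
        if kind == "(":
            node = self.expr()
            if self.advance()[0] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {kind}")


print(Parser(tokenize("3 - -2")).expr())    # ('Sub', 3, ('Uminus', 2))
print(Parser(tokenize("-(4 - 1)")).expr())  # ('Uminus', ('Sub', 4, 1))
```

Both minus signs in <code>3 - -2</code> reach the parser as the same token; only their position in the grammar tells them apart.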
 
'''10) char literal vs char constants'''
 
Yes, "char literal" and "char constant" mean the same thing.
 
Interestingly, when I was researching this, a Google search for the following turned up the passage below:
("string literal") OR ("string constant")
 
https://en.wikipedia.org/wiki/String_literal
<br/>
''A '''string literal''' or anonymous string is the representation of a string value within
the source code ..... Among other things, it must be possible to encode the character that
normally terminates the '''string constant''', plus there must be some way to ...''
 
--[[User:Ed Davis|Ed Davis]] ([[User talk:Ed Davis|talk]]) 11:55, 15 August 2016 (UTC)