Revision as of 14:45, 14 August 2016, by Thundergnat (m: char constant === char literal ?)
Revision as of 15:58, 15 August 2016, by Ed Davis (Response)
Another small clarification. The table of valid tokens refers to a "char literal" but the error examples reference "char constants". Are these the same token? --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 14:44, 14 August 2016 (UTC)

==Response==
This first category is part of a larger example. In the works, I have syntax analysis, code generation (for a stack-based virtual machine) and a virtual machine emulator. The code is already complete in C and Python, but there are no writeups yet.

Lots of things are missing from this simple compiler, such as '''else''', '''>=''', '''==''', data declarations, functions and so on. I weeded out features in order to keep the implementations down to a manageable size. The goal was to be able to run simple programs like the prime number generator in the white-space example.

'''1) encoding (overall):''' latin-1

'''2) encoding - string and char literals:''' ASCII

Thinking about it a bit, for a hand-written scanner, I am not aware of anything that prevents string literals and comments from including UTF-8. Of course this does not include character literals, where the code would have to be UTF-8 aware.

'''3) char literal regex:'''

The (new) definition I'm using for Flex:
<pre>\'([^'\n]|\\n|\\\\)\'</pre>
Page has been updated.

'''4) char literals: embedded single quote?'''

Not supported. It is one of the features I arbitrarily removed.

'''5) string literals: regex:'''

The (new) definition I'm using for Flex:
<pre>\"[^"\n]*\"</pre>
(thanks for the new definition!) Page has been updated.

'''6) string literals: embedded double quotes?'''

Not supported. It is one of the features I arbitrarily removed.

'''7) Whitespace:'''

I have updated the description.

'''8) Operators: Sub vs. Uminus'''

'''Uminus''' cannot be recognized by the scanner. It is recognized by the syntax analyzer, i.e., the parser. The token type is there since it will turn up in the parser and the code generator.
'''10) char literal vs char constants'''

Yes, char literal and char constant represent the same thing. Interestingly, when I was researching this I got the following doing a Google search for: ("string literal") OR ("string constant")

https://en.wikipedia.org/wiki/String_literal <br/>
''A '''string literal''' or anonymous string is the representation of a string value within the source code ..... Among other things, it must be possible to encode the character that normally terminates the '''string constant''', plus there must be some way to ...''

--[[User:Ed Davis|Ed Davis]] ([[User talk:Ed Davis|talk]]) 11:55, 15 August 2016 (UTC)
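
To make points 3) to 6) concrete, here is a minimal sketch that transcribes the two Flex patterns quoted in the response into Python's <code>re</code> syntax and tries them on a few inputs. The transcription is an assumption (Flex and Python regex dialects differ in details such as quoting), and the helper names <code>is_char_literal</code> and <code>is_string_literal</code> are hypothetical, not taken from the task page or from the C and Python implementations mentioned above.

```python
import re

# Assumed Python transcriptions of the Flex patterns quoted in the response.
# Char literal: one character other than ' or newline, or the two-character
# escapes \n or \\, enclosed in single quotes.
CHAR_LITERAL = re.compile(r"'([^'\n]|\\n|\\\\)'")

# String literal: any run of characters other than " or newline,
# enclosed in double quotes.
STRING_LITERAL = re.compile(r'"[^"\n]*"')


def is_char_literal(text):
    """Return True if the whole of text is a valid char literal (hypothetical helper)."""
    return CHAR_LITERAL.fullmatch(text) is not None


def is_string_literal(text):
    """Return True if the whole of text is a valid string literal (hypothetical helper)."""
    return STRING_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["'a'", r"'\n'", r"'\\'", r"'\''", "''"]:
        print(sample, is_char_literal(sample))
    for sample in ['"hello"', '""', r'"say \"hi\""']:
        print(sample, is_string_literal(sample))
```

Under this transcription, an escaped single quote inside a char literal and an escaped double quote inside a string literal are both rejected, which agrees with the "Not supported" answers in 4) and 6).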

Talk:Compiler/lexical analyzer: Difference between revisions
