Talk:Compiler/lexical analyzer

* '''encoding:''' Should we expect the input files in a specific encoding? Maybe ''latin-1'' or ''utf-8''?

* '''encoding:''' Should string and char literals support Unicode, or just ASCII?
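
A minimal Python sketch of why the encoding question affects char literals. It assumes the lexer reads the source file as raw bytes and decodes them before matching, and it reuses the char-literal regex proposed in the thread. The helper name <code>char_literal_matches</code> is hypothetical. A non-ASCII character such as <code>é</code> is one character when the file is decoded as UTF-8, but two characters when the same bytes are decoded as latin-1.

```python
import re

# Char-literal pattern proposed in this thread: an optional backslash,
# then any single character except a single quote, enclosed in quotes.
CHAR_LIT = re.compile(r"'\\?[^']'")

def char_literal_matches(raw: bytes, encoding: str) -> bool:
    """Decode raw source bytes with the given encoding and report whether
    the whole decoded text is exactly one char literal (hypothetical helper)."""
    text = raw.decode(encoding)
    return CHAR_LIT.fullmatch(text) is not None

# The literal 'é' saved as UTF-8: é takes two bytes (0xC3 0xA9).
raw = "'é'".encode("utf-8")

print(char_literal_matches(raw, "utf-8"))    # é decodes to a single character
print(char_literal_matches(raw, "latin-1"))  # bytes decode to two characters, Ã and ©
```

So the answer to "which encoding?" decides whether such a file contains one valid char literal or an invalid one.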

* '''char literals:''' The stated regex is <code>'x'</code>, but that's not actually a regex. Shouldn't it be <code>'\\?[^']'</code> (a.k.a. "an escape sequence or any character except <code>'</code>, enclosed in single quotes")?
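
A quick Python check of the proposed pattern against a few sample literals, written as they would appear in the source file. This is only a sketch using Python's <code>re</code> module; other regex engines should behave the same for this pattern. One thing it shows: because the backslash is optional, <code>'\'</code> (a lone backslash between quotes) is also accepted, since <code>[^']</code> can match the backslash itself.

```python
import re

# Proposed char-literal pattern: optional backslash, then any one
# character other than a single quote, enclosed in single quotes.
CHAR_LIT = re.compile(r"'\\?[^']'")

# Keys are source text; values are whether the pattern should accept them.
cases = {
    "'a'": True,      # plain character
    "'\\n'": True,    # escape sequence \n
    "'\\\\'": True,   # escape sequence \\
    "''": False,      # empty literal
    "'ab'": False,    # two characters
    "'\\'": True,     # lone backslash: also accepted, the backslash is optional
}

for src, expected in cases.items():
    got = CHAR_LIT.fullmatch(src) is not None
    print(f"{src:6} -> {got}")
    assert got == expected
```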

* '''char literals:''' How can a single quote be represented as a char, if there are no other escape sequences besides <code>\n</code> and <code>\\</code>?
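
To make the question concrete, here is a hypothetical Python decoder for char literals that assumes ''only'' the two escapes named in the task, <code>\n</code> and <code>\\</code>. Under that assumption there is no spelling of a single-quote char: <code>'''</code> is rejected as a bare quote, and <code>'\''</code> is rejected as an unknown escape. The function name <code>decode_char_literal</code> is illustrative, not part of the task.

```python
# Assumption: only the two escapes named in the task exist.
ESCAPES = {"n": "\n", "\\": "\\"}

def decode_char_literal(src: str) -> int:
    """Return the integer value of a char literal given as source text,
    or raise ValueError if it is not a valid literal (hypothetical helper)."""
    if len(src) < 3 or src[0] != "'" or src[-1] != "'":
        raise ValueError(f"not a char literal: {src!r}")
    body = src[1:-1]
    if body.startswith("\\"):
        # Escape sequence: exactly a backslash plus one known escape letter.
        if len(body) != 2 or body[1] not in ESCAPES:
            raise ValueError(f"unknown escape sequence: {src!r}")
        return ord(ESCAPES[body[1]])
    # Plain character: exactly one character, and it may not be a quote.
    if len(body) != 1 or body == "'":
        raise ValueError(f"invalid char literal: {src!r}")
    return ord(body)

for src in ["'a'", "'\\n'", "'\\\\'", "'''", "'\\''"]:
    try:
        print(src, "->", decode_char_literal(src))
    except ValueError as err:
        print(src, "-> error:", err)
```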

* '''string literals:''' The stated regex is <code>".*"</code>, but because the asterisk matches greedily, this would match e.g. <code>"foo bar" < "</code> as a single string. Shouldn't it be <code>"[^"]*"</code> (a.k.a. "match zero or more characters except the double quote, enclosed in double quotes")?
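
A small Python comparison of the two string-literal patterns on one line of input, using the <code>re</code> module (the sample line is made up for illustration). The greedy <code>".*"</code> runs from the first double quote to the last one on the line, swallowing the <code><</code> operator, while <code>"[^"]*"</code> stops at the first closing quote and yields two separate string tokens.

```python
import re

GREEDY = re.compile(r'".*"')      # pattern as currently stated
NEGATED = re.compile(r'"[^"]*"')  # pattern proposed in this thread

line = '"foo bar" < "baz"'

# Greedy: one match spanning both strings and the operator between them.
print(GREEDY.findall(line))
# Negated class: each string literal is matched on its own.
print(NEGATED.findall(line))
```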