Talk:Compiler/lexical analyzer: Difference between revisions

m (wording)
m (correction)
Line 5:
* '''encoding:''' Should we expect the input files in a specific encoding? Maybe ''latin-1'' or ''utf-8''?
* '''encoding:''' Should string and char literals support Unicode, or just ASCII?
* '''char literals:''' The stated regex is <code>'x'</code>, but that's not actually a regex. Shouldn't it be <code>'\\?[^']'</code> (a.k.a. "<code>\n</code> or <code>\\</code> or any character except <code>'</code>, enclosed in single quotes")?
* '''char literals:''' The stated regex is <code>'x'</code>, but that's not actually a regex. Shouldn't it be <code>'\\?[^']'</code> (a.k.a. "an escape sequence or any character except <code>'</code>, enclosed in single quotes")?
* '''char literals:''' How can a single quote be represented as a char, if there are no other escape sequences besides <code>\n</code> and <code>\\</code>?
* '''string literals:''' The stated regex is <code>".*"</code>, but this would match e.g. <code>"foo bar" < "</code> due to the asterisk performing greedy matching. Shouldn't it be <code>"[^"]*"</code> (a.k.a. "match zero or more characters except the double quote, enclosed in double quotes")?
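
For what it's worth, the two regex points above can be checked with a quick Python sketch. Python's <code>re</code> module is only a stand-in here for whatever regex flavor the task has in mind, and the names (<code>STRING_STATED</code>, <code>STRING_PROPOSED</code>, <code>CHAR_PROPOSED</code>, <code>first_match</code>) are made up for illustration. The patterns themselves are the ones quoted in the bullets.

```python
import re

# Hypothetical names; the patterns are the ones quoted in the discussion.
STRING_STATED = re.compile(r'".*"')        # greedy: runs to the LAST double quote
STRING_PROPOSED = re.compile(r'"[^"]*"')   # stops at the first closing double quote
CHAR_PROPOSED = re.compile(r"'\\?[^']'")   # optional backslash, then one non-quote char


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


source = '"foo bar" < "'

print(first_match(STRING_STATED, source))    # "foo bar" < "   (the whole input)
print(first_match(STRING_PROPOSED, source))  # "foo bar"

# Char literals: plain characters and the two known escapes are accepted ...
for literal in ("'a'", r"'\n'", r"'\\'"):
    print(literal, bool(CHAR_PROPOSED.fullmatch(literal)))   # True each time

# ... but neither obvious spelling of a single-quote char gets through.
for literal in ("'''", r"'\''"):
    print(literal, bool(CHAR_PROPOSED.fullmatch(literal)))   # False each time
```

Under these assumptions the greedy pattern swallows the <code><</code> and the stray quote, while the character-class version yields just <code>"foo bar"</code>. The last loop also illustrates the open question about how a single quote could be written as a char literal at all: the proposed pattern rejects both <code><nowiki>'''</nowiki></code> and <code>'\''</code>.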