Talk:Compiler/lexical analyzer: Difference between revisions
m (wording) |
m (correction) |
Line 5:
* '''encoding:''' Should we expect the input files in a specific encoding? Maybe ''latin-1'' or ''utf-8''?
* '''encoding:''' Should string and char literals support Unicode, or just ASCII?
* '''char literals:''' The stated regex is <code>'x'</code>, but that's not actually a regex. Shouldn't it be <code>'\\?[^']'</code> (i.e. "an escape sequence or any character except <code>'</code>, enclosed in single quotes")?
* '''char literals:''' How can a single quote be represented as a char, if there are no other escape sequences besides <code>\n</code> and <code>\\</code>?
* '''string literals:''' The stated regex is <code>".*"</code>, but this would match e.g. <code>"foo bar" < "</code> in full, because the asterisk matches greedily. Shouldn't it be <code>"[^"]*"</code> (i.e. "match zero or more characters except the double quote, enclosed in double quotes")?
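
To make the regex questions above concrete, here is a small Python sketch that runs the patterns quoted in this discussion against a few inputs. The helper names (<code>first_match</code>, <code>is_char_literal</code>) are made up for illustration and are not part of the task description; Python's <code>re</code> module is only assumed to behave like the regex dialect the task has in mind.

```python
import re

# Patterns copied from the discussion above.
GREEDY_STRING = re.compile(r'".*"')       # regex as stated in the task
NEGATED_STRING = re.compile(r'"[^"]*"')   # proposed replacement
CHAR_LITERAL = re.compile(r"'\\?[^']'")   # proposed char literal regex


def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


def is_char_literal(text):
    """True if the whole of text matches the proposed char literal regex."""
    return CHAR_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    source = '"foo bar" < "'
    # Greedy: runs from the first double quote to the last one on the line.
    print(first_match(GREEDY_STRING, source))
    # Negated class: stops at the first closing double quote.
    print(first_match(NEGATED_STRING, source))
    # Char literal candidates: plain char, \n, \\, and an escaped quote.
    for candidate in ["'a'", r"'\n'", r"'\\'", r"'\''"]:
        print(candidate, is_char_literal(candidate))
```

With these inputs the greedy pattern consumes the whole of <code>"foo bar" < "</code>, while the negated class stops after <code>"foo bar"</code>. The proposed char literal regex accepts <code>'a'</code>, <code>'\n'</code> and <code>'\\'</code> but rejects <code>'\''</code>, so the question of how to write a single quote as a char remains open under that proposal too.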