Talk:Word frequency: Difference between revisions
m (changed font in the 1st section name.)
m (→task clarification: corrected a misspelling.)
Revision as of 08:31, 16 August 2017
why entered as a task instead of draft task?
Why was this entry entered as a task instead of a draft task? -- Gerard Schildberger (talk) 03:08, 16 August 2017 (UTC)
task clarification
I assume we are to code programs to handle the general case, not just the file to be used as a test case.
What is a "word"?
What letters can be included in a word?
What other characters can be included in a word?
Are words that are hyphenated one word or two?
What about words like: jack-o'-lantern
What about words split across lines, if there are any (such as possi-
ble)?
Are words that contain an apostrophe to be included (such as let's)?
What about words that contain non-Latin (Roman) letters?
As it happens, those non-Latin letters don't show up in the top ten.
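To make these questions concrete, here is a minimal Python sketch showing how the counts depend on which definition of a "word" is chosen. Both patterns, and the names `LETTERS_ONLY`, `WITH_PUNCT`, `rejoin_split_words`, and `word_counts`, are illustrative assumptions; the task itself does not prescribe any of them.

```python
import re
from collections import Counter

# Assumption 1: a word is a run of unaccented Latin letters only.
LETTERS_ONLY = re.compile(r"[a-z]+")

# Assumption 2: a word is a run of any Unicode letters, optionally joined
# by a single hyphen, a single apostrophe, or the pair '- (as in jack-o'-lantern).
WITH_PUNCT = re.compile(r"[^\W\d_]+(?:(?:-|'|'-)[^\W\d_]+)*")

def rejoin_split_words(text):
    """Join a word hyphenated across a line break: 'possi-\\nble' -> 'possible'."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

def word_counts(text, pattern):
    """Count case-folded matches of the chosen word pattern."""
    return Counter(pattern.findall(text.lower()))

sample = "Let's carve a jack-o'-lantern; it's possi-\nble."
print(word_counts(sample, LETTERS_ONLY))
print(word_counts(rejoin_split_words(sample), WITH_PUNCT))
```

Under the first definition, let's splits into "let" and "s" and accented letters are dropped from words. Under the second, hyphenated and apostrophized words stay whole. Note that `rejoin_split_words` cannot tell a line-break hyphen from a genuine hyphen that happens to fall at the end of a line, which is part of why the task needs to state its rules.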
Exactly which part of the text on the web page (start and stop points) is to be used?
Should we also include the Project Gutenberg prologue and epilogue along with the book's text?
Wouldn't it be a lot simpler to have a simple (and complete) text file to download [with no (de-)assembly or editing required]?
-- Gerard Schildberger (talk) 03:08, 16 August 2017 (UTC)
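If the task decides to exclude the Project Gutenberg prologue and epilogue, one possible approach is sketched below in Python. It assumes the file marks the book's text with lines beginning `*** START OF` and `*** END OF`; those marker strings and the function name `strip_gutenberg_wrapper` are assumptions for illustration and should be checked against the actual file.

```python
def strip_gutenberg_wrapper(text, start_marker="*** START OF", end_marker="*** END OF"):
    """Return only the lines between the start and end marker lines.

    The marker strings are assumed, not confirmed by the task description.
    If a marker is missing, that side of the text is left untrimmed.
    """
    lines = text.splitlines()
    # First line after the start marker, or the top of the file if absent.
    start = next((i + 1 for i, line in enumerate(lines)
                  if line.startswith(start_marker)), 0)
    # The end marker line itself, or the end of the file if absent.
    end = next((i for i, line in enumerate(lines)
                if line.startswith(end_marker)), len(lines))
    return "\n".join(lines[start:end])
```

Whether the trimmed or untrimmed text is counted can change the totals for common words, so the task description would need to say which one is intended.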