Talk:Word frequency: Difference between revisions

From Rosetta Code
m (replaced clear text with a Rosetta Code user's link (to his user page).)
(→‎task clarification: changed some wording, added more questions.)

Revision as of 20:04, 16 August 2017

why entered as a task instead of draft task?

Why was this entry entered as a   task   instead of a   draft task?   -- Gerard Schildberger (talk) 03:08, 16 August 2017 (UTC)

... ahhh ...   I see that this task was demoted to a draft task by   Paddy3118.   -- Gerard Schildberger (talk) 08:34, 16 August 2017 (UTC)

task clarification

I assume we are to code programs to handle the general case, not just the file specified/mandated to be used as a test case.

What is a "word"?

What letters can be included in a word?
There are a lot of French accented letters in the prescribed text, but are we to be limited to   just   the French accented letters?
German?     Czech?     Which dialects of Greek?     Logographic kanji?     Kana?

What other characters can be included in a word?

Are words that are hyphenated one word or two?

What about words like:     jack-o'-lantern

What about words split across lines   (if any are possi-
bly present)?
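
One possible reading of the split-word question, sketched in Python. The function name `rejoin_split_words` is hypothetical, and the rule it applies (a letter, a hyphen, a line break, then another letter means "same word") is only an assumption, not something the task states:

```python
import re

def rejoin_split_words(text):
    """Glue together a word that was hyphenated across a line break.

    Assumption: a hyphen directly before a line break, with letters on
    both sides, marks a split word rather than a real hyphenated compound.
    """
    # \1 is the last character before the hyphen, \2 the first one on the next line.
    return re.sub(r"(\w)-\r?\n[ \t]*(\w)", r"\1\2", text)

print(rejoin_split_words("if there are possi-\nbly present"))
```

The catch is that a genuine compound that happens to break at its hyphen (say `well-` at the end of one line and `known` at the start of the next) would lose its hyphen under this rule, so the task would need to say which behaviour is wanted.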

Are words that contain an apostrophe to be included   (such as let's)?

What about words that contain non-Latin (Roman) letters?
As it happens, those non-Latin letters don't show up in the   top ten.
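
To show why the answers matter, here is a rough Python sketch that counts the same sentence under three candidate definitions of a "word". The three patterns and the names `WORD_PATTERNS` and `word_counts` are made up for illustration; none of them is what the task mandates:

```python
import re
from collections import Counter

# Three hypothetical definitions of a "word" (assumptions, not the task's rules).
LETTERS = r"[^\W\d_]+"  # one or more Unicode letters (no digits, no underscore)
WORD_PATTERNS = {
    "ascii letters only": r"[a-z]+",
    "any unicode letters": LETTERS,
    # Letters joined by a single apostrophe, a single hyphen, or the pair '-
    "letters with inner ' and -": LETTERS + r"(?:(?:'-|['-])" + LETTERS + r")*",
}

def word_counts(text, pattern):
    """Count lower-cased matches of pattern in text."""
    return Counter(re.findall(pattern, text.lower()))

sample = "Let's see: the jack-o'-lantern café, the café."
for name, pattern in WORD_PATTERNS.items():
    print(name, dict(word_counts(sample, pattern)))
```

With the first pattern the accented word is truncated to `caf`, with the second `let's` falls apart into `let` and `s`, and only the third keeps `jack-o'-lantern` in one piece. So the frequency table can change depending on which definition is picked, even if the top ten of the prescribed text happens not to.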

Exactly which text   (start and stop points)   on the web page is to be used?

Should we also include the   Project Gutenberg   prologue and epilogue along with the book's text?
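
If the answer is "book text only", something like the following Python sketch could trim the wrapper. It assumes the file carries marker lines beginning with `*** START OF` and `*** END OF`, which may not hold for every file; the name `book_body` is hypothetical:

```python
import re

# Assumed marker lines; the exact wording may differ from file to file.
START_MARK = re.compile(r"^\*\*\* ?START OF .*$", re.MULTILINE)
END_MARK = re.compile(r"^\*\*\* ?END OF .*$", re.MULTILINE)

def book_body(text):
    """Return the text between the assumed start and end marker lines.

    A missing marker is treated as "no prologue" or "no epilogue".
    """
    start = START_MARK.search(text)
    end = END_MARK.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

demo = (
    "License notes\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter one.\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More license notes\n"
)
print(book_body(demo))
```

Whether the prologue and epilogue are counted changes the totals for common words, so entries that disagree on this point would not produce comparable output.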

Wouldn't it be a lot simpler to have a simple (and complete) text file to download   [with no (de-)assembly, editing, or text massaging required]?

-- Gerard Schildberger (talk) 03:08, 16 August 2017 (UTC)