Talk:Word frequency: Difference between revisions

From Rosetta Code
Revision as of 12:51, 17 August 2017

why entered as a task instead of draft task?

Why was this entry entered as a   task   instead of a   draft task?   -- Gerard Schildberger (talk) 03:08, 16 August 2017 (UTC)

... ahhh ...   I see that this task was demoted to a draft task by   Paddy3118.   -- Gerard Schildberger (talk) 08:34, 16 August 2017 (UTC)

task clarification

I assume we are to code programs to handle the general case, not just the file specified/mandated to be used as a test case.

What is a "word"?

A single distinct meaningful element of speech. I speak words. How speech is written depends very much on the language, the era, and the individual. Don't mention Donaudampfschifffahrtselektrizitätenhauptbetriebs. For the purpose of this task I would suggest using the concept of 'orthographic word', which works well for English but not for Ancient Greek or Egyptian. --Nigel Galloway (talk) 12:46, 17 August 2017 (UTC)

Is 1997 a word?   How about 20?   How about twenty?

What letters can be included in a word?
There are a lot of French accented letters in the prescribed text, but are we to be limited to   just   the French accented letters?
German?     Czech?     Which dialects of Greek?     Logographic kanji?     Kana?

What other characters can be included in a word?

Are words that are hyphenated one word or two?

What about words like:     jack-o'-lantern

1 orthographic word--Nigel Galloway (talk) 12:51, 17 August 2017 (UTC)
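
For illustration only, here is a minimal Python sketch of one way the "orthographic word" reading could be coded. The pattern is an assumption, not something the task specifies: a word is taken to be a run of letters (accented and non-Latin letters included), optionally joined by a single internal hyphen, an apostrophe, or an apostrophe followed by a hyphen. Digits are left out, so 1997 and 20 would not count as words under this choice. The names WORD_RE, orthographic_words and top_words are hypothetical.

```python
import re
from collections import Counter

# Assumption: letters joined by an internal "-", "'", or "'-" form one word.
# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
WORD_RE = re.compile(r"[^\W\d_]+(?:(?:['’]?-|['’])[^\W\d_]+)*")

def orthographic_words(text):
    """Return the lower-cased words found in text, in order of appearance."""
    return [w.lower() for w in WORD_RE.findall(text)]

def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(orthographic_words(text)).most_common(n)
```

With this pattern jack-o'-lantern and let's each come out as one word, while a double hyphen (as used before a signature) still separates two words.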

What about split words across lines   (if there are possi-
bly present)?
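
One possible treatment of split words, sketched in Python under a stated assumption: a hyphen that ends a line directly after a letter, with a lowercase letter starting the next line, is taken to mark a word split across lines. The task does not say this; the names SPLIT_RE and rejoin_split_words are hypothetical.

```python
import re

# Assumption: "letter, hyphen, line break, lowercase letter" marks a split word.
# The hyphen, the line break, and any spaces or tabs around it are removed.
SPLIT_RE = re.compile(r"(?<=[^\W\d_])-[ \t]*\r?\n[ \t]*(?=[a-z])")

def rejoin_split_words(text):
    """Join words that were hyphenated across a line break."""
    return SPLIT_RE.sub("", text)
```

The heuristic cannot tell a split word from a genuinely hyphenated compound that happens to break at its hyphen, so a compound such as well-known broken after "well-" would be joined into "wellknown".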

Are words that contain an apostrophe to be included   (such as let's)?

What about words that contain non-Latin (Roman) letters?
As it happens, those non-Latin letters don't show up in the   top ten.

What exactly is the text   (start and stop)   that is contained in the web-page to be used?

Should we also use the prologue and epilogue of the   Project Gutenberg   along with the book's text?

Wouldn't it be a lot simpler to have a simple (and complete) text file to download   [with no (de-)assembly, editing, or text massaging required]?

-- Gerard Schildberger (talk) 03:08, 16 August 2017 (UTC)