Revision as of 18:49, 16 April 2013: rosettacode>Paddy3118 m ({{header|Python}}: Re-factored code.)
Revision as of 18:57, 16 April 2013: Rdm (J: stretch goals)
Note that if we looked at frequency of use for words, instead of considering all words to have equal weights, we might come up with a different answer.

=== stretch goal ===

After downloading 1_2_all_freq to /tmp, we can read it into J, and break out the first column (as words) and the third column (as numbers):

<lang J>allfreq=: |:}.<;._1;._2]1!:1<'/tmp/1_2_all_freq.txt'

words=: >0 { allfreq
freqs=: 0 {.@".&>2 { allfreq</lang>

With these definitions, we can define a prevalence verb which tells us how often a particular substring appears in use:

<lang J>prevalence=:verb define
  (y +./@E."1 words) +/ .* freqs
)</lang>

Investigating our original proposed rules:

<lang J>   'ie' %&prevalence 'ei'
1.76868</lang>

A generic "i before e" rule is not looking quite as good now: words that have i before e are used less than twice as much as words which use e before i.

<lang J>   'cei' %&prevalence 'cie'
0.328974</lang>

An "except after c" variant is looking awful now: words that use the cie sequence are about three times as prevalent as words that use the cei sequence. So, of course, if we modified our original rule with this exception it would weaken the original rule:

<lang J>   ('ie' -&prevalence 'cie') % ('ei' -&prevalence 'cei')
1.68255</lang>

Note that we might also want to consider non-adjacent matches (the regular expression 'i.e' instead of 'ie', or perhaps 'c.ie' or 'c.i.e' instead of 'cie'). This would be straightforward to check, but it would bulk up the page.

=={{header|Python}}==
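The frequency-weighted prevalence check from the J stretch goal can be sketched in Python as well. This is only a minimal illustration, not the page's Python solution: the `prevalence` function name mirrors the J verb, and the `sample` list is hypothetical data invented for the example (it is not taken from 1_2_all_freq). Because the pattern is treated as a regular expression, the same function also covers the non-adjacent matches mentioned above, such as 'i.e'.

```python
import re

def prevalence(pattern, word_freqs):
    """Sum the frequencies of every word in which `pattern` (a regex) occurs.

    Each word counts once, however many times the pattern occurs in it.
    """
    rx = re.compile(pattern)
    return sum(freq for word, freq in word_freqs if rx.search(word))

# Hypothetical (word, frequency) pairs, made up for illustration only.
sample = [
    ("believe", 30),
    ("their", 100),
    ("science", 20),
    ("receive", 10),
    ("cat", 50),
]

# Ratio of "ie" usage to "ei" usage, weighted by word frequency.
ratio = prevalence("ie", sample) / prevalence("ei", sample)
print(ratio)
```

With real data, `sample` would be replaced by pairs built from the first and third columns of the downloaded frequency file, as in the J version.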

I before E except after C: Difference between revisions
