WiktionaryDumps to words: Difference between revisions

m (syntax highlighting fixup automation)
m (→‎{{header|Wren}}: Minor tidy and rerun)
 
(2 intermediate revisions by one other user not shown)
Line 4:
Make a file that can be useful with [https://en.wikipedia.org/wiki/Spell_checker spell checkers] like [https://fr.wikipedia.org/wiki/Ispell Ispell] and [https://en.wikipedia.org/wiki/GNU_Aspell Aspell].
 
Use the [https://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2 wiktionary dump] (input) to create a file equivalent to [https://manpages.ubuntu.com/manpages/bionic/man5/spanish.5.html "/usr/share/dict/spanish"] (output). The input file is a bzip2-compressed XML dump of Wiktionary, about 800MB in size. The output file should be similar to "/usr/share/dict/spanish": a simple text file, each line of which is one word in the given language. An example of such a file is available in Ubuntu with the package '''wspanish'''.
 
 
Line 727:
An embedded program so we can use libcurl and libbzip2.
 
Rather than downloading the full 800MB .bz2 file and then decompressing it, we abort the download after receiving at most the first 512 KB and then decompress that, ignoring the resultant BZ_UNEXPECTED_EOF error. This turns out to be enough to find the first 2622 French words.
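The idea of decompressing only a truncated prefix can be sketched independently of the entry below. This is a minimal illustration in Python rather than Wren/C, using the standard <code>bz2</code> module instead of libbzip2; the function name <code>decompress_prefix</code> and the synthetic test data are assumptions made for the example. Python's incremental decompressor returns whatever complete blocks it has decoded and does not report an error for the missing tail, which plays the role of ignoring BZ_UNEXPECTED_EOF.

```python
import bz2


def decompress_prefix(partial: bytes) -> bytes:
    """Decompress as much as possible from a truncated bz2 stream.

    Hypothetical helper for illustration: only blocks that arrived
    completely are returned; a cut-off final block is silently dropped.
    """
    decompressor = bz2.BZ2Decompressor()
    return decompressor.decompress(partial)


if __name__ == "__main__":
    # Synthetic stand-in for the dump: many small <title> lines.
    data = "".join(f"<title>word{i}</title>\n" for i in range(40000)).encode()
    # compresslevel=1 uses 100 KB blocks, so the stream has several blocks.
    packed = bz2.compress(data, compresslevel=1)
    # Simulate an aborted download by keeping only the first half.
    head = packed[: len(packed) // 2]
    text = decompress_prefix(head)
    print(len(text), "of", len(data), "bytes recovered")
```

Because bzip2 works block by block, the recovered text is always a prefix of the original, and it usually ends in the middle of a line or XML element, so any word extraction has to tolerate an incomplete final record.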
<syntaxhighlight lang="wren">/* WiktionaryDumps_to_words.wren */
 
import "./pattern" for Pattern
Line 792:
<br>
We now embed this script in the following C program, build and run.
<syntaxhighlight lang="c">/* gcc WiktionaryDumps_to_words.c -o WiktionaryDumps_to_words -lcurl -lbz2 -lwren -lm */
 
#include <stdio.h>
Line 988:
WrenVM* vm = wrenNewVM(&config);
const char* module = "main";
const char* fileName = "WiktionaryDumps_to_words.wren";
char *script = readFile(fileName);
WrenInterpretResult result = wrenInterpret(vm, module, script);
Line 1,030:
fable
a-
abaca
abada
abalone
abandon
</pre>