WiktionaryDumps to words: Difference between revisions

Content added Content deleted
m (syntax highlighting fixup automation)
m (grammar etc)
Line 4: Line 4:
Make a file that can be useful with [https://en.wikipedia.org/wiki/Spell_checker spell checkers] like [https://fr.wikipedia.org/wiki/Ispell Ispell] and [https://en.wikipedia.org/wiki/GNU_Aspell Aspell].
Make a file that can be useful with [https://en.wikipedia.org/wiki/Spell_checker spell checkers] like [https://fr.wikipedia.org/wiki/Ispell Ispell] and [https://en.wikipedia.org/wiki/GNU_Aspell Aspell].


Use the [https://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2 wiktionary dump] (input) to create a file equivalent than [https://manpages.ubuntu.com/manpages/bionic/man5/spanish.5.html "/usr/share/dict/spanish"] (output). The input file is an XML dump of the Wiktionary that is a bz2'ed file of about 800MB. The output file should be a file similar than "/usr/share/dict/spanish" which contains one word of a given language by line in a simple text file. An example of such a file is available in Ubuntu with the package '''wspanish'''.
Use the [https://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2 wiktionary dump] (input) to create a file equivalent than [https://manpages.ubuntu.com/manpages/bionic/man5/spanish.5.html "/usr/share/dict/spanish"] (output). The input file is an XML dump of the Wiktionary that is a bz2'ed file of about 800MB. The output file should be a file similar to "/usr/share/dict/spanish", a simple text file each line of which is one word in the given language. An example of such a file is available in Ubuntu with the package '''wspanish'''.