Markov chain text generator: Difference between revisions
(Added Lua.) |
m (J draft) |
||
Line 147: | Line 147: | ||
</pre> |
</pre> |
||
=={{header|J}}== |
|||
This seems to be reasonably close to the specification: |
|||
<lang J>require'web/gethttp' |
|||
setstats=:dyad define |
|||
'plen slen limit'=: x |
|||
txt=. gethttp y |
|||
letters=. (tolower~:toupper)txt |
|||
NB. apostrophes have letters on both sides |
|||
apostrophes=. (_1 |.!.0 letters)*(1 |.!.0 letters)*''''=txt |
|||
parsed=. <;._1 ' ',deb ' ' (I.-.letters+apostrophes)} tolower txt |
|||
words=: ~.parsed |
|||
corpus=: words i.parsed |
|||
prefixes=: ~.plen]\corpus |
|||
suffixes=: ~.slen]\corpus |
|||
ngrams=. (plen+slen)]\corpus |
|||
pairs=. (prefixes i. plen{."1 ngrams),. suffixes i. plen}."1 ngrams |
|||
stats=: (#/.~pairs) (<"1~.pairs)} (prefixes ,&# suffixes)$0 |
|||
i.0 0 |
|||
) |
|||
genphrase=:3 :0 |
|||
pren=. #prefixes |
|||
sufn=. #suffixes |
|||
phrase=. (?pren) { prefixes |
|||
weights=. +/\"1 stats |
|||
totals=. (+/"1 stats),0 |
|||
while. limit > #limit do. |
|||
p=. prefixes i. (-plen) {. phrase |
|||
t=. p { totals |
|||
if. 0=t do. break.end. NB. no valid matching suffix |
|||
s=. (p { weights) I. ?t |
|||
phrase=. phrase, s { suffixes |
|||
end. |
|||
;:inv phrase { words |
|||
)</lang> |
|||
{{out}}<lang J> 2 1 50 setstats 'http://paulo-jorente.de/text/alice_oz.txt' |
|||
genphrase'' |
|||
got in as alice alice |
|||
genphrase'' |
|||
perhaps even alice |
|||
genphrase'' |
|||
pretty milkmaid alice |
|||
genphrase'' |
|||
all made of china alice |
|||
genphrase'' |
|||
suit them alice was not much hurt alice |
|||
genphrase'' |
|||
throat and alice looked all round alice alice |
|||
genphrase'' |
|||
often the alice |
|||
genphrase'' |
|||
head clean alice |
|||
genphrase'' |
|||
and enjoyed alice</lang> |
|||
=={{header|Lua}}== |
=={{header|Lua}}== |
Revision as of 07:30, 28 September 2016
This task is about coding a Text Generator using Markov Chain algorithm.
A Markov chain algorithm basically determines the next most probable suffix word for a given prefix.
To do this, a Markov chain program typically breaks an input text (training text) into a series of words, then by sliding along them in some fixed sized window, storing the first N words as a prefix and then the N + 1 word as a member of a set to choose from randomly for the suffix.
As an example, take this text with N = 2:
now he is gone she said he is gone for good
this would build the following table:
PREFIX SUFFIX now he is he is gone, gone is gone she, for gone she said she said he said he is gone for good for good (empty) if we get at this point, the program stops generating text
To generate the final text choose a random PREFIX, if it has more than one SUFFIX, get one at random, create the new PREFIX and repeat until you have completed the text.
Following our simple example, N = 2, 8 words:
random prefix: gone she suffix: said new prefix: she + said new suffix: he new prefix: said + he new suffix: is ... and so on gone she said he is gone she said
The bigger the training text, the better the results. You can try this text here: alice_oz.txt
Create a program that is able to handle keys of any size (I guess keys smaller than 2 words would be pretty random text but...) and create output text also in any length. Probably you want to call your program passing those numbers as parameters. Something like: markov( "text.txt", 3, 300 )
C++
In this implementation there is no repeated suffixes!
<lang cpp>
- include <ctime>
- include <iostream>
- include <algorithm>
- include <fstream>
- include <string>
- include <vector>
- include <map>
class markov { public:
void create( std::string& file, int keyLen, int words ) { std::ifstream f( file.c_str(), std::ios_base::in ); fileBuffer = std::string( ( std::istreambuf_iterator<char>( f ) ), std::istreambuf_iterator<char>() ); f.close(); if( fileBuffer.length() < 1 ) return; createDictionary( keyLen ); createText( words - keyLen ); }
private:
void createText( int w ) { std::string key, first, second; size_t next, pos; std::map<std::string, std::vector<std::string> >::iterator it = dictionary.begin(); std::advance( it, rand() % dictionary.size() ); key = ( *it ).first; std::cout << key; while( true ) { std::vector<std::string> d = dictionary[key]; if( d.size() < 1 ) break; second = d[rand() % d.size()]; if( second.length() < 1 ) break; std::cout << " " << second; if( --w < 0 ) break; next = key.find_first_of( 32, 0 ); first = key.substr( next + 1 ); key = first + " " + second; } std::cout << "\n"; } void createDictionary( int kl ) { std::string w1, key; size_t wc = 0, pos, textPos, next; next = fileBuffer.find_first_not_of( 32, 0 ); if( next == -1 ) return; while( wc < kl ) { pos = fileBuffer.find_first_of( ' ', next ); w1 = fileBuffer.substr( next, pos - next ); key += w1 + " "; next = fileBuffer.find_first_not_of( 32, pos + 1 ); if( next == -1 ) return; wc++; } key = key.substr( 0, key.size() - 1 ); while( true ) { next = fileBuffer.find_first_not_of( 32, pos + 1 ); if( next == -1 ) return; pos = fileBuffer.find_first_of( 32, next ); w1 = fileBuffer.substr( next, pos - next ); if( w1.size() < 1 ) break; if( std::find( dictionary[key].begin(), dictionary[key].end(), w1 ) == dictionary[key].end() ) dictionary[key].push_back( w1 ); key = key.substr( key.find_first_of( 32 ) + 1 ) + " " + w1; } } std::string fileBuffer; std::map<std::string, std::vector<std::string> > dictionary;
}; int main( int argc, char* argv[] ) {
srand( unsigned( time( 0 ) ) ); markov m; m.create( std::string( "alice_oz.txt" ), 3, 200 ); return 0;
} </lang>
- Output:
March Hare had just upset the milk-jug into his plate. Alice did not dare to disobey, though she felt sure it would all come wrong, and she went on. Her listeners were perfectly quiet till she got to the part about her repeating 'You are old, Father William,' said the Caterpillar. 'Well, I've tried to say How doth the little crocodile Improve his shining tail, And pour the waters of the Nile On every golden scale! 'How cheerfully he seems to grin, How neatly spread his claws, And welcome little fishes in With gently smiling jaws!' 'I'm sure those are not the right words,' said poor Alice, and her eyes filled with tears again as she went slowly after it: 'I never was so small as this before, never! And I declare it's too bad, that it is!' As she said this she looked down into its face in some alarm. This time there were three gardeners at it, busily painting them red. Alice thought this a very difficult game indeed. The players all played at once without waiting for the end of me. But the tinsmith happened to come along, and he made me a body of tin, fastening my tin arms and
J
This seems to be reasonably close to the specification:
<lang J>require'web/gethttp'
setstats=:dyad define
'plen slen limit'=: x txt=. gethttp y letters=. (tolower~:toupper)txt NB. apostrophes have letters on both sides apostrophes=. (_1 |.!.0 letters)*(1 |.!.0 letters)*'=txt parsed=. <;._1 ' ',deb ' ' (I.-.letters+apostrophes)} tolower txt words=: ~.parsed corpus=: words i.parsed prefixes=: ~.plen]\corpus suffixes=: ~.slen]\corpus ngrams=. (plen+slen)]\corpus pairs=. (prefixes i. plen{."1 ngrams),. suffixes i. plen}."1 ngrams stats=: (#/.~pairs) (<"1~.pairs)} (prefixes ,&# suffixes)$0 i.0 0
)
genphrase=:3 :0
pren=. #prefixes sufn=. #suffixes phrase=. (?pren) { prefixes weights=. +/\"1 stats totals=. (+/"1 stats),0 while. limit > #limit do. p=. prefixes i. (-plen) {. phrase t=. p { totals if. 0=t do. break.end. NB. no valid matching suffix s=. (p { weights) I. ?t phrase=. phrase, s { suffixes end. ;:inv phrase { words
)</lang>
- Output:
<lang J> 2 1 50 setstats 'http://paulo-jorente.de/text/alice_oz.txt'
genphrase
got in as alice alice
genphrase
perhaps even alice
genphrase
pretty milkmaid alice
genphrase
all made of china alice
genphrase
suit them alice was not much hurt alice
genphrase
throat and alice looked all round alice alice
genphrase
often the alice
genphrase
head clean alice
genphrase
and enjoyed alice</lang>
Lua
Not sure whether this is correct, but I am sure it is quite inefficient. Also not written very nicely.
Computes keys of all lengths <= N. During text generation, if a key does not exist in the dictionary, the last (least recent) word is removed, until a key is found (if no key at all is found, program terminates).
<lang Lua>local function pick(t)
local i = math.ceil(math.random() * #t) return t[i]
end
local n_prevs = tonumber(arg[1]) or 2 local n_words = tonumber(arg[2]) or 8
local dict, wordset = {}, {} local prevs, pidx = {}, 1
local function add(word) -- add new word to dictionary
local prev = local i, len = pidx, #prevs
for _ = 1, len do i = i - 1 if i == 0 then i = len end
if prev ~= then prev = ' ' .. prev end prev = prevs[i] .. prev local t = dict[prev] if not t then t = {} dict[prev] = t end t[#t+1] = word end
end
for line in io.lines() do
for word in line:gmatch("%S+") do wordset[word] = true add(word) prevs[pidx] = word pidx = pidx + 1; if pidx > n_prevs then pidx = 1 end end
end add()
local wordlist = {} for word in pairs(wordset) do
wordlist[#wordlist+1] = word
end wordset = nil
math.randomseed(os.time()) math.randomseed(os.time() * math.random()) local word = pick(wordlist) local prevs, cnt = , 0
--[[ print the dictionary for prevs, nexts in pairs(dict) do
io.write(prevs, ': ') for _,word in ipairs(nexts) do io.write(word, ' ') end io.write('\n')
end ]]
for i = 1, n_words do
io.write(word, ' ')
if cnt < n_prevs then cnt = cnt + 1 else local i = prevs:find(' ') if i then prevs = prevs:sub(i+1) end end if prevs ~= then prevs = prevs .. ' ' end prevs = prevs .. word
local cprevs = ' ' .. prevs local nxt_words repeat local i = cprevs:find(' ') if not i then break end cprevs = cprevs:sub(i+1) if DBG then io.write('\x1b[2m', cprevs, '\x1b[m ') end nxt_words = dict[cprevs] until nxt_words
if not nxt_words then break end word = pick(nxt_words)
end io.write('\n') </lang>
- Output:
> ./markov.lua <alice_oz.txt 3 200 hugged the soft, stuffed body of the Scarecrow in her arms instead of kissing his painted face, and found she was crying herself at this sorrowful parting from her loving comrades. Glinda the Good stepped down from her ruby throne to give the prizes?' quite a chorus of voices asked. 'Why, she, of course,' said the Dodo, pointing to Alice with one finger; and the whole party look so grave and anxious.) Alice could think of nothing else to do, and perhaps after all it might tell her something worth hearing. For some minutes it puffed away without speaking, but at last it sat down a good way off, panting, with its tongue hanging out of its mouth again, and said, 'So you think you're changed, do you?' 'I'm afraid I don't know one,' said Alice, rather alarmed at the proposal. 'Then the Dormouse shall!' they both cried. 'Wake up, Dormouse!' And they pinched it on both sides at once. The Dormouse slowly opened his eyes. 'I wasn't asleep,' he said in a low voice, 'Why the fact is, you see, Miss, we're doing our best, afore she comes, to-' At this moment Five, who had been greatly interested in