Markov chain text generator: Difference between revisions

From Rosetta Code
Content added Content deleted
({{header|Perl 6}})
Line 340: Line 340:
put @generated-text.head: $words ;
put @generated-text.head: $words ;
</lang>
</lang>
<pre>>perl6 markov.p6 <alice_oz.txt --n=3 --words=100
<pre>>perl6 markov.p6 <alice_oz.txt --n=3 --words=200
Scarecrow. He can't hurt the straw. Do let me carry that basket for you. I shall not mind it, for I can't get tired. I'll tell you what I think, said the little man. Give me two or three pairs of tiny white kid gloves: she took up the fan and gloves, and, as the Lory positively refused to tell its age, there was no use in saying anything more till the Pigeon had finished. 'As if it wasn't trouble enough hatching the eggs,' said the Pigeon; 'but I must be very careful. When Oz gives me a heart of course I needn't mind so much. They were obliged to camp out that night under a large tree in the wood,' continued the Pigeon, raising its voice to a whisper. He is more powerful than they themselves, they would surely have destroyed me. As it was, I lived in deadly fear of them for many years; so you can see for yourself. Indeed, a jolly little clown came walking toward them, and Dorothy could see that in spite of all her coaxing. Hardly knowing what she did, she picked up a little bit of stick, and held it out to
Scarecrow. He can't hurt the straw. Do let me carry that basket for you. I shall not mind it, for I can't get tired. I'll tell you what I think, said the little man. Give me two or three pairs of tiny white kid gloves: she took up the fan and gloves, and, as the Lory positively refused to tell its age, there was no use in saying anything more till the Pigeon had finished. 'As if it wasn't trouble enough hatching the eggs,' said the Pigeon; 'but I must be very careful. When Oz gives me a heart of course I needn't mind so much. They were obliged to camp out that night under a large tree in the wood,' continued the Pigeon, raising its voice to a whisper. He is more powerful than they themselves, they would surely have destroyed me. As it was, I lived in deadly fear of them for many years; so you can see for yourself. Indeed, a jolly little clown came walking toward them, and Dorothy could see that in spite of all her coaxing. Hardly knowing what she did, she picked up a little bit of stick, and held it out to
</pre>
</pre>

Revision as of 20:59, 14 November 2016

Markov chain text generator is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

This task is about coding a Text Generator using Markov Chain algorithm.

A Markov chain algorithm basically determines the next most probable suffix word for a given prefix.

To do this, a Markov chain program typically breaks an input text (training text) into a series of words, then by sliding along them in some fixed sized window, storing the first N words as a prefix and then the N + 1 word as a member of a set to choose from randomly for the suffix.

As an example, take this text with N = 2:

now he is gone she said he is gone for good

this would build the following table:

PREFIX               SUFFIX
now he               is
he is                gone, gone
is gone              she, for
gone she             said
she said             he
said he              is
gone for             good
for good             (empty) if we get at this point, the program stops generating text

To generate the final text choose a random PREFIX, if it has more than one SUFFIX, get one at random, create the new PREFIX and repeat until you have completed the text.

Following our simple example, N = 2, 8 words:

random prefix: gone she
suffix: said
new prefix: she + said
new suffix: he
new prefix: said + he
new suffix: is
... and so on

gone she said he is gone she said

The bigger the training text, the better the results. You can try this text here: alice_oz.txt

Create a program that is able to handle keys of any size (I guess keys smaller than 2 words would be pretty random text but...) and create output text also in any length. Probably you want to call your program passing those numbers as parameters. Something like: markov( "text.txt", 3, 300 )




C++

In this implementation there is no repeated suffixes!

<lang cpp>

  1. include <ctime>
  2. include <iostream>
  3. include <algorithm>
  4. include <fstream>
  5. include <string>
  6. include <vector>
  7. include <map>

class markov { public:

   void create( std::string& file, int keyLen, int words ) {
       std::ifstream f( file.c_str(), std::ios_base::in );
       fileBuffer = std::string( ( std::istreambuf_iterator<char>( f ) ), std::istreambuf_iterator<char>() );
       f.close();
       if( fileBuffer.length() < 1 ) return;
       createDictionary( keyLen );
       createText( words - keyLen );
   }

private:

   void createText( int w ) {
       std::string key, first, second;
       size_t next, pos;
       std::map<std::string, std::vector<std::string> >::iterator it = dictionary.begin();
       std::advance( it, rand() % dictionary.size() );
       key = ( *it ).first;
       std::cout << key;
       while( true ) {
           std::vector<std::string> d = dictionary[key];
           if( d.size() < 1 ) break;
           second = d[rand() % d.size()];
           if( second.length() < 1 ) break;
           std::cout << " " << second;
           if( --w < 0 ) break;
           next = key.find_first_of( 32, 0 );
           first = key.substr( next + 1 );
           key = first + " " + second;
       }
       std::cout << "\n";
   }
   void createDictionary( int kl ) {
       std::string w1, key;
       size_t wc = 0, pos, textPos, next;
       next = fileBuffer.find_first_not_of( 32, 0 );
       if( next == -1 ) return;
       while( wc < kl ) {
           pos = fileBuffer.find_first_of( ' ', next );
           w1 = fileBuffer.substr( next, pos - next );
           key += w1 + " ";
           next = fileBuffer.find_first_not_of( 32, pos + 1 );
           if( next == -1 ) return;
           wc++;
       }
       key = key.substr( 0, key.size() - 1 );
       while( true ) {
           next = fileBuffer.find_first_not_of( 32, pos + 1 );
           if( next == -1 ) return;
           pos = fileBuffer.find_first_of( 32, next );
           w1 = fileBuffer.substr( next, pos - next );
           if( w1.size() < 1 ) break;
           if( std::find( dictionary[key].begin(), dictionary[key].end(), w1 ) == dictionary[key].end() ) 
               dictionary[key].push_back( w1 );
           key = key.substr( key.find_first_of( 32 ) + 1 ) + " " + w1;
       }
   }
   std::string fileBuffer;
   std::map<std::string, std::vector<std::string> > dictionary;

}; int main( int argc, char* argv[] ) {

   srand( unsigned( time( 0 ) ) );
   markov m;
   m.create( std::string( "alice_oz.txt" ), 3, 200 );
   return 0;

} </lang>

Output:
March Hare had just upset the milk-jug into his plate. Alice did not dare to 
disobey, though she felt sure it would all come wrong, and she went on. Her 
listeners were perfectly quiet till she got to the part about her repeating 
'You are old, Father William,' said the Caterpillar. 'Well, I've tried to say 
How doth the little crocodile Improve his shining tail, And pour the waters of 
the Nile On every golden scale! 'How cheerfully he seems to grin, How neatly 
spread his claws, And welcome little fishes in With gently smiling jaws!' 
'I'm sure those are not the right words,' said poor Alice, and her eyes filled 
with tears again as she went slowly after it: 'I never was so small as this before, 
never! And I declare it's too bad, that it is!' As she said this she looked 
down into its face in some alarm. This time there were three gardeners at it, 
busily painting them red. Alice thought this a very difficult game indeed. 
The players all played at once without waiting for the end of me. But the 
tinsmith happened to come along, and he made me a body of tin, fastening my 
tin arms and


J

This seems to be reasonably close to the specification:

<lang J>require'web/gethttp'

setstats=:dyad define

 'plen slen limit'=: x
 txt=. gethttp y
 letters=. (tolower~:toupper)txt
 NB. apostrophes have letters on both sides
 apostrophes=. (_1 |.!.0 letters)*(1 |.!.0 letters)*'=txt
 parsed=. <;._1 ' ',deb ' ' (I.-.letters+apostrophes)} tolower txt
 words=: ~.parsed
 corpus=: words i.parsed
 prefixes=: ~.plen]\corpus
 suffixes=: ~.slen]\corpus
 ngrams=. (plen+slen)]\corpus
 pairs=. (prefixes i. plen{."1 ngrams),. suffixes i. plen}."1 ngrams
 stats=: (#/.~pairs) (<"1~.pairs)} (prefixes ,&# suffixes)$0
 weights=: +/\"1 stats
 totals=: (+/"1 stats),0 
 i.0 0

)

genphrase=:3 :0

 pren=. #prefixes
 sufn=. #suffixes
 phrase=. (?pren) { prefixes
 while. limit > #phrase do.
   p=. prefixes i. (-plen) {. phrase
   t=. p { totals
   if. 0=t do. break.end. NB. no valid matching suffix
   s=. (p { weights) I. ?t
   phrase=. phrase, s { suffixes
 end.
 ;:inv phrase { words

)</lang>

Output:

<lang J> 2 1 50 setstats 'http://paulo-jorente.de/text/alice_oz.txt'

  genphrase

got in as alice alice

  genphrase

perhaps even alice

  genphrase

pretty milkmaid alice</lang>

And, using 8 word suffixes (but limiting results to a bit over 50 words):

Output:

<lang J> 2 8 50 setstats 'http://paulo-jorente.de/text/alice_oz.txt'

  genphrase

added it alice was beginning to get very tired of this i vote the young lady tells us alice was beginning to get very tired of being such a tiny little thing it did not take her long to find the one paved with yellow bricks within a short time

  genphrase

the raft through the water they got along quite well alice was beginning to get very tired of this i vote the young lady tells us alice was beginning to get very tired of being all alone here as she said this last word two or three times over to

  genphrase

gown that alice was beginning to get very tired of sitting by her sister on the bank and alice was beginning to get very tired of being such a tiny little thing it did so indeed and much sooner than she had accidentally upset the week before oh i beg</lang>

(see talk page for discussion of odd line wrapping with some versions of Safari)


Lua

Not sure whether this is correct, but I am sure it is quite inefficient. Also not written very nicely.

Computes keys of all lengths <= N. During text generation, if a key does not exist in the dictionary, the first (least recent) word is removed, until a key is found (if no key at all is found, the program terminates).

<lang Lua>local function pick(t)

 local i = math.ceil(math.random() * #t)
 return t[i]

end

local n_prevs = tonumber(arg[1]) or 2 local n_words = tonumber(arg[2]) or 8

local dict, wordset = {}, {} local prevs, pidx = {}, 1

local function add(word) -- add new word to dictionary

 local prev = 
 local i, len = pidx, #prevs
 for _ = 1, len do
   i = i - 1
   if i == 0 then i = len end
   if prev ~=  then prev = ' ' .. prev end
   prev = prevs[i] .. prev
   local t = dict[prev]
   if not t then
     t = {}
     dict[prev] = t
   end
   t[#t+1] = word
 end

end

for line in io.lines() do

 for word in line:gmatch("%S+") do
   wordset[word] = true
   add(word)
   prevs[pidx] = word
   pidx = pidx + 1; if pidx > n_prevs then pidx = 1 end
 end

end add()

local wordlist = {} for word in pairs(wordset) do

 wordlist[#wordlist+1] = word

end wordset = nil

math.randomseed(os.time()) math.randomseed(os.time() * math.random()) local word = pick(wordlist) local prevs, cnt = , 0

--[[ print the dictionary for prevs, nexts in pairs(dict) do

 io.write(prevs, ': ')
 for _,word in ipairs(nexts) do
   io.write(word, ' ')
 end
 io.write('\n')

end ]]

for i = 1, n_words do

 io.write(word, ' ')
 if cnt < n_prevs then
   cnt = cnt + 1
 else
   local i = prevs:find(' ')
   if i then prevs = prevs:sub(i+1) end
 end
 if prevs ~=  then prevs = prevs .. ' ' end
 prevs = prevs .. word
 local cprevs = ' ' .. prevs
 local nxt_words
 repeat
   local i = cprevs:find(' ')
   if not i then break end
   cprevs = cprevs:sub(i+1)
   if DBG then io.write('\x1b[2m', cprevs, '\x1b[m ') end
   nxt_words = dict[cprevs]
 until nxt_words
 if not nxt_words then break end
 word = pick(nxt_words)

end io.write('\n') </lang>

Output:
> ./markov.lua <alice_oz.txt 3 200
hugged the soft, stuffed body of the Scarecrow in her arms instead of kissing his
painted face, and found she was crying herself at this sorrowful parting from her
loving comrades. Glinda the Good stepped down from her ruby throne to give the
prizes?' quite a chorus of voices asked. 'Why, she, of course,' said the Dodo,
pointing to Alice with one finger; and the whole party look so grave and
anxious.) Alice could think of nothing else to do, and perhaps after all it might
tell her something worth hearing. For some minutes it puffed away without
speaking, but at last it sat down a good way off, panting, with its tongue
hanging out of its mouth again, and said, 'So you think you're changed, do you?'
'I'm afraid I don't know one,' said Alice, rather alarmed at the proposal. 'Then
the Dormouse shall!' they both cried. 'Wake up, Dormouse!' And they pinched it
on both sides at once. The Dormouse slowly opened his eyes. 'I wasn't asleep,' he
said in a low voice, 'Why the fact is, you see, Miss, we're doing our best, afore
she comes, to-' At this moment Five, who had been greatly interested in

Perl 6

<lang Perl6>use v6; unit sub MAIN ( :$text=$*IN, :$n=2, :$words=100, );

sub add-to-dict ( $text, :$n=2, ) {

   my @prefix-suffix = $text.words.rotor: $n, 1 => -$n ;
   @prefix-suffix.pop;
   (%).push: @prefix-suffix.map: * => * ;

}

my %dict = add-to-dict $text, :$n ; my @start-words = %dict.keys.pick.words; my @generated-text = lazy |@start-words, { %dict{ "@_[ *-$n .. * ]" }.pick } ...^ !*.defined ;

put @generated-text.head: $words ; </lang>

>perl6 markov.p6 <alice_oz.txt --n=3 --words=200
Scarecrow. He can't hurt the straw. Do let me carry that basket for you. I shall not mind it, for I can't get tired. I'll tell you what I think, said the little man. Give me two or three pairs of tiny white kid gloves: she took up the fan and gloves, and, as the Lory positively refused to tell its age, there was no use in saying anything more till the Pigeon had finished. 'As if it wasn't trouble enough hatching the eggs,' said the Pigeon; 'but I must be very careful. When Oz gives me a heart of course I needn't mind so much. They were obliged to camp out that night under a large tree in the wood,' continued the Pigeon, raising its voice to a whisper. He is more powerful than they themselves, they would surely have destroyed me. As it was, I lived in deadly fear of them for many years; so you can see for yourself. Indeed, a jolly little clown came walking toward them, and Dorothy could see that in spite of all her coaxing. Hardly knowing what she did, she picked up a little bit of stick, and held it out to