Random sentence from book: Difference between revisions

Revision as of 05:37, 13 February 2021

Random sentence from book is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
  • Read in the book "The War of the Worlds", by H. G. Wells.
  • Skip to the start of the book, proper.
  • Remove extraneous punctuation, but keep at least the sentence-ending punctuation characters . ! and ?
  • Keep account of which words follow each word and how many times each is seen (treat sentence terminators as words too).
  • Keep account of which words follow each pair of words and how many times each is seen (again treating sentence terminators as words too).
  • Assume that a sentence starts with a full-stop character that is not shown, then use a weighted random choice of the possible words that may follow a full stop to add to the sentence.
  • Then repeatedly add words to the sentence based on weighted random choices of what words may follow the last two words to extend the sentence.
  • Stop when the sentence ends in a sentence-ending punctuation character.
  • Tidy and then print the sentence.

Show examples of random sentences generated.
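
The counting and weighted-choice steps of the task can be tried on a tiny hand-made text before running them on the whole book. The following is only a minimal sketch: the names `follower_counts`, `weighted_pick`, `toy_sentence` and the toy word list are illustrative assumptions and are not part of the task or of any solution on this page.

```python
import random
from collections import Counter, defaultdict

def follower_counts(words):
    """Count which word follows each single word and each pair of words."""
    after_one = defaultdict(Counter)
    after_two = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        after_one[a][b] += 1
    for a, b, c in zip(words, words[1:], words[2:]):
        after_two[(a, b)][c] += 1
    return after_one, after_two

def weighted_pick(counter, rng=random):
    """Pick one key with probability proportional to its count."""
    choices, weights = zip(*counter.items())
    return rng.choices(choices, weights=weights)[0]

def toy_sentence(after_one, after_two, rng=random):
    """Start from a hidden full stop, then extend using the last two words."""
    words = ['.']
    words.append(weighted_pick(after_one['.'], rng))
    while words[-1] not in '.!?':
        words.append(weighted_pick(after_two[(words[-2], words[-1])], rng))
    return words[1:]  # drop the hidden leading full stop

# Toy text (hypothetical): terminators are treated as words,
# and a leading '.' marks the start of the first sentence.
toy = ". the cat sat . the cat ran . the dog sat .".split()
after_one, after_two = follower_counts(toy)
sentence = toy_sentence(after_one, after_two)
print(' '.join(sentence))
```

In this toy text "the" is followed by "cat" twice and by "dog" once, so a weighted choice after the pair (".", "the") should favour "cat" roughly two to one.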

Python

Extended to preserve some extra "sentence pausing" characters and to try to tidy up apostrophes.

<lang python>from urllib.request import urlopen
import re
from string import punctuation
from collections import Counter, defaultdict
import random


# The War of the Worlds, by H. G. Wells
text_url = 'http://www.gutenberg.org/files/36/36-0.txt'
text_start = 'No one would have believed'

sentence_ending = '.!?'
sentence_pausing = ',;:'

def read_book(text_url, text_start) -> str:
    "Download the book and skip to the start of the story proper"
    with urlopen(text_url) as book:
        text = book.read().decode('utf-8')
    return text[text.index(text_start):]

def remove_punctuation(text: str, keep=sentence_ending+sentence_pausing) -> str:
    "Remove punctuation, keeping some"
    to_remove = ''.join(set(punctuation) - set(keep))
    # Replace every unwanted punctuation character with a space
    text = text.translate(str.maketrans(to_remove, ' ' * len(to_remove))).strip()
    # Replace any other non-alphanumeric characters (e.g. curly quotes)
    text = re.sub(fr"[^a-zA-Z0-9{keep}\n ]+", ' ', text)
    # Remove duplicates and put space around remaining punctuation
    if keep:
        text = re.sub(f"([{keep}])+", r" \1 ", text).strip()
    if text[-1] not in sentence_ending:
        text += ' .'
    return text.lower()

def word_follows_words(txt_with_pauses_and_endings):
    "return dict of freq of words following one/two words"
    # A leading '.' marks the start of the first sentence
    words = ['.'] + txt_with_pauses_and_endings.strip().split()
    # count of what word follows this
    word2next = defaultdict(lambda: defaultdict(int))
    # count of what word follows each pair of words
    word2next2 = defaultdict(lambda: defaultdict(int))
    for lh, rh in zip(words, words[1:]):
        word2next[lh][rh] += 1
    for lh, mid, rh in zip(words, words[1:], words[2:]):
        word2next2[(lh, mid)][rh] += 1
    return dict(word2next), dict(word2next2)

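def gen_sentence(word2next, word2next2) -> str:
    "Generate one random sentence from the follower counts"
    s = ['.']  # hidden sentence start
    # zip(*d.items()) yields (words, counts): population and weights
    s += random.choices(*zip(*word2next[s[-1]].items()))
    while True:
        s += random.choices(*zip(*word2next2[(s[-2], s[-1])].items()))
        if s[-1] in sentence_ending:
            break
    # Tidy: capitalise, close up punctuation, restore some apostrophes
    s = ' '.join(s[1:]).capitalize()
    s = re.sub(fr" ([{sentence_ending+sentence_pausing}])", r'\1', s)
    s = re.sub(r" re\b", "'re", s)
    s = re.sub(r" s\b", "'s", s)
    s = re.sub(r"\bi\b", "I", s)
    return s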

if __name__ == "__main__":
    txt_with_pauses_and_endings = remove_punctuation(read_book(text_url, text_start))
    word2next, word2next2 = word_follows_words(txt_with_pauses_and_endings)
    #%%
    sentence = gen_sentence(word2next, word2next2)
    print(sentence)</lang>
Output:
<pre># A SAMPLE OF GENERATED SENTENCES

As I stood petrified and staring down the river, over which spread a multitude of dogs, I flung myself forward under the night sky, a sky of gold.

He was walking through the gaps in the water.

There was no place to their intelligence, without a word they were in position there.

Ugh!

The ringing impact of trucks, the person or entity that provided you with the torrent to recover it.
</pre>
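
The tidy-up applied at the end of `gen_sentence` can also be tried in isolation. The sketch below repeats the same regular-expression substitutions in a standalone helper; the name `tidy`, the upper-case constants and the sample word list are assumptions made for illustration only.

```python
import re

SENTENCE_ENDING = '.!?'
SENTENCE_PAUSING = ',;:'

def tidy(tokens):
    """Join lower-case tokens into a presentable sentence."""
    s = ' '.join(tokens).capitalize()
    # Close up the space before kept punctuation: "here ," -> "here,"
    s = re.sub(fr" ([{SENTENCE_ENDING + SENTENCE_PAUSING}])", r"\1", s)
    # Restore apostrophes that were stripped with the other punctuation
    s = re.sub(r" re\b", "'re", s)
    s = re.sub(r" s\b", "'s", s)
    # A lone "i" becomes the pronoun "I"
    s = re.sub(r"\bi\b", "I", s)
    return s

# Hypothetical token list, as it might look after punctuation removal
tidied = tidy("they re here , and i saw the martian s cylinder .".split())
print(tidied)
```

These substitutions are heuristics: any stand-alone "s" or "re" is treated as the tail of a contraction or possessive, which may occasionally be wrong.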