Using a speech engine to highlight words

Display a piece of text and produce spoken output via a speech engine.

As each word is being spoken, highlight the word on the display.

In languages where cursor control and highlighting are not possible, it is permissible to output each word as it is spoken.

Related task

speech synthesis

AutoHotkey

We use the simple SAPI.SPVoice COM Object and a parsing loop. The highlighting is done with EM_SETSEL and Notepad. Rather crude, but it works. Due to the simplistic nature of the parsing loop, the text ends with a space. <lang AutoHotkey>SetTitleMatchMode 2 EM_SETSEL := 0x00B1

Run notepad,,,pid WinWaitActive ahk_pid %pid% ControlSetText, Edit1, % text := "AutoHotkey was the first to implement this task! ", ahk_pid %pid%

pVoice := ComObjCreate("Sapi.spvoice"), i := 1 ; the spvoice COM Object ships with the OS

parse the text

While lf := SubStr(text, i, 1) {

  If lf = %A_Space%
  {
     SendMessage, EM_SetSel, % i-StrLen(word)-1, % i-1, Edit1, ahk_pid %pid%
     pVoice.speak(word), word := "", i++
  }
  Else word .= lf, i++

}</lang>

Go

Works with: Ubuntu 16.04

This uses the eSpeak speech synthesizer which is invoked for each word in the text. As the word is spoken it is printed to the terminal in capitalized form (and the previous word is uncapitalized). After a second's delay the final word is uncapitalized.

Very robotic but it works. <lang go>package main

import (

   "fmt"
   "log"
   "os/exec"
   "strings"
   "time"

)

func main() {

   s := "Actions speak louder than words."
   prev := ""
   prevLen := 0
   bs := ""
   for _, word := range strings.Fields(s) {
       cmd := exec.Command("espeak", word)
       if err := cmd.Run(); err != nil {
           log.Fatal(err)
       }
       if prevLen > 0 {
           bs = strings.Repeat("\b", prevLen)
       }
       fmt.Printf("%s%s%s ", bs, prev, strings.ToUpper(word))
       prev = word + " "
       prevLen = len(word) + 1
   }
   bs = strings.Repeat("\b", prevLen)
   time.Sleep(time.Second)
   fmt.Printf("%s%s\n", bs, prev)

}</lang>

Julia

Translation of: Go

<lang julia>function speak(sentence, cmd = "/utl/espeak.bat")

   for word in split(sentence)
       s = replace(lowercase(word), r"[^a-z]" => "")
       if length(s) > 0
           print(uppercase(s))
           run(`$cmd $s`)
           sleep(1)
           print("\b"^length(s))
       end
       print(word, " ")
   end

end

speak("Are those 144 shy Eurasian footwear, cowboy chaps, or jolly earthmoving headgear?") </lang>

M2000 Interpreter

<lang M2000 Interpreter> Module UsingEvents {

     Form 60, 32
     Cls 5, 0
     Pen 14
     Declare WithEvents sp "SAPI.SpVoice"
     That$="Rosetta Code is a programming chrestomathy site"
     margin=(width-Len(That$))/2
     EndStream=False
     \\ this function called as sub routine - same scope as Module
     \\ we can call it from event function too
     Function Localtxt {
           \\ move the cursor to middle line
           Cursor 0, height/2
           \\ using OVER the line erased with background color and then print text over
           \\ ordinary Print using transparent printing of text
           \\ $(0) set mode to non proportional text, @() move the cursor to sepecific position
           Print Over $(0),@(margin), That$
     }
     Call Local LocalTxt()
     Function sp_Word {
           Read New StreamNumber, StreamPosition, CharacterPosition, Length
           Call Local LocalTxt()
           Cursor 0, height/2
           Pen 15 {Print Part $(0), @(CharacterPosition+margin); Mid$(That$, CharacterPosition+1, Length)}
           Refresh
     }
     Function sp_EndStream {
           Refresh
           EndStream=True
     }
     Const  SVEEndInputStream = 4
     Const  SVEWordBoundary = 32
     Const SVSFlagsAsync = 1&

     With sp, "EventInterests", SVEWordBoundary+SVEEndInputStream
     Method sp, "Speak", That$, SVSFlagsAsync
     While Not EndStream {Wait 10}
     Call Local LocalTxt()

} UsingEvents

</lang>

Mathematica

<lang>DynamicModule[{text = "This is some text.", words, i = 0},

Panel@Column@{Dynamic[
    Row[Riffle[
      If[i != 0, MapAt[Style[#, Red] &, #, i], #] &@(words = 
         StringSplit@text), " "]]], InputField[Dynamic@text, String],
    Button["Speak", 
    While[i < Length@words, i++; FinishDynamic[]; Speak[wordsi]; 
     Pause[Max[0.7, 0.12 StringLength[wordsi]]]]; i = 0]}]</lang>

Nim

Translation of: Go

Works on Linux but may also work on other platforms provided "espeak" is installed. <lang Nim>import os, osproc, strutils

const S = "Actions speak louder than words."

var prev, bs = "" var prevlen = 0

for word in S.splitWhitespace():

 discard execProcess("espeak " & word)
 if prevlen > 0:
   bs = repeat('\b', prevlen)
 stdout.write bs, prev, word.toUpper, ' '
 stdout.flushFile()
 prev = word & ' '
 prevlen = word.len + 1

bs = repeat('\b', prevlen) sleep(1000) echo bs, prev</lang>

Phix

<lang Phix>-- demo/rosetta/Speech.exw requires(WINDOWS) requires(32) -- Windows 32 bit only, for now... (runs fine on a 64-bit OS, but needs a 32-bit pw.exe) include pGUI.e Ihandle t3, left, red, right, btn, bc, dlg

include pComN.ew

procedure speak(atom pVoice, string text, integer flags=NULL)

   {} = cominvk(pVoice,ISpVoice_Speak,{unicode_string(text),flags,0})

end procedure

procedure set_interest(atom pVoice, sequence flags)

   atom lo = #40000000,
        hi = #00000002
   for i=1 to length(flags) do
       lo += power(2,flags[i])
   end for
   -- NB 2*64-bit as 4*32-bit:
   atom res = cominvk(pVoice,ISpVoice_SetInterest,{lo,hi,lo,hi})
   if res!=S_OK then crash("invalid") end if

end procedure

constant text = `<rate speed="-5">Highlight words as they are spoken.` atom pVoice = NULL

include cffi.e constant tSPEVENT = """ typedef struct SPEVENT {

   WORD         eEventId;
   WORD         elParamType;
   ULONG        ulStreamNum;
   ULONGLONG    ullAudioStreamOffset;
   WPARAM       wParam;
   LPARAM       lParam;

} SPEVENT;""", idSPEVENT = define_struct(tSPEVENT), pSPEVENT = allocate_struct(idSPEVENT)

function eventHandler(atom wParam, lParam)

   while cominvk(pVoice,ISpVoice_GetEvents,{1,pSPEVENT,NULL})=S_OK do
       integer eEventId = get_struct_field(idSPEVENT,pSPEVENT,"eEventId")
       if eEventId=SPEI_WORD_BOUNDARY then
           integer len = get_struct_field(idSPEVENT,pSPEVENT,"wParam"),
                   pos = get_struct_field(idSPEVENT,pSPEVENT,"lParam")
           IupSetStrAttribute(left,"TITLE",text[1..pos])
           IupSetStrAttribute(red,"TITLE",text[pos+1..pos+len])
           IupSetStrAttribute(right,"TITLE",text[pos+len+1..$])
       elsif eEventId=SPEI_END_INPUT_STREAM then
           IupSetStrAttribute(left,"TITLE",text)
           IupSetStrAttribute(red,"TITLE","")
           IupSetStrAttribute(right,"TITLE","")
       end if
       IupSetAttribute(left, "RASTERSIZE", "x0")
       IupSetAttribute(red, "RASTERSIZE", "x0")
       IupSetAttribute(right, "RASTERSIZE", "x0")
       IupRefresh(t3)
       IupRedraw(t3)
   end while
   return 0

end function constant cb_eventHandler = call_back(eventHandler)

function button_cb(Ihandle /*ih*/)

   object res
   if platform()=WINDOWS then
       if pVoice=NULL then
           pVoice = allocate(machine_word())
           res = CoCreateInstance(CLSID_SpVoice,IID_ISpVoice,pVoice)
           if res!=S_OK then
               crash("Failed to initialize SpeechAPI. (%08x)\n",res)
           end if
       end if
       if cominvk(pVoice,ISpVoice_SetNotifyCallbackFunction,{cb_eventHandler,0,0})!=S_OK then ?9/0 end if
       set_interest(pVoice, {SPEI_WORD_BOUNDARY,SPEI_END_INPUT_STREAM})
       speak(pVoice,text,SPF_IS_XML+SPF_ASYNC)
   else
       -- DEV no highlighting here...
       {} = system_exec(sprintf(`espeak "%s"`,{text}))
   end if
   return IUP_CONTINUE

end function

procedure main()

   IupOpen()
   left = IupLabel("",`FONT="Verdana, 18"`)
   red = IupLabel("",`FONT="Verdana, 18", FGCOLOR="255 0 0"`)
   right = IupLabel(text,`FONT="Verdana, 18"`)
   t3 = IupHbox({IupFill(),left,red,right,IupFill()},"MARGIN=20x20")
   btn = IupButton("Speak",Icallback("button_cb"))
   bc = IupHbox({IupFill(),btn,IupFill()},"MARGIN=10x10")
   dlg = IupDialog(IupVbox({t3,bc}),"TITLE=Speak")
   IupShow(dlg)
   IupMainLoop()
   IupClose()

end procedure

main()</lang>

Raku

Translation of: Go

<lang perl6># 20200826 Raku programming solution

my \s = "Actions speak louder than words."; my \prev = $ = ""; my \prevLen = $ = 0; my \bs = $ = "";

for s.split(' ', :skip-empty) {

  run "/usr/bin/espeak", $_ or die;
  bs = "\b" x prevLen if prevLen > 0;
  printf "%s%s%s ", bs, prev, $_.uc;
  prev = "$_ ";
  prevLen = $_.chars + 1

}

printf "%s%s\n", "\b" x prevLen, prev;</lang>

REXX

Works with: Windowx/XP or later

Programming note: This REXX program uses a freeware program NIRCMD to interface with the Microsoft Windows speech synthesizer program SAM, a text to speech using a male voice. SAM can possibly be configured to use other voices with later releases of Windows. More recent Microsoft Windows have another speech synthesizer program: ANNA.

Each word of the text is highlighted (by showing the word in uppercase). the terminal screen is cleared before showing the text that is being spoken; the repeated calls to the (Windows) speech engine makes for a slower speech rate. <lang rexx>/*REXX program uses a command line interface to invoke Windows SAM for speech synthesis.*/ parse arg t /*get the (optional) text from the C.L.*/

= words(t)

if #==0 then exit /*Nothing to say? Then exit program.*/ dq= '"' /*needed to enclose text in dbl quotes.*/ rate= 1 /*talk: -10 (slow) to 10 (fast). */

                                                /* [↓]  where the rubber meets the road*/
  do j=1  for #
  x= word(t, j);          upper x               /*extract 1 word, capitalize it for HL.*/
  if j==1  then LHS=                            /*obtain text before the spoken word.  */
           else LHS= subword(t, 1, j-1)
  if j==#  then RHS=                            /*obtain text  after the spoken word.  */
           else RHS= subword(t, j+1)
  'CLS'                                         /*use this command to clear the screen.*/
  say 'speaking: '   space(LHS  x  RHS)         /*show text,  one word is capitalized. */
  oneWord= dq  x  dq                            /*surround a word in double quotes (").*/
  'NIRCMD'  "speak text"    oneWord     rate    /*NIRCMD  invokes Microsoft's Sam voice*/
  end   /*j*/                                   /*stick a fork in it,  we're all done. */</lang>

Note: The name of the above REXX program is SPEAKHI.REX

usage using the command:

speakhi This is an example of speech synthesis.

Ring

<lang ring> load "guilib.ring"

MyApp = New qApp {

       win1 = new qWidget() {

               setwindowtitle("Hello World")
               setGeometry(100,100,370,250)

               Text = "Welcome to the Ring Programming Language"
               Text = split(Text," ")

               label1 = new qLabel(win1) {
                       settext("What is your name ?")
                       setGeometry(10,20,350,30)
                       setalignment(Qt_AlignHCenter)
               }

               btn1 = new qpushbutton(win1) {
                       setGeometry(10,200,100,30)
                       settext("Say Hello")
                       setclickevent("pHello()")
               }

               btn2 = new qpushbutton(win1) {
                       setGeometry(150,200,100,30)
                       settext("Close")
                       setclickevent("pClose()")
               }

               lineedit1 = new qlineedit(win1) {
                       setGeometry(10,100,350,30)
               }

               voice = new QTextToSpeech(win1) {                        
               }
               show()
       }
       exec()}

Func pHello

    lineedit1.settext( "Hello " + lineedit1.text())
    for n = 1 to len(Text)
        voice.Say(Text[n])
        see Text[n] + nl
    next

Func pClose

    MyApp.quit()

</lang>

Output:

Welcome
to
the
Ring
Programming
Language

Ruby

Library: Shoes

I'm having difficulty figuring out how to get Shoes to update the GUI (like Tk's update command), so the user must click the button once for each word.

Uses the Ruby code from Speech synthesis <lang ruby>load 'speechsynthesis.rb'

if ARGV.length == 1

 $text = "This is default text for the highlight and speak program"

else

 $text = ARGV[1..-1].join(" ")

end $words = $text.split

Shoes.app do

 @idx = 0

 stack do
   @sentence = para(strong($words[0] + " "), $words[1..-1].map {|word| span(word + " ")})
   button "Say word" do
     say_and_highlight
   end
 end

 keypress do |key|
   case key
   when :control_q, "\x11" then exit
   end
 end

 def say_and_highlight
   speak $words[@idx]
   @idx = (@idx + 1) % $words.length
   @sentence.replace($words.each_with_index.map {|word, idx| idx == @idx ? strong(word + " ") : span(word + " ")})
 end

end</lang>

Tcl

This code uses the external /usr/bin/say program (known available on Mac OS X) as its interface to the speech engine; this produces rather stilted speech because it forces the text to be spoken one word at a time instead of as a whole sentence (in order to keep the highlighting synchronized).

Library: Tk

<lang tcl>package require Tcl 8.5 package require Tk 8.5 proc say {text button} {

   grab $button
   $button configure -state disabled -cursor watch
   update
   set starts [$text search -all -regexp -count lengths {\S+} 1.0]
   foreach start $starts length $lengths {

lappend strings [$text get $start "$start + $length char"] lappend ends [$text index "$start + $length char"]

   }
   $text tag remove sel 1.0 end
   foreach from $starts str $strings to $ends {

$text tag add sel $from $to update idletasks exec /usr/bin/say << $str $text tag remove sel 1.0 end

   }
   grab release $button
   $button configure -state normal -cursor {}

}

pack [text .t] pack [button .b -text "Speak, computer!" -command {say .t .b}] -fill x .t insert 1.0 "This is an example of speech synthesis with Tcl/Tk."</lang>