Determine sentence type

Revision as of 16:00, 9 November 2021 by PureFox (talk | contribs) (→‎{{header|Wren}}: Typo.)

Use these sentences: "hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it."

You are encouraged to solve this task according to the task description, using any language you may know.
Task
Find the final punctuation mark of a sentence and classify the sentence according to that mark.
Output one of these letters
"E" (Exclamation!), "Q" (Question?), "S" (Serious.), "N" (Neutral).
Extra
Make your code able to determine multiple sentences.


Don't leave any errors!
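
As a rough illustration of the rule described above, here is a minimal Python sketch. It assumes that every "!", "?" or "." ends a sentence, and that any text left after the last mark is a neutral sentence; the function name sentence_types is hypothetical and not part of the task.

```python
def sentence_types(text):
    """Return one letter per sentence, joined by '|' (a sketch, not a reference solution)."""
    marks = {"!": "E", "?": "Q", ".": "S"}
    # Assumption: every occurrence of a mark terminates one sentence.
    types = [marks[ch] for ch in text if ch in marks]
    # Text remaining after the final mark is treated as an unterminated, neutral sentence.
    stripped = text.strip()
    if stripped and stripped[-1] not in marks:
        types.append("N")
    return "|".join(types)


sample = ("hi there, how are you today? I'd like to present to you the "
          "washing machine 9001. You have been nominated to win one of these! "
          "Just make sure you don't break it")
print(sentence_types(sample))
```

This deliberately ignores abbreviations and quoted punctuation, so "Mr. Magoo" would count as a sentence end.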





ALGOL 68

Classifies an empty string as "".
<lang algol68>BEGIN # determine the type of a sentence by looking at the final punctuation #

   CHAR exclamation = "E"; # classification codes... #
   CHAR question    = "Q";
   CHAR serious     = "S";
   CHAR neutral     = "N";
   # returns the type(s) of the sentence(s) in s - exclamation, question,     #
   #                     serious or neutral; if there are multiple sentences  #
   #                     the types are separated by |                         #
   PROC classify = ( STRING s )STRING:
        BEGIN
           STRING result := "";
           BOOL pending neutral := FALSE; 
           FOR s pos FROM LWB s TO UPB s DO
               IF   pending neutral := FALSE;
                    CHAR c = s[ s pos ];
                    c = "?"
               THEN result +:= question    + "|"
               ELIF c = "!"
               THEN result +:= exclamation + "|"
               ELIF c = "."
               THEN result +:= serious     + "|"
               ELSE pending neutral := TRUE
               FI
           OD;
           IF   pending neutral
           THEN result +:= neutral + "|"
           FI;
           # if s was empty, then return an empty string, otherwise remove the final separator #
           IF result = "" THEN "" ELSE result[ LWB result : UPB result - 1 ] FI
        END # classify # ;
   # task test case #
   print( ( classify( "hi there, how are you today? I'd like to present to you the washing machine 9001. "
                    + "You have been nominated to win one of these! Just make sure you don't break it"
                    )
          , newline
          )
        )

END</lang>

Output:
Q|S|E|N

AutoHotkey

<lang autohotkey>Sentence := "hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it"
Msgbox, % SentenceType(Sentence)

SentenceType(Sentence) {
	Sentence := Trim(Sentence)
	Loop, Parse, Sentence, .?!
	{
		N := (!E && !Q && !S)
		, S := (InStr(SubStr(Sentence, InStr(Sentence, A_LoopField)+StrLen(A_LoopField), 3), "."))
		, Q := (InStr(SubStr(Sentence, InStr(Sentence, A_LoopField)+StrLen(A_LoopField), 3), "?"))
		, E := (InStr(SubStr(Sentence, InStr(Sentence, A_LoopField)+StrLen(A_LoopField), 3), "!"))
		, type .= (E) ? ("E|") : ((Q) ? ("Q|") : ((S) ? ("S|") : "N|"))
		, D := SubStr(Sentence, InStr(Sentence, A_LoopField)+StrLen(A_LoopField), 3)
	}
	return (D = SubStr(Sentence, 1, 3)) ? RTrim(RTrim(type, "|"), "N|") : RTrim(type, "|")
}</lang>

Output:
Q|S|E|N

Factor

This program attempts to prevent common abbreviations from ending sentences early. It also tries to handle parenthesized sentences and implements an additional type for exclamatory questions (EQ).

Works with: Factor version 0.99 2021-06-02

<lang factor>USING: combinators io kernel regexp sequences sets splitting wrap.strings ;

! courtesy of https://www.infoplease.com/common-abbreviations

CONSTANT: common-abbreviations {

   "A.B." "abbr." "Acad." "A.D." "alt." "A.M." "Assn."
   "at. no." "at. wt." "Aug." "Ave." "b." "B.A." "B.C." "b.p."
   "B.S." "c." "Capt." "cent." "co." "Col." "Comdr." "Corp."
   "Cpl." "d." "D.C." "Dec." "dept." "dist." "div." "Dr." "ed."
   "est." "et al." "Feb." "fl." "gal." "Gen." "Gov." "grad."
   "Hon." "i.e." "in." "inc." "Inst." "Jan." "Jr." "lat."
   "Lib." "long." "Lt." "Ltd." "M.D." "Mr." "Mrs." "mt." "mts."
   "Mus." "no." "Nov." "Oct." "Op." "pl." "pop." "pseud." "pt."
   "pub." "Rev." "rev." "R.N." "Sept." "Ser." "Sgt." "Sr."
   "St." "uninc." "Univ." "U.S." "vol." "vs." "wt."

}

: sentence-enders ( str -- newstr )
    R/ \)/ "" re-replace
    " " split harvest
    unclip-last swap
    [ common-abbreviations member? ] reject
    [ last ".!?" member? ] filter
    swap suffix ;

: serious? ( str -- ? ) last CHAR: . = ;
: neutral? ( str -- ? ) last ".!?" member? not ;
: mixed? ( str -- ? ) "?!" intersect length 2 = ;
: exclamation? ( str -- ? ) last CHAR: ! = ;
: question? ( str -- ? ) last CHAR: ? = ;

: type ( str -- newstr )
    {
        { [ dup serious? ] [ drop "S" ] }
        { [ dup neutral? ] [ drop "N" ] }
        { [ dup mixed? ] [ drop "EQ" ] }
        { [ dup exclamation? ] [ drop "E" ] }
        { [ dup question? ] [ drop "Q" ] }
        [ drop "UNKNOWN" ]
    } cond ;

: sentences ( str -- newstr )
    sentence-enders [ type ] map "|" join ;

: show ( str -- )
    dup sentences " -> " glue 60 wrap-string print ;

"Hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it" show nl
"(There was nary a mouse stirring.) But the cats were going bonkers!" show nl
"\"Why is the car so slow?\" she said." show nl
"Hello, Mr. Anderson!" show nl
"Are you sure?!?! How can you know?" show</lang>

Output:
Hi there, how are you today? I'd like to present to you the
washing machine 9001. You have been nominated to win one of
these! Just make sure you don't break it -> Q|S|E|N

(There was nary a mouse stirring.) But the cats were going
bonkers! -> S|E

"Why is the car so slow?" she said. -> S

Hello, Mr. Anderson! -> E

Are you sure?!?! How can you know? -> EQ|Q
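
The abbreviation filtering described for this entry can be approximated in Python as follows. This is only a sketch of the idea, not a translation of the Factor program: the ABBREVIATIONS set is a deliberately tiny, hypothetical stand-in for the full list, and the helper names kind and classify are assumptions made for illustration.

```python
# Hypothetical, deliberately short abbreviation list (the entry above uses a far longer one).
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "St.", "Capt."}


def kind(word):
    """Classify a sentence by the word that ends it."""
    last = word[-1]
    if last == ".":
        return "S"
    if last not in "!?":
        return "N"
    # A word containing both marks is treated as an exclamatory question.
    if "!" in word and "?" in word:
        return "EQ"
    return "E" if last == "!" else "Q"


def classify(text):
    """Return the sentence types in text, joined by '|'."""
    # Drop closing parentheses so "(... stirring.)" still ends a sentence.
    words = text.replace(")", "").split()
    if not words:
        return ""
    *body, final = words
    # A word ends a sentence if it finishes with a mark and is not an abbreviation.
    enders = [w for w in body if w not in ABBREVIATIONS and w[-1] in ".!?"]
    # The final word always ends a sentence, punctuated or not.
    enders.append(final)
    return "|".join(kind(w) for w in enders)


print(classify("Hello, Mr. Anderson!"))
print(classify("Are you sure?!?! How can you know?"))
```

Checking whole words against a fixed set keeps the logic simple, at the cost of missing any abbreviation not in the list.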

FreeBASIC

<lang freebasic>function sentype( byref s as string ) as string

   'determines the sentence type of the first sentence in the string
   'returns "E" for an exclamation, "Q" for a question, "S" for serious
   'and "N" for neutral.
   'modifies the string to remove the first sentence
   for i as uinteger = 1 to len(s)
       if mid(s, i, 1) = "!" then
           s=right(s,len(s)-i)
           return "E"
       end if
       if mid(s, i, 1) = "." then
           s=right(s,len(s)-i)
           return "S"
       end if 
       if mid(s, i, 1) = "?" then
           s=right(s,len(s)-i)
           return "Q"
       end if 
   next i
   'if we get to the end without encountering punctuation, this
   'must be a neutral sentence, which can only happen as the last one
   s=""
   return "N"

end function

dim as string spam = "hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it"

while len(spam)>0

   print sentype(spam)

wend</lang>

Output:
Q
S
E
N


Go

Translation of: Wren

<lang go>package main

import (

   "fmt"
   "strings"

)

func sentenceType(s string) string {

   if len(s) == 0 {
       return ""
   }
   var types []string
   for _, c := range s {
       if c == '?' {
           types = append(types, "Q")
       } else if c == '!' {
           types = append(types, "E")
       } else if c == '.' {
           types = append(types, "S")
       }
   }
   if strings.IndexByte("?!.", s[len(s)-1]) == -1 {
       types = append(types, "N")
   }
   return strings.Join(types, "|")

}

func main() {

   s := "hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it"
   fmt.Println(sentenceType(s))

}</lang>

Output:
Q|S|E|N

Julia

<lang julia>text = """
Hi there, how are you today? I'd like to present to you the washing machine 9001.
You have been nominated to win one of these! Just make sure you don't break it"""

haspunctotype(s) = '.' in s ? "S" : '!' in s ? "E" : '?' in s ? "Q" : "N"

text = replace(text, "\n" => " ")
parsed = strip.(split(text, r"(?:(?:(?<=[\?\!\.])(?:))|(?:(?:)(?=[\?\!\.])))"))
isodd(length(parsed)) && push!(parsed, "")  # if ends without punctuation
for i in 1:2:length(parsed)-1
    println(rpad(parsed[i] * parsed[i + 1], 52),  " ==> ", haspunctotype(parsed[i + 1]))
end
</lang>

Output:
Hi there, how are you today?                         ==> Q
I'd like to present to you the washing machine 9001. ==> S
You have been nominated to win one of these!         ==> E
Just make sure you don't break it                    ==> N

Perl

<lang perl>use strict;
use warnings;
use feature 'say';
use Lingua::Sentence;

my $para1 = <<'EOP';
hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it
EOP

my $para2 = <<'EOP';
Just because there are punctuation characters like "?", "!" or especially "." present, it doesn't necessarily mean you have reached the end of a sentence, does it Mr. Magoo? The syntax highlighting here for Perl isn't bad at all.
EOP

my $splitter = Lingua::Sentence->new("en");
for my $text ($para1, $para2) {
  for my $s (split /\n/, $splitter->split( $text =~ s/\n//gr ) ) {
    print "$s| ";
    if    ($s =~ /!$/)  { say 'E' }
    elsif ($s =~ /\?$/) { say 'Q' }
    elsif ($s =~ /\.$/) { say 'S' }
    else                { say 'N' }
  }
}</lang>

Output:
hi there, how are you today?| Q
I'd like to present to you the washing machine 9001.| S
You have been nominated to win one of these!| E
Just make sure you don't break it| N
Just because there are punctuation characters like "?", "!" or especially "." present, it doesn't necessarily mean you have reached the end of a sentence, does it Mr. Magoo?| Q
The syntax highlighting here for Perl isn't bad at all.| S

Phix

with javascript_semantics
constant s = `hi there, how are you today? I'd like to present 
to you the washing machine 9001. You have been nominated to win 
one of these! Just make sure you don't break it`
sequence t = split_any(trim(s),"?!."),
         u = substitute_all(s,t,repeat("|",length(t))),
         v = substitute_all(u,{"|?","|!","|.","|"},"QESN"),
         w = join(v,'|')
?w
Output:
"Q|S|E|N"

Python

<lang python>import re

txt = """
Hi there, how are you today? I'd like to present to you the washing machine 9001.
You have been nominated to win one of these! Just make sure you don't break it"""

def haspunctotype(s):
    return 'S' if '.' in s else 'E' if '!' in s else 'Q' if '?' in s else 'N'

txt = re.sub('\n', '', txt)
pars = [s.strip() for s in re.split(r"(?:(?:(?<=[\?\!\.])(?:))|(?:(?:)(?=[\?\!\.])))", txt)]
if len(pars) % 2:
    pars.append('')  # if ends without punctuation
for i in range(0, len(pars)-1, 2):
    print((pars[i] + pars[i + 1]).ljust(54), "==>", haspunctotype(pars[i + 1]))
</lang>

Output:
Hi there, how are you today?                           ==> Q
I'd like to present to you the washing machine 9001.   ==> S
You have been nominated to win one of these!           ==> E
Just make sure you don't break it                      ==> N


Or for more generality, and an alternative to hand-crafted regular expressions:
<lang python>'''Grouping and tagging by final character of string'''

from functools import reduce
from itertools import groupby


# tagGroups :: Dict -> [String] -> [(String, [String])]
def tagGroups(tagDict):
    '''A list of (Tag, SentenceList) tuples, derived
       from an input text and a supplied dictionary of
       tags for each of a set of final punctuation marks.
    '''
    def go(sentences):
        return [
            (tagDict.get(k, 'Not punctuated'), list(v))
            for (k, v) in groupby(
                sorted(sentences, key=last),
                key=last
            )
        ]
    return go


# sentenceSegments :: Chars -> String -> [String]
def sentenceSegments(punctuationChars):
    '''A list of sentences delimited by the supplied
       punctuation characters, where these are followed
       by spaces.
    '''
    def go(s):
        return [
            ''.join(cs).strip() for cs
            in splitBy(
                sentenceBreak(punctuationChars)
            )(s)
        ]
    return go


# sentenceBreak :: Chars -> (Char, Char) -> Bool
def sentenceBreak(finalPunctuation):
    '''True if the first of two characters is a final
       punctuation mark and the second is a space.
    '''
    def go(a, b):
        return a in finalPunctuation and " " == b
    return go


# ------------------------- TEST -------------------------
# main :: IO ()
def main():
    '''Join, segmentation, tags'''
    tags = {'!': 'E', '?': 'Q', '.': 'S'}
    # Joined by spaces,
    sample = ' '.join([
        "Hi there, how are you today?",
        "I'd like to present to you the washing machine 9001.",
        "You have been nominated to win one of these!",
        "Might it be possible to add some challenge to this task?",
        "Feels as light as polystyrene filler.",
        "But perhaps substance isn't the goal!",
        "Just make sure you don't break off before the"
    ])
    # segmented by punctuation,
    sentences = sentenceSegments(
        tags.keys()
    )(sample)
    # and grouped under tags.
    for kv in tagGroups(tags)(sentences):
        print(kv)


# ----------------------- GENERIC ------------------------

# last :: [a] -> a
def last(xs):
    '''The last element of a non-empty list.'''
    return xs[-1]


# splitBy :: (a -> a -> Bool) -> [a] -> [[a]]
def splitBy(p):
    '''A list split wherever two consecutive
       items match the binary predicate p.
    '''
    # step :: ([[a]], [a], a) -> a -> ([[a]], [a], a)
    def step(acp, x):
        acc, active, prev = acp
        return (acc + [active], [x], x) if p(prev, x) else (
            (acc, active + [x], x)
        )

    # go :: [a] -> [[a]]
    def go(xs):
        if 2 > len(xs):
            return xs
        else:
            h = xs[0]
            ys = reduce(step, xs[1:], ([], [h], h))
            # The accumulated sublists, and the final group.
            return ys[0] + [ys[1]]
    return go


# MAIN ---
if __name__ == '__main__':
    main()</lang>
Output:
('E', ['You have been nominated to win one of these!', "But perhaps substance isn't the goal!"])
('S', ["I'd like to present to you the washing machine 9001.", 'Feels as light as polystyrene filler.'])
('Q', ['Hi there, how are you today?', 'Might it be possible to add some challenge to this task?'])
('Not punctuated', ["Just make sure you don't break off before the"])

Raku

<lang perl6>use Lingua::EN::Sentence;

my $paragraph = q:to/PARAGRAPH/;
hi there, how are you today? I'd like to present to you the washing machine
9001. You have been nominated to win one of these! Just make sure you don't
break it


Just because there are punctuation characters like "?", "!" or especially "."
present, it doesn't necessarily mean you have reached the end of a sentence,
does it Mr. Magoo? The syntax highlighting here for Raku isn't the best.
PARAGRAPH

say join "\n\n", $paragraph.&get_sentences.map: {

   /(<:punct>)$/;
   $_ ~ ' | ' ~ do
   given $0 {
       when '!' { 'E' };
       when '?' { 'Q' };
       when '.' { 'S' };
       default  { 'N' };
   }

}</lang>

Output:
hi there, how are you today? | Q

I'd like to present to you the washing machine
9001. | S

You have been nominated to win one of these! | E

Just make sure you don't
break it | N

Just because there are punctuation characters like "?", "!" or especially "."
present, it doesn't necessarily mean you have reached the end of a sentence,
does it Mr. Magoo? | Q

The syntax highlighting here for Raku isn't the best. | S

Wren

<lang ecmascript>var sentenceType = Fn.new { |s|

   if (s.count == 0) return ""
   var types = []
   for (c in s) {
       if (c == "?") {
           types.add("Q")
       } else if (c == "!") {
           types.add("E")
       } else if (c == ".") {
           types.add("S")
       }
   }
   if (!"?!.".contains(s[-1])) types.add("N")
   return types.join("|")

}

var s = "hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it"
System.print(sentenceType.call(s))</lang>

Output:
Q|S|E|N


Library: Wren-pattern
Library: Wren-trait

The following alternative version takes the simplistic view that (unless they end the final sentence of the paragraph) ?, ! or . will only end a sentence if they're immediately followed by a space. This of course is nonsense, given the way English is written nowadays, but it's probably an improvement on the first version without the need to search through an inevitably incomplete list of abbreviations.
<lang ecmascript>import "./pattern" for Pattern
import "./trait" for Indexed

var map = { "?": "Q", "!": "E", ".": "S", "": "N" }
var p = Pattern.new("[? |! |. ]")
var paras = [

   "hi there, how are you today? I'd like to present to you the washing machine 9001. You have been nominated to win one of these! Just make sure you don't break it",
   "hi there, how are you on St.David's day (isn't it a holiday yet?), Mr.Smith? I'd like to present to you (well someone has to win one!) the washing machine 900.1. You have been nominated by Capt.Johnson('?') to win one of these! Just make sure you (or Mrs.Smith) don't break it. By the way, what the heck is an exclamatory question!?"

]

for (para in paras) {

   para = para.trim()
   var sentences = p.splitAll(para)
   var endings = p.findAll(para).map { |m| m.text[0] }.toList
   var lastChar = sentences[-1][-1]
   if ("?!.".contains(lastChar)) {
       endings.add(lastChar)
       sentences[-1] = sentences[-1][0...-1]
   } else {
       endings.add("")
   }
   for (se in Indexed.new(sentences)) {
       var ix = se.index
       var sentence = se.value
       System.print("%(map[endings[ix]]) <- %(sentence + endings[ix])")
   }
   System.print()

}</lang>

Output:
Q <- hi there, how are you today?
S <- I'd like to present to you the washing machine 9001.
E <- You have been nominated to win one of these!
N <- Just make sure you don't break it

Q <- hi there, how are you on St.David's day (isn't it a holiday yet?), Mr.Smith?
S <- I'd like to present to you (well someone has to win one!) the washing machine 900.1.
E <- You have been nominated by Capt.Johnson('?') to win one of these!
S <- Just make sure you (or Mrs.Smith) don't break it.
Q <- By the way, what the heck is an exclamatory question!?