Find words whose first and last three letters are equal

Revision as of 16:25, 4 April 2021

Task

Use the dictionary unixdict.txt.

Find the words whose first three letters are the same as their last three letters.

Any word shown should have a length greater than 5.


ALGOL 68

<lang algol68># find 6 (or more) character words with the same first and last 3 letters #
IF FILE input file;
   STRING file name = "unixdict.txt";
   open( input file, file name, stand in channel ) /= 0
THEN
   # failed to open the file #
   print( ( "Unable to open """ + file name + """", newline ) )
ELSE
   # file opened OK #
   BOOL at eof := FALSE;
   # set the EOF handler for the file #
   on logical file end( input file, ( REF FILE f )BOOL:
                                    BEGIN
                                        # note that we reached EOF on the #
                                        # latest read #
                                        at eof := TRUE;
                                        # return TRUE so processing can continue #
                                        TRUE
                                    END
                      );
   INT count := 0;
   WHILE STRING word;
         get( input file, ( word, newline ) );
         NOT at eof
   DO
       IF INT w len = ( UPB word + 1 ) - LWB word;
          w len > 5
       THEN
           IF word[ 1 : 3 ] = word[ w len - 2 : ]
           THEN
               count +:= 1;
               print( ( word, " " ) );
               IF count MOD 5 = 0
               THEN print( ( newline ) )
               ELSE FROM w len + 1 TO 14 DO print( ( " " ) ) OD
               FI
           FI
       FI
   OD;
   print( ( newline, "found ", whole( count, 0 ), " words with the same first and last 3 characters", newline ) );
   close( input file )
FI</lang>

Output:

antiperspirant calendrical    einstein       hotshot        murmur
oshkosh        tartar         testes
found 8 words with the same first and last 3 characters

AWK

<lang AWK># syntax: GAWK -f FIND_WORDS_WHICH_FIRST_AND_LAST_THREE_LETTERS_ARE_EQUALS.AWK unixdict.txt
# a pattern with no action prints every matching line
(length($0) >= 6 && substr($0,1,3) == substr($0,length($0)-2,3))
END {
   exit(0)
}</lang>

Output:

antiperspirant
calendrical
einstein
hotshot
murmur
oshkosh
tartar
testes

C++

<lang cpp>#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
   const char* filename(argc < 2 ? "unixdict.txt" : argv[1]);
   std::ifstream in(filename);
   if (!in) {
       std::cerr << "Cannot open file '" << filename << "'.\n";
       return EXIT_FAILURE;
   }
   std::string word;
   int n = 0;
   while (getline(in, word)) {
       const size_t len = word.size();
       // compare the first 3 characters with the last 3 characters
       if (len > 5 && word.compare(0, 3, word, len - 3) == 0)
           std::cout << ++n << ". " << word << '\n';
   }
   return EXIT_SUCCESS;
}</lang>

Output:

1. antiperspirant
2. calendrical
3. einstein
4. hotshot
5. murmur
6. oshkosh
7. tartar
8. testes

F#

<lang fsharp>
// First and last three letters are equal. Nigel Galloway: February 18th., 2021
let fN g=if String.length g<6 then false else g.[..2]=g.[g.Length-3..]
seq{use n=System.IO.File.OpenText("unixdict.txt") in while not n.EndOfStream do yield n.ReadLine()}|>Seq.filter fN|>Seq.iter(printfn "%s")
</lang>

Output:

antiperspirant
calendrical
einstein
hotshot
murmur
oshkosh
tartar
testes

Factor

Read entire file

This version reads the entire dictionary file into memory and filters it. It is by far the fastest of the three versions. Factor is optimized for making multiple passes over data; this version actually takes longer if the two filters are combined into one, whether the combined test uses a short-circuiting or a non-short-circuiting logical and.

<lang factor>USING: io io.encodings.ascii io.files kernel math sequences ;

"unixdict.txt" ascii file-lines [ length 5 > ] filter [ [ 3 head-slice ] [ 3 tail-slice* ] bi = ] filter [ print ] each</lang>

Output:

antiperspirant
calendrical
einstein
hotshot
murmur
oshkosh
tartar
testes

Read file line by line

This version reads the dictionary file line by line and prints the words that fit the criteria. It ends up a bit more imperative and deeply nested, but unlike the version above, it loads only one word at a time, which uses much less memory.

<lang factor>USING: combinators.short-circuit io io.encodings.ascii io.files kernel math sequences ;

"unixdict.txt" ascii [
   [
       readln dup
       [
           dup
           {
               [ length 5 > ]
               [ [ 3 head-slice ] [ 3 tail-slice* ] bi = ]
           } 1&&
           [ print ] [ drop ] if
       ] when*
   ] loop
] with-file-reader</lang>

Output:

As above.

Lazy file I/O

This version reads the input file lazily, treating the stream as a lazy list via the llines word. This combines the concise style of the first example with the memory benefits of the second. Unlike in the first example, combining the two filters into one would save some time here, as lazy lists aren't as efficient as sequences.

<lang factor>USING: io io.encodings.ascii io.files kernel lists lists.lazy math sequences ;

"unixdict.txt" ascii <file-reader> llines [ length 5 > ] lfilter [ [ 3 head-slice ] [ 3 tail-slice* ] bi = ] lfilter [ print ] leach</lang>

Output:

As above.

FreeBASIC

<lang freebasic>#define NULL 0

type node
   word as string*32   'enough space to store any word in the dictionary
   nxt as node ptr
end type

function addword( tail as node ptr, word as string ) as node ptr
   'allocates memory for a new node, links the previous tail to it,
   'and returns the address of the new node
   dim as node ptr newnode = allocate(sizeof(node))
   tail->nxt = newnode
   newnode->nxt = NULL
   newnode->word = word
   return newnode
end function

function length( word as string ) as uinteger
   'necessary replacement for the built-in len function, which in this
   'case would always return 32
   for i as uinteger = 1 to 32
       if asc(mid(word,i,1)) = 0 then return i-1
   next i
   return 999
end function

dim as string word
dim as node ptr tail = allocate( sizeof(node) )
dim as node ptr head = tail, curr
dim as uinteger ln
tail->nxt = NULL
tail->word = "XXXXHEADER"

open "unixdict.txt" for input as #1
while true
   line input #1, word
   if word = "" then exit while
   if length(word)>5 then tail = addword( tail, word )
wend
close #1

'start at the first real word (skipping the header node) and visit
'every node, including the last one
curr = head->nxt
while curr <> NULL
   word = curr->word
   ln = length(word)
   for i as uinteger = 1 to 3
       if mid(word,i,1) <> mid(word,ln-3+i,1) then goto nextword
   next i
   print word
   nextword:
   curr = curr->nxt
wend</lang>

Output:

antiperspirant
calendrical
einstein
hotshot
murmur
oshkosh
tartar
testes

Go

<lang go>package main

import (
   "bytes"
   "fmt"
   "io/ioutil"
   "log"
   "unicode/utf8"
)

func main() {
   wordList := "unixdict.txt"
   b, err := ioutil.ReadFile(wordList)
   if err != nil {
       log.Fatal("Error reading file")
   }
   bwords := bytes.Fields(b)
   count := 0
   for _, bword := range bwords {
       s := string(bword)
       if utf8.RuneCountInString(s) > 5 && (s[0:3] == s[len(s)-3:]) {
           count++
           fmt.Printf("%d: %s\n", count, s)
       }
   }
}</lang>

Output:

1: antiperspirant
2: calendrical
3: einstein
4: hotshot
5: murmur
6: oshkosh
7: tartar
8: testes

Julia

See Alternade_words#Julia for the foreachword function.

<lang julia>matchfirstlast3(word, _) = length(word) > 5 && word[1:3] == word[end-2:end] ? word : ""

foreachword("unixdict.txt", matchfirstlast3, numcols=4)</lang>

Output:

Word source: unixdict.txt

antiperspirant calendrical    einstein       hotshot
murmur         oshkosh        tartar         testes

Perl

As a one-liner:

<lang perl># 20210212 Perl programming solution

perl -ne '/(?=^(.{3}).*\1$)^.{6,}$/&&print' unixdict.txt

# minor variation

perl -ne 's/(?=^(.{3}).*\1$)^.{6,}$/print/e' unixdict.txt</lang>

Phix

<lang Phix>function flaste(string word)
    return length(word)>5 and word[1..3]=word[-3..-1]
end function

sequence flastes = filter(get_text("demo/unixdict.txt",GT_LF_STRIPPED),flaste)

printf(1,"%d words: %s\n",{length(flastes),join(shorten(flastes,"",3))})</lang>

Output:

8 words: antiperspirant calendrical einstein hotshot murmur oshkosh tartar testes

Raku

<lang perl6># 20210210 Raku programming solution

my ( \L, \N, \IN ) = 5, 3, 'unixdict.txt';

for IN.IO.lines { .say if .chars > L and .substr(0,N) eq .substr(*-N,*) } </lang>

Output:

antiperspirant
calendrical
einstein
hotshot
murmur
oshkosh
tartar
testes

REXX

This REXX version doesn't care what order the words in the dictionary are in, nor does it care what
case (lower/upper/mixed) the words are in; the search for the words is caseless.

The program verifies that the first and last three characters are, indeed, letters.

It also allows the number of leading and trailing letters to compare (default 3), the minimum word length (default 6),
and the dictionary file identifier to be specified on the command line (CL).

<lang rexx>/*REXX pgm finds words in an specified dict. which have the same 1st and last 3 letters.*/
parse arg minL many iFID .                       /*obtain optional arguments from the CL*/
if minL=='' | minL==","  then minL= 6            /* "      "         "   "   "     "    */
if many=='' | many==","  then many= 3            /* "      "         "   "   "     "    */
if iFID=='' | iFID==","  then iFID='unixdict.txt' /* "      "         "   "   "     "   */

             do #=1  while lines(iFID)\==0      /*read each word in the file  (word=X).*/
             x= strip( linein( iFID) )          /*pick off a word from the input line. */
             @.#= x                             /*save:  the original case of the word.*/
             end   /*#*/
#= # - 1                                         /*adjust word count because of DO loop.*/
say copies('─', 30)     #     "words in the dictionary file: "       iFID
finds= 0                                         /*word count which have matching end.  */
                                                /*process all the words that were found*/
    do j=1  for #;          $= @.j;    upper $  /*obtain dictionary word; uppercase it.*/
    if length($)<minL  then iterate             /*Word not long enough?   Then skip it.*/
    lhs= left($, many);     rhs= right($, many) /*obtain the left & right side of word.*/
    if \datatype(lhs || rhs, 'U')  then iterate /*are the left and right side letters? */
    if lhs \== rhs                 then iterate /*Left side match right side?  No, skip*/
    finds= finds + 1                            /*bump the count of matching words.    */
    say right( left(@.j, 30),  40)              /*indent original word for readability.*/
    end        /*j*/
                                                /*stick a fork in it,  we're all done. */
say copies('─', 30) finds " words found that the left " many ' letters match the' ,
                           "right letters which a word has a minimal length of "     minL</lang>

output when using the default inputs:

────────────────────────────── 25104 words in the dictionary file:  unixdict.txt
          antiperspirant
          calendrical
          einstein
          hotshot
          murmur
          oshkosh
          tartar
          testes
────────────────────────────── 8  words found that the left  3  letters match the right letters which a word has a minimal length of  6

Ring

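<lang ring>load "stdlib.ring"

cStr = read("unixdict.txt")
wordList = str2list(cStr)
num = 0

see "working..." + nl
see "Words are:" + nl

# remove words shorter than 6 characters (iterate backwards so deletions don't skip entries)
ln = len(wordList)
for n = ln to 1 step -1
   if len(wordList[n]) < 6
      del(wordList,n)
   ok
next

# print the remaining words whose first and last three letters match
for n = 1 to len(wordList)
   if left(wordList[n],3) = right(wordList[n],3)
      num = num + 1
      see "" + num + ". " + wordList[n] + nl
   ok
next

see "done..." + nl</lang>

Output:

working...
Words are:
1. antiperspirant
2. calendrical
3. einstein
4. hotshot
5. murmur
6. oshkosh
7. tartar
8. testes
done...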

Swift

<lang swift>import Foundation

do {
   try String(contentsOfFile: "unixdict.txt", encoding: String.Encoding.ascii)
       .components(separatedBy: "\n")
       .filter{$0.count > 5 && $0.prefix(3) == $0.suffix(3)}
       .enumerated()
       .forEach{print("\($0.0 + 1). \($0.1)")}
} catch {
   print(error.localizedDescription)
}</lang>

Output:

1. antiperspirant
2. calendrical
3. einstein
4. hotshot
5. murmur
6. oshkosh
7. tartar
8. testes

Wren

Library: Wren-fmt

<lang ecmascript>import "io" for File
import "/fmt" for Fmt

var wordList = "unixdict.txt" // local copy
var count = 0
File.read(wordList).trimEnd().split("\n").
   where { |w|
       return w.count > 5 && (w[0..2] == w[-3..-1])
   }.
   each { |w|
       count = count + 1
       Fmt.print("$d: $s", count, w)
   }</lang>

Output:

1: antiperspirant
2: calendrical
3: einstein
4: hotshot
5: murmur
6: oshkosh
7: tartar
8: testes