Words containing "the" substring

From Rosetta Code
Words containing "the" substring is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Using the dictionary unixdict.txt, search for words containing the substring "the",
then display the found words (on this page).

Every word shown should have a length > 11.
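The task comes down to a filter applied to each line: strip the line terminator, then keep the word if it is longer than 11 characters and contains "the". Here is a minimal Python sketch, assuming unixdict.txt holds one word per line. The names `words_containing` and `main` are illustrative and are not part of the task:

```python
from typing import Iterable, List


def words_containing(lines: Iterable[str], sub: str = "the", min_len: int = 12) -> List[str]:
    """Return the words (one per line) that contain `sub` and are at least `min_len` long."""
    found = []
    for line in lines:
        word = line.rstrip("\r\n")  # drop the terminator so it is not counted in the length
        if len(word) >= min_len and sub in word:
            found.append(word)
    return found


def main(path: str = "unixdict.txt") -> None:
    # Assumption: the dictionary is a plain text file with one word per line.
    with open(path, encoding="utf-8") as f:
        for word in words_containing(f):
            print(word)
```

With unixdict.txt in the working directory, calling `main()` should print the same 32 words listed by the entries on this page. Because the terminator is stripped before measuring, the length test can be written as `>= 12` directly. The BCPL and C entries compare against 12 with `> 12` instead, because their buffers still hold the newline.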





11l

L(word) File(‘unixdict.txt’).read().split("\n")
   I ‘the’ C word & word.len > 11
      print(word)
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Action!

In the following solution the input file unixdict.txt is loaded from the H6 drive. When one of the H6-H10 hard drives is used under DOS 2.5, the Altirra emulator automatically converts ASCII CR/LF line endings into character 155, the end-of-line character of the ATASCII charset used by Atari 8-bit computers.

BYTE FUNC FindS(CHAR ARRAY text,sub)
  BYTE i,j,found

  i=1
  WHILE i<=text(0)-sub(0)+1
  DO
    found=0
    FOR j=1 TO sub(0)
    DO
      IF text(i+j-1)#sub(j) THEN
        found=0 EXIT
      ELSE
        found=1
      FI
    OD
    IF found THEN
      RETURN (i)
    FI
    i==+1
  OD
RETURN (0)

BYTE FUNC IsValidWord(CHAR ARRAY word)
  IF word(0)<=11 THEN RETURN (0) FI
  IF FindS(word,"the")=0 THEN RETURN(0) FI
RETURN (1)

PROC FindWords(CHAR ARRAY fname)
  CHAR ARRAY line(256)
  CHAR ARRAY tmp(256)
  BYTE pos,dev=[1]

  pos=2
  Close(dev)
  Open(dev,fname,4)
  WHILE Eof(dev)=0
  DO
    InputSD(dev,line)
    IF IsValidWord(line) THEN
      IF pos+line(0)>=39 THEN
        PutE() pos=2
      FI
      Print(line) Put(32)
      pos==+line(0)+1
    FI
  OD
  Close(dev)
RETURN

PROC Main()
  CHAR ARRAY fname="H6:UNIXDICT.TXT"

  FindWords(fname)
RETURN
Output:

Screenshot from Atari 8-bit computer

authenticate chemotherapy chrysanthemum clothesbrush clotheshorse eratosthenes featherbedding featherbrain
featherweight gaithersburg hydrothermal lighthearted mathematician neurasthenic nevertheless northeastern
northernmost otherworldly parasympathetic physiotherapist physiotherapy psychotherapeutic psychotherapist psychotherapy
radiotherapy southeastern southernmost theoretician weatherbeaten weatherproof weatherstrip weatherstripping

Ada

with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Strings.Fixed;      use Ada.Strings.Fixed;
with Ada.Characters.Latin_1; use Ada.Characters.Latin_1;

procedure Main is
   type col_count is mod 6;
   package AF renames Ada.Strings.Fixed;

   file_name : String             := "unixdict.txt";
   The_File  : File_Type;
   Inpt_Str  : String (1 .. 40);
   Length    : Natural;
   pattern   : String             := "the";
   Columns   : col_count          := 0;
   Tally     : Natural            := 0;
   sep       : constant Character := HT;
begin

   Open (File => The_File, Mode => In_File, Name => file_name);

   while not End_Of_File (The_File) loop
      Get_Line (File => The_File, Item => Inpt_Str, Last => Length);

      if Length > 11
        and then
          AF.Count (Source => Inpt_Str (1 .. Length), Pattern => pattern) > 0
      then
         Tally   := Tally + 1;
         Columns := Columns + 1;
         Put (Inpt_Str (1 .. Length) & sep);
         if Columns = 0 then
            New_Line;
         end if;
      end if;
   end loop;
   New_Line;
   Put_Line ("Found" & Tally'Image & " ""the"" words");
   Close (The_File);
end Main;
Output:
authenticate	chemotherapy	chrysanthemum	clothesbrush	clotheshorse	eratosthenes	
featherbedding	featherbrain	featherweight	gaithersburg	hydrothermal	lighthearted	
mathematician	neurasthenic	nevertheless	northeastern	northernmost	otherworldly	
parasympathetic	physiotherapist	physiotherapy	psychotherapeutic	psychotherapist	psychotherapy	
radiotherapy	southeastern	southernmost	theoretician	weatherbeaten	weatherproof	
weatherstrip	weatherstripping	
Found 32 "the" words

ALGOL 68

# find 12 character (or more) words that have "the" in them          #
IF  FILE input file;
    STRING file name = "unixdict.txt";
    open( input file, file name, stand in channel ) /= 0
THEN
    # failed to open the file #
    print( ( "Unable to open """ + file name + """", newline ) )
ELSE
    # file opened OK #
    BOOL at eof := FALSE;
    # set the EOF handler for the file #
    on logical file end( input file, ( REF FILE f )BOOL:
                                     BEGIN
                                         # note that we reached EOF on the #
                                         # latest read #
                                         at eof := TRUE;
                                         # return TRUE so processing can continue #
                                         TRUE
                                     END
                       );
    INT the count := 0;
    WHILE STRING word;
          get( input file, ( word, newline ) );
          NOT at eof
    DO
        IF INT w len = ( UPB word + 1 ) - LWB word;
           w len > 11
        THEN
            BOOL found the := FALSE;
            FOR w pos FROM LWB word TO UPB word - 2 WHILE NOT found the DO
                IF word[ w pos : w pos + 2 ] = "the" THEN
                    found the  := TRUE;
                    the count +:= 1;
                    print( ( word, " " ) );
                    IF the count MOD 6 = 0
                    THEN print( ( newline ) )
                    ELSE FROM w len + 1 TO 18 DO print( ( " " ) ) OD
                    FI
                FI
            OD
        FI
    OD;
    print( ( newline, "found ", whole( the count, 0 ), " ""the"" words", newline ) );
    close( input file )
FI
Output:
authenticate       chemotherapy       chrysanthemum      clothesbrush       clotheshorse       eratosthenes
featherbedding     featherbrain       featherweight      gaithersburg       hydrothermal       lighthearted
mathematician      neurasthenic       nevertheless       northeastern       northernmost       otherworldly
parasympathetic    physiotherapist    physiotherapy      psychotherapeutic  psychotherapist    psychotherapy
radiotherapy       southeastern       southernmost       theoretician       weatherbeaten      weatherproof
weatherstrip       weatherstripping
found 32 "the" words

AppleScript

AppleScripters can tackle this task in a variety of ways. The example handlers below are listed in order of increasing speed but all complete the task in under 0.2 seconds on my current machine. They all take a file specifier, search string, and minimum length as parameters and return identical results for the same input.

Using just the core language — 'words':

on wordsContaining(textfile, searchText, minLength)
    script o
        property wordList : missing value
        property output : {}
    end script
    
    -- Extract the text's 'words' and return any that meet both the search text and minimum length requirements.
    set o's wordList to words of (read (textfile as alias) as «class utf8»)
    repeat with thisWord in o's wordList
        if ((thisWord contains searchText) and (thisWord's length ≥ minLength)) then
            set end of o's output to thisWord's contents
        end if
    end repeat
    
    return o's output
end wordsContaining

Using just the core language — 'text items':

on wordsContaining(textFile, searchText, minLength)
    script o
        property textItems : missing value
        property output : {}
    end script
    
    -- Extract the text's search-text-delimited sections.
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to searchText
    set o's textItems to text items of (read (textFile as alias) as «class utf8»)
    set AppleScript's text item delimiters to astid
    
    -- Reconstitute any words containing the search text from the stubs at the section ends and
    -- the search text itself, returning any results which meet the minimum length requirement.
    set thisSection to beginning of o's textItems
    set sectionHasWords to ((count thisSection's words) > 0)
    considering white space
        repeat with i from 2 to (count o's textItems)
            set foundWord to searchText
            if (sectionHasWords) then
                set thisStub to thisSection's last word
                if (thisSection ends with thisStub) then set foundWord to thisStub & foundWord
            end if
            set thisSection to item i of o's textItems
            set sectionHasWords to ((count thisSection's words) > 0)
            if (sectionHasWords) then
                set thisStub to thisSection's first word
                if (thisSection begins with thisStub) then set foundWord to foundWord & thisStub
            end if
            if (foundWord's length ≥ minLength) then set end of o's output to foundWord
        end repeat
    end considering
    
    return o's output
end wordsContaining
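The 'text items' handler above splits the whole file on the search text. It then rebuilds each match from the word fragments on either side of every split point. Below is the same idea as a rough Python sketch. The name `words_containing_by_split` is hypothetical. Python's `\w` stands in for AppleScript's notion of a word character, so apostrophes and similar characters are treated differently:

```python
import re
from typing import List


def words_containing_by_split(text: str, sub: str, min_len: int) -> List[str]:
    """Rebuild the words around each occurrence of `sub` from the split sections."""
    sections = text.split(sub)
    found = []
    # Each adjacent pair of sections meets at one occurrence of `sub`.
    for left, right in zip(sections, sections[1:]):
        # \Z (not $): $ also matches just before a final newline, which would
        # wrongly pick up the word preceding that newline.
        before = re.search(r"\w*\Z", left).group()
        after = re.match(r"\w*", right).group()
        word = before + sub + after
        if len(word) >= min_len:
            found.append(word)
    return found
```

The handler follows AppleScript's case-sensitivity setting, but this sketch is always case-sensitive. Both approaches would break a word that contains the search text twice into two overlapping fragments. Every entry on this page reports the same 32 results, which suggests that unixdict.txt has no such word of 12 or more letters.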

Using a shell script:

on wordsContaining(textFile, searchText, minLength)
    -- Set up and execute a shell script which uses grep to find words containing the search text
    -- (matching AppleScript's current case-sensitivity setting) and awk to pass those which
    -- satisfy the minimum length requirement.
    if ("A" = "a") then
        set part1 to "grep -io "
    else
        set part1 to "grep -o "
    end if
    set shellCode to part1 & quoted form of ("\\b\\w*" & searchText & "\\w*\\b") & ¬
        (" <" & quoted form of textFile's POSIX path) & ¬
        (" | awk " & quoted form of ("// && length($0) >= " & minLength))
    
    return paragraphs of (do shell script shellCode)
end wordsContaining

Using Foundation methods (AppleScriptObjC):

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on wordsContaining(textFile, searchText, minLength)
    set theText to current application's class "NSMutableString"'s ¬
        stringWithContentsOfFile:(textFile's POSIX path) usedEncoding:(missing value) |error|:(missing value)
    -- Replace every run of non AppleScript 'word' characters with a linefeed.
    tell theText to replaceOccurrencesOfString:("(?:[\\W--[.'’]]|(?<!\\w)[.'’]|[.'’](?!\\w))++") withString:(linefeed) ¬
        options:(current application's NSRegularExpressionSearch) range:({0, its |length|()})
    -- Split the text at the linefeeds.
    set theWords to theText's componentsSeparatedByString:(linefeed)
    -- Filter the resulting array for strings which meet the search text and minimum length requirements,
    -- matching AppleScript's current case-sensitivity setting. NSString lengths are measured in 16-bit
    -- code units so use regex to check the lengths in characters.
    if ("A" = "a") then
        set filterTemplate to "((self CONTAINS[c] %@) && (self MATCHES %@))"
    else
        set filterTemplate to "((self CONTAINS %@) && (self MATCHES %@))"
    end if
    set filter to current application's class "NSPredicate"'s ¬
        predicateWithFormat_(filterTemplate, searchText, ".{" & minLength & ",}+")
    
    return (theWords's filteredArrayUsingPredicate:(filter)) as list
end wordsContaining
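The Foundation handler checks lengths with a regex because NSString's `length` counts UTF-16 code units. As a result, a character outside the Basic Multilingual Plane counts as two. The small Python illustration below shows the difference. `utf16_length` is a hypothetical helper, not a Foundation call, and Python's own `len` counts code points:

```python
def utf16_length(s: str) -> int:
    """Number of UTF-16 code units in s, the unit NSString's length is measured in."""
    return len(s.encode("utf-16-le")) // 2


word = "caf\u00e9\U0001D49C"  # 'café' followed by U+1D49C, a character outside the BMP
print(len(word), utf16_length(word))  # prints: 5 6
```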

Test code for the task with any of the above:

local textFile, output
set textFile to ((path to desktop as text) & "unixdict.txt") as «class furl»
-- considering case -- Uncomment this and the corresponding 'end' line for case-sensitive searches.
set output to wordsContaining(textFile, "the", 12)
-- end considering
return {count output, output}
Output:
{32, {"authenticate", "chemotherapy", "chrysanthemum", "clothesbrush", "clotheshorse", "eratosthenes", "featherbedding", "featherbrain", "featherweight", "gaithersburg", "hydrothermal", "lighthearted", "mathematician", "neurasthenic", "nevertheless", "northeastern", "northernmost", "otherworldly", "parasympathetic", "physiotherapist", "physiotherapy", "psychotherapeutic", "psychotherapist", "psychotherapy", "radiotherapy", "southeastern", "southernmost", "theoretician", "weatherbeaten", "weatherproof", "weatherstrip", "weatherstripping"}}

Arturo

print.lines
    select read.lines relative "unixdict.txt" 'l -> 
        and? [11 < size l]
             [contains? l "the"]
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

AutoHotkey

FileRead, wList, % A_Desktop "\unixdict.txt"
SubString := "the"
list := ContainSubStr(wList, SubString)
for i, v in list
    result .= i "- " v "`n"
MsgBox, 262144, , % result
return
 
ContainSubStr(wList, SubString){
    oRes := []
    for i, w in StrSplit(wList, "`n", "`r")
    {
        if (StrLen(w) < 12 || !InStr(w, SubString))
            continue
        oRes.Push(w)
    }
    return oRes
}
Output:
1- authenticate
2- chemotherapy
3- chrysanthemum
4- clothesbrush
5- clotheshorse
6- eratosthenes
7- featherbedding
8- featherbrain
9- featherweight
10- gaithersburg
11- hydrothermal
12- lighthearted
13- mathematician
14- neurasthenic
15- nevertheless
16- northeastern
17- northernmost
18- otherworldly
19- parasympathetic
20- physiotherapist
21- physiotherapy
22- psychotherapeutic
23- psychotherapist
24- psychotherapy
25- radiotherapy
26- southeastern
27- southernmost
28- theoretician
29- weatherbeaten
30- weatherproof
31- weatherstrip
32- weatherstripping

AutoIt

; Includes not needed if you don't want to use the constants
#include <FileConstants.au3>
#include <StringConstants.au3>
#include <MsgBoxConstants.au3>

;Initialise some variables and constants
Local Const $sFileName = "unixdict.txt"
Local Const $sStrToFind = "the"
Local $iFoundResults = 0

; Open the file for reading and store the handle to a variable.
Local $hFileOpen = FileOpen($sFileName, $FO_READ)
If $hFileOpen = -1 Then
   MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
   Exit 1
EndIf

; Read the contents of the file using the handle returned by FileOpen.
Local $sFileRead = FileRead($hFileOpen)

; Close the handle returned by FileOpen.
FileClose($hFileOpen)

; Get each "word" that's on a new line
Local $aArray = StringSplit($sFileRead, @CRLF)

; Loop through the array returned by StringSplit, checking each word's length and whether it contains the "the" substring.
For $i = 1 To $aArray[0]
   If StringLen($aArray[$i]) > 11 Then
	  If StringInStr($aArray[$i], $sStrToFind) <> 0 Then
		 ; Increment the found results counter
		 $iFoundResults += 1
		 ; Log the output
		 ConsoleWrite($aArray[$i])
		 ConsoleWrite(@CRLF)
	  EndIf
   EndIf
Next

ConsoleWrite("Found " & $iFoundResults & " words containing '" & $sStrToFind & "'")
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
Found 32 words containing 'the'

AWK

The following is an awk one-liner entered at a POSIX shell.

/Code$ awk  '/the/ && length($1) > 11' unixdict.txt
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
/Code$

BASIC

10 OPEN "I",1,"unixdict.txt"
20 IF EOF(1) THEN CLOSE #1: END
30 LINE INPUT #1,W$
40 IF LEN(W$)>11 AND INSTR(W$,"the") THEN PRINT W$
50 GOTO 20
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

BASIC256

f = freefile
open f, "i:\unixdict.txt"
while not eof(f)
	a$ = read (f)
	if length(a$) > 11 and instr(a$, "the") then print a$
end while
close f
Output:
Same as BASIC entry.

GW-BASIC

Works with: PC-BASIC version any
Works with: QBasic
10 OPEN "unixdict.txt" FOR INPUT AS #1
20 WHILE NOT EOF(1)
30   LINE INPUT #1, A$
40   IF LEN(A$) > 11 AND INSTR(A$,"the") THEN PRINT A$
50 WEND
60 CLOSE #1
70 END
Output:
Same as BASIC entry.

QBasic

Works with: QBasic version 1.1
OPEN "unixdict.txt" FOR INPUT AS #1
WHILE NOT EOF(1)
  LINE INPUT #1, W$
  IF LEN(W$) > 11 AND INSTR(W$, "the") THEN PRINT W$
WEND
CLOSE #1
END
Output:
Same as BASIC entry.

BCPL

get "libhdr"

let read(word) = valof
$(  let ch = ?
    word%0 := 0
    $(  ch := rdch()
        if ch = endstreamch then resultis false
        word%0 := word%0 + 1
        word%(word%0) := ch
    $) repeatuntil ch = '*N'
    resultis true
$)

let contains(s1,s2) = valof
$(  for i=1 to s1%0-s2%0+1
    $(  for j=1 to s2%0 
            unless s1%(i+j-1)=s2%j goto next
        resultis true
        next: loop
    $)
    resultis false
$)

// We need to test for a length of 12 rather than 11,
// because the newline character is included.
let match(word) = word%0 > 12 & contains(word,"the")

let start() be
$(  let word = vec 63
    let file = findinput("unixdict.txt")
    test file=0 do
        writes("Cannot open unixdict.txt*N")
    or
    $(  selectinput(file)
        while read(word) if match(word) do writes(word)
        endread()
    $)
$)
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

C

#include <stdio.h>
#include <string.h>

int main() {
    char word[128];
    FILE *f = fopen("unixdict.txt","r");
    if (!f) {
        fprintf(stderr, "Cannot open unixdict.txt\n");
        return -1;
    }
    // Loop on fgets() itself rather than feof(): feof() only becomes true
    // after a read has failed, so testing it first would re-test the
    // previous line once the final fgets() call fails.
    while (fgets(word, sizeof(word), f)) {
        // fgets() keeps the \n character, so a word of more than 11
        // letters has a strlen() greater than 12
        if (strlen(word) > 12 && strstr(word,"the"))
            printf("%s",word);
    }
    fclose(f);
    return 0;
}
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

C++

#include <iostream>
#include <fstream>
#include <string>

int main() {
    std::string word;
    std::ifstream file("unixdict.txt");
 
    if (!file) {
        std::cerr << "Cannot open unixdict.txt" << std::endl;
        return -1;
    }
    while (file >> word) {
        if (word.length() > 11 && word.find("the") != std::string::npos)
            std::cout << word << std::endl;
    }
    return 0;
}
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Common Lisp

(defun print-words-containing-substring (str len path)
  (with-open-file (s path :direction :input)
    (do ((line (read-line s nil :eof) (read-line s nil :eof)))
        ((eql line :eof)) (when (and (> (length line) len)
                                     (search str line))
                            (format t "~a~%" line)))))

(print-words-containing-substring "the" 11 "unixdict.txt")
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
NIL

Delphi

Translation of: Go
program Words_containing_the_substring;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.IOUtils;

var
  Words, WordsFound: TArray<string>;

begin
  Words := TFile.ReadAllLines('unixdict.txt');

  for var w in Words do
  begin
    if (w.Length > 11) and (w.IndexOf('the') > -1) then
    begin
      SetLength(WordsFound, Length(WordsFound) + 1);
      WordsFound[High(WordsFound)] := w;
    end;
  end;
  writeln('Words containing "the" having a length > 11 in unixdict.txt:');

  for var i := 0 to High(WordsFound) do
    writeln(i + 1: 2, ': ', WordsFound[i]);

  readln;
end.

Draco

\util.g

proc theword(*char line) bool:
    CharsLen(line) > 11 
    and CharsIndex(line, "the") ~= -1
corp

proc nonrec main() void:
    file(1024) dictfile;
    [32] char buf;
    *char line;
    channel input text dict;
    
    open(dict, dictfile, "unixdict.txt");
    line := &buf[0];
    
    while readln(dict; line) do
        if theword(line) then writeln(line) fi
    od;
    
    close(dict)
corp
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Factor

Works with: Factor version 0.99 2020-08-14
USING: io io.encodings.ascii io.files kernel math sequences ;

"unixdict.txt" ascii file-lines
[ length 11 > ] filter
[ "the" swap subseq? ] filter
[ print ] each
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Forth

Developed with Gforth 0.7.9

11    constant      WordLen
128   constant      max-line

create              SearchSub  80 allot
Create              SrcFile   256 allot
Variable            fhin
variable            Cnt

: SrcOpen           Srcfile count r/o open-file throw Fhin ! ;
: SrcClose          fhin @ close-file throw ;
: third             >r over r> swap ;
: cnt++             cnt 1 swap   +! ;
: SubStrFound       SearchSub count  Search ;

: read-lines        fhin @
                    begin  pad max-line third read-line throw
                    while  pad swap dup WordLen >
                           if   2dup  SubStrFound -rot  2drop
                                if cnt++ cr  type else 2drop then
                           else 2DROP
                           then
                    repeat 2drop  ;

: Test              0 cnt !
                    s" ./unixdict.txt"  SrcFile   place
                    s" the"             SearchSub place
                    SrcOpen
                    read-lines
                    cr ." =============="
                    cr ." Found " cnt @  . ." Words" cr
                    SrcClose ;

Test
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
==============
Found 32 Words

Fortran

program main
implicit none
integer :: lun
character(len=256) :: line
integer :: ios
   open(file='unixdict.txt',newunit=lun)
   do
      read(lun,'(a)',iostat=ios)line
      if(ios /= 0)exit
      if( index(line,'the') /= 0 .and. len_trim(line) > 11 ) then
         write(*,'(a)')trim(line)
      endif
   enddo
end program main
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

FreeBASIC

Reuses some code from Odd words#FreeBASIC

#define NULL 0

type node
    word as string*32   'enough space to store any word in the dictionary
    nxt as node ptr
end type

function addword( tail as node ptr, word as string ) as node ptr
    'allocates memory for a new node, links the previous tail to it,
    'and returns the address of the new node
    dim as node ptr newnode = allocate(sizeof(node))
    tail->nxt = newnode
    newnode->nxt = NULL
    newnode->word = word
    return newnode
end function

function length( word as string ) as uinteger
    'necessary replacement for the built-in len function, which in this
    'case would always return 32
    for i as uinteger = 1 to 32
        if asc(mid(word,i,1)) = 0 then return i-1
    next i
    return 999
end function

dim as string word
dim as node ptr tail = allocate( sizeof(node) )
dim as node ptr head = tail, curr = head, currj
tail->nxt = NULL
tail->word = "XXXXHEADER"

open "unixdict.txt" for input as #1
while true
    line input #1, word
    if word = "" then exit while
    if length(word)>11 then tail = addword( tail, word )
wend
close #1

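curr = head->nxt            'skip the dummy header node
while curr <> NULL          'visit every node, including the last one
    'a 3-letter match can start as late as position length-2
    for i as uinteger = 1 to length(curr->word)-2
        if mid(curr->word,i,3) = "the" then
            print curr->word
            exit for        'print each word once, even if "the" recurs
        end if
    next i
    curr = curr->nxt
wend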
Output:
authenticate                    
chemotherapy                    
chrysanthemum                   
clothesbrush                    
clotheshorse                    
eratosthenes                    
featherbedding                  
featherbrain                    
featherweight                   
gaithersburg                    
hydrothermal                    
lighthearted                    
mathematician                   
neurasthenic                    
nevertheless                    
northeastern                    
northernmost                    
otherworldly                    
parasympathetic                 
physiotherapist                 
physiotherapy                   
psychotherapeutic               
psychotherapist                 
psychotherapy                   
radiotherapy                    
southeastern                    
southernmost                    
theoretician                    
weatherbeaten                   
weatherproof                    
weatherstrip                    
weatherstripping

FutureBasic

include "NSLog.incl"

#plist NSAppTransportSecurity @{NSAllowsArbitraryLoads:YES}

void local fn DoIt
  CFURLRef          url
  CFStringRef       string, wd
  ErrorRef          err = NULL
  CFArrayRef        array
  CFMutableArrayRef mutArray
  
  url = fn URLWithString( @"https://web.archive.org/web/20180611003215/http://www.puzzlers.org/pub/wordlists/unixdict.txt" )
  string = fn StringWithContentsOfURL( url, NSUTF8StringEncoding, @err )
  if ( string )
    array = fn StringComponentsSeparatedByCharactersInSet( string, fn CharacterSetNewlineSet )
    mutArray = fn MutableArrayWithCapacity(0)
    for wd in array
      if ( len(wd) > 11 and fn StringContainsString( wd, @"the" ) )
        MutableArrayAddObject( mutArray, wd )
      end if
    next
    string = fn ArrayComponentsJoinedByString( mutArray, @"\n" )
    
    NSLog(@"%@",string)
    
  else
    NSLog(@"%@",err)
  end if
end fn

fn DoIt

HandleEvents
Output:
authenticate                    
chemotherapy                    
chrysanthemum                   
clothesbrush                    
clotheshorse                    
eratosthenes                    
featherbedding                  
featherbrain                    
featherweight                   
gaithersburg                    
hydrothermal                    
lighthearted                    
mathematician                   
neurasthenic                    
nevertheless                    
northeastern                    
northernmost                    
otherworldly                    
parasympathetic                 
physiotherapist                 
physiotherapy                   
psychotherapeutic               
psychotherapist                 
psychotherapy                   
radiotherapy                    
southeastern                    
southernmost                    
theoretician                    
weatherbeaten                   
weatherproof                    
weatherstrip                    
weatherstripping

Go

package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "log"
    "strings"
    "unicode/utf8"
)

func main() {
    wordList := "unixdict.txt"
    b, err := ioutil.ReadFile(wordList)
    if err != nil {
        log.Fatal("Error reading file")
    }
    bwords := bytes.Fields(b)
    var words []string
    for _, bword := range bwords {
        s := string(bword)
        if utf8.RuneCountInString(s) > 11 {
            words = append(words, s)
        }
    }
    count := 0
    fmt.Println("Words containing 'the' having a length > 11 in", wordList, "\b:")
    for _, word := range words {
        if strings.Contains(word, "the") {
            count++
            fmt.Printf("%2d: %s\n", count, word)
        }
    }
}
Output:
Words containing 'the' having a length > 11 in unixdict.txt:
 1: authenticate
 2: chemotherapy
 3: chrysanthemum
 4: clothesbrush
 5: clotheshorse
 6: eratosthenes
 7: featherbedding
 8: featherbrain
 9: featherweight
10: gaithersburg
11: hydrothermal
12: lighthearted
13: mathematician
14: neurasthenic
15: nevertheless
16: northeastern
17: northernmost
18: otherworldly
19: parasympathetic
20: physiotherapist
21: physiotherapy
22: psychotherapeutic
23: psychotherapist
24: psychotherapy
25: radiotherapy
26: southeastern
27: southernmost
28: theoretician
29: weatherbeaten
30: weatherproof
31: weatherstrip
32: weatherstripping

Haskell

import System.IO (readFile)
import Data.List (isInfixOf)

main = do
  txt <- readFile "unixdict.txt"
  let res = [ w | w <- lines txt, isInfixOf "the" w, length w > 11 ]
  putStrLn $ show (length res) ++ " words were found:"
  mapM_ putStrLn res
λ> main
32 words were found:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

J

   >(#~ (+./@E.~&'the'*11<#)@>) cutLF fread'unixdict.txt'
authenticate     
chemotherapy     
chrysanthemum    
clothesbrush     
clotheshorse     
eratosthenes     
featherbedding   
featherbrain     
featherweight    
gaithersburg     
hydrothermal     
lighthearted     
mathematician    
neurasthenic     
nevertheless     
northeastern     
northernmost     
otherworldly     
parasympathetic  
physiotherapist  
physiotherapy    
psychotherapeutic
psychotherapist  
psychotherapy    
radiotherapy     
southeastern     
southernmost     
theoretician     
weatherbeaten    
weatherproof     
weatherstrip     
weatherstripping

JavaScript

document.write(`
  <p>Select a file:        <input type="file" id="file"></p>
  <p>Get words containing: <input value="THE" type="text" id="cont"></p>
  <p>Min. word length:     <input type="number" value="12" id="len"></p>
  <div id="info"></div><div id="out"></div>
`);

function search(inp) {
  let cont = document.getElementById('cont').value.toUpperCase(),
      len  = parseInt(document.getElementById('len').value),
      out  = document.getElementById('out'),
      info = document.getElementById('info'),
      result = [], i;
  inp = inp.split(/\r?\n/);  // one word per line; accept both LF and CRLF line endings
  for (i = 0; i < inp.length; i++)
    if (inp[i].length >= len && inp[i].toUpperCase().indexOf(cont) != -1)
      result.push(inp[i]);
  info.innerHTML = `<h2>${result.length} matches found for ${cont}, min. length ${len}:</h2>`;
  out.innerText = result.join(', ');
}

document.getElementById('file').onchange = function() {
  let fr = new FileReader(),
      f = document.getElementById('file').files[0];
  fr.onload = function() { search(fr.result); }
  fr.readAsText(f);
}
Output:
32 matches found for THE, min. length 12:
authenticate, chemotherapy, chrysanthemum, clothesbrush, clotheshorse, eratosthenes, featherbedding, featherbrain, featherweight, gaithersburg, hydrothermal, lighthearted, mathematician, neurasthenic, nevertheless, northeastern, northernmost, otherworldly, parasympathetic, physiotherapist, physiotherapy, psychotherapeutic, psychotherapist, psychotherapy, radiotherapy, southeastern, southernmost, theoretician, weatherbeaten, weatherproof, weatherstrip, weatherstripping

jq

jq -nrR 'inputs|select(length>11 and index("the"))' unixdict.txt

One could also use `test("the")` here instead; the difference is that `test` interprets its (JSON string) argument as a regular expression, whereas `index` searches for it literally.

Output:

As for 11l et al.
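For a plain word like "the" the two jq filters agree; they only diverge when the search string contains regular-expression metacharacters. The same contrast is sketched below in Python (already used elsewhere on this page), with `str.find` standing in for a literal search and `re.search` for a regex test. The helper names and the `t.e` sample pattern are hypothetical, chosen only to illustrate the difference; jq's regex engine (Oniguruma) also differs from Python's `re` in some details.

```python
import re

def contains_literal(word: str, sub: str) -> bool:
    # Literal search, comparable to jq's index/1 here: `sub` is matched character for character.
    return word.find(sub) != -1

def contains_regex(word: str, pattern: str) -> bool:
    # Regex search, comparable to jq's test/1: `pattern` is compiled as a regular expression.
    return re.search(pattern, word) is not None

print(contains_literal("weatherproof", "the"), contains_regex("weatherproof", "the"))  # True True
print(contains_literal("tie", "t.e"), contains_regex("tie", "t.e"))                    # False True
```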

Julia

See Alternade_words for the foreachword function.

containsthe(w, d) = occursin("the", w) ? w : ""
foreachword("unixdict.txt", containsthe, minlen = 12)
Output:
Word source: unixdict.txt

authenticate   chemotherapy   chrysanthemum  clothesbrush   clotheshorse   eratosthenes
featherbedding featherbrain   featherweight  gaithersburg   hydrothermal   lighthearted
mathematician  neurasthenic   nevertheless   northeastern   northernmost   otherworldly
parasympatheticphysiotherapistphysiotherapy  psychotherapeuticpsychotherapistpsychotherapy
radiotherapy   southeastern   southernmost   theoretician   weatherbeaten  weatherproof   
weatherstrip   weatherstripping

Lua

for word in io.open("unixdict.txt", "r"):lines() do
  if #word > 11 and word:find("the") then
    print(word)
  end
end
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Mathematica/Wolfram Language

dict = Once[Import["https://web.archive.org/web/20180611003215/http://www.puzzlers.org/pub/wordlists/unixdict.txt"]];
dict //= StringSplit[#, "\n"] &;
dict //= Select[StringLength /* GreaterThan[11]];
Select[dict, StringContainsQ["the"]]
Output:
{authenticate, chemotherapy, chrysanthemum, clothesbrush, clotheshorse, eratosthenes, featherbedding, featherbrain, featherweight, gaithersburg, hydrothermal, lighthearted, mathematician, neurasthenic, nevertheless, northeastern, northernmost, otherworldly, parasympathetic, physiotherapist, physiotherapy, psychotherapeutic, psychotherapist, psychotherapy, radiotherapy, southeastern, southernmost, theoretician, weatherbeaten, weatherproof, weatherstrip, weatherstripping}

min

Works with: min version 0.27.1
"unixdict.txt" fread "\n" split
(length 11 >) filter
("the" indexof -1 !=) filter
(puts!) foreach
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Nanoquery

words = split(new(Nanoquery.IO.File).open("unixdict.txt").readAll(),"\n")
for word in words
    if (word .contains. "the") and (len(word) > 11)
        println word
    end if
end for

Nim

import strutils

var count = 0
for word in "unixdict.txt".lines:
  if word.len > 11 and word.contains("the"):
    inc count
    echo ($count).align(2), ' ', word
Output:
 1 authenticate
 2 chemotherapy
 3 chrysanthemum
 4 clothesbrush
 5 clotheshorse
 6 eratosthenes
 7 featherbedding
 8 featherbrain
 9 featherweight
10 gaithersburg
11 hydrothermal
12 lighthearted
13 mathematician
14 neurasthenic
15 nevertheless
16 northeastern
17 northernmost
18 otherworldly
19 parasympathetic
20 physiotherapist
21 physiotherapy
22 psychotherapeutic
23 psychotherapist
24 psychotherapy
25 radiotherapy
26 southeastern
27 southernmost
28 theoretician
29 weatherbeaten
30 weatherproof
31 weatherstrip
32 weatherstripping

Objeck

class Thes {
  function : Main(args : String[]) ~ Nil {
    if(args->Size() = 1) {
      reader := System.IO.File.FileReader->New(args[0]);
      words := Collection.Generic.Vector->New()<String>;
      line := reader->ReadLine();
      while(line <> Nil) {
        if(line->Size() > 11 & line->Has("the")) {
          words->AddBack(line);
        };
        line := reader->ReadLine();
      };
      reader->Close();

      found := words->Size();
      "Found {$found} word(s):"->PrintLine();
      each(i : words) {
        word := words->Get(i);
        "{$word} "->Print();
        if(i > 0 & i % 5 = 0) {
          '\n'->Print();
        };
      };
    };
  }
}
Output:
Found 32 word(s):
authenticate chemotherapy chrysanthemum clothesbrush clotheshorse eratosthenes
featherbedding featherbrain featherweight gaithersburg hydrothermal
lighthearted mathematician neurasthenic nevertheless northeastern
northernmost otherworldly parasympathetic physiotherapist physiotherapy
psychotherapeutic psychotherapist psychotherapy radiotherapy southeastern
southernmost theoretician weatherbeaten weatherproof weatherstrip
weatherstripping

Pascal

Works with: Extended Pascal
program wordsContainingTheSubstring(input, output);
var
	word: string(22);
begin
	while not EOF do
	begin
		readLn(word);
		
		if (length(word) > 11) and_then (index(word, 'the') > 0) then
		begin
			writeLn(word)
		end
	end
end.

If unixdict.txt is fed to stdin, the standard input file, you will get the following output:

Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Perl

Perl one-liner entered from a POSIX shell:

/Code$ perl -n -e '/(\w*the\w*)/ && length($1)>11 && print' unixdict.txt
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
/Code$

Phix

with javascript_semantics
function the(string word) return length(word)>11 and match("the",word) end function
sequence words = filter(unix_dict(),the)
printf(1,"found %d 'the' words:\n%s\n",{length(words),join(shorten(words,"",3),", ")})
Output:
found 32 'the' words:
authenticate, chemotherapy, chrysanthemum, ..., weatherproof, weatherstrip, weatherstripping

PHP

<?php foreach(file("unixdict.txt") as $w) echo (strstr($w, "the") && strlen(trim($w)) > 11) ? $w : "";

Plain English

To run:
Start up.
Put "c:\unixdict.txt" into a path.
Read the path into a buffer.
Slap a rider on the buffer.
Loop.
Move the rider (text file rules).
Subtract 1 from the rider's token's last. \newline
Put the rider's token into a word string.
If the word is blank, break.
If the word's length is less than 12, repeat.
If "the" is in the word, write the word on the console.
Repeat.
Wait for the escape key.
Shut down.
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

PL/I

the: procedure options(main);
    declare dict file;
    open file(dict) title('unixdict.txt');
    on endfile(dict) stop;
    
    declare word char(32) varying;
    do while('1'b);
        get file(dict) list(word);
        if length(word) > 11 & index(word,'the') ^= 0 then
            put skip list(word);
    end;
    
    close file(dict);
end the;
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Processing

String[] words = loadStrings("unixdict.txt");
for (String word : words) {
  if (word.contains("the") && word.length() > 11) {
    println(word);
  }
}
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Python

import urllib.request as request

with request.urlopen("http://wiki.puzzlers.org/pub/wordlists/unixdict.txt") as f:
    a = f.read().decode("ASCII").split()

for s in a:
    if len(s) > 11 and "the" in s:
        print(s)
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Quackery

Uses a finite state machine to search efficiently for a substring. (The fsm to search for "the" is built only once, during compilation.) Presented as a dialogue in the Quackery shell (REPL).

/O>   [ $ 'sundry/fsm.qky' loadfile ] now!
...   [ dup 
...     [ $ 'the' buildfsm ] constant 
...     usefsm over found ]           is contains-"the"
...   [] 
...   $ 'unixdict.txt' sharefile drop 
...   nest$ witheach 
...     [ dup size 12 < iff drop done
...       contains-"the" iff [ nested join ]
...       else drop ]
...   60 wrap$ cr
... 

authenticate chemotherapy chrysanthemum clothesbrush
clotheshorse eratosthenes featherbedding featherbrain
featherweight gaithersburg hydrothermal lighthearted
mathematician neurasthenic nevertheless northeastern
northernmost otherworldly parasympathetic physiotherapist
physiotherapy psychotherapeutic psychotherapist
psychotherapy radiotherapy southeastern southernmost
theoretician weatherbeaten weatherproof weatherstrip
weatherstripping

Stack empty.
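The Quackery entry compiles "the" into a finite state machine once and then runs every word through it. The internals of `buildfsm`/`usefsm` in sundry/fsm.qky are not shown here, so the sketch below swaps in a standard Knuth-Morris-Pratt style DFA in Python to illustrate the general idea; `build_dfa` and `dfa_contains` are hypothetical names, not a port of that library.

```python
from typing import Dict, List

def build_dfa(pattern: str) -> List[Dict[str, int]]:
    """Compile `pattern` into a KMP-style DFA.

    dfa[state][ch] gives the next state; reaching state len(pattern) means a match.
    Characters missing from a state's dict lead back to state 0.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    alphabet = set(pattern)
    dfa: List[Dict[str, int]] = [{} for _ in pattern]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the DFA reaches on pattern[1:j]; used for mismatches
    for j in range(1, len(pattern)):
        for ch in alphabet:
            dfa[j][ch] = dfa[restart].get(ch, 0)  # mismatch: copy the restart state's move
        dfa[j][pattern[j]] = j + 1               # match: advance one state
        restart = dfa[restart].get(pattern[j], 0)
    return dfa

def dfa_contains(dfa: List[Dict[str, int]], word: str) -> bool:
    """Scan `word` once, left to right, without backtracking."""
    state, accept = 0, len(dfa)
    for ch in word:
        state = dfa[state].get(ch, 0)
        if state == accept:
            return True
    return False

THE_DFA = build_dfa("the")  # built once, reused for every word
sample = ["authenticate", "brotherhood", "weatherproof", "checkerboard"]
print([w for w in sample if len(w) > 11 and dfa_contains(THE_DFA, w)])
# prints ['authenticate', 'weatherproof']
```

Each word is scanned in a single pass with no backtracking, which is what makes building the machine only once worthwhile.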


R

words <- readLines("http://wiki.puzzlers.org/pub/wordlists/unixdict.txt")
grep("the", words[nchar(words) > 11], value = TRUE)
Output:
 [1] "authenticate"      "chemotherapy"      "chrysanthemum"     "clothesbrush"     
 [5] "clotheshorse"      "eratosthenes"      "featherbedding"    "featherbrain"     
 [9] "featherweight"     "gaithersburg"      "hydrothermal"      "lighthearted"     
[13] "mathematician"     "neurasthenic"      "nevertheless"      "northeastern"     
[17] "northernmost"      "otherworldly"      "parasympathetic"   "physiotherapist"  
[21] "physiotherapy"     "psychotherapeutic" "psychotherapist"   "psychotherapy"    
[25] "radiotherapy"      "southeastern"      "southernmost"      "theoretician"     
[29] "weatherbeaten"     "weatherproof"      "weatherstrip"      "weatherstripping"

Raku

A trivial modification of the ABC words task.

put 'unixdict.txt'.IO.words».fc.grep({ (.chars > 11) && (.contains: 'the') })\
    .&{"{+$_} words:\n  " ~ .batch(8)».fmt('%-17s').join: "\n  "};
Output:
32 words:
  authenticate      chemotherapy      chrysanthemum     clothesbrush      clotheshorse      eratosthenes      featherbedding    featherbrain     
  featherweight     gaithersburg      hydrothermal      lighthearted      mathematician     neurasthenic      nevertheless      northeastern     
  northernmost      otherworldly      parasympathetic   physiotherapist   physiotherapy     psychotherapeutic psychotherapist   psychotherapy    
  radiotherapy      southeastern      southernmost      theoretician      weatherbeaten     weatherproof      weatherstrip      weatherstripping 

Red

Red[]

foreach word read/lines %unixdict.txt [
    if all [11 < length? word find word "the"] [print word]
]
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Refal

$ENTRY Go {
    , <ReadFile 1 'unixdict.txt'>: e.Dict
    = <Each Show <Filter TheWord e.Dict>>;
};

TheWord {
    (e.Word), e.Word: e.X 'the' e.Y,
              <Lenw e.Word>: s.Len e.Word,
              <Compare s.Len 11>: '+' = T;
    (e.Word) = F;
};

ReadFile {
    s.Chan e.Filename =
        <Open 'r' s.Chan e.Filename>
        <ReadFile (s.Chan)>;

    (s.Chan), <Get s.Chan>: {
        0 = <Close s.Chan>;
        e.Line = (e.Line) <ReadFile (s.Chan)>;
    };
};

Each {
    s.F = ;
    s.F t.I e.X = <Mu s.F t.I> <Each s.F e.X>;
};

Filter {
    s.F = ;
    s.F t.I e.X, <Mu s.F t.I>: {
        T = t.I <Filter s.F e.X>;
        F = <Filter s.F e.X>;
    };
};

Show {
    (e.X) = <Prout e.X>;
};
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

REXX

This REXX version doesn't care what order the words in the dictionary are in,   nor does it care what
case  (lower/upper/mixed)  the words are in;   the search for the substring   the   is   caseless.

It also allows the substring to be specified on the command line (CL) as well as the dictionary file identifier.


Programming note:   If the minimum length is negative,   the words are found   (but not displayed),   and
only the count of found words is displayed.

/*REXX program finds words that contain the substring "the" (within an identified dict.)*/
parse arg $ minL iFID .                          /*obtain optional arguments from the CL*/
if    $=='' |    $=="," then    $= 'the'         /*Not specified?  Then use the default.*/
if minL=='' | minL=="," then minL= 12            /* "      "         "   "   "     "    */
if iFID=='' | iFID=="," then iFID='unixdict.txt' /* "      "         "   "   "     "    */
tell= minL>0;                minL= abs(minL)     /*use absolute value of minimum length.*/
@.=                                              /*default value of any dictionary word.*/
            do #=1  while lines(iFID)\==0        /*read each word in the file  (word=X).*/
            @.#= strip( linein( iFID) )          /*pick off a word from the input line. */
            end   /*#*/
#= # - 1                                         /*adjust word count because of DO loop.*/
$u= $;                                  upper $u /*obtain an uppercase version of  $.   */
say copies('─', 25)     #     "words in the dictionary file: "       iFID
say
finds= 0                                         /*count of the substring found in dict.*/
         do j=1  for #;     z= @.j;     upper z  /*process all the words that were found*/
         if length(z)<minL  then iterate         /*Is word too short?    Yes, then skip.*/
         if pos($u, z)==0   then iterate         /*Found the substring?   No,   "    "  */
         finds= finds + 1                        /*bump count of substring words found. */
         if tell  then say right(left(@.j, 20), 25)    /*Show it?  Indent original word.*/
         end        /*j*/
                                                 /*stick a fork in it,  we're all done. */
say copies('─', 25)     finds     " words (with a min. length of"  ,
                                  minL') that contain the substring: '     $
output   when using the default inputs:
───────────────────────── 25104 words in the dictionary file:  unixdict.txt
     authenticate
     chemotherapy
     chrysanthemum
     clothesbrush
     clotheshorse
     eratosthenes
     featherbedding
     featherbrain
     featherweight
     gaithersburg
     hydrothermal
     lighthearted
     mathematician
     neurasthenic
     nevertheless
     northeastern
     northernmost
     otherworldly
     parasympathetic
     physiotherapist
     physiotherapy
     psychotherapeutic
     psychotherapist
     psychotherapy
     radiotherapy
     southeastern
     southernmost
     theoretician
     weatherbeaten
     weatherproof
     weatherstrip
     weatherstripping
───────────────────────── 32  words (with a min. length of 12) that contain the substring:  the
output   when using the input of:     ,   -3
───────────────────────── 25105 words in the dictionary file:  unixdict.txt
───────────────────────── 287  words (with a min. length of 3) that contain the substring:  the

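The two REXX behaviours described in the programming notes (a caseless match, and a negative minimum length meaning "count, but don't display") can be sketched in Python as follows. `find_words` and its parameters are hypothetical; they only mirror the REXX command-line arguments, and the sample words stand in for unixdict.txt.

```python
from typing import Iterable, List, Tuple

def find_words(words: Iterable[str], sub: str = "the", min_len: int = 12) -> Tuple[int, List[str]]:
    """Caseless search for `sub`; a negative `min_len` means count only, display nothing."""
    tell = min_len > 0          # as in the REXX: only show words when the min length is positive
    min_len = abs(min_len)
    needle = sub.upper()        # compare in uppercase so the match is caseless
    found = [w for w in (x.strip() for x in words)
             if len(w) >= min_len and needle in w.upper()]
    return len(found), (found if tell else [])

sample = ["Weatherproof", "brotherhood", "THEORETICIAN", "checkerboard"]
print(find_words(sample))             # (2, ['Weatherproof', 'THEORETICIAN'])
print(find_words(sample, "the", -3))  # (3, [])
```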
Ring

cStr = read("unixdict.txt")
wordList = str2list(cStr)
num = 0
the = "the"

see "working..." + nl

ln = len(wordList)
for n = ln to 1 step -1
    if len(wordList[n]) < 12
       del(wordList,n)
    ok
next

see 'Words containing "the" substring:' + nl

for n = 1 to len(wordList)
    ind = substr(wordList[n],the)
    if ind > 0
       num = num +1
       see "" + num + ". " + wordList[n] + nl
    ok
next

see "done..." + nl

Output:

working...
Words containing "the" substring:
1. authenticate
2. chemotherapy
3. chrysanthemum
4. clothesbrush
5. clotheshorse
6. eratosthenes
7. featherbedding
8. featherbrain
9. featherweight
10. gaithersburg
11. hydrothermal
12. lighthearted
13. mathematician
14. neurasthenic
15. nevertheless
16. northeastern
17. northernmost
18. otherworldly
19. parasympathetic
20. physiotherapist
21. physiotherapy
22. psychotherapeutic
23. psychotherapist
24. psychotherapy
25. radiotherapy
26. southeastern
27. southernmost
28. theoretician
29. weatherbeaten
30. weatherproof
31. weatherstrip
32. weatherstripping
done...

Ruby

File.foreach("unixdict.txt", chomp: true) { |w| puts w if w.size > 11 && w.include?("the") }
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Smalltalk

Works with: Smalltalk/X
'unixdict.txt' asFilename contents  
    select:[:word | (word size > 11) and:[word includesString:'the']]
    thenDo:#transcribeCR

If counting per word is required (which is overkill here, as there are no duplicate words in the file), keep them in a bag:

bagOfWords := Bag new.
'unixdict.txt' asFilename contents  
    select:[:word | (word size > 11) and:[word includesString:'the']]
    thenDo:[:word | bagOfWords add:word. word transcribeCR].

bagOfWords transcribeCR.
bagOfWords size transcribeCR

Note: #transcribeCR is a method in Object which says: "Transcript showCR:self".
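Python's `collections.Counter` is a multiset much like the Smalltalk `Bag` used above. A minimal sketch, assuming an inline word list in place of unixdict.txt (the duplicate entry is there only to show counting); `bag_of_matches` is a hypothetical helper name. `sum(bag.values())` is taken here as the counterpart of `bagOfWords size`, i.e. the total number of occurrences.

```python
from collections import Counter
from typing import Iterable

def bag_of_matches(words: Iterable[str], sub: str = "the", min_len: int = 12) -> Counter:
    """Add every qualifying word to a Counter, as the Smalltalk code adds each to a Bag."""
    return Counter(w for w in words if len(w) >= min_len and sub in w)

bag = bag_of_matches(["weatherproof", "theoretician", "weatherproof", "brotherhood"])
print(bag["weatherproof"], bag["theoretician"])  # 2 1
print(len(bag), sum(bag.values()))               # 2 3  (distinct words, total occurrences)
```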

Works with: Smalltalk/X

Variant (as script file). Save to file: "filter.st":

#! /usr/bin/env stx --script
[Stdin atEnd] whileFalse:[
    |word|
  ((word := Stdin nextLine) size > 11
      and:[word includesString:'the']
    ) ifTrue:[
        Stdout nextPutLine: word
   ]
]

Execute with:

chmod +x filter.st
./filter.st < unixdict.txt

The output from the above counting snippet:

Output:
authenticate
chemotherapy
chrysanthemum
...
weatherproof
weatherstrip
weatherstripping

Bag(chrysanthemum(*1) hydrothermal(*1) nevertheless(*1) chemotherapy(*1) eratosthenes(*1)
    mathematician(*1) ... theoretician(*1) weatherbeaten(*1) weatherstripping(*1))

32

sed

#!/bin/sed -f

/^.\{12\}/!d
/the/!d

SETL

program the_words;
    dict := open("unixdict.txt", "r");
    loop doing geta(dict, word); until eof(dict) do
        word ?:= "";
        if #word > 11 and "the" in word then
            print(word);
        end if;
    end loop;
    close(dict);
end program;
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Standard ML

val hasThe = String.isSubstring "the"

fun isThe12 s = size s > 11 andalso hasThe s

val () = print
  ((String.concatWith " "
    o List.filter isThe12
    o String.tokens Char.isSpace
    o TextIO.inputAll) TextIO.stdIn ^ "\n")
Output:
authenticate chemotherapy chrysanthemum clothesbrush clotheshorse eratosthenes featherbedding featherbrain featherweight gaithersburg hydrothermal lighthearted mathematician neurasthenic nevertheless northeastern northernmost otherworldly parasympathetic physiotherapist physiotherapy psychotherapeutic psychotherapist psychotherapy radiotherapy southeastern southernmost theoretician weatherbeaten weatherproof weatherstrip weatherstripping

Swift

import Foundation

let minLength = 12
let substring = "the"

do {
    try String(contentsOfFile: "unixdict.txt", encoding: String.Encoding.ascii)
        .components(separatedBy: "\n")
        .filter{$0.count >= minLength && $0.contains(substring)}
        .enumerated()
        .forEach{print(String(format: "%2d. %@", $0.0 + 1, $0.1))}
} catch {
    print(error.localizedDescription)
}
Output:
 1. authenticate
 2. chemotherapy
 3. chrysanthemum
 4. clothesbrush
 5. clotheshorse
 6. eratosthenes
 7. featherbedding
 8. featherbrain
 9. featherweight
10. gaithersburg
11. hydrothermal
12. lighthearted
13. mathematician
14. neurasthenic
15. nevertheless
16. northeastern
17. northernmost
18. otherworldly
19. parasympathetic
20. physiotherapist
21. physiotherapy
22. psychotherapeutic
23. psychotherapist
24. psychotherapy
25. radiotherapy
26. southeastern
27. southernmost
28. theoretician
29. weatherbeaten
30. weatherproof
31. weatherstrip
32. weatherstripping

Tcl

foreach w [read [open unixdict.txt]] {
    if {[string first the $w] != -1 && [string length $w] > 11} {
        puts $w
    }
}
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

VBA

Sub Main_Contain()
Dim ListeWords() As String, Book As String, i As Long, out() As String, count As Integer
    Book = Read_File("C:\Users\" & Environ("Username") & "\Desktop\unixdict.txt")
    ListeWords = Split(Book, vbNewLine)
    For i = LBound(ListeWords) To UBound(ListeWords)
        If Len(ListeWords(i)) > 11 Then
            If InStr(ListeWords(i), "the") > 0 Then
                ReDim Preserve out(count)
                out(count) = ListeWords(i)
                count = count + 1
            End If
        End If
    Next
    Debug.Print "Found : " & count & " words : " & Join(out, ", ")
End Sub
Private Function Read_File(Fic As String) As String
Dim Nb As Integer
    Nb = FreeFile
    Open Fic For Input As #Nb
        Read_File = Input(LOF(Nb), #Nb)
    Close #Nb
End Function
Output:
Found : 32 words : authenticate, chemotherapy, chrysanthemum, clothesbrush, clotheshorse, eratosthenes, featherbedding, featherbrain,
 featherweight, gaithersburg, hydrothermal, lighthearted, mathematician, neurasthenic, nevertheless, northeastern, northernmost, 
otherworldly, parasympathetic, physiotherapist, physiotherapy, psychotherapeutic, psychotherapist, psychotherapy, radiotherapy, 
southeastern, southernmost, theoretician, weatherbeaten, weatherproof, weatherstrip, weatherstripping

VBScript

Run it with Cscript

with createobject("ADODB.Stream")
  .charset ="UTF-8"
  .open
  .loadfromfile("unixdict.txt")
  s=.readtext
end with  
a=split (s,vblf)
with new regexp
  .pattern=".*?the.*"

for each i in a
  if len(trim(i))>11 then
   if .test(i) then wscript.echo i
  end if
next
end with
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

V (Vlang)

import os

fn main() {
    mut count := 1
    mut text := ''
    unixdict := os.read_file('./unixdict.txt') or { panic('file not found') }
    for word in unixdict.split_into_lines() {
        if word.contains('the') && word.len > 11 {
            // ++ is a statement in V, so increment after building the line
            text += '${count}: ${word}\n'
            count++
        }
    }
    println(text)
}
Output:
1: authenticate
2: chemotherapy
3: chrysanthemum
4: clothesbrush
5: clotheshorse
6: eratosthenes
7: featherbedding
8: featherbrain
9: featherweight
10: gaithersburg
11: hydrothermal
12: lighthearted
13: mathematician
14: neurasthenic
15: nevertheless
16: northeastern
17: northernmost
18: otherworldly
19: parasympathetic
20: physiotherapist
21: physiotherapy
22: psychotherapeutic
23: psychotherapist
24: psychotherapy
25: radiotherapy
26: southeastern
27: southernmost
28: theoretician
29: weatherbeaten
30: weatherproof
31: weatherstrip
32: weatherstripping

Wren

Library: Wren-fmt
import "io" for File
import "./fmt" for Fmt

var wordList = "unixdict.txt" // local copy
var words = File.read(wordList).trimEnd().split("\n").where { |w| w.count > 11 }.toList
var count = 0
System.print("Words containing 'the' having a length > 11 in %(wordList):")
for (word in words) {
    if (word.contains("the")) {
        count = count + 1
        Fmt.print("$2d: $s", count, word)
    }
}
Output:
Words containing 'the' having a length > 11 in unixdict.txt:
 1: authenticate
 2: chemotherapy
 3: chrysanthemum
 4: clothesbrush
 5: clotheshorse
 6: eratosthenes
 7: featherbedding
 8: featherbrain
 9: featherweight
10: gaithersburg
11: hydrothermal
12: lighthearted
13: mathematician
14: neurasthenic
15: nevertheless
16: northeastern
17: northernmost
18: otherworldly
19: parasympathetic
20: physiotherapist
21: physiotherapy
22: psychotherapeutic
23: psychotherapist
24: psychotherapy
25: radiotherapy
26: southeastern
27: southernmost
28: theoretician
29: weatherbeaten
30: weatherproof
31: weatherstrip
32: weatherstripping

XPL0

string  0;              \use zero-terminated strings
int     I, Ch, Len;
char    Word(100);      \(longest word in unixdict.txt is 22 chars)
def     LF=$0A, CR=$0D, EOF=$1A;
[FSet(FOpen("unixdict.txt", 0), ^I);    \open dictionary and set it to device 3
OpenI(3);
repeat  I:= 0;
        loop    [repeat Ch:= ChIn(3) until Ch # CR;     \remove possible CR
                if Ch=LF or Ch=EOF then quit;
                Word(I):= Ch;
                I:= I+1;
                ];
        Word(I):= 0;                    \terminate string
        Len:= I;
        if Len >= 12 then
            for I:= 0 to Len-3 do       \scan for "the" (assume lowercase)
                if Word(I)=^t & Word(I+1)=^h & Word(I+2)=^e then
                        [Text(0, Word);  CrLf(0)];
until   Ch = EOF;
]
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping

Yabasic

// Rosetta Code problem: http://rosettacode.org/wiki/Words_containing_"the"_substring
// by Galileo, 02/2022

a = open("unixdict.txt")
while not eof(a)
  line input #a a$
  if len(a$) > 11 and instr(a$, "the") print a$
wend
close a
Output:
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
---Program done, press RETURN---