Anagram generator: Difference between revisions

← Older edit

Anagram generator (view source)

Revision as of 10:28, 6 November 2023

9,521 bytes added , 6 months ago

m

→‎{{header|Wren}}: Changed to Wren S/H

PureFox

9,476

edits

Revision as of 11:49, 10 July 2022 (view source) Rdm (talk \| contribs) (→‎{{header\|J}}: show a longer generated anagram) ← Older edit		Latest revision as of 10:28, 6 November 2023 (view source) PureFox (talk \| contribs) m (→‎{{header\|Wren}}: Changed to Wren S/H)
(12 intermediate revisions by 6 users not shown)
Line 1: {{~~draft~~ task}} There are already [[Anagrams\|other]] [[Anagrams/Deranged_anagrams\|tasks]] relating to ''finding'' '''existing''' [[wp:Anagram\|anagrams]]. This one is about '''creating''' them. Line 34: Implementation: <~~lang~~syntaxhighlight Jlang="j">anagen=: {{ seed=. (tolower y)([-.-.)a.{~97+i.26 letters=. ~.seed Line 55: end. EMPTY }}</~~lang~~syntaxhighlight> Conceptually, we're working with a graph here -- given a seed word, we have sequence of letters and a count of each of those letters which we must supply. Given a list of words whose letters do not exceed that count, we extend that list with an additional word, discarding word sequences which go over the letter count. We would repeat as necessary until we satisfy the needed letter counts. Also, we inspect the "final result" and remove our seed word or phrase from it (if it was present -- and if that would have been our only result we would extend our under budget word sequences until they satisfy our letter counts). It seems like two words is usually sufficient here. To limit our resource consumption, we turn this conceptual graph into a directed graph by requiring that the words in our anagram sequence appear in alphabetic order (we could go back and permute each word sequence if we wanted to pad our results). We could further improve performance (and typically generate fewer example anagrams) if we required our initial word choice to include "rarely used letters" and then alternated between increasing alphabetic order and decreasing alphabetic order when adding words to the potential anagrams. This approach would reduce the size of intermediate results which would [usually] significantly increase search speed. But it might require backtracking if our initial choices were unfortunate. Examples: <~~lang~~syntaxhighlight Jlang="j"> 'unixdict.txt' anagen 'Rosettacode' cetera stood coat oersted Line 89 ⟶ 91: downcast eliot downstate loci edison walcott ~~'unixdict.txt' anagen 'Pizza party time'~~ 'unixdict.txt' anagen 'Pizza party time' aim pizza pretty airy pizza tempt Line 132 ⟶ 135: piazza trim type piety pizza tram pizza pyrite tam</~~lang~~syntaxhighlight> =={{header\|jq}}== This entry provides a number of anagram-related functions, all based on the profiling of words by the frequency counts of their constituent letters. The emphasis is on a reasonable level efficiency within the context of general-purpose utility functions. In general, a streaming approach is taken, in part for speed and to minimize memory requirements, and in part to highlight jq's support for streaming. Phrasal anagrams are not limited by a predetermined length, and are computed as a stream consisting of a sorted list of words, e.g. for "firefox", one phrasal anagram would be presented as: <pre> ["off","re","xi"] </pre> In order to motivate the helper functions, consider first the task of generating anagrams of a word, $word. In this case, we have only to construct a listing of relevant words, which can be done by calling our "dictionary" construction function: <pre> dict(inputs; $word; $word\|length) </pre> Here, the last argument ensures that only words having the same length as the target word are selected. Next, consider the task of generating anagrammatic phrases wherein each word has a minimum length of 2. This would be accomplished by: <pre> dict(inputs; $word; 2) \| anagram_phrases($letters) </pre> '''Use of gojq''' The program presented here has been tested using both jq, the C-based implementation, and gojq, the Go implementation of jq, but in the latter case with the additional definition: <pre> def keys_unsorted: keys; </pre> '''Generic Functions''' <syntaxhighlight lang=jq> def count(stream): reduce stream as $x (0; .+1); # remove adjacent duplicates def once(stream): foreach stream as $x ({prev: null}; if $x == .prev then .emit = null else .emit = $x \| .prev = $x end; select(.emit).emit); # one-for-one subtraction # if . and $y are arrays, then emit the array difference, respecting multiplicity; # similarly if both are strings, emit a string representing the difference. def minus($y): if type == "array" then reduce $y[] as $x (.; index($x) as $ix \| if $ix then .[:$ix] + .[$ix+1:] else . end) else explode \| minus($y\|explode) \| implode end; # bag of words def bow(stream): reduce stream as $word ({}; .[($word\|tostring)] += 1); </syntaxhighlight> '''Anagrams''' <syntaxhighlight lang=jq> # Input: a string def profile: bow( explode[] \| [.] \| implode); # Comparison of profiles # . and $p2 should be profiles def le($p2): . as $p1 \| all( keys_unsorted[]; $p1[.] <= $p2[.]); def eq($p2): . as $p1 \| all( keys_unsorted + ($p2\|keys_unsorted) \| unique[]; $p1[.] == $p2[.]); # Construct a list of relevant words using the given stream. # $min is the a-priori minimum word length and # so if $min is the length of $word, we are looking for exact anagrams: def dict(stream; $word; $min): ($word\|length) as $length \| ($word\|profile) as $profile \| if $length == $min then [stream \| select(profile\|eq($profile))] else [stream \| select(length >= $min and length <= $length and (profile\|le($profile)))] end ; # Input: an array of admissible words. # Output: a stream of anagrams of $word def anagram($word): ($word\|profile) as $profile \| .[] \| select(profile\|eq($profile)); # Input: an array of admissible words. # Output: a stream of subanagrams of $word # regarding each occurrence of a letter as distinct from all others def subanagrams($word): ($word\|profile) as $profile \| .[] \| select(profile\|le($profile)); # input: an array to be extended with an additional dictionary word. # output: a stream of arrays with additional words # selected from the characters in the string $letters. # The input array should be in alphabetical order; if it is, # so will the output array. def extend($letters; $dict): if $letters == "" then . else . as $in \| ($dict\|subanagrams($letters)) as $w \| select(if ($in\|length) > 0 then $in[-1] <= $w else true end) \| ($letters \| minus($w)) as $remaining \| ($in + [$w]) \| extend($remaining; $dict) end; def anagram_phrases($letters): . as $dict \| once([] \| extend($letters; $dict)); # Global: $anagram $word $min def main: if $anagram then dict(inputs; $word; $word\|length)[] else dict(inputs; $word; $min) \| anagram_phrases($word) end ; main </syntaxhighlight> '''Invocation template''' <pre> < unixdict.txt jq -ncrR --arg word WORD --argjson anagram BOOLEAN --argjson min MIN -f anagram-generator.jq </pre> where: * WORD is a string (normally, a word) * BOOLEAN determines whether anagrams or anagrammatic phrases are sought * MIN is the minimum length of admissible words to be used if $anagram is false '''Examples''' Anagram <pre> # Invocation: < unixdict.txt jq -nrR --arg word read --argjson anagram true --argjson min 3 -f anagram-generator.jq dare dear erda read </pre> Anagrammatic phrases <pre> # Invocation: < unixdict.txt jq -ncR --arg word firefox --argjson anagram false --argjson min 2 -f anagram-generator.jq ["ex","fir","of"] ["ex","for","if"] ["ex","fro","if"] ["ex","ir","off"] ["ex","off","ri"] ["fe","fir","ox"] ["fe","fix","or"] ["fe","for","ix"] ["fe","for","xi"] ["fe","fox","ir"] ["fe","fox","ri"] ["fe","fro","ix"] ["fe","fro","xi"] ["fifo","rex"] ["fire","fox"] ["fix","fore"] ["fix","of","re"] ["fox","if","re"] ["if","of","rex"] ["ix","off","re"] ["ix","offer"] ["off","re","xi"] ["offer","xi"] </pre> =={{header\|Julia}}== <syntaxhighlight lang="julia">const unixwords = split(read("unixdict.txt", String) \|> lowercase, r"\s+") function findphrases(anastring::AbstractString, choices, sizelong = 4, n_shortpermitted = 1) anadict = Dict{Char, Int}() for c in lowercase(anastring) if 'a' <= c <= 'z' anadict[c] = get(anadict, c, 0) + 1 end end phrases = String[] function addword(remaining, phrase, numshort) for w in unixwords len = length(w) numshort < 1 && len < sizelong && continue any(c -> get(remaining, c, 0) < count(==(c), w), w) && @goto nextword cdict = copy(remaining) for c in w cdict[c] -= 1 end if all(==(0), values(cdict)) return strip(phrase * " " * w) elseif (newphrase = addword(cdict, phrase * " " * w, numshort - (len < sizelong))) != nothing push!(phrases, newphrase) print(length(phrases), "\b\b\b\b\b\b\b\b\b") end @label nextword end return nothing end addword(anadict, "", n_shortpermitted) return phrases end for s in ["Rosetta code", "Joe Biden", "wherrera"] println("\nFrom '$s':") foreach(println, findphrases(s, unixwords, 4, 0) \|> unique \|> sort!) end </syntaxhighlight>{{out}} <pre> From 'Rosetta code': cetera stood coat oersted coda rosette code rosetta coed rosetta create stood derate scoot doctor tease oersted coat rosetta code rosette coda scoot derate stood cetera tease doctor From 'Joe Biden': done jibe jibe done node jibe From 'wherrera': herr ware rare wehr rear wehr ware herr wear herr wehr rare </pre> =={{header\|Nim}}== {{trans\|Julia}} <syntaxhighlight lang="Nim">import std/[algorithm, sequtils, strutils, tables] proc readWords(filename: string): seq[string] {.compileTime.} = result = filename.staticRead().splitLines().map(toLowerAscii) const UnixWords = readWords("unixdict.txt") func findPhrases(anaString: string; choices: seq[string]; sizeLong = 4; nShortPermitted = 1): seq[string] = var anaDict: CountTable[char] for c in anaString.toLowerAscii: if c in 'a'..'z': anadict.inc(c) var phrases: seq[string] func addWord(remaining: CountTable[char]; phrase: string; numShort: int): string = for word in UnixWords: block Search: if numShort < 1 and word.len < sizeLong: break Search if anyIt(word, remaining.getOrDefault(it) < word.count(it)): break Search var cdict = remaining for c in word: cdict.inc(c, -1) if allIt(cdict.values.toSeq, it == 0): return strip(phrase & ' ' & word) let newPhrase = addWord(cdict, phrase & ' ' & word, numshort - ord(word.len < sizeLong)) if newPhrase.len > 0: phrases.add newPhrase discard addWord(anaDict, "", nShortPermitted) result = move(phrases) for s in ["Rosetta code", "Joe Biden", "wherrera"]: echo "From '$#':" % s for phrase in findPhrases(s, UnixWords, 4, 0).sorted.deduplicate(true): echo phrase echo() </syntaxhighlight> {{out}} <pre>From 'Rosetta code': cetera stood coat oersted coda rosette code rosetta coed rosetta create stood derate scoot doctor tease oersted coat rosetta code rosette coda scoot derate stood cetera tease doctor From 'Joe Biden': done jibe jibe done node jibe From 'wherrera': herr ware rare wehr rear wehr ware herr wear herr wehr rare </pre> =={{header\|Phix}}== Couldn't really think of a better way than just building a dirty great filter list to get rid of the less interesting answers.... <!--<~~lang~~syntaxhighlight ~~Phix~~lang="phix">(phixonline)--> <span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> <span style="color: #008080;">constant</span> <span style="color: #000000;">bo_ring</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"al"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"alex"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"am"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"an"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"and"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"anent"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ann"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ant"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ar"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ares"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"art"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"at"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ax"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"axle"</span><span style="color: #0000FF;">,</span> Line 192 ⟶ 532: <span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span> <span style="color: #7060A8;">papply</span><span style="color: #0000FF;">({</span><span style="color: #008000;">"Rosetta"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"PureFox"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"PeteLomax"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Wherrera"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Thundergnat"</span><span style="color: #0000FF;">},</span><span style="color: #000000;">test</span><span style="color: #0000FF;">)</span> <!--</~~lang~~syntaxhighlight>--> {{out}} <pre> Line 244 ⟶ 584: Using the unixdict.txt word file by default. <syntaxhighlight lang="raku" ~~perl6~~line>unit sub MAIN ($in is copy = '', :$dict = 'unixdict.txt'); say 'Enter a word or phrase to be anagramed. (Loading dictionary)' unless $in.chars; Line 283 ⟶ 623: } } }</~~lang~~syntaxhighlight> {{out\|Truncated to only show the best few as subjectively determined by me}} ''Punctuation, capitalization and (in some cases) word order manually massaged.'' Line 308 ⟶ 648: Alternatives formed by simply changing the order of the two words have been suppressed. <~~lang~~syntaxhighlight ~~ecmascript~~lang="wren">import "io" for File import "./str" for Str, Char import "./perm" for Comb Line 315 ⟶ 655: var wordList = "unixdict.txt" // local copy var words = File.read(~~"unixdict.txt"~~wordList).split("\n").map { \|w\| w.trim() } var wordMap = {} for (word in words) { Line 371 ⟶ 711: System.print("\n%(tests[i]):") anagramGenerator.call(tests[i]) }</~~lang~~syntaxhighlight> {{out}}