Anagram generator: Difference between revisions

m
→‎{{header|Wren}}: Changed to Wren S/H
(→‎{{header|J}}: show a longer generated anagram)
m (→‎{{header|Wren}}: Changed to Wren S/H)
 
(12 intermediate revisions by 6 users not shown)
Line 1:
{{draft task}}
 
There are already [[Anagrams|other]] [[Anagrams/Deranged_anagrams|tasks]] relating to ''finding'' '''existing''' [[wp:Anagram|anagrams]]. This one is about '''creating''' them.
Line 34:
Implementation:
 
<langsyntaxhighlight Jlang="j">anagen=: {{
seed=. (tolower y)([-.-.)a.{~97+i.26
letters=. ~.seed
Line 55:
end.
EMPTY
}}</langsyntaxhighlight>
 
Conceptually, we're working with a graph here -- given a seed word, we have sequence of letters and a count of each of those letters which we must supply. Given a list of words whose letters do not exceed that count, we extend that list with an additional word, discarding word sequences which go over the letter count. We would repeat as necessary until we satisfy the needed letter counts. Also, we inspect the "final result" and remove our seed word or phrase from it (if it was present -- and if that would have been our only result we would extend our under budget word sequences until they satisfy our letter counts). It seems like two words is usually sufficient here.
 
To limit our resource consumption, we turn this conceptual graph into a directed graph by requiring that the words in our anagram sequence appear in alphabetic order (we could go back and permute each word sequence if we wanted to pad our results).
 
We could further improve performance (and typically generate fewer example anagrams) if we required our initial word choice to include "rarely used letters" and then alternated between increasing alphabetic order and decreasing alphabetic order when adding words to the potential anagrams. This approach would reduce the size of intermediate results which would [usually] significantly increase search speed. But it might require backtracking if our initial choices were unfortunate.
 
Examples:
 
<langsyntaxhighlight Jlang="j"> 'unixdict.txt' anagen 'Rosettacode'
cetera stood
coat oersted
Line 89 ⟶ 91:
downcast eliot
downstate loci
edison walcott 'unixdict.txt' anagen 'Pizza party time'
'unixdict.txt' anagen 'Pizza party time'
aim pizza pretty
airy pizza tempt
Line 132 ⟶ 135:
piazza trim type
piety pizza tram
pizza pyrite tam</langsyntaxhighlight>
 
=={{header|jq}}==
 
This entry provides a number of anagram-related functions, all based
on the profiling of words by the frequency counts of their constituent
letters. The emphasis is on a reasonable level efficiency within the context of general-purpose utility
functions.
 
In general, a streaming approach is taken, in part for speed and to minimize
memory requirements, and in part to highlight jq's support
for streaming.
 
Phrasal anagrams are not limited by a predetermined length,
and are computed as a stream consisting of a sorted list
of words, e.g. for "firefox", one phrasal anagram would be presented as:
<pre>
["off","re","xi"]
</pre>
 
In order to motivate the helper functions, consider first the task
of generating anagrams of a word, $word. In this case,
we have only to construct a listing of relevant words,
which can be done by calling our "dictionary" construction
function:
<pre>
dict(inputs; $word; $word|length)
</pre>
Here, the last argument ensures that only words having the same length
as the target word are selected.
 
Next, consider the task of generating anagrammatic phrases wherein
each word has a minimum length of 2. This would be accomplished by:
<pre>
dict(inputs; $word; 2)
| anagram_phrases($letters)
</pre>
'''Use of gojq'''
 
The program presented here has been tested using both jq, the C-based implementation,
and gojq, the Go implementation of jq, but in the latter case
with the additional definition:
<pre>
def keys_unsorted: keys;
</pre>
 
'''Generic Functions'''
<syntaxhighlight lang=jq>
 
def count(stream): reduce stream as $x (0; .+1);
 
# remove adjacent duplicates
def once(stream):
foreach stream as $x ({prev: null};
if $x == .prev then .emit = null else .emit = $x | .prev = $x end;
select(.emit).emit);
 
# one-for-one subtraction
# if . and $y are arrays, then emit the array difference, respecting multiplicity;
# similarly if both are strings, emit a string representing the difference.
def minus($y):
if type == "array"
then reduce $y[] as $x (.;
index($x) as $ix
| if $ix then .[:$ix] + .[$ix+1:] else . end)
else explode | minus($y|explode) | implode
end;
 
# bag of words
def bow(stream):
reduce stream as $word ({}; .[($word|tostring)] += 1);
</syntaxhighlight>
'''Anagrams'''
<syntaxhighlight lang=jq>
# Input: a string
def profile: bow( explode[] | [.] | implode);
 
# Comparison of profiles
# . and $p2 should be profiles
def le($p2):
. as $p1
| all( keys_unsorted[]; $p1[.] <= $p2[.]);
 
def eq($p2):
. as $p1
| all( keys_unsorted + ($p2|keys_unsorted) | unique[]; $p1[.] == $p2[.]);
 
# Construct a list of relevant words using the given stream.
# $min is the a-priori minimum word length and
# so if $min is the length of $word, we are looking for exact anagrams:
def dict(stream; $word; $min):
($word|length) as $length
| ($word|profile) as $profile
| if $length == $min
then [stream | select(profile|eq($profile))]
else [stream
| select(length >= $min and
length <= $length and
(profile|le($profile)))]
end ;
 
# Input: an array of admissible words.
# Output: a stream of anagrams of $word
def anagram($word):
($word|profile) as $profile
| .[]
| select(profile|eq($profile));
 
# Input: an array of admissible words.
# Output: a stream of subanagrams of $word
# regarding each occurrence of a letter as distinct from all others
def subanagrams($word):
($word|profile) as $profile
| .[]
| select(profile|le($profile));
 
# input: an array to be extended with an additional dictionary word.
# output: a stream of arrays with additional words
# selected from the characters in the string $letters.
# The input array should be in alphabetical order; if it is,
# so will the output array.
def extend($letters; $dict):
if $letters == "" then .
else . as $in
| ($dict|subanagrams($letters)) as $w
| select(if ($in|length) > 0 then $in[-1] <= $w else true end)
| ($letters | minus($w)) as $remaining
| ($in + [$w]) | extend($remaining; $dict)
end;
 
def anagram_phrases($letters):
. as $dict
| once([] | extend($letters; $dict));
 
# Global: $anagram $word $min
def main:
if $anagram
then dict(inputs; $word; $word|length)[]
else dict(inputs; $word; $min)
| anagram_phrases($word)
end ;
 
main
</syntaxhighlight>
'''Invocation template'''
<pre>
< unixdict.txt jq -ncrR --arg word WORD --argjson anagram BOOLEAN --argjson min MIN -f anagram-generator.jq
</pre>
 
where:
* WORD is a string (normally, a word)
* BOOLEAN determines whether anagrams or anagrammatic phrases are sought
* MIN is the minimum length of admissible words to be used if $anagram is false
 
 
'''Examples'''
 
Anagram
<pre>
# Invocation:
< unixdict.txt jq -nrR --arg word read --argjson anagram true --argjson min 3 -f anagram-generator.jq
dare
dear
erda
read
</pre>
 
Anagrammatic phrases
<pre>
# Invocation:
< unixdict.txt jq -ncR --arg word firefox --argjson anagram false --argjson min 2 -f anagram-generator.jq
["ex","fir","of"]
["ex","for","if"]
["ex","fro","if"]
["ex","ir","off"]
["ex","off","ri"]
["fe","fir","ox"]
["fe","fix","or"]
["fe","for","ix"]
["fe","for","xi"]
["fe","fox","ir"]
["fe","fox","ri"]
["fe","fro","ix"]
["fe","fro","xi"]
["fifo","rex"]
["fire","fox"]
["fix","fore"]
["fix","of","re"]
["fox","if","re"]
["if","of","rex"]
["ix","off","re"]
["ix","offer"]
["off","re","xi"]
["offer","xi"]
</pre>
 
=={{header|Julia}}==
<syntaxhighlight lang="julia">const unixwords = split(read("unixdict.txt", String) |> lowercase, r"\s+")
 
function findphrases(anastring::AbstractString, choices, sizelong = 4, n_shortpermitted = 1)
anadict = Dict{Char, Int}()
for c in lowercase(anastring)
if 'a' <= c <= 'z'
anadict[c] = get(anadict, c, 0) + 1
end
end
phrases = String[]
function addword(remaining, phrase, numshort)
for w in unixwords
len = length(w)
numshort < 1 && len < sizelong && continue
any(c -> get(remaining, c, 0) < count(==(c), w), w) && @goto nextword
cdict = copy(remaining)
for c in w
cdict[c] -= 1
end
if all(==(0), values(cdict))
return strip(phrase * " " * w)
elseif (newphrase = addword(cdict, phrase * " " * w, numshort - (len < sizelong))) != nothing
push!(phrases, newphrase)
print(length(phrases), "\b\b\b\b\b\b\b\b\b")
end
@label nextword
end
return nothing
end
addword(anadict, "", n_shortpermitted)
return phrases
end
 
for s in ["Rosetta code", "Joe Biden", "wherrera"]
println("\nFrom '$s':")
foreach(println, findphrases(s, unixwords, 4, 0) |> unique |> sort!)
end
</syntaxhighlight>{{out}}
<pre>
From 'Rosetta code':
cetera stood
coat oersted
coda rosette
code rosetta
coed rosetta
create stood
derate scoot
doctor tease
oersted coat
rosetta code
rosette coda
scoot derate
stood cetera
tease doctor
 
From 'Joe Biden':
done jibe
jibe done
node jibe
 
From 'wherrera':
herr ware
rare wehr
rear wehr
ware herr
wear herr
wehr rare
</pre>
 
=={{header|Nim}}==
{{trans|Julia}}
<syntaxhighlight lang="Nim">import std/[algorithm, sequtils, strutils, tables]
 
proc readWords(filename: string): seq[string] {.compileTime.} =
result = filename.staticRead().splitLines().map(toLowerAscii)
 
const UnixWords = readWords("unixdict.txt")
 
func findPhrases(anaString: string; choices: seq[string];
sizeLong = 4; nShortPermitted = 1): seq[string] =
var anaDict: CountTable[char]
for c in anaString.toLowerAscii:
if c in 'a'..'z':
anadict.inc(c)
var phrases: seq[string]
 
func addWord(remaining: CountTable[char]; phrase: string; numShort: int): string =
for word in UnixWords:
block Search:
if numShort < 1 and word.len < sizeLong:
break Search
if anyIt(word, remaining.getOrDefault(it) < word.count(it)):
break Search
var cdict = remaining
for c in word: cdict.inc(c, -1)
if allIt(cdict.values.toSeq, it == 0):
return strip(phrase & ' ' & word)
let newPhrase = addWord(cdict, phrase & ' ' & word, numshort - ord(word.len < sizeLong))
if newPhrase.len > 0:
phrases.add newPhrase
 
discard addWord(anaDict, "", nShortPermitted)
result = move(phrases)
 
for s in ["Rosetta code", "Joe Biden", "wherrera"]:
echo "From '$#':" % s
for phrase in findPhrases(s, UnixWords, 4, 0).sorted.deduplicate(true):
echo phrase
echo()
</syntaxhighlight>
 
{{out}}
<pre>From 'Rosetta code':
cetera stood
coat oersted
coda rosette
code rosetta
coed rosetta
create stood
derate scoot
doctor tease
oersted coat
rosetta code
rosette coda
scoot derate
stood cetera
tease doctor
 
From 'Joe Biden':
done jibe
jibe done
node jibe
 
From 'wherrera':
herr ware
rare wehr
rear wehr
ware herr
wear herr
wehr rare
</pre>
 
=={{header|Phix}}==
Couldn't really think of a better way than just building a dirty great filter list to get rid of the less interesting answers....
<!--<langsyntaxhighlight Phixlang="phix">(phixonline)-->
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">bo_ring</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"al"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"alex"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"am"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"an"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"and"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"anent"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ann"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ant"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ar"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ares"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"art"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"at"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ax"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"axle"</span><span style="color: #0000FF;">,</span>
Line 192 ⟶ 532:
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<span style="color: #7060A8;">papply</span><span style="color: #0000FF;">({</span><span style="color: #008000;">"Rosetta"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"PureFox"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"PeteLomax"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Wherrera"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Thundergnat"</span><span style="color: #0000FF;">},</span><span style="color: #000000;">test</span><span style="color: #0000FF;">)</span>
<!--</langsyntaxhighlight>-->
{{out}}
<pre>
Line 244 ⟶ 584:
Using the unixdict.txt word file by default.
 
<syntaxhighlight lang="raku" perl6line>unit sub MAIN ($in is copy = '', :$dict = 'unixdict.txt');
 
say 'Enter a word or phrase to be anagramed. (Loading dictionary)' unless $in.chars;
Line 283 ⟶ 623:
}
}
}</langsyntaxhighlight>
{{out|Truncated to only show the best few as subjectively determined by me}}
''Punctuation, capitalization and (in some cases) word order manually massaged.''
Line 308 ⟶ 648:
 
Alternatives formed by simply changing the order of the two words have been suppressed.
<langsyntaxhighlight ecmascriptlang="wren">import "io" for File
import "./str" for Str, Char
import "./perm" for Comb
Line 315 ⟶ 655:
 
var wordList = "unixdict.txt" // local copy
var words = File.read("unixdict.txt"wordList).split("\n").map { |w| w.trim() }
var wordMap = {}
for (word in words) {
Line 371 ⟶ 711:
System.print("\n%(tests[i]):")
anagramGenerator.call(tests[i])
}</langsyntaxhighlight>
 
{{out}}
9,476

edits