Anagrams/Deranged anagrams
You are encouraged to solve this task according to the task description, using any language you may know.
Two or more words are said to be anagrams if they have the same characters, but in a different order. By analogy with derangements we define a deranged anagram as two words with the same characters, but in which the same character does not appear in the same position in both words.
The task is to use the word list at http://www.puzzlers.org/pub/wordlists/unixdict.txt to find and show the longest deranged anagram.
- Cf.
CoffeeScript
<lang coffeescript>words = require('fs').readFileSync('unixdict.txt').toString().split(/[\n\r]+/) sorted = {} anagrams = []
for word in words #sort words by length (sorted[word.length] or (sorted[word.length] = [])).push [word, word.split().sort().join()]
isAnagram = (w1, w2) -> w1[0] != w2[0] and w1[1] == w2[1]
isDeranged = (word1, word2) -> for pos, letter of word1 return false if word2[pos] == letter return true
for l, group of sorted # find deranged anagrams
while word = group.pop()
anagrams.push [word[0], word2[0]] for word2 in group when isAnagram(word, word2) and isDeranged(word[0], word2[0])
console.log "Longest deranged anagram: #{anagrams[anagrams.length-1].join(' / ')}"</lang>
Output:
Longest deranged anagram: intoxicate / excitation
D
<lang d>import std.stdio, std.file, std.string, std.algorithm,
std.typecons, std.range;
auto findAnagrams(string[] words) {
string[][string] anagrams; foreach (w; words) anagrams[w.sort] ~= w.idup; return array(filter!q{a.length > 1}(anagrams.values));
}
// returns pairs of words that have no character in the same position Tuple!(string,string)[] findDeranged(string[] words) {
typeof(return) result; foreach (i, w1; words) { next: foreach (w2; words[i+1..$]) { for (int k = 0; k < w1.length; k++) { if (w1[k] == w2[k]) continue next; } result ~= tuple(w1, w2); } } return result;
}
Tuple!(string,string)[] largestDerangedAna(string[][] anagrams) {
sort!q{a[0].length > b[0].length}(anagrams); foreach (words; anagrams) { if (auto derangedPairs = findDeranged(words)) return derangedPairs; } return [];
}
void main() {
auto anags = findAnagrams(split(cast(string)read("unixdict.txt"))); writeln("Longest anagrams with no characters in the same position:"); foreach (pairs; largestDerangedAna(anags)) writefln(" %s, %s", pairs.tupleof);
}</lang> Output:
Longest anagrams with no characters in the same position: excitation, intoxicate
Icon and Unicon
This solution (which works in both languages) does a strict interpretation of the problem and ignores the fact that there may be multiple derangements that are the same length (including ignoring multiple derangements arising from the same set of words that are all anagrams).
<lang unicon>link strings # for csort() procedure
procedure main()
anagrams := table() # build lists of anagrams every *(word := !&input) > 1 do { canon := csort(word) /anagrams[canon] := [] put(anagrams[canon], word) }
longest := 1 # find a longest derangement every *(aList := !anagrams) > 1 do if derangement := derange(aList) then if longest <:= *derangement[1] then long := derangement
every writes((!\long||" ")|"\n") # show longest
end
procedure derange(aList) # Return a single derangement from this list
while aWord := get(aList) do return noMatch(aWord, !aList)
end
procedure noMatch(s1,s2) # Produce pair only if s1,s2 are deranged.
every i := 1 to *s1 do if s1[i] == s2[i] then fail return [s1,s2]
end</lang>
Sample run:
->dra <unixdict.txt excitation intoxicate ->
J
This assumes that unixdict.txt has been saved in the current directory.
<lang j> #words=: <;._2 ] 1!:1 <'unixdict.txt' 25104
#anagrams=: (#~ 1 < #@>) (</.~ /:~&>) words
1303
#maybederanged=: (#~ (1 -.@e. #@~."1)@|:@:>&>) anagrams
432
#longest=: (#~ [: (= >./) #@>@{.@>) maybederanged
1
longest
┌───────────────────────┐ │┌──────────┬──────────┐│ ││excitation│intoxicate││ │└──────────┴──────────┘│ └───────────────────────┘</lang>
Note that anagram sets with more than two members might, hypothetically, have made things more complicated. By lucky coincidence, this was not an issue. We could have taken advantage of that coincidence to achieve slight further simplifications. Perhaps maybederanged=: (#~ (-: ~."1)@|:@:>&>) anagrams
In other words, if we had had to consider whether ascertain/cartesian/sectarian
contained a deranged pair, we would have had to break it out into the three pairs it contains. However, since 'excitation' is a longer word than 'ascertain', we know that this triple cannot contain the longest deranged anagram pair. And since there are no anagrams longer than 'excitation' which involve more than a single pair, we know that we can ignore this issue entirely.
Java
<lang java>import java.io.*; import java.util.*;
public class DerangedAnagrams {
public static void main(String[] args) throws IOException { String[][] anagrams = findAnagrams(readLines("unixdict.txt")); String[] res = null; for (String[] ana : anagrams) { int j = ana[0].length() - 1; while (j >= 0 && ana[0].charAt(j) != ana[1].charAt(j)) { j--; } if (j == -1) { res = ana; break; } } System.out.printf("result: %s %s\n", res[0], res[1]); }
static String[][] findAnagrams(String[] words) { Map<String, ArrayList> map = new LinkedHashMap<String, ArrayList>(); for (String w : words) { char[] srt; Arrays.sort(srt = w.toCharArray()); String key = String.valueOf(srt); if (map.containsKey(key)) { map.get(key).add(w); } else { ArrayList v = new ArrayList(); v.add(w); map.put(key, v); } } List pairs = new ArrayList(); for (List<String> v : map.values()) { if (v.size() > 1) { String[] wrds = v.toArray(new String[]{}); for (int i = 0; i < wrds.length - 1; i++) { pairs.add(new String[]{wrds[i], wrds[i + 1]}); } } } return (String[][]) pairs.toArray(new String[][]{}); }
static public String[] readLines(String fn) throws IOException { BufferedReader br = new BufferedReader(new FileReader(fn)); List<String> lines = new ArrayList<String>(); String line = null; while ((line = br.readLine()) != null) lines.add(line.trim()); br.close(); Collections.sort(lines, new Comparator<String>() { public int compare(String a, String b) { return b.length() - a.length(); } }); return lines.toArray(new String[]{}); }
}</lang>
result: excitation intoxicate
Perl
<lang Perl>sub deranged { # only anagrams ever get here
my @a = split(, shift); # split word into letters my @b = split(, shift); for (0 .. $#a) { $a[$_] eq $b[$_] and return; } return 1
}
sub find_deranged {
for my $i ( 0 .. $#_ ) { for my $j ( $i+1 .. $#_ ) { next unless deranged $_[$i], $_[$j];
print "length ", length($_[$i]), ": $_[$i] => $_[$j]\n"; return 1; } }
}
my %letter_list; open my $in, 'unixdict.txt';
local $/ = undef;
for (split(' ', <$in>)) {
# store anagrams in hash table by letters they contain push @{ $letter_list{ join(, sort split(, $_)) } }, $_
}
for ( sort { length($b) <=> length($a) } # sort by length, descending
grep { @{ $letter_list{$_} } > 1 } # take only ones with anagrams keys %letter_list )
{
# if we find a pair, they are the longested due to the sort before last if find_deranged(@{ $letter_list{$_} });
}</lang>
PicoLisp
<lang PicoLisp>(let Words NIL
(in "unixdict.txt" (while (line) (let (Word @ Key (pack (sort (copy @)))) (if (idx 'Words Key T) (push (car @) Word) (set Key (list Word)) ) ) ) ) (maxi '((X) (length (car X))) (extract '((Key) (pick '((Lst) (and (find '((L) (not (find = L Lst))) (val Key) ) (cons (pack @) (pack Lst)) ) ) (val Key) ) ) (idx 'Words) ) ) )</lang>
Output:
-> ("excitation" . "intoxicate")
Perl 6
Note that, to make runtime manageable, we have created a subset file:
grep '^[ie]' unixdict.txt > dict.ie
<lang perl6>my %anagram = slurp('dict.ie').words.map({[.comb]}).classify({ .sort.join });
for %anagram.values.sort({ -@($_[0]) }) -> @aset {
for 0 ..^ @aset.end -> $i { for $i ^.. @aset.end -> $j { if none( @aset[$i].list Zeq @aset[$j].list ) { say "{@aset[$i].join} {@aset[$j].join}"; exit; } } }
} </lang>
Output:
excitation intoxicate
PureBasic
<lang PureBasic>Structure anagram
word.s letters.s
EndStructure
Structure wordList
List words.anagram()
EndStructure
- True = 1
- False = 0
Procedure.s sortLetters(*word.Character, wordLength)
;returns a string with the letters of a word sorted Protected Dim letters.c(wordLength) Protected *letAdr = @letters() CopyMemoryString(*word, @*letAdr) SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1) ProcedureReturn PeekS(@letters(), wordLength)
EndProcedure
- Compare a list of anagrams for derangement.
Procedure isDeranged(List anagram.s())
;If a pair of deranged anagrams is found return #True ;and and modify the list to include the pair of deranged anagrams. Protected i, length, word.s, *ptrAnagram, isDeranged Protected NewList deranged.s() FirstElement(anagram()) length = Len(anagram()) Repeat word = anagram() *ptrAnagram = @anagram() While NextElement(anagram()) isDeranged = #True For i = 1 To length If Mid(word, i, 1) = Mid(anagram(), i, 1) isDeranged = #False Break ;exit for/next EndIf Next If isDeranged AddElement(deranged()) deranged() = anagram() AddElement(deranged()) deranged() = word CopyList(deranged(), anagram()) ProcedureReturn #True ;deranged anagram found EndIf Wend ChangeCurrentElement(anagram(), *ptrAnagram) Until Not NextElement(anagram()) ProcedureReturn #False ;deranged anagram not found
EndProcedure
If OpenConsole()
;word file is assumed to be in the same directory If Not ReadFile(0,"unixdict.txt"): End: EndIf Define maxWordSize = 0, word.s, length Dim wordlists.wordList(maxWordSize) ;read word file and create separate lists of anagrams and their original words by length While Not Eof(0) word = ReadString(0) length = Len(word) If length > maxWordSize maxWordSize = length Redim wordlists.wordList(maxWordSize) EndIf AddElement(wordlists(length)\words()) wordlists(length)\words()\word = word wordlists(length)\words()\letters = sortLetters(@word, length) Wend CloseFile(0)
Define letters.s, foundDeranged NewList anagram.s() ;start search from largest to smallest For length = maxWordSize To 2 Step -1 If FirstElement(wordlists(length)\words()) ;only examine lists with words ;sort words to place anagrams next to each other SortStructuredList(wordlists(length)\words(), #PB_Sort_Ascending, OffsetOf(anagram\letters), #PB_Sort_String)
With wordlists(length)\words() letters = \letters AddElement(anagram()): anagram() = \word
;compose sets of anagrams and check for derangement with remaining words in current list While NextElement(wordlists(length)\words()) ;Check for end of a set of anagrams? If letters <> \letters
;if more than one word in a set of anagrams check for derangement If ListSize(anagram()) > 1 If isDeranged(anagram()) foundDeranged = #True ;found deranged anagrams, stop processing Break 2 ;exit while/wend and for/next EndIf EndIf letters = \letters ;setup for next set of anagrams ClearList(anagram()) EndIf AddElement(anagram()): anagram() = \word Wend EndWith EndIf ClearList(anagram()) Next ;report results If foundDeranged PrintN("Largest 'Deranged' anagrams found are of length " + Str(length) + ":" + #CRLF$) ForEach anagram() PrintN(" " + anagram()) Next Else PrintN("No 'Deranged' anagrams were found." + #CRLF$) EndIf Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input() CloseConsole()
EndIf</lang> Sample output:
Largest 'Deranged' anagrams found are of length 10: intoxicate excitation
Python
<lang python>import urllib.request from collections import defaultdict from itertools import combinations
def getwords(url='http://www.puzzlers.org/pub/wordlists/unixdict.txt'):
return list(set(urllib.request.urlopen(url).read().decode().split()))
def find_anagrams(words):
anagram = defaultdict(list) # map sorted chars to anagrams for word in words: anagram[tuple(sorted(word))].append( word ) return dict((key, words) for key, words in anagram.items() if len(words) > 1)
def is_deranged(words):
'returns pairs of words that have no character in the same position' return [ (word1, word2) for word1,word2 in combinations(words, 2) if all(ch1 != ch2 for ch1, ch2 in zip(word1, word2)) ]
def largest_deranged_ana(anagrams):
ordered_anagrams = sorted(anagrams.items(), key=lambda x:(-len(x[0]), x[0])) for _, words in ordered_anagrams: deranged_pairs = is_deranged(words) if deranged_pairs: return deranged_pairs return []
if __name__ == '__main__':
words = getwords('http://www.puzzlers.org/pub/wordlists/unixdict.txt') print("Word count:", len(words))
anagrams = find_anagrams(words) print("Anagram count:", len(anagrams),"\n")
print("Longest anagrams with no characters in the same position:") print(' ' + '\n '.join(', '.join(pairs) for pairs in largest_deranged_ana(anagrams)))</lang>
- Sample output
Word count: 25104 Anagram count: 1303 Longest anagrams with no characters in the same position: excitation, intoxicate
Tcl
<lang tcl>package require Tcl 8.5 package require http
- Fetch the words
set t [http::geturl "http://www.puzzlers.org/pub/wordlists/unixdict.txt"] set wordlist [split [http::data $t] \n] http::cleanup $t
- Group by characters in word
foreach word $wordlist {
dict lappend w [lsort [split $word ""]] [split $word ""]
}
- Deranged test
proc deranged? {l1 l2} {
foreach c1 $l1 c2 $l2 {
if {$c1 eq $c2} {return 0}
} return 1
}
- Get a deranged pair from an anagram set, if one exists
proc getDeranged {words} {
foreach l1 [lrange $words 0 end-1] {
foreach l2 [lrange $words 1 end] { if {[deranged? $l1 $l2]} { return [list $l1 $l2 1] } }
} return {{} {} 0}
}
- Find the max-length deranged anagram
set count 0 set candidates {} set max 0 dict for {k words} $w {
incr count [expr {[llength $words] > 1}] if {[llength $k] > $max && [lassign [getDeranged $words] l1 l2]} {
set max [llength $l1] lappend candidates [join $l1 ""],[join $l2 ""]
}
}
- Print out what we found
puts "[llength $wordlist] words" puts "[dict size $w] potential anagram-groups" puts "$count real anagram-groups" foreach pair $candidates {
puts "considered candidate pairing: $pair"
} puts "MAXIMAL DERANGED ANAGRAM: LENGTH $max\n\t[lindex $candidates end]"</lang> Output:
25105 words 23567 potential anagram-groups 1303 real anagram-groups considered candidate pairing: abc,cab considered candidate pairing: abed,bade considered candidate pairing: abort,bator considered candidate pairing: afresh,shafer considered candidate pairing: alberto,latrobe considered candidate pairing: american,cinerama considered candidate pairing: ancestral,lancaster considered candidate pairing: excitation,intoxicate MAXIMAL DERANGED ANAGRAM: LENGTH 10 excitation,intoxicate