Anagrams/Deranged anagrams

From Rosetta Code
Task
Anagrams/Deranged anagrams
You are encouraged to solve this task according to the task description, using any language you may know.

Two or more words are said to be anagrams if they have the same characters, but in a different order. By analogy with derangements we define a deranged anagram as two words with the same characters, but in which the same character does not appear in the same position in both words.

The task is to use the word list at http://www.puzzlers.org/pub/wordlists/unixdict.txt to find and show the longest deranged anagram.

Cf.

D

<lang d>import std.stdio, std.file, std.string, std.algorithm,

       std.typecons, std.range;

auto findAnagrams(string[] words) {

   string[][string] anagrams;
   foreach (w; words)
       anagrams[w.sort] ~= w.idup;
   return array(filter!q{a.length > 1}(anagrams.values));

}

// returns pairs of words that have no character in the same position Tuple!(string,string)[] findDeranged(string[] words) {

   typeof(return) result;
   foreach (i, w1; words) {
       next: foreach (w2; words[i+1..$]) {
           for (int k = 0; k < w1.length; k++) {
               if (w1[k] == w2[k]) continue next;
           }
           result ~= tuple(w1, w2);
       }
   }
   return result;

}

Tuple!(string,string)[] largestDerangedAna(string[][] anagrams) {

   sort!q{a[0].length > b[0].length}(anagrams);
   foreach (words; anagrams) {
       if (auto derangedPairs = findDeranged(words))
           return derangedPairs;
   }
   return [];

}

void main() {

   auto anags = findAnagrams(split(cast(string)read("unixdict.txt")));
   writeln("Longest anagrams with no characters in the same position:");
   foreach (pairs; largestDerangedAna(anags))
       writefln("  %s, %s", pairs.tupleof);

}</lang> Output:

Longest anagrams with no characters in the same position:
  excitation, intoxicate

Icon and Unicon

This solution (which works in both languages) does a strict interpretation of the problem and ignores the fact that there may be multiple derangements that are the same length (including ignoring multiple derangements arising from the same set of words that are all anagrams).

<lang unicon>link strings # for csort() procedure

procedure main()

   anagrams := table()                     # build lists of anagrams
   every *(word := !&input) > 1 do {
       canon := csort(word)
       /anagrams[canon] := []
       put(anagrams[canon], word)
       }
   longest := 1                            # find a longest derangement
   every *(aList := !anagrams) > 1 do     
       if derangement := derange(aList) then 
           if longest <:= *derangement[1] then long := derangement
   every writes((!\long||" ")|"\n")        # show longest

end

procedure derange(aList) # Return a single derangement from this list

   while aWord := get(aList) do return noMatch(aWord, !aList)

end

procedure noMatch(s1,s2) # Produce pair only if s1,s2 are deranged.

   every i := 1 to *s1 do if s1[i] == s2[i] then fail
   return [s1,s2]

end</lang>

Sample run:

->dra <unixdict.txt
excitation intoxicate 
->

J

This assumes that unixdict.txt has been saved in the current directory.

<lang j> #words=: <;._2 ] 1!:1 <'unixdict.txt' 25104

  #anagrams=: (#~ 1 < #@>) (</.~ /:~&>) words

1303

  #maybederanged=: (#~ (1 -.@e. #@~."1)@|:@:>&>) anagrams

432

  #longest=: (#~ [: (= >./) #@>@{.@>) maybederanged

1

  longest

┌───────────────────────┐ │┌──────────┬──────────┐│ ││excitation│intoxicate││ │└──────────┴──────────┘│ └───────────────────────┘</lang>

Note that anagram sets with more than two members might, hypothetically, have made things more complicated. By lucky coincidence, this was not an issue. We could have taken advantage of that coincidence to achieve slight further simplifications. Perhaps maybederanged=: (#~ (-: ~."1)@|:@:>&>) anagrams

In other words, if we had had to consider whether ascertain/cartesian/sectarian contained a deranged pair, we would have had to break it out into the three pairs it contains. However, since 'excitation' is a longer word than 'ascertain', we know that this triple cannot contain the longest deranged anagram pair. And since there are no anagrams longer than 'excitation' which involve more than a single pair, we know that we can ignore this issue entirely.

Java

Translation of: D

<lang java>import java.io.*; import java.util.*;

public class DerangedAnagrams {

   public static void main(String[] args) throws IOException {
       String[][] anagrams = findAnagrams(readLines("unixdict.txt"));
       String[] res = null;
       for (String[] ana : anagrams) {
           int j = ana[0].length() - 1;
           while (j >= 0 && ana[0].charAt(j) != ana[1].charAt(j)) {
               j--;
           }
           if (j == -1) {
               res = ana;
               break;
           }
       }
       System.out.printf("result: %s %s\n", res[0], res[1]);
   }
   static String[][] findAnagrams(String[] words) {
       Map<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
       for (String w : words) {
           char[] srt;
           Arrays.sort(srt = w.toCharArray());
           String key = String.valueOf(srt);
           if (map.containsKey(key)) {
               map.get(key).add(w);
           } else {
               ArrayList<String> v = new ArrayList<String>();
               v.add(w);
               map.put(key, v);
           }
       }       
       ArrayList<String[]> pairs = new ArrayList<String[]>();
       String[] keys = map.keySet().toArray(new String[]{});
       Arrays.sort(keys, new LengthComparator());
       for (String k : keys) {
           ArrayList<String> v = map.get(k);
           if (v.size() > 1) {
               String[] wrds = v.toArray(new String[]{});
               for (int i = 0; i < wrds.length - 1; i++) 
                   pairs.add(new String[]{wrds[i], wrds[i + 1]});
           }
       }
       return pairs.toArray(new String[][]{});
   }
   static class LengthComparator implements Comparator<String> {
       public int compare(String o1, String o2) {
           if (o1.length() < o2.length()) {
               return 1;
           } else if (o1.length() > o2.length()) {
               return -1;
           } else {
               return 0;
           }
       }
   }
   static public String[] readLines(String fn) throws IOException {
       BufferedReader br = new BufferedReader(new FileReader(fn));
       List<String> lines = new ArrayList<String>();
       String line = null;
       while ((line = br.readLine()) != null) 
           lines.add(line.trim());
       br.close();
       return lines.toArray(new String[]{});
   }

}</lang>

result: excitation intoxicate

PicoLisp

<lang PicoLisp>(let Words NIL

  (in "unixdict.txt"
     (while (line)
        (let (Word @  Key (pack (sort (copy @))))
           (if (idx 'Words Key T)
              (push (car @) Word)
              (set Key (list Word)) ) ) ) )
  (maxi '((X) (length (car X)))
     (extract
        '((Key)
           (pick
              '((Lst)
                 (and
                    (find
                       '((L) (not (find = L Lst)))
                       (val Key) )
                    (cons (pack @) (pack Lst)) ) )
              (val Key) ) )
        (idx 'Words) ) ) )</lang>

Output:

-> ("excitation" . "intoxicate")

Perl 6

Note that, to make runtime manageable, we have created a subset file:
grep '^[ie]' unixdict.txt > dict.ie <lang perl6>my %anagram = slurp('dict.ie').words.map({[.comb]}).classify({ .sort.join });

for %anagram.values.sort({ -@($_[0]) }) -> @aset {

   for     0   ..^ @aset.end -> $i {
       for $i ^..  @aset.end -> $j {
           if none(  @aset[$i].list Zeq @aset[$j].list ) {
               say "{@aset[$i].join}   {@aset[$j].join}";
               exit;
           }
       }
   }

} </lang>

Output:

excitation   intoxicate

Python

<lang python>import urllib.request from collections import defaultdict from itertools import combinations

def getwords(url='http://www.puzzlers.org/pub/wordlists/unixdict.txt'):

   return list(set(urllib.request.urlopen(url).read().decode().split()))

def find_anagrams(words):

   anagram = defaultdict(list) # map sorted chars to anagrams
   for word in words:
       anagram[tuple(sorted(word))].append( word )
   return dict((key, words) for key, words in anagram.items()
               if len(words) > 1)

def is_deranged(words):

   'returns pairs of words that have no character in the same position'
   return [ (word1, word2)
            for word1,word2 in combinations(words, 2)
            if all(ch1 != ch2 for ch1, ch2 in zip(word1, word2)) ]

def largest_deranged_ana(anagrams):

   ordered_anagrams = sorted(anagrams.items(),
                             key=lambda x:(-len(x[0]), x[0]))
   for _, words in ordered_anagrams:
       deranged_pairs = is_deranged(words)
       if deranged_pairs:
           return deranged_pairs
   return []

if __name__ == '__main__':

   words = getwords('http://www.puzzlers.org/pub/wordlists/unixdict.txt')
   print("Word count:", len(words))
   anagrams = find_anagrams(words)
   print("Anagram count:", len(anagrams),"\n")
   print("Longest anagrams with no characters in the same position:")
   print('  ' + '\n  '.join(', '.join(pairs)
                            for pairs in largest_deranged_ana(anagrams)))</lang>
Sample output
Word count: 25104
Anagram count: 1303 

Longest anagrams with no characters in the same position:
  excitation, intoxicate

Tcl

<lang tcl>package require Tcl 8.5 package require http

  1. Fetch the words

set t [http::geturl "http://www.puzzlers.org/pub/wordlists/unixdict.txt"] set wordlist [split [http::data $t] \n] http::cleanup $t

  1. Group by characters in word

foreach word $wordlist {

   dict lappend w [lsort [split $word ""]] [split $word ""]

}

  1. Deranged test

proc deranged? {l1 l2} {

   foreach c1 $l1 c2 $l2 {

if {$c1 eq $c2} {return 0}

   }
   return 1

}

  1. Get a deranged pair from an anagram set, if one exists

proc getDeranged {words} {

   foreach l1 [lrange $words 0 end-1] {

foreach l2 [lrange $words 1 end] { if {[deranged? $l1 $l2]} { return [list $l1 $l2 1] } }

   }
   return {{} {} 0}

}

  1. Find the max-length deranged anagram

set count 0 set candidates {} set max 0 dict for {k words} $w {

   incr count [expr {[llength $words] > 1}]
   if {[llength $k] > $max && [lassign [getDeranged $words] l1 l2]} {

set max [llength $l1] lappend candidates [join $l1 ""],[join $l2 ""]

   }

}

  1. Print out what we found

puts "[llength $wordlist] words" puts "[dict size $w] potential anagram-groups" puts "$count real anagram-groups" foreach pair $candidates {

   puts "considered candidate pairing: $pair"

} puts "MAXIMAL DERANGED ANAGRAM: LENGTH $max\n\t[lindex $candidates end]"</lang> Output:

25105 words
23567 potential anagram-groups
1303 real anagram-groups
considered candidate pairing: abc,cab
considered candidate pairing: abed,bade
considered candidate pairing: abort,bator
considered candidate pairing: afresh,shafer
considered candidate pairing: alberto,latrobe
considered candidate pairing: american,cinerama
considered candidate pairing: ancestral,lancaster
considered candidate pairing: excitation,intoxicate
MAXIMAL DERANGED ANAGRAM: LENGTH 10
	excitation,intoxicate