Anagrams/Deranged anagrams

From Rosetta Code
Revision as of 14:59, 14 May 2011 by rosettacode>Dkf (→‎Tcl: Added implementation)
Anagrams/Deranged anagrams is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Two or more words are said to be anagrams if they have the same characters, but in a different order. By analogy with derangements we define a deranged anagram as two words with the same characters, but in which the same character does not appear in the same position in both words.

The task is to use the word list at http://www.puzzlers.org/pub/wordlists/unixdict.txt to find and show the longest deranged anagram.

Python

<lang python>import urllib.request from collections import defaultdict from itertools import combinations

def getwords(url='http://www.puzzlers.org/pub/wordlists/unixdict.txt'):

   return list(set(urllib.request.urlopen(url).read().decode().split()))

def find_anagrams(words):

   anagram = defaultdict(list) # map sorted chars to anagrams
   for word in words:
       anagram[tuple(sorted(word))].append( word )
   return dict((key, words) for key, words in anagram.items()
               if len(words) > 1)

def is_deranged(words):

   'returns pairs of words that have no character in the same position'
   return [ (word1, word2)
            for word1,word2 in combinations(words, 2)
            if all(ch1 != ch2 for ch1, ch2 in zip(word1, word2)) ]

def largest_deranged_ana(anagrams):

   ordered_anagrams = sorted(anagrams.items(),
                             key=lambda x:(-len(x[0]), x[0]))
   for _, words in ordered_anagrams:
       deranged_pairs = is_deranged(words)
       if deranged_pairs:
           return deranged_pairs
   return []

if __name__ == '__main__':

   words = getwords('http://www.puzzlers.org/pub/wordlists/unixdict.txt')
   print("Word count:", len(words))
   anagrams = find_anagrams(words)
   print("Anagram count:", len(anagrams),"\n")
   print("Longest anagrams with no characters in the same position:")
   print('  ' + '\n  '.join(', '.join(pairs)
                            for pairs in largest_deranged_ana(anagrams)))</lang>
Sample output
Word count: 25104
Anagram count: 1303 

Longest anagrams with no characters in the same position:
  excitation, intoxicate

Tcl

<lang tcl>package require Tcl 8.5 package require http

  1. Fetch the words

set t [http::geturl http://www.puzzlers.org/pub/wordlists/unixdict.txt] set wordlist [split [http::data $t] \n] http::cleanup $t

  1. Group by characters in word

foreach word $wordlist {

   dict lappend w [lsort [split $word ""]] [split $word ""]

}

  1. Find the max-length deranged anagram

set count 0 set candidates {} set max 0 set maxset {} dict for {k words} $w {

   if {[llength $words] < 2} continue
   incr count
   if {[llength $k] < $max} continue
   foreach l1 [lrange $words 0 end-1] {

foreach l2 [lrange $words 1 end] { if {$l1 eq $l2} continue set skip 0 foreach c1 $l1 c2 $l2 { if {[set skip [string equal $c1 $c2]]} break } if {$skip} continue if {[llength $l1] > $max} { set max [llength $l1] set maxset {} } set candidate [join $l1 ""],[join $l2 ""] lappend maxset $candidate lappend candidates $candidate }

   }

}

  1. Print out what we found

puts "[llength $wordlist] words" puts "[dict size $w] potential anagram-groups" puts "$count real anagram-groups" foreach pair $candidates {

   puts "considered candidate pairing: $pair"

} puts "MAXIMAL DERANGED ANAGRAM: LENGTH $max" foreach pair $maxset {

   puts \t$pair

}</lang> Output:

25105 words
23567 potential anagram-groups
1303 real anagram-groups
considered candidate pairing: abc,cab
considered candidate pairing: abed,bade
considered candidate pairing: abel,bale
considered candidate pairing: abel,bela
considered candidate pairing: abel,elba
considered candidate pairing: able,elba
considered candidate pairing: bale,elba
considered candidate pairing: abet,bate
considered candidate pairing: abet,beta
considered candidate pairing: abort,bator
considered candidate pairing: acton,canto
considered candidate pairing: afresh,shafer
considered candidate pairing: aileen,elaine
considered candidate pairing: alberto,latrobe
considered candidate pairing: algenib,belgian
considered candidate pairing: algenib,bengali
considered candidate pairing: american,cinerama
considered candidate pairing: ancestral,lancaster
considered candidate pairing: broadside,sideboard
considered candidate pairing: englander,greenland
considered candidate pairing: excitation,intoxicate
MAXIMAL DERANGED ANAGRAM: LENGTH 10
	excitation,intoxicate