Isograms and heterograms

From Rosetta Code
Revision as of 20:33, 17 June 2022 by Wherrera (talk | contribs) (julia example)
Isograms and heterograms is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Definitions

For the purposes of this task, an isogram is a string in which each character present is used the same number of times, and an n-isogram is an isogram in which each character present is used exactly n times.

A heterogram means a string in which no character occurs more than once. It follows that a heterogram is the same thing as a 1-isogram.


Examples

caucasus is a 2-isogram because the letters c, a, u and s all occur twice.

atmospheric is a heterogram because all its letters are used once only.
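
The definitions above can be checked mechanically. The following Python sketch is not one of the solutions submitted for this task; the helper name `isogram_order` is hypothetical and chosen only for illustration. It returns n when the string is an n-isogram and 0 otherwise, ignoring capitalization as the task requires.

```python
from collections import Counter

def isogram_order(word):
    """Return n if word is an n-isogram, else 0 (hypothetical helper)."""
    # Collect the distinct occurrence counts of the characters present.
    counts = set(Counter(word.lower()).values())
    # An isogram has exactly one distinct count; that count is n.
    return counts.pop() if len(counts) == 1 else 0

print(isogram_order("caucasus"))     # 2 (a 2-isogram)
print(isogram_order("atmospheric"))  # 1 (a heterogram)
print(isogram_order("banana"))       # 0 (not an isogram)
```

Under this sketch a heterogram is simply a word for which the helper returns 1.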


Task

Using unixdict.txt and ignoring capitalization:


1) Find and display here all words which are n-isograms where n > 1.

Present the results as a single list but sorted as follows:

a. By decreasing order of n;

b. Then by decreasing order of word length;

c. Then by ascending lexicographic order.

2) Secondly, find and display here all words which are heterograms and have more than 10 characters.

Again present the results as a single list but sorted as per b. and c. above.
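
The sort order in a., b. and c. can be expressed as a single composite key. The Python sketch below is an illustration only: the names `isogram_order` and `sort_isograms` are assumptions, and a small hand-picked sample list stands in for the contents of unixdict.txt.

```python
from collections import Counter

def isogram_order(word):
    """Return n if word is an n-isogram, else 0 (hypothetical helper)."""
    counts = set(Counter(word.lower()).values())
    return counts.pop() if len(counts) == 1 else 0

def sort_isograms(words):
    """Sort by decreasing n, then decreasing length, then ascending text."""
    # Negating n and the length turns an ascending sort into a descending one
    # for those two fields, while the word itself still sorts ascending.
    return sorted(words, key=lambda w: (-isogram_order(w), -len(w), w))

# Sample words only; the task itself reads them from unixdict.txt.
sample = ["noon", "iii", "deed", "caucasus", "aaa", "murmur"]
print(sort_isograms(sample))
# ['aaa', 'iii', 'caucasus', 'murmur', 'deed', 'noon']
```

The same key also serves for the heterogram list, because n is 1 for every heterogram and so only the length and the word itself affect the order.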




Factor

Works with: Factor version 0.99 2022-04-03

<lang factor>USING: assocs combinators.short-circuit.smart grouping io io.encodings.ascii io.files kernel literals math math.order math.statistics sequences sets sorting ;

CONSTANT: words $[ "unixdict.txt" ascii file-lines ]

: isogram<=> ( a b -- <=> )
    { [ histogram values first ] [ length ] } compare-with ;

: isogram-sort ( seq -- seq' )
    [ isogram<=> invert-comparison ] sort ;

: isogram? ( seq -- ? )
    histogram values { [ first 1 > ] [ all-eq? ] } && ;

: .words-by ( quot -- )
    words swap filter isogram-sort [ print ] each ; inline

"List of n-isograms where n > 1:" print [ isogram? ] .words-by nl

"List of heterograms of length > 10:" print [ { [ length 10 > ] [ all-unique? ] } && ] .words-by</lang>

Output:
List of n-isograms where n > 1:
aaa
iii
beriberi
bilabial
caucasus
couscous
teammate
appall
emmett
hannah
murmur
tartar
testes
anna
coco
dada
deed
dodo
gogo
isis
juju
lulu
mimi
noon
otto
papa
peep
poop
teet
tete
toot
tutu
ii

List of heterograms of length > 10:
ambidextrous
bluestocking
exclusionary
incomputable
lexicography
loudspeaking
malnourished
atmospheric
blameworthy
centrifugal
christendom
consumptive
countervail
countryside
countrywide
disturbance
documentary
earthmoving
exculpatory
geophysical
inscrutable
misanthrope
problematic
selfadjoint
stenography
sulfonamide
switchblade
switchboard
switzerland
thunderclap
valedictory
voluntarism

Julia

<lang ruby>function isogram(word)

   wchars, uchars = collect(word), unique(collect(word))
   ulen, wlen = length(uchars), length(wchars)
   (wlen == 1 || ulen == wlen) && return 1
   n = count(==(first(uchars)), wchars)
   return all(i -> count(==(uchars[i]), wchars) == n, 2:ulen) ? n : 0

end

words = split(lowercase(read("documents/julia/unixdict.txt", String)), r"\s+")
orderlengthtuples = [(isogram(w), length(w), w) for w in words]

tcomp(x, y) = (x[1] != y[1] ? y[1] < x[1] : x[2] != y[2] ? y[2] < x[2] : x[3] < y[3])

nisograms = sort!(filter(t -> t[1] > 1, orderlengthtuples), lt = tcomp)
heterograms = sort!(filter(t -> t[1] == 1 && length(t[3]) > 10, orderlengthtuples), lt = tcomp)

println("N-Isogram   N  Length\n", "-"^24)
foreach(t -> println(rpad(t[3], 8), lpad(t[1], 5), lpad(t[2], 5)), nisograms)
println("\nHeterogram   Length\n", "-"^20)
foreach(t -> println(rpad(t[3], 12), lpad(t[2], 5)), heterograms)

</lang>

Output:
N-Isogram   N  Length
------------------------
aaa         3    3
iii         3    3
beriberi    2    8
bilabial    2    8
caucasus    2    8
couscous    2    8
teammate    2    8
appall      2    6
emmett      2    6
hannah      2    6
murmur      2    6
tartar      2    6
testes      2    6
anna        2    4
coco        2    4
dada        2    4
deed        2    4
dodo        2    4
gogo        2    4
isis        2    4
juju        2    4
lulu        2    4
mimi        2    4
noon        2    4
otto        2    4
papa        2    4
peep        2    4
poop        2    4
teet        2    4
tete        2    4
toot        2    4
tutu        2    4
ii          2    2

Heterogram   Length
--------------------
ambidextrous   12
bluestocking   12
exclusionary   12
incomputable   12
lexicography   12
loudspeaking   12
malnourished   12
atmospheric    11
blameworthy    11
centrifugal    11
christendom    11
consumptive    11
countervail    11
countryside    11
countrywide    11
disturbance    11
documentary    11
earthmoving    11
exculpatory    11
geophysical    11
inscrutable    11
misanthrope    11
problematic    11
selfadjoint    11
stenography    11
sulfonamide    11
switchblade    11
switchboard    11
switzerland    11
thunderclap    11
valedictory    11
voluntarism    11

Raku

<lang perl6>my $file = 'unixdict.txt';

my @words = $file.IO.slurp.words.race.map: { $_ => .comb.Bag };

.say for (6...2).map: -> $n {
    next unless my @iso = @words.race.grep({.value.values.all == $n})».key;
    "\n({+@iso}) {$n}-isograms:\n" ~ @iso.sort({[-.chars, ~$_]}).join: "\n";
}

my $minchars = 10;

say "\n({+$_}) heterograms with more than $minchars characters:\n" ~
  .sort({[-.chars, ~$_]}).join: "\n" given
  @words.race.grep({.key.chars > $minchars && .value.values.max == 1})».key;</lang>
Output:
(2) 3-isograms:
aaa
iii

(31) 2-isograms:
beriberi
bilabial
caucasus
couscous
teammate
appall
emmett
hannah
murmur
tartar
testes
anna
coco
dada
deed
dodo
gogo
isis
juju
lulu
mimi
noon
otto
papa
peep
poop
teet
tete
toot
tutu
ii

(32) heterograms with more than 10 characters:
ambidextrous
bluestocking
exclusionary
incomputable
lexicography
loudspeaking
malnourished
atmospheric
blameworthy
centrifugal
christendom
consumptive
countervail
countryside
countrywide
disturbance
documentary
earthmoving
exculpatory
geophysical
inscrutable
misanthrope
problematic
selfadjoint
stenography
sulfonamide
switchblade
switchboard
switzerland
thunderclap
valedictory
voluntarism

Wren

Library: Wren-str

<lang ecmascript>import "io" for File
import "./str" for Str

var isogram = Fn.new { |word|

   if (word.count == 1) return 1
   var map = {}
   word = Str.lower(word)
   for (c in word) {
       if (map.containsKey(c)) {
           map[c] = map[c] + 1
       } else {
           map[c] = 1
       }
   }
   var chars = map.keys.toList
   var n = map[chars[0]]
   var iso = chars[1..-1].all { |c| map[c] == n }
   return iso ? n : 0

}

var isoComparer = Fn.new { |i, j|

   if (i[1] != j[1]) return i[1] > j[1]
   if (i[0].count != j[0].count) return i[0].count > j[0].count
   return Str.le(i[0], j[0])

}

var heteroComparer = Fn.new { |i, j|

   if (i[0].count != j[0].count) return i[0].count > j[0].count
   return Str.le(i[0], j[0])

}

var wordList = "unixdict.txt" // local copy
var words = File.read(wordList)
                .trimEnd()
                .split("\n")
                .map { |word| [word, isogram.call(word)] }

var isograms = words.where { |t| t[1] > 1 }
                    .toList
                    .sort(isoComparer)
                    .map { |t| "  " + t[0] }
                    .toList

System.print("List of n-isograms(%(isograms.count)) where n > 1:")
System.print(isograms.join("\n"))

var heterograms = words.where { |t| t[1] == 1 && t[0].count > 10 }
                       .toList
                       .sort(heteroComparer)
                       .map { |t| "  " + t[0] }
                       .toList

System.print("\nList of heterograms(%(heterograms.count)) of length > 10:")
System.print(heterograms.join("\n"))</lang>

Output:
List of n-isograms(33) where n > 1:
  aaa
  iii
  beriberi
  bilabial
  caucasus
  couscous
  teammate
  appall
  emmett
  hannah
  murmur
  tartar
  testes
  anna
  coco
  dada
  deed
  dodo
  gogo
  isis
  juju
  lulu
  mimi
  noon
  otto
  papa
  peep
  poop
  teet
  tete
  toot
  tutu
  ii

List of heterograms(32) of length > 10:
  ambidextrous
  bluestocking
  exclusionary
  incomputable
  lexicography
  loudspeaking
  malnourished
  atmospheric
  blameworthy
  centrifugal
  christendom
  consumptive
  countervail
  countryside
  countrywide
  disturbance
  documentary
  earthmoving
  exculpatory
  geophysical
  inscrutable
  misanthrope
  problematic
  selfadjoint
  stenography
  sulfonamide
  switchblade
  switchboard
  switzerland
  thunderclap
  valedictory
  voluntarism