Determine if a string has all unique characters

Task

Given a character string (which may be empty, or have a length of zero characters):

create a function/procedure/routine to:

determine if all the characters in the string are unique
indicate if or which character is duplicated and where

display each string and its length (as the strings are being examined)
a zero-length (empty) string shall be considered as unique
process the strings from left to right
if unique, display a message saying such
if not unique, then:

display a message saying such
display what character is duplicated
only the 1st non-unique character need be displayed
display where "both" duplicated characters are in the string
the above messages can be part of a single message
display the hexadecimal value of the duplicated character

Use (at least) these five test values (strings):

a string of length 0 (an empty string)
a string of length 1 which is a single period (.)
a string of length 6 which contains: abcABC
a string of length 7 which contains a blank in the middle: XYZ ZYX
a string of length 36 which doesn't contain the letter "oh":

1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ

Show all output here on this page.

Factor

<lang factor>USING: accessors formatting generalizations io kernel math.parser regexp sequences sets strings ;

: >dup-char< ( str n -- char hex first-index second-index )
    1string tuck [ dup first >hex ] 2dip <regexp>
    all-matching-slices first2 [ from>> ] bi@ ;

: duplicate-info. ( str -- )
    dup duplicates
    [ >dup-char< "'%s' (0x%s) at indices %d and %d.\n" printf ]
    with each nl ;

: uniqueness-report. ( str -- )
    dup dup length "%u — length %d — contains " printf dup
    all-unique? [ drop "all unique characters." print nl ]
    [ "duplicate characters:" print duplicate-info. ] if ;

"" "." "abcABC" "XYZ ZYX" "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" [ uniqueness-report. ] 5 napply</lang>

Output:

"" — length 0 — contains all unique characters.

"." — length 1 — contains all unique characters.

"abcABC" — length 6 — contains all unique characters.

"XYZ ZYX" — length 7 — contains duplicate characters:
'Z' (0x5a) at indices 2 and 4.
'Y' (0x59) at indices 1 and 5.
'X' (0x58) at indices 0 and 6.

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" — length 36 — contains duplicate characters:
'0' (0x30) at indices 9 and 24.

Go

<lang go>package main

import "fmt"

func analyze(s string) {

   chars := []rune(s)
   le := len(chars)
   fmt.Printf("Analyzing %q which has a length of %d:\n", s, le)
   if le > 1 {
       for i := 0; i < le-1; i++ {
           for j := i + 1; j < le; j++ {
               if chars[j] == chars[i] {
                   fmt.Println("  Not all characters in the string are unique.")
                   fmt.Printf("  %q (%#[1]x) is duplicated at positions %d and %d.\n\n", chars[i], i+1, j+1)
                   return
               }
           }
       }
   }
   fmt.Println("  All characters in the string are unique.\n")

}

func main() {

   strings := []string{
       "",
       ".",
       "abcABC",
       "XYZ ZYX",
       "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ",
       "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X",
       "hétérogénéité",
       "🎆🎃🎇🎈",
       "😍😀🙌💃😍🙌",
       "🐠🐟🐡🦈🐬🐳🐋🐡",
   }
   for _, s := range strings {
       analyze(s)
   }

}</lang>

Output:

Analyzing "" which has a length of 0:
  All characters in the string are unique.

Analyzing "." which has a length of 1:
  All characters in the string are unique.

Analyzing "abcABC" which has a length of 6:
  All characters in the string are unique.

Analyzing "XYZ ZYX" which has a length of 7:
  Not all characters in the string are unique.
  'X' (0x58) is duplicated at positions 1 and 7.

Analyzing "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" which has a length of 36:
  Not all characters in the string are unique.
  '0' (0x30) is duplicated at positions 10 and 25.

Analyzing "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" which has a length of 39:
  Not all characters in the string are unique.
  '0' (0x30) is duplicated at positions 1 and 11.

Analyzing "hétérogénéité" which has a length of 13:
  Not all characters in the string are unique.
  'é' (0xe9) is duplicated at positions 2 and 4.

Analyzing "🎆🎃🎇🎈" which has a length of 4:
  All characters in the string are unique.

Analyzing "😍😀🙌💃😍🙌" which has a length of 6:
  Not all characters in the string are unique.
  '😍' (0x1f60d) is duplicated at positions 1 and 5.

Analyzing "🐠🐟🐡🦈🐬🐳🐋🐡" which has a length of 8:
  Not all characters in the string are unique.
  '🐡' (0x1f421) is duplicated at positions 3 and 8.
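
The nested loops in the Go entry compare every pair of runes, so the work grows quadratically with the string length. The following is a minimal sketch of a two-pass alternative that uses a map instead. The helper name `firstDuplicate` and its return values are assumptions made for illustration and are not part of the entry above. It is intended to report the same character and the same 1-based positions as the nested-loop version: the leftmost character that has a repeat, together with its first two occurrences.

```go
package main

import "fmt"

// firstDuplicate is a hypothetical helper (not part of the original entry).
// Scanning left to right, it returns the first rune that occurs more than
// once, plus the 1-based positions of its first two occurrences.
// ok is false when all runes are unique (this includes the empty string).
func firstDuplicate(s string) (r rune, first, second int, ok bool) {
	chars := []rune(s)

	// Pass 1: remember the first two positions of every rune.
	type pair struct{ first, second int }
	seen := make(map[rune]*pair)
	for i, c := range chars {
		p, found := seen[c]
		switch {
		case !found:
			seen[c] = &pair{first: i + 1}
		case p.second == 0:
			p.second = i + 1
		}
	}

	// Pass 2: the leftmost rune with a recorded second position wins.
	for _, c := range chars {
		if p := seen[c]; p.second != 0 {
			return c, p.first, p.second, true
		}
	}
	return 0, 0, 0, false
}

func main() {
	tests := []string{"", ".", "abcABC", "XYZ ZYX", "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"}
	for _, s := range tests {
		n := len([]rune(s))
		if r, i, j, ok := firstDuplicate(s); ok {
			fmt.Printf("%q (length %d): %q (%#x) is duplicated at positions %d and %d.\n", s, n, r, r, i, j)
		} else {
			fmt.Printf("%q (length %d): all characters are unique.\n", s, n)
		}
	}
}
```

A plain single pass that stops at the first repeated rune would report 'Z' for "XYZ ZYX", because the second 'Z' is met before the second 'X'. The second pass is what keeps the result aligned with the left-to-right rule in the task description.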

Perl

<lang perl>use strict;
use warnings;
use feature 'say';
use utf8;
binmode(STDOUT, ':utf8');
use List::AllUtils qw(uniq);
use Unicode::Normalize qw(NFC);
use Unicode::UCD 'charinfo';

for my $str (

   '',
   '.',
   'abcABC',
   'XYZ ZYX',
   '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ',
   '01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X',
   'Δ👍👨👍Δ',

) {

   printf qq{\n"$str" (length: %d) has }, my $len = length NFC $str;
   if ($len != uniq my @S = split //, NFC $str) {
       say "duplicated characters:";
       my %P;
       push @{ $P{$S[$_]} }, 1+$_ for 0..$#S;
       for my $k (sort keys %P) {
           next unless @{$P{$k}} > 1;
           printf "'%s' %s (0x%x) in positions: %s\n", $k, charinfo(ord $k)->{'name'}, ord($k), join ', ', @{$P{$k}};
       }
   } else {
       say "no duplicated characters."
   }

}</lang>

Output:

"" (length: 0) has no duplicated characters.

"." (length: 1) has no duplicated characters.

"abcABC" (length: 6) has no duplicated characters.

"XYZ ZYX" (length: 7) has duplicated characters:
'X' LATIN CAPITAL LETTER X (0x58) in positions: 1, 7
'Y' LATIN CAPITAL LETTER Y (0x59) in positions: 2, 6
'Z' LATIN CAPITAL LETTER Z (0x5a) in positions: 3, 5

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" (length: 36) has duplicated characters:
'0' DIGIT ZERO (0x30) in positions: 10, 25

"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" (length: 39) has duplicated characters:
'0' DIGIT ZERO (0x30) in positions: 1, 11, 26, 38
'X' LATIN CAPITAL LETTER X (0x58) in positions: 35, 39

"Δ👍👨👍Δ" (length: 5) has duplicated characters:
'Δ' GREEK CAPITAL LETTER DELTA (0x394) in positions: 1, 5
'👍' THUMBS UP SIGN (0x1f44d) in positions: 2, 4

Perl 6

Works with: Rakudo version 2019.07.1

Perl 6 works with Unicode natively and handles combining characters and multi-byte emoji correctly. In the last string, notice that the length is correctly shown as 11 characters and that the delta with a combining circumflex in position 6 is not the same as the deltas without it in positions 5 and 9.

<lang perl6> -> $str {

   my $i = 0;
   print "\n{$str.perl} (length: {$str.chars}), has ";
   my %m;
   %m{$_}.push: ++$i for $str.comb;
   if any(%m.values) > 1 {
       say "duplicated characters:";
       say "'{.key}' ({.key.uninames}; hex ordinal: {(.key.ords).fmt: "0x%X"})" ~
       " in positions: {.value.join: ', '}" for %m.grep( *.value > 1 ).sort( *.value[0] );
   } else {
       say "no duplicated characters."
   }

} for

   '',
   '.',
   'abcABC',
   'XYZ ZYX',
   '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ',
   '01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X',
   '🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦'</lang>

Output:

"" (length: 0), has no duplicated characters.

"." (length: 1), has no duplicated characters.

"abcABC" (length: 6), has no duplicated characters.

"XYZ ZYX" (length: 7), has duplicated characters:
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 1, 7
'Y' (LATIN CAPITAL LETTER Y; hex ordinal: 0x59) in positions: 2, 6
'Z' (LATIN CAPITAL LETTER Z; hex ordinal: 0x5A) in positions: 3, 5

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" (length: 36), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 10, 25

"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" (length: 39), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 1, 11, 26, 38
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 35, 39

"🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦" (length: 11), has duplicated characters:
'🦋' (BUTTERFLY; hex ordinal: 0x1F98B) in positions: 1, 8
'👨‍👩‍👧‍👦' (MAN ZERO WIDTH JOINER WOMAN ZERO WIDTH JOINER GIRL ZERO WIDTH JOINER BOY; hex ordinal: 0x1F468 0x200D 0x1F469 0x200D 0x1F467 0x200D 0x1F466) in positions: 3, 11
'Δ' (GREEK CAPITAL LETTER DELTA; hex ordinal: 0x394) in positions: 5, 9
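
The Perl 6 result depends on comparing whole graphemes rather than individual code points. As a contrast, here is a minimal sketch in Go, where ranging over a string yields code points (runes). The helper `hasDuplicateRune` is an assumption made for illustration. At the rune level, a delta followed by a delta with a combining circumflex counts as three code points, and the base delta is seen twice.

```go
package main

import "fmt"

// hasDuplicateRune is a hypothetical helper: it reports whether any code
// point (rune) occurs more than once in s. It does not group combining
// marks with their base character.
func hasDuplicateRune(s string) bool {
	seen := make(map[rune]bool)
	for _, c := range s {
		if seen[c] {
			return true
		}
		seen[c] = true
	}
	return false
}

func main() {
	// U+0394 GREEK CAPITAL LETTER DELTA, then U+0394 followed by
	// U+0302 COMBINING CIRCUMFLEX ACCENT: two user-perceived characters.
	s := "\u0394\u0394\u0302"
	fmt.Println(len([]rune(s)))      // 3
	fmt.Println(hasDuplicateRune(s)) // true
}
```

A grapheme-aware comparison would treat the two deltas in this string as different characters, which is the behaviour shown in the Perl 6 output.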

REXX

<lang rexx>/*REXX pgm determines if a string is comprised of all unique characters (no duplicates).*/
@.=                                              /*assign a default for the  @.  array. */
parse arg @.1                                    /*obtain optional argument from the CL.*/
if @.1=''  then do;   @.1=                       /*Not specified?  Then assume defaults.*/
                      @.2= .
                      @.3= 'abcABC'
                      @.4= 'XYZ ZYX'
                      @.5= '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ'
                end

     do j=1;  if j\==1  &  @.j==''  then leave   /*String is null & not j=1?  We're done*/
     say copies('─', 79)                         /*display a separator line  (a fence). */
     say 'Testing for the string (length' length(@.j)"): "   @.j
     say
     dup= isUnique(@.j)
     say 'The characters in the string'   word("are aren't", 1 + (dup>0) )  'all unique.'
     if dup==0  then iterate
     ?= substr(@.j, dup, 1)
     say 'The character '  ?  " ('"c2x(?)"'x)  at position "  dup ,
                                 ' is repeated at position '  pos(?, @.j, dup+1)
     end   /*j*/
exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
isUnique: procedure; parse arg x                          /*obtain the character string.*/
                       do k=1  to length(x) - 1           /*examine all but the last.   */
                       p= pos( substr(x, k, 1), x, k + 1) /*see if the Kth char is a dup*/
                       if p\==0  then return k            /*Find a dup? Return location.*/
                       end   /*k*/
          return 0                                        /*indicate all chars unique.  */</lang>

Output when using the internal defaults:

───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 0):

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 1):  .

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 6):  abcABC

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 7):  XYZ ZYX

The characters in the string aren't all unique.
The character  X  ('58'x)  at position  1  is repeated at position  7
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 36):  1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ

The characters in the string aren't all unique.
The character  0  ('30'x)  at position  10  is repeated at position  25

zkl

<lang zkl>fcn stringUniqueness(str){  // Does not handle Unicode
   sz,unique,uz,counts := str.len(), str.unique(), unique.len(), str.counts();
   println("Length %d: \"%s\"".fmt(sz,str));
   if(sz==uz or uz==1) println("\tAll characters are unique");
   else  // counts is (char,count, char,count, ...)
      println("\tDuplicate: ",
         counts.pump(List,Void.Read,fcn(str,c,n){
            if(n>1){
               is,z:=List(),-1; do(n){ is.append(z=str.find(c,z+1)) }
               "'%s' (0x%x)[%s]".fmt(c,c.toAsc(),is.concat(","))
            }
            else Void.Skip
         }.fp(str)).concat(", "));
}</lang>
<lang zkl>testStrings:=T("", ".", "abcABC", "XYZ ZYX",
   "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ",
   "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X");
foreach s in (testStrings){ stringUniqueness(s) }</lang>

Output:

Length 0: ""
	All characters are unique
Length 1: "."
	All characters are unique
Length 6: "abcABC"
	All characters are unique
Length 7: "XYZ ZYX"
	Duplicate: 'X' (0x58)[0,6], 'Y' (0x59)[1,5], 'Z' (0x5a)[2,4]
Length 36: "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"
	Duplicate: '0' (0x30)[9,24]
Length 39: "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X"
	Duplicate: '0' (0x30)[0,10,25,37], 'X' (0x58)[34,38]