Determine if a string has all unique characters: Difference between revisions

From Rosetta Code

Revision as of 19:12, 30 October 2019

Determine if a string has all unique characters is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Given a character string   (which may be empty, or have a length of zero characters):

  •   create a function/procedure/routine to:
  •   determine if all the characters in the string are unique
  •   indicate if or which character is duplicated and where
  •   display each string and its length   (as the strings are being examined)
  •   a zero─length (empty) string shall be considered as unique
  •   process the strings from left─to─right
  •   if       unique,   display a message saying such
  •   if not unique,   then:
  •   display a message saying such
  •   display what character is duplicated
  •   only the 1st non─unique character need be displayed
  •   display where "both" duplicated characters are in the string
  •   the above messages can be part of a single message
  •   display the hexadecimal value of the duplicated character
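
The requirements above can be sketched in Go roughly as follows. This is only an illustration, not one of the page's entries: the names `firstDuplicate` and `report` are hypothetical, indices are 0-based rune indices, and "the 1st non-unique character" is read here as the first character met a second time during a left-to-right scan (the REXX entry reads it differently and reports the earliest character that recurs later).

```go
package main

import "fmt"

// firstDuplicate scans s left to right, rune by rune, and returns the first
// rune that is seen a second time, together with the 0-based rune indices of
// both occurrences. found is false when every rune is unique, which includes
// the empty string.
func firstDuplicate(s string) (dup rune, first, second int, found bool) {
	seen := make(map[rune]int) // rune -> index of its first occurrence
	for i, r := range []rune(s) {
		if j, ok := seen[r]; ok {
			return r, j, i, true
		}
		seen[r] = i
	}
	return 0, -1, -1, false
}

// report builds the single message the task allows: the string, its length,
// and either a uniqueness note or the duplicated character, its hexadecimal
// value, and both positions.
func report(s string) string {
	n := len([]rune(s))
	if r, i, j, ok := firstDuplicate(s); ok {
		return fmt.Sprintf("%q (length %d): %q (%#x) is duplicated at indices %d and %d", s, n, r, r, i, j)
	}
	return fmt.Sprintf("%q (length %d): all characters are unique", s, n)
}

func main() {
	tests := []string{"", ".", "abcABC", "XYZ ZYX", "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"}
	for _, s := range tests {
		fmt.Println(report(s))
	}
}
```

Stopping at the first repeat keeps the scan to a single pass and makes the result deterministic, at the cost of not listing every duplicated character.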


Use (at least) these five test values   (strings):

  •   a string of length     0   (an empty string)
  •   a string of length     1   which is a single period   (.)
  •   a string of length     6   which contains:   abcABC
  •   a string of length     7   which contains a blank in the middle:   XYZ  ZYX
  •   a string of length   36   which   doesn't   contain the letter "oh":
1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ


Show all output here on this page.


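Related task

  •   determine if a string has all the same characters (https://rosettacode.org/wiki/Determine_if_a_string_has_all_the_same_characters)
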

Factor

<lang factor>USING: accessors formatting generalizations io kernel math.parser regexp sequences sets strings ;

: >dup-char< ( str n -- char hex first-index second-index )
   1string tuck [ dup first >hex ] 2dip <regexp>
   all-matching-slices first2 [ from>> ] bi@ ;

: duplicate-info. ( str -- )
   dup duplicates
   [ >dup-char< "'%s' (0x%s) at indices %d and %d.\n" printf ]
   with each nl ;

: uniqueness-report. ( str -- )
   dup dup length "%u — length %d — contains " printf dup
   all-unique? [ drop "all unique characters." print nl ]
   [ "duplicate characters:" print duplicate-info. ] if ;

"" "." "abcABC" "XYZ ZYX" "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" [ uniqueness-report. ] 5 napply</lang>

Output:
"" — length 0 — contains all unique characters.

"." — length 1 — contains all unique characters.

"abcABC" — length 6 — contains all unique characters.

"XYZ ZYX" — length 7 — contains duplicate characters:
'Z' (0x5a) at indices 2 and 4.
'Y' (0x59) at indices 1 and 5.
'X' (0x58) at indices 0 and 6.

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" — length 36 — contains duplicate characters:
'0' (0x30) at indices 9 and 24.

Go

<lang go>package main

import "fmt"

func analyze(s string) {

   // maps each character to a slice of the indices it occurs at
   charMap := make(map[rune][]int)
   runes := []rune(s)
   for i, c := range runes {
       charMap[c] = append(charMap[c], i)
   }
   le := len(runes)
   fmt.Printf("Analyzing %q which has a length of %d:\n", s, le)
   if len(charMap) == le {
       fmt.Println("  All characters in the string are unique.")
       fmt.Println()
   } else {
       fmt.Println("  The following characters are duplicated:-")
       for k, v := range charMap {
           if len(v) > 1 {
               fmt.Printf("    %q (%#x) at indices %v\n", k, k, v)
           }
       }
       fmt.Println()
   }

}

func main() {

   strings := []string{
       "",
       ".",
       "abcABC",
       "XYZ ZYX",
       "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ",
       "hétérogénéité",
       "😍😀🙌💃😍🙌",
       "🐠🐟🐡🦈🐬🐳🐋🐡",
   }
   for _, s := range strings {
       analyze(s)
   }

}</lang>

Output:
Analyzing "" which has a length of 0:
  All characters in the string are unique.

Analyzing "." which has a length of 1:
  All characters in the string are unique.

Analyzing "abcABC" which has a length of 6:
  All characters in the string are unique.

Analyzing "XYZ ZYX" which has a length of 7:
  The following characters are duplicated:-
    'X' (0x58) at indices [0 6]
    'Y' (0x59) at indices [1 5]
    'Z' (0x5a) at indices [2 4]

Analyzing "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" which has a length of 36:
  The following characters are duplicated:-
    '0' (0x30) at indices [9 24]

Analyzing "hétérogénéité" which has a length of 13:
  The following characters are duplicated:-
    'é' (0xe9) at indices [1 3 7 9 12]
    't' (0x74) at indices [2 11]

Analyzing "😍😀🙌💃😍🙌" which has a length of 6:
  The following characters are duplicated:-
    '😍' (0x1f60d) at indices [0 4]
    '🙌' (0x1f64c) at indices [2 5]

Analyzing "🐠🐟🐡🦈🐬🐳🐋🐡" which has a length of 8:
  The following characters are duplicated:-
    '🐡' (0x1f421) at indices [2 7]

Perl 6

Works with: Rakudo version 2019.07.1

Perl 6 works with Unicode natively and handles combining characters and multi-byte emoji correctly. In the last string, notice that the length is correctly shown as 11 characters and that the delta with a combining circumflex in position 6 is not the same as the deltas without it in positions 5 & 9.
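
For contrast, here is a small Go sketch (Go is used only because it already appears on this page) of what a plain code-point count reports for the same pieces of text. The helper name `codePoints` is hypothetical; the code points for the family emoji are the ones listed in the output below.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints returns the number of Unicode code points in s. It does not
// group combining sequences or zero-width-joiner sequences into single
// user-perceived characters.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	plain := "\u0394"          // GREEK CAPITAL LETTER DELTA
	combined := "\u0394\u0302" // delta followed by COMBINING CIRCUMFLEX ACCENT
	// MAN, WOMAN, GIRL, BOY joined by ZERO WIDTH JOINER (U+200D)
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

	fmt.Println(codePoints(plain), codePoints(combined), codePoints(family))
}
```

A uniqueness check built on code points alone would therefore see the zero width joiner as repeated inside a single family emoji, whereas a grapheme-aware count treats the whole sequence as one character.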

<lang perl6> -> $str {

   print "\n{$str.perl} (length: {$str.chars}), has ";
   if my $match = $str.match( / (.).*$0 /, :ex ) {
       my %m;
       %m{.values.Str}.append(flat 1 + .from, .pos) for $match.list;
       say "duplicated characters:";
       say "'{.key}' ({.key.uninames}; hex ordinal: {(.key.ords).fmt: "0x%X"})" ~
       " in positions: {.value.sort.squish.join: ', '}" for %m.sort( *.value[0] );
   } else {
       say "no duplicated characters."
   }

} for
   '',
   '.',
   'abcABC',
   'XYZ ZYX',
   '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ',
   '01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X',
   '🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦'</lang>
Output:
"" (length: 0), has no duplicated characters.

"." (length: 1), has no duplicated characters.

"abcABC" (length: 6), has no duplicated characters.

"XYZ ZYX" (length: 7), has duplicated characters:
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 1, 7
'Y' (LATIN CAPITAL LETTER Y; hex ordinal: 0x59) in positions: 2, 6
'Z' (LATIN CAPITAL LETTER Z; hex ordinal: 0x5A) in positions: 3, 5

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" (length: 36), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 10, 25

"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" (length: 39), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 1, 11, 26, 38
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 35, 39

"🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦" (length: 11), has duplicated characters:
'🦋' (BUTTERFLY; hex ordinal: 0x1F98B) in positions: 1, 8
'👨‍👩‍👧‍👦' (MAN ZERO WIDTH JOINER WOMAN ZERO WIDTH JOINER GIRL ZERO WIDTH JOINER BOY; hex ordinal: 0x1F468 0x200D 0x1F469 0x200D 0x1F467 0x200D 0x1F466) in positions: 3, 11
'Δ' (GREEK CAPITAL LETTER DELTA; hex ordinal: 0x394) in positions: 5, 9

REXX

<lang rexx>/*REXX pgm determines if a string is comprised of all unique characters (no duplicates).*/
@.=                                              /*assign a default for the  @.  array. */
parse arg @.1                                    /*obtain optional argument from the CL.*/
if @.1=''  then do;   @.1=                       /*Not specified?  Then assume defaults.*/
                     @.2= .
                     @.3= 'abcABC'
                     @.4= 'XYZ ZYX'
                     @.5= '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ'
               end
    do j=1;  if j\==1  &  @.j==''  then leave   /*String is null & not j=1?  We're done*/
    say copies('─', 79)                         /*display a separator line  (a fence). */
    say 'Testing for the string (length' length(@.j)"): "   @.j
    say
    dup= isUnique(@.j)
    say 'The characters in the string'   word("are aren't", 1 + (dup>0) )  'all unique.'
    if dup==0  then iterate
    ?= substr(@.j, dup, 1)
    say 'The character '  ?  " ('"c2x(?)"'x)  at position "  dup ,
                                ' is repeated at position '  pos(?, @.j, dup+1)
    end   /*j*/

exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
isUnique: procedure;  parse arg x                          /*obtain the character string.*/
                      do k=1  to length(x) - 1           /*examine all but the last.   */
                      p= pos( substr(x, k, 1), x, k + 1) /*see if the Kth char is a dup*/
                      if p\==0  then return k            /*Find a dup? Return location.*/
                      end   /*k*/
         return 0                                        /*indicate all chars unique.  */</lang>
output   when using the internal defaults
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 0):

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 1):  .

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 6):  abcABC

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 7):  XYZ ZYX

The characters in the string aren't all unique.
The character  X  ('58'x)  at position  1  is repeated at position  7
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 36):  1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ

The characters in the string aren't all unique.
The character  0  ('30'x)  at position  10  is repeated at position  25