Determine if a string has all unique characters
- Task
Given a character string (which may be empty, or have a length of zero characters):
- create a function/procedure/routine to:
  - determine if all the characters in the string are unique
  - indicate if or which character is duplicated and where
- display each string and its length (as the strings are being examined)
- a zero-length (empty) string shall be considered as unique
- process the strings from left to right
- if unique, display a message saying so
- if not unique, then:
  - display a message saying so
  - display what character is duplicated
  - only the 1st non-unique character need be displayed
  - display where "both" duplicated characters are in the string
  - the above messages can be part of a single message
  - display the hexadecimal value of the duplicated character
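The task does not say whether "the hexadecimal value" means the Unicode code point or the encoded bytes. For ASCII characters the two agree, but beyond ASCII they differ: the Go entry on this page prints code points with %#x, whereas REXX's c2x (in a byte-oriented implementation such as Regina) converts the bytes of its argument. The following Go sketch shows both readings for a few characters; codePointHex and utf8BytesHex are illustrative helper names, not part of any entry.

<lang go>
package main

import "fmt"

// codePointHex formats the Unicode code point of r in hex, e.g. "0xe9" for 'é'.
func codePointHex(r rune) string {
	return fmt.Sprintf("%#x", r)
}

// utf8BytesHex formats the UTF-8 encoding of r as space-separated hex bytes,
// e.g. "c3 a9" for 'é'; these are the values a byte-oriented program would see.
func utf8BytesHex(r rune) string {
	return fmt.Sprintf("% x", string(r))
}

func main() {
	for _, r := range []rune{'0', 'X', '\u00e9', '\U0001F60D'} {
		fmt.Printf("%q: code point %s, UTF-8 bytes %s\n", r, codePointHex(r), utf8BytesHex(r))
	}
}
</lang>

Either reading seems acceptable for the task; an entry simply has to be consistent about which one it prints.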
Use (at least) these five test values (strings):
- a string of length 0 (an empty string)
- a string of length 1 which is a single period (.)
- a string of length 6 which contains: abcABC
- a string of length 7 which contains a blank in the middle: XYZ ZYX
- a string of length 36 which doesn't contain the letter "oh" (the digit zero appears in its place):
  - 1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ
Show all output here on this page.
Factor
<lang factor>USING: accessors formatting generalizations io kernel math.parser regexp sequences sets strings ;

! Leave the duplicated character (as a string), its hex code,
! and the indices of its first two occurrences in str.
: >dup-char< ( str n -- char hex first-index second-index )
    1string tuck [ dup first >hex ] 2dip <regexp>
    all-matching-slices first2 [ from>> ] bi@ ;

! Print one line for each duplicated character.
: duplicate-info. ( str -- )
    dup duplicates
    [ >dup-char< "'%s' (0x%s) at indices %d and %d.\n" printf ] with each nl ;

! Print the string and its length, then whether its characters are unique.
: uniqueness-report. ( str -- )
    dup dup length "%u — length %d — contains " printf
    dup all-unique?
    [ drop "all unique characters." print nl ]
    [ "duplicate characters:" print duplicate-info. ] if ;

"" "." "abcABC" "XYZ ZYX" "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"
[ uniqueness-report. ] 5 napply</lang>
- Output:
"" — length 0 — contains all unique characters.

"." — length 1 — contains all unique characters.

"abcABC" — length 6 — contains all unique characters.

"XYZ ZYX" — length 7 — contains duplicate characters:
'Z' (0x5a) at indices 2 and 4.
'Y' (0x59) at indices 1 and 5.
'X' (0x58) at indices 0 and 6.

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" — length 36 — contains duplicate characters:
'0' (0x30) at indices 9 and 24.
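The Factor entry locates the two indices by compiling the duplicated character into a regular expression with <regexp>. That works for the test strings, but presumably, since <regexp> parses its argument as a pattern, a duplicated metacharacter such as "." would match every character, and first2 would then return the wrong indices. The hazard is common to regex engines generally; the Go sketch below illustrates it with Go's own regexp package (not Factor's) together with the usual remedy, regexp.QuoteMeta. The helper positions is hypothetical.

<lang go>
package main

import (
	"fmt"
	"regexp"
)

// positions returns the 0-based byte offsets at which the one-character
// string c occurs in s, found by compiling c into a regular expression.
// With quote == false, c is used as a raw pattern, so "." matches anything.
func positions(s, c string, quote bool) []int {
	pattern := c
	if quote {
		pattern = regexp.QuoteMeta(c) // "." becomes `\.`
	}
	var out []int
	for _, m := range regexp.MustCompile(pattern).FindAllStringIndex(s, -1) {
		out = append(out, m[0]) // m[0] is the start offset of each match
	}
	return out
}

func main() {
	s := "a.b."
	fmt.Println(positions(s, ".", false)) // raw pattern: every character matches
	fmt.Println(positions(s, ".", true))  // escaped pattern: only the periods
}
</lang>

The first call reports all four offsets, while the escaped pattern reports only the periods at offsets 1 and 3.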
Go
<lang go>package main
import "fmt"

// analyze reports whether all runes (code points) in s are unique and, if not,
// shows the first character that has a later duplicate and both 1-based positions.
func analyze(s string) {
	chars := []rune(s)
	le := len(chars)
	fmt.Printf("Analyzing %q which has a length of %d:\n", s, le)
	if le > 1 {
		// Compare each rune with every rune that follows it.
		for i := 0; i < le-1; i++ {
			for j := i + 1; j < le; j++ {
				if chars[j] == chars[i] {
					fmt.Println(" Not all characters in the string are unique.")
					fmt.Printf(" %q (%#[1]x) is duplicated at positions %d and %d.\n\n", chars[i], i+1, j+1)
					return
				}
			}
		}
	}
	fmt.Println(" All characters in the string are unique.\n")
}

func main() {
	strings := []string{
		"",
		".",
		"abcABC",
		"XYZ ZYX",
		"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ",
		"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X",
		"hétérogénéité",
		"🎆🎃🎇🎈",
		"😍😀🙌💃😍🙌",
		"🐠🐟🐡🦈🐬🐳🐋🐡",
	}
	for _, s := range strings {
		analyze(s)
	}
}</lang>
- Output:
Analyzing "" which has a length of 0:
 All characters in the string are unique.

Analyzing "." which has a length of 1:
 All characters in the string are unique.

Analyzing "abcABC" which has a length of 6:
 All characters in the string are unique.

Analyzing "XYZ ZYX" which has a length of 7:
 Not all characters in the string are unique.
 'X' (0x58) is duplicated at positions 1 and 7.

Analyzing "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" which has a length of 36:
 Not all characters in the string are unique.
 '0' (0x30) is duplicated at positions 10 and 25.

Analyzing "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" which has a length of 39:
 Not all characters in the string are unique.
 '0' (0x30) is duplicated at positions 1 and 11.

Analyzing "hétérogénéité" which has a length of 13:
 Not all characters in the string are unique.
 'é' (0xe9) is duplicated at positions 2 and 4.

Analyzing "🎆🎃🎇🎈" which has a length of 4:
 All characters in the string are unique.

Analyzing "😍😀🙌💃😍🙌" which has a length of 6:
 Not all characters in the string are unique.
 '😍' (0x1f60d) is duplicated at positions 1 and 5.

Analyzing "🐠🐟🐡🦈🐬🐳🐋🐡" which has a length of 8:
 Not all characters in the string are unique.
 '🐡' (0x1f421) is duplicated at positions 3 and 8.
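The Go entry compares every pair of runes, which takes O(n²) time and reports the duplicated character with the smallest first position ('X' at 1 and 7 for "XYZ ZYX"). A single pass with a map from rune to first position takes O(n) time and instead stops at the first repeat met while scanning left to right ('Z' at 3 and 5), which is the pair the Factor entry lists first. Both readings appear compatible with "only the 1st non-unique character need be displayed". A minimal sketch of the map-based variant follows; firstRepeat is a hypothetical name.

<lang go>
package main

import "fmt"

// firstRepeat scans s once, left to right, remembering the 1-based position
// of the first occurrence of every rune. It stops at the first rune seen
// before and returns it with both positions. ok is false when all runes are
// unique, which includes the empty string.
func firstRepeat(s string) (r rune, first, second int, ok bool) {
	seen := make(map[rune]int) // rune -> 1-based position of first occurrence
	pos := 0
	for _, c := range s {
		pos++
		if p, dup := seen[c]; dup {
			return c, p, pos, true
		}
		seen[c] = pos
	}
	return 0, 0, 0, false
}

func main() {
	for _, s := range []string{"", ".", "abcABC", "XYZ ZYX", "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"} {
		if r, i, j, ok := firstRepeat(s); ok {
			fmt.Printf("%q: %q (%#x) at positions %d and %d\n", s, r, r, i, j)
		} else {
			fmt.Printf("%q: all characters are unique\n", s)
		}
	}
}
</lang>

The map costs extra memory proportional to the number of distinct runes, which is the usual trade-off for dropping the inner loop.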
Perl 6
Perl 6 works with Unicode natively: its .chars and .comb methods operate on graphemes (user-perceived characters), so a character with a combining mark, or an emoji built from several code points, counts as a single character. In the last string, note that the length is correctly shown as 11 characters, and that the delta with a combining circumflex in position 6 is not treated as a duplicate of the plain deltas in positions 5 and 9.
<lang perl6>-> $str {
    my $i = 0;
    print "\n{$str.perl} (length: {$str.chars}), has ";
    my %m;
    %m{$_}.push: ++$i for $str.comb;    # record the 1-based position(s) of each grapheme
    if any(%m.values) > 1 {
        say "duplicated characters:";
        say "'{.key}' ({.key.uninames}; hex ordinal: {(.key.ords).fmt: "0x%X"})" ~
            " in positions: {.value.join: ', '}" for %m.grep( *.value > 1 ).sort( *.value[0] );
    } else {
        say "no duplicated characters."
    }
} for '', '.', 'abcABC', 'XYZ ZYX',
  '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ',
  '01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X',
  '🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦'</lang>
- Output:
"" (length: 0), has no duplicated characters.

"." (length: 1), has no duplicated characters.

"abcABC" (length: 6), has no duplicated characters.

"XYZ ZYX" (length: 7), has duplicated characters:
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 1, 7
'Y' (LATIN CAPITAL LETTER Y; hex ordinal: 0x59) in positions: 2, 6
'Z' (LATIN CAPITAL LETTER Z; hex ordinal: 0x5A) in positions: 3, 5

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" (length: 36), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 10, 25

"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" (length: 39), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 1, 11, 26, 38
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 35, 39

"🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦" (length: 11), has duplicated characters:
'🦋' (BUTTERFLY; hex ordinal: 0x1F98B) in positions: 1, 8
'👨‍👩‍👧‍👦' (MAN ZERO WIDTH JOINER WOMAN ZERO WIDTH JOINER GIRL ZERO WIDTH JOINER BOY; hex ordinal: 0x1F468 0x200D 0x1F469 0x200D 0x1F467 0x200D 0x1F466) in positions: 3, 11
'Δ' (GREEK CAPITAL LETTER DELTA; hex ordinal: 0x394) in positions: 5, 9
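For contrast with the grapheme-based Perl 6 entry, consider a check that compares code points. Go's standard library iterates strings by rune (code point) and does not segment graphemes, so rune-level code such as the Go entry would report "ΔΔ̂" as a duplicated Δ (the circumflex is a separate combining code point, U+0302), and would even find a "duplicate" inside a single family emoji, because its zero width joiners (U+200D) repeat. A small sketch; hasDuplicateRune is a hypothetical helper.

<lang go>
package main

import "fmt"

// hasDuplicateRune reports whether any rune (Unicode code point) occurs more
// than once in s. It knows nothing about grapheme clusters.
func hasDuplicateRune(s string) bool {
	seen := make(map[rune]bool)
	for _, c := range s {
		if seen[c] {
			return true
		}
		seen[c] = true
	}
	return false
}

func main() {
	// Δ followed by Δ + U+0302 COMBINING CIRCUMFLEX ACCENT: 2 graphemes, 3 runes.
	deltas := "\u0394\u0394\u0302"
	// MAN, WOMAN, GIRL, BOY joined by U+200D ZERO WIDTH JOINER: 1 grapheme, 7 runes.
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

	for _, s := range []string{deltas, family} {
		fmt.Printf("%q: %d runes, duplicate rune found: %t\n", s, len([]rune(s)), hasDuplicateRune(s))
	}
}
</lang>

Both strings report a duplicate rune, although a grapheme-aware check like the Perl 6 entry finds none in either.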
REXX
<lang rexx>/*REXX pgm determines if a string is comprised of all unique characters (no duplicates).*/
@.=                                              /*assign a default for the  @.  array. */
parse arg @.1                                    /*obtain optional argument from the CL.*/
if @.1=''  then do;  @.1=                        /*Not specified?  Then assume defaults.*/
                     @.2= .
                     @.3= 'abcABC'
                     @.4= 'XYZ ZYX'
                     @.5= '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ'
                end

     do j=1;  if j\==1  &  @.j==''  then leave   /*String is null & not j=1?  We're done*/
     say copies('─', 79)                         /*display a separator line  (a fence). */
     say 'Testing for the string (length'    length(@.j)"): "      @.j
     say
     dup= isUnique(@.j)
     say 'The characters in the string'   word("are aren't", 1 + (dup>0) )  'all unique.'
     if dup==0  then iterate
     ?= substr(@.j, dup, 1)
     say 'The character '  ?  " ('"c2x(?)"'x) at position "  dup ,
                                      ' is repeated at position '  pos(?, @.j, dup+1)
     end   /*j*/
exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
isUnique: procedure;  parse arg x                /*obtain the character string.         */
             do k=1  to length(x) - 1            /*examine all but the last.            */
             p= pos( substr(x, k, 1), x, k + 1)  /*see if the Kth char is a dup.        */
             if p\==0  then return k             /*Find a dup?  Return location.        */
             end   /*k*/
          return 0                               /*indicate all chars unique.           */</lang>
- output when using the internal defaults
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 0):

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 1): .

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 6): abcABC

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 7): XYZ ZYX

The characters in the string aren't all unique.
The character X ('58'x) at position 1 is repeated at position 7
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 36): 1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ

The characters in the string aren't all unique.
The character 0 ('30'x) at position 10 is repeated at position 25
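The REXX isUnique routine finds the first duplicate with pos(substr(x, k, 1), x, k + 1), that is, it searches for the k-th character again starting just after it. A Go transliteration using strings.IndexByte is sketched below for comparison; it is not part of the REXX entry. Like a byte-oriented REXX, it compares bytes, so a UTF-8 string such as "hétérogénéité" is reported as repeating 0xC3 (the lead byte of 'é') at byte positions 2 and 5, rather than repeating the character 'é'.

<lang go>
package main

import (
	"fmt"
	"strings"
)

// isUnique mirrors the REXX routine: for each byte except the last, it looks
// for the same byte later in the string, as pos(substr(x, k, 1), x, k+1) does.
// It returns the 1-based position of the first byte that has a later
// duplicate, or 0 if all bytes are unique. It works on bytes, not characters.
func isUnique(x string) int {
	for k := 0; k < len(x)-1; k++ {
		if strings.IndexByte(x[k+1:], x[k]) >= 0 {
			return k + 1
		}
	}
	return 0
}

func main() {
	for _, s := range []string{"", ".", "abcABC", "XYZ ZYX", "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"} {
		dup := isUnique(s)
		if dup == 0 {
			fmt.Printf("%q: the characters are all unique\n", s)
			continue
		}
		c := s[dup-1]
		// s[dup:] starts at 1-based position dup+1, so add that offset back.
		second := dup + 1 + strings.IndexByte(s[dup:], c)
		fmt.Printf("%q: %q ('%X'x) at position %d is repeated at position %d\n", s, c, c, dup, second)
	}
}
</lang>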
zkl
<lang zkl>fcn stringUniqueness(str){  // Does not handle Unicode
   sz,unique,uz,counts := str.len(), str.unique(), unique.len(), str.counts();
   println("Length %d: \"%s\"".fmt(sz,str));
   if(sz==uz) println("\tAll characters are unique");  // unique() dropped nothing
   else  // counts is (char,count, char,count, ...)
      println("\tDuplicate: ",
         counts.pump(List,Void.Read,fcn(str,c,n){
            if(n>1){
               is,z:=List(),-1;
               do(n){ is.append(z=str.find(c,z+1)) }
               "'%s' (0x%x)[%s]".fmt(c,c.toAsc(),is.concat(","))
            }
            else Void.Skip
         }.fp(str)).concat(", "));
}</lang>
<lang zkl>testStrings:=T("", ".", "abcABC", "XYZ ZYX",
"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ", "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X");
foreach s in (testStrings){ stringUniqueness(s) }</lang>
- Output:
Length 0: ""
	All characters are unique
Length 1: "."
	All characters are unique
Length 6: "abcABC"
	All characters are unique
Length 7: "XYZ ZYX"
	Duplicate: 'X' (0x58)[0,6], 'Y' (0x59)[1,5], 'Z' (0x5a)[2,4]
Length 36: "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"
	Duplicate: '0' (0x30)[9,24]
Length 39: "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X"
	Duplicate: '0' (0x30)[0,10,25,37], 'X' (0x58)[34,38]
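The zkl entry (which notes that it does not handle Unicode) and the Perl 6 entry report every duplicated character with all of its positions, not only the first. A rune-based Go sketch of that fuller report follows, formatted like the zkl output with 0-based indices. duplicateReport is a hypothetical helper, and listing characters in order of first occurrence is an assumption that may not match the zkl entry's ordering for every input.

<lang go>
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// duplicateReport lists every rune of s that occurs more than once, with all
// of its 0-based rune indices, formatted like 'X' (0x58)[0,6]. Runes are
// listed in order of first occurrence.
func duplicateReport(s string) []string {
	var order []rune       // distinct runes, in order of first occurrence
	pos := map[rune][]int{} // rune -> every 0-based index where it occurs
	for i, c := range []rune(s) {
		if _, seen := pos[c]; !seen {
			order = append(order, c)
		}
		pos[c] = append(pos[c], i)
	}
	var out []string
	for _, c := range order {
		if idx := pos[c]; len(idx) > 1 {
			parts := make([]string, len(idx))
			for k, p := range idx {
				parts[k] = strconv.Itoa(p)
			}
			out = append(out, fmt.Sprintf("%q (%#x)[%s]", c, c, strings.Join(parts, ",")))
		}
	}
	return out
}

func main() {
	for _, s := range []string{"", ".", "abcABC", "XYZ ZYX", "01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X"} {
		if d := duplicateReport(s); len(d) == 0 {
			fmt.Printf("%q: all characters are unique\n", s)
		} else {
			fmt.Printf("%q: duplicate: %s\n", s, strings.Join(d, ", "))
		}
	}
}
</lang>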