
Unique characters

From Rosetta Code
Unique characters is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Given a list of strings, find the characters that appear in only one string, and only once in that string (that is, exactly once across the whole list).

The result should be given in alphabetical order.


Use the following list for this task:

        ["133252abcdeeffd",  "a6789798st",  "yxcdfgxcyz"]
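As an illustration of the task, here is a minimal Python sketch. Python is not one of the entries on this page, and the name `unique_characters` is hypothetical. It counts every character over the concatenation of the strings, keeps those seen exactly once, and sorts them by code point, which is only one possible reading of "alphabetical order".

```python
from collections import Counter


def unique_characters(strings):
    """Return the characters occurring exactly once across all strings.

    The result is sorted by Unicode code point (an assumption; the task
    only says "alphabetical order").
    """
    counts = Counter("".join(strings))
    return sorted(ch for ch, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Prints ['1', '5', '6', 'b', 'g', 's', 't', 'z']
    print(unique_characters(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
```

A character such as 'a', which occurs once in each of two strings, is excluded because its total count is two.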


Other tasks related to string operations:
Metrics
Counting
Remove/replace
Anagrams/Derangements/shuffling
Find/Search/Determine
Formatting
Song lyrics/poems/Mad Libs/phrases
Tokenize
Sequences



AppleScript

AppleScriptObjC

The filtering here is case-sensitive; the sorting depends on the locale.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"

on uniqueCharacters(listOfStrings)
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ""
    set countedSet to current application's class "NSCountedSet"'s setWithArray:((listOfStrings as text)'s characters)
    set AppleScript's text item delimiters to astid
    set mutableSet to current application's class "NSMutableSet"'s setWithSet:(countedSet)
    -- Remove one instance of every character, leaving only those that occurred more than once.
    tell countedSet to minusSet:(mutableSet)
    -- Remove the repeated characters from the full set, leaving the unique ones.
    tell mutableSet to minusSet:(countedSet)
    set sortDescriptor to current application's class "NSSortDescriptor"'s sortDescriptorWithKey:("self") ¬
        ascending:(true) selector:("localizedStandardCompare:")

    return (mutableSet's sortedArrayUsingDescriptors:({sortDescriptor})) as list
end uniqueCharacters
Output:
{"1", "5", "6", "b", "g", "s", "t", "z"}

Core language only

This isn't quite as fast as the ASObjC solution above, but it can be made case-insensitive if required: simply leave out the 'considering case' statement around the call to the handler. The requirement for AppleScript 2.3.1 is only for the 'use' command that loads the "Heap Sort" script. If "Heap Sort" is loaded differently or compiled directly into the code, this script will work on systems at least as far back as Mac OS X 10.5 (Leopard) and possibly earlier. The output is the same as above.

use AppleScript version "2.3.1" -- OS X 10.9 (Mavericks) or later
use sorter : script "Heap Sort" -- <https://www.rosettacode.org/wiki/Sorting_algorithms/Heapsort#AppleScript>

on uniqueCharacters(listOfStrings)
    script o
        property allCharacters : {}
        property uniques : {}
    end script

    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ""
    set o's allCharacters to text items of (listOfStrings as text)
    set AppleScript's text item delimiters to astid

    set characterCount to (count o's allCharacters)
    tell sorter to sort(o's allCharacters, 1, characterCount)

    set i to 1
    set currentCharacter to beginning of o's allCharacters
    repeat with j from 2 to characterCount
        set thisCharacter to item j of o's allCharacters
        if (thisCharacter is not currentCharacter) then
            if (j - i = 1) then set end of o's uniques to currentCharacter
            set i to j
            set currentCharacter to thisCharacter
        end if
    end repeat
    if (i = j) then set end of o's uniques to currentCharacter

    return o's uniques
end uniqueCharacters

considering case
    return uniqueCharacters({"133252abcdeeffd", "a6789798st", "yxcdfgxcyz"})
end considering

Arturo

arr: ["133252abcdeeffd" "a6789798st" "yxcdfgxcyz"]
str: join arr
 
print sort select split str 'ch -> 1 = size match str ch
Output:
1 5 6 b g s t z

AWK

 
# syntax: GAWK -f UNIQUE_CHARACTERS.AWK
#
# sorting:
#   PROCINFO["sorted_in"] is used by GAWK
#   SORTTYPE is used by Thompson Automation's TAWK
#
BEGIN {
    PROCINFO["sorted_in"] = "@ind_str_asc" ; SORTTYPE = 1
    n = split("133252abcdeeffd,a6789798st,yxcdfgxcyz",arr1,",")
    for (i=1; i<=n; i++) {
      str = arr1[i]
      printf("%s\n",str)
      total_c += leng = length(str)
      for (j=1; j<=leng; j++) {
        arr2[substr(str,j,1)]++
      }
    }
    for (c in arr2) {
      if (arr2[c] == 1) {
        rec = sprintf("%s%s",rec,c)
      }
    }
    printf("%d strings, %d characters, %d different, %d unique: %s\n",n,total_c,length(arr2),length(rec),rec)
    exit(0)
}
 
Output:
133252abcdeeffd
a6789798st
yxcdfgxcyz
3 strings, 35 characters, 20 different, 8 unique: 156bgstz

C++

#include <iostream>
#include <map>

int main() {
    const char* strings[] = {"133252abcdeeffd", "a6789798st", "yxcdfgxcyz"};
    std::map<char, int> count;
    for (const char* str : strings) {
        for (; *str; ++str)
            ++count[*str];
    }
    for (const auto& p : count) {
        if (p.second == 1)
            std::cout << p.first;
    }
    std::cout << '\n';
}
Output:
156bgstz

Factor

Works with: Factor version 0.99 build 2074
USING: io sequences sets.extras sorting ;
 
{ "133252abcdeeffd" "a6789798st" "yxcdfgxcyz" }
concat non-repeating natural-sort print
Output:
156bgstz

Julia

list = ["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]

function is_once_per_all_strings_in(a::Vector{String})
    charlist = collect(prod(a))
    counts = Dict(c => count(x -> c == x, charlist) for c in unique(charlist))
    return sort([p[1] for p in counts if p[2] == 1])
end

println(is_once_per_all_strings_in(list))
 
Output:

['1', '5', '6', 'b', 'g', 's', 't', 'z']

One might think that the method above suffers from too many passes through the text with one pass per count, but with a small text length the dictionary lookup takes more time. Compare times for a single pass version:

function uniquein(a)
    counts = Dict{Char, Int}()
    for c in prod(a)
        counts[c] = get!(counts, c, 0) + 1
    end
    return sort([c for (c, n) in counts if n == 1])
end
 
println(uniquein(list))
 
using BenchmarkTools
@btime is_once_per_all_strings_in(list)
@btime uniquein(list)
 
Output:

['1', '5', '6', 'b', 'g', 's', 't', 'z']

 1.740 μs (28 allocations: 3.08 KiB)
 3.763 μs (50 allocations: 3.25 KiB)

This can be rectified (see Phix entry) if we don't save the counts as we go but just exclude entries with duplicates:

function uniquein2(a)
    s = sort(collect(prod(a)))
    l = length(s)
    return [p[2] for p in enumerate(s) if (p[1] == 1 || p[2] != s[p[1] - 1]) && (p[1] == l || p[2] != s[p[1] + 1])]
end
 
println(uniquein2(list))
 
@btime uniquein2(list)
 
Output:

['1', '5', '6', 'b', 'g', 's', 't', 'z']

 1.010 μs (14 allocations: 1.05 KiB)
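The neighbour-comparison idea used by `uniquein2` and the Phix entry can be sketched in Python as follows. This is an illustrative translation rather than part of either entry, and the name `unique_by_neighbours` is hypothetical. After sorting, equal characters sit next to each other, so a character is unique exactly when it differs from both of its neighbours.

```python
def unique_by_neighbours(strings):
    """Sort all characters, then keep each one that differs from both neighbours."""
    s = sorted("".join(strings))
    last = len(s) - 1
    return [
        ch
        for i, ch in enumerate(s)
        # The first and last positions have only one neighbour to check.
        if (i == 0 or ch != s[i - 1]) and (i == last or ch != s[i + 1])
    ]


if __name__ == "__main__":
    print(unique_by_neighbours(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
```

No counts are stored; the result comes out already sorted because the input to the filter is sorted.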

Nim

This is one solution; others are possible, for instance concatenating the strings and building the count table from the result rather than merging several count tables. To build the final sequence, we could also have used something like sorted(toSeq(charCount.pairs).filterIt(it[1] == 1).mapIt(it[0])), which is a one-liner but less readable and less efficient than our solution using “collect”.

import algorithm, sugar, tables

var charCount: CountTable[char]

for str in ["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]:
  charCount.merge str.toCountTable

let uniqueChars = collect(newSeq):
  for ch, count in charCount.pairs:
    if count == 1: ch

echo sorted(uniqueChars)
Output:
@['1', '5', '6', 'b', 'g', 's', 't', 'z']
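The two approaches mentioned above, merging per-string count tables versus counting over the concatenated text, can be compared in a short Python sketch. This is an illustration only, not a translation of the Nim code, and both function names are hypothetical.

```python
from collections import Counter


def unique_by_merging(strings):
    """Build one count table per string and merge them into a running total."""
    total = Counter()
    for s in strings:
        total.update(Counter(s))  # adds the per-string counts to the total
    return sorted(ch for ch, n in total.items() if n == 1)


def unique_by_concatenation(strings):
    """Concatenate the strings first, then build a single count table."""
    total = Counter("".join(strings))
    return sorted(ch for ch, n in total.items() if n == 1)


if __name__ == "__main__":
    data = ["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]
    print(unique_by_merging(data) == unique_by_concatenation(data))
```

Both functions yield the same characters; they differ only in whether intermediate tables are created.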

Perl

Translation of: Raku
# 20210506 Perl programming solution
 
use strict;
use warnings;
use utf8;
use Unicode::Collate 'sort';
 
my %seen;
binmode(STDOUT, ':encoding(utf8)');
map { s/(\X)/$seen{$1}++/egr }
"133252abcdeeffd", "a6789798st", "yxcdfgxcyz", "AАΑSäaoö٥🤔👨‍👩‍👧‍👧";
my $uca = Unicode::Collate->new();
print $uca->sort ( grep { $seen{$_} == 1 } keys %seen )
Output:
👨‍👩‍👧‍👧🤔15٥6AäbgoösStzΑА

Phix

function once(integer ch, i, string s)
    integer l = length(s)
    return (i=1 or ch!=s[i-1])
       and (i=l or ch!=s[i+1])
end function

sequence set = {"133252abcdeeffd","a6789798st","yxcdfgxcyz"},
         res = filter(sort(join(set,"")),once)
printf(1,"found %d unique characters: %s\n",{length(res),res})
Output:
found 8 unique characters: 156bgstz

Raku

One has to wonder where the digits 0 through 9 come in the alphabet... 🤔 For that matter, what alphabet should they be in order of? Most of these entries seem to presuppose ASCII order, but that isn't specified anywhere. What to do with characters outside of ASCII (or Latin-1)? Unicode ordinal order? Or maybe DUCET Unicode collation order? It's all very vague.

my @list = <133252abcdeeffd a6789798st yxcdfgxcyz>;
 
for @list, (@list, 'AАΑSäaoö٥🤔👨‍👩‍👧‍👧') {
    say "$_\nSemi-bogus \"Unicode natural sort\" order: ",
      .map( *.comb ).Bag.grep( *.value == 1 )».key.sort( { .unival, .NFKD[0], .fc } ).join,
      "\n        (DUCET) Unicode collation order: ",
      .map( *.comb ).Bag.grep( *.value == 1 )».key.collate.join, "\n";
}
Output:
133252abcdeeffd a6789798st yxcdfgxcyz
Semi-bogus "Unicode natural sort" order: 156bgstz
        (DUCET) Unicode collation order: 156bgstz

133252abcdeeffd a6789798st yxcdfgxcyz AАΑSäaoö٥🤔👨‍👩‍👧‍👧
Semi-bogus "Unicode natural sort" order: 15٥6ASäbgoöstzΑА👨‍👩‍👧‍👧🤔
        (DUCET) Unicode collation order: 👨‍👩‍👧‍👧🤔ä15٥6AbögosStzΑА
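To make the ordering question concrete, the following Python sketch contrasts two of the possible orders for the same characters: plain code point order, and an order that compares case-folded forms first. Both function names are hypothetical, and neither order is DUCET collation, which Python's standard library does not provide and which this sketch does not attempt.

```python
def by_code_point(chars):
    """Sort by Unicode code point; for ASCII input this is ASCII order."""
    return sorted(chars)


def case_folded(chars):
    """Sort by case-folded form, breaking ties by code point."""
    return sorted(chars, key=lambda c: (c.casefold(), c))


if __name__ == "__main__":
    sample = "Sbgst1"
    print(by_code_point(sample))  # uppercase 'S' sorts before all lowercase letters
    print(case_folded(sample))    # 'S' sorts next to 's'
```

Even on this small ASCII sample the two orders disagree about where 'S' belongs, which is the ambiguity raised above.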

REXX

This REXX program doesn't assume ASCII (or any other) order.   This example was run on an ASCII machine.

If this REXX program is run on an  ASCII  machine,   it will use the   ASCII   order of characters,   in this case,
decimal digits,   uppercase Latin letters,   and then lowercase Latin letters,   with other characters interspersed.

On an  EBCDIC  machine,   the order would be lowercase Latin letters,   uppercase Latin letters,   and then the
decimal digits,   with other characters interspersed.

On an  EBCDIC  machine,   the lowercase letters and the uppercase letters   aren't   contiguous.

/*REXX pgm finds and shows characters that are unique to only one string  and once only.*/
parse arg $ /*obtain optional arguments from the CL*/
if $='' | $="," then $= '133252abcdeeffd' "a6789798st" 'yxcdfgxcyz' /*use defaults.*/
if $='' then do; say "***error*** no lists were specified."; exit 13; end
@= /*will be a list of all unique chars. */
 
do j=0 for 256; x= d2c(j) /*process all the possible characters. */
   if x==' ' then iterate /*ignore blanks which are a delimiter. */
   _= pos(x, $); if _==0 then iterate /*character not found, then skip it. */
   _= pos(x, $, _+1); if _ >0 then iterate /*Character is a duplicate? Skip it. */
   @= @ x
end /*j*/ /*stick a fork in it, we're all done. */
 
@@= space(@, 0); L= length(@@) /*elided superfluous blanks; get length*/
if @@=='' then @= " (none)" /*if none were found, pretty up message*/
if L==0 then L= "no" /*do the same thing for the # of chars.*/
say 'unique characters are: ' @ /*display the unique characters found. */
say
say 'Found ' L " unique characters." /*display the # of unique chars found. */
output   when using the default inputs:
unique characters are:   1 5 6 b g s t z

Found  8  unique characters.
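The scanning approach of the REXX program, which tries every possible character code in turn instead of counting, might be sketched in Python like this. The function name `unique_by_scanning` and its `code_points` parameter are hypothetical. Note one difference: Python's `chr` works with Unicode code points, so codes 0 to 255 always mean Latin-1 order here, whereas the REXX program follows the machine's native character set.

```python
def unique_by_scanning(strings, code_points=range(256)):
    """For each candidate code, keep the character if it occurs exactly once."""
    text = " ".join(strings)  # blanks act as delimiters, as in the REXX entry
    found = []
    for code in code_points:
        ch = chr(code)
        if ch == " ":
            continue  # ignore the delimiter itself
        first = text.find(ch)
        if first < 0:
            continue  # character not present at all
        if text.find(ch, first + 1) >= 0:
            continue  # a second occurrence exists, so it is a duplicate
        found.append(ch)
    return found


if __name__ == "__main__":
    print(unique_by_scanning(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
```

Because the candidates are visited in code order, the result needs no separate sort.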

Ring

 
see "working..." + nl
see "Unique characters are:" + nl
row = 0
str = ""
cList = []
uniqueChars = ["133252abcdeeffd", "a6789798st","yxcdfgxcyz"]
for n = 1 to len(uniqueChars)
    str = str + uniqueChars[n]
next
for n = 1 to len(str)
    ind = count(str,str[n])
    if ind = 1
       row = row + 1
       add(cList,str[n])
    ok
next
cList = sort(cList)
for n = 1 to len(cList)
    see "" + cList[n] + " "
next
see nl

see "Found " + row + " unique characters" + nl
see "done..." + nl

func count(cString,dString)
     sum = 0
     while substr(cString,dString) > 0
           sum++
           cString = substr(cString,substr(cString,dString)+len(string(sum)))
     end
     return sum
 
Output:
working...
Unique characters are:
1 5 6 b g s t z 
Found 8 unique characters
done...

Wren

Library: Wren-seq
Library: Wren-sort
import "/seq" for Lst
import "/sort" for Sort
 
var strings = ["133252abcdeeffd", "a6789798st","yxcdfgxcyz"]
var totalChars = strings.reduce { |acc, str| acc + str }.toList
var uniqueChars = Lst.individuals(totalChars).where { |l| l[1] == 1 }.map { |l| l[0] }.toList
Sort.insertion(uniqueChars)
System.print("Found %(uniqueChars.count) unique character(s), namely:")
System.print(uniqueChars.join(" "))
Output:
Found 8 unique character(s), namely:
1 5 6 b g s t z

XPL0

int     List, I, N, C;
char    Tbl(128), Str;
string  0;
[List:= ["133252abcdeeffd", "a6789798st","yxcdfgxcyz"];
for I:= 0 to 127 do Tbl(I):= 0;
for N:= 0 to 2 do
    [Str:= List(N);
    I:= 0;
    loop    [C:= Str(I);
            if C = 0 then quit;
            I:= I+1;
            Tbl(C):= Tbl(C)+1;
            ];
    ];
for I:= 0 to 127 do
    if Tbl(I) = 1 then ChOut(0, I);
]
Output:
156bgstz