Set: Difference between revisions

Content added Content deleted

Inline

Revision as of 11:16, 18 September 2011

A set is a collection of elements, without duplicates and without order.

Show how to create a set, and how to test m ∈ S, true if m is an element in set S. Also show each of these set operations:

A ∪ B -- union; a set of all elements either in set A or in set B.
A ∩ B -- intersection; a set of all elements in both set A and set B.
A − B -- difference; a set of all elements in set A, except those in set B.
A ⊂ B -- subset; true if every element in set A is also in set B.
A = B -- equality; true if every element of set A is in set B and vice-versa.

As an option, show some other set operations. (If A ⊂ B, but A ≠ B, then A is called a true or proper subset of B, written A ⊊ B.) As another option, show how to modify a mutable set.

One might implement a set using an associative array (with set elements as array keys), or using a list, a binary search tree or a hash table. The basic test, m ∈ S, is O(n) with a list, O(log n) with a balanced binary search tree, or near O(1) with a hash table.

C

Building a set highly depends on the datatype and use case. For example, a set of string could be implemented by hash table, sort tree or trie, but if all sets are known to have very few number of elements, it might be best to use just flat arrays. There isn't, and shouldn't be, an all-purpose set type for C.

A frequent use of set is that of small, non-negative integers, implemented as a bit field as shown below. <lang c>#include <stdio.h>

typedef unsigned int set_t; /* probably 32 bits; change according to need */

void show_set(set_t x, char *name) { int i; printf("%s is:", name); for (i = 0; (1U << i) <= x; i++) if (x & (1U << i)) printf(" %d", i); putchar('\n'); }

int main(void) { int i; set_t a, b, c;

a = 0; /* empty set */ for (i = 0; i < 10; i += 3) /* add 0 3 6 9 to set a */ a |= (1U << i); show_set(a, "a");

for (i = 0; i < 5; i++) printf("\t%d%s in set a\n", i, (a & (1U << i)) ? "":" not");

b = a; b |= (1U << 5); b |= (1U << 10); /* b is a plus 5, 10 */ b &= ~(1U << 0); /* sans 0 */ show_set(b, "b");

show_set(a | b, "union(a, b)"); show_set(c = a & b, "c = common(a, b)"); show_set(a & ~b, "a - b"); /* diff, not arithmetic minus */ show_set(b & ~a, "b - a"); printf("b is%s a subset of a\n", !(b & ~a) ? "" : " not"); printf("c is%s a subset of a\n", !(c & ~a) ? "" : " not");

printf("union(a, b) - common(a, b) %s union(a - b, b - a)\n", ((a | b) & ~(a & b)) == ((a & ~b) | (b & ~a)) ? "equals" : "does not equal");

return 0; }</lang>

Icon and Unicon

The set is a basic datatype (structure) in Icon and Unicon, which supports 'member', 'union', 'intersection' and 'difference' operations. Subset and equality must be implemented separately, or use the routines in the 'sets' library.

Implemented directly:

<lang unicon> procedure display_set (s)

 writes ("[")
 every writes (!s || " ")
 write ("]")

end

fail unless s1 and s2 contain the same elements

procedure set_equals (s1, s2)

 return subset(s1, s2) & subset(s2, s1)

end

fail if every element in s2 is not contained in s1

procedure subset (s1, s2)

 every (a := !s2) do {
   if not(member(s1,a)) then fail
 }
 return s2

end

procedure main ()

 a := set(1, 1, 2, 3, 4)
 b := set(2, 3, 5)
 writes ("a: ")
 display_set (a)
 writes ("b: ")
 display_set (b)
 # basic set operations
 writes ("Intersection: ")
 display_set (a ** b)
 writes ("Union: ")
 display_set (a ++ b)
 writes ("Difference: ")
 display_set (a -- b)
 # membership
 if member(a, 2) then
   write ("2 is a member of a")
 else
   write ("2 is not a member of a")
 if member(a, 5) then
   write ("5 is a member of a")
 else
   write ("5 is not a member of a")
 # equality
 if set_equals(a, set(1,2,3,4,4)) then
   write ("a equals set(1,2,3,4,4)")
 else
   write ("a does not equal set(1,2,3,4,4)")
 if set_equals(a, b) then
   write ("a equals b")
 else
   write ("a does not equal b")
 # subset
 if subset(a, set(1,2)) then
   write ("(1,2) is included in a")
 else
   write ("(1,2) is not included in a")
 if subset(a, set(1,2,5)) then
   write ("(1,2,5) is included in a")
 else
   write ("(1,2,5) is not included in a")

end </lang>

Output:

a: [2 4 1 3 ]
b: [5 2 3 ]
Intersection: [2 3 ]
Union: [5 2 4 1 3 ]
Difference: [4 1 ]
2 is a member of a
5 is not a member of a
a equals set(1,2,3,4,4)
a does not equal b
(1,2) is included in a
(1,2,5) is not included in a

Using library:

<lang unicon> link sets

procedure main ()

 a := set(1, 1, 2, 3, 4)
 b := set(2, 3, 5)
 write ("a: ", simage(a))
 write ("b: ", simage(b))
 # basic set operations
 write ("Intersection: ", simage (a**b))
 write ("Union: ", simage        (a++b))
 write ("Difference: ", simage   (a--b))
 # membership
 if member(a, 2) then
   write ("2 is a member of a")
 else
   write ("2 is not a member of a")
 if member(a, 5) then
   write ("5 is a member of a")
 else
   write ("5 is not a member of a")
 # equality
 if seteq(a, set(1,2,3,4,4)) then
   write ("a equals set(1,2,3,4,4)")
 else
   write ("a does not equal set(1,2,3,4,4)")
 if seteq(a, b) then
   write ("a equals b")
 else
   write ("a does not equal b")
 # check subset
 if setlt(set(1,2), a) then
   write ("(1,2) is included in a")
 else
   write ("(1,2) is not included in a")
 if setlt(a, set(1,2,5), a) then
   write ("(1,2,5) is included in a")
 else
   write ("(1,2,5) is not included in a")

end </lang>

Output:

a: { 2, 4, 1, 3 }
b: { 5, 2, 3 }
Intersection: { 2, 3 }
Union: { 5, 2, 4, 1, 3 }
Difference: { 4, 1 }
2 is a member of a
5 is not a member of a
a equals set(1,2,3,4,4)
a does not equal b
(1,2) is included in a
(1,2,5) is not included in a

J

In J, we use a sequence to represent a set. This actually winds up being a family of set implementations. In this example, we chose to ignore order and specify that duplicate elements are not allowed.

Here are definitions for the required operations:

<lang j>union=: ~.@, intersection=: [ -. -. difference=: -. subset=: *./@e. equality=: -:&(/:~)</lang>

Examples:

<lang j> 2 4 6 8 ~.@, 2 3 5 7 2 4 6 8 3 5 7

  2 4 6 8 ([ -. -.) 2 3 5 7

2

  2 4 6 8 -. 2 3 5 7

4 6 8

  2 4 6 8 *./@e. 2 3 5 7

0

   *./@e. 2 3 5 7

1

  2 4 6 8 3 5 7 -:&(/:~) 8 7 6 5 4 3 2

1</lang>

Note that these operations can be combined in sentences with other operations. For example we could define

<lang j>properSubset=: subset *. 1 - equality</lang>

Perl

Primitive set implementation using hashes. <lang perl>use strict;

package Set; # likely will conflict with stuff on CPAN use overload '""' => \&str, 'bool' => \&count, '+=' => \&add, '-=' => \&del, '-' => \&diff, '==' => \&eq, '&' => \&intersection, '|' => \&union, '^' => \&xdiff;

sub str { my $set = shift; # This has drawbacks: stringification is used as set key # if the set is added to another set as an element, which # may cause inconsistencies if the element set is modified # later. In general, a hash key loses its object identity # anyway, so it's not unique to us. "Set{ ". join(", " => sort map("$_", values %$set)) . " }" }

sub new { my $pkg = shift; my $h = bless {}; $h->add($_) for @_; $h }

sub add { my ($set, $elem) = @_; $set->{$elem} = $elem; $set }

sub del { my ($set, $elem) = @_; delete $set->{$elem}; $set }

sub has { # set has element my ($set, $elem) = @_; exists $set->{$elem} }

sub union { my ($this, $that) = @_; bless { %$this, %$that } }

sub intersection { my ($this, $that) = @_; my $s = new Set; for (keys %$this) { $s->{$_} = $this->{$_} if exists $that->{$_} } $s }

sub diff { my ($this, $that) = @_; my $s = Set->new; for (keys %$this) { $s += $this->{$_} unless exists $that->{$_} } $s }

sub xdiff { # xor, symmetric diff my ($this, $that) = @_; my $s = new Set; bless { %{ ($this - $that) | ($that - $this) } } }

sub count { scalar(keys %{+shift}) }

sub eq { my ($this, $that) = @_; !($this - $that) && !($that - $this); }

sub contains { # this is a superset of that my ($this, $that) = @_; for (keys %$that) { return 0 unless $this->has($_) } return 1 }

package main; my ($x, $y, $z, $w);

$x = Set->new(1, 2, 3); $x += $_ for (5 .. 7); $y = Set->new(1, 2, 4, $x); # not the brightest idea

print "set x is: $x\nset y is: $y\n"; for (1 .. 4, $x) { print "$_ is", $y->has($_) ? "" : " not", " in y\n"; }

print "union: ", $x | $y, "\n"; print "intersect: ", $x & $y, "\n"; print "z = x - y = ", $z = $x - $y, "\n"; print "y is ", $x->contains($y) ? "" : "not ", "a subset of x\n"; print "z is ", $x->contains($z) ? "" : "not ", "a subset of x\n"; print "z = (x | y) - (x & y) = ", $z = ($x | $y) - ($x & $y), "\n"; print "w = x ^ y = ", $w = ($x ^ $y), "\n"; print "w is ", ($w == $z) ? "" : "not ", "equal to z\n"; print "w is ", ($w == $x) ? "" : "not ", "equal to x\n";</lang>

Python

In Python, the set is a standard type with support in the languages syntax. <lang python>>>> s1, s2 = {1, 2, 3, 4}, {3, 4, 5, 6} >>> s1 | s2; # Union {1, 2, 3, 4, 5, 6} >>> s1 & s2; # Intersection {3, 4} >>> s1 - s2; # Difference {1, 2} >>> s1 < s1; # True subset False >>> {3, 1} < s1; # True subset True >>> s1 <= s1; # Subset True >>> {3, 1} <= s1; # Subset True >>> {3, 2, 4, 1} == s1; # Equality True >>> s1 == s2; # Equality False >>> 2 in s1; # Membership True >>> 10 not in s1; # Non-membership True >>> {1, 2, 3, 4, 5} > s1; # True superset True >>> {1, 2, 3, 4} > s1; # True superset False >>> {1, 2, 3, 4} >= s1; # Superset True >>> s1 ^ s2; # Symmetric difference {1, 2, 5, 6} >>> len(s1); # Cardinality 4 >>> s1.add(99); # Mutability >>> s1 {99, 1, 2, 3, 4} >>> s1.discard(99); # Mutability >>> s1 {1, 2, 3, 4} >>> s1 |= s2; # Mutability >>> s1 {1, 2, 3, 4, 5, 6} >>> s1 -= s2; # Mutability >>> s1 {1, 2} >>> s1 ^= s2; # Mutability >>> s1 {1, 2, 3, 4, 5, 6} >>> </lang>

Tcl

Sets in Tcl are modeled as lists of items, with operations that preserve uniqueness of membership.

Library: Tcllib (Package: struct::set)

<lang tcl>package require struct::set

Many ways to build sets

set s1 [list 1 2 3 4] set s2 {3 4 5 6} struct::set add s3 {2 3 4 3 2}; # $s3 will be proper set... set item 5

puts "union: [struct::set union $s1 $s2]" puts "intersection: [struct::set intersect $s1 $s2]" puts "difference: [struct::set difference $s1 $s2]" puts "membership predicate: [struct::set contains $s1 $item]" puts "subset predicate: [struct::set subsetof $s1 $s2]"; # NB: not strict subset test! puts "equality predicate: [struct::set equal $s1 $s2]"

Adding an element to a set (note that we pass in the name of the variable holding the set):

struct::set include s3 $item

Removing an element from a set:

struct::set exclude s3 $item

Getting the cardinality:

puts "cardinality: [struct::set size $s3]</lang>