Set: Difference between revisions

Content added Content deleted

Inline

Revision as of 21:29, 26 September 2011

A set is a collection of elements, without duplicates and without order.

Show each of these set operations:

Set creation
Test m ∈ S (i.e. "m is an element in set S")
A ∪ B -- union; a set of all elements either in set A or in set B.
A ∩ B -- intersection; a set of all elements in both set A and set B.
A − B -- difference; a set of all elements in set A, except those in set B.
A ⊆ B -- subset; true if every element in set A is also in set B.
A = B -- equality; true if every element of set A is in set B and vice-versa.

As an option, show some other set operations. (If A ⊆ B, but A ≠ B, then A is called a true or proper subset of B, written A ⊂ B or A ⊊ B.) As another option, show how to modify a mutable set.

One might implement a set using an associative array (with set elements as array keys and some dummy value as the values; associative arrays are usually implemented using a binary search tree or a hash table) or using a list. The basic test, m ∈ S, is O(n) with a list, O(log n) with a balanced binary search tree, or (O(1) average-case, O(n) worst case) with a hash table.

C

Building a set highly depends on the datatype and use case. For example, a set of string could be implemented by hash table, sort tree or trie, but if all sets are known to have very few number of elements, it might be best to use just flat arrays. There isn't, and shouldn't be, an all-purpose set type for C.

A frequent use of set is that of small, non-negative integers, implemented as a bit field as shown below. <lang c>#include <stdio.h>

typedef unsigned int set_t; /* probably 32 bits; change according to need */

void show_set(set_t x, char *name) { int i; printf("%s is:", name); for (i = 0; (1U << i) <= x; i++) if (x & (1U << i)) printf(" %d", i); putchar('\n'); }

int main(void) { int i; set_t a, b, c;

a = 0; /* empty set */ for (i = 0; i < 10; i += 3) /* add 0 3 6 9 to set a */ a |= (1U << i); show_set(a, "a");

for (i = 0; i < 5; i++) printf("\t%d%s in set a\n", i, (a & (1U << i)) ? "":" not");

b = a; b |= (1U << 5); b |= (1U << 10); /* b is a plus 5, 10 */ b &= ~(1U << 0); /* sans 0 */ show_set(b, "b");

show_set(a | b, "union(a, b)"); show_set(c = a & b, "c = common(a, b)"); show_set(a & ~b, "a - b"); /* diff, not arithmetic minus */ show_set(b & ~a, "b - a"); printf("b is%s a subset of a\n", !(b & ~a) ? "" : " not"); printf("c is%s a subset of a\n", !(c & ~a) ? "" : " not");

printf("union(a, b) - common(a, b) %s union(a - b, b - a)\n", ((a | b) & ~(a & b)) == ((a & ~b) | (b & ~a)) ? "equals" : "does not equal");

return 0; }</lang>

C++

C++ standard library contains a set class, which is a sorted container without duplicates and implemented as a binary tree. Additional set functionality can be implemented in terms of standard library algorithms.

C++11 standard library also contains unordered_set based on a hash table. However, algorithms like std::set_intersection etc take sorted ranges, so set-specific functions should be hand-rolled.

include <set>
include <iostream>
include <iterator>
include <algorithm>

namespace set_display { template <class T> std::ostream& operator<<(std::ostream& os, const std::set<T>& set) {

   os << '[';
   if (!set.empty()) {
       std::copy(set.begin(), --set.end(), std::ostream_iterator<T>(os, ", "));
       os << *--set.end();
   }
   return os << ']';

} }

template <class T> bool contains(const std::set<T>& set, const T& key) {

   return set.find(key) != set.end();

}

template <class T> std::set<T> set_union(std::set<T> a, const std::set<T>& b) {

   a.insert(b.begin(), b.end());
   return a;

}

template <class T> std::set<T> set_intersection(const std::set<T>& a, const std::set<T>& b) {

   std::set<T> result;
   std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::inserter(result, result.end()));
   return result;

}

template <class T> std::set<T> set_difference(const std::set<T>& a, const std::set<T>& b) {

   std::set<T> result;
   std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::inserter(result, result.end()));
   return result;

}

template <class T> bool is_subset(const std::set<T>& set, const std::set<T>& subset) {

   return std::includes(set.begin(), set.end(), subset.begin(), subset.end());

}

int main() {

   using namespace set_display;
   std::set<int> a{2, 5, 7, 5, 9, 2}; //C++11 initialization syntax
   std::set<int> b{1, 5, 9, 7, 4 };
   std::cout << "a = " << a << '\n';
   std::cout << "b = " << b << '\n';

   int value1 = 8, value2 = 5;
   std::cout << "Set a " << (contains(a, value1) ? "contains " : "does not contain ") << value1 << '\n';
   std::cout << "Set a " << (contains(a, value2) ? "contains " : "does not contain ") << value2 << '\n';

   std::cout << "Union of a and b: " << set_union(a, b) << '\n';
   std::cout << "Intersection of a and b: " << set_intersection(a, b) << '\n';
   std::cout << "Difference of a and b: " << set_difference(a, b) << '\n';

   std::set<int> sub{5, 9};
   std::cout << "Set b " << (is_subset(a, b) ? "is" : "is not") << " a subset of a\n";
   std::cout << "Set " << sub << ' ' << (is_subset(a, sub) ? "is" : "is not") << " a subset of a\n";

   std::set<int> copy = a;
   std::cout << "a " << (a == copy ? "equals " : "does not equal ") << copy << '\n';

} </lang>

Icon and Unicon

The set is a basic datatype (structure) in Icon and Unicon, which supports 'member', 'union', 'intersection' and 'difference' operations. Subset and equality must be implemented separately, or use the routines in the 'sets' library.

Implemented directly:

<lang unicon> procedure display_set (s)

 writes ("[")
 every writes (!s || " ")
 write ("]")

end

fail unless s1 and s2 contain the same elements

procedure set_equals (s1, s2)

 return subset(s1, s2) & subset(s2, s1)

end

fail if every element in s2 is not contained in s1

procedure subset (s1, s2)

 every (a := !s2) do {
   if not(member(s1,a)) then fail
 }
 return s2

end

procedure main ()

 a := set(1, 1, 2, 3, 4)
 b := set(2, 3, 5)
 writes ("a: ")
 display_set (a)
 writes ("b: ")
 display_set (b)
 # basic set operations
 writes ("Intersection: ")
 display_set (a ** b)
 writes ("Union: ")
 display_set (a ++ b)
 writes ("Difference: ")
 display_set (a -- b)
 # membership
 if member(a, 2) then
   write ("2 is a member of a")
 else
   write ("2 is not a member of a")
 if member(a, 5) then
   write ("5 is a member of a")
 else
   write ("5 is not a member of a")
 # equality
 if set_equals(a, set(1,2,3,4,4)) then
   write ("a equals set(1,2,3,4,4)")
 else
   write ("a does not equal set(1,2,3,4,4)")
 if set_equals(a, b) then
   write ("a equals b")
 else
   write ("a does not equal b")
 # subset
 if subset(a, set(1,2)) then
   write ("(1,2) is included in a")
 else
   write ("(1,2) is not included in a")
 if subset(a, set(1,2,5)) then
   write ("(1,2,5) is included in a")
 else
   write ("(1,2,5) is not included in a")

end </lang>

Output:

a: [2 4 1 3 ]
b: [5 2 3 ]
Intersection: [2 3 ]
Union: [5 2 4 1 3 ]
Difference: [4 1 ]
2 is a member of a
5 is not a member of a
a equals set(1,2,3,4,4)
a does not equal b
(1,2) is included in a
(1,2,5) is not included in a

Using library:

<lang unicon> link sets

procedure main ()

 a := set(1, 1, 2, 3, 4)
 b := set(2, 3, 5)
 write ("a: ", simage(a))
 write ("b: ", simage(b))
 # basic set operations
 write ("Intersection: ", simage (a**b))
 write ("Union: ", simage        (a++b))
 write ("Difference: ", simage   (a--b))
 # membership
 if member(a, 2) then
   write ("2 is a member of a")
 else
   write ("2 is not a member of a")
 if member(a, 5) then
   write ("5 is a member of a")
 else
   write ("5 is not a member of a")
 # equality
 if seteq(a, set(1,2,3,4,4)) then
   write ("a equals set(1,2,3,4,4)")
 else
   write ("a does not equal set(1,2,3,4,4)")
 if seteq(a, b) then
   write ("a equals b")
 else
   write ("a does not equal b")
 # check subset
 if setlt(set(1,2), a) then
   write ("(1,2) is included in a")
 else
   write ("(1,2) is not included in a")
 if setlt(a, set(1,2,5), a) then
   write ("(1,2,5) is included in a")
 else
   write ("(1,2,5) is not included in a")

end </lang>

Output:

a: { 2, 4, 1, 3 }
b: { 5, 2, 3 }
Intersection: { 2, 3 }
Union: { 5, 2, 4, 1, 3 }
Difference: { 4, 1 }
2 is a member of a
5 is not a member of a
a equals set(1,2,3,4,4)
a does not equal b
(1,2) is included in a
(1,2,5) is not included in a

J

In J, we use a sequence to represent a set. This actually winds up being a family of set implementations. In this example, we chose to ignore order and specify that duplicate elements are not allowed.

Here are definitions for the required operations:

<lang j>union=: ~.@, intersection=: [ -. -. difference=: -. subset=: *./@e. equality=: -:&(/:~)</lang>

Examples:

<lang j> 2 4 6 8 ~.@, 2 3 5 7 2 4 6 8 3 5 7

  2 4 6 8 ([ -. -.) 2 3 5 7

2

  2 4 6 8 -. 2 3 5 7

4 6 8

  2 4 6 8 *./@e. 2 3 5 7

0

   *./@e. 2 3 5 7

1

  2 4 6 8 3 5 7 -:&(/:~) 8 7 6 5 4 3 2

1</lang>

Note that these operations can be combined in sentences with other operations. For example we could define

<lang j>properSubset=: subset *. 1 - equality</lang>

Java

Works with: Java version 7+

To use this in Java 5 replace all "<>" with "<Integer>". <lang java5>import java.util.Arrays; import java.util.Collections; import java.util.Set; import java.util.TreeSet;

public class Sets {

   public static void main(String[] args){
       Set<Integer> a = new TreeSet<>();
       //TreeSet sorts on natural ordering (or an optional comparator)
       //other options: HashSet (hashcode)
       //               LinkedHashSet (insertion order)
       //               EnumSet (optimized for enum values)
       //others at: http://download.oracle.com/javase/7/docs/api/java/util/Set.html
       Set<Integer> b = new TreeSet<>();
       Set<Integer> c = new TreeSet<>();
       Set<Integer> d = new TreeSet<>();
       
       a.addAll(Arrays.asList(1, 2, 3, 4, 5));
       b.addAll(Arrays.asList(2, 3, 4, 5, 6, 8));
       c.addAll(Arrays.asList(2, 3, 4));
       d.addAll(Arrays.asList(2, 3, 4));
       System.out.println("a: " + a);
       System.out.println("b: " + b);
       System.out.println("c: " + c);
       System.out.println("d: " + d);
       
       System.out.println("2 in a: " + a.contains(2));
       System.out.println("6 in a: " + a.contains(6));
       
       Set<Integer> ab = new TreeSet<>();
       ab.addAll(a);
       ab.addAll(b);
       System.out.println("a union b: " + ab);
       
       Set<Integer> a_b = new TreeSet<>();
       a_b.addAll(a);
       a_b.removeAll(b);
       System.out.println("a - b: " + a_b);
       
       System.out.println("c subset of a: " + a.containsAll(c));
       //use a.conatins() for single elements
       
       System.out.println("c = d: " + c.equals(d));
       System.out.println("d = c: " + d.equals(c));
       
       Set<Integer> aib = new TreeSet<>();
       aib.addAll(a);
       aib.retainAll(b);
       System.out.println("a intersect b: " + aib);
       
       System.out.println("add 7 to a: " + a.add(7));
       System.out.println("add 2 to a again: " + a.add(2));
       
       //other noteworthy things related to sets:
       Set<Integer> empty = Collections.EMPTY_SET; //immutable empty set
       //empty.add(2);  would fail
       empty.isEmpty(); //test if a set is empty
       empty.size();
       Collections.disjoint(a, b); //returns true if the sets have no common elems (based on their .equals() methods)
       Collections.unmodifiableSet(a); //returns an immutable copy of a
   }

}</lang> Output:

a: [1, 2, 3, 4, 5]
b: [2, 3, 4, 5, 6, 8]
c: [2, 3, 4]
d: [2, 3, 4]
2 in a: true
6 in a: false
a union b: [1, 2, 3, 4, 5, 6, 8]
a - b: [1]
c subset of a: true
c = d: true
d = c: true
a intersect b: [2, 3, 4, 5]
add 7 to a: true
add 2 to a again: false

Perl

Primitive set implementation using hashes. <lang perl>use strict;

package Set; # likely will conflict with stuff on CPAN use overload '""' => \&str, 'bool' => \&count, '+=' => \&add, '-=' => \&del, '-' => \&diff, '==' => \&eq, '&' => \&intersection, '|' => \&union, '^' => \&xdiff;

sub str { my $set = shift; # This has drawbacks: stringification is used as set key # if the set is added to another set as an element, which # may cause inconsistencies if the element set is modified # later. In general, a hash key loses its object identity # anyway, so it's not unique to us. "Set{ ". join(", " => sort map("$_", values %$set)) . " }" }

sub new { my $pkg = shift; my $h = bless {}; $h->add($_) for @_; $h }

sub add { my ($set, $elem) = @_; $set->{$elem} = $elem; $set }

sub del { my ($set, $elem) = @_; delete $set->{$elem}; $set }

sub has { # set has element my ($set, $elem) = @_; exists $set->{$elem} }

sub union { my ($this, $that) = @_; bless { %$this, %$that } }

sub intersection { my ($this, $that) = @_; my $s = new Set; for (keys %$this) { $s->{$_} = $this->{$_} if exists $that->{$_} } $s }

sub diff { my ($this, $that) = @_; my $s = Set->new; for (keys %$this) { $s += $this->{$_} unless exists $that->{$_} } $s }

sub xdiff { # xor, symmetric diff my ($this, $that) = @_; my $s = new Set; bless { %{ ($this - $that) | ($that - $this) } } }

sub count { scalar(keys %{+shift}) }

sub eq { my ($this, $that) = @_; !($this - $that) && !($that - $this); }

sub contains { # this is a superset of that my ($this, $that) = @_; for (keys %$that) { return 0 unless $this->has($_) } return 1 }

package main; my ($x, $y, $z, $w);

$x = Set->new(1, 2, 3); $x += $_ for (5 .. 7); $y = Set->new(1, 2, 4, $x); # not the brightest idea

print "set x is: $x\nset y is: $y\n"; for (1 .. 4, $x) { print "$_ is", $y->has($_) ? "" : " not", " in y\n"; }

print "union: ", $x | $y, "\n"; print "intersect: ", $x & $y, "\n"; print "z = x - y = ", $z = $x - $y, "\n"; print "y is ", $x->contains($y) ? "" : "not ", "a subset of x\n"; print "z is ", $x->contains($z) ? "" : "not ", "a subset of x\n"; print "z = (x | y) - (x & y) = ", $z = ($x | $y) - ($x & $y), "\n"; print "w = x ^ y = ", $w = ($x ^ $y), "\n"; print "w is ", ($w == $z) ? "" : "not ", "equal to z\n"; print "w is ", ($w == $x) ? "" : "not ", "equal to x\n";</lang>

PicoLisp

We may use plain lists, or 'idx' structures for sets. A set may contain any type of data.

Using lists

<lang PicoLisp>(setq

  Set1 (1 2 3 7 abc "def" (u v w))
  Set2 (2 3 5 hello (x y z))
  Set3 (3 hello (x y z)) )

Element tests (any non-NIL value means "yes")

(member "def" Set1)

-> ("def" (u v w))

(member "def" Set2)

-> NIL

(member '(x y z) Set2)

-> ((x y z))

Union

(uniq (append Set1 Set2))

-> (1 2 3 7 abc "def" (u v w) 5 hello (x y z))

Intersection

(sect Set1 Set2)

-> (2 3)

Difference

(diff Set1 Set2)

-> (1 7 abc "def" (u v w))

Test for subset

(not (diff Set1 Set2))

-> NIL # Set1 is not a subset of Set2

(not (diff Set3 Set2))

-> T # Set3 is a subset of Set2

Test for equality

(= (sort (copy Set1)) (sort (copy Set2)))

-> NIL

(= (sort (copy Set2)) (sort (copy Set2)))

-> T</lang>

Using 'idx' structures

<lang PicoLisp># Create three test-sets (balance 'Set1 (1 2 3 7 abc "def" (u v w))) (balance 'Set2 (2 3 5 hello (x y z))) (balance 'Set3 (3 hello (x y z)))

Get contents

(idx 'Set1)

-> (1 2 3 7 abc "def" (u v w))

(idx 'Set2)

-> (2 3 5 hello (x y z))

Element tests (any non-NIL value means "yes")

(idx 'Set1 "def")

-> ("def" (abc) (u v w))

(idx 'Set2 "def")

-> NIL

(idx 'Set2 '(x y z))

-> ((x y z))

Union

(use S

  (balance 'S (idx 'Set1))
  (balance 'S (idx 'Set2) T)
  (idx 'S) )

-> (1 2 3 5 7 abc "def" hello (u v w) (x y z))

Intersection

(sect (idx 'Set1) (idx 'Set2))

-> (2 3)

Difference

(diff (idx 'Set1) (idx 'Set2))

-> (1 7 abc "def" (u v w))

Test for subset

(not (diff (idx 'Set1) (idx 'Set2)))

-> NIL # Set1 is not a subset of Set2

(not (diff (idx 'Set3) (idx 'Set2)))

-> T # Set3 is a subset of Set2

Test for equality

(= (idx 'Set1) (idx 'Set2))

-> NIL

(= (idx 'Set2) (idx 'Set2))

-> T</lang>

Python

In Python, set is a standard type since Python 2.4. There is also frozenset which is an immutable version of set. (In Python 2.3, they were provided as Set and ImmutableSet types in the module sets.)

Language syntax for set literals is supported starting in Python 3.0 and 2.7. (For versions prior to 2.7, use set([1, 2, 3, 4]) instead of {1, 2, 3, 4}. Even in Python 2.7+ and 3.0+, it is necessary to write set() to express the empty set.)

Works with: Python version 2.7+ and 3.0+

<lang python>>>> s1, s2 = {1, 2, 3, 4}, {3, 4, 5, 6} >>> s1 | s2; # Union {1, 2, 3, 4, 5, 6} >>> s1 & s2; # Intersection {3, 4} >>> s1 - s2; # Difference {1, 2} >>> s1 < s1; # True subset False >>> {3, 1} < s1; # True subset True >>> s1 <= s1; # Subset True >>> {3, 1} <= s1; # Subset True >>> {3, 2, 4, 1} == s1; # Equality True >>> s1 == s2; # Equality False >>> 2 in s1; # Membership True >>> 10 not in s1; # Non-membership True >>> {1, 2, 3, 4, 5} > s1; # True superset True >>> {1, 2, 3, 4} > s1; # True superset False >>> {1, 2, 3, 4} >= s1; # Superset True >>> s1 ^ s2; # Symmetric difference {1, 2, 5, 6} >>> len(s1); # Cardinality 4 >>> s1.add(99); # Mutability >>> s1 {99, 1, 2, 3, 4} >>> s1.discard(99); # Mutability >>> s1 {1, 2, 3, 4} >>> s1 |= s2; # Mutability >>> s1 {1, 2, 3, 4, 5, 6} >>> s1 -= s2; # Mutability >>> s1 {1, 2} >>> s1 ^= s2; # Mutability >>> s1 {1, 2, 3, 4, 5, 6} >>> </lang>

Tcl

Sets in Tcl are modeled as lists of items, with operations that preserve uniqueness of membership.

Library: Tcllib (Package: struct::set)

<lang tcl>package require struct::set

Many ways to build sets

set s1 [list 1 2 3 4] set s2 {3 4 5 6} struct::set add s3 {2 3 4 3 2}; # $s3 will be proper set... set item 5

puts "union: [struct::set union $s1 $s2]" puts "intersection: [struct::set intersect $s1 $s2]" puts "difference: [struct::set difference $s1 $s2]" puts "membership predicate: [struct::set contains $s1 $item]" puts "subset predicate: [struct::set subsetof $s1 $s2]"; # NB: not strict subset test! puts "equality predicate: [struct::set equal $s1 $s2]"

Adding an element to a set (note that we pass in the name of the variable holding the set):

struct::set include s3 $item

Removing an element from a set:

struct::set exclude s3 $item

Getting the cardinality:

puts "cardinality: [struct::set size $s3]</lang>

@@ Line 10: / Line 10: @@
 * A &cap; B -- ''intersection''; a set of all elements in ''both'' set A and set B.
 * A &minus; B -- ''difference''; a set of all elements in set A, except those in set B.
-* A &sub; B -- ''subset''; true if every element in set A is also in set B.
+* A &sube; B -- ''subset''; true if every element in set A is also in set B.
 * A = B -- ''equality''; true if every element of set A is in set B and vice-versa.
-As an option, show some other set operations. (If A &sub; B, but A &ne; B, then A is called a true or proper subset of B, written A &#x228a; B.) As another option, show how to modify a mutable set.
+As an option, show some other set operations. (If A &sube; B, but A &ne; B, then A is called a true or proper subset of B, written A &sub; B or A &#x228a; B.) As another option, show how to modify a mutable set.
 One might implement a set using an [[associative array]] (with set elements as array keys and some dummy value as the values; associative arrays are usually implemented using a binary search tree or a hash table) or using a list. The basic test, m &isin; S, is [[O]](n) with a list, O(''log'' n) with a balanced binary search tree, or (O(1) average-case, O(n) worst case) with a hash table.