Knuth's algorithm S: Difference between revisions

From Rosetta Code
Content added Content deleted
(Add ref.)
Line 85: Line 85:
8:29758
8:29758
9:30042</pre>
9:30042</pre>

===Python Class based version===
Only a slight change creates the following class-based implementation:
<lang python>class S_of_n_creator():
def __init__(self, n):
self.n = n
self.i = 0
self.sample = []
def __call__(self, item):
self.i += 1
n, i, sample = self.n, self.i, self.sample
if i <= n:
# Keep first n items
sample.append(item)
elif random() < (n / i):
# Keep item
del sample[randrange(n)]
sample.append(item)
return sample</lang>
The above can be instantiated as follows after which <code>s_of_n</code> can be called in the same way as it is in the first example where it is a function instead of an instance.
<lang python>s_of_n = S_of_n_creator(3)</lang>

Revision as of 21:22, 21 October 2011

Knuth's algorithm S is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

This is a method of randomly sampling n items from a set of M items, with equal probability; where M >= n and M, the number of items is unknown until the end. This means that the equal probability sampling should be maintained for all successive items > n as they become available (although the content of successive samples can change).

The algorithm
  1. Select the first n items as the sample as they become available;
  2. For the i-th item where i > n, have a random chance of n/i of keeping it. If failing this chance, the sample remains the same. If not, have it randomly (1/n) replace one of the previously selected n items of the sample.
  3. Repeat #2 for any subsequent items.
The Task
  1. Create a function s_of_n_creator that given the maximum sample size, returns a function s_of_n that takes one parameter, item.
  2. Function s_of_n when called with successive items returns an equi-weighted random sample of up to n of its items so far, each time it is called, calculated using Knuths Algorithm S.
  3. Test your functions by printing and showing the frequency of occurrences of the selected digits from 100,000 repetitions of:
  1. Use the s_of_n_creator with n == 3 to generate an s_of_n.
  2. call s_of_n with each of the digits 0 to 9 in order, keeping the returned three digits of its random sampling from its last call with argument item=9.

Note: A class taking n and generating a callable instance/function might also be used.

Reference
  • The Art of Computer Programming, Vol 2, 3.4.2 p.142
Cf.

Python

<lang python>from random import random, randrange

def s_of_n_creator(n):

   sample, i = [], 0
   def s_of_n(item):
       nonlocal n, i, sample
       i += 1
       if i <= n:
           # Keep first n items
           sample.append(item)
       elif random() < (n / i):
           # Keep item
           del sample[randrange(n)]
           sample.append(item)
       return sample
   return s_of_n

if __name__ == '__main__':

   bin = [0]* 10
   items = range(10)
   print("Single run samples for n = 3:")
   s_of_n = s_of_n_creator(3)
   for item in items:
       sample = s_of_n(item)
       print("  Item: %i -> sample: %s" % (item, sample))
   #
   for trial in range(100000):
       s_of_n = s_of_n_creator(3)
       for item in items:
           sample = s_of_n(item)
       for s in sample:
           bin[s] += 1
   print("\nTest item frequencies for 100000 runs:\n ",
         '\n  '.join("%i:%i" % x for x in enumerate(bin)))</lang>
Sample output
Single run samples for n = 3:
  Item: 0 -> sample: [0]
  Item: 1 -> sample: [0, 1]
  Item: 2 -> sample: [0, 1, 2]
  Item: 3 -> sample: [0, 1, 3]
  Item: 4 -> sample: [0, 1, 3]
  Item: 5 -> sample: [0, 1, 3]
  Item: 6 -> sample: [0, 1, 3]
  Item: 7 -> sample: [0, 3, 7]
  Item: 8 -> sample: [0, 3, 7]
  Item: 9 -> sample: [0, 3, 7]

Test item frequencies for 100000 runs:
  0:29983
  1:30240
  2:29779
  3:29921
  4:30224
  5:29967
  6:30036
  7:30050
  8:29758
  9:30042

Python Class based version

Only a slight change creates the following class-based implementation: <lang python>class S_of_n_creator():

   def __init__(self, n):
       self.n = n
       self.i = 0
       self.sample = []
   
   def __call__(self, item):
       self.i += 1
       n, i, sample = self.n, self.i, self.sample
       if i <= n:
           # Keep first n items
           sample.append(item)
       elif random() < (n / i):
           # Keep item
           del sample[randrange(n)]
           sample.append(item)
       return sample</lang>

The above can be instantiated as follows after which s_of_n can be called in the same way as it is in the first example where it is a function instead of an instance. <lang python>s_of_n = S_of_n_creator(3)</lang>