One of n lines in a file

From Rosetta Code
Revision as of 22:02, 7 September 2011 by rosettacode>Paddy3118 (New draft task and Python solution.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
One of n lines in a file is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

A method of choosing a line randomly from a file:

  • Without reading the file more than once
  • When substantial parts of the file cannot be held in memory
  • Without knowing how many lines are in the file

Is to:

  • keep the first line of the file as a possible choice, then
  • Read the second line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/2.
  • Read the third line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/3.
...
  • Read the Nth line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/N
  • Return the computed possible choice when no further lines exist in the file.
Task
  1. Create a function/method/routine called one_of_n that given n, the number of actual lines in a file, follows the algotrithm above to return an integer - the line number of the line chosen from the file.
    The number returned can vary, randomly, in each run.
  2. Use one_of_n in a simulation to find what woud be the chosen line of a 10 line file simulated 1,000,000 times.
  3. Print and show how many times each of the 10 lines is chosen as a rough measure of how well the algorithm works.

Note: You may choose a smaller number of repetitions if necessary, but mention this up-front.

Python

<lang python>from random import random as rnd

def one_of_n(n):

   # Zero based line numbers
   choice = 0
   for i in range(1, n):
       if rnd() < 1. / (i + 1.):
           choice = i
   return choice
           

def one_of_n_test(n=10, trials=1000000):

   bins = [0] * n
   if n:
       for i in range(trials):
           bins[one_of_n(n)] += 1
   return bins

print(one_of_n_test())</lang>

Sample output
[99833, 100303, 99902, 100132, 99608, 100117, 99531, 100017, 99795, 100762]