Talk:Knuth's algorithm S

From Rosetta Code

Names

Are the names all that important for this task? The Tcl implementation uses different names rather than s_of_n_creator and s_of_n for purely syntactic reasons, yet it is clearly the same algorithm, same behavior. I suppose I could dress the stuff up with aliases so that it appears to have those names, but they'd not add anything (and might slow things down). –Donal Fellows 06:01, 22 October 2011 (UTC)

The task description is questionable. "Return a funciton" is not necessarily desirable or even possible for some languages, not to mention fixed function names. A names like "s_of_n" isn't all that descriptive to start with, and is patentedly against naming conventions in different languages, I don't see the point of sticking with them. --Ledrug 03:01, 12 November 2011 (UTC)

Algorithm R?

"Algorithm S" in Knuth, Vol 2, 3.4.2 is the one with a known number of records. I think it is "Algorithm R" (Reservoir sampling) what we need here. -Abu 12:10, 27 October 2011 (UTC)

Algorithm R has a second pass -- it's not a crucial distinction, though. Mathematically this task, R and S are all the same thing anyway, but maybe R is a better fit. --Ledrug 12:27, 27 October 2011 (UTC)

BBC BASIC and 100,000 Functions kept around

Hi RichardRussell, I read your comment:

"At each of the 100000 repetitions not only is a new function created but also new copies of its PRIVATE variables index% and samples%(). Creating such a large number of variables at run-time impacts adversely on execution speed and isn't to be recommended, other than to meet the artificial requirements of the task."

I took a look at your example and it seems that function FNs_of_n_creator calculates and keeps each random sample from ten and you keep 100000 of them around before extracting their data for binning.

The Python example, (amongst others e.g. Ada), bin results as they go along and so don't have to keep so many functions around - maybe the BBC basic example could follow a similar scheme?

The task tries to:

  1. Introduce what could be a new algorithm to implementers
  2. Quickly show that the routine can generate a non-biased sample (without using too much stats).

It could well be thought of as artificial, but I hope I've explained the purpose. --Paddy3118 07:45, 5 November 2012 (UTC)