Talk:Sieve of Eratosthenes

From Rosetta Code
Revision as of 19:44, 29 October 2011 by rosettacode>Paddy3118 (Undo revision 124245 by Paddy3118 (talk))

Optimizations

My Forth is very rusty, but it looks like your Forth version of the Sieve_of_Eratosthenes lacks both important features of the sieve: The outer loop should stop at the sqrt of the total limit, and the inner loop should start with the square of the just detected prime (any other multiples must already have been crossed out). Without those, the sieve won't be faster than to trial division for each number. Some other implementations seem to have the same problem. (And some only sieve odd numbers, while others don't, making overall comparison between languages more difficult). Maybe all this should also be clarified in the task description? Dirkt 02:48, 2 December 2007 (MST)

I"m ambivalent about these optimizations. The half dozen or so sieve implementations I based this on did not contain these optimizations (including those on similar comparison sites, such as the Computer Language Shootout). Also when I measured this a while ago, the speedup of these optimizations was minimal. For small ranges, the other optimization to assume 2 is prime (i.e. only examine odd numbers) had a greater effect, both in time and space. (Out of curiosity, did Eratosthenes himself document these optimizations, or were they found later?)
As to trial division, you are assuming division and square root are cheap. The simple implementation only uses addition (multiplication may not be implemented on a given processor).
I think keeping the tasks simple should be a goal of Rosetta Code. Keeping things simple encourages folks to donate their time in writing task solutions.
I agree that the examples should all use the same algorithm. The algorithm we decide upon should be described in the task description. --IanOsgood 07:53, 2 December 2007 (MST)
The difference only starts to kick in for large numbers. It's a similar story with the purely functional Haskell version I mentioned - nobody really tried this for large numbers, so nobody discovered it was flawed for a long time. And yes, only examining odd numbers (or using wheels other than this simple one that is only based on the prime 2) do indeed speed up things quite a lot, even for large limits.
By "trial division", I didn't mean actual cost, but asymptotic cost. The 'trick' behind the Sieve of Eratosthenes is to avoid checking whether a prime p divides some remaining candidate k if it is already known that k is composite. Trial division, in contrast, performs this check for every p and every k. And because the sieve only does a fraction of those checks, it's much faster.
The above is wrong! Real difference between trial division and sieve is that in trial division most divisons is "useless": when dividing n by k we just learn that k is not a divisor of n. Sieve only looks at pairs n,k such that n divides k. Even most naive Sieve of Eratosthenes has complexity n*log(n), while using trial division and stopping at sqare root we are close to n*sqrt(n). Stopping sieve at sqrt(n) saves half of the work, so it is really comparable to looking only at odd numbers. Starting inner loop at p^2 has negligable effect.
The actual cost of sum versus multiplication is recovered quickly by the better asymptotic behavior (I can do the math for you, if you want me to).
I agree that things should be kept simple (and therefore I would prefer to simply test for all numbers, not only odd, or not those with remainder 1 or 5 (mod 6), etc.), but these two things (which are really two sides of the same idea - if you start crossing at p*p for each prime p, than of course you can stop crossing once p*p is over the limit) are really easy to implement and do affect asymptotic cost, so I'd like to see them included. Dirkt 09:24, 2 December 2007 (MST)
FYI, if you apply the outer loop optimization (loop from 2 to sqrt n), then you must also print the primes in a second loop from 2 to n. Many of the examples are now incorrect as they stand. --IanOsgood 10:15, 2 December 2007 (MST)
Yes, if one insists that the primes must actually be printed. But the task only says "find all the primes", and keeping the sieve around is actually the less space intensive way to store them (as I found out the hard way some time ago).
Anyway, I think that's not the main point :-) And the examples are first of all *different*. If you think that from the point of view of comparing languages, an inefficient Sieve of Eratosthenes is better, then the task should clearly say so, and all examples should be adapted. However, as I said, I think that those two optimasations are simple enough, and worth including. Dirkt 01:38, 3 December 2007 (MST)
If I might chime in...
I'm not really familiar with the Sieve of Eratosthenes, but perhaps a page could be taken from FizzBuzz and 100 Doors? Language sections in those pages are subsected to reflect optimized vs unoptimized approaches. For the purposes of easy comparison, perhaps such a subsection would be appropriate here?--Short Circuit 12:23, 3 December 2007 (MST)
I personally think the 100 doors "optimized" solutions are useless, adding nothing compared to the random number task, and cluttering up the table of contents. In fact, I suggested the Sieve of Eratosthenes task because I didn't like 100 doors in the first place!
I don't really have anything against the optimized bounds for the sieve (Wikipedia mentions it, and I have a Forth program ready to go). Just make sure the algorithm is documented in the task description. --IanOsgood 16:56, 3 December 2007 (MST)

In the task description: "you shouldn't optimize by using pre-computed wheels, i.e. assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3)"

This might be clearer if it read "don't assume". The word "shouldn't" is supposed to apply distributively: you shouldn't optimize ... you shouldn't assume. But it's actually open to being interpreted as the opposite of the intended meaning, which is: Don't assume that the only numbers you need to sieve are those supplied by a pre-computed wheel. Adding the word "don't" would eliminate the ambiguity. Should I change it? --Snoman 01:26, 16 July 2010 (UTC)

I went ahead and made this change. Please feel free to undo. --Snoman 18:26, 16 July 2010 (UTC)

Also, I noticed that the Ruby function was misspelled. Fixed. :-) --Snoman 03:34, 16 July 2010 (UTC)

Python

This task can also be handled by the NZMATH modules prime and arith1 to enable its generator_eratosthenes function.--Billymac00 02:45, 3 January 2011 (UTC)

You can add a code example that shows the use of the module, as long as there's already a code example that satisfies the task as written. --Michael Mol 04:13, 3 January 2011 (UTC)

The not-a-sieve method

The first code sample under Perl is indeed not a sieve method. Is there a proper procedure to remove it--or can I just remove it? (and who in the world uses a one space indent?) --Ledrug 09:13, 12 June 2011 (UTC)

Sometimes we might leave an example that deviates from the task especially if it says how it deviates from the task. If someone did a proper Perl sieve, then that would definitely be the time to consider removing this one. As it stands, its good to ask/inform the original author? --Paddy3118 12:07, 12 June 2011 (UTC)
The wrong code was added by an IP with it being his only contrib, a long time ago. The note was added by someone else, the code was edited by someone else (for stylistic reasons, and it's still bad looking), and there are three other, correct implementations in the same section, by someone else. This really screams "delete" IMO. --Ledrug 00:27, 13 June 2011 (UTC)
I have replaced that code with a sieve implementation. --Rdm 14:35, 13 June 2011 (UTC)

unix shell versions

User:Kernigh has added a note that the shell version use trial division (I don't quite understand how they actually works, it's difficult to trace), I have added on version that is actually using the sieve operation at the end. --AlexLehm 23:09, 28 October 2011 (UTC)