Sieve of Eratosthenes: Difference between revisions

From Rosetta Code
m (→‎{{header|MSX Basic}}: Fix language tag)
(461 intermediate revisions by more than 100 users not shown)
Line 29: Line 29:
*   [[sequence of primes by Trial Division]]
<br><br>

=={{header|11l}}==
{{trans|Python}}

<syntaxhighlight lang="11l">F primes_upto(limit)
V is_prime = [0B]*2 [+] [1B]*(limit - 1)
L(n) 0 .< Int(limit ^ 0.5 + 1.5)
I is_prime[n]
L(i) (n*n..limit).step(n)
is_prime[i] = 0B
R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)

print(primes_upto(100))</syntaxhighlight>

{{out}}
<pre>
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
</pre>


=={{header|360 Assembly}}==
For maximum compatibility, this program uses only the basic instruction set.
<syntaxhighlight lang="360_assembly">* Sieve of Eratosthenes
ERATOST CSECT
USING ERATOST,R12
Line 102: Line 120:
CRIBLE DC 100000X'01'
YREGS
END ERATOST</syntaxhighlight>
{{out}}
<pre style="height:20ex">
Line 150: Line 168:
=={{header|6502 Assembly}}==
If this subroutine is called with the value of <i>n</i> in the accumulator, it stores an array of the primes less than <i>n</i> beginning at address 1000 hex and returns the number of primes found in the accumulator.
<syntaxhighlight lang="6502asm">ERATOS: STA $D0 ; value of n
LDA #$00
LDX #$00
Line 188: Line 206:
JMP COPY
COPIED: TYA ; how many found
RTS</syntaxhighlight>


=={{header|68000 Assembly}}==
Line 197: Line 215:
Some of the macro code is derived from the examples included with EASy68K.
See 68000 "100 Doors" listing for additional information.
<syntaxhighlight lang="68000devpac">*-----------------------------------------------------------
* Title : BitSieve
* Written by : G. A. Tippery
Line 451: Line 469:
Summary2: DC.B ' prime numbers found.',CR,LF,$00

END START ; last line of source</syntaxhighlight>


=={{header|8086 Assembly}}==

<syntaxhighlight lang="asm">MAXPRM: equ 5000 ; Change this value for more primes
cpu 8086
bits 16
org 100h
section .text
erato: mov cx,MAXPRM ; Initialize array (set all items to prime)
mov bp,cx ; Keep a copy in BP
mov di,sieve
mov al,1
rep stosb
;;; Sieve
mov bx,sieve ; Set base register to array
inc cx ; CX=1 (CH=0, CL=1); CX was 0 before
mov si,cx ; Start at number 2 (1+1)
.next: inc si ; Next number
cmp cl,[bx+si] ; Is this number marked as prime?
jne .next ; If not, try next number
mov ax,si ; Otherwise, calculate square,
mul si
mov di,ax ; and put it in DI
cmp di,bp ; Check array bounds
ja output ; We're done when SI*SI>MAXPRM
.mark: mov [bx+di],ch ; Mark byte as composite
add di,si ; Next composite
cmp di,bp ; While maximum not reached
jbe .mark
jmp .next
;;; Output
output: mov si,2 ; Start at 2
.test: dec byte [bx+si] ; Prime?
jnz .next ; If not, try next number
mov ax,si ; Otherwise, print number
call prax
.next: inc si
cmp si,MAXPRM
jbe .test
ret
;;; Write number in AX to standard output (using MS-DOS)
prax: push bx ; Save BX
mov bx,numbuf
mov bp,10 ; Divisor
.loop: xor dx,dx ; Divide AX by 10, modulus in DX
div bp
add dl,'0' ; ASCII digit
dec bx
mov [bx],dl ; Store ASCII digit
test ax,ax ; More digits?
jnz .loop
mov dx,bx ; Print number
mov ah,9 ; 9 = MS-DOS syscall to print string
int 21h
pop bx ; Restore BX
ret
section .data
db '*****' ; Room for number
numbuf: db 13,10,'$'
section .bss
sieve: resb MAXPRM</syntaxhighlight>

{{out}}
<pre>2
3
5
7
11
...
4969
4973
4987
4993
4999
</pre>

=={{header|8th}}==
<syntaxhighlight lang="8th">
with: n

\ create a new buffer which will function as a bit vector
: bit-vec SED: n -- b
dup 3 shr swap 7 band if 1+ then b:new b:clear ;

\ given a buffer, sieving prime, and limit, cross off multiples
\ of the sieving prime.
: +composites SED: b n n -- b
>r dup sqr rot \ want: -- n index b
repeat
over 1- true b:bit!
>r over + r>
over r@ > until!
rdrop nip nip ;

\ SoE algorithm proper
: make-sieve SED: n -- b
dup>r bit-vec 2
repeat
tuck 1- b:bit@ not
if
over r@ +composites
then swap 1+
dup sqr r@ < while!
rdrop drop ;

\ traverse the final buffer, creating an array of primes
: sieve>a SED: b n -- a
>r a:new swap
( 1- b:bit@ not if >r I a:push r> then ) 2 r> loop drop ;

;with

: sieve SED: n -- a
dup make-sieve swap sieve>a ;
</syntaxhighlight>
{{Out}}
<pre>
ok> 100 sieve .
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok> 1_000_000 sieve a:len . \ count primes up to 1,000,000
78498
ok> -1 a:@ . \ largest prime < 1,000,000
999983
</pre>
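
As a rough cross-check of the bit-vector idea used above, here is a minimal Python sketch (not a translation of the 8th code). It packs one flag per candidate into a <code>bytearray</code>, with a set bit meaning "composite". The name <code>bit_sieve</code> is hypothetical, and indexing bit ''n'' directly (rather than ''n''-1 as the 8th version does) is an assumption made to keep the sketch short.
<syntaxhighlight lang="python">
def bit_sieve(limit):
    """Return the primes <= limit using one bit per candidate (hypothetical helper)."""
    if limit < 2:
        return []
    # One bit per number 0..limit; a set bit means "composite".
    bits = bytearray((limit >> 3) + 1)
    n = 2
    while n * n <= limit:
        if not (bits[n >> 3] >> (n & 7)) & 1:
            # n is still unmarked, so it is prime: cross off n*n, n*n+n, ...
            for m in range(n * n, limit + 1, n):
                bits[m >> 3] |= 1 << (m & 7)
        n += 1
    return [k for k in range(2, limit + 1)
            if not (bits[k >> 3] >> (k & 7)) & 1]
</syntaxhighlight>
Storing bits rather than whole bytes or words cuts memory use by a factor of eight or more, at the cost of a shift and a mask on every access.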

=={{header|AArch64 Assembly}}==
{{works with|as|Raspberry Pi 3B version Buster 64 bits}}
<syntaxhighlight lang="aarch64 assembly">
/* ARM assembly AARCH64 Raspberry PI 3B */
/* program cribleEras64.s */

/*******************************************/
/* Constantes file */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"

.equ MAXI, 100

/*********************************/
/* Initialized data */
/*********************************/
.data
sMessResult: .asciz "Prime : @ \n"
szCarriageReturn: .asciz "\n"

/*********************************/
/* UnInitialized data */
/*********************************/
.bss
sZoneConv: .skip 24
TablePrime: .skip 8 * MAXI
/*********************************/
/* code section */
/*********************************/
.text
.global main
main: // entry of program
ldr x4,qAdrTablePrime // address prime table
mov x0,#2 // prime 2
bl displayPrime
mov x1,#2
mov x2,#1
1: // loop for multiple of 2
str x2,[x4,x1,lsl #3] // mark multiple of 2
add x1,x1,#2
cmp x1,#MAXI // end ?
ble 1b // no loop
mov x1,#3 // begin indice
mov x3,#1
2:
ldr x2,[x4,x1,lsl #3] // load table element
cmp x2,#1 // is prime ?
beq 4f
mov x0,x1 // yes -> display
bl displayPrime
mov x2,x1
3: // and loop to mark multiples of this prime
str x3,[x4,x2,lsl #3]
add x2,x2,x1 // add the prime
cmp x2,#MAXI // end ?
ble 3b // no -> loop
4:
add x1,x1,2 // other prime in table
cmp x1,MAXI // end table ?
ble 2b // no -> loop

100: // standard end of the program
mov x0,0 // return code
mov x8,EXIT // request to exit program
svc 0 // perform the system call
qAdrszCarriageReturn: .quad szCarriageReturn
qAdrsMessResult: .quad sMessResult
qAdrTablePrime: .quad TablePrime

/******************************************************************/
/* Display prime table elements */
/******************************************************************/
/* x0 contains the prime */
displayPrime:
stp x1,lr,[sp,-16]! // save registers
ldr x1,qAdrsZoneConv
bl conversion10 // call decimal conversion
ldr x0,qAdrsMessResult
ldr x1,qAdrsZoneConv // insert conversion in message
bl strInsertAtCharInc
bl affichageMess // display message
100:
ldp x1,lr,[sp],16 // restore 2 registers
ret // return to address lr x30
qAdrsZoneConv: .quad sZoneConv

/********************************************************/
/* File Include fonctions */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"
</syntaxhighlight>
<pre>
Prime : 2
Prime : 3
Prime : 5
Prime : 7
Prime : 11
Prime : 13
Prime : 17
Prime : 19
Prime : 23
Prime : 29
Prime : 31
Prime : 37
Prime : 41
Prime : 43
Prime : 47
Prime : 53
Prime : 59
Prime : 61
Prime : 67
Prime : 71
Prime : 73
Prime : 79
Prime : 83
Prime : 89
Prime : 97
</pre>


=={{header|ABAP}}==
<syntaxhighlight lang="abap">
PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.

Line 496: Line 760:
ENDIF.
ENDLOOP.
</syntaxhighlight>

=={{header|ABC}}==
<syntaxhighlight lang="ABC">HOW TO SIEVE UP TO n:
SHARE sieve
PUT {} IN sieve
FOR cand IN {2..n}: PUT 1 IN sieve[cand]
FOR cand IN {2..floor root n}:
IF sieve[cand] = 1:
PUT cand*cand IN comp
WHILE comp <= n:
PUT 0 IN sieve[comp]
PUT comp+cand IN comp

HOW TO REPORT prime n:
SHARE sieve
IF n<2: FAIL
REPORT sieve[n] = 1

SIEVE UP TO 100
FOR n IN {1..100}:
IF prime n: WRITE n</syntaxhighlight>
{{out}}
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</pre>

=={{header|ACL2}}==
<syntaxhighlight lang="lisp">(defun nats-to-from (n i)
(declare (xargs :measure (nfix (- n i))))
(if (zp (- n i))
Line 527: Line 815:

(defun sieve (limit)
(sieve-r 2 limit))</syntaxhighlight>

=={{header|Action!}}==
<syntaxhighlight lang="action!">DEFINE MAX="1000"

PROC Main()
BYTE ARRAY t(MAX+1)
INT i,j,k,first

FOR i=0 TO MAX
DO
t(i)=1
OD

t(0)=0
t(1)=0

i=2 first=1
WHILE i<=MAX
DO
IF t(i)=1 THEN
IF first=0 THEN
Print(", ")
FI
PrintI(i)
FOR j=2*i TO MAX STEP i
DO
t(j)=0
OD
first=0
FI
i==+1
OD
RETURN</syntaxhighlight>
{{out}}
[https://gitlab.com/amarok8bit/action-rosetta-code/-/raw/master/images/Sieve_of_Eratosthenes.png Screenshot from Atari 8-bit computer]
<pre>
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743,
751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883,
887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
</pre>


=={{header|ActionScript}}==
Works with ActionScript 3.0 (this uses the Actions panel, not a separate class file).
<syntaxhighlight lang="actionscript">function eratosthenes(limit:int):Array
{
var primes:Array = new Array();
Line 554: Line 887:
}
var e:Array = eratosthenes(1000);
trace(e);</syntaxhighlight>
{{out}}
Line 563: Line 896:
=={{header|Ada}}==

<syntaxhighlight lang="ada">with Ada.Text_IO, Ada.Command_Line;

procedure Eratos is

Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));
Prime: array(1 .. Last) of Boolean := (1 => False, others => True);
Line 572: Line 905:
Cnt: Positive;
begin
while Base * Base <= Last loop
if Prime(Base) then
Cnt := Base + Base;
while Cnt <= Last loop
Prime(Cnt) := False;
Cnt := Cnt + Base;
Line 590: Line 921:
end if;
end loop;
end Eratos;</syntaxhighlight>

{{out}}
<pre>> ./eratos 31
Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31</pre>

=={{header|Agda}}==

<syntaxhighlight lang="agda">
-- imports
open import Data.Nat as ℕ using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin using (Fin; suc; zero)
open import Function using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe using (Maybe; just; nothing)

-- Without square cutoff optimization
module Simple where
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))
where
sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
sieve [] = []
sieve (nothing ∷ xs) = sieve xs
sieve (just x ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)
where
B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n

remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
remove _ ys zero = nothing ∷ ys x
remove y ys (suc z) = y ∷ ys z

-- With square cutoff optimization
module SquareOpt where
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
where
sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
sieve _ zero = List.mapMaybe id ∘ Vec.toList
sieve _ (suc _) [] = []
sieve i (suc l) (nothing ∷ xs) = sieve (suc i) (l ∸ i ∸ i) xs
sieve i (suc l) (just x ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)
where
B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n

remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
remove _ ys zero = nothing ∷ ys i
remove y ys (suc j) = y ∷ ys j
</syntaxhighlight>


=={{header|Agena}}==
Tested with Agena 2.9.5 Win32
<syntaxhighlight lang="agena"># Sieve of Eratosthenes

# generate and return a sequence containing the primes up to sieveSize
Line 629: Line 1,011:

# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();</syntaxhighlight>
{{out}}
<pre>
Line 666: Line 1,048:


'''Works with:''' ALGOL 60 for OS/360
<syntaxhighlight lang="algol60">'BEGIN'
'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
'INTEGER' I,J,K;
Line 705: Line 1,087:
'END'
'END'
'END'</syntaxhighlight>


=={{header|ALGOL 68}}==
<syntaxhighlight lang="algol68">BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
Line 725: Line 1,107:
);
print((eratosthenes(80),new line))</syntaxhighlight>
{{out}}
<pre>
Line 732: Line 1,114:


=={{header|ALGOL W}}==
=== Standard, non-optimised sieve ===
<syntaxhighlight lang="algolw">begin

% implements the sieve of Eratosthenes %
Line 774: Line 1,157:
end

end.</syntaxhighlight>
{{out}}
<pre>
Line 780: Line 1,163:
</pre>

=== Odd numbers only version ===

Alternative version that only stores odd numbers greater than 1 in the sieve.
<syntaxhighlight lang="algolw">begin
% implements the sieve of Eratosthenes %
% only odd numbers appear in the sieve, which starts at 3 %
% s( i ) is set to true if ( i * 2 ) + 1 is prime %
procedure sieve2( logical array s ( * ); integer value n ) ;
begin
% start with everything flagged as prime %
for i := 1 until n do s( i ) := true;
% sieve out the non-primes %
% the subscripts of s are 1 2 3 4 5 6 7 8 9 10 11 12 13... %
% which correspond to 3 5 7 9 11 13 15 17 19 21 23 25 27... %
for i := 1 until truncate( sqrt( n ) ) do begin
if s( i ) then begin
integer ip;
ip := ( i * 2 ) + 1;
for p := i + ip step ip until n do s( p ) := false
end if_s_i
end for_i ;
end sieve2 ;
% test the sieve2 procedure %
integer primeMax, arrayMax;
primeMax := 100;
arrayMax := ( primeMax div 2 ) - 1;
begin
logical array s ( 1 :: arrayMax);
i_w := 2; % set output field width %
s_w := 1; % and output separator width %
% find and display the primes %
sieve2( s, arrayMax );
write( 2 );
for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
end
end.</syntaxhighlight>
{{out}}
Same as the standard version.
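
The index mapping used by the odd-numbers-only version (subscript ''i'' stands for the number 2''i''+1) can be sketched in Python as follows. This is an illustrative sketch, not a translation: the name <code>odd_sieve</code> is hypothetical, and unlike the ALGOL W code it starts crossing off at the square of each prime, which is an assumed refinement.
<syntaxhighlight lang="python">
def odd_sieve(limit):
    """Return the primes <= limit, storing flags for odd numbers only."""
    if limit < 2:
        return []
    size = (limit - 1) // 2          # largest index i with 2*i + 1 <= limit
    flags = [True] * (size + 1)      # index i <-> 2*i + 1; index 0 (the number 1) is unused
    i = 1
    while (2 * i + 1) ** 2 <= limit:
        if flags[i]:
            p = 2 * i + 1
            # p*p sits at index (p*p - 1) // 2; stepping the index by p
            # advances the number by 2*p, so even multiples are skipped.
            for j in range((p * p - 1) // 2, size + 1, p):
                flags[j] = False
        i += 1
    return [2] + [2 * i + 1 for i in range(1, size + 1) if flags[i]]
</syntaxhighlight>
Leaving the even numbers out halves the storage and the number of crossing-off steps; the prime 2 is added to the result separately.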

=={{header|ALGOL-M}}==
<syntaxhighlight lang="algol">
BEGIN

COMMENT
FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING
CLASSIC SIEVE OF ERATOSTHENES;

% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
INTEGER R1, R2;
R1 := N;
R2 := 1;
WHILE R1 > R2 DO
BEGIN
R1 := (R1+R2) / 2;
R2 := N / R1;
END;
ISQRT := R1;
END;

INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];

LIMIT := 1000;
FALSE := 0;
TRUE := 1;

WRITE("FINDING PRIMES FROM 2 TO",LIMIT);

% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
FLAGS[I] := TRUE;

% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
BEGIN
IF FLAGS[I] = TRUE THEN
FOR J := (I * I) STEP I UNTIL LIMIT DO
FLAGS[J] := FALSE;
END;

% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");
FOR I := 2 STEP 1 UNTIL LIMIT DO
BEGIN
IF FLAGS[I] = TRUE THEN
BEGIN
WRITEON(I);
COUNT := COUNT + 1;
COL := COL + 1;
IF COL > 10 THEN
BEGIN
WRITE("");
COL := 1;
END;
END;
END;

WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");

END
</syntaxhighlight>
{{out}}
<pre>
FINDING PRIMES FROM 2 TO 1000
INITIALIZING ... SIEVING ... PRINTING
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
. . .
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997

168 PRIMES WERE FOUND.
</pre>

=={{header|APL}}==

All these versions require <tt>⎕io←0</tt> (index origin 0).
Line 791: Line 1,297:
=== Non-Optimized Version ===

<syntaxhighlight lang="apl">sieve2←{
b←⍵⍴1
b[⍳2⌊⍵]←0
Line 800: Line 1,306:
}

primes2←{⍵/⍳⍴⍵}∘sieve2</syntaxhighlight>


The required list of prime divisors is obtained by recursion (<tt>{⍵/⍳⍴⍵}∇⌈⍵*0.5</tt>).
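
The recursive step can be illustrated with a short Python sketch: the flags below ''n'' are built by first (recursively) sieving below ⌈√''n''⌉ and then crossing off multiples of the primes found there. The function names are hypothetical and the base case (''n'' ≤ 4) is an assumption of this sketch, not taken from the APL code.
<syntaxhighlight lang="python">
import math

def sieve_flags(n):
    """Return a list b of length n where b[k] is True iff k is prime (0 <= k < n)."""
    b = [True] * n
    for k in range(min(2, n)):
        b[k] = False                      # 0 and 1 are not prime
    if n <= 4:
        return b                          # base case: 2 and 3 need no crossing off
    root = math.isqrt(n - 1) + 1          # ceil(sqrt(n)) for n >= 1
    small = sieve_flags(root)             # recursion: flags for k < ceil(sqrt(n))
    for p in (k for k, is_p in enumerate(small) if is_p):
        for m in range(p * p, n, p):      # cross off multiples of each small prime
            b[m] = False
    return b

def primes_below(n):
    """List the primes strictly below n, mirroring the index-origin-0 convention."""
    return [k for k, is_p in enumerate(sieve_flags(n)) if is_p]
</syntaxhighlight>
Because each recursive call works on roughly the square root of the previous size, the recursion depth grows only doubly-logarithmically with ''n''.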


=== Optimized Version ===

<syntaxhighlight lang="apl">sieve←{
b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
Line 815: Line 1,321:
}

primes←{⍵/⍳⍴⍵}∘sieve</syntaxhighlight>

The optimizations are as follows:
Line 824: Line 1,330:
=== Examples ===

<syntaxhighlight lang="apl"> primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Line 836: Line 1,342:

+/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534</syntaxhighlight>


The last expression computes the number of primes < 1e0 1e1 ... 1e9.
The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).
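
The first few of these counts can be reproduced with a small Python sketch. It is an independent illustration rather than a port of the APL; the name <code>count_primes_below</code> is hypothetical, and the range is cut at 1e6 (an arbitrary choice) so that it runs quickly.
<syntaxhighlight lang="python">
def count_primes_below(n):
    """Count the primes p < n with a plain bytearray sieve (1 = still a candidate)."""
    if n < 3:
        return 0
    flags = bytearray([1]) * n
    flags[0] = flags[1] = 0
    p = 2
    while p * p < n:
        if flags[p]:
            # Zero every multiple of p from p*p upward in one slice assignment.
            flags[p * p::p] = bytes(len(range(p * p, n, p)))
        p += 1
    return sum(flags)

print([count_primes_below(10 ** e) for e in range(7)])
</syntaxhighlight>
This prints <code>[0, 4, 25, 168, 1229, 9592, 78498]</code>, matching the first seven values in the APL output above.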


=={{header|AppleScript}}==

<syntaxhighlight lang="applescript">on sieveOfEratosthenes(limit)
script o
property numberList : {missing value}
end script
repeat with n from 2 to limit
set end of o's numberList to n
end repeat
repeat with n from 2 to (limit ^ 0.5 div 1)
if (item n of o's numberList is n) then
repeat with multiple from (n * n) to limit by n
set item multiple of o's numberList to missing value
end repeat
end if
end repeat
return o's numberList's numbers
end sieveOfEratosthenes

sieveOfEratosthenes(1000)</syntaxhighlight>

{{out}}
<syntaxhighlight lang="applescript">{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}</syntaxhighlight>

=={{header|ARM Assembly}}==
{{works with|as|Raspberry Pi}}
<syntaxhighlight lang="arm assembly">

/* ARM assembly Raspberry PI */
/* program cribleEras.s */

/* REMARK 1 : this program use routines in a include file
see task Include a file language arm assembly
for the routine affichageMess conversion10
see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes */
/************************************/
.include "../constantes.inc"

.equ MAXI, 101


/*********************************/
/* Initialized data */
/*********************************/
.data
sMessResult: .asciz "Prime : @ \n"
szCarriageReturn: .asciz "\n"

/*********************************/
/* UnInitialized data */
/*********************************/
.bss
sZoneConv: .skip 24
TablePrime: .skip 4 * MAXI
/*********************************/
/* code section */
/*********************************/
.text
.global main
main: @ entry of program
ldr r4,iAdrTablePrime @ address prime table
mov r0,#2 @ prime 2
bl displayPrime
mov r1,#2
mov r2,#1
1: @ loop for multiple of 2
str r2,[r4,r1,lsl #2] @ mark multiple of 2
add r1,#2
cmp r1,#MAXI @ end ?
ble 1b @ no loop
mov r1,#3 @ begin indice
mov r3,#1
2:
ldr r2,[r4,r1,lsl #2] @ load table element
cmp r2,#1 @ is prime ?
beq 4f
mov r0,r1 @ yes -> display
bl displayPrime
mov r2,r1
3: @ and loop to mark multiples of this prime
str r3,[r4,r2,lsl #2]
add r2,r1 @ add the prime
cmp r2,#MAXI @ end ?
ble 3b @ no -> loop
4:
add r1,#2 @ other prime in table
cmp r1,#MAXI @ end table ?
ble 2b @ no -> loop

100: @ standard end of the program
mov r0, #0 @ return code
mov r7, #EXIT @ request to exit program
svc #0 @ perform the system call
iAdrszCarriageReturn: .int szCarriageReturn
iAdrsMessResult: .int sMessResult
iAdrTablePrime: .int TablePrime

/******************************************************************/
/* Display prime table elements */
/******************************************************************/
/* r0 contains the prime */
displayPrime:
push {r1,lr} @ save registers
ldr r1,iAdrsZoneConv
bl conversion10 @ call decimal conversion
ldr r0,iAdrsMessResult
ldr r1,iAdrsZoneConv @ insert conversion in message
bl strInsertAtCharInc
bl affichageMess @ display message
100:
pop {r1,lr}
bx lr
iAdrsZoneConv: .int sZoneConv
/***************************************************/
/* ROUTINES INCLUDE */
/***************************************************/
.include "../affichage.inc"
</syntaxhighlight>
<pre>
Prime : 2
Prime : 3
Prime : 5
Prime : 7
Prime : 11
Prime : 13
Prime : 17
Prime : 19
Prime : 23
Prime : 29
Prime : 31
Prime : 37
Prime : 41
Prime : 43
Prime : 47
Prime : 53
Prime : 59
Prime : 61
Prime : 67
Prime : 71
Prime : 73
Prime : 79
Prime : 83
Prime : 89
Prime : 97
Prime : 101
</pre>
=={{header|Arturo}}==

<syntaxhighlight lang="rebol">sieve: function [upto][
composites: array.of: inc upto false
loop 2..to :integer sqrt upto 'x [
if not? composites\[x][
loop range.step: x x^2 upto 'c [
composites\[c]: true
]
]
]
result: new []
loop.with:'i composites 'c [
unless c -> 'result ++ i
]
return result -- [0,1]
]

print sieve 100</syntaxhighlight>

{{out}}

<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</pre>

=={{header|AutoHotkey}}==
{{AutoHotkey case}}Source: [http://www.autohotkey.com/forum/topic44657.html AutoHotkey forum] by Laszlo
<syntaxhighlight lang="autohotkey">MsgBox % "12345678901234567890`n" Sieve(20)

Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime
Line 858: Line 1,539:
}
Return S
}</syntaxhighlight>

===Alternative Version===
<syntaxhighlight lang="autohotkey">Sieve_of_Eratosthenes(n){
arr := []
loop % n-1
if A_Index>1
arr[A_Index] := true

for i, v in arr {
if (i>Sqrt(n))
break
else if arr[i]
while ((j := i*2 + (A_Index-1)*i) < n)
arr.delete(j)
}
return Arr
}</syntaxhighlight>
Examples:<syntaxhighlight lang="autohotkey">n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return</syntaxhighlight>
{{out}}
<pre>. 2 3 . 5 . 7 . . .
11 . 13 . . . 17 . 19 .
. . 23 . . . . . 29 .
31 . . . . . 37 . . .
41 . 43 . . . 47 . . .
. . 53 . . . . . 59 .
61 . . . . . 67 . . .
71 . 73 . . . . . 79 .
. . 83 . . . . . 89 .
. . . . . . 97 . . .</pre>

=={{header|AutoIt}}==
<syntaxhighlight lang="autoit">#include <Array.au3>
$M = InputBox("Integer", "Enter biggest Integer")
Global $a[$M], $r[$M], $c = 1
Line 874: Line 1,590:
$r[0] = $c - 1
ReDim $r[$c]
_ArrayDisplay($r)</syntaxhighlight>


=={{header|AWK}}==
Line 899: Line 1,615:
input from commandline as well as stdin,
and input is checked for valid numbers:
<syntaxhighlight lang="awk">
# usage: gawk -v n=101 -f sieve.awk

Line 918: Line 1,634:

END { print "Bye!" }
</syntaxhighlight>

Here is an alternate version that uses an associative array to record each pending composite together with a prime that divides it. It can be considered a slow version, as it does not cross out a composite until the scan reaches it. This version assumes enough memory to hold one array entry for each prime up to ULIMIT. It prints the noncomposites greater than 1.
<syntaxhighlight lang="awk">
BEGIN { ULIMIT=100

for ( n=1 ; (n++) < ULIMIT ; )
if (n in S) {
p = S[n]
delete S[n]
for ( m = n ; (m += p) in S ; ) { }
S[m] = p
}
else print ( S[(n+n)] = n )
}
</syntaxhighlight>
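
The same incremental idea can be sketched in Python with a dictionary in place of the AWK associative array. This is a hedged illustration rather than a line-by-line translation; the names <code>incremental_primes</code> and <code>pending</code> are hypothetical.
<syntaxhighlight lang="python">
def incremental_primes(limit):
    """Yield the primes <= limit, crossing off each composite only when it is reached."""
    pending = {}                 # composite -> one prime that divides it
    for n in range(2, limit + 1):
        p = pending.pop(n, None)
        if p is None:
            yield n              # n was never scheduled as a multiple, so it is prime
            pending[n + n] = n   # schedule the first multiple of n
        else:
            m = n + p
            while m in pending:  # slot already claimed by another prime: keep stepping
                m += p
            pending[m] = p       # schedule the next free multiple of p
</syntaxhighlight>
For example, <code>list(incremental_primes(30))</code> gives <code>[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]</code>. The dictionary holds at most one entry per prime found so far, which is the memory assumption stated above.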

==Bash==
''See solutions at [[{{FULLPAGENAME}}#UNIX Shell|UNIX Shell]].''


=={{header|BASIC}}==
{{works with|FreeBASIC}}
{{works with|RapidQ}}
<syntaxhighlight lang="freebasic">DIM n AS Integer, k AS Integer, limit AS Integer

INPUT "Enter number to search to: "; limit
Line 939: Line 1,673:
FOR n = 2 TO limit
IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n</syntaxhighlight>

==={{header|Applesoft BASIC}}===
<syntaxhighlight lang="basic">10 INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20 DIM FLAGS(LIMIT)
30 FOR N = 2 TO SQR (LIMIT)
Line 953: Line 1,687:
100 FOR N = 2 TO LIMIT
110 IF FLAGS(N) = 0 THEN PRINT N;", ";
120 NEXT N</syntaxhighlight>

==={{header|Atari BASIC}}===
{{trans|Commodore BASIC}}
Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.
<syntaxhighlight lang="basic">100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160 IF N(P)=0 THEN 200
170 FOR I=P*P TO LI STEP P
180 N(I)=0
190 NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230 IF N(I)=0 THEN 260
240 PRINT I,:C=C+1
250 IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT</syntaxhighlight>
{{Out}}
<pre> Ready
RUN
LIMIT?100
2 3 5
7 11 13
17 19 23
29 31 37
41 43 47
53 59 61
67 71 73
79 83 89
97</pre>

==={{header|Commodore BASIC}}===
Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.
<syntaxhighlight lang="basic">100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 : N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT
</syntaxhighlight>
{{Out}}
<pre>
READY.
RUN
LIMIT? 100
2 3 5 7
11 13 17 19
23 29 31 37
41 43 47 53
59 61 67 71
73 79 83 89
97

READY.
</pre>
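
The "0 means prime" convention carries over to any environment that zero-fills new arrays. A minimal Python sketch of the same trick, assuming a <code>bytearray</code> as the zero-initialized storage (the function name is hypothetical):
<syntaxhighlight lang="python">
import math

def sieve_zero_is_prime(limit):
    """Return the primes <= limit; a flag of 0 means prime, 1 means composite."""
    if limit < 2:
        return []
    composite = bytearray(limit + 1)      # zero-filled on creation, like DIM N(LI)
    composite[0] = composite[1] = 1       # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if composite[p]:
            continue                      # p was already crossed off
        for i in range(p * p, limit + 1, p):
            composite[i] = 1
    return [i for i in range(2, limit + 1) if not composite[i]]
</syntaxhighlight>
Inverting the meaning of the flag saves one full pass over the array, which matters on slow interpreters such as the one this section targets.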

==={{header|IS-BASIC}}===
<syntaxhighlight lang="is-basic">100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140 LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170 IF T(I)<>1 THEN
180 FOR K=I*I TO LIMIT STEP I
190 LET T(K)=1
200 NEXT
210 END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240 IF T(I)=0 THEN PRINT I;
250 NEXT</syntaxhighlight>


==={{header|Locomotive Basic}}===
<syntaxhighlight lang="locobasic">10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
Line 967: Line 1,788:
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT</syntaxhighlight>


==={{header|MSX Basic}}===
<syntaxhighlight lang="msx basic">5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n</syntaxhighlight>

==={{header|Sinclair ZX81 BASIC}}===
If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on <code>FAST</code> and <code>SLOW</code>: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering <code>FAST</code> mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to <code>SLOW</code> mode when we have something to print out.
<syntaxhighlight lang="basic"> 10 INPUT L
20 FAST
30 DIM N(L)
40 FOR I=2 TO SQR L
50 IF N(I) THEN GOTO 90
60 FOR J=I+I TO L STEP I
70 LET N(J)=1
80 NEXT J
90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I</syntaxhighlight>


==={{header|ZX Spectrum Basic}}===
==={{header|ZX Spectrum Basic}}===
<lang zxbasic>10 INPUT "Enter number to search to: ";l
<syntaxhighlight lang="zxbasic">10 INPUT "Enter number to search to: ";l
20 DIM p(l)
20 DIM p(l)
30 FOR n=2 TO SQR l
30 FOR n=2 TO SQR l
Line 995: Line 1,836:
100 FOR n=2 TO l
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n</lang>
120 NEXT n</syntaxhighlight>


=={{header|BBC BASIC}}==
==={{header|QBasic}}===
{{works with|QBasic|1.1}}
<lang bbcbasic> limit% = 100000
{{works with|QuickBasic|4.5}}
DIM sieve% limit%
<syntaxhighlight lang="qbasic">limit = 120

prime% = 2
DIM flags(limit)
WHILE prime%^2 < limit%
FOR I% = prime%*2 TO limit% STEP prime%
FOR n = 2 TO limit
sieve%?I% = 1
flags(n) = 1
NEXT n
NEXT

REPEAT prime% += 1 : UNTIL sieve%?prime%=0
PRINT "Prime numbers less than or equal to "; limit; " are: "
ENDWHILE
FOR n = 2 TO SQR(limit)
REM Display the primes:
IF flags(n) = 1 THEN
FOR I% = 1 TO limit%
FOR i = n * n TO limit STEP n
IF sieve%?I% = 0 PRINT I%;
flags(i) = 0
NEXT</lang>
NEXT i
END IF
==bash==
NEXT n
''See solutions at [[{{FULLPAGENAME}}#UNIX Shell|UNIX Shell]].''

FOR n = 1 TO limit
IF flags(n) THEN PRINT USING "####"; n;
NEXT n</syntaxhighlight>
{{out}}
<pre>Prime numbers less than or equal to 120 are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113</pre>

==={{header|BASIC256}}===
<syntaxhighlight lang="basic256">arraybase 1
limit = 120

dim flags(limit)
for n = 2 to limit
flags[n] = True
next n

print "Prime numbers less than or equal to "; limit; " are: "
for n = 2 to sqr(limit)
if flags[n] then
for i = n * n to limit step n
flags[i] = False
next i
end if
next n

for n = 1 to limit
if flags[n] then print rjust(n,4);
next n</syntaxhighlight>
{{out}}
<pre>Prime numbers less than or equal to 120 are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113</pre>

==={{header|True BASIC}}===
{{trans|QBasic}}
<syntaxhighlight lang="qbasic">LET limit = 120
DIM flags(0)
MAT redim flags(limit)
FOR n = 2 to limit
LET flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 to sqr(limit)
IF flags(n) = 1 then
FOR i = n*n to limit step n
LET flags(i) = 0
NEXT i
END IF
NEXT n
FOR n = 1 to limit
IF flags(n)<>0 then PRINT using "####": n;
NEXT n
END</syntaxhighlight>
{{out}}
<pre>Same as QBasic entry.</pre>

==={{header|QL SuperBASIC}}===
====using 'easy way' to 'add' 2n wheels====
{{trans|ZX Spectrum Basic}}
This version marks the higher multiples of 2 in h$ up front via <code>FILL$</code> and later uses a <code>STEP</code> of 2n when crossing off the multiples of n. It also replaces the floating-point array p(z) with the string variable h$(z) to sieve out all primes < z=441 (<code>l</code>=21) in under 1K, so that h$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.
<syntaxhighlight lang="qbasic">
10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h$="10" : h$=h$ & FILL$("01",z)
40 FOR n=3 TO l
50 IF h$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h$(k)=1
80 END FOR n
90 REM Display the primes
100 FOR n=2 TO z: IF h$(n)=0: PRINT n;", ";
</syntaxhighlight>
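
The two ideas in the description above (pre-marking the even numbers, then crossing off only odd multiples with a step of 2n) can be sketched in Python as follows. This is an illustration under assumptions rather than a translation: it uses a byte array instead of a string, and the name <code>primes_upto_step2n</code> is hypothetical.

```python
from math import isqrt

def primes_upto_step2n(z):
    """Primes <= z; evens are pre-marked, odd n are crossed off in steps of 2n."""
    if z < 2:
        return []
    flags = bytearray(z + 1)           # flags[k] == 1 means k is not prime
    flags[1] = 1
    for k in range(4, z + 1, 2):       # every even number above 2
        flags[k] = 1
    for n in range(3, isqrt(z) + 1):
        if flags[n]:
            continue                   # even or already crossed off
        for k in range(n * n, z + 1, 2 * n):   # n*n, n*n + 2n, ...: odd multiples only
            flags[k] = 1
    return [k for k in range(2, z + 1) if not flags[k]]

print(primes_upto_step2n(441)[-1])
```

Stepping by 2n skips the even multiples of n, which the pre-marking pass has already handled, so the inner loop does half the work.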

====2i wheel emulation of Sinclair ZX81 BASIC====
Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the <code>STEP</code> of 2 in line 40 mitigates line 50's inefficiency when going to 90. <syntaxhighlight lang="qbasic">
10 INPUT Z
15 LET L=SQR(Z)
30 LET H$="10"
32 FOR J=3 TO Z STEP 2
34 LET H$=H$ & "01"
36 NEXT J
40 FOR I=3 TO L STEP 2
50 IF H$(I)="1" THEN GOTO 90
60 FOR J=I*I TO Z STEP I+I
70 LET H$(J)="1"
80 NEXT J
90 NEXT I
110 FOR I=2 TO Z
120 IF H$(I)="0" THEN PRINT I!
130 NEXT I
</syntaxhighlight>

====2i wheel emulation of Sinclair ZX80 BASIC====
. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, <code>L</code> (=21 at most on 1K ZX80s for all primes till 439).
Backward-compatible on ZX80s after substituting ** for ^ in line 120.<syntaxhighlight lang="qbasic">
10 INPUT L
15 LET Z=(L+1)*(L- 1)/2
30 DIM H(Z)
40 FOR I=3 TO L STEP 2
50 IF H((I-1)/ 2) THEN GOTO 90
60 FOR J=I*I TO L*L STEP I+I
70 LET H((J-1)/ 2)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
</syntaxhighlight>
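
The 2:1 compression used above stores only the odd numbers, so that index i stands for 2i+1, and index 0 (which would stand for 1) is reused to report the prime 2. A hedged Python sketch of the same mapping follows; the function name <code>primes_below_square</code> is an assumption, and the input is the odd limit on factors, as in the BASIC listing.

```python
def primes_below_square(l):
    """Primes < l*l for an odd l >= 3, storing odd numbers only (index i <-> 2*i + 1)."""
    if l < 3 or l % 2 == 0:
        raise ValueError("l must be an odd number >= 3")
    z = (l * l - 1) // 2               # index that stands for l*l itself
    composite = bytearray(z + 1)
    for i in range(3, l + 1, 2):
        if composite[(i - 1) // 2]:
            continue                   # i is an odd composite
        for j in range(i * i, l * l + 1, 2 * i):
            composite[(j - 1) // 2] = 1
    # index 0 would stand for 1; it is reused to report the only even prime, 2
    return [2 if i == 0 else 2 * i + 1 for i in range(z + 1) if not composite[i]]

print(primes_below_square(21)[-1])
```

With <code>l</code>=21 the array has 221 entries instead of 441, which is the point of the compression on a machine with 1K of RAM.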

====Sieve of Sundaram====
Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now
just a slight transformation away from the Sieve of Sundaram, as transformed as follows: <code>O</code> is the highest value for an Index of successive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in-between, such that
duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all
optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent to line 50 in them.
Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.<syntaxhighlight lang="qbasic">
10 INPUT O
15 LET Z=2*O*O+O*2
30 DIM H(Z)
40 FOR I=1 TO O
45 LET A=2*I*I+I*2
50 REM IF H(A) THEN GOTO 90
60 FOR J=A TO Z STEP 1+I*2
65 REM IF H(J) THEN GOTO 80
70 LET H(J)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
</syntaxhighlight>
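
For comparison, the transformed Sieve of Sundaram described above can be sketched in Python. The sketch assumes the same index convention as the listing (index i stands for 2i+1, with index 0 reported as 2) and, like the listing, makes no attempt to skip composite values of 2i+1; the name <code>sundaram</code> is chosen here for illustration.

```python
def sundaram(o):
    """Primes below (2*o + 1)**2, marking from each diagonal element 2*i*i + 2*i."""
    z = 2 * o * o + 2 * o              # index that stands for (2*o + 1)**2
    marked = bytearray(z + 1)
    for i in range(1, o + 1):
        start = 2 * i * i + 2 * i      # index that stands for (2*i + 1)**2
        for j in range(start, z + 1, 2 * i + 1):
            marked[j] = 1              # 2*j + 1 is an odd multiple of 2*i + 1
    return [2 if i == 0 else 2 * i + 1 for i in range(z + 1) if not marked[i]]

print(sundaram(10)[-1])
```

A step of 2i+1 in index space corresponds to a step of 2(2i+1) in the numbers themselves, which is the same odd-multiples-only stride used by the odds-only Sieve of Eratosthenes.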

====Eulerian optimisation====
While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of <code>FOR</code> statements (& as their 1K of static RAM is equivalent to L1 cache, even in <code>FAST</code> mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)
<syntaxhighlight lang="qbasic">
10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h$=FILL$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h$((2*q+1)*p+q)="1"
42 GOTO 87
61 IF h$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h$(z%)=" ": PRINT N%(z%)!
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2
</syntaxhighlight>


=={{header|Batch File}}==
=={{header|Batch File}}==
<lang dos>:: Sieve of Eratosthenes for Rosetta Code - PG
<syntaxhighlight lang="dos">:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEDELAYEDEXPANSION
Line 1,036: Line 2,024:
if !crible.%%i! EQU 1 echo %%i
if !crible.%%i! EQU 1 echo %%i
)
)
pause</lang>
pause</syntaxhighlight>
{{Out}}
{{Out}}
<pre style="height:20ex">limit: 100
<pre style="height:20ex">limit: 100
Line 1,064: Line 2,052:
89
89
97</pre>
97</pre>

=={{header|BBC BASIC}}==
<syntaxhighlight lang="bbcbasic"> limit% = 100000
DIM sieve% limit%
prime% = 2
WHILE prime%^2 < limit%
FOR I% = prime%*2 TO limit% STEP prime%
sieve%?I% = 1
NEXT
REPEAT prime% += 1 : UNTIL sieve%?prime%=0
ENDWHILE
REM Display the primes:
FOR I% = 1 TO limit%
IF sieve%?I% = 0 PRINT I%;
NEXT</syntaxhighlight>

=={{header|BCPL}}==
<syntaxhighlight lang="bcpl">get "libhdr"

manifest $( LIMIT = 1000 $)

let sieve(prime,max) be
$( let i = 2
0!prime := false
1!prime := false
for i = 2 to max do i!prime := true
while i*i <= max do
$( if i!prime do
$( let j = i*i
while j <= max do
$( j!prime := false
j := j + i
$)
$)
i := i + 1
$)
$)

let start() be
$( let prime = vec LIMIT
let col = 0
sieve(prime, LIMIT)
for i = 2 to LIMIT do
if i!prime do
$( writef("%I4",i)
col := col + 1
if col rem 20 = 0 then wrch('*N')
$)
wrch('*N')
$)</syntaxhighlight>
{{out}}
<pre> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997</pre>
=== Odds-only bit packed array version (64 bit) ===
This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.
<syntaxhighlight lang="bcpl">
GET "libhdr"

LET lowbit(n) =
n = 0 -> -1,
VALOF {
// The table is byte packed to conserve space; therefore we must
// unpack the structure.
//
LET deBruijn64 = TABLE
#x0001300239311C03, #x3D3A322A261D1104,
#x3E373B2435332B16, #x2D27211E18120C05,
#x3F2F381B3C292510, #x362334152C20170B,
#x2E1A280F22141F0A, #x190E13090D080706

LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
}

LET primes_upto(limit) =
limit < 3 -> 0,
VALOF {
LET bit_sz = (limit + 1) / 2 - 1
LET bit, p = ?, ?
LET q, r = bit_sz >> 6, bit_sz & #x3F
LET sz = q - (r > 0)
LET sieve = getvec(sz)

// Initialize the array
FOR i = 0 TO q - 1 DO
sieve!i := -1
IF r > 0 THEN sieve!q := ~(-1 << r)
sieve!sz := -1 // Sentinel value to mark the end -
// (after sieving, we'll never have 64 consecutive odd primes.)

// run the sieve
bit := 0
{
WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
bit +:= 1
p := 2*bit + 3
q := p*p
IF q > limit THEN RESULTIS sieve
r := (q - 3) >> 1
UNTIL r >= bit_sz DO {
sieve[r >> 6] &:= ~(1 << (r & #x3F))
r +:= p
}
bit +:= 1
} REPEAT
}

MANIFEST { // fields in an iterable
sieve_start; sieve_bits; sieve_ptr
}

LET prime_iter(sieve) = VALOF {
LET iter = getvec(2)
iter!sieve_start := 0
iter!sieve_bits := sieve!0
iter!sieve_ptr := sieve
RESULTIS iter
}

LET nextprime(iter) =
!iter!sieve_ptr = -1 -> 0, // guard entry if at the end already
VALOF {
LET p, x = ?, ?

// iter!sieve_start is also a flag to yield 2.
IF iter!sieve_start = 0 {
iter!sieve_start := 3
RESULTIS 2
}
x := iter!sieve_bits
{
TEST x ~= 0
THEN {
p := (lowbit(x) << 1) + iter!sieve_start
x &:= x - 1
iter!sieve_bits := x
RESULTIS p
}
ELSE {
iter!sieve_start +:= 128
iter!sieve_ptr +:= 1
x := !iter!sieve_ptr
IF x = -1 RESULTIS 0
}
} REPEAT
}

LET show(sieve) BE {
LET iter = prime_iter(sieve)
LET c, p = 0, ?
{
p := nextprime(iter)
IF p = 0 THEN {
wrch('*n')
freevec(iter)
RETURN
}
IF c MOD 10 = 0 THEN wrch('*n')
c +:= 1
writef("%8d", p)
} REPEAT
}

LET start() = VALOF {
LET n = ?
LET argv = VEC 20
LET sz = ?
LET primes = ?

sz := rdargs("upto/a/n/p", argv, 20)
IF sz = 0 RESULTIS 1
n := !argv!0
primes := primes_upto(n)
IF primes = 0 RESULTIS 1 // no array allocated because limit too small
show(primes)
freevec(primes)
RESULTIS 0
}
</syntaxhighlight>
{{Out}}
<pre>
$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000>
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
0.005>
</pre>
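
The combination described for this version (an odds-only bit array plus an iterator that peels off the lowest set bit of each word) can be sketched in Python. This is an illustration under assumptions, not a translation: Python integers stand in for 64-bit words, <code>bit_length</code> replaces the de Bruijn lookup, and the names <code>sieve_bits</code> and <code>primes_upto</code> are chosen here.

```python
from math import isqrt

WORD = 64  # bits per word, matching the 64-bit layout of the listing

def sieve_bits(limit):
    """Return a list of words; bit b set means the odd number 2*b + 3 is prime."""
    if limit < 3:
        return []
    nbits = (limit + 1) // 2 - 1               # odd numbers 3, 5, ... up to limit
    nwords = (nbits + WORD - 1) // WORD
    words = [(1 << WORD) - 1] * nwords
    if nbits % WORD:                            # clear the unused high bits
        words[-1] = (1 << (nbits % WORD)) - 1
    for p in range(3, isqrt(limit) + 1, 2):
        b = (p - 3) // 2
        if (words[b // WORD] >> (b % WORD)) & 1:
            for r in range((p * p - 3) // 2, nbits, p):
                words[r // WORD] &= ~(1 << (r % WORD))
    return words

def primes_upto(limit):
    """Yield 2, then every prime recorded in the bit array, lowest bit first."""
    if limit >= 2:
        yield 2
    for w, x in enumerate(sieve_bits(limit)):
        base = 2 * WORD * w + 3                 # number represented by bit 0 of word w
        while x:
            low = x & -x                        # isolate the lowest set bit
            yield base + 2 * (low.bit_length() - 1)
            x &= x - 1                          # clear that bit

print(list(primes_upto(100)))
```

A generator plays the role that the explicit iterator object plays in the BCPL code, since Python keeps the loop state between calls automatically.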


=={{header|Befunge}}==
=={{header|Befunge}}==
Line 1,070: Line 2,271:
@ ^ p3\" ":<
@ ^ p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789
2 234567890123456789012345678901234567890123456789012345678901234567890123456789

=={{header|Binary Lambda Calculus}}==

The BLC sieve of Eratosthenes as documented at https://github.com/tromp/AIT/blob/master/characteristic_sequences/primes.lam is the 167 bit program

<pre>00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110</pre>

The infinitely long output is

<pre>001101010001010001010001000001010000010001010001000001000001010000010001010000010001000001000000010001010001010001000000000000010001000001010000000001010000010000010001000001000001010000000001010001010000000000010000000000010001010001000001010000000001000001000001000001010000010001010000000001000000000000010001010001000000000000010000010000000001010001000001000...</pre>
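
The output is the characteristic sequence of the primes: bit n is 1 exactly when n is prime, counting from n = 0. A finite prefix of that sequence can be reproduced with an ordinary sieve, as in this Python sketch (the function name <code>prime_characteristic</code> is an assumption made for illustration).

```python
def prime_characteristic(n):
    """Return the first n bits of the prime characteristic sequence as a string."""
    is_prime = bytearray([1]) * n
    for i in range(min(n, 2)):         # 0 and 1 are not prime
        is_prime[i] = 0
    p = 2
    while p * p < n:
        if is_prime[p]:
            for m in range(p * p, n, p):
                is_prime[m] = 0
        p += 1
    return "".join("1" if b else "0" for b in is_prime)

print(prime_characteristic(40))
```

The printed prefix can be compared directly against the start of the output shown above.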

=={{header|BQN}}==

A more efficient sieve (primes below one billion in under a minute) is provided as <code>PrimesTo</code> in bqn-libs [https://github.com/mlochbaum/bqn-libs/blob/master/primes.bqn primes.bqn].

<syntaxhighlight lang="bqn">Primes ← {
𝕩≤2 ? ↕0 ; # No primes below 2
p ← 𝕊⌈√n←𝕩 # Initial primes by recursion
b ← 2≤↕n # Initial sieve: no 0 or 1
E ← {↕∘⌈⌾((𝕩×𝕩+⊢)⁼)n} # Multiples of 𝕩 under n, starting at 𝕩×𝕩
/ b E⊸{0¨⌾(𝕨⊸⊏)𝕩}´ p # Cross them out
}</syntaxhighlight>

{{out}}
<syntaxhighlight lang="bqn"> Primes 100
⟨ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ⟩
≠∘Primes¨ 10⋆↕7 # Number of primes below 1e0, 1e1, ... 1e6
⟨ 0 4 25 168 1229 9592 78498 ⟩</syntaxhighlight>
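
The recursive structure of <code>Primes</code> (obtain the base primes below ⌈√n⌉ by calling the same function, then cross out their multiples starting at the square) can be sketched in Python. The name <code>primes_below</code> is an assumption; like the BQN function, it returns the primes strictly below its argument.

```python
from math import isqrt

def primes_below(n):
    """Primes strictly below n, with base primes found by recursion on ceil(sqrt(n))."""
    if n <= 2:
        return []                              # no primes below 2
    base = primes_below(isqrt(n - 1) + 1)      # isqrt(n - 1) + 1 == ceil(sqrt(n)) for n >= 1
    flags = bytearray([1]) * n
    flags[0] = flags[1] = 0                    # initial sieve: no 0 or 1
    for p in base:
        for m in range(p * p, n, p):           # multiples of p under n, starting at p*p
            flags[m] = 0
    return [i for i, f in enumerate(flags) if f]

print(primes_below(100))
```

The recursion is shallow because each level works on the square root of the previous size.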


=={{header|Bracmat}}==
=={{header|Bracmat}}==
This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.
This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.
<lang bracmat>( ( eratosthenes
<syntaxhighlight lang="bracmat">( ( eratosthenes
= n j i
= n j i
. !arg:?n
. !arg:?n
Line 1,094: Line 2,323:
)
)
& eratosthenes$100
& eratosthenes$100
)</lang>
)</syntaxhighlight>
{{out}}
{{out}}
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
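
The array-free approach described above (record only the numbers known to be composite, then report whatever was never recorded) has a rough analogue in Python using a set. This is a sketch of the idea rather than of the Bracmat code, and the name <code>eratosthenes</code> is reused here only for illustration.

```python
def eratosthenes(n):
    """Primes <= n, tracking composites in a set instead of a preallocated array."""
    nonprime = set()                   # plays the role of the variables that get set
    i = 2
    while i * i <= n:
        if i not in nonprime:
            nonprime.update(range(i * i, n + 1, i))
        i += 1
    # anything never marked must be prime
    return [k for k in range(2, n + 1) if k not in nonprime]

print(eratosthenes(100))
```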


=={{header|C}}==
=={{header|C}}==
Plain sieve, without any optimizations:<lang c>#include <stdlib.h>
Plain sieve, without any optimizations:<syntaxhighlight lang="c">#include <stdlib.h>
#include <math.h>
#include <math.h>


Line 1,126: Line 2,355:
}
}
return sieve;
return sieve;
}</lang>Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.
}</syntaxhighlight>Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.


'''Another example:'''
'''Another example:'''
Line 1,133: Line 2,362:
Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples!
Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples!
To understand this better, look at the output of the following example.
To understand this better, look at the output of the following example.
To print this back, we look for ones in the array and only print those spots. <lang C>#include <stdio.h>
To print this back, we look for ones in the array and only print those spots. <syntaxhighlight lang="c">#include <stdio.h>
#include <malloc.h>
#include <malloc.h>
void sieve(int *, int);
void sieve(int *, int);
Line 1,140: Line 2,369:
{
{
int *array, n=10;
int *array, n=10;
array =(int *)malloc(sizeof(int));
array =(int *)malloc((n + 1) * sizeof(int));
sieve(array,n);
sieve(array,n);
return 0;
return 0;
Line 1,171: Line 2,400:
}
}
printf("\n\n");
printf("\n\n");
}</lang>{{out}}<lang Shell>i:2
}</syntaxhighlight>{{out}}<syntaxhighlight lang="shell">i:2
j:2
j:2
Before a[2*2]: 1
Before a[2*2]: 1
Line 1,195: Line 2,424:
i:9
i:9
i:10
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7, </lang>
Primes numbers from 1 to 10 are : 2, 3, 5, 7, </syntaxhighlight>
=={{header|C++}}==
<lang cpp>// yield all prime numbers less than limit.
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
std::vector<bool> is_prime(limit, true);
const int sqrt_limit = static_cast<int>(std::sqrt(limit));
for (int n = 2; n <= sqrt_limit; ++n)
if (is_prime[n]) {
yield(n);

for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)
//NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
is_prime[k] = false;
}

for (int n = sqrt_limit + 1; n < limit; ++n)
if (is_prime[n])
yield(n);
}</lang>

Full program:

{{works with|Boost}}<lang cpp>/**
$ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
*/
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>


int main(int argc, char *argv[])
{
using namespace std;
using namespace boost::lambda;

int limit = 10000;
if (argc == 2) {
stringstream ss(argv[--argc]);
ss >> limit;

if (limit < 1 or ss.fail()) {
cerr << "USAGE:\n sieve LIMIT\n\nwhere LIMIT in the range [1, "
<< numeric_limits<int>::max() << ")" << endl;
return 2;
}
}

// print primes less than 100
primesupto(100, cout << _1 << " ");
cout << endl;

// find number of primes less than limit and their sum
int count = 0;
uintmax_t sum = 0;
primesupto(limit, (var(sum) += _1, var(count) += 1));

cout << "limit sum pi(n)\n"
<< limit << " " << sum << " " << count << endl;
}</lang>


=={{header|C sharp|C#}}==
=={{header|C sharp|C#}}==
{{works with|C sharp|C#|2.0+}}
{{works with|C sharp|C#|2.0+}}
<lang csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Generic;
Line 1,308: Line 2,471:
}
}
}
}
}</lang>
}</syntaxhighlight>


===Unbounded===
===Richard Bird Sieve===

'''Richard Bird Sieve'''


{{trans|F#}}
{{trans|F#}}


To show that C# code can be written in a somewhat functional style, the following is an implementation of the Richard Bird sieve, which is given in Haskell in the Epilogue of [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf Melissa E. O'Neill's definitive article]:
To show that C# code can be written in a somewhat functional style, the following is an implementation of the Richard Bird sieve, which is given in Haskell in the Epilogue of [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf Melissa E. O'Neill's definitive article]:
<lang csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Generic;
Line 1,357: Line 2,518:
}
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}</lang>
}</syntaxhighlight>


'''Tree Folding Sieve'''
===Tree Folding Sieve===


{{trans|F#}}
{{trans|F#}}


The above code can easily be converted to "'''odds-only'''" and an infinite tree-like folding scheme with the following minor changes:
The above code can easily be converted to "'''odds-only'''" and an infinite tree-like folding scheme with the following minor changes:
<lang csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Generic;
Line 1,412: Line 2,573:
}
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}</lang>
}</syntaxhighlight>


The above code runs over ten times faster than the original Richard Bird algorithm.
The above code runs over ten times faster than the original Richard Bird algorithm.


'''Priority Queue Sieve'''
===Priority Queue Sieve===


{{trans|F#}}
{{trans|F#}}


First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at [http://rosettacode.org/wiki/Priority_queue#C.23 RosettaCode], with only the necessary methods duplicated here:
First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at [http://rosettacode.org/wiki/Priority_queue#C.23 RosettaCode], with only the necessary methods duplicated here:
<lang csharp>namespace PriorityQ {
<syntaxhighlight lang="csharp">namespace PriorityQ {
using KeyT = System.UInt32;
using KeyT = System.UInt32;
using System;
using System;
Line 1,478: Line 2,639:
public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
pq.rplcmin(k, v); return pq; }
pq.rplcmin(k, v); return pq; }
}</lang>
}</syntaxhighlight>


===Restricted Base Primes Queue===


The following code implements an improved version of the '''odds-only''' O'Neill algorithm. A base prime's stream of composite numbers is added to the queue only when the sieved number reaches the square of that base prime, which saves a huge amount of memory and considerable execution time (and means the internal prime numbers do not need as wide a numeric type); stream processing is also minimized by using fusion:
The following code implements an improved version of the '''odds-only''' O'Neill algorithm. A base prime's stream of composite numbers is added to the queue only when the sieved number reaches the square of that base prime, which saves a huge amount of memory and considerable execution time (and means the internal prime numbers do not need as wide a numeric type); stream processing is also minimized by using fusion:

<lang csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Generic;
Line 1,516: Line 2,681:
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}</lang>
}</syntaxhighlight>


The above code is at least about 2.5 times faster than the Tree Folding version.
The above code is at least about 2.5 times faster than the Tree Folding version.
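
The postponement idea (a base prime enters the queue only once the candidate reaches its square) can be sketched in Python with the standard <code>heapq</code> module. This is an assumption-laden illustration, not a translation of the C# class: the generator name <code>primes</code> and the tuple layout are chosen here, and base primes are supplied by a second, recursive instance of the same generator.

```python
import heapq
from itertools import islice

def primes():
    """Unbounded odds-only sieve; each heap entry is (next odd multiple, step)."""
    yield 2
    yield 3
    base = primes()                    # independent supply of base primes
    next(base)                         # discard 2, which never divides an odd candidate
    p = next(base)                     # first base prime, 3
    q = p * p                          # its square, where culling by p starts
    heap = []
    c = 5
    while True:
        if heap and heap[0][0] == c:
            # c is composite: advance every stream that currently points at it
            while heap[0][0] == c:
                m, step = heap[0]
                heapq.heapreplace(heap, (m + step, step))
        elif c == q:
            # c is the square of the current base prime: only now enqueue its stream
            heapq.heappush(heap, (q + 2 * p, 2 * p))
            p = next(base)
            q = p * p
        else:
            yield c
        c += 2

print(list(islice(primes(), 10)))
```

At any moment the heap holds only the primes up to the square root of the current candidate, which is where the memory saving comes from.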



'''Dictionary (Hash table) Sieve'''
===Dictionary (Hash table) Sieve===


The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:
The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

<lang csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Generic;
Line 1,556: Line 2,723:
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}</lang>
}</syntaxhighlight>


The above code runs in about three quarters of the time of the Priority Queue based version above for a range of a million primes, and the Priority Queue version will fall even further behind for increasing ranges, because the Dictionary provides O(1) access times as compared to the O(log n) access times of the Priority Queue; the only slight advantage of the PQ based version is at very small ranges, where the constant-factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".
The above code runs in about three quarters of the time of the Priority Queue based version above for a range of a million primes, and the Priority Queue version will fall even further behind for increasing ranges, because the Dictionary provides O(1) access times as compared to the O(log n) access times of the Priority Queue; the only slight advantage of the PQ based version is at very small ranges, where the constant-factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".
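
For readers who want to see the hash-table idea in isolation, here is a minimal Python sketch using a <code>dict</code>. It is deliberately simpler than the approach described above: every prime is entered at its square as soon as it is found, with no postponement, so it uses more memory. The name <code>primes</code> is an assumption made for illustration.

```python
from itertools import islice

def primes():
    """Unbounded odds-only generator; a dict maps each upcoming composite to its step."""
    yield 2
    upcoming = {}                      # odd composite -> 2 * (a prime that divides it)
    c = 3
    while True:
        step = upcoming.pop(c, None)
        if step is None:               # c was never scheduled, so it is prime
            yield c
            upcoming[c * c] = 2 * c
        else:                          # c is composite: move its prime to a free slot
            nxt = c + step
            while nxt in upcoming:
                nxt += step
            upcoming[nxt] = step
        c += 2

print(list(islice(primes(), 10)))
```

Each lookup and update is an expected O(1) hash operation, in contrast to the O(log n) cost of adjusting a heap.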


===Best performance: CPU-Cache-Optimized Segmented Sieve===
'''Page Segmented Array Sieve'''


All of the above unbounded versions are really just an intellectual exercise, since with very few extra lines of code beyond the fastest Dictionary based version, one can have a bit-packed page-segmented array based version as follows:
All of the above unbounded versions are really just an intellectual exercise, since with very few extra lines of code beyond the fastest Dictionary based version, one can have a bit-packed page-segmented array based version as follows:
<lang csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Generic;
Line 1,619: Line 2,786:
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}</lang>
}</syntaxhighlight>


The above code is about 25 times faster than the Dictionary version at computing roughly the first 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed, as compared to a naive "one huge memory array" algorithm, to using an array size that is the size of the CPU L1 or L2 caches and to using bit-packing to fit more number representations into this limited capacity; in this way RAM access times are reduced by a factor of about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is outweighed many times over by the reduction in total execution time.
The above code is about 25 times faster than the Dictionary version at computing roughly the first 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed, as compared to a naive "one huge memory array" algorithm, to using an array size that is the size of the CPU L1 or L2 caches and to using bit-packing to fit more number representations into this limited capacity; in this way RAM access times are reduced by a factor of about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is outweighed many times over by the reduction in total execution time.
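
The page-segmentation idea on its own (sieve the base primes once, then process the rest of the range in fixed-size pieces small enough to stay in cache) can be sketched in Python. The sketch makes several simplifying assumptions: it is bounded rather than unbounded, it uses one byte per number rather than bit-packing, it is not odds-only, and the name <code>segmented_primes</code> and the default <code>segment_size</code> are chosen here for illustration.

```python
from math import isqrt

def segmented_primes(limit, segment_size=32768):
    """Yield the primes <= limit, sieving one fixed-size segment at a time."""
    if limit < 2:
        return
    root = isqrt(limit)
    # plain sieve for the base primes <= sqrt(limit)
    small = bytearray([1]) * (root + 1)
    small[0:2] = b"\x00\x00"
    for p in range(2, isqrt(root) + 1):
        if small[p]:
            for m in range(p * p, root + 1, p):
                small[m] = 0
    base = [p for p in range(2, root + 1) if small[p]]
    yield from base
    # sieve the numbers above sqrt(limit) in segment-sized pieces
    low = root + 1
    while low <= limit:
        high = min(low + segment_size - 1, limit)
        seg = bytearray([1]) * (high - low + 1)
        for p in base:
            if p * p > high:
                break                  # larger base primes cannot cull in this segment
            start = max(p * p, (low + p - 1) // p * p)   # first multiple of p in range
            for m in range(start, high + 1, p):
                seg[m - low] = 0
        for i, flag in enumerate(seg):
            if flag:
                yield low + i
        low = high + 1

print(sum(1 for _ in segmented_primes(1000000)))
```

Only the base primes and one segment are held in memory at a time, so the working set stays roughly constant however large the range becomes.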


The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "rull-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of '''odds-only''', this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.
The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of '''odds-only''', this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.


In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).
In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).
Line 1,630: Line 2,797:


All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):
All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):
<lang csharp> static void Main(string[] args) {
<syntaxhighlight lang="csharp"> static void Main(string[] args) {
Console.WriteLine(PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
}</lang>
}</syntaxhighlight>


To produce the following output for all tested versions (although some are considerably faster than others):
To produce the following output for all tested versions (although some are considerably faster than others):
{{output}}
{{output}}
<pre>15485863</pre>
<pre>15485863</pre>

=={{header|C++}}==
===Standard Library===

This implementation follows the standard library pattern of [http://en.cppreference.com/w/cpp/algorithm/iota std::iota]. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

<syntaxhighlight lang="cpp">#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
#include <cstdint> // uint8_t

// Fills the range [start, end) with 1 if the integer corresponding to the index is composite and 0 otherwise.
// requires: I is RandomAccessIterator
template<typename I>
void mark_composites(I start, I end)
{
std::fill(start, end, 0);

for (auto it = start + 1; it != end; ++it)
{
if (*it == 0)
{
auto prime = std::distance(start, it) + 1;
// mark all multiples of this prime number as composite.
auto multiple_it = it;
while (std::distance(multiple_it, end) > prime)
{
std::advance(multiple_it, prime);
*multiple_it = 1;
}
}
}
}

// Fills "out" with the prime numbers in the range 2...N where N = distance(start, end).
// requires: I is a RandomAccessIterator
// O is an OutputIterator
template <typename I, typename O>
O sieve_primes(I start, I end, O out)
{
mark_composites(start, end);
for (auto it = start + 1; it != end; ++it)
{
if (*it == 0)
{
*out = std::distance(start, it) + 1;
++out;
}
}
return out;
}

int main()
{
std::vector<uint8_t> is_composite(1000);
sieve_primes(is_composite.begin(), is_composite.end(), std::ostream_iterator<int>(std::cout, " "));

// Alternative to store in a vector:
// std::vector<int> primes;
// sieve_primes(is_composite.begin(), is_composite.end(), std::back_inserter(primes));
}
</syntaxhighlight>

=== Boost ===

<syntaxhighlight lang="cpp">// yield all prime numbers less than limit.
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
std::vector<bool> is_prime(limit, true);
const int sqrt_limit = static_cast<int>(std::sqrt(limit));
for (int n = 2; n <= sqrt_limit; ++n)
if (is_prime[n]) {
yield(n);

for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)
//NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
is_prime[k] = false;
}

for (int n = sqrt_limit + 1; n < limit; ++n)
if (is_prime[n])
yield(n);
}</syntaxhighlight>

Full program:

{{works with|Boost}}<syntaxhighlight lang="cpp">/**
$ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
*/
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>

int main(int argc, char *argv[])
{
using namespace std;
using namespace boost::lambda;

int limit = 10000;
if (argc == 2) {
stringstream ss(argv[--argc]);
ss >> limit;

if (limit < 1 or ss.fail()) {
cerr << "USAGE:\n sieve LIMIT\n\nwhere LIMIT in the range [1, "
<< numeric_limits<int>::max() << ")" << endl;
return 2;
}
}

// print primes less than 100
primesupto(100, cout << _1 << " ");
cout << endl;

// find number of primes less than limit and their sum
int count = 0;
uintmax_t sum = 0;
primesupto(limit, (var(sum) += _1, var(count) += 1));

cout << "limit sum pi(n)\n"
<< limit << " " << sum << " " << count << endl;
}</syntaxhighlight>
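
Boost.Lambda is not required to drive <code>primesupto</code>: any callable that accepts an <code>int</code> will do. The following is a minimal sketch, assuming a C++11 compiler, that repeats the function so it compiles on its own and replaces the Boost.Lambda expressions with an ordinary lambda. The helper name <code>count_and_sum</code> and the early return for small limits are additions for illustration, not part of the original code:

<syntaxhighlight lang="cpp">#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Same algorithm as primesupto above: call yield(p) for every prime p < limit.
template <class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
    if (limit < 3) return; // added guard: there are no primes below 2
    std::vector<bool> is_prime(limit, true);
    const int sqrt_limit = static_cast<int>(std::sqrt(static_cast<double>(limit)));
    for (int n = 2; n <= sqrt_limit; ++n)
        if (is_prime[n]) {
            yield(n);
            // unsigned avoids overflow in k += n for limit near INT_MAX
            for (unsigned k = n * n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)
                is_prime[k] = false;
        }
    for (int n = sqrt_limit + 1; n < limit; ++n)
        if (is_prime[n])
            yield(n);
}

// Hypothetical helper: the number and the sum of the primes below limit.
std::pair<int, std::uintmax_t> count_and_sum(int limit)
{
    int count = 0;
    std::uintmax_t sum = 0;
    // the lambda captures count and sum by reference, as var(...) does in Boost.Lambda
    primesupto(limit, [&](int p) { ++count; sum += p; });
    return std::make_pair(count, sum);
}</syntaxhighlight>

With this sketch, <code>count_and_sum(10000)</code> yields 1229 primes whose sum is 5736396, and printing is a matter of passing a lambda such as <code>[](int p) { std::cout << p << ' '; }</code>.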


=={{header|Chapel}}==
{{incorrect|Chapel|Doesn't compile since at least Chapel version 1.20 to 1.24.1.}}
This solution uses nested iterators to create new wheels at run time:
<syntaxhighlight lang="chapel">// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {

// candidate is a multiple of this prime
if composite == candidate {
// remember size of last composite
last = composite;

yield candidate;
}
}</syntaxhighlight>The topmost sieve needs to be started with 2 (the smallest prime):
<syntaxhighlight lang="chapel">config const N = 30;
for p in sieve(2) {
if p > N {
writeln();
break;
}
write(" ", p);
}</syntaxhighlight>

===Alternate Conventional Bit-Packed Implementation===

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:

Compile with the `--fast` compiler command line option.

<syntaxhighlight lang="chapel">use Time;
use BitOps;

type Prime = uint(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
write("The first 25 primes are: ");
for p in primes(100) do write(p, " "); writeln();
var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is: ", count, ".");
var timer: Timer;
timer.start();

count = 0;
for p in primes(limit) do count += 1;

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
const szlmt = n / 8;
var cmpsts: [0 .. szlmt] uint(8); // byte array, size rounded up to whole bytes

for bp in 2 .. n {
if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
const s0 = bp * bp;
if s0 > n then break;
for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
}
}

for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;

}</syntaxhighlight>

{{out}}

<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is: 78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.</pre>

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

===Alternate Odds-Only Bit-Packed Implementation===

<syntaxhighlight lang="chapel">use Time;
use BitOps;

type Prime = int(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
write("The first 25 primes are: ");
for p in primes(100) do write(p, " "); writeln();
var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is: ", count, ".");
var timer: Timer;
timer.start();

count = 0;
for p in primes(limit) do count += 1;

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
const ndxlmt = (n - 3) / 2;
const szlmt = ndxlmt / 8;
var cmpsts: [0 .. szlmt] uint(8); // byte array, size rounded up to whole bytes

for i in 0 .. ndxlmt { // never gets to the end!
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
const bp = i + i + 3;
const s0 = (bp * bp - 3) / 2;
if s0 > ndxlmt then break;
for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
}
}

yield 2;
for i in 0 .. ndxlmt do
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;

}</syntaxhighlight>

{{out}}

<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is: 78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.</pre>

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).
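
The index arithmetic behind the odds-only version can be checked in isolation. The following is a minimal sketch in C++ (rather than Chapel, so that it compiles on its own). The function name <code>count_primes_odds_only</code> is hypothetical, and the sketch assumes the same mapping as the code above, in which bit <code>i</code> stands for the odd number <code>2 * i + 3</code>:

<syntaxhighlight lang="cpp">#include <cstddef>
#include <cstdint>
#include <vector>

// Odds-only bit-packed sieve: bit i represents the odd number 2 * i + 3.
// Returns the number of primes <= n.
std::uint64_t count_primes_odds_only(std::uint64_t n)
{
    if (n < 2) return 0;
    if (n < 3) return 1;                       // only the prime 2
    const std::uint64_t ndxlmt = (n - 3) / 2;  // index of the largest odd number <= n
    std::vector<std::uint8_t> cmpsts(ndxlmt / 8 + 1, 0); // one bit per odd number

    for (std::uint64_t i = 0; ; ++i) {
        const std::uint64_t bp = i + i + 3;          // candidate base prime
        const std::uint64_t s0 = (bp * bp - 3) / 2;  // index of bp * bp
        if (s0 > ndxlmt) break;                      // no culling left to do
        if (cmpsts[i >> 3] & (1u << (i & 7))) continue; // bp is composite
        // stepping the index by bp steps the number by 2 * bp, so evens are skipped
        for (std::uint64_t s = s0; s <= ndxlmt; s += bp)
            cmpsts[s >> 3] |= static_cast<std::uint8_t>(1u << (s & 7));
    }

    std::uint64_t count = 1;                   // the prime 2
    for (std::uint64_t i = 0; i <= ndxlmt; ++i)
        if ((cmpsts[i >> 3] & (1u << (i & 7))) == 0) ++count;
    return count;
}</syntaxhighlight>

For a limit of one billion the array in this sketch holds 62,500,000 bytes, about half of what one bit per number would need.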

===Hash Table Based Odds-Only Version===

{{trans|Python}} [https://rosettacode.org/wiki/Sieve_of_Eratosthenes#Infinite_generator_with_a_faster_algorithm code link]
{{works with|Chapel|1.25.1}}

<syntaxhighlight lang="chapel">use Time;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: owned Primes?;
var keys: domain(Prime); var dict: [keys] Prime;
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.keys += key; this.dict[key] = adv;
}
else if this.keys.contains(this.n) { // found a composite; advance...
const adv = this.dict[this.n]; this.keys.remove(this.n);
var nkey = this.n + adv;
while this.keys.contains(nkey) do nkey += adv;
this.keys += nkey; this.dict[nkey] = adv;
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();
var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}</syntaxhighlight>

{{out}}

<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 5195.41 milliseconds.</pre>

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, this is much slower than the array-based versions, but much faster than the same code run with previous Chapel versions, as the hashing has been greatly improved.
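
For comparison, the incremental algorithm used above can be sketched in C++ with <code>std::unordered_map</code> standing in for the Chapel associative array. This is only an illustrative transliteration under that assumption; the names <code>OddPrimes</code> and <code>count_primes_upto</code> are hypothetical and no timing claim is made for it:

<syntaxhighlight lang="cpp">#include <cstdint>
#include <memory>
#include <unordered_map>

// Unbounded generator of the odd primes 3, 5, 7, ...
class OddPrimes {
    std::uint64_t n = 0, obp = 0, q = 0;  // candidate, current base prime, its square
    std::unique_ptr<OddPrimes> bps;       // secondary feed of odd base primes
    std::unordered_map<std::uint64_t, std::uint64_t> dict; // next composite -> step
public:
    std::uint64_t next()
    {
        if (n < 5) { n = 5; return 3; }
        if (!bps) {                       // created lazily on first real use
            bps.reset(new OddPrimes());
            obp = bps->next(); q = obp * obp;
        }
        for (;; n += 2) {
            if (n >= q) {                 // reached the square of the base prime
                const std::uint64_t adv = 2 * obp;
                dict[q + adv] = adv;      // first composite after the square
                obp = bps->next(); q = obp * obp;
            } else {
                auto it = dict.find(n);
                if (it == dict.end()) {   // not a known composite: prime
                    const std::uint64_t p = n; n += 2; return p;
                }
                const std::uint64_t adv = it->second;
                dict.erase(it);           // move this entry to its next free multiple
                std::uint64_t nkey = n + adv;
                while (dict.count(nkey)) nkey += adv;
                dict[nkey] = adv;
            }
        }
    }
};

// Counts the primes <= limit, including 2.
std::uint64_t count_primes_upto(std::uint64_t limit)
{
    if (limit < 2) return 0;
    std::uint64_t count = 1;              // the prime 2
    OddPrimes odds;
    while (odds.next() <= limit) ++count;
    return count;
}</syntaxhighlight>

As in the Chapel code, the table only ever holds one entry per base prime up to the square root of the current candidate, which is why its memory use stays small.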

As an alternative to the built-in associative array, the following code implements a specialized BasePrimesTable that works similarly to Python's associative arrays: no hashing function is applied, as the hash values for integers are just the integers themselves, and hash table collisions are handled in a way similar to the Python method:

{{works with|Chapel|1.25.1}}
Compile with the `--fast` compiler command line option

<syntaxhighlight lang="chapel">use Time;
config const limit = 100000000;
type Prime = uint(32);

record BasePrimesTable { // specialized for the use here...
record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
var cpcty: int = 8; var sz: int = 0;
var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
proc grow() {
const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);
}
proc find(k: Prime): int { // internal get location of value or -1
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == k then return skey;
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
return -1; // not found!
}
proc contains(k: Prime): bool { return find(k) >= 0; }
proc add(k, v: Prime) { // if exists then replaces else new entry
const fndi = find(k);
if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
else {
sz += 1; if 2 * sz > cpcty then grow();
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == 0 {
bpa[skey] = new BasePrimeEntry(k, v); return; }
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
}
}
proc remove(k: Prime) { // if doesn't exist does nothing
const fndi = find(k);
if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
}
proc this(k: Prime): Prime { // returns value or 0 if not found
const fndi = find(k);
if fndi < 0 then return 0; else return bpa[fndi].val;
}
}

class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: shared Primes?; var dict = new BasePrimesTable();
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.dict.add(key, adv);
}
else if this.dict.contains(this.n) { // found a composite; advance...
const adv = this.dict[this.n]; this.dict.remove(this.n);
var nkey = this.n + adv;
while this.dict.contains(nkey) do nkey += adv;
this.dict.add(nkey, adv);
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();
var timer: Timer;
timer.start();
count = 0;
for p in new Primes() { if p > limit then break; count += 1; }
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}</syntaxhighlight>

{{out}}

<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 2351.79 milliseconds.</pre>

This last code is quite usable up to a hundred million (as here) or even a billion in a little over ten times the time, but is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only for the hash table for the base primes of about eight Kilobytes for sieving to a billion compared to over 60 Megabytes for the monolithic odds-only simple version).
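
The collision handling in <code>find</code> and <code>add</code> above follows a fixed probe sequence, which can be examined on its own. The following C++ sketch mirrors that recurrence; the function name <code>probe_sequence</code> is hypothetical, and the capacity is assumed to be a power of two as in the table above:

<syntaxhighlight lang="cpp">#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the first `steps` slot indices probed for `key` in a table of
// `capacity` slots; capacity is assumed to be a power of two.
std::vector<std::size_t> probe_sequence(std::uint32_t key, std::size_t capacity,
                                        std::size_t steps)
{
    const std::size_t mask = capacity - 1;
    std::size_t slot = key & mask;     // the key is its own hash value
    std::size_t perturb = key;         // feeds the high bits of the key into the probing
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < steps; ++i) {
        slots.push_back(slot);
        perturb >>= 5;
        slot = (5 * slot + 1 + perturb) & mask;
    }
    return slots;
}</syntaxhighlight>

For the key 9 in a table of eight slots this gives the slots 1, 6, 7, 4, 5, 2, 3, 0: once <code>perturb</code> has dropped to zero, the recurrence visits every slot exactly once before repeating.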

Chapel version 1.25.1 provides yet another option as to the form of the code, although the algorithm is the same: one can now override the hashing function for Chapel records so that they can be used as the key type for hash maps, as follows:

{{works with|Chapel|1.25.1}}
Compile with the `--fast` compiler command line option

<syntaxhighlight lang="chapel">use Time;

use Map;
config const limit = 100000000;
type Prime = uint(32);
class Primes { // needed so we can use next to get successive values
record PrimeR { var prime: Prime; proc hash() { return prime; } }
var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
var bps: owned Primes?;
var dict = new map(PrimeR, Prime);
proc next(): Prime { // odd primes!
if this.n.prime < 5 { this.n.prime = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n.prime >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.dict.add(key, adv);
}
else if this.dict.contains(this.n) { // found a composite; advance...
const adv = this.dict.getValue(this.n); this.dict.remove(this.n);
var nkey = new PrimeR(this.n.prime + adv);
while this.dict.contains(nkey) do nkey.prime += adv;
this.dict.add(nkey, adv);
}
else { const p = this.n.prime;
this.n.prime += 2; return p; }
this.n.prime += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}
proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();
var timer: Timer;
timer.start();
count = 0;
for p in new Primes() { if p > limit then break; count += 1; }
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}</syntaxhighlight>

This runs in about the same time as the previous code, but doesn't require a custom adaptation of the associative array, so the standard library Map can be used.

===Functional Tree Folding Odds-Only Version===

Chapel isn't really a very functional language. It has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced iterations, and it has first class functions (FCF's) and lambdas (anonymous functions); however, these last can't be closures (that is, they can't capture variable bindings from external scope(s)), nor can the workaround of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). Nevertheless, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

{{trans|Nim}} [https://rosettacode.org/wiki/Sieve_of_Eratosthenes#Nim_Unbounded_Versions code link]
{{works with|Chapel|1.22|- compile with the --fast compiler command line flag for full optimization}}
<syntaxhighlight lang="chapel">use Time;

type Prime = uint(32);

config const limit = 1000000: Prime;

// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
var head: Prime;
proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}

class PrimeMultiples: PrimeCIS {
var adv: Prime;
override proc next(): shared PrimeCIS {
return new shared PrimeMultiples(
this.head + this.adv, this.adv): shared PrimeCIS; }
}

class PrimeCISCIS { // base stream of prime streams; never used directly...
var head: shared PrimeCIS;
proc init() { this.head = new shared PrimeCIS(); }
proc next(): shared PrimeCISCIS {
return new shared PrimeCISCIS(); }
}

class AllMultiples: PrimeCISCIS {
var bps: shared PrimeCIS;
proc init(bsprms: shared PrimeCIS) {
const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;
this.bps = bsprms;
}
override proc next(): shared PrimeCISCIS {
return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}

class Union: PrimeCIS {
var feeda, feedb: shared PrimeCIS;
proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
const ahd = fda.head; const bhd = fdb.head;
this.head = if ahd < bhd then ahd else bhd;
this.feeda = fda; this.feedb = fdb;
}
override proc next(): shared PrimeCIS {
const ahd = this.feeda.head; const bhd = this.feedb.head;
if ahd < bhd then
return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
if ahd > bhd then
return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
return new shared Union(this.feeda.next(),
this.feedb.next()): shared PrimeCIS;
}
}

class Pairs: PrimeCISCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
const fs = fd.head; const sss = fd.next(); const ss = sss.head;
this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
}
override proc next(): shared PrimeCISCIS {
return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}

class Composites: PrimeCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
this.head = fd.head.head; this.feed = fd;
}
override proc next(): shared PrimeCIS {
const fs = this.feed.head.next();
const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
const ncs = new shared Composites(prs): shared PrimeCIS;
return new shared Union(fs, ncs): shared PrimeCIS;
}
}

class OddPrimesFrom: PrimeCIS {
var cmpsts: shared PrimeCIS;
override proc next(): shared PrimeCIS {
var n = head + 2; var cs = this.cmpsts;
while true {
if n < cs.head then
return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
n += 2; cs = cs.next();
}
return this.cmpsts; // never used; keeps compiler happy!
}
}

class OddPrimes: PrimeCIS {
proc init() { this.head = 3; }
override proc next(): shared PrimeCIS {
const bps = new shared OddPrimes(): shared PrimeCIS;
const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
const cmpsts = new shared Composites(mlts): shared PrimeCIS;
return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
}
}

iter primes(): Prime {
yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
while true { yield cur.head; cur = cur.next(); }
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");</syntaxhighlight>
{{out}}
<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.</pre>

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits), although doing so is slower than the Hash Table version above and also takes more memory, as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has worse asymptotic performance, with an extra `log(n)` factor where `n` is the sieving range. This can be shown by running the above program with the `--limit=10000000` run time command line option, which takes about 4.5 seconds to count the primes up to ten million: the range is ten times larger, but whereas the Hash Table version would be expected to take only about 10 per cent more than ten times as long, this version performs about 20 per cent more operations on top of the factor of ten. Other than for the extra operations, this version is generally slower due to the time taken by the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.

===A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version===

To take advantage of the features that make Chapel shine, we need to use it to do some parallel computations. To do that efficiently for the Sieve of Eratosthenes, we need to divide the work into page segments so that each largish segment can be assigned to a separate thread/task; this also improves the speed due to better cache associativity, with most memory accesses being to values that are already in the cache(s). Once the work has been divided, Chapel offers lots of means to implement the parallelism, but to be a true Sieve of Eratosthenes, we need to be able to output the results in order; as many of the convenience mechanisms don't do that, the best/simplest option is likely task parallelism with the output results assigned to a rotary indexed array containing the `sync` results. It turns out that, although the Chapel compiler can sometimes optimize the code so that the overhead of creating tasks is not onerous, in this case, where the actual tasks are somewhat complex, the compiler can't recognize that automatically generated thread pool(s) are required, so we need to generate the thread pool(s) manually. The code that implements the multi-threading of page segments using thread pools is as follows:
{{works with|Chapel|1.24.1|- compile with the --fast compiler command line flag for full optimization}}
<syntaxhighlight lang="chapel">use Time; use BitOps; use CPtr;

type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);

config const LIMIT = 1000000000: Prime;

config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
"L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
"L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");

const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;

// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
if L1 == 32 then 23000 else 43390)
- WHLPRMS.size + 1; // extra one for marker!

// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;

var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
const hi = WHLPRMS.size - 1;
const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
// contains wheel pattern primes skipping the small wheel prime (2)!...
// never advances past the first base prime arr as it ends with a huge!...
for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
else 0x7FFFFFFF): BasePrime; // last huge!
var whlbpas = new shared BasePrimeArrs(whlhd);
var whlstrts = new StrtsArr({rng});
wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
// eliminate wheel primes from the WHLPTRN buffer!...
wp.cmpsts[0] = 0xFF: uint(8);
}

// the following two must be classes for compatibility with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
proc init() {}
proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
proc deinit() { dom = { 0 .. -1 }; }

proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
const sz = cmpsts.size; const mvsz = min(sz, 16384);
var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
for i in 0 .. sz - 1 by 16384 {
c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);
mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
}
}

proc count(btlmt: int) { // count by 64 bits using CPU popcount...
const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
var i = 0; var cnt = (lstwrd * 64 + 64): int;
while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;
}

// most of the time is spent doing culling operations as follows!...
proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
ref strts: StrtsArr) {
const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
const strtssz = strts.strtsarr.size;
// C pointer for speed magic!...
const cmpstsp = c_ptrTo(cmpsts[0]);
const strtsp = c_ptrTo(strts.strtsarr);

// first fill the strts array with pre-calculated start addresses...
var i = 0; for bp in bpas {
// calculate page start address for the given base prime...
const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
if s0 > ndxlmt then {
if i < strtssz then strtsp[i] = -1: int(32); break; }
var s = 0: int;
if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
else { const r = (lwi: Prime - s0) % bbp;
if r == 0 then s = 0: int; else s = (bbp - r): int; };
if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
if i < strtssz { strtsp[i] = -1; i = strtssz; }
// cull the full buffer for this given base prime as usual...
// only works up to limit of int(32)**2!!!!!!!!
while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
}

// cull the smaller sub buffers according to the strts array...
for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
i = 0; for bp in bpas { // bp never bigger than uint(32)!
// cull the sub buffer for this given base prime...
var s = strtsp[i]: int; if s < 0 then break;
var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
const slmt = s + bpi * 8 - 1;
while s <= slmt {
const bmi = s & 7; const msk = bitmsk[bmi];
var c = s >> 3; const clmt = sbtlmt >> 3;
while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
}
strtsp[i] = nxt: int(32); i += 1;
}
else { while s <= sbtlmt { // standard cull loop...
cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
strtsp[i] = s: int(32); i += 1; }
}
}
}
}

// a generic class that contains a page result generating function;
// allows manual iteration through the use of the next() method;
// multi-threaded through the use of a thread pool...
class PagedResults {
const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
var lwi: PrimeNdx; var bsbtsz: int;
var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
var qi: int = 0;
var wrkq$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
var rsltsq$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;

proc init(cvclsr, li: PrimeNdx, bsz: int) {
cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }

proc deinit() { // kill the thread pool when out of scope...
if bpas == nil then return; // no thread pool!
for i in wrkq$.domain {
wrkq$[i].writeEF(-1); while true { const r = rsltsq$[i].readFE();
if r == nil then break; }
}
}
proc next(): cnvrtrclsr(lwi, sbs(0)).type {
proc dowrk(ri: int) { // used internally!...
while true {
const li = wrkq$[ri].readFE(); // following to kill thread!
if li < 0 { rsltsq$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
sbs[ri].fill(li);
sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
rsltsq$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
}
}
if this.bpas == nil { // init on first use; avoids data race!
this.bpas = new BasePrimeArrs();
if this.bsbtsz < CPUL1CACHE {
this.sbs = new SieveBuffer(bsbtsz);
this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
}
else {
this.sbs = new SieveBuffer(CPUL2CACHE);
this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
}
// start threadpool and give it initial work...
for i in rsltsq$.domain {
begin with (const in i) dowrk(i);
this.wrkq$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
}
}
const rslt = this.rsltsq$[qi].readFE();
this.wrkq$[qi].writeEF(this.lwi);
this.lwi += this.sbs[qi].cmpsts.size * 8;
this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
return rslt;
}
iter these() { while lwi >= 0 do yield next(); }
}

// the sieve buffer to base prime array converter closure...
record SB2BPArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
const bsprm = (lwi + lwi + 3): BasePrime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
i += 1; }
return arr;
}
}

// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
var head: shared BasePrimeArr;
var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var lock$: sync bool = true;
var feed: shared PagedResults(SB2BPArr) =
new shared PagedResults(new SB2BPArr(), 65536, 65536);

proc init() { // make our own first array to break data race!
var sb = new SieveBuffer(256); sb.fill(0);
const sb2 = new SB2BPArr();
head = sb2(0, sb): shared BasePrimeArr;
this.complete(); // fake base primes!
sb = new SieveBuffer(65536); sb.fill(0);
// use (completed) self as source of base primes!
var strts = new StrtsArr({ 0 .. 256 });
sb.cull(0, 65536, this, strts);
// replace head with new larger version culled using fake base primes!...
head = sb2(0, sb): shared BasePrimeArr;
}

// for initializing for use by the fillWHLPTRN proc...
proc init(hd: shared BasePrimeArr) {
head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
}

// for initializing lazily extended list as required...
proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }

proc next(): shared BasePrimeArrs {
if this.tail == nil { // in case other thread slipped through!
if this.lock$.readFE() && this.tail == nil { // empty sync -> block others!
const nhd = this.feed.next(): shared BasePrimeArr;
this.tail = new shared BasePrimeArrs(nhd , this.feed);
}
this.lock$.writeEF(false); // fill the sync so other threads can do nothing!
}
return this.tail: shared BasePrimeArrs; // necessary cast!
}
iter these(): BasePrime {
for bp in head.bparr do yield bp; var cur = next();
while true {
for bp in cur.head.bparr do yield bp; cur = cur.next(); }
}
}

record SB2PrmArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
const bsprm = (lwi + lwi + 3): Prime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
i += 1; }
return arr;
}
}

iter primes(): Prime {
for p in WHLPRMS do yield p: Prime;
for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
for p in pa!.prmarr do yield p;
}

// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }

// a class that emulates a closure and a return value...
record SB2Cnt {
const nxtlmt: PrimeNdx;
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
return new shared CntNxt(sb.count(btlmt), lstndx);
}
}

// count primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
for p in WHLPRMS { if p > lmt then break; count += 1; }
if lmt < FRSTSVPRM then return count;
for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
}
return count;
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");

write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");
writeln("using ", NUMTHRDS, " threads.");

var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();

write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");</syntaxhighlight>
{{out}}
<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.</pre>

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does implement some functional concepts, as in the memoized lazy list of base prime arrays. Since this is used only at the page level, its slowish performance doesn't impact the overall execution time much, and the code is more elegant for using this concept, as new pages of base primes are computed only as they are required for an increasing range.

One of the trickiest bits of having a thread pool is stopping and de-initializing it when it goes out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized to some extent in two ways. First, "loop unpeeling" is used for the smaller base primes, which simplifies the loops down to simple masking by a constant, with eight separate loops for the repeating pattern over bytes. Second, culling is done by sub-buffers of the CPU L1 cache size within an outer sieve buffer of the CPU L2 cache size, which makes the work-sized chunks per task larger, giving less task context switching overhead and less time lost to the culling start address calculations per base prime (which need some integer division, which is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits, or about 1e12 for the one Megabit CPU L2 cache size that many mid-range Intel CPUs currently have when used for multi-threading (half of the actual size for Hyper-Threaded (HT) threads, since each pair of HT threads on a core shares both the L1 and the L2 caches).
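
As a rough check of the range quoted above, assuming a 128 Kilobyte L2 sieve buffer (the size shown in the sample output) and taking "one Megabit" to mean 2<sup>20</sup> bits:

<math>128 \times 1024 \times 8 = 2^{20}\ \text{bits}, \qquad \left(2^{20}\right)^2 = 2^{40} \approx 1.1 \times 10^{12}</math>

which agrees with the 1e12 figure to within about 10%.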

Although this code can be used for much higher sieving ranges, this is not recommended as it has not yet been tuned for better efficiency above 1e12. There are no checks limiting the user to this range but, as well as the decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows; these will only occur for huge sieving ranges taking days -> weeks -> months -> years to execute on common desktop CPUs.

A further optimization is to create a pre-culled `WHLPTRN` `SieveBuffer` in which the odd primes (since we cull odds-only) 3, 5, 7, 11, 13, and 17 have already been culled, and to use it to pre-fill the page segment buffers so that no culling by these base primes is required. This reduces the number of culling operations by about 45%, but the performance gain is only about 34.5%, as it changes the ratio of (fast) smaller base primes to (slower) larger ones.

All of the improvements to this point give the performance shown in the displayed output for the above program. Using command line arguments of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion, 1e11; this computer has caches of those sizes and no Hyper-Threading), it can count the primes to 1e11 in about 17.5 seconds on the above mentioned CPU. It will be over two times faster than this on a more modern desktop CPU such as the Intel Core i7-9700K, which has twice as many effective cores, a higher CPU clock rate, and is about 10% to 15% faster due to a more modern CPU architecture that is three generations newer. Of course, a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again, except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core count CPUs will only give these speed gains for large sieving ranges such as 1e11 and above, since otherwise there aren't enough work chunks to go around for all the available threads!

Incredibly, even run single threaded (argument of `--NUMTHRDS=1`), this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion, and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`), with both using the same CPU L1 cache buffer size of 16 Kilobytes. This implementation does not yet have the level of wheel optimization of the Sieve of Atkin, as it has only the limited wheel optimization of odds-only plus the use of the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges and approaching the speed of Kim Walisch's "primesieve". However, as Chapel does not have primitive element pointers and pointer operations, some of the extreme loop unrolling optimizations that Kim Walisch's "primesieve" uses are not available, which means that this code can never quite reach the speed of "primesieve", falling short by about 20% to 30%.

The above code is a fairly formidable benchmark, which I have also written in Fortran, as it is likely the major computer language that is comparable. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expressing this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-ons or just being the ability to call through to OpenMP/MPI.

That said, it also has the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but when so much about the forms of expression has been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes are now no longer breaking changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same. For this particular task they are about the same if one implements the same algorithmic optimizations, except that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel, as Fortran has a limited form of pointers, although not the full set of pointer operators that C/C++-like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compiler and the availability of (limited) pointer operations.

The primary additional optimization available to the Chapel code is the addition of Maximum Wheel-Factorization as per [https://stackoverflow.com/a/57108107/549617 my StackOverflow JavaScript Tutorial answer], with the other major improvement being to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.


=={{header|Clojure}}==
=={{header|Clojure}}==
<syntaxhighlight lang="clojure">(defn primes< [n]
''primes<'' is a functional interpretation of the Sieve of Eratosthenes. It merely removes the set of composite numbers from the set of odd numbers (wheel of 2) leaving behind only prime numbers. It uses a transducer internally with "into #{}".
(remove (set (mapcat #(range (* % %) n %)
<lang clojure>
(range 2 (Math/sqrt n))))
(defn primes< [n]
(range 2 n)))</syntaxhighlight>
(if (<= n 2)
()
(remove (into #{}
(mapcat #(range (* % %) n %))
(range 3 (Math/sqrt n) 2))
(cons 2 (range 3 n 2)))))
</lang>


The above is '''not strictly a Sieve of Eratosthenes''' as the composite culling ranges (in the ''mapcat'') include all of the multiples of all of the numbers and not just the multiples of primes. When tested with <code clojure>(println (time (count (primes< 1000000))))</code>, it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.
Calculates primes up to and including ''n'' using a mutable boolean array but otherwise entirely functional code.

<lang clojure>
It may be written using ''into #{}'' to run slightly faster, since the ''set'' function is concerned with keeping only distinct elements whereas ''into #{}'' only does the conjunction (and even that does not do much, as it conjoins onto an empty collection); the code is as follows:
(defn primes-to

<syntaxhighlight lang="clojure">(defn primes< [n]
(remove (into #{}
(mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))</syntaxhighlight>

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses the ''into #{}'' transducer but has been slightly wheel-factorized to sieve odds-only:
<syntaxhighlight lang="clojure">(defn primes< [n]
(if (<= n 2) ()
(cons 2 (remove (into #{}
(mapcat #(range (* % %) n %)
(range 3 (Math/sqrt n) 2)))
(range 3 n 2)))))</syntaxhighlight>

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.
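
For comparison, a minimal sketch of a variant that does sieve only by the base primes, which it obtains by calling itself recursively on the square root of the range. The name ''primes-bp<'' is hypothetical and this is an illustration only, not one of the original entries:
<syntaxhighlight lang="clojure">(defn primes-bp< [n]
  (if (<= n 2) ()
    ;; odd base primes p with p*p < n, found by sieving recursively to ceil(sqrt n);
    ;; "rest" drops the 2, which is never needed for odds-only culling
    (let [bps (rest (primes-bp< (long (Math/ceil (Math/sqrt n)))))]
      (cons 2 (remove (into #{}
                            ;; odd multiples only: start at p*p and step by 2*p
                            (mapcat #(range (* % %) n (* 2 %)))
                            bps)
                      (range 3 n 2))))))</syntaxhighlight>

Here ''(primes-bp< 100)'' gives the 25 primes below 100. The composites set is still built in full before any output, so this sketch shares the memory cost of the versions above.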

The following code calculates primes up to and including ''n'' using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:
<syntaxhighlight lang="clojure">(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
[n]
Line 1,698: Line 3,853:
(do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
(do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
(range 2 (inc root)))))
(range 2 (inc root)))))
(filter #(not (aget cmpsts %)) (range 2 (inc n))))))
(filter #(not (aget cmpsts %)) (range 2 (inc n))))))</syntaxhighlight>
</lang>


'''Alternative implementation using Clojure's side-effect oriented list comprehension.'''
'''Alternative implementation using Clojure's side-effect oriented list comprehension.'''


<lang clojure>
<syntaxhighlight lang="clojure"> (defn primes-to
(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
"Returns a lazy sequence of prime numbers less than lim"
[lim]
[lim]
Line 1,714: Line 3,867:
(doseq [j (range (* i i) lim i)]
(doseq [j (range (* i i) lim i)]
(aset refs j false)))
(aset refs j false)))
(filter #(aget refs %) (range 2 lim)))))
(filter #(aget refs %) (range 2 lim)))))</syntaxhighlight>
</lang>


'''Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.'''
'''Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.'''
<lang clojure>
<syntaxhighlight lang="clojure">(defn primes-to
(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
"Returns a lazy sequence of prime numbers less than lim"
[lim]
[lim]
Line 1,729: Line 3,880:
(doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
(doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
(aset refs j false)))
(aset refs j false)))
(cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))
(cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))</syntaxhighlight>

</lang>
This implemantation is about twice fast than previous one and use only half memory.
This implementation is about twice as fast as the previous one and uses only half the memory.
From the index of array calculates the value it represents as (2*i + 1), the step between two index that represents
From the index of the array, it calculates the value it represents as (2*i + 1), the step between two indices that represent
the multiples of primes to mark as composite is also (2*i + 1).
the multiples of primes to mark as composite is also (2*i + 1).
The index of the square of the prime to start composite marking is 2*i*(i+1).
The index of the square of the prime to start composite marking is 2*i*(i+1).
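
The index arithmetic described above can be checked with a small sketch; the helper names ''idx->odd'' and ''sq-idx'' are hypothetical and are not part of the code above:
<syntaxhighlight lang="clojure">(defn idx->odd
  "Odd number represented by array index i (index 1 -> 3, index 2 -> 5, ...)."
  [i] (+ i i 1))

(defn sq-idx
  "Index at which the square of the odd number at index i is found: 2*i*(i+1)."
  [i] (* 2 i (inc i)))

;; for each index i representing p = 2*i + 1, the value at (sq-idx i) is p*p,
;; and stepping the index by p steps the value by 2*p, skipping even multiples
(every? (fn [i] (let [p (idx->odd i)]
                  (and (= (idx->odd (sq-idx i)) (* p p))
                       (= (idx->odd (+ (sq-idx i) p)) (+ (* p p) p p)))))
        (range 1 1000))</syntaxhighlight>

The final expression evaluates to ''true'', since (2*i + 1)^2 = 2*(2*i*(i+1)) + 1.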
Line 1,738: Line 3,889:
'''Alternative very slow entirely functional implementation using lazy sequences'''
'''Alternative very slow entirely functional implementation using lazy sequences'''


<lang clojure>
<syntaxhighlight lang="clojure">(defn primes-to
(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
[n]
Line 1,747: Line 3,897:
(cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
(cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
set (remove cs) rest)))))))]
set (remove cs) rest)))))))]
(nxtprm (range 2 (inc n)))))
(nxtprm (range 2 (inc n)))))</syntaxhighlight>
</lang>


The above code is so slow because it has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, because each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector, where only the culled values are referenced), and because of the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.
The above code is so slow because it has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, because each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector, where only the culled values are referenced), and because of the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.
Line 1,755: Line 3,904:


Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:
Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:
<lang clojure>
<syntaxhighlight lang="clojure">(defn primes-to
(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[max-prime]
[max-prime]
Line 1,770: Line 3,918:
(assoc 1 false)
(assoc 1 false)
(sieve 2))
(sieve 2))
(map-indexed #(vector %2 %1)) (filter first) (map second))))
(map-indexed #(vector %2 %1)) (filter first) (map second))))</syntaxhighlight>
</lang>


The above code is still quite slow due to the cost of the immutable copy-on-modify operations.
The above code is still quite slow due to the cost of the immutable copy-on-modify operations.
Line 1,778: Line 3,925:


The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:
The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:
<syntaxhighlight lang="clojure">(set! *unchecked-math* true)
<lang clojure>
(set! *unchecked-math* true)


(defn primes-to
(defn primes-to
Line 1,805: Line 3,951:
(recur (inc i)))))))))]
(recur (inc i)))))))))]
(if (< n 2) nil
(if (< n 2) nil
(cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))
(cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))</syntaxhighlight>
</lang>


The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization, except that the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.
The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization, except that the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.
Line 1,813: Line 3,958:


The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed up some computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):
The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed up some computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):
<lang clojure>
<syntaxhighlight lang="clojure">(defn primes-tox
(defn primes-tox
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
[n]
Line 1,886: Line 4,030:
(toString [this] (if (= cnt tcnt) "()"
(toString [this] (if (= cnt tcnt) "()"
(.toString (seq (map identity this))))))
(.toString (seq (map identity this))))))
(->OPSeq 0 cmpsts 0 (numprms))))))))
(->OPSeq 0 cmpsts 0 (numprms))))))))</syntaxhighlight>
</lang>


'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.
'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.
Line 1,905: Line 4,048:


'''A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)'''
'''A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)'''
<lang clojure>
<syntaxhighlight lang="clojure">(defn primes-Bird
(defn primes-Bird
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
[]
[]
Line 1,927: Line 4,069:
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))
(cons 2 (lazy-seq oddprms)))))</syntaxhighlight>
</lang>


The above code is quite slow, both because the data structure is a linear merging of prime multiples and because of the slowness of the Clojure sequence operations.
The above code is quite slow, both because the data structure is a linear merging of prime multiples and because of the slowness of the Clojure sequence operations.
Line 1,935: Line 4,076:


The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:
The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:
<syntaxhighlight lang="clojure">(defn primes-treeFolding
<lang clojure>
(defn primes-treeFolding
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
[]
Line 1,960: Line 4,100:
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))
(cons 2 (lazy-seq oddprms)))))</syntaxhighlight>
</lang>


The above code is still slower than it should be due to the slowness of Clojure's sequence operations.
The above code is still slower than it should be due to the slowness of Clojure's sequence operations.
Line 1,968: Line 4,107:


The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:
The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:
<syntaxhighlight lang="clojure">(deftype CIS [v cont]
<lang clojure>
clojure.lang.ISeq
(first [_] v)
(next [_] (if (nil? cont) nil (cont)))
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (or (not= (type cis1) (type cis2))
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(not (nil? (.cont ^CIS cis2))))
(and (nil? (.cont ^CIS cis2))
(not (nil? (.cont cis1))))) false
(if (nil? (.cont cis1)) true
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(recur ((.cont cis)) (inc cnt)))))
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
clojure.lang.Sequential
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))

(defn primes-treeFoldingx
(defn primes-treeFoldingx
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
[]
(do (deftype CIS [v cont]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
clojure.lang.ISeq
(first [_] v)
(->CIS c (fn [] (nxtmltpl (+ c p2)))))]
(next [_] (if (nil? cont) nil (cont)))
(nxtmltpl (* p p))))),
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
(cons [this o] (clojure.core/cons o this))
(union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
(if (or (not= (type cis1) (type cis2))
(->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
(pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(->CIS (union (.v mltplss) (.v tl))
(not (nil? (.cont ^CIS cis2))))
(fn [] (pairs ((.cont tl))))))),
(and (nil? (.cont ^CIS cis2))
(mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
(not (nil? (.cont cis1))))) false
(fn [] (union ((.cont ^CIS (.v mltplss)))
(if (nil? (.cont cis1)) true
(mrgmltpls (pairs ((.cont mltplss)))))))),
(minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(if (< n (.v cmpsts))
(recur ((.cont cis)) (inc cnt)))))
(->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) ((.cont cmpsts))))))]
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
(do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
clojure.lang.Sequential
(->CIS 2 (fn [] oddprms)))))</syntaxhighlight>
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(->CIS c (fn [] (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
(union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
(if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
(if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
(->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
(pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
(->CIS (union (.v mltplss) (.v tl))
(fn [] (pairs ((.cont tl))))))),
(mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
(fn [] (union ((.cont ^CIS (.v mltplss)))
(mrgmltpls (pairs ((.cont mltplss)))))))),
(minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (.v cmpsts))
(->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) ((.cont cmpsts))))))]
(do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(->CIS 2 (fn [] oddprms))))))
</lang>


'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.
'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.
Line 2,026: Line 4,164:


The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:
The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:

<lang clojure>
(defn primes-hashmap
<syntaxhighlight lang="clojure">(defn primes-hashmap
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"
[]
[]
Line 2,043: Line 4,181:
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))</syntaxhighlight>
</lang>


The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation, even though it has a computational complexity of O(n log log n) rather than the worse O(n log n log log n) of the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.
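
As a language-neutral illustration of the postponed Hash Map technique described above, here is a minimal sketch in Python. It is not a translation of the Clojure code: the name <code>primes_hashmap</code> and the choice of a plain <code>dict</code> mapping each pending odd composite to its step are assumptions made for this example. A prime's culling sequence is only added to the map once the candidate reaches that prime's square, and the base primes come from a second, recursive instance of the same generator:

<syntaxhighlight lang="python">from itertools import islice, takewhile

def primes_hashmap():
    """Unbounded odds-only incremental sieve using a dict of pending composites."""
    yield 2
    yield 3
    composites = {}            # next odd composite -> step (twice its base prime)
    base = primes_hashmap()    # secondary generator supplying the base primes
    next(base)                 # skip 2, which never culls odd candidates
    p = next(base)             # current base prime (3)
    q = p * p                  # its square: culling for p starts here
    c = 5                      # current odd candidate
    while True:
        step = composites.pop(c, None)
        if step is None:
            if c < q:          # not a known composite and below p*p: prime
                yield c
                c += 2
                continue
            step = 2 * p       # c == q: start the culling sequence for p
            p = next(base)
            q = p * p
        m = c + step           # advance this sequence to a free map slot
        while m in composites:
            m += step
        composites[m] = step
        c += 2</syntaxhighlight>

For example, <code>list(islice(primes_hashmap(), 10))</code> gives the first ten primes, and the map never holds more entries than there are base primes up to the square root of the current candidate.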
Line 2,053: Line 4,190:


In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions: (empty-pq) for initialization, (getMin-pq pq) to examine the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replace the minimum entry with the given key/value pair (this is more efficient than if separate functions were provided to delete and then re-insert entries in the queue). No "delete" or other queue functions are supplied, as the algorithm does not require them:

<lang clojure>
(deftype PQEntry [k, v]
<syntaxhighlight lang="clojure">(deftype PQEntry [k, v]
Object
Object
(toString [_] (str "<" k "," v ">")))
(toString [_] (str "<" k "," v ">")))
(deftype PQNode [^PQEntry ntry, lft, rght, lvl]
(deftype PQNode [ntry, lft, rght]
Object
Object
(toString [_] (str "<" lvl ntry " left: " (str lft) " right: " (str rght) ">")))
(toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))


(defn empty-pq [] nil)
(defn empty-pq [] nil)


(defn getMin-pq ^PQEntry [pq] (condp instance? pq
(defn getMin-pq [^PQNode pq]
(if (nil? pq)
PQEntry pq,
nil
PQNode (.ntry ^PQNode pq)
(.ntry pq)))
nil))


(defn insert-pq [opq k v]
(defn insert-pq [^PQNode opq ok v]
(loop [kv (->PQEntry k v), msk 0, pq opq, cont identity]
(loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
(condp instance? pq
(if (nil? pq)
PQEntry (if (< k (.k ^PQEntry pq)) (cont (->PQNode kv pq nil 2))
(cont (->PQNode kv nil nil))
(let [k (.k kv),
(cont (->PQNode pq kv nil 2))),
PQNode (let [^PQNode pqn pq, kvn (.ntry pqn), l (.lft pqn), r (.rght pqn),
^PQEntry kvn (.ntry pq), kn (.k kvn),
nlvl (+ (.lvl pqn) 1),
l (.lft pq), r (.rght pq)]
(if (<= k kn)
nmsk (if (zero? msk) ;; never ever 0 again with the bit or'ed 1
(bit-or (bit-shift-left nlvl (- 64 (long (quot (Math/log (double nlvl))
(recur kvn r #(cont (->PQNode kv % l)))
(recur kv r #(cont (->PQNode kvn % l))))))))
(Math/log (double 2)))))) 1)
(bit-shift-left msk 1))]
(if (<= k (.k ^PQEntry kvn))
(if (neg? nmsk)
(recur kvn nmsk r (fn [npq] (cont (->PQNode kv l npq nlvl))))
(recur kvn nmsk l (fn [npq] (cont (->PQNode kv npq r nlvl)))))
(if (neg? nmsk)
(recur kv nmsk r (fn [npq] (cont (->PQNode kvn l npq nlvl))))
(recur kv nmsk l (fn [npq] (cont (->PQNode kvn npq r nlvl))))))),
(cont kv))))


(defn replaceMinAs-pq [opq k v]
(defn replaceMinAs-pq [^PQNode opq k v]
(let [kv (->PQEntry k v)]
(let [^PQEntry kv (->PQEntry k v)]
(if (nil? opq) ;; if was empty or just an entry, just use current entry
(loop [pq opq, cont identity]
(if (instance? PQNode pq)
(->PQNode kv nil nil)
(let [^PQNode pqn pq, l (.lft pqn), r (.rght pqn), lvl (.lvl pqn)]
(loop [pq opq, cont identity]
(let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
(cond
(and (instance? PQEntry r) (> k (.k ^PQEntry r)))
(cond ;; if left us empty, right must be too
(cond ;; right not empty so left is never empty
(nil? l)
(and (instance? PQEntry l) (> k (.k ^PQEntry l))) ;; both qualify; choose least
(cont (->PQNode kv nil nil)),
(if (> (.k ^PQEntry l) (.k ^PQEntry r))
(nil? r) ;; we only have a left...
(cont (->PQNode r l kv lvl))
(let [^PQEntry kvl (.ntry l), kl (.k kvl)]
(cont (->PQNode l kv r lvl))),
(if (<= k kl)
(and (instance? PQNode l) (> k (.k ^PQEntry (.ntry ^PQNode l))))
(cont (->PQNode kv l nil))
(let [^PQEntry kvl (.ntry ^PQNode l)]
(recur l #(cont (->PQNode kvl % nil))))),
(if (> (.k kvl) (.k ^PQEntry r)) ;; both qualify; choose least
:else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
(cont (->PQNode r l kv lvl))
^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
(recur l (fn [npq] (cont (->PQNode kvl npq r lvl)))))),
(if (and (<= k kl) (<= k kr))
:else (cont (->PQNode r l kv lvl))), ;; only right qualifies; no recursion
(cont (->PQNode kv l r))
(and (instance? PQNode r) (> k (.k ^PQEntry (.ntry ^PQNode r))))
(if (<= kl kr)
(let [^PQEntry kvr (.ntry ^PQNode r)]
(recur l #(cont (->PQNode kvl % r)))
(if (and (instance? PQNode l) (> k (.k ^PQEntry (.ntry ^PQNode l))))
(recur r #(cont (->PQNode kvr l %))))))))))))</syntaxhighlight>
(let [^PQEntry kvl (.ntry ^PQNode l)]
(if (> (.k kvl) (.k kvr)) ;; both qualify; choose least
(recur r (fn [npq] (cont (->PQNode kvr l npq lvl))))
(recur l (fn [npq] (cont (->PQNode kvl npq r lvl))))))
(recur r (fn [npq] (cont (->PQNode kvr l npq lvl)))))), ;; only right qualifies
:else (cond ;; right is empty, but as this is a node, left is never empty
(and (instance? PQEntry l) (> k (.k ^PQEntry l)))
(cont (->PQNode l kv r lvl)),
(and (instance? PQNode l) (> k (.k ^PQEntry (.ntry ^PQNode l))))
(recur l (fn [npq] (cont (->PQNode (.ntry ^PQNode l) npq r lvl)))),
:else (cont (->PQNode kv l r lvl))))) ;; just replace contents, leave same
(cont kv))))) ;; if was empty or just an entry, just use current entry
</lang>


Note that the above code is written partially in continuation passing style so as to leave the "recur" calls in tail call position, as required for efficient looping in Clojure. For practical sieving ranges the algorithm could likely use plain function recursion, as the recursion depth is unlikely to exceed about ten, but plain recursion is said to produce less efficient code.
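
The four-function queue interface described above can be sketched in Python on top of the standard <code>heapq</code> module. This is only an illustration of the interface: unlike the Clojure version it mutates a list in place rather than being purely functional, and the function names below are hypothetical adaptations of the Clojure names.

<syntaxhighlight lang="python">import heapq

def empty_pq():
    """Create an empty min-heap priority queue."""
    return []

def get_min_pq(pq):
    """Return the minimum (key, value) pair without removing it, or None if empty."""
    return pq[0] if pq else None

def insert_pq(pq, k, v):
    """Add a (key, value) entry to the queue."""
    heapq.heappush(pq, (k, v))

def replace_min_as_pq(pq, k, v):
    """Replace the minimum entry with (k, v) in a single sift, not a pop then a push."""
    if pq:
        heapq.heapreplace(pq, (k, v))
    else:
        heapq.heappush(pq, (k, v))</syntaxhighlight>

Replacing the minimum in one step restores the heap with a single downward sift, which is why the sieve never needs a separate delete operation.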


The actual incremental sieve using the Priority Queue is as follows. It uses the same optimizations as the Hash Map version: the addition of prime composite streams to the queue is postponed until the square root of the currently sieved number is reached, and a secondary base primes stream is used to generate the prime composite stream markers in the queue:

<lang clojure>
(defn primes-pq
<syntaxhighlight lang="clojure">(defn primes-pq
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"
[]
[]
Line 2,145: Line 4,260:
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))</syntaxhighlight>
</lang>


The above code is faster than the Hash Map version up to a sieving range of about fifteen million, but gets progressively slower for larger ranges because it has O(n log n log log n) computational complexity rather than the O(n log log n) of the Hash Map version; the Hash Map version's higher constant factor overhead is eventually overtaken by the extra "log n" factor.
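
For comparison with the Hash Map approach, the Priority Queue variant can be sketched in Python as follows. This is an assumption-laden illustration rather than a translation of the Clojure code: it uses the mutable <code>heapq</code> module, and the name <code>primes_pq</code> and the <code>(next composite, step)</code> entry layout are choices made for this example. Each culling operation costs a heap sift, which is where the extra "log n" factor comes from:

<syntaxhighlight lang="python">import heapq
from itertools import islice, takewhile

def primes_pq():
    """Unbounded odds-only incremental sieve using a min-heap of pending composites."""
    yield 2
    yield 3
    heap = []                  # entries are (next odd composite, step)
    base = primes_pq()         # secondary generator supplying the base primes
    next(base)                 # skip 2
    p = next(base)             # current base prime (3)
    q = p * p                  # its square: culling for p starts here
    c = 5                      # current odd candidate
    while True:
        if heap and heap[0][0] == c:
            # c is composite; advance every sequence currently sitting on c
            while heap[0][0] == c:
                m, step = heap[0]
                heapq.heapreplace(heap, (m + step, step))
        elif c < q:
            yield c            # not in the queue and below p*p: prime
        else:
            # c == q: only now add the culling sequence for base prime p
            heapq.heappush(heap, (q + 2 * p, 2 * p))
            p = next(base)
            q = p * p
        c += 2</syntaxhighlight>

Unlike the dictionary version, several entries with the same composite key may coexist in the heap, so the inner loop advances all of them before moving on to the next candidate.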
Line 2,162: Line 4,276:
To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than equivalent versions running on other virtual machines: C# or F# on DotNet, or Java and Scala on the JVM:


<lang clojure>(set! *unchecked-math* true)
<syntaxhighlight lang="clojure">(set! *unchecked-math* true)


(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
Line 2,309: Line 4,423:
(next pgseq)
(next pgseq)
(+ cnt (count-pg PGBTS pg))))))]
(+ cnt (count-pg PGBTS pg))))))]
(nxt-pg 0 (primes-pages) 1))))</lang>
(nxt-pg 0 (primes-pages) 1))))</syntaxhighlight>


The above code runs just as fast as other virtual machine languages when run on a 64-bit JVM; however, when run on a 32-bit JVM it runs almost five times slower. This is likely because Clojure uses only 64-bit integers for integer operations; under a 32-bit JVM these operations get JIT compiled to library functions that simulate them with combined 32-bit operations, whereas direct CPU operations can be used on a 64-bit JVM.
Line 2,320: Line 4,434:


The base primes culling page size is reduced from the page size for the main primes so that there is less overhead for smaller primes ranges; otherwise excess base primes are generated for fairly small sieve ranges.

=={{header|CLU}}==
<syntaxhighlight lang="clu">% Sieve of Eratosthenes
eratosthenes = proc (n: int) returns (array[bool])
prime: array[bool] := array[bool]$fill(1, n, true)
prime[1] := false

for p: int in int$from_to(2, n/2) do
if prime[p] then
for c: int in int$from_to_by(p*p, n, p) do
prime[c] := false
end
end
end
return(prime)
end eratosthenes

% Print primes up to 1000 using the sieve
start_up = proc ()
po: stream := stream$primary_output()
prime: array[bool] := eratosthenes(1000)
col: int := 0

for i: int in array[bool]$indexes(prime) do
if prime[i] then
col := col + 1
stream$putright(po, int$unparse(i), 5)
if col = 10 then
col := 0
stream$putc(po, '\n')
end
end
end
end start_up</syntaxhighlight>
{{out}}
<pre> 2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997</pre>


=={{header|CMake}}==
=={{header|CMake}}==
<lang cmake>function(eratosthenes var limit)
<syntaxhighlight lang="cmake">function(eratosthenes var limit)
# Check for integer overflow. With CMake using 32-bit signed integer,
# Check for integer overflow. With CMake using 32-bit signed integer,
# this check fails when limit > 46340.
# this check fails when limit > 46340.
Line 2,356: Line 4,522:
endforeach(i)
endforeach(i)
set(${var} ${list} PARENT_SCOPE)
set(${var} ${list} PARENT_SCOPE)
endfunction(eratosthenes)</lang>
endfunction(eratosthenes)</syntaxhighlight>
# Print all prime numbers through 100.
# Print all prime numbers through 100.
eratosthenes(primes 100)
eratosthenes(primes 100)
Line 2,362: Line 4,528:


=={{header|COBOL}}==
=={{header|COBOL}}==
<lang cobol>*> Please ignore the asterisks in the first column of the next comments,
<syntaxhighlight lang="cobol">*> Please ignore the asterisks in the first column of the next comments,
*> which are kludges to get syntax highlighting to work.
*> which are kludges to get syntax highlighting to work.
IDENTIFICATION DIVISION.
IDENTIFICATION DIVISION.
Line 2,416: Line 4,582:


GOBACK
GOBACK
.</lang>
.</syntaxhighlight>

=={{header|Comal}}==
{{trans|BASIC}}
<syntaxhighlight lang="comal">// Sieve of Eratosthenes
input "Limit? ": limit
dim sieve(1:limit)
sqrlimit:=sqr(limit)
sieve(1):=1
p:=2
while p<=sqrlimit do
while sieve(p) and p<sqrlimit do
p:=p+1
endwhile
if p>sqrlimit then goto done
for i:=p*p to limit step p do
sieve(i):=1
endfor i
p:=p+1
endwhile
done:
print 2,
for i:=3 to limit do
if sieve(i)=0 then
print ", ",i,
endif
endfor i
print</syntaxhighlight>

{{Out}}
<pre>
Limit? 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97

end</pre>


=={{header|Common Lisp}}==
=={{header|Common Lisp}}==
<lang lisp>(defun sieve-of-eratosthenes (maximum)
<syntaxhighlight lang="lisp">(defun sieve-of-eratosthenes (maximum)
(loop
(loop
with sieve = (make-array (1+ maximum)
with sieve = (make-array (1+ maximum)
Line 2,429: Line 4,631:
and do (loop for composite from (expt candidate 2)
and do (loop for composite from (expt candidate 2)
to maximum by candidate
to maximum by candidate
do (setf (bit sieve composite) 1))))</lang>
do (setf (bit sieve composite) 1))))</syntaxhighlight>


Working with odds only (more than doubling the speed), and marking composites only for primes up to the square root of the maximum:


<lang lisp>(defun sieve-odds (maximum) "sieve for odd numbers"
<syntaxhighlight lang="lisp">(defun sieve-odds (maximum)
"Prime numbers sieve for odd numbers.
(cons 2
(let ((maxi (ash (1- maximum) -1)) (stop (ash (isqrt maximum) -1)))
Returns a list with all the primes that are less than or equal to maximum."
(loop :with maxi = (ash (1- maximum) -1)
(let ((sieve (make-array (1+ maxi) :element-type 'bit :initial-element 0)))
(loop for i from 1 to maxi
:with stop = (ash (isqrt maximum) -1)
:with sieve = (make-array (1+ maxi) :element-type 'bit :initial-element 0)
when (zerop (sbit sieve i))
collect (1+ (ash i 1))
:for i :from 1 :to maxi
and when (<= i stop) do
:for odd-number = (1+ (ash i 1))
(loop for j from (ash (* i (1+ i)) 1) to maxi by (1+ (ash i 1))
:when (zerop (sbit sieve i))
:collect odd-number :into values
do (setf (sbit sieve j) 1)))))))</lang>
:when (<= i stop)
:do (loop :for j :from (* i (1+ i) 2) :to maxi :by odd-number
:do (setf (sbit sieve j) 1))
:finally (return (cons 2 values))))</syntaxhighlight>

The indexation scheme used here interprets each index <code>i</code> as standing for the value <code>2i+1</code>. Bit <code>0</code> is unused, a small price to pay for the simpler index calculations compared with the <code>2i+3</code> indexation scheme. The multiples of a given odd prime <code>p</code> are enumerated in increments of <code>2p</code>, which corresponds to the index increment of <code>p</code> on the sieve array. The starting point <code>p*p = (2i+1)(2i+1) = 4i(i+1)+1</code> corresponds to the index <code>2i(i+1)</code>.


While formally a ''wheel'', odds are uniformly spaced and do not require any special processing except for value translation. Wheels proper aren't uniformly spaced and are thus trickier.
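
The <code>2i+1</code> indexing described above can be checked with a short sketch in Python (an illustration only, not a translation of the Lisp code; the name <code>sieve_odds</code> is reused for convenience and the <code>bytearray</code> storage is an assumption of this example). Index <code>i</code> stands for the value <code>2i+1</code>, culling for the prime <code>p = 2i+1</code> starts at index <code>2i(i+1)</code>, and the index step is <code>p</code>:

<syntaxhighlight lang="python">from math import isqrt

def sieve_odds(maximum):
    """Return all primes <= maximum using an odds-only sieve with 2i+1 indexing."""
    if maximum < 2:
        return []
    maxi = (maximum - 1) >> 1          # largest index whose value 2i+1 <= maximum
    stop = isqrt(maximum) >> 1         # largest index whose square can still matter
    sieve = bytearray(maxi + 1)        # 0 = possibly prime, 1 = composite; index 0 unused
    for i in range(1, stop + 1):
        if not sieve[i]:
            p = 2 * i + 1
            # p*p = 4i(i+1) + 1 sits at index 2i(i+1); a value step of 2p is an index step of p
            for j in range(2 * i * (i + 1), maxi + 1, p):
                sieve[j] = 1
    return [2] + [2 * i + 1 for i in range(1, maxi + 1) if not sieve[i]]</syntaxhighlight>

Calling <code>sieve_odds(100)</code> returns the 25 primes up to 100.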

=={{header|Cowgol}}==
<syntaxhighlight lang="cowgol">include "cowgol.coh";

# To change the maximum prime, change the size of this array
# Everything else is automatically filled in at compile time
var sieve: uint8[5000];

# Make sure all elements of the sieve are set to zero
MemZero(&sieve as [uint8], @bytesof sieve);

# Generate the sieve
var prime: @indexof sieve := 2;
while prime < @sizeof sieve loop
if sieve[prime] == 0 then
var comp: @indexof sieve := prime * prime;
while comp < @sizeof sieve loop
sieve[comp] := 1;
comp := comp + prime;
end loop;
end if;
prime := prime + 1;
end loop;

# Print all primes
var cand: @indexof sieve := 2;
while cand < @sizeof sieve loop
if sieve[cand] == 0 then
print_i16(cand as uint16);
print_nl();
end if;
cand := cand + 1;
end loop;</syntaxhighlight>

{{out}}
<pre>2
3
5
7
11
...
4967
4969
4973
4987
4999</pre>

=={{header|Craft Basic}}==
<syntaxhighlight lang="basic">define limit = 120

dim flags[limit]

for n = 2 to limit

let flags[n] = 1

next n

print "prime numbers less than or equal to ", limit ," are:"

for n = 2 to sqrt(limit)

if flags[n] = 1 then

for i = n * n to limit step n

let flags[i] = 0

next i

endif

next n

for n = 1 to limit

if flags[n] then

print n

endif

next n</syntaxhighlight>
{{out| Output}}<pre>
prime numbers less than or equal to 120 are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 </pre>

=={{header|Crystal}}==

===Basic Version===

This implementation uses a `BitArray` so it is automatically bit-packed to use just one bit per number representation:

<syntaxhighlight lang="ruby"># compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE
include Iterator(Prime)
@bits : BitArray; @bitndx : Int32 = 2

def initialize(range : Prime)
if range < 2
@bits = BitArray.new 0
else
@bits = BitArray.new((range + 1).to_i32)
end
ba = @bits; ndx = 2
while true
wi = ndx * ndx
break if wi >= ba.size
if ba[ndx]
ndx += 1; next
end
while wi < ba.size
ba[wi] = true; wi += ndx
end
ndx += 1
end
end

def next
while @bitndx < @bits.size
if @bits[@bitndx]
@bitndx += 1; next
end
rslt = @bitndx.to_u64; @bitndx += 1; return rslt
end
stop
end
end

print "Primes up to a hundred: "
SoE.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million: "
puts SoE.new(1_000_000).each.size
print "Number of primes to a billion: "
start_time = Time.monotonic
print SoE.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."</syntaxhighlight>

{{out}}
<pre>Primes up to a hundred: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million: 78498
Number of primes to a billion: 50847534 in 10219.222539 milliseconds.</pre>

These timings are from an Intel Skylake i5-6500 at 3.6 GHz (automatic boost for single-threaded use, as here).

===Odds-Only Version===

The non-odds-only version above should never be used: by not sieving odds only, it uses twice the memory and over two and a half times the CPU operations of the following odds-only code, which is only slightly more complex:

<syntaxhighlight lang="ruby"># compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE_Odds
include Iterator(Prime)
@bits : BitArray; @bitndx : Int32 = -1

def initialize(range : Prime)
if range < 3
@bits = BitArray.new 0
else
@bits = BitArray.new(((range - 1) >> 1).to_i32)
end
ba = @bits; ndx = 0
while true
wi = (ndx + ndx) * (ndx + 3) + 3 # start cull index calculation
break if wi >= ba.size
if ba[ndx]
ndx += 1; next
end
bp = ndx + ndx + 3
while wi < ba.size
ba[wi] = true; wi += bp
end
ndx += 1
end
end

def next
while @bitndx < @bits.size
if @bitndx < 0
@bitndx += 1; return 2_u64
elsif @bits[@bitndx]
@bitndx += 1; next
end
rslt = (@bitndx + @bitndx + 3).to_u64; @bitndx += 1; return rslt
end
stop
end
end

print "Primes up to a hundred: "
SoE_Odds.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million: "
puts SoE_Odds.new(1_000_000).each.size
print "Number of primes to a billion: "
start_time = Time.monotonic
print SoE_Odds.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."</syntaxhighlight>

{{out}}
<pre>Primes up to a hundred: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million: 78498
Number of primes to a billion: 50847534 in 4877.829642 milliseconds.</pre>

As can be seen, this is over twice as fast as the non-odds-only version when run on the same CPU, due to reduced pressure on the CPU data cache; however, it is only reasonably performant for ranges of a few million. Above that, a page-segmented odds-only version (or one with further wheel factorization) should be used, along with other techniques that further reduce the number of CPU clock cycles per culling/marking operation.
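
The "start cull index calculation" in the odds-only code follows from the indexing used there, in which index <code>n</code> stands for the odd value <code>p = 2n+3</code>. Since <code>p*p = 4n² + 12n + 9</code>, its index is <code>(p*p - 3) / 2 = 2n² + 6n + 3 = 2n(n+3) + 3</code>. The following Python sketch (the helper names are hypothetical, introduced only for this check) confirms that the formula lands on the square of the base prime:

<syntaxhighlight lang="python">def index_to_value(ndx):
    """Odd value represented by a sieve index in the 2n+3 scheme."""
    return ndx + ndx + 3

def start_cull_index(ndx):
    """Index of p*p, where p = 2*ndx + 3; same expression as in the sieve code."""
    return (ndx + ndx) * (ndx + 3) + 3

# The index computed for each base prime index maps back to the prime's square.
for ndx in range(1000):
    p = index_to_value(ndx)
    assert index_to_value(start_cull_index(ndx)) == p * p</syntaxhighlight>

From that starting index, successive multiples are reached by stepping the index by <code>p</code>, which corresponds to stepping the value by <code>2p</code> and so skips the even multiples.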

===Page-Segmented Odds-Only Version===

For efficient sieving of ranges larger than a few million, a page-segmented sieve should always be used so that the working set stays within the CPU cache, by making the page size about that of the CPU L1 data cache. The following code implements a page-segmented version that is an extensible sieve (no upper limit need be specified) using a secondary memoized feed of base prime value arrays, which use a smaller page-segment size for efficiency. When only the count of primes is desired, the sieve is polymorphic in output and counts the unmarked (non-composite) bits using fast `popcount` instructions taken 64 bits at a time. The code is as follows:

<syntaxhighlight lang="ruby"># compile with `--release --no-debug` for speed...

alias Prime = UInt64
alias PrimeNdx = Int64
alias PrimeArr = Array(Prime)
alias SieveBuffer = Pointer(UInt8)
alias BasePrime = UInt32
alias BasePrimeArr = Array(BasePrime)

CPUL1CACHE = 131072 # 16 Kilobytes in number of bits

BITMASK = Pointer(UInt8).malloc(8) { |i| 1_u8 << i }

# Count number of non-composite (zero) bits within index range...
# sieve buffer is always evenly divisible by 64-bit words...
private def count_page_to(ndx : Int32, sb : SieveBuffer)
lstwrdndx = ndx >> 6; mask = (~1_u64) << (ndx & 63)
cnt = lstwrdndx * 64 + 64; sbw = sb.as(Pointer(UInt64))
lstwrdndx.times { |i| cnt -= sbw[i].popcount }
cnt - (sbw[lstwrdndx] | mask).popcount
end

# Cull composite bits from sieve buffer using base prime arrays;
# starting at overall given prime index for given buffer bit size...
private def cull_page(pndx : PrimeNdx, bitsz : Int32,
bps : Iterator(BasePrimeArr), sb : SieveBuffer)
bps.each { |bpa|
bpa.each { |bpu32|
bp = bpu32.to_i64; bpndx = (bp - 3) >> 1
swi = (bpndx + bpndx) * (bpndx + 3) + 3 # calculate start prime index
return if swi >= pndx + bitsz.to_i64
bpi = bp.to_i32 # calculate buffer start culling index...
bi = (swi >= pndx) ? (swi - pndx).to_i32 : begin
r = (pndx - swi) % bp; r == 0 ? 0 : bpi - r.to_i32
end
# when base prime is small enough, cull using strided loops to
# simplify the inner loops at the cost of more loop overhead...
# almost all of the work is done by the following loop...
if bpi < (bitsz >> 4)
bilmt = bi + (bpi << 3); cplmt = sb + (bitsz >> 3)
bilmt = CPUL1CACHE if bilmt > CPUL1CACHE
while bi < bilmt
cp = sb + (bi >> 3); msk = BITMASK[bi & 7]
while cp < cplmt # use pointer to save loop overhead
cp[0] |= msk; cp += bpi
end
bi += bpi
end
else
while bi < bitsz # bitsz
sb[bi >> 3] |= BITMASK[bi & 7]; bi += bpi
end
end } }
end

# Iterator over processed prime pages, polymorphic by the converter function...
private class PagedResults(T)
@bpas : BasePrimeArrays
@cmpsts : SieveBuffer

def initialize(@prmndx : PrimeNdx,
@cmpstsbitsz : Int32,
@cnvrtrfnc : (Int64, Int32, SieveBuffer) -> T)
@bpas = BasePrimeArrays.new
@cmpsts = SieveBuffer.malloc(((@cmpstsbitsz + 63) >> 3) & (-8))
end

private def dopage
(@prmndx..).step(@cmpstsbitsz.to_i64).map { |pn|
@cmpsts.clear(@cmpstsbitsz >> 3)
cull_page(pn, @cmpstsbitsz, @bpas.each, @cmpsts)
@cnvrtrfnc.call(pn, @cmpstsbitsz, @cmpsts) }
end

def each
dopage
end

def each(& : T -> _) : Nil
itr = dopage
while true
value = itr.next
break if value.is_a?(Iterator::Stop)
yield value
end
end
end

# Secondary memoized chain of BasePrime arrays (by small page size),
# which is actually an iterable lazy list (memoized) of BasePrimeArr;
# Crystal has closures, so it is easy to implement a LazyList class
# which memoizes the results of the thunk so it is only executed once...
private class BasePrimeArrays
@baseprmarr : BasePrimeArr # head of lazy list
@tail : BasePrimeArrays? = nil # tail starts as non-existing

def initialize # special case for first page of base primes
# converter of sieve buffer to base primes array...
sb2bparrprc = -> (pn : PrimeNdx, bl : Int32, sb : SieveBuffer) {
cnt = count_page_to(bl - 1, sb)
bparr = BasePrimeArr.new(cnt, 0); j = 0
bsprm = (pn + pn + 3).to_u32
bl.times.each { |i|
next if (sb[i >> 3] & BITMASK[i & 7]) != 0
bparr[j] = bsprm + (i + i).to_u32; j += 1 }
bparr }

cmpsts = SieveBuffer.malloc 128 # fake bparr for first iter...
frstbparr = sb2bparrprc.call(0_i64, 1024, cmpsts)
cull_page(0_i64, 1024, Iterator.of(frstbparr).each, cmpsts)
@baseprmarr = sb2bparrprc.call(0_i64, 1024, cmpsts)

# initialization of pages after the first is deferred to avoid data race...
initbpas = -> { PagedResults.new(1024_i64, 1024, sb2bparrprc).each }
# recursive LazyList generator function...
nxtbpa = uninitialized Proc(Iterator(BasePrimeArr), BasePrimeArrays)
nxtbpa = -> (bppgs : Iterator(BasePrimeArr)) {
nbparr = bppgs.next
abort "Unexpected base primes end!!!" if nbparr.is_a?(Iterator::Stop)
BasePrimeArrays.new(nbparr, ->{ nxtbpa.call(bppgs) }) }
@thunk = ->{ nxtbpa.call(initbpas.call) }
end
def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(BasePrimeArrays))
end
def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(Nil))
end
def initialize(@baseprmarr : BasePrimeArr, @thunk : Nil)
end

def tail # not thread safe without a lock/mutex...
if thnk = @thunk
@tail = thnk.call; @thunk = nil
end
@tail
end

private class BasePrimeArrIter # iterator over BasePrime arrays...
include Iterator(BasePrimeArr)
@dbparrs : Proc(BasePrimeArrays?)

def initialize(fromll : BasePrimeArrays)
@dbparrs = ->{ fromll.as(BasePrimeArrays?) }
end

def next
if bpas = @dbparrs.call
rslt = bpas.@baseprmarr; @dbparrs = -> { bpas.tail }; rslt
else
abort "Unexpected end of base primes array iteration!!!"
end
end
end
def each
BasePrimeArrIter.new(self)
end
end

# An "infinite" extensible iteration of primes,...
def primes
sb2prms = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
cnt = count_page_to(bitsz - 1, sb)
prmarr = PrimeArr.new(cnt, 0); j = 0
bsprm = (pn + pn + 3).to_u64
bitsz.times.each { |i|
next if (sb[i >> 3] & BITMASK[i & 7]) != 0
prmarr[j] = bsprm + (i + i).to_u64; j += 1 }
prmarr
}
(2_u64..2_u64).each
.chain PagedResults.new(0, CPUL1CACHE, sb2prms).each.flat_map { |prmspg| prmspg.each }
end

# Counts number of primes to given limit...
def primes_count_to(lmt : Prime)
if lmt < 3
lmt < 2 ? return 0 : return 1
end
lmtndx = ((lmt - 3) >> 1).to_i64
sb2cnt = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
pglmt = pn + bitsz.to_i64 - 1
if (pn + CPUL1CACHE.to_i64) > lmtndx
Tuple.new(count_page_to((lmtndx - pn).to_i32, sb).to_i64, pglmt)
else
Tuple.new(count_page_to(bitsz - 1, sb).to_i64, pglmt)
end
}
count = 1
PagedResults.new(0, CPUL1CACHE, sb2cnt).each { |(cnt, lmt)|
count += cnt; break if lmt >= lmtndx }
count
end

print "The primes up to 100 are: "
primes.each.take_while { |p| p <= 100_u64 }.each { |p| print " ", p }
print ".\r\nThe Number of primes up to a million is "
print primes.each.take_while { |p| p <= 1_000_000_u64 }.size
print ".\r\nThe number of primes up to a billion is "
start_time = Time.monotonic
# answr = primes.each.take_while { |p| p <= 1_000_000_000_u64 }.size # slow way
answr = primes_count_to(1_000_000_000) # fast way
elpsd = (Time.monotonic - start_time).total_milliseconds
print "#{answr} in #{elpsd} milliseconds.\r\n"</syntaxhighlight>

{{out}}
<pre>The primes up to 100 are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97.
The Number of primes up to a million is 78498.
The number of primes up to a billion is 50847534 in 658.466028 milliseconds.</pre>

When run on the same machine as the previous version, this code is about seven and a half times as fast as the Odds-Only version above, at about 2.4 CPU clock cycles per culling operation rather than over 17. About half of the gain comes from better cache use; the rest comes from tuning the inner culling loop for small base prime values to operate by byte pointer strides with a constant mask value, which simplifies the code generated for these inner loops. As there is some overhead in the eight outer loops that set this up, the technique is only applicable for smaller base primes.

Further gains are possible by using maximum wheel factorization rather than just odds-only factorization, which can reduce the number of operations by a factor of about four, and by using extreme loop unrolling techniques for both the dense and sparse culling cases, which can reduce the number of CPU clock cycles per culling operation by a further 25 percent or so on average when sieving to a billion. As well, multi-threading by pages can reduce the wall clock time by a factor of the number of effective cores (non-Hyper-Threaded cores).
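
The core of the page-segmented technique (without the byte-stride inner loop, <code>popcount</code> counting, or the memoized base prime feed used above) can be sketched in Python. This is a simplified illustration under stated assumptions: the names <code>odd_base_primes</code> and <code>count_primes_segmented</code> are hypothetical, one byte per odd number is used instead of one bit, and the base primes are computed up front rather than lazily. The part worth noting is the calculation of the first cull index within each page, which mirrors the <code>cull_page</code> logic:

<syntaxhighlight lang="python">from math import isqrt

def odd_base_primes(limit):
    """Odd primes <= limit, found with a small non-segmented sieve."""
    flags = bytearray([1]) * (limit + 1)
    primes = []
    for n in range(3, limit + 1, 2):
        if flags[n]:
            primes.append(n)
            for m in range(n * n, limit + 1, 2 * n):
                flags[m] = 0
    return primes

def count_primes_segmented(limit, page_bits=1 << 17):
    """Count primes <= limit, sieving odds only, one page of page_bits odds at a time."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) >> 1                 # index of the largest odd <= limit (index n <-> 2n+3)
    base = odd_base_primes(isqrt(limit))
    count = 1                              # account for the prime 2
    for low in range(0, top + 1, page_bits):
        size = min(page_bits, top + 1 - low)
        page = bytearray(size)             # 0 = not culled, 1 = composite
        for p in base:
            start = (p * p - 3) >> 1       # overall index of p*p
            if start >= low + size:
                break                      # this and all larger base primes start beyond the page
            if start >= low:
                j = start - low
            else:
                r = (low - start) % p      # distance past the last multiple before the page
                j = 0 if r == 0 else p - r
            page[j::p] = b"\x01" * len(range(j, size, p))
        count += page.count(0)
    return count</syntaxhighlight>

Because each page is sieved independently from the same base primes, the working buffer stays at a fixed size regardless of the overall range, which is what keeps it inside the CPU cache in the compiled version.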


=={{header|D}}==
===Simpler Version===
Prints all primes less than the limit.<syntaxhighlight lang="d">import std.stdio, std.algorithm, std.range, std.functional;

uint[] sieve(in uint limit) nothrow @safe {
Line 2,466: Line 5,106:
void main() {
50.sieve.writeln;
}</syntaxhighlight>
{{out}}
<pre>[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]</pre>
===Faster Version===
This version uses an array of bits (instead of booleans, which are each represented with one byte) and skips even numbers. The output is the same.
<syntaxhighlight lang="d">import std.stdio, std.math, std.array;

size_t[] sieve(in size_t m) pure nothrow @safe {
Line 2,510: Line 5,150:
void main() {
50.sieve.writeln;
}</syntaxhighlight>
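The two ideas named in the description, one bit per candidate instead of one byte and no storage at all for even numbers, can be sketched as follows. This is a Python illustration under assumptions rather than a translation of the D code: the function name <code>odds_only_bit_sieve</code> is hypothetical, and bit <code>i</code> is taken to represent the odd number <code>2*i + 3</code>.

```python
from math import isqrt


def odds_only_bit_sieve(limit: int) -> list:
    """Return the primes <= limit using a bit-packed, odds-only sieve."""
    if limit < 2:
        return []
    if limit < 3:
        return [2]
    top = (limit - 3) >> 1                  # index of the largest odd candidate
    composites = bytearray((top >> 3) + 1)  # one bit per odd number, all clear
    for i in range(((isqrt(limit) - 3) >> 1) + 1):
        if composites[i >> 3] & (1 << (i & 7)):
            continue                        # 2*i + 3 is already marked composite
        p = 2 * i + 3
        # p*p has index (p*p - 3) / 2; stepping the index by p skips the
        # even multiples of p, which are not stored at all.
        for j in range((p * p - 3) >> 1, top + 1, p):
            composites[j >> 3] |= 1 << (j & 7)
    odds = [2 * i + 3 for i in range(top + 1)
            if not composites[i >> 3] & (1 << (i & 7))]
    return [2] + odds


print(odds_only_bit_sieve(50))
```

Compared with one byte per number, this layout needs one sixteenth of the memory for the same range, which is the main reason such versions stay in cache longer.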


===Extensible Version===
(This version is used in the task [[Extensible prime generator#D|Extensible prime generator]].)
<syntaxhighlight lang="d">/// Extensible Sieve of Eratosthenes.
struct Prime {
uint[] a = [2];
Line 2,548: Line 5,188:
uint.max.iota.map!prime.until!q{a > 50}.writeln;
}
}</syntaxhighlight>
To see the output (which is the same), compile with <code>-version=sieve_of_eratosthenes3_main</code>.


=={{header|Dart}}==
<syntaxhighlight lang="dart">// helper function to pretty print an Iterable
String iterableToString(Iterable seq) {
String str = "[";
Line 2,584: Line 5,224:
print(iterableToString(sortedValues)); // expect sieve.length to be 168 up to 1000...
// Expect.equals(168, sieve.length);
}</syntaxhighlight>
{{out}}<pre>
Found 168 primes up to 1000 in 9 milliseconds.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]
</pre>
Although it has the characteristics of a true Sieve of Eratosthenes, the above code isn't very efficient because of the remove/modify operations on the Set. As a result, the computational complexity isn't close to linear with increasing range, and the code is quite slow for larger sieve ranges compared to compiled languages: just to sieve to ten million, it takes an average of about 22 thousand CPU clock cycles for each of the 664579 primes (about 4 seconds on a 3.6 Gigahertz CPU).


===faster bit-packed array odds-only solution===
<syntaxhighlight lang="dart">import 'dart:typed_data';
import 'dart:math';

Iterable<int> soeOdds(int limit) {
  if (limit < 3) return limit < 2 ? Iterable.empty() : [2];
  int lmti = (limit - 3) >> 1;
  int bfsz = (lmti >> 5) + 1; // number of 32-bit words needed
  int sqrtlmt = (sqrt(limit) - 3).floor() >> 1;
  Uint32List cmpsts = Uint32List(bfsz);
  for (int i = 0; i <= sqrtlmt; ++i)
    if ((cmpsts[i >> 5] & (1 << (i & 31))) == 0) {
      int p = i + i + 3;
      for (int j = (p * p - 3) >> 1; j <= lmti; j += p)
        cmpsts[j >> 5] |= 1 << (j & 31);
    }
  return
    [2].followedBy(
      Iterable.generate(lmti + 1)
        .where((i) => cmpsts[i >> 5] & (1 << (i & 31)) == 0)
        .map((i) => i + i + 3) );
}

void main() {
  final int range = 100000000;
  String s = "( ";
  soeOdds(100).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${soeOdds(1000000).length} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = soeOdds(range).length;
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}</syntaxhighlight>
{{output}}
<pre>( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 4604 milliseconds.</pre>

The above code is somewhat faster, at about 1.5 thousand CPU clock cycles per prime when sieving to 100 million using the Dart VM. It was run here on a low-end 1.92 Gigahertz Intel x5-Z8350 CPU; this is equivalent to about 2.5 seconds on a 3.6 Gigahertz CPU.
===Unbounded infinite iterators/generators of primes===

'''Infinite generator using a (hash) Map (sieves odds-only)'''

The following code will have about O(n log (log n)) performance due to a hash table having O(1) average performance and is only somewhat slow due to the constant overhead of processing hashes:

<syntaxhighlight lang="dart">Iterable<int> primesMap() {
  Iterable<int> oddprms() sync* {
    yield(3); yield(5); // need at least 2 for initialization
    final Map<int, int> bpmap = {9: 6};
    final Iterator<int> bps = oddprms().iterator;
    bps.moveNext(); bps.moveNext(); // skip past 3 to 5
    int bp = bps.current;
    int n = bp;
    int q = bp * bp;
    while (true) {
      n += 2;
      while (n >= q || bpmap.containsKey(n)) {
        if (n >= q) {
          final int inc = bp << 1;
          bpmap[bp * bp + inc] = inc;
          bps.moveNext(); bp = bps.current; q = bp * bp;
        } else {
          final int inc = bpmap.remove(n);
          int next = n + inc;
          while (bpmap.containsKey(next)) {
            next += inc;
          }
          bpmap[next] = inc;
        }
        n += 2;
      }
      yield(n);
    }
  }
  return [2].followedBy(oddprms());
}

void main() {
  final int range = 100000000;
  String s = "( ";
  primesMap().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${primesMap().takeWhile((p)=>p<=1000000).length} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = primesMap().takeWhile((p)=>p<=range).length;
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}</syntaxhighlight>
{{output}}
<pre>( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 16086 milliseconds.</pre>

This takes about 5300 CPU clock cycles per prime, or about 8.4 seconds if run on a 3.6 Gigahertz CPU, which is slower than the fixed bit-packed array version above. Its advantage is that it runs indefinitely, at least on 64-bit machines; on 32-bit machines it can only be run up to the 32-bit number range, which is only about a factor of 20 above the range tested here.

Due to the constant execution overhead, this is only reasonably useful for ranges up to tens of millions anyway.
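The map-based technique is not specific to Dart. As a hedged illustration, the following Python sketch uses a <code>dict</code> in the same role: each key is the next odd composite to be skipped and each value is the step (twice a base prime) to its following odd multiple. The function name <code>primes_map</code> is an assumption, and the sketch differs from the Dart code in detail; for example, it yields the first four primes directly so that the recursive base-primes feed always has a value ready.

```python
from itertools import count, islice, takewhile


def primes_map():
    """Yield primes indefinitely using a dict of upcoming odd composites."""
    yield 2
    yield 3
    yield 5
    yield 7
    composites = {}            # next odd composite -> step (twice its base prime)
    base_primes = primes_map() # separate, much shorter feed of base primes
    next(base_primes)          # discard 2: even numbers are never visited
    bp = next(base_primes)     # first base prime, 3
    q = bp * bp                # its square, 9
    for n in count(9, 2):
        if n in composites:
            step = composites.pop(n)
        elif n < q:
            yield n            # not a known composite and below the next square
            continue
        else:                  # n == q: start culling by this base prime
            step = 2 * bp
            bp = next(base_primes)
            q = bp * bp
        nxt = n + step
        while nxt in composites:   # keep one entry per composite
            nxt += step
        composites[nxt] = step


print(list(islice(primes_map(), 25)))
print(sum(1 for _ in takewhile(lambda p: p <= 1000, primes_map())))
```

Because a base prime is only entered into the map when the candidate reaches its square, the map holds roughly one entry per prime up to the square root of the current position rather than one per prime found.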

'''Fast page segmented array infinite generator (sieves odds-only)'''

The following code also theoretically has O(n log (log n)) execution performance and the same limited use on 32-bit execution platforms. It won't realize the theoretical complexity for larger ranges because the sieve buffer grows beyond the size of the CPU L1 cache; however, as the CPU L2 cache that it then automatically grows to use isn't any slower than the basic culling loop speed, it won't slow down much above that limit for ranges up to about 2.56e14, which would take on the order of weeks:

{{trans|Kotlin}}
<syntaxhighlight lang="dart">import 'dart:typed_data';
import 'dart:math';
import 'dart:collection';

// a lazy list
typedef _LazyList _Thunk();
class _LazyList<T> {
  final T head;
  _Thunk thunk;
  _LazyList<T> _rest;
  _LazyList(T this.head, _Thunk this.thunk);
  _LazyList<T> get rest {
    if (this.thunk != null) {
      this._rest = this.thunk();
      this.thunk = null;
    }
    return this._rest;
  }
}

class _LazyListIterable<T> extends IterableBase<T> {
  _LazyList<T> _first;
  _LazyListIterable(_LazyList<T> this._first);
  @override Iterator<T> get iterator {
    Iterable<T> inner() sync* {
      _LazyList<T> current = this._first;
      while (true) {
        yield(current.head);
        current = current.rest;
      }
    }
    return inner().iterator;
  }
}

// zero bit population count Look Up Table for 16-bit range...
final Uint8List CLUT =
  Uint8List.fromList(
    Iterable.generate(65536)
      .map((i) {
        final int v0 = ~i & 0xFFFF;
        final int v1 = v0 - ((v0 & 0xAAAA) >> 1);
        final int v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2);
        return (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) * 0x0101)) >> 8) & 31;
      })
      .toList());

int _countComposites(Uint8List cmpsts) {
  Uint16List buf = Uint16List.view(cmpsts.buffer);
  int lmt = buf.length;
  int count = 0;
  for (var i = 0; i < lmt; ++i) {
    count += CLUT[buf[i]];
  }
  return count;
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
Uint32List _composites2BasePrimeArray(int low, Uint8List cmpsts) {
  final int lmti = cmpsts.length << 3;
  final int len = _countComposites(cmpsts);
  final Uint32List rslt = Uint32List(len);
  int j = 0;
  for (int i = 0; i < lmti; ++i) {
    if (cmpsts[i >> 3] & 1 << (i & 7) == 0) {
      rslt[j++] = low + i + i;
    }
  }
  return rslt;
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
void _sieveComposites(int low, Uint8List buffer, Iterable<Uint32List> bpas) {
  final int lowi = (low - 3) >> 1;
  final int len = buffer.length;
  final int lmti = len << 3;
  final int nxti = lowi + lmti;
  for (var bpa in bpas) {
    for (var bp in bpa) {
      final int bpi = (bp - 3) >> 1;
      int strti = ((bpi * (bpi + 3)) << 1) + 3;
      if (strti >= nxti) return;
      if (strti >= lowi) strti = strti - lowi;
      else {
        strti = (lowi - strti) % bp;
        if (strti != 0) strti = bp - strti;
      }
      if (bp <= len >> 3 && strti <= lmti - bp << 6) {
        final int slmti = min(lmti, strti + bp << 3);
        for (var s = strti; s < slmti; s += bp) {
          final int msk = 1 << (s & 7);
          for (var c = s >> 3; c < len; c += bp) {
            buffer[c] |= msk;
          }
        }
      }
      else {
        for (var c = strti; c < lmti; c += bp) {
          buffer[c >> 3] |= 1 << (c & 7);
        }
      }
    }
  }
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
Iterable<Uint32List> _makeBasePrimeArrays() {
  var cmpsts = Uint8List(512);
  _LazyList<Uint32List> _nextelem(int low, Iterable<Uint32List> bpas) {
    // calculate size so that the bit span is at least as big as the
    // maximum culling prime required, rounded up to minsizebits blocks...
    final int rqdsz = 2 + sqrt((1 + low).toDouble()).toInt();
    final sz = (((rqdsz >> 12) + 1) << 9); // size in bytes
    if (sz > cmpsts.length) cmpsts = Uint8List(sz);
    cmpsts.fillRange(0, cmpsts.length, 0);
    _sieveComposites(low, cmpsts, bpas);
    final arr = _composites2BasePrimeArray(low, cmpsts);
    final nxt = low + (cmpsts.length << 4);
    return _LazyList(arr, () => _nextelem(nxt, bpas));
  }
  // pre-seeding breaks recursive race,
  // as only known base primes used for first page...
  final preseedarr = Uint32List.fromList( [ // pre-seed to 100, can sieve to 10,000...
    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
    , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ] );
  return _LazyListIterable(
           _LazyList(preseedarr,
                     () => _nextelem(101, _makeBasePrimeArrays()))
         );
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
Iterable<List> _makeSievePages() sync* {
  final bpas = _makeBasePrimeArrays(); // secondary source of base prime arrays
  int low = 3;
  Uint8List cmpsts = Uint8List(16384);
  _sieveComposites(3, cmpsts, bpas);
  while (true) {
    yield([low, cmpsts]);
    final rqdsz = 2 + sqrt((1 + low).toDouble()).toInt(); // problem: sqrt is not exact past about 10^12!
    final sz = ((rqdsz >> 17) + 1) << 14; // size in bytes
    if (sz > cmpsts.length) cmpsts = Uint8List(sz);
    cmpsts.fillRange(0, cmpsts.length, 0);
    low += cmpsts.length << 4;
    _sieveComposites(low, cmpsts, bpas);
  }
}

int countPrimesTo(int range) {
  if (range < 3) { if (range < 2) return 0; else return 1; }
  var count = 1;
  for (var sp in _makeSievePages()) {
    int low = sp[0]; Uint8List cmpsts = sp[1];
    if ((low + (cmpsts.length << 4)) > range) {
      int lsti = (range - low) >> 1;
      var lstw = (lsti >> 4); var lstb = lstw << 1;
      var msk = (-2 << (lsti & 15)) & 0xFFFF;
      var buf = Uint16List.view(cmpsts.buffer, 0, lstw);
      for (var i = 0; i < lstw; ++i)
        count += CLUT[buf[i]];
      count += CLUT[(cmpsts[lstb + 1] << 8) | cmpsts[lstb] | msk];
      break;
    } else {
      count += _countComposites(cmpsts);
    }
  }
  return count;
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
Iterable<int> primesPaged() sync* {
  yield(2);
  for (var sp in _makeSievePages()) {
    int low = sp[0]; Uint8List cmpsts = sp[1];
    var szbts = cmpsts.length << 3;
    for (var i = 0; i < szbts; ++i) {
      if (cmpsts[i >> 3].toInt() & (1 << (i & 7)) != 0) continue;
      yield(low + i + i);
    }
  }
}

void main() {
  final int range = 1000000000;
  String s = "( ";
  primesPaged().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${countPrimesTo(1000000)} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = countPrimesTo(range); // fast way
  // final answer = primesPaged().takeWhile((p)=>p<=range).length; // slow way using enumeration
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}</syntaxhighlight>
{{output}}
<pre>( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 50847534 primes found up to 1000000000.
This test bench took 9385 milliseconds.</pre>


This version counts the primes up to one billion at about 350 CPU clock cycles per prime under the Dart Virtual Machine (VM), which corresponds to about five seconds on a 3.6 Gigahertz CPU (the timing above was taken on a low-end 1.92 Gigahertz CPU).

Note that it takes about four times as long to do this using the provided primes generator/enumerator, as noted in the code. It is normal in all languages for enumerating the primes to take longer than sieving out the composite numbers, but Dart is somewhat slower than most at this.

The algorithm can be sped up by a factor of four by extreme wheel factorization and (likely) by about a factor of the effective number of CPU cores by using multi-processing isolates, but there isn't much point if one is to use the prime generator for output. For most purposes, it is better to use custom functions that directly manipulate the culled bit-packed page segments, as <code>countPrimesTo</code> does here.
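Counting directly on the bit-packed buffer, rather than enumerating primes one by one, can be sketched in Python as follows. This is an illustration under assumptions, not a translation of <code>countPrimesTo</code>: the names <code>ZERO_BITS</code>, <code>count_unmarked</code> and <code>sieve_odds</code> are hypothetical, the lookup table covers 8-bit rather than 16-bit values for brevity, and a single unsegmented odds-only buffer stands in for the page segments.

```python
# Lookup table: number of zero bits in each possible byte value.
ZERO_BITS = bytes(8 - bin(v).count("1") for v in range(256))


def count_unmarked(composites, nbits: int) -> int:
    """Count the zero bits among the first nbits bits of composites."""
    full_bytes, extra = divmod(nbits, 8)
    total = sum(ZERO_BITS[b] for b in composites[:full_bytes])
    if extra:
        # Force the unused high bits of the last byte to 1 so they don't count.
        last = composites[full_bytes] | ((0xFF << extra) & 0xFF)
        total += ZERO_BITS[last]
    return total


def sieve_odds(limit: int):
    """Odds-only composite bits for 3..limit (limit >= 3); bit i is 2*i + 3."""
    top = (limit - 3) >> 1
    buf = bytearray((top >> 3) + 1)
    i = 0
    while (2 * i + 3) ** 2 <= limit:
        if not buf[i >> 3] & (1 << (i & 7)):
            p = 2 * i + 3
            for j in range((p * p - 3) >> 1, top + 1, p):
                buf[j >> 3] |= 1 << (j & 7)
        i += 1
    return buf, top + 1


buf, nbits = sieve_odds(1_000_000)
print(1 + count_unmarked(buf, nbits))  # the extra 1 accounts for the prime 2
```

The masking of the final partial byte plays the same role as the <code>msk</code> value in the Dart code: bits beyond the requested range must not be counted as primes.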

=={{header|dc}}==

<syntaxhighlight lang="dc">[dn[,]n dsx [d 1 r :a lx + d ln!<.] ds.x lx] ds@
[sn 2 [d;a 0=@ 1 + d ln!<#] ds#x] se

100 lex</syntaxhighlight>
{{out}}
<pre>2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,\
97,</pre>


=={{header|Delphi}}==
<syntaxhighlight lang="delphi">program erathostenes;

{$APPTYPE CONSOLE}
Line 2,797: Line 5,653:
Sieve.Free;
readln;
end.</syntaxhighlight>
Output:
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 </pre>


=={{header|Draco}}==
<syntaxhighlight lang="draco">/* Sieve of Eratosthenes - fill a given boolean array */
proc nonrec sieve([*] bool prime) void:
word p, c, max;
max := dim(prime,1)-1;
prime[0] := false;
prime[1] := false;
for p from 2 upto max do prime[p] := true od;
for p from 2 upto max>>1 do
if prime[p] then
for c from p*2 by p upto max do
prime[c] := false
od
fi
od
corp

/* Print primes up to 1000 using the sieve */
proc nonrec main() void:
word MAX = 1000;
unsigned MAX i;
byte c;
[MAX+1] bool prime;
sieve(prime);

c := 0;
for i from 0 upto MAX do
if prime[i] then
write(i:4);
c := c + 1;
if c=10 then c:=0; writeln() fi
fi
od
corp</syntaxhighlight>
{{out}}
<pre> 2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997</pre>


=={{header|DWScript}}==

<syntaxhighlight lang="delphi">function Primes(limit : Integer) : array of Integer;
var
n, k : Integer;
Line 2,824: Line 5,732:
var i : Integer;
for i:=0 to r.High do
PrintLn(r[i]);</syntaxhighlight>


=={{header|Dylan}}==
With the outer loop limited to sqrt(n) and the inner loop starting at p^2:
<syntaxhighlight lang="dylan">define method primes(n)
let limit = floor(n ^ 0.5) + 1;
let sieve = make(limited(<simple-vector>, of: <boolean>), size: n + 1, fill: #t);
Line 2,851: Line 5,759:
if (sieve[x]) format-out("Prime: %d\n", x); end;
end;
end;</syntaxhighlight>



=={{header|E}}==
Line 2,888: Line 5,795:
# 7
# 11

=={{header|EasyLang}}==
<syntaxhighlight lang="text">
len is_divisible[] 100
max = sqrt len is_divisible[]
for d = 2 to max
if is_divisible[d] = 0
for i = d * d step d to len is_divisible[]
is_divisible[i] = 1
.
.
.
for i = 2 to len is_divisible[]
if is_divisible[i] = 0
print i
.
.</syntaxhighlight>


=={{header|eC}}==
{{incorrect|eC|It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}
Note: this is not a Sieve of Eratosthenes; it is just trial division.
<syntaxhighlight lang="cpp">
public class FindPrime
{
Line 2,936: Line 5,860:
}
}
</syntaxhighlight>

=={{header|EchoLisp}}==
===Sieve===
<syntaxhighlight lang="lisp">(require 'types) ;; bit-vector

;; converts sieve->list for integers in [nmin .. nmax[
Line 2,968: Line 5,893:
→ (1000003 1000033 1000037 1000039 1000081 1000099)
(s-next-prime s-primes 9_000_000)
→ 9000011</syntaxhighlight>


===Segmented sieve===
Allows the basic sieve (n) to be extended up to n^2. The memory requirement is O(√n).
<syntaxhighlight lang="scheme">;; ref : http://research.cs.wisc.edu/techreports/1990/TR909.pdf
;; delta multiple of sqrt(n)
;; segment is [left .. left+delta-1]
Line 3,005: Line 5,930:

;; 8 msec using the native (prime?) function
(for/list ((p (in-range 1_000_000_000 1_000_001_000))) #:when (prime? p) p)</syntaxhighlight>
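The segment idea can be sketched as follows in Python. This is an illustration under assumptions rather than a translation of the EchoLisp code: the names <code>small_primes</code> and <code>primes_in_segment</code> are hypothetical, and the segment uses one byte per number for clarity. Only the base primes up to the square root of the segment's upper end, plus the segment itself, have to be held in memory.

```python
from math import isqrt


def small_primes(n: int) -> list:
    """Plain sieve: all primes <= n, used as the base primes."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            for m in range(p * p, n + 1, p):
                flags[m] = 0
    return [i for i, f in enumerate(flags) if f]


def primes_in_segment(left: int, delta: int, base: list) -> list:
    """Primes in [left .. left+delta-1]; base must hold all primes <= sqrt(right)."""
    right = left + delta - 1
    flags = bytearray([1]) * delta
    for p in base:
        if p * p > right:
            break
        # First multiple of p inside the segment, but never p itself.
        first = max(p * p, (left + p - 1) // p * p)
        for m in range(first, right + 1, p):
            flags[m - left] = 0
    return [left + i for i, f in enumerate(flags) if f and left + i >= 2]


base = small_primes(1000)          # enough to sieve any segment up to 10**6
print(primes_in_segment(100, 50, base))
```

With base primes up to n, any segment whose upper end does not exceed n^2 can be sieved this way, which matches the range stated above.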


===Wheel===
A 2x3 wheel gives a 50% performance gain.
<syntaxhighlight lang="scheme">;; 2x3 wheel
(define (weratosthenes n)
(define primes (make-bit-vector n )) ;; everybody to #f (false)
Line 3,025: Line 5,950:
(for ([j (in-range (* p p) n p)])
(bit-vector-set! primes j #f)))
primes)</syntaxhighlight>
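A 2x3 wheel relies on the fact that every prime above 3 has the form 6k ± 1, so only one third of all numbers ever need to be considered. The following Python sketch shows one way to arrange that; it is an illustration under assumptions (the name <code>wheel_sieve</code> is hypothetical, and the details need not match the EchoLisp code), starting with everything unmarked and flagging only the wheel candidates.

```python
def wheel_sieve(n: int) -> list:
    """Primes < n, visiting only candidates of the form 6k +/- 1."""
    is_prime = bytearray(n)          # all zero: nothing flagged as prime yet
    for c in (2, 3):
        if c < n:
            is_prime[c] = 1
    # Beyond 3, only numbers congruent to 5 or 1 modulo 6 can be prime.
    for c in range(5, n, 6):
        is_prime[c] = 1
    for c in range(7, n, 6):
        is_prime[c] = 1
    p, step = 5, 2
    while p * p < n:
        if is_prime[p]:
            # Odd multiples only: even numbers were never flagged.
            for m in range(p * p, n, 2 * p):
                is_prime[m] = 0
        p += step
        step = 6 - step              # alternate +2, +4 around the wheel
    return [i for i in range(n) if is_prime[i]]


print(wheel_sieve(50))
```

The culling passes for 2 and 3, which are the two most expensive ones in a plain sieve, disappear entirely in this arrangement.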

=={{header|EDSAC order code}}==
This sieve program is based on one by Eiiti Wada, which on 2020-07-05 could be found at https://www.dcs.warwick.ac.uk/~edsac/

The main external change is that the program is not designed to be viewed in the monitor; it just writes as many primes as possible within the limitations imposed by Rosetta Code. Apart from the addition of comments, internal changes include the elimination of one set of masks, and a revised method of switching from one mask to another.

On the EdsacPC simulator (see link above) the printout starts off very slowly, and gradually gets faster.
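The program below keeps its sieve as a bit array packed into 35-bit long words and uses a table of 35 masks, each with 34 ones and a single zero, as its memory-usage comment describes. The following Python sketch mirrors only that idea and is not a transcription of the order code: the names <code>WORD_BITS</code>, <code>MASKS</code>, <code>cross_out</code>, <code>is_crossed_out</code> and <code>mask_sieve</code> are assumptions made for illustration, as are the bit numbering and the choice to start crossing out at p*p.

```python
WORD_BITS = 35
ALL_ONES = (1 << WORD_BITS) - 1
# 35 masks: every bit set except bit r.
MASKS = [ALL_ONES ^ (1 << r) for r in range(WORD_BITS)]


def cross_out(words: list, p: int) -> None:
    """Clear the bit for p by AND-ing its word with the matching mask."""
    q, r = divmod(p, WORD_BITS)
    words[q] &= MASKS[r]


def is_crossed_out(words: list, p: int) -> bool:
    """OR-ing with the mask gives all ones only if the bit for p is still set."""
    q, r = divmod(p, WORD_BITS)
    return (words[q] | MASKS[r]) != ALL_ONES


def mask_sieve(limit: int) -> list:
    """Primes <= limit using a word-packed bit array, all bits initially 1."""
    words = [ALL_ONES] * (limit // WORD_BITS + 1)
    primes = []
    for p in range(2, limit + 1):
        if is_crossed_out(words, p):
            continue
        primes.append(p)
        for m in range(p * p, limit + 1, p):
            cross_out(words, m)
    return primes


print(mask_sieve(100))
```

Writing p as 35*q + r, as the program's variable comments do, selects word q and mask r, so one precomputed table serves every word of the bit array.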
<syntaxhighlight lang="edsac">
[Sieve of Eratosthenes]
[EDSAC program. Initial Orders 2]

[Memory usage:
56..87 library subroutine P6, for printing
88..222 main program
224..293 mask table: 35 long masks; each has 34 1's and a single 0
294..1023 array of bits for integers 2, 3, 4, ...,
where bit is changed from 1 to 0 when integer is crossed out.
The address of the mask table must be even, and clear of the main program.
To change it, just change the value after "T47K" below.
The address of the bit array will then be changed automatically.]
[Subroutine M3, prints header, terminated by blank row of tape.
It's an "interlude", which runs and then gets overwritten.]
PFGKIFAFRDLFUFOFE@A6FG@E8FEZPF
@&*SIEVE!OF!ERATOSTHENES!#2020
@&*BASED!ON!CODE!BY!EIITI!WADA!#2001
..PZ

[Subroutine P6, prints strictly positive integer.
32 locations; working locations 1, 4, 5.]
T 56 K
GKA3FT25@H29@VFT4DA3@TFH30@S6@T1FV4DU4DAFG26@TFTF
O5FA4DF4FS4FL4FT4DA1FS3@G9@EFSFO31@E20@J995FJF!F

[Store address of mask table in (say) location 47
(chosen because its code letter M is first letter of "Mask").
Address must be even and clear of main program.]
T 47 K
P 224 F [<-------- address of mask table]

[Main program]
T 88 K [Define load address for main program.
Must be even, because of long values at start.]
G K [set @ (theta) for relative addressing]

[Long constants]
T#Z PF TZ [clears sandwich digit between 0 and 1]
[0] PD PF [long value 1; also low word = short 1]
T2#Z PF T2Z [clears sandwich digit between 2 and 3]
[2] PF K4096F [long value 1000...000 binary;
also high word = teleprinter null]

[Short constants
The address in the following C order is the (exclusive) end of the bit table.
Must be even: max = 1024, min = M + 72 where M is address of mask table set up above.
Usually 1024, but may be reduced, e.g. to make the program run faster.]
[4] C1024 D [or e.g. C 326 D to make it much faster]
[5] U F ['U' = 'T' - 'C']
[6] K F ['K' = 'S' - 'C']
[7] H #M [H order for start of mask table]
[8] H 70#M [used to test for end of mask table]
[9] P 2 F [constant 4, or 2 in address field]
[10] P 70 F [constant 140, or 70 in address field]
[11] @ F [carriage return]
[12] & F [line feed]

[Short variables]
[13] P 1 F [p = number under test
Let p = 35*q + r, where 0 <= r < 35]
[14] P F [4*q]
[15] P 4 F [4*r]

[Initial values of orders; required only for optional code below.]
[16] C 70#M [initial value of a variable C order]
[17] T #M [initial value of a variable T order]
[18] T 70#M [initial value of a variable T order]

[19]
[Enter with acc = 0]

[Optional code to do some initializing at run time.
This code allows the program to run again without being loaded again.]
A 7 @ [initial values of variable orders]
T 65 @
A 16 @
T 66 @
A 17 @
T 44 @
A 18 @
T 52 @

[Initialize variables]
A @ [load 1 (short)]
L D [shift left 1]
U 13 @ [p := 2]
L 1 F [shift left 2]
T 15 @ [4*r := 8]
T 14 @ [4*q := 0]
[End of optional code]

[Make table of 35 masks 111...110, 111...101, ..., 011...111
Treat the mask 011...111 separately to avoid accumulator overflow.
Assume acc = 0 here.]
S #@ [acc all 1's]
S 2 #@ [acc := 0111...111]
[35] T 68 #M [store at high end of mask table]
S #@ [acc := -1]
L D [make mask 111...1110]
G 43 @ [jump to store it]
[Loop shifting the mask right and storing the result in the mask table.
Uses first entry of bit array as temporary store.]
[39] T F [clear acc]
A 70 #M [load previous mask]
L D [shift left]
A #@ [add 1]
[43] U 70 #M [update current mask]
[44] T #M [store it in table (order changed at run time)]
A 44 @ [load preceding T order]
A 9 @ [inc address by 2]
U 44 @ [store back]
S 35 @ [reached high entry yet?]
G 39 @ [loop back if not]
[Mask table is now complete]

[Initialize bit array: no numbers crossed out, so all bits are 1]
[50] T F [clear acc]
S #@ [subtract long 1, make top 35 bits all 1's]
[52] T 70 #M [store as long value, both words all 1's (order changed at run time)]
A 52 @ [load preceding order]
A 9 @ [add 2 to address field]
U 52 @ [and store back]
S 5 @ [convert to C order with same address (*)]
S 4 @ [test for end of bit array]
G 50 @ [loop until stored all 1's in bit table]
[(*) Done so that end of bit table can be stored at one place only
in list of constants, i.e. 'C m D' only, not 'T m D' as well.]

[Start of main loop.]
[Testing whether number has been crossed out]
[59] T F [acc := 0]
A 66 @ [deriving S order from C order]
A 6 @
T 64 @
S #@ [acc := -1]
[64] S F [acc := 1's complement of bit-table entry (order changed at run time)]
[65] H #M [mult reg := start of mask array (order changed at run time)]
[66] C 70#M [acc := -1 iff p (current number) is crossed out (order changed at run time)]
[The next order is to avoid accumulator overflow if acc = max positive number]
E 70 @ [if acc >= 0, jump to process new prime]
A #@ [if acc < 0, add 1 to test for -1]
E 106 @ [if acc now >= 0 number is crossed out, jump to test next]
[Here if new prime found.
Send it to the teleprinter]
[70] O 11 @ [print CR]
O 12 @ [print LF]
T F [clear acc]
A 13 @ [load prime]
T F [store in C(0) for print routine]
A 75 @ [for subroutine return]
G 56 F [print prime]

[Cross out its multiples by setting corresponding bits to 0]
A 65 @ [load H order above]
T 102 @ [plant in crossing-out loop]
A 66 @ [load C order above]
T 103 @ [plant in crossing-out loop]

[Start of crossing-out loop. Here acc must = 0]
[81] A 102 @ [load H order below]
A 15 @ [inc address field by 2*r, where p = 35q + r]
U 102 @ [update H order]
S 8 @ [compare with 'H 70 #M']
G 93 @ [skip if not gone beyond end of mask table]
T F [wrap mask address and inc address in bit table]
A 102 @ [load H order below]
S 10 @ [reduce mask address by 70]
T 102 @ [update H order]
A 103 @ [load C order below]
A 9 @ [add 2 to address]
T 103 @ [update C order]
[93] T F [clear acc]
A 103 @ [load C order below]
A 14 @ [inc address field by 2*q, where p = 35q + r]
U 103 @ [update C order]
S 4 @ [test for end of bit array]
E 106 @ [if finished crossing out, loop to test next number]
A 4 @ [restore C order]
A 5 @ [make T order with same address]
T 104 @ [store below]

[Execute the crossing-out orders created above]
[102] X F [mult reg := mask (order created at run time)]
[103] X F [acc := logical and with bit-table entry (order created at run time)]
[104] X F [update entry (order created at run time)]
E 81 @ [loop back with acc = 0]

[106] T F [clear acc]
A 13 @ [load p = number under test]
A @ [add 1 (single)]
T 13 @ [update]
A 15 @ [load 4*r, where p = 35q + r]
A 9 @ [add 4]
U 15 @ [store back (r inc'd by 1)]
S 10 @ [is 4*r now >= 140?]
G 119 @ [no, skip]
T 15 @ [yes, reduce 4*r by 140]
A 14 @ [load 4*q]
A 9 @ [add 4]
T 14 @ [store back (q inc'd by 1)]
[119] T F [clear acc]
A 65 @ [load 'H ... D' order, which refers to a mask]
A 9 @ [inc mask address by 2]
U 65 @ [update order]
S 8 @ [over end of mask table?]
G 59 @ [no, skip wrapround code]
A 7 @ [yes, add constant to wrap round]
T 65 @ [update H order]
A 66 @
A 9 @ [inc address by 2]
U 66 @ [and store back]
S 4 @ [test for end, as defined by C order at start]
G 59 @ [loop back if not at end]

[Finished whole thing]
[132] O 3 @ [output null to flush teleprinter buffer]
Z F [stop]
E 19 Z [address to start execution]
P F [acc = 0 at start]
</syntaxhighlight>
{{out}}
<pre>
SIEVE OF ERATOSTHENES 2020
BASED ON CODE BY EIITI WADA 2001
2
3
5
7
11
13
17
[...]
12703
12713
12721
12739
12743
12757
12763
</pre>
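
As a reading aid only, the bit-packing scheme described in the EDSAC comments (one bit per number, 35 bits per long word, a table of 35 single-zero masks, crossing out by logical AND) can be sketched in Python. The names <code>packed_sieve</code>, <code>WORD_BITS</code>, <code>words</code> and <code>masks</code> are assumptions made for this illustration and do not appear in the EDSAC source. The sketch also uses <code>divmod</code> where the EDSAC program steps the word and mask addresses incrementally to avoid division.

```python
WORD_BITS = 35  # number of sieve bits kept per word, as in p = 35*q + r


def packed_sieve(limit):
    """Return the primes <= limit, storing one bit per number, 35 bits per word."""
    all_ones = (1 << WORD_BITS) - 1
    # A set bit means "not crossed out"; every number starts uncrossed.
    words = [all_ones] * (limit // WORD_BITS + 1)
    # masks[r] has every bit set except bit r (111...110, 111...101, ...).
    masks = [all_ones ^ (1 << r) for r in range(WORD_BITS)]
    primes = []
    for p in range(2, limit + 1):
        q, r = divmod(p, WORD_BITS)
        if words[q] & (1 << r):  # bit still set, so p is prime
            primes.append(p)
            # Cross out 2p, 3p, ... by ANDing with the matching mask.
            for m in range(2 * p, limit + 1, p):
                mq, mr = divmod(m, WORD_BITS)
                words[mq] &= masks[mr]
    return primes


print(packed_sieve(50))
```
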


=={{header|Eiffel}}==
=={{header|Eiffel}}==
{{works with|EiffelStudio|6.6 beta (with provisional loop syntax)}}
{{works with|EiffelStudio|6.6 beta (with provisional loop syntax)}}
<lang eiffel>class
<syntaxhighlight lang="eiffel">class
APPLICATION
APPLICATION
Line 3,063: Line 6,237:
end
end
end
end
end</lang>
end</syntaxhighlight>


Output:
Output:
Line 3,071: Line 6,245:


=={{header|Elixir}}==
=={{header|Elixir}}==
<lang elixir>defmodule Prime do
<syntaxhighlight lang="elixir">defmodule Prime do
def eratosthenes(limit \\ 1000) do
def eratosthenes(limit \\ 1000) do
sieve = [false, false | Enum.to_list(2..limit)] |> List.to_tuple
sieve = [false, false | Enum.to_list(2..limit)] |> List.to_tuple
Line 3,095: Line 6,269:
if x=elem(sieve, n), do: :io.format("~3w", [x]), else: :io.format(" .")
if x=elem(sieve, n), do: :io.format("~3w", [x]), else: :io.format(" .")
if rem(n+1, 20)==0, do: IO.puts ""
if rem(n+1, 20)==0, do: IO.puts ""
end)</lang>
end)</syntaxhighlight>


{{out}}
{{out}}
Line 3,113: Line 6,287:
Shorter version (but slow):
Shorter version (but slow):


<lang elixir>
<syntaxhighlight lang="elixir">
defmodule Sieve do
defmodule Sieve do
def primes_to(limit), do: sieve(Enum.to_list(2..limit))
def primes_to(limit), do: sieve(Enum.to_list(2..limit))
Line 3,120: Line 6,294:
defp sieve([]), do: []
defp sieve([]), do: []
end
end
</syntaxhighlight>
</lang>

'''Alternate much faster odds-only version more suitable for immutable data structures using a (hash) Map'''

The above code has a very limited useful range due to being very slow: for example, to sieve to a million, even changing the algorithm to odds-only, requires over 800 thousand "copy-on-update" operations of the entire saved immutable tuple ("array") of 500 thousand bytes in size, making it very much a "toy" application. The following code overcomes that problem by using a (immutable/hashed) Map to store the record of the current state of the composite number chains resulting from each of the secondary streams of base primes, which are only 167 in number up to this range; it is a functional "incremental" Sieve of Eratosthenes implementation:
<syntaxhighlight lang="elixir">defmodule PrimesSoEMap do
@typep stt :: {integer, integer, integer, Enumerable.integer, %{integer => integer}}

@spec advance(stt) :: stt
defp advance {n, bp, q, bps?, map} do
bps = if bps? === nil do Stream.drop(oddprms(), 1) else bps? end
nn = n + 2
if nn >= q do
inc = bp + bp
nbps = bps |> Stream.drop(1)
[nbp] = nbps |> Enum.take(1)
advance {nn, nbp, nbp * nbp, nbps, map |> Map.put(nn + inc, inc)}
else if Map.has_key?(map, nn) do
{inc, rmap} = Map.pop(map, nn)
[next] =
Stream.iterate(nn + inc, &(&1 + inc))
|> Stream.drop_while(&(Map.has_key?(rmap, &1))) |> Enum.take(1)
advance {nn, bp, q, bps, Map.put(rmap, next, inc)}
else
{nn, bp, q, bps, map}
end end
end

@spec oddprms() :: Enumerable.integer
defp oddprms do # put first base prime cull seq in Map so never empty
# advance base odd primes to 5 when initialized
init = {7, 5, 25, nil, %{9 => 6}}
[3, 5] # to avoid race, preseed with the first 2 elements...
|> Stream.concat(
Stream.iterate(init, &(advance &1))
|> Stream.map(fn {p,_,_,_,_} -> p end))
end

@spec primes() :: Enumerable.integer
def primes do
Stream.concat([2], oddprms())
end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoEMap.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
fn () ->
ans =
PrimesSoEMap.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
ans end
:timer.tc(testfunc)
|> (fn {t,ans} ->
IO.puts "There are #{ans} primes up to #{range}."
IO.puts "This test bench took #{t} microseconds." end).()</syntaxhighlight>
{{output}}
<pre>The first 25 primes are:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes up to 1000000.
This test bench took 3811957 microseconds.</pre>

The run time of about 3.81 seconds to sieve to one million was measured on a 1.92 Gigahertz CPU, which works out to about 93 thousand CPU clock cycles per prime. That is still quite slow compared to mutable data structure implementations but comparable to "functional" implementations in other languages; most of the time goes into calculating the required hashes. One advantage is its O(n log (log n)) asymptotic computational complexity, meaning that it takes not much more than ten times as long to sieve a range ten times higher.
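
For readers less familiar with Elixir, the idea of keeping the culling state in a map can be sketched in Python with a <code>dict</code>. This is a simplified illustration under stated assumptions, not a translation: the name <code>primes_map</code> is hypothetical, and the sketch enters every prime into the map as soon as it is found, whereas the Elixir code above defers each base prime through a secondary stream until the candidate reaches its square, which keeps the map much smaller.

```python
from itertools import count, islice


def primes_map():
    """Yield primes indefinitely; a dict maps each pending odd composite to its step."""
    yield 2
    composites = {}  # next odd composite -> 2 * (base prime that produces it)
    for n in count(3, 2):
        step = composites.pop(n, None)
        if step is None:
            # No culling chain landed on n, so n is prime.
            yield n
            composites[n * n] = 2 * n  # start this prime's chain at its square
        else:
            # n is composite: move its chain to the next free odd multiple.
            nxt = n + step
            while nxt in composites:  # keep exactly one chain per key
                nxt += step
            composites[nxt] = step


print(list(islice(primes_map(), 25)))
```
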

This algorithm could be easily changed to use a Priority Queue (preferably Min-Heap based for the least constant factor computational overhead) to save some of the computation time, but then it will have the same computational complexity as the following code and likely about the same execution time.

'''Alternate faster odds-only version more suitable for immutable data structures using lazy Streams of Co-Inductive Streams'''

In order to save the computation time of computing the hashes, the following version uses a deferred execution Co-Inductive Stream type (constructed using Tuple's) in an infinite tree folding structure (by the `pairs` function):
<syntaxhighlight lang="elixir">defmodule PrimesSoETreeFolding do
@typep cis :: {integer, (() -> cis)}
@typep ciss :: {cis, (() -> ciss)}

@spec merge(cis, cis) :: cis
defp merge(xs, ys) do
{x, restxs} = xs; {y, restys} = ys
cond do
x < y -> {x, fn () -> merge(restxs.(), ys) end}
y < x -> {y, fn () -> merge(xs, restys.()) end}
true -> {x, fn () -> merge(restxs.(), restys.()) end}
end
end

@spec smlt(integer, integer) :: cis
defp smlt(c, inc) do
{c, fn () -> smlt(c + inc, inc) end}
end

@spec smult(integer) :: cis
defp smult(p) do
smlt(p * p, p + p)
end
@spec allmults(cis) :: ciss
defp allmults {p, restps} do
{smult(p), fn () -> allmults(restps.()) end}
end

@spec pairs(ciss) :: ciss
defp pairs {cs0, restcss0} do
{cs1, restcss1} = restcss0.()
{merge(cs0, cs1), fn () -> pairs(restcss1.()) end}
end

@spec cmpsts(ciss) :: cis
defp cmpsts {cs, restcss} do
{c, restcs} = cs
{c, fn () -> merge(restcs.(), cmpsts(pairs(restcss.()))) end}
end

@spec minusat(integer, cis) :: cis
defp minusat(n, cmps) do
{c, restcs} = cmps
if n < c do
{n, fn () -> minusat(n + 2, cmps) end}
else
minusat(n + 2, restcs.())
end
end

@spec oddprms() :: cis
defp oddprms() do
{3, fn () ->
{5, fn () -> minusat(7, cmpsts(allmults(oddprms()))) end}
end}
end

@spec primes() :: Enumerable.t
def primes do
[2] |> Stream.concat(
Stream.iterate(oddprms(), fn {_, restps} -> restps.() end)
|> Stream.map(fn {p, _} -> p end)
)
end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoETreeFolding.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
fn () ->
ans =
PrimesSoETreeFolding.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
ans end
:timer.tc(testfunc)
|> (fn {t,ans} ->
IO.puts "There are #{ans} primes up to #{range}."
IO.puts "This test bench took #{t} microseconds." end).()</syntaxhighlight>

Its output is identical to that of the previous version except that the time required is less than half; however, it has O(n (log n) (log (log n))) asymptotic computational complexity, meaning that its run time grows faster with range than that of the above version. That said, the range would have to reach the billions, with run times of hours, before the two took about the same time.

=={{header|Elm}}==

===Elm with immutable arrays===
<syntaxhighlight lang="elm">
module PrimeArray exposing (main)

import Array exposing (Array, foldr, map, set)
import Html exposing (div, h1, p, text)
import Html.Attributes exposing (style)


{-
The Eratosthenes sieve task in Rosetta Code does not accept the use of the modulo function (although the Elm functions modBy and remainderBy always work correctly, as they require type Int, excluding type Float). Thus the solution needs an indexed work array, as Elm has no indexes for lists.

In this method we need no division remainder calculations, as we just set the markings of non-primes into the array. We need the indexes that we know, where the marking of the non-primes shall be set.

Because everything is immutable in Elm, every change of array values will create a new array, leaving the original array unchanged. That makes the program run slower or consume more memory than in non-functional imperative languages. All conventional loops (for, while, until) are excluded in Elm because of the immutability requirement.

Live: https://ellie-app.com/pTHJyqXcHtpa1
-}


alist =
List.range 2 150



-- Work array contains integers 2 ... 150


workArray =
Array.fromList alist


n : Int
n =
List.length alist



-- The max index of integers used in search for primes
-- limit * limit < n
-- equal: limit <= √n


limit : Int
limit =
round (0.5 + sqrt (toFloat n))

-- Remove zero cells of the array


findZero : Int -> Bool
findZero =
\el -> el > 0


zeroFree : Array Int
zeroFree =
Array.filter findZero workResult


nrFoundPrimes =
Array.length zeroFree


workResult : Array Int
workResult =
loopI 2 limit workArray



{- As Elm has no loops (for, while, until)
we must use recursion instead!
The search for primes always starts by saving the
first found value (not setting zero) and continues by setting the multiples of the prime to zero.
Zero is not one of the listed integers and may thus be used as the marking of non-prime numbers. At the end, only the primes remain in the array and the zeroes are removed from the resulting array to be shown in Html.
-}

-- The recursion increasing variable i follows:

loopI : Int -> Int -> Array Int -> Array Int
loopI i imax arr =
if i > imax then
arr

else
let
arr2 =
phase i arr
in
loopI (i + 1) imax arr2



-- The helper function


phase : Int -> Array Int -> Array Int
phase i =
arrayMarker i (2 * i - 2) n


lastPrime =
Maybe.withDefault 0 <| Array.get (nrFoundPrimes - 1) zeroFree


outputArrayInt : Array Int -> String
outputArrayInt arr =
decorateString <|
foldr (++) "" <|
Array.map (\x -> String.fromInt x ++ " ") arr


decorateString : String -> String
decorateString str =
"[ " ++ str ++ "]"



-- Recursively marking the multiples of p with zero
-- This loop operates with constant p


arrayMarker : Int -> Int -> Int -> Array Int -> Array Int
arrayMarker p min max arr =
let
arr2 =
set min 0 arr

min2 =
min + p
in
if min < max then
arrayMarker p min2 max arr2

else
arr


main =
div [ style "margin" "2%" ]
[ h1 [] [ text "Sieve of Eratosthenes" ]
, text ("List of integers [2, ... ," ++ String.fromInt n ++ "]")
, p [] [ text ("Total integers " ++ String.fromInt n) ]
, p [] [ text ("Max prime of search " ++ String.fromInt limit) ]
, p [] [ text ("The largest found prime " ++ String.fromInt lastPrime) ]
, p [ style "color" "blue", style "font-size" "1.5em" ]
[ text (outputArrayInt zeroFree) ]
, p [] [ text ("Found " ++ String.fromInt nrFoundPrimes ++ " primes") ]
]

</syntaxhighlight>

{{out}}
<pre>
List of integers [2, ... ,149]

Total integers 149

Max prime of search 13

The largest found prime 149

[ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 ]

Found 35 primes </pre>

===Concise Elm Immutable Array Version===

Although functional, the above code is written in quite an imperative style, so the following code is written in a more concise functional style and includes timing information for counting the number of primes to a million:

<syntaxhighlight lang="elm">module Main exposing (main)

import Browser exposing (element)
import Task exposing (Task, succeed, perform, andThen)
import Html exposing (Html, div, text)
import Time exposing (now, posixToMillis)

import Array exposing (repeat, get, set)

cLIMIT : Int
cLIMIT = 1000000

primesArray : Int -> List Int
primesArray n =
if n < 2 then [] else
let
sz = n + 1
loopbp bp arr =
let s = bp * bp in
if s >= sz then arr else
let tst = get bp arr |> Maybe.withDefault True in
if tst then loopbp (bp + 1) arr else
let
cullc c iarr =
if c >= sz then iarr else
cullc (c + bp) (set c True iarr)
in loopbp (bp + 1) (cullc s arr)
cmpsts = loopbp 2 (repeat sz False)
cnvt (i, t) = if t then Nothing else Just i
in cmpsts |> Array.toIndexedList
|> List.drop 2 -- skip the values for zero and one
|> List.filterMap cnvt -- primes are indexes of not composites

type alias Model = List String

type alias Msg = Model

test : (Int -> List Int) -> Int -> Cmd Msg
test primesf lmt =
let
to100 = primesf 100 |> List.map String.fromInt |> String.join ", "
to100str = "The primes to 100 are: " ++ to100
timemillis() = now |> andThen (succeed << posixToMillis)
in timemillis() |> andThen (\ strt ->
let cnt = primesf lmt |> List.length
in timemillis() |> andThen (\ stop ->
let answrstr = "Found " ++ (String.fromInt cnt) ++ " primes to "
++ (String.fromInt cLIMIT) ++ " in "
++ (String.fromInt (stop - strt)) ++ " milliseconds."
in succeed [to100str, answrstr] ) ) |> perform identity

main : Program () Model Msg
main =
element { init = \ _ -> ( [], test primesArray cLIMIT )
, update = \ msg _ -> (msg, Cmd.none)
, subscriptions = \ _ -> Sub.none
, view = div [] << List.map (div [] << List.singleton << text) }</syntaxhighlight>
{{out}}
<pre>The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 958 milliseconds.</pre>

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

===Concise Elm Immutable Array Odds-Only Version===

The following code can replace the `primesArray` function in the above program and be called from the testing and display code (two places):

<syntaxhighlight lang="elm">primesArrayOdds : Int -> List Int
primesArrayOdds n =
if n < 2 then [] else
let
sz = (n - 1) // 2
loopi i arr =
let s = (i + i) * (i + 3) + 3 in
if s >= sz then arr else
let tst = get i arr |> Maybe.withDefault True in
if tst then loopi (i + 1) arr else
let
bp = i + i + 3
cullc c iarr =
if c >= sz then iarr else
cullc (c + bp) (set c True iarr)
in loopi (i + 1) (cullc s arr)
cmpsts = loopi 0 (repeat sz False)
cnvt (i, t) = if t then Nothing else Just <| i + i + 3
oddprms = cmpsts |> Array.toIndexedList |> List.filterMap cnvt
in 2 :: oddprms</syntaxhighlight>
{{out}}
<pre>The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 371 milliseconds.</pre>

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).
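
The index arithmetic of the odds-only version may be easier to follow in a short Python sketch. Index <code>i</code> stands for the odd number <code>2i + 3</code>, so the square of that number sits at index <code>((2i + 3)^2 - 3) / 2 = 2i(i + 3) + 3</code>, and stepping the index by the prime itself skips the even multiples. The function name <code>primes_odds_only</code> and the use of a mutable <code>bytearray</code> are assumptions made for this illustration; the Elm code above uses an immutable Array instead.

```python
def primes_odds_only(n):
    """Return the primes <= n, sieving only the odd numbers 3, 5, 7, ..."""
    if n < 2:
        return []
    size = (n - 1) // 2          # index i represents the odd number 2*i + 3
    composite = bytearray(size)  # 0 = not yet crossed out
    i = 0
    while True:
        start = (i + i) * (i + 3) + 3  # index of (2*i + 3) squared
        if start >= size:
            break
        if not composite[i]:
            bp = i + i + 3
            # A step of bp in index space is a step of 2*bp in number space.
            for c in range(start, size, bp):
                composite[c] = 1
        i += 1
    return [2] + [i + i + 3 for i in range(size) if not composite[i]]


print(primes_odds_only(100))
```
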

===Richard Bird Tree Folding Elm Version===

The Elm language doesn't efficiently handle the Sieve of Eratosthenes (SoE) algorithm because it doesn't have directly accessible linear arrays (the Array module used above is based on a persistent tree of sub arrays) and also does Copy On Write (COW) for every write to every location as well as a logarithmic process of updating as a "tree" to minimize the COW operations. Thus, there is better performance implementing the Richard Bird Tree Folding functional algorithm, as follows:
{{trans|Haskell}}
<syntaxhighlight lang="elm">module Main exposing (main)

import Browser exposing (element)
import Task exposing (Task, succeed, perform, andThen)
import Html exposing (Html, div, text)
import Time exposing (now, posixToMillis)

cLIMIT : Int
cLIMIT = 1000000

type CIS a = CIS a (() -> CIS a)

uptoCIS2List : comparable -> CIS comparable -> List comparable
uptoCIS2List n cis =
let loop (CIS hd tl) lst =
if hd > n then List.reverse lst
else loop (tl()) (hd :: lst)
in loop cis []

countCISTo : comparable -> CIS comparable -> Int
countCISTo n cis =
let loop (CIS hd tl) cnt =
if hd > n then cnt else loop (tl()) (cnt + 1)
in loop cis 0

primesTreeFolding : () -> CIS Int
primesTreeFolding() =
let
merge (CIS x xtl as xs) (CIS y ytl as ys) =
case compare x y of
LT -> CIS x <| \ () -> merge (xtl()) ys
EQ -> CIS x <| \ () -> merge (xtl()) (ytl())
GT -> CIS y <| \ () -> merge xs (ytl())
pmult bp =
let adv = bp + bp
pmlt p = CIS p <| \ () -> pmlt (p + adv)
in pmlt (bp * bp)
allmlts (CIS bp bptl) =
CIS (pmult bp) <| \ () -> allmlts (bptl())
pairs (CIS frst tls) =
let (CIS scnd tlss) = tls()
in CIS (merge frst scnd) <| \ () -> pairs (tlss())
cmpsts (CIS (CIS hd tl) tls) =
CIS hd <| \ () -> merge (tl()) <| cmpsts <| pairs (tls())
testprm n (CIS hd tl as cs) =
if n < hd then CIS n <| \ () -> testprm (n + 2) cs
else testprm (n + 2) (tl())
oddprms() =
CIS 3 <| \ () -> testprm 5 <| cmpsts <| allmlts <| oddprms()
in CIS 2 <| \ () -> oddprms()

type alias Model = List String

type alias Msg = Model

test : (() -> CIS Int) -> Int -> Cmd Msg
test primesf lmt =
let
to100 = primesf() |> uptoCIS2List 100
|> List.map String.fromInt |> String.join ", "
to100str = "The primes to 100 are: " ++ to100
timemillis() = now |> andThen (succeed << posixToMillis)
in timemillis() |> andThen (\ strt ->
let cnt = primesf() |> countCISTo lmt
in timemillis() |> andThen (\ stop ->
let answrstr = "Found " ++ (String.fromInt cnt) ++ " primes to "
++ (String.fromInt cLIMIT) ++ " in "
++ (String.fromInt (stop - strt)) ++ " milliseconds."
in succeed [to100str, answrstr] ) ) |> perform identity

main : Program () Model Msg
main =
element { init = \ _ -> ( [], test primesTreeFolding cLIMIT )
, update = \ msg _ -> (msg, Cmd.none)
, subscriptions = \ _ -> Sub.none
, view = div [] << List.map (div [] << List.singleton << text) }</syntaxhighlight>
{{out}}
<pre>The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 201 milliseconds.</pre>

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

===Elm Priority Queue Version===

Using a Binary Minimum Heap Priority Queue is a constant factor faster than the above code, as the data structure is balanced rather than "heavy to the right" and requires fewer memory allocations/deallocations. The following code implements just enough of the Priority Queue for the purpose. Substitute it for `primesTreeFolding` and pass `primesPQ` as an argument to `test` rather than `primesTreeFolding`:

<syntaxhighlight lang="elm">type PriorityQ comparable v =
Mt
| Br comparable v (PriorityQ comparable v)
(PriorityQ comparable v)

emptyPQ : PriorityQ comparable v
emptyPQ = Mt

peekMinPQ : PriorityQ comparable v -> Maybe (comparable, v)
peekMinPQ pq = case pq of
(Br k v _ _) -> Just (k, v)
Mt -> Nothing

pushPQ : comparable -> v -> PriorityQ comparable v
-> PriorityQ comparable v
pushPQ wk wv pq =
case pq of
Mt -> Br wk wv Mt Mt
(Br vk vv pl pr) ->
if wk <= vk then Br wk wv (pushPQ vk vv pr) pl
else Br vk vv (pushPQ wk wv pr) pl

siftdown : comparable -> v -> PriorityQ comparable v
-> PriorityQ comparable v -> PriorityQ comparable v
siftdown wk wv pql pqr =
case pql of
Mt -> Br wk wv Mt Mt
(Br vkl vvl pll prl) ->
case pqr of
Mt -> if wk <= vkl then Br wk wv pql Mt
else Br vkl vvl (Br wk wv Mt Mt) Mt
(Br vkr vvr plr prr) ->
if wk <= vkl && wk <= vkr then Br wk wv pql pqr
else if vkl <= vkr then Br vkl vvl (siftdown wk wv pll prl) pqr
else Br vkr vvr pql (siftdown wk wv plr prr)

replaceMinPQ : comparable -> v -> PriorityQ comparable v
-> PriorityQ comparable v
replaceMinPQ wk wv pq = case pq of
Mt -> Mt
(Br _ _ pl pr) -> siftdown wk wv pl pr

primesPQ : () -> CIS Int
primesPQ() =
let
sieve n pq q (CIS bp bptl as bps) =
if n >= q then
let adv = bp + bp in let (CIS nbp _ as nbps) = bptl()
in sieve (n + 2) (pushPQ (q + adv) adv pq) (nbp * nbp) nbps
else let
(nxtc, _) = peekMinPQ pq |> Maybe.withDefault (q, 0) -- default when empty
adjust tpq =
let (c, adv) = peekMinPQ tpq |> Maybe.withDefault (0, 0)
in if c > n then tpq
else adjust (replaceMinPQ (c + adv) adv tpq)
in if n >= nxtc then sieve (n + 2) (adjust pq) q bps
else CIS n <| \ () -> sieve (n + 2) pq q bps
oddprms() = CIS 3 <| \ () -> sieve 5 emptyPQ 9 <| oddprms()
in CIS 2 <| \ () -> oddprms()</syntaxhighlight>
{{out}}
<pre>The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 124 milliseconds.</pre>

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).
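
For comparison, the priority queue idea can be sketched in Python using the standard library's <code>heapq</code> module, where <code>heapq.heapreplace</code> plays roughly the role of <code>replaceMinPQ</code>. This is a hedged illustration rather than a translation: the name <code>primes_pq</code> is hypothetical, and the sketch pushes each prime's culling chain as soon as the prime is found, whereas the Elm code above feeds base primes from a secondary stream and pushes a chain only when the candidate reaches the base prime's square.

```python
import heapq
from itertools import islice


def primes_pq():
    """Yield primes indefinitely, keeping (next composite, step) pairs in a min-heap."""
    yield 2
    heap = []  # each entry: (next odd composite of a prime, 2 * that prime)
    n = 3
    while True:
        if heap and heap[0][0] == n:
            # n is composite: advance every chain currently sitting on n.
            while heap[0][0] == n:
                c, step = heap[0]
                heapq.heapreplace(heap, (c + step, step))
        else:
            yield n
            heapq.heappush(heap, (n * n, 2 * n))  # first cull is the square
        n += 2


print(list(islice(primes_pq(), 25)))
```
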


=={{header|Emacs Lisp}}==
=={{header|Emacs Lisp}}==
{{libheader|cl-lib}}
<lang lisp>
(defun sieve-set (limit)
<syntaxhighlight lang="lisp">(defun sieve-set (limit)
(let ((xs (make-vector (1+ limit) 0)))
(let ((xs (make-vector (1+ limit) 0)))
(loop for i from 2 to limit
(cl-loop for i from 2 to limit
when (zerop (aref xs i))
when (zerop (aref xs i))
collect i
collect i
and do (loop for m from (* i i) to limit by i
and do (cl-loop for m from (* i i) to limit by i
do (aset xs m 1)))))
do (aset xs m 1)))))</syntaxhighlight>
</lang>


Straightforward implementation of [http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Implementation sieve of Eratosthenes], 2 times faster:
Straightforward implementation of [http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Implementation sieve of Eratosthenes], 2 times faster:


<syntaxhighlight lang="lisp">(defun sieve (limit)
<lang lisp>
(defun sieve (limit)
(let ((xs (vconcat [0 0] (number-sequence 2 limit))))
(let ((xs (vconcat [0 0] (number-sequence 2 limit))))
(loop for i from 2 to (sqrt limit)
(cl-loop for i from 2 to (sqrt limit)
when (aref xs i)
when (aref xs i)
do (loop for m from (* i i) to limit by i
do (cl-loop for m from (* i i) to limit by i
do (aset xs m 0)))
do (aset xs m 0)))
(remove 0 xs)))
(remove 0 xs)))</syntaxhighlight>
</lang>


=={{header|Erlang}}==
=={{header|Erlang}}==
===Erlang using Dicts===
===Erlang using Dicts===
<syntaxhighlight lang="erlang">
{{incorrect|Erlang|See talk page.}}
<lang Erlang>
-module( sieve_of_eratosthenes ).
-module( sieve_of_eratosthenes ).


Line 3,165: Line 6,909:


find_prime( error, _N, Acc ) -> Acc;
find_prime( error, _N, Acc ) -> Acc;
find_prime( {ok, _Value}, N, {Max, Dict} ) -> {Max, lists:foldl( fun dict:erase/2, Dict, lists:seq(N*N, Max, N) )}.
find_prime( {ok, _Value}, N, {Max, Dict} ) when Max > N*N ->
{Max, lists:foldl( fun dict:erase/2, Dict, lists:seq(N*N, Max, N))};
</lang>
find_prime( {ok, _Value}, _, R) -> R.
</syntaxhighlight>
{{out}}
{{out}}
<pre>
<pre>
Line 3,178: Line 6,924:
Has the virtue of working for any -> N :)
Has the virtue of working for any -> N :)


<syntaxhighlight lang="erlang">
<lang Erlang>
-module( sieve ).
-module( sieve ).
-export( [main/1,primes/2] ).
-export( [main/1,primes/2] ).
Line 3,212: Line 6,958:
Primes = lists:filter( fun({_,Y}) -> Y > 0 end, Tuples),
Primes = lists:filter( fun({_,Y}) -> Y > 0 end, Tuples),
[ X || {X,_} <- Primes ].
[ X || {X,_} <- Primes ].
</syntaxhighlight>
</lang>
{{out}}
{{out}}
<pre>
<pre>
Line 3,226: Line 6,972:
Since I had written a really odd and slow one, I thought I'd best do a better performer. Inspired by an example from https://github.com/jupp0r
Since I had written a really odd and slow one, I thought I'd best do a better performer. Inspired by an example from https://github.com/jupp0r


<syntaxhighlight lang="erlang">
<lang Erlang>


-module(ossieve).
-module(ossieve).
Line 3,252: Line 6,998:
ResultSet = ordsets:add_element(2,sieve(Candidates,Candidates,ordsets:new(),N)),
ResultSet = ordsets:add_element(2,sieve(Candidates,Candidates,ordsets:new(),N)),
io:fwrite("Sieved... ~w~n",[ResultSet]).
io:fwrite("Sieved... ~w~n",[ResultSet]).
</syntaxhighlight>
</lang>
{{out}}
{{out}}
<pre>
<pre>
Line 3,267: Line 7,013:
A pure list comprehension approach.
A pure list comprehension approach.


<syntaxhighlight lang="erlang">
<lang Erlang>
-module(sieveof).
-module(sieveof).
-export([main/1,primes/1, primes/2]).
-export([main/1,primes/1, primes/2]).
Line 3,285: Line 7,031:
remove([H | X], [H | Y]) -> remove(X, Y);
remove([H | X], [H | Y]) -> remove(X, Y);
remove(X, [H | Y]) -> [H | remove(X, Y)].
remove(X, [H | Y]) -> [H | remove(X, Y)].
</syntaxhighlight>
</lang>
{{out}}
{{out}}
<pre>
<pre>
Line 3,300: Line 7,046:
===Erlang ets + cpu distributed implementation ===
===Erlang ets + cpu distributed implementation ===
Much faster than the previous Erlang examples:
Much faster than the previous Erlang examples:
<syntaxhighlight lang="erlang">
<lang Erlang>
#!/usr/bin/env escript
#!/usr/bin/env escript
%% -*- erlang -*-
%% -*- erlang -*-
Line 3,365: Line 7,111:
comp_i(J, I, N) when J =< N -> ets:insert(comp, {J, 1}), comp_i(J+I, I, N);
comp_i(J, I, N) when J =< N -> ets:insert(comp, {J, 1}), comp_i(J+I, I, N);
comp_i(J, _, N) when J > N -> ok.
comp_i(J, _, N) when J > N -> ok.
</syntaxhighlight>
</lang>
{{out}}
{{out}}
<pre>
<pre>
Line 3,376: Line 7,122:


=={{header|ERRE}}==
=={{header|ERRE}}==
<syntaxhighlight lang="erre">
<lang ERRE>
PROGRAM SIEVE_ORG
PROGRAM SIEVE_ORG
! --------------------------------------------------
! --------------------------------------------------
Line 3,406: Line 7,152:
PRINT(COUNT%;" PRIMES")
PRINT(COUNT%;" PRIMES")
END PROGRAM
END PROGRAM
</syntaxhighlight>
</lang>
{{out}}
{{out}}
last lines of the output screen
last lines of the output screen
Line 3,417: Line 7,163:
16301 16319 16333 16339 16349 16361 16363 16369 16381
16301 16319 16333 16339 16349 16361 16363 16369 16381
1899 PRIMES
1899 PRIMES
</pre>

=={{header|Euler}}==
The original Euler doesn't have loops built-in. Loops can easily be added by defining and calling suitable procedures with literal procedures as parameters. In this sample, a C-style "for" loop procedure is defined and used to sieve and print the primes.<br>
'''begin'''
'''new''' sieve; '''new''' for; '''new''' prime; '''new''' i;
for <- ` '''formal''' init; '''formal''' test; '''formal''' incr; '''formal''' body;
'''begin'''
'''label''' again;
init;
again: '''if''' test '''then''' '''begin''' body; incr; '''goto''' again '''end''' '''else''' 0
'''end'''
'
;
sieve <- ` '''formal''' n;
'''begin'''
'''new''' primes; '''new''' i; '''new''' i2; '''new''' j;
primes <- '''list''' n;
for( ` i <- 1 ', ` i <= n ', ` i <- i + 1 '
, ` primes[ i ] <- '''true''' '
);
primes[ 1 ] <- '''false''';
for( ` i <- 2 '
, ` [ i2 <- i * i ] <= n '
, ` i <- i + 1 '
, ` '''if''' primes[ i ] '''then'''
for( ` j <- i2 ', ` j <= n ', ` j <- j + i '
, ` primes[ j ] <- '''false''' '
)
'''else''' 0
'
);
primes
'''end'''
'
;
prime <- sieve( 30 );
for( ` i <- 1 ', ` i <= '''length''' prime ', ` i <- i + 1 '
, ` '''if''' prime[ i ] '''then''' '''out''' i '''else''' 0 '
)
'''end''' $

{{out}}
<pre>
NUMBER 2
NUMBER 3
NUMBER 5
NUMBER 7
NUMBER 11
NUMBER 13
NUMBER 17
NUMBER 19
NUMBER 23
NUMBER 29
</pre>
</pre>


=={{header|Euphoria}}==
=={{header|Euphoria}}==
<lang euphoria>constant limit = 1000
<syntaxhighlight lang="euphoria">constant limit = 1000
sequence flags,primes
sequence flags,primes
flags = repeat(1, limit)
flags = repeat(1, limit)
Line 3,437: Line 7,241:
end if
end if
end for
end for
? primes</lang>
? primes</syntaxhighlight>


Output:
Output:
Line 3,452: Line 7,256:


=={{header|F Sharp}}==
=={{header|F Sharp}}==
===Short with mutable state===
<syntaxhighlight lang="fsharp">
let primes max =
let mutable xs = [|2..max|]
let limit = max |> float |> sqrt |> int
for x in [|2..limit|] do
xs <- xs |> Array.except [|x*x..x..max|]
xs
</syntaxhighlight>
===Short Sweet Functional and Idiotmatic===
===Short Sweet Functional and Idiotmatic===
Well lists may not be lazy, but if you call it a sequence then it's a lazy list!
Well lists may not be lazy, but if you call it a sequence then it's a lazy list!
<lang fsharp>
<syntaxhighlight lang="fsharp">
(*
(*
An interesting implementation of The Sieve of Eratosthenes.
An interesting implementation of The Sieve of Eratosthenes.
Line 3,462: Line 7,275:
let rec fn n g = seq{ match n with
let rec fn n g = seq{ match n with
|1 -> yield false; yield! fn g g
|1 -> yield false; yield! fn g g
|_ -> yield true; yield! fn (n-1) g}
|_ -> yield true; yield! fn (n - 1) g}
let rec fg ng = seq{
let rec fg ng = seq {
let g = (Seq.findIndex(id) ng)+2
let g = (Seq.findIndex(id) ng) + 2 // decreasingly inefficient with range at O(n)!
yield g; yield! fg (Seq.cache(Seq.map2(&&) ng (fn (g-1) g)))}
yield g; yield! fn (g - 1) g |> Seq.map2 (&&) ng |> Seq.cache |> fg }
fg (Seq.initInfinite(fun x->true))
Seq.initInfinite (fun x -> true) |> fg
</syntaxhighlight>
</lang>
{{out}}
{{out}}
<pre>
<pre>
> SofE |> Seq.take 10 |> Seq.iter(fun n->printfn "%d" n);;
> SofE |> Seq.take 10 |> Seq.iter(printfn "%d");;
2
2
3
3
Line 3,482: Line 7,295:
29
29
</pre>
</pre>
Although interesting intellectually, and although the algorithm is more Sieve of Eratosthenes (SoE) than not in that it uses a progression of composite number representations separated by base prime gaps to cull, it isn't really SoE in performance due to several used functions that aren't linear with range, such as the "findIndex" that scans from the beginning of all primes to find the next un-culled value as the next prime in the sequence and the general slowness and inefficiency of F# nested sequence generation.

It is so slow that it takes on the order of seconds just to find the primes up to a thousand!

For practical use, one would be much better served by any of the other functional sieves below, which can sieve to a million in less time than it takes this one to sieve to ten thousand. Those other functional sieves aren't all that many more lines of code than this one.


===Functional===
===Functional===
Line 3,488: Line 7,306:


This is the idea behind Richard Bird's unbounded code presented in the Epilogue of [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf M. O'Neill's article] in Haskell. It is about twice as much code as the Haskell code because F# does not have a built-in lazy list so that the effect must be constructed using a Co-Inductive Stream (CIS) type since no memoization is required, along with the use of recursive functions in combination with sequences. The type inference needs some help with the new CIS type (including selecting the generic type for speed). Note the use of recursive functions to implement multiple non-sharing delayed generating base primes streams, which along with these being non-memoizing means that the entire primes stream is not held in memory as for the original Bird code:
This is the idea behind Richard Bird's unbounded code presented in the Epilogue of [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf M. O'Neill's article] in Haskell. It is about twice as much code as the Haskell code because F# does not have a built-in lazy list so that the effect must be constructed using a Co-Inductive Stream (CIS) type since no memoization is required, along with the use of recursive functions in combination with sequences. The type inference needs some help with the new CIS type (including selecting the generic type for speed). Note the use of recursive functions to implement multiple non-sharing delayed generating base primes streams, which along with these being non-memoizing means that the entire primes stream is not held in memory as for the original Bird code:
<lang fsharp>type CIS<'T> = struct val v:'T val cont:unit->CIS<'T> //'Co Inductive Stream for laziness
<syntaxhighlight lang="fsharp">type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness
new (v,cont) = { v = v; cont = cont } end
type Prime = uint32


let primesBird() =
let primesBird() =
let rec (^^) (xs: CIS<Prime>) (ys: CIS<Prime>) = // stream merge function
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
let x = xs.v in let y = ys.v
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
if x < y then CIS(x, fun() -> xs.cont() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
elif y < x then CIS(y, fun() -> xs ^^ ys.cont())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
else CIS(x, fun() -> xs.cont() ^^ ys.cont()) // no duplications
let pmltpls p = let rec nxt c = CIS(c, fun() -> nxt (c + p)) in nxt (p * p)
let pmltpls p = let rec nxt c = CIS(c, fun() -> nxt (c + p)) in nxt (p * p)
let rec allmltps (ps: CIS<Prime>) = CIS(pmltpls ps.v, fun() -> allmltps (ps.cont()))
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec cmpsts (css: CIS<CIS<Prime>>) =
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(css.v.v, fun() -> (css.v.cont()) ^^ (cmpsts (css.cont())))
CIS(c, fun() -> (ctlf()) ^^ (cmpsts (amstlf())))
let rec minusat n (cs: CIS<Prime>) =
let rec minusat n (CIS(c, ctlf) as cs) =
if n < cs.v then CIS(n, fun() -> minusat (n + 1u) cs)
if n < c then CIS(n, fun() -> minusat (n + 1u) cs)
else minusat (n + 1u) (cs.cont())
else minusat (n + 1u) (ctlf())
let rec baseprms() = CIS(2u, fun() -> minusat 3u (cmpsts (allmltps (baseprms()))))
let rec baseprms() = CIS(2u, fun() -> baseprms() |> allmltps |> cmpsts |> minusat 3u)
Seq.unfold (fun (ps: CIS<Prime>) -> Some(ps.v, ps.cont()))
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (baseprms())</syntaxhighlight>
(minusat 2u (cmpsts (allmltps (baseprms()))))</lang>


The above code sieves all numbers of two and up including all even numbers as per the page specification; the following code makes the very minor changes for an odds-only sieve, with a speedup of over a factor of two:
The above code sieves all numbers of two and up including all even numbers as per the page specification; the following code makes the very minor changes for an odds-only sieve, with a speedup of over a factor of two:
<lang fsharp>type CIS<'T> = struct val v:'T val cont:unit->CIS<'T> //'Co Inductive Stream for laziness
<syntaxhighlight lang="fsharp">type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness
new (v,cont) = { v = v; cont = cont } end
type Prime = uint32


let primesBirdOdds() =
let primesBirdOdds() =
let rec (^^) (xs: CIS<Prime>) (ys: CIS<Prime>) = // stream merge function
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
let x = xs.v in let y = ys.v
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
if x < y then CIS(x, fun() -> xs.cont() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
elif y < x then CIS(y, fun() -> xs ^^ ys.cont())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
else CIS(x, fun() -> xs.cont() ^^ ys.cont()) // no duplications
let pmltpls p = let adv = p + p
let pmltpls p = let adv = p + p
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec allmltps (ps: CIS<Prime>) = CIS(pmltpls ps.v, fun() -> allmltps (ps.cont()))
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec cmpsts (css: CIS<CIS<Prime>>) =
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(css.v.v, fun() -> (css.v.cont()) ^^ (cmpsts (css.cont())))
CIS(c, fun() -> ctlf() ^^ cmpsts (amstlf()))
let rec minusat n (cs: CIS<Prime>) =
let rec minusat n (CIS(c, ctlf) as cs) =
if n < cs.v then CIS(n, fun() -> minusat (n + 2u) cs)
if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
else minusat (n + 2u) (cs.cont())
else minusat (n + 2u) (ctlf())
let rec oddprms() = CIS(3u, fun() -> minusat 5u (cmpsts (allmltps (oddprms()))))
let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
Seq.unfold (fun (ps: CIS<Prime>) -> Some(ps.v, ps.cont()))
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))</syntaxhighlight>
(CIS(2u, fun() -> minusat 3u (cmpsts (allmltps (oddprms())))))</lang>


'''Tree Folding Sieve'''
'''Tree Folding Sieve'''


The above code is still somewhat inefficient as it operates on a linear right extending structure that deepens linearly with increasing base primes (those up to the square root of the currently sieved number); the following code changes the structure into an infinite binary tree-like folding by combining each pair of prime composite streams before further processing as usual - this decreases the processing by approximately a factor of log n:
The above code is still somewhat inefficient as it operates on a linear right extending structure that deepens linearly with increasing base primes (those up to the square root of the currently sieved number); the following code changes the structure into an infinite binary tree-like folding by combining each pair of prime composite streams before further processing as usual - this decreases the processing by approximately a factor of log n:
<lang fsharp>type CIS<'T> = struct val v:'T val cont:unit->CIS<'T> //'Co Inductive Stream for laziness
<syntaxhighlight lang="fsharp">type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness
new (v,cont) = { v = v; cont = cont } end
type Prime = uint32


let primesTreeFold() =
let primesTreeFold() =
let rec (^^) (xs: CIS<Prime>) (ys: CIS<Prime>) = // merge streams; no duplicates
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
let x = xs.v in let y = ys.v
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
if x < y then CIS(x, fun() -> xs.cont() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
elif y < x then CIS(y, fun() -> xs ^^ ys.cont())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
else CIS(x, fun() -> xs.cont() ^^ ys.cont())
let pmltpls p = let adv = p + p
let pmltpls p = let adv = p + p
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec allmltps (ps: CIS<Prime>) = CIS(pmltpls ps.v, fun() -> allmltps (ps.cont()))
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec pairs (css: CIS<CIS<Prime>>) =
let rec pairs (CIS(cs0, cs0tlf)) =
let (CIS(cs1, cs1tlf)) = cs0tlf() in CIS(cs0 ^^ cs1, fun() -> pairs (cs1tlf()))
let ncss = css.cont()
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(css.v ^^ ncss.v, fun() -> pairs (ncss.cont()))
CIS(c, fun() -> ctlf() ^^ (cmpsts << pairs << amstlf)())
let rec cmpsts (css: CIS<CIS<Prime>>) =
let rec minusat n (CIS(c, ctlf) as cs) =
CIS(css.v.v, fun() -> (css.v.cont()) ^^ (cmpsts << pairs << css.cont)())
let rec minusat n (cs: CIS<Prime>) =
if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
if n < cs.v then CIS(n, fun() -> minusat (n + 2u) cs)
else minusat (n + 2u) (ctlf())
let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
else minusat (n + 2u) (cs.cont())
let rec oddprms() = CIS(3u, fun() -> (minusat 5u << cmpsts << allmltps) (oddprms()))
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))</syntaxhighlight>
Seq.unfold (fun (ps: CIS<Prime>) -> Some(ps.v, ps.cont()))
(CIS(2u, fun() -> (minusat 3u << cmpsts << allmltps) (oddprms())))</lang>


The above code is over four times faster than the "BirdOdds" version (at least 10x faster than the first, "primesBird", producing the millionth prime) and is moderately useful for a range of the first million primes or so.
The above code is over four times faster than the "BirdOdds" version (at least 10x faster than the first, "primesBird", producing the millionth prime) and is moderately useful for a range of the first million primes or so.
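
The pairwise folding performed by "pairs" can be seen in miniature on finite data. The following is only an illustrative sketch and not part of the sieve above: it uses ordinary lists rather than CIS streams, folds them as a balanced tree rather than the unbounded structure built by "primesTreeFold", and the names "union", "treeFold", and "multiples" are hypothetical helpers introduced for this example:
<syntaxhighlight lang="fsharp">// Merge two ascending lists, dropping duplicates (finite analogue of the (^^) stream merge).
let rec union xs ys =
    match xs, ys with
    | [], zs | zs, [] -> zs
    | x :: xt, y :: yt ->
        if x < y then x :: union xt ys
        elif y < x then y :: union xs yt
        else x :: union xt yt

// Combine neighbouring lists two at a time, halving the number of lists.
let rec pairs = function
    | a :: b :: rest -> union a b :: pairs rest
    | rest -> rest

// Repeat the pairing until a single merged list remains.
let rec treeFold = function
    | [] -> []
    | [xs] -> xs
    | xss -> treeFold (pairs xss)

// Odd multiples of the odd prime p, from p * p up to limit.
let multiples limit p = [ p * p .. 2 * p .. limit ]

let limit = 100
// 3, 5 and 7 are the only odd base primes needed, since 11 * 11 > 100.
let composites = [3; 5; 7] |> List.map (multiples limit) |> treeFold
let oddPrimes = [3 .. 2 .. limit] |> List.filter (fun n -> not (List.contains n composites))
printfn "%A" (2 :: oddPrimes)</syntaxhighlight>

Each round of "pairs" halves the number of lists, so with k base primes a value passes through about log2 k merges rather than up to k merges in a linear chain.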
Line 3,565: Line 7,370:


In order to investigate Priority Queue Sieves as espoused by O'Neill in the referenced article, one must find an equivalent implementation of a Min Heap Priority Queue as used by her. There is such a purely functional implementation [http://rosettacode.org/wiki/Priority_queue#Functional in RosettaCode translated from the Haskell code she used], from which the essential parts are duplicated here (note that the key value is given an integer type in order to avoid the inefficiency of F# in generic comparison):
In order to investigate Priority Queue Sieves as espoused by O'Neill in the referenced article, one must find an equivalent implementation of a Min Heap Priority Queue as used by her. There is such a purely functional implementation [http://rosettacode.org/wiki/Priority_queue#Functional in RosettaCode translated from the Haskell code she used], from which the essential parts are duplicated here (note that the key value is given an integer type in order to avoid the inefficiency of F# in generic comparison):
<lang fsharp>[<RequireQualifiedAccess>]
<syntaxhighlight lang="fsharp">[<RequireQualifiedAccess>]
module MinHeap =
module MinHeap =


Line 3,602: Line 7,407:


let replaceMin wk wv = function | Mt -> Mt
let replaceMin wk wv = function | Mt -> Mt
| Br(_, ll, rr) -> siftdown wk wv ll rr</lang>
| Br(_, ll, rr) -> siftdown wk wv ll rr</syntaxhighlight>


Except as noted for any individual code, all of the following codes need the following prefix code in order to implement the non-memoizing Co-Inductive Streams (CIS's) and to set the type of particular constants used in the codes to the same type as the "Prime" type:
Except as noted for any individual code, all of the following codes need the following prefix code in order to implement the non-memoizing Co-Inductive Streams (CIS's) and to set the type of particular constants used in the codes to the same type as the "Prime" type:
<lang fsharp>type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
<syntaxhighlight lang="fsharp">type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint32
type Prime = uint32
let frstprm = 2u
let frstprm = 2u
let frstoddprm = 3u
let frstoddprm = 3u
let inc1 = 1u
let inc1 = 1u
let inc = 2u</lang>
let inc = 2u</syntaxhighlight>
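
As a minimal sketch of how such a non-memoizing stream is built and consumed, the following self-contained example repeats the struct definition from the prefix and turns an infinite stream of odd numbers into an ordinary sequence. The helpers "countFrom" and "toSeq" are hypothetical names used only for this illustration and do not appear in the sieve codes:
<syntaxhighlight lang="fsharp">type CIS<'T> =
    struct
        val v: 'T                    // current head value
        val cont: unit -> CIS<'T>    // deferred computation of the rest of the stream
        new(v, cont) = { v = v; cont = cont }
    end

// Hypothetical helper: the infinite stream n, n + step, n + 2 * step, ...
let rec countFrom (n: uint32) (step: uint32) : CIS<uint32> =
    CIS(n, fun () -> countFrom (n + step) step)

// Hypothetical helper: expose a stream as a lazy sequence, as the sieves do with Seq.unfold.
let toSeq (cis: CIS<uint32>) =
    Seq.unfold (fun (c: CIS<uint32>) -> Some(c.v, c.cont())) cis

countFrom 3u 2u |> toSeq |> Seq.take 5 |> Seq.iter (printfn "%d") // prints 3 5 7 9 11, one per line</syntaxhighlight>

Because "cont" is a plain function rather than a cached lazy value, each call recomputes the tail; nothing already consumed is retained, which is the sense in which the stream is non-memoizing.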


The F# equivalent to O'Neill's "odds-only" code is then implemented as follows, which needs the included changed prefix in order to change the primes type to a larger one to prevent overflow (the key type for the MinHeap also needs to be changed from uint32 to uint64); it is functionally the same as the O'Neill code other than for minor changes to suit the use of CIS streams and the option output of the "peekMin" function:
The F# equivalent to O'Neill's "odds-only" code is then implemented as follows, which needs the included changed prefix in order to change the primes type to a larger one to prevent overflow (the key type for the MinHeap also needs to be changed from uint32 to uint64); it is functionally the same as the O'Neill code other than for minor changes to suit the use of CIS streams and the option output of the "peekMin" function:
<lang fsharp>type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
<syntaxhighlight lang="fsharp">type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint64
type Prime = uint64
let frstprm = 2UL
let frstprm = 2UL
Line 3,643: Line 7,448:
let rec nxto i = CIS(i, fun() -> nxto (i + inc)) in nxto frstoddprm
let rec nxto i = CIS(i, fun() -> nxto (i + inc)) in nxto frstoddprm
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
(CIS(frstprm, fun() -> (sieve odds)))</lang>
(CIS(frstprm, fun() -> (sieve odds)))</syntaxhighlight>


However, that algorithm suffers in speed and memory use due to over-eager adding of prime composite streams to the queue, such that the queue used is much larger than it needs to be and a prime type with a much larger numeric range must be used in order to avoid numeric overflow on the square of the prime added to the queue. The following code corrects that by using secondary (actually multiple) base primes streams which are constrained to be based on a prime that is no larger than the square root of the currently sieved number; this permits the use of much smaller Prime types as per the default prefix:
However, that algorithm suffers in speed and memory use due to over-eager adding of prime composite streams to the queue, such that the queue used is much larger than it needs to be and a prime type with a much larger numeric range must be used in order to avoid numeric overflow on the square of the prime added to the queue. The following code corrects that by using secondary (actually multiple) base primes streams which are constrained to be based on a prime that is no larger than the square root of the currently sieved number; this permits the use of much smaller Prime types as per the default prefix:
<lang fsharp>let primesPQx() =
<syntaxhighlight lang="fsharp">let primesPQx() =
let rec nxtprm n pq q (bps: CIS<Prime>) =
let rec nxtprm n pq q (bps: CIS<Prime>) =
if n >= q then let bp = bps.v in let adv = bp + bp
if n >= q then let bp = bps.v in let adv = bp + bp
Line 3,665: Line 7,470:
nxtprm (frstoddprm + inc) MinHeap.empty (frstoddprm * frstoddprm) (oddprms()))
nxtprm (frstoddprm + inc) MinHeap.empty (frstoddprm * frstoddprm) (oddprms()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
(CIS(frstprm, fun() -> (oddprms())))</lang>
(CIS(frstprm, fun() -> (oddprms())))</syntaxhighlight>


The above code is well over five times faster than the previous translated O'Neill version for the given variety of reasons.
The above code is well over five times faster than the previous translated O'Neill version for the given variety of reasons.
Line 3,683: Line 7,488:
The following code is written in functional style other than it uses a mutable bit array to sieve the composites:
The following code is written in functional style other than it uses a mutable bit array to sieve the composites:


<lang fsharp>let primes limit =
<syntaxhighlight lang="fsharp">let primes limit =
let buf = System.Collections.BitArray(int limit + 1, true)
let buf = System.Collections.BitArray(int limit + 1, true)
let cull p = { p * p .. p .. limit } |> Seq.iter (fun c -> buf.[int c] <- false)
let cull p = { p * p .. p .. limit } |> Seq.iter (fun c -> buf.[int c] <- false)
Line 3,693: Line 7,498:
if argv = null || argv.Length = 0 then failwith "no command line argument for limit!!!"
if argv = null || argv.Length = 0 then failwith "no command line argument for limit!!!"
printfn "%A" (primes (System.UInt32.Parse argv.[0]) |> Seq.length)
printfn "%A" (primes (System.UInt32.Parse argv.[0]) |> Seq.length)
0 // return an integer exit code</lang>
0 // return an integer exit code</syntaxhighlight>


Substituting the following minor changes to the code for the "primes" function will only deal with the odd prime candidates for a speed up of over a factor of two as well as a reduction of the buffer size by a factor of two:
Substituting the following minor changes to the code for the "primes" function will only deal with the odd prime candidates for a speed up of over a factor of two as well as a reduction of the buffer size by a factor of two:


<lang fsharp>let primes limit =
<syntaxhighlight lang="fsharp">let primes limit =
let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
let buf = System.Collections.BitArray(int lmtb + 1, true)
let buf = System.Collections.BitArray(int lmtb + 1, true)
Line 3,705: Line 7,510:
let oddprms = { 0u .. lmtb } |> Seq.map (fun i -> if buf.[int i] then i + i + 3u else 0u)
let oddprms = { 0u .. lmtb } |> Seq.map (fun i -> if buf.[int i] then i + i + 3u else 0u)
|> Seq.filter ((<>) 0u)
|> Seq.filter ((<>) 0u)
seq { yield 2u; yield! oddprms }</lang>
seq { yield 2u; yield! oddprms }</syntaxhighlight>
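
The index arithmetic behind the odds-only representation can be checked in isolation. The following is a simplified sketch, not the code above: it uses a plain Boolean array instead of a BitArray, signed integers instead of "uint32", and the hypothetical helper names "toValue", "toIndex", and "oddPrimesUpTo". Buffer index i stands for the odd number 2i + 3, so the square of a prime p sits at index (p * p - 3) / 2, and stepping the index by p steps the represented value by 2p, which skips the even multiples:
<syntaxhighlight lang="fsharp">// Odds-only mapping: buffer index i represents the odd number 2 * i + 3.
let toValue i = i + i + 3
let toIndex n = (n - 3) / 2 // meaningful for n >= 3; rounds an even n down to the odd number below it

let oddPrimesUpTo limit =
    if limit < 3 then []
    else
        let top = toIndex limit                  // index of the largest odd candidate
        let isPrime = Array.create (top + 1) true
        let mutable i = 0
        while toValue i * toValue i <= limit do  // base primes up to sqrt(limit)
            if isPrime.[i] then
                let p = toValue i
                // start at p * p; an index step of p is a value step of 2 * p
                for c in toIndex (p * p) .. p .. top do
                    isPrime.[c] <- false
            i <- i + 1
        [ for j in 0 .. top do if isPrime.[j] then yield toValue j ]

printfn "%A" (2 :: oddPrimesUpTo 30) // prints [2; 3; 5; 7; 11; 13; 17; 19; 23; 29]</syntaxhighlight>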


The following code uses other functional forms for the inner culling loops of the "primes function" to reduce the use of inefficient sequences so as to reduce the execution time by another factor of almost three:
The following code uses other functional forms for the inner culling loops of the "primes function" to reduce the use of inefficient sequences so as to reduce the execution time by another factor of almost three:


<lang fsharp>let primes limit =
<syntaxhighlight lang="fsharp">let primes limit =
let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
let buf = System.Collections.BitArray(int lmtb + 1, true)
let buf = System.Collections.BitArray(int lmtb + 1, true)
Line 3,716: Line 7,521:
let rec cullp c = if c <= lmtb then buf.[int c] <- false; cullp (c + p)
let rec cullp c = if c <= lmtb then buf.[int c] <- false; cullp (c + p)
(if buf.[int i] then cullp s); culltest (i + 1u) in culltest 0u
(if buf.[int i] then cullp s); culltest (i + 1u) in culltest 0u
seq {yield 2u; for i = 0u to lmtb do if buf.[int i] then yield i + i + 3u }</lang>
seq {yield 2u; for i = 0u to lmtb do if buf.[int i] then yield i + i + 3u }</syntaxhighlight>


Now much of the remaining execution time is just the time to enumerate the primes as can be seen by turning "primes" into a primes counting function by substituting the following for the last line in the above code doing the enumeration; this makes the code run about a further five times faster:
Now much of the remaining execution time is just the time to enumerate the primes as can be seen by turning "primes" into a primes counting function by substituting the following for the last line in the above code doing the enumeration; this makes the code run about a further five times faster:


<lang fsharp> let rec count i acc =
<syntaxhighlight lang="fsharp"> let rec count i acc =
if i > int lmtb then acc else if buf.[i] then count (i + 1) (acc + 1) else count (i + 1) acc
if i > int lmtb then acc else if buf.[i] then count (i + 1) (acc + 1) else count (i + 1) acc
count 0 1</lang>
count 0 1</syntaxhighlight>


Since the final enumeration of primes is the main remaining bottleneck, it is worth using a "roll-your-own" enumeration implemented as an object expression so as to save many inefficiencies in the use of the built-in seq computation expression by substituting the following code for the last line of the previous codes, which will decrease the execution time by a factor of over three (instead of almost five for the counting-only version, making it almost as fast):
Since the final enumeration of primes is the main remaining bottleneck, it is worth using a "roll-your-own" enumeration implemented as an object expression so as to save many inefficiencies in the use of the built-in seq computation expression by substituting the following code for the last line of the previous codes, which will decrease the execution time by a factor of over three (instead of almost five for the counting-only version, making it almost as fast):


<lang fsharp> let nmrtr() =
<syntaxhighlight lang="fsharp"> let nmrtr() =
let i = ref -2
let i = ref -2
let rec nxti() = i:=!i + 1;if !i <= int lmtb && not buf.[!i] then nxti() else !i <= int lmtb
let rec nxti() = i:=!i + 1;if !i <= int lmtb && not buf.[!i] then nxti() else !i <= int lmtb
Line 3,742: Line 7,547:
member this.GetEnumerator() = nmrtr()
member this.GetEnumerator() = nmrtr()
interface System.Collections.IEnumerable with
interface System.Collections.IEnumerable with
member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }</lang>
member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }</syntaxhighlight>


The various optimization techniques shown here can be used "jointly and severally" on any of the basic versions for various trade-offs between code complexity and performance. Not shown here are other techniques of making the sieve faster, including extending wheel factorization to much larger wheels such as 2/3/5/7, pre-culling the arrays, page segmentation, and multi-processing.
The various optimization techniques shown here can be used "jointly and severally" on any of the basic versions for various trade-offs between code complexity and performance. Not shown here are other techniques of making the sieve faster, including extending wheel factorization to much larger wheels such as 2/3/5/7, pre-culling the arrays, page segmentation, and multi-processing.
Line 3,750: Line 7,555:
The following '''odds-only''' implementations are written in an almost functional style, avoiding the use of mutability except for the contents of the data structures used to hold the state of the sieve and any mutability necessary to implement a "roll-your-own" IEnumerable iterator interface for speed.
The following '''odds-only''' implementations are written in an almost functional style, avoiding the use of mutability except for the contents of the data structures used to hold the state of the sieve and any mutability necessary to implement a "roll-your-own" IEnumerable iterator interface for speed.


'''Unbounded Dictionary (Mutable Hash Table) Based Sieve'''
====Unbounded Dictionary (Mutable Hash Table) Based Sieve====


The following code uses the DotNet Dictionary class instead of the above functional Priority Queue to implement the sieve; as average (amortized) hash table access is O(1) rather than O(log n) as for the priority queue, this implementation is slightly faster than the priority queue version for the first million primes and will always be faster for any range above some low range value:
The following code uses the DotNet Dictionary class instead of the above functional Priority Queue to implement the sieve; as average (amortized) hash table access is O(1) rather than O(log n) as for the priority queue, this implementation is slightly faster than the priority queue version for the first million primes and will always be faster for any range above some low range value:
<lang fsharp>type Prime = uint32
<syntaxhighlight lang="fsharp">type Prime = uint32
let frstprm = 2u
let frstprm = 2u
let frstoddprm = 3u
let frstoddprm = 3u
Line 3,778: Line 7,583:
nxtprm (frstoddprm + inc) (frstoddprm * frstoddprm) (oddprms()))
nxtprm (frstoddprm + inc) (frstoddprm * frstoddprm) (oddprms()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
(CIS(frstprm, fun() -> (oddprms())))</lang>
(CIS(frstprm, fun() -> (oddprms())))</syntaxhighlight>


The above code uses functional forms of code (with the imperative style commented out to show how it could be done imperatively) and also uses a recursive non-sharing secondary source of base primes just as for the Priority Queue version. As for the functional codes, the "Prime" type can easily be changed to "uint64" for a wider range of sieving.
The above code uses functional forms of code (with the imperative style commented out to show how it could be done imperatively) and also uses a recursive non-sharing secondary source of base primes just as for the Priority Queue version. As for the functional codes, the "Prime" type can easily be changed to "uint64" for a wider range of sieving.
Line 3,784: Line 7,589:
In spite of having true O(n log log n) Sieve of Eratosthenes computational complexity where n is the range of numbers to be sieved, the above code is still not particularly fast due to the time required to compute the hash values and manipulations of the hash table.
In spite of having true O(n log log n) Sieve of Eratosthenes computational complexity where n is the range of numbers to be sieved, the above code is still not particularly fast due to the time required to compute the hash values and manipulations of the hash table.


'''Unbounded Page Segmented Mutable Array Sieve'''
====Unbounded Page-Segmented Bit-Packed Odds-Only Mutable Array Sieve====


Note that the following code is used for the F# entry [[Extensible_prime_generator#Unbounded_Mutable_Array_Generator]] of the Extensible prime generator page.
All of the above unbounded implementations including the above Dictionary based version are quite slow due to their large constant factor computational overheads, making them more of an intellectual exercise than something practical, especially when larger sieving ranges are required. The following code implements an unbounded page segmented version of the sieve in not that many more lines of code, yet runs about 25 times faster than the Dictionary version for larger ranges of sieving such as to one billion; it uses functional forms without mutability other than for the contents of the arrays and a reference cell used to implement the "roll-your-own" IEnumerable/IEnumerator interfaces for speed:
<lang fsharp>let private PGSZBTS = (1 <<< 14) * 8 // sieve buffer size in bits
type private PS = class
val i:int val p:uint64 val cmpsts:uint32[]
new(i,p,c) = { i=i; p=p; cmpsts=c } end
let rec primesPaged(): System.Collections.Generic.IEnumerable<_> =
let lbpse = lazy (primesPaged().GetEnumerator()) // lazy to prevent race
let bpa = ResizeArray() // fills from above sequence as needed
let makePg low =
let nxt = low + (uint64 PGSZBTS <<< 1)
let cmpsts = Array.zeroCreate (PGSZBTS >>> 5)
let inline notprm c = cmpsts.[c >>> 5] &&& (1u <<< c) <> 0u
let rec nxti c = if c < PGSZBTS && notprm c
then nxti (c + 1) else c
let inline mrkc c = let w = c >>> 5
cmpsts.[w] <- cmpsts.[w] ||| (1u <<< c)
let rec cullf i =
if notprm i then cullf (i + 1) else
let p = 3 + i + i in let sqr = p * p
if uint64 sqr < nxt then
let rec cullp c = if c < PGSZBTS then mrkc c; cullp (c + p)
else cullf (i + 1) in cullp ((sqr - 3) >>> 1)
if low <= 3UL then cullf 0 // special culling for the first page
else // cull rest based on a secondary base prime stream
let bpse = lbpse.Force()
if bpa.Count <= 0 then // move past 2 to 3
bpse.MoveNext() |> ignore; bpse.MoveNext() |> ignore
let rec fill np =
if np * np >= nxt then
let bpasz = bpa.Count
let rec cull i =
if i < bpasz then
let p = bpa.[i] in let sqr = p * p in let pi = int p
let strt = if sqr >= low then int (sqr - low) >>> 1
else let r = int (((low - sqr) >>> 1) % p)
if r = 0 then 0 else int p - r
let rec cullp c = if c < PGSZBTS then mrkc c; cullp (c + pi)
cullp strt; cull (i + 1) in cull 0
else bpa.Add(np); bpse.MoveNext() |> ignore
fill bpse.Current
fill bpse.Current // fill bpa as necessary and do cull
let ni = nxti 0 in let np = low + uint64 (ni <<< 1)
PS(ni, np, cmpsts)
let nmrtr() =
let ps = ref (PS(0, 0UL, Array.zeroCreate 0))
{ new System.Collections.Generic.IEnumerator<_> with
member this.Current = (!ps).p
interface System.Collections.IEnumerator with
member this.Current = box ((!ps).p)
member this.MoveNext() =
let drps = !ps in let i = drps.i in let p = drps.p
let cmpsts = drps.cmpsts in let lmt = cmpsts.Length <<< 5
if p < 3UL then (if p < 2UL then ps := PS(0, 2UL, cmpsts); true
else ps := makePg 3UL; true) else
let inline notprm c = cmpsts.[c >>> 5] &&& (1u <<< c) <> 0u
let rec nxti c = if c < lmt && notprm c
then nxti (c + 1) else c
let ni = nxti (i + 1) in let np = p + uint64 ((ni - i) <<< 1)
if ni < lmt then ps := PS(ni, np, cmpsts); true
else ps := makePg np; true
member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
interface System.IDisposable with
member this.Dispose() = () }
{ new System.Collections.Generic.IEnumerable<_> with
member this.GetEnumerator() = nmrtr()
interface System.Collections.IEnumerable with
member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }</lang>


All of the above unbounded implementations including the above Dictionary based version are quite slow due to their large constant factor computational overheads, making them more of an intellectual exercise than something practical, especially when larger sieving ranges are required. The following code implements an unbounded page segmented version of the sieve in not that many more lines of code, yet runs about 25 times faster than the Dictionary version for larger ranges of sieving such as to one billion; it uses functional forms without mutability other than for the contents of the arrays and the `primes` enumeration generator function that must use mutability for speed:
<syntaxhighlight lang="fsharp">type Prime = float // use uint64/int64 for regular 64-bit F#
type private PrimeNdx = float // they are slow in JavaScript polyfills


let inline private prime n = float n // match these convenience conversions
let inline private primendx n = float n // with the types above!


let private cPGSZBTS = (1 <<< 14) * 8 // sieve buffer size in bits = CPUL1CACHE

type private SieveBuffer = uint8[]

/// a Co-Inductive Stream (CIS) of an "infinite" non-memoized series...
type private CIS<'T> = CIS of 'T * (unit -> CIS<'T>) //' apostrophe formatting adjustment

/// lazy list (memoized) series of base prime page arrays...
type private BasePrime = uint32
type private BasePrimeArr = BasePrime[]
type private BasePrimeArrs = BasePrimeArrs of BasePrimeArr * Option<Lazy<BasePrimeArrs>>

/// Masking array is faster than bit twiddle bit shifts!
let private cBITMASK = [| 1uy; 2uy; 4uy; 8uy; 16uy; 32uy; 64uy; 128uy |]

let private cullSieveBuffer lwi (bpas: BasePrimeArrs) (sb: SieveBuffer) =
let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
let rec loopbp (BasePrimeArrs(bpa, bpatl) as ibpas) i =
if i >= bpa.Length then
match bpatl with
| None -> ()
| Some lv -> loopbp lv.Value 0 else
let bp = prime bpa.[i] in let bpndx = primendx ((bp - prime 3) / prime 2)
let s = (bpndx * primendx 2) * (bpndx + primendx 3) + primendx 3 in let bpint = int bp
if s <= lmti then
let s0 = // page cull start address calculation...
if s >= lwi then int (s - lwi) else
let r = (lwi - s) % (primendx bp)
if r = primendx 0 then 0 else int (bp - prime r)
let slmt = min btlmt (s0 - 1 + (bpint <<< 3))
let rec loopc c = // loop "unpeeling" is used so
if c <= slmt then // a constant mask can be used over the inner loop
let msk = cBITMASK.[c &&& 7]
let rec loopw w =
if w < sb.Length then sb.[w] <- sb.[w] ||| msk; loopw (w + bpint)
loopw (c >>> 3); loopc (c + bpint)
loopc s0; loopbp ibpas (i + 1) in loopbp bpas 0

/// fast Counting Look Up Table (CLUT) for pop counting...
let private cCLUT =
let arr = Array.zeroCreate 65536
let rec popcnt n cnt = if n > 0 then popcnt (n &&& (n - 1)) (cnt + 1) else uint8 cnt
let rec loop i = if i < 65536 then arr.[i] <- popcnt i 0; loop (i + 1)
loop 0; arr

let countSieveBuffer ndxlmt (sb: SieveBuffer): int =
let lstw = (ndxlmt >>> 3) &&& -2
let msk = (-2 <<< (ndxlmt &&& 15)) &&& 0xFFFF
let inline cntem i m =
int cCLUT.[int (((uint32 sb.[i + 1]) <<< 8) + uint32 sb.[i]) ||| m]
let rec loop i cnt =
if i >= lstw then cnt - cntem lstw msk else loop (i + 2) (cnt - cntem i 0)
loop 0 ((lstw <<< 3) + 16)

/// a CIS series of pages from the given start index with the given SieveBuffer size,
/// and provided with a polymorphic converter function to produce
/// any type of result from the culled page parameters...
let rec private makePrimePages strtwi btsz
(cnvrtrf: PrimeNdx -> SieveBuffer -> 'T): CIS<'T> =
let bpas = makeBasePrimes() in let sb = Array.zeroCreate (btsz >>> 3)
let rec nxtpg lwi =
Array.fill sb 0 sb.Length 0uy; cullSieveBuffer lwi bpas sb
CIS(cnvrtrf lwi sb, fun() -> nxtpg (lwi + primendx btsz))
nxtpg strtwi

/// secondary feed of lazy list of memoized pages of base primes...
and private makeBasePrimes(): BasePrimeArrs =
let sb2bpa lwi (sb: SieveBuffer) =
let bsbp = uint32 (primendx 3 + lwi + lwi)
let arr = Array.zeroCreate <| countSieveBuffer 255 sb
let rec loop i j =
if i < 256 then
if sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy then loop (i + 1) j
else arr.[j] <- bsbp + uint32 (i + i); loop (i + 1) (j + 1)
loop 0 0; arr
// finding the first page as not part of the loop and making succeeding
// pages lazy breaks the recursive data race!
let frstsb = Array.zeroCreate 32
let fkbpas = BasePrimeArrs(sb2bpa (primendx 0) frstsb, None)
cullSieveBuffer (primendx 0) fkbpas frstsb
let rec nxtbpas (CIS(bpa, tlf)) = BasePrimeArrs(bpa, Some(lazy (nxtbpas (tlf()))))
BasePrimeArrs(sb2bpa (primendx 0) frstsb,
Some(lazy (nxtbpas <| makePrimePages (primendx 256) 256 sb2bpa)))

/// produces a generator of primes; uses mutability for better speed...
let primes(): unit -> Prime =
let sb2prms lwi (sb: SieveBuffer) = lwi, sb in let mutable ndx = -1
let (CIS((nlwi, nsb), npgtlf)) = // use page generator function above!
makePrimePages (primendx 0) cPGSZBTS sb2prms
let mutable lwi = nlwi in let mutable sb = nsb
let mutable pgtlf = npgtlf
let mutable baseprm = prime 3 + prime (lwi + lwi)
fun() ->
if ndx < 0 then ndx <- 0; prime 2 else
let inline notprm i = sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy
while ndx < cPGSZBTS && notprm ndx do ndx <- ndx + 1
if ndx >= cPGSZBTS then // get next page if over
let (CIS((nlwi, nsb), npgtlf)) = pgtlf() in ndx <- 0
lwi <- nlwi; sb <- nsb; pgtlf <- npgtlf
baseprm <- prime 3 + prime (lwi + lwi)
while notprm ndx do ndx <- ndx + 1
let ni = ndx in ndx <- ndx + 1 // ready for next call!
baseprm + prime (ni + ni)

let countPrimesTo (limit: Prime): int = // much faster!
if limit < prime 3 then (if limit < prime 2 then 0 else 1) else
let topndx = (limit - prime 3) / prime 2 |> primendx
let sb2cnt lwi (sb: SieveBuffer) =
let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
countSieveBuffer
(if lmti < topndx then btlmt else int (topndx - lwi)) sb, lmti
let rec loop (CIS((cnt, nxti), tlf)) count =
if nxti < topndx then loop (tlf()) (count + cnt)
else count + cnt
loop <| makePrimePages (primendx 0) cPGSZBTS sb2cnt <| 1

/// sequences are convenient but slow...
let primesSeq() = primes() |> Seq.unfold (fun gen -> Some(gen(), gen))
printfn "The first 25 primes are: %s"
( primesSeq() |> Seq.take 25
|> Seq.fold (fun s p -> s + string p + " ") "" )
printfn "There are %d primes up to a million."
( primesSeq() |> Seq.takeWhile ((>=) (prime 1000000)) |> Seq.length )

let rec cntto gen lmt cnt = // faster than seq's but still slow
if gen() > lmt then cnt else cntto gen lmt (cnt + 1)

let limit = prime 1_000_000_000
let start = System.DateTime.Now.Ticks
// let answr = cntto (primes()) limit 0 // slower way!
let answr = countPrimesTo limit // over twice as fast way!
let elpsd = (System.DateTime.Now.Ticks - start) / 10000L
printfn "Found %d primes to %A in %d milliseconds." answr limit elpsd</syntaxhighlight>

{{out}}
<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Found 50847534 primes to 1000000000 in 2161 milliseconds.</pre>

As with all of the efficient unbounded sieves, the above code uses a secondary enumerator of the base primes less than the square root of the currently culled range, which in this case is a lazy (deferred, memoized evaluation) binding by small pages of base primes; deferring the evaluation of subsequent pages also avoids a race condition.

The above code is designed to output the "uint64" type for very large ranges of primes, since there is little computational cost to doing this for this algorithm when used with 64-bit compilation; however, when transpiled to JavaScript by Fable, the largest contiguous integer range that can be represented is that of the 52-bit mantissa of a 64-bit float, so in this case the large numbers are represented by floats, since a 64-bit integer polyfill is very slow. As written, the practical range for this sieve is about 16 billion; however, it can be extended to about 10^14 (a week or two of execution time) by setting the "cPGSZBTS" constant to the size of the CPU L2 cache rather than the L1 cache (L2 is up to about two Megabytes for modern high end desktop CPUs) at a slight loss of efficiency (a factor of up to two or so) per composite number culling operation due to the slower memory access time. When the Fable compilation option is used, execution speed is roughly the same as using F# with DotNet Core.

Even with the custom `primes` enumerator generator (the F#/Fable built-in sequence operators are terribly inefficient), enumerating the resulting primes takes longer than actually culling the composite numbers from the sieving arrays. The actual culling is thus over 50 times faster than that of the Dictionary version. The slowness of enumeration, no matter what further tweaks are done to improve it (each value enumerated will always take a function call and a scan loop, costing something in the order of 100 CPU clock cycles per value), means that further gains in speed using extreme wheel factorization and multi-processing have little point unless the actual work on the resulting primes is done through auxiliary functions that do not use iteration. Such a function is provided here to count the primes by pages using a "pop count" look up table, reducing the counting time to only a small fraction of a second.
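The "pop count" look up table idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a translation of the F# code: the names `make_popcount_table`, `POPCOUNT16`, and `count_unmarked` are hypothetical, and the buffer is assumed to be bit-packed with a set bit marking a composite.

```python
def make_popcount_table(bits=16):
    """Build a table giving the number of set bits for every `bits`-wide value."""
    table = bytearray(1 << bits)
    for value in range(1, 1 << bits):
        # popcount(value) = popcount(value without its lowest bit) + lowest bit
        table[value] = table[value >> 1] + (value & 1)
    return table


POPCOUNT16 = make_popcount_table()  # hypothetical name, 65536 one-byte entries


def count_unmarked(buffer, last_index):
    """Count the zero bits at bit indices 0..last_index (inclusive).

    A set bit is assumed to mark a composite, so the zero bits are the primes.
    """
    total_bits = last_index + 1
    full_bytes, rest = divmod(total_bits, 8)
    marked = 0
    i = 0
    while i + 1 < full_bytes:  # two bytes per table lookup
        marked += POPCOUNT16[buffer[i] | (buffer[i + 1] << 8)]
        i += 2
    if i < full_bytes:  # an odd byte left over
        marked += POPCOUNT16[buffer[i]]
    if rest:  # mask off the bits beyond last_index in the final byte
        marked += POPCOUNT16[buffer[full_bytes] & ((1 << rest) - 1)]
    return total_bits - marked
```

Counting this way touches each pair of bytes once instead of testing sixteen individual bits, which is why counting by pages is so much cheaper than enumerating the primes one at a time.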

=={{header|Factor}}==
Factor already contains two implementations of the sieve of Eratosthenes in <code>math.primes.erato</code> and <code>math.primes.erato.fast</code>. It is suggested to use one of them for real use, as they use faster types, faster unsafe arithmetic, and/or wheels to speed up the sieve further. Shown here is a more straightforward implementation that adheres to the restrictions given by the task (namely, no wheels).

Factor is pleasantly multiparadigm. Usually, it's natural to write more functional or declarative code in Factor, but this is an instance where it is more natural to write imperative code. Lexical variables are useful here for expressing the necessary mutations in a clean way.
<syntaxhighlight lang="factor">USING: bit-arrays io kernel locals math math.functions
math.ranges prettyprint sequences ;
IN: rosetta-code.sieve-of-erato

<PRIVATE

: init-sieve ( n -- seq ) ! Include 0 and 1 for easy indexing.
1 - <bit-array> dup set-bits ?{ f f } prepend ;

! Given the sieve and a prime starting index, create a range of
! values to mark composite. Start at the square of the prime.
: to-mark ( seq n -- range )
[ length 1 - ] [ dup dup * ] bi* -rot <range> ;

! Mark multiples of prime n as composite.
: mark-nths ( seq n -- )
dupd to-mark [ swap [ f ] 2dip set-nth ] with each ;

: next-prime ( index seq -- n ) [ t = ] find-from drop ;

PRIVATE>

:: sieve ( n -- seq )
n sqrt 2 n init-sieve :> ( limit i! s )
[ i limit < ] ! sqrt optimization
[ s i mark-nths i 1 + s next-prime i! ] while t s indices ;

: sieve-demo ( -- )
"Primes up to 120 using sieve of Eratosthenes:" print
120 sieve . ;

MAIN: sieve-demo</syntaxhighlight>

=={{header|FOCAL}}==
<syntaxhighlight lang="focal">1.1 T "PLEASE ENTER LIMIT"
1.2 A N
1.3 I (2047-N)5.1
1.4 D 2
1.5 Q

2.1 F X=2,FSQT(N); D 3
2.2 F W=2,N; I (SIEVE(W)-2)4.1

3.1 I (-SIEVE(X))3.3
3.2 F Y=X*X,X,N; S SIEVE(Y)=2
3.3 R

4.1 T %4.0,W,!

5.1 T "PLEASE ENTER A NUMBER LESS THAN 2048."!; G 1.1</syntaxhighlight>
Note that with the 4k paper tape version of FOCAL, the program will run out of memory for N>190 or so.


=={{header|Forth}}==
100 sieve

{{out}}
<pre>Primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 </pre>

===Alternate Odds-Only, Better Style===

The above code is not very good Forth style: the main initialization, sieving, and output are all in one `sieve` routine, which makes it difficult to understand and refactor. Forth code is normally written as a series of very small routines, which makes it easier to follow what is happening on the data stack, since Forth code does not normally use named, re-entrant local variables as most other languages do (those languages also normally store their local variables on the stack). Also, it uses the `HERE` pointer to user space, which points to the next available memory after all compilation is done, as an unsized buffer pointer; as it does not reserve that space for the sieving buffer, the space can be changed by other concatenated routines in unexpected ways. It is better to allocate the sieving buffer as required from the available space at the time the routines are run and to pass that address between concatenated functions until a finalization function frees the memory and clears the stack; this is equivalent to allocating from the "heap" in other languages. The code below demonstrates these ideas:

<syntaxhighlight lang="forth">: prime? ( addr -- ? ) C@ 0= ; \ test composites array for prime

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
DO 2 PICK I + TRUE SWAP C! DUP +LOOP DROP ;

\ process for required prime limit; allocate and initialize returned buffer...
: initsieve ( u -- u a-addr)
3 - DUP 0< IF 0 ELSE
1 RSHIFT 1+ DUP ALLOCATE 0<> IF ABORT" Memory allocation error!!!"
ELSE 2DUP SWAP ERASE THEN
THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
0 \ initialize test index i -- numv bufa i
BEGIN \ test prime square index < limit
DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3
WHILE \ -- numv bufa sqri i
2 PICK OVER + prime? IF cullpi! \ -- numv bufa i
ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ print primes to given limit...
: .primes ( u a-addr -- )
OVER 0< IF DROP 2 - 0< IF ( ." No primes!" ) ELSE ( ." Prime: 2" ) THEN
ELSE ." Primes: 2 " SWAP 0
DO DUP I + prime? IF I I + 3 + . THEN LOOP FREE DROP THEN ;

\ count number of primes found for the number of odd numbers within
\ given presumed sieved buffer starting at address...
: countprimes@ ( u a-addr -- )
SWAP DUP 0< IF 1+ 0< IF DROP 0 ELSE 1 THEN
ELSE 1 SWAP \ -- bufa cnt numv
0 DO OVER I + prime? IF 1+ THEN LOOP SWAP FREE DROP
THEN ;

\ shows counted number of primes to the given limit...
: .countprimesto ( u -- )
DUP initsieve sieve countprimes@
CR ." Found " . ." primes Up to the " . ." limit." ;

\ testing the code...
100 initsieve sieve .primes
1000000 .countprimesto</syntaxhighlight>

{{out}}

<pre>Primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes Up to the 1000000 limit.</pre>

As well as solving the stated problems, making the code much easier to understand and refactor, an odds-only sieve takes half the space and less than half the time.
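The odds-only index arithmetic used above (index i stands for the odd number 2i + 3, and the square of that number sits at index 2i(i + 3) + 3) can be sketched in Python. This is a minimal illustration that assumes one byte per odd candidate, as in the Forth version; the function name `odds_only_primes` is hypothetical.

```python
def odds_only_primes(limit):
    """Return the primes <= limit using an odds-only byte-per-candidate sieve."""
    if limit < 2:
        return []
    if limit < 3:
        return [2]
    top = (limit - 3) // 2          # index of the largest odd candidate <= limit
    composite = bytearray(top + 1)  # 0 means "not yet culled"
    i = 0
    while True:
        p = 2 * i + 3
        # index of p*p: (p*p - 3) / 2 = 2*i*i + 6*i + 3 = 2*i*(i + 3) + 3
        start = 2 * i * (i + 3) + 3
        if start > top:
            break
        if not composite[i]:
            # stepping the index by p steps the represented value by 2*p,
            # which skips the even multiples
            for j in range(start, top + 1, p):
                composite[j] = 1
        i += 1
    return [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]
```

Because only the odd numbers from 3 upward are stored, the buffer is half the size of a full sieve, and culling by the prime two never happens at all.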

===Bit-Packing the Sieve Buffer (Odds-Only)===

Although the above version resolves many problems of the first version, it is wasteful of memory, as each composite number in the sieve buffer is a byte of eight bits representing a boolean value. The memory required can be reduced eight-fold by bit-packing the sieve buffer; this takes more "bit-twiddling" to read and write the bits, but reducing the memory used gives better cache associativity for larger ranges, such that there is a net gain in performance. This makes the code more complex and the stack manipulations harder to write, debug, and maintain, so the code uses the local variable naming facility that ANS Forth 1994 provides to make this much easier. The following code implements bit-packing of the sieve buffer using local named variables when required:

<syntaxhighlight lang="forth">\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
DO I 3 RSHIFT 3 PICK + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
DROP ;

\ initializes sieve storage and parameters
\ given sieve limit, returns bit limit and buffer address ..
: initsieve ( u -- u a-addr )
3 - \ test limit...
DUP 0< IF 0 ELSE \ return if number of bits is <= 0!
1 RSHIFT 1+ \ finish conversion to number of bits
DUP 1- CellShft RSHIFT 1+ \ round up to even number of cells
CELLS DUP ALLOCATE 0= IF DUP ROT ERASE \ set cells to zero
ELSE ABORT" Memory allocation error!!!"
THEN
THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
0 \ initialize test index i -- numv bufa i
BEGIN \ test prime square index < limit
DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3
WHILE \ -- numv bufa sqri i
DUP 3 PICK prime? IF cullpi! \ -- numv bufa i
ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ prints already found primes from sieved array...
: .primes ( u a-addr -- )
SWAP CR ." Primes to " DUP DUP + 2 + 2 MAX . ." are: "
DUP 0< IF 1+ 0< IF ." none." ELSE 2 . THEN DROP \ case no primes or just 2
ELSE 2 . 0 DO I OVER prime? IF I I + 3 + . THEN LOOP FREE DROP
THEN ;

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u ) numbts 16 SWAP - ;
: initLUT ( -- ) cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
0 1 CELLS 1 RSHIFT 0
DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index-1 in array address;
\ params are number of bits used - bits, negative indicates <2/2 out: 0/1,
\ given address is of the allocated bit buffer - bufa;
\ values used: bmsk is bit mask to limit bit in last cell,
\ lci is cell index of last cell used, cnt is the return value...
\ NOTE. this is for little-endian; big-endian needs a byte swap
\ before the last mask and popcount operation!!!
: primecount@ ( u a-addr -- u )
LOCALS| bufa numb |
numb 0< IF numb 1+ 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
ELSE
numb 1- TO numb \ numb -= 1
1 \ initial count
numb CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
SWAP bufa + @ \ -- cnt lstCELL
-2 numb CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
popcount@ + \ add popcount of last masked CELL -- cnt
bufa FREE DROP \ free bufa -- bmsk cnt lastcell@
THEN ;

: .countprimesto ( u -- u )
dup initsieve sieve primecount@
CR ." There are " . ." primes Up to the " . ." limit." ;

100 initsieve sieve .primes
1000000000 .countprimesto</syntaxhighlight>

The output of the above code is the same as the previous version, but it takes about two thirds the time while using eight times less memory; it takes about 6.5 seconds on my Intel Skylake i5-6500 at 3.6 GHz (turbo) using swiftForth (32-bit) and about 3.5 seconds on VFX Forth (64-bit), both of which compile to machine code, with the latter much more optimized; gforth-fast is about twice as slow as swiftForth and five times slower than VFX Forth, as it just compiles to threaded execution tokens (more like an interpreter).
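The bit-packing scheme above (byte index `i >> 3`, bit mask taken from an eight-entry look up table indexed by `i & 7`) can be sketched in Python. This is a minimal illustration rather than a port of the Forth words; the names `BIT`, `sieve_bits`, and `primes_from_bits` are hypothetical.

```python
BIT = bytes(1 << k for k in range(8))  # bit-position look up table


def sieve_bits(limit):
    """Sieve the odd numbers 3..limit into a bit-packed buffer (limit >= 3).

    Bit i represents the odd number 2*i + 3; a set bit marks a composite.
    Returns (nbits, buffer).
    """
    nbits = (limit - 3) // 2 + 1
    buf = bytearray((nbits + 7) // 8)  # eight candidates per byte
    i = 0
    while 2 * i * (i + 3) + 3 < nbits:         # index of the square of 2*i + 3
        if not buf[i >> 3] & BIT[i & 7]:       # bit still clear: 2*i + 3 is prime
            for j in range(2 * i * (i + 3) + 3, nbits, 2 * i + 3):
                buf[j >> 3] |= BIT[j & 7]      # read-modify-write of one bit
        i += 1
    return nbits, buf


def primes_from_bits(nbits, buf):
    """List the primes represented by the clear bits, with 2 prepended."""
    return [2] + [2 * i + 3 for i in range(nbits) if not buf[i >> 3] & BIT[i & 7]]
```

For example, `primes_from_bits(*sieve_bits(100))` uses a 7-byte buffer for the 49 odd candidates, where the byte-per-candidate version needs 49 bytes.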

===Page-Segmented Bit-Packed Odds-Only Version===

While the above version does greatly reduce the amount of memory used for a given sieving range, and thereby also somewhat reduces execution time, any sieve intended for sieving to limits of a hundred million or more should use a page-segmented implementation. Page segmentation means that the only storage required is a representation of the base primes up to the square root of the limit plus a sieve buffer whose size should also be at least proportional to that square root; this again makes execution faster as ranges go up, because most memory accesses stay within the CPU cache sizes. The following Forth code implements a basic version that does this:

<syntaxhighlight lang="forth">\ CPU L1 and L2 cache sizes in bits; power of 2...
1 17 LSHIFT CONSTANT L1CacheBits
L1CacheBits 8 * CONSTANT L2CacheBits

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ initializes sieve buffer storage and parameters
\ given sieve buffer bit size (even number of CELLS), returns buffer address ..
: initSieveBuffer ( u -- a-addr )
CellShft RSHIFT \ even number of cells
CELLS ALLOCATE 0<> IF ABORT" Memory allocation error!!!" THEN ;

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, as well as bitsz,
\ sieve the multiples of said prime leaving prime index on the stack...
: cullpi! ( u u0 u u addr -- u0 )
LOCALS| sba bitsz lwi | DUP DUP + 3 + ROT \ -- i prm sqri
\ culling start index address calculation...
lwi 2DUP > IF - ELSE SWAP - OVER MOD DUP 0<> IF OVER SWAP - THEN
THEN bitsz SWAP \ -- i prm bitsz strti
DO I 3 RSHIFT sba + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
DROP ;

\ cull sieve buffer given base wheel index, bit size,
\ address base prime sieved buffer and
\ the address of the sieve buffer to be culled of composite bits...
: cullSieveBuffer ( u u a-addr a-addr -- )
>R >R 2DUP + R> R> \ -- lwi bitsz rngi bpba sba
LOCALS| sba bpba rngi bitsz lwi |
bitsz 1- CellShft RSHIFT 1+ CELLS sba SWAP ERASE \ clear sieve buffer
0 \ initialize base prime index i -- i
BEGIN \ test prime square index < limit
DUP DUP DUP + SWAP 3 + * 3 + TUCK rngi < \ sqri = 2*i * (I+3) + 3
WHILE \ -- sqri i
DUP bpba prime? IF lwi bitsz sba cullpi! ELSE SWAP DROP THEN \ -- i
1+ REPEAT 2DROP ; \ --

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u ) numbts 16 SWAP - ;
: initLUT ( -- ) cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
0 1 CELLS 1 RSHIFT 0
DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index in array address...
: countSieveBuffer@ ( u a-addr -- u )
LOCALS| bufa lmti |
0 \ initial count -- cnt
lmti CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
SWAP bufa + @ \ -- cnt lstCELL
-2 lmti CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
popcount@ + ; \ add popcount of last masked CELL -- cnt

\ prints found primes from series of culled sieve buffers...
: .primes ( u -- )
DUP CR ." Primes to " . ." are: "
DUP 3 - 0< IF DUP 2 - 0< IF ." none." ELSE 2 . THEN \ <2/2 -> 0/1
ELSE 2 .
3 - 1 RSHIFT 1+ \ -- rngi
DUP 1- L2CacheBits / L2CacheBits * 3 RSHIFT \ -- rng rngi pglmtbytes
L1CacheBits initSieveBuffer \ address of base prime sieve buffer
L2CacheBits initSieveBuffer \ address of main sieve buffer
LOCALS| sba bpsba pglmt | \ local variables -- rngi
0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
pglmt 0 ?DO
I L2CacheBits bpsba sba cullSieveBuffer
I L2CacheBits 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
L2CacheBits +LOOP \ rngi
L2CacheBits mod DUP 0> IF \ one more page!
pglmt DUP L2CacheBits bpsba sba cullSieveBuffer
SWAP 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
THEN bpsba FREE DROP sba FREE DROP
THEN ; \ --

\ prints count of found primes from series of culled sieve buffers...
: .countPrimesTo ( u -- )
DUP 3 - 0< IF 2 - 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
ELSE
DUP 3 - 1 RSHIFT 1+
DUP 1- L2CacheBits / L2CacheBits * \ -- rng rngi pglmtbytes
L1CacheBits initSieveBuffer \ address of base prime sieve buffer
L2CacheBits initSieveBuffer \ address of main sieve buffer
LOCALS| sba bpsba pglmt | \ local variables -- rng rngi
0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
1 pglmt 0 ?DO
I L2CacheBits bpsba sba cullSieveBuffer
L2CacheBits 1- sba countSieveBuffer@ +
L2CacheBits +LOOP \ rng rngi cnt
SWAP L2CacheBits mod DUP 0> IF \ one more page!
pglmt OVER bpsba sba cullSieveBuffer
1- sba countSieveBuffer@ + \ partial count!
THEN
bpsba FREE DROP sba FREE DROP \ -- range cnt
THEN CR ." There are " . ." primes Up to the " . ." limit." ;

100 .primes
1000000000 .countPrimesTo</syntaxhighlight>

{{out}}

<pre>Primes to 100 are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 50847534 primes Up to the 1000000000 limit.</pre>

For simplicity, the base primes array is left as a sieved bit-packed array (which takes minimum space) at the cost of having to scan the bit array for base primes on every page-segment culling pass. The page-segment sieve buffer is set as a fixed multiple of this (intended to fit within the CPU L2 cache size) in order to reduce the base prime start index address calculation overhead by this factor at the cost of slightly increased memory access times, which are still only about the same as the fastest inner culling time or less anyway. When the cache sizes are set to 32 Kilobytes/256 Kilobytes for L1/L2, respectively (by changing the definition to <code forth>1 18 LSHIFT CONSTANT L1CacheBits</code>), as suits my Intel Skylake i5-6500 at 3.6 GHz (single-threaded turbo), it runs in about 1.25 seconds on 64-bit VFX Forth, 3.75 seconds on 32-bit swiftForth, and 12.4 seconds on 64-bit gforth-fast; the tuned, in-lined machine code compiled by VFX Forth is obviously much faster than the threaded execution token interpreting of gforth, and swiftForth lacks the machine code inlining of VFX Forth.
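The page start index calculation that both the F# and the Forth page-segmented versions perform can be sketched in Python. This is a simplified illustration under stated assumptions: it uses one byte per odd candidate instead of bit-packing, keeps the base primes as a plain list rather than a sieved bit array, and the names `small_odd_primes` and `count_primes_segmented` are hypothetical.

```python
from math import isqrt


def small_odd_primes(n):
    """Odd primes <= n by a plain sieve (used only for the base primes)."""
    flags = bytearray([1]) * (n + 1)
    for p in range(3, isqrt(n) + 1, 2):
        if flags[p]:
            flags[p * p :: 2 * p] = bytes(len(range(p * p, n + 1, 2 * p)))
    return [p for p in range(3, n + 1, 2) if flags[p]]


def count_primes_segmented(limit, page_bits=1 << 14):
    """Count the primes <= limit, sieving pages of `page_bits` odd candidates."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2            # last odds-only index to examine
    bases = small_odd_primes(isqrt(limit))
    count = 1                         # accounts for the prime 2
    page = bytearray(page_bits)       # reused for every page
    for low in range(0, top + 1, page_bits):
        size = min(page_bits, top + 1 - low)
        page[:] = bytes(page_bits)    # clear the page
        for p in bases:
            start = (p * p - 3) // 2  # global index of p squared
            if start >= low + size:
                break                 # larger base primes start beyond this page
            if start >= low:
                start -= low          # the square lies inside this page
            else:
                r = (low - start) % p # distance past the last multiple before the page
                start = 0 if r == 0 else p - r
            for j in range(start, size, p):
                page[j] = 1
        count += size - sum(page[:size])
    return count
```

Only the base primes up to the square root of the limit and one page buffer are held in memory at any time, so the memory used stays small however large the limit is.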

VFX Forth is only about 25 % slower than the algorithm as written in the fastest of languages, just as they advertise.

As written, the algorithm works efficiently up to over ten billion (1e10) with 64-bit systems, but could easily be refactored to use floating point or double precision for inputs and outputs [https://stackoverflow.com/a/55761023/549617 as I have done in a StackOverflow answer in JavaScript] without costing much in execution time so 32-bit systems would have the much higher limit.

The implementation is efficient up to this range, but with a change so that the base primes array can grow with increasing limit, it can sieve to much higher ranges, with a loss of efficiency from base prime start address calculations that go unused as the culling spans exceed the fixed sieve buffer size. Again, this can be solved by also making the page-segmentation sieve buffer grow as the square root of the limit.

Further improvements by a factor of almost four in overall execution speed would be gained by implementing maximum wheel-factorization [https://stackoverflow.com/a/57108107/549617 as per my other StackOverflow JavaScript answer], which also effectively increases sieve buffer sizes by a factor of 48 in sieving by modulo residual bit planes.

Finally, multi-processing could be applied to increase the execution speed by about the number of effective cores (non SMT - Hyper Threads), as in four on my Skylake machine; however, neither the 1994 ANS Forth standard nor the 2012 standard has a standard Forth way of implementing this, so each implementation uses its own custom WORDS; since the resulting code would not be cross-implementation, I am not going to do this.

I likely won't even add the Maximum Wheel-Factorized version as in the above linked JavaScript code, since this code is enough to demonstrate what I was going to show: that Forth can be an efficient language, albeit a little hard to code, read, and maintain due to the reliance on anonymous data stack operations; it is a language whose best use is likely in cross-compiling to embedded systems where it can easily be customized and extended as required, and because it doesn't actually require a base operating system, can use its core facilities, functions, and extensions in place of such an OS to result in a minimum memory footprint.


=={{header|Fortran}}==
{{works with|Fortran|77}}
<syntaxhighlight lang="fortran">
PROGRAM MAIN
INTEGER LI
WRITE (6,100)
READ (5,110) LI
call SOE(LI)
100 FORMAT( 'Limit:' )
110 FORMAT( I4 )
STOP
END
C --- SIEVE OF ERATOSTHENES ----------
SUBROUTINE SOE( LI )
INTEGER LI
LOGICAL A(LI)
INTEGER SL,P,I
DO 10 I=1,LI
A(I) = .TRUE.
10 CONTINUE
SL = INT(SQRT(REAL(LI)))
A(1) = .FALSE.
DO 30 P=2,SL
IF ( .NOT. A(P) ) GOTO 30
DO 20 I=P*P,LI,P
A(I)=.FALSE.
20 CONTINUE
30 CONTINUE

DO 40 I=2,LI
IF ( A(I) ) WRITE(6,100) I
40 CONTINUE

100 FORMAT(I3)
RETURN
END
</syntaxhighlight>

{{works with|Fortran|90 and later}}
<syntaxhighlight lang="fortran">program sieve

implicit none
write (*, *)

end program sieve</syntaxhighlight>
Output:
<syntaxhighlight lang="text">2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</syntaxhighlight>

Because it uses four-byte logicals (the default size) as elements of the sieve buffer, the above code uses 400 bytes of memory for this trivial task of sieving to 100; it also performs 49 + 31 + 16 + 8 = 104 culling operations (for the primes two, three, five, and seven).

'''Optimised using a pre-computed wheel based on 2:'''

<syntaxhighlight lang="fortran">program sieve_wheel_2

implicit none
write (*, *)

end program sieve_wheel_2</syntaxhighlight>
Output:
<syntaxhighlight lang="text">2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</syntaxhighlight>

This so-called "optimized" version still uses 400 bytes of memory and only slightly reduces the operation count, from 104 to 74 (including the initialization that marks all of the even representations as composite), by skipping the re-culling of the even representations, so it isn't really much of an optimization at all!

'''Optimized using a proper implementation of a wheel 2:'''

The above implementations, especially the second odds-only code, are some of the most inefficient versions of the Sieve of Eratosthenes in any language here as to time and space, exceeded in inefficiency only by some naive JavaScript implementations that use eight-byte Numbers as logical values; the second claims to be wheel factorized but still uses all the same memory as the first and still culls by the even numbers in the initialization of the sieve buffer. As well, using four bytes (the default logical size) to store a boolean value is terribly wasteful if these implementations were to be extended to non-toy ranges. The following code implements proper wheel factorization by two, reducing the space used by a factor of about eight to 49 bytes by using `byte` as the sieve buffer array element type and not requiring the evens initialization, thus reducing the number of culling operations to 16 + 8 + 4 = 28 (for the culling primes three, five, and seven):

<syntaxhighlight lang="fortran">program sieve_wheel_2
implicit none
integer, parameter :: i_max = 100
integer, parameter :: i_limit = (i_max - 3) / 2
integer :: i
byte, dimension (0:i_limit) :: composites
composites = 0
do i = 0, (int (sqrt (real (i_max))) - 3) / 2
if (composites(i) == 0) composites ((i + i) * (i + 3) + 3 : i_limit : i + i + 3) = 1
end do
write (*, '(i0, 1x)', advance = 'no') 2
do i = 0, i_limit
if (composites (i) == 0) write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
end do
write (*, *)
end program sieve_wheel_2</syntaxhighlight>

The output is the same as the earlier version.
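
The operation counts quoted above can be checked with a short sketch. The following Python is not part of the Fortran entry, and the function names <code>culls_all</code> and <code>culls_odds</code> are made up for illustration. It counts the culling writes of a plain sieve and of an odds-only sieve in which index <code>i</code> stands for the odd number <code>2 * i + 3</code>, assuming a limit of at least 2:

```python
from math import isqrt

def culls_all(limit):
    """Plain sieve over 0..limit; returns (primes, number of culling writes)."""
    composite = bytearray(limit + 1)
    culls = 0
    for p in range(2, isqrt(limit) + 1):
        if not composite[p]:
            for m in range(p * p, limit + 1, p):
                composite[m] = 1
                culls += 1
    primes = [n for n in range(2, limit + 1) if not composite[n]]
    return primes, culls

def culls_odds(limit):
    """Odds-only sieve (wheel of 2); index i represents the odd number 2*i + 3."""
    top = (limit - 3) // 2                    # highest index; -1 when limit == 2
    composite = bytearray(top + 1)
    culls = 0
    for i in range((isqrt(limit) - 3) // 2 + 1):
        if not composite[i]:
            p = 2 * i + 3
            start = (p * p - 3) // 2          # equals (i + i) * (i + 3) + 3
            for j in range(start, top + 1, p):
                composite[j] = 1
                culls += 1
    primes = [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]
    return primes, culls

if __name__ == "__main__":
    print(culls_all(100)[1], culls_odds(100)[1])
```

For a limit of 100 this reports 104 culls for the plain sieve and 28 for the odds-only one, matching the sums given in the text, and both functions return the same 25 primes.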

'''Optimized using bit packing to reduce the memory use by a further factor of eight:'''

The above implementation is still space inefficient because it effectively uses only one bit out of every eight; the following version packs the composite flags into bits rather than bytes, which reduces memory use by a further factor of eight:

<syntaxhighlight lang="fortran">program sieve_wheel_2
implicit none
integer, parameter :: i_max = 10000000
integer, parameter :: i_range = (i_max - 3) / 2
integer :: i, j, k, cnt
byte, dimension (0:i_range / 8) :: composites
composites = 0 ! pre-initialized?
do i = 0, (int (sqrt (real (i_max))) - 3) / 2
if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
do j = (i + i) * (i + 3) + 3, i_range, i + i + 3
k = shiftr(j, 3)
composites(k) = ior(composites(k), shiftl(1, iand(j, 7)))
end do
end if
end do
! write (*, '(i0, 1x)', advance = 'no') 2
cnt = 1
do i = 0, i_range
if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
! write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
cnt = cnt + 1
end if
end do
! write (*, *)
print '(a, i0, a, i0, a)', &
'There are ', cnt, ' primes up to ', i_max, '.'
end program sieve_wheel_2</syntaxhighlight>

{{out}}
<pre>There are 664579 primes up to 10000000.</pre>

When the lines that print the results are enabled, the output for a maximum value of 100 is exactly the same as that of the other versions, and the number of culling operations is exactly the same as for the immediately preceding optimized version over the same range; the only difference is that less memory is used. Although each culling operation is somewhat more complex, for larger ranges the time saved through more effective use of the CPU caches more than makes up for it, so the average time per cull is actually reduced. As a result, this version can count the primes up to several million in a few tens of milliseconds (listing hundreds of thousands of primes takes much longer than counting them). For ranges above a few tens of millions, a page-segmented sieve is much more efficient due to further improved use of the CPU caches.
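
The bit addressing used above (byte index <code>i</code> shifted right by 3, bit mask 1 shifted left by the low three bits of <code>i</code>) can be illustrated outside Fortran. The following Python is only a sketch under that assumption, with the made-up name <code>count_primes_bitpacked</code>; it makes no attempt to reproduce the performance of the compiled code:

```python
from math import isqrt

def count_primes_bitpacked(limit):
    """Count primes <= limit with an odds-only sieve packed eight flags per byte."""
    if limit < 2:
        return 0
    top = (limit - 3) // 2                 # highest odd index; index i <-> 2*i + 3
    bits = bytearray(top // 8 + 1)         # one bit per odd number, 0 = prime so far
    for i in range((isqrt(limit) - 3) // 2 + 1):
        if not bits[i >> 3] & (1 << (i & 7)):      # bit clear: 2*i + 3 is prime
            p = 2 * i + 3
            for j in range((p * p - 3) // 2, top + 1, p):
                bits[j >> 3] |= 1 << (j & 7)       # set the bit for a composite
    count = 1                              # account for the prime 2
    for i in range(top + 1):
        if not bits[i >> 3] & (1 << (i & 7)):
            count += 1
    return count

if __name__ == "__main__":
    print(count_primes_bitpacked(1_000_000))
```

The buffer for a limit of 100 occupies only 7 bytes here, one eighth (rounded up) of the 49 bytes of the byte-per-flag version.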

===Multi-Threaded Page-Segmented Bit-Packed Odds-Only Version===

As well as adding page-segmentation, the following code adds multi-processing, which is one of the capabilities for which modern Fortran is known:

<syntaxhighlight lang="fortran">subroutine cullSieveBuffer(lwi, size, bpa, sba)

implicit none
integer, intent(in) :: lwi, size
byte, intent(in) :: bpa(0:size - 1)
byte, intent(out) :: sba(0:size - 1)
integer :: i_limit, i_bitlmt, i_bplmt, i, sqri, bp, si, olmt, msk, j
byte, dimension (0:7) :: bits
common /twiddling/ bits
i_bitlmt = size * 8 - 1
i_limit = lwi + i_bitlmt
i_bplmt = size / 4
sba = 0
i = 0
sqri = (i + i) * (i + 3) + 3
do while (sqri <= i_limit)
if (iand(int(bpa(shiftr(i, 3))), shiftl(1, iand(i, 7))) == 0) then
! start index address calculation...
bp = i + i + 3
if (lwi <= sqri) then
si = sqri - lwi
else
si = mod((lwi - sqri), bp)
if (si /= 0) si = bp - si
end if
if (bp <= i_bplmt) then
olmt = min(i_bitlmt, si + bp * 8 - 1)
do while (si <= olmt)
msk = bits(iand(si, 7))
do j = shiftr(si, 3), size - 1, bp
sba(j) = ior(int(sba(j)), msk)
end do
si = si + bp
end do
else
do while (si <= i_bitlmt)
j = shiftr(si, 3)
sba(j) = ior(sba(j), bits(iand(si, 7)))
si = si + bp
end do
end if
end if
i = i + 1
sqri = (i + i) * (i + 3) + 3
end do
end subroutine cullSieveBuffer
integer function countSieveBuffer(lmti, almti, sba)
implicit none
integer, intent(in) :: lmti, almti
byte, intent(in) :: sba(0:almti)
integer :: bmsk, lsti, i, cnt
byte, dimension (0:65535) :: clut
common /counting/ clut
cnt = 0
bmsk = iand(shiftl(-2, iand(lmti, 15)), 65535)
lsti = iand(shiftr(lmti, 3), -2)
do i = 0, lsti - 1, 2
cnt = cnt + clut(shiftl(iand(int(sba(i)), 255), 8) + iand(int(sba(i + 1)), 255))
end do
countSieveBuffer = cnt + clut(ior(shiftl(iand(int(sba(lsti)), 255), 8) + iand(int(sba(lsti + 1)), 255), bmsk))
end function countSieveBuffer
program sieve_paged
use OMP_LIB
implicit none
integer, parameter :: i_max = 1000000000, i_range = (i_max - 3) / 2
integer, parameter :: i_l1cache_size = 16384, i_l1cache_bitsz = i_l1cache_size * 8
integer, parameter :: i_l2cache_size = i_l1cache_size * 8, i_l2cache_bitsz = i_l2cache_size * 8
integer :: cr, c0, c1, i, j, k, cnt
integer, save :: scnt
integer :: countSieveBuffer
integer :: numthrds
byte, dimension (0:i_l1cache_size - 1) :: bpa
byte, save, allocatable, dimension (:) :: sba
byte, dimension (0:7) :: bits = (/ 1, 2, 4, 8, 16, 32, 64, -128 /)
byte, dimension (0:65535) :: clut
common /twiddling/ bits
common /counting/ clut
type heaparr
byte, allocatable, dimension(:) :: thrdsba
end type heaparr
type(heaparr), allocatable, dimension (:) :: sbaa
!$OMP THREADPRIVATE(scnt, sba)
numthrds = 1
!$ numthrds = OMP_get_max_threads()
allocate(sbaa(0:numthrds - 1))
do i = 0, numthrds - 1
allocate(sbaa(i)%thrdsba(0:i_l2cache_size - 1))
end do
CALL SYSTEM_CLOCK(count_rate=cr)
CALL SYSTEM_CLOCK(c0)
do k = 0, 65535 ! initialize counting Look Up Table
j = k
i = 16
do while (j > 0)
i = i - 1
j = iand(j, j - 1)
end do
clut(k) = i
end do
bpa = 0 ! pre-initialization not guaranteed!
call cullSieveBuffer(0, i_l1cache_size, bpa, bpa)
cnt = 1
!$OMP PARALLEL DO ORDERED
do i = i_l2cache_bitsz, i_range, i_l2cache_bitsz * 8
scnt = 0
sba = sbaa(mod(i, numthrds))%thrdsba
do j = i, min(i_range, i + 8 * i_l2cache_bitsz - 1), i_l2cache_bitsz
call cullSieveBuffer(j - i_l2cache_bitsz, i_l2cache_size, bpa, sba)
scnt = scnt + countSieveBuffer(i_l2cache_bitsz - 1, i_l2cache_size, sba)
end do
!$OMP ATOMIC
cnt = cnt + scnt
end do
!$OMP END PARALLEL DO
j = i_range / i_l2cache_bitsz * i_l2cache_bitsz
k = i_range - j
if (k /= i_l2cache_bitsz - 1) then
call cullSieveBuffer(j, i_l2cache_size, bpa, sbaa(0)%thrdsba)
cnt = cnt + countSieveBuffer(k, i_l2cache_size, sbaa(0)%thrdsba)
end if
! write (*, '(i0, 1x)', advance = 'no') 2
! do i = 0, i_range
! if (iand(sba(shiftr(i, 3)), bits(iand(i, 7))) == 0) write (*, '(i0, 1x)', advance='no') (i + i + 3)
! end do
! write (*, *)
CALL SYSTEM_CLOCK(c1)
print '(a, i0, a, i0, a, f0.0, a)', 'Found ', cnt, ' primes up to ', i_max, &
' in ', ((c1 - c0) / real(cr) * 1000), ' milliseconds.'
do i = 0, numthrds - 1
deallocate(sbaa(i)%thrdsba)
end do
deallocate(sbaa)
end program sieve_paged</syntaxhighlight>

{{out}}
<pre>Found 50847534 primes up to 1000000000 in 219. milliseconds.</pre>

The above output was produced by code compiled with <code>gfortran -O3 -fopenmp</code> (version 11.1.1-1) and run multithreaded on the four cores of my Intel Skylake i5-6500 CPU at 3.2 GHz. There are a few more optimizations that could be made by applying Maximum Wheel-Factorization [https://stackoverflow.com/a/57108107/549617 as per my StackOverflow answer in JavaScript], which would make this almost four times faster yet again. If that optimization were done, sieving to a billion as here would be too quick to measure reliably, and one should sieve at least up to ten billion to get a run time long enough to be measured accurately. As explained in that answer, the Maximum Wheel-Factorized code works efficiently up to about a trillion (1e12), at which point it needs yet another "bucket sieve" optimization to continue to scale efficiently with increasing range. The final optimization, which can speed up the code by almost a factor of two, is a very low-level loop unrolling technique; I'm not sure it will work with this compiler, but as it works in C/C++ and other similar languages, including those that compile through LLVM, it ought to.
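
The start-index calculation at the heart of <code>cullSieveBuffer</code> (where culling for a base prime begins inside a page that does not start at index zero) can be sketched in isolation. The Python below is an illustrative single-threaded sketch only: it uses one byte per flag instead of bit packing, and the names <code>odd_base_primes</code>, <code>count_primes_segmented</code>, and <code>seg_len</code> are assumptions made for this example rather than anything taken from the Fortran code.

```python
from math import isqrt

def odd_base_primes(n):
    """Odd primes <= n from a small monolithic odds-only sieve."""
    if n < 3:
        return []
    top = (n - 3) // 2
    comp = bytearray(top + 1)
    for i in range((isqrt(n) - 3) // 2 + 1):
        if not comp[i]:
            p = 2 * i + 3
            for j in range((p * p - 3) // 2, top + 1, p):
                comp[j] = 1
    return [2 * i + 3 for i in range(top + 1) if not comp[i]]

def count_primes_segmented(limit, seg_len=4096):
    """Count primes <= limit, sieving the odd indices one segment at a time."""
    if limit < 2:
        return 0
    top = (limit - 3) // 2                   # index i represents 2*i + 3
    base = odd_base_primes(isqrt(limit))
    count = 1                                # the prime 2
    for low in range(0, top + 1, seg_len):
        high = min(low + seg_len - 1, top)
        seg = bytearray(high - low + 1)      # flags for indices low..high
        for p in base:
            sq = (p * p - 3) // 2            # index of p squared
            if sq > high:
                break                        # larger base primes start later
            if sq >= low:
                start = sq - low             # first cull lies inside this segment
            else:
                r = (low - sq) % p           # distance past the last multiple
                start = 0 if r == 0 else p - r
            for j in range(start, len(seg), p):
                seg[j] = 1
        count += seg.count(0)
    return count

if __name__ == "__main__":
    print(count_primes_segmented(1_000_000))
```

Because each segment is sieved independently once the base primes are known, segments can be handed to separate threads, which is what the OpenMP loop in the Fortran program does.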

=={{header|Free Pascal}}==
===Basic version===
Function <code>Sieve</code> returns a list of primes less than or equal to the given <code>aLimit</code>.
<syntaxhighlight lang="pascal">
program prime_sieve;
{$mode objfpc}{$coperators on}
uses
SysUtils, GVector;
type
TPrimeList = specialize TVector<DWord>;
function Sieve(aLimit: DWord): TPrimeList;
var
IsPrime: array of Boolean;
I, SqrtBound: DWord;
J: QWord;
begin
Result := TPrimeList.Create;
Inc(aLimit, Ord(aLimit < High(DWord))); //not a problem because High(DWord) is composite
SetLength(IsPrime, aLimit);
FillChar(Pointer(IsPrime)^, aLimit, Byte(True));
SqrtBound := Trunc(Sqrt(aLimit));
for I := 2 to Pred(aLimit) do // valid indices are 0..aLimit - 1
if IsPrime[I] then
begin
Result.PushBack(I);
if I <= SqrtBound then
begin
J := I * I;
while J < aLimit do // never write past the end of IsPrime
begin
IsPrime[J] := False;
J += I;
end;
end;
end;
end;

//usage

var
Limit: DWord = 0;
function ReadLimit: Boolean;
var
Lim: Int64;
begin
if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
if (Lim >= 0) and (Lim <= High(DWord)) then
begin
Limit := DWord(Lim);
exit(True);
end;
Result := False;
end;
procedure PrintUsage;
begin
WriteLn('Usage: prime_sieve Limit');
WriteLn(' where Limit in the range [0, ', High(DWord), ']');
Halt;
end;
procedure PrintPrimes(aList: TPrimeList);
var
I: DWord;
begin
if aList.Size <> 0 then begin
if aList.Size > 1 then
for I := 0 to aList.Size - 2 do
Write(aList[I], ', ');
WriteLn(aList[aList.Size - 1]);
end;
aList.Free;
end;
begin
if not ReadLimit then
PrintUsage;
try
PrintPrimes(Sieve(Limit));
except
on e: Exception do
WriteLn('An exception ', e.ClassName, ' occurred with message: ', e.Message);
end;
end.
</syntaxhighlight>
===Alternative segmented (odds only) version===
Function <code>OddSegmentSieve</code> returns a list of primes less than or equal to the given <code>aLimit</code>.
<syntaxhighlight lang="pascal">
program prime_sieve;
{$mode objfpc}{$coperators on}
uses
SysUtils, Math;
type
TPrimeList = array of DWord;
function OddSegmentSieve(aLimit: DWord): TPrimeList;
function EstimatePrimeCount(aLimit: DWord): DWord;
begin
case aLimit of
0..1: Result := 0;
2..200: Result := Trunc(1.6 * aLimit/Ln(aLimit)) + 1;
else
Result := Trunc(aLimit/(Ln(aLimit) - 2)) + 1;
end;
end;
function Sieve(aLimit: DWord; aNeed2: Boolean): TPrimeList;
var
IsPrime: array of Boolean;
I: DWord = 3;
J, SqrtBound: DWord;
Count: Integer = 0;
begin
if aLimit < 2 then
exit(nil);
SetLength(IsPrime, (aLimit - 1) div 2);
FillChar(Pointer(IsPrime)^, Length(IsPrime), Byte(True));
SetLength(Result, EstimatePrimeCount(aLimit));
SqrtBound := Trunc(Sqrt(aLimit));
if aNeed2 then
begin
Result[0] := 2;
Inc(Count);
end;
for I := 0 to High(IsPrime) do
if IsPrime[I] then
begin
Result[Count] := I * 2 + 3;
if Result[Count] <= SqrtBound then
begin
J := Result[Count] * Result[Count];
repeat
IsPrime[(J - 3) div 2] := False;
J += Result[Count] * 2;
until J > aLimit;
end;
Inc(Count);
end;
SetLength(Result, Count);
end;
const
PAGE_SIZE = $8000;
var
IsPrime: array[0..Pred(PAGE_SIZE)] of Boolean; //current page
SmallPrimes: TPrimeList = nil;
I: QWord;
J, PageHigh, Prime: DWord;
Count: Integer;
begin
if aLimit < PAGE_SIZE div 4 then
exit(Sieve(aLimit, True));
I := Trunc(Sqrt(aLimit));
SmallPrimes := Sieve(I + 1, False);
Count := Length(SmallPrimes) + 1;
I += Ord(not Odd(I));
SetLength(Result, EstimatePrimeCount(aLimit));
while I <= aLimit do
begin
PageHigh := Min(Pred(PAGE_SIZE * 2), aLimit - I);
FillChar(IsPrime, PageHigh div 2 + 1, Byte(True));
for Prime in SmallPrimes do
begin
J := DWord(I) mod Prime;
if J <> 0 then
J := Prime shl (1 - J and 1) - J;
while J <= PageHigh do
begin
IsPrime[J div 2] := False;
J += Prime * 2;
end;
end;
for J := 0 to PageHigh div 2 do
if IsPrime[J] then
begin
Result[Count] := J * 2 + I;
Inc(Count);
end;
I += PAGE_SIZE * 2;
end;
SetLength(Result, Count);
Result[0] := 2;
Move(SmallPrimes[0], Result[1], Length(SmallPrimes) * SizeOf(DWord));
end;

//usage

var
Limit: DWord = 0;
function ReadLimit: Boolean;
var
Lim: Int64;
begin
if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
if (Lim >= 0) and (Lim <= High(DWord)) then
begin
Limit := DWord(Lim);
exit(True);
end;
Result := False;
end;
procedure PrintUsage;
begin
WriteLn('Usage: prime_sieve Limit');
WriteLn(' where Limit in the range [0, ', High(DWord), ']');
Halt;
end;
procedure PrintPrimes(const aList: TPrimeList);
var
I: DWord;
begin
for I := 0 to Length(aList) - 2 do
Write(aList[I], ', ');
if aList <> nil then
WriteLn(aList[High(aList)]);
end;
begin
if not ReadLimit then
PrintUsage;
PrintPrimes(OddSegmentSieve(Limit));
end.
</syntaxhighlight>


=={{header|FreeBASIC}}==
=={{header|FreeBASIC}}==
<lang freebasic>' FB 1.05.0
<syntaxhighlight lang="freebasic">' FB 1.05.0


Sub sieve(n As Integer)
Sub sieve(n As Integer)
Line 3,966: Line 8,683:
Print
Print
Print "Press any key to quit"
Print "Press any key to quit"
Sleep</lang>
Sleep</syntaxhighlight>


{{out}}
{{out}}
Line 3,982: Line 8,699:
947 953 967 971 977 983 991 997
947 953 967 971 977 983 991 997
</pre>
</pre>

=={{header|Frink}}==
<syntaxhighlight lang="frink">
n = eval[input["Enter highest number: "]]
results = array[sieve[n]]
println[results]
println[length[results] + " prime numbers less than or equal to " + n]

sieve[n] :=
{
// Initialize array
array = array[0 to n]
array@1 = 0

for i = 2 to ceil[sqrt[n]]
if array@i != 0
for j = i^2 to n step i
array@j = 0

return select[array, { |x| x != 0 }]
}
</syntaxhighlight>

=={{header|Furor}}==
''Note: With benchmark function''

<syntaxhighlight lang="furor">
tick sto startingtick
#g 100000 sto MAX
@MAX mem !maximize sto primeNumbers
one count
@primeNumbers 0 2 [^]
2 @MAX külső: {||
@count {|
{}§külső {} []@primeNumbers !/ else{<}§külső
|} // end of @count
@primeNumbers @count++ {} [^]
|} // end of @MAX
@primeNumbers free
."Time : " tick @startingtick - print ." tick\n"
."Number of primes = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }
</syntaxhighlight>

=={{header|Peri}}==
''Note: With benchmark function''

<syntaxhighlight lang="peri">
###sysinclude standard.uh
tick sto startingtick
#g 100000 sto MAX
@MAX mem !maximize sto primeNumbers
one count
2 0 sto#s primeNumbers
2 @MAX külső: {{ ,
@count {{
{{}}§külső primeNumbers[{{}}] !/ else {{<}}§külső
}} // end of @count
//{{}} gprintnl // Uncomment this line to print each prime found
{{}} @count++ sto#s primeNumbers
}} // end of @MAX
@primeNumbers inv mem
//."Time : " tick @startingtick - print ." tick\n"
."Number of primes = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }

</syntaxhighlight>

=={{header|FutureBasic}}==
===Basic sieve of array of booleans===
<syntaxhighlight lang="futurebasic">window 1, @"Sieve of Eratosthenes", (0,0,720,300)

begin globals
dynamic gPrimes(1) as Boolean
end globals

local fn SieveOfEratosthenes( n as long )
long i, j
for i = 2 to n
for j = i * i to n step i
gPrimes(j) = _true
next
if gPrimes(i) = 0 then print i,
next i
kill gPrimes
end fn

fn SieveOfEratosthenes( 100 )

HandleEvents</syntaxhighlight>
Output:
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
</pre>

=={{header|Fōrmulæ}}==

{{FormulaeEntry|page=https://formulae.org/?script=examples/Sieve_of_Eratosthenes}}

'''Solution'''

[[File:Fōrmulæ - Sieve of Eratosthenes 01.png]]

'''Test case'''

[[File:Fōrmulæ - Sieve of Eratosthenes 02.png]]

[[File:Fōrmulæ - Sieve of Eratosthenes 03.png]]


=={{header|GAP}}==
=={{header|GAP}}==
<lang gap>Eratosthenes := function(n)
<syntaxhighlight lang="gap">Eratosthenes := function(n)
local a, i, j;
local a, i, j;
a := ListWithIdenticalEntries(n, true);
a := ListWithIdenticalEntries(n, true);
Line 4,008: Line 8,836:
Eratosthenes(100);
Eratosthenes(100);


[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]</lang>
[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]</syntaxhighlight>


=={{header|GLBasic}}==
=={{header|GLBasic}}==
<lang GLBasic>// Sieve of Eratosthenes (find primes)
<syntaxhighlight lang="glbasic">// Sieve of Eratosthenes (find primes)
// GLBasic implementation
// GLBasic implementation


Line 4,035: Line 8,863:


KEYWAIT
KEYWAIT
</syntaxhighlight>
</lang>

=={{header|FutureBasic}}==
===Basic sieve of array of booleans===
<lang futurebasic>
include "ConsoleWindow"

begin globals
dim dynamic gPrimes(1) as Boolean
end globals

local fn SieveOfEratosthenes( n as long )
dim as long i, j

for i = 2 to n
for j = i * i to n step i
gPrimes(j) = _true
next
if gPrimes(i) = 0 then print i;
next i
kill gPrimes
end fn

fn SieveOfEratosthenes( 100 )
</lang>
Output:
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
</pre>


=={{header|Go}}==
=={{header|Go}}==
===Basic sieve of array of booleans===
===Basic sieve of array of booleans===
<lang go>package main
<syntaxhighlight lang="go">package main
import "fmt"
import "fmt"


Line 4,108: Line 8,908:
}
}
}
}
}</lang>
}</syntaxhighlight>
Output:
Output:
<pre>
<pre>
Line 4,127: Line 8,927:
The above version's output is rather specialized; the following version uses a closure function to enumerate over the culled composite number array, which is bit packed. By using this scheme for output, no extra memory is required above that required for the culling array:
The above version's output is rather specialized; the following version uses a closure function to enumerate over the culled composite number array, which is bit packed. By using this scheme for output, no extra memory is required above that required for the culling array:


<lang go>package main
<syntaxhighlight lang="go">package main


import (
import (
Line 4,174: Line 8,974:
}
}
fmt.Printf("\r\n%v\r\n", count)
fmt.Printf("\r\n%v\r\n", count)
}</lang>
}</syntaxhighlight>
{{output}}
{{output}}
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Line 4,181: Line 8,981:
===Sieve Tree===
===Sieve Tree===
A fairly odd sieve tree method:
A fairly odd sieve tree method:
<lang go>package main
<syntaxhighlight lang="go">package main
import "fmt"
import "fmt"


Line 4,240: Line 9,040:
fmt.Println(p())
fmt.Println(p())
}
}
}</lang>
}</syntaxhighlight>


===Concurrent Daisy-chain sieve===
===Concurrent Daisy-chain sieve===
A concurrent prime sieve adopted from the example in the "Go Playground" window at http://golang.org/
A concurrent prime sieve adopted from the example in the "Go Playground" window at http://golang.org/
<lang go>package main
<syntaxhighlight lang="go">package main
import "fmt"
import "fmt"
Line 4,295: Line 9,095:
}
}
}
}
}</lang>
}</syntaxhighlight>
The output:
The output:
<pre>
<pre>
Line 4,305: Line 9,105:
===Postponed Concurrent Daisy-chain sieve===
===Postponed Concurrent Daisy-chain sieve===
Here we postpone the ''creation'' of filters until the prime's square is seen in the input, to radically reduce the amount of filter channels in the sieve chain.
Here we postpone the ''creation'' of filters until the prime's square is seen in the input, to radically reduce the amount of filter channels in the sieve chain.
<lang go>package main
<syntaxhighlight lang="go">package main
import "fmt"
import "fmt"
Line 4,372: Line 9,172:
}
}
}
}
}</lang>
}</syntaxhighlight>


The output:
The output:
Line 4,380: Line 9,180:


[http://ideone.com/I0AXf5 Runs at ~ n^1.2] empirically, producing up to n=25,000 primes on ideone in under 5 seconds.
[http://ideone.com/I0AXf5 Runs at ~ n^1.2] empirically, producing up to n=25,000 primes on ideone in under 5 seconds.
===Incremental Odds-only Sieve===
Uses Go's built-in hash tables to store odd composites, and defers adding new known composites until the square is seen.
<syntaxhighlight lang="go">
package main

import "fmt"

func main() {
primes := make(chan int)
go PrimeSieve(primes)

p := <-primes
for p < 100 {
fmt.Printf("%d ", p)
p = <-primes
}

fmt.Println()
}

func PrimeSieve(out chan int) {
out <- 2
out <- 3

primes := make(chan int)
go PrimeSieve(primes)

var p int
<-primes
p = <-primes

sieve := make(map[int]int)
q := p * p
n := p

for {
n += 2
step, isComposite := sieve[n]
if isComposite {
delete(sieve, n)
m := n + step
for sieve[m] != 0 {
m += step
}
sieve[m] = step

} else if n < q {
out <- n

} else {
step = p + p
m := n + step
for sieve[m] != 0 {
m += step
}
sieve[m] = step
p = <-primes
q = p * p
}
}
}
</syntaxhighlight>
The output:
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
</pre>
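
The postponement idea used above is independent of Go's channels. As a rough cross-check, here it is sketched as a Python generator, in Python rather than Go only for brevity; the name <code>postponed_odd_primes</code> is made up for this illustration. A dictionary maps each pending odd composite to its step (twice a base prime), and a prime's first composite is only entered once its square is reached:

```python
from itertools import takewhile

def postponed_odd_primes():
    """Unbounded prime generator: odds only, composites kept in a dict."""
    yield 2
    yield 3
    base = postponed_odd_primes()   # separate, much slower supply of base primes
    next(base)                      # discard 2
    p = next(base)                  # first base prime, 3
    q = p * p                       # its square, where culling by p must begin
    sieve = {}                      # pending odd composite -> step (2 * prime)
    n = 3
    while True:
        n += 2
        step = sieve.pop(n, None)
        if step is None:
            if n < q:
                yield n             # not a known composite and below q: prime
                continue
            step = 2 * p            # n == q: start culling by p from here on
            p = next(base)
            q = p * p
        m = n + step                # next odd multiple not already claimed
        while m in sieve:
            m += step
        sieve[m] = step

if __name__ == "__main__":
    print(*takewhile(lambda x: x < 100, postponed_odd_primes()))
```

At any moment the dictionary holds one entry per base prime up to the square root of the current candidate, which is why memory use stays small.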


=={{header|Groovy}}==
=={{header|Groovy}}==
This solution uses a BitSet for compactness and speed, but in [[Groovy]], BitSet has full List semantics. It also uses both the "square root of the boundary" shortcut and the "square of the prime" shortcut.
This solution uses a BitSet for compactness and speed, but in [[Groovy]], BitSet has full List semantics. It also uses both the "square root of the boundary" shortcut and the "square of the prime" shortcut.
<lang groovy>def sievePrimes = { bound ->
<syntaxhighlight lang="groovy">def sievePrimes = { bound ->
def isPrime = new BitSet(bound)
def isPrime = new BitSet(bound)
isPrime[0..1] = false
isPrime[0..1] = false
Line 4,393: Line 9,259:
}
}
(0..bound).findAll { isPrime[it] }
(0..bound).findAll { isPrime[it] }
}</lang>
}</syntaxhighlight>


Test:
Test:
<lang groovy>println sievePrimes(100)</lang>
<syntaxhighlight lang="groovy">println sievePrimes(100)</syntaxhighlight>


Output:
Output:
Line 4,403: Line 9,269:
=={{header|GW-BASIC}}==
=={{header|GW-BASIC}}==


<lang qbasic>10 INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
<syntaxhighlight lang="qbasic">10 INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20 DIM FLAGS(LIMIT)
20 DIM FLAGS(LIMIT)
30 FOR N = 2 TO SQR (LIMIT)
30 FOR N = 2 TO SQR (LIMIT)
Line 4,414: Line 9,280:
100 FOR N = 2 TO LIMIT
100 FOR N = 2 TO LIMIT
110 IF FLAGS(N) = 0 THEN PRINT N;", ";
110 IF FLAGS(N) = 0 THEN PRINT N;", ";
120 NEXT N</lang>
120 NEXT N</syntaxhighlight>


=={{header|Haskell}}==
=={{header|Haskell}}==

===Mutable unboxed arrays===
Mutable array of unboxed <code>Bool</code>s indexed by <code>Int</code>s:

<syntaxhighlight lang="haskell">{-# LANGUAGE FlexibleContexts #-} -- too lazy to write contexts...
{-# OPTIONS_GHC -O2 #-}

import Control.Monad.ST ( runST, ST )
import Data.Array.Base ( MArray(newArray, unsafeRead, unsafeWrite),
IArray(unsafeAt),
STUArray, unsafeFreezeSTUArray, assocs )
import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing...

primesTo :: Int -> [Int] -- generate a list of primes to given limit...
primesTo limit = runST $ do
let lmt = limit - 2 -- raw index of limit!
cmpsts <- newArray (2, limit) False -- when indexed is true is composite
cmpstsf <- unsafeFreezeSTUArray cmpsts -- frozen in place!
let getbpndx bp = (bp, bp * bp - 2) -- bp -> bp, raw index of start cull
cullcmpst i = unsafeWrite cmpsts i True -- cull composite by raw ndx
cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
mapM_ cull4bpndx
$ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
[ getbpndx bp | (bp, False) <- assocs cmpstsf ]
return [ p | (p, False) <- assocs cmpstsf ] -- non-raw ndx is prime

-- testing...
main :: IO ()
main = do
putStrLn $ "The primes up to 100 are " ++ show (primesTo 100)
putStrLn $ "The number of primes up to a million is " ++
show (length $ primesTo 1000000)
let top = 1000000000
start <- getPOSIXTime
let answr = length $ primesTo top
stop <- answr `seq` getPOSIXTime -- force result for timing!
let elpsd = round $ 1e3 * (stop - start) :: Int

putStrLn $ "Found " ++ show answr ++ " to " ++ show top ++
" in " ++ show elpsd ++ " milliseconds."</syntaxhighlight>

The above code chooses conciseness and elegance over speed, but it isn't too slow:
{{out}}
<pre>The primes up to 100 are [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
The number of primes up to a million is 78498
Found 50847534 to 1000000000 in 12435 milliseconds.</pre>

Run on an Intel Skylake i5-2500 at 3.6 GHz (single-threaded boost). As noted in the section below, this is sped up by a large constant factor by using the raw <code>unsafeWrite</code>; using the "unsafe" versions, which avoid run-time array bounds checks on every operation, is entirely safe here because the loops inherently keep every index within bounds. There is an additional gain of about 20 per cent in speed when compiling with the LLVM back end (add the <code>-fllvm</code> flag), provided the right version of LLVM is available to the GHC Haskell compiler. The benefit of LLVM is relatively small because this program spends a relatively small percentage of its time in the tight inner culling loop, where LLVM helps the most, and a large part of its time just enumerating the result list.


===Mutable unboxed arrays, odds only===
===Mutable unboxed arrays, odds only===
Mutable array of unboxed <code>Bool</code>s indexed by <code>Int</code>s, representing odds only:
Mutable array of unboxed <code>Bool</code>s indexed by <code>Int</code>s, representing odds only:


<lang haskell>import Control.Monad (forM_, when)
<syntaxhighlight lang="haskell">import Control.Monad (forM_, when)
import Control.Monad.ST
import Control.Monad.ST
import Data.Array.ST
import Data.Array.ST
Line 4,440: Line 9,354:
primesToUO :: Int -> [Int]
primesToUO :: Int -> [Int]
primesToUO top | top > 1 = 2 : [2*i + 1 | (i,True) <- assocs $ sieveUO top]
primesToUO top | top > 1 = 2 : [2*i + 1 | (i,True) <- assocs $ sieveUO top]
| otherwise = []</lang>
| otherwise = []</syntaxhighlight>


This represents ''odds only'' in the array. [http://ideone.com/KwZNc Empirical orders of growth] is ~ <i>n<sup>1.2</sup></i> in ''n'' primes produced, and improving for bigger ''n''&zwj;&thinsp;&zwj;s. Memory consumption is low (array seems to be packed) and growing about linearly with ''n''. Can further be [http://ideone.com/j24jxV significantly sped up] by re-writing the <code>forM_</code> loops with direct recursion, and using <code>unsafeRead</code> and <code>unsafeWrite</code> operations.
This represents ''odds only'' in the array. [http://ideone.com/KwZNc Empirical orders of growth] is ~ <i>n<sup>1.2</sup></i> in ''n'' primes produced, and improving for bigger ''n''&zwj;&thinsp;&zwj;s. Memory consumption is low (array seems to be packed) and growing about linearly with ''n''. Can further be [http://ideone.com/j24jxV significantly sped up] by re-writing the <code>forM_</code> loops with direct recursion, and using <code>unsafeRead</code> and <code>unsafeWrite</code> operations.

In light of the performance of the previous and following submissions, the IDEOne results seem somewhat slow at about 10 seconds over a range of about a third of a billion, likely due to some lazily deferred operations in the processing. See the next submission for the speeds to be expected for odds only.

The measured empirical orders of growth in the table at the IDEOne link are easily understood if one considers that these slowish run times are primarily limited by the time taken to lazily enumerate the results, and that the number of primes found varies as (top / log top) by the prime number theorem. Since the prime density decreases according to this relationship, the enumeration cost follows the inverse relationship: it takes longer per prime to find the primes in the sieved buffer. The log of a million is 1.2 times the log of a hundred thousand, and this ratio gets smaller with range: the ratio of the log of a billion to the log of a hundred million is 1.125, etc.
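
The ratios quoted above are simple to reproduce. As a small worked example (in Python, with the made-up helper name <code>enumeration_cost_ratio</code>), and assuming only that the prime density near ''n'' is about 1 / log ''n'', the relative cost per prime found when the range grows tenfold is the ratio of the two logarithms:

```python
from math import log

def enumeration_cost_ratio(n_small, n_large):
    """Ratio log(n_large) / log(n_small): the relative scanning cost per prime
    found, if primes thin out in proportion to 1 / log n."""
    return log(n_large) / log(n_small)

if __name__ == "__main__":
    print(round(enumeration_cost_ratio(10**5, 10**6), 3))  # 1.2
    print(round(enumeration_cost_ratio(10**8, 10**9), 3))  # 1.125
```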

===Alternate Version of Mutable unboxed arrays, odds only===

The reason for this alternate version is to have an accessible version of "odds only" that uses the same optimizations and is written in the same coding style as the basic version. This can be used by just substituting the following code for the function of the same name in the first base example above. Mutable array of unboxed <code>Bool</code>s indexed by <code>Int</code>s, representing odds only:

<syntaxhighlight lang="haskell">primesTo :: Int -> [Int] -- generate a list of primes to given limit...
primesTo limit
| limit < 2 = []
| otherwise = runST $ do
let lmt = (limit - 3) `div` 2 -- limit index!
oddcmpsts <- newArray (0, lmt) False -- when indexed is true is composite
oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
mapM_ cull4bpndx
$ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
[ getbpndx i | (i, False) <- assocs oddcmpstsf ]
return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]</syntaxhighlight>

{{out}}
<pre>The primes up to 100 are [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
The number of primes up to a million is 78498
Found 50847534 to 1000000000 in 6085 milliseconds.</pre>

A "monolithic buffer" odds only sieve uses half the memory as compared to the basic version.

This version is only about twice as fast as the basic version, not the expected factor of about 2.5, because the number of culling operations is not the only contribution to the execution time, as follows:

1) Since the amount of memory used to sieve to a billion has been dropped from 125 million bytes to 62.5 million bytes, the cache associativity is slightly better, which should make it faster; however

2) We have eliminated the culling by the very small span of the base prime of two, which means a lesser percentage of the culling span operations will be within a given CPU cache size, which will make it slower, but

3) The primary reason we observe only about a factor of two difference in run times is that we have increased the prime density in the sieving buffer by a factor of two, which means that we have half the work to enumerate the primes. Since enumeration of the found primes is a major contribution of the execution time, the execution time will tend to change more by its cost than any other.

As to "empirical orders of growth", the comments made above still apply, but there is a further observation. For smaller ranges of up to a few million, where the sieving buffer fits within the CPU L2 cache (generally 256 Kilobytes or about 2 million bits, representing a range of about four million for this version), the cull times are at their fastest and enumeration accounts for a bigger percentage of the time; as ranges increase above that, more and more time is spent waiting on memory at the access times of the next memory level (the CPU L3 cache, if present, followed by main memory), so that enumeration time becomes a somewhat less dominant factor as the range gets larger.

'''Because of the greatly increasing memory demands and the high execution cost of memory access as ranges exceed the span of the CPU caches, it is not recommended that these simple "monolithic buffer" sieves be used for sieving of ranges above about a hundred million.''' Rather, one should use a "Paged-Segmented" sieve as per the examples near the end of this Haskell section.


===Immutable arrays===
===Immutable arrays===
Monolithic sieving array. ''Even'' numbers above 2 are pre-marked as composite, and sieving is done only by ''odd'' multiples of ''odd'' primes:
Monolithic sieving array. ''Even'' numbers above 2 are pre-marked as composite, and sieving is done only by ''odd'' multiples of ''odd'' primes:
<syntaxhighlight lang="haskell">import Data.Array.Unboxed

primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]] :: UArray Int Bool)
  where
    sieve p a
      | p*p > m   = 2 : [i | (i,True) <- assocs a]
      | a!p       = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]]
      | otherwise = sieve (p+2) a</syntaxhighlight>


Its performance depends sharply on compiler optimizations. Compiled with the -O2 flag and in the presence of the explicit type signature, it is very fast in producing the first few million primes. <code>(//)</code> is an array update operator.

Works by segments between consecutive primes' squares. Should be the fastest of the non-monadic code. ''Evens'' are entirely ignored:
<syntaxhighlight lang="haskell">import Data.Array.Unboxed

primesSA = 2 : prs ()
-- ...
      a     :: UArray Int Bool
      a     = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]]</syntaxhighlight>

====As list comprehension====

<syntaxhighlight lang="haskell">import Data.Array.Unboxed
import Data.List (tails, inits)

primes = 2 : [ n |
   (r:q:_, px) <- zip (tails (2 : [p*p | p <- primes]))
                      (inits primes),
   (n, True)   <- assocs ( accumArray (\_ _ -> False) True
                     (r+1,q-1)
                     [ (m,()) | p <- px
                              , s <- [ div (r+p) p * p]
                              , m <- [s,s+p..q-1] ] :: UArray Int Bool
                  ) ]</syntaxhighlight>


===Basic list-based sieve===
Straightforward implementation of the sieve of Eratosthenes in its original bounded form. This finds primes in gaps between the composites, and composites as an enumeration of each prime's multiples.
<syntaxhighlight lang="haskell">primesTo m = eratos [2..m] where
   eratos (p : xs)
      | p*p > m   = p : xs
-- ...
       EQ ->     minus xs ys
       GT ->     minus a  ys
minus a b = a</syntaxhighlight>
Its time complexity is similar to that of optimal [[Primality_by_trial_division#Haskell|trial division]] because of limitations of Haskell linked lists, where <code>(minus a b)</code> takes time proportional to <code>length(union a b)</code> and not <code>(length b)</code>, as achieved in an imperative setting with direct-access memory. It uses an ordered list representation of sets.
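
As a minimal self-contained sketch of the same idea, the bounded list-based sieve can be written out in full as follows. The name <code>primesUpTo</code> is chosen here purely for illustration, and <code>minus</code> is assumed to be the usual ordered-list set difference; this is one plausible rendering, not necessarily the exact code of this entry.

<syntaxhighlight lang="haskell">-- Ordered-list set difference: the elements of the first (ascending) list
-- that do not occur in the second (ascending) list.
minus :: Ord a => [a] -> [a] -> [a]
minus a@(x:xs) b@(y:ys) = case compare x y of
  LT -> x : minus xs b
  EQ ->     minus xs ys
  GT ->     minus a  ys
minus a _ = a

-- Bounded sieve (illustrative name): repeatedly remove the multiples of the
-- head prime, starting from its square, until the square exceeds the limit.
primesUpTo :: Int -> [Int]
primesUpTo m = eratos [2..m]
  where
    eratos []     = []
    eratos (p:xs)
      | p*p > m   = p : xs
      | otherwise = p : eratos (xs `minus` [p*p, p*p+p .. m])</syntaxhighlight>

Each call to <code>minus</code> walks the whole remaining candidate list, which is where the extra cost relative to an array-based sieve comes from.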


===Unbounded list based sieve===
Unbounded, "naive", too eager to subtract (see above for the definition of <code>minus</code>):
<syntaxhighlight lang="haskell">primesE = sieve [2..]
   where
   sieve (p:xs) = p : sieve (minus xs [p, p+p..])
-- unfoldr (\(p:xs)-> Just (p, minus xs [p, p+p..])) [2..]</syntaxhighlight>
This is slow, with complexity increasing as a square law or worse, so that it is only moderately useful for the first few thousand primes or so.

The number of active streams can be limited to what's strictly necessary by postponement until the square of a prime is seen, getting a massive complexity improvement to better than <i>~ n<sup>1.5</sup></i> so it can get the first million primes or so in a tolerable time:
<syntaxhighlight lang="haskell">primesPE = 2 : sieve [3..] 4 primesPE
    where
    sieve (x:xs) q (p:t)
-- ...
-- fix $ (2:) . concat
--     . unfoldr (\(p:ps,xs)-> Just . second ((ps,) . (`minus` [p*p, p*p+p..]))
--                                  . span (< p*p) $ xs) . (,[3..])</syntaxhighlight>
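
The postponement idea can also be sketched with <code>span</code>, in the spirit of the commented-out variant above. This is a hedged, self-contained illustration rather than the code of this entry: the name <code>primesPost</code> is hypothetical, and <code>minus</code> is assumed to be the ordered-list difference used throughout this section.

<syntaxhighlight lang="haskell">-- Ordered-list set difference, as used elsewhere in this section.
minus :: Ord a => [a] -> [a] -> [a]
minus a@(x:xs) b@(y:ys) = case compare x y of
  LT -> x : minus xs b
  EQ ->     minus xs ys
  GT ->     minus a  ys
minus a _ = a

-- Postponed sieve (illustrative name): the multiples of a prime p are only
-- subtracted once the candidates have reached p*p, so the candidates below
-- p*p are emitted without ever being compared against p's multiples.
primesPost :: [Int]
primesPost = 2 : sieve [3..] primesPost
  where
    sieve xs (p:ps) =
      let (h, t) = span (< p*p) xs          -- everything below p*p is prime
      in  h ++ sieve (t `minus` [p*p, p*p+p ..]) ps</syntaxhighlight>

For example, <code>take 10 primesPost</code> yields the first ten primes; only as many multiples streams are active as there are primes below the square root of the current candidate.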


Transposing the workflow, going by segments between the consecutive squares of primes:
<syntaxhighlight lang="haskell">import Data.List (inits)

primesSE = 2 : sieve 3 4 (tail primesSE) (inits primesSE)
-- ...
    --   True (x,q-1) [(i,()) | f <- fs, let n=div(x+f-1)f*f,
    --                          i <- [n, n+f..q-1]] :: UArray Int Bool )]
                   ++ sieve q (head ps^2) (tail ps) ft</syntaxhighlight>


The basic gradually-deepening left-leaning <code>(((a-b)-c)- ... )</code> workflow of <code>foldl minus a bs</code> above can be rearranged into the right-leaning <code>(a-(b+(c+ ... )))</code> workflow of <code>minus a (foldr union [] bs)</code>. This is the idea behind Richard Bird's unbounded code presented in [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf M. O'Neill's article], equivalent to:


<syntaxhighlight lang="haskell">primesB = _Y ( (2:) . minus [3..] . foldr (\p-> (p*p :) . union [p*p+p, p*p+2*p..]) [] )

--      = _Y ( (2:) . minus [3..] . _LU . map(\p-> [p*p, p*p+p..]) )
-- ...
       LT -> x : union xs b
       EQ -> x : union xs ys
       GT -> y : union a  ys</syntaxhighlight>


Using <code>_Y</code> is meant to guarantee the separate supply of primes to be independently calculated, recursively, instead of the same one being reused, corecursively; thus the memory footprint is drastically reduced. This idea was introduced by M. O'Neill as a double-staged production, with a separate primes feed.
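
The difference between the non-sharing <code>_Y</code> and the sharing <code>fix</code> can be illustrated with the following self-contained sketch. It is an odds-only variation written for illustration only; the names <code>primesY</code>, <code>primesFix</code> and <code>body</code> are hypothetical, and <code>minus</code> and <code>union</code> are assumed to be the usual ordered-list operations.

<syntaxhighlight lang="haskell">import Data.Function (fix)

-- Non-sharing fixpoint: every recursion level rebuilds its own copy of the
-- stream, so the inner primes feeds are separate (and much shorter) lists.
_Y :: (t -> t) -> t
_Y g = g (_Y g)

minus, union :: Ord a => [a] -> [a] -> [a]
minus a@(x:xs) b@(y:ys) = case compare x y of
  LT -> x : minus xs b
  EQ ->     minus xs ys
  GT ->     minus a  ys
minus a _ = a
union a@(x:xs) b@(y:ys) = case compare x y of
  LT -> x : union xs b
  EQ -> x : union xs ys
  GT -> y : union a  ys
union a [] = a
union [] b = b

-- Given a feed of odd primes, produce the odd primes: the odd candidates
-- minus the right-leaning union of each feed prime's odd multiples from p*p.
body :: [Int] -> [Int]
body = (3:) . minus [5,7..]
            . foldr (\p r -> p*p : union [p*p+2*p, p*p+4*p ..] r) []

primesY, primesFix :: [Int]
primesY   = 2 : _Y  body   -- separate feeds: the full list need not be retained
primesFix = 2 : fix body   -- one shared list: the feed retains the whole prefix</syntaxhighlight>

Both definitions produce the same sequence; they differ only in how much of the already produced list has to stay in memory while it is being consumed.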

The above code is also useful to a range of the first million primes or so. The code can be further optimized by fusing <code>minus [3..]</code> into one function, preventing a space leak with the newer GHC versions, getting the function <code>gaps</code> defined below.

===Tree-merging incremental sieve===
The linear merging structure can further be replaced with an indefinitely deepening to the right tree-like structure (see wiki.haskell.org/Prime_numbers#Tree_merging), <code>(a-(b+((c+d)+( ((e+f)+(g+h)) + ... ))))</code>.

This merges primes' multiples streams in a ''tree''-like fashion, as a sequence of balanced trees of <code>union</code> nodes, likely achieving theoretical time complexity only a ''log n'' factor above the optimal ''n log n log (log n)'', for ''n'' primes produced. Indeed, empirically it runs at about ''~ n<sup>1.2</sup>'' (for producing the first few million primes), similarly to the priority-queue&ndash;based version of M. O'Neill's, and with very low space complexity too (not counting the produced sequence of course):
<syntaxhighlight lang="haskell">primes :: () -> [Int]
primes() = 2 : _Y ((3:) . gaps 5 . _U . map(\p-> [p*p, p*p+2*p..])) where
  _Y g = g (_Y g)  -- = g (g (g ( ... )))   non-sharing multistage fixpoint combinator
  gaps k s@(c:cs) | k < c     = k : gaps (k+2) s  -- ~= ([k,k+2..] \\ s)
                  | otherwise =     gaps (k+2) cs --   when null(s\\[k,k+2..])
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t   -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xt) ys@(y:yt) | x < y     = x : merge xt ys
                            | y < x     = y : merge xs yt
                            | otherwise = x : merge xt yt</syntaxhighlight>


Works with odds only, the simplest kind of wheel. Here's the [http://ideone.com/qpnqe test entry] on Ideone.com, and a [http://ideone.com/p0e81 comparison with more versions].

====With Wheel====
Using <code>_U</code> defined above,
<syntaxhighlight lang="haskell">primesW :: [Int]
primesW = [2,3,5,7] ++ _Y ( (11:) . gapsW 13 (tail wheel) . _U .
                            map (\p->
-- ...
wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:      -- gaps = (`gapsW` cycle [2])
        4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
  -- cycle $ zipWith (-) =<< tail $ [i | i <- [11..221], gcd i 210 == 1]</syntaxhighlight>


Used [[Emirp_primes#List-based|here]] and [[Extensible_prime_generator#List_based|here]].

====Improved efficiency Wheels====

1. The generation of large wheels such as the 2/3/5/7/11/13/17 wheel, which has 92160 cyclic elements, needs to be done based on sieve culling which is much better as to performance and can be used without inserting the generated table.

2. Improving the means to re-generate the position on the wheel for the recursive base primes without the use of `dropWhile`, etc. The below improved code uses a copy of the place in the wheel for each found base prime for ease of use in generating the composite number to-be-culled chains.

<syntaxhighlight lang="haskell">-- autogenerates wheel primes, first sieve prime, and gaps
wheelGen :: Int -> ([Int],Int,[Int])
wheelGen n = loop 1 3 [2] [2] where
  loop i frst wps gps =
    if i >= n then (wps, frst, gps) else
    let nfrst = frst + head gps
        nhts = (length gps) * (frst - 1)
        cmpsts = scanl (\ c g -> c + frst * g) (frst * frst) (cycle gps)
        cull n (g:gs') cs@(c:cs') og
            | nn >= c = cull nn gs' cs' (og + g) -- n == c; never greater!
            | otherwise = (og + g) : cull nn gs' cs 0 where nn = n + g
    in nfrst `seq` nhts `seq` loop (i + 1) nfrst (wps ++ [frst]) $ take nhts
                                 $ cull nfrst (tail $ cycle gps) cmpsts 0

(wheelPrimes, firstSievePrime, gaps) = wheelGen 7

primesTreeFoldingWheeled :: () -> [Int]
primesTreeFoldingWheeled() =
  wheelPrimes ++ map fst (
    _Y ( ((firstSievePrime, wheel) :) .
           gapsW (firstSievePrime + head wheel, tail wheel) . _U .
             map (\ (p,w) ->
               scanl (\ c m -> c + m * p) (p * p) w ) ) ) where

  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint Y-combinator

  wheel = cycle gaps

  gapsW k@(n,d:w) s@(c:cs) | n < c     = k : gapsW (n + d, w) s  -- set diff
                           | otherwise =     gapsW (n + d, w) cs --   n == c
  _U ((x:xs):t) = -- exactly the same as for odds-only!
    x : (union xs . _U . pairs) t where -- tree-shaped folding big union
      pairs (xs:ys:t) = union xs ys : pairs t -- ~= nub . sort . concat
      union xs@(x:xs') ys@(y:ys')
        | x < y = x : union xs' ys
        | y < x = y : union xs ys'
        | otherwise = x : union xs' ys' -- x and y must be equal!</syntaxhighlight>

When compiled with -O2 optimization and -fllvm (the LLVM back end), the above code is over twice as fast as the Odds-Only version, as it should be since that is about the ratio of reduced operations less some slightly increased operation complexity; it sieves the primes to a hundred million in about seven seconds on a modern middle range desktop computer. It is almost twice as fast as the "primesW" version due to the increased algorithmic efficiency!

Note that the "wheelGen" code could be used to avoid further culling altogether by continuing to generate wheels until the square of the "firstSievePrime" is greater than the range, as there are no composites left up to that limit, but this is always slower than a SoE due to the high overhead in generating the wheels: it would take a wheel generation of 1229 (the number of primes up to ten thousand, the square root of a hundred million) to create the required wheel sieved to a hundred million. However, the theoretical asymptotic performance (if the time to advance through the lists per element were zero, which of course it is not) would be O(n) instead of O(n log (log n)), where n is the range sieved. This is just another case where theory supports a (slightly) reduced number of operations, but practicality means that the overheads of doing so are so big as to make it useless for any reasonable range ;-) !
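
The wheel gap tables discussed above can be cross-checked with a naive <code>gcd</code>-based sketch, following the formula given in the comment of the "primesW" code. It is far slower than sieve-culling generation for large wheels and is shown only as an illustration; the names <code>wheelGaps</code> and <code>wheelSize</code> are hypothetical and not part of this entry.

<syntaxhighlight lang="haskell">-- Gaps between successive numbers coprime to all the given wheel primes,
-- over exactly one turn of the wheel (naive; for cross-checking only).
wheelGaps :: [Int] -> [Int]
wheelGaps ps = zipWith (-) (tail spokes) spokes
  where
    circ   = product ps                                   -- wheel circumference
    start  = head [i | i <- [2 ..], gcd i circ == 1]      -- first sieve prime
    spokes = [i | i <- [start .. start + circ], gcd i circ == 1]

-- Number of gaps in one turn of the wheel for distinct wheel primes:
-- the product of (p - 1) over those primes.
wheelSize :: [Int] -> Int
wheelSize = product . map (subtract 1)</syntaxhighlight>

For instance, <code>wheelGaps [2,3,5,7]</code> gives the 48 gaps of the 2/3/5/7 wheel listed in "primesW", starting from 11, and <code>wheelSize [2,3,5,7,11,13,17]</code> gives the 92160 elements mentioned above.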


===Priority Queue based incremental sieve===
The above work is derived from the Epilogue of the Melissa E. O'Neill paper which is much referenced with respect to incremental functional sieves; however, that paper is now dated and her comments comparing list based sieves to her original work leading up to a Priority Queue based implementation are no longer current given more recent work such as the above Tree Merging version. Accordingly, a modern "odds-only" Priority Queue version is developed here for more current comparisons between the above list based incremental sieves and a continuation of O'Neill's work.


In order to implement a Priority Queue version with Haskell, an efficient Priority Queue, which is not part of the standard Haskell libraries, is required. A Min Heap implementation is likely best suited for this task in providing the most efficient frequently used peeks of the next item in the queue and replacement of the first item in the queue (not using a "pop" followed by a "push"), with "pop" operations then not used at all, and "push" operations used relatively infrequently. Judging by O'Neill's use of an efficient "deleteMinAndInsert" operation, of which she states "(We provide deleteMinAndInsert because a heap-based implementation can support this operation with considerably less rearrangement than a deleteMin followed by an insert.)", a statement that is true for a Min Heap Priority Queue and not others, and by her reference to a priority queue by (Paulson, 1996), the queue she used is likely the one provided as a simple true functional [http://rosettacode.org/wiki/Priority_queue#Haskell Min Heap implementation on RosettaCode], from which the essential functions are reproduced here:
<syntaxhighlight lang="haskell">data PriorityQ k v = Mt
                     | Br !k v !(PriorityQ k v) !(PriorityQ k v)
  deriving (Eq, Ord, Read, Show)
-- ...
replaceMinPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
replaceMinPQ wk wv Mt = Mt
replaceMinPQ wk wv (Br _ _ pl pr) = siftdown wk wv pl pr</syntaxhighlight>


The "peekMin" function retrieves both the key and value in a tuple, so further processing is required to access whichever of the two is needed. As well, the output of the peekMin function is a Maybe, with the case of an empty queue providing a Nothing output.


The following code is O'Neill's original odds-only code (without wheel factorization) from her paper slightly adjusted as per the requirements of this Min Heap implementation as laid out above; note the `seq` adjustments to the "adjust" function to make the evaluation of the entry tuple more strict for better efficiency:
<syntaxhighlight lang="haskell">-- (c) 2006-2007 Melissa O'Neill.  Code may be used freely so long as
-- this copyright message is retained and changed versions of the file
-- are clearly marked.
-- ...
            | otherwise = table
          where (n, n':ns) = case peekMinPQ table of
                               Just tpl -> tpl</syntaxhighlight>


The above code is almost four times slower than the version of the Tree Merging sieve above for the first million primes although it is about the same speed as the original Richard Bird sieve with the "odds-only" adaptation as above. It is slow and uses a huge amount of memory for primarily one reason: over eagerness in adding prime composite streams to the queue, which are added as the primes are listed rather than when they are required as the output primes stream reaches the square of a given base prime; this over eagerness also means that the processed numbers must have a large range in order to not overflow when squared (as in the default Integer = infinite precision integers as used here and by O'Neill, but Int64's or Word64's would give a practical range) which processing of wide range numbers adds processing and memory requirement overhead. Although O'Neill's code is elegant, it also loses some efficiency due to the extensive use of lazy list processing, not all of which is required even for a wheel factorization implementation.


The following code is adjusted to reduce the amount of lazy list processing and to add a secondary base primes stream (or a succession of streams when the combinator is used) so as to overcome the above problems and reduce memory consumption to only that required for the primes below the square root of the currently sieved number; using this means that 32-bit Int's are sufficient for a reasonable range and memory requirements become relatively negligible:
<syntaxhighlight lang="haskell">primesPQx :: () -> [Int]
primesPQx() = 2 : _Y ((3 :) . sieve 5 emptyPQ 9) -- initBasePrms
  where
    _Y g = g (_Y g)        -- non-sharing multi-stage fixpoint combinator OR

    sieve n table q bps@(bp:bps')
      | n >= q = let nbp = head bps' in let ntbl = insertprime bp table in
                 ntbl `seq` sieve (n + 2) ntbl (nbp * nbp) bps'
      | n >= nextComposite = let ntbl = adjust table in
                             ntbl `seq` sieve (n + 2) ntbl q bps
      | otherwise = n : sieve (n + 2) table q bps
      where
        insertprime p table = let adv = 2 * p in let nv = p * p + adv
                              in nv `seq` pushPQ nv adv table
        nextComposite = case peekMinPQ table of
                          Nothing -> q -- at beginning when queue empty!
                          Just (c, _) -> c
        adjust table
          | c <= n = let ntbl = replaceMinPQ (c + adv) adv table
                     in ntbl `seq` adjust ntbl
          | otherwise = table
          where (c, adv) = case peekMinPQ table of Just ct -> ct `seq` ct</syntaxhighlight>


The above code is over five times faster than the previous (O'Neill) Priority Queue code and about half again faster than the Tree-Merging Odds-Only code for a range of a hundred million primes; it is likely faster as the Min Heap is slightly more efficient than Tree Merging due to better tree balancing.


Since the Tree-Folding version above includes the minor changes to work with a factorization wheel, this should have the same minor modifications for comparison purposes, with the code as follows:

<syntaxhighlight lang="haskell">-- Note: this code segment uses the same wheelGen as the Tree-Folding version...

primesPQWheeled :: () -> [Int]
primesPQWheeled() =
  wheelPrimes ++ map fst (
    _Y (((firstSievePrime, wheel) :) .
          sieve (firstSievePrime + head wheel, tail wheel)
                emptyPQ (firstSievePrime * firstSievePrime)) )
  where
    _Y g = g (_Y g)        -- non-sharing multi-stage fixpoint combinator OR

    wheel = cycle gaps

    sieve npr@(n,(g:gs')) table q bpprs@(bppr:bpprs')
      | n >= q =
        let (nbp,_) = head bpprs' in let ntbl = insertprime bppr table in
        nbp `seq` ntbl `seq` sieve (n + g, gs') ntbl (nbp * nbp) bpprs'
      | n >= nextComposite = let ntbl = adjust table in
                             ntbl `seq` sieve (n + g, gs') ntbl q bpprs
      | otherwise = npr : sieve (n + g, gs') table q bpprs
      where
        insertprime (p,(pg:pgs')) table =
          let nv = p * (p + pg) in nv `seq` pushPQ nv (map (* p) pgs') table
        nextComposite = case peekMinPQ table of
                          Nothing -> q -- at beginning when queue empty!
                          Just (c, _) -> c
        adjust table
          | c <= n = let ntbl = replaceMinPQ (c + a) as' table
                     in ntbl `seq` adjust ntbl
          | otherwise = table
          where (c, (a:as')) = case peekMinPQ table of Just ct -> ct `seq` ct</syntaxhighlight>

Compiled with -O2 optimization and -fllvm (the LLVM back end), this code gains about the expected ratio in performance in sieving to a range of a hundred million, sieving to this range in about five seconds on a modern medium range desktop computer. This is likely the fastest purely functional incremental type SoE useful for moderate ranges up to about a hundred million to a billion.


===Page Segmented Sieve using a mutable unboxed array===

All of the above unbounded sieves are quite limited in practical sieving range due to the large constant factor overheads in computation, making them mostly just interesting intellectual exercises other than for small ranges of up to about the first million to ten million primes; the following '''"odds-only"''' page-segmented version using (bit-packed internally) mutable unboxed arrays is about 50 times faster than the fastest of the above algorithms for ranges of about that and higher, making it practical for the first several hundred million primes:
<syntaxhighlight lang="haskell">{-# OPTIONS_GHC -O2 -fllvm #-} -- use LLVM for about double speed!

import Data.Int ( Int64 )
import Data.Word ( Word64 )
import Data.Bits ( Bits(shiftR) )
import Data.Array.Base ( IArray(unsafeAt), UArray(UArray),
                         MArray(unsafeWrite), unsafeFreezeSTUArray )
import Control.Monad ( forM_ )
import Data.Array.ST ( MArray(newArray), runSTUArray )

type Prime = Word64

cSieveBufferRange :: Int
cSieveBufferRange = 2^17 * 8 -- CPU L2 cache in bits

primes :: () -> [Prime]
primes() = 2 : _Y (listPagePrms . pagesFrom 0) where
  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint combinator
  szblmt = cSieveBufferRange - 1
  listPagePrms pgs@(hdpg@(UArray lwi _ rng _) : tlpgs) =
    let loop i | i >= fromIntegral rng = listPagePrms tlpgs
               | unsafeAt hdpg i = loop (i + 1)
               | otherwise = let ii = lwi + fromIntegral i in
                             case fromIntegral $ 3 + ii + ii of
                               p -> p `seq` p : loop (i + 1) in loop 0
  makePg lwi bps = runSTUArray $ do
    let limi = lwi + fromIntegral szblmt
        bplmt = floor $ sqrt $ fromIntegral $ limi + limi + 3
        strta bp = let si = fromIntegral $ (bp * bp - 3) `shiftR` 1
                   in if si >= lwi then fromIntegral $ si - lwi else
                   let r = fromIntegral (lwi - si) `mod` bp
                   in if r == 0 then 0 else fromIntegral $ bp - r
    cmpsts <- newArray (lwi, limi) False
    fcmpsts <- unsafeFreezeSTUArray cmpsts
    let cbps = if lwi == 0 then listPagePrms [fcmpsts] else bps
    forM_ (takeWhile (<= bplmt) cbps) $ \ bp ->
      forM_ (let sp = fromIntegral $ strta bp
             in [ sp, sp + fromIntegral bp .. szblmt ]) $ \c ->
        unsafeWrite cmpsts c True
    return cmpsts
  pagesFrom lwi bps = map (`makePg` bps)
                          [ lwi, lwi + fromIntegral szblmt + 1 .. ]</syntaxhighlight>


The above code as written has a maximum practical range of about 10^12 or so in about an hour.
The above code is currently implemented to use "Int" as the prime type but one can change the "PrimeType" to "Int64" (importing Data.Int) or "Word64" (importing Data.Word) to extend the range to its maximum practical range of above 10^14 or so. Note that for larger ranges one will want to set the "szPGBTS" to something close to the CPU L2 or even L3 cache size (up to 8 Megabytes = 2^23 bytes for an Intel i7), at a slight cost in speed (about a factor of 1.5), so that memory access stays fairly efficient up to those large ranges. It would be quite easy to modify the above code to make the page array size automatically increase in size with increasing range.


The above code takes only a few tens of milliseconds to compute the first million primes and a few seconds to calculate the first 50 million primes, with over half of those times expended in just enumerating the result lazy list, with even worse times when using 64-bit list processing (especially with 32-bit versions of GHC). A further improvement to reduce the computational cost of repeated list processing across the base pages for every page segment would be to store the required base primes (or base prime gaps) in an array that gets extended in size by factors of two (by copying the old array to the new extended array) as the number of base primes increases; in that way the scans across base primes per page segment would just be array accesses which are much faster than list enumeration.
The above code takes only a few tens of milliseconds to compute the first million primes and a few seconds to calculate the first 50 million primes up to a billion, with over half of those times expended in just enumerating the result lazy list. A further improvement to reduce the computational cost of repeated list processing across the base pages for every page segment would be to store the required base primes (or base prime gaps) in a lazy list of base prime arrays; in that way the scans across base primes per page segment would just mostly be array accesses which are much faster than list enumeration.


Unlike many other unbounded examples, this algorithm has the true Sieve of Eratosthenes computational time complexity of O(n log log n), where n is the sieving range, with no extra "log n" factor, while having a very low computational time cost per composite number cull of less than ten CPU clock cycles (well under that, at under 4 clock cycles, for the Intel i7 using a page buffer size equal to the CPU L1 cache size).
Unlike many other unbounded examples, this algorithm has the true Sieve of Eratosthenes computational time complexity of O(n log log n), where n is the sieving range, with no extra "log n" factor, while having a very low computational time cost per composite number cull of less than ten CPU clock cycles (well under that, at under 4 clock cycles, for the Intel i7 using a page buffer size equal to the CPU L1 cache size).
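
The O(n log log n) figure can be checked empirically by counting culling operations. The following is only a minimal sketch for a plain (non-odds-only, unsegmented) sieve, not the page-segmented code above; the names <code>smallPrimes</code>, <code>cullCount</code> and <code>cullsPerNLogLogN</code> are hypothetical helpers introduced for illustration:

```haskell
-- Trial-division primes; slow, but adequate for the few base primes
-- (those up to sqrt n) that this illustration needs.
smallPrimes :: [Int]
smallPrimes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime k = all (\p -> k `mod` p /= 0)
                    (takeWhile (\p -> p * p <= k) smallPrimes)

-- Number of "mark as composite" operations a plain sieve performs up to n:
-- each base prime p culls p*p, p*p + p, ... up to n.
cullCount :: Int -> Int
cullCount n =
  sum [ (n - p * p) `div` p + 1
      | p <- takeWhile (\p -> p * p <= n) smallPrimes ]

-- Ratio of the cull count to n * log (log n); if the complexity claim
-- holds, this ratio should stay bounded as n grows.
cullsPerNLogLogN :: Int -> Double
cullsPerNLogLogN n = fromIntegral (cullCount n) / (x * log (log x))
  where x = fromIntegral n
```

For example, <code>cullCount 100</code> is 104 (49 culls by 2, 31 by 3, 16 by 5 and 8 by 7).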
Line 4,724: Line 9,779:
There are other ways to make the algorithm faster including high degrees of wheel factorization, which can reduce the number of composite culling operations by a factor of about four for practical ranges, and multi-processing which can reduce the computation time proportionally to the number of available independent CPU cores, but there is little point to these optimizations as long as the lazy list enumeration is the bottleneck as it is starting to be in the above code. To take advantage of those optimizations, functions need to be provided that can compute the desired results without using list processing.
There are other ways to make the algorithm faster including high degrees of wheel factorization, which can reduce the number of composite culling operations by a factor of about four for practical ranges, and multi-processing which can reduce the computation time proportionally to the number of available independent CPU cores, but there is little point to these optimizations as long as the lazy list enumeration is the bottleneck as it is starting to be in the above code. To take advantage of those optimizations, functions need to be provided that can compute the desired results without using list processing.


For ranges above about 10^14 where culling spans begin to exceed even an expanded size page array, other techniques need to be adopted, such as the use of a "bucket sieve" which tracks the next page that larger base prime culling sequences will "hit" to avoid redundant (and time expensive) start address calculations for base primes that don't "hit" the current page.
For ranges above about 10^14 where culling spans begin to exceed even an expanded size page array, other techniques need to be adopted, such as automatically extending the sieving buffer size to the square root of the maximum range currently sieved and sieving by CPU L1/L2 cache sized segments/sections.


However, even with the above code and its limitations for large sieving ranges, it will never be nearly as slow as the other "incremental" sieve algorithms: the time will never exceed about 100 CPU clock cycles per composite number cull, whereas the fastest of those other algorithms takes many hundreds of CPU clock cycles per cull.
However, even with the above code and its limitations for large sieving ranges, it will never be nearly as slow as the other "incremental" sieve algorithms: the time will never exceed about 20 CPU clock cycles per composite number cull, whereas the fastest of those other algorithms takes many hundreds of CPU clock cycles per cull.
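
The idea of sieving by fixed-size segments, as discussed above, can be sketched in a much simpler (and much slower) form. This is an illustrative sketch only, not the code above: it uses the pure <code>accumArray</code> rather than unsafe writes in the ST monad, it is not odds-only, and the names <code>isqrt</code>, <code>basePrimes</code>, <code>segment</code> and <code>primesTo</code> are hypothetical:

```haskell
import Data.Array.Unboxed (UArray, accumArray, assocs)

-- Integer square root via floating point; assumed adequate for the
-- modest ranges of this illustration.
isqrt :: Int -> Int
isqrt = floor . sqrt . (fromIntegral :: Int -> Double)

-- Base primes up to n from a plain, unsegmented sieve.
basePrimes :: Int -> [Int]
basePrimes n = [ i | (i, True) <- assocs sieve ]
  where
    sieve :: UArray Int Bool
    sieve = accumArray (\_ b -> b) True (2, n)
              [ (m, False) | p <- [2 .. isqrt n], m <- [p * p, p * p + p .. n] ]

-- Sieve the single segment [lo, hi] using only the base primes.
segment :: [Int] -> Int -> Int -> [Int]
segment bps lo hi = [ i | (i, True) <- assocs seg ]
  where
    seg :: UArray Int Bool
    seg = accumArray (\_ b -> b) True (lo, hi)
            [ (m, False)
            | p <- takeWhile (\q -> q * q <= hi) bps
              -- first multiple of p inside the segment, but never below p*p
            , let start = max (p * p) (((lo + p - 1) `div` p) * p)
            , m <- [start, start + p .. hi] ]

-- All primes up to n, sieved in segments of segSize numbers (segSize >= 1).
primesTo :: Int -> Int -> [Int]
primesTo segSize n =
  concat [ segment bps lo (min n (lo + segSize - 1))
         | lo <- [2, 2 + segSize .. n] ]
  where bps = basePrimes (isqrt n)
```

For instance, <code>primesTo 10 30</code> sieves the segments 2..11, 12..21 and 22..30 in turn and yields <code>[2,3,5,7,11,13,17,19,23,29]</code>. Each segment only needs the base primes up to the square root of its upper bound, which is what keeps the working set small.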

'''A faster method of counting primes with a similar algorithm'''

To show the limitations of the individual prime enumeration, the following code has been refactored from the above to provide an alternate very fast method of counting the unset bits in the culled array (the primes, i.e. the non-composites) using a CPU native pop count instruction:

<syntaxhighlight lang="haskell">{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS_GHC -O2 -fllvm #-} -- use LLVM for about double speed!

import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing

import Data.Int ( Int64 )
import Data.Word ( Word64 )
import Data.Bits ( Bits((.&.), (.|.), shiftL, shiftR, popCount) )
import Control.Monad.ST ( ST, runST )
import Data.Array.Base ( IArray(unsafeAt), UArray(UArray), STUArray,
MArray(unsafeRead, unsafeWrite), castSTUArray,
unsafeThawSTUArray, unsafeFreezeSTUArray )
import Control.Monad ( forM_ )
import Data.Array.ST ( MArray(newArray), runSTUArray )

type Prime = Word64

cSieveBufferRange :: Int
cSieveBufferRange = 2^17 * 8 -- CPU L2 cache in bits

type PrimeNdx = Int64
type SieveBuffer = UArray PrimeNdx Bool
cWHLPRMS :: [Prime]
cWHLPRMS = [ 2 ]
cFRSTSVPRM :: Prime
cFRSTSVPRM = 3
primesPages :: () -> [SieveBuffer]
primesPages() = _Y (pagesFrom 0 . listPagePrms) where
_Y g = g (_Y g) -- non-sharing multi-stage fixpoint Y-combinator
szblmt = fromIntegral (cSieveBufferRange `shiftR` 1) - 1
makePg lwi bps = runSTUArray $ do
let limi = lwi + fromIntegral szblmt
mxprm = cFRSTSVPRM + fromIntegral (limi + limi)
bplmt = floor $ sqrt $ fromIntegral mxprm
strta bp = let si = fromIntegral $ (bp * bp - cFRSTSVPRM) `shiftR` 1
in if si >= lwi then fromIntegral $ si - lwi else
let r = fromIntegral (lwi - si) `mod` bp
in if r == 0 then 0 else fromIntegral $ bp - r
cmpsts <- newArray (lwi, limi) False
fcmpsts <- unsafeFreezeSTUArray cmpsts
let cbps = if lwi == 0 then listPagePrms [fcmpsts] else bps
forM_ (takeWhile (<= bplmt) cbps) $ \ bp ->
forM_ (let sp = fromIntegral $ strta bp
in [ sp, sp + fromIntegral bp .. szblmt ]) $ \c ->
unsafeWrite cmpsts c True
return cmpsts
pagesFrom lwi bps = map (`makePg` bps)
[ lwi, lwi + fromIntegral szblmt + 1 .. ]

-- convert a list of sieve buffers to a list of primes...
listPagePrms :: [SieveBuffer] -> [Prime]
listPagePrms pgs@(pg@(UArray lwi _ rng _) : pgstl) = bsprm `seq` loop 0 where
bsprm = cFRSTSVPRM + fromIntegral (lwi + lwi)
loop i | i >= rng = listPagePrms pgstl
| unsafeAt pg i = loop (i + 1)
| otherwise = case bsprm + fromIntegral (i + i) of
p -> p `seq` p : loop (i + 1)
primes :: () -> [Prime]
primes() = cWHLPRMS ++ listPagePrms (primesPages())

-- very fast using popCount by words technique...
countSieveBuffer :: Int -> UArray PrimeNdx Bool -> Int64
countSieveBuffer lstndx sb = fromIntegral $ runST $ do
cmpsts <- unsafeThawSTUArray sb :: ST s (STUArray s PrimeNdx Bool)
wrdcmpsts <-
(castSTUArray :: STUArray s PrimeNdx Bool ->
ST s (STUArray s PrimeNdx Word64)) cmpsts
let lstwrd = lstndx `shiftR` 6
lstmsk = 0xFFFFFFFFFFFFFFFE `shiftL` (lstndx .&. 63) :: Word64
loop wi cnt
| wi < lstwrd = do
v <- unsafeRead wrdcmpsts wi
case cnt - popCount v of ncnt -> ncnt `seq` loop (wi + 1) ncnt
| otherwise = do
v <- unsafeRead wrdcmpsts lstwrd
return $ fromIntegral (cnt - popCount (v .|. lstmsk))
loop 0 (lstwrd * 64 + 64)

-- count the remaining un-marked composite bits using very fast popcount...
countPrimesTo :: Prime -> Int64
countPrimesTo limit =
let lmtndx = fromIntegral $ (limit - 3) `shiftR` 1
loop (pg@(UArray lwi lmti rng _) : pgstl) cnt
| lmti >= lmtndx =
(cnt + countSieveBuffer (fromIntegral $ lmtndx - lwi) pg)
| otherwise = loop pgstl (cnt + countSieveBuffer (rng - 1) pg)
in if limit < 3 then if limit < 2 then 0 else 1
else loop (primesPages()) 1

-- test it...
main :: IO ()
main = do
let limit = 10^9 :: Prime

strt <- getPOSIXTime
-- let answr = length $ takeWhile (<= limit) $ primes()-- slow way
let answr = countPrimesTo limit -- fast way
stop <- answr `seq` getPOSIXTime -- force evaluation of answr b4 stop time!
let elpsd = round $ 1e3 * (stop - strt) :: Int64
putStr $ "Found " ++ show answr
putStr $ " primes up to " ++ show limit
putStrLn $ " in " ++ show elpsd ++ " milliseconds."</syntaxhighlight>

When compiled with the "fast way" commented out and the "slow way" enabled, the time to find the number of primes up to one billion is about 3.65 seconds on an Intel Sandy Bridge i3-2100 at 3.1 GHz; with the "fast way" enabled instead, the time is only about 1.45 seconds for the same range, both compiled with the LLVM back end. This shows that more than half of the time for the "slow way" is spent just producing and enumerating the list of primes!

On an Intel Sky Lake i5-2500 CPU @ 3.6 GHz (turbo boost for single threaded as here) compiled with LLVM and 256 Kilobyte buffer size (CPU L2 sized), using the fast counting method:
* takes 1.085 seconds to sieve to 10^9: about 3.81 CPU clocks per cull
* takes 126 seconds to sieve to 10^11: about 4.0 CPU clocks per cull

This shows a slight loss of efficiency in clocks per cull due to the average culling span size coming closer to the cull buffer span size, meaning that the loop overhead in address calculation and CPU L1 cache overflows increases just a bit for these relative ranges.

This is about 20% faster than the Sandy Bridge i3-2100 above even after allowing for the ratio of CPU clock speeds, likely due to the better Instructions Per Clock of the newer Sky Lake architecture, which has improved branch prediction and elides a correctly predicted branch down to close to zero time.

This is about 25 to 30 per cent faster than not using LLVM for this Sky Lake processor due to the poor register allocation and optimizations by the Native Code Generator compared to LLVM.
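
The word-at-a-time counting used by the "fast way" can be illustrated in isolation. The following is a minimal sketch that works on a plain list of <code>Word64</code> values rather than on a cast <code>STUArray</code> as the code above does; the name <code>countUnset</code> is a hypothetical helper, and bit i of word j is assumed to represent index 64*j + i:

```haskell
import Data.Bits (bit, popCount, (.&.))
import Data.Word (Word64)

-- Count the zero bits among the first nbits bits of a packed bit buffer.
-- Whole words are handled with a single popCount each; the final partial
-- word is masked so that bits past the limit are ignored.
-- If the list holds fewer than nbits bits, only the available words count.
countUnset :: Int -> [Word64] -> Int
countUnset nbits ws0 = go nbits ws0 0
  where
    go :: Int -> [Word64] -> Int -> Int
    go remaining (w : ws) acc
      | remaining >= 64 = go (remaining - 64) ws $! acc + 64 - popCount w
      | remaining > 0   = acc + remaining - popCount (w .&. (bit remaining - 1))
    go _ _ acc = acc
```

As a small worked case for an odds-only buffer where bit i stands for the number 3 + 2i: up to 30 there are 14 candidates (3 to 29), and the composites 9, 15, 21, 25 and 27 sit at bits 3, 6, 9, 11 and 12, giving the word 6728. Then <code>countUnset 14 [6728]</code> is 9, and adding one for the prime 2 gives the ten primes up to 30.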


===APL-style===
===APL-style===
Rolling set subtraction over the rolling element-wise addition on integers. Basic, slow, worse than quadratic in the number of primes produced, empirically:
Rolling set subtraction over the rolling element-wise addition on integers. Basic, slow, worse than quadratic in the number of primes produced, empirically:
<lang haskell>zipWith (flip (!!)) [0..] -- or: take n . last . take n ...
<syntaxhighlight lang="haskell">zipWith (flip (!!)) [0..] -- or: take n . last . take n ...
. scanl1 minus
. scanl1 minus
. scanl1 (zipWith (+)) $ repeat [2..]</lang>
. scanl1 (zipWith (+)) $ repeat [2..]</syntaxhighlight>
Or, a wee bit faster:
Or, a wee bit faster:
<lang haskell>unfoldr (\(a:b:t) -> Just . (head &&& (:t) . (`minus` b)
<syntaxhighlight lang="haskell">unfoldr (\(a:b:t) -> Just . (head &&& (:t) . (`minus` b)
. tail) $ a)
. tail) $ a)
. scanl1 (zipWith (+)) $ repeat [2..]</lang>
. scanl1 (zipWith (+)) $ repeat [2..]</syntaxhighlight>
A bit optimized, much faster, with better complexity,
A bit optimized, much faster, with better complexity,
<lang haskell>tail . concat
<syntaxhighlight lang="haskell">tail . concat
. unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
. unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
. span (< head b) $ a)
. span (< head b) $ a)
. scanl1 (zipWith (+) . tail) $ tails [1..]
. scanl1 (zipWith (+) . tail) $ tails [1..]
-- $ [ [n*n, n*n+n..] | n <- [1..] ]</lang>
-- $ [ [n*n, n*n+n..] | n <- [1..] ]</syntaxhighlight>


getting nearer to the functional equivalent of the <code>primesPE</code> above, i.e.
getting nearer to the functional equivalent of the <code>primesPE</code> above, i.e.
<lang haskell>fix ( (2:) . concat
<syntaxhighlight lang="haskell">fix ( (2:) . concat
. unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
. unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
. span (< head b) $ a)
. span (< head b) $ a)
. ([3..] :) . map (\p-> [p*p, p*p+p..]) )</lang>
. ([3..] :) . map (\p-> [p*p, p*p+p..]) )</syntaxhighlight>


An illustration:
An illustration:
<lang haskell>> mapM_ (print . take 15) $ take 10 $ scanl1 (zipWith(+)) $ repeat [2..]
<syntaxhighlight lang="haskell">> mapM_ (print . take 15) $ take 10 $ scanl1 (zipWith(+)) $ repeat [2..]
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
[ 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
[ 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
Line 4,773: Line 9,949:
[ 64, 72, 80, 88, 96,104,112,120,128,136,144,152,160,168,176]
[ 64, 72, 80, 88, 96,104,112,120,128,136,144,152,160,168,176]
[ 81, 90, 99,108,117,126,135,144,153,162,171,180,189,198,207]
[ 81, 90, 99,108,117,126,135,144,153,162,171,180,189,198,207]
[100,110,120,130,140,150,160,170,180,190,200,210,220,230,240]</lang>
[100,110,120,130,140,150,160,170,180,190,200,210,220,230,240]</syntaxhighlight>
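
The fragments in this section are written point-free and assume some supporting definitions. The following is a self-contained sketch of the last variant, assuming that <code>minus</code> is ordered-list difference (as provided, for instance, by <code>Data.List.Ordered</code> in the data-ordlist package) and defining it locally; the top-level name <code>primes</code> is introduced here for illustration:

```haskell
import Control.Arrow (second)
import Data.Function (fix)
import Data.List (unfoldr)

-- Difference of two ascending lists: the elements of the first
-- that do not occur in the second.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x : xt) ys@(y : yt) = case compare x y of
  LT -> x : minus xt ys
  EQ -> minus xt yt
  GT -> minus xs yt
minus xs _ = xs

-- Each unfoldr step emits the candidates below the square of the next
-- prime, then removes that prime's multiples from the remaining candidates.
primes :: [Int]
primes = fix ( (2:) . concat
             . unfoldr (\(a : b : t) -> Just . second ((: t) . (`minus` b))
                                             . span (< head b) $ a)
             . ([3 ..] :) . map (\p -> [p * p, p * p + p ..]) )
```

With these definitions, <code>take 10 primes</code> evaluates to <code>[2,3,5,7,11,13,17,19,23,29]</code>.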


=={{header|HicEst}}==
=={{header|HicEst}}==
<lang hicest>REAL :: N=100, sieve(N)
<syntaxhighlight lang="hicest">REAL :: N=100, sieve(N)


sieve = $ > 1 ! = 0 1 1 1 1 ...
sieve = $ > 1 ! = 0 1 1 1 1 ...
Line 4,789: Line 9,965:
DO i = 1, N
DO i = 1, N
IF( sieve(i) ) WRITE() i
IF( sieve(i) ) WRITE() i
ENDDO </lang>
ENDDO </syntaxhighlight>

=={{header|Hoon}}==
<syntaxhighlight lang="hoon">:: Find primes by the sieve of Eratosthenes
!:
|= end=@ud
=/ index 2
=/ primes `(list @ud)`(gulf 1 end)
|- ^- (list @ud)
?: (gte index (lent primes)) primes
$(index +(index), primes +:(skid primes |=([a=@ud] &((gth a index) =(0 (mod a index))))))
</syntaxhighlight>


=={{header|Icon}} and {{header|Unicon}}==
=={{header|Icon}} and {{header|Unicon}}==
<lang Icon> procedure main()
<syntaxhighlight lang="icon"> procedure main()
sieve(100)
sieve(100)
end
end
Line 4,802: Line 9,989:
do p[j] := 0
do p[j] := 0
every write(i:=2 to n & p[i] == 1 & i)
every write(i:=2 to n & p[i] == 1 & i)
end</lang>
end</syntaxhighlight>


Alternatively using sets
Alternatively using sets
<lang Icon> procedure main()
<syntaxhighlight lang="icon"> procedure main()
sieve(100)
sieve(100)
end
end
Line 4,816: Line 10,003:
delete(primes,1)
delete(primes,1)
every write(!sort(primes))
every write(!sort(primes))
end</lang>
end</syntaxhighlight>


=={{header|J}}==
=={{header|J}}==
{{eff note|J|i.&.(p:inv) }}
{{eff note|J|i.&.(p:inv) }}


Implementation:<syntaxhighlight lang="j">sieve=: {{
This problem is a classic example of how J can be used to represent mathematical concepts.
r=. 0#t=. y# j=.1

while. y>j=.j+1 do.
J uses x|y ([http://www.jsoftware.com/help/dictionary/d230.htm residue]) to represent the operation of finding the remainder during integer division of y divided by x
if. j{t do.

t=. t > y$j{.1
<lang J> 10|13
r=. r, j
3</lang>
end.

And x|/y gives us a [http://www.jsoftware.com/help/dictionary/d420.htm table] with all possibilities from x and all possibilities from y.

<lang J> 2 3 4 |/ 2 3 4
0 1 0
2 0 1
2 3 0</lang>

Meanwhile, |/~y ([http://www.jsoftware.com/help/dictionary/d220v.htm reflex]) copies the right argument and uses it as the left argument.

<lang J> |/~ 0 1 2 3 4
0 1 2 3 4
0 0 0 0 0
0 1 0 1 0
0 1 2 0 1
0 1 2 3 0</lang>

(Bigger examples might make the patterns more obvious but they also take up more space.)

By the way, we can ask J to count out the first N integers for us using i. ([http://www.jsoftware.com/help/dictionary/didot.htm integers]):

<lang J> i. 5
0 1 2 3 4</lang>

Anyways, the 0s in that last table represent the Sieve of Eratosthenes (in a symbolic or mathematical sense), and we can use = ([http://www.jsoftware.com/help/dictionary/d000.htm equal]) to find them.

<lang J> 0=|/~ i.5
1 0 0 0 0
1 1 1 1 1
1 0 1 0 1
1 0 0 1 0
1 0 0 0 1</lang>

Now all we need to do is add them up, using / ([http://www.jsoftware.com/help/dictionary/d420.htm insert]) in its single argument role to insert + between each row of that last table.

<lang J> +/0=|/~ i.5
5 1 2 2 3</lang>

The sieve wants the cases where we have two divisors:

<lang J> 2=+/0=|/~ i.5
0 0 1 1 0</lang>

And we just want to know the positions of the 1s in that list, which we can find using I. ([http://www.jsoftware.com/help/dictionary/dicapdot.htm indices]):

<lang J> I.2=+/0=|/~ i.5
2 3
I.2=+/0=|/~ i.100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</lang>

And we might want to express this sentence as a definition of a word that lets us use it with an arbitrary argument. There are a variety of ways of doing this. For example:

<lang J>sieve0=: verb def 'I.2=+/0=|/~ i.y'</lang>

That said, this fails with an argument of 2 (instead of giving an empty list of the primes smaller than 2, it gives a list with one element: 0). Working through why this is and why this matters can be an informative exercise. But, assuming this matters, we need to add some guard logic to prevent that problem:

<lang J>sieve0a=: verb def 'I.(y>2)*2=+/0=|/~ i.y'</lang>

Of course, we can also express this in an even more elaborate fashion. The elaboration makes more efficient use of resources for large arguments, at the expense of less efficient use of resources for small arguments:

<lang J>sieve1=: 3 : 0
m=. <.%:y
z=. $0
b=. y{.1
while. m>:j=. 1+b i. 0 do.
b=. b+.y$(-j){.1
z=. z,j
end.
end.
}}</syntaxhighlight>
z,1+I.-.b
)</lang>


Example:
"Wheels" may be implemented as follows:


<syntaxhighlight lang="j"> sieve 100
<lang J>sieve2=: 3 : 0
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</syntaxhighlight>
m=. <.%:y
z=. y (>:#]) 2 3 5 7
b=. 1,}.y$+./(*/z)$&>(-z){.&.>1
while. m>:j=. 1+b i. 0 do.
b=. b+.y$(-j){.1
z=. z,j
end.
z,1+I.-.b
)</lang>


To see into how this works, we can change the definition:
The use of<tt> 2 3 5 7 </tt>as wheels provides a
20% time improvement for<tt> n=1000 </tt>and 2% for<tt> n=1e6</tt>, but note that sieve2 is still 25 times slower than i.&.(p:inv) for <tt>n=1e6</tt>. Then again, the value of the Sieve of Eratosthenes was not efficiency but simplicity. So perhaps we should ignore resource consumption issues and instead focus on intermediate results for reasonably sized example problems?


<lang J> 0=|/~ i.8
<syntaxhighlight lang="j">sieve=: {{
1 0 0 0 0 0 0 0
r=. 0#t=. y# j=.1
while. y>j=.j+1 do.
1 1 1 1 1 1 1 1
1 0 1 0 1 0 1 0
if. j{t do.
echo j;(y$j{.1);t=. t > y$j{.1
1 0 0 1 0 0 1 0
1 0 0 0 1 0 0 0
r=. r, j
1 0 0 0 0 1 0 0
end.
1 0 0 0 0 0 1 0
1 0 0 0 0 0 0 1</lang>

Columns with two "1" values correspond to prime numbers.

'''Alternate Implementation'''

If you feel that the intermediate results, above, are not enough "sieve-like" another approach could be:

<lang J>sieve=:verb define
seq=: 2+i.y-1 NB. 2 thru y
n=. 2
l=. #seq
whilst. -.seq-:prev do.
prev=. seq
mask=. l{.1-(0{.~n-1),1}.l$n{.1
seq=. seq * mask
n=. {.((n-1)}.seq)-.0
end.
end.
}}</syntaxhighlight>
seq -. 0
)</lang>


And go:<syntaxhighlight lang="j"> sieve 10
Example use:
┌─┬───────────────────┬───────────────────┐
│2│1 0 1 0 1 0 1 0 1 0│0 1 0 1 0 1 0 1 0 1│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│3│1 0 0 1 0 0 1 0 0 1│0 1 0 0 0 1 0 1 0 0│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│5│1 0 0 0 0 1 0 0 0 0│0 1 0 0 0 0 0 1 0 0│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│7│1 0 0 0 0 0 0 1 0 0│0 1 0 0 0 0 0 0 0 0│
└─┴───────────────────┴───────────────────┘
2 3 5 7</syntaxhighlight>


Thus, here, <code>t</code> would select numbers which have not yet been determined to be a multiple of a prime number.
<lang J> sieve 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</lang>


=={{header|Janet}}==
To see intermediate results, let's show them:


===Simple, all primes below a limit===
<lang J>label=:dyad def 'echo x,":y'
Janet has a builtin [https://janet-lang.org/docs/data_structures/buffers.html "buffer" type] which is used as a mutable byte string. It has builtin utility methods to handle bit strings (see [https://janet-lang.org/api/buffer.html here] :)


This is based off the Python version.
sieve=:verb define
'seq ' label seq=: 2+i.y-1 NB. 2 thru y
'n ' label n=. 2
'l ' label l=. #seq
whilst. -.seq-:prev do.
prev=. seq
'mask ' label mask=. l{.1-(0{.~n-1),1}.l$n{.1
'seq ' label seq=. seq * mask
'n ' label n=. {.((n-1)}.seq)-.0
end.
seq -. 0
)


<syntaxhighlight lang="janet">(defn primes-before
seq 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
"Gives all the primes < limit"
n 2
[limit]
l 59
(assert (int? limit))
mask 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
# Janet has a buffer type (mutable string) which has easy methods for use as bitset
seq 2 3 0 5 0 7 0 9 0 11 0 13 0 15 0 17 0 19 0 21 0 23 0 25 0 27 0 29 0 31 0 33 0 35 0 37 0 39 0 41 0 43 0 45 0 47 0 49 0 51 0 53 0 55 0 57 0 59 0
(def buf-size (math/ceil (/ limit 8)))
n 3
(def is-prime (buffer/new-filled buf-size (bnot 0)))
mask 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0
(print "Size" buf-size "is-prime: " is-prime)
seq 2 3 0 5 0 7 0 0 0 11 0 13 0 0 0 17 0 19 0 0 0 23 0 25 0 0 0 29 0 31 0 0 0 35 0 37 0 0 0 41 0 43 0 0 0 47 0 49 0 0 0 53 0 55 0 0 0 59 0
(buffer/bit-clear is-prime 0)
n 5
(buffer/bit-clear is-prime 1)
mask 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0
(for n 0 (math/ceil (math/sqrt limit))
seq 2 3 0 5 0 7 0 0 0 11 0 13 0 0 0 17 0 19 0 0 0 23 0 0 0 0 0 29 0 31 0 0 0 0 0 37 0 0 0 41 0 43 0 0 0 47 0 49 0 0 0 53 0 0 0 0 0 59 0
(if (buffer/bit is-prime n) (loop [i :range-to [(* n n) limit n]]
n 7
(buffer/bit-clear is-prime i))))
mask 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
(def res @[]) # Result: Mutable array
seq 2 3 0 5 0 7 0 0 0 11 0 13 0 0 0 17 0 19 0 0 0 23 0 0 0 0 0 29 0 31 0 0 0 0 0 37 0 0 0 41 0 43 0 0 0 47 0 0 0 0 0 53 0 0 0 0 0 59 0
n 11
(for i 0 limit
(if (buffer/bit is-prime i)
mask 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
(array/push res i)))
seq 2 3 0 5 0 7 0 0 0 11 0 13 0 0 0 17 0 19 0 0 0 23 0 0 0 0 0 29 0 31 0 0 0 0 0 37 0 0 0 41 0 43 0 0 0 47 0 0 0 0 0 53 0 0 0 0 0 59 0
(def res (array/new limit))
n 13
(for i 0 limit
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59</lang>
(if (buffer/bit is-prime i)

(array/push res i)))
Another variation on this theme would be:
res)</syntaxhighlight>

<lang J>sieve=:verb define
seq=: 2+i.y-1 NB. 2 thru y
n=. 1
l=. #seq
whilst. -.seq-:prev do.
prev=. seq
n=. 1+n+1 i.~ * (n-1)}.seq
inds=. (2*n)+n*i.(<.l%n)-1
seq=. 0 inds} seq
end.
seq -. 0
)</lang>


Intermediate results for this variant are left as an exercise for the reader


=={{header|Java}}==
=={{header|Java}}==
{{works with|Java|1.5+}}
{{works with|Java|1.5+}}
<lang java5>import java.util.LinkedList;
<syntaxhighlight lang="java5">import java.util.LinkedList;


public class Sieve{
public class Sieve{
Line 5,025: Line 10,106:
return primes;
return primes;
}
}
}</lang>
}</syntaxhighlight>


To optimize by testing only odd numbers, replace the loop marked "unoptimized" with these lines:
To optimize by testing only odd numbers, replace the loop marked "unoptimized" with these lines:
<lang java5>nums.add(2);
<syntaxhighlight lang="java5">nums.add(2);
for(int i = 3;i <= n;i += 2){
for(int i = 3;i <= n;i += 2){
nums.add(i);
nums.add(i);
}</lang>
}</syntaxhighlight>


Version using List:
<syntaxhighlight lang="java5">
import java.util.ArrayList;
import java.util.List;
 
public class Eratosthenes {
public List<Integer> sieve(Integer n) {
List<Integer> primes = new ArrayList<Integer>(n);
boolean[] isComposite = new boolean[n + 1];
for(int i = 2; i <= n; i++) {
if(!isComposite[i]) {
primes.add(i);
for(int j = i * i; j <= n; j += i) {
isComposite[j] = true;
}
}
}
return primes;
}
}
</syntaxhighlight>
Version using a BitSet:
Version using a BitSet:
<lang java5>import java.util.LinkedList;
<syntaxhighlight lang="java5">import java.util.LinkedList;
import java.util.BitSet;
import java.util.BitSet;


Line 5,049: Line 10,151:
return primes;
return primes;
}
}
}</lang>
}</syntaxhighlight>


Version using a TreeSet:
<syntaxhighlight lang="java5">import java.util.Set;
import java.util.TreeSet;

public class Sieve{
public static Set<Integer> findPrimeNumbers(int limit) {
int last = 2;
TreeSet<Integer> nums = new TreeSet<>();

if(limit < last) return nums;

for(int i = last; i <= limit; i++){
nums.add(i);
}

return filterList(nums, last, limit);
}

private static TreeSet<Integer> filterList(TreeSet<Integer> list, int last, int limit) {
int squared = last*last;
if(squared < limit) {
for(int i=squared; i <= limit; i += last) {
list.remove(i);
}
return filterList(list, list.higher(last), limit);
}
return list;
}
}</syntaxhighlight>


===Infinite iterator===
===Infinite iterator===
Line 5,055: Line 10,188:
{{trans|Python}}
{{trans|Python}}
{{works with|Java|1.5+}}
{{works with|Java|1.5+}}
<lang java5>import java.util.Iterator;
<syntaxhighlight lang="java5">import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.PriorityQueue;
import java.math.BigInteger;
import java.math.BigInteger;
Line 5,108: Line 10,241:
}
}
}
}
}</lang>
}</syntaxhighlight>
{{out}}
{{out}}
<pre>
<pre>
Line 5,127: Line 10,260:
{{trans|Python}}
{{trans|Python}}
{{works with|Java|1.5+}}
{{works with|Java|1.5+}}
<lang java5>import java.util.Iterator;
<syntaxhighlight lang="java5">import java.util.Iterator;
import java.util.HashMap;
import java.util.HashMap;
Line 5,185: Line 10,318:
}
}
}</lang>
}</syntaxhighlight>


{{out}}<pre>Found 5761455 primes up to 100000000 in 4297 milliseconds.</pre>
{{out}}<pre>Found 5761455 primes up to 100000000 in 4297 milliseconds.</pre>
Line 5,195: Line 10,328:
{{trans|JavaScript}}
{{trans|JavaScript}}
{{works with|Java|1.5+}}
{{works with|Java|1.5+}}
<lang java5>import java.util.Iterator;
<syntaxhighlight lang="java5">import java.util.Iterator;
import java.util.ArrayList;
import java.util.ArrayList;


Line 5,272: Line 10,405:
}
}
}</lang>
}</syntaxhighlight>


{{out}}<pre>Found 50847534 primes up to 1000000000 in 3201 milliseconds.</pre>
{{out}}<pre>Found 50847534 primes up to 1000000000 in 3201 milliseconds.</pre>


=={{header|JavaScript}}==
=={{header|JavaScript}}==
<lang javascript>function eratosthenes(limit) {
<syntaxhighlight lang="javascript">function eratosthenes(limit) {
var primes = [];
var primes = [];
if (limit >= 2) {
if (limit >= 2) {
Line 5,303: Line 10,436:
if (typeof print == "undefined")
if (typeof print == "undefined")
print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print(primes);</lang>
print(primes);</syntaxhighlight>
outputs:
outputs:
<pre>2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97</pre>
<pre>2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97</pre>
Line 5,309: Line 10,442:
Substituting the following code for the function for '''an odds-only algorithm using bit packing''' for the array produces code that is many times faster than the above:
Substituting the following code for the function for '''an odds-only algorithm using bit packing''' for the array produces code that is many times faster than the above:


<lang javascript>function eratosthenes(limit) {
<syntaxhighlight lang="javascript">function eratosthenes(limit) {
var prms = [];
var prms = [];
if (limit >= 2) prms = [2];
if (limit >= 2) prms = [2];
Line 5,330: Line 10,463:
}
}
return prms;
return prms;
}</lang>
}</syntaxhighlight>


While the above code is quite fast, especially using an efficient JavaScript engine such as Google Chrome's V8, it isn't as elegant as it could be using the features of the new EcmaScript6 specification (when it comes out about the end of 2014 and JavaScript engines, including those of browsers, implement that standard), with which we might choose to implement an incremental algorithm using iterators or generators similar to those implemented in Python or F# (yield). Meanwhile, we can emulate some of those features by using a simulation of an iterator class (which is easier than using a call-back function) for an '''"infinite" generator based on an Object dictionary''' as in the following odds-only code written as a JavaScript class:
While the above code is quite fast, especially using an efficient JavaScript engine such as Google Chrome's V8, it isn't as elegant as it could be using the features of the new EcmaScript6 specification (when it comes out about the end of 2014 and JavaScript engines, including those of browsers, implement that standard), with which we might choose to implement an incremental algorithm using iterators or generators similar to those implemented in Python or F# (yield). Meanwhile, we can emulate some of those features by using a simulation of an iterator class (which is easier than using a call-back function) for an '''"infinite" generator based on an Object dictionary''' as in the following odds-only code written as a JavaScript class:


<lang javascript>var SoEIncClass = (function () {
<syntaxhighlight lang="javascript">var SoEIncClass = (function () {
function SoEIncClass() {
function SoEIncClass() {
this.n = 0;
this.n = 0;
Line 5,375: Line 10,508:
};
};
return SoEIncClass;
return SoEIncClass;
})();</lang>
})();</syntaxhighlight>


The above code can be used to find the nth prime (which would require estimating the required range limit using the previous fixed range code) by using the following code:
The above code can be used to find the nth prime (which would require estimating the required range limit using the previous fixed range code) by using the following code:


<lang javascript>var gen = new SoEIncClass();
<syntaxhighlight lang="javascript">var gen = new SoEIncClass();
for (var i = 1; i < 1000000; i++, gen.next());
for (var i = 1; i < 1000000; i++, gen.next());
var prime = gen.next();
var prime = gen.next();
Line 5,385: Line 10,518:
if (typeof print == "undefined")
if (typeof print == "undefined")
print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print(prime);</lang>
print(prime);</syntaxhighlight>


to produce the following output (about five seconds using Google Chrome's V8 JavaScript engine):
to produce the following output (about five seconds using Google Chrome's V8 JavaScript engine):
Line 5,395: Line 10,528:
This can be implemented as '''an "infinite" odds-only generator using page segmentation''' for a considerable speed-up with the alternate JavaScript class code as follows:
This can be implemented as '''an "infinite" odds-only generator using page segmentation''' for a considerable speed-up with the alternate JavaScript class code as follows:


<lang javascript>var SoEPgClass = (function () {
<syntaxhighlight lang="javascript">var SoEPgClass = (function () {
function SoEPgClass() {
function SoEPgClass() {
this.bi = -1; // constructor resets the enumeration to start...
this.bi = -1; // constructor resets the enumeration to start...
Line 5,451: Line 10,584:
};
};
return SoEPgClass;
return SoEPgClass;
})();</lang>
})();</syntaxhighlight>


The above code is about fifty times faster (about five seconds to calculate 50 million primes to about a billion on the Google Chrome V8 JavaScript engine) than the above dictionary based code.
The above code is about fifty times faster (about five seconds to calculate 50 million primes to about a billion on the Google Chrome V8 JavaScript engine) than the above dictionary based code.
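
The generator style mentioned earlier in this section is available in engines that implement ES2015. As a minimal sketch only (the names <code>primesGen</code> and <code>nthPrime</code> are hypothetical, and this is not the class code above), the dictionary-based incremental idea can be written with a <code>Map</code> and a generator function. It assumes an ES2015-capable engine and does not postpone the base primes, so the map grows by one entry per prime found:

```javascript
// Illustrative sketch only: an "infinite" odds-only incremental sieve
// written as an ES2015 generator. primesGen and nthPrime are hypothetical
// names introduced for this example.
function* primesGen() {
  yield 2;
  // Maps an upcoming odd composite to the step (twice its prime factor)
  // that generated it. Base primes are not postponed in this sketch,
  // so one entry is added for every prime found.
  const composites = new Map();
  for (let n = 3; ; n += 2) {
    const step = composites.get(n);
    if (step === undefined) {
      yield n;                       // not a known composite: n is prime
      composites.set(n * n, 2 * n);  // first composite that needs culling by n
    } else {
      composites.delete(n);
      let next = n + step;           // advance to the next free odd multiple
      while (composites.has(next)) next += step;
      composites.set(next, step);
    }
  }
}

// Returns the nth prime (1-based) by consuming the generator.
function nthPrime(n) {
  let count = 0;
  for (const p of primesGen()) {
    if (++count === n) return p;
  }
}
```

For example, <code>nthPrime(1000)</code> returns 7919. Because it keys the map on <code>n * n</code>, the sketch is only suitable for ranges where that product stays within exact integer precision.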


The speed for both of these "infinite" solutions will also respond to further wheel factorization techniques, especially for the dictionary based version where any added overhead to deal with the factorization wheel will be negligible compared to the dictionary overhead. The dictionary version would likely speed up about a factor of three or a little more with maximum wheel factorization applied; the page segmented version probably won't gain more than a factor of two and perhaps less due to the overheads of array look-up operations.
The speed for both of these "infinite" solutions will also respond to further wheel factorization techniques, especially for the dictionary based version where any added overhead to deal with the factorization wheel will be negligible compared to the dictionary overhead. The dictionary version would likely speed up about a factor of three or a little more with maximum wheel factorization applied; the page segmented version probably won't gain more than a factor of two and perhaps less due to the overheads of array look-up operations.
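
To make the wheel factorization idea concrete, the following sketch builds the residue and gap pattern of a wheel for a given set of small primes. The function name <code>wheelGaps</code> and its return shape are assumptions made for illustration. It only shows how many candidates a wheel leaves, not how the classes above would need to change to use one:

```javascript
// Illustrative sketch only: compute the pattern of a factorization wheel.
// wheelGaps is a hypothetical helper, not part of the code above.
function wheelGaps(wheelPrimes) {
  // The wheel repeats with a period equal to the product of its primes.
  const circumference = wheelPrimes.reduce((acc, p) => acc * p, 1);
  // Residues not divisible by any wheel prime are the only candidates.
  const residues = [];
  for (let r = 1; r <= circumference; r++) {
    if (wheelPrimes.every(p => r % p !== 0)) residues.push(r);
  }
  // Gaps between successive candidates, wrapping around to the next turn.
  const gaps = residues.map((r, i) =>
    (i + 1 < residues.length ? residues[i + 1] : residues[0] + circumference) - r);
  return { circumference, residues, gaps };
}
```

For the primes 2, 3 and 5 this gives a circumference of 30, the residues 1, 7, 11, 13, 17, 19, 23, 29 and the gaps 6, 4, 2, 4, 2, 4, 6, 2, so only 8 of every 30 numbers remain as candidates. Adding 7 leaves 48 of every 210, compared with 1 of every 2 for odds-only.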

The following function is copied from above to produce a web page version for beginners:
<syntaxhighlight lang="javascript">
<script>
function eratosthenes(limit) {
var primes = [];
if (limit >= 2) {
var sqrtlmt = Math.sqrt(limit) - 2;
var nums = new Array(); // start with an empty Array...
for (var i = 2; i <= limit; i++) // and
nums.push(i); // only initialize the Array once...
for (var i = 0; i <= sqrtlmt; i++) {
var p = nums[i]
if (p)
for (var j = p * p - 2; j < nums.length; j += p)
nums[j] = 0;
}
for (var i = 0; i < nums.length; i++) {
var p = nums[i];
if (p)
primes.push(p);
}
}
return primes;
}
var primes = eratosthenes(100);
var output = '';
for (var i = 0; i < primes.length; i++) {
output+=primes[i];
if (i < primes.length-1) output+=',';
}
document.write(output);
</script>
</syntaxhighlight>


=={{header|JOVIAL}}==
=={{header|JOVIAL}}==
<syntaxhighlight lang="jovial">
<lang JOVIAL>
START
START
FILE MYOUTPUT ... $ ''Insufficient information to complete this declaration''
FILE MYOUTPUT ... $ ''Insufficient information to complete this declaration''
Line 5,512: Line 10,679:
END
END
TERM$
TERM$
</syntaxhighlight>
</lang>


=={{header|jq}}==
=={{header|jq}}==
{{works with|jq|1.4}}
{{works with|jq|1.4}}
==Bare Bones==

Short and sweet ...
Short and sweet ...


<lang jq># Denoting the input by $n, which is assumed to be a positive integer,
<syntaxhighlight lang="jq"># Denoting the input by $n, which is assumed to be a positive integer,
# eratosthenes/0 produces an array of primes less than or equal to $n:
# eratosthenes/0 produces an array of primes less than or equal to $n:
def eratosthenes:
def eratosthenes:


# erase(i) sets .[i*j] to false for integral j > 1
def erase(i):
def erase(i):
if .[i] then reduce range(2; (1 + length) / i) as $j (.; .[i * $j] = false)
if .[i] then
reduce (range(2*i; length; i)) as $j (.; .[$j] = false)
else .
else .
end;
end;
Line 5,533: Line 10,700:
| [null, null, range(2; $n)]
| [null, null, range(2; $n)]
| reduce (2, 1 + (2 * range(1; $s))) as $i (.; erase($i))
| reduce (2, 1 + (2 * range(1; $s))) as $i (.; erase($i))
| map(select(.));</lang>
| map(select(.));</syntaxhighlight>
'''Examples''':
'''Examples''':
<lang jq>100 | eratosthenes</lang>
<syntaxhighlight lang="jq">100 | eratosthenes</syntaxhighlight>
{{out}}
{{out}}


[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
<lang jq>1e7 | eratosthenes | length</lang>
<syntaxhighlight lang="jq">1e7 | eratosthenes | length</syntaxhighlight>
{{out}}
{{out}}
664579
664579

===Enhanced Sieve===
Here is a more economical variant that:

* produces a stream of primes less than or equal to a given integer;
* only records the status of odd integers from 3 upwards during the sieving process;
* optimizes the inner loop as described in the task description.

<syntaxhighlight lang=jq>
def primes:
# The array we use for the sieve only stores information for the odd integers greater than 1:
# index integer
# 0 3
# k 2*k + 3
# So if we wish to mark m = 2*k + 3, the relevant index is: (m - 3) / 2
def ix:
if . % 2 == 0 then null
else ((. - 3) / 2)
end;
# erase(i) sets the entries for i*j to false for odd integral j >= i, and assumes i is odd
def erase(i):
((i - 3) / 2) as $k
# Consider relevant multiples:
| (((length * 2 + 3) / i)) as $upper
# ... only consider odd multiples from i onwards
| reduce range(i; $upper; 2) as $j (.;
(((i * $j) - 3) / 2) as $m
| if .[$m] then .[$m] = false else . end);

if . < 2 then []
else (. + 1) as $n
| (($n|sqrt) / 2) as $s
| [range(3; $n; 2)|true]
| reduce (1 + (2 * range(1; $s)) ) as $i (.; erase($i))
| . as $sieve
| 2, (range(3; $n; 2) | select($sieve[ix]))
end ;

def count(s): reduce s as $_ (0; .+1);

count(1e6 | primes)
</syntaxhighlight>
{{output}}
<pre>
78498
</pre>


=={{header|Julia}}==
=={{header|Julia}}==


Start with 2 already in the array, then test only the odd numbers and push the prime ones onto the array.
Start with 2 already in the array, then test only the odd numbers and push the prime ones onto the array.
<syntaxhighlight lang="julia"># Returns an array of positive prime numbers less than or equal to lim
<lang julia>
# Returns an array of positive prime numbers less than or equal to lim
function sieve(lim :: Int)
function sieve(lim :: Int)
is_prime :: Array = trues(lim)
if lim < 2 return [] end
llim :: Int = isqrt(lim)
limi :: Int = (lim - 1) ÷ 2 # calculate the required array size
result :: Array = [2] #Initial array
isprime :: Array{Bool} = trues(limi)
llimi :: Int = (isqrt(lim) - 1) ÷ 2 # and calculate maximum root prime index
for i = 3:2:lim
result :: Array{Int} = [2] #Initial array
if is_prime[i]
if i <= llim
for i in 1:limi
for j = i*i:2*i:lim
if isprime[i]
is_prime[j] = false
p = i + i + 1 # 2i + 1
if i <= llimi
for j = (p*p-1)>>>1:p:limi # quick shift/divide in case LLVM doesn't optimize divide by 2 away
isprime[j] = false
end
end
end
end
push!(result,i)
push!(result, p)
end
end
end
end
return result
return result
end</syntaxhighlight>

Alternate version using <code>findall</code> to get all primes at once in the end

<syntaxhighlight lang="julia">function sieve(n::Integer)
primes = fill(true, n)
primes[1] = false
for p in 2:n
primes[p] || continue
primes[p .* (2:n÷p)] .= false
end
findall(primes)
end</syntaxhighlight>

At about 35 seconds for a range of a billion on my Intel Atom x5-Z8350 CPU at 1.92 GHz (single threaded), or about 70 CPU clock cycles per culling operation, the above examples are two of the slowest ways to compute the Sieve of Eratosthenes over any reasonably large range, due to a couple of factors:
# The output primes are extracted to a result array which takes time (and memory) to construct.
# They use the naive "one huge memory array" method, which has poor memory access speed for larger ranges.


Even though the first uses an odds-only algorithm (not noted in the text, as the task requires), which reduces the number of operations by a factor of about two and a half, it is not faster than the second, which is not odds-only. The second is set up to take advantage of the <code>findall</code> function to directly output the indices of the remaining true values as the found primes: internally it first creates the array at the size of the counted true values and then just fills it, whereas the first takes longer because it pushes the found primes one at a time onto the constructed array.

Also, the first uses more memory than necessary in one byte per `Bool` where using a `BitArray` as in the second reduces this by a factor of eight.

If one is going to "crib" the MatLab algorithm as above, one may as well do it using odds-only as per the MatLab built-in. The following alternate code improves on the "Alternate" example above by making it sieve odds-only and adjusting the result array contents after to suit:

<syntaxhighlight lang="julia">function sieve2(n :: Int)
ni = (n - 1) ÷ 2
isprime = trues(ni)
for i in 1:ni
if isprime[i]
j = 2i * (i + 1)
if j > ni
m = findall(isprime)
map!((i::Int) -> 2i + 1, m, m)
return pushfirst!(m, 2)
else
p = 2i + 1
while j <= ni
isprime[j] = false
j += p
end
end
end
end
end</syntaxhighlight>

This takes about 18.5 seconds, or 36 CPU cycles per culling operation, to find the primes to a billion, but that is still quite slow compared to what can be done below. Note that the result array is first created by the <code>findall</code> function, then modified in place by the <code>map!</code> function to transform the indices to primes, and finally copied by the <code>pushfirst!</code> function to add the only even prime, two, to the beginning, but these operations are quite fast. However, this still consumes a lot of memory, about 64 Megabytes for the sieve buffer and over 400 Megabytes for the result (8-byte Int's for 64 bit execution) to sieve to a billion, and culling the huge culling buffer that doesn't fit the CPU cache sizes is what makes it slow.
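
The starting index <code>j = 2i * (i + 1)</code> in <code>sieve2</code>, and the equivalent <code>(p*p-1)>>>1</code> in the first example, can be checked with a short derivation. It assumes the odds-only mapping used by that code, in which index ''i'' represents the odd number ''p'' = 2''i'' + 1:

<math>\frac{p^2 - 1}{2} = \frac{(2i + 1)^2 - 1}{2} = \frac{4i^2 + 4i}{2} = 2i(i + 1)</math>

So the square of ''p'' sits at index 2''i''(''i'' + 1), and stepping the index by ''p'' advances the represented value by 2''p'', which skips the even multiples. For example, ''i'' = 1 (''p'' = 3) starts culling at index 4, which represents 9.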

===Iterator Output===

The creation of an output results array is not necessary if the purpose is just to scan across the resulting primes once; they can instead be output using an iterator (from a <code>BitArray</code>), as in the following odds-only code:

<syntaxhighlight lang="julia">const Prime = UInt64

struct Primes
rangei :: Int64
primebits :: BitArray{1}
function Primes(n :: Int64)
if n < 3
if n < 2 return new(-1, falses(0)) # no elements
else return new(0, trues(0)) end # n = 2: meaning is 1 element of 2
end
limi :: Int = (n - 1) ÷ 2 # calculate the required array size
isprimes :: BitArray = trues(limi)
@inbounds(
for i in 1:limi
p = i + i + 1
start = (p * p - 1) >>> 1 # shift/divide if LLVM doesn't optimize
if start > limi
return new(limi, isprimes)
end
if isprimes[i]
for j in start:p:limi
isprimes[j] = false
end
end
end)
end
end
end
</lang>


Base.eltype(::Type{Primes}) = Prime
Alternate version using <code>find</code> to get all primes at once in the end


<lang julia>function sieve(n :: Int)
function Base.length(P::Primes)::Int64
if P.rangei < 0 return 0 end
a = trues(n)
a[1] = false
return 1 + count(P.primebits)
end
for i = 1:n

if a[i]
function Base.iterate(P::Primes, state::Int = 0)::
j = i * i
Union{Tuple{Prime, Int}, Nothing}
if j > n
lmt = P.rangei
return find(a)
if state > lmt return nothing end
if state <= 0 return (UInt64(2), 1) end
let
prmbts = P.primebits
i = state
@inbounds(
while i <= lmt && !prmbts[i] i += 1 end)
if i > lmt return nothing end
return (i + i + 1, i + 1)
end
end</syntaxhighlight>

for which using the following code:

<syntaxhighlight lang="julia">function bench()
@time length(Primes(100)) # warm up JIT
# println(@time count(x->true, Primes(1000000000))) # about 1.5 seconds slower counting over iteration
println(@time length(Primes(1000000000)))
end
bench()</syntaxhighlight>

results in the following output:
{{out}}
<pre> 0.000031 seconds (3 allocations: 160 bytes)
12.214533 seconds (4 allocations: 59.605 MiB, 0.42% gc time)
50847534</pre>

This reduces the CPU cycles per culling operation to about 24.4, but it's still slow due to using the one largish array. Note that counting each iterated prime takes about an additional one and a half seconds; if all that is required is the count of primes over a range, the specialized length function is much faster.
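
As a rough consistency check on the cycles-per-cull figures quoted so far, the implied number of culling operations ''N'' can be recovered from the elapsed time ''t'', the clock frequency ''f'' and the cycles per cull ''c''. The symbols are introduced here only for illustration, and the check assumes the stated 1.92 GHz clock rate holds for the whole run:

<math>N \approx \frac{t \cdot f}{c}: \qquad \frac{35 \times 1.92 \times 10^{9}}{70} = 9.6 \times 10^{8}, \qquad \frac{18.5 \times 1.92 \times 10^{9}}{36} \approx 9.9 \times 10^{8}, \qquad \frac{12.2 \times 1.92 \times 10^{9}}{24.4} = 9.6 \times 10^{8}</math>

All three timings therefore correspond to roughly a billion culling operations for a range of a billion, which is why the cycles-per-cull figure falls in step with the elapsed time.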

===Page Segmented Algorithm===

For any kind of reasonably large range such as a billion, a page segmented version should be used with the pages sized to the CPU caches for much better memory access times. As well, the following odds-only example uses a custom bit packing algorithm for a further two times speed-up, also reducing the memory allocation delays by reusing the sieve buffers when possible (usually possible):

<syntaxhighlight lang="julia">const Prime = UInt64
const BasePrime = UInt32
const BasePrimesArray = Array{BasePrime,1}
const SieveBuffer = Array{UInt8,1}

# contains a lazy list of a secondary base primes arrays feed
# NOT thread safe; needs a Mutex gate to make it so...
abstract type BPAS end # stands in for BasePrimesArrays, not defined yet
mutable struct BasePrimesArrays <: BPAS
thunk :: Union{Nothing,Function} # problem with efficiency - untyped function!!!!!!!!!
value :: Union{Nothing,Tuple{BasePrimesArray, BPAS}}
BasePrimesArrays(thunk::Function) = new(thunk)
end
Base.eltype(::Type{BasePrimesArrays}) = BasePrime
Base.IteratorSize(::Type{BasePrimesArrays}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(BPAs::BasePrimesArrays, state::BasePrimesArrays = BPAs)
if state.thunk !== nothing
newvalue :: Union{Nothing,Tuple{BasePrimesArray, BasePrimesArrays}} =
state.thunk() :: Union{Nothing,Tuple{BasePrimesArray
, BasePrimesArrays}}
state.value = newvalue
state.thunk = nothing
return newvalue
end
state.value
end

# count the number of zero bits (primes) in a byte array,
# also works for part arrays/slices, best used as an `@view`...
function countComposites(cmpsts::AbstractArray{UInt8,1})
foldl((a, b) -> a + count_zeros(b), cmpsts; init = 0)
end

# converts an entire sieved array of bytes into an array of UInt32 primes,
# to be used as a source of base primes...
function composites2BasePrimesArray(low::Prime, cmpsts::SieveBuffer)
limiti = length(cmpsts) * 8
len :: Int = countComposites(cmpsts)
rslt :: BasePrimesArray = BasePrimesArray(undef, len)
i :: Int = 0
j :: Int = 1
@inbounds(
while i < limiti
if cmpsts[i >>> 3 + 1] & (1 << (i & 7)) == 0
rslt[j] = low + i + i
j += 1
end
i += 1
end)
rslt
end

# sieving work done, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
function sieveComposites(low::Prime, buffer::Array{UInt8,1},
bpas::BasePrimesArrays)
lowi :: Int = (low - 3) ÷ 2
len :: Int = length(buffer)
limiti :: Int = len * 8 - 1
nexti :: Int = lowi + limiti
for bpa::BasePrimesArray in bpas
for bp::BasePrime in bpa
bpint :: Int = bp
bpi :: Int = (bpint - 3) >>> 1
starti :: Int = 2 * bpi * (bpi + 3) + 3
starti >= nexti && return
if starti >= lowi starti -= lowi
else
else
a[j:i:n] = false
r :: Int = (lowi - starti) % bpint
starti = r == 0 ? 0 : bpint - r
end
end
lmti :: Int = limiti - 40 * bpint
@inbounds(
if bpint <= (len >>> 2) starti <= lmti
for i in 1:8
if starti > limiti break end
mask = convert(UInt8,1) << (starti & 7)
c = starti >>> 3 + 1
while c <= len
buffer[c] |= mask
c += bpint
end
starti += bpint
end
else
c = starti
while c <= limiti
buffer[c >>> 3 + 1] |= convert(UInt8,1) << (c & 7)
c += bpint
end
end)
end
end
end
end
return
end</lang>
end


# starts the secondary base primes feed with minimum size in bits set to 4K...
=={{header|Kotlin}}==
# thus, for the first buffer primes up to 8293,
<lang scala>fun sieve(limit: Int): List<Int> {
# the seeded primes easily cover it as 97 squared is 9409.
val primes = mutableListOf<Int>()
function makeBasePrimesArrays() :: BasePrimesArrays
cmpsts :: SieveBuffer = Array{UInt8,1}(undef, 512)
function nextelem(low::Prime, bpas::BasePrimesArrays) ::
Tuple{BasePrimesArray, BasePrimesArrays}
# calculate size so that the bit span is at least as big as the
# maximum culling prime required, rounded up to minsizebits blocks...
reqdsize :: Int = 2 + isqrt(1 + low)
size :: Int = (reqdsize ÷ 4096 + 1) * 4096 ÷ 8 # size in bytes
if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
fill!(cmpsts, 0)
sieveComposites(low, cmpsts, bpas)
arr :: BasePrimesArray = composites2BasePrimesArray(low, cmpsts)
next :: Prime = low + length(cmpsts) * 8 * 2
arr, BasePrimesArrays(() -> nextelem(next, bpas))
end
# pre-seeding breaks recursive race,
# as only known base primes used for first page...
preseedarr :: BasePrimesArray = # pre-seed to 100, can sieve to 10,000...
[ 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
]
nextfunc :: Function = () ->
(nextelem(convert(Prime,101), makeBasePrimesArrays()))
firstfunc :: Function = () -> (preseedarr, BasePrimesArrays(nextfunc))
BasePrimesArrays(firstfunc)
end


# an iterator over successive sieved buffer composite arrays,
if (limit >= 2) {
# returning a tuple of the value represented by the lowest possible prime
val numbers = Array(limit + 1) { true }
# in the sieved composites array and the array itself;
val sqrtLimit = Math.sqrt(limit.toDouble()).toInt()
# the array has a 16 Kilobytes minimum size (CPU L1 cache), but
# will grow so that the bit span is larger than the
# maximum culling base prime required, possibly making it larger than
# the L1 cache for large ranges, but still reasonably efficient using
# the L2 cache: very efficient up to about 16e9 range;
# reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 week...
struct PrimesPages
baseprimes :: BasePrimesArrays
PrimesPages() = new(makeBasePrimesArrays())
end
Base.eltype(::Type{PrimesPages}) = SieveBuffer
Base.IteratorSize(::Type{PrimesPages}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PP::PrimesPages,
state :: Tuple{Prime,SieveBuffer} =
( convert(Prime,3), Array{UInt8,1}(undef,16384) ))
(low, cmpsts) = state
# calculate size so that the bit span is at least as big as the
# maximum culling prime required, rounded up to minsizebits blocks...
reqdsize :: Int = 2 + isqrt(1 + low)
size :: Int = (reqdsize ÷ 131072 + 1) * 131072 ÷ 8 # size in bytes
if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
fill!(cmpsts, 0)
sieveComposites(low, cmpsts, PP.baseprimes)
newlow :: Prime = low + length(cmpsts) * 8 * 2
( low, cmpsts ), ( newlow, cmpsts )
end


function countPrimesTo(range::Prime) :: Int64
for (factor in 2..sqrtLimit) {
if (numbers[factor]) {
range < 3 && ((range < 2 && return 0) || return 1)
count :: Int64 = 1
for (multiple in (factor * factor)..limit step factor) {
for ( low, cmpsts ) in PrimesPages() # almost never exits!!!
numbers[multiple] = false
}
if low + length(cmpsts) * 8 * 2 > range
}
lasti :: Int = (range - low) ÷ 2
count += countComposites(@view cmpsts[1:lasti >>> 3])
}
count += count_zeros(cmpsts[lasti >>> 3 + 1] |
(0xFE << (lasti & 7)))
return count
end
count += countComposites(cmpsts)
end
count
end


# iterator over primes from above page iterator;
numbers.forEachIndexed { number, isPrime ->
# unless doing something special with individual primes, usually unnecessary;
if (number >= 2) {
# better to do manipulations based on the composites bit arrays...
if (isPrime) {
# takes at least as long to enumerate the primes as sieve them...
primes.add(number)
mutable struct PrimesPaged
}
primespages :: PrimesPages
}
primespageiter :: Tuple{Tuple{Prime,SieveBuffer},Tuple{Prime,SieveBuffer}}
}
PrimesPaged() = let PP = PrimesPages(); new(PP, Base.iterate(PP)) end
}
end
Base.eltype(::Type{PrimesPaged}) = Prime
Base.IteratorSize(::Type{PrimesPaged}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PP::PrimesPaged, state::Int = -1 )
state < 0 && return Prime(2), 0
(low, cmpsts) = PP.primespageiter[1]
len = length(cmpsts) * 8
@inbounds(
while state < len && cmpsts[state >>> 3 + 1] &
(UInt8(1) << (state & 7)) != 0
state += 1
end)
if state >= len
PP.primespageiter = Base.iterate(PP.primespages, PP.primespageiter[2])
return Base.iterate(PP, 0)
end
low + state + state, state + 1
end</syntaxhighlight>


When tested with the following code:
return primes

<syntaxhighlight lang="julia">function bench()
print("( ")
for p in PrimesPaged() p > 100 && break; print(p, " ") end
println(")")
countPrimesTo(Prime(100)) # warm up JIT
#=
println(@time let count = 0
for p in PrimesPaged()
p > 1000000000 && break
count += 1
end; count end) # much slower counting over iteration
=#
println(@time countPrimesTo(Prime(1000000000)))
end
bench()</syntaxhighlight>

it produces the following:
{{out}}
<pre>( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
1.947145 seconds (59 allocations: 39.078 KiB)
50847534</pre>

Note that "the slow way" as commented out in the code takes about an extra 4.85 seconds to count the primes to a billion, so it takes longer to enumerate the primes than to cull the composites; this makes further work on making this yet faster pointless unless techniques such as the one used here, which counts the found primes by just counting the un-cancelled bit representations in the sieved buffers, are used.

This takes about 1.9 seconds to count the primes to a billion (using the fast technique), or about 3.75 clock cycles per culling operation, which is reasonably fast; this is almost 20 times faster than the first naive sieves. As written, the algorithm maintains its efficiency up to about 16 billion and then slows down as the buffer size increases beyond the CPU L1 cache size into the L2 cache size, such that it takes about 436.8 seconds to sieve to 100 billion instead of the expected 300 seconds or so. However, an extra feature of "double buffered sieving" could be added so that the buffer is sieved in L1 cache slices, followed by a final sweep of the entire buffer by the few remaining cull operations that use the larger primes, for only a slight reduction in average cycles per cull up to a range of about 2.56e14 (for this CPU). For really large ranges above that, another sieving technique known as the "bucket sieve", which sorts the culling operations by page so that processing time is not expended for values that don't "hit" a given page, can be used for only a slight additional reduction in efficiency.

Additionally, maximal wheel factorization can reduce the time by about a factor of four, and multi-processing, where the work is shared across the CPU cores, can produce a further speed-up by the factor of the number of cores (only three times on this four-core machine, due to the clock speed reducing to 75% of the rate when all cores are used), for an additional speed-up of about 12 times for this CPU. These improvements are just slightly too complex to post here.

However, even the version posted shows that the naive "one huge array" implementations should never be used for sieving ranges of over a few million, and that Julia can come very close to the speed of the fastest languages such as C/C++ for the same algorithm.
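
The core of the page segmentation idea described in this section can be sketched much more simply. The following JavaScript sketch is not the algorithm above: it is neither odds-only nor bit-packed, and the names <code>countPrimesSegmented</code> and <code>segmentSize</code> (with its default of 32768 entries) are assumptions made for illustration. It only shows base primes up to the square root of the limit being found first, then applied to fixed-size pages that reuse one buffer:

```javascript
// Illustrative sketch only: count primes up to `limit` one page at a time.
// Not odds-only and not bit-packed; one byte is used per number in a page.
function countPrimesSegmented(limit, segmentSize = 32768) {
  if (limit < 2) return 0;
  const root = Math.floor(Math.sqrt(limit));
  // Step 1: find the base primes up to sqrt(limit) with a simple sieve.
  const isComposite = new Uint8Array(root + 1);
  const basePrimes = [];
  for (let p = 2; p <= root; p++) {
    if (!isComposite[p]) {
      basePrimes.push(p);
      for (let m = p * p; m <= root; m += p) isComposite[m] = 1;
    }
  }
  // Step 2: sieve successive pages [low, high], reusing a single buffer.
  let count = 0;
  const page = new Uint8Array(segmentSize);
  for (let low = 2; low <= limit; low += segmentSize) {
    const high = Math.min(low + segmentSize - 1, limit);
    page.fill(0);
    for (const p of basePrimes) {
      if (p * p > high) break; // larger base primes cannot hit this page
      // First multiple of p in the page that is at least p squared.
      const start = Math.max(p * p, Math.ceil(low / p) * p);
      for (let m = start; m <= high; m += p) page[m - low] = 1;
    }
    for (let i = 0; i <= high - low; i++) if (!page[i]) count++;
  }
  return count;
}
```

For example, <code>countPrimesSegmented(1000000)</code> returns 78498. The page size would normally be tuned to the CPU cache, as discussed above; the default here is only a placeholder.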

===Functional Algorithm===

One of the best simple purely functional Sieve of Eratosthenes algorithms is the infinite tree folding sequence algorithm as implemented in Haskell. As Julia does not have a standard LazyList implementation or library and as a full memoizing lazy list is not required for this algorithm, the following odds-only code implements the rudiments of a Co-Inductive Stream (CIS) in its implementation:

<syntaxhighlight lang="julia">const Thunk = Function # can't define other than as a generalized Function

struct CIS{T}
head :: T
tail :: Thunk # produces the next CIS{T}
CIS{T}(head :: T, tail :: Thunk) where T = new(head, tail)
end
Base.eltype(::Type{CIS{T}}) where T = T
Base.IteratorSize(::Type{CIS{T}}) where T = Base.SizeUnknown()
function Base.iterate(C::CIS{T}, state = C) :: Tuple{T, CIS{T}} where T
state.head, state.tail()
end

function treefoldingprimes()::CIS{Int}
function merge(xs::CIS{Int}, ys::CIS{Int})::CIS{Int}
x = xs.head; y = ys.head
if x < y CIS{Int}(x, () -> merge(xs.tail(), ys))
elseif y < x CIS{Int}(y, () -> merge(xs, ys.tail()))
else CIS{Int}(x, () -> merge(xs.tail(), ys.tail())) end
end
function pmultiples(p::Int)::CIS{Int}
adv :: Int = p + p
next(c::Int)::CIS{Int} = CIS{Int}(c, () -> next(c + adv)); next(p * p)
end
function allmultiples(ps::CIS{Int})::CIS{CIS{Int}}
CIS{CIS{Int}}(pmultiples(ps.head), () -> allmultiples(ps.tail()))
end
function pairs(css :: CIS{CIS{Int}})::CIS{CIS{Int}}
nextcss = css.tail()
CIS{CIS{Int}}(merge(css.head, nextcss.head), ()->pairs(nextcss.tail()))
end
function composites(css :: CIS{CIS{Int}})::CIS{Int}
CIS{Int}(css.head.head, ()-> merge(css.head.tail(),
css.tail() |> pairs |> composites))
end
function minusat(n::Int, cs::CIS{Int})::CIS{Int}
if n < cs.head CIS{Int}(n, () -> minusat(n + 2, cs))
else minusat(n + 2, cs.tail()) end
end
oddprimes()::CIS{Int} = CIS{Int}(3, () -> minusat(5, oddprimes()
|> allmultiples |> composites))
CIS{Int}(2, () -> oddprimes())
end</syntaxhighlight>

when tested with the following:

<syntaxhighlight lang="julia">@time let count = 0; for p in treefoldingprimes() p > 1000000 && break; count += 1 end; count end</syntaxhighlight>

it outputs the following:
{{out}}
<pre> 1.791058 seconds (10.23 M allocations: 290.862 MiB, 3.64% gc time)
78498</pre>

At about 1.8 seconds or 4000 cycles per culling operation to calculate the number of primes up to a million, this is very slow, but that is not the fault of Julia but rather just that purely functional incremental Sieve of Eratosthenes implementations are much slower than those using mutable arrays and are only useful over quite limited ranges of a few million. For one thing, incremental algorithms have O(n log n log log n) asymptotic execution complexity rather than O(n log log n) (an extra log n factor) and for another the constant execution overhead is much larger in creating (and garbage collecting) elements in the sequences.

The time for this algorithm is quite comparable to that of implementations in other functional languages such as F#, and actually faster than implementing the same algorithm in C/C++, but slower than implementations in purely functional languages such as Haskell, or even in only partly functional languages such as Kotlin, by a factor of ten or more; this is due to those languages having specialized memory allocation that is very fast at allocating small amounts of memory per allocation, as is often required in functional programming. The majority of the time for this algorithm is spent allocating memory, and if future versions of Julia are to be of better use in purely functional programming, improvements need to be made to the memory allocation.

===Infinite (Mutable) Iterator Using (Mutable) Dictionary===

To gain some extra speed above the purely functional algorithm above, the Python'ish version as a mutable iterator embedding a mutable standard base Dictionary can be used. The following version uses a secondary delayed injection stream of "base" primes defined recursively to provide the successions of composite values in the Dictionary to be used for sieving:

<syntaxhighlight lang="julia">const Prime = UInt64
abstract type PrimesDictAbstract end # used for forward reference
mutable struct PrimesDict <: PrimesDictAbstract
sieve :: Dict{Prime,Prime}
baseprimes :: PrimesDictAbstract
lastbaseprime :: Prime
q :: Prime
PrimesDict() = new(Dict())
end
Base.eltype(::Type{PrimesDict}) = Prime
Base.IteratorSize(::Type{PrimesDict}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PD::PrimesDict, state::Prime = Prime(0) )
if state < 1
PD.baseprimes = PrimesDict()
PD.lastbaseprime = Prime(3)
PD.q = Prime(9)
return Prime(2), Prime(1)
end
dict = PD.sieve
while true
state += 2
if !haskey(dict, state)
state < PD.q && return state, state
p = PD.lastbaseprime # now, state = PD.q in all cases
adv = p + p # since state is at PD.q, advance to next
dict[state + adv] = adv # adds base prime composite stream
# following initializes secondary base stream the first time
p <= 3 && Base.iterate(PD.baseprimes)
p = Base.iterate(PD.baseprimes, p)[1] # next base prime
PD.lastbaseprime = p
PD.q = p * p
else # advance hit composite in dictionary...
adv = pop!(dict, state)
next = state + adv
while haskey(dict, next) next += adv end
dict[next] = adv # past other composite hits in dictionary
end
end
end</syntaxhighlight>

The above version can be used and tested with similar code as for the functional version, but is about ten times faster at about 400 CPU clock cycles per culling operation, meaning it has a practical range ten times larger, although it still has an O(n (log n) (log log n)) asymptotic performance complexity; for larger ranges such as sieving to a billion or more, this is still over a hundred times slower than the page segmented version using a page segmented sieving array.

=={{header|Klingphix}}==
<syntaxhighlight lang="klingphix">include ..\Utilitys.tlhy

%limit %i
1000 !limit
( 1 $limit ) sequence

( 2 $limit sqrt int ) [ !i $i get [ ( 2 $limit 1 - $i / int ) [ $i * false swap set ] for ] if ] for
( 1 $limit false ) remove
pstack

"Press ENTER to end " input</syntaxhighlight>

=={{header|Kotlin}}==
<syntaxhighlight lang="kotlin">import kotlin.math.sqrt

fun sieve(max: Int): List<Int> {
val xs = (2..max).toMutableList()
val limit = sqrt(max.toDouble()).toInt()
for (x in 2..limit) xs -= x * x..max step x
return xs
}
}


fun main(args: Array<String>) {
fun main(args: Array<String>) {
println(sieve(100))
println(sieve(100))
}</lang>
}</syntaxhighlight>


{{out}}
{{out}}
Line 5,624: Line 11,276:
The following code overcomes most of those problems: it culls only odd composites; it culls a bit-packed primitive array (also saving memory); and it uses tail-call recursive functions for the loops, which are compiled into simple loops. It also outputs the results as an enumeration, which isn't fast but does not consume any more memory than the culling array. In this way, the program is only limited in sieving range by the maximum size limit of the culling array, although as it grows larger than the CPU cache sizes, it loses greatly in speed; however, that doesn't matter so much if just enumerating the results.

<syntaxhighlight lang="kotlin">fun primesOdds(rng: Int): Iterable<Int> {
val topi = (rng - 3) shr 1
val lstw = topi shr 5
// ...
println()
println(primesOdds(1000000).count())
}</syntaxhighlight>
{{output}}
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
...</pre>
Ah, one might say, for such a trivial range one writes for conciseness and not for speed. Well, I say, one can still save memory and some time using odds-only and a bit-packed array, but write very clear and concise (but slower) code using nothing but higher order functions and function calling. The following code using such techniques can use the same "main" function for the same output but is about two times slower, mostly due to the extra time spent making (nested) function calls, including the function calls necessary for enumeration. Note that the effect of using "(l .. h).forEach { .. }" is the same as that of "for (i in l .. h) { .. }", as both iterate across the range; the second is just syntax sugar to make it look more imperative:

<syntaxhighlight lang="kotlin">fun primesOdds(rng: Int): Iterable<Int> {
val topi = (rng - 3) / 2 //convert to nearest index
val size = topi / 32 + 1 //word size to include index
// ...
val orng = (-1 .. topi).filter { it < 0 || is_p(it) }.map { i2p(it) }
return Iterable { -> orng.iterator() }
}</syntaxhighlight>

The trouble with the above version is that, at least for Kotlin version 1.0, the ".filter" and ".map" extension functions for Iterable<Int> create Java "ArrayList"s as their output (which are wrapped to return the Kotlin "List<Int>" interface). It thus takes considerably more memory than the first version: an ArrayList stores the resulting primes and, since the calculations are chained to ".map", a second ArrayList of up to the same size is required while the mapping is being done. The following version uses Sequences, which aren't backed by any permanent structure, but it is another small factor slower due to the nested function calls:

<syntaxhighlight lang="kotlin">fun primesOdds(rng: Int): Iterable<Int> {
val topi = (rng - 3) / 2 //convert to nearest index
val size = topi / 32 + 1 //word size to include index
// ...
val oseq = iseq(topi, -1).filter { it < 0 || is_p(it) }.map { i2p(it) }
return Iterable { -> oseq.iterator() }
}</syntaxhighlight>
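
As a rough illustration of the index arithmetic that the odds-only, bit-packed versions above rely on, here is a minimal Python sketch. It is not a translation of the Kotlin code; the name "primes_odds" and the use of a "bytearray" are choices made only for this sketch. Bit ''i'' stands for the odd number 2''i'' + 3, so the square of a base prime ''p'' sits at index (''p''² − 3) / 2 and its odd multiples are ''p'' bits apart:

<syntaxhighlight lang="python">def primes_odds(limit):
    """Yield the primes <= limit from an odds-only, bit-packed sieve."""
    if limit < 2:
        return
    yield 2
    if limit < 3:
        return
    top = (limit - 3) // 2                 # index of the largest odd candidate
    composites = bytearray(top // 8 + 1)   # one bit per odd number; 0 = prime
    i = 0
    while True:
        p = 2 * i + 3                      # index -> odd number
        start = (p * p - 3) // 2           # index of p * p, the first cull
        if start > top:
            break
        if not composites[i >> 3] & (1 << (i & 7)):
            # stepping the index by p steps the value by 2 * p
            for c in range(start, top + 1, p):
                composites[c >> 3] |= 1 << (c & 7)
        i += 1
    for i in range(top + 1):
        if not composites[i >> 3] & (1 << (i & 7)):
            yield 2 * i + 3


print(*primes_odds(100))
print(sum(1 for _ in primes_odds(1000000)))</syntaxhighlight>

Because the primes are produced lazily, the only storage that grows with the range is the bit array itself, which is the point made about enumeration above.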

===Unbounded Versions===

'''An incremental odds-only sieve outputting a sequence (iterator)'''

The following Sieve of Eratosthenes is not purely functional in that it uses a mutable HashMap to store the state of succeeding composite numbers to be skipped over, but it embodies the principles of an incremental implementation of the Sieve of Eratosthenes sieving odds-only and is faster than most incremental sieves due to using mutability. As with the fastest of this kind of sieve, it uses a delayed secondary primes feed as a source of base primes to generate the composite number progressions. The code is as follows:
<syntaxhighlight lang="kotlin">// state record used by generateSequence below (definition inferred from its usage):
// current candidate n, current base prime bp, and the square q of that base prime
class SieveState(var n: Int, var bp: Int, var q: Int)

fun primesHM(): Sequence<Int> = sequence {
    yield(2)
    fun oddprms(): Sequence<Int> = sequence {
        yield(3); yield(5) // need at least 2 for initialization
        val hm = HashMap<Int,Int>()
        hm.put(9, 6)
        val bps = oddprms().iterator(); bps.next(); bps.next() // skip past 5
        yieldAll(generateSequence(SieveState(7, 5, 25)) {
            ss ->
                var n = ss.n; var q = ss.q
                n += 2
                while ( n >= q || hm.containsKey(n)) {
                    if (n >= q) {
                        val inc = ss.bp shl 1
                        hm.put(n + inc, inc)
                        val bp = bps.next(); ss.bp = bp; q = bp * bp
                    }
                    else {
                        val inc = hm.remove(n)!!
                        var next = n + inc
                        while (hm.containsKey(next)) {
                            next += inc
                        }
                        hm.put(next, inc)
                    }
                    n += 2
                }
                ss.n = n; ss.q = q
                ss
        }.map { it.n })
    }
    yieldAll(oddprms())
}</syntaxhighlight>

At about 370 clock cycles per culling operation (about 3,800 cycles per prime) on my tablet class Intel CPU, this is not blazing fast but adequate for ranges of a few millions to a hundred million and thus fine for doing things like solving Euler problems. For instance, Euler Problem 10 of summing the primes to two million can be done with the following "one-liner":
<syntaxhighlight lang="kotlin">primesHM().takeWhile { it <= 2_000_000 }.map { it.toLong() }.sum()</syntaxhighlight>

which outputs the correct answer, shown below, in about 270 milliseconds on my Intel x5-Z8350 at 1.92 Gigahertz:
{{output}}
<pre>142913828922</pre>
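
The bookkeeping in the hash-map sieve above may be easier to follow in a shorter form. The following Python sketch is a loose paraphrase rather than a line-by-line translation, and the name "primes_incremental" is invented for it. A dictionary maps each upcoming odd composite to the step (twice its base prime) of the progression that produced it, and a second, delayed instance of the same generator feeds the base primes, so that a prime's multiples are only entered once the candidate reaches its square:

<syntaxhighlight lang="python">from itertools import takewhile


def primes_incremental():
    """Unbounded odds-only incremental sieve with a postponed base-primes feed."""
    yield 2
    yield 3
    yield 5
    composites = {9: 6}                  # upcoming composite -> step (2 * base prime)
    base_primes = primes_incremental()   # delayed secondary feed
    next(base_primes)                    # skip 2
    next(base_primes)                    # skip 3
    bp = next(base_primes)               # current base prime, 5
    q = bp * bp                          # its square, 25
    n = 7
    while True:
        if n >= q:
            # n is the square of the current base prime: start its progression
            step = 2 * bp
            nxt = n + step
            while nxt in composites:
                nxt += step
            composites[nxt] = step
            bp = next(base_primes)
            q = bp * bp
        elif n in composites:
            # n is a known composite: advance its progression to a free slot
            step = composites.pop(n)
            nxt = n + step
            while nxt in composites:
                nxt += step
            composites[nxt] = step
        else:
            yield n
        n += 2


# Euler Problem 10: sum of the primes up to two million
print(sum(takewhile(lambda p: p <= 2000000, primes_incremental())))</syntaxhighlight>

The dictionary holds only one entry per base prime up to the square root of the current candidate, which is why this style of sieve needs so little memory for a given range.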

'''A purely functional Incremental Sieve of Eratosthenes that outputs a sequence (iterator)'''

Following is a Kotlin implementation of the Tree Folding Incremental Sieve of Eratosthenes, adapted from the algorithm by Richard Bird. The algorithm is based on lazy lists, but the memoization of a lazy list (and its cost in execution time) is not actually required, so the following code uses a "roll-your-own" implementation of a Co-Inductive Stream (CIS). The final output is a Sequence for convenience in using it. The code is written as purely functional in that no mutation is used:

{{trans|Haskell}}

<syntaxhighlight lang="kotlin">data class CIS<T>(val head: T, val tailf: () -> CIS<T>) {
fun toSequence() = generateSequence(this) { it.tailf() } .map { it.head }
}

fun primes(): Sequence<Int> {
fun merge(a: CIS<Int>, b: CIS<Int>): CIS<Int> {
val ahd = a.head; val bhd = b.head
if (ahd > bhd) return CIS(bhd) { ->merge(a, b.tailf()) }
if (ahd < bhd) return CIS(ahd) { ->merge(a.tailf(), b) }
return CIS(ahd) { ->merge(a.tailf(), b.tailf()) }
}
fun bpmults(p: Int): CIS<Int> {
val inc = p + p
fun mlts(c: Int): CIS<Int> = CIS(c) { ->mlts(c + inc) }
return mlts(p * p)
}
fun allmults(ps: CIS<Int>): CIS<CIS<Int>> = CIS(bpmults(ps.head)) { allmults(ps.tailf()) }
fun pairs(css: CIS<CIS<Int>>): CIS<CIS<Int>> {
val xs = css.head; val yss = css.tailf(); val ys = yss.head
return CIS(merge(xs, ys)) { ->pairs(yss.tailf()) }
}
fun union(css: CIS<CIS<Int>>): CIS<Int> {
val xs = css.head
return CIS(xs.head) { -> merge(xs.tailf(), union(pairs(css.tailf()))) }
}
tailrec fun minus(n: Int, cs: CIS<Int>): CIS<Int> =
if (n >= cs.head) minus(n + 2, cs.tailf()) else CIS(n) { ->minus(n + 2, cs) }
fun oddprms(): CIS<Int> = CIS(3) { -> CIS(5) { ->minus(7, union(allmults(oddprms()))) } }
return CIS(2) { ->oddprms() } .toSequence()
}

fun main(args: Array<String>) {
val limit = 1000000
val strt = System.currentTimeMillis()
println(primes().takeWhile { it <= limit } .count())
val stop = System.currentTimeMillis()
println("Took ${stop - strt} milliseconds.")
}</syntaxhighlight>

The code is about five times slower than the more imperative hash table based version immediately above due to the costs of the extra levels of function calls in the functional style. The Haskell version from which this is derived is much faster due to the extensive optimizations it does to do with function/closure "lifting" as well as a Garbage Collector specifically tuned for functional code.

'''An unbounded Page Segmented Sieve of Eratosthenes that can output a sequence (iterator)'''

The very fastest implementations of a primes sieve are all based on bit-packed mutable arrays, which can be made unbounded by setting them up as a succession of sieved bit-packed arrays that have been culled of composites. The following code is an odds-only implementation that, again, uses a secondary feed of base primes that is only expanded as necessary (in this case memoized by a rudimentary lazy list structure to avoid recalculation for every base primes sweep per page segment):
<syntaxhighlight lang="kotlin">internal typealias Prime = Long
internal typealias BasePrime = Int
internal typealias BasePrimeArray = IntArray
internal typealias SieveBuffer = ByteArray

// contains a lazy list of a secondary base prime arrays feed
internal data class BasePrimeArrays(val arr: BasePrimeArray,
val rest: Lazy<BasePrimeArrays?>)
: Sequence<BasePrimeArray> {
override fun iterator() =
generateSequence(this) { it.rest.value }
.map { it.arr }.iterator()
}

// count the number of zero bits (primes) in a byte array,
fun countComposites(cmpsts: SieveBuffer): Int {
var cnt = 0
for (b in cmpsts) {
cnt += java.lang.Integer.bitCount(b.toInt().and(0xFF))
}
return cmpsts.size.shl(3) - cnt
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
fun composites2BasePrimeArray(low: Int, cmpsts: SieveBuffer)
: BasePrimeArray {
val lmti = cmpsts.size.shl(3)
val len = countComposites(cmpsts)
val rslt = BasePrimeArray(len)
var j = 0
for (i in 0 until lmti) {
if (cmpsts[i.shr(3)].toInt() and 1.shl(i and 7) == 0) {
rslt[j] = low + i + i; j++
}
}
return rslt
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
fun sieveComposites(low: Prime, buffer: SieveBuffer,
bpas: Sequence<BasePrimeArray>) {
val lowi = (low - 3L).shr(1)
val len = buffer.size
val lmti = len.shl(3)
val nxti = lowi + lmti.toLong()
for (bpa in bpas) {
for (bp in bpa) {
val bpi = (bp - 3).shr(1).toLong()
var strti = (bpi * (bpi + 3L)).shl(1) + 3L
if (strti >= nxti) return
val s0 =
if (strti >= lowi) (strti - lowi).toInt()
else {
val r = (lowi - strti) % bp.toLong()
if (r.toInt() == 0) 0 else bp - r.toInt()
}
if (bp <= len.shr(3) && s0 <= lmti - bp.shl(6)) {
val slmti = minOf(lmti, s0 + bp.shl(3))
tailrec fun mods(s: Int) {
if (s < slmti) {
val msk = 1.shl(s and 7)
tailrec fun cull(c: Int) {
if (c < len) {
buffer[c] = (buffer[c].toInt() or msk).toByte()
cull(c + bp)
}
}
cull(s.shr(3)); mods(s + bp)
}
}
mods(s0)
}
else {
tailrec fun cull(c: Int) {
if (c < lmti) {
val w = c.shr(3)
buffer[w] = (buffer[w].toInt() or 1.shl(c and 7)).toByte()
cull(c + bp)
}
}
cull(s0)
}
}
}
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
fun makeBasePrimeArrays(): Sequence<BasePrimeArray> {
var cmpsts = SieveBuffer(512)
fun nextelem(low: Int, bpas: Sequence<BasePrimeArray>): BasePrimeArrays {
// calculate size so that the bit span is at least as big as the
// maximum culling prime required, rounded up to minsizebits blocks...
val rqdsz = 2 + Math.sqrt((1 + low).toDouble()).toInt()
val sz = (rqdsz.shr(12) + 1).shl(9) // size in bytes
if (sz > cmpsts.size) cmpsts = SieveBuffer(sz)
cmpsts.fill(0)
sieveComposites(low.toLong(), cmpsts, bpas)
val arr = composites2BasePrimeArray(low, cmpsts)
val nxt = low + cmpsts.size.shl(4)
return BasePrimeArrays(arr, lazy { ->nextelem(nxt, bpas) })
}
// pre-seeding breaks recursive race,
// as only known base primes used for first page...
var preseedarr = intArrayOf( // pre-seed to 100, can sieve to 10,000...
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 )
return BasePrimeArrays(preseedarr, lazy {->nextelem(101, makeBasePrimeArrays())})
}

// a sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
fun makeSievePages(): Sequence<Pair<Prime,SieveBuffer>> {
val bpas = makeBasePrimeArrays() // secondary source of base prime arrays
fun init(): SieveBuffer {
val c = SieveBuffer(16384); sieveComposites(3L, c, bpas); return c }
return generateSequence(Pair(3L, init())) {
(low, cmpsts) ->
// calculate size so that the bit span is at least as big as the
// max culling prime required, rounded up to minsizebits blocks...
val rqdsz = 2 + Math.sqrt((1 + low).toDouble()).toInt()
val sz = (rqdsz.shr(17) + 1).shl(14) // size in bytes
val ncmpsts = if (sz > cmpsts.size) SieveBuffer(sz) else cmpsts
ncmpsts.fill(0)
val nlow = low + ncmpsts.size.toLong().shl(4)
sieveComposites(nlow, ncmpsts, bpas)
Pair(nlow, ncmpsts)
}
}

fun countPrimesTo(range: Prime): Prime {
if (range < 3) { if (range < 2) return 0 else return 1 }
var count = 1L
for ((low,cmpsts) in makeSievePages()) {
if (low + cmpsts.size.shl(4) > range) {
val lsti = (range - low).shr(1).toInt()
val lstw = lsti.shr(3)
val msk = -2.shl(lsti.and(7))
count += 32 + lstw.shl(3)
for (i in 0 until lstw)
count -= java.lang.Integer.bitCount(cmpsts[i].toInt().and(0xFF))
count -= java.lang.Integer.bitCount(cmpsts[lstw].toInt().or(msk))
break
} else {
count += countComposites(cmpsts)
}
}
return count
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
fun primesPaged(): Sequence<Prime> = sequence {
yield(2L)
for ((low,cmpsts) in makeSievePages()) {
val szbts = cmpsts.size.shl(3)
for (i in 0 until szbts) {
if (cmpsts[i.shr(3)].toInt() and 1.shl(i and 7) != 0) continue
yield(low + i.shl(1).toLong())
}
}
}</syntaxhighlight>

For this implementation, counting the primes to a million is trivial at about 15 milliseconds on the same CPU as above, or almost too short to count.

It shows its speed in solving Euler Problem 10 above about five times faster, at about 50 milliseconds, giving the same output.

It can sum the primes to 200 million or a hundred times the range in just over three seconds.

It finds the count of primes to a billion in about 16 seconds or just about 1000 times slower than to sum the primes to a range 1000 times less for an almost linear response to range as it should be.

However, much of the time (about two thirds) is spent iterating over the results rather than doing the actual work of sieving; for problems such as counting, finding the nth prime, or finding occurrences of maximum prime gaps, one should really use specialized functions that directly manipulate the output sieve arrays. Such a function is provided by the `countPrimesTo` function, which can count the primes to a billion (50847534) in about 5.65 seconds, or about 10.6 clock cycles per culling operation or about 210 cycles per prime.
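
To illustrate counting directly from the sieve pages rather than enumerating primes, here is a much-simplified Python sketch. It is not the algorithm above: pages have a fixed size, each odd number gets a whole byte instead of a bit, and the base primes come from one small up-front sieve rather than a lazily extended feed. The names "count_primes_segmented" and "page_size" are invented for this sketch:

<syntaxhighlight lang="python">from math import isqrt


def count_primes_segmented(limit, page_size=1 << 16):
    """Count the primes <= limit, sieving the odd numbers one page at a time."""
    if limit < 2:
        return 0
    # Base primes: the odd primes up to sqrt(limit), from a small plain sieve.
    root = isqrt(limit)
    small = bytearray([1]) * (root + 1)
    base_primes = []
    for p in range(3, root + 1, 2):
        if small[p]:
            base_primes.append(p)
            for c in range(p * p, root + 1, 2 * p):
                small[c] = 0
    count = 1                      # the prime 2
    low = 3                        # odd number represented by page[0]
    while low <= limit:
        size = min(page_size, (limit - low) // 2 + 1)
        page = bytearray(size)     # page[i] stands for low + 2 * i; 0 = prime
        high = low + 2 * (size - 1)
        for p in base_primes:
            if p * p > high:
                break
            first = low + (-low) % p       # first multiple of p at or above low
            if first % 2 == 0:
                first += p                 # make it an odd multiple
            start = (max(first, p * p) - low) // 2
            page[start::p] = b"\x01" * len(range(start, size, p))
        count += size - sum(page)          # count the zero entries directly
        low += 2 * size
    return count


print(count_primes_segmented(1000000))</syntaxhighlight>

Only one page is alive at a time, so memory use is set by the page size and the list of base primes rather than by the range.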

Kotlin isn't really fast even as compared to other virtual machine languages such as C# and F# on CLI but that is mostly due to limitations of the Java Virtual Machine (JVM) as to speed of generated Just In Time (JIT) compilation, handling of primitive number operations, enforced array bounds checks, etc. It will always be much slower than native code producing compilers and the (experimental) native compiler for Kotlin still isn't up to speed (pun intended), producing code that is many times slower than the code run on the JVM (December 2018).

=={{header|Lambdatalk}}==
<syntaxhighlight lang="scheme">

• 1) create an array of natural numbers, [0,1,2,3, ... ,n-1]
• 2) the 3rd number is 2, we set to dots all its composites by steps of 2,
• 3) the 4th number is 3, we set to dots all its composites by steps of 3,
• 4) the 6th number is 5, we set to dots all its composites by steps of 5,
• 5) the remaining numbers are primes and we clean all dots.

For instance:

1: 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
2: 0 1 2 3 . 5 . 7 . 9 . 1 . 3 . 5 . 7 . 9 . 1 . 3 . 5 . 7 . 9 .
3: 0 1 2 3 . 5 . 7 . . . 1 . 3 . . . 7 . 9 . . . 3 . 5 . . . 9 .
4: 0 1 2 3 . 5 . 7 . . . 1 . 3 . . . 7 . 9 . . . 3 . . . . . 9 .
| | | | | | | | | |
5: 0 0 0 0 1 1 1 1 2 2
2 3 5 7 1 3 7 9 3 9


1) recursive version {rsieve n}

{def rsieve

{def rsieve.mark
{lambda {:n :a :i :j}
{if {< :j :n}
then {rsieve.mark :n
{A.set! :j . :a}
:i
{+ :i :j}}
else :a}}}

{def rsieve.loop
{lambda {:n :a :i}
{if {< {* :i :i} :n}
then {rsieve.loop :n
{if {W.equal? {A.get :i :a} .}
then :a
else {rsieve.mark :n :a :i {* :i :i}}}
{+ :i 1}}
else :a}}}

{lambda {:n}
{S.replace \s by space in
{S.replace (\[|\]|\.|,) by space in
{A.disp
{A.slice 2 -1
{rsieve.loop :n
{A.new {S.serie 0 :n}} 2}}}}}}}
-> rsieve

{rsieve 1000}
-> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Note: this version doesn't avoid stackoverflow.

2) iterative version {isieve n}

{def isieve

{def isieve.mark
{lambda {:n :a :i}
{S.map {{lambda {:a :j} {A.set! :j . :a}
} :a}
{S.serie {* :i :i} :n :i} }}}

{lambda {:n}
{S.replace \s by space in
{S.replace (\[|\]|\.|,) by space in
{A.disp
{A.slice 2 -1
{S.last
{S.map {{lambda {:n :a :i} {if {W.equal? {A.get :i :a} .}
then
else {isieve.mark :n :a :i}}
} :n {A.new {S.serie 0 :n}}}
{S.serie 2 {sqrt :n} 1}}}}}}}}}
-> isieve

{isieve 1000}
-> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Notes:
- this version avoids stackoverflow.
- From 1 to 1000000 there are 78498 primes (computed in ~15000ms) and the last is 999983.

</syntaxhighlight>
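
The numbered steps at the top of the listing above translate almost directly into the following Python sketch. It is an illustration only, and "dotted_sieve" is a name made up here. It overwrites composites with a dot, as in the table, and then drops the dots along with 0 and 1:

<syntaxhighlight lang="python">def dotted_sieve(n):
    """Sieve the numbers 0 .. n-1, overwriting each composite with a dot."""
    cells = list(range(n))                 # step 1: [0, 1, 2, ..., n-1]
    i = 2
    while i * i < n:
        if cells[i] != ".":                # i is still a number, hence prime
            for j in range(i * i, n, i):   # dot its composites in steps of i
                cells[j] = "."
        i += 1
    # final step: skip 0 and 1, then clean away all the dots
    return [c for c in cells[2:] if c != "."]


print(dotted_sieve(31))</syntaxhighlight>

With n = 31 this reproduces the bottom row of the table, the primes 2 through 29.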

=={{header|langur}}==
{{trans|D}}
<syntaxhighlight lang="langur">val .sieve = fn(.limit) {
if .limit < 2: return []

var .composite = .limit * [false]
.composite[1] = true

for .n in 2 .. trunc(.limit ^/ 2) + 1 {
if not .composite[.n] {
for .k = .n^2 ; .k < .limit ; .k += .n {
.composite[.k] = true
}
}
}

filter fn(.n) { not .composite[.n] }, series .limit-1
}

writeln .sieve(100)
</syntaxhighlight>

{{out}}
<pre>[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]</pre>

=={{header|LFE}}==
<syntaxhighlight lang="lisp">
(defmodule eratosthenes
(export (sieve 1)))

(defun sieve (limit)
(sieve limit (lists:seq 2 limit)))

(defun sieve
((limit (= l (cons p _))) (when (> (* p p) limit))
l)
((limit (cons p ns))
(cons p (sieve limit (remove-multiples p (* p p) ns)))))

(defun remove-multiples (p q l)
(lists:reverse (remove-multiples p q l '())))

(defun remove-multiples
((_ _ '() s) s)
((p q (cons q ns) s)
(remove-multiples p q ns s))
((p q (= r (cons a _)) s) (when (> a q))
(remove-multiples p (+ q p) r s))
((p q (cons n ns) s)
(remove-multiples p q ns (cons n s))))
</syntaxhighlight>
{{Out}}
<pre>
lfe> (slurp "sieve.lfe")
#(ok eratosthenes)
lfe> (sieve 100)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
</pre>


=={{header|Liberty BASIC}}==
<syntaxhighlight lang="lb"> 'Notice that arrays are globally visible to functions.
'The sieve() function uses the flags() array.
'This is a Sieve benchmark adapted from BYTE 1985
'...
end if
next i
end function</syntaxhighlight>

=={{header|Limbo}}==
<syntaxhighlight lang="go">implement Sieve;

include "sys.m";
sys: Sys;
print: import sys;
include "draw.m";
draw: Draw;

Sieve : module
{
init : fn(ctxt : ref Draw->Context, args : list of string);
};

init (ctxt: ref Draw->Context, args: list of string)
{
sys = load Sys Sys->PATH;

limit := 201;
sieve : array of int;
sieve = array [201] of {* => 1};
(sieve[0], sieve[1]) = (0, 0);

for (n := 2; n < limit; n++) {
if (sieve[n]) {
for (i := n*n; i < limit; i += n) {
sieve[i] = 0;
}
}
}

for (n = 1; n < limit; n++) {
if (sieve[n]) {
print ("%4d", n);
} else {
print(" .");
};
if ((n%20) == 0)
print("\n\n");
}
}
</syntaxhighlight>

=={{header|Lingo}}==
<syntaxhighlight lang="lingo">-- parent script "sieve"
property _sieve

----------------------------------------
-- @constructor
----------------------------------------
on new (me)
me._sieve = []
return me
end

----------------------------------------
-- Returns list of primes <= n
----------------------------------------
on getPrimes (me, limit)
if me._sieve.count<limit then me._primeSieve(limit)
primes = []
repeat with i = 2 to limit
if me._sieve[i] then primes.add(i)
end repeat
return primes
end

----------------------------------------
-- Sieve of Eratosthenes
----------------------------------------
on _primeSieve (me, limit)
me._sieve = [0]
repeat with i = 2 to limit
me._sieve[i] = 1
end repeat
c = sqrt(limit)
repeat with i = 2 to c
if (me._sieve[i]=0) then next repeat
j = i*i -- start with square
repeat while (j<=limit)
me._sieve[j] = 0
j = j + i
end repeat
end repeat
end</syntaxhighlight>

<syntaxhighlight lang="lingo">sieve = script("sieve").new()
put sieve.getPrimes(100)</syntaxhighlight>

{{out}}
<pre>
-- [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
</pre>


=={{header|LiveCode}}==
<syntaxhighlight lang="livecode">function sieveE int
set itemdel to comma
local sieve
-- ...
sort items of sieve ascending numeric
return sieve
end sieveE</syntaxhighlight>
Example<syntaxhighlight lang="livecode">put sieveE(121)
-- 2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113</syntaxhighlight>

<syntaxhighlight lang="livecode"># Sieve of Eratosthenes
# calculates prime numbers up to a given number
on mouseUp
put field "maximum" into limit
put the ticks into startTicks # start a timer
repeat with i = 2 to limit step 1 # load array with zeros
put 0 into prime_array[i]
end repeat
repeat with i = 2 to trunc(sqrt(limit)) # truncate square root
if prime_array[i] = 0 then
repeat with k = (i * i) to limit step i
delete variable prime_array[k] # remove non-primes
end repeat
end if
end repeat
put the ticks - startTicks into elapsedTicks # stop timer
put elapsedTicks / 60 into field "elapsed" # calculate time
put the keys of prime_array into prime_numbers # array to variable
put the number of lines of keys of prime_array into field "count"
sort lines of prime_numbers ascending numeric
put prime_numbers into field "primeList" # show prime numbers
end mouseUp

</syntaxhighlight>

[http://www.melellington.com/sieve/livecode-sieve-output.png LiveCode output example]

===Comments===
LiveCode builds its user interface by graphical drag and drop with the mouse.
No code was written to create the button and fields;
the user enters a number into the 'maximum' field
and then clicks a button to run the code.
It runs identically whether in the LiveCode IDE or
when compiled to an executable on Mac, Windows, and Linux.

The example was run on an Intel i5 CPU @ 3.29 GHz;
all primes up to 1,000,000 were found in 3 seconds.
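
The mouseUp handler above removes composites from the array instead of flagging them, so the surviving keys are the primes. A minimal Python sketch of that idea follows, using a dictionary in place of LiveCode's associative array; the function name is invented here, and the membership test stands in for the "prime_array[i] = 0" check:

<syntaxhighlight lang="python">from math import isqrt


def sieve_by_deleting_keys(limit):
    """Return the primes <= limit by deleting composite keys from a dict."""
    candidates = dict.fromkeys(range(2, limit + 1), 0)   # load the keys 2 .. limit
    for i in range(2, isqrt(limit) + 1):
        if i in candidates:                      # i was never deleted, so it is prime
            for k in range(i * i, limit + 1, i):
                candidates.pop(k, None)          # remove non-primes (maybe already gone)
    return sorted(candidates)                    # the remaining keys are the primes


print(sieve_by_deleting_keys(121))</syntaxhighlight>

As in the LiveCode version, the keys have to be sorted at the end if the container does not promise numeric order.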


=={{header|Logo}}==
...
due to the use of mod (modulo = division) in the filter function.
A coinduction based solution just for fun:
<syntaxhighlight lang="logtalk">
:- object(sieve).

% ...

:- end_object.
</syntaxhighlight>
Example query:
<syntaxhighlight lang="logtalk">
?- sieve::primes(20, P).
P = [2, 3|_S1], % where
_S1 = [5, 7, 11, 13, 17, 19, 2, 3|_S1] .
</syntaxhighlight>

=={{header|LOLCODE}}==
<syntaxhighlight lang="lolcode">HAI 1.2
CAN HAS STDIO?

HOW IZ I Eratosumthin YR Max
I HAS A Siv ITZ A BUKKIT
Siv HAS A SRS 1 ITZ 0
I HAS A Index ITZ 2
IM IN YR Inishul UPPIN YR Dummy WILE DIFFRINT Index AN SUM OF Max AN 1
Siv HAS A SRS Index ITZ 1
Index R SUM OF Index AN 1
IM OUTTA YR Inishul

I HAS A Prime ITZ 2
IM IN YR MainLoop UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN PRODUKT OF Prime AN Prime
BOTH SAEM Siv'Z SRS Prime AN 1
O RLY?
YA RLY
Index R SUM OF Prime AN Prime
IM IN YR MarkMultipulz UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN Index
Siv'Z SRS Index R 0
Index R SUM OF Index AN Prime
IM OUTTA YR MarkMultipulz
OIC
Prime R SUM OF Prime AN 1
IM OUTTA YR MainLoop

Index R 1
I HAS A First ITZ WIN
IM IN YR PrintPrimes UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN Index
BOTH SAEM Siv'Z SRS Index AN 1
O RLY?
YA RLY
First
O RLY?
YA RLY
First R FAIL
NO WAI
VISIBLE ", "!
OIC
VISIBLE Index!
OIC
Index R SUM OF Index AN 1
IM OUTTA YR PrintPrimes
VISIBLE ""
IF U SAY SO

I IZ Eratosumthin YR 100 MKAY

KTHXBYE</syntaxhighlight>

{{Out}}
<pre>2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97</pre>


=={{header|Lua}}==
<syntaxhighlight lang="lua">function erato(n)
if n < 2 then return {} end
local t = {0} -- clears '1'
-- ...
for i = 2, n do if t[i] ~= 0 then table.insert(primes, i) end end
return primes
end</syntaxhighlight>

The following changes the code to '''odds-only''' using the same large array-based algorithm:
<syntaxhighlight lang="lua">function erato2(n)
if n < 2 then return {} end
if n < 3 then return {2} end
-- ...
for i = 0, lmt do if t[i] ~= 0 then table.insert(primes, i + i + 3) end end
return primes
end</syntaxhighlight>

The following code implements '''an odds-only "infinite" generator style using a table as a hash table''', including postponing adding base primes to the table:

<syntaxhighlight lang="lua">function newEratoInf()
local _cand = 0; local _lstbp = 3; local _lstsqr = 9
local _composites = {}; local _bps = nil
-- ...
while gen.next() <= 10000000 do count = count + 1 end -- sieves to 10 million
print(count)
</syntaxhighlight>

which outputs "664579" in about three seconds. As this code uses much less memory for a given range than the previous ones and retains efficiency better with range, it is likely more appropriate for larger sieve ranges.

=={{header|Lucid}}==
{{incorrect|Lucid|Not a true Sieve of Eratosthenes but rather a Trial Division Sieve}}
...
i fby sieve ( i whenever i mod first i ne 0 ) ;
end
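
The distinction drawn in the note above can be made concrete with a short Python sketch (the function names are invented here). The first function filters the survivors with a remainder test, which is the pattern the Lucid code follows; the second only ever touches multiples of known primes, reached by stepping, which is what makes a sieve a Sieve of Eratosthenes. Both return the same primes, but the work done differs:

<syntaxhighlight lang="python">def trial_division_filter(limit):
    """Repeatedly keep the numbers not divisible by the head of the list."""
    candidates = list(range(2, limit + 1))
    primes = []
    while candidates:
        p = candidates[0]
        primes.append(p)
        # every remaining candidate is tested with a division (mod)
        candidates = [n for n in candidates if n % p != 0]
    return primes


def eratosthenes(limit):
    """Cross off the multiples of each prime by stepping; no division is used."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = 0
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]


print(trial_division_filter(100) == eratosthenes(100))</syntaxhighlight>

The filter version tests every surviving number against every smaller prime until it is eliminated, whereas the true sieve visits each composite only once per prime factor up to its square root.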

=={{header|M2000 Interpreter}}==
<syntaxhighlight lang="m2000 interpreter">
Module EratosthenesSieve (x) {
\\ Sieve of Eratosthenes
Profiler
If x>2000000 Then Exit
Dim i(x+1): k=2: k2=sqrt(x)
While k<=k2{if i(k) else for m=k*k to x step k{i(m)=1}
k++}
Print str$(timecount/1000,"####0.##")+" s"
Input "Press enter skip print or a non zero to get results:", a%
if a% then For i=2to x{If i(i)=0 Then Print i,
}
Print:Print "Done"
}
EratosthenesSieve 1000
</syntaxhighlight>


=={{header|M4}}==
<syntaxhighlight lang="m4">define(`lim',100)dnl
define(`for',
`ifelse($#,0,
...
`j for(`k',eval(j*j),lim,j,
`define(a[k],1)')')')
</syntaxhighlight>

Output:
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
</pre>

=={{header|MAD}}==

<syntaxhighlight lang="mad"> NORMAL MODE IS INTEGER
R TO GENERATE MORE PRIMES, CHANGE BOTH THESE NUMBERS
BOOLEAN PRIME
DIMENSION PRIME(10000)
MAXVAL = 10000
PRINT FORMAT BEGIN,MAXVAL
R ASSUME ALL ARE PRIMES AT BEGINNING
THROUGH SET, FOR I=2, 1, I.G.MAXVAL
SET PRIME(I) = 1B

R REMOVE ALL PROVEN COMPOSITES
SQMAX = SQRT.(MAXVAL)
THROUGH NEXT, FOR P=2, 1, P.G.SQMAX
WHENEVER PRIME(P)
THROUGH MARK, FOR I=P*P, P, I.G.MAXVAL
MARK PRIME(I) = 0B
NEXT END OF CONDITIONAL

R PRINT PRIMES
THROUGH SHOW, FOR P=2, 1, P.G.MAXVAL
SHOW WHENEVER PRIME(P), PRINT FORMAT NUMFMT, P
VECTOR VALUES BEGIN = $13HPRIMES UP TO ,I9*$
VECTOR VALUES NUMFMT = $I9*$
END OF PROGRAM</syntaxhighlight>

{{out}}

<pre>PRIMES UP TO 10000
2
3
5
7
11
13
17
...
9923
9929
9931
9941
9949
9967
9973</pre>

=={{header|Maple}}==
<syntaxhighlight lang="maple">Eratosthenes := proc(n::posint)
local numbers_to_check, i, k;
numbers_to_check := [seq(2 .. n)];
for i from 2 to floor(sqrt(n)) do
for k from i by i while k <= n do
if evalb(k <> i) then
numbers_to_check[k - 1] := 0;
end if;
end do;
end do;
numbers_to_check := remove(x -> evalb(x = 0), numbers_to_check);
return numbers_to_check;
end proc:
</syntaxhighlight>
{{out}}
<pre>
Eratosthenes(100);
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
</pre>

=={{header|Mathematica}}/{{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">Eratosthenes[n_] := Module[{numbers = Range[n]},
Do[If[numbers[[i]] != 0, Do[numbers[[i j]] = 0, {j, 2, n/i}]], {i,
2, Sqrt[n]}];
Select[numbers, # > 1 &]]

Eratosthenes[100]</syntaxhighlight>
===Slightly Optimized Version===
The version below needs fewer operations per composite-number cull, because the inner loop starts at the square of i and steps by i directly instead of multiplying; it runs in about two thirds of the execution time:
<syntaxhighlight lang="mathematica">Eratosthenes[n_] := Module[{numbers = Range[n]},
  Do[If[numbers[[i]] != 0, Do[numbers[[j]] = 0, {j,i i,n,i}]],{i,2,Sqrt[n]}];
  Select[numbers, # > 1 &]]

Eratosthenes[100]</syntaxhighlight>
===Sieving Odds-Only Version===
The version below sieves only the odd numbers, which reduces the execution time by a further factor of more than two:
<syntaxhighlight lang="mathematica">Eratosthenes2[n_] := Module[{numbers = Range[3, n, 2], limit = (n - 1)/2},
  Do[c = numbers[[i]]; If[c != 0,
    Do[numbers[[j]] = 0, {j,(c c - 1)/2,limit,c}]], {i,1,(Sqrt[n] - 1)/2}];
  Prepend[Select[numbers, # > 1 &], 2]]

Eratosthenes2[100]</syntaxhighlight>
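
The index arithmetic behind an odds-only sieve can be sketched in Python. This is an illustration only, not a translation of the Mathematica code above, and the name <code>odds_only_sieve</code> is made up for the example. It assumes that index i stands for the odd number 2i + 3, so the square of p = 2i + 3 sits at index (p² − 3)/2 and successive odd multiples of p are p indices apart.

```python
def odds_only_sieve(n):
    """Return all primes <= n, keeping flags for odd numbers only."""
    if n < 2:
        return []
    size = (n - 1) // 2                # odd candidates 3, 5, ... up to n
    is_prime = bytearray([1]) * size   # index i represents the number 2*i + 3
    i = 0
    while True:
        p = 2 * i + 3
        start = (p * p - 3) // 2       # index of p*p
        if start >= size:
            break                      # p*p is beyond n: sieving is finished
        if is_prime[i]:
            # stepping p indices moves 2*p in value, i.e. to the next odd multiple
            for j in range(start, size, p):
                is_prime[j] = 0
        i += 1
    return [2] + [2 * k + 3 for k in range(size) if is_prime[k]]


print(odds_only_sieve(100))
```

Halving the number of flags halves the memory, and every cull of an even multiple is skipped entirely.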

=={{header|MATLAB}} / {{header|Octave}}==
===Somewhat optimized true Sieve of Eratosthenes===
<syntaxhighlight lang="matlab">function P = erato(x)        % Sieve of Eratosthenes: returns all primes between 2 and x
    P = [0 2:x];             % Create vector with all ints between 2 and x where
                             % position 1 is hard-coded as 0 since 1 is not a prime.

    for n = 2:sqrt(x)        % Every composite up to x has a prime factor between 2 and sqrt(x).
        if P(n)              % If the current value is not 0 (i.e. a prime),
            P(n*n:n:x) = 0;  % then replace its multiples from n*n onward with 0.
        end
    end                      % At this point P is a vector with only primes and zeroes.

    P = P(P ~= 0);           % Remove all zeroes from P, leaving only the primes.
end</syntaxhighlight>
The optimization lies in fewer steps in the for loop, use of MATLAB's built-in array operations and no modulo calculation.

'''Limitation:''' your machine has to be able to allocate enough memory for an array of length x.

A more efficient Sieve avoids creating a large double precision vector P, instead using a logical array (which consumes 1/8 the memory of a double array of the same size) and only converting to double those values corresponding to primes.

<syntaxhighlight lang="matlab">function P = sieveOfEratosthenes(x)
    ISP = [false true(1, x-1)]; % 1 is not prime, but we start off assuming all numbers between 2 and x are
    for n = 2:sqrt(x)
        if ISP(n)
            ISP(n*n:n:x) = false; % Multiples of n from n*n onward are not primes
        end
    end
    % The ISP vector that we have calculated is essentially the output of the ISPRIME function called on 1:x
    P = find(ISP); % Convert the ISPRIME output to the values of the primes by finding the locations
                   % of the TRUE values in ISP.
end</syntaxhighlight>

You can compare the output of this function against the PRIMES function included in MATLAB, which performs a somewhat more memory-efficient Sieve (by not storing even numbers, at the expense of a more complicated indexing expression inside the IF statement.)
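
The flag-array idea is not specific to MATLAB. As a rough Python sketch (the function name <code>sieve_flags</code> is invented here, and the one-byte-per-flag <code>bytearray</code> is only an assumed stand-in for a logical array), the sieve keeps one small flag per number and converts flag positions to values only at the end, much as <code>find</code> does above:

```python
def sieve_flags(x):
    """Primes up to x, keeping one flag byte per number instead of the numbers themselves."""
    if x < 2:
        return []
    composite = bytearray(x + 1)       # all zero: nothing is marked composite yet
    n = 2
    while n * n <= x:
        if not composite[n]:
            # one slice assignment marks every multiple of n from n*n onward
            count = len(range(n * n, x + 1, n))
            composite[n * n :: n] = b"\x01" * count
        n += 1
    # the positions of the unmarked flags are the primes
    return [i for i in range(2, x + 1) if not composite[i]]


print(sieve_flags(30))
```

The slice assignment plays the role of the vectorized <code>ISP(n*n:n:x) = false</code> statement, so no per-element modulo test is needed.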


=={{header|Maxima}}==
<syntaxhighlight lang="maxima">sieve(n):=block(
[a:makelist(true,n),i:1,j],
a[1]:false,
for j from i*i step i while j<=n do a[j]:false
)
)$</syntaxhighlight>

=={{header|MAXScript}}==
fn eratosthenes n =
eratosthenes 100

=={{header|Mercury}}==
<syntaxhighlight lang="mercury">:- module sieve.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module bool, array, int.

main(!IO) :-
sieve(50, Sieve),
dump_primes(2, size(Sieve), Sieve, !IO).

:- pred dump_primes(int, int, array(bool), io, io).
:- mode dump_primes(in, in, array_di, di, uo) is det.
dump_primes(N, Limit, !.A, !IO) :-
( if N < Limit then
unsafe_lookup(!.A, N, Prime),
(
Prime = yes,
io.write_line(N, !IO)
;
Prime = no
),
dump_primes(N + 1, Limit, !.A, !IO)
else
true
).

:- pred sieve(int, array(bool)).
:- mode sieve(in, array_uo) is det.
sieve(N, !:A) :-
array.init(N, yes, !:A),
sieve(2, N, !A).

:- pred sieve(int, int, array(bool), array(bool)).
:- mode sieve(in, in, array_di, array_uo) is det.
sieve(N, Limit, !A) :-
( if N < Limit then
unsafe_lookup(!.A, N, Prime),
(
Prime = yes,
sift(N + N, N, Limit, !A),
sieve(N + 1, Limit, !A)
;
Prime = no,
sieve(N + 1, Limit, !A)
)
else
true
).

:- pred sift(int, int, int, array(bool), array(bool)).
:- mode sift(in, in, in, array_di, array_uo) is det.
sift(I, N, Limit, !A) :-
( if I < Limit then
unsafe_set(I, no, !A),
sift(I + N, N, Limit, !A)
else
true
).</syntaxhighlight>

=={{header|Microsoft Small Basic}}==
{{trans|GW-BASIC}}
<syntaxhighlight lang="microsoftsmallbasic">
TextWindow.Write("Enter number to search to: ")
limit = TextWindow.ReadNumber()
For n = 2 To limit
flags[n] = 0
EndFor
For n = 2 To math.SquareRoot(limit)
If flags[n] = 0 Then
For K = n * n To limit Step n
flags[K] = 1
EndFor
EndIf
EndFor
' Display the primes
If limit >= 2 Then
TextWindow.Write(2)
For n = 3 To limit
If flags[n] = 0 Then
TextWindow.Write(", " + n)
EndIf
EndFor
TextWindow.WriteLine("")
EndIf
</syntaxhighlight>

=={{header|Modula-2}}==
<syntaxhighlight lang="modula2">MODULE Erato;
FROM InOut IMPORT WriteCard, WriteLn;
FROM MathLib IMPORT sqrt;

CONST Max = 100;

VAR prime: ARRAY [2..Max] OF BOOLEAN;
i: CARDINAL;

PROCEDURE Sieve;
VAR i, j, sqmax: CARDINAL;
BEGIN
sqmax := TRUNC(sqrt(FLOAT(Max)));

FOR i := 2 TO Max DO prime[i] := TRUE; END;
FOR i := 2 TO sqmax DO
IF prime[i] THEN
j := i * 2;
(* alas, the BY clause in a FOR loop must be a constant *)
WHILE j <= Max DO
prime[j] := FALSE;
j := j + i;
END;
END;
END;
END Sieve;

BEGIN
Sieve;
FOR i := 2 TO Max DO
IF prime[i] THEN
WriteCard(i,5);
WriteLn;
END;
END;
END Erato.</syntaxhighlight>
{{out}}
<pre style='height:50ex;'> 2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97</pre>


=={{header|Modula-3}}==
===Regular version===
This version runs slowly because of the way I/O is implemented in the CM3 compiler. Setting <code>ListPrimes = FALSE</code> achieves speed comparable to C on sufficiently high values of <code>LastNum</code> (e.g., 10^6).
<syntaxhighlight lang="modula3">MODULE Eratosthenes EXPORTS Main;

IMPORT IO;

FROM Math IMPORT sqrt;

CONST
  LastNum = 1000;
  ListPrimes = TRUE;

VAR
  a: ARRAY[2..LastNum] OF BOOLEAN;

VAR
  n := LastNum - 2 + 1;

BEGIN

  (* set up *)
  FOR i := FIRST(a) TO LAST(a) DO
    a[i] := TRUE;
  END;

  (* declare a variable local to a block *)
  VAR b := FLOOR(sqrt(FLOAT(LastNum, LONGREAL)));

  (* the block must follow immediately *)
  BEGIN

    (* print primes and mark out composites up to sqrt(LastNum) *)
    FOR i := FIRST(a) TO b DO
      IF a[i] THEN
        IF ListPrimes THEN IO.PutInt(i); IO.Put(" "); END;
        FOR j := i*i TO LAST(a) BY i DO
          IF a[j] THEN
            a[j] := FALSE;
            DEC(n);
          END;
        END;
      END;
    END;

    (* print remaining primes *)
    IF ListPrimes THEN
      FOR i := b + 1 TO LAST(a) DO
        IF a[i] THEN
          IO.PutInt(i); IO.Put(" ");
        END;
      END;
    END;

  END;

  (* report *)
  IO.Put("There are "); IO.PutInt(n);
  IO.Put(" primes from 2 to "); IO.PutInt(LastNum);
  IO.PutChar('\n');

END Eratosthenes.</syntaxhighlight>

===Advanced version===
This version uses more "advanced" types.
<syntaxhighlight lang="modula3">(* From the CM3 examples folder (comments removed). *)

MODULE Sieve EXPORTS Main;
END;
IO.Put("\n");
END Sieve.</syntaxhighlight>

=={{header|Mojo}}==

Tested with Mojo version 0.7:

<syntaxhighlight lang="mojo">from memory import memset_zero
from memory.unsafe import (DTypePointer)
from time import (now)

alias cLIMIT: Int = 1_000_000_000

struct SoEBasic(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.bool] # because DynamicVector has deep copy bug in mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = limit - 1
        self.sz = limit - 1
        self.ndx = 0
        self.cmpsts = DTypePointer[DType.bool].alloc(limit - 1)
        memset_zero(self.cmpsts, limit - 1)
        for i in range(limit - 1):
            let s = i * (i + 4) + 2
            if s >= limit - 1: break
            if self.cmpsts[i]: continue
            let bp = i + 2
            for c in range(s, limit - 1, bp):
                self.cmpsts[c] = True
        for i in range(limit - 1):
            if self.cmpsts[i]: self.sz -= 1
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        self.cmpsts = DTypePointer[DType.bool].alloc(self.len)
        for i in range(self.len):
            self.cmpsts[i] = existing.cmpsts[i]
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    fn __next__(inout self: Self) -> Int:
        if self.ndx >= self.len: return 0
        while (self.ndx < self.len) and (self.cmpsts[self.ndx]):
            self.ndx += 1
        let rslt = self.ndx + 2; self.sz -= 1; self.ndx += 1
        return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEBasic(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = len(SoEBasic(1_000_000))
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = len(SoEBasic(cLIMIT))
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")</syntaxhighlight>

{{out}}

<pre>The primes to 100 are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1,000,000 in 1.2642770000000001 milliseconds.
Found 50847534 primes up to 1000000000 in 6034.328751 milliseconds.</pre>
as run on an AMD 7840HS CPU at 5.1 GHz.

Note that because one huge array is used, large sieving ranges slow down disproportionately: the range that is 1000 times larger takes almost 5000 times as long, so the time per number sieved is about four to five times worse.

This solution uses an iterator struct, which seems to be the Mojo-preferred way to do this. Normally a DynamicVector would have been used for the culling array, but in this version of Mojo a DynamicVector is not properly deep copied when it is copied to a new location, so the raw pointer type is used instead.

===Odds-Only with Optimizations===

This version makes three significant improvements to the above code:
# It is trivial to skip storing and culling representations of the even composite numbers (two being the only even prime), which halves the storage space and reduces the culling time to about 40 percent.
# The composite representations are culled over a bit-packed byte array (which reduces the storage requirement by another factor of eight), and the culling pattern repeats every eight culling operations; this can be encapsulated by an extreme loop unrolling technique with compiler-generated constants, as done here.
# A further extreme optimization is dense culling for small base prime values whose culling span is less than one register in size: the loaded register is repeatedly culled for different base prime strides before being written out (with this optimization done by the compiler), again using compiler-generated modification constants. Modern compilers usually optimize this further with autovectorization and the SIMD registers available on the architecture, reducing these culling operations to an average of a tiny fraction of a CPU clock cycle per cull.
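
The first two ideas can be illustrated with a short Python sketch. It is not the Mojo implementation and makes no attempt at loop unrolling or dense culling; the names <code>bitpacked_odds_count</code> and <code>cull_pattern</code> are invented for this example. Bit i of the packed array is assumed to represent the odd number 2i + 3, and <code>cull_pattern</code> lists the byte offsets and bit masks of eight successive culls, which is the pattern that repeats and that the unrolled code bakes in as constants.

```python
def bitpacked_odds_count(limit):
    """Count the primes <= limit using one bit per odd number."""
    if limit < 2:
        return 0
    nbits = (limit - 1) // 2                # bit i represents 2*i + 3
    cmpsts = bytearray((nbits + 7) // 8)    # a set bit means "composite"
    i = 0
    while True:
        bp = 2 * i + 3                      # base prime candidate
        s = (bp * bp - 3) // 2              # bit index of bp*bp
        if s >= nbits:
            break
        if not (cmpsts[i >> 3] >> (i & 7)) & 1:
            for c in range(s, nbits, bp):   # the plain "bit twiddling" cull
                cmpsts[c >> 3] |= 1 << (c & 7)
        i += 1
    composites = sum((cmpsts[c >> 3] >> (c & 7)) & 1 for c in range(nbits))
    return 1 + nbits - composites           # plus one for the prime two


def cull_pattern(s, bp):
    """Byte offsets (relative to the first cull) and bit masks of eight successive culls."""
    return [(((s + k * bp) >> 3) - (s >> 3), 1 << ((s + k * bp) & 7))
            for k in range(8)]


print(bitpacked_odds_count(100))
# eight culls advance by exactly bp bytes, so the same offsets and masks recur
print(cull_pattern(12, 5) == cull_pattern(12 + 8 * 5, 5))
```

Because the stride is odd, the eight masks of one pattern are all different, so each pattern touches every bit position exactly once.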

Mojo version 0.7 was tested:

<syntaxhighlight lang="mojo">from memory import (memset_zero, memcpy)
from memory.unsafe import (DTypePointer)
from math.bit import ctpop
from time import (now)

alias cLIMIT: Int = 1_000_000_000

alias cBufferSize: Int = 262144 # bytes
alias cBufferBits: Int = cBufferSize * 8

alias UnrollFunc = fn(DTypePointer[DType.uint8], Int, Int, Int) -> None

@always_inline
fn extreme[OFST: Int, BP: Int](pcmps: DTypePointer[DType.uint8], bufsz: Int, s: Int, bp: Int):
    var cp = pcmps + (s >> 3)
    let r1: Int = ((s + bp) >> 3) - (s >> 3)
    let r2: Int = ((s + 2 * bp) >> 3) - (s >> 3)
    let r3: Int = ((s + 3 * bp) >> 3) - (s >> 3)
    let r4: Int = ((s + 4 * bp) >> 3) - (s >> 3)
    let r5: Int = ((s + 5 * bp) >> 3) - (s >> 3)
    let r6: Int = ((s + 6 * bp) >> 3) - (s >> 3)
    let r7: Int = ((s + 7 * bp) >> 3) - (s >> 3)
    let plmt: DTypePointer[DType.uint8] = pcmps + bufsz - r7
    while cp < plmt:
        cp.store(cp.load() | (1 << OFST))
        (cp + r1).store((cp + r1).load() | (1 << ((OFST + BP) & 7)))
        (cp + r2).store((cp + r2).load() | (1 << ((OFST + 2 * BP) & 7)))
        (cp + r3).store((cp + r3).load() | (1 << ((OFST + 3 * BP) & 7)))
        (cp + r4).store((cp + r4).load() | (1 << ((OFST + 4 * BP) & 7)))
        (cp + r5).store((cp + r5).load() | (1 << ((OFST + 5 * BP) & 7)))
        (cp + r6).store((cp + r6).load() | (1 << ((OFST + 6 * BP) & 7)))
        (cp + r7).store((cp + r7).load() | (1 << ((OFST + 7 * BP) & 7)))
        cp += bp
    let eplmt: DTypePointer[DType.uint8] = plmt + r7
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << OFST))
    cp += r1
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + BP) & 7)))
    cp += r2 - r1
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 2 * BP) & 7)))
    cp += r3 - r2
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 3 * BP) & 7)))
    cp += r4 - r3
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 4 * BP) & 7)))
    cp += r5 - r4
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 5 * BP) & 7)))
    cp += r6 - r5
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 6 * BP) & 7)))
    cp += r7 - r6
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 7 * BP) & 7)))

fn mkExtrm[CNT: Int](pntr: Pointer[UnrollFunc]):
    @parameter
    if CNT >= 32:
        return
    alias OFST = CNT >> 2
    alias BP = ((CNT & 3) << 1) + 1
    pntr.offset(CNT).store(extreme[OFST, BP])
    mkExtrm[CNT + 1](pntr)

@always_inline
fn mkExtremeFuncs() -> Pointer[UnrollFunc]:
    let jmptbl: Pointer[UnrollFunc] = Pointer[UnrollFunc].alloc(32)
    mkExtrm[0](jmptbl)
    return jmptbl
let extremeFuncs = mkExtremeFuncs()

alias DenseFunc = fn(DTypePointer[DType.uint64], Int, Int) -> DTypePointer[DType.uint64]

fn mkDenseCull[N: Int, BP: Int](cp: DTypePointer[DType.uint64]):
    @parameter
    if N >= 64:
        return
    alias MUL = N * BP
    var cop = cp.offset(MUL >> 6)
    cop.store(cop.load() | (1 << (MUL & 63)))
    mkDenseCull[N + 1, BP](cp)

@always_inline
fn denseCullFunc[BP: Int](pcmps: DTypePointer[DType.uint64], bufsz: Int, s: Int) -> DTypePointer[DType.uint64]:
    var cp: DTypePointer[DType.uint64] = pcmps + (s >> 6)
    let plmt = pcmps + (bufsz >> 3) - BP
    while cp < plmt:
        mkDenseCull[0, BP](cp)
        cp += BP
    return cp

fn mkDenseFunc[CNT: Int](pntr: Pointer[DenseFunc]):
    @parameter
    if CNT >= 64:
        return
    alias BP = (CNT << 1) + 3
    pntr.offset(CNT).store(denseCullFunc[BP])
    mkDenseFunc[CNT + 1](pntr)

@always_inline
fn mkDenseFuncs() -> Pointer[DenseFunc]:
    let jmptbl : Pointer[DenseFunc] = Pointer[DenseFunc].alloc(64)
    mkDenseFunc[0](jmptbl)
    return jmptbl

let denseFuncs : Pointer[DenseFunc] = mkDenseFuncs()

@always_inline
fn cullPass(cmpsts: DTypePointer[DType.uint8], bytesz: Int, s: Int, bp: Int):
    if bp <= 129: # dense culling
        var sm = s
        while (sm >> 3) < bytesz and (sm & 63) != 0:
            cmpsts[sm >> 3] |= (1 << (sm & 7))
            sm += bp
        let bcp = denseFuncs[(bp - 3) >> 1](cmpsts.bitcast[DType.uint64](), bytesz, sm)
        var ns = 0
        var ncp = bcp
        let cmpstslmtp = (cmpsts + bytesz).bitcast[DType.uint64]()
        while ncp < cmpstslmtp:
            ncp[0] |= (1 << (ns & 63))
            ns += bp
            ncp = bcp + (ns >> 6)
    else: # extreme loop unrolling culling
        extremeFuncs[((s & 7) << 2) + ((bp & 7) >> 1)](cmpsts, bytesz, s, bp)
    # for c in range(s, self.len, bp): # slow bit twiddling way
    #     self.cmpsts[c >> 3] |= (1 << (c & 7))

fn countPagePrimes(ptr: DTypePointer[DType.uint8], bitsz: Int) -> Int:
    let wordsz: Int = (bitsz + 63) // 64 # round up to nearest 64 bit boundary
    var rslt: Int = wordsz * 64
    let bigcmps = ptr.bitcast[DType.uint64]()
    for i in range(wordsz - 1):
        rslt -= ctpop(bigcmps[i]).to_int()
    rslt -= ctpop(bigcmps[wordsz - 1] | (-2 << ((bitsz - 1) & 63))).to_int()
    return rslt

struct SoEOdds(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = 0 if limit < 2 else (limit - 3) // 2 + 1
        self.sz = 0 if limit < 2 else self.len + 1 # for the unprocessed only even prime of two
        self.ndx = -1
        let bytesz = 0 if limit < 2 else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memset_zero(self.cmpsts, bytesz)
        for i in range(self.len):
            let s = (i + i) * (i + 3) + 3
            if s >= self.len: break
            if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
            let bp = i + i + 3
            cullPass(self.cmpsts, bytesz, s, bp)
        self.sz = countPagePrimes(self.cmpsts, self.len) + 1 # add one for only even prime of two
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        let bytesz = (self.len + 7) // 8
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int:
        if self.ndx < 0:
            self.ndx = 0; self.sz -= 1; return 2
        while (self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
            self.ndx += 1
        let rslt = (self.ndx << 1) + 3; self.sz -= 1; self.ndx += 1; return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEOdds(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = len(SoEOdds(1_000_000))
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = len(SoEOdds(cLIMIT))
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")</syntaxhighlight>

{{out}}

<pre>The primes to 100 are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1,000,000 in 0.085067000000000004 milliseconds.
Found 50847534 primes up to 1000000000 in 1204.866606 milliseconds.</pre>

This was run on the same computer as the above example. While this is much faster than that version, it still slows down badly as the sieving range gets large: the range that is 1000 times as large takes more than ten times longer than simple scaling would predict. This is due to the "one huge sieving buffer" algorithm: the buffer grows with the range (and will eventually limit the range that can be sieved) until it exceeds the size of the CPU caches, which greatly increases average memory access times.

===Page-Segmented Odds-Only with Optimizations===

While the above version performs reasonably well for small sieving ranges of up to a few tens of millions, whose buffers fit within the CPU caches, it gets much slower with larger ranges, and its huge RAM consumption limits the maximum range over which it can be used. This version solves these problems by breaking the huge sieving array into "pages" that each fit within the CPU cache and processing the pages sequentially until the target range is reached. This technique also greatly reduces the memory requirement to only that needed to store the base prime value representations up to the square root of the range limit (about O(√n / log n) storage) plus a fixed-size page buffer. In this case, the storage for the base primes has been reduced by a constant factor by storing them as single-byte deltas from the previous value. This works for ranges up to the 64-bit number range, where the biggest gap is two times 192; since only odd base primes are stored, the gap values are all halved and so fit in a single byte.
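
The page-segmentation scheme can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the Mojo code: it uses one byte per flag instead of bit packing, the page size is arbitrary, and the names <code>base_prime_deltas</code> and <code>count_primes_paged</code> are invented for the example. Flag i is assumed to represent the odd number 2i + 3, and the odd base primes are stored as half-gaps from the previous value, starting from 1.

```python
from math import isqrt


def base_prime_deltas(limit):
    """Odd primes up to isqrt(limit), stored as half-gaps from the previous value."""
    root = isqrt(limit)
    flags = bytearray([1]) * (root + 1)
    deltas = bytearray()
    prev = 1
    for p in range(3, root + 1, 2):
        if flags[p]:
            deltas.append((p - prev) // 2)      # gaps are even, so store half
            prev = p
            for m in range(p * p, root + 1, 2 * p):
                flags[m] = 0
    return deltas


def count_primes_paged(limit, page_flags=1 << 16):
    """Count primes <= limit, sieving one fixed-size page of odd numbers at a time."""
    if limit < 2:
        return 0
    deltas = base_prime_deltas(limit)
    nflags = (limit - 1) // 2                   # flag i represents 2*i + 3
    count = 1                                   # the prime two
    for lwi in range(0, nflags, page_flags):    # lwi: index of the page's first flag
        size = min(page_flags, nflags - lwi)
        page = bytearray(size)                  # a fresh, small buffer per page
        bp = 1
        for d in deltas:
            bp += 2 * d                         # rebuild the base prime from its delta
            s = (bp * bp - 3) // 2              # global index of bp*bp
            if s >= lwi + size:
                break                           # larger base primes start past this page
            if s >= lwi:
                s -= lwi
            else:
                s = (lwi - s) % bp              # distance past the last multiple
                if s:
                    s = bp - s                  # offset of the first multiple in the page
            for c in range(s, size, bp):
                page[c] = 1
        count += size - sum(page)
    return count


print(count_primes_paged(1_000_000))
```

Only the delta table and a single page buffer are alive at any time, which is why the memory use no longer grows with the sieving range itself.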

Currently, Mojo has problems with some functions in the standard libraries: for example, the integer square root function is not accurate, nor does it work for the required integer types, so a custom integer square root function is supplied. As well, current Mojo supports recursion in hardly any useful cases (other than compile-time global function recursion), so the <code>SoEOdds</code> structure from the previous answer had to be kept to generate the base prime representation table (otherwise the table would have had to be generated from scratch within the new <code>SoEOddsPaged</code> structure). Finally, it didn't seem worth using the <code>Sized</code> trait for the new structure, as this would sometimes require processing the pages twice: once to obtain the size and once more if iteration across the prime values is required.
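
A custom integer square root of this kind can be written with shifts, additions and comparisons only. The Python sketch below shows the classic digit-by-digit (base four) method; it is an illustration of the general technique, not a line-for-line copy of the Mojo function, and Python's unbounded integers sidestep the 64-bit overflow handling that a fixed-width version needs.

```python
def intsqrt(n):
    """Floor of the square root of a non-negative integer, without floating point."""
    if n < 0:
        raise ValueError("intsqrt is only defined for non-negative integers")
    q = 1
    while q <= n:          # find the smallest power of four greater than n
        q <<= 2
    x, r = n, 0            # x: remainder, r: partial root scaled by q
    while q > 1:
        q >>= 2
        t = r + q          # trial subtrahend for this base-four digit
        r >>= 1
        if x >= t:
            x -= t
            r += q
    return r


print(intsqrt(1_000_000_000))
```

Because no floating-point conversion is involved, the result is exact for every input, which matters when the value decides how many base primes are generated.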

Tested with Mojo version 0.7:

<syntaxhighlight lang="mojo">from memory import (memset_zero, memcpy)
from memory.unsafe import (DTypePointer)
from math.bit import ctpop
from time import (now)

alias cLIMIT: Int = 1_000_000_000

alias cBufferSize: Int = 262144 # bytes
alias cBufferBits: Int = cBufferSize * 8

fn intsqrt(n: UInt64) -> UInt64:
    if n < 4:
        if n < 1: return 0 else: return 1
    var x: UInt64 = n; var qn: UInt64 = 0; var r: UInt64 = 0
    while qn < 64 and (1 << qn) <= n:
        qn += 2
    var q: UInt64 = 1 << qn
    while q > 1:
        if qn >= 64:
            q = 1 << (qn - 2); qn = 0
        else:
            q >>= 2
        let t: UInt64 = r + q
        r >>= 1
        if x >= t:
            x -= t; r += q
    return r

alias UnrollFunc = fn(DTypePointer[DType.uint8], Int, Int, Int) -> None

@always_inline
fn extreme[OFST: Int, BP: Int](pcmps: DTypePointer[DType.uint8], bufsz: Int, s: Int, bp: Int):
    var cp = pcmps + (s >> 3)
    let r1: Int = ((s + bp) >> 3) - (s >> 3)
    let r2: Int = ((s + 2 * bp) >> 3) - (s >> 3)
    let r3: Int = ((s + 3 * bp) >> 3) - (s >> 3)
    let r4: Int = ((s + 4 * bp) >> 3) - (s >> 3)
    let r5: Int = ((s + 5 * bp) >> 3) - (s >> 3)
    let r6: Int = ((s + 6 * bp) >> 3) - (s >> 3)
    let r7: Int = ((s + 7 * bp) >> 3) - (s >> 3)
    let plmt: DTypePointer[DType.uint8] = pcmps + bufsz - r7
    while cp < plmt:
        cp.store(cp.load() | (1 << OFST))
        (cp + r1).store((cp + r1).load() | (1 << ((OFST + BP) & 7)))
        (cp + r2).store((cp + r2).load() | (1 << ((OFST + 2 * BP) & 7)))
        (cp + r3).store((cp + r3).load() | (1 << ((OFST + 3 * BP) & 7)))
        (cp + r4).store((cp + r4).load() | (1 << ((OFST + 4 * BP) & 7)))
        (cp + r5).store((cp + r5).load() | (1 << ((OFST + 5 * BP) & 7)))
        (cp + r6).store((cp + r6).load() | (1 << ((OFST + 6 * BP) & 7)))
        (cp + r7).store((cp + r7).load() | (1 << ((OFST + 7 * BP) & 7)))
        cp += bp
    let eplmt: DTypePointer[DType.uint8] = plmt + r7
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << OFST))
    cp += r1
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + BP) & 7)))
    cp += r2 - r1
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 2 * BP) & 7)))
    cp += r3 - r2
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 3 * BP) & 7)))
    cp += r4 - r3
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 4 * BP) & 7)))
    cp += r5 - r4
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 5 * BP) & 7)))
    cp += r6 - r5
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 6 * BP) & 7)))
    cp += r7 - r6
    if eplmt == cp or eplmt < cp: return
    cp.store(cp.load() | (1 << ((OFST + 7 * BP) & 7)))

fn mkExtrm[CNT: Int](pntr: Pointer[UnrollFunc]):
    @parameter
    if CNT >= 32:
        return
    alias OFST = CNT >> 2
    alias BP = ((CNT & 3) << 1) + 1
    pntr.offset(CNT).store(extreme[OFST, BP])
    mkExtrm[CNT + 1](pntr)

@always_inline
fn mkExtremeFuncs() -> Pointer[UnrollFunc]:
    let jmptbl: Pointer[UnrollFunc] = Pointer[UnrollFunc].alloc(32)
    mkExtrm[0](jmptbl)
    return jmptbl
let extremeFuncs = mkExtremeFuncs()

alias DenseFunc = fn(DTypePointer[DType.uint64], Int, Int) -> DTypePointer[DType.uint64]

fn mkDenseCull[N: Int, BP: Int](cp: DTypePointer[DType.uint64]):
    @parameter
    if N >= 64:
        return
    alias MUL = N * BP
    var cop = cp.offset(MUL >> 6)
    cop.store(cop.load() | (1 << (MUL & 63)))
    mkDenseCull[N + 1, BP](cp)

@always_inline
fn denseCullFunc[BP: Int](pcmps: DTypePointer[DType.uint64], bufsz: Int, s: Int) -> DTypePointer[DType.uint64]:
    var cp: DTypePointer[DType.uint64] = pcmps + (s >> 6)
    let plmt = pcmps + (bufsz >> 3) - BP
    while cp < plmt:
        mkDenseCull[0, BP](cp)
        cp += BP
    return cp

fn mkDenseFunc[CNT: Int](pntr: Pointer[DenseFunc]):
    @parameter
    if CNT >= 64:
        return
    alias BP = (CNT << 1) + 3
    pntr.offset(CNT).store(denseCullFunc[BP])
    mkDenseFunc[CNT + 1](pntr)

@always_inline
fn mkDenseFuncs() -> Pointer[DenseFunc]:
    let jmptbl : Pointer[DenseFunc] = Pointer[DenseFunc].alloc(64)
    mkDenseFunc[0](jmptbl)
    return jmptbl

let denseFuncs : Pointer[DenseFunc] = mkDenseFuncs()

@always_inline
fn cullPass(cmpsts: DTypePointer[DType.uint8], bytesz: Int, s: Int, bp: Int):
    if bp <= 129: # dense culling
        var sm = s
        while (sm >> 3) < bytesz and (sm & 63) != 0:
            cmpsts[sm >> 3] |= (1 << (sm & 7))
            sm += bp
        let bcp = denseFuncs[(bp - 3) >> 1](cmpsts.bitcast[DType.uint64](), bytesz, sm)
        var ns = 0
        var ncp = bcp
        let cmpstslmtp = (cmpsts + bytesz).bitcast[DType.uint64]()
        while ncp < cmpstslmtp:
            ncp[0] |= (1 << (ns & 63))
            ns += bp
            ncp = bcp + (ns >> 6)
    else: # extreme loop unrolling culling
        extremeFuncs[((s & 7) << 2) + ((bp & 7) >> 1)](cmpsts, bytesz, s, bp)
    # for c in range(s, self.len, bp): # slow bit twiddling way
    #     self.cmpsts[c >> 3] |= (1 << (c & 7))

fn cullPage(lwi: Int, lmt: Int, cmpsts: DTypePointer[DType.uint8], bsprmrps: DTypePointer[DType.uint8]):
    var bp = 1; var ndx = 0
    while True:
        bp += bsprmrps[ndx].to_int() << 1
        let i = (bp - 3) >> 1
        var s = (i + i) * (i + 3) + 3
        if s >= lmt: break
        if s >= lwi: s -= lwi
        else:
            s = (lwi - s) % bp
            if s != 0: s = bp - s
        cullPass(cmpsts, cBufferSize, s, bp)
        ndx += 1

fn countPagePrimes(ptr: DTypePointer[DType.uint8], bitsz: Int) -> Int:
    let wordsz: Int = (bitsz + 63) // 64 # round up to nearest 64 bit boundary
    var rslt: Int = wordsz * 64
    let bigcmps = ptr.bitcast[DType.uint64]()
    for i in range(wordsz - 1):
        rslt -= ctpop(bigcmps[i]).to_int()
    rslt -= ctpop(bigcmps[wordsz - 1] | (-2 << ((bitsz - 1) & 63))).to_int()
    return rslt

struct SoEOdds(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = 0 if limit < 2 else (limit - 3) // 2 + 1
        self.sz = 0 if limit < 2 else self.len + 1 # for the unprocessed only even prime of two
        self.ndx = -1
        let bytesz = 0 if limit < 2 else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memset_zero(self.cmpsts, bytesz)
        for i in range(self.len):
            let s = (i + i) * (i + 3) + 3
            if s >= self.len: break
            if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
            let bp = i + i + 3
            cullPass(self.cmpsts, bytesz, s, bp)
        self.sz = countPagePrimes(self.cmpsts, self.len) + 1 # add one for only even prime of two
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        let bytesz = (self.len + 7) // 8
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int:
        if self.ndx < 0:
            self.ndx = 0; self.sz -= 1; return 2
        while (self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
            self.ndx += 1
        let rslt = (self.ndx << 1) + 3; self.sz -= 1; self.ndx += 1; return rslt

struct SoEOddsPaged:
var len: Int
var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
var sz: Int # 0 means finished; otherwise contains number of odd base primes
var ndx: Int
var lwi: Int
var bsprmrps: DTypePointer[DType.uint8] # contains deltas between odd base primes starting from zero
fn __init__(inout self, limit: UInt64):
self.len = 0 if limit < 2 else ((limit - 3) // 2 + 1).to_int()
self.sz = 0 if limit < 2 else 1 # means iterate until this is set to zero
self.ndx = -1 # for unprocessed only even prime of two
self.lwi = 0
if self.len < cBufferBits:
let bytesz = ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
else:
self.cmpsts = DTypePointer[DType.uint8].alloc(cBufferSize)
let bsprmitr = SoEOdds(intsqrt(limit).to_int())
self.sz = len(bsprmitr)
self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
var ndx = -1; var oldbp = 1
for bsprm in bsprmitr:
if ndx < 0: ndx += 1; continue # skip over the 2 prime
self.bsprmrps[ndx] = (bsprm - oldbp) >> 1
oldbp = bsprm; ndx += 1
self.bsprmrps[ndx] = 255 # one extra value to go beyond the necessary cull space
fn __del__(owned self):
self.cmpsts.free(); self.bsprmrps.free()
fn __copyinit__(inout self, existing: Self):
self.len = existing.len
self.sz = existing.sz
let bytesz = cBufferSize if self.len >= cBufferBits
else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
memcpy(self.cmpsts, existing.cmpsts, bytesz)
self.ndx = existing.ndx
self.lwi = existing.lwi
self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
memcpy(self.bsprmrps, existing.bsprmrps, self.sz)
fn __moveinit__(inout self, owned existing: Self):
self.len = existing.len
self.cmpsts = existing.cmpsts
self.sz = existing.sz
self.ndx = existing.ndx
self.lwi = existing.lwi
self.bsprmrps = existing.bsprmrps
fn countPrimes(self) -> Int:
if self.len <= cBufferBits: return len(SoEOdds(2 * self.len + 1))
var cnt = 1; var lwi = 0
let cmpsts = DTypePointer[DType.uint8].alloc(cBufferSize)
memset_zero(cmpsts, cBufferSize)
cullPage(0, cBufferBits, cmpsts, self.bsprmrps)
while lwi + cBufferBits <= self.len:
cnt += countPagePrimes(cmpsts, cBufferBits)
lwi += cBufferBits
memset_zero(cmpsts, cBufferSize)
let lmt = lwi + cBufferBits if lwi + cBufferBits <= self.len else self.len
cullPage(lwi, lmt, cmpsts, self.bsprmrps)
cnt += countPagePrimes(cmpsts, self.len - lwi)
return cnt
fn __len__(self: Self) -> Int: return self.sz
fn __iter__(self: Self) -> Self: return self
@always_inline
fn __next__(inout self: Self) -> Int: # don't count number of primes by iterating - slooow
if self.ndx < 0:
self.ndx = 0; self.lwi = 0
if self.len < 2: self.sz = 0
elif self.len <= cBufferBits:
let bytesz = ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
memset_zero(self.cmpsts, bytesz)
for i in range(self.len):
let s = (i + i) * (i + 3) + 3
if s >= self.len: break
if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
let bp = i + i + 3
cullPass(self.cmpsts, bytesz, s, bp)
else:
memset_zero(self.cmpsts, cBufferSize)
cullPage(0, cBufferBits, self.cmpsts, self.bsprmrps)
return 2
let rslt = ((self.lwi + self.ndx) << 1) + 3; self.ndx += 1
if self.lwi + cBufferBits >= self.len:
while (self.lwi + self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
self.ndx += 1
else:
while (self.ndx < cBufferBits) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
self.ndx += 1
while (self.ndx >= cBufferBits) and (self.lwi + cBufferBits <= self.len):
self.ndx = 0; self.lwi += cBufferBits; memset_zero(self.cmpsts, cBufferSize)
let lmt = self.lwi + cBufferBits if self.lwi + cBufferBits <= self.len else self.len
cullPage(self.lwi, lmt, self.cmpsts, self.bsprmrps)
let buflmt = cBufferBits if self.lwi + cBufferBits <= self.len else self.len - self.lwi
while (self.ndx < buflmt) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
self.ndx += 1
if self.lwi + self.ndx >= self.len: self.sz = 0
return rslt

fn main():
print("The primes to 100 are:")
for prm in SoEOddsPaged(100): print_no_newline(prm, " ")
print()
let strt0 = now()
let answr0 = SoEOddsPaged(1_000_000).countPrimes()
let elpsd0 = (now() - strt0) / 1000000
print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
let strt1 = now()
let answr1 = SoEOddsPaged(cLIMIT).countPrimes()
let elpsd1 = (now() - strt1) / 1000000
print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")</syntaxhighlight>

{{out}}

<pre>The primes to 100 are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1,000,000 in 0.084122000000000002 milliseconds.
Found 50847534 primes up to 1000000000 in 139.509275 milliseconds.</pre>

This was tested on the same computer as the previous Mojo versions. Note that the time now scales quite well with range, since the huge RAM access time bottlenecks are gone. This version is only about 2.25 times slower than Kim Walisch's primesieve program written in C++, and the mostly constant factor difference will be made up if one adds wheel factorization to the same level as he uses (basic wheel factorization ratio of 48/105 plus some other more minor optimizations). This version can count the number of primes to 1e11 in about 21.85 seconds on this machine. It will work reasonably efficiently up to a range of about 1e14 before other optimization techniques such as "bucket sieving" should be used.

For counting the number of primes to a billion (1e9), this version has reduced the time by about a factor of 40 from the original version and over eight times from the odds-only version above. Adding wheel factorization will make it almost two and a half times faster yet for a gain in speed of about a hundred times over the original version.
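
The page-by-page culling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a translation of the Mojo code: it uses one byte per odd number instead of bit packing, and the names <code>base_primes</code>, <code>count_primes_paged</code> and <code>page_size</code> are hypothetical. Index <code>i</code> stands for the odd number <code>2*i + 3</code>, so the square of the base prime <code>bp = 2*i + 3</code> sits at index <code>(i + i) * (i + 3) + 3</code>, the same start index that <code>cullPage</code> computes.

```python
from math import isqrt

def base_primes(limit):
    """Odd primes up to limit, found by a simple (non-paged) sieve."""
    flags = bytearray([1]) * (limit + 1)
    primes = []
    for n in range(3, limit + 1, 2):
        if flags[n]:
            primes.append(n)
            for c in range(n * n, limit + 1, 2 * n):
                flags[c] = 0
    return primes

def count_primes_paged(limit, page_size=1 << 14):
    """Count the primes <= limit, sieving one page of odd numbers at a time."""
    if limit < 2:
        return 0
    total = (limit - 3) // 2 + 1          # number of odd candidates 3, 5, ... (0 for limit 2)
    bps = base_primes(isqrt(limit))       # base primes up to the square root of limit
    count = 1                             # the only even prime, 2
    for lwi in range(0, total, page_size):    # lwi = low index of this page
        size = min(page_size, total - lwi)
        page = bytearray(size)            # 0 = still a candidate, 1 = composite
        for bp in bps:
            i = (bp - 3) >> 1
            s = (i + i) * (i + 3) + 3     # index of bp * bp
            if s >= lwi + size:
                break                     # this and all larger base primes start past the page
            if s >= lwi:
                s -= lwi                  # the square lies inside this page
            else:
                r = (lwi - s) % bp        # otherwise find the first multiple inside the page
                s = 0 if r == 0 else bp - r
            for c in range(s, size, bp):
                page[c] = 1
        count += size - sum(page)
    return count
```

Only one page buffer is live at a time, so memory use depends on <code>page_size</code> and on the number of base primes rather than on the whole range.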


=={{header|MUMPS}}==
<syntaxhighlight lang="mumps">ERATO1(HI)
;performs the Sieve of Eratosthenes up to the number passed in.
;This version sets an array containing the primes
Line 6,096: Line 13,282:
FOR I=2:1:HI S:'$DATA(P(I)) ERATO1(I)=I
KILL I,J,P
QUIT</syntaxhighlight>
Example:
<pre>USER>SET MAX=100,C=0 DO ERATO1^ROSETTA(MAX)
Line 6,103: Line 13,289:
PRIMES BETWEEN 1 AND 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,79, 83, 89, 97,</pre>

=={{header|Neko}}==
<syntaxhighlight lang="actionscript">/* The Computer Language Shootout
   http://shootout.alioth.debian.org/

   contributed by Nicolas Cannasse
*/
fmt = function(i) {
    var s = $string(i);
    while( $ssize(s) < 8 )
        s = " "+s;
    return s;
}
nsieve = function(m) {
    var a = $amake(m);
    var count = 0;
    var i = 2;
    while( i < m ) {
        if $not(a[i]) {
            count += 1;
            var j = (i << 1);
            while( j < m ) {
                if( $not(a[j]) ) a[j] = true;
                j += i;
            }
        }
        i += 1;
    }
    $print("Primes up to ",fmt(m)," ",fmt(count),"\n");
}

var n = $int($loader.args[0]);
if( n == null ) n = 2;
var i = 0;
while( i < 3 ) {
    nsieve(10000 << (n - i));
    i += 1;
}</syntaxhighlight>

{{out}}
<pre>prompt$ nekoc nsieve.neko
prompt$ time -p neko nsieve.n
Primes up to 40000 4203
Primes up to 20000 2262
Primes up to 10000 1229
real 0.02
user 0.01
sys 0.00</pre>



=={{header|NetRexx}}==
===Version 1 (slow)===
<syntaxhighlight lang="rexx">/* NetRexx */

options replace format comments java crossref savelog symbols binary
Line 6,162: Line 13,397:
method isFalse public constant binary returns boolean
return \isTrue
</syntaxhighlight>
;Output
<pre style="overflow:scroll">
Line 6,170: Line 13,405:
</pre>
===Version 2 (significantly, i.e. 10 times faster)===
<syntaxhighlight lang="netrexx">/* NetRexx ************************************************************
* Essential improvements:Use boolean instead of Rexx for sv
* and remove methods isTrue and isFalse
Line 6,236: Line 13,471:
end p_

return primes</syntaxhighlight>

=={{header|newLISP}}==
{{incorrect|newLISP|This version uses rem (division) testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}
This version is maybe a little different because it no longer stores the primes after they've been generated and sent to the main output. Lisp has very convenient list editing, so we don't really need the Boolean flag arrays you'd tend to find in the Algol-like languages. We can just throw away the multiples of each prime from an initial list of integers. The implementation is easier if we delete every multiple, including the prime number itself: the list always contains only the numbers that haven't been processed yet, starting with the next prime, and the program is finished when the list becomes empty.

Note that the lambda expression in the following script does not involve a closure; newLISP has dynamic scope, so it matters that the same variable names will not be reused for some other purpose (at runtime) before the anonymous function is called.

<syntaxhighlight lang="newlisp">(set 'upper-bound 1000)

; The initial sieve is a list of all the numbers starting at 2.
(set 'sieve (sequence 2 upper-bound))

; Keep working until the list is empty.
(while sieve

  ; The first number in the list is always prime
  (set 'new-prime (sieve 0))
  (println new-prime)

  ; Filter the list leaving only the non-multiples of each number.
  (set 'sieve
    (filter
      (lambda (each-number)
        (not (zero? (% each-number new-prime))))
      sieve)))

(exit)</syntaxhighlight>

{{output}}
<pre>2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97
101
103
107
109
113
127
131
137
139
149
151
157
163
167
173
179
181
191
193
197
199
211
223
227
229
233
239
241
251
257
263
269
271
277
281
283
293
307
311
313
317
331
337
347
349
353
359
367
373
379
383
389
397
401
409
419
421
431
433
439
443
449
457
461
463
467
479
487
491
499
503
509
521
523
541
547
557
563
569
571
577
587
593
599
601
607
613
617
619
631
641
643
647
653
659
661
673
677
683
691
701
709
719
727
733
739
743
751
757
761
769
773
787
797
809
811
821
823
827
829
839
853
857
859
863
877
881
883
887
907
911
919
929
937
941
947
953
967
971
977
983
991
997
</pre>
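
For contrast with the list-filtering approach above (flagged as trial division), the following Python sketch shows the distinction. Both function names are hypothetical and the code is only illustrative: <code>filter_primes</code> mirrors the structure of the newLISP script, testing every surviving number with a remainder operation, while <code>sieve_primes</code> crosses off multiples by stepping through them with additions only, which is what makes it a sieve of Eratosthenes.

```python
def filter_primes(upper_bound):
    """Trial-division style: repeatedly filter the candidate list by remainder."""
    candidates = list(range(2, upper_bound + 1))
    primes = []
    while candidates:
        p = candidates[0]                 # the first remaining number is prime
        primes.append(p)
        # every surviving candidate is tested with a division (remainder)
        candidates = [n for n in candidates if n % p != 0]
    return primes

def sieve_primes(upper_bound):
    """Sieve of Eratosthenes: cross off multiples by counting in steps of p."""
    if upper_bound < 2:
        return []
    is_prime = [True] * (upper_bound + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= upper_bound:
        if is_prime[p]:
            # no division: multiples are reached by repeated addition of p
            for multiple in range(p * p, upper_bound + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]
```

Both functions return the same list; they differ in how much work is done per candidate, since the filter version touches every remaining number for every prime found.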

=={{header|Nial}}==
{{incorrect|Nial|It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}
Line 6,243: Line 13,677:
|primes 10
=2 3 5 7

=={{header|Nim}}==
<syntaxhighlight lang="nim">from math import sqrt
iterator primesUpto(limit: int): int =
  let sqrtLimit = int(sqrt(float64(limit)))
  var composites = newSeq[bool](limit + 1)
  for n in 2 .. sqrtLimit: # cull to square root of limit
    if not composites[n]: # if prime -> cull its composites
      for c in countup(n * n, limit, n): # start at ``n`` squared
        composites[c] = true
  for n in 2 .. limit: # separate iteration over results
    if not composites[n]:
      yield n

echo("Primes are:")
for x in primesUpto(100):
  stdout.write(x, " ")
echo()

var count = 0
for p in primesUpto(1000000):
  count += 1
echo "There are ", count, " primes up to 1000000."</syntaxhighlight>
{{out}}
<pre>Primes are:
Line 6,272: Line 13,706:
There are 78498 primes up to 1000000.</pre>

'''Alternate odds-only bit-packed version'''

The above version wastes quite a lot of memory by using a sequence of boolean values to sieve the composite numbers and by sieving all numbers even though two is the only even prime. The code below uses a bit-packed sequence to save a factor of eight in memory and also sieves only odd numbers for another memory saving by a factor of two; it is also over two and a half times faster due to the reduced number of culling operations and better use of the CPU cache, as a little cache goes a lot further. This better use of cache is more than enough to make up for the extra bit-packing shift operations:

<syntaxhighlight lang="nim">iterator isoe_upto(top: uint): uint =
  let topndx = int((top - 3) div 2)
  let sqrtndx = (int(sqrt float64(top)) - 3) div 2
Line 6,288: Line 13,722:
  for i in 0 .. topndx:
    if (cmpsts[i shr 5] and (1u32 shl (i and 31))) == 0:
      yield uint(i + i + 3)</syntaxhighlight>

The above code can be used with the same output functions as in the first code, just replacing the name of the iterator "primesUpto" with this iterator's name "isoe_upto" in two places. The output will be identical.
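
The index arithmetic behind the odds-only bit-packed layout can be sketched in Python. This is an illustrative sketch, not a translation of the Nim iterator: the name <code>odds_only_primes</code> is hypothetical, and a <code>bytearray</code> with 8 bits per element stands in for the 32-bit words. Bit <code>i</code> represents the odd number <code>2*i + 3</code>, and a set bit marks a composite.

```python
from math import isqrt

def odds_only_primes(top):
    """Return the primes <= top using one bit per odd number."""
    if top < 2:
        return []
    if top < 3:
        return [2]
    topndx = (top - 3) // 2                 # index of the largest odd number <= top
    sqrtndx = (isqrt(top) - 3) // 2         # index of the largest base prime needed (-1 if none)
    cmpsts = bytearray(topndx // 8 + 1)     # all bits clear: nothing culled yet
    for i in range(sqrtndx + 1):
        if (cmpsts[i >> 3] & (1 << (i & 7))) == 0:    # 2*i + 3 is prime
            bp = i + i + 3
            # the index of bp*bp is (bp*bp - 3) // 2; odd multiples are bp indices apart
            for c in range((bp * bp - 3) // 2, topndx + 1, bp):
                cmpsts[c >> 3] |= 1 << (c & 7)
    primes = [2]                            # two is handled separately
    for i in range(topndx + 1):
        if (cmpsts[i >> 3] & (1 << (i & 7))) == 0:
            primes.append(i + i + 3)
    return primes
```

Each flag costs one bit and only odd numbers are represented, which gives the factor-of-sixteen memory saving over one byte per number that the text describes.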

==={{header|Nim Unbounded Versions}}===

For many purposes, one doesn't know the exact upper limit desired to easily use the above versions; in addition, those versions use an amount of memory proportional to the range sieved. In contrast, unbounded versions continuously update their range as they progress and only use memory proportional to the secondary base primes stream, which is only proportional to the square root of the range. One of the most basic functional versions is the TreeFolding sieve which is based on merging lazy streams as per Richard Bird's contribution to incremental sieves in Haskell, but which has a much better asymptotic execution complexity due to the added tree folding. The following code is a version of that in Nim (odds-only):
<syntaxhighlight lang="nim">import sugar
from times import epochTime

type PrimeType = int
iterator primesTreeFolding(): PrimeType {.closure.} =
  # needs a Co Inductive Stream - CIS...
  type
    CIS[T] = ref object
      head: T
      tail: () -> CIS[T]

  proc merge(xs, ys: CIS[PrimeType]): CIS[PrimeType] =
    let x = xs.head
    let y = ys.head
    if x < y:
      CIS[PrimeType](head: x, tail: () => merge(xs.tail(), ys))
    elif y < x:
      CIS[PrimeType](
        head: y,
        tail: () => merge(xs, ys.tail()))
    else:
      CIS[PrimeType](
        head: x,
        tail: () => merge(xs.tail(), ys.tail()))

  proc pmults(p: PrimeType): CIS[PrimeType] =
    let inc = p + p
    proc mlts(c: PrimeType): CIS[PrimeType] =
      CIS[PrimeType](head: c, tail: () => mlts(c + inc))
    mlts(p * p)

  proc allmults(ps: CIS[PrimeType]): CIS[CIS[PrimeType]] =
    CIS[CIS[PrimeType]](
      head: pmults(ps.head),
      tail: () => allmults(ps.tail()))

  proc pairs(css: CIS[CIS[PrimeType]]): CIS[CIS[PrimeType]] =
    let cs0 = css.head
    let rest0 = css.tail()
    CIS[CIS[PrimeType]](
      head: merge(cs0, rest0.head),
      tail: () => pairs(rest0.tail()))

  proc cmpsts(css: CIS[CIS[PrimeType]]): CIS[PrimeType] =
    let cs0 = css.head
    CIS[PrimeType](
      head: cs0.head,
      tail: () => merge(cs0.tail(), css.tail().pairs.cmpsts))

  proc minusAt(n: PrimeType, cs: CIS[PrimeType]): CIS[PrimeType] =
    var nn = n
    var ncs = cs
    while nn >= ncs.head:
      nn += 2
      ncs = ncs.tail()
    CIS[PrimeType](head: nn, tail: () => minusAt(nn + 2, ncs))

  proc oddprms(): CIS[PrimeType] =
    CIS[PrimeType](
      head: 3.PrimeType,
      tail: () => minusAt(5.PrimeType, oddprms().allmults.cmpsts))

  var prms = CIS[PrimeType](head: 2.PrimeType, tail: () => oddprms())
  while true:
    yield prms.head
    prms = prms.tail()

stdout.write "The first 25 primes are: "
var counter = 0
for p in primesTreeFolding():
  if counter >= 25: break
  stdout.write(p, " "); counter += 1
echo()

let start = epochTime()
counter = 0
for p in primesTreeFolding():
  if p > 1000000: break
  else: counter += 1
let elapsed = epochTime() - start
echo "There are ", counter, " primes up to 1000000."
echo "This test took ", elapsed, " seconds."</syntaxhighlight>
{{output}}
<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to 1000000.
This test took 0.2780287265777588 seconds.</pre>

With Nim 1.4, it takes about 0.4s (in release or danger mode) to compute the primes up to one million, better than the time needed in previous versions. With option "--gc arc", this time drops to 0.28s on a small laptop. This is still slow compared to the bounded algorithms, owing to the many small memory allocations and de-allocations required, which is characteristic of functional forms of code. It is purely functional in that everything is immutable, with one exception: Nim does not have Tail Call Optimization (TCO), so function recursion cannot be used freely at no execution time cost; therefore, where necessary, recursion is replaced by imperative loops, which is what TCO generally turns such forms into "under the covers". It is also slow because the algorithm is only O(n (log n) (log (log n))), with an extra "log n" factor that some other versions avoid. This slowness makes it only moderately useful for ranges up to a few million.

Since the algorithm does not require the memoization of a full lazy list, it uses an internal Co Inductive Stream of deferred execution states, finally outputting an iterator to enumerate over the lazily computed stream of primes.
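
The Co Inductive Stream idea, a head value plus a deferred tail, can be sketched in Python. This is a simplified illustration under stated assumptions, not a port of the Nim code: the names <code>Stream</code>, <code>multiples</code>, <code>merge</code> and <code>take</code> are hypothetical, and only the merging of two streams of odd multiples is shown, not the full tree folding.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Stream:
    head: int
    tail: Callable[[], "Stream"]      # thunk: evaluated only when the tail is needed

def multiples(p):
    """Odd multiples of the odd prime p, starting at p * p."""
    def from_(c):
        return Stream(c, lambda: from_(c + 2 * p))
    return from_(p * p)

def merge(xs, ys):
    """Merge two ascending streams, keeping a value common to both only once."""
    if xs.head < ys.head:
        return Stream(xs.head, lambda: merge(xs.tail(), ys))
    if ys.head < xs.head:
        return Stream(ys.head, lambda: merge(xs, ys.tail()))
    return Stream(xs.head, lambda: merge(xs.tail(), ys.tail()))

def take(n, s):
    """Force the first n elements of a stream into a list."""
    out = []
    for _ in range(n):
        out.append(s.head)
        s = s.tail()
    return out
```

Nothing beyond the requested elements is computed: <code>take(9, merge(multiples(3), multiples(5)))</code> forces only as many tails as it needs, and the common multiple 45 appears once in the merged result.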

'''A faster alternative using a mutable hash table (odds-only)'''

To show the cost of functional forms of code, the following version embraces mutability, both by using a mutable hash table to store the state of incremental culling by the secondary stream of base primes and by using mutable values to store the state wherever possible:
<syntaxhighlight lang="nim">import tables, times

type PrimeType = int
proc primesHashTable(): iterator(): PrimeType {.closure.} =
  iterator output(): PrimeType {.closure.} =
    # some initial values to avoid race and reduce initializations...
    yield 2.PrimeType; yield 3.PrimeType; yield 5.PrimeType; yield 7.PrimeType
    var h = initTable[PrimeType,PrimeType]()
    var n = 9.PrimeType
    let bps = primesHashTable()
    var bp = bps() # advance past 2
    bp = bps()
    var q = bp * bp # to initialize with 3
    while true:
      if n >= q:
        let incr = bp + bp
        h[n + incr] = incr
        bp = bps()
        q = bp * bp
      elif h.hasKey(n):
        var incr: PrimeType
        discard h.take(n, incr)
        var nxt = n + incr
        while h.hasKey(nxt):
          nxt += incr # ensure no duplicates
        h[nxt] = incr
      else:
        yield n
      n += 2.PrimeType
  output

stdout.write "The first 25 primes are: "
var counter = 0
var iter = primesHashTable()
for p in iter():
  if counter >= 25:
    break
  else:
    stdout.write(p, " ")
    counter += 1
echo ""
let start = epochTime()
counter = 0
iter = primesHashTable()
for p in iter():
  if p > 1000000: break
  else: counter += 1
let elapsed = epochTime() - start
echo "The number of primes up to a million is: ", counter
stdout.write("This test took ", elapsed, " seconds.\n")</syntaxhighlight>

{{out}}
Time for version compiled with “-d:danger” option.
<pre>The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
The number of primes up to a million is: 78498
This test took 0.05106830596923828 seconds.</pre>

The output is identical to that of the first unbounded version, except that in danger mode it is about eight times faster when sieving to a million. For larger ranges it will continue to pull further ahead of the above version because its performance is only O(n (log (log n))), as the hash table has an average O(1) access time; it is only as slow as it is because of the large constant overhead of the hashing calculations and look-ups.
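
The same incremental technique can be sketched in Python with a <code>dict</code> in place of the hash table. This is a sketch under stated assumptions rather than a line-by-line port: the names <code>primes_hash_table</code>, <code>first</code> and <code>count_up_to</code> are hypothetical, and unlike the Nim code it also checks for an occupied slot when a new base prime is introduced.

```python
from itertools import islice, takewhile

def primes_hash_table():
    """Unbounded odds-only incremental sieve using a dict of pending composites."""
    yield from (2, 3, 5, 7)            # seed values; they also stop the recursion below
    pending = {}                       # upcoming odd composite -> step (twice a base prime)
    base = primes_hash_table()         # secondary stream supplying the base primes
    next(base)                         # skip 2
    bp = next(base)                    # first odd base prime, 3
    q = bp * bp                        # culling for bp starts at its square
    n = 9
    while True:
        if n in pending:               # n is a known composite: advance its chain
            step = pending.pop(n)
        elif n < q:                    # not marked and below the next square: prime
            yield n
            n += 2
            continue
        else:                          # n == q: start culling for this base prime
            step = bp + bp
            bp = next(base)
            q = bp * bp
        nxt = n + step
        while nxt in pending:          # keep one entry per composite
            nxt += step
        pending[nxt] = step
        n += 2

def first(k):
    """The first k primes."""
    return list(islice(primes_hash_table(), k))

def count_up_to(limit):
    """Number of primes <= limit, found by enumeration."""
    return sum(1 for _ in takewhile(lambda p: p <= limit, primes_hash_table()))
```

The dictionary only ever holds one entry per base prime up to the square root of the current position, which is why memory use grows so slowly with the range.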

'''Very fast Page Segmented version using a bit-packed mutable array (odds-only)'''

Note: This version is used as a very fast alternative in [[Extensible_prime_generator#Nim]]

For the highest speeds, one needs to use page segmented mutable arrays as in the bit-packed version here:
<syntaxhighlight lang="nim"># a Page Segmented Odd-Only Bit-Packed Sieve of Eratosthenes...

from times import epochTime # for testing
from bitops import popCount

type Prime = uint64

let LIMIT = 1_000_000_000.Prime
let CPUL1CACHE = 16384 # in bytes

const FRSTSVPRM = 3.Prime

type
BasePrime = uint32
BasePrimeArray = seq[BasePrime]
SieveBuffer = seq[byte] # byte size gives the most potential efficiency...

# define a general purpose lazy list to use as secondary base prime arrays feed
# NOT thread safe; needs a Mutex gate to make it so, but not threaded (yet)...
type
BasePrimeArrayLazyList = ref object
head: BasePrimeArray
tailf: proc (): BasePrimeArrayLazyList {.closure.}
tail: BasePrimeArrayLazyList
template makeBasePrimeArrayLazyList(hd: BasePrimeArray;
body: untyped): untyped = # factory constructor
let thnk = proc (): BasePrimeArrayLazyList {.closure.} = body
BasePrimeArrayLazyList(head: hd, tailf: thnk)
proc rest(lzylst: sink BasePrimeArrayLazyList): BasePrimeArrayLazyList {.inline.} =
if lzylst.tailf != nil: lzylst.tail = lzylst.tailf(); lzylst.tailf = nil
return lzylst.tail
iterator items(lzylst: BasePrimeArrayLazyList): BasePrime {.inline.} =
var ll = lzylst
while ll != nil:
for bp in ll.head: yield bp
ll = ll.rest

# count the number of zero bits (primes) in a SieveBuffer,
# uses native popCount for extreme speed;
# counts up to the bit index of the last bit to be counted...
proc countSieveBuffer(lsti: int; cmpsts: SieveBuffer): int =
let lstw = (lsti shr 3) and -8; let lstm = lsti and 63 # last word and bit index!
result = (lstw shl 3) + 64 # preset for all ones!
let cmpstsa = cast[int](cmpsts[0].unsafeAddr)
let cmpstslsta = cmpstsa + lstw
for csa in countup(cmpstsa, cmpstslsta - 1, 8):
result -= cast[ptr uint64](csa)[].popCount # subtract number of found ones!
let msk = (0'u64 - 2'u64) shl lstm # mask for the unused bits in last word!
result -= (cast[ptr uint64](cmpstslsta)[] or msk).popCount
# a fast fill SieveBuffer routine using pointers...
proc fillSieveBuffer(sb: var SieveBuffer) = zeroMem(sb[0].unsafeAddr, sb.len)

const BITMASK = [1'u8, 2, 4, 8, 16, 32, 64, 128] # faster than shifting!

# do sieving work, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
proc cullSieveBuffer(lwi: int; bpas: BasePrimeArrayLazyList;
sb: var SieveBuffer) =
let len = sb.len; let szbits = len shl 3; let nxti = lwi + szbits
for bp in bpas:
let bpwi = ((bp.Prime - FRSTSVPRM) shr 1).int
var s = (bpwi shl 1) * (bpwi + FRSTSVPRM.int) + FRSTSVPRM.int
if s >= nxti: break
if s >= lwi: s -= lwi
else:
let r = (lwi - s) mod bp.int
s = (if r == 0: 0 else: bp.int - r)
let clmt = szbits - (bp.int shl 3)
# if len == CPUL1CACHE: continue
if s < clmt:
let slmt = s + (bp.int shl 3)
while s < slmt:
let msk = BITMASK[s and 7]
for c in countup(s shr 3, len - 1, bp.int):
sb[c] = sb[c] or msk
s += bp.int
continue
while s < szbits:
let w = s shr 3; sb[w] = sb[w] or BITMASK[s and 7]; s += bp.int # (1'u8 shl (s and 7))

proc makeBasePrimeArrays(): BasePrimeArrayLazyList # forward reference!

# an iterator over successive sieved buffer composite arrays,
# returning whatever type the cnvrtr produces from
# the low index and the culled SieveBuffer...
proc makePrimePages[T](
strtwi, sz: int; cnvrtrf: proc (li: int; sb: var SieveBuffer): T {.closure.}
): (iterator(): T {.closure.}) =
var lwi = strtwi; let bpas = makeBasePrimeArrays(); var cmpsts = newSeq[byte](sz)
return iterator(): T {.closure.} =
while true:
fillSieveBuffer(cmpsts); cullSieveBuffer(lwi, bpas, cmpsts)
yield cnvrtrf(lwi, cmpsts); lwi += cmpsts.len shl 3
# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
proc makeBasePrimeArrays(): BasePrimeArrayLazyList =
# converts an entire sieved array of bytes into an array of base primes,
# to be used as a source of base primes as part of the Lazy List...
proc sb2bpa(li: int; sb: var SieveBuffer): BasePrimeArray =
let szbits = sb.len shl 3; let len = countSieveBuffer(szbits - 1, sb)
result = newSeq[BasePrime](len); var j = 0
for i in 0 ..< szbits:
if (sb[i shr 3] and BITMASK[i and 7]) == 0'u8:
result[j] = FRSTSVPRM.BasePrime + ((li + i) shl 1).BasePrime; j.inc
proc nxtbparr(
pgen: iterator (): BasePrimeArray {.closure.}): BasePrimeArrayLazyList =
return makeBasePrimeArrayLazyList(pgen()): nxtbparr(pgen)
# pre-seeding first array breaks recursive race,
# dummy primes of all odd numbers starting at FRSTSVPRM (unculled)...
var cmpsts = newSeq[byte](512)
let dummybparr = sb2bpa(0, cmpsts)
let fakebps = makeBasePrimeArrayLazyList(dummybparr): nil # used just once here!
cullSieveBuffer(0, fakebps, cmpsts)
return makeBasePrimeArrayLazyList(sb2bpa(0, cmpsts)):
nxtbparr(makePrimePages(4096, 512, sb2bpa)) # lazy recursive call breaks race!
# iterator over primes from above page iterator;
# takes at least as long to enumerate the primes as sieve them...
iterator primesPaged(): Prime {.inline.} =
yield 2
proc mkprmarr(li: int; sb: var SieveBuffer): seq[Prime] =
let szbits = sb.len shl 3; let low = FRSTSVPRM + (li + li).Prime; var j = 0
let len = countSieveBuffer(szbits - 1, sb); result = newSeq[Prime](len)
for i in 0 ..< szbits:
if (sb[i shr 3] and BITMASK[i and 7]) == 0'u8:
result[j] = low + (i + i).Prime; j.inc
let gen = makePrimePages(0, CPUL1CACHE, mkprmarr)
for prmpg in gen():
for prm in prmpg: yield prm
proc countPrimesTo(range: Prime): int64 =
if range < FRSTSVPRM: return (if range < 2: 0 else: 1)
result = 1; let rngi = ((range - FRSTSVPRM) shr 1).int
proc cntr(li: int; sb: var SieveBuffer): (int, int) {.closure.} =
let szbits = sb.len shl 3; let nxti = li + szbits; result = (0, nxti)
if nxti <= rngi: result[0] += countSieveBuffer(szbits - 1, sb)
else: result[0] += countSieveBuffer(rngi - li, sb)
let gen = makePrimePages(0, CPUL1CACHE, cntr)
for count, nxti in gen():
result += count; if nxti > rngi: break

# showing results...
echo "Page Segmented Bit-Packed Odds-Only Sieve of Eratosthenes"
echo "Needs at least ", CPUL1CACHE, " bytes of CPU L1 cache memory.\n"

stdout.write "First 25 primes: "
var counter0 = 0
for p in primesPaged():
if counter0 >= 25: break
stdout.write(p, " "); counter0.inc
echo ""

stdout.write "The number of primes up to a million is: "
var counter1 = 0
for p in primesPaged():
if p > 1_000_000.Prime: break else: counter1.inc
stdout.write counter1, " - these both found by (slower) enumeration.\n"

let start = epochTime()
#[ # slow way to count primes takes as long to enumerate as sieve!
var counter = 0
for p in primesPaged():
if p > LIMIT: break else: counter.inc
# ]#
let counter = countPrimesTo LIMIT # the fast way using native popCount!
let elpsd = epochTime() - start

echo "Found ", counter, " primes up to ", LIMIT, " in ", elpsd, " seconds."</syntaxhighlight>
{{output}}
Time is obtained with Nim 1.4 with options <code>-d:danger --gc:arc</code>.
<pre>Page Segmented Bit-Packed Odds-Only Sieve of Eratosthenes
Needs at least 16384 bytes of CPU L1 cache memory.

First 25 primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
The number of primes up to a million is: 78498 - these both found by (slower) enumeration.
Found 50847534 primes up to 1000000000 in 0.7931935787200928 seconds.</pre>

The above version approaches a hundred times faster than the incremental style versions above due to the high efficiency of direct mutable memory operations in modern CPUs, and is useful for ranges of billions. This version maintains its efficiency using the CPU L1 cache to a range of over 16 billion and then gets a little slower for ranges of a trillion or more using the CPU's L2 cache. It takes an average of only about 3.5 CPU clock cycles per composite number cull, or about 70 CPU clock cycles per prime found.

Note that the fastest performance is realized by using functions that directly manipulate the output "seq" (array) of culled bit number representations, such as the <code>countPrimesTo</code> function provided, since enumerating the found primes with the <code>primesPaged</code> iterator takes about as long as culling the composites.
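
The counting trick used by <code>countSieveBuffer</code>, taking 64 bits at a time and masking off the unused bits of the last word, can be sketched in Python. This is an illustrative sketch only: the name <code>count_zero_bits</code> is hypothetical, and <code>bin(word).count("1")</code> stands in for the native population count instruction.

```python
def count_zero_bits(buf, last_index):
    """Count the zero bits at bit indices 0..last_index of a little-endian bit buffer."""
    word_mask = 0xFFFFFFFFFFFFFFFF                 # keep Python integers to 64 bits
    last_word = last_index >> 6                    # 64-bit word holding the last counted bit
    total = (last_word + 1) * 64                   # start by assuming every bit is zero
    for w in range(last_word):
        word = int.from_bytes(buf[w * 8:w * 8 + 8], "little")
        total -= bin(word).count("1")              # subtract the set bits (composites)
    word = int.from_bytes(buf[last_word * 8:last_word * 8 + 8], "little")
    mask = (-2 << (last_index & 63)) & word_mask   # ones in every position above last_index
    total -= bin(word | mask).count("1")           # bits past the end count as composite
    return total
```

Forcing the bits above <code>last_index</code> to one means they are subtracted along with the real composites, so no separate loop over the tail bits is needed.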

Many further improvements in speed can be made, as in tuning the medium ranges to more efficiently use the CPU caches for an improvement in the middle ranges of up to a factor of about two, full maximum wheel factorization for a further improvement of about four, extreme loop unrolling for a further improvement of approximately two, multi-threading for an improvement of the factor of effective CPU cores used, etc. However, these improvements are of little point when used with enumeration; for instance, if one successfully reduced the time to sieve the composite numbers to zero, it would still take about a second just to enumerate the resulting primes over a range of a billion.


=={{header|Niue}}==
{{incorrect|Niue|It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}
<syntaxhighlight lang="niue">[ dup 2 < ] '<2 ;
[ 1 + 'count ; [ <2 [ , ] when ] count times ] 'fill-stack ;

Line 6,314: Line 14,095:

10 sieve .s ( => 2 3 5 7 9 ) reset newline
30 sieve .s ( => 2 3 5 7 11 13 17 19 23 29 ) </syntaxhighlight>

=={{header|Oberon-2}}==
=={{header|Oberon-2}}==
<lang oberon2>MODULE Primes;
<syntaxhighlight lang="oberon2">MODULE Primes;


IMPORT Out, Math;
IMPORT Out, Math;
Line 6,349: Line 14,131:
END;
END;
Out.Ln;
Out.Ln;
END Primes.</lang>
END Primes.</syntaxhighlight>

=={{header|OCaml}}==
=={{header|OCaml}}==
===Imperative===
===Imperative===
<lang ocaml>let sieve n =
<syntaxhighlight lang="ocaml">let sieve n =
let is_prime = Array.make n true in
let is_prime = Array.make n true in
let limit = truncate(sqrt (float (n - 1))) in
let limit = truncate(sqrt (float (n - 1))) in
Line 6,365: Line 14,148:
is_prime.(0) <- false;
is_prime.(0) <- false;
is_prime.(1) <- false;
is_prime.(1) <- false;
is_prime</lang>
is_prime</syntaxhighlight>


<lang ocaml>let primes n =
<syntaxhighlight lang="ocaml">let primes n =
let primes, _ =
let primes, _ =
let sieve = sieve n in
let sieve = sieve n in
Line 6,375: Line 14,158:
([], Array.length sieve - 1)
([], Array.length sieve - 1)
in
in
primes</lang>
primes</syntaxhighlight>


in the top-level:
in the top-level:
Line 6,384: Line 14,167:


===Functional===
===Functional===
<lang ocaml>(* first define some iterators *)
<syntaxhighlight lang="ocaml">(* first define some iterators *)
# let fold_iter f init a b =
let fold_iter f init a b =
let rec aux acc i =
let rec aux acc i =
if i > b
if i > b
then (acc)
then (acc)
else aux (f acc i) (succ i)
else aux (f acc i) (succ i)
in
in
aux init a ;;
aux init a
val fold_iter : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>
(* val fold_iter : ('a -> int -> 'a) -> 'a -> int -> int -> 'a *)


# let fold_step f init a b step =
let fold_step f init a b step =
let rec aux acc i =
let rec aux acc i =
if i > b
if i > b
then (acc)
then (acc)
else aux (f acc i) (i + step)
else aux (f acc i) (i + step)
in
in
aux init a ;;
aux init a
val fold_step : ('a -> int -> 'a) -> 'a -> int -> int -> int -> 'a = <fun>
(* val fold_step : ('a -> int -> 'a) -> 'a -> int -> int -> int -> 'a *)


(* remove a given value from a list *)
(* remove a given value from a list *)
# let remove li v =
let remove li v =
let rec aux acc = function
let rec aux acc = function
| hd::tl when hd = v -> (List.rev_append acc tl)
| hd::tl when hd = v -> (List.rev_append acc tl)
| hd::tl -> aux (hd::acc) tl
| hd::tl -> aux (hd::acc) tl
| [] -> li
| [] -> li
in
in
aux [] li ;;
aux [] li
val remove : 'a list -> 'a -> 'a list = <fun>
(* val remove : 'a list -> 'a -> 'a list *)


(* the main function *)
(* the main function *)
# let primes n =
let primes n =
let li =
let li =
(* create a list [from 2; ... until n] *)
(* create a list [from 2; ... until n] *)
List.rev(fold_iter (fun acc i -> (i::acc)) [] 2 n)
List.rev(fold_iter (fun acc i -> (i::acc)) [] 2 n)
in
in
let limit = truncate(sqrt(float n)) in
let limit = truncate(sqrt(float n)) in
fold_iter (fun li i ->
fold_iter (fun li i ->
if List.mem i li (* test if (i) is prime *)
if List.mem i li (* test if (i) is prime *)
then (fold_step remove li (i*i) n i)
then (fold_step remove li (i*i) n i)
else li)
else li)
li 2 limit
li 2 limit
(* val primes : int -> int list *)
;;
</syntaxhighlight>
val primes : int -> int list = <fun>
in the top-level:

# primes 200 ;;
<syntaxhighlight lang="ocaml"># primes 200 ;;
- : int list =
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
157; 163; 167; 173; 179; 181; 191; 193; 197; 199]</lang>
157; 163; 167; 173; 179; 181; 191; 193; 197; 199]</syntaxhighlight>


=== Another functional version ===
=== Another functional version ===
Line 6,438: Line 14,221:
This uses zero to denote struck-out numbers. It is slightly inefficient, as it starts striking out multiples just above p rather than at p<sup>2</sup>.
This uses zero to denote struck-out numbers. It is slightly inefficient, as it starts striking out multiples just above p rather than at p<sup>2</sup>.


<lang ocaml># let rec strike_nth k n l = match l with
<syntaxhighlight lang="ocaml">let rec strike_nth k n l = match l with
| [] -> []
| [] -> []
| h :: t ->
| h :: t ->
if k = 0 then 0 :: strike_nth (n-1) n t
if k = 0 then 0 :: strike_nth (n-1) n t
else h :: strike_nth (k-1) n t;;
else h :: strike_nth (k-1) n t
val strike_nth : int -> int -> int list -> int list = <fun>
(* val strike_nth : int -> int -> int list -> int list *)


# let primes n =
let primes n =
let limit = truncate(sqrt(float n)) in
let limit = truncate(sqrt(float n)) in
let rec range a b = if a > b then [] else a :: range (a+1) b in
let rec range a b = if a > b then [] else a :: range (a+1) b in
Line 6,453: Line 14,236:
| h :: t -> if h > limit then List.filter ((<) 0) l else
| h :: t -> if h > limit then List.filter ((<) 0) l else
h :: sieve_primes (strike_nth (h-1) h t) in
h :: sieve_primes (strike_nth (h-1) h t) in
sieve_primes (range 2 n) ;;
sieve_primes (range 2 n)
val primes : int -> int list = <fun>
(* val primes : int -> int list *)


</syntaxhighlight>
# primes 200;;
in the top-level:
<syntaxhighlight lang="ocaml"># primes 200;;
- : int list =
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
157; 163; 167; 173; 179; 181; 191; 193; 197; 199]</lang>
157; 163; 167; 173; 179; 181; 191; 193; 197; 199]</syntaxhighlight>
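
The cost of starting the strike-out just above p instead of at p<sup>2</sup> can be made concrete with a small Python sketch. It is an array-based illustration rather than a translation of the list-based OCaml code above, and the function name <code>count_strikes</code> is hypothetical.

```python
def count_strikes(n, from_square):
    """Sieve up to n and return (primes, number of strike operations).

    from_square=True  starts striking at p*p;
    from_square=False starts at 2*p, the first multiple above p.
    """
    flags = [True] * (n + 1)
    strikes = 0
    p = 2
    while p * p <= n:
        if flags[p]:
            start = p * p if from_square else 2 * p
            for m in range(start, n + 1, p):
                flags[m] = False
                strikes += 1
        p += 1
    primes = [i for i in range(2, n + 1) if flags[i]]
    return primes, strikes
```

Both settings give the same list of primes; only the number of redundant strikes differs, since every multiple of p below p<sup>2</sup> has already been struck by a smaller prime.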


=={{header|Oforth}}==
=={{header|Oforth}}==


<lang Oforth>: eratosthenes(n)
<syntaxhighlight lang="oforth">: eratosthenes(n)
| i j |
| i j |
ListBuffer newSize(n) dup add(null) seqFrom(2, n) over addAll
ListBuffer newSize(n) dup add(null) seqFrom(2, n) over addAll
Line 6,470: Line 14,255:
dup at(i) ifNotNull: [ i sq n i step: j [ dup put(j, null) ] ]
dup at(i) ifNotNull: [ i sq n i step: j [ dup put(j, null) ] ]
]
]
filter(#notNull) ;</lang>
filter(#notNull) ;</syntaxhighlight>


{{out}}
{{out}}
Line 6,476: Line 14,261:
>100 eratosthenes println
>100 eratosthenes println
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
</pre>

=={{header|Ol}}==
<syntaxhighlight lang="scheme">
(define all (iota 999 2))

(print
(let main ((left '()) (right all))
(if (null? right)
(reverse left)
(unless (car right)
(main left (cdr right))
(let loop ((l '()) (r right) (n 0) (every (car right)))
(if (null? r)
(let ((l (reverse l)))
(main (cons (car l) left) (cdr l)))
(if (eq? n every)
(loop (cons #false l) (cdr r) 1 every)
(loop (cons (car r) l) (cdr r) (+ n 1) every)))))))
)
</syntaxhighlight>

Output:
<pre>
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997)
</pre>

=={{header|ooRexx}}==
<syntaxhighlight lang="oorexx">
/*ooRexx program generates & displays primes via the sieve of Eratosthenes.
* derived from first Rexx version
* uses an array rather than a stem for the list
* uses string methods rather than BIFs
* uses new ooRexx keyword LOOP, extended assignment
* and line comments
* uses meaningful variable names and restructures code
* layout for improved understandability
****************************************************************************/
arg highest --get highest number to use.
if \highest~datatype('W') then
highest = 200 --use default value.
isPrime = .array~new(highest) --container for all numbers.
isPrime~fill(1) --assume all numbers are prime.
w = highest~length --width of the biggest number,
-- it's used for aligned output.
out1 = 'prime'~right(20) --first part of output messages.
np = 0 --no primes so far.
loop j = 2 for highest - 1 --all numbers up through highest.
if isPrime[j] = 1 then do --found one.
np += 1 --bump the prime counter.
say out1 np~right(w) ' --> ' j~right(w) --display output.
loop m = j * j to highest by j
isPrime[m] = '' --strike all multiples: not prime.
end
end
end
say
say np~right(out1~length + 1 + w) 'primes found up to and including ' highest
exit
</syntaxhighlight>
{{out}}
<pre>
prime 1 --> 2
prime 2 --> 3
prime 3 --> 5
prime 4 --> 7
prime 5 --> 11
prime 6 --> 13
prime 7 --> 17
prime 8 --> 19
prime 9 --> 23
prime 10 --> 29
prime 11 --> 31
prime 12 --> 37
prime 13 --> 41
prime 14 --> 43
prime 15 --> 47
prime 16 --> 53
prime 17 --> 59
prime 18 --> 61
prime 19 --> 67
prime 20 --> 71
prime 21 --> 73
prime 22 --> 79
prime 23 --> 83
prime 24 --> 89
prime 25 --> 97
prime 26 --> 101
prime 27 --> 103
prime 28 --> 107
prime 29 --> 109
prime 30 --> 113
prime 31 --> 127
prime 32 --> 131
prime 33 --> 137
prime 34 --> 139
prime 35 --> 149
prime 36 --> 151
prime 37 --> 157
prime 38 --> 163
prime 39 --> 167
prime 40 --> 173
prime 41 --> 179
prime 42 --> 181
prime 43 --> 191
prime 44 --> 193
prime 45 --> 197
prime 46 --> 199

46 primes found up to and including 200
</pre>
===Wheel Version===
<syntaxhighlight lang="oorexx">
/*ooRexx program generates primes via sieve of Eratosthenes algorithm.
* wheel version, 2 handled as special case
* loops optimized: outer loop stops at the square root of
* the limit, inner loop starts at the square of the
* prime just found
* use a list rather than an array and remove composites
* rather than just mark them
* convert list of primes to a list of output messages and
* display them with one say statement
*******************************************************************************/
arg highest -- get highest number to use.
if \highest~datatype('W') then
highest = 200 -- use default value.
w = highest~length -- width of the biggest number,
-- it's used for aligned output.
thePrimes = .list~of(2) -- the first prime is 2.
loop j = 3 to highest by 2 -- populate the list with odd nums.
thePrimes~append(j)
end

j = 3 -- first prime (other than 2)
ix = thePrimes~index(j) -- get the index of 3 in the list.
loop while j*j <= highest -- strike multiples of odd ints.
-- up to sqrt(highest).
loop jm = j*j to highest by j+j -- start at J squared, incr. by 2*J.
thePrimes~removeItem(jm) -- delete it since it's composite.
end
ix = thePrimes~next(ix) -- the index of the next prime.
j = thePrimes[ix] -- the next prime.
end
np = thePrimes~items -- the number of primes since the
-- list is now only primes.
out1 = ' prime number' -- first part of output messages.
out2 = ' --> ' -- middle part of output messages.
ix = thePrimes~first
loop n = 1 to np -- change the list of primes
-- to output messages.
thePrimes[ix] = out1 n~right(w) out2 thePrimes[ix]~right(w)
ix = thePrimes~next(ix)
end
last = np~right(out1~length+1+w) 'primes found up to and including ' highest
thePrimes~append(.endofline || last) -- add blank line and summary line.
say thePrimes~makearray~toString -- display the output.
exit
</syntaxhighlight>
{{out}}when using the limit of 100
<pre>
prime number 1 --> 2
prime number 2 --> 3
prime number 3 --> 5
prime number 4 --> 7
prime number 5 --> 11
prime number 6 --> 13
prime number 7 --> 17
prime number 8 --> 19
prime number 9 --> 23
prime number 10 --> 29
prime number 11 --> 31
prime number 12 --> 37
prime number 13 --> 41
prime number 14 --> 43
prime number 15 --> 47
prime number 16 --> 53
prime number 17 --> 59
prime number 18 --> 61
prime number 19 --> 67
prime number 20 --> 71
prime number 21 --> 73
prime number 22 --> 79
prime number 23 --> 83
prime number 24 --> 89
prime number 25 --> 97

25 primes found up to and including 100
</pre>
</pre>


=={{header|Oz}}==
=={{header|Oz}}==
{{trans|Haskell}}
{{trans|Haskell}}
<lang oz>declare
<syntaxhighlight lang="oz">declare
fun {Sieve N}
fun {Sieve N}
S = {Array.new 2 N true}
S = {Array.new 2 N true}
Line 6,503: Line 14,475:
end
end
in
in
{Show {Primes 30}}</lang>
{Show {Primes 30}}</syntaxhighlight>


=={{header|PARI/GP}}==
=={{header|PARI/GP}}==
<lang parigp>Eratosthenes(lim)={
<syntaxhighlight lang="parigp">Eratosthenes(lim)={
my(v=Vectorsmall(lim\1,unused,1));
my(v=Vecsmall(lim\1,unused,1));
forprime(p=2,sqrt(lim),
forprime(p=2,sqrt(lim),
forstep(i=p^2,lim,p,
forstep(i=p^2,lim,p,
Line 6,514: Line 14,486:
);
);
for(i=1,lim,if(v[i],print1(i", ")))
for(i=1,lim,if(v[i],print1(i", ")))
};</lang>
};</syntaxhighlight>


An alternate version:
An alternate version:


<lang parigp>Sieve(n)=
<syntaxhighlight lang="parigp">Sieve(n)=
{
{
v=vector(n,unused,1);
v=vector(n,unused,1);
Line 6,524: Line 14,496:
if(v[i],
if(v[i],
forstep(j=i^2,n,i,v[j]=0)));
forstep(j=i^2,n,i,v[j]=0)));
for(i=2,n,if(v[i],print1(i)))
for(i=2,n,if(v[i],print1(i",")))
};</lang>
};</syntaxhighlight>


=={{header|Pascal}}==
=={{header|Pascal}}==
Note: Some Pascal implementations put quite low limits on the size of a set (e.g. Turbo Pascal doesn't allow more than 256 members). To compile on such an implementation, reduce the constant PrimeLimit accordingly.
Note: Some Pascal implementations put quite low limits on the size of a set (e.g. Turbo Pascal doesn't allow more than 256 members). To compile on such an implementation, reduce the constant PrimeLimit accordingly.
<lang pascal>
<syntaxhighlight lang="pascal">
program primes(output)
program primes(output)


Line 6,570: Line 14,542:
end
end
end.
end.
</syntaxhighlight>
</lang>
===alternative using wheel ===
===alternative using wheel ===
Using growing wheel to fill array for sieving for minimal unmark operations.
Using growing wheel to fill array for sieving for minimal unmark operations.
Sieving only with possible-prime factors.
Sieving only with possible-prime factors.
<lang pascal>
<syntaxhighlight lang="pascal">
program prim(output);
program prim(output);
//Sieve of Eratosthenes with fast elimination of multiples of small primes
//Sieve of Eratosthenes with fast elimination of multiples of small primes
Line 6,704: Line 14,676:
inc(prCnt,Ord(primes[i]));
inc(prCnt,Ord(primes[i]));
writeln(prCnt,' primes up to ',PrimeLimit);
writeln(prCnt,' primes up to ',PrimeLimit);
end.</lang>
end.</syntaxhighlight>


output: ( i3 4330 Haswell 3.5 GHz fpc 2.6.4 -O3 )
output: ( i3 4330 Haswell 3.5 GHz fpc 2.6.4 -O3 )
Line 6,720: Line 14,692:


===Classic Sieve===
===Classic Sieve===
<lang perl>sub sieve {
<syntaxhighlight lang="perl">sub sieve {
my $n = shift;
my $n = shift;
my @composite;
my @composite;
Line 6,735: Line 14,707:
}
}
@primes;
@primes;
}</lang>
}</syntaxhighlight>


===Odds only (faster)===
===Odds only (faster)===
<lang perl>sub sieve2 {
<syntaxhighlight lang="perl">sub sieve2 {
my($n) = @_;
my($n) = @_;
return @{([],[],[2],[2,3],[2,3])[$n]} if $n <= 4;
return @{([],[],[2],[2,3],[2,3])[$n]} if $n <= 4;
Line 6,754: Line 14,726:
}
}
@primes;
@primes;
}</lang>
}</syntaxhighlight>


===Odds only, using vectors for lower memory use===
===Odds only, using vectors for lower memory use===
<lang perl>sub dj_vector {
<syntaxhighlight lang="perl">sub dj_vector {
my($end) = @_;
my($end) = @_;
return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
Line 6,772: Line 14,744:
do { push @primes, 2*$_+1 if !vec($sieve,$_,1) } for (1..int(($end-1)/2));
do { push @primes, 2*$_+1 if !vec($sieve,$_,1) } for (1..int(($end-1)/2));
@primes;
@primes;
}</lang>
}</syntaxhighlight>


===Odds only, using strings for best performance===
===Odds only, using strings for best performance===
Compared to array versions, about 2x faster (with 5.16.0 or later) and lower memory. Much faster than the experimental versions below. It's possible a mod-6 or mod-30 wheel could give more improvement, though possibly with obfuscation. The best next step for performance and functionality would be segmenting.
Compared to array versions, about 2x faster (with 5.16.0 or later) and lower memory. Much faster than the experimental versions below. It's possible a mod-6 or mod-30 wheel could give more improvement, though possibly with obfuscation. The best next step for performance and functionality would be segmenting.
<lang perl>sub string_sieve {
<syntaxhighlight lang="perl">sub string_sieve {
my ($n, $i, $s, $d, @primes) = (shift, 7);
my ($n, $i, $s, $d, @primes) = (shift, 7);


Line 6,791: Line 14,763:
push @primes, pos while m/0/g;
push @primes, pos while m/0/g;
@primes;
@primes;
}</lang>
}</syntaxhighlight>


This older version uses half the memory, but at the expense of a bit of speed and code complexity:
This older version uses half the memory, but at the expense of a bit of speed and code complexity:
<lang perl>sub dj_string {
<syntaxhighlight lang="perl">sub dj_string {
my($end) = @_;
my($end) = @_;
return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
Line 6,815: Line 14,787:
push @primes, 2*pos($sieve)-1 while $sieve =~ m/0/g;
push @primes, 2*pos($sieve)-1 while $sieve =~ m/0/g;
@primes;
@primes;
}</lang>
}</syntaxhighlight>


===Experimental===
===Experimental===
Line 6,821: Line 14,793:


Golfing a bit, at the expense of speed:
Golfing a bit, at the expense of speed:
<lang perl>sub sieve{ my (@s, $i);
<syntaxhighlight lang="perl">sub sieve{ my (@s, $i);
grep { not $s[ $i = $_ ] and do
grep { not $s[ $i = $_ ] and do
{ $s[ $i += $_ ]++ while $i <= $_[0]; 1 }
{ $s[ $i += $_ ]++ while $i <= $_[0]; 1 }
Line 6,827: Line 14,799:
}
}


print join ", " => sieve 100;</lang>
print join ", " => sieve 100;</syntaxhighlight>


Or with bit strings (much slower than the vector version above):
Or with bit strings (much slower than the vector version above):
<lang perl>sub sieve{ my ($s, $i);
<syntaxhighlight lang="perl">sub sieve{ my ($s, $i);
grep { not vec $s, $i = $_, 1 and do
grep { not vec $s, $i = $_, 1 and do
{ (vec $s, $i += $_, 1) = 1 while $i <= $_[0]; 1 }
{ (vec $s, $i += $_, 1) = 1 while $i <= $_[0]; 1 }
Line 6,836: Line 14,808:
}
}


print join ", " => sieve 100;</lang>
print join ", " => sieve 100;</syntaxhighlight>


A short recursive version:
A short recursive version:
<lang perl>sub erat {
<syntaxhighlight lang="perl">sub erat {
my $p = shift;
my $p = shift;
return $p, $p**2 > $_[$#_] ? @_ : erat(grep $_%$p, @_)
return $p, $p**2 > $_[$#_] ? @_ : erat(grep $_%$p, @_)
}
}


print join ', ' => erat 2..100000;</lang>
print join ', ' => erat 2..100000;</syntaxhighlight>


Regexp (purely an example -- the regex engine limits it to only 32769):<lang perl>sub sieve {
Regexp (purely an example -- the regex engine limits it to only 32769):<syntaxhighlight lang="perl">sub sieve {
my ($s, $p) = "." . ("x" x shift);
my ($s, $p) = "." . ("x" x shift);


Line 6,857: Line 14,829:
}
}


print sieve(1000);</lang>
print sieve(1000);</syntaxhighlight>


===Extensible sieves===
===Extensible sieves===


Here are two incremental versions, which allow one to create a tied array of primes:
Here are two incremental versions, which allow one to create a tied array of primes:
<lang perl>use strict;
<syntaxhighlight lang="perl">use strict;
use warnings;
use warnings;
package Tie::SieveOfEratosthenes;
package Tie::SieveOfEratosthenes;
Line 6,967: Line 14,939:
}
}


1;</lang>
1;</syntaxhighlight>
This one is based on the vector sieve shown earlier, but adds to a list as needed, just sieving in the segment. Slightly faster and half the memory vs. the previous incremental sieve. It uses the same API -- arguably we should be offset by one so $primes[$n] returns the $n'th prime.
This one is based on the vector sieve shown earlier, but adds to a list as needed, just sieving in the segment. Slightly faster and half the memory vs. the previous incremental sieve. It uses the same API -- arguably we should be offset by one so $primes[$n] returns the $n'th prime.
<lang perl>use strict;
<syntaxhighlight lang="perl">use strict;
use warnings;
use warnings;
package Tie::SieveOfEratosthenes;
package Tie::SieveOfEratosthenes;
Line 7,022: Line 14,994:
}
}


1;</lang>
1;</syntaxhighlight>
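
The idea of sieving only within the current segment can be sketched in Python as follows. This is a simplified illustration with a fixed upper bound, not a port of the tied-array Perl modules above; the names <code>small_primes</code> and <code>segmented_primes</code> and the default <code>segment_size</code> are assumptions made for the example.

```python
from math import isqrt


def small_primes(limit):
    """Plain sieve returning all primes <= limit (used for the base primes)."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(flags) if flag]


def segmented_primes(limit, segment_size=1024):
    """Yield primes <= limit (limit >= 0), sieving one small segment at a time."""
    root = isqrt(limit)
    base = small_primes(root)              # primes needed to cull every segment
    yield from base
    lo = root + 1
    while lo <= limit:
        hi = min(lo + segment_size - 1, limit)
        flags = bytearray([1]) * (hi - lo + 1)
        for p in base:
            if p * p > hi:
                break
            # first multiple of p inside [lo, hi], but never below p*p
            start = max(p * p, (lo + p - 1) // p * p)
            for m in range(start, hi + 1, p):
                flags[m - lo] = 0
        for offset, flag in enumerate(flags):
            if flag:
                yield lo + offset
        lo = hi + 1
```

Only the base primes and one segment are held in memory at a time, which is the property that keeps the memory use low as the range grows.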


=={{header|Perl 6}}==
=={{header|Phix}}==
{{Trans|Euphoria}}
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">constant</span> <span style="color: #000000;">limit</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1000</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">primes</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">flags</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">limit</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">floor</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sqrt</span><span style="color: #0000FF;">(</span><span style="color: #000000;">limit</span><span style="color: #0000FF;">))</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">flags</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">then</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">i</span><span style="color: #0000FF;">*</span><span style="color: #000000;">i</span> <span style="color: #008080;">to</span> <span style="color: #000000;">limit</span> <span style="color: #008080;">by</span> <span style="color: #000000;">i</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">flags</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #000000;">limit</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">flags</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">primes</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">i</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #7060A8;">pp</span><span style="color: #0000FF;">(</span><span style="color: #000000;">primes</span><span style="color: #0000FF;">,{</span><span style="color: #004600;">pp_Maxlen</span><span style="color: #0000FF;">,</span><span style="color: #000000;">77</span><span style="color: #0000FF;">})</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,
193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,
293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,
409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,
521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,
641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,
757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,
881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997}
</pre>
See also [[Sexy_primes#Phix]] where the sieve is more useful than a list of primes.<br>
Most applications should use the builtins, eg <code>get_primes(-get_maxprime(1000*1000))</code> or <code>get_primes_le(1000)</code> both give exactly the same output as above.


=={{header|Phixmonti}}==
<lang perl6>sub sieve( Int $limit ) {
<syntaxhighlight lang="phixmonti">include ..\Utilitys.pmt
my @is-prime = False, False, slip True xx $limit - 1;


def sequence /# ( ini end [step] ) #/
gather for @is-prime.kv -> $number, $is-prime {
if $is-prime {
( ) swap for 0 put endfor
enddef
take $number;
loop (my $s = $number**2; $s <= $limit; $s += $number) {
@is-prime[$s] = False;
}
}
}
}


1000 var limit
(sieve 100).join(",").say;</lang>


( 1 limit ) sequence
=== A set-based approach ===


( 2 limit ) for >ps
More or less the same as the first Python example:
( tps dup * limit tps ) for
<lang perl6>sub eratsieve($n) {
dup limit < if 0 swap set else drop endif
# Requires n(1 - 1/(log(n-1))) storage
endfor
my $multiples = set();
cps
gather for 2..$n -> $i {
endfor
unless $i (&) $multiples { # is subset
( 1 limit 0 ) remove
take $i;
pstack</syntaxhighlight>
$multiples (+)= set($i**2, *+$i ... (* > $n)); # union
}
}
}


Another solution
say flat eratsieve(100);</lang>
<syntaxhighlight lang="phixmonti">include ..\Utilitys.pmt
This gives:
1000


( "Primes in " over ": " ) lprint
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)


2 swap 2 tolist for >ps
=={{header|Phix}}==
2
{{Trans|Euphoria}}
dup tps < while
<lang Phix>constant limit = 1000
tps over mod 0 == if false else 1 + true endif
sequence primes = {}
over tps < and
sequence flags = repeat(1, limit)
endwhile
for i=2 to floor(sqrt(limit)) do
if flags[i] then
tps < ps> swap if drop endif
endfor
for k=i*i to limit by i do

flags[k] = 0
pstack</syntaxhighlight>
end for
end if
end for
for i=2 to limit do
if flags[i] then
primes &= i
end if
end for
? primes</lang>
{{out}}
{{out}}
<pre>
<pre>Primes in 1000:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]
{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,

179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,
=== Press any key to exit ===</pre>
373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,
587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,
809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997}
</pre>


=={{header|PHP}}==
=={{header|PHP}}==
<lang php>
<syntaxhighlight lang="php">
function iprimes_upto($limit)
function iprimes_upto($limit)
{
{
Line 7,109: Line 15,096:
return $primes;
return $primes;
}
}
</lang>


echo wordwrap(
'Primes less or equal than 1000 are : ' . PHP_EOL .
implode(' ', array_keys(iprimes_upto(1000), true, true)),
100
);
</syntaxhighlight>

{{out}}
<pre>Primes less or equal than 1000 are :
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131
137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269
271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421
431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743
751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919
929 937 941 947 953 967 971 977 983 991 997</pre>

=={{header|Picat}}==
The SoE is provided in the standard library, defined as follows:
<syntaxhighlight lang="picat">
primes(N) = L =>
A = new_array(N),
foreach(I in 2..floor(sqrt(N)))
if (var(A[I])) then
foreach(J in I**2..I..N)
A[J]=0
end
end
end,
L=[I : I in 2..N, var(A[I])].
</syntaxhighlight>
{{Out}}
<pre>
Picat> L = math.primes(100).
L = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
yes
</pre>
=={{header|PicoLisp}}==
=={{header|PicoLisp}}==
<lang PicoLisp>(de sieve (N)
<syntaxhighlight lang="picolisp">(de sieve (N)
(let Sieve (range 1 N)
(let Sieve (range 1 N)
(set Sieve)
(set Sieve)
Line 7,119: Line 15,142:
(for (S (nth Sieve (* I I)) S (nth (cdr S) I))
(for (S (nth Sieve (* I I)) S (nth (cdr S) I))
(set S) ) ) )
(set S) ) ) )
(filter bool Sieve) ) )</lang>
(filter bool Sieve) ) )</syntaxhighlight>
Output:
Output:
<pre>: (sieve 100)
<pre>: (sieve 100)
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)</pre>
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)</pre>
===Alternate Version Using a 2x3x5x7 Wheel===
This works by destructively modifying the CDR of the previous cell when it finds a composite number. For sieving large sets (e.g. 1,000,000) it's much faster than the above.
<syntaxhighlight lang="picolisp">
(setq WHEEL-2357
(2 4 2 4 6 2 6 4
2 4 6 6 2 6 4 2
6 4 6 8 4 2 4 2
4 8 6 4 6 2 4 6
2 6 6 4 2 4 6 2
6 4 2 4 2 10 2 10 .))

(de roll2357wheel (Limit)
(let W WHEEL-2357
(make
(for (N 11 (<= N Limit) (+ N (pop 'W)))
(link N)))))

(de sqr (X) (* X X))

(de remove-multiples (L)
(let (N (car L) M (* N N) P L Q (cdr L))
(while Q
(let A (car Q)
(until (>= M A)
(setq M (+ M N)))
(when (= A M)
(con P (cdr Q))))
(setq P Q Q (cdr Q)))))


(de sieve (Limit)
(let Sieve (roll2357wheel Limit)
(for (P Sieve (<= (sqr (car P)) Limit) (cdr P))
(remove-multiples P))
(append (2 3 5 7) Sieve)))
</syntaxhighlight>
{{Out}}
<pre>
: (sieve 100)
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
: (filter '((N) (> N 900)) (sieve 1000))
-> (907 911 919 929 937 941 947 953 967 971 977 983 991 997)
: (last (sieve 1000000))
-> 999983
</pre>
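
The way a 2x3x5x7 wheel produces its candidates can also be sketched in Python. The sketch below derives the 48 gaps instead of writing them out, and it marks composites in a set rather than unlinking list cells as the PicoLisp code does, so it illustrates the wheel only; the names <code>GAPS</code>, <code>wheel_candidates</code> and <code>wheel_sieve</code> are hypothetical.

```python
from itertools import cycle
from math import gcd

MODULUS = 2 * 3 * 5 * 7                    # 210, one full turn of the wheel

# Numbers from 11 to 221 that are coprime to 210, and the gaps between them.
_SPOKES = [r for r in range(11, 11 + MODULUS + 1) if gcd(r, MODULUS) == 1]
GAPS = [b - a for a, b in zip(_SPOKES, _SPOKES[1:])]


def wheel_candidates(limit):
    """Yield 11, 13, 17, ...: every number <= limit not divisible by 2, 3, 5 or 7."""
    n = 11
    for gap in cycle(GAPS):
        if n > limit:
            return
        yield n
        n += gap


def wheel_sieve(limit):
    """Return the primes <= limit, sieving only the wheel candidates."""
    candidates = list(wheel_candidates(limit))
    composite = set()
    for p in candidates:
        if p * p > limit:
            break
        if p in composite:
            continue
        # odd multiples of p starting at p*p; even ones are never candidates
        composite.update(range(p * p, limit + 1, 2 * p))
    small = [p for p in (2, 3, 5, 7) if p <= limit]
    return small + [c for c in candidates if c not in composite]
```

The derived <code>GAPS</code> list starts 2, 4, 2, 4, 6, 2, 6, 4 and sums to 210, matching the circular list used in the PicoLisp version.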


=={{header|PL/I}}==
=={{header|PL/I}}==
<lang pli>eratos: proc options (main) reorder;
<syntaxhighlight lang="pli">eratos: proc options (main) reorder;


dcl i fixed bin (31);
dcl i fixed bin (31);
Line 7,161: Line 15,229:
end;
end;
end;
end;
end eratos;</lang>
end eratos;</syntaxhighlight>

=={{header|PL/M}}==
<syntaxhighlight lang="plm">100H:

DECLARE PRIME$MAX LITERALLY '5000';

/* CREATE SIEVE OF GIVEN SIZE */
MAKE$SIEVE: PROCEDURE(START, SIZE);
DECLARE (START, SIZE, M, N) ADDRESS;
DECLARE PRIME BASED START BYTE;
PRIME(0)=0; /* 0 AND 1 ARE NOT PRIMES */
PRIME(1)=0;
DO N=2 TO SIZE;
PRIME(N)=1; /* ASSUME ALL OTHERS ARE PRIME AT BEGINNING */
END;
DO N=2 TO SIZE;
IF PRIME(N) THEN DO; /* IF A NUMBER IS PRIME... */
DO M=N*N TO SIZE BY N;
PRIME(M) = 0; /* THEN ITS MULTIPLES ARE NOT */
END;
END;
END;
END MAKE$SIEVE;

/* CP/M CALLS */
BDOS: PROCEDURE(FUNC, ARG);
DECLARE FUNC BYTE, ARG ADDRESS;
GO TO 5;
END BDOS;

DECLARE BDOS$EXIT LITERALLY '0',
BDOS$PRINT LITERALLY '9';

/* PRINT A 16-BIT NUMBER */
PRINT$NUMBER: PROCEDURE(N);
DECLARE (N, P) ADDRESS;
DECLARE S (8) BYTE INITIAL ('.....',10,13,'$');
DECLARE C BASED P BYTE;
P = .S(5);
DIGIT:
P = P - 1;
C = (N MOD 10) + '0';
N = N / 10;
IF N > 0 THEN GO TO DIGIT;
CALL BDOS(BDOS$PRINT, P);
END PRINT$NUMBER;

/* PRINT ALL PRIMES UP TO N */
PRINT$PRIMES: PROCEDURE(N, SIEVE);
DECLARE (I, N, SIEVE) ADDRESS;
DECLARE PRIME BASED SIEVE BYTE;
CALL MAKE$SIEVE(SIEVE, N);
DO I = 2 TO N;
IF PRIME(I) THEN CALL PRINT$NUMBER(I);
END;
END PRINT$PRIMES;

CALL PRINT$PRIMES(PRIME$MAX, .MEMORY);

CALL BDOS(BDOS$EXIT, 0);
EOF</syntaxhighlight>
{{out}}
<pre>2
3
5
7
11
....
4967
4969
4973
4987
4999</pre>

=={{header|PL/SQL}}==
<syntaxhighlight lang="plsql">create or replace package sieve_of_eratosthenes as
type array_of_booleans is varray(100000000) of boolean;
type table_of_integers is table of integer;
function find_primes (n number) return table_of_integers pipelined;
end sieve_of_eratosthenes;
/

create or replace package body sieve_of_eratosthenes as
function find_primes (n number) return table_of_integers pipelined is
flag array_of_booleans;
ptr integer;
i integer;
begin
flag := array_of_booleans(false, true);
flag.extend(n - 2, 2);
ptr := 1;
<< outer_loop >>
while ptr * ptr <= n loop
while not flag(ptr) loop
ptr := ptr + 1;
end loop;
i := ptr * ptr;
while i <= n loop
flag(i) := false;
i := i + ptr;
end loop;
ptr := ptr + 1;
end loop outer_loop;
for i in 1 .. n loop
if flag(i) then
pipe row (i);
end if;
end loop;
return;
end find_primes;
end sieve_of_eratosthenes;
/</syntaxhighlight>

Usage:

<syntaxhighlight lang="sql">select column_value as prime_number
from table(sieve_of_eratosthenes.find_primes(30));

PRIME_NUMBER
------------
2
3
5
7
11
13
17
19
23
29

10 rows selected.

Elapsed: 00:00:00.01

select count(*) as number_of_primes, sum(column_value) as sum_of_primes
from table(sieve_of_eratosthenes.find_primes(1e7));

NUMBER_OF_PRIMES SUM_OF_PRIMES
---------------- ---------------
664579 3203324994356

Elapsed: 00:00:02.60</syntaxhighlight>

=={{header|Pony}}==
<syntaxhighlight lang="pony">use "time" // for testing
use "collections"

class Primes is Iterator[U32] // returns an Iterator of found primes...
let _bitmask: Array[U8] = [ 1; 2; 4; 8; 16; 32; 64; 128 ]
var _lmt: USize
let _cmpsts: Array[U8]
var _ndx: USize = 2
var _curr: U32 = 2
new create(limit: U32) ? =>
_lmt = USize.from[U32](limit)
let sqrtlmt = USize.from[F64](F64.from[U32](limit).sqrt())
_cmpsts = Array[U8].init(0, (_lmt + 8) / 8) // already zeroed; bit array
_cmpsts(0)? = 3 // mark 0 and 1 as not prime!
if sqrtlmt < 2 then return end
for p in Range[USize](2, sqrtlmt + 1) do
if (_cmpsts(p >> 3)? and _bitmask(p and 7)?) == 0 then
var s = p * p // cull start address for p * p!
let slmt = (s + (p << 3)).min(_lmt + 1)
while s < slmt do
let msk = _bitmask(s and 7)?
var c = s >> 3
while c < _cmpsts.size() do
_cmpsts(c)? = _cmpsts(c)? or msk
c = c + p
end
s = s + p
end
end
end

fun ref has_next(): Bool val => _ndx < (_lmt + 1)
fun ref next(): U32 ? =>
_curr = U32.from[USize](_ndx); _ndx = _ndx + 1
while (_ndx <= _lmt) and ((_cmpsts(_ndx >> 3)? and _bitmask(_ndx and 7)?) != 0) do
_ndx = _ndx + 1
end
_curr

actor Main
new create(env: Env) =>
let limit: U32 = 1_000_000_000
try
env.out.write("Primes to 100: ")
for p in Primes(100)? do env.out.write(p.string() + " ") end
var count: I32 = 0
for p in Primes(1_000_000)? do count = count + 1 end
env.out.print("\nThere are " + count.string() + " primes to a million.")
let t = Time
let start = t.millis()
let prms = Primes(limit)?
let elpsd = t.millis() - start
count = 0
for _ in prms do count = count + 1 end
env.out.print("Found " + count.string() + " primes to " + limit.string() + ".")
env.out.print("This took " + elpsd.string() + " milliseconds.")
end</syntaxhighlight>
{{out}}
<pre>Primes to 100: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes to a million.
Found 50847534 primes to 1000000000.
This took 28123 milliseconds.</pre>

Note to users: a naive monolithic sieve (one huge array) is suitable only for trivial use, sieving ranges up to a few million. Beyond that, cache locality becomes a serious problem: the size of the array (even bit-packed at one bit per number, as here) limits the maximum range that can be sieved, and "cache thrashing" limits the speed.

For extended ranges, a Page Segmented version should be used. In addition, for ranges in the billions, it wastes available computing resources not to use the multi-threading of a modern CPU, at which Pony would do very well with its built-in Actor concurrency model.

These versions use "loop unpeeling" (not full loop unrolling). For each base prime up to the square root of the limit, the bit masks applied to successive bytes repeat in a modulo pattern of eight, so the culling can be "unpeeled" into eight loops, each of which culls with a constant bit mask over the whole range. For smaller ranges where the speed is not limited by "cache thrashing", this can provide about a factor-of-two speed-up.
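
As a rough illustration of the page-segmented approach recommended above, here is a sketch in Python rather than Pony. It is not part of the original entry, the name <code>segmented_primes</code> and the default <code>segment_size</code> are assumptions chosen for illustration, and it uses one byte per number instead of bit packing. Only the base primes up to the square root of the limit and one page are held in memory at a time:

```python
from math import isqrt

def segmented_primes(limit, segment_size=32768):
    """Yield the primes <= limit, sieving one fixed-size page at a time."""
    if limit < 2:
        return
    root = isqrt(limit)
    # Small monolithic sieve for the base primes up to sqrt(limit).
    base = bytearray([1]) * (root + 1)
    base[0:2] = b"\x00\x00"
    for n in range(2, isqrt(root) + 1):
        if base[n]:
            base[n * n::n] = bytes(len(range(n * n, root + 1, n)))
    base_primes = [n for n in range(2, root + 1) if base[n]]
    yield from base_primes
    # Sieve the rest of the range page by page.
    low = root + 1
    while low <= limit:
        high = min(low + segment_size - 1, limit)
        page = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the page, but never below p * p.
            start = max(p * p, (low + p - 1) // p * p)
            for m in range(start, high + 1, p):
                page[m - low] = 0
        for offset, flag in enumerate(page):
            if flag:
                yield low + offset
        low = high + 1

print(list(segmented_primes(100)))
```

Because each page is independent once the base primes are known, pages could in principle be handed to separate workers, which is the kind of job the Actor model mentioned above is meant for.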

===Alternate Odds-Only version of the above===

It would be a waste not to make the trivial changes to the above code needed to sieve odds only, which is about two and a half times faster because of the reduced number of culling operations. It does little about the huge array problem, though, other than halving the size of the array.

<syntaxhighlight lang="pony">use "time" // for testing
use "collections"

class Primes is Iterator[U32] // returns an Iterator of found primes...
let _bitmask: Array[U8] = [ 1; 2; 4; 8; 16; 32; 64; 128 ]
var _lmti: USize
let _cmpsts: Array[U8]
var _ndx: USize = 0
var _curr: U32 = 0
new create(limit: U32) ? =>
if limit < 3 then _lmti = 0; _cmpsts = Array[U8](); return end
_lmti = USize.from[U32]((limit - 3) / 2)
let sqrtlmti = (USize.from[F64](F64.from[U32](limit).sqrt()) - 3) / 2
_cmpsts = Array[U8].init(0, (_lmti + 8) / 8) // already zeroed; bit array
for i in Range[USize](0, sqrtlmti + 1) do
if (_cmpsts(i >> 3)? and _bitmask(i and 7)?) == 0 then
let p = i + i + 3
var s = ((i << 1) * (i + 3)) + 3 // cull start address for p * p!
let slmt = (s + (p << 3)).min(_lmti + 1)
while s < slmt do
let msk = _bitmask(s and 7)?
var c = s >> 3
while c < _cmpsts.size() do
_cmpsts(c)? = _cmpsts(c)? or msk
c = c + p
end
s = s + p
end
end
end

fun ref has_next(): Bool val => _ndx < (_lmti + 1)
fun ref next(): U32 ? =>
if _curr < 1 then _curr = 3; if _lmti == 0 then _ndx = 1 end; return 2 end
_curr = U32.from[USize](_ndx + _ndx + 3); _ndx = _ndx + 1
while (_ndx <= _lmti) and ((_cmpsts(_ndx >> 3)? and _bitmask(_ndx and 7)?) != 0) do
_ndx = _ndx + 1
end
_curr

actor Main
new create(env: Env) =>
let limit: U32 = 1_000_000_000
try
env.out.write("Primes to 100: ")
for p in Primes(100)? do env.out.write(p.string() + " ") end
var count: I32 = 0
for p in Primes(1_000_000)? do count = count + 1 end
env.out.print("\nThere are " + count.string() + " primes to a million.")
let t = Time
let start = t.millis()
let prms = Primes(limit)?
let elpsd = t.millis() - start
count = 0
for _ in prms do count = count + 1 end
env.out.print("Found " + count.string() + " primes to " + limit.string() + ".")
env.out.print("This took " + elpsd.string() + " milliseconds.")
end</syntaxhighlight>
The output is the same as the above, except that it runs about two and a half times faster because it performs correspondingly fewer culling operations.
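
The index arithmetic used by the odds-only code can be checked separately. The following Python sketch (the helper names are hypothetical, chosen only for illustration) assumes the same mapping as the Pony code, where bit index <code>i</code> stands for the odd number <code>2i + 3</code>, and confirms that <code>((i << 1) * (i + 3)) + 3</code> is the index of <code>p * p</code>:

```python
def odd_value(i):
    """Number represented by bit index i in an odds-only sieve (index 0 is 3)."""
    return 2 * i + 3

def square_index(i):
    """Bit index of p * p, where p = odd_value(i)."""
    p = odd_value(i)
    return (p * p - 3) // 2

def cull_indices(i, limit_index):
    """Bit indices culled for the prime at index i.

    Stepping the index by p advances the value by 2 * p, so even
    multiples are never visited.
    """
    return range(square_index(i), limit_index + 1, odd_value(i))

# The shortcut used in the Pony code agrees with the direct formula.
print(all(square_index(i) == ((i << 1) * (i + 3)) + 3 for i in range(1000)))
# Multiples of 3 culled up to bit index 12 (the number 27).
print([odd_value(j) for j in cull_indices(0, 12)])
```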


=={{header|Pop11}}==
=={{header|Pop11}}==
Line 7,177: Line 15,528:
enddefine;
enddefine;
</pre>
</pre>



=={{header|PowerShell}}==
=={{header|PowerShell}}==
Line 7,183: Line 15,533:
===Basic procedure===
===Basic procedure===
It outputs immediately so that the number can be used by the pipeline.
It outputs immediately so that the number can be used by the pipeline.
<lang PowerShell>function Sieve ( [int] $num )
<syntaxhighlight lang="powershell">function Sieve ( [int] $num )
{
{
$isprime = @{}
$isprime = @{}
Line 7,194: Line 15,544:
{ $isprime[$i] = $false }
{ $isprime[$i] = $false }
}
}
}</lang>
}</syntaxhighlight>
===Another implementation===
===Another implementation===
<syntaxhighlight lang="powershell">
<lang PowerShell>
function eratosthenes ($n) {
function eratosthenes ($n) {
if($n -ge 1){
if($n -ge 1){
Line 7,211: Line 15,561:
1..$n | where{$prime[$_]}
1..$n | where{$prime[$_]}
} else {
} else {
"$n must be equal or greater than 1"
Write-Warning "$n is less than 1"
}
}
}
}
"$(eratosthenes 100)"
"$(eratosthenes 100)"
</syntaxhighlight>
</lang>
<b>Output:</b>
<b>Output:</b>
<pre>
<pre>
Line 7,222: Line 15,572:


=={{header|Processing}}==
=={{header|Processing}}==
Calculate the primes up to 1000000 with Processing, including a visualisation of the process. As an additional visual effect, the layout of the pixel could be changed from the line-by-line layout to a spiral-like layout starting in the middle of the screen.
Calculate the primes up to 1000000 with Processing, including a visualisation of the process.
<lang java>int maxx,maxy;
<syntaxhighlight lang="java">int i=2;
int maxx;
int maxy;
int max;
int max;
boolean[] sieve;
boolean[] sieve;

void plot(int pos, boolean active) {
set(pos%maxx,pos/maxx, active?#000000:#ffffff);
}
void setup() {
void setup() {
size(1000, 1000, P2D);
size(1000, 1000);
frameRate(2);
// frameRate(2);
maxx=width;
maxx=width;
maxy=height;
maxy=height;
max=width*height;
max=width*height;
sieve=new boolean[max+1];
sieve=new boolean[max+1];

sieve[1]=false;
sieve[1]=false;
plot(0,false);
plot(0, false);
plot(1,false);
plot(1, false);
for(int i=2;i<=max;i++) {
for (int i=2; i<=max; i++) {
sieve[i]=true;
sieve[i]=true;
plot(i,true);
plot(i, true);
}
}
}
}

int i=2;
void draw() {
void draw() {
if(!sieve[i]) {
if (!sieve[i]) {
while(i*i<max && !sieve[i]) {
while (i*i<max && !sieve[i]) {
i++;
i++;
}
}
}
}
if(sieve[i]) {
if (sieve[i]) {
print(i+" ");
print(i+" ");
for(int j=i*i;j<=max;j+=i) {
for (int j=i*i; j<=max; j+=i) {
if(sieve[j]) {
if (sieve[j]) {
sieve[j]=false;
sieve[j]=false;
plot(j,false);
plot(j, false);
}
}
}
}
}
}
if(i*i<max) {
if (i*i<max) {
i++;
i++;
} else {
} else {
Line 7,271: Line 15,617:
println("finished");
println("finished");
}
}
}
}</lang>

void plot(int pos, boolean active) {
set(pos%maxx, pos/maxx, active?#000000:#ffffff);
}</syntaxhighlight>

As an additional visual effect, the layout of the pixels could be changed from line-by-line to a spiral-like layout starting in the middle of the screen.
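
One possible way to obtain such a spiral layout is sketched below in plain Python (not Processing code). The function <code>spiral_coords</code> is a hypothetical helper, not part of the entry. It returns offsets relative to the centre of the screen, so a <code>plot</code> routine would add <code>maxx/2</code> and <code>maxy/2</code> before setting the pixel:

```python
def spiral_coords(count):
    """Return the first `count` (x, y) offsets of a square spiral from (0, 0)."""
    coords = []
    x = y = 0
    dx, dy = 1, 0      # current direction of travel
    run_length = 1     # cells walked before turning; grows every two turns
    while len(coords) < count:
        for _ in range(2):
            for _ in range(run_length):
                if len(coords) == count:
                    return coords
                coords.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # quarter turn
        run_length += 1
    return coords

print(spiral_coords(9))
```

Position <code>n</code> of the sieve would then be drawn at <code>spiral_coords(max + 1)[n]</code>, computed once in <code>setup()</code>.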

==={{header|Processing Python mode}}===

<syntaxhighlight lang="python">from __future__ import print_function

i = 2

def setup():
size(1000, 1000)
# frameRate(2)
global maxx, maxy, max_num, sieve
maxx = width
maxy = height
max_num = width * height
sieve = [False] * (max_num + 1)

sieve[1] = False
plot(0, False)
plot(1, False)
for i in range(2, max_num + 1):
sieve[i] = True
plot(i, True)


def draw():
global i
if not sieve[i]:
while (i * i < max_num and not sieve[i]):
i += 1

if sieve[i]:
print("{} ".format(i), end = '')
for j in range(i * i, max_num + 1, i):
if sieve[j]:
sieve[j] = False
plot(j, False)

if i * i < max_num:
i += 1
else:
noLoop()
println("finished")


def plot(pos, active):
set(pos % maxx, pos / maxx, color(0) if active else color(255))</syntaxhighlight>


=={{header|Prolog}}==
=={{header|Prolog}}==
===Using lists===
===Using lists===
====Basic bounded sieve====
====Basic bounded sieve====
<lang Prolog>primes(N, L) :- numlist(2, N, Xs),
<syntaxhighlight lang="prolog">primes(N, L) :- numlist(2, N, Xs),
sieve(Xs, L).
sieve(Xs, L).


Line 7,289: Line 15,687:
; H3 is H2 + H,
; H3 is H2 + H,
( H1 =:= H2 -> filter(H, H3, T, R)
( H1 =:= H2 -> filter(H, H3, T, R)
; filter(H, H3, [H1|T], R) ) ).</lang>
; filter(H, H3, [H1|T], R) ) ).</syntaxhighlight>


{{out}}
{{out}}
Line 7,304: Line 15,702:
This is actually Euler's variant of the sieve of Eratosthenes, which generates (and thus removes) each multiple only once, though this implementation is sub-optimal.
This is actually Euler's variant of the sieve of Eratosthenes, which generates (and thus removes) each multiple only once, though this implementation is sub-optimal.


<lang Prolog>primes(X, PS) :- X > 1, range(2, X, R), sieve(R, PS).
<syntaxhighlight lang="prolog">primes(X, PS) :- X > 1, range(2, X, R), sieve(R, PS).


range(X, X, [X]) :- !.
range(X, X, [X]) :- !.
Line 7,317: Line 15,715:
remove( _, [], [] ) :- !.
remove( _, [], [] ) :- !.
remove( [H | X], [H | Y], R ) :- !, remove(X, Y, R).
remove( [H | X], [H | Y], R ) :- !, remove(X, Y, R).
remove( X, [H | Y], [H | R]) :- remove(X, Y, R). </lang>
remove( X, [H | Y], [H | R]) :- remove(X, Y, R). </syntaxhighlight>


Running in SWI Prolog,
Running in SWI Prolog,
Line 7,331: Line 15,729:
We can stop early, with massive improvement in complexity (below ~ <i>n<sup>1.5</sup></i> inferences, empirically, vs. the ~ <i>n<sup>2</sup></i> of the above, in ''n'' primes produced; showing only the modified predicates):
We can stop early, with massive improvement in complexity (below ~ <i>n<sup>1.5</sup></i> inferences, empirically, vs. the ~ <i>n<sup>2</sup></i> of the above, in ''n'' primes produced; showing only the modified predicates):


<lang Prolog>primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).
<syntaxhighlight lang="prolog">primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).


sieve(X, [H | T], [H | T]) :- H*H > X, !.
sieve(X, [H | T], [H | T]) :- H*H > X, !.
sieve(X, [H | T], [H | S]) :- maplist( mult(H), [H | T], MS),
sieve(X, [H | T], [H | S]) :- maplist( mult(H), [H | T], MS),
remove(MS, T, R), sieve(X, R, S).</lang>
remove(MS, T, R), sieve(X, R, S).</syntaxhighlight>


{{out}}
{{out}}
Line 7,347: Line 15,745:
Optimized by stopping early, traditional sieve of Eratosthenes generating multiples by iterated addition.
Optimized by stopping early, traditional sieve of Eratosthenes generating multiples by iterated addition.


<lang Prolog>primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).
<syntaxhighlight lang="prolog">primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).


range(X, X, [X]) :- !.
range(X, X, [X]) :- !.
Line 7,362: Line 15,760:
remove( [H | X], [H | Y], R ) :- !, remove(X, Y, R).
remove( [H | X], [H | Y], R ) :- !, remove(X, Y, R).
remove( [H | X], [G | Y], R ) :- H < G, !, remove(X, [G | Y], R).
remove( [H | X], [G | Y], R ) :- H < G, !, remove(X, [G | Y], R).
remove( X, [H | Y], [H | R]) :- remove(X, Y, R). </lang>
remove( X, [H | Y], [H | R]) :- remove(X, Y, R). </syntaxhighlight>


{{out}}
{{out}}
<pre> ?- time(( primes(7920,X), length(X,N) )).
<pre>?- time(( primes(7920,X), length(X,N) )).
% 140,654 inferences, 0.016 CPU in 0.011 seconds (142% CPU, 9016224 Lips)
% 140,654 inferences, 0.016 CPU in 0.011 seconds (142% CPU, 9016224 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.
N = 1000.
</pre>
</pre>

====Sift the Two's and Sift the Three's====
Another version, based on Clocksin & Mellish p.175, modified to stop early, to work with odds only, and to use addition in the removing predicate instead of the <code>mod</code> testing used in the original:
<syntaxhighlight lang="prolog">primes(N,[]):- N < 2, !.
primes(N,[2|R]):- ints(3,N,L), sift(N,L,R).
ints(A,B,[A|C]):- A=<B -> D is A+2, ints(D,B,C).
ints(_,_,[]).
sift(_,[],[]).
sift(N,[A|B],[A|C]):- A*A =< N -> rmv(A,B,D), sift(N,D,C)
; C=B.
rmv(A,B,D):- M is A*A, rmv(A,M,B,D).
rmv(_,_,[],[]).
rmv(P,M,[A|B],C):- ( M>A -> C=[A|D], rmv(P,M,B,D)
; M==A -> M2 is M+2*P, rmv(P,M2,B,C)
; M<A -> M2 is M+2*P, rmv(P,M2,[A|B],C)
).</syntaxhighlight>

Runs at about n^1.4 time empirically, producing 20,000 primes in 1.4 secs [https://swish.swi-prolog.org/p/modified_C&M_SoE.pl on the SWISH platform] as of 2021-11-26.


===Using lazy lists===
===Using lazy lists===
Line 7,377: Line 15,793:
====Basic variant====
====Basic variant====


<lang prolog>primes(PS):- count(2, 1, NS), sieve(NS, PS).
<syntaxhighlight lang="prolog">primes(PS):- count(2, 1, NS), sieve(NS, PS).


count(N, D, [N|T]):- freeze(T, (N2 is N+D, count(N2, D, T))).
count(N, D, [N|T]):- freeze(T, (N2 is N+D, count(N2, D, T))).
Line 7,387: Line 15,803:
remove([A|T],[B|S],R):- A < B -> remove(T,[B|S],R) ;
remove([A|T],[B|S],R):- A < B -> remove(T,[B|S],R) ;
A=:=B -> remove(T,S,R) ;
A=:=B -> remove(T,S,R) ;
R = [B|R2], freeze(R2, remove([A|T], S, R2)).</lang>
R = [B|R2], freeze(R2, remove([A|T], S, R2)).</syntaxhighlight>


{{out}}
{{out}}
Line 7,399: Line 15,815:
====Optimized by postponed removal====
====Optimized by postponed removal====
Showing only changed predicates.
Showing only changed predicates.
<lang prolog>primes([2|PS]):-
<syntaxhighlight lang="prolog">primes([2|PS]):-
freeze(PS, (primes(BPS), count(3, 1, NS), sieve(NS, BPS, 4, PS))).
freeze(PS, (primes(BPS), count(3, 1, NS), sieve(NS, BPS, 4, PS))).


Line 7,405: Line 15,821:
N < Q -> PS = [N|PS2], freeze(PS2, sieve(NS, BPS, Q, PS2))
N < Q -> PS = [N|PS2], freeze(PS2, sieve(NS, BPS, Q, PS2))
; BPS = [BP,BP2|BPS2], Q2 is BP2*BP2, count(Q, BP, MS),
; BPS = [BP,BP2|BPS2], Q2 is BP2*BP2, count(Q, BP, MS),
remove(MS, NS, R), sieve(R, [BP2|BPS2], Q2, PS). </lang>
remove(MS, NS, R), sieve(R, [BP2|BPS2], Q2, PS). </syntaxhighlight>


{{out}}
{{out}}
Line 7,426: Line 15,842:
to record integers that are found to be composite.
to record integers that are found to be composite.


<lang Prolog>% %sieve( +N, -Primes ) is true if Primes is the list of consecutive primes
<syntaxhighlight lang="prolog">% %sieve( +N, -Primes ) is true if Primes is the list of consecutive primes
% that are less than or equal to N
% that are less than or equal to N
sieve( N, [2|Rest]) :-
sieve( N, [2|Rest]) :-
Line 7,461: Line 15,877:


:- dynamic( composite/1 ).
:- dynamic( composite/1 ).
</syntaxhighlight>
</lang>
The above has been tested with SWI-Prolog and gprolog.
The above has been tested with SWI-Prolog and gprolog.


<lang Prolog>% SWI-Prolog:
<syntaxhighlight lang="prolog">% SWI-Prolog:


?- time( (sieve(100000,P), length(P,N), writeln(N), last(P, LP), writeln(LP) )).
?- time( (sieve(100000,P), length(P,N), writeln(N), last(P, LP), writeln(LP) )).
Line 7,471: Line 15,887:
N = 9592,
N = 9592,
LP = 99991.
LP = 99991.
</syntaxhighlight>
</lang>


==== Optimized approach====
==== Optimized approach====
[http://ideone.com/WDC7z Works with SWI-Prolog].
[http://ideone.com/WDC7z Works with SWI-Prolog].


<lang Prolog>sieve(N, [2|PS]) :- % PS is list of odd primes up to N
<syntaxhighlight lang="prolog">sieve(N, [2|PS]) :- % PS is list of odd primes up to N
retractall(mult(_)),
retractall(mult(_)),
sieve_O(3,N,PS).
sieve_O(3,N,PS).
Line 7,505: Line 15,921:
:- dynamic( mult/1 ).
:- dynamic( mult/1 ).
:- main(100000), main(1000000).</lang>
:- main(100000), main(1000000).</syntaxhighlight>


Running it produces
Running it produces


<lang Prolog>%% stdout copy
<syntaxhighlight lang="prolog">%% stdout copy
[9592, 99991]
[9592, 99991]
[78498, 999983]
[78498, 999983]
Line 7,515: Line 15,931:
%% stderr copy
%% stderr copy
% 293,176 inferences, 0.14 CPU in 0.14 seconds (101% CPU, 2094114 Lips)
% 293,176 inferences, 0.14 CPU in 0.14 seconds (101% CPU, 2094114 Lips)
% 3,122,303 inferences, 1.63 CPU in 1.67 seconds (97% CPU, 1915523 Lips)</lang>
% 3,122,303 inferences, 1.63 CPU in 1.67 seconds (97% CPU, 1915523 Lips)</syntaxhighlight>


which indicates <i>~ N<sup>1.1</sup></i> [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirical orders of growth], which is consistent with the ''O(N log log N)'' theoretical runtime complexity.
which indicates <i>~ N<sup>1.1</sup></i> [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirical orders of growth], which is consistent with the ''O(N log log N)'' theoretical runtime complexity.
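
The empirical order of growth quoted above can be reproduced from the two timings. A minimal sketch in Python (the function name <code>empirical_order</code> is an assumption for illustration): if the cost grows as <i>N<sup>k</sup></i>, then <i>k</i> = log(<i>t</i><sub>2</sub>/<i>t</i><sub>1</sub>) / log(<i>N</i><sub>2</sub>/<i>N</i><sub>1</sub>).

```python
from math import log

def empirical_order(n1, t1, n2, t2):
    """Estimate k in cost ~ n**k from two (problem size, cost) measurements."""
    return log(t2 / t1) / log(n2 / n1)

# CPU seconds reported in the stderr copy above.
by_time = empirical_order(100000, 0.14, 1000000, 1.63)
# Inference counts reported in the stderr copy above.
by_inferences = empirical_order(100000, 293176, 1000000, 3122303)
print(round(by_time, 2), round(by_inferences, 2))
```

The estimate from CPU time comes out slightly higher than the one from inference counts; both are close to 1, as expected for a sieve of Eratosthenes.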
Line 7,523: Line 15,939:
Uses a priority queue, from the paper "The Genuine Sieve of Eratosthenes" by Melissa O'Neill. Works with YAP (Yet Another Prolog).
Uses a priority queue, from the paper "The Genuine Sieve of Eratosthenes" by Melissa O'Neill. Works with YAP (Yet Another Prolog).


<lang Prolog>?- use_module(library(heaps)).
<syntaxhighlight lang="prolog">?- use_module(library(heaps)).


prime(2).
prime(2).
Line 7,548: Line 15,964:
adjust_heap(H2, N, H).
adjust_heap(H2, N, H).
adjust_heap(H, N, H) :-
adjust_heap(H, N, H) :-
\+ min_of_heap(H, N, _).</lang>
\+ min_of_heap(H, N, _).</syntaxhighlight>


=={{header|PureBasic}}==
=={{header|PureBasic}}==


===Basic procedure===
===Basic procedure===
<lang PureBasic>For n=2 To Sqr(lim)
<syntaxhighlight lang="purebasic">For n=2 To Sqr(lim)
If Nums(n)=0
If Nums(n)=0
m=n*n
m=n*n
Line 7,561: Line 15,977:
Wend
Wend
EndIf
EndIf
Next n</lang>
Next n</syntaxhighlight>


===Working example===
===Working example===
<lang PureBasic>Dim Nums.i(0)
<syntaxhighlight lang="purebasic">Dim Nums.i(0)
Define l, n, m, lim
Define l, n, m, lim


Line 7,600: Line 16,016:
Print(#CRLF$+#CRLF$+"Press ENTER to exit"): Input()
Print(#CRLF$+#CRLF$+"Press ENTER to exit"): Input()
CloseConsole()
CloseConsole()
EndIf</lang>
EndIf</syntaxhighlight>


Output may look like;
Output may look like;
Line 7,625: Line 16,041:
avoids explicit iteration in the interpreter, giving a further speed improvement.
avoids explicit iteration in the interpreter, giving a further speed improvement.


<lang python>def eratosthenes2(n):
<syntaxhighlight lang="python">def eratosthenes2(n):
multiples = set()
multiples = set()
for i in range(2, n+1):
for i in range(2, n+1):
Line 7,632: Line 16,048:
multiples.update(range(i*i, n+1, i))
multiples.update(range(i*i, n+1, i))


print(list(eratosthenes2(100)))</lang>
print(list(eratosthenes2(100)))</syntaxhighlight>


===Using array lookup===
===Using array lookup===
The version below uses array lookup to test for primality. The function <tt>primes_upto()</tt> is a straightforward implementation of the [http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithm Sieve of Eratosthenes] algorithm. It returns the prime numbers less than or equal to <tt>limit</tt>.
The version below uses array lookup to test for primality. The function <tt>primes_upto()</tt> is a straightforward implementation of the [http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithm Sieve of Eratosthenes] algorithm. It returns the prime numbers less than or equal to <tt>limit</tt>.
<lang python>def primes_upto(limit):
<syntaxhighlight lang="python">def primes_upto(limit):
is_prime = [False] * 2 + [True] * (limit - 1)
is_prime = [False] * 2 + [True] * (limit - 1)
for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
Line 7,642: Line 16,058:
for i in range(n*n, limit+1, n):
for i in range(n*n, limit+1, n):
is_prime[i] = False
is_prime[i] = False
return [i for i, prime in enumerate(is_prime) if prime]</lang>
return [i for i, prime in enumerate(is_prime) if prime]</syntaxhighlight>


===Using generator===
===Using generator===
The following code may be slightly slower than using the array/list as above, but uses no memory for output:
The following code may be slightly slower than using the array/list as above, but uses no memory for output:
<lang python>def iprimes_upto(limit):
<syntaxhighlight lang="python">def iprimes_upto(limit):
is_prime = [False] * 2 + [True] * (limit - 1)
is_prime = [False] * 2 + [True] * (limit - 1)
for n in xrange(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
for n in xrange(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
Line 7,653: Line 16,069:
is_prime[i] = False
is_prime[i] = False
for i in xrange(limit + 1):
for i in xrange(limit + 1):
if is_prime[i]: yield i</lang>{{out|Example}}<lang python>>>> list(iprimes_upto(15))
if is_prime[i]: yield i</syntaxhighlight>{{out|Example}}<syntaxhighlight lang="python">>>> list(iprimes_upto(15))
[2, 3, 5, 7, 11, 13]</lang>
[2, 3, 5, 7, 11, 13]</syntaxhighlight>


===Odds-only version of the array sieve above===
===Odds-only version of the array sieve above===
The following code is faster than the above array version using only odd composite operations (for a factor of over two) and because it has been optimized to use slice operations for composite number culling to avoid extra work by the interpreter:
The following code is faster than the above array version using only odd composite operations (for a factor of over two) and because it has been optimized to use slice operations for composite number culling to avoid extra work by the interpreter:
<lang python>def primes2(limit):
<syntaxhighlight lang="python">def primes2(limit):
if limit < 2: return []
if limit < 2: return []
if limit < 3: return [2]
if limit < 3: return [2]
Line 7,668: Line 16,084:
s = p * (i + 1) + i
s = p * (i + 1) + i
buf[s::p] = [False] * ((lmtbf - s) // p + 1)
buf[s::p] = [False] * ((lmtbf - s) // p + 1)
return [2] + [i + i + 3 for i, v in enumerate(buf) if v]</lang>
return [2] + [i + i + 3 for i, v in enumerate(buf) if v]</syntaxhighlight>


Note that "range" needs to be changed to "xrange" for maximum speed with Python 2.
Note that "range" needs to be changed to "xrange" for maximum speed with Python 2.
Line 7,675: Line 16,091:
The following code is faster than the above generator version using only odd composite operations (for a factor of over two) and because it has been optimized to use slice operations for composite number culling to avoid extra work by the interpreter:
The following code is faster than the above generator version using only odd composite operations (for a factor of over two) and because it has been optimized to use slice operations for composite number culling to avoid extra work by the interpreter:


<lang python>def iprimes2(limit):
<syntaxhighlight lang="python">def iprimes2(limit):
yield 2
yield 2
if limit < 3: return
if limit < 3: return
Line 7,686: Line 16,102:
buf[s::p] = [False] * ((lmtbf - s) // p + 1)
buf[s::p] = [False] * ((lmtbf - s) // p + 1)
for i in range(lmtbf + 1):
for i in range(lmtbf + 1):
if buf[i]: yield (i + i + 3)</lang>
if buf[i]: yield (i + i + 3)</syntaxhighlight>


Note that this version may actually run slightly faster than the equivalent array version with the advantage that the output doesn't require any memory.
Note that this version may actually run slightly faster than the equivalent array version with the advantage that the output doesn't require any memory.
Line 7,695: Line 16,111:
This uses a 235 factorial wheel for further reductions in operations; the same techniques can be applied to the array version as well; it runs slightly faster and uses slightly less memory as compared to the odds-only algorithms:
This uses a 235 factorial wheel for further reductions in operations; the same techniques can be applied to the array version as well; it runs slightly faster and uses slightly less memory as compared to the odds-only algorithms:


<lang python>def primes235(limit):
<syntaxhighlight lang="python">def primes235(limit):
yield 2; yield 3; yield 5
yield 2; yield 3; yield 5
if limit < 7: return
if limit < 7: return
Line 7,714: Line 16,130:
s += p * gaps[ci]; ci += 1
s += p * gaps[ci]; ci += 1
for i in range(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
for i in range(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])</lang>
if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])</syntaxhighlight>


Note: Much of the time (almost two thirds for this last case for Python 2.7.6) for any of these array/list or generator algorithms is used in the computation and enumeration of the final output in the last line(s), so any slight changes to those lines can greatly affect execution time. For Python 3 this enumeration is about twice as slow as Python 2 (Python 3.3 slow and 3.4 slower) for an even bigger percentage of time spent just outputting the results. This slow enumeration means that there is little advantage to versions that use even further wheel factorization, as the composite number culling is a small part of the time to enumerate the results.
Note: Much of the time (almost two thirds for this last case for Python 2.7.6) for any of these array/list or generator algorithms is used in the computation and enumeration of the final output in the last line(s), so any slight changes to those lines can greatly affect execution time. For Python 3 this enumeration is about twice as slow as Python 2 (Python 3.3 slow and 3.4 slower) for an even bigger percentage of time spent just outputting the results. This slow enumeration means that there is little advantage to versions that use even further wheel factorization, as the composite number culling is a small part of the time to enumerate the results.
Line 7,723: Line 16,139:


===Using numpy===
===Using numpy===
{{libheader|numpy}}
{{libheader|NumPy}}
Below code adapted from [http://en.literateprograms.org/Sieve_of_Eratosthenes_(Python,_arrays)#simple_implementation literateprograms.org] using [http://numpy.scipy.org/ numpy]
Below code adapted from [http://en.literateprograms.org/Sieve_of_Eratosthenes_(Python,_arrays)#simple_implementation literateprograms.org] using [http://numpy.scipy.org/ numpy]
<lang python>import numpy
<syntaxhighlight lang="python">import numpy
def primes_upto2(limit):
def primes_upto2(limit):
is_prime = numpy.ones(limit + 1, dtype=numpy.bool)
is_prime = numpy.ones(limit + 1, dtype=numpy.bool)
Line 7,731: Line 16,147:
if is_prime[n]:
if is_prime[n]:
is_prime[n*n::n] = 0
is_prime[n*n::n] = 0
return numpy.nonzero(is_prime)[0][2:]</lang>
return numpy.nonzero(is_prime)[0][2:]</syntaxhighlight>
'''Performance note:''' there is no point in adding wheels here, because executing <tt>is_prime[n*n::n] = 0</tt> and <tt>nonzero()</tt> takes almost all of the time.
'''Performance note:''' there is no point in adding wheels here, because executing <tt>is_prime[n*n::n] = 0</tt> and <tt>nonzero()</tt> takes almost all of the time.


Line 7,738: Line 16,154:
===Using wheels with numpy===
===Using wheels with numpy===
Version with wheel based optimization:
Version with wheel based optimization:
<lang python>from numpy import array, bool_, multiply, nonzero, ones, put, resize
<syntaxhighlight lang="python">from numpy import array, bool_, multiply, nonzero, ones, put, resize
#
#
def makepattern(smallprimes):
def makepattern(smallprimes):
Line 7,757: Line 16,173:
if isprime[n]:
if isprime[n]:
isprime[n*n::n] = 0
isprime[n*n::n] = 0
return nonzero(isprime)[0]</lang>
return nonzero(isprime)[0]</syntaxhighlight>


Examples:
Examples:
<lang python>>>> primes_upto3(10**6, smallprimes=(2,3)) # Wall time: 0.17
<syntaxhighlight lang="python">>>> primes_upto3(10**6, smallprimes=(2,3)) # Wall time: 0.17
array([ 2, 3, 5, ..., 999961, 999979, 999983])
array([ 2, 3, 5, ..., 999961, 999979, 999983])
>>> primes_upto3(10**7, smallprimes=(2,3)) # Wall time: '''2.13'''
>>> primes_upto3(10**7, smallprimes=(2,3)) # Wall time: '''2.13'''
Line 7,771: Line 16,187:
array([ 2, 3, 5, ..., 9999971, 9999973, 9999991])
array([ 2, 3, 5, ..., 9999971, 9999973, 9999991])
>>> primes_upto3(10**7) # Wall time: '''1.30'''
>>> primes_upto3(10**7) # Wall time: '''1.30'''
array([ 2, 3, 5, ..., 9999971, 9999973, 9999991])</lang>
array([ 2, 3, 5, ..., 9999971, 9999973, 9999991])</syntaxhighlight>
The above-mentioned examples demonstrate that the ''given'' wheel based optimization does not show significant performance gain.
The above-mentioned examples demonstrate that the ''given'' wheel based optimization does not show significant performance gain.


Line 7,778: Line 16,194:


{{works with|Python|2.6+, 3.x}}
{{works with|Python|2.6+, 3.x}}
<lang python>import heapq
<syntaxhighlight lang="python">import heapq


# generates all prime numbers
# generates all prime numbers
Line 7,805: Line 16,221:
yield i
yield i
i += 1</lang>
i += 1</syntaxhighlight>
Example:
Example:
<pre>
<pre>
Line 7,826: Line 16,242:
Adding each discovered prime's incremental step info to the mapping should be '''''postponed''''' until the prime's ''square'' is seen amongst the candidate numbers, as it is useless before that point. This drastically reduces the space complexity from <i>O(n)</i> to <i>O(sqrt(n/log(n)))</i>, in ''<code>n</code>'' primes produced, and also lowers the run time considerably ([http://ideone.com/VXep9F this test entry in Python 2.7] and [http://ideone.com/muuS4H this test entry in Python 3.x] show about <i>~ n<sup>1.08</sup></i> [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirical order of growth], which is very close to the theoretical value of <i>O(n log(n) log(log(n)))</i>, in ''<code>n</code>'' primes produced):
Adding each discovered prime's incremental step info to the mapping should be '''''postponed''''' until the prime's ''square'' is seen amongst the candidate numbers, as it is useless before that point. This drastically reduces the space complexity from <i>O(n)</i> to <i>O(sqrt(n/log(n)))</i>, in ''<code>n</code>'' primes produced, and also lowers the run time considerably ([http://ideone.com/VXep9F this test entry in Python 2.7] and [http://ideone.com/muuS4H this test entry in Python 3.x] show about <i>~ n<sup>1.08</sup></i> [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirical order of growth], which is very close to the theoretical value of <i>O(n log(n) log(log(n)))</i>, in ''<code>n</code>'' primes produced):
{{works with|Python|2.6+, 3.x}}
{{works with|Python|2.6+, 3.x}}
<lang python>def primes():
<syntaxhighlight lang="python">def primes():
yield 2; yield 3; yield 5; yield 7;
yield 2; yield 3; yield 5; yield 7;
bps = (p for p in primes()) # separate supply of "base" primes (b.p.)
bps = (p for p in primes()) # separate supply of "base" primes (b.p.)
Line 7,849: Line 16,265:
import itertools
import itertools
def primes_up_to(limit):
def primes_up_to(limit):
return list(itertools.takewhile(lambda p: p <= limit, primes()))</lang>
return list(itertools.takewhile(lambda p: p <= limit, primes()))</syntaxhighlight>
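
The postponement idea described above can be illustrated with a compact, self-contained sketch. It is written for illustration and is not the entry's own code: the names <code>postponed_primes</code>, <code>multiples</code> and <code>base</code> are assumptions. A prime's step enters the dictionary only when the candidate reaches that prime's square, and the base primes come from a second, much shorter instance of the same generator.

```python
from itertools import count, islice

def postponed_primes():
    """Yield primes indefinitely (illustrative sketch of the postponed sieve)."""
    # The first few primes are yielded directly so that the recursive
    # base-prime supply below can start without recursing forever.
    yield 2; yield 3; yield 5; yield 7
    multiples = {}               # next odd composite -> step (2*p) of its base prime
    base = postponed_primes()    # separate supply of base primes
    next(base)                   # discard 2: only odd candidates are examined
    p = next(base)               # first odd base prime, 3
    q = p * p                    # its square, 9
    for c in count(9, 2):        # odd candidates 9, 11, 13, ...
        if c in multiples:       # c is a known composite
            step = multiples.pop(c)
        elif c < q:              # below the next base prime's square: c is prime
            yield c
            continue
        else:                    # c == q: only now start tracking p's multiples
            step = 2 * p
            p = next(base)
            q = p * p
        nxt = c + step           # move this base prime's marker forward
        while nxt in multiples:  # skip slots already taken by another prime
            nxt += step
        multiples[nxt] = step

print(list(islice(postponed_primes(), 10)))
```

At any moment the dictionary holds one entry per base prime not exceeding the square root of the current candidate, which is where the reduced space requirement comes from.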


===Fast infinite generator using a wheel===
===Fast infinite generator using a wheel===
Although theoretically over three times faster than odds-only, the following code using a 2/3/5/7 wheel is only about 1.5 times faster than the above odds-only code due to the extra overheads in code complexity. The [http://ideone.com/LFaRnT test link for Python 2.7] and [http://ideone.com/ZAY0T2 test link for Python 3.x] show about the same empirical order of growth as the odds-only implementation above once the range grows enough so the dict operations become amortized to a constant factor.
Although theoretically over three times faster than odds-only, the following code using a 2/3/5/7 wheel is only about 1.5 times faster than the above odds-only code due to the extra overheads in code complexity. The [http://ideone.com/LFaRnT test link for Python 2.7] and [http://ideone.com/ZAY0T2 test link for Python 3.x] show about the same empirical order of growth as the odds-only implementation above once the range grows enough so the dict operations become amortized to a constant factor.
{{works with|Python|2.6+, 3.x}}
{{works with|Python|2.6+, 3.x}}
<lang python>def primes():
<syntaxhighlight lang="python">def primes():
for p in [2,3,5,7]: yield p # base wheel primes
for p in [2,3,5,7]: yield p # base wheel primes
gaps1 = [ 2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8 ]
gaps1 = [ 2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8 ]
Line 7,883: Line 16,299:
if nni > 47: nni = 0
if nni > 47: nni = 0
n += gaps[ni]; ni = nni # advance on the wheel
n += gaps[ni]; ni = nni # advance on the wheel
for p, pi in wheel_prime_pairs(): yield p # strip out indexes</lang>
for p, pi in wheel_prime_pairs(): yield p # strip out indexes</syntaxhighlight>
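
The gap table used by a 2/3/5/7 wheel does not have to be typed in by hand. The following sketch (the helper name <code>wheel_gaps</code> is an assumption, not part of the code above) derives it by walking once around the wheel, starting from a number that is coprime to all the wheel primes:

```python
from math import gcd

def wheel_gaps(wheel_primes, start):
    """Return the gaps between successive numbers coprime to every wheel prime.

    `start` is assumed to be coprime to all wheel primes (e.g. 11 for 2/3/5/7).
    The walk covers exactly one circumference, i.e. the product of the primes.
    """
    circumference = 1
    for p in wheel_primes:
        circumference *= p
    gaps, prev, n = [], start, start + 1
    while prev < start + circumference:
        if gcd(n, circumference) == 1:   # n survives the wheel
            gaps.append(n - prev)
            prev = n
        n += 1
    return gaps

gaps = wheel_gaps([2, 3, 5, 7], 11)
print(len(gaps), sum(gaps))   # 48 gaps that add up to the circumference, 210
```

The first 26 values produced this way match the <code>gaps1</code> list above, and the count of 48 agrees with the <code>nni > 47</code> wrap-around test.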


A further constant-factor speed gain of about 1.5 times can be made with the same code, changing only the tables and a few constants, by using a 2/3/5/7/11/13/17 wheel (with a gaps list 92160 elements long) that is computed for a slight constant overhead in time, as per the [http://ideone.com/4Ld26g test link for Python 2.7] and [http://ideone.com/72Dmyt test link for Python 3.x]. Further wheel factorization is not really worth it, as the gains would be small (if there are gains at all rather than losses) and the gaps table huge; it is already too big for efficient use by 32-bit Python 3, so the wheel should likely be stopped at 13:
A further constant-factor speed gain of about 1.5 times can be made with the same code, changing only the tables and a few constants, by using a 2/3/5/7/11/13/17 wheel (with a gaps list 92160 elements long) that is computed for a slight constant overhead in time, as per the [http://ideone.com/4Ld26g test link for Python 2.7] and [http://ideone.com/72Dmyt test link for Python 3.x]. Further wheel factorization is not really worth it, as the gains would be small (if there are gains at all rather than losses) and the gaps table huge; it is already too big for efficient use by 32-bit Python 3, so the wheel should likely be stopped at 13:
<lang python>def primes():
<syntaxhighlight lang="python">def primes():
whlPrms = [2,3,5,7,11,13,17] # base wheel primes
whlPrms = [2,3,5,7,11,13,17] # base wheel primes
for p in whlPrms: yield p
for p in whlPrms: yield p
Line 7,926: Line 16,342:
n += gaps[ni]; ni = nni # advance on the wheel
n += gaps[ni]; ni = nni # advance on the wheel
for p, pi in wheel_prime_pairs(): yield p # strip out indexes
for p, pi in wheel_prime_pairs(): yield p # strip out indexes
</syntaxhighlight>
</lang>
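
The table sizes quoted above follow from simple arithmetic: a wheel built from a set of primes has a circumference equal to their product, and the number of gaps is the product of (p - 1) over the same primes. A small sketch (the helper name <code>wheel_size</code> is an assumption) shows how quickly the table grows:

```python
def wheel_size(wheel_primes):
    """Return (circumference, number of gaps) for a wheel over the given primes."""
    circumference, gaps = 1, 1
    for p in wheel_primes:
        circumference *= p   # the wheel pattern repeats after this many integers
        gaps *= p - 1        # residues per turn that are coprime to every wheel prime
    return circumference, gaps

for primes in ([2, 3, 5, 7], [2, 3, 5, 7, 11, 13], [2, 3, 5, 7, 11, 13, 17]):
    print(primes, wheel_size(primes))
```

Stopping at 13 gives 5760 gaps per 30030 integers, while adding 17 gives the 92160 gaps per 510510 integers mentioned above.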


===Iterative sieve on unbounded count from 2===
See [[Extensible_prime_generator#Python:_Iterative_sieve_on_unbounded_count_from_2| Extensible prime generator: Iterative sieve on unbounded count from 2]]

=={{header|Quackery}}==
<syntaxhighlight lang="quackery"> [ dup 1
[ 2dup > while
+ 1 >>
2dup / again ]
drop nip ] is sqrt ( n --> n )
[ stack [ 3 ~ ] constant ] is primes ( --> s )
( If a number is prime, the corresponding bit on the
number on the primes ancillary stack is set.
Initially all the bits are set except for 0 and 1,
which are not prime numbers by definition.
"eratosthenes" unsets all bits above those specified
by its argument. )
[ bit ~
primes take & primes put ] is -composite ( n --> )
[ bit primes share & 0 != ] is isprime ( n --> b )
[ dup dup sqrt times
[ i^ 1+
dup isprime if
[ dup 2 **
[ dup -composite
over +
rot 2dup >
dip unrot until ]
drop ]
drop ]
drop
1+ bit 1 -
primes take &
primes put ] is eratosthenes ( n --> )
100 eratosthenes
100 times [ i^ isprime if [ i^ echo sp ] ]</syntaxhighlight>

'''Output:'''

<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</pre>


=={{header|R}}==
=={{header|R}}==
<lang r>sieve <- function(n) {
<syntaxhighlight lang="rsplus">sieve <- function(n) {
if (n < 2) return(NULL)
if (n < 2) integer(0)
else {
a <- rep(T, n)
a[1] <- F
primes <- rep(T, n)
primes[[1]] <- F
for(i in seq(n)) {
if (a[i]) {
for(i in seq(sqrt(n))) {
j <- i * i
if(primes[[i]]) {
if (j > n) return(which(a))
primes[seq(i * i, n, i)] <- F
a[seq(j, n, by=i)] <- F
}
}
}
which(primes)
}
}
}
}


sieve(1000)</lang>
sieve(1000)</syntaxhighlight>

{{out}}
<pre> [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61
[19] 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151
[37] 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
[55] 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
[73] 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
[91] 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
[109] 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
[127] 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
[145] 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
[163] 967 971 977 983 991 997</pre>

'''Alternate Odds-Only Version'''

<syntaxhighlight lang="rsplus">sieve <- function(n) {
if (n < 2) return(integer(0))
lmt <- (sqrt(n) - 1) / 2
sz <- (n - 1) / 2
buf <- rep(TRUE, sz)
for(i in seq(lmt)) {
if (buf[i]) {
buf[seq((i + i) * (i + 1), sz, by=(i + i + 1))] <- FALSE
}
}
cat(2, sep='')
for(i in seq(sz)) {
if (buf[i]) {
cat(" ", (i + i + 1), sep='')
}
}
}

sieve(1000)</syntaxhighlight>

{{out}}
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997</pre>
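
The index arithmetic of the odds-only version may be easier to follow in a sketch that returns a list instead of printing. The sketch below is in Python rather than R and is an illustration, not a translation. Index <code>i</code> stands for the odd number <code>2*i + 1</code>, so the square of <code>p = 2*i + 1</code> sits at index <code>(p*p - 1)/2 = 2*i*(i + 1)</code>, and stepping the index by <code>p</code> advances the number by <code>2*p</code>:

```python
from math import isqrt

def odds_only_sieve(n):
    """Return the primes <= n, storing flags for odd numbers only."""
    if n < 2:
        return []
    sz = (n - 1) // 2                   # largest index: 2*sz + 1 <= n
    is_prime = [True] * (sz + 1)        # index i stands for 2*i + 1; index 0 (the number 1) is unused
    for i in range(1, (isqrt(n) - 1) // 2 + 1):
        if is_prime[i]:
            p = 2 * i + 1
            start = 2 * i * (i + 1)     # index of p*p
            for j in range(start, sz + 1, p):
                is_prime[j] = False     # strike odd multiples of p
    return [2] + [2 * i + 1 for i in range(1, sz + 1) if is_prime[i]]

print(odds_only_sieve(50))
```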


=={{header|Racket}}==
=={{header|Racket}}==
===Imperative versions===
===Imperative versions===
Ugly imperative version:
Ugly imperative version:
<lang Racket>#lang racket
<syntaxhighlight lang="racket">#lang racket


(define (sieve n)
(define (sieve n)
Line 7,959: Line 16,461:
(reverse primes))
(reverse primes))


(sieve 100)</lang>
(sieve 100)</syntaxhighlight>


A little nicer, but still imperative:
A little nicer, but still imperative:
<lang Racket>#lang racket
<syntaxhighlight lang="racket">#lang racket
(define (sieve n)
(define (sieve n)
(define primes (make-vector (add1 n) #t))
(define primes (make-vector (add1 n) #t))
Line 7,972: Line 16,474:
#:when (vector-ref primes n))
#:when (vector-ref primes n))
n))
n))
(sieve 100)</lang>
(sieve 100)</syntaxhighlight>


Imperative version using a bit vector:
Imperative version using a bit vector:
<lang Racket>#lang racket
<syntaxhighlight lang="racket">#lang racket
(require data/bit-vector)
(require data/bit-vector)
;; Returns a list of prime numbers up to natural number limit
;; Returns a list of prime numbers up to natural number limit
Line 7,988: Line 16,490:
(for/list ([i (bit-vector-length bv)] #:unless (bit-vector-ref bv i)) i))
(for/list ([i (bit-vector-length bv)] #:unless (bit-vector-ref bv i)) i))
(eratosthenes 100)
(eratosthenes 100)
</syntaxhighlight>
</lang>


{{output}}
{{output}}
Line 7,997: Line 16,499:
These examples use infinite lists (streams) to implement the sieve of Eratosthenes in a functional way, producing all prime numbers. The following functions are used as a prefix for pieces of code that follow:
These examples use infinite lists (streams) to implement the sieve of Eratosthenes in a functional way, producing all prime numbers. The following functions are used as a prefix for pieces of code that follow:


<lang Racket>#lang lazy
<syntaxhighlight lang="racket">#lang lazy
(define (ints-from i d) (cons i (ints-from (+ i d) d)))
(define (ints-from i d) (cons i (ints-from (+ i d) d)))
(define (after n l f)
(define (after n l f)
Line 8,010: Line 16,512:
(cond [(< x1 x2) (cons x1 (union (cdr l1) l2 ))]
(cond [(< x1 x2) (cons x1 (union (cdr l1) l2 ))]
[(> x1 x2) (cons x2 (union l1 (cdr l2)))]
[(> x1 x2) (cons x2 (union l1 (cdr l2)))]
[else (cons x1 (union (cdr l1) (cdr l2)))])))</lang>
[else (cons x1 (union (cdr l1) (cdr l2)))])))</syntaxhighlight>


==== Basic sieve ====
==== Basic sieve ====


<lang Racket>(define (sieve l)
<syntaxhighlight lang="racket">(define (sieve l)
(define x (car l))
(define x (car l))
(cons x (sieve (diff (cdr l) (ints-from (+ x x) x)))))
(cons x (sieve (diff (cdr l) (ints-from (+ x x) x)))))
(define primes (sieve (ints-from 2 1)))
(define primes (sieve (ints-from 2 1)))
(!! (take 25 primes))</lang>
(!! (take 25 primes))</syntaxhighlight>


Runs at ~ n^2.1 [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirically], for ''n <= 1500'' primes produced.
Runs at ~ n^2.1 [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirically], for ''n <= 1500'' primes produced.
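
An empirical order of growth such as the one quoted above is estimated from two measurements: if producing <code>n1</code> primes takes time <code>t1</code> and producing <code>n2</code> primes takes <code>t2</code>, the exponent is <code>log(t2/t1) / log(n2/n1)</code>. A minimal Python sketch follows; the function name and the timings in the example are made up for illustration and are not measurements of the code above.

```python
from math import log

def empirical_order(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (problem size, run time) measurements."""
    return log(t2 / t1) / log(n2 / n1)

# Hypothetical timings: doubling the problem size quadruples the run time,
# which corresponds to quadratic growth.
print(empirical_order(1000, 1.0, 2000, 4.0))
```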
Line 8,025: Line 16,527:
Note that the first number, 2, and its multiples stream <code>(ints-from 4 2)</code> are handled separately to ensure that the non-primes list is never empty, which simplifies the code for <code>union</code> which assumes non-empty infinite lists.
Note that the first number, 2, and its multiples stream <code>(ints-from 4 2)</code> are handled separately to ensure that the non-primes list is never empty, which simplifies the code for <code>union</code> which assumes non-empty infinite lists.


<lang Racket>(define (sieve l non-primes)
<syntaxhighlight lang="racket">(define (sieve l non-primes)
(let ([x (car l)] [np (car non-primes)])
(let ([x (car l)] [np (car non-primes)])
(cond [(= x np) (sieve (cdr l) (cdr non-primes))] ; else x < np
(cond [(= x np) (sieve (cdr l) (cdr non-primes))] ; else x < np
[else (cons x (sieve (cdr l) (union (ints-from (* x x) x)
[else (cons x (sieve (cdr l) (union (ints-from (* x x) x)
non-primes)))])))
non-primes)))])))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))</lang>
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))</syntaxhighlight>


==== Basic sieve Optimized with postponed processing ====
==== Basic sieve Optimized with postponed processing ====
Since a prime's multiples that count start from its square, we should only start removing them when we reach that square.
Since a prime's multiples that count start from its square, we should only start removing them when we reach that square.
<lang Racket>(define (sieve l prs)
<syntaxhighlight lang="racket">(define (sieve l prs)
(define p (car prs))
(define p (car prs))
(define q (* p p))
(define q (* p p))
(after q l (λ(t) (sieve (diff t (ints-from q p)) (cdr prs)))))
(after q l (λ(t) (sieve (diff t (ints-from q p)) (cdr prs)))))
(define primes (cons 2 (sieve (ints-from 3 1) primes)))</lang>
(define primes (cons 2 (sieve (ints-from 3 1) primes)))</syntaxhighlight>


Runs at ~ n^1.4 up to n=10,000. The initial 2 in the self-referential primes definition is needed to prevent a "black hole".
Runs at ~ n^1.4 up to n=10,000. The initial 2 in the self-referential primes definition is needed to prevent a "black hole".
Line 8,045: Line 16,547:
Since a prime's multiples that matter start from its square, we should only add them when we reach that square.
Since a prime's multiples that matter start from its square, we should only add them when we reach that square.


<lang Racket>(define (composites l q primes)
<syntaxhighlight lang="racket">(define (composites l q primes)
(after q l
(after q l
(λ(t)
(λ(t)
Line 8,053: Line 16,555:
(define primes (cons 2
(define primes (cons 2
(diff (ints-from 3 1)
(diff (ints-from 3 1)
(composites (ints-from 4 2) 9 (cdr primes)))))</lang>
(composites (ints-from 4 2) 9 (cdr primes)))))</syntaxhighlight>


==== Implementation of Richard Bird's algorithm ====
==== Implementation of Richard Bird's algorithm ====
Line 8,059: Line 16,561:
Appears in [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf M.O'Neill's paper]. Achieves on its own the proper postponement that is specifically arranged for in the version above (with <code>after</code>), and is yet more efficient, because it folds to the right and so builds the right-leaning structure of merges at run time, where the more frequently-producing streams of multiples appear <i>higher</i> in that structure, so the composite numbers produced by them have fewer <code>merge</code> nodes to percolate through:
Appears in [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf M.O'Neill's paper]. Achieves on its own the proper postponement that is specifically arranged for in the version above (with <code>after</code>), and is yet more efficient, because it folds to the right and so builds the right-leaning structure of merges at run time, where the more frequently-producing streams of multiples appear <i>higher</i> in that structure, so the composite numbers produced by them have fewer <code>merge</code> nodes to percolate through:


<lang Racket>(define primes
<syntaxhighlight lang="racket">(define primes
(cons 2 (diff (ints-from 3 1)
(cons 2 (diff (ints-from 3 1)
(foldr (λ(p r) (define q (* p p))
(foldr (λ(p r) (define q (* p p))
(cons q (union (ints-from (+ q p) p) r)))
(cons q (union (ints-from (+ q p) p) r)))
'() primes))))</lang>
'() primes))))</syntaxhighlight>


=== Using threads and channels ===
=== Using threads and channels ===
Line 8,069: Line 16,571:
Same algorithm as [[#With merged composites|"merged composites" above]] (without the postponement optimization), but now using threads and channels to produce a channel of all prime numbers (similar to newsqueak). The macro at the top is a convenient wrapper around definitions of channels using a thread that feeds them.
Same algorithm as [[#With merged composites|"merged composites" above]] (without the postponement optimization), but now using threads and channels to produce a channel of all prime numbers (similar to newsqueak). The macro at the top is a convenient wrapper around definitions of channels using a thread that feeds them.


<lang Racket>#lang racket
<syntaxhighlight lang="racket">#lang racket
(define-syntax (define-thread-loop stx)
(define-syntax (define-thread-loop stx)
(syntax-case stx ()
(syntax-case stx ()
Line 8,095: Line 16,597:
(out! x) (let loop () (out! (channel-get l)) (loop)))
(out! x) (let loop () (out! (channel-get l)) (loop)))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(for/list ([i 25] [x (in-producer channel-get eof primes)]) x)</lang>
(for/list ([i 25] [x (in-producer channel-get eof primes)]) x)</syntaxhighlight>


=== Using generators ===
=== Using generators ===
Line 8,101: Line 16,603:
Yet another variation of the same algorithm as above, this time using generators.
Yet another variation of the same algorithm as above, this time using generators.


<lang Racket>#lang racket
<syntaxhighlight lang="racket">#lang racket
(require racket/generator)
(require racket/generator)
(define (ints-from i d)
(define (ints-from i d)
Line 8,121: Line 16,623:
(define (cons x l) (generator () (yield x) (let loop () (yield (l)) (loop))))
(define (cons x l) (generator () (yield x) (let loop () (yield (l)) (loop))))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(for/list ([i 25] [x (in-producer primes)]) x)</lang>
(for/list ([i 25] [x (in-producer primes)]) x)</syntaxhighlight>

=={{header|Raku}}==
(formerly Perl 6)

<syntaxhighlight lang="raku" line>sub sieve( Int $limit ) {
my @is-prime = False, False, slip True xx $limit - 1;

gather for @is-prime.kv -> $number, $is-prime {
if $is-prime {
take $number;
loop (my $s = $number**2; $s <= $limit; $s += $number) {
@is-prime[$s] = False;
}
}
}
}

(sieve 100).join(",").say;</syntaxhighlight>

=== A set-based approach ===

More or less the same as the first Python example:
<syntaxhighlight lang="raku" line>sub eratsieve($n) {
# Requires n(1 - 1/(log(n-1))) storage
my $multiples = set();
gather for 2..$n -> $i {
unless $i (&) $multiples { # is subset
take $i;
$multiples (+)= set($i**2, *+$i ... (* > $n)); # union
}
}
}

say flat eratsieve(100);</syntaxhighlight>
This gives:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
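
For comparison, the same set-based idea can be sketched in Python. This is an illustrative version and is not necessarily identical to the Python entry referred to above: each newly found prime adds its multiples, starting at its square, to a set of known composites.

```python
def eratsieve(n):
    """Return the primes <= n, recording composites in a set."""
    multiples = set()
    primes = []
    for i in range(2, n + 1):
        if i not in multiples:                         # never struck out, so prime
            primes.append(i)
            multiples.update(range(i * i, n + 1, i))   # strike i*i, i*i + i, ...
    return primes

print(eratsieve(100))
```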

=== Using a chain of filters ===

{{incorrect|Raku|This version uses modulo (division) testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}

''Note: while this is "incorrect" by a strict interpretation of the rules, it is being left as an interesting example''

<syntaxhighlight lang="raku" line>sub primes ( UInt $n ) {
gather {
# create an iterator from 2 to $n (inclusive)
my $iterator := (2..$n).iterator;

loop {
# If it passed all of the filters it must be prime
my $prime := $iterator.pull-one;
# unless it is actually the end of the sequence
last if $prime =:= IterationEnd;

take $prime; # add the prime to the `gather` sequence

# filter out the factors of the current prime
$iterator := Seq.new($iterator).grep(* % $prime).iterator;
# (2..*).grep(* % 2).grep(* % 3).grep(* % 5).grep(* % 7)…
}
}
}

put primes( 100 );</syntaxhighlight>
Which prints

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

=={{header|RATFOR}}==
<syntaxhighlight lang="ratfor">

program prime
#
define(true,1)
define(false,0)
#
integer loop,loop2,limit,k,primes,count
integer isprime(1000)

limit = 1000
count = 0

for (loop=1; loop<=limit; loop=loop+1)
{
isprime(loop) = true
}

isprime(1) = false

for (loop=2; loop<=limit; loop=loop+1)


{
if (isprime(loop) == true)
{
count = count + 1
for (loop2=loop*loop; loop2 <= limit; loop2=loop2+loop)
{
isprime(loop2) = false
}
}
}
write(*,*)
write(*,101) count

101 format('There are ',I12,' primes.')

count = 0
for (loop=1; loop<=limit; loop=loop+1)
if (isprime(loop) == true)
{
Count = count + 1
write(*,'(I6,$)')loop
if (mod(count,10) == 0) write(*,*)
}
write(*,*)

end
</syntaxhighlight>

=={{header|Red}}==
<syntaxhighlight lang="red">
primes: function [n [integer!]][
poke prim: make bitset! n 1 true
r: 2 while [r * r <= n][
repeat q n / r - 1 [poke prim q + 1 * r true]
until [not pick prim r: r + 1]
]
collect [repeat i n [if not prim/:i [keep i]]]
]

primes 100
== [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
</syntaxhighlight>

=={{header|Refal}}==
<syntaxhighlight lang="refal">$ENTRY Go {
= <Print <Primes 100>>;
};

Primes {
s.N = <Sieve <Iota 2 s.N>>;
};

Iota {
s.End s.End = s.End;
s.Start s.End = s.Start <Iota <+ 1 s.Start> s.End>;
};

Cross {
s.Step e.List = <Cross (s.Step 1) s.Step e.List>;
(s.Step s.Skip) = ;
(s.Step 1) s.Item e.List = X <Cross (s.Step s.Step) e.List>;
(s.Step s.N) s.Item e.List = s.Item <Cross (s.Step <- s.N 1>) e.List>;
};

Sieve {
= ;
X e.List = <Sieve e.List>;
s.N e.List = s.N <Sieve <Cross s.N e.List>>;
};</syntaxhighlight>
{{out}}
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</pre>


=={{header|REXX}}==
=={{header|REXX}}==
Line 8,129: Line 16,795:
As the stemmed array gets heavily populated, the number of entries ''may'' slow down the REXX interpreter substantially,
As the stemmed array gets heavily populated, the number of entries ''may'' slow down the REXX interpreter substantially,
<br>depending upon the efficacy of the hashing technique being used for REXX variables (setting/retrieving).
<br>depending upon the efficacy of the hashing technique being used for REXX variables (setting/retrieving).
<lang REXX>/*REXX program generates primes via the sieve of Eratosthenes algorithm. */
<syntaxhighlight lang="rexx">/*REXX program generates and displays primes via the sieve of Eratosthenes algorithm.*/
parse arg H .; if H=='' | H=="," then H=200 /*obtain optional argument from the CL.*/
parse arg H .; if H=='' | H=="," then H= 200 /*obtain optional argument from the CL.*/
w=length(H); @prime=right('prime', 20) /*W: is used for aligning the output.*/
w= length(H); @prime= right('prime', 20) /*W: is used for aligning the output.*/
@.=. /*assume all the numbers are prime. */
@.=. /*assume all the numbers are prime. */
#=0 /*number of primes found (so far). */
#= 0 /*number of primes found (so far). */
do j=2 for H-1; if @.j=='' then iterate /*all prime integers up to H inclusive.*/
do j=2 for H-1; if @.j=='' then iterate /*all prime integers up to H inclusive.*/
#=#+1 /*bump the prime number counter. */
#= # + 1 /*bump the prime number counter. */
say @prime right(#,w) " ───► " right(j,w) /*display the prime to the terminal. */
say @prime right(#,w) " ───► " right(j,w) /*display the prime to the terminal. */
do m=j*j to H by j; @.m=; end /*m*/ /*strike all multiples as being ¬ prime*/
do m=j*j to H by j; @.m=; end /*strike all multiples as being ¬ prime*/
end /*j*/ /* ─── */
end /*j*/ /* ─── */
say /*stick a fork in it, we're all done. */
say
say right(#,w+length(@prime)+1) 'primes found.' /*stick a fork in it, we're all done. */</lang>
say right(#, 1+w+length(@prime) ) 'primes found up to and including ' H</syntaxhighlight>
'''output''' &nbsp; when using the input default of: &nbsp; <tt> 200 </tt>
'''output''' &nbsp; when using the input default of: &nbsp; <tt> 200 </tt>
<pre style="height:45ex">
<pre style="height:45ex">
Line 8,190: Line 16,856:
prime 46 ───► 199
prime 46 ───► 199


46 primes found.
46 primes found up to and including 200
</pre>
</pre>


Line 8,197: Line 16,863:


Also supported is the suppression of listing the primes if the &nbsp; '''H''' &nbsp; ('''h'''igh limit) &nbsp; is negative.
Also supported is the suppression of listing the primes if the &nbsp; '''H''' &nbsp; ('''h'''igh limit) &nbsp; is negative.

<br>Also added is a final message indicating the number of primes found.
Also added is a final message indicating the number of primes found.
<lang rexx>/*REXX program generates primes via a wheeled sieve of Eratosthenes algorithm. */
<syntaxhighlight lang="rexx">/*REXX program generates primes via a wheeled sieve of Eratosthenes algorithm. */
parse arg H .; if H=='' | H=="," then H=200 /*obtain the optional argument from CL.*/
tell=h>0; H=abs(H); w=length(H) /*negative H suppresses prime listing.*/
parse arg H .; if H=='' then H=200 /*let the highest number be specified. */
tell=h>0; H=abs(H); w=length(H) /*a negative H suppresses prime listing*/
if 2<=H & tell then say right(1, w+20)'st prime ───► ' right(2, w)
if 2<=H & tell then say right(1, w+20)'st prime ───► ' right(2, w)
#= 2<=H /*the number of primes found (so far).*/
@.= '0'x /*assume that all numbers are prime. */
@.=. /*assume all the numbers are prime */
cw= length(@.) /*the cell width that holds numbers. */
!=0 /*skips the top part of sieve marking.*/
#= w<=H /*the number of primes found (so far).*/
do j=3 by 2 for (H-2)%2 /*the odd integers up to H inclusive.*/
!=0 /*skips the top part of sieve marking. */
if @.j=='' then iterate /*Is composite? Then skip this number.*/
do j=3 by 2 for (H-2)%2; b= j%cw /*odd integers up to H inclusive. */
#=#+1 /*bump the prime number counter. */
if substr(x2b(c2x(@.b)),j//cw+1,1) then iterate /*is J composite ? */
if tell then say right(#, w+20)th(#) 'prime ───► ' right(j, w)
#= # + 1 /*bump the prime number counter. */
if ! then iterate /*should the top part be skipped ? */
if tell then say right(#, w+20)th(#) 'prime ───► ' right(j, w)
jj=j*j /*compute the square of J. ___ */
if ! then iterate /*should the top part be skipped ? */
if jj>H then !=1 /*indicate skipping if j > H */
jj=j * j /*compute the square of J. ___*/
do m=jj to H by j+j; @.m=; end /*m*/ /*strike odd multiples as not prime. */
if jj>H then !=1 /*indicates skip top part if j > √ H */
end /*j*/ /* ─── */
do m=jj to H by j+j; call . m; end /* [↑] strike odd multiples ¬ prime */
end /*j*/ /* ─── */
say

say right(#, w+20) 'prime's(#) "found." /*display the count of primes found. */
say; say right(#, w+20) 'prime's(#) "found up to and including " H
exit /*stick a fork in it, we're all done. */
exit /*stick a fork in it, we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────────────*/
/*──────────────────────────────────────────────────────────────────────────────────────*/
.: parse arg n; b=n%cw; r=n//cw+1;_=x2b(c2x(@.b));@.b=x2c(b2x(left(_,r-1)'1'substr(_,r+1)));return
s: if arg(1)==1 then return arg(3); return word(arg(2) 's', 1) /*pluralizer.*/
th: procedure; x=arg(1); return word('th st nd rd', 1+ x//10*(x//100%10\==1) * (x//10<4))</lang>
s: if arg(1)==1 then return arg(3); return word(arg(2) 's',1) /*pluralizer.*/
th: procedure; parse arg x; x=abs(x); return word('th st nd rd',1+x//10*(x//100%10\==1)*(x//10<4))</syntaxhighlight>
'''output''' &nbsp; when using the input default of: &nbsp; <tt> 200 </tt>
{{out|output|text=&nbsp; when using the input default of: &nbsp; &nbsp; <tt> 200 </tt>}}
<pre style="height:45ex">
<pre style="height:45ex">
1st prime ───► 2
1st prime ───► 2
Line 8,269: Line 16,938:
46th prime ───► 199
46th prime ───► 199


46 primes found.
46 primes found up to and including 200
</pre>
</pre>
'''output''' &nbsp; when using the input of: &nbsp; <tt> -1000 </tt>
{{out|output|text=&nbsp; when using the input of: &nbsp; &nbsp; <tt> -1000 </tt>}}
<pre>
<pre>
168 primes found.
168 primes found up to and including 1000

</pre>
</pre>
'''output''' &nbsp; when using the input of: &nbsp; <tt> -10000 </tt>
{{out|output|text=&nbsp; when using the input of: &nbsp; &nbsp; <tt> -10000 </tt>}}
<pre>
<pre>
1229 primes found.
1229 primes found up to and including 10000
</pre>
</pre>
'''output''' &nbsp; when using the input of: &nbsp; <tt> -100000 </tt>
{{out|output|text=&nbsp; when using the input of: &nbsp; &nbsp; <tt> -100000 </tt>}}
<pre>
<pre>
9592 primes found.
9592 primes found up to and including 100000
</pre>
</pre>
'''output''' &nbsp; when using the input of: &nbsp; <tt> -1000000 </tt>
{{out|output|text=&nbsp; when using the input of: &nbsp; &nbsp; <tt> -1000000 </tt>}}
<pre>
<pre>
78498 primes found.
78498 primes found up to and including 1000000
</pre>
</pre>
'''output''' &nbsp; when using the input of: &nbsp; <tt> -10000000 </tt>
{{out|output|text=&nbsp; when using the input of: &nbsp; &nbsp; <tt> -10000000 </tt>}}
<pre>
<pre>
664579 primes found.
664579 primes found up to and including 10000000
</pre>
</pre>
'''output''' &nbsp; when using the input of: &nbsp; <tt> -100000000 </tt>
<pre>
16 +++ @.m=
Error 5 running "C:\sieve_of_Eratosthenes.rex", line 16: System resources exhausted
</pre>
The above (using Regina 3.8.2 under Windows/XP) shows one of the weaknesses of this implementation of the ''Sieve of Eratosthenes'': &nbsp; it must keep an array of all (if not most) values which is used to strike out composite numbers.

The &nbsp; ''System resources exhausted'' &nbsp; error can be postponed by implementing further optimizations (expanding the wheel with low primes).
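
One common way to shrink the array that a sieve must keep is to store a single bit per odd number instead of one array element per integer. The sketch below shows the idea in Python rather than REXX. The function and helper names are assumptions, and it is not a description of how any REXX interpreter stores its variables.

```python
from math import isqrt

def bitpacked_primes(limit):
    """Return the primes <= limit using one bit per odd number."""
    if limit < 2:
        return []
    size = (limit - 1) // 2 + 1               # indices 0..size-1 stand for 1, 3, 5, ...
    composite = bytearray((size + 7) // 8)    # all bits 0: nothing struck out yet

    def is_set(i):                            # test the bit for the odd number 2*i + 1
        return composite[i >> 3] >> (i & 7) & 1

    def set_bit(i):                           # mark the odd number 2*i + 1 as composite
        composite[i >> 3] |= 1 << (i & 7)

    for i in range(1, (isqrt(limit) - 1) // 2 + 1):
        if not is_set(i):
            p = 2 * i + 1
            for j in range(2 * i * (i + 1), size, p):   # start at the index of p*p
                set_bit(j)
    return [2] + [2 * i + 1 for i in range(1, size) if not is_set(i)]

print(len(bitpacked_primes(1000)))
```

With this layout a limit of 100,000,000 needs roughly 6.25 million bytes of flags, since only the 50 million odd numbers are represented and eight of them share each byte.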


===wheel version===
===wheel version===
Line 8,304: Line 16,966:


It also uses a short-circuit test for striking out composites &nbsp; ≤ &nbsp; &radic;{{overline|&nbsp;target&nbsp;}}
<syntaxhighlight lang="rexx">/*REXX pgm generates and displays primes via a wheeled sieve of Eratosthenes algorithm. */
parse arg H .; if H=='' | H=="," then H= 200 /*obtain the optional argument from CL.*/
w= length(H); @prime= right('prime', 20) /*w: is used for aligning the output. */
if 2<=H then say @prime right(1, w) " ───► " right(2, w)
#= 2<=H /*the number of primes found (so far).*/
@.=. /*assume all the numbers are prime. */
!=0; do j=3 by 2 for (H-2)%2 /*the odd integers up to H inclusive.*/
if @.j=='' then iterate /*Is composite? Then skip this number.*/
#= # + 1 /*bump the prime number counter. */
say @prime right(#,w) " ───► " right(j,w) /*display the prime to the terminal. */
if ! then iterate /*skip the top part of loop? */
if j*j>H then !=1 /*indicate skip top part if J > √H */
do m=j*j to H by j+j; @.m=; end /*strike odd multiples as not prime. */
end /*j*/ /* ─── */
say /*stick a fork in it, we're all done. */
say right(#, 1 + w + length(@prime) ) 'primes found up to and including ' H</syntaxhighlight>
{{out|output|text=&nbsp; is identical to the first (non-wheel) version; &nbsp; program execution is over &nbsp; ''twice'' &nbsp; as fast.}}

The addition of the short-circuit test &nbsp; (using the REXX variable &nbsp;<big>'''!'''</big>) &nbsp; makes it about &nbsp; ''another'' &nbsp; '''20%''' &nbsp; faster. <br><br>


===Wheel Version restructured===
<syntaxhighlight lang="rexx">/*REXX program generates primes via sieve of Eratosthenes algorithm.
* 21.07.2012 Walter Pachl derived from above Rexx version
* avoid symbols @ and # (not supported by ooRexx)
Line 8,354: Line 17,015:
np=np+1
Say ' prime number' right(np,w) " --> " right(prime,w)
Return</syntaxhighlight>
'''output''' is mostly identical to the above versions.


=={{header|Ring}}==
<syntaxhighlight lang="ring">
limit = 100
sieve = list(limit)
Line 8,367: Line 17,028:
if sieve[i] = 0 see "" + i + " " ok
next
</syntaxhighlight>
Output:
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
</pre>

=={{header|RPL}}==
This is a direct translation of the pseudocode from Wikipedia. The variable <code>i</code> has been renamed <code>ii</code> to avoid confusion with the language constant <code>i</code> = √−1.
{{works with|Halcyon Calc|4.2.8}}
{| class="wikitable"
! RPL code
! Comment
|-
|
≪ → n
≪ { } n + 1 CON 'A' STO
2 n √ '''FOR''' ii
'''IF''' A ii GET '''THEN'''
ii SQ n '''FOR''' j
'A' j 0 PUT ii '''STEP'''
'''END'''
'''NEXT'''
{ }
2 n '''FOR''' ii '''IF''' A ii GET '''THEN''' ii + '''END NEXT'''
'A' PURGE
≫ ≫ '<span style="color:blue">SIEVE</span>' STO
|
<span style="color:blue">SIEVE</span> ''( n -- { prime_numbers } )''
let A be an array of Boolean values, indexed by 2 to n,
initially all set to true.
for i = 2, 3, 4, ..., not exceeding √n do
if A[i] is true
for j = i^2, i^2+i,... not exceeding n do
set A[j] := false
return all i such that A[i] is true.
|}
100 <span style="color:blue">SIEVE</span>
{{out}}
<pre>
1: { 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 }
</pre>
Recent RPL versions make it possible to remove some slow <code>FOR..NEXT</code> loops and to use local variables only.
{{works with|HP|49}}
« 'X' DUP 1 4 PICK 1 SEQ DUP → n a seq123
« 2 n √ '''FOR''' ii
'''IF''' a ii GET '''THEN'''
ii SQ n '''FOR''' j
'a' j 0 PUT ii '''STEP'''
'''END'''
'''NEXT'''
a seq123 IFT TAIL
» » '<span style="color:blue">SIEVE</span>' STO


=={{header|Ruby}}==
''eratosthenes'' starts with <code>nums = [nil, nil, 2, 3, 4, 5, ..., n]</code>, then marks the multiples of <code>2, 3, 5, 7, ...</code> by setting them to nil, and finally returns all non-nil numbers, which are the primes.
<syntaxhighlight lang="ruby">def eratosthenes(n)
nums = [nil, nil, *2..n]
(2..Math.sqrt(n)).each do |i|
Line 8,383: Line 17,098:
end
p eratosthenes(100)</syntaxhighlight>
<pre>[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]</pre>


Line 8,394: Line 17,109:
* Both inner loops skip multiples of 2 and 3.

<syntaxhighlight lang="ruby">def eratosthenes2(n)
# For odd i, if i is prime, nums[i >> 1] is true.
# Set false for all multiples of 3.
Line 8,436: Line 17,151:
end

p eratosthenes2(100)</syntaxhighlight>


This simple benchmark compares ''eratosthenes'' with ''eratosthenes2''.

<syntaxhighlight lang="ruby">require 'benchmark'
Benchmark.bmbm {|x|
x.report("eratosthenes") { eratosthenes(1_000_000) }
x.report("eratosthenes2") { eratosthenes2(1_000_000) }
}</syntaxhighlight>

''eratosthenes2'' runs about 4 times faster than ''eratosthenes''.
Line 8,451: Line 17,166:
[[MRI]] 1.9.x implements the sieve of Eratosthenes in the file [http://redmine.ruby-lang.org/projects/ruby-19/repository/entry/lib/prime.rb prime.rb], <code>class EratosthenesSieve</code> (around [http://redmine.ruby-lang.org/projects/ruby-19/repository/entry/lib/prime.rb#L421 line 421]). This implementation optimizes for space by packing the booleans into 16-bit integers. It also hardcodes all primes less than 256.

<syntaxhighlight lang="ruby">require 'prime'
p Prime::EratosthenesGenerator.new.take_while {|i| i <= 100}</syntaxhighlight>


=={{header|Run BASIC}}==
<syntaxhighlight lang="runbasic">input "Gimme the limit:"; limit
dim flags(limit)
for i = 2 to limit
Line 8,462: Line 17,177:
next k
if flags(i) = 0 then print i;", ";
next i</syntaxhighlight>
<pre>Gimme the limit:?100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, </pre>
Line 8,468: Line 17,183:
=={{header|Rust}}==
[[Category:Rust]]

===Unboxed Iterator===

A slightly more idiomatic, optimized and modern iterator output example.

<syntaxhighlight lang="rust">fn primes(n: usize) -> impl Iterator<Item = usize> {
    const START: usize = 2;
    if n < START {
        Vec::new()
    } else {
        let mut is_prime = vec![true; n + 1 - START];
        let limit = (n as f64).sqrt() as usize;
        for i in START..limit + 1 {
            let mut it = is_prime[i - START..].iter_mut().step_by(i);
            if let Some(true) = it.next() {
                it.for_each(|x| *x = false);
            }
        }
        is_prime
    }
    .into_iter()
    .enumerate()
    .filter_map(|(e, b)| if b { Some(e + START) } else { None })
}</syntaxhighlight>

Notes:
# Starting the sieve array at an offset of 2 means that an input of <code>n < 2</code> requires no allocation, because <code>Vec::new()</code> does not allocate memory until elements are pushed into it.
# Because both branches of the <code>if .. {} else {}</code> expression produce a <code>Vec</code>, the result has a single concrete type, which avoids the need for a boxed trait object.
# Iterating <code>is_prime</code> with <code>.iter_mut()</code> and then <code>.step_by(i)</code> visits only the multiples of <code>i</code>, so no manual index arithmetic is needed.
# Returning <code>impl Iterator</code> allows static rather than dynamic dispatch, which is possible because the return type is known at compile time; this makes the zero input/output case an order of magnitude faster.
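
The first two notes can be illustrated in isolation. The following is only a minimal sketch and is not part of the original example; the function names <code>countdown</code> and <code>empty_capacity</code> are hypothetical and chosen purely for illustration. It shows that a freshly created <code>Vec</code> reports a capacity of zero, and that producing a <code>Vec</code> from both branches gives one concrete type that can be returned as <code>impl Iterator</code> without boxing:

```rust
/// Hypothetical helper: yields n, n-1, ..., 1, or nothing at all when n == 0.
fn countdown(n: usize) -> impl Iterator<Item = usize> {
    // Both branches produce a Vec<usize>, so the `if` expression has a single
    // concrete type and no Box<dyn Iterator> is required.
    let items: Vec<usize> = if n == 0 {
        Vec::new() // an empty Vec performs no heap allocation
    } else {
        (1..=n).rev().collect()
    };
    items.into_iter()
}

/// Capacity of a freshly created, empty Vec (0 means nothing was allocated).
fn empty_capacity() -> usize {
    Vec::<bool>::new().capacity()
}
```

Calling <code>countdown(3)</code> yields 3, 2, 1, while <code>countdown(0)</code> yields nothing and never touches the heap.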


===Sieve of Eratosthenes - No optimization===
<syntaxhighlight lang="rust">fn simple_sieve(limit: usize) -> Vec<usize> {

let mut is_prime = vec![true; limit+1];
Line 8,492: Line 17,239:
fn main() {
println!("{:?}", simple_sieve(100));
}</syntaxhighlight>
{{out}}
<pre>[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]</pre>
Line 8,500: Line 17,247:
The above code does not even apply the basic optimization, allowed by the task, of culling composites only by primes up to the square root of the range; it also returns a vector of the resulting primes, which consumes memory. The following code fixes both, returning the results as an Iterator:

<syntaxhighlight lang="rust">use std::iter::{empty, once};
use std::time::Instant;

Line 8,544: Line 17,291:
let dur = secs * 1000 + millis;
println!("Culling composites took {} milliseconds.", dur);
}</syntaxhighlight>
{{output}}
<pre>[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Line 8,556: Line 17,303:
The following code improves on the above by sieving only odd numbers, since 2 is the only even prime. This reduces the number of operations by a factor of about two and a half and halves the memory use. It also bit-packs the composite sieving array, which reduces memory use by a further factor of eight and saves some time through better CPU cache use for a given sieving range. Finally, it demonstrates how to eliminate the redundant array bounds check:

<syntaxhighlight lang="rust">fn optimized_sieve(limit: usize) -> Box<Iterator<Item = usize>> {
if limit < 3 {
return if limit < 2 { Box::new(empty()) } else { Box::new(once(2)) }
Line 8,586: Line 17,333:
Some((i + i + 3) as usize) } else { None } }
}))
}</syntaxhighlight>


The above function can be used just by substituting "optimized_sieve" for "basic_sieve" in the previous "main" function, and the outputs are the same except that the time is only 1584 milliseconds, or about three times as fast.
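
Since only part of the odds-only code is visible here, the index arithmetic it relies on can be sketched separately. The following is a simplified illustration under stated assumptions, not the author's implementation: the name <code>odds_only_sieve</code> is hypothetical, the buffer is assumed to use 32-bit words with a set bit meaning "composite", and normal bounds-checked indexing is kept instead of the unchecked access mentioned above. Bit <code>i</code> stands for the odd number <code>2*i + 3</code>, so the square of a prime <code>p</code> lives at index <code>(p*p - 3) / 2</code> and stepping the index by <code>p</code> visits only the odd multiples:

```rust
/// Sketch of an odds-only, bit-packed sieve; bit i stands for the odd number 2*i + 3.
fn odds_only_sieve(limit: usize) -> Vec<usize> {
    if limit < 2 {
        return Vec::new();
    }
    let mut primes = vec![2];
    if limit < 3 {
        return primes;
    }
    let top = (limit - 3) / 2; // index of the largest odd number <= limit
    let mut composite = vec![0u32; top / 32 + 1]; // a set bit marks a composite
    let mut i = 0;
    loop {
        let p = 2 * i + 3;
        let mut c = (p * p - 3) / 2; // index of p*p, the first multiple to cull
        if c > top {
            break; // p exceeds the square root of the limit: culling is done
        }
        if (composite[i >> 5] & (1u32 << (i & 31))) == 0 {
            while c <= top {
                composite[c >> 5] |= 1u32 << (c & 31);
                c += p; // an index step of p is a value step of 2*p
            }
        }
        i += 1;
    }
    // every bit still clear represents an odd prime
    for i in 0..=top {
        if (composite[i >> 5] & (1u32 << (i & 31))) == 0 {
            primes.push(2 * i + 3);
        }
    }
    primes
}
```

Unlike the versions above, this sketch collects the primes into a <code>Vec</code> rather than returning an Iterator, purely to keep the example short.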


===Unbounded Page-Segmented bit-packed odds-only version with Iterator===

'''Caution!''' This implementation is used in the [[Extensible_prime_generator#Rust|Extensible prime generator task]], so be sure not to break that implementation when changing this code.

While the above code is quite fast, as the range increases above the tens of millions it begins to lose efficiency due to loss of CPU cache associativity as the size of the single large array used for culling composites grows beyond the limits of the various CPU caches. Accordingly, the following page-segmented code, in which each culling page can be limited to no larger than the L1 CPU cache, is about four times faster than the above for a range of one billion:

<syntaxhighlight lang="rust">use std::iter::{empty, once};
use std::rc::Rc;
use std::cell::RefCell;
Line 8,673: Line 17,422:
let r = ((lwi - s) % p) as usize;
if r == 0 { 0 } else { pc - r }
};
while cp < pbts {
unsafe { // avoids array bounds check, which is already done above
Line 8,768: Line 17,517:
let dur = secs * 1000 + millis;
println!("Culling composites took {} milliseconds.", dur);
}</syntaxhighlight>


The output is about the same as the previous codes except much faster; as well as cache size improvements mentioned above, it has a population count primes counting function that is able to determine the number of found primes about twice as fast as using the Iterator count() method (commented out and labelled as "the slow way" in the main function).
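
The population count idea can be sketched on its own. This is an illustrative assumption about the buffer layout rather than the code used above: the function name <code>count_primes_packed</code> is hypothetical, and the buffer is taken to be a slice of 32-bit words in which a set bit marks a composite. Whole words are counted with <code>count_zeros</code>, and the unused high bits of the last word are masked so that they are not counted as primes:

```rust
/// Counts the clear bits (primes) in indices 0..=top_index of a bit-packed
/// buffer in which a set bit marks a composite.
fn count_primes_packed(composite: &[u32], top_index: usize) -> usize {
    let last_word = top_index >> 5;
    // whole words below the last one: every zero bit is a prime
    let mut count: usize = composite[..last_word]
        .iter()
        .map(|w| w.count_zeros() as usize)
        .sum();
    // last word: force the bits above top_index to 1 so they are not counted
    let used_bits = (top_index & 31) + 1;
    let mask = if used_bits == 32 { 0 } else { u32::MAX << used_bits };
    count += (composite[last_word] | mask).count_zeros() as usize;
    count
}
```

For an odds-only buffer the result is the number of odd primes, so 1 must be added for the prime 2. Counting a word at a time in this way avoids enumerating every prime just to count them.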
Line 8,779: Line 17,528:


The above code demonstrates some techniques for working within the limitations of Rust's ownership/borrowing/lifetime memory model: 1) it uses a recursive secondary base primes Iterator, made persistent by a Vec, that uses its own values as the source of its own page stream; 2) this is done with a recursive variable that is accessed as an Rc reference-counted heap value with interior mutability provided by a pair of RefCells; 3) the above secondary stream is not thread safe, and to make it so the Rc would need to be changed to an Arc and the RefCells to Mutexes (or, probably preferably, RwLocks) that enclose/lock all reading and writing operations in the secondary stream "Bpsi"'s next() method; and 4) it uses Iterators where their performance doesn't matter (at the page level) while using tight loops at the inner levels.
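
Points 2 and 3 can be illustrated with a much smaller example than the sieve itself. The sketch below is not taken from the code above; the function names and the tiny list of base primes are hypothetical. It shows one list shared through <code>Rc<RefCell<..>></code> in a single thread, and the same sharing made thread safe by substituting <code>Arc</code> and <code>Mutex</code>:

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};

/// Single-threaded: two handles share one growable list of base primes.
fn shared_single_threaded() -> Vec<u32> {
    let base_primes = Rc::new(RefCell::new(vec![3u32, 5, 7]));
    let other_handle = Rc::clone(&base_primes);
    other_handle.borrow_mut().push(11); // mutate through one handle...
    let snapshot = base_primes.borrow().clone(); // ...observe through the other
    snapshot
}

/// Thread-safe variant: Arc replaces Rc and Mutex replaces RefCell.
fn shared_across_threads() -> Vec<u32> {
    let base_primes = Arc::new(Mutex::new(vec![3u32, 5, 7]));
    let worker_handle = Arc::clone(&base_primes);
    std::thread::spawn(move || {
        // the lock is held only for the duration of this statement
        worker_handle.lock().unwrap().push(11);
    })
    .join()
    .unwrap();
    let snapshot = base_primes.lock().unwrap().clone();
    snapshot
}
```

Both functions return the list 3, 5, 7, 11. An <code>RwLock</code> could be used in place of the <code>Mutex</code> when reads greatly outnumber writes.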

=={{header|S-BASIC}}==
<syntaxhighlight lang="basic">
comment
Find primes up to the specified limit (here 1,000) using
classic Sieve of Eratosthenes
end

$constant limit = 1000
$constant false = 0
$constant true = FFFFH

var i, k, count, col = integer
dim integer flags(limit)

print "Finding primes from 2 to";limit

rem - initialize table
for i = 1 to limit
flags(i) = true
next i

rem - sieve for primes
for i = 2 to int(sqr(limit))
if flags(i) = true then
for k = (i*i) to limit step i
flags(k) = false
next k
next i

rem - write out primes 10 per line
count = 0
col = 1
for i = 2 to limit
if flags(i) = true then
begin
print using "#####";i;
count = count + 1
col = col + 1
if col > 10 then
begin
print
col = 1
end
end
next i
print
print count; " primes were found."

end
</syntaxhighlight>
{{out}}
<pre>
Finding primes from 2 to 1000
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
. . .
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
168 primes were found.
</pre>

=={{header|SAS}}==
The following defines an IML routine to compute the sieve, and as an example stores the primes below 1000 in a dataset.

<syntaxhighlight lang="text">proc iml;
start sieve(n);
a = J(n,1);
a[1] = 0;
do i = 1 to n;
if a[i] then do;
if i*i>n then return(a);
a[i*(i:int(n/i))] = 0;
end;
end;
finish;

a = loc(sieve(1000))`;
create primes from a;
append from a;
close primes;
quit;</syntaxhighlight>

=={{header|SASL}}==
{{incorrect|SASL|These use REM (division) testing and so are Trial Division algorithms, not Sieve of Eratosthenes.}}
Copied from SASL manual, top of page 36. This provides an infinite list.
<syntaxhighlight lang="sasl">
show primes
WHERE
primes = sieve (2...)
sieve (p : x ) = p : sieve {a <- x; a REM p > 0}
?
</syntaxhighlight>

The limited list for the first 1000 numbers
<syntaxhighlight lang="sasl">
show primes
WHERE
primes = sieve (2..1000)
sieve (p : x ) = p : sieve {a <- x; a REM p > 0}
?
</syntaxhighlight>


=={{header|Scala}}==

=== Genuine Eratosthenes sieve===
<syntaxhighlight lang="scala">import scala.annotation.tailrec
import scala.collection.parallel.mutable
import scala.compat.Platform
Line 8,804: Line 17,655:
assert(sieveOfEratosthenes(15099480).size == 976729)
println(s"Successfully completed without errors. [total ${Platform.currentTime - executionStart} ms]")
}</syntaxhighlight>

{{out}}
Line 8,815: Line 17,666:
The following [['''odds-only''']] code is written in a very concise functional style (no mutable state other than the contents of the composites buffer and "higher order functions" for clarity), in this case using a Scala mutable BitSet:

<syntaxhighlight lang="scala">object SoEwithBitSet {
def makeSoE_PrimesTo(top: Int): Iterator[Int] = {
val topNdx = (top - 3) / 2 //odds composite BitSet buffer offset down to 3
Line 8,823: Line 17,674:
(0 to (Math.sqrt(top).toInt - 3) / 2).filterNot { cmpsts }.foreach { cullPrmCmpsts }
Iterator.single(2) ++ (0 to topNdx).filterNot { cmpsts }.map { pi => pi + pi + 3 } }
}</syntaxhighlight>


In spite of being very concise, it is very much faster than the above code converted to odds-only, due to the use of the BitSet instead of the hash-table-based Set (or ParSet): it takes only a few seconds to enumerate the primes to 100 million, compared to the tens of seconds the above code takes to count the primes to just over 15 million.
Line 8,830: Line 17,681:
The below [['''odds-only''']] code using a primitive array (bit packed) and tail recursion to avoid some of the enumeration delays due to nested complex "higher order" function calls is almost eight times faster than the above more functional code:

<syntaxhighlight lang="scala">object SoEwithArray {
def makeSoE_PrimesTo(top: Int) = {
import scala.annotation.tailrec
Line 8,857: Line 17,708:
Iterator.single(2) ++ Iterator.iterate(3)(p => getNxtPrmFrom(((p + 2) - 3) >>> 1)).takeWhile(_ <= top)
}
}</syntaxhighlight>


It can be tested with the following code:

<syntaxhighlight lang="scala">object Main extends App {
import SoEwithArray._
val top_num = 100000000
Line 8,870: Line 17,721:
println(f"Found $count primes up to $top_num" + ".")
println("Using one large mutable Array and tail recursive loops.")
}</syntaxhighlight>


To produce the following output:
Line 8,882: Line 17,733:
The above code still uses an amount of memory proportional to the range of the sieve (although bit-packed as 8 values per byte). As well as only sieving odd candidates, the following code uses a fixed range buffer that is about the size of the CPU L2 cache plus only storage for the base primes up to the square root of the range, for a large potential saving in RAM used as well as greatly reduced memory access times. The use of innermost tail recursive loops for critical loops where the majority of the execution time is spent, rather than "higher order" functions from iterators, also greatly reduces execution time, with much of the remaining time used just to enumerate the primes output:

<syntaxhighlight lang="scala">object APFSoEPagedOdds {
import scala.annotation.tailrec
Line 8,973: Line 17,824:
val (cnt, nlwp) = gen.next(); val nacc = acc + cnt
if (nlwp <= top) takeUpto(nacc) else nacc }; takeUpto(1) }
}</syntaxhighlight>


As the above and all following sieves are "infinite", they all require an extra range limiting condition to produce a finite output, such as the addition of ".takeWhile(_ <= topLimit)" where "topLimit" is the specified range as is done in the following code:

<syntaxhighlight lang="scala">object MainSoEPagedOdds extends App {
import APFSoEPagedOdds._
countSoEPrimesTo(100)
Line 8,986: Line 17,837:
val elpsd = System.currentTimeMillis() - strt
println(f"Found $cnt primes up to $top in $elpsd milliseconds.")
}</syntaxhighlight>


which outputs the following:
Line 9,003: Line 17,854:
The following code uses delayed recursion via Streams to implement the Richard Bird algorithm mentioned in the last part (the Epilogue) of [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf M.O'Neill's paper], which is '''a true incremental Sieve of Eratosthenes'''. It is nowhere near as fast as the array based solutions due to the overhead of functionally chasing the merging of the prime multiple streams; this also means that the empirical performance is not according to the usual Sieve of Eratosthenes approximations due to this overhead increasing as the log of the sieved range, but it is much better than [[Primality_by_trial_division#Odds-Only_.22infinite.22_primes_generator_using_Streams_and_Co-Inductive_Streams|the "unfaithful" sieve]].

<syntaxhighlight lang="scala"> def birdPrimes() = {
def oddPrimes: Stream[Int] = {
def merge(xs: Stream[Int], ys: Stream[Int]): Stream[Int] = {
Line 9,024: Line 17,875:
}
2 #:: oddPrimes
}</syntaxhighlight>


Now this algorithm doesn't really need the memoization and full laziness as offered by Streams, so an implementation and use of a Co-Inductive Stream (CIS) class is sufficient and reduces execution time by almost a factor of two:<syntaxhighlight lang="scala"> class CIS[A](val start: A, val continue: () => CIS[A])

def primesBirdCIS: Iterator[Int] = {
Line 9,058: Line 17,909:

Iterator.single(2) ++ Iterator.iterate(oddPrimes())(_.continue()).map(_.start)
}</syntaxhighlight>


Further gains in performance for these last two implementations can be had by using further wheel factorization and "tree folding/merging" as per [http://www.haskell.org/haskellwiki/Primes#Tree_merging_with_Wheel this Haskell implementation].
Line 9,065: Line 17,916:
As per [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf the "unfaithful sieve" article linked above], the incremental "infinite" Sieve of Eratosthenes can be implemented using a hash table instead of a Priority Queue or Map (Binary Heap) as were used in that article. The following implementation postpones the adding of base prime representations to the hash table until necessary to keep the hash table small:

<syntaxhighlight lang="scala"> def SoEInc: Iterator[Int] = {
val nextComposites = scala.collection.mutable.HashMap[Int, Int]()
def oddPrimes: Iterator[Int] = {
Line 9,089: Line 17,940:
}
List(2, 3).toIterator ++ oddPrimes
}</syntaxhighlight>


The above could be implemented using Streams or Co-Inductive Streams to pass the continuation parameters as passed here in a tuple but there would be no real difference in speed and there is no need to use the implied laziness. As compared to the versions of the Bird (or tree folding) Sieve of Eratosthenes, this has the expected same computational complexity as the array based versions, but is about 20 times slower due to the constant overhead of processing the key value hashing. Memory use is quite low, only being the hash table entries for each of the base prime values less than the square root of the last prime enumerated multiplied by the size of each hash entry (about 12 bytes in this case) plus a "load factor" percentage overhead in hash table size to minimize hash collisions (about twice as large as entries actually used by default on average).
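
For readers who do not use Scala, the postponement idea described above can be sketched in Python. This is only an illustrative sketch, not a translation of the Scala code: the names <code>incremental_primes</code>, <code>composites</code> and <code>base_primes</code> are hypothetical, and it assumes an odds-only table keyed by the next composite of each base prime, with a base prime added to the table only when the candidate reaches its square.

```python
from itertools import count, islice


def incremental_primes():
    """Yield primes indefinitely, keeping a dict of upcoming odd composites."""
    yield 2
    yield 3
    yield 5
    yield 7
    composites = {}                      # next odd composite -> step (2 * base prime)
    base_primes = incremental_primes()   # separate, slower feed of base primes
    next(base_primes)                    # discard 2; only odd numbers are tested
    base = next(base_primes)             # 3
    square = base * base                 # 9
    for n in count(9, 2):
        if n in composites:              # n is a multiple of an active base prime
            step = composites.pop(n)
        elif n < square:                 # no entry and below the next square: prime
            yield n
            continue
        else:                            # n == square: only now start culling by base
            step = 2 * base
            base = next(base_primes)
            square = base * base
        nxt = n + step                   # advance to a free slot for this base prime
        while nxt in composites:
            nxt += step
        composites[nxt] = step
```

With these assumptions, <code>list(islice(incremental_primes(), 10))</code> gives the first ten primes, and the table only ever holds one entry per base prime not exceeding the square root of the current candidate, which is the memory behaviour the paragraph above describes.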
Line 9,100: Line 17,951:
===Tail-recursive solution===
===Tail-recursive solution===
{{Works with|Scheme|R<math>^5</math>RS}}
{{Works with|Scheme|R<math>^5</math>RS}}
<lang scheme>; Tail-recursive solution :
<syntaxhighlight lang="scheme">; Tail-recursive solution :
(define (sieve n)
(define (sieve n)
(define (aux u v)
(define (aux u v)
Line 9,120: Line 17,971:
; (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
; (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
; > (length (sieve 10000000))
; > (length (sieve 10000000))
; 664579</lang>
; 664579</syntaxhighlight>


===Simpler, non-tail-recursive solution===
===Simpler, non-tail-recursive solution===
<lang scheme>; Simpler solution, with the penalty that none of 'iota, 'strike or 'sieve is tail-recursive :
<syntaxhighlight lang="scheme">; Simpler solution, with the penalty that none of 'iota, 'strike or 'sieve is tail-recursive :
(define (iota start stop stride)
(define (iota start stop stride)
(if (> start stop)
(if (> start stop)
Line 9,145: Line 17,996:


(display (primes 100))
(display (primes 100))
(newline)</lang>
(newline)</syntaxhighlight>
Output:
Output:
<lang>(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)</lang>
<syntaxhighlight lang="text">(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)</syntaxhighlight>
===Optimised using an odds-wheel===
===Optimised using an odds-wheel===
Optimised using a pre-computed wheel based on 2 (i.e. odds only):
Optimised using a pre-computed wheel based on 2 (i.e. odds only):
<lang scheme>(define (primes-wheel-2 limit)
<syntaxhighlight lang="scheme">(define (primes-wheel-2 limit)
(let ((stop (sqrt limit)))
(let ((stop (sqrt limit)))
(define (sieve lst)
(define (sieve lst)
Line 9,160: Line 18,011:


(display (primes-wheel-2 100))
(display (primes-wheel-2 100))
(newline)</lang>
(newline)</syntaxhighlight>
Output:
Output:
<lang>(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)</lang>
<syntaxhighlight lang="text">(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)</syntaxhighlight>


===Vector-based===
===Vector-based===
Vector-based (faster), works with R<math>^5</math>RS:
Vector-based (faster), works with R<math>^5</math>RS:
<lang scheme>; initialize v to vector of sequential integers
<syntaxhighlight lang="scheme">; initialize v to vector of sequential integers
(define (initialize! v)
(define (initialize! v)
(define (iter v n) (if (>= n (vector-length v))
(define (iter v n) (if (>= n (vector-length v))
Line 9,208: Line 18,059:
(initialize! v)
(initialize! v)
(vector-set! v 1 0) ; 1 is not a prime
(vector-set! v 1 0) ; 1 is not a prime
(remove zero? (vector->list (iter v 2)))))</lang>
(remove zero? (vector->list (iter v 2)))))</syntaxhighlight>


===SICP-style streams===
===SICP-style streams===
Using SICP-style ''head''-forced streams. Works with MIT-Scheme &ndash; or any other Scheme, if writing out by hand the expansion of the only macro here, <code>s-cons</code>, with explicit lambda. Common functions:
Using SICP-style ''head''-forced streams. Works with MIT-Scheme, Chez Scheme, &ndash; or any other Scheme, if writing out by hand the expansion of the only macro here, <code>s-cons</code>, with explicit lambda. Common functions:


<lang scheme> ;;;; Stream Implementation
<syntaxhighlight lang="scheme"> ;;;; Stream Implementation
(define (head s) (car s))
(define (head s) (car s))
(define (tail s) ((cdr s)))
(define (tail s) ((cdr s)))
Line 9,244: Line 18,095:
((< h1 h2) (s-cons h1 (s-union (tail s1) s2 )))
((< h1 h2) (s-cons h1 (s-union (tail s1) s2 )))
((< h2 h1) (s-cons h2 (s-union s1 (tail s2))))
((< h2 h1) (s-cons h2 (s-union s1 (tail s2))))
(else (s-cons h1 (s-union (tail s1) (tail s2)))))))</lang>
(else (s-cons h1 (s-union (tail s1) (tail s2)))))))</syntaxhighlight>


====The simplest, naive sieve====
====The simplest, naive sieve====
Very slow, running at ~ <i>n<sup>2.2</sup></i>, empirically, and worsening:
Very slow, running at ~ <i>n<sup>2.2</sup></i>, empirically, and worsening:
<lang scheme> (define (sieve s)
<syntaxhighlight lang="scheme"> (define (sieve s)
(let ((p (head s)))
(let ((p (head s)))
(s-cons p
(s-cons p
(sieve (s-diff (tail s) (from-By (+ p p) p))))))
(sieve (s-diff s (from-By p p))))))
(define primes (sieve (from-By 2 1)))</lang>
(define primes (sieve (from-By 2 1)))</syntaxhighlight>


====Bounded, stopping early====
====Bounded, stopping early====
Stops at the square root of the upper limit ''m'', running at about ~ <i>n<sup>1.4</sup></i> in ''n'' primes produced, empirically. Returns infinite stream
Stops at the square root of the upper limit ''m'', running at about ~ <i>n<sup>1.4</sup></i> in ''n'' primes produced, empirically. Returns infinite stream
of numbers which is only valid up to ''m'', includes composites above it:
of numbers which is only valid up to ''m'', includes composites above it:
<lang scheme> (define (primes-To m)
<syntaxhighlight lang="scheme"> (define (primes-To m)
(define (sieve s)
(define (sieve s)
(let ((p (head s)))
(let ((p (head s)))
(cond ((> (* p p) m) s)
(cond ((> (* p p) m) s)
(else (s-cons p
(else (s-cons p
(sieve (s-diff (tail s) (from-By (* p p) p))))))))
(sieve (s-diff (tail s)
(sieve (from-By 2 1)))</lang>
(from-By (* p p) p))))))))
(sieve (from-By 2 1)))</syntaxhighlight>


====Combined multiples sieve====
====Combined multiples sieve====
Archetypal, straightforward approach by Richard Bird, presented in [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf Melissa E. O'Neill article]. Uses <code>s-linear-join</code>, i.e. right fold, which is less efficient and of worse time complexity than the ''tree''-folding that follows. Does not attempt to conserve space by arranging for the additional inner feedback loop, as is done in the tree-folding variant below.
Archetypal, straightforward approach by Richard Bird, presented in [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf Melissa E. O'Neill article]. Uses <code>s-linear-join</code>, i.e. right fold, which is less efficient and of worse time complexity than the ''tree''-folding that follows. Does not attempt to conserve space by arranging for the additional inner feedback loop, as is done in the tree-folding variant below.
<lang scheme> (define (primes-stream-ala-Bird)
<syntaxhighlight lang="scheme"> (define (primes-stream-ala-Bird)
(define (mults p) (from-By (* p p) p))
(define (mults p) (from-By (* p p) p))
(define primes ;; primes are
(define primes ;; primes are
Line 9,278: Line 18,130:
(s-cons (head (head sts))
(s-cons (head (head sts))
(s-union (tail (head sts))
(s-union (tail (head sts))
(s-linear-join (tail sts)))))</lang>
(s-linear-join (tail sts)))))</syntaxhighlight>


Here is a version of the same sieve, which is self contained with all the requisite functions wrapped in the overall function; optimized further. It works with odd primes only, and arranges for a separate primes feed for the base primes separate from the output stream, ''calculated recursively'' by the recursive call to "oddprms" in forming "cmpsts". It also ''"fuses"'' two functions, <code>s-diff</code> and <code>from-By</code>, into one, <code>minusstrtat</code>:
Here is a version of the same sieve, which is self contained with all the requisite functions wrapped in the overall function; optimized further. It works with odd primes only, and arranges for a separate primes feed for the base primes separate from the output stream, ''calculated recursively'' by the recursive call to "oddprms" in forming "cmpsts". It also ''"fuses"'' two functions, <code>s-diff</code> and <code>from-By</code>, into one, <code>minusstrtat</code>:


<lang scheme>(define (birdPrimes)
<syntaxhighlight lang="scheme">(define (birdPrimes)
(define (mltpls p)
(define (mltpls p)
(define pm2 (* p 2))
(define pm2 (* p 2))
Line 9,304: Line 18,156:
(define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
(define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
(define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))
(define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))
(cons 2 (lambda () (oddprms))))</lang>
(cons 2 (lambda () (oddprms))))</syntaxhighlight>


It can be tested with the following code:
It can be tested with the following code:


<lang scheme>(define (nthPrime n)
<syntaxhighlight lang="scheme">(define (nthPrime n)
(let nxtprm ((cnt 0) (ps (birdPrimes)))
(let nxtprm ((cnt 0) (ps (birdPrimes)))
(if (< cnt n) (nxtprm (+ cnt 1) ((cdr ps))) (car ps))))
(if (< cnt n) (nxtprm (+ cnt 1) ((cdr ps))) (car ps))))
(nthPrime 1000000)</lang>
(nthPrime 1000000)</syntaxhighlight>


{{output}}
{{output}}
Line 9,321: Line 18,173:
The most efficient. Finds composites as a tree of unions of each prime's multiples.
The most efficient. Finds composites as a tree of unions of each prime's multiples.


<lang scheme> ;;;; all primes' multiples are removed, merged through a tree of unions
<syntaxhighlight lang="scheme"> ;;;; all primes' multiples are removed, merged through a tree of unions
;;;; runs in ~ n^1.15 run time in producing n = 100K .. 1M primes
;;;; runs in ~ n^1.15 run time in producing n = 100K .. 1M primes
(define (primes-stream)
(define (primes-stream)
(define (mults p) (from-By (* p p) (* 2 p)))
(define (mults p) (from-By (* p p) (* 2 p)))
(define (no-mults-From from)
(define (odd-primes-From from) ;; odd primes from (odd) f are
(s-diff (from-By from 2)
(s-diff (from-By from 2) ;; all odds from f without the
(s-tree-join (s-map mults odd-primes))))
(s-tree-join (s-map mults odd-primes)))) ;; multiples of odd primes
(define odd-primes
(define odd-primes
(s-cons 3 (no-mults-From 5))) ;; inner feedback loop
(s-cons 3 (odd-primes-From 5))) ;; inner feedback loop
(s-cons 2 (no-mults-From 3))) ;; result stream
(s-cons 2 (odd-primes-From 3))) ;; result stream


;;;; join an ordered stream of streams (here, of primes' multiples)
;;;; join an ordered stream of streams (here, of primes' multiples)
Line 9,343: Line 18,195:
(s-union (tail (head sts))
(s-union (tail (head sts))
(head (tail sts))))
(head (tail sts))))
(pairs (tail (tail sts)))))</lang>
(pairs (tail (tail sts)))))</syntaxhighlight>


[http://ideone.com/Uuil5M Print 10 last primes] of the first thousand primes:
[http://ideone.com/Uuil5M Print 10 last primes] of the first thousand primes:
Line 9,353: Line 18,205:
This can be also accomplished by the following self contained code which follows the format of the <code>birdPrimes</code> code above with the added "pairs" function integrated into the "mrgmltpls" function:
This can be also accomplished by the following self contained code which follows the format of the <code>birdPrimes</code> code above with the added "pairs" function integrated into the "mrgmltpls" function:


<lang scheme>(define (treemergePrimes)
<syntaxhighlight lang="scheme">(define (treemergePrimes)
(define (mltpls p)
(define (mltpls p)
(define pm2 (* p 2))
(define pm2 (* p 2))
Line 9,379: Line 18,231:
(define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
(define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
(define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))
(define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))
(cons 2 (lambda () (oddprms))))</lang>
(cons 2 (lambda () (oddprms))))</syntaxhighlight>


It can be tested with the same code as the self-contained Richard Bird sieve, just by calling <code>treemergePrimes</code> instead of <code>birdPrimes</code>.
It can be tested with the same code as the self-contained Richard Bird sieve, just by calling <code>treemergePrimes</code> instead of <code>birdPrimes</code>.
Line 9,385: Line 18,237:
===Generators===
===Generators===


<lang scheme>(define (integers n)
<syntaxhighlight lang="scheme">(define (integers n)
(lambda ()
(lambda ()
(let ((ans n))
(let ((ans n))
Line 9,409: Line 18,261:
x)))
x)))


(define primes (sieve (integers 2))) </lang>
(define primes (sieve (integers 2))) </syntaxhighlight>


=={{header|Scilab}}==
=={{header|Scilab}}==
<lang scliab> clear
<syntaxhighlight lang="scliab">function a = sieve(n)
n=99
a = ~zeros(n, 1)
sieve=ones(1,n+2)
a(1) = %f
for i=2:n
for i = 1:n
if sieve(i) then
if a(i)
for j=i*2:i:n
j = i*i
sieve(j)=0
if j > n
return
end
end
a(j:i:n) = %f
end
end
end
end
endfunction
for i=2:n

if sieve(i) then disp(i); end
find(sieve(100))
end</lang>
// [2 3 5 ... 97]

sum(sieve(1000))
// 168, the number of primes below 1000</syntaxhighlight>


=={{header|Scratch}}==
<syntaxhighlight lang="scratch">
when clicked
broadcast: fill list with zero (0) and wait
broadcast: put ones (1) in list of multiples and wait
broadcast: fill primes where zeros (0) in list

when I receive: fill list with zero (0)
delete all of primes
delete all of list
set i to 0
set maximum to 25
repeat maximum
add 0 to list
change i by 1
{end repeat}

when I receive: put ones (1) in list of multiples
set S to sqrt of maximum
set i to 2
set k to 0
repeat S
change J by 1
set i to 2
repeat until i > 100
if not (i = J) then
if item i of list = 0 then
set m to (i mod J)
if (m = 0) then
replace item i of list with 1
{end repeat until}
change i by 1
set k to 1
delete all of primes
{end repeat}
set J to 1

when I receive: fill primes where zeros (0) in list
repeat maximum
if (item k of list) = 0 then
add k to primes
set k to (k + 1)
{end repeat}
</syntaxhighlight>

===Comments===
Scratch is a graphical drag and drop language designed to teach
children an introduction to programming. It has easy to use
multimedia and animation features. The code listed above was not
entered into the Scratch IDE but faithfully represents the graphical
code blocks used to run the sieve algorithm. The actual Scratch
graphical code blocks cannot be represented on this web site due
to its inability to directly represent graphical code. The actual code and output
can be seen or downloaded at the following external link:

[http://melellington.com/sieve/scratch-sieve-output.pdf Scratch Code and Output]


=={{header|Seed7}}==
=={{header|Seed7}}==
The program below computes the number of primes between 1 and 10000000:
The program below computes the number of primes between 1 and 10000000:
<lang seed7>$ include "seed7_05.s7i";
<syntaxhighlight lang="seed7">$ include "seed7_05.s7i";


const func set of integer: eratosthenes (in integer: n) is func
const func set of integer: eratosthenes (in integer: n) is func
Line 9,450: Line 18,366:
begin
begin
writeln(card(eratosthenes(10000000)));
writeln(card(eratosthenes(10000000)));
end func;</lang>
end func;</syntaxhighlight>


Original source: [http://seed7.sourceforge.net/algorith/math.htm#sieve_of_eratosthenes]
Original source: [http://seed7.sourceforge.net/algorith/math.htm#sieve_of_eratosthenes]

=={{header|SETL}}==
<syntaxhighlight lang="setl">program eratosthenes;
print(sieve 100);

op sieve(n);
numbers := [1..n];
numbers(1) := om;
loop for i in [2..floor sqrt n] do
loop for j in [i*i, i*i+i..n] do
numbers(j) := om;
end loop;
end loop;
return [n : n in numbers | n /= om];
end op;
end program;</syntaxhighlight>
{{out}}
<pre>[2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]</pre>


=={{header|Sidef}}==
=={{header|Sidef}}==
{{trans|Perl 6}}
{{trans|Raku}}
<lang ruby>func sieve(limit) {
<syntaxhighlight lang="ruby">func sieve(limit) {
var sieve_arr = [false, false, (limit-1).of(true)...]
var sieve_arr = [false, false, (limit-1).of(true)...]
gather {
gather {
Line 9,470: Line 18,404:
}
}


say sieve(100).join(",")</lang>
say sieve(100).join(",")</syntaxhighlight>
{{out}}
{{out}}
<pre>
<pre>
Line 9,477: Line 18,411:


Alternative implementation:
Alternative implementation:
<lang ruby>func sieve(limit) {
<syntaxhighlight lang="ruby">func sieve(limit) {
var composite = []
var composite = []
for n in (2 .. limit.isqrt) {
for n in (2 .. limit.isqrt) {
Line 9,487: Line 18,421:
}
}


say sieve(100).join(",")</lang>
say sieve(100).join(",")</syntaxhighlight>


=={{header|Simula}}==
=={{header|Simula}}==
{{works with|Simula-67}}
{{works with|Simula-67}}
<lang simula>BEGIN
<syntaxhighlight lang="simula">BEGIN
INTEGER ARRAY t(0:1000);
INTEGER ARRAY t(0:1000);
INTEGER i,j,k;
INTEGER i,j,k;
Line 9,518: Line 18,452:
OutInt(i,5); OutImage
OutInt(i,5); OutImage
END
END
END</lang>
END</syntaxhighlight>
{{out}}
{{out}}
<pre style="height:20ex"> 2
<pre style="height:20ex"> 2
Line 9,542: Line 18,476:
997</pre>
997</pre>
===A Concurrent Prime Sieve===
===A Concurrent Prime Sieve===
<lang simula>
<syntaxhighlight lang="simula">
! A CONCURRENT PRIME SIEVE ;
! A CONCURRENT PRIME SIEVE ;


Line 9,650: Line 18,584:
END;
END;
END;
END;
</syntaxhighlight>
</lang>
Output:
Output:
<pre>
<pre>
Line 9,696: Line 18,630:
MAIN BLOCK RECEIVES 11
MAIN BLOCK RECEIVES 11
</pre>
</pre>

=={{header|Smalltalk}}==
A simple implementation that you can run in a workspace. It finds all the prime numbers up to and including <i>limit</i>—for the sake of example, up to and including 100.
<syntaxhighlight lang="smalltalk">| potentialPrimes limit |
limit := 100.
potentialPrimes := Array new: limit.
potentialPrimes atAllPut: true.
2 to: limit sqrt do: [:testNumber |
(potentialPrimes at: testNumber) ifTrue: [
(testNumber * 2) to: limit by: testNumber do: [:nonPrime |
potentialPrimes at: nonPrime put: false
]
]
].
2 to: limit do: [:testNumber |
(potentialPrimes at: testNumber) ifTrue: [
Transcript show: testNumber asString; cr
]
]</syntaxhighlight>


=={{header|SNOBOL4}}==
=={{header|SNOBOL4}}==
Line 9,701: Line 18,654:
Using strings instead of arrays, and the square/sqrt optimizations.
Using strings instead of arrays, and the square/sqrt optimizations.


<lang SNOBOL4> define('sieve(n)i,j,k,p,str,res') :(sieve_end)
<syntaxhighlight lang="snobol4"> define('sieve(n)i,j,k,p,str,res') :(sieve_end)
sieve i = lt(i,n - 1) i + 1 :f(sv1)
sieve i = lt(i,n - 1) i + 1 :f(sv1)
str = str (i + 1) ' ' :(sieve)
str = str (i + 1) ' ' :(sieve)
Line 9,717: Line 18,670:
* # Test and display
* # Test and display
output = sieve(100)
output = sieve(100)
end</lang>
end</syntaxhighlight>


Output:
Output:
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</pre>
<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</pre>

=={{header|SparForte}}==
As a structured script.
<syntaxhighlight lang="ada">#!/usr/local/bin/spar
pragma annotate( summary, "sieve" );
pragma annotate( description, "The Sieve of Eratosthenes is a simple algorithm that" );
pragma annotate( description, "finds the prime numbers up to a given integer. Implement ");
pragma annotate( description, "this algorithm, with the only allowed optimization that" );
pragma annotate( description, "the outer loop can stop at the square root of the limit," );
pragma annotate( description, "and the inner loop may start at the square of the prime" );
pragma annotate( description, "just found. That means especially that you shouldn't" );
pragma annotate( description, "optimize by using pre-computed wheels, i.e. don't assume" );
pragma annotate( description, "you need only to cross out odd numbers (wheel based on" );
pragma annotate( description, "2), numbers equal to 1 or 5 modulo 6 (wheel based on 2" );
pragma annotate( description, "and 3), or similar wheels based on low primes." );
pragma annotate( see_also, "http://rosettacode.org/wiki/Sieve_of_Eratosthenes" );
pragma annotate( author, "Ken O. Burtch" );
pragma license( unrestricted );

pragma restriction( no_external_commands );

procedure sieve is
last_bool : constant positive := 20;
type bool_array is array(2..last_bool) of boolean;
a : bool_array;
test_num : positive;
-- limit : positive := positive(numerics.sqrt(float(arrays.last(a))));

-- n : positive := 2;
begin
for i in arrays.first(a)..last_bool loop
a(i) := true;
end loop;

for num in arrays.first(a)..last_bool loop
if a(num) then
test_num := num * num;
while test_num <= last_bool loop
a(test_num) := false;
test_num := @ + num;
end loop;
end if;
end loop;
for i in arrays.first(a)..last_bool loop
if a(i) then
put_line(i);
end if;
end loop;
end sieve;</syntaxhighlight>

=={{header|Standard ML}}==
Works with SML/NJ. This uses BitArrays, which are available in SML/NJ. The algorithm is the one on Wikipedia, referred to above. The limit is normally memory. When more than 20 petabytes of memory are available, this code will be limited to a maximum integer of around 1.44*10E17, due to the maximum list length in SML/NJ. Using two extra loops, however, bit arrays can simply be stored to disk and processed in multiple lists. With a tail-recursive wrapper function as well, the upper limit will be determined by available disk space only.
<syntaxhighlight lang="standard ml">
val segmentedSieve = fn N =>
(* output : list of {segment=bit segment, start=number at startposition segment} *)

let

val NSegmt= 120000000; (* segment size *)


val inf2i = IntInf.toInt ;
val i2inf = IntInf.fromInt ;
val isInt = fn m => m <= IntInf.fromInt (valOf Int.maxInt);

val sweep = fn (bits, step, k, up) => (* strike off bits up to limit *)
(while ( !k < up andalso 0 <= !k ) do
( BitArray.clrBit( bits, !k ) ; k:= !k + step ; ()) ) handle Overflow => ()

val rec nextPrimebit = (* find next 1-bit within segment *)
fn Bits =>
fn pos =>
if pos+1 >= BitArray.length Bits
then NONE
else ( if BitArray.sub ( Bits,pos) then SOME (i2inf pos) else nextPrimebit Bits (pos+1) );


val sieveEratosthenes = fn n: int => (* Eratosthenes sieve , up to+incl n *)

let
val nums= BitArray.bits(n,[] );
val i=ref 2;
val k=ref (!i * (!i) -1);

in

( BitArray.complement nums;
BitArray.clrBit( nums, 0 );
while ( !k <n ) do ( if ( BitArray.sub (nums, !i - 1) ) then sweep (nums, !i, k, n) else ();
i:= !i+1;
k:= !i * (!i) - 1
);
[ { start= i2inf 1, segment=nums } ]
)

end;



val sieveThroughSegment =

fn ( primes : { segment:BitArray.array, start:IntInf.int } list, low : IntInf.int, up ) =>
(* second segment and on *)
let
val n = inf2i (up-low+1)
val nums = BitArray.bits(n,[] );
val itdone = low div i2inf NSegmt

val rec oldprimes = fn c => fn (* use segment B to sweep current one *)
[] => ()
| ctlist as {start=st,segment=B}::t =>
let
val nxt = nextPrimebit B c ;
val p = st + Option.getOpt( nxt,~10)
val modp = ( i2inf NSegmt * itdone ) mod p
val i = inf2i ( if( isInt( p - modp ) ) then p - modp else 0 ) (* i = 0 breaks off *)
val k = ref ( if Option.isSome nxt then (i - 1) else ~2 )
val step = if (isInt(p)) then inf2i(p) else valOf Int.maxInt (* !k+maxInt > n *)

in
( sweep (nums, step, k, n) ;
if ( p*p <= up andalso Option.isSome nxt )
then oldprimes ( inf2i (p-st+1) ) ctlist
else oldprimes 0 t (* next segment B *)
)

end

in
( BitArray.complement nums;
oldprimes 0 primes;
rev ( {start = low, segment = nums } :: rev (primes) )
)
end;



val rec workSegmentsDown = fn firstFn =>
fn nextFns =>
fn sizeSeg : int =>
fn upLimit : IntInf.int =>
let
val residual = upLimit mod i2inf sizeSeg
in

if ( upLimit <= i2inf sizeSeg ) then firstFn ( inf2i upLimit )
else
if ( residual > 0 ) then
nextFns ( workSegmentsDown firstFn nextFns sizeSeg (upLimit - residual ), upLimit - residual + 1, upLimit )
else
nextFns ( workSegmentsDown firstFn nextFns sizeSeg (upLimit - i2inf sizeSeg), upLimit - i2inf sizeSeg + 1, upLimit )
end;

in

workSegmentsDown sieveEratosthenes sieveThroughSegment NSegmt N

end;
</syntaxhighlight>
Example, segment size 120 million, prime numbers up to 2.5 billion:
<syntaxhighlight lang="standard ml">
-val writeSegment = fn L : {segment:BitArray.array, start:IntInf.int} list => fn NthSegment =>
let
val M=List.nth (L , NthSegment - 1 )
in
List.map (fn x=> x + #start M) (map IntInf.fromInt (BitArray.getBits ( #segment M)) )
end;
- val primesInBits = segmentedSieve 2500000000 ;
val primesInBits =
[{segment=-,start=1},{segment=-,start=120000001},
{segment=-,start=240000001},{segment=-,start=360000001},
{segment=-,start=480000001},.. <skipped> ,...]
: {segment:BitArray.array, start:IntInf.int} list
- writeSegment primesInBits 21 ;
val it =
[2400000011,2400000017,2400000023,2400000047,2400000061,2400000073,
2400000091,2400000103,2400000121,2400000133,2400000137,2400000157,...]
: IntInf.int list
- writeSegment primesInBits 1 ;
val it = [2,3,5,7,11,13,17,19,23,29,31,37,...] : IntInf.int list


</syntaxhighlight>

=={{header|Stata}}==
A program to create a dataset with a variable p containing primes up to a given number.
<syntaxhighlight lang="stata">prog def sieve
args n
clear
qui set obs `n'
gen long p=_n
gen byte a=_n>1
forv i=2/`n' {
if a[`i'] {
loc j=`i'*`i'
if `j'>`n' {
continue, break
}
forv k=`j'(`i')`n' {
qui replace a=0 in `k'
}
}
}
qui keep if a
drop a
end</syntaxhighlight>

Example call
<syntaxhighlight lang="stata">sieve 100
list in 1/10 // show only the first ten primes

+----+
| p |
|----|
1. | 2 |
2. | 3 |
3. | 5 |
4. | 7 |
5. | 11 |
|----|
6. | 13 |
7. | 17 |
8. | 19 |
9. | 23 |
10. | 29 |
+----+</syntaxhighlight>

=== Mata ===

<syntaxhighlight lang="stata">mata
real colvector sieve(real scalar n) {
real colvector a
real scalar i, j
if (n < 2) return(J(0, 1, .))
a = J(n, 1, 1)
a[1] = 0
for (i = 1; i <= n; i++) {
if (a[i]) {
j = i*i
if (j > n) return(select(1::n, a))
for (; j <= n; j = j+i) a[j] = 0
}
}
}

sieve(10)
1
+-----+
1 | 2 |
2 | 3 |
3 | 5 |
4 | 7 |
+-----+
end</syntaxhighlight>


=={{header|Swift}}==
=={{header|Swift}}==
<lang swift>import Foundation
<syntaxhighlight lang="swift">import Foundation // for sqrt() and Date()


let max = 1_000_000
func primes(n: Int) -> AnyGenerator<Int> {
let maxroot = Int(sqrt(Double(max)))
let startingPoint = Date()
var (seive, i) = ([Int](0..<n), 1)

let lim = Int(sqrt(Double(n)))
var isprime = [Bool](repeating: true, count: max+1 )
for i in 2...maxroot {
return anyGenerator {
while ++i < n {
if isprime[i] {
for k in stride(from: max/i, through: i, by: -1) {
if seive[i] != 0 {
if i <= lim {
if isprime[k] {
for notPrime in stride(from: i*i, to: n, by: i) {
isprime[i*k] = false }
seive[notPrime] = 0
}
}
}

var count = 0
for i in 2...max {
if isprime[i] {
count += 1
}
}
print("\(count) primes found under \(max)")

print("\(startingPoint.timeIntervalSinceNow * -1) seconds")</syntaxhighlight>
{{output}}
<pre>78498 primes found under 1000000
0.01282501220703125 seconds</pre>

iMac 3,2 GHz Intel Core i5

'''Alternative odds-only version'''

The two most obvious improvements are to sieve only the odd numbers, since two is the only even prime, and to bit-pack the sieving array so that each number is represented by just one bit instead of a whole 8-bit byte. Together these changes improve memory use by a factor of 16, and the better CPU cache locality more than compensates for the extra code required to implement bit packing, as per the following code:
<syntaxhighlight lang="swift">func soePackedOdds(_ n: Int) ->
LazyMapSequence<UnfoldSequence<Int, (Int?, Bool)>, Int> {
let lmti = (n - 3) / 2
let size = lmti / 8 + 1
var sieve = Array<UInt8>(repeating: 0, count: size)
let sqrtlmti = (Int(sqrt(Double(n))) - 3) / 2
for i in 0...sqrtlmti {
if sieve[i >> 3] & (1 << (i & 7)) == 0 {
let p = i + i + 3
for c in stride(from: (i*(i+3)<<1)+3, through: lmti, by: p) {
sieve[c >> 3] |= 1 << (c & 7)
}
}
}

return sequence(first: -1, next: { (i:Int) -> Int? in
var ni = i + 1
while ni <= lmti && sieve[ni >> 3] & (1 << (ni & 7)) != 0 { ni += 1}
return ni > lmti ? nil : ni
}).lazy.map { $0 < 0 ? 2 : $0 + $0 + 3 }
}</syntaxhighlight>

The output for the same testing (with `soePackedOdds` substituted for `primes`) is the same, except that it runs about 1.5 times faster, or only 1200 cycles per prime.
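
The index arithmetic used by the odds-only bit-packed sieve can be easier to follow in a language-neutral sketch. The Python below is a hedged illustration only, not a port of the Swift code: the function name <code>packed_odds_primes</code> is hypothetical, and it assumes the same mapping in which bit index <code>i</code> stands for the odd number <code>2*i + 3</code>, so that the square of that number sits at index <code>2*i*(i + 3) + 3</code>.

```python
from math import isqrt


def packed_odds_primes(n):
    """Return the primes <= n using an odds-only, bit-packed sieve."""
    if n < 2:
        return []
    if n < 3:
        return [2]
    limit_index = (n - 3) // 2                # index i represents the odd number 2*i + 3
    sieve = bytearray(limit_index // 8 + 1)   # eight candidates per byte; set bit = composite
    sqrt_index = (isqrt(n) - 3) // 2          # index of the largest base prime candidate
    for i in range(sqrt_index + 1):
        if not sieve[i >> 3] & (1 << (i & 7)):
            p = 2 * i + 3
            # (p*p - 3) // 2 == 2*i*(i + 3) + 3; stepping the index by p
            # steps the represented number by 2*p, skipping even multiples.
            for c in range(2 * i * (i + 3) + 3, limit_index + 1, p):
                sieve[c >> 3] |= 1 << (c & 7)
    return [2] + [2 * i + 3 for i in range(limit_index + 1)
                  if not sieve[i >> 3] & (1 << (i & 7))]
```

Under these assumptions, <code>len(packed_odds_primes(1000000))</code> is 78498, matching the count reported by the first Swift program above, while the sieve buffer needs only one bit per odd number.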

These "one huge sieving array" algorithms are never going to be very fast for extended ranges (beyond about the CPU L2 cache size, which for this processor supports a range of about eight million), so a page-segmented approach should be taken, as per the last of the unbounded algorithms below.
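
As a rough illustration of what "page segmented" means here, the following Python sketch sieves a bounded range one fixed-size segment at a time, so that only a small buffer is live at once. It is an assumption-laden sketch rather than the Swift implementation referred to: the name <code>segmented_primes</code> and the default <code>segment_size</code> are hypothetical, it uses one byte per number rather than bit packing, and it does not skip even numbers.

```python
from math import isqrt


def segmented_primes(n, segment_size=32768):
    """Yield the primes <= n, sieving one fixed-size segment at a time."""
    if n < 2:
        return
    root = isqrt(n)
    # Plain sieve for the base primes up to sqrt(n).
    is_prime = bytearray([1]) * (root + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, isqrt(root) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, root + 1, i)))
    base_primes = [i for i in range(2, root + 1) if is_prime[i]]

    # Cull each segment [low, high] using only the base primes.
    for low in range(2, n + 1, segment_size):
        high = min(low + segment_size - 1, n)
        segment = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the segment, but never below p*p.
            start = max(p * p, (low + p - 1) // p * p)
            for c in range(start, high + 1, p):
                segment[c - low] = 0
        for offset, flag in enumerate(segment):
            if flag:
                yield low + offset
```

The design point is that the working buffer stays the same size however large <code>n</code> becomes; only the list of base primes grows, and it grows with the square root of the range.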

===Unbounded (Odds-Only) Versions===

To use Swift's "higher order functions" on the generated `Sequence`s effectively, one needs sieves that are unbounded (or bounded only by the numeric range chosen for the implementation). Many of these are incremental sieves that, instead of buffering a series of culled arrays, record the culling structure of the culling base primes (which should be a secondary stream of primes for efficiency) and produce the primes incrementally through reference to and update of that structure. Various structures may be chosen for this, such as a MinHeap Priority Queue, a Hash Dictionary, or a simple List tree structure as used in the following code:
<syntaxhighlight lang="swift">import Foundation

func soeTreeFoldingOdds() -> UnfoldSequence<Int, (Int?, Bool)> {
class CIS<T> {
let head: T
let rest: (() -> CIS<T>)
init(_ hd: T, _ rst: @escaping (() -> CIS<T>)) {
self.head = hd; self.rest = rst
}
}

func merge(_ xs: CIS<Int>, _ ys: CIS<Int>) -> CIS<Int> {
let x = xs.head; let y = ys.head
if x < y { return CIS(x, {() in merge(xs.rest(), ys) }) }
else { if y < x { return CIS(y, {() in merge(xs, ys.rest()) }) }
else { return CIS(x, {() in merge(xs.rest(), ys.rest()) }) } }
}

func smults(_ p: Int) -> CIS<Int> {
let inc = p + p
func smlts(_ c: Int) -> CIS<Int> {
return CIS(c, { () in smlts(c + inc) })
}
return smlts(p * p)
}

func allmults(_ ps: CIS<Int>) -> CIS<CIS<Int>> {
return CIS(smults(ps.head), { () in allmults(ps.rest()) })
}

func pairs(_ css: CIS<CIS<Int>>) -> CIS<CIS<Int>> {
let cs0 = css.head; let ncss = css.rest()
return CIS(merge(cs0, ncss.head), { () in pairs(ncss.rest()) })
}

func cmpsts(_ css: CIS<CIS<Int>>) -> CIS<Int> {
let cs0 = css.head
return CIS(cs0.head, { () in merge(cs0.rest(), cmpsts(pairs(css.rest()))) })
}

func minusAt(_ n: Int, _ cs: CIS<Int>) -> CIS<Int> {
var nn = n; var ncs = cs
while nn >= ncs.head { nn += 2; ncs = ncs.rest() }
return CIS(nn, { () in minusAt(nn + 2, ncs) })
}

func oddprms() -> CIS<Int> {
return CIS(3, { () in minusAt(5, cmpsts(allmults(oddprms()))) })
}

var odds = oddprms()
return sequence(first: 2, next: { _ in
let p = odds.head; odds = odds.rest()
return p
})
}

let range = 100000000

print("The primes up to 100 are:")
soeTreeFoldingOdds().prefix(while: { $0 <= 100 })
.forEach { print($0, "", terminator: "") }
print()

print("Found \(soeTreeFoldingOdds().lazy.prefix(while: { $0 <= 1000000 })
.reduce(0) { (a, _) in a + 1 }) primes to 1000000.")

let start = NSDate()
let answr = soeTreeFoldingOdds().prefix(while: { $0 <= range })
.reduce(0) { (a, _) in a + 1 }
let elpsd = -start.timeIntervalSinceNow

print("Found \(answr) primes to \(range).")

print(String(format: "This test took %.3f milliseconds.", elpsd * 1000))</syntaxhighlight>

The output is the same as for the above except that it is much slower, at about 56,000 CPU clock cycles per prime even just sieving to ten million, due to the many memory allocations/de-allocations. It also has an O(n (log n) (log (log n))) asymptotic computational complexity (with the extra "log n" factor) that makes it slower with increasing range. This makes this algorithm useful only up to ranges of a few million, although it is adequate to solve trivial problems such as Euler Problem 10 of summing all the primes to two million.

Note that the above code is almost a pure functional version using immutability other than for the use of loops because Swift doesn't support Tail Call Optimization (TCO) in function recursion: the loops do what TCO usually automatically does "under the covers".

'''Alternative version using a (mutable) Hash Dictionary'''

As the above code is slow due to memory allocations/de-allocations and the inherent extra "log n" term in the complexity, the following code uses a Hash Dictionary, which has an average O(1) access time (without the "log n" factor), and uses mutability for increased speed, so it is in no way purely functional:
<syntaxhighlight lang="swift">func soeDictOdds() -> UnfoldSequence<Int, Int> {
var bp = 5; var q = 25
var bps: UnfoldSequence<Int, Int>.Iterator? = nil
var dict = [9: 6] // Dictionary<Int, Int>(9 => 6)
return sequence(state: 2, next: { n in
if n < 9 { if n < 3 { n = 3; return 2 }; defer {n += 2}; return n }
while n >= q || dict[n] != nil {
if n >= q {
let inc = bp + bp
dict[n + inc] = inc
if bps == nil {
bps = soeDictOdds().makeIterator()
bp = (bps?.next())!; bp = (bps?.next())!; bp = (bps?.next())! // skip 2/3/5...
}
bp = (bps?.next())!; q = bp * bp // guaranteed never nil
} else {
let inc = dict[n] ?? 0
dict[n] = nil
var next = n + inc
while dict[next] != nil { next += inc }
dict[next] = inc
}
n += 2
}
defer { n += 2 }; return n
})
}</syntaxhighlight>

It can be used with the above testing code just by substituting `soeDictOdds` for `soeTreeFoldingOdds` in three places; the output is the same except that it is over four times faster, at about 12,500 CPU clock cycles per prime.
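
The dictionary technique described above can be sketched in Python for readers who want to trace it outside Swift. This is a minimal illustration, not a translation of the Swift code: the name `primes_dict_odds` and the variables `composites`, `base` and `square` are chosen here for the example. It assumes the same idea, namely that each base prime is only entered into the dictionary once the candidate reaches its square, with the base primes supplied by a second, slower instance of the same generator.

<syntaxhighlight lang="python">from itertools import count, islice

def primes_dict_odds():
    """Unbounded odds-only incremental sieve using a dict of pending composites."""
    yield 2
    yield 3
    yield 5
    yield 7
    composites = {}                   # upcoming odd composite -> step (2 * base prime)
    base_primes = primes_dict_odds()  # secondary feed of base primes, far behind us
    next(base_primes)                 # skip 2, only odd candidates are examined
    base = next(base_primes)          # 3
    square = base * base              # 9
    for n in count(9, 2):
        if n in composites:           # known composite: move its marker along
            step = composites.pop(n)
        elif n < square:              # unmarked and below the next square: prime
            yield n
            continue
        else:                         # n == square: start culling by this base prime
            step = 2 * base
            base = next(base_primes)
            square = base * base
        nxt = n + step
        while nxt in composites:      # keep one dictionary entry per composite
            nxt += step
        composites[nxt] = step

print(list(islice(primes_dict_odds(), 25)))</syntaxhighlight>

Because a base prime is added only at its square, the dictionary holds roughly one entry per prime below the square root of the current candidate, which keeps memory use small.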

'''Fast Bit-Packed Page-Segmented Version'''

An unbounded array-based algorithm can combine the excellent cache locality of the second bounded version above with an unbounded range. It produces a sequence of sieved bit-packed arrays, each sized to the CPU cache as required, and feeds the culling from a secondary stream of base primes produced in the same fashion, as in the following code:
<syntaxhighlight lang="swift">import Foundation

typealias Prime = UInt64
typealias BasePrime = UInt32
typealias SieveBuffer = [UInt8]
typealias BasePrimeArray = [UInt32]

// the lazy list described problems don't affect its use here as
// it is only used here for its memoization properties and not consumed...
// In fact a consumed deferred list would be better off to use a CIS as above!

// a lazy list to memoize the progression of base prime arrays...
// there is some bug in Swift 4.2 that generating a LazyList<T> with a
// function and immediately using an extension method on it without
// first storing it to a variable results in mem seg fault for large
// ranges in the order of a million; in order to write a consuming
// function, one must write a function passing in a generator thunk, and
// immediately call a `makeIterator()` on it before storing, then doing a
// iteration on the iterator; doing a for on the immediately produced
// LazyList<T> (without storing it) also works, but this means we have to
// implement the "higher order functions" ourselves.
// this bug may have something to do with "move semantics".
class LazyList<T> : LazySequenceProtocol {
internal typealias Thunk<T> = () -> T
let head : T
internal var _thnk: Thunk<LazyList<T>?>?
lazy var tail: LazyList<T>? = {
let tl = self._thnk?(); self._thnk = nil
return tl
}()
init(_ hd: T, _ thnk: @escaping Thunk<LazyList<T>?>) {
self.head = hd; self._thnk = thnk
}
struct LLSeqIter : IteratorProtocol, LazySequenceProtocol {
@usableFromInline
internal var _isfirst: Bool = true
@usableFromInline
internal var _current: LazyList<T>
@inlinable // ensure that reference is not released by weak reference
init(_ base: LazyList<T>) { self._current = base }
@inlinable // can't be called by multiple threads on same LLSeqIter...
mutating func next() -> T? {
let curll = self._current
if (self._isfirst) { self._isfirst = false; return curll.head }
let ncur = curll.tail
if (ncur == nil) { return nil }
self._current = ncur!
return ncur!.head
}
@inlinable
func makeIterator() -> LLSeqIter {
return LLSeqIter(self._current)
}
}
@inlinable
func makeIterator() -> LLSeqIter {
return LLSeqIter(self)
}
}

internal func makeCLUT() -> Array<UInt8> {
var clut = Array(repeating: UInt8(0), count: 65536)
for i in 0..<65536 {
let v0 = ~i & 0xFFFF
let v1 = v0 - ((v0 & 0xAAAA) >> 1)
let v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2)
let v3 = (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) &* 0x0101)) >> 8) & 31
clut[i] = UInt8(v3)
}
return clut
}

internal let CLUT = makeCLUT()

internal func countComposites(_ cmpsts: SieveBuffer) -> Int {
let len = cmpsts.count >> 1
let clutp = UnsafePointer(CLUT) // for faster un-bounds checked access
var bufp = UnsafeRawPointer(UnsafePointer(cmpsts))
.assumingMemoryBound(to: UInt16.self)
let plmt = bufp + len
var count: Int = 0
while (bufp < plmt) {
count += Int(clutp[Int(bufp.pointee)])
bufp += 1
}
return count
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
internal func composites2BasePrimeArray(_ low: BasePrime, _ cmpsts: SieveBuffer)
-> BasePrimeArray {
let lmti = cmpsts.count << 3
let len = countComposites(cmpsts)
var rslt = BasePrimeArray(repeating: BasePrime(0), count: len)
var j = 0
for i in 0..<lmti {
if (cmpsts[i >> 3] & (1 << (i & 7)) == UInt8(0)) {
rslt[j] = low + BasePrime(i + i); j += 1
}
}
return rslt
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
// uses pointers to avoid bounds checking for speed, but bounds are checked in code.
// uses an improved algorithm to maximize simple culling loop speed for
// the majority of cases of smaller base primes, only reverting to normal
// bit-packing operations for larger base primes...
// NOTE: a further optimization of maximum loop unrolling can later be
// implemented when warranted after maximum wheel factorization is implemented.
internal func sieveComposites(
_ low: Prime, _ buf: SieveBuffer,
_ bpas: LazyList<BasePrimeArray>) {
let lowi = Int64((low - 3) >> 1)
let len = buf.count
let lmti = Int64(len << 3)
let bufp = UnsafeMutablePointer(mutating: buf)
let plen = bufp + len
let nxti = lowi + lmti
for bpa in bpas {
for bp in bpa {
let bp64 = Int64(bp)
let bpi64 = (bp64 - 3) >> 1
var strti = (bpi64 * (bpi64 + 3) << 1) + 3
if (strti >= nxti) { return }
if (strti >= lowi) { strti -= lowi }
else {
let r = (lowi - strti) % bp64
strti = r == 0 ? 0 : bp64 - r
}
if (bp <= UInt32(len >> 3) && strti <= (lmti - 20 * bp64)) {
let slmti = min(lmti, strti + (bp64 << 3))
while (strti < slmti) {
let msk = UInt8(1 << (strti & 7))
var cp = bufp + Int(strti >> 3)
while (cp < plen) {
cp.pointee |= msk; cp += Int(bp64)
}
strti &+= bp64
}
}
else {
var c = strti
let nbufp = UnsafeMutableRawPointer(bufp)
.assumingMemoryBound(to: Int32.self)
while (c < lmti) {
nbufp[Int(c >> 5)] |= 1 << (c & 31)
c &+= bp64
}
}
}
}
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
// following used for fast clearing of SieveBuffer of multiple base size...
internal let clrbpseg = SieveBuffer(repeating: UInt8(0), count: 512)
internal func makeBasePrimeArrays() -> LazyList<BasePrimeArray> {
var cmpsts = SieveBuffer(repeating: UInt8(0), count: 512)
func nextelem(_ low: BasePrime, _ bpas: LazyList<BasePrimeArray>)
-> LazyList<BasePrimeArray> {
// calculate size so that the bit span is at least as big as the
// maximum culling prime required, rounded up to minsizebits blocks...
let rqdsz = 2 + Int(sqrt(Double(1 + low)))
let sz = ((rqdsz >> 12) + 1) << 9 // size in bytes, blocks of 512 bytes
if (sz > cmpsts.count) {
cmpsts = SieveBuffer(repeating: UInt8(0), count: sz)
}
// fast clearing of the SieveBuffer array?
for i in stride(from: 0, to: cmpsts.count, by: 512) {
cmpsts.replaceSubrange(i..<i+512, with: clrbpseg)
}
sieveComposites(Prime(low), cmpsts, bpas)
let arr = composites2BasePrimeArray(low, cmpsts)
let nxt = low + BasePrime(cmpsts.count << 4)
return LazyList(arr, { nextelem(nxt, bpas) })
}
// pre-seeding breaks recursive race,
// as only known base primes used for first page...
let preseedarr: [BasePrime] = [
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]
return
LazyList(
preseedarr,
{ nextelem(BasePrime(101), makeBasePrimeArrays()) })
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
internal let clrseg = SieveBuffer(repeating: UInt8(0), count: 16384)
func makeSievePages()
-> UnfoldSequence<(Prime, SieveBuffer), ((Prime, SieveBuffer)?, Bool)> {
let bpas = makeBasePrimeArrays()
let cmpsts = SieveBuffer(repeating: UInt8(0), count: 16384)
let low = Prime(3)
sieveComposites(low, cmpsts, bpas)
return sequence(first: (low, cmpsts), next: { (low, cmpsts) in
var ncmpsts = cmpsts
let rqdsz = 2 + Int(sqrt(Double(1 + low))) // NOTE: sqrt is not exact past about 10^12
let sz = ((rqdsz >> 17) + 1) << 14 // size in bytes, in chunks of 16384
if (sz > ncmpsts.count) {
ncmpsts = SieveBuffer(repeating: UInt8(0), count: sz)
}
// fast clearing of the SieveBuffer array?
for i in stride(from: 0, to: ncmpsts.count, by: 16384) {
ncmpsts.replaceSubrange(i..<i+16384, with: clrseg)
}
let nlow = low + Prime(ncmpsts.count << 4)
sieveComposites(nlow, ncmpsts, bpas)
return (nlow, ncmpsts)
})
}

func countPrimesTo(_ range: Prime) -> Int64 {
if (range < 3) { if (range < 2) { return Int64(0) }
else { return Int64(1) } }
let rngi = Int64(range - 3) >> 1
let clutp = UnsafePointer(CLUT) // for faster un-bounds checked access
var count: Int64 = 1
for sp in makeSievePages() {
let (low, cmpsts) = sp; let lowi = Int64(low - 3) >> 1
if ((lowi + Int64(cmpsts.count << 3)) > rngi) {
let lsti = Int(rngi - lowi); let lstw = lsti >> 4
let msk = UInt16(-2 << (lsti & 15))
var bufp = UnsafeRawPointer(UnsafePointer(cmpsts))
.assumingMemoryBound(to: UInt16.self)
let plmt = bufp + lstw
while (bufp < plmt) {
count += Int64(clutp[Int(bufp.pointee)]); bufp += 1
}
count += Int64(clutp[Int(bufp.pointee | msk)]);
break;
} else {
count += Int64(countComposites(cmpsts))
}
}
return count
}

// iterator of primes from the generated culled page segments...
struct PagedPrimesSeqIter: LazySequenceProtocol, IteratorProtocol {
@inlinable
init() {
self._pgs = makeSievePages().makeIterator()
self._cmpstsp = UnsafePointer(self._pgs.next()!.1)
}
@usableFromInline
internal var _pgs: UnfoldSequence<(Prime, SieveBuffer), ((Prime, SieveBuffer)?, Bool)>
@usableFromInline
internal var _i = -2
@usableFromInline
internal var _low = Prime(3)
@usableFromInline
internal var _cmpstsp: UnsafePointer<UInt8>
@usableFromInline
internal var _lmt = 131072
@inlinable
mutating func next() -> Prime? {
if self._i < -1 { self._i = -1; return Prime(2) }
while true {
repeat { self._i += 1 }
while self._i < self._lmt &&
(Int(self._cmpstsp[self._i >> 3]) & (1 << (self._i & 7))) != 0
if self._i < self._lmt { break }
let pg = self._pgs.next(); self._low = pg!.0
let cmpsts = pg!.1; self._lmt = cmpsts.count << 3
self._cmpstsp = UnsafePointer(cmpsts); self._i = -1
}
return self._low + Prime(self._i + self._i)
}
@inlinable
func makeIterator() -> PagedPrimesSeqIter {
return PagedPrimesSeqIter()
}
@inlinable
var elements: PagedPrimesSeqIter {
return PagedPrimesSeqIter()
}
}
// sequence over primes using the above prime iterator from page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as to sieve them...
func primesPaged() -> PagedPrimesSeqIter { return PagedPrimesSeqIter() }

let range = Prime(1000000000)

print("The first 25 primes are:")
primesPaged().prefix(25).forEach { print($0, "", terminator: "") }
print()

let start = NSDate()

let answr =
countPrimesTo(range) // fast way, following enumeration way is slower...
// primesPaged().prefix(while: { $0 <= range }).reduce(0, { a, _ in a + 1 })

let elpsd = -start.timeIntervalSinceNow

print("Found \(answr) primes up to \(range).")

print(String(format: "This test took %.3f milliseconds.", elpsd * 1000))</syntaxhighlight>
{{output}}
<pre>The first 25 primes are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 50847534 primes up to 1000000000.
This test took 2004.007 milliseconds.</pre>

This produces similar output but is many times faster, at about 75 CPU clock cycles per prime when used as here to count the primes to a billion. If one were to produce the answer by enumeration using the commented-out `primesPaged()` line, the time to enumerate the results would be about the same as the time to actually do the work of culling, so the example `countPrimesTo` function, which does high-speed counting of the packed bits, is implemented to eliminate the enumeration. For most problems over larger ranges this approach is recommended, and it could be used for summing the primes, finding first instances of maximum prime gaps, prime pairs, triples, etc.
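
The page-by-page counting idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a port of the Swift code: it uses one byte per number rather than packed bits, does not skip even numbers, and is bounded by `limit`. The names `base_primes_upto`, `count_primes_segmented` and `segment_size` are made up for this example. What it does share with the code above is that each page is culled using only base primes up to the square root of the page's top value, and that primes are counted directly from the page buffer instead of being enumerated.

<syntaxhighlight lang="python">from math import isqrt

def base_primes_upto(limit):
    """Plain bounded sieve, used only to supply the base (culling) primes."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p::p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, f in enumerate(flags) if f]

def count_primes_segmented(limit, segment_size=1 << 15):
    """Count the primes <= limit one page (segment) at a time."""
    if limit < 2:
        return 0
    base = base_primes_upto(isqrt(limit))
    total = 0
    for low in range(2, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        page = bytearray([1]) * (high - low + 1)   # 1 means "still a candidate"
        for p in base:
            if p * p > high:
                break
            # first multiple of p inside the page, never below p * p
            start = max(p * p, (low + p - 1) // p * p)
            page[start - low::p] = bytes(len(range(start, high + 1, p)))
        total += sum(page)                         # count survivors, no enumeration
    return total

print(count_primes_segmented(1_000_000))</syntaxhighlight>

Keeping `segment_size` close to the size of a CPU cache level is what gives the real implementation its speed; in this Python sketch the parameter mainly demonstrates that the count does not depend on the page size.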

Further optimizations such as maximum wheel factorization (about five times faster again), multi-threading (faster by a further multiple of the effective number of cores used), maximum loop unrolling (about a factor of two for smaller base primes), and other techniques for higher ranges (above 16 billion in this case) can be used at the cost of increasing code complexity, but there is little point when using prime enumeration as output: i.e. one could reduce the composite number culling time to zero but it would still take about 2.8 seconds to enumerate the results over the billion range on this processor.
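
The simplest wheel, skipping the even numbers, is already used by the Swift code above, where bit index i stands for the odd number 2i + 3. The following Python sketch shows that index arithmetic on its own. It is an illustrative bounded version with a hypothetical name, `primes_odds_only`, and one byte per flag instead of packed bits. The start index for culling by p = 2i + 3 is (p*p - 3) / 2, which simplifies to 2i(i + 3) + 3, the same expression the Swift code computes for `strti`.

<syntaxhighlight lang="python">from math import isqrt

def primes_odds_only(limit):
    """Bounded sieve over odd numbers only; index i represents 2 * i + 3."""
    if limit < 2:
        return []
    if limit < 3:
        return [2]
    top = (limit - 3) // 2                  # index of the largest odd number <= limit
    composite = bytearray(top + 1)
    for i in range((isqrt(limit) - 3) // 2 + 1):
        if not composite[i]:
            p = 2 * i + 3
            start = 2 * i * (i + 3) + 3     # index of p * p
            # stepping p in index space is stepping 2 * p in value space,
            # so even multiples are never touched
            composite[start::p] = b"\x01" * len(range(start, top + 1, p))
    return [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]

print(primes_odds_only(100))</syntaxhighlight>

Larger wheels extend the same idea by also leaving out multiples of 3, 5, 7 and so on, at the price of a more involved index mapping.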

=={{header|Tailspin}}==
<syntaxhighlight lang="tailspin">
templates sieve
def limit: $;
@: [ 2..$limit ];
1 -> #
$@ !

when <..$@::length ?($@($) * $@($) <..$limit>)> do
templates sift
def prime: $;
@: $prime * $prime;
@sieve: [ $@sieve... -> # ];
when <..~$@> do
$ !
when <$@~..> do
@: $@ + $prime;
$ -> #
end sift

$@($) -> sift !
$ + 1 -> #
end sieve

1000 -> sieve ...-> '$; ' -> !OUT::write
</syntaxhighlight>
{{out}}
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997
</pre>

Better version using the mutability of the @-state to just update a primality flag
<syntaxhighlight lang="tailspin">
templates sieve
def limit: $;
@: [ 1..$limit -> 1 ];
@(1): 0;
2..$limit -> #
$@ -> \[i](<=1> $i !\) !

when <?($@($) <=1>)> do
def prime2: $ * $;
$prime2..$limit:$ -> @sieve($): 0;
end sieve

1000 -> sieve... -> '$; ' -> !OUT::write
</syntaxhighlight>
{{out}}
<pre>
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997
</pre>


=={{header|Tcl}}==
<syntaxhighlight lang="tcl">package require Tcl 8.5

proc sieve n {
Line 9,771: Line 19,519:
}

puts [sieve 100] ;# 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97</syntaxhighlight>




=={{header|TI-83 BASIC}}==
<syntaxhighlight lang="ti83b">Input "Limit:",N
N→Dim(L1)
For(I,2,N)
Line 9,796: Line 19,544:
End
End
ClrList L1</syntaxhighlight>


=={{header|UNIX Shell}}==
===With array===
{{works with|Bourne Again SHell}}
{{works with|Korn Shell}}
{{works with|Zsh}}

<syntaxhighlight lang="bash">function primes {
typeset -a a
typeset i j

a[1]=""
for (( i = 2; i <= $1; i++ )); do
a[$i]=$i
done

for (( i = 2; i * i <= $1; i++ )); do
if [[ ! -z ${a[$i]} ]]; then
for (( j = i * i; j <= $1; j += i )); do
a[$j]=""
done
fi
done
printf '%s' "${a[2]}"
printf ' %s' ${a[*]:3}
printf '\n'
}

primes 1000</syntaxhighlight>

{{Out}}
Output is a single long line:
<pre>2 3 5 7 11 13 17 19 23 ... 971 977 983 991 997</pre>


Line 9,859: Line 19,580:


{{works with|Bourne Shell}}
<syntaxhighlight lang="bash">#! /bin/sh

LIMIT=1000
Line 9,901: Line 19,622:
eval \\$p$i && echo $i
i=\`expr $i + 1\`
done`</syntaxhighlight>


===With piping===
{{incorrect|Bash|This version uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}
Note: McIlroy misunderstood the Sieve of Eratosthenes as did many of his day including David Turner (1975); see [https://en.m.wikipedia.org/wiki/Sieve_of_Eratosthenes Sieve of Eratosthenes article on Wikipedia]


This is an elegant script by [https://en.wikipedia.org/wiki/Douglas_McIlroy M. Douglas McIlroy], one of the founding fathers of UNIX.

This implementation is explained in his paper [https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf "Coroutine prime number sieve"] (2014).

{{works with|Bourne Shell}}
<syntaxhighlight lang="bash">sourc() {
seq 2 1000
}

cull() {
while
read p || exit
do
(($p % $1 != 0)) && echo $p
done
}

sink() {
read p || exit
echo $p
cull $p | sink &
}

sourc | sink</syntaxhighlight>
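
The distinction drawn in the note above can be made concrete with a short Python comparison. Both functions below are illustrative sketches with made-up names. The first mirrors the structure of the pipeline script, where every surviving candidate is tested with a remainder operation against each earlier prime, which is trial division. The second is a sieve in the proper sense: composites are reached by counting in steps of p from p squared, and no candidate is ever divided.

<syntaxhighlight lang="python">def primes_by_filter_chain(limit):
    """Trial division in disguise: survivors are tested with % against each prime."""
    candidates = list(range(2, limit + 1))
    primes = []
    while candidates:
        p = candidates[0]
        primes.append(p)
        # same role as one cull stage of the shell pipeline
        candidates = [n for n in candidates[1:] if n % p != 0]
    return primes

def primes_by_sieve(limit):
    """Sieve of Eratosthenes: multiples are marked by stepping, with no division."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

print(primes_by_filter_chain(50) == primes_by_sieve(50))</syntaxhighlight>

The two produce the same primes, but the filter chain does far more work per prime because each prime must pass a remainder test against every smaller prime before it is reported.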

This version works by piping 1s and 0s through ''sed''. The string of 1s and 0s represents the array of primes.

{{works with|Bourne Shell}}
<syntaxhighlight lang="bash"># Fill $1 characters with $2.
fill () {
# This pipeline would begin
Line 9,939: Line 19,689:
}

prime 1000</syntaxhighlight>


==={{header|C Shell}}===
{{trans|CMake}}
<syntaxhighlight lang="csh"># Sieve of Eratosthenes: Echoes all prime numbers through $limit.
@ limit = 80

Line 9,970: Line 19,720:
endif
@ i += 1
end</syntaxhighlight>

=={{header|Ursala}}==
{{incorrect|Ursala|It probably (remainder) uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.}}
with no optimizations
<syntaxhighlight lang="ursala">#import nat

sieve = ~<{0,1}&& iota; @NttPX ~&r->lx ^/~&rhPlC remainder@rlX~|@r</syntaxhighlight>
test program:
<syntaxhighlight lang="ursala">#cast %nL

example = sieve 50</syntaxhighlight>{{out}}
<2,3,5,7,11,13,17,19,23,29,31,37,41,43,47>


=={{header|Vala}}==
{{libheader|Gee}}Without any optimizations:
<syntaxhighlight lang="vala">using Gee;

ArrayList<int> primes(int limit){
Line 10,022: Line 19,773:
stdout.printf("\n");
}</syntaxhighlight>{{out}}
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47

=={{header|VAX Assembly}}==
<syntaxhighlight lang="vax assembly"> 000F4240 0000 1 n=1000*1000
0000 0000 2 .entry main,0
7E 7CFD 0002 3 clro -(sp) ;result buffer
5E DD 0005 4 pushl sp ;pointer to buffer
10 DD 0007 5 pushl #16 ;descriptor -> len of buffer
0009 6
02 DD 0009 7 pushl #2 ;1st candidate
000B 8 test:
09 46'AF 6E E1 000B 9 bbc (sp), b^bits, found ;bc - bit clear
0010 10 next:
F3 6E 000F4240 8F F2 0010 11 aoblss #n, (sp), test ;+1: limit,index
04 0018 12 ret
0019 13 found:
04 AE 7F 0019 14 pushaq 4(sp) ;-> descriptor by ref
04 AE DF 001C 15 pushal 4(sp) ;-> prime on stack by ref
00000000'GF 02 FB 001F 16 calls #2, g^ots$cvt_l_ti ;convert integer to string
04 AE 7F 0026 17 pushaq 4(sp) ;
00000000'GF 01 FB 0029 18 calls #1, g^lib$put_output ;show result
0030 19
53 6E D0 0030 20 movl (sp), r3
0033 21 mult:
0002 53 6E 000F4240 8F F1 0033 22 acbl #n, (sp), r3, set_mult ;limit,add,index
D1 11 003D 23 brb next
003F 24 set_mult: ;set bits for multiples
EF 46'AF 53 E2 003F 25 bbss r3, b^bits, mult ;branch on bit set & set
ED 11 0044 26 brb mult
0046 27
0001E892 0046 28 bits: .blkl <n+2+31>/32
E892 29 .end main</syntaxhighlight>

=={{header|VBA}}==
Using Excel<syntaxhighlight lang="vb"> Sub primes()
'BRRJPA
'Prime calculation for VBA_Excel
'p is the upper limit of the range calculation
'This example calculates from 2 to 100000 and prints the results
'in column A


p = 100000

Dim nprimes(1 To 100000) As Integer
b = Sqr(p)

For n = 2 To b

For k = n * n To p Step n
nprimes(k) = 1
Next k
Next n


For a = 2 To p
If nprimes(a) = 0 Then
c = c + 1
Range("A" & c).Value = a
End If
Next a

End Sub </syntaxhighlight>

=={{header|VBScript}}==
To run in console mode with cscript.
<syntaxhighlight lang="vb">
Dim sieve()
If WScript.Arguments.Count>=1 Then
Line 10,047: Line 19,863:
If sieve(i) Then WScript.Echo i
Next
</syntaxhighlight>


=={{header|Vedit macro language}}==
Line 10,084: Line 19,900:
-P-----P---------P-----P-----P-P-----P---------P-----P-----P
</pre>

=={{header|Visual Basic}}==
'''Works with:''' VB6
<syntaxhighlight lang="vb">Sub Eratost()
Dim sieve() As Boolean
Dim n As Integer, i As Integer, j As Integer
Line 10,137: Line 19,921:
If sieve(i) Then Debug.Print i
Next i
End Sub 'Eratost</syntaxhighlight>


=={{header|Visual Basic .NET}}==
<syntaxhighlight lang="vbnet">Dim n As Integer, k As Integer, limit As Integer
Console.WriteLine("Enter number to search to: ")
limit = Console.ReadLine
Line 10,151: Line 19,935:
End If
Next n

' Display the primes
For n = 2 To limit
Line 10,157: Line 19,941:
Console.WriteLine(n)
End If
Next n</syntaxhighlight>
===Alternate===
Since multiples are only ever culled above the current value of ''n'', each prime can be displayed as soon as it is found, so the separate display loop is unnecessary, and no '''Math.Sqrt()''' is needed. Input comes from a command line parameter instead of Console.ReadLine(). The ''If'' block and ''For'' statement are consolidated into two ''Do'' loops.
<syntaxhighlight lang="vbnet">Module Module1
Sub Main(args() As String)
Dim lmt As Integer = 500, n As Integer = 2, k As Integer
If args.Count > 0 Then Integer.TryParse(args(0), lmt)
Dim flags(lmt + 2) As Boolean ' non-primes are true in this array; two spare slots stop the skip loop for odd limits.
Do ' a prime was found,
Console.Write($"{n} ") ' so show it,
For k = n * n To lmt Step n ' and eliminate any multiple of it at its square and beyond.
flags(k) = True
Next
Do ' skip over non-primes.
n += If(n = 2, 1, 2)
Loop While flags(n)
Loop while n <= lmt
End Sub
End Module</syntaxhighlight>{{out}}<pre>2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 </pre>
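
The reasoning behind the alternate version, that a number still unmarked when the scan reaches it must be prime, can be sketched in Python as follows. The generator name `primes_as_found` is hypothetical and the code is an illustration of the idea rather than a translation. By the time the scan arrives at n, every prime p with p * p <= n has already marked its multiples from p * p upward, so any composite n has been marked and no square root test or second pass is required.

<syntaxhighlight lang="python">def primes_as_found(limit):
    """Yield each prime as soon as the scan reaches it, culling from its square."""
    if limit < 2:
        return
    composite = bytearray(limit + 1)     # non-zero means "known composite"
    n = 2
    while n <= limit:
        yield n                          # unmarked when reached, so prime
        for k in range(n * n, limit + 1, n):
            composite[k] = 1
        n += 1 if n == 2 else 2          # after 2, only odd candidates
        while n <= limit and composite[n]:
            n += 2                       # skip over non-primes

print(*primes_as_found(100))</syntaxhighlight>

Unlike the VB.NET loop, the skip loop here checks `n <= limit` before indexing, so no spare array slots are needed.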

=={{header|V (Vlang)}}==
{{trans|go}}
===Basic sieve of array of booleans===
<syntaxhighlight lang="v (vlang)">fn main() {
limit := 201 // means sieve numbers < 201
// sieve
mut c := []bool{len: limit} // c for composite. false means prime candidate
c[1] = true // 1 not considered prime
mut p := 2
for {
// first allowed optimization: outer loop only goes to sqrt(limit)
p2 := p * p
if p2 >= limit {
break
}
// second allowed optimization: inner loop starts at sqr(p)
for i := p2; i < limit; i += p {
c[i] = true // it's a composite
}
// scan to get next prime for outer loop
for {
p++
if !c[p] {
break
}
}
}
// sieve complete. now print a representation.
for n in 1..limit {
if c[n] {
print(" .")
} else {
print("${n:3}")
}
if n%20 == 0 {
println("")
}
}
}</syntaxhighlight>
Output:
<pre>
. 2 3 . 5 . 7 . . . 11 . 13 . . . 17 . 19 .
. . 23 . . . . . 29 . 31 . . . . . 37 . . .
41 . 43 . . . 47 . . . . . 53 . . . . . 59 .
61 . . . . . 67 . . . 71 . 73 . . . . . 79 .
. . 83 . . . . . 89 . . . . . . . 97 . . .
101 .103 . . .107 .109 . . .113 . . . . . . .
. . . . . .127 . . .131 . . . . .137 .139 .
. . . . . . . .149 .151 . . . . .157 . . .
. .163 . . .167 . . . . .173 . . . . .179 .
181 . . . . . . . . .191 .193 . . .197 .199 .
</pre>


=={{header|Vorpal}}==
<syntaxhighlight lang="vorpal">self.print_primes = method(m){
p = new()
p.fill(0, m, 1, true)
Line 10,182: Line 20,039:
}

self.print_primes(100)</syntaxhighlight>{{out|Result}}
primes: 25 in 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,

=={{header|WebAssembly}}==
(module
(import "js" "print" (func $print (param i32)))
(memory 4096)
(func $sieve (export "sieve") (param $n i32)
(local $i i32)
(local $j i32)
(set_local $i (i32.const 0))
(block $endLoop
(loop $loop
(br_if $endLoop (i32.ge_s (get_local $i) (get_local $n)))
(i32.store8 (get_local $i) (i32.const 1))
(set_local $i (i32.add (get_local $i) (i32.const 1)))
(br $loop)))
(set_local $i (i32.const 2))
(block $endLoop
(loop $loop
(br_if $endLoop (i32.ge_s (i32.mul (get_local $i) (get_local $i))
(get_local $n)))
(if (i32.eq (i32.load8_s (get_local $i)) (i32.const 1))
(then
(set_local $j (i32.mul (get_local $i) (get_local $i)))
(block $endInnerLoop
(loop $innerLoop
(i32.store8 (get_local $j) (i32.const 0))
(set_local $j (i32.add (get_local $j) (get_local $i)))
(br_if $endInnerLoop (i32.ge_s (get_local $j) (get_local $n)))
(br $innerLoop)))))
(set_local $i (i32.add (get_local $i) (i32.const 1)))
(br $loop)))
(set_local $i (i32.const 2))
(block $endLoop
(loop $loop
(if (i32.eq (i32.load8_s (get_local $i)) (i32.const 1))
(then
(call $print (get_local $i))))
(set_local $i (i32.add (get_local $i) (i32.const 1)))
(br_if $endLoop (i32.ge_s (get_local $i) (get_local $n)))
(br $loop)))))

=={{header|Xojo}}==
Place the following in the '''Run''' event handler of a Console application:
<syntaxhighlight lang="xojo">Dim limit, prime, i As Integer
Dim s As String
Dim t As Double
Dim sieve(100000000) As Boolean

REM Get the maximum number
While limit<1 Or limit > 100000000
Print("Max number? [1 to 100000000]")
s = Input
limit = CDbl(s)
Wend

REM Do the calculations
t = Microseconds
prime = 2
While prime^2 <= limit
For i = prime*2 To limit Step prime
sieve(i) = True
Next
Do
prime = prime+1
Loop Until Not sieve(prime)
Wend
t = Microseconds-t
Print("Compute time = "+Str(t/1000000)+" sec")
Print("Press Enter...")
s = Input

REM Display the prime numbers
For i = 1 To limit
If Not sieve(i) Then Print(Str(i))
Next
s = Input</syntaxhighlight>

{{out}}
<pre>Max number? [1 to 100000000]
1000
Compute time = 0.0000501 sec
Press Enter...

1
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
...
</pre>

This version uses a dynamic array and can use (a lot) less memory. It's (a lot) slower too.
Since Booleans are manually set to True, the algorithm makes more sense.
<syntaxhighlight lang="xojo">Dim limit, prime, i As Integer
Dim s As String
Dim t As Double
Dim sieve() As Boolean

REM Get the maximum number and define array
While limit<1 Or limit > 2147483647
Print("Max number? [1 to 2147483647]")
s = Input
limit = CDbl(s)
Wend
t = Microseconds
For i = 0 To Limit
Sieve.Append(True)
Next
t = Microseconds-t
Print("Memory allocation time = "+Str(t/1000000)+" sec")

REM Do the calculations
t = Microseconds
prime = 2
While prime^2 <= limit
For i = prime*2 To limit Step prime
sieve(i) = False
Next
Do
prime = prime+1
Loop Until sieve(prime)
Wend
t = Microseconds-t
Print("Compute time = "+Str(t/1000000)+" sec")
Print("Press Enter...")
s = Input

REM Display the prime numbers
For i = 2 To limit
If sieve(i) Then Print(Str(i))
Next
s = Input</syntaxhighlight>

{{out}}
<pre>Max number? [1 to 2147483647]
1000
Memory allocation time = 0.0000296 sec
Compute time = 0.0000501 sec
Press Enter...

2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
...
</pre>

=={{header|Woma}}==

<syntaxhighlight lang="woma">(sieve(n = /0 -> int; limit = /0 -> int; is_prime = [/0] -> *)) *
i<@>range(n*n, limit+1, n)
is_prime = is_prime[$]i,False
<*>is_prime

(primes_upto(limit = 4 -> int)) list(int)
primes = [] -> list
f = [False, False] -> list(bool)
t = [True] -> list(bool)
u = limit - 1 -> int
tt = t * u -> list(bool)
is_prime = flatten(f[^]tt) -> list(bool)
limit_sqrt = limit ** 0.5 -> float
iter1 = int(limit_sqrt + 1.5) -> int

n<@>range(iter1)
is_prime[n]<?>is_prime = sieve(n, limit, is_prime)

i,prime<@>enumerate(is_prime)
prime<?>primes = primes[^]i
<*>primes</syntaxhighlight>

=={{header|Wren}}==
<syntaxhighlight lang="wren">var sieveOfE = Fn.new { |n|
if (n < 2) return []
var comp = List.filled(n-1, false)
var p = 2
while (true) {
var p2 = p * p
if (p2 > n) break
var i = p2
while (i <= n) {
comp[i-2] = true
i = i + p
}
while (true) {
p = p + 1
if (!comp[p-2]) break
}
}
var primes = []
for (i in 0..n-2) {
if (!comp[i]) primes.add(i+2)
}
return primes
}

System.print(sieveOfE.call(100))</syntaxhighlight>

{{out}}
<pre>
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
</pre>

=={{header|XPL0}}==
<syntaxhighlight lang="xpl0">include c:\cxpl\codes; \intrinsic 'code' declarations
int Size, Prime, I, Kill;
char Flag;
Line 10,202: Line 20,289:
];
];
]</syntaxhighlight>{{out|Example output}}<pre>20
2
3
Line 10,211: Line 20,298:
17
19</pre>

=={{header|Yabasic}}==
<syntaxhighlight lang="yabasic">#!/usr/bin/yabasic

// ---------------------------
// Prime Sieve Benchmark --
// "Shootout" Version --
// ---------------------------
// usage:
// yabasic sieve8k.yab 90000


SIZE = 8192
ONN = 1 : OFF = 0
dim flags(SIZE)

sub main()
cmd = peek("arguments")
if cmd = 1 then
iterations = val(peek$("argument"))
if iterations = 0 then print "Argument wrong. Done 1000." : iterations = 1000 end if
else
print "1000 iterations."
iterations = 1000
end if
for iter = 1 to iterations
count = 0
for n= 1 to SIZE : flags(n) = ONN: next n
for i = 2 to SIZE
if flags(i) = ONN then
let k = i + i
if k < SIZE then
for k = k to SIZE step i
flags(k) = OFF
next k
end if
count = count + 1
end if
next i
next iter
print "Count: ", count // 1028
end sub

clear screen

print "Prime Sieve Benchmark\n"

main()

t = val(mid$(time$,10))

print "time: ", t, "\n"
print peek("millisrunning")</syntaxhighlight>

=={{header|Zig}}==
<syntaxhighlight lang="zig">
const std = @import("std");
const stdout = std.io.getStdOut().outStream();

pub fn main() !void {
try sieve(1000);
}

// using a comptime limit ensures that there's no need for dynamic memory.
fn sieve(comptime limit: usize) !void {
var prime = [_]bool{true} ** limit;
prime[0] = false;
prime[1] = false;
var i: usize = 2;
while (i*i < limit) : (i += 1) {
if (prime[i]) {
var j = i*i;
while (j < limit) : (j += i)
prime[j] = false;
}
}
var c: i32 = 0;
for (prime) |yes, p|
if (yes) {
c += 1;
try stdout.print("{:5}", .{p});
if (@rem(c, 10) == 0)
try stdout.print("\n", .{});
};
try stdout.print("\n", .{});
}
</syntaxhighlight>
{{out}}
<pre>
$ zig run sieve.zig
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
</pre>
===Odds-only bit packed version===
{{trans|BCPL}}
Includes the iterator, as with the BCPL Odds-only bit packed sieve. Since it's not much extra code, the sieve object also includes methods for getting the size and testing for membership.
<syntaxhighlight lang="zig">
const std = @import("std");
const heap = std.heap;
const mem = std.mem;
const stdout = std.io.getStdOut().writer();

pub fn main() !void {
const assert = std.debug.assert;

var buf: [fixed_alloc_sz(1000)]u8 = undefined; // buffer big enough to sieve the primes below 1,000.
var fba = heap.FixedBufferAllocator.init(&buf);

const sieve = try SoE.init(1000, &fba.allocator);
defer sieve.deinit(); // not needed for the FBA, but in general you would de-init the sieve

// test membership functions
assert(sieve.contains(997));
assert(!sieve.contains(995));
assert(!sieve.contains(994));
assert(!sieve.contains(1009));

try stdout.print("There are {} primes < 1000\n", .{sieve.size()});
var c: u32 = 0;
var iter = sieve.iterator();
while (iter.next()) |p| {
try stdout.print("{:5}", .{p});
c += 1;
if (c % 10 == 0)
try stdout.print("\n", .{});
}
try stdout.print("\n", .{});
}

// Returns the buffer size needed to sieve up to `limit` when using the Fixed Buffer Allocator;
// adds some u64 words for FBA bookkeeping.
pub inline fn fixed_alloc_sz(limit: usize) usize {
return (2 + limit / 128) * @sizeOf(u64);
}

pub const SoE = struct {
const all_u64bits_on = 0xFFFF_FFFF_FFFF_FFFF;
const empty = [_]u64{};

sieve: []u64,
alloc: *mem.Allocator,

pub fn init(limit: u64, allocator: *mem.Allocator) error{OutOfMemory}!SoE {
if (limit < 3)
return SoE{
.sieve = &empty,
.alloc = allocator,
};

var bit_sz = (limit + 1) / 2 - 1;
var q = bit_sz >> 6;
var r = bit_sz & 0x3F;
var sz = q + @boolToInt(r > 0);
var sieve = try allocator.alloc(u64, sz);

var i: usize = 0;
while (i < q) : (i += 1)
sieve[i] = all_u64bits_on;
if (r > 0)
sieve[q] = (@as(u64, 1) << @intCast(u6, r)) - 1;

var bit: usize = 0;
while (true) {
while (sieve[bit >> 6] & @as(u64, 1) << @intCast(u6, bit & 0x3F) == 0)
bit += 1;

const p = 2 * bit + 3;
q = p * p;
if (q > limit)
return SoE{
.sieve = sieve,
.alloc = allocator,
};

r = (q - 3) / 2;
while (r < bit_sz) : (r += p)
sieve[r >> 6] &= ~((@as(u64, 1)) << @intCast(u6, r & 0x3F));

bit += 1;
}
}

pub fn deinit(self: SoE) void {
if (self.sieve.len > 0) {
self.alloc.free(self.sieve);
}
}

pub fn iterator(self: SoE) SoE_Iterator {
return SoE_Iterator.init(self.sieve);
}

pub fn size(self: SoE) usize {
var sz: usize = 1; // sieve doesn't include 2.
for (self.sieve) |bits|
sz += @popCount(u64, bits);
return sz;
}

pub fn contains(self: SoE, n: u64) bool {
if (n & 1 == 0)
return n == 2
else {
const bit = (n - 3) / 2;
const q = bit >> 6;
const r = @intCast(u6, bit & 0x3F);
return if (q >= self.sieve.len)
false
else
self.sieve[q] & (@as(u64, 1) << r) != 0;
}
}
};

// Create an iterator object to enumerate the primes we've generated.
const SoE_Iterator = struct {
const Self = @This();

start: u64,
bits: u64,
sieve: []const u64,

pub fn init(sieve: []const u64) Self {
return Self{
.start = 0,
.sieve = sieve,
.bits = sieve[0],
};
}

pub fn next(self: *Self) ?u64 {
if (self.sieve.len == 0)
return null;

// start = 0 => first time, so yield 2.
if (self.start == 0) {
self.start = 3;
return 2;
}

var x = self.bits;
while (true) {
if (x != 0) {
const p = @ctz(u64, x) * 2 + self.start;
x &= x - 1;
self.bits = x;
return p;
} else {
self.start += 128;
self.sieve = self.sieve[1..];
if (self.sieve.len == 0)
return null;
x = self.sieve[0];
}
}
}
};
</syntaxhighlight>
{{Out}}
<pre>
There are 168 primes < 1000
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
</pre>
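The storage scheme used by the odds-only bit packed version can be sketched in Python as follows. This is an illustrative sketch, not a translation of the Zig code: the class name <code>OddSieve</code> and its methods are assumptions made for this example. Bit <code>b</code> of the packed array stands for the odd number <code>2*b + 3</code>, and the prime 2 is handled as a special case.

```python
class OddSieve:
    """Odds-only sieve packed into 64-bit words (hypothetical helper class).

    Bit b stands for the odd number 2*b + 3; the prime 2 is special-cased.
    """

    def __init__(self, limit):
        self.limit = limit
        # Number of odd candidates in 3..limit (0 when limit < 3).
        self.nbits = max(0, (limit + 1) // 2 - 1)
        nwords = (self.nbits + 63) // 64
        self.words = [0xFFFF_FFFF_FFFF_FFFF] * nwords
        # Clear the unused high bits of the last word.
        extra = nwords * 64 - self.nbits
        if extra:
            self.words[-1] >>= extra
        bit = 0
        while True:
            p = 2 * bit + 3
            if p * p > limit:
                break
            if self._test(bit):
                # p*p is odd; consecutive odd multiples of p are p bits apart.
                for b in range((p * p - 3) // 2, self.nbits, p):
                    self.words[b >> 6] &= ~(1 << (b & 63))
            bit += 1

    def _test(self, bit):
        return bool((self.words[bit >> 6] >> (bit & 63)) & 1)

    def contains(self, n):
        """Membership test for any integer n."""
        if n < 2 or n > self.limit:
            return False
        if n == 2:
            return True
        if n % 2 == 0:
            return False
        return self._test((n - 3) // 2)

    def __iter__(self):
        if self.limit >= 2:
            yield 2
        for w, word in enumerate(self.words):
            while word:
                low = word & -word              # isolate the lowest set bit
                yield 2 * (64 * w + low.bit_length() - 1) + 3
                word &= word - 1                # clear that bit

    def __len__(self):
        return (self.limit >= 2) + sum(bin(word).count("1") for word in self.words)


sieve = OddSieve(1000)
print(len(sieve), "primes up to 1000")
```

Packing only the odd candidates halves the number of flags, and using one bit per flag instead of one byte cuts memory by a further factor of eight, at the cost of some shifting and masking on every access.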
===Optimized version===
<syntaxhighlight lang="zig">
const stdout = @import("std").io.getStdOut().writer();

const lim = 1000;
const n = lim - 2;

var primes: [n]?usize = undefined;

pub fn main() anyerror!void {
var i: usize = 0;
var m: usize = 0;

while (i < n) : (i += 1) {
primes[i] = i + 2;
}

i = 0;
while (i < n) : (i += 1) {
if (primes[i]) |prime| {
m += 1;
try stdout.print("{:5}", .{prime});
if (m % 10 == 0) try stdout.print("\n", .{});
var j: usize = i + prime;
while (j < n) : (j += prime) {
primes[j] = null;
}
}
}
try stdout.print("\n", .{});
}

</syntaxhighlight>
{{Out}}
<pre>
$ zig run sieve.zig
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
</pre>

=={{header|zkl}}==
<syntaxhighlight lang="zkl">fcn sieve(limit){
composite:=Data(limit+1).fill(1); // bucket of bytes set to 1 (prime)
(2).pump(limit.toFloat().sqrt()+1, Void, // Void==no results, just loop
Line 10,219: Line 20,657:
(2).filter(limit-1,composite.get); // bytes still 1 are prime
}
sieve(53).println();</syntaxhighlight>
The pump method is just a loop, passing results from action to action
and collecting the results (i.e. a minimal state machine). Pumping to Void

Revision as of 19:34, 1 May 2024

Task
Sieve of Eratosthenes
You are encouraged to solve this task according to the task description, using any language you may know.
This task has been clarified. Its programming examples are in need of review to ensure that they still fit the requirements of the task.


The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.


Task

Implement the Sieve of Eratosthenes algorithm. The only optimizations allowed are that the outer loop may stop at the square root of the limit, and that the inner loop may start at the square of the prime just found.

In particular, do not optimize by using pre-computed wheels, i.e. do not assume that only odd numbers need to be crossed out (wheel based on 2), or only numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or use similar wheels based on low primes.

If there's an easy way to add such a wheel based optimization, implement it as an alternative version.
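
As a point of reference for the requirements above, here is a minimal sketch in Python. The function name <code>primes_upto</code> is borrowed from the 11l entry; the rest is one possible reading of the task rather than a canonical solution. It uses exactly the two permitted optimizations and no wheel.

```python
from math import isqrt


def primes_upto(limit):
    """Return all primes <= limit using the plain Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[k] stays True while k has not been crossed out.
    is_prime = [False, False] + [True] * (limit - 1)
    # Permitted optimization 1: the outer loop stops at sqrt(limit).
    for n in range(2, isqrt(limit) + 1):
        if is_prime[n]:
            # Permitted optimization 2: start crossing out at n*n,
            # because smaller multiples were removed by smaller primes.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [k for k, flag in enumerate(is_prime) if flag]


print(primes_upto(100))
```

Note that the outer bound is inclusive: a limit that is itself the square of a prime (9, 25, 49, ...) is only crossed out if the loop reaches its square root.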


Note
  • It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.


Related tasks



11l

Translation of: Python
F primes_upto(limit)
   V is_prime = [0B]*2 [+] [1B]*(limit - 1)
   L(n) 0 .< Int(limit ^ 0.5 + 1.5)
      I is_prime[n]
         L(i) (n*n..limit).step(n)
            is_prime[i] = 0B
   R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)

print(primes_upto(100))
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

360 Assembly

For maximum compatibility, this program uses only the basic instruction set.

*        Sieve of Eratosthenes 
ERATOST  CSECT  
         USING  ERATOST,R12
SAVEAREA B      STM-SAVEAREA(R15)
         DC     17F'0'
         DC     CL8'ERATOST'
STM      STM    R14,R12,12(R13) save calling context
         ST     R13,4(R15)      
         ST     R15,8(R13)
         LR     R12,R15         set addressability
*        ----   CODE
         LA     R4,1            I=1  
         LA     R6,1            increment
         L      R7,N            limit
LOOPI    BXH    R4,R6,ENDLOOPI  do I=2 to N
         LR     R1,R4           R1=I
         BCTR   R1,0             
         LA     R14,CRIBLE(R1)
         CLI    0(R14),X'01'
         BNE    ENDIF           if not CRIBLE(I)
         LR     R5,R4           J=I
         LR     R8,R4
         LR     R9,R7
LOOPJ    BXH    R5,R8,ENDLOOPJ  do J=I*2 to N by I
         LR     R1,R5           R1=J
         BCTR   R1,0
         LA     R14,CRIBLE(R1)
         MVI    0(R14),X'00'    CRIBLE(J)='0'B
         B      LOOPJ
ENDLOOPJ EQU    *
ENDIF    EQU    *
         B      LOOPI
ENDLOOPI EQU    *
         LA     R4,1            I=1  
         LA     R6,1
         L      R7,N
LOOP     BXH    R4,R6,ENDLOOP   do I=1 to N
         LR     R1,R4           R1=I
         BCTR   R1,0
         LA     R14,CRIBLE(R1)
         CLI    0(R14),X'01'
         BNE    NOTPRIME        if not CRIBLE(I) 
         CVD    R4,P            P=I
         UNPK   Z,P             Z=P
         MVC    C,Z             C=Z
         OI     C+L'C-1,X'F0'   zap sign
         MVC    WTOBUF(8),C+8
         WTO    MF=(E,WTOMSG)		  
NOTPRIME EQU    *
         B      LOOP
ENDLOOP  EQU    *
RETURN   EQU    *
         LM     R14,R12,12(R13) restore context
         XR     R15,R15         set return code to 0
         BR     R14             return to caller
*        ----   DATA
I        DS     F
J        DS     F
         DS     0F
P        DS     PL8             packed
Z        DS     ZL16            zoned
C        DS     CL16            character 
WTOMSG   DS     0F
         DC     H'80'           length of WTO buffer
         DC     H'0'            must be binary zeroes
WTOBUF   DC     80C' '
         LTORG  
N        DC     F'100000'
CRIBLE   DC     100000X'01'
         YREGS  
         END    ERATOST
Output:
00000002
00000003
00000005
00000007
00000011
00000013
00000017
00000019
00000023
00000029
00000031
00000037
00000041
00000043
00000047
00000053
00000059
00000061
00000067
...
00099767
00099787
00099793
00099809
00099817
00099823
00099829
00099833
00099839
00099859
00099871
00099877
00099881
00099901
00099907
00099923
00099929
00099961
00099971
00099989
00099991

6502 Assembly

If this subroutine is called with the value of n in the accumulator, it builds a working array beginning at address 1000 hex, stores the primes less than n beginning at address 2000 hex, and returns the number of primes it has found in the accumulator.

ERATOS: STA  $D0      ; value of n
        LDA  #$00
        LDX  #$00
SETUP:  STA  $1000,X  ; populate array
        ADC  #$01
        INX
        CPX  $D0
        BPL  SET
        JMP  SETUP
SET:    LDX  #$02
SIEVE:  LDA  $1000,X  ; find non-zero
        INX
        CPX  $D0
        BPL  SIEVED
        CMP  #$00
        BEQ  SIEVE
        STA  $D1      ; current prime
MARK:   CLC
        ADC  $D1
        TAY
        LDA  #$00
        STA  $1000,Y
        TYA
        CMP  $D0
        BPL  SIEVE
        JMP  MARK
SIEVED: LDX  #$01
        LDY  #$00
COPY:   INX
        CPX  $D0
        BPL  COPIED
        LDA  $1000,X
        CMP  #$00
        BEQ  COPY
        STA  $2000,Y
        INY
        JMP  COPY
COPIED: TYA           ; how many found
        RTS
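
For readers who do not read 6502 assembly, the same three phases (fill a working array with its own indices, zero out the multiples of each surviving value, then compact the survivors) can be sketched in Python. The function name <code>eratos</code> and the returned tuple are assumptions made for this sketch; the assembly returns only the count and leaves the primes in memory.

```python
def eratos(n):
    """Return (primes, count) for the primes strictly below n."""
    # Working array: cell k initially holds the value k (the SETUP phase).
    cells = list(range(n))
    for k in range(2, n):
        prime = cells[k]
        if prime == 0:
            continue  # already crossed out as a multiple of a smaller prime
        # Zero every multiple of the prime, starting at 2*prime (the MARK phase).
        for multiple in range(2 * prime, n, prime):
            cells[multiple] = 0
    # Compact the non-zero survivors, skipping cells 0 and 1 (the COPY phase).
    primes = [value for value in cells[2:] if value != 0]
    return primes, len(primes)


primes, count = eratos(30)
print(count, primes)
```

Unlike the assembly routine, whose n must fit in the 8-bit accumulator, this sketch has no such size restriction.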

68000 Assembly

Algorithm somewhat optimized: the array omits 1, 2, and all higher even numbers. Optimized for storage: uses a bit array for prime/composite flags.

Works with: [EASy68K v5.13.00]

Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.
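
The index arithmetic behind this storage layout can be illustrated with a short Python sketch. The function names are hypothetical and chosen for this example only; the mapping <code>2*i + 3</code> is the one computed by the <code>val</code> macro in the listing.

```python
def index_to_number(i):
    """Flag index -> the odd number it represents (same arithmetic as `val`)."""
    return 2 * i + 3


def number_to_index(n):
    """Odd number >= 3 -> its flag index."""
    return (n - 3) // 2


def odd_primes(size):
    """Sieve a flag array of `size` entries; entry i stands for 2*i + 3."""
    composite = [False] * size
    for i in range(size):
        if composite[i]:
            continue
        p = index_to_number(i)
        # Adding p to an index moves 2*p along the number line, so this
        # visits 3p, 5p, 7p, ... and never touches an even multiple.
        for j in range(i + p, size, p):
            composite[j] = True
    return [index_to_number(i) for i in range(size) if not composite[i]]


print(odd_primes(10))
```

Because index <code>i + p</code> corresponds to the number <code>3p</code>, stepping the index by the prime itself is enough to cross out every odd multiple.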

*-----------------------------------------------------------
* Title      : BitSieve
* Written by : G. A. Tippery
* Date       : 2014-Feb-24, 2013-Dec-22
* Description: Prime number sieve
*-----------------------------------------------------------
    	ORG    $1000

**	---- Generic macros ----	**
PUSH	MACRO
	MOVE.L	\1,-(SP)
	ENDM

POP	MACRO
	MOVE.L	(SP)+,\1
	ENDM

DROP	MACRO
	ADDQ	#4,SP
	ENDM
	
PUTS	MACRO
	** Print a null-terminated string w/o CRLF **
	** Usage: PUTS stringaddress
	** Returns with D0, A1 modified
	MOVEQ	#14,D0	; task number 14 (display null string)
	LEA	\1,A1	; address of string
	TRAP	#15	; display it
	ENDM
	
GETN	MACRO
	MOVEQ	#4,D0	; Read a number from the keyboard into D1.L. 
	TRAP	#15
	ENDM

**	---- Application-specific macros ----	**

val	MACRO		; Used by bit sieve. Converts bit address to the number it represents.
	ADD.L	\1,\1	; double it because even numbers are omitted
	ADDQ	#3,\1	; add offset because initial primes (1, 2) are omitted
	ENDM

* ** ================================================================================ **
* ** Integer square root routine, bisection method **
* ** IN: D0, should be 0<D0<$10000 (65536) -- higher values MAY work, no guarantee
* ** OUT: D1
*
SquareRoot:
*
	MOVEM.L	D2-D4,-(SP)	; save registers needed for local variables
*	D0 == n
*	D1 == a
*	D2 == b
*	D3 == guess
*	D4 == temp
*
*		a = 1;
*		b = n;
	MOVEQ	#1,D1
	MOVE.L	D0,D2
*		do {
	REPEAT
*		guess = (a+b)/2;
	MOVE.L	D1,D3
	ADD.L	D2,D3
	LSR.L	#1,D3
*		if (guess*guess > n) {	// inverse function of sqrt is square
	MOVE.L	D3,D4
	MULU	D4,D4		; guess^2
	CMP.L	D0,D4
	BLS	.else
*		b = guess;
	MOVE.L	D3,D2
	BRA	.endif
*		} else {
.else:
*		a = guess;
	MOVE.L	D3,D1
*		} //if
.endif:
*		} while ((b-a) > 1);	; Same as until (b-a)<=1 or until (a-b)>=1
	MOVE.L	D2,D4
	SUB.L	D1,D4	; b-a
	UNTIL.L	  D4 <LE> #1 DO.S
*		return (a)	; Result is in D1
*		} //LongSqrt()
	MOVEM.L	(SP)+,D2-D4	; restore saved registers
	RTS
*
* ** ================================================================================ **


** ======================================================================= **
*
**  Prime-number Sieve of Eratosthenes routine using a big bit field for flags  **
*  Enter with D0 = size of sieve (bit array)
*  Prints found primes 10 per line
*  Returns # of primes found in D6
*
*   Register usage:
*
*	D0 == n
*	D1 == prime
*	D2 == sqroot
*	D3 == PIndex
*	D4 == CIndex
*	D5 == MaxIndex
*	D6 == PCount
*
*	A0 == PMtx[0]
*
*   On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.
*

**	------------------------	**

GetBit:		** sub-part of Sieve subroutine **
		** Entry: bit # is on TOS
		** Exit: A6 holds the byte number, D7 holds the bit number within the byte
		** Note: Input param is still on TOS after return. Could have passed via a register, but
                **  wanted to practice with stack. :)
*		
	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
	ASR.L	#3,D7	; /8
	MOVEA	D7,A6	; byte #
	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
	AND.L	#$7,D7	; bit #
	RTS

**	------------------------	**

Sieve:
	MOVE	D0,D5
	SUBQ	#1,D5
	JSR	SquareRoot	; sqrt D0 => D1
	MOVE.L	D1,D2
	LEA	PArray,A0
	CLR.L	D3
*
PrimeLoop:
	MOVE.L	D3,D1
	val	D1
	MOVE.L	D3,D4
	ADD.L	D1,D4
*
CxLoop:		; Goes through array marking multiples of d1 as composite numbers
	CMP.L	D5,D4
	BHI	ExitCx
	PUSH	D4	; set D7 as bit # and A6 as byte pointer for D4'th bit of array
	JSR GetBit
	DROP
	BSET	D7,0(A0,A6.L)	; set bit to mark as composite number
	ADD.L	D1,D4	; next number to mark
	BRA	CxLoop
ExitCx:
	CLR.L	D1	; Clear new-prime-found flag
	ADDQ	#1,D3	; Start just past last prime found 
PxLoop:		; Searches for next unmarked (not composite) number
	CMP.L	D2,D3	; no point searching past where first unmarked multiple would be past end of array
	BHI	ExitPx	; if past end of array
	TST.L	D1
	BNE	ExitPx	; if flag set, new prime found
	PUSH D3		; check D3'th bit flag
	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
	DROP		; drop TOS
	BTST	D7,0(A0,A6.L)	; read bit flag
	BNE	IsSet	; If already tagged as composite
	MOVEQ	#-1,D1	; Set flag that we've found a new prime
IsSet:
	ADDQ	#1,D3	; next PIndex
	BRA	PxLoop
ExitPx:
	SUBQ	#1,D3	; back up PIndex
	TST.L	D1	; Did we find a new prime #?
	BNE	PrimeLoop	; If another prime # found, go process it
*
		; fall through to print routine

**	------------------------	**

* Print primes found
*
*	D4 == Column count
*
*	Print header and assumed primes (#1, #2) 
    	PUTS	Header	; Print string @ Header, no CR/LF
	MOVEQ	#2,D6	; Start counter at 2 because #1 and #2 are assumed primes
	MOVEQ	#2,D4
*
	MOVEQ	#0,D3
PrintLoop:
	CMP.L	D5,D3
	BHS	ExitPL
	PUSH	D3
	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
	DROP		; drop TOS
	BTST	D7,0(A0,A6.L)
	BNE		NotPrime
*		printf(" %6d", val(PIndex)
	MOVE.L	D3,D1
	val	D1
	AND.L	#$0000FFFF,D1
	MOVEQ	#6,D2
	MOVEQ	#20,D0	; display signed RJ
	TRAP	#15
	ADDQ	#1,D4
	ADDQ	#1,D6
*	*** Display formatting ***
*		if((PCount % 10) == 0) printf("\n");
	CMP	#10,D4
	BLO	NoLF
	PUTS	CRLF
	MOVEQ	#0,D4
NoLF:
NotPrime:
	ADDQ	#1,D3
	BRA	PrintLoop
ExitPL:
	RTS

** ======================================================================= **

N	EQU	5000	; *** Size of boolean (bit) array ***
SizeInBytes	EQU	(N+7)/8
*
START:                  	; first instruction of program
	MOVE.L	#N,D0	; # to test
	JSR	Sieve
*		printf("\n %d prime numbers found.\n", D6); ***
	PUTS	Summary1,A1
	MOVE	#3,D0	; Display signed number in D1.L in decimal in smallest field.
	MOVE.W	D6,D1
	TRAP	#15
	PUTS	Summary2,A1

	SIMHALT             	; halt simulator

** ======================================================================= **

* Variables and constants here

	ORG	$2000
CR	EQU	13
LF	EQU	10
CRLF	DC.B	CR,LF,$00

PArray:	DCB.B	SizeInBytes,0

Header:	DC.B	CR,LF,LF,' Primes',CR,LF,' ======',CR,LF
		DC.B	'     1     2',$00

Summary1:	DC.B	CR,LF,' ',$00
Summary2:	DC.B	' prime numbers found.',CR,LF,$00

    END    START        	; last line of source


8086 Assembly

MAXPRM:	equ	5000		; Change this value for more primes
	cpu	8086
	bits	16
	org	100h
section	.text
erato:	mov	cx,MAXPRM	; Initialize array (set all items to prime)
	mov	bp,cx		; Keep a copy in BP
	mov	di,sieve
	mov	al,1
	rep	stosb
	;;;	Sieve
	mov	bx,sieve	; Set base register to array
	inc	cx		; CX=1 (CH=0, CL=1); CX was 0 before
	mov	si,cx		; Start at number 2 (1+1)
.next:	inc	si		; Next number 
	cmp	cl,[bx+si]	; Is this number marked as prime?
	jne	.next		; If not, try next number
	mov	ax,si		; Otherwise, calculate square,
	mul	si
	mov	di,ax		; and put it in DI
	cmp	di,bp		; Check array bounds
	ja	output		; We're done when SI*SI>MAXPRM
.mark:	mov	[bx+di],ch	; Mark byte as composite
	add	di,si		; Next composite
	cmp 	di,bp		; While maximum not reached
	jbe	.mark
	jmp	.next
	;;;	Output
output:	mov	si,2		; Start at 2
.test:	dec	byte [bx+si]	; Prime?
	jnz	.next		; If not, try next number
	mov	ax,si		; Otherwise, print number
	call	prax
.next:	inc	si
	cmp	si,MAXPRM
	jbe	.test
	ret
	;;;	Write number in AX to standard output (using MS-DOS)
prax:	push	bx		; Save BX
	mov	bx,numbuf
	mov	bp,10		; Divisor
.loop:	xor	dx,dx		; Divide AX by 10, modulus in DX
	div	bp
	add	dl,'0'		; ASCII digit
	dec	bx
	mov	[bx],dl		; Store ASCII digit
	test	ax,ax		; More digits?
	jnz	.loop
	mov	dx,bx		; Print number
	mov	ah,9		; 9 = MS-DOS syscall to print string
	int	21h
	pop	bx		; Restore BX
	ret
section	.data
	db	'*****'		; Room for number
numbuf:	db	13,10,'$'
section	.bss 
sieve:	resb	MAXPRM
Output:
2
3
5
7
11
...
4969
4973
4987
4993
4999

8th

with: n

\ create a new buffer which will function as a bit vector
: bit-vec SED: n -- b
    dup 3 shr swap 7 band if 1+ then b:new b:clear ;

\ given a buffer, sieving prime, and limit, cross off multiples
\ of the sieving prime.
: +composites SED: b n n -- b
    >r dup sqr rot \ want: -- n index b
    repeat
        over 1- true b:bit!
        >r over + r>
    over r@ > until!
    rdrop nip nip ;

\ SoE algorithm proper
: make-sieve SED: n -- b
    dup>r bit-vec 2
    repeat
        tuck 1- b:bit@ not
        if
            over r@ +composites
        then swap 1+
    dup sqr r@ < while!
    rdrop drop ;

\ traverse the final buffer, creating an array of primes
: sieve>a SED: b n -- a
    >r a:new swap
    ( 1- b:bit@ not if >r I a:push r> then ) 2 r> loop drop ;

;with

: sieve SED: n -- a
    dup make-sieve swap sieve>a ;
Output:
ok> 100 sieve .
 [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok> 1_000_000 sieve a:len . \ count primes up to 1,000,000
78498
ok> -1 a:@ . \ largest prime < 1,000,000
999983

AArch64 Assembly

Works with: as version Raspberry Pi 3B version Buster 64 bits
/* ARM assembly AARCH64 Raspberry PI 3B */
/*  program cribleEras64.s   */

/*******************************************/
/* Constants file                          */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"

.equ MAXI,      100

/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss  
sZoneConv:                  .skip 24
TablePrime:                 .skip   8 * MAXI 
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main 
main:                               // entry of program 
    ldr x4,qAdrTablePrime           // address prime table
    mov x0,#2                       // prime 2
    bl displayPrime
    mov x1,#2
    mov x2,#1
1:                                  // loop for multiple of 2
    str x2,[x4,x1,lsl #3]           // mark  multiple of 2
    add x1,x1,#2
    cmp x1,#MAXI                    // end ?
    ble 1b                          // no loop
    mov x1,#3                       // starting index
    mov x3,#1
2:
    ldr x2,[x4,x1,lsl #3]           // load table element
    cmp x2,#1                       // is prime ?
    beq 4f
    mov x0,x1                       // yes -> display
    bl displayPrime
    mov x2,x1
3:                                  // and loop to mark multiples of this prime
    str x3,[x4,x2,lsl #3]
    add x2,x2,x1                    // add the prime
    cmp x2,#MAXI                    // end ?
    ble 3b                          // no -> loop
4:
    add x1,x1,2                     // other prime in table
    cmp x1,MAXI                     // end table ?
    ble 2b                          // no -> loop

100:                                // standard end of the program 
    mov x0,0                        // return code
    mov x8,EXIT                     // request to exit program
    svc 0                           // perform the system call
qAdrszCarriageReturn:    .quad szCarriageReturn
qAdrsMessResult:         .quad sMessResult
qAdrTablePrime:          .quad TablePrime

/******************************************************************/
/*      Display prime table elements                                */ 
/******************************************************************/
/* x0 contains the prime */
displayPrime:
    stp x1,lr,[sp,-16]!             // save  registers
    ldr x1,qAdrsZoneConv
    bl conversion10                 // call decimal conversion
    ldr x0,qAdrsMessResult
    ldr x1,qAdrsZoneConv            // insert conversion in message
    bl strInsertAtCharInc
    bl affichageMess                // display message
100:
    ldp x1,lr,[sp],16               // restore 2 registers
    ret                             // return to address lr x30
qAdrsZoneConv:                   .quad sZoneConv  

/********************************************************/
/*        Functions include file                        */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"
Output:
Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97

ABAP

PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.

AT SELECTION-SCREEN ON p_limit.
  IF p_limit LE 1.
    MESSAGE 'Limit must be higher than 1.' TYPE 'E'.
  ENDIF.

START-OF-SELECTION.
  FIELD-SYMBOLS: <fs_prime> TYPE flag.
  DATA: gt_prime TYPE TABLE OF flag,
        gv_prime TYPE flag,
        gv_i     TYPE i,
        gv_j     TYPE i.

  DO p_limit TIMES.
    IF sy-index > 1.
      gv_prime = abap_true.
    ELSE.
      gv_prime = abap_false.
    ENDIF.

    APPEND gv_prime TO gt_prime.
  ENDDO.

  gv_i = 2.
  WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).
    IF ( gt_prime[ gv_i ] EQ abap_true ).
      gv_j =  gv_i ** 2.
      WHILE ( gv_j <= p_limit ).
        gt_prime[ gv_j ] = abap_false.
        gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).
      ENDWHILE.
    ENDIF.
    gv_i = gv_i + 1.
  ENDWHILE.

  LOOP AT gt_prime INTO gv_prime.
    IF gv_prime = abap_true.
      WRITE: / sy-tabix.
    ENDIF.
  ENDLOOP.

ABC

HOW TO SIEVE UP TO n:
    SHARE sieve
    PUT {} IN sieve
    FOR cand IN {2..n}: PUT 1 IN sieve[cand]
    FOR cand IN {2..floor root n}:
        IF sieve[cand] = 1:
            PUT cand*cand IN comp
            WHILE comp <= n:
                PUT 0 IN sieve[comp]
                PUT comp+cand IN comp

HOW TO REPORT prime n:
    SHARE sieve
    IF n<2: FAIL
    REPORT sieve[n] = 1

SIEVE UP TO 100
FOR n IN {1..100}:
    IF prime n: WRITE n
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ACL2

(defun nats-to-from (n i)
   (declare (xargs :measure (nfix (- n i))))
   (if (zp (- n i))
       nil
       (cons i (nats-to-from n (+ i 1)))))

(defun remove-multiples-up-to-r (factor limit xs i)
   (declare (xargs :measure (nfix (- limit i))))
   (if (or (> i limit)
           (zp (- limit i))
           (zp factor))
       xs
       (remove-multiples-up-to-r
        factor
        limit
        (remove i xs)
        (+ i factor))))

(defun remove-multiples-up-to (factor limit xs)
   (remove-multiples-up-to-r factor limit xs (* factor 2)))

(defun sieve-r (factor limit)
   (declare (xargs :measure (nfix (- limit factor))))
   (if (zp (- limit factor))
       (nats-to-from limit 2)
       (remove-multiples-up-to factor (+ limit 1)
                               (sieve-r (1+ factor) limit))))

(defun sieve (limit)
   (sieve-r 2 limit))

Action!

DEFINE MAX="1000"

PROC Main()
  BYTE ARRAY t(MAX+1)
  INT i,j,k,first

  FOR i=0 TO MAX
  DO
    t(i)=1
  OD

  t(0)=0
  t(1)=0

  i=2 first=1
  WHILE i<=MAX
  DO
    IF t(i)=1 THEN
      IF first=0 THEN
        Print(", ")
      FI
      PrintI(i)
      FOR j=2*i TO MAX STEP i
      DO
        t(j)=0
      OD
      first=0
    FI
    i==+1
  OD
RETURN
Output:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743,
751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883,
887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997

ActionScript

Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)

function eratosthenes(limit:int):Array
{
	var primes:Array = new Array();
	if (limit >= 2) {
		var sqrtlmt:int = int(Math.sqrt(limit) - 2);
		var nums:Array = new Array(); // start with an empty Array...
		for (var i:int = 2; i <= limit; i++) // and
			nums.push(i); // only initialize the Array once...
		for (var j:int = 0; j <= sqrtlmt; j++) {
			var p:int = nums[j]
			if (p)
				for (var t:int = p * p - 2; t < nums.length; t += p)
					nums[t] = 0;
		}
		for (var m:int = 0; m < nums.length; m++) {
			var r:int = nums[m];
			if (r)
				primes.push(r);
		}
	}
	return primes;
}
var e:Array = eratosthenes(1000);
trace(e);

Output:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997

Ada

with Ada.Text_IO, Ada.Command_Line;

procedure Eratos is
 
   Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));
   Prime: array(1 .. Last) of Boolean := (1 => False, others => True);
   Base: Positive := 2;
   Cnt: Positive;
begin
   while Base * Base <= Last loop
      if Prime(Base) then
         Cnt := Base + Base;
         while Cnt <= Last loop
            Prime(Cnt) := False;
            Cnt := Cnt + Base;
         end loop;
      end if;
      Base := Base + 1;
   end loop;
   Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");
   for Number in Prime'Range loop
      if Prime(Number) then
         Ada.Text_IO.Put(Positive'Image(Number));
      end if;
   end loop;
end Eratos;
Output:
> ./eratos 31
Primes less or equal 31 are: 2 3 5 7 11 13 17 19 23 29 31

Agda

-- imports
open import Data.Nat as ℕ     using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec   using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin   using (Fin; suc; zero)
open import Function          using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe        using (Maybe; just; nothing)

-- Without square cutoff optimization
module Simple where
  primes : ∀ n → List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))
    where
    sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
    sieve [] = []
    sieve (nothing ∷ xs) =         sieve xs
    sieve (just x  ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)
      where
      B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n

      remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
      remove _ ys zero    = nothing ∷ ys x
      remove y ys (suc z) = y       ∷ ys z

-- With square cutoff optimization
module SquareOpt where
  primes : ∀ n → List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
    where
    sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
    sieve _ zero = List.mapMaybe id ∘ Vec.toList
    sieve _ (suc _) [] = []
    sieve i (suc l) (nothing ∷ xs) =     sieve (suc i) (l ∸ i ∸ i) xs
    sieve i (suc l) (just x  ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)
      where
      B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n

      remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
      remove _ ys zero    = nothing ∷ ys i
      remove y ys (suc j) = y       ∷ ys j

Agena

Tested with Agena 2.9.5 Win32

# Sieve of Eratosthenes

# generate and return a sequence containing the primes up to sieveSize
sieve := proc( sieveSize :: number ) :: sequence is
    local sieve, result;

    result := seq(); # sequence of primes - initially empty
    create register sieve( sieveSize ); # "vector" to be sieved

    sieve[ 1 ] := false;
    for sPos from 2 to sieveSize do sieve[ sPos ] := true od;

    # sieve the primes
    for sPos from 2 to entier( sqrt( sieveSize ) ) do
        if sieve[ sPos ] then
            for p from sPos * sPos to sieveSize by sPos do
                sieve[ p ] := false
            od
        fi
    od;

    # construct the sequence of primes
    for sPos from 1 to sieveSize do
        if sieve[ sPos ] then insert sPos into result fi
    od

return result
end; # sieve


# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();
Output:
 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ALGOL 60

Based on the 1962 Revised Report:

comment Sieve of Eratosthenes;
begin
   integer array t[0:1000];
   integer i,j,k;
   for i:=0 step 1 until 1000 do t[i]:=1;
   t[0]:=0; t[1]:=0; i:=0;
   for i:=i while i<1000 do
   begin
       for i:=i while i<1000 and t[i]=0 do i:=i+1;
       if i<1000 then
       begin
           j:=2;
           k:=j*i;
           for k:=k while k<1000 do
           begin
               t[k]:=0;
               j:=j+1;
               k:=j*i
           end;
           i:=i+1
       end
   end;
   for i:=0 step 1 until 999 do
   if t[i]≠0 then print(i,ꞌ is primeꞌ)
end

A 1964 implementation:

Works with: ALGOL 60 for OS/360

'BEGIN'
    'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
    'INTEGER' I,J,K;
    'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;
    SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);
    'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'
    'BEGIN'
        CANDIDATES(/I/) := 1;
    'END';
    CANDIDATES(/0/) := 0;
    CANDIDATES(/1/) := 0;
    I := 0;
    'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'
    'BEGIN'
        'FOR' I := I 'WHILE' I 'LESS' 1000
          'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'
            I := I+1;
        'IF' I 'LESS' 1000 'THEN'
        'BEGIN'
            J := 2;
            K := J*I;
            'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'
            'BEGIN'
                CANDIDATES(/K/) := 0;
                J := J + 1;
                K := J*I;
            'END';
            I := I+1;
            'END'
        'END';
        'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'
        'IF' CANDIDATES(/I/) 'NOTEQUAL' 0  'THEN'
        'BEGIN'
            OUTINTEGER(1,I);
            OUTSTRING(1,'(' IS PRIME')');
            'COMMENT' NEW LINE;
            SYSACT(1,14,1)
        'END'
    'END'
'END'

ALGOL 68

BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
  [n]BOOL sieve;
  FOR i TO UPB sieve DO sieve[i] := prime OD;
  INT m = ENTIER sqrt(n);
  sieve[1] := non prime;
  FOR i FROM 2 TO m DO
    IF sieve[i] = prime THEN
      FOR j FROM i*i BY i TO n DO
        sieve[j] := non prime
      OD
    FI
  OD;
  sieve
);
 
 print((eratosthenes(80),new line))
Output:
FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF

ALGOL W

Standard, non-optimised sieve

begin

    % implements the sieve of Eratosthenes                                   %
    %     s(i) is set to true if i is prime, false otherwise                 %
    %     algol W doesn't have a upb operator, so we pass the size of the    %
    %     array in n                                                         %
    procedure sieve( logical array s ( * ); integer value n ) ;
    begin

        % start with everything flagged as prime                             % 
        for i := 1 until n do s( i ) := true;

        % sieve out the non-primes                                           %
        s( 1 ) := false;
        for i := 2 until truncate( sqrt( n ) )
        do begin
            if s( i )
            then begin
                for p := i * i step i until n do s( p ) := false
            end if_s_i
        end for_i ;

    end sieve ;

    % test the sieve procedure                                               %

    integer sieveMax;

    sieveMax := 100;
    begin

        logical array s ( 1 :: sieveMax );

        i_w := 2; % set output field width                                   %
        s_w := 1; % and output separator width                               %

        % find and display the primes                                        %
        sieve( s, sieveMax );
        for i := 1 until sieveMax do if s( i ) then writeon( i );

    end

end.
Output:
 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

Odd numbers only version

Alternative version that only stores odd numbers greater than 1 in the sieve.

begin
    % implements the sieve of Eratosthenes                                   %
    % only odd numbers appear in the sieve, which starts at 3                %
    % s( i ) is set to true if ( i * 2 ) + 1 is prime                        %
    procedure sieve2( logical array s ( * ); integer value n ) ;
    begin
        % start with everything flagged as prime                             % 
        for i := 1 until n do s( i ) := true;
        % sieve out the non-primes                                           %
        % the subscripts of s are  1  2  3  4  5  6  7  8  9 10 11 12 13...  %
        %      which correspond to 3  5  7  9 11 13 15 17 19 21 23 25 27...  %
        for i := 1 until truncate( sqrt( n ) ) do begin
            if s( i ) then begin
                integer ip;
                ip := ( i * 2 ) + 1;
                for p := i + ip step ip until n do s( p ) := false
            end if_s_i
        end for_i ;
    end sieve2 ;
    % test the sieve2 procedure                                              %
    integer primeMax, arrayMax;
    primeMax := 100;
    arrayMax := ( primeMax div 2 ) - 1;
    begin
        logical array s ( 1 :: arrayMax);
        i_w := 2; % set output field width                                   %
        s_w := 1; % and output separator width                               %
        % find and display the primes                                        %
        sieve2( s, arrayMax );
        write( 2 );
        for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
    end
end.
Output:

Same as the standard version.
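
The odd-numbers-only layout can also be sketched in Python. This is an illustration rather than a translation of the ALGOL W procedure: the function name `odd_sieve` is hypothetical, and the sketch uses a 0-based table in which entry `i` stands for the number `2*i + 3`, whereas the listing above uses a 1-based array in which `s(i)` stands for `(i * 2) + 1`.

```python
def odd_sieve(limit):
    """Primes <= limit, storing flags only for the odd numbers 3, 5, 7, ..."""
    if limit < 2:
        return []
    size = (limit - 1) // 2          # how many odd numbers lie in 3..limit
    flags = [True] * size            # flags[i] represents the number 2*i + 3
    i = 0
    while (2 * i + 3) ** 2 <= limit:
        if flags[i]:
            p = 2 * i + 3
            # p*p sits at index (p*p - 3) // 2; stepping the index by p
            # advances the represented number by 2*p, so even multiples
            # are never visited.
            for j in range((p * p - 3) // 2, size, p):
                flags[j] = False
        i += 1
    return [2] + [2 * i + 3 for i, is_prime in enumerate(flags) if is_prime]


print(odd_sieve(100))
```

Halving the table is the whole point of the variant: the even numbers are never stored, so 2 has to be reported separately, just as the ALGOL W test code does with `write( 2 )`.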

ALGOL-M

BEGIN

COMMENT
  FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING 
  CLASSIC SIEVE OF ERATOSTHENES;

% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
    INTEGER R1, R2;
    R1 := N;
    R2 := 1;
    WHILE R1 > R2 DO
        BEGIN
            R1 := (R1+R2) / 2;
            R2 := N / R1;
        END;
    ISQRT := R1;
END;

INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];

LIMIT := 1000;
FALSE := 0;
TRUE := 1;

WRITE("FINDING PRIMES FROM 2 TO",LIMIT);

% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
  FLAGS[I] := TRUE;

% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
  BEGIN
    IF FLAGS[I] = TRUE THEN
        FOR J := (I * I) STEP I UNTIL LIMIT DO
          FLAGS[J] := FALSE;
  END;

% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");  
FOR I := 2 STEP 1 UNTIL LIMIT DO
  BEGIN
    IF FLAGS[I] = TRUE THEN 
      BEGIN
         WRITEON(I);
         COUNT := COUNT + 1;
         COL := COL + 1;
         IF COL > 10 THEN
           BEGIN
             WRITE("");
             COL := 1;
           END;
      END;
  END;

WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");

END
Output:
FINDING PRIMES FROM 2 TO  1000
INITIALIZING ... SIEVING ... PRINTING
    2     3     5     7    11    13    17    19    23    29
   31    37    41    43    47    53    59    61    67    71
                      . . .
  877   881   883   887   907   911   919   929   937   941
  947   953   967   971   977   983   991   997

  168 PRIMES WERE FOUND.

APL

All these versions require ⎕io←0 (index origin 0).

It would have been better for the task to require the boolean mask as the result rather than the actual list of primes. The list of primes is readily obtained from the mask by applying a simple function (here {⍵/⍳⍴⍵}). Other related results (such as the number of primes < n) also follow readily from the mask, more easily than the list of primes itself.
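
The relation between the mask and the list of primes can be sketched in Python. The names `sieve_mask`, `primes_from_mask` and `count_from_mask` are hypothetical helpers introduced for this illustration; only the idea (select the indices at which the mask is set) comes from the text above.

```python
def sieve_mask(n):
    """Boolean mask of length n; mask[i] is True when i is prime (0 <= i < n)."""
    mask = [True] * n
    for i in range(min(2, n)):
        mask[i] = False              # 0 and 1 are not prime
    i = 2
    while i * i < n:
        if mask[i]:
            for j in range(i * i, n, i):
                mask[j] = False
        i += 1
    return mask


def primes_from_mask(mask):
    # Rough analogue of {⍵/⍳⍴⍵}: keep the indices at which the mask is set.
    return [i for i, is_prime in enumerate(mask) if is_prime]


def count_from_mask(mask):
    # The number of primes below len(mask) is just the number of set entries.
    return sum(mask)
```

With these definitions, `primes_from_mask(sieve_mask(14))` gives `[2, 3, 5, 7, 11, 13]` and `count_from_mask(sieve_mask(100))` gives `25`, without the list of primes ever being built for the count.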

Non-Optimized Version

sieve2←{
  b←⍵⍴1
  b[⍳2⌊⍵]←0
  2≥⍵:b
  p←{⍵/⍳⍴⍵}∇⌈⍵*0.5
  m←1+⌊(⍵-1+p×p)÷p
  b ⊣ p {b[⍺×⍺+⍳⍵]←0}¨ m
}

primes2←{⍵/⍳⍴⍵}∘sieve2

The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).

Optimized Version

sieve←{
  b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
  b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
  49≥⍵:b
  p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5
  m←1+⌊(⍵-1+p×p)÷2×p
  b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m
}

primes←{⍵/⍳⍴⍵}∘sieve

The optimizations are as follows:

  • Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
  • Subsequently, only odd multiples of primes > 5 are marked.
  • Multiples of a prime to be marked start at its square.
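
The three optimizations listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not a translation of the APL: the function name `sieve_wheel_init` is hypothetical, and the sketch is iterative where the APL obtains its prime divisors by recursion.

```python
def sieve_wheel_init(n):
    """Boolean mask of length n; mask[i] is True when i is prime."""
    # 1. Start from a repeating pattern of period 2*3*5 = 30 that is
    #    already False at every multiple of 2, 3 or 5.
    pattern = [i % 2 != 0 and i % 3 != 0 and i % 5 != 0 for i in range(30)]
    mask = [pattern[i % 30] for i in range(n)]
    # The pattern wrongly keeps 1 and wrongly drops 2, 3 and 5,
    # so the first six entries are patched by hand.
    head = [False, False, True, True, False, True]
    mask[:min(6, n)] = head[:min(6, n)]
    # 2. and 3. For each prime p > 5, clear only the odd multiples,
    #    starting at p*p (step 2*p skips the even ones).
    p = 7
    while p * p < n:
        if mask[p]:
            for m in range(p * p, n, 2 * p):
                mask[m] = False
        p += 2
    return mask
```

The hand-patched head plays the same role as the six-element vector `0 0 1 1 0 1` in the APL version.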

Examples

   primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

   primes¨ ⍳14
┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐
││││2│2 3│2 3│2 3 5│2 3 5│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7 11│2 3 5 7 11│
└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘

   sieve 13
0 0 1 1 0 1 0 1 0 0 0 1 0

   +/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534

The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).
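
The same kind of count can be reproduced on a smaller scale in Python. This is a sketch: `count_primes_below` is a hypothetical helper, and the range stops at 1e5 to keep the run time short.

```python
import math


def count_primes_below(n):
    """Number of primes p with p < n."""
    if n < 3:
        return 0
    flags = bytearray([1]) * n       # flags[i] == 1 means "i is still a candidate"
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n - 1) + 1):
        if flags[i]:
            # Clear i*i, i*i + i, ... in one slice assignment; the right-hand
            # side must have exactly as many bytes as the slice has entries.
            flags[i * i :: i] = bytes(len(range(i * i, n, i)))
    return sum(flags)


counts = [count_primes_below(10 ** k) for k in range(6)]
print(counts)   # [0, 4, 25, 168, 1229, 9592]
```

These values agree with the first six numbers printed by the APL expression above.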

AppleScript

on sieveOfEratosthenes(limit)
    script o
        property numberList : {missing value}
    end script
    
    repeat with n from 2 to limit
        set end of o's numberList to n
    end repeat
    repeat with n from 2 to (limit ^ 0.5 div 1)
        if (item n of o's numberList is n) then
            repeat with multiple from (n * n) to limit by n
                set item multiple of o's numberList to missing value
            end repeat
        end if
    end repeat
    
    return o's numberList's numbers
end sieveOfEratosthenes

sieveOfEratosthenes(1000)
Output:
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}

ARM Assembly

Works with: as version Raspberry Pi
/* ARM assembly Raspberry PI  */
/*  program cribleEras.s   */

 /* REMARK 1 : this program use routines in a include file 
   see task Include a file language arm assembly 
   for the routine affichageMess conversion10 
   see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes                       */
/************************************/
.include "../constantes.inc"

.equ MAXI,      101


/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss  
sZoneConv:                  .skip 24
TablePrime:                 .skip   4 * MAXI 
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main 
main:                               @ entry of program 
    ldr r4,iAdrTablePrime           @ address prime table
    mov r0,#2                       @ prime 2
    bl displayPrime
    mov r1,#2
    mov r2,#1
1:                                  @ loop for multiple of 2
    str r2,[r4,r1,lsl #2]           @ mark  multiple of 2
    add r1,#2
    cmp r1,#MAXI                    @ end ?
    ble 1b                          @ no loop
    mov r1,#3                       @ begin indice
    mov r3,#1
2:
    ldr r2,[r4,r1,lsl #2]           @ load table élément
    cmp r2,#1                       @ is prime ?
    beq 4f
    mov r0,r1                       @ yes -> display
    bl displayPrime
    mov r2,r1
3:                                  @ and loop to mark multiples of this prime
    str r3,[r4,r2,lsl #2]
    add r2,r1                       @ add the prime
    cmp r2,#MAXI              @ end ?
    ble 3b                          @ no -> loop
4:
    add r1,#2                       @ other prime in table
    cmp r1,#MAXI              @ end table ?
    ble 2b                          @ no -> loop

100:                                @ standard end of the program 
    mov r0, #0                      @ return code
    mov r7, #EXIT                   @ request to exit program
    svc #0                          @ perform the system call
iAdrszCarriageReturn:    .int szCarriageReturn
iAdrsMessResult:         .int sMessResult
iAdrTablePrime:          .int TablePrime

/******************************************************************/
/*      Display prime table elements                                */ 
/******************************************************************/
/* r0 contains the prime */
displayPrime:
    push {r1,lr}                    @ save registers
    ldr r1,iAdrsZoneConv
    bl conversion10                 @ call décimal conversion
    ldr r0,iAdrsMessResult
    ldr r1,iAdrsZoneConv            @ insert conversion in message
    bl strInsertAtCharInc
    bl affichageMess                @ display message
100:
    pop {r1,lr}
    bx lr
iAdrsZoneConv:                   .int sZoneConv  
/***************************************************/
/*      ROUTINES INCLUDE                           */
/***************************************************/
.include "../affichage.inc"
Output:
Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
Prime  : 101

Arturo

sieve: function [upto][
    composites: array.of: inc upto false
    loop 2..to :integer sqrt upto 'x [
        if not? composites\[x][
            loop range.step: x x^2 upto 'c [
                composites\[c]: true
            ]
        ]
    ]
    result: new []
    loop.with:'i composites 'c [
        unless c -> 'result ++ i
    ]
    return result -- [0,1]
]

print sieve 100
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

AutoHotkey

Source: AutoHotkey forum by Laszlo

MsgBox % "12345678901234567890`n" Sieve(20) 

Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime 
   Static zero := 48, one := 49 ; Asc("0"), Asc("1") 
   VarSetCapacity(S,n,one) 
   NumPut(zero,S,0,"char") 
   i := 2 
   Loop % sqrt(n)-1 { 
      If (NumGet(S,i-1,"char") = one) 
         Loop % n//i 
            If (A_Index > 1) 
               NumPut(zero,S,A_Index*i-1,"char") 
      i += 1+(i>2) 
   } 
   Return S 
}

Alternative Version

Sieve_of_Eratosthenes(n){
	arr := []
	loop % n-1
		if A_Index>1
			arr[A_Index] := true

	for i, v in arr	{
		if (i>Sqrt(n))
			break
		else if arr[i]
			while ((j := i*2 + (A_Index-1)*i) < n)
				arr.delete(j)
	}
	return Arr
}

Examples:

n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
	output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return
Output:
.	2	3	.	5	.	7	.	.	.
11	.	13	.	.	.	17	.	19	.
.	.	23	.	.	.	.	.	29	.
31	.	.	.	.	.	37	.	.	.
41	.	43	.	.	.	47	.	.	.
.	.	53	.	.	.	.	.	59	.
61	.	.	.	.	.	67	.	.	.
71	.	73	.	.	.	.	.	79	.
.	.	83	.	.	.	.	.	89	.
.	.	.	.	.	.	97	.	.	.

AutoIt

#include <Array.au3>
$M = InputBox("Integer", "Enter biggest Integer")
Global $a[$M], $r[$M], $c = 1
For $i = 2 To $M -1
	If Not $a[$i] Then
		$r[$c] = $i
		$c += 1
		For $k = $i To $M -1 Step $i
			$a[$k] = True
		Next
	EndIf
Next
$r[0] = $c - 1
ReDim $r[$c]
_ArrayDisplay($r)

AWK

An initial array holds all numbers 2..max (max is read from stdin); every product i*j is then deleted from it, and the remaining entries are printed in the arbitrary order in which awk traverses an associative array. Here the script is entered directly on the command line and the input is typed on stdin:

$ awk '{for(i=2;i<=$1;i++) a[i]=1;
>       for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) delete a[i*j];
>       for(i in a) printf i" "}'
100
71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3

The following variant does not unset non-primes, but sets them to 0, to preserve order in output:

$ awk '{for(i=2;i<=$1;i++) a[i]=1;
>       for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) a[i*j]=0;
>       for(i=2;i<=$1;i++) if(a[i])printf i" "}'
100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

The next version reads the script from a file, accepts input from the command line as well as from stdin, and checks that the input is a valid number:

# usage:  gawk  -v n=101  -f sieve.awk

function sieve(n) { # print n,":"
	for(i=2; i<=n;      i++) a[i]=1;
	for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;
	for(i=2; i<=n;      i++) if(a[i]) printf i" "
	print ""
}

BEGIN	{ print "Sieve of Eratosthenes:"
	  if(n>1) sieve(n)
	}

	{ n=$1+0 }
n<2	{ exit }
	{ sieve(n) }

END	{ print "Bye!" }

Here is an alternate version that uses an associative array to record composites with a prime dividing it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints out noncomposites greater than 1.

BEGIN {  ULIMIT=100

for ( n=1 ; (n++) < ULIMIT ; ) 
    if (n in S) {
        p = S[n]
        delete S[n]
        for ( m = n ; (m += p) in S ; )  { }
        S[m] = p 
        }
    else  print ( S[(n+n)] = n )
}
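
The idea of recording each composite together with a prime that divides it, and moving that record forward only when the composite is reached, can be sketched in Python with a dictionary. The names `incremental_primes` and `pending` are hypothetical; the control flow follows the AWK script above.

```python
def incremental_primes(limit):
    """Yield the primes <= limit, crossing out each composite only when it is reached."""
    pending = {}                      # composite -> a prime that divides it
    for n in range(2, limit + 1):
        if n in pending:
            p = pending.pop(n)        # n is composite; p is one of its prime factors
            m = n + p
            while m in pending:       # slot already claimed by another prime
                m += p
            pending[m] = p            # next multiple of p to watch for
        else:
            pending[n + n] = n        # n is prime; start watching its multiples
            yield n


print(list(incremental_primes(100)))
```

At any moment the dictionary holds one entry per prime found so far, which is why the AWK version needs enough memory for all primes up to ULIMIT.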

Bash

See solutions at UNIX Shell.

BASIC

Works with: FreeBASIC
Works with: RapidQ
DIM n AS Integer, k AS Integer, limit AS Integer

INPUT "Enter number to search to: "; limit
DIM flags(limit) AS Integer

FOR n = 2 TO SQR(limit)
    IF flags(n) = 0 THEN
        FOR k = n*n TO limit STEP n
            flags(k) = 1
        NEXT k
    END IF
NEXT n

' Display the primes
FOR n = 2 TO limit
    IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n

Applesoft BASIC

10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20  DIM FLAGS(LIMIT)
30  FOR N = 2 TO SQR (LIMIT)
40  IF FLAGS(N) < > 0 GOTO 80
50  FOR K = N * N TO LIMIT STEP N
60  FLAGS(K) = 1
70  NEXT K
80  NEXT N
90  REM  DISPLAY THE PRIMES
100  FOR N = 2 TO LIMIT
110  IF FLAGS(N) = 0 THEN PRINT N;", ";
120  NEXT N

Atari BASIC

Translation of: Commodore BASIC

Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.

100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160  IF N(P)=0 THEN 200
170  FOR I=P*P TO LI STEP P
180    N(I)=0
190  NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230   IF N(I)=0 THEN 260
240   PRINT I,:C=C+1
250   IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT
Output:
  Ready
  RUN
  LIMIT?100
  2         3         5
  7         11        13
  17        19        23
  29        31        37
  41        43        47
  53        59        61
  67        71        73
  79        83        89
  97

Commodore BASIC

Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.

100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 :   N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT
Output:
READY.
RUN
LIMIT? 100
 2         3         5         7
 11        13        17        19
 23        29        31        37
 41        43        47        53
 59        61        67        71
 73        79        83        89
 97

READY.
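
The "0 means prime" convention carries over to any language whose arrays start out zeroed. A minimal Python sketch, assuming a `bytearray` as the zero-initialised table (the name `primes_up_to` is hypothetical):

```python
def primes_up_to(limit):
    """Primes <= limit using a zero-initialised table in which 0 means prime."""
    if limit < 2:
        return []
    composite = bytearray(limit + 1)      # every entry starts at 0
    composite[0] = composite[1] = 1       # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if not composite[p]:
            for i in range(p * p, limit + 1, p):
                composite[i] = 1
        p += 1
    return [i for i in range(2, limit + 1) if not composite[i]]


print(primes_up_to(100))
```

No initialisation loop is needed, which is the saving the Commodore BASIC version relies on.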

IS-BASIC

100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140   LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170   IF T(I)<>1 THEN
180     FOR K=I*I TO LIMIT STEP I
190       LET T(K)=1
200     NEXT
210   END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240   IF T(I)=0 THEN PRINT I;
250 NEXT

Locomotive Basic

10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
40 FOR n=2 TO SQR(limit)
50 IF f(n)=1 THEN 90
60 FOR k=n*n TO limit STEP n
70 f(k)=1
80 NEXT k
90 NEXT n
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT

MSX Basic

5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation 
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

Sinclair ZX81 BASIC

If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on FAST and SLOW: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering FAST mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to SLOW mode when we have something to print out.

 10 INPUT L
 20 FAST
 30 DIM N(L)
 40 FOR I=2 TO SQR L
 50 IF N(I) THEN GOTO 90
 60 FOR J=I+I TO L STEP I
 70 LET N(J)=1
 80 NEXT J
 90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I

ZX Spectrum Basic

10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR l
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

QBasic

Works with: QBasic version 1.1
Works with: QuickBasic version 4.5
limit = 120

DIM flags(limit)
FOR n = 2 TO limit
    flags(n) = 1
NEXT n

PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 TO SQR(limit)
    IF flags(n) = 1 THEN
        FOR i = n * n TO limit STEP n
            flags(i) = 0
        NEXT i
    END IF
NEXT n

FOR n = 1 TO limit
    IF flags(n) THEN PRINT USING "####"; n;
NEXT n
Output:
Prime numbers less than or equal to 120 are: 
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113

BASIC256

arraybase 1
limit = 120

dim  flags(limit)
for n = 2 to limit
    flags[n] = True
next n

print "Prime numbers less than or equal to "; limit; " are: "
for n = 2 to sqr(limit)
    if flags[n] then
        for i = n * n to limit step n
            flags[i] = False
        next i
    end if
next n

for n = 1 to limit
    if flags[n] then print rjust(n,4);
next n
Output:
Prime numbers less than or equal to 120 are: 
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113

True BASIC

Translation of: QBasic
LET limit = 120
DIM flags(0)
MAT redim flags(limit)
FOR n = 2 to limit
    LET flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 to sqr(limit)
    IF flags(n) = 1 then
       FOR i = n*n to limit step n
           LET flags(i) = 0
       NEXT i
    END IF
NEXT n
FOR n = 1 to limit
    IF flags(n)<>0 then PRINT  using "####": n;
NEXT n
END
Output:
Same as QBasic entry.

QL SuperBASIC

using 'easy way' to 'add' 2n wheels

Translation of: ZX Spectrum Basic

Sets h$ to 1 for higher multiples of 2 via FILL$, later on sets STEP to 2n; replaces Floating Pt array p(z) with string variable h$(z) to sieve out all primes < z=441 (l=21) in under 1K, so that h$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.

10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h$="10" : h$=h$ & FILL$("01",z)
40      FOR n=3 TO l 
50 IF h$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h$(k)=1 
80      END FOR n   
90 REM Display the primes     
100 FOR n=2 TO z: IF h$(n)=0: PRINT n;", ";

2i wheel emulation of Sinclair ZX81 BASIC

Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the STEP of 2 in line 40 mitigates line 50's inefficiency when going to 90.

 10 INPUT Z
 15 LET L=SQR(Z)
 30 LET H$="10"
 32 FOR J=3 TO Z STEP 2
 34 LET H$=H$ & "01"
 36 NEXT J
 40 FOR I=3 TO L STEP 2
 50 IF H$(I)="1" THEN GOTO 90
 60 FOR J=I*I TO Z STEP I+I
 70 LET H$(J)="1"
 80 NEXT J
 90 NEXT I
110 FOR I=2 TO Z
120 IF H$(I)="0" THEN PRINT I!
130 NEXT I

2i wheel emulation of Sinclair ZX80 BASIC

. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, L (=21 at most on 1K ZX80s for all primes till 439).

Backward-compatible on ZX80s after substituting ** for ^ in line 120.

 10 INPUT L
 15 LET Z=(L+1)*(L- 1)/2
 30 DIM H(Z)
 40 FOR I=3 TO L STEP 2
 50 IF H((I-1)/ 2) THEN GOTO 90
 60 FOR J=I*I TO L*L STEP I+I
 70 LET H((J-1)/ 2)=1
 80 NEXT J
 90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I

Sieve of Sundaram

Objections that the latter emulation has strayed far from the given task are obviously justified. Less obvious is that it is now only a slight transformation away from the Sieve of Sundaram, as follows: O is the highest value of the index of successive diagonal elements in Sundaram's matrix, and H(J) also covers the off-diagonal elements in between, so that duplicate entries are omitted. Thus a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes after applying all the optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent of line 50 in them.

Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.

 10 INPUT O
 15 LET Z=2*O*O+O*2
 30 DIM H(Z)
 40 FOR I=1 TO O
 45 LET A=2*I*I+I*2
 50 REM IF H(A) THEN GOTO 90
 60 FOR J=A TO Z STEP 1+I*2
 65 REM IF H(J) THEN GOTO 80
 70 LET H(J)=1
 80 NEXT J
 90 NEXT I 
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
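
The listing above can be mirrored in Python to make the index arithmetic easier to follow. This is a sketch, assuming `o >= 1`; the function name `sundaram` is hypothetical, and index `k` stands for the odd number `2*k + 1`.

```python
def sundaram(o):
    """Primes below (2*o + 1)**2, following the structure of the BASIC listing."""
    z = 2 * o * o + 2 * o                 # highest index; 2*z + 1 == (2*o + 1)**2
    marked = bytearray(z + 1)             # 0 = not crossed out
    for i in range(1, o + 1):
        start = 2 * i * i + 2 * i         # index of (2*i + 1)**2, a diagonal element
        # Stepping the index by 2*i + 1 visits the odd multiples of 2*i + 1.
        for j in range(start, z + 1, 2 * i + 1):
            marked[j] = 1
    # Index 0 would stand for 1, which is not prime; 2 is reported in its
    # place, which is what the expression 0^I+1+I*2 does in line 120.
    return [2] + [2 * k + 1 for k in range(1, z + 1) if not marked[k]]


print(sundaram(10))   # primes below 441
```

As in the BASIC program with lines 50 and 65 commented out, nothing checks whether `2*i + 1` is itself prime, so some entries are crossed out more than once.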

Eulerian optimisation

While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of FOR statements (& as their 1K of static RAM is equivalent to L1 cache, even in FAST mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)

 
10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h$=FILL$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h$((2*q+1)*p+q)="1" 
42 GOTO 87
61 IF h$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h$(z%)=" ": PRINT N%(z%)! 
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2

Batch File

:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEEXTENSIONS
rem echo on
set /p n=limit: 
rem set n=100
for /L %%i in (1,1,%n%) do set crible.%%i=1
for /L %%i in (2,1,%n%) do (
  if !crible.%%i! EQU 1 (
    set /A w = %%i * 2
    for /L %%j in (!w!,%%i,%n%) do (
	  set crible.%%j=0
	)
  )
)
for /L %%i in (2,1,%n%) do (
  if !crible.%%i! EQU 1 echo %%i 
)
pause
Output:
limit: 100
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97

BBC BASIC

      limit% = 100000
      DIM sieve% limit%
      
      prime% = 2
      WHILE prime%^2 <= limit%
        FOR I% = prime%*2 TO limit% STEP prime%
          sieve%?I% = 1
        NEXT
        REPEAT prime% += 1 : UNTIL sieve%?prime%=0
      ENDWHILE
      
      REM Display the primes:
      FOR I% = 2 TO limit%
        IF sieve%?I% = 0 PRINT I%;
      NEXT

BCPL

get "libhdr"

manifest $( LIMIT = 1000 $)

let sieve(prime,max) be
$(  let i = 2
    0!prime := false
    1!prime := false
    for i = 2 to max do i!prime := true
    while i*i <= max do
    $(  if i!prime do
        $(  let j = i*i
            while j <= max do
            $(  j!prime := false
                j := j + i
            $)
        $)
        i := i + 1
    $)
$)

let start() be
$(  let prime = vec LIMIT
    let col = 0
    sieve(prime, LIMIT)
    for i = 2 to LIMIT do
        if i!prime do
        $(  writef("%I4",i)   
            col := col + 1
            if col rem 20 = 0 then wrch('*N')
        $)
    wrch('*N')
$)
Output:
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

Odds-only bit packed array version (64 bit)

This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.

GET "libhdr"

LET lowbit(n) =
    0 -> -1,
    VALOF {
        // The table is byte packed to conserve space; therefore we must
        // unpack the structure.
        //
        LET deBruijn64 = TABLE
            #x0001300239311C03, #x3D3A322A261D1104,
            #x3E373B2435332B16, #x2D27211E18120C05,
            #x3F2F381B3C292510, #x362334152C20170B,
            #x2E1A280F22141F0A, #x190E13090D080706

        LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
        RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
    }

LET primes_upto(limit) =
    limit < 3 -> 0,
    VALOF {
        LET bit_sz = (limit + 1) / 2 - 1
        LET bit, p = ?, ?
        LET q, r = bit_sz >> 6, bit_sz & #x3F
        LET sz = q - (r > 0)
        LET sieve = getvec(sz)

        // Initialize the array
        FOR i = 0 TO q - 1 DO
            sieve!i := -1
        IF r > 0 THEN sieve!q := ~(-1 << r)
        sieve!sz := -1 // Sentinel value to mark the end -
              // (after sieving, we'll never have 64 consecutive odd primes.)

        // run the sieve
        bit := 0
        {
            WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
                bit +:= 1
            p := 2*bit + 3
            q := p*p
            IF q > limit THEN RESULTIS sieve
            r := (q - 3) >> 1
            UNTIL r >= bit_sz DO {
                sieve[r >> 6] &:= ~(1 << (r & #x3F))
                r +:= p
            }
            bit +:= 1
        } REPEAT
    }

MANIFEST { // fields in an iterable
    sieve_start; sieve_bits; sieve_ptr
}

LET prime_iter(sieve) = VALOF {
    LET iter = getvec(2)
    iter!sieve_start := 0
    iter!sieve_bits := sieve!0
    iter!sieve_ptr := sieve
    RESULTIS iter
}

LET nextprime(iter) =
    !iter!sieve_ptr = -1 -> 0,  // guard entry if at the end already
    VALOF {
        LET p, x = ?, ?

        // iter!sieve_start is also a flag to yield 2.
        IF iter!sieve_start = 0 {
            iter!sieve_start := 3
            RESULTIS 2
        }
        x := iter!sieve_bits
        {
            TEST x ~= 0
            THEN {
                p := (lowbit(x) << 1) + iter!sieve_start
                x &:= x - 1
                iter!sieve_bits := x
                RESULTIS p
            }
            ELSE {
                iter!sieve_start +:= 128
                iter!sieve_ptr +:= 1
                x := !iter!sieve_ptr
                IF x = -1 RESULTIS 0
            }
        } REPEAT
    }

LET show(sieve) BE {
    LET iter = prime_iter(sieve)
    LET c, p = 0, ?
    {
        p := nextprime(iter)
        IF p = 0 THEN {
            wrch('*n')
            freevec(iter)
            RETURN
        }
        IF c MOD 10 = 0 THEN wrch('*n')
        c +:= 1
        writef("%8d", p)
    } REPEAT
}

LET start() = VALOF {
    LET n = ?
    LET argv = VEC 20
    LET sz = ?
    LET primes = ?

    sz := rdargs("upto/a/n/p", argv, 20)
    IF sz = 0 RESULTIS 1
    n := !argv!0
    primes := primes_upto(n)
    IF primes = 0 RESULTIS 1 // no array allocated because limit too small
    show(primes)
    freevec(primes)
    RESULTIS 0
}
Output:
$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000> 
       2       3       5       7      11      13      17      19      23      29
      31      37      41      43      47      53      59      61      67      71
      73      79      83      89      97     101     103     107     109     113
     127     131     137     139     149     151     157     163     167     173
     179     181     191     193     197     199     211     223     227     229
     233     239     241     251     257     263     269     271     277     281
     283     293     307     311     313     317     331     337     347     349
     353     359     367     373     379     383     389     397     401     409
     419     421     431     433     439     443     449     457     461     463
     467     479     487     491     499     503     509     521     523     541
     547     557     563     569     571     577     587     593     599     601
     607     613     617     619     631     641     643     647     653     659
     661     673     677     683     691     701     709     719     727     733
     739     743     751     757     761     769     773     787     797     809
     811     821     823     827     829     839     853     857     859     863
     877     881     883     887     907     911     919     929     937     941
     947     953     967     971     977     983     991     997
0.005> 
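
The way the iterator walks the set bits of a sieve word can be sketched in Python as below. This is an illustration under stated assumptions, not a translation: the names set_bit_positions and primes_from_words are hypothetical, Python's int.bit_length stands in for the de Bruijn table lookup, and bit b of word w is assumed to stand for the odd number 2*(w*bits_per_word + b) + 3, as in the BCPL sieve.

```python
def set_bit_positions(word):
    """Yield the positions of the 1 bits of a non-negative integer, lowest first."""
    while word:
        lowest = word & -word            # isolate the lowest set bit
        yield lowest.bit_length() - 1    # its position (BCPL uses a de Bruijn table here)
        word &= word - 1                 # clear that bit and continue


def primes_from_words(words, bits_per_word=64):
    """Decode an odds-only bit-packed sieve into primes, yielding 2 first."""
    yield 2
    for w, word in enumerate(words):
        for b in set_bit_positions(word):
            yield 2 * (w * bits_per_word + b) + 3


if __name__ == "__main__":
    # bits 0,1,2,4,5 set: the odd numbers 3,5,7,11,13 (9 is cleared)
    print(list(primes_from_words([0b110111])))
```

Isolating and clearing the lowest bit visits only the set bits, so the cost per word depends on how many primes it holds rather than on the word size.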

Befunge

2>:3g" "-!v\  g30          <
 |!`"O":+1_:.:03p>03g+:"O"`|
 @               ^  p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789

Binary Lambda Calculus

The BLC sieve of Eratosthenes as documented at https://github.com/tromp/AIT/blob/master/characteristic_sequences/primes.lam is the 167 bit program

00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110

The infinitely long output is

001101010001010001010001000001010000010001010001000001000001010000010001010000010001000001000000010001010001010001000000000000010001000001010000000001010000010000010001000001000001010000000001010001010000000000010000000000010001010001000001010000000001000001000001000001010000010001010000000001000000000000010001010001000000000000010000010000000001010001000001000...
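
The output above is the characteristic sequence of the primes: digit n (counting from 0) is 1 when n is prime. A minimal Python sketch that reproduces a finite prefix for comparison follows; the function name prime_characteristic_prefix is hypothetical and the sketch is unrelated to how the BLC program itself works.

```python
def prime_characteristic_prefix(length):
    """First `length` digits: digit n is '1' when n is prime, else '0'."""
    flags = [True] * length
    for n in range(min(2, length)):
        flags[n] = False                 # 0 and 1 are not prime
    n = 2
    while n * n < length:
        if flags[n]:
            for multiple in range(n * n, length, n):
                flags[multiple] = False  # cross out multiples of n
        n += 1
    return "".join("1" if f else "0" for f in flags)


if __name__ == "__main__":
    print(prime_characteristic_prefix(60))
```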

BQN

A more efficient sieve (primes below one billion in under a minute) is provided as PrimesTo in bqn-libs primes.bqn.

Primes ← {
  𝕩≤2 ? ↕0 ;             # No primes below 2
  p ← 𝕊⌈√n←𝕩             # Initial primes by recursion
  b ← 2≤↕n               # Initial sieve: no 0 or 1
  E ← {↕∘⌈⌾((𝕩×𝕩+⊢)⁼)n}  # Multiples of 𝕩 under n, starting at 𝕩×𝕩
  / b E⊸{0¨⌾(𝕨⊸⊏)𝕩}´ p   # Cross them out
}
Output:
   Primes 100
⟨ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ⟩
   ≠∘Primes¨ 10⋆↕7  # Number of primes below 1e0, 1e1, ... 1e6
⟨ 0 4 25 168 1229 9592 78498 ⟩

Bracmat

This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.

( ( eratosthenes
  =   n j i
    .   !arg:?n
      & 1:?i
      &   whl
        ' ( (1+!i:?i)^2:~>!n:?j
          & ( !!i
            |   whl
              ' ( !j:~>!n
                & nonprime:?!j
                & !j+!i:?j
                )
            )
          )
      & 1:?i
      &   whl
        ' ( 1+!i:~>!n:?i
          & (!!i|put$(!i " "))
          )
  )
& eratosthenes$100
)
Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

C

Plain sieve, without any optimizations:

#include <stdlib.h>
#include <math.h>

char*
eratosthenes(int n, int *c)
{
	char* sieve;
	int i, j, m;

	if(n < 2)
		return NULL;

	*c = n-1;     /* primes count */
	m = (int) sqrt((double) n);

	/* calloc initializes to zero */
	sieve = calloc(n+1,sizeof(char));
	sieve[0] = 1;
	sieve[1] = 1;
	for(i = 2; i <= m; i++)
		if(!sieve[i])
			for (j = i*i; j <= n; j += i)
				if(!sieve[j]){
					sieve[j] = 1; 
					--(*c);
				}
  	return sieve;
}

Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.
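
The first of these optimizations, sieving only odd numbers, can be sketched in Python as follows. This is a minimal illustration, not a translation of the C code: the name odd_sieve is hypothetical, index i is assumed to stand for the odd number 2*i + 3, and the sketch uses one byte per odd number rather than bit packing.

```python
def odd_sieve(n):
    """Primes <= n using one flag per odd number; index i stands for 2*i + 3."""
    if n < 2:
        return []
    size = (n - 1) // 2                  # how many odd numbers 3, 5, ... are <= n
    composite = bytearray(size)          # all zero: nothing crossed out yet
    i = 0
    while (2 * i + 3) ** 2 <= n:
        if not composite[i]:
            p = 2 * i + 3
            # p*p has index (p*p - 3) // 2; stepping the index by p
            # moves the value by 2*p, which skips the even multiples
            for j in range((p * p - 3) // 2, size, p):
                composite[j] = 1
        i += 1
    return [2] + [2 * i + 3 for i in range(size) if not composite[i]]


if __name__ == "__main__":
    print(odd_sieve(100))
```

Dropping the even numbers halves the memory and roughly halves the crossing-out work compared with the plain sieve.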

Another example:

We first fill the array with ones, assuming every number is prime. Then, in a loop, we write a zero at every index i * j that is less than or equal to n (the upper limit of the search), since such an index is a multiple of i and therefore not prime. To understand this better, look at the output of the following example.

To print this back, we look for ones in the array and only print those spots.

#include <stdio.h>
#include <stdlib.h>
void sieve(int *, int);

int main(void)
{
    int *array, n=10;
    array = malloc((n + 1) * sizeof(int));
    if (array == NULL)
        return 1;
    sieve(array,n);
    free(array);
    return 0;
}

void sieve(int *a, int n)
{
    int i=0, j=0;

    for(i=2; i<=n; i++) {
        a[i] = 1;
    }

    for(i=2; i<=n; i++) {
        printf("\ni:%d", i);
        if(a[i] == 1) {
            for(j=i; (i*j)<=n; j++) {
                printf ("\nj:%d", j);
                printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);
                a[(i*j)] = 0;
                printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);
            }
        }
    }

    printf("\nPrimes numbers from 1 to %d are : ", n);
    for(i=2; i<=n; i++) {
        if(a[i] == 1)
            printf("%d, ", i);
    }
    printf("\n\n");
}
Output:
i:2
j:2
Before a[2*2]: 1
After a[2*2]: 0
j:3
Before a[2*3]: 1
After a[2*3]: 0
j:4
Before a[2*4]: 1
After a[2*4]: 0
j:5
Before a[2*5]: 1
After a[2*5]: 0
i:3
j:3
Before a[3*3]: 1
After a[3*3]: 0
i:4
i:5
i:6
i:7
i:8
i:9
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7,

C#

Works with: C# version 2.0+
using System;
using System.Collections;
using System.Collections.Generic;

namespace SieveOfEratosthenes
{
    class Program
    {
        static void Main(string[] args)
        {
            int maxprime = int.Parse(args[0]);
            var primelist = GetAllPrimesLessThan(maxprime);
            foreach (int prime in primelist)
            {
                Console.WriteLine(prime);
            }
            Console.WriteLine("Count = " + primelist.Count);
            Console.ReadLine();
        }

        private static List<int> GetAllPrimesLessThan(int maxPrime)
        {
            var primes = new List<int>();
            var maxSquareRoot = (int)Math.Sqrt(maxPrime);
            var eliminated = new BitArray(maxPrime + 1);

            for (int i = 2; i <= maxPrime; ++i)
            {
                if (!eliminated[i])
                {
                    primes.Add(i);
                    if (i <= maxSquareRoot)
                    {
                        for (int j = i * i; j <= maxPrime; j += i)
                        {
                            eliminated[j] = true;
                        }
                    }
                }
            }
            return primes;
        }
    }
}

Richard Bird Sieve

Translation of: F#

To show that C# code can be written in somewhat functional paradigms, the following is an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesBird : IEnumerable<PrimeT> {
    private struct CIS<T> {
      public T v; public Func<CIS<T>> cont;
      public CIS(T v, Func<CIS<T>> cont) {
        this.v = v; this.cont = cont;
      }
    }
    private CIS<PrimeT> pmlts(PrimeT p) {
      Func<PrimeT, CIS<PrimeT>> fn = null;
      fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));
      return fn(p * p);
    }
    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }
    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
      var x = xs.v; var y = ys.v;
      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
    }
    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }
    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
      var nn = n; var ncs = cs;
      for (; ; ++nn) {
        if (nn >= ncs.v) ncs = ncs.cont();
        else return new CIS<PrimeT>(nn, () => minusat(nn + 1, ncs));
      }
    }
    private CIS<PrimeT> prms() {
      return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }
    public IEnumerator<PrimeT> GetEnumerator() {
      for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;
    }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

Tree Folding Sieve

Translation of: F#

The above code can easily be converted to "odds-only" and an infinite tree-like folding scheme with the following minor changes:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesTreeFold : IEnumerable<PrimeT> {
    private struct CIS<T> {
      public T v; public Func<CIS<T>> cont;
      public CIS(T v, Func<CIS<T>> cont) {
        this.v = v; this.cont = cont;
      }
    }
    private CIS<PrimeT> pmlts(PrimeT p) {
      var adv = p + p;
      Func<PrimeT, CIS<PrimeT>> fn = null;
      fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));
      return fn(p * p);
    }
    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));
    }
    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
      var x = xs.v; var y = ys.v;
      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
    }
    private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {
      var nxtcss = css.cont();
      return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }
    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));
    }
    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
      var nn = n; var ncs = cs;
      for (; ; nn += 2) {
        if (nn >= ncs.v) ncs = ncs.cont();
        else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));
      }
    }
    private CIS<PrimeT> oddprms() {
      return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));
    }
    public IEnumerator<PrimeT> GetEnumerator() {
      yield return 2;
      for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;
    }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code runs over ten times faster than the original Richard Bird algorithm.

Priority Queue Sieve

Translation of: F#

First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:

namespace PriorityQ {
  using KeyT = System.UInt32;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  class Tuple<K, V> { // for DotNet 3.5 without Tuple's
    public K Item1; public V Item2;
    public Tuple(K k, V v) { Item1 = k; Item2 = v; }
    public override string ToString() {
      return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";
    }
  }
  class MinHeapPQ<V> {
    private struct HeapEntry {
      public KeyT k; public V v;
      public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }
    }
    private List<HeapEntry> pq;
    private MinHeapPQ() { this.pq = new List<HeapEntry>(); }
    private bool mt { get { return pq.Count == 0; } }
    private Tuple<KeyT, V> pkmn {
      get {
        if (pq.Count == 0) return null;
        else {
          var mn = pq[0];
          return new Tuple<KeyT, V>(mn.k, mn.v);
        }
      }
    }
    private void psh(KeyT k, V v) { // add extra very high item if none
      if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));
      var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...
      for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {
        var t = pq[ni - 1];
        if (t.k > k) pq[i - 1] = t; else break;
      }
      pq[i - 1] = new HeapEntry(k, v);
    }
    private void siftdown(KeyT k, V v, int ndx) {
      var cnt = pq.Count - 1; var i = ndx;
      for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {
        var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;
        var nk = k;
        if (k > lk) { i = ni; nk = lk; }
        if (nk > rk) { ni += 1; i = ni; }
        if (i != oi) pq[oi] = pq[i]; else break;
      }
      pq[i] = new HeapEntry(k, v);
    }
    private void rplcmin(KeyT k, V v) {
      if (pq.Count > 1) siftdown(k, v, 0); }
    public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }
    public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }
    public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {
      pq.psh(k, v); return pq; }
    public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
      pq.rplcmin(k, v); return pq; }
  }
}


Restricted Base Primes Queue

The following code implements an improved version of the odds-only O'Neill algorithm. It adds a base prime's stream of composite numbers to the queue only when the sieved number reaches the square of that base prime, which saves a huge amount of memory and considerable execution time and means the internal prime numbers do not need as wide a type. It also minimizes stream processing by using fusion:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesPQ : IEnumerable<PrimeT> {
    private IEnumerator<PrimeT> nmrtr() {
      MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;
      PrimeT bp = 3; PrimeT q = 9;
      IEnumerator<PrimeT> bps = null;
      yield return 2; yield return 3;
      for (var n = (PrimeT)5; ; n += 2) {
        if (n >= q) { // always equal or less...
          if (q <= 9) {
            bps = nmrtr();
            bps.MoveNext(); bps.MoveNext(); } // move to 3...
          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
          var adv = bp + bp; bp = nbp;
          pq = MinHeapPQ<PrimeT>.push(n + adv, adv, pq);
        }
        else {
          var pk = MinHeapPQ<PrimeT>.peekMin(pq);
          var ck = (pk == null) ? q : pk.Item1;
          if (n >= ck) {
            do { var adv = pk.Item2;
                  pq = MinHeapPQ<PrimeT>.replaceMin(ck + adv, adv, pq);
                  pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;
            } while (n >= ck);
          }
          else yield return n;
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code is roughly 2.5 times faster, if not more, than the Tree Folding version.
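
The idea of postponing each base prime until the candidate reaches its square can be sketched in Python with the standard heapq module. This is a simplified illustration, not a translation of the C# code: the name primes_pq is hypothetical, and the heap is assumed to hold (next composite, step) pairs.

```python
import heapq
from itertools import islice


def primes_pq():
    """Unbounded odds-only sieve; the heap holds (next composite, step) pairs."""
    yield 2
    yield 3
    heap = []
    base_primes = primes_pq()      # separate stream supplying the base primes
    next(base_primes)              # skip 2
    bp = next(base_primes)         # 3
    q = bp * bp                    # square of the current base prime
    n = 5
    while True:
        if n == q:
            # n is bp squared: only now start culling odd multiples of bp
            heapq.heappush(heap, (n + 2 * bp, 2 * bp))
            bp = next(base_primes)
            q = bp * bp
        elif heap and heap[0][0] == n:
            # n is composite: advance every stream currently pointing at n
            while heap[0][0] == n:
                c, step = heap[0]
                heapq.heapreplace(heap, (c + step, step))
        else:
            yield n
        n += 2


if __name__ == "__main__":
    print(list(islice(primes_pq(), 25)))
```

Because a prime enters the heap only at its square, the heap holds just the primes up to the square root of the current candidate.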


Dictionary (Hash table) Sieve

The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesDict : IEnumerable<PrimeT> {
    private IEnumerator<PrimeT> nmrtr() {
      Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();
      PrimeT bp = 3; PrimeT q = 9;
      IEnumerator<PrimeT> bps = null;
      yield return 2; yield return 3;
      for (var n = (PrimeT)5; ; n += 2) {
        if (n >= q) { // always equal or less...
          if (q <= 9) {
            bps = nmrtr();
            bps.MoveNext(); bps.MoveNext();
          } // move to 3...
          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
          var adv = bp + bp; bp = nbp;
          dct.Add(n + adv, adv);
        }
        else {
          if (dct.ContainsKey(n)) {
            PrimeT nadv; dct.TryGetValue(n, out nadv); dct.Remove(n); var nc = n + nadv;
            while (dct.ContainsKey(nc)) nc += nadv;
            dct.Add(nc, nadv);
          }
          else yield return n;
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code runs in about three quarters of the time of the Priority Queue based version for a range of a million primes, and the Priority Queue version falls even further behind for increasing ranges, because the Dictionary provides O(1) access times compared to the O(log n) access times of the Priority Queue. The only slight advantage of the PQ based version is at very small ranges, where the constant-factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".
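
The same scheme with a hash table in place of the heap can be sketched in Python using a plain dict. Again this is only an illustration under stated assumptions: the name primes_dict is hypothetical, and the dict is assumed to map each upcoming odd composite to the step (twice the base prime) that reached it.

```python
from itertools import islice


def primes_dict():
    """Unbounded odds-only sieve; the dict maps an upcoming composite to its step."""
    yield 2
    yield 3
    steps = {}
    base_primes = primes_dict()    # separate stream supplying the base primes
    next(base_primes)              # skip 2
    bp = next(base_primes)         # 3
    q = bp * bp                    # square of the current base prime
    n = 5
    while True:
        if n == q:
            step = 2 * bp          # start culling odd multiples of bp
            bp = next(base_primes)
            q = bp * bp
        elif n in steps:
            step = steps.pop(n)    # n is composite: take over its step
        else:
            yield n
            n += 2
            continue
        nxt = n + step
        while nxt in steps:        # slot already taken by another prime
            nxt += step
        steps[nxt] = step
        n += 2


if __name__ == "__main__":
    print(list(islice(primes_dict(), 25)))
```

Each lookup, removal and insertion is a constant-time hash operation on average, which is the O(1) access referred to above.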

Best performance: CPU-Cache-Optimized Segmented Sieve

All of the above unbounded versions are really just an intellectual exercise, as with very few extra lines of code beyond the fastest Dictionary based version, one can have a bit-packed, page-segmented, array based version as follows:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesPgd : IEnumerable<PrimeT> {
    private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes
    private const int BFBTS = PGSZ * 8; // in bits
    private const int BFRNG = BFBTS * 2;
    public IEnumerator<PrimeT> nmrtr() {
      IEnumerator<PrimeT> bps = null;
      List<uint> bpa = new List<uint>();
      uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words
      yield return 2;
      for (var lowi = (PrimeT)0; ; lowi += BFBTS) {
        for (var bi = 0; ; ++bi) {
          if (bi < 1) {
            if (bi < 0) { bi = 0; yield return 2; }
            PrimeT nxt = 3 + lowi + lowi + BFRNG;
            if (lowi <= 0) { // cull very first page
              for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
                if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)
                  for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
                    cbuf[j >> 5] |= 1u << j;
            }
            else { // cull for the rest of the pages
              Array.Clear(cbuf, 0, cbuf.Length);
              if (bpa.Count == 0) { // init secondary base primes stream
                bps = nmrtr(); bps.MoveNext(); bps.MoveNext();
                bpa.Add((uint)bps.Current); bps.MoveNext();
              } // add 3 to base primes array
              // make sure bpa contains enough base primes...
              for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {
                p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);
              }
              for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {
                var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;
                // adjust start index based on page lower limit...
                if (s >= lowi) s -= lowi;
                else {
                  var r = (lowi - s) % p;
                  s = (r != 0) ? p - r : 0;
                }
                for (var j = (uint)s; j < BFBTS; j += p)
                  cbuf[j >> 5] |= 1u << ((int)j);
              }
            }
          }
          while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;
          if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);
          else break; // outer loop for next page segment...
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code is about 25 times faster than the Dictionary version at computing the first 50 million or so primes (up to a range of one billion). The actual enumeration of the result sequence now takes longer than culling the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. Compared to a naive "one huge memory array" algorithm, the code owes its speed to using an array that is the size of the CPU L1 or L2 cache and to bit-packing, which fits more number representations into this limited capacity. In this way RAM access times are reduced by a factor of about four to about 10 (depending on CPU and RAM speed) compared to those naive implementations, and the minor computational cost of the bit manipulations is outweighed by a large factor in total execution time.
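
The page-segmentation idea on its own can be sketched in Python as below. This is a simplified, bounded illustration rather than a translation of the C# code: the name segmented_primes and the segment_size parameter are hypothetical, and the sketch uses one byte per number with no odds-only or bit-packing step.

```python
import math


def segmented_primes(limit, segment_size=1 << 15):
    """Primes <= limit, sieved one fixed-size segment at a time."""
    if limit < 2:
        return []
    root = math.isqrt(limit)
    # plain sieve for the base primes up to sqrt(limit)
    flags = bytearray([1]) * (root + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(root) + 1):
        if flags[i]:
            for j in range(i * i, root + 1, i):
                flags[j] = 0
    base = [i for i in range(2, root + 1) if flags[i]]

    primes = []
    for low in range(2, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        segment = bytearray([1]) * (high - low + 1)   # small enough to stay in cache
        for p in base:
            if p * p > high:
                break
            # first multiple of p inside [low, high], but never below p*p
            start = max(p * p, (low + p - 1) // p * p)
            for j in range(start, high + 1, p):
                segment[j - low] = 0
        primes.extend(low + k for k in range(len(segment)) if segment[k])
    return primes


if __name__ == "__main__":
    print(len(segmented_primes(1_000_000)))
```

Only the base primes and one segment buffer are live at any time, so the working set stays the same size however large the limit is.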

The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.

In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another four times gain in speed) and multi-processing (with another gain in speed proportional to the number of independent CPU cores actually used).

Note that none of these gains in speed are due to C# itself, other than that it compiles to reasonably efficient machine code; they come from proper use of the Sieve of Eratosthenes algorithm.

All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):

    static void Main(string[] args) {
      Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
    }

To produce the following output for all tested versions (although some are considerably faster than others):

Output:
15485863

C++

Standard Library

This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

#include <cstdint>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>

// Fills the range [start, end) with 1 if the integer corresponding to the index is composite and 0 otherwise.
// requires: I is RandomAccessIterator
template<typename I>
void mark_composites(I start, I end)
{
    std::fill(start, end, 0);

    for (auto it = start + 1; it != end; ++it)
    {
        if (*it == 0)
        {
            auto prime = std::distance(start, it) + 1;
            // mark all multiples of this prime number as composite.
            auto multiple_it = it;
            while (std::distance(multiple_it, end) > prime)
            {
                std::advance(multiple_it, prime);
                *multiple_it = 1;
            }
        }
    }
}

// Fills "out" with the prime numbers in the range 2...N where N = distance(start, end).
// requires: I is a RandomAccessIterator
//           O is an OutputIterator
template <typename I, typename O>
O sieve_primes(I start, I end, O out)
{
    mark_composites(start, end);
    for (auto it = start + 1; it != end; ++it)
    {
        if (*it == 0)
        {
            *out = std::distance(start, it) + 1;
            ++out;
        }
    }
    return out;
}

int main()
{
    std::vector<uint8_t> is_composite(1000);
    sieve_primes(is_composite.begin(), is_composite.end(), std::ostream_iterator<int>(std::cout, " "));

    // Alternative to store in a vector: 
    // std::vector<int> primes;
    // sieve_primes(is_composite.begin(), is_composite.end(), std::back_inserter(primes));
}

Boost

// yield all prime numbers less than limit. 
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
  std::vector<bool> is_prime(limit, true);
  
  const int sqrt_limit = static_cast<int>(std::sqrt(limit));
  for (int n = 2; n <= sqrt_limit; ++n)
    if (is_prime[n]) {
	yield(n);

	for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n) 
      //NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
	  is_prime[k] = false;
    }

  for (int n = sqrt_limit + 1; n < limit; ++n)
    if (is_prime[n])
	yield(n);
}

Full program:

Works with: Boost
/**
   $ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
 */
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>

int main(int argc, char *argv[])
{
  using namespace std;
  using namespace boost::lambda;

  int limit = 10000;
  if (argc == 2) {
    stringstream ss(argv[--argc]);
    ss >> limit;

    if (limit < 1 or ss.fail()) {
      cerr << "USAGE:\n  sieve LIMIT\n\nwhere LIMIT in the range [1, " 
	   << numeric_limits<int>::max() << ")" << endl;
      return 2;
    }
  }

  // print primes less than 100
  primesupto(100, cout << _1 << " ");
  cout << endl;  

  // find number of primes less than limit and their sum
  int count = 0;
  uintmax_t sum = 0;
  primesupto(limit, (var(sum) += _1, var(count) += 1));

  cout << "limit sum pi(n)\n" 
       << limit << " " << sum << " " << count << endl;
}

Chapel

This example is incorrect. Please fix the code and remove this message.

Details: Doesn't compile with Chapel versions 1.20 through 1.24.1, at least.

This solution uses nested iterators to create new wheels at run time:

// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {

        yield prime;

        var last = prime;
        label candidates for candidate in sieve(prime+1) do {
                for composite in last..candidate by prime do {

                        // candidate is a multiple of this prime
                        if composite == candidate {
                                // remember size of last composite
                                last = composite;
                                // and try the next candidate
                                continue candidates;
                        }
                }

                // candidate does not need to be removed by this sieve
                // yield to parent sieve for checking
                yield candidate;
        }
}

The topmost sieve needs to be started with 2 (the smallest prime):

config const N = 30;
for p in sieve(2) {
        if p > N {
                writeln();
                break;
        }
        write(" ", p);
}

Alternate Conventional Bit-Packed Implementation

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes, representing each number with a single bit and using an iterator for output so that no further memory allocation is required:

Compile with the `--fast` option.

use Time;
use BitOps;

type Prime = uint(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
  write("The first 25 primes are:  ");
  for p in primes(100) do write(p, " "); writeln();
  
  var count = 0; for p in primes(1000000) do count += 1;
  writeln("Count of primes to a million is:  ", count, ".");
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in primes(limit) do count += 1;

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
  const szlmt = n / 8;
  var cmpsts: [0 .. szlmt] uint(8); // byte array; size rounded up to whole bytes

  for bp in 2 .. n {
    if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
      const s0 = bp * bp;
      if s0 > n then break;
      for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
    }
  }

  for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;

}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

Alternate Odds-Only Bit-Packed Implementation

use Time;
use BitOps;

type Prime = int(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
  write("The first 25 primes are:  ");
  for p in primes(100) do write(p, " "); writeln();
  
  var count = 0; for p in primes(1000000) do count += 1;
  writeln("Count of primes to a million is:  ", count, ".");
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in primes(limit) do count += 1;

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
  const ndxlmt = (n - 3) / 2;
  const szlmt = ndxlmt / 8;
  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

  for i in 0 .. ndxlmt { // never gets to the end!
    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
      const bp = i + i + 3;
      const s0 = (bp * bp - 3) / 2;
      if s0 > ndxlmt then break;
      for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
    }
  }

  yield 2;
  for i in 0 .. ndxlmt do
    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;

}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

Sieving odds-only is about twice as fast because it performs fewer culling operations, and it uses only half the memory. It is still not particularly fast, however, at about 14.4 CPU clock cycles per culling operation, because the array is much larger than the CPU cache(s).
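
The index arithmetic behind the odds-only representation can be seen in isolation in the following C++ sketch (a hedged illustration, not a translation of the whole Chapel program; the function name `count_primes_odds_only` is an assumption made for this example). Bit `i` stands for the odd number `2 * i + 3`, so the square of a base prime `bp` lives at index `(bp * bp - 3) / 2`, and stepping the index by `bp` steps the represented value by `2 * bp`.

```cpp
#include <cstdint>
#include <vector>

// Counts the primes <= n with an odds-only, bit-packed sieve.
// Bit i of the buffer represents the odd number 2 * i + 3; a zero bit means "prime".
std::uint64_t count_primes_odds_only(std::uint64_t n)
{
    if (n < 2) return 0;
    if (n < 3) return 1; // only the prime 2
    const std::uint64_t ndxlmt = (n - 3) / 2;            // index of the largest odd number <= n
    std::vector<std::uint8_t> cmpsts(ndxlmt / 8 + 1, 0); // one bit per odd number

    for (std::uint64_t i = 0; ; ++i) {
        const std::uint64_t bp = i + i + 3;              // number represented by index i
        const std::uint64_t s0 = (bp * bp - 3) / 2;      // index of bp * bp
        if (s0 > ndxlmt) break;                          // bp * bp is past n: culling is done
        if (cmpsts[i >> 3] & (1u << (i & 7))) continue;  // bp is composite, skip it
        for (std::uint64_t s = s0; s <= ndxlmt; s += bp) // index step bp == value step 2 * bp
            cmpsts[s >> 3] |= static_cast<std::uint8_t>(1u << (s & 7));
    }

    std::uint64_t count = 1; // the prime 2, which has no bit of its own
    for (std::uint64_t i = 0; i <= ndxlmt; ++i)
        if ((cmpsts[i >> 3] & (1u << (i & 7))) == 0) ++count;
    return count;
}
```

Because even numbers never get a bit, the buffer needs only about `n / 16` bytes instead of `n / 8`, which is where the halved memory use comes from.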

Hash Table Based Odds-Only Version

Translation of: Python

Works with: Chapel version 1.25.1
use Time;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
  var n: Prime; var obp: Prime; var q: Prime;
  var bps: owned Primes?;
  var keys: domain(Prime); var dict: [keys] Prime;
  proc next(): Prime { // odd primes!
    if this.n < 5 { this.n = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = this.q + adv;
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;       
        this.keys += key; this.dict[key] = adv;
      }
      else if this.keys.contains(this.n) { // found a composite; advance...
        const adv = this.dict[this.n]; this.keys.remove(this.n);
        var nkey = this.n + adv;
        while this.keys.contains(nkey) do nkey += adv;
        this.keys += nkey; this.dict[nkey] = adv;
      }
      else { const p = this.n; this.n += 2; return p; }
      this.n += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 5761455 primes up to 100000000 in 5195.41 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

This is much slower than the array-based versions, but much faster than it was with previous Chapel versions, as the hashing has been greatly improved.
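
For comparison, the same incremental algorithm can be sketched in C++ with the standard `std::unordered_map` standing in for the Chapel associative array. This is a hedged illustration only: the class name `OddPrimes` and the use of `std::unique_ptr` for the secondary feed are choices made for this sketch, not part of the Chapel code.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Incremental odds-only sieve: each call to next() returns the next odd prime
// (3, 5, 7, 11, ...). The prime 2 has to be handled by the caller.
class OddPrimes {
    std::uint64_t n = 0;   // next odd candidate
    std::uint64_t obp = 0; // current odd base prime
    std::uint64_t q = 0;   // obp * obp, where culling by obp must start
    std::unique_ptr<OddPrimes> bps; // secondary feed of base primes
    std::unordered_map<std::uint64_t, std::uint64_t> dict; // next composite -> step (2 * base prime)
public:
    std::uint64_t next()
    {
        if (n < 5) { n = 5; return 3; }
        if (!bps) { // create the base primes feed lazily, on first need
            bps.reset(new OddPrimes());
            obp = bps->next(); q = obp * obp;
        }
        for (;; n += 2) {
            if (n >= q) { // n is the square of the base prime: register it
                const std::uint64_t adv = obp * 2;
                dict[q + adv] = adv;
                obp = bps->next(); q = obp * obp;
                continue;
            }
            auto it = dict.find(n);
            if (it == dict.end()) { // not a known composite, so n is prime
                const std::uint64_t p = n;
                n += 2;
                return p;
            }
            // n is composite: move its entry on to the next unoccupied multiple
            const std::uint64_t adv = it->second;
            dict.erase(it);
            std::uint64_t nkey = n + adv;
            while (dict.count(nkey)) nkey += adv;
            dict[nkey] = adv;
        }
    }
};
```

Only base primes up to the square root of the current candidate ever enter the map, which is why the memory use stays small even for large ranges.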

As an alternative to the built-in associative array, the following code implements a specialized BasePrimesTable that works much like Python's associative arrays: integer keys are used directly as their own hash values (no hashing), and collisions are handled by a probing scheme similar to Python's:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;
 
config const limit = 100000000;
 
type Prime = uint(32);

record BasePrimesTable { // specialized for the use here...
  record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
  var cpcty: int = 8; var sz: int = 0;
  var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
  proc grow() {   
    const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
    bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
    for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);   
  }
  proc find(k: Prime): int { // internal get location of value or -1
    const msk = cpcty - 1; var skey = k: int & msk;
    var perturb = k: int; var loop = 8;
    do {
      if bpa[skey].fullkey == k then return skey;
      perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
      loop -= 1; if perturb > 0 then loop = 8;
    } while loop > 0;
    return -1; // not found!
  }
  proc contains(k: Prime): bool { return find(k) >= 0; }
  proc add(k, v: Prime) { // if exists then replaces else new entry
    const fndi = find(k);
    if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
    else {
      sz += 1; if 2 * sz > cpcty then grow();
      const msk = cpcty - 1; var skey = k: int & msk;
      var perturb = k: int; var loop = 8;
      do {
        if bpa[skey].fullkey == 0 {
          bpa[skey] = new BasePrimeEntry(k, v); return; }
        perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
        loop -= 1; if perturb > 0 then loop = 8;
      } while loop > 0;
    }
  }
  proc remove(k: Prime) { // if doesn't exist does nothing
    const fndi = find(k);
    if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
  }
  proc this(k: Prime): Prime { // returns value or 0 if not found
    const fndi = find(k);
    if fndi < 0 then return 0; else return bpa[fndi].val;
  }
}

class Primes { // needed so we can use next to get successive values
  var n: Prime; var obp: Prime; var q: Prime;
  var bps: shared Primes?; var dict = new BasePrimesTable();
  proc next(): Prime { // odd primes!
    if this.n < 5 { this.n = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = this.q + adv;
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;
        this.dict.add(key, adv);
      }
      else if this.dict.contains(this.n) { // found a composite; advance...
        const adv = this.dict[this.n]; this.dict.remove(this.n);
        var nkey = this.n + adv;
        while this.dict.contains(nkey) do nkey += adv;
        this.dict.add(nkey, adv);
      }
      else { const p = this.n; this.n += 2; return p; }
      this.n += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
 
  var timer: Timer;
  timer.start();
 
  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }
 
  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 5761455 primes up to 100000000 in 2351.79 milliseconds.

This last code is quite usable up to a hundred million (as here), or even up to a billion in a little over ten times the time. It is still slower and more complex than the very simple odds-only monolithic array version, although it uses far less memory: only about eight Kilobytes for the hash table of base primes when sieving to a billion, compared to over 60 Megabytes for the simple monolithic odds-only version.
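
The Python-style probing used by `BasePrimesTable` can be illustrated in isolation with a small C++ sketch. It is an assumption-laden illustration rather than a translation: the name `ProbeTable` is hypothetical, and deletions are handled here with "deleted" markers (tombstones) and an unbounded probe loop, whereas the Chapel record above clears the key and bounds the number of probes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Open-addressing table whose integer keys are their own hash values.
// Probe sequence: slot = (5 * slot + 1 + perturb) & mask, with perturb >>= 5 each step.
class ProbeTable {
    enum State : std::uint8_t { Empty, Full, Deleted };
    struct Slot { std::uint32_t key; std::uint32_t val; State st; };
    std::vector<Slot> slots;  // size is always a power of two
    std::size_t used = 0;     // Full + Deleted slots; kept <= size / 2 so probing terminates

    // Returns the slot holding k or, if k is absent, the slot where it should go.
    std::size_t locate(std::uint32_t k) const
    {
        const std::size_t mask = slots.size() - 1, none = slots.size();
        std::size_t i = k & mask, perturb = k, reuse = none;
        for (;;) {
            const Slot& s = slots[i];
            if (s.st == Empty) return reuse != none ? reuse : i;
            if (s.st == Full && s.key == k) return i;
            if (s.st == Deleted && reuse == none) reuse = i; // first reusable slot
            perturb >>= 5;
            i = (5 * i + 1 + perturb) & mask;
        }
    }

    void rebuild() // drops tombstones; doubles only if live entries need the room
    {
        std::size_t live = 0;
        for (const Slot& s : slots) if (s.st == Full) ++live;
        const std::size_t nsz = 4 * live >= slots.size() ? slots.size() * 2 : slots.size();
        std::vector<Slot> old(nsz, Slot{0, 0, Empty});
        old.swap(slots); // slots is now the fresh, empty table
        used = 0;
        for (const Slot& s : old) if (s.st == Full) put(s.key, s.val);
    }

public:
    ProbeTable() : slots(8, Slot{0, 0, Empty}) {}

    void put(std::uint32_t k, std::uint32_t v) // inserts or replaces
    {
        std::size_t i = locate(k);
        if (slots[i].st == Empty) {
            if (2 * (used + 1) > slots.size()) { rebuild(); i = locate(k); }
            ++used;
        }
        slots[i] = Slot{k, v, Full};
    }

    bool get(std::uint32_t k, std::uint32_t& v) const
    {
        const Slot& s = slots[locate(k)];
        if (s.st != Full || s.key != k) return false;
        v = s.val;
        return true;
    }

    void remove(std::uint32_t k) // does nothing if k is absent
    {
        Slot& s = slots[locate(k)];
        if (s.st == Full && s.key == k) s.st = Deleted;
    }
};
```

Once `perturb` has been shifted down to zero, the recurrence `5 * slot + 1` modulo a power of two visits every slot, so the probe loop always reaches an empty slot as long as the table is never allowed to fill up.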

Chapel version 1.25.1 provides yet another option for the form of the code, although the algorithm is the same: one can now override the hashing function for Chapel records so that they can be used as the key type of a hash Map, as follows:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;

use Map;
 
config const limit = 100000000;
 
type Prime = uint(32);
 
class Primes { // needed so we can use next to get successive values
  record PrimeR { var prime: Prime; proc hash() { return prime; } }
  var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
  var bps: owned Primes?;
  var dict = new map(PrimeR, Prime);
  proc next(): Prime { // odd primes!
    if this.n.prime < 5 { this.n.prime = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n.prime >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;       
        this.dict.add(key, adv);
      }
      else if this.dict.contains(this.n) { // found a composite; advance...
        const adv = this.dict.getValue(this.n); this.dict.remove(this.n);
        var nkey = new PrimeR(this.n.prime + adv);
        while this.dict.contains(nkey) do nkey.prime += adv;
        this.dict.add(nkey, adv);
      }
      else { const p = this.n.prime;
             this.n.prime += 2; return p; }
      this.n.prime += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}
 
proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
 
  var timer: Timer;
  timer.start();
 
  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }
 
  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

This runs in almost exactly the same time as the previous code, but doesn't require a custom associative array implementation, since the standard library Map can be used.

Functional Tree Folding Odds-Only Version

Chapel isn't really a very functional language, even though it has some functional forms in its Higher Order Functions (HOF's) for zippered, scanned, and reduced iterations, and has first class functions (FCF's) and lambdas (anonymous functions). These last can't be closures (they can't capture variable bindings from external scopes), nor can the workaround of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least as of version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

Translation of: Nim

Works with: 1.22 version - compile with the --fast compiler command line flag for full optimization
use Time;

type Prime = uint(32);

config const limit = 1000000: Prime;

// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
  var head: Prime;
  proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}

class PrimeMultiples: PrimeCIS {
  var adv: Prime;
  override proc next(): shared PrimeCIS {
    return new shared PrimeMultiples(
      this.head + this.adv, this.adv): shared PrimeCIS; }
}

class PrimeCISCIS { // base stream of prime streams; never used directly...
  var head: shared PrimeCIS;
  proc init() { this.head = new shared PrimeCIS(); }
  proc next(): shared PrimeCISCIS {
    return new shared PrimeCISCIS(); }
}

class AllMultiples: PrimeCISCIS {
  var bps: shared PrimeCIS;
  proc init(bsprms: shared PrimeCIS) {
    const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
    this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;
    this.bps = bsprms;
  }
  override proc next(): shared PrimeCISCIS {
    return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}

class Union: PrimeCIS {
  var feeda, feedb: shared PrimeCIS;
  proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
    const ahd = fda.head; const bhd = fdb.head;
    this.head = if ahd < bhd then ahd else bhd;
    this.feeda = fda; this.feedb = fdb;
  }
  override proc next(): shared PrimeCIS {    
    const ahd = this.feeda.head; const bhd = this.feedb.head;
    if ahd < bhd then
      return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
    if ahd > bhd then
      return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
    return new shared Union(this.feeda.next(),
                            this.feedb.next()): shared PrimeCIS;
  }
}

class Pairs: PrimeCISCIS {
  var feed: shared PrimeCISCIS;
  proc init(fd: shared PrimeCISCIS) {
    const fs = fd.head; const sss = fd.next(); const ss = sss.head;
    this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
  }
  override proc next(): shared PrimeCISCIS {
    return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}

class Composites: PrimeCIS {
  var feed: shared PrimeCISCIS;
  proc init(fd: shared PrimeCISCIS) {
    this.head = fd.head.head; this.feed = fd;
  }
  override proc next(): shared PrimeCIS {
    const fs = this.feed.head.next();
    const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
    const ncs = new shared Composites(prs): shared PrimeCIS;
    return new shared Union(fs, ncs): shared PrimeCIS;
  }
}

class OddPrimesFrom: PrimeCIS {
  var cmpsts: shared PrimeCIS;
  override proc next(): shared PrimeCIS {
    var n = head + 2; var cs = this.cmpsts;
    while true {
      if n < cs.head then
        return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
      n += 2; cs = cs.next();
    }
    return this.cmpsts; // never used; keeps compiler happy!
  }
}

class OddPrimes: PrimeCIS {
  proc init() { this.head = 3; }
  override proc next(): shared PrimeCIS {
    const bps = new shared OddPrimes(): shared PrimeCIS;
    const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
    const cmpsts = new shared Composites(mlts): shared PrimeCIS;
    return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
  }
}

iter primes(): Prime {
  yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
  while true { yield cur.head; cur = cur.next(); }
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the limits stated above). It is slower than the Hash Table version above and also takes more memory, as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has worse asymptotic performance, with an extra `log(n)` factor where `n` is the sieving range. This can be shown by running the above program with the `--limit=10000000` run time command line option: counting the primes up to ten million takes about 4.5 seconds, an increase much greater than the factor of ten in range plus the roughly 10 per cent extra expected of the Hash Table version (this version performs about 20 per cent more operations on top of the factor of ten). Apart from the extra operations, this version is generally slower because of the time taken by the many small allocations/de-allocations of the functional object instances, which is highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.

A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version

To take advantage of the features that make Chapel shine, we need to use it for parallel computation. To do that efficiently for the Sieve of Eratosthenes, we divide the work into page segments and assign each largish segment to a separate thread/task; this also improves speed through better cache locality, as most memory accesses are then to values already in the cache(s). Once the work is divided, Chapel offers many ways to implement the parallelism, but to be a true Sieve of Eratosthenes we must be able to output the results in order. As many of the convenience mechanisms don't do that, the best/simplest option is likely task parallelism, with the output results assigned to a rotary-indexed array of `sync` results. Although the Chapel compiler can sometimes optimize code so that the overhead of creating tasks is not onerous, in this case the tasks are somewhat complex and the compiler can't recognize that an automatically generated thread pool is required, so we need to create the thread pool(s) manually. The code that implements the multi-threading of page segments using thread pools is as follows:

Works with: 1.24.1 version - compile with the --fast compiler command line flag for full optimization
use Time; use BitOps; use CPtr;

type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);

config const LIMIT = 1000000000: Prime;

config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
        "L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
        "L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");

const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
                            11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;

// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
                     if L1 == 32 then 23000 else 43390)
                       - WHLPRMS.size + 1; // extra one for marker!

// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;

var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
  const hi = WHLPRMS.size - 1;
  const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
  // contains wheel pattern primes skipping the small wheel prime (2)!...
  // never advances past the first base prime arr as it ends with a huge!...
  for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
                                    else 0x7FFFFFFF): BasePrime; // last huge!
  var whlbpas = new shared BasePrimeArrs(whlhd);
  var whlstrts = new StrtsArr({rng});
  wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
  // eliminate wheel primes from the WHLPTRN buffer!...
  wp.cmpsts[0] = 0xFF: uint(8);
}

// the following two must be classes for compatibility with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
  var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
  proc init() {}
  proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
  proc deinit() { dom = { 0 .. -1 }; }

  proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
    const sz = cmpsts.size; const mvsz = min(sz, 16384);
    var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
    for i in 0 .. sz - 1 by 16384 {
      c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
               c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);    
      mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
    }
  }

  proc count(btlmt: int) { // count by 64 bits using CPU popcount...
    const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
    const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
    var i = 0; var cnt = (lstwrd * 64 + 64): int;
    while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
    return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;    
  }

  // most of the time is spent doing culling operations as follows!...
  proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
                                        ref strts: StrtsArr) {
    const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
    const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
    const strtssz = strts.strtsarr.size;
    // C pointer for speed magic!...
    const cmpstsp = c_ptrTo(cmpsts[0]);
    const strtsp = c_ptrTo(strts.strtsarr);

    // first fill the strts array with pre-calculated start addresses...
    var i = 0; for bp in bpas {
      // calculate page start address for the given base prime...
      const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
      const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
      if s0 > ndxlmt then {
        if i < strtssz then strtsp[i] = -1: int(32); break; }
      var s = 0: int;
      if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
      else { const r = (lwi: Prime - s0) % bbp;
            if r == 0 then s = 0: int; else s = (bbp - r): int; };
      if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
      if i < strtssz { strtsp[i] = -1; i = strtssz; }
      // cull the full buffer for this given base prime as usual...
      // only works up to limit of int(32)**2!!!!!!!!
      while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
    }

    // cull the smaller sub buffers according to the strts array...
    for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
      i = 0; for bp in bpas { // bp never bigger than uint(32)!
        // cull the sub buffer for this given base prime...
        var s = strtsp[i]: int; if s < 0 then break;
        var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
        if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
          const slmt = s + bpi * 8 - 1;
          while s <= slmt {
            const bmi = s & 7; const msk = bitmsk[bmi];
            var c = s >> 3; const clmt = sbtlmt >> 3;
            while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
            nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
          }
          strtsp[i] = nxt: int(32); i += 1;
        }
        else { while s <= sbtlmt { // standard cull loop...
                 cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
               strtsp[i] = s: int(32); i += 1; }
      }
    }
  }
}

// a generic class that contains a page result generating function;
// allows manual iteration through the use of the next() method;
// multi-threaded through the use of a thread pool...
class PagedResults {
  const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
  var lwi: PrimeNdx; var bsbtsz: int;
  var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
  var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
  var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
  var qi: int = 0;
  var wrkq$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
  var rsltsq$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;

  proc init(cvclsr, li: PrimeNdx, bsz: int) {
    cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }

  proc deinit() { // kill the thread pool when out of scope...
    if bpas == nil then return; // no thread pool!
    for i in wrkq$.domain {
      wrkq$[i].writeEF(-1); while true { const r = rsltsq$[i].readFE();
        if r == nil then break; }
    }
  }
 
  proc next(): cnvrtrclsr(lwi, sbs(0)).type {   
    proc dowrk(ri: int) { // used internally!...
      while true {
        const li = wrkq$[ri].readFE(); // following to kill thread!
        if li < 0 { rsltsq$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
        sbs[ri].fill(li);
        sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
        rsltsq$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
      }
    }
    if this.bpas == nil { // init on first use; avoids data race!
      this.bpas = new BasePrimeArrs();
      if this.bsbtsz < CPUL1CACHE {
        this.sbs = new SieveBuffer(bsbtsz);
        this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
      }
      else {
        this.sbs = new SieveBuffer(CPUL2CACHE);
        this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
      }
      // start threadpool and give it initial work...
      for i in rsltsq$.domain {
        begin with (const in i) dowrk(i);
        this.wrkq$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
      }
    }
    const rslt = this.rsltsq$[qi].readFE();
    this.wrkq$[qi].writeEF(this.lwi);
    this.lwi += this.sbs[qi].cmpsts.size * 8;
    this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
    return rslt;
  }
 
  iter these() { while lwi >= 0 do yield next(); }
}

// the sieve buffer to base prime array converter closure...
record SB2BPArr {  
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
    const bsprm = (lwi + lwi + 3): BasePrime;
    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
    var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
                        arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
                      i += 1; }
    return arr;
  }
}

// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
  var head: shared BasePrimeArr;
  var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
  var lock$: sync bool = true;
  var feed: shared PagedResults(SB2BPArr) =
    new shared PagedResults(new SB2BPArr(), 65536, 65536);

  proc init() { // make our own first array to break data race!
    var sb = new SieveBuffer(256); sb.fill(0);
    const sb2 = new SB2BPArr();
    head = sb2(0, sb): shared BasePrimeArr;
    this.complete(); // fake base primes!
    sb = new SieveBuffer(65536); sb.fill(0);
    // use (completed) self as source of base primes!
    var strts = new StrtsArr({ 0 .. 256 });
    sb.cull(0, 65536, this, strts);
    // replace head with new larger version culled using fake base primes!...
    head = sb2(0, sb): shared BasePrimeArr;
  }

  // for initializing for use by the fillWHLPTRN proc...
  proc init(hd: shared BasePrimeArr) {
    head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
  }

  // for initializing lazily extended list as required...
  proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }

  proc next(): shared BasePrimeArrs {
    if this.tail == nil { // in case other thread slipped through!     
      if this.lock$.readFE() && this.tail == nil { // empty sync -> block others!
        const nhd = this.feed.next(): shared BasePrimeArr;
        this.tail = new shared BasePrimeArrs(nhd , this.feed);
      }
      this.lock$.writeEF(false); // fill the sync so other threads can do nothing!
    }
    return this.tail: shared BasePrimeArrs; // necessary cast!
  }
 
  iter these(): BasePrime {
    for bp in head.bparr do yield bp; var cur = next();
    while true {
      for bp in cur.head.bparr do yield bp; cur = cur.next(); }
  }
}

record SB2PrmArr {
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
    const bsprm = (lwi + lwi + 3): Prime;
    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
    var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
                        arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
                      i += 1; }
    return arr;
  }
}

iter primes(): Prime {
  for p in WHLPRMS do yield p: Prime;
  for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
    for p in pa!.prmarr do yield p;
}

// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }

// a class that emulates a closure and a return value...
record SB2Cnt {
  const nxtlmt: PrimeNdx;
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
    const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
    const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
    return new shared CntNxt(sb.count(btlmt), lstndx);
  }
}

// count primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
  const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
  for p in WHLPRMS { if p > lmt then break; count += 1; }
  if lmt < FRSTSVPRM then return count;
  for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
    count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
  }
  return count;
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");

write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");
writeln("using ", NUMTHRDS, " threads.");

var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();

write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does use some functional concepts, namely a memoized lazy list of base prime arrays. Because this list is only used at the page level, its relatively slow performance has little impact on the overall execution time, and it makes the code more elegant in that new pages of base primes are computed only as they are required for an increasing range.

One of the trickiest parts of using thread pools is stopping and de-initializing them when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized to some extent in two ways. First, "loop unpeeling" is used for the smaller base primes, which simplifies the loops down to simple masking by a constant, with eight separate loops for the pattern that repeats over bytes. Second, culling is done by sub-buffers of the CPU L1 cache size within an outer sieve buffer of the CPU L2 cache size; this makes the task work chunks larger, which reduces task context-switching overhead and the time lost to the culling start address calculations per base prime (which need an integer division that is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits, or 1e12 for the one Megabit L2 cache size that many mid-range Intel CPUs currently have when used for multi-threading (half of the actual size for Hyper-Threaded (HT) threads, as each pair of HT threads per core shares both the L1 and the L2 caches).

Although this code can be used for much higher sieving ranges, that is not recommended, as it has not yet been tuned for efficiency above 1e12. There are no checks limiting the user to this range; as well as losing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows, but only for huge sieving ranges that would take days, weeks, months, or years to execute on common desktop CPUs.
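The page-segmented culling and the per-base-prime start address calculation described above can be illustrated with a small single-threaded sketch. It is written in Python purely for illustration and is not a translation of the Chapel program: the names `base_primes` and `count_primes_segmented` and the `seg_bits` parameter are made up for this example, and one byte per candidate is used instead of bit packing. Index `i` stands for the odd number `2*i + 3`, as in the code above.

```python
from math import isqrt

def base_primes(limit):
    """Odd primes up to limit, found with a simple unsegmented sieve."""
    flags = bytearray([1]) * (limit + 1)
    for n in range(3, isqrt(limit) + 1, 2):
        if flags[n]:
            flags[n * n::2 * n] = bytearray(len(range(n * n, limit + 1, 2 * n)))
    return [n for n in range(3, limit + 1, 2) if flags[n]]

def count_primes_segmented(limit, seg_bits=1 << 12):
    """Count primes <= limit with an odds-only, page-segmented sieve."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2                 # index of the largest odd candidate
    bps = base_primes(isqrt(limit))
    count = 1                              # the prime 2
    for lwi in range(0, top + 1, seg_bits):
        size = min(seg_bits, top + 1 - lwi)
        seg = bytearray(size)              # 0 = still a candidate, 1 = composite
        for p in bps:
            s = (p * p - 3) // 2           # index of p squared
            if s >= lwi + size:
                break                      # base primes are in ascending order
            if s >= lwi:
                start = s - lwi
            else:
                # this modulo is the per-prime integer division mentioned above
                r = (lwi - s) % p
                start = 0 if r == 0 else p - r
            seg[start::p] = b"\x01" * len(range(start, size, p))
        count += size - sum(seg)
    return count
```

The larger the segment, the fewer times the start calculation runs per base prime, which is the trade-off the L1-within-L2 buffering is balancing.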

A further optimization is to create a pre-culled `WHLPTRN` `SieveBuffer` in which the multiples of the odd primes 3, 5, 7, 11, 13, and 17 (odd primes only, since we cull odds-only) have already been culled, and to use it to pre-fill the page segment buffers so that no culling by these base primes is required. This reduces the number of operations by about 45%, but the performance gain is only about 34.5%, since it changes the ratio of (fast) smaller base primes to (slower) larger ones.
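The pre-fill idea can be sketched as follows. This is an illustrative Python example under simplifying assumptions, not the Chapel `WHLPTRN` code: it uses only the wheel primes 3, 5 and 7 (so the pattern repeats every 105 odd numbers), one byte per candidate, and the hypothetical names `PATTERN`, `prefilled` and `primes_upto`.

```python
from math import isqrt

WHEEL_PRIMES = (3, 5, 7)   # illustrative; the text uses 3, 5, 7, 11, 13 and 17
PERIOD = 3 * 5 * 7         # the pattern repeats every 105 odd numbers

# PATTERN[k] is 1 when the odd number 2*k + 3 is divisible by a wheel prime
PATTERN = bytes(any((2 * k + 3) % p == 0 for p in WHEEL_PRIMES)
                for k in range(PERIOD))

def prefilled(lwi, size):
    """Buffer for indices lwi .. lwi+size-1 with wheel multiples already culled."""
    off = lwi % PERIOD
    reps = (off + size) // PERIOD + 1
    return bytearray((PATTERN * reps)[off:off + size])

def primes_upto(limit):
    """Primes <= limit; the wheel primes are emitted separately."""
    small = [p for p in (2,) + WHEEL_PRIMES if p <= limit]
    if limit < 11:
        return small
    size = (limit - 3) // 2 + 1
    buf = prefilled(0, size)
    # start culling at index 4, which stands for 11, the first non-wheel prime
    for i in range(4, (isqrt(limit) - 3) // 2 + 1):
        if not buf[i]:
            p = 2 * i + 3
            s = (p * p - 3) // 2
            buf[s::p] = b"\x01" * len(range(s, size, p))
    return small + [2 * i + 3 for i in range(size) if not buf[i]]
```

Because the pattern marks the wheel primes themselves as culled, they have to be produced separately, which is the role `WHLPRMS` plays in the `primes` iterator above.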

All of the improvements to this point give the performance shown in the output above. Using command line arguments of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion, 1e11; this computer has caches of those sizes and no Hyper-Threading), it can count the primes to 1e11 in about 17.5 seconds on the above-mentioned CPU. It will be over two times faster than this on a more modern desktop CPU such as the Intel Core i7-9700K, which has twice as many effective cores, a higher CPU clock rate, and is about 10% to 15% faster due to a more modern CPU architecture that is three generations newer. Of course, a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again, except that it will lose about 20% due to its slower clock speed when all cores/threads are used. Note that CPUs with high core counts will only give these speed gains for large sieving ranges such as 1e11 and above, since otherwise there aren't enough work chunks to go around for all the available threads!

Incredibly, even run single-threaded (argument of `--NUMTHRDS=1`), this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the primes to a billion, and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`), with both using the same CPU L1 cache buffer size of 16 Kilobytes. This implementation does not yet have the level of wheel optimization of the Sieve of Atkin, as it has only the limited wheel optimization of odds-only plus the pre-cull fill. Maximum wheel factorization would reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges and approaching the speed of Kim Walisch's "primesieve". However, as Chapel lacks primitive element pointers and pointer operations, some of the extreme loop unrolling optimizations that "primesieve" uses are not available, so this code will always remain about 20% to 30% slower than "primesieve".

The above code is a fairly formidable benchmark, which I have also written in Fortran, likely the major computer language that is most comparable. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expressing this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-on's or just being the ability to call through to OpenMP/MPI.

That said, it also has the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but when so much about forms of expression has been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes are no longer breaking changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same. For this particular task they are about the same when implementing the same algorithmic optimizations, except that some of the extreme loop unrolling optimization can be done in Fortran but not in Chapel, as Fortran has a limited form of pointers, although not the full set of pointer operators that C/C++-like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compiler and the availability of (limited) pointer operations.

The primary additional optimization available to the Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer; the other major improvement would be to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.

Clojure

(defn primes< [n]
   (remove (set (mapcat #(range (* % %) n %)
                        (range 2 (Math/sqrt n))))
           (range 2 n)))

The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with (println (time (count (primes< 1000000)))), it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.

It may be written using the into #{} function to run slightly faster, as the set function is concerned with keeping only distinct elements whereas into #{} only does the conjunction, and even that does little work as it conjoins onto an empty set; the code is as follows:

(defn primes< [n]
   (remove (into #{}
                 (mapcat #(range (* % %) n %)
                         (range 2 (Math/sqrt n))))
           (range 2 n)))

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses the into #{} function but has been slightly wheel-factorized to sieve odds-only:

(defn primes< [n]
   (if (< n 3) () ; there are no primes below 2, and 2 itself is not "less than" 2
     (cons 2 (remove (into #{}
                           (mapcat #(range (* % %) n %)
                                   (range 3 (Math/sqrt n) 2)))
                     (range 3 n 2)))))

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.
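To make the "not strictly a Sieve of Eratosthenes" point concrete, the following sketch counts the marking operations performed when culling by every number below the square root of n (as the three versions above do) against culling by the primes only. It is an illustrative Python example rather than Clojure, the function name `cull_counts` is made up for this purpose, and it assumes n is at least 2.

```python
from math import isqrt

def cull_counts(n):
    """Return (ops_all, ops_primes) for sieving the numbers below n.

    ops_all counts marks when culling by every k with k*k < n;
    ops_primes counts marks when culling by the prime k only.
    """
    composite = bytearray(n)
    ops_all = ops_primes = 0
    for k in range(2, isqrt(n - 1) + 1):
        marks = len(range(k * k, n, k))   # multiples k*k, k*k + k, ... below n
        ops_all += marks
        if not composite[k]:              # k is prime: a true sieve culls only here
            ops_primes += marks
            for m in range(k * k, n, k):
                composite[m] = 1
    return ops_all, ops_primes
```

For example, `cull_counts(100)` returns `(142, 102)`: the multiples of 4, 6, 8 and 9 account for the 40 redundant marks.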

The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens of times (up to a hundred times) faster than the purely functional versions due to the use of mutability in the boolean array:

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        cmpsts (boolean-array (inc n)),
        cullp (fn [p]
                (loop [i (* p p)]
                  (if (<= i n)
                    (do (aset cmpsts i true)
                        (recur (+ i p))))))]
    (do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
                                       (range 2 (inc root)))))
        (filter #(not (aget cmpsts %)) (range 2 (inc n))))))

Alternative implementation using Clojure's side-effect oriented list comprehension.

(defn primes-to
  "Returns a lazy sequence of prime numbers less than lim"
  [lim]
  (let [refs (boolean-array (+ lim 1) true)
        root (int (Math/sqrt lim))]
    (do (doseq [i (range 2 lim)
                :while (<= i root)
                :when (aget refs i)]
          (doseq [j (range (* i i) lim i)]
            (aset refs j false)))
        (filter #(aget refs %) (range 2 lim)))))

Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.

(defn primes-to
  "Returns a lazy sequence of prime numbers less than lim"
  [lim]
  (let [max-i (quot lim 2) ; index i stands for 2*i + 1; quot keeps lim - 1 when lim is even
        refs (boolean-array max-i true)
        root (/ (dec (int (Math/sqrt lim))) 2)]
    (do (doseq [i (range 1 (inc root))
                :when (aget refs i)]
          (doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
            (aset refs j false)))
        (cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))

This implementation is about twice as fast as the previous one and uses only half the memory. The value represented by array index i is (2*i + 1), which is also the step between the indices of successive multiples of that prime to be marked as composite. The index of the square of the prime, where composite marking starts, is 2*i*(i+1).
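The index arithmetic can be checked with a short sketch. This is an illustrative Python version, not a translation of the Clojure code; the name `primes_below` is an assumption of this example. Since p = 2*i + 1, the square is p*p = 4*i*i + 4*i + 1, whose index is (p*p - 1)/2 = 2*i*(i + 1).

```python
from math import isqrt

def primes_below(lim):
    """Primes strictly less than lim, sieving odd numbers only."""
    if lim <= 2:
        return []
    max_i = lim // 2                    # index i stands for the odd number 2*i + 1
    is_prime = bytearray([1]) * max_i   # index 0 (the number 1) is never read
    for i in range(1, (isqrt(lim) - 1) // 2 + 1):
        if is_prime[i]:
            step = 2 * i + 1            # the prime itself is the index step
            start = 2 * i * (i + 1)     # index of its square, (p*p - 1) / 2
            is_prime[start::step] = bytes(len(range(start, max_i, step)))
    return [2] + [2 * i + 1 for i in range(1, max_i) if is_prime[i]]
```

For i = 1 (the prime 3) this gives start index 4, which stands for 9, and a step of 3, so indices 4, 7, 10, ... (9, 15, 21, ...) are marked.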

Alternative very slow entirely functional implementation using lazy sequences

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (letfn [(nxtprm [cs] ; current candidates
            (let [p (first cs)]
              (if (> p (Math/sqrt n)) cs
                (cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
                                              set (remove cs) rest)))))))]
    (nxtprm (range 2 (inc n)))))

The above code is so slow for several reasons: it has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream; each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector, where only the culled values are referenced); and Clojure sequence operations are slow compared to iterator/sequence operations in other languages.

Version based on immutable vectors

Here is a version based on an immutable boolean vector; it is non-lazy except for the lazy sequence operations used to output the result:

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [max-prime]
  (let [sieve (fn [s n]
                (if (<= (* n n) max-prime)
                  (recur (if (s n)
                           (reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))
                           s)
                         (inc n))
                  s))]
    (->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))
             (assoc 0 false)
             (assoc 1 false)
             (sieve 2))
         (map-indexed #(vector %2 %1)) (filter first) (map second))))

The above code is still quite slow due to the cost of the immutable copy-on-modify operations.

Odds only bit packed mutable array based version

The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:

(set! *unchecked-math* true)

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        rootndx (long (/ (- root 3) 2)),
        ndx (long (/ (- n 3) 2)),
        cmpsts (long-array (inc (/ ndx 64))),
        isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))
                               (bit-shift-left 1 (bit-and % 63)))),
        cullp (fn [i]
                (let [p (long (+ i i 3))]
                  (loop [i (bit-shift-right (- (* p p) 3) 1)]
                    (if (<= i ndx)
                      (do (let [w (bit-shift-right i 6)]
                            (aset cmpsts w (bit-or (aget cmpsts w)
                                                   (bit-shift-left 1 (bit-and i 63)))))
                          (recur (+ i p))))))),
        cull (fn [] (loop [i 0] (if (<= i rootndx)
                                  (do (if (isprm i) (cullp i)) (recur (inc i))))))]
    (letfn [(nxtprm [i] (if (<= i ndx)
                          (cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]
                                                              (if (or (> i ndx) (isprm i)) i
                                                                (recur (inc i)))))))))]
      (if (< n 2) nil
        (cons 2 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))

The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization, except that the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is partly due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization that ensures each element of the sequence is calculated only once; this algorithm never needs to retrace the sequence values.

If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead relative to the actual composite number culling time. The relative overhead would be higher still if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times (most important in the culling operations), and higher again if multi-processing were used to share the page segment processing across CPU cores.

The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface, the Counted interface to provide immediate count returns (based on a pre-computed total), and the IReduce interface, which can greatly speed up some computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):

(defn primes-tox
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        rootndx (long (/ (- root 3) 2)),
        ndx (max (long (/ (- n 3) 2)) 0),
        lmt (quot ndx 64),
        cmpsts (long-array (inc lmt)),
        cullp (fn [i]
                (let [p (long (+ i i 3))]
                  (loop [i (bit-shift-right (- (* p p) 3) 1)]
                    (if (<= i ndx)
                      (do (let [w (bit-shift-right i 6)]
                            (aset cmpsts w (bit-or (aget cmpsts w)
                                                   (bit-shift-left 1 (bit-and i 63)))))
                          (recur (+ i p))))))),
        cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)
                                                 (bit-shift-left -2 (bit-and ndx 63))))
                        (loop [i 0]
                          (when (<= i rootndx)
                            (when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))
                                                   (bit-shift-left 1 (bit-and i 63))))
                              (cullp i))
                            (recur (inc i))))))
        numprms (fn []
                  (let [w (dec (alength cmpsts))] ;; fast results count bit counter
                    (loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]
                      (if (> i w) cnt
                        (recur (inc i) 
                               (- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]
    (if (< n 2) nil
      (cons 2 (if (< n 3) nil
                (do (cull)
                    (deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???
                      clojure.lang.ISeq
                        (first [_] (if (nil? cmpsa) nil (+ i i 3)))
                        (next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil
                                                          (OPSeq.
                                                            (loop [j (inc i)]
                                                              (let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))
                                                                                       (bit-shift-left 1 (bit-and j 63))))]
                                                                (if p? j (recur (inc j)))))
                                                            cmpsa ncnt tcnt))))
                        (more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)
                                                             (.next this))))
                        (cons [this o] (clojure.core/cons o this))
                        (empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))
                        (equiv [this o] (if (or (not= (type this) (type o))
                                                (not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))
                                                (not= i (.i ^OPSeq o))) false true))
                        clojure.lang.Counted
                          (count [_] (- tcnt cnt))
                        clojure.lang.Seqable
                          (clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))
                        clojure.lang.IReduce
                          (reduce [_ f v] (let [c (- tcnt cnt)]
                                            (if (<= c 0) nil
                                              (loop [ci i, n c, rslt v]
                                                (if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))
                                                                    (bit-shift-left 1 (bit-and ci 63))))
                                                  (let [rrslt (f rslt (+ ci ci 3)),
                                                        rdcd (reduced? rrslt),
                                                        nrslt (if rdcd @rrslt rrslt)]
                                                    (if (or (<= n 1) rdcd) nrslt
                                                      (recur (inc ci) (dec n) nrslt)))
                                                  (recur (inc ci) n rslt))))))
                          (reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)
                                                              (.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))
                        clojure.lang.Sequential
                        Object
                          (toString [this] (if (= cnt tcnt) "()"
                                             (.toString (seq (map identity this)))))) 
                    (->OPSeq 0 cmpsts 0 (numprms))))))))

'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.

Due to the better efficiency of the custom OPSeq sequence type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.

Under Clojure 1.7.0, one can use '(time (reduce (fn [sum v] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total, with about half the time used for sieving the array and half for computing the sum.

To show how sensitive Clojure is to the form in which functions are expressed, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long, even though it uses the same internal routine for most of the calculation, because the function lacks the type coercions.

Before one considers this code suitable for larger ranges, note that it still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (about a four times speed-up), maximal wheel factorization (about another four times faster), and multi-processing (a further gain of about four times on a multi-core desktop CPU such as an Intel i7). Together these would make the sieving/counting code about 50 times faster than this, although there would be only a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.
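The fast counting technique used by the bit-packed versions (pre-marking the unused bits at the end of the last word as composite, then subtracting the population count of every word from the total number of bits) can be sketched as below. This is an illustrative Python example, not the Clojure code: the name `count_primes` is made up, Python integers masked to 64 bits stand in for Java longs, and `bin(w).count("1")` stands in for `Long/bitCount`.

```python
from math import isqrt

MASK64 = (1 << 64) - 1

def count_primes(n):
    """Count primes <= n with an odds-only sieve packed 64 candidates per word."""
    if n < 2:
        return 0
    if n < 3:
        return 1
    ndx = (n - 3) // 2                  # bit i stands for the odd number 2*i + 3
    words = [0] * (ndx // 64 + 1)       # a set bit means "composite"
    # pre-mark the unused bits above ndx in the last word so they are not counted
    words[-1] |= (-2 << (ndx & 63)) & MASK64
    for i in range((isqrt(n) - 3) // 2 + 1):
        if not (words[i >> 6] >> (i & 63)) & 1:
            p = 2 * i + 3
            for j in range((p * p - 3) // 2, ndx + 1, p):
                words[j >> 6] |= 1 << (j & 63)
    clear_bits = 64 * len(words) - sum(bin(w).count("1") for w in words)
    return 1 + clear_bits               # plus one for the prime 2
```

Counting whole words this way avoids visiting the candidates one at a time, which is why counting is so much cheaper than enumerating the primes as a sequence.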

Unbounded Versions

For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.

The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:

A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)

(defn primes-Bird
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
          (union [xs ys] (let [xv (first xs), yv (first ys)]
                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),
          (mrgmltpls [mltplss] (cons (first (first mltplss))
                                     (lazy-seq (union (next (first mltplss))
                                                      (mrgmltpls (next mltplss)))))),
          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
                                    (if (< n (first cmpsts))
                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
                                      (recur (+ n 2) (next cmpsts)))))]
    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                         (minusStrtAt 5 cmpsts)))))
        (cons 2 (lazy-seq oddprms)))))

The above code is quite slow, both because the data structure is a linear merging of prime multiples and because of the slowness of the Clojure sequence operations.

A Clojure version of the tree folding sieve using Lazy Sequences

The following code speeds up the above code by merging the linear sequence of sequences pairwise into a right-leaning tree structure:

(defn primes-treeFolding
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
          (union [xs ys] (let [xv (first xs), yv (first ys)]
                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),
          (pairs [mltplss] (let [tl (next mltplss)]
                             (cons (union (first mltplss) (first tl))
                                   (lazy-seq (pairs (next tl)))))),
          (mrgmltpls [mltplss] (cons (first (first mltplss))
                                     (lazy-seq (union (next (first mltplss))
                                                      (mrgmltpls (pairs (next mltplss))))))),
          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
                                    (if (< n (first cmpsts))
                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
                                      (recur (+ n 2) (next cmpsts)))))]
    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                         (minusStrtAt 5 cmpsts)))))
        (cons 2 (lazy-seq oddprms)))))

The above code is still slower than it should be due to the slowness of Clojure's sequence operations.

A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence

The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:

(deftype CIS [v cont]
  clojure.lang.ISeq
    (first [_] v)
    (next [_] (if (nil? cont) nil (cont)))
    (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
    (cons [this o] (clojure.core/cons o this))
    (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
    (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
                                                (if (or (not= (type cis1) (type cis2))
                                                        (not= (.v cis1) (.v ^CIS cis2))
                                                        (and (nil? (.cont cis1))
                                                              (not (nil? (.cont ^CIS cis2))))
                                                        (and (nil? (.cont ^CIS cis2))
                                                              (not (nil? (.cont cis1))))) false
                                                  (if (nil? (.cont cis1)) true
                                                    (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
    (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
                                            (recur ((.cont cis)) (inc cnt)))))
  clojure.lang.Seqable
    (seq [this] (if (and (nil? v) (nil? cont)) nil this))
  clojure.lang.Sequential
  Object
    (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))

(defn primes-treeFoldingx
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (->CIS c (fn [] (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
          (union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
                                     (if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
                                       (if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
                                         (->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
          (pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
                                  (->CIS (union (.v mltplss) (.v tl))
                                         (fn [] (pairs ((.cont tl))))))),
          (mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
                                           (fn [] (union ((.cont ^CIS (.v mltplss)))
                                                         (mrgmltpls (pairs ((.cont mltplss)))))))),
          (minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
                                         (if (< n (.v cmpsts))
                                           (->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
                                           (recur (+ n 2) ((.cont cmpsts))))))]
    (do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                       (minusStrtAt 5 cmpsts)))))
        (->CIS 2 (fn [] oddprms)))))

'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.

The above code is useful for ranges up to about fifteen million, which covers about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit-packed version, which can reasonably be used for ranges about 100 times as large.
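
As a rough illustration of the idea behind the CIS type above, the following Python sketch (not a translation of the Clojure code) models a non-memoizing stream as a head value plus a zero-argument function that produces the rest of the stream. The names CIS, multiples, union and take are chosen here for illustration only; they loosely mirror the "mltpls" and "union" helpers above, and the tree folding itself is omitted.

```python
class CIS:
    """Non-memoizing stream: a head value plus a thunk producing the tail."""
    __slots__ = ("v", "cont")

    def __init__(self, v, cont):
        self.v = v        # current value
        self.cont = cont  # zero-argument function returning the next CIS


def multiples(p):
    """Stream of the odd multiples of p starting at p*p: p*p, p*p + 2p, ..."""
    def nxt(c):
        return CIS(c, lambda: nxt(c + 2 * p))
    return nxt(p * p)


def union(xs, ys):
    """Merge two increasing streams into one, dropping duplicates."""
    if xs.v < ys.v:
        return CIS(xs.v, lambda: union(xs.cont(), ys))
    if ys.v < xs.v:
        return CIS(ys.v, lambda: union(xs, ys.cont()))
    return CIS(xs.v, lambda: union(xs.cont(), ys.cont()))


def take(n, stream):
    """Collect the first n values of an infinite stream into a list."""
    out = []
    for _ in range(n):
        out.append(stream.v)
        stream = stream.cont()
    return out


print(take(6, union(multiples(3), multiples(5))))
```

Because nothing is cached, walking such a stream twice recomputes it; that is the trade-off accepted in exchange for lower per-element overhead.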

Incremental Hash Map based unbounded "odds-only" version

The following code is a version of the O'Neill Haskell code, but it uses no wheel factorization beyond sieving odds only (although that could easily be added). It uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for the combined remove-and-insert-anew operations, which are the majority of those used by this algorithm), with a lazy sequence for output of the resulting primes. The code also uses a secondary base primes sequence generator and adds prime culling sequences to the composites map only when they become necessary, which saves time and limits storage to the map entries for primes up to the square root of the currently sieved number:

(defn primes-hashmap
  "Infinite sequence of primes using an incremental Sieve of Eratosthenes with a Hashmap"
  []
  (letfn [(nxtoddprm [c q bsprms cmpsts]
            (if (>= c q) ;; only ever equal
              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
                (recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))
              (if (contains? cmpsts c)
                (recur (+ c 2) q bsprms
                       (let [adv (cmpsts c), ncmps (dissoc cmpsts c)]
                         (assoc ncmps
                                (loop [try (+ c adv)] ;; ensure map entry is unique
                                  (if (contains? ncmps try)
                                    (recur (+ try adv)) try)) adv)))
                (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))

The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation, even though it has a computational complexity of O(n log log n) rather than the worse O(n log n log log n) of the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.
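
The postponed hash map technique described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a translation of the Clojure code: the function name primes_hashmap is reused only for readability, a Python dict stands in for the Clojure hash map, and a recursive generator stands in for the secondary base primes sequence.

```python
from itertools import count, islice


def primes_hashmap():
    """Unbounded odds-only incremental sieve using a dict of upcoming composites."""
    yield 2
    yield 3
    base = primes_hashmap()  # secondary base primes stream (assumption: recursive generator)
    next(base)               # skip 2, which never culls in an odds-only sieve
    p = next(base)           # first base prime, 3
    q = p * p                # its square, 9
    composites = {}          # maps an upcoming odd composite to its step (2 * prime)
    for c in count(5, 2):
        if c == q:
            # c is the square of the current base prime: only now start culling by it
            step = 2 * p
            nxt = c + step
            while nxt in composites:  # keep map keys unique
                nxt += step
            composites[nxt] = step
            p = next(base)
            q = p * p
        elif c in composites:
            # c is a known composite: advance its entry to the next free multiple
            step = composites.pop(c)
            nxt = c + step
            while nxt in composites:
                nxt += step
            composites[nxt] = step
        else:
            yield c


print(list(islice(primes_hashmap(), 10)))
```

The dict never holds more than one entry per base prime, so its size stays proportional to the number of primes below the square root of the current candidate.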

There is almost no benefit in converting the above code to use a CIS, as most of the time is expended in the hash map functions.

Incremental Priority Queue based unbounded "odds-only" version

In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions: (empty-pq) for initialization, (getMin-pq pq) to examine the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replace the minimum entry with the given key/value pair (this is more efficient than deleting and then re-inserting entries in the queue). No "delete" or other queue functions are supplied, as the algorithm does not require them:

(deftype PQEntry [k, v]
  Object
    (toString [_] (str "<" k "," v ">")))
(deftype PQNode [ntry, lft, rght]
  Object
    (toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))

(defn empty-pq [] nil)

(defn getMin-pq [^PQNode pq]
  (if (nil? pq)
    nil
    (.ntry pq)))

(defn insert-pq [^PQNode opq ok v]
  (loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
    (if (nil? pq)
      (cont (->PQNode kv nil nil))
      (let [k (.k kv),
            ^PQEntry kvn (.ntry pq), kn (.k kvn),
            l (.lft pq), r (.rght pq)]
        (if (<= k kn)
          (recur kvn r #(cont (->PQNode kv % l)))
          (recur kv r #(cont (->PQNode kvn % l))))))))

(defn replaceMinAs-pq [^PQNode opq k v]
  (let [^PQEntry kv (->PQEntry k v)]
    (if (nil? opq) ;; if was empty or just an entry, just use current entry
      (->PQNode kv nil nil)
      (loop [pq opq, cont identity]
        (let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
          (cond ;; if left is empty, right must be too
            (nil? l)
              (cont (->PQNode kv nil nil)),
            (nil? r) ;; we only have a left...
              (let [^PQEntry kvl (.ntry l), kl (.k kvl)]
                    (if (<= k kl)
                      (cont (->PQNode kv l nil))
                      (recur l #(cont (->PQNode kvl % nil))))),
            :else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
                        ^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
                    (if (and (<= k kl) (<= k kr))
                      (cont (->PQNode kv l r))
                      (if (<= kl kr)
                        (recur l #(cont (->PQNode kvl % r)))
                        (recur r #(cont (->PQNode kvr l %))))))))))))

Note that the above code is written partially in continuation passing style so as to leave the "recur" calls in tail call position, as required for efficient looping in Clojure. For practical sieving ranges, the algorithm could likely use plain function recursion, as the recursion depth is unlikely to exceed about ten or so, but plain recursion is said to be less efficient.

The actual incremental sieve using the Priority Queue is as follows. It uses the same optimizations as the Hash Map version: it postpones adding each prime's composite stream to the queue until the currently sieved number reaches the square of that prime, and it uses a secondary base primes stream to generate the composite stream markers in the queue:

(defn primes-pq
  "Infinite sequence of primes using an incremental Sieve of Eratosthenes with a Priority Queue"
  []
  (letfn [(nxtoddprm [c q bsprms cmpsts]
            (if (>= c q) ;; only ever equal
              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
                (recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))
              (let [mn (getMin-pq cmpsts)]
                (if (and mn (>= c (.k mn))) ;; never greater than
                  (recur (+ c 2) q bsprms
                         (loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value
                           (let [ncmps (replaceMinAs-pq cmps (+ c adv) adv),
                                 nmn (getMin-pq ncmps)]
                             (if (and nmn (>= c (.k nmn)))
                               (recur (.v nmn) ncmps)
                               ncmps))))
                  (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))

The above code is faster than the Hash Map version up to a sieving range of about fifteen million or so, but gets progressively slower for larger ranges because it has O(n log n log log n) computational complexity rather than the O(n log log n) of the Hash Map version; the Hash Map version's higher constant factor overhead is eventually overtaken by the extra "log n" factor.

It is slower than the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).
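
For comparison, the same postponed Priority Queue technique can be sketched in Python. Note the assumptions: this sketch uses the standard library heapq module, which is a mutable array-based binary heap rather than the purely functional MinHeap above, and heapq.heapreplace plays the role of replaceMinAs-pq. The function name primes_pq is reused only for readability.

```python
import heapq
from itertools import count, islice


def primes_pq():
    """Unbounded odds-only incremental sieve using a min-heap of upcoming composites."""
    yield 2
    yield 3
    base = primes_pq()  # secondary base primes stream (assumption: recursive generator)
    next(base)          # skip 2
    p = next(base)      # first base prime, 3
    q = p * p           # its square, 9
    heap = []           # entries are (next composite, step) with step = 2 * prime
    for c in count(5, 2):
        if c == q:
            # c is the square of the current base prime: start its composite stream
            heapq.heappush(heap, (c + 2 * p, 2 * p))
            p = next(base)
            q = p * p
        elif heap and heap[0][0] == c:
            # advance every entry whose next composite equals c
            while heap[0][0] == c:
                nxt, step = heap[0]
                heapq.heapreplace(heap, (nxt + step, step))
        else:
            yield c


print(list(islice(primes_pq(), 10)))
```

Unlike the hash map sketch, duplicate keys are allowed in the heap, so several entries may need advancing for a single composite.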

Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.

All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the "odds-only" versions.

It is difficult if not impossible to apply efficient multi-processing to the Bird and Tree Folding versions of the unbounded sieves above, as the next values of the primes sequence depend on previous changes of state. With the addition of functions to "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to the state at a given page start, it is possible for the latter two algorithms. Even so, the benefit is less than with mutable arrays, because the results must be enumerated into a data structure of some sort in order to be passed out of the page function, whereas they can be enumerated directly from the array for the mutable array versions.

Bit packed page segmented array unbounded "odds-only" version

To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:

(set! *unchecked-math* true)

(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGBTS (bit-shift-left PGSZ 3))
(def PGWRDS (bit-shift-right PGBTS 5))
(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes
(def BPBTS (bit-shift-left BPWRDS 5))
(defn- count-pg
  "count primes in the culled page buffer, with test for limit"
  [lmt ^ints pg]
  (let [pgsz (alength pg),
        pgbts (bit-shift-left pgsz 5),
        cntem (fn [lmtw]
                (let [lmtw (long lmtw)]
	          (loop [i (long 0), c (long 0)]
	            (if (>= i lmtw) (- (bit-shift-left lmtw 5) c)
	              (recur (inc i)
	              (+ c (java.lang.Integer/bitCount (aget pg i))))))))]
    (if (< lmt pgbts)
      (let [lmtw (bit-shift-right lmt 5),
            lmtb (bit-and lmt 31)
            msk (bit-shift-left -2 lmtb)]
        (+ (cntem lmtw)
           (- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)
                                                      msk)))))
      (- pgbts
         (areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))))
;;      (cntem pgsz))))
(defn- primes-pages
  "unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."
  []
  (letfn [(make-pg [lowi pgsz bpgs]
            (let [lowi (long lowi),
                  pgbts (long (bit-shift-left pgsz 5)),
                  pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),
                  ^ints pg (int-array pgsz),
                  cull (fn [bpgs']
                         (loop [i (long 0), bpgs' bpgs']
	                         (let [^ints fbpg (first bpgs'),
	                               bpgsz (long (alength fbpg))]
	                           (if (>= i bpgsz)
	                             (recur 0 (next bpgs'))
	                             (let [p (long (aget fbpg i)),
	                                   sqr (long (* p p))]
	                               (if (< sqr pgrng) (do
                   (loop [j (long (let [s (long (bit-shift-right (- sqr 3) 1))]
                                     (if (>= s lowi) (- s lowi)
                                       (let [m (long (rem (- lowi s) p))]
                                         (if (zero? m)
                                           0
                                           (- p m))))))]
                     (if (< j pgbts) ;; fast inner culling loop where most time is spent
                       (do
                         (let [w (bit-shift-right j 5)]
                           (aset pg w (int (bit-or (aget pg w)
                                                   (bit-shift-left 1 (bit-and j 31))))))
                         (recur (+ j p)))))
                     (recur (inc i) bpgs'))))))))]
              (do (if (nil? bpgs)
                    (letfn [(mkbpps [i]
                              (if (zero? (bit-and (aget pg (bit-shift-right i 5))
                                                  (bit-shift-left 1 (bit-and i 31))))
                                (cons (int-array 1 (+ i i 3)) (lazy-seq (mkbpps (inc i))))
                                (recur (inc i))))]
                      (cull (mkbpps 0)))
                    (cull bpgs))
                  pg))),
          (page-seq [lowi pgsz bps]
            (letfn [(next-seq [lwi]
                      (cons (make-pg lwi pgsz bps)
                            (lazy-seq (next-seq (+ lwi (bit-shift-left pgsz 5))))))]
              (next-seq lowi)))
          (pgs->bppgs [ppgs]
            (letfn [(nxt-pg [lowi pgs]
                      (let [^ints pg (first pgs),
                            cnt (count-pg BPBTS pg),
                            npg (int-array cnt)]
                        (do (loop [i 0, j 0]
                              (if (< i BPBTS)
                                (if (zero? (bit-and (aget pg (bit-shift-right i 5))
                                                    (bit-shift-left 1 (bit-and i 31))))
                                  (do (aset npg j (+ (bit-shift-left (+ lowi i) 1) 3))
                                      (recur (inc i) (inc j)))
                                  (recur (inc i) j))))
                            (cons npg (lazy-seq (nxt-pg (+ lowi BPBTS) (next pgs)))))))]
              (nxt-pg 0 ppgs))),
          (make-base-prms-pgs []
            (pgs->bppgs (cons (make-pg 0 BPWRDS nil)
                              (lazy-seq (page-seq BPBTS BPWRDS (make-base-prms-pgs))))))]
    (page-seq 0 PGWRDS (make-base-prms-pgs))))
(defn primes-paged
  "unbounded Sieve of Eratosthenes producing a lazy sequence of primes"
  []
  (do (deftype CIS [v cont]
        clojure.lang.ISeq
          (first [_] v)
          (next [_] (if (nil? cont) nil (cont)))
          (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
          (cons [this o] (clojure.core/cons o this))
          (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
          (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
                                                      (if (or (not= (type cis1) (type cis2))
                                                              (not= (.v cis1) (.v ^CIS cis2))
                                                              (and (nil? (.cont cis1))
                                                                   (not (nil? (.cont ^CIS cis2))))
                                                              (and (nil? (.cont ^CIS cis2))
                                                                   (not (nil? (.cont cis1))))) false
                                                        (if (nil? (.cont cis1)) true
                                                          (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
          (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
                                                  (recur ((.cont cis)) (inc cnt)))))
        clojure.lang.Seqable
          (seq [this] (if (and (nil? v) (nil? cont)) nil this))
        clojure.lang.Sequential
        Object
          (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))
		  (letfn [(next-prm [lowi i pgseq]
		            (let [lowi (long lowi),
                      i (long i),
                      ^ints pg (first pgseq),
		                  pgsz (long (alength pg)),
		                  pgbts (long (bit-shift-left pgsz 5)),
		                  ni (long (loop [j (long i)]
		                             (if (or (>= j pgbts)
		                                     (zero? (bit-and (aget pg (bit-shift-right j 5))
		                                               (bit-shift-left 1 (bit-and j 31)))))
		                               j
		                               (recur (inc j)))))]
		              (if (>= ni pgbts)
		                (recur (+ lowi pgbts) 0 (next pgseq))
		                (->CIS (+ (bit-shift-left (+ lowi ni) 1) 3)
		                       (fn [] (next-prm lowi (inc ni) pgseq))))))]
		    (->CIS 2 (fn [] (next-prm 0 0 (primes-pages)))))))
(defn primes-paged-count-to
  "counts primes generated by page segments by Sieve of Eratosthenes to the top limit"
  [top]
  (cond (< top 2) 0
        (< top 3) 1
        :else (letfn [(nxt-pg [lowi pgseq cnt]
                        (let [topi (bit-shift-right (- top 3) 1)
                              nxti (+ lowi PGBTS),
                              pg (first pgseq)]
                          (if (> nxti topi)
                            (+ cnt (count-pg (- topi lowi) pg))
                            (recur nxti
                                   (next pgseq)
                                   (+ cnt (count-pg PGBTS pg))))))]
                (nxt-pg 0 (primes-pages) 1))))

The above code runs just as fast as other virtual machine languages when run on a 64-bit JVM; however, when run on a 32-bit JVM it runs almost five times slower. This is likely because Clojure uses only 64-bit integers for integer operations, and under a 32-bit JVM these operations get JIT compiled to library functions that simulate them with combined 32-bit operations, whereas direct CPU operations can be used on a 64-bit JVM.

Clojure does one thing very slowly here: enumerating a sequence is extremely slow compared to using a more imperative iteration interface. It helps to use a roll-your-own ISeq implementation as above, which reduces the time to enumerate the primes from about four times as long as the composite culling operations for those primes to only about one and a half times as long, although one must then also write one's own sequence handling functions (one cannot use "take-while" or "count", for instance) in order to enjoy that benefit. That is why the "primes-paged-count-to" function is provided: counting the primes over a range then takes a negligible percentage of the time taken by the composite culling operations.
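
The idea of counting primes directly from the culled page by population count, instead of enumerating them, can be sketched in Python as follows. This is only an illustration of the principle: the helper name count_unmarked is hypothetical, the page is modelled as a list of 32-bit words in which a set bit marks a composite, and the limit is given as a number of bits to examine (the Clojure count-pg function above treats its limit as a bit index instead).

```python
def count_unmarked(words, nbits):
    """Count the zero (unmarked) bits among the first nbits bits of a list of 32-bit words."""
    full, rem = divmod(nbits, 32)
    # whole words: 32 minus the number of set (composite) bits in each
    total = sum(32 - bin(w & 0xFFFFFFFF).count("1") for w in words[:full])
    if rem:
        # partial last word: only look at its lowest rem bits
        mask = (1 << rem) - 1
        total += rem - bin(words[full] & mask).count("1")
    return total


# two words: the first has bits 1 and 3 set, the second is fully marked
page = [0b1010, 0xFFFFFFFF]
print(count_unmarked(page, 4), count_unmarked(page, 40))
```

Counting a whole word at a time touches each word once, which is why it costs so little compared with testing and emitting each bit individually.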

The practical range of the above sieve is about 16 million due to the fixed size of the page buffers; in order to extend the range, a larger page buffer could be used, up to the size of the CPU L2 or L3 caches. If a 2^20 buffer were used (one Megabyte, as many modern desktop CPUs easily have in their L3 cache), then the range would be increased up to about 10^14 at a cost of about a factor of two or three in slower memory accesses per composite culling operation loop. The base primes culling page size is already adequate for this range. One could make the culling page size automatically expand with growing range by about the square root of the current prime range with not too many changes to the code.

As for many implementations of unbounded sieves, the base primes less than the square root of the current range are generated by a secondary generated stream of primes; in this case it is done recursively, so another secondary stream generates the base primes for the base primes and so on down to where the innermost generator has only one page in the stream; this only takes one or two recursions for this type of range.

The base primes culling page size is reduced from the page size for the main primes so that there is less overhead for smaller primes ranges; otherwise excess base primes are generated for fairly small sieve ranges.
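
To make the page-segmentation idea concrete, here is a much simplified Python sketch that counts the primes up to a limit one page at a time. It is a sketch under stated assumptions, not a translation: it is bounded rather than unbounded, it computes the base primes once up front instead of using a recursive base primes stream, and it uses one byte per odd candidate instead of bit packing. The names small_odd_primes, count_primes_segmented and page_size are hypothetical.

```python
from math import isqrt


def small_odd_primes(n):
    """Odd primes <= n by a plain (non-segmented) sieve; used only for the base primes."""
    flags = bytearray([1]) * (n + 1)
    primes = []
    for i in range(3, n + 1, 2):
        if flags[i]:
            primes.append(i)
            for m in range(i * i, n + 1, 2 * i):
                flags[m] = 0
    return primes


def count_primes_segmented(limit, page_size=1 << 17):
    """Count primes <= limit; index i stands for the odd number 2*i + 3."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) >> 1               # index of the largest odd candidate
    base = small_odd_primes(isqrt(limit))
    total = 1                            # account for the prime 2
    for low in range(0, top + 1, page_size):
        size = min(page_size, top + 1 - low)
        page = bytearray(size)           # 0 = still a candidate, 1 = composite
        for p in base:
            s = (p * p - 3) >> 1         # index of p*p
            if s >= low + size:
                break                    # larger base primes cannot cull in this page
            if s >= low:
                j = s - low
            else:
                r = (low - s) % p        # align to the first multiple inside the page
                j = 0 if r == 0 else p - r
            page[j::p] = b"\x01" * len(range(j, size, p))
        total += size - page.count(1)
    return total


print(count_primes_segmented(1_000_000))
```

Only one page buffer is live at a time, so memory use is set by page_size and the base primes list rather than by the sieving range.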

CLU

% Sieve of Eratosthenes
eratosthenes = proc (n: int) returns (array[bool])
    prime: array[bool] := array[bool]$fill(1, n, true)
    prime[1] := false

    for p: int in int$from_to(2, n/2) do
        if prime[p] then
            for c: int in int$from_to_by(p*p, n, p) do
                prime[c] := false
            end
        end
    end
    return(prime)
end eratosthenes

% Print primes up to 1000 using the sieve
start_up = proc ()
    po: stream := stream$primary_output()
    prime: array[bool] := eratosthenes(1000)
    col: int := 0

    for i: int in array[bool]$indexes(prime) do
        if prime[i] then
            col := col + 1
            stream$putright(po, int$unparse(i), 5)
            if col = 10 then
                col := 0
                stream$putc(po, '\n')
            end
        end
    end
end start_up
Output:
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

CMake

function(eratosthenes var limit)
  # Check for integer overflow. With CMake using 32-bit signed integer,
  # this check fails when limit > 46340.
  if(NOT limit EQUAL 0)         # Avoid division by zero.
    math(EXPR i "(${limit} * ${limit}) / ${limit}")
    if(NOT limit EQUAL ${i})
      message(FATAL_ERROR "limit is too large, would cause integer overflow")
    endif()
  endif()

  # Use local variables prime_2, prime_3, ..., prime_${limit} as array.
  # Initialize array to y => yes it is prime.
  foreach(i RANGE 2 ${limit})
    set(prime_${i} y)
  endforeach(i)

  # Gather a list of prime numbers.
  set(list)
  foreach(i RANGE 2 ${limit})
    if(prime_${i})
      # Append this prime to list.
      list(APPEND list ${i})

      # For each multiple of i, set n => no it is not prime.
      # Optimization: start at i squared.
      math(EXPR square "${i} * ${i}")
      if(NOT square GREATER ${limit})   # Avoid fatal error.
        foreach(m RANGE ${square} ${limit} ${i})
          set(prime_${m} n)
        endforeach(m)
      endif()
    endif(prime_${i})
  endforeach(i)
  set(${var} ${list} PARENT_SCOPE)
endfunction(eratosthenes)
# Print all prime numbers through 100.
eratosthenes(primes 100)
message(STATUS "${primes}")

COBOL

*> Please ignore the asterisks in the first column of the next comments,
*> which are kludges to get syntax highlighting to work.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. Sieve-Of-Eratosthenes.

       DATA DIVISION.
       WORKING-STORAGE SECTION.

       01  Max-Number       USAGE UNSIGNED-INT.
       01  Max-Prime        USAGE UNSIGNED-INT.

       01  Num-Group.
           03  Num-Table PIC X VALUE "P"
                   OCCURS 1 TO 10000000 TIMES DEPENDING ON Max-Number
                   INDEXED BY Num-Index.
               88  Is-Prime VALUE "P" FALSE "N".
               
       01  Current-Prime    USAGE UNSIGNED-INT.

       01  I                USAGE UNSIGNED-INT.

       PROCEDURE DIVISION.
           DISPLAY "Enter the limit: " WITH NO ADVANCING
           ACCEPT Max-Number
           DIVIDE Max-Number BY 2 GIVING Max-Prime

*          *> Set Is-Prime of all non-prime numbers to false.
           SET Is-Prime (1) TO FALSE
           PERFORM UNTIL Max-Prime < Current-Prime
*              *> Set current-prime to next prime.
               ADD 1 TO Current-Prime
               PERFORM VARYING Num-Index FROM Current-Prime BY 1
                   UNTIL Is-Prime (Num-Index)
               END-PERFORM
               MOVE Num-Index TO Current-Prime

*              *> Set Is-Prime of all multiples of current-prime to
*              *> false, starting from current-prime squared.
               COMPUTE Num-Index = Current-Prime ** 2
               PERFORM UNTIL Max-Number < Num-Index
                   SET Is-Prime (Num-Index) TO FALSE
                   SET Num-Index UP BY Current-Prime
               END-PERFORM
           END-PERFORM

*          *> Display the prime numbers.
           PERFORM VARYING Num-Index FROM 1 BY 1
                   UNTIL Max-Number < Num-Index
               IF Is-Prime (Num-Index)
                   DISPLAY Num-Index
               END-IF
           END-PERFORM

           GOBACK
           .

Comal

Translation of: BASIC
// Sieve of Eratosthenes
input "Limit? ": limit
dim sieve(1:limit)
sqrlimit:=sqr(limit)
sieve(1):=1
p:=2
while p<=sqrlimit do
 while sieve(p) and p<sqrlimit do
  p:=p+1
 endwhile
 if p>sqrlimit then goto done
 for i:=p*p to limit step p do
  sieve(i):=1
 endfor i
 p:=p+1
endwhile
done:
print 2,
for i:=3 to limit do
 if sieve(i)=0 then
  print ", ",i,
 endif
endfor i
print
Output:
Limit? 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97

end

Common Lisp

(defun sieve-of-eratosthenes (maximum)
  (loop
     with sieve = (make-array (1+ maximum)
                              :element-type 'bit
                              :initial-element 0)
     for candidate from 2 to maximum
     when (zerop (bit sieve candidate))
     collect candidate
     and do (loop for composite from (expt candidate 2) 
               to maximum by candidate
               do (setf (bit sieve composite) 1))))

Working with odds only (more than a twofold speedup), and marking composites only for primes up to the square root of the maximum:

(defun sieve-odds (maximum)
  "Prime numbers sieve for odd numbers. 
   Returns a list with all the primes that are less than or equal to maximum."
  (loop :with maxi = (ash (1- maximum) -1)
        :with stop = (ash (isqrt maximum) -1)
        :with sieve = (make-array (1+ maxi) :element-type 'bit :initial-element 0)
        :for i :from 1 :to maxi
        :for odd-number = (1+ (ash i 1))
        :when (zerop (sbit sieve i))
          :collect odd-number :into values
        :when (<= i stop)
          :do (loop :for j :from (* i (1+ i) 2) :to maxi :by odd-number
                    :do (setf (sbit sieve j) 1))
        :finally (return (cons 2 values))))

The indexation scheme used here interprets each index i as standing for the value 2i+1. Bit 0 is unused, a small price to pay for the simpler index calculations compared with the 2i+3 indexation scheme. The multiples of a given odd prime p are enumerated in increments of 2p, which corresponds to the index increment of p on the sieve array. The starting point p*p = (2i+1)(2i+1) = 4i(i+1)+1 corresponds to the index 2i(i+1).
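
The same 2i+1 indexing can be sketched in Python to make the index arithmetic easy to check. This is an illustrative re-expression of the scheme just described, not part of the Common Lisp solution; the function name sieve_odds is reused only for readability.

```python
from math import isqrt


def sieve_odds(maximum):
    """Primes <= maximum using an odds-only sieve where index i stands for 2*i + 1."""
    if maximum < 2:
        return []
    maxi = (maximum - 1) >> 1   # largest index needed
    stop = isqrt(maximum) >> 1  # largest index whose multiples still need marking
    sieve = bytearray(maxi + 1) # 0 = candidate prime, 1 = composite; index 0 is unused
    primes = [2]
    for i in range(1, maxi + 1):
        if not sieve[i]:
            p = 2 * i + 1
            primes.append(p)
            if i <= stop:
                # p*p = 4i(i+1) + 1 lives at index 2i(i+1); a value step of 2p is an index step of p
                for j in range(2 * i * (i + 1), maxi + 1, p):
                    sieve[j] = 1
    return primes


print(sieve_odds(30))
```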

While formally a wheel, odds are uniformly spaced and do not require any special processing except for value translation. Wheels proper aren't uniformly spaced and are thus trickier.

Cowgol

include "cowgol.coh";

# To change the maximum prime, change the size of this array
# Everything else is automatically filled in at compile time
var sieve: uint8[5000];

# Make sure all elements of the sieve are set to zero
MemZero(&sieve as [uint8], @bytesof sieve);

# Generate the sieve
var prime: @indexof sieve := 2;
while prime < @sizeof sieve loop
    if sieve[prime] == 0 then
        var comp: @indexof sieve := prime * prime;
        while comp < @sizeof sieve loop
            sieve[comp] := 1;
            comp := comp + prime;
        end loop;
    end if;
    prime := prime + 1;
end loop;

# Print all primes
var cand: @indexof sieve := 2;
while cand < @sizeof sieve loop
    if sieve[cand] == 0 then
        print_i16(cand as uint16);
        print_nl();
    end if;
    cand := cand + 1;
end loop;
Output:
2
3
5
7
11
...
4967
4969
4973
4987
4999

Craft Basic

define limit = 120

dim flags[limit]

for n = 2 to limit

	let flags[n] = 1

next n

print "prime numbers less than or equal to ", limit ," are:"

for n = 2 to sqrt(limit)

	if flags[n] = 1 then

		for i = n * n to limit step n

			let flags[i] = 0

		next i

	endif

next n

for n = 1 to limit

	if flags[n] then

		print n

	endif

next n
Output:

prime numbers less than or equal to 120 are:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113

Crystal

Basic Version

This implementation uses a `BitArray` so it is automatically bit-packed to use just one bit per number representation:

# compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE
  include Iterator(Prime)
  @bits : BitArray; @bitndx : Int32 = 2

  def initialize(range : Prime)
    if range < 2
      @bits = BitArray.new 0
    else
      @bits = BitArray.new((range + 1).to_i32)
    end
    ba = @bits; ndx = 2
    while true
      wi = ndx * ndx
      break if wi >= ba.size
      if ba[ndx]
        ndx += 1; next
      end
      while wi < ba.size
        ba[wi] = true; wi += ndx
      end
      ndx += 1
    end
  end

  def next
    while @bitndx < @bits.size
      if @bits[@bitndx]
        @bitndx += 1; next
      end
      rslt = @bitndx.to_u64; @bitndx += 1; return rslt
    end
    stop
  end
end

print "Primes up to a hundred:  "
SoE.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million:  "
puts SoE.new(1_000_000).each.size
print "Number of primes to a billion:  "
start_time = Time.monotonic
print SoE.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."
Output:
Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 10219.222539 milliseconds.

This is as run on an Intel SkyLake i5-6500 at 3.6 GHz (automatic boost for single threaded as here).

Odds-Only Version

The non-odds-only version above should never be used: because it does not sieve odds only, it uses twice the memory and over two and a half times the CPU operations of the following odds-only code, which is only slightly more complex:

# compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE_Odds
  include Iterator(Prime)
  @bits : BitArray; @bitndx : Int32 = -1

  def initialize(range : Prime)
    if range < 3
      @bits = BitArray.new 0
    else
      @bits = BitArray.new(((range - 1) >> 1).to_i32)
    end
    ba = @bits; ndx = 0
    while true
      wi = (ndx + ndx) * (ndx + 3) + 3 # start cull index calculation
      break if wi >= ba.size
      if ba[ndx]
        ndx += 1; next
      end
      bp = ndx + ndx + 3
      while wi < ba.size
        ba[wi] = true; wi += bp
      end
      ndx += 1
    end
  end

  def next
    while @bitndx < @bits.size
      if @bitndx < 0
        @bitndx += 1; return 2_u64
      elsif @bits[@bitndx]
        @bitndx += 1; next
      end
      rslt = (@bitndx + @bitndx + 3).to_u64; @bitndx += 1; return rslt
    end
    stop
  end
end

print "Primes up to a hundred:  "
SoE_Odds.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million:  "
puts SoE_Odds.new(1_000_000).each.size
print "Number of primes to a billion:  "
start_time = Time.monotonic
print SoE_Odds.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."
Output:
Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 4877.829642 milliseconds.

As can be seen, this is over two times faster than the non-odds-only version when run on the same CPU, due to reduced pressure on the CPU data cache. However, it is only reasonably performant for ranges of a few million; above that, a page-segmented odds-only version (or one with further wheel factorization) should be used, along with other techniques that further reduce the number of CPU clock cycles per culling/marking operation.
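The index arithmetic used by the odds-only code can be hard to see at a glance, so here is a minimal sketch of the same mapping in Python (used only for illustration; it is not part of the Crystal entry). Bit index `i` stands for the odd number `2*i + 3`, and the culling start index `(i + i) * (i + 3) + 3` is the index of the square of that number. The function name `odds_only_sieve` is hypothetical, and one byte per flag is used instead of one bit to keep the sketch short.

```python
def odds_only_sieve(limit):
    """Return the primes <= limit, keeping flags for odd numbers only."""
    if limit < 3:
        return [] if limit < 2 else [2]
    size = (limit - 1) >> 1            # index i represents the odd number 2*i + 3
    composite = bytearray(size)        # one byte per flag, for clarity only
    i = 0
    while True:
        start = (i + i) * (i + 3) + 3  # index of p*p, where p = 2*i + 3
        if start >= size:
            break
        if not composite[i]:
            p = i + i + 3
            # stepping the index by p steps the represented value by 2*p
            for j in range(start, size, p):
                composite[j] = 1
        i += 1
    return [2] + [i + i + 3 for i in range(size) if not composite[i]]
```

Only half as many flags are stored, and the inner loop never touches an even number, which is where the savings described above come from.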

Page-Segmented Odds-Only Version

To sieve ranges larger than a few million efficiently, a page-segmented sieve should always be used, preserving CPU cache associativity by making the page size about that of the CPU L1 data cache. The following code implements a page-segmented version that is an extensible sieve (no upper limit need be specified), using a secondary memoized feed of base prime value arrays that use a smaller page-segment size for efficiency. When only the count of primes is desired, the sieve is polymorphic in output and counts the bits left unmarked (the non-composites) using fast `popcount` instructions, 64 bits at a time. The code is as follows:

# compile with `--release --no-debug` for speed...

alias Prime = UInt64
alias PrimeNdx = Int64
alias PrimeArr = Array(Prime)
alias SieveBuffer = Pointer(UInt8)
alias BasePrime = UInt32
alias BasePrimeArr = Array(BasePrime)

CPUL1CACHE = 131072 # 16 Kilobytes in number of bits

BITMASK = Pointer(UInt8).malloc(8) { |i| 1_u8 << i }

# Count number of non-composite (zero) bits within index range...
# sieve buffer is always evenly divisible by 64-bit words...
private def count_page_to(ndx : Int32, sb : SieveBuffer)
  lstwrdndx = ndx >> 6; mask = (~1_u64) << (ndx & 63)
  cnt = lstwrdndx * 64 + 64; sbw = sb.as(Pointer(UInt64))
  lstwrdndx.times { |i| cnt -= sbw[i].popcount }
  cnt - (sbw[lstwrdndx] | mask).popcount
end

# Cull composite bits from sieve buffer using base prime arrays;
# starting at overall given prime index for given buffer bit size...
private def cull_page(pndx : PrimeNdx, bitsz : Int32,
              bps : Iterator(BasePrimeArr), sb : SieveBuffer)
  bps.each { |bpa|
    bpa.each { |bpu32|
      bp = bpu32.to_i64; bpndx = (bp - 3) >> 1
      swi = (bpndx + bpndx) * (bpndx + 3) + 3 # calculate start prime index
      return if swi >= pndx + bitsz.to_i64
      bpi = bp.to_i32 # calculate buffer start culling index...
      bi = (swi >= pndx) ? (swi - pndx).to_i32 : begin
        r = (pndx - swi) % bp; r == 0 ? 0 : bpi - r.to_i32
      end
      # when base prime is small enough, cull using strided loops to
      # simplify the inner loops at the cost of more loop overhead...
      # almost all of the work is done by the following loop...
      if bpi < (bitsz >> 4)
        bilmt = bi + (bpi << 3); cplmt = sb + (bitsz >> 3)
        bilmt = CPUL1CACHE if bilmt > CPUL1CACHE
        while bi < bilmt
          cp = sb + (bi >> 3); msk = BITMASK[bi & 7]
          while cp < cplmt # use pointer to save loop overhead
            cp[0] |= msk; cp += bpi
          end
          bi += bpi
        end
      else
        while bi < bitsz # bitsz
          sb[bi >> 3] |= BITMASK[bi & 7]; bi += bpi
        end
      end } }
end

# Iterator over processed prime pages, polymorphic by the converter function...
private class PagedResults(T)
  @bpas : BasePrimeArrays
  @cmpsts : SieveBuffer

  def initialize(@prmndx : PrimeNdx,
                 @cmpstsbitsz : Int32,
                 @cnvrtrfnc : (Int64, Int32, SieveBuffer) -> T)
    @bpas = BasePrimeArrays.new
    @cmpsts = SieveBuffer.malloc(((@cmpstsbitsz + 63) >> 3) & (-8))
  end

  private def dopage
    (@prmndx..).step(@cmpstsbitsz.to_i64).map { |pn|
        @cmpsts.clear(@cmpstsbitsz >> 3)
        cull_page(pn, @cmpstsbitsz, @bpas.each, @cmpsts)
        @cnvrtrfnc.call(pn, @cmpstsbitsz, @cmpsts) }
  end

  def each
    dopage
  end

  def each(& : T -> _) : Nil
    itr = dopage
    while true
      value = itr.next
      break if value.is_a?(Iterator::Stop)
      yield value
    end
  end
end

# Secondary memoized chain of BasePrime arrays (by small page size),
# which is actually an iterable lazy list (memoized) of BasePrimeArr;
# Crystal has closures, so it is easy to implement a LazyList class
# which memoizes the results of the thunk so it is only executed once...
private class BasePrimeArrays
  @baseprmarr : BasePrimeArr # head of lazy list
  @tail : BasePrimeArrays? = nil # tail starts as non-existing

  def initialize # special case for first page of base primes
    # converter of sieve buffer to base primes array...
    sb2bparrprc = -> (pn : PrimeNdx, bl : Int32, sb : SieveBuffer) {
      cnt = count_page_to(bl - 1, sb)
      bparr = BasePrimeArr.new(cnt, 0); j = 0
      bsprm = (pn + pn + 3).to_u32
      bl.times.each { |i|
        next if (sb[i >> 3] & BITMASK[i & 7]) != 0 
        bparr[j] = bsprm + (i + i).to_u32; j += 1 }
      bparr }

    cmpsts = SieveBuffer.malloc 128 # fake bparr for first iter...
    frstbparr = sb2bparrprc.call(0_i64, 1024, cmpsts)
    cull_page(0_i64, 1024, Iterator.of(frstbparr).each, cmpsts)
    @baseprmarr = sb2bparrprc.call(0_i64, 1024, cmpsts)

    # initialization of pages after the first is deferred to avoid data race...
    initbpas = -> { PagedResults.new(1024_i64, 1024, sb2bparrprc).each }
    # recursive LazyList generator function...
    nxtbpa = uninitialized Proc(Iterator(BasePrimeArr), BasePrimeArrays)
    nxtbpa = -> (bppgs : Iterator(BasePrimeArr)) {
      nbparr = bppgs.next
      abort "Unexpected base primes end!!!" if nbparr.is_a?(Iterator::Stop)
      BasePrimeArrays.new(nbparr, ->{ nxtbpa.call(bppgs) }) }
    @thunk = ->{ nxtbpa.call(initbpas.call) }
  end
  def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(BasePrimeArrays))
  end
  def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(Nil))
  end
  def initialize(@baseprmarr : BasePrimeArr, @thunk : Nil)
  end

  def tail # not thread safe without a lock/mutex...
    if thnk = @thunk
      @tail = thnk.call; @thunk = nil
    end
    @tail
  end

  private class BasePrimeArrIter # iterator over BasePrime arrays...
    include Iterator(BasePrimeArr)
    @dbparrs : Proc(BasePrimeArrays?)

    def initialize(fromll : BasePrimeArrays)
      @dbparrs = ->{ fromll.as(BasePrimeArrays?) }
    end

    def next
      if bpas = @dbparrs.call
        rslt = bpas.@baseprmarr; @dbparrs = -> { bpas.tail }; rslt
      else
        abort "Unexpected end of base primes array iteration!!!"
      end
    end
  end
  
  def each
    BasePrimeArrIter.new(self)
  end
end

# An "infinite" extensible iteration of primes,...
def primes
  sb2prms = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
    cnt = count_page_to(bitsz - 1, sb)
    prmarr = PrimeArr.new(cnt, 0); j = 0
    bsprm = (pn + pn + 3).to_u64
    bitsz.times.each { |i|
      next if (sb[i >> 3] & BITMASK[i & 7]) != 0
      prmarr[j] = bsprm + (i + i).to_u64; j += 1 }
    prmarr
  }
  (2_u64..2_u64).each
    .chain PagedResults.new(0, CPUL1CACHE, sb2prms).each.flat_map { |prmspg| prmspg.each }
end

# Counts number of primes to given limit...
def primes_count_to(lmt : Prime)
  if lmt < 3
    lmt < 2 ? return 0 : return 1
  end
  lmtndx = ((lmt - 3) >> 1).to_i64
  sb2cnt = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
    pglmt = pn + bitsz.to_i64 - 1
    if (pn + CPUL1CACHE.to_i64) > lmtndx
      Tuple.new(count_page_to((lmtndx - pn).to_i32, sb).to_i64, pglmt)
    else
      Tuple.new(count_page_to(bitsz - 1, sb).to_i64, pglmt)
    end
  }
  count = 1
  PagedResults.new(0, CPUL1CACHE, sb2cnt).each { |(cnt, lmt)|
    count += cnt; break if lmt >= lmtndx }
  count
end

print "The primes up to 100 are: "
primes.each.take_while { |p| p <= 100_u64 }.each { |p| print " ", p }
print ".\r\nThe Number of primes up to a million is "
print primes.each.take_while { |p| p <= 1_000_000_u64 }.size
print ".\r\nThe number of primes up to a billion is "
start_time = Time.monotonic
# answr = primes.each.take_while { |p| p <= 1_000_000_000_u64 }.size # slow way
answr = primes_count_to(1_000_000_000) # fast way
elpsd = (Time.monotonic - start_time).total_milliseconds
print "#{answr} in #{elpsd} milliseconds.\r\n"
Output:
The primes up to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97.
The Number of primes up to a million is 78498.
The number of primes up to a billion is 50847534 in 658.466028 milliseconds.

When run on the same machine as the previous version, this code is about seven and a half times as fast as even the Odds-Only version above, at about 2.4 CPU clock cycles per culling operation rather than over 17. About half the gain comes from better cache associativity; the rest comes from tuning the inner culling loop for small base primes to step by byte-pointer strides with a constant mask value, which simplifies the code generated for these inner loops. Because there is some overhead in the eight outer loops that set this up, the technique only pays off for smaller base primes.
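The byte-stride idea can be checked with a small sketch, written here in Python purely for illustration (the function names `cull_bitwise` and `cull_strided` are hypothetical and do not appear in the Crystal code). Marking every `p`-th bit one at a time gives the same buffer as making up to eight passes, each of which steps `p` bytes at a time and ORs in a mask that stays constant for the whole pass.

```python
def cull_bitwise(buf, start, p):
    """Reference version: set every p-th bit from `start`, one bit at a time."""
    for bit in range(start, len(buf) * 8, p):
        buf[bit >> 3] |= 1 << (bit & 7)


def cull_strided(buf, start, p):
    """Same marking done as up to eight byte-strided passes with a fixed mask."""
    nbits = len(buf) * 8
    # The first eight culling positions each start one pass.
    for s in range(start, min(nbits, start + 8 * p), p):
        mask = 1 << (s & 7)  # constant for the whole inner loop
        # Moving p bytes ahead moves 8*p bits ahead, so the bit-in-byte
        # position (and therefore the mask) never changes within a pass.
        for byte_index in range(s >> 3, len(buf), p):
            buf[byte_index] |= mask
```

In a compiled language the inner loop of the strided form is a single load, OR, store, and pointer add, with no shift or mask lookup per mark. Python itself will not show the speed difference; the sketch only demonstrates that the two forms mark the same bits.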

Further gains are possible. Using maximum wheel factorization, rather than just odds-only sieving, can reduce the number of operations by a factor of about four. Extreme loop unrolling for both the dense and sparse culling cases can reduce the number of CPU clock cycles per culling operation by about a further 25 percent on average when sieving to a billion. In addition, multi-threading by pages can reduce the wall-clock time by a factor of the number of effective cores (not counting Hyper-Threaded cores).

D

Simpler Version

Prints all prime numbers less than the limit.

import std.stdio, std.algorithm, std.range, std.functional;

uint[] sieve(in uint limit) nothrow @safe {
    if (limit < 2)
        return [];
    auto composite = new bool[limit];

    foreach (immutable uint n; 2 .. cast(uint)(limit ^^ 0.5) + 1)
        if (!composite[n])
            for (uint k = n * n; k < limit; k += n)
                composite[k] = true;

    //return iota(2, limit).filter!(not!composite).array;
    return iota(2, limit).filter!(i => !composite[i]).array;
}

void main() {
    50.sieve.writeln;
}
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Faster Version

This version uses an array of bits (instead of booleans, each of which occupies one byte) and skips even numbers. The output is the same.

import std.stdio, std.math, std.array;

size_t[] sieve(in size_t m) pure nothrow @safe {
    if (m < 3)
        return null;
    immutable size_t n = m - 1;
    enum size_t bpc = size_t.sizeof * 8;
    auto F = new size_t[((n + 2) / 2) / bpc + 1];
    F[] = size_t.max;

    size_t isSet(in size_t i) nothrow @safe @nogc {
        immutable size_t offset = i / bpc;
        immutable size_t mask = size_t(1) << (i % bpc); // shift as size_t, not int
        return F[offset] & mask;
    }

    void resetBit(in size_t i) nothrow @safe @nogc {
        immutable size_t offset = i / bpc;
        immutable size_t mask = size_t(1) << (i % bpc); // shift as size_t, not int
        if ((F[offset] & mask) != 0)
            F[offset] = F[offset] ^ mask;
    }

    for (size_t i = 3; i <= sqrt(real(n)); i += 2)
        if (isSet((i - 3) / 2))
            for (size_t j = i * i; j <= n; j += 2 * i)
                resetBit((j - 3) / 2);

    Appender!(size_t[]) result;
    result ~= 2;
    for (size_t i = 3; i <= n; i += 2)
        if (isSet((i - 3) / 2))
            result ~= i;
    return result.data;
}

void main() {
    50.sieve.writeln;
}

Extensible Version

(This version is used in the task Extensible prime generator.)

/// Extensible Sieve of Eratosthenes.
struct Prime {
    uint[] a = [2];

    private void grow() pure nothrow @safe {
        immutable p0 = a[$ - 1] + 1;
        auto b = new bool[p0];

        foreach (immutable di; a) {
            immutable uint i0 = p0 / di * di;
            uint i = (i0 < p0) ? i0 + di - p0 : i0 - p0;
            for (; i < b.length; i += di)
                b[i] = true;
        }

        foreach (immutable uint i, immutable bi; b)
            if (!b[i])
                a ~= p0 + i;
    }

    uint opCall(in uint n) pure nothrow @safe {
        while (n >= a.length)
            grow;
        return a[n];
    }
}

version (sieve_of_eratosthenes3_main) {
    void main() {
        import std.stdio, std.range, std.algorithm;

        Prime prime;
        uint.max.iota.map!prime.until!q{a > 50}.writeln;
    }
}

To see the output (which is the same), compile with -version=sieve_of_eratosthenes3_main.

Dart

// helper function to pretty print an Iterable
String iterableToString(Iterable seq) {
  String str = "[";
  Iterator i = seq.iterator;
  if (i.moveNext()) str += i.current.toString();
  while(i.moveNext()) {
    str += ", " + i.current.toString();
  }
  return str + "]";
}

main() {
  int limit = 1000;
  int strt = new DateTime.now().millisecondsSinceEpoch;
  Set<int> sieve = new Set<int>();
  
  for(int i = 2; i <= limit; i++) {
    sieve.add(i);
  }
  for(int i = 2; i * i <= limit; i++) {
   if(sieve.contains(i)) {
     for(int j = i * i; j <= limit; j += i) {
       sieve.remove(j);
     }
   }
  }
  var sortedValues = new List<int>.from(sieve);
  int elpsd = new DateTime.now().millisecondsSinceEpoch - strt;
  print("Found " + sieve.length.toString() + " primes up to " + limit.toString() +
      " in " + elpsd.toString() + " milliseconds.");
  print(iterableToString(sortedValues)); // expect sieve.length to be 168 up to 1000...
//  Expect.equals(168, sieve.length);
}
Output:

Found 168 primes up to 1000 in 9 milliseconds.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

Although it has the characteristics of a true Sieve of Eratosthenes, the above code isn't very efficient due to the remove/modify operations on the Set. Due to these, the computational complexity isn't close to linear with increasing range and it is quite slow for larger sieve ranges compared to compiled languages, taking an average of about 22 thousand CPU clock cycles for each of the 664579 primes (about 4 seconds on a 3.6 Gigahertz CPU) just to sieve to ten million.

Faster bit-packed array odds-only solution

import 'dart:typed_data';
import 'dart:math';

Iterable<int> soeOdds(int limit) {
  if (limit < 3) return limit < 2 ? Iterable.empty() : [2];
  int lmti = (limit - 3) >> 1;
  int bfsz = (lmti >> 3) + 1;
  int sqrtlmt = (sqrt(limit) - 3).floor() >> 1;
  Uint32List cmpsts = Uint32List(bfsz);
  for (int i = 0; i <= sqrtlmt; ++i)
    if ((cmpsts[i >> 5] & (1 << (i & 31))) == 0) {
      int p = i + i + 3;
      for (int j = (p * p - 3) >> 1; j <= lmti; j += p)
        cmpsts[j >> 5] |= 1 << (j & 31);
    }
  return
    [2].followedBy(
      Iterable.generate(lmti + 1)
      .where((i) => cmpsts[i >> 5] & (1 << (i & 31)) == 0)
      .map((i) => i + i + 3) );
}

void main() {
  final int range = 100000000;
  String s = "( ";
  soeOdds(100).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${soeOdds(1000000).length} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = soeOdds(range).length;
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}
Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 4604 milliseconds.

The above code is faster, at about 1.5 thousand CPU clock cycles per prime: sieving to 100 million with the Dart VM took about 4.6 seconds on the low-end 1.92 Gigahertz Intel x5-Z8350 CPU used here, which corresponds to about 2.5 seconds on a 3.6 Gigahertz CPU.

Unbounded infinite iterators/generators of primes

Infinite generator using a (hash) Map (sieves odds-only)

The following code will have about O(n log (log n)) performance due to a hash table having O(1) average performance and is only somewhat slow due to the constant overhead of processing hashes:

Iterable<int> primesMap() {
    Iterable<int> oddprms() sync* {
      yield(3); yield(5); // need at least 2 for initialization
      final Map<int, int> bpmap = {9: 6};
      final Iterator<int> bps = oddprms().iterator;
      bps.moveNext(); bps.moveNext(); // skip past 3 to 5
      int bp = bps.current;
      int n = bp;
      int q = bp * bp;
      while (true) {
        n += 2;
        while (n >= q || bpmap.containsKey(n)) {
          if (n >= q) {
            final int inc = bp << 1;
            bpmap[bp * bp + inc] = inc;
            bps.moveNext(); bp = bps.current; q = bp * bp;
          } else {
            final int inc = bpmap.remove(n);
            int next = n + inc;
            while (bpmap.containsKey(next)) {
              next += inc;
            }
            bpmap[next] = inc;
          }
          n += 2;
        }
        yield(n);
      }
    }
    return [2].followedBy(oddprms());
}

void main() {
  final int range = 100000000;
  String s = "( ";
  primesMap().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${primesMap().takeWhile((p)=>p<=1000000).length} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = primesMap().takeWhile((p)=>p<=range).length;
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}
Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 16086 milliseconds.

This takes about 5300 CPU clock cycles per prime, or about 8.4 seconds if run on a 3.6 Gigahertz CPU, which is slower than the fixed bit-packed array version above. It has the advantage that it runs indefinitely, at least on 64-bit machines; on 32-bit machines it can only run up to the 32-bit number range, which is only about a factor of 20 beyond the range tested above.

Due to the constant execution overhead this is only reasonably useful for ranges up to tens of millions anyway.
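To make the map-based technique easier to follow, here is a minimal sketch of the same idea in Python, used only for illustration. The names `_odd_primes`, `primes_map`, and `pending` are hypothetical. Unlike the Dart listing, this sketch also checks for an occupied slot when it inserts the first entry for a new base prime; that is a defensive choice made here, not something taken from the code above.

```python
from itertools import islice, takewhile


def _odd_primes():
    """Unbounded generator of the odd primes, using a dict of pending composites."""
    yield 3
    yield 5
    pending = {9: 6}      # next odd composite -> step (twice its base prime)
    feed = _odd_primes()  # secondary, lagging feed of base primes
    next(feed)            # discard 3: it is already covered by {9: 6}
    bp = next(feed)       # current base prime, 5
    q = bp * bp           # its square: first composite it must cull
    n = 5
    while True:
        n += 2
        if n in pending:              # n is a multiple of an earlier base prime
            step = pending.pop(n)
        elif n < q:                   # not marked and below the next square
            yield n
            continue
        else:                         # n == q: start culling by this base prime
            step = bp << 1
            bp = next(feed)
            q = bp * bp
        nxt = n + step
        while nxt in pending:         # slot taken by another prime: move on
            nxt += step
        pending[nxt] = step


def primes_map():
    """Unbounded generator of all primes: 2 followed by the odd primes."""
    yield 2
    yield from _odd_primes()
```

For example, `list(islice(primes_map(), 5))` gives `[2, 3, 5, 7, 11]`. The dict holds one entry per base prime up to the square root of the current candidate, which is why memory use stays small while the range is unbounded.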

Fast page segmented array infinite generator (sieves odds-only)

The following code also theoretically has O(n log (log n)) execution performance and the same limited range on 32-bit execution platforms. It won't realize the theoretical complexity for larger primes because the sieve buffer grows beyond the CPU L1 cache size; but as the CPU L2 cache that it then automatically grows to use isn't any slower than the basic culling loop speed, it won't slow down much above that limit for ranges up to about 2.56e14, which would take on the order of weeks:

Translation of: Kotlin
import 'dart:typed_data';
import 'dart:math';
import 'dart:collection';

// a lazy list
typedef _LazyList _Thunk();
class _LazyList<T> {
  final T head;
  _Thunk thunk;
  _LazyList<T> _rest;
  _LazyList(T this.head, _Thunk this.thunk);
  _LazyList<T> get rest {
    if (this.thunk != null) {
      this._rest = this.thunk();
      this.thunk = null;
    }
    return this._rest;
  }
}

class _LazyListIterable<T> extends IterableBase<T> {
  _LazyList<T> _first;
  _LazyListIterable(_LazyList<T> this._first);
  @override Iterator<T> get iterator {
    Iterable<T> inner() sync* {
      _LazyList<T> current = this._first;
      while (true) {
        yield(current.head);
        current = current.rest;
      }
    }
    return inner().iterator;
  }
}

// zero bit population count Look Up Table for 16-bit range...
final Uint8List CLUT =
  Uint8List.fromList(
    Iterable.generate(65536)
    .map((i) {
      final int v0 = ~i & 0xFFFF;
      final int v1 = v0 - ((v0 & 0xAAAA) >> 1);
      final int v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2);
      return (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) * 0x0101)) >> 8) & 31;
    })
    .toList());

int _countComposites(Uint8List cmpsts) {
  Uint16List buf = Uint16List.view(cmpsts.buffer);
  int lmt = buf.length;
  int count = 0;
  for (var i = 0; i < lmt; ++i) {
    count += CLUT[buf[i]];
  }
  return count;
} 

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
Uint32List _composites2BasePrimeArray(int low, Uint8List cmpsts) {
  final int lmti = cmpsts.length << 3;
  final int len = _countComposites(cmpsts);
  final Uint32List rslt = Uint32List(len);
  int j = 0;
  for (int i = 0; i < lmti; ++i) {
    if (cmpsts[i >> 3] & 1 << (i & 7) == 0) {
        rslt[j++] = low + i + i;
    }
  }
  return rslt;
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
void _sieveComposites(int low, Uint8List buffer, Iterable<Uint32List> bpas) {
  final int lowi = (low - 3) >> 1;
  final int len = buffer.length;
  final int lmti = len << 3;
  final int nxti = lowi + lmti;
  for (var bpa in bpas) {
    for (var bp in bpa) {
      final int bpi = (bp - 3) >> 1;
      int strti = ((bpi * (bpi + 3)) << 1) + 3;
      if (strti >= nxti) return;
      if (strti >= lowi) strti = strti - lowi;
      else {
        strti = (lowi - strti) % bp;
        if (strti != 0) strti = bp - strti;
      }
      if (bp <= len >> 3 && strti <= lmti - bp << 6) {
        final int slmti = min(lmti, strti + bp << 3);
        for (var s = strti; s < slmti; s += bp) {
          final int msk = 1 << (s & 7);
          for (var c = s >> 3; c < len; c += bp) {
              buffer[c] |= msk;
          }
        }
      }
      else {
        for (var c = strti; c < lmti; c += bp) {
            buffer[c >> 3] |= 1 << (c & 7);
        }
      }
    }
  } 
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
Iterable<Uint32List> _makeBasePrimeArrays() {
  var cmpsts = Uint8List(512);
  _LazyList<Uint32List> _nextelem(int low, Iterable<Uint32List> bpas) {
    // calculate size so that the bit span is at least as big as the
    // maximum culling prime required, rounded up to minsizebits blocks...
    final int rqdsz = 2 + sqrt((1 + low).toDouble()).toInt();
    final sz = (((rqdsz >> 12) + 1) << 9); // size in bytes
    if (sz > cmpsts.length) cmpsts = Uint8List(sz);
    cmpsts.fillRange(0, cmpsts.length, 0);
    _sieveComposites(low, cmpsts, bpas);
    final arr = _composites2BasePrimeArray(low, cmpsts);
    final nxt = low + (cmpsts.length << 4);
    return _LazyList(arr, () => _nextelem(nxt, bpas));
  }
  // pre-seeding breaks recursive race,
  // as only known base primes used for first page...
  final preseedarr = Uint32List.fromList( [ // pre-seed to 100, can sieve to 10,000...
    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
    , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ] );
  return _LazyListIterable(
           _LazyList(preseedarr,
           () => _nextelem(101, _makeBasePrimeArrays()))
         );
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
Iterable<List> _makeSievePages() sync*  {
  final bpas = _makeBasePrimeArrays(); // secondary source of base prime arrays
  int low = 3;
  Uint8List cmpsts = Uint8List(16384);
  _sieveComposites(3, cmpsts, bpas);
  while (true) {
    yield([low, cmpsts]);
    final rqdsz = 2 + sqrt((1 + low).toDouble()).toInt(); // problem with sqrt not exact past about 10^12!!!!!!!!!
    final sz = ((rqdsz >> 17) + 1) << 14; // size in bytes
    if (sz > cmpsts.length) cmpsts = Uint8List(sz);
    cmpsts.fillRange(0, cmpsts.length, 0);
    low += cmpsts.length << 4;
    _sieveComposites(low, cmpsts, bpas);
  }
}

int countPrimesTo(int range) {
  if (range < 3) { if (range < 2) return 0; else return 1; }
  var count = 1;
  for (var sp in _makeSievePages()) {
    int low = sp[0]; Uint8List cmpsts = sp[1];
    if ((low + (cmpsts.length << 4)) > range) {
      int lsti = (range - low) >> 1;
      var lstw = (lsti >> 4); var lstb = lstw << 1;
      var msk = (-2 << (lsti & 15)) & 0xFFFF;
      var buf = Uint16List.view(cmpsts.buffer, 0, lstw);
      for (var i = 0; i < lstw; ++i)
        count += CLUT[buf[i]];
      count += CLUT[(cmpsts[lstb + 1] << 8) | cmpsts[lstb] | msk];
      break;
    } else {
      count += _countComposites(cmpsts);
    }
  }
  return count;
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
Iterable<int> primesPaged() sync* {
  yield(2);
  for (var sp in _makeSievePages()) {
    int low = sp[0]; Uint8List cmpsts = sp[1];
    var szbts = cmpsts.length << 3;
    for (var i = 0; i < szbts; ++i) {
        if (cmpsts[i >> 3].toInt() & (1 << (i & 7)) != 0) continue;
        yield(low + i + i);
    }
  }
}

void main() {
  final int range = 1000000000;
  String s = "( ";
  primesPaged().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${countPrimesTo(1000000)} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = countPrimesTo(range); // fast way
//  final answer = primesPaged().takeWhile((p)=>p<=range).length; // slow way using enumeration
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}
Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 50847534 primes found up to 1000000000.
This test bench took 9385 milliseconds.

This version counts the primes up to one billion in about five seconds at 3.6 Gigahertz (a low-end 1.92 Gigahertz CPU was used here), or about 350 CPU clock cycles per prime, under the Dart Virtual Machine (VM).

Note that it takes about four times as long to do this using the provided primes generator/enumerator, as noted in the code. In all languages it normally takes longer to enumerate the primes than to sieve them (that is, to cull the composite numbers), but Dart is somewhat slower than most in this respect.

The algorithm can be sped up by a factor of four by extreme wheel factorization and (likely) about a factor of the effective number of CPU cores by using multi-processing isolates, but there isn't much point if one is to use the prime generator for output. For most purposes, it is better to use custom functions that directly manipulate the culled bit-packed page segments as `countPrimesTo` does here.
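As a rough illustration of counting directly from a bit-packed page instead of enumerating primes, the following Python sketch counts zero bits with a byte lookup table and masks off the bits beyond the limit. It is a simplified single-page version written for this explanation, not a translation of `countPrimesTo`; the names `ZERO_BITS`, `sieve_page`, and `count_primes_to` are hypothetical, and the caller is assumed to pass a buffer large enough to cover the limit.

```python
# Number of zero bits in each possible byte value (lookup table).
ZERO_BITS = bytes(8 - bin(b).count("1") for b in range(256))


def sieve_page(size_bytes):
    """Odds-only bit-packed sieve; bit i represents 2*i + 3 (1 = composite)."""
    buf = bytearray(size_bytes)
    nbits = size_bytes * 8
    i = 0
    while (i + i) * (i + 3) + 3 < nbits:
        if not (buf[i >> 3] & (1 << (i & 7))):
            p = i + i + 3
            for j in range((i + i) * (i + 3) + 3, nbits, p):
                buf[j >> 3] |= 1 << (j & 7)
        i += 1
    return buf


def count_primes_to(limit, buf):
    """Count the primes <= limit directly from the packed buffer."""
    if limit < 3:
        return 0 if limit < 2 else 1
    last = (limit - 3) >> 1            # index of the last bit to include
    full_bytes, rem = divmod(last + 1, 8)
    count = 1                          # accounts for the prime 2
    count += sum(ZERO_BITS[b] for b in buf[:full_bytes])
    if rem:
        # Force the bits beyond `last` to 1 so that they are not counted.
        masked = buf[full_bytes] | ((0xFF << rem) & 0xFF)
        count += ZERO_BITS[masked]
    return count
```

For instance, `count_primes_to(100, sieve_page(64))` covers the odd numbers 3 to 1025 and counts the 25 primes up to 100 without ever building a list of them.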

dc

[dn[,]n dsx [d 1 r :a lx + d ln!<.] ds.x lx] ds@
[sn 2 [d;a 0=@ 1 + d ln!<#] ds#x] se

100 lex
Output:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,\
97,

Delphi

program erathostenes;

{$APPTYPE CONSOLE}

type
  TSieve = class
  private
    fPrimes: TArray<boolean>;
    procedure InitArray;
    procedure Sieve;
    function getNextPrime(aStart: integer): integer;
    function getPrimeArray: TArray<integer>;
  public
    function getPrimes(aMax: integer): TArray<integer>;
  end;

  { TSieve }

function TSieve.getNextPrime(aStart: integer): integer;
begin
  result := aStart;
  while not fPrimes[result] do
    inc(result);
end;

function TSieve.getPrimeArray: TArray<integer>;
var
  i, n: integer;
begin
  n := 0;
  setlength(result, length(fPrimes)); // init array with maximum elements
  for i := 2 to high(fPrimes) do
  begin
    if fPrimes[i] then
    begin
      result[n] := i;
      inc(n);
    end;
  end;
  setlength(result, n); // reduce array to actual elements
end;

function TSieve.getPrimes(aMax: integer): TArray<integer>;
begin
  setlength(fPrimes, aMax);
  InitArray;
  Sieve;
  result := getPrimeArray;
end;

procedure TSieve.InitArray;
var
  i: integer; // loop variable must be declared locally
begin
  for i := 2 to high(fPrimes) do
    fPrimes[i] := true;
end;

procedure TSieve.Sieve;
var
  i, n, max: integer;
begin
  max := length(fPrimes);
  i := 2;
  while i < sqrt(max) do
  begin
    n := sqr(i);
    while n < max do
    begin
      fPrimes[n] := false;
      inc(n, i);
    end;
    i := getNextPrime(i + 1);
  end;
end;

var
  i: integer;
  Sieve: TSieve;

begin
  Sieve := TSieve.Create;
  for i in Sieve.getPrimes(100) do
    write(i, ' ');
  Sieve.Free;
  readln;
end.

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

Draco

/* Sieve of Eratosthenes - fill a given boolean array */
proc nonrec sieve([*] bool prime) void:
    word p, c, max;
    max := dim(prime,1)-1;
    prime[0] := false;
    prime[1] := false;
    for p from 2 upto max do prime[p] := true od;
    for p from 2 upto max>>1 do
        if prime[p] then
            for c from p*2 by p upto max do
                prime[c] := false
            od
        fi
    od
corp

/* Print primes up to 1000 using the sieve */
proc nonrec main() void:
    word MAX = 1000;
    unsigned MAX i;
    byte c;
    [MAX+1] bool prime;
    sieve(prime);     

    c := 0;
    for i from 0 upto MAX do
        if prime[i] then
            write(i:4);
            c := c + 1;
            if c=10 then c:=0; writeln() fi
        fi
    od
corp
Output:
   2   3   5   7  11  13  17  19  23  29
  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113
 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229
 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349
 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463
 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601
 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733
 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863
 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

DWScript

function Primes(limit : Integer) : array of Integer;
var
   n, k : Integer;
   sieve := new Boolean[limit+1];
begin
   for n := 2 to Round(Sqrt(limit)) do begin
      if not sieve[n] then begin
         for k := n*n to limit step n do
            sieve[k] := True;
      end;
   end;
   
   for k:=2 to limit do
      if not sieve[k] then
         Result.Add(k);
end;

var r := Primes(50);
var i : Integer;
for i:=0 to r.High do
   PrintLn(r[i]);

Dylan

With outer to sqrt and inner to p^2 optimizations:

define method primes(n)
  let limit = floor(n ^ 0.5) + 1;
  let sieve = make(limited(<simple-vector>, of: <boolean>), size: n + 1, fill: #t);
  let last-prime = 2;

  while (last-prime < limit)
    for (x from last-prime ^ 2 to n by last-prime)
      sieve[x] := #f;
    end for;
    block (found-prime)
      for (n from last-prime + 1 below limit)
        if (sieve[n])
          last-prime := n;
          found-prime()
        end;
      end;
      last-prime := limit;
    end block;
  end while;

  for (x from 2 to n)
    if (sieve[x]) format-out("Prime: %d\n", x); end;
  end;
end;
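
For comparison, the two optimizations named above (outer loop only up to the square root, inner loop starting at p^2) can be sketched in Python. This is an illustrative sketch rather than part of the Dylan entry; the function name `primes_upto` is an assumption made here.

```python
from math import isqrt

def primes_upto(n):
    """Return all primes <= n using a plain Sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    # Outer loop stops at floor(sqrt(n)): every composite <= n has a factor <= sqrt(n).
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            # Inner loop starts at p*p: smaller multiples of p were already
            # crossed out while processing smaller primes.
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

print(primes_upto(30))
```

Only entries still marked prime are used as the next sieving prime, which is the step the `block (found-prime)` search performs in the Dylan code.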

E

E's standard library doesn't have a step-by-N numeric range, so we'll define one, implementing the standard iteration protocol.

def rangeFromBelowBy(start, limit, step) {
  return def stepper {
    to iterate(f) {
      var i := start
      while (i < limit) {
        f(null, i)
        i += step
      }
    }
  }
}

The sieve itself:

def eratosthenes(limit :(int > 2), output) {
  def composite := [].asSet().diverge()
  for i ? (!composite.contains(i)) in 2..!limit {
    output(i)
    composite.addAll(rangeFromBelowBy(i ** 2, limit, i))
  }
}

Example usage:

? eratosthenes(12, println)
# stdout: 2
#         3
#         5
#         7
#         11

EasyLang

len is_divisible[] 100
max = sqrt len is_divisible[]
for d = 2 to max
   if is_divisible[d] = 0
      for i = d * d step d to len is_divisible[]
         is_divisible[i] = 1
      .
   .
.
for i = 2 to len is_divisible[]
   if is_divisible[i] = 0
      print i
   .
.

eC

This example is incorrect. Please fix the code and remove this message.

Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: this is not a Sieve of Eratosthenes; it is just trial division.

public class FindPrime
{
   Array<int> primeList { [ 2 ], minAllocSize = 64 };
   int index;

   index = 3;

   bool HasPrimeFactor(int x)
   {
      int max = (int)floor(sqrt((double)x));
     
      for(i : primeList)
      {
         if(i > max) break;
         if(x % i == 0) return true;
      }
      return false;
   }

   public int GetPrime(int x)
   {
      if(x > primeList.count - 1)
      {
         for (; primeList.count != x; index += 2)
            if(!HasPrimeFactor(index))
            {
               if(primeList.count >= primeList.minAllocSize) primeList.minAllocSize *= 2;
               primeList.Add(index);
            }
      }
      return primeList[x-1];
   }
}

class PrimeApp : Application
{
   FindPrime fp { };
   void Main()
   {
      int num = argc > 1 ? atoi(argv[1]) : 1;
      PrintLn(fp.GetPrime(num));
   }
}
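
For contrast with the trial-division code above, one way to find the n-th prime with an actual sieve (no remainder tests) is sketched below in Python. It is an assumption-laden illustration, not a fix for the eC entry: the names `sieve` and `nth_prime`, the starting limit of 16 and the doubling strategy are all choices made here.

```python
def sieve(limit):
    """Primes <= limit, found by crossing out multiples (no division or remainder)."""
    composite = bytearray(limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if not composite[p]:
            primes.append(p)
            for m in range(p * p, limit + 1, p):
                composite[m] = 1
    return primes

def nth_prime(n):
    """Return the n-th prime (1-based), re-sieving with a doubled limit until enough are found."""
    limit = 16
    while True:
        primes = sieve(limit)
        if len(primes) >= n:
            return primes[n - 1]
        limit *= 2

print(nth_prime(10))
```

Re-sieving from scratch after each doubling is wasteful but keeps the sketch short; the total work is dominated by the final pass.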

EchoLisp

Sieve

(require 'types) ;; bit-vector

;; converts sieve->list for integers in [nmin .. nmax[
(define (s-range sieve nmin nmax (base 0))
	(for/list ([ i (in-range nmin nmax)]) #:when (bit-vector-ref sieve i) (+ i base)))
	
;; next prime in sieve > p, or #f
(define (s-next-prime sieve p ) ;; 
		(bit-vector-scan-1 sieve (1+ p)))
		

;; returns a bit-vector - sieve- all numbers in [0..n[
(define (eratosthenes n)
  (define primes (make-bit-vector-1 n ))
  (bit-vector-set! primes 0 #f)
  (bit-vector-set! primes 1 #f)
  (for ([p (1+ (sqrt n))])
  		 #:when (bit-vector-ref primes  p)
         (for ([j (in-range (* p p) n p)])
    (bit-vector-set! primes j #f)))
   primes) 
  
(define s-primes (eratosthenes 10_000_000))

(s-range s-primes 0 100)
    (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
(s-range s-primes 1_000_000 1_000_100)
    (1000003 1000033 1000037 1000039 1000081 1000099)
(s-next-prime s-primes 9_000_000)
    9000011

Segmented sieve

Allows the base sieve (n) to be extended up to n^2. The memory requirement is O(√n).

;; ref :  http://research.cs.wisc.edu/techreports/1990/TR909.pdf
;; delta multiple of  sqrt(n)
;; segment is [left .. left+delta-1]


(define (segmented sieve left delta  (p 2) (first 0))
	(define segment (make-bit-vector-1 delta))
	(define right (+ left (1- delta)))
	 (define pmax (sqrt right))
	  (while p
	  #:break (> p pmax)
	 (set! first (+ left (modulo (- p (modulo left p)) p )))
			
 	(for   [(q (in-range first (1+ right) p))]
	(bit-vector-set! segment (- q left) #f))	
        (set! p (bit-vector-scan-1 sieve (1+ p))))
	segment)

(define (seg-range nmin delta)
    (s-range (segmented s-primes nmin delta) 0 delta nmin))


(seg-range 10_000_000_000 1000) ;; 15 milli-sec

     (10000000019 10000000033 10000000061 10000000069 10000000097 10000000103 10000000121 
       10000000141 10000000147 10000000207 10000000259 10000000277 10000000279 10000000319 
       10000000343 10000000391 10000000403 10000000469 10000000501 10000000537 10000000583 
       10000000589 10000000597 10000000601 10000000631 10000000643 10000000649 10000000667 
       10000000679 10000000711 10000000723 10000000741 10000000753 10000000793 10000000799 
       10000000807 10000000877 10000000883 10000000889 10000000949 10000000963 10000000991 
       10000000993 10000000999)

;; 8 msec using the native (prime?) function
(for/list ((p (in-range 1_000_000_000 1_000_001_000))) #:when (prime? p) p)
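
The segmented idea can also be sketched in Python: base primes up to the square root of the segment's right edge are enough to cross out every composite inside the segment. This is a minimal sketch under assumptions, not a translation of the EchoLisp code; `simple_sieve` and `segment_primes` are names invented here, and the base primes are recomputed per call instead of being taken from a saved sieve.

```python
from math import isqrt

def simple_sieve(limit):
    """Primes <= limit by the plain sieve (supplies the base primes)."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, f in enumerate(flags) if f]

def segment_primes(left, delta):
    """Primes in the segment [left, left + delta - 1]."""
    right = left + delta - 1
    segment = [True] * delta
    for p in simple_sieve(isqrt(right)):
        # First multiple of p that lies in the segment, but never below p*p
        # (so that p itself is not crossed out).
        first = max(p * p, left + (-left) % p)
        for q in range(first, right + 1, p):
            segment[q - left] = False
    return [left + i for i, f in enumerate(segment) if f and left + i >= 2]

print(segment_primes(1_000_000, 100))
```

Only the `delta` flags of the current segment plus the base primes are held in memory at once.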

Wheel

A 2x3 wheel gives a 50% performance gain.

;; 2x3 wheel
(define (weratosthenes n)
  (define primes (make-bit-vector n )) ;; everybody to #f (false)
  (bit-vector-set! primes 2 #t)
  (bit-vector-set! primes 3 #t)
  (bit-vector-set! primes 5 #t)
  
  (for ([i  (in-range 6 n 6) ]) ;; set candidate primes
  		(bit-vector-set! primes (1+ i) #t)
  		(bit-vector-set! primes (+ i 5) #t)
  		)
  		
  (for ([p  (in-range 5 (1+ (sqrt n)) 2 ) ])
  		 #:when (bit-vector-ref primes  p)
         (for ([j (in-range (* p p) n p)])
    (bit-vector-set! primes j #f)))
   primes)
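
The same 2x3 wheel can be sketched in Python: apart from 2 and 3, only numbers of the form 6k+1 and 6k+5 start out as candidates, so multiples of 2 and 3 never need to be crossed out. The function name `wheel_sieve` is an assumption of this sketch, and no performance figure is claimed for it.

```python
def wheel_sieve(n):
    """Primes < n, with candidates pre-selected by a 2x3 wheel."""
    is_prime = [False] * max(n, 0)
    for small in (2, 3):
        if small < n:
            is_prime[small] = True
    for k in range(5, n, 6):      # 5, 11, 17, ... (numbers of the form 6k + 5)
        is_prime[k] = True
    for k in range(7, n, 6):      # 7, 13, 19, ... (numbers of the form 6k + 1)
        is_prime[k] = True
    p = 5
    while p * p < n:              # sieving primes start at 5; 2 and 3 are handled by the wheel
        if is_prime[p]:
            for j in range(p * p, n, p):
                is_prime[j] = False
        p += 2
    return [i for i, flag in enumerate(is_prime) if flag]

print(wheel_sieve(50))
```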

EDSAC order code

This sieve program is based on one by Eiiti Wada, which on 2020-07-05 could be found at https://www.dcs.warwick.ac.uk/~edsac/

The main external change is that the program is not designed to be viewed in the monitor; it just writes as many primes as possible within the limitations imposed by Rosetta Code. Apart from the addition of comments, internal changes include the elimination of one set of masks, and a revised method of switching from one mask to another.

On the EdsacPC simulator (see link above) the printout starts off very slowly, and gradually gets faster.

 [Sieve of Eratosthenes]
 [EDSAC program. Initial Orders 2]

[Memory usage:
   56..87   library subroutine P6, for printing
   88..222  main program
  224..293  mask table: 35 long masks; each has 34 1's and a single 0
  294..1023 array of bits for integers 2, 3, 4, ...,
            where bit is changed from 1 to 0 when integer is crossed out.
  The address of the mask table must be even, and clear of the main program.
  To change it, just change the value after "T47K" below.
  The address of the bit array will then be changed automatically.]
 
[Subroutine M3, prints header, terminated by blank row of tape.
 It's an "interlude", which runs and then gets overwritten.]
      PFGKIFAFRDLFUFOFE@A6FG@E8FEZPF
      @&*SIEVE!OF!ERATOSTHENES!#2020
      @&*BASED!ON!CODE!BY!EIITI!WADA!#2001
      ..PZ

[Subroutine P6, prints strictly positive integer.
 32 locations; working locations 1, 4, 5.]
        T  56 K
  GKA3FT25@H29@VFT4DA3@TFH30@S6@T1FV4DU4DAFG26@TFTF
  O5FA4DF4FS4FL4FT4DA1FS3@G9@EFSFO31@E20@J995FJF!F

  [Store address of mask table in (say) location 47
  (chosen because its code letter M is first letter of "Mask").
  Address must be even and clear of main program.]
        T  47 K
        P 224 F  [<-------- address of mask table]

[Main program]
        T  88 K  [Define load address for main program.
                 Must be even, because of long values at start.]
        G     K [set @ (theta) for relative addressing]

[Long constants]
        T#Z PF TZ [clears sandwich digit between 0 and 1]
    [0] PD PF     [long value 1; also low word = short 1]
        T2#Z PF T2Z [clears sandwich digit between 2 and 3]
    [2] PF K4096F [long value 1000...000 binary;
                   also high word = teleprinter null]

 [Short constants
  The address in the following C order is the (exclusive) end of the bit table.
  Must be even: max = 1024, min = M + 72 where M is address of mask table set up above.
  Usually 1024, but may be reduced, e.g. to make the program run faster.]
    [4] C1024 D [or e.g. C 326 D to make it much faster]
    [5] U     F ['U' = 'T' - 'C']
    [6] K     F ['K' = 'S' - 'C']
    [7] H    #M [H order for start of mask table]
    [8] H  70#M [used to test for end of mask table]
    [9] P   2 F [constant 4, or 2 in address field]
   [10] P  70 F [constant 140, or 70 in address field]
   [11] @     F [carriage return]
   [12] &     F [line feed]

 [Short variables]
   [13] P   1 F [p = number under test
                Let p = 35*q + r, where 0 <= r < 35]
   [14] P     F [4*q]
   [15] P   4 F [4*r]

 [Initial values of orders; required only for optional code below.]
   [16] C  70#M [initial value of a variable C order]
   [17] T    #M [initial value of a variable T order]
   [18] T  70#M [initial value of a variable T order]

   [19]
   [Enter with acc = 0]

  [Optional code to do some initializing at run time.
   This code allows the program to run again without being loaded again.]
         A  7 @ [initial values of variable orders]
         T 65 @
         A 16 @
         T 66 @
         A 17 @
         T 44 @
         A 18 @
         T 52 @

  [Initialize variables]
         A    @ [load 1 (short)]
         L    D [shift left 1]
         U 13 @ [p := 2]
         L  1 F [shift left 2]
         T 15 @ [4*r :=  8]
         T 14 @ [4*q :=  0]
  [End of optional code]

 [Make table of 35 masks 111...110,  111...101, ...,  011...111
  Treat the mask 011...111 separately to avoid accumulator overflow.
  Assume acc = 0 here.]
        S    #@ [acc all 1's]
        S  2 #@ [acc := 0111...111]
   [35] T 68 #M [store at high end of mask table]
        S    #@ [acc := -1]        
        L     D [make mask 111...1110]
        G 43  @ [jump to store it]
 [Loop shifting the mask left and storing the result in the mask table.
  Uses first entry of bit array as temporary store.]
   [39] T     F [clear acc]
        A 70 #M [load previous mask]
        L     D [shift left]
        A    #@ [add 1]
   [43] U 70 #M [update current mask]
   [44] T    #M [store it in table (order changed at run time)]
        A 44  @ [load preceding T order]
        A  9  @ [inc address by 2]
        U 44  @ [store back]
        S 35  @ [reached high entry yet?]
        G 39  @ [loop back if not]
 [Mask table is now complete]

 [Initialize bit array: no numbers crossed out, so all bits are 1]
   [50] T     F [clear acc]
        S     #@ [subtract long 1, make top 35 bits all 1's]
   [52] T 70 #M [store as long value, both words all 1's  (order changed at run time)]
        A  52 @ [load preceding order]
        A   9 @ [add 2 to address field]
        U  52 @ [and store back]
        S   5 @ [convert to C order with same address (*)]
        S   4 @ [test for end of bit array]
        G  50 @ [loop until stored all 1's in bit table]
 [(*) Done so that end of bit table can be stored at one place only
      in list of constants, i.e. 'C m D' only, not 'T m D' as well.]

 [Start of main loop.]
 [Testing whether number has been crossed out]
   [59] T     F [acc := 0]
        A  66 @ [deriving S order from C order]
        A   6 @
        T  64 @
        S    #@ [acc := -1]
   [64] S     F [acc := 1's complement of bit-table entry (order changed at run time)]
   [65] H    #M [mult reg := start of mask array (order changed at run time)]
   [66] C  70#M [acc := -1 iff p (current number) is crossed out (order changed at run time)]
 [The next order is to avoid accumulator overflow if acc = max positive number]
        E  70 @ [if acc >= 0, jump to process new prime]
        A     #@ [if acc < 0, add 1 to test for -1]
        E 106  @ [if acc now >= 0 number is crossed out, jump to test next]
 [Here if new prime found.
  Send it to the teleprinter]
   [70] O  11 @ [print CR]
        O  12 @ [print LF]
        T     F [clear acc]
        A  13 @ [load prime]
        T     F [store in C(0) for print routine]
        A  75 @ [for subroutine return]
        G 56  F [print prime]

 [Cross out its multiples by setting corresponding bits to 0]
        A  65 @ [load H order above]
        T 102 @ [plant in crossing-out loop]
        A  66 @ [load C order above]
        T 103 @ [plant in crossing-out loop]

 [Start of crossing-out loop. Here acc must = 0]
   [81] A 102 @ [load H order below]
        A  15 @ [inc address field by 2*r, where p = 35q + r]
        U 102 @ [update H order]
        S   8 @ [compare with 'H 70 #M']
        G  93 @ [skip if not gone beyond end of mask table]
        T     F [wrap mask address and inc address in bit table]
        A 102 @ [load H order below]
        S  10 @ [reduce mask address by 70]
        T 102 @ [update H order]
        A 103 @ [load C order below]
        A   9 @ [add 2 to address]
        T 103 @ [update C order]
   [93] T     F [clear acc]
        A 103 @ [load C order below]
        A  14 @ [inc address field by 2*q, where p = 35q + r]
        U 103 @ [update C order]
        S   4 @ [test for end of bit array]
        E 106 @ [if finished crossing out, loop to test next number]
        A  4  @ [restore C order]
        A  5  @ [make T order with same address]
        T 104 @ [store below]

 [Execute the crossing-out orders created above]
  [102] X     F [mult reg := mask (order created at run time)]
  [103] X     F [acc := logical and with bit-table entry (order created at run time)]
  [104] X     F [update entry (order created at run time)]
        E  81 @ [loop back with acc = 0]

  [106] T     F [clear acc]
        A  13 @ [load p = number under test]
        A     @ [add 1 (single)]
        T  13 @ [update]
        A  15 @ [load 4*r, where p = 35q + r]
        A   9 @ [add 4]
        U  15 @ [store back (r inc'd by 1)]
        S  10 @ [is 4*r now >= 140?]
        G 119 @ [no, skip]
        T  15 @ [yes, reduce 4*r by 140]
        A  14 @ [load 4*q]
        A   9 @ [add 4]
        T  14 @ [store back (q inc'd by 1)]
  [119] T     F [clear acc]
        A  65 @ [load 'H ... D' order, which refers to a mask]
        A   9 @ [inc mask address by 2]
        U  65 @ [update order]
        S   8 @ [over end of mask table?]
        G  59 @ [no, skip wrapround code]
        A   7 @ [yes, add constant to wrap round]
        T  65 @ [update H order]
        A  66 @ 
        A   9 @ [inc address by 2]
        U  66 @ [and store back]
        S   4 @ [test for end, as defined by C order at start]
        G  59 @ [loop back if not at end]

[Finished whole thing]
  [132] O   3 @ [output null to flush teleprinter buffer]
        Z     F [stop]
        E  19 Z [address to start execution]
        P     F [acc = 0 at start]
Output:
SIEVE OF ERATOSTHENES 2020
BASED ON CODE BY EIITI WADA 2001
    2
    3
    5
    7
   11
   13
   17
[...]
12703
12713
12721
12739
12743
12757
12763
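
The data layout described in the program comments (one bit per integer packed into 35-bit long words, and a table of 35 masks that each contain a single 0) can be illustrated with a short Python sketch. It is only a loose model under stated assumptions: the names are invented here, bit r of word q is assumed to represent the integer 35*q + r, and `divmod` is used where the EDSAC program updates q and r incrementally to avoid division.

```python
WORD_BITS = 35                      # an EDSAC long word holds 35 bits
ALL_ONES = (1 << WORD_BITS) - 1

# Mask table: each mask has 34 ones and a single zero.
MASKS = [ALL_ONES ^ (1 << r) for r in range(WORD_BITS)]

def packed_sieve(limit):
    """Primes <= limit, using one bit per integer packed into 35-bit words."""
    words = [ALL_ONES] * (limit // WORD_BITS + 1)   # all bits 1: nothing crossed out
    primes = []
    for p in range(2, limit + 1):
        q, r = divmod(p, WORD_BITS)                 # p = 35*q + r
        if (words[q] >> r) & 1:                     # bit still set: p is prime
            primes.append(p)
            for m in range(2 * p, limit + 1, p):
                mq, mr = divmod(m, WORD_BITS)
                words[mq] &= MASKS[mr]              # logical AND clears exactly one bit
    return primes

print(packed_sieve(50))
```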

Eiffel

Works with: EiffelStudio version 6.6 beta (with provisional loop syntax)
class
    APPLICATION
 
create
    make
 
feature
    make
            -- Run application.
        do
            across primes_through (100) as ic loop print (ic.item.out + " ") end
        end
 
    primes_through (a_limit: INTEGER): LINKED_LIST [INTEGER]
            -- Prime numbers through `a_limit'
        require
            valid_upper_limit: a_limit >= 2
        local
            l_tab: ARRAY [BOOLEAN]
        do
            create Result.make
            create l_tab.make_filled (True, 2, a_limit)
            across
                l_tab as ic
            loop
                if ic.item then
                    Result.extend (ic.target_index)
                    across ((ic.target_index * ic.target_index) |..| l_tab.upper).new_cursor.with_step (ic.target_index) as id
                    loop
                        l_tab [id.item] := False
                    end
                end
            end
        end
end

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Elixir

defmodule Prime do
  def eratosthenes(limit \\ 1000) do
    sieve = [false, false | Enum.to_list(2..limit)] |> List.to_tuple
    check_list = [2 | Stream.iterate(3, &(&1+2)) |> Enum.take(round(:math.sqrt(limit)/2))]
    Enum.reduce(check_list, sieve, fn i,tuple ->
      if elem(tuple,i) do
        clear_num = Stream.iterate(i*i, &(&1+i)) |> Enum.take_while(fn x -> x <= limit end)
        clear(tuple, clear_num)
      else
        tuple
      end
    end)
  end
  
  defp clear(sieve, list) do
    Enum.reduce(list, sieve, fn i, acc -> put_elem(acc, i, false) end)
  end
end

limit = 199
sieve = Prime.eratosthenes(limit)
Enum.each(0..limit, fn n ->
  if x=elem(sieve, n), do: :io.format("~3w", [x]), else: :io.format("  .") 
  if rem(n+1, 20)==0, do: IO.puts ""
end)
Output:
  .  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19
  .  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .
  . 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59
  . 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79
  .  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .
  .101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .
  .  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139
  .  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .
  .  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179
  .181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199

Shorter version (but slow):

defmodule Sieve do
  def primes_to(limit), do: sieve(Enum.to_list(2..limit))

  defp sieve([h|t]), do: [h|sieve(t -- for n <- 1..length(t), do: h*n)]
  defp sieve([]), do: []
end

Alternate much faster odds-only version more suitable for immutable data structures using a (hash) Map

The above code has a very limited useful range because it is very slow. For example, sieving to a million, even with the algorithm changed to odds-only, requires over 800 thousand "copy-on-update" operations on the entire saved immutable tuple ("array") of 500 thousand bytes, making it very much a "toy" application. The following code overcomes that problem by using an (immutable, hashed) Map to record the current state of the composite number chains resulting from each of the secondary streams of base primes, which are only 167 in number up to this range; it is a functional "incremental" Sieve of Eratosthenes implementation:

defmodule PrimesSoEMap do
  @typep stt :: {integer, integer, integer, Enumerable.integer, %{integer => integer}}

  @spec advance(stt) :: stt
  defp advance {n, bp, q, bps?, map} do
    bps = if bps? === nil do Stream.drop(oddprms(), 1) else bps? end
    nn = n + 2
    if nn >= q do
      inc = bp + bp
      nbps = bps |> Stream.drop(1)
      [nbp] = nbps |> Enum.take(1)
      advance {nn, nbp, nbp * nbp, nbps, map |> Map.put(nn + inc, inc)}
    else if Map.has_key?(map, nn) do
      {inc, rmap} = Map.pop(map, nn)
      [next] =
        Stream.iterate(nn + inc, &(&1 + inc))
          |> Stream.drop_while(&(Map.has_key?(rmap, &1))) |> Enum.take(1)
      advance {nn, bp, q, bps, Map.put(rmap, next, inc)}
    else
      {nn, bp, q, bps, map}
    end end
  end

  @spec oddprms() :: Enumerable.integer
  defp oddprms do # put first base prime cull seq in Map so never empty
    # advance base odd primes to 5 when initialized
    init = {7, 5, 25, nil, %{9 => 6}}
    [3, 5] # to avoid race, preseed with the first 2 elements...
      |> Stream.concat(
            Stream.iterate(init, &(advance &1))
              |> Stream.map(fn {p,_,_,_,_} -> p end))
  end

  @spec primes() :: Enumerable.integer
  def primes do
    Stream.concat([2], oddprms())
  end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoEMap.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
  fn () ->
    ans =
      PrimesSoEMap.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
    ans end
:timer.tc(testfunc)
  |> (fn {t,ans} ->
    IO.puts "There are #{ans} primes up to #{range}."
    IO.puts "This test bench took #{t} microseconds." end).()
Output:
The first 25 primes are:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes up to 1000000.
This test bench took 3811957 microseconds.

The run time of about 3.81 seconds to sieve to one million was measured on a 1.92 Gigahertz CPU, which works out to about 93 thousand CPU clock cycles per prime. That is still quite slow compared to mutable data structure implementations but comparable to "functional" implementations in other languages; the slowness is due to the time needed to calculate the required hashes. One advantage is its O(n log (log n)) asymptotic computational complexity, meaning that it takes not much more than ten times as long to sieve a range ten times larger.
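
The two figures in that paragraph can be checked with a few lines of arithmetic. The Python sketch below only restates the numbers reported above (3811957 microseconds, 1.92 GHz, 78498 primes); the helper name `cost` is an assumption, and constant factors are ignored in the growth estimate.

```python
from math import log

elapsed_seconds = 3_811_957 / 1_000_000    # reported run time in microseconds
clock_hz = 1.92e9                          # reported CPU clock rate
primes_found = 78_498                      # primes up to one million

cycles_per_prime = elapsed_seconds * clock_hz / primes_found
print(f"about {cycles_per_prime:,.0f} clock cycles per prime")

def cost(n):
    """Relative cost model n * log(log n), constant factors ignored."""
    return n * log(log(n))

n = 1_000_000
growth = cost(10 * n) / cost(n)
print(f"sieving ten times further costs about {growth:.2f} times as much")
```

Under this model the growth factor from one million to ten million comes out at roughly 10.6, which is the "not much more than ten times" of the text.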

This algorithm could be easily changed to use a Priority Queue (preferably Min-Heap based for the least constant factor computational overhead) to save some of the computation time, but then it will have the same computational complexity as the following code and likely about the same execution time.
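
The map-based incremental idea is not specific to Elixir. A minimal Python sketch of the same technique follows, using a `dict` in place of the Map; it is not a translation of the code above, and the names are assumptions. Each base prime p is added to the dictionary only when the candidate reaches p*p, with base primes supplied by a secondary instance of the same generator.

```python
from itertools import islice, takewhile

def primes():
    """Unbounded prime generator: incremental odds-only sieve backed by a dict."""
    yield from (2, 3, 5, 7)
    composites = {}          # upcoming odd composite -> step (2 * its base prime)
    base = primes()          # secondary stream supplying the base primes
    next(base)               # skip 2: even numbers are never candidates
    p = next(base)           # first base prime, 3
    q = p * p                # its square, 9
    n = 9
    while True:
        if n in composites:
            step = composites.pop(n)     # n is a known composite: advance its chain
        elif n < q:
            yield n                      # no chain hits n and n < next square: prime
            n += 2
            continue
        else:                            # n == q: start the chain for base prime p
            step = 2 * p
            p = next(base)
            q = p * p
        nxt = n + step
        while nxt in composites:         # keep one dict entry per composite
            nxt += step
        composites[nxt] = step
        n += 2

print(list(islice(primes(), 25)))
```

Because a chain is started only when its square is reached, the dictionary holds one entry per base prime up to the square root of the current candidate.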

Alternate faster odds-only version more suitable for immutable data structures using lazy Streams of Co-Inductive Streams

In order to save the time spent computing the hashes, the following version uses a deferred-execution Co-Inductive Stream type (constructed using tuples) in an infinite tree folding structure (by the `pairs` function):

defmodule PrimesSoETreeFolding do
  @typep cis :: {integer, (() -> cis)}
  @typep ciss :: {cis, (() -> ciss)}

  @spec merge(cis, cis) :: cis
  defp merge(xs, ys) do
    {x, restxs} = xs; {y, restys} = ys
    cond do
      x < y -> {x, fn () -> merge(restxs.(), ys) end}
      y < x -> {y, fn () -> merge(xs, restys.()) end}
      true -> {x, fn () -> merge(restxs.(), restys.()) end}
    end
  end

  @spec smlt(integer, integer) :: cis
  defp smlt(c, inc) do
    {c, fn () -> smlt(c + inc, inc) end}
  end

  @spec smult(integer) :: cis
  defp smult(p) do
    smlt(p * p, p + p)
  end

  @spec allmults(cis) :: ciss
  defp allmults {p, restps} do
    {smult(p), fn () -> allmults(restps.()) end}
  end

  @spec pairs(ciss) :: ciss
  defp pairs {cs0, restcss0} do
    {cs1, restcss1} = restcss0.()
    {merge(cs0, cs1), fn () -> pairs(restcss1.()) end}
  end

  @spec cmpsts(ciss) :: cis
  defp cmpsts {cs, restcss} do
    {c, restcs} = cs
    {c, fn () -> merge(restcs.(), cmpsts(pairs(restcss.()))) end}
  end

  @spec minusat(integer, cis) :: cis
  defp minusat(n, cmps) do
    {c, restcs} = cmps
    if n < c do
      {n, fn () -> minusat(n + 2, cmps) end}
    else
      minusat(n + 2, restcs.())
    end
  end

  @spec oddprms() :: cis
  defp oddprms() do
    {3, fn () ->
      {5, fn () -> minusat(7, cmpsts(allmults(oddprms()))) end}
    end}
  end

  @spec primes() :: Enumerable.t
  def primes do
    [2] |> Stream.concat(
      Stream.iterate(oddprms(), fn {_, restps} -> restps.() end)
        |> Stream.map(fn {p, _} -> p end)
    )
  end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoETreeFolding.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
  fn () ->
    ans =
      PrimesSoETreeFolding.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
    ans end
:timer.tc(testfunc)
  |> (fn {t,ans} ->
    IO.puts "There are #{ans} primes up to #{range}."
    IO.puts "This test bench took #{t} microseconds." end).()

Its output is identical to that of the previous version except that the time required is less than half; however, it has O(n (log n) (log (log n))) asymptotic computational complexity, meaning that its run time grows faster with range than that of the above version. That said, one would have to sieve to billions, taking hours, before the two took about the same time.
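
To make the structure of the tree folding easier to follow, here is a near-literal Python transliteration of the functions above. It is a sketch under assumptions: a stream is modelled as a pair `(head, thunk)` just as in the Elixir tuples, the function names are adapted here, nothing is memoized, and it is meant for reading rather than speed.

```python
from itertools import islice

def merge(xs, ys):
    """Merge two ascending streams into one, dropping duplicates."""
    x, rest_xs = xs
    y, rest_ys = ys
    if x < y:
        return (x, lambda: merge(rest_xs(), ys))
    if y < x:
        return (y, lambda: merge(xs, rest_ys()))
    return (x, lambda: merge(rest_xs(), rest_ys()))

def multiples(p):
    """Stream p*p, p*p + 2p, ...: the odd multiples of p from its square."""
    def from_(c):
        return (c, lambda: from_(c + 2 * p))
    return from_(p * p)

def all_multiples(ps):
    """Stream of streams: one stream of multiples per prime in ps."""
    p, rest_ps = ps
    return (multiples(p), lambda: all_multiples(rest_ps()))

def pairs(css):
    """Merge neighbouring streams pairwise; this builds the folding tree."""
    cs0, rest0 = css
    cs1, rest1 = rest0()
    return (merge(cs0, cs1), lambda: pairs(rest1()))

def composites(css):
    """Single ascending stream of all composites in the stream of streams."""
    cs, rest_css = css
    c, rest_cs = cs
    return (c, lambda: merge(rest_cs(), composites(pairs(rest_css()))))

def minus_at(n, cmps):
    """Odd numbers from n upward that do not occur in the composites stream."""
    while True:
        c, rest_cs = cmps
        if n < c:
            return (n, lambda: minus_at(n + 2, cmps))
        n, cmps = n + 2, rest_cs()

def odd_primes():
    return (3, lambda: (5, lambda: minus_at(7, composites(all_multiples(odd_primes())))))

def primes():
    yield 2
    stream = odd_primes()
    while True:
        p, rest = stream
        yield p
        stream = rest()

print(list(islice(primes(), 25)))
```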

Elm

Elm with immutable arrays

module PrimeArray exposing (main)

import Array exposing (Array, foldr, map, set)
import Html exposing (div, h1, p, text)
import Html.Attributes exposing (style)


{-
The Eratosthenes sieve task in Rosetta Code does not accept the use of a modulo function (although the Elm functions modBy and remainderBy always work correctly, as they require type Int rather than Float). The solution therefore needs an indexed work array, since Elm lists have no indexes.

In this method we need no division remainder calculations, as we just set the markings of non-primes in the array. We only need to know the indexes at which the non-primes are to be marked.

Because everything is immutable in Elm, every change of an array value creates a new array, leaving the original array unchanged. That makes the program run slower or consume more memory than in imperative languages. All conventional loops (for, while, until) are excluded in Elm because of the immutability requirement.

   Live: https://ellie-app.com/pTHJyqXcHtpa1
-}


alist =
    List.range 2 150



-- Work array contains integers 2 ... 150 (149 values)


workArray =
    Array.fromList alist


n : Int
n =
    List.length alist



-- The max index of integers used in search for primes
-- limit * limit < n
-- equal: limit <= √n


limit : Int
limit =
    round (0.5 + sqrt (toFloat n))
 

-- Remove zero cells of the array


findZero : Int -> Bool
findZero =
    \el -> el > 0


zeroFree : Array Int
zeroFree =
    Array.filter findZero workResult


nrFoundPrimes =
    Array.length zeroFree


workResult : Array Int
workResult =
    loopI 2 limit workArray



{- As Elm has no loops (for, while, until)
we must use recursion instead!
The search for primes always keeps the
first found value (not setting it to zero) and continues by setting the multiples of that prime to zero.
Zero is not a prime and may thus be used to mark non-prime numbers. At the end, only the primes remain in the array, and the zeroes are removed from the resulting array before it is shown in Html.
-}

-- The recursion increasing variable i follows:

loopI : Int -> Int -> Array Int -> Array Int
loopI i imax arr =
    if i > imax then
        arr

    else
        let
            arr2 =
                phase i arr
        in
        loopI (i + 1) imax arr2



-- The helper function


phase : Int -> Array Int -> Array Int
phase i =
    arrayMarker i (2 * i - 2) n


lastPrime =
    Maybe.withDefault 0 <| Array.get (nrFoundPrimes - 1) zeroFree


outputArrayInt : Array Int -> String
outputArrayInt arr =
    decorateString <|
        foldr (++) "" <|
            Array.map (\x -> String.fromInt x ++ " ") arr


decorateString : String -> String
decorateString str =
    "[ " ++ str ++ "]"



-- Recursively marking the multiples of p with zero
-- This loop operates with constant p


arrayMarker : Int -> Int -> Int -> Array Int -> Array Int
arrayMarker p min max arr =
    let
        arr2 =
            set min 0 arr

        min2 =
            min + p
    in
    if min < max then
        arrayMarker p min2 max arr2

    else
        arr


main =
    div [ style "margin" "2%" ]
        [ h1 [] [ text "Sieve of Eratosthenes" ]
        , text ("List of integers [2, ... ," ++ String.fromInt n ++ "]")
        , p [] [ text ("Total integers " ++ String.fromInt n) ]
        , p [] [ text ("Max prime of search " ++ String.fromInt limit) ]
        , p [] [ text ("The largest found prime " ++ String.fromInt lastPrime) ]
        , p [ style "color" "blue", style "font-size" "1.5em" ]
            [ text (outputArrayInt zeroFree) ]
        , p [] [ text ("Found " ++ String.fromInt nrFoundPrimes ++ " primes") ]
        ]
Output:
List of integers [2, ... ,149]

Total integers 149

Max prime of search 13

The largest found prime 149

[ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 ]

Found 35 primes 

Concise Elm Immutable Array Version

Although functional, the above code is written in quite an imperative style, so the following code is written in a more concise functional style and includes timing information for counting the number of primes to a million:

module Main exposing (main)

import Browser exposing (element)
import Task exposing (Task, succeed, perform, andThen)
import Html exposing (Html, div, text)
import Time exposing (now, posixToMillis)

import Array exposing (repeat, get, set)

cLIMIT : Int
cLIMIT = 1000000

primesArray : Int -> List Int
primesArray n =
  if n < 2 then [] else
  let
    sz = n + 1
    loopbp bp arr =
      let s = bp * bp in
      if s >= sz then arr else
      let tst = get bp arr |> Maybe.withDefault True in
      if tst then loopbp (bp + 1) arr else
      let
        cullc c iarr =
          if c >= sz then iarr else
          cullc (c + bp) (set c True iarr)
      in loopbp (bp + 1) (cullc s arr)
    cmpsts = loopbp 2 (repeat sz False)
    cnvt (i, t) = if t then Nothing else Just i
  in cmpsts |> Array.toIndexedList
      |> List.drop 2 -- skip the values for zero and one
      |> List.filterMap cnvt -- primes are indexes of not composites

type alias Model = List String

type alias Msg = Model

test : (Int -> List Int) -> Int -> Cmd Msg
test primesf lmt =
  let
    to100 = primesf 100 |> List.map String.fromInt |> String.join ", "
    to100str = "The primes to 100 are:  " ++ to100
    timemillis() = now |> andThen (succeed << posixToMillis)
  in timemillis() |> andThen (\ strt ->
       let cnt = primesf lmt |> List.length
       in timemillis() |> andThen (\ stop ->
         let answrstr = "Found " ++ (String.fromInt cnt) ++ " primes to "
                          ++ (String.fromInt cLIMIT) ++ " in "
                          ++ (String.fromInt (stop - strt)) ++ " milliseconds."
         in succeed [to100str, answrstr] ) ) |> perform identity

main : Program () Model Msg
main =
  element { init = \ _ -> ( [], test primesArray cLIMIT )
          , update = \ msg _ -> (msg, Cmd.none)
          , subscriptions = \ _ -> Sub.none
          , view = div [] << List.map (div [] << List.singleton << text) }
Output:
The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 958 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

Concise Elm Immutable Array Odds-Only Version

The following code can replace the `primesArray` function in the above program and be called from the testing and display code (two places):

primesArrayOdds : Int -> List Int
primesArrayOdds n =
  if n < 2 then [] else
  let
    sz = (n - 1) // 2
    loopi i arr =
      let s = (i + i) * (i + 3) + 3 in
      if s >= sz then arr else
      let tst = get i arr |> Maybe.withDefault True in
      if tst then loopi (i + 1) arr else
      let
        bp = i + i + 3
        cullc c iarr =
          if c >= sz then iarr else
          cullc (c + bp) (set c True iarr)
      in loopi (i + 1) (cullc s arr)
    cmpsts = loopi 0 (repeat sz False)
    cnvt (i, t) = if t then Nothing else Just <| i + i + 3
    oddprms = cmpsts |> Array.toIndexedList |> List.filterMap cnvt
  in 2 :: oddprms
Output:
The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 371 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).
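
The index arithmetic used by `primesArrayOdds` can be checked in isolation. The following Python sketch is not part of the Elm program; the function names are made up for illustration. It assumes the same layout as the Elm code: slot `i` stands for the odd number `2*i + 3`, and culling for the base prime at slot `i` starts at the slot of its square, `(i + i) * (i + 3) + 3`, stepping by the base prime itself (a step of `bp` slots is a step of `2*bp` in value).

```python
def index_to_odd(i):
    """Odd number represented by slot i (0 -> 3, 1 -> 5, ...)."""
    return 2 * i + 3


def square_index(i):
    """Slot that holds the square of the odd number at slot i."""
    return (i + i) * (i + 3) + 3


def primes_odds(n):
    """Odds-only sieve up to n, mirroring the loop structure of the Elm code."""
    if n < 2:
        return []
    sz = (n - 1) // 2                 # number of odd candidates 3, 5, ..., <= n
    composite = [False] * sz          # False means "not known to be composite"
    i = 0
    while square_index(i) < sz:
        if not composite[i]:
            bp = index_to_odd(i)      # base prime for this slot
            for c in range(square_index(i), sz, bp):
                composite[c] = True
        i += 1
    return [2] + [index_to_odd(k) for k in range(sz) if not composite[k]]


print(primes_odds(30))
```

Because only odd candidates are stored, the array is half the size of the plain version and each base prime culls half as many slots, which is consistent with the speed-up reported above.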

Richard Bird Tree Folding Elm Version

The Elm language doesn't handle the Sieve of Eratosthenes (SoE) algorithm efficiently because it has no directly accessible linear arrays. The Array module used above is based on a persistent tree of sub-arrays, so every write to any location does a Copy On Write (COW) and also goes through a logarithmic process of updating the "tree", which is what keeps the COW operations to a minimum. Thus, there is better performance implementing the Richard Bird Tree Folding functional algorithm, as follows:

Translation of: Haskell
module Main exposing (main)

import Browser exposing (element)
import Task exposing (Task, succeed, perform, andThen)
import Html exposing (Html, div, text)
import Time exposing (now, posixToMillis)

cLIMIT : Int
cLIMIT = 1000000

type CIS a = CIS a (() -> CIS a)

uptoCIS2List : comparable -> CIS comparable -> List comparable
uptoCIS2List n cis =
  let loop (CIS hd tl) lst =
        if hd > n then List.reverse lst
        else loop (tl()) (hd :: lst)
  in loop cis []

countCISTo : comparable -> CIS comparable -> Int
countCISTo n cis =
  let loop (CIS hd tl) cnt =
        if hd > n then cnt else loop (tl()) (cnt + 1)
  in loop cis 0

primesTreeFolding : () -> CIS Int
primesTreeFolding() =
  let
    merge (CIS x xtl as xs) (CIS y ytl as ys) =
      case compare x y of
        LT -> CIS x <| \ () -> merge (xtl()) ys
        EQ -> CIS x <| \ () -> merge (xtl()) (ytl())
        GT -> CIS y <| \ () -> merge xs (ytl())
    pmult bp =
      let adv = bp + bp
          pmlt p = CIS p <| \ () -> pmlt (p + adv)
      in pmlt (bp * bp)
    allmlts (CIS bp bptl) =
      CIS (pmult bp) <| \ () -> allmlts (bptl())
    pairs (CIS frst tls) =
      let (CIS scnd tlss) = tls()
      in CIS (merge frst scnd) <| \ () -> pairs (tlss())
    cmpsts (CIS (CIS hd tl) tls) =
      CIS hd <| \ () -> merge (tl()) <| cmpsts <| pairs (tls())
    testprm n (CIS hd tl as cs) =
      if n < hd then CIS n <| \ () -> testprm (n + 2) cs
      else testprm (n + 2) (tl())
    oddprms() =
      CIS 3 <| \ () -> testprm 5 <| cmpsts <| allmlts <| oddprms()
  in CIS 2 <| \ () -> oddprms()

type alias Model = List String

type alias Msg = Model

test : (() -> CIS Int) -> Int -> Cmd Msg
test primesf lmt =
  let
    to100 = primesf() |> uptoCIS2List 100
              |> List.map String.fromInt |> String.join ", "
    to100str = "The primes to 100 are:  " ++ to100
    timemillis() = now |> andThen (succeed << posixToMillis)
  in timemillis() |> andThen (\ strt ->
       let cnt = primesf() |> countCISTo lmt
       in timemillis() |> andThen (\ stop ->
         let answrstr = "Found " ++ (String.fromInt cnt) ++ " primes to "
                          ++ (String.fromInt cLIMIT) ++ " in "
                          ++ (String.fromInt (stop - strt)) ++ " milliseconds."
         in succeed [to100str, answrstr] ) ) |> perform identity

main : Program () Model Msg
main =
  element { init = \ _ -> ( [], test primesTreeFolding cLIMIT )
          , update = \ msg _ -> (msg, Cmd.none)
          , subscriptions = \ _ -> Sub.none
          , view = div [] << List.map (div [] << List.singleton << text) }
Output:
The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 201 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).
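
The two building blocks of the code above, the lazy stream of multiples (`pmult`) and the duplicate-free `merge`, can be sketched outside Elm. The following Python version is only an illustration under assumptions: a stream is modelled as a `(head, tail_thunk)` tuple in place of the `CIS` type, and the names `multiples`, `merge` and `take_upto` are invented for the sketch.

```python
def multiples(bp):
    """Lazy stream of the odd multiples of bp, starting at bp * bp."""
    adv = bp + bp

    def nxt(c):
        return (c, lambda: nxt(c + adv))

    return nxt(bp * bp)


def merge(xs, ys):
    """Merge two ascending streams, emitting common values only once."""
    x, xtl = xs
    y, ytl = ys
    if x < y:
        return (x, lambda: merge(xtl(), ys))
    if x > y:
        return (y, lambda: merge(xs, ytl()))
    return (x, lambda: merge(xtl(), ytl()))


def take_upto(n, stream):
    """Collect stream values up to and including n into a list."""
    out = []
    while stream[0] <= n:
        out.append(stream[0])
        stream = stream[1]()
    return out


print(take_upto(50, merge(multiples(3), multiples(5))))
```

Nothing beyond the requested range is ever computed, because each tail is a function that is only called when the next value is needed.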

Elm Priority Queue Version

Using a Binary Minimum Heap Priority Queue is a constant factor faster than the above code because the data structure is balanced rather than "heavy to the right" and requires fewer memory allocations/deallocations. The following code implements just enough of the Priority Queue for the purpose. Substitute it for `primesTreeFolding` and pass `primesPQ` as an argument to `test` rather than `primesTreeFolding`:

type PriorityQ comparable v =
  Mt
  | Br comparable v (PriorityQ comparable v)
                    (PriorityQ comparable v)

emptyPQ : PriorityQ comparable v
emptyPQ = Mt

peekMinPQ : PriorityQ comparable v -> Maybe (comparable, v)
peekMinPQ  pq = case pq of
                  (Br k v _ _) -> Just (k, v)
                  Mt -> Nothing

pushPQ : comparable -> v -> PriorityQ comparable v
           -> PriorityQ comparable v
pushPQ wk wv pq =
  case pq of
    Mt -> Br wk wv Mt Mt
    (Br vk vv pl pr) -> 
      if wk <= vk then Br wk wv (pushPQ vk vv pr) pl
      else Br vk vv (pushPQ wk wv pr) pl

siftdown : comparable -> v -> PriorityQ comparable v
             -> PriorityQ comparable v -> PriorityQ comparable v
siftdown wk wv pql pqr =
  case pql of
    Mt -> Br wk wv Mt Mt
    (Br vkl vvl pll prl) ->
      case pqr of
        Mt -> if wk <= vkl then Br wk wv pql Mt
              else Br vkl vvl (Br wk wv Mt Mt) Mt
        (Br vkr vvr plr prr) ->
          if wk <= vkl && wk <= vkr then Br wk wv pql pqr
          else if vkl <= vkr then Br vkl vvl (siftdown wk wv pll prl) pqr
               else Br vkr vvr pql (siftdown wk wv plr prr)

replaceMinPQ : comparable -> v -> PriorityQ comparable v
                 -> PriorityQ comparable v
replaceMinPQ wk wv pq = case pq of
                          Mt -> Mt
                          (Br _ _ pl pr) -> siftdown wk wv pl pr

primesPQ : () -> CIS Int
primesPQ() =
  let    
    sieve n pq q (CIS bp bptl as bps) =
      if n >= q then
        let adv = bp + bp in let (CIS nbp _ as nbps) = bptl()
        in sieve (n + 2) (pushPQ (q + adv) adv pq) (nbp * nbp) nbps
      else let
             (nxtc, _) = peekMinPQ pq |> Maybe.withDefault (q, 0) -- default when empty
             adjust tpq =
               let (c, adv) = peekMinPQ tpq |> Maybe.withDefault (0, 0)
               in if c > n then tpq
                  else adjust (replaceMinPQ (c + adv) adv tpq)
           in if n >= nxtc then sieve (n + 2) (adjust pq) q bps
              else CIS n <| \ () -> sieve (n + 2) pq q bps
    oddprms() = CIS 3 <| \ () -> sieve 5 emptyPQ 9 <| oddprms()
  in CIS 2 <| \ () -> oddprms()
Output:
The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 124 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).
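
For comparison, the same scheme can be sketched in Python with the standard `heapq` module standing in for the hand-written heap. This is an illustrative sketch, not a translation: it uses a generator instead of the `CIS` type, and `primes_pq` is a name chosen for the example. As in the Elm code, a base prime only enters the queue once the candidate `n` reaches its square `q`, and each queue entry holds the next composite together with its step.

```python
import heapq
from itertools import islice, takewhile


def primes_pq():
    """Unbounded odds-only sieve driven by a min-heap of (next_composite, step)."""
    yield 2
    yield 3
    base = primes_pq()        # secondary stream that supplies the base primes
    next(base)                # skip 2; even numbers are never candidates
    bp = next(base)           # first base prime, 3
    q = bp * bp               # square of the current base prime
    heap = []
    n = 5
    while True:
        if n >= q:
            # n is the square of bp: start culling multiples of bp from here
            adv = bp + bp
            heapq.heappush(heap, (q + adv, adv))
            bp = next(base)
            q = bp * bp
        elif heap and heap[0][0] <= n:
            # n is composite: advance every entry that has reached n
            while heap[0][0] <= n:
                c, adv = heap[0]
                heapq.heapreplace(heap, (c + adv, adv))
        else:
            yield n
        n += 2


print(list(islice(primes_pq(), 10)))
print(sum(1 for _ in takewhile(lambda p: p <= 1000, primes_pq())))
```

`heapq.heapreplace` plays the role of `replaceMinPQ`: it pops the minimum and pushes the advanced entry in one operation.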

Emacs Lisp

Library: cl-lib
(defun sieve-set (limit)
  (let ((xs (make-vector (1+ limit) 0)))
    (cl-loop for i from 2 to limit
             when (zerop (aref xs i))
             collect i
             and do (cl-loop for m from (* i i) to limit by i
                             do (aset xs m 1)))))

Straightforward implementation of sieve of Eratosthenes, 2 times faster:

(defun sieve (limit)
  (let ((xs (vconcat [0 0] (number-sequence 2 limit))))
    (cl-loop for i from 2 to (sqrt limit)
             when (/= 0 (aref xs i)) ; skip composites (0 is non-nil in Emacs Lisp)
             do (cl-loop for m from (* i i) to limit by i
                         do (aset xs m 0)))
    (remove 0 xs)))

Erlang

Erlang using Dicts

-module( sieve_of_eratosthenes ).

-export( [primes_upto/1] ).

primes_upto( N ) ->
	Ns = lists:seq( 2, N ),
	Dict = dict:from_list( [{X, potential_prime} || X <- Ns] ),
	{Upto_sqrt_ns, _T} = lists:split( erlang:round(math:sqrt(N)), Ns ),
	{N, Prime_dict} = lists:foldl( fun find_prime/2, {N, Dict}, Upto_sqrt_ns ),
	lists:sort( dict:fetch_keys(Prime_dict) ).



find_prime( N, {Max, Dict} ) -> find_prime( dict:find(N, Dict), N, {Max, Dict} ).

find_prime( error, _N, Acc ) -> Acc;
find_prime( {ok, _Value}, N, {Max, Dict} ) when Max >= N*N ->
    {Max, lists:foldl( fun dict:erase/2, Dict, lists:seq(N*N, Max, N))};
find_prime( {ok, _Value}, _, R) -> R.
Output:
35> sieve_of_eratosthenes:primes_upto( 20 ).
[2,3,5,7,11,13,17,19]
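
The dictionary approach (start with every candidate as a key, then erase the multiples of each surviving key up to the square root) can be sketched in Python as follows. This is an assumption-laden illustration rather than a translation of the Erlang module; `primes_upto` merely reuses the Erlang function's name. Note that the culling range must include the limit itself, which matters when the limit is the square of a prime such as 9 or 25.

```python
import math


def primes_upto(n):
    """Dictionary-based sieve: every key still present at the end is prime."""
    candidates = {x: "potential_prime" for x in range(2, n + 1)}
    for p in range(2, math.isqrt(n) + 1):
        if p in candidates:
            # range(p * p, n + 1, p) includes n itself when n == p * p
            for m in range(p * p, n + 1, p):
                candidates.pop(m, None)
    return sorted(candidates)


print(primes_upto(20))
print(primes_upto(25))
```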

Erlang Lists of Tuples, Sloww

A much slower, perverse method using only lists of tuples. Especially evil is the P = lists:filtermap operation, which yields a list for every iteration of the X * M row. It has the virtue of working for any N :)

-module( sieve ).                                                                                                    
-export( [main/1,primes/2] ).                                                                                        
                                                                                                                     
main(N) -> io:format("Primes: ~w~n", [ primes(2,N) ]).                                                               
                                                                                                                     
primes(M,N) -> primes(M, N,lists:seq( M, N ),[]).                                                                    
                                                                                                                     
primes(M,N,_Acc,Tuples) when M > N/2-> out(Tuples);                                                                  

primes(M,N,Acc,Tuples) when length(Tuples) < 1 -> 
        primes(M,N,Acc,[{X, X} || X <- Acc]);                              

primes(M,N,Acc,Tuples) ->                                                                                            
        {SqrtN, _T} = lists:split( erlang:round(math:sqrt(N)), Acc ),                                                
        F = Tuples,                                                                                                  
        Ms = lists:filtermap(fun(X) -> if X > 0 -> {true, X * M}; true -> false end end, SqrtN),                     
        P = lists:filtermap(fun(T) -> 
            case lists:keymember(T,1,F) of true -> 
            {true, lists:keyreplace(T,1,F,{T,0})}; 
             _-> false end end,  Ms),                                                                                              
        AA = mergeT(P,lists:last(P),1 ),                                                                             
        primes(M+1,N,Acc,AA).                                                                                        
                                                                                                                     
mergeT(L,M,Acc) when Acc == length(L) -> M;                                                                          
mergeT(L,M,Acc) ->                                                                                                   
        A = lists:nth(Acc,L),                                                                                        
        B = M,                                                                                                       
        Mer = lists:zipwith(fun(X, Y) -> if X < Y -> X; true -> Y end end, A, B),                                    
        mergeT(L,Mer,Acc+1).                                                                                         
                                                                                                                     
out(Tuples) ->                                                                                                       
        Primes = lists:filter( fun({_,Y}) -> Y > 0 end,  Tuples),                                                    
        [ X || {X,_} <- Primes ].
Output:
109> sieve:main(20).
Primes: [2,3,5,7,11,13,17,19]
ok
110> timer:tc(sieve, main, [20]).        
Primes: [2,3,5,7,11,13,17,19]
{129,ok}

Erlang with ordered sets

Since I had written a really odd and slow one, I thought I'd best do a better performer. Inspired by an example from https://github.com/jupp0r

-module(ossieve).
-export([main/1]).

sieve(Candidates,SearchList,Primes,_Maximum) when length(SearchList) == 0 ->
    ordsets:union(Primes,Candidates);
sieve(Candidates,SearchList,Primes,Maximum)  ->
     H = lists:nth(1,string:substr(Candidates,1,1)),
     Reduced1 = ordsets:del_element(H, Candidates),
     {Reduced2, ReducedSearch} = remove_multiples_of(H, Reduced1, SearchList),
     NewPrimes = ordsets:add_element(H,Primes),
     sieve(Reduced2, ReducedSearch, NewPrimes, Maximum).

remove_multiples_of(Number,Candidates,SearchList) ->                                 
    NewSearchList = ordsets:filter( fun(X) -> X >= Number * Number end, SearchList), 
    RemoveList = ordsets:filter( fun(X) -> X rem Number == 0 end, NewSearchList),
    {ordsets:subtract(Candidates, RemoveList), ordsets:subtract(NewSearchList, RemoveList)}.

main(N) ->      
    io:fwrite("Creating Candidates...~n"),
    CandidateList = lists:seq(3,N,2),
    Candidates = ordsets:from_list(CandidateList),
    io:fwrite("Sieving...~n"),
    ResultSet = ordsets:add_element(2,sieve(Candidates,Candidates,ordsets:new(),N)),
    io:fwrite("Sieved... ~w~n",[ResultSet]).
Output:
36> ossieve:main(100).
Creating Candidates...
Sieving...
Sieved... [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok

Erlang Canonical

A pure list comprehension approach.

-module(sieveof).
-export([main/1,primes/1, primes/2]).                 
                                                      
main(X) -> io:format("Primes: ~w~n", [ primes(X) ]).  
                                 
primes(X) -> sieve(range(2, X)).                                         
primes(X, Y) -> remove(primes(X), primes(Y)).                            
                                                                         
range(X, X) -> [X];                                                      
range(X, Y) -> [X | range(X + 1, Y)].                                    
                                                                         
sieve([X]) -> [X];                                                       
sieve([H | T]) -> [H | sieve(remove([H * X || X <-[H | T]], T))].        
                                                                         
remove(_, []) -> [];                                                     
remove([H | X], [H | Y]) -> remove(X, Y);                                
remove(X, [H | Y]) -> [H | remove(X, Y)].

Output:

> timer:tc(sieve, main, [100]). 
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{7350,ok}
61> timer:tc(sieveof, main, [100]). 
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{363,ok}

Clearly not only more elegant, but faster :) Thanks to http://stackoverflow.com/users/113644/g-b

Erlang ets + cpu distributed implementation

Much faster than the previous Erlang examples.

#!/usr/bin/env escript
%% -*- erlang -*-
%%! -smp enable -sname p10_4
% vim:syn=erlang

-mode(compile).

main([N0]) ->
    N = list_to_integer(N0),
    ets:new(comp, [public, named_table, {write_concurrency, true} ]),
    ets:new(prim, [public, named_table, {write_concurrency, true}]),
    composite_mc(N),
    primes_mc(N),
    io:format("Answer: ~p ~n", [lists:sort([X||{X,_}<-ets:tab2list(prim)])]).

primes_mc(N) ->
    case erlang:system_info(schedulers) of
        1 -> primes(N);
        C -> launch_primes(lists:seq(1,C), C, N, N div C)
    end.
launch_primes([1|T], C, N, R) -> P = self(), spawn(fun()-> primes(2,R), P ! {ok, prm} end), launch_primes(T, C, N, R);
launch_primes([H|[]], C, N, R)-> P = self(), spawn(fun()-> primes(R*(H-1)+1,N), P ! {ok, prm} end), wait_primes(C);
launch_primes([H|T], C, N, R) -> P = self(), spawn(fun()-> primes(R*(H-1)+1,R*H), P ! {ok, prm} end), launch_primes(T, C, N, R).

wait_primes(0) -> ok;
wait_primes(C) ->
    receive
        {ok, prm} -> wait_primes(C-1)
    after 1000    -> wait_primes(C)
    end.

primes(N) -> primes(2, N).
primes(I,N) when I =< N ->
    case ets:lookup(comp, I) of
        [] -> ets:insert(prim, {I,1})
        ;_ -> ok
    end,
    primes(I+1, N);
primes(I,N) when I > N -> ok.


composite_mc(N) -> composite_mc(N,2,round(math:sqrt(N)),erlang:system_info(schedulers)).
composite_mc(N,I,M,C) when I =< M, C > 0 ->
    C1 = case ets:lookup(comp, I) of
        [] -> comp_i_mc(I*I, I, N), C-1
        ;_ -> C
    end,
    composite_mc(N,I+1,M,C1);
composite_mc(_,I,M,_) when I > M -> ok;
composite_mc(N,I,M,0) ->
    receive
        {ok, cim} -> composite_mc(N,I,M,1)
    after 1000    -> composite_mc(N,I,M,0)
    end.

comp_i_mc(J, I, N) -> 
    Parent = self(),
    spawn(fun() ->
        comp_i(J, I, N),
        Parent ! {ok, cim}
    end).

comp_i(J, I, N) when J =< N -> ets:insert(comp, {J, 1}), comp_i(J+I, I, N);
comp_i(J, _, N) when J > N -> ok.
Output:
mkh@mkh-xps:~/work/mblog/pr_euler/p10$ ./generator.erl 100
Answer: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
         97]

Several other Erlang implementations: http://mijkenator.github.io/2015/11/29/project-euler-problem-10/

ERRE

PROGRAM SIEVE_ORG
  ! --------------------------------------------------
  ! Eratosthenes Sieve Prime Number Program in BASIC
  ! (from 3 to SIZE*2)   from Byte September 1981
  !---------------------------------------------------
  CONST SIZE%=8190

  DIM FLAGS%[SIZE%]

BEGIN
  PRINT("Only 1 iteration")
  COUNT%=0
  FOR I%=0 TO SIZE% DO
     IF FLAGS%[I%]=TRUE THEN
         !$NULL
       ELSE
         PRIME%=I%+I%+3
         K%=I%+PRIME%
         WHILE NOT (K%>SIZE%) DO
            FLAGS%[K%]=TRUE
            K%=K%+PRIME%
         END WHILE
         PRINT(PRIME%;)
         COUNT%=COUNT%+1
     END IF
  END FOR
  PRINT
  PRINT(COUNT%;" PRIMES")
END PROGRAM
Output:

last lines of the output screen

 15749  15761  15767  15773  15787  15791  15797  15803  15809  15817  15823 
 15859  15877  15881  15887  15889  15901  15907  15913  15919  15923  15937 
 15959  15971  15973  15991  16001  16007  16033  16057  16061  16063  16067 
 16069  16073  16087  16091  16097  16103  16111  16127  16139  16141  16183 
 16187  16189  16193  16217  16223  16229  16231  16249  16253  16267  16273 
 16301  16319  16333  16339  16349  16361  16363  16369  16381 
 1899  PRIMES

Euler

The original Euler doesn't have loops built-in. Loops can easily be added by defining and calling suitable procedures with literal procedures as parameters. In this sample, a C-style "for" loop procedure is defined and used to sieve and print the primes.

begin
    new sieve; new for; new prime; new i;

    for   <- ` formal init; formal test; formal incr; formal body;
               begin
                 label again;
                 init;
again:           if test then begin body; incr; goto again end else 0
               end
             '
           ;

    sieve <- ` formal n;
               begin
                 new primes; new i; new i2; new j;
                 primes <- list n;
                 for( ` i <- 1 ', ` i <= n ', ` i <- i + 1 '
                    , ` primes[ i ] <- true '
                    );
                 primes[ 1 ] <- false;
                 for( ` i <- 2 '
                    , ` [ i2 <- i * i ] <= n '
                    , ` i <- i + 1 '
                    , ` if primes[ i ] then
                          for( ` j <- i2 ', ` j <= n ', ` j <- j + i '
                             , ` primes[ j ] <- false '
                             )
                        else 0
                      '
                    );
                 primes
               end
             '
           ;

    prime <- sieve( 30 );
    for( ` i <- 1 ', ` i <= length prime ', ` i <- i + 1 '
       , ` if prime[ i ] then out i else 0 '
       )

end $
Output:
    NUMBER                   2
    NUMBER                   3
    NUMBER                   5
    NUMBER                   7
    NUMBER                  11
    NUMBER                  13
    NUMBER                  17
    NUMBER                  19
    NUMBER                  23
    NUMBER                  29
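
The technique of building a loop out of procedure parameters can be sketched in Python, where parameterless closures play the part of Euler's literal procedures. This is only an illustration: Python's `while` stands in for the `label`/`goto` pair, variables live in a dictionary so that the closures can assign to them, and all names are chosen for the example.

```python
def for_loop(init, test, incr, body):
    """C-style loop assembled from four parameterless procedures."""
    init()
    while test():          # stands in for Euler's "again:" label and goto
        body()
        incr()


def sieve(n):
    """Return a 1-based flag list: flags[i] is True when i is prime (assumes n >= 1)."""
    flags = [False] + [True] * n   # slot 0 is padding to keep 1-based indexing
    flags[1] = False
    var = {"i": 0, "j": 0}

    def assign(name, value):
        var[name] = value

    def clear_multiple():
        flags[var["j"]] = False

    def cull_if_prime():
        if flags[var["i"]]:
            for_loop(lambda: assign("j", var["i"] * var["i"]),
                     lambda: var["j"] <= n,
                     lambda: assign("j", var["j"] + var["i"]),
                     clear_multiple)

    for_loop(lambda: assign("i", 2),
             lambda: var["i"] * var["i"] <= n,
             lambda: assign("i", var["i"] + 1),
             cull_if_prime)
    return flags


prime = sieve(30)
found = [i for i in range(1, len(prime)) if prime[i]]
print(found)
```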

Euphoria

constant limit = 1000
sequence flags,primes
flags = repeat(1, limit)
for i = 2 to sqrt(limit) do
    if flags[i] then
        for k = i*i to limit by i do
            flags[k] = 0
        end for
    end if
end for

primes = {}
for i = 2 to limit do
    if flags[i] = 1 then
        primes &= i
    end if
end for
? primes

Output:

{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,
181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,
277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,
383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,
487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,
601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,
709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,
827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997}

F#

Short with mutable state

let primes max =
    let mutable xs = [|2..max|]
    let limit = max |> float |> sqrt |> int
    for x in [|2..limit|] do
        xs <- xs |> Array.except [|x*x..x..max|]
    xs

Short Sweet Functional and Idiomatic

Well lists may not be lazy, but if you call it a sequence then it's a lazy list!

(*
  An interesting implementation of The Sieve of Eratosthenes.
  Nigel Galloway April 7th., 2017.
*)
let SofE =
  let rec fn n g = seq{ match n with
                        |1 -> yield false; yield! fn g g 
                        |_ -> yield  true; yield! fn (n - 1) g}
  let rec fg ng = seq {
    let g = (Seq.findIndex(id) ng) + 2 // decreasingly inefficient with range at O(n)!
    yield g; yield! fn (g - 1) g |> Seq.map2 (&&) ng |> Seq.cache |> fg }
  Seq.initInfinite (fun x -> true) |> fg
Output:
> SofE |> Seq.take 10 |> Seq.iter(printfn "%d");;
2
3
5
7
11
13
17
19
23
29

Although intellectually interesting, and although the algorithm is more Sieve of Eratosthenes (SoE) than not in that it culls using progressions of composite numbers separated by base prime gaps, it doesn't perform like a SoE. Several of the functions used aren't linear with range, such as "findIndex", which scans from the beginning of all primes to find the next un-culled value as the next prime in the sequence, and F# nested sequence generation is generally slow and inefficient.

It is so slow that it takes on the order of seconds just to find the primes to a thousand!

For practical use, one would be much better served by any of the other functional sieves below, which can sieve to a million in less time than it takes this one to sieve to ten thousand. Those other functional sieves aren't all that many more lines of code than this one.

Functional

Richard Bird Sieve

This is the idea behind Richard Bird's unbounded code presented in the Epilogue of M. O'Neill's article in Haskell. It is about twice as much code as the Haskell code because F# does not have a built-in lazy list, so the effect must be constructed using a Co-Inductive Stream (CIS) type (sufficient here since no memoization is required), along with the use of recursive functions in combination with sequences. The type inference needs some help with the new CIS type (including selecting the generic type for speed). Note the use of recursive functions to implement multiple non-sharing delayed generating base primes streams; together with these being non-memoizing, this means that the entire primes stream is not held in memory as it is for the original Bird code:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesBird() =
  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
    if x < y then CIS(x, fun() -> xtlf() ^^ ys)
    elif y < x then CIS(y, fun() -> xs ^^ ytlf())
    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
  let pmltpls p = let rec nxt c = CIS(c, fun() -> nxt (c + p)) in nxt (p * p)
  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
    CIS(c, fun() -> (ctlf()) ^^ (cmpsts (amstlf())))
  let rec minusat n (CIS(c, ctlf) as cs) =
    if n < c then CIS(n, fun() -> minusat (n + 1u) cs)
    else minusat (n + 1u) (ctlf())
  let rec baseprms() = CIS(2u, fun() -> baseprms() |> allmltps |> cmpsts |> minusat 3u)
  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (baseprms())

The above code sieves all numbers of two and up including all even numbers as per the page specification; the following code makes the very minor changes for an odds-only sieve, with a speedup of over a factor of two:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesBirdOdds() =
  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
    if x < y then CIS(x, fun() -> xtlf() ^^ ys)
    elif y < x then CIS(y, fun() -> xs ^^ ytlf())
    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
  let pmltpls p = let adv = p + p
                  let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
    CIS(c, fun() -> ctlf() ^^ cmpsts (amstlf()))
  let rec minusat n (CIS(c, ctlf) as cs) =
    if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
    else minusat (n + 2u) (ctlf())
  let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))

Tree Folding Sieve

The above code is still somewhat inefficient as it operates on a linear right extending structure that deepens linearly with increasing base primes (those up to the square root of the currently sieved number); the following code changes the structure into an infinite binary tree-like folding by combining each pair of prime composite streams before further processing as usual - this decreases the processing by approximately a factor of log n:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesTreeFold() =
  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
    if x < y then CIS(x, fun() -> xtlf() ^^ ys)
    elif y < x then CIS(y, fun() -> xs ^^ ytlf())
    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
  let pmltpls p = let adv = p + p
                  let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
  let rec pairs (CIS(cs0, cs0tlf)) =
    let (CIS(cs1, cs1tlf)) = cs0tlf() in CIS(cs0 ^^ cs1, fun() -> pairs (cs1tlf()))
  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
    CIS(c, fun() -> ctlf() ^^ (cmpsts << pairs << amstlf)())
  let rec minusat n (CIS(c, ctlf) as cs) =
    if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
    else minusat (n + 2u) (ctlf())
  let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))

The above code is over four times faster than the "BirdOdds" version (at least 10x faster than the first, "primesBird", producing the millionth prime) and is moderately useful for a range of the first million primes or so.
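
The effect of pairing can be seen on a small finite example. The Python sketch below is a simplified analogue, not the infinite structure used above: it folds a finite list of finite ascending lists, merging neighbours pairwise and counting the rounds, so that `k` inputs need about `log2 k` levels of merging instead of the `k - 1` nested merges of a linear right fold. The names `merge_unique` and `fold_pairs` are invented for the illustration.

```python
from heapq import merge
from itertools import groupby


def merge_unique(xs, ys):
    """Merge two ascending lists, keeping a single copy of common values."""
    return [k for k, _ in groupby(merge(xs, ys))]


def fold_pairs(lists):
    """Repeatedly merge neighbouring lists; return (merged_list, rounds_needed)."""
    rounds = 0
    while len(lists) > 1:
        lists = [merge_unique(lists[i], lists[i + 1]) if i + 1 < len(lists)
                 else lists[i]
                 for i in range(0, len(lists), 2)]
        rounds += 1
    return (lists[0] if lists else []), rounds


# Odd multiples of the base primes 3, 5 and 7, each starting at the square.
streams = [list(range(p * p, 101, 2 * p)) for p in (3, 5, 7)]
composites, rounds = fold_pairs(streams)
print(composites)
print(rounds)
```

The merged result is the list of odd composites up to 100, and three input lists are combined in two rounds.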

Priority Queue Sieve

In order to investigate Priority Queue Sieves as espoused by O'Neill in the referenced article, one must find an equivalent implementation of a Min Heap Priority Queue as used by her. There is such a purely functional implementation in RosettaCode translated from the Haskell code she used, from which the essential parts are duplicated here (note that the key value is given an integer type in order to avoid the inefficiency of F# in generic comparison):

[<RequireQualifiedAccess>]
module MinHeap =

  type HeapEntry<'V> = struct val k:uint32 val v:'V new(k,v) = {k=k;v=v} end
  [<CompilationRepresentation(CompilationRepresentationFlags.UseNullAsTrueValue)>]
  [<NoEquality; NoComparison>]
  type PQ<'V> =
         | Mt
         | Br of HeapEntry<'V> * PQ<'V> * PQ<'V>

  let empty = Mt

  let peekMin = function | Br(kv, _, _) -> Some(kv.k, kv.v)
                         | _            -> None

  let rec push wk wv = 
    function | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)
             | Br(vkv, ll, rr) ->
                 if wk <= vkv.k then
                   Br(HeapEntry(wk, wv), push vkv.k vkv.v rr, ll)
                 else Br(vkv, push wk wv rr, ll)

  let private siftdown wk wv pql pqr =
    let rec sift pl pr =
      match pl with
        | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)
        | Br(vkvl, pll, plr) ->
            match pr with
              | Mt -> if wk <= vkvl.k then Br(HeapEntry(wk, wv), pl, Mt)
                      else Br(vkvl, Br(HeapEntry(wk, wv), Mt, Mt), Mt)
              | Br(vkvr, prl, prr) ->
                  if wk <= vkvl.k && wk <= vkvr.k then Br(HeapEntry(wk, wv), pl, pr)
                  elif vkvl.k <= vkvr.k then Br(vkvl, sift pll plr, pr)
                  else Br(vkvr, pl, sift prl prr)
    sift pql pqr                                        

  let replaceMin wk wv = function | Mt -> Mt
                                  | Br(_, ll, rr) -> siftdown wk wv ll rr

Except as noted for any individual code, all of the following codes need the following prefix code in order to implement the non-memoizing Co-Inductive Streams (CIS's) and to set the type of particular constants used in the codes to the same type as the "Prime" type:

type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint32
let frstprm = 2u
let frstoddprm = 3u
let inc1 = 1u
let inc = 2u

The F# equivalent to O'Neill's "odds-only" code is then implemented as follows. It needs the included changed prefix in order to change the primes type to a larger one to prevent overflow (as well, the key type for the MinHeap needs to be changed from uint32 to uint64). It is functionally the same as the O'Neill code other than for minor changes to suit the use of CIS streams and the option output of the "peekMin" function:

type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint64
let frstprm = 2UL
let frstoddprm = 3UL
let inc = 2UL

let primesPQ() =
  let pmult p (xs: CIS<Prime>) = // does map (* p) xs
    let rec nxtm (cs: CIS<Prime>) =
      CIS(p * cs.v, fun() -> nxtm (cs.cont())) in nxtm xs
  let insertprime p xs table =
    MinHeap.push (p * p) (pmult p xs) table
  let rec sieve' (ns: CIS<Prime>) table =
    let nextcomposite = match MinHeap.peekMin table with
                          | None -> ns.v // never happens
                          | Some (k, _) -> k
    let rec adjust table =
      let (n, advs) = match MinHeap.peekMin table with
                        | None -> (ns.v, ns.cont()) // never happens
                        | Some kv -> kv
      if n <= ns.v then adjust (MinHeap.replaceMin advs.v (advs.cont()) table)
      else table
    if nextcomposite <= ns.v then sieve' (ns.cont()) (adjust table)
    else let n = ns.v in CIS(n, fun() ->
           let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns table))
  let rec sieve (ns: CIS<Prime>) = let n = ns.v in CIS(n, fun() ->
      let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns MinHeap.empty))
  let odds = // is the odds CIS from 3 up
    let rec nxto i = CIS(i, fun() -> nxto (i + inc)) in nxto frstoddprm
  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
             (CIS(frstprm, fun() -> (sieve odds)))

However, that algorithm suffers in speed and memory use due to over-eager adding of prime composite streams to the queue: the queue used is much larger than it needs to be, and a much larger Prime number type must be used in order to avoid numeric overflow on the square of the prime added to the queue. The following code corrects that by using a secondary base primes stream (actually multiple such streams), constrained to base primes no larger than the square root of the currently sieved number. This permits the use of much smaller Prime types as per the default prefix:

let primesPQx() =
  let rec nxtprm n pq q (bps: CIS<Prime>) =
    if n >= q then let bp = bps.v in let adv = bp + bp
                   let nbps = bps.cont() in let nbp = nbps.v
                   nxtprm (n + inc) (MinHeap.push (n + adv) adv pq) (nbp * nbp) nbps
    else let ck, cv = match MinHeap.peekMin pq with
                        | None -> (q, inc) // only happens until first insertion
                        | Some kv -> kv
         if n >= ck then let rec adjpq ck cv pq =
                             let npq = MinHeap.replaceMin (ck + cv) cv pq
                             match MinHeap.peekMin npq with
                               | None -> npq // never happens
                               | Some(nk, nv) -> if n >= nk then adjpq nk nv npq
                                                 else npq
                         nxtprm (n + inc) (adjpq ck cv pq) q bps
         else CIS(n, fun() -> nxtprm (n + inc) pq q bps)
  let rec oddprms() = CIS(frstoddprm, fun() ->
      nxtprm (frstoddprm + inc) MinHeap.empty (frstoddprm * frstoddprm) (oddprms()))
  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
             (CIS(frstprm, fun() -> (oddprms())))

The above code is well over five times faster than the previous translated O'Neill version, for the reasons given above.
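
As an illustration of the postponed insertion described above, here is a minimal Python sketch (not a translation of the F# code). It assumes the standard `heapq` module as the priority queue; the name `primes_pq` and its internal variable names are hypothetical. Each base prime's multiples are pushed onto the heap only when the candidate reaches that prime's square, and the base primes come from a second, independent instance of the same generator.

```python
import heapq
from itertools import islice

def primes_pq():
    """Unbounded odds-only sieve; a prime's multiples enter the heap
    only when the candidate reaches that prime's square."""
    yield 2
    yield 3
    base = primes_pq()      # secondary, independent stream of base primes
    next(base)              # skip 2: only odd candidates are sieved
    bp = next(base)         # current base prime, starting with 3
    q = bp * bp             # square of the current base prime
    heap = []               # entries are (next odd multiple, step)
    n = 5
    while True:
        if n >= q:
            # n is the square of bp: start culling by bp from its next odd multiple
            heapq.heappush(heap, (q + 2 * bp, 2 * bp))
            bp = next(base)
            q = bp * bp
        elif heap and heap[0][0] <= n:
            # n is composite: advance every stream currently sitting on n
            while heap[0][0] <= n:
                c, step = heap[0]
                heapq.heapreplace(heap, (c + step, step))
        else:
            yield n
        n += 2

print(list(islice(primes_pq(), 10)))
```

Because a prime is only added at its square, the heap holds one entry per prime up to the square root of the current candidate rather than one entry per prime found so far.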

Although slightly faster than the Tree Folding code, this latter code is also limited in practical usefulness to about the first one to ten million primes or so.

All of the above codes can be tested in the F# REPL with the following to produce the millionth prime (the "nth" function is zero based):

> primesXXX() |> Seq.nth 999999;;

where primesXXX() is replaced by the given primes generating function to be tested, and which all produce the following output (after a considerable wait in some cases):

Output:
val it : Prime = 15485863u

Imperative

The following code is written in functional style except that it uses a mutable bit array to sieve the composites:

let primes limit =
  let buf = System.Collections.BitArray(int limit + 1, true)
  let cull p = { p * p .. p .. limit } |> Seq.iter (fun c -> buf.[int c] <- false)
  { 2u .. uint32 (sqrt (double limit)) } |> Seq.iter (fun c -> if buf.[int c] then cull c)
  { 2u .. limit } |> Seq.map (fun i -> if buf.[int i] then i else 0u) |> Seq.filter ((<>) 0u)

[<EntryPoint>]
let main argv =
  if argv = null || argv.Length = 0 then failwith "no command line argument for limit!!!"
  printfn "%A" (primes (System.UInt32.Parse argv.[0]) |> Seq.length)
  0 // return an integer exit code

Substituting the following version of the "primes" function deals only with the odd prime candidates, for a speed-up of over a factor of two as well as a reduction of the buffer size by a factor of two:

let primes limit =
  let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
  let buf = System.Collections.BitArray(int lmtb + 1, true)
  let cull i = let p = i + i + 3u in let s = p * (i + 1u) + i in
               { s .. p .. lmtb } |> Seq.iter (fun c -> buf.[int c] <- false)
  { 0u .. lmtbsqrt } |> Seq.iter (fun i -> if buf.[int i] then cull i )
  let oddprms = { 0u .. lmtb } |> Seq.map (fun i -> if buf.[int i] then i + i + 3u else 0u)
                |> Seq.filter ((<>) 0u)
  seq { yield 2u; yield! oddprms }
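
The index arithmetic used by the odds-only version can be sketched in Python as follows. This is only an illustrative sketch: the function name `primes_odds_only` is hypothetical, and a `bytearray` with one byte per flag stands in for the BitArray. Index i represents the odd number p = 2i + 3, and the index of p squared is (p*p - 3) / 2, which equals p * (i + 1) + i.

```python
from math import isqrt

def primes_odds_only(limit):
    """Bounded odds-only sieve: index i stands for the odd number 2*i + 3."""
    if limit < 2:
        return []
    if limit < 3:
        return [2]
    top = (limit - 3) // 2                 # index of the largest odd number <= limit
    composite = bytearray(top + 1)         # 0 means "still a prime candidate"
    for i in range((isqrt(limit) - 3) // 2 + 1):
        if not composite[i]:
            p = 2 * i + 3
            start = p * (i + 1) + i        # index of p*p, i.e. (p*p - 3) // 2
            # a step of p in index space is a step of 2*p in number space
            for c in range(start, top + 1, p):
                composite[c] = 1
    return [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]

print(primes_odds_only(100))
```

Stepping by p indices skips the even multiples entirely, which is where both the halved buffer and the halved culling work come from.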

The following code uses other functional forms for the inner culling loops of the "primes" function to reduce the use of inefficient sequences so as to reduce the execution time by another factor of almost three:

let primes limit =
  let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
  let buf = System.Collections.BitArray(int lmtb + 1, true)
  let rec culltest i = if i <= lmtbsqrt then
                         let p = i + i + 3u in let s = p * (i + 1u) + i in
                         let rec cullp c = if c <= lmtb then buf.[int c] <- false; cullp (c + p)
                         (if buf.[int i] then cullp s); culltest (i + 1u) in culltest 0u
  seq {yield 2u; for i = 0u to lmtb do if buf.[int i] then yield i + i + 3u }

Now much of the remaining execution time is just the time to enumerate the primes as can be seen by turning "primes" into a primes counting function by substituting the following for the last line in the above code doing the enumeration; this makes the code run about a further five times faster:

  let rec count i acc =
    if i > int lmtb then acc else if buf.[i] then count (i + 1) (acc + 1) else count (i + 1) acc
  count 0 1

Since the final enumeration of primes is the main remaining bottleneck, it is worth using a "roll-your-own" enumeration implemented as an object expression so as to avoid many inefficiencies of the built-in seq computation expression. Substituting the following code for the last line of the previous codes decreases the execution time by a factor of over three (compared with almost five for the counting-only version, so it is almost as fast):

  let nmrtr() =
    let i = ref -2
    let rec nxti() = i:=!i + 1;if !i <= int lmtb && not buf.[!i] then nxti() else !i <= int lmtb
    let inline curr() = if !i < 0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
                        else let v = uint32 !i in v + v + 3u
    { new System.Collections.Generic.IEnumerator<_> with
        member this.Current = curr()
      interface System.Collections.IEnumerator with
        member this.Current = box (curr())
        member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
        member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
      interface System.IDisposable with
        member this.Dispose() = () }
  { new System.Collections.Generic.IEnumerable<_> with
      member this.GetEnumerator() = nmrtr()
    interface System.Collections.IEnumerable with
      member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }

The various optimization techniques shown here can be used "jointly and severally" on any of the basic versions for various trade-offs between code complexity and performance. Not shown here are other techniques of making the sieve faster, including extending wheel factorization to much larger wheels such as 2/3/5/7, pre-culling the arrays, page segmentation, and multi-processing.

Almost functional Unbounded

The following odds-only implementations are written in an almost functional style, avoiding the use of mutability except for the contents of the data structures used to hold the state of the sieve and any mutability necessary to implement a "roll-your-own" IEnumerable iterator interface for speed.

Unbounded Dictionary (Mutable Hash Table) Based Sieve

The following code uses the DotNet Dictionary class instead of the above functional Priority Queue to implement the sieve; as average (amortized) hash table access is O(1) rather than O(log n) as for the priority queue, this implementation is slightly faster than the priority queue version for the first million primes and will always be faster for any range above some low range value:

type Prime = uint32
let frstprm = 2u
let frstoddprm = 3u
let inc = 2u
let primesDict() =
  let dct = System.Collections.Generic.Dictionary()
  let rec nxtprm n q (bps: CIS<Prime>) =
    if n >= q then let bp = bps.v in let adv = bp + bp
                   let nbps = bps.cont() in let nbp = nbps.v
                   dct.Add(n + adv, adv)
                   nxtprm (n + inc) (nbp * nbp) nbps
    else if dct.ContainsKey(n) then
           let adv = dct.[n]
           dct.Remove(n) |> ignore
//           let mutable nn = n + adv // ugly imperative code
//           while dct.ContainsKey(nn) do nn <- nn + adv
//           dct.Add(nn, adv)
           let rec nxtmt k = // advance to next empty spot
             if dct.ContainsKey(k) then nxtmt (k + adv)
             else dct.Add(k, adv) in nxtmt (n + adv)
           nxtprm (n + inc) q bps
         else CIS(n, fun() -> nxtprm (n + inc) q bps)
  let rec oddprms() = CIS(frstoddprm, fun() ->
      nxtprm (frstoddprm + inc) (frstoddprm * frstoddprm) (oddprms()))
  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
             (CIS(frstprm, fun() -> (oddprms())))

The above code uses functional forms of code (with the imperative style commented out to show how it could be done imperatively) and also uses a recursive non-sharing secondary source of base primes just as for the Priority Queue version. As for the functional codes, the Prime type can easily be changed to "uint64" for a wider range of sieving.

In spite of having true O(n log log n) Sieve of Eratosthenes computational complexity where n is the range of numbers to be sieved, the above code is still not particularly fast due to the time required to compute the hash values and manipulations of the hash table.
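
The hash table approach can be sketched in Python with a plain `dict`. This is a minimal sketch under stated assumptions, not a translation of the F# code: the name `primes_dict` is hypothetical, and the dictionary maps each upcoming odd composite to the step (twice the prime) that produced it. When a candidate is found in the table, its entry is moved forward to the next multiple that is not already occupied.

```python
from itertools import islice

def primes_dict():
    """Unbounded odds-only sieve keeping one dict entry per base prime."""
    yield 2
    yield 3
    base = primes_dict()    # secondary, independent stream of base primes
    next(base)              # skip 2: only odd candidates are sieved
    bp = next(base)         # current base prime, starting with 3
    q = bp * bp             # square of the current base prime
    steps = {}              # upcoming odd composite -> step (2 * prime)
    n = 5
    while True:
        if n >= q:
            step = 2 * bp                   # n is bp squared: start culling by bp
            bp = next(base)
            q = bp * bp
        elif n in steps:
            step = steps.pop(n)             # n is composite: move its entry on
        else:
            yield n
            n += 2
            continue
        k = n + step
        while k in steps:                   # advance to the next empty spot
            k += step
        steps[k] = step
        n += 2

print(list(islice(primes_dict(), 10)))
```

Each lookup and insertion is amortized constant time, but as noted above, the hashing itself is a significant constant cost per candidate.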

Unbounded Page-Segmented Bit-Packed Odds-Only Mutable Array Sieve

Note that the following code is used for the F# entry Extensible_prime_generator#Unbounded_Mutable_Array_Generator of the Extensible prime generator page.

All of the above unbounded implementations including the above Dictionary based version are quite slow due to their large constant factor computational overheads, making them more of an intellectual exercise than something practical, especially when larger sieving ranges are required. The following code implements an unbounded page segmented version of the sieve in not that many more lines of code, yet runs about 25 times faster than the Dictionary version for larger ranges of sieving such as to one billion; it uses functional forms without mutability other than for the contents of the arrays and the `primes` enumeration generator function that must use mutability for speed:

type Prime = float // use uint64/int64 for regular 64-bit F#
type private PrimeNdx = float // they are slow in JavaScript polyfills

let inline private prime n = float n // match these convenience conversions
let inline private primendx n = float n // with the types above!

let private cPGSZBTS = (1 <<< 14) * 8 // sieve buffer size in bits = CPUL1CACHE

type private SieveBuffer = uint8[]

/// a Co-Inductive Stream (CIS) of an "infinite" non-memoized series...
type private CIS<'T> = CIS of 'T * (unit -> CIS<'T>) //' apostrophe formatting adjustment

/// lazy list (memoized) series of base prime page arrays...
type private BasePrime = uint32
type private BasePrimeArr = BasePrime[]
type private BasePrimeArrs = BasePrimeArrs of BasePrimeArr * Option<Lazy<BasePrimeArrs>>

/// Masking array is faster than bit twiddle bit shifts!
let private cBITMASK = [| 1uy; 2uy; 4uy; 8uy; 16uy; 32uy; 64uy; 128uy |]

let private cullSieveBuffer lwi (bpas: BasePrimeArrs) (sb: SieveBuffer) =
  let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
  let rec loopbp (BasePrimeArrs(bpa, bpatl) as ibpas) i =
    if i >= bpa.Length then
      match bpatl with
      | None -> ()
      | Some lv -> loopbp lv.Value 0 else
    let bp = prime bpa.[i] in let bpndx = primendx ((bp - prime 3) / prime 2)
    let s = (bpndx * primendx 2) * (bpndx + primendx 3) + primendx 3 in let bpint = int bp
    if s <= lmti then
      let s0 = // page cull start address calculation...
        if s >= lwi then int (s - lwi) else
        let r = (lwi - s) % (primendx bp)
        if r = primendx 0 then 0 else int (bp - prime r)
      let slmt = min btlmt (s0 - 1 + (bpint <<< 3))
      let rec loopc c = // loop "unpeeling" is used so
        if c <= slmt then // a constant mask can be used over the inner loop
          let msk = cBITMASK.[c &&& 7]
          let rec loopw w =
            if w < sb.Length then sb.[w] <- sb.[w] ||| msk; loopw (w + bpint)
          loopw (c >>> 3); loopc (c + bpint)
      loopc s0; loopbp ibpas (i + 1) in loopbp bpas 0

/// fast Counting Look Up Table (CLUT) for pop counting...
let private cCLUT =
  let arr = Array.zeroCreate 65536
  let rec popcnt n cnt = if n > 0 then popcnt (n &&& (n - 1)) (cnt + 1) else uint8 cnt
  let rec loop i = if i < 65536 then arr.[i] <- popcnt i 0; loop (i + 1)
  loop 0; arr

let countSieveBuffer ndxlmt (sb: SieveBuffer): int =
  let lstw = (ndxlmt >>> 3) &&& -2
  let msk = (-2 <<< (ndxlmt &&& 15)) &&& 0xFFFF
  let inline cntem i m =
    int cCLUT.[int (((uint32 sb.[i + 1]) <<< 8) + uint32 sb.[i]) ||| m]
  let rec loop i cnt =
    if i >= lstw then cnt - cntem lstw msk else loop (i + 2) (cnt - cntem i 0)
  loop 0 ((lstw <<< 3) + 16)

/// a CIS series of pages from the given start index with the given SieveBuffer size,
/// and provided with a polymorphic converter function to produce
/// any type of result from the culled page parameters...
let rec private makePrimePages strtwi btsz
                               (cnvrtrf: PrimeNdx -> SieveBuffer -> 'T): CIS<'T> =
  let bpas = makeBasePrimes() in let sb = Array.zeroCreate (btsz >>> 3)
  let rec nxtpg lwi =
    Array.fill sb 0 sb.Length 0uy; cullSieveBuffer lwi bpas sb
    CIS(cnvrtrf lwi sb, fun() -> nxtpg (lwi + primendx btsz))
  nxtpg strtwi

/// secondary feed of lazy list of memoized pages of base primes...
and private makeBasePrimes(): BasePrimeArrs =
  let sb2bpa lwi (sb: SieveBuffer) =
    let bsbp = uint32 (primendx 3 + lwi + lwi)
    let arr = Array.zeroCreate <| countSieveBuffer 255 sb
    let rec loop i j =
      if i < 256 then
        if sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy then loop (i + 1) j
        else arr.[j] <- bsbp + uint32 (i + i); loop (i + 1) (j + 1)
    loop 0 0; arr
  // finding the first page as not part of the loop and making succeeding
  // pages lazy breaks the recursive data race!
  let frstsb = Array.zeroCreate 32
  let fkbpas = BasePrimeArrs(sb2bpa (primendx 0) frstsb, None)
  cullSieveBuffer (primendx 0) fkbpas frstsb
  let rec nxtbpas (CIS(bpa, tlf)) = BasePrimeArrs(bpa, Some(lazy (nxtbpas (tlf()))))
  BasePrimeArrs(sb2bpa (primendx 0) frstsb,
                Some(lazy (nxtbpas <| makePrimePages (primendx 256) 256 sb2bpa)))

/// produces a generator of primes; uses mutability for better speed...
let primes(): unit -> Prime =
  let sb2prms lwi (sb: SieveBuffer) = lwi, sb in let mutable ndx = -1
  let (CIS((nlwi, nsb), npgtlf)) = // use page generator function above!
    makePrimePages (primendx 0) cPGSZBTS sb2prms
  let mutable lwi = nlwi in let mutable sb = nsb
  let mutable pgtlf = npgtlf
  let mutable baseprm = prime 3 + prime (lwi + lwi) 
  fun() -> 
    if ndx < 0 then ndx <- 0; prime 2 else
    let inline notprm i = sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy
    while ndx < cPGSZBTS && notprm ndx do ndx <- ndx + 1
    if ndx >= cPGSZBTS then // get next page if over
      let (CIS((nlwi, nsb), npgtlf)) = pgtlf() in ndx <- 0
      lwi <- nlwi; sb <- nsb; pgtlf <- npgtlf
      baseprm <- prime 3 + prime (lwi + lwi) 
      while notprm ndx do ndx <- ndx + 1
    let ni = ndx in ndx <- ndx + 1 // ready for next call!
    baseprm + prime (ni + ni)

let countPrimesTo (limit: Prime): int = // much faster!
  if limit < prime 3 then (if limit < prime 2 then 0 else 1) else
  let topndx = (limit - prime 3) / prime 2 |> primendx
  let sb2cnt lwi (sb: SieveBuffer) =
    let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
    countSieveBuffer
      (if lmti < topndx then btlmt else int (topndx - lwi)) sb, lmti
  let rec loop (CIS((cnt, nxti), tlf)) count =
    if nxti < topndx then loop (tlf()) (count + cnt)
    else count + cnt
  loop <| makePrimePages (primendx 0) cPGSZBTS sb2cnt <| 1

/// sequences are convenient but slow...
let primesSeq() = primes() |> Seq.unfold (fun gen -> Some(gen(), gen))
printfn "The first 25 primes are:  %s"
  ( primesSeq() |> Seq.take 25
      |> Seq.fold (fun s p -> s + string p + " ") "" )
printfn "There are %d primes up to a million." 
  ( primesSeq() |> Seq.takeWhile ((>=) (prime 1000000)) |> Seq.length )

let rec cntto gen lmt cnt = // faster than seq's but still slow
  if gen() > lmt then cnt else cntto gen lmt (cnt + 1)

let limit = prime 1_000_000_000
let start = System.DateTime.Now.Ticks
// let answr = cntto (primes()) limit 0 // slower way!
let answr = countPrimesTo limit // over twice as fast way!
let elpsd = (System.DateTime.Now.Ticks - start) / 10000L
printfn "Found %d primes to %A in %d milliseconds." answr limit elpsd
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes up to a million.
Found 50847534 primes to 1000000000 in 2161 milliseconds.

As with all of the efficient unbounded sieves, the above code uses a secondary enumerator of the base primes less than the square root of the currently culled range, which in this case is a lazy (deferred memoized evaluation) binding by small pages of base primes; it also relies on the lazy deferral of subsequent pages so as to avoid a race condition.

The above code is written to output the "uint64" type for very large ranges of primes since there is little computational cost to doing this for this algorithm when used with 64-bit compilation. However, when transpiled to JavaScript with Fable, the largest contiguous integer range that can be represented is limited by the 53-bit significand (52 stored bits) of a 64-bit floating point number, and thus the large numbers are represented by floats in this case, since a 64-bit integer polyfill is very slow. As written, the practical range for this sieve is about 16 billion; however, it can be extended to about 10^14 (a week or two of execution time) by setting the "cPGSZBTS" constant to the size of the CPU L2 cache rather than the L1 cache (L2 is up to about two Megabytes for modern high end desktop CPU's) at a slight loss of efficiency (a factor of up to two or so) per composite number culling operation due to the slower memory access time. When the Fable compilation option is used, execution speed is roughly the same as using F# with DotNet Core.

Even with the custom `primes` enumerator generator (the F#/Fable built-in sequence operators are terribly inefficient), enumerating the resulting primes takes longer than actually culling the composite numbers from the sieving arrays. The actual culling is thus over 50 times faster than that of the Dictionary version. The slowness of enumeration, no matter what further tweaks are done to improve it (each value enumerated will always take a function call and a scan loop, costing something in the order of 100 CPU clock cycles per value), means that further gains in speed using extreme wheel factorization and multi-processing have little point unless the actual work on the resulting primes is done through auxiliary functions that do not use iteration. Such a function is provided here to count the primes by pages using a "pop count" look up table, which reduces the counting time to only a small fraction of a second.
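
The page segmentation and the per-page cull start calculation can be illustrated with the following Python sketch. It is a simplified sketch, not a translation of the F# code: the name `count_primes_segmented` and its `page_bits` parameter are hypothetical, one byte per flag is used instead of bit packing, and the base primes are found up front with a small ordinary sieve rather than through a lazy secondary feed. It counts primes page by page instead of enumerating them.

```python
from math import isqrt

def count_primes_segmented(limit, page_bits=1 << 14):
    """Count primes <= limit with an odds-only page-segmented sieve."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2                  # index of the largest odd number <= limit
    root = isqrt(limit)
    small = bytearray([1]) * (root + 1)     # ordinary sieve for the base primes
    base = []
    for p in range(3, root + 1, 2):
        if small[p]:
            base.append(p)
            for c in range(p * p, root + 1, 2 * p):
                small[c] = 0
    count = 1                               # the prime 2
    for low in range(0, top + 1, page_bits):
        size = min(page_bits, top + 1 - low)
        page = bytearray(size)              # a real implementation would reuse one buffer
        for p in base:
            s = (p * p - 3) // 2            # global index of p*p
            if s >= low + size:
                break                       # larger base primes cannot cull this page
            if s >= low:
                start = s - low
            else:
                r = (low - s) % p           # offset of the page start within p's cycle
                start = 0 if r == 0 else p - r
            for c in range(start, size, p):
                page[c] = 1
        count += page.count(0, 0, size)     # unmarked entries are primes
    return count

print(count_primes_segmented(1000000))
```

Only the base primes and a single page are held in memory at any time, so the page size can be chosen to fit a CPU cache level regardless of the sieving limit.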

Factor

Factor already contains two implementations of the sieve of Eratosthenes in math.primes.erato and math.primes.erato.fast. It is suggested to use one of them for real use, as they use faster types, faster unsafe arithmetic, and/or wheels to speed up the sieve further. Shown here is a more straightforward implementation that adheres to the restrictions given by the task (namely, no wheels).

Factor is pleasantly multiparadigm. Usually, it's natural to write more functional or declarative code in Factor, but this is an instance where it is more natural to write imperative code. Lexical variables are useful here for expressing the necessary mutations in a clean way.

USING: bit-arrays io kernel locals math math.functions
math.ranges prettyprint sequences ;
IN: rosetta-code.sieve-of-erato

<PRIVATE

: init-sieve ( n -- seq )   ! Include 0 and 1 for easy indexing.
    1 - <bit-array> dup set-bits ?{ f f } prepend ;

! Given the sieve and a prime starting index, create a range of
! values to mark composite. Start at the square of the prime.
: to-mark ( seq n -- range )
    [ length 1 - ] [ dup dup * ] bi* -rot <range> ;

! Mark multiples of prime n as composite.
: mark-nths ( seq n -- ) 
    dupd to-mark [ swap [ f ] 2dip set-nth ] with each ;

: next-prime ( index seq -- n ) [ t = ] find-from drop ;

PRIVATE>

:: sieve ( n -- seq )
    n sqrt 2 n init-sieve :> ( limit i! s )
    [ i limit <= ]            ! sqrt optimization
    [ s i mark-nths i 1 + s next-prime i! ] while t s indices ;

: sieve-demo ( -- )
    "Primes up to 120 using sieve of Eratosthenes:" print
    120 sieve . ;

MAIN: sieve-demo

FOCAL

1.1 T "PLEASE ENTER LIMIT"
1.2 A N
1.3 I (2047-N)5.1
1.4 D 2
1.5 Q

2.1 F X=2,FSQT(N); D 3
2.2 F W=2,N; I (SIEVE(W)-2)4.1

3.1 I (-SIEVE(X))3.3
3.2 F Y=X*X,X,N; S SIEVE(Y)=2
3.3 R

4.1 T %4.0,W,!

5.1 T "PLEASE ENTER A NUMBER LESS THAN 2048."!; G 1.1

Note that with the 4k paper tape version of FOCAL, the program will run out of memory for N>190 or so.

Forth

: prime? ( n -- ? ) here + c@ 0= ;
: composite! ( n -- ) here + 1 swap c! ;

: sieve ( n -- )
  here over erase
  2
  begin
    2dup dup * >
  while
    dup prime? if
      2dup dup * do
        i composite!
      dup +loop
    then
    1+
  repeat
  drop
  ." Primes: " 2 do i prime? if i . then loop ;

100 sieve
Output:
Primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

Alternate Odds-Only, Better Style

The above code is not really very good Forth style: initialization, sieving, and output are all in one `sieve` routine, which makes it difficult to understand and refactor. Forth code is normally written as a series of very small routines, which makes it easier to follow what is happening on the data stack, since Forth does not traditionally have the named re-entrant local variables that most other languages provide (and which those languages also normally store on the stack). The code also uses the `HERE` pointer, which points to the next available memory in user space after all compilation is done, as an unsized buffer pointer; as it does not reserve that space for the sieving buffer, the buffer can be changed by other concatenated routines in unexpected ways. It is better to allocate the sieving buffer as required from the available space at the time the routines are run and to pass that address between concatenated functions until a finalization function frees the memory and clears the stack; this is equivalent to allocating from the "heap" in other languages. The code below demonstrates these ideas:

: prime? ( addr -- ? ) C@ 0= ; \ test composites array for prime

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
   DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
   DO 2 PICK I + TRUE SWAP C! DUP +LOOP DROP ;

\ process for required prime limit; allocate and initialize returned buffer...
: initsieve ( u -- u a-addr)
   3 - DUP 0< IF 0 ELSE
      1 RSHIFT 1+ DUP ALLOCATE 0<> IF ABORT" Memory allocation error!!!"
      ELSE 2DUP SWAP ERASE THEN
   THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
   0 \ initialize test index i -- numv bufa i
   BEGIN \ test prime square index < limit
      DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3
   WHILE \ -- numv bufa sqri i
      2 PICK OVER + prime? IF cullpi! \ -- numv bufa i
      ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
   REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ print primes to given limit...
: .primes ( u a-addr -- )
   OVER 0< IF DROP 2 - 0< IF ( ." No primes!" ) ELSE ( ." Prime:  2" ) THEN
   ELSE ." Primes:  2 " SWAP 0
      DO DUP I + prime? IF I I + 3 + . THEN LOOP FREE DROP THEN ;

\ count number of primes found for the number of odd numbers within
\ given presumed sieved buffer starting at address...
: countprimes@ ( u a-addr -- )
  SWAP DUP 0< IF 1+ 0< IF DROP 0 ELSE 1 THEN
   ELSE 1 SWAP \ -- bufa cnt numv
      0 DO OVER I + prime? IF 1+ THEN LOOP SWAP FREE DROP
   THEN ;

\ shows counted number of primes to the given limit...
: .countprimesto ( u -- )
   DUP initsieve sieve countprimes@
   CR ." Found " . ." primes Up to the " . ." limit." ;

\ testing the code...
100 initsieve sieve .primes
1000000 .countprimesto
Output:
Primes:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 78498 primes Up to the 1000000 limit.

As well as solving the stated problems making it much easier to understand and refactor, an odds-only sieve takes half the space and less than half the time.

Bit-Packing the Sieve Buffer (Odds-Only)

Although the above version resolves many problems of the first version, it is wasteful of memory, as each composite number in the sieve buffer takes a byte of eight bits to represent one boolean value. The memory required can be reduced eight-fold by bit packing the sieve buffer; this takes more "bit-twiddling" to read and write the bits, but reducing the memory used gives better cache associativity for larger ranges, such that there is a net gain in performance. It also makes the code more complex and the stack manipulations harder to write, debug, and maintain; ANS Forth 1994 provides a local variable naming facility that makes this much easier. The following code implements bit-packing of the sieve buffer, using named local variables where required:

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
   0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
    OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
   DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
   DO I 3 RSHIFT 3 PICK + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
   DROP ;

\ initializes sieve storage and parameters
\ given sieve limit, returns bit limit and buffer address ..
: initsieve ( u -- u a-addr )
   3 - \ test limit...
   DUP 0< IF 0 ELSE \ return if number of bits is <= 0!
      1 RSHIFT 1+ \ finish conversion to number of bits
      DUP 1- CellShft RSHIFT 1+ \ round up to even number of cells
      CELLS DUP ALLOCATE 0= IF DUP ROT ERASE \ set cells to zero
      ELSE ABORT" Memory allocation error!!!"
      THEN
   THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
   0 \ initialize test index i -- numv bufa i
   BEGIN \ test prime square index < limit
      DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3
   WHILE \ -- numv bufa sqri i
      DUP 3 PICK prime? IF cullpi! \ -- numv bufa i
      ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
   REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ prints already found primes from sieved array...
: .primes ( u a-addr -- )
   SWAP CR ." Primes to " DUP DUP + 2 + 2 MAX . ." are:  "
   DUP 0< IF 1+ 0< IF ." none." ELSE 2 . THEN DROP \ case no primes or just 2
   ELSE 2 . 0 DO I OVER prime? IF I I + 3 + . THEN LOOP FREE DROP
   THEN ;

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u )   numbts 16 SWAP - ;
: initLUT ( -- )   cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
   0 1 CELLS 1 RSHIFT 0
   DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index-1 in array address;
\ params are number of bits used - bits, negative indicates <2/2 out: 0/1,
\ given address is of the allocated bit buffer - bufa;
\ values used: bmsk is bit mask to limit bit in last cell,
\ lci is cell index of last cell used, cnt is the return value...
\ NOTE. this is for little-endian; big-endian needs a byte swap
\ before the last mask and popcount operation!!!
: primecount@ ( u a-addr -- u )
   LOCALS| bufa numb |
   numb 0< IF numb 1+ 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
   ELSE
      numb 1- TO numb \ numb -= 1
      1 \ initial count
      numb CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
      0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
      SWAP bufa + @ \ -- cnt lstCELL
      -2 numb CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
      popcount@ + \ add popcount of last masked CELL -- cnt
      bufa FREE DROP \ free bufa -- bmsk cnt lastcell@
   THEN ;

: .countprimesto ( u -- )
   dup initsieve sieve primecount@
   CR ." There are " . ." primes Up to the " . ." limit." ;

100 initsieve sieve .primes
1000000000 .countprimesto

The output of the above code is the same as the previous version, but it takes about two thirds the time while using eight times less memory. It takes about 6.5 seconds on my Intel Skylake i5-6500 at 3.6 GHz (turbo) using swiftForth (32-bit) and about 3.5 seconds on VFX Forth (64-bit), both of which compile to machine code but with the latter much more optimized; gforth-fast is about twice as slow as swiftForth and five times slower than VFX Forth, as it just compiles to threaded execution tokens (more like an interpreter).
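
The bit addressing used for the packed buffer can be sketched in Python as follows. This is an illustrative sketch only: the name `count_primes_bitpacked` is hypothetical, and a per-byte bit count stands in for the 16-bit look up table of the Forth code. Bit c lives in byte c >> 3 under the mask 1 << (c & 7), and a set bit marks a composite.

```python
def count_primes_bitpacked(limit):
    """Count primes <= limit using an odds-only sieve packed 8 flags per byte."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    nbits = (limit - 3) // 2 + 1            # bit i stands for the odd number 2*i + 3
    buf = bytearray((nbits + 7) >> 3)       # all zero: every candidate starts as prime
    i = 0
    while True:
        p = 2 * i + 3
        start = (p * p - 3) // 2            # bit index of p*p
        if start >= nbits:
            break
        if not buf[i >> 3] & (1 << (i & 7)):    # bit i clear: p is prime
            for c in range(start, nbits, p):
                buf[c >> 3] |= 1 << (c & 7)     # set the bit: composite
        i += 1
    # Padding bits past nbits are never set, so counting the set bits and
    # subtracting from nbits avoids having to mask the last byte.
    composites = sum(bin(b).count("1") for b in buf)
    return 1 + nbits - composites           # the extra 1 accounts for the prime 2

print(count_primes_bitpacked(1000000))
```

Counting set bits rather than zero bits is a design choice of this sketch; the Forth code counts zero bits instead and therefore has to mask off the unused bits of the last cell.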

Page-Segmented Bit-Packed Odds-Only Version

While the above version greatly reduces the amount of memory used for a given sieving range and thereby also somewhat reduces execution time, any sieve intended for sieving to limits of a hundred million or more should use a page-segmented implementation. Page segmentation means that the only storage required is a representation of the base primes up to the square root of the limit plus a sieve buffer whose size should also be at least proportional to that square root. This again makes execution faster as ranges go up, due to better cache associativity with most memory accesses staying within the CPU cache sizes. The following Forth code implements a basic version that does this:

\ CPU L1 and L2 cache sizes in bits; power of 2...
1 17 LSHIFT CONSTANT L1CacheBits
L1CacheBits 8 * CONSTANT L2CacheBits

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
   0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ initializes sieve buffer storage and parameters
\ given sieve buffer bit size (even number of CELLS), returns buffer address ..
: initSieveBuffer ( u -- a-addr )
   CellShft RSHIFT \ even number of cells
   CELLS ALLOCATE 0<> IF ABORT" Memory allocation error!!!" THEN ;

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
    OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, as well as bitsz,
\ sieve the multiples of said prime leaving prime index on the stack...
: cullpi! ( u u0 u u addr -- u0 )
   LOCALS| sba bitsz lwi | DUP DUP + 3 + ROT \ -- i prm sqri
   \ culling start index address calculation...
   lwi 2DUP > IF - ELSE SWAP - OVER MOD DUP 0<> IF OVER SWAP - THEN
   THEN bitsz SWAP \ -- i prm bitsz strti
   DO I 3 RSHIFT sba + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
   DROP ;

\ cull sieve buffer given base wheel index, bit size, 
\ address base prime sieved buffer and
\ the address of the sieve buffer to be culled of composite bits...
: cullSieveBuffer ( u u a-addr a-addr -- )
   >R >R 2DUP + R> R>  \ -- lwi bitsz rngi bpba sba
   LOCALS| sba bpba rngi bitsz lwi |
   bitsz 1- CellShft RSHIFT 1+ CELLS sba SWAP ERASE \ clear sieve buffer
   0 \ initialize base prime index i -- i
   BEGIN \ test prime square index < limit
      DUP DUP DUP + SWAP 3 + * 3 + TUCK rngi < \ sqri = 2*i * (I+3) + 3
   WHILE \ -- sqri i
      DUP bpba prime? IF lwi bitsz sba cullpi! ELSE SWAP DROP THEN \ -- i     
   1+ REPEAT 2DROP ; \ --

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u )   numbts 16 SWAP - ;
: initLUT ( -- )   cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
   0 1 CELLS 1 RSHIFT 0
   DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index in array address...
: countSieveBuffer@ ( u a-addr -- u )
   LOCALS| bufa lmti |
   0 \ initial count -- cnt
   lmti CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
   0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
   SWAP bufa + @ \ -- cnt lstCELL
   -2 lmti CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
   popcount@ + ; \ add popcount of last masked CELL -- cnt

\ prints found primes from series of culled sieve buffers...
: .primes ( u -- )
   DUP CR ." Primes to " . ." are:  "
   DUP 3 - 0< IF DUP 2 - 0< IF ." none." ELSE 2 . THEN \ <2/2 -> 0/1
   ELSE 2 .
      3 - 1 RSHIFT 1+ \ -- rngi
      DUP 1- L2CacheBits / L2CacheBits * 3 RSHIFT \ -- rng rngi pglmtbytes
      L1CacheBits initSieveBuffer \ address of base prime sieve buffer
      L2CacheBits initSieveBuffer \ address of main sieve buffer
      LOCALS| sba bpsba pglmt | \ local variables -- rngi
      0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
      pglmt 0 ?DO
         I L2CacheBits bpsba sba cullSieveBuffer
         I L2CacheBits 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
      L2CacheBits +LOOP \ rngi
      L2CacheBits mod DUP 0> IF \ one more page!
         pglmt DUP L2CacheBits bpsba sba cullSieveBuffer
         SWAP 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
      THEN bpsba FREE DROP sba FREE DROP
   THEN ; \ --

\ prints count of found primes from series of culled sieve buffers...
: .countPrimesTo ( u -- )
   DUP 3 - 0< IF 2 - 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
   ELSE
      DUP 3 - 1 RSHIFT 1+
      DUP 1- L2CacheBits / L2CacheBits * \ -- rng rngi pglmtbytes
      L1CacheBits initSieveBuffer \ address of base prime sieve buffer
      L2CacheBits initSieveBuffer \ address of main sieve buffer
      LOCALS| sba bpsba pglmt | \ local variables -- rng rngi
      0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
      1 pglmt 0 ?DO
         I L2CacheBits bpsba sba cullSieveBuffer
         L2CacheBits 1- sba countSieveBuffer@ +
      L2CacheBits +LOOP \ rng rngi cnt
      SWAP L2CacheBits mod DUP 0> IF \ one more page!
         pglmt OVER bpsba sba cullSieveBuffer
         1- sba countSieveBuffer@ + \ partial count!
      THEN
      bpsba FREE DROP sba FREE DROP \ -- range cnt
   THEN CR ." There are " . ." primes Up to the " . ." limit." ;

100 .primes
1000000000 .countPrimesTo
Output:
Primes to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 50847534 primes Up to the 1000000000 limit.

For simplicity, the base primes array is left as a sieved bit-packed array (which takes minimum space) at the cost of having to scan the bit array for base primes on every page-segment culling pass. The page-segment sieve buffer is set as a fixed multiple of this (intended to fit within the CPU L2 cache size) in order to reduce the base prime start index address calculation overhead by this factor at the cost of slightly increased memory access times, which are still only about the same as the fastest inner culling time or less anyway. When the cache sizes are set to 32 Kilobytes/256 Kilobytes for L1/L2, respectively (by changing the first definition to 1 18 LSHIFT CONSTANT L1CacheBits), as for my Intel Skylake i5-6500 at 3.6 GHz (single-threaded turbo), it runs in about 1.25 seconds on 64-bit VFX Forth, 3.75 seconds on 32-bit SwiftForth, and 12.4 seconds on 64-bit gforth-fast. The tuned, inlined machine-code compilation of VFX Forth is obviously much faster than the threaded execution-token interpretation of gforth, and SwiftForth lacks the machine-code inlining of VFX Forth.
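The page-segmentation scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a translation of the Forth code: the names `odd_base_primes`, `count_primes_segmented`, and `seg_size` are hypothetical, one byte per flag is used instead of bit packing, and the base primes are kept as a list rather than as a sieved bit array. It does keep the same odds-only indexing (index i stands for 2*i + 3) and the same start-index calculation per segment.

```python
import math

def odd_base_primes(limit):
    """Odd primes <= limit; index i stands for the odd number 2*i + 3."""
    if limit < 3:
        return []
    top = (limit - 3) // 2
    comp = bytearray(top + 1)
    for i in range(top + 1):
        start = 2 * i * (i + 3) + 3          # index of the square of 2*i + 3
        if start > top:
            break
        if not comp[i]:
            p = 2 * i + 3
            comp[start::p] = b"\x01" * len(range(start, top + 1, p))
    return [2 * i + 3 for i in range(top + 1) if not comp[i]]

def count_primes_segmented(limit, seg_size=1 << 15):
    """Count primes <= limit, culling one fixed-size segment at a time."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2                   # last odds-only index needed
    primes = odd_base_primes(math.isqrt(limit))
    count = 1                                # the prime 2
    for low in range(0, top + 1, seg_size):  # low = first index of segment
        size = min(seg_size, top + 1 - low)
        seg = bytearray(size)                # fresh (cleared) sieve buffer
        for p in primes:
            sq = (p * p - 3) // 2            # index of p squared
            if sq >= low + size:
                break                        # larger primes cull nothing here
            # first index in this segment that is a multiple of p
            start = sq - low if sq >= low else (sq - low) % p
            seg[start::p] = b"\x01" * len(range(start, size, p))
        count += seg.count(0)                # zero flags are primes
    return count

print(count_primes_segmented(1_000_000))
```

Only the base primes up to the square root of the limit and one segment buffer are alive at any time, which is the memory property the paragraph above relies on.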

VFX Forth is only about 25 % slower than the algorithm as written in the fastest of languages, just as they advertise.

As written, the algorithm works efficiently up to over ten billion (1e10) on 64-bit systems. It could easily be refactored to use floating point or double precision for inputs and outputs, as I have done in a StackOverflow answer in JavaScript, without costing much in execution time, so that 32-bit systems would have the same much higher limit.

The implementation is efficient up to this range. With a change so that the base primes array can grow with increasing limit, it can sieve to much higher ranges, but with a loss of efficiency: once the culling spans of the larger base primes exceed the fixed sieve buffer size, many start address calculations are made for primes that do not cull anything in a given segment. Again, this can be solved by also making the page-segmentation sieve buffer grow as the square root of the limit.

Further improvements by a factor of almost four in overall execution speed would be gained by implementing maximum wheel-factorization as per my other StackOverflow JavaScript answer, which also effectively increases sieve buffer sizes by a factor of 48 in sieving by modulo residual bit planes.
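One way to read the factor of 48 mentioned above is as the number of residue classes that survive a wheel built from the primes 2, 3, 5, and 7. That choice of wheel primes is an assumption made here for illustration; the text itself does not list them. A small Python sketch under that assumption:

```python
from math import gcd

wheel_primes = (2, 3, 5, 7)      # assumed wheel primes (not stated in the text)
circumference = 1
for p in wheel_primes:
    circumference *= p           # 2 * 3 * 5 * 7 = 210

# residues modulo the circumference that share no factor with any wheel prime
residues = [r for r in range(1, circumference + 1)
            if gcd(r, circumference) == 1]

print(circumference, len(residues))   # 210 48
```

Under this reading, each of the 48 residues would get its own bit plane, so only 48 of every 210 numbers need a bit at all.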

Finally, multi-processing could be applied to increase the execution speed by about the number of effective cores (not counting SMT/Hyper-Threads), which is four on my Skylake machine; however, neither the 1994 ANS Forth standard nor the 2012 standard has a standard Forth way of implementing this, so each implementation uses its own custom WORDS. Since the resulting code would not be cross-implementation, I am not going to do this.

I likely won't even add the Maximum Wheel-Factorized version as in the above linked JavaScript code, since this code is enough to demonstrate what I was going to show: that Forth can be an efficient language, albeit a little hard to code, read, and maintain due to the reliance on anonymous data stack operations; it is a language whose best use is likely in cross-compiling to embedded systems where it can easily be customized and extended as required, and because it doesn't actually require a base operating system, can use its core facilities, functions, and extensions in place of such an OS to result in a minimum memory footprint.

Fortran

Works with: Fortran version 77
      PROGRAM MAIN
      INTEGER LI
      WRITE (6,100)
      READ  (5,110) LI
      call SOE(LI)
 100  FORMAT( 'Limit:' )
 110  FORMAT( I4 )
      STOP
      END
      
C --- SIEVE OF ERATOSTHENES ----------
      SUBROUTINE SOE( LI )
      INTEGER LI
      LOGICAL A(LI)
      INTEGER SL,P,I
      
      DO 10 I=1,LI
         A(I) = .TRUE.
 10   CONTINUE
      
      SL = INT(SQRT(REAL(LI)))
      A(1) = .FALSE.
      DO 30 P=2,SL
         IF ( .NOT. A(P) ) GOTO 30
         DO 20 I=P*P,LI,P
            A(I)=.FALSE.
 20      CONTINUE
 30   CONTINUE

      DO 40 I=2,LI
         IF ( A(I) ) WRITE(6,100) I
 40   CONTINUE

 100  FORMAT(I3)
      RETURN
      END
Works with: Fortran version 90 and later
program sieve

  implicit none
  integer, parameter :: i_max = 100
  integer :: i
  logical, dimension (i_max) :: is_prime

  is_prime = .true.
  is_prime (1) = .false.
  do i = 2, int (sqrt (real (i_max)))
    if (is_prime (i)) is_prime (i * i : i_max : i) = .false.
  end do
  do i = 1, i_max
    if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i
  end do
  write (*, *)

end program sieve

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Because it uses four-byte logicals (the default size) as elements of the sieve buffer, the above code uses 400 bytes of memory for this trivial task of sieving to 100; it also performs 49 + 31 + 16 + 8 = 104 culling operations (for the culling by the primes two, three, five, and seven).
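The culling count quoted above can be checked with a short Python sketch that mirrors the one-flag-per-number sieve and counts every write of a composite flag. The function name `count_culls_all_numbers` is hypothetical and the code is only meant to reproduce the arithmetic, not the Fortran program itself.

```python
import math

def count_culls_all_numbers(limit):
    """Sieve 0..limit with one flag per number; return (primes, culls)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    culls = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
                culls += 1               # one culling operation per write
    primes = [n for n in range(limit + 1) if is_prime[n]]
    return primes, culls

primes, culls = count_culls_all_numbers(100)
print(len(primes), culls)                # 25 104
```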

Optimised using a pre-computed wheel based on 2:

program sieve_wheel_2

  implicit none
  integer, parameter :: i_max = 100
  integer :: i
  logical, dimension (i_max) :: is_prime

  is_prime = .true.
  is_prime (1) = .false.
  is_prime (4 : i_max : 2) = .false.
  do i = 3, int (sqrt (real (i_max))), 2
    if (is_prime (i)) is_prime (i * i : i_max : 2 * i) = .false.
  end do
  do i = 1, i_max
    if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i
  end do
  write (*, *)

end program sieve_wheel_2

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

This so-called "optimized" version still uses 400 bytes of memory and only slightly reduces the work, from 104 to 77 operations (49 to mark the even representations as composite during initialization plus 16 + 8 + 4 = 28 culls by the primes three, five, and seven), because it merely skips the re-culling of the even representations; it isn't really much of an optimization at all!

Optimized using a proper implementation of a wheel 2:

The above implementations, especially the second odds-only code, are some of the most inefficient versions of the Sieve of Eratosthenes in any language here as to time and space efficiency, outdone only by some naive JavaScript implementations that use eight-byte Numbers as logical values. The second claims to be wheel factorized but still uses the same memory as the first and still culls the even numbers in the initialization of the sieve buffer. As well, using four bytes (the default logical size) to store a boolean value would be terribly wasteful if these implementations were extended to non-toy ranges. The following code implements proper wheel factorization by two, reducing the space used by a factor of about eight to 49 bytes by using `byte` as the sieve buffer array element type and not requiring the evens initialization, thus reducing the number of culling operations to 16 + 8 + 4 = 28 (for the culling primes of three, five, and seven):

program sieve_wheel_2
 
  implicit none
  integer, parameter :: i_max = 100
  integer, parameter :: i_limit = (i_max - 3) / 2
  integer :: i
  byte, dimension (0:i_limit) :: composites
 
  composites = 0
  do i = 0, (int (sqrt (real (i_max))) - 3) / 2
    if (composites(i) == 0) composites ((i + i) * (i + 3) + 3 : i_limit : i + i + 3) = 1
  end do
  write (*, '(i0, 1x)', advance = 'no') 2
  do i = 0, i_limit
    if (composites (i) == 0) write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
  end do
  write (*, *)
 
end program sieve_wheel_2

The output is the same as the earlier version.
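The index arithmetic used by the odds-only version above follows from letting index i stand for the odd number p = 2i + 3, so that p squared sits at index (p*p - 3) / 2 = 2i(i + 3) + 3 and successive odd multiples of p are p indices apart. The Python sketch below (with the hypothetical name `count_culls_odds_only`, and assuming a limit of at least 3) checks that mapping and reproduces the 49-element buffer and the 28 culling operations for a limit of 100.

```python
import math

def count_culls_odds_only(limit):
    """Odds-only sieve, limit >= 3; returns (primes, culls, buffer length)."""
    top = (limit - 3) // 2               # index of the last odd number <= limit
    composite = bytearray(top + 1)       # one byte per odd number 3, 5, 7, ...
    culls = 0
    for i in range((math.isqrt(limit) - 3) // 2 + 1):
        if not composite[i]:
            p = 2 * i + 3
            start = 2 * i * (i + 3) + 3
            assert start == (p * p - 3) // 2   # index of p squared
            for j in range(start, top + 1, p):
                composite[j] = 1
                culls += 1
    primes = [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]
    return primes, culls, len(composite)

primes, culls, size = count_culls_odds_only(100)
print(len(primes), culls, size)          # 25 28 49
```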

Optimized using bit packing to reduce the memory use by a further factor of eight:

The above implementation is still space inefficient in effectively only using one bit out of eight; the following version implements bit packing to reduce memory use by a factor of eight by using bits to represent composite numbers rather than bytes:

program sieve_wheel_2
 
  implicit none
  integer, parameter :: i_max = 10000000
  integer, parameter :: i_range = (i_max - 3) / 2
  integer :: i, j, k, cnt
  byte, dimension (0:i_range / 8) :: composites
 
  composites = 0 ! pre-initialized?
  do i = 0, (int (sqrt (real (i_max))) - 3) / 2
    if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
      do j = (i + i) * (i + 3) + 3, i_range, i + i + 3
        k = shiftr(j, 3)
        composites(k) = ior(composites(k), shiftl(1, iand(j, 7)))
      end do
    end if
  end do
!  write (*, '(i0, 1x)', advance = 'no') 2
  cnt = 1
  do i = 0, i_range
    if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
!      write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
      cnt = cnt + 1
    end if
  end do
!  write (*, *)
  print '(a, i0, a, i0, a)', &
        'There are ', cnt, ' primes up to ', i_max, '.'
end program sieve_wheel_2
Output:
There are 664579 primes up to 10000000.

When the lines that print the results are enabled, the output for a maximum value of 100 is exactly the same as for the other versions, and the number of culling operations is exactly the same as for the immediately preceding optimized version over the same range; the only difference is that less memory is used. Although the culling operations are somewhat more complex, for larger ranges the time saved through more effective use of the CPU cache more than makes up for it, so the average culling time is actually reduced. This version can count the primes up to several million in a few tens of milliseconds (listing hundreds of thousands of primes takes much longer than counting them). For ranges above a few tens of millions, a page-segmented sieve is much more efficient due to further improved use of the CPU caches.
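The bit-packing step itself is small: odds-only index i lives in byte i >> 3 at bit position i & 7. The following Python sketch shows just that addressing on a `bytearray`; the name `count_primes_bitpacked` is hypothetical and the code makes no attempt at the speed of the compiled version.

```python
import math

def count_primes_bitpacked(limit):
    """Return (prime count, buffer size in bytes) for primes <= limit."""
    if limit < 2:
        return 0, 0
    if limit < 3:
        return 1, 0
    top = (limit - 3) // 2                   # last odds-only index
    bits = bytearray(top // 8 + 1)           # eight odd numbers per byte
    for i in range((math.isqrt(limit) - 3) // 2 + 1):
        if not bits[i >> 3] & (1 << (i & 7)):        # bit clear: 2*i + 3 is prime
            p = 2 * i + 3
            for j in range(2 * i * (i + 3) + 3, top + 1, p):
                bits[j >> 3] |= 1 << (j & 7)         # mark composite
    count = 1                                # the prime 2
    for i in range(top + 1):
        if not bits[i >> 3] & (1 << (i & 7)):
            count += 1
    return count, len(bits)

print(count_primes_bitpacked(100))           # (25, 7)
```

For a limit of 100 the buffer shrinks from 49 bytes to 7, while the sequence of culled indices is unchanged.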

Multi-Threaded Page-Segmented Bit-Packed Odds-Only Version

As well as adding page-segmentation, the following code adds multi-processing, which is one of the capabilities for which modern Fortran is known:

subroutine cullSieveBuffer(lwi, size, bpa, sba)

    implicit none
    integer, intent(in) :: lwi, size
    byte, intent(in) :: bpa(0:size - 1)
    byte, intent(out) :: sba(0:size - 1)
    integer :: i_limit, i_bitlmt, i_bplmt, i, sqri, bp, si, olmt, msk, j
    byte, dimension (0:7) :: bits
    common /twiddling/ bits
    
    i_bitlmt = size * 8 - 1
    i_limit = lwi + i_bitlmt
    i_bplmt = size / 4
    sba = 0
    i = 0
    sqri = (i + i) * (i + 3) + 3
    do while (sqri <= i_limit)
      if (iand(int(bpa(shiftr(i, 3))), shiftl(1, iand(i, 7))) == 0) then
        ! start index address calculation...
        bp = i + i + 3
        if (lwi <= sqri) then
          si = sqri - lwi
        else
          si = mod((lwi - sqri), bp)
          if (si /= 0) si = bp - si
        end if
        if (bp <= i_bplmt) then
          olmt = min(i_bitlmt, si + bp * 8 - 1)
          do while (si <= olmt)
            msk = bits(iand(si, 7))
            do j = shiftr(si, 3), size - 1, bp
              sba(j) = ior(int(sba(j)), msk)
            end do
            si = si + bp
          end do
        else
          do while (si <= i_bitlmt)
            j = shiftr(si, 3)
            sba(j) = ior(sba(j), bits(iand(si, 7)))
            si = si + bp
          end do
        end if
      end if
      i = i + 1
      sqri = (i + i) * (i + 3) + 3
    end do
  
  end subroutine cullSieveBuffer
  
  integer function countSieveBuffer(lmti, almti, sba)
  
    implicit none
    integer, intent(in) :: lmti, almti
    byte, intent(in) :: sba(0:almti)
    integer :: bmsk, lsti, i, cnt
    byte, dimension (0:65535) :: clut
    common /counting/ clut
  
    cnt = 0
    bmsk = iand(shiftl(-2, iand(lmti, 15)), 65535)
    lsti = iand(shiftr(lmti, 3), -2)
    do i = 0, lsti - 1, 2
      cnt = cnt + clut(shiftl(iand(int(sba(i)), 255), 8) + iand(int(sba(i + 1)), 255))
    end do
    countSieveBuffer = cnt + clut(ior(shiftl(iand(int(sba(lsti)), 255), 8) + iand(int(sba(lsti + 1)), 255), bmsk))
    
  end function countSieveBuffer
  
  program sieve_paged
  
    use OMP_LIB
    implicit none
    integer, parameter :: i_max = 1000000000, i_range = (i_max - 3) / 2
    integer, parameter :: i_l1cache_size = 16384, i_l1cache_bitsz = i_l1cache_size * 8
    integer, parameter :: i_l2cache_size = i_l1cache_size * 8, i_l2cache_bitsz = i_l2cache_size * 8
    integer :: cr, c0, c1, i, j, k, cnt
    integer, save :: scnt
    integer :: countSieveBuffer
    integer :: numthrds
    byte, dimension (0:i_l1cache_size - 1) :: bpa
    byte, save, allocatable, dimension (:) :: sba
    byte, dimension (0:7) :: bits = (/ 1, 2, 4, 8, 16, 32, 64, -128 /)
    byte, dimension (0:65535) :: clut
    common /twiddling/ bits
    common /counting/ clut
  
    type heaparr
      byte, allocatable, dimension(:) :: thrdsba
    end type heaparr
    type(heaparr), allocatable, dimension (:) :: sbaa
  
    !$OMP THREADPRIVATE(scnt, sba)
  
    numthrds = 1
    !$ numthrds = OMP_get_max_threads()
    allocate(sbaa(0:numthrds - 1))
    do i = 0, numthrds - 1
      allocate(sbaa(i)%thrdsba(0:i_l2cache_size - 1))
    end do
  
    CALL SYSTEM_CLOCK(count_rate=cr)
    CALL SYSTEM_CLOCK(c0)
    do k = 0, 65535 ! initialize counting Look Up Table
      j = k
      i = 16
      do while (j > 0)
        i = i - 1
        j = iand(j, j - 1)
      end do
      clut(k) = i
    end do
    bpa = 0 ! pre-initialization not guaranteed!
    call cullSieveBuffer(0, i_l1cache_size, bpa, bpa)
  
    cnt = 1
    !$OMP PARALLEL DO ORDERED
      do i = i_l2cache_bitsz, i_range, i_l2cache_bitsz * 8
        scnt = 0
        sba = sbaa(mod(i, numthrds))%thrdsba
        do j = i, min(i_range, i + 8 * i_l2cache_bitsz - 1), i_l2cache_bitsz
          call cullSieveBuffer(j - i_l2cache_bitsz, i_l2cache_size, bpa, sba)
          scnt = scnt + countSieveBuffer(i_l2cache_bitsz - 1, i_l2cache_size, sba)
        end do
        !$OMP ATOMIC
          cnt = cnt + scnt
      end do
    !$OMP END PARALLEL DO
  
    j = i_range / i_l2cache_bitsz * i_l2cache_bitsz
    k = i_range - j
    if (k /= i_l2cache_bitsz - 1) then
      call cullSieveBuffer(j, i_l2cache_size, bpa, sbaa(0)%thrdsba)
      cnt = cnt + countSieveBuffer(k, i_l2cache_size, sbaa(0)%thrdsba)
    end if
  !  write (*, '(i0, 1x)', advance = 'no') 2
  !  do i = 0, i_range
  !    if (iand(sba(shiftr(i, 3)), bits(iand(i, 7))) == 0) write (*, '(i0, 1x)', advance='no') (i + i + 3)
  !  end do
  !  write (*, *)
    CALL SYSTEM_CLOCK(c1)
    print '(a, i0, a, i0, a, f0.0, a)', 'Found ', cnt, ' primes up to ', i_max, &
          ' in ', ((c1 - c0) / real(cr) * 1000), ' milliseconds.'
  
    do i = 0, numthrds - 1
      deallocate(sbaa(i)%thrdsba)
    end do
    deallocate(sbaa)
  
  end program sieve_paged
Output:
Found 50847534 primes up to 1000000000 in 219. milliseconds.

The above output was as compiled with gfortran -O3 -fopenmp using version 11.1.1-1 on my Intel Skylake i5-6500 CPU at 3.2 GHz multithreaded with four cores. There are a few more optimizations that could be made in applying Maximum Wheel-Factorization as per my StackOverflow answer in JavaScript, which will make this almost four times faster yet again. If that optimization were done, sieving to a billion as here is really too trivial to measure and one should sieve at least up to ten billion to start to get a long enough time to be measured accurately. As explained in that answer, the Maximum Wheel-Factorized code will work efficiently up to about a trillion (1e12), when it needs yet another "bucket sieve" optimization to allow it to continue to scale efficiently for increasing range. The final optimization which can speed up the code by almost a factor of two is a very low level loop unrolling technique that I'm not sure will work with the compiler, but as it works in C/C++ and other similar languages including those that compile through LLVM, it ought to.
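The overall shape of the multi-threaded version above (independent segments are culled and counted separately, and only the per-segment counts are added together) can be sketched in Python with a thread pool. This is an illustrative sketch with hypothetical names (`count_segment`, `count_primes_parallel`, `seg_size`, `workers`), using one byte per flag rather than bit packing. Note that threads in standard CPython do not run this kind of work in parallel, so the sketch shows the decomposition rather than any speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def odd_base_primes(limit):
    """Odd primes <= limit; index i stands for the odd number 2*i + 3."""
    if limit < 3:
        return []
    top = (limit - 3) // 2
    comp = bytearray(top + 1)
    for i in range(top + 1):
        start = 2 * i * (i + 3) + 3
        if start > top:
            break
        if not comp[i]:
            comp[start::2 * i + 3] = b"\x01" * len(range(start, top + 1, 2 * i + 3))
    return [2 * i + 3 for i in range(top + 1) if not comp[i]]

def count_segment(low, size, primes):
    """Count primes among odds-only indices low .. low + size - 1."""
    seg = bytearray(size)                    # private buffer for this task
    for p in primes:
        sq = (p * p - 3) // 2
        if sq >= low + size:
            break
        start = sq - low if sq >= low else (sq - low) % p
        seg[start::p] = b"\x01" * len(range(start, size, p))
    return seg.count(0)

def count_primes_parallel(limit, seg_size=1 << 15, workers=4):
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2
    primes = odd_base_primes(math.isqrt(limit))  # shared, read-only

    def work(low):
        return count_segment(low, min(seg_size, top + 1 - low), primes)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # 1 accounts for the prime 2; segment counts can be summed in any order
        return 1 + sum(pool.map(work, range(0, top + 1, seg_size)))

print(count_primes_parallel(1_000_000))
```

Because each task owns its buffer and the base primes are only read, no locking is needed until the final sum.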

Free Pascal

Basic version

function Sieve returns a list of primes less than or equal to the given aLimit

program prime_sieve;
{$mode objfpc}{$coperators on}
uses
  SysUtils, GVector;
type
  TPrimeList = specialize TVector<DWord>;
function Sieve(aLimit: DWord): TPrimeList;
var
  IsPrime: array of Boolean;
  I, SqrtBound: DWord;
  J: QWord;
begin
  Result := TPrimeList.Create;
  Inc(aLimit, Ord(aLimit < High(DWord))); //not a problem because High(DWord) is composite
  SetLength(IsPrime, aLimit);
  FillChar(Pointer(IsPrime)^, aLimit, Byte(True));
  SqrtBound := Trunc(Sqrt(aLimit));
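  for I := 2 to aLimit - 1 do //valid indices of IsPrime are 0..aLimit - 1
    if IsPrime[I] then
      begin
        Result.PushBack(I);
        if I <= SqrtBound then
          begin
            J := QWord(I) * I;
            while J < aLimit do //never write past the end of IsPrime
              begin
                IsPrime[J] := False;
                J += I;
              end;
          end;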
          end;
      end;
end;

 //usage

var
  Limit: DWord = 0;
function ReadLimit: Boolean;
var
  Lim: Int64;
begin
  if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
    if (Lim >= 0) and (Lim <= High(DWord)) then
      begin
        Limit := DWord(Lim);
        exit(True);
      end;
  Result := False;
end;
procedure PrintUsage;
begin
  WriteLn('Usage: prime_sieve Limit');
  WriteLn('  where Limit in the range [0, ', High(DWord), ']');
  Halt;
end;
procedure PrintPrimes(aList: TPrimeList);
var
  I: DWord;
begin
  if aList.Size <> 0 then begin
    if aList.Size > 1 then
      for I := 0 to aList.Size - 2 do
        Write(aList[I], ', ');
    WriteLn(aList[aList.Size - 1]);
  end;
  aList.Free;
end;
begin
  if not ReadLimit then
    PrintUsage;
  try
    PrintPrimes(Sieve(Limit));
  except
    on e: Exception do
      WriteLn('An exception ', e.ClassName, ' occurred with message: ', e.Message);
  end;
end.

Alternative segmented(odds only) version

function OddSegmentSieve returns a list of primes less than or equal to the given aLimit

program prime_sieve;
{$mode objfpc}{$coperators on}
uses
  SysUtils, Math;
type
  TPrimeList = array of DWord;
function OddSegmentSieve(aLimit: DWord): TPrimeList;
  function EstimatePrimeCount(aLimit: DWord): DWord;
  begin
    case aLimit of
      0..1:   Result := 0;
      2..200: Result := Trunc(1.6 * aLimit/Ln(aLimit)) + 1;
    else
      Result := Trunc(aLimit/(Ln(aLimit) - 2)) + 1;
    end;
  end;
  function Sieve(aLimit: DWord; aNeed2: Boolean): TPrimeList;
  var
    IsPrime: array of Boolean;
    I: DWord = 3;
    J, SqrtBound: DWord;
    Count: Integer = 0;
  begin
    if aLimit < 2 then
      exit(nil);
    SetLength(IsPrime, (aLimit - 1) div 2);
    FillChar(Pointer(IsPrime)^, Length(IsPrime), Byte(True));
    SetLength(Result, EstimatePrimeCount(aLimit));
    SqrtBound := Trunc(Sqrt(aLimit));
    if aNeed2 then
      begin
        Result[0] := 2;
        Inc(Count);
      end;
    for I := 0 to High(IsPrime) do
      if IsPrime[I] then
        begin
          Result[Count] := I * 2 + 3;
          if Result[Count] <= SqrtBound then
            begin
              J := Result[Count] * Result[Count];
              repeat
                IsPrime[(J - 3) div 2] := False;
                J += Result[Count] * 2;
              until J > aLimit;
            end;
          Inc(Count);
        end;
    SetLength(Result, Count);
  end;
const
  PAGE_SIZE = $8000;
var
  IsPrime: array[0..Pred(PAGE_SIZE)] of Boolean; //current page
  SmallPrimes: TPrimeList = nil;
  I: QWord;
  J, PageHigh, Prime: DWord;
  Count: Integer;
begin
  if aLimit < PAGE_SIZE div 4 then
    exit(Sieve(aLimit, True));
  I := Trunc(Sqrt(aLimit));
  SmallPrimes := Sieve(I + 1, False);
  Count := Length(SmallPrimes) + 1;
  I += Ord(not Odd(I));
  SetLength(Result, EstimatePrimeCount(aLimit));
  while I <= aLimit do
    begin
      PageHigh := Min(Pred(PAGE_SIZE * 2), aLimit - I);
      FillChar(IsPrime, PageHigh div 2 + 1, Byte(True));
      for Prime in SmallPrimes do
        begin
          J := DWord(I) mod Prime;
          if J <> 0 then
            J := Prime shl (1 - J and 1) - J;
          while J <= PageHigh do
            begin
              IsPrime[J div 2] := False;
              J += Prime * 2;
            end;
        end;
      for J := 0 to PageHigh div 2 do
        if IsPrime[J] then
          begin
            Result[Count] := J * 2 + I;
            Inc(Count);
          end;
      I += PAGE_SIZE * 2;
    end;
  SetLength(Result, Count);
  Result[0] := 2;
  Move(SmallPrimes[0], Result[1], Length(SmallPrimes) * SizeOf(DWord));
end;

  //usage

var
  Limit: DWord = 0;
function ReadLimit: Boolean;
var
  Lim: Int64;
begin
  if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
    if (Lim >= 0) and (Lim <= High(DWord)) then
      begin
        Limit := DWord(Lim);
        exit(True);
      end;
  Result := False;
end;
procedure PrintUsage;
begin
  WriteLn('Usage: prime_sieve Limit');
  WriteLn('  where Limit in the range [0, ', High(DWord), ']');
  Halt;
end;
procedure PrintPrimes(const aList: TPrimeList);
var
  I: DWord;
begin
  for I := 0 to Length(aList) - 2 do
    Write(aList[I], ', ');
  if aList <> nil then
    WriteLn(aList[High(aList)]);
end;
begin
  if not ReadLimit then
    PrintUsage;
  PrintPrimes(OddSegmentSieve(Limit));
end.

FreeBASIC

' FB 1.05.0

Sub sieve(n As Integer)
  If n < 2 Then Return
  Dim a(2 To n) As Integer
  For i As Integer = 2 To n : a(i) = i : Next
  Dim As Integer p = 2, q
  ' mark non-prime numbers by setting the corresponding array element to 0
  Do
    For j As Integer = p * p To n Step p
      a(j) = 0
    Next j
    ' look for next non-zero element in array after 'p'
    q = 0
    For j As Integer = p + 1 To Sqr(n)
      If a(j) <> 0 Then
        q = j
        Exit For
      End If
    Next j    
    If q = 0 Then Exit Do
    p = q
  Loop

  ' print the non-zero numbers remaining i.e. the primes
  For i As Integer = 2 To n
    If a(i) <> 0 Then
      Print Using "####"; a(i);      
    End If
  Next
  Print
End Sub

Print "The primes up to 1000 are :"
Print
sieve(1000)
Print
Print "Press any key to quit"
Sleep
Output:
The primes up to 1000 are :

   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

Frink

n = eval[input["Enter highest number: "]]
results = array[sieve[n]]
println[results]
println[length[results] + " prime numbers less than or equal to " + n]

sieve[n] :=
{
   // Initialize array
   array = array[0 to n]
   array@1 = 0

   for i = 2 to ceil[sqrt[n]]
      if array@i != 0
         for j = i^2 to n step i
            array@j = 0

   return select[array, { |x| x != 0 }]
}

Furor

Note: With benchmark function

tick sto startingtick
#g 100000 sto MAX
@MAX mem !maximize sto primeNumbers
one count
@primeNumbers 0 2 [^]
2 @MAX külső: {||
@count {|
{}§külső {} []@primeNumbers !/ else{<}§külső
|} // end of @count
@primeNumbers @count++ {} [^]
|} // end of @MAX
@primeNumbers free
."Time : " tick @startingtick - print ." tick\n"
."Number of primes = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }

Peri

Note: With benchmark function

###sysinclude standard.uh
tick sto startingtick
#g 100000 sto MAX
@MAX mem !maximize sto primeNumbers
one count
2 0 sto#s primeNumbers
2 @MAX külső: {{ ,
@count {{
{{}}§külső primeNumbers[{{}}] !/ else {{<}}§külső
}} // end of @count
//{{}} gprintnl  // Uncomment this line to print each prime found
{{}} @count++ sto#s primeNumbers
}} // end of @MAX
@primeNumbers inv mem
//."Time : " tick @startingtick - print ." tick\n"
."Number of primes = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }

FutureBasic

Basic sieve of array of booleans

window 1, @"Sieve of Eratosthenes", (0,0,720,300)

begin globals
dynamic gPrimes(1) as Boolean
end globals

local fn SieveOfEratosthenes( n as long )
  long i, j
  
  for i = 2 to  n
    for j = i * i to n step i
      gPrimes(j) = _true
    next
    if gPrimes(i) = 0 then print i,
  next i
  kill gPrimes
end fn

fn SieveOfEratosthenes( 100 )

HandleEvents

Output:

 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Fōrmulæ

Fōrmulæ programs are not textual; programs are viewed and edited by showing and manipulating structures rather than text. Moreover, there can be multiple visual representations of the same program. Even though textual representations are possible (e.g. XML or JSON), they are intended for storage and transfer rather than for viewing and editing.

Programs in Fōrmulæ are created and edited online on its website.

In this page you can see and run the program(s) related to this task and their results. You can also change either the programs or the parameters they are called with, for experimentation, but remember that these programs were created with the main purpose of showing a clear solution of the task, and they generally lack any kind of validation.

Solution

Test case

GAP

Eratosthenes := function(n)
    local a, i, j;
    a := ListWithIdenticalEntries(n, true);
    if n < 2 then
        return [];
    else
        for i in [2 .. n] do
            if a[i] then
                j := i*i;
                if j > n then
                    return Filtered([2 .. n], i -> a[i]);
                else
                    while j <= n do
                        a[j] := false;
                        j := j + i;
                    od;
                fi;
            fi;
        od;
    fi;
end;

Eratosthenes(100);

[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]

GLBasic

// Sieve of Eratosthenes (find primes)
// GLBasic implementation


GLOBAL n%, k%, limit%, flags%[]
 
limit = 100			// search primes up to this number

DIM flags[limit+1]		// GLBasic arrays start at 0
 
FOR n = 2 TO SQR(limit)
    IF flags[n] = 0
        FOR k = n*n TO limit STEP n
            flags[k] = 1
        NEXT
    ENDIF
NEXT
 
// Display the primes
FOR n = 2 TO limit
    IF flags[n] = 0 THEN STDOUT n + ", "
NEXT

KEYWAIT

Go

Basic sieve of array of booleans

package main
import "fmt"

func main() {
    const limit = 201 // means sieve numbers < 201

    // sieve
    c := make([]bool, limit) // c for composite.  false means prime candidate
    c[1] = true              // 1 not considered prime
    p := 2
    for {
        // first allowed optimization:  outer loop only goes to sqrt(limit)
        p2 := p * p
        if p2 >= limit {
            break
        }
        // second allowed optimization:  inner loop starts at sqr(p)
        for i := p2; i < limit; i += p {
            c[i] = true // it's a composite

        }
        // scan to get next prime for outer loop
        for {
            p++
            if !c[p] {
                break
            }
        }
    }

    // sieve complete.  now print a representation.
    for n := 1; n < limit; n++ {
        if c[n] {
            fmt.Print("  .")
        } else {
            fmt.Printf("%3d", n)
        }
        if n%20 == 0 {
            fmt.Println("")
        }
    }
}

Output:

  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19  .
  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .  .
 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59  .
 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79  .
  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .  .
101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .  .
  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139  .
  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .  .
  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179  .
181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199  .

Odds-only bit-packed array output-enumerating version

The above version's output is rather specialized; the following version uses a closure function to enumerate over the culled composite number array, which is bit packed. By using this scheme for output, no extra memory is required above that required for the culling array:

package main

import (
	"fmt"
	"math"
)

func primesOdds(top uint) func() uint {
	topndx := int((top - 3) / 2)
	topsqrtndx := (int(math.Sqrt(float64(top))) - 3) / 2
	cmpsts := make([]uint, (topndx/32)+1)
	for i := 0; i <= topsqrtndx; i++ {
		if cmpsts[i>>5]&(uint(1)<<(uint(i)&0x1F)) == 0 {
			p := (i << 1) + 3
			for j := (p*p - 3) >> 1; j <= topndx; j += p {
				cmpsts[j>>5] |= 1 << (uint(j) & 0x1F)
			}
		}
	}
	i := -1
	return func() uint {
		oi := i
		if i <= topndx {
			i++
		}
		for i <= topndx && cmpsts[i>>5]&(1<<(uint(i)&0x1F)) != 0 {
			i++
		}
		if oi < 0 {
			return 2
		} else {
			return (uint(oi) << 1) + 3
		}
	}
}

func main() {
	iter := primesOdds(100)
	for v := iter(); v <= 100; v = iter() {
		print(v, " ")
	}
	iter = primesOdds(1000000)
	count := 0
	for v := iter(); v <= 1000000; v = iter() {
		count++
	}
	fmt.Printf("\r\n%v\r\n", count)
}
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
78498
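
The index arithmetic behind the odds-only bit-packed layout can be isolated into a few small helpers. The following is a minimal sketch rather than part of the entry above: the names oddIndex, oddValue, setBit, testBit and firstCullIndex are illustrative assumptions, and it packs into explicit 32-bit words, whereas the entry above uses uint elements while addressing 32 bits in each.

```go
package main

// oddIndex maps an odd number n >= 3 to its position in the odds-only array.
func oddIndex(n uint) uint { return (n - 3) / 2 }

// oddValue is the inverse mapping: array position back to the odd number.
func oddValue(i uint) uint { return 2*i + 3 }

// setBit marks position i as composite in a slice of packed 32-bit words.
func setBit(words []uint32, i uint) {
	words[i>>5] |= uint32(1) << (i & 0x1F)
}

// testBit reports whether position i has been marked composite.
func testBit(words []uint32, i uint) bool {
	return words[i>>5]&(uint32(1)<<(i&0x1F)) != 0
}

// firstCullIndex returns the position of p*p for the prime stored at
// position i; culling for that prime starts there and advances by p.
func firstCullIndex(i uint) uint {
	p := oddValue(i)
	return oddIndex(p * p)
}
```

Stepping the index by p corresponds to stepping the represented number by 2p, which is why even multiples never need to be visited.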

Sieve Tree

A fairly odd sieve tree method:

package main
import "fmt"

type xint uint64
type xgen func()(xint)

func primes() func()(xint) {
	pp, psq := make([]xint, 0), xint(25)

	var sieve func(xint, xint)xgen
	sieve = func(p, n xint) xgen {
		m, next := xint(0), xgen(nil)
		return func()(r xint) {
			if next == nil {
				r = n
				if r <= psq {
					n += p
					return
				}

				next = sieve(pp[0] * 2, psq) // chain in
				pp = pp[1:]
				psq = pp[0] * pp[0]

				m = next()
			}
			switch {
			case n < m: r, n = n, n + p
			case n > m: r, m = m, next()
			default:    r, n, m = n, n + p, next()
			}
			return
		}
	}

	f := sieve(6, 9)
	n, p := f(), xint(0)

	return func()(xint) {
		switch {
		case p < 2: p = 2
		case p < 3: p = 3
		default:
			for p += 2; p == n; {
				p += 2
				if p > n {
					n = f()
				}
			}
			pp = append(pp, p)
		}
		return p
	}
}

func main() {
	for i, p := 0, primes(); i < 100000; i++ {
		fmt.Println(p())
	}
}

Concurrent Daisy-chain sieve

A concurrent prime sieve adapted from the example in the "Go Playground" window at http://golang.org/

package main
import "fmt"
 
// Send the sequence 2, 3, 4, ... to channel 'out'
func Generate(out chan<- int) {
	for i := 2; ; i++ {
		out <- i                  // Send 'i' to channel 'out'
	}
}
 
// Copy the values from 'in' channel to 'out' channel,
//   removing the multiples of 'prime' by counting.
// 'in' is assumed to send increasing numbers
func Filter(in <-chan int, out chan<- int, prime int) {
	m := prime + prime                // first multiple of prime
	for {
		i := <-in                 // Receive value from 'in'
		for i > m {
			m = m + prime     // next multiple of prime
		}
		if i < m {
			out <- i          // Send 'i' to 'out'
		}
	}
}
 
// The prime sieve: Daisy-chain Filter processes
func Sieve(out chan<- int) {
	gen := make(chan int)             // Create a new channel
	go Generate(gen)                  // Launch Generate goroutine
	for  {
		prime := <- gen
		out <- prime
		ft := make(chan int)
		go Filter(gen, ft, prime)
		gen = ft
	}
}

func main() {
	sv := make(chan int)              // Create a new channel
	go Sieve(sv)                      // Launch Sieve goroutine
	for i := 0; i < 1000; i++ {
		prime := <- sv
		if i >= 990 { 
		    fmt.Printf("%4d ", prime) 
		    if (i+1)%20==0 {
			fmt.Println("")
		    }
		}
	}
}

The output:

7841 7853 7867 7873 7877 7879 7883 7901 7907 7919 

Runs at ~ n^2.1 empirically, producing up to n=3000 primes in under 5 seconds.
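
An empirical order of growth such as the figure quoted above is usually obtained by timing the program at two problem sizes and taking the ratio of the logarithms. A minimal sketch of that calculation follows; the function name empiricalOrder is an assumption made for illustration, and no actual timings from this entry are reproduced.

```go
package main

import "math"

// empiricalOrder estimates the exponent k in t ~ n^k from two measurements:
// run time t1 at problem size n1, and run time t2 at problem size n2.
func empiricalOrder(n1, t1, n2, t2 float64) float64 {
	return math.Log(t2/t1) / math.Log(n2/n1)
}
```

For example, a (hypothetical) run time that quadruples when n doubles gives an exponent of 2.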

Postponed Concurrent Daisy-chain sieve

Here we postpone the creation of each filter until the prime's square is seen in the input, which radically reduces the number of filter channels in the sieve chain.

package main
import "fmt"
 
// Send the sequence 2, 3, 4, ... to channel 'out'
func Generate(out chan<- int) {
	for i := 2; ; i++ {
		out <- i                  // Send 'i' to channel 'out'
	}
}
 
// Copy the values from 'in' channel to 'out' channel,
//   removing the multiples of 'prime' by counting.
// 'in' is assumed to send increasing numbers
func Filter(in <-chan int, out chan<- int, prime int) {
	m := prime * prime                // start from square of prime
	for {
		i := <-in                 // Receive value from 'in'
		for i > m {
			m = m + prime     // next multiple of prime
		}
		if i < m {
			out <- i          // Send 'i' to 'out'
		}
	}
}
 
// The prime sieve: Postponed-creation Daisy-chain of Filters
func Sieve(out chan<- int) {
	gen := make(chan int)             // Create a new channel
	go Generate(gen)                  // Launch Generate goroutine
	p := <- gen
	out <- p
	p = <- gen          // make recursion shallower ---->
	out <- p            // (Go channels are _push_, not _pull_)
	
	base_primes := make(chan int)     // separate primes supply  
	go Sieve(base_primes)             
	bp := <- base_primes              // 2           <---- here
	bq := bp * bp                     // 4

	for  {
		p = <- gen
		if p == bq {                    // square of a base prime
			ft := make(chan int)
			go Filter(gen, ft, bp)  // filter multiples of bp in gen out
			gen = ft
			bp = <- base_primes     // 3
			bq = bp * bp            // 9
		} else {
			out <- p
		}
	}
}

func main() {
	sv := make(chan int)              // Create a new channel
	go Sieve(sv)                      // Launch Sieve goroutine
	lim := 25000              
	for i := 0; i < lim; i++ {
		prime := <- sv
		if i >= (lim-10) { 
		    fmt.Printf("%4d ", prime) 
		    if (i+1)%20==0 {
			fmt.Println("")
		    }
		}
	}
}

The output:

286999 287003 287047 287057 287059 287087 287093 287099 287107 287117 

Runs at ~ n^1.2 empirically, producing up to n=25,000 primes on ideone in under 5 seconds.
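
The effect of postponement can be estimated by counting filters. The sketch below is an illustration under stated assumptions, not a measurement of the programs above: it counts only the filters in the top-level chain, it assumes the eager version holds one filter per prime produced so far and the postponed version one filter per prime whose square has already been seen, and the function names are hypothetical.

```go
package main

// countPrimesUpTo counts the primes <= limit with a plain array sieve.
func countPrimesUpTo(limit int) int {
	if limit < 2 {
		return 0
	}
	composite := make([]bool, limit+1)
	count := 0
	for n := 2; n <= limit; n++ {
		if composite[n] {
			continue
		}
		count++
		for m := n * n; m <= limit; m += n {
			composite[m] = true
		}
	}
	return count
}

// filtersEager estimates the chain length of the eager daisy-chain sieve
// once the prime 'top' has been produced: one filter per prime so far.
func filtersEager(top int) int { return countPrimesUpTo(top) }

// filtersPostponed estimates the chain length of the postponed sieve at the
// same point: one filter per prime p with p*p <= top.
func filtersPostponed(top int) int {
	r := 0
	for (r+1)*(r+1) <= top {
		r++
	}
	return countPrimesUpTo(r)
}
```

Under these assumptions, reaching 7919 (the 1000th prime) takes 1000 filters in the eager chain but only 23 in the postponed one, and reaching 287117 takes 99 postponed filters.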

Incremental Odds-only Sieve

Uses Go's built-in hash tables to store odd composites, and defers adding new known composites until the square is seen.

package main

import "fmt"

func main() {
    primes := make(chan int)
    go PrimeSieve(primes)

    p := <-primes
    for p < 100 {
        fmt.Printf("%d ", p)
        p = <-primes
    }

    fmt.Println()
}

func PrimeSieve(out chan int) {
    out <- 2
    out <- 3

    primes := make(chan int)
    go PrimeSieve(primes)

    var p int
    <-primes
    p = <-primes

    sieve := make(map[int]int)
    q := p * p
    n := p

    for {
        n += 2
        step, isComposite := sieve[n]
        if isComposite {
            delete(sieve, n)
            m := n + step
            for sieve[m] != 0 {
                m += step
            }
            sieve[m] = step

        } else if n < q {
            out <- n

        } else {
            step = p + p
            m := n + step
            for sieve[m] != 0 {
                m += step
            }
            sieve[m] = step
            p = <-primes
            q = p * p
        }
    }
}

The output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Groovy

This solution uses a BitSet for compactness and speed, but in Groovy, BitSet has full List semantics. It also uses both the "square root of the boundary" shortcut and the "square of the prime" shortcut.

def sievePrimes = { bound -> 
    def isPrime  = new BitSet(bound)
    isPrime[0..1] = false
    isPrime[2..bound] = true
    (2..(Math.sqrt(bound))).each { pc ->
        if (isPrime[pc]) {
            ((pc**2)..bound).step(pc) { isPrime[it] = false }
        }
    }
    (0..bound).findAll { isPrime[it] }
}

Test:

println sievePrimes(100)

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

GW-BASIC

10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20  DIM FLAGS(LIMIT)
30  FOR N = 2 TO SQR (LIMIT)
40  IF FLAGS(N) < > 0 GOTO 80
50  FOR K = N * N TO LIMIT STEP N
60  FLAGS(K) = 1
70  NEXT K
80  NEXT N
90  REM  DISPLAY THE PRIMES
100  FOR N = 2 TO LIMIT
110  IF FLAGS(N) = 0 THEN PRINT N;", ";
120  NEXT N

Haskell

Mutable unboxed arrays

Mutable array of unboxed Bools indexed by Ints:

{-# LANGUAGE FlexibleContexts #-} -- too lazy to write contexts...
{-# OPTIONS_GHC -O2 #-}

import Control.Monad.ST ( runST, ST )
import Data.Array.Base ( MArray(newArray, unsafeRead, unsafeWrite),
                         IArray(unsafeAt),
                         STUArray, unsafeFreezeSTUArray, assocs )
import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing...

primesTo :: Int -> [Int] -- generate a list of primes to given limit...
primesTo limit = runST $ do
  let lmt = limit - 2 -- raw index of limit!
  cmpsts <- newArray (2, limit) False -- when indexed is true is composite
  cmpstsf <- unsafeFreezeSTUArray cmpsts -- frozen in place!
  let getbpndx bp = (bp, bp * bp - 2) -- bp -> bp, raw index of start cull
      cullcmpst i = unsafeWrite cmpsts i True -- cull composite by raw ndx
      cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
  mapM_ cull4bpndx
        $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                    [ getbpndx bp | (bp, False) <- assocs cmpstsf ]
  return [ p | (p, False) <- assocs cmpstsf ] -- non-raw ndx is prime

-- testing...
main :: IO ()
main = do
  putStrLn $ "The primes up to 100 are " ++ show (primesTo 100)
  putStrLn $ "The number of primes up to a million is " ++
               show (length $ primesTo 1000000)
  let top = 1000000000
  start <- getPOSIXTime
  let answr = length $ primesTo top
  stop <- answr `seq` getPOSIXTime -- force result for timing!
  let elpsd =  round $ 1e3 * (stop - start) :: Int

  putStrLn $ "Found " ++ show answr ++ " to " ++ show top ++
               " in " ++ show elpsd ++ " milliseconds."

The above code chooses conciseness and elegance over speed, but it isn't too slow:

Output:
The primes up to 100 are [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
The number of primes up to a million is 78498
Found 50847534 to 1000000000 in 12435 milliseconds.

Run on an Intel Sky Lake i5-2500 at 3.6 GHz (single threaded boost). The code is sped up by a large constant factor by using the raw `unsafeWrite`; use of the "unsafe" versions, which avoid run time array bounds checks on every operation, is entirely safe here as the indexing is inherently limited to be within the bounds by its use in the loops. There is an additional benefit of about 20 per cent in speed if run with the LLVM back end (add the "-fllvm" flag), provided the right version of LLVM is available to the GHC Haskell compiler. The benefit of LLVM is relatively small because this program spends a relatively small percentage of its time in the tight inner culling loop, where LLVM can help the most, and a large part of its time just enumerating the result list.

Mutable unboxed arrays, odds only

Mutable array of unboxed Bools indexed by Ints, representing odds only:

import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

sieveUO :: Int -> UArray Int Bool
sieveUO top = runSTUArray $ do
    let m = (top-1) `div` 2
        r = floor . sqrt $ fromIntegral top + 1
    sieve <- newArray (1,m) True          -- :: ST s (STUArray s Int Bool)
    forM_ [1..r `div` 2] $ \i -> do       -- prime(i) = 2i+1
      isPrime <- readArray sieve i        -- ((2i+1)^2-1)`div`2 = 2i(i+1)
      when isPrime $ do                   
        forM_ [2*i*(i+1), 2*i*(i+2)+1..m] $ \j -> do
          writeArray sieve j False
    return sieve

primesToUO :: Int -> [Int]
primesToUO top | top > 1   = 2 : [2*i + 1 | (i,True) <- assocs $ sieveUO top]
               | otherwise = []

This represents odds only in the array. The empirical order of growth is ~ n^1.2 in n primes produced, and improves for bigger n. Memory consumption is low (the array seems to be packed) and grows about linearly with n. It can be sped up significantly further by rewriting the forM_ loops with direct recursion and by using the unsafeRead and unsafeWrite operations.

In light of the performance of the previous and following submissions, the IDEOne results seem somewhat slow at about 10 seconds over a range of about a third of a billion, likely due to some lazily deferred operations in the processing. See the next submission for expected speeds for odds only.

The measured empirical orders of growth as per the table in the IDEOne link are easily understood if one considers that these slowish run times are primarily limited by the time to lazily enumerate the results, and that the number of found primes to enumerate varies as (top / log top) by the prime number theorem. Since the prime density decreases by this relationship, the enumeration time per prime has the inverse relationship: it takes longer per prime to find the primes in the sieved buffer. The log of a million is 1.2 times larger than the log of a hundred thousand, and of course this ratio gets smaller with range: the ratio of the log of a billion to the log of a hundred million is 1.125, etc.
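
The logarithm ratios quoted above are easy to check numerically. The following sketch is only an illustration; logRatio and primeCountEstimate are hypothetical helper names, and n / ln n is used as the usual rough estimate of the number of primes up to n.

```go
package main

import "math"

// logRatio returns ln(a) / ln(b), the factor by which the average gap
// between primes near a exceeds the average gap near b.
func logRatio(a, b float64) float64 {
	return math.Log(a) / math.Log(b)
}

// primeCountEstimate returns the rough estimate n / ln(n) for the number
// of primes up to n.
func primeCountEstimate(n float64) float64 {
	return n / math.Log(n)
}
```

Here logRatio(1e6, 1e5) evaluates to 1.2 and logRatio(1e9, 1e8) to 1.125, up to floating-point rounding.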

Alternate Version of Mutable unboxed arrays, odds only

The reason for this alternate version is to have an accessible version of "odds only" that uses the same optimizations and is written in the same coding style as the basic version. This can be used by just substituting the following code for the function of the same name in the first base example above. Mutable array of unboxed Bools indexed by Ints, representing odds only:

primesTo :: Int -> [Int] -- generate a list of primes to given limit...
primesTo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2 - 1 -- limit index!
      oddcmpsts <- newArray (0, lmt) False -- when indexed is true is composite
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
          cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                        [ getbpndx i | (i, False) <- assocs oddcmpstsf ]
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]
Output:
The primes up to 100 are [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
The number of primes up to a million is 78498
Found 50847534 to 1000000000 in 6085 milliseconds.

A "monolithic buffer" odds-only sieve uses half the memory of the basic version.

It is about twice as fast as the basic version rather than the expected factor of about 2.5, because factors other than the number of culling operations contribute to the execution time, as follows:

1) Since the amount of memory used to sieve to a billion has been dropped from 125 million bytes to 62.5 million bytes, the cache associativity is slightly better, which should make it faster; however

2) We have eliminated the culling by the very small span of the base prime of two, which means a lesser percentage of the culling span operations will be within a given CPU cache size, which will make it slower, but

3) The primary reason we observe only about a factor of two difference in run times is that we have increased the prime density in the sieving buffer by a factor of two, which means that we have half the work to enumerate the primes. Since enumeration of the found primes is a major contribution of the execution time, the execution time will tend to change more by its cost than any other.

As to "empirical orders of growth", the comments made in the above are valid, but there is a further observation. For smaller ranges of primes up to a few million where the sieving buffer fits within the CPU L2 cache size (generally 256 Kilobytes/2 million bits, representing a range of about four million for this version), the cull times are their fastest and enumeration is a bigger percentage of the time; as ranges increase above that, more and more time is spent waiting on memory at the access times of the next level memory (CPU L3 cache, if present, followed by main memory) so that the controlling factor is a little less that of the enumeration time as range gets larger.

Because of the greatly increasing memory demands and the high execution cost of memory access as ranges exceed the span of the CPU caches, it is not recommended that these simple "monolithic buffer" sieves be used for sieving of ranges above about a hundred million. Rather, one should use a "Paged-Segmented" sieve as per the examples near the end of this Haskell section.
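
The memory figures used in this discussion follow directly from the bit packing. A minimal sketch of the arithmetic, assuming one bit per candidate number; the function names are illustrative and do not appear in the Haskell code:

```go
package main

// sieveBufferBytes returns the size in bytes of a bit-packed sieve buffer
// for the given range; an odds-only buffer needs only half as many bits.
func sieveBufferBytes(limit uint64, oddsOnly bool) uint64 {
	bits := limit
	if oddsOnly {
		bits = limit / 2
	}
	return (bits + 7) / 8 // round up to whole bytes
}

// rangeForCache returns the largest range whose bit-packed buffer fits in
// a cache of the given size in bytes.
func rangeForCache(cacheBytes uint64, oddsOnly bool) uint64 {
	bits := cacheBytes * 8
	if oddsOnly {
		return bits * 2
	}
	return bits
}
```

For a range of a billion this gives 125,000,000 bytes for the basic layout and 62,500,000 bytes for odds only, and a 256 Kilobyte cache covers an odds-only range of 4,194,304.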

Immutable arrays

Monolithic sieving array. Even numbers above 2 are pre-marked as composite, and sieving is done only by odd multiples of odd primes:

import Data.Array.Unboxed
 
primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]] :: UArray Int Bool)
  where
    sieve p a 
      | p*p > m   = 2 : [i | (i,True) <- assocs a]
      | a!p       = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]]
      | otherwise = sieve (p+2) a

Its performance depends sharply on compiler optimizations. Compiled with the -O2 flag in the presence of the explicit type signature, it is very fast in producing the first few million primes. (//) is an array update operator.

Immutable arrays, by segments

Works by segments between consecutive primes' squares. Should be the fastest of non-monadic code. Evens are entirely ignored:

import Data.Array.Unboxed

primesSA = 2 : prs ()
  where 
    prs () = 3 : sieve 3 [] (prs ())
    sieve x fs (p:ps) = [i*2 + x | (i,True) <- assocs a] 
                        ++ sieve (p*p) fs2 ps
     where
      q     = (p*p-x)`div`2                  
      fs2   = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
      a     :: UArray Int Bool
      a     = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]]

As list comprehension

import Data.Array.Unboxed
import Data.List (tails, inits)

primes = 2 : [ n |
   (r:q:_, px) <- zip (tails (2 : [p*p | p <- primes]))
                      (inits primes),
   (n, True)   <- assocs ( accumArray (\_ _ -> False) True
                     (r+1,q-1)
                     [ (m,()) | p <- px
                              , s <- [ div (r+p) p * p]
                              , m <- [s,s+p..q-1] ] :: UArray Int Bool
                  ) ]

Basic list-based sieve

Straightforward implementation of the sieve of Eratosthenes in its original bounded form. This finds primes in gaps between the composites, and composites as an enumeration of each prime's multiples.

primesTo m = eratos [2..m] where
   eratos (p : xs) 
      | p*p > m   = p : xs
      | otherwise = p : eratos (xs `minus` [p*p, p*p+p..m])
                                    -- map (p*) [p..]  
                                    -- map (p*) (p:xs)   -- (Euler's sieve)
   
minus a@(x:xs) b@(y:ys) = case compare x y of
         LT -> x : minus  xs b
         EQ ->     minus  xs ys
         GT ->     minus  a  ys
minus a        b        = a

Its time complexity is similar to that of optimal trial division because of limitations of Haskell linked lists, where (minus a b) takes time proportional to length(union a b) and not to (length b), as is achieved in an imperative setting with direct-access memory. Uses ordered list representation of sets.

This is reasonably useful up to ranges of fifteen million or about the first million primes.
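
For comparison, the ordered-list difference used by the list-based sieve can be sketched over sorted slices. This is an illustrative translation, not part of the Haskell entry; it assumes both inputs are strictly increasing. The loop advances through both inputs, which is the reason the cost depends on the lengths of both lists rather than only on the number of multiples removed.

```go
package main

// minus returns the elements of the sorted slice a that do not occur in
// the sorted slice b. Both inputs are assumed to be strictly increasing.
func minus(a, b []int) []int {
	out := make([]int, 0, len(a))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]: // a[i] cannot occur later in b: keep it
			out = append(out, a[i])
			i++
		case a[i] == b[j]: // present in both: drop it
			i++
			j++
		default: // b[j] is not in a: skip it
			j++
		}
	}
	return append(out, a[i:]...) // b is exhausted: keep the rest of a
}
```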

Unbounded list based sieve

Unbounded, "naive", too eager to subtract (see above for the definition of minus):

primesE  = sieve [2..] 
           where
           sieve (p:xs) = p : sieve (minus xs [p, p+p..])
-- unfoldr (\(p:xs)-> Just (p, minus xs [p, p+p..])) [2..]

This is slow, with complexity increasing as a square law or worse so that it is only moderately useful for the first few thousand primes or so.

The number of active streams can be limited to what's strictly necessary by postponement until the square of a prime is seen, getting a massive complexity improvement to better than ~ n^1.5, so it can get the first million primes or so in a tolerable time:

primesPE = 2 : sieve [3..] 4 primesPE
               where
               sieve (x:xs) q (p:t)
                 | x < q     = x : sieve xs q (p:t)
                 | otherwise =     sieve (minus xs [q, q+p..]) (head t^2) t
-- fix $ (2:) . concat 
--     . unfoldr (\(p:ps,xs)-> Just . second ((ps,) . (`minus` [p*p, p*p+p..])) 
--                                  . span (< p*p) $ xs) . (,[3..])

Transposing the workflow, going by segments between the consecutive squares of primes:

import Data.List (inits)

primesSE = 2 : sieve 3 4 (tail primesSE) (inits primesSE) 
               where
               sieve x q ps (fs:ft) =  
                  foldl minus [x..q-1] [[n, n+f..q-1] | f <- fs, let n=div x f * f]
                          -- [i|(i,True) <- assocs ( accumArray (\ b c -> False) 
                          --     True (x,q-1) [(i,()) | f <- fs, let n=div(x+f-1)f*f,
                          --         i <- [n, n+f..q-1]] :: UArray Int Bool )]
                  ++ sieve q (head ps^2) (tail ps) ft

The basic gradually-deepening left-leaning (((a-b)-c)- ... ) workflow of foldl minus a bs above can be rearranged into the right-leaning (a-(b+(c+ ... ))) workflow of minus a (foldr union [] bs). This is the idea behind Richard Bird's unbounded code presented in M. O'Neill's article, equivalent to:

primesB = _Y ( (2:) . minus [3..] . foldr (\p-> (p*p :) . union [p*p+p, p*p+2*p..]) [] )

--      = _Y ( (2:) . minus [3..] . _LU . map(\p-> [p*p, p*p+p..]) )
-- _LU ((x:xs):t) = x : (union xs . _LU) t             -- linear folding big union

_Y g = g (_Y g)  -- = g (g (g ( ... )))      non-sharing multistage fixpoint combinator
--                  = g . g . g . ...            ... = g^inf
--   = let x = g x in g x -- = g (fix g)     two-stage fixpoint combinator 
--   = let x = g x in x   -- = fix g         sharing fixpoint combinator

union a@(x:xs) b@(y:ys) = case compare x y of
         LT -> x : union  xs b
         EQ -> x : union  xs ys
         GT -> y : union  a  ys

Using _Y is meant to guarantee that the separate supply of primes is independently calculated, recursively, instead of the same one being reused, corecursively; thus the memory footprint is drastically reduced. This idea was introduced by M. O'Neill as a double-staged production, with a separate primes feed.

The above code is also useful to a range of the first million primes or so. The code can be further optimized by fusing minus [3..] into one function, preventing a space leak with the newer GHC versions, getting the function gaps defined below.

Tree-merging incremental sieve

Linear merging structure can further be replaced with an indefinitely deepening to the right tree-like structure (see wiki.haskell.org/Prime_numbers#Tree_merging), (a-(b+((c+d)+( ((e+f)+(g+h)) + ... )))).

This merges primes' multiples streams in a tree-like fashion, as a sequence of balanced trees of union nodes, likely achieving theoretical time complexity only a log n factor above the optimal n log n log (log n), for n primes produced. Indeed, empirically it runs at about ~ n^1.2 (for producing the first few million primes), similarly to the priority-queue-based version of M. O'Neill's, and with very low space complexity too (not counting the produced sequence of course):

primes :: () -> [Int]   
primes() = 2 : _Y ((3:) . gaps 5 . _U . map(\p-> [p*p, p*p+2*p..])) where
  _Y g = g (_Y g)  -- = g (g (g ( ... )))   non-sharing multistage fixpoint combinator
  gaps k s@(c:cs) | k < c     = k : gaps (k+2) s  -- ~= ([k,k+2..] \\ s)
                  | otherwise =     gaps (k+2) cs --   when null(s\\[k,k+2..]) 
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t   -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xt) ys@(y:yt) | x < y     = x : merge xt ys
                            | y < x     = y : merge xs yt
                            | otherwise = x : merge xt yt

Works with odds only, the simplest kind of wheel. Here's the test entry on Ideone.com, and a comparison with more versions.

With Wheel

Using _U defined above,

primesW :: [Int]   
primesW = [2,3,5,7] ++ _Y ( (11:) . gapsW 13 (tail wheel) . _U .
                            map (\p->  
                              map (p*) . dropWhile (< p) $
                                scanl (+) (p - rem (p-11) 210) wheel) )

gapsW k (d:w) s@(c:cs) | k < c     = k : gapsW (k+d) w s    -- set difference
                       | otherwise =     gapsW (k+d) w cs   --   k==c

wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:    -- gaps = (`gapsW` cycle [2])
        4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
  -- cycle $ zipWith (-) =<< tail $ [i | i <- [11..221], gcd i 210 == 1]

Used here and here.
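
The hard-coded wheel list above can be regenerated by walking once around the wheel's circumference and recording the gaps between numbers coprime to it, which is what the commented-out Haskell expression does. The following sketch uses hypothetical helper names (gcd, wheelGaps, wheelSize) and assumes that the starting number is itself coprime to every wheel prime.

```go
package main

// gcd returns the greatest common divisor of a and b.
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// wheelGaps returns one full cycle of gaps between successive integers
// coprime to the product of wheelPrimes, beginning at start.
func wheelGaps(wheelPrimes []int, start int) []int {
	circ := 1 // wheel circumference: the product of the wheel primes
	for _, p := range wheelPrimes {
		circ *= p
	}
	var gaps []int
	prev := start
	for n := start + 1; n <= start+circ; n++ {
		if gcd(n, circ) == 1 {
			gaps = append(gaps, n-prev)
			prev = n
		}
	}
	return gaps
}

// wheelSize returns the number of gaps in one cycle without generating
// them: the product of (p - 1) over the wheel primes.
func wheelSize(wheelPrimes []int) int {
	size := 1
	for _, p := range wheelPrimes {
		size *= p - 1
	}
	return size
}
```

Calling wheelGaps([]int{2, 3, 5, 7}, 11) yields 48 gaps that sum to 210 and begin 2, 4, 2, 4, 6, 2, 6, 4, in agreement with the list above.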

Improved efficiency Wheels

1. The generation of large wheels such as the 2/3/5/7/11/13/17 wheel, which has 92160 cyclic elements, needs to be done based on sieve culling which is much better as to performance and can be used without inserting the generated table.

2. Improving the means to re-generate the position on the wheel for the recursive base primes without the use of `dropWhile`, etc. The below improved code uses a copy of the place in the wheel for each found base prime for ease of use in generating the composite number to-be-culled chains.

-- autogenerates wheel primes, first sieve prime, and gaps
wheelGen :: Int -> ([Int],Int,[Int])
wheelGen n = loop 1 3 [2] [2] where
  loop i frst wps gps =
    if i >= n then (wps, frst, gps) else
    let nfrst = frst + head gps
        nhts = (length gps) * (frst - 1)
        cmpsts = scanl (\ c g -> c + frst * g)  (frst * frst) (cycle gps)
        cull n (g:gs') cs@(c:cs') og
            | nn >= c = cull nn gs' cs' (og + g) -- n == c; never greater!
            | otherwise = (og + g) : cull nn gs' cs 0 where nn = n + g
    in nfrst `seq` nhts `seq` loop (i + 1) nfrst (wps ++ [frst]) $ take nhts
                                $ cull nfrst (tail $ cycle gps) cmpsts 0

(wheelPrimes, firstSievePrime, gaps) = wheelGen 7

primesTreeFoldingWheeled :: () -> [Int]   
primesTreeFoldingWheeled() =    
    wheelPrimes ++ map fst (
      _Y ( ((firstSievePrime, wheel) :) .
               gapsW (firstSievePrime + head wheel, tail wheel) . _U .
                 map (\ (p,w) ->
                          scanl (\ c m -> c + m * p) (p * p) w ) ) ) where

  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint Y-combinator

  wheel = cycle gaps

  gapsW k@(n,d:w) s@(c:cs) | n < c     = k : gapsW (n + d, w) s  -- set diff
                           | otherwise =     gapsW (n + d, w) cs --   n == c
 
  _U ((x:xs):t) = -- exactly the same as for odds-only!
      x : (union xs . _U . pairs) t where   -- tree-shaped folding big union
    pairs (xs:ys:t) = union xs ys : pairs t --  ~= nub . sort . concat
    union xs@(x:xs') ys@(y:ys')
      | x < y = x : union xs' ys
      | y < x = y : union xs ys'
      | otherwise = x : union xs' ys' -- x and y must be equal!

When compiled with -O2 optimization and -fllvm (the LLVM back end), the above code is over twice as fast as the Odds-Only version, as it should be, since that is about the ratio of reduced operations less some slightly increased operation complexity; it sieves the primes to a hundred million in about seven seconds on a modern middle range desktop computer. It is almost twice as fast as the "primesW" version due to the increased algorithmic efficiency!

Note that the "wheelGen" code could be used to avoid further culling altogether by continuously generating wheels until the square of the "firstSievePrime" is greater than the range, as there are no composites left up to that limit, but this is always slower than a SoE due to the high overhead in generating the wheels: it would take a wheel generation of 1229 (the number of primes up to ten thousand, the square root of a hundred million) to create the required wheel sieved to a hundred million. However, the theoretical asymptotic performance (if the time to advance through the lists per element were zero, which of course it is not) would be O(n) instead of O(n log (log n)), where n is the range sieved. This is just another case where theory supports a (slightly) reduced number of operations, but in practice the overheads are so big as to make it useless for any reasonable range ;-) !

Priority Queue based incremental sieve

The above work is derived from the Epilogue of the Melissa E. O'Neill paper which is much referenced with respect to incremental functional sieves; however, that paper is now dated, and her comments comparing list based sieves to her original work leading up to a Priority Queue based implementation are no longer current given more recent work such as the above Tree Merging version. Accordingly, a modern "odds-only" Priority Queue version is developed here for more current comparisons between the above list based incremental sieves and a continuation of O'Neill's work.

In order to implement a Priority Queue version with Haskell, an efficient Priority Queue, which is not part of the standard Haskell libraries, is required. A Min Heap implementation is likely best suited for this task, as it provides the most efficient versions of the frequently used operations: peeking at the next item in the queue and replacing the first item in the queue (without a "pop" followed by a "push"). "Pop" operations are then not used at all, and "push" operations are used relatively infrequently. O'Neill uses an efficient "deleteMinAndInsert" operation, of which she states "(We provide deleteMinAndInsert because a heap-based implementation can support this operation with considerably less rearrangement than a deleteMin followed by an insert.)"; that statement is true for a Min Heap Priority Queue and not for others. Judging by this and by her reference to a priority queue by (Paulson, 1996), the queue she used is likely the one provided as a simple true functional Min Heap implementation on RosettaCode, from which the essential functions are reproduced here:

data PriorityQ k v = Mt
                     | Br !k v !(PriorityQ k v) !(PriorityQ k v)
  deriving (Eq, Ord, Read, Show)

emptyPQ :: PriorityQ k v
emptyPQ = Mt
 
peekMinPQ :: PriorityQ k v -> Maybe (k, v)
peekMinPQ Mt           = Nothing
peekMinPQ (Br k v _ _) = Just (k, v)

pushPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
pushPQ wk wv Mt           = Br wk wv Mt Mt
pushPQ wk wv (Br vk vv pl pr)
             | wk <= vk   = Br wk wv (pushPQ vk vv pr) pl
             | otherwise  = Br vk vv (pushPQ wk wv pr) pl
 
siftdown :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v -> PriorityQ k v
siftdown wk wv Mt _          = Br wk wv Mt Mt
siftdown wk wv pl@(Br vk vv _ _) Mt
    | wk <= vk               = Br wk wv pl Mt
    | otherwise              = Br vk vv (Br wk wv Mt Mt) Mt
siftdown wk wv pl@(Br vkl vvl pll plr) pr@(Br vkr vvr prl prr)
    | wk <= vkl && wk <= vkr = Br wk wv pl pr
    | vkl <= vkr             = Br vkl vvl (siftdown wk wv pll plr) pr
    | otherwise              = Br vkr vvr pl (siftdown wk wv prl prr)
 
replaceMinPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
replaceMinPQ wk wv Mt             = Mt
replaceMinPQ wk wv (Br _ _ pl pr) = siftdown wk wv pl pr

The "peekMin" function retrieves both the key and value in a tuple so processing is required to access whichever is required for further processing. As well, the output of the peekMin function is a Maybe with the case of an empty queue providing a Nothing output.
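
The "replace the minimum" operation described above has a direct counterpart for array-based binary heaps: overwrite the root and sift it down once. The sketch below is an analogy built on Go's standard container/heap package and is not a translation of the functional Min Heap above; the entry type and the helper names push, peekMin and replaceMin are assumptions made for illustration.

```go
package main

import "container/heap"

// entry pairs the next composite to cull with the step to the one after it.
type entry struct{ next, step int }

// minHeap implements heap.Interface, ordered by the next composite.
type minHeap []entry

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].next < h[j].next }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// push adds an entry to the heap.
func push(h *minHeap, e entry) { heap.Push(h, e) }

// peekMin returns the smallest entry; ok is false when the heap is empty.
func peekMin(h *minHeap) (e entry, ok bool) {
	if h.Len() == 0 {
		return entry{}, false
	}
	return (*h)[0], true
}

// replaceMin overwrites the root and restores heap order with a single
// sift-down, instead of a pop followed by a push. As with replaceMinPQ
// above, an empty heap is left empty.
func replaceMin(h *minHeap, e entry) {
	if h.Len() == 0 {
		return
	}
	(*h)[0] = e
	heap.Fix(h, 0)
}
```

In a sieve loop, replaceMin would be called with the minimum entry advanced by its own step each time the current candidate matches the smallest pending composite.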

The following code is O'Neill's original odds-only code (without wheel factorization) from her paper slightly adjusted as per the requirements of this Min Heap implementation as laid out above; note the `seq` adjustments to the "adjust" function to make the evaluation of the entry tuple more strict for better efficiency:

-- (c) 2006-2007 Melissa O'Neill.  Code may be used freely so long as
-- this copyright message is retained and changed versions of the file
-- are clearly marked.
--   the only changes are the names of the called PQ functions and the
--   included processing for the result of the peek function being a maybe tuple.

primesPQ() = 2 : sieve [3,5..]
  where
    sieve [] = []
    sieve (x:xs) = x : sieve' xs (insertprime x xs emptyPQ)
      where
        insertprime p xs table = pushPQ (p*p) (map (* p) xs) table
        sieve' [] table = []
        sieve' (x:xs) table
            | nextComposite <= x = sieve' xs (adjust table)
            | otherwise = x : sieve' xs (insertprime x xs table)
          where
            nextComposite = case peekMinPQ table of
                              Just (c, _) -> c
            adjust table
                | n <= x = adjust (replaceMinPQ n' ns table)
                | otherwise = table
              where (n, n':ns) = case peekMinPQ table of
                                   Just tpl -> tpl

The above code is almost four times slower than the Tree Merging sieve above for the first million primes, although it is about the same speed as the original Richard Bird sieve with the "odds-only" adaptation as above. It is slow and uses a huge amount of memory primarily for one reason: over-eagerness in adding prime composite streams to the queue. The streams are added as the primes are listed rather than when they are first required, which is when the output primes stream reaches the square of a given base prime. This over-eagerness also means that the processed numbers must have a large range in order not to overflow when squared (as with the default Integer, i.e. infinite precision integers, used here and by O'Neill; Int64 or Word64 would also give a practical range), and processing such wide-range numbers adds time and memory overhead. Although O'Neill's code is elegant, it also loses some efficiency due to the extensive use of lazy list processing, not all of which is required even for a wheel factorization implementation.

The following code is adjusted to reduce the amount of lazy list processing and to add a secondary base primes stream (or a succession of streams when the combinator is used) so as to overcome the above problems and reduce memory consumption to only that required for the primes below the square root of the currently sieved number; using this means that 32-bit Int's are sufficient for a reasonable range and memory requirements become relatively negligible:

primesPQx :: () -> [Int]
primesPQx() = 2 : _Y ((3 :) . sieve 5 emptyPQ 9) -- initBasePrms
  where
    _Y g = g (_Y g)        -- non-sharing multi-stage fixpoint combinator OR

    sieve n table q bps@(bp:bps')
        | n >= q = let nbp = head bps' in let ntbl = insertprime bp table in
                   ntbl `seq` sieve (n + 2) ntbl (nbp * nbp) bps'
        | n >= nextComposite = let ntbl = adjust table in
                               ntbl `seq` sieve (n + 2) ntbl q bps
        | otherwise = n : sieve (n + 2) table q bps
      where
        insertprime p table = let adv = 2 * p in let nv = p * p + adv
                              in nv `seq` pushPQ nv adv table
        nextComposite = case peekMinPQ table of
                          Nothing -> q -- at beginning when queue empty!
                          Just (c, _) -> c
        adjust table
            | c <= n = let ntbl = replaceMinPQ (c + adv) adv table
                       in ntbl `seq` adjust ntbl
            | otherwise = table
          where (c, adv) = case peekMinPQ table of Just ct -> ct `seq` ct

The above code is over five times faster than the previous (O'Neill) Priority Queue code and half again faster than the Tree-Merging Odds-Only code for a range of a hundred million primes; it is likely faster because the Min Heap is slightly more efficient than Tree Merging due to better tree balancing.
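The postponement idea described above can also be sketched outside Haskell. The following Python generator is only an illustration, not a translation of the code above: it assumes the standard `heapq` module as the min-heap, and the name `primes_pq` is made up for this sketch. A base prime's entry is pushed onto the heap only when the candidate reaches that prime's square, and the base primes come from a secondary instance of the same generator.

```python
import heapq
from itertools import islice

def primes_pq():
    """Unbounded odds-only sieve; a hypothetical sketch of postponed insertion."""
    yield 2
    yield 3
    base = primes_pq()           # secondary base-primes stream
    next(base)                   # discard 2
    bp = next(base)              # first odd base prime, 3
    q = bp * bp                  # its square, 9
    heap = []                    # entries are (next_composite, step)
    n = 5
    while True:
        if n >= q:
            # n is the square of the current base prime: only now does that
            # prime get an entry in the heap
            heapq.heappush(heap, (q + 2 * bp, 2 * bp))
            bp = next(base)
            q = bp * bp
        elif heap and heap[0][0] <= n:
            # n is composite: advance every entry that has reached n
            while heap[0][0] <= n:
                c, adv = heap[0]
                heapq.heapreplace(heap, (c + adv, adv))
        else:
            yield n
        n += 2

print(list(islice(primes_pq(), 10)))
```

Because entries exist only for primes up to the square root of the current candidate, the heap stays small, which is the memory saving the text attributes to the secondary base primes stream.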

Since the Tree-Folding version above includes the minor changes to work with a factorization wheel, the Priority Queue version should have the same minor modifications for comparison purposes, with the code as follows:

-- Note:  this code segment uses the same wheelGen as the Tree-Folding version...

primesPQWheeled :: () -> [Int]
primesPQWheeled() =
    wheelPrimes ++ map fst (
      _Y (((firstSievePrime, wheel) :) .
            sieve (firstSievePrime + head wheel, tail wheel)
                  emptyPQ (firstSievePrime * firstSievePrime)) )
  where
    _Y g = g (_Y g)        -- non-sharing multi-stage fixpoint combinator OR

    wheel = cycle gaps

    sieve npr@(n,(g:gs')) table q bpprs@(bppr:bpprs')
        | n >= q =
            let (nbp,_) = head bpprs' in let ntbl = insertprime bppr table in
            nbp `seq` ntbl `seq` sieve (n + g, gs') ntbl (nbp * nbp) bpprs'
        | n >= nextComposite = let ntbl = adjust table in
                               ntbl `seq` sieve (n + g, gs') ntbl q bpprs
        | otherwise = npr : sieve (n + g, gs') table q bpprs
      where
        insertprime (p,(pg:pgs')) table =
          let nv = p * (p + pg) in nv `seq` pushPQ nv (map (* p) pgs') table
        nextComposite = case peekMinPQ table of
                          Nothing -> q -- at beginning when queue empty!
                          Just (c, _) -> c
        adjust table
            | c <= n = let ntbl = replaceMinPQ (c + a) as' table
                       in ntbl `seq` adjust ntbl
            | otherwise = table
          where (c, (a:as')) = case peekMinPQ table of Just ct -> ct `seq` ct

Compiled with -O2 optimization and -fllvm (the LLVM back end), this code gains about the expected ratio in performance in sieving to a range of a hundred million, sieving to this range in about five seconds on a modern medium range desktop computer. This is likely the fastest purely functional incremental type SoE useful for moderate ranges up to about a hundred million to a billion.

Page Segmented Sieve using a mutable unboxed array

All of the above unbounded sieves are quite limited in practical sieving range due to the large constant factor overheads in computation, making them mostly just interesting intellectual exercises other than for small ranges of up to about the first million to ten million primes; the following "odds-only" page-segmented version using (bit-packed internally) mutable unboxed arrays is about 50 times faster than the fastest of the above algorithms for ranges of about that and higher, making it practical for the first several hundred million primes:

{-# OPTIONS_GHC -O2 -fllvm #-} -- use LLVM for about double speed!

import Data.Int ( Int64 )
import Data.Word ( Word64 )
import Data.Bits ( Bits(shiftR) )
import Data.Array.Base ( IArray(unsafeAt), UArray(UArray),     
                         MArray(unsafeWrite), unsafeFreezeSTUArray ) 
import Control.Monad ( forM_ )
import Data.Array.ST ( MArray(newArray), runSTUArray )

type Prime = Word64

cSieveBufferRange :: Int
cSieveBufferRange = 2^17 * 8 -- CPU L2 cache in bits

primes :: () -> [Prime]
primes() = 2 : _Y (listPagePrms . pagesFrom 0) where
  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint combinator
  szblmt = cSieveBufferRange - 1
  listPagePrms pgs@(hdpg@(UArray lwi _ rng _) : tlpgs) =
    let loop i | i >= fromIntegral rng = listPagePrms tlpgs
               | unsafeAt hdpg i = loop (i + 1)
               | otherwise = let ii = lwi + fromIntegral i in
                             case fromIntegral $ 3 + ii + ii of
                               p -> p `seq` p : loop (i + 1) in loop 0
  makePg lwi bps = runSTUArray $ do
    let limi = lwi + fromIntegral szblmt
        bplmt = floor $ sqrt $ fromIntegral $ limi + limi + 3
        strta bp = let si = fromIntegral $ (bp * bp - 3) `shiftR` 1
                   in if si >= lwi then fromIntegral $ si - lwi else
                   let r = fromIntegral (lwi - si) `mod` bp
                   in if r == 0 then 0 else fromIntegral $ bp - r
    cmpsts <- newArray (lwi, limi) False
    fcmpsts <- unsafeFreezeSTUArray cmpsts
    let cbps = if lwi == 0 then listPagePrms [fcmpsts] else bps
    forM_ (takeWhile (<= bplmt) cbps) $ \ bp ->
      forM_ (let sp = fromIntegral $ strta bp
             in [ sp, sp + fromIntegral bp .. szblmt ]) $ \c ->
        unsafeWrite cmpsts c True
    return cmpsts
  pagesFrom lwi bps = map (`makePg` bps)
                          [ lwi, lwi + fromIntegral szblmt + 1 .. ]

The above code as written has a maximum practical range of about 10^12 or so in about an hour.

The above code takes only a few tens of milliseconds to compute the first million primes and a few seconds to calculate the first 50 million primes up to a billion, with over half of those times expended in just enumerating the result lazy list. A further improvement to reduce the computational cost of repeated list processing across the base pages for every page segment would be to store the required base primes (or base prime gaps) in a lazy list of base prime arrays; in that way the scans across base primes per page segment would just mostly be array accesses which are much faster than list enumeration.

Unlike many other unbounded examples, this algorithm has the true Sieve of Eratosthenes computational time complexity of O(n log log n), where n is the sieving range, with no extra "log n" factor. It also has a very low computational cost per composite number cull of less than ten CPU clock cycles (well under that: less than 4 clock cycles on an Intel i7 when the page buffer is sized to the CPU L1 cache).
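The key step in page segmentation is finding where each base prime starts culling within a page. A minimal Python sketch of that calculation follows, mirroring the `strta` logic above (index i stands for the odd number 3 + 2i). The function names are invented for this sketch, and as a simplifying assumption the base primes are precomputed by a small ordinary sieve rather than taken from a recursive stream.

```python
from math import isqrt

def small_odd_primes(n):
    """Odd primes <= n by an ordinary bounded sieve (base primes only)."""
    flags = bytearray([1]) * (n + 1)
    for p in range(3, isqrt(n) + 1, 2):
        if flags[p]:
            for m in range(p * p, n + 1, 2 * p):
                flags[m] = 0
    return [p for p in range(3, n + 1, 2) if flags[p]]

def page_start(bp, lwi):
    """Offset in the page beginning at index lwi of bp's first cull."""
    si = (bp * bp - 3) >> 1          # index of bp * bp
    if si >= lwi:
        return si - lwi
    r = (lwi - si) % bp
    return 0 if r == 0 else bp - r

def primes_paged(limit, page_size=1 << 12):
    """Primes <= limit, sieving odd numbers one fixed-size page at a time."""
    if limit < 2:
        return []
    base = small_odd_primes(isqrt(limit))
    primes = [2]
    last = (limit - 3) // 2          # index of the largest odd candidate
    for lwi in range(0, last + 1, page_size):
        size = min(page_size, last + 1 - lwi)
        page = bytearray(size)       # 0 means "not culled"
        top = 3 + 2 * (lwi + size - 1)
        for bp in base:
            if bp * bp > top:
                break
            for c in range(page_start(bp, lwi), size, bp):
                page[c] = 1
        primes.extend(3 + 2 * (lwi + i) for i in range(size) if not page[i])
    return primes
```

Each page is culled independently, so the working buffer never grows with the sieving range; only the list of base primes does.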

There are other ways to make the algorithm faster including high degrees of wheel factorization, which can reduce the number of composite culling operations by a factor of about four for practical ranges, and multi-processing which can reduce the computation time proportionally to the number of available independent CPU cores, but there is little point to these optimizations as long as the lazy list enumeration is the bottleneck as it is starting to be in the above code. To take advantage of those optimizations, functions need to be provided that can compute the desired results without using list processing.

For ranges above about 10^14, where culling spans begin to exceed even an expanded size page array, other techniques need to be adopted, such as automatically extending the sieving buffer size to the square root of the maximum range currently sieved and sieving by CPU L1/L2 cache sized segments/sections.

However, even with the above code and its limitations for large sieving ranges, it will never be anywhere near as slow as the other "incremental" sieve algorithms: the time will never exceed about 20 CPU clock cycles per composite number cull, whereas the fastest of those other algorithms takes many hundreds of CPU clock cycles per cull.

A faster method of counting primes with a similar algorithm

To show the limitations of individual prime enumeration, the following code has been refactored from the above to provide an alternate, very fast method of counting the unset bits in the culled array (the primes, i.e. the non-composites) using a CPU native pop count instruction:

{-# LANGUAGE  FlexibleContexts #-}
{-# OPTIONS_GHC -O2 -fllvm #-} -- use LLVM for about double speed!

import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing

import Data.Int ( Int64 )
import Data.Word ( Word64 )
import Data.Bits ( Bits((.&.), (.|.), shiftL, shiftR, popCount) )
import Control.Monad.ST ( ST, runST )
import Data.Array.Base ( IArray(unsafeAt), UArray(UArray), STUArray,
                         MArray(unsafeRead, unsafeWrite), castSTUArray,
                         unsafeThawSTUArray, unsafeFreezeSTUArray ) 
import Control.Monad ( forM_ )
import Data.Array.ST ( MArray(newArray), runSTUArray )

type Prime = Word64

cSieveBufferRange :: Int
cSieveBufferRange = 2^17 * 8 -- CPU L2 cache in bits

type PrimeNdx = Int64
type SieveBuffer = UArray PrimeNdx Bool
cWHLPRMS :: [Prime]
cWHLPRMS = [ 2 ]
cFRSTSVPRM :: Prime
cFRSTSVPRM = 3
primesPages :: () -> [SieveBuffer]
primesPages() = _Y (pagesFrom 0 . listPagePrms) where
  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint Y-combinator
  szblmt = fromIntegral (cSieveBufferRange `shiftR` 1) - 1
  makePg lwi bps = runSTUArray $ do
    let limi = lwi + fromIntegral szblmt
        mxprm = cFRSTSVPRM + fromIntegral (limi + limi)
        bplmt = floor $ sqrt $ fromIntegral mxprm
        strta bp = let si = fromIntegral $ (bp * bp - cFRSTSVPRM) `shiftR` 1
                   in if si >= lwi then fromIntegral $ si - lwi else
                   let r = fromIntegral (lwi - si) `mod` bp
                   in if r == 0 then 0 else fromIntegral $ bp - r
    cmpsts <- newArray (lwi, limi) False
    fcmpsts <- unsafeFreezeSTUArray cmpsts
    let cbps = if lwi == 0 then listPagePrms [fcmpsts] else bps
    forM_ (takeWhile (<= bplmt) cbps) $ \ bp ->
      forM_ (let sp = fromIntegral $ strta bp
             in [ sp, sp + fromIntegral bp .. szblmt ]) $ \c ->
        unsafeWrite cmpsts c True
    return cmpsts
  pagesFrom lwi bps = map (`makePg` bps)
                          [ lwi, lwi + fromIntegral szblmt + 1 .. ]

-- convert a list of sieve buffers to a list of primes...
listPagePrms :: [SieveBuffer] -> [Prime]
listPagePrms pgs@(pg@(UArray lwi _ rng _) : pgstl) = bsprm `seq` loop 0 where
  bsprm = cFRSTSVPRM + fromIntegral (lwi + lwi)
  loop i | i >= rng = listPagePrms pgstl
         | unsafeAt pg i = loop (i + 1)
         | otherwise = case bsprm + fromIntegral (i + i) of
                         p -> p `seq` p : loop (i + 1)
 
primes :: () -> [Prime]
primes() = cWHLPRMS ++ listPagePrms (primesPages())

-- very fast using popCount by words technique...
countSieveBuffer :: Int -> UArray PrimeNdx Bool -> Int64
countSieveBuffer lstndx sb = fromIntegral $ runST $ do
  cmpsts <- unsafeThawSTUArray sb :: ST s (STUArray s PrimeNdx Bool)
  wrdcmpsts <-
    (castSTUArray :: STUArray s PrimeNdx Bool ->
                      ST s (STUArray s PrimeNdx Word64)) cmpsts
  let lstwrd = lstndx `shiftR` 6
      lstmsk = 0xFFFFFFFFFFFFFFFE `shiftL` (lstndx .&. 63) :: Word64
      loop wi cnt
        | wi < lstwrd = do
          v <- unsafeRead wrdcmpsts wi
          case cnt - popCount v of ncnt -> ncnt `seq` loop (wi + 1) ncnt
        | otherwise = do
            v <- unsafeRead wrdcmpsts lstwrd
            return $ fromIntegral (cnt - popCount (v .|. lstmsk))
  loop 0 (lstwrd * 64 + 64)

-- count the remaining un-marked composite bits using very fast popcount...
countPrimesTo :: Prime -> Int64
countPrimesTo limit =
  let lmtndx = fromIntegral $ (limit - 3) `shiftR` 1
      loop (pg@(UArray lwi lmti rng _) : pgstl) cnt
        | lmti >= lmtndx =
          (cnt + countSieveBuffer (fromIntegral $ lmtndx - lwi) pg)
        | otherwise = loop pgstl (cnt + countSieveBuffer (rng - 1) pg)
  in if limit < 3 then if limit < 2 then 0 else 1
     else loop (primesPages()) 1

-- test it...
main :: IO ()
main = do
  let limit = 10^9 :: Prime

  strt <- getPOSIXTime
--  let answr = length $ takeWhile (<= limit) $ primes()-- slow way
  let answr = countPrimesTo limit -- fast way
  stop <- answr `seq` getPOSIXTime -- force evaluation of answr b4 stop time!
  let elpsd = round $ 1e3 * (stop - strt) :: Int64
 
  putStr $ "Found " ++ show answr
  putStr $ " primes up to " ++ show limit
  putStrLn $ " in " ++ show elpsd ++ " milliseconds."

When compiled with the "fast way" commented out and the "slow way" enabled, the time to find the number of primes up to one billion is about 3.65 seconds on an Intel Sandy Bridge i3-2100 at 3.1 GHz; with the "fast way" enabled instead, the time is only about 1.45 seconds for the same range, both compiled with the LLVM back end. This shows that more than half of the time for the "slow way" is spent just producing and enumerating the list of primes!
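The counting technique can be illustrated in Python as well. This sketch assumes the composite flags are packed 64 per integer word, as in `countSieveBuffer` above; the function names are invented here, and `bin(w).count("1")` stands in for a native pop count instruction. Bits above the last wanted index in the final word are forced to one by a mask so that they are never counted as primes.

```python
WORD_MASK = 0xFFFFFFFFFFFFFFFF

def count_clear_bits(words, last_index):
    """Count zero bits at positions 0..last_index in 64-bit words."""
    last_word = last_index >> 6
    # ones at every position above last_index within the final word
    mask = (0xFFFFFFFFFFFFFFFE << (last_index & 63)) & WORD_MASK
    count = (last_word + 1) * 64
    for w in words[:last_word]:
        count -= bin(w).count("1")
    count -= bin(words[last_word] | mask).count("1")
    return count

def count_primes_to(limit):
    """Count primes <= limit with an odds-only bit-packed sieve."""
    if limit < 3:
        return 0 if limit < 2 else 1
    last = (limit - 3) >> 1              # index i represents 3 + 2*i
    words = [0] * ((last >> 6) + 1)
    i = 0
    while True:
        p = 3 + 2 * i
        if p * p > limit:
            break
        if not (words[i >> 6] >> (i & 63)) & 1:
            for c in range((p * p - 3) >> 1, last + 1, p):
                words[c >> 6] |= 1 << (c & 63)
        i += 1
    return 1 + count_clear_bits(words, last)   # the extra 1 accounts for 2

print(count_primes_to(100))
```

Counting a whole word at a time avoids visiting every candidate individually, which is where the enumeration-based "slow way" spends most of its time.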

On an Intel Sky Lake i5-2500 CPU @ 3.6 GHz (turbo boost for single threaded as here) compiled with LLVM and 256 Kilobyte buffer size (CPU L2 sized), using the fast counting method:

  • takes 1.085 seconds to sieve to 10^9: about 3.81 CPU clocks per cull
  • takes 126 seconds to sieve to 10^11: about 4.0 CPU clocks per cull

This shows a slight loss of efficiency in clocks per cull due to the average culling span size coming closer to the cull buffer span size, meaning that the loop overhead in address calculation and CPU L1 cache overflows increases just a bit for these relative ranges.
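The "clocks per cull" figures can be checked with a little arithmetic: count how many cull operations an odds-only sieve performs, then divide the elapsed clock cycles by that count. The Python sketch below is a rough estimate under stated assumptions: every odd prime p up to the square root of the limit culls p*p, p*p + 2p, and so on; the CPU runs at the quoted clock rate throughout; and all of the elapsed time is attributed to culling. The function names are made up for this sketch.

```python
from math import isqrt

def odds_only_culls(limit):
    """Number of culls when each odd prime p <= sqrt(limit) marks
    p*p, p*p + 2p, ... up to limit."""
    root = isqrt(limit)
    flags = bytearray([1]) * (root + 1)
    culls = 0
    for p in range(3, root + 1, 2):
        if flags[p]:
            for m in range(p * p, root + 1, 2 * p):
                flags[m] = 0
            culls += (limit - p * p) // (2 * p) + 1
    return culls

def clocks_per_cull(seconds, clock_hz, limit):
    """Elapsed clock cycles divided by the number of culls."""
    return seconds * clock_hz / odds_only_culls(limit)

# the first timing quoted above: 1.085 seconds at 3.6 GHz sieving to 10^9
print(round(clocks_per_cull(1.085, 3.6e9, 10**9), 2))
```

Under these assumptions the result should land in the neighbourhood of the figure quoted in the list above.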

These Sky Lake timings are about an extra 20% faster than the Sandy Bridge i3-2100 above, beyond what the ratio of CPU clock speeds accounts for. This is likely due to the better Instructions Per Clock of the newer Sky Lake architecture, which has improved branch prediction and can reduce the cost of a correctly predicted branch to close to zero time.

This is about 25 to 30 per cent faster than not using LLVM on this Sky Lake processor, due to the poorer register allocation and optimizations of the Native Code Generator compared to LLVM.

APL-style

Rolling set subtraction over the rolling element-wise addition on integers. Basic, slow, worse than quadratic in the number of primes produced, empirically:

zipWith (flip (!!)) [0..]    -- or: take n . last . take n ...
     . scanl1 minus 
     . scanl1 (zipWith (+)) $ repeat [2..]

Or, a wee bit faster:

unfoldr (\(a:b:t) -> Just . (head &&& (:t) . (`minus` b)
                                           . tail) $ a)
     . scanl1 (zipWith (+)) $ repeat [2..]

A bit optimized, much faster, with better complexity,

tail . concat 
     . unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
                                 . span (< head b) $ a)
     . scanl1 (zipWith (+) . tail) $ tails [1..]
  -- $ [ [n*n, n*n+n..] | n <- [1..] ]

getting nearer to the functional equivalent of the primesPE above, i.e.

fix ( (2:) . concat 
      . unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
                                  . span (< head b) $ a)
      . ([3..] :) . map (\p-> [p*p, p*p+p..]) )

An illustration:

> mapM_ (print . take 15) $ take 10 $ scanl1 (zipWith(+)) $ repeat [2..]
[  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16]
[  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
[  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
[  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64]
[ 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
[ 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96]
[ 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98,105,112]
[ 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96,104,112,120,128]
[ 18, 27, 36, 45, 54, 63, 72, 81, 90, 99,108,117,126,135,144]
[ 20, 30, 40, 50, 60, 70, 80, 90,100,110,120,130,140,150,160]

> mapM_ (print . take 15) $ take 10 $ scanl1 (zipWith(+) . tail) $ tails [1..]
[  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]
[  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
[  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51]
[ 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72]
[ 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
[ 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96,102,108,114,120]
[ 49, 56, 63, 70, 77, 84, 91, 98,105,112,119,126,133,140,147]
[ 64, 72, 80, 88, 96,104,112,120,128,136,144,152,160,168,176]
[ 81, 90, 99,108,117,126,135,144,153,162,171,180,189,198,207]
[100,110,120,130,140,150,160,170,180,190,200,210,220,230,240]
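For readers less used to the point-free style, the row-subtraction idea can be sketched in a bounded form in Python. This is an analogue under a fixed upper limit, not a translation of the lazy Haskell code: each row n holds n*n, n*n + n, and so on (as in the second table above), and every row is subtracted in turn from the candidate list. The function name is hypothetical.

```python
def primes_by_row_subtraction(limit):
    """Bounded analogue of rolling set subtraction over rows of multiples."""
    remaining = list(range(2, limit + 1))
    n = 2
    while n * n <= limit:
        # row n of the second table, cut off at the limit
        row = set(range(n * n, limit + 1, n))
        remaining = [x for x in remaining if x not in row]
        n += 1
    return remaining

print(primes_by_row_subtraction(50))
```

As in the simpler Haskell variants, rows are generated for every n and not only for primes, so much of the subtraction work is redundant.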

HicEst

REAL :: N=100,  sieve(N)

sieve = $ > 1     ! = 0 1 1 1 1 ...
DO i = 1, N^0.5
  IF( sieve(i) ) THEN
     DO j = i^2, N, i
       sieve(j) = 0
     ENDDO
  ENDIF
ENDDO

DO i = 1, N
  IF( sieve(i) ) WRITE() i
ENDDO

Hoon

::  Find primes by the sieve of Eratosthenes
!:
|=  end=@ud
=/  index  2
=/  primes  `(list @ud)`(gulf 1 end)
|-  ^-  (list @ud)
?:  (gte index (lent primes))  primes
$(index +(index), primes +:(skid primes |=([a=@ud] &((gth a index) =(0 (mod a index))))))

Icon and Unicon

 procedure main()
    sieve(100)
 end

 procedure sieve(n)
    local p,i,j
    p:=list(n, 1)
    every i:=2 to sqrt(n) & j:= i+i to n by i & p[i] == 1
      do p[j] := 0
    every write(i:=2 to n & p[i] == 1 & i)
 end

Alternatively using sets

 procedure main()
     sieve(100)
 end

 procedure sieve(n)
     primes := set()
     every insert(primes,1 to n)
     every member(primes,i := 2 to n) do
         every delete(primes,i + i to n by i)
     delete(primes,1)
     every write(!sort(primes))
end

J

Generally, this task should be accomplished in J using i.&.(p:inv) . Here we take an approach that's more comparable with the other examples on this page.

Implementation:

sieve=: {{
  r=. 0#t=. y# j=.1
  while. y>j=.j+1 do.
    if. j{t do.
      t=. t > y$j{.1
      r=. r, j
    end.
  end.
}}

Example:

   sieve 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

To see into how this works, we can change the definition:

sieve=: {{
  r=. 0#t=. y# j=.1
  while. y>j=.j+1 do.
    if. j{t do.
      echo j;(y$j{.1);t=. t > y$j{.1
      r=. r, j
    end.
  end.
}}

And go:

   sieve 10
┌─┬───────────────────┬───────────────────┐
│2│1 0 1 0 1 0 1 0 1 0│0 1 0 1 0 1 0 1 0 1│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│3│1 0 0 1 0 0 1 0 0 1│0 1 0 0 0 1 0 1 0 0│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│5│1 0 0 0 0 1 0 0 0 0│0 1 0 0 0 0 0 1 0 0│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│7│1 0 0 0 0 0 0 1 0 0│0 1 0 0 0 0 0 0 0 0│
└─┴───────────────────┴───────────────────┘
2 3 5 7

Thus, here, t would select numbers which have not yet been determined to be a multiple of a prime number.
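The same mask-based steps can be sketched in Python for comparison. This is an illustrative rendering under the assumption that `y$j{.1` is read as a length-y mask with ones at indices 0, j, 2j, and so on, and that `t > mask` means "t and not mask". The function name is invented for this sketch.

```python
def sieve_by_masks(y):
    """Primes below y, clearing one whole mask of multiples per prime."""
    t = [True] * y                  # t[i]: i not yet known to be a multiple
    r = []
    for j in range(2, y):
        if t[j]:
            mask = [i % j == 0 for i in range(y)]        # multiples of j
            t = [a and not m for a, m in zip(t, mask)]   # t > mask
            r.append(j)             # j itself was cleared, so record it
    return r

print(sieve_by_masks(10))   # [2, 3, 5, 7]
```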

Janet

Simple, all primes below a limit

Janet has a builtin "buffer" type which is used as a mutable byte string. It has builtin utility methods to handle bit strings.

This is based off the Python version.

(defn primes-before
  "Gives all the primes < limit"
  [limit]
  (assert (int? limit))
  # Janet has a buffer type (mutable string) which has easy methods for use as bitset
  (def buf-size (math/ceil (/ limit 8)))
  (def is-prime (buffer/new-filled buf-size (bnot 0)))
  (buffer/bit-clear is-prime 0)
  (buffer/bit-clear is-prime 1)
  (for n 0 (math/ceil (math/sqrt limit))
    (if (buffer/bit is-prime n)
      # :range excludes limit itself, matching the "< limit" contract
      (loop [i :range [(* n n) limit n]]
        (buffer/bit-clear is-prime i))))
  (def res @[]) # Result: Mutable array
  (for i 0 limit
    (if (buffer/bit is-prime i)
      (array/push res i)))
  res)
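Since this entry is based on the Python version, the byte-buffer bitset it relies on can be sketched in Python with a `bytearray`. The helper names `bit` and `bit_clear` are made up to echo Janet's `buffer/bit` and `buffer/bit-clear`; they are not part of any library.

```python
from math import isqrt

def primes_before(limit):
    """All primes < limit, using one bit per candidate in a bytearray."""
    if limit < 2:
        return []
    buf = bytearray(b"\xff" * ((limit + 7) // 8))   # all bits set

    def bit(i):
        return (buf[i >> 3] >> (i & 7)) & 1

    def bit_clear(i):
        buf[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    bit_clear(0)
    bit_clear(1)
    for n in range(2, isqrt(limit - 1) + 1):
        if bit(n):
            for i in range(n * n, limit, n):
                bit_clear(i)
    return [i for i in range(limit) if bit(i)]

print(primes_before(10))   # [2, 3, 5, 7]
```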


Java

Works with: Java version 1.5+
import java.util.LinkedList;

public class Sieve{
       public static LinkedList<Integer> sieve(int n){
               if(n < 2) return new LinkedList<Integer>();
               LinkedList<Integer> primes = new LinkedList<Integer>();
               LinkedList<Integer> nums = new LinkedList<Integer>();

               for(int i = 2;i <= n;i++){ //unoptimized
                       nums.add(i);
               }

               while(nums.size() > 0){
                       int nextPrime = nums.remove();
                       // long arithmetic: nextPrime * nextPrime overflows int above 46340
                       for(long i = (long) nextPrime * nextPrime;i <= n;i += nextPrime){
                               nums.removeFirstOccurrence((int) i);
                       }
                       primes.add(nextPrime);
               }
               return primes;
       }
}

To optimize by testing only odd numbers, replace the loop marked "unoptimized" with these lines:

nums.add(2);
for(int i = 3;i <= n;i += 2){
       nums.add(i);
}

Version using List:

import java.util.ArrayList;
import java.util.List;
 
public class Eratosthenes {
    public List<Integer> sieve(Integer n) {
        List<Integer> primes = new ArrayList<Integer>(n);
        boolean[] isComposite = new boolean[n + 1];
        for(int i = 2; i <= n; i++) {
            if(!isComposite[i]) {
                primes.add(i);
                // long arithmetic: i * i overflows int once i exceeds 46340
                for(long j = (long) i * i; j <= n; j += i) {
                    isComposite[(int) j] = true;
                }
            }
        }
        return primes;
    }
}

Version using a BitSet:

import java.util.LinkedList;
import java.util.BitSet;

public class Sieve{
    public static LinkedList<Integer> sieve(int n){
        LinkedList<Integer> primes = new LinkedList<Integer>();
        BitSet nonPrimes = new BitSet(n+1);
        
        for (int p = 2; p <= n ; p = nonPrimes.nextClearBit(p+1)) {
            // long arithmetic: p * p overflows int once p exceeds 46340
            for (long i = (long) p * p; i <= n; i += p)
                nonPrimes.set((int) i);
            primes.add(p);
        }
        return primes;
    }
}


Version using a TreeSet:

import java.util.Set;
import java.util.TreeSet;

public class Sieve{
    public static Set<Integer> findPrimeNumbers(int limit) {
    int last = 2;
    TreeSet<Integer> nums = new TreeSet<>();

    if(limit < last) return nums;

    for(int i = last; i <= limit; i++){
      nums.add(i);
    }

    return filterList(nums, last, limit);
  }

  private static TreeSet<Integer> filterList(TreeSet<Integer> list, int last, int limit) {
    int squared = last*last;
    if(squared <= limit) { // <= so that a limit equal to a prime's square is culled too
      for(int i=squared; i <= limit; i += last) {
        list.remove(i);
      }
      return filterList(list, list.higher(last), limit);
    } 
    return list; 
  }
}

Infinite iterator

An iterator that will generate primes indefinitely (perhaps until it runs out of memory), but very slowly.

Translation of: Python
Works with: Java version 1.5+
import java.util.Iterator;
import java.util.PriorityQueue;
import java.math.BigInteger;

// generates all prime numbers
public class InfiniteSieve implements Iterator<BigInteger> {

    private static class NonPrimeSequence implements Comparable<NonPrimeSequence> {
	BigInteger currentMultiple;
	BigInteger prime;

	public NonPrimeSequence(BigInteger p) {
	    prime = p;
	    currentMultiple = p.multiply(p); // start at square of prime
	}
	@Override public int compareTo(NonPrimeSequence other) {
	    // sorted by value of current multiple
	    return currentMultiple.compareTo(other.currentMultiple);
	}
    }

    private BigInteger i = BigInteger.valueOf(2);
    // priority queue of the sequences of non-primes
    // the priority queue allows us to get the "next" non-prime quickly
    final PriorityQueue<NonPrimeSequence> nonprimes = new PriorityQueue<NonPrimeSequence>();

    @Override public boolean hasNext() { return true; }
    @Override public BigInteger next() {
	// skip non-prime numbers
	for ( ; !nonprimes.isEmpty() && i.equals(nonprimes.peek().currentMultiple); i = i.add(BigInteger.ONE)) {
            // for each sequence that generates this number,
            // have it go to the next number (simply add the prime)
            // and re-position it in the priority queue
	    while (nonprimes.peek().currentMultiple.equals(i)) {
		NonPrimeSequence x = nonprimes.poll();
		x.currentMultiple = x.currentMultiple.add(x.prime);
		nonprimes.offer(x);
	    }
	}
	// prime
        // insert a NonPrimeSequence object into the priority queue
	nonprimes.offer(new NonPrimeSequence(i));
	BigInteger result = i;
	i = i.add(BigInteger.ONE);
	return result;
    }

    public static void main(String[] args) {
	Iterator<BigInteger> sieve = new InfiniteSieve();
	for (int i = 0; i < 25; i++) {
	    System.out.println(sieve.next());
	}
    }
}
Output:
2
3
5
7
11
13
17
19

Infinite iterator with a faster algorithm (sieves odds-only)

The addition of each discovered prime's incremental step information to the mapping should be postponed until the candidate number reaches the prime's square, as it is useless before that point. This drastically reduces the space complexity from O(n/log(n)) to O(sqrt(n/log(n))), in n primes produced, and also lowers the run time complexity due to the use of the hash table based HashMap, which is much more efficient for large ranges.

Translation of: Python
Works with: Java version 1.5+
import java.util.Iterator;
import java.util.HashMap;
 
// generates all prime numbers up to about 10 ^ 19 if one can wait 1000's of years or so...
public class SoEInfHashMap implements Iterator<Long> {

  long candidate = 2;
  Iterator<Long> baseprimes = null;
  long basep = 3;
  long basepsqr = 9;
  // HashMap of the sequences of non-primes
  // the hash map allows us to get the "next" non-prime reasonably quickly
  // but further allows re-insertions to take amortized constant time
  final HashMap<Long,Long> nonprimes = new HashMap<>();

  @Override public boolean hasNext() { return true; }
  @Override public Long next() {
    // do the initial primes separately to initialize the base primes sequence
    if (this.candidate <= 5L) if (this.candidate++ == 2L) return 2L; else {
      this.candidate++; if (this.candidate == 5L) return 3L; else {
        this.baseprimes = new SoEInfHashMap();
        this.baseprimes.next(); this.baseprimes.next(); // throw away 2 and 3
        return 5L;
    } }
    // skip non-prime numbers including squares of next base prime
    for ( ; this.candidate >= this.basepsqr || //equals nextbase squared => not prime
              nonprimes.containsKey(this.candidate); candidate += 2) {
      // insert a square root prime sequence into hash map if to limit
      if (candidate >= basepsqr) { // if square of base prime, always equal
        long adv = this.basep << 1;
        nonprimes.put(this.basep * this.basep + adv, adv);
        this.basep = this.baseprimes.next();
        this.basepsqr = this.basep * this.basep;
      }
      // else for each sequence that generates this number,
      // have it go to the next number (simply add the advance)
      // and re-position it in the hash map at an empty slot
      else {
        long adv = nonprimes.remove(this.candidate);
        long nxt = this.candidate + adv;
        while (this.nonprimes.containsKey(nxt)) nxt += adv; //unique keys
        this.nonprimes.put(nxt, adv);
      }
    }
    // prime
    long tmp = candidate; this.candidate += 2; return tmp;
  }

  public static void main(String[] args) {    
    int n = 100000000;    
    long strt = System.currentTimeMillis();    
    SoEInfHashMap sieve = new SoEInfHashMap();
    int count = 0;
    while (sieve.next() <= n) count++;    
    long elpsd = System.currentTimeMillis() - strt;    
    System.out.println("Found " + count + " primes up to " + n + " in " + elpsd + " milliseconds.");
  }
  
}
Output:
Found 5761455 primes up to 100000000 in 4297 milliseconds.
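The postponed hash map technique can be sketched compactly in Python, using a `dict` in place of the HashMap. This is an illustrative sketch, not the code the Java version was translated from, and the name `primes_postponed` is made up here. As a conservative choice of this sketch, the "find a free key" loop is applied to newly inserted base primes as well as to re-inserted ones.

```python
from itertools import islice

def primes_postponed():
    """Unbounded odds-only sieve with postponed dict insertion (a sketch)."""
    yield 2
    yield 3
    yield 5
    base = primes_postponed()     # secondary base-primes stream
    next(base)
    next(base)                    # discard 2 and 3
    bp, q = 3, 9                  # current base prime and its square
    steps = {}                    # upcoming odd composite -> step (2 * prime)
    n = 7
    while True:
        if n in steps:            # known composite: move its entry along
            adv = steps.pop(n)
        elif n < q:               # not in the map and below the square: prime
            yield n
            n += 2
            continue
        else:                     # n == q: start culling with this base prime
            adv = 2 * bp
            bp = next(base)
            q = bp * bp
        nxt = n + adv
        while nxt in steps:       # keep keys unique
            nxt += adv
        steps[nxt] = adv
        n += 2

print(list(islice(primes_postponed(), 10)))
```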

Infinite iterator with a very fast page segmentation algorithm (sieves odds-only)

Although somewhat faster than the previous infinite iterator version, the HashMap-based iterator above is still over 10 times slower than an infinite iterator based on array paged segmentation as in the following code, where the time to enumerate/iterate over the found primes (common to all the iterators) is now about half of the total execution time:

Translation of: JavaScript
Works with: Java version 1.5+
import java.util.Iterator;
import java.util.ArrayList;

// generates all prime numbers up to about 10 ^ 19 if one can wait 100's of years or so...
// practical range is about 10^14 in a week or so...
public class SoEPagedOdds implements Iterator<Long> {
  private final int BFSZ = 1 << 16;
  private final int BFBTS = BFSZ * 32;
  private final int BFRNG = BFBTS * 2;
  private long bi = -1;
  private long lowi = 0;
  private final ArrayList<Integer> bpa = new ArrayList<>();
  private Iterator<Long> bps;
  private final int[] buf = new int[BFSZ];
  
  @Override public boolean hasNext() { return true; }
  @Override public Long next() {
    if (this.bi < 1) {
      if (this.bi < 0) {
        this.bi = 0;
        return 2L;
      }
      // this.bi must be 0
      long nxt = 3 + (this.lowi << 1) + BFRNG;
      if (this.lowi <= 0) { // special culling for first page as no base primes yet:
          for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
              if ((this.buf[i >>> 5] & (1 << (i & 31))) == 0)
                  for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
                      this.buf[j >>> 5] |= 1 << (j & 31);
      }
      else { // after the first page:
        for (int i = 0; i < this.buf.length; i++)
          this.buf[i] = 0; // clear the sieve buffer
        if (this.bpa.isEmpty()) { // if this is the first page after the zero one:
            this.bps = new SoEPagedOdds(); // initialize separate base primes stream:
            this.bps.next(); // advance past the only even prime of two
            this.bpa.add(this.bps.next().intValue()); // get the next prime (3 in this case)
        }
        // get enough base primes for the page range...
        for (long p = this.bpa.get(this.bpa.size() - 1), sqr = p * p; sqr < nxt;
                p = this.bps.next(), this.bpa.add((int)p), sqr = p * p) ;
        for (int i = 0; i < this.bpa.size() - 1; i++) {
          long p = this.bpa.get(i);
          long s = (p * p - 3) >>> 1;
          if (s >= this.lowi) // adjust start index based on page lower limit...
            s -= this.lowi;
          else {
            long r = (this.lowi - s) % p;
            s = (r != 0) ? p - r : 0;
          }
          for (int j = (int)s; j < BFBTS; j += p)
            this.buf[j >>> 5] |= 1 << (j & 31);
        }
      }
    }
    while ((this.bi < BFBTS) &&
           ((this.buf[(int)this.bi >>> 5] & (1 << ((int)this.bi & 31))) != 0))
        this.bi++; // find next marker still with prime status
    if (this.bi < BFBTS) // within buffer: output computed prime
        return 3 + ((this.lowi + this.bi++) << 1);
    else { // beyond buffer range: advance buffer
        this.bi = 0;
        this.lowi += BFBTS;
        return this.next(); // and recursively loop
    }
  }

  public static void main(String[] args) {    
    long n = 1000000000;
    long strt = System.currentTimeMillis();
    Iterator<Long> gen = new SoEPagedOdds();
    int count = 0;
    while (gen.next() <= n) count++;
    long elpsd = System.currentTimeMillis() - strt;
    System.out.println("Found " + count + " primes up to " + n + " in " + elpsd + " milliseconds.");
  }
  
}
Output:
Found 50847534 primes up to 1000000000 in 3201 milliseconds.

JavaScript

function eratosthenes(limit) {
    var primes = [];
    if (limit >= 2) {
        var sqrtlmt = Math.sqrt(limit) - 2;
        var nums = new Array(); // start with an empty Array...
        for (var i = 2; i <= limit; i++) // and
            nums.push(i); // only initialize the Array once...
        for (var i = 0; i <= sqrtlmt; i++) {
            var p = nums[i]
            if (p)
                for (var j = p * p - 2; j < nums.length; j += p)
                    nums[j] = 0;
        }
        for (var i = 0; i < nums.length; i++) {
            var p = nums[i];
            if (p)
                primes.push(p);
        }
    }
    return primes;
}

var primes = eratosthenes(100);

if (typeof print == "undefined")
    print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print(primes);

outputs:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97

Substituting the following code for the function, which uses an odds-only algorithm with bit packing for the array, produces code that is many times faster than the above:

function eratosthenes(limit) {
    var prms = [];
    if (limit >= 2) prms = [2];
    if (limit >= 3) {
        var sqrtlmt = (Math.sqrt(limit) - 3) >> 1;
        var lmt = (limit - 3) >> 1;
        var bfsz = (lmt >> 5) + 1
        var buf = [];
        for (var i = 0; i < bfsz; i++)
            buf.push(0);
        for (var i = 0; i <= sqrtlmt; i++)
            if ((buf[i >> 5] & (1 << (i & 31))) == 0) {
                var p = i + i + 3;
                for (var j = (p * p - 3) >> 1; j <= lmt; j += p)
                    buf[j >> 5] |= 1 << (j & 31);
            }
        for (var i = 0; i <= lmt; i++)
            if ((buf[i >> 5] & (1 << (i & 31))) == 0)
                prms.push(i + i + 3);
    }
    return prms;
}

While the above code is quite fast, especially on an efficient JavaScript engine such as Google Chrome's V8, it is not as elegant as it could be with the features of the new EcmaScript6 specification, expected about the end of 2014, once JavaScript engines (including those of browsers) implement that standard; with those features we might choose to implement an incremental algorithm using iterators or generators similar to those in Python or F# (yield). Meanwhile, we can emulate some of those features by simulating an iterator class (which is easier than using a call-back function) for an "infinite" generator based on an Object dictionary, as in the following odds-only code written as a JavaScript class:

var SoEIncClass = (function () {
    function SoEIncClass() {
        this.n = 0;
    }
    SoEIncClass.prototype.next = function () {
        this.n += 2;
        if (this.n < 7) { // initialization of sequence to avoid runaway:
            if (this.n < 3) { // only even of two:
                this.n = 1; // odds from here...
                return 2;
            }
            if (this.n < 5)
                return 3;
            this.dict = {}; // n must be 5...
            this.bps = new SoEIncClass(); // new source of base primes
            this.bps.next(); // advance past the even prime of two...
            this.p = this.bps.next(); // first odd prime (3 in this case)
            this.q = this.p * this.p; // set guard
            return 5;
        } else { // past initialization:
            var s = this.dict[this.n]; // may or may not be defined...
            if (!s) { // not defined:
                if (this.n < this.q) // haven't reached the guard:
                    return this.n; // found a prime
                else { // n === q => not prime but at guard, so:
                    var p2 = this.p << 1; // the span odds-only is twice prime
                    this.dict[this.n + p2] = p2; // add next composite of prime to dict
                    this.p = this.bps.next();
                    this.q = this.p * this.p; // get next base prime guard
                    return this.next(); // not prime so advance...
                }
            } else { // is a found composite of previous base prime => not prime
                delete this.dict[this.n]; // advance to next composite of this prime:
                var nxt = this.n + s;
                while (this.dict[nxt]) nxt += s; // find unique empty slot in dict
                this.dict[nxt] = s; // to put the next composite for this base prime
                return this.next(); // not prime so advance...
            }
        }
    };
    return SoEIncClass;
})();

The above code can be used to find the nth prime (which, with the previous fixed range code, would require estimating the required range limit) by using the following code:

var gen = new SoEIncClass(); 
for (var i = 1; i < 1000000; i++, gen.next());
var prime = gen.next();
 
if (typeof print == "undefined")
    print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print(prime);

to produce the following output (about five seconds using Google Chrome's V8 JavaScript engine):

15485863

The above code is considerably slower than the fixed range code due to the multiple method calls and the use of an object as a dictionary. In most implementations the object is based on a hash table, so it has about a constant O(1) amortized time per operation, but it has quite a high constant overhead: the numeric indices are converted to strings, which are then hashed to be used as table keys for the look-up operations. Implementations such as the Python dict do this more directly, using Python's built-in hashing functions for every supported type.
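
The key conversion described above can be observed directly. The following is a minimal sketch, not part of the original solution: the names `dict` and `map` are only illustrative, and the ES2015 `Map` is shown as one possible alternative that keeps numeric keys as numbers. Engines may optimize integer-like object keys internally, so no speed claim is made here.

```javascript
// Plain objects expose property keys as strings, so a numeric key is converted.
var dict = {};
dict[15] = 6; // 15 is the next odd composite, 6 is the step (2 * 3)
var objectKeyTypes = Object.keys(dict).map(function (k) { return typeof k; });

// An ES2015 Map keeps the key's original type.
var map = new Map();
map.set(15, 6);
var mapKeyTypes = Array.from(map.keys()).map(function (k) { return typeof k; });

console.log(objectKeyTypes); // [ 'string' ]
console.log(mapKeyTypes);    // [ 'number' ]
```

Whether a `Map` is actually faster for this sieve depends on the engine and would need to be measured.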

The "infinite" odds-only generator can also be implemented using page segmentation for a considerable speed-up, with the alternate JavaScript class code as follows:

var SoEPgClass = (function () {
    function SoEPgClass() {
        this.bi = -1; // constructor resets the enumeration to start...
    }
    SoEPgClass.prototype.next = function () {
        if (this.bi < 1) {
            if (this.bi < 0) {
                this.bi++;
                this.lowi = 0; // other initialization done here...
                this.bpa = [];
                return 2;
            } else { // bi must be zero:
                var nxt = 3 + (this.lowi << 1) + 262144;
                this.buf = new Array();
                for (var i = 0; i < 4096; i++) // faster initialization:
                    this.buf.push(0);
                if (this.lowi <= 0) { // special culling for first page as no base primes yet:
                    for (var i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
                        if ((this.buf[i >> 5] & (1 << (i & 31))) === 0)
                            for (var j = (sqr - 3) >> 1; j < 131072; j += p)
                                this.buf[j >> 5] |= 1 << (j & 31);
                } else { // after the first page:
                    if (!this.bpa.length) { // if this is the first page after the zero one:
                        this.bps = new SoEPgClass(); // initialize separate base primes stream:
                        this.bps.next(); // advance past the only even prime of two
                        this.bpa.push(this.bps.next()); // get the next prime (3 in this case)
                    }
                    // get enough base primes for the page range...
                    for (var p = this.bpa[this.bpa.length - 1], sqr = p * p; sqr < nxt;
                            p = this.bps.next(), this.bpa.push(p), sqr = p * p) ;
                    for (var i = 0; i < this.bpa.length; i++) {
                        var p = this.bpa[i];
                        var s = (p * p - 3) >> 1;
                        if (s >= this.lowi) // adjust start index based on page lower limit...
                            s -= this.lowi;
                        else {
                            var r = (this.lowi - s) % p;
                            s = (r != 0) ? p - r : 0;
                        }
                        for (var j = s; j < 131072; j += p)
                            this.buf[j >> 5] |= 1 << (j & 31);
                    }
                }
            }
        }
        while (this.bi < 131072 && this.buf[this.bi >> 5] & (1 << (this.bi & 31)))
            this.bi++; // find next marker still with prime status
        if (this.bi < 131072) // within buffer: output computed prime
            return 3 + ((this.lowi + this.bi++) << 1);
        else { // beyond buffer range: advance buffer
            this.bi = 0;
            this.lowi += 131072;
            return this.next(); // and recursively loop
        }
    };
    return SoEPgClass;
})();

The above code is about fifty times faster (about five seconds to calculate 50 million primes to about a billion on the Google Chrome V8 JavaScript engine) than the above dictionary based code.

The speed for both of these "infinite" solutions will also respond to further wheel factorization techniques, especially for the dictionary based version where any added overhead to deal with the factorization wheel will be negligible compared to the dictionary overhead. The dictionary version would likely speed up about a factor of three or a little more with maximum wheel factorization applied; the page segmented version probably won't gain more than a factor of two and perhaps less due to the overheads of array look-up operations.
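
As a rough illustration of what wheel factorization means for the candidates that must be processed, the following sketch steps through the repeating gap pattern of a small 2-3-5 wheel. It is an assumption-laden example rather than the "maximum" wheel mentioned above, and the function name `wheelCandidates` is hypothetical. Only 8 of every 30 integers survive this wheel, compared with 15 of every 30 for odds-only.

```javascript
// Returns the first `count` integers >= 7 that are coprime to 2, 3 and 5,
// found by stepping through the repeating gap pattern of the 2-3-5 wheel.
function wheelCandidates(count) {
    var gaps = [4, 2, 4, 2, 4, 6, 2, 6]; // one wheel revolution; sums to 30 = 2 * 3 * 5
    var out = [];
    for (var n = 7, g = 0; out.length < count; n += gaps[g], g = (g + 1) & 7)
        out.push(n);
    return out;
}

console.log(wheelCandidates(12).join(","));
// 7,11,13,17,19,23,29,31,37,41,43,47
```

The candidates still include composites such as 49 and 77, so a sieve is still needed; the wheel only skips multiples of the wheel primes.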

The function is copy-pasted from above to produce a web page version for beginners:

<script>
function eratosthenes(limit) {
    var primes = [];
    if (limit >= 2) {
        var sqrtlmt = Math.sqrt(limit) - 2;
        var nums = new Array(); // start with an empty Array...
        for (var i = 2; i <= limit; i++) // and
            nums.push(i); // only initialize the Array once...
        for (var i = 0; i <= sqrtlmt; i++) {
            var p = nums[i]
            if (p)
                for (var j = p * p - 2; j < nums.length; j += p)
                    nums[j] = 0;
        }
        for (var i = 0; i < nums.length; i++) {
            var p = nums[i];
            if (p)
                primes.push(p);
        }
    }
    return primes;
}
var primes = eratosthenes(100);
var output = '';
for (var i = 0; i < primes.length; i++) {
    output += primes[i];
    if (i < primes.length - 1) output += ',';
}
document.write(output);
</script>
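
The ES6 generators anticipated earlier in this section have since been standardized as ECMAScript 2015. The following is a sketch, under the assumption of an ES2015-capable engine, of how the incremental odds-only dictionary sieve might be written as a generator function; the name `primesGen` is hypothetical, and a `Map` replaces the Object dictionary. It follows the same logic as the class version (a separate base primes stream and a guard at the square of the current base prime) but is not the original author's code.

```javascript
// Incremental odds-only sieve as an ES2015 generator. The Map is keyed by the
// next odd composite of each base prime; the value is the step (2 * prime).
function* primesGen() {
    yield 2; yield 3; yield 5; yield 7;
    const dict = new Map();
    const bps = primesGen();  // separate base primes stream, advanced on demand
    bps.next();               // skip the only even prime of two
    let p = bps.next().value; // first odd base prime (3)
    let q = p * p;            // guard: square of the current base prime
    for (let n = 9; ; n += 2) {
        let step = dict.get(n);
        if (step === undefined) {
            if (n < q) { yield n; continue; } // not a known composite: prime
            step = p << 1;                    // n === q: start culling by p
            p = bps.next().value;
            q = p * p;                        // next guard
        } else {
            dict.delete(n);                   // n is a composite of a base prime
        }
        let nxt = n + step;
        while (dict.has(nxt)) nxt += step;    // find a unique empty slot
        dict.set(nxt, step);
    }
}

const first = [];
for (const pr of primesGen()) {
    if (pr > 100) break;
    first.push(pr);
}
console.log(first.join(","));
// 2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97
```

The `for...of` loop consumes the generator lazily, so the caller decides when to stop; no range limit has to be estimated in advance.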

JOVIAL

START
FILE MYOUTPUT ... $ ''Insufficient information to complete this declaration''
PROC SIEVEE $
    '' define the sieve data structure ''
    ARRAY CANDIDATES 1000 B $
    FOR I =0,1,999 $
    BEGIN
        '' everything is potentially prime until proven otherwise ''
        CANDIDATES($I$) = 1$
    END
    '' Neither 1 nor 0 is prime, so flag them off ''
    CANDIDATES($0$) = 0$
    CANDIDATES($1$) = 0$
    '' start the sieve with the integer 0 ''
    FOR I = 0$
    BEGIN
        IF I GE 1000$
        GOTO DONE$
        '' advance to the next un-crossed out number. ''
        '' this number must be a prime ''
NEXTI.  IF I LS 1000 AND Candidates($I$) EQ 0 $
        BEGIN
            I = I + 1 $
            GOTO NEXTI $
        END
        '' insure against running off the end of the data structure ''
        IF I LT 1000 $
        BEGIN
            '' cross out all multiples of the prime, starting with 2*p. ''
            FOR J=2 $
            FOR K=0 $
            BEGIN
                K = J * I $
                IF K GT 999 $
                GOTO ADV $
                CANDIDATES($K$) = 0 $
                J = J + 1 $
            END
            '' advance to the next candidate ''
ADV.        I = I + 1 $
        END
    END
    '' all uncrossed-out numbers are prime (and only those numbers) ''
    '' print all primes ''
DONE. OPEN OUTPUT MYOUTPUT $
    FOR I=0,1,999$
    BEGIN
        IF CANDIDATES($I$) NQ 0$
        BEGIN
            OUTPUT MYOUTPUT I $
        END
    END
TERM$

jq

Works with: jq version 1.4

Bare Bones

Short and sweet ...

# Denoting the input by $n, which is assumed to be a positive integer,
# eratosthenes/0 produces an array of primes less than or equal to $n:
def eratosthenes:

  def erase(i):
    if .[i] then
      reduce (range(2*i; length; i)) as $j (.; .[$j] = false) 
    else .
    end;

  (. + 1) as $n
  | (($n|sqrt) / 2) as $s
  | [null, null, range(2; $n)]
  | reduce (2, 1 + (2 * range(1; $s))) as $i (.; erase($i))
  | map(select(.));

Examples:

100 | eratosthenes
Output:

[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]

1e7 | eratosthenes | length
Output:

664579

Enhanced Sieve

Here is a more economical variant that:

  • produces a stream of primes less than or equal to a given integer;
  • only records the status of odd integers greater than 1 during the sieving process;
  • optimizes the inner loop as described in the task description.
def primes:
  # The array we use for the sieve only stores information for the odd integers greater than 1:
  #  index   integer
  #      0         3
  #      k   2*k + 3
  # So if we wish to mark m = 2*k + 3, the relevant index is: (m - 3) / 2
  def ix:
    if . % 2 == 0 then null
    else ((. - 3) / 2)
    end;
    
  # erase(i) marks i*j as composite for odd integral j >= i, and assumes i is odd
  def erase(i):
    ((i - 3) / 2) as $k
    # Consider relevant multiples:
    | (((length * 2 + 3) / i)) as $upper
    # ... only consider odd multiples from i onwards
    | reduce range(i; $upper; 2) as $j (.;
         (((i * $j) - 3) / 2) as $m
         | if .[$m] then .[$m] = false else . end);

  if . < 2 then []
  else (. + 1) as $n
  | (($n|sqrt) / 2) as $s
  | [range(3; $n; 2)|true]
  | reduce (1 + (2 * range(1; $s)) ) as $i (.; erase($i))
  | . as $sieve
  | 2, (range(3; $n; 2) | select($sieve[ix]))
  end ;

def count(s): reduce s as $_ (0; .+1);

count(1e6 | primes)
Output:
78498

Julia

Start with 2 already in the array, then test only the odd numbers and push the prime ones onto the array.

# Returns an array of positive prime numbers less than or equal to lim
function sieve(lim :: Int)
    if lim < 2 return [] end
    limi :: Int = (lim - 1) ÷ 2 # calculate the required array size
    isprime :: Array{Bool} = trues(limi)
    llimi :: Int = (isqrt(lim) - 1) ÷ 2 # and calculate maximum root prime index
    result :: Array{Int} = [2]  #Initial array
    for i in 1:limi
        if isprime[i]
            p = i + i + 1 # 2i + 1
            if i <= llimi
                for j = (p*p-1)>>>1:p:limi # quick shift/divide in case LLVM doesn't optimize divide by 2 away
                    isprime[j] = false
                end
            end
            push!(result, p)
        end
    end
    return result
end

Alternate version using findall to get all primes at once in the end

function sieve(n::Integer)
    primes = fill(true, n)
    primes[1] = false
    for p in 2:n
        primes[p] || continue
        primes[p .* (2:n÷p)] .= false
    end
    findall(primes)
end

At about 35 seconds for a range of a billion on my Intel Atom x5-Z8350 CPU at 1.92 GHz (single threaded), or about 70 CPU clock cycles per culling operation, the above examples are two of the very slowest ways to compute the Sieve of Eratosthenes over any kind of reasonable range, due to a couple of factors:

  1. The output primes are extracted to a result array which takes time (and memory) to construct.
  2. They use the naive "one huge memory array" method, which has poor memory access speed for larger ranges.


The first uses an odds-only algorithm (not noted in the text, as the task requires), which reduces the number of operations by a factor of about two and a half, yet it is not faster than the second, which is not odds-only. The second is set up to take advantage of the `findall` function, which directly outputs the indices of the remaining true values as the found primes: internally it first creates the array at the size of the counted true values and then just fills it, whereas the first takes longer because it pushes the found primes one at a time onto the constructed array.

Also, the first uses more memory than necessary, at one byte per `Bool`, where using a `BitArray` as in the second reduces this by a factor of eight.

If one is going to "crib" the MatLab algorithm as above, one may as well do it using odds-only as per the MatLab built-in. The following alternate code improves on the "Alternate" example above by making it sieve odds-only and adjusting the result array contents after to suit:

function sieve2(n :: Int)
    ni = (n - 1) ÷ 2
    isprime = trues(ni)
    for i in 1:ni
        if isprime[i]
            j = 2i * (i + 1)
            if j > ni
                m = findall(isprime)
                map!((i::Int) -> 2i + 1, m, m)
                return pushfirst!(m, 2)
            else
                p = 2i + 1
                while j <= ni
                  isprime[j] = false
                  j += p
                end
            end
        end
    end
end

This takes about 18.5 seconds, or 36 CPU cycles per culling operation, to find the primes to a billion, but that is still quite slow compared to what can be done below. Note that the result array is created by the findall function, then modified in place by the map! function to transform the indices to primes, and finally copied by the pushfirst! function to add the only even prime of two to the beginning, but these operations are quite fast. However, this still consumes a lot of memory: about 64 Megabytes for the sieve buffer and over 400 Megabytes for the result (8-byte Int's for 64 bit execution) to sieve to a billion, and culling the huge buffer, which doesn't fit the CPU cache sizes, is what makes it slow.

Iterator Output

The creation of an output results array is not necessary if the purpose is just to scan across the resulting primes once; they can be output using an iterator (from a `BitArray`) as in the following odds-only code:

const Prime = UInt64

struct Primes
    rangei :: Int64
    primebits :: BitArray{1}
    function Primes(n :: Int64)
        if n < 3
          if n < 2 return new(-1, falses(0)) # no elements
          else return new(0, trues(0)) end # n = 2: meaning is 1 element of 2
        end
        limi :: Int = (n - 1) ÷ 2 # calculate the required array size
        isprimes :: BitArray = trues(limi)
        @inbounds(
        for i in 1:limi
            p = i + i + 1
            start = (p * p - 1) >>> 1 # shift/divide if LLVM doesn't optimize
            if start > limi
                return new(limi, isprimes)
            end
            if isprimes[i]
                for j in start:p:limi
                  isprimes[j] = false
                end
            end
        end)
    end
end

Base.eltype(::Type{Primes}) = Prime

function Base.length(P::Primes)::Int64
    if P.rangei < 0 return 0 end
    return 1 + count(P.primebits)
end

function Base.iterate(P::Primes, state::Int = 0)::
                                        Union{Tuple{Prime, Int}, Nothing}
    lmt = P.rangei
    if state > lmt return nothing end
    if state <= 0 return (UInt64(2), 1) end
    let
        prmbts = P.primebits
        i = state
        @inbounds(
        while i <= lmt && !prmbts[i] i += 1 end)
        if i > lmt return nothing end
        return (i + i + 1, i + 1)
    end
end

for which using the following code:

function bench()
  @time length(Primes(100)) # warm up JIT
#  println(@time count(x->true, Primes(1000000000))) # about 1.5 seconds slower counting over iteration
  println(@time length(Primes(1000000000)))
end
bench()

results in the following output:

Output:
  0.000031 seconds (3 allocations: 160 bytes)
 12.214533 seconds (4 allocations: 59.605 MiB, 0.42% gc time)
50847534

This reduces the CPU cycles per culling operation to about 24.4, but it's still slow due to using the one largish array. Note that counting each iterated prime takes about an additional one and a half seconds, whereas if all that is required is the count of primes over a range, the specialized length function is much faster.

Page Segmented Algorithm

For any kind of reasonably large range such as a billion, a page segmented version should be used with the pages sized to the CPU caches for much better memory access times. As well, the following odds-only example uses a custom bit packing algorithm for a further two times speed-up, also reducing the memory allocation delays by reusing the sieve buffers when possible (usually possible):

const Prime = UInt64
const BasePrime = UInt32
const BasePrimesArray = Array{BasePrime,1}
const SieveBuffer = Array{UInt8,1}

# contains a lazy list of a secondary base primes arrays feed
# NOT thread safe; needs a Mutex gate to make it so...
abstract type BPAS end # stands in for BasePrimesArrays, not defined yet
mutable struct BasePrimesArrays <: BPAS
    thunk :: Union{Nothing,Function} # problem with efficiency - untyped function!!!!!!!!!
    value :: Union{Nothing,Tuple{BasePrimesArray, BPAS}}
    BasePrimesArrays(thunk::Function) = new(thunk)
end
Base.eltype(::Type{BasePrimesArrays}) = BasePrime
Base.IteratorSize(::Type{BasePrimesArrays}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(BPAs::BasePrimesArrays, state::BasePrimesArrays = BPAs)
    if state.thunk !== nothing
        newvalue :: Union{Nothing,Tuple{BasePrimesArray, BasePrimesArrays}} =
            state.thunk() :: Union{Nothing,Tuple{BasePrimesArray
                                                 , BasePrimesArrays}}
        state.value = newvalue
        state.thunk = nothing
        return newvalue
    end
    state.value
end

# count the number of zero bits (primes) in a byte array,
# also works for part arrays/slices, best used as an `@view`...
function countComposites(cmpsts::AbstractArray{UInt8,1})
    foldl((a, b) -> a + count_zeros(b), cmpsts; init = 0)
end

# converts an entire sieved array of bytes into an array of UInt32 primes,
# to be used as a source of base primes...
function composites2BasePrimesArray(low::Prime, cmpsts::SieveBuffer)
    limiti = length(cmpsts) * 8
    len :: Int = countComposites(cmpsts)
    rslt :: BasePrimesArray = BasePrimesArray(undef, len)
    i :: Int = 0
    j :: Int = 1
    @inbounds(
    while i < limiti
        if cmpsts[i >>> 3 + 1] & (1 << (i & 7)) == 0
            rslt[j] = low + i + i
            j += 1
        end
        i += 1
    end)
    rslt
end

# sieving work done, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
function sieveComposites(low::Prime, buffer::Array{UInt8,1},
                                     bpas::BasePrimesArrays)
    lowi :: Int = (low - 3) ÷ 2
    len :: Int = length(buffer)
    limiti :: Int = len * 8 - 1
    nexti :: Int = lowi + limiti
    for bpa::BasePrimesArray in bpas
        for bp::BasePrime in bpa
            bpint :: Int = bp
            bpi :: Int = (bpint - 3) >>> 1
            starti :: Int = 2 * bpi * (bpi + 3) + 3
            starti >= nexti && return
            if starti >= lowi starti -= lowi
            else
                r :: Int = (lowi - starti) % bpint
                starti = r == 0 ? 0 : bpint - r
            end
            lmti :: Int = limiti - 40 * bpint
            @inbounds(
            if bpint <= (len >>> 2) && starti <= lmti
                for i in 1:8
                    if starti > limiti break end
                    mask = convert(UInt8,1) << (starti & 7)
                    c = starti >>> 3 + 1
                    while c <= len
                        buffer[c] |= mask
                        c += bpint
                    end
                    starti += bpint
                end
            else
                c = starti
                while c <= limiti
                    buffer[c >>> 3 + 1] |= convert(UInt8,1) << (c & 7)
                    c += bpint
                end
            end)
        end
    end
    return
end

# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
function makeBasePrimesArrays() :: BasePrimesArrays
    cmpsts :: SieveBuffer = Array{UInt8,1}(undef, 512)
    function nextelem(low::Prime, bpas::BasePrimesArrays) ::
                                    Tuple{BasePrimesArray, BasePrimesArrays}
        # calculate size so that the bit span is at least as big as the
        # maximum culling prime required, rounded up to minsizebits blocks...
        reqdsize :: Int = 2 + isqrt(1 + low)
        size :: Int = (reqdsize ÷ 4096 + 1) * 4096 ÷ 8 # size in bytes
        if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
        fill!(cmpsts, 0)
        sieveComposites(low, cmpsts, bpas)
        arr :: BasePrimesArray = composites2BasePrimesArray(low, cmpsts)
        next :: Prime = low + length(cmpsts) * 8 * 2
        arr, BasePrimesArrays(() -> nextelem(next, bpas))
    end
    # pre-seeding breaks recursive race,
    # as only known base primes used for first page...
    preseedarr :: BasePrimesArray = # pre-seed to 100, can sieve to 10,000...
        [ 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
        , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
        ]
    nextfunc :: Function = () ->
        (nextelem(convert(Prime,101), makeBasePrimesArrays()))
    firstfunc :: Function = () -> (preseedarr, BasePrimesArrays(nextfunc))
    BasePrimesArrays(firstfunc)
end

# an iterator over successive sieved buffer composite arrays,
# returning a tuple of the value represented by the lowest possible prime
# in the sieved composites array and the array itself;
# the array has a 16 Kilobytes minimum size (CPU L1 cache), but
# will grow so that the bit span is larger than the
# maximum culling base prime required, possibly making it larger than
# the L1 cache for large ranges, but still reasonably efficient using
# the L2 cache: very efficient up to about 16e9 range;
# reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 week...
struct PrimesPages
    baseprimes :: BasePrimesArrays
    PrimesPages() = new(makeBasePrimesArrays())
end
Base.eltype(::Type{PrimesPages}) = SieveBuffer
Base.IteratorSize(::Type{PrimesPages}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PP::PrimesPages,
                      state :: Tuple{Prime,SieveBuffer} =
                            ( convert(Prime,3), Array{UInt8,1}(undef,16384) ))
    (low, cmpsts) = state
    # calculate size so that the bit span is at least as big as the
    # maximum culling prime required, rounded up to minsizebits blocks...
    reqdsize :: Int = 2 + isqrt(1 + low)
    size :: Int = (reqdsize ÷ 131072 + 1) * 131072 ÷ 8 # size in bytes
    if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
    fill!(cmpsts, 0)
    sieveComposites(low, cmpsts, PP.baseprimes)
    newlow :: Prime = low + length(cmpsts) * 8 * 2
    ( low, cmpsts ), ( newlow, cmpsts )
end

function countPrimesTo(range::Prime) :: Int64
    range < 3 && ((range < 2 && return 0) || return 1)
    count :: Int64 = 1
    for ( low, cmpsts ) in PrimesPages() # almost never exits!!!
        if low + length(cmpsts) * 8 * 2 > range
            lasti :: Int = (range - low) ÷ 2
            count += countComposites(@view cmpsts[1:lasti >>> 3])
            count += count_zeros(cmpsts[lasti >>> 3 + 1] |
                                 (0xFE << (lasti & 7)))
            return count
        end
        count += countComposites(cmpsts)
    end
    count
end

# iterator over primes from above page iterator;
# unless doing something special with individual primes, usually unnecessary;
# better to do manipulations based on the composites bit arrays...
# takes at least as long to enumerate the primes as sieve them...
mutable struct PrimesPaged
    primespages :: PrimesPages
    primespageiter :: Tuple{Tuple{Prime,SieveBuffer},Tuple{Prime,SieveBuffer}}
    PrimesPaged() = let PP = PrimesPages(); new(PP, Base.iterate(PP)) end
end
Base.eltype(::Type{PrimesPaged}) = Prime
Base.IteratorSize(::Type{PrimesPaged}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PP::PrimesPaged, state::Int = -1 )
    state < 0 && return Prime(2), 0
    (low, cmpsts) = PP.primespageiter[1]
    len = length(cmpsts) * 8
    @inbounds(
    while state < len && cmpsts[state >>> 3 + 1] &
                         (UInt8(1) << (state & 7)) != 0
        state += 1
    end)
    if state >= len
        PP.primespageiter = Base.iterate(PP.primespages, PP.primespageiter[2])
        return Base.iterate(PP, 0)
    end
    low + state + state, state + 1
end

When tested with the following code:

function bench()
    print("( ")
    for p in PrimesPaged() p > 100 && break; print(p, " ") end
    println(")")
    countPrimesTo(Prime(100)) # warm up JIT
#=
    println(@time let count = 0
                      for p in PrimesPaged()
                          p > 1000000000 && break
                          count += 1
                      end; count end) # much slower counting over iteration
=#
    println(@time countPrimesTo(Prime(1000000000)))
end
bench()

it produces the following:

Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
  1.947145 seconds (59 allocations: 39.078 KiB)
50847534

Note that "the slow way" as commented out in the code takes about an extra 4.85 seconds to count the primes to a billion, that is, it takes longer to enumerate the primes than to cull the composites; this makes further work on making this yet faster pointless unless techniques such as the one used here, counting the found primes by just counting the un-cancelled bit representations in the sieved buffers, are used.

This takes about 1.9 seconds to count the primes to a billion (using the fast technique), or about 3.75 clock cycles per culling operation, which is reasonably fast; this is almost 20 times faster than the first naive sieves. As written, the algorithm maintains its efficiency up to about 16 billion and then slows down as the buffer size increases beyond the CPU L1 cache size into the L2 cache size, such that it takes about 436.8 seconds to sieve to 100 billion instead of the expected about 300 seconds. However, an extra feature of "double buffered sieving" could be added so that the buffer is sieved in L1 cache slices followed by a final sweep of the entire buffer by the few remaining cull operations that use the larger primes, for only a slight reduction in average cycles per cull up to a range of about 2.56e14 (for this CPU). For really large ranges above that, another sieving technique known as the "bucket sieve", which sorts the culling operations by page so that processing time is not expended for values that don't "hit" a given page, can be used for only a slight additional reduction in efficiency.

Additionally, maximal wheel factorization can reduce the time by about a factor of four, and multi-processing, where the work is shared across the CPU cores, can produce a further speed-up by the factor of the number of cores (only three times on this four-core machine due to the clock speed reducing to 75% of the rate when all cores are used), for about an additional 12 times speed-up for this CPU. These improvements are just slightly too complex to post here.

However, even the version posted shows that the naive "one huge array" implementations should never be used for sieving ranges of over a few million, and that Julia can come very close to the speed of the fastest languages such as C/C++ for the same algorithm.

Functional Algorithm

One of the best simple purely functional Sieve of Eratosthenes algorithms is the infinite tree folding sequence algorithm as implemented in Haskell. As Julia does not have a standard LazyList implementation or library and as a full memoizing lazy list is not required for this algorithm, the following odds-only code implements the rudiments of a Co-Inductive Stream (CIS) in its implementation:

const Thunk = Function # can't define other than as a generalized Function

struct CIS{T}
    head :: T
    tail :: Thunk # produces the next CIS{T}
    CIS{T}(head :: T, tail :: Thunk) where T = new(head, tail)
end
Base.eltype(::Type{CIS{T}}) where T = T
Base.IteratorSize(::Type{CIS{T}}) where T = Base.SizeUnknown()
function Base.iterate(C::CIS{T}, state = C) :: Tuple{T, CIS{T}} where T
    state.head, state.tail()
end

function treefoldingprimes()::CIS{Int}
    function merge(xs::CIS{Int}, ys::CIS{Int})::CIS{Int}
        x = xs.head; y = ys.head
        if x < y CIS{Int}(x, () -> merge(xs.tail(), ys))
        elseif y < x CIS{Int}(y, () -> merge(xs, ys.tail()))
        else CIS{Int}(x, () -> merge(xs.tail(), ys.tail())) end
    end
    function pmultiples(p::Int)::CIS{Int}
        adv :: Int = p + p
        next(c::Int)::CIS{Int} = CIS{Int}(c, () -> next(c + adv)); next(p * p)
    end
    function allmultiples(ps::CIS{Int})::CIS{CIS{Int}}
        CIS{CIS{Int}}(pmultiples(ps.head), () -> allmultiples(ps.tail()))
    end
    function pairs(css :: CIS{CIS{Int}})::CIS{CIS{Int}}
        nextcss = css.tail()
        CIS{CIS{Int}}(merge(css.head, nextcss.head), ()->pairs(nextcss.tail()))
    end
    function composites(css :: CIS{CIS{Int}})::CIS{Int}
        CIS{Int}(css.head.head, ()-> merge(css.head.tail(),
                                            css.tail() |> pairs |> composites))
    end
    function minusat(n::Int, cs::CIS{Int})::CIS{Int}
        if n < cs.head CIS{Int}(n, () -> minusat(n + 2, cs))
        else minusat(n + 2, cs.tail()) end
    end
    oddprimes()::CIS{Int} = CIS{Int}(3, () -> minusat(5, oddprimes()
                                        |> allmultiples |> composites))
    CIS{Int}(2, () -> oddprimes())
end

when tested with the following:

@time let count = 0; for p in treefoldingprimes() p > 1000000 && break; count += 1 end; count end

it outputs the following:

Output:
  1.791058 seconds (10.23 M allocations: 290.862 MiB, 3.64% gc time)
78498

At about 1.8 seconds or 4000 cycles per culling operation to calculate the number of primes up to a million, this is very slow, but that is not the fault of Julia but rather just that purely functional incremental Sieve of Eratosthenes implementations are much slower than those using mutable arrays and are only useful over quite limited ranges of a few million. For one thing, incremental algorithms have O(n log n log log n) asymptotic execution complexity rather than O(n log log n) (an extra log n factor) and for another the constant execution overhead is much larger in creating (and garbage collecting) elements in the sequences.

The time for this algorithm is quite comparable to that of implementations in other functional languages such as F#, and it is actually faster than the same algorithm implemented in C/C++, but it is slower by a factor of ten or more than implementations in purely functional languages such as Haskell or even in only partly functional languages such as Kotlin; this is due to those languages having specialized memory allocation that is very fast at allocating small amounts of memory per allocation, as is often a requirement of functional programming. The majority of the time for this algorithm is spent allocating memory, and if future versions of Julia are to be of better use in purely functional programming, improvements need to be made to the memory allocation.

Infinite (Mutable) Iterator Using (Mutable) Dictionary

To gain some extra speed above the purely functional algorithm above, the Python'ish version as a mutable iterator embedding a mutable standard base Dictionary can be used. The following version uses a secondary delayed injection stream of "base" primes defined recursively to provide the successions of composite values in the Dictionary to be used for sieving:

const Prime = UInt64
abstract type PrimesDictAbstract end # used for forward reference
mutable struct PrimesDict <: PrimesDictAbstract
    sieve :: Dict{Prime,Prime}
    baseprimes :: PrimesDictAbstract
    lastbaseprime :: Prime
    q :: Prime
    PrimesDict() = new(Dict())
end
Base.eltype(::Type{PrimesDict}) = Prime
Base.IteratorSize(::Type{PrimesDict}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PD::PrimesDict, state::Prime = Prime(0) )
    if state < 1
        PD.baseprimes = PrimesDict()
        PD.lastbaseprime = Prime(3)
        PD.q = Prime(9)
        return Prime(2), Prime(1)
    end
    dict = PD.sieve
    while true
        state += 2
        if !haskey(dict, state)
            state < PD.q && return state, state
            p = PD.lastbaseprime # now, state = PD.q in all cases
            adv = p + p # since state is at PD.q, advance to next
            dict[state + adv] = adv # adds base prime composite stream
            # following initializes secondary base stream first time
            p <= 3 && Base.iterate(PD.baseprimes)
            p = Base.iterate(PD.baseprimes, p)[1] # next base prime
            PD.lastbaseprime = p
            PD.q = p * p
        else # advance hit composite in dictionary...
            adv = pop!(dict, state)
            next = state + adv
            while haskey(dict, next) next += adv end
            dict[next] = adv # past other composite hits in dictionary
        end
    end
end

The above version can be used and tested with code similar to that for the functional version, but is about ten times faster at about 400 CPU clock cycles per culling operation, meaning it has a practical range ten times larger, although it still has an O(n (log n) (log log n)) asymptotic performance complexity; for larger ranges such as sieving to a billion or more, this is still over a hundred times slower than the page segmented version using a page segmented sieving array.
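Since the version above is described as Python'ish, the same scheme can be sketched as a Python generator. This is a minimal illustration rather than a translation of the Julia code, and the name `primes_dict` is an assumption: a dictionary maps each pending odd composite to its step (twice a base prime), and a recursively defined secondary feed supplies base primes only when their squares are reached.

```python
from itertools import count, islice

def primes_dict():
    """Unbounded odds-only incremental sieve using a dict of pending composites."""
    yield 2
    yield 3
    sieve = {}              # next composite -> step (2 * base prime)
    base = primes_dict()    # delayed secondary feed of base primes
    next(base)              # skip 2
    p = next(base)          # current base prime: 3
    q = p * p               # its square: 9
    for n in count(5, 2):
        if n in sieve:      # n is a known composite: advance its stream
            step = sieve.pop(n)
        elif n < q:         # not a composite and below the next square: prime
            yield n
            continue
        else:               # n == q: start culling multiples of base prime p
            step = 2 * p
            p = next(base)
            q = p * p
        nxt = n + step
        while nxt in sieve: # skip slots already taken by other base primes
            nxt += step
        sieve[nxt] = step

print(list(islice(primes_dict(), 10)))
```

Delaying the injection of each base prime until its square is reached keeps the dictionary at roughly the number of primes below the square root of the current value, rather than one entry per prime found so far.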

Klingphix

include ..\Utilitys.tlhy

%limit %i
1000 !limit
( 1 $limit ) sequence

( 2 $limit sqrt int ) [ !i $i get [ ( 2 $limit 1 - $i / int ) [ $i * false swap set ] for ] if ] for
( 1 $limit false ) remove
pstack

"Press ENTER to end " input

Kotlin

import kotlin.math.sqrt

fun sieve(max: Int): List<Int> {
    val xs = (2..max).toMutableList()
    val limit = sqrt(max.toDouble()).toInt()
    for (x in 2..limit) xs -= x * x..max step x
    return xs
}

fun main(args: Array<String>) {
    println(sieve(100))
}
Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Alternative much faster odds-only version that outputs an enumeration

The above version is quite slow for several reasons: it includes even number culling even though those will be eliminated on the first pass; it uses a list rather than an array to do the composite culling (both of which also mean it takes more memory); and it uses enumerations (for..in) to implement loops at an execution time cost per loop. It also consumes more memory in the final result output as another list.

The following code overcomes most of those problems: It only culls odd composites; it culls a bit-packed primitive array (also saving memory); It uses tailcall recursive functions for the loops, which are compiled into simple loops. It also outputs the results as an enumeration, which isn't fast but does not consume any more memory than the culling array. In this way, the program is only limited in sieving range by the maximum size limit of the culling array, although as it grows larger than the CPU cache sizes, it loses greatly in speed; however, that doesn't matter so much if just enumerating the results.

fun primesOdds(rng: Int): Iterable<Int> {
    val topi = (rng - 3) shr 1
    val lstw = topi shr 5
    val sqrtndx = (Math.sqrt(rng.toDouble()).toInt() - 3) shr 1
    val cmpsts = IntArray(lstw + 1)

    tailrec fun testloop(i: Int) {
        if (i <= sqrtndx) {
            if (cmpsts[i shr 5] and (1 shl (i and 31)) == 0) {
                val p = i + i + 3
                tailrec fun cullp(j: Int) {
                    if (j <= topi) {
                        cmpsts[j shr 5] = cmpsts[j shr 5] or (1 shl (j and 31))
                        cullp(j + p)
                    }
                }
                cullp((p * p - 3) shr 1)
            }
            testloop(i + 1)
        }
    }

    tailrec fun test(i: Int): Int {
        return if (i <= topi && cmpsts[i shr 5] and (1 shl (i and 31)) != 0) {
            test(i + 1)
        } else {
            i
        }
    }

    testloop(0)

    val iter = object : IntIterator() {
        var i = -1
        override fun nextInt(): Int {
            val oi = i
            i = test(i + 1)
            return if (oi < 0) 2 else oi + oi + 3
        }
        override fun hasNext() = i < topi
    }
    return Iterable { -> iter }
}

fun main(args: Array<String>) {
    primesOdds(100).forEach { print("$it ") }
    println()
    println(primesOdds(1000000).count())
}
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
78498
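The index arithmetic behind the odds-only version above can be sketched in Python. This is a simplified illustration, not a translation: the name `primes_odds` is an assumption, and one byte per odd number is used instead of a bit-packed array. Index i stands for the odd number 2i + 3, the square of p = 2i + 3 sits at index (p*p - 3)/2 = 2i(i + 3) + 3, and stepping the index by p steps the value by 2p.

```python
from math import isqrt

def primes_odds(limit):
    """Primes <= limit using an odds-only sieve: index i stands for 2*i + 3."""
    if limit < 2:
        return []
    top = (limit - 3) // 2            # index of the largest odd number <= limit
    composite = bytearray(top + 1)    # one flag per odd number (not bit-packed)
    for i in range((isqrt(limit) - 3) // 2 + 1):
        if not composite[i]:
            p = 2 * i + 3
            start = (p * p - 3) // 2  # index of p*p; equals 2*i*(i + 3) + 3
            for j in range(start, top + 1, p):  # index step p == value step 2*p
                composite[j] = 1
    return [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]

print(primes_odds(100))
```

Leaving the even numbers out halves both the memory used and the number of culling operations compared with sieving every integer.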

Concise Functional Versions

Ah, one might say, for such a trivial range one writes for conciseness and not for speed. Well, I say, one can still save memory and some time using odds-only and a bit-packed array, but write very clear and concise (but slower) code using nothing but higher order functions and function calling. The following code using such techniques can use the same "main" function for the same output but is about two times slower, mostly due to the extra time spent making (nested) function calls, including the function calls necessary for enumeration. Note that the effect of using the "(l .. h).forEach { .. }" is the same as the "for i in l .. h { .. }" as both use an iteration across the range but the second is just syntax sugar to make it look more imperative:

fun primesOdds(rng: Int): Iterable<Int> {
    val topi = (rng - 3) / 2 //convert to nearest index
    val size = topi / 32 + 1 //word size to include index
    val sqrtndx = (Math.sqrt(rng.toDouble()).toInt() - 3) / 2
    val cmpsts = IntArray(size)
    fun is_p(i: Int) = cmpsts[i shr 5] and (1 shl (i and 0x1F)) == 0
    fun cull(i: Int) { cmpsts[i shr 5] = cmpsts[i shr 5] or (1 shl (i and 0x1F)) }
    fun cullp(p: Int) = (((p * p - 3) / 2 .. topi).step(p)).forEach { cull(it) }
    (0 .. sqrtndx).filter { is_p(it) }.forEach { cullp(it + it + 3) }
    fun i2p(i: Int) = if (i < 0) 2 else i + i + 3
    val orng = (-1 .. topi).filter { it < 0 || is_p(it) }.map { i2p(it) }
    return Iterable { -> orng.iterator() }
}

The trouble with the above version is that, at least for Kotlin version 1.0, the ".filter" and ".map" extension functions for Iterable<Int> create Java "ArrayList"s as their output (wrapped to return the Kotlin "List<Int>" interface). It thus takes considerably more memory than the first version (which uses an ArrayList to store the resulting primes), because chaining the calculation to ".map" requires a second ArrayList of up to the same size while the mapping is being done. The following version uses Sequences, which aren't backed by any permanent structure, but it is slower by another small factor due to the nested function calls:

fun primesOdds(rng: Int): Iterable<Int> {
    val topi = (rng - 3) / 2 //convert to nearest index
    val size = topi / 32 + 1 //word size to include index
    val sqrtndx = (Math.sqrt(rng.toDouble()).toInt() - 3) / 2
    val cmpsts = IntArray(size)
    fun is_p(i: Int) = cmpsts[i shr 5] and (1 shl (i and 0x1F)) == 0
    fun cull(i: Int) { cmpsts[i shr 5] = cmpsts[i shr 5] or (1 shl (i and 0x1F)) }
    fun iseq(high: Int, low: Int = 0, stp: Int = 1) =
            Sequence { (low .. high step(stp)).iterator() }
    fun cullp(p: Int) = iseq(topi, (p * p - 3) / 2, p).forEach { cull(it) }
    iseq(sqrtndx).filter { is_p(it) }.forEach { cullp(it + it + 3) }
    fun i2p(i: Int) = if (i < 0) 2 else i + i + 3
    val oseq = iseq(topi, -1).filter { it < 0 || is_p(it) }.map { i2p(it) }
    return Iterable { -> oseq.iterator() }
}

Unbounded Versions

An incremental odds-only sieve outputting a sequence (iterator)

The following Sieve of Eratosthenes is not purely functional in that it uses a Mutable HashMap to store the state of succeeding composite numbers to be skipped over, but it embodies the principles of an incremental implementation of the Sieve of Eratosthenes sieving odds-only and is faster than most incremental sieves due to using mutability. As with the fastest of this kind of sieve, it uses a delayed secondary primes feed as a source of base primes to generate the composite number progressions. The code is as follows:

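// Assumed definition: the listing uses SieveState without declaring it;
// a plain holder with mutable fields matching its uses (n, bp, q) suffices.
class SieveState(var n: Int, var bp: Int, var q: Int)

fun primesHM(): Sequence<Int> = sequence {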
    yield(2)
    fun oddprms(): Sequence<Int> = sequence {
        yield(3); yield(5) // need at least 2 for initialization
        val hm = HashMap<Int,Int>()
        hm.put(9, 6)
        val bps = oddprms().iterator(); bps.next(); bps.next() // skip past 5
        yieldAll(generateSequence(SieveState(7, 5, 25)) {
            ss ->
                var n = ss.n; var q = ss.q
                n += 2
                while ( n >= q || hm.containsKey(n)) {
                    if (n >= q) {
                        val inc = ss.bp shl 1
                        hm.put(n + inc, inc)
                        val bp = bps.next(); ss.bp = bp; q = bp * bp
                    }
                    else {
                        val inc = hm.remove(n)!!
                        var next = n + inc
                        while (hm.containsKey(next)) {
                            next += inc
                        }
                        hm.put(next, inc)
                    }
                    n += 2
                }
                ss.n = n; ss.q = q
                ss
        }.map { it.n })
    }
    yieldAll(oddprms())
}

At about 370 clock cycles per culling operation (about 3,800 cycles per prime) on my tablet class Intel CPU, this is not blazing fast but is adequate for ranges of a few million to a hundred million, and thus fine for doing things like solving Euler problems. For instance, Euler Problem 10 of summing the primes to two million can be done with the following "one-liner":

primesHM().takeWhile { it <= 2_000_000 }.map { it.toLong() }.sum()

to output the correct answer of the following in about 270 milliseconds for my Intel x5-Z8350 at 1.92 Gigahertz:

Output:
142913828922

A purely functional Incremental Sieve of Eratosthenes that outputs a sequence (iterator)

Following is a Kotlin implementation of the Tree Folding Incremental Sieve of Eratosthenes, adapted from the algorithm by Richard Bird. It is based on lazy lists, but in fact the memoization (and cost in execution time) of a lazy list is not required, so the following code uses a "roll-your-own" implementation of a Co-Inductive Stream (CIS). The final output is a Sequence for convenience in using it. The code is written in a purely functional style in that no mutation is used:

Translation of: Haskell
data class CIS<T>(val head: T, val tailf: () -> CIS<T>) {
  fun toSequence() = generateSequence(this) { it.tailf() } .map { it.head }
}

fun primes(): Sequence<Int> {
  fun merge(a: CIS<Int>, b: CIS<Int>): CIS<Int> {
    val ahd = a.head; val bhd = b.head
    if (ahd > bhd) return CIS(bhd) { ->merge(a, b.tailf()) }
    if (ahd < bhd) return CIS(ahd) { ->merge(a.tailf(), b) }
    return CIS(ahd) { ->merge(a.tailf(), b.tailf()) }
  }
  fun bpmults(p: Int): CIS<Int> {
    val inc = p + p
    fun mlts(c: Int): CIS<Int> = CIS(c) { ->mlts(c + inc) }
    return mlts(p * p)
  }
  fun allmults(ps: CIS<Int>): CIS<CIS<Int>> = CIS(bpmults(ps.head)) { allmults(ps.tailf()) }
  fun pairs(css: CIS<CIS<Int>>): CIS<CIS<Int>> {
    val xs = css.head; val yss = css.tailf(); val ys = yss.head
    return CIS(merge(xs, ys)) { ->pairs(yss.tailf()) }
  }
  fun union(css: CIS<CIS<Int>>): CIS<Int> {
    val xs = css.head
    return CIS(xs.head) { -> merge(xs.tailf(), union(pairs(css.tailf()))) }
  }
  tailrec fun minus(n: Int, cs: CIS<Int>): CIS<Int> =
    if (n >= cs.head) minus(n + 2, cs.tailf()) else CIS(n) { ->minus(n + 2, cs) }    
  fun oddprms(): CIS<Int> = CIS(3) { -> CIS(5) { ->minus(7, union(allmults(oddprms()))) } }
  return CIS(2) { ->oddprms() } .toSequence()
}

fun main(args: Array<String>) {
  val limit = 1000000
  val strt = System.currentTimeMillis()
  println(primes().takeWhile { it <= limit } .count())
  val stop = System.currentTimeMillis()
  println("Took ${stop - strt} milliseconds.")
}

The code is about five times slower than the more imperative hash table based version immediately above due to the costs of the extra levels of function calls in the functional style. The Haskell version from which this is derived is much faster due to the extensive optimizations performed for it, such as function/closure "lifting", as well as a Garbage Collector specifically tuned for functional code.

An unbounded Page Segmented Sieve of Eratosthenes that can output a sequence (iterator)

The very fastest implementations of a primes sieve are all based on bit-packed mutable arrays, which can be made unbounded by setting them up as a succession of sieved bit-packed arrays that have been culled of composites. The following code is an odds-only implementation that, again, uses a secondary feed of base primes that is only expanded as necessary (in this case memoized by a rudimentary lazy list structure to avoid recalculation for every base primes sweep per page segment):

internal typealias Prime = Long
internal typealias BasePrime = Int
internal typealias BasePrimeArray = IntArray
internal typealias SieveBuffer = ByteArray

// contains a lazy list of a secondary base prime arrays feed
internal data class BasePrimeArrays(val arr: BasePrimeArray,
                                     val rest: Lazy<BasePrimeArrays?>)
                                                : Sequence<BasePrimeArray> {
    override fun iterator() =
        generateSequence(this) { it.rest.value }
            .map { it.arr }.iterator()
}

// counts the number of zero bits (primes) in a byte array;
// note that, despite its name, it returns the number of un-culled (prime) bits...
fun countComposites(cmpsts: SieveBuffer): Int {
    var cnt = 0
    for (b in cmpsts) {
        cnt += java.lang.Integer.bitCount(b.toInt().and(0xFF))
    }
    return cmpsts.size.shl(3) - cnt
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
fun composites2BasePrimeArray(low: Int, cmpsts: SieveBuffer)
                                                            : BasePrimeArray {
    val lmti = cmpsts.size.shl(3)
    val len = countComposites(cmpsts)
    val rslt = BasePrimeArray(len)
    var j = 0
    for (i in 0 until lmti) {
        if (cmpsts[i.shr(3)].toInt() and 1.shl(i and 7) == 0) {
            rslt[j] = low + i + i; j++
        }
    }
    return rslt
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
fun sieveComposites(low: Prime, buffer: SieveBuffer,
                             bpas: Sequence<BasePrimeArray>) {
    val lowi = (low - 3L).shr(1)
    val len = buffer.size
    val lmti = len.shl(3)
    val nxti = lowi + lmti.toLong()
    for (bpa in bpas) {
        for (bp in bpa) {
            val bpi = (bp - 3).shr(1).toLong()
            var strti = (bpi * (bpi + 3L)).shl(1) + 3L
            if (strti >= nxti) return
            val s0 =
                if (strti >= lowi) (strti - lowi).toInt()
                else {
                    val r = (lowi - strti) % bp.toLong()
                    if (r.toInt() == 0) 0 else bp - r.toInt()
                }
            if (bp <= len.shr(3) && s0 <= lmti - bp.shl(6)) {
                val slmti = minOf(lmti, s0 + bp.shl(3))
                tailrec fun mods(s: Int) {
                    if (s < slmti) {
                        val msk = 1.shl(s and 7)
                        tailrec fun cull(c: Int) {
                            if (c < len) {
                                buffer[c] = (buffer[c].toInt() or msk).toByte()
                                cull(c + bp)
                            }
                        }
                        cull(s.shr(3)); mods(s + bp)
                    }
                }
                mods(s0)
            }
            else {
                tailrec fun cull(c: Int) {
                    if (c < lmti) {
                        val w = c.shr(3)
                        buffer[w] = (buffer[w].toInt() or 1.shl(c and 7)).toByte()
                        cull(c + bp)
                    }
                }
                cull(s0)
            }
        }
    } 
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
fun makeBasePrimeArrays(): Sequence<BasePrimeArray> {
    var cmpsts = SieveBuffer(512)
    fun nextelem(low: Int, bpas: Sequence<BasePrimeArray>): BasePrimeArrays {
        // calculate size so that the bit span is at least as big as the
        // maximum culling prime required, rounded up to minsizebits blocks...
        val rqdsz = 2 + Math.sqrt((1 + low).toDouble()).toInt()
        val sz = (rqdsz.shr(12) + 1).shl(9) // size in bytes
        if (sz > cmpsts.size) cmpsts = SieveBuffer(sz)
        cmpsts.fill(0)
        sieveComposites(low.toLong(), cmpsts, bpas)
        val arr = composites2BasePrimeArray(low, cmpsts)
        val nxt = low + cmpsts.size.shl(4)
        return BasePrimeArrays(arr, lazy { ->nextelem(nxt, bpas) })
    }
    // pre-seeding breaks recursive race,
    // as only known base primes used for first page...
    var preseedarr = intArrayOf( // pre-seed to 100, can sieve to 10,000...
        3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
        , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 )
    return BasePrimeArrays(preseedarr, lazy {->nextelem(101, makeBasePrimeArrays())})
}

// a sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
fun makeSievePages(): Sequence<Pair<Prime,SieveBuffer>> {
    val bpas = makeBasePrimeArrays() // secondary source of base prime arrays
    fun init(): SieveBuffer {
        val c = SieveBuffer(16384); sieveComposites(3L, c, bpas); return c }
    return generateSequence(Pair(3L, init())) {
        (low, cmpsts) ->
            // calculate size so that the bit span is at least as big as the
            // max culling prime required, rounded up to minsizebits blocks...
            val rqdsz = 2 + Math.sqrt((1 + low).toDouble()).toInt()
            val sz = (rqdsz.shr(17) + 1).shl(14) // size in bytes
            val ncmpsts = if (sz > cmpsts.size) SieveBuffer(sz) else cmpsts
            ncmpsts.fill(0)
            val nlow = low + ncmpsts.size.toLong().shl(4)
            sieveComposites(nlow, ncmpsts, bpas)
            Pair(nlow, ncmpsts)
    }
}

fun countPrimesTo(range: Prime): Prime {
    if (range < 3) { if (range < 2) return 0 else return 1 }
    var count = 1L
    for ((low,cmpsts) in makeSievePages()) {
        if (low + cmpsts.size.shl(4) > range) {
            val lsti = (range - low).shr(1).toInt()
            val lstw = lsti.shr(3)
            val msk = -2.shl(lsti.and(7))
            count += 32 + lstw.shl(3)
            for (i in 0 until lstw)
                count -= java.lang.Integer.bitCount(cmpsts[i].toInt().and(0xFF))
            count -= java.lang.Integer.bitCount(cmpsts[lstw].toInt().or(msk))
            break
        } else {
            count += countComposites(cmpsts)
        }
    }
    return count
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
fun primesPaged(): Sequence<Prime> = sequence {
    yield(2L)
    for ((low,cmpsts) in makeSievePages()) {
        val szbts = cmpsts.size.shl(3)
        for (i in 0 until szbts) {
            if (cmpsts[i.shr(3)].toInt() and 1.shl(i and 7) != 0) continue
            yield(low + i.shl(1).toLong())
        }
    }
}

For this implementation, counting the primes to a million is trivial at about 15 milliseconds on the same CPU as above, or almost too short to count.

It shows its speed in solving the Euler Problem 10 above about five times faster, at about 50 milliseconds, giving the same output.

It can sum the primes to 200 million or a hundred times the range in just over three seconds.

It finds the count of primes to a billion in about 16 seconds, or just about 1000 times longer than counting the primes over a range 1000 times smaller, for an almost linear response to range, as it should be.

However, much of the time (about two thirds) is spent iterating over the results rather than doing the actual work of sieving; for problems such as counting, finding the nth prime, finding occurrences of maximum prime gaps, etc., one should really use specialized functions that directly manipulate the output sieve arrays. Such a function is provided by the `countPrimesTo` function, which can count the primes to a billion (50847534) in about 5.65 seconds, or about 10.6 clock cycles per culling operation or about 210 cycles per prime.
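The idea of counting primes directly from the sieve buffer, rather than enumerating them, can be sketched in Python as follows. This is a simplified, unsegmented illustration and not a translation of the Kotlin code: the names `POPCOUNT`, `sieve_bits` and `count_primes` are assumptions. Bit i of the buffer stands for the odd number 2i + 3, a set bit means composite, and the unused high bits of the last byte are masked as composite before the zero bits are counted.

```python
from math import isqrt

POPCOUNT = [bin(b).count("1") for b in range(256)]  # set bits per byte value

def sieve_bits(limit):
    """Bit-packed odds-only sieve; returns (buffer, index of last valid bit)."""
    top = (limit - 3) // 2
    buf = bytearray(top // 8 + 1)
    for i in range((isqrt(limit) - 3) // 2 + 1):
        if not buf[i >> 3] & (1 << (i & 7)):
            p = 2 * i + 3
            for j in range((p * p - 3) // 2, top + 1, p):
                buf[j >> 3] |= 1 << (j & 7)
    return buf, top

def count_primes(limit):
    """Count primes <= limit by counting zero bits, without enumerating primes."""
    if limit < 3:
        return 1 if limit == 2 else 0
    buf, top = sieve_bits(limit)
    last = top >> 3                    # index of the final, partly used byte
    used = (top & 7) + 1               # number of valid bits in that byte
    mask = (0xFF << used) & 0xFF       # marks the unused high bits as composite
    count = 1                          # the prime 2
    count += 8 * last - sum(POPCOUNT[b] for b in buf[:last])
    count += 8 - POPCOUNT[buf[last] | mask]
    return count

print(count_primes(100))
```

Counting a whole byte at a time through a lookup table avoids the per-prime work of an iterator, which is the saving the paragraph above refers to.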

Kotlin isn't really fast even as compared to other virtual machine languages such as C# and F# on CLI but that is mostly due to limitations of the Java Virtual Machine (JVM) as to speed of generated Just In Time (JIT) compilation, handling of primitive number operations, enforced array bounds checks, etc. It will always be much slower than native code producing compilers and the (experimental) native compiler for Kotlin still isn't up to speed (pun intended), producing code that is many times slower than the code run on the JVM (December 2018).

Lambdatalk

• 1) create an array of natural numbers, [0,1,2,3, ... ,n-1]
• 2) the 3rd number is 2, we set to dots all its composites by steps of 2,
• 3) the 4th number is 3, we set to dots all its composites by steps of 3,
• 4) the 6th number is 5, we set to dots all its composites by steps of 5,
• 5) the remaining numbers are primes and we clean all dots.

For instance:

1: 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
2: 0 1 2 3 . 5 . 7 . 9 . 1 . 3 . 5 . 7 . 9 . 1 . 3 . 5 . 7 . 9 .
3: 0 1 2 3 . 5 . 7 . . . 1 . 3 . . . 7 . 9 . . . 3 . 5 . . . 9 .
4: 0 1 2 3 . 5 . 7 . . . 1 . 3 . . . 7 . 9 . . . 3 . . . . . 9 .
       | |   |   |       |   |       |   |       |           |
5:     0 0   0   0       1   1       1   1       2           2
       2 3   5   7       1   3       7   9       3           9


1) recursive version {rsieve n}

{def rsieve

  {def rsieve.mark
   {lambda {:n :a :i :j}
    {if {< :j :n}
     then {rsieve.mark :n 
                       {A.set! :j . :a}
                       :i
                       {+ :i :j}}
     else :a}}}

  {def rsieve.loop
   {lambda {:n :a :i}
    {if {< {* :i :i} :n}
     then {rsieve.loop :n
                       {if {W.equal? {A.get :i :a} .}
                        then :a 
                        else {rsieve.mark :n :a :i {* :i :i}}}
                       {+ :i 1}}
     else :a}}}

 {lambda {:n}
  {S.replace \s by space in 
   {S.replace (\[|\]|\.|,) by space in
    {A.disp
     {A.slice 2 -1 
      {rsieve.loop :n
       {A.new {S.serie 0 :n}} 2}}}}}}}
-> rsieve

{rsieve 1000}
-> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Note: this version does not avoid stack overflow.

2) iterative version {isieve n}

{def isieve

 {def isieve.mark 
  {lambda {:n :a :i}
   {S.map {{lambda {:a :j} {A.set! :j . :a}
           } :a}
          {S.serie {* :i :i} :n :i} }}}

 {lambda {:n}
  {S.replace \s by space in 
   {S.replace (\[|\]|\.|,) by space in
    {A.disp
     {A.slice 2 -1 
      {S.last 
       {S.map {{lambda {:n :a :i} {if {W.equal? {A.get :i :a} .}
                                   then
                                   else {isieve.mark :n :a :i}}
               } :n {A.new {S.serie 0 :n}}}
              {S.serie 2 {sqrt :n} 1}}}}}}}}}
-> isieve

{isieve 1000}
-> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Notes: 
- this version avoids stack overflow. 
- From 1 to 1000000 there are 78498 primes (computed in ~15000ms) and the last is 999983.

langur

Translation of: D
val .sieve = fn(.limit) {
    if .limit < 2: return []

    var .composite = .limit * [false]
    .composite[1] = true

    for .n in 2 .. trunc(.limit ^/ 2) + 1 {
        if not .composite[.n] {
            for .k = .n^2 ; .k < .limit ; .k += .n {
                .composite[.k] = true
            }
        }
    }

    filter fn(.n) { not .composite[.n] }, series .limit-1
}

writeln .sieve(100)
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

LFE

(defmodule eratosthenes
   (export (sieve 1)))

(defun sieve (limit)
   (sieve limit (lists:seq 2 limit)))

(defun sieve
   ((limit (= l (cons p _))) (when (> (* p p) limit))
      l)
   ((limit (cons p ns))
      (cons p (sieve limit (remove-multiples p (* p p) ns)))))

(defun remove-multiples (p q l)
   (lists:reverse (remove-multiples p q l '())))

(defun remove-multiples
   ((_ _ '() s) s)
   ((p q (cons q ns) s)
      (remove-multiples p q ns s))
   ((p q (= r (cons a _)) s) (when (> a q))
      (remove-multiples p (+ q p) r s))
   ((p q (cons n ns) s)
      (remove-multiples p q ns (cons n s))))
Output:
lfe> (slurp "sieve.lfe")
#(ok eratosthenes)
lfe> (sieve 100)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Liberty BASIC

    'Notice that arrays are globally visible to functions.
    'The sieve() function uses the flags() array.
    'This is a Sieve benchmark adapted from BYTE 1985
    ' May, page 286

    size = 7000
    dim flags(7001)
    start = time$("ms")
    print sieve(size); " primes found."
    print "End of iteration.  Elapsed time in milliseconds: "; time$("ms")-start
    end

    function sieve(size)
        for i = 0 to size
            if flags(i) = 0 then
                prime = i + i + 3
                k = i + prime
                while k <= size
                    flags(k) = 1
                    k = k + prime
                wend
                sieve = sieve + 1
            end if
        next i
    end function

Limbo

implement Sieve;

include "sys.m";
	sys: Sys;
	print: import sys;
include "draw.m";
	draw: Draw;

Sieve : module
{
	init : fn(ctxt : ref Draw->Context, args : list of string);
};

init (ctxt: ref Draw->Context, args: list of string)
{
	sys = load Sys Sys->PATH;

	limit := 201;
	sieve : array of int;
	sieve = array [201] of {* => 1};
	(sieve[0], sieve[1]) = (0, 0);

	for (n := 2; n < limit; n++) {
		if (sieve[n]) {
			for (i := n*n; i < limit; i += n) {
				sieve[i] = 0;
			}
		}
	}

	for (n = 1; n < limit; n++) {
		if (sieve[n]) {
			print ("%4d", n);
		} else {
			print("   .");
		};
		if ((n%20) == 0) 
			print("\n\n");
	}	
}

Lingo

-- parent script "sieve"
property _sieve

----------------------------------------
-- @constructor
----------------------------------------
on new (me)
    me._sieve = []
    return me
end

----------------------------------------
-- Returns list of primes <= n
----------------------------------------
on getPrimes (me, limit)
    if me._sieve.count<limit then me._primeSieve(limit)
    primes = []
    repeat with i = 2 to limit
        if me._sieve[i] then primes.add(i)
    end repeat
    return primes
end

----------------------------------------
-- Sieve of Eratosthenes
----------------------------------------
on _primeSieve (me, limit)
    me._sieve = [0]
    repeat with i = 2 to limit
        me._sieve[i] = 1
    end repeat
    c = sqrt(limit)
    repeat with i = 2 to c
        if (me._sieve[i]=0) then next repeat
        j = i*i -- start with square
        repeat while (j<=limit)
            me._sieve[j] = 0
            j = j + i
        end repeat
    end repeat
end
sieve = script("sieve").new()
put sieve.getPrimes(100)
Output:
-- [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

LiveCode

function sieveE int
    set itemdel to comma
    local sieve
    repeat with i = 2 to int
        put i into sieve[i]
    end repeat
    put 2 into n
    repeat while n < int
        repeat with p = n to int step n
            if p = n then 
                next repeat
            else
                put empty into sieve[p]
            end if
        end repeat
        add 1 to n
    end repeat
    combine sieve with comma
    filter items of sieve without empty
    sort items of sieve ascending numeric
    return sieve
end sieveE

Example

put sieveE(121)
--  2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113


# Sieve of Eratosthenes
# calculates prime numbers up to a given number
 
on mouseUp
   put field "maximum" into limit  
   put the ticks into startTicks      # start a timer
   repeat with i = 2 to limit step 1  # load array with zeros
      put 0 into prime_array[i]
   end repeat
   
   repeat with i = 2 to trunc(sqrt(limit)) # truncate square root
      if prime_array[i] = 0 then  
         repeat with k = (i * i) to limit step i
            delete variable prime_array[k] # remove non-primes
         end repeat
      end if
   end repeat
   put the ticks - startTicks into elapsedTicks  # stop timer
   put elapsedTicks / 60 into field "elapsed"    # calculate time
   
   put the keys of prime_array into prime_numbers # array to variable
   put the number of lines of keys of prime_array into field "count"
   sort lines of prime_numbers ascending numeric  
   put prime_numbers into field "primeList"      # show prime numbers
end mouseUp

LiveCode output example

Comments

LiveCode uses a graphical drag-and-drop interface driven by the mouse. No code was written to create the button and fields; the user enters a number into the 'maximum' field and then clicks a button to run the code. It runs identically in the LiveCode IDE and when compiled to an executable on Mac, Windows, and Linux.

The example was run on an Intel i5 CPU @ 3.29 GHz; all primes found up to 1,000,000 in 3 seconds.

Logo

to sieve :limit
  make "a (array :limit 2)     ; initialized to empty lists
  make "p []
  for [i 2 :limit] [
    if empty? item :i :a [
      queue "p :i
      for [j [:i * :i] :limit :i] [setitem :j :a :i]
    ]
  ]
  output :p
end
print sieve 100   ; 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Logtalk

This example is incorrect. Please fix the code and remove this message.

Details: Not a true Sieve of Eratosthenes but rather a Trial Division Sieve

due to the use of mod (modulo = division) in the filter function.

A coinduction based solution just for fun:

:- object(sieve).

	:- public(primes/2).

	:- coinductive([
		sieve/2, filter/3
	]).

	% computes a coinductive list with all the primes in the 2..N interval
	primes(N, Primes) :-
		generate_infinite_list(N, List),
		sieve(List, Primes).

	% generate a coinductive list with a 2..N repeating pattern
	generate_infinite_list(N, List) :-
		sequence(2, N, List, List).

	sequence(Sup, Sup, [Sup| List], List) :-
		!.
	sequence(Inf, Sup, [Inf| List], Tail) :-
		Next is Inf + 1,
		sequence(Next, Sup, List, Tail).

	sieve([H| T], [H| R]) :-
		filter(H, T, F),
		sieve(F, R).

	filter(H, [K| T], L) :-
		(	K > H, K mod H =:= 0 ->
			% throw away the multiple we found
			L = T1
		;	% we must not throw away the integer used for filtering
			% as we must return a filtered coinductive list
			L = [K| T1]
		),
		filter(H, T, T1).

:- end_object.

Example query:

?- sieve::primes(20, P).
P = [2, 3|_S1], % where
    _S1 = [5, 7, 11, 13, 17, 19, 2, 3|_S1] .

LOLCODE

HAI 1.2
CAN HAS STDIO?

HOW IZ I Eratosumthin YR Max
  I HAS A Siv ITZ A BUKKIT
  Siv HAS A SRS 1 ITZ 0
  I HAS A Index ITZ 2
  IM IN YR Inishul UPPIN YR Dummy WILE DIFFRINT Index AN SUM OF Max AN 1
    Siv HAS A SRS Index ITZ 1
    Index R SUM OF Index AN 1
  IM OUTTA YR Inishul

  I HAS A Prime ITZ 2
  IM IN YR MainLoop UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN PRODUKT OF Prime AN Prime
    BOTH SAEM Siv'Z SRS Prime AN 1
    O RLY?
      YA RLY
        Index R SUM OF Prime AN Prime
        IM IN YR MarkMultipulz UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN Index
          Siv'Z SRS Index R 0
          Index R SUM OF Index AN Prime
        IM OUTTA YR MarkMultipulz
    OIC
    Prime R SUM OF Prime AN 1
  IM OUTTA YR MainLoop

  Index R 1
  I HAS A First ITZ WIN
  IM IN YR PrintPrimes UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN Index
    BOTH SAEM Siv'Z SRS Index AN 1
    O RLY?
      YA RLY
        First
        O RLY?
          YA RLY
            First R FAIL
          NO WAI
            VISIBLE ", "!
        OIC
        VISIBLE Index!
    OIC
    Index R SUM OF Index AN 1
  IM OUTTA YR PrintPrimes
  VISIBLE ""
IF U SAY SO

I IZ Eratosumthin YR 100 MKAY

KTHXBYE
Output:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

Lua

function erato(n)
  if n < 2 then return {} end
  local t = {0} -- clears '1'
  local sqrtlmt = math.sqrt(n)
  for i = 2, n do t[i] = 1 end
  for i = 2, sqrtlmt do if t[i] ~= 0 then for j = i*i, n, i do t[j] = 0 end end end
  local primes = {}
  for i = 2, n do if t[i] ~= 0 then table.insert(primes, i) end end
  return primes
end

The following changes the code to odds-only using the same large array-based algorithm:

function erato2(n)
  if n < 2 then return {} end
  if n < 3 then return {2} end
  local t = {}
  local lmt = (n - 3) / 2
  local sqrtlmt = (math.sqrt(n) - 3) / 2
  for i = 0, lmt do t[i] = 1 end
  for i = 0, sqrtlmt do if t[i] ~= 0 then
    local p = i + i + 3
    for j = (p*p - 3) / 2, lmt, p do t[j] = 0 end end end
  local primes = {2}
  for i = 0, lmt do if t[i] ~= 0 then table.insert(primes, i + i + 3) end end
  return primes
end
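
The index arithmetic of the odds-only version can be easier to follow in a second language. The following is a minimal Python sketch (not part of the Lua entry; the name `odds_only_sieve` is hypothetical), assuming the same mapping where index i represents the odd number 2*i + 3:

```python
from math import isqrt

def odds_only_sieve(n):
    """Return all primes <= n, keeping flags for odd numbers only."""
    if n < 2:
        return []
    if n < 3:
        return [2]
    top = (n - 3) // 2                      # index of the largest odd number <= n
    is_prime = bytearray([1]) * (top + 1)   # index i represents 2*i + 3
    for i in range((isqrt(n) - 3) // 2 + 1):
        if is_prime[i]:
            p = 2 * i + 3
            # p*p sits at index (p*p - 3) // 2; stepping the index by p
            # advances the represented value by 2*p, so even multiples
            # are never visited
            for j in range((p * p - 3) // 2, top + 1, p):
                is_prime[j] = 0
    return [2] + [2 * i + 3 for i in range(top + 1) if is_prime[i]]

print(odds_only_sieve(50))
```

Halving the flag array this way roughly halves both memory use and the number of culling operations compared with the plain version.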

The following code implements an odds-only "infinite" generator style using a table as a hash table, including postponing adding base primes to the table:

function newEratoInf()
  local _cand = 0; local _lstbp = 3; local _lstsqr = 9
  local _composites = {}; local _bps = nil
  local _self = {}
  function _self.next()
    if _cand < 9 then if _cand < 1 then _cand = 1; return 2
                     elseif _cand >= 7 then
                       --advance aux source base primes to 3...
                       _bps = newEratoInf()
                       _bps.next(); _bps.next() end end
    _cand = _cand + 2
    if _composites[_cand] == nil then -- may be prime
      if _cand >= _lstsqr then -- if not the next base prime
        local adv = _lstbp + _lstbp -- if next base prime
        _composites[_lstbp * _lstbp + adv] = adv -- add cull seq
        _lstbp = _bps.next(); _lstsqr = _lstbp * _lstbp -- adv next base prime
        return _self.next()
      else return _cand end -- is prime
    else
      local v = _composites[_cand]
      _composites[_cand] = nil
      local nv = _cand + v
      while _composites[nv] ~= nil do nv = nv + v end
      _composites[nv] = v
      return _self.next() end
  end
  return _self
end

gen = newEratoInf()
count = 0
while gen.next() <= 10000000 do count = count + 1 end -- sieves to 10 million
print(count)

which outputs "664579" in about three seconds. As this code uses much less memory for a given range than the previous ones and retains efficiency better with range, it is likely more appropriate for larger sieve ranges.
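
For comparison, the same idea (a hash table of upcoming odd composites, with each base prime added only when the candidate reaches its square) can be sketched in Python. This is an illustrative sketch rather than a translation of the Lua code above; the generator name `postponed_sieve` is hypothetical:

```python
from itertools import count, islice

def postponed_sieve():
    """Yield primes indefinitely using a dict of upcoming odd composites."""
    yield 2
    yield 3
    yield 5
    yield 7
    composites = {}            # maps an upcoming odd composite -> its step (2*p)
    base = postponed_sieve()   # separate, slower-moving source of base primes
    next(base)                 # discard 2; only odd base primes are needed
    p = next(base)             # current base prime (3)
    q = p * p                  # its square (9): where culling by p must begin
    for c in count(9, 2):      # odd candidates only
        if c in composites:
            step = composites.pop(c)   # c is composite; move its marker on
        elif c < q:
            yield c                    # no marker and below q: c is prime
            continue
        else:                          # c == q: start culling by p only now
            step = 2 * p
            p = next(base)
            q = p * p
        m = c + step
        while m in composites:         # keep exactly one marker per dict key
            m += step
        composites[m] = step

print(list(islice(postponed_sieve(), 15)))
```

Because a base prime enters the table only at its square, the table holds about as many entries as there are primes below the square root of the current candidate, rather than one entry per prime found so far.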

Lucid

This example is incorrect. Please fix the code and remove this message.

Details: Not a true Sieve of Eratosthenes but rather a Trial Division Sieve

prime
 where
    prime = 2 fby (n whenever isprime(n));
    n = 3 fby n+2;
    isprime(n) = not(divs) asa divs or prime*prime > N
                    where
                      N is current n;
                      divs = N mod prime eq 0;
                    end;
 end

recursive

This example is incorrect. Please fix the code and remove this message.

Details: Not a true Sieve of Eratosthenes but rather a Trial Division Sieve

sieve( N )
   where
    N = 2 fby N + 1;
    sieve( i ) =
      i fby sieve ( i whenever i mod first i ne 0 ) ;
   end

M2000 Interpreter

Module EratosthenesSieve (x) {
      \\ Sieve of Eratosthenes
      Profiler
      If x>2000000 Then Exit
      Dim i(x+1): k=2: k2=sqrt(x)
      While k<=k2{if i(k) else for m=k*k to x step k{i(m)=1}
      k++}
      Print str$(timecount/1000,"####0.##")+" s"
      Input "Press enter skip print or a non zero to get results:", a%
      if a% then For i=2to x{If i(i)=0 Then Print i, 
      }
      Print:Print "Done"
}
EratosthenesSieve 1000

M4

define(`lim',100)dnl
define(`for',
   `ifelse($#,0,
      ``$0'',
      `ifelse(eval($2<=$3),1,
         `pushdef(`$1',$2)$5`'popdef(`$1')$0(`$1',eval($2+$4),$3,$4,`$5')')')')dnl
for(`j',2,lim,1,
   `ifdef(a[j],
      `',
      `j for(`k',eval(j*j),lim,j,
         `define(a[k],1)')')')

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

MAD

            NORMAL MODE IS INTEGER
          
          R TO GENERATE MORE PRIMES, CHANGE BOTH THESE NUMBERS          
            BOOLEAN PRIME
            DIMENSION PRIME(10000)
            MAXVAL = 10000
            PRINT FORMAT BEGIN,MAXVAL
            
          R ASSUME ALL ARE PRIMES AT BEGINNING
            THROUGH SET, FOR I=2, 1, I.G.MAXVAL
SET         PRIME(I) = 1B

          R REMOVE ALL PROVEN COMPOSITES
            SQMAX = SQRT.(MAXVAL)
            THROUGH NEXT, FOR P=2, 1, P.G.SQMAX
            WHENEVER PRIME(P)
                THROUGH MARK, FOR I=P*P, P, I.G.MAXVAL
MARK            PRIME(I) = 0B
NEXT        END OF CONDITIONAL

          R PRINT PRIMES
            THROUGH SHOW, FOR P=2, 1, P.G.MAXVAL
SHOW        WHENEVER PRIME(P), PRINT FORMAT NUMFMT, P
            
            VECTOR VALUES BEGIN = $13HPRIMES UP TO ,I9*$
            VECTOR VALUES NUMFMT = $I9*$
            END OF PROGRAM
Output:
PRIMES UP TO     10000
        2
        3
        5
        7
       11
       13
       17
...
     9979
     9983
     9985
     9989
     9991
     9995
     9997

Maple

Eratosthenes := proc(n::posint) 
  local numbers_to_check, i, k; 
  numbers_to_check := [seq(2 .. n)]; 
  for i from 2 to floor(sqrt(n)) do 
      for k from i by i while k <= n do 
          if evalb(k <> i) then
            numbers_to_check[k - 1] := 0; 
          end if; 
      end do; 
  end do; 
  numbers_to_check := remove(x -> evalb(x = 0), numbers_to_check); 
  return numbers_to_check; 
  end proc:
Output:
Eratosthenes(100);
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Mathematica/Wolfram Language

Eratosthenes[n_] := Module[{numbers = Range[n]},
  Do[If[numbers[[i]] != 0, Do[numbers[[i j]] = 0, {j, 2, n/i}]], {i, 
    2, Sqrt[n]}];
  Select[numbers, # > 1 &]]

Eratosthenes[100]

Slightly Optimized Version

The below has been improved to not require so many operations per composite number cull for about two thirds the execution time:

Eratosthenes[n_] := Module[{numbers = Range[n]},
  Do[If[numbers[[i]] != 0, Do[numbers[[j]] = 0, {j,i i,n,i}]],{i,2,Sqrt[n]}];
  Select[numbers, # > 1 &]]

Eratosthenes[100]
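
Besides avoiding a multiplication per cull, the optimized version starts each culling pass at the square of the prime instead of at twice the prime. A small Python sketch (not part of the Mathematica entry; `cull_counts` is a hypothetical helper) counts how many culling operations each starting point would need:

```python
from math import isqrt

def cull_counts(n):
    """Return (culls when starting at 2*p, culls when starting at p*p)."""
    flags = bytearray([1]) * (n + 1)
    from_2p = from_pp = 0
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            from_2p += len(range(2 * p, n + 1, p))   # first version's range
            from_pp += len(range(p * p, n + 1, p))   # optimized version's range
            for m in range(p * p, n + 1, p):
                flags[m] = 0
    return from_2p, from_pp

print(cull_counts(100))
```

The saving in cull count is modest, since the multiples below the square are few for each prime; most of the reported speed-up plausibly comes from the cheaper per-cull work.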

Sieving Odds-Only Version

The below has been further improved to only sieve odd numbers for a further reduction in execution time by a factor of over two:

Eratosthenes2[n_] := Module[{numbers = Range[3, n, 2], limit = (n - 1)/2}, 
  Do[c = numbers[[i]]; If[c != 0,
    Do[numbers[[j]] = 0, {j,(c c - 1)/2,limit,c}]], {i,1,(Sqrt[n] - 1)/2}];
  Prepend[Select[numbers, # > 1 &], 2]]

Eratosthenes2[100]

MATLAB / Octave

Somewhat optimized true Sieve of Eratosthenes

function P = erato(x)        % Sieve of Eratosthenes: returns all primes between 2 and x
        
    P = [0 2:x];             % Create vector with all ints between 2 and x where
                             %   position 1 is hard-coded as 0 since 1 is not a prime.

    for n = 2:sqrt(x)        % All prime factors lie between 2 and sqrt(x).
       if P(n)               % If the current value is not 0 (i.e. a prime),
          P(n*n:n:x) = 0;    % then replace all further multiples of it with 0.
       end
    end                      % At this point P is a vector with only primes and zeroes.

    P = P(P ~= 0);           % Remove all zeroes from P, leaving only the primes.
end

The optimization lies in fewer steps in the for loop, use of MATLAB's built-in array operations and no modulo calculation.

Limitation: your machine has to be able to allocate enough memory for an array of length x.

A more efficient Sieve

A more efficient Sieve avoids creating a large double precision vector P, instead using a logical array (which consumes 1/8 the memory of a double array of the same size) and only converting to double those values corresponding to primes.

function P = sieveOfEratosthenes(x)
    ISP = [false true(1, x-1)]; % 1 is not prime, but we start off assuming all numbers between 2 and x are
    for n = 2:sqrt(x)
        if ISP(n)
            ISP(n*n:n:x) = false; % Multiples of n that are greater than n*n are not primes
        end
    end
    % The ISP vector that we have calculated is essentially the output of the ISPRIME function called on 1:x
    P = find(ISP); % Convert the ISPRIME output to the values of the primes by finding the locations 
                   % of the TRUE values in ISP.
end

You can compare the output of this function against the PRIMES function included in MATLAB, which performs a somewhat more memory-efficient Sieve (by not storing even numbers, at the expense of a more complicated indexing expression inside the IF statement).
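
The same one-flag-per-candidate idea with a vectorised strided assignment can be sketched in Python, assuming a `bytearray` as a rough stand-in for MATLAB's logical array (the function name `sieve_flags` is hypothetical):

```python
from math import isqrt

def sieve_flags(x):
    """Return the primes <= x using one byte of flag storage per candidate."""
    if x < 2:
        return []
    is_prime = bytearray([1]) * (x + 1)    # index k holds the flag for k
    is_prime[0] = is_prime[1] = 0
    for n in range(2, isqrt(x) + 1):
        if is_prime[n]:
            # clear n*n, n*n + n, ... in a single extended-slice assignment,
            # the counterpart of ISP(n*n:n:x) = false
            span = len(range(n * n, x + 1, n))
            is_prime[n * n :: n] = bytes(span)
    # the counterpart of find(ISP): indices whose flag is still set
    return [k for k, flag in enumerate(is_prime) if flag]

print(sieve_flags(50))
```

Only the final list holds full integers; the working storage stays at one byte per number up to x.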

Maxima

sieve(n):=block(
   [a:makelist(true,n),i:1,j],
   a[1]:false,
   do (
      i:i+1,
      unless a[i] do i:i+1,
      if i*i>n then return(sublist_indices(a,identity)),
      for j from i*i step i while j<=n do a[j]:false
   )
)$

MAXScript

fn eratosthenes n =
(
    multiples = #()
    print 2
    for i in 3 to n do
    (
        if (findItem multiples i) == 0 then
        (
            print i
            for j in (i * i) to n by i do
            (
                append multiples j
            )
        )
    )
)

eratosthenes 100

Mercury

:- module sieve.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module bool, array, int.

main(!IO) :-
    sieve(50, Sieve),
    dump_primes(2, size(Sieve), Sieve, !IO).

:- pred dump_primes(int, int, array(bool), io, io).
:- mode dump_primes(in, in, array_di, di, uo) is det.
dump_primes(N, Limit, !.A, !IO) :-
    ( if N < Limit then
        unsafe_lookup(!.A, N, Prime),
        (
            Prime = yes,
            io.write_line(N, !IO)
        ;
            Prime = no
        ),
        dump_primes(N + 1, Limit, !.A, !IO)
    else
        true
    ).

:- pred sieve(int, array(bool)).
:- mode sieve(in, array_uo) is det.
sieve(N, !:A) :-
    array.init(N, yes, !:A),
    sieve(2, N, !A).

:- pred sieve(int, int, array(bool), array(bool)).
:- mode sieve(in, in, array_di, array_uo) is det.
sieve(N, Limit, !A) :-
    ( if N < Limit then
        unsafe_lookup(!.A, N, Prime),
        (
            Prime = yes,
            sift(N + N, N, Limit, !A),
            sieve(N + 1, Limit, !A)
        ;
            Prime = no,
            sieve(N + 1, Limit, !A)
        )
    else
        true
    ).

:- pred sift(int, int, int, array(bool), array(bool)).
:- mode sift(in, in, in, array_di, array_uo) is det.
sift(I, N, Limit, !A) :-
    ( if I < Limit then
        unsafe_set(I, no, !A),
        sift(I + N, N, Limit, !A)
    else
        true
    ).

Microsoft Small Basic

Translation of: GW-BASIC
TextWindow.Write("Enter number to search to: ")
limit = TextWindow.ReadNumber()
For n = 2 To limit 
  flags[n] = 0
EndFor
For n = 2 To math.SquareRoot(limit)
  If flags[n] = 0 Then
    For K = n * n To limit Step n
      flags[K] = 1
    EndFor
  EndIf
EndFor
' Display the primes
If limit >= 2 Then
  TextWindow.Write(2)
  For n = 3 To limit
    If flags[n] = 0 Then 
      TextWindow.Write(", " + n)
    EndIf  
  EndFor
  TextWindow.WriteLine("")
EndIf

Modula-2

MODULE Erato;
FROM InOut IMPORT WriteCard, WriteLn;
FROM MathLib IMPORT sqrt;

CONST Max = 100; 

VAR prime: ARRAY [2..Max] OF BOOLEAN;
    i: CARDINAL;

PROCEDURE Sieve;
    VAR i, j, sqmax: CARDINAL;
BEGIN
    sqmax := TRUNC(sqrt(FLOAT(Max)));

    FOR i := 2 TO Max DO prime[i] := TRUE; END;
    FOR i := 2 TO sqmax DO
        IF prime[i] THEN
            j := i * 2;
            (* alas, the BY clause in a FOR loop must be a constant *)
            WHILE j <= Max DO 
                prime[j] := FALSE;
                j := j + i;
            END;
        END;
    END;
END Sieve;

BEGIN
    Sieve;
    FOR i := 2 TO Max DO
        IF prime[i] THEN
            WriteCard(i,5);
            WriteLn;
        END;
    END;
END Erato.
Output:
    2
    3
    5
    7
   11
   13
   17
   19
   23
   29
   31
   37
   41
   43
   47
   53
   59
   61
   67
   71
   73
   79
   83
   89
   97

Modula-3

Regular version

This version runs slow because of the way I/O is implemented in the CM3 compiler. Setting ListPrimes = FALSE achieves speed comparable to C on sufficiently high values of LastNum (e.g., 10^6).

MODULE Eratosthenes EXPORTS Main;

IMPORT IO;

FROM Math IMPORT sqrt;

CONST
  LastNum    = 1000;
  ListPrimes = TRUE;

VAR
  a: ARRAY[2..LastNum] OF BOOLEAN;

VAR
  n := LastNum - 2 + 1;

BEGIN

  (* set up *)
  FOR i := FIRST(a) TO LAST(a) DO
    a[i] := TRUE;
  END;

  (* declare a variable local to a block *)
  VAR b := FLOOR(sqrt(FLOAT(LastNum, LONGREAL)));

  (* the block must follow immediately *)
  BEGIN

    (* print primes and mark out composites up to sqrt(LastNum) *)
    FOR i := FIRST(a) TO b DO
      IF a[i] THEN
        IF ListPrimes THEN IO.PutInt(i); IO.Put(" "); END;
        FOR j := i*i TO LAST(a) BY i DO
          IF a[j] THEN
            a[j] := FALSE;
            DEC(n);
          END;
        END;
      END;
    END;

    (* print remaining primes *)
    IF ListPrimes THEN
      FOR i := b + 1 TO LAST(a) DO
        IF a[i] THEN
          IO.PutInt(i); IO.Put(" ");
        END;
      END;
    END;

  END;

  (* report *)
  IO.Put("There are ");         IO.PutInt(n);
  IO.Put(" primes from 2 to "); IO.PutInt(LastNum);
  IO.PutChar('\n');

END Eratosthenes.

Advanced version

This version uses more "advanced" types.

(* From the CM3 examples folder (comments removed). *)

MODULE Sieve EXPORTS Main;

IMPORT IO;

TYPE
  Number = [2..1000];
  Set = SET OF Number;

VAR
  prime: Set := Set {FIRST(Number) .. LAST(Number)};

BEGIN
  FOR i := FIRST(Number) TO LAST(Number) DO
    IF i IN prime THEN
      IO.PutInt(i);
      IO.Put(" ");

      FOR j := i TO LAST(Number) BY i DO
        prime := prime - Set{j};
      END;
    END;
  END;
  IO.Put("\n");
END Sieve.

Mojo

Tested with Mojo version 0.7:

from memory import memset_zero
from memory.unsafe import (DTypePointer)
from time import (now)

alias cLIMIT: Int = 1_000_000_000

struct SoEBasic(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.bool] # because DynamicVector has deep copy bug in mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = limit - 1
        self.sz = limit - 1
        self.ndx = 0
        self.cmpsts = DTypePointer[DType.bool].alloc(limit - 1)
        memset_zero(self.cmpsts, limit - 1)
        for i in range(limit - 1):
            let s = i * (i + 4) + 2
            if s >= limit - 1: break
            if self.cmpsts[i]: continue
            let bp = i + 2
            for c in range(s, limit - 1, bp):
                self.cmpsts[c] = True
        for i in range(limit - 1):
            if self.cmpsts[i]: self.sz -= 1
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        self.cmpsts = DTypePointer[DType.bool].alloc(self.len)
        for i in range(self.len):
            self.cmpsts[i] = existing.cmpsts[i]
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    fn __next__(inout self: Self) -> Int:
        if self.ndx >= self.len: return 0
        while (self.ndx < self.len) and (self.cmpsts[self.ndx]):
            self.ndx += 1
        let rslt = self.ndx + 2; self.sz -= 1; self.ndx += 1
        return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEBasic(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = len(SoEBasic(1_000_000))
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = len(SoEBasic(cLIMIT))
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")
Output:
The primes to 100 are:
2  3  5  7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97  
Found 78498 primes up to 1,000,000 in 1.2642770000000001 milliseconds.
Found 50847534 primes up to 1000000000 in 6034.328751 milliseconds.

as run on an AMD 7840HS CPU at 5.1 GHz.

Note that because of the huge memory array used, the run time grows disproportionately when large ranges are selected: the slowdown is about four times worse than simple scaling with the range would predict.

This solution uses an iterator struct, which seems to be the Mojo-preferred way to do this. Normally a DynamicVector would have been used for the culling array, but this version of DynamicVector has a bug where the array is not properly deep copied when copied to a new location, so the raw pointer type is used instead.

Odds-Only with Optimizations

This version makes three significant improvements to the above code: 1) It is trivial to skip storing and culling representations of the even composite numbers (two being the only even prime), which halves the storage space and reduces the culling time to about 40 percent. 2) Culling composite representations over a bit-packed byte array (which reduces the storage requirement by another factor of eight) follows a pattern that repeats every eight culling operations; this can be encapsulated by an extreme loop unrolling technique with compiler-generated constants, as done here. 3) A further extreme optimization is dense culling for small base prime values whose culling span is less than one register in size: the loaded register is repeatedly culled for different base prime strides before being written out (with this optimization done by the compiler), again using compiler-generated modification constants. Modern compilers usually optimize this technique further through autovectorization with the SIMD registers available on the architecture, reducing these culling operations to an average of a tiny fraction of a CPU clock cycle per cull.
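
As a reference point for the storage layout only, the odds-only, bit-packed representation can be sketched in Python using the plain "bit twiddling" culling loop, with none of the unrolling or dense-culling optimizations. This is an illustrative sketch, not a translation of the Mojo code; the name `bitpacked_odds_sieve` is hypothetical:

```python
def bitpacked_odds_sieve(limit):
    """Return primes <= limit using one bit per odd number."""
    if limit < 2:
        return []
    nbits = max(0, (limit - 3) // 2 + 1)     # bit i represents 2*i + 3
    composite = bytearray((nbits + 7) // 8)  # eight flags per byte, all clear
    for i in range(nbits):
        start = 2 * i * (i + 3) + 3          # bit index of p*p for p = 2*i + 3
        if start >= nbits:
            break
        if (composite[i >> 3] >> (i & 7)) & 1:
            continue                         # 2*i + 3 is already known composite
        p = 2 * i + 3
        for c in range(start, nbits, p):     # index step p == value step 2*p
            composite[c >> 3] |= 1 << (c & 7)
    primes = [2]
    primes.extend(2 * i + 3 for i in range(nbits)
                  if not (composite[i >> 3] >> (i & 7)) & 1)
    return primes

print(bitpacked_odds_sieve(100))
```

Compared with one byte per candidate, this layout needs one sixteenth of the storage: half from skipping even numbers and an eighth from bit packing.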

Mojo version 0.7 was tested:

from memory import (memset_zero, memcpy)
from memory.unsafe import (DTypePointer)
from math.bit import ctpop
from time import (now)

alias cLIMIT: Int = 1_000_000_000

alias cBufferSize: Int = 262144 # bytes
alias cBufferBits: Int = cBufferSize * 8

alias UnrollFunc = fn(DTypePointer[DType.uint8], Int, Int, Int) -> None

@always_inline
fn extreme[OFST: Int, BP: Int](pcmps: DTypePointer[DType.uint8], bufsz: Int, s: Int, bp: Int):
  var cp = pcmps + (s >> 3)
  let r1: Int = ((s + bp) >> 3) - (s >> 3)
  let r2: Int = ((s + 2 * bp) >> 3) - (s >> 3)
  let r3: Int = ((s + 3 * bp) >> 3) - (s >> 3)
  let r4: Int = ((s + 4 * bp) >> 3) - (s >> 3)
  let r5: Int = ((s + 5 * bp) >> 3) - (s >> 3)
  let r6: Int = ((s + 6 * bp) >> 3) - (s >> 3)
  let r7: Int = ((s + 7 * bp) >> 3) - (s >> 3)
  let plmt: DTypePointer[DType.uint8] = pcmps + bufsz - r7
  while cp < plmt:
    cp.store(cp.load() | (1 << OFST))
    (cp + r1).store((cp + r1).load() | (1 << ((OFST + BP) & 7)))
    (cp + r2).store((cp + r2).load() | (1 << ((OFST + 2 * BP) & 7)))
    (cp + r3).store((cp + r3).load() | (1 << ((OFST + 3 * BP) & 7)))
    (cp + r4).store((cp + r4).load() | (1 << ((OFST + 4 * BP) & 7)))
    (cp + r5).store((cp + r5).load() | (1 << ((OFST + 5 * BP) & 7)))
    (cp + r6).store((cp + r6).load() | (1 << ((OFST + 6 * BP) & 7)))
    (cp + r7).store((cp + r7).load() | (1 << ((OFST + 7 * BP) & 7)))
    cp += bp
  let eplmt: DTypePointer[DType.uint8] = plmt + r7
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << OFST))
  cp += r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + BP) & 7)))
  cp += r2 - r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 2 * BP) & 7)))
  cp += r3 - r2
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 3 * BP) & 7)))
  cp += r4 - r3
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 4 * BP) & 7)))
  cp += r5 - r4
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 5 * BP) & 7)))
  cp += r6 - r5
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 6 * BP) & 7)))
  cp += r7 - r6
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 7 * BP) & 7)))

fn mkExtrm[CNT: Int](pntr: Pointer[UnrollFunc]):
  @parameter
  if CNT >= 32:
    return
  alias OFST = CNT >> 2
  alias BP = ((CNT & 3) << 1) + 1
  pntr.offset(CNT).store(extreme[OFST, BP])
  mkExtrm[CNT + 1](pntr)

@always_inline
fn mkExtremeFuncs() -> Pointer[UnrollFunc]:
  let jmptbl: Pointer[UnrollFunc] = Pointer[UnrollFunc].alloc(32)
  mkExtrm[0](jmptbl)
  return jmptbl
  
let extremeFuncs = mkExtremeFuncs()

alias DenseFunc = fn(DTypePointer[DType.uint64], Int, Int) -> DTypePointer[DType.uint64]

fn mkDenseCull[N: Int, BP: Int](cp: DTypePointer[DType.uint64]):
  @parameter
  if N >= 64:
    return
  alias MUL = N * BP
  var cop = cp.offset(MUL >> 6)
  cop.store(cop.load() | (1 << (MUL & 63)))
  mkDenseCull[N + 1, BP](cp)

@always_inline
fn denseCullFunc[BP: Int](pcmps: DTypePointer[DType.uint64], bufsz: Int, s: Int) -> DTypePointer[DType.uint64]:
  var cp: DTypePointer[DType.uint64] = pcmps + (s >> 6)
  let plmt = pcmps + (bufsz >> 3) - BP
  while cp < plmt:
    mkDenseCull[0, BP](cp)
    cp += BP
  return cp

fn mkDenseFunc[CNT: Int](pntr: Pointer[DenseFunc]):
  @parameter
  if CNT >= 64:
    return
  alias BP = (CNT << 1) + 3
  pntr.offset(CNT).store(denseCullFunc[BP])
  mkDenseFunc[CNT + 1](pntr)

@always_inline
fn mkDenseFuncs() -> Pointer[DenseFunc]:
  let jmptbl : Pointer[DenseFunc] = Pointer[DenseFunc].alloc(64)
  mkDenseFunc[0](jmptbl)
  return jmptbl

let denseFuncs : Pointer[DenseFunc] = mkDenseFuncs()

@always_inline
fn cullPass(cmpsts: DTypePointer[DType.uint8], bytesz: Int, s: Int, bp: Int):
    if bp <= 129: # dense culling
        var sm = s
        while (sm >> 3) < bytesz and (sm & 63) != 0:
            cmpsts[sm >> 3] |= (1 << (sm & 7))
            sm += bp
        let bcp = denseFuncs[(bp - 3) >> 1](cmpsts.bitcast[DType.uint64](), bytesz, sm)
        var ns = 0
        var ncp = bcp
        let cmpstslmtp = (cmpsts + bytesz).bitcast[DType.uint64]()
        while ncp < cmpstslmtp:
            ncp[0] |= (1 << (ns & 63))
            ns += bp
            ncp = bcp + (ns >> 6)
    else: # extreme loop unrolling culling
        extremeFuncs[((s & 7) << 2) + ((bp & 7) >> 1)](cmpsts, bytesz, s, bp)
#    for c in range(s, self.len, bp): # slow bit twiddling way
#        self.cmpsts[c >> 3] |= (1 << (c & 7))

fn countPagePrimes(ptr: DTypePointer[DType.uint8], bitsz: Int) -> Int:
    let wordsz: Int = (bitsz + 63) // 64  # round up to nearest 64 bit boundary 
    var rslt: Int = wordsz * 64
    let bigcmps = ptr.bitcast[DType.uint64]()        
    for i in range(wordsz - 1):
       rslt -= ctpop(bigcmps[i]).to_int()
    rslt -= ctpop(bigcmps[wordsz - 1] | (-2 << ((bitsz - 1) & 63))).to_int()
    return rslt

struct SoEOdds(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = 0 if limit < 2 else (limit - 3) // 2 + 1
        self.sz = 0 if limit < 2 else self.len + 1 # for the unprocessed only even prime of two
        self.ndx = -1
        let bytesz = 0 if limit < 2 else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memset_zero(self.cmpsts, bytesz)
        for i in range(self.len):
            let s = (i + i) * (i + 3) + 3
            if s >= self.len: break
            if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
            let bp = i + i + 3
            cullPass(self.cmpsts, bytesz, s, bp)
        self.sz = countPagePrimes(self.cmpsts, self.len) + 1 # add one for only even prime of two
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        let bytesz = (self.len + 7) // 8
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int:
        if self.ndx < 0:
            self.ndx = 0; self.sz -= 1; return 2
        while (self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
            self.ndx += 1
        let rslt = (self.ndx << 1) + 3; self.sz -= 1; self.ndx += 1; return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEOdds(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = len(SoEOdds(1_000_000))
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = len(SoEOdds(cLIMIT))
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")
Output:
The primes to 100 are:
2  3  5  7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97  
Found 78498 primes up to 1,000,000 in 0.085067000000000004 milliseconds.
Found 50847534 primes up to 1000000000 in 1204.866606 milliseconds.

This was run on the same computer as the above example. While this is much faster than that version, it still slows down disproportionately as the sieving range grows: sieving a range 1000 times as large takes roughly 14,000 times as long, which is over ten times slower than simple linear scaling would predict. The cause is the "one huge sieving buffer" algorithm: the buffer grows with the range (and will eventually limit the range that can be sieved) until it exceeds the size of the CPU caches, which greatly increases the average memory access time.
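
To put rough numbers on the buffer growth described above, the following Python sketch reuses the sizing arithmetic from `SoEOdds.__init__` (one bit per odd candidate, rounded up to a 64-bit boundary). The function name `sieve_buffer_bytes` is made up for this illustration, and the comparison with cache sizes is only indicative, since cache sizes differ between CPUs.

```python
def sieve_buffer_bytes(limit: int) -> int:
    """Bytes needed by the single bit-packed odds-only buffer for a given limit.

    Mirrors the Mojo sizing: one bit per odd number from 3 up to limit,
    rounded up to a whole number of 64-bit words.
    """
    if limit < 2:
        return 0
    length = (limit - 3) // 2 + 1          # number of odd candidates
    return ((length + 63) & -64) >> 3      # round up to 64 bits, convert to bytes


if __name__ == "__main__":
    for limit in (1_000_000, 1_000_000_000):
        print(limit, sieve_buffer_bytes(limit), "bytes")
```

The buffer for one million is a few tens of kilobytes, while the buffer for one billion is tens of megabytes, which is the growth the paragraph above blames for the slowdown.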

Page-Segmented Odds-Only with Optimizations

While the above version performs reasonably well for small sieving ranges of up to a few tens of millions, whose buffers fit within the CPU caches, it gets much slower with larger ranges, and its huge RAM consumption limits the maximum range over which it can be used. This version solves these problems by breaking the huge sieving array into "pages" that each fit within the CPU cache and processing each "page" sequentially until the target range is reached. This technique also greatly reduces memory requirements to only what is needed to store the base prime representations up to the square root of the range limit (about O(√n / log n) storage for a range limit of n) plus a fixed-size page buffer. In this case, the storage for the base primes has been reduced by a further constant factor by storing them as single-byte deltas from the previous value. This works for ranges up to the 64-bit number range, where the biggest gap is two times 192; since only odd base primes are stored, the gap values are halved and so fit in a single byte.
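
The paging idea and the half-gap encoding of the base primes can be sketched in a few lines of Python. This is a simplified illustration, not a port of the Mojo code below: it uses one byte per candidate instead of bit packing, has none of the unrolled culling, and the names `base_prime_deltas`, `count_primes_paged` and `page_bits` are chosen here for the example.

```python
from math import isqrt


def base_prime_deltas(root: int) -> bytes:
    """Odd primes <= root, stored as half-gaps from the previous value (starting at 1)."""
    size = (root - 3) // 2 + 1 if root >= 3 else 0
    cmpsts = bytearray(size)               # index i represents the odd number 2*i + 3
    deltas = bytearray()
    prev = 1
    for i in range(size):
        if cmpsts[i]:
            continue
        p = 2 * i + 3
        deltas.append((p - prev) // 2)     # halved gap, assumed to fit in one byte
        prev = p
        for j in range((p * p - 3) // 2, size, p):
            cmpsts[j] = 1
    return bytes(deltas)


def count_primes_paged(limit: int, page_bits: int = 1 << 16) -> int:
    """Count the primes <= limit, sieving one fixed-size page of odd numbers at a time."""
    if limit < 2:
        return 0
    total = (limit - 3) // 2 + 1 if limit >= 3 else 0   # odd candidates 3, 5, ...
    deltas = base_prime_deltas(isqrt(limit))
    count = 1                                           # the only even prime, 2
    for low in range(0, total, page_bits):
        size = min(page_bits, total - low)
        page = bytearray(size)                          # reused conceptually per page
        bp = 1
        for d in deltas:
            bp += 2 * d                                 # decode the next base prime
            s = (bp * bp - 3) // 2                      # index of bp squared
            if s >= low + size:
                break                                   # no larger prime culls here
            if s >= low:
                s -= low
            else:
                r = (low - s) % bp                      # first multiple inside the page
                s = 0 if r == 0 else bp - r
            for j in range(s, size, bp):
                page[j] = 1
        count += size - sum(page)
    return count


if __name__ == "__main__":
    print(count_primes_paged(1_000_000))
```

Only the delta table and one page are alive at any time, so memory use no longer depends on the full range.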

Currently, Mojo has problems with some functions in the standard libraries; for example, the integer square root function is not accurate and does not work for the required integer types, so a custom integer square root function is supplied. As well, current Mojo supports recursion in hardly any useful cases (other than compile-time global function recursion), so the `SoEOdds` structure from the previous answer had to be kept to generate the base prime representation table (otherwise this would have had to be generated from scratch within the new `SoEOddsPaged` structure). Finally, it didn't seem worth using the `Sized` trait for the new structure, as this would sometimes require processing the pages twice: once to obtain the size and once more if iteration across the prime values is required.
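
As an illustration of why an exact integer square root matters here, the following Python sketch implements the classic digit-by-digit (base 4) method using only integer operations. It is written for this note rather than translated from the Mojo `intsqrt` below, and the standard library's `math.isqrt` is used only as a cross-check.

```python
from math import isqrt


def intsqrt(n: int) -> int:
    """Floor of the square root of n, computed with shifts, adds and compares only."""
    if n < 0:
        raise ValueError("n must be non-negative")
    q = 1
    while q <= n:            # find the smallest power of four above n
        q <<= 2
    q >>= 2                  # largest power of four <= n (0 when n == 0)
    x, r = n, 0
    while q > 0:
        t = r + q
        r >>= 1
        if x >= t:           # this result bit is set
            x -= t
            r += q
        q >>= 2
    return r


if __name__ == "__main__":
    # A floating-point square root can round up near the top of the 64-bit range,
    # which is why the exact integer method is preferred for a sieve limit.
    n = 2**64 - 1
    print(intsqrt(n), isqrt(n))
```

An off-by-one root would either miss a base prime or read past the intended base prime table, so exactness is worth the few extra lines.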

Tested with Mojo version 0.7:

from memory import (memset_zero, memcpy)
from memory.unsafe import (DTypePointer)
from math.bit import ctpop
from time import (now)

alias cLIMIT: Int = 1_000_000_000

alias cBufferSize: Int = 262144 # bytes
alias cBufferBits: Int = cBufferSize * 8

fn intsqrt(n: UInt64) -> UInt64:
  if n < 4:
    if n < 1: return 0 else: return 1
  var x: UInt64 = n; var qn: UInt64 = 0; var r: UInt64 = 0
  while qn < 64 and (1 << qn) <= n:
    qn += 2
  var q: UInt64 = 1 << qn
  while q > 1:
    if qn >= 64:
      q = 1 << (qn - 2); qn = 0
    else:
      q >>= 2
    let t: UInt64 =  r + q
    r >>= 1
    if x >= t:
      x -= t; r += q
  return r

alias UnrollFunc = fn(DTypePointer[DType.uint8], Int, Int, Int) -> None

@always_inline
fn extreme[OFST: Int, BP: Int](pcmps: DTypePointer[DType.uint8], bufsz: Int, s: Int, bp: Int):
  var cp = pcmps + (s >> 3)
  let r1: Int = ((s + bp) >> 3) - (s >> 3)
  let r2: Int = ((s + 2 * bp) >> 3) - (s >> 3)
  let r3: Int = ((s + 3 * bp) >> 3) - (s >> 3)
  let r4: Int = ((s + 4 * bp) >> 3) - (s >> 3)
  let r5: Int = ((s + 5 * bp) >> 3) - (s >> 3)
  let r6: Int = ((s + 6 * bp) >> 3) - (s >> 3)
  let r7: Int = ((s + 7 * bp) >> 3) - (s >> 3)
  let plmt: DTypePointer[DType.uint8] = pcmps + bufsz - r7
  while cp < plmt:
    cp.store(cp.load() | (1 << OFST))
    (cp + r1).store((cp + r1).load() | (1 << ((OFST + BP) & 7)))
    (cp + r2).store((cp + r2).load() | (1 << ((OFST + 2 * BP) & 7)))
    (cp + r3).store((cp + r3).load() | (1 << ((OFST + 3 * BP) & 7)))
    (cp + r4).store((cp + r4).load() | (1 << ((OFST + 4 * BP) & 7)))
    (cp + r5).store((cp + r5).load() | (1 << ((OFST + 5 * BP) & 7)))
    (cp + r6).store((cp + r6).load() | (1 << ((OFST + 6 * BP) & 7)))
    (cp + r7).store((cp + r7).load() | (1 << ((OFST + 7 * BP) & 7)))
    cp += bp
  let eplmt: DTypePointer[DType.uint8] = plmt + r7
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << OFST))
  cp += r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + BP) & 7)))
  cp += r2 - r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 2 * BP) & 7)))
  cp += r3 - r2
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 3 * BP) & 7)))
  cp += r4 - r3
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 4 * BP) & 7)))
  cp += r5 - r4
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 5 * BP) & 7)))
  cp += r6 - r5
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 6 * BP) & 7)))
  cp += r7 - r6
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 7 * BP) & 7)))

fn mkExtrm[CNT: Int](pntr: Pointer[UnrollFunc]):
  @parameter
  if CNT >= 32:
    return
  alias OFST = CNT >> 2
  alias BP = ((CNT & 3) << 1) + 1
  pntr.offset(CNT).store(extreme[OFST, BP])
  mkExtrm[CNT + 1](pntr)

@always_inline
fn mkExtremeFuncs() -> Pointer[UnrollFunc]:
  let jmptbl: Pointer[UnrollFunc] = Pointer[UnrollFunc].alloc(32)
  mkExtrm[0](jmptbl)
  return jmptbl
  
let extremeFuncs = mkExtremeFuncs()

alias DenseFunc = fn(DTypePointer[DType.uint64], Int, Int) -> DTypePointer[DType.uint64]

fn mkDenseCull[N: Int, BP: Int](cp: DTypePointer[DType.uint64]):
  @parameter
  if N >= 64:
    return
  alias MUL = N * BP
  var cop = cp.offset(MUL >> 6)
  cop.store(cop.load() | (1 << (MUL & 63)))
  mkDenseCull[N + 1, BP](cp)

@always_inline
fn denseCullFunc[BP: Int](pcmps: DTypePointer[DType.uint64], bufsz: Int, s: Int) -> DTypePointer[DType.uint64]:
  var cp: DTypePointer[DType.uint64] = pcmps + (s >> 6)
  let plmt = pcmps + (bufsz >> 3) - BP
  while cp < plmt:
    mkDenseCull[0, BP](cp)
    cp += BP
  return cp

fn mkDenseFunc[CNT: Int](pntr: Pointer[DenseFunc]):
  @parameter
  if CNT >= 64:
    return
  alias BP = (CNT << 1) + 3
  pntr.offset(CNT).store(denseCullFunc[BP])
  mkDenseFunc[CNT + 1](pntr)

@always_inline
fn mkDenseFuncs() -> Pointer[DenseFunc]:
  let jmptbl : Pointer[DenseFunc] = Pointer[DenseFunc].alloc(64)
  mkDenseFunc[0](jmptbl)
  return jmptbl

let denseFuncs : Pointer[DenseFunc] = mkDenseFuncs()

@always_inline
fn cullPass(cmpsts: DTypePointer[DType.uint8], bytesz: Int, s: Int, bp: Int):
    if bp <= 129: # dense culling
        var sm = s
        while (sm >> 3) < bytesz and (sm & 63) != 0:
            cmpsts[sm >> 3] |= (1 << (sm & 7))
            sm += bp
        let bcp = denseFuncs[(bp - 3) >> 1](cmpsts.bitcast[DType.uint64](), bytesz, sm)
        var ns = 0
        var ncp = bcp
        let cmpstslmtp = (cmpsts + bytesz).bitcast[DType.uint64]()
        while ncp < cmpstslmtp:
            ncp[0] |= (1 << (ns & 63))
            ns += bp
            ncp = bcp + (ns >> 6)
    else: # extreme loop unrolling culling
        extremeFuncs[((s & 7) << 2) + ((bp & 7) >> 1)](cmpsts, bytesz, s, bp)
#    for c in range(s, self.len, bp): # slow bit twiddling way
#        self.cmpsts[c >> 3] |= (1 << (c & 7))

fn cullPage(lwi: Int, lmt: Int, cmpsts: DTypePointer[DType.uint8], bsprmrps: DTypePointer[DType.uint8]):
    var bp = 1; var ndx = 0
    while True:
        bp += bsprmrps[ndx].to_int() << 1
        let i = (bp - 3) >> 1
        var s = (i + i) * (i + 3) + 3
        if s >= lmt: break
        if s >= lwi: s -= lwi
        else:
            s = (lwi - s) % bp
            if s != 0: s = bp - s
        cullPass(cmpsts, cBufferSize, s, bp)
        ndx += 1

fn countPagePrimes(ptr: DTypePointer[DType.uint8], bitsz: Int) -> Int:
    let wordsz: Int = (bitsz + 63) // 64  # round up to nearest 64 bit boundary 
    var rslt: Int = wordsz * 64
    let bigcmps = ptr.bitcast[DType.uint64]()        
    for i in range(wordsz - 1):
       rslt -= ctpop(bigcmps[i]).to_int()
    rslt -= ctpop(bigcmps[wordsz - 1] | (-2 << ((bitsz - 1) & 63))).to_int()
    return rslt

struct SoEOdds(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = 0 if limit < 2 else (limit - 3) // 2 + 1
        self.sz = 0 if limit < 2 else self.len + 1 # for the unprocessed only even prime of two
        self.ndx = -1
        let bytesz = 0 if limit < 2 else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memset_zero(self.cmpsts, bytesz)
        for i in range(self.len):
            let s = (i + i) * (i + 3) + 3
            if s >= self.len: break
            if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
            let bp = i + i + 3
            cullPass(self.cmpsts, bytesz, s, bp)
        self.sz = countPagePrimes(self.cmpsts, self.len) + 1 # add one for only even prime of two
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        let bytesz = (self.len + 7) // 8
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int:
        if self.ndx < 0:
            self.ndx = 0; self.sz -= 1; return 2
        while (self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
            self.ndx += 1
        let rslt = (self.ndx << 1) + 3; self.sz -= 1; self.ndx += 1; return rslt

struct SoEOddsPaged:
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int # 0 means finished; otherwise contains number of odd base primes
    var ndx: Int
    var lwi: Int
    var bsprmrps: DTypePointer[DType.uint8] # contains deltas between odd base primes starting from zero
    fn __init__(inout self, limit: UInt64):
        self.len = 0 if limit < 2 else ((limit - 3) // 2 + 1).to_int()
        self.sz = 0 if limit < 2 else 1 # means iterate until this is set to zero
        self.ndx = -1 # for unprocessed only even prime of two
        self.lwi = 0
        if self.len < cBufferBits:
            let bytesz = ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
            self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
            self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
        else:
            self.cmpsts = DTypePointer[DType.uint8].alloc(cBufferSize)
            let bsprmitr = SoEOdds(intsqrt(limit).to_int())
            self.sz = len(bsprmitr)
            self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
            var ndx = -1; var oldbp = 1
            for bsprm in bsprmitr:
                if ndx < 0: ndx += 1; continue # skip over the 2 prime
                self.bsprmrps[ndx] = (bsprm - oldbp) >> 1
                oldbp = bsprm; ndx += 1
            self.bsprmrps[ndx] = 255 # one extra value to go beyond the necessary cull space
    fn __del__(owned self):
        self.cmpsts.free(); self.bsprmrps.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        self.sz = existing.sz
        let bytesz = cBufferSize if self.len >= cBufferBits
                     else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.ndx = existing.ndx
        self.lwi = existing.lwi
        self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
        memcpy(self.bsprmrps, existing.bsprmrps, self.sz)
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
        self.lwi = existing.lwi
        self.bsprmrps = existing.bsprmrps
    fn countPrimes(self) -> Int:
        if self.len <= cBufferBits: return len(SoEOdds(2 * self.len + 1))
        var cnt = 1; var lwi = 0
        let cmpsts = DTypePointer[DType.uint8].alloc(cBufferSize)
        memset_zero(cmpsts, cBufferSize)
        cullPage(0, cBufferBits, cmpsts, self.bsprmrps)
        while lwi + cBufferBits <= self.len:
            cnt += countPagePrimes(cmpsts, cBufferBits)
            lwi += cBufferBits
            memset_zero(cmpsts, cBufferSize)
            let lmt = lwi + cBufferBits if lwi + cBufferBits <= self.len else self.len
            cullPage(lwi, lmt, cmpsts, self.bsprmrps)
        cnt += countPagePrimes(cmpsts, self.len - lwi)
        return cnt
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int: # don't count number of primes by iterating - slooow
        if self.ndx < 0:
            self.ndx = 0; self.lwi = 0
            if self.len < 2: self.sz = 0
            elif self.len <= cBufferBits:
                let bytesz = ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
                memset_zero(self.cmpsts, bytesz)
                for i in range(self.len):
                    let s = (i + i) * (i + 3) + 3
                    if s >= self.len: break
                    if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
                    let bp = i + i + 3
                    cullPass(self.cmpsts, bytesz, s, bp)
            else:
                memset_zero(self.cmpsts, cBufferSize)
                cullPage(0, cBufferBits, self.cmpsts, self.bsprmrps)
            return 2
        let rslt = ((self.lwi + self.ndx) << 1) + 3; self.ndx += 1
        if self.lwi + cBufferBits >= self.len:
            while (self.lwi + self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
                self.ndx += 1
        else:
            while (self.ndx < cBufferBits) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
                self.ndx += 1
            while (self.ndx >= cBufferBits) and (self.lwi + cBufferBits <= self.len):
                self.ndx = 0; self.lwi += cBufferBits; memset_zero(self.cmpsts, cBufferSize)
                let lmt = self.lwi + cBufferBits if self.lwi + cBufferBits <= self.len else self.len
                cullPage(self.lwi, lmt, self.cmpsts, self.bsprmrps)
                let buflmt = cBufferBits if self.lwi + cBufferBits <= self.len else self.len - self.lwi
                while (self.ndx < buflmt) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
                    self.ndx += 1
        if self.lwi + self.ndx >= self.len: self.sz = 0
        return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEOddsPaged(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = SoEOddsPaged(1_000_000).countPrimes()
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = SoEOddsPaged(cLIMIT).countPrimes()
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")
Output:
The primes to 100 are:
2  3  5  7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97  
Found 78498 primes up to 1,000,000 in 0.084122000000000002 milliseconds.
Found 50847534 primes up to 1000000000 in 139.509275 milliseconds.

This was tested on the same computer as the previous Mojo versions. Note that the time now scales quite well with range, since the huge RAM access time bottlenecks are gone. This version is only about 2.25 times slower than Kim Walisch's primesieve program written in C++, and the mostly constant factor difference will be made up if one adds wheel factorization to the same level as he uses (a basic wheel factorization ratio of 48/105 plus some other more minor optimizations). This version can count the number of primes to 1e11 in about 21.85 seconds on this machine. It will work reasonably efficiently up to a range of about 1e14, beyond which other optimization techniques such as "bucket sieving" should be used.

For counting the number of primes to a billion (1e9), this version has reduced the time by about a factor of 40 from the original version and over eight times from the odds-only version above. Adding wheel factorization will make it almost two and a half times faster yet for a gain in speed of about a hundred times over the original version.
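
The 48/105 wheel ratio quoted above can be checked with a short Python calculation. It assumes the wheel is built on the primes 2, 3, 5 and 7, which is consistent with the figures given; the function name `wheel_ratio` is made up for this sketch.

```python
from math import gcd, prod


def wheel_ratio(wheel_primes=(2, 3, 5, 7)):
    """Return (kept, odds): residues that survive the wheel versus odd positions per turn."""
    circumference = prod(wheel_primes)       # 210 for the primes 2, 3, 5, 7
    odds = circumference // 2                # positions an odds-only sieve already visits
    kept = sum(1 for r in range(1, circumference, 2) if gcd(r, circumference) == 1)
    return kept, odds


if __name__ == "__main__":
    kept, odds = wheel_ratio()
    print(f"{kept}/{odds} = {kept / odds:.3f} of the odds-only culling positions remain")
```

The ratio only describes how many candidate positions remain; the actual speed-up also depends on the other optimizations mentioned above.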

MUMPS

ERATO1(HI)
 ;performs the Sieve of Eratosthenes up to the number passed in.
 ;This version sets an array containing the primes
 SET HI=HI\1
 KILL ERATO1 ;Don't make it new - we want it to remain after we quit the function
 NEW I,J,P
 FOR I=2:1:(HI**.5)\1 FOR J=I*I:I:HI SET P(J)=1
 FOR I=2:1:HI S:'$DATA(P(I)) ERATO1(I)=I
 KILL I,J,P
 QUIT

Example:

USER>SET MAX=100,C=0 DO ERATO1^ROSETTA(MAX) 
USER>WRITE !,"PRIMES BETWEEN 1 AND ",MAX,! FOR  SET I=$ORDER(ERATO1(I)) Q:+I<1  WRITE I,", "

PRIMES BETWEEN 1 AND 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,79, 83, 89, 97,

Neko

/* The Computer Language Shootout
   http://shootout.alioth.debian.org/

   contributed by Nicolas Cannasse
*/
fmt = function(i) {
        var s = $string(i);
        while( $ssize(s) < 8 )
                s = " "+s;
        return s;
}
nsieve = function(m) {
        var a = $amake(m);
        var count = 0;
        var i = 2;
        while( i < m ) {
                if $not(a[i]) {
                        count += 1;
                        var j = (i << 1);
                        while( j < m ) {
                                if( $not(a[j]) ) a[j] = true;
                                j += i;
                        }
                }
                i += 1;
        }
        $print("Primes up to ",fmt(m)," ",fmt(count),"\n");
}

var n = $int($loader.args[0]);
if( n == null ) n = 2;
var i = 0;
while( i < 3 ) {
        nsieve(10000 << (n - i));
        i += 1;
}
Output:
prompt$ nekoc nsieve.neko
prompt$ time -p neko nsieve.n
Primes up to    40000     4203
Primes up to    20000     2262
Primes up to    10000     1229
real 0.02
user 0.01
sys 0.00


NetRexx

Version 1 (slow)

/* NetRexx */

options replace format comments java crossref savelog symbols binary

parse arg loWatermark hiWatermark .
if loWatermark = '' | loWatermark = '.' then loWatermark = 1
if hiWatermark = '' | hiWatermark = '.' then hiWatermark = 200

do
  if \loWatermark.datatype('w') | \hiWatermark.datatype('w') then -
    signal NumberFormatException('arguments must be whole numbers')
  if loWatermark > hiWatermark then -
    signal IllegalArgumentException('the start value must be less than the end value')

  seive = sieveOfEratosthenes(hiWatermark)
  primes = getPrimes(seive, loWatermark, hiWatermark).strip

  say 'List of prime numbers from' loWatermark 'to' hiWatermark 'via a "Sieve of Eratosthenes" algorithm:'
  say '  'primes.changestr(' ', ',')
  say '  Count of primes:' primes.words
catch ex = Exception
  ex.printStackTrace
end

return

method sieveOfEratosthenes(hn = long) public static binary returns Rexx

  sv = Rexx(isTrue)
  sv[1] = isFalse
  ix = long
  jx = long

  loop ix = 2 while ix * ix <= hn
    if sv[ix] then loop jx = ix * ix by ix while jx <= hn
      sv[jx] = isFalse
      end jx
    end ix

  return sv

method getPrimes(seive = Rexx, lo = long, hi = long) private constant binary returns Rexx

  primes = Rexx('')
  loop p_ = lo to hi
    if \seive[p_] then iterate p_
    primes = primes p_
    end p_

  return primes

method isTrue public constant binary returns boolean
  return 1 == 1

method isFalse public constant binary returns boolean
  return \isTrue
Output:
List of prime numbers from 1 to 200 via a "Sieve of Eratosthenes" algorithm:
  2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199
  Count of primes: 46

Version 2 (significantly faster, i.e. about 10 times)

/* NetRexx ************************************************************
* Essential improvements:Use boolean instead of Rexx for sv
*                        and remove methods isTrue and isFalse
* 24.07.2012 Walter Pachl courtesy Kermit Kiser
**********************************************************************/

options replace format comments java crossref savelog symbols binary

parse arg loWatermark hiWatermark .
if loWatermark = '' | loWatermark = '.' then loWatermark = 1
if hiWatermark = '' | hiWatermark = '.' then hiWatermark = 200000

startdate=Date Date()
do
  if \loWatermark.datatype('w') | \hiWatermark.datatype('w') then -
    signal NumberFormatException('arguments must be whole numbers')
  if loWatermark > hiWatermark then -
    signal IllegalArgumentException(-
                 'the start value must be less than the end value')
  sieve = sieveOfEratosthenes(hiWatermark)
  primes = getPrimes(sieve, loWatermark, hiWatermark).strip
  if hiWatermark = 200 Then do
    say 'List of prime numbers from' loWatermark 'to' hiWatermark
    say '  'primes.changestr(' ', ',')
  end
catch ex = Exception
  ex.printStackTrace
end
enddate=Date Date()
Numeric Digits 20
say (enddate.getTime-startdate.getTime)/1000 'seconds elapsed'
say '  Count of primes:' primes.words

return

method sieveOfEratosthenes(hn = int) -
                                  public static binary returns boolean[]
  true  = boolean 1
  false = boolean 0
  sv = boolean[hn+1]
  sv[1] = false

  ix = int
  jx = int

  loop ix=2 to hn
    sv[ix]=true
    end ix

  loop ix = 2 while ix * ix <= hn
    if sv[ix] then loop jx = ix * ix by ix while jx <= hn
      sv[jx] = false
      end jx
    end ix

  return sv

method getPrimes(sieve = boolean[], lo = int, hi = int) -
                                    private constant binary Returns Rexx
  p_ = int
  primes = Rexx('')
  loop p_ = lo to hi
    if \sieve[p_] then iterate p_
    primes = primes p_
    end p_

  return primes

newLISP

This example is incorrect. Please fix the code and remove this message.

Details: This version uses rem (division) testing and so is a trial division algorithm, not a sieve of Eratosthenes.

This version is maybe a little different because it no longer stores the primes after they've been generated and sent to the main output. Lisp has very convenient list editing, so we don't really need the Boolean flag arrays you'd tend to find in the Algol-like languages. We can just throw away the multiples of each prime from an initial list of integers. The implementation is easier if we delete every multiple, including the prime number itself: the list always contains only the numbers that haven't been processed yet, starting with the next prime, and the program is finished when the list becomes empty.

Note that the lambda expression in the following script does not involve a closure; newLISP has dynamic scope, so it matters that the same variable names will not be reused for some other purpose (at runtime) before the anonymous function is called.

(set 'upper-bound 1000)

; The initial sieve is a list of all the numbers starting at 2.
(set 'sieve (sequence 2 upper-bound))

; Keep working until the list is empty.
(while sieve

	; The first number in the list is always prime
	(set 'new-prime (sieve 0))
	(println new-prime)

	; Filter the list leaving only the non-multiples of each number.
	(set 'sieve
		(filter
			(lambda (each-number)
				(not (zero? (% each-number new-prime))))
			sieve)))

(exit)
Output:
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97
101
103
107
109
113
127
131
137
139
149
151
157
163
167
173
179
181
191
193
197
199
211
223
227
229
233
239
241
251
257
263
269
271
277
281
283
293
307
311
313
317
331
337
347
349
353
359
367
373
379
383
389
397
401
409
419
421
431
433
439
443
449
457
461
463
467
479
487
491
499
503
509
521
523
541
547
557
563
569
571
577
587
593
599
601
607
613
617
619
631
641
643
647
653
659
661
673
677
683
691
701
709
719
727
733
739
743
751
757
761
769
773
787
797
809
811
821
823
827
829
839
853
857
859
863
877
881
883
887
907
911
919
929
937
941
947
953
967
971
977
983
991
997
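
The notice at the top of this entry says the list-filtering approach is trial division rather than a sieve. The difference can be made concrete with a Python sketch that puts a filter-based version, modelled loosely on the newLISP script above, next to a version that marks multiples by stepping through an array with no remainder tests. Both function names are invented for this comparison.

```python
def primes_by_filtering(upper: int) -> list[int]:
    """Repeatedly filter the remaining list by remainder: this is trial division."""
    remaining = list(range(2, upper + 1))
    primes = []
    while remaining:
        p = remaining[0]                      # first survivor is always prime
        primes.append(p)
        remaining = [n for n in remaining if n % p != 0]
    return primes


def primes_by_sieving(upper: int) -> list[int]:
    """Sieve of Eratosthenes: cross off multiples by addition, no division at all."""
    composite = bytearray(upper + 1)
    primes = []
    for n in range(2, upper + 1):
        if composite[n]:
            continue
        primes.append(n)
        for m in range(n * n, upper + 1, n):  # step by n starting at n squared
            composite[m] = 1
    return primes


if __name__ == "__main__":
    print(primes_by_filtering(1000) == primes_by_sieving(1000))
```

Both produce the same list, but the filtering version tests every surviving number against every prime found so far, whereas the sieve only touches the actual multiples of each prime.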

Nial

This example is incorrect. Please fix the code and remove this message.

Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

primes is sublist [ each (2 = sum eachright (0 = mod) [pass,count]), pass ] rest count

Using it

|primes 10
=2 3 5 7

Nim

from math import sqrt
 
iterator primesUpto(limit: int): int =
  let sqrtLimit = int(sqrt(float64(limit)))
  var composites = newSeq[bool](limit + 1)
  for n in 2 .. sqrtLimit: # cull to square root of limit
    if not composites[n]: # if prime -> cull its composites
      for c in countup(n * n, limit, n): # start at ``n`` squared
        composites[c] = true
  for n in 2 .. limit: # separate iteration over results
    if not composites[n]:
      yield n
 
stdout.write "The primes up to 100 are:  "
for x in primesUpto(100):
   stdout.write(x, " ")
echo()
 
var count = 0
for p in primesUpto(1000000):
  count += 1
echo "There are ", count, " primes up to 1000000."
Output:
The primes up to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes up to 1000000.

Alternate odds-only bit-packed version

The above version wastes quite a lot of memory by using a sequence of boolean values to sieve the composite numbers and by sieving all numbers even though two is the only even prime. The code below uses a bit-packed sequence to save a factor of eight in memory and sieves only odd numbers for a further memory saving by a factor of two. It is also over two and a half times faster, due to the reduced number of culling operations and better use of the CPU cache, since a little cache goes a lot further; this better use of cache more than makes up for the extra bit-packing shift operations:

iterator isoe_upto(top: uint): uint =
  let topndx = int((top - 3) div 2)
  let sqrtndx = (int(sqrt float64(top)) - 3) div 2
  var cmpsts = newSeq[uint32](topndx div 32 + 1)
  for i in 0 .. sqrtndx: # cull composites for primes
    if (cmpsts[i shr 5] and (1u32 shl (i and 31))) == 0:
      let p = i + i + 3
      for j in countup((p * p - 3) div 2, topndx, p):
        cmpsts[j shr 5] = cmpsts[j shr 5] or (1u32 shl (j and 31))
  yield 2 # separate culling above and iteration here
  for i in 0 .. topndx:
    if (cmpsts[i shr 5] and (1u32 shl (i and 31))) == 0:
      yield uint(i + i + 3)

The above code can be used with the same output code as in the first example, just replacing the name of the iterator "primesUpto" with this iterator's name "isoe_upto" in two places. The output will be identical.
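
The index arithmetic of the odds-only bit-packed layout (index i stands for the odd number 2i + 3, and the square of p sits at index (p*p - 3) / 2) may be easier to follow in a Python sketch. It packs eight flags per byte in a `bytearray` rather than 32 per word as the Nim code does; the function name `soe_odds_bitpacked` is an assumption of this example.

```python
from math import isqrt


def soe_odds_bitpacked(top: int) -> list[int]:
    """Primes <= top from an odds-only sieve with one bit per odd candidate."""
    if top < 2:
        return []
    if top < 3:
        return [2]
    topndx = (top - 3) // 2                      # index of the largest odd number <= top
    sqrtndx = (isqrt(top) - 3) // 2              # negative when top < 9: nothing to cull
    cmpsts = bytearray(topndx // 8 + 1)          # bit set means composite
    for i in range(sqrtndx + 1):
        if not cmpsts[i >> 3] & (1 << (i & 7)):
            p = 2 * i + 3
            for j in range((p * p - 3) // 2, topndx + 1, p):
                cmpsts[j >> 3] |= 1 << (j & 7)   # mark the odd multiple of p
    primes = [2]
    primes.extend(2 * i + 3 for i in range(topndx + 1)
                  if not cmpsts[i >> 3] & (1 << (i & 7)))
    return primes


if __name__ == "__main__":
    print(soe_odds_bitpacked(100))
```

Stepping the index by p corresponds to stepping the value by 2p, which is how the even multiples are skipped without ever being stored.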

Nim Unbounded Versions

For many purposes, one doesn't know the exact upper limit desired to easily use the above versions; in addition, those versions use an amount of memory proportional to the range sieved. In contrast, unbounded versions continuously update their range as they progress and only use memory proportional to the secondary base primes stream, which is only proportional to the square root of the range. One of the most basic functional versions is the TreeFolding sieve which is based on merging lazy streams as per Richard Bird's contribution to incremental sieves in Haskell, but which has a much better asymptotic execution complexity due to the added tree folding. The following code is a version of that in Nim (odds-only):

import sugar
from times import epochTime

type PrimeType = int
iterator primesTreeFolding(): PrimeType {.closure.} =
  # needs a Co Inductive Stream - CIS...
  type
    CIS[T] = ref object
      head: T
      tail: () -> CIS[T]

  proc merge(xs, ys: CIS[PrimeType]): CIS[PrimeType] =
    let x = xs.head;
    let y = ys.head
    if x < y:
      CIS[PrimeType](head: x, tail: () => merge(xs.tail(), ys))
    elif y < x:
      CIS[PrimeType](
        head: y,
        tail: () => merge(xs, ys.tail()))
    else:
      CIS[PrimeType](
        head: x,
        tail: () => merge(xs.tail(), ys.tail()))

  proc pmults(p: PrimeType): CIS[PrimeType] =
    let inc = p + p
    proc mlts(c: PrimeType): CIS[PrimeType] =
      CIS[PrimeType](head: c, tail: () => mlts(c + inc))
    mlts(p * p)

  proc allmults(ps: CIS[PrimeType]): CIS[CIS[PrimeType]] =
    CIS[CIS[PrimeType]](
      head: pmults(ps.head),
      tail: () => allmults(ps.tail()))

  proc pairs(css: CIS[CIS[PrimeType]]): CIS[CIS[PrimeType]] =
    let cs0 = css.head;
    let rest0 = css.tail()
    CIS[CIS[PrimeType]](
      head: merge(cs0, rest0.head),
      tail: () => pairs(rest0.tail()))

  proc cmpsts(css: CIS[CIS[PrimeType]]): CIS[PrimeType] =
    let cs0 = css.head
    CIS[PrimeType](
      head: cs0.head,
      tail: () => merge(cs0.tail(), css.tail().pairs.cmpsts))

  proc minusAt(n: PrimeType, cs: CIS[PrimeType]): CIS[PrimeType] =
    var nn = n;
    var ncs = cs
    while nn >= ncs.head:
      nn += 2;
      ncs = ncs.tail()
    CIS[PrimeType](head: nn, tail: () => minusAt(nn + 2, ncs))

  proc oddprms(): CIS[PrimeType] =
    CIS[PrimeType](
      head: 3.PrimeType,
      tail: () => minusAt(5.PrimeType, oddprms().allmults.cmpsts))

  var prms = CIS[PrimeType](head: 2.PrimeType, tail: () => oddprms())
  while true:
    yield prms.head;
    prms = prms.tail()

stdout.write "The first 25 primes are:  "
var counter = 0
for p in primesTreeFolding():
  if counter >= 25: break
  stdout.write(p, " "); counter += 1
echo()

let start = epochTime()
counter = 0
for p in primesTreeFolding():
  if p > 1000000: break
  else: counter += 1
let elapsed = epochTime() - start
echo "There are ", counter, " primes up to 1000000."
echo "This test took ", elapsed, " seconds."
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes up to 1000000.
This test took 0.2780287265777588 seconds.

With Nim 1.4, it takes about 0.4s (in release or danger mode) to compute the primes up to one million, better than the time needed in previous versions. With the option "--gc arc", this time drops to 0.28s on a small laptop. This is still slow compared to the bounded algorithms, which is due to the many small memory allocations/de-allocations required, a characteristic of functional forms of code. It is purely functional in that everything is immutable, with one exception: Nim does not have Tail Call Optimization (TCO), so function recursion cannot be used freely at no execution time cost; therefore, where necessary, recursion is replaced by imperative loops, which is what TCO generally turns such recursion into "under the covers". It is also slow because the algorithm is only O(n (log n) (log (log n))), with an extra "log n" factor that some versions do not have. This slowness makes it only moderately useful for ranges up to a few million.

Since the algorithm does not require the memoization of a full lazy list, it uses an internal Co-Inductive Stream (the CIS type) of deferred execution states, and finally outputs an iterator that enumerates the lazily computed stream of primes.
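
The idea of a stream of deferred execution states can be sketched in Python. This is only an illustration under stated assumptions: the names Stream, multiples, union, all_composites, minus and odd_primes are hypothetical, and the composite streams are merged with a simple linear fold rather than the tree folding used by the Nim code, so it is slower but shorter.

```python
from itertools import islice

class Stream:
    """A co-inductive stream: a head value plus a thunk that builds the rest."""
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail  # zero-argument callable returning the next Stream

def multiples(p):
    """Odd multiples of p in ascending order, starting at p * p."""
    def start_at(c):
        return Stream(c, lambda: start_at(c + 2 * p))
    return start_at(p * p)

def union(a, b):
    """Merge two ascending streams, emitting shared values only once."""
    if a.head < b.head:
        return Stream(a.head, lambda: union(a.tail(), b))
    if b.head < a.head:
        return Stream(b.head, lambda: union(a, b.tail()))
    return Stream(a.head, lambda: union(a.tail(), b.tail()))

def all_composites(ps):
    """Linear fold (an assumption of this sketch, not tree folding):
    p * p is emitted before the streams of later primes are forced."""
    m = multiples(ps.head)
    return Stream(m.head, lambda: union(m.tail(), all_composites(ps.tail())))

def minus(n, cs):
    """Odd numbers from n upward that are absent from the composite stream cs."""
    while n >= cs.head:
        n += 2
        cs = cs.tail()
    return Stream(n, lambda: minus(n + 2, cs))

def odd_primes():
    # The recursive call is hidden inside a thunk, so nothing loops forever.
    return Stream(3, lambda: minus(5, all_composites(odd_primes())))

def primes():
    """Iterator over the lazily computed stream of primes."""
    yield 2
    s = odd_primes()
    while True:
        yield s.head
        s = s.tail()

print(list(islice(primes(), 25)))
```

No tail is memoized here: each stream state is forced once as the iterator advances and is then dropped, which is why a full lazy list is not needed.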

A faster alternative using a mutable hash table (odds-only)

To show the cost of functional forms of code, the following version embraces mutability, both by using a mutable hash table to store the state of incremental culling by the secondary stream of base primes and by using mutable values to store state wherever possible:

import tables, times

type PrimeType = int
proc primesHashTable(): iterator(): PrimeType {.closure.} =
  iterator output(): PrimeType {.closure.} =
    # some initial values to avoid race and reduce initializations...
    yield 2.PrimeType; yield 3.PrimeType; yield 5.PrimeType; yield 7.PrimeType
    var h = initTable[PrimeType,PrimeType]()
    var n = 9.PrimeType
    let bps = primesHashTable()
    var bp = bps()  # advance past 2
    bp = bps()
    var q = bp * bp # to initialize with 3
    while true:
      if n >= q:
        let incr = bp + bp
        h[n + incr] = incr
        bp = bps()
        q = bp * bp
      elif h.hasKey(n):
        var incr: PrimeType
        discard h.take(n, incr)
        var nxt = n + incr
        while h.hasKey(nxt):
          nxt += incr # ensure no duplicates
        h[nxt] = incr
      else:
        yield n
      n += 2.PrimeType
  output

stdout.write "The first 25 primes are:  "
var counter = 0
var iter = primesHashTable()
for p in iter():
  if counter >= 25:
    break
  else:
    stdout.write(p, " ")
    counter += 1
echo ""
let start = epochTime()
counter = 0
iter = primesHashTable()
for p in iter():
  if p > 1000000: break
  else: counter += 1
let elapsed = epochTime() - start
echo "The number of primes up to a million is:  ", counter
stdout.write("This test took ", elapsed, " seconds.\n")
Output:

Time for version compiled with “-d:danger” option.

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
The number of primes up to a million is:  78498
This test took 0.05106830596923828 seconds.

The output is identical to that of the first unbounded version but, in danger mode, this version is about eight times faster when sieving to a million. For larger ranges it will continue to pull further ahead of the above version, because its performance is only O(n (log (log n))) thanks to the hash table having an average of O(1) access; it is only as slow as it is because of the large constant overhead of the hashing calculations and look-ups.
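
For readers who do not know Nim, the same incremental hash-table technique can be sketched in Python. This is a hedged illustration, not a translation: the name primes_hash_table and the variable names are hypothetical, and this sketch checks for duplicate keys in both branches that insert into the table.

```python
from itertools import islice, takewhile

def primes_hash_table():
    """Unbounded odds-only sieve; a dict maps each pending composite to its step."""
    yield from (2, 3, 5, 7)             # seed values so the inner generator can start
    composites = {}                     # next odd composite -> 2 * (its base prime)
    base_primes = primes_hash_table()   # secondary stream of base primes
    next(base_primes)                   # skip 2
    bp = next(base_primes)              # first odd base prime, 3
    q = bp * bp                         # square of the current base prime
    n = 9
    while True:
        if n >= q:                      # n is bp squared: start culling with bp
            step = 2 * bp
            nxt = n + step
            bp = next(base_primes)
            q = bp * bp
        elif n in composites:           # n is a known composite: advance its entry
            step = composites.pop(n)
            nxt = n + step
        else:                           # n was never struck, so it is prime
            yield n
            n += 2
            continue
        while nxt in composites:        # keep one key per base prime
            nxt += step
        composites[nxt] = step
        n += 2

print(list(islice(primes_hash_table(), 25)))
print(sum(1 for _ in takewhile(lambda p: p <= 100_000, primes_hash_table())))
```

The table only ever holds one entry per base prime up to the square root of the current candidate, which is what keeps memory use small for an unbounded sieve.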

Very fast Page Segmented version using a bit-packed mutable array (odds-only)

Note: This version is used as a very fast alternative in Extensible_prime_generator#Nim

For the highest speeds, one needs to use page segmented mutable arrays as in the bit-packed version here:

# a Page Segmented Odd-Only Bit-Packed Sieve of Eratosthenes...

from times import epochTime # for testing
from bitops import popCount

type Prime = uint64

let LIMIT = 1_000_000_000.Prime
let CPUL1CACHE = 16384 # in bytes

const FRSTSVPRM = 3.Prime

type
  BasePrime = uint32
  BasePrimeArray = seq[BasePrime]
  SieveBuffer = seq[byte] # byte size gives the most potential efficiency...

# define a general purpose lazy list to use as secondary base prime arrays feed
# NOT thread safe; needs a Mutex gate to make it so, but not threaded (yet)...
type
  BasePrimeArrayLazyList = ref object
    head: BasePrimeArray
    tailf: proc (): BasePrimeArrayLazyList {.closure.}
    tail: BasePrimeArrayLazyList
template makeBasePrimeArrayLazyList(hd: BasePrimeArray;
                      body: untyped): untyped = # factory constructor
  let thnk = proc (): BasePrimeArrayLazyList {.closure.} = body
  BasePrimeArrayLazyList(head: hd, tailf: thnk)
proc rest(lzylst: sink BasePrimeArrayLazyList): BasePrimeArrayLazyList {.inline.} =
  if lzylst.tailf != nil: lzylst.tail = lzylst.tailf(); lzylst.tailf = nil
  return lzylst.tail
iterator items(lzylst: BasePrimeArrayLazyList): BasePrime {.inline.} =
  var ll = lzylst
  while ll != nil:
    for bp in ll.head: yield bp
    ll = ll.rest

# count the number of zero bits (primes) in a SieveBuffer,
# uses native popCount for extreme speed;
# counts up to the bit index of the last bit to be counted...
proc countSieveBuffer(lsti: int; cmpsts: SieveBuffer): int =
  let lstw = (lsti shr 3) and -8; let lstm = lsti and 63 # last word and bit index!
  result = (lstw shl 3) + 64 # preset for all ones!
  let cmpstsa = cast[int](cmpsts[0].unsafeAddr)
  let cmpstslsta = cmpstsa + lstw
  for csa in countup(cmpstsa, cmpstslsta - 1, 8):
    result -= cast[ptr uint64](csa)[].popCount # subtract number of found ones!
  let msk = (0'u64 - 2'u64) shl lstm # mask for the unused bits in last word!
  result -= (cast[ptr uint64](cmpstslsta)[] or msk).popCount
 
# a fast fill SieveBuffer routine using pointers...
proc fillSieveBuffer(sb: var SieveBuffer) = zeroMem(sb[0].unsafeAddr, sb.len)

const BITMASK = [1'u8, 2, 4, 8, 16, 32, 64, 128] # faster than shifting!

# do sieving work, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
proc cullSieveBuffer(lwi: int; bpas: BasePrimeArrayLazyList;
                               sb: var SieveBuffer) =
  let len = sb.len; let szbits = len shl 3; let nxti = lwi + szbits
  for bp in bpas:
    let bpwi = ((bp.Prime - FRSTSVPRM) shr 1).int
    var s = (bpwi shl 1) * (bpwi + FRSTSVPRM.int) + FRSTSVPRM.int
    if s >= nxti: break
    if s >= lwi: s -= lwi
    else:
      let r = (lwi - s) mod bp.int
      s = (if r == 0: 0 else: bp.int - r)
    let clmt = szbits - (bp.int shl 3)
#    if len == CPUL1CACHE: continue
    if s < clmt:
      let slmt = s + (bp.int shl 3)
      while s < slmt:
        let msk = BITMASK[s and 7]
        for c in countup(s shr 3, len - 1, bp.int):
          sb[c] = sb[c] or msk
        s += bp.int
      continue
    while s < szbits:
      let w = s shr 3; sb[w] = sb[w] or BITMASK[s and 7]; s += bp.int # (1'u8 shl (s and 7))

proc makeBasePrimeArrays(): BasePrimeArrayLazyList # forward reference!

# an iterator over successive sieved buffer composite arrays,
# returning whatever type the cnvrtr produces from
# the low index and the culled SieveBuffer...
proc makePrimePages[T](
    strtwi, sz: int; cnvrtrf: proc (li: int; sb: var SieveBuffer): T {.closure.}
      ): (iterator(): T {.closure.}) =
  var lwi = strtwi; let bpas = makeBasePrimeArrays(); var cmpsts = newSeq[byte](sz)
  return iterator(): T {.closure.} =
    while true:
      fillSieveBuffer(cmpsts); cullSieveBuffer(lwi, bpas, cmpsts)
      yield cnvrtrf(lwi, cmpsts); lwi += cmpsts.len shl 3
 
# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8193,
# the seeded primes easily cover it as 97 squared is 9409.
proc makeBasePrimeArrays(): BasePrimeArrayLazyList =
  # converts an entire sieved array of bytes into an array of base primes,
  # to be used as a source of base primes as part of the Lazy List...
  proc sb2bpa(li: int; sb: var SieveBuffer): BasePrimeArray =
    let szbits = sb.len shl 3; let len = countSieveBuffer(szbits - 1, sb)
    result = newSeq[BasePrime](len); var j = 0
    for i in 0 ..< szbits:
      if (sb[i shr 3] and BITMASK[i and 7]) == 0'u8:
        result[j] = FRSTSVPRM.BasePrime + ((li + i) shl 1).BasePrime; j.inc
  proc nxtbparr(
      pgen: iterator (): BasePrimeArray {.closure.}): BasePrimeArrayLazyList =
    return makeBasePrimeArrayLazyList(pgen()): nxtbparr(pgen)
  # pre-seeding first array breaks recursive race,
  # dummy primes of all odd numbers starting at FRSTSVPRM (unculled)...
  var cmpsts = newSeq[byte](512)
  let dummybparr = sb2bpa(0, cmpsts)
  let fakebps = makeBasePrimeArrayLazyList(dummybparr): nil # used just once here!
  cullSieveBuffer(0, fakebps, cmpsts)
  return makeBasePrimeArrayLazyList(sb2bpa(0, cmpsts)):
    nxtbparr(makePrimePages(4096, 512, sb2bpa)) # lazy recursive call breaks race!
 
# iterator over primes from above page iterator;
# takes at least as long to enumerate the primes as sieve them...
iterator primesPaged(): Prime {.inline.} =
  yield 2
  proc mkprmarr(li: int; sb: var SieveBuffer): seq[Prime] =
    let szbits = sb.len shl 3; let low = FRSTSVPRM + (li + li).Prime; var j = 0
    let len = countSieveBuffer(szbits - 1, sb); result = newSeq[Prime](len)
    for i in 0 ..< szbits:
      if (sb[i shr 3] and BITMASK[i and 7]) == 0'u8:
        result[j] = low + (i + i).Prime; j.inc
  let gen = makePrimePages(0, CPUL1CACHE, mkprmarr)
  for prmpg in gen():
    for prm in prmpg: yield prm
 
proc countPrimesTo(range: Prime): int64 =
  if range < FRSTSVPRM: return (if range < 2: 0 else: 1)
  result = 1; let rngi = ((range - FRSTSVPRM) shr 1).int
  proc cntr(li: int; sb: var SieveBuffer): (int, int) {.closure.} =
    let szbits = sb.len shl 3; let nxti = li + szbits; result = (0, nxti)
    if nxti <= rngi: result[0] += countSieveBuffer(szbits - 1, sb)
    else: result[0] += countSieveBuffer(rngi - li, sb)
  let gen = makePrimePages(0, CPUL1CACHE, cntr)
  for count, nxti in gen():
    result += count; if nxti > rngi: break

# showing results...
echo "Page Segmented Bit-Packed Odds-Only Sieve of Eratosthenes"
echo "Needs at least ", CPUL1CACHE, " bytes of CPU L1 cache memory.\n"

stdout.write "First 25 primes:  "
var counter0 = 0
for p in primesPaged():
  if counter0 >= 25: break
  stdout.write(p, " "); counter0.inc
echo ""

stdout.write "The number of primes up to a million is:  "
var counter1 = 0
for p in primesPaged():
  if p > 1_000_000.Prime: break else: counter1.inc
stdout.write counter1, " - these both found by (slower) enumeration.\n"

let start = epochTime()
#[ # slow way to count primes takes as long to enumerate as sieve!
var counter = 0
for p in primesPaged():
  if p > LIMIT: break else: counter.inc
# ]#
let counter = countPrimesTo LIMIT # the fast way using native popCount!
let elpsd = epochTime() - start

echo "Found ", counter, " primes up to ", LIMIT, " in ", elpsd, " seconds."
Output:

Time is obtained with Nim 1.4 with options -d:danger --gc:arc.

Page Segmented Bit-Packed Odds-Only Sieve of Eratosthenes
Needs at least 16384 bytes of CPU L1 cache memory.

First 25 primes:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
The number of primes up to a million is:  78498 - these both found by (slower) enumeration.
Found 50847534 primes up to 1000000000 in 0.7931935787200928 seconds.

The above version approaches a hundred times the speed of the incremental style versions above due to the high efficiency of direct mutable memory operations in modern CPUs, and is useful for ranges of billions. This version maintains its efficiency using the CPU L1 cache to a range of over 16 billion and then gets a little slower for ranges of a trillion or more using the CPU's L2 cache. It takes an average of only about 3.5 CPU clock cycles per composite number cull, or about 70 CPU clock cycles per prime found.

Note that the fastest performance is realized by using functions that directly manipulate the output "seq" (array) of culled bit number representations such as the `countPrimesTo` function provided, as enumeration using the `primesPaged` iterator takes about as long to enumerate the found primes as it takes to cull the composites.
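
The page-segmented approach, and the point about counting directly on the culled buffer instead of enumerating, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the name count_primes_segmented and the segment_size parameter are hypothetical, one byte is used per odd number instead of one bit, and the base primes come from a small ordinary sieve rather than a lazy list.

```python
from math import isqrt

def count_primes_segmented(limit, segment_size=1 << 15):
    """Count primes <= limit, sieving the odd numbers one segment at a time."""
    if limit < 2:
        return 0
    # Base primes up to sqrt(limit), found with a small ordinary sieve.
    root = isqrt(limit)
    small = bytearray([1]) * (root + 1)
    small[0:2] = b"\x00\x00"
    for i in range(2, isqrt(root) + 1):
        if small[i]:
            small[i * i :: i] = bytes(len(range(i * i, root + 1, i)))
    base_primes = [i for i in range(3, root + 1) if small[i]]

    count = 1   # the prime 2
    low = 3     # first odd number represented in the current segment
    while low <= limit:
        high = min(low + 2 * segment_size - 2, limit)
        nbits = (high - low) // 2 + 1      # odd numbers in [low, high]
        seg = bytearray(nbits)             # 0 = possible prime, 1 = composite
        for p in base_primes:
            start = p * p
            if start > high:
                break
            if start < low:
                start = low + (-low) % p   # first multiple of p that is >= low
                if start % 2 == 0:
                    start += p             # make it an odd multiple
            idx = (start - low) // 2
            seg[idx::p] = b"\x01" * len(range(idx, nbits, p))
        count += seg.count(0)              # count on the buffer, no enumeration
        low += 2 * nbits
    return count

print(count_primes_segmented(1_000_000))
```

Only one segment is alive at a time, so memory use depends on segment_size and on the number of base primes, not on the full range.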

Many further improvements in speed can be made: tuning the medium ranges to use the CPU caches more efficiently, for an improvement of up to a factor of about two in those ranges; full maximum wheel factorization, for a further improvement of about four; extreme loop unrolling, for a further improvement of approximately two; multi-threading, for an improvement by a factor equal to the number of effective CPU cores used; and so on. However, these improvements are of little point when used with enumeration; for instance, if one successfully reduced the time to sieve the composite numbers to zero, it would still take about a second just to enumerate the resulting primes over a range of a billion.

Niue

This example is incorrect. Please fix the code and remove this message.

Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

[ dup 2 < ] '<2 ;
[ 1 + 'count ; [ <2 [ , ] when ] count times ] 'fill-stack ;

0 'n ; 0 'v ;

[ .clr 0 'n ; 0 'v ; ] 'reset ;
[ len 1 - n - at 'v ; ] 'set-base ;
[ n 1 + 'n ; ] 'incr-n ;
[ mod 0 = ] 'is-factor ;
[ dup * ] 'sqr ;

[ set-base
  v sqr 2 at > not 
  [ [ dup v = not swap v is-factor and ] remove-if incr-n run ] when ] 'run ;

[ fill-stack run ] 'sieve ;

( tests )

10 sieve .s ( => 2 3 5 7 9 ) reset newline
30 sieve .s ( => 2 3 5 7 11 13 17 19 23 29 )
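
The distinction drawn in the notice above, between remainder testing and a true sieve, can be made concrete with a short Python sketch. Both function names are hypothetical and chosen only for this comparison.

```python
def trial_division_primes(n):
    """Not a sieve: every candidate is tested with the remainder operator."""
    primes = []
    for c in range(2, n + 1):
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
    return primes

def eratosthenes(n):
    """A sieve: composites are struck by stepping through multiples,
    using only addition, with no division or remainder tests."""
    is_prime = [False, False] + [True] * max(n - 1, 0)
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(eratosthenes(30))
```

Both functions return the same list, but only the second one qualifies as a Sieve of Eratosthenes for the purposes of this task.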

Oberon-2

MODULE Primes;

   IMPORT Out, Math;

   CONST N = 1000;

   VAR a: ARRAY N OF BOOLEAN;
      i, j, m: INTEGER;

BEGIN
   (* Set all elements of a to TRUE. *)
   FOR i := 1 TO N - 1 DO
      a[i] := TRUE;
   END;

   (* Compute square root of N and convert back to INTEGER. *)
   m := ENTIER(Math.Sqrt(N));

   FOR i := 2 TO m DO
      IF a[i] THEN
         FOR j := 2 TO (N - 1) DIV i DO 
            a[i*j] := FALSE;
         END;
      END;
   END;

   (* Print all the elements of a that are TRUE. *)
   FOR i := 2 TO N - 1 DO
      IF a[i] THEN
         Out.Int(i, 4);
      END;
   END;
   Out.Ln;
END Primes.

OCaml

Imperative

let sieve n =
  let is_prime = Array.make n true in
  let limit = truncate(sqrt (float (n - 1))) in
  for i = 2 to limit do
    if is_prime.(i) then
      let j = ref (i*i) in
      while !j < n do
        is_prime.(!j) <- false;
        j := !j + i;
      done
  done;
  is_prime.(0) <- false;
  is_prime.(1) <- false;
  is_prime
let primes n =
  let primes, _ =
    let sieve = sieve n in
    Array.fold_right
      (fun is_prime (xs, i) -> if is_prime then (i::xs, i-1) else (xs, i-1))
      sieve
      ([], Array.length sieve - 1)
  in
  primes

in the top-level:

# primes 100 ;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97]

Functional

(* first define some iterators *)
let fold_iter f init a b =
  let rec aux acc i =
    if i > b
    then (acc)
    else aux (f acc i) (succ i)
  in
  aux init a
(* val fold_iter : ('a -> int -> 'a) -> 'a -> int -> int -> 'a *)

let fold_step f init a b step =
  let rec aux acc i =
    if i > b
    then (acc)
    else aux (f acc i) (i + step)
  in
  aux init a
(* val fold_step : ('a -> int -> 'a) -> 'a -> int -> int -> int -> 'a *)

(* remove a given value from a list *)
let remove li v =
  let rec aux acc = function
    | hd::tl when hd = v -> (List.rev_append acc tl)
    | hd::tl -> aux (hd::acc) tl
    | [] -> li
  in
  aux [] li
(* val remove : 'a list -> 'a -> 'a list *)

(* the main function *)
let primes n =
  let li =
    (* create a list [from 2; ... until n] *)
    List.rev(fold_iter (fun acc i -> (i::acc)) [] 2 n)
  in
  let limit = truncate(sqrt(float n)) in
  fold_iter (fun li i ->
      if List.mem i li  (* test if (i) is prime *)
      then (fold_step remove li (i*i) n i)
      else li)
    li 2 limit
(* val primes : int -> int list *)

in the top-level:

# primes 200 ;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
 157; 163; 167; 173; 179; 181; 191; 193; 197; 199]

Another functional version

This uses zero to denote struck-out numbers. It is slightly inefficient as it strikes out multiples starting just above p rather than at p².

let rec strike_nth k n l = match l with
  | [] -> []
  | h :: t ->
    if k = 0 then 0 :: strike_nth (n-1) n t
    else h :: strike_nth (k-1) n t
(* val strike_nth : int -> int -> int list -> int list *)

let primes n =
  let limit = truncate(sqrt(float n)) in
  let rec range a b = if a > b then [] else a :: range (a+1) b in
  let rec sieve_primes l = match l with
    | [] -> []
    | 0 :: t -> sieve_primes t
    | h :: t -> if h > limit then List.filter ((<) 0) l else
        h :: sieve_primes (strike_nth (h-1) h t) in
  sieve_primes (range 2 n)
(* val primes : int -> int list *)

in the top-level:

# primes 200;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
 157; 163; 167; 173; 179; 181; 191; 193; 197; 199]
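
The cost of starting the strike-out just above p instead of at p² can be measured with a small Python sketch. It is an array-based illustration, not a translation of the list-based OCaml code, and the function name sieve_with_strike_count is hypothetical.

```python
from math import isqrt

def sieve_with_strike_count(n, from_square):
    """Return (primes up to n, number of strike operations performed)."""
    is_prime = [True] * (n + 1)
    ops = 0
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            first = p * p if from_square else 2 * p
            for m in range(first, n + 1, p):
                is_prime[m] = False
                ops += 1
    return [i for i in range(2, n + 1) if is_prime[i]], ops

primes_a, ops_square = sieve_with_strike_count(200, from_square=True)
primes_b, ops_double = sieve_with_strike_count(200, from_square=False)
print(primes_a == primes_b, ops_square, ops_double)
```

Both variants produce the same primes; the multiples between 2p and p² that the second variant strikes have already been struck by a smaller prime, so that work is redundant.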

Oforth

: eratosthenes(n)
| i j |
   ListBuffer newSize(n) dup add(null) seqFrom(2, n) over addAll
   2 n sqrt asInteger for: i [
      dup at(i) ifNotNull: [ i sq n i step: j [ dup put(j, null) ] ]
      ]
   filter(#notNull) ;
Output:
>100 eratosthenes println
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Ol

(define all (iota 999 2))

(print
   (let main ((left '()) (right all))
      (if (null? right)
         (reverse left)
         (unless (car right)
            (main left (cdr right))
            (let loop ((l '()) (r right) (n 0) (every (car right)))
               (if (null? r)
                  (let ((l (reverse l)))
                     (main (cons (car l) left) (cdr l)))
                  (if (eq? n every)
                     (loop (cons #false l) (cdr r) 1 every)
                     (loop (cons (car r) l) (cdr r) (+ n 1) every)))))))
)

Output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997)

ooRexx

/*ooRexx program generates & displays primes via the sieve of Eratosthenes.
*                       derived from first Rexx version
*                       uses an array rather than a stem for the list
*                       uses string methods rather than BIFs
*                       uses new ooRexx keyword LOOP, extended assignment
*                         and line comments
*                       uses meaningful variable names and restructures code
*                         layout for improved understandability
****************************************************************************/
  arg highest                       --get highest number to use.
  if \highest~datatype('W') then
    highest = 200                   --use default value.
  isPrime = .array~new(highest)     --container for all numbers.
  isPrime~fill(1)                   --assume all numbers are prime.
  w = highest~length                --width of the biggest number,
                                    --  it's used for aligned output.
  out1 = 'prime'~right(20)          --first part of output messages.
  np = 0                            --no primes so far.
  loop j = 2 for highest - 1        --all numbers up through highest.
    if isPrime[j] = 1 then do       --found one.
      np += 1                       --bump the prime counter.
      say out1 np~right(w) ' --> ' j~right(w)   --display output.
      loop m = j * j to highest by j
        isPrime[m] = ''             --strike all multiples: not prime.
      end
    end
  end
  say
  say np~right(out1~length + 1 + w) 'primes found up to and including ' highest
  exit
Output:
               prime   1  -->    2
               prime   2  -->    3
               prime   3  -->    5
               prime   4  -->    7
               prime   5  -->   11
               prime   6  -->   13
               prime   7  -->   17
               prime   8  -->   19
               prime   9  -->   23
               prime  10  -->   29
               prime  11  -->   31
               prime  12  -->   37
               prime  13  -->   41
               prime  14  -->   43
               prime  15  -->   47
               prime  16  -->   53
               prime  17  -->   59
               prime  18  -->   61
               prime  19  -->   67
               prime  20  -->   71
               prime  21  -->   73
               prime  22  -->   79
               prime  23  -->   83
               prime  24  -->   89
               prime  25  -->   97
               prime  26  -->  101
               prime  27  -->  103
               prime  28  -->  107
               prime  29  -->  109
               prime  30  -->  113
               prime  31  -->  127
               prime  32  -->  131
               prime  33  -->  137
               prime  34  -->  139
               prime  35  -->  149
               prime  36  -->  151
               prime  37  -->  157
               prime  38  -->  163
               prime  39  -->  167
               prime  40  -->  173
               prime  41  -->  179
               prime  42  -->  181
               prime  43  -->  191
               prime  44  -->  193
               prime  45  -->  197
               prime  46  -->  199

                      46 primes found up to and including  200

Wheel Version

/*ooRexx program generates primes via sieve of Eratosthenes algorithm.
*                       wheel version, 2 handled as special case
*                       loops optimized: outer loop stops at the square root of
*                         the limit, inner loop starts at the square of the
*                         prime just found
*                       use a list rather than an array and remove composites
*                         rather than just mark them
*                       convert list of primes to a list of output messages and
*                         display them with one say statement
*******************************************************************************/
    arg highest                             -- get highest number to use.
    if \highest~datatype('W') then
        highest = 200                       -- use default value.
    w = highest~length                      -- width of the biggest number,
                                            --  it's used for aligned output.
    thePrimes = .list~of(2)                 -- the first prime is 2.
    loop j = 3 to highest by 2              -- populate the list with odd nums.
        thePrimes~append(j)
    end

    j = 3                                   -- first prime (other than 2)
    ix = thePrimes~index(j)                 -- get the index of 3 in the list.
    loop while j*j <= highest               -- strike multiples of odd ints.
                                            --  up to sqrt(highest).
        loop jm = j*j to highest by j+j     -- start at J squared, incr. by 2*J.
            thePrimes~removeItem(jm)        -- delete it since it's composite.
        end
        ix = thePrimes~next(ix)             -- the index of the next prime.
        j = thePrimes[ix]                   -- the next prime.
    end
    np = thePrimes~items                    -- the number of primes since the
                                            --  list is now only primes.
    out1 = '           prime number'        -- first part of output messages.
    out2 = ' --> '                          -- middle part of output messages.
    ix = thePrimes~first
    loop n = 1 to np                        -- change the list of primes
                                            --  to output messages.
        thePrimes[ix] = out1 n~right(w) out2 thePrimes[ix]~right(w)
        ix = thePrimes~next(ix)
    end
    last = np~right(out1~length+1+w) 'primes found up to and including ' highest
    thePrimes~append(.endofline || last)    -- add blank line and summary line.
    say thePrimes~makearray~toString        -- display the output.
    exit
Output:

when using the limit of 100

           prime number    1  -->     2
           prime number    2  -->     3
           prime number    3  -->     5
           prime number    4  -->     7
           prime number    5  -->    11
           prime number    6  -->    13
           prime number    7  -->    17
           prime number    8  -->    19
           prime number    9  -->    23
           prime number   10  -->    29
           prime number   11  -->    31
           prime number   12  -->    37
           prime number   13  -->    41
           prime number   14  -->    43
           prime number   15  -->    47
           prime number   16  -->    53
           prime number   17  -->    59
           prime number   18  -->    61
           prime number   19  -->    67
           prime number   20  -->    71
           prime number   21  -->    73
           prime number   22  -->    79
           prime number   23  -->    83
           prime number   24  -->    89
           prime number   25  -->    97

                          25 primes found up to and including  100

Oz

Translation of: Haskell
declare
  fun {Sieve N}
     S = {Array.new 2 N true}
     M = {Float.toInt {Sqrt {Int.toFloat N}}}
  in
     for I in 2..M do
	if S.I then
	   for J in I*I..N;I do
	      S.J := false
	   end
	end
     end
     S
  end

  fun {Primes N}
     S = {Sieve N}
  in
     for I in 2..N collect:C do
	if S.I then {C I} end
     end
  end
in
  {Show {Primes 30}}

PARI/GP

Eratosthenes(lim)={
  my(v=vectorsmall(lim\1,unused,1));
  forprime(p=2,sqrt(lim),
    forstep(i=p^2,lim,p,
      v[i]=0
    )
  );
  for(i=2,lim,if(v[i],print1(i", ")))
};

An alternate version:

Sieve(n)=
{
v=vector(n,unused,1);
for(i=2,sqrt(n),
    if(v[i],
       forstep(j=i^2,n,i,v[j]=0)));
for(i=2,n,if(v[i],print1(i",")))
};

Pascal

Note: Some Pascal implementations put quite low limits on the size of a set (e.g. Turbo Pascal doesn't allow more than 256 members). To compile on such an implementation, reduce the constant PrimeLimit accordingly.

program primes(output);

const
 PrimeLimit = 1000;

var
 primes: set of 1 .. PrimeLimit;
 n, k: integer;
 needcomma: boolean;

begin
 { calculate the primes }
 primes := [2 .. PrimeLimit];
 for n := 1 to trunc(sqrt(PrimeLimit)) do
  begin
   if n in primes
    then
     begin
      k := n*n;
      while k <= PrimeLimit do
       begin
        primes := primes - [k];
        k := k + n
       end
     end
  end;

  { output the primes }
  needcomma := false;
  for n := 1 to PrimeLimit do
   if n in primes
    then
     begin
      if needcomma
       then
        write(', ');
      write(n);
      needcomma := true
     end
end.
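
The set-based approach of the program above, where the candidates live in a set and each multiple is removed from it, can be sketched in Python. The name set_sieve is hypothetical; Python sets have no small size limit, so the note about reducing PrimeLimit does not apply to this sketch.

```python
def set_sieve(limit):
    """Sieve using a set of candidates, removing the multiples of each prime."""
    candidates = set(range(2, limit + 1))
    n = 2
    while n * n <= limit:
        if n in candidates:
            # Remove n*n, n*n + n, ... up to and including limit.
            candidates.difference_update(range(n * n, limit + 1, n))
        n += 1
    return sorted(candidates)

print(set_sieve(100))
```

Note that the upper bound of the removal range includes the limit itself, so a composite limit is removed as well.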

alternative using wheel

This version uses a growing wheel to pre-fill the sieve array, which minimizes the number of unmark operations, and then sieves using only factors that are still marked as possible primes.

program prim(output);
//Sieve of Eratosthenes with fast elimination of multiples of small primes
{$IFNDEF FPC}
  {$APPTYPE CONSOLE}
{$ENDIF}
const
  PrimeLimit = 100*1000*1000;//1;
type
  tLimit = 1..PrimeLimit;
var
  //always initialized with 0 => false at startup
  primes: array [tLimit] of boolean;

function BuildWheel: longInt;
//fill primfield with no multiples of small primes
//returns next sieveprime
//speedup ~1/3
var
  //wheelprimes = 2,3,5,7,11... ;
  //wheelsize = product [i= 0..wpno-1]wheelprimes[i] > Uint64 i> 13
  wheelprimes :array[0..13] of byte;
  wheelSize,wpno,
  pr,pw,i, k: LongWord;
begin
  //the mother of all numbers 1 ;-)
  //the first wheel = generator of numbers
  //not divisible by the small primes first found primes
  pr := 1;
  primes[1]:= true;
  WheelSize := 1;

  wpno := 0;
  repeat
    inc(pr);
    //pw = pr projected in wheel of wheelsize
    pw := pr;
    if pw > wheelsize then
      dec(pw,wheelsize);
    If Primes[pw] then
    begin
//      writeln(pr:10,pw:10,wheelsize:16);
      k := WheelSize+1;
      //turn the wheel (pr-1)-times
      for i := 1 to pr-1 do
      begin
        inc(k,WheelSize);
        if k<primeLimit then
          move(primes[1],primes[k-WheelSize],WheelSize)
        else
        begin
          move(primes[1],primes[k-WheelSize],PrimeLimit-WheelSize*i);
          break;
        end;
      end;
      dec(k);
      IF k > primeLimit then
        k := primeLimit;
      wheelPrimes[wpno] := pr;
      primes[pr] := false;

      inc(wpno);
      //the new wheelsize
      WheelSize := k;

      //sieve multiples of the new found prime
      i:= pr;
      i := i*i;
      while i <= k do
      begin
        primes[i] := false;
        inc(i,pr);
      end;
    end;
  until WheelSize >= PrimeLimit;

  //re-insert wheel-primes
  // 1 still stays prime
  while wpno > 0 do
  begin
    dec(wpno);
    primes[wheelPrimes[wpno]] := true;
  end;
  BuildWheel  := pr+1;
end;

procedure Sieve;
var
  sieveprime,
  fakt : LongWord;
begin
//primes[1] = true is needed to stop for sieveprime = 2
// at //Search next smaller possible prime
  sieveprime := BuildWheel;
//alternative here
  //fillchar(primes,SizeOf(Primes),chr(ord(true)));sieveprime := 2;
  repeat
    if primes[sieveprime] then
    begin
      //eliminate 'possible prime' multiples of sieveprime
      //must go downwards
      //2*2 would unmark 4 -> 4*2 = 8 wouldnt be unmarked
      fakt := PrimeLimit DIV sieveprime;
      IF fakt < sieveprime then
        BREAK;
      repeat
        //Unmark
        primes[sieveprime*fakt] := false;
        //Search next smaller possible prime
        repeat
          dec(fakt);
        until primes[fakt];
      until fakt < sieveprime;
    end;
    inc(sieveprime);
  until false;
  //remove 1
  primes[1] := false;
end;

var
  prCnt,
  i : LongWord;
Begin
  Sieve;
  {count the primes }
  prCnt := 0;
  for i:= 1 to PrimeLimit do
    inc(prCnt,Ord(primes[i]));
  writeln(prCnt,' primes up to ',PrimeLimit);
end.

output: ( i3 4330 Haswell 3.5 GHz fpc 2.6.4 -O3 )

5761455 primes up to 100000000

real	0m0.204s
user	0m0.193s
sys	0m0.013s
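
The effect of pre-filling the array with a wheel can be sketched in Python. This is a simplified illustration, not a translation of the Pascal program: the wheel is fixed at the primes 2, 3 and 5 instead of growing, the pattern is built with remainder tests for brevity, and the name wheel_prefilled_sieve and its wheel_primes parameter are hypothetical. The sketch assumes wheel_primes holds the first few primes in order.

```python
def wheel_prefilled_sieve(n, wheel_primes=(2, 3, 5)):
    """Sieve up to n after pre-filling the flags with a repeating wheel pattern."""
    if n < 2:
        return []
    size = 1
    for p in wheel_primes:
        size *= p                      # wheel circumference, 30 by default
    # One turn of the wheel: 1 where the residue is coprime to every wheel prime.
    pattern = bytearray(
        1 if all(r % p for p in wheel_primes) else 0 for r in range(size)
    )
    flags = (pattern * (n // size + 1))[: n + 1]   # copy the pattern across the array
    flags[1] = 0                       # 1 is not prime
    for p in wheel_primes:             # re-insert the wheel primes themselves
        if p <= n:
            flags[p] = 1
    p = max(wheel_primes) + 1
    while p * p <= n:                  # ordinary sieving for the remaining primes
        if flags[p]:
            for m in range(p * p, n + 1, p):
                flags[m] = 0
        p += 1
    return [i for i in range(2, n + 1) if flags[i]]

print(wheel_prefilled_sieve(100))
```

With the default wheel, multiples of 2, 3 and 5 are never struck individually; they are excluded by the copied pattern, which is where the saving in unmark operations comes from.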

Perl

For highest performance and ease, typically a module would be used, such as Math::Prime::Util, Math::Prime::FastSieve, or Math::Prime::XS.

Classic Sieve

sub sieve {
  my $n = shift;
  my @composite;
  for my $i (2 .. int(sqrt($n))) {
    if (!$composite[$i]) {
      for (my $j = $i*$i; $j <= $n; $j += $i) {
        $composite[$j] = 1;
      }
    }
  }
  my @primes;
  for my $i (2 .. $n) {
    $composite[$i] || push @primes, $i;
  }
  @primes;
}

Odds only (faster)

sub sieve2 {
  my($n) = @_;
  return @{([],[],[2],[2,3],[2,3])[$n]} if $n <= 4;

  my @composite;
  for (my $t = 3;  $t*$t <= $n;  $t += 2) {
     if (!$composite[$t]) {
        for (my $s = $t*$t;  $s <= $n;  $s += $t*2)
           { $composite[$s]++ }
     }
  }
  my @primes = (2);
  for (my $t = 3;  $t <= $n;  $t += 2) { 
     $composite[$t] || push @primes, $t;
  }
  @primes;
}

Odds only, using vectors for lower memory use

sub dj_vector {
  my($end) = @_;
  return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
  $end-- if ($end & 1) == 0; # Ensure end is odd

  my ($sieve, $n, $limit, $s_end) = ( '', 3, int(sqrt($end)), $end >> 1 );
  while ( $n <= $limit ) {
    for (my $s = ($n*$n) >> 1; $s <= $s_end; $s += $n) {
      vec($sieve, $s, 1) = 1;
    }
    do { $n += 2 } while vec($sieve, $n >> 1, 1) != 0;
  }
  my @primes = (2);
  do { push @primes, 2*$_+1 if !vec($sieve,$_,1) } for (1..int(($end-1)/2));
  @primes;
}
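
The index arithmetic behind an odds-only, bit-packed sieve such as the vector version above can be sketched in Python. This is an illustration under stated assumptions, with the hypothetical name odds_only_bitpacked: bit i stands for the odd number 2*i + 1, so the square of p sits at index (p*p) >> 1 and a step of p in index space is a step of 2*p in number space.

```python
def odds_only_bitpacked(n):
    """Primes up to n using one bit per odd number (bit i represents 2*i + 1)."""
    if n < 2:
        return []
    half = (n - 1) // 2                 # highest index in use
    bits = bytearray(half // 8 + 1)     # a set bit means "composite"

    def is_composite(i):
        return bits[i >> 3] & (1 << (i & 7))

    i = 1                               # index 1 represents the number 3
    while (2 * i + 1) ** 2 <= n:
        if not is_composite(i):
            p = 2 * i + 1
            # Index of p*p is (p*p - 1) // 2, which equals (p*p) >> 1 for odd p.
            for j in range((p * p) >> 1, half + 1, p):
                bits[j >> 3] |= 1 << (j & 7)
        i += 1
    return [2] + [2 * k + 1 for k in range(1, half + 1) if not is_composite(k)]

print(odds_only_bitpacked(100))
```

Storing only the odd numbers halves the number of flags, and packing them into bits divides the memory by a further factor of eight compared with one byte per flag.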

Odds only, using strings for best performance

Compared to the array versions, this is about 2x faster (with Perl 5.16.0 or later) and uses less memory. It is much faster than the experimental versions below. It's possible a mod-6 or mod-30 wheel could give more improvement, though possibly with obfuscation. The best next step for performance and functionality would be segmenting.

sub string_sieve {
  my ($n, $i, $s, $d, @primes) = (shift, 7);

  local $_ = '110010101110101110101110111110' .
             '101111101110101110101110111110' x ($n/30);

  until (($s = $i*$i) > $n) {
    $d = $i<<1;
    do { substr($_, $s, 1, '1') } until ($s += $d) > $n;
    1 while substr($_, $i += 2, 1);
  }
  $_ = substr($_, 1, $n); 
  # For just the count:  return ($_ =~ tr/0//);
  push @primes, pos while m/0/g;
  @primes;
}

This older version uses half the memory, but at the expense of a bit of speed and code complexity:

sub dj_string {
  my($end) = @_;
  return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
  $end-- if ($end & 1) == 0;
  my $s_end = $end >> 1;

  my $whole = int( ($end>>1) / 15);    # prefill with 3 and 5 marked
  my $sieve = '100010010010110' . '011010010010110' x $whole;
  substr($sieve, ($end>>1)+1) = '';
  my ($n, $limit, $s) = ( 7, int(sqrt($end)), 0 );
  while ( $n <= $limit ) {
    for ($s = ($n*$n) >> 1; $s <= $s_end; $s += $n) {
      substr($sieve, $s, 1) = '1';
    }
    do { $n += 2 } while substr($sieve, $n>>1, 1);
  }
  # If you just want the count, it's very fast:
  #       my $count = 1 + $sieve =~ tr/0//;
  my @primes = (2);
  push @primes, 2*pos($sieve)-1 while $sieve =~ m/0/g;
  @primes;
}

Experimental

These are examples of golfing or unusual styles.

Golfing a bit, at the expense of speed:

sub sieve{ my (@s, $i);
	grep { not $s[ $i  = $_ ] and do
		 { $s[ $i += $_ ]++ while $i <= $_[0]; 1 }
	} 2..$_[0]
}

print join ", " => sieve 100;

Or with bit strings (much slower than the vector version above):

sub sieve{ my ($s, $i);
	grep { not vec $s, $i  = $_, 1 and do 
		{ (vec $s, $i += $_, 1) = 1 while $i <= $_[0]; 1 }
	} 2..$_[0]
}

print join ", " => sieve 100;

A short recursive version:

sub erat {
    my $p = shift;
    return $p, $p**2 > $_[$#_] ? @_ : erat(grep $_%$p, @_)
}

print join ', ' => erat 2..100000;

Regexp (purely an example -- the regex engine limits it to only 32769):

sub sieve {
	my ($s, $p) = "." . ("x" x shift);

	1 while ++$p
		and $s =~ /^(.{$p,}?)x/g
		and $p = length($1)
		and $s =~ s/(.{$p})./${1}./g
		and substr($s, $p, 1) = "x";
	$s
}

print sieve(1000);

Extensible sieves

Here are two incremental versions, which allow one to create a tied array of primes:

use strict;
use warnings;
package Tie::SieveOfEratosthenes;

sub TIEARRAY {
	my $class = shift;
	bless \$class, $class;
}

# If set to true, produces copious output.  Observing this output
# is an excellent way to gain insight into how the algorithm works.
use constant DEBUG => 0;

# If set to true, causes the code to skip over even numbers,
# improving runtime.  It does not alter the output content, only the speed.
use constant WHEEL2 => 0;

BEGIN {

	# This is loosely based on the Python implementation of this task,
	# specifically the "Infinite generator with a faster algorithm"

	my @primes = (2, 3);
	my $ps = WHEEL2 ? 1 : 0;
	my $p = $primes[$ps];
	my $q = $p*$p;
	my $incr = WHEEL2 ? 2 : 1;
	my $candidate = $primes[-1] + $incr;
	my %sieve;
	
	print "Initial: p = $p, q = $q, candidate = $candidate\n" if DEBUG;

	sub FETCH {
		my $n = pop;
		return if $n < 0;
		return $primes[$n] if $n <= $#primes;
		OUTER: while( 1 ) {

			# each key in %sieve is a composite number between
			# p and p-squared.  Each value in %sieve is $incr x the prime
			# which acted as a 'seed' for that key.  We use the value
			# to step through multiples of the seed-prime, until we find
			# an empty slot in %sieve.
			while( my $s = delete $sieve{$candidate} ) {
				print "$candidate a multiple of ".($s/$incr).";\t\t" if DEBUG;
				my $composite = $candidate + $s;
				$composite += $s while exists $sieve{$composite};
				print "The next stored multiple of ".($s/$incr)." is $composite\n" if DEBUG;
				$sieve{$composite} = $s;
				$candidate += $incr;
			}

			print "Candidate $candidate is not in sieve\n" if DEBUG;

			while( $candidate < $q ) {
				print "$candidate is prime\n" if DEBUG;
				push @primes, $candidate;
				$candidate += $incr;
				next OUTER if exists $sieve{$candidate};
			} 

			die "Candidate = $candidate, p = $p, q = $q" if $candidate > $q;
			print "Candidate $candidate is equal to $p squared;\t" if DEBUG;

			# Thus, it is now time to add p to the sieve,
			my $step = $incr * $p;
			my $composite = $q + $step;
			$composite += $step while exists $sieve{$composite};
			print "The next multiple of $p is $composite\n" if DEBUG;
			$sieve{$composite} = $step;
		
			# and fetch out a new value for p from our primes array.
			$p = $primes[++$ps];
			$q = $p * $p;	
			
			# And since $candidate was equal to some prime squared,
			# it's obviously composite, and we need to increment it.
			$candidate += $incr;
			print "p is $p, q is $q, candidate is $candidate\n" if DEBUG;
		} continue {
			return $primes[$n] if $n <= $#primes;
		}
	}

}

if( !caller ) {
	tie my (@prime_list), 'Tie::SieveOfEratosthenes';
	my $limit = $ARGV[0] || 100;
	my $line = "";
	for( my $count = 0; $prime_list[$count] < $limit; ++$count ) {
		$line .= $prime_list[$count]. ", ";
		next if length($line) <= 70;
		if( $line =~ tr/,// > 1 ) {
			$line =~ s/^(.*,) (.*, )/$2/;
			print $1, "\n";
		} else {
			print $line, "\n";
			$line = "";
		}
	}
	$line =~ s/, \z//;
	print $line, "\n" if $line;
}

1;

This one is based on the vector sieve shown earlier, but adds to a list as needed, just sieving in the segment. Slightly faster and half the memory vs. the previous incremental sieve. It uses the same API -- arguably we should be offset by one so $primes[$n] returns the $n'th prime.

use strict;
use warnings;
package Tie::SieveOfEratosthenes;

sub TIEARRAY {
  my $class = shift;
  my @primes = (2,3,5,7);
  return bless \@primes, $class;
}

sub prextend { # Extend the given list of primes using a segment sieve
  my($primes, $to) = @_;
  $to-- unless $to & 1; # Ensure end is odd
  return if $to < $primes->[-1];
  my $sqrtn = int(sqrt($to)+0.001);
  prextend($primes, $sqrtn) if $primes->[-1] < $sqrtn;
  my($segment, $startp) = ('', $primes->[-1]+1);
  my($s_beg, $s_len) = ($startp >> 1, ($to>>1) - ($startp>>1));
  for my $p (@$primes) {
    last if $p > $sqrtn;
    if ($p >= 3) {
      my $p2 = $p*$p;
      if ($p2 < $startp) {   # Bump up to next odd multiple of p >= startp
        my $f = 1+int(($startp-1)/$p);
        $p2 = $p * ($f | 1);
      }
      for (my $s = ($p2>>1)-$s_beg; $s <= $s_len; $s += $p) {
        vec($segment, $s, 1) = 1;   # Mark composites in segment
      }
    }
  }
  # Now add all the primes found in the segment to the list
  do { push @$primes, 1+2*($_+$s_beg) if !vec($segment,$_,1) } for 0 .. $s_len;
}

sub FETCHSIZE { 0x7FFF_FFFF }  # Allows foreach to work
sub FETCH {
  my($primes, $n) = @_;
  return if $n < 0;
  # Keep expanding the list as necessary, 5% larger each time.
  prextend($primes, 1000+int(1.05*$primes->[-1])) while $n > $#$primes;
  return $primes->[$n];
}

if( !caller ) {
  tie my @prime_list, 'Tie::SieveOfEratosthenes';
  my $limit = $ARGV[0] || 100;
  print $prime_list[0];
  my $i = 1;
  while ($prime_list[$i] < $limit) { print " ", $prime_list[$i++]; }
  print "\n";
}

1;
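
The segment-extension idea described above (keep a list of known primes and sieve only the new range when more are needed) can be sketched in Python as a language-neutral illustration. This is a simplified sketch rather than a translation of the Perl code: the name extend_primes is hypothetical, it sieves all integers instead of odds only, and it assumes Python 3.8 or later for math.isqrt.

```python
from math import isqrt

def extend_primes(primes, to):
    """Extend the sorted list `primes` in place with every prime <= `to`.

    Assumes `primes` already holds all primes up to its last element
    (for example [2, 3, 5, 7]).
    """
    if to <= primes[-1]:
        return
    root = isqrt(to)
    if primes[-1] < root:
        extend_primes(primes, root)      # make sure all base primes are known
    lo = primes[-1] + 1                  # first number of the new segment
    segment = bytearray(to - lo + 1)     # segment[k] stands for the number lo + k
    for p in primes:
        if p * p > to:
            break
        # first multiple of p inside the segment, but never below p*p
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, to + 1, p):
            segment[m - lo] = 1          # mark composites in the segment
    primes.extend(lo + k for k, flag in enumerate(segment) if not flag)

known = [2, 3, 5, 7]
extend_primes(known, 100)
print(known)
```

Only the segment between the last known prime and the new limit is allocated, so repeated calls never re-sieve the range already covered.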

Phix

Translation of: Euphoria
constant limit = 1000
sequence primes = {}
sequence flags = repeat(1, limit)
for i=2 to floor(sqrt(limit)) do
    if flags[i] then
        for k=i*i to limit by i do
            flags[k] = 0
        end for
    end if
end for
for i=2 to limit do
    if flags[i] then
        primes &= i
    end if
end for
pp(primes,{pp_Maxlen,77})
Output:
{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
 101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,
 193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,
 293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,
 409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,
 521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,
 641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,
 757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,
 881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997}

See also Sexy_primes#Phix where the sieve is more useful than a list of primes.
Most applications should use the builtins, e.g. get_primes(-get_maxprime(1000*1000)) or get_primes_le(1000), both of which give exactly the same output as above.

Phixmonti

include ..\Utilitys.pmt

def sequence /# ( ini end [step] ) #/
    ( ) swap for 0 put endfor
enddef

1000 var limit

( 1 limit ) sequence

( 2 limit ) for >ps
    ( tps dup * limit tps ) for
        dup limit < if 0 swap set else drop endif
    endfor
    cps
endfor
( 1 limit 0 ) remove
pstack

Another solution

include ..\Utilitys.pmt
   
1000

( "Primes in " over ": " ) lprint

2 swap 2 tolist for >ps
    2
    dup tps < while
        tps over mod 0 == if false else 1 + true endif
        over tps < and
    endwhile
    tps < ps> swap if drop endif
endfor

pstack
Output:
Primes in 1000:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

=== Press any key to exit ===

PHP

function iprimes_upto($limit)
{
    for ($i = 2; $i < $limit; $i++)
    {
	$primes[$i] = true;
    }
    
    for ($n = 2; $n < $limit; $n++)
    {
	if ($primes[$n])
	{
	    for ($i = $n*$n; $i < $limit; $i += $n)
	    {
		$primes[$i] = false;
	    }
	}
    }
    
    return $primes;
}

echo wordwrap(
    'Primes less or equal than 1000 are : ' . PHP_EOL .
    implode(' ', array_keys(iprimes_upto(1000), true, true)),
    100
);
Output:
Primes less or equal than 1000 are : 
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131
137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269
271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421
431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743
751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919
929 937 941 947 953 967 971 977 983 991 997

Picat

The SoE is provided in the standard library, defined as follows:

primes(N) = L =>
    A = new_array(N),
    foreach(I in 2..floor(sqrt(N)))
        if (var(A[I])) then
            foreach(J in I**2..I..N)
                A[J]=0
            end
         end
     end,
     L=[I : I in 2..N, var(A[I])].
Output:
Picat> L = math.primes(100).
L = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
yes

PicoLisp

(de sieve (N)
   (let Sieve (range 1 N)
      (set Sieve)
      (for I (cdr Sieve)
         (when I
            (for (S (nth Sieve (* I I)) S (nth (cdr S) I))
               (set S) ) ) )
      (filter bool Sieve) ) )

Output:

: (sieve 100)
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Alternate Version Using a 2x3x5x7 Wheel

This works by destructively modifying the CDR of the previous cell when it finds a composite number. For sieving large sets (e.g. 1,000,000) it's much faster than the above.

(setq WHEEL-2357
    (2  4  2  4  6  2  6  4
     2  4  6  6  2  6  4  2
     6  4  6  8  4  2  4  2
     4  8  6  4  6  2  4  6
     2  6  6  4  2  4  6  2
     6  4  2  4  2 10  2 10 .))

(de roll2357wheel (Limit)
    (let W WHEEL-2357
        (make
            (for (N 11  (<= N Limit)  (+ N (pop 'W)))
                (link N)))))

(de sqr (X) (* X X))

(de remove-multiples (L)
    (let (N (car L)  M (* N N)  P L  Q (cdr L))
        (while Q
            (let A (car Q)
                (until (>= M A)
                    (setq M (+ M N)))
                (when (= A M)
                    (con P (cdr Q))))
            (setq  P Q  Q (cdr Q)))))


(de sieve (Limit)
    (let Sieve (roll2357wheel Limit)
        (for (P Sieve  (<= (sqr (car P)) Limit)  (cdr P))
            (remove-multiples P))
        (append (2 3 5 7) Sieve)))
Output:
: (sieve 100)
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
: (filter '((N) (> N 900)) (sieve 1000))
-> (907 911 919 929 937 941 947 953 967 971 977 983 991 997)
: (last (sieve 1000000))
-> 999983
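
The gap table in WHEEL-2357 above need not be typed by hand. As a hedged illustration in Python (the function name wheel_gaps and its parameters are hypothetical, not part of the PicoLisp solution), the gaps are the differences between consecutive numbers coprime to 2*3*5*7 = 210, taken over one full turn of the wheel starting at 11:

```python
from math import gcd

def wheel_gaps(basis=(2, 3, 5, 7), first=11):
    """Return the gaps between consecutive numbers coprime to every prime in `basis`.

    `first` must itself be coprime to the basis primes; one full cycle is returned.
    """
    circumference = 1
    for p in basis:
        circumference *= p
    # numbers coprime to the circumference over one full turn, both ends included
    spokes = [n for n in range(first, first + circumference + 1)
              if gcd(n, circumference) == 1]
    return [b - a for a, b in zip(spokes, spokes[1:])]

print(wheel_gaps())
```

For the default basis this yields 48 gaps that sum to 210, which is why the PicoLisp list can be made circular.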

PL/I

eratos: proc options (main) reorder;

dcl i  fixed bin (31);
dcl j  fixed bin (31);
dcl n  fixed bin (31);
dcl sn fixed bin (31);

dcl hbound builtin;
dcl sqrt   builtin;

dcl sysin    file;
dcl sysprint file;

get list (n);
sn = sqrt(n);

begin;
  dcl primes(n) bit (1) aligned init ((*)((1)'1'b));

  i = 2;

  do while(i <= sn);
    do j = i ** 2 by i to hbound(primes, 1);
      /* Adding a test would just slow down processing! */
      primes(j) = '0'b; 
     end;

    do i = i + 1 to sn until(primes(i));
    end;
  end;

  do i = 2 to hbound(primes, 1);
    if primes(i) then
      put data(i);
  end;
end;
end eratos;

PL/M

100H:

DECLARE PRIME$MAX LITERALLY '5000';

/* CREATE SIEVE OF GIVEN SIZE */
MAKE$SIEVE: PROCEDURE(START, SIZE);
    DECLARE (START, SIZE, M, N) ADDRESS;
    DECLARE PRIME BASED START BYTE;
    
    PRIME(0)=0; /* 0 AND 1 ARE NOT PRIMES */
    PRIME(1)=0; 
    DO N=2 TO SIZE;
        PRIME(N)=1; /* ASSUME ALL OTHERS ARE PRIME AT BEGINNING */
    END;
    
    DO N=2 TO SIZE;
        IF PRIME(N) THEN DO; /* IF A NUMBER IS PRIME... */
            DO M=N*N TO SIZE BY N;
                PRIME(M) = 0; /* THEN ITS MULTIPLES ARE NOT */
            END;
        END;
    END;
END MAKE$SIEVE;

/* CP/M CALLS */
BDOS: PROCEDURE(FUNC, ARG);
    DECLARE FUNC BYTE, ARG ADDRESS;
    GO TO 5;
END BDOS;

DECLARE BDOS$EXIT  LITERALLY '0',
        BDOS$PRINT LITERALLY '9';

/* PRINT A 16-BIT NUMBER */
PRINT$NUMBER: PROCEDURE(N);
    DECLARE (N, P) ADDRESS;
    DECLARE S (8) BYTE INITIAL ('.....',10,13,'$');
    DECLARE C BASED P BYTE;
    P = .S(5);
DIGIT:
    P = P - 1;
    C = (N MOD 10) + '0';
    N = N / 10;
    IF N > 0 THEN GO TO DIGIT;
    CALL BDOS(BDOS$PRINT, P);
END PRINT$NUMBER;

/* PRINT ALL PRIMES UP TO N */
PRINT$PRIMES: PROCEDURE(N, SIEVE);
    DECLARE (I, N, SIEVE) ADDRESS;
    DECLARE PRIME BASED SIEVE BYTE;
    CALL MAKE$SIEVE(SIEVE, N);
    DO I = 2 TO N;
        IF PRIME(I) THEN CALL PRINT$NUMBER(I);
    END;
END PRINT$PRIMES;

CALL PRINT$PRIMES(PRIME$MAX, .MEMORY);

CALL BDOS(BDOS$EXIT, 0);
EOF
Output:
2
3
5
7
11
....
4967
4969
4973
4987
4999

PL/SQL

create or replace package sieve_of_eratosthenes as
  type array_of_booleans is varray(100000000) of boolean;
  type table_of_integers is table of integer;
  function find_primes (n number) return table_of_integers pipelined;
end sieve_of_eratosthenes;
/

create or replace package body sieve_of_eratosthenes as
  function find_primes (n number) return table_of_integers pipelined is
      flag array_of_booleans;
      ptr  integer;
      i    integer;
  begin
      flag := array_of_booleans(false, true);
      flag.extend(n - 2, 2);
      ptr  := 1;
      << outer_loop >>
      while ptr * ptr <= n loop
          while not flag(ptr) loop
              ptr := ptr + 1;
          end loop;
          i := ptr * ptr;
          while i <= n loop
              flag(i) := false;
              i := i + ptr;
          end loop;
          ptr := ptr + 1;
      end loop outer_loop;
      for i in 1 .. n loop
          if flag(i) then
              pipe row (i);
          end if;
      end loop;
      return;
  end find_primes;
end sieve_of_eratosthenes;
/

Usage:

select column_value as prime_number
from   table(sieve_of_eratosthenes.find_primes(30));

PRIME_NUMBER
------------
	   2
	   3
	   5
	   7
	  11
	  13
	  17
	  19
	  23
	  29

10 rows selected.

Elapsed: 00:00:00.01

select count(*) as number_of_primes, sum(column_value) as sum_of_primes
from   table(sieve_of_eratosthenes.find_primes(1e7));

NUMBER_OF_PRIMES   SUM_OF_PRIMES
---------------- ---------------
          664579   3203324994356

Elapsed: 00:00:02.60

Pony

use "time" // for testing
use "collections"

class Primes is Iterator[U32] // returns an Iterator of found primes...
  let _bitmask: Array[U8] = [ 1; 2; 4; 8; 16; 32; 64; 128 ]
  var _lmt: USize
  let _cmpsts: Array[U8]
  var _ndx: USize = 2
  var _curr: U32 = 2
  
  new create(limit: U32) ? =>
    _lmt = USize.from[U32](limit)
    let sqrtlmt = USize.from[F64](F64.from[U32](limit).sqrt())
    _cmpsts = Array[U8].init(0, (_lmt + 8) / 8) // already zeroed; bit array
    _cmpsts(0)? = 3 // mark 0 and 1 as not prime!
    if sqrtlmt < 2 then return end
    for p in Range[USize](2, sqrtlmt + 1) do
      if (_cmpsts(p >> 3)? and _bitmask(p and 7)?) == 0 then
        var s = p * p // cull start address for p * p!
        let slmt = (s + (p << 3)).min(_lmt + 1)
        while s < slmt do
          let msk = _bitmask(s and 7)?
          var c = s >> 3
          while c < _cmpsts.size() do
            _cmpsts(c)? = _cmpsts(c)? or msk
            c = c + p
          end
          s = s + p
        end
      end
    end

  fun ref has_next(): Bool val => _ndx < (_lmt + 1)
  
  fun ref next(): U32 ? =>
    _curr = U32.from[USize](_ndx); _ndx = _ndx + 1
    while (_ndx <= _lmt) and ((_cmpsts(_ndx >> 3)? and _bitmask(_ndx and 7)?) != 0) do
      _ndx = _ndx + 1
    end
    _curr

actor Main
  new create(env: Env) =>
    let limit: U32 = 1_000_000_000
    try
      env.out.write("Primes to 100:  ")
      for p in Primes(100)? do env.out.write(p.string() + " ") end
      var count: I32 = 0
      for p in Primes(1_000_000)? do count = count + 1 end
      env.out.print("\nThere are " + count.string() + " primes to a million.")
      let t = Time
      let start = t.millis()
      let prms = Primes(limit)?
      let elpsd = t.millis() - start
      count = 0
      for _ in prms do count = count + 1 end
      env.out.print("Found " + count.string() + " primes to " + limit.string() + ".")
      env.out.print("This took " + elpsd.string() + " milliseconds.")
    end
Output:
Primes to 100:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes to a million.
Found 50847534 primes to 1000000000.
This took 28123 milliseconds.

Note to users: a naive monolithic sieve (one huge array) is suitable only for trivial use, such as sieving ranges of up to a few million. Beyond that, the size of the array (even bit-packed at one bit per number, as here) limits the maximum range that can be sieved, and poor cache locality ("cache thrashing") limits the speed.

For extended ranges, a Page Segmented version should be used. As well, for any extended ranges in the billions, it is a waste of available computer resources to not use the multi-threading available in a modern CPU, at which Pony would do very well with its built-in Actor concurrency model.

These versions use "loop unpeeling" (not full loop unrolling), which recognizes the repeating modulo pattern of masking the bytes by the base primes less than the square root of the limit so that an "unpeeling" by eight loops can cull by a constant bit mask over the whole range. For smaller ranges where the speed is not limited by "cache thrashing", this can provide about a factor-of-two speed-up.
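
The "loop unpeeling" described above can be sketched in Python as a language-neutral illustration. This is a minimal sketch under stated assumptions, not the Pony code itself: the name primes_bitpacked is hypothetical, and a bytearray stands in for the Pony Array[U8]. For each base prime p, the first eight multiples from p*p each fix one bit mask; since adding 8*p to a number keeps its bit position and advances its byte index by p, each mask is then applied to every p-th byte.

```python
def primes_bitpacked(limit):
    """Bit-packed sieve: bit (n & 7) of byte (n >> 3) is set when n is composite."""
    if limit < 2:
        return []
    composites = bytearray((limit + 8) // 8)
    composites[0] = 0b11                      # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if not composites[p >> 3] & (1 << (p & 7)):
            start = p * p
            stop = min(start + 8 * p, limit + 1)
            for s in range(start, stop, p):   # at most eight "peeled" passes
                mask = 1 << (s & 7)           # constant mask for this whole pass
                for c in range(s >> 3, len(composites), p):
                    composites[c] |= mask     # byte c holds the number s + 8*p*k
        p += 1
    return [n for n in range(2, limit + 1)
            if not composites[n >> 3] & (1 << (n & 7))]

print(primes_bitpacked(100))
```

The inner loop does no shifting or masking of indices, which is where the speed-up comes from in a compiled language; in Python the sketch only shows the indexing scheme.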

Alternate Odds-Only version of the above

It is a waste not to make the trivial changes to the above code to sieve odds only, which is about two and a half times faster due to the reduced number of culling operations. It does little about the huge-array problem, though, other than halving the size of the array.

use "time" // for testing
use "collections"

class Primes is Iterator[U32] // returns an Iterator of found primes...
  let _bitmask: Array[U8] = [ 1; 2; 4; 8; 16; 32; 64; 128 ]
  var _lmti: USize
  let _cmpsts: Array[U8]
  var _ndx: USize = 0
  var _curr: U32 = 0
  
  new create(limit: U32) ? =>
    if limit < 3 then _lmti = 0; _cmpsts = Array[U8](); return end
    _lmti = USize.from[U32]((limit - 3) / 2)
    let sqrtlmti = (USize.from[F64](F64.from[U32](limit).sqrt()) - 3) / 2
    _cmpsts = Array[U8].init(0, (_lmti + 8) / 8) // already zeroed; bit array
    for i in Range[USize](0, sqrtlmti + 1) do
      if (_cmpsts(i >> 3)? and _bitmask(i and 7)?) == 0 then
        let p = i + i + 3
        var s = ((i << 1) * (i + 3)) + 3 // cull start address for p * p!
        let slmt = (s + (p << 3)).min(_lmti + 1)
        while s < slmt do
          let msk = _bitmask(s and 7)?
          var c = s >> 3
          while c < _cmpsts.size() do
            _cmpsts(c)? = _cmpsts(c)? or msk
            c = c + p
          end
          s = s + p
        end
      end
    end

  fun ref has_next(): Bool val => _ndx < (_lmti + 1)
  
  fun ref next(): U32 ? =>
    if _curr < 1 then _curr = 3; if _lmti == 0 then _ndx = 1 end; return 2 end
    _curr = U32.from[USize](_ndx + _ndx + 3); _ndx = _ndx + 1
    while (_ndx <= _lmti) and ((_cmpsts(_ndx >> 3)? and _bitmask(_ndx and 7)?) != 0) do
      _ndx = _ndx + 1
    end
    _curr

actor Main
  new create(env: Env) =>
    let limit: U32 = 1_000_000_000
    try
      env.out.write("Primes to 100:  ")
      for p in Primes(100)? do env.out.write(p.string() + " ") end
      var count: I32 = 0
      for p in Primes(1_000_000)? do count = count + 1 end
      env.out.print("\nThere are " + count.string() + " primes to a million.")
      let t = Time
      let start = t.millis()
      let prms = Primes(limit)?
      let elpsd = t.millis() - start
      count = 0
      for _ in prms do count = count + 1 end
      env.out.print("Found " + count.string() + " primes to " + limit.string() + ".")
      env.out.print("This took " + elpsd.string() + " milliseconds.")
    end

The output is the same as the above, except that it runs about two and a half times faster because there are that many fewer culling operations.
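
The index arithmetic of the odds-only version can be shown in a few lines of Python (a sketch with the hypothetical name primes_odds_only, using one byte per flag rather than bit packing). Index i stands for the odd number p = 2*i + 3, so the index of p*p is ((2*i + 3)^2 - 3) / 2 = 2*i*(i + 3) + 3, and stepping the index by p steps the number by 2*p, skipping the even multiples.

```python
def primes_odds_only(limit):
    """Sieve odd numbers only: flag index i represents the number 2*i + 3."""
    if limit < 2:
        return []
    if limit < 3:
        return [2]
    top = (limit - 3) // 2                 # index of the largest odd number <= limit
    composite = bytearray(top + 1)
    i = 0
    while True:
        p = 2 * i + 3
        start = 2 * i * (i + 3) + 3        # index of p*p
        if start > top:
            break
        if not composite[i]:
            for j in range(start, top + 1, p):   # index step p == number step 2*p
                composite[j] = 1
        i += 1
    return [2] + [2 * i + 3 for i in range(top + 1) if not composite[i]]

print(primes_odds_only(100))
```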

Pop11

define eratostenes(n);
lvars bits = inits(n), i, j;
for i from 2 to n do
   if bits(i) = 0 then
      printf('' >< i, '%s\n');
      for j from 2*i by i to n do
         1 -> bits(j);
      endfor;
   endif;
endfor;
enddefine;

PowerShell

Basic procedure

It outputs immediately so that the number can be used by the pipeline.

function Sieve ( [int] $num )
{
    $isprime = @{}
    2..$num | Where-Object {
        $isprime[$_] -eq $null } | ForEach-Object {
        $_
        $isprime[$_] = $true
        $i=$_*$_
        for ( ; $i -le $num; $i += $_ )
        { $isprime[$i] = $false }
    }
}

Another implementation

function eratosthenes ($n) {
    if($n -ge 1){
        $prime = @(1..($n+1) | foreach{$true})
        $prime[1] = $false
        $m = [Math]::Floor([Math]::Sqrt($n))
        for($i = 2; $i -le $m; $i++) {
            if($prime[$i]) {
                for($j = $i*$i; $j -le $n; $j += $i) {
                    $prime[$j] = $false
                }
            }
        }
        1..$n | where{$prime[$_]}
    } else {
        Write-Warning "$n is less than 1"
    }
}
"$(eratosthenes 100)"

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Processing

Calculate the primes up to 1000000 with Processing, including a visualisation of the process.

int i=2;
int maxx;
int maxy;
int max;
boolean[] sieve;

void setup() {
  size(1000, 1000);
  // frameRate(2);
  maxx=width;
  maxy=height;
  max=width*height;
  sieve=new boolean[max+1];

  sieve[1]=false;
  plot(0, false);
  plot(1, false);
  for (int i=2; i<=max; i++) {
    sieve[i]=true;
    plot(i, true);
  }
}

void draw() {
  if (!sieve[i]) {
    while (i*i<max && !sieve[i]) {
      i++;
    }
  }
  if (sieve[i]) {
    print(i+" ");
    for (int j=i*i; j<=max; j+=i) {
      if (sieve[j]) {
        sieve[j]=false;
        plot(j, false);
      }
    }
  }
  if (i*i<max) {
    i++;
  } else {
    noLoop();
    println("finished");
  }
}

void plot(int pos, boolean active) {
  set(pos%maxx, pos/maxx, active?#000000:#ffffff);
}

As an additional visual effect, the layout of the pixels could be changed from the line-by-line layout to a spiral-like layout starting in the middle of the screen.
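
One possible way to obtain such a spiral layout is sketched below in plain Python. The generator name spiral_positions is hypothetical, and the sketch only computes offsets from the centre; wiring it into plot() (for example by adding width/2 and height/2 to the offsets) is left as an assumption about how the sketch above would be adapted.

```python
def spiral_positions(count):
    """Yield (x, y) offsets from the centre for positions 0 .. count-1,
    walking a square spiral: side lengths 1, 1, 2, 2, 3, 3, ..."""
    x = y = 0
    dx, dy = 1, 0
    run = 1                      # length of the current side
    produced = 0
    while True:
        for _ in range(2):       # two sides share each side length
            for _ in range(run):
                if produced == count:
                    return
                yield x, y
                produced += 1
                x += dx
                y += dy
            dx, dy = -dy, dx     # turn by 90 degrees
        run += 1

print(list(spiral_positions(9)))
```

Each position is visited exactly once, so the n-th sieve entry can be drawn at the n-th offset.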

Processing Python mode

from __future__ import print_function

i = 2

def setup():
    size(1000, 1000)
    # frameRate(2)
    global maxx, maxy, max_num, sieve
    maxx = width
    maxy = height
    max_num = width * height
    sieve = [False] * (max_num + 1)

    sieve[1] = False
    plot(0, False)
    plot(1, False)
    for i in range(2, max_num + 1):
        sieve[i] = True
        plot(i, True)


def draw():
    global i
    if not sieve[i]:
        while (i * i < max_num and not sieve[i]):
            i += 1

    if sieve[i]:
        print("{} ".format(i), end = '')
        for j in range(i * i, max_num + 1, i):
            if sieve[j]:
                sieve[j] = False
                plot(j, False)

    if i * i < max_num:
        i += 1
    else:
        noLoop()
        println("finished")


def plot(pos, active):
    set(pos % maxx, pos / maxx, color(0) if active else color(255))

Prolog

Using lists

Basic bounded sieve

primes(N, L) :- numlist(2, N, Xs),
	        sieve(Xs, L).

sieve([H|T], [H|X]) :- H2 is H + H, 
                       filter(H, H2, T, R),
                       sieve(R, X).
sieve([], []).

filter(_, _, [], []).
filter(H, H2, [H1|T], R) :- 
    (   H1 < H2 -> R = [H1|R1], filter(H, H2, T, R1)
    ;   H3 is H2 + H,
        (   H1 =:= H2  ->       filter(H, H3, T, R)
        ;                       filter(H, H3, [H1|T], R) ) ).
Output:
 ?- time(( primes(7920,X), length(X,N) )).
% 1,131,127 inferences, 0.109 CPU in 0.125 seconds (88% CPU, 10358239 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000 .

Basic bounded Euler's sieve

Translation of: Erlang Canonical

This is actually Euler's variant of the sieve of Eratosthenes, generating (and thus removing) each multiple only once, though in a sub-optimal implementation.

primes(X, PS) :- X > 1, range(2, X, R), sieve(R, PS).

range(X, X, [X]) :- !.
range(X, Y, [X | R]) :- X < Y, X1 is X + 1, range(X1, Y, R).

mult(A, B, C) :- C is A*B.

sieve([X], [X]) :- !.
sieve([H | T], [H | S]) :- maplist( mult(H), [H | T], MS), 
                           remove(MS, T, R), sieve(R, S).
 
remove( _,       [],      []     ) :- !.
remove( [H | X], [H | Y], R      ) :- !, remove(X, Y, R).
remove( X,       [H | Y], [H | R]) :- remove(X, Y, R).

Running in SWI Prolog,

Output:
 ?- time(( primes(7920,X), length(X,N) )).
% 2,087,373 inferences, 0.203 CPU in 0.203 seconds (100% CPU, 10297621 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.

Optimized Euler's sieve

We can stop early, with a massive improvement in complexity (below ~ n^1.5 inferences, empirically, vs. the ~ n^2 of the above, in n primes produced; showing only the modified predicates):

primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).

sieve(X, [H | T], [H | T]) :- H*H > X, !.
sieve(X, [H | T], [H | S]) :- maplist( mult(H), [H | T], MS), 
                              remove(MS, T, R), sieve(X, R, S).
Output:
 ?- time(( primes(7920,X), length(X,N) )).
% 174,437 inferences, 0.016 CPU in 0.016 seconds (100% CPU, 11181787 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.
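
For readers less familiar with Prolog, the same idea can be sketched in Python. This is an illustrative sketch with the hypothetical name euler_sieve, not a translation: a set replaces the ordered-list remove/3, and the early stop of the optimized variant is included. Each round multiplies the head p by every surviving candidate, so each composite is produced exactly once, by its smallest prime factor.

```python
def euler_sieve(limit):
    """Euler's sieve: each composite is generated once, as p * (a surviving candidate)."""
    candidates = list(range(2, limit + 1))
    primes = []
    while candidates:
        p = candidates[0]
        if p * p > limit:                 # stop early: everything left is prime
            primes.extend(candidates)
            break
        multiples = {p * x for x in candidates}   # includes p*p, excludes p itself
        primes.append(p)
        candidates = [x for x in candidates[1:] if x not in multiples]
    return primes

print(euler_sieve(100))
```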

Bounded sieve

Optimized by stopping early, traditional sieve of Eratosthenes generating multiples by iterated addition.

primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).

range(X, X, [X]) :- !.
range(X, Y, [X | R]) :- X < Y, X1 is X + 1, range(X1, Y, R).

sieve(X, [H | T], [H | T]) :- H*H > X, !.
sieve(X, [H | T], [H | S]) :- mults( H, X, MS), remove(MS, T, R), sieve(X, R, S).

mults( H, Lim, MS):- M is H*H, mults( H, M, Lim, MS).
mults( _, M, Lim, []):- M > Lim, !.
mults( H, M, Lim, [M|MS]):- M2 is M+H, mults( H, M2, Lim, MS).

remove( _,       [],      []     ) :- !.
remove( [H | X], [H | Y], R      ) :- !, remove(X, Y, R).
remove( [H | X], [G | Y], R      ) :- H < G, !, remove(X, [G | Y], R).
remove( X,       [H | Y], [H | R]) :- remove(X, Y, R).
Output:
?- time(( primes(7920,X), length(X,N) )).
% 140,654 inferences, 0.016 CPU in 0.011 seconds (142% CPU, 9016224 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.

Sift the Two's and Sift the Three's

Another version, based on Clocksin & Mellish p.175, modified to stop early as well as to work with odds only and to use addition in the removing predicate, instead of the mod testing used in the original:

primes(N,[]):- N < 2, !.
primes(N,[2|R]):- ints(3,N,L), sift(N,L,R).
ints(A,B,[A|C]):- A=<B -> D is A+2, ints(D,B,C).
ints(_,_,[]).
sift(_,[],[]).
sift(N,[A|B],[A|C]):- A*A =< N ->  rmv(A,B,D), sift(N,D,C)
                      ; C=B.
rmv(A,B,D):- M is A*A, rmv(A,M,B,D).
rmv(_,_,[],[]).
rmv(P,M,[A|B],C):- (   M>A ->  C=[A|D], rmv(P,M,B,D)
                   ;   M==A ->  M2 is M+2*P, rmv(P,M2,B,C) 
                   ;   M<A ->  M2 is M+2*P, rmv(P,M2,[A|B],C)
                   ).

Runs at about n^1.4 time empirically, producing 20,000 primes in 1.4 secs on the SWISH platform as of 2021-11-26.

Using lazy lists

In SWI Prolog and others, where freeze/2 is available.

Basic variant

primes(PS):- count(2, 1, NS), sieve(NS, PS).

count(N, D, [N|T]):- freeze(T, (N2 is N+D, count(N2, D, T))).

sieve([N|NS],[N|PS]):- N2 is N*N, count(N2,N,A), remove(A,NS,B), freeze(PS, sieve(B,PS)).

take(N, X, A):- length(A, N), append(A, _, X).

remove([A|T],[B|S],R):- A < B -> remove(T,[B|S],R) ;
                        A=:=B -> remove(T,S,R) ; 
                        R = [B|R2], freeze(R2, remove([A|T], S, R2)).
Output:
 ?- time(( primes(PS), take(1000,PS,R1), length(R,10), append(_,R,R1), writeln(R), false )).
[7841,7853,7867,7873,7877,7879,7883,7901,7907,7919]
% 8,464,518 inferences, 0.702 CPU in 0.697 seconds (101% CPU, 12057641 Lips)
false.

Optimized by postponed removal

Showing only changed predicates.

primes([2|PS]):- 
    freeze(PS, (primes(BPS), count(3, 1, NS), sieve(NS, BPS, 4, PS))). 

sieve([N|NS], BPS, Q, PS):- 
    N < Q -> PS = [N|PS2], freeze(PS2, sieve(NS, BPS, Q, PS2))
    ;  BPS = [BP,BP2|BPS2], Q2 is BP2*BP2, count(Q, BP, MS),
       remove(MS, NS, R), sieve(R, [BP2|BPS2], Q2, PS).
Output:
 ?- time(( primes(PS), take(1000,PS,R1), length(R,10), append(_,R,R1), writeln(R), false )).
[7841,7853,7867,7873,7877,7879,7883,7901,7907,7919]
% 697,727 inferences, 0.078 CPU in 0.078 seconds (100% CPU, 8945161 Lips)
false.       %% odds only: 487,441 inferences
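
The postponement can also be sketched as a Python generator, shown here only as an illustration of the technique (the name postponed_primes is hypothetical, and a dictionary replaces the lazy lists). Multiples of a base prime p enter the table only when the candidate reaches p*p, and the base primes come from a second, much shorter instance of the same generator.

```python
from itertools import count, islice

def postponed_primes():
    """Yield primes indefinitely; a prime's multiples are tracked only from its square."""
    yield 2
    base = postponed_primes()      # separate, slower-moving supply of base primes
    p = next(base)                 # current base prime, initially 2
    q = p * p                      # its square, initially 4
    multiples = {}                 # upcoming composite -> step that produces it
    for n in count(3):
        step = multiples.pop(n, None)
        if step is None:
            if n < q:
                yield n            # not a known multiple and below q: prime
                continue
            step = p               # n == q: start removing multiples of p now
            p = next(base)
            q = p * p
        nxt = n + step             # move this step to its next free multiple
        while nxt in multiples:
            nxt += step
        multiples[nxt] = step

print(list(islice(postponed_primes(), 10)))
```

The table holds one entry per base prime up to the square root of the current candidate, rather than one per prime found so far.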

Using facts to record composite numbers

The first two solutions use Prolog "facts" to record the composite (i.e. already-visited) numbers.

Elementary approach: multiplication-free, division-free, mod-free, and cut-free

The basic Eratosthenes sieve depends on nothing more complex than counting. In celebration of this simplicity, the first approach to the problem taken here is free of multiplication and division, as well as Prolog's non-logical "cut".

It defines the predicate between/4 to avoid division, and composite/1 to record integers that are found to be composite.

% %sieve( +N, -Primes ) is true if Primes is the list of consecutive primes
% that are less than or equal to N
sieve( N, [2|Rest]) :-
  retractall( composite(_) ),
  sieve( N, 2, Rest ) -> true.  % only one solution

% sieve P, find the next prime (the next number not marked composite), and then recurse:
sieve( N, P, [I|Rest] ) :-
  sieve_once(P, N),
  (P = 2 -> P2 is P+1; P2 is P+2),
  between(P2, N, I), 
  (composite(I) -> fail; sieve( N, I, Rest )).

% It is OK if there are no more primes less than or equal to N:
sieve( N, P, [] ).

sieve_once(P, N) :-
  forall( between(P, N, P, IP),
          (composite(IP) -> true ; assertz( composite(IP) )) ).


% To avoid division, we use the iterator
% between(+Min, +Max, +By, -I) 
% where we assume that By > 0
% This is like "for(I=Min; I <= Max; I+=By)" in C.
between(Min, Max, By, I) :- 
  Min =< Max, 
  A is Min + By, 
  (I = Min; between(A, Max, By, I) ).


% Some Prolog implementations require the dynamic predicates be
%  declared:

:- dynamic( composite/1 ).

The above has been tested with SWI-Prolog and gprolog.

% SWI-Prolog:

?- time( (sieve(100000,P), length(P,N), writeln(N), last(P, LP), writeln(LP) )).
% 1,323,159 inferences, 0.862 CPU in 0.921 seconds (94% CPU, 1534724 Lips)
P = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 9592,
LP = 99991.
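
For comparison, the same counting-only idea can be sketched in Python. This is an illustration under stated assumptions rather than a translation of the Prolog code: the name sieve_by_counting is hypothetical, a set stands in for the composite/1 facts, and candidates are stepped by one instead of skipping evens. Multiples are reached by repeated addition, so no multiplication, division or modulo is used.

```python
def sieve_by_counting(n):
    """Return the primes <= n using only comparison and addition."""
    composite = set()          # plays the role of the composite/1 facts
    primes = []
    p = 2
    while p <= n:
        if p not in composite:
            primes.append(p)
            # Reach 2p, 3p, ... by repeated addition (the job done by
            # between/4 in the Prolog version) and record each as composite.
            m = p + p
            while m <= n:
                composite.add(m)
                m = m + p
        p = p + 1
    return primes


print(sieve_by_counting(30))
```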

Optimized approach

Works with SWI-Prolog.

sieve(N, [2|PS]) :-       % PS is list of odd primes up to N
    retractall(mult(_)),
    sieve_O(3,N,PS).

sieve_O(I,N,PS) :-        % sieve odds from I up to N to get PS
    I =< N, !, I1 is I+2,
    (   mult(I) -> sieve_O(I1,N,PS)
    ;   (   I =< N / I -> 
            ISq is I*I, DI  is 2*I, add_mults(DI,ISq,N)
        ;   true 
        ),
        PS = [I|T],
        sieve_O(I1,N,T)
    ).
sieve_O(I,N,[]) :- I > N.

add_mults(DI,I,N) :-
    I =< N, !,
    ( mult(I) -> true ; assert(mult(I)) ),
    I1 is I+DI,
    add_mults(DI,I1,N).
add_mults(_,I,N) :- I > N.

main(N) :- current_prolog_flag(verbose,F),
  set_prolog_flag(verbose,normal), 
  time( sieve( N,P)), length(P,Len), last(P, LP), writeln([Len,LP]),
  set_prolog_flag(verbose,F).
 
:- dynamic( mult/1 ).
:- main(100000), main(1000000).

Running it produces

%% stdout copy
[9592, 99991]
[78498, 999983]

%% stderr copy
% 293,176 inferences, 0.14 CPU in 0.14 seconds (101% CPU, 2094114 Lips)
% 3,122,303 inferences, 1.63 CPU in 1.67 seconds (97% CPU, 1915523 Lips)

which indicates an empirical order of growth of about N^1.1, consistent with the O(N log log N) theoretical runtime complexity.
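
The empirical order of growth is the exponent k for which t ~ N^k fits two measurements, that is k = log(t2/t1) / log(N2/N1). A minimal Python sketch of that arithmetic, using the figures from the stderr copy above (the function name empirical_order is only illustrative):

```python
from math import log


def empirical_order(n1, t1, n2, t2):
    """Exponent k such that t ~ n**k passes through both measurements."""
    return log(t2 / t1) / log(n2 / n1)


# CPU seconds reported for N = 100000 and N = 1000000
print(round(empirical_order(100000, 0.14, 1000000, 1.63), 2))
# inference counts reported for the same two runs
print(round(empirical_order(100000, 293176, 1000000, 3122303), 2))
```

Both the timing-based and the inference-based estimates come out slightly above 1, in line with the roughly N^1.1 figure quoted.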

Using a priority queue

Uses a priority queue, following the paper "The Genuine Sieve of Eratosthenes" by Melissa O'Neill. Works with YAP (Yet Another Prolog).

?- use_module(library(heaps)).

prime(2).
prime(N) :- prime_heap(N, _).

prime_heap(3, H) :- list_to_heap([9-6], H).
prime_heap(N, H) :-
    prime_heap(M, H0), N0 is M + 2,
    next_prime(N0, H0, N, H).

next_prime(N0, H0, N, H) :-
    \+ min_of_heap(H0, N0, _),
    N = N0, Composite is N*N, Skip is N+N,
    add_to_heap(H0, Composite, Skip, H).
next_prime(N0, H0, N, H) :-
    min_of_heap(H0, N0, _),
    adjust_heap(H0, N0, H1), N1 is N0 + 2,
    next_prime(N1, H1, N, H).

adjust_heap(H0, N, H) :-
    min_of_heap(H0, N, _),
    get_from_heap(H0, N, Skip, H1),
    Composite is N + Skip, add_to_heap(H1, Composite, Skip, H2),
    adjust_heap(H2, N, H).
adjust_heap(H, N, H) :-
    \+ min_of_heap(H, N, _).

PureBasic

Basic procedure

For n=2 To Sqr(lim)
  If Nums(n)=0
    m=n*n
    While m<=lim
      Nums(m)=1
      m+n
    Wend
  EndIf
Next n

Working example

Dim Nums.i(0)
Define l, n, m, lim

If OpenConsole()

  ; Ask for the limit to search, get that input and allocate an array
  Print("Enter limit for this search: ")
  lim=Val(Input())
  ReDim Nums(lim)
  
  ; Use a basic Sieve of Eratosthenes
  For n=2 To Sqr(lim)
    If Nums(n)=#False
      m=n*n
      While m<=lim
        Nums(m)=#True
        m+n
      Wend
    EndIf
  Next n
  
  ;Present the result to our user
  PrintN(#CRLF$+"The Prims up to "+Str(lim)+" are;")
  m=0: l=Log10(lim)+1
  For n=2 To lim
    If Nums(n)=#False
      Print(RSet(Str(n),l)+" ")
      m+1
      If m>72/(l+1)
        m=0: PrintN("")
      EndIf
    EndIf
  Next
  
  Print(#CRLF$+#CRLF$+"Press ENTER to exit"): Input()
  CloseConsole()
EndIf

Output may look like;

Enter limit for this search: 750

The Prims up to 750 are;
   2    3    5    7   11   13   17   19   23   29   31   37   41   43   47
  53   59   61   67   71   73   79   83   89   97  101  103  107  109  113
 127  131  137  139  149  151  157  163  167  173  179  181  191  193  197
 199  211  223  227  229  233  239  241  251  257  263  269  271  277  281
 283  293  307  311  313  317  331  337  347  349  353  359  367  373  379
 383  389  397  401  409  419  421  431  433  439  443  449  457  461  463
 467  479  487  491  499  503  509  521  523  541  547  557  563  569  571
 577  587  593  599  601  607  613  617  619  631  641  643  647  653  659
 661  673  677  683  691  701  709  719  727  733  739  743

Press ENTER to exit

Python

Note that the examples use range rather than xrange so that they run under both Python 2 and Python 3. Under Python 2, xrange is the nearest equivalent to Python 3's range and should be substituted where a range has a very large number of items.

Using set lookup

The version below uses a set to store the multiples. Checking whether a given object is a member is much faster for set objects (O(1) on average, since they are hash-based) than for lists (O(n)). Using the set.update method avoids explicit iteration in the interpreter, giving a further speed improvement.

def eratosthenes2(n):
    multiples = set()
    for i in range(2, n+1):
        if i not in multiples:
            yield i
            multiples.update(range(i*i, n+1, i))

print(list(eratosthenes2(100)))

Using array lookup

The version below uses array lookup to test for primality. The function primes_upto() is a straightforward implementation of the Sieve of Eratosthenes algorithm. It returns the prime numbers less than or equal to limit.

def primes_upto(limit):
    is_prime = [False] * 2 + [True] * (limit - 1) 
    for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
        if is_prime[n]:
            for i in range(n*n, limit+1, n):
                is_prime[i] = False
    return [i for i, prime in enumerate(is_prime) if prime]

Using generator

The following code may be slightly slower than using the array/list as above, but uses no memory for output:

def iprimes_upto(limit):
    is_prime = [False] * 2 + [True] * (limit - 1)
    for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
        if is_prime[n]:
            for i in range(n * n, limit + 1, n): # start at ``n`` squared
                is_prime[i] = False
    for i in range(limit + 1):
        if is_prime[i]: yield i
Example:
>>> list(iprimes_upto(15))
[2, 3, 5, 7, 11, 13]

Odds-only version of the array sieve above

The following code is faster than the array version above, both because it sieves only the odd numbers (for a factor of over two) and because it uses slice assignment for composite number culling, which avoids extra work by the interpreter:

def primes2(limit):
    if limit < 2: return []
    if limit < 3: return [2]
    lmtbf = (limit - 3) // 2
    buf = [True] * (lmtbf + 1)
    for i in range((int(limit ** 0.5) - 3) // 2 + 1):
        if buf[i]:
            p = i + i + 3
            s = p * (i + 1) + i
            buf[s::p] = [False] * ((lmtbf - s) // p + 1)
    return [2] + [i + i + 3 for i, v in enumerate(buf) if v]

Note that "range" needs to be changed to "xrange" for maximum speed with Python 2.
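
The index arithmetic in primes2 can be checked directly. Index i of buf stands for the odd number p = 2i + 3, so the index of p squared is (p*p - 3) / 2, which equals the start index s = p*(i + 1) + i used in the code; the slice buf[s::p] then holds exactly (lmtbf - s) // p + 1 entries. The following is only a verification sketch, and the helper name check_odds_only_indexing is hypothetical:

```python
def check_odds_only_indexing(limit):
    """Verify the start index and slice length used by the odds-only sieve."""
    lmtbf = (limit - 3) // 2
    for i in range((int(limit ** 0.5) - 3) // 2 + 1):
        p = i + i + 3                      # odd number represented by index i
        s = p * (i + 1) + i                # claimed index of p * p
        assert 2 * s + 3 == p * p          # index s really represents p * p
        # the replacement list must match the extended slice length exactly
        assert len(range(s, lmtbf + 1, p)) == (lmtbf - s) // p + 1
    return True


print(check_odds_only_indexing(10 ** 6))
```

The second assertion matters because Python requires the right-hand side of an extended-slice assignment to have exactly the same length as the slice.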

Odds-only version of the generator version above

The following code is faster than the generator version above, both because it sieves only the odd numbers (for a factor of over two) and because it uses slice assignment for composite number culling, which avoids extra work by the interpreter:

def iprimes2(limit):
    if limit < 2: return
    yield 2
    if limit < 3: return
    lmtbf = (limit - 3) // 2
    buf = [True] * (lmtbf + 1)
    for i in range((int(limit ** 0.5) - 3) // 2 + 1):
        if buf[i]:
            p = i + i + 3
            s = p * (i + 1) + i
            buf[s::p] = [False] * ((lmtbf - s) // p + 1)
    for i in range(lmtbf + 1):
        if buf[i]: yield (i + i + 3)

Note that this version may actually run slightly faster than the equivalent array version with the advantage that the output doesn't require any memory.

Also note that "range" needs to be changed to "xrange" for maximum speed with Python 2.

Factorization wheel235 version of the generator version

This uses a 2/3/5 factorization wheel for further reductions in operations; the same technique can be applied to the array version as well. It runs slightly faster and uses slightly less memory than the odds-only algorithms:

def primes235(limit):
    for p in (2, 3, 5):                     # wheel primes, none above limit
        if p > limit: return
        yield p
    if limit < 7: return
    modPrms = [7,11,13,17,19,23,29,31]
    gaps = [4,2,4,2,4,6,2,6,4,2,4,2,4,6,2,6] # 2 loops for overflow
    ndxs = [0,0,0,0,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5,5,5,6,6,7,7,7,7,7,7]
    lmtbf = (limit + 23) // 30 * 8 - 1 # integral number of wheels rounded up
    lmtsqrt = (int(limit ** 0.5) - 7)
    lmtsqrt = lmtsqrt // 30 * 8 + ndxs[lmtsqrt % 30] # round down on the wheel
    buf = [True] * (lmtbf + 1)
    for i in range(lmtsqrt + 1):
        if buf[i]:
            ci = i & 7; p = 30 * (i >> 3) + modPrms[ci]
            s = p * p - 7; p8 = p << 3
            for j in range(8):
                c = s // 30 * 8 + ndxs[s % 30]
                buf[c::p8] = [False] * ((lmtbf - c) // p8 + 1)
                s += p * gaps[ci]; ci += 1
    for i in range(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
        if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])

Note: Much of the time taken by any of these array/list or generator algorithms (almost two thirds for this last case under Python 2.7.6) is spent computing and enumerating the final output in the last line(s), so even slight changes to those lines can greatly affect execution time. Under Python 3 this enumeration is about twice as slow as under Python 2 (Python 3.3 is slow and 3.4 slower), so an even bigger percentage of the time is spent just outputting the results. This slow enumeration means that there is little advantage in versions that use even further wheel factorization, as the composite number culling takes only a small part of the time compared with enumerating the results.

If just the count of the number of primes over a range is desired, then converting the functions to prime counting functions by changing the final enumeration lines to "return buf.count(True)" will save a lot of time.
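
As a sketch of that suggestion, here is the odds-only array version with its final enumeration replaced by a count. The function name count_primes2 is hypothetical; the sieving loop is the one from primes2 above, and the added 1 accounts for the prime 2, which the odds-only buffer does not represent.

```python
def count_primes2(limit):
    """Count the primes <= limit with the odds-only sieve."""
    if limit < 2: return 0
    if limit < 3: return 1
    lmtbf = (limit - 3) // 2
    buf = [True] * (lmtbf + 1)             # buf[i] represents 2 * i + 3
    for i in range((int(limit ** 0.5) - 3) // 2 + 1):
        if buf[i]:
            p = i + i + 3
            s = p * (i + 1) + i            # index of p * p
            buf[s::p] = [False] * ((lmtbf - s) // p + 1)
    return 1 + buf.count(True)             # 1 for the prime 2


print(count_primes2(100))
```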

Note that "range" needs to be changed to "xrange" for maximum speed with Python 2 where Python 2's "xrange" is a better choice for very large sieve ranges.
Timings were done primarily in Python 2 although source is Python 2/3 compatible (shows range and not xrange).

Using numpy

Library: NumPy

The code below is adapted from literateprograms.org and uses NumPy:

import numpy
def primes_upto2(limit):
    is_prime = numpy.ones(limit + 1, dtype=numpy.bool_)
    for n in range(2, int(limit**0.5 + 1.5)): 
        if is_prime[n]:
            is_prime[n*n::n] = 0
    return numpy.nonzero(is_prime)[0][2:]

Performance note: there is no point in adding wheels here, because executing is_prime[n*n::n] = 0 and nonzero() takes almost all of the time.

Also see Prime numbers and Numpy – Python.

Using wheels with numpy

Version with wheel based optimization:

from numpy import array, bool_, multiply, nonzero, ones, put, resize
#
def makepattern(smallprimes):
    pattern = ones(multiply.reduce(smallprimes), dtype=bool_)
    pattern[0] = 0
    for p in smallprimes:
        pattern[p::p] = 0
    return pattern
#
def primes_upto3(limit, smallprimes=(2,3,5,7,11)):    
    sp = array(smallprimes)
    if limit <= sp.max(): return sp[sp <= limit]
    #
    isprime = resize(makepattern(sp), limit + 1) 
    isprime[:2] = 0; put(isprime, sp, 1) 
    #
    for n in range(sp.max() + 2, int(limit**0.5 + 1.5), 2): 
        if isprime[n]:
            isprime[n*n::n] = 0 
    return nonzero(isprime)[0]

Examples:

>>> primes_upto3(10**6, smallprimes=(2,3))            # Wall time: 0.17
array([     2,      3,      5, ..., 999961, 999979, 999983])
>>> primes_upto3(10**7, smallprimes=(2,3))            # Wall time: 2.13
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])
>>> primes_upto3(15)
array([ 2,  3,  5,  7, 11, 13])
>>> primes_upto3(10**7, smallprimes=primes_upto3(15)) # Wall time: 1.31
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])
>>> primes_upto2(10**7)                               # Wall time: 1.39 <-- version without wheels
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])
>>> primes_upto3(10**7)                               # Wall time: 1.30
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])

The examples above show that this wheel-based optimization yields no significant performance gain.

Infinite generator

A generator that will generate primes indefinitely (perhaps until it runs out of memory). Used as a library here.

Works with: Python version 2.6+, 3.x
import heapq

# generates all prime numbers
def sieve():
    # priority queue of the sequences of non-primes
    # the priority queue allows us to get the "next" non-prime quickly
    nonprimes = []
    
    i = 2
    while True:
        if nonprimes and i == nonprimes[0][0]: # non-prime
            while nonprimes[0][0] == i:
                # for each sequence that generates this number,
                # have it go to the next number (simply add the prime)
                # and re-position it in the priority queue
                x = nonprimes[0]
                x[0] += x[1]
                heapq.heapreplace(nonprimes, x)
        
        else: # prime
            # insert a 2-element list into the priority queue:
            # [current multiple, prime]
            # the first element allows sorting by value of current multiple
            # we start with i^2
            heapq.heappush(nonprimes, [i*i, i])
            yield i
        
        i += 1

Example:

>>> foo = sieve()
>>> for i in range(8):
...     print(next(foo))
... 
2
3
5
7
11
13
17
19

Infinite generator with a faster algorithm

Adding each discovered prime's incremental step information to the mapping should be postponed until the prime's square is seen amongst the candidate numbers, as it is useless before that point. This drastically reduces the space complexity from O(n) to O(sqrt(n/log(n))), in n primes produced, and also lowers the run time complexity considerably (this test entry in Python 2.7 and this test entry in Python 3.x show an empirical order of growth of about n^1.08, which is very close to the theoretical value of O(n log(n) log(log(n))), in n primes produced):

Works with: Python version 2.6+, 3.x
def primes():
    yield 2; yield 3; yield 5; yield 7;
    bps = (p for p in primes())             # separate supply of "base" primes (b.p.)
    p = next(bps) and next(bps)             # discard 2, then get 3
    q = p * p                               # 9 - square of next base prime to keep track of,
    sieve = {}                              #                       in the sieve dict
    n = 9                                   # n is the next candidate number
    while True:
        if n not in sieve:                  # n is not a multiple of any of base primes,
            if n < q:                       # below next base prime's square, so
                yield n                     # n is prime
            else:
                p2 = p + p                  # n == p * p: for prime p, add p * p + 2 * p
                sieve[q + p2] = p2          #   to the dict, with 2 * p as the increment step
                p = next(bps); q = p * p    # pull next base prime, and get its square
        else:
            s = sieve.pop(n); nxt = n + s   # n's a multiple of some b.p., find next multiple
            while nxt in sieve: nxt += s    # ensure each entry is unique
            sieve[nxt] = s                  # nxt is next non-marked multiple of this prime
        n += 2                              # work on odds only
 
import itertools
def primes_up_to(limit):
    return list(itertools.takewhile(lambda p: p <= limit, primes()))
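
To make the space saving visible, the generator can be instrumented to report how many entries its dict holds. The following is a sketch under stated assumptions, not the code above verbatim: the name primes_with_dict_size is hypothetical, the branches are reordered, and the uniqueness check is also applied (defensively) when a new base prime is inserted.

```python
from itertools import takewhile


def primes_with_dict_size():
    """Yield (prime, number of entries currently in this generator's dict)."""
    for p in (2, 3, 5, 7):
        yield p, 0
    bps = primes_with_dict_size()       # separate supply of base primes
    next(bps)                           # discard 2
    p = next(bps)[0]                    # 3
    q = p * p                           # 9: square of the next base prime
    sieve = {}
    n = 9                               # next candidate, odds only
    while True:
        if n in sieve:                  # n is a multiple of a base prime
            s = sieve.pop(n)
        elif n < q:                     # below the next square: n is prime
            yield n, len(sieve)
            n += 2
            continue
        else:                           # n == p * p: start culling by p
            s = p + p
            p = next(bps)[0]
            q = p * p
        nxt = n + s
        while nxt in sieve:             # keep each dict key unique
            nxt += s
        sieve[nxt] = s
        n += 2


pairs = list(takewhile(lambda pair: pair[0] < 10000, primes_with_dict_size()))
print(len(pairs), pairs[-1])
```

The dict gains one entry each time a base prime's square is reached and never shrinks, so its size tracks the number of odd primes up to the square root of the current candidate rather than the number of primes produced.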

Fast infinite generator using a wheel

Although theoretically over three times faster than odds-only, the following code using a 2/3/5/7 wheel is only about 1.5 times faster than the above odds-only code due to the extra overheads in code complexity. The test link for Python 2.7 and test link for Python 3.x show about the same empirical order of growth as the odds-only implementation above once the range grows enough so the dict operations become amortized to a constant factor.

Works with: Python version 2.6+, 3.x
def primes():
    for p in [2,3,5,7]: yield p                 # base wheel primes
    gaps1 = [ 2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8 ]
    gaps = gaps1 + [ 6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10 ] # wheel2357
    def wheel_prime_pairs():
        yield (11,0); bps = wheel_prime_pairs() # additional primes supply
        p, pi = next(bps); q = p * p            # adv to get 11 sqr'd is 121 as next square to put
        sieve = {}; n = 13; ni = 1              #   into sieve dict; init candidate, wheel ndx
        while True:
            if n not in sieve:                  # is not a multiple of previously recorded primes
                if n < q: yield (n, ni)         # n is prime with wheel modulo index
                else:
                    npi = pi + 1                # advance wheel index
                    if npi > 47: npi = 0
                    sieve[q + p * gaps[pi]] = (p, npi) # n == p * p: put next cull position on wheel
                    p, pi = next(bps); q = p * p  # advance next prime and prime square to put
            else:
                s, si = sieve.pop(n)
                nxt = n + s * gaps[si]          # move current cull position up the wheel
                si = si + 1                     # advance wheel index
                if si > 47: si = 0
                while nxt in sieve:             # ensure each entry is unique by wheel
                    nxt += s * gaps[si]
                    si = si + 1                 # advance wheel index
                    if si > 47: si = 0
                sieve[nxt] = (s, si)            # next non-marked multiple of a prime
            nni = ni + 1                        # advance wheel index
            if nni > 47: nni = 0
            n += gaps[ni]; ni = nni             # advance on the wheel
    for p, pi in wheel_prime_pairs(): yield p   # strip out indexes
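
The hard-coded gaps table can be derived rather than typed in. A minimal sketch, assuming a hypothetical helper wheel_gaps: it lists the numbers coprime to the wheel circumference over one full turn, beginning at the first prime after the wheel primes, and takes successive differences.

```python
from math import gcd


def wheel_gaps(wheel_primes, start):
    """Gaps between successive numbers coprime to all wheel_primes,
    over one turn of the wheel beginning at start (itself coprime)."""
    circumference = 1
    for p in wheel_primes:
        circumference *= p
    spokes = [n for n in range(start, start + circumference + 1)
              if gcd(n, circumference) == 1]
    return [b - a for a, b in zip(spokes, spokes[1:])]


gaps2357 = wheel_gaps([2, 3, 5, 7], 11)
print(len(gaps2357), sum(gaps2357))
```

For the 2/3/5/7 wheel this gives 48 gaps that sum to the circumference 210, which is why the wheel indexes in the code above wrap around after 47.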

Further gains of about 1.5 times in speed can be made with the same code, changing only the tables and a few constants, by using a 2/3/5/7/11/13/17 wheel (with a gaps list 92160 elements long), which is computed for a slight constant overhead in time, as per the test link for Python 2.7 and test link for Python 3.x. Further wheel factorization is not really worth it, as the gains would be small (if there are gains at all rather than losses) and the gaps table huge; it is already too big for efficient use by 32-bit Python 3, and the wheel should likely be stopped at 13:

def primes():
    whlPrms = [2,3,5,7,11,13,17]                # base wheel primes
    for p in whlPrms: yield p
    def makeGaps():
        buf = [True] * (3 * 5 * 7 * 11 * 13 * 17 + 1) # all odds plus extra for o/f
        for p in whlPrms:
            if p < 3:
                continue              # no need to handle evens
            strt = (p * p - 19) >> 1            # start position (divided by 2 using shift)
            while strt < 0: strt += p
            buf[strt::p] = [False] * ((len(buf) - strt - 1) // p + 1) # cull for p
        whlPsns = [i + i for i,v in enumerate(buf) if v]
        return [whlPsns[i + 1] - whlPsns[i] for i in range(len(whlPsns) - 1)]
    gaps = makeGaps()                           # big wheel gaps
    def wheel_prime_pairs():
        yield (19,0); bps = wheel_prime_pairs() # additional primes supply
        p, pi = next(bps); q = p * p            # adv to get 19 sqr'd is 361 as next square to put
        sieve = {}; n = 23; ni = 1              #   into sieve dict; init candidate, wheel ndx
        while True:
            if n not in sieve:                  # is not a multiple of previously recorded primes
                if n < q: yield (n, ni)         # n is prime with wheel modulo index
                else:
                    npi = pi + 1                # advance wheel index
                    if npi > 92159: npi = 0
                    sieve[q + p * gaps[pi]] = (p, npi) # n == p * p: put next cull position on wheel
                    p, pi = next(bps); q = p * p  # advance next prime and prime square to put
            else:
                s, si = sieve.pop(n)
                nxt = n + s * gaps[si]          # move current cull position up the wheel
                si = si + 1                     # advance wheel index
                if si > 92159: si = 0
                while nxt in sieve:             # ensure each entry is unique by wheel
                    nxt += s * gaps[si]
                    si = si + 1                 # advance wheel index
                    if si > 92159: si = 0
                sieve[nxt] = (s, si)            # next non-marked multiple of a prime
            nni = ni + 1                        # advance wheel index
            if nni > 92159: nni = 0
            n += gaps[ni]; ni = nni             # advance on the wheel
    for p, pi in wheel_prime_pairs(): yield p   # strip out indexes


Iterative sieve on unbounded count from 2

See Extensible prime generator: Iterative sieve on unbounded count from 2

Quackery

  [ dup 1
    [ 2dup > while
      + 1 >>
      2dup / again ]
    drop nip ]                  is sqrt         ( n --> n )
 
  [ stack [ 3 ~ ] constant ]    is primes       (   --> s )
 
  ( If a number is prime, the corresponding bit on the
    number on the primes ancillary stack is set.
    Initially all the bits are set except for 0 and 1,
    which are not prime numbers by definition. 
    "eratosthenes" unsets all bits above those specified
    by its argument. )
 
  [ bit ~
    primes take & primes put ]  is -composite   ( n -->   )
 
  [ bit primes share & 0 != ]   is isprime      ( n --> b )
 
  [ dup dup sqrt times
      [ i^ 1+
        dup isprime if
          [ dup 2 **
            [ dup -composite
              over +
              rot 2dup >
              dip unrot until ]
            drop ]
        drop ]
     drop 
     1+ bit 1 -
     primes take & 
     primes put ]                 is eratosthenes ( n -->   )
 
  100 eratosthenes
 
  100 times [ i^ isprime if [ i^ echo sp ] ]

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
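
The approach of keeping the whole sieve in the bits of a single large integer can also be sketched in Python, whose integers are likewise unbounded. This is an illustrative sketch, not a translation of the Quackery code: the names eratosthenes_bits and is_prime are hypothetical, and math.isqrt requires Python 3.8 or later.

```python
from math import isqrt


def eratosthenes_bits(n):
    """Return an int whose bit k (for 0 <= k <= n) is set iff k is prime."""
    bits = (1 << (n + 1)) - 1          # bits 0..n all set
    bits &= ~0b11                      # 0 and 1 are not prime
    for p in range(2, isqrt(n) + 1):
        if (bits >> p) & 1:            # p is still marked prime
            for m in range(p * p, n + 1, p):
                bits &= ~(1 << m)      # clear the bit of each multiple
    return bits


def is_prime(bits, k):
    """Test bit k of the sieve integer."""
    return (bits >> k) & 1 == 1


sieve_bits = eratosthenes_bits(100)
print([k for k in range(101) if is_prime(sieve_bits, k)])
```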

R

sieve <- function(n) {
  if (n < 2) integer(0)
  else {
    primes <- rep(T, n)
    primes[[1]] <- F
    for(i in seq(sqrt(n))) {
      if(primes[[i]]) {
        primes[seq(i * i, n, i)] <- F
      }
    }
    which(primes)
  }
}

sieve(1000)
Output:
  [1]   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61
 [19]  67  71  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151
 [37] 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
 [55] 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
 [73] 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
 [91] 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
[109] 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
[127] 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
[145] 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
[163] 967 971 977 983 991 997

Alternate Odds-Only Version

sieve <- function(n) {
  if (n < 2) return(integer(0))
  lmt <- (sqrt(n) - 1) / 2
  sz <- (n - 1) / 2
  buf <- rep(TRUE, sz)
  for(i in seq(lmt)) {
    if (buf[i]) {
      buf[seq((i + i) * (i + 1), sz, by=(i + i + 1))] <- FALSE
    }
  }
  cat(2, sep='')
  for(i in seq(sz)) {
    if (buf[i]) {
      cat(" ", (i + i + 1), sep='')
    }
  }
}

sieve(1000)
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Racket

Imperative versions

Ugly imperative version:

#lang racket

(define (sieve n)
  (define non-primes '())
  (define primes '())
  (for ([i (in-range 2 (add1 n))])
    (unless (member i non-primes)
      (set! primes (cons i primes))
      (for ([j (in-range (* i i) (add1 n) i)])
        (set! non-primes (cons j non-primes)))))
  (reverse primes))

(sieve 100)

A little nicer, but still imperative:

#lang racket
(define (sieve n)
  (define primes (make-vector (add1 n) #t))
  (for* ([i (in-range 2 (add1 n))]
         #:when (vector-ref primes i)
         [j (in-range (* i i) (add1 n) i)])
    (vector-set! primes j #f))
  (for/list ([n (in-range 2 (add1 n))]
             #:when (vector-ref primes n))
    n))
(sieve 100)

Imperative version using a bit vector:

#lang racket
(require data/bit-vector)
;; Returns a list of prime numbers up to natural number limit
(define (eratosthenes limit)
  (define bv (make-bit-vector (+ limit 1) #f))
  (bit-vector-set! bv 0 #t)
  (bit-vector-set! bv 1 #t)
  (for* ([i (in-range (add1 (sqrt limit)))] #:unless (bit-vector-ref bv i)
         [j (in-range (* 2 i) (bit-vector-length bv) i)])
    (bit-vector-set! bv j #t))
  ;; translate to a list of primes
  (for/list ([i (bit-vector-length bv)] #:unless (bit-vector-ref bv i)) i))
(eratosthenes 100)
Output:

'(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Infinite list of primes Using laziness

These examples use infinite lists (streams) to implement the sieve of Eratosthenes in a functional way, producing all prime numbers. The following functions are used as a prefix for the pieces of code that follow:

#lang lazy
(define (ints-from i d) (cons i (ints-from (+ i d) d)))
(define (after n l f)
  (if (< (car l) n) (cons (car l) (after n (cdr l) f)) (f l)))
(define (diff l1 l2)
  (let ([x1 (car l1)] [x2 (car l2)])
    (cond [(< x1 x2) (cons x1 (diff (cdr l1)      l2 ))]
          [(> x1 x2)          (diff      l1  (cdr l2)) ]
          [else               (diff (cdr l1) (cdr l2)) ])))
(define (union l1 l2)        ; union of two lists
  (let ([x1 (car l1)] [x2 (car l2)])
    (cond [(< x1 x2) (cons x1 (union (cdr l1)      l2 ))]
          [(> x1 x2) (cons x2 (union      l1  (cdr l2)))]
          [else      (cons x1 (union (cdr l1) (cdr l2)))])))
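
For readers less familiar with lazy Racket, the behaviour of diff and union on ordered infinite streams can be sketched with Python generators. This is a rough analogue under stated assumptions: both inputs are assumed to be infinite and strictly increasing, and the sieve function shown is an illustrative counterpart of the unoptimized stream sieve, not a translation of any particular listing.

```python
from itertools import count, islice


def diff(xs, ys):
    """Elements of the increasing stream xs that do not occur in ys."""
    xs, ys = iter(xs), iter(ys)
    x, y = next(xs), next(ys)
    while True:
        if x < y:
            yield x
            x = next(xs)
        elif x > y:
            y = next(ys)
        else:
            x, y = next(xs), next(ys)


def union(xs, ys):
    """Ordered merge of two increasing streams, without duplicates."""
    xs, ys = iter(xs), iter(ys)
    x, y = next(xs), next(ys)
    while True:
        if x < y:
            yield x
            x = next(xs)
        elif x > y:
            yield y
            y = next(ys)
        else:
            yield x
            x, y = next(xs), next(ys)


def sieve(stream):
    """Keep the head, then sieve the rest with the head's multiples removed."""
    x = next(stream)
    yield x
    yield from sieve(diff(stream, count(x + x, x)))


print(list(islice(sieve(count(2)), 25)))
```

Each prime found adds one more nested generator, so this sketch is only suitable for small demonstrations.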

Basic sieve

(define (sieve l)
  (define x (car l))
  (cons x (sieve (diff (cdr l) (ints-from (+ x x) x)))))
(define primes (sieve (ints-from 2 1)))
(!! (take 25 primes))

Runs at ~ n^2.1 empirically, for n <= 1500 primes produced.

With merged composites

Note that the first number, 2, and its multiples stream (ints-from 4 2) are handled separately to ensure that the non-primes list is never empty, which simplifies the code for union which assumes non-empty infinite lists.

(define (sieve l non-primes)
  (let ([x (car l)] [np (car non-primes)])
    (cond [(= x np)     (sieve (cdr l) (cdr  non-primes))]    ; else x < np
          [else (cons x (sieve (cdr l) (union (ints-from (* x x) x)  
                                               non-primes)))]))) 
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))

Basic sieve Optimized with postponed processing

Since a prime's multiples that count start from its square, we should only start removing them when we reach that square.

(define (sieve l prs)
  (define p (car prs))
  (define q (* p p))
  (after q l (λ(t) (sieve (diff t (ints-from q p)) (cdr prs)))))
(define primes (cons 2 (sieve (ints-from 3 1) primes)))

Runs at ~ n^1.4 up to n=10,000. The initial 2 in the self-referential primes definition is needed to prevent a "black hole".

Merged composites Optimized with postponed processing

Since a prime's multiples that matter start from its square, we should only add them when we reach that square.

(define (composites l q primes)
  (after q l 
    (λ(t) 
      (let ([p (car primes)] [r (cadr primes)])
        (composites (union t (ints-from q p))   ; q = p*p
                    (* r r) (cdr primes))))))
(define primes (cons 2 
                 (diff (ints-from 3 1)
                       (composites (ints-from 4 2) 9 (cdr primes)))))

Implementation of Richard Bird's algorithm

Appears in M. O'Neill's paper. On its own it achieves the proper postponement that is specifically arranged for in the version above (with after), and it is yet more efficient, because it folds to the right and so builds the right-leaning structure of merges at run time, where the more frequently-producing streams of multiples appear higher in that structure, so the composite numbers they produce have fewer merge nodes to percolate through:

(define primes
  (cons 2 (diff (ints-from 3 1)
                (foldr (λ(p r) (define q (* p p))
                               (cons q (union (ints-from (+ q p) p) r)))
                       '() primes))))

Using threads and channels

Same algorithm as "merged composites" above (without the postponement optimization), but now using threads and channels to produce a channel of all prime numbers (similar to newsqueak). The macro at the top is a convenient wrapper around definitions of channels using a thread that feeds them.

#lang racket
(define-syntax (define-thread-loop stx)
  (syntax-case stx ()
    [(_ (name . args) expr ...)
     (with-syntax ([out! (datum->syntax stx 'out!)])
       #'(define (name . args)
           (define out (make-channel))
           (define (out! x) (channel-put out x))
           (thread (λ() (let loop () expr ... (loop))))
           out))]))
(define-thread-loop (ints-from i d) (out! i) (set! i (+ i d)))
(define-thread-loop (merge c1 c2)
  (let loop ([x1 (channel-get c1)] [x2 (channel-get c2)])
    (cond [(> x1 x2) (out! x2) (loop x1 (channel-get c2))]
          [(< x1 x2) (out! x1) (loop (channel-get c1) x2)]
          [else      (out! x1) (loop (channel-get c1) (channel-get c2))])))
(define-thread-loop (sieve l non-primes)
  (let loop ([x (channel-get l)] [np (channel-get non-primes)])
    (cond [(> x np) (loop x (channel-get non-primes))]
          [(= x np) (loop (channel-get l) (channel-get non-primes))]
          [else     (out! x) 
                    (set! non-primes (merge (ints-from (* x x) x) non-primes))
                    (loop (channel-get l)  np)])))
(define-thread-loop (cons x l)
  (out! x) (let loop () (out! (channel-get l)) (loop)))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(for/list ([i 25] [x (in-producer channel-get eof primes)]) x)

Using generators

Yet another variation of the same algorithm as above, this time using generators.

#lang racket
(require racket/generator)
(define (ints-from i d)
  (generator () (let loop ([i i]) (yield i) (loop (+ i d)))))
(define (merge g1 g2)
  (generator ()
    (let loop ([x1 (g1)] [x2 (g2)])
      (cond [(< x1 x2) (yield x1) (loop (g1) x2)]
            [(> x1 x2) (yield x2) (loop x1 (g2))]
            [else      (yield x1) (loop (g1) (g2))]))))
(define (sieve l non-primes)
  (generator ()
    (let loop ([x (l)] [np (non-primes)])
      (cond [(> x np) (loop x (non-primes))]
            [(= x np) (loop (l) (non-primes))]
            [else (yield x)
                  (set! non-primes (merge (ints-from (* x x) x) non-primes))
                  (loop (l) np)]))))
(define (cons x l) (generator () (yield x) (let loop () (yield (l)) (loop))))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(for/list ([i 25] [x (in-producer primes)]) x)

Raku

(formerly Perl 6)

sub sieve( Int $limit ) {
    my @is-prime = False, False, slip True xx $limit - 1;

    gather for @is-prime.kv -> $number, $is-prime {
        if $is-prime {
            take $number;
            loop (my $s = $number**2; $s <= $limit; $s += $number) {
                @is-prime[$s] = False;
            }
        }
    }
}

(sieve 100).join(",").say;

A set-based approach

More or less the same as the first Python example:

sub eratsieve($n) {
    # Requires n(1 - 1/(log(n-1))) storage
    my $multiples = set();
    gather for 2..$n -> $i {
        unless $i (&) $multiples { # empty intersection: $i is not a known multiple
            take $i;
            $multiples (+)= set($i**2, *+$i ... (* > $n)); # baggy addition, acts as a union here
        }
    }
}

say flat eratsieve(100);

This gives:

 (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Using a chain of filters

This example is incorrect. Please fix the code and remove this message.

Details: This version uses modulo (division) testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: while this is "incorrect" by a strict interpretation of the rules, it is being left as an interesting example.

sub primes ( UInt $n ) {
    gather {
        # create an iterator from 2 to $n (inclusive)
        my $iterator := (2..$n).iterator;

        loop {
            # If it passed all of the filters it must be prime
            my $prime := $iterator.pull-one;
            # unless it is actually the end of the sequence
            last if $prime =:= IterationEnd;

            take $prime; # add the prime to the `gather` sequence

            # filter out the factors of the current prime
            $iterator := Seq.new($iterator).grep(* % $prime).iterator;
            # (2..*).grep(* % 2).grep(* % 3).grep(* % 5).grep(* % 7)…
        }
    }
}

put primes( 100 );

Which prints

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

RATFOR

program prime
#
define(true,1)
define(false,0)
#
integer loop,loop2,limit,k,primes,count
integer isprime(1000)

limit = 1000
count = 0

for (loop=1; loop<=limit; loop=loop+1)
    {
       isprime(loop) = true
    }

isprime(1) = false

for (loop=2; loop<=limit; loop=loop+1)


    {
       if (isprime(loop) == true) 
          {
              count = count + 1
              for (loop2=loop*loop; loop2 <= limit; loop2=loop2+loop)
                 {
                     isprime(loop2) = false
                 }
          }
    }
write(*,*)
write(*,101) count

101 format('There are ',I12,' primes.')

count = 0
for (loop=1; loop<=limit; loop=loop+1)
        if (isprime(loop) == true)
           {
               count = count + 1
               write(*,'(I6,$)')loop
               if (mod(count,10) == 0) write(*,*)
           }
write(*,*)

end

Red

primes: function [n [integer!]][
   poke prim: make bitset! n 1 true
   r: 2 while [r * r <= n][
      repeat q n / r - 1 [poke prim q + 1 * r true] 
      until [not pick prim r: r + 1]
   ]
   collect [repeat i n [if not prim/:i [keep i]]]
]

primes 100
== [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

Refal

$ENTRY Go {
    = <Print <Primes 100>>;
};

Primes {
    s.N = <Sieve <Iota 2 s.N>>;
};

Iota {
    s.End s.End = s.End;
    s.Start s.End = s.Start <Iota <+ 1 s.Start> s.End>;
};

Cross {
    s.Step e.List = <Cross (s.Step 1) s.Step e.List>;
    (s.Step s.Skip) = ;
    (s.Step 1) s.Item e.List = X <Cross (s.Step s.Step) e.List>;
    (s.Step s.N) s.Item e.List = s.Item <Cross (s.Step <- s.N 1>) e.List>;
};

Sieve {
    = ;
    X e.List = <Sieve e.List>;
    s.N e.List = s.N <Sieve <Cross s.N e.List>>;
};
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

REXX

no wheel version

The first three REXX versions make use of a sparse stemmed array:   [@.].

As the stemmed array gets heavily populated, the number of entries may slow down the REXX interpreter substantially,
depending upon the efficacy of the hashing technique being used for REXX variables (setting/retrieving).

/*REXX program generates and displays primes  via the  sieve of Eratosthenes  algorithm.*/
parse arg H .;   if H=='' | H==","  then H= 200  /*obtain optional argument from the CL.*/
w= length(H);    @prime= right('prime', 20)      /*W:   is used for aligning the output.*/
@.=.                                             /*assume all the numbers are  prime.   */
#= 0                                             /*number of primes found  (so far).    */
     do j=2  for H-1;   if @.j==''  then iterate /*all prime integers up to H inclusive.*/
     #= # + 1                                    /*bump the prime number counter.       */
     say  @prime right(#,w)  " ───► " right(j,w) /*display the  prime  to the terminal. */
         do m=j*j  to H  by j;    @.m=;   end    /*strike all multiples as being ¬ prime*/
     end   /*j*/                                 /*       ───                           */
say                                              /*stick a fork in it,  we're all done. */
say  right(#, 1+w+length(@prime) )     'primes found up to and including '       H

output   when using the input default of:   200

               prime   1  ───►    2
               prime   2  ───►    3
               prime   3  ───►    5
               prime   4  ───►    7
               prime   5  ───►   11
               prime   6  ───►   13
               prime   7  ───►   17
               prime   8  ───►   19
               prime   9  ───►   23
               prime  10  ───►   29
               prime  11  ───►   31
               prime  12  ───►   37
               prime  13  ───►   41
               prime  14  ───►   43
               prime  15  ───►   47
               prime  16  ───►   53
               prime  17  ───►   59
               prime  18  ───►   61
               prime  19  ───►   67
               prime  20  ───►   71
               prime  21  ───►   73
               prime  22  ───►   79
               prime  23  ───►   83
               prime  24  ───►   89
               prime  25  ───►   97
               prime  26  ───►  101
               prime  27  ───►  103
               prime  28  ───►  107
               prime  29  ───►  109
               prime  30  ───►  113
               prime  31  ───►  127
               prime  32  ───►  131
               prime  33  ───►  137
               prime  34  ───►  139
               prime  35  ───►  149
               prime  36  ───►  151
               prime  37  ───►  157
               prime  38  ───►  163
               prime  39  ───►  167
               prime  40  ───►  173
               prime  41  ───►  179
               prime  42  ───►  181
               prime  43  ───►  191
               prime  44  ───►  193
               prime  45  ───►  197
               prime  46  ───►  199

                      46 primes found up to and including  200

wheel version, optional prime list suppression

This version skips striking the even numbers   (as being not prime);   2   is handled as a special case.

Also supported is the suppression of listing the primes if the   H   (high limit)   is negative.

Also added is a final message indicating the number of primes found.

/*REXX program generates primes via a  wheeled  sieve of Eratosthenes  algorithm.       */
parse arg H .;   if H==''  then H=200            /*let the highest number be specified. */
tell=h>0;     H=abs(H);    w=length(H)           /*a negative H suppresses prime listing*/
if 2<=H & tell  then say right(1, w+20)'st prime   ───► '      right(2, w)
@.= '0'x                                         /*assume that  all  numbers are prime. */
cw= length(@.)                                   /*the cell width that holds numbers.   */
#= 2<=H                                          /*the number of primes found  (so far).*/
!=0                                              /*skips the top part of sieve marking. */
    do j=3  by 2  for (H-2)%2;  b= j%cw          /*odd integers up to   H   inclusive.  */
    if substr(x2b(c2x(@.b)),j//cw+1,1)  then iterate              /*is  J  composite ?  */
    #= # + 1                                     /*bump the prime number counter.       */
    if tell  then say right(#, w+20)th(#)    'prime   ───► '      right(j, w)
    if !     then iterate                        /*should the top part be skipped ?     */
    jj=j * j                                     /*compute the square of  J.         ___*/
    if jj>H  then !=1                            /*indicates skip top part  if  j > √ H */
      do m=jj  to H  by j+j;   call . m;   end   /* [↑]  strike odd multiples  ¬ prime  */
    end   /*j*/                                  /*             ───                     */

say;             say  right(#, w+20)      'prime's(#)    "found up to and including "  H
exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────────────*/
.: parse arg n; b=n%cw; r=n//cw+1;_=x2b(c2x(@.b));@.b=x2c(b2x(left(_,r-1)'1'substr(_,r+1)));return
s: if arg(1)==1  then return arg(3);  return word(arg(2) 's',1)            /*pluralizer.*/
th: procedure; parse arg x; x=abs(x); return word('th st nd rd',1+x//10*(x//100%10\==1)*(x//10<4))
output   when using the input default of:     200
                      1st prime   ───►    2
                      2nd prime   ───►    3
                      3rd prime   ───►    5
                      4th prime   ───►    7
                      5th prime   ───►   11
                      6th prime   ───►   13
                      7th prime   ───►   17
                      8th prime   ───►   19
                      9th prime   ───►   23
                     10th prime   ───►   29
                     11th prime   ───►   31
                     12th prime   ───►   37
                     13th prime   ───►   41
                     14th prime   ───►   43
                     15th prime   ───►   47
                     16th prime   ───►   53
                     17th prime   ───►   59
                     18th prime   ───►   61
                     19th prime   ───►   67
                     20th prime   ───►   71
                     21st prime   ───►   73
                     22nd prime   ───►   79
                     23rd prime   ───►   83
                     24th prime   ───►   89
                     25th prime   ───►   97
                     26th prime   ───►  101
                     27th prime   ───►  103
                     28th prime   ───►  107
                     29th prime   ───►  109
                     30th prime   ───►  113
                     31st prime   ───►  127
                     32nd prime   ───►  131
                     33rd prime   ───►  137
                     34th prime   ───►  139
                     35th prime   ───►  149
                     36th prime   ───►  151
                     37th prime   ───►  157
                     38th prime   ───►  163
                     39th prime   ───►  167
                     40th prime   ───►  173
                     41st prime   ───►  179
                     42nd prime   ───►  181
                     43rd prime   ───►  191
                     44th prime   ───►  193
                     45th prime   ───►  197
                     46th prime   ───►  199

                     46 primes found up to and including  200
output   when using the input of:     -1000
                     168 primes found up to and including  1000

output   when using the input of:     -10000
                     1229 primes found up to and including  10000
output   when using the input of:     -100000
                     9592 primes found up to and including  100000
output   when using the input of:     -1000000
                     78498 primes found up to and including  1000000
output   when using the input of:     -10000000
                     664579 primes found up to and including  10000000

wheel version

This version skips striking the even numbers   (as being not prime);   2   is handled as a special case.

It also uses a short-circuit test, so that composites are struck out only while the prime is   ≤   √ H.

/*REXX pgm generates and displays primes via a wheeled sieve of Eratosthenes algorithm. */
parse arg H .;  if H=='' | H==","  then H= 200   /*obtain the optional argument from CL.*/
w= length(H);       @prime= right('prime', 20)   /*w:  is used for aligning the output. */
if 2<=H  then  say  @prime  right(1, w)       " ───► "       right(2, w)
#= 2<=H                                          /*the number of primes found  (so far).*/
@.=.                                             /*assume all the numbers are  prime.   */
!=0;  do j=3  by 2  for (H-2)%2                  /*the odd integers up to  H  inclusive.*/
      if @.j==''  then iterate                   /*Is composite?  Then skip this number.*/
      #= # + 1                                   /*bump the prime number counter.       */
      say  @prime right(#,w) " ───► " right(j,w) /*display the prime to the terminal.   */
      if !        then iterate                   /*skip the top part of loop?       ___ */
      if j*j>H     then !=1                      /*indicate skip top part  if  J > √ H  */
          do m=j*j  to H  by j+j;   @.m=;   end  /*strike odd multiples as  not  prime. */
      end   /*j*/                                /*       ───                           */
say                                              /*stick a fork in it,  we're all done. */
say right(#,  1 + w + length(@prime) )    'primes found up to and including '    H
output   is identical to the first (non-wheel) version;   program execution is over   twice   as fast.

The addition of the short-circuit test   (using the REXX variable  !)   makes it about   another   20%   faster.

Wheel Version restructured

/*REXX program generates primes via sieve of Eratosthenes algorithm.
* 21.07.2012 Walter Pachl derived from above Rexx version
*                       avoid symbols @ and # (not supported by ooRexx)
*                       avoid negations (think positive)
**********************************************************************/
  highest=200                       /*define highest number to use.  */
  is_prime.=1                       /*assume all numbers are prime.  */
  w=length(highest)                 /*width of the biggest number,   */
                                    /*  it's used for aligned output.*/
  Do j=3 To highest By 2,           /*strike multiples of odd ints.  */
               While j*j<=highest   /* up to sqrt(highest)           */
      If is_prime.j Then Do
        Do jm=j*3 To highest By j+j /*start with next odd mult. of J */
          is_prime.jm=0             /*mark odd mult. of J not prime. */
          End
        End
    End
  np=0                              /*number of primes shown         */
  Call tell 2
  Do n=3 To highest By 2            /*list all the primes found.     */
    If is_prime.n Then Do
      Call tell n
      End
    End
  Exit
tell: Parse Arg prime
      np=np+1
      Say '           prime number' right(np,w) " --> " right(prime,w)
      Return

output is mostly identical to the above versions.

Ring

limit = 100
sieve = list(limit)
for i = 2 to limit
    for k = i*i to limit step i 
        sieve[k] = 1
    next
    if sieve[i] = 0 see "" + i + " " ok
next

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

RPL

This is a direct translation from Wikipedia. The variable i has been renamed ii to avoid confusion with the language constant i = √-1.

Works with: Halcyon Calc version 4.2.8
RPL code:

 ≪ → n
   ≪ { } n + 1 CON 'A' STO

      2 n √ FOR ii 
         IF A ii GET THEN
            ii SQ n FOR j
               'A' j 0 PUT ii STEP
         END
      NEXT
      { } 
      2 n FOR ii IF A ii GET THEN ii + END NEXT
      'A' PURGE
≫ ≫ 'SIEVE' STO

Comment:

SIEVE ( n -- { prime_numbers } )
let A be an array of Boolean values, indexed by 2 to n,
   initially all set to true.
   for i = 2, 3, 4, ..., not exceeding √n do
       if A[i] is true
           for j = i^2, i^2+i,... not exceeding n do
               set A[j] := false
return all i such that A[i] is true.


100 SIEVE
Output:
1: { 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 }

More recent RPL versions make it possible to replace some slow FOR..NEXT loops and to use local variables only.

Works with: HP version 49
« 'X' DUP 1 4 PICK 1 SEQ DUP → n a seq123
  « 2 n √ FOR ii 
       IF a ii GET THEN
          ii SQ n FOR j
             'a' j 0 PUT ii STEP
       END
    NEXT 
    a seq123 IFT TAIL
» » 'SIEVE' STO

Ruby

eratosthenes starts with nums = [nil, nil, 2, 3, 4, 5, ..., n], then marks the multiples of 2, 3, 5, 7, ... by setting them to nil, then returns all the non-nil numbers, which are the primes.

def eratosthenes(n)
  nums = [nil, nil, *2..n]
  (2..Math.sqrt(n)).each do |i|
    (i**2..n).step(i){|m| nums[m] = nil}  if nums[i]
  end
  nums.compact
end
 
p eratosthenes(100)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

With a wheel

eratosthenes2 adds more optimizations, but the code is longer.

  • The array nums only tracks odd numbers (skips multiples of 2).
  • The array nums holds booleans instead of integers, and every multiple of 3 begins false.
  • The outer loop skips multiples of 2 and 3.
  • Both inner loops skip multiples of 2 and 3.
def eratosthenes2(n)
  # For odd i, if i is prime, nums[i >> 1] is true.
  # Set false for all multiples of 3.
  nums = [true, false, true] * ((n + 5) / 6)
  nums[0] = false  # 1 is not prime.
  nums[1] = true   # 3 is prime.

  # Outer loop and both inner loops are skipping multiples of 2 and 3.
  # Outer loop checks i * i > n, same as i > Math.sqrt(n).
  i = 5
  until (m = i * i) > n
    if nums[i >> 1]
      i_times_2 = i << 1
      i_times_4 = i << 2
      while m <= n
        nums[m >> 1] = false
        m += i_times_2
        nums[m >> 1] = false
        m += i_times_4  # When i = 5, skip 45, 75, 105, ...
      end
    end
    i += 2
    if nums[i >> 1]
      m = i * i
      i_times_2 = i << 1
      i_times_4 = i << 2
      while m <= n
        nums[m >> 1] = false
        m += i_times_4  # When i = 7, skip 63, 105, 147, ...
        nums[m >> 1] = false
        m += i_times_2
      end
    end
    i += 4
  end

  primes = [2]
  nums.each_index {|i| primes << (i * 2 + 1) if nums[i]}
  primes.pop while primes.last > n
  primes
end

p eratosthenes2(100)

This simple benchmark compares eratosthenes with eratosthenes2.

require 'benchmark'
Benchmark.bmbm {|x|
  x.report("eratosthenes") { eratosthenes(1_000_000) }
  x.report("eratosthenes2") { eratosthenes2(1_000_000) }
}

eratosthenes2 runs about 4 times faster than eratosthenes.

With the standard library

MRI 1.9.x implements the sieve of Eratosthenes in the file prime.rb, class EratosthenesSieve (around line 421). This implementation optimizes for space by packing the booleans into 16-bit integers. It also hardcodes all primes less than 256.

require 'prime'
p Prime::EratosthenesGenerator.new.take_while {|i| i <= 100}

Run BASIC

input "Gimme the limit:"; limit
dim flags(limit)
for i = 2 to  limit
 for k = i*i to limit step i 
  flags(k) = 1
 next k
if flags(i) = 0 then print i;", ";
next i
Gimme the limit:?100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 

Rust

Unboxed Iterator

A slightly more idiomatic, optimized and modern iterator output example.

fn primes(n: usize) -> impl Iterator<Item = usize> {
    const START: usize = 2;
    if n < START {
        Vec::new()
    } else {
        let mut is_prime = vec![true; n + 1 - START];
        let limit = (n as f64).sqrt() as usize;
        for i in START..limit + 1 {
            let mut it = is_prime[i - START..].iter_mut().step_by(i);
            if let Some(true) = it.next() {
                it.for_each(|x| *x = false);
            }
        }
        is_prime
    }
    .into_iter()
    .enumerate()
    .filter_map(|(e, b)| if b { Some(e + START) } else { None })
}

Notes:

  1. Starting at an offset of 2 means that an n < 2 input requires zero allocations, because Vec::new() doesn't allocate memory until elements are pushed into it.
  2. Because both branches of the if .. {} else {} expression evaluate to a Vec, the output has a single statically known type, which avoids the need for a boxed trait object.
  3. Iterating is_prime with .iter_mut() and then using .step_by(i) strides over the multiples without manual index arithmetic, which removes a lot of tedium.
  4. Returning impl Iterator allows for static dispatching instead of dynamic dispatching, which is possible because the type is now statically known at compile time, making the zero input/output condition an order of magnitude faster.
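
Notes 1 to 3 can be illustrated in isolation. The following is a minimal sketch rather than part of the entry above; the names strike_from and odds_upto are hypothetical helpers introduced only for illustration.

```rust
/// Hypothetical helper: clear every `step`-th flag, beginning at index `first`.
fn strike_from(flags: &mut [bool], first: usize, step: usize) {
    // `step_by` panics when given a step of zero, so reject that case early.
    if step == 0 || first >= flags.len() {
        return;
    }
    // The iterator adaptor does the striding; no manual index arithmetic is needed.
    flags[first..].iter_mut().step_by(step).for_each(|f| *f = false);
}

/// Hypothetical helper: both branches of the `if` produce a `Vec<usize>`,
/// so the function can return `impl Iterator` without boxing.
fn odds_upto(n: usize) -> impl Iterator<Item = usize> {
    let v: Vec<usize> = if n < 1 {
        Vec::new() // an empty Vec performs no heap allocation
    } else {
        (1..=n).step_by(2).collect()
    };
    v.into_iter()
}
```

Calling strike_from(&mut flags, 4, 3) on ten true flags clears indices 4 and 7 only, and odds_upto(7) yields 1, 3, 5, 7.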


Sieve of Eratosthenes - No optimization

fn simple_sieve(limit: usize) -> Vec<usize> {

    let mut is_prime = vec![true; limit+1];
    is_prime[0] = false;
    if limit >= 1 { is_prime[1] = false }

    for num in 2..limit+1 {
        if is_prime[num] {
            let mut multiple = num*num;
            while multiple <= limit {
                is_prime[multiple] = false;
                multiple += num;
            }
        }
    }

    is_prime.iter().enumerate()
        .filter_map(|(pr, &is_pr)| if is_pr {Some(pr)} else {None} )
        .collect()
}

fn main() {
    println!("{:?}", simple_sieve(100));
}
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Basic Version slightly optimized, Iterator output

The above code doesn't even do the basic optimization of culling composites only by primes up to the square root of the range, as allowed in the task; it also outputs a vector of the resulting primes, which consumes memory. The following code fixes both of those, outputting the results as an Iterator:

use std::iter::{empty, once};
use std::time::Instant;

fn basic_sieve(limit: usize) -> Box<dyn Iterator<Item = usize>> {
    if limit < 2 { return Box::new(empty()) }

    let mut is_prime = vec![true; limit+1];
    is_prime[0] = false;
    if limit >= 1 { is_prime[1] = false }
    let sqrtlmt = (limit as f64).sqrt() as usize + 1; 

    for num in 2..sqrtlmt {
        if is_prime[num] {
            let mut multiple = num * num;
            while multiple <= limit {
                is_prime[multiple] = false;
                multiple += num;
            }
        }
    }

    Box::new(is_prime.into_iter().enumerate()
                .filter_map(|(p, is_prm)| if is_prm { Some(p) } else { None }))

}
 
fn main() {
    let n = 1000000;
    let vrslt = basic_sieve(100).collect::<Vec<_>>();
    println!("{:?}", vrslt);
    let strt = Instant::now();

    // do it 1000 times to get a reasonable execution time span...
    let rslt = (0..1000).map(|_| basic_sieve(n)).last().unwrap();

    let elpsd = strt.elapsed();

    let count = rslt.count();
    println!("{}", count);

    let secs = elpsd.as_secs();
    let millis = (elpsd.subsec_nanos() / 1000000) as u64;
    let dur = secs * 1000 + millis;
    println!("Culling composites took {} milliseconds.", dur);
}
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
78498
Culling composites took 4595 milliseconds.

The sieving operation is run for 1000 loops in order to get a reasonable execution time for comparison.

Odds-only bit-packed array, Iterator output

The following code improves on the above by sieving only odd composite numbers, since 2 is the only even prime; this reduces the number of operations by a factor of about two and a half and the memory use by a factor of two. It also bit-packs the composite sieving array, for a further reduction in memory use by a factor of eight and some saving in time due to better CPU cache use for a given sieving range. Finally, it demonstrates how to eliminate the redundant array bounds check:

fn optimized_sieve(limit: usize) -> Box<dyn Iterator<Item = usize>> {
    if limit < 3 {
        return if limit < 2 { Box::new(empty()) } else { Box::new(once(2)) }
    }

    let ndxlmt = (limit - 3) / 2 + 1;
    let bfsz = ((limit - 3) / 2) / 32 + 1;
    let mut cmpsts = vec![0u32; bfsz];
    // saturating_sub avoids an unsigned underflow when limit < 9 (square root below 3)
    let sqrtndxlmt = ((limit as f64).sqrt() as usize).saturating_sub(3) / 2 + 1;

    for ndx in 0..sqrtndxlmt {
        if (cmpsts[ndx >> 5] & (1u32 << (ndx & 31))) == 0 {
            let p = ndx + ndx + 3;
            let mut cullpos = (p * p - 3) / 2;
            while cullpos < ndxlmt {
                unsafe { // avoids array bounds check, which is already done above
                    let cptr = cmpsts.get_unchecked_mut(cullpos >> 5);
                    *cptr |= 1u32 << (cullpos & 31);
                }
//                cmpsts[cullpos >> 5] |= 1u32 << (cullpos & 31); // with bounds check
                cullpos += p;
            }
        }
    }

    Box::new((-1 .. ndxlmt as isize).into_iter().filter_map(move |i| {
                if i < 0 { Some(2) } else {
                    if cmpsts[i as usize >> 5] & (1u32 << (i & 31)) == 0 {
                        Some((i + i + 3) as usize) } else { None } }
    }))
}

The above function can be used just by substituting "optimized_sieve" for "basic_sieve" in the previous "main" function, and the outputs are the same except that the time is only 1584 milliseconds, or about three times as fast.
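
The index arithmetic used by the odds-only, bit-packed representation can be shown on its own. This is a minimal sketch under the same conventions as the code above (bit index 0 stands for 3, and a set bit means composite); the function names are hypothetical and do not appear in the entry.

```rust
/// Hypothetical helper: the odd number represented by bit index `ndx` (0 -> 3, 1 -> 5, ...).
fn index_to_odd(ndx: usize) -> usize {
    ndx + ndx + 3
}

/// Hypothetical helper: the bit index of an odd number `p >= 3`.
fn odd_to_index(p: usize) -> usize {
    (p - 3) / 2
}

/// Mark bit `ndx` as composite in a bit-packed buffer of 32-bit words.
fn set_composite(cmpsts: &mut [u32], ndx: usize) {
    // word index = ndx / 32, bit position within the word = ndx % 32
    cmpsts[ndx >> 5] |= 1u32 << (ndx & 31);
}

/// Test whether bit `ndx` has been marked as composite.
fn is_composite(cmpsts: &[u32], ndx: usize) -> bool {
    (cmpsts[ndx >> 5] & (1u32 << (ndx & 31))) != 0
}
```

For example, the square of 3 is 9, whose bit index is (9 - 3) / 2 = 3, which is where culling for the prime 3 begins; each further step of p in index space corresponds to a step of 2p in the numbers, so even multiples are never visited.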

Unbounded Page-Segmented bit-packed odds-only version with Iterator

Caution! This implementation is used in the Extensible prime generator task, so be sure not to break that implementation when changing this code.

While the above code is quite fast, as the range increases beyond the tens of millions it begins to lose efficiency, because the single large array used for culling composites grows beyond the limits of the various CPU caches. Accordingly, the following page-segmented code, in which each culling page can be limited to no larger than the CPU L1 cache, is about four times faster than the above for a range of one billion:

use std::iter::{empty, once};
use std::rc::Rc;
use std::cell::RefCell;
use std::time::Instant;

const RANGE: u64 = 1000000000;
const SZ_PAGE_BTS: u64 = (1 << 14) * 8; // this should be the size of the CPU L1 cache
const SZ_BASE_BTS: u64 = (1 << 7) * 8;
static CLUT: [u8; 256] = [
	8, 7, 7, 6, 7, 6, 6, 5, 7, 6, 6, 5, 6, 5, 5, 4, 7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 
	7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 
	7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 
	6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 
	7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 
	6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 
	6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 
	5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0 ];

fn count_page(lmti: usize, pg: &[u32]) -> i64 {
	let pgsz = pg.len(); let pgbts = pgsz * 32;
	let (lmt, icnt) = if lmti >= pgbts { (pgsz, 0) } else {
		let lstw = lmti / 32;
		let msk = 0xFFFFFFFEu32 << (lmti & 31);
		let v = (msk | pg[lstw]) as usize;
		(lstw, (CLUT[v & 0xFF] + CLUT[(v >> 8) & 0xFF]
			+ CLUT[(v >> 16) & 0xFF] + CLUT[v >> 24]) as u32)
	};
	let mut count = 0u32;
	for i in 0 .. lmt {
		let v = pg[i] as usize;
		count += (CLUT[v & 0xFF] + CLUT[(v >> 8) & 0xFF]
					+ CLUT[(v >> 16) & 0xFF] + CLUT[v >> 24]) as u32;
	}
	(icnt + count) as i64
}

fn primes_pages() -> Box<dyn Iterator<Item = (u64, Vec<u32>)>> {
	// a memoized iterable enclosing a Vec that grows as needed from an Iterator...
	type Bpasi = Box<dyn Iterator<Item = (u64, Vec<u32>)>>; // (lwi, base cmpsts page)
	type Bpas = Rc<(RefCell<Bpasi>, RefCell<Vec<Vec<u32>>>)>; // interior mutables
	struct Bps(Bpas); // iterable wrapper for base primes array state
	struct Bpsi<'a>(usize, &'a Bpas); // iterator with current pos, state ref's
	impl<'a> Iterator for Bpsi<'a> {
		type Item = &'a Vec<u32>;
		fn next(&mut self) -> Option<Self::Item> {
			let n = self.0; let bpas = self.1;
			while n >= bpas.1.borrow().len() { // not thread safe
				let nbpg = match bpas.0.borrow_mut().next() {
								Some(v) => v, _ => (0, vec!()) };
				if nbpg.1.is_empty() { return None } // end if no source iter
				bpas.1.borrow_mut().push(cnvrt2bppg(nbpg));
			}
			self.0 += 1; // unsafe pointer extends interior -> exterior lifetime
			// multi-threading might drop following Vec while reading - protect
			let ptr = &bpas.1.borrow()[n] as *const Vec<u32>;
			unsafe { Some(&(*ptr)) }
		}
	}
	impl<'a> IntoIterator for &'a Bps {
		type Item = &'a Vec<u32>;
		type IntoIter = Bpsi<'a>;
		fn into_iter(self) -> Self::IntoIter {
			Bpsi(0, &self.0)
		}
	}
	fn make_page(lwi: u64, szbts: u64, bppgs: &Bpas)
			-> (u64, Vec<u32>) {
		let nxti = lwi + szbts;
		let pbts = szbts as usize;
		let mut cmpsts = vec!(0u32; pbts / 32);
		'outer: for bpg in Bps(bppgs.clone()).into_iter() { // in the inner tight loop...
			let pgsz = bpg.len();
			for i in 0 .. pgsz {
				let p = bpg[i] as u64; let pc = p as usize;
				let s = (p * p - 3) / 2;
				if s >= nxti { break 'outer; } else { // page start address:
					let mut cp = if s >= lwi { (s - lwi) as usize } else {
						let r = ((lwi - s) % p) as usize;
						if r == 0 { 0 } else { pc - r }
					};
					while cp < pbts {
						unsafe { // avoids array bounds check, which is already done above
							let cptr = cmpsts.get_unchecked_mut(cp >> 5);
							*cptr |= 1u32 << (cp & 31); // about as fast as it gets...
						}
//						cmpsts[cp >> 5] |= 1u32 << (cp & 31);
						cp += pc;
					}
				}
			}
		}
		(lwi, cmpsts)
	}
	fn pages_from(lwi: u64, szbts: u64, bpas: Bpas)
			-> Box<dyn Iterator<Item = (u64, Vec<u32>)>> {
		struct Gen(u64,  u64);
		impl Iterator for Gen {
			type Item = (u64, u64);
			#[inline]
			fn next(&mut self) -> Option<(u64, u64)> {
				let v = self.0; let inc = self.1; // calculate variable size here
				self.0 = v + inc;
				Some((v, inc))
			}
		}
		Box::new(Gen(lwi, szbts)
					.map(move |(lwi, szbts)| make_page(lwi, szbts, &bpas)))
	}
	fn cnvrt2bppg(cmpsts: (u64, Vec<u32>)) -> Vec<u32> {
		let (lwi, pg) = cmpsts;
		let pgbts = pg.len() * 32;
		let cnt = count_page(pgbts, &pg) as usize;
		let mut bpv = vec!(0u32; cnt);
		let mut j = 0; let bsp = (lwi + lwi + 3) as usize;
		for i in 0 .. pgbts {
			if (pg[i >> 5] & (1u32 << (i & 0x1F))) == 0u32 {
				bpv[j] = (bsp + i + i) as u32; j += 1;
			}
		}
		bpv
	}
	// recursive Rc/RefCell variable bpas - used only for init, then fixed ...
	// start with just enough base primes to init the first base primes page...
	let base_base_prms = vec!(3u32,5u32,7u32);
	let rcvv = RefCell::new(vec!(base_base_prms));
	let bpas: Bpas = Rc::new((RefCell::new(Box::new(empty())), rcvv));
	let initpg = make_page(0, 32, &bpas); // small base primes page for SZ_BASE_BTS = 2^7 * 8
	*bpas.1.borrow_mut() = vec!(cnvrt2bppg(initpg)); // use for first page
	let frstpg = make_page(0, SZ_BASE_BTS, &bpas); // init bpas for first base prime page
	*bpas.0.borrow_mut() = pages_from(SZ_BASE_BTS, SZ_BASE_BTS, bpas.clone()); // recurse bpas
	*bpas.1.borrow_mut() = vec!(cnvrt2bppg(frstpg)); // fixed for subsequent pages
	pages_from(0, SZ_PAGE_BTS, bpas) // and bpas also used here for main pages
}
 
fn primes_paged() -> Box<dyn Iterator<Item = u64>> {
	fn list_paged_primes(cmpstpgs: Box<dyn Iterator<Item = (u64, Vec<u32>)>>)
			-> Box<dyn Iterator<Item = u64>> {
		Box::new(cmpstpgs.flat_map(move |(lwi, cmpsts)| {
			let pgbts = (cmpsts.len() * 32) as usize;
			(0..pgbts).filter_map(move |i| {
				if cmpsts[i >> 5] & (1u32 << (i & 31)) == 0 {
					Some((lwi + i as u64) * 2 + 3) } else { None } }) }))
	}
	Box::new(once(2u64).chain(list_paged_primes(primes_pages())))
}

fn count_primes_paged(top: u64) -> i64 {
	if top < 3 { if top < 2 { return 0i64 } else { return 1i64 } }
	let topi = (top - 3u64) / 2;
	primes_pages().take_while(|&(lwi, _)| lwi <= topi)
		.map(|(lwi, pg)| { count_page((topi - lwi) as usize, &pg) })
		.sum::<i64>() + 1
}

fn main() {
	let vrslt = primes_paged()
			.take_while(|&p| p <= 100)
			.collect::<Vec<_>>();
	println!("{:?}", vrslt);

	let strt = Instant::now();

//	let count = primes_paged().take_while(|&p| p <= RANGE).count(); // slow way to count
	let count = count_primes_paged(RANGE); // fast way to count

	let elpsd = strt.elapsed();

	println!("{}", count);

	let secs = elpsd.as_secs();
	let millis = (elpsd.subsec_nanos() / 1000000) as u64;
	let dur = secs * 1000 + millis;
	println!("Culling composites took {} milliseconds.", dur);
}

The output is about the same as that of the previous versions, but it is produced much faster. As well as the cache size improvements mentioned above, the code has a population count based primes counting function that determines the number of found primes about twice as fast as the Iterator count() method (commented out and labelled "slow way to count" in the main function).
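
To illustrate why counting by population count is cheaper than visiting every bit, here is a minimal Python sketch. It is not a translation of the Rust program: the function names and the sample page contents are made up for this example. It assumes, as in the code above, a page of 32-bit words in which a zero bit marks a prime.

```python
def count_primes_popcount(page, nbits):
    """Count zero bits (primes) among the first nbits bits of 32-bit words."""
    full_words, rest = divmod(nbits, 32)
    # One population count per whole word instead of 32 separate bit tests.
    count = sum(32 - bin(w).count("1") for w in page[:full_words])
    if rest:
        # Only the low `rest` bits of the last, partial word are in range.
        mask = (1 << rest) - 1
        count += rest - bin(page[full_words] & mask).count("1")
    return count


def count_primes_bit_by_bit(page, nbits):
    """The slow alternative: test every bit individually."""
    return sum(1 for i in range(nbits)
               if page[i >> 5] & (1 << (i & 31)) == 0)


page = [0b1010, 0xFFFFFFFF, 0]   # made-up page contents
print(count_primes_popcount(page, 96), count_primes_bit_by_bit(page, 96))  # 62 62
```

Both functions return the same count; the first touches each word once, while the second performs one test per bit.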

As listed above, the code maintains its efficiency up to about sixteen billion, and can easily be extended to be useful above that point by having the buffer size dynamically calculated to be proportional to the square root of the current range as commented in the code.

It would also be quite easy to extend the code to use multi-threading per page, so that the time would be reduced in proportion to the number of true CPU cores used (not Hyper-Threaded ones), such as the four true cores of many common high-end desktop CPUs.

Before being extended to truly huge ranges such as 1e14, the code should have maximum wheel factorization added (a 2;3;5;7 wheel, with the culling buffers further pre-culled by the primes 11, 13, 17, and maybe 19), which would speed it up by another factor of four or so for the range of one billion. It would also be possible to use extreme loop unrolling techniques, such as those used by "primesieve" (written in C/C++), to increase the speed for this range by another factor of two or so.
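
As a rough illustration of what a 2;3;5;7 wheel buys, the following Python sketch lists the residues that survive the wheel. The function name `wheel_residues` is an assumption made for this example; nothing like it appears in the Rust code.

```python
from math import gcd


def wheel_residues(primes):
    """Return the wheel circumference and the residues coprime to it."""
    circumference = 1
    for p in primes:
        circumference *= p
    residues = [r for r in range(1, circumference + 1)
                if gcd(r, circumference) == 1]
    return circumference, residues


circ, res = wheel_residues([2, 3, 5, 7])
print(circ, len(res))   # 210 48
```

Only 48 of every 210 numbers remain as candidates, compared with 105 of every 210 for an odds-only sieve, so the number of candidates alone drops by a factor of about 2.2. This sketch counts candidates only; it does not model the pre-culling by 11, 13, 17 and 19 or the reduced number of culling operations.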

The above code demonstrates some techniques for working within the limitations of Rust's ownership/borrowing/lifetime memory model: 1) it uses a recursive secondary base primes Iterator, made persistent by using a Vec, that uses its own value as a source of its own page stream; 2) this is done by using a recursive variable that is accessed as an Rc reference counted heap value with internal mutability provided by a pair of RefCells; 3) the above secondary stream is not thread safe, and making it so would need the Rc changed to an Arc and the RefCells changed to Mutexes or (probably preferably) RwLocks that enclose/lock all reading and writing operations in the secondary stream "Bpsi"'s next() method; and 4) it uses Iterators where their performance doesn't matter (at the page level) while using tight loops at the inner levels.

S-BASIC

comment
   Find primes up to the specified limit (here 1,000) using
   classic Sieve of Eratosthenes
end

$constant limit = 1000
$constant false = 0
$constant true = FFFFH

var i, k, count, col = integer
dim integer flags(limit)

print "Finding primes from 2 to";limit

rem - initialize table
for i = 1 to limit
  flags(i) = true
next i

rem - sieve for primes
for i = 2 to int(sqr(limit))
  if flags(i) = true then
     for k = (i*i) to limit step i
        flags(k) = false
     next k
next i

rem - write out primes 10 per line
count = 0
col = 1
for i = 2 to limit
   if flags(i) = true then
      begin
         print using "#####";i;
         count = count + 1
         col = col + 1
         if col > 10 then
            begin
               print
               col = 1
            end
      end
next i
print
print count; " primes were found."

end
Output:
Finding primes from 2 to 1000
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
                      . . .
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997
168 primes were found.

SAS

The following defines an IML routine to compute the sieve, and as an example stores the primes below 1000 in a dataset.

proc iml;
start sieve(n);
    a = J(n,1);
    a[1] = 0;
    do i = 1 to n;
        if a[i] then do;
            if i*i>n then return(a);
            a[i*(i:int(n/i))] = 0;
        end;
    end;
finish;

a = loc(sieve(1000))`;
create primes from a;
append from a;
close primes;
quit;

SASL

This example is incorrect. Please fix the code and remove this message.

Details: These use REM (division) testing and so are Trial Division algorithms, not Sieve of Eratosthenes.

Copied from SASL manual, top of page 36. This provides an infinite list.

show primes
WHERE
primes = sieve (2...)
sieve (p : x ) = p : sieve {a <- x; a REM p > 0}
?

The limited list for the first 1000 numbers

show primes
WHERE
primes = sieve (2..1000)
sieve (p : x ) = p : sieve {a <- x; a REM p > 0}
?
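
To make the distinction drawn in the notice above concrete, here is a small Python sketch with hypothetical function names. The first function mirrors the structure of the SASL code, filtering candidates by remainder; the second crosses off multiples by repeated addition, which is what the Sieve of Eratosthenes does.

```python
from math import isqrt


def primes_by_remainder_filter(limit):
    """Trial division: keep candidates whose remainder by p is non-zero."""
    candidates = list(range(2, limit + 1))
    primes = []
    while candidates:
        p, rest = candidates[0], candidates[1:]
        primes.append(p)
        candidates = [a for a in rest if a % p > 0]   # one division per survivor
    return primes


def primes_by_sieve(limit):
    """Sieve of Eratosthenes: cross off multiples, no division involved."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):      # multiples by addition
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]


print(primes_by_remainder_filter(30) == primes_by_sieve(30))   # True
```

Both produce the same list, but the first tests every surviving candidate against every prime found so far, while the second only ever touches the multiples of each prime.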

Scala

Genuine Eratosthenes sieve

import scala.annotation.tailrec
import scala.collection.parallel.mutable
import scala.compat.Platform

object GenuineEratosthenesSieve extends App {
  def sieveOfEratosthenes(limit: Int) = {
    val (primes: mutable.ParSet[Int], sqrtLimit) = (mutable.ParSet.empty ++ (2 to limit), math.sqrt(limit).toInt)
    @tailrec
    def prim(candidate: Int): Unit = {
      if (candidate <= sqrtLimit) {
        if (primes contains candidate) primes --= candidate * candidate to limit by candidate
        prim(candidate + 1)
      }
    }
    prim(2)
    primes
  }
  // The elements of a ParSet are unordered, so they have to be sorted before being used as a sequence.

  assert(sieveOfEratosthenes(15099480).size == 976729)
  println(s"Successfully completed without errors. [total ${Platform.currentTime - executionStart} ms]")
}
Output:
Successfully completed without errors. [total 39807 ms]

Process finished with exit code 0

While concise, the above code is quite slow. It is a little faster when the ParSet is not used (take out the '.par' for speed), in which case the sorting ('sorted') is not necessary, for an additional small gain in speed. The code is slow because of all the overhead of processing the bit-packed "BitSet" bit by bit using complex "higher-order" method calls.

The following '''odds-only''' code is written in a very concise functional style (no mutable state other than the contents of the composites buffer and "higher order functions" for clarity), in this case using a Scala mutable BitSet:

object SoEwithBitSet {
  def makeSoE_PrimesTo(top: Int): Iterator[Int] = {
    val topNdx = (top - 3) / 2 //odds composite BitSet buffer offset down to 3
    val cmpsts = new scala.collection.mutable.BitSet(topNdx + 1) //size includes topNdx
    @inline def cullPrmCmpsts(prmNdx: Int) = {
      val prm = prmNdx + prmNdx + 3; cmpsts ++= ((prm * prm - 3) >>> 1 to topNdx by prm) }
    (0 to (Math.sqrt(top).toInt - 3) / 2).filterNot { cmpsts }.foreach { cullPrmCmpsts }
    Iterator.single(2) ++ (0 to topNdx).filterNot { cmpsts }.map { pi => pi + pi + 3 } }
}

In spite of being very concise, it is very much faster than the above code would be even if converted to odds-only, due to the use of the BitSet instead of the hash table based Set (or ParSet): it takes only a few seconds to enumerate the primes to 100 million, as compared to the tens of seconds needed to count the primes to just over 15 million above.
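
The index arithmetic used by the odds-only code above (index i stands for the odd number 2i + 3, and culling for a prime p starts at index (p*p - 3) / 2 and advances by p) can be sketched in Python as follows. This is an illustration with made-up names, using a plain bytearray rather than a BitSet.

```python
from math import isqrt


def odds_only_primes(top):
    """Primes up to top, sieving only the odd candidates 3, 5, 7, ..."""
    if top < 2:
        return []
    top_ndx = (top - 3) // 2                # index of the largest odd <= top
    composite = bytearray(max(top_ndx + 1, 0))
    for i in range((isqrt(top) - 3) // 2 + 1):
        if not composite[i]:
            p = 2 * i + 3                   # index -> odd number
            start = (p * p - 3) // 2        # index of p squared
            # stepping the index by p advances the value by 2p,
            # so the even multiples are never visited
            for j in range(start, top_ndx + 1, p):
                composite[j] = 1
    return [2] + [2 * i + 3 for i in range(top_ndx + 1) if not composite[i]]


print(odds_only_primes(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```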

Using tail recursion

The '''odds-only''' code below, which uses a primitive (bit-packed) array and tail recursion to avoid some of the enumeration delays due to nested complex "higher order" function calls, is almost eight times faster than the above more functional code:

object SoEwithArray {
  def makeSoE_PrimesTo(top: Int) = {
    import scala.annotation.tailrec
    val topNdx = (top - 3) / 2 + 1 //odds composite BitSet buffer offset down to 3 plus 1 for overflow
    val (cmpsts, sqrtLmtNdx) = (new Array[Int]((topNdx >>> 5) + 1), (Math.sqrt(top).toInt - 3) / 2)

    @inline def isCmpst(ci: Int): Boolean = (cmpsts(ci >>> 5) & (1 << (ci & 31))) != 0

    @inline def setCmpst(ci: Int): Unit = cmpsts(ci >>> 5) |= 1 << (ci & 31)

    @tailrec def forCndNdxsFrom(cndNdx: Int): Unit =
      if (cndNdx <= sqrtLmtNdx) {
        if (!isCmpst(cndNdx)) { //is prime
          val p = cndNdx + cndNdx + 3
          
          @tailrec def cullPrmCmpstsFrom(cmpstNdx: Int): Unit =
            if (cmpstNdx <= topNdx) { setCmpst(cmpstNdx); cullPrmCmpstsFrom(cmpstNdx + p) }
          
          cullPrmCmpstsFrom((p * p - 3) >>> 1) }
        
        forCndNdxsFrom(cndNdx + 1) }; forCndNdxsFrom(0)

    @tailrec def getNxtPrmFrom(cndNdx: Int): Int =
      if ((cndNdx > topNdx) || !isCmpst(cndNdx)) cndNdx + cndNdx + 3 else getNxtPrmFrom(cndNdx + 1)

    Iterator.single(2) ++ Iterator.iterate(3)(p => getNxtPrmFrom(((p + 2) - 3) >>> 1)).takeWhile(_ <= top)
  }
}

It can be tested with the following code:

object Main extends App {
  import SoEwithArray._
  val top_num = 100000000
  val strt = System.nanoTime()
  val count = makeSoE_PrimesTo(top_num).size
  val end = System.nanoTime()
  println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
  println(f"Found $count primes up to $top_num" + ".")
  println("Using one large mutable Array and tail recursive loops.")
}

To produce the following output:

Output:
Successfully completed without errors. [total 661 ms]
Found 5761455 primes up to 100000000.
Using one large mutable Array and tail recursive loops.

Odds-only page-segmented "infinite" generator version using tail recursion

The above code still uses an amount of memory proportional to the range of the sieve (although bit-packed as 8 values per byte). As well as only sieving odd candidates, the following code uses a fixed-size buffer of about the size of the CPU L2 cache, plus storage only for the base primes up to the square root of the range, for a large potential saving in RAM used as well as greatly reduced memory access times. The use of innermost tail recursive loops, rather than "higher order" functions from iterators, for the critical loops where the majority of the execution time is spent also greatly reduces execution time, with much of the remaining time used just to enumerate the primes output:

object APFSoEPagedOdds {
  import scala.annotation.tailrec
  
  private val CACHESZ = 1 << 18 //used cache buffer size
  private val PGSZ = CACHESZ / 4 //number of int's in cache
  private val PGBTS = PGSZ * 32 //number of bits in buffer
  
  //processing output type is a tuple of low bit (odds) address,
  // bit range size, and the actual culled page segment buffer.
  private type Chunk = (Long, Int, Array[Int])
  
  //produces an iteration of all the primes from an iteration of Chunks
  private def enumChnkPrms(chnks: Stream[Chunk]): Iterator[Long] = {
    def iterchnk(chnk: Chunk) = { //iterating primes per Chunk
      val (lw, rng, bf) = chnk
      @tailrec def nxtpi(i: Int): Int = { //find next prime index not composite
        if (i < rng && (bf(i >>> 5) & (1 << (i & 31))) != 0) nxtpi(i + 1) else i }
      Iterator.iterate(nxtpi(0))(i => nxtpi(i + 1)).takeWhile { _ < rng }
        .map { i => ((lw + i) << 1) + 3 } } //map from index to prime value
    chnks.toIterator.flatMap { iterchnk } }
  
  //culls the composite number bit representations from the bit-packed page buffer
  //using a given source of a base primes iterator
  private def cullPg(bsprms: Iterator[Long],
                     lowi: Long, buf: Array[Int]): Unit = {
    //cull for all base primes until square >= nxt
    val rng = buf.length * 32; val nxt = lowi + rng
    @tailrec def cull(bps: Iterator[Long]): Unit = {
      //given prime then calculate the base start address for prime squared
      val bp = bps.next(); val s = (bp * bp - 3) / 2
      //almost all of the execution time is spent in the following tight loop
      @tailrec def cullp(j: Int): Unit = { //cull the buffer for given prime
        if (j < rng) { buf(j >>> 5) |= 1 << (j & 31); cullp(j + bp.toInt) } }
      if (s < nxt) { //only cull for primes squared less than max
        //calculate the start address within this given page segment
        val strt = if (s >= lowi) (s - lowi).toInt else {
          val b = (lowi - s) % bp
          if (b == 0) 0 else (bp - b).toInt }
        cullp(strt); if (bps.hasNext) cull(bps) } } //loop for all primes in range
    //for the first page, use own bit pattern as a source of base primes
    //if another source is not given
    if (lowi <= 0 && bsprms.isEmpty)
      cull(enumChnkPrms(Stream((0, buf.length << 5, buf))))
    //otherwise use the given source of base primes
    else if (bsprms.hasNext) cull(bsprms) }
  
  //makes a chunk given a low address in (odds) bits
  private def mkChnk(lwi: Long): Chunk = {
    val rng = PGBTS; val buf = new Array[Int](rng / 32);
    val bps = if (lwi <= 0) Iterator.empty else enumChnkPrms(basePrms)
    cullPg(bps, lwi, buf); (lwi, rng, buf) }
  
  //new independent source of base primes in a stream of packed-bit arrays
  //memoized by converting it to a Stream and retaining a reference here
  private val basePrms: Stream[Chunk] =
    Stream.iterate(mkChnk(0)) { case (lw, rng, bf) => { mkChnk(lw + rng) } }
  
  //produces an infinite iterator over all the chunk results
  private def itrRslts[R](rsltf: Chunk => R): Iterator[R] = {
    def mkrslt(lwi: Long) = { //makes tuple of result and next low index
      val c = mkChnk(lwi); val (_, rng, _) = c; (rsltf(c), lwi + rng) }
    Iterator.iterate(mkrslt(0)) { case (_, nlwi) => mkrslt(nlwi) }
            .map { case (rslt, _) => rslt} } //infinite iteration of results
  
  //iterates across the "infinite" produced output primes
  def enumSoEPrimes(): Iterator[Long] = //use itrRslts to produce the Chunks iteration
    Iterator.single(2L) ++ enumChnkPrms(itrRslts { identity }.toStream)
 
  //counts the number of remaining primes in a page segment buffer
  //using a very fast bitcount per integer element
  //with special treatment for the last page
  private def countpgto(top: Long, b: Array[Int], nlwp: Long) = {
    val numbts = b.length * 32; val prng = numbts * 2
    @tailrec def cnt(i: Int, c: Int): Int = { //tight int bitcount loop
      if (i >= b.length) c else cnt (i + 1, c - Integer.bitCount(b(i))) }
    if (nlwp > top) { //for top in page, calculate int address containing top
      val bi = ((top - nlwp + prng) >>> 1).toInt
      val w = bi >>> 5; b(w) |= -2 << (bi & 31) //mark all following as composite
      for (i <- w + 1 until b.length) b(i) = -1 } //for all int's to end of buffer
    cnt(0, numbts) } //counting the entire buffer in every case
  
  //counts all the primes up to a top value
  def countSoEPrimesTo(top: Long): Long = {
    if (top < 2) return 0L else if (top < 3) return 1L //no work necessary
    //count the primes in each Chunk in turn (single-threaded in this version)
    val gen = itrRslts { case (lwi, rng, bf) =>
      val nlwp = (lwi + rng) * 2 + 3; (countpgto(top, bf, nlwp), nlwp) }
    //a loop to take Chunk's up to including top limit but not past it
    @tailrec def takeUpto(acc: Long): Long = {
      val (cnt, nlwp) = gen.next(); val nacc = acc + cnt
      if (nlwp <= top) takeUpto(nacc) else nacc }; takeUpto(1) }
}

As the above and all following sieves are "infinite", they all require an extra range limiting condition to produce a finite output, such as the addition of ".takeWhile(_ <= topLimit)" where "topLimit" is the specified range as is done in the following code:

object MainSoEPagedOdds extends App {
  import APFSoEPagedOdds._
  countSoEPrimesTo(100)
  val top = 1000000000
  val strt = System.currentTimeMillis()
  val cnt = enumSoEPrimes().takeWhile { _ <= top }.length
//  val cnt = countSoEPrimesTo(top)
  val elpsd = System.currentTimeMillis() - strt
  println(f"Found $cnt primes up to $top in $elpsd milliseconds.")
}

which outputs the following:

Output:
Found 50847534 primes up to 1000000000 in 5867 milliseconds.

While the above code is reasonably fast, much of the execution time is consumed by the use of the built-in functions and iterators for concise code, especially in the use of iterators for primes output. To show this, the code includes a "countSoEPrimesTo" function/method that can be uncommented in the above code (commenting out the "takeWhile" line) to produce the following output:

Output:
Found 50847534 primes up to 1000000000 in 2623 milliseconds.

This shows that it takes somewhat longer to enumerate the primes than it does to actually produce them; this could be improved with a "roll-your-own" enumeration Iterator implementation at the cost of considerably increased complexity, but enumeration time would still be a significant portion of the execution time. Further improvements to the code using extreme wheel factorization and multi-processing would make enumeration time an even higher percentage of the total; this is why, for large ranges, one writes functions/methods similar to "countSoEPrimesTo" to (say) sum the primes, find the nth prime, etc.
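
The page-segmentation idea itself, including the calculation of where culling starts within a page (as in "cullPg" above), can be sketched in Python as follows. This is a simplified illustration rather than a translation of the Scala code: the names are made up, the base primes are computed up front instead of being streamed, and the buffer is one byte per odd candidate rather than bit-packed.

```python
from math import isqrt


def base_primes(limit):
    """Odd primes up to limit, by a small non-segmented sieve."""
    flags = bytearray([1]) * (limit + 1)
    for p in range(3, isqrt(limit) + 1, 2):
        if flags[p]:
            for m in range(p * p, limit + 1, 2 * p):
                flags[m] = 0
    return [p for p in range(3, limit + 1, 2) if flags[p]]


def count_primes_segmented(top, page_size=1 << 14):
    """Count primes <= top using fixed-size pages of odd candidates."""
    if top < 2:
        return 0
    top_ndx = (top - 3) // 2              # index i represents 2 * i + 3
    bps = base_primes(isqrt(top))
    count, low = 1, 0                     # start the count with the prime 2
    while low <= top_ndx:
        rng = min(page_size, top_ndx - low + 1)
        page = bytearray(rng)             # 0 = prime, 1 = composite
        nxt = low + rng
        for bp in bps:
            s = (bp * bp - 3) // 2        # index of bp squared
            if s >= nxt:
                break                     # later base primes start past this page
            if s >= low:
                start = s - low
            else:                         # first multiple at or after the page start
                b = (low - s) % bp
                start = 0 if b == 0 else bp - b
            for j in range(start, rng, bp):
                page[j] = 1
        count += rng - sum(page)
        low = nxt
    return count


print(count_primes_segmented(100, page_size=16))   # 25
```

Memory use is bounded by the page size plus the list of base primes, whatever the value of `top`.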

Odds-Only "infinite" generator sieve using Streams and Co-Inductive Streams

The following code uses delayed recursion via Streams to implement the Richard Bird algorithm mentioned in the last part (the Epilogue) of M. O'Neill's paper, which is a true incremental Sieve of Eratosthenes. It is nowhere near as fast as the array based solutions due to the overhead of functionally chasing the merging of the prime multiple streams; this overhead grows as the log of the sieved range, so the empirical performance does not follow the usual Sieve of Eratosthenes approximations, but it is much better than that of the "unfaithful" sieve.

  def birdPrimes() = {
    def oddPrimes: Stream[Int] = {
      def merge(xs: Stream[Int], ys: Stream[Int]): Stream[Int] = {
        val (x, y) = (xs.head, ys.head)
   
        if (y > x) x #:: merge(xs.tail, ys) else if (x > y) y #:: merge(xs, ys.tail) else x #:: merge(xs.tail, ys.tail)
      }
   
      def primeMltpls(p: Int): Stream[Int] = Stream.iterate(p * p)(_ + p + p)
   
      def allMltpls(ps: Stream[Int]): Stream[Stream[Int]] = primeMltpls(ps.head) #:: allMltpls(ps.tail)
   
      def join(ams: Stream[Stream[Int]]): Stream[Int] = ams.head.head #:: merge(ams.head.tail, join(ams.tail))
   
      def oddPrms(n: Int, composites: Stream[Int]): Stream[Int] =
        if (n >= composites.head) oddPrms(n + 2, composites.tail) else n #:: oddPrms(n + 2, composites)
   
      //following uses a new recursive source of odd base primes
      3 #:: oddPrms(5, join(allMltpls(oddPrimes)))
    }
    2 #:: oddPrimes
  }

Now this algorithm doesn't really need the memoization and full laziness as offered by Streams, so an implementation and use of a Co-Inductive Stream (CIS) class is sufficient and reduces execution time by almost a factor of two:

  class CIS[A](val start: A, val continue: () => CIS[A])

  def primesBirdCIS: Iterator[Int] = {
    def merge(xs: CIS[Int], ys: CIS[Int]): CIS[Int] = {
      val (x, y) = (xs.start, ys.start)

      if (y > x) new CIS(x, () => merge(xs.continue(), ys))
      else if (x > y) new CIS(y, () => merge(xs, ys.continue()))
      else new CIS(x, () => merge(xs.continue(), ys.continue()))
    }

    def primeMltpls(p: Int): CIS[Int] = {
      def nextCull(cull: Int): CIS[Int] = new CIS[Int](cull, () => nextCull(cull + 2 * p))
      nextCull(p * p)
    }

    def allMltpls(ps: CIS[Int]): CIS[CIS[Int]] =
      new CIS[CIS[Int]](primeMltpls(ps.start), () => allMltpls(ps.continue()))
    def join(ams: CIS[CIS[Int]]): CIS[Int] = {
      new CIS[Int](ams.start.start, () => merge(ams.start.continue(), join(ams.continue())))
    }

    def oddPrimes(): CIS[Int] = {
      def oddPrms(n: Int, composites: CIS[Int]): CIS[Int] = { //"minus"
        if (n >= composites.start) oddPrms(n + 2, composites.continue())
        else new CIS[Int](n, () => oddPrms(n + 2, composites))
      }
      //following uses a new recursive source of odd base primes
      new CIS(3, () => oddPrms(5, join(allMltpls(oddPrimes()))))
    }

    Iterator.single(2) ++ Iterator.iterate(oddPrimes())(_.continue()).map(_.start)
  }

Further gains in performance for these last two implementations can be had by using further wheel factorization and "tree folding/merging" as per this Haskell implementation.
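
For readers who want to see the shape of the Bird algorithm without the Scala stream machinery, here is a minimal Python sketch using generators. The names are chosen for this example; the structure (a merge of two ordered streams, a linear join of the streams of prime multiples, a "minus" of the composites from the odd numbers, and a recursive source of odd base primes) follows the two implementations above.

```python
from itertools import count, islice


def merge(xs, ys):
    """Merge two increasing iterators, dropping duplicates."""
    x, y = next(xs), next(ys)
    while True:
        if x < y:
            yield x
            x = next(xs)
        elif y < x:
            yield y
            y = next(ys)
        else:
            yield x
            x, y = next(xs), next(ys)


def join(streams):
    """Linear join: the head of the first stream is known to be the smallest."""
    first = next(streams)
    yield next(first)
    yield from merge(first, join(streams))


def minus(candidates, composites):
    """Yield the candidates that do not appear in the composites stream."""
    c = next(composites)
    for n in candidates:
        while c < n:
            c = next(composites)
        if n < c:
            yield n


def odd_primes():
    yield 3
    # a new recursive source of odd base primes, consumed lazily
    multiples = (count(p * p, 2 * p) for p in odd_primes())
    yield from minus(count(5, 2), join(multiples))


def bird_primes():
    yield 2
    yield from odd_primes()


print(list(islice(bird_primes(), 10)))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The nesting depth of the generators grows with the number of base primes in use, so this sketch is only suitable for modest ranges.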

Odds-Only "infinite" generator sieve using a hash table (HashMap)

As per the "unfaithful sieve" article linked above, the incremental "infinite" Sieve of Eratosthenes can be implemented using a hash table instead of a Priority Queue or Map (Binary Heap) as were used in that article. The following implementation postpones the adding of base prime representations to the hash table until necessary to keep the hash table small:

  import scala.annotation.tailrec

  def SoEInc: Iterator[Int] = {
    val nextComposites = scala.collection.mutable.HashMap[Int, Int]()
    def oddPrimes: Iterator[Int] = {
      val basePrimes = SoEInc
      basePrimes.next()
      basePrimes.next() // skip the two and three prime factors
      @tailrec def makePrime(state: (Int, Int, Int)): (Int, Int, Int) = {
        val (candidate, nextBasePrime, nextSquare) = state
        if (candidate >= nextSquare) {
          val adv = nextBasePrime << 1
          nextComposites += ((nextSquare + adv) -> adv)
          val np = basePrimes.next()
          makePrime((candidate + 2, np, np * np))
        } else if (nextComposites.contains(candidate)) {
          val adv = nextComposites(candidate)
          nextComposites -= (candidate) += (Iterator.iterate(candidate + adv)(_ + adv)
            .dropWhile(nextComposites.contains(_)).next() -> adv)
          makePrime((candidate + 2, nextBasePrime, nextSquare))
        } else (candidate, nextBasePrime, nextSquare)
      }
      Iterator.iterate((5, 3, 9)) { case (c, p, q) => makePrime((c + 2, p, q)) }
        .map { case (p, _, _) => p }
    }
    List(2, 3).toIterator ++ oddPrimes
  }

The above could be implemented using Streams or Co-Inductive Streams to pass the continuation parameters as passed here in a tuple but there would be no real difference in speed and there is no need to use the implied laziness. As compared to the versions of the Bird (or tree folding) Sieve of Eratosthenes, this has the expected same computational complexity as the array based versions, but is about 20 times slower due to the constant overhead of processing the key value hashing. Memory use is quite low, only being the hash table entries for each of the base prime values less than the square root of the last prime enumerated multiplied by the size of each hash entry (about 12 bytes in this case) plus a "load factor" percentage overhead in hash table size to minimize hash collisions (about twice as large as entries actually used by default on average).
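
The postponement described above (a base prime's entry is added to the hash table only when the candidates reach its square) can be sketched in Python as follows. This is an illustration with made-up names, not a translation of the Scala code; like that code, it feeds itself base primes from a second, recursive instance of the same generator.

```python
from itertools import islice


def postponed_sieve():
    """Incremental odds-only sieve using a dict of upcoming composites."""
    yield 2
    yield 3
    yield 5
    yield 7
    next_composites = {}              # composite -> step (twice its base prime)
    base_primes = postponed_sieve()   # separate recursive feed of base primes
    next(base_primes)                 # skip 2
    p = next(base_primes)             # 3
    q = p * p                         # 9, the next base prime square
    c = 9
    while True:
        if c in next_composites:      # c is a multiple of an earlier base prime
            step = next_composites.pop(c)
        elif c < q:                   # not in the table and below the square
            yield c
            c += 2
            continue
        else:                         # c == q: only now start culling with p
            step = 2 * p
            p = next(base_primes)
            q = p * p
        m = c + step
        while m in next_composites:   # find a free slot for the next multiple
            m += step
        next_composites[m] = step
        c += 2


print(list(islice(postponed_sieve(), 10)))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

At any time the table holds one entry per base prime not exceeding the square root of the current candidate, which is what keeps its size small.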

The Scala implementation of a mutable HashMap is slower than the java.util.HashMap one by a factor of almost two, but the Scala version is used here to keep the code more portable (for example, to the CLR). One can also quite easily convert this code to use the immutable Scala HashMap, but the code then runs about four times slower due to the required "copy on update" operations for immutable objects.

This algorithm is very responsive to further application of wheel factorization, which can make it run up to about four times faster for the composite number culling operations; however, that is not enough to allow it to catch up to the array based sieves.

Scheme

Tail-recursive solution

Works with: Scheme version RRS
; Tail-recursive solution :
(define (sieve n)
  (define (aux u v)
    (let ((p (car v)))
      (if (> (* p p) n)
        (let rev-append ((u u) (v v))
          (if (null? u) v (rev-append (cdr u) (cons (car u) v))))
        (aux (cons p u)
          (let wheel ((u '()) (v (cdr v)) (a (* p p)))
            (cond ((null? v) (reverse u))
                  ((= (car v) a) (wheel u (cdr v) (+ a p)))
                  ((> (car v) a) (wheel u v (+ a p)))
                  (else (wheel (cons (car v) u) (cdr v) a))))))))
  (aux '(2)
    (let range ((v '()) (k (if (odd? n) n (- n 1))))
      (if (< k 3) v (range (cons k v) (- k 2))))))

; > (sieve 100)
; (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
; > (length (sieve 10000000))
; 664579

Simpler, non-tail-recursive solution

; Simpler solution, with the penalty that none of 'iota, 'strike or 'sieve is tail-recursive :
(define (iota start stop stride)
  (if (> start stop)
      (list)
      (cons start (iota (+ start stride) stop stride))))

(define (strike lst start stride)
  (cond ((null? lst) lst)
        ((= (car lst) start) (strike (cdr lst) (+ start stride) stride))
        ((> (car lst) start) (strike lst (+ start stride) stride))
        (else (cons (car lst) (strike (cdr lst) start stride)))))

(define (primes limit)
  (let ((stop (sqrt limit)))
    (define (sieve lst)
      (let ((p (car lst)))
        (if (> p stop)
            lst
            (cons p (sieve (strike (cdr lst) (* p p) p))))))
    (sieve (iota 2 limit 1))))

(display (primes 100))
(newline)

Output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Optimised using an odds-wheel

Optimised using a pre-computed wheel based on 2 (i.e. odds only):

(define (primes-wheel-2 limit)
  (let ((stop (sqrt limit)))
    (define (sieve lst)
      (let ((p (car lst)))
        (if (> p stop)
            lst
            (cons p (sieve (strike (cdr lst) (* p p) (* 2 p)))))))
    (cons 2 (sieve (iota 3 limit 2)))))

(display (primes-wheel-2 100))
(newline)

Output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Vector-based

Vector-based (faster), works with RRS:

; initialize v to vector of sequential integers
(define (initialize! v)
  (define (iter v n) (if (>= n (vector-length v)) 
                         (values) 
                         (begin (vector-set! v n n) (iter v (+ n 1)))))
  (iter v 0))

; set every nth element of vector v to 0,
; starting with element m
(define (strike! v m n)
  (cond ((>= m (vector-length v)) (values))
        (else (begin
                (vector-set! v m 0)
                (strike! v (+ m n) n)))))

; lowest non-zero index of vector v >= n
(define (nextprime v n)
  (if (zero? (vector-ref v n))
      (nextprime v (+ n 1))
      (vector-ref v n)))

; remove elements satisfying pred? from list lst
(define (remove pred? lst)
  (cond 
    ((null? lst) '())
    ((pred? (car lst))(remove pred? (cdr lst)))
    (else (cons (car lst) (remove pred? (cdr lst))))))

; the sieve itself
(define (sieve n)
  (define stop (sqrt n))
  (define (iter v p)
    (cond 
      ((> p stop) v)
      (else 
       (begin
         (strike! v (* p p) p)
         (iter v (nextprime v (+ p 1)))))))
  
  (let ((v (make-vector (+ n 1))))
    (initialize! v)
    (vector-set! v 1 0) ; 1 is not a prime
    (remove zero? (vector->list (iter v 2)))))

SICP-style streams

Using SICP-style head-forced streams. Works with MIT-Scheme and Chez Scheme, or with any other Scheme if the expansion of the only macro here, s-cons, is written out by hand with an explicit lambda. Common functions:

 ;;;; Stream Implementation
 (define (head s) (car s))   
 (define (tail s) ((cdr s)))  
 (define-syntax s-cons
   (syntax-rules () ((s-cons h t) (cons h (lambda () t))))) 

 ;;;; Stream Utility Functions
 (define (from-By x s)
   (s-cons x (from-By (+ x s) s)))
 (define (take n s) 
   (cond 
     ((> n 1) (cons (head s) (take (- n 1) (tail s))))
     ((= n 1) (list (head s)))      ;; don't force it too soon!!
     (else '())))     ;; so (take 4 (s-map / (from-By 4 -1))) works
 (define (drop n s)
   (cond 
     ((> n 0) (drop (- n 1) (tail s)))
     (else s)))
 (define (s-map f s)
   (s-cons (f (head s)) (s-map f (tail s))))
 (define (s-diff s1 s2)
   (let ((h1 (head s1)) (h2 (head s2)))
    (cond
     ((< h1 h2) (s-cons h1 (s-diff  (tail s1)       s2 )))
     ((< h2 h1)            (s-diff        s1  (tail s2)))
     (else                 (s-diff  (tail s1) (tail s2))))))
 (define (s-union s1 s2)
   (let ((h1 (head s1)) (h2 (head s2)))
    (cond
     ((< h1 h2) (s-cons h1 (s-union (tail s1)       s2 )))
     ((< h2 h1) (s-cons h2 (s-union       s1  (tail s2))))
     (else      (s-cons h1 (s-union (tail s1) (tail s2)))))))
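
For readers less familiar with Scheme, the head-forced stream representation above (a pair of an already computed head and a thunk for the tail) can be sketched in Python as follows. The names mirror the Scheme ones but are otherwise assumptions of this example. Python has no macros, so the lambda that s-cons inserts automatically is written out at each call site.

```python
def s_cons(h, thunk):
    """A stream is a pair: a forced head and a thunk producing the tail."""
    return (h, thunk)


def head(s):
    return s[0]


def tail(s):
    return s[1]()          # re-evaluated on every call, not memoized


def from_by(x, step):
    return s_cons(x, lambda: from_by(x + step, step))


def take(n, s):
    out = []
    while n > 0:
        out.append(head(s))
        n -= 1
        if n > 0:          # don't force the tail after the last element
            s = tail(s)
    return out


def s_diff(s1, s2):
    """Elements of s1 that are not in s2 (both increasing)."""
    while True:
        h1, h2 = head(s1), head(s2)
        if h1 < h2:
            return s_cons(h1, lambda: s_diff(tail(s1), s2))
        if h2 < h1:
            s2 = tail(s2)
        else:
            s1, s2 = tail(s1), tail(s2)


def s_union(s1, s2):
    """Ordered union of two increasing streams, without duplicates."""
    h1, h2 = head(s1), head(s2)
    if h1 < h2:
        return s_cons(h1, lambda: s_union(tail(s1), s2))
    if h2 < h1:
        return s_cons(h2, lambda: s_union(s1, tail(s2)))
    return s_cons(h1, lambda: s_union(tail(s1), tail(s2)))


print(take(5, s_diff(from_by(1, 1), from_by(2, 2))))    # [1, 3, 5, 7, 9]
print(take(6, s_union(from_by(2, 2), from_by(3, 3))))   # [2, 3, 4, 6, 8, 9]
```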

The simplest, naive sieve

Very slow, running at ~ n^2.2, empirically, and worsening:

 (define (sieve s) 
	(let ((p (head s))) 
	  (s-cons p 
	          (sieve (s-diff s (from-By p p))))))
 (define primes (sieve (from-By 2 1)))

Bounded, stopping early

Stops at the square root of the upper limit m, running at about ~ n^1.4 in n primes produced, empirically. Returns an infinite stream of numbers that is only valid up to m and includes composites above it:

 (define (primes-To m)
   (define (sieve s) 
     (let ((p (head s))) 
       (cond ((> (* p p) m) s) 
             (else (s-cons p 
	             (sieve (s-diff (tail s) 
	                     (from-By (* p p) p))))))))
   (sieve (from-By 2 1)))
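
The empirical orders of growth quoted in this section (such as ~ n^1.4) are the kind of figure obtained by timing a run at two problem sizes and taking the ratio of the logarithms. A minimal Python sketch of that calculation follows; the function name and the timings are made up for illustration and are not measurements of the Scheme code.

```python
import math


def empirical_order(n1, t1, n2, t2):
    """Exponent a such that t ~ n^a fits the measurements (n1, t1) and (n2, t2)."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Made-up timings: doubling n multiplies the run time by 2.64
print(round(empirical_order(1000, 1.00, 2000, 2.64), 2))   # 1.4
```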

Combined multiples sieve

Archetypal, straightforward approach by Richard Bird, presented in Melissa E. O'Neill's article. Uses s-linear-join, i.e. a right fold, which is less efficient and of worse time complexity than the tree-folding that follows. Does not attempt to conserve space by arranging for the additional inner feedback loop, as is done in the tree-folding variant below.

 (define (primes-stream-ala-Bird)
   (define (mults p) (from-By (* p p) p))
   (define primes                                          ;; primes are 
       (s-cons 2 (s-diff (from-By 3 1)                     ;;  numbers > 1, without 
                  (s-linear-join (s-map mults primes)))))  ;;   multiples of primes
   primes)

 ;;;; join streams using linear structure
 (define (s-linear-join sts)
   (s-cons (head (head sts)) 
           (s-union (tail (head sts)) 
                    (s-linear-join (tail sts)))))

Here is a version of the same sieve that is self-contained, with all the requisite functions wrapped in the overall function, and optimized further. It works with odd primes only, and arranges for a feed of base primes that is separate from the output stream, calculated recursively by the recursive call to "oddprms" in forming "cmpsts". It also "fuses" two functions, s-diff and from-By, into one, minusstrtat:

(define (birdPrimes)
  (define (mltpls p)
    (define pm2 (* p 2))
    (let nxtmltpl ((cmpst (* p p)))
      (cons cmpst (lambda () (nxtmltpl (+ cmpst pm2))))))
  (define (allmltpls ps)
    (cons (mltpls (car ps)) (lambda () (allmltpls ((cdr ps))))))
  (define (merge xs ys)
    (let ((x (car xs)) (xt (cdr xs)) (y (car ys)) (yt (cdr ys)))
      (cond ((< x y) (cons x (lambda () (merge (xt) ys))))
            ((> x y) (cons y (lambda () (merge xs (yt)))))
            (else (cons x (lambda () (merge (xt) (yt))))))))
  (define (mrgmltpls mltplss)
    (cons (car (car mltplss))
          (lambda () (merge ((cdr (car mltplss)))
                            (mrgmltpls ((cdr mltplss)))))))
  (define (minusstrtat n cmps)
    (if (< n (car cmps))
      (cons n (lambda () (minusstrtat (+ n 2) cmps)))
      (minusstrtat (+ n 2) ((cdr cmps)))))
  (define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
  (define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))  
  (cons 2 (lambda () (oddprms))))

It can be tested with the following code:

(define (nthPrime n)
  (let nxtprm ((cnt 0) (ps (birdPrimes)))
    (if (< cnt n) (nxtprm (+ cnt 1) ((cdr ps))) (car ps))))
(nthPrime 1000000)
Output:

15485863

The same code can easily be modified to perform the folded tree case just by writing and integrating a "pairs" function to do the folding along with the merge, which has been done as an alternate tree folding case below.

Tree-folding

The most efficient. Finds composites as a tree of unions of each prime's multiples.

 ;;;; all primes' multiples are removed, merged through a tree of unions
 ;;;;  runs in ~ n^1.15 run time in producing n = 100K .. 1M primes
 (define (primes-stream)
   (define (mults p) (from-By (* p p) (* 2 p)))
   (define (odd-primes-From from)              ;; odd primes from (odd) f are
       (s-diff (from-By from 2)                ;; all odds from f without the
               (s-tree-join (s-map mults odd-primes))))  ;; multiples of odd primes
   (define odd-primes 
       (s-cons 3 (odd-primes-From 5)))         ;; inner feedback loop
   (s-cons 2 (odd-primes-From 3)))             ;; result stream

 ;;;; join an ordered stream of streams (here, of primes' multiples)
 ;;;; into one ordered stream, via an infinite right-deepening tree
 (define (s-tree-join sts)
   (s-cons (head (head sts))
           (s-union (tail (head sts))
                    (s-tree-join (pairs (tail sts))))))

 (define (pairs sts)                        ;; {a.(b.t)} -> (a+b).{t}
     (s-cons (s-cons (head (head sts)) 
                     (s-union (tail (head sts)) 
                              (head (tail sts))))
             (pairs (tail (tail sts)))))

Print 10 last primes of the first thousand primes:

(display (take 10 (drop 990 (primes-stream)))) 
;
(7841 7853 7867 7873 7877 7879 7883 7901 7907 7919)

This can also be accomplished by the following self-contained code, which follows the format of the birdPrimes code above, with the added "pairs" function integrated into the "mrgmltpls" function:

(define (treemergePrimes)
  (define (mltpls p)
    (define pm2 (* p 2))
    (let nxtmltpl ((cmpst (* p p)))
      (cons cmpst (lambda () (nxtmltpl (+ cmpst pm2))))))
  (define (allmltpls ps)
    (cons (mltpls (car ps)) (lambda () (allmltpls ((cdr ps))))))
  (define (merge xs ys)
    (let ((x (car xs)) (xt (cdr xs)) (y (car ys)) (yt (cdr ys)))
      (cond ((< x y) (cons x (lambda () (merge (xt) ys))))
            ((> x y) (cons y (lambda () (merge xs (yt)))))
            (else (cons x (lambda () (merge (xt) (yt))))))))
  (define (pairs mltplss)
    (let ((tl ((cdr mltplss))))
      (cons (merge (car mltplss) (car tl))
            (lambda () (pairs ((cdr tl)))))))
  (define (mrgmltpls mltplss)
    (cons (car (car mltplss))
          (lambda () (merge ((cdr (car mltplss)))
                            (mrgmltpls (pairs ((cdr mltplss))))))))
  (define (minusstrtat n cmps)
    (if (< n (car cmps))
      (cons n (lambda () (minusstrtat (+ n 2) cmps)))
      (minusstrtat (+ n 2) ((cdr cmps)))))
  (define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
  (define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))  
  (cons 2 (lambda () (oddprms))))

It can be tested with the same code as the self-contained Richard Bird sieve, just by calling treemergePrimes instead of birdPrimes.

Generators

(define (integers n)
  (lambda ()
    (let ((ans n))
      (set! n (+ n 1))
      ans)))

(define natural-numbers (integers 0)) 

(define (remove-multiples g n)
  (letrec ((m (+ n n))
           (self
              (lambda ()
                 (let loop ((x (g)))
                    (cond ((< x m) x)
                          ((= x m) (set! m (+ m n)) (self))
                          (else (set! m (+ m n)) (loop x)))))))
     self))

(define (sieve g)
  (lambda ()
    (let ((x (g)))
      (set! g (remove-multiples g x))
      x)))

(define primes (sieve (integers 2)))

Scilab

function a = sieve(n)
    a = ~zeros(n, 1)
    a(1) = %f
    for i = 1:n
        if a(i)
            j = i*i
            if j > n
                return
            end
            a(j:i:n) = %f
        end
    end
endfunction

find(sieve(100))
// [2 3 5 ... 97]

sum(sieve(1000))
// 168, the number of primes below 1000


Scratch

when clicked
    broadcast: fill list with zero (0) and wait
    broadcast: put ones (1) in list of multiples and wait
    broadcast: fill primes where zeros (0) in list

when I receive: fill list with zero (0)
    delete all of primes
    delete all of list
    set i to 0
    set maximum to 25
    repeat maximum
        add 0 to list
        change i by 1
    {end repeat}

when I receive: put ones (1) in list of multiples
    set S to sqrt of maximum
    set i to 2
    set k to 0
    repeat S
        change J by 1
        set i to 2
        repeat until i > 100
            if not (i = J) then
                if item i of list = 0 then
                    set m to (i mod J)
                    if (m = 0) then
                        replace item i of list with 1
        {end repeat until}
        change i by 1
        set k to 1
        delete all of primes
    {end repeat}
    set J to 1

when I receive: fill primes where zeros (0) in list
    repeat maximum
        if (item k of list) = 0 then
            add k to primes
        set k to (k + 1)
    {end repeat}

Comments

Scratch is a graphical drag-and-drop language designed to introduce children to programming, with easy-to-use multimedia and animation features. The code listed above was not entered into the Scratch IDE, but it faithfully represents the graphical code blocks used to run the sieve algorithm. The graphical blocks themselves cannot be shown on this web site because it cannot display graphical code directly. The actual code and output can be viewed or downloaded at an external link:

Scratch Code and Output

Seed7

The program below computes the number of primes between 1 and 10000000:

$ include "seed7_05.s7i";

const func set of integer: eratosthenes (in integer: n) is func
  result
    var set of integer: sieve is EMPTY_SET;
  local
    var integer: i is 0;
    var integer: j is 0;
  begin
    sieve := {2 .. n};
    for i range 2 to sqrt(n) do
      if i in sieve then
        for j range i ** 2 to n step i do
          excl(sieve, j);
        end for;
      end if;
    end for;
  end func;

const proc: main is func
  begin
    writeln(card(eratosthenes(10000000)));
  end func;

Original source: [1]

SETL

program eratosthenes;
    print(sieve 100);

    op sieve(n);
        numbers := [1..n];
        numbers(1) := om;
        loop for i in [2..floor sqrt n] do
            loop for j in [i*i, i*i+i..n] do
                numbers(j) := om;
            end loop;
        end loop;
        return [n : n in numbers | n /= om];
    end op;
end program;
Output:
[2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

Sidef

Translation of: Raku
func sieve(limit) {
    var sieve_arr = [false, false, (limit-1).of(true)...]
    gather {
        sieve_arr.each_kv { |number, is_prime|
            if (is_prime) {
                take(number)
                for i in (number**2 .. limit `by` number) {
                    sieve_arr[i] = false
                }
            }
        }
    }
}

say sieve(100).join(",")
Output:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97

Alternative implementation:

func sieve(limit) {
    var composite = []
    for n in (2 .. limit.isqrt) {
        for i in (n**2 .. limit `by` n) {
            composite[i] = true
        }
    }
    2..limit -> grep{ !composite[_] }
}

say sieve(100).join(",")

Simula

Works with: Simula-67
BEGIN
    INTEGER ARRAY t(0:1000);
    INTEGER i,j,k;
    FOR i:=0 STEP 1 UNTIL 1000 DO t(i):=1;
    t(0):=0; t(1):=0;
    i:=0;
    FOR i:=i WHILE i<1000 DO
    BEGIN
        FOR i:=i WHILE i<1000 AND t(i)=0 DO i:=i+1;
        IF i<1000 THEN
        BEGIN
            j:=2;
            k:=j*i;
            FOR k:=k WHILE k<1000 DO
            BEGIN
                t(k):=0;
                j:=j+1;
                k:=j*i
            END;
            i:=i+1
        END
    END;
    FOR i:=0 STEP 1 UNTIL 999 DO
       IF t(i)<>0  THEN
       BEGIN
           OutInt(i,5); OutImage
       END
END
Output:
    2
    3
    5
    7
   11
   13
   17
   19
   23
   29
...
  937
  941
  947
  953
  967
  971
  977
  983
  991
  997

A Concurrent Prime Sieve

! A CONCURRENT PRIME SIEVE ;

BEGIN

BOOLEAN DEBUG;

CLASS FILTER(INPUT, OUTPUT, PRIME); REF(FILTER) INPUT, OUTPUT; INTEGER PRIME;
BEGIN
    INTEGER NUM;
    IF PRIME = 0 AND INPUT == NONE THEN
    BEGIN
        ! SEND THE SEQUENCE 2, 3, 4, ... TO CHANNEL 'CH'. ;
        DETACH;
        NUM := 2;
        WHILE TRUE DO
        BEGIN
            IF OUTPUT == NONE THEN
            BEGIN
                IF DEBUG THEN
                BEGIN
                    OUTTEXT("GENERATE SENDS ");
                    OUTINT(NUM, 0);
                    OUTIMAGE;
                END;
                DETACH; ! SEND 'NUM' ;
            END ELSE
            BEGIN
                IF DEBUG THEN
                BEGIN
                    OUTTEXT("GENERATE SENDS ");
                    OUTINT(NUM, 0);
                    OUTTEXT(" TO FILTER("); OUTINT(OUTPUT.PRIME, 0);
                    OUTTEXT(")");
                    OUTIMAGE;
                END;
                OUTPUT.NUM := NUM;
                RESUME(OUTPUT);
            END;
            NUM := NUM + 1;
        END;
    END ELSE
    BEGIN
        ! COPY THE VALUES FROM CHANNEL 'IN' TO CHANNEL 'OUT', ;
        ! REMOVING THOSE DIVISIBLE BY 'PRIME'. ;
        DETACH;
        ! FILTER ;
        WHILE TRUE DO
        BEGIN
            INTEGER I;
            RESUME(INPUT);
            I := INPUT.NUM; ! RECEIVE VALUE FROM 'INPUT'. ;
            IF DEBUG THEN
            BEGIN
                OUTTEXT("FILTER("); OUTINT(PRIME, 0); OUTTEXT(") RECEIVES ");
                OUTINT(I, 0);
                OUTIMAGE;
            END;
            IF NOT MOD(I, PRIME) = 0 THEN
            BEGIN
                IF OUTPUT == NONE THEN
                BEGIN
                    IF DEBUG THEN
                    BEGIN
                        OUTTEXT("FILTER("); OUTINT(PRIME, 0);
                        OUTTEXT(") SENDS ");
                        OUTINT(I, 0);
                        OUTIMAGE;
                    END;
                    DETACH;
                END ELSE
                BEGIN
                    IF DEBUG THEN
                    BEGIN
                        OUTTEXT("FILTER("); OUTINT(PRIME, 0);
                        OUTTEXT(") SENDS ");
                        OUTINT(I, 0);
                        OUTTEXT(" TO FILTER("); OUTINT(OUTPUT.PRIME, 0);
                        OUTTEXT(")");
                        OUTIMAGE;
                    END;
                    OUTPUT.NUM := I; ! SEND 'I' TO 'OUT'. ;
                    RESUME(OUTPUT);
                END;
            END;
        END;
    END;
END;

! THE PRIME SIEVE: DAISY-CHAIN FILTER PROCESSES. ;
! MAIN BLOCK ;
    REF(FILTER) CH;
    INTEGER I, PRIME;
    DEBUG := TRUE;
    CH :- NEW FILTER(NONE, NONE, 0); ! LAUNCH GENERATE GOROUTINE. ;
    FOR I := 1 STEP 1 UNTIL 5 DO
    BEGIN
        REF(FILTER) CH1;
        RESUME(CH);
        PRIME := CH.NUM;
        IF DEBUG THEN OUTTEXT("MAIN BLOCK RECEIVES ");
        OUTINT(PRIME,0);
        OUTIMAGE;
        CH1 :- NEW FILTER(CH, NONE, PRIME);
        CH.OUTPUT :- CH1;
        CH :- CH1;
    END;
END;

Output:

GENERATE SENDS 2
MAIN BLOCK RECEIVES 2
GENERATE SENDS 3 TO FILTER(2)
FILTER(2) RECEIVES 3
FILTER(2) SENDS 3
MAIN BLOCK RECEIVES 3
GENERATE SENDS 4 TO FILTER(2)
FILTER(2) RECEIVES 4
GENERATE SENDS 5 TO FILTER(2)
FILTER(2) RECEIVES 5
FILTER(2) SENDS 5 TO FILTER(3)
FILTER(3) RECEIVES 5
FILTER(3) SENDS 5
MAIN BLOCK RECEIVES 5
GENERATE SENDS 6 TO FILTER(2)
FILTER(2) RECEIVES 6
GENERATE SENDS 7 TO FILTER(2)
FILTER(2) RECEIVES 7
FILTER(2) SENDS 7 TO FILTER(3)
FILTER(3) RECEIVES 7
FILTER(3) SENDS 7 TO FILTER(5)
FILTER(5) RECEIVES 7
FILTER(5) SENDS 7
MAIN BLOCK RECEIVES 7
GENERATE SENDS 8 TO FILTER(2)
FILTER(2) RECEIVES 8
GENERATE SENDS 9 TO FILTER(2)
FILTER(2) RECEIVES 9
FILTER(2) SENDS 9 TO FILTER(3)
FILTER(3) RECEIVES 9
GENERATE SENDS 10 TO FILTER(2)
FILTER(2) RECEIVES 10
GENERATE SENDS 11 TO FILTER(2)
FILTER(2) RECEIVES 11
FILTER(2) SENDS 11 TO FILTER(3)
FILTER(3) RECEIVES 11
FILTER(3) SENDS 11 TO FILTER(5)
FILTER(5) RECEIVES 11
FILTER(5) SENDS 11 TO FILTER(7)
FILTER(7) RECEIVES 11
FILTER(7) SENDS 11
MAIN BLOCK RECEIVES 11
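
The daisy chain of filters traced in the output above can also be sketched with ordinary generators. The Python sketch below is not a translation of the Simula coroutine code: the names `generate`, `filter_multiples` and `first_primes` are illustrative assumptions, and each generator simply plays the role of one FILTER object pulling values from its input.

```python
def generate():
    """Produce the sequence 2, 3, 4, ... (the head of the chain)."""
    n = 2
    while True:
        yield n
        n += 1


def filter_multiples(source, prime):
    """Pass on the values of `source` that are not divisible by `prime`."""
    for i in source:
        if i % prime != 0:
            yield i


def first_primes(count):
    """Return the first `count` primes by chaining one filter per prime found."""
    channel = generate()
    primes = []
    for _ in range(count):
        prime = next(channel)          # the main block receives a prime
        primes.append(prime)
        channel = filter_multiples(channel, prime)  # append a new filter
    return primes


print(first_primes(5))
```

As in the trace, every candidate travels through the filters for 2, 3, 5, ... in order, and only a value that survives all of them reaches the main block.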

Smalltalk

A simple implementation that you can run in a workspace. It finds all the prime numbers up to and including limit—for the sake of example, up to and including 100.

| potentialPrimes limit |
limit := 100.
potentialPrimes := Array new: limit.
potentialPrimes atAllPut: true.
2 to: limit sqrt do: [:testNumber |
    (potentialPrimes at: testNumber) ifTrue: [
        (testNumber * 2) to: limit by: testNumber do: [:nonPrime |
            potentialPrimes at: nonPrime put: false
        ]
    ]
].
2 to: limit do: [:testNumber |
    (potentialPrimes at: testNumber) ifTrue: [
        Transcript show: testNumber asString; cr
    ]
]

SNOBOL4

Using strings instead of arrays, and the square/sqrt optimizations.

        define('sieve(n)i,j,k,p,str,res') :(sieve_end)
sieve   i = lt(i,n - 1) i + 1 :f(sv1)
        str = str (i + 1) ' ' :(sieve)
sv1     str break(' ') . j span(' ') = :f(return)
        sieve = sieve j ' '
        sieve = gt(j ^ 2,n) sieve str :s(return) ;* Opt1
        res = ''
        str (arb ' ') @p ((j ^ 2) ' ') ;* Opt2
        str len(p) . res = ;* Opt2
sv2     str break(' ') . k  span(' ') = :f(sv3)
        res = ne(remdr(k,j),0) res k ' ' :(sv2)
sv3     str = res :(sv1)
sieve_end

*       # Test and display        
        output = sieve(100)
end

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
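
The two optimizations marked Opt1 and Opt2 in the listing can be sketched on a plain list of candidates. This is a hedged Python illustration of the same idea rather than a translation; the function name `sieve_by_filtering` and its variable names are assumptions made for the example.

```python
def sieve_by_filtering(n):
    """Return the primes up to and including n by repeatedly filtering a list."""
    candidates = list(range(2, n + 1))
    primes = []
    while candidates:
        j = candidates.pop(0)          # the smallest survivor is prime
        primes.append(j)
        if j * j > n:
            # Opt1: nothing left can have a factor <= j, so all are prime.
            primes.extend(candidates)
            break
        # Opt2: survivors below j*j are untouched; only the rest are tested.
        untouched = [k for k in candidates if k < j * j]
        filtered = [k for k in candidates if k >= j * j and k % j != 0]
        candidates = untouched + filtered
    return primes


print(sieve_by_filtering(100))
```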

SparForte

As a structured script.

#!/usr/local/bin/spar
pragma annotate( summary, "sieve" );
pragma annotate( description, "The Sieve of Eratosthenes is a simple algorithm that" );
pragma annotate( description, "finds the prime numbers up to a given integer. Implement ");
pragma annotate( description, "this algorithm, with the only allowed optimization that" );
pragma annotate( description, "the outer loop can stop at the square root of the limit," );
pragma annotate( description, "and the inner loop may start at the square of the prime" );
pragma annotate( description, "just found. That means especially that you shouldn't" );
pragma annotate( description, "optimize by using pre-computed wheels, i.e. don't assume" );
pragma annotate( description, "you need only to cross out odd numbers (wheel based on" );
pragma annotate( description, "2), numbers equal to 1 or 5 modulo 6 (wheel based on 2" );
pragma annotate( description, "and 3), or similar wheels based on low primes." );
pragma annotate( see_also, "http://rosettacode.org/wiki/Sieve_of_Eratosthenes" );
pragma annotate( author, "Ken O. Burtch" );
pragma license( unrestricted );

pragma restriction( no_external_commands );

procedure sieve is 
   last_bool : constant positive := 20;
   type bool_array is array(2..last_bool) of boolean;
   a : bool_array;
 
   test_num : positive;  
   -- limit    : positive := positive(numerics.sqrt(float(arrays.last(a))));

   -- n : positive := 2;  
begin
   for i in arrays.first(a)..last_bool loop
     a(i) := true;
   end loop;

   for num in arrays.first(a)..last_bool loop
     if a(num) then
        test_num := num * num;
        while test_num <= last_bool loop
          a(test_num) := false;
          test_num := @ + num;
        end loop;
     end if;
   end loop;
 
   for i in arrays.first(a)..last_bool loop
     if a(i) then
       put_line(i);
     end if;
   end loop;
end sieve;

Standard ML

Works with SML/NJ. This uses BitArrays, which are available in SML/NJ. The algorithm is the one described on Wikipedia, referred to above. The limit is normally memory. With more than 20 petabytes of memory available, this code would be limited to a maximum integer of around 1.44*10E17 by the maximum list length in SML/NJ. Using two extra loops, however, the bit arrays can simply be stored to disk and processed in multiple lists. With a tail-recursive wrapper function as well, the upper limit would be determined by available disk space only.

val segmentedSieve =  fn N =>
(* output : list of {segment=bit segment, start=number at startposition segment} *)

let

val NSegmt= 120000000;                                                                                  (* segment size *)


val inf2i = IntInf.toInt ;
val i2inf = IntInf.fromInt ;
val isInt = fn m => m <= IntInf.fromInt (valOf Int.maxInt);

val sweep = fn (bits, step, k, up) =>                                                                   (* strike off bits up to limit *)
       (while (  !k < up  andalso 0 <= !k  ) do
             (  BitArray.clrBit( bits, !k ) ; k:= !k +  step ; ()) ) handle Overflow => ()

val rec nextPrimebit =                                                                                  (* find next 1-bit within segment *)
     fn Bits =>
     fn pos  =>
        if pos+1 >= BitArray.length Bits
	  then    NONE
          else    ( if BitArray.sub ( Bits,pos) then SOME (i2inf pos) else nextPrimebit Bits (pos+1) );


val sieveEratosthenes =  fn n: int =>                                                             (* Eratosthenes sieve , up to+incl n *)

 let
  val nums= BitArray.bits(n,[] );
  val i=ref 2;
  val k=ref (!i * (!i) -1);

 in

  ( BitArray.complement nums;
    BitArray.clrBit( nums, 0 );
    while ( !k <n ) do (  if ( BitArray.sub (nums, !i - 1) ) then  sweep (nums, !i, k, n) else ();
      i:= !i+1;
      k:= !i * (!i) - 1 
    );
    [ { start= i2inf 1, segment=nums } ]                                                                              
  )

 end;



val sieveThroughSegment =

 fn ( primes : { segment:BitArray.array, start:IntInf.int } list, low : IntInf.int, up ) =>
                                                                                                        (* second segment and on *)
 let
  val n      = inf2i (up-low+1)
  val nums   = BitArray.bits(n,[] ); 
  val itdone = low div i2inf NSegmt

  val rec oldprimes = fn c =>  fn                                                                 (* use segment B to sweep current one *)
                 []                       => ()
      | ctlist as {start=st,segment=B}::t =>
       let
       
        val nxt  = nextPrimebit B c ;
        val p    = st +  Option.getOpt( nxt,~10)  
        val modp = ( i2inf NSegmt * itdone ) mod p
        val i    = inf2i ( if( isInt( p - modp ) ) then p - modp else 0 )                               (* i = 0 breaks off *)
        val k    = ref   ( if Option.isSome nxt  then  (i - 1)  else ~2 )
        val step = if (isInt(p)) then inf2i(p) else valOf Int.maxInt                                    (* !k+maxInt > n *)

       in
       
          ( sweep (nums, step, k, n) ;
	    if ( p*p <= up  andalso  Option.isSome nxt )
	       then    oldprimes ( inf2i (p-st+1) ) ctlist
	       else    oldprimes 0 t                                                                    (* next segment B *)
          ) 

       end

 in
  (  BitArray.complement nums;
     oldprimes 0 primes;
     rev ( {start = low, segment = nums } :: rev (primes) )
  )
 end;



val rec workSegmentsDown = fn firstFn =>
    			   fn nextFns =>
			   fn sizeSeg : int =>
			   fn upLimit : IntInf.int =>
 let
   val residual = upLimit mod i2inf sizeSeg
 in

   if ( upLimit <= i2inf sizeSeg ) then firstFn (  inf2i upLimit )
   else
     if ( residual > 0 ) then
           nextFns ( workSegmentsDown firstFn nextFns sizeSeg (upLimit - residual ),     upLimit - residual      + 1, upLimit )
     else
           nextFns ( workSegmentsDown firstFn nextFns sizeSeg (upLimit - i2inf sizeSeg), upLimit - i2inf sizeSeg + 1, upLimit ) 
 end;

in

  workSegmentsDown  sieveEratosthenes  sieveThroughSegment  NSegmt  N

end;

Example, segment size 120 million, prime numbers up to 2.5 billion:

- val writeSegment = fn  L : {segment:BitArray.array, start:IntInf.int} list =>   fn NthSegment =>
		   let
		        val M=List.nth (L , NthSegment - 1 )
		   in
		        List.map (fn x=> x + #start M)  (map IntInf.fromInt (BitArray.getBits ( #segment M)) ) 
		   end;
- val primesInBits = segmentedSieve 2500000000 ;
val primesInBits =
  [{segment=-,start=1},{segment=-,start=120000001},
   {segment=-,start=240000001},{segment=-,start=360000001},
   {segment=-,start=480000001},..  <skipped> ,...]
  : {segment:BitArray.array, start:IntInf.int} list
- writeSegment primesInBits 21 ;
val it =
  [2400000011,2400000017,2400000023,2400000047,2400000061,2400000073,
   2400000091,2400000103,2400000121,2400000133,2400000137,2400000157,...]
  : IntInf.int list
- writeSegment primesInBits 1 ;
val it = [2,3,5,7,11,13,17,19,23,29,31,37,...] : IntInf.int list
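
The segment-by-segment idea used above can be shown in a much smaller form. The following Python sketch is an assumption-laden illustration, not a port of the SML code: it uses byte arrays rather than bit arrays, keeps only one segment in memory at a time, and the names `segmented_primes` and `segment_size` are invented for the example.

```python
from math import isqrt


def segmented_primes(limit, segment_size=32_768):
    """Yield the primes <= limit, sieving one fixed-size segment at a time."""
    if limit < 2:
        return
    root = isqrt(limit)
    # Ordinary sieve for the base primes up to sqrt(limit).
    is_prime = bytearray([1]) * (root + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, isqrt(root) + 1):
        if is_prime[i]:
            for m in range(i * i, root + 1, i):
                is_prime[m] = 0
    base_primes = [i for i in range(2, root + 1) if is_prime[i]]

    for low in range(2, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        segment = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the segment, but never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for m in range(start, high + 1, p):
                segment[m - low] = 0
        for offset, flag in enumerate(segment):
            if flag:
                yield low + offset


print(list(segmented_primes(100, segment_size=10)))
```

Only the base primes and the current segment are held in memory, which is the property that lets the segmented approach reach ranges far beyond what a single array allows.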

Stata

A program to create a dataset with a variable p containing primes up to a given number.

prog def sieve
	args n
	clear
	qui set obs `n'
	gen long p=_n
	gen byte a=_n>1
	forv i=2/`n' {
		if a[`i'] {
			loc j=`i'*`i'
			if `j'>`n' {
				continue, break
			}
			forv k=`j'(`i')`n' {
				qui replace a=0 in `k'
			}
		}
	}
	qui keep if a
	drop a
end

Example call

sieve 100
list in 1/10 // show only the first ten primes

     +----+
     |  p |
     |----|
  1. |  2 |
  2. |  3 |
  3. |  5 |
  4. |  7 |
  5. | 11 |
     |----|
  6. | 13 |
  7. | 17 |
  8. | 19 |
  9. | 23 |
 10. | 29 |
     +----+

Mata

mata
real colvector sieve(real scalar n) {
	real colvector a
	real scalar i, j
	if (n < 2) return(J(0, 1, .))
	a = J(n, 1, 1)
	a[1] = 0
	for (i = 1; i <= n; i++) {
		if (a[i]) {
			j = i*i
			if (j > n) return(select(1::n, a))
			for (; j <= n; j = j+i) a[j] = 0
		}
	}
}

sieve(10)
       1
    +-----+
  1 |  2  |
  2 |  3  |
  3 |  5  |
  4 |  7  |
    +-----+
end

Swift

import Foundation // for sqrt() and Date()

let max = 1_000_000
let maxroot = Int(sqrt(Double(max)))
let startingPoint = Date()

var isprime = [Bool](repeating: true, count: max+1 )
for i in 2...maxroot {
    if isprime[i] {
        for k in stride(from: max/i, through: i, by: -1) {
            if isprime[k] {
                isprime[i*k] = false }
        }
    }
}

var count = 0
for i in 2...max {
    if isprime[i] {
        count += 1
    }
}
print("\(count) primes found under \(max)")

print("\(startingPoint.timeIntervalSinceNow * -1) seconds")
Output:
78498 primes found under 1000000
0.01282501220703125 seconds

Timed on an iMac with a 3.2 GHz Intel Core i5.

Alternative odds-only version

The two most obvious improvements are to sieve only the odd numbers, as two is the only even prime, and to bit-pack the sieving array so that each number is represented by one bit instead of a whole 8-bit byte. Together these changes reduce memory use by a factor of 16, and the better CPU cache locality more than compensates for the extra code required to implement bit packing, as in the following code:

func soePackedOdds(_ n: Int) ->
    LazyMapSequence<UnfoldSequence<Int, (Int?, Bool)>, Int> {
 
  let lmti = (n - 3) / 2
  let size = lmti / 8 + 1
  var sieve = Array<UInt8>(repeating: 0, count: size)
  let sqrtlmti = (Int(sqrt(Double(n))) - 3) / 2
 
  for i in 0...sqrtlmti {
    if sieve[i >> 3] & (1 << (i & 7)) == 0 {
      let p = i + i + 3
      for c in stride(from: (i*(i+3)<<1)+3, through: lmti, by: p) {
        sieve[c >> 3] |= 1 << (c & 7)
      }
    }
  }

  return sequence(first: -1, next: { (i:Int) -> Int? in
      var ni = i + 1
      while ni <= lmti && sieve[ni >> 3] & (1 << (ni & 7)) != 0 { ni += 1}
      return ni > lmti ? nil : ni
    }).lazy.map { $0 < 0 ? 2 : $0 + $0 + 3 }
}

The output for the same testing (with `soePackedOdds` substituted for `primes`) is the same, except that it is about 1.5 times faster, at only 1200 cycles per prime.

These "one huge sieving array" algorithms are never going to be very fast for extended ranges (past about the CPU L2 cache size for this processor supporting a range of about eight million), and a page segmented approach should be taken as per the last of the unbounded algorithms below.
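
The index arithmetic of the odds-only bit-packed sieve is easy to get wrong, so here is a small Python sketch of the same layout for comparison. It assumes the mapping "bit index i stands for the odd number 2*i + 3", under which the square of that number sits at index 2*i*(i + 3) + 3; the function name `packed_odds_primes` is hypothetical.

```python
from math import isqrt


def packed_odds_primes(n):
    """Return the primes <= n using one bit per odd number >= 3."""
    if n < 2:
        return []
    if n < 3:
        return [2]
    limit_index = (n - 3) // 2                  # index of the largest odd <= n
    composite = bytearray(limit_index // 8 + 1)  # 8 odd numbers per byte
    sqrt_index = (isqrt(n) - 3) // 2            # negative when n < 9
    for i in range(sqrt_index + 1):
        if not composite[i >> 3] & (1 << (i & 7)):
            p = 2 * i + 3
            # Start culling at p*p, whose index is 2*i*(i + 3) + 3.
            for c in range(2 * i * (i + 3) + 3, limit_index + 1, p):
                composite[c >> 3] |= 1 << (c & 7)
    primes = [2]
    primes.extend(2 * i + 3 for i in range(limit_index + 1)
                  if not composite[i >> 3] & (1 << (i & 7)))
    return primes


print(packed_odds_primes(100))
```

Stepping the index by p corresponds to stepping the represented number by 2*p, which is why even multiples are never visited.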

Unbounded (Odds-Only) Versions

To use Swift's "higher order functions" effectively on the generated `Sequence`s, one needs sieves that are unbounded (or bounded only by the numeric range chosen for the implementation). Many of these are incremental sieves that, instead of buffering a series of culled arrays, record the culling structure of the base primes (which should come from a secondary stream of primes for efficiency) and produce the primes incrementally by referencing and updating that structure. Various structures may be chosen for this, such as a MinHeap priority queue, a hash dictionary, or a simple list tree structure as used in the following code:

import Foundation

func soeTreeFoldingOdds() -> UnfoldSequence<Int, (Int?, Bool)> {
  class CIS<T> {
    let head: T
    let rest: (() -> CIS<T>)
    init(_ hd: T, _ rst: @escaping (() -> CIS<T>)) {
      self.head = hd; self.rest = rst
    }
  }

  func merge(_ xs: CIS<Int>, _ ys: CIS<Int>) -> CIS<Int> {
    let x = xs.head; let y = ys.head
    if x < y { return CIS(x, {() in merge(xs.rest(), ys) }) }
    else { if y < x { return CIS(y, {() in merge(xs, ys.rest()) }) }
    else { return CIS(x, {() in merge(xs.rest(), ys.rest()) }) } }
  }

  func smults(_ p: Int) -> CIS<Int> {
    let inc = p + p
    func smlts(_ c: Int) -> CIS<Int> {
      return CIS(c, { () in smlts(c + inc) })
    }
    return smlts(p * p)
  }

  func allmults(_ ps: CIS<Int>) -> CIS<CIS<Int>> {
    return CIS(smults(ps.head), { () in allmults(ps.rest()) })
  }

  func pairs(_ css: CIS<CIS<Int>>) -> CIS<CIS<Int>> {
    let cs0 = css.head; let ncss = css.rest()
    return CIS(merge(cs0, ncss.head), { () in pairs(ncss.rest()) })
  }

  func cmpsts(_ css: CIS<CIS<Int>>) -> CIS<Int> {
    let cs0 = css.head
    return CIS(cs0.head, { () in merge(cs0.rest(), cmpsts(pairs(css.rest()))) })
  }

  func minusAt(_ n: Int, _ cs: CIS<Int>) -> CIS<Int> {
    var nn = n; var ncs = cs
    while nn >= ncs.head { nn += 2; ncs = ncs.rest() }
    return CIS(nn, { () in minusAt(nn + 2, ncs) })
  }

  func oddprms() -> CIS<Int> {
    return CIS(3, { () in minusAt(5, cmpsts(allmults(oddprms()))) })
  }

  var odds = oddprms()
  return sequence(first: 2, next: { _ in
    let p = odds.head; odds = odds.rest()
    return p
  })
}

let range = 100000000

print("The primes up to 100 are:")
soeTreeFoldingOdds().prefix(while: { $0 <= 100 })
  .forEach { print($0, "", terminator: "") }
print()

print("Found \(soeTreeFoldingOdds().lazy.prefix(while: { $0 <= 1000000 })
                .reduce(0) { (a, _) in a + 1 }) primes to 1000000.")

let start = NSDate()
let answr = soeTreeFoldingOdds().prefix(while: { $0 <= range })
              .reduce(0) { (a, _) in a + 1 }
let elpsd = -start.timeIntervalSinceNow

print("Found \(answr) primes to \(range).")

print(String(format: "This test took %.3f milliseconds.", elpsd * 1000))

The output is the same as for the above, except that it is much slower, at about 56,000 CPU clock cycles per prime even when sieving only to ten million, due to the many memory allocations and de-allocations. It also has an O(n (log n) (log (log n))) asymptotic computational complexity (with the extra "log n" factor), which makes it slower with increasing range. This makes the algorithm useful only up to ranges of a few million, although it is adequate for trivial problems such as Euler Problem 10, summing all the primes up to two million.

Note that the above code is almost a pure functional version using immutability other than for the use of loops because Swift doesn't support Tail Call Optimization (TCO) in function recursion: the loops do what TCO usually automatically does "under the covers".
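
The discussion of unbounded sieves above also names a MinHeap priority queue as a possible culling structure. As a rough illustration of that alternative, here is a Python sketch using the standard `heapq` module; the generator name `heap_primes` is an assumption, and the code is not derived from the Swift listing. Each heap entry holds the next odd multiple of a base prime and the step to the following one, and base primes are taken from a secondary stream only when their square is reached.

```python
import heapq
from itertools import count, islice


def heap_primes():
    """Yield primes without bound, using a min-heap of pending composites."""
    yield from (2, 3, 5, 7)
    base = heap_primes()       # secondary stream of base primes
    next(base)                 # skip 2: only odd numbers are examined
    p = next(base)             # 3
    q = p * p                  # square of the current base prime
    heap = []                  # entries are (next_composite, step)
    for n in count(9, 2):
        if n == q:
            # Start culling with p from here on; its next odd multiple is q + 2p.
            heapq.heappush(heap, (q + 2 * p, 2 * p))
            p = next(base)
            q = p * p
        elif heap and heap[0][0] == n:
            # n is composite: advance every entry that currently equals n.
            while heap[0][0] == n:
                composite, step = heap[0]
                heapq.heapreplace(heap, (composite + step, step))
        else:
            yield n


print(list(islice(heap_primes(), 10)))
```

Because a prime is only added to the heap once the candidates reach its square, the heap holds just the primes up to the square root of the current candidate.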

Alternative version using a (mutable) Hash Dictionary

As the above code is slow due to memory allocations/de-allocations and the inherent extra "log n" term in the complexity, the following code uses a hash dictionary, which has an average access time of O(1) (without the "log n" factor), and uses mutability for increased speed, so it is in no way purely functional:

func soeDictOdds() -> UnfoldSequence<Int, Int> {
  var bp = 5; var q = 25
  var bps: UnfoldSequence<Int, Int>.Iterator? = nil
  var dict = [9: 6] // Dictionary<Int, Int>(9 => 6)
  return sequence(state: 2, next: { n in
    if n < 9 { if n < 3 { n = 3; return 2 }; defer {n += 2}; return n }
    while n >= q || dict[n] != nil {
      if n >= q {
        let inc = bp + bp
        dict[n + inc] = inc
        if bps == nil {
          bps = soeDictOdds().makeIterator()
          bp = (bps?.next())!; bp = (bps?.next())!; bp = (bps?.next())! // skip 2/3/5...
        }
        bp = (bps?.next())!; q = bp * bp // guaranteed never nil
      } else {
        let inc = dict[n] ?? 0
        dict[n] = nil
        var next = n + inc
        while dict[next] != nil { next += inc }
        dict[next] = inc
      }
      n += 2
    }
    defer { n += 2 }; return n
  })
}

It can be used with the above testing code by substituting `soeDictOdds` for `soeTreeFoldingOdds` in three places; the output is the same, except that it is over four times faster, at about 12,500 CPU clock cycles per prime.
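
The dictionary bookkeeping may be easier to follow in a compact form. The Python sketch below illustrates the same general scheme under stated assumptions: the dictionary maps each pending odd composite to the step (twice a base prime) that produced it, and base primes come from a recursive secondary stream. The name `dict_primes` is hypothetical.

```python
from itertools import count, islice


def dict_primes():
    """Yield primes without bound, using a dict of pending odd composites."""
    yield from (2, 3, 5, 7)
    pending = {}               # composite -> step (twice its base prime)
    base = dict_primes()       # secondary stream of base primes
    next(base)                 # skip 2
    p = next(base)             # 3
    q = p * p
    for c in count(9, 2):
        if c in pending:
            step = pending.pop(c)      # c is a known composite
        elif c < q:
            yield c                    # not culled and below p*p: prime
            continue
        else:
            step = 2 * p               # c == q: begin culling with p
            p = next(base)
            q = p * p
        m = c + step
        while m in pending:            # keep one entry per composite
            m += step
        pending[m] = step


print(list(islice(dict_primes(), 10)))
```

Each base prime occupies exactly one dictionary slot at a time, so lookups and updates stay at average O(1) per candidate.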

Fast Bit-Packed Page-Segmented Version

An unbounded array-based algorithm can combine the excellent cache locality of the second bounded version above with unbounded operation: it produces a sequence of sieved bit-packed arrays, each sized to fit the CPU cache, and the secondary stream of base primes used in culling is produced in the same fashion, as in the following code:

import Foundation

typealias Prime = UInt64
typealias BasePrime = UInt32
typealias SieveBuffer = [UInt8]
typealias BasePrimeArray = [UInt32]

// the lazy list described problems don't affect its use here as
// it is only used here for its memoization properties and not consumed...
// In fact a consumed deferred list would be better off to use a CIS as above!

// a lazy list to memoize the progression of base prime arrays...
// there is some bug in Swift 4.2 that generating a LazyList<T> with a
// function and immediately using an extension method on it without
// first storing it to a variable results in mem seg fault for large
// ranges in the order of a million; in order to write a consuming
// function, one must write a function passing in a generator thunk, and
// immediately call a `makeIterator()` on it before storing, then doing a
// iteration on the iterator; doing a for on the immediately produced
// LazyList<T> (without storing it) also works, but this means we have to
// implement the "higher order functions" ourselves.
// this bug may have something to do with "move semantics".
class LazyList<T> : LazySequenceProtocol {
  internal typealias Thunk<T> = () -> T
  let head : T
  internal var _thnk: Thunk<LazyList<T>?>?
  lazy var tail: LazyList<T>? = {
    let tl = self._thnk?(); self._thnk = nil
    return tl
  }()
  init(_ hd: T, _ thnk: @escaping Thunk<LazyList<T>?>) {
    self.head = hd; self._thnk = thnk
  }
  struct LLSeqIter : IteratorProtocol, LazySequenceProtocol {
    @usableFromInline
    internal var _isfirst: Bool = true
    @usableFromInline
    internal var _current: LazyList<T>
    @inlinable // ensure that reference is not released by weak reference
    init(_ base: LazyList<T>) { self._current = base }
    @inlinable // can't be called by multiple threads on same LLSeqIter...
    mutating func next() -> T? {
      let curll = self._current
      if (self._isfirst) { self._isfirst = false; return curll.head }
      let ncur = curll.tail
      if (ncur == nil) { return nil }
      self._current = ncur!
      return ncur!.head
    }
    @inlinable
    func makeIterator() -> LLSeqIter {
      return LLSeqIter(self._current)
    }
  }
  @inlinable
  func makeIterator() -> LLSeqIter {
    return LLSeqIter(self)
  }
}

internal func makeCLUT() -> Array<UInt8> {
  var clut = Array(repeating: UInt8(0), count: 65536)
  for i in 0..<65536 {
    let v0 = ~i & 0xFFFF
    let v1 = v0 - ((v0 & 0xAAAA) >> 1)
    let v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2)
    let v3 = (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) &* 0x0101)) >> 8) & 31
    clut[i] = UInt8(v3)
  }
  return clut
}

internal let CLUT = makeCLUT()

internal func countComposites(_ cmpsts: SieveBuffer) -> Int {
  let len = cmpsts.count >> 1
  let clutp = UnsafePointer(CLUT) // for faster un-bounds checked access
  var bufp = UnsafeRawPointer(UnsafePointer(cmpsts))
                .assumingMemoryBound(to: UInt16.self)
  let plmt = bufp + len
  var count: Int = 0
  while (bufp < plmt) {
    count += Int(clutp[Int(bufp.pointee)])
    bufp += 1
  }
  return count
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
internal func composites2BasePrimeArray(_ low: BasePrime, _ cmpsts: SieveBuffer)
                                                          -> BasePrimeArray {
  let lmti = cmpsts.count << 3
  let len = countComposites(cmpsts)
  var rslt = BasePrimeArray(repeating: BasePrime(0), count: len)
  var j = 0
  for i in 0..<lmti {
    if (cmpsts[i >> 3] & (1 << (i & 7)) == UInt8(0)) {
      rslt[j] = low + BasePrime(i + i); j += 1
    }
  }
  return rslt
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
// uses pointers to avoid bounds checking for speed, but bounds are checked in code.
// uses an improved algorithm to maximize simple culling loop speed for
// the majority of cases of smaller base primes, only reverting to normal
// bit-packing operations for larger base primes...
// NOTE: a further optimization of maximum loop unrolling can later be
// implemented when warranted after maximum wheel factorization is implemented.
internal func sieveComposites(
      _ low: Prime, _ buf: SieveBuffer,
      _ bpas: LazyList<BasePrimeArray>) {
  let lowi = Int64((low - 3) >> 1)
  let len = buf.count
  let lmti = Int64(len << 3)
  let bufp = UnsafeMutablePointer(mutating: buf)
  let plen = bufp + len
  let nxti = lowi + lmti
  for bpa in bpas {
    for bp in bpa {
      let bp64 = Int64(bp)
      let bpi64 = (bp64 - 3) >> 1
      var strti = (bpi64 * (bpi64 + 3) << 1) + 3
      if (strti >= nxti) { return }
      if (strti >= lowi) { strti -= lowi }
      else {
        let r = (lowi - strti) % bp64
        strti = r == 0 ? 0 : bp64 - r
      }
      if (bp <= UInt32(len >> 3) && strti <= (lmti - 20 * bp64)) {
        let slmti = min(lmti, strti + (bp64 << 3))
        while (strti < slmti) {
          let msk = UInt8(1 << (strti & 7))
          var cp = bufp + Int(strti >> 3)
          while (cp < plen) {
              cp.pointee |= msk; cp += Int(bp64)
          }
          strti &+= bp64
        }
      }
      else {
        var c = strti
        let nbufp = UnsafeMutableRawPointer(bufp)
                      .assumingMemoryBound(to: Int32.self)
        while (c < lmti) {
            nbufp[Int(c >> 5)] |= 1 << (c & 31)
            c &+= bp64
        }
      }
    }
  }
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
// following used for fast clearing of SieveBuffer of multiple base size...
internal let clrbpseg = SieveBuffer(repeating: UInt8(0), count: 512)
internal func makeBasePrimeArrays() -> LazyList<BasePrimeArray> {
  var cmpsts = SieveBuffer(repeating: UInt8(0), count: 512)
  func nextelem(_ low: BasePrime, _ bpas: LazyList<BasePrimeArray>)
                                                -> LazyList<BasePrimeArray> {
    // calculate size so that the bit span is at least as big as the
    // maximum culling prime required, rounded up to minsizebits blocks...
    let rqdsz = 2 + Int(sqrt(Double(1 + low)))
    let sz = ((rqdsz >> 12) + 1) << 9 // size in bytes, blocks of 512 bytes
    if (sz > cmpsts.count) {
      cmpsts = SieveBuffer(repeating: UInt8(0), count: sz)
    }
    // fast clearing of the SieveBuffer array?
    for i in stride(from: 0, to: cmpsts.count, by: 512) {
      cmpsts.replaceSubrange(i..<i+512, with: clrbpseg)
    }
    sieveComposites(Prime(low), cmpsts, bpas)
    let arr = composites2BasePrimeArray(low, cmpsts)
    let nxt = low + BasePrime(cmpsts.count << 4)
    return LazyList(arr, { nextelem(nxt, bpas) })
  }
  // pre-seeding breaks recursive race,
  // as only known base primes used for first page...
  let preseedarr: [BasePrime] = [
    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
    , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]
  return
    LazyList(
      preseedarr,
      { nextelem(BasePrime(101), makeBasePrimeArrays()) })
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
internal let clrseg = SieveBuffer(repeating: UInt8(0), count: 16384)
func makeSievePages()
    -> UnfoldSequence<(Prime, SieveBuffer), ((Prime, SieveBuffer)?, Bool)> {
  let bpas = makeBasePrimeArrays()
  let cmpsts = SieveBuffer(repeating: UInt8(0), count: 16384)
  let low = Prime(3)
  sieveComposites(low, cmpsts, bpas)
  return sequence(first: (low, cmpsts), next: { (low, cmpsts) in
    var ncmpsts = cmpsts
    let rqdsz = 2 + Int(sqrt(Double(1 + low))) // problem: sqrt is not exact past about 10^12!
    let sz = ((rqdsz >> 17) + 1) << 14 // size in bytes, by chunks of 16384
    if (sz > ncmpsts.count) {
      ncmpsts = SieveBuffer(repeating: UInt8(0), count: sz)
    }
    // fast clearing of the SieveBuffer array?
    for i in stride(from: 0, to: ncmpsts.count, by: 16384) {
      ncmpsts.replaceSubrange(i..<i+16384, with: clrseg)
    }
    let nlow = low + Prime(ncmpsts.count << 4)
    sieveComposites(nlow, ncmpsts, bpas)
    return (nlow, ncmpsts)
  })
}

func countPrimesTo(_ range: Prime) -> Int64 {
  if (range < 3) { if (range < 2) { return Int64(0) }
                   else { return Int64(1) } }
  let rngi = Int64(range - 3) >> 1
  let clutp = UnsafePointer(CLUT) // for faster un-bounds checked access
  var count: Int64 = 1
  for sp in makeSievePages() {
    let (low, cmpsts) = sp; let lowi = Int64(low - 3) >> 1
    if ((lowi + Int64(cmpsts.count << 3)) > rngi) {
      let lsti = Int(rngi - lowi); let lstw = lsti >> 4
      let msk = UInt16(-2 << (lsti & 15))
      var bufp = UnsafeRawPointer(UnsafePointer(cmpsts))
                    .assumingMemoryBound(to: UInt16.self)
      let plmt = bufp + lstw
      while (bufp < plmt) {
        count += Int64(clutp[Int(bufp.pointee)]); bufp += 1
      }
      count += Int64(clutp[Int(bufp.pointee | msk)]);
      break;
    } else {
      count += Int64(countComposites(cmpsts))
    }
  }
  return count
}

// iterator of primes from the generated culled page segments...
struct PagedPrimesSeqIter: LazySequenceProtocol, IteratorProtocol {
  @inlinable
  init() {
    self._pgs = makeSievePages().makeIterator()
    self._cmpstsp = UnsafePointer(self._pgs.next()!.1)
  }
  @usableFromInline
  internal var _pgs: UnfoldSequence<(Prime, SieveBuffer), ((Prime, SieveBuffer)?, Bool)>
  @usableFromInline
  internal var _i = -2
  @usableFromInline
  internal var _low = Prime(3)
  @usableFromInline
  internal var _cmpstsp: UnsafePointer<UInt8>
  @usableFromInline
  internal var _lmt = 131072
  @inlinable
  mutating func next() -> Prime? {
    if self._i < -1 { self._i = -1; return Prime(2) }
    while true {
      repeat { self._i += 1 }
      while self._i < self._lmt &&
              (Int(self._cmpstsp[self._i >> 3]) & (1 << (self._i & 7))) != 0
      if self._i < self._lmt { break }
      let pg = self._pgs.next(); self._low = pg!.0
      let cmpsts = pg!.1; self._lmt = cmpsts.count << 3
      self._cmpstsp = UnsafePointer(cmpsts); self._i = -1
    }
    return self._low + Prime(self._i + self._i)
  }
  @inlinable
  func makeIterator() -> PagedPrimesSeqIter {
    return PagedPrimesSeqIter()
  }
  @inlinable
  var elements: PagedPrimesSeqIter {
    return PagedPrimesSeqIter()
  }
}
 
// sequence over primes using the above prime iterator from page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as to sieve them...
func primesPaged() -> PagedPrimesSeqIter { return PagedPrimesSeqIter() }

let range = Prime(1000000000)

print("The first 25 primes are:")
primesPaged().prefix(25).forEach { print($0, "", terminator: "") }
print()

let start = NSDate()

let answr =
  countPrimesTo(range) // fast way, following enumeration way is slower...
//  primesPaged().prefix(while: { $0 <= range }).reduce(0, { a, _ in a + 1 })

let elpsd = -start.timeIntervalSinceNow

print("Found \(answr) primes up to \(range).")

print(String(format: "This test took %.3f milliseconds.", elpsd * 1000))
Output:
The first 25 primes are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 50847534 primes up to 1000000000.
This test took 2004.007 milliseconds.

This produces similar output but is many times faster, at about 75 CPU clock cycles per prime when used as here to count the primes up to a billion. If one were to produce the answer by enumeration using the commented-out `primesPaged()` line, enumerating the results would take about as long as the culling itself, so the example `countPrimesTo` function, which does high-speed counting of the packed bits, is implemented to eliminate the enumeration. For most problems over larger ranges this approach is recommended; it could also be used for summing the primes or for finding first instances of maximum prime gaps, prime pairs, triples, etc.
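
As a rough illustration of counting directly from the packed bits rather than enumerating, the following Python sketch sieves the odd numbers into a byte array and counts zero bits through a lookup table. It is not a translation of the Swift code: the names `sieve_odds_packed`, `count_primes_to` and `ZERO_BITS` are assumptions made for this example, the table has 256 entries over bytes rather than 65536 entries over 16-bit words, and page segmentation is omitted.

```python
def sieve_odds_packed(limit):
    """Sieve odd numbers 3..limit; bit i set means 3 + 2*i is composite."""
    nbits = (limit - 3) // 2 + 1 if limit >= 3 else 0
    buf = bytearray((nbits + 7) // 8)
    i = 0
    while True:
        p = 3 + 2 * i
        start = (p * p - 3) // 2          # bit index of p squared
        if start >= nbits:
            break
        if not buf[i >> 3] & (1 << (i & 7)):   # p is still unmarked, so prime
            for c in range(start, nbits, p):   # step p in index space = 2p in value
                buf[c >> 3] |= 1 << (c & 7)
        i += 1
    return buf, nbits


# Lookup table: number of ZERO bits (unmarked, i.e. prime) in each byte value.
ZERO_BITS = [8 - bin(v).count("1") for v in range(256)]


def count_primes_to(limit):
    """Count primes <= limit without enumerating them one by one."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    buf, nbits = sieve_odds_packed(limit)
    full, rem = divmod(nbits, 8)
    count = 1                              # the only even prime, 2
    count += sum(ZERO_BITS[b] for b in buf[:full])
    if rem:
        # Force the unused high bits of the last byte to 1 so they are not counted.
        last = buf[full] | ((0xFF << rem) & 0xFF)
        count += ZERO_BITS[last]
    return count


print(count_primes_to(1000))
```

Masking the final partial byte plays the same role as the `msk` value in the Swift `countPrimesTo`: bits beyond the requested range are treated as composite so the table lookup ignores them.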

Further optimizations such as maximum wheel factorization (about a further five times faster), multi-threading (a further multiple of the effective number of cores used), maximum loop unrolling (about a factor of two for smaller base primes), and other techniques for higher ranges (above 16 billion in this case) can be used at the cost of increasing code complexity, but there is little point when using prime enumeration as output: i.e., one could reduce the composite number culling time to zero and it would still take about 2.8 seconds to enumerate the results over the billion range on this processor.

Tailspin

templates sieve
  def limit: $;
  @: [ 2..$limit ];
  1 -> #
  $@ !

  when <..$@::length ?($@($) * $@($) <..$limit>)> do
    templates sift
      def prime: $;
      @: $prime * $prime;
      @sieve: [ $@sieve... -> # ];
      when <..~$@> do
        $ !
      when <$@~..> do
        @: $@ + $prime;
        $ -> #
    end sift

    $@($) -> sift !
    $ + 1 -> #
end sieve

1000 -> sieve ...->  '$; ' -> !OUT::write
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997 

Better version using the mutability of the @-state to just update a primality flag

templates sieve
  def limit: $;
  @: [ 1..$limit -> 1 ];
  @(1): 0;
  2..$limit -> #
  $@ -> \[i](<=1> $i !\) !

  when <?($@($) <=1>)> do
    def prime2: $ * $;
    $prime2..$limit:$ -> @sieve($): 0;
end sieve

1000 -> sieve... ->  '$; ' -> !OUT::write
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997 
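
For readers unfamiliar with Tailspin, the difference between the two versions above can be sketched in Python. This is only an approximation under stated assumptions: `sift`, `sieve_by_sifting` and `sieve_by_flags` are names invented for this example. The first function rebuilds the candidate list for each prime by walking it alongside a running multiple; the second updates a flag per number in place.

```python
def sift(candidates, prime):
    """Drop multiples of prime (from prime squared up) using a running multiple."""
    multiple = prime * prime
    kept = []
    for c in candidates:
        while multiple < c:        # advance the running multiple past smaller values
            multiple += prime
        if c != multiple:          # equal to the multiple means composite: drop it
            kept.append(c)
    return kept


def sieve_by_sifting(limit):
    """First style: repeatedly rebuild the list of surviving candidates."""
    candidates = list(range(2, limit + 1))
    k = 0
    while k < len(candidates) and candidates[k] ** 2 <= limit:
        candidates = sift(candidates, candidates[k])
        k += 1
    return candidates


def sieve_by_flags(limit):
    """Second style: keep one mutable flag per number and clear it in place."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for n in range(2, limit + 1):
        if flags[n]:
            for m in range(n * n, limit + 1, n):
                flags[m] = False
    return [n for n, is_prime in enumerate(flags) if is_prime]


print(sieve_by_flags(30))
```

Both functions return the same list; the flag version avoids allocating a new list for every prime.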

Tcl

package require Tcl 8.5

proc sieve n {
    if {$n < 2} {return {}}
    
    # create a container to hold the sequence of numbers.
    # use a dictionary for its speedy access (like an associative array) 
    # and for its insertion order preservation (like a list)
    set nums [dict create]
    for {set i 2} {$i <= $n} {incr i} {
        # the actual value is never used
        dict set nums $i ""
    }
    
    set primes [list]
    while {[set nextPrime [lindex [dict keys $nums] 0]] <= sqrt($n)} {
        dict unset nums $nextPrime
        for {set i [expr {$nextPrime ** 2}]} {$i <= $n} {incr i $nextPrime} {
            dict unset nums $i
        }
        lappend primes $nextPrime
    }
    return [concat $primes [dict keys $nums]]
}

puts [sieve 100]   ;# 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97



TI-83 BASIC

Input "Limit:",N
N→Dim(L1)
For(I,2,N)
1→L1(I)
End
For(I,2,SQRT(N))
If L1(I)=1
Then
For(J,I*I,N,I)
0→L1(J)
End
End
End
For(I,2,N)
If L1(I)=1
Then
Disp I
End
End
ClrList L1

UNIX Shell

With array

Works with: Bourne Again SHell
Works with: Korn Shell
Works with: Zsh
function primes {
  typeset -a a
  typeset i j
  a[1]=""
  for (( i = 2; i <= $1; i++ )); do
    a[$i]=$i
  done
  for (( i = 2; i * i <= $1; i++ )); do
    if [[ ! -z ${a[$i]} ]]; then
      for (( j = i * i; j <= $1; j += i )); do
        a[$j]=""
      done
    fi
  done
  printf '%s' "${a[2]}"
  printf ' %s' ${a[*]:3}
  printf '\n'
}

primes 1000
Output:

Output is a single long line:

2 3 5 7 11 13 17 19 23 ... 971 977 983 991 997

Using variables as fake array

Bourne Shell and Almquist Shell have no arrays. This script works with bash or dash (standard shell in Ubuntu), but uses no specifics of the shells, so it works with plain Bourne Shell as well.

Works with: Bourne Shell
#! /bin/sh

LIMIT=1000

# As a workaround for missing arrays, we use variables p2, p3, ...,
# p$LIMIT, to represent the primes. Values are true or false.
#   eval p$i=true     # Set value.
#   eval \$p$i        # Run command: true or false.
#
# A previous version of this script created a temporary directory and
# used files named 2, 3, ..., $LIMIT to represent the primes. We now use
# variables so that a killed script does not leave extra files. About
# performance, variables are about as slow as files.

i=2
while [ $i -le $LIMIT ]
do
    eval p$i=true               # was touch $i
    i=`expr $i + 1`
done

i=2
while
    j=`expr $i '*' $i`
    [ $j -le $LIMIT ]
do
    if eval \$p$i               # was if [ -f $i ]
    then
        while [ $j -le $LIMIT ]
        do
            eval p$j=false      # was rm -f $j
            j=`expr $j + $i`
        done
    fi
    i=`expr $i + 1`
done

# was echo `ls|sort -n`
echo `i=2
      while [ $i -le $LIMIT ]; do
          eval \\$p$i && echo $i
          i=\`expr $i + 1\`
      done`

With piping

This example is incorrect. Please fix the code and remove this message.

Details: This version uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: McIlroy misunderstood the Sieve of Eratosthenes as did many of his day including David Turner (1975); see Sieve of Eratosthenes article on Wikipedia


This is an elegant script by M. Douglas McIlroy, one of the founding fathers of UNIX.

This implementation is explained in his paper "Coroutine prime number sieve" (2014).

Works with: Bourne Shell
sourc() {
    seq 2 1000
}

cull() {
    while
        read p || exit
    do
        (($p % $1 != 0)) && echo $p
    done
}

sink() {
    read p || exit
    echo $p
    cull $p | sink &
}

sourc | sink
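
To make the distinction in the notice above concrete, here is a hedged Python sketch that models the piped filters with chained generators and compares them with a true sieve. The function names and the operation counters are assumptions added for this example; they are not part of McIlroy's script.

```python
def trial_division_pipeline(limit):
    """Chain one remainder-testing filter per prime, like 'cull $p | sink'."""
    tests = [0]                      # count of remainder tests performed

    def cull(stream, p):
        for n in stream:
            tests[0] += 1
            if n % p != 0:           # remainder test: this is trial division
                yield n

    stream = iter(range(2, limit + 1))
    primes = []
    while True:
        p = next(stream, None)
        if p is None:
            break
        primes.append(p)
        stream = cull(stream, p)     # add another filter stage for this prime
    return primes, tests[0]


def eratosthenes(limit):
    """Mark multiples by repeated addition only; no division or remainder."""
    composite = [False] * (limit + 1)
    marks = 0                        # count of marking operations performed
    primes = []
    for n in range(2, limit + 1):
        if not composite[n]:
            primes.append(n)
            for m in range(n * n, limit + 1, n):
                composite[m] = True
                marks += 1
    return primes, marks


td_primes, td_tests = trial_division_pipeline(1000)
sv_primes, sv_marks = eratosthenes(1000)
print(td_primes == sv_primes, td_tests, sv_marks)
```

Both produce the same primes, but the pipeline tests every surviving number against every earlier prime, whereas the sieve only touches actual multiples.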

This version works by piping 1s and 0s through sed. The string of 1s and 0s represents the array of primes.

Works with: Bourne Shell
# Fill $1 characters with $2.
fill () {
	# This pipeline would begin
	#   head -c $1 /dev/zero | ...
	# but some systems have no head -c. Use dd.
	dd if=/dev/zero bs=$1 count=1 2>/dev/null | tr '\0' $2
}

filter () {
	# Use sed to put an 'x' after each multiple of $1, remove
	# first 'x', and mark non-primes with '0'.
	sed -e s/$2/\&x/g -e s/x// -e s/.x/0/g | {
		if expr $1 '*' $1 '<' $3 > /dev/null; then
			filter `expr $1 + 1` .$2 $3
		else
			cat
		fi
	}
}

# Generate a sequence of 1s and 0s indicating primality.
oz () {
	fill $1 1 | sed s/1/0/ | filter 2 .. $1
}

# Echo prime numbers from 2 to $1.
prime () {
	# Escape backslash inside backquotes. sed sees one backslash.
	echo `oz $1 | sed 's/./&\\
/g' | grep -n 1 | sed s/:1//`
}

prime 1000
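
The sed substitutions in `filter` can be hard to follow, so here is a small Python sketch of the same idea using `re.sub` on a string of 1s and 0s. The function name `sieve_string` is an assumption for this example, and the loop bound is simplified compared with the shell version.

```python
import re


def sieve_string(limit):
    """Sieve on a string: character k (1-based) is '1' if k is still a prime candidate."""
    s = "0" + "1" * (limit - 1)      # 1 is not prime; 2..limit start as candidates
    n = 2
    while n * n <= limit:
        # Put an 'x' after every block of n characters, i.e. after each multiple of n.
        s = re.sub("(" + "." * n + ")", r"\1x", s)
        # Remove the first 'x' so that n itself is left untouched.
        s = s.replace("x", "", 1)
        # Replace each remaining multiple (the character before an 'x') with '0'.
        s = re.sub(".x", "0", s)
        n += 1
    return [i + 1 for i, ch in enumerate(s) if ch == "1"]


print(sieve_string(50))
```

As in the shell script, the filter is applied for every n rather than only for primes; that does extra work but never unmarks anything.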

C Shell

Translation of: CMake
# Sieve of Eratosthenes: Echoes all prime numbers through $limit.
@ limit = 80

if ( ( $limit * $limit ) / $limit != $limit ) then
	echo limit is too large, would cause integer overflow.
	exit 1
endif

# Use $prime[2], $prime[3], ..., $prime[$limit] as array of booleans.
# Initialize values to 1 => yes it is prime.
set prime=( `repeat $limit echo 1` )

# Find and echo prime numbers.
@ i = 2
while ( $i <= $limit )
	if ( $prime[$i] ) then
		echo $i

		# For each multiple of i, set 0 => no it is not prime.
		# Optimization: start at i squared.
		@ m = $i * $i
		while ( $m <= $limit )
			set prime[$m] = 0
			@ m += $i
		end
	endif
	@ i += 1
end

Ursala

This example is incorrect. Please fix the code and remove this message.

Details: It probably (remainder) uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

with no optimizations

#import nat

sieve = ~<{0,1}&& iota; @NttPX ~&r->lx ^/~&rhPlC remainder@rlX~|@r

test program:

#cast %nL

example = sieve 50
Output:
<2,3,5,7,11,13,17,19,23,29,31,37,41,43,47>

Vala

Library: Gee

Without any optimizations:

using Gee;

ArrayList<int> primes(int limit){
	var sieve = new ArrayList<bool>();
	var prime_list = new ArrayList<int>();
	
	for(int i = 0; i <= limit; i++){
		sieve.add(true);
	}
	
	sieve[0] = false;
	sieve[1] = false;
	
	for (int i = 2; i <= limit/2; i++){
		if (sieve[i] != false){
			for (int j = 2; i*j <= limit; j++){
				sieve[i*j] = false;
			}
		}
	}

	for (int i = 0; i < sieve.size; i++){
		if (sieve[i] != false){
			prime_list.add(i);
		}
	}
	
	return prime_list;
} // end primes

public static void main(){
	var prime_list = primes(50);
	
	foreach(var prime in prime_list)
		stdout.printf("%s ", prime.to_string());
	
	stdout.printf("\n");
}

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47

VAX Assembly

                           000F4240  0000     1 n=1000*1000
                               0000  0000     2 .entry	main,0
                            7E 7CFD  0002     3 	clro	-(sp)			;result buffer
                            5E   DD  0005     4 	pushl	sp			;pointer to buffer
                            10   DD  0007     5 	pushl	#16			;descriptor -> len of buffer
                                     0009     6 
                            02   DD  0009     7 	pushl	#2			;1st candidate
                                     000B     8 test:
                 09 46'AF   6E   E1  000B     9 	bbc	(sp), b^bits, found	;bc - bit clear
                                     0010    10 next:
           F3 6E   000F4240 8F   F2  0010    11         aoblss  #n, (sp), test		;+1: limit,index
                                 04  0018    12         ret
                                     0019    13 found:
                         04 AE   7F  0019    14 	pushaq	4(sp)			;-> descriptor by ref
                         04 AE   DF  001C    15 	pushal	4(sp)			;-> prime on stack by ref
              00000000'GF   02   FB  001F    16 	calls	#2, g^ots$cvt_l_ti	;convert integer to string
                         04 AE   7F  0026    17 	pushaq	4(sp)			;
              00000000'GF   01   FB  0029    18 	calls	#1, g^lib$put_output	;show result
                                     0030    19 
                       53   6E   D0  0030    20 	movl	(sp), r3
                                     0033    21 mult:
    0002 53   6E   000F4240 8F   F1  0033    22 	acbl    #n, (sp), r3, set_mult	;limit,add,index
                            D1   11  003D    23 	brb	next
                                     003F    24 set_mult:				;set bits for multiples
                 EF 46'AF   53   E2  003F    25 	bbss	r3, b^bits, mult	;branch on bit set & set
                            ED   11  0044    26 	brb	mult
                                     0046    27 
                           0001E892  0046    28 bits:	.blkl	<n+2+31>/32
                                     E892    29 .end	main

VBA

Using Excel

 Sub primes()
'BRRJPA
'Prime calculation for VBA_Excel
'p is the upper limit of the range
'This example calculates the primes from 2 to 100000 and prints them
'in column A


p = 100000

Dim nprimes(1 To 100000) As Integer
b = Sqr(p)

For n = 2 To b

    For k = n * n To p Step n
        nprimes(k) = 1
        
    Next k
Next n


For a = 2 To p
    If nprimes(a) = 0 Then
      c = c + 1
      Range("A" & c).Value = a
        
    End If
 Next a

End Sub

VBScript

To run in console mode with cscript.

    Dim sieve()
	If WScript.Arguments.Count>=1 Then
	    n = WScript.Arguments(0)
	Else 
	    n = 99
	End If
    ReDim sieve(n)
    For i = 1 To n
        sieve(i) = True
    Next
    For i = 2 To n
        If sieve(i) Then
            For j = i * 2 To n Step i
                sieve(j) = False
            Next
        End If
    Next
    For i = 2 To n
        If sieve(i) Then WScript.Echo i
    Next

Vedit macro language

This implementation uses an edit buffer as an array for flags. After the macro has been run, you can see how the primes are located in the array. Primes are marked with 'P' and non-primes with '-'. The first character position represents number 0.

#10 = Get_Num("Enter number to search to: ", STATLINE)
Buf_Switch(Buf_Free)                    // Use edit buffer as flags array
Ins_Text("--")                          // 0 and 1 are not primes
Ins_Char('P', COUNT, #10-1)             // init rest of the flags to "prime"
for (#1 = 2; #1*#1 <= #10; #1++) {
    Goto_Pos(#1)
    if (Cur_Char=='P') {                // this is a prime number
        for (#2 = #1*#1; #2 <= #10; #2 += #1) {
            Goto_Pos(#2)
            Ins_Char('-', OVERWRITE)
        }
    }
}

Sample output showing numbers in range 0 to 599.

--PP-P-P---P-P---P-P---P-----P-P-----P---P-P---P-----P-----P
-P-----P---P-P-----P---P-----P-------P---P-P---P-P---P------
-------P---P-----P-P---------P-P-----P-----P---P-----P-----P
-P---------P-P---P-P-----------P-----------P---P-P---P-----P
-P---------P-----P-----P-----P-P-----P---P-P---------P------
-------P---P-P---P-------------P-----P---------P-P---P-----P
-------P-----P-----P---P-----P-------P---P-------P---------P
-P---------P-P-----P---P-----P-------P---P-P---P-----------P
-------P---P-------P---P-----P-----------P-P----------------
-P-----P---------P-----P-----P-P-----P---------P-----P-----P
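
The flag-buffer layout described above can be reproduced with a short Python sketch. The function name `prime_flag_buffer` is an assumption for this example; a `bytearray` stands in for the Vedit edit buffer.

```python
def prime_flag_buffer(limit):
    """Return a string where position k holds 'P' if k is prime, else '-'."""
    buf = bytearray(b"--" + b"P" * (limit - 1))   # 0 and 1 are not primes
    n = 2
    while n * n <= limit:
        if buf[n] == ord("P"):                    # n is prime: overwrite its multiples
            for m in range(n * n, limit + 1, n):
                buf[m] = ord("-")
        n += 1
    return buf.decode("ascii")


flags = prime_flag_buffer(599)
# Print in rows of 60 characters, matching the layout of the sample output.
for row in range(0, len(flags), 60):
    print(flags[row:row + 60])
```

Because the character position is the number itself, no separate list of primes is needed until the flags are read back.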

Visual Basic

Works with: VB6

Sub Eratost()
    Dim sieve() As Boolean
    Dim n As Integer, i As Integer, j As Integer
    n = InputBox("limit:", n)
    ReDim sieve(n)
    For i = 1 To n
        sieve(i) = True
    Next i
    For i = 2 To n
        If sieve(i) Then
            For j = i * 2 To n Step i
                sieve(j) = False
            Next j
        End If
    Next i
    For i = 2 To n
        If sieve(i) Then Debug.Print i
    Next i
End Sub 'Eratost

Visual Basic .NET

Dim n As Integer, k As Integer, limit As Integer
Console.WriteLine("Enter number to search to: ")
limit = Console.ReadLine
Dim flags(limit) As Integer
For n = 2 To Math.Sqrt(limit)
    If flags(n) = 0 Then
        For k = n * n To limit Step n
            flags(k) = 1
        Next k
    End If
Next n
 
' Display the primes
For n = 2 To limit
    If flags(n) = 0 Then
        Console.WriteLine(n)
    End If
Next n

Alternate

Since composites are only ever flagged above the current value of n, each prime can be displayed as soon as it is reached, so the separate display loop is unnecessary, and no Math.Sqrt() is needed. Input is taken from a command-line parameter instead of Console.ReadLine(), and the If block and For statement are consolidated into two Do loops.

Module Module1
    Sub Main(args() As String)
        Dim lmt As Integer = 500, n As Integer = 2, k As Integer
        If args.Count > 0 Then Integer.TryParse(args(0), lmt)
        Dim flags(lmt + 1) As Boolean   ' non-primes are true in this array.
        Do                              ' a prime was found, 
            Console.Write($"{n} ")      ' so show it,
            For k = n * n To lmt Step n ' and eliminate any multiple of it at its square and beyond.
                flags(k) = True
            Next
            Do                          ' skip over non-primes.
                n += If(n = 2, 1, 2)
            Loop While flags(n)
        Loop while n <= lmt
    End Sub
End Module
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 
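
The single-pass idea used by the alternate version can be sketched in Python as follows. The name `primes_single_pass` is an assumption for this example. It relies on the fact that every composite m has a prime factor p with p * p <= m, so m has already been flagged by the time the scan reaches it.

```python
def primes_single_pass(limit):
    """Collect primes in the same pass that flags composites; no sqrt needed."""
    composite = [False] * (limit + 1)
    found = []
    for n in range(2, limit + 1):
        if composite[n]:
            continue                  # already flagged by a smaller prime
        found.append(n)               # unflagged when reached, so n is prime
        for k in range(n * n, limit + 1, n):
            composite[k] = True       # empty range once n * n exceeds limit
    return found


print(primes_single_pass(50))
```

Once n * n exceeds the limit the inner loop simply does nothing, which is why no explicit square-root bound is required.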

V (Vlang)

Translation of: go

Basic sieve of array of booleans

fn main() {
    limit := 201 // means sieve numbers < 201
 
    // sieve
    mut c := []bool{len: limit} // c for composite.  false means prime candidate
    c[1] = true              // 1 not considered prime
    mut p := 2
    for {
        // first allowed optimization:  outer loop only goes to sqrt(limit)
        p2 := p * p
        if p2 >= limit {
            break
        }
        // second allowed optimization:  inner loop starts at sqr(p)
        for i := p2; i < limit; i += p {
            c[i] = true // it's a composite
        }
        // scan to get next prime for outer loop
        for {
            p++
            if !c[p] {
                break
            }
        }
    }
 
    // sieve complete.  now print a representation.
    for n in 1..limit {
        if c[n] {
            print("  .")
        } else {
            print("${n:3}")
        }
        if n%20 == 0 {
            println("")
        }
    }
}

Output:

  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19  .
  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .  .
 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59  .
 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79  .
  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .  .
101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .  .
  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139  .
  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .  .
  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179  .
181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199  .

Vorpal

self.print_primes = method(m){
   p = new()
   p.fill(0, m, 1, true)

   count = 0
   i = 2
   while(i < m){
      if(p[i] == true){
         p.fill(i+i, m, i, false)
         count = count + 1
      }
      i = i + 1
   }
   ('primes: ' + count + ' in ' + m).print()
   for(i = 2, i < m, i = i + 1){
      if(p[i] == true){
         ('' + i + ', ').put()
      }
   }
   ''.print()
}

self.print_primes(100)
Result:
primes: 25 in 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,

WebAssembly

(module
 (import "js" "print" (func $print (param i32)))
 (memory 4096)
 
 (func $sieve (export "sieve") (param $n i32)
   (local $i i32)
   (local $j i32)
 
   (set_local $i (i32.const 0))
   (block $endLoop
     (loop $loop
       (br_if $endLoop (i32.ge_s (get_local $i) (get_local $n)))
       (i32.store8 (get_local $i) (i32.const 1))
       (set_local $i (i32.add (get_local $i) (i32.const 1)))
       (br $loop)))
 
   (set_local $i (i32.const 2))
   (block $endLoop
     (loop $loop
       (br_if $endLoop (i32.ge_s (i32.mul (get_local $i) (get_local $i)) 
                                 (get_local $n)))
       (if (i32.eq (i32.load8_s (get_local $i)) (i32.const 1))
         (then
           (set_local $j (i32.mul (get_local $i) (get_local $i)))
           (block $endInnerLoop
             (loop $innerLoop
               (i32.store8 (get_local $j) (i32.const 0))
               (set_local $j (i32.add (get_local $j) (get_local $i)))
               (br_if $endInnerLoop (i32.ge_s (get_local $j) (get_local $n)))
               (br $innerLoop)))))
       (set_local $i (i32.add (get_local $i) (i32.const 1)))
       (br $loop)))
 
   (set_local $i (i32.const 2))
   (block $endLoop
     (loop $loop
       (if (i32.eq (i32.load8_s (get_local $i)) (i32.const 1))
         (then
           (call $print (get_local $i))))
       (set_local $i (i32.add (get_local $i) (i32.const 1)))
       (br_if $endLoop (i32.ge_s (get_local $i) (get_local $n)))
       (br $loop)))))

Xojo

Place the following in the Run event handler of a Console application:

Dim limit, prime, i As Integer
Dim s As String
Dim t As Double
Dim sieve(100000000) As Boolean

REM Get the maximum number
While limit<1 Or limit > 100000000
  Print("Max number? [1 to 100000000]")
  s = Input
  limit = CDbl(s)
Wend

REM Do the calculations
t = Microseconds
prime = 2
While prime^2 <= limit
  For i = prime*2 To limit Step prime
    sieve(i) = True
  Next
  Do
    prime = prime+1
  Loop Until Not sieve(prime)
Wend
t = Microseconds-t
Print("Compute time = "+Str(t/1000000)+" sec")
Print("Press Enter...")
s = Input

REM Display the prime numbers
For i = 1 To limit
  If Not sieve(i) Then Print(Str(i))
Next
s = Input
Output:
Max number? [1 to 100000000]
1000
Compute time = 0.0000501 sec
Press Enter...

1
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
...

This version uses a dynamic array and can use (a lot) less memory. It's (a lot) slower too. Since Booleans are manually set to True, the algorithm makes more sense.

Dim limit, prime, i As Integer
Dim s As String
Dim t As Double
Dim sieve() As Boolean

REM Get the maximum number and define array
While limit<1 Or limit > 2147483647
  Print("Max number? [1 to 2147483647]")
  s = Input
  limit = CDbl(s)
Wend
t = Microseconds
For i = 0 To Limit
   Sieve.Append(True)
Next
t = Microseconds-t
Print("Memory allocation time = "+Str(t/1000000)+" sec")

REM Do the calculations
t = Microseconds
prime = 2
While prime^2 <= limit
  For i = prime*2 To limit Step prime
    sieve(i) = False
  Next
  Do
    prime = prime+1
  Loop Until sieve(prime)
Wend
t = Microseconds-t
Print("Compute time = "+Str(t/1000000)+" sec")
Print("Press Enter...")
s = Input

REM Display the prime numbers
For i = 1 To limit
  If sieve(i) Then Print(Str(i))
Next
s = Input
Output:
Max number? [1 to 2147483647]
1000
Memory allocation time = 0.0000296 sec
Compute time = 0.0000501 sec
Press Enter...

1
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
...

Woma

(sieve(n = /0 -> int; limit = /0 -> int; is_prime = [/0] -> *)) *
    i<@>range(n*n, limit+1, n)
        is_prime = is_prime[$]i,False
    <*>is_prime

(primes_upto(limit = 4 -> int)) list(int)
    primes = [] -> list
    f = [False, False] -> list(bool)
    t = [True] -> list(bool)
    u = limit - 1 -> int
    tt = t * u -> list(bool)
    is_prime = flatten(f[^]tt) -> list(bool)
    limit_sqrt = limit ** 0.5 -> float
    iter1 = int(limit_sqrt + 1.5) -> int

    n<@>range(iter1)
        is_prime[n]<?>is_prime = sieve(n, limit, is_prime)

    i,prime<@>enumerate(is_prime)
        prime<?>primes = primes[^]i
    <*>primes

Wren

var sieveOfE = Fn.new { |n|
    if (n < 2) return []
    var comp = List.filled(n-1, false)
    var p = 2
    while (true) {
        var p2 = p * p
        if (p2 > n) break
        var i = p2
        while (i <= n) {
            comp[i-2] = true
            i = i + p
        }
        while (true) {
            p = p + 1
            if (!comp[p-2]) break
        }
    }
    var primes = []
    for (i in 0..n-2) {
        if (!comp[i]) primes.add(i+2)
    }
    return primes
}

System.print(sieveOfE.call(100))
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

XPL0

include c:\cxpl\codes;                  \intrinsic 'code' declarations
int  Size, Prime, I, Kill;
char Flag;
[Size:= IntIn(0);
Flag:= Reserve(Size+1);
for I:= 2 to Size do Flag(I):= true;
for I:= 2 to Size do
    if Flag(I) then                     \found a prime
        [Prime:= I;
        IntOut(0, Prime);  CrLf(0);
        Kill:= Prime + Prime;           \first multiple to kill
        while Kill <= Size do
                [Flag(Kill):= false;    \zero a non-prime
                Kill:= Kill + Prime;    \next multiple
                ];
        ];
]
Example output:
20
2
3
5
7
11
13
17
19

Yabasic

#!/usr/bin/yabasic

// ---------------------------
// Prime Sieve Benchmark --
// "Shootout" Version    --
// ---------------------------
// usage:
//     yabasic sieve8k.yab 90000


SIZE = 8192
ONN = 1 : OFF = 0
dim flags(SIZE)

sub main()
    
    cmd = peek("arguments")
    if cmd = 1 then
       iterations = val(peek$("argument"))
       if iterations = 0 then print "Argument wrong. Done 1000." : iterations = 1000 end if
    else
       print "1000 iterations."
       iterations = 1000
    end if
    
    for iter = 1 to iterations
        count = 0
        for n= 1 to SIZE : flags(n) = ONN: next n
        for i = 2 to SIZE
            if flags(i) = ONN then
               let k = i + i
               if k < SIZE then
                 for k = k to SIZE step i
                    flags(k) = OFF
                 next k
               end if
               count = count + 1                 
            end if
        next i
    next iter
    print "Count: ", count  // 1028
end sub

clear screen

print "Prime Sieve Benchmark\n"

main()

t = val(mid$(time$,10))

print "time: ", t, "\n"
print peek("millisrunning")

Zig

const std = @import("std");
const stdout = std.io.getStdOut().writer();

pub fn main() !void {
    try sieve(1000);
}

// using a comptime limit ensures that there's no need for dynamic memory.
fn sieve(comptime limit: usize) !void {
    var prime = [_]bool{true} ** limit;
    prime[0] = false;
    prime[1] = false;
    var i: usize = 2;
    while (i*i < limit) : (i += 1) {
        if (prime[i]) {
            var j = i*i;
            while (j < limit) : (j += i)
                prime[j] = false;
        }
    }
    var c: i32 = 0;
    for (prime) |yes, p|
        if (yes) {
            c += 1;
            try stdout.print("{:5}", .{p});
            if (@rem(c, 10) == 0)
                try stdout.print("\n", .{});
        };
    try stdout.print("\n", .{});
}
Output:
$ zig run sieve.zig 
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

Odds-only bit packed version

Translation of: BCPL

Includes an iterator, as in the BCPL odds-only bit-packed sieve. Because it takes little extra code, the sieve object also provides methods for getting the size and testing membership.

const std = @import("std");
const heap = std.heap;
const mem = std.mem;
const stdout = std.io.getStdOut().writer();

pub fn main() !void {
    const assert = std.debug.assert;

    var buf: [fixed_alloc_sz(1000)]u8 = undefined; // buffer big enough to sieve the primes up to 1,000.
    var fba = heap.FixedBufferAllocator.init(&buf);

    const sieve = try SoE.init(1000, &fba.allocator);
    defer sieve.deinit(); // not needed for the FBA, but in general you would de-init the sieve

    // test membership functions
    assert(sieve.contains(997));
    assert(!sieve.contains(995));
    assert(!sieve.contains(994));
    assert(!sieve.contains(1009));

    try stdout.print("There are {} primes < 1000\n", .{sieve.size()});
    var c: u32 = 0;
    var iter = sieve.iterator();
    while (iter.next()) |p| {
        try stdout.print("{:5}", .{p});
        c += 1;
        if (c % 10 == 0)
            try stdout.print("\n", .{});
    }
    try stdout.print("\n", .{});
}

// return the buffer size needed to sieve primes up to `limit` when using the Fixed Buffer Allocator;
//     adds some u64 words for FBA bookkeeping.
pub inline fn fixed_alloc_sz(limit: usize) usize {
    return (2 + limit / 128) * @sizeOf(u64);
}

pub const SoE = struct {
    const all_u64bits_on = 0xFFFF_FFFF_FFFF_FFFF;
    const empty = [_]u64{};

    sieve: []u64,
    alloc: *mem.Allocator,

    pub fn init(limit: u64, allocator: *mem.Allocator) error{OutOfMemory}!SoE {
        if (limit < 3)
            return SoE{
                .sieve = &empty,
                .alloc = allocator,
            };

        var bit_sz = (limit + 1) / 2 - 1;
        var q = bit_sz >> 6;
        var r = bit_sz & 0x3F;
        var sz = q + @boolToInt(r > 0);
        var sieve = try allocator.alloc(u64, sz);

        var i: usize = 0;
        while (i < q) : (i += 1)
            sieve[i] = all_u64bits_on;
        if (r > 0)
            sieve[q] = (@as(u64, 1) << @intCast(u6, r)) - 1;

        var bit: usize = 0;
        while (true) {
            while (sieve[bit >> 6] & @as(u64, 1) << @intCast(u6, bit & 0x3F) == 0)
                bit += 1;

            const p = 2 * bit + 3;
            q = p * p;
            if (q > limit)
                return SoE{
                    .sieve = sieve,
                    .alloc = allocator,
                };

            r = (q - 3) / 2;
            while (r < bit_sz) : (r += p)
                sieve[r >> 6] &= ~((@as(u64, 1)) << @intCast(u6, r & 0x3F));

            bit += 1;
        }
    }

    pub fn deinit(self: SoE) void {
        if (self.sieve.len > 0) {
            self.alloc.free(self.sieve);
        }
    }

    pub fn iterator(self: SoE) SoE_Iterator {
        return SoE_Iterator.init(self.sieve);
    }

    pub fn size(self: SoE) usize {
        var sz: usize = 1; // sieve doesn't include 2.
        for (self.sieve) |bits|
            sz += @popCount(u64, bits);
        return sz;
    }

    pub fn contains(self: SoE, n: u64) bool {
        if (n < 3 or n & 1 == 0) // 0 and 1 are not prime; this also avoids underflow in (n - 3) below
            return n == 2
        else {
            const bit = (n - 3) / 2;
            const q = bit >> 6;
            const r = @intCast(u6, bit & 0x3F);
            return if (q >= self.sieve.len)
                false
            else
                self.sieve[q] & (@as(u64, 1) << r) != 0;
        }
    }
};

// Create an iterator object to enumerate the primes we've generated.
const SoE_Iterator = struct {
    const Self = @This();

    start: u64,
    bits: u64,
    sieve: []const u64,

    pub fn init(sieve: []const u64) Self {
        return Self{
            .start = 0,
            .sieve = sieve,
            // an empty sieve (limit < 3) has no first word to read
            .bits = if (sieve.len > 0) sieve[0] else 0,
        };
    }

    pub fn next(self: *Self) ?u64 {
        if (self.sieve.len == 0)
            return null;

        // start = 0 => first time, so yield 2.
        if (self.start == 0) {
            self.start = 3;
            return 2;
        }

        var x = self.bits;
        while (true) {
            if (x != 0) {
                const p = @ctz(u64, x) * 2 + self.start;
                x &= x - 1;
                self.bits = x;
                return p;
            } else {
                self.start += 128;
                self.sieve = self.sieve[1..];
                if (self.sieve.len == 0)
                    return null;
                x = self.sieve[0];
            }
        }
    }
};
Output:
There are 168 primes < 1000
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997
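
The index arithmetic in the odds-only bit-packed version above can be hard to follow in Zig. The following is a minimal Python sketch of the same mapping, assuming (as in the Zig code) that bit b of the packed array stands for the odd number 2*b + 3 and that each word holds 64 bits. The names odds_only_sieve and primes_upto are illustrative only and are not part of the Zig entry.

```python
WORD_BITS = 64
ALL_ONES = 0xFFFF_FFFF_FFFF_FFFF


def odds_only_sieve(limit):
    """Return a list of 64-bit words; bit b set means 2*b + 3 is prime."""
    if limit < 3:
        return []
    bit_sz = (limit + 1) // 2 - 1            # number of odd candidates 3..limit
    words = [ALL_ONES] * ((bit_sz + WORD_BITS - 1) // WORD_BITS)
    r = bit_sz & (WORD_BITS - 1)
    if r:
        words[-1] = (1 << r) - 1             # clear the unused high bits
    bit = 0
    while True:
        p = 2 * bit + 3                      # odd number represented by this bit
        if p * p > limit:
            return words
        if (words[bit >> 6] >> (bit & 63)) & 1:
            # first multiple to cull is p*p, which sits at bit (p*p - 3) // 2;
            # stepping the bit index by p steps the number by 2*p
            for j in range((p * p - 3) // 2, bit_sz, p):
                words[j >> 6] &= ~(1 << (j & 63))
        bit += 1


def primes_upto(limit):
    """Unpack the bit array into a list of primes, adding 2 by hand."""
    if limit < 2:
        return []
    primes = [2]
    for w, word in enumerate(odds_only_sieve(limit)):
        while word:
            low = word & -word               # isolate the lowest set bit
            primes.append(2 * (WORD_BITS * w + low.bit_length() - 1) + 3)
            word &= word - 1                 # clear that bit
    return primes


print(len(primes_upto(1000)))  # 168
```

Each word covers 64 odd numbers, i.e. a span of 128 integers, which is why the Zig iterator advances its start value by 128 per word and why the buffer size is computed from limit / 128.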

Optimized version

const stdout = @import("std").io.getStdOut().writer();

const lim = 1000;
const n = lim - 2;

var primes: [n]?usize = undefined;

pub fn main() anyerror!void {
    var i: usize = 0;
    var m: usize = 0;

    while (i < n) : (i += 1) {
        primes[i] = i + 2;
    }

    i = 0;
    while (i < n) : (i += 1) {
        if (primes[i]) |prime| {
            m += 1;
            try stdout.print("{:5}", .{prime});
            if (m % 10 == 0) try stdout.print("\n", .{});
            var j: usize = i + prime;
            while (j < n) : (j += prime) {
                primes[j] = null;
            }
        }
    }
    try stdout.print("\n", .{});
}
Output:
$ zig run sieve.zig 
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

zkl

fcn sieve(limit){
   composite:=Data(limit+1).fill(1);  // bucket of bytes set to 1 (prime)
   (2).pump(limit.toFloat().sqrt()+1, Void,  // Void==no results, just loop
       composite.get, Void.Filter,	// if prime, zero multiples
      'wrap(n){ [n*n..limit,n].pump(Void,composite.set.fp1(0)) }); //composite[n*p]=0
   (2).filter(limit-1,composite.get); // bytes still 1 are prime
}
sieve(53).println();

The pump method is simply a loop that passes each result from one action to the next and collects the final results (i.e., a minimal state machine). Pumping to Void means the results are not collected. The Void.Filter action means: if result.toBool() is False, skip this item; otherwise take the source input (as it was before any action) and pass it to the next action. Here, the first action looks up src in the table to check whether it is prime; if it is, the third action takes the prime and zeroes its multiples as a side effect.

Output:
L(2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53)
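
For readers unfamiliar with zkl, here is a rough Python analogue of the pump pipeline described above. It is a sketch based only on that description, not on zkl's actual implementation: the pump function, the FILTER marker (standing in for Void.Filter) and the collect flag (standing in for pumping to Void) are all hypothetical names.

```python
import math

FILTER = object()  # hypothetical stand-in for zkl's Void.Filter


def pump(source, collect, *actions):
    """Run each source item through the chain of actions, as described above."""
    results = []
    for src in source:
        value = src
        skipped = False
        for action in actions:
            if action is FILTER:
                if not value:        # previous result is false: skip this item
                    skipped = True
                    break
                value = src          # otherwise pass on the original source item
            else:
                value = action(value)
        if collect and not skipped:
            results.append(value)
    return results


def sieve(limit):
    flags = bytearray([1]) * (limit + 1)     # bytes set to 1 (prime)

    def zero_multiples(n):
        for m in range(n * n, limit + 1, n):
            flags[m] = 0

    # loop only; nothing is collected (the analogue of pumping to Void)
    pump(range(2, math.isqrt(limit) + 1), False,
         flags.__getitem__, FILTER, zero_multiples)
    return [n for n in range(2, limit + 1) if flags[n]]


print(sieve(53))
```

The three actions mirror the zkl call: look up the flag for the current number, filter on it, then zero the multiples of the number that passed the filter.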