I'm working on modernizing Rosetta Code's infrastructure. Starting with communications. Please accept this time-limited open invite to RC's Slack.. --Michael Mol (talk) 20:59, 30 May 2020 (UTC)

# Sieve of Eratosthenes

Sieve of Eratosthenes
You are encouraged to solve this task according to the task description, using any language you may know.
This task has been clarified. Its programming examples are in need of review to ensure that they still fit the requirements of the task.

The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.

Implement the   Sieve of Eratosthenes   algorithm, with the only allowed optimization that the outer loop can stop at the square root of the limit, and the inner loop may start at the square of the prime just found.

That means especially that you shouldn't optimize by using pre-computed wheels, i.e. don't assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or similar wheels based on low primes.

If there's an easy way to add such a wheel based optimization, implement it as an alternative version.

Note
• It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.

## 11l

Translation of: Python
`F primes_upto(limit)   V is_prime = [0B]*2 [+] [1B]*(limit - 1)   L(n) 0 .< Int(limit ^ 0.5 + 1.5)      I is_prime[n]         L(i) (n*n..limit).step(n)            is_prime[i] = 0B   R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i) print(primes_upto(100))`
Output:
```[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
```

## 360 Assembly

For maximum compatibility, this program uses only the basic instruction set.

`*        Sieve of Eratosthenes ERATOST  CSECT           USING  ERATOST,R12SAVEAREA B      STM-SAVEAREA(R15)         DC     17F'0'         DC     CL8'ERATOST'STM      STM    R14,R12,12(R13) save calling context         ST     R13,4(R15)               ST     R15,8(R13)         LR     R12,R15         set addessability*        ----   CODE         LA     R4,1            I=1           LA     R6,1            increment         L      R7,N            limitLOOPI    BXH    R4,R6,ENDLOOPI  do I=2 to N         LR     R1,R4           R1=I         BCTR   R1,0                      LA     R14,CRIBLE(R1)         CLI    0(R14),X'01'         BNE    ENDIF           if not CRIBLE(I)         LR     R5,R4           J=I         LR     R8,R4         LR     R9,R7LOOPJ    BXH    R5,R8,ENDLOOPJ  do J=I*2 to N by I         LR     R1,R5           R1=J         BCTR   R1,0         LA     R14,CRIBLE(R1)         MVI    0(R14),X'00'    CRIBLE(J)='0'B         B      LOOPJENDLOOPJ EQU    *ENDIF    EQU    *         B      LOOPIENDLOOPI EQU    *         LA     R4,1            I=1           LA     R6,1         L      R7,NLOOP     BXH    R4,R6,ENDLOOP   do I=1 to N         LR     R1,R4           R1=I         BCTR   R1,0         LA     R14,CRIBLE(R1)         CLI    0(R14),X'01'         BNE    NOTPRIME        if not CRIBLE(I)          CVD    R4,P            P=I         UNPK   Z,P             Z=P         MVC    C,Z             C=Z         OI     C+L'C-1,X'F0'   zap sign         MVC    WTOBUF(8),C+8         WTO    MF=(E,WTOMSG)		  NOTPRIME EQU    *         B      LOOPENDLOOP  EQU    *RETURN   EQU    *         LM     R14,R12,12(R13) restore context         XR     R15,R15         set return code to 0         BR     R14             return to caller*        ----   DATAI        DS     FJ        DS     F         DS     0FP        DS     PL8             packedZ        DS     ZL16            zonedC        DS     CL16            character WTOMSG   DS     0F         DC     H'80'           length of WTO buffer         DC     H'0'            must be binary zeroesWTOBUF   DC     80C' '         LTORG  N        DC     F'100000'CRIBLE   DC     100000X'01'         YREGS           END    ERATOST`
Output:
```00000002
00000003
00000005
00000007
00000011
00000013
00000017
00000019
00000023
00000029
00000031
00000037
00000041
00000043
00000047
00000053
00000059
00000061
00000067
...
00099767
00099787
00099793
00099809
00099817
00099823
00099829
00099833
00099839
00099859
00099871
00099877
00099881
00099901
00099907
00099923
00099929
00099961
00099971
00099989
00099991
```

## 6502 Assembly

If this subroutine is called with the value of n in the accumulator, it will store an array of the primes less than n beginning at address 1000 hex and return the number of primes it has found in the accumulator.

`ERATOS: STA  \$D0      ; value of n        LDA  #\$00        LDX  #\$00SETUP:  STA  \$1000,X  ; populate array        ADC  #\$01        INX        CPX  \$D0        BPL  SET        JMP  SETUPSET:    LDX  #\$02SIEVE:  LDA  \$1000,X  ; find non-zero        INX        CPX  \$D0        BPL  SIEVED        CMP  #\$00        BEQ  SIEVE        STA  \$D1      ; current primeMARK:   CLC        ADC  \$D1        TAY        LDA  #\$00        STA  \$1000,Y        TYA        CMP  \$D0        BPL  SIEVE        JMP  MARKSIEVED: LDX  #\$01        LDY  #\$00COPY:   INX        CPX  \$D0        BPL  COPIED        LDA  \$1000,X        CMP  #\$00        BEQ  COPY        STA  \$2000,Y        INY        JMP  COPYCOPIED: TYA           ; how many found        RTS`

## 68000 Assembly

Algorithm somewhat optimized: array omits 1, 2, all higher odd numbers. Optimized for storage: uses bit array for prime/composite flags.

Works with: [EASy68K v5.13.00]

Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.

`*-----------------------------------------------------------* Title      : BitSieve* Written by : G. A. Tippery* Date       : 2014-Feb-24, 2013-Dec-22* Description: Prime number sieve*-----------------------------------------------------------    	ORG    \$1000 **	---- Generic macros ----	**PUSH	MACRO	MOVE.L	\1,-(SP)	ENDM POP	MACRO	MOVE.L	(SP)+,\1	ENDM DROP	MACRO	ADDQ	#4,SP	ENDM PUTS	MACRO	** Print a null-terminated string w/o CRLF **	** Usage: PUTS stringaddress	** Returns with D0, A1 modified	MOVEQ	#14,D0	; task number 14 (display null string)	LEA	\1,A1	; address of string	TRAP	#15	; display it	ENDM GETN	MACRO	MOVEQ	#4,D0	; Read a number from the keyboard into D1.L. 	TRAP	#15	ENDM **	---- Application-specific macros ----	** val	MACRO		; Used by bit sieve. Converts bit address to the number it represents.	ADD.L	\1,\1	; double it because odd numbers are omitted	ADDQ	#3,\1	; add offset because initial primes (1, 2) are omitted	ENDM * ** ================================================================================ *** ** Integer square root routine, bisection method *** ** IN: D0, should be 0<D0<\$10000 (65536) -- higher values MAY work, no guarantee* ** OUT: D1*SquareRoot:*	MOVEM.L	D2-D4,-(SP)	; save registers needed for local variables*	DO == n*	D1 == a*	D2 == b*	D3 == guess*	D4 == temp**		a = 1;*		b = n;	MOVEQ	#1,D1	MOVE.L	D0,D2*		do {	REPEAT*		guess = (a+b)/2;	MOVE.L	D1,D3	ADD.L	D2,D3	LSR.L	#1,D3*		if (guess*guess > n) {	// inverse function of sqrt is square	MOVE.L	D3,D4	MULU	D4,D4		; guess^2	CMP.L	D0,D4	BLS	.else*		b = guess;	MOVE.L	D3,D2	BRA	.endif*		} else {.else:*		a = guess;	MOVE.L	D3,D1*		} //if.endif:*		} while ((b-a) > 1);	; Same as until (b-a)<=1 or until (a-b)>=1	MOVE.L	D2,D4	SUB.L	D1,D4	; b-a	UNTIL.L	  D4 <LE> #1 DO.S*		return (a)	; Result is in D1*		} //LongSqrt()	MOVEM.L	(SP)+,D2-D4	; restore saved registers	RTS** ** ================================================================================ **  ** ======================================================================= *****  Prime-number Sieve of Eratosthenes routine using a big bit field for flags  ***  Enter with D0 = size of sieve (bit array)*  Prints found primes 10 per line*  Returns # prime found in D6**   Register usage:**	D0 == n*	D1 == prime*	D2 == sqroot*	D3 == PIndex*	D4 == CIndex*	D5 == MaxIndex*	D6 == PCount**	A0 == PMtx**   On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.* **	------------------------	** GetBit:		** sub-part of Sieve subroutine **		** Entry: bit # is on TOS		** Exit: A6 holds the byte number, D7 holds the bit number within the byte		** Note: Input param is still on TOS after return. Could have passed via a register, but                **  wanted to practice with stack. :)*			MOVE.L	(4,SP),D7	; get value from (pre-call) TOS	ASR.L	#3,D7	; /8	MOVEA	D7,A6	; byte #	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS	AND.L	#\$7,D7	; bit #	RTS **	------------------------	** Sieve:	MOVE	D0,D5	SUBQ	#1,D5	JSR	SquareRoot	; sqrt D0 => D1	MOVE.L	D1,D2	LEA	PArray,A0	CLR.L	D3*PrimeLoop:	MOVE.L	D3,D1	val	D1	MOVE.L	D3,D4	ADD.L	D1,D4*CxLoop:		; Goes through array marking multiples of d1 as composite numbers	CMP.L	D5,D4	BHI	ExitCx	PUSH	D4	; set D7 as bit # and A6 as byte pointer for D4'th bit of array	JSR GetBit	DROP	BSET	D7,0(A0,A6.L)	; set bit to mark as composite number	ADD.L	D1,D4	; next number to mark	BRA	CxLoopExitCx:	CLR.L	D1	; Clear new-prime-found flag	ADDQ	#1,D3	; Start just past last prime found PxLoop:		; Searches for next unmarked (not composite) number	CMP.L	D2,D3	; no point searching past where first unmarked multiple would be past end of array	BHI	ExitPx	; if past end of array	TST.L	D1	BNE	ExitPx	; if flag set, new prime found	PUSH D3		; check D3'th bit flag	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer	DROP		; drop TOS	BTST	D7,0(A0,A6.L)	; read bit flag	BNE	IsSet	; If already tagged as composite	MOVEQ	#-1,D1	; Set flag that we've found a new primeIsSet:	ADDQ	#1,D3	; next PIndex	BRA	PxLoopExitPx:	SUBQ	#1,D3	; back up PIndex	TST.L	D1	; Did we find a new prime #?	BNE	PrimeLoop	; If another prime # found, go process it*		; fall through to print routine **	------------------------	** * Print primes found**	D4 == Column count**	Print header and assumed primes (#1, #2)     	PUTS	Header	; Print string @ Header, no CR/LF	MOVEQ	#2,D6	; Start counter at 2 because #1 and #2 are assumed primes	MOVEQ	#2,D4*	MOVEQ	#0,D3PrintLoop:	CMP.L	D5,D3	BHS	ExitPL	PUSH	D3	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer	DROP		; drop TOS	BTST	D7,0(A0,A6.L)	BNE		NotPrime*		printf(" %6d", val(PIndex)	MOVE.L	D3,D1	val	D1	AND.L	#\$0000FFFF,D1	MOVEQ	#6,D2	MOVEQ	#20,D0	; display signed RJ	TRAP	#15	ADDQ	#1,D4	ADDQ	#1,D6*	*** Display formatting ****		if((PCount % 10) == 0) printf("\n");	CMP	#10,D4	BLO	NoLF	PUTS	CRLF	MOVEQ	#0,D4NoLF:NotPrime:	ADDQ	#1,D3	BRA	PrintLoopExitPL:	RTS ** ======================================================================= ** N	EQU	5000	; *** Size of boolean (bit) array ***SizeInBytes	EQU	(N+7)/8*START:                  	; first instruction of program	MOVE.L	#N,D0	; # to test	JSR	Sieve*		printf("\n %d prime numbers found.\n", D6); ***	PUTS	Summary1,A1	MOVE	#3,D0	; Display signed number in D1.L in decimal in smallest field.	MOVE.W	D6,D1	TRAP	#15	PUTS	Summary2,A1 	SIMHALT             	; halt simulator ** ======================================================================= ** * Variables and constants here 	ORG	\$2000CR	EQU	13LF	EQU	10CRLF	DC.B	CR,LF,\$00 PArray:	DCB.B	SizeInBytes,0 Header:	DC.B	CR,LF,LF,' Primes',CR,LF,' ======',CR,LF		DC.B	'     1     2',\$00 Summary1:	DC.B	CR,LF,' ',\$00Summary2:	DC.B	' prime numbers found.',CR,LF,\$00     END    START        	; last line of source`

## 8086 Assembly

`MAXPRM:	equ	5000		; Change this value for more primes	cpu	8086	bits	16	org	100hsection	.texterato:	mov	cx,MAXPRM	; Initialize array (set all items to prime)	mov	bp,cx		; Keep a copy in BP	mov	di,sieve	mov	al,1	rep	stosb	;;;	Sieve	mov	bx,sieve	; Set base register to array	inc	cx		; CX=1 (CH=0, CL=1); CX was 0 before	mov	si,cx		; Start at number 2 (1+1).next:	inc	si		; Next number 	cmp	cl,[bx+si]	; Is this number marked as prime?	jne	.next		; If not, try next number	mov	ax,si		; Otherwise, calculate square,	mul	si	mov	di,ax		; and put it in DI	cmp	di,bp		; Check array bounds	ja	output		; We're done when SI*SI>MAXPRM.mark:	mov	[bx+di],ch	; Mark byte as composite	add	di,si		; Next composite	cmp 	di,bp		; While maximum not reached	jbe	.mark	jmp	.next	;;;	Outputoutput:	mov	si,2		; Start at 2.test:	dec	byte [bx+si]	; Prime?	jnz	.next		; If not, try next number	mov	ax,si		; Otherwise, print number	call	prax.next:	inc	si	cmp	si,MAXPRM	jbe	.test	ret	;;;	Write number in AX to standard output (using MS-DOS)prax:	push	bx		; Save BX	mov	bx,numbuf	mov	bp,10		; Divisor.loop:	xor	dx,dx		; Divide AX by 10, modulus in DX	div	bp	add	dl,'0'		; ASCII digit	dec	bx	mov	[bx],dl		; Store ASCII digit	test	ax,ax		; More digits?	jnz	.loop	mov	dx,bx		; Print number	mov	ah,9		; 9 = MS-DOS syscall to print string	int	21h	pop	bx		; Restore BX	retsection	.data	db	'*****'		; Room for numbernumbuf:	db	13,10,'\$'section	.bss sieve:	resb	MAXPRM`
Output:
```2
3
5
7
11
...
4969
4973
4987
4993
4999
```

## AArch64 Assembly

Works with: as version Raspberry Pi 3B version Buster 64 bits
` /* ARM assembly AARCH64 Raspberry PI 3B *//*  program cribleEras64.s   */ /*******************************************//* Constantes file                         *//*******************************************//* for this file see task include a file in language AArch64 assembly */.include "../includeConstantesARM64.inc" .equ MAXI,      100 /*********************************//* Initialized data              *//*********************************/.datasMessResult:        .asciz "Prime  : @ \n"szCarriageReturn:   .asciz "\n" /*********************************//* UnInitialized data            *//*********************************/.bss  sZoneConv:                  .skip 24TablePrime:                 .skip   8 * MAXI /*********************************//*  code section                 *//*********************************/.text.global main main:                               // entry of program     ldr x4,qAdrTablePrime           // address prime table    mov x0,#2                       // prime 2    bl displayPrime    mov x1,#2    mov x2,#11:                                  // loop for multiple of 2    str x2,[x4,x1,lsl #3]           // mark  multiple of 2    add x1,x1,#2    cmp x1,#MAXI                    // end ?    ble 1b                          // no loop    mov x1,#3                       // begin indice    mov x3,#12:    ldr x2,[x4,x1,lsl #3]           // load table élément    cmp x2,#1                       // is prime ?    beq 4f    mov x0,x1                       // yes -> display    bl displayPrime    mov x2,x13:                                  // and loop to mark multiples of this prime    str x3,[x4,x2,lsl #3]    add x2,x2,x1                    // add the prime    cmp x2,#MAXI                    // end ?    ble 3b                          // no -> loop4:    add x1,x1,2                     // other prime in table    cmp x1,MAXI                     // end table ?    ble 2b                          // no -> loop 100:                                // standard end of the program     mov x0,0                        // return code    mov x8,EXIT                     // request to exit program    svc 0                           // perform the system callqAdrszCarriageReturn:    .quad szCarriageReturnqAdrsMessResult:         .quad sMessResultqAdrTablePrime:          .quad TablePrime /******************************************************************//*      Display prime table elements                                */ /******************************************************************//* x0 contains the prime */displayPrime:    stp x1,lr,[sp,-16]!             // save  registers    ldr x1,qAdrsZoneConv    bl conversion10                 // call décimal conversion    ldr x0,qAdrsMessResult    ldr x1,qAdrsZoneConv            // insert conversion in message    bl strInsertAtCharInc    bl affichageMess                // display message100:    ldp x1,lr,[sp],16               // restaur  2 registers    ret                             // return to address lr x30qAdrsZoneConv:                   .quad sZoneConv   /********************************************************//*        File Include fonctions                        *//********************************************************//* for this file see task include a file in language AArch64 assembly */.include "../includeARM64.inc" `
```Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
```

## ABAP

` PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100. AT SELECTION-SCREEN ON p_limit.  IF p_limit LE 1.    MESSAGE 'Limit must be higher then 1.' TYPE 'E'.  ENDIF. START-OF-SELECTION.  FIELD-SYMBOLS: <fs_prime> TYPE flag.  DATA: gt_prime TYPE TABLE OF flag,        gv_prime TYPE flag,        gv_i     TYPE i,        gv_j     TYPE i.   DO p_limit TIMES.    IF sy-index > 1.      gv_prime = abap_true.    ELSE.      gv_prime = abap_false.    ENDIF.     APPEND gv_prime TO gt_prime.  ENDDO.   gv_i = 2.  WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).    IF ( gt_prime[ gv_i ] EQ abap_true ).      gv_j =  gv_i ** 2.      WHILE ( gv_j <= p_limit ).        gt_prime[ gv_j ] = abap_false.        gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).      ENDWHILE.    ENDIF.    gv_i = gv_i + 1.  ENDWHILE.   LOOP AT gt_prime INTO gv_prime.    IF gv_prime = abap_true.      WRITE: / sy-tabix.    ENDIF.  ENDLOOP. `

## ACL2

`(defun nats-to-from (n i)   (declare (xargs :measure (nfix (- n i))))   (if (zp (- n i))       nil       (cons i (nats-to-from n (+ i 1))))) (defun remove-multiples-up-to-r (factor limit xs i)   (declare (xargs :measure (nfix (- limit i))))   (if (or (> i limit)           (zp (- limit i))           (zp factor))       xs       (remove-multiples-up-to-r        factor        limit        (remove i xs)        (+ i factor)))) (defun remove-multiples-up-to (factor limit xs)   (remove-multiples-up-to-r factor limit xs (* factor 2))) (defun sieve-r (factor limit)   (declare (xargs :measure (nfix (- limit factor))))   (if (zp (- limit factor))       (nats-to-from limit 2)       (remove-multiples-up-to factor (+ limit 1)                               (sieve-r (1+ factor) limit)))) (defun sieve (limit)   (sieve-r 2 limit))`

## ActionScript

Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)

`function eratosthenes(limit:int):Array{	var primes:Array = new Array();	if (limit >= 2) {		var sqrtlmt:int = int(Math.sqrt(limit) - 2);		var nums:Array = new Array(); // start with an empty Array...		for (var i:int = 2; i <= limit; i++) // and			nums.push(i); // only initialize the Array once...		for (var j:int = 0; j <= sqrtlmt; j++) {			var p:int = nums[j]			if (p)				for (var t:int = p * p - 2; t < nums.length; t += p)					nums[t] = 0;		}		for (var m:int = 0; m < nums.length; m++) {			var r:int = nums[m];			if (r)				primes.push(r);		}	}	return primes;}var e:Array = eratosthenes(1000);trace(e);`

Output:

Output:
```2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997
```

`with Ada.Text_IO, Ada.Command_Line; procedure Eratos is    Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));   Prime: array(1 .. Last) of Boolean := (1 => False, others => True);   Base: Positive := 2;   Cnt: Positive;begin   loop      exit when Base * Base > Last;      if Prime(Base) then         Cnt := Base + Base;         loop            exit when Cnt > Last;            Prime(Cnt) := False;            Cnt := Cnt + Base;         end loop;      end if;      Base := Base + 1;   end loop;   Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");   for Number in Prime'Range loop      if Prime(Number) then         Ada.Text_IO.Put(Positive'Image(Number));      end if;   end loop;end Eratos;`
Output:
```> ./eratos 31
Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31```

## Agda

` -- importsopen import Data.Nat as ℕ     using (ℕ; suc; zero; _+_; _∸_)open import Data.Vec as Vec   using (Vec; _∷_; []; tabulate; foldr)open import Data.Fin as Fin   using (Fin; suc; zero)open import Function          using (_∘_; const; id)open import Data.List as List using (List; _∷_; [])open import Data.Maybe        using (Maybe; just; nothing) -- Without square cutoff optimizationmodule Simple where  primes : ∀ n → List (Fin n)  primes zero = []  primes (suc zero) = []  primes (suc (suc zero)) = []  primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))    where    sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))    sieve [] = []    sieve (nothing ∷ xs) =         sieve xs    sieve (just x  ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)      where      B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n       remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)      remove _ ys zero    = nothing ∷ ys x      remove y ys (suc z) = y       ∷ ys z -- With square cutoff optimizationmodule SquareOpt where  primes : ∀ n → List (Fin n)  primes zero = []  primes (suc zero) = []  primes (suc (suc zero)) = []  primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))    where    sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))    sieve _ zero = List.mapMaybe id ∘ Vec.toList    sieve _ (suc _) [] = []    sieve i (suc l) (nothing ∷ xs) =     sieve (suc i) (l ∸ i ∸ i) xs    sieve i (suc l) (just x  ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)      where      B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n       remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)      remove _ ys zero    = nothing ∷ ys i      remove y ys (suc j) = y       ∷ ys j `

## Agena

Tested with Agena 2.9.5 Win32

`# Sieve of Eratosthenes # generate and return a sequence containing the primes up to sieveSizesieve := proc( sieveSize :: number ) :: sequence is    local sieve, result;     result := seq(); # sequence of primes - initially empty    create register sieve( sieveSize ); # "vector" to be sieved     sieve[ 1 ] := false;    for sPos from 2 to sieveSize do sieve[ sPos ] := true od;     # sieve the primes    for sPos from 2 to entier( sqrt( sieveSize ) ) do        if sieve[ sPos ] then            for p from sPos * sPos to sieveSize by sPos do                sieve[ p ] := false            od        fi    od;     # construct the sequence of primes    for sPos from 1 to sieveSize do        if sieve[ sPos ] then insert sPos into result fi    od return resultend; # sieve  # test the sieve procfor i in sieve( 100 ) do write( " ", i ) od; print();`
Output:
``` 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

## ALGOL 60

Based on the 1962 Revised Repport:

```comment Sieve of Eratosthenes;
begin
integer array t[0:1000];
integer i,j,k;
for i:=0 step 1 until 1000 do t[i]:=1;
t:=0; t:=0; i:=0;
for i:=i while i<1000 do
begin
for i:=i while i<1000 and t[i]=0 do i:=i+1;
if i<1000 then
begin
j:=2;
k:=j*i;
for k:=k while k<1000 do
begin
t[k]:=0;
j:=j+1;
k:=j*i
end;
i:=i+1
end
end;
for i:=0 step 1 until 999 do
if t[i]≠0 then print(i,ꞌ is primeꞌ)
end
```

An 1964 Implementation:

Works with: ALGOL 60 for OS/360

`'BEGIN'    'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);    'INTEGER' I,J,K;    'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;    SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);    'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'    'BEGIN'        CANDIDATES(/I/) := 1;    'END';    CANDIDATES(/0/) := 0;    CANDIDATES(/1/) := 0;    I := 0;    'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'    'BEGIN'        'FOR' I := I 'WHILE' I 'LESS' 1000          'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'            I := I+1;        'IF' I 'LESS' 1000 'THEN'        'BEGIN'            J := 2;            K := J*I;            'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'            'BEGIN'                CANDIDATES(/K/) := 0;                J := J + 1;                K := J*I;            'END';            I := I+1;            'END'        'END';        'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'        'IF' CANDIDATES(/I/) 'NOTEQUAL' 0  'THEN'        'BEGIN'            OUTINTEGER(1,I);            OUTSTRING(1,'(' IS PRIME')');            'COMMENT' NEW LINE;            SYSACT(1,14,1)        'END'    'END''END'`

## ALGOL 68

`BOOL prime = TRUE, non prime = FALSE;PROC eratosthenes = (INT n)[]BOOL:(  [n]BOOL sieve;  FOR i TO UPB sieve DO sieve[i] := prime OD;  INT m = ENTIER sqrt(n);  sieve := non prime;  FOR i FROM 2 TO m DO    IF sieve[i] = prime THEN      FOR j FROM i*i BY i TO n DO        sieve[j] := non prime      OD    FI  OD;  sieve);  print((eratosthenes(80),new line))`
Output:
```FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF
```

## ALGOL W

### Standard, non-optimised sieve

`begin     % implements the sieve of Eratosthenes                                   %    %     s(i) is set to true if i is prime, false otherwise                 %    %     algol W doesn't have a upb operator, so we pass the size of the    %    %     array in n                                                         %    procedure sieve( logical array s ( * ); integer value n ) ;    begin         % start with everything flagged as prime                             %         for i := 1 until n do s( i ) := true;         % sieve out the non-primes                                           %        s( 1 ) := false;        for i := 2 until truncate( sqrt( n ) )        do begin            if s( i )            then begin                for p := i * i step i until n do s( p ) := false            end if_s_i        end for_i ;     end sieve ;     % test the sieve procedure                                               %     integer sieveMax;     sieveMax := 100;    begin         logical array s ( 1 :: sieveMax );         i_w := 2; % set output field width                                   %        s_w := 1; % and output separator width                               %         % find and display the primes                                        %        sieve( s, sieveMax );        for i := 1 until sieveMax do if s( i ) then writeon( i );     end end.`
Output:
``` 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

### Odd numbers only version

Alternative version that only stores odd numbers greater than 1 in the sieve.

`begin    % implements the sieve of Eratosthenes                                   %    % only odd numbers appear in the sieve, which starts at 3                %    % s( i ) is set to true if ( i * 2 ) + 1 is prime                        %    procedure sieve2( logical array s ( * ); integer value n ) ;    begin        % start with everything flagged as prime                             %         for i := 1 until n do s( i ) := true;        % sieve out the non-primes                                           %        % the subscripts of s are  1  2  3  4  5  6  7  8  9 10 11 12 13...  %        %      which correspond to 3  5  7  9 11 13 15 17 19 21 23 25 27...  %        for i := 1 until truncate( sqrt( n ) ) do begin            if s( i ) then begin                integer ip;                ip := ( i * 2 ) + 1;                for p := i + ip step ip until n do s( p ) := false            end if_s_i        end for_i ;    end sieve2 ;    % test the sieve2 procedure                                              %    integer primeMax, arrayMax;    primeMax := 100;    arrayMax := ( primeMax div 2 ) - 1;    begin        logical array s ( 1 :: arrayMax);        i_w := 2; % set output field width                                   %        s_w := 1; % and output separator width                               %        % find and display the primes                                        %        sieve2( s, arrayMax );        write( 2 );        for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );    endend.`
Output:

Same as the standard version.

## ALGOL-M

` BEGIN COMMENT  FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING   CLASSIC SIEVE OF ERATOSTHENES; % CALCULATE INTEGER SQUARE ROOT %INTEGER FUNCTION ISQRT(N);INTEGER N;BEGIN    INTEGER R1, R2;    R1 := N;    R2 := 1;    WHILE R1 > R2 DO        BEGIN            R1 := (R1+R2) / 2;            R2 := N / R1;        END;    ISQRT := R1;END; INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;INTEGER ARRAY FLAGS[1:1000]; LIMIT := 1000;FALSE := 0;TRUE := 1; WRITE("FINDING PRIMES FROM 2 TO",LIMIT); % INITIALIZE TABLE %WRITE("INITIALIZING ... ");FOR I := 1 STEP 1 UNTIL LIMIT DO  FLAGS[I] := TRUE; % SIEVE FOR PRIMES %WRITEON("SIEVING ... ");FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO  BEGIN    IF FLAGS[I] = TRUE THEN        FOR J := (I * I) STEP I UNTIL LIMIT DO          FLAGS[J] := FALSE;  END; % WRITE OUT THE PRIMES TEN PER LINE %WRITEON("PRINTING");COUNT := 0;COL := 1;WRITE("");  FOR I := 2 STEP 1 UNTIL LIMIT DO  BEGIN    IF FLAGS[I] = TRUE THEN       BEGIN         WRITEON(I);         COUNT := COUNT + 1;         COL := COL + 1;         IF COL > 10 THEN           BEGIN             WRITE("");             COL := 1;           END;      END;  END; WRITE("");WRITE(COUNT, " PRIMES WERE FOUND."); END `
Output:
```FINDING PRIMES FROM 2 TO  1000
INTIALIZING ... SIEVING ... PRINTING
2     3     5     7    11    13    17    19    23    29
31    37    41    43    47    53    59    61    67    71
. . .
877   881   883   887   907   911   919   929   937   941
947   953   967   971   977   983   991   997

168 PRIMES WERE FOUND.
```

## APL

All these versions requires ⎕io←0 (index origin 0).

It would have been better to require a result of the boolean mask rather than the actual list of primes. The list of primes obtains readily from the mask by application of a simple function (here {⍵/⍳⍴⍵}). Other related computations (such as the number of primes < n) obtain readily from the mask, easier than producing the list of primes.

### Non-Optimized Version

`sieve2←{                            b←⍵⍴1               b[⍳2⌊⍵]←0           2≥⍵:b               p←{⍵/⍳⍴⍵}∇⌈⍵*0.5    m←1+⌊(⍵-1+p×p)÷p    b ⊣ p {b[⍺×⍺+⍳⍵]←0}¨ m} primes2←{⍵/⍳⍴⍵}∘sieve2`

The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).

### Optimized Version

`sieve←{                             b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5  b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1   49≥⍵:b                      p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5          m←1+⌊(⍵-1+p×p)÷2×p          b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m      } primes←{⍵/⍳⍴⍵}∘sieve`

The optimizations are as follows:

• Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
• Subsequently, only odd multiples of primes > 5 are marked.
• Multiples of a prime to be marked start at its square.

### Examples

`   primes 1002 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97    primes¨ ⍳14┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐││││2│2 3│2 3│2 3 5│2 3 5│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7 11│2 3 5 7 11│└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘    sieve 130 0 1 1 0 1 0 1 0 0 0 1 0    +/∘sieve¨ 10*⍳100 4 25 168 1229 9592 78498 664579 5761455 50847534`

The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).

## AppleScript

`on sieveOfEratosthenes(limit)    script o        property numberList : {missing value}    end script     repeat with n from 2 to limit        set end of o's numberList to n    end repeat    repeat with n from 2 to (limit ^ 0.5 div 1)        if (item n of o's numberList is n) then            repeat with multiple from (n * n) to limit by n                set item multiple of o's numberList to missing value            end repeat        end if    end repeat     return o's numberList's numbersend sieveOfEratosthenes sieveOfEratosthenes(1000)`
Output:
`{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}`

## ARM Assembly

Works with: as version Raspberry Pi
`  /* ARM assembly Raspberry PI  *//*  program cribleEras.s   */  /* REMARK 1 : this program use routines in a include file    see task Include a file language arm assembly    for the routine affichageMess conversion10    see at end of this program the instruction include *//* for constantes see task include a file in arm assembly *//************************************//* Constantes                       *//************************************/.include "../constantes.inc" .equ MAXI,      101  /*********************************//* Initialized data              *//*********************************/.datasMessResult:        .asciz "Prime  : @ \n"szCarriageReturn:   .asciz "\n" /*********************************//* UnInitialized data            *//*********************************/.bss  sZoneConv:                  .skip 24TablePrime:                 .skip   4 * MAXI /*********************************//*  code section                 *//*********************************/.text.global main main:                               @ entry of program     ldr r4,iAdrTablePrime           @ address prime table    mov r0,#2                       @ prime 2    bl displayPrime    mov r1,#2    mov r2,#11:                                  @ loop for multiple of 2    str r2,[r4,r1,lsl #2]           @ mark  multiple of 2    add r1,#2    cmp r1,#MAXI                    @ end ?    ble 1b                          @ no loop    mov r1,#3                       @ begin indice    mov r3,#12:    ldr r2,[r4,r1,lsl #2]           @ load table élément    cmp r2,#1                       @ is prime ?    beq 4f    mov r0,r1                       @ yes -> display    bl displayPrime    mov r2,r13:                                  @ and loop to mark multiples of this prime    str r3,[r4,r2,lsl #2]    add r2,r1                       @ add the prime    cmp r2,#MAXI              @ end ?    ble 3b                          @ no -> loop4:    add r1,#2                       @ other prime in table    cmp r1,#MAXI              @ end table ?    ble 2b                          @ no -> loop 100:                                @ standard end of the program     mov r0, #0                      @ return code    mov r7, #EXIT                   @ request to exit program    svc #0                          @ perform the system calliAdrszCarriageReturn:    .int szCarriageReturniAdrsMessResult:         .int sMessResultiAdrTablePrime:          .int TablePrime /******************************************************************//*      Display prime table elements                                */ /******************************************************************//* r0 contains the prime */displayPrime:    push {r1,lr}                    @ save registers    ldr r1,iAdrsZoneConv    bl conversion10                 @ call décimal conversion    ldr r0,iAdrsMessResult    ldr r1,iAdrsZoneConv            @ insert conversion in message    bl strInsertAtCharInc    bl affichageMess                @ display message100:    pop {r1,lr}    bx lriAdrsZoneConv:                   .int sZoneConv  /***************************************************//*      ROUTINES INCLUDE                           *//***************************************************/.include "../affichage.inc" `
```Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
Prime  : 101
```

## AutoHotkey

Search autohotkey.com: of Eratosthenes
Source: AutoHotkey forum by Laszlo

`MsgBox % "12345678901234567890`n" Sieve(20)  Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime    Static zero := 48, one := 49 ; Asc("0"), Asc("1")    VarSetCapacity(S,n,one)    NumPut(zero,S,0,"char")    i := 2    Loop % sqrt(n)-1 {       If (NumGet(S,i-1,"char") = one)          Loop % n//i             If (A_Index > 1)                NumPut(zero,S,A_Index*i-1,"char")       i += 1+(i>2)    }    Return S }`

### Alternative Version

`Sieve_of_Eratosthenes(n){	arr := []	loop % n-1		if A_Index>1			arr[A_Index] := true 	for i, v in arr	{		if (i>Sqrt(n))			break		else if arr[i]			while ((j := i*2 + (A_Index-1)*i) < n)				arr.delete(j)	}	return Arr}`
Examples:
`n := 101Arr := Sieve_of_Eratosthenes(n)loop, % n-1	output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")MsgBox % outputreturn`
Output:
```.	2	3	.	5	.	7	.	.	.
11	.	13	.	.	.	17	.	19	.
.	.	23	.	.	.	.	.	29	.
31	.	.	.	.	.	37	.	.	.
41	.	43	.	.	.	47	.	.	.
.	.	53	.	.	.	.	.	59	.
61	.	.	.	.	.	67	.	.	.
71	.	73	.	.	.	.	.	79	.
.	.	83	.	.	.	.	.	89	.
.	.	.	.	.	.	97	.	.	.```

## AutoIt

`#include <Array.au3>\$M = InputBox("Integer", "Enter biggest Integer")Global \$a[\$M], \$r[\$M], \$c = 1For \$i = 2 To \$M -1	If Not \$a[\$i] Then		\$r[\$c] = \$i		\$c += 1		For \$k = \$i To \$M -1 Step \$i			\$a[\$k] = True		Next	EndIfNext\$r = \$c - 1ReDim \$r[\$c]_ArrayDisplay(\$r)`

## AWK

An initial array holds all numbers 2..max (which is entered on stdin); then all products of integers are deleted from it; the remaining are displayed in the unsorted appearance of a hash table. Here, the script is entered directly on the commandline, and input entered on stdin:

```\$ awk '{for(i=2;i<=\$1;i++) a[i]=1;
>       for(i=2;i<=sqrt(\$1);i++) for(j=2;j<=\$1;j++) delete a[i*j];
>       for(i in a) printf i" "}'
100
71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3
```

The following variant does not unset non-primes, but sets them to 0, to preserve order in output:

```\$ awk '{for(i=2;i<=\$1;i++) a[i]=1;
>       for(i=2;i<=sqrt(\$1);i++) for(j=2;j<=\$1;j++) a[i*j]=0;
>       for(i=2;i<=\$1;i++) if(a[i])printf i" "}'
100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

Now with the script from a file, input from commandline as well as stdin, and input is checked for valid numbers:

` # usage:  gawk  -v n=101  -f sieve.awk function sieve(n) { # print n,":"	for(i=2; i<=n;      i++) a[i]=1;	for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;	for(i=2; i<=n;      i++) if(a[i]) printf i" "	print ""} BEGIN	{ print "Sieve of Eratosthenes:"	  if(n>1) sieve(n)	} 	{ n=\$1+0 }n<2	{ exit }	{ sieve(n) } END	{ print "Bye!" } `

Here is an alternate version that uses an associative array to record composites with a prime dividing it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints out noncomposites greater than 1.

` BEGIN {  ULIMIT=100 for ( n=1 ; (n++) < ULIMIT ; )     if (n in S) {        p = S[n]        delete S[n]        for ( m = n ; (m += p) in S ; )  { }        S[m] = p         }    else  print ( S[(n+n)] = n )} `

## Bash

See solutions at UNIX Shell.

## BASIC

Works with: FreeBASIC
Works with: RapidQ
`DIM n AS Integer, k AS Integer, limit AS Integer INPUT "Enter number to search to: "; limitDIM flags(limit) AS Integer FOR n = 2 TO SQR(limit)    IF flags(n) = 0 THEN        FOR k = n*n TO limit STEP n            flags(k) = 1        NEXT k    END IFNEXT n ' Display the primesFOR n = 2 TO limit    IF flags(n) = 0 THEN PRINT n; ", ";NEXT n`

### Applesoft BASIC

`10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT20  DIM FLAGS(LIMIT)30  FOR N = 2 TO SQR (LIMIT)40  IF FLAGS(N) < > 0 GOTO 8050  FOR K = N * N TO LIMIT STEP N60  FLAGS(K) = 170  NEXT K80  NEXT N90  REM  DISPLAY THE PRIMES100  FOR N = 2 TO LIMIT110  IF FLAGS(N) = 0 THEN PRINT N;", ";120  NEXT N`

### Atari BASIC

Translation of: Commodore BASIC

Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.

`100 REM SIEVE OF ERATOSTHENES110 PRINT "LIMIT";:INPUT LI120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I130 SL = SQR(LI)140 N(0)=0:N(1)=0150 FOR P=2 TO SL160  IF N(P)=0 THEN 200170  FOR I=P*P TO LI STEP P180    N(I)=0190  NEXT I200 NEXT P210 C=0220 FOR I=2 TO LI230   IF N(I)=0 THEN 260240   PRINT I,:C=C+1250   IF C=3 THEN PRINT:C=0260 NEXT I270 IF C THEN PRINT`
Output:
```  Ready
RUN
LIMIT?100
2         3         5
7         11        13
17        19        23
29        31        37
41        43        47
53        59        61
67        71        73
79        83        89
97```

### Commodore BASIC

Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.

`100 REM SIEVE OF ERATOSTHENES110 INPUT "LIMIT";LI120 DIM N(LI)130 SL = SQR(LI)140 N(0)=1:N(1)=1150 FOR P=2 TO SL160 : IF N(P) THEN 200170 : FOR I=P*P TO LI STEP P180 :   N(I)=1190 : NEXT I200 NEXT P210 FOR I=2 TO LI220 : IF N(I)=0 THEN PRINT I,230 NEXT I240 PRINT `
Output:
```READY.
RUN
LIMIT? 100
2         3         5         7
11        13        17        19
23        29        31        37
41        43        47        53
59        61        67        71
73        79        83        89
97

```

### IS-BASIC

`100 PROGRAM "Sieve.bas"110 LET LIMIT=100120 NUMERIC T(1 TO LIMIT)130 FOR I=1 TO LIMIT140   LET T(I)=0150 NEXT160 FOR I=2 TO SQR(LIMIT)170   IF T(I)<>1 THEN180     FOR K=I*I TO LIMIT STEP I190       LET T(K)=1200     NEXT210   END IF220 NEXT230 FOR I=2 TO LIMIT ! Display the primes240   IF T(I)=0 THEN PRINT I;250 NEXT`

### Locomotive Basic

`10 DEFINT a-z20 INPUT "Limit";limit30 DIM f(limit)40 FOR n=2 TO SQR(limit)50 IF f(n)=1 THEN 9060 FOR k=n*n TO limit STEP n70 f(k)=180 NEXT k90 NEXT n100 FOR n=2 TO limit110 IF f(n)=0 THEN PRINT n;",";120 NEXT`

### MSX Basic

`5 REM Tested with MSXPen web emulator6 REM Translated from Rosetta's ZX Spectrum implementation 10 INPUT "Enter number to search to: ";l20 DIM p(l)30 FOR n=2 TO SQR(l)40 IF p(n)<>0 THEN NEXT n50 FOR k=n*n TO l STEP n60 LET p(k)=170 NEXT k80 NEXT n90 REM Display the primes100 FOR n=2 TO l110 IF p(n)=0 THEN PRINT n;", ";120 NEXT n`

### Sinclair ZX81 BASIC

If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on `FAST` and `SLOW`: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering `FAST` mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to `SLOW` mode when we have something to print out.

` 10 INPUT L 20 FAST 30 DIM N(L) 40 FOR I=2 TO SQR L 50 IF N(I) THEN GOTO 90 60 FOR J=I+I TO L STEP I 70 LET N(J)=1 80 NEXT J 90 NEXT I100 SLOW110 FOR I=2 TO L120 IF NOT N(I) THEN PRINT I;" ";130 NEXT I`

### ZX Spectrum Basic

`10 INPUT "Enter number to search to: ";l20 DIM p(l)30 FOR n=2 TO SQR l40 IF p(n)<>0 THEN NEXT n50 FOR k=n*n TO l STEP n60 LET p(k)=170 NEXT k80 NEXT n90 REM Display the primes100 FOR n=2 TO l110 IF p(n)=0 THEN PRINT n;", ";120 NEXT n`

### QL SuperBASIC

#### using 'easy way' to 'add' 2n wheels

Translation of: ZX Spectrum Basic

Sets h\$ to 1 for higher multiples of 2 via `FILL\$`, later on sets `STEP` to 2n; replaces Floating Pt array p(z) with string variable h\$(z) to sieve out all primes < z=441 (`l`=21) in under 1K, so that h\$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.

`10 INPUT "Enter Stopping Pt for squared factors: ";z15 LET l=SQR(z)20 LET h\$="10" : h\$=h\$ & FILL\$("01",z)40      FOR n=3 TO l 50 IF h\$(n) THEN NEXT n60 FOR k=n*n TO z STEP n+n: h\$(k)=1 80      END FOR n   90 REM Display the primes     100 FOR n=2 TO z: IF h\$(n)=0 THEN PRINT n;", "; `

#### 2i wheel emulation of Sinclair ZX81 BASIC

Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the `STEP` of 2 in line 40 mitigates line 50's inefficiency when going to 90.
` 10 INPUT Z 15 LET L=SQR(Z) 30 LET H\$="10" 32 FOR J=3 TO Z STEP 2 34 LET H\$=H\$ & "01" 36 NEXT J 40 FOR I=3 TO L STEP 2 50 IF H\$(I)="1" THEN GOTO 90 60 FOR J=I*I TO Z STEP I+I 70 LET H\$(J)="1" 80 NEXT J 90 NEXT I110 FOR I=2 TO Z120 IF H\$(I)="0" THEN PRINT I!130 NEXT I `

#### 2i wheel emulation of Sinclair ZX80 BASIC

. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, `L` (=21 at most on 1K ZX80s for all primes till 439).
` 10 INPUT L 15 LET Z=(L+1)*(L- 1)/2 30 DIM H(Z) 40 FOR I=3 TO L STEP 2 50 IF H((I-1)/ 2) THEN GOTO 90 60 FOR J=I*I TO L*L STEP I+I 70 LET H((J-1)/ 2)=1 80 NEXT J 90 NEXT I110 FOR I=0 TO Z120 IF NOT H(I) THEN PRINT 0^I+1+I*2!130 NEXT I `

#### Sieve of Sundaram

Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now just a slight transformation away from the Sieve of Sundaram, as transformed as follows: `O` is the highest value for an Index of succesive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in-between, such that duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all optimisations incorporated into the prior entries for QL SuperBASIC, AND by not implementing an equivalent to line 50 in them.

Backward-compatible on 1K ZX80s for all primes < 441 (O=10).
` 10 INPUT O 15 LET Z=2*O^2+O*2  30 DIM H(Z) 40 FOR I=1 TO O 45 LET A=2*I^2+I*2 60 FOR J=A TO Z STEP 1+I*2 70 LET H(J)=1 80 NEXT J 90 NEXT I 110 FOR I=0 TO Z120 IF NOT H(I) THEN PRINT 0^I+1+I*2!130 NEXT I `

#### Eulerian optimisation

While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of `FOR` statements (& as their 1K of static RAM is equivalent to L1 cache, even in `FAST` mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30).

` 10 INPUT "Enter highest value of diagonal index q%: ";o%15 LET z%=o%*(2+o%*2) : h\$=FILL\$("00",z%) : q%=1 : q=q% : m=z% DIV (2*q%+1)30 FOR p=m TO q STEP -1: h\$(p+q+2*q*p)="1" 42 GOTO 8761 IF h\$(p%)="1" THEN GOTO 6362 IF p%<q% THEN GOTO 87 : ELSE h\$(p%+q%+2*q%*p%)="1"63 LET p%=p%-1 : GOTO 6187 LET q%=q%+1 : IF h\$(q%)="1" THEN GOTO 8790 LET p%=z% DIV (2*q%+1) : IF q%<=o% THEN GOTO 61115 LET z=z%127 FOR i=0 TO z: IF h\$(i)="0" THEN PRINT 0^i+1+i*2! `

## Batch File

`:: Sieve of Eratosthenes for Rosetta Code - PG@echo offsetlocal ENABLEDELAYEDEXPANSIONsetlocal ENABLEEXTENSIONSrem echo onset /p n=limit: rem set n=100for /L %%i in (1,1,%n%) do set crible.%%i=1for /L %%i in (2,1,%n%) do (  if !crible.%%i! EQU 1 (    set /A w = %%i * 2    for /L %%j in (!w!,%%i,%n%) do (	  set crible.%%j=0	)  ))for /L %%i in (2,1,%n%) do (  if !crible.%%i! EQU 1 echo %%i )pause`
Output:
```limit: 100
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97```

## BBC BASIC

`      limit% = 100000      DIM sieve% limit%       prime% = 2      WHILE prime%^2 < limit%        FOR I% = prime%*2 TO limit% STEP prime%          sieve%?I% = 1        NEXT        REPEAT prime% += 1 : UNTIL sieve%?prime%=0      ENDWHILE       REM Display the primes:      FOR I% = 1 TO limit%        IF sieve%?I% = 0 PRINT I%;      NEXT`

## BCPL

`get "libhdr" manifest \$( LIMIT = 1000 \$) let sieve(prime,max) be\$(  let i = 2    0!prime := false    1!prime := false    for i = 2 to max do i!prime := true    while i*i <= max do    \$(  if i!prime do        \$(  let j = i*i            while j <= max do            \$(  j!prime := false                j := j + i            \$)        \$)        i := i + 1    \$)\$) let start() be\$(  let prime = vec LIMIT    let col = 0    sieve(prime, LIMIT)    for i = 2 to LIMIT do        if i!prime do        \$(  writef("%I4",i)               col := col + 1            if col rem 20 = 0 then wrch('*N')        \$)    wrch('*N')\$)`
Output:
```   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997```

### Odds-only bit packed array version (64 bit)

This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.

` GET "libhdr" LET lowbit(n) =    0 -> -1,    VALOF {        // The table is byte packed to conserve space; therefore we must        // unpack the structure.        //        LET deBruijn64 = TABLE            #x0001300239311C03, #x3D3A322A261D1104,            #x3E373B2435332B16, #x2D27211E18120C05,            #x3F2F381B3C292510, #x362334152C20170B,            #x2E1A280F22141F0A, #x190E13090D080706         LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58        RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF    } LET primes_upto(limit) =    limit < 3 -> 0,    VALOF {        LET bit_sz = (limit + 1) / 2 - 1        LET bit, p = ?, ?        LET q, r = bit_sz >> 6, bit_sz & #x3F        LET sz = q - (r > 0)        LET sieve = getvec(sz)         // Initialize the array        FOR i = 0 TO q - 1 DO            sieve!i := -1        IF r > 0 THEN sieve!q := ~(-1 << r)        sieve!sz := -1 // Sentinel value to mark the end -              // (after sieving, we'll never have 64 consecutive odd primes.)         // run the sieve        bit := 0        {            WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO                bit +:= 1            p := 2*bit + 3            q := p*p            IF q > limit THEN RESULTIS sieve            r := (q - 3) >> 1            UNTIL r >= bit_sz DO {                sieve[r >> 6] &:= ~(1 << (r & #x3F))                r +:= p            }            bit +:= 1        } REPEAT    } MANIFEST { // fields in an iterable    sieve_start; sieve_bits; sieve_ptr} LET prime_iter(sieve) = VALOF {    LET iter = getvec(2)    iter!sieve_start := 0    iter!sieve_bits := sieve!0    iter!sieve_ptr := sieve    RESULTIS iter} LET nextprime(iter) =    !iter!sieve_ptr = -1 -> 0,  // guard entry if at the end already    VALOF {        LET p, x = ?, ?         // iter!sieve_start is also a flag to yield 2.        IF iter!sieve_start = 0 {            iter!sieve_start := 3            RESULTIS 2        }        x := iter!sieve_bits        {            TEST x ~= 0            THEN {                p := (lowbit(x) << 1) + iter!sieve_start                x &:= x - 1                iter!sieve_bits := x                RESULTIS p            }            ELSE {                iter!sieve_start +:= 128                iter!sieve_ptr +:= 1                x := !iter!sieve_ptr                IF x = -1 RESULTIS 0            }        } REPEAT    } LET show(sieve) BE {    LET iter = prime_iter(sieve)    LET c, p = 0, ?    {        p := nextprime(iter)        IF p = 0 THEN {            wrch('*n')            freevec(iter)            RETURN        }        IF c MOD 10 = 0 THEN wrch('*n')        c +:= 1        writef("%8d", p)    } REPEAT} LET start() = VALOF {    LET n = ?    LET argv = VEC 20    LET sz = ?    LET primes = ?     sz := rdargs("upto/a/n/p", argv, 20)    IF sz = 0 RESULTIS 1    n := !argv!0    primes := primes_upto(n)    IF primes = 0 RESULTIS 1 // no array allocated because limit too small    show(primes)    freevec(primes)    RESULTIS 0} `
Output:
```\$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000>
2       3       5       7      11      13      17      19      23      29
31      37      41      43      47      53      59      61      67      71
73      79      83      89      97     101     103     107     109     113
127     131     137     139     149     151     157     163     167     173
179     181     191     193     197     199     211     223     227     229
233     239     241     251     257     263     269     271     277     281
283     293     307     311     313     317     331     337     347     349
353     359     367     373     379     383     389     397     401     409
419     421     431     433     439     443     449     457     461     463
467     479     487     491     499     503     509     521     523     541
547     557     563     569     571     577     587     593     599     601
607     613     617     619     631     641     643     647     653     659
661     673     677     683     691     701     709     719     727     733
739     743     751     757     761     769     773     787     797     809
811     821     823     827     829     839     853     857     859     863
877     881     883     887     907     911     919     929     937     941
947     953     967     971     977     983     991     997
0.005>
```

## Befunge

```2>:3g" "-!v\  g30          <
|!`"O":+1_:.:03p>03g+:"O"`|
@               ^  p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789
```

## Bracmat

This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.

`( ( eratosthenes  =   n j i    .   !arg:?n      & 1:?i      &   whl        ' ( (1+!i:?i)^2:~>!n:?j          & ( !!i            |   whl              ' ( !j:~>!n                & nonprime:?!j                & !j+!i:?j                )            )          )      & 1:?i      &   whl        ' ( 1+!i:~>!n:?i          & (!!i|put\$(!i " "))          )  )& eratosthenes\$100)`
Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

## C

Plain sieve, without any optimizations:
`#include <stdlib.h>#include <math.h> char*eratosthenes(int n, int *c){	char* sieve;	int i, j, m; 	if(n < 2)		return NULL; 	*c = n-1;     /* primes count */	m = (int) sqrt((double) n); 	/* calloc initializes to zero */	sieve = calloc(n+1,sizeof(char));	sieve = 1;	sieve = 1;	for(i = 2; i <= m; i++)		if(!sieve[i])			for (j = i*i; j <= n; j += i)				if(!sieve[j]){					sieve[j] = 1; 					--(*c);				}  	return sieve;}`
Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.

Another example:

We first fill ones into an array and assume all numbers are prime. Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples! To understand this better, look at the output of the following example.

To print this back, we look for ones in the array and only print those spots.
`#include <stdio.h>#include <malloc.h>void sieve(int *, int); int main(int argc, char *argv){    int *array, n=10;    array =(int *)malloc((n + 1) * sizeof(int));    sieve(array,n);    return 0;} void sieve(int *a, int n){    int i=0, j=0;     for(i=2; i<=n; i++) {        a[i] = 1;    }     for(i=2; i<=n; i++) {        printf("\ni:%d", i);        if(a[i] == 1) {            for(j=i; (i*j)<=n; j++) {                printf ("\nj:%d", j);                printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);                a[(i*j)] = 0;                printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);            }        }    }     printf("\nPrimes numbers from 1 to %d are : ", n);    for(i=2; i<=n; i++) {        if(a[i] == 1)            printf("%d, ", i);    }    printf("\n\n");}`
Output:
`i:2j:2Before a[2*2]: 1After a[2*2]: 0j:3Before a[2*3]: 1After a[2*3]: 0j:4Before a[2*4]: 1After a[2*4]: 0j:5Before a[2*5]: 1After a[2*5]: 0i:3j:3Before a[3*3]: 1After a[3*3]: 0i:4i:5i:6i:7i:8i:9i:10Primes numbers from 1 to 10 are : 2, 3, 5, 7, `

## C#

Works with: C# version 2.0+
`using System;using System.Collections;using System.Collections.Generic; namespace SieveOfEratosthenes{    class Program    {        static void Main(string[] args)        {            int maxprime = int.Parse(args);            var primelist = GetAllPrimesLessThan(maxprime);            foreach (int prime in primelist)            {                Console.WriteLine(prime);            }            Console.WriteLine("Count = " + primelist.Count);            Console.ReadLine();        }         private static List<int> GetAllPrimesLessThan(int maxPrime)        {            var primes = new List<int>();            var maxSquareRoot = (int)Math.Sqrt(maxPrime);            var eliminated = new BitArray(maxPrime + 1);             for (int i = 2; i <= maxPrime; ++i)            {                if (!eliminated[i])                {                    primes.Add(i);                    if (i <= maxSquareRoot)                    {                        for (int j = i * i; j <= maxPrime; j += i)                        {                            eliminated[j] = true;                        }                    }                }            }            return primes;        }    }}`

### Unbounded

Richard Bird Sieve

Translation of: F#

To show that C# code can be written in somewhat functional paradigms, the following in an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:

`using System;using System.Collections;using System.Collections.Generic;using System.Linq;using PrimeT = System.UInt32;  class PrimesBird : IEnumerable<PrimeT> {    private struct CIS<T> {      public T v; public Func<CIS<T>> cont;      public CIS(T v, Func<CIS<T>> cont) {        this.v = v; this.cont = cont;      }    }    private CIS<PrimeT> pmlts(PrimeT p) {      Func<PrimeT, CIS<PrimeT>> fn = null;      fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));      return fn(p * p);    }    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {      var x = xs.v; var y = ys.v;      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));    }    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {      var nn = n; var ncs = cs;      for (; ; ++nn) {        if (nn >= ncs.v) ncs = ncs.cont();        else return new CIS<PrimeT>(nn, () => minusat(++nn, ncs));      }    }    private CIS<PrimeT> prms() {      return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }    public IEnumerator<PrimeT> GetEnumerator() {      for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;    }    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }  }`

Tree Folding Sieve

Translation of: F#

The above code can easily be converted to "odds-only" and a infinite tree-like folding scheme with the following minor changes:

`using System;using System.Collections;using System.Collections.Generic;using System.Linq;using PrimeT = System.UInt32;  class PrimesTreeFold : IEnumerable<PrimeT> {    private struct CIS<T> {      public T v; public Func<CIS<T>> cont;      public CIS(T v, Func<CIS<T>> cont) {        this.v = v; this.cont = cont;      }    }    private CIS<PrimeT> pmlts(PrimeT p) {      var adv = p + p;      Func<PrimeT, CIS<PrimeT>> fn = null;      fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));      return fn(p * p);    }    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));    }    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {      var x = xs.v; var y = ys.v;      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));    }    private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {      var nxtcss = css.cont();      return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));    }    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {      var nn = n; var ncs = cs;      for (; ; nn += 2) {        if (nn >= ncs.v) ncs = ncs.cont();        else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));      }    }    private CIS<PrimeT> oddprms() {      return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));    }    public IEnumerator<PrimeT> GetEnumerator() {      yield return 2;      for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;    }    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }  }`

The above code runs over ten times faster than the original Richard Bird algorithm.

Priority Queue Sieve

Translation of: F#

First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:

`namespace PriorityQ {  using KeyT = System.UInt32;  using System;  using System.Collections.Generic;  using System.Linq;  class Tuple<K, V> { // for DotNet 3.5 without Tuple's    public K Item1; public V Item2;    public Tuple(K k, V v) { Item1 = k; Item2 = v; }    public override string ToString() {      return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";    }  }  class MinHeapPQ<V> {    private struct HeapEntry {      public KeyT k; public V v;      public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }    }    private List<HeapEntry> pq;    private MinHeapPQ() { this.pq = new List<HeapEntry>(); }    private bool mt { get { return pq.Count == 0; } }    private Tuple<KeyT, V> pkmn {      get {        if (pq.Count == 0) return null;        else {          var mn = pq;          return new Tuple<KeyT, V>(mn.k, mn.v);        }      }    }    private void psh(KeyT k, V v) { // add extra very high item if none      if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));      var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...      for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {        var t = pq[ni - 1];        if (t.k > k) pq[i - 1] = t; else break;      }      pq[i - 1] = new HeapEntry(k, v);    }    private void siftdown(KeyT k, V v, int ndx) {      var cnt = pq.Count - 1; var i = ndx;      for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {        var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;        var nk = k;        if (k > lk) { i = ni; nk = lk; }        if (nk > rk) { ni += 1; i = ni; }        if (i != oi) pq[oi] = pq[i]; else break;      }      pq[i] = new HeapEntry(k, v);    }    private void rplcmin(KeyT k, V v) {      if (pq.Count > 1) siftdown(k, v, 0); }    public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }    public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }    public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {      pq.psh(k, v); return pq; }    public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {      pq.rplcmin(k, v); return pq; }}`

The following code implements an improved version of the odds-only O'Neil algorithm, which provides the improvements of only adding base prime composite number streams to the queue when the sieved number reaches the square of the base prime (saving a huge amount of memory and considerable execution time, including not needing as wide a range of a type for the internal prime numbers) as well as minimizing stream processing using fusion:

`using System;using System.Collections;using System.Collections.Generic;using System.Linq;using PrimeT = System.UInt32;  class PrimesPQ : IEnumerable<PrimeT> {    private IEnumerator<PrimeT> nmrtr() {      MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;      PrimeT bp = 3; PrimeT q = 9;      IEnumerator<PrimeT> bps = null;      yield return 2; yield return 3;      for (var n = (PrimeT)5; ; n += 2) {        if (n >= q) { // always equal or less...          if (q <= 9) {            bps = nmrtr();            bps.MoveNext(); bps.MoveNext(); } // move to 3...          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;          var adv = bp + bp; bp = nbp;          pq = MinHeapPQ<PrimeT>.push(n + adv, adv, pq);        }        else {          var pk = MinHeapPQ<PrimeT>.peekMin(pq);          var ck = (pk == null) ? q : pk.Item1;          if (n >= ck) {            do { var adv = pk.Item2;                  pq = MinHeapPQ<PrimeT>.replaceMin(ck + adv, adv, pq);                  pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;            } while (n >= ck);          }          else yield return n;        }      }    }    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }  }`

The above code is at least about 2.5 times faster than the Tree Folding version.

Dictionary (Hash table) Sieve

The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

`using System;using System.Collections;using System.Collections.Generic;using System.Linq;using PrimeT = System.UInt32;  class PrimesDict : IEnumerable<PrimeT> {    private IEnumerator<PrimeT> nmrtr() {      Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();      PrimeT bp = 3; PrimeT q = 9;      IEnumerator<PrimeT> bps = null;      yield return 2; yield return 3;      for (var n = (PrimeT)5; ; n += 2) {        if (n >= q) { // always equal or less...          if (q <= 9) {            bps = nmrtr();            bps.MoveNext(); bps.MoveNext();          } // move to 3...          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;          var adv = bp + bp; bp = nbp;          dct.Add(n + adv, adv);        }        else {          if (dct.ContainsKey(n)) {            PrimeT nadv; dct.TryGetValue(n, out nadv); dct.Remove(n); var nc = n + nadv;            while (dct.ContainsKey(nc)) nc += nadv;            dct.Add(nc, nadv);          }          else yield return n;        }      }    }    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }  }`

The above code runs in about three quarters of the time as the above Priority Queue based version for a range of a million primes which will fall even further behind for increasing ranges due to the Dictionary providing O(1) access times as compared to the O(log n) access times for the Priority Queue; the only slight advantage of the PQ based version is at very small ranges where the constant factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".

Page Segmented Array Sieve

All of the above unbounded versions are really just an intellectual exercise as with very little extra lines of code above the fastest Dictionary based version, one can have an bit-packed page-segmented array based version as follows:

`using System;using System.Collections;using System.Collections.Generic;using System.Linq;using PrimeT = System.UInt32;  class PrimesPgd : IEnumerable<PrimeT> {    private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes    private const int BFBTS = PGSZ * 8; // in bits    private const int BFRNG = BFBTS * 2;    public IEnumerator<PrimeT> nmrtr() {      IEnumerator<PrimeT> bps = null;      List<uint> bpa = new List<uint>();      uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words      yield return 2;      for (var lowi = (PrimeT)0; ; lowi += BFBTS) {        for (var bi = 0; ; ++bi) {          if (bi < 1) {            if (bi < 0) { bi = 0; yield return 2; }            PrimeT nxt = 3 + lowi + lowi + BFRNG;            if (lowi <= 0) { // cull very first page              for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)                if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)                  for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)                    cbuf[j >> 5] |= 1u << j;            }            else { // cull for the rest of the pages              Array.Clear(cbuf, 0, cbuf.Length);              if (bpa.Count == 0) { // inite secondar base primes stream                bps = nmrtr(); bps.MoveNext(); bps.MoveNext();                bpa.Add((uint)bps.Current); bps.MoveNext();              } // add 3 to base primes array              // make sure bpa contains enough base primes...              for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {                p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);              }              for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {                var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;                // adjust start index based on page lower limit...                if (s >= lowi) s -= lowi;                else {                  var r = (lowi - s) % p;                  s = (r != 0) ? p - r : 0;                }                for (var j = (uint)s; j < BFBTS; j += p)                  cbuf[j >> 5] |= 1u << ((int)j);              }            }          }          while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;          if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);          else break; // outer loop for next page segment...        }      }    }    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }  }`

The above code is about 25 times faster than the Dictionary version at computing the first about 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed as compared to a naive "one huge memory array" algorithm to using an array size that is the size of the CPU L1 or L2 caches and using bit-packing to fit more number representations into this limited capacity; in this way RAM memory access times are reduced by a factor of from about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is compensated by a large factor in total execution time.

The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "rull-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.

In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another about four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).

Note that all of these gains in speed are not due to C# other than it compiles to reasonably efficient machine code, but rather to proper use of the Sieve of Eratosthenes algorithm.

All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):

`    static void Main(string[] args) {      Console.WriteLine(PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...    }`

To produce the following output for all tested versions (although some are considerably faster than others):

Output:
`15485863`

## C++

### Standard Library

This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

`#include <iostream>#include <algorithm>#include <vector> // requires Iterator satisfies RandomAccessIteratortemplate <typename Iterator>size_t prime_sieve(Iterator start, Iterator end){    if (start == end) return 0;    // clear the container with 0    std::fill(start, end, 0);    // mark composites with 1    for (Iterator prime_it = start + 1; prime_it != end; ++prime_it)    {        if (*prime_it == 1) continue;        // determine the prime number represented by this iterator location        size_t stride = (prime_it - start) + 1;        // mark all multiples of this prime number up to max        Iterator mark_it = prime_it;        while ((end - mark_it) > stride)        {            std::advance(mark_it, stride);            *mark_it = 1;        }    }    // copy marked primes into container    Iterator out_it = start;    for (Iterator it = start + 1; it != end; ++it)    {        if (*it == 0)        {            *out_it = (it - start) + 1;            ++out_it;        }    }    return out_it - start;} int main(int argc, const char* argv[]){    std::vector<int> primes(100);    size_t count = prime_sieve(primes.begin(), primes.end());    // display the primes    for (size_t i = 0; i < count; ++i)        std::cout << primes[i] << " ";    std::cout << std::endl;    return 1;}`

### Boost

`// yield all prime numbers less than limit. template<class UnaryFunction>void primesupto(int limit, UnaryFunction yield){  std::vector<bool> is_prime(limit, true);   const int sqrt_limit = static_cast<int>(std::sqrt(limit));  for (int n = 2; n <= sqrt_limit; ++n)    if (is_prime[n]) {	yield(n); 	for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)       //NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX	  is_prime[k] = false;    }   for (int n = sqrt_limit + 1; n < limit; ++n)    if (is_prime[n])	yield(n);}`

Full program:

Works with: Boost
`/**   \$ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000 */#include <inttypes.h> // uintmax_t#include <limits>#include <cmath>#include <iostream>#include <sstream>#include <vector> #include <boost/lambda/lambda.hpp> int main(int argc, char *argv[]){  using namespace std;  using namespace boost::lambda;   int limit = 10000;  if (argc == 2) {    stringstream ss(argv[--argc]);    ss >> limit;     if (limit < 1 or ss.fail()) {      cerr << "USAGE:\n  sieve LIMIT\n\nwhere LIMIT in the range [1, " 	   << numeric_limits<int>::max() << ")" << endl;      return 2;    }  }   // print primes less then 100  primesupto(100, cout << _1 << " ");  cout << endl;     // find number of primes less then limit and their sum  int count = 0;  uintmax_t sum = 0;  primesupto(limit, (var(sum) += _1, var(count) += 1));   cout << "limit sum pi(n)\n"        << limit << " " << sum << " " << count << endl;}`

## Chapel

 This example is incorrect. Please fix the code and remove this message.Details: Doesn't compile since at least Chapel version 1.20 to 1.24.1.

This solution uses nested iterators to create new wheels at run time:

`// yield prime and remove all multiples of it from children sievesiter sieve(prime):int {         yield prime;         var last = prime;        label candidates for candidate in sieve(prime+1) do {                for composite in last..candidate by prime do {                         // candidate is a multiple of this prime                        if composite == candidate {                                // remember size of last composite                                last = composite;                                // and try the next candidate                                continue candidates;                        }                }                 // candidate cannot need to be removed by this sieve                // yield to parent sieve for checking                yield candidate;        }}`
The topmost sieve needs to be started with 2 (the smallest prime):
`config const N = 30;for p in sieve(2) {        if p > N {                writeln();                break;        }        write(" ", p);}`

### Alternate Conventional Bit-Packed Implementation

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:

compile with the `--fast` option

`use Time;use BitOps; type Prime = uint(32); config const limit: Prime = 1000000000; // sieve limit proc main() {  write("The first 25 primes are:  ");  for p in primes(100) do write(p, " "); writeln();   var count = 0; for p in primes(1000000) do count += 1;  writeln("Count of primes to a million is:  ", count, ".");   var timer: Timer;  timer.start();   count = 0;  for p in primes(limit) do count += 1;   timer.stop();  write("Found ", count, " primes up to ", limit);  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");} iter primes(n: Prime): Prime {  const szlmt = n / 8;  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up   for bp in 2 .. n {    if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {      const s0 = bp * bp;      if s0 > n then break;      for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }    }  }   for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p; }`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

### Alternate Odds-Only Bit-Packed Implementation

`use Time;use BitOps; type Prime = int(32); config const limit: Prime = 1000000000; // sieve limit proc main() {  write("The first 25 primes are:  ");  for p in primes(100) do write(p, " "); writeln();   var count = 0; for p in primes(1000000) do count += 1;  writeln("Count of primes to a million is:  ", count, ".");   var timer: Timer;  timer.start();   count = 0;  for p in primes(limit) do count += 1;   timer.stop();  write("Found ", count, " primes up to ", limit);  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");} iter primes(n: Prime): Prime {  const ndxlmt = (n - 3) / 2;  const szlmt = ndxlmt / 8;  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up   for i in 0 .. ndxlmt { // never gets to the end!    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {      const bp = i + i + 3;      const s0 = (bp * bp - 3) / 2;      if s0 > ndxlmt then break;      for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);    }  }   yield 2;  for i in 0 .. ndxlmt do    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3; }`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).

### Hash Table Based Odds-Only Version

Translation of: Python
Works with: Chapel version 1.22
`use Time; config const limit = 100000000; type Prime = uint(32); class Primes { // needed so we can use next to get successive values  var n: Prime; var obp: Prime; var q: Prime;  var bps: owned Primes?;  var keys: domain(Prime); var dict: [keys] Prime;  proc next(): Prime { // odd primes!    if this.n < 5 { this.n = 5; return 3; }    if this.bps == nil {      this.bps = new Primes(); // secondary odd base primes feed      this.obp = this.bps!.next(); this.q = this.obp * this.obp;    }    while true {      if this.n >= this.q { // advance secondary stream of base primes...        const adv = this.obp * 2; const key = this.q + adv;        this.obp = this.bps!.next(); this.q = this.obp * this.obp;               this.keys += key; this.dict[key] = adv;      }      else if this.keys.contains(this.n) { // found a composite; advance...        const adv = this.dict[this.n]; this.keys.remove(this.n);        var nkey = this.n + adv;        while this.keys.contains(nkey) do nkey += adv;        this.keys += nkey; this.dict[nkey] = adv;      }      else { const p = this.n; this.n += 2; return p; }      this.n += 2;    }    return 0; // to keep compiler happy in returning a value!  }  iter these(): Prime { yield 2; while true do yield this.next(); }} proc main() {  var count = 0;  write("The first 25 primes are:  ");  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }  writeln();   var timer: Timer;  timer.start();   count = 0;  for p in new Primes() { if p > limit then break; count += 1; }   timer.stop();  write("Found ", count, " primes up to ", limit);  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");}`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 9020.12 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, this is much slower than the array based versions; this is due in part to Chapel's current somewhat inefficient implementation of hash tables, both as Association Arrays as used here and in the Map standard library. This is somewhat slower than as run in Python as for the code from which it was translated due to Python's extremely well optimized hash table implementation as used internally.

As an alternate to the use of a built-in library, the following code implements a specialized BasePrimesTable that works similarly to the way the Python associative arrays work as to hashing algorithm used (no hashing, as the hash values for integers are just themselves) and something similar to the Python method of handling hash table collisions is used:

Works with: Chapel version 1.22

Compile with the `--fast` compiler command line option

`use Time; config const limit = 100000000; type Prime = uint(32); record BasePrimesTable { // specialized for the use here...  record BasePrimeEntry { var fullkey: Prime; var val: Prime; }  var cpcty: int = 8; var sz: int = 0;  var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;  proc grow() {       const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];    bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };    for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);     }  proc find(k: Prime): int { // internal get location of value or -1    const msk = cpcty - 1; var skey = k: int & msk;    var perturb = k: int; var loop = 8;    do {      if bpa[skey].fullkey == k then return skey;      perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;      loop -= 1; if perturb > 0 then loop = 8;    } while loop > 0;    return -1; // not found!  }  proc contains(k: Prime): bool { return find(k) >= 0; }  proc add(k, v: Prime) { // if exists then replaces else new entry    const fndi = find(k);    if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);    else {      sz += 1; if 2 * sz > cpcty then grow();      const msk = cpcty - 1; var skey = k: int & msk;      var perturb = k: int; var loop = 8;      do {        if bpa[skey].fullkey == 0 {          bpa[skey] = new BasePrimeEntry(k, v); return; }        perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;        loop -= 1; if perturb > 0 then loop = 8;      } while loop > 0;    }  }  proc remove(k: Prime) { // if doesn't exist does nothing    const fndi = find(k);    if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }  }  proc this(k: Prime): Prime { // returns value or 0 if not found    const fndi = find(k);    if fndi < 0 then return 0; else return bpa[fndi].val;  }} class Primes { // needed so we can use next to get successive values  var n: Prime; var obp: Prime; var q: Prime;  var bps: shared Primes?; var dict = new BasePrimesTable();  proc next(): Prime { // odd primes!    if this.n < 5 { this.n = 5; return 3; }    if this.bps == nil {      this.bps = new Primes(); // secondary odd base primes feed      this.obp = this.bps!.next(); this.q = this.obp * this.obp;    }    while true {      if this.n >= this.q { // advance secondary stream of base primes...        const adv = this.obp * 2; const key = this.q + adv;        this.obp = this.bps!.next(); this.q = this.obp * this.obp;        this.dict.add(key, adv);      }      else if this.dict.contains(this.n) { // found a composite; advance...        const adv = this.dict[this.n]; this.dict.remove(this.n);        var nkey = this.n + adv;        while this.dict.contains(nkey) do nkey += adv;        this.dict.add(nkey, adv);      }      else { const p = this.n; this.n += 2; return p; }      this.n += 2;    }    return 0; // to keep compiler happy in returning a value!  }  iter these(): Prime { yield 2; while true do yield this.next(); }} proc main() {  var count = 0;  write("The first 25 primes are:  ");  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }  writeln();   var timer: Timer;  timer.start();   count = 0;  for p in new Primes() { if p > limit then break; count += 1; }   timer.stop();  write("Found ", count, " primes up to ", limit);  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");}`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 2020.03 milliseconds.```

This last code is quite usable up to a hundred million (as here) or even a billion in a little over ten times the time, but is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only for the hash table for the base primes of about eight Kilobytes for sieving to a billion compared to over 60 Megabytes for the monolithic odds-only simple version).

### Functional Tree Folding Odds-Only Version

Chapel isn't really a very functional language even though it has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced, iterations and has first class functions (FCF's) and lambdas (anonymous functions), these last can't be closures (capture variable bindings from external scope(s)), nor can the work around of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

Translation of: Nim
Works with: 1.22 version - compile with the --fast compiler command line flag for full optimization
`use Time; type Prime = uint(32); config const limit = 1000000: Prime; // Chapel doesn't have closures, so we need to emulate them with classes...class PrimeCIS { // base prime stream...  var head: Prime;  proc next(): shared PrimeCIS { return new shared PrimeCIS(); }} class PrimeMultiples: PrimeCIS {  var adv: Prime;  override proc next(): shared PrimeCIS {    return new shared PrimeMultiples(      this.head + this.adv, this.adv): shared PrimeCIS; }} class PrimeCISCIS { // base stream of prime streams; never used directly...  var head: shared PrimeCIS;  proc init() { this.head = new shared PrimeCIS(); }  proc next(): shared PrimeCISCIS {    return new shared PrimeCISCIS(); }} class AllMultiples: PrimeCISCIS {  var bps: shared PrimeCIS;  proc init(bsprms: shared PrimeCIS) {    const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;    this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;    this.bps = bsprms;  }  override proc next(): shared PrimeCISCIS {    return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }} class Union: PrimeCIS {  var feeda, feedb: shared PrimeCIS;  proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {    const ahd = fda.head; const bhd = fdb.head;    this.head = if ahd < bhd then ahd else bhd;    this.feeda = fda; this.feedb = fdb;  }  override proc next(): shared PrimeCIS {        const ahd = this.feeda.head; const bhd = this.feedb.head;    if ahd < bhd then      return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;    if ahd > bhd then      return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;    return new shared Union(this.feeda.next(),                            this.feedb.next()): shared PrimeCIS;  }} class Pairs: PrimeCISCIS {  var feed: shared PrimeCISCIS;  proc init(fd: shared PrimeCISCIS) {    const fs = fd.head; const sss = fd.next(); const ss = sss.head;    this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;  }  override proc next(): shared PrimeCISCIS {    return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }} class Composites: PrimeCIS {  var feed: shared PrimeCISCIS;  proc init(fd: shared PrimeCISCIS) {    this.head = fd.head.head; this.feed = fd;  }  override proc next(): shared PrimeCIS {    const fs = this.feed.head.next();    const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;    const ncs = new shared Composites(prs): shared PrimeCIS;    return new shared Union(fs, ncs): shared PrimeCIS;  }} class OddPrimesFrom: PrimeCIS {  var cmpsts: shared PrimeCIS;  override proc next(): shared PrimeCIS {    var n = head + 2; var cs = this.cmpsts;    while true {      if n < cs.head then        return new shared OddPrimesFrom(n, cs): shared PrimeCIS;      n += 2; cs = cs.next();    }    return this.cmpsts; // never used; keeps compiler happy!  }} class OddPrimes: PrimeCIS {  proc init() { this.head = 3; }  override proc next(): shared PrimeCIS {    const bps = new shared OddPrimes(): shared PrimeCIS;    const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;    const cmpsts = new shared Composites(mlts): shared PrimeCIS;    return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;  }} iter primes(): Prime {  yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;  while true { yield cur.head; cur = cur.next(); }} // test it...write("The first 25 primes are: "); var cnt = 0;for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); } Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded). var timer: Timer; timer.start(); cnt = 0;for p in primes() { if p > limit then break; cnt += 1; }timer.stop(); write("\nFound ", cnt, " primes up to ", limit);writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits) although doing so is slower than the Hash Table version above and also takes more memory as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has a worst asymptotic performance with an extra `log(n)` factor where `n` is the sieving range; this can be shown by running the above program with `--limit=10000000` run time command line option to sieve to ten million which takes about 4.5 seconds to count the primes up to ten million (a factor of ten higher range, but much higher than the expected increased factor of about 10 per cent extra as per the Hash Table version with about 20 per cent more operations times the factor of ten for this version). Other than for the extra operations, this version is generally slower due to the time to do the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.

### A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version

To take advantage of the features that make Chapel shine, we need to use it to do some parallel computations, and to efficiently do that for the Sieve of Eratosthenes, we need to divide the work into page segments where we can assign each largish segment to a separate thread/task; this also improves the speed due to better cache associativity with most memory accesses to values that are already in the cache(s). Once we have divided the work, Chapel offers lots of means to implement the parallelism but to be a true Sieve of Eratosthenes, we need to have the ability to output the results in order; with many of the convenience mechanisms not doing that, the best/simplest option is likely task parallelism with the output results assigned to an rotary indexed array containing the `sync` results. It turns out that, although the Chapel compiler can sometimes optimize the code so the overhead of creating tasks is not onerous, for this case where the actual tasks are somewhat complex, the compiler can't recognize that an automatically generated thread pool(s) are required so we need to generate the thread pool(s) manually. The code that implements the multi-threading of page segments using thread pools is as follows:

Works with: 1.24.1 version - compile with the --fast compiler command line flag for full optimization
`use Time; use BitOps; use CPtr; type Prime = uint(64);type PrimeNdx = int(64);type BasePrime = uint(32); config const LIMIT = 1000000000: Prime; config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);assert (L1 == 16 || L1 == 32 || L1 == 64,        "L1 cache size must be 16, 32, or 64 Kilobytes!");config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);assert (L2 == 128 || L2 == 256 || L2 == 512,        "L2 cache size must be 128, 256, or 512 Kilobytes!");const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!config const NUMTHRDS = here.maxTaskPar;assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!"); const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,                            11: Prime, 13: Prime, 17: Prime];const FRSTSVPRM = 19: Prime; // past the pre-cull primes!// 2 eliminated as even; 255255 in bytes...const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072; // number of base primes within small span!const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!// number of base primes for CPU L1 cache buffer!const SZMNSTRTS = (if L1 == 16 then 12251 else                     if L1 == 32 then 23000 else 43390)                       - WHLPRMS.size + 1; // extra one for marker! // using this Look Up Table faster than bit twiddling...const bitmsk = for i in 0 .. 7 do 1:uint(8) << i; var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);proc fillWHLPTRN(ref wp: SieveBuffer) {  const hi = WHLPRMS.size - 1;  const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});  // contains wheel pattern primes skipping the small wheel prime (2)!...  // never advances past the first base prime arr as it ends with a huge!...  for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!                                    else 0x7FFFFFFF): BasePrime; // last huge!  var whlbpas = new shared BasePrimeArrs(whlhd);  var whlstrts = new StrtsArr({rng});  wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);  // eliminate wheel primes from the WHLPTRN buffer!...  wp.cmpsts = 0xFF: uint(8);} // the following two must be classes for compability with sync...class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }record SieveBuffer {  var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;  proc init() {}  proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }  proc deinit() { dom = { 0 .. -1 }; }   proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...    const sz = cmpsts.size; const mvsz = min(sz, 16384);    var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;    for i in 0 .. sz - 1 by 16384 {      c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,               c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);          mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;    }  }   proc count(btlmt: int) { // count by 64 bits using CPU popcount...    const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);    const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));    var i = 0; var cnt = (lstwrd * 64 + 64): int;    while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }    return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;      }   // most of the time is spent doing culling operations as follows!...  proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,                                        ref strts: StrtsArr) {    const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;    const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!    const strtssz = strts.strtsarr.size;    // C pointer for speed magic!...    const cmpstsp = c_ptrTo(cmpsts);    const strtsp = c_ptrTo(strts.strtsarr);     // first fill the strts array with pre-calculated start addresses...    var i = 0; for bp in bpas {      // calculate page start address for the given base prime...      const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;      const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!      if s0 > ndxlmt then {        if i < strtssz then strtsp[i] = -1: int(32); break; }      var s = 0: int;      if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;      else { const r = (lwi: Prime - s0) % bbp;            if r == 0 then s = 0: int; else s = (bbp - r): int; };      if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }      if i < strtssz { strtsp[i] = -1; i = strtssz; }      // cull the full buffer for this given base prime as usual...      // only works up to limit of int(32)**2!!!!!!!!      while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }    }     // cull the smaller sub buffers according to the strts array...    for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {      i = 0; for bp in bpas { // bp never bigger than uint(32)!        // cull the sub buffer for this given base prime...        var s = strtsp[i]: int; if s < 0 then break;        var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;        if bpi <= bplmt { // use loop "unpeeling" for a small improvement...          const slmt = s + bpi * 8 - 1;          while s <= slmt {            const bmi = s & 7; const msk = bitmsk[bmi];            var c = s >> 3; const clmt = sbtlmt >> 3;            while c <= clmt { cmpstsp[c] |= msk; c += bpi; }            nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;          }          strtsp[i] = nxt: int(32); i += 1;        }        else { while s <= sbtlmt { // standard cull loop...                 cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }               strtsp[i] = s: int(32); i += 1; }      }    }  }} // a generic record that contains a page result generating function;// allows manual iteration through the use of the next() method;// multi-threaded through the use of a thread pool...class PagedResults {  const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output  var lwi: PrimeNdx; var bsbtsz: int;  var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;  var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();  var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();  var qi: int = 0;  var wrkq\$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;  var rsltsq\$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;   proc init(cvclsr, li: PrimeNdx, bsz: int) {    cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }   proc deinit() { // kill the thread pool when out of scope...    if bpas == nil then return; // no thread pool!    for i in wrkq\$.domain {      wrkq\$[i].writeEF(-1); while true { const r = rsltsq\$[i].readFE();        if r == nil then break; }    }  }   proc next(): cnvrtrclsr(lwi, sbs(0)).type {       proc dowrk(ri: int) { // used internally!...      while true {        const li = wrkq\$[ri].readFE(); // following to kill thread!        if li < 0 { rsltsq\$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }        sbs[ri].fill(li);        sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);        rsltsq\$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));      }    }    if this.bpas == nil { // init on first use; avoids data race!      this.bpas = new BasePrimeArrs();      if this.bsbtsz < CPUL1CACHE {        this.sbs = new SieveBuffer(bsbtsz);        this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});      }      else {        this.sbs = new SieveBuffer(CPUL2CACHE);        this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});      }      // start threadpool and give it inital work...      for i in rsltsq\$.domain {        begin with (const in i) dowrk(i);        this.wrkq\$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;      }    }    const rslt = this.rsltsq\$[qi].readFE();    this.wrkq\$[qi].writeEF(this.lwi);    this.lwi += this.sbs[qi].cmpsts.size * 8;    this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;    return rslt;  }   iter these() { while lwi >= 0 do yield next(); }} // the sieve buffer to base prime array converter closure...record SB2BPArr {    proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {    const bsprm = (lwi + lwi + 3): BasePrime;    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;    var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {                        arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }                      i += 1; }    return arr;  }} // a memoizing lazy list of BasePrimeArr's...class BasePrimeArrs {  var head: shared BasePrimeArr;  var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;  var lock\$: sync bool = true;  var feed: shared PagedResults(SB2BPArr) =    new shared PagedResults(new SB2BPArr(), 65536, 65536);   proc init() { // make our own first array to break data race!    var sb = new SieveBuffer(256); sb.fill(0);    const sb2 = new SB2BPArr();    head = sb2(0, sb): shared BasePrimeArr;    this.complete(); // fake base primes!    sb = new SieveBuffer(65536); sb.fill(0);    // use (completed) self as source of base primes!    var strts = new StrtsArr({ 0 .. 256 });    sb.cull(0, 65536, this, strts);    // replace head with new larger version culled using fake base primes!...    head = sb2(0, sb): shared BasePrimeArr;  }   // for initializing for use by the fillWHLPTRN proc...  proc init(hd: shared BasePrimeArr) {    head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);  }   // for initializing lazily extended list as required...  proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }   proc next(): shared BasePrimeArrs {    if this.tail == nil { // in case other thread slipped through!           if this.lock\$.readFE() && this.tail == nil { // empty sync -> block others!        const nhd = this.feed.next(): shared BasePrimeArr;        this.tail = new shared BasePrimeArrs(nhd , this.feed);      }      this.lock\$.writeEF(false); // fill the sync so other threads can do nothing!    }    return this.tail: shared BasePrimeArrs; // necessary cast!  }   iter these(): BasePrime {    for bp in head.bparr do yield bp; var cur = next();    while true {      for bp in cur.head.bparr do yield bp; cur = cur.next(); }  }} record SB2PrmArr {  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {    const bsprm = (lwi + lwi + 3): Prime;    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;    var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {                        arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }                      i += 1; }    return arr;  }} iter primes(): Prime {  for p in WHLPRMS do yield p: Prime;  for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do    for p in pa!.prmarr do yield p;} // use a class so that it can be used as a generic sync value!...class CntNxt { const cnt: int; const nxt: PrimeNdx; } // a class that emulates a closure and a return value...record SB2Cnt {  const nxtlmt: PrimeNdx;  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {    const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;    const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;    return new shared CntNxt(sb.count(btlmt), lstndx);  }} // couut primes to limit, just like it says...proc countPrimesTo(lmt: Prime): int(64) {  const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);  for p in WHLPRMS { if p > lmt then break; count += 1; }  if lmt < FRSTSVPRM then return count;  for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {    count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;  }  return count;} // test it...write("The first 25 primes are: "); var cnt = 0;for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); } cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }writeln("\nThere are ", cnt, " primes up to a million."); write("Sieving to ", LIMIT, " with ");write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");writeln("using ", NUMTHRDS, " threads."); var timer: Timer; timer.start();// the slow way!:// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }const count = countPrimesTo(LIMIT); // the fast way!timer.stop(); write("Found ", count, " primes up to ", LIMIT);writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.```

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does implement some functional concepts as in a memoized lazy list of base prime arrays, but as this is used at the page level, the slowish performance doesn't impact the overall execution time much and the code is much more elegant in using this concept such that we compute new pages of base primes as they are required for increasing range.

Some of the most tricky bits due to having thread pools is stopping and de-initializing when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized to some extent in using "loop unpeeling" for smaller base primes to simplify the loops down to simple masking by a constant with eight separate loops for the repeating pattern over bytes and culling by sub buffer CPU L1 cache sizes over the outer sieve buffer size of the CPU L2 cache size in order to make the task work-sized chunks larger for less task context switching overheads and for reduced time lost to culling start address calculations per base prime (which needs to use some integer division that is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits or 1e12 for the one Megabit CPU L2 cache size many mid-range Intel CPU's have currently when used for multi-threading (half of the actual size for Hyper-Threaded - HT - threads as they share both the L1 and the L2 caches over the pairs of Hyper-Threaded (HT) threads per core).

Although this code can be used for much higher sieving ranges, it is not recommended due to not yet being tuned for better efficiency above 1e12; there are no checks limiting the user to this range, but, as well as decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows but these will be for huge sieving ranges taking days -> weeks -> months -> years to execute on common desktop CPU's.

A further optimization used is to create a pre-culled `WHLPTRN` `SieveBuffer` where the odd primes (since we cull odds-only) of 3, 5, 7, 11, 13, and 17 have already been culled and using that to pre-fill the page segment buffers so that no culling by these base prime values is required, this reduces the number of operations by about 45% compared to if it wasn't done but the ratio of better performance is only about 34.5% better as this changes the ratio of (fast) smaller base primes to larger (slower) ones.

All of the improvements to this point allow the shown performance as per the displayed output for the above program; using a command line argument of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion - 1e11 - on this computer, which has cache sizes of that amount and no Hyper-Threading - HT), it can count the primes to 1e11 in about 17.5 seconds using the above mentioned CPU. It will be over two times faster than this using a more modern desktop CPU such as the Intel Core i7-9700K which has twice as many effective cores, a higher CPU clock rate, is about 10% to 15% faster due the a more modern CPU architecture which is three generations newer. Of course using a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core CPU's will only give these speed gains for large sieving ranges such as 1e11 and above since otherwise there aren't enough work chunks to go around for all the threads available!

Incredibly, even run single threaded (argument of `--NUMTHRDS=1`) this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`) with both using the same size of CPU L1 cache buffer of 16 Kilobytes; This implementation does not yet have the level of wheel optimization of the Sieve of Atkin as it has only the limited wheel optimization of Odds-Only plus the use of the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges, and approach the speed of Kim Walisch's "primesieve". However, not having primitive element pointers and pointer operations, there are some optimizations used that Kim Walisch's "primesieve" uses of extreme loop unrolling that mean that it can never quite reach the speed of "primeseive" by about 20% to 30%.

The above code is a fairly formidable benchmark, which I have also written in Fortran as in likely the major computer language that is comparable. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expression this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-on's or just being the ability to call through to OpenMP/MPI.

That said, it also as the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but so much about forms of expression have been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; Most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes now no longer are breaking code changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same; for this particular task they are about the same if one were to implement the same algorithmic optimizations other than that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel as Fortran has some limited form of pointers, although not the full set of pointer operators that C/C++ like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compile and due to the availablity of (limited) pointer operations.

The primary additional optimization available to Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer, with the other major improvement to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.

## Clojure

`(defn primes< [n]   (remove (set (mapcat #(range (* % %) n %)                        (range 2 (Math/sqrt n))))           (range 2 n)))`

The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with `(println (time (count (primes< 1000000))))`, it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.

It may be written using the into #{} function to run slightly faster due to the set function being concerned with only distinct elements whereas the into #{} only does the conjunction, and even at that doesn't do that much as it does the conjunction to an empty sequence, the code as follows:

`(defn primes< [n]   (remove (into #{}                 (mapcat #(range (* % %) n %)                         (range 2 (Math/sqrt n))))           (range 2 n)))`

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses the into #{} transducer but has been slightly wheel-factorized to sieve odds-only:

`(defn primes< [n]   (if (< n 2) ()     (cons 2 (remove (into #{}                           (mapcat #(range (* % %) n %)                                   (range 3 (Math/sqrt n) 2)))                     (range 3 n 2)))))`

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.

The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:

`(defn primes-to  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"  [n]  (let [root (-> n Math/sqrt long),        cmpsts (boolean-array (inc n)),        cullp (fn [p]                (loop [i (* p p)]                  (if (<= i n)                    (do (aset cmpsts i true)                        (recur (+ i p))))))]    (do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))                                       (range 2 (inc root)))))        (filter #(not (aget cmpsts %)) (range 2 (inc n))))))`

Alternative implementation using Clojure's side-effect oriented list comprehension.

` (defn primes-to  "Returns a lazy sequence of prime numbers less than lim"  [lim]  (let [refs (boolean-array (+ lim 1) true)        root (int (Math/sqrt lim))]    (do (doseq [i (range 2 lim)                :while (<= i root)                :when (aget refs i)]          (doseq [j (range (* i i) lim i)]            (aset refs j false)))        (filter #(aget refs %) (range 2 lim)))))`

Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.

`(defn primes-to  "Returns a lazy sequence of prime numbers less than lim"  [lim]  (let [max-i (int (/ (- lim 1) 2))        refs (boolean-array max-i true)        root (/ (dec (int (Math/sqrt lim))) 2)]    (do (doseq [i (range 1 (inc root))                :when (aget refs i)]          (doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]            (aset refs j false)))        (cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))`

This implemantation is about twice as fast as the previous one and uses only half the memory. From the index of the array, it calculates the value it represents as (2*i + 1), the step between two indices that represent the multiples of primes to mark as composite is also (2*i + 1). The index of the square of the prime to start composite marking is 2*i*(i+1).

Alternative very slow entirely functional implementation using lazy sequences

`(defn primes-to  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"  [n]  (letfn [(nxtprm [cs] ; current candidates            (let [p (first cs)]              (if (> p (Math/sqrt n)) cs                (cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)                                              set (remove cs) rest)))))))]    (nxtprm (range 2 (inc n)))))`

The reason that the above code is so slow is that it has has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector where only the culled values are referenced, and due to the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.

Version based on immutable Vector's

Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:

`(defn primes-to  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"  [max-prime]  (let [sieve (fn [s n]                (if (<= (* n n) max-prime)                  (recur (if (s n)                           (reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))                           s)                         (inc n))                  s))]    (->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))             (assoc 0 false)             (assoc 1 false)             (sieve 2))         (map-indexed #(vector %2 %1)) (filter first) (map second))))`

The above code is still quite slow due to the cost of the immutable copy-on-modify operations.

Odds only bit packed mutable array based version

The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:

`(set! *unchecked-math* true) (defn primes-to  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"  [n]  (let [root (-> n Math/sqrt long),        rootndx (long (/ (- root 3) 2)),        ndx (long (/ (- n 3) 2)),        cmpsts (long-array (inc (/ ndx 64))),        isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))                               (bit-shift-left 1 (bit-and % 63)))),        cullp (fn [i]                (let [p (long (+ i i 3))]	                (loop [i (bit-shift-right (- (* p p) 3) 1)]	                  (if (<= i ndx)	                    (do (let [w (bit-shift-right i 6)]	                    (aset cmpsts w (bit-or (aget cmpsts w)	                                           (bit-shift-left 1 (bit-and i 63)))))	                        (recur (+ i p))))))),        cull (fn [] (loop [i 0] (if (<= i rootndx)                                  (do (if (isprm i) (cullp i)) (recur (inc i))))))]    (letfn [(nxtprm [i] (if (<= i ndx)                          (cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]                                                              (if (or (> i ndx) (isprm i)) i                                                                (recur (inc i)))))))))]      (if (< n 2) nil        (cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))`

The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization other than the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.

If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead as compared to the actual composite number culling time, would get even higher if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times, most important in the culling operations, and yet higher again if multi-processing were used to share to page segment processing across CPU cores.

The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed come computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):

`(defn primes-tox  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"  [n]  (let [root (-> n Math/sqrt long),        rootndx (long (/ (- root 3) 2)),        ndx (max (long (/ (- n 3) 2)) 0),        lmt (quot ndx 64),        cmpsts (long-array (inc lmt)),        cullp (fn [i]                (let [p (long (+ i i 3))]	                (loop [i (bit-shift-right (- (* p p) 3) 1)]	                  (if (<= i ndx)	                    (do (let [w (bit-shift-right i 6)]                            (aset cmpsts w (bit-or (aget cmpsts w)                                                   (bit-shift-left 1 (bit-and i 63)))))                          (recur (+ i p))))))),        cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)                                                 (bit-shift-left -2 (bit-and ndx 63))))                        (loop [i 0]                          (when (<= i rootndx)                            (when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))                                                   (bit-shift-left 1 (bit-and i 63))))                              (cullp i))                            (recur (inc i))))))        numprms (fn []                  (let [w (dec (alength cmpsts))] ;; fast results count bit counter                    (loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]                      (if (> i w) cnt                        (recur (inc i)                                (- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]    (if (< n 2) nil      (cons 2 (if (< n 3) nil                (do (cull)                    (deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???                      clojure.lang.ISeq                        (first [_] (if (nil? cmpsa) nil (+ i i 3)))                        (next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil                                                          (OPSeq.                                                            (loop [j (inc i)]                                                              (let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))                                                                                       (bit-shift-left 1 (bit-and j 63))))]                                                                (if p? j (recur (inc j)))))                                                            cmpsa ncnt tcnt))))                        (more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)                                                             (.next this))))                        (cons [this o] (clojure.core/cons o this))                        (empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))                        (equiv [this o] (if (or (not= (type this) (type o))                                                (not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))                                                (not= i (.i ^OPSeq o))) false true))                        clojure.lang.Counted                          (count [_] (- tcnt cnt))                        clojure.lang.Seqable                          (clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))                        clojure.lang.IReduce                          (reduce [_ f v] (let [c (- tcnt cnt)]                                            (if (<= c 0) nil                                              (loop [ci i, n c, rslt v]                                                (if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))                                                                    (bit-shift-left 1 (bit-and ci 63))))                                                  (let [rrslt (f rslt (+ ci ci 3)),                                                        rdcd (reduced? rrslt),                                                        nrslt (if rdcd @rrslt rrslt)]                                                    (if (or (<= n 1) rdcd) nrslt                                                      (recur (inc ci) (dec n) nrslt)))                                                  (recur (inc ci) n rslt))))))                          (reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)                                                              (.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))                        clojure.lang.Sequential                        Object                          (toString [this] (if (= cnt tcnt) "()"                                             (.toString (seq (map identity this))))))                     (->OPSeq 0 cmpsts 0 (numprms))))))))`

'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.

Due to the better efficiency of the custom CIS type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.

Under Clojure 1.7.0, one can use '(time (reduce (fn [] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total with about half the time used for sieving the array and half for computing the sum.

To show how sensitive Clojure is to forms of expression of functions, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long even though it is using the same internal routine for most of the calculation due to the function not having the type coercion's.

Before one considers that this code is suitable for larger ranges, it is still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (produces about a four times speed up), maximal wheel factorization (to make it another about four times faster), and the use of multi-processing (for a further gain of about 4 times for a multi-core desktop CPU such as an Intel i7), will make the sieving/counting code about 50 times faster than this, although there will only be a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.

### Unbounded Versions

For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.

The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:

A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)

`(defn primes-Bird  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."  []  (letfn [(mltpls [p] (let [p2 (* 2 p)]                        (letfn [(nxtmltpl [c]                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]                          (nxtmltpl (* p p))))),          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),          (union [xs ys] (let [xv (first xs), yv (first ys)]                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),          (mrgmltpls [mltplss] (cons (first (first mltplss))                                     (lazy-seq (union (next (first mltplss))                                                      (mrgmltpls (next mltplss)))))),          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]                                    (if (< n (first cmpsts))                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))                                      (recur (+ n 2) (next cmpsts)))))]    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]                                         (minusStrtAt 5 cmpsts)))))        (cons 2 (lazy-seq oddprms)))))`

The above code is quite slow due to both that the data structure is a linear merging of prime multiples and due to the slowness of the Clojure sequence operations.

A Clojure version of the tree folding sieve using Lazy Sequences

The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:

`(defn primes-treeFolding  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."  []  (letfn [(mltpls [p] (let [p2 (* 2 p)]                        (letfn [(nxtmltpl [c]                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]                          (nxtmltpl (* p p))))),          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),          (union [xs ys] (let [xv (first xs), yv (first ys)]                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),          (pairs [mltplss] (let [tl (next mltplss)]                             (cons (union (first mltplss) (first tl))                                   (lazy-seq (pairs (next tl)))))),          (mrgmltpls [mltplss] (cons (first (first mltplss))                                     (lazy-seq (union (next (first mltplss))                                                      (mrgmltpls (pairs (next mltplss))))))),          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]                                    (if (< n (first cmpsts))                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))                                      (recur (+ n 2) (next cmpsts)))))]    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]                                         (minusStrtAt 5 cmpsts)))))        (cons 2 (lazy-seq oddprms)))))`

The above code is still slower than it should be due to the slowness of Clojure's sequence operations.

A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence

The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:

`(deftype CIS [v cont]  clojure.lang.ISeq    (first [_] v)    (next [_] (if (nil? cont) nil (cont)))    (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))    (cons [this o] (clojure.core/cons o this))    (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))    (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)                                                (if (or (not= (type cis1) (type cis2))                                                        (not= (.v cis1) (.v ^CIS cis2))                                                        (and (nil? (.cont cis1))                                                              (not (nil? (.cont ^CIS cis2))))                                                        (and (nil? (.cont ^CIS cis2))                                                              (not (nil? (.cont cis1))))) false                                                  (if (nil? (.cont cis1)) true                                                    (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))    (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt                                            (recur ((.cont cis)) (inc cnt)))))  clojure.lang.Seqable    (seq [this] (if (and (nil? v) (nil? cont)) nil this))  clojure.lang.Sequential  Object    (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this)))))) (defn primes-treeFoldingx  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."  []  (letfn [(mltpls [p] (let [p2 (* 2 p)]                        (letfn [(nxtmltpl [c]                                  (->CIS c (fn [] (nxtmltpl (+ c p2)))))]                          (nxtmltpl (* p p))))),          (allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),          (union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]                                     (if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))                                       (if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))                                         (->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),          (pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]                                  (->CIS (union (.v mltplss) (.v tl))                                         (fn [] (pairs ((.cont tl))))))),          (mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))                                           (fn [] (union ((.cont ^CIS (.v mltplss)))                                                         (mrgmltpls (pairs ((.cont mltplss)))))))),          (minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]                                         (if (< n (.v cmpsts))                                           (->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))                                           (recur (+ n 2) ((.cont cmpsts))))))]    (do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]                                       (minusStrtAt 5 cmpsts)))))        (->CIS 2 (fn [] oddprms)))))`

'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.

The above code is useful for ranges up to about fifteen million primes, which is about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit packed version which can reasonably be used for ranges about 100 times as large.

Incremental Hash Map based unbounded "odds-only" version

The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:

`(defn primes-hashmap  "Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"  []  (letfn [(nxtoddprm [c q bsprms cmpsts]            (if (>= c q) ;; only ever equal              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]                (recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))              (if (contains? cmpsts c)                (recur (+ c 2) q bsprms                       (let [adv (cmpsts c), ncmps (dissoc cmpsts c)]                         (assoc ncmps                                (loop [try (+ c adv)] ;; ensure map entry is unique                                  (if (contains? ncmps try)                                    (recur (+ try adv)) try)) adv)))                (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))`

The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation even though it has computational complexity of (n log log n) rather than the worse (n log n log log n) for the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.

There is almost no benefit of converting the above code to use a CIS as most of the time is expended in the hash map functions.

Incremental Priority Queue based unbounded "odds-only" version

In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap algorithm. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions of (emtpy-pq) initialization, (getMin-pq pq) to examinte the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replaace the minimum entry with a key/value pair as given (it is more efficient that if functions were provided to delete and then re-insert entries in the queue; there is therefore no "delete" or other queue functions supplied as the algorithm does not requrie them:

`(deftype PQEntry [k, v]  Object    (toString [_] (str "<" k "," v ">")))(deftype PQNode [ntry, lft, rght]  Object    (toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">"))) (defn empty-pq [] nil) (defn getMin-pq [^PQNode pq]  (if (nil? pq)    nil    (.ntry pq))) (defn insert-pq [^PQNode opq ok v]  (loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]    (if (nil? pq)      (cont (->PQNode kv nil nil))      (let [k (.k kv),            ^PQEntry kvn (.ntry pq), kn (.k kvn),            l (.lft pq), r (.rght pq)]        (if (<= k kn)          (recur kvn r #(cont (->PQNode kv % l)))          (recur kv r #(cont (->PQNode kvn % l)))))))) (defn replaceMinAs-pq [^PQNode opq k v]  (let [^PQEntry kv (->PQEntry k v)]    (if (nil? opq) ;; if was empty or just an entry, just use current entry      (->PQNode kv nil nil)      (loop [pq opq, cont identity]        (let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]          (cond ;; if left us empty, right must be too            (nil? l)              (cont (->PQNode kv nil nil)),            (nil? r) ;; we only have a left...              (let [^PQEntry kvl (.ntry l), kl (.k kvl)]                    (if (<= k kl)                      (cont (->PQNode kv l nil))                      (recur l #(cont (->PQNode kvl % nil))))),            :else (let [^PQEntry kvl (.ntry l), kl (.k kvl),                        ^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both                    (if (and (<= k kl) (<= k kr))                      (cont (->PQNode kv l r))                      (if (<= kl kr)                        (recur l #(cont (->PQNode kvl % r)))                        (recur r #(cont (->PQNode kvr l %))))))))))))`

Note that the above code is written partially using continuation passing style so as to leave the "recur" calls in tail call position as required for efficient looping in Clojure; for practical sieving ranges, the algorithm could likely use just raw function recursion as recursion depth is unlikely to be used beyond a depth of about ten or so, but raw recursion is said to be less code efficient.

The actual incremental sieve using the Priority Queue is as follows, which code uses the same optimizations of postponing the addition of prime composite streams to the queue until the square root of the currently sieved number is reached and using a secondary base primes stream to generate the primes composite stream markers in the queue as was used for the Hash Map version:

`(defn primes-pq  "Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"  []  (letfn [(nxtoddprm [c q bsprms cmpsts]            (if (>= c q) ;; only ever equal              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]                (recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))              (let [mn (getMin-pq cmpsts)]                (if (and mn (>= c (.k mn))) ;; never greater than                  (recur (+ c 2) q bsprms                         (loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value                           (let [ncmps (replaceMinAs-pq cmps (+ c adv) adv),                                 nmn (getMin-pq ncmps)]                             (if (and nmn (>= c (.k nmn)))                               (recur (.v nmn) ncmps)                               ncmps))))                  (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))`

The above code is faster than the Hash Map version up to about a sieving range of fifteen million or so, but gets progressively slower for larger ranges due to having (n log n log log n) computational complexity rather than the (n log log n) for the Hash Map version, which has a higher constant factor overhead that is overtaken by the extra "log n" factor.

It is slower that the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).

Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.

All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the the "odds-only" versions.

It is difficult if not impossible to apply efficient multi-processing to the above versions of the unbounded sieves as the next values of the primes sequence are dependent on previous changes of state for the Bird and Tree Folding versions; however, with the addition of a "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to a given page start state functions, it is possible to do for these letter two algorithms; however, even though it is possible and there is some benefit for these latter two implementations, the benefit is less than using mutable arrays due to that the results must be enumerated into a data structure of some sort in order to be passed out of the page function whereas they can be directly enumerated from the array for the mutable array versions.

Bit packed page segmented array unbounded "odds-only" version

To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:

`(set! *unchecked-math* true) (def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache(def PGBTS (bit-shift-left PGSZ 3))(def PGWRDS (bit-shift-right PGBTS 5))(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes(def BPBTS (bit-shift-left BPWRDS 5))(defn- count-pg  "count primes in the culled page buffer, with test for limit"  [lmt ^ints pg]  (let [pgsz (alength pg),        pgbts (bit-shift-left pgsz 5),        cntem (fn [lmtw]                (let [lmtw (long lmtw)]	          (loop [i (long 0), c (long 0)]	            (if (>= i lmtw) (- (bit-shift-left lmtw 5) c)	              (recur (inc i)	              (+ c (java.lang.Integer/bitCount (aget pg i))))))))]    (if (< lmt pgbts)      (let [lmtw (bit-shift-right lmt 5),            lmtb (bit-and lmt 31)            msk (bit-shift-left -2 lmtb)]        (+ (cntem lmtw)           (- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)                                                      msk)))))      (- pgbts         (areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))));;      (cntem pgsz))))(defn- primes-pages  "unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."  []  (letfn [(make-pg [lowi pgsz bpgs]            (let [lowi (long lowi),                  pgbts (long (bit-shift-left pgsz 5)),                  pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),                  ^ints pg (int-array pgsz),                  cull (fn [bpgs']                         (loop [i (long 0), bpgs' bpgs']	                         (let [^ints fbpg (first bpgs'),	                               bpgsz (long (alength fbpg))]	                           (if (>= i bpgsz)	                             (recur 0 (next bpgs'))	                             (let [p (long (aget fbpg i)),	                                   sqr (long (* p p))]	                               (if (< sqr pgrng) (do                   (loop [j (long (let [s (long (bit-shift-right (- sqr 3) 1))]                                     (if (>= s lowi) (- s lowi)                                       (let [m (long (rem (- lowi s) p))]                                         (if (zero? m)                                           0                                           (- p m))))))]                     (if (< j pgbts) ;; fast inner culling loop where most time is spent                       (do                         (let [w (bit-shift-right j 5)]                           (aset pg w (int (bit-or (aget pg w)                                                   (bit-shift-left 1 (bit-and j 31))))))                         (recur (+ j p)))))                     (recur (inc i) bpgs'))))))))]              (do (if (nil? bpgs)                    (letfn [(mkbpps [i]                              (if (zero? (bit-and (aget pg (bit-shift-right i 5))                                                  (bit-shift-left 1 (bit-and i 31))))                                (cons (int-array 1 (+ i i 3)) (lazy-seq (mkbpps (inc i))))                                (recur (inc i))))]                      (cull (mkbpps 0)))                    (cull bpgs))                  pg))),          (page-seq [lowi pgsz bps]            (letfn [(next-seq [lwi]                      (cons (make-pg lwi pgsz bps)                            (lazy-seq (next-seq (+ lwi (bit-shift-left pgsz 5))))))]              (next-seq lowi)))          (pgs->bppgs [ppgs]            (letfn [(nxt-pg [lowi pgs]                      (let [^ints pg (first pgs),                            cnt (count-pg BPBTS pg),                            npg (int-array cnt)]                        (do (loop [i 0, j 0]                              (if (< i BPBTS)                                (if (zero? (bit-and (aget pg (bit-shift-right i 5))                                                    (bit-shift-left 1 (bit-and i 31))))                                  (do (aset npg j (+ (bit-shift-left (+ lowi i) 1) 3))                                      (recur (inc i) (inc j)))                                  (recur (inc i) j))))                            (cons npg (lazy-seq (nxt-pg (+ lowi BPBTS) (next pgs)))))))]              (nxt-pg 0 ppgs))),          (make-base-prms-pgs []            (pgs->bppgs (cons (make-pg 0 BPWRDS nil)                              (lazy-seq (page-seq BPBTS BPWRDS (make-base-prms-pgs))))))]    (page-seq 0 PGWRDS (make-base-prms-pgs))))(defn primes-paged  "unbounded Sieve of Eratosthenes producing a lazy sequence of primes"  []  (do (deftype CIS [v cont]        clojure.lang.ISeq          (first [_] v)          (next [_] (if (nil? cont) nil (cont)))          (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))          (cons [this o] (clojure.core/cons o this))          (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))          (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)                                                      (if (or (not= (type cis1) (type cis2))                                                              (not= (.v cis1) (.v ^CIS cis2))                                                              (and (nil? (.cont cis1))                                                                   (not (nil? (.cont ^CIS cis2))))                                                              (and (nil? (.cont ^CIS cis2))                                                                   (not (nil? (.cont cis1))))) false                                                        (if (nil? (.cont cis1)) true                                                          (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))          (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt                                                  (recur ((.cont cis)) (inc cnt)))))        clojure.lang.Seqable          (seq [this] (if (and (nil? v) (nil? cont)) nil this))        clojure.lang.Sequential        Object          (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))		  (letfn [(next-prm [lowi i pgseq]		            (let [lowi (long lowi),                      i (long i),                      ^ints pg (first pgseq),		                  pgsz (long (alength pg)),		                  pgbts (long (bit-shift-left pgsz 5)),		                  ni (long (loop [j (long i)]		                             (if (or (>= j pgbts)		                                     (zero? (bit-and (aget pg (bit-shift-right j 5))		                                               (bit-shift-left 1 (bit-and j 31)))))		                               j		                               (recur (inc j)))))]		              (if (>= ni pgbts)		                (recur (+ lowi pgbts) 0 (next pgseq))		                (->CIS (+ (bit-shift-left (+ lowi ni) 1) 3)		                       (fn [] (next-prm lowi (inc ni) pgseq))))))]		    (->CIS 2 (fn [] (next-prm 0 0 (primes-pages)))))))(defn primes-paged-count-to  "counts primes generated by page segments by Sieve of Eratosthenes to the top limit"  [top]  (cond (< top 2) 0        (< top 3) 1        :else (letfn [(nxt-pg [lowi pgseq cnt]                        (let [topi (bit-shift-right (- top 3) 1)                              nxti (+ lowi PGBTS),                              pg (first pgseq)]                          (if (> nxti topi)                            (+ cnt (count-pg (- topi lowi) pg))                            (recur nxti                                   (next pgseq)                                   (+ cnt (count-pg PGBTS pg))))))]                (nxt-pg 0 (primes-pages) 1))))`

The above code runs just as fast as other virtual machine languages when run on a 64-bit JVM; however, when run on a 32-bit JVM it runs almost five times slower. This is likely due to Clojure only using 64-bit integers for integer operations and these operations getting JIT compiled to use library functions to simulate those operations using combined 32-bit operations under a 32-bit JVM whereas direct CPU operations can be used on a 64-bit JVM

Clojure does one thing very slowly, just as here: it enumerates extremely slowly as compared to using a more imperative iteration interface; it helps to use a roll-your-own ISeq interface as here, where enumeration of the primes reduces the time from about four times as long as the composite culling operations for those primes to only about one and a half times as long, although one must also write their own sequence handling functions (can't use "take-while" or "count", for instance) in order to enjoy that benefit. That is why the "primes-paged-count-to" function is provided so it takes a negligible percentage of the time to count the primes over a range as compared to the time for the composite culling operations.

The practical range of the above sieve is about 16 million due to the fixed size of the page buffers; in order to extend the range, a larger page buffer could be used up to the size of the CPU L2 or L3 caches. If a 2^20 buffer were used (one Megabyte, as many modern dexktop CPU's easily have in their L3 cache), then the range would be increased up to about 10^14 at a cost of about a factor of two or three in slower memory accesses per composite culling operation loop. The base primes culling page size is already adequate for this range. One could make the culling page size automatically expand with growing range by about the square root of the current prime range with not too many changes to the code.

As for many implementations of unbounded sieves, the base primes less than the square root of the current range are generated by a secondary generated stream of primes; in this case it is done recursively, so another secondary stream generates the base primes for the base primes and so on down to where the innermost generator has only one page in the stream; this only takes one or two recursions for this type of range.

The base primes culling page size is reduced from the page size for the main primes so that there is less overhead for smaller primes ranges; otherwise excess base primes are generated for fairly small sieve ranges.

## CMake

`function(eratosthenes var limit)  # Check for integer overflow. With CMake using 32-bit signed integer,  # this check fails when limit > 46340.  if(NOT limit EQUAL 0)         # Avoid division by zero.    math(EXPR i "(\${limit} * \${limit}) / \${limit}")    if(NOT limit EQUAL \${i})      message(FATAL_ERROR "limit is too large, would cause integer overflow")    endif()  endif()   # Use local variables prime_2, prime_3, ..., prime_\${limit} as array.  # Initialize array to y => yes it is prime.  foreach(i RANGE 2 \${limit})    set(prime_\${i} y)  endforeach(i)   # Gather a list of prime numbers.  set(list)  foreach(i RANGE 2 \${limit})    if(prime_\${i})      # Append this prime to list.      list(APPEND list \${i})       # For each multiple of i, set n => no it is not prime.      # Optimization: start at i squared.      math(EXPR square "\${i} * \${i}")      if(NOT square GREATER \${limit})   # Avoid fatal error.        foreach(m RANGE \${square} \${limit} \${i})          set(prime_\${m} n)        endforeach(m)      endif()    endif(prime_\${i})  endforeach(i)  set(\${var} \${list} PARENT_SCOPE)endfunction(eratosthenes)`
```# Print all prime numbers through 100.
eratosthenes(primes 100)
message(STATUS "\${primes}")
```

## COBOL

`*> Please ignore the asterisks in the first column of the next comments,*> which are kludges to get syntax highlighting to work.       IDENTIFICATION DIVISION.       PROGRAM-ID. Sieve-Of-Eratosthenes.        DATA DIVISION.       WORKING-STORAGE SECTION.        01  Max-Number       USAGE UNSIGNED-INT.       01  Max-Prime        USAGE UNSIGNED-INT.        01  Num-Group.           03  Num-Table PIC X VALUE "P"                   OCCURS 1 TO 10000000 TIMES DEPENDING ON Max-Number                   INDEXED BY Num-Index.               88  Is-Prime VALUE "P" FALSE "N".        01  Current-Prime    USAGE UNSIGNED-INT.        01  I                USAGE UNSIGNED-INT.        PROCEDURE DIVISION.           DISPLAY "Enter the limit: " WITH NO ADVANCING           ACCEPT Max-Number           DIVIDE Max-Number BY 2 GIVING Max-Prime *          *> Set Is-Prime of all non-prime numbers to false.           SET Is-Prime (1) TO FALSE           PERFORM UNTIL Max-Prime < Current-Prime*              *> Set current-prime to next prime.               ADD 1 TO Current-Prime               PERFORM VARYING Num-Index FROM Current-Prime BY 1                   UNTIL Is-Prime (Num-Index)               END-PERFORM               MOVE Num-Index TO Current-Prime *              *> Set Is-Prime of all multiples of current-prime to*              *> false, starting from current-prime sqaured.               COMPUTE Num-Index = Current-Prime ** 2               PERFORM UNTIL Max-Number < Num-Index                   SET Is-Prime (Num-Index) TO FALSE                   SET Num-Index UP BY Current-Prime               END-PERFORM           END-PERFORM *          *> Display the prime numbers.           PERFORM VARYING Num-Index FROM 1 BY 1                   UNTIL Max-Number < Num-Index               IF Is-Prime (Num-Index)                   DISPLAY Num-Index               END-IF           END-PERFORM            GOBACK           .`

## Comal

Translation of: BASIC
`// Sieve of Eratosthenesinput "Limit? ": limitdim sieve(1:limit)sqrlimit:=sqr(limit)sieve(1):=1p:=2while p<=sqrlimit do while sieve(p) and p<sqrlimit do  p:=p+1 endwhile if p>sqrlimit then goto done for i:=p*p to limit step p do  sieve(i):=1 endfor i p:=p+1endwhiledone:print 2,for i:=3 to limit do if sieve(i)=0 then  print ", ",i, endifendfor iprint`
Output:
```Limit? 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97

end```

## Common Lisp

`(defun sieve-of-eratosthenes (maximum)  (loop     with sieve = (make-array (1+ maximum)                              :element-type 'bit                              :initial-element 0)     for candidate from 2 to maximum     when (zerop (bit sieve candidate))     collect candidate     and do (loop for composite from (expt candidate 2)                to maximum by candidate               do (setf (bit sieve composite) 1))))`

Working with odds only (above twice speedup), and marking composites only for primes up to the square root of the maximum:

`(defun sieve-odds (maximum)  "Prime numbers sieve for odd numbers.    Returns a list with all the primes that are less than or equal to maximum."  (loop :with maxi = (ash (1- maximum) -1)        :with stop = (ash (isqrt maximum) -1)        :with sieve = (make-array (1+ maxi) :element-type 'bit :initial-element 0)        :for i :from 1 :to maxi        :for odd-number = (1+ (ash i 1))        :when (zerop (sbit sieve i))          :collect odd-number :into values        :when (<= i stop)          :do (loop :for j :from (* i (1+ i) 2) :to maxi :by odd-number                    :do (setf (sbit sieve j) 1))        :finally (return (cons 2 values))))`

The indexation scheme used here interprets each index `i` as standing for the value `2i+1`. Bit `0` is unused, a small price to pay for the simpler index calculations compared with the `2i+3` indexation scheme. The multiples of a given odd prime `p` are enumerated in increments of `2p`, which corresponds to the index increment of `p` on the sieve array. The starting point `p*p = (2i+1)(2i+1) = 4i(i+1)+1` corresponds to the index `2i(i+1)`.

While formally a wheel, odds are uniformly spaced and do not require any special processing except for value translation. Wheels proper aren't uniformly spaced and are thus trickier.

## Cowgol

`include "cowgol.coh"; # To change the maximum prime, change the size of this array# Everything else is automatically filled in at compile timevar sieve: uint8; # Make sure all elements of the sieve are set to zeroMemZero(&sieve as [uint8], @bytesof sieve); # Generate the sievevar prime: @indexof sieve := 2;while prime < @sizeof sieve loop    if sieve[prime] == 0 then        var comp: @indexof sieve := prime * prime;        while comp < @sizeof sieve loop            sieve[comp] := 1;            comp := comp + prime;        end loop;    end if;    prime := prime + 1;end loop; # Print all primesvar cand: @indexof sieve := 2;while cand < @sizeof sieve loop    if sieve[cand] == 0 then        print_i16(cand as uint16);        print_nl();    end if;    cand := cand + 1;end loop;`
Output:
```2
3
5
7
11
...
4967
4969
4973
4987
4999```

## Crystal

### Basic Version

This implementation uses a `BitArray` so it is automatically bit-packed to use just one bit per number representation:

`# compile with `--release --no-debug` for speed... require "bit_array" alias Prime = UInt64 class SoE  include Iterator(Prime)  @bits : BitArray; @bitndx : Int32 = 2   def initialize(range : Prime)    if range < 2      @bits = BitArray.new 0    else      @bits = BitArray.new((range + 1).to_i32)    end    ba = @bits; ndx = 2    while true      wi = ndx * ndx      break if wi >= ba.size      if ba[ndx]        ndx += 1; next      end      while wi < ba.size        ba[wi] = true; wi += ndx      end      ndx += 1    end  end   def next    while @bitndx < @bits.size      if @bits[@bitndx]        @bitndx += 1; next      end      rslt = @bitndx.to_u64; @bitndx += 1; return rslt    end    stop  endend print "Primes up to a hundred:  "SoE.new(100).each { |p| print " ", p }; putsprint "Number of primes to a million:  "puts SoE.new(1_000_000).each.sizeprint "Number of primes to a billion:  "start_time = Time.monotonicprint SoE.new(1_000_000_000).each.sizeelpsd = (Time.monotonic - start_time).total_millisecondsputs " in #{elpsd} milliseconds."`
Output:
```Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 10219.222539 milliseconds.```

This is as run on an Intel SkyLake i5-6500 at 3.6 GHz (automatic boost for single threaded as here).

### Odds-Only Version

the non-odds-only version as per the above should never be used because in not using odds-only, it uses twice the memory and over two and a half times the CPU operations as the following odds-only code, which is very little more complex:

`# compile with `--release --no-debug` for speed... require "bit_array" alias Prime = UInt64 class SoE_Odds  include Iterator(Prime)  @bits : BitArray; @bitndx : Int32 = -1   def initialize(range : Prime)    if range < 3      @bits = BitArray.new 0    else      @bits = BitArray.new(((range - 1) >> 1).to_i32)    end    ba = @bits; ndx = 0    while true      wi = (ndx + ndx) * (ndx + 3) + 3 # start cull index calculation      break if wi >= ba.size      if ba[ndx]        ndx += 1; next      end      bp = ndx + ndx + 3      while wi < ba.size        ba[wi] = true; wi += bp      end      ndx += 1    end  end   def next    while @bitndx < @bits.size      if @bitndx < 0        @bitndx += 1; return 2_u64      elsif @bits[@bitndx]        @bitndx += 1; next      end      rslt = (@bitndx + @bitndx + 3).to_u64; @bitndx += 1; return rslt    end    stop  endend print "Primes up to a hundred:  "SoE_Odds.new(100).each { |p| print " ", p }; putsprint "Number of primes to a million:  "puts SoE_Odds.new(1_000_000).each.sizeprint "Number of primes to a billion:  "start_time = Time.monotonicprint SoE_Odds.new(1_000_000_000).each.sizeelpsd = (Time.monotonic - start_time).total_millisecondsputs " in #{elpsd} milliseconds."`
Output:
```Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 4877.829642 milliseconds.```

As can be seen, this is over two times faster than the non-odds-only version when run on the same CPU due to reduced pressure on the CPU data cache; however it is only reasonably performant for ranges of a few millions, and above that a page-segmented version of odds-only (or further wheel factorization) should be used plus other techniques for a further reduction of number of CPU clock cycles per culling/marking operation.

### Page-Segmented Odds-Only Version

For sieving of ranges larger than a few million efficiently, a page-segmented sieve should always be used to preserve CPU cache associativity by making the page size to be about that of the CPU L1 data cache. The following code implements a page-segmented version that is an extensible sieve (no upper limit needs be specified) using a secondary memoized feed of base prime value arrays which use a smaller page-segment size for efficiency. When the count of the number of primes is desired, the sieve is polymorphic in output and counts the unmarked composite bits by using fast `popcount` instructions taken 64-bits at a time. The code is as follows:

`# compile with `--release --no-debug` for speed... alias Prime = UInt64alias PrimeNdx = Int64alias PrimeArr = Array(Prime)alias SieveBuffer = Pointer(UInt8)alias BasePrime = UInt32alias BasePrimeArr = Array(BasePrime) CPUL1CACHE = 131072 # 16 Kilobytes in nimber of bits BITMASK = Pointer(UInt8).malloc(8) { |i| 1_u8 << i } # Count number of non-composite (zero) bits within index range...# sieve buffer is always evenly divisible by 64-bit words...private def count_page_to(ndx : Int32, sb : SieveBuffer)  lstwrdndx = ndx >> 6; mask = (~1_u64) << (ndx & 63)  cnt = lstwrdndx * 64 + 64; sbw = sb.as(Pointer(UInt64))  lstwrdndx.times { |i| cnt -= sbw[i].popcount }  cnt - (sbw[lstwrdndx] | mask).popcountend # Cull composite bits from sieve buffer using base prime arrays;# starting at overall given prime index for given buffer bit size...private def cull_page(pndx : PrimeNdx, bitsz : Int32,              bps : Iterator(BasePrimeArr), sb : SieveBuffer)  bps.each { |bpa|    bpa.each { |bpu32|      bp = bpu32.to_i64; bpndx = (bp - 3) >> 1      swi = (bpndx + bpndx) * (bpndx + 3) + 3 # calculate start prime index      return if swi >= pndx + bitsz.to_i64      bpi = bp.to_i32 # calculate buffer start culling index...      bi = (swi >= pndx) ? (swi - pndx).to_i32 : begin        r = (pndx - swi) % bp; r == 0 ? 0 : bpi - r.to_i32      end      # when base prime is small enough, cull using strided loops to      # simplify the inner loops at the cost of more loop overhead...      # allmost all of the work is done by the following loop...      if bpi < (bitsz >> 4)        bilmt = bi + (bpi << 3); cplmt = sb + (bitsz >> 3)        bilmt = CPUL1CACHE if bilmt > CPUL1CACHE        while bi < bilmt          cp = sb + (bi >> 3); msk = BITMASK[bi & 7]          while cp < cplmt # use pointer to save loop overhead            cp |= msk; cp += bpi          end          bi += bpi        end      else        while bi < bitsz # bitsz          sb[bi >> 3] |= BITMASK[bi & 7]; bi += bpi        end      end } }end # Iterator over processed prime pages, polymorphic by the converter function...private class PagedResults(T)  @bpas : BasePrimeArrays  @cmpsts : SieveBuffer   def initialize(@prmndx : PrimeNdx,                 @cmpstsbitsz : Int32,                 @cnvrtrfnc : (Int64, Int32, SieveBuffer) -> T)    @bpas = BasePrimeArrays.new    @cmpsts = SieveBuffer.malloc(((@cmpstsbitsz + 63) >> 3) & (-8))  end   private def dopage    (@prmndx..).step(@cmpstsbitsz.to_i64).map { |pn|        @cmpsts.clear(@cmpstsbitsz >> 3)        cull_page(pn, @cmpstsbitsz, @bpas.each, @cmpsts)        @cnvrtrfnc.call(pn, @cmpstsbitsz, @cmpsts) }  end   def each    dopage  end   def each(& : T -> _) : Nil    itr = dopage    while true      value = itr.next      break if value.is_a?(Iterator::Stop)      yield value    end  endend # Secondary memoized chain of BasePrime arrays (by small page size),# which is actually a iterable lazy list (memoized) of BasePrimeArr;# Crystal has closures, so it is easy to implement a LazyList class# which memoizes the results of the thunk so it is only executed once...private class BasePrimeArrays  @baseprmarr : BasePrimeArr # head of lezy list  @tail : BasePrimeArrays? = nil # tail starts as non-existing   def initialize # special case for first page of base primes    # converter of sieve buffer to base primes array...    sb2bparrprc = -> (pn : PrimeNdx, bl : Int32, sb : SieveBuffer) {      cnt = count_page_to(bl - 1, sb)      bparr = BasePrimeArr.new(cnt, 0); j = 0      bsprm = (pn + pn + 3).to_u32      bl.times.each { |i|        next if (sb[i >> 3] & BITMASK[i & 7]) != 0         bparr[j] = bsprm + (i + i).to_u32; j += 1 }      bparr }     cmpsts = SieveBuffer.malloc 128 # fake bparr for first iter...    frstbparr = sb2bparrprc.call(0_i64, 1024, cmpsts)    cull_page(0_i64, 1024, Iterator.of(frstbparr).each, cmpsts)    @baseprmarr = sb2bparrprc.call(0_i64, 1024, cmpsts)     # initialization of pages after the first is deferred to avoid data race...    initbpas = -> { PagedResults.new(1024_i64, 1024, sb2bparrprc).each }    # recursive LazyList generator function...    nxtbpa = uninitialized Proc(Iterator(BasePrimeArr), BasePrimeArrays)    nxtbpa = -> (bppgs : Iterator(BasePrimeArr)) {      nbparr = bppgs.next      abort "Unexpectedbase primes end!!!" if nbparr.is_a?(Iterator::Stop)      BasePrimeArrays.new(nbparr, ->{ nxtbpa.call(bppgs) }) }    @thunk = ->{ nxtbpa.call(initbpas.call) }  end  def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(BasePrimeArrays))  end  def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(Nil))  end  def initialize(@baseprmarr : BasePrimeArr, @thunk : Nil)  end   def tail # not thread safe without a lock/mutex...    if thnk = @thunk      @tail = thnk.call; @thunk = nil    end    @tail  end   private class BasePrimeArrIter # iterator over BasePrime arrays...    include Iterator(BasePrimeArr)    @dbparrs : Proc(BasePrimeArrays?)     def initialize(fromll : BasePrimeArrays)      @dbparrs = ->{ fromll.as(BasePrimeArrays?) }    end     def next      if bpas = @dbparrs.call        rslt = [email protected]; @dbparrs = -> { bpas.tail }; rslt      else        abort "Unexpected end of base primes array iteration!!!"      end    end  end   def each    BasePrimeArrIter.new(self)  endend # An "infinite" extensible iteration of primes,...def primes  sb2prms = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {    cnt = count_page_to(bitsz - 1, sb)    prmarr = PrimeArr.new(cnt, 0); j = 0    bsprm = (pn + pn + 3).to_u64    bitsz.times.each { |i|      next if (sb[i >> 3] & BITMASK[i & 7]) != 0      prmarr[j] = bsprm + (i + i).to_u64; j += 1 }    prmarr  }  (2_u64..2_u64).each    .chain PagedResults.new(0, CPUL1CACHE, sb2prms).each.flat_map { |prmspg| prmspg.each }end # Counts number of primes to given limit...def primes_count_to(lmt : Prime)  if lmt < 3    lmt < 2 ? return 0 : return 1  end  lmtndx = ((lmt - 3) >> 1).to_i64  sb2cnt = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {    pglmt = pn + bitsz.to_i64 - 1    if (pn + CPUL1CACHE.to_i64) > lmtndx      Tuple.new(count_page_to((lmtndx - pn).to_i32, sb).to_i64, pglmt)    else      Tuple.new(count_page_to(bitsz - 1, sb).to_i64, pglmt)    end  }  count = 1  PagedResults.new(0, CPUL1CACHE, sb2cnt).each { |(cnt, lmt)|    count += cnt; break if lmt >= lmtndx }  countend print "The primes up to 100 are: "primes.each.take_while { |p| p <= 100_u64 }.each { |p| print " ", p }print ".\r\nThe Number of primes up to a million is "print primes.each.take_while { |p| p <= 1_000_000_u64 }.sizeprint ".\r\nThe number of primes up to a billion is "start_time = Time.monotonic# answr = primes.each.take_while { |p| p <= 1_000_000_000_u64 }.size # slow wayanswr = primes_count_to(1_000_000_000) # fast wayelpsd = (Time.monotonic - start_time).total_millisecondsprint "#{answr} in #{elpsd} milliseconds.\r\n"`
Output:
```The primes up to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97.
The Number of primes up to a million is 78498.
The number of primes up to a billion is 50847534 in 658.466028 milliseconds.```

When run on the same machine as the previous version, the code is about seven and a half times as fast as even the above Odds-Only version at about 2.4 CPU clock cycles per culling operation rather than over 17, partly due to better cache associativity (about half the gain) but also due to tuning the inner culling loop for small base prime values to operate by byte pointer strides with a constant mask value to simplify the code generated for these inner loops; as there is some overhead in the eight outer loops that set this up, this technique is only applicable for smaller base primes.

Further gains are possible by using maximum wheel factorization rather than just factorization for odd base primes which can reduce the number of operations by a factor of about four and the number of CPU clock cycles per culling operation can be reduced by an average of a further about 25 percent for sieving to a billion by using extreme loop unrolling techniques for both the dense and sparse culling cases. As well, multi-threading by pages can reduce the wall clock time by a factor of the number of effective cores (non Hyper-Threaded cores).

## D

### Simpler Version

Prints all numbers less than the limit.
`import std.stdio, std.algorithm, std.range, std.functional; uint[] sieve(in uint limit) nothrow @safe {    if (limit < 2)        return [];    auto composite = new bool[limit];     foreach (immutable uint n; 2 .. cast(uint)(limit ^^ 0.5) + 1)        if (!composite[n])            for (uint k = n * n; k < limit; k += n)                composite[k] = true;     //return iota(2, limit).filter!(not!composite).array;    return iota(2, limit).filter!(i => !composite[i]).array;} void main() {    50.sieve.writeln;}`
Output:
`[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]`

### Faster Version

This version uses an array of bits (instead of booleans, that are represented with one byte), and skips even numbers. The output is the same.

`import std.stdio, std.math, std.array; size_t[] sieve(in size_t m) pure nothrow @safe {    if (m < 3)        return null;    immutable size_t n = m - 1;    enum size_t bpc = size_t.sizeof * 8;    auto F = new size_t[((n + 2) / 2) / bpc + 1];    F[] = size_t.max;     size_t isSet(in size_t i) nothrow @safe @nogc {        immutable size_t offset = i / bpc;        immutable size_t mask = 1 << (i % bpc);        return F[offset] & mask;    }     void resetBit(in size_t i) nothrow @safe @nogc {        immutable size_t offset = i / bpc;        immutable size_t mask = 1 << (i % bpc);        if ((F[offset] & mask) != 0)            F[offset] = F[offset] ^ mask;    }     for (size_t i = 3; i <= sqrt(real(n)); i += 2)        if (isSet((i - 3) / 2))            for (size_t j = i * i; j <= n; j += 2 * i)                resetBit((j - 3) / 2);     Appender!(size_t[]) result;    result ~= 2;    for (size_t i = 3; i <= n; i += 2)        if (isSet((i - 3) / 2))            result ~= i;    return result.data;} void main() {    50.sieve.writeln;}`

### Extensible Version

(This version is used in the task Extensible prime generator.)

`/// Extensible Sieve of Eratosthenes.struct Prime {    uint[] a = ;     private void grow() pure nothrow @safe {        immutable p0 = a[\$ - 1] + 1;        auto b = new bool[p0];         foreach (immutable di; a) {            immutable uint i0 = p0 / di * di;            uint i = (i0 < p0) ? i0 + di - p0 : i0 - p0;            for (; i < b.length; i += di)                b[i] = true;        }         foreach (immutable uint i, immutable bi; b)            if (!b[i])                a ~= p0 + i;    }     uint opCall(in uint n) pure nothrow @safe {        while (n >= a.length)            grow;        return a[n];    }} version (sieve_of_eratosthenes3_main) {    void main() {        import std.stdio, std.range, std.algorithm;         Prime prime;        uint.max.iota.map!prime.until!q{a > 50}.writeln;    }}`

To see the output (that is the same), compile with `-version=sieve_of_eratosthenes3_main`.

## Dart

`// helper function to pretty print an IterableString iterableToString(Iterable seq) {  String str = "[";  Iterator i = seq.iterator;  if (i.moveNext()) str += i.current.toString();  while(i.moveNext()) {    str += ", " + i.current.toString();  }  return str + "]";} main() {  int limit = 1000;  int strt = new DateTime.now().millisecondsSinceEpoch;  Set<int> sieve = new Set<int>();   for(int i = 2; i <= limit; i++) {    sieve.add(i);  }  for(int i = 2; i * i <= limit; i++) {   if(sieve.contains(i)) {     for(int j = i * i; j <= limit; j += i) {       sieve.remove(j);     }   }  }  var sortedValues = new List<int>.from(sieve);  int elpsd = new DateTime.now().millisecondsSinceEpoch - strt;  print("Found " + sieve.length.toString() + " primes up to " + limit.toString() +      " in " + elpsd.toString() + " milliseconds.");  print(iterableToString(sortedValues)); // expect sieve.length to be 168 up to 1000...//  Expect.equals(168, sieve.length);}`
Output:
```
Found 168 primes up to 1000 in 9 milliseconds.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

```

Although it has the characteristics of a true Sieve of Eratosthenes, the above code isn't very efficient due to the remove/modify operations on the Set. Due to these, the computational complexity isn't close to linear with increasing range and it is quite slow for larger sieve ranges compared to compiled languages, taking an average of about 22 thousand CPU clock cycles for each of the 664579 primes (about 4 seconds on a 3.6 Gigahertz CPU) just to sieve to ten million.

### faster bit-packed array odds-only solution

`import 'dart:typed_data';import 'dart:math'; Iterable<int> soeOdds(int limit) {  if (limit < 3) return limit < 2 ? Iterable.empty() : ;  int lmti = (limit - 3) >> 1;  int bfsz = (lmti >> 3) + 1;  int sqrtlmt = (sqrt(limit) - 3).floor() >> 1;  Uint32List cmpsts = Uint32List(bfsz);  for (int i = 0; i <= sqrtlmt; ++i)    if ((cmpsts[i >> 5] & (1 << (i & 31))) == 0) {      int p = i + i + 3;      for (int j = (p * p - 3) >> 1; j <= lmti; j += p)        cmpsts[j >> 5] |= 1 << (j & 31);    }  return    .followedBy(      Iterable.generate(lmti + 1)      .where((i) => cmpsts[i >> 5] & (1 << (i & 31)) == 0)      .map((i) => i + i + 3) );} void main() {  final int range = 100000000;  String s = "( ";  primesPaged().take(25).forEach((p)=>s += "\$p "); print(s + ")");  print("There are \${countPrimesTo(1000000)} primes to 1000000.");  final start = DateTime.now().millisecondsSinceEpoch;  final answer = soeOdds(range).length;  final elapsed = DateTime.now().millisecondsSinceEpoch - start;  print("There were \$answer primes found up to \$range.");  print("This test bench took \$elapsed milliseconds.");}`
Output:
```( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 4604 milliseconds.```

The above code is somewhat faster at about 1.5 thousand CPU cycles per prime here run on a 1.92 Gigahertz low end Intel x5-Z8350 CPU or about 2.5 seconds on a 3.6 Gigahertz CPU using the Dart VM to sieve to 100 million.

### Unbounded infinite iterators/generators of primes

Infinite generator using a (hash) Map (sieves odds-only)

The following code will have about O(n log (log n)) performance due to a hash table having O(1) average performance and is only somewhat slow due to the constant overhead of processing hashes:

`Iterable<int> primesMap() {    Iterable<int> oddprms() sync* {      yield(3); yield(5); // need at least 2 for initialization      final Map<int, int> bpmap = {9: 6};      final Iterator<int> bps = oddprms().iterator;      bps.moveNext(); bps.moveNext(); // skip past 3 to 5      int bp = bps.current;      int n = bp;      int q = bp * bp;      while (true) {        n += 2;        while (n >= q || bpmap.containsKey(n)) {          if (n >= q) {            final int inc = bp << 1;            bpmap[bp * bp + inc] = inc;            bps.moveNext(); bp = bps.current; q = bp * bp;          } else {            final int inc = bpmap.remove(n);            int next = n + inc;            while (bpmap.containsKey(next)) {              next += inc;            }            bpmap[next] = inc;          }          n += 2;        }        yield(n);      }    }    return .followedBy(oddprms());} void main() {  final int range = 100000000;  String s = "( ";  primesMap().take(25).forEach((p)=>s += "\$p "); print(s + ")");  print("There are \${primesMap().takeWhile((p)=>p<=1000000).length} preimes to 1000000.");  final start = DateTime.now().millisecondsSinceEpoch;  final answer = primesMap().takeWhile((p)=>p<=range).length;  final elapsed = DateTime.now().millisecondsSinceEpoch - start;  print("There were \$answer primes found up to \$range.");  print("This test bench took \$elapsed milliseconds.");}`
Output:
```( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 preimes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 16086 milliseconds.```

This takes about 5300 CPU clock cycles per prime or about 8.4 seconds if run on a 3.6 Gigahertz CPU, which is slower than the above fixed bit-packed array version but has the advantage that it runs indefinitely, (at least on 64-bit machines; on 32 bit machines it can only be run up to the 32-bit number range, or just about a factor of 20 above as above).

Due to the constant execution overhead this is only reasonably useful for ranges up to tens of millions anyway.

Fast page segmented array infinite generator (sieves odds-only)

The following code also theoretically has a O(n log (log n)) execution speed performance and the same limited use on 32-bit execution platformas, but won't realize the theoretical execution complexity for larger primes due to the cache size increasing in size beyond its limits; but as the CPU L2 cache size that it automatically grows to use isn't any slower than the basic culling loop speed, it won't slow down much above that limit up to ranges of about 2.56e14, which will take in the order of weeks:

Translation of: Kotlin
`import 'dart:typed_data';import 'dart:math';import 'dart:collection'; // a lazy listtypedef _LazyList _Thunk();class _LazyList<T> {  final T head;  _Thunk thunk;  _LazyList<T> _rest;  _LazyList(T this.head, _Thunk this.thunk);  _LazyList<T> get rest {    if (this.thunk != null) {      this._rest = this.thunk();      this.thunk = null;    }    return this._rest;  }} class _LazyListIterable<T> extends IterableBase<T> {  _LazyList<T> _first;  _LazyListIterable(_LazyList<T> this._first);  @override Iterator<T> get iterator {    Iterable<T> inner() sync* {      _LazyList<T> current = this._first;      while (true) {        yield(current.head);        current = current.rest;      }    }    return inner().iterator;  }} // zero bit population count Look Up Table for 16-bit range...final Uint8List CLUT =  Uint8List.fromList(    Iterable.generate(65536)    .map((i) {      final int v0 = ~i & 0xFFFF;      final int v1 = v0 - ((v0 & 0xAAAA) >> 1);      final int v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2);      return (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) * 0x0101)) >> 8) & 31;    })    .toList()); int _countComposites(Uint8List cmpsts) {  Uint16List buf = Uint16List.view(cmpsts.buffer);  int lmt = buf.length;  int count = 0;  for (var i = 0; i < lmt; ++i) {    count += CLUT[buf[i]];  }  return count;}  // converts an entire sieved array of bytes into an array of UInt32 primes,// to be used as a source of base primes...Uint32List _composites2BasePrimeArray(int low, Uint8List cmpsts) {  final int lmti = cmpsts.length << 3;  final int len = _countComposites(cmpsts);  final Uint32List rslt = Uint32List(len);  int j = 0;  for (int i = 0; i < lmti; ++i) {    if (cmpsts[i >> 3] & 1 << (i & 7) == 0) {        rslt[j++] = low + i + i;    }  }  return rslt;} // do sieving work based on low starting value for the given buffer and// the given lazy list of base prime arrays...void _sieveComposites(int low, Uint8List buffer, Iterable<Uint32List> bpas) {  final int lowi = (low - 3) >> 1;  final int len = buffer.length;  final int lmti = len << 3;  final int nxti = lowi + lmti;  for (var bpa in bpas) {    for (var bp in bpa) {      final int bpi = (bp - 3) >> 1;      int strti = ((bpi * (bpi + 3)) << 1) + 3;      if (strti >= nxti) return;      if (strti >= lowi) strti = strti - lowi;      else {        strti = (lowi - strti) % bp;        if (strti != 0) strti = bp - strti;      }      if (bp <= len >> 3 && strti <= lmti - bp << 6) {        final int slmti = min(lmti, strti + bp << 3);        for (var s = strti; s < slmti; s += bp) {          final int msk = 1 << (s & 7);          for (var c = s >> 3; c < len; c += bp) {              buffer[c] |= msk;          }        }      }      else {        for (var c = strti; c < lmti; c += bp) {            buffer[c >> 3] |= 1 << (c & 7);        }      }    }  } } // starts the secondary base primes feed with minimum size in bits set to 4K...// thus, for the first buffer primes up to 8293,// the seeded primes easily cover it as 97 squared is 9409...Iterable<Uint32List> _makeBasePrimeArrays() {  var cmpsts = Uint8List(512);  _LazyList<Uint32List> _nextelem(int low, Iterable<Uint32List> bpas) {    // calculate size so that the bit span is at least as big as the    // maximum culling prime required, rounded up to minsizebits blocks...    final int rqdsz = 2 + sqrt((1 + low).toDouble()).toInt();    final sz = (((rqdsz >> 12) + 1) << 9); // size in bytes    if (sz > cmpsts.length) cmpsts = Uint8List(sz);    cmpsts.fillRange(0, cmpsts.length, 0);    _sieveComposites(low, cmpsts, bpas);    final arr = _composites2BasePrimeArray(low, cmpsts);    final nxt = low + (cmpsts.length << 4);    return _LazyList(arr, () => _nextelem(nxt, bpas));  }  // pre-seeding breaks recursive race,  // as only known base primes used for first page...  final preseedarr = Uint32List.fromList( [ // pre-seed to 100, can sieve to 10,000...    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41    , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ] );  return _LazyListIterable(           _LazyList(preseedarr,           () => _nextelem(101, _makeBasePrimeArrays()))         );} // an iterable sequence over successive sieved buffer composite arrays,// returning a tuple of the value represented by the lowest possible prime// in the sieved composites array and the array itself;// the array has a 16 Kilobytes minimum size (CPU L1 cache), but// will grow so that the bit span is larger than the// maximum culling base prime required, possibly making it larger than// the L1 cache for large ranges, but still reasonably efficient using// the L2 cache: very efficient up to about 16e9 range;// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...Iterable<List> _makeSievePages() sync*  {  final bpas = _makeBasePrimeArrays(); // secondary source of base prime arrays  int low = 3;  Uint8List cmpsts = Uint8List(16384);  _sieveComposites(3, cmpsts, bpas);  while (true) {    yield([low, cmpsts]);    final rqdsz = 2 + sqrt((1 + low).toDouble()).toInt(); // problem with sqrt not exact past about 10^12!!!!!!!!!    final sz = ((rqdsz >> 17) + 1) << 14; // size iin bytes    if (sz > cmpsts.length) cmpsts = Uint8List(sz);    cmpsts.fillRange(0, cmpsts.length, 0);    low += cmpsts.length << 4;    _sieveComposites(low, cmpsts, bpas);  }} int countPrimesTo(int range) {  if (range < 3) { if (range < 2) return 0; else return 1; }  var count = 1;  for (var sp in _makeSievePages()) {    int low = sp; Uint8List cmpsts = sp;    if ((low + (cmpsts.length << 4)) > range) {      int lsti = (range - low) >> 1;      var lstw = (lsti >> 4); var lstb = lstw << 1;      var msk = (-2 << (lsti & 15)) & 0xFFFF;      var buf = Uint16List.view(cmpsts.buffer, 0, lstw);      for (var i = 0; i < lstw; ++i)        count += CLUT[buf[i]];      count += CLUT[(cmpsts[lstb + 1] << 8) | cmpsts[lstb] | msk];      break;    } else {      count += _countComposites(cmpsts);    }  }  return count;} // sequence over primes from above page iterator;// unless doing something special with individual primes, usually unnecessary;// better to do manipulations based on the composites bit arrays...// takes at least as long to enumerate the primes as sieve them...Iterable<int> primesPaged() sync* {  yield(2);  for (var sp in _makeSievePages()) {    int low = sp; Uint8List cmpsts = sp;    var szbts = cmpsts.length << 3;    for (var i = 0; i < szbts; ++i) {        if (cmpsts[i >> 3].toInt() & (1 << (i & 7)) != 0) continue;        yield(low + i + i);    }  }} void main() {  final int range = 1000000000;  String s = "( ";  primesPaged().take(25).forEach((p)=>s += "\$p "); print(s + ")");  print("There are \${countPrimesTo(1000000)} primes to 1000000.");  final start = DateTime.now().millisecondsSinceEpoch;  final answer = countPrimesTo(range); // fast way//  final answer = primesPaged().takeWhile((p)=>p<=range).length; // slow way using enumeration  final elapsed = DateTime.now().millisecondsSinceEpoch - start;  print("There were \$answer primes found up to \$range.");  print("This test bench took \$elapsed milliseconds.");}`
Output:
```( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 50847534 primes found up to 1000000000.
This test bench took 9385 milliseconds.```

This version counts the primes up to one billion in about five seconds at 3.6 Gigahertz (a low end 1.92 Gigahertz CPU used here) or about 350 CPU clock cycles per prime under the Dart Virtual Machine (VM).

Note that it takes about four times as long to do this using the provided primes generator/enumerator as noted in the code, which is normal for all languages that it takes longer to actually enumerate the primes than it does to sieve in culling the composite numbers, but Dart is somewhat slower than most for this.

The algorithm can be sped up by a factor of four by extreme wheel factorization and (likely) about a factor of the effective number of CPU cores by using multi-processing isolates, but there isn't much point if one is to use the prime generator for output. For most purposes, it is better to use custom functions that directly manipulate the culled bit-packed page segments as `countPrimesTo` does here.

## Delphi

`program erathostenes; {\$APPTYPE CONSOLE} type  TSieve = class  private    fPrimes: TArray<boolean>;    procedure InitArray;    procedure Sieve;    function getNextPrime(aStart: integer): integer;    function getPrimeArray: TArray<integer>;  public    function getPrimes(aMax: integer): TArray<integer>;  end;   { TSieve } function TSieve.getNextPrime(aStart: integer): integer;begin  result := aStart;  while not fPrimes[result] do    inc(result);end; function TSieve.getPrimeArray: TArray<integer>;var  i, n: integer;begin  n := 0;  setlength(result, length(fPrimes)); // init array with maximum elements  for i := 2 to high(fPrimes) do  begin    if fPrimes[i] then    begin      result[n] := i;      inc(n);    end;  end;  setlength(result, n); // reduce array to actual elementsend; function TSieve.getPrimes(aMax: integer): TArray<integer>;begin  setlength(fPrimes, aMax);  InitArray;  Sieve;  result := getPrimeArray;end; procedure TSieve.InitArray;begin  for i := 2 to high(fPrimes) do    fPrimes[i] := true;end; procedure TSieve.Sieve;var  i, n, max: integer;begin  max := length(fPrimes);  i := 2;  while i < sqrt(max) do  begin    n := sqr(i);    while n < max do    begin      fPrimes[n] := false;      inc(n, i);    end;    i := getNextPrime(i + 1);  end;end; var  i: integer;  Sieve: TSieve; begin  Sieve := TSieve.Create;  for i in Sieve.getPrimes(100) do    write(i, ' ');  Sieve.Free;  readln;end.`

Output:

`2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 `

## DWScript

`function Primes(limit : Integer) : array of Integer;var   n, k : Integer;   sieve := new Boolean[limit+1];begin   for n := 2 to Round(Sqrt(limit)) do begin      if not sieve[n] then begin         for k := n*n to limit step n do            sieve[k] := True;      end;   end;    for k:=2 to limit do      if not sieve[k] then         Result.Add(k);end; var r := Primes(50);var i : Integer;for i:=0 to r.High do   PrintLn(r[i]);`

## Dylan

With outer to sqrt and inner to p^2 optimizations:

`define method primes(n)  let limit = floor(n ^ 0.5) + 1;  let sieve = make(limited(<simple-vector>, of: <boolean>), size: n + 1, fill: #t);  let last-prime = 2;   while (last-prime < limit)    for (x from last-prime ^ 2 to n by last-prime)      sieve[x] := #f;    end for;    block (found-prime)      for (n from last-prime + 1 below limit)        if (sieve[n] = #f)          last-prime := n;          found-prime()        end;      end;      last-prime := limit;    end block;  end while;   for (x from 2 to n)    if (sieve[x]) format-out("Prime: %d\n", x); end;  end;end;`

## E

E's standard library doesn't have a step-by-N numeric range, so we'll define one, implementing the standard iteration protocol.

```def rangeFromBelowBy(start, limit, step) {
return def stepper {
to iterate(f) {
var i := start
while (i < limit) {
f(null, i)
i += step
}
}
}
}
```

The sieve itself:

```def eratosthenes(limit :(int > 2), output) {
def composite := [].asSet().diverge()
for i ? (!composite.contains(i)) in 2..!limit {
output(i)
}
}
```

Example usage:

```? eratosthenes(12, println)
# stdout: 2
#         3
#         5
#         7
#         11
```

## EasyLang

`len prims[] 100max = sqrt len prims[]tst = 2while tst <= max  if prims[tst] = 0    i = tst * tst    while i < len prims[]      prims[i] = 1      i += tst    .  .  tst += 1.i = 2while i < len prims[]  if prims[i] = 0    print i  .  i += 1.`

## eC

 This example is incorrect. Please fix the code and remove this message.Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: this is not a Sieve of Eratosthenes; it is just trial division.

` public class FindPrime{   Array<int> primeList { [ 2 ], minAllocSize = 64 };   int index;    index = 3;    bool HasPrimeFactor(int x)   {      int max = (int)floor(sqrt((double)x));       for(i : primeList)      {         if(i > max) break;         if(x % i == 0) return true;      }      return false;   }    public int GetPrime(int x)   {      if(x > primeList.count - 1)      {         for (; primeList.count != x; index += 2)            if(!HasPrimeFactor(index))            {               if(primeList.count >= primeList.minAllocSize) primeList.minAllocSize *= 2;               primeList.Add(index);            }      }      return primeList[x-1];   }} class PrimeApp : Application{   FindPrime fp { };   void Main()   {      int num = argc > 1 ? atoi(argv) : 1;      PrintLn(fp.GetPrime(num));   }} `

## EchoLisp

### Sieve

`(require 'types) ;; bit-vector ;; converts sieve->list for integers in [nmin .. nmax[(define (s-range sieve nmin nmax (base 0))	(for/list ([ i (in-range nmin nmax)]) #:when (bit-vector-ref sieve i) (+ i base))) ;; next prime in sieve > p, or #f(define (s-next-prime sieve p ) ;; 		(bit-vector-scan-1 sieve (1+ p)))  ;; returns a bit-vector - sieve- all numbers in [0..n[(define (eratosthenes n)  (define primes (make-bit-vector-1 n ))  (bit-vector-set! primes 0 #f)  (bit-vector-set! primes 1 #f)  (for ([p (1+ (sqrt n))])  		 #:when (bit-vector-ref primes  p)         (for ([j (in-range (* p p) n p)])    (bit-vector-set! primes j #f)))   primes)  (define s-primes (eratosthenes 10_000_000)) (s-range s-primes 0 100)   → (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)(s-range s-primes 1_000_000 1_000_100)   → (1000003 1000033 1000037 1000039 1000081 1000099)(s-next-prime s-primes 9_000_000)   → 9000011`

### Segmented sieve

Allow to extend the basis sieve (n) up to n^2. Memory requirement is O(√n)

`;; ref :  http://research.cs.wisc.edu/techreports/1990/TR909.pdf;; delta multiple of  sqrt(n);; segment is [left .. left+delta-1]  (define (segmented sieve left delta  (p 2) (first 0))	(define segment (make-bit-vector-1 delta))	(define right (+ left (1- delta)))	 (define pmax (sqrt right))	  (while p	  #:break (> p pmax)	 (set! first (+ left (modulo (- p (modulo left p)) p )))  	(for   [(q (in-range first (1+ right) p))]	(bit-vector-set! segment (- q left) #f))	        (set! p (bit-vector-scan-1 sieve (1+ p))))	segment) (define (seg-range nmin delta)    (s-range (segmented s-primes nmin delta) 0 delta nmin))  (seg-range 10_000_000_000 1000) ;; 15 milli-sec     → (10000000019 10000000033 10000000061 10000000069 10000000097 10000000103 10000000121        10000000141 10000000147 10000000207 10000000259 10000000277 10000000279 10000000319        10000000343 10000000391 10000000403 10000000469 10000000501 10000000537 10000000583        10000000589 10000000597 10000000601 10000000631 10000000643 10000000649 10000000667        10000000679 10000000711 10000000723 10000000741 10000000753 10000000793 10000000799        10000000807 10000000877 10000000883 10000000889 10000000949 10000000963 10000000991        10000000993 10000000999) ;; 8 msec using the native (prime?) function(for/list ((p (in-range 1_000_000_000 1_000_001_000))) #:when (prime? p) p)`

### Wheel

A 2x3 wheel gives a 50% performance gain.

`;; 2x3 wheel(define (weratosthenes n)  (define primes (make-bit-vector n )) ;; everybody to #f (false)  (bit-vector-set! primes 2 #t)  (bit-vector-set! primes 3 #t)  (bit-vector-set! primes 5 #t)   (for ([i  (in-range 6 n 6) ]) ;; set candidate primes  		(bit-vector-set! primes (1+ i) #t)  		(bit-vector-set! primes (+ i 5) #t)  		)   (for ([p  (in-range 5 (1+ (sqrt n)) 2 ) ])  		 #:when (bit-vector-ref primes  p)         (for ([j (in-range (* p p) n p)])    (bit-vector-set! primes j #f)))   primes)`

## EDSAC order code

This sieve program is based on one by Eiiti Wada, which on 2020-07-05 could be found at https://www.dcs.warwick.ac.uk/~edsac/

The main external change is that the program is not designed to be viewed in the monitor; it just writes as many primes as possible within the limitations imposed by Rosetta Code. Apart from the addition of comments, internal changes include the elimination of one set of masks, and a revised method of switching from one mask to another.

On the EdsacPC simulator (see link above) the printout starts off very slowly, and gradually gets faster.

`  [Sieve of Eratosthenes] [EDSAC program. Initial Orders 2] [Memory usage:   56..87   library subroutine P6, for printing   88..222  main program  224..293  mask table: 35 long masks; each has 34 1's and a single 0  294..1023 array of bits for integers 2, 3, 4, ...,            where bit is changed from 1 to 0 when integer is crossed out.  The address of the mask table must be even, and clear of the main program.  To change it, just change the value after "T47K" below.  The address of the bit array will then be changed automatically.] [Subroutine M3, prints header, terminated by blank row of tape. It's an "interlude", which runs and then gets overwritten.]      [email protected]@E8FEZPF      @&*SIEVE!OF!ERATOSTHENES!#2020      @&*BASED!ON!CODE!BY!EIITI!WADA!#2001      ..PZ [Subroutine P6, prints strictly positive integer. 32 locations; working locations 1, 4, 5.]        T  56 K  [email protected]@[email protected]@[email protected]@TFTF  [email protected]@[email protected]@J995FJF!F   [Store address of mask table in (say) location 47  (chosen because its code letter M is first letter of "Mask").  Address must be even and clear of main program.]        T  47 K        P 224 F  [<-------- address of mask table] [Main program]        T  88 K  [Define load address for main program.                 Must be even, because of long values at start.]        G     K [set @ (theta) for relative addressing] [Long constants]        T#Z PF TZ [clears sandwich digit between 0 and 1]     PD PF     [long value 1; also low word = short 1]        T2#Z PF T2Z [clears sandwich digit between 2 and 3]     PF K4096F [long value 1000...000 binary;                   also high word = teleprinter null]  [Short constants  The address in the following C order is the (exclusive) end of the bit table.  Must be even: max = 1024, min = M + 72 where M is address of mask table set up above.  Usually 1024, but may be reduced, e.g. to make the program run faster.]     C1024 D [or e.g. C 326 D to make it much faster]     U     F ['U' = 'T' - 'C']     K     F ['K' = 'S' - 'C']     H    #M [H order for start of mask table]     H  70#M [used to test for end of mask table]     P   2 F [constant4, or 2 in address field]    P  70 F [constant 140, or 70 in address field]    @     F [carriage return]    &     F [line feed]  [Short variables]    P   1 F [p = number under test                Let p = 35*q + r, where 0 <= r < 35]    P     F [4*q]    P   4 F [4*r]  [Initial values of orders; required only for optional code below.]    C  70#M [initial value of a variable C order]    T    #M [initial value of a variable T order]    T  70#M [initial value of a variable T order]       [Enter with acc = 0]   [Optional code to do some initializing at run time.   This code allows the program to run again without being loaded again.]         A  7 @ [initial values of variable orders]         T 65 @         A 16 @         T 66 @         A 17 @         T 44 @         A 18 @         T 52 @   [Initialize variables]         A    @ [load 1 (short)]         L    D [shift left 1]         U 13 @ [p := 2]         L  1 F [shift left 2]         T 15 @ [4*r :=  8]         T 14 @ [4*q :=  0]  [End of optional code]  [Make table of 35 masks 111...110,  111...101, ...,  011...111  Treat the mask 011...111 separately to avoid accumulator overflow.  Assume acc = 0 here.]        S    #@ [acc all 1's]        S  2 #@ [acc := 0111...111]    T 68 #M [store at high end of mask table]        S    #@ [acc := -1]                L     D [make mask 111...1110]        G 43  @ [jump to store it [Loop shifting the mask right and storing the result in the mask table.  Uses first entry of bit array as temporary store.]    T     F [clear acc]        A 70 #M [load previous mask]        L     D [shift left]        A    #@ [add 1]    U 70 #M [update current mask]    T    #M [store it in table (order changed at run time)]        A 44  @ [load preceding T order]        A  9  @ [inc address by 2]        U 44  @ [store back]        S 35  @ [reached high entry yet?]        G 39  @ [loop back if not] [Mask table is now complete]  [Initialize bit array: no numbers crossed out, so all bits are 1]    T     F [clear acc]        S     #@ [subtract long 1, make top 35 bits all 1's]    T 70 #M [store as long value, both words all 1's  (order changed at run time)]        A  52 @ [load preceding order]        A   9 @ [add 2 to address field]        U  52 @ [and store back]        S   5 @ [convert to C order with same address (*)]        S   4 @ [test for end of bit array]        G  50 @ [loop until stored all 1's in bit table] [(*) Done so that end of bit table can be stored at one place only      in list of constants, i.e. 'C m D' only, not 'T m D' as well.]  [Start of main loop.] [Testing whether number has been crossed out]    T     F [acc := 0]        A  66 @ [deriving S order from C order]        A   6 @        T  64 @        S    #@ [acc := -1]    S     F [acc := 1's complement of bit-table entry (order changed at run time)]    H    #M [mult reg := start of mask array (order changed at run time)]    C  70#M [acc := -1 iff p (current number) is crossed out (order changed at run time)] [The next order is to avoid accumulator overflow if acc = max positive number]        E  70 @ [if acc >= 0, jump to process new prime]       A     #@ [if acc < 0, add 1 to test for -1]       E 106  @ [if acc now >= 0 number is crossed out, jump to test next] [Here if new prime found.  Send it to the teleprinter]    O  11 @ [print CR]        O  12 @ [print LF]        T     F [clear acc]        A  13 @ [load prime]        T     F [store in C(0) for print routine]        A  75 @ [for subroutine return]        G 56  F [print prime]  [Cross out its multiples by setting corresponding bits to 0]        A  65 @ [load H order above]        T 102 @ [plant in crossing-out loop]        A  66 @ [load C order above]        T1 03 @ [plant in crossing-out loop]  [Start of crossing-out loop. Here acc must = 0]    A 102 @ [load H order below]        A  15 @ [inc address field by 2*r, where p = 35q + r]        U 102 @ [update H order]        S   8 @ [compare with 'H 70 #M']        G  93 @ [skip if not gone beyond end of mask table]        T     F [wrap mask address and inc address in bit tsble]        A 102 @ [load H order below]        S  10 @ [reduce mask address by 70]        T 102 @ [update H order]        A 103 @ [load C order below]        A   9 @ [add 2 to address]        T 103 @ [update C order]    T     F [clear acc]        A 103 @ [load C order below]        A  14 @ [inc address field by 2*q, where p = 35q + r]        U 103 @ [update C order]        S   4 @ [test for end of bit array]        E 106 @ [if finished crossing out, loop to test next number]        A  4  @ [restore C order]        A  5  @ [make T order with same address]        T 104 @ [store below]  [Execute the crossing-out orders created above]   X     F [mult reg := mask (order created at run time)]   X     F [acc := logical and with bit-table entry (order created at run time)]   X     F [update entry (order created at run time)]        E  81 @ [loop back with acc = 0]    T     F [clear acc]        A  13 @ [load p = number under test]        A     @ [add 1 (single)]        T  13 @ [update]        A  15 @ [load 4*r, where p = 35q + r]        A   9 @ [add 4]        U  15 @ [store back (r inc'd by 1)]        S  10 @ [is 4*r now >= 140?]        G 119 @ [no, skip]        T  15 @ [yes, reduce 4*r by 140]        A  14 @ [load 4*q]        A   9 @ [add 4]        T  14 @ [store back (q inc'd by 1)]   T     F [clear acc]        A  65 @ [load 'H ... D' order, which refers to a mask]        A   9 @ [inc mask address by 2]        U  65 @ [update order]        S   8 @ [over end of mask table?]        G  59 @ [no, skip wrapround code]        A   7 @ [yes, add constant to wrap round]        T  65 @ [update H order]        A  66 @         A   9 @ [inc address by 2]        U  66 @ [and store back]        S   4 @ [test for end, as defined by C order at start]        G  59 @ [loop back if not at end] [Finished whole thing]   O   3 @ [output null to flush teleprinter buffer]        Z     F [stop]        E  19 Z [address to start execution]        P     F [acc = 0 at start] `
Output:
```SIEVE OF ERATOSTHENES 2020
BASED ON CODE BY EIITI WADA 2001
2
3
5
7
11
13
17
[...]
12703
12713
12721
12739
12743
12757
12763
```

## Eiffel

Works with: EiffelStudio version 6.6 beta (with provisional loop syntax)
`class    APPLICATION create    make feature       make            -- Run application.        do            across primes_through (100) as ic loop print (ic.item.out + " ") end        end     primes_through (a_limit: INTEGER): LINKED_LIST [INTEGER]            -- Prime numbers through `a_limit'        require            valid_upper_limit: a_limit >= 2        local            l_tab: ARRAY [BOOLEAN]        do            create Result.make            create l_tab.make_filled (True, 2, a_limit)            across                l_tab as ic            loop                if ic.item then                    Result.extend (ic.target_index)                    across ((ic.target_index * ic.target_index) |..| l_tab.upper).new_cursor.with_step (ic.target_index) as id                    loop                        l_tab [id.item] := False                    end                end            end        endend`

Output:

```2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

## Elixir

`defmodule Prime do  def eratosthenes(limit \\ 1000) do    sieve = [false, false | Enum.to_list(2..limit)] |> List.to_tuple    check_list = [2 | Stream.iterate(3, &(&1+2)) |> Enum.take(round(:math.sqrt(limit)/2))]    Enum.reduce(check_list, sieve, fn i,tuple ->      if elem(tuple,i) do        clear_num = Stream.iterate(i*i, &(&1+i)) |> Enum.take_while(fn x -> x <= limit end)        clear(tuple, clear_num)      else        tuple      end    end)  end   defp clear(sieve, list) do    Enum.reduce(list, sieve, fn i, acc -> put_elem(acc, i, false) end)  endend limit = 199sieve = Prime.eratosthenes(limit)Enum.each(0..limit, fn n ->  if x=elem(sieve, n), do: :io.format("~3w", [x]), else: :io.format("  .")   if rem(n+1, 20)==0, do: IO.puts ""end)`
Output:
```  .  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19
.  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .
. 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59
. 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79
.  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .
.101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .
.  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139
.  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .
.  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179
.181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199
```

Shorter version (but slow):

` defmodule Sieve do  def primes_to(limit), do: sieve(Enum.to_list(2..limit))   defp sieve([h|t]), do: [h|sieve(t -- for n <- 1..length(t), do: h*n)]  defp sieve([]), do: []end `

Alternate much faster odds-only version more suitable for immutable data structures using a (hash) Map

The above code has a very limited useful range due to being very slow: for example, to sieve to a million, even changing the algorithm to odds-only, requires over 800 thousand "copy-on-update" operations of the entire saved immutable tuple ("array") of 500 thousand bytes in size, making it very much a "toy" application. The following code overcomes that problem by using a (immutable/hashed) Map to store the record of the current state of the composite number chains resulting from each of the secondary streams of base primes, which are only 167 in number up to this range; it is a functional "incremental" Sieve of Eratosthenes implementation:

`defmodule PrimesSoEMap do  @typep stt :: {integer, integer, integer, Enumerable.integer, %{integer => integer}}   @spec advance(stt) :: stt  defp advance {n, bp, q, bps?, map} do    bps = if bps? === nil do Stream.drop(oddprms(), 1) else bps? end    nn = n + 2    if nn >= q do      inc = bp + bp      nbps = bps |> Stream.drop(1)      [nbp] = nbps |> Enum.take(1)      advance {nn, nbp, nbp * nbp, nbps, map |> Map.put(nn + inc, inc)}    else if Map.has_key?(map, nn) do      {inc, rmap} = Map.pop(map, nn)      [next] =        Stream.iterate(nn + inc, &(&1 + inc))          |> Stream.drop_while(&(Map.has_key?(rmap, &1))) |> Enum.take(1)      advance {nn, bp, q, bps, Map.put(rmap, next, inc)}    else      {nn, bp, q, bps, map}    end end  end   @spec oddprms() :: Enumerable.integer  defp oddprms do # put first base prime cull seq in Map so never empty    # advance base odd primes to 5 when initialized    init = {7, 5, 25, nil, %{9 => 6}}    [3, 5] # to avoid race, preseed with the first 2 elements...      |> Stream.concat(            Stream.iterate(init, &(advance &1))              |> Stream.map(fn {p,_,_,_,_} -> p end))  end   @spec primes() :: Enumerable.integer  def primes do    Stream.concat(, oddprms())  end end range = 1000000IO.write "The first 25 primes are:\n( "PrimesSoEMap.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))IO.puts ")"testfunc =  fn () ->    ans =      PrimesSoEMap.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()    ans end:timer.tc(testfunc)  |> (fn {t,ans} ->    IO.puts "There are #{ans} primes up to #{range}."    IO.puts "This test bench took #{t} microseconds." end).()`
Output:
```The first 25 primes are:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes up to 1000000.
This test bench took 3811957 microseconds.```

The output time of about 3.81 seconds to one million is on a 1.92 Gigahertz CPU meaning that it takes about 93 thousand CPU clock cycles per prime which is still quite slow compared to mutable data structure implementations but comparable to "functional" implementations in other languages and is slow due to the time to calculate the required hashes. One advantage that it has is that it is O(n log (log n)) asymptotic computational complexity meaning that it takes not much more than ten times as long to sieve a range ten times higher.

This algorithm could be easily changed to use a Priority Queue (preferably Min-Heap based for the least constant factor computational overhead) to save some of the computation time, but then it will have the same computational complexity as the following code and likely about the same execution time.

Alternate faster odds-only version more suitable for immutable data structures using lazy Streams of Co-Inductive Streams

In order to save the computation time of computing the hashes, the following version uses a deferred execution Co-Inductive Stream type (constructed using Tuple's) in an infinite tree folding structure (by the `pairs` function):

`defmodule PrimesSoETreeFolding do  @typep cis :: {integer, (() -> cis)}  @typep ciss :: {cis, (() -> ciss)}   @spec merge(cis, cis) :: cis  defp merge(xs, ys) do    {x, restxs} = xs; {y, restys} = ys    cond do      x < y -> {x, fn () -> merge(restxs.(), ys) end}      y < x -> {y, fn () -> merge(xs, restys.()) end}      true -> {x, fn () -> merge(restxs.(), restys.()) end}    end  end   @spec smlt(integer, integer) :: cis  defp smlt(c, inc) do    {c, fn () -> smlt(c + inc, inc) end}  end   @spec smult(integer) :: cis  defp smult(p) do    smlt(p * p, p + p)  endP  @spec allmults(cis) :: ciss  defp allmults {p, restps} do    {smult(p), fn () -> allmults(restps.()) end}  end   @spec pairs(ciss) :: ciss  defp pairs {cs0, restcss0} do    {cs1, restcss1} = restcss0.()    {merge(cs0, cs1), fn () -> pairs(restcss1.()) end}  end   @spec cmpsts(ciss) :: cis  defp cmpsts {cs, restcss} do    {c, restcs} = cs    {c, fn () -> merge(restcs.(), cmpsts(pairs(restcss.()))) end}  end   @spec minusat(integer, cis) :: cis  defp minusat(n, cmps) do    {c, restcs} = cmps    if n < c do      {n, fn () -> minusat(n + 2, cmps) end}    else      minusat(n + 2, restcs.())    end  end   @spec oddprms() :: cis  defp oddprms() do    {3, fn () ->      {5, fn () -> minusat(7, cmpsts(allmults(oddprms()))) end}    end}  end   @spec primes() :: Enumerable.t  def primes do     |> Stream.concat(      Stream.iterate(oddprms(), fn {_, restps} -> restps.() end)        |> Stream.map(fn {p, _} -> p end)    )  end end range = 1000000IO.write "The first 25 primes are:\n( "PrimesSoETreeFolding.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))IO.puts ")"testfunc =  fn () ->    ans =      PrimesSoETreeFolding.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()    ans end:timer.tc(testfunc)  |> (fn {t,ans} ->    IO.puts "There are #{ans} primes up to #{range}."    IO.puts "This test bench took #{t} microseconds." end).()`

It's output is identical to the previous version other than the time required is less than half; however, it has a O(n (log n) (log (log n))) asymptotic computation complexity meaning that it gets slower with range faster than the above version. That said, it would take sieving to billions taking hours before the two would take about the same time.

## Emacs Lisp

` (defun sieve-set (limit)  (let ((xs (make-vector (1+ limit) 0)))    (loop for i from 2 to limit          when (zerop (aref xs i))          collect i          and do (loop for m from (* i i) to limit by i                       do (aset xs m 1))))) `

Straightforward implementation of sieve of Eratosthenes, 2 times faster:

` (defun sieve (limit)  (let ((xs (vconcat [0 0] (number-sequence 2 limit))))    (loop for i from 2 to (sqrt limit)          when (aref xs i)          do (loop for m from (* i i) to limit by i                   do (aset xs m 0)))    (remove 0 xs))) `

## Erlang

### Erlang using Dicts

 This example is incorrect. Please fix the code and remove this message.Details: See talk page.
` -module( sieve_of_eratosthenes ). -export( [primes_upto/1] ). primes_upto( N ) ->	Ns = lists:seq( 2, N ),	Dict = dict:from_list( [{X, potential_prime} || X <- Ns] ),	{Upto_sqrt_ns, _T} = lists:split( erlang:round(math:sqrt(N)), Ns ),	{N, Prime_dict} = lists:foldl( fun find_prime/2, {N, Dict}, Upto_sqrt_ns ),	lists:sort( dict:fetch_keys(Prime_dict) ).   find_prime( N, {Max, Dict} ) -> find_prime( dict:find(N, Dict), N, {Max, Dict} ). find_prime( error, _N, Acc ) -> Acc;find_prime( {ok, _Value}, N, {Max, Dict} ) -> {Max, lists:foldl( fun dict:erase/2, Dict, lists:seq(N*N, Max, N) )}. `
Output:
```35> sieve_of_eratosthenes:primes_upto( 20 ).
[2,3,5,7,11,13,17,19]
```

### Erlang Lists of Tuples, Sloww

A much slower, perverse method, using only lists of tuples. Especially evil is the P = lists:filtermap operation which yields a list for every iteration of the X * M row. Has the virtue of working for any -> N :)

` -module( sieve ).                                                                                                    -export( [main/1,primes/2] ).                                                                                         main(N) -> io:format("Primes: ~w~n", [ primes(2,N) ]).                                                                primes(M,N) -> primes(M, N,lists:seq( M, N ),[]).                                                                     primes(M,N,_Acc,Tuples) when M > N/2-> out(Tuples);                                                                   primes(M,N,Acc,Tuples) when length(Tuples) < 1 ->         primes(M,N,Acc,[{X, X} || X <- Acc]);                               primes(M,N,Acc,Tuples) ->                                                                                                    {SqrtN, _T} = lists:split( erlang:round(math:sqrt(N)), Acc ),                                                        F = Tuples,                                                                                                          Ms = lists:filtermap(fun(X) -> if X > 0 -> {true, X * M}; true -> false end end, SqrtN),                             P = lists:filtermap(fun(T) ->             case lists:keymember(T,1,F) of true ->             {true, lists:keyreplace(T,1,F,{T,0})};              _-> false end end,  Ms),                                                                                                      AA = mergeT(P,lists:last(P),1 ),                                                                                     primes(M+1,N,Acc,AA).                                                                                         mergeT(L,M,Acc) when Acc == length(L) -> M;                                                                          mergeT(L,M,Acc) ->                                                                                                           A = lists:nth(Acc,L),                                                                                                B = M,                                                                                                               Mer = lists:zipwith(fun(X, Y) -> if X < Y -> X; true -> Y end end, A, B),                                            mergeT(L,Mer,Acc+1).                                                                                          out(Tuples) ->                                                                                                               Primes = lists:filter( fun({_,Y}) -> Y > 0 end,  Tuples),                                                            [ X || {X,_} <- Primes ].                                                                                     `
Output:
```109> sieve:main(20).
Primes: [2,3,5,7,11,13,17,19]
ok
110> timer:tc(sieve, main, ).
Primes: [2,3,5,7,11,13,17,19]
{129,ok}
```

### Erlang with ordered sets

Since I had written a really odd and slow one, I thought I'd best do a better performer. Inspired by an example from https://github.com/jupp0r

`  -module(ossieve).-export([main/1]). sieve(Candidates,SearchList,Primes,_Maximum) when length(SearchList) == 0 ->    ordsets:union(Primes,Candidates);sieve(Candidates,SearchList,Primes,Maximum)  ->     H = lists:nth(1,string:substr(Candidates,1,1)),     Reduced1 = ordsets:del_element(H, Candidates),     {Reduced2, ReducedSearch} = remove_multiples_of(H, Reduced1, SearchList),     NewPrimes = ordsets:add_element(H,Primes),     sieve(Reduced2, ReducedSearch, NewPrimes, Maximum). remove_multiples_of(Number,Candidates,SearchList) ->                                     NewSearchList = ordsets:filter( fun(X) -> X >= Number * Number end, SearchList),     RemoveList = ordsets:filter( fun(X) -> X rem Number == 0 end, NewSearchList),    {ordsets:subtract(Candidates, RemoveList), ordsets:subtract(NewSearchList, RemoveList)}. main(N) ->          io:fwrite("Creating Candidates...~n"),    CandidateList = lists:seq(3,N,2),    Candidates = ordsets:from_list(CandidateList),    io:fwrite("Sieving...~n"),    ResultSet = ordsets:add_element(2,sieve(Candidates,Candidates,ordsets:new(),N)),    io:fwrite("Sieved... ~w~n",[ResultSet]). `
Output:
```36> ossieve:main(100).
Creating Candidates...
Sieving...
Sieved... [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok

```

### Erlang Canonical

A pure list comprehension approach.

` -module(sieveof).-export([main/1,primes/1, primes/2]).                  main(X) -> io:format("Primes: ~w~n", [ primes(X) ]).   primes(X) -> sieve(range(2, X)).                                         primes(X, Y) -> remove(primes(X), primes(Y)).                             range(X, X) -> [X];                                                      range(X, Y) -> [X | range(X + 1, Y)].                                     sieve([X]) -> [X];                                                       sieve([H | T]) -> [H | sieve(remove([H * X || X <-[H | T]], T))].         remove(_, []) -> [];                                                     remove([H | X], [H | Y]) -> remove(X, Y);                                remove(X, [H | Y]) -> [H | remove(X, Y)].                                 `

{out}

```> timer:tc(sieve, main, ).
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{7350,ok}
61> timer:tc(sieveof, main, ).
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{363,ok}
```

Clearly not only more elegant, but faster :) Thanks to http://stackoverflow.com/users/113644/g-b

### Erlang ets + cpu distributed implementation

much faster previous erlang examples

` #!/usr/bin/env escript%% -*- erlang -*-%%! -smp enable -sname p10_4% vim:syn=erlang -mode(compile). main([N0]) ->    N = list_to_integer(N0),    ets:new(comp, [public, named_table, {write_concurrency, true} ]),    ets:new(prim, [public, named_table, {write_concurrency, true}]),    composite_mc(N),    primes_mc(N),    io:format("Answer: ~p ~n", [lists:sort([X||{X,_}<-ets:tab2list(prim)])]). primes_mc(N) ->    case erlang:system_info(schedulers) of        1 -> primes(N);        C -> launch_primes(lists:seq(1,C), C, N, N div C)    end.launch_primes([1|T], C, N, R) -> P = self(), spawn(fun()-> primes(2,R), P ! {ok, prm} end), launch_primes(T, C, N, R);launch_primes([H|[]], C, N, R)-> P = self(), spawn(fun()-> primes(R*(H-1)+1,N), P ! {ok, prm} end), wait_primes(C);launch_primes([H|T], C, N, R) -> P = self(), spawn(fun()-> primes(R*(H-1)+1,R*H), P ! {ok, prm} end), launch_primes(T, C, N, R). wait_primes(0) -> ok;wait_primes(C) ->    receive        {ok, prm} -> wait_primes(C-1)    after 1000    -> wait_primes(C)    end. primes(N) -> primes(2, N).primes(I,N) when I =< N ->    case ets:lookup(comp, I) of        [] -> ets:insert(prim, {I,1})        ;_ -> ok    end,    primes(I+1, N);primes(I,N) when I > N -> ok.  composite_mc(N) -> composite_mc(N,2,round(math:sqrt(N)),erlang:system_info(schedulers)).composite_mc(N,I,M,C) when I =< M, C > 0 ->    C1 = case ets:lookup(comp, I) of        [] -> comp_i_mc(I*I, I, N), C-1        ;_ -> C    end,    composite_mc(N,I+1,M,C1);composite_mc(_,I,M,_) when I > M -> ok;composite_mc(N,I,M,0) ->    receive        {ok, cim} -> composite_mc(N,I,M,1)    after 1000    -> composite_mc(N,I,M,0)    end. comp_i_mc(J, I, N) ->     Parent = self(),    spawn(fun() ->        comp_i(J, I, N),        Parent ! {ok, cim}    end). comp_i(J, I, N) when J =< N -> ets:insert(comp, {J, 1}), comp_i(J+I, I, N);comp_i(J, _, N) when J > N -> ok. `
Output:
```[email protected]:~/work/mblog/pr_euler/p10\$ ./generator.erl 100
97]
```

another several erlang implementation: http://mijkenator.github.io/2015/11/29/project-euler-problem-10/

## ERRE

` PROGRAM SIEVE_ORG  ! --------------------------------------------------  ! Eratosthenes Sieve Prime Number Program in BASIC  ! (da 3 a SIZE*2)   from Byte September 1981  !---------------------------------------------------  CONST SIZE%=8190   DIM FLAGS%[SIZE%] BEGIN  PRINT("Only 1 iteration")  COUNT%=0  FOR I%=0 TO SIZE% DO     IF FLAGS%[I%]=TRUE THEN         !\$NULL       ELSE         PRIME%=I%+I%+3         K%=I%+PRIME%         WHILE NOT (K%>SIZE%) DO            FLAGS%[K%]=TRUE            K%=K%+PRIME%         END WHILE         PRINT(PRIME%;)         COUNT%=COUNT%+1     END IF  END FOR  PRINT  PRINT(COUNT%;" PRIMES")END PROGRAM `
Output:

last lines of the output screen

``` 15749  15761  15767  15773  15787  15791  15797  15803  15809  15817  15823
15859  15877  15881  15887  15889  15901  15907  15913  15919  15923  15937
15959  15971  15973  15991  16001  16007  16033  16057  16061  16063  16067
16069  16073  16087  16091  16097  16103  16111  16127  16139  16141  16183
16187  16189  16193  16217  16223  16229  16231  16249  16253  16267  16273
16301  16319  16333  16339  16349  16361  16363  16369  16381
1899  PRIMES
```

## Euphoria

`constant limit = 1000sequence flags,primesflags = repeat(1, limit)for i = 2 to sqrt(limit) do    if flags[i] then        for k = i*i to limit by i do            flags[k] = 0        end for    end ifend for primes = {}for i = 2 to limit do    if flags[i] = 1 then        primes &= i    end ifend for? primes`

Output:

```{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,
181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,
277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,
383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,
487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,
601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,
709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,
827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997}```

## F#

### Short Sweet Functional and Idiotmatic

Well lists may not be lazy, but if you call it a sequence then it's a lazy list!

` (*  An interesting implementation of The Sieve of Eratosthenes.  Nigel Galloway April 7th., 2017.*)let SofE =  let rec fn n g = seq{ match n with                        |1 -> yield false; yield! fn g g                         |_ -> yield  true; yield! fn (n - 1) g}  let rec fg ng = seq {    let g = (Seq.findIndex(id) ng) + 2 // decreasingly inefficient with range at O(n)!    yield g; yield! fn (g - 1) g |> Seq.map2 (&&) ng |> Seq.cache |> fg }  Seq.initInfinite (fun x -> true) |> fg `
Output:
```> SofE |> Seq.take 10 |> Seq.iter(printfn "%d");;
2
3
5
7
11
13
17
19
23
29
```

Although interesting intellectually, and although the algorithm is more Sieve of Eratosthenes (SoE) than not in that it uses a progression of composite number representations separated by base prime gaps to cull, it isn't really SoE in performance due to several used functions that aren't linear with range, such as the "findIndex" that scans from the beginning of all primes to find the next un-culled value as the next prime in the sequence and the general slowness and inefficiency of F# nested sequence generation.

It is so slow that it takes in the order of seconds just to find the primes to a thousand!

For practical use, one would be much better served by any of the other functional sieves below, which can sieve to a million in less time than it takes this one to sieve to ten thousand. Those other functional sieves aren't all that many lines of code than this one.

### Functional

Richard Bird Sieve

This is the idea behind Richard Bird's unbounded code presented in the Epilogue of M. O'Neill's article in Haskell. It is about twice as much code as the Haskell code because F# does not have a built-in lazy list so that the effect must be constructed using a Co-Inductive Stream (CIS) type since no memoization is required, along with the use of recursive functions in combination with sequences. The type inference needs some help with the new CIS type (including selecting the generic type for speed). Note the use of recursive functions to implement multiple non-sharing delayed generating base primes streams, which along with these being non-memoizing means that the entire primes stream is not held in memory as for the original Bird code:

`type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness let primesBird() =  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function    if x < y then CIS(x, fun() -> xtlf() ^^ ys)    elif y < x then CIS(y, fun() -> xs ^^ ytlf())    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication  let pmltpls p = let rec nxt c = CIS(c, fun() -> nxt (c + p)) in nxt (p * p)  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =    CIS(c, fun() -> (ctlf()) ^^ (cmpsts (amstlf())))  let rec minusat n (CIS(c, ctlf) as cs) =    if n < c then CIS(n, fun() -> minusat (n + 1u) cs)    else minusat (n + 1u) (ctlf())  let rec baseprms() = CIS(2u, fun() -> baseprms() |> allmltps |> cmpsts |> minusat 3u)  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (baseprms())`

The above code sieves all numbers of two and up including all even numbers as per the page specification; the following code makes the very minor changes for an odds-only sieve, with a speedup of over a factor of two:

`type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness let primesBirdOdds() =  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function    if x < y then CIS(x, fun() -> xtlf() ^^ ys)    elif y < x then CIS(y, fun() -> xs ^^ ytlf())    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication  let pmltpls p = let adv = p + p                  let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =    CIS(c, fun() -> ctlf() ^^ cmpsts (amstlf()))  let rec minusat n (CIS(c, ctlf) as cs) =    if n < c then CIS(n, fun() -> minusat (n + 2u) cs)    else minusat (n + 2u) (ctlf())  let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))`

Tree Folding Sieve

The above code is still somewhat inefficient as it operates on a linear right extending structure that deepens linearly with increasing base primes (those up to the square root of the currently sieved number); the following code changes the structure into an infinite binary tree-like folding by combining each pair of prime composite streams before further processing as usual - this decreases the processing by approximately a factor of log n:

`type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness let primesTreeFold() =  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function    if x < y then CIS(x, fun() -> xtlf() ^^ ys)    elif y < x then CIS(y, fun() -> xs ^^ ytlf())    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication  let pmltpls p = let adv = p + p                  let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))  let rec pairs (CIS(cs0, cs0tlf)) =    let (CIS(cs1, cs1tlf)) = cs0tlf() in CIS(cs0 ^^ cs1, fun() -> pairs (cs1tlf()))  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =    CIS(c, fun() -> ctlf() ^^ (cmpsts << pairs << amstlf)())  let rec minusat n (CIS(c, ctlf) as cs) =    if n < c then CIS(n, fun() -> minusat (n + 2u) cs)    else minusat (n + 2u) (ctlf())  let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))`

The above code is over four times faster than the "BirdOdds" version (at least 10x faster than the first, "primesBird", producing the millionth prime) and is moderately useful for a range of the first million primes or so.

Priority Queue Sieve

In order to investigate Priority Queue Sieves as espoused by O'Neill in the referenced article, one must find an equivalent implementation of a Min Heap Priority Queue as used by her. There is such an purely functional implementation in RosettaCode translated from the Haskell code she used, from which the essential parts are duplicated here (Note that the key value is given an integer type in order to avoid the inefficiency of F# in generic comparison):

`[<RequireQualifiedAccess>]module MinHeap =   type HeapEntry<'V> = struct val k:uint32 val v:'V new(k,v) = {k=k;v=v} end  [<CompilationRepresentation(CompilationRepresentationFlags.UseNullAsTrueValue)>]  [<NoEquality; NoComparison>]  type PQ<'V> =         | Mt         | Br of HeapEntry<'V> * PQ<'V> * PQ<'V>   let empty = Mt   let peekMin = function | Br(kv, _, _) -> Some(kv.k, kv.v)                         | _            -> None   let rec push wk wv =     function | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)             | Br(vkv, ll, rr) ->                 if wk <= vkv.k then                   Br(HeapEntry(wk, wv), push vkv.k vkv.v rr, ll)                 else Br(vkv, push wk wv rr, ll)   let private siftdown wk wv pql pqr =    let rec sift pl pr =      match pl with        | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)        | Br(vkvl, pll, plr) ->            match pr with              | Mt -> if wk <= vkvl.k then Br(HeapEntry(wk, wv), pl, Mt)                      else Br(vkvl, Br(HeapEntry(wk, wv), Mt, Mt), Mt)              | Br(vkvr, prl, prr) ->                  if wk <= vkvl.k && wk <= vkvr.k then Br(HeapEntry(wk, wv), pl, pr)                  elif vkvl.k <= vkvr.k then Br(vkvl, sift pll plr, pr)                  else Br(vkvr, pl, sift prl prr)    sift pql pqr                                           let replaceMin wk wv = function | Mt -> Mt                                  | Br(_, ll, rr) -> siftdown wk wv ll rr`

Except as noted for any individual code, all of the following codes need the following prefix code in order to implement the non-memoizing Co-Inductive Streams (CIS's) and to set the type of particular constants used in the codes to the same time as the "Prime" type:

`type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} endtype Prime = uint32let frstprm = 2ulet frstoddprm = 3ulet inc1 = 1ulet inc = 2u`

The F# equivalent to O'Neill's "odds-only" code is then implemented as follows, which needs the included changed prefix in order to change the primes type to a larger one to prevent overflow (as well the key type for the MinHeap needs to be changed from uint32 to uint64); it is functionally the same as the O'Neill code other than for minor changes to suit the use of CIS streams and the option output of the "peekMin" function:

`type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} endtype Prime = uint64let frstprm = 2ULlet frstoddprm = 3ULlet inc = 2UL let primesPQ() =  let pmult p (xs: CIS<Prime>) = // does map (* p) xs    let rec nxtm (cs: CIS<Prime>) =      CIS(p * cs.v, fun() -> nxtm (cs.cont())) in nxtm xs  let insertprime p xs table =    MinHeap.push (p * p) (pmult p xs) table  let rec sieve' (ns: CIS<Prime>) table =    let nextcomposite = match MinHeap.peekMin table with                          | None -> ns.v // never happens                          | Some (k, _) -> k    let rec adjust table =      let (n, advs) = match MinHeap.peekMin table with                        | None -> (ns.v, ns.cont()) // never happens                        | Some kv -> kv      if n <= ns.v then adjust (MinHeap.replaceMin advs.v (advs.cont()) table)      else table    if nextcomposite <= ns.v then sieve' (ns.cont()) (adjust table)    else let n = ns.v in CIS(n, fun() ->           let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns table))  let rec sieve (ns: CIS<Prime>) = let n = ns.v in CIS(n, fun() ->      let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns MinHeap.empty))  let odds = // is the odds CIS from 3 up    let rec nxto i = CIS(i, fun() -> nxto (i + inc)) in nxto frstoddprm  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))             (CIS(frstprm, fun() -> (sieve odds)))`

However, that algorithm suffers in speed and memory use due to over-eager adding of prime composite streams to the queue such that the queue used is much larger than it needs to be and a much larger range of primes number must be used in order to avoid numeric overflow on the square of the prime added to the queue. The following code corrects that by using a secondary (actually a multiple of) base primes streams which are constrained to be based on a prime that is no larger than the square root of the currently sieved number - this permits the use of much smaller Prime types as per the default prefix:

`let primesPQx() =  let rec nxtprm n pq q (bps: CIS<Prime>) =    if n >= q then let bp = bps.v in let adv = bp + bp                   let nbps = bps.cont() in let nbp = nbps.v                   nxtprm (n + inc) (MinHeap.push (n + adv) adv pq) (nbp * nbp) nbps    else let ck, cv = match MinHeap.peekMin pq with                        | None -> (q, inc) // only happens until first insertion                        | Some kv -> kv         if n >= ck then let rec adjpq ck cv pq =                             let npq = MinHeap.replaceMin (ck + cv) cv pq                             match MinHeap.peekMin npq with                               | None -> npq // never happens                               | Some(nk, nv) -> if n >= nk then adjpq nk nv npq                                                 else npq                         nxtprm (n + inc) (adjpq ck cv pq) q bps         else CIS(n, fun() -> nxtprm (n + inc) pq q bps)  let rec oddprms() = CIS(frstoddprm, fun() ->      nxtprm (frstoddprm + inc) MinHeap.empty (frstoddprm * frstoddprm) (oddprms()))  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))             (CIS(frstprm, fun() -> (oddprms())))`

The above code is well over five times faster than the previous translated O'Neill version for the given variety of reasons.

Although slightly faster than the Tree Folding code, this latter code is also limited in practical usefulness to about the first one to ten million primes or so.

All of the above codes can be tested in the F# REPL with the following to produce the millionth prime (the "nth" function is zero based):

`> primesXXX() |> Seq.nth 999999;;`

where primesXXX() is replaced by the given primes generating function to be tested, and which all produce the following output (after a considerable wait in some cases):

Output:
`val it : Prime = 15485863u`

### Imperative

The following code is written in functional style other than it uses a mutable bit array to sieve the composites:

`let primes limit =  let buf = System.Collections.BitArray(int limit + 1, true)  let cull p = { p * p .. p .. limit } |> Seq.iter (fun c -> buf.[int c] <- false)  { 2u .. uint32 (sqrt (double limit)) } |> Seq.iter (fun c -> if buf.[int c] then cull c)  { 2u .. limit } |> Seq.map (fun i -> if buf.[int i] then i else 0u) |> Seq.filter ((<>) 0u) [<EntryPoint>]let main argv =  if argv = null || argv.Length = 0 then failwith "no command line argument for limit!!!"  printfn "%A" (primes (System.UInt32.Parse argv.) |> Seq.length)  0 // return an integer exit code`

Substituting the following minor changes to the code for the "primes" function will only deal with the odd prime candidates for a speed up of over a factor of two as well as a reduction of the buffer size by a factor of two:

`let primes limit =  let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u  let buf = System.Collections.BitArray(int lmtb + 1, true)  let cull i = let p = i + i + 3u in let s = p * (i + 1u) + i in               { s .. p .. lmtb } |> Seq.iter (fun c -> buf.[int c] <- false)  { 0u .. lmtbsqrt } |> Seq.iter (fun i -> if buf.[int i] then cull i )  let oddprms = { 0u .. lmtb } |> Seq.map (fun i -> if buf.[int i] then i + i + 3u else 0u)                |> Seq.filter ((<>) 0u)  seq { yield 2u; yield! oddprms }`

The following code uses other functional forms for the inner culling loops of the "primes function" to reduce the use of inefficient sequences so as to reduce the execution time by another factor of almost three:

`let primes limit =  let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u  let buf = System.Collections.BitArray(int lmtb + 1, true)  let rec culltest i = if i <= lmtbsqrt then                         let p = i + i + 3u in let s = p * (i + 1u) + i in                         let rec cullp c = if c <= lmtb then buf.[int c] <- false; cullp (c + p)                         (if buf.[int i] then cullp s); culltest (i + 1u) in culltest 0u  seq {yield 2u; for i = 0u to lmtb do if buf.[int i] then yield i + i + 3u }`

Now much of the remaining execution time is just the time to enumerate the primes as can be seen by turning "primes" into a primes counting function by substituting the following for the last line in the above code doing the enumeration; this makes the code run about a further five times faster:

`  let rec count i acc =    if i > int lmtb then acc else if buf.[i] then count (i + 1) (acc + 1) else count (i + 1) acc  count 0 1`

Since the final enumeration of primes is the main remaining bottleneck, it is worth using a "roll-your-own" enumeration implemented as an object expression so as to save many inefficiencies in the use of the built-in seq computational expression by substituting the following code for the last line of the previous codes, which will decrease the execution time by a factor of over three (instead of almost five for the counting-only version, making it almost as fast):

`  let nmrtr() =    let i = ref -2    let rec nxti() = i:=!i + 1;if !i <= int lmtb && not buf.[!i] then nxti() else !i <= int lmtb    let inline curr() = if !i < 0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")                        else let v = uint32 !i in v + v + 3u    { new System.Collections.Generic.IEnumerator<_> with        member this.Current = curr()      interface System.Collections.IEnumerator with        member this.Current = box (curr())        member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()        member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"a      interface System.IDisposable with        member this.Dispose() = () }  { new System.Collections.Generic.IEnumerable<_> with      member this.GetEnumerator() = nmrtr()    interface System.Collections.IEnumerable with      member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }`

The various optimization techniques shown here can be used "jointly and severally" on any of the basic versions for various trade-offs between code complexity and performance. Not shown here are other techniques of making the sieve faster, including extending wheel factorization to much larger wheels such as 2/3/5/7, pre-culling the arrays, page segmentation, and multi-processing.

### Almost functional Unbounded

the following odds-only implmentations are written in an almost functional style avoiding the use of mutability except for the contents of the data structures uses to hold the state of the and any mutability necessary to implement a "roll-your-own" IEnumberable iterator interface for speed.

#### Unbounded Dictionary (Mutable Hash Table) Based Sieve

The following code uses the DotNet Dictionary class instead of the above functional Priority Queue to implement the sieve; as average (amortized) hash table access is O(1) rather than O(log n) as for the priority queue, this implementation is slightly faster than the priority queue version for the first million primes and will always be faster for any range above some low range value:

`type Prime = uint32let frstprm = 2ulet frstoddprm = 3ulet inc = 2ulet primesDict() =  let dct = System.Collections.Generic.Dictionary()  let rec nxtprm n q (bps: CIS<Prime>) =    if n >= q then let bp = bps.v in let adv = bp + bp                   let nbps = bps.cont() in let nbp = nbps.v                   dct.Add(n + adv, adv)                   nxtprm (n + inc) (nbp * nbp) nbps    else if dct.ContainsKey(n) then           let adv = dct.[n]           dct.Remove(n) |> ignore//           let mutable nn = n + adv // ugly imperative code//           while dct.ContainsKey(nn) do nn <- nn + adv//           dct.Add(nn, adv)           let rec nxtmt k = // advance to next empty spot             if dct.ContainsKey(k) then nxtmt (k + adv)             else dct.Add(k, adv) in nxtmt (n + adv)           nxtprm (n + inc) q bps         else CIS(n, fun() -> nxtprm (n + inc) q bps)  let rec oddprms() = CIS(frstoddprm, fun() ->      nxtprm (frstoddprm + inc) (frstoddprm * frstoddprm) (oddprms()))  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))             (CIS(frstprm, fun() -> (oddprms())))`

The above code uses functional forms of code (with the imperative style commented out to show how it could be done imperatively) and also uses a recursive non-sharing secondary source of base primes just as for the Priority Queue version. As for the functional codes, the Primes type can easily be changed to "uint64" for wider range of sieving.

In spite of having true O(n log log n) Sieve of Eratosthenes computational complexity where n is the range of numbers to be sieved, the above code is still not particularly fast due to the time required to compute the hash values and manipulations of the hash table.

#### Unbounded Page-Segmented Bit-Packed Odds-Only Mutable Array Sieve

Note that the following code is used for the F# entry Extensible_prime_generator#Unbounded_Mutable_Array_Generator of the Extensible prime generator page.

All of the above unbounded implementations including the above Dictionary based version are quite slow due to their large constant factor computational overheads, making them more of an intellectual exercise than something practical, especially when larger sieving ranges are required. The following code implements an unbounded page segmented version of the sieve in not that many more lines of code, yet runs about 25 times faster than the Dictionary version for larger ranges of sieving such as to one billion; it uses functional forms without mutability other than for the contents of the arrays and the `primes` enumeration generator function that must use mutability for speed:

`type Prime = float // use uint64/int64 for regular 64-bit F#type private PrimeNdx = float // they are slow in JavaScript polyfills let inline private prime n = float n // match these convenience conversionslet inline private primendx n = float n // with the types above! let private cPGSZBTS = (1 <<< 14) * 8 // sieve buffer size in bits = CPUL1CACHE type private SieveBuffer = uint8[] /// a Co-Inductive Stream (CIS) of an "infinite" non-memoized series...type private CIS<'T> = CIS of 'T * (unit -> CIS<'T>) //' apostrophe formatting adjustment /// lazy list (memoized) series of base prime page arrays...type private BasePrime = uint32type private BasePrimeArr = BasePrime[]type private BasePrimeArrs = BasePrimeArrs of BasePrimeArr * Option<Lazy<BasePrimeArrs>> /// Masking array is faster than bit twiddle bit shifts!let private cBITMASK = [| 1uy; 2uy; 4uy; 8uy; 16uy; 32uy; 64uy; 128uy |] let private cullSieveBuffer lwi (bpas: BasePrimeArrs) (sb: SieveBuffer) =  let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt  let rec loopbp (BasePrimeArrs(bpa, bpatl) as ibpas) i =    if i >= bpa.Length then      match bpatl with      | None -> ()      | Some lv -> loopbp lv.Value 0 else    let bp = prime bpa.[i] in let bpndx = primendx ((bp - prime 3) / prime 2)    let s = (bpndx * primendx 2) * (bpndx + primendx 3) + primendx 3 in let bpint = int bp    if s <= lmti then      let s0 = // page cull start address calculation...        if s >= lwi then int (s - lwi) else        let r = (lwi - s) % (primendx bp)        if r = primendx 0 then 0 else int (bp - prime r)      let slmt = min btlmt (s0 - 1 + (bpint <<< 3))      let rec loopc c = // loop "unpeeling" is used so        if c <= slmt then // a constant mask can be used over the inner loop          let msk = cBITMASK.[c &&& 7]          let rec loopw w =            if w < sb.Length then sb.[w] <- sb.[w] ||| msk; loopw (w + bpint)          loopw (c >>> 3); loopc (c + bpint)      loopc s0; loopbp ibpas (i + 1) in loopbp bpas 0 /// fast Counting Look Up Table (CLUT) for pop counting...let private cCLUT =  let arr = Array.zeroCreate 65536  let rec popcnt n cnt = if n > 0 then popcnt (n &&& (n - 1)) (cnt + 1) else uint8 cnt  let rec loop i = if i < 65536 then arr.[i] <- popcnt i 0; loop (i + 1)  loop 0; arr let countSieveBuffer ndxlmt (sb: SieveBuffer): int =  let lstw = (ndxlmt >>> 3) &&& -2  let msk = (-2 <<< (ndxlmt &&& 15)) &&& 0xFFFF  let inline cntem i m =    int cCLUT.[int (((uint32 sb.[i + 1]) <<< 8) + uint32 sb.[i]) ||| m]  let rec loop i cnt =    if i >= lstw then cnt - cntem lstw msk else loop (i + 2) (cnt - cntem i 0)  loop 0 ((lstw <<< 3) + 16) /// a CIS series of pages from the given start index with the given SieveBuffer size,/// and provided with a polymorphic converter function to produce/// and type of result from the culled page parameters...let rec private makePrimePages strtwi btsz                               (cnvrtrf: PrimeNdx -> SieveBuffer -> 'T): CIS<'T> =  let bpas = makeBasePrimes() in let sb = Array.zeroCreate (btsz >>> 3)  let rec nxtpg lwi =    Array.fill sb 0 sb.Length 0uy; cullSieveBuffer lwi bpas sb    CIS(cnvrtrf lwi sb, fun() -> nxtpg (lwi + primendx btsz))  nxtpg strtwi /// secondary feed of lazy list of memoized pages of base primes...and private makeBasePrimes(): BasePrimeArrs =  let sb2bpa lwi (sb: SieveBuffer) =    let bsbp = uint32 (primendx 3 + lwi + lwi)    let arr = Array.zeroCreate <| countSieveBuffer 255 sb    let rec loop i j =      if i < 256 then        if sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy then loop (i + 1) j        else arr.[j] <- bsbp + uint32 (i + i); loop (i + 1) (j + 1)    loop 0 0; arr  // finding the first page as not part of the loop and making succeeding  // pages lazy breaks the recursive data race!  let frstsb = Array.zeroCreate 32  let fkbpas = BasePrimeArrs(sb2bpa (primendx 0) frstsb, None)  cullSieveBuffer (primendx 0) fkbpas frstsb  let rec nxtbpas (CIS(bpa, tlf)) = BasePrimeArrs(bpa, Some(lazy (nxtbpas (tlf()))))  BasePrimeArrs(sb2bpa (primendx 0) frstsb,                Some(lazy (nxtbpas <| makePrimePages (primendx 256) 256 sb2bpa))) /// produces a generator of primes; uses mutability for better speed...let primes(): unit -> Prime =  let sb2prms lwi (sb: SieveBuffer) = lwi, sb in let mutable ndx = -1  let (CIS((nlwi, nsb), npgtlf)) = // use page generator function above!    makePrimePages (primendx 0) cPGSZBTS sb2prms  let mutable lwi = nlwi in let mutable sb = nsb  let mutable pgtlf = npgtlf  let mutable baseprm = prime 3 + prime (lwi + lwi)   fun() ->     if ndx < 0 then ndx <- 0; prime 2 else    let inline notprm i = sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy    while ndx < cPGSZBTS && notprm ndx do ndx <- ndx + 1    if ndx >= cPGSZBTS then // get next page if over      let (CIS((nlwi, nsb), npgtlf)) = pgtlf() in ndx <- 0      lwi <- nlwi; sb <- nsb; pgtlf <- npgtlf      baseprm <- prime 3 + prime (lwi + lwi)       while notprm ndx do ndx <- ndx + 1    let ni = ndx in ndx <- ndx + 1 // ready for next call!    baseprm + prime (ni + ni) let countPrimesTo (limit: Prime): int = // much faster!  if limit < prime 3 then (if limit < prime 2 then 0 else 1) else  let topndx = (limit - prime 3) / prime 2 |> primendx  let sb2cnt lwi (sb: SieveBuffer) =    let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt    countSieveBuffer      (if lmti < topndx then btlmt else int (topndx - lwi)) sb, lmti  let rec loop (CIS((cnt, nxti), tlf)) count =    if nxti < topndx then loop (tlf()) (count + cnt)    else count + cnt  loop <| makePrimePages (primendx 0) cPGSZBTS sb2cnt <| 1 /// sequences are convenient but slow...let primesSeq() = primes() |> Seq.unfold (fun gen -> Some(gen(), gen))printfn "The first 25 primes are:  %s"  ( primesSeq() |> Seq.take 25      |> Seq.fold (fun s p -> s + string p + " ") "" )printfn "There are %d primes up to a million."   ( primesSeq() |> Seq.takeWhile ((>=) (prime 1000000)) |> Seq.length ) let rec cntto gen lmt cnt = // faster than seq's but still slow  if gen() > lmt then cnt else cntto gen lmt (cnt + 1) let limit = prime 1_000_000_000let start = System.DateTime.Now.Ticks// let answr = cntto (primes()) limit 0 // slower way!let answr = countPrimesTo limit // over twice as fast way!let elpsd = (System.DateTime.Now.Ticks - start) / 10000Lprintfn "Found %d primes to %A in %d milliseconds." answr limit elpsd`
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Found 50847534 primes to 1000000000 in 2161 milliseconds.```

As with all of the efficient unbounded sieves, the above code uses a secondary enumerator of the base primes less than the square root of the currently culled range, which is this case is a lazy (deferred memoized evaluation) binding by small pages of base primes which also uses the laziness of the deferral of subsequent pages so as to avoid a race condition.

The above code is written to output the "uint64" type for very large ranges of primes since there is little computational cost to doing this for this algorithm when used with 64-bit compilation; however, for the Fable transpiled to JavaScript, the largest contiguous integer that can be represented is the 64-bit floating point mantissa of 52 bits and thus the large numbers can be represented by floats in this case since a 64-bit polyfill is very slow. As written, the practical range for this sieve is about 16 billion, however, it can be extended to about 10^14 (a week or two of execution time) by setting the "PGSZBTS" constant to the size of the CPU L2 cache rather than the L1 cache (L2 is up to about two Megabytes for modern high end desktop CPU's) at a slight loss of efficiency (a factor of up to two or so) per composite number culling operation due to the slower memory access time. When the Fable compilation option is used, execution speed is roughly the same as using F# with DotNet Core.

Even with the custom `primes` enumerator generator (the F#/Fable built-in sequence operators are terribly inefficient), the time to enumerate the resulting primes takes longer than the time to actually cull the composite numbers from the sieving arrays. The time to do the actual culling is thus over 50 times faster than done using the Dictionary version. The slowness of enumeration, no matter what further tweaks are done to improve it (each value enumerated will always take a function calls and a scan loop that will always take something in the order of 100 CPU clock cycles per value), means that further gains in speed using extreme wheel factorization and multi-processing have little point unless the actual work on the resulting primes is done through use of auxiliary functions not using iteration. Such a function is provided here to count the primes by pages using a "pop count" look up table to reduce the counting time to only a small fraction of a second.

## Factor

Factor already contains two implementations of the sieve of Eratosthenes in `math.primes.erato` and `math.primes.erato.fast`. It is suggested to use one of them for real use, as they use faster types, faster unsafe arithmetic, and/or wheels to speed up the sieve further. Shown here is a more straightforward implementation that adheres to the restrictions given by the task (namely, no wheels).

Factor is pleasantly multiparadigm. Usually, it's natural to write more functional or declarative code in Factor, but this is an instance where it is more natural to write imperative code. Lexical variables are useful here for expressing the necessary mutations in a clean way.

`USING: bit-arrays io kernel locals math math.functionsmath.ranges prettyprint sequences ;IN: rosetta-code.sieve-of-erato <PRIVATE : init-sieve ( n -- seq )   ! Include 0 and 1 for easy indexing.    1 - <bit-array> dup set-bits ?{ f f } prepend ; ! Given the sieve and a prime starting index, create a range of! values to mark composite. Start at the square of the prime.: to-mark ( seq n -- range )    [ length 1 - ] [ dup dup * ] bi* -rot <range> ; ! Mark multiples of prime n as composite.: mark-nths ( seq n -- )     dupd to-mark [ swap [ f ] 2dip set-nth ] with each ; : next-prime ( index seq -- n ) [ t = ] find-from drop ; PRIVATE> :: sieve ( n -- seq )    n sqrt 2 n init-sieve :> ( limit i! s )    [ i limit < ]             ! sqrt optimization     [ s i mark-nths i 1 + s next-prime i! ] while t s indices ; : sieve-demo ( -- )    "Primes up to 120 using sieve of Eratosthenes:" print    120 sieve . ; MAIN: sieve-demo`

## FOCAL

`1.1 T "PLEASE ENTER LIMIT"1.2 A N1.3 I (2047-N)5.11.4 D 21.5 Q 2.1 F X=2,FSQT(N); D 32.2 F W=2,N; I (SIEVE(W)-2)4.1 3.1 I (-SIEVE(X))3.33.2 F Y=X*X,X,N; S SIEVE(Y)=23.3 R 4.1 T %4.0,W,! 5.1 T "PLEASE ENTER A NUMBER LESS THAN 2048."!; G 1.1`

Note that with the 4k paper tape version of FOCAL, the program will run out of memory for N>190 or so.

## Forth

```: prime? ( n -- ? ) here + [email protected] 0= ;
: composite! ( n -- ) here + 1 swap c! ;

: sieve ( n -- )
here over erase
2
begin
2dup dup * >
while
dup prime? if
2dup dup * do
i composite!
dup +loop
then
1+
repeat
drop
." Primes: " 2 do i prime? if i . then loop ;

100 sieve
```
Output:
`Primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 `

### Alternate Odds-Only, Better Style

The above code is not really very good Forth style as the main initialization, sieving, and output, are all in one `sieve` routine which makes it difficult to understand and refactor; Forth code is normally written in a series of very small routines which makes it easier to understand what is happening on the data stack, since Forth does not have named local re-entrant variable names as most other languages do for local variables (which other languages also normally store local variables on the stack). Also, it uses the `HERE` pointer to user space which points to the next available memory after all compilation is done as a unsized buffer pointer, but as it does not reserve that space for the sieving buffer, it can be changed by other concatenated routines in unexpected ways; better is to allocate the sieving buffer as required from the available space at the time the routines are run and pass that address between concatenated functions until a finalization function frees the memory and clears the stack; this is equivalent to allocating from the "heap" in other languages. The below code demonstrates these ideas:

`: prime? ( addr -- ? ) [email protected] 0= ; \ test composites array for prime \ given square index and prime index, u0, sieve the multiples of said prime...: cullpi! ( u addr u u0 -- u addr u0 )   DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri   DO 2 PICK I + TRUE SWAP C! DUP +LOOP DROP ; \ process for required prime limit; allocate and initialize returned buffer...: initsieve ( u -- u a-addr)   3 - DUP 0< IF 0 ELSE      1 RSHIFT 1+ DUP ALLOCATE 0<> IF ABORT" Memory allocation error!!!"      ELSE 2DUP SWAP ERASE THEN   THEN ; \ pass through sieving to given index in given buffer address as side effect...: sieve ( u a-addr -- u a-addr )   0 \ initialize test index i -- numv bufa i   BEGIN \ test prime square index < limit      DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3   WHILE \ -- numv bufa sqri i      2 PICK OVER + prime? IF cullpi! \ -- numv bufa i      ELSE SWAP DROP THEN 1+ \ -- numv bufa ni   REPEAT 2DROP ; \ -- numv bufa; drop sqri i \ print primes to given limit...: .primes ( u a-addr -- )   OVER 0< IF DROP 2 - 0< IF ( ." No primes!" ) ELSE ( ." Prime:  2" ) THEN   ELSE ." Primes:  2 " SWAP 0      DO DUP I + prime? IF I I + 3 + . THEN LOOP FREE DROP THEN ; \ count number of primes found for number odd numbers within\ given presumed sieved buffer starting at address...: [email protected] ( u a-addr -- )  SWAP DUP 0< IF 1+ 0< IF DROP 0 ELSE 1 THEN   ELSE 1 SWAP \ -- bufa cnt numv      0 DO OVER I + prime? IF 1+ THEN LOOP SWAP FREE DROP   THEN ; \ shows counted number of primes to the given limit...: .countprimesto ( u -- )   DUP initsieve sieve [email protected]   CR ." Found " . ." primes Up to the " . ." limit." ; \ testing the code...100 initsieve sieve .primes1000000 .countprimesto`
Output:
```Primes:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes Up to the 1000000 limit.```

As well as solving the stated problems making it much easier to understand and refactor, an odds-only sieve takes half the space and less than half the time.

### Bit-Packing the Sieve Buffer (Odds-Only)

Although the above version resolves many problems of the first version, it is wasteful of memory as each composite number in the sieve buffer is a byte of eight bits representing a boolean value. The memory required can be reduced eight-fold by bit packing the sieve buffer; this will take more "bit-twiddling" to read and write the bits, but reducing the memory used will give better cache assiciativity to larger ranges such that there will be a net gain in performance. This will make the code more complex and the stack manipulations will be harder to write, debug, and maintain, so ANS Forth 1994 provides a local variable naming facility to make this much easier. The following code implements bit-packing of the sieve buffer using local named variables when required:

`\ produces number of one bits in given word...: numbts ( u -- u ) \ pop count number of bits...   0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ; \ constants for variable 32/64 etc. CELL size...1 CELLS 3 LSHIFT 1- CONSTANT CellMskCellMsk numbts CONSTANT CellShft CREATE bits 8 ALLOT \ bit position Look Up Table...: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts \ test bit index composites array for prime...: prime? ( u addr -- ? )    OVER 3 RSHIFT + [email protected] SWAP 7 AND bits + [email protected] AND 0= ; \ given square index and prime index, u0, sieve the multiples of said prime...: cullpi! ( u addr u u0 -- u addr u0 )   DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri   DO I 3 RSHIFT 3 PICK + DUP [email protected] I 7 AND bits + [email protected] OR SWAP C! DUP +LOOP   DROP ; \ initializes sieve storage and parameters\ given sieve limit, returns bit limit and buffer address ..: initsieve ( u -- u a-addr )   3 - \ test limit...   DUP 0< IF 0 ELSE \ return if number of bits is <= 0!      1 RSHIFT 1+ \ finish conbersion to number of bits      DUP 1- CellShft RSHIFT 1+ \ round up to even number of cells      CELLS DUP ALLOCATE 0= IF DUP ROT ERASE \ set cells0. to zero      ELSE ABORT" Memory allocation error!!!"      THEN   THEN ; \ pass through sieving to given index in given buffer address as side effect...: sieve ( u a-addr -- u a-addr )   0 \ initialize test index i -- numv bufa i   BEGIN \ test prime square index < limit      DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3   WHILE \ -- numv bufa sqri i      DUP 3 PICK prime? IF cullpi! \ -- numv bufa i      ELSE SWAP DROP THEN 1+ \ -- numv bufa ni   REPEAT 2DROP ; \ -- numv bufa; drop sqri i \ prints already found primes from sieved array...: .primes ( u a-addr -- )   SWAP CR ." Primes to " DUP DUP + 2 + 2 MAX . ." are:  "   DUP 0< IF 1+ 0< IF ." none." ELSE 2 . THEN DROP \ case no primes or just 2   ELSE 2 . 0 DO I OVER prime? IF I I + 3 + . THEN LOOP FREE DROP   THEN ; \ pop count style Look Up Table by 16 bits entry;\ is a 65536 byte array containing number of zero bits for each index...CREATE cntLUT16 65536 ALLOT: mkpop ( u -- u )   numbts 16 SWAP - ;: initLUT ( -- )   cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT: [email protected] ( u -- u )   0 1 CELlS 1 RSHIFT 0   DO OVER 65535 AND cntLUT16 + [email protected] + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ; \ count number of zero bits up to given bits index-1 in array address;\ params are number of bits used - bits, negative indicates <2/2 out: 0/1,\ given address is of the allocated bit buffer - bufa;\ values used: bmsk is bit mask to limit bit in last cell,\ lci is cell index of last cell used, cnt is the return value...\ NOTE. this is for little-endian; big-endian needs a byte swap\ before the last mask and popcount operation!!!: [email protected] ( u a-addr -- u )   LOCALS| bufa numb |   numb 0< IF numb 1+ 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!   ELSE      numb 1- TO numb \ numb -= 1      1 \ initial count      numb CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv      0 ?DO bufa I + @ [email protected] + 1 CELLS +LOOP \ -- lci cnt      SWAP bufa + @ \ -- cnt lstCELL      -2 numb CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL      [email protected] + \ add popcount of last masked CELL -- cnt      bufa FREE DROP \ free bufa -- bmsk cnt [email protected]   THEN ; : .countprimesto ( u -- u )   dup initsieve sieve [email protected]   CR ." There are " . ." primes Up to the " . ." limit." ; 100 initsieve sieve .primes1000000000 .countprimesto`

The output of the above code is the same as the previous version, but it takes about two thirds the time while using eight times less memory; it takes about 6.5 seconds on my Intel Skylake i5-6500 at 3.6 GHz (turbo) using swiftForth (32-bit) and about 3.5 seconds on VFX Forth (64-bit), both of which compile to machine code but with the latter much more optimized; gforth-fast is about twice as slow as swiftForth and five times slower then VFX Forth as it just compiles to threaded execution tokens (more like an interpreter).

### Page-Segmented Bit-Packed Odds-Only Version

While the above version does greatly reduce the amount of memory used for a given sieving range and thereby also somewhat reduces execution time; any sieve intended for sieving to limits of a hundred million or more should use a page-segmented implementation; page-segmentation means that only storage for a representation of the base primes up to the square root of the limit plus a sieve buffer that should also be at least proportional to the same square root is required; this will again make the execution faster as ranges go up due to better cache associativity with most memory accesses being within the CPU cache sizes. The following Forth code implements a basic version that does this:

`\ CPU L1 and L2 cache sizes in bits; power of 2...1 17 LSHIFT CONSTANT L1CacheBitsL1CacheBits 8 * CONSTANT L2CacheBits \ produces number of one bits in given word...: numbts ( u -- u ) \ pop count number of bits...   0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ; \ constants for variable 32/64 etc. CELL size...1 CELLS 3 LSHIFT 1- CONSTANT CellMskCellMsk numbts CONSTANT CellShft CREATE bits 8 ALLOT \ bit position Look Up Table...: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts \ initializes sieve buffer storage and parameters\ given sieve buffer bit size (even number of CELLS), returns buffer address ..: initSieveBuffer ( u -- a-addr )   CellShft RSHIFT \ even number of cells   CELLS ALLOCATE 0<> IF ABORT" Memory allocation error!!!" THEN ; \ test bit index composites array for prime...: prime? ( u addr -- ? )    OVER 3 RSHIFT + [email protected] SWAP 7 AND bits + [email protected] AND 0= ; \ given square index and prime index, u0, as sell as bitsz,\ sieve the multiples of said prime leaving prime index on the stack...: cullpi! ( u u0 u u addr -- u0 )   LOCALS| sba bitsz lwi | DUP DUP + 3 + ROT \ -- i prm sqri   \ culling start incdx address calculation...   lwi 2DUP > IF - ELSE SWAP - OVER MOD DUP 0<> IF OVER SWAP - THEN   THEN bitsz SWAP \ -- i prm bitsz strti   DO I 3 RSHIFT sba + DUP [email protected] I 7 AND bits + [email protected] OR SWAP C! DUP +LOOP   DROP ; \ cull sieve buffer given base wheel index, bit size, \ address base prime sieved buffer and\ the address of the sieve buffer to be culled of composite bits...: cullSieveBuffer ( u u a-addr a-addr -- )   >R >R 2DUP + R> R>  \ -- lwi bitsz rngi bpba sba   LOCALS| sba bpba rngi bitsz lwi |   bitsz 1- CellShft RSHIFT 1+ CELLS sba SWAP ERASE \ clear sieve buffer   0 \ initialize base prime index i -- i   BEGIN \ test prime square index < limit      DUP DUP DUP + SWAP 3 + * 3 + TUCK rngi < \ sqri = 2*i * (I+3) + 3   WHILE \ -- sqri i      DUP bpba prime? IF lwi bitsz sba cullpi! ELSE SWAP DROP THEN \ -- i        1+ REPEAT 2DROP ; \ -- \ pop count style Look Up Table by 16 bits entry;\ is a 65536 byte array containing number of zero bits for each index...CREATE cntLUT16 65536 ALLOT: mkpop ( u -- u )   numbts 16 SWAP - ;: initLUT ( -- )   cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT: [email protected] ( u -- u )   0 1 CELlS 1 RSHIFT 0   DO OVER 65535 AND cntLUT16 + [email protected] + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ; \ count number of zero bits up to given bits index in array address...: [email protected] ( u a-addr -- u )   LOCALS| bufa lmti |   0 \ initial count -- cnt   lmti CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv   0 ?DO bufa I + @ [email protected] + 1 CELLS +LOOP \ -- lci cnt   SWAP bufa + @ \ -- cnt lstCELL   -2 lmti CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL   [email protected] + ; \ add popcount of last masked CELL -- cnt \ prints found primes from series of culled sieve buffers...: .primes ( u -- )   DUP CR ." Primes to " . ." are:  "   DUP 3 - 0< IF DUP 2 - 0< IF ." none." ELSE 2 . THEN \ <2/2 -> 0/1   ELSE 2 .      3 - 1 RSHIFT 1+ \ -- rngi      DUP 1- L2CacheBits / L2CacheBits * 3 RSHIFT \ -- rng rngi pglmtbytes      L1CacheBits initSieveBuffer \ address of base prime sieve buffer      L2CacheBits initSieveBuffer \ address of main sieve buffer      LOCALS| sba bpsba pglmt | \ local variables -- rngi      0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer      pglmt 0 ?DO         I L2CacheBits bpsba sba cullSieveBuffer         I L2CacheBits 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP      L2CacheBits +LOOP \ rngi      L2CacheBits mod DUP 0> IF \ one more page!         pglmt DUP L2CacheBits bpsba sba cullSieveBuffer         SWAP 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP      THEN bpsba FREE DROP sba FREE DROP   THEN ; \ -- \ prints count of found primes from series of culled sieve buffers...: .countPrimesTo ( u -- )   DUP 3 - 0< IF 2 - 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!   ELSE      DUP 3 - 1 RSHIFT 1+      DUP 1- L2CacheBits / L2CacheBits * \ -- rng rngi pglmtbytes      L1CacheBits initSieveBuffer \ address of base prime sieve buffer      L2CacheBits initSieveBuffer \ address of main sieve buffer      LOCALS| sba bpsba pglmt | \ local variables -- rng rngi      0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer      1 pglmt 0 ?DO         I L2CacheBits bpsba sba cullSieveBuffer         L2CacheBits 1- sba [email protected] +      L2CacheBits +LOOP \ rng rngi cnt      SWAP L2CacheBits mod DUP 0> IF \ one more page!         pglmt OVER bpsba sba cullSieveBuffer         1- sba [email protected] + \ partial count!      THEN      bpsba FREE DROP sba FREE DROP \ -- range cnt   THEN CR ." There are " . ." primes Up to the " . ." limit." ; 100 .primes1000000000 .countPrimesTo`
Output:
```Primes to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 50847534 primes Up to the 1000000000 limit.```

For simplicity, the base primes array is left as a sieved bit packed array (which takes minimum space) at the cost of having to scan the bit array for base primes on every page-segment culling pass. The page-segment sieve buffer is set as a fixed multiple of this (intended to fit within the CPU L2 cache size) in order to reduce the base prime start index address calculation overhead by this factor at the cost of slightly increased memory access times, which access times are still only about the same as the fastest inner culling time or less anyway. When the cache sizes are set to the 32 Kilobyte/256 Kilobyte size for L1/L2, respectively, by changing `1 18 LSHIFT CONSTANT L1CacheBits`) as for my Intel Skylake i5-6500 at 3.6 GHz (single-threaded turbo), it runs in about 1.25 seconds on 64-bit VFX Forth, 3.75 seconds on 32-bit swiftForth, and 12.4 seconds on 64-bit gforth-fast, obviously with the tuned in-lined machine language compiling of VFX Forth much faster than the threaded execution token interpreting of gforth and with swiftForth lacking the machine code inlining of VFX Forth.

VFX Forth is only about 25 % slower than the algorithm as written in the fastest of languages, just as they advertise.

As written, the algorithm works efficiently up to over ten billion (1e10) with 64-bit systems, but could easily be refactored to use floating point or double precision for inputs and outputs as I have done in a StackOverflow answer in JavaScript without costing much in execution time so 32-bit systems would have the much higher limit.

The implementation is efficient up to this range, but with a change so that the base primes array can grow with increasing limit, can sieve to much higher ranges with a loss of efficiency in unused base prime start address calculations that can't be used as the culling spans exceed the fixed sieve buffer size. Again, this can be solved by also making the page-segmentation sieve buffer grow as the square root of the limit.

Further improvements by a factor of almost four in overall execution speed would be gained by implementing maximum wheel-factorization as per my other StackOverflow JavaScript answer, which also effectively increases sieve buffer sizes by a factor of 48 in sieving by modulo residual bit planes.

Finally, multi-processing could be applied to increase the execution speed by about the number of effective cores (non SMT - Hyper Threads) as in four on my Skylake machine; however, neither the 1994 ANS Forth standard nor the 2012 standard has a standard Forth way of implementing this so each of the implementations use their own custom WORDS; since the resulting code would not be cross-implementation, I am not going to do this.

I likely won't even add the Maximum Wheel-Factorized version as in the above linked JavaScript code, since this code is enough to demonstrate what I was going to show: that Forth can be an efficient language, albeit a little hard to code, read, and maintain due to the reliance on anonymous data stack operations; it is a language whose best use is likely in cross-compiling to embedded systems where it can easily be customized and extended as required, and because it doesn't actually require a base operating system, can use its core facilities, functions, and extensions in place of such an OS to result in a minimum memory footprint.

## Fortran

Works with: Fortran version 90 and later
`program sieve   implicit none  integer, parameter :: i_max = 100  integer :: i  logical, dimension (i_max) :: is_prime   is_prime = .true.  is_prime (1) = .false.  do i = 2, int (sqrt (real (i_max)))    if (is_prime (i)) is_prime (i * i : i_max : i) = .false.  end do  do i = 1, i_max    if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i  end do  write (*, *) end program sieve`

Output:

`2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97`

Because it uses four byte logical's (default size) as elements of the sieve buffer, the above code uses 400 bytes of memory for this trivial task of sieving to 100; it also has 49 + 31 + 16 + 8 = 104 (for the culling by the primes of two, three, five, and seven) culling operations.

Optimised using a pre-computed wheel based on 2:

`program sieve_wheel_2   implicit none  integer, parameter :: i_max = 100  integer :: i  logical, dimension (i_max) :: is_prime   is_prime = .true.  is_prime (1) = .false.  is_prime (4 : i_max : 2) = .false.  do i = 3, int (sqrt (real (i_max))), 2    if (is_prime (i)) is_prime (i * i : i_max : 2 * i) = .false.  end do  do i = 1, i_max    if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i  end do  write (*, *) end program sieve_wheel_2`

Output:

`2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97`

This so-called "optimized" version still uses 400 bytes of memory but slightly reduces to 74 operations from 104 operations including the initialization of marking all of the even representations as composite due to skipping the re-culling of the even representation, so isn't really much of an optimization at all!

Optimized using a proper implementation of a wheel 2:

The above implementations, especially the second odds-only code, are some of the most inefficient versions of the Sieve of Eratosthenes in any language here as to time and space efficiency, only worse by some naive JavaScript implementations that use eight-byte Number's as logical values; the second claims to be wheel factorized but still uses all the same memory as the first and still culls by the even numbers in the initialization of the sieve buffer. As well, using four bytes (default logical size) to store a boolean value is terribly wasteful if these implementations were to be extended to non-toy ranges. The following code implements proper wheel factorization by two, reducing the space used by a factor of about eight to 49 bytes by using `byte` as the sieve buffer array elements and not requiring the evens initialization, thus reducing the number of operations to 16 + 8 + 4 = 28 (for the culling primes of three, five, and seven) culling operations:

`program sieve_wheel_2   implicit none  integer, parameter :: i_max = 100  integer, parameter :: i_limit = (i_max - 3) / 2  integer :: i  byte, dimension (0:i_limit) :: composites   composites = 0  do i = 0, (int (sqrt (real (i_max))) - 3) / 2    if (composites(i) == 0) composites ((i + i) * (i + 3) + 3 : i_limit : i + i + 3) = 1.  end do  write (*, '(i0, 1x)', advance = 'no') 2  do i = 0, i_limit    if (composites (i) == 0) write (*, '(i0, 1x)', advance = 'no') (i + i + 3)  end do  write (*, *) end program sieve_wheel_2`

The output is the same as the earlier version.

Optimized using bit packing to reduce the memory use by a further factor of eight:

The above implementation is still space inefficient in effectively only using one bit out of eight; the following version implements bit packing to reduce memory use by a factor of eight by using bits to represent composite numbers rather than bytes:

`program sieve_wheel_2   implicit none  integer, parameter :: i_max = 10000000  integer, parameter :: i_range = (i_max - 3) / 2  integer :: i, j, k, cnt  byte, dimension (0:i_range / 8) :: composites   composites = 0 ! pre-initialized?  do i = 0, (int (sqrt (real (i_max))) - 3) / 2    if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then      do j = (i + i) * (i + 3) + 3, i_range, i + i + 3        k = shiftr(j, 3)        composites(k) = ior(composites(k), shiftl(1, iand(j, 7)))      end do    end if  end do!  write (*, '(i0, 1x)', advance = 'no') 2  cnt = 1  do i = 0, i_range    if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then!      write (*, '(i0, 1x)', advance = 'no') (i + i + 3)      cnt = cnt + 1    end if  end do!  write (*, *)  print '(a, i0, a, i0, a, f0.0, a)', &        'There are ', cnt, ' primes up to ', i_max, '.'end program sieve_wheel_2`
Output:
`There are 664579 primes up to 10000000.`

When the lines to print the results are enabled, the output to a maximum value of 100 is still exactly the same as the other versions, and it has exactly the same number of culling operations as the immediately above optimized version for the same range; the only difference is that less memory is used. Although the culling operations are somewhat more complex, for larger ranges the time saved in better cache associativity due to more effective use of the cache more than makes up for it so average culling time is actually reduced, so that this version can count the number of primes to several million (it takes a lot of time to list hundreds of thousands of primes, but counting is faster) in a few tens of milliseconds. For ranges above a few tens of millions, a page-segmented sieve is much more efficient due to further improved use of the CPU caches.

### Multi-Threaded Page-Segmented Bit-Packed Odds-Only Version

As well as adding page-segmentation, the following code adds multi-processing which is onc of the capabilities for which modern Fortran is known:

`subroutine cullSieveBuffer(lwi, size, bpa, sba)     implicit none    integer, intent(in) :: lwi, size    byte, intent(in) :: bpa(0:size - 1)    byte, intent(out) :: sba(0:size - 1)    integer :: i_limit, i_bitlmt, i_bplmt, i, sqri, bp, si, olmt, msk, j    byte, dimension (0:7) :: bits    common /twiddling/ bits     i_bitlmt = size * 8 - 1    i_limit = lwi + i_bitlmt    i_bplmt = size / 4    sba = 0    i = 0    sqri = (i + i) * (i + 3) + 3    do while (sqri <= i_limit)      if (iand(int(bpa(shiftr(i, 3))), shiftl(1, iand(i, 7))) == 0) then        ! start index address calculation...        bp = i + i + 3        if (lwi <= sqri) then          si = sqri - lwi        else          si = mod((lwi - sqri), bp)          if (si /= 0) si = bp - si        end if        if (bp <= i_bplmt) then          olmt = min(i_bitlmt, si + bp * 8 - 1)          do while (si <= olmt)            msk = bits(iand(si, 7))            do j = shiftr(si, 3), size - 1, bp              sba(j) = ior(int(sba(j)), msk)            end do            si = si + bp          end do        else          do while (si <= i_bitlmt)            j = shiftr(si, 3)            sba(j) = ior(sba(j), bits(iand(si, 7)))            si = si + bp          end do        end if      end if      i = i + 1      sqri = (i + i) * (i + 3) + 3    end do   end subroutine cullSieveBuffer   integer function countSieveBuffer(lmti, almti, sba)     implicit none    integer, intent(in) :: lmti, almti    byte, intent(in) :: sba(0:almti)    integer :: bmsk, lsti, i, cnt    byte, dimension (0:65535) :: clut    common /counting/ clut     cnt = 0    bmsk = iand(shiftl(-2, iand(lmti, 15)), 65535)    lsti = iand(shiftr(lmti, 3), -2)    do i = 0, lsti - 1, 2      cnt = cnt + clut(shiftl(iand(int(sba(i)), 255), 8) + iand(int(sba(i + 1)), 255))    end do    countSieveBuffer = cnt + clut(ior(shiftl(iand(int(sba(lsti)), 255), 8) + iand(int(sba(lsti + 1)), 255), bmsk))   end function countSieveBuffer   program sieve_paged     use OMP_LIB    implicit none    integer, parameter :: i_max = 1000000000, i_range = (i_max - 3) / 2    integer, parameter :: i_l1cache_size = 16384, i_l1cache_bitsz = i_l1cache_size * 8    integer, parameter :: i_l2cache_size = i_l1cache_size * 8, i_l2cache_bitsz = i_l2cache_size * 8    integer :: cr, c0, c1, i, j, k, cnt    integer, save :: scnt    integer :: countSieveBuffer    integer :: numthrds    byte, dimension (0:i_l1cache_size - 1) :: bpa    byte, save, allocatable, dimension (:) :: sba    byte, dimension (0:7) :: bits = (/ 1, 2, 4, 8, 16, 32, 64, -128 /)    byte, dimension (0:65535) :: clut    common /twiddling/ bits    common /counting/ clut     type heaparr      byte, allocatable, dimension(:) :: thrdsba    end type heaparr    type(heaparr), allocatable, dimension (:) :: sbaa     !\$OMP THREADPRIVATE(scnt, sba)     numthrds = 1    !\$ numthrds = OMP_get_max_threads()    allocate(sbaa(0:numthrds - 1))    do i = 0, numthrds - 1      allocate(sbaa(i)%thrdsba(0:i_l2cache_size - 1))    end do     CALL SYSTEM_CLOCK(count_rate=cr)    CALL SYSTEM_CLOCK(c0)    do k = 0, 65535 ! initialize counting Look Up Table      j = k      i = 16      do while (j > 0)        i = i - 1        j = iand(j, j - 1)      end do      clut(k) = i    end do    bpa = 0 ! pre-initialization not guaranteed!    call cullSieveBuffer(0, i_l1cache_size, bpa, bpa)     cnt = 1    !\$OMP PARALLEL DO ORDERED      do i = i_l2cache_bitsz, i_range, i_l2cache_bitsz * 8        scnt = 0        sba = sbaa(mod(i, numthrds))%thrdsba        do j = i, min(i_range, i + 8 * i_l2cache_bitsz - 1), i_l2cache_bitsz          call cullSieveBuffer(j - i_l2cache_bitsz, i_l2cache_size, bpa, sba)          scnt = scnt + countSieveBuffer(i_l2cache_bitsz - 1, i_l2cache_size, sba)        end do        !\$OMP ATOMIC          cnt = cnt + scnt      end do    !\$OMP END PARALLEL DO     j = i_range / i_l2cache_bitsz * i_l2cache_bitsz    k = i_range - j    if (k /= i_l2cache_bitsz - 1) then      call cullSieveBuffer(j, i_l2cache_size, bpa, sbaa(0)%thrdsba)      cnt = cnt + countSieveBuffer(k, i_l2cache_size, sbaa(0)%thrdsba)    end if  !  write (*, '(i0, 1x)', advance = 'no') 2  !  do i = 0, i_range  !    if (iand(sba(shiftr(i, 3)), bits(iand(i, 7))) == 0) write (*, '(i0, 1x)', advance='no') (i + i + 3)  !  end do  !  write (*, *)    CALL SYSTEM_CLOCK(c1)    print '(a, i0, a, i0, a, f0.0, a)', 'Found ', cnt, ' primes up to ', i_max, &          ' in ', ((c1 - c0) / real(cr) * 1000), ' milliseconds.'     do i = 0, numthrds - 1      deallocate(sbaa(i)%thrdsba)    end do    deallocate(sbaa)   end program sieve_paged`
Output:
`Found 50847534 primes up to 1000000000 in 219. milliseconds.`

The above output was as compiled with gfortran -O3 -fopenmp using version 11.1.1-1 on my Intel Skylake i5-6500 CPU at 3.2 GHz multithreaded with four cores. There are a few more optimizations that could be made in applying Maximum Wheel-Factorization as per my StackOverflow answer in JavaScript, which will make this almost four times faster yet again. If that optimization were done, sieving to a billion as here is really too trivial to measure and one should sieve at least up to ten billion to start to get a long enough time to be measured accurately. As explained in that answer, the Maximum Wheel-Factorized code will work efficiently up to about a trillion (1e12), when it needs yet another "bucket sieve" optimization to allow it to continue to scale efficiently for increasing range. The final optimization which can speed up the code by almost a factor of two is a very low level loop unrolling technique that I'm not sure will work with the compiler, but as it works in C/C++ and other similar languages including those that compile through LLVM, it ought to.

## Free Pascal

### Basic version

function Sieve returns a list of primes less than or equal to the given aLimit

` program prime_sieve;{\$mode objfpc}{\$coperators on}uses  SysUtils, GVector;type  TPrimeList = specialize TVector<DWord>;function Sieve(aLimit: DWord): TPrimeList;var  IsPrime: array of Boolean;  I, SqrtBound: DWord;  J: QWord;begin  Result := TPrimeList.Create;  Inc(aLimit, Ord(aLimit < High(DWord))); //not a problem because High(DWord) is composite  SetLength(IsPrime, aLimit);  FillChar(Pointer(IsPrime)^, aLimit, Byte(True));  SqrtBound := Trunc(Sqrt(aLimit));  for I := 2 to aLimit do    if IsPrime[I] then      begin        Result.PushBack(I);        if I <= SqrtBound then          begin            J := I * I;            repeat              IsPrime[J] := False;              J += I;            until J > aLimit;          end;      end;end;  //usage var  Limit: DWord = 0;function ReadLimit: Boolean;var  Lim: Int64;begin  if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then    if (Lim >= 0) and (Lim <= High(DWord)) then      begin        Limit := DWord(Lim);        exit(True);      end;  Result := False;end;procedure PrintUsage;begin  WriteLn('Usage: prime_sieve Limit');  WriteLn('  where Limit in the range [0, ', High(DWord), ']');  Halt;end;procedure PrintPrimes(aList: TPrimeList);var  I: DWord;begin  for I := 0 to aList.Size - 2 do    Write(aList[I], ', ');  WriteLn(aList[aList.Size - 1]);  aList.Free;end;begin  if not ReadLimit then    PrintUsage;  try    PrintPrimes(Sieve(Limit));  except    on e: Exception do      WriteLn('An exception ', e.ClassName, ' occurred with message: ', e.Message);  end;end. `

### Alternative segmented(odds only) version

function OddSegmentSieve returns a list of primes less than or equal to the given aLimit

` program prime_sieve;{\$mode objfpc}{\$coperators on}uses  SysUtils, Math;type  TPrimeList = array of DWord;function OddSegmentSieve(aLimit: DWord): TPrimeList;  function EstimatePrimeCount(aLimit: DWord): DWord;  begin    case aLimit of      0..1:   Result := 0;      2..200: Result := Trunc(1.6 * aLimit/Ln(aLimit)) + 1;    else      Result := Trunc(aLimit/(Ln(aLimit) - 2)) + 1;    end;  end;  function Sieve(aLimit: DWord; aNeed2: Boolean): TPrimeList;  var    IsPrime: array of Boolean;    I: DWord = 3;    J, SqrtBound: DWord;    Count: Integer = 0;  begin    if aLimit < 2 then      exit(nil);    SetLength(IsPrime, (aLimit - 1) div 2);    FillChar(Pointer(IsPrime)^, Length(IsPrime), Byte(True));    SetLength(Result, EstimatePrimeCount(aLimit));    SqrtBound := Trunc(Sqrt(aLimit));    if aNeed2 then      begin        Result := 2;        Inc(Count);      end;    for I := 0 to High(IsPrime) do      if IsPrime[I] then        begin          Result[Count] := I * 2 + 3;          if Result[Count] <= SqrtBound then            begin              J := Result[Count] * Result[Count];              repeat                IsPrime[(J - 3) div 2] := False;                J += Result[Count] * 2;              until J > aLimit;            end;          Inc(Count);        end;    SetLength(Result, Count);  end;const  PAGE_SIZE = \$8000;var  IsPrime: array[0..Pred(PAGE_SIZE)] of Boolean; //current page  SmallPrimes: TPrimeList = nil;  I: QWord;  J, PageHigh, Prime: DWord;  Count: Integer;begin  if aLimit < PAGE_SIZE div 4 then    exit(Sieve(aLimit, True));  I := Trunc(Sqrt(aLimit));  SmallPrimes := Sieve(I + 1, False);  Count := Length(SmallPrimes) + 1;  I += Ord(not Odd(I));  SetLength(Result, EstimatePrimeCount(aLimit));  while I <= aLimit do    begin      PageHigh := Min(Pred(PAGE_SIZE * 2), aLimit - I);      FillChar(IsPrime, PageHigh div 2 + 1, Byte(True));      for Prime in SmallPrimes do        begin          J := DWord(I) mod Prime;          if J <> 0 then            J := Prime shl (1 - J and 1) - J;          while J <= PageHigh do            begin              IsPrime[J div 2] := False;              J += Prime * 2;            end;        end;      for J := 0 to PageHigh div 2 do        if IsPrime[J] then          begin            Result[Count] := J * 2 + I;            Inc(Count);          end;      I += PAGE_SIZE * 2;    end;  SetLength(Result, Count);  Result := 2;  Move(SmallPrimes, Result, Length(SmallPrimes) * SizeOf(DWord));end;   //usage var  Limit: DWord = 0;function ReadLimit: Boolean;var  Lim: Int64;begin  if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then    if (Lim >= 0) and (Lim <= High(DWord)) then      begin        Limit := DWord(Lim);        exit(True);      end;  Result := False;end;procedure PrintUsage;begin  WriteLn('Usage: prime_sieve Limit');  WriteLn('  where Limit in the range [0, ', High(DWord), ']');  Halt;end;procedure PrintPrimes(const aList: TPrimeList);var  I: DWord;begin  for I := 0 to Length(aList) - 2 do    Write(aList[I], ', ');  WriteLn(aList[High(aList)]);end;begin  if not ReadLimit then    PrintUsage;  PrintPrimes(OddSegmentSieve(Limit));end. `

## FreeBASIC

`' FB 1.05.0 Sub sieve(n As Integer)  If n < 2 Then Return  Dim a(2 To n) As Integer  For i As Integer = 2 To n : a(i) = i : Next  Dim As Integer p = 2, q  ' mark non-prime numbers by setting the corresponding array element to 0  Do    For j As Integer = p * p To n Step p      a(j) = 0    Next j    ' look for next non-zero element in array after 'p'    q = 0    For j As Integer = p + 1 To Sqr(n)      If a(j) <> 0 Then        q = j        Exit For      End If    Next j        If q = 0 Then Exit Do    p = q  Loop   ' print the non-zero numbers remaining i.e. the primes  For i As Integer = 2 To n    If a(i) <> 0 Then      Print Using "####"; a(i);          End If  Next  PrintEnd Sub Print "The primes up to 1000 are :"Printsieve(1000)PrintPrint "Press any key to quit"Sleep`
Output:
```The primes up to 1000 are :

2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
```

## Frink

` n = eval[input["Enter highest number: "]]results = array[sieve[n]]println[results]println[length[results] + " prime numbers less than or equal to " + n] sieve[n] :={   // Initialize array   array = array[0 to n]   [email protected] = 0    for i = 2 to ceil[sqrt[n]]      if [email protected] != 0         for j = i^2 to n step i            [email protected] = 0    return select[array, { |x| x != 0 }]} `

## Furor

Note: With benchmark function

` tick sto startingtick#g 100000 sto MAX@MAX mem !maximize sto primeNumbersone count@primeNumbers 0 2 [^]2 @MAX külső: {||@count {|{}§külső {} []@primeNumbers !/ else{<}§külső|} // @count vége@primeNumbers @count++ {} [^]|} // @MAX vége@primeNumbers free."Time : " tick @startingtick - print ." tick\n"."Prímek száma = " @count printnlend{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” } `

## FutureBasic

### Basic sieve of array of booleans

` include "ConsoleWindow" begin globalsdim dynamic gPrimes(1) as Booleanend globals local fn SieveOfEratosthenes( n as long )dim as long i, j for i = 2 to  n  for j = i * i to n step i     gPrimes(j) = _true  next  if gPrimes(i) = 0 then print i;next ikill gPrimesend fn fn SieveOfEratosthenes( 100 ) `

Output:

``` 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

## Fōrmulæ

Fōrmulæ programs are not textual, visualization/edition of programs is done showing/manipulating structures but not text. Moreover, there can be multiple visual representations of the same program. Even though it is possible to have textual representation —i.e. XML, JSON— they are intended for storage and transfer purposes more than visualization and edition.

Programs in Fōrmulæ are created/edited online in its website, However they run on execution servers. By default remote servers are used, but they are limited in memory and processing power, since they are intended for demonstration and casual use. A local server can be downloaded and installed, it has no limitations (it runs in your own computer). Because of that, example programs can be fully visualized and edited, but some of them will not run if they require a moderate or heavy computation/memory resources, and no local server is being used.

## GAP

`Eratosthenes := function(n)    local a, i, j;    a := ListWithIdenticalEntries(n, true);    if n < 2 then        return [];    else        for i in [2 .. n] do            if a[i] then                j := i*i;                if j > n then                    return Filtered([2 .. n], i -> a[i]);                else                    while j <= n do                        a[j] := false;                        j := j + i;                    od;                fi;            fi;        od;    fi;end; Eratosthenes(100); [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]`

## GLBasic

`// Sieve of Eratosthenes (find primes)// GLBasic implementation  GLOBAL n%, k%, limit%, flags%[] limit = 100			// search primes up to this number DIM flags[limit+1]		// GLBasic arrays start at 0 FOR n = 2 TO SQR(limit)    IF flags[n] = 0        FOR k = n*n TO limit STEP n            flags[k] = 1        NEXT    ENDIFNEXT // Display the primesFOR n = 2 TO limit    IF flags[n] = 0 THEN STDOUT n + ", "NEXT KEYWAIT `

## Go

### Basic sieve of array of booleans

`package mainimport "fmt" func main() {    const limit = 201 // means sieve numbers < 201     // sieve    c := make([]bool, limit) // c for composite.  false means prime candidate    c = true              // 1 not considered prime    p := 2    for {        // first allowed optimization:  outer loop only goes to sqrt(limit)        p2 := p * p        if p2 >= limit {            break        }        // second allowed optimization:  inner loop starts at sqr(p)        for i := p2; i < limit; i += p {            c[i] = true // it's a composite         }        // scan to get next prime for outer loop        for {            p++            if !c[p] {                break            }        }    }     // sieve complete.  now print a representation.    for n := 1; n < limit; n++ {        if c[n] {            fmt.Print("  .")        } else {            fmt.Printf("%3d", n)        }        if n%20 == 0 {            fmt.Println("")        }    }}`

Output:

```  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19  .
.  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .  .
41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59  .
61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79  .
.  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .  .
101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .  .
.  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139  .
.  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .  .
.  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179  .
181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199  .
```

### Odds-only bit-packed array output-enumerating version

The above version's output is rather specialized; the following version uses a closure function to enumerate over the culled composite number array, which is bit packed. By using this scheme for output, no extra memory is required above that required for the culling array:

`package main import (	"fmt"	"math") func primesOdds(top uint) func() uint {	topndx := int((top - 3) / 2)	topsqrtndx := (int(math.Sqrt(float64(top))) - 3) / 2	cmpsts := make([]uint, (topndx/32)+1)	for i := 0; i <= topsqrtndx; i++ {		if cmpsts[i>>5]&(uint(1)<<(uint(i)&0x1F)) == 0 {			p := (i << 1) + 3			for j := (p*p - 3) >> 1; j <= topndx; j += p {				cmpsts[j>>5] |= 1 << (uint(j) & 0x1F)			}		}	}	i := -1	return func() uint {		oi := i		if i <= topndx {			i++		}		for i <= topndx && cmpsts[i>>5]&(1<<(uint(i)&0x1F)) != 0 {			i++		}		if oi < 0 {			return 2		} else {			return (uint(oi) << 1) + 3		}	}} func main() {	iter := primesOdds(100)	for v := iter(); v <= 100; v = iter() {		print(v, " ")	}	iter = primesOdds(1000000)	count := 0	for v := iter(); v <= 1000000; v = iter() {		count++	}	fmt.Printf("\r\n%v\r\n", count)}`
Output:
```2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
78498```

### Sieve Tree

A fairly odd sieve tree method:

`package mainimport "fmt" type xint uint64type xgen func()(xint) func primes() func()(xint) {	pp, psq := make([]xint, 0), xint(25) 	var sieve func(xint, xint)xgen	sieve = func(p, n xint) xgen {		m, next := xint(0), xgen(nil)		return func()(r xint) {			if next == nil {				r = n				if r <= psq {					n += p					return				} 				next = sieve(pp * 2, psq) // chain in				pp = pp[1:]				psq = pp * pp 				m = next()			}			switch {			case n < m: r, n = n, n + p			case n > m: r, m = m, next()			default:    r, n, m = n, n + p, next()			}			return		}	} 	f := sieve(6, 9)	n, p := f(), xint(0) 	return func()(xint) {		switch {		case p < 2: p = 2		case p < 3: p = 3		default:			for p += 2; p == n; {				p += 2				if p > n {					n = f()				}			}			pp = append(pp, p)		}		return p	}} func main() {	for i, p := 0, primes(); i < 100000; i++ {		fmt.Println(p())	}}`

### Concurrent Daisy-chain sieve

A concurrent prime sieve adopted from the example in the "Go Playground" window at http://golang.org/

`package mainimport "fmt" // Send the sequence 2, 3, 4, ... to channel 'out'func Generate(out chan<- int) {	for i := 2; ; i++ {		out <- i                  // Send 'i' to channel 'out'	}} // Copy the values from 'in' channel to 'out' channel,//   removing the multiples of 'prime' by counting.// 'in' is assumed to send increasing numbersfunc Filter(in <-chan int, out chan<- int, prime int) {        m := prime + prime                // first multiple of prime	for {		i := <- in                // Receive value from 'in'		for i > m {			m = m + prime     // next multiple of prime			}		if i < m {			out <- i          // Send 'i' to 'out'			}	}} // The prime sieve: Daisy-chain Filter processesfunc Sieve(out chan<- int`