Legendre prime counting function: Difference between revisions

Line 651:

=={{header|Haskell}}==

With Memoization:

Memoization utilities:

<syntaxhighlight lang="haskell">~~data~~ ~~Memo~~ a = ~~Node~~ ~~a (Memo a) (Memo a)~~

<syntaxhighlight lang="haskell">{-# OPTIONS_GHC -O2 -fllvm -Wno-incomplete-patterns #-}

{-# LANGUAGE DeriveFunctor #-}

import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing

import Data.Int ( Int64 )

import Data.Bits ( Bits( shiftL, shiftR ) )

data Memo a = EmptyNode | Node a (Memo a) (Memo a)

deriving Functor

Line 673:

Line 681:

memoList = memo . mkList

where

mkList [] = EmptyNode -- never used; makes complete

mkList (x:xs) = Node x (mkList l) (mkList r)

where (l,r) = split xs

split [] = ([],[])

split [x] = ([x],[])

split (x:y:xs) = let (l,r) = split xs in (x:l, y:r)~~</syntaxhighlight>~~

split (x:y:xs) = let (l,r) = split xs in (x:l, y:r)

isqrt :: Integer -> Integer

Computation of Legendre function:

<syntaxhighlight lang="haskell">isqrt :: Integer -> Integer

isqrt n = go n 0 (q `shiftR` 2)

where

Line 690:

Line 698:

else go z (r `shiftR` 1) (q `shiftR` 2)

primes :: [Integer]

primes = 2 : _Y ((3:) . gaps 5 . _U . map(\p-> [p*p, p*p+2*p..])) where

_Y g = g (_Y g) -- = g (g (g ( ... ))) non-sharing multistage fixpoint combinator

gaps k s@(c:cs) | k < c = k : gaps (k+2) s -- ~= ([k,k+2..] \\ s)

| otherwise = gaps (k+2) cs -- when null(s\\[k,k+2..])

_U ((x:xs):t) = x : (merge xs . _U . pairs) t -- tree-shaped folding big union

pairs (xs:ys:t) = merge xs ys : pairs t

merge xs@(x:xt) ys@(y:yt) | x < y = x : merge xt ys

| y < x = y : merge xs yt

| otherwise = x : merge xt yt

phi :: Integer -> Integer -> Integer

phi = memoize2 phiM

where

phiM x 0 = x

phiM x a = phi x (a-1) - phi (x `div` p a) (a - 1)

p = memoList (undefined : primes)

Line 703:

Line 723:

where a = legendrePi (floor (sqrt (fromInteger n)))

main :: IO ()

main = mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (legendrePi (10^n))) [1..7]</syntaxhighlight>

main = do

strt <- getPOSIXTime

mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (legendrePi (10^n))) [0..9]

stop <- getPOSIXTime

let elpsd = round $ 1e3 * (stop - strt) :: Int64

putStrLn $ "This last took " ++ show elpsd ++ " milliseconds."</syntaxhighlight>

<pre>0 0

1 4

2 25

3 168

4 1229

5 9592

6 78498

7 664579

8 5761455

9 50847534

This last took 20383 milliseconds.</pre>

The above code reveals all of the problems with the usual Haskell developer with overuse of the multi-precision `Integer` type when there is no way that the range of a 64-bit integer will ever be exceeded, overuse of lists, developed and tested using the interpreter, etc., such that, even compiled using the LLVM back-end, the above code is about ten times slower that any other compiled language for a memoized version.

===Non Memoized Haskell Versions===

There doesn't seem to be much point to improving the above memoized version even though one could gain a factor of about ten in doing so as the need for memoization seems to be the result of the task author not realizing that there is a simple way of keeping the computation complexity from having exponential growth with counting range by limiting the "splitting" of the "phi" calculation when the result must be just one. Accordingly, the following Haskell versions are [translated from the non-memoizing Nim versions](https://rosettacode.org/wiki/Legendre_prime_counting_function#Non-Memoized_Versions), which explanations can be referenced for an understanding of how they work:

'''The Basic Recursive Legendre Algorithm'''

<syntaxhighlight lang="haskell">{-# OPTIONS_GHC -O2 -fllvm #-}

{-# LANGUAGE FlexibleContexts, BangPatterns #-}

import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing

import Data.Int ( Int64 )

import Data.Word ( Word32, Word64 )

import Data.Bits ( Bits( (.&.), (.|.), shiftL, shiftR ) )

import Control.Monad ( unless, when, forM_ )

import Control.Monad.ST ( ST, runST )

import Data.Array.ST ( runSTUArray )

import Data.Array.Base ( UArray(..), IArray(unsafeAt), listArray, elems, assocs,

MArray( unsafeNewArray_, newArray, unsafeRead, unsafeWrite ),

STUArray, unsafeFreezeSTUArray, castSTUArray )

countPrimes :: Word64 -> Int64

countPrimes n =

if n < 3 then (if n < 2 then 0 else 1) else

let sqrtn = truncate $ sqrt $ fromIntegral n

qdrtn = truncate $ sqrt $ fromIntegral sqrtn

rtlmt = (sqrtn - 3) `div` 2

qrtlmt = (qdrtn - 3) `div` 2

oddPrimes@(UArray _ _ psz _) = runST $ do -- UArray of odd primes...

cmpsts <- newArray (0, rtlmt) False :: ST s (STUArray s Int Bool)

forM_ [ 0 .. qrtlmt ] $ \ i -> do

t <- unsafeRead cmpsts i

unless t $ do

let sqri = (i + i) * (i + 3) + 3

bp = i + i + 3

forM_ [ sqri, sqri + bp .. rtlmt ] $ \ c ->

unsafeWrite cmpsts c True

fcmpsts <- unsafeFreezeSTUArray cmpsts

let !numoprms = sum $ [ 1 | False <- elems fcmpsts ]

prms = [ fromIntegral $ i + i + 3 | (i, False) <- assocs fcmpsts ]

return $ listArray (0, numoprms - 1) prms :: ST s (UArray Int Word32)

phi x a =

if a < 1 then x - (x `shiftR` 1) else

let na = a - 1

p = fromIntegral $ unsafeAt oddPrimes na in

if x < p then 1 else

phi x na - phi (x `div` p) na

in fromIntegral (phi n psz) + fromIntegral psz

main :: IO ()

main = do

strt <- getPOSIXTime

mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (countPrimesx (10^n))) [ 0 .. 9 ]

stop <- getPOSIXTime

let elpsd = round $ 1e3 * (stop - strt) :: Int64

putStrLn $ "This took " ++ show elpsd ++ " milliseconds."</syntaxhighlight>

<pre>0 0

1 4

2 25

3 168

4 1229

5 9592

6 78498

7 664579

8 5761455

9 50847534

This took 273 milliseconds.</pre>

The time is as run on an Intel i5-6500 (3.6 GHz single-threaded boost).

Although this is much faster than if it were using memoization (and uses much less memory), it is only reasonably usable for trivial ranges such as this (trivial in terms of prime counting functions).

'''The Less Recursive "Bottom-Up" with a TinyPhi LUT Algorithm'''

Just substitute the following Haskell code for the `countPrimes` function above to enjoy the gain in speed:<syntaxhighlight lang="haskell">cTinyPhiPrimes :: [Int]

cTinyPhiPrimes = [ 2, 3, 5, 7, 11, 13 ]

cC :: Int

cC = length cTinyPhiPrimes - 1

cTinyPhiOddCirc :: Int

cTinyPhiOddCirc = product cTinyPhiPrimes `div` 2

cTinyPhiTot :: Int

cTinyPhiTot = product [ p - 1 | p <- cTinyPhiPrimes ]

cTinyPhiLUT :: UArray Int Word32

cTinyPhiLUT = runSTUArray $ do

ma <- newArray (0, cTinyPhiOddCirc - 1) 1

forM_ (drop 1 cTinyPhiPrimes) $ \ bp -> do

let i = (bp - 1) `shiftR` 1

let sqri = (i + i) * (i + 1)

unsafeWrite ma i 0

forM_ [ sqri, sqri + bp .. cTinyPhiOddCirc - 1 ] $ \ c -> unsafeWrite ma c 0

let tot i acc =

if i >= cTinyPhiOddCirc then return ma else do

v <- unsafeRead ma i

if v == 0 then do unsafeWrite ma i acc; tot (i + 1) acc

else do let nacc = acc + 1

unsafeWrite ma i nacc; tot (i + 1) nacc

tot 0 0

tinyPhi :: Word64 -> Int64

tinyPhi n =

let on = (n - 1) `shiftR` 1

numcyc = on `div` fromIntegral cTinyPhiOddCirc

rem = fromIntegral $ on - numcyc * fromIntegral cTinyPhiOddCirc

in fromIntegral numcyc * fromIntegral cTinyPhiTot +

fromIntegral (unsafeAt cTinyPhiLUT rem)

countPrimes :: Word64 -> Int64

countPrimes n =

if n < 3 then (if n < 2 then 0 else 1) else

let sqrtn = truncate $ sqrt $ fromIntegral n

qdrtn = truncate $ sqrt $ fromIntegral sqrtn

rtlmt = (sqrtn - 3) `div` 2

qrtlmt = (qdrtn - 3) `div` 2

oddPrimes@(UArray _ _ psz _) = runST $ do -- UArray of odd primes...

cmpsts <- newArray (0, rtlmt) False :: ST s (STUArray s Int Bool)

forM_ [ 0 .. qrtlmt ] $ \ i -> do

t <- unsafeRead cmpsts i

unless t $ do

let sqri = (i + i) * (i + 3) + 3

bp = i + i + 3

forM_ [ sqri, sqri + bp .. rtlmt ] $ \ c ->

unsafeWrite cmpsts c True

fcmpsts <- unsafeFreezeSTUArray cmpsts

let !numoprms = sum $ [ 1 | False <- elems fcmpsts ]

prms = [ fromIntegral $ i + i + 3 | (i, False) <- assocs fcmpsts ]

return $ listArray (0, numoprms - 1) prms :: ST s (UArray Int Word32)

lvl pi pilmt !m !acc =

if pi >= pilmt then acc else

let p = fromIntegral $ unsafeAt oddPrimes pi

nm = m * p in

if n <= nm * p then acc + fromIntegral (pilmt - pi) else

let !q = fromIntegral $ n `div` nm

!nacc = acc + tinyPhi q

!sacc = if pi <= cC then 0 else lvl cC pi nm 0

in lvl (pi + 1) pilmt m $ nacc - sacc

in tinyPhi n - lvl cC psz 1 0 + fromIntegral psz</syntaxhighlight>

Use of the above function gets a gain in speed of about a further fifteen times over the above version due to less recursion and the use of the "Tiny_Phi" Look Up Table (LUT) for smaller "degrees" of primes which greatly reduces the number of integer divisions. This version of prime counting function might be considered to be reasonably useful for counting primes up to a trillion (1e12) in a little over ten seconds.

'''The Non-Recursive Partial Sieving Algorithm'''

Just substitute the following Haskell code for the `countPrimes` function above to enjoy the further gain in speed:

<syntaxhighlight lang="haskell">countPrimesxx :: Word64 -> Int64

countPrimesxx n =

if n < 3 then (if n < 2 then 0 else 1) else

let

{-# INLINE divide #-}

divide :: Word64 -> Word64 -> Int

divide nm d = truncate $ (fromIntegral nm :: Double) / fromIntegral d

{-# INLINE half #-}

half :: Int -> Int

half x = (x - 1) `shiftR` 1

rtlmt = floor $ sqrt (fromIntegral n :: Double)

mxndx = (rtlmt - 1) `div` 2

(!nbps, !nrs, !smalls, !roughs, !larges) = runST $ do

mss <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Word32)

let msscst =

castSTUArray :: STUArray s Int Word32 -> ST s (STUArray s Int Int64)

mdss <- msscst mss -- for use in adjing counts LUT

forM_ [ 0 .. mxndx ] $ \ i -> unsafeWrite mss i (fromIntegral i)

mrs <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Word32)

forM_ [ 0 .. mxndx ] $ \ i -> unsafeWrite mrs i (fromIntegral i * 2 + 1)

mls <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Int64)

forM_ [ 0 .. mxndx ] $ \ i ->

let d = fromIntegral (i + i + 1)

in unsafeWrite mls i (fromIntegral (divide n d - 1) `div` 2)

cmpsts <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Bool)

let loop i !cbpi !rlmti =

let sqri = (i + i) * (i + 1) in

if sqri > mxndx then do

fss <- unsafeFreezeSTUArray mss

frs <- unsafeFreezeSTUArray mrs

fls <- unsafeFreezeSTUArray mls

return (cbpi, rlmti + 1, fss, frs, fls)

else do

v <- unsafeRead cmpsts i

if v then loop (i + 1) cbpi rlmti else do

unsafeWrite cmpsts i True -- cull current bp so not a "k-rough"!

let bp = i + i + 1

-- partial cull by current base prime, bp...

cull c = if c > mxndx then return () else do

unsafeWrite cmpsts c True; cull (c + bp)

-- adjust `larges` according to partial sieve...

part ri nri = -- old "rough" index to new one...

if ri > rlmti then return (nri - 1) else do

r <- unsafeRead mrs ri -- "rough" always odd!

t <- unsafeRead cmpsts (fromIntegral r `shiftR` 1)

if t then part (ri + 1) nri else do -- skip newly culled

olv <- unsafeRead mls ri

let m = fromIntegral r * fromIntegral bp

adjv <- if m <= fromIntegral rtlmt then do

let ndx = fromIntegral m `shiftR` 1

sv <- unsafeRead mss ndx

unsafeRead mls (fromIntegral sv - cbpi)

else do

sv <- unsafeRead mss (half (divide n m))

return (fromIntegral sv)

unsafeWrite mls nri (olv - (adjv - fromIntegral cbpi))

unsafeWrite mrs nri r; part (ri + 1) (nri + 1)

!pm0 = ((rtlmt `div` bp) - 1) .|. 1 -- max base prime mult

adjc lmti pm = -- adjust smalls according to partial sieve:

if pm < bp then return () else do

c <- unsafeRead mss (pm `shiftR` 1)

let ac = c - fromIntegral cbpi -- correction

bi = (pm * bp) `shiftR` 1 -- start array index

adj si = if si > lmti then adjc (bi - 1) (pm - 2)

else do ov <- unsafeRead mss si

unsafeWrite mss si (ov - ac)

adj (si + 1)

adj bi

dadjc lmti pm =

if pm < bp then return () else do

c <- unsafeRead mss (pm `shiftR` 1)

let ac = c - fromIntegral cbpi -- correction

bi = (pm * bp) `shiftR` 1 -- start array index

ac64 = fromIntegral ac :: Int64

dac = (ac64 `shiftL` 32) .|. ac64

dbi = (bi + 1) `shiftR` 1

dlmti = (lmti - 1) `shiftR` 1

dadj dsi = if dsi > dlmti then return ()

else do dov <- unsafeRead mdss dsi

unsafeWrite mdss dsi (dov - dac)

dadj (dsi + 1)

when (bi .&. 1 /= 0) $ do

ov <- unsafeRead mss bi

unsafeWrite mss bi (ov - ac)

dadj dbi

when (lmti .&. 1 == 0) $ do

ov <- unsafeRead mss lmti

unsafeWrite mss lmti (ov - ac)

adjc (bi - 1) (pm - 2)

cull sqri; nrlmti <- part 0 0

dadjc mxndx pm0

loop (i + 1) (cbpi + 1) nrlmti

loop 1 0 mxndx

!ans0 = unsafeAt larges 0 - -- combine all counts; each includes nbps...

sum [ unsafeAt larges i | i <- [ 1 .. nrs - 1 ] ]

-- adjust for all the base prime counts subracted above...

!adj = (nrs + 2 * (nbps - 1)) * (nrs - 1) `div` 2

!adjans0 = ans0 + fromIntegral adj

loopr ri !acc = -- o final phi calculation for pairs of larger primes...

let r = fromIntegral (unsafeAt roughs ri)

q = n `div` r

lmtsi = half (fromIntegral (q `div` r))

lmti = fromIntegral (unsafeAt smalls lmtsi) - nbps

addcnt pi !ac =

if pi > lmti then ac else

let p = fromIntegral (unsafeAt roughs pi)

ci = half (fromIntegral (divide q p))

in addcnt (pi + 1) (ac + fromIntegral (unsafeAt smalls ci))

in if lmti <= ri then acc else -- break when up to cube root of range!

-- adjust for the `nbps`'s over added in the `smalls` counts...

let !adj = fromIntegral ((lmti - ri) * (nbps + ri - 1))

in loopr (ri + 1) (addcnt (ri + 1) acc - adj)

in loopr 1 adjans0 + 1 -- add one for only even prime of two!</syntaxhighlight>

The above function enjoys quite a large gain in speed of about a further ten times over the immediately preceding version due to not using recursion at all and the greatly reduced computational complexity of O(n**(3/4)/((log n)**2)) instead of the almost-linear-for-large-counting-ranges O(n/((log n)**2)) for the previous two versions, although the previous two versions differ in performance by a constant factor due to the overheads of recursion and division. This version of prime counting function might be considered to be reasonably useful for counting primes up to 1e16 in just over a minute as long as the computer used has the required free memory of about 800 Megabytes, counting the primes to 1e11 in under 30 milliseconds. This Haskell version is about twenty percent slower than the Nim version from which it was translated because Nim's default GCC back-end is better at optimization (at least for this algorithm) than the LLVM back-end used by Haskell here.

<pre>λ> main

1 4

2 25

3 168

4 1229

5 9592

6 78498

7 664579

8 5761455

9 50847534</pre>

=={{header|J}}==