Legendre prime counting function: Difference between revisions
Content added Content deleted
GordonBGood (talk | contribs) m (→Non-Memoized Versions: Nim - tuned comments on partial sieving implementation...) |
GordonBGood (talk | contribs) (→{{header|Haskell}}: make memoized code compile; add non-memoized versions...) |
||
Line 651: | Line 651: | ||
=={{header|Haskell}}== |
=={{header|Haskell}}== |
||
With Memoization: |
|||
Memoization utilities: |
|||
<syntaxhighlight lang="haskell"> |
<syntaxhighlight lang="haskell">{-# OPTIONS_GHC -O2 -fllvm -Wno-incomplete-patterns #-} |
||
{-# LANGUAGE DeriveFunctor #-} |
|||
import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing |
|||
import Data.Int ( Int64 ) |
|||
import Data.Bits ( Bits( shiftL, shiftR ) ) |
|||
data Memo a = EmptyNode | Node a (Memo a) (Memo a) |
|||
deriving Functor |
deriving Functor |
||
Line 673: | Line 681: | ||
memoList = memo . mkList |
memoList = memo . mkList |
||
where |
where |
||
mkList [] = EmptyNode -- never used; makes complete |
|||
mkList (x:xs) = Node x (mkList l) (mkList r) |
mkList (x:xs) = Node x (mkList l) (mkList r) |
||
where (l,r) = split xs |
where (l,r) = split xs |
||
split [] = ([],[]) |
split [] = ([],[]) |
||
split [x] = ([x],[]) |
split [x] = ([x],[]) |
||
split (x:y:xs) = let (l,r) = split xs in (x:l, y:r) |
split (x:y:xs) = let (l,r) = split xs in (x:l, y:r) |
||
isqrt :: Integer -> Integer |
|||
Computation of Legendre function: |
|||
<syntaxhighlight lang="haskell">isqrt :: Integer -> Integer |
|||
isqrt n = go n 0 (q `shiftR` 2) |
isqrt n = go n 0 (q `shiftR` 2) |
||
where |
where |
||
Line 690: | Line 698: | ||
else go z (r `shiftR` 1) (q `shiftR` 2) |
else go z (r `shiftR` 1) (q `shiftR` 2) |
||
primes :: [Integer] |
|||
primes = 2 : _Y ((3:) . gaps 5 . _U . map(\p-> [p*p, p*p+2*p..])) where |
|||
_Y g = g (_Y g) -- = g (g (g ( ... ))) non-sharing multistage fixpoint combinator |
|||
gaps k s@(c:cs) | k < c = k : gaps (k+2) s -- ~= ([k,k+2..] \\ s) |
|||
| otherwise = gaps (k+2) cs -- when null(s\\[k,k+2..]) |
|||
_U ((x:xs):t) = x : (merge xs . _U . pairs) t -- tree-shaped folding big union |
|||
pairs (xs:ys:t) = merge xs ys : pairs t |
|||
merge xs@(x:xt) ys@(y:yt) | x < y = x : merge xt ys |
|||
| y < x = y : merge xs yt |
|||
| otherwise = x : merge xt yt |
|||
phi :: Integer -> Integer -> Integer |
|||
phi = memoize2 phiM |
phi = memoize2 phiM |
||
where |
where |
||
phiM x 0 = x |
phiM x 0 = x |
||
phiM x a = phi x (a-1) - phi (x `div` p a) (a - 1) |
phiM x a = phi x (a-1) - phi (x `div` p a) (a - 1) |
||
p = memoList (undefined : primes) |
p = memoList (undefined : primes) |
||
Line 703: | Line 723: | ||
where a = legendrePi (floor (sqrt (fromInteger n))) |
where a = legendrePi (floor (sqrt (fromInteger n))) |
||
main :: IO () |
|||
main = mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (legendrePi (10^n))) [1..7]</syntaxhighlight> |
|||
main = do |
|||
strt <- getPOSIXTime |
|||
mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (legendrePi (10^n))) [0..9] |
|||
stop <- getPOSIXTime |
|||
let elpsd = round $ 1e3 * (stop - strt) :: Int64 |
|||
putStrLn $ "This last took " ++ show elpsd ++ " milliseconds."</syntaxhighlight> |
|||
<pre>0 0 |
|||
1 4 |
|||
2 25 |
|||
3 168 |
|||
4 1229 |
|||
5 9592 |
|||
6 78498 |
|||
7 664579 |
|||
8 5761455 |
|||
9 50847534 |
|||
This last took 20383 milliseconds.</pre> |
|||
The above code reveals all of the problems with the usual Haskell developer with overuse of the multi-precision `Integer` type when there is no way that the range of a 64-bit integer will ever be exceeded, overuse of lists, developed and tested using the interpreter, etc., such that, even compiled using the LLVM back-end, the above code is about ten times slower that any other compiled language for a memoized version. |
|||
===Non Memoized Haskell Versions=== |
|||
There doesn't seem to be much point to improving the above memoized version even though one could gain a factor of about ten in doing so as the need for memoization seems to be the result of the task author not realizing that there is a simple way of keeping the computation complexity from having exponential growth with counting range by limiting the "splitting" of the "phi" calculation when the result must be just one. Accordingly, the following Haskell versions are [translated from the non-memoizing Nim versions](https://rosettacode.org/wiki/Legendre_prime_counting_function#Non-Memoized_Versions), which explanations can be referenced for an understanding of how they work: |
|||
'''The Basic Recursive Legendre Algorithm''' |
|||
<syntaxhighlight lang="haskell">{-# OPTIONS_GHC -O2 -fllvm #-} |
|||
{-# LANGUAGE FlexibleContexts, BangPatterns #-} |
|||
import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing |
|||
import Data.Int ( Int64 ) |
|||
import Data.Word ( Word32, Word64 ) |
|||
import Data.Bits ( Bits( (.&.), (.|.), shiftL, shiftR ) ) |
|||
import Control.Monad ( unless, when, forM_ ) |
|||
import Control.Monad.ST ( ST, runST ) |
|||
import Data.Array.ST ( runSTUArray ) |
|||
import Data.Array.Base ( UArray(..), IArray(unsafeAt), listArray, elems, assocs, |
|||
MArray( unsafeNewArray_, newArray, unsafeRead, unsafeWrite ), |
|||
STUArray, unsafeFreezeSTUArray, castSTUArray ) |
|||
countPrimes :: Word64 -> Int64 |
|||
countPrimes n = |
|||
if n < 3 then (if n < 2 then 0 else 1) else |
|||
let sqrtn = truncate $ sqrt $ fromIntegral n |
|||
qdrtn = truncate $ sqrt $ fromIntegral sqrtn |
|||
rtlmt = (sqrtn - 3) `div` 2 |
|||
qrtlmt = (qdrtn - 3) `div` 2 |
|||
oddPrimes@(UArray _ _ psz _) = runST $ do -- UArray of odd primes... |
|||
cmpsts <- newArray (0, rtlmt) False :: ST s (STUArray s Int Bool) |
|||
forM_ [ 0 .. qrtlmt ] $ \ i -> do |
|||
t <- unsafeRead cmpsts i |
|||
unless t $ do |
|||
let sqri = (i + i) * (i + 3) + 3 |
|||
bp = i + i + 3 |
|||
forM_ [ sqri, sqri + bp .. rtlmt ] $ \ c -> |
|||
unsafeWrite cmpsts c True |
|||
fcmpsts <- unsafeFreezeSTUArray cmpsts |
|||
let !numoprms = sum $ [ 1 | False <- elems fcmpsts ] |
|||
prms = [ fromIntegral $ i + i + 3 | (i, False) <- assocs fcmpsts ] |
|||
return $ listArray (0, numoprms - 1) prms :: ST s (UArray Int Word32) |
|||
phi x a = |
|||
if a < 1 then x - (x `shiftR` 1) else |
|||
let na = a - 1 |
|||
p = fromIntegral $ unsafeAt oddPrimes na in |
|||
if x < p then 1 else |
|||
phi x na - phi (x `div` p) na |
|||
in fromIntegral (phi n psz) + fromIntegral psz |
|||
main :: IO () |
|||
main = do |
|||
strt <- getPOSIXTime |
|||
mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (countPrimesx (10^n))) [ 0 .. 9 ] |
|||
stop <- getPOSIXTime |
|||
let elpsd = round $ 1e3 * (stop - strt) :: Int64 |
|||
putStrLn $ "This took " ++ show elpsd ++ " milliseconds."</syntaxhighlight> |
|||
{{out}} |
|||
<pre>0 0 |
|||
1 4 |
|||
2 25 |
|||
3 168 |
|||
4 1229 |
|||
5 9592 |
|||
6 78498 |
|||
7 664579 |
|||
8 5761455 |
|||
9 50847534 |
|||
This took 273 milliseconds.</pre> |
|||
The time is as run on an Intel i5-6500 (3.6 GHz single-threaded boost). |
|||
Although this is much faster than if it were using memoization (and uses much less memory), it is only reasonably usable for trivial ranges such as this (trivial in terms of prime counting functions). |
|||
'''The Less Recursive "Bottom-Up" with a TinyPhi LUT Algorithm''' |
|||
Just substitute the following Haskell code for the `countPrimes` function above to enjoy the gain in speed:<syntaxhighlight lang="haskell">cTinyPhiPrimes :: [Int] |
|||
cTinyPhiPrimes = [ 2, 3, 5, 7, 11, 13 ] |
|||
cC :: Int |
|||
cC = length cTinyPhiPrimes - 1 |
|||
cTinyPhiOddCirc :: Int |
|||
cTinyPhiOddCirc = product cTinyPhiPrimes `div` 2 |
|||
cTinyPhiTot :: Int |
|||
cTinyPhiTot = product [ p - 1 | p <- cTinyPhiPrimes ] |
|||
cTinyPhiLUT :: UArray Int Word32 |
|||
cTinyPhiLUT = runSTUArray $ do |
|||
ma <- newArray (0, cTinyPhiOddCirc - 1) 1 |
|||
forM_ (drop 1 cTinyPhiPrimes) $ \ bp -> do |
|||
let i = (bp - 1) `shiftR` 1 |
|||
let sqri = (i + i) * (i + 1) |
|||
unsafeWrite ma i 0 |
|||
forM_ [ sqri, sqri + bp .. cTinyPhiOddCirc - 1 ] $ \ c -> unsafeWrite ma c 0 |
|||
let tot i acc = |
|||
if i >= cTinyPhiOddCirc then return ma else do |
|||
v <- unsafeRead ma i |
|||
if v == 0 then do unsafeWrite ma i acc; tot (i + 1) acc |
|||
else do let nacc = acc + 1 |
|||
unsafeWrite ma i nacc; tot (i + 1) nacc |
|||
tot 0 0 |
|||
tinyPhi :: Word64 -> Int64 |
|||
tinyPhi n = |
|||
let on = (n - 1) `shiftR` 1 |
|||
numcyc = on `div` fromIntegral cTinyPhiOddCirc |
|||
rem = fromIntegral $ on - numcyc * fromIntegral cTinyPhiOddCirc |
|||
in fromIntegral numcyc * fromIntegral cTinyPhiTot + |
|||
fromIntegral (unsafeAt cTinyPhiLUT rem) |
|||
countPrimes :: Word64 -> Int64 |
|||
countPrimes n = |
|||
if n < 3 then (if n < 2 then 0 else 1) else |
|||
let sqrtn = truncate $ sqrt $ fromIntegral n |
|||
qdrtn = truncate $ sqrt $ fromIntegral sqrtn |
|||
rtlmt = (sqrtn - 3) `div` 2 |
|||
qrtlmt = (qdrtn - 3) `div` 2 |
|||
oddPrimes@(UArray _ _ psz _) = runST $ do -- UArray of odd primes... |
|||
cmpsts <- newArray (0, rtlmt) False :: ST s (STUArray s Int Bool) |
|||
forM_ [ 0 .. qrtlmt ] $ \ i -> do |
|||
t <- unsafeRead cmpsts i |
|||
unless t $ do |
|||
let sqri = (i + i) * (i + 3) + 3 |
|||
bp = i + i + 3 |
|||
forM_ [ sqri, sqri + bp .. rtlmt ] $ \ c -> |
|||
unsafeWrite cmpsts c True |
|||
fcmpsts <- unsafeFreezeSTUArray cmpsts |
|||
let !numoprms = sum $ [ 1 | False <- elems fcmpsts ] |
|||
prms = [ fromIntegral $ i + i + 3 | (i, False) <- assocs fcmpsts ] |
|||
return $ listArray (0, numoprms - 1) prms :: ST s (UArray Int Word32) |
|||
lvl pi pilmt !m !acc = |
|||
if pi >= pilmt then acc else |
|||
let p = fromIntegral $ unsafeAt oddPrimes pi |
|||
nm = m * p in |
|||
if n <= nm * p then acc + fromIntegral (pilmt - pi) else |
|||
let !q = fromIntegral $ n `div` nm |
|||
!nacc = acc + tinyPhi q |
|||
!sacc = if pi <= cC then 0 else lvl cC pi nm 0 |
|||
in lvl (pi + 1) pilmt m $ nacc - sacc |
|||
in tinyPhi n - lvl cC psz 1 0 + fromIntegral psz</syntaxhighlight> |
|||
Use of the above function gets a gain in speed of about a further fifteen times over the above version due to less recursion and the use of the "Tiny_Phi" Look Up Table (LUT) for smaller "degrees" of primes which greatly reduces the number of integer divisions. This version of prime counting function might be considered to be reasonably useful for counting primes up to a trillion (1e12) in a little over ten seconds. |
|||
'''The Non-Recursive Partial Sieving Algorithm''' |
|||
Just substitute the following Haskell code for the `countPrimes` function above to enjoy the further gain in speed: |
|||
<syntaxhighlight lang="haskell">countPrimesxx :: Word64 -> Int64 |
|||
countPrimesxx n = |
|||
if n < 3 then (if n < 2 then 0 else 1) else |
|||
let |
|||
{-# INLINE divide #-} |
|||
divide :: Word64 -> Word64 -> Int |
|||
divide nm d = truncate $ (fromIntegral nm :: Double) / fromIntegral d |
|||
{-# INLINE half #-} |
|||
half :: Int -> Int |
|||
half x = (x - 1) `shiftR` 1 |
|||
rtlmt = floor $ sqrt (fromIntegral n :: Double) |
|||
mxndx = (rtlmt - 1) `div` 2 |
|||
(!nbps, !nrs, !smalls, !roughs, !larges) = runST $ do |
|||
mss <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Word32) |
|||
let msscst = |
|||
castSTUArray :: STUArray s Int Word32 -> ST s (STUArray s Int Int64) |
|||
mdss <- msscst mss -- for use in adjing counts LUT |
|||
forM_ [ 0 .. mxndx ] $ \ i -> unsafeWrite mss i (fromIntegral i) |
|||
mrs <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Word32) |
|||
forM_ [ 0 .. mxndx ] $ \ i -> unsafeWrite mrs i (fromIntegral i * 2 + 1) |
|||
mls <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Int64) |
|||
forM_ [ 0 .. mxndx ] $ \ i -> |
|||
let d = fromIntegral (i + i + 1) |
|||
in unsafeWrite mls i (fromIntegral (divide n d - 1) `div` 2) |
|||
cmpsts <- unsafeNewArray_ (0, mxndx) :: ST s (STUArray s Int Bool) |
|||
let loop i !cbpi !rlmti = |
|||
let sqri = (i + i) * (i + 1) in |
|||
if sqri > mxndx then do |
|||
fss <- unsafeFreezeSTUArray mss |
|||
frs <- unsafeFreezeSTUArray mrs |
|||
fls <- unsafeFreezeSTUArray mls |
|||
return (cbpi, rlmti + 1, fss, frs, fls) |
|||
else do |
|||
v <- unsafeRead cmpsts i |
|||
if v then loop (i + 1) cbpi rlmti else do |
|||
unsafeWrite cmpsts i True -- cull current bp so not a "k-rough"! |
|||
let bp = i + i + 1 |
|||
-- partial cull by current base prime, bp... |
|||
cull c = if c > mxndx then return () else do |
|||
unsafeWrite cmpsts c True; cull (c + bp) |
|||
-- adjust `larges` according to partial sieve... |
|||
part ri nri = -- old "rough" index to new one... |
|||
if ri > rlmti then return (nri - 1) else do |
|||
r <- unsafeRead mrs ri -- "rough" always odd! |
|||
t <- unsafeRead cmpsts (fromIntegral r `shiftR` 1) |
|||
if t then part (ri + 1) nri else do -- skip newly culled |
|||
olv <- unsafeRead mls ri |
|||
let m = fromIntegral r * fromIntegral bp |
|||
adjv <- if m <= fromIntegral rtlmt then do |
|||
let ndx = fromIntegral m `shiftR` 1 |
|||
sv <- unsafeRead mss ndx |
|||
unsafeRead mls (fromIntegral sv - cbpi) |
|||
else do |
|||
sv <- unsafeRead mss (half (divide n m)) |
|||
return (fromIntegral sv) |
|||
unsafeWrite mls nri (olv - (adjv - fromIntegral cbpi)) |
|||
unsafeWrite mrs nri r; part (ri + 1) (nri + 1) |
|||
!pm0 = ((rtlmt `div` bp) - 1) .|. 1 -- max base prime mult |
|||
adjc lmti pm = -- adjust smalls according to partial sieve: |
|||
if pm < bp then return () else do |
|||
c <- unsafeRead mss (pm `shiftR` 1) |
|||
let ac = c - fromIntegral cbpi -- correction |
|||
bi = (pm * bp) `shiftR` 1 -- start array index |
|||
adj si = if si > lmti then adjc (bi - 1) (pm - 2) |
|||
else do ov <- unsafeRead mss si |
|||
unsafeWrite mss si (ov - ac) |
|||
adj (si + 1) |
|||
adj bi |
|||
dadjc lmti pm = |
|||
if pm < bp then return () else do |
|||
c <- unsafeRead mss (pm `shiftR` 1) |
|||
let ac = c - fromIntegral cbpi -- correction |
|||
bi = (pm * bp) `shiftR` 1 -- start array index |
|||
ac64 = fromIntegral ac :: Int64 |
|||
dac = (ac64 `shiftL` 32) .|. ac64 |
|||
dbi = (bi + 1) `shiftR` 1 |
|||
dlmti = (lmti - 1) `shiftR` 1 |
|||
dadj dsi = if dsi > dlmti then return () |
|||
else do dov <- unsafeRead mdss dsi |
|||
unsafeWrite mdss dsi (dov - dac) |
|||
dadj (dsi + 1) |
|||
when (bi .&. 1 /= 0) $ do |
|||
ov <- unsafeRead mss bi |
|||
unsafeWrite mss bi (ov - ac) |
|||
dadj dbi |
|||
when (lmti .&. 1 == 0) $ do |
|||
ov <- unsafeRead mss lmti |
|||
unsafeWrite mss lmti (ov - ac) |
|||
adjc (bi - 1) (pm - 2) |
|||
cull sqri; nrlmti <- part 0 0 |
|||
dadjc mxndx pm0 |
|||
loop (i + 1) (cbpi + 1) nrlmti |
|||
loop 1 0 mxndx |
|||
!ans0 = unsafeAt larges 0 - -- combine all counts; each includes nbps... |
|||
sum [ unsafeAt larges i | i <- [ 1 .. nrs - 1 ] ] |
|||
-- adjust for all the base prime counts subracted above... |
|||
!adj = (nrs + 2 * (nbps - 1)) * (nrs - 1) `div` 2 |
|||
!adjans0 = ans0 + fromIntegral adj |
|||
loopr ri !acc = -- o final phi calculation for pairs of larger primes... |
|||
let r = fromIntegral (unsafeAt roughs ri) |
|||
q = n `div` r |
|||
lmtsi = half (fromIntegral (q `div` r)) |
|||
lmti = fromIntegral (unsafeAt smalls lmtsi) - nbps |
|||
addcnt pi !ac = |
|||
if pi > lmti then ac else |
|||
let p = fromIntegral (unsafeAt roughs pi) |
|||
ci = half (fromIntegral (divide q p)) |
|||
in addcnt (pi + 1) (ac + fromIntegral (unsafeAt smalls ci)) |
|||
in if lmti <= ri then acc else -- break when up to cube root of range! |
|||
-- adjust for the `nbps`'s over added in the `smalls` counts... |
|||
let !adj = fromIntegral ((lmti - ri) * (nbps + ri - 1)) |
|||
in loopr (ri + 1) (addcnt (ri + 1) acc - adj) |
|||
in loopr 1 adjans0 + 1 -- add one for only even prime of two!</syntaxhighlight> |
|||
The above function enjoys quite a large gain in speed of about a further ten times over the immediately preceding version due to not using recursion at all and the greatly reduced computational complexity of O(n**(3/4)/((log n)**2)) instead of the almost-linear-for-large-counting-ranges O(n/((log n)**2)) for the previous two versions, although the previous two versions differ in performance by a constant factor due to the overheads of recursion and division. This version of prime counting function might be considered to be reasonably useful for counting primes up to 1e16 in just over a minute as long as the computer used has the required free memory of about 800 Megabytes, counting the primes to 1e11 in under 30 milliseconds. This Haskell version is about twenty percent slower than the Nim version from which it was translated because Nim's default GCC back-end is better at optimization (at least for this algorithm) than the LLVM back-end used by Haskell here. |
|||
<pre>λ> main |
|||
1 4 |
|||
2 25 |
|||
3 168 |
|||
4 1229 |
|||
5 9592 |
|||
6 78498 |
|||
7 664579 |
|||
8 5761455 |
|||
9 50847534</pre> |
|||
=={{header|J}}== |
=={{header|J}}== |