Legendre prime counting function: Difference between revisions

→‎{{header|Erlang}}: add Crystal non-memoized versions as per Nim...
m (Minor edit to C++ and Java code)
(→‎{{header|Erlang}}: add Crystal non-memoized versions as per Nim...)
Line 154:
10^9 50847534
</pre>
=={{header|Crystal}}==
 
There doesn't seem to be much point to contributing memoized (Hash table) versions as they seem to be the result of the task author not realizing that there is a simple way of keeping the computation complexity from having exponential growth with counting range by limiting the "splitting" of the "phi" calculation when the result must be just one. Accordingly, the following Crystal versions are [translated from the non-memoizing Nim versions](https://rosettacode.org/wiki/Legendre_prime_counting_function#Non-Memoized_Versions), which explanations can be referenced for an understanding of how they work:
 
'''The Basic Recursive Legendre Algorithm'''
 
<syntaxhighlight lang="crystal">require "bit_array"
 
def count_primes(n : Int64)
if n < 3_i64
return 0_i64 if n < 2_i64
return 1_i64
end
rtlmt = Math.sqrt(n.to_f64).to_i32
mxndx = (rtlmt - 3) // 2
cmpsts = BitArray.new(mxndx + 1)
i = 0
while true
c = (i + i) * (i + 3) + 3
break if c > mxndx
unless cmpsts[i]
bp = i + i + 3
until c > mxndx
cmpsts[c] = true
c += bp
end
end
i += 1
end
oprms = Array(Int32).new(cmpsts.count { |e| !e }, 0)
pi = 0
cmpsts.each_with_index do |e, i|
unless e
oprms[pi] = (i + i + 3).to_i32; pi += 1
end
end
phi = uninitialized Proc(Int64, Int32, Int64) # recursion target!
phi = ->(x : Int64, a : Int32) {
return x - (x >> 1) if a < 1
p = oprms.unsafe_fetch(a - 1)
return 1_i64 if x <= p
phi.call(x, a - 1) - phi.call((x.to_f64 / p.to_f64).to_i64, a - 1)
}
phi.call(n, oprms.size) + oprms.size
end
 
start_time = Time.monotonic
(0 .. 9).each { |i| puts "π(10**#{i}) = #{count_primes(10_i64**i)}" }
elpsd = (Time.monotonic - start_time).total_milliseconds
 
puts "This took #{elpsd} milliseconds."</syntaxhighlight>
 
{{out}}
<pre>π(10**0) = 0
π(10**1) = 4
π(10**2) = 25
π(10**3) = 168
π(10**4) = 1229
π(10**5) = 9592
π(10**6) = 78498
π(10**7) = 664579
π(10**8) = 5761455
π(10**9) = 50847534
This took 272.428954 milliseconds.</pre>
 
The time is as run on an Intel i5-6500 (3.6 GHz single-threaded boost).
 
Although this is about ten times faster than if it were using memoization (and uses much less memory), it is only reasonably usable for trivial ranges such as this (trivial in terms of prime counting functions).
 
'''The Less Recursive "Bottom-Up" with a TinyPhi LUT Algorithm'''
 
Just substitute the following Crystal code for the `count_primes` function above to enjoy the gain in speed:
<syntaxhighlight lang="crystal">Tiny_Phi_Primes = [ 2, 3, 5, 7, 11, 13 ]
Tiny_Phi_Odd_Circ = Tiny_Phi_Primes.product // 2
Tiny_Phi_Tot = Tiny_Phi_Primes.reduce(1) { |acc, p| acc * (p - 1) }
CC = Tiny_Phi_Primes.size - 1
def make_Tiny_Phi_LUT()
rslt = Array(UInt16).new(Tiny_Phi_Odd_Circ, 1_u16)
Tiny_Phi_Primes.skip(1).each { |bp|
i = (bp - 1) >> 1; rslt[i] = 0; c = (i + i) * (i + 1)
while c < Tiny_Phi_Odd_Circ
rslt[c] = 0; c += bp
end }
acc = 0_u16; i = 0
while i < Tiny_Phi_Odd_Circ
acc += rslt[i]; rslt[i] = acc; i += 1
end
rslt
end
Tiny_Phi_LUT = make_Tiny_Phi_LUT()
@[AlwaysInline]
def tiny_Phi(x : Int64) : Int64
ndx = (x - 1) >> 1; numtot = ndx // Tiny_Phi_Odd_Circ.to_i64
tpli = ndx - numtot * Tiny_Phi_Odd_Circ.to_i64
numtot * Tiny_Phi_Tot.to_i64 +
Tiny_Phi_LUT.unsafe_fetch(tpli).to_i64
end
 
def count_primes(n : Int64)
if n < 169_i64 # below 169 whose sqrt is 13 is where TinyPhi doesn't work...
return 0_i64 if n < 2_i64
return 1_i64 if n < 3_i64
# adjust for the missing "degree" base primes
return 1 + (n - 1) // 2 if n < 9_i64
return (n - 1) // 2 if n <= 13_i64
return 5 + Tiny_Phi_LUT[(n - 1).to_i32 // 2].to_i64
end
rtlmt = Math.sqrt(n.to_f64).to_i32
mxndx = (rtlmt - 3) // 2
cmpsts = BitArray.new(mxndx + 1)
i = 0
while true
c = (i + i) * (i + 3) + 3
break if c > mxndx
unless cmpsts[i]
bp = i + i + 3
until c > mxndx
cmpsts[c] = true
c += bp
end
end
i += 1
end
oprms = Array(Int32).new(cmpsts.count { |e| !e }, 0)
opi = 0
cmpsts.each_with_index do |e, i|
unless e
oprms[opi] = (i + i + 3).to_i32; opi += 1
end
end
lvl = uninitialized Proc(Int32, Int32, Int64, Int64) # recursion target!
lvl = ->(pilo : Int32, pilmt : Int32, m : Int64) : Int64 {
pi = pilo; answr = 0_i64
while pi < pilmt
p = oprms.unsafe_fetch(pi).to_i64; nm = p * m
return answr + (pilmt - pi) if n <= nm * p
q = (n.to_f64 / nm.to_f64).to_i64; answr += tiny_Phi(q)
answr -= lvl.call(CC, pi, nm) if pi > CC
pi += 1
end
answr
}
tiny_Phi(n) - lvl.call(CC, oprms.size, 1_i64) + oprms.size
end</syntaxhighlight>
 
Use of the above function gets a gain in speed of about a further ten times over the above version due to less recursion, the use of the "Tiny_Phi" Look Up Table (LUT) for smaller "degrees" of primes which greatly reduces the number of integer divisions, and by using floating point division for integer division because it is faster, although it is limited in precision to 53-bits, one wouldn't want to use these algorithms over a range where that would cause a problem. This version of prime counting function might be considered to be reasonably useful for counting primes up to a trillion (1e12) in a few tens of seconds.
 
'''The Non-Recursive Partial Sieving Algorithm'''
 
Just substitute the following Crystal code for the `count_primes` function above to enjoy the further gain in speed:
<syntaxhighlight lang="crystal">def count_primes(n : Int64) : Int64
if n < 3
if n < 2
return 0_i64
else
return 1_i64
end
end
half = ->(n : Int64) : Int64 { (n - 1) >> 1 }
divide = ->(n : Int64, d : Int64) : Int64 { (n.to_f64 / d.to_f64).to_i64 }
rtlmt = Math.sqrt(n.to_f64).to_i32; mxndx = (rtlmt - 1) // 2
cmpsts = BitArray.new(mxndx + 1)
smalls = Array(Int32).new(mxndx + 1) { |i| i }
roughs = Array(Int32).new(mxndx + 1) { |i| i + i + 1 }
larges =
Array(Int64).new(mxndx + 1) { |i| ((n // (i + i + 1)).to_i64 - 1) >> 1 }
i = 1; nbps = 0; mxri = mxndx
while true
c = (i + i) * (i + 1); break if c > mxndx
if !cmpsts.unsafe_fetch(i)
bp = i + i + 1; cmpsts.unsafe_put(i, true)
until c > mxndx
cmpsts.unsafe_put(c, true); c += bp
end # partial sieving for bp completed here!
 
j = 0; ri = 0 # adjust `larges` according to partial sieve...
while j <= mxri
q = roughs.unsafe_fetch(j); qi = q >> 1
if !cmpsts.unsafe_fetch(qi)
d = bp.to_i64 * q.to_i64
larges.unsafe_put(ri, larges.unsafe_fetch(j) -
if d <= rtlmt.to_i64
ndx = smalls.unsafe_fetch(d >> 1) - nbps
larges.unsafe_fetch(ndx)
else
ndx = half.call(divide.call(n, d))
smalls.unsafe_fetch(ndx)
end + nbps)
roughs.unsafe_put(ri, q); ri += 1
end; j += 1
end
 
si = mxndx; bpm = (rtlmt // bp - 1) | 1
while bpm >= bp # adjust smalls according to partial sieve...
c = smalls.unsafe_fetch(bpm >> 1) - nbps; e = (bpm * bp) >> 1
while si >= e
smalls.unsafe_put(si, smalls.unsafe_fetch(si) - c); si -= 1
end
bpm -= 2
end
 
mxri = ri - 1; nbps += 1
end; i += 1
end
 
ans = larges.unsafe_fetch(0); i = 1
while i <= mxri # combine results; adjust for over subtraction base primes...
ans -= larges.unsafe_fetch(i); i += 1
end
ans += (mxri.to_i64 + 1 + 2 * (nbps.to_i64 - 1)) * mxri.to_i64 // 2 # adjust!
 
ri = 1 # do final phi calculation for pairs of larger primes...
while true # break on condition when up to cube root of range!
p = roughs.unsafe_fetch(ri).to_i64; q = n // p
e = smalls.unsafe_fetch(half.call(divide.call(q, p))) - nbps
break if e <= ri; ori = ri + 1
while ori <= e
ndx = half.call(divide.call(q, roughs.unsafe_fetch(ori).to_i64))
ans += smalls.unsafe_fetch(ndx).to_i64; ori += 1
end
ans -= (e - ri).to_i64 * (nbps.to_i64 + ri.to_i64 - 1); ri += 1
end
 
ans + 1 # for only even prime of two!
end</syntaxhighlight>
 
The above function enjoys quite a large gain in speed of about a further ten times over the immediately preceding version due to not using recursion at all and the greatly reduced computational complexity of O(n**(3/4)/((log n)**2)) instead of the almost-linear-for-large-counting-ranges O(n/((log n)**2)) for the previous two versions, although the previous two versions differ in performance by a constant factor due to the overheads of recursion and division. This version of prime counting function might be considered to be reasonably useful for counting primes up to a 1e16 in almost two minutes as long as the computer used has the required free memory of about 800 Megabytes. This Crystal version is about twice as slow as the Nim version from which it was translated because Nim's default GCC back-end is better at optimization (at least for this algorithm) than the LLVM back-end used by Crystal and also there is more numeric calculation checking such as numeric overflow used in Crystal that can't be turned off. At that, this version is about the same speed as when translated to Rust, which also uses a LLVM back-end.
 
=={{header|Erlang}}==
<syntaxhighlight lang="erlang">
Line 222 ⟶ 450:
10^9 50847534
</pre>
 
 
=={{header|Go}}==
474

edits