Legendre prime counting function: Difference between revisions

Content added Content deleted

Inline

Revision as of 21:27, 17 October 2022

The prime-counting function π(n) computes the number of primes not greater than n. Legendre was the first mathematician to create a formula to compute π(n) based on the inclusion/exclusion principle.

To calculate:

Define

φ(x, 0) = x
φ(x, a) = φ(x, a−1) − φ(⌊x/p_a⌋, a−1), where p_a is the a^th prime number.

then

π(n) = 0 when n < 2
π(n) = φ(n, a) + a - 1, where a = π(√n), n ≥ 2

The Legendre formula still requires the use of a sieve to enumerate primes; however it's only required to sieve up to the √n, and for counting primes, the Legendre method is generally much faster than sieving up to n.

Task

Calculate π(n) for values up to 1 billion. Show π(n) for n = 1, 10, 100, ... 10⁹.

For this task, you may refer to a prime number sieve (such as the Sieve of Eratosthenes or the extensible sieve) in an external library to enumerate the primes required by the formula. Also note that it will be necessary to memoize the results of φ(x, a) in order to have reasonable performance, since the recurrence relation would otherwise take exponential time.

Comments on Task

Regarding "it will be necessary to memoize the results of φ(x, a)", it will have exponential time performance without memoization only if a very small optimization that should be obvious is not done: it should be obvious that one can stop "splitting" the φ(x, a) "tree" when 'x' is zero, and even before that since the real meaning of the "phi"/φ function is to produce a count of all of the values greater than zero (including one) up to `x` that have been culled of all multiples of the primes up to and including the `p_a` prime value, if `x` is less than or equal to `p_a`, then the whole "tree" must result in a value of just one. If this minor (and obvious) optimization is done, the "exponential time" performance goes away, memoization is not absolutely necessary (saving the overhead in time and space of doing the memoization), and the time complexity becomes O(n/(log n²) and the space complexity becomes O(n^1/2/log n) as they should be.

This is the problem when non-mathematician programmers blindly apply such a general formula as the recursive Legendre one without doing any work to understand it or even doing some trivial hand calculations to better understand how it works just because they have very powerful computers which mask the limitations: a few minutes of hand calculation would make it obvious that there is no need to "split"/recursively call for "phi" nodes where the first argument is zero, and someone with a mathematics interest would then investigate to see if that limit can be pushed a little further as here to the range of nodes whose result will always be one. Once there is no exponential growth of the number of "nodes", then there is no need for memoization as usually implemented with a hash table at a huge cost of memory overhead and constant time computation per operation.

As to "the Legendre method is generally much faster than sieving up to n.", while the number of operations for the Legendre algorithm is about a factor of `log n` squared less than the number of operations for odds-only Sieve of Eratosthenes (SoE) sieving, those operations are "divide" operations which are generally much slower than the simple array access and addition operations used in sieving and the SoE can be optimized further using wheel factorization so that the required time for a given range can be less for a fully optimized SoE than for the common implementation of the Legendre algorithm; the trade off is that a fully optimized SoE is at least 500 lines of code, whereas the basic version of the Legendre prime counting algorithm is only about 40 to 50 lines of code (depending somewhat on the language used).

C++

#include <cmath>
#include <iostream>
#include <unordered_map>
#include <vector>

std::vector<int> generate_primes(int limit) {
    std::vector<bool> sieve(limit >> 1, true);
    for (int p = 3, s = 9; s < limit; p += 2) {
        if (sieve[p >> 1]) {
            for (int q = s; q < limit; q += p << 1)
                sieve[q >> 1] = false;
        }
        s += (p + 1) << 2;
    }
    std::vector<int> primes;
    if (limit > 2)
        primes.push_back(2);
    for (int i = 1; i < sieve.size(); ++i) {
        if (sieve[i])
            primes.push_back((i << 1) + 1);
    }
    return primes;
}

class legendre_prime_counter {
public:
    explicit legendre_prime_counter(int limit);
    int prime_count(int n);
private:
    int phi(int x, int a);
    std::vector<int> primes;
    std::unordered_map<int, std::unordered_map<int, int>> phi_cache;
};

legendre_prime_counter::legendre_prime_counter(int limit) :
    primes(generate_primes(static_cast<int>(std::sqrt(limit)))) {}

int legendre_prime_counter::prime_count(int n) {
    if (n < 2)
        return 0;
    int a = prime_count(static_cast<int>(std::sqrt(n)));
    return phi(n, a) + a - 1;
}

int legendre_prime_counter::phi(int x, int a) {
    if (a == 0)
        return x;
    auto& map = phi_cache[x];
    auto i = map.find(a);
    if (i != map.end())
        return i->second;
    int result = phi(x, a - 1) - phi(x / primes[a - 1], a - 1);
    map[a] = result;
    return result;
}

int main() {
    legendre_prime_counter counter(1000000000);
    for (int i = 0, n = 1; i < 10; ++i, n *= 10)
        std::cout << "10^" << i << "\t" << counter.prime_count(n) << '\n';
}

Output:

10^0	0
10^1	4
10^2	25
10^3	168
10^4	1229
10^5	9592
10^6	78498
10^7	664579
10^8	5761455
10^9	50847534

CoffeeScript

sorenson = require('sieve').primes  # Sorenson's extensible sieve from task: Extensible Prime Generator

# put in outer scope to avoid recomputing the cache
memoPhi = {}
primes = []

isqrt = (x) -> Math.floor Math.sqrt x

pi = (n) ->
    phi = (x, a) ->
        y = memoPhi[[x,a]]
        return y unless y is undefined

        memoPhi[[x,a]] =
            if a is 0 then x
            else
                p = primes[a - 1]
                throw "You need to generate at least #{a} primes." if p is undefined
                phi(x, a - 1) - phi(x // p, a - 1)

    if n < 2
        0
    else
        a = pi isqrt n
        phi(n, a) + a - 1

maxPi = 1e9
gen = sorenson()
primes = while (p = gen.next().value) < isqrt maxPi then p

n = 1
for i in [0..9]
    console.log "10^#{i}\t#{pi(n)}"
    n *= 10

Output:

10^0	0
10^1	4
10^2	25
10^3	168
10^4	1229
10^5	9592
10^6	78498
10^7	664579
10^8	5761455
10^9	50847534

Erlang

-mode(native).

-define(LIMIT, 1000000000).

main(_) ->
    put(primes, array:from_list(primality:sieve(floor(math:sqrt(?LIMIT))))),
    ets:new(memphi, [set, named_table, protected]),
    output(0, 9).

nthprime(N) -> array:get(N - 1, get(primes)).

output(A, B) -> output(A, B, 1).
output(A, B, _) when A > B -> ok;
output(A, B, N) ->
    io:format("10^~b    ~b~n", [A, pi(N)]),
    output(A + 1, B, N * 10).

pi(N) ->
    Primes = get(primes),
    Last = array:get(array:size(Primes) - 1, Primes),
    if
        N =< Last -> small_pi(N);
        true ->
            A = pi(floor(math:sqrt(N))),
            phi(N, A) + A - 1
    end.

phi(X, 0) -> X;
phi(X, A) ->
    case ets:lookup(memphi, {X, A}) of
        [] ->
            Phi = phi(X, A-1) - phi(X div nthprime(A), A-1),
            ets:insert(memphi, {{X, A}, Phi}),
            Phi;

        [{{X, A}, Phi}] -> Phi
    end.

% Use binary search to count primes that we have already listed.
small_pi(N) ->
    Primes = get(primes),
    small_pi(N, Primes, 0, array:size(Primes)).

small_pi(_, _, L, H) when L >= (H - 1) -> L + 1;
small_pi(N, Primes, L, H) ->
    M = (L + H) div 2,
    P = array:get(M, Primes),
    if
        N > P -> small_pi(N, Primes, M, H);
        N < P -> small_pi(N, Primes, 0, M);
        true -> M + 1
    end.

Output:

10^0    1
10^1    4
10^2    25
10^3    168
10^4    1229
10^5    9592
10^6    78498
10^7    664579
10^8    5761455
10^9    50847534

Go

Translation of: Wren

Library: Go-rcu

Memoized

This runs in about 2.2 seconds which is surprisingly slow for Go. However, Go has the same problem as Wren in not being able to use lists for map keys. I tried using a 2-element array for the key but this was a second slower than the Cantor function.

The alternative of sieving a billion numbers and then counting them takes about 4.8 seconds using the present library function though this could be cut to 1.2 seconds using a third party library such as primegen.

package main

import (
    "fmt"
    "log"
    "math"
    "rcu"
)

var limit = int(math.Sqrt(1e9))
var primes = rcu.Primes(limit)
var memoPhi = make(map[int]int)

func cantorPair(x, y int) int {
    if x < 0 || y < 0 {
        log.Fatal("Arguments must be non-negative integers.")
    }
    return (x*x + 3*x + 2*x*y + y + y*y) / 2
}

func phi(x, a int) int {
    if a == 0 {
        return x
    }
    key := cantorPair(x, a)
    if v, ok := memoPhi[key]; ok {
        return v
    }
    pa := primes[a-1]
    memoPhi[key] = phi(x, a-1) - phi(x/pa, a-1)
    return memoPhi[key]
}

func pi(n int) int {
    if n < 2 {
        return 0
    }
    a := pi(int(math.Sqrt(float64(n))))
    return phi(n, a) + a - 1
}

func main() {
    for i, n := 0, 1; i <= 9; i, n = i+1, n*10 {
        fmt.Printf("10^%d  %d\n", i, pi(n))
    }
}

Output:

10^0  0
10^1  4
10^2  25
10^3  168
10^4  1229
10^5  9592
10^6  78498
10^7  664579
10^8  5761455
10^9  50847534

Non-memoized

Much quicker than before (0.53 seconds) and, of course, uses less memory.

package main

import (
    "fmt"
    "math"
    "rcu"
)

func pi(n int) int {
    if n < 2 {
        return 0
    }
    if n == 2 {
        return 1
    }
    primes := rcu.Primes(int(math.Sqrt(float64(n))))
    a := len(primes)

    var phi func(x, a int) int // recursive closure
    phi = func(x, a int) int {
        if a < 1 {
            return x
        }
        if a == 1 {
            return x - (x >> 1)
        }
        pa := primes[a-1]
        if x <= pa {
            return 1
        }
        return phi(x, a-1) - phi(x/pa, a-1)
    }

    return phi(n, a) + a - 1
}

func main() {
    for i, n := 0, 1; i <= 9; i, n = i+1, n*10 {
        fmt.Printf("10^%d  %d\n", i, pi(n))
    }
}

Output:

Same as first version.

Java

import java.util.*;

public class LegendrePrimeCounter {
    public static void main(String[] args) {
        LegendrePrimeCounter counter = new LegendrePrimeCounter(1000000000);
        for (int i = 0, n = 1; i < 10; ++i, n *= 10)
            System.out.printf("10^%d\t%d\n", i, counter.primeCount((n)));
    }

    private List<Integer> primes;
    private Map<Integer, Map<Integer, Integer>> phiCache = new HashMap<>();

    public LegendrePrimeCounter(int limit) {
        primes = generatePrimes((int)Math.sqrt((double)limit));
    }

    public int primeCount(int n) {
        if (n < 2)
            return 0;
        int a = primeCount((int)Math.sqrt((double)n));
        return phi(n, a) + a - 1;
    }

    private int phi(int x, int a) {
        if (a == 0)
            return x;
        Map<Integer, Integer> map = phiCache.computeIfAbsent(x, k -> new HashMap<>());
        Integer value = map.get(a);
        if (value != null)
            return value;
        int result = phi(x, a - 1) - phi(x / primes.get(a - 1), a - 1);
        map.put(a, result);
        return result;
    }

    private static List<Integer> generatePrimes(int limit) {
        boolean[] sieve = new boolean[limit >> 1];
        Arrays.fill(sieve, true);
        for (int p = 3, s = 9; s < limit; p += 2) {
            if (sieve[p >> 1]) {
                for (int q = s; q < limit; q += p << 1)
                    sieve[q >> 1] = false;
            }
            s += (p + 1) << 2;
        }
        List<Integer> primes = new ArrayList<>();
        if (limit > 2)
            primes.add(2);
        for (int i = 1; i < sieve.length; ++i) {
            if (sieve[i])
                primes.add((i << 1) + 1);
        } 
        return primes;
    }
}

Output:

10^0	0
10^1	4
10^2	25
10^3	168
10^4	1229
10^5	9592
10^6	78498
10^7	664579
10^8	5761455
10^9	50847534

Haskell

Memoization utilities:

data Memo a = Node a (Memo a) (Memo a)
  deriving Functor

memo :: Integral a => Memo p -> a -> p
memo (Node a l r) n
  | n == 0 = a
  | odd n = memo l (n `div` 2)
  | otherwise = memo r (n `div` 2 - 1)

nats :: Integral a => Memo a
nats = Node 0 ((+1).(*2) <$> nats) ((*2).(+1) <$> nats)

memoize :: Integral a => (a -> b) -> a -> b
memoize f = memo (f <$> nats)

memoize2 :: (Integral a, Integral b) => (a -> b -> c) -> a -> b -> c
memoize2 f = memoize (memoize . f)

memoList :: [b] -> Integer -> b
memoList = memo . mkList
  where
    mkList (x:xs) = Node x (mkList l) (mkList r)
      where (l,r) = split xs
            split [] = ([],[])
            split [x] = ([x],[])
            split (x:y:xs) = let (l,r) = split xs in (x:l, y:r)

Computation of Legendre function:

isqrt :: Integer -> Integer
isqrt n = go n 0 (q `shiftR` 2)
 where
   q = head $ dropWhile (< n) $ iterate (`shiftL` 2) 1
   go z r 0 = r
   go z r q = let t = z - r - q
              in if t >= 0
                 then go t (r `shiftR` 1 + q) (q `shiftR` 2)
                 else go z (r `shiftR` 1) (q `shiftR` 2)

phi = memoize2 phiM
  where
    phiM x 0 = x
    phiM x a = phi x (a-1) - phi (x `div` p a) (a - 1)
    
    p = memoList (undefined : primes)

legendrePi :: Integer -> Integer
legendrePi n
  | n < 2 = 0
  | otherwise = phi n a + a - 1
    where a = legendrePi (floor (sqrt (fromInteger n)))

main = mapM_ (\n -> putStrLn $ show n ++ "\t" ++ show (legendrePi (10^n))) [1..7]

J

This is "almost a primitive" in J:

   require'format/printf'
   {{echo '10^%d: %d' sprintf y;p:inv 10^y}}"0 i.10
10^0: 0
10^1: 4
10^2: 25
10^3: 168
10^4: 1229
10^5: 9592
10^6: 78498
10^7: 664579
10^8: 5761455
10^9: 50847534

A complete implementation of the legendre prime counting function would be (1&p: + p:inv):

   (1&p: + p:inv) 10
4
   (1&p: + p:inv) 11
5

But we can simplify that to p:inv when we know that primes are not being tested.

jq

Adapted from Wren

Works with: jq

Works with gojq, the Go implementation of jq (*)

The following implementation freely uses the following conveniences of jq syntax:

the jq expression {$x} is an abbreviation for {"x" : $x}
the jq expression {x} is an abbreviation for {"x" : .x}

The key-value pairs can be similarly specified in JSON objects with multiple keys.

(*) gojq struggles to get beyond [5,9592] without using huge amounts of memory; jq becomes excessively slow after [7,664579].

Preliminaries

# For the sake of the accuracy of integer arithmetic when using gojq:
def power($b): . as $in | reduce range(0;$b) as $i (1; . * $in);

# Input: $n, which is assumed to be positive integer,
# Output: an array of primes less than or equal to $n (e.g. 10|eratosthenes #=> [2,3,5,7]
def eratosthenes:
  # erase(i) sets .[i*j] to false for integral j > 1
  def erase(i):
    if .[i] then 
      reduce range(2; (1 + length) / i) as $j (.; .[i * $j] = false)
    else .
    end;

  (. + 1) as $n
  | (($n|sqrt) / 2) as $s
  | [null, null, range(2; $n)]
  | reduce (2, 1 + (2 * range(1; $s))) as $i (.; erase($i))
  | map(select(.)) ;

Phi and the Legendre Counting Function

def legendre:
  (sqrt | floor + 1 | eratosthenes) as $primes

  # Input: {x, a}
  # Output: {phi: phi(x,a), memo} where .memo might have been updated
  | def phi:
    . as {x: $x, a: $a, memo: $memo}
    | if $a == 0 then {phi: $x, $memo}
      else "\($x),\($a)" as $ix
      | $memo[ $ix ] as $m
      | if $m then {phi: $m, $memo}
        else .a += -1
        | phi as {phi: $phi1, memo: $memo}
        | (($x / $primes[$a - 1])|floor) as $x
        | ({$x, a, $memo} | phi) as {phi: $phi2, memo: $memo}
        | ($phi1 - $phi2) as $phi
        | {$phi, $memo}
	| .memo[$ix] = $phi
        end
      end;

  def l:
    . as $n
    | if . < 2 then 0
      else ($n|sqrt|floor|l) as $a
      | ({x: $n, $a, memo: {}} | phi).phi + $a - 1
      end;

  l;


def task:
  range(0;10)
  | . as $i
  | [., (10|power($i)|legendre)]
  ;

task

Output:

[0,0]
[1,4]
[2,25]
[3,168]
[4,1229]
[5,9592]
[6,78498]
[7,664579]
<terminated>

Julia

Using the Julia Primes library

This would be faster with a custom sieve that only counted instead of making a bitset, instead of the more versatile library function.

using Primes

function primepi(N)
    delta = round(Int, N^0.8)
    return sum(i -> count(primesmask(i, min(i + delta - 1, N))), 1:delta:N)
end

@time for power in 0:9
    println("10^", rpad(power, 5), primepi(10^power))
end

Output:

10^0    0
10^1    4
10^2    25
10^3    168
10^4    1229
10^5    9592
10^6    78498
10^7    664579
10^8    5761455
10^9    50847534
  1.935556 seconds (1.64 k allocations: 415.526 MiB, 2.15% gc time)

Using the method given in the task

Translation of: CoffeeScript

Run twice to show the benefits of precomputing the library prime functions.

using Primes

const maxpi = 1_000_000_000
const memoφ = Dict{Vector{Int}, Int}()
const pₐ = primes(isqrt(maxpi))

function π(n)
    function φ(x, a)
        if !haskey(memoφ, [x, a])
            memoφ[[x, a]] = a == 0 ? x : φ(x, a - 1) - φ(x ÷ pₐ[a], a - 1)
        end
        return memoφ[[x, a]]
    end

    n < 2 && return 0
    a = π(isqrt(n))
    return φ(n, a) + a - 1
end

@time for i in 0:9
    println("10^", rpad(i, 5), π(10^i))
end

@time for i in 0:9
    println("10^", rpad(i, 5), π(10^i))
end

Output:

10^0    1
10^1    4   
10^2    25  
10^3    168 
10^4    1229
10^5    9592
10^6    78498
10^7    664579
10^8    5761455
10^9    50847534
 12.939211 seconds (69.44 M allocations: 3.860 GiB, 27.98% gc time, 1.58% compilation time)
10^0    1
10^1    4
10^2    25
10^3    168
10^4    1229
10^5    9592
10^6    78498
10^7    664579
10^8    5761455
10^9    50847534
  0.003234 seconds (421 allocations: 14.547 KiB)

Mathematica/Wolfram Language

ClearAll[Phi,pi]
$RecursionLimit = 10^6;
Phi[x_, 0] := x
Phi[x_, a_] := Phi[x, a] = Phi[x, a - 1] - Phi[Floor[x/Prime[a]], a - 1]
pi[n_] := Module[{a}, If[n < 2, 0, a = pi[Floor[Sqrt[n]]]; Phi[n, a] + a - 1]]
Scan[Print[pi[10^#]] &, Range[0,9]]

Output:

Nim

The program runs in 2.2 seconds on our laptop. Using int32 instead of naturals (64 bits on our 64 bits computer) saves 0.3 second. Using an integer rather than a tuple as key (for instance x shl 32 or a) saves 0.4 second.

import math, strutils, sugar, tables

const
  N = 1_000_000_000
  S = sqrt(N.toFloat).int

var composite: array[3..S, bool]
for n in countup(3, S, 2):
  if n * n > S: break
  if not composite[n]:
    for k in countup(n * n, S, 2 * n):
      composite[k] = true

# Prime list. Add a dummy zero to start at index 1.
let primes = @[0, 2] & collect(newSeq, for n in countup(3, S, 2): (if not composite[n]: n))

var cache: Table[(Natural, Natural), Natural]

proc phi(x, a: Natural): Natural =
  if a == 0: return x
  let pair = (x, a)
  if pair in cache: return cache[pair]
  result = phi(x, a - 1) - phi(x div primes[a], a - 1)
  cache[pair] = result

proc π(n: Natural): Natural =
  if n <= 2: return 0
  let a = π(sqrt(n.toFloat).Natural)
  result = phi(n, a) + a - 1

var n = 1
for i in 0..9:
  echo "π(10^$1) = $2".format(i, π(n))
  n *= 10

Output:

π(10^0) = 0
π(10^1) = 4
π(10^2) = 25
π(10^3) = 168
π(10^4) = 1229
π(10^5) = 9592
π(10^6) = 78498
π(10^7) = 664579
π(10^8) = 5761455
π(10^9) = 50847534

Non-Memoized Version

The above code takes a huge amount of memory to run, especially if using the full size "Naturals" which not using them will severely limit the usable counting range (which of course is limited anyway by the huge memory use by the memoization). As per the comments on the task, memoization is not strictly needed, as the exponential growth in execution time can be limited by a very simple optimization that stops splitting of the recursive "tree" when there are no more contributions to be had by further "leaves", as per the following code:

# compile with: nim c -d:danger -t:-march=native --gc:arc

from std/monotimes import getMonoTime, `-`
from std/times import inMilliseconds
from std/math import sqrt

let masks = [ 1'u8, 2, 4, 8, 16, 32, 64, 128 ] # faster than bit twiddling
let masksp = cast[ptr[UncheckedArray[byte]]](unsafeAddr(masks[0]))

proc countPrimes(n: int64): int64 =
  if n < 3:
    return if n < 2: 0 else: 1
  else:
    let rtlmt = n.float64.sqrt.int
    let mxndx = (rtlmt - 1) div 2
    let sz = (mxndx + 8) div 8
    var cmpsts = cast[ptr[UncheckedArray[byte]]](alloc0(sz))
    for i in 1 .. mxndx:
      if (cmpsts[i shr 3] and masksp[i and 7]) != 0: continue
      let sqri = (i + i) * (i + 1)
      if sqri > mxndx: break
      let bp = i + i + 1
      for c in countup(sqri, mxndx, bp):
        let w = c shr 3; cmpsts[w] = cmpsts[w] or masksp[c and 7]
    var pisqrt = 0'i64
    for i in 0 .. mxndx:
      if (cmpsts[i shr 3] and masksp[i and 7]) == 0: pisqrt += 1
    var primes = cast[ptr[UncheckedArray[uint32]]](alloc(sizeof(uint32) * pisqrt.int))
    var j = 0
    for i in 0 .. mxndx:
      if (cmpsts[i shr 3] and masksp[i and 7]) == 0: primes[j] = (i + i + 1).uint32; j += 1
    proc phi(x: int64; a: int): int64 =
      if a <= 1:
        return if a < 1: x else: x - (x shr 1)
      let p = primes[a - 1].int64
      if x <= p: return 1 # very simple one-line optimization that limits exponential growth!
      return phi(x, a - 1) - phi((x.float64 / p.float64).int64, a - 1)
    result = phi(n, pisqrt.int) + pisqrt - 1
    cmpsts.dealloc; primes.dealloc

let nstrt = getMonoTime()
var pow = 1'i64
for i in 0 .. 9: echo "π(10^", i, ") = ", pow.countPrimes; pow *= 10
let nelpsd = (getMonoTime() - nstrt).inMilliseconds
echo "This took ", nelpsd, " milliseconds."

Output:

π(10^0) = 0
π(10^1) = 4
π(10^2) = 25
π(10^3) = 168
π(10^4) = 1229
π(10^5) = 9592
π(10^6) = 78498
π(10^7) = 664579
π(10^8) = 5761455
π(10^9) = 50847534
This took 194 milliseconds.

As shown in the output, the above program runs without memoization about 12 times faster and using a huge amount less memory than the previous memoized algorithm. Other than not requiring memoization, the implementation also substitutes floating point division for integer division so as to be faster for many CPU architectures (although only being of 53-bits accuracy instead of 64-bits; which limited precision will limit the range over which the primes can be accurately counted, but as this algorithm is only reasonably useful to about ten to the fourteenth power anyway, that shouldn't be a problem.

Using Look Up Table for "TinyPhi" and Decreasing Recursion

A further optimization that greatly reduces the number of operations and thus the run time for these small ranges is to use a pre-computed "TinyPhi" Look Up Table (LUT) array for the "leaves" where the "degree" is small (such as degree of six for the smallest primes used as 13 as used here, which then forms a repeating wheel pattern of 30030 values); this would reduce the "width" of the tree at its widest "base" to eliminate many divisions and recursions and thus reduce the execution time greatly, although the effect is more for smaller ranges such as this as for greater ranges. As well, one can compute the "tree" starting at the lower left, which then reduces some of the recursions to a simple loop and can eliminate the recursions used for succeeding "ones" according to the optimization eliminating the need for more recursion. This is because "Phi"/φ can be calculated by the given "top down" recursion expression given in the task which is derived from the "bottom up" expression as follows:

\lfloor x\rfloor -\sum _{i}\left\lfloor {\frac {x}{p_{i}}}\right\rfloor +\sum _{i<j}\left\lfloor {\frac {x}{p_{i}p_{j}}}\right\rfloor -\sum _{i<j<k}\left\lfloor {\frac {x}{p_{i}p_{j}p_{k}}}\right\rfloor +\cdots

(where $\lfloor {x}\rfloor$ denotes the floor function).

The following Nim code implements these changes:

# compile with: nim c -d:danger -t:-march=native --gc:arc

from std/monotimes import getMonoTime, `-`
from std/times import inMilliseconds
from std/math import sqrt

let masks = [ 1'u8, 2, 4, 8, 16, 32, 64, 128 ] # faster than bit twiddling
let masksp = cast[ptr[UncheckedArray[byte]]](unsafeAddr(masks[0]))

const TinyPhiPrimes = [2, 3, 5, 7, 11, 13]
const TinyPhiCirc = 3 * 5 * 7 * 11 * 13
const TinyPhiRes = 2 * 4 * 6 * 10 * 12
const CC = 6
proc makeTinyPhiLUT(): array[TinyPhiCirc, uint16] =
  for i in 0 .. TinyPhiCirc - 1: result[i] = 1
  for i in 1 .. 6:
    if result[i] == 0: continue
    result[i] = 0; let bp = i + i + 1
    let sqri = (i + i) * (i + 1)
    for c in countup(sqri, TinyPhiCirc - 1, bp): result[c] = 0
  var acc = 0'u16;
  for i in 0 .. TinyPhiCirc - 1: acc += result[i]; result[i] = acc
const TinyPhiLUT = makeTinyPhiLUT()
proc tinyPhi(x: int64): int64 {.inline.} =
  let ndx = (x - 1) div 2; let numtot = ndx div TinyPhiCirc.int64
  return numtot * TinyPhiRes.int64 + TinyPhiLUT[(ndx - numtot * TinyPhiCirc.int64).int].int64

proc countPrimes(n: int64): int64 =
  if n < 169: # below 169 whose sqrt is 13 is where TinyPhi doesn't work...
    if n < 3: return if n < 2: 0 else: 1
    # adjust for the missing "degree" base primes
    if n <= 13: return (n - 1) div 2 + (if n < 9: 1 else: 0)
    return 5 + TinyPhiLUT[(n - 1).int div 2].int64
  let rtlmt = n.float64.sqrt.int
  let mxndx = (rtlmt - 1) div 2
  var cmpsts = cast[ptr[UncheckedArray[byte]]](alloc0((mxndx + 8) div 8))
  for i in 1 .. mxndx:
    if (cmpsts[i shr 3] and masksp[i and 7]) != 0: continue
    let sqri = (i + i) * (i + 1)
    if sqri > mxndx: break
    let bp = i + i + 1
    for c in countup(sqri, mxndx, bp):
      let w = c shr 3; cmpsts[w] = cmpsts[w] or masksp[c and 7]
  var pisqrt = 0'i64
  for i in 0 .. mxndx:
    if (cmpsts[i shr 3] and masksp[i and 7]) == 0: pisqrt += 1
  let primes = cast[ptr[UncheckedArray[uint32]]](alloc(sizeof(uint32) * pisqrt.int))
  var j = 0
  for i in 0 .. mxndx:
    if (cmpsts[i shr 3] and masksp[i and 7]) == 0: primes[j] = (i + i + 1).uint32; j += 1
  var phi = tinyPhi(n)
  proc lvl(m, mbf: int64; mxa: int) = # recurse from bottom left of "tree"...
    for a in CC .. mxa:
      let p = primes[a].int64
      if m < p * p: phi += mbf * (mxa - a + 1).int64; return # rest of level all ones!
      let nm = (m.float64 / p.float64).int64; phi += mbf * tinyPhi(nm)
      if a > CC: lvl(nm, -mbf, a - 1) # split
    # finished level!
  lvl(n, -1, pisqrt.int - 1); result = phi + pisqrt - 1
  cmpsts.dealloc; primes.dealloc  

let strt = getMonoTime()
var pow = 1'i64
for i in 0 .. 9: echo "π(10^", i, ") = ", pow.countPrimes; pow *= 10
let elpsd = (getMonoTime() - strt).inMilliseconds
echo "This took ", elpsd, " milliseconds."

The results of the above code are identical to the previous version except it is almost 20 times faster due to the reduced recursion; however, the gain due to the "TinyPhi" LUT is most effective for relatively trivial counting ranges such as this and will have less and less benefit as ranges get larger, although the reduced recursion due to reordering is a constant factor gain across the range.

Larger "TinyPhi" LUT's can be used for some added benefit (especially for these smaller ranges) although they quickly get quite huge where the benefit is reduced due to lesser cache associativity; for instance the LUT for a "TinyPhi" of degree eight can a wheel circumference of 9699690 values which can be reduced by a factor of two by only considering odd values and another factor of two using the fact that the LUT is symmetric approaching the middle from the beginning and end of the LUT.

For larger ranges, a better algorithm such as that of Legarias, Miller, and Odlyzko (LMO) will perform much better than this and use much less memory, although at the cost of greater complexity and more lines of code (about 500 LOC). There is also a variation of the Legendre algorithm which is to this basic Legendre algorithm as the LMO algorithm is to the Meissel algorithm (which directly followed and improved the Legendre work), and which can be implemented in about 60 LOC to be much faster due to reduced computational complexity although it still uses a lot of memory.

Perl

#!/usr/bin/perl

use strict; # https://rosettacode.org/wiki/Legendre_prime_counting_function
use warnings;
no warnings qw(recursion);
use ntheory qw( nth_prime prime_count );

my (%cachephi, %cachepi);

sub phi
  {
  return $cachephi{"@_"} //= do {
    my ($x, $aa) = @_;
    $aa <= 0 ? $x : phi($x, $aa - 1) - phi(int $x / nth_prime($aa), $aa - 1) };
  }

sub pi
  {
  return $cachepi{$_[0]} //= do {
    my $n = shift;
    $n < 2 ? 0 : do{ my $aa = pi(int sqrt $n); phi($n, $aa) + $aa - 1 } };
  }

print "e             n   Legendre    ntheory\n",
      "-             -   --------    -------\n";
for (1 .. 9)
  {
  printf "%d  %12d %10d %10d\n", $_, 10**$_, pi(10**$_), prime_count(10**$_);
  }

Output:

e             n   Legendre    ntheory
-             -   --------    -------
1            10          4          4
2           100         25         25
3          1000        168        168
4         10000       1229       1229
5        100000       9592       9592
6       1000000      78498      78498
7      10000000     664579     664579
8     100000000    5761455    5761455
9    1000000000   50847534   50847534

Phix

with javascript_semantics
--
-- While a phix dictionary can handle keys of {x,a}, for this 
-- task performance was dreadful (7,612,479 entries, >3mins),
-- so instead memophix maps known x to an index to memophia 
-- which holds the full [1..a] for each x, dropping to a much
-- more respectable (albeit not super-fast) 14s
--
constant memophix = new_dict()
sequence memophia = {}  -- 1..a (max 3401) for each x

function phi(integer x, a)
    if a=0 then return x end if
    integer adx = getd(x,memophix), res
    if adx=NULL then
        memophia = append(memophia,repeat(-1,a))
        adx = length(memophia)
        setd(x,adx,memophix)
    else
        object ma = memophia[adx]
        integer l = length(ma)
        if a>l then
            memophia[adx] = 0   -- kill refcount
            memophia[adx] = ma & repeat(-1,a-l)
        else
            res = ma[a]
            if res>=0 then return res end if
        end if
        ma = 0                  -- kill refcount
    end if
    res = phi(x, a-1) - phi(floor(x/get_prime(a)), a-1)
    memophia[adx][a] = res
    return res
end function
 
function pi(integer n)
    if n<2 then return 0 end if
    integer a = pi(floor(sqrt(n)))
    return phi(n, a) + a - 1
end function
 
atom t0 = time()
for i=0 to iff(platform()=JS?8:9) do
    printf(1,"10^%d  %d\n",{i,pi(power(10,i))})
--  printf(1,"10^%d  %d\n",{i,length(get_primes_le(power(10,i)))})
end for
?elapsed(time()-t0)

The commented-out length(get_primes_le()) gives the same results, in about the same time, and in both cases re-running the main loop a second time finishes near-instantly.

Output:

10^0  0
10^1  4
10^2  25
10^3  168
10^4  1229
10^5  9592
10^6  78498
10^7  664579
10^8  5761455
10^9  50847534
"14.0s"

(It is about 4 times slower under pwa/p2js so output is limited to 10^8, unless you like staring at a blank screen for 52s)

Picat

Performance is surprisingly good, considering that the Picat's library SoE is very basic and that I recompute the base primes each time pi() is called. This version takes 23 seconds on my laptop, outperforming the CoffeeScript version which takes 30 seconds.

table(+, +, nt)
phi(X, 0, _) = X.
phi(X, A, Primes) = phi(X, A - 1, Primes) - phi(X // Primes[A], A - 1, Primes).

pi(N) = Count, N < 2 => Count = 0.
pi(N) = Count =>
    M = floor(sqrt(N)),
    A = pi(M),
    Count = phi(N, A, math.primes(M)) + A - 1.

main =>
    N = 1,
    foreach (K in 0..9)
        writef("10^%w    %w%n", K, pi(N)),
        N := N * 10.

Output:

10^0	0
10^1	4
10^2	25
10^3	168
10^4	1229
10^5	9592
10^6	78498
10^7	664579
10^8	5761455
10^9	50847534

PicoLisp

Uses code from the Sieve of Eratosthenes to generate primes.

(setq Legendre-Max (** 10 9))

(load "plcommon/eratosthenes.l")  # see task "Sieve of Eratosthenes, 2x3x5x7 wheel version.

# Create an index tree of the first N primes up to √Legendre-Max
(setq Sieve (sieve (sqrt Legendre-Max)))
(balance 'Primes (mapcar 'cons (range 1 (length Sieve)) Sieve))
(de prime (N)  (cdr (lup Primes N)))

(de ϕ (X A)
   (cache '(NIL) (cons X A)
      (if (=0 A)
         X
         (- (ϕ X (dec A)) (ϕ (/ X (prime A)) (dec A))))))

(de π (N)
   (if (< N 2)
      0
      (let A (π (sqrt N))
         (+ (ϕ N A) A -1))))

(for N 10 
   (prinl "10\^" (dec N) "^I" (π (** 10 (dec N)))))

(bye)

Output:

10^0	0
10^1	4
10^2	25
10^3	168
10^4	1229
10^5	9592
10^6	78498
10^7	664579
10^8	5761455
10^9	50847534

Python

Using primesieve module (pip install primesieve).

from primesieve import primes
from math import isqrt
from functools import cache

p = primes(isqrt(1_000_000_000))

@cache
def phi(x, a):
    res = 0
    while True:
        if not a or not x:
            return x + res
    
        a -= 1
        res -= phi(x//p[a], a) # partial tail recursion

def legpi(n):
    if n < 2: return 0

    a = legpi(isqrt(n))
    return phi(n, a) + a - 1

for e in range(10):
    print(f'10^{e}', legpi(10**e))

Output:

10^0 0
10^1 4
10^2 25
10^3 168
10^4 1229
10^5 9592
10^6 78498
10^7 664579
10^8 5761455
10^9 50847534

Raku

Seems like an awful lot of pointless work. Using prime sieve anyway, why not just use it?

use Math::Primesieve;

my $sieve = Math::Primesieve.new;

say "10^$_\t" ~ $sieve.count: exp($_,10) for ^10;

say (now - INIT now) ~ ' elapsed seconds';

Output:

10^0	0
10^1	4
10^2	25
10^3	168
10^4	1229
10^5	9592
10^6	78498
10^7	664579
10^8	5761455
10^9	50847534
0.071464489 elapsed seconds

Sidef

func legendre_phi(x, a) is cached {
    return x if (a <= 0)
    __FUNC__(x, a-1) - __FUNC__(idiv(x, a.prime), a-1)
}

func legendre_prime_count(n) is cached {
    return 0 if (n < 2)
    var a = __FUNC__(n.isqrt)
    legendre_phi(n, a) + a - 1
}

print("e             n   Legendre    builtin\n",
      "-             -   --------    -------\n")

for n in (1 .. 9) {
    printf("%d  %12d %10d %10d\n", n, 10**n,
        legendre_prime_count(10**n), prime_count(10**n))
    assert_eq(legendre_prime_count(10**n), prime_count(10**n))
}

Output:

e             n   Legendre    builtin
-             -   --------    -------
1            10          4          4
2           100         25         25
3          1000        168        168
4         10000       1229       1229
5        100000       9592       9592
6       1000000      78498      78498
7      10000000     664579     664579

Alternative implementation of the Legendre phi function, by counting k-rough numbers <= n.

func rough_count (n,k) {

    # Count of k-rough numbers <= n.

    func (n,p) is cached {

        if (p > n.isqrt) {
            return 1
        }

        if (p == 2) {
            return (n >> 1)
        }

        if (p == 3) {
            var t = idiv(n,3)
            return (t - (t >> 1))
        }

        var u = 0
        var t = idiv(n,p)

        for (var q = 2; q < p; q.next_prime!) {

            var v = __FUNC__(t - (t % q), q)

            if (v == 1) {
                u += prime_count(q, p-1)
                break
            }

            u += v
        }

        t - u
    }(n*k, k)
}

func legendre_phi(n, a) {
     rough_count(n, prime(a+1))
}

func legendre_prime_count(n) is cached {
    return 0 if (n < 2)
    var a = __FUNC__(n.isqrt)
    legendre_phi(n, a) + a - 1
}

print("e             n   Legendre    builtin\n",
      "-             -   --------    -------\n")

for n in (1 .. 9) {
    printf("%d  %12d %10d %10d\n", n, 10**n,
        legendre_prime_count(10**n), prime_count(10**n))
    assert_eq(legendre_prime_count(10**n), prime_count(10**n))
}

Swift

import Foundation

extension Numeric where Self: Strideable {
  @inlinable
  public func power(_ n: Self) -> Self {
    return stride(from: 0, to: n, by: 1).lazy.map({_ in self }).reduce(1, *)
  }
}

func eratosthenes(limit: Int) -> [Int] {
  guard limit >= 3 else {
    return limit < 2 ? [] : [2]
  }

  let ndxLimit = (limit - 3) / 2 + 1
  let bufSize = ((limit - 3) / 2) / 32 + 1
  let sqrtNdxLimit = (Int(Double(limit).squareRoot()) - 3) / 2 + 1
  var cmpsts = Array(repeating: 0, count: bufSize)

  for ndx in 0..<sqrtNdxLimit where (cmpsts[ndx >> 5] & (1 << (ndx & 31))) == 0 {
    let p = ndx + ndx + 3
    var cullPos = (p * p - 3) / 2

    while cullPos < ndxLimit {
      cmpsts[cullPos >> 5] |= 1 << (cullPos & 31)

      cullPos += p
    }
  }

  return (-1..<ndxLimit).compactMap({i -> Int? in
    if i < 0 {
      return 2
    } else {
      if cmpsts[i >> 5] & (1 << (i & 31)) == 0 {
        return .some(i + i + 3)
      } else {
        return nil
      }
    }
  })
}

let primes = eratosthenes(limit: 1_000_000_000)

func φ(_ x: Int, _ a: Int) -> Int {
  struct Cache {
    static var cache = [String: Int]()
  }

  guard a != 0 else {
    return x
  }

  guard Cache.cache["\(x),\(a)"] == nil else {
    return Cache.cache["\(x),\(a)"]!
  }

  Cache.cache["\(x),\(a)"] = φ(x, a - 1) - φ(x / primes[a - 1], a - 1)

  return Cache.cache["\(x),\(a)"]!
}

func π(n: Int) -> Int {
  guard n > 2 else {
    return 0
  }

  let a = π(n: Int(Double(n).squareRoot()))

  return φ(n, a) + a - 1
}

for i in 0..<10 {
  let n = 10.power(i)

  print("π(10^\(i)) = \(π(n: n))")
}

Output:

π(10^0) = 0
π(10^1) = 4
π(10^2) = 25
π(10^3) = 168
π(10^4) = 1229
π(10^5) = 9592
π(10^6) = 78498
π(10^7) = 664579
π(10^8) = 5761455
π(10^9) = 50847534

Wren

Library: Wren-math

Memoized

This runs in about 5.7 seconds which is not too bad for the Wren interpreter. As map keys cannot be lists, the Cantor pairing function has been used to represent [x, a] which is considerably faster than using a string based approach for memoization.

To sieve a billion numbers and then count the primes up to 10^k would take about 53 seconds in Wren so, as expected, the Legendre method represents a considerable speed up.

import "/math" for Int

var limit = 1e9.sqrt.floor
var primes = Int.primeSieve(limit)
var memoPhi = {}

var phi // recursive function
phi = Fn.new { |x, a|
    if (a == 0) return x
    var key = Int.cantorPair(x, a)
    if (memoPhi.containsKey(key)) return memoPhi[key]
    var pa = primes[a-1]
    return memoPhi[key] = phi.call(x, a-1) - phi.call((x/pa).floor, a-1)
}

var pi // recursive function
pi = Fn.new { |n|
    if (n < 2) return 0
    var a = pi.call(n.sqrt.floor)
    return phi.call(n, a) + a - 1
}

var n = 1
for (i in 0..9) {
    System.print("10^%(i)  %(pi.call(n))")
    n = n * 10
}

Output:

10^0  0
10^1  4
10^2  25
10^3  168
10^4  1229
10^5  9592
10^6  78498
10^7  664579
10^8  5761455
10^9  50847534

Non-memoized

Inspired by the non-memoized Nim version.

Marginally quicker (5.4 seconds) than the memoized version and, of course, uses less memory.

import "./math" for Int

var pi = Fn.new { |n|
    if (n < 3) return (n < 2) ? 0 : 1
    var primes = Int.primeSieve(n.sqrt.floor)
    var a = primes.count

    var phi // recursive closure
    phi = Fn.new { |x, a|
        if (a <= 1) return (a < 1) ? x : x - (x >> 1)
        var pa = primes[a-1]
        if (x <= pa) return 1
        return phi.call(x, a-1) - phi.call((x/pa).floor, a-1)
    }

    return phi.call(n, a) + a - 1
}

var n = 1
for (i in 0..9) {
    System.print("10^%(i)  %(pi.call(n))")
    n = n * 10
}

Output:

Same as first version.