Logistic curve fitting in epidemiology: Difference between revisions

Content added Content deleted

Inline

Revision as of 14:06, 29 May 2020

The least-squares method (see references below) in statistics is used to fit data to the best of a family of similar curves by finding the parameters for a curve which minimizes the total of the distances from each data point to the curve.

Often, the curve used is a straight line, in which case the method is also called linear regression. If a curve which uses logarithmic growth is fit, the method can be called logistic regression.

A commonly used family of functions used in statistical studies of populations, including the growth of epidemics, are curves akin to the logistic curve:

   f(x) = L / (1 + e^-k(x-x₀))

Though predictions based on fitting to such curves may err, especially if used to extrapolate from incomplete data, curves similar to the logistic curve have had good fits in population studies, including modeling the growth of past epidemics.

The task:

Task

Given the following daily world totals since December 31, 2019 for persons

who have become infected with the novel coronavirus Covid-19:

Daily totals:
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820,
4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273,
31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103,
69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339, 
80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077, 
95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 
133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364, 
271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 
656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652

Use the following variant of the logistic curve as a formula:

   f(t) = n₀ e^{(r t)} / ((1 + n₀ (e^{(r t)} - 1) / K)

Where r is the rate of growth of the infection in the population.

The R0 of an infection (different from r above) is a measure of how many new individuals will become infected for every individual currently infected. It is an important measure of how quickly an infectious disease may spread.

R0 is related to the logistic curve's r parameter by the formula

   r ≈ ln(R0) / G

where G, the generation time, is roughly the sum of the incubation time, perhaps 5 days, and the mean contagion period, perhaps 7 days, so, for covid-19, roughly we have

   R0 ≈ e^12r

Assume the following constants hold in the formula above:

K is the world population, about 7.8 billion
n0 is 27, the number of cases found in China at the start of the pandemic.

Task

Demonstrate code that finds a least-squares fits of the curve to the data.
Show the calculated r for the logistic curve.
Show the final R0 parameter you calculate from the logistic curve r value parameter.

See also

[Basic reproduction number] [Logistic functions] [Least squares] [World covid-19 case tallies] [Calculating r and R0]

Go

Library: lm

This uses the Levenberg-Marquardt method. <lang go>package main

import (

   "fmt"
   "github.com/maorshutman/lm"
   "log"
   "math"

)

const (

   K  = 7_800_000_000 // approx world population
   n0 = 27            // number of cases at day 0

)

var y = []float64{

   27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
   61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
   2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
   24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
   60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
   76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
   85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
   105824, 109695, 114232, 118610, 125497, 133852, 143227,
   151367, 167418, 180096, 194836, 213150, 242364, 271106,
   305117, 338133, 377918, 416845, 468049, 527767, 591704,
   656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
   1174652,

}

func f(dst, p []float64) {

   for i := 0; i < len(y); i++ {
       t := float64(i)
       dst[i] = (n0*math.Exp(p[0]*t))/(1+n0*(math.Exp(p[0]*t)-1)/K) - y[i]
   }

}

func main() {

   j := lm.NumJac{Func: f}
   prob := lm.LMProblem{
       Dim:        1,
       Size:       len(y),
       Func:       f,
       Jac:        j.Jac,
       InitParams: []float64{0.5},
       Tau:        1e-6,
       Eps1:       1e-8,
       Eps2:       1e-8,
   }
   res, err := lm.LM(prob, &lm.Settings{Iterations: 100, ObjectiveTol: 1e-16})
   if err != nil {
       log.Fatal(err)
   }
   r := res.X[0]
   fmt.Printf("The logistic curve r for the world data is %.8f\n", r)
   fmt.Printf("R0 is then approximately equal to %.7f\n", math.Exp(12*r))

}</lang>

Output:

The logistic curve r for the world data is 0.11230218
R0 is then approximately equal to 3.8482793

Julia

<lang julia>using LsqFit

const K = 7_800_000_000 # approximate world population const n0 = 27 # starting at day 0 with 27 Chinese cases

""" The model for logistic regression with a given r """ @. model(t, r) = (n0 * exp(r * t)) / (( 1 + n0 * (exp(r * t) - 1) / K))

Daily world totals of covid cases, all countries

ydata = [ 27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652, ] tdata = collect(LinRange(0.0, 96, 97))

starting approximation for r of 1/2

rparam = [0.5]

fit = curve_fit(model, tdata, ydata, rparam)

Our answer for r given the world data and simplistic model

r = fit.param println("The logistic curve r for the world data is: ", r) println("The confidence interval at 5% significance is: ",

   confidence_interval(fit, 0.05))

println("Since R0 ≈ exp(G * r), and G ≈ 12 days, R0 ≈ ", exp(12r[1]))

</lang>

Output:

The logistic curve r for the world data is: [0.11230217572265622]
The confidence interval at 5% significance is: [(0.11199074156706985, 0.11261360987824258)]
Since R0 ≈ exp(G * r), and G ≈ 12 days, R0 ≈ 3.8482792820761063

Perl

Translation of: Phix

<lang perl>use strict; use warnings;

my $K = 7_800_000_000; # population my $n0 = 27; # cases @ day 0

my @y = (

   27,     27,     27,     44,     44,     59,     59,     59,    59,      59,     59,     59,     59,
   60,     60,     61,     61,     66,     83,    219,    239,    392,    534,    631,    897,   1350,
 2023,   2820,   4587,   6067,   7823,   9826,  11946,  14554,  17372,  20615,  24522,  28273,  31491,
34933,  37552,  40540,  43105,  45177,  60328,  64543,  67103,  69265,  71332,  73327,  75191,  75723,
76719,  77804,  78812,  79339,  80132,  80995,  82101,  83365,  85203,  87024,  89068,  90664,  93077,
95316,  98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096,

194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353, 777796, 851308, 928436,1000249,1082054,1174652 );

sub logistic_func {

   my($r) = @_;
   my $sq = 0;
   for my $i (0 .. @y-1) {
       my $eri = exp($r * $i);
       my $dst = ($n0 * $eri) / (1 + $n0 * ($eri-1) / $K) - $y[$i];
       $sq = $sq + $dst**2;
   }
   $sq

}

sub solve {

   my($fn, $guess, $epsilon) = @_;
   my($nfm,$nfp);
   my $f0 = &$fn($guess);
   my $delta = $guess;
   my $factor = 2;
   while ($delta > $epsilon) {
       ($nfm = &$fn($guess - $delta)) < $f0 ?
           ($f0 = $nfm, $guess -= $delta, $delta *= $factor)
       :
       ($nfp = &$fn($guess + $delta)) < $f0 ?
           ($f0 = $nfp, $guess += $delta, $delta *= $factor)
       :
           $delta /= $factor
   }
   $guess

}

my $r = solve(\&logistic_func, 0.5, 0); my $R0 = exp(12 * $r); printf "r = %%(%.3f), R0 = %%(%.3f)\n", $r, $R0;</lang>

Output:

r = %(0.112), R0 = %(3.848)

Phix

Simplified, my interpretation of shift-cutting (from wp:Non-linear_least_squares) <lang Phix>-- demo\rosetta\Curve_fit.exw constant K = 7_800_000_000, -- approx world population

        n0 = 27,               -- number of cases at day 0
        actual = {
   27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
   61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
   2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
   24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
   60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
   76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
   85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
   105824, 109695, 114232, 118610, 125497, 133852, 143227,
   151367, 167418, 180096, 194836, 213150, 242364, 271106,
   305117, 338133, 377918, 416845, 468049, 527767, 591704,
   656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
   1174652 }

function f(atom r)

   atom sq = 0
   for i=1 to length(actual) do
       atom eri = exp(r*(i-1)),
            guess = (n0*eri)/(1+n0*(eri-1)/K),
            diff = guess-actual[i]
       sq += diff*diff
   end for
   return sq

end function

function solve(integer f, atom guess=0.5, epsilon=0)

   atom f0 = f(guess),
        delta = iff(guess?guess:1),
        factor = 2 -- double until f0 best, then
                   -- halve until delta<=epsilon
                   --  or we hit precision limit
   while delta>epsilon             -- (predefined limit)
     and guess!=guess-delta do     -- (precision limit)
       atom nf = f(guess-delta)
       if nf<f0 then
           f0 = nf
           guess -= delta
       else
           nf = f(guess+delta)
           if nf<f0 then
               f0 = nf
               guess += delta
           else
               factor = 0.5
           end if
       end if
       delta *= factor
   end while
   return guess

end function

atom r = solve(f),

    R0 = exp(12 * r)

printf(1,"r = %.10f, R0 = %.8f\n",{r,R0})</lang>

Output:

r = 0.1123021757, R0 = 3.84827928

Python

Uses NumPy/SciPy's optimize package. <lang python>import numpy as np import scipy.optimize as opt

n0, K = 27, 7_800_000_000

def f(t, r):

   return (n0 * np.exp(r * t)) / (( 1 + n0 * (np.exp(r * t) - 1) / K))

y = [ 27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652, ] x = np.linspace(0.0, 96, 97)

r, cov = opt.curve_fit(f, x, y, [0.5])

Our answer for r given the world data and simplistic model

print("The r for the world Covid-19 data is:", r,

   ", with covariance of", cov)

print("The calculated R0 is then", np.exp(12 * r))

</lang>

Output:

The r for the world Covid-19 data is: [0.11230218] , with covariance of [[2.46164331e-08]]
The calculated R0 is then [3.8482793]

R

<lang R> library(minpack.lm)

K <- 7800000000 # approximate world population n0 <- 27 # starting at day 0 with 27 Chinese cases

x <- seq(from=0.0, 96.0, length=97)

The model for logistic regression with a given r0

getPred <- function(par, xx) (n0 * exp(par$r * xx)) / (( 1 + n0 * (exp(par$r * xx) - 1) / K))

residFun <- function(p, observed, xx) observed - getPred(p, xx)

parStart <- list(r=0.5)

Daily world totals of covid cases, all countries

observed <- c(

 27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
 61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
 2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
 24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
 60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
 76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
 85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
 105824, 109695, 114232, 118610, 125497, 133852, 143227,
 151367, 167418, 180096, 194836, 213150, 242364, 271106,
 305117, 338133, 377918, 416845, 468049, 527767, 591704,
 656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
 1174652)

nls.out <- nls.lm(par=parStart, fn=residFun, observed=observed, xx = x)

cat("The r for the model is: ", coef(nls.out), "\n")

cat("Therefore, R0 is approximately: ", exp(12 * coef(nls.out)))

</lang>

Output:

The r for the model is:  0.1123022 
Therefore, R0 is approximately:  3.848279

Raku

Works with: Rakudo version 2020.05

Translation of: Perl

Original task numbers of cases per day as of April 05 plus updated, as of today, May 11 numbers.

<lang perl6>my $K = 7_800_000_000; # population my $n0 = 27; # cases @ day 0

my @Apr05 = <

        27      27      27      44      44      59      59      59      59      59
        59      59      59      60      60      61      61      66      83     219
       239     392     534     631     897    1350    2023    2820    4587    6067
      7823    9826   11946   14554   17372   20615   24522   28273   31491   34933
     37552   40540   43105   45177   60328   64543   67103   69265   71332   73327
     75191   75723   76719   77804   78812   79339   80132   80995   82101   83365
     85203   87024   89068   90664   93077   95316   98172  102133  105824  109695
    114232  118610  125497  133852  143227  151367  167418  180096  194836  213150
    242364  271106  305117  338133  377918  416845  468049  527767  591704  656866
    715353  777796  851308  928436 1000249 1082054 1174652

>;

my @May11 = <

        27      27      27      44      44      59      59      59      59      59
        59      59      59      60      60      61      61      66      83     219
       239     392     534     631     897    1350    2023    2820    4587    6067
      7823    9826   11946   14554   17372   20615   24522   28273   31491   34933
     37552   40540   43105   45177   60328   64543   67103   69265   71332   73327
     75191   75723   76719   77804   78812   79339   80132   80995   82101   83365
     85203   87026   89068   90865   93077   95316   98172  102133  105824  109695
    114235  118613  125497  133853  143259  154774  167418  180094  194843  213149
    242374  271116  305235  338235  377968  416881  468092  527839  591796  656834
    715377  777187  851587  928491 1006063 1087822 1174306 1245601 1316988 1391881
   1476792 1563819 1653160 1734868 1807256 1873639 1953786 2033745 2117297 2199460
   2278484 2350993 2427353 2513399 2579823 2657910 2731217 2832750 2915977 2981542
   3054404 3131487 3216467 3308341 3389459 3468047 3545486 3624789 3714816 3809262
   3899379 3986931 4063525

>;

sub logistic-func ($rate, @y) {

   my $sq = 0;
   for ^@y -> $time {
       my $ert = exp($rate * $time);
       my $δt = ($n0 * $ert) / (1 + $n0 * ($ert-1) / $K) - @y[$time];
       $sq += $δt²;
   }
   $sq

}

sub solve (&f, $guess, \ε, @y) {

   my $fₙ-minus;
   my $fₙ-plus;
   my $rate   = $guess;
   my $fₙ     = f $rate, @y;
   my $Δ      = $rate;
   my $factor = 2;
   while $Δ > ε {
       ($fₙ-minus = f $rate - $Δ, @y) < $fₙ ??
       do {
           $fₙ    = $fₙ-minus;
           $rate -= $Δ;
           $Δ    *= $factor;
       } !!
       ($fₙ-plus = f $rate + $Δ, @y) < $fₙ ??
       do {
           $fₙ    = $fₙ-plus;
           $rate += $Δ;
           $Δ    *= $factor;
       } !!
       $Δ /= $factor
   }
   $rate

}

for @Apr05, 'Dec 31 - Apr 5',

   @May11, 'Dec 31 - May 11' -> @y, $period {
   my $rate = solve(&logistic-func, 0.5, 0, @y);
   my $R0   = exp(12 * $rate);
   say "\nFor period: $period";
   say "Instantaneous rate of growth: r = " , $rate.fmt('%08f');
   say "Reproductive rate: R0 = ", $R0.fmt('%08f');

} </lang>

Output:

For period: Dec 31 - Apr 5
Instantaneous rate of growth: r = 0.112302
Reproductive rate: R0 = 3.848279

For period: Dec 31 - May 11
Instantaneous rate of growth: r = 0.093328
Reproductive rate: R0 = 3.064672

Wren

Translation of: Phix

Library: Wren-math

<lang ecmascript>import "/math" for Math

var K = 7800000000 // approx world population var n0 = 27 // number of cases at day 0

var y = [

   27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
   61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
   2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
   24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
   60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
   76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
   85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
   105824, 109695, 114232, 118610, 125497, 133852, 143227,
   151367, 167418, 180096, 194836, 213150, 242364, 271106,
   305117, 338133, 377918, 416845, 468049, 527767, 591704,
   656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
   1174652

]

var f = Fn.new { |r|

   var sq = 0
   for (i in 0...y.count) {
       var eri = Math.exp(r*i)
       var dst = (n0*eri)/(1+n0*(eri-1)/K) - y[i]
       sq = sq + dst * dst
   }
   return sq

}

var solve = Fn.new { |f, guess, epsilon|

   var f0 = f.call(guess)
   var delta = guess
   var factor = 2 // double until f0 best then halve until delta <= epsilon
   while (delta > epsilon) {
       var nf = f.call(guess - delta)
       if (nf < f0) {
           f0 = nf
           guess = guess - delta
       } else {
           nf = f.call(guess + delta)
           if (nf < f0) {
               f0 = nf
               guess = guess + delta
           } else {
               factor = 0.5
           }
       }
       delta = delta * factor
   }
   return guess

}

var r = (solve.call(f, 0.5, 0) * 1e10).round / 1e10 var R0 = (Math.exp(12 * r) * 1e8).round / 1e8 System.print("r = %(r), R0 = %(R0)")</lang>

Output:

r = 0.1123021757, R0 = 3.84827928