Talk:Statistics/Chi-squared distribution

From Rosetta Code

Chi-squared cumulative probability distribution

I'm wondering whether the formula given in the task description for the above is in fact correct?

According to Wikipedia, for the case k = 2, this formula reduces to F(x; 2) = 1 - exp(-x/2). But for values of x from 0 to 10, the only way I can get this simple formula to agree with the one in the task description is if I halve 'x' first before applying the latter. Here's the Wren code I have so far in case I'm missing something or have done something silly:

import "./math" for Math
import "./fmt" for Fmt

class Chi2 {
    static pdf(x, k) {
        if (x <= 0) return 0  // this should be in task description
        return (-x/2).exp * x.pow(k/2-1) / (2.pow(k/2) * Math.gamma(k/2))
    }

    static cpdf(x, k) {
        x = x / 2  // need to do this to agree with Wikipedia formula for k = 2
        var t = (-x).exp * x.pow(k/2) * Math.gamma(k/2) / Math.gamma(k)
        var s = 0
        var m = 0
        var tol = 1e-15 // say
        while (true) {
            var term = x.pow(m) / Math.gamma(k/2 + m + 1)
            s = s + term
            if (term.abs < tol) break
            m = m + 1
        }
        return t * s
    }
}
System.print("    Values of the χ2 probablility distribution function")
System.print(" x/k    1         2         3         4         5")
for (x in 0..10) {
    Fmt.write("$2d  ", x)
    for (k in 1..5) {
        Fmt.write("$f  ", Chi2.pdf(x, k))
    }
    System.print()
}
System.print("\n------Checking cpdf formula works for k = 2-------")
System.print("\n    Values of the χ2 cumulative probability distribution function for k = 2")
System.print(" x")
for (x in 0..10) {
    Fmt.print("$2d  $f", x,  Chi2.cpdf(x, 2))
}
System.print("\n    Values of the χ2 cumulative pdf using simple formula for k = 2")
System.print(" x")
for (x in 0..10) {
    Fmt.print("$2d  $f", x, 1 - (-x/2).exp)
}
Output:
    Values of the χ2 probablility distribution function
 x/k    1         2         3         4         5
 0  0.000000  0.000000  0.000000  0.000000  0.000000  
 1  0.241971  0.303265  0.241971  0.151633  0.080657  
 2  0.103777  0.183940  0.207554  0.183940  0.138369  
 3  0.051393  0.111565  0.154180  0.167348  0.154180  
 4  0.026995  0.067668  0.107982  0.135335  0.143976  
 5  0.014645  0.041042  0.073225  0.102606  0.122042  
 6  0.008109  0.024894  0.048652  0.074681  0.097304  
 7  0.004553  0.015099  0.031873  0.052845  0.074371  
 8  0.002583  0.009158  0.020667  0.036631  0.055112  
 9  0.001477  0.005554  0.013296  0.024995  0.039887  
10  0.000850  0.003369  0.008500  0.016845  0.028335  

------Checking cpdf formula works for k = 2-------

    Values of the χ2 cumulative probability distribution function for k = 2
 x
 0  0.000000
 1  0.393469
 2  0.632121
 3  0.776870
 4  0.864665
 5  0.917915
 6  0.950213
 7  0.969803
 8  0.981684
 9  0.988891
10  0.993262

    Values of the χ2 cumulative pdf using simple formula for k = 2
 x
 0  0.000000
 1  0.393469
 2  0.632121
 3  0.776870
 4  0.864665
 5  0.917915
 6  0.950213
 7  0.969803
 8  0.981684
 9  0.988891
10  0.993262

As there's some heavy math here, a Julia or Python example to check against would be useful. --PureFox (talk) 09:46, 2 October 2022 (UTC)

I haven't looked closely at this one, yet. But it's also worth asking whether the wikipedia entry is using notation in the way we expect
That said, there's a small j wiki entry on this subject which claims to include an example which matches "calculated results against values from the Handbook of Mathematical Functions by Abramowitz and Stegun, Table 26.8." --Rdm (talk) 09:59, 2 October 2022 (UTC)

Well, the Wikipedia formula looks correct to me. If we substitute k = 2 in their formula for the pdf (which agrees with the task description so there's no notational difference here) we get (since gamma(1) = 1):

f(x; 2) = exp(-x/2) / 2

and if we integrate that we get:

F(x; 2) = 1 - exp(-x/2)

PureFox (talk) 10:48, 2 October 2022 (UTC)

This tells us that that notation in the task description matches that notation in the wikipedia entry. --Rdm (talk) 11:11, 2 October 2022 (UTC)
I am right now also having trouble with the cdf formula at the level of the incomplete Gamma I am using. I think I may have to calculate it in 2 or 3 different ways depending on x and k. --Wherrera (talk) 18:33, 2 October 2022 (UTC)
Well, I gave up getting incomplete gamma to work accurately with just two methods, and am just going to call the one in the BigFloat MPFR library, which I think uses six or more methods for incomplete gamma based on the a and x values. --Wherrera (talk) 19:43, 2 October 2022 (UTC)
I did find that the two formulas I had to calculate the last summation used 2 different meanings for x which had things overly complicated and incorrect. Fixed. --Wherrera (talk) 01:22, 3 October 2022 (UTC)

Thanks for correcting the cumulative pdf formula. I'm pleased we didn't have to resort to the mpfr_gamma_inc function as it's not one I've wrapped in Wren-GMP. --PureFox (talk) 09:48, 3 October 2022 (UTC)