Verify distribution uniformity/Chi-squared test

From Rosetta Code

Revision as of 11:29, 15 September 2009

Task
Verify distribution uniformity/Chi-squared test
You are encouraged to solve this task according to the task description, using any language you may know.

In this task, write a function to verify that a given distribution of values is uniform by using Pearson's χ² test to check whether the probability of seeing a distribution at least this far from uniform is at least the significance level (conventionally 5%). The function should return a boolean that is true if the distribution is one that a uniform distribution (with the appropriate number of degrees of freedom) may be expected to produce.
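The task description can be sketched in a language-neutral way as follows. This is a minimal Python sketch, not one of the solutions on this page: the names `gamma_q`, `chi2_p_value` and `is_uniform` are hypothetical, and the p-value is obtained from the regularized upper incomplete gamma function Q(df/2, X²/2), evaluated here with the usual power series and continued fraction so that only the standard library is needed.

```python
import math
from collections import Counter


def gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); Q = 1 - P.
        n, term = a, 1.0 / a
        total = term
        for _ in range(10_000):
            n += 1.0
            term *= x / n
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefix)


def chi2_p_value(counts):
    """P-value of Pearson's chi-squared test of the counts against uniformity."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    expected = sum(counts) / k                      # same expected count everywhere
    x2 = sum((c - expected) ** 2 / expected for c in counts)
    return gamma_q((k - 1) / 2.0, x2 / 2.0)         # k - 1 degrees of freedom


def is_uniform(values, categories=None, significance=0.05):
    """True if the observed values are plausible for a uniform distribution."""
    tally = Counter(values)
    if not tally:
        raise ValueError("no values to test")
    k = len(tally) if categories is None else categories
    if k < len(tally):
        raise ValueError("more distinct values than categories")
    # Categories that never occurred still count, with an observed frequency of 0.
    counts = list(tally.values()) + [0] * (k - len(tally))
    return chi2_p_value(counts) >= significance


print(is_uniform([0, 1, 2, 3, 4] * 1000))   # True: every count equals the expected count
print(is_uniform([0] * 900 + [1] * 100))    # False
```

The comparison uses `>=` because the task asks for a likelihood of at least the significance level; the optional `categories` argument plays the same role as the optional left argument in the J solution.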

J

J has two types of verb definitions, Explicit and Tacit.

This is an Explicit solution:
<lang j>
require 'stats/base'

NB.*isUniform v Tests whether y is uniformly distributed
NB. result is: boolean describing if distribution y is uniform
NB. y is: distribution to test
NB. x is: optionally specify number of categories possible
isUniform=: verb define
  (#~. y) isUniform y
:
  signif=. 0.95                    NB. confidence level, i.e. 1 minus the 5% significance level
  expected=. (#y) % x              NB. number of items divided by the category count
  observed=. #/.~ y                NB. frequency count for each category
  X2=. +/ (*: observed - expected) % expected  NB. the test statistic
  degfreedom=. <: x                NB. degrees of freedom
  signif > degfreedom chisqcdf :: 1: X2
)
</lang>
This is the equivalent Tacit solution:
<lang j>
require 'stats/base'

SIGNIF=: 0.95                 NB. confidence level, i.e. 1 minus the 5% significance level
countCats=: #@~.              NB. counts the number of unique items
getExpected=: #@] % [         NB. divides no of items by category count
getObserved=: #/.~@]          NB. counts frequency for each category
calcX2=: [: +/ *:@(getObserved - getExpected) % getExpected  NB. calculates test statistic
calcDf=: <:@[                 NB. calculates degrees of freedom for uniform distribution

NB.*isUniformT v Tests whether y is uniformly distributed
NB. result is: boolean describing if distribution y is uniform
NB. y is: distribution to test
NB. x is: optionally specify number of categories possible
isUniformT=: (countCats $: ]) : (SIGNIF > calcDf chisqcdf :: 1: calcX2)
</lang>

Verbs and Distributions for testing:
<lang j>
freqCount=: ~. ,. #/.~

testFair=: monad define
  'distribution ', (":,freqCount y) , ' assessed as ', (isUniform y) {:: ;:'unfair fair'
)

FairDistrib=: 1e6 ?@$ 5
UnfairDistrib=: (9.5e5 ?@$ 5) , (5e4 ?@$ 4)
</lang>

Usage:
<lang j>
   testFair FairDistrib
distribution 4 143155 0 143085 1 143111 2 142706 6 142666 5 142365 3 142912 assessed as fair
   testFair UnfairDistrib
distribution 0 203086 1 201897 2 202648 3 202388 4 189981 assessed as unfair
</lang>
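A small calculation shows why the `UnfairDistrib` sample is a clear negative case. `UnfairDistrib` joins 950000 rolls over five values with 50000 rolls over only four values, so the value 4 is under-represented. The Python sketch below (the helper names `expected_counts` and `chi2_statistic` are hypothetical, and exact fractions are used only to keep the arithmetic free of rounding) works out the expected frequencies of that mix and the statistic they imply, then repeats the calculation for the counts posted above.

```python
from fractions import Fraction


def expected_counts(parts, categories):
    """Expected frequency of each value when every (size, sides) part
    draws `size` values uniformly from range(sides)."""
    totals = [Fraction(0)] * categories
    for size, sides in parts:
        for value in range(sides):
            totals[value] += Fraction(size, sides)
    return totals


def chi2_statistic(counts):
    """Pearson's statistic against a uniform expectation."""
    expected = Fraction(sum(counts), len(counts))
    return sum((c - expected) ** 2 / expected for c in counts)


mix = expected_counts([(950_000, 5), (50_000, 4)], categories=5)
print([int(c) for c in mix])         # [202500, 202500, 202500, 202500, 190000]
print(float(chi2_statistic(mix)))    # 625.0

posted = [203086, 201897, 202648, 202388, 189981]
print(float(chi2_statistic(posted)))  # about 631.08
```

With 4 degrees of freedom the 5% critical value of the chi-squared distribution is about 9.488, so a statistic in the hundreds is rejected by a wide margin.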

Tcl

Works with: Tcl version 8.5
Library: tcllib

<lang tcl>package require Tcl 8.5
package require math::statistics

proc isUniform {distribution {significance 0.05}} {
    set count [tcl::mathop::+ {*}[dict values $distribution]]
    set expected [expr {double($count) / [dict size $distribution]}]
    set X2 0.0
    foreach value [dict values $distribution] {
        set X2 [expr {$X2 + ($value - $expected)**2 / $expected}]
    }
    set degreesOfFreedom [expr {[dict size $distribution] - 1}]
    set likelihoodOfRandom [::math::statistics::incompleteGamma \
        [expr {$degreesOfFreedom / 2.0}] [expr {$X2 / 2.0}]]
    expr {$likelihoodOfRandom > $significance}
}</lang>
Testing:
<lang tcl>proc makeDistribution {operation {count 1000000}} {
    for {set i 0} {$i<$count} {incr i} {incr distribution([uplevel 1 $operation])}
    return [array get distribution]
}

set distFair [makeDistribution {expr int(rand()*5)}]
puts "distribution \"$distFair\" assessed as [expr [isUniform $distFair]?{fair}:{unfair}]"
set distUnfair [makeDistribution {expr int(rand()*rand()*5)}]
puts "distribution \"$distUnfair\" assessed as [expr [isUniform $distUnfair]?{fair}:{unfair}]"</lang>
Output:

distribution "0 199809 4 199649 1 200665 2 199607 3 200270" assessed as fair
distribution "4 21461 0 522573 1 244456 2 139979 3 71531" assessed as unfair
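The two verdicts above can be cross-checked by hand, independently of the Tcl implementation. The Python sketch below is an illustration only, with hypothetical helper names. It recomputes the statistic from the posted counts and turns it into a p-value using the closed form of the chi-squared upper tail, which holds for an even number of degrees of freedom (here 4): P(X ≥ x) = e^(-x/2) · Σ (x/2)^j / j! for j from 0 to df/2 - 1.

```python
import math


def chi2_statistic(counts):
    """Pearson's statistic against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_even(x2, df):
    """Upper-tail chi-squared probability; this closed form needs an even df."""
    if df < 2 or df % 2:
        raise ValueError("closed form requires an even, positive df")
    y = x2 / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= y / j          # builds y**j / j! incrementally
        total += term
    return math.exp(-y) * total


# Counts copied from the output above.
fair = {0: 199809, 4: 199649, 1: 200665, 2: 199607, 3: 200270}
unfair = {4: 21461, 0: 522573, 1: 244456, 2: 139979, 3: 71531}

for name, dist in (("fair", fair), ("unfair", unfair)):
    counts = list(dist.values())
    x2 = chi2_statistic(counts)
    p = chi2_sf_even(x2, len(counts) - 1)
    print(name, p > 0.05)      # fair True, then unfair False
```

For the fair sample the statistic is about 4.146, giving a p-value of about 0.387, well above 5%; for the unfair sample the statistic is so large that the p-value underflows to zero.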