Talk:Permutation test

== Difference in results? ==

I see that the Tcl and Ursala code seem to be calculating different results. How can this be the case? I've been careful to check that the number of cases generated in the Tcl code is the correct one (92378 for selecting 10 from 19) so I think that's correct... –Donal Fellows 14:48, 1 February 2011 (UTC)

I am getting different results from everyone else, myself. I have 80551 cases which have a difference in means which is less than or equal to the result's difference in means (and note that this includes the result), and I have 11827 cases where the difference in means is greater than the result's difference in means. My smallest difference in means is -0.518111 and my largest difference in means is 0.501556. My difference in means for the original result is 0.153222. Since other people are getting different numbers, I am curious what these statistics look like for them (or whether I have made a mistake -- which should show up in these stats). --Rdm 20:42, 3 February 2011 (UTC)
Ah, I think I see: I have 313 cases which are equal to my result; that's 0.34% of the total. I am using an epsilon of result*(2^-44), which is approximately 9e-15 and much tighter than experimental accuracy, and the largest difference between a value I classify as equal and the computed result is approximately 2e-16. Differing epsilons, or differences in floating point implementations on systems that use no epsilon, are enough to account for the differences I currently see on the task page. --Rdm 21:23, 3 February 2011 (UTC)
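A minimal Python sketch of this sort of epsilon-tolerant classification (the data lists and group sizes are taken as parameters rather than hard-coding the task's figures, and the 2^-44 relative tolerance is simply the value quoted above):

<pre>
from itertools import combinations

def classify(treatment, control, rel_eps=2.0 ** -44):
    # Classify every reassignment of the pooled data into groups of the same
    # sizes by comparing its difference in means against the empirical one,
    # treating anything within a relative epsilon as equal.
    data = treatment + control
    n_t, n_c = len(treatment), len(control)
    total = sum(data)
    observed = sum(treatment) / n_t - sum(control) / n_c
    eps = abs(observed) * rel_eps
    lesser_or_equal = equal = greater = 0
    for idx in combinations(range(len(data)), n_t):  # choose treatment positions
        t_sum = sum(data[i] for i in idx)
        diff = t_sum / n_t - (total - t_sum) / n_c
        if diff > observed + eps:
            greater += 1
        else:
            lesser_or_equal += 1
            if abs(diff - observed) <= eps:
                equal += 1
    return lesser_or_equal, equal, greater
</pre>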

When setting the task, I neglected to specify whether the difference in means should be calculated as the treatment group mean minus the control group mean, or vice versa. It's just a matter of convention, but I now realize that it affects the result: the 313 alternatives with equal differences that Rdm notes are there regardless, but the lesser and greater designations will of course depend on the direction of the subtraction. I propose that two results be accepted as valid: the counts reported below for treatment minus control, or the corresponding counts obtained when the subtraction is done the other way around.

Unfortunately there is still an issue of roundoff error, as Rdm also notes. Scaling the experimental data up by a factor of 100 seems to be an adequate workaround in the Ursala solution (and possibly in others using standard IEEE double precision math). Can any numerical analysis experts reading this please suggest a better one? I've tried using sums as a proxy for means, and also calculating means and sums by first sorting the numbers in order of absolute value, but neither helped. To help head off any further arguments about the results, here are my statistics, which agree with those of Rdm.

empirical difference in means (treatment - control):  0.153222222222222287

number of lesser or equal alternatives: 80551

partial sorted listing of lesser or equal alternatives (run length coded):

         (1,-0.518111111111111189)
         (1,-0.484333333333333227)
         (1,-0.482222222222222219)
         (1,-0.477999999999999980)
         (1,-0.467444444444444385)
         (1,-0.465333333333333377)
         (2,-0.463222222222222146)
         (2,-0.461111111111111138)
         (1,-0.456888888888888844)
         (1,-0.448444444444444479)
         .
         .
         .
         (357,0.134222222222222326)
         (357,0.136333333333333251)
         (351,0.138444444444444426)
         (336,0.140555555555555572)
         (335,0.142666666666666719)
         (335,0.144777777777777755)
         (325,0.146888888888888819)
         (319,0.148999999999999994)
         (319,0.151111111111111140)
         (313,0.153222222222222287)

number of greater alternatives: 11827

partial sorted listing of greater alternatives (run length coded):

         (310,0.155333333333333323)
         (297,0.157444444444444387)
         (299,0.159555555555555562)
         (288,0.161666666666666708)
         (286,0.163777777777777717)
         (286,0.165888888888888891)
         (269,0.167999999999999983)
         (273,0.170111111111111130)
         (265,0.172222222222222276)
         (266,0.174333333333333312)
         .
         .
         .
         (1,0.448777777777777720)
         (1,0.452999999999999958)
         (2,0.459333333333333316)
         (1,0.461444444444444435)
         (1,0.463555555555555554)
         (1,0.465666666666666673)
         (1,0.480444444444444452)
         (1,0.495222222222222230)
         (1,0.499444444444444413)
         (1,0.501555555555555532)

--Sluggo 02:06, 4 February 2011 (UTC)
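On the roundoff question above, one possible tightening of the scale-by-100 idea, sketched in Python under the assumption that the scaled data are exact integers: with the group sizes and the grand total fixed, the difference in means is a strictly increasing function of the chosen group's sum, so alternatives can be classified (including exact ties) by comparing integer sums, with no floating point comparison at all.

<pre>
from itertools import combinations

def classify_exact(treatment, control, scale=100):
    # Assumes the data become exact integers when multiplied by 'scale'
    # (e.g. values given to two decimal places).  Since
    #     diff = s/n_t - (total - s)/n_c = s*(1/n_t + 1/n_c) - total/n_c
    # is strictly increasing in the treatment-group sum s, comparing integer
    # sums is equivalent to comparing differences in means, but exact.
    data = [round(x * scale) for x in treatment + control]
    n_t = len(treatment)
    observed_sum = sum(round(x * scale) for x in treatment)
    lesser_or_equal = equal = greater = 0
    for idx in combinations(range(len(data)), n_t):
        s = sum(data[i] for i in idx)
        if s > observed_sum:
            greater += 1
        else:
            lesser_or_equal += 1
            if s == observed_sum:
                equal += 1
    return lesser_or_equal, equal, greater
</pre>

Exact rational arithmetic (e.g. Python's fractions module) would be another option if the data do not scale cleanly to integers.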

== Name of task ==

Shouldn't this be all combinations instead of all permutations? What difference does the order of members within the control or treatment group make (other than extra computation time)? --Rdm 16:08, 1 February 2011 (UTC)

I'm currently using "all combinations", which is reasonably fast (though I'm careful to treat each sample point independently, just in case there are repeated samples). All permutations is much slower (at least if you generate them directly), and I'm not sure how much difference it should make to the result, since you'd effectively just be multiplying each count by 10!×9! (= 1316818944000 in this example; we'd be waiting a while for the result) and then dropping all that anyway when you average out. –Donal Fellows 21:46, 1 February 2011 (UTC)
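For reference, both counts quoted in this discussion are easy to check (a quick verification, here in Python):

<pre>
from math import comb, factorial

# number of ways to choose which 10 of the 19 subjects form one group
assert comb(19, 10) == 92378

# orderings within the two groups: the constant factor by which enumerating
# full permutations would inflate every count relative to combinations
assert factorial(10) * factorial(9) == 1316818944000
</pre>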