User talk:Hailholyghost: Difference between revisions

From Rosetta Code

Revision as of 20:01, 12 March 2018

Welch's t-test

Hello, you wrote: "Welch's t-test is only part of the calculation, it isn't the purpose of the page."

What's the purpose of the page, then?

By the way, "Given two lists of data, calculate the p-value used for Calculation of P-value." is totally meaningless. Which p-value? There are zillions of p-values, associated with zillions of statistical tests. Basically all you have to know is the probability distribution of some statistic, but there are infinitely many of them: you are not going to ask for every possible distribution, so you have to choose one. Here, the task in its current state is entirely about Welch's t-test and how to compute the corresponding p-value, and yet you pretend it's not the purpose of the page. Puzzling.

Eoraptor (talk) 11:37, 8 December 2017 (UTC)

Welch's t-test is easy. The reason I created this page is that I had no idea how to calculate the integration, which is the point of the page. This page is meant to calculate this the same way as R's "t.test(x,y,paired=FALSE)". If this page were about the t-test, as you say it is, the C code would have 1 line and would be completely trivial. The point of the page is to show how to do the non-trivial things: integration. I spent weeks figuring out how to do this, and your title change obfuscates my hard work.--Hailholyghost (talk) 12:51, 8 December 2017 (UTC)
Then this is a very badly designed task. There are already tasks about integration. Anyway, nobody would compute the p-value this way: there are good implementations of special functions, here the incomplete beta function, and usually they make use of specific properties of the functions, like series, continued fractions, optimal rational (Padé-like) or polynomial (Chebyshev equioscillation) approximations, etc. If you want the p-value, the "standard" way is to find a special function library. Integration should be a last resort for Rosetta Code, for languages that do not have readily available special function libraries. Common languages such as C, Fortran or Python all have something (I implemented the task in Python using numpy/scipy, and there is something in Fortran using IMSL and SLATEC, both well-known libraries), and statistical packages even have built-in functions for the whole test (see Stata and R here).
If you want to see how to use a generalist integration routine to compute the p-value, you are doing it wrong. If you want a general task on integration, you are doing it wrong too, and there is already a task about this.
You write the C code is trivial: really? How? You will need at least the incomplete beta function from GSL (or IMSL, NAG or any library with it, even a Fortran-based library), and this will require more than one line (you may do more or less what is done here in Fortran or Python). Even getting a working GSL on Windows is not trivial at all. And what you call the non-trivial thing is really not about integration at all.
Eoraptor (talk) 13:13, 8 December 2017 (UTC)
For instance, the R implementation uses a port of TOMS 708 from the ACM. The original code is here. Notice that, as usual with special functions, there are calls to different methods according to the values of the arguments. There are reasons not to reimplement this from scratch: it requires much research work to prove a given algorithm is correct, to find the necessary coefficients with enough precision, and to get a correct and efficient implementation. Here you are not even trying to investigate the convergence of the integral, and it's an inefficient way to find an accurate answer. Eoraptor (talk) 13:30, 8 December 2017 (UTC)
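
To make the "special function library" route concrete, here is a minimal sketch in Python. It assumes numpy and scipy are installed, and it relies on the identity that the two-sided p-value equals the regularized incomplete beta function I_x(df/2, 1/2) evaluated at x = df/(t^2 + df). The function name welch_p_value and the sample data are made up for illustration and are not taken from the task page.

```python
import numpy as np
from scipy import special, stats


def welch_p_value(x, y):
    """Two-sided p-value of Welch's t-test via the regularized incomplete beta."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared standard errors of the two sample means (unbiased variances).
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    # scipy.special.betainc(a, b, x) is the regularized function I_x(a, b).
    return special.betainc(df / 2, 0.5, df / (t ** 2 + df))


# Illustrative data only (hypothetical values).
sample_a = [19.8, 20.4, 19.6, 17.8, 18.5]
sample_b = [28.2, 26.6, 20.1, 23.3, 25.2]
p_manual = welch_p_value(sample_a, sample_b)
# The built-in test for unequal variances should agree with the manual route.
p_builtin = stats.ttest_ind(sample_a, sample_b, equal_var=False).pvalue
print(p_manual, p_builtin)
```

The comparison with scipy.stats.ttest_ind is only a sanity check: in practice one would call the built-in test directly.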

"You will need at least the incomplete beta function" you are contradicting yourself. The Welch's t-test requires no integration. It took me a long time to implement this in portable GNU99 C, which wouldn't require library installations. This was a particularly daunting task, because I couldn't install the libraries- they were useless. I thought the world could benefit from my work. If you have some better way of doing this, fine, re-write the C code to implement the algorithms you mention.

Nonetheless, I don't see how to alter the title of the page. It appears that this page is stuck with a wrong title.

Oh, no integration, no incomplete beta function? That is going to be quite difficult. You need the incomplete beta function, which can be written as an integral. Either you integrate, or you apply clever techniques to compute the function without integration. But you have to do something. And what you ask for is not doable anyway (there is no "general p-value"). You don't seem to understand what you are doing wrong. Good luck. I give up. Eoraptor (talk) 15:46, 8 December 2017 (UTC)
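
For readers following the thread, the "either you integrate" option can be sketched with the standard library alone. This is not the code from the task page: it is a plain composite Simpson's rule applied to the integral form of the regularized incomplete beta function, and it only works under the stated assumptions (at least 2 degrees of freedom and t different from 0, so that the integrand stays finite). The function names are hypothetical.

```python
import math


def incomplete_beta_by_integration(x, a, b, steps=10_000):
    """Regularized incomplete beta I_x(a, b) via composite Simpson's rule.

    Sketch only: assumes a >= 1 and 0 <= x < 1 so the integrand stays finite.
    """
    if a < 1 or not 0 <= x < 1:
        raise ValueError("this sketch needs a >= 1 and 0 <= x < 1")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    def integrand(u):
        return u ** (a - 1) * (1 - u) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    # Normalize by the complete beta function B(a, b), computed via log-gamma.
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


def two_sided_p_value(t, df):
    """Two-sided Student t p-value: I_x(df / 2, 1 / 2) at x = df / (t^2 + df)."""
    return incomplete_beta_by_integration(df / (t * t + df), df / 2, 0.5)
```

With 2 degrees of freedom the p-value has the closed form 1 - |t| / sqrt(t^2 + 2), which gives a quick check of the sketch. The restrictions are the interesting part: with fewer than 2 degrees of freedom, or with t close to 0, the integrand becomes singular at an endpoint, and that is the kind of case where library routines switch to series or continued fractions.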


Hi,

Suggested reading for your Python code: PEP 8 -- Style Guide for Python Code. Most notably, tabs are not recommended (4 spaces per indentation level is the norm), line length should be limited, there should be no space after an opening paren or before a closing paren, no space between a function name and the following paren... While it may look stupid to you, these rules are applied in virtually all published Python code, and this is an important part of the readability of this language. There is also a minor bug at the beginning (two tests on ARRAY1_SIZE, none on ARRAY2_SIZE), and "while (1):" would rather be written "while True:". There are other oddities (inconsistent spacing around operators at least). If you don't mind, I could rewrite this in a more "pythonistic" way.
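
To illustrate the conventions listed above, here is a small hypothetical fragment (not the code from the task page): 4-space indentation, no spaces just inside parentheses, no space between a function name and its paren, consistent spacing around operators, one test per array size, and "while True:" for an open-ended loop. The function names and the minimum size of 2 are assumptions made for the example.

```python
def validate_sizes(array1_size, array2_size):
    """Return True when both samples have at least two values."""
    # Each size is tested once; the bug noted above tested ARRAY1_SIZE twice.
    if array1_size < 2:
        return False
    if array2_size < 2:
        return False
    return True


def count_halvings(value, threshold=1.0):
    """Halve value until it drops below threshold (assumed positive)."""
    steps = 0
    while True:  # preferred over "while (1):"
        if value < threshold:
            return steps
        value /= 2
        steps += 1
```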

As an aside, it's an often overlooked matter (and Burkardt was guilty as well, as far as I was able to check): translating from one language to another requires more than just converting syntax. It's a criticism that was also made of the famous (or infamous) Numerical Recipes: the C version looked too much like the Fortran original, and not like usual C.

I also moved the numpy solution first: not that I am eager to put "my" solution first, but: 1/ It's much shorter, so the second solution is more visible if it's the low-level implementation. 2/ The low-level implementation is not what would actually be done in Python, for several reasons: speed (use compiled code when it's available), but most importantly because statistical computations in Python are usually done with the numpy/scipy/pandas framework, which is much closer to what's available in, say, R, than "pure Python", which would require reinventing everything.

HTH

Eoraptor (talk) 13:08, 10 March 2018 (UTC)

Hi Eoraptor, thanks for your suggestions. I've tried to re-edit my code as much as I could. I am new, and am learning from old code. I don't know how I can make the code idiomatic. I make sure that the output is correct before I post the code. Please feel free to edit what I put, I won't be insulted.--Hailholyghost (talk) 20:01, 12 March 2018 (UTC)