Talk:Fivenum: Difference between revisions
== Large vs not large ==
I removed the requirement, as it seems unrelated to the task. We are faced with a choice here:
* Either the important part is the large dataset. But then, how large? Does the data fit in memory? On a single hard drive? Does it require multiple hard drives in a network of computers? A dataset that fits in memory does not look large to me; of course, that is a matter of hardware.
* Or the important part is computing these numbers. Then it's all about computing the median and quartiles (min and max are trivially computable in O(n)). A much simpler task, but one every language should be able to do.
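For the second reading, the whole task reduces to a few lines once the data is sorted in memory. A minimal sketch in Python, assuming the hinge formula that R's <code>fivenum</code> documents (hinges are averages of the two order statistics nearest the quarter positions); this is explicitly an in-memory illustration, not a large-data solution:

```python
import math

def fivenum(data):
    """Five-number summary (min, lower hinge, median, upper hinge, max),
    using the position formula of R's fivenum on an in-memory list."""
    x = sorted(data)               # O(n log n); dominates the O(n) min/max
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    positions = (1, n4, (n + 1) / 2, n + 1 - n4, n)
    # Average the two order statistics surrounding each (1-based) position.
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1])
            for p in positions]

print(fivenum([0, 0, 1, 2, 63, 61, 27, 13]))
```

Nothing here is hard for any language with a sort; the interesting question only appears once the data no longer fits in memory.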
I would be happy with either possibility, but they are entirely different tasks; if we have to manage large data, please state how large, and adapt the current solutions accordingly. All current solutions assume the dataset fits entirely in memory. On "usual" machines, that means the dataset is actually rather small. To give an example, most of my work is done on a business PC with 8GB RAM and SAS/Stata/R (and I suspect most professional statisticians work daily on that kind of machine with that kind of software). Some of my work is done on a SAS VA server with 265GB RAM. Still another part of my job is done in a Citrix environment connecting to larger servers (health data for the entire French population). Different machines, different ways of working, obviously.
[[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 20:12, 27 February 2018 (UTC)