
Talk:Fivenum: Difference between revisions

== Large vs not large ==
I removed the requirement, as it seems unrelated to the task. We are faced with a choice here:
* Either the important part is the large dataset. But then, how large? Does the data fit in memory? On a single hard drive? Does it require multiple hard drives in a network of computers? A dataset that fits in memory does not look large to me. Of course, it's a matter of hardware: I have at hand a SAS VA server with 256 GB of memory, which is enough to do in-memory computations that would require a hard drive on most PCs. A really large file would require a network, and technology like Hadoop or Spark. If we insist on requiring all of this (which looks perfectly acceptable, as it would be a good exercise in managing large data), the task will be much more difficult, or impossible, for most languages. And the R solution would be wrong (but I imagine there are packages to do that correctly in R).
* Or the important part is computing these numbers. Then it's all about computing the median and quartiles (min and max are trivially doable in O(n)). This is a much simpler task, and every language should be able to do it.
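To make the second option concrete, here is a minimal Python sketch of an in-memory five-number summary. It assumes the hinge convention that, as far as I know, R's <code>fivenum()</code> follows (the quartiles are Tukey's hinges, averaged from the two nearest order statistics); other quartile definitions would give slightly different values. The function name and the sample data are only illustrative.

```python
import math


def fivenum(data):
    """Five-number summary (min, lower hinge, median, upper hinge, max).

    Assumption: quartiles are Tukey's hinges. The whole dataset is
    sorted in memory, so this only suits data that fits in RAM.
    """
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("fivenum requires at least one value")
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based (possibly fractional) positions of the five statistics
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the two order statistics surrounding each position
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]


if __name__ == "__main__":
    print(fivenum([15, 6, 42, 41, 7, 36, 49, 40, 39, 47, 43]))
```

The sort makes this O(n log n) in time and O(n) in memory, which is exactly why it says nothing about handling a dataset that does not fit in RAM.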
 
I would be happy with either possibility, but these are entirely different tasks, and if we have to manage large data, please state how large, and adapt the current solutions accordingly. All current solutions assume the dataset lies entirely in memory. For "usual" machines, that means the dataset is actually rather small. To give an example, most of my work is done on a business PC with 8 GB RAM and SAS/Stata/R (and I suspect most professional statisticians work on a daily basis on that kind of machine with that kind of software). Some of my work is done on a SAS VA server with 256 GB RAM. Still another part of my job is done in a Citrix environment connecting to larger servers (health data for the entire French population). Different machines, different ways to work, obviously.
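To illustrate why the two readings differ, here is a small Python sketch (the function name is hypothetical) of the part that does not need the data in memory: min and max can be obtained in a single pass over any stream, keeping only two values. The median and quartiles have no such exact shortcut in general; they need the data to be stored, re-read several times, or approximated.

```python
def streaming_min_max(values):
    """Return (min, max) from one pass over any iterable.

    Only two numbers are held in memory, so the input may be a
    generator reading from disk or from the network.
    """
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("empty input") from None
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


if __name__ == "__main__":
    # A generator stands in for a file too large to load at once
    print(streaming_min_max(x * x % 97 for x in range(1_000_000)))
```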
 
[[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 20:12, 27 February 2018 (UTC)