Fivenum
Many big-data and scientific programs use boxplots to show the distribution of data. Keeping the full arrays in memory just to draw boxplots can be impractical and can consume large amounts of RAM. In such cases it is useful to reduce each large array to five numbers. R provides this in its base `stats` package as the `fivenum` function.
Task Description
Given a large array, reduce it to the five numbers (minimum, lower hinge, median, upper hinge, maximum) that have the same boxplot properties as the full array.
Perl
<lang Perl>
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd 'getcwd';
use feature 'say';

my $TOP_DIRECTORY = getcwd();
# Kill the program if there are any warnings
# (see http://perlmaven.com/how-to-capture-and-save-warnings-in-perl)
local $SIG{__WARN__} = sub {
    my $message = shift;
    my $fail_filename = "$TOP_DIRECTORY/$0.FAIL";
    open my $fh, '>', $fail_filename or die "Can't write $fail_filename: $!";
    printf $fh ("$message @ %s\n", getcwd());
    close $fh;
    die "$message\n";
};
use POSIX qw(ceil floor);
sub fivenum {
    my $array = shift;
    my @x = sort { $a <=> $b } @{$array};
    my $n = scalar @x;
    printf("There are %u elements.\n", $n);
    if ($n == 0) {
        print "no values were entered into fivenum.\n";
        die;
    }
    my $n4 = floor(($n + 3) / 2) / 2;
    # 1-based positions, as in R: d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
    my @d = (1, $n4, ($n + 1) / 2, $n + 1 - $n4, $n);
    my (@floor_d, @ceiling_d);
    foreach my $i (0..4) {
        $floor_d[$i]   = floor($d[$i]);
        $ceiling_d[$i] = ceil($d[$i]);
    }
    my @sum_array;
    foreach my $e (0..4) {
        # Sanity checks: every position and every element it refers to must exist
        if (not defined $floor_d[$e]) { say "\$floor_d[$e] isn't defined."; die; }
        if (not defined $ceiling_d[$e]) { say "\$ceiling_d[$e] isn't defined."; die; }
        if (!defined $x[$floor_d[$e]-1]) { say "\$x[$floor_d[$e]-1] isn't defined."; die; }
        if (!defined $x[$ceiling_d[$e]-1]) { say "\$x[$ceiling_d[$e]-1] isn't defined."; die; }
        # Average the two neighbouring sorted values (positions converted to 0-based)
        push @sum_array, 0.5 * ($x[$floor_d[$e]-1] + $x[$ceiling_d[$e]-1]);
    }
    return @sum_array;
}
my @x = qw(0.14082834 0.09748790 1.73131507 0.87636009 -1.95059594 0.73438555 -0.03035726 1.46675970 -0.74621349 -0.72588772 0.63905160 0.61501527
-0.98983780 -1.00447874 -0.62759469 0.66206163 1.04312009 -0.10305385 0.75775634 0.32566578);
my @y = fivenum(\@x);
say join (',', @y); </lang>
Output:
-1.95059594,-0.676741205,0.23324706,0.746070945,1.73131507
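Python
The following is a minimal sketch of how the same calculation might look in Python, using only the standard library. It is a direct port of the logic in R's `fivenum` (reproduced in the R entry), not an official or canonical implementation. The names `fivenum` and `sample` are chosen here for illustration. Two choices are assumptions of this sketch: missing values (`None` or NaN) are dropped, mirroring `na.rm = TRUE`, and an empty input returns five NaN values as a stand-in for R's `NA`.

```python
import math


def fivenum(values):
    """Return [minimum, lower hinge, median, upper hinge, maximum]."""
    # Drop missing values (assumption: None and NaN both count as missing)
    x = sorted(v for v in values if v is not None and not math.isnan(v))
    n = len(x)
    if n == 0:
        # Assumed stand-in for R's rep.int(NA, 5)
        return [math.nan] * 5
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based, possibly fractional positions, as in the R source
    d = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the sorted values on either side of each position
    # (subtract 1 to convert to Python's 0-based indexing)
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in d]


# The same 20 values used in the Perl entry
sample = [
    0.14082834, 0.09748790, 1.73131507, 0.87636009, -1.95059594,
    0.73438555, -0.03035726, 1.46675970, -0.74621349, -0.72588772,
    0.63905160, 0.61501527, -0.98983780, -1.00447874, -0.62759469,
    0.66206163, 1.04312009, -0.10305385, 0.75775634, 0.32566578,
]

if __name__ == "__main__":
    print(fivenum(sample))
```

For the sample data the five values should agree with the Perl output above, up to floating-point rounding in the last printed digits. Only the returned list of five numbers needs to be stored in place of the full array.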
R
R ships `fivenum` in its base `stats` package, so the task is a single function call. The commented lines show the source of `fivenum` as printed by R. <lang R>
# > fivenum
# function (x, na.rm = TRUE)
# {
#     xna <- is.na(x)
#     if (any(xna)) {
#         if (na.rm)
#             x <- x[!xna]
#         else return(rep.int(NA, 5))
#     }
#     x <- sort(x)
#     n <- length(x)
#     if (n == 0)
#         rep.int(NA, 5)
#     else {
#         n4 <- floor((n + 3)/2)/2
#         d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
#         0.5 * (x[floor(d)] + x[ceiling(d)])
#     }
# }
# <bytecode: 0x7fd0db42a7b8>
# <environment: namespace:stats>
> fivenum(rnorm(4))
[1] -0.4366061 -0.2225105 0.3213424 0.7110099 0.7709201 </lang>