Fivenum

From Rosetta Code

Revision as of 21:17, 21 February 2018

Fivenum is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Many big data or scientific programs use boxplots to show distributions of data. However, saving large arrays just so they can later be drawn as boxplots can be impractical and use extreme amounts of RAM. To save memory, it can be useful to reduce each large array to an array of 5 numbers. R's base `stats` package provides this as the `fivenum` function.

Task Description

Given a large array, reduce it to five numbers (the minimum, lower hinge, median, upper hinge and maximum) that have the same boxplot properties as the original array.
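As a rough illustration of the reduction, the sketch below keeps only a five-field summary in place of the full array. It assumes the index formula used by R's `fivenum` (quoted in the R entry) as the method; the names `FiveNumSummary` and `summarize` are hypothetical and not part of any library.

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

// Hypothetical container: these five doubles stand in for the whole array.
data class FiveNumSummary(
    val min: Double,
    val lowerHinge: Double,
    val median: Double,
    val upperHinge: Double,
    val max: Double
)

// Sketch of R's index formula:
//   n4 <- floor((n + 3)/2)/2
//   d  <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
//   0.5 * (x[floor(d)] + x[ceiling(d)])
// Positions in d are 1-based, so 1 is subtracted before indexing.
fun summarize(data: DoubleArray): FiveNumSummary {
    require(data.isNotEmpty()) { "Array cannot be empty" }
    require(data.none { it.isNaN() }) { "NaN values are not supported" }
    val x = data.sorted()  // sorted copy as a List<Double>
    val n = x.size
    val n4 = floor((n + 3) / 2.0) / 2.0
    val d = doubleArrayOf(1.0, n4, (n + 1) / 2.0, n + 1 - n4, n.toDouble())
    // Whole-number positions pick one element; positions ending in .5 average two neighbours
    val v = d.map { 0.5 * (x[floor(it).toInt() - 1] + x[ceil(it).toInt() - 1]) }
    return FiveNumSummary(v[0], v[1], v[2], v[3], v[4])
}
```

Once `summarize` returns, the original array can be discarded. For example, the six-element array from the Kotlin entry, `doubleArrayOf(36.0, 40.0, 7.0, 39.0, 41.0, 15.0)`, gives `FiveNumSummary(min=7.0, lowerHinge=15.0, median=37.5, upperHinge=40.0, max=41.0)`.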

Kotlin

The following uses Tukey's method for calculating the lower and upper quartiles (or 'hinges'), which is what the R function `fivenum` appears to use.

As arrays containing NaNs and nulls cannot really be dealt with in a sensible fashion, they are not supported: a `DoubleArray` cannot hold nulls, and an array containing NaN is rejected with an `IllegalArgumentException`.
<lang scala>// version 1.2.21

fun median(x: List<Double>): Double {
    require(x.isNotEmpty()) { "List cannot be empty" }
    val m = x.size / 2
    // odd size: the middle element; even size: the mean of the two middle elements
    return if (x.size % 2 == 1) x[m] else (x[m - 1] + x[m]) / 2.0
}

fun fivenum(x: DoubleArray): DoubleArray {
    require(x.isNotEmpty()) { "Array cannot be empty" }
    require(x.none { it.isNaN() }) { "Unable to deal with arrays containing NaN" }
    val sorted = x.sorted()  // sorted copy, so the caller's array is left untouched
    // Tukey's hinges: each half holds (n + 1) / 2 elements, so for odd n both
    // halves include the median element itself, even when its value is repeated
    val half = (sorted.size + 1) / 2
    val lower = sorted.take(half)
    val upper = sorted.takeLast(half)
    return doubleArrayOf(sorted.first(), median(lower), median(sorted), median(upper), sorted.last())
}

fun main(args: Array<String>) {
    val xl = listOf(
        doubleArrayOf(15.0, 6.0, 42.0, 41.0, 7.0, 36.0, 49.0, 40.0, 39.0, 47.0, 43.0),
        doubleArrayOf(36.0, 40.0, 7.0, 39.0, 41.0, 15.0),
        doubleArrayOf(
             0.14082834,  0.09748790,  1.73131507,  0.87636009, -1.95059594,  0.73438555,
            -0.03035726,  1.46675970, -0.74621349, -0.72588772,  0.63905160,  0.61501527,
            -0.98983780, -1.00447874, -0.62759469,  0.66206163,  1.04312009, -0.10305385,
             0.75775634,  0.32566578
        )
    )
    xl.forEach { println("${fivenum(it).asList()}\n") }
}</lang>

Output:
[6.0, 25.5, 40.0, 42.5, 49.0]

[7.0, 15.0, 37.5, 40.0, 41.0]

[-1.95059594, -0.676741205, 0.23324706, 0.746070945, 1.73131507]
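R's `fivenum` defaults to `na.rm = TRUE`, dropping missing values before summarizing (see the R source quoted in the R entry). A minimal Kotlin sketch of that behaviour, assuming that both `null` and NaN should count as "missing", could accept a `List<Double?>` and filter first; `fivenumDropMissing` and `medianOfSorted` are hypothetical names.

```kotlin
// Median of an already sorted, non-empty list.
fun medianOfSorted(x: List<Double>): Double {
    val m = x.size / 2
    return if (x.size % 2 == 1) x[m] else (x[m - 1] + x[m]) / 2.0
}

// Hypothetical variant: nulls and NaNs are dropped first, like R's na.rm = TRUE.
// Returns null when nothing is left (where R would return five NAs).
fun fivenumDropMissing(values: List<Double?>): DoubleArray? {
    val x = values.filterNotNull().filterNot { it.isNaN() }.sorted()
    if (x.isEmpty()) return null
    // Tukey's hinges: each half holds (n + 1) / 2 elements
    val half = (x.size + 1) / 2
    return doubleArrayOf(
        x.first(),
        medianOfSorted(x.take(half)),
        medianOfSorted(x),
        medianOfSorted(x.takeLast(half)),
        x.last()
    )
}
```

Returning `null` for an all-missing input is just one choice for this sketch; throwing, as the Kotlin entry does for NaN, would work equally well.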

Perl

Translation of: R
Works with: Perl 5.10

<lang Perl>

#!/usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use Cwd 'getcwd';
use POSIX qw(ceil floor);

my $TOP_DIRECTORY = getcwd();
local $SIG{__WARN__} = sub { # kill the program if there are any warnings
    my $message = shift;
    my $fail_filename = "$TOP_DIRECTORY/$0.FAIL";
    open my $fh, '>', $fail_filename or die "Can't write $fail_filename: $!";
    printf $fh "%s @ %s\n", $message, getcwd();
    close $fh;
    die "$message\n";
}; # http://perlmaven.com/how-to-capture-and-save-warnings-in-perl

sub fivenum {
    my $array = shift;
    my @x = sort { $a <=> $b } @{ $array };
    my $n = scalar @x;
    die "no values were entered into fivenum.\n" if $n == 0;
    my $n4 = floor(($n + 3) / 2) / 2;
    my @d = (1, $n4, ($n + 1) / 2, $n + 1 - $n4, $n); # d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
    my @sum_array;
    foreach my $e (0..4) {
        # R's positions are 1-based and Perl's are 0-based, hence the "- 1"
        my $lo = floor($d[$e]) - 1;
        my $hi = ceil($d[$e]) - 1;
        die "\$x[$lo] isn't defined.\n" unless defined $x[$lo];
        die "\$x[$hi] isn't defined.\n" unless defined $x[$hi];
        push @sum_array, 0.5 * ($x[$lo] + $x[$hi]); # 0.5 * (x[floor(d)] + x[ceiling(d)])
    }
    return @sum_array;
}

my @x = qw(
     0.14082834  0.09748790  1.73131507  0.87636009 -1.95059594  0.73438555
    -0.03035726  1.46675970 -0.74621349 -0.72588772  0.63905160  0.61501527
    -0.98983780 -1.00447874 -0.62759469  0.66206163  1.04312009 -0.10305385
     0.75775634  0.32566578
);

my @y = fivenum(\@x);

say join(',', @y);
</lang>

Output:
 -1.95059594,-0.676741205,0.23324706,0.746070945,1.73131507 
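Like R, the Perl entry locates the hinges through the position n4 instead of splitting the sorted array in half as the Kotlin entry does. Checking odd and even n separately suggests the two approaches pick the same positions (1-based, into the sorted array x), consistent with the remark that R appears to use Tukey's hinges:

```latex
\begin{aligned}
n = 2k+1:&\quad n_4 = \frac{\lfloor (2k+4)/2 \rfloor}{2} = \frac{k+2}{2},
  \quad \text{lower half } x_1,\dots,x_{k+1} \text{ has its median at } \frac{(k+1)+1}{2} = \frac{k+2}{2};\\
n = 2k:&\quad n_4 = \frac{\lfloor (2k+3)/2 \rfloor}{2} = \frac{k+1}{2},
  \quad \text{lower half } x_1,\dots,x_k \text{ has its median at } \frac{k+1}{2}.
\end{aligned}
```

In both cases the lower hinge sits at position n4, and by symmetry the upper hinge sits at n + 1 - n4. The expression 0.5 * (x[floor(d)] + x[ceiling(d)]) returns x_d itself when d is a whole number and the mean of its two neighbours when d ends in .5, which is how the median of an even-length half is taken.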

R

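The commented lines are the source of R's built-in `fivenum` function (in the `stats` namespace), as printed at the R prompt. Since the function is built in, using it takes a single call:
<lang R>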

# > fivenum
# function (x, na.rm = TRUE)
# {
#     xna <- is.na(x)
#     if (any(xna)) {
#         if (na.rm)
#             x <- x[!xna]
#         else return(rep.int(NA, 5))
#     }
#     x <- sort(x)
#     n <- length(x)
#     if (n == 0)
#         rep.int(NA, 5)
#     else {
#         n4 <- floor((n + 3)/2)/2
#         d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
#         0.5 * (x[floor(d)] + x[ceiling(d)])
#     }
# }
# <bytecode: 0x7fd0db42a7b8>
# <environment: namespace:stats>

> fivenum(rnorm(4))
[1] -0.4366061 -0.2225105 0.3213424 0.7110099 0.7709201
</lang>
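To make the index vector d concrete, the positions below (1-based, into the sorted data) are worked out for the three array sizes used in the Kotlin entry, followed by the full evaluation for the 11-element array; the result reproduces the first line of the Kotlin output.

```latex
\begin{aligned}
n = 11:&\quad n_4 = \lfloor 14/2 \rfloor / 2 = 3.5, \quad d = (1,\ 3.5,\ 6,\ 8.5,\ 11)\\
n = 6:&\quad n_4 = \lfloor 9/2 \rfloor / 2 = 2, \quad d = (1,\ 2,\ 3.5,\ 5,\ 6)\\
n = 20:&\quad n_4 = \lfloor 23/2 \rfloor / 2 = 5.5, \quad d = (1,\ 5.5,\ 10.5,\ 15.5,\ 20)\\[6pt]
x = (6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49):&\quad
  \bigl(x_1,\ \tfrac{x_3 + x_4}{2},\ x_6,\ \tfrac{x_8 + x_9}{2},\ x_{11}\bigr)
  = (6,\ 25.5,\ 40,\ 42.5,\ 49)
\end{aligned}
```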

zkl

<lang zkl></lang>

Output: