Stem-and-leaf plot

From Rosetta Code
Revision as of 22:19, 16 December 2009 by rosettacode>Dcsobral (Added Scala)
This task is still under development.
In particular, the data set will be changed. —Kevin Reid 00:27, 14 December 2009 (UTC)
Task

Create a well-formatted stem-and-leaf plot from the following data set, where the leaves are the last digits:

110 436 124 109 440 330 53 352 315 452 54 49 334 102 432 123 442 125 97 104 11 446 123 360 324 427 451 329 139 42 324 320 450 100 87 414 305 21 375 324 360 123 33 378 37 66 41 321 68 356 407 448 5 128 81 361 419 134 147 146
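
As a minimal sketch of the split described above (not one of the submitted solutions), each value can be divided by 10: the quotient is the stem and the remainder is the leaf. The names `stem_leaf_rows` and `format_rows` are hypothetical, and the sketch assumes non-negative integers.

```python
from collections import defaultdict

def stem_leaf_rows(values):
    """Return (stem, sorted leaves) pairs for every stem from min to max.

    Assumes non-negative integers; the leaf is the last decimal digit.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lo, hi = min(groups), max(groups)
    # Stems with no data are kept so gaps in the distribution stay visible.
    return [(s, groups.get(s, [])) for s in range(lo, hi + 1)]

def format_rows(rows):
    """Format the rows as monospaced text with right-aligned stems."""
    width = len(str(rows[-1][0]))
    return "\n".join(
        ("%*d | %s" % (width, s, " ".join(map(str, leaves)))).rstrip()
        for s, leaves in rows
    )

print(format_rows(stem_leaf_rows([5, 11, 21, 33, 37, 49, 41, 42])))
```

Sorting the values first means the leaves within each stem come out in ascending order without a second pass.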

The primary intent of this task is the presentation of information. It is acceptable to hardcode the data set or characteristics of it (such as what the stems are) in the example, insofar as it is impractical to make the example generic to any data set. For example, in a computation-less language like HTML the data set may be entirely prearranged within the example; the interesting characteristics are how the proper visual formatting is arranged.

If possible, the output should not be a bitmap image. Monospaced plain text is acceptable, but do better if you can. It may be a window, i.e. not a file.
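
One way to go beyond monospaced text without producing a bitmap is to emit an HTML table. The following is only a sketch under that assumption; the function name `stem_leaf_html` and the inline styling are illustrative choices, not part of the task, and non-negative integers are assumed.

```python
from collections import defaultdict

def stem_leaf_html(values):
    """Render non-negative integers as an HTML stem-and-leaf table (a sketch)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        # The stem cell is right-aligned with a rule on its right edge,
        # mimicking the "|" separator of the plain-text form.
        rows.append(
            '<tr><th style="text-align:right;border-right:1px solid">%d</th>'
            "<td>%s</td></tr>" % (stem, leaves)
        )
    return "<table>\n%s\n</table>" % "\n".join(rows)

print(stem_leaf_html([5, 11, 33, 37]))
```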

Note: If you wish to try multiple data sets, you might try this generator.

Haskell

<lang haskell>import Data.List
import Control.Arrow
import Control.Monad

nls :: [Int]
nls = [110, 436, 124, 109, 440, 330, 53, 352, 315, 452,
       54, 49, 334, 102, 432, 123, 442, 125, 97, 104, 11, 446,
       123, 360, 324, 427, 451, 329, 139, 42, 324, 320, 450, 100,
       87, 414, 305, 21, 375, 324, 360, 123, 33, 378, 37, 66,
       41, 321, 68, 356, 407, 448, 5, 128, 81, 361, 419, 134, 147, 146]

groupWith f = takeWhile(not.null). unfoldr(Just. (partition =<< (. f). (==). f. head))
justifyR = foldl ((. return) . (++) . tail) . flip replicate ' '

task ds = mapM_ (putStrLn. showStemLeaves justifyR fb. (head *** sort.concat). unzip)
    $ groupWith fst $ stems ++ map (second return) stemLeaf
  where stemLeaf = map (`quotRem` 10) ds
        stems = map (flip(,)[]) $ uncurry enumFromTo $ minimum &&& maximum $ fst $ unzip stemLeaf
        showStemLeaves f w (a,b) = f w (show a) ++ " |" ++ concatMap (f w. show) b
        fb = length $ show $ maximum $ map abs ds</lang>
Output:

*Main> task  nls
  0 |  5                                                                
  1 |  1                                                                
  2 |  1                                                                
  3 |  3  7                                                             
  4 |  1  2  9                                                          
  5 |  3  4                                                             
  6 |  6  8                                                             
  7 |                                                                   
  8 |  1  7                                                             
  ....
 37 |  5  8
 38 |
 39 |
 40 |  7
 41 |  4  9
 42 |  7
 43 |  2  6
 44 |  0  2  6  8
 45 |  0  1  2

J

Solution: (Tacit)
<lang j>stem =: <.@(%&10)
leaf =: 10&|
stemleaf =: (stem@{. ; leaf)/.~ stem
expandStems =: <./ ([ + i.@>:@-~) >./
expandLeaves=: (expandStems e. ])@[ #inv ]

showStemLeaf=: (":@,.@expandStems@[ ; ":&>@expandLeaves)&>/@(>@{. ; <@{:)@|:@stemleaf@/:~</lang>

Solution: (Explicit)
<lang j>stemleafX=: monad define
  leaves=. 10 | y
  stems=. y <.@:% 10
  leaves=. stems </. leaves                           NB. group leaves by stem
  (<"0 ~.stems),.leaves
)

showStemLeafX=: monad define
  'stems leaves'=. (>@{. ; <@{:)@|: stemleafX /:~ y
  xstems=. (<./ ([ + i.@>:@-~ ) >./) stems            NB. stems including those with no leaves
  xleaves=. (xstems e. stems) #inv leaves             NB. expand leaves to match xstems
  (": ,.xstems) ; ":&> xleaves
)</lang>

Example:
<lang j>   nls =: ; <@(_&".);._2 noun define
110 436 124 109 440 330 53 352 315 452 54 49 334 102 432 123 442 125 97 104 11 446 123 360 324 427 451 329 139 42 324 320 450 100 87 414 305 21 375 324 360 123 33 378 37 66 41 321 68 356 407 448 5 128 81 361 419 134 147 146
)

   stemleaf nls        NB. display has been abbreviated
┌──┬───────────┐
│11│0          │
├──┼───────────┤
│43│6 2        │
├──┼───────────┤
│12│4 3 5 3 3 8│
├──┼───────────┤
...

   showStemLeaf nls    NB. display has been abbreviated
┌──┬───────────┐
│ 0│5          │
│ 1│1          │
│ 2│1          │
│ 3│3 7        │
│ 4│1 2 9      │
...
│42│7          │
│43│2 6        │
│44│0 2 6 8    │
│45│0 1 2      │
└──┴───────────┘

   (showStemLeaf -: showStemLeafX) nls   NB. both solutions give same result
1</lang>

Perl generating LaTeX

This example is in need of improvement:

Once the task spec has settled down, post a rendered PDF.

<lang perl>#!/usr/bin/perl -w

my @data = sort {$a <=> $b} qw(110 436 124 109 440 330 53 352 315 452 54 49 334 102 432 123 442 125 97 104 11 446 123 360 324 427 451 329 139 42 324 320 450 100 87 414 305 21 375 324 360 123 33 378 37 66 41 321 68 356 407 448 5 128 81 361 419 134 147 146);

# FIXME: This should count the maximum number of leaves in any one stem;
# instead it takes the total number of data items, which is usually
# a massive overestimate.
my $columns = @data;

print <<"EOT";
\\documentclass{report}
\\usepackage{fullpage}
\\begin{document}
  \\begin{tabular}{ r | *{$columns}{c} }
EOT

my $laststem = undef;

for my $value (@data) {
  my $stem = int($value / 10);
  my $leaf = $value % 10;
  while (not defined $laststem or $stem > $laststem) {
    if (not defined $laststem) {
      $laststem = $stem - 1;
    } else {
      print " \\\\\n";
    }
    $laststem++;
    print "    $laststem";
  }
  printf " & %3d", $leaf;
}

print <<'EOT';

  \end{tabular}
\end{document}
EOT</lang>

LaTeX output of the Perl program:

<lang latex>\documentclass{report}
\usepackage{fullpage}
\begin{document}

 \begin{tabular}{ r | *{60}{c} }
   0 & 5 \\
   1 & 1 \\
   2 & 1 \\
   3 & 3 & 7 \\
   ...
   44 & 0 & 2 & 6 & 8 \\
   45 & 0 & 1 & 2
 \end{tabular}

\end{document}</lang>

The parameter to the tabular environment defines the columns of the table. “r” and “c” are right- and center-aligned columns, “|” is a vertical rule, and “*{count}{cols}” repeats a column definition count times.
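
The FIXME in the Perl program notes that the repeat count should be the largest number of leaves on any one stem rather than the total number of data items. A rough sketch of how that count and the resulting column specification might be computed; `tabular_spec` is a hypothetical helper and the sketch assumes non-negative integers.

```python
from collections import Counter

def tabular_spec(values):
    """Build a tabular column spec sized to the fullest stem (a sketch)."""
    leaves_per_stem = Counter(v // 10 for v in values)
    columns = max(leaves_per_stem.values())
    # r: right-aligned stem, |: vertical rule, *{n}{c}: n centred leaf columns
    return "{ r | *{%d}{c} }" % columns

print(tabular_spec([5, 11, 33, 37, 41, 42, 49]))
```

For the task's data set the fullest stems (12 and 32) hold six leaves each, so this approach would yield six leaf columns instead of 60.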

Python

Adjusting Stem.leafdigits allows you to modify how many digits of a value are used in the leaf, with the stem intervals adjusted accordingly.
<lang python>from collections import namedtuple
from pprint import pprint as pp
from math import floor

Stem = namedtuple('Stem', 'data, leafdigits')

data0 = Stem((110, 436, 124, 109, 440, 330,  53, 352, 315, 452,
               54,  49, 334, 102, 432, 123, 442, 125,  97, 104,
               11, 446, 123, 360, 324, 427, 451, 329, 139,  42,
              324, 320, 450, 100,  87, 414, 305,  21, 375, 324,
              360, 123,  33, 378,  37,  66,  41, 321,  68, 356,
              407, 448,   5, 128,  81, 361, 419, 134, 147, 146),
             2.0)

def stemplot(stem):
    d = []
    interval = int(10**int(stem.leafdigits))
    for data in sorted(stem.data):
        data = int(floor(data))
        stm, lf = divmod(data,interval)
        d.append( (int(stm), int(lf)) )
    stems, leafs = list(zip(*d))
    stemwidth = max(len(str(x)) for x in stems)
    leafwidth = max(len(str(x)) for x in leafs)
    laststem, out = min(stems) - 1, []
    for s,l in d:
        while laststem < s:
            laststem += 1
            out.append('\n%*i |' % ( stemwidth, laststem))
        out.append(' %0*i' % (leafwidth, l))
    out.append('\n\nKey:\n Stem multiplier: %i\n X | Y  =>  %i*X+Y\n'
               % (interval, interval))
    return ''.join(out)

if __name__ == '__main__':
    print( stemplot(data0) )
    print( stemplot(Stem(data0.data, 1.0)) )</lang>

Sample Output

>>> 

0 | 05 11 21 33 37 41 42 49 53 54 66 68 81 87 97
1 | 00 02 04 09 10 23 23 23 24 25 28 34 39 46 47
2 |
3 | 05 15 20 21 24 24 24 29 30 34 52 56 60 60 61 75 78
4 | 07 14 19 27 32 36 40 42 46 48 50 51 52

Key:
 Stem multiplier: 100
 X | Y  =>  100*X+Y


 0 | 5
 1 | 1
 2 | 1
 3 | 3 7
 4 | 1 2 9
 5 | 3 4
 6 | 6 8
 7 |
 8 | 1 7
 9 | 7
10 | 0 2 4 9
11 | 0
12 | 3 3 3 4 5 8
13 | 4 9
14 | 6 7
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 | 5
31 | 5
32 | 0 1 4 4 4 9
33 | 0 4
34 |
35 | 2 6
36 | 0 0 1
37 | 5 8
38 |
39 |
40 | 7
41 | 4 9
42 | 7
43 | 2 6
44 | 0 2 6 8
45 | 0 1 2

Key:
 Stem multiplier: 10
 X | Y  =>  10*X+Y

>>> 

Ruby

I added a few negative values to the data to demonstrate the "negative zero" stem. That is to say, the reason the bulky Stem class exists is to differentiate "0" and "-0".
<lang ruby>class StemLeafPlot

 def initialize(data, options = {})
   opts = {:leaf_digits => 1}.merge(options)
   @leaf_digits = opts[:leaf_digits]
   @multiplier = 10 ** @leaf_digits
   @plot = generate_structure(data)
 end
 private
 def generate_structure(data)
   plot = Hash.new {|h,k| h[k] = []}
   data.sort.each do |value| 
     stem, leaf = parse(value)
     plot[stem] << leaf
   end
   plot
 end
 def parse(value)
   stem, leaf = value.abs.divmod(@multiplier)
   [Stem.get(stem, value), leaf.round]
 end
 public
 def print
   stem_width = Math.log10(@plot.keys.max_by {|s| s.value}.value).ceil + 1
   Stem.get_range(@plot.keys).each do |stem|
     leaves = @plot[stem].inject("") {|str,leaf| str << "%*d " % [@leaf_digits, leaf]}
     puts "%*s | %s" % [stem_width, stem, leaves]
   end
   puts "key: 5|4=#{5 * @multiplier + 4}"
   puts "leaf unit: 1"
   puts "stem unit: #@multiplier"
 end

end

class Stem

 @@cache = {}
 def self.get(stem_value, datum)
   sign = datum < 0 ? :- : :+
   cache(stem_value, sign)
 end
 
 private
 
 def self.cache(value, sign)
   if @@cache[[value, sign]].nil?
     @@cache[[value, sign]] = self.new(value, sign)
   end
   @@cache[[value, sign]]
 end
 def initialize(value, sign)
   @value = value
   @sign = sign
 end
 
 public 
 
 attr_accessor :value, :sign
 
 def negative?
   @sign == :-
 end
 def <=>(other)
   if self.negative?
     if other.negative?
       other.value <=> self.value
     else
       -1
     end
   else
     if other.negative?
       1
     else
       self.value <=> other.value
     end
   end
 end
 def to_s
   "%s%d" % [(self.negative? ? '-' : ' '), @value]
 end
 
 def self.get_range(array_of_stems)
   min, max = array_of_stems.minmax
   if min.negative?
     if max.negative?
       min.value.downto(max.value).collect {|n| cache(n, :-)}
     else
       min.value.downto(0).collect {|n| cache(n, :-)} + 0.upto(max.value).collect {|n| cache(n, :+)}
     end
   else
     min.value.upto(max.value).collect {|n| cache(n, :+)}
   end
 end

end

data = DATA.read.split.map {|s| Float(s)}
StemLeafPlot.new(data, :leaf_digits => 2).print
puts
StemLeafPlot.new(data).print

__END__

   0 -3 -45 -167
   110 436 124 109 440 330 53 352 315 452 54 49 334 102 432 
   123 442 125 97 104 11 446 123 360 324 427 451 329 139 42 
   324 320 450 100 87 414 305 21 375 324 360 123 33 378 37 
   66 41 321 68 356 407 448 5 128 81 361 419 134 147 146</lang>

outputs

-1 | 67
-0 | 45  3
 0 |  0  5 11 21 33 37 41 42 49 53 54 66 68 81 87 97
 1 |  0  2  4  9 10 23 23 23 24 25 28 34 39 46 47
 2 |
 3 |  5 15 20 21 24 24 24 29 30 34 52 56 60 60 61 75 78
 4 |  7 14 19 27 32 36 40 42 46 48 50 51 52
key: 5|4=504
leaf unit: 1
stem unit: 100

-16 | 7
-15 |
-14 |
-13 |
-12 |
-11 |
-10 |
 -9 |
 -8 |
 -7 |
 -6 |
 -5 |
 -4 | 5
 -3 |
 -2 |
 -1 |
 -0 | 3
  0 | 0 5
  1 | 1
  2 | 1
  3 | 3 7
  4 | 1 2 9
  5 | 3 4
  6 | 6 8
  7 |
  8 | 1 7
  9 | 7
 10 | 0 2 4 9
 11 | 0
 12 | 3 3 3 4 5 8
 13 | 4 9
 14 | 6 7
 15 |
 16 |
 17 |
 18 |
 19 |
 20 |
 21 |
 22 |
 23 |
 24 |
 25 |
 26 |
 27 |
 28 |
 29 |
 30 | 5
 31 | 5
 32 | 0 1 4 4 4 9
 33 | 0 4
 34 |
 35 | 2 6
 36 | 0 0 1
 37 | 5 8
 38 |
 39 |
 40 | 7
 41 | 4 9
 42 | 7
 43 | 2 6
 44 | 0 2 6 8
 45 | 0 1 2
key: 5|4=54
leaf unit: 1
stem unit: 10
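
The "negative zero" stem arises because values such as -3 and 5 both have a stem magnitude of 0 but belong on different rows. A minimal Python sketch of the same idea, using a (sign, stem) pair as the stem key instead of a dedicated class; the names `stem_key` and `stem_label` are hypothetical, and this is not a translation of the Ruby code above.

```python
def stem_key(value, multiplier=10):
    """Return a sortable (sign, signed stem) key plus the leaf.

    sign is 0 for negative values and 1 otherwise, so "-0" sorts before "0".
    """
    stem, leaf = divmod(abs(value), multiplier)
    if value < 0:
        return (0, -stem), leaf
    return (1, stem), leaf

def stem_label(key):
    """Format a key for display, keeping the minus sign on a zero stem."""
    sign, stem = key
    return "-%d" % -stem if sign == 0 else "%d" % stem

for v in (-45, -3, 0, 5):
    key, leaf = stem_key(v)
    print(stem_label(key), "|", leaf)
```

Because the keys are plain tuples, sorting them with the built-in `sorted` already places every negative stem, including "-0", ahead of the non-negative ones.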

Scala

Works with: Scala version 2.8

<lang scala>def stemAndLeaf(numbers: List[Int]) = {

 val lineFormat = "%" + (numbers map (_.toString.length) max) + "d | %s"
 val map = numbers groupBy (_ / 10)
 for (stem <- numbers.min / 10 to numbers.max / 10) {
   println(lineFormat format (stem, map.getOrElse(stem, Nil) map (_ % 10) sortBy identity mkString " "))
 }

}</lang>

Example:

scala> val list = """110   436   124   109   440   330   53    352   315   452   54    49    334   102   432   123   442
   125   97    104   11    446   123   360   324   427   451   329   139   42    324   320   450   100   87    414   305
   21    375   324   360   123   33    378   37    66    41    321   68    356   407   448   5     128   81    361   419
   134   147   146""" split "\\s+" map (_.toInt) toList
list: List[Int] = List(110, 436, 124, 109, 440, 330, 53, 352, 315, 452, 54, 49, 334, 102, 432, 123, 442, 125, 97, 104, 1
1, 446, 123, 360, 324, 427, 451, 329, 139, 42, 324, 320, 450, 100, 87, 414, 305, 21, 375, 324, 360, 123, 33, 378, 37, 66
, 41, 321, 68, 356, 407, 448, 5, 128, 81, 361, 419, 134, 147, 146)

scala> stemAndLeaf(list)
  0 | 5
  1 | 1
  2 | 1
  3 | 3 7
  4 | 1 2 9
  5 | 3 4
  6 | 6 8
  7 |
  8 | 1 7
  9 | 7
 10 | 0 2 4 9
 11 | 0
 12 | 3 3 3 4 5 8
 13 | 4 9
 14 | 6 7
 15 |
 16 |
 17 |
 18 |
 19 |
 20 |
 21 |
 22 |
 23 |
 24 |
 25 |
 26 |
 27 |
 28 |
 29 |
 30 | 5
 31 | 5
 32 | 0 1 4 4 4 9
 33 | 0 4
 34 |
 35 | 2 6
 36 | 0 0 1
 37 | 5 8
 38 |
 39 |
 40 | 7
 41 | 4 9
 42 | 7
 43 | 2 6
 44 | 0 2 6 8
 45 | 0 1 2

Tcl

Works with: Tcl version 8.5

<lang tcl>package require Tcl 8.5

# How to process a single value, adding it to the table mapping stems to
# leaves.
proc addSLValue {tblName value {splitFactor 10}} {
    upvar 1 $tblName tbl
    # Extract the stem and leaf
    if {$value < 0} {
        set value [expr {round(-$value)}]
        set stem -[expr {$value / $splitFactor}]
    } else {
        set value [expr {round($value)}]
        set stem [expr {$value / $splitFactor}]
    }
    if {![info exist tbl]} {
        dict set tbl min $stem
    }
    dict set tbl max $stem
    set leaf [expr {$value % $splitFactor}]
    dict lappend tbl $stem $leaf
}

# How to do the actual output of the stem-and-leaf table, given that we have
# already done the splitting into stems and leaves.
proc printSLTable {tblName} {
    upvar 1 $tblName tbl
    # Get the range of stems
    set min [dict get $tbl min]
    set max [dict get $tbl max]
    # Work out how much width the stems take so everything lines up
    set l [expr {max([string length $min], [string length $max])}]
    # Print out the table
    for {set i $min} {$i <= $max} {incr i} {
        if {![dict exist $tbl $i]} {
            puts [format " %*d |" $l $i]
        } else {
            puts [format " %*d | %s" $l $i [dict get $tbl $i]]
        }
    }
}

# Assemble the parts into a full stem-and-leaf table printer.
proc printStemLeaf {dataList {splitFactor 10}} {
    foreach value [lsort -real $dataList] {
        addSLValue tbl $value $splitFactor
    }
    printSLTable tbl
}

# Demo code
set data {
    110 436 124 109 440 330 53 352 315 452 54 49 334 102 432 123 442 125 97
    104 11 446 123 360 324 427 451 329 139 42 324 320 450 100 87 414 305 21
    375 324 360 123 33 378 37 66 41 321 68 356 407 448 5 128 81 361 419 134
    147 146
}
printStemLeaf $data</lang>
Output:

  0 | 5
  1 | 1
  2 | 1
  3 | 3 7
  4 | 1 2 9
  5 | 3 4
  6 | 6 8
  7 |
  8 | 1 7
  9 | 7
 10 | 0 2 4 9
 11 | 0
 12 | 3 3 3 4 5 8
 13 | 4 9
 14 | 6 7
 15 |
 16 |
 17 |
 18 |
 19 |
 20 |
 21 |
 22 |
 23 |
 24 |
 25 |
 26 |
 27 |
 28 |
 29 |
 30 | 5
 31 | 5
 32 | 0 1 4 4 4 9
 33 | 0 4
 34 |
 35 | 2 6
 36 | 0 0 1
 37 | 5 8
 38 |
 39 |
 40 | 7
 41 | 4 9
 42 | 7
 43 | 2 6
 44 | 0 2 6 8
 45 | 0 1 2