Entropy

From Rosetta Code
Task
Entropy
You are encouraged to solve this task according to the task description, using any language you may know.

Calculate the information entropy (Shannon entropy) of a given input string.

Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable is defined as:

where the information content . If the base of the logarithm , the result is expressed in bits, a unit of information. Therefore, given a string of length where is the relative frequency of each character, the entropy of a string in bits is:

For this task, use "1223334444" as an example. The result should be around 1.84644 bits.

Ada

Uses Ada 2012. <lang Ada>with Ada.Text_IO, Ada.Float_Text_IO, Ada.Numerics.Elementary_Functions;

procedure Count_Entropy is

  package TIO renames Ada.Text_IO;
  Count: array(Character) of Natural := (others => 0);
  Sum:   Natural := 0;
  Line: String := "1223334444";

begin

  for I in Line'Range loop   -- count the characters
     Count(Line(I)) := Count(Line(I))+1;
     Sum := Sum + 1;
  end loop;
  declare   -- compute the entropy and print it
     function P(C: Character) return Float is (Float(Count(C)) / Float(Sum));
     use Ada.Numerics.Elementary_Functions, Ada.Float_Text_IO;
     Result: Float := 0.0;
  begin
     for Ch in Character loop
        Result := Result -
         (if P(Ch)=0.0 then 0.0 else P(Ch) * Log(P(Ch), Base => 2.0));
     end loop;
     Put(Result, Fore => 1, Aft => 5, Exp => 0);
  end;

end Count_Entropy;</lang>

Burlesque

<lang burlesque>blsq ) "1223334444"F:u[vv^^{1\/?/2\/LG}m[?*++ 1.8464393446710157</lang>

Euler Math Toolbox

<lang EulerMathToolbox>>function entropy (s) ... $ v=strtochar(s); $ m=getmultiplicities(unique(v),v); $ m=m/sum(m); $ return sum(-m*logbase(m,2)) $endfunction >entropy("1223334444")

1.84643934467</lang>

D

<lang d>import std.stdio, std.algorithm, std.math;

double entropy(T)(T[] s) /*pure nothrow*/ if (__traits(compiles, sort(s))) {

   return s
          .sort()
          .group
          .map!(g => g[1] / cast(double)s.length)
          .map!(p => -p * log2(p))
          .reduce!q{a + b};

}

void main() {

   "1223334444"d.dup.entropy.writeln;

}</lang>

Output:
1.84644

friendly interactive shell

Sort of hacky, but friendly interactive shell isn't really optimized for mathematic tasks (in fact, it doesn't even have associative arrays).

<lang fishshell>function entropy

   for arg in $argv
       set name count_$arg
       if not count $$name > /dev/null
           set $name 0
           set values $values $arg
       end
       set $name (math $$name + 1)
   end
   set entropy 0
   for value in $values
       set name count_$value
       set entropy (echo "
           scale = 50
           p = "$$name" / "(count $argv)"
           $entropy - p * l(p)
       " | bc -l)
   end
   echo "$entropy / l(2)" | bc -l

end entropy (echo 1223334444 | fold -w1)</lang>

Output:
1.84643934467101549345

Haskell

<lang haskell>import Data.List

main = print $ entropy "1223334444"

entropy s =

sum . map lg' . fq' . map (fromIntegral.length) . group . sort $ s
 where lg' c = (c * ) . logBase 2 $ 1.0 / c
       fq' c = map (\x -> x / (sum c)) c </lang>

J

Solution:<lang j> entropy=: +/@:-@(* 2&^.)@(#/.~ % #)</lang>

Example:

<lang j> entropy '1223334444' 1.84644</lang>

Lang5

<lang lang5>: -rot rot rot ; [] '__A set : dip swap __A swap 1 compress append '__A set execute __A -1 extract nip ; : nip swap drop ; : sum '+ reduce ;

2dip swap 'dip dip ; : 2drop drop drop ; : comb "" split ; : 1+ 1 + ;
3dip swap '2dip dip ; : chars comb length nip ; : take* 0 remove ;
<group>
   0 swap grade subscript dup 's dress distinct strip
   do  length
       if  2dup [0] subscript over in select length
           '-rot dip "swap 1+" 3dip
           iota 'remove rot take* 'execute dip
       else break
       then
   loop 2drop compress ;
elements(*) length nip ;
entropy
   dup comb <group>
   'elements apply swap chars /
   dup neg swap log * sum 2 log / ;

"1223334444" entropy . # 1.84643934467102</lang>

Mathematica

<lang Mathematica>shE[s_String] := -Plus @@ ((# Log[2., #]) & /@ ((Length /@ Gather[#])/

        Length[#]) &[Characters[s]])</lang>
Example:

<lang Mathematica> shE["1223334444"]

1.84644 shE["Rosetta Code"] 3.08496</lang>

Perl

<lang Perl>sub entropy {

   my %count; $count{$_}++ for @_;
   my @p = map $_/@_, values %count;
   my $entropy = 0;
   $entropy += - $_ * log $_ for @p;
   $entropy / log 2

}

print entropy split //, "1223334444";</lang>

Perl 6

<lang Perl 6>sub entropy(@a) {

   [+] map -> \p { p * -log p }, @a.bag.values »/» +@a;

}

say log(2) R/ entropy '1223334444'.comb;</lang>

Output:
1.84643934467102

Python

<lang python> import math

def hist(source):

   hist = {}; l = 0;
   for e in source:
       l += 1
       if not hist.has_key(e):
           hist[e] = 1.0
       else:
           hist[e] += 1.0
   return (l,hist)

def entropy(hist,l):

   elist = []
   for k in hist.keys():
       c = hist[k] / l
       elist.append(c * math.log(1.0 / c ,2))
   return sum(elist)

def printHist(h):

   flip = lambda (k,v) : (v,k)
   h = sorted(h.iteritems(), key = flip)
   print 'Sym\thi\tfi\tInf'
   for (k,v) in h:
       print '%s\t%f\t%f\t%f'%(k,v,v/l,math.log(1.0 / (v/l), 2))
   
   

source = "1223334444" (l,h) = hist(source); print '.[Results].' print 'Length',l print 'Entropy:', entropy(h, l) printHist(h) </lang>

Output:

.[Results].
Length 10
Entropy: 1.84643934467
Sym	hi	fi	Inf
1	1.000000	0.100000	3.321928
2	2.000000	0.200000	2.321928
3	3.000000	0.300000	1.736966
4	4.000000	0.400000	1.321928

R

<lang r>entropy = function(s)

  {freq = prop.table(table(
       substring(s, 1 : nchar(s), 1 : nchar(s))))
   -sum(freq * log(freq, base = 2))}

print(entropy("1223334444")) # 1.846439</lang>

Racket

<lang racket>

  1. lang racket

(require math)

(define (log2 x)

 (/ (log x) (log 2)))

(define (digits x)

 (for/list ([c (number->string x)])
   (- (char->integer c) (char->integer #\0))))

(define (entropy x)

 (define ds (digits x))
 (define n (length ds))
 (- (for/sum ([(d c) (in-hash (samples->hash ds))])
      (* (/ d n) (log2 (/ d n))))))

(entropy 1223334444) </lang> Output: 1.8464393446710154

Ruby

Works with: Ruby version 1.9

<lang ruby>def entropy(s)

 counts = Hash.new(0)
 s.each_char { |c| counts[c] += 1 }
 counts.values.reduce(0) do |entropy, count|
   freq = count / s.length.to_f
   entropy - freq * Math.log2(freq)
 end

end</lang> One-liner, same performance (or better): <lang ruby>def entropy2(s)

 s.each_char.group_by(&:to_s).values.map { |x| x.length / s.length.to_f }.reduce(0) { |e, x| e - x*Math.log2(x) }

end</lang>

Scala

<lang scala>import scala.math._

def entropy( v:String ) = { v

 .groupBy (a => a)
 .values
 .map( i => i.length.toDouble / v.length )
 .map( p => -p * log10(p) / log10(2))
 .sum

}

// Confirm that "1223334444" has an entropy of about 1.84644 assert( math.round( entropy("1223334444") * 100000 ) * 0.00001 == 1.84644 )</lang>

Tcl

<lang tcl>proc entropy {str} {

   set log2 [expr log(2)]
   foreach char [split $str ""] {dict incr counts $char}
   set entropy 0.0
   foreach count [dict values $counts] {

set freq [expr {$count / double([string length $str])}] set entropy [expr {$entropy - $freq * log($freq)/$log2}]

   }
   return $entropy

}</lang> Demonstration: <lang tcl>puts [format "entropy = %.5f" [entropy "1223334444"]] puts [format "entropy = %.5f" [entropy "Rosetta Code"]]</lang>

Output:
entropy = 1.84644
entropy = 3.08496

XPL0

<lang XPL0>code real RlOut=48, Ln=54; \intrinsic routines string 0; \use zero-terminated strings

func StrLen(A); \Return number of characters in an ASCIIZ string char A; int I; for I:= 0, -1>>1-1 do

   if A(I) = 0 then return I;

func real Entropy(Str); \Return Shannon entropy of string char Str; int Len, I, Count(128); real Sum, Prob; [Len:= StrLen(Str); for I:= 0 to 127 do Count(I):= 0; for I:= 0 to Len-1 do \count number of each character in string

   Count(Str(I)):= Count(Str(I)) + 1;

Sum:= 0.0; for I:= 0 to 127 do

   if Count(I) # 0 then        \(avoid Ln(0.0) error)
       [Prob:= float(Count(I)) / float(Len);   \probability of char in string
       Sum:= Sum + Prob*Ln(Prob);
       ];

return -Sum/Ln(2.0); ];

RlOut(0, Entropy("1223334444"))</lang>

Output:
    1.84644