Entropy

Task
You are encouraged to solve this task according to the task description, using any language you may know.

Calculate the information entropy (Shannon entropy) of a given input string. Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable <math>X</math> is defined as:

<math>H(X) = -\sum_i p(x_i) \log_b p(x_i)</math>

where the information content is <math>I(x) = -\log_b p(x)</math>. If the base of the logarithm is <math>b = 2</math>, the result is expressed in bits, a unit of information. Therefore, given a string of length <math>n</math> in which <math>p_i</math> is the relative frequency of each character, the entropy of the string in bits is:

<math>H = -\sum_i p_i \log_2 p_i</math>

For this task, use "1223334444" as an example. The result should be around 1.84644 bits.
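For example, in "1223334444" the relative frequencies are <math>p_1 = 0.1</math>, <math>p_2 = 0.2</math>, <math>p_3 = 0.3</math> and <math>p_4 = 0.4</math>, so

<math>H = -(0.1 \log_2 0.1 + 0.2 \log_2 0.2 + 0.3 \log_2 0.3 + 0.4 \log_2 0.4) \approx 1.84644.</math>

The following minimal Python sketch is an illustrative cross-check of the formula, not one of the task solutions below (the name <code>entropy</code> is arbitrary):

<lang python>from collections import Counter
from math import log2

def entropy(s):
    # Shannon entropy of the characters of s, in bits
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

print(entropy("1223334444"))  # prints approximately 1.84644</lang>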

Ada

Uses Ada 2012.

<lang Ada>with Ada.Text_IO, Ada.Float_Text_IO, Ada.Numerics.Elementary_Functions;

procedure Count_Entropy is
   package TIO renames Ada.Text_IO;
   Count: array(Character) of Natural := (others => 0);
   Sum:   Natural := 0;
   Line:  String := "1223334444";
begin
   for I in Line'Range loop   -- count the characters
      Count(Line(I)) := Count(Line(I))+1;
      Sum := Sum + 1;
   end loop;
   declare   -- compute the entropy and print it
      function P(C: Character) return Float is (Float(Count(C)) / Float(Sum));
      use Ada.Numerics.Elementary_Functions, Ada.Float_Text_IO;
      Result: Float := 0.0;
   begin
      for Ch in Character loop
         Result := Result -
           (if P(Ch)=0.0 then 0.0 else P(Ch) * Log(P(Ch), Base => 2.0));
      end loop;
      Put(Result, Fore => 1, Aft => 5, Exp => 0);
   end;
end Count_Entropy;</lang>

Burlesque

<lang burlesque>blsq ) "1223334444"F:u[vv^^{1\/?/2\/LG}m[?*++
1.8464393446710157</lang>

Euler Math Toolbox

<lang EulerMathToolbox>>function entropy (s) ...
$ v=strtochar(s);
$ m=getmultiplicities(unique(v),v);
$ m=m/sum(m);
$ return sum(-m*logbase(m,2))
$endfunction
>entropy("1223334444")
 1.84643934467</lang>

D

<lang d>import std.stdio, std.algorithm, std.math;

double entropy(T)(T[] s) /*pure nothrow*/ if (__traits(compiles, sort(s))) {
    return s
           .sort()
           .group
           .map!(g => g[1] / cast(double)s.length)
           .map!(p => -p * log2(p))
           .reduce!q{a + b};
}

void main() {
    "1223334444"d.dup.entropy.writeln;
}</lang>

Output:
1.84644

Haskell

<lang haskell>import Data.List

main = print $ entropy "1223334444"

entropy s =
  sum . map lg' . fq' . map (fromIntegral . length) . group . sort $ s
  where lg' c = (c *) . logBase 2 $ 1.0 / c
        fq' c = map (\x -> x / sum c) c</lang>

J

Solution:<lang j> entropy=: +/@:-@(* 2&^.)@(#/.~ % #)</lang>
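Reading the verb from right to left: <code>#/.~ % #</code> computes the relative frequency of each distinct character (the count of each character divided by the tally), <code>* 2&^.</code> multiplies each frequency by its base-2 logarithm, and <code>+/@:-@</code> negates the products and sums them.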

Example:

<lang j>   entropy '1223334444'
1.84644</lang>

Perl

<lang Perl>sub entropy {
    my %count; $count{$_}++ for @_;
    my @p = map $_/@_, values %count;
    my $entropy = 0;
    $entropy += - $_ * log $_ for @p;
    $entropy / log 2
}

print entropy split //, "1223334444";</lang>

Perl 6

<lang Perl 6>sub entropy(@a) {
    [+] map -> \p { p * -log p }, @a.bag.values »/» +@a;
}

say log(2) R/ entropy '1223334444'.comb;</lang>

Output:
1.84643934467102

Ruby

Works with: Ruby version 1.9

<lang ruby>def entropy(s)
  counts = Hash.new(0)
  s.each_char { |c| counts[c] += 1 }
  counts.values.reduce(0) do |entropy, count|
    freq = count / s.length.to_f
    entropy - freq * Math.log2(freq)
  end
end</lang>

One-liner, same performance (or better):

<lang ruby>def entropy2(s)
  s.each_char.group_by(&:to_s).values.map { |x| x.length / s.length.to_f }.reduce(0) { |e, x| e - x * Math.log2(x) }
end</lang>

Tcl

<lang tcl>proc entropy {str} {
    set log2 [expr log(2)]
    foreach char [split $str ""] {dict incr counts $char}
    set entropy 0.0
    foreach count [dict values $counts] {
        set freq [expr {$count / double([string length $str])}]
        set entropy [expr {$entropy - $freq * log($freq)/$log2}]
    }
    return $entropy
}</lang>

Demonstration:

<lang tcl>puts [format "entropy = %.5f" [entropy "1223334444"]]</lang>

Output:
entropy = 1.84644

XPL0

<lang XPL0>code real RlOut=48, Ln=54;      \intrinsic routines
string 0;                       \use zero-terminated strings

func StrLen(A);                 \Return number of characters in an ASCIIZ string
char A;
int  I;
for I:= 0, -1>>1-1 do
    if A(I) = 0 then return I;

func real Entropy(Str);         \Return Shannon entropy of string
char Str;
int  Len, I, Count(128);
real Sum, Prob;
[Len:= StrLen(Str);
for I:= 0 to 127 do Count(I):= 0;
for I:= 0 to Len-1 do           \count number of each character in string
    Count(Str(I)):= Count(Str(I)) + 1;
Sum:= 0.0;
for I:= 0 to 127 do
    if Count(I) # 0 then        \(avoid Ln(0.0) error)
        [Prob:= float(Count(I)) / float(Len);   \probability of char in string
        Sum:= Sum + Prob*Ln(Prob);
        ];
return -Sum/Ln(2.0);
];

RlOut(0, Entropy("1223334444"))</lang>

Output:
    1.84644