Entropy: Difference between revisions

Content added Content deleted

Inline

Revision as of 06:40, 26 February 2013

Calculate the information entropy (Shannon entropy) of a given input string.

Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable $X$ is defined as Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle H(X) = \sum_x P(x) I(x)} where the information content $I(x)=-\log _{b}P(x)$ . If the base of the logarithm $b=2$ , the result is expressed in bits, a unit of information.

Therefore, given a string $S$ of length $n$ where $P(s_{i})$ is the relative frequency of each character, the entropy of a string in bits is

$H(S)=-\sum _{i=0}^{n}P(s_{i})\log _{2}(P(s_{i}))$

For this task, use "1223334444" as an example. The result should be around 1.84644 bits.

Ada

Uses Ada 2012. <lang Ada>with Ada.Text_IO, Ada.Float_Text_IO, Ada.Numerics.Elementary_Functions;

procedure Count_Entropy is

  package TIO renames Ada.Text_IO;

  Count: array(Character) of Natural := (others => 0);
  Sum:   Natural := 0;
  Line: String := "1223334444";

begin

  for I in Line'Range loop   -- count the characters
     Count(Line(I)) := Count(Line(I))+1;
     Sum := Sum + 1;
  end loop;

  declare   -- compute the entropy and print it
     function P(C: Character) return Float is (Float(Count(C)) / Float(Sum));
     use Ada.Numerics.Elementary_Functions, Ada.Float_Text_IO;
     Result: Float := 0.0;
  begin
     for Ch in Character loop
        Result := Result -
         (if P(Ch)=0.0 then 0.0 else P(Ch) * Log(P(Ch), Base => 2.0));
     end loop;
     Put(Result, Fore => 1, Aft => 5, Exp => 0);
  end;

end Count_Entropy;</lang>

Burlesque

Euler Math Toolbox

<lang EulerMathToolbox>>function entropy (s) ... $ v=strtochar(s); $ m=getmultiplicities(unique(v),v); $ m=m/sum(m); $ return sum(-m*logbase(m,2)) $endfunction >entropy("1223334444")

1.84643934467</lang>

D

<lang d>import std.stdio, std.algorithm, std.math;

double entropy(T)(T[] s) /*pure nothrow*/ if (__traits(compiles, sort(s))) {

   return s
          .sort()
          .group
          .map!(g => g[1] / cast(double)s.length)
          .map!(p => -p * log2(p))
          .reduce!q{a + b};

}

void main() {

   "1223334444"d.dup.entropy.writeln;

}</lang>

Output:

1.84644

Haskell

<lang haskell>import Data.List

main = print $ entropy "1223334444"

entropy s =

sum . map lg' . fq' . map (fromIntegral.length) . group . sort $ s
 where lg' c = (c * ) . logBase 2 $ 1.0 / c
       fq' c = map (\x -> x / (sum c)) c </lang>

J

Solution:<lang j> entropy=: +/@:-@(* 2&^.)@(#/.~ % #)</lang>

Example:

<lang j> entropy '1223334444' 1.84644</lang>

Perl

<lang Perl>sub entropy {

   my %count; $count{$_}++ for @_;
   my @p = map $_/@_, values %count;
   my $entropy = 0;
   $entropy += - $_ * log $_ for @p;
   $entropy / log 2

}

print entropy split //, "1223334444";</lang>

Perl 6

<lang Perl 6>sub entropy(@a) {

   [+] map -> \p { p * -log p }, @a.bag.values »/» +@a;

}

say log(2) R/ entropy '1223334444'.comb;</lang>

Output:

1.84643934467102

For fun, and for comparison with other programming languages, here is a program that computes and shows its own entropy: <lang Perl 6>say log(2) R/ [+] map -> \p { p * -log p }, $_.bag.values »/» +$_

   given slurp($*PROGRAM_NAME).comb</lang>

Output:

4.98893170929151

Ruby

Works with: Ruby version 1.9

<lang ruby>def entropy(s)

 counts = Hash.new(0)
 s.each_char { |c| counts[c] += 1 }

 counts.values.reduce(0) do |entropy, count|
   freq = count / s.length.to_f
   entropy - freq * Math.log2(freq)
 end

end</lang> One-liner, same performance (or better): <lang ruby>def entropy2(s)

 s.each_char.group_by(&:to_s).values.map { |x| x.length / s.length.to_f }.reduce(0) { |e, x| e - x*Math.log2(x) }

end</lang>

XPL0

<lang XPL0>code real RlOut=48, Ln=54; \intrinsic routines string 0; \use zero-terminated strings

func StrLen(A); \Return number of characters in an ASCIIZ string char A; int I; for I:= 0, -1>>1-1 do

   if A(I) = 0 then return I;

func real Entropy(Str); \Return Shannon entropy of string char Str; int Len, I, Count(128); real Sum, Prob; [Len:= StrLen(Str); for I:= 0 to 127 do Count(I):= 0; for I:= 0 to Len-1 do \count number of each character in string

   Count(Str(I)):= Count(Str(I)) + 1;

Sum:= 0.0; for I:= 0 to 127 do

   if Count(I) # 0 then        \(avoid Ln(0.0) error)
       [Prob:= float(Count(I)) / float(Len);   \probability of char in string
       Sum:= Sum + Prob*Ln(Prob);
       ];

return -Sum/Ln(2.0); ];

RlOut(0, Entropy("1223334444"))</lang>

Output:

    1.84644

@@ Line 3: / Line 3: @@
 Calculate the [http://en.wikipedia.org/wiki/Entropy_(information_theory) information entropy] (Shannon entropy) of a given input string.
-Entropy is the [[wp:Expected value|expected value]] of the measure of [[wp:Self-information|information]] content in a system. In general, the Shannon entropy of a variable <math>X</math> is defined as <math>H(X) = -\sum_x P(x) I(x)</math> where the information content <math>I(x) = \log_{b} P(x)</math>. If the base of the logarithm <math>b = 2</math>, the result is expressed in ''bits'', a [[wp:Units of information|unit of information]].
+Entropy is the [[wp:Expected value|expected value]] of the measure of [[wp:Self-information|information]] content in a system. In general, the Shannon entropy of a variable <math>X</math> is defined as <math>H(X) = \sum_x P(x) I(x)</math> where the information content <math>I(x) = -\log_{b} P(x)</math>. If the base of the logarithm <math>b = 2</math>, the result is expressed in ''bits'', a [[wp:Units of information|unit of information]].
 Therefore, given a string <math>S</math> of length <math>n</math> where <math>P(s_i)</math> is the relative frequency of each character, the entropy of a string in bits is