Entropy: Difference between revisions
m (whitespace) |
m (switching signs (the information is positive)) |
||
Line 3: | Line 3: | ||
Calculate the [http://en.wikipedia.org/wiki/Entropy_(information_theory) information entropy] (Shannon entropy) of a given input string. |
Calculate the [http://en.wikipedia.org/wiki/Entropy_(information_theory) information entropy] (Shannon entropy) of a given input string. |
||
Entropy is the [[wp:Expected value|expected value]] of the measure of [[wp:Self-information|information]] content in a system. In general, the Shannon entropy of a variable <math>X</math> is defined as <math>H(X) = |
Entropy is the [[wp:Expected value|expected value]] of the measure of [[wp:Self-information|information]] content in a system. In general, the Shannon entropy of a variable <math>X</math> is defined as <math>H(X) = \sum_x P(x) I(x)</math> where the information content <math>I(x) = -\log_{b} P(x)</math>. If the base of the logarithm <math>b = 2</math>, the result is expressed in ''bits'', a [[wp:Units of information|unit of information]]. |
||
Therefore, given a string <math>S</math> of length <math>n</math> where <math>P(s_i)</math> is the relative frequency of each character, the entropy of a string in bits is |
Therefore, given a string <math>S</math> of length <math>n</math> where <math>P(s_i)</math> is the relative frequency of each character, the entropy of a string in bits is |
Revision as of 06:40, 26 February 2013
You are encouraged to solve this task according to the task description, using any language you may know.
Calculate the information entropy (Shannon entropy) of a given input string.
Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable is defined as Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle H(X) = \sum_x P(x) I(x)} where the information content . If the base of the logarithm , the result is expressed in bits, a unit of information.
Therefore, given a string of length where is the relative frequency of each character, the entropy of a string in bits is
For this task, use "1223334444" as an example. The result should be around 1.84644 bits.
Ada
Uses Ada 2012. <lang Ada>with Ada.Text_IO, Ada.Float_Text_IO, Ada.Numerics.Elementary_Functions;
procedure Count_Entropy is
package TIO renames Ada.Text_IO;
Count: array(Character) of Natural := (others => 0); Sum: Natural := 0; Line: String := "1223334444";
begin
for I in Line'Range loop -- count the characters Count(Line(I)) := Count(Line(I))+1; Sum := Sum + 1; end loop;
declare -- compute the entropy and print it function P(C: Character) return Float is (Float(Count(C)) / Float(Sum)); use Ada.Numerics.Elementary_Functions, Ada.Float_Text_IO; Result: Float := 0.0; begin for Ch in Character loop Result := Result - (if P(Ch)=0.0 then 0.0 else P(Ch) * Log(P(Ch), Base => 2.0)); end loop; Put(Result, Fore => 1, Aft => 5, Exp => 0); end;
end Count_Entropy;</lang>
Burlesque
<lang burlesque>blsq ) "1223334444"F:u[vv^^{1\/?/2\/LG}m[?*++ 1.8464393446710157</lang>
Euler Math Toolbox
<lang EulerMathToolbox>>function entropy (s) ... $ v=strtochar(s); $ m=getmultiplicities(unique(v),v); $ m=m/sum(m); $ return sum(-m*logbase(m,2)) $endfunction >entropy("1223334444")
1.84643934467</lang>
D
<lang d>import std.stdio, std.algorithm, std.math;
double entropy(T)(T[] s) /*pure nothrow*/ if (__traits(compiles, sort(s))) {
return s .sort() .group .map!(g => g[1] / cast(double)s.length) .map!(p => -p * log2(p)) .reduce!q{a + b};
}
void main() {
"1223334444"d.dup.entropy.writeln;
}</lang>
- Output:
1.84644
Haskell
<lang haskell>import Data.List
main = print $ entropy "1223334444"
entropy s =
sum . map lg' . fq' . map (fromIntegral.length) . group . sort $ s where lg' c = (c * ) . logBase 2 $ 1.0 / c fq' c = map (\x -> x / (sum c)) c </lang>
J
Solution:<lang j> entropy=: +/@:-@(* 2&^.)@(#/.~ % #)</lang>
- Example:
<lang j> entropy '1223334444' 1.84644</lang>
Perl
<lang Perl>sub entropy {
my %count; $count{$_}++ for @_; my @p = map $_/@_, values %count; my $entropy = 0; $entropy += - $_ * log $_ for @p; $entropy / log 2
}
print entropy split //, "1223334444";</lang>
Perl 6
<lang Perl 6>sub entropy(@a) {
[+] map -> \p { p * -log p }, @a.bag.values »/» +@a;
}
say log(2) R/ entropy '1223334444'.comb;</lang>
- Output:
1.84643934467102
For fun, and for comparison with other programming languages, here is a program that computes and shows its own entropy: <lang Perl 6>say log(2) R/ [+] map -> \p { p * -log p }, $_.bag.values »/» +$_
given slurp($*PROGRAM_NAME).comb</lang>
- Output:
4.98893170929151
Ruby
<lang ruby>def entropy(s)
counts = Hash.new(0) s.each_char { |c| counts[c] += 1 }
counts.values.reduce(0) do |entropy, count| freq = count / s.length.to_f entropy - freq * Math.log2(freq) end
end</lang> One-liner, same performance (or better): <lang ruby>def entropy2(s)
s.each_char.group_by(&:to_s).values.map { |x| x.length / s.length.to_f }.reduce(0) { |e, x| e - x*Math.log2(x) }
end</lang>
XPL0
<lang XPL0>code real RlOut=48, Ln=54; \intrinsic routines string 0; \use zero-terminated strings
func StrLen(A); \Return number of characters in an ASCIIZ string char A; int I; for I:= 0, -1>>1-1 do
if A(I) = 0 then return I;
func real Entropy(Str); \Return Shannon entropy of string char Str; int Len, I, Count(128); real Sum, Prob; [Len:= StrLen(Str); for I:= 0 to 127 do Count(I):= 0; for I:= 0 to Len-1 do \count number of each character in string
Count(Str(I)):= Count(Str(I)) + 1;
Sum:= 0.0; for I:= 0 to 127 do
if Count(I) # 0 then \(avoid Ln(0.0) error) [Prob:= float(Count(I)) / float(Len); \probability of char in string Sum:= Sum + Prob*Ln(Prob); ];
return -Sum/Ln(2.0); ];
RlOut(0, Entropy("1223334444"))</lang>
- Output:
1.84644