Entropy
Calculate the information entropy (Shannon entropy) of a given input string.
You are encouraged to solve this task according to the task description, using any language you may know.
Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable is defined as where the information content . If the base of the logarithm , the result is expressed in bits, a unit of information.
Therefore, given a string of length where is the relative frequency of each character, the entropy of a string in bits is
For this task, use "1223334444" as an example. The result should be around 1.84644 bits.
Ada
Uses Ada 2012.
<lang Ada>with Ada.Text_IO, Ada.Float_Text_IO, Ada.Numerics.Elementary_Functions;
procedure Count_Entropy is
package TIO renames Ada.Text_IO;
Count: array(Character) of Natural := (others => 0); Sum: Natural := 0; Line: String := "1223334444";
begin
for I in Line'Range loop -- count the characters Count(Line(I)) := Count(Line(I))+1; Sum := Sum + 1; end loop;
declare -- compute the entropy and print it function P(C: Character) return Float is (Float(Count(C)) / Float(Sum)); use Ada.Numerics.Elementary_Functions, Ada.Float_Text_IO; Result: Float := 0.0; begin for Ch in Character loop Result := Result - (if P(Ch)=0.0 then 0.0 else P(Ch) * Log(P(Ch), Base => 2.0)); end loop; Put(Result, Fore => 1, Aft => 5, Exp => 0); end;
end Count_Entropy;</lang>
Burlesque
<lang burlesque>blsq ) "1223334444"F:u[vv^^{1\/?/2\/LG}m[?*++ 1.8464393446710157</lang>
Euler Math Toolbox
<lang Euler Math Toolbox> >function entropy (s) ... $ v=strtochar(s); $ m=getmultiplicities(unique(v),v); $ m=m/sum(m); $ return sum(-m*logbase(m,2)) $endfunction >entropy("1223334444")
1.84643934467
</lang>
D
<lang d>import std.stdio, std.algorithm, std.math;
double entropy(T)(T[] s) /*pure nothrow*/ if (__traits(compiles, sort(s))) {
return s .sort() .group .map!(g => g[1] / cast(double)s.length) .map!(p => -p * log2(p)) .reduce!q{a + b};
}
void main() {
"1223334444"d.dup.entropy.writeln;
}</lang>
- Output:
1.84644
Haskell
<lang haskell>import Data.List
main = print $ entropy "1223334444"
entropy s =
sum . map lg' . fq' . map (fromIntegral.length) . group . sort $ s where lg' c = (c * ) . logBase 2 $ 1.0 / c fq' c = map (\x -> x / (sum c)) c </lang>
J
Solution:<lang j> entropy=: +/@:-@(* 2&^.)@(#/.~ % #)</lang>
- Example:
<lang j> entropy '1223334444' 1.84644</lang>
Perl
<lang Perl>sub entropy {
my %count; $count{$_}++ for @_; my @p = map $_/@_, values %count; my $entropy = 0; $entropy += - $_ * log $_ for @p; $entropy / log 2
}
print entropy split //, "1223334444";</lang>
Perl 6
<lang Perl 6>sub entropy(@a) {
[+] map -> \p { p * -log p }, @a.bag.values »/» +@a;
}
say log(2) R/ entropy '1223334444'.comb;</lang>
- Output:
1.84643934467102
For fun, and for comparison with other programming languages, here is a program that computes and shows its own entropy:
<lang Perl 6>say log(2) R/ [+] map -> \p { p * -log p }, $_.bag.values »/» +$_
given slurp($*PROGRAM_NAME).comb</lang>
- Output:
4.98893170929151
Ruby
<lang ruby>def entropy(s)
counts = Hash.new(0) s.each_char { |c| counts[c] += 1 }
counts.values.reduce(0) do |entropy, count| freq = count / s.length.to_f entropy - freq * Math.log2(freq) end
end</lang> One-liner, same performance (or better): <lang ruby>def entropy2(s)
s.each_char.group_by(&:to_s).values.map { |x| x.length / s.length.to_f }.reduce(0) { |e, x| e - x*Math.log2(x) }
end</lang>
XPL0
<lang XPL0>code real RlOut=48, Ln=54; \intrinsic routines string 0; \use zero-terminated strings
func StrLen(A); \Return number of characters in an ASCIIZ string char A; int I; for I:= 0, -1>>1-1 do
if A(I) = 0 then return I;
func real Entropy(Str); \Return Shannon entropy of string char Str; int Len, I, Count(128); real Sum, Prob; [Len:= StrLen(Str); for I:= 0 to 127 do Count(I):= 0; for I:= 0 to Len-1 do \count number of each character in string
Count(Str(I)):= Count(Str(I)) + 1;
Sum:= 0.0; for I:= 0 to 127 do
if Count(I) # 0 then \(avoid Ln(0.0) error) [Prob:= float(Count(I)) / float(Len); \probability of char in string Sum:= Sum + Prob*Ln(Prob); ];
return -Sum/Ln(2.0); ];
RlOut(0, Entropy("1223334444"))</lang>
- Output:
1.84644