== What about a less dull example? ==

What about using "Rosetta code" as an example instead of the dull "1223334444"?--Grondilu 17:53, 22 February 2013 (UTC)

Or even funnier: write a program that computes its own entropy :) --Grondilu 12:40, 25 February 2013 (UTC)
Better yet, a bonus task of computing the entropy of each solution on the page. :) --TimToady 19:23, 25 February 2013 (UTC)
I like computing the entropy of “Rosetta Code” (it's about 3.08496, assuming my code is right); a more self-referential one is fine too, except it involves features that might block some languages from participating. (The draft child task is a better place for that.) –Donal Fellows 09:31, 26 February 2013 (UTC)
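For anyone wanting to double-check that figure, here is a minimal sketch of the per-character Shannon entropy computation in bits (the sub name shannon-entropy is just an illustrative choice, not something from the task); it gives the same ≈3.08496 for "Rosetta Code":

<lang Perl 6>sub shannon-entropy(Str $s) {
    my $n   = $s.chars;                          # length of the string
    my @p   = $s.comb.Bag.values.map(* / $n);    # per-character probabilities
    my $sum = [+] @p.map({ $_ * log($_, 2) });   # sum of p * log2(p)
    return -$sum;                                # Shannon entropy in bits
}

say shannon-entropy("Rosetta Code");             # ≈ 3.08496</lang>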

== Alternate form ==

Not sure this is very useful, but I was wondering whether one could find a more concise way of writing this.

If we call <math>N</math> the length of the string and <math>n_c</math> the number of occurrences of the character <math>c</math>, we have:

<math>H = \log_2 N - \frac{1}{N} \sum_c n_c \log_2 n_c</math>
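For completeness, this identity follows from the usual definition with <math>p_c = n_c/N</math>, using <math>\sum_c n_c = N</math>:

<math>H = -\sum_c \frac{n_c}{N} \log_2 \frac{n_c}{N} = -\sum_c \frac{n_c}{N} \left( \log_2 n_c - \log_2 N \right) = \log_2 N - \frac{1}{N} \sum_c n_c \log_2 n_c</math>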

In Perl 6, this allows a slightly simpler formula, i.e. one that doesn't use hyperoperators:

<lang Perl 6>sub entropy(@a) {
    log(@a) - @a R/ [+] map -> \n { n * log n }, @a.bag.values
}</lang>
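One note to tie this back to the figure above: the sub uses the natural log, so it returns nats rather than bits, and dividing by log 2 recovers the value quoted earlier. Assuming the sub behaves as written, something like:

<lang Perl 6>say entropy("Rosetta Code".comb);           # ≈ 2.1383 nats
say entropy("Rosetta Code".comb) / log(2);  # ≈ 3.08496 bits</lang>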

For what it's worth.--Grondilu 18:14, 4 March 2013 (UTC)

== Description ==

I think the task is not described correctly. It calculates the average entropy per character, not the entropy of the message as a whole.

For example, I generated this 100-digit string with 0 through 7 from random.org. It has 300 bits of entropy (more conservatively, 299 to 300 bits of entropy, in case the generation is slightly flawed). The task gives it just under 3:
<lang parigp>entropy("1652410507230105455225274644011734652475143261241037401534561367707447375435416503021223072520334062")
%1 = 2.9758111700170733830744745234131842224</lang>

CRGreathouse (talk) 02:08, 17 April 2013 (UTC)
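To make the per-character vs. whole-message distinction concrete: multiplying the task's per-character figure by the string length recovers the whole-message number. A minimal Raku sketch (same per-character formula as the task, in bits; the variable names are just illustrative):

<lang Perl 6>my $s = "1652410507230105455225274644011734652475143261241037401534561367707447375435416503021223072520334062";
my @p   = $s.comb.Bag.values.map(* / $s.chars);   # per-digit probabilities
my $sum = [+] @p.map({ $_ * log($_, 2) });        # sum of p * log2(p)
say -$sum;              # ≈ 2.9758 bits per character, matching the PARI/GP figure above
say -$sum * $s.chars;   # ≈ 297.6 bits for the whole 100-digit string</lang>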

I don't know; it's not simple. I was a bit embarrassed with this task, as I struggled to understand what the entropy of a string is. Normally entropy is defined for a system whose internal state is only known probabilistically. A string is known exactly, so stricto sensu the entropy of any string is zero. As I understood it, the task defines the entropy of a string as the entropy of whatever system "generated" the string. The string is thus treated as a representative sample of the probabilities of each possible character, viewed as a random variable.
In your example, you say the entropy is 300 bits, but I'm not sure it makes much sense to say that. 300 bits is the amount of information in this string, if you consider it as the result of picking one of the 8^100 = 2^300 equally likely 100-digit strings. Yet the entropy is still zero, for your string is exactly defined.
The entropy as currently defined in the task does not directly depend on the "size" of the string in memory, because it merely represents the probability table of a stochastic process.
--Grondilu (talk) 15:42, 17 April 2013 (UTC)
What I mean when I say 299-300 bits is that if we knew I was going to pick 100 digits from random.org as described, we wouldn't be able to design a scheme that would average fewer than 299 bits to tell you which one I picked, but we could design one that would do it in 300. This task takes a slightly different model (as if I was going to generate a 100-character string with a given number of each digit), but the two are similar. Either way the string has about 300 bits and each character has about 3.
CRGreathouse (talk) 00:55, 18 April 2013 (UTC)
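For the record, the arithmetic behind those round numbers: a uniformly random choice of 100 digits, each from 8 equally likely values, carries <math>\log_2 8^{100} = 100 \log_2 8 = 300</math> bits in total, i.e. 3 bits per character; the 2.9758... the task reports is the empirical per-character figure for this particular sample.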