Talk:Entropy: Difference between revisions

This article confuses Shannon entropy with information entropy and incorrectly states that Shannon entropy H has units of bits.


There are many problems with applying H = -sum(p*log(p)) to a string and calling the result the entropy of that string. H is called entropy, but its units are bits/symbol (or entropy/symbol if the appropriate log base is chosen). For example, the strings 01 and 011100101010101000011110 have exactly the same "entropy", H = 1 bit/symbol, even though the second obviously carries more information entropy than the first. Another problem is that if you simply re-express the same data in hexadecimal, H gives a different answer for the same information entropy. The best measure of the real information entropy of a string is 4) below. Applying 4) to binary data gives the entropy in "bits", but since the data is binary, the result is also a true statistical "entropy" without having to specify "bits" as a unit.
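A minimal Python sketch of the per-symbol calculation described above, assuming p is estimated from the empirical symbol frequencies in the string and log is base 2. The function names entropy_per_symbol and bits_to_hex are hypothetical, chosen only for illustration. It reproduces the H = 1 bit/symbol result for both binary strings, and shows that writing the same 24 bits as hexadecimal digits changes H.

```python
from collections import Counter
from math import log2


def entropy_per_symbol(s: str) -> float:
    """H = -sum(p * log2(p)) over the symbol frequencies of s, in bits/symbol."""
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def bits_to_hex(bits: str) -> str:
    """Re-express a binary string as hexadecimal, 4 bits per hex digit."""
    if len(bits) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return format(int(bits, 2), f"0{len(bits) // 4}X")


short = "01"
long_ = "011100101010101000011110"
print(entropy_per_symbol(short))  # 1.0
print(entropy_per_symbol(long_))  # 1.0 (12 zeros and 12 ones)

# Same data, different alphabet: H is now measured per hex digit.
hexed = bits_to_hex(long_)
print(hexed, entropy_per_symbol(hexed))
```

The hexadecimal form "72AA1E" has one repeated digit among six symbols, so its H is roughly 2.25 bits/symbol rather than 1.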


Total entropy (in an arbitrarily chosen log base, which is not the best kind of "entropy") for a file is S = N*H, where N is the length of the file in symbols. Many times in [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf Shannon's book] he says H is in units of "bits/symbol", "entropy/symbol", and "information/symbol". Some people don't believe Shannon, so [https://schneider.ncifcrf.gov/ here's a modern respected researcher's home page] that tries to clear up the confusion by stating the units openly.
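A short Python sketch of S = N*H, again assuming base-2 logs and empirical symbol frequencies; total_entropy is a hypothetical helper name. It applies the formula to the 24-bit string from the example above and to the same bits written as six hexadecimal digits ("72AA1E"), illustrating that the total also depends on how the data is re-expressed.

```python
from collections import Counter
from math import log2


def entropy_per_symbol(s: str) -> float:
    """H = -sum(p * log2(p)) over the symbol frequencies of s, in bits/symbol."""
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in Counter(s).values())


def total_entropy(s: str) -> float:
    """S = N * H, where N is the number of symbols in s."""
    return len(s) * entropy_per_symbol(s)


binary = "011100101010101000011110"
hexed = "72AA1E"  # the same 24 bits written as 6 hexadecimal digits
print(total_entropy(binary))  # 24.0
print(total_entropy(hexed))   # roughly 13.5, not 24
```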