Talk:Entropy: Difference between revisions

 
Line 41:
:: What I mean when I say 299-300 bits is that if we knew I was going to pick 100 digits from random.org as described, we wouldn't be able to design a scheme that would average fewer than 299 bits to tell you which one I picked, but we could design one which would do it in 300. This task takes a slightly different model (as if I were going to generate a 100-character string with a given number of each digit), but the two are similar. Either way the string has about 300 bits and each character has about 3.
:: [[User:CRGreathouse|CRGreathouse]] ([[User talk:CRGreathouse|talk]]) 00:55, 18 April 2013 (UTC)
 
: Yes, it was supposed to be bits/character and I've corrected it. Shannon called N the number of "symbols", and I call the number of different symbols "characters" (your number c) to distinguish the n "different symbols" from the N symbols. The entropy in bits (technically called shannons to indicate not only that base 2 was used but that the function N*H was applied) is S<sub>2</sub> = NH<sub>2</sub> = 100*2.9758 ≈ '''298 shannons'''. H is intensive or specific entropy, S is extensive or total entropy. The normalized specific entropy per symbol (the degree to which the symbols were equally frequently used, from 0 to 1) is H<sub>8</sub> = H<sub>n</sub> = H<sub>2</sub>*ln(2)/ln(8) = 0.992 in "octals/symbol" or "normalized entropy/symbol"; the factor ln(2)/ln(8) changes the base from 2 to 8. The same distribution of characters with a different number n of characters would give the same H<sub>n</sub>. The most objective "total entropy" is then S<sub>n</sub> = H<sub>n</sub>*N = 99.2. There are no units because this is a pure statistical measure of the distribution of the ratios times the number of symbols. 99.2 shows that only 1 of the 100 symbols was not perfectly "random" from an H-function perspective (which is VERY limited and basic because it does not try to do any compression, i.e. it does not look for patterns other than character frequency). But the last digit had to be chosen randomly rather than chosen so as to force H = 1. S<sub>n</sub> is the total "information" in the data if the observer of the data is not able to compress the data further. Compression has no perfect universal function, so information depends on the observer's view, or his access to compression tools like gzip. So for general computer work, the compressed file is the amount of information in it. See also my comments in a section below.
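: For concreteness, a minimal Python sketch of the four quantities above (an illustration only; the actual 100-digit string is not reproduced here, so a made-up string with a perfectly uniform digit distribution is used, for which H<sub>n</sub> = 1 and S<sub>n</sub> = N):
<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def entropy_measures(s):
    """Return H2 (bits/symbol), S2 = N*H2 (shannons), Hn (0..1) and Sn = N*Hn."""
    N = len(s)                            # total number of symbols
    counts = Counter(s)
    n = len(counts)                       # number of distinct symbols ("characters")
    H2 = -sum(c / N * log2(c / N) for c in counts.values())
    S2 = N * H2                           # total entropy in shannons (bits)
    Hn = H2 / log2(n) if n > 1 else 0.0   # normalized specific entropy
    Sn = N * Hn                           # normalized total entropy
    return H2, S2, Hn, Sn

# Stand-in string: each decimal digit appears 10 times.
print(entropy_measures("0123456789" * 10))
# -> (3.3219..., 332.19..., 1.0, 100.0); a string with H2 = 2.9758 and n = 8
#    distinct symbols, as in the reply above, gives roughly (2.976, 297.6, 0.992, 99.2).
</syntaxhighlight>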
 
:Physics is our best-known (most efficient) set of non-lossy compression functions for the data observed in the world, which is a way of saying Occam's razor was applied. Other sets of compression functions, like religion + democracy + free markets, are more intelligent than physics' functions if they result in more profit for the computing device implementing the functions (algorithms), in keeping with Marcus Hutter's definition of intelligence. Deciding the most efficient way to move the symbols around with electrons in CPUs, ions in brains, votes in government, reputation in open source, or money in markets allows the computing device (CPUs implementing real A.I., brains, government, open source, markets) to most efficiently move the much larger real-world objects that the symbols represent, so that the computing device can more efficiently make copies of itself (seek lower entropy). This is how entropy relates to information, and how information relates to intelligence and evolution, which are the result of least-action dynamics seeking lower physical entropy in the distribution of N atoms on Earth, as the excess entropy is sloughed off to space with 17 low-energy un-directed photons per incoming sun-directed photon, so that the Universe can expand (entropy and energy per comoving volume of the universe are constant). The result is higher-energy bonds (lower entropy due to smaller volume) of the N atoms on Earth of n types, which is why machines that utilize metal and metalloid (e.g. silicon) atoms with their previously attached oxygen atoms removed (resulting in carbon-carbon, metal-metal, and silicon-silicon bonds) are replacing biology: they are more efficient due to their higher-energy, lower-entropy bonds. This is why entropy is important. [[User:Zawy|Zawy]] ([[User talk:Zawy|talk]]) 10:33, 23 January 2016 (UTC)
 
== REXX (log2) ==
Line 91 ⟶ 95:
This article is confusing Shannon entropy with information entropy and incorrectly states Shannon entropy H has units of bits.
 
There are many problems in applying H = -1*sum(p*log(p)) to a string and calling it the entropy of that string. H is called entropy, but its units are bits/symbol, or entropy/symbol if the correct log base is chosen. For example, H of 01 and of 011100101010101000011110 is exactly the same "entropy", H = 1 bit/symbol, even though the 2nd one obviously carries more information entropy than the 1st. Another problem is that if you simply re-express the same data in hexadecimal, H gives a different answer for the same information entropy. The best measure of the real information entropy of a string is 4) below. Applying 4) to binary data gives the entropy in "bits", but since the data was binary the result is also a true statistical "entropy" without having to specify "bits" as a unit.
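A small Python sketch of the first point (an illustration, not part of the original comment): both strings have the same H of 1 bit/symbol, while the total N*H does distinguish them.
<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def H(s):
    """Specific (per-symbol) entropy in bits/symbol."""
    N = len(s)
    return -sum(c / N * log2(c / N) for c in Counter(s).values())

for s in ("01", "011100101010101000011110"):
    print(len(s), H(s), len(s) * H(s))
# 2   1.0   2.0
# 24  1.0  24.0   (same bits/symbol, very different total N*H)
</syntaxhighlight>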
 
Total entropy (in an arbitrarily chosen log base, which is not the best type of "entropy") for a file is S = N*H, where N is the length of the file. Many times in [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf Shannon's book] he says H is in units of "bits/symbol", "entropy/symbol", and "information/symbol". Some people don't believe Shannon, so [https://schneider.ncifcrf.gov/ here's a modern respected researcher's home page] that tries to clear up the confusion by stating the units out in the open.
Line 104 ⟶ 108:
 
2) Normalized specific entropy: '''H<sub>n</sub> = H / log(n).'''
The division converts the logarithm base of H to n. Units are entropy/symbol (or 1/symbol). Ranges from 0 to 1. When it is 1, each symbol occurred equally often, N/n times. Near 0 means all symbols except one occurred only once, and the rest of a very long file was the other symbol. The log is in the same base as H.
 
3) Total entropy '''S' = N * H.'''
Line 110 ⟶ 114:
 
4) Normalized total entropy '''S<sub>n</sub>' = N * H / log(n).''' See the "gotcha" below about choosing n.
There are no units, in the same way a ratio does not have units. Notice that the formula uses a ratio taken from the data itself rather than a logarithm base chosen by a person. This is the total entropy of the symbols. It varies from 0 to N.
 
5) Physical entropy S of a binary file when the data is stored perfectly efficiently (at Landauer's limit): '''S = S' * k<sub>B</sub> / log<sub>2</sub>(e)''', i.e. S = S' * k<sub>B</sub> * ln(2) with S' in bits.
Line 117 ⟶ 121:
 
*Gotcha: a data generator may have the option of using, say, 256 symbols but use only 200 of them for a given set of data. It then becomes a matter of semantics whether you choose n=256 or n=200, and neither choice may work (give the same entropy when the data is re-expressed in a different symbol set) because an implicit compression has already been applied. A sketch illustrating measures 2) to 5) and this gotcha follows below.
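A Python sketch of measures 2) to 5) above (an illustration only, using made-up data: 800 bytes drawn from only 200 of the 256 possible byte values, so the normalized measures depend on which n is chosen):
<syntaxhighlight lang="python">
from collections import Counter
from math import log, log2

k_B = 1.380649e-23              # Boltzmann constant, J/K

def measures(data, n_alphabet=None):
    N = len(data)
    counts = Counter(data)
    n = n_alphabet if n_alphabet is not None else len(counts)
    H  = -sum(c / N * log2(c / N) for c in counts.values())  # bits/symbol
    Hn = H / log2(n)            # 2) normalized specific entropy, 0 to 1
    S  = N * H                  # 3) total entropy in bits
    Sn = N * Hn                 # 4) normalized total entropy, 0 to N
    S_phys = S * k_B * log(2)   # 5) Landauer limit for S' in bits, in J/K
    return Hn, S, Sn, S_phys

data = bytes(range(200)) * 4    # only 200 of the 256 possible byte values occur
print(measures(data, n_alphabet=256))  # Hn ≈ 0.955, Sn ≈ 764  (alphabet offered)
print(measures(data, n_alphabet=200))  # Hn = 1.0,   Sn = 800  (symbols actually used)
</syntaxhighlight>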
 
== possible typo in an equation ==
Near the middle of the task's preamble, there is a formula:
 
The total entropy in bits of the example above is S= 10*18.4644 = 18.4644 bits.
 
Should the last number be &nbsp; '''184.644''' &nbsp; (instead)? &nbsp; -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 03:29, 16 August 2016 (UTC)
 
==Several formulae on task page now invisible to many browsers==
 
Various formulae on this page became invisible to many browsers (all those - the majority - which display the graphic file rather than processing the MathML directly), as a result of the 'tidying and spacing' edits at 19:40 Jul 14 2016. One of the issues is the editor's introduction of redundant white space around Latex expressions inside &lt;math&gt; tags. This white space is not currently supported by the MediaWiki processor. [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 12:52, 19 September 2016 (UTC)
 
: Visibility of formulae to Chrome, IE/Edge, Safari, etc. restored by reverting the task description to its state before the under-tested cosmetic edits of 19:40, 14 July 2016‎. The editor may wish to restore some of these cosmetic edits, testing their real effects in Google, Microsoft and Apple's browsers, but the most practical immediate solution is simply to return to the state before visibility of the formulae was lost. [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 23:09, 16 October 2016 (UTC)