Talk:Arithmetic coding/As a generalized change of radix

Goof?

The J entry gives different values for most of these examples than the Go/Perl/Perl 6/Ruby/Sidef entries do.

The difference seems to be related to the difference between "frequency" and "cumulative frequency".

The wikipedia writeup uses "frequency" when describing the "As a generalized change of radix" mechanism, and then introduces "cumulative frequency" when describing how to reverse the process.

Anyways, it looks like Go/Perl/Perl 6/Ruby/Sidef are using a different algorithm than the one described in wikipedia, as they all use "cumulative frequency" rather than "frequency" when encoding.

Or perhaps the wikipedia page is in error? Do we have anyone with sufficient background in the history of this algorithm to make that determination? --Rdm (talk) 17:44, 27 January 2016 (UTC)
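(For concreteness, the worked example from the wikipedia page: for the message "DABDDB" the frequency table is f(A) = 1, f(B) = 2, f(D) = 3, while the cumulative frequency table is C(A) = 0, C(B) = 1, C(D) = 3, each entry being the total frequency of all lower-coded symbols.)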

I am not familiar with J. But the number generated could be different if a different cumulative frequency dictionary is used. One commonality of the Go/Perl/Perl 6/Ruby/Sidef solutions is that they generate the cumulative frequencies by accumulating in ascending order of the character codes (because they check keys from 0 to 255), so they always generate the same cumulative frequency dictionary. However, the order of characters when accumulating is not relevant to the correctness of the algorithm. If you accumulate in a different order, you will get a different cumulative frequency dictionary, but as long as decoding uses the same cumulative frequency dictionary (i.e. it generates it by iterating through the frequency dictionary in the same order), it will produce the correct result. I can't read J, but if the J solution simply iterates through the frequency dictionary in an unspecified but consistent (not necessarily ascending) order of the keys, then it could generate a different cumulative frequency dictionary, and hence a different number. I am not sure if this is really what is happening. --Spoon! (talk) 10:40, 28 January 2016 (UTC)
I also wanted to change the other solutions so that they create the cumulative frequencies by simply iterating over the frequency dictionary in any order, to remove the dependence on the range of character codes. However, such a change would also lead to nondeterministic output, so I am not sure if that's a good idea. --Spoon! (talk) 10:40, 28 January 2016 (UTC)
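(To make the accumulation-order point above concrete, here is a minimal Go sketch; the function names are illustrative and not taken from any of the entries. Scanning the byte values 0..255 in ascending order yields the same cumulative table on every run, while ranging over a Go map does not, since Go deliberately randomizes map iteration order:
<lang go>package main

import "fmt"

// cumulativeFreq builds the cumulative-frequency table the way the
// Go/Perl/Perl 6/Ruby/Sidef entries do: by scanning all 256 possible
// byte values in ascending order, so the result is deterministic.
func cumulativeFreq(freq map[byte]int) map[byte]int {
	cf := map[byte]int{}
	total := 0
	for c := 0; c < 256; c++ {
		if f, ok := freq[byte(c)]; ok {
			cf[byte(c)] = total
			total += f
		}
	}
	return cf
}

// cumulativeFreqAnyOrder accumulates by ranging over the map instead.
// Go randomizes map iteration order, so repeated runs may assign
// different cumulative values to the same symbol; decoding would then
// have to reuse this exact table to get the correct result.
func cumulativeFreqAnyOrder(freq map[byte]int) map[byte]int {
	cf := map[byte]int{}
	total := 0
	for c, f := range freq {
		cf[c] = total
		total += f
	}
	return cf
}

func main() {
	freq := map[byte]int{'A': 1, 'B': 2, 'D': 3} // counts for "DABDDB"
	fmt.Println(cumulativeFreq(freq))         // always map[65:0 66:1 68:3]
	fmt.Println(cumulativeFreqAnyOrder(freq)) // values may differ between runs
}</lang>)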
The J implementation does not use <code>cumulative frequency</code> for this task. The wikipedia page clearly specifies that <code>frequency</code>, and not <code>cumulative frequency</code>, should be used for this task. Instead, it introduces <code>cumulative frequency</code> for the decoding operation.
So why are people using <code>cumulative frequency</code> for this task? --Rdm (talk) 16:24, 28 January 2016 (UTC)
I'm not sure if I understood your point correctly, but maybe this will clarify something. As Wikipedia describes it, in the encoding process, the formula to calculate the lower bound <math>L</math> is:
<math>L = \sum_{i=1}^{n} n^{n-i} C_i \prod_{k=1}^{i-1} f_k</math>
where <math>C_i</math> are the cumulative frequencies and <math>f_k</math> are the frequencies of occurrences.
After that, we compute the upper bound <math>U</math> as follows:
<math>U = L + \prod_{k=1}^{n} f_k</math>
After this point, it doesn't matter which number <math>x</math> we return, as long as <math>L \le x < U</math>. --Trizen (talk) 17:28, 28 January 2016 (UTC)
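(As a sanity check on the two formulas above, here is a small Go sketch, an illustration of my own rather than code from any of the entries. It computes <math>L</math> and <math>U</math> for the wikipedia page's "DABDDB" example with frequencies A=1, B=2, D=3, and reproduces that article's values 25002 and 25110:
<lang go>package main

import "fmt"

// bounds computes the interval [L, U) from the formulas quoted above:
//   L = sum_{i=1..n} n^(n-i) * C(s_i) * prod_{k=1..i-1} f(s_k)
//   U = L + prod_{k=1..n} f(s_k)
// freq holds the frequencies f and cum the cumulative frequencies C.
func bounds(msg string, freq, cum map[byte]int) (l, u int) {
	n := len(msg)
	pow := 1 // starts at n^(n-1), divided by n at each step
	for i := 0; i < n-1; i++ {
		pow *= n
	}
	prod := 1 // running product f(s_1)..f(s_{i-1}); empty product is 1
	for i := 0; i < n; i++ {
		l += pow * cum[msg[i]] * prod
		prod *= freq[msg[i]]
		pow /= n
	}
	return l, l + prod // after the loop, prod covers all n frequencies
}

func main() {
	freq := map[byte]int{'A': 1, 'B': 2, 'D': 3}
	cum := map[byte]int{'A': 0, 'B': 1, 'D': 3}
	fmt.Println(bounds("DABDDB", freq, cum)) // 25002 25110
}</lang>
Plain int arithmetic is only enough for a toy message this short; the task's actual solutions need arbitrary-precision integers, e.g. math/big in Go.)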