Arithmetic coding/As a generalized change of radix: Difference between revisions

J: show several example implementations of the wikipedia spec, and discuss how they differ
m (→‎{{header|Perl}}: Optimized the decoding process)
(J: show several example implementations of the wikipedia spec, and discuss how they differ)
Line 215:
1224972022395235072 15</lang>
 
(See talk page.)
Draft note: these results are different from those of some other implementations currently visible here. I expect that that difference occurs from those other implementations using cumulative frequency instead of frequency at the wrong place in the calculation. (If I am incorrect about this, I'd appreciate a detailed exposition.)
 
If, however, we ignore the wikipedia suggestion that letters have predefined values (e.g. C = 2), but instead use cumulative frequency (in order from least common to most common and leaving in order of appearance for letters with the same frequencies, we get this:
 
<lang J>aekc=:3 :0
b=. x:#y
i=. (~.i.])y
n=. #/.~y
f=. i{n
c=. i{(+/\0,}:/:~n)/:/:n
L=. b #. c**/\1,}:f
p=. */f
e=. x:<.10^.p
e,~<.(L+p)%10^e
)</lang>
 
With encoded values:
 
<lang J> aekc 'DABDDB'
251 2
aekc 'DABDDBBDDBA'
167351 6
aekc 'ABRACADABRA'
19022571 4
aekc 'TOBEORNOTTOBEORTOBEORNOT'
807796941974181890 15</lang>
 
If we instead use cumulative frequencies, cumulated strictly in alphabetic order, we get this:
 
<lang J>aeka=:3 :0
b=. x:#y
i=. (~.i.])y
n=. #/.~y
o=. /:~.y
f=. i{n
c=. i{(+/\0,}:o{n)/:o
L=. b #. c**/\1,}:f
p=. */f
e=. x:<.10^.p
e,~<.(L+p)%10^e
)</lang>
 
<lang J> aeka 'DABDDB'
251 2
aeka 'DABDDBBDDBA'
167351 6
aeka 'ABRACADABRA'
7954170 4
aeka 'TOBEORNOTTOBEORTOBEORNOT'
1150764267498783364 15</lang>
 
Note that this last variant appears to match some other implementations of this task, though the algorithm itself conflicts with the specification currently visible at [[wp:Arithmetic_coding#Arithmetic_coding_as_a_generalized_change_of_radix|wikipedia]] in several respects:
 
# The wikipedia description currently specifies "The cumulative frequency is the total of all frequencies below it in a frequency distribution (a running total of frequencies)" but that is not alphabetic order,
# The wikipedia description currently specifies for the "DABDDB" example the encoding "A = 0, B = 1, C = 2, D = 3" but that encoding is clearly not the specified cumulative frequencies.
 
=={{header|Perl}}==
6,951

edits