Move-to-front algorithm: Difference between revisions
(→{{header|Icon}} and {{header|Unicon}}: Marked incomplete.) |
(→{{header|Python}}: Add more functional version. (but less clear).) |
||
Line 500: | Line 500: | ||
=={{header|Python}}== |
=={{header|Python}}== |
||
===Python: Procedural=== |
|||
<lang python>from __future__ import print_function |
<lang python>from __future__ import print_function |
||
from string import ascii_lowercase |
from string import ascii_lowercase |
||
Line 533: | Line 535: | ||
'bananaaa' encodes to [1, 1, 13, 1, 1, 1, 0, 0], which decodes back to 'bananaaa' |
'bananaaa' encodes to [1, 1, 13, 1, 1, 1, 0, 0], which decodes back to 'bananaaa' |
||
'hiphophiphop' encodes to [7, 8, 15, 2, 15, 2, 2, 3, 2, 2, 3, 2], which decodes back to 'hiphophiphop'</pre> |
'hiphophiphop' encodes to [7, 8, 15, 2, 15, 2, 2, 3, 2, 2, 3, 2], which decodes back to 'hiphophiphop'</pre> |
||
===Python: Functional=== |
|||
From the procedural version note that through the encoding or decoding loops both the final output is accumulated as well as the symbol table being rearranged.<br> |
|||
For the functional forms a 2-item list of output to be accumulated and symboltable manipulation is calculated then only the former accumulated as the later works to transform the symbol table in-place. |
|||
<lang python>def m2f_e(s, st): |
|||
return [[st.index(ch), st.insert(0, st.pop(st.index(ch)))][0] for ch in s] |
|||
def m2f_d(sq, st): |
|||
return ''.join([st[i], st.insert(0, st.pop(i))][0] for i in sq) |
|||
ST = list('abcdefghijklmnopqrstuvwxyz') |
|||
for s in ['broood', 'bananaaa', 'hiphophiphop']: |
|||
encode = m2f_e(s, ST[::]) |
|||
print('%14r encodes to %r' % (s, encode), end=', ') |
|||
decode = m2f_d(encode, ST[::]) |
|||
print('decodes back to %r' % decode) |
|||
assert s == decode, 'Whoops!'</lang> |
|||
{{out}} |
|||
Similar to that of the procedural version above. |
|||
=={{header|REXX}}== |
=={{header|REXX}}== |
Revision as of 07:08, 26 May 2014
You are encouraged to solve this task according to the task description, using any language you may know.
Given a symbol table of a zero-indexed array of all possible input symbols
this algorithm reversibly transforms a sequence
of input symbols into an array of output numbers (indices).
The transform in many cases acts to give frequently repeated input symbols
lower indices which is useful in some compression algorithms.
- Encoding algorithm
for each symbol of the input sequence: output the index of the symbol in the symbol table move that symbol to the front of the symbol table
- Decoding algorithm
# Using the same starting symbol table for each index of the input sequence: output the symbol at that index of the symbol table move that symbol to the front of the symbol table
- Example
Encoding the string of character symbols 'broood' using a symbol table of the characters 'a'-to-'z'
Input | Output | SymbolTable |
---|---|---|
broood | 1 | 'abcdefghijklmnopqrstuvwxyz' |
broood | 1 17 | 'bacdefghijklmnopqrstuvwxyz' |
broood | 1 17 15 | 'rbacdefghijklmnopqstuvwxyz' |
broood | 1 17 15 0 | 'orbacdefghijklmnpqstuvwxyz' |
broood | 1 17 15 0 0 | 'orbacdefghijklmnpqstuvwxyz' |
broood | 1 17 15 0 0 5 | 'orbacdefghijklmnpqstuvwxyz' |
Decoding the indices back to the original symbol order:
Input | Output | SymbolTable |
---|---|---|
1 17 15 0 0 5 | b | 'abcdefghijklmnopqrstuvwxyz' |
1 17 15 0 0 5 | br | 'bacdefghijklmnopqrstuvwxyz' |
1 17 15 0 0 5 | bro | 'rbacdefghijklmnopqstuvwxyz' |
1 17 15 0 0 5 | broo | 'orbacdefghijklmnpqstuvwxyz' |
1 17 15 0 0 5 | brooo | 'orbacdefghijklmnpqstuvwxyz' |
1 17 15 0 0 5 | broood | 'orbacdefghijklmnpqstuvwxyz' |
- Task
- Encode and decode the following three strings of characters using the symbol table of the
characters 'a'-to-'z' as above.
- Show the strings and their encoding here.
- Add a check to ensure that the decoded string is the same as the original.
The strings are: 'broood', 'bananaaa', and 'hiphophiphop'. (Note the spellings).
ALGOL 68
<lang algol68># initial value for the symbol table # []CHAR initial table = "abcdefghijklmnopqrstuvwxyz";
- move the character at text pos to the front of text #
- note text pos is based from 0 #
PROC move to front = ( STRING text, INT text pos )STRING:
IF text pos < 1 THEN # the character is already at the front (or not in the string) # text ELSE # the character isn't already at the front - construct the new string #
INT char pos = LWB text + text pos;
STRING result := text[ char pos : char pos ] + text[ : char pos - 1 ] ;
IF char pos < UPB text THEN # the character being moved isn't the last character # result +:= text[ char pos + 1 : ] FI;
result FI;
- encode a symbol #
PROC encode = ( STRING text )[]INT: BEGIN
[ 1 : ( UPB text - LWB text ) + 1 ]INT result;
# dynamic symbol table # STRING symbol table := initial table;
FOR text pos FROM LWB text TO UPB text DO INT symbol pos := 0;
result[ text pos ] := IF char in string( text[ text pos ], symbol pos, symbol table ) THEN # the character is in the symbol table at symbol pos # # (indexed from LWB text) - we store the positions # # indexed from 0 # symbol pos - LWB text ELSE # the character isn't in the symbol table # -1 FI;
# modify the symbol table so the latest character is at the front # symbol table := move to front( symbol table, result[ text pos ] )
OD;
result
END; # encode #
- decode a symbol #
PROC decode = ( []INT encoded )STRING: BEGIN
STRING result := "";
# dynamic symbol table # STRING symbol table := initial table;
FOR text pos FROM LWB encoded TO UPB encoded DO result +:= IF encoded[ text pos ] < 0 THEN # the encoded character wasn't in the string # "?" ELSE # the character is in the symbol table # symbol table[ encoded[ text pos ] + LWB symbol table ] FI;
# modify the symbol table so the latest character is at the front # symbol table := move to front( symbol table, encoded[ text pos ] )
OD;
result
END; # decode #
- routine to test the encode and decode routines #
PROC test encode and decode = ( STRING text )VOID: BEGIN
# procedure to format the encoded value # PROC format encoded value = ( []INT values )STRING: BEGIN STRING result := ""; FOR value pos FROM LWB values TO UPB values DO result +:= ( " " + whole( values[ value pos ], 0 ) ) OD;
result END; # format encoded value #
[]INT encoded = encode( text ); STRING decoded = decode( encoded );
print( ( ( text + " encodes to [" + format encoded value( encoded ) + " ] which " + IF text = decoded THEN "correctly" ELSE "INCORRECTLY" FI + " decodes to """ + decoded + """" ) , newline ) )
END; # test encode and decode #
main: (
test encode and decode( "broood" ) ; test encode and decode( "bananaaa" ) ; test encode and decode( "hiphophiphop" )
- ; test encode and decode( "abcdefghijklmnopqrstuvwxyz" ) #
- ; test encode and decode( "zyxwvutsrqponmlkjihgfedcba" ) #
) </lang>
- Output:
broood encodes to [ 1 17 15 0 0 5 ] which correctly decodes to "broood" bananaaa encodes to [ 1 1 13 1 1 1 0 0 ] which correctly decodes to "bananaaa" hiphophiphop encodes to [ 7 8 15 2 15 2 2 3 2 2 3 2 ] which correctly decodes to "hiphophiphop"
D
<lang d>import std.stdio, std.string, std.ascii, std.algorithm;
ptrdiff_t[] mtfEncoder(in string data) pure nothrow @safe in {
assert(data.countchars("a-z") == data.length);
} out(result) {
assert(result.length == data.length); assert(result.all!(e => e >= 0 && e < lowercase.length));
} body {
ubyte[lowercase.length] order = lowercase.representation; auto encoded = new typeof(return)(data.length);
size_t i = 0; foreach (immutable b; data) { immutable j = encoded[i++] = order[].countUntil(b); bringToFront(order[0 .. j], order[j .. j + 1]); }
return encoded;
}
string mtfDecoder(in ptrdiff_t[] encoded) pure nothrow @safe in {
assert(encoded.all!(e => e >= 0 && e < lowercase.length));
} out(result) {
assert(result.length == encoded.length); assert(result.countchars("a-z") == result.length);
} body {
ubyte[lowercase.length] order = lowercase.representation; auto decoded = new char[encoded.length];
size_t i = 0; foreach (immutable code; encoded) { decoded[i++] = order[code]; bringToFront(order[0 .. code], order[code .. code + 1]); }
return decoded;
}
void main() {
foreach (immutable word; ["broood", "bananaaa", "hiphophiphop"]) { immutable encoded = word.mtfEncoder; immutable decoded = encoded.mtfDecoder; writefln("'%s' encodes to %s, which decodes back to '%s'", word, encoded, decoded); assert(word == decoded); }
}</lang>
- Output:
'broood' encodes to [1, 17, 15, 0, 0, 5], which decodes back to 'broood' 'bananaaa' encodes to [1, 1, 13, 1, 1, 1, 0, 0], which decodes back to 'bananaaa' 'hiphophiphop' encodes to [7, 8, 15, 2, 15, 2, 2, 3, 2, 2, 3, 2], which decodes back to 'hiphophiphop'
Go
<lang go>package main
import (
"bytes" "fmt"
)
type moveToFront string
func (symbols moveToFront) encode(s string) []int {
seq := make([]int, len(s)) pad := []byte(symbols) c1 := []byte{0} for i := 0; i < len(s); i++ { c := s[i] c1[0] = c x := bytes.Index(pad, c1) seq[i] = x copy(pad[1:], pad[:x]) pad[0] = c } return seq
} func (symbols moveToFront) decode(seq []int) string {
chars := make([]byte, len(seq)) pad := []byte(symbols) for i, x := range seq { c := pad[x] chars[i] = c copy(pad[1:], pad[:x]) pad[0] = c } return string(chars)
}
func main() {
m := moveToFront("abcdefghijklmnopqrstuvwxyz") for _, s := range []string{"broood", "bananaaa", "hiphophiphop"} { enc := m.encode(s) dec := m.decode(enc) fmt.Println(s, enc, dec) if dec != s { panic("Whoops!") } }
}</lang>
- Output:
broood [1 17 15 0 0 5] broood bananaaa [1 1 13 1 1 1 0 0] bananaaa hiphophiphop [7 8 15 2 15 2 2 3 2 2 3 2] hiphophiphop
Icon and Unicon
Works in both languages: <lang unicon>procedure main(A)
every writes(s := !A, " -> [") do { every writes(!(enc := encode(&lcase,s))," ") write("] -> ",decode(&lcase,enc)) }
end
procedure encode(m,s)
enc := [] every c := !s do { m ?:= reorder(tab(i := upto(c)),move(1),tab(0)) put(enc,i-1) # Strings are 1-based } return enc
end
procedure decode(m,enc)
dec := "" every i := 1 + !enc do { # Lists are 1-based dec ||:= m[i] m ?:= reorder(tab(i),move(1),tab(0)) } return dec
end
procedure reorder(s1,s2,s3)
return s2||s1||s3
end</lang>
Sample run:
->mtfa broood bananaaa hiphophiphop broood -> [1 17 15 0 0 5 ] -> broood bananaaa -> [1 1 13 1 1 1 0 0 ] -> bananaaa hiphophiphop -> [7 8 15 2 15 2 2 3 2 2 3 2 ] -> hiphophiphop ->
J
<lang J>spindizzy=:3 :0
'seq table'=. y ndx=.$0 orig=. table for_sym. seq do. ndx=.ndx,table i.sym table=. sym,table-.sym end. ndx assert. seq-:yzzidnips ndx;orig
)
yzzidnips=:3 :0
'ndx table'=. y seq=. for_n. ndx do. seq=.seq,sym=. n{table table=. sym,table-.sym end. seq
)</lang>
Required examples:
<lang J> spindizzy 'broood';'abcdefghijklmnopqrstuvwxyz' 1 17 15 0 0 5
spindizzy 'bananaaa';'abcdefghijklmnopqrstuvwxyz'
1 1 13 1 1 1 0 0
spindizzy 'hiphophiphop';'abcdefghijklmnopqrstuvwxyz'
7 8 15 2 15 2 2 3 2 2 3 2</lang>
It's not clear, though, why anyone would think that this is any better than lookups against an unmodified symbol table.
PL/I
<lang pli>*process source attributes xref or(!);
/********************************************************************* * 25.5.2014 Walter Pachl translated from REXX *********************************************************************/ ed: Proc Options(main); Call enc_dec('broood'); Call enc_dec('bananaaa'); Call enc_dec('hiphophiphop');
enc_dec: Proc(in); Dcl in Char(*); Dcl out Char(20) Var Init(); Dcl st Char(26) Init('abcdefghijklmnopqrstuvwxyz'); Dcl sta Char(26) Init((st)); Dcl enc(20) Bin Fixed(31); Dcl encn Bin Fixed(31) Init(0); Dcl (i,p.k) Bin Fixed(31); Dcl c Char(1); Do i=1 To length(in); c=substr(in,i,1); p=index(st,c); encn+=1; enc(encn)=p-1; st=c!!left(st,p-1)!!substr(st,p+1); End; Put Skip List(' in='!!in); Put Skip List('sta='!!sta!!' original symbol table'); Put Skip Edit('enc=',(enc(i) do i=1 To encn))(a,20(F(3))); Put Skip List(' st='!!st!!' symbol table after encoding'); Do i=1 To encn; k=enc(i)+1; out=out!!substr(sta,k,1); sta=substr(sta,k,1)!!left(sta,k-1)!!substr(sta,k+1); End; Put Skip List('out='!!out); Put Skip List(' '); If out=in Then; Else Put Skip List('all wrong!!'); End;</lang>
- Output:
in=broood sta=abcdefghijklmnopqrstuvwxyz original symbol table enc= 1 17 15 0 0 5 st=dorbacefghijklmnpqstuvwxyz symbol table after encoding out=broood in=bananaaa sta=abcdefghijklmnopqrstuvwxyz original symbol table enc= 1 1 13 1 1 1 0 0 st=anbcdefghijklmopqrstuvwxyz symbol table after encoding out=bananaaa in=hiphophiphop sta=abcdefghijklmnopqrstuvwxyz original symbol table enc= 7 8 15 2 15 2 2 3 2 2 3 2 st=pohiabcdefgjklmnqrstuvwxyz symbol table after encoding out=hiphophiphop
Python
Python: Procedural
<lang python>from __future__ import print_function from string import ascii_lowercase
SYMBOLTABLE = list(ascii_lowercase)
def move2front_encode(strng, symboltable):
sequence, pad = [], symboltable[::] for char in strng: indx = pad.index(char) sequence.append(indx) pad = [pad.pop(indx)] + pad return sequence
def move2front_decode(sequence, symboltable):
chars, pad = [], symboltable[::] for indx in sequence: char = pad[indx] chars.append(char) pad = [pad.pop(indx)] + pad return .join(chars)
if __name__ == '__main__':
for s in ['broood', 'bananaaa', 'hiphophiphop']: encode = move2front_encode(s, SYMBOLTABLE) print('%14r encodes to %r' % (s, encode), end=', ') decode = move2front_decode(encode, SYMBOLTABLE) print('which decodes back to %r' % decode) assert s == decode, 'Whoops!'</lang>
- Output:
'broood' encodes to [1, 17, 15, 0, 0, 5], which decodes back to 'broood' 'bananaaa' encodes to [1, 1, 13, 1, 1, 1, 0, 0], which decodes back to 'bananaaa' 'hiphophiphop' encodes to [7, 8, 15, 2, 15, 2, 2, 3, 2, 2, 3, 2], which decodes back to 'hiphophiphop'
Python: Functional
From the procedural version note that through the encoding or decoding loops both the final output is accumulated as well as the symbol table being rearranged.
For the functional forms a 2-item list of output to be accumulated and symboltable manipulation is calculated then only the former accumulated as the later works to transform the symbol table in-place.
<lang python>def m2f_e(s, st):
return [[st.index(ch), st.insert(0, st.pop(st.index(ch)))][0] for ch in s]
def m2f_d(sq, st):
return .join([st[i], st.insert(0, st.pop(i))][0] for i in sq)
ST = list('abcdefghijklmnopqrstuvwxyz') for s in ['broood', 'bananaaa', 'hiphophiphop']:
encode = m2f_e(s, ST[::]) print('%14r encodes to %r' % (s, encode), end=', ') decode = m2f_d(encode, ST[::]) print('decodes back to %r' % decode) assert s == decode, 'Whoops!'</lang>
- Output:
Similar to that of the procedural version above.
REXX
<lang rexx>/* REXX ***************************************************************
- 25.05.2014 Walter Pachl
- REXX strings start with position 1
- /
Call enc_dec 'broood' Call enc_dec 'bananaaa' Call enc_dec 'hiphophiphop' Exit enc_dec: Procedure Parse Arg in st='abcdefghijklmnopqrstuvwxyz' sta=st /* remember this for decoding */ enc= Do i=1 To length(in)
c=substr(in,i,1) p=pos(c,st) enc=enc (p-1) st=c||left(st,p-1)substr(st,p+1) End
Say ' in='in Say 'sta='sta 'original symbol table' Say 'enc='enc Say ' st='st 'symbol table after encoding' out= Do i=1 To words(enc)
k=word(enc,i)+1 out=out||substr(sta,k,1) sta=substr(sta,k,1)left(sta,k-1)substr(sta,k+1) End
Say 'out='out Say ' ' If out==in Then Nop Else
Say 'all wrong!!'
Return </lang>
- Output:
in=broood sta=abcdefghijklmnopqrstuvwxyz original symbol table enc= 1 17 15 0 0 5 st=dorbacefghijklmnpqstuvwxyz symbol table after encoding out=broood in=bananaaa sta=abcdefghijklmnopqrstuvwxyz original symbol table enc= 1 1 13 1 1 1 0 0 st=anbcdefghijklmopqrstuvwxyz symbol table after encoding out=bananaaa in=hiphophiphop sta=abcdefghijklmnopqrstuvwxyz original symbol table enc= 7 8 15 2 15 2 2 3 2 2 3 2 st=pohiabcdefgjklmnqrstuvwxyz symbol table after encoding out=hiphophiphop
zkl
<lang zkl>fcn encode(text){ //-->List
st:=["a".."z"].aggregate(String).walk().toData().copy(); //"abcd..z" text.reduce(fcn(st,c,sink){ n:=st.index(c); sink.write(n); st.del(n).insert(0,c); },st,sink:=L()); sink;
}</lang> String are immutable so we create a bit bucket (which is mutable) to hold the symbol table which can then be modified in place. <lang zkl>fcn decode(list){ //-->String
st:=["a".."z"].aggregate(String).walk(); //"abcd..z" sink:=Sink(String); list.reduce('wrap(st,n){ c:=st[n]; sink.write(c); c+st.del(n); },st); sink.close();
}></lang> Here, we create a new symbol table each round as we would have to convert the byte we got from the bit bucket to string (so it is a wash garbage wise). <lang zkl>texts:=T("broood","bananaaa","hiphophiphop"); out:=texts.apply(encode); texts.zipWith(fcn(t,e){ println(t,"-->",e) },out);
out.apply(decode).println(); texts.zipWith('==,out.apply(decode)).println();</lang>
- Output:
broood-->L(1,17,15,0,0,5) bananaaa-->L(1,1,13,1,1,1,0,0) hiphophiphop-->L(7,8,15,2,15,2,2,3,2,2,3,2) L("broood","bananaaa","hiphophiphop") L(True,True,True)