Run-length encoding: Difference between revisions

From Rosetta Code
Content added Content deleted
m (→‎{{header|Smalltalk}}: a note about a "feature" (without example))
m (→‎{{header|C}}: a note)
Line 113: Line 113:
}
}
}</lang>
}</lang>

'''Final note''': since the repeat counter value 0 has no meaning, it could be used as it would be 256, so extending by one the maximum number of repetitions representable with a single byte; or instead it could be used as a special marker to encode in a more efficient way (long) sequences of ''isolated characters'', e.g. "ABCDE" would be encoded as "1A1B1C1D1E"; it could be instead encoded as "05ABCDE".


=={{header|Haskell}}==
=={{header|Haskell}}==

Revision as of 14:55, 24 April 2009

This page uses content from Wikipedia. The original article was at Run-length_encoding. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)
Task
Run-length encoding
You are encouraged to solve this task according to the task description, using any language you may know.

Given a string containing uppercase characters (A-Z), compress repeated 'runs' of the same character by storing the length of that run, and provide a function to reverse the compression. The output can be anything, as long as you can recreate the input with it.

Example:

Input: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
Output: 12W1B12W3B24W1B14W

C

These functions have no check for the size of the output buffers.

Encoding function

Since repeat counter must fit a single byte in this implementation, it can't be greater than 255, so a byte repeated more than 255 times generates in the compressed stream more than 2 bytes (4 bytes if the length of the repeated byte sequence is less than 510 and so on)

<lang c>int rle_encode(char *out, const char *in, int l) {

 int dl, i;
 char cp, c;
 for(cp=c= *in++, i = 0, dl=0; l>=0 ; c = *in++, l-- ) {
   if ( c == cp ) {
     i++;
     if ( i > 255 ) {

*out++ = 255; *out++ = c; dl += 2; i = 1;

     }
   } else {
     *out++ = i;
     *out++ = cp; dl += 2;
     i = 1;
   }
   cp = c;
 }
 return dl;

}</lang>

Decoding function

<lang c>int rle_decode(char *out, const char *in, int l) {

 int i, j, tb;
 char c;
 for(tb=0 ; l>=0 ; l -= 2 ) {
   i = *in++;
   c = *in++;
   tb += i;
   while(i-- > 0) *out++ = c;
 }
 return tb;

}</lang>

Usage example

<lang c>#include <stdio.h>

  1. include <stdlib.h>
  2. include <string.h>

const char *o = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";

int main() {

 char *d = malloc(strlen(o));
 char *oc = malloc(strlen(o));
 
 int rl = rle_encode(d, o, strlen(o));
 /* fwrite(d, 1, rl, stdout); */
 int ocl = rle_decode(oc, d, rl);
 fwrite(oc, 1, ocl, stdout);
 free(d); free(oc);
 return 0;

}</lang>

In the following codes, encoding and decoding are implementend as "filters" which compress/decompress standard input to standard output writing ASCII strings; they will work as long as the input has no ASCII digits in it, and the compressed/original ratio of a "single group" will be less than or equal to 1 as long as the ASCII decimal representation's length of the repeat counter will be shorter than the length of the "group". It should be so except in the case the group is a single isolated character, e.g. B gives 1B (one byte against two compressed bytes)

Encoding filter

<lang c>#include <stdio.h>

int main() {

 int i, c, cp;
 for(cp=c=getchar(), i = 0; c != EOF; c = getchar() ) {
   if ( c == cp ) {
     i++;
   } else {
     printf("%d%c", i, cp);
     i = 1;
   }
   cp = c;
 }

}</lang>

Decoding filter

<lang c>#include <stdio.h>

int main() {

 int i, c, j;
 while( scanf("%d%c", &i, &c) == 2 ) {
   for(j=0; j < i; j++) printf("%c", c);
 }

}</lang>

Final note: since the repeat counter value 0 has no meaning, it could be used as it would be 256, so extending by one the maximum number of repetitions representable with a single byte; or instead it could be used as a special marker to encode in a more efficient way (long) sequences of isolated characters, e.g. "ABCDE" would be encoded as "1A1B1C1D1E"; it could be instead encoded as "05ABCDE".

Haskell

<lang haskell>-- needs Data.List for 'group' function import Data.List

-- Datatypes type Encoded = [(Int, Char)] -- An encoded String with form [(times, char), ...] type Decoded = String

-- Takes a decoded string and returns an encoded list of tuples rlencode :: Decoded -> Encoded rlencode s = zip (map length splitstr) (map head splitstr)

 where splitstr = group s

-- Takes an encoded list of tuples and returns the associated decoded String rldecode :: Encoded -> Decoded rldecode (s:sx)

 | sx == []  = decodeTuple s
 | otherwise = decodeTuple s ++ rldecode sx
   where repeatchar c n  -- helper function to repeat a character, c, n times
           | n < 1     = []
           | otherwise = c : repeatchar c (n - 1)
         decodeTuple t = repeatchar (snd t) (fst t)

main :: IO () main = do

 -- Get input
 putStrLn "String to encode: "
 input <- getLine
 -- Output encoded and decoded versions of input
 let encoded = rlencode input
     decoded = rldecode encoded
   in putStrLn $ "Encoded: " ++ show encoded ++ "\nDecoded: " ++ show decoded</lang>

Java

<lang java>import java.util.regex.Matcher; import java.util.regex.Pattern;

public class RunLengthEncoding {

public String encode(String source) { StringBuffer dest = new StringBuffer(); for (int i = 0; i < source.length(); i++) { int runLength = 1; while( i+1 < source.length() && source.charAt(i) == source.charAt(i+1) ) { runLength++; i++; } dest.append(runLength); dest.append(source.charAt(i)); } return dest.toString(); }


public String decode(String source) { StringBuffer dest = new StringBuffer(); Pattern pattern = Pattern.compile("[0-9]+|[a-zA-Z]"); Matcher matcher = pattern.matcher(source);

while (matcher.find()) { int number = Integer.parseInt(matcher.group()); matcher.find(); while (number-- != 0) { dest.append(matcher.group()); } } return dest.toString(); }

public static void main(String[] args) {

            RunLengthEncoding RLE = new RunLengthEncoding();
            String example = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";
            System.out.println(RLE.encode(example));
            System.out.println(RLE.decode("1W1B1W1B1W1B1W1B1W1B1W1B1W1B"));

} }</lang>

Lisp

<lang lisp>(defun group-similar (sequence &key (test 'eql))

 (loop for x in (rest sequence)
       with temp = (subseq sequence 0 1)
       if (funcall test (first temp) x)
         do (push x temp)
       else
         collect temp
         and do (setf temp (list x))))

(defun run-length-encode (sequence)

 (mapcar (lambda (group) (list (first group) (length group)))
         (group-similar (coerce sequence 'list))))

(defun run-length-decode (sequence)

 (reduce (lambda (s1 s2) (concatenate 'simple-string s1 s2))
         (mapcar (lambda (elem)
                   (make-string (second elem)
                                :initial-element
                                (first elem)))
                 sequence)))

(run-length-encode "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW") (run-length-decode '((#\W 12) (#\B 1) (#\W 12) (#\B 3) (#\W 24) (#\B 1)))</lang>

Python

<lang python>def encode(input_string):

   count = 1
   prev = 
   lst = []
   for character in input_string:
       if character != prev:
           if prev:
               entry = (prev,count)
               lst.append(entry)
               #print lst
           count = 1
           prev = character
       elif character == prev:
           count += 1
   else:
       entry = (character,count)
       lst.append(entry)
   return lst


def decode(lst):

   q = []
   for i in range(len(lst)):
       z = 1
       while z <= lst[i][1]:
           r = lst[i][0]
           q.append(r)
           z += 1
   return "".join(q)
  1. Method call

encode("aaaaahhhhhhmmmmmmmuiiiiiiiaaaaaa") decode([('a', 5), ('h', 6), ('m', 7), ('u', 1), ('i', 7), ('a', 6)])</lang>

Smalltalk

Works with: GNU Smalltalk

The class RunLengthEnc holds a representation of a run length encoded sequence of objects.

<lang smalltalk>Object subclass: RunLengthEnc [

 |counters contents|
 <category: 'Compression'>
 <comment: 'My instances are similar to a Bag, except

that items are ordered and counted iff they are adjacent. So that this instance keeps a representation of the added items suitable for performing a RunLengthEncoding, hence the name.'>

 RunLengthEnc class >> new [ ^self basicNew initialize ]
 size [ ^counters size ]
 add: anObject [ ^(self add: anObject withCount: 1) ]
 add: anObject withCount: anInt [
   anObject isNil
       ifTrue: [ 
           SystemExceptions.InvalidArgument signalOn: anObject
             reason: 'RunLengthEnc encodes existing objects, e.g. integers or characters, not nil' 
       ].
   (self size) > 0
   ifTrue: [
     (contents last) == anObject
         ifTrue: [
            self incrementLastCounter: anInt.
         ]
         ifFalse: [

self appendObject: anObject withCount: anInt

         ]
   ] ifFalse: [ self appendObject: anObject withCount: anInt ].
   ^anObject
 ]
 initialize [
   counters := OrderedCollection new.
   contents := OrderedCollection new.
 ]
 appendObject: anObject withCount: anInt [
   contents addLast: anObject.
   counters addLast: anInt
 ]
 appendObject: anObject [
   contents addLast: anObject.
   counters addLast: 1
 ]
 incrementLastCounter: howMuch [ | c |
   c := counters removeLast.
   counters addLast: (c+howMuch)
 ]
 "the 'do:' can be used to let the user store the compressed 'stream' as s/he
  prefers, while 'add:withCount:' can be used to rebuild the informations from
  the custom storage" 
 do: aBlock [
   1 to: (counters size) do: [ :i | | l |
     aBlock value: (contents at: i) value: (counters at: i)
   ]
 ]
 asOrderedCollection [ |oc|
   oc := OrderedCollection new.
   self do: [ :o :c |
     1 to: c do: [ :i| oc add: o ]
   ].
   ^oc
 ]
 printOn: aStream [
     "output a representation of the object:
      counter [object] ... for every object"
     1 to: (counters size) do: [ :i |
        (counters at: i) printOn: aStream.

aStream nextPut: (Character value: 32). (contents at: i) printOn: aStream. aStream nextPut: (Character nl).

     ]
 ]  
 asString [ |oc| 
   "'build' a string from the run length encoded objects;
    works only if objects are Characters or Strings"
   oc := self asOrderedCollection.
   ^(oc asString)
 ]

].</lang>

The following extension to the OrderedCollection class allows to run length encode an ordered collection (theoretically of any objects' kind, but the RunLengthEnc class is supposed to work with characters mainly).

<lang smalltalk>OrderedCollection extend [

  asRunLengthEnc [ |rc|
      rc := RunLengthEnc new.
      self do: [ :o |
         rc add: o
      ].
      ^rc
  ]

].</lang>

The following extention to the String class allows to run length encode a string (it is basically a shortcut for aString asOrderedCollection asRunLengthEnc).

<lang smalltalk>String extend [

  asRunLengthEnc [ ^self asOrderedCollection asRunLengthEnc ]

].</lang>


Usage examples

<lang smalltalk>|cs s os|

s := 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'.

"let us run length encode the string" cs := s asRunLengthEnc. cs print. "this shows an ASCII representation of the run length encoded objects collection;

          in this case, of the string"

"let us show that the class is able to return the string back; this really works

iff the objects of the collection are strings or characters"

cs asString displayNl.</lang>

The class does not mandate a way of storing itself into a file that can be loaded later. The following sample code shows how it could be done quickly, but not efficiently from the point of view of a compressor.

<lang smalltalk>|f| "let's write the object and its informations to a file..." f := FileStream open: 'rledump' mode: FileStream write. ObjectDumper dump: cs to: f. f close.

"... and let's read it back" |o| f := FileStream open: 'rledump' mode: FileStream read. o := ObjectDumper loadFrom: f. o print. "show that the object holds the same informations of cs" f close.</lang>