Run-length encoding: Difference between revisions

Content added Content deleted

Inline

Revision as of 01:52, 9 June 2010

This page uses content from Wikipedia. The original article was at Run-length_encoding. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

Given a string containing uppercase characters (A-Z), compress repeated 'runs' of the same character by storing the length of that run, and provide a function to reverse the compression. The output can be anything, as long as you can recreate the input with it.

Example:

Input: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

Output: 12W1B12W3B24W1B14W

Note: the encoding step in the above example is the same as a step of the Look-and-say sequence.

Ada

<lang Ada>with Ada.Text_IO; use Ada.Text_IO; with Ada.Strings.Fixed; use Ada.Strings.Fixed; procedure Test_Run_Length_Encoding is

  function Encode (Data : String) return String is
  begin
     if Data'Length = 0 then
        return "";
     else
        declare
           Code  : constant Character := Data (Data'First);
           Index : Integer := Data'First + 1;
        begin
           while Index <= Data'Last and then Code = Data (Index) loop
              Index := Index + 1;
           end loop;
           declare
              Prefix : constant String := Integer'Image (Index - Data'First);
           begin
              return Prefix (2..Prefix'Last) & Code & Encode (Data (Index..Data'Last));
           end;
        end;
     end if;
  end Encode;
  function Decode (Data : String) return String is
  begin
     if Data'Length = 0 then
        return "";
     else
        declare
           Index : Integer := Data'First;
           Count : Natural := 0;
        begin
           while Index < Data'Last and then Data (Index) in '0'..'9' loop
              Count := Count * 10 + Character'Pos (Data (Index)) - Character'Pos ('0');
              Index := Index + 1;
           end loop;
           if Index > Data'First then
              return Count * Data (Index) & Decode (Data (Index + 1..Data'Last));
           else
              return Data;
           end if;
        end;
     end if;
  end Decode;

begin

  Put_Line (Encode ("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"));
  Put_Line (Decode ("12W1B12W3B24W1B14W"));

end Test_Run_Length_Encoding;</lang> Sample output:

12W1B12W3B24W1B14W
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

AWK

Works with: gawk

It works with "textual" input. Lines containing numbers are skipped, since they can't be represented in a not ambiguous way in this implementation (e.g. "11AA" would be encoded as "212A", which would be decoded as A repeated 212 times!)

Encoding

<lang awk>BEGIN {

FS=""

} /^[^0-9]+$/ {

 cp = $1; j = 0
 for(i=1; i <= NF; i++) {
   if ( $i == cp ) {
     j++; 
   } else {
     printf("%d%c", j, cp)
     j = 1
   }
   cp = $i
 }
 printf("%d%c", j, cp)

}</lang>

Decoding

<lang awk>BEGIN {

 RS="[0-9]+[^0-9]"
 final = "";

} {

 match(RT, /([0-9]+)([^0-9])/, r)
 for(i=0; i < int(r[1]); i++) {
   final = final r[2]
 }

} END {

 print final

}</lang>

ALGOL 68

Works with: ALGOL 68 version Revision 1 - no extensions to language used

Works with: ALGOL 68G version Any - tested with release 1.18.0-9h.tiny

Works with: ELLA ALGOL 68 version Any (with appropriate job cards) - tested with release 1.8-8d

Note: The following uses iterators, eliminating the need of declaring arbitrarily large CHAR arrays for caching. <lang algol68>STRING input := "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"; STRING output := "12W1B12W3B24W1B14W";

MODE YIELDCHAR = PROC(CHAR)VOID; MODE GENCHAR = PROC(YIELDCHAR)VOID;

PROC gen char string = (REF STRING s, YIELDCHAR yield)VOID:

 FOR i FROM LWB s TO UPB s DO yield(s[i]) OD;

CO

Note: The following 2 lines use currying. This not supported by ELLA ALGOL 68RS #

GENCHAR input seq = gen char string(input,),

       output seq = gen char string(output,);

END CO

GENCHAR

 input seq = (YIELDCHAR yield)VOID: gen char string(input, yield),
 output seq = (YIELDCHAR yield)VOID: gen char string(output, yield);

PROC gen encode = (GENCHAR gen char, YIELDCHAR yield)VOID: (

 INT count := 0;
 CHAR prev;

FOR CHAR c IN # gen char( # ) DO ( #
1. (CHAR c)VOID: (

     IF count = 0 THEN
       count := 1;
       prev := c
     ELIF c NE prev THEN
       STRING str count := whole(count,0);
       gen char string(str count, yield); count := 1;
       yield(prev); prev := c
     ELSE
       count +:=1
     FI

OD # ));

 IF count NE 0 THEN
   STRING str count := whole(count,0);
   gen char string(str count,yield);
   yield(prev)
 FI

);

STRING zero2nine = "0123456789";

PROC gen decode = (GENCHAR gen char, YIELDCHAR yield)VOID: (

 INT repeat := 0;

FOR CHAR c IN # gen char( # ) DO ( #
1. (CHAR c)VOID: (

   IF char in string(c, LOC INT, zero2nine) THEN
     repeat := repeat*10 + ABS c - ABS "0"
   ELSE
     FOR i TO repeat DO yield(c) OD;
     repeat := 0
   FI

OD # ))

);

iterate through input string #

print("Encode input: ");

FOR CHAR c IN # gen encode(input seq, # ) DO ( #
1. (CHAR c)VOID:

   print(c)

OD # );

print(new line);

iterate through output string #

print("Decode output: ");

FOR CHAR c IN # gen decode(output seq, # ) DO ( #
1. (CHAR c)VOID:

   print(c)

OD # );

print(new line)</lang> Output:

Encode input: 12W1B12W3B24W1B14W
Decode output: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

AutoHotkey

<lang AutoHotkey>MsgBox % key := rle_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW") MsgBox % rle_decode(key)

rle_encode(message) {

 StringLeft, previous, message, 1
 StringRight, last, message, 1
 message .= Asc(Chr(last)+1)
 count = 0
 Loop, Parse, message
 {
   If (previous == A_LoopField)
     count +=1
   Else
   {
     output .= previous . count
     previous := A_LoopField 
     count = 1
   }
 }
 Return output

}

rle_decode(message) {

 pos = 1
 While, item := RegExMatch(message, "\D", char, pos)
 {
   digpos := RegExMatch(message, "\d+", dig, item)
   Loop, % dig
     output .= char
   pos := digpos 
 }
 Return output

}</lang>

BASIC

Works with: QBasic

Translation of: PowerBASIC

<lang qbasic>DECLARE FUNCTION RLDecode$ (i AS STRING) DECLARE FUNCTION RLEncode$ (i AS STRING)

DIM initial AS STRING, encoded AS STRING, decoded AS STRING

INPUT "Type something: ", initial encoded = RLEncode(initial) decoded = RLDecode(encoded) PRINT initial PRINT encoded PRINT decoded

FUNCTION RLDecode$ (i AS STRING)

   DIM Loop0 AS LONG, rCount AS STRING, outP AS STRING, m AS STRING

   FOR Loop0 = 1 TO LEN(i)
       m = MID$(i, Loop0, 1)
       SELECT CASE m
           CASE "0" TO "9"
               rCount = rCount + m
           CASE ELSE
               IF LEN(rCount) THEN
                   outP = outP + STRING$(VAL(rCount), m)
                   rCount = ""
               ELSE
                   outP = outP + m
               END IF
       END SELECT
   NEXT
   RLDecode$ = outP

END FUNCTION

FUNCTION RLEncode$ (i AS STRING)

   DIM tmp1 AS STRING, tmp2 AS STRING, outP AS STRING
   DIM Loop0 AS LONG, rCount AS LONG

   tmp1 = MID$(i, 1, 1)
   tmp2 = tmp1
   rCount = 1

   FOR Loop0 = 2 TO LEN(i)
       tmp1 = MID$(i, Loop0, 1)
       IF tmp1 <> tmp2 THEN
           outP = outP + LTRIM$(RTRIM$(STR$(rCount))) + tmp2
           tmp2 = tmp1
           rCount = 1
       ELSE
           rCount = rCount + 1
       END IF
   NEXT

   outP = outP + LTRIM$(RTRIM$(STR$(rCount)))
   outP = outP + tmp2
   RLEncode$ = outP

END FUNCTION</lang>

Sample output (last one shows errors from using numbers in input string):

Type something: aaaaeeeeeeiiiioooouuy
aaaaeeeeeeiiiioooouuy
4a6e4i4o2u1y
aaaaeeeeeeiiiioooouuy

Type something: My dog has fleas.
My dog has fleas.
1M1y1 1d1o1g1 1h1a1s1 1f1l1e1a1s1.
My dog has fleas.

Type something: 1r
1r
111r
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr

C

See Run-length encoding/C

C++

<lang cpp>#include <iostream>

include <string>
include <sstream>
include <boost/regex.hpp>
include <cstdlib>

std::string encode ( const std::string & ) ; std::string decode ( const std::string & ) ;

int main( ) {

  std::string to_encode ( "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" ) ;
  std::cout << to_encode << " encoded:" << std::endl ;
  std::string encoded ( encode ( to_encode ) ) ;
  std::cout << encoded << std::endl ;
  std::string decoded ( decode( encoded ) ) ;
  std::cout << "Decoded again:\n" ;
  std::cout << decoded << std::endl ;
  if ( to_encode == decoded ) 
     std::cout << "It must have worked!\n" ;
  return 0 ;

}

std::string encode( const std::string & to_encode ) {

  std::string::size_type found = 0 , nextfound = 0 ;
  std::ostringstream oss ;
  nextfound = to_encode.find_first_not_of( to_encode[ found ] , found ) ;
  while ( nextfound != std::string::npos ) {
     oss << nextfound - found ;
     oss << to_encode[ found ] ;
     found = nextfound ;
     nextfound = to_encode.find_first_not_of( to_encode[ found ] , found ) ;
  }
  //since we must not discard the last characters we add them at the end of the string
  std::string rest ( to_encode.substr( found ) ) ;//last run of characters starts at position found 
  oss << rest.length( ) << to_encode[ found ] ;
  return oss.str( ) ;

}

std::string decode ( const std::string & to_decode ) {

  boost::regex e ( "(\\d+)(\\w)" ) ;
  boost::match_results<std::string::const_iterator> matches ;
  std::ostringstream oss ;
  std::string::const_iterator start = to_decode.begin( ) , end = to_decode.end( ) ;
  while ( boost::regex_search ( start , end , matches , e ) ) {
     std::string numberstring ( matches[ 1 ].first , matches[ 1 ].second ) ;
     int number = atoi( numberstring.c_str( ) ) ;
     std::string character ( matches[ 2 ].first , matches[ 2 ].second ) ;
     for ( int i = 0 ; i < number ; i++ )

oss << character ;

     start = matches[ 2 ].second ;
  }
  return oss.str( ) ;

}</lang>

Clojure

<lang lisp>(defn step [[result, n, c], new] ;function used in encode's reduce call

 (cond
   (zero? n) [result, 1, new]
   (= c new) [result, (inc n), c]
   :else     [(conj result [n c]), 1, new]))

(defn encode [s]

 (let [[result,n,chr] (reduce step [[],0,nil] s)]
   (if (= n 0)
     result
     (conj result [n chr]))))

(defn decode [v]

 (let [expand (fn n c (repeat n c))]
   (apply str (mapcat expand v))))</lang>

<lang lisp>(def uncoded "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW") (def coded [[12 \W] [1 \B] [12 \W] [3 \B] [24 \W] [1 \B] [14 \W]])

(assert (= (encode uncoded) coded)) (assert (= (decode coded) uncoded))</lang>

Common Lisp

<lang lisp>(defun group-similar (sequence &key (test 'eql))

 (loop for x in (rest sequence)
       with temp = (subseq sequence 0 1)
       if (funcall test (first temp) x)
         do (push x temp)
       else
         collect temp
         and do (setf temp (list x))))

(defun run-length-encode (sequence)

 (mapcar (lambda (group) (list (first group) (length group)))
         (group-similar (coerce sequence 'list))))

(defun run-length-decode (sequence)

 (reduce (lambda (s1 s2) (concatenate 'simple-string s1 s2))
         (mapcar (lambda (elem)
                   (make-string (second elem)
                                :initial-element
                                (first elem)))
                 sequence)))

(run-length-encode "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW") (run-length-decode '((#\W 12) (#\B 1) (#\W 12) (#\B 3) (#\W 24) (#\B 1)))</lang>

D

<lang d>import std.stdio; import std.string; void main() {

       char[]rle = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";
       char[]encoded = encode(rle);
       char[]decoded = decode(encoded);
       writefln("\"%s\" == \"%s\", intermediary %s",rle,decoded,encoded);
       assert(rle == decoded);

}

// this is essentially an exact copy of the look and say D function char[]encode(char[]input) {

       char last = input[$-1];
       char[]output;
       int count = 0;
       foreach_reverse(i;input) {
               if (i == last) {
                       count++;
               } else {
                       output = toString(count)~last~output;
                       count = 1;
                       last = i;
               }
       }
       output = toString(count)~last~output;
       return output;

}

char[]decode(char[]input) {

       char[]i = "";
       char[]ret;
       foreach(letter;input) {
               if (letter <= 'Z' && letter >= 'A') {
                       // this is the letter to be repeated
                       if (!i.length) throw new Exception("Can not repeat a letter without a number of repetitions");
                       ret ~= repeat([letter],atoi(i));
                       i = null;
               } else if (letter <= '9' && letter >= '0') {
                       // this is a digit to mark the number of repetitions
                       i ~= letter;
               } else {
                       throw new Exception("'"~letter~"' is not capalphanumeric");
               }
       }
       return ret;

}</lang>

E

<lang e>def rle(string) {

 var seen := null
 var count := 0
 var result := []
 def put() {
   if (seen != null) {
     result with= [count, seen]
   }
 }
 for ch in string {
   if (ch != seen) {
     put()
     seen := ch
     count := 0
   }
   count += 1
 }
 put()
 return result

}

def unrle(coded) {

 var result := ""
 for [count, ch] in coded {
   result += E.toString(ch) * count
 }
 return result

}</lang>

<lang e>? rle("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW")

value: [[12, 'W'], [1, 'B'], [12, 'W'], [3, 'B'], [24, 'W'], [1, 'B'], [14, 'W']]

? unrle(rle("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"))

value: "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"</lang>

Erlang

A single-threaded/process version with a simple set of unit test.

<lang erlang>-module(rle).

-export([encode/1,decode/1]).

-include_lib("eunit/include/eunit.hrl").

encode(S) ->

   doEncode(string:substr(S, 2), string:substr(S, 1, 1), 1, []).

doEncode([], CurrChar, Count, R) ->

   R ++ integer_to_list(Count) ++ CurrChar;

doEncode(S, CurrChar, Count, R) ->

   NextChar = string:substr(S, 1, 1),
   if
       NextChar == CurrChar ->
           doEncode(string:substr(S, 2), CurrChar, Count + 1, R);
       true ->
           doEncode(string:substr(S, 2), NextChar, 1,
               R ++ integer_to_list(Count) ++ CurrChar)
   end.

decode(S) ->

   doDecode(string:substr(S, 2), string:substr(S, 1, 1), []).

doDecode([], _, R) ->

R;

doDecode(S, CurrString, R) ->

   NextChar = string:substr(S, 1, 1),
   IsInt = erlang:is_integer(catch(erlang:list_to_integer(NextChar))),
   if
       IsInt ->
           doDecode(string:substr(S, 2), CurrString ++ NextChar, R);
       true ->
           doDecode(string:substr(S, 2), [],
               R ++ string:copies(NextChar, list_to_integer(CurrString)))
   end.

rle_test_() ->

   PreEncoded =
       "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW",
   Expected = "12W1B12W3B24W1B14W",
   [
       ?_assert(encode(PreEncoded) =:= Expected),
       ?_assert(decode(Expected) =:= PreEncoded),
       ?_assert(decode(encode(PreEncoded)) =:= PreEncoded)
   ].</lang>

FALSE

<lang false>1^[^$~][$@$@=$[%%\1+\$0~]?~[@.,1\$]?%]#%\., {encode}</lang> <lang false>[0[^$$'9>'0@>|~]['0-\10*+]#]n: [n;!$~][[\$][1-\$,]#%%]#%% {decode}</lang>

Fan

<lang Fan>**

- Generates a run-length encoding for a string

class RLE {

 Run[] encode(Str s)
 {
   runs := Run[,]

   s.size.times |i|
   {
     ch := s[i]
     if (runs.size==0 || runs.last.char != ch)
       runs.add(Run(ch))
     runs.last.inc
   }
   return runs
 }

 Str decode(Run[] runs)
 {
   buf := StrBuf()
   runs.each |run|
   {
     run.count.times { buf.add(run.char.toChar) }
   }
   return buf.toStr
 }

 Void main()
 {
   echo(decode(encode(

"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"

       )))
 }

}

internal class Run {

 Int char
 Int count := 0
 new make(Int ch) { char = ch }
 Void inc() { ++count }

 override Str toStr() { return "${count}${char.toChar}" }

}</lang>

Forth

<lang forth>variable a

n>a (.) tuck a @ swap move a +! ;

>a a @ c! 1 a +! ;

encode ( c-addr +n a -- a n' )

 dup a ! -rot over c@ 1 2swap 1 /string bounds ?do
   over i c@ = if 1+
   else n>a >a i c@ 1 then
 loop n>a >a  a @ over - ;

digit? [char] 0 [ char 9 1+ literal ] within ;

decode ( c-addr +n a -- a n' )

 dup a ! 0 2swap bounds ?do
   i c@ digit? if 10 * i c@ [char] 0 - + else
   a @ over i c@ fill a +! 0 then
 loop drop a @ over - ;</lang>

Example:

<lang forth>s" WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" here 1000 + encode here 2000 + decode cr 3 spaces type

  WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW</lang>

Haskell

<lang haskell>import Data.List (group)

-- Datatypes type Encoded = [(Int, Char)] -- An encoded String with form [(times, char), ...] type Decoded = String

-- Takes a decoded string and returns an encoded list of tuples rlencode :: Decoded -> Encoded rlencode = map (\g -> (length g, head g)) . group

-- Takes an encoded list of tuples and returns the associated decoded String rldecode :: Encoded -> Decoded rldecode = concatMap decodeTuple

   where decodeTuple (n,c) = replicate n c

main :: IO () main = do

 -- Get input
 putStr "String to encode: "
 input <- getLine
 -- Output encoded and decoded versions of input
 let encoded = rlencode input
     decoded = rldecode encoded
 putStrLn $ "Encoded: " ++ show encoded ++ "\nDecoded: " ++ show decoded</lang>

Icon and Unicon

Icon

<lang Icon>procedure main(arglist)

s := "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"

write("s=",image(s)) write("s1=",image(s1 := rle_encode(s))) write("s2=",image(s2 := rle_decode(s1))) write("s3=",image(rle_encode(s3 := "1111s1")) | (s3 || " is unencodable"))

if s ~== s2 then write("Encode/Decode problem.")

  else write("Encode/Decode worked.")

end

procedure rle_encode(s) local es, n, c

es := ""

s ? repeat {

  n := &pos
  if pos(0) then return es
  c := move(1)
  while =c    
  es ||:= (&pos - n ) || c
  }

end

procedure rle_decode(es) local s, n

s := "" es ? repeat {

  if pos(0) then return s
  ( n := tab(many(&digits)), s ||:= repl(move(1),n) ) | fail
  }

end</lang>

Sample output:

s="WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
s1="12W1B12W3B24W1B14W"
s2="WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
s3=1111s1 is unencodable
Encode/Decode worked.

Unicon

This Icon solution works in Unicon.

J

Solution: <lang j>rle=: ([: ; (":@# , {.)&.>)@((1 , }. ~: }:) <;.1 ]) rld=: '0123456789'&((i. ".@:{ ' ' ,~ [) # -.@:e.~ # ])</lang>

Example: <lang j> rle 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW' 12W1B12W3B24W1B14W

  rld '12W1B12W3B24W1B14W'

WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW</lang>

Java

<lang java>import java.util.regex.Matcher; import java.util.regex.Pattern; public class RunLengthEncoding {

   public static String encode(String source) {
       StringBuffer dest = new StringBuffer();
       for (int i = 0; i < source.length(); i++) {
           int runLength = 1;
           while (i+1 < source.length() && source.charAt(i) == source.charAt(i+1)) {
               runLength++;
               i++;
           }
           dest.append(runLength);
           dest.append(source.charAt(i));
       }
       return dest.toString();
   }

   public static String decode(String source) {
       StringBuffer dest = new StringBuffer();
       Pattern pattern = Pattern.compile("[0-9]+|[a-zA-Z]");
       Matcher matcher = pattern.matcher(source);
       while (matcher.find()) {
           int number = Integer.parseInt(matcher.group());
           matcher.find();
           while (number-- != 0) {
               dest.append(matcher.group());
           }
       }
       return dest.toString();
   }

   public static void main(String[] args) {
       String example = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";
       System.out.println(encode(example));
       System.out.println(decode("1W1B1W1B1W1B1W1B1W1B1W1B1W1B"));
   }

}</lang> Tests:

Library: JUnit

<lang java>import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class RunLengthEncodingTest { private RLE = new RunLengthEncoding();

@Test public void encodingTest() { assertEquals("1W", RLE.encode("W")); assertEquals("4W", RLE.encode("WWWW")); assertEquals("5w4i7k3i6p5e4d2i1a", RLE.encode("wwwwwiiiikkkkkkkiiippppppeeeeeddddiia")); assertEquals("12B1N12B3N24B1N14B", RLE.encode("BBBBBBBBBBBBNBBBBBBBBBBBBNNNBBBBBBBBBBBBBBBBBBBBBBBBNBBBBBBBBBBBBBB")); assertEquals("12W1B12W3B24W1B14W", RLE.encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW")); assertEquals("1W1B1W1B1W1B1W1B1W1B1W1B1W1B", RLE.encode("WBWBWBWBWBWBWB"));

}

@Test public void decodingTest() { assertEquals("W", RLE.decode("1W")); assertEquals("WWWW", RLE.decode("4W")); assertEquals("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW", RLE.decode("12W1B12W3B24W1B14W")); assertEquals("WBWBWBWBWBWBWB", RLE.decode("1W1B1W1B1W1B1W1B1W1B1W1B1W1B")); assertEquals("WBWBWBWBWBWBWB", RLE.decode("1W1B1W1B1W1B1W1B1W1B1W1B1W1B"));

} }</lang>

JavaScript

Here's an encoding method that walks the input string character by character <lang javascript>function encode(input) {

   var encoding = [];
   var prev, count, i;
   for (count = 1, prev = input[0], i = 1; i < input.length; i++) {
       if (input[i] != prev) {
           encoding.push([count, prev]);
           count = 1;
           prev = input[i];
       }
       else 
           count ++;
   }
   encoding.push([count, prev]);
   return encoding;

}</lang>

Here's an encoding method that uses a regular expression to grab the character runs (

Works with: JavaScript version 1.6

for the forEach method)

<lang javascript>function encode_re(input) {

   var encoding = [];
   input.match(/(.)\1*/g).forEach(function(substr){ encoding.push([substr.length, substr[0]]) });
   return encoding;

}</lang>

And to decode (see Repeating a string) <lang javascript>function decode(encoded) {

   var output = "";
   encoded.forEach(function(pair){ output += new Array(1+pair[0]).join(pair[1]) })
   return output;

}</lang>

Logo

<lang logo>to encode :str [:out "||] [:count 0] [:last first :str]

 if empty? :str [output (word :out :count :last)]
 if equal? first :str :last [output (encode bf :str :out :count+1 :last)]
 output (encode bf :str (word :out :count :last) 1 first :str)

end

to reps :n :w

 output ifelse :n = 0 ["||] [word :w reps :n-1 :w]

end to decode :str [:out "||] [:count 0]

 if empty? :str [output :out]
 if number? first :str [output (decode bf :str :out 10*:count + first :str)]
 output (decode bf :str word :out reps :count first :str)

end

make "foo "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW make "rle encode :foo show equal? :foo decode :rle</lang>

Lua

<lang lua>local C, Ct, R, Cf, Cc = lpeg.C, lpeg.Ct, lpeg.R, lpeg.Cf, lpeg.Cc astable = Ct(C(1)^0)

function compress(t)

   local ret = {}
   for i, v in ipairs(t) do
     if t[i-1] and v == t[i-1] then
       ret[#ret - 1] = ret[#ret - 1] + 1
     else
       ret[#ret + 1] = 1
       ret[#ret + 1] = v
     end
   end
   t = ret
   return table.concat(ret)

end q = io.read() print(compress(astable:match(q)))

undo = Ct((Cf(Cc"0" * C(R"09")^1, function(a, b) return 10 * a + b end) * C(R"AZ"))^0)

function decompress(s)

 t = undo:match(s)
 local ret = ""
 for i = 1, #t - 1, 2 do
   for _ = 1, t[i] do
     ret = ret .. t[i+1]
   end
 end
 return ret

end</lang>

Mathematica

Custom functions using Map, Apply, pure functions, replacing using pattern matching, delayed rules and other functions: <lang Mathematica>RunLengthEncode[input_String]:=StringJoin@@Sequence@@@({ToString @Length[#],First[#]}&/@Split[Characters[input]]) RunLengthDecode[input_String]:=StringJoin@@ConstantArray@@@Reverse/@Partition[(Characters[input]/.(ToString[#]->#&/@Range[0,9]))//.{x___,i_Integer,j_Integer,y___}:>{x,10i+j,y},2]</lang> Example: <lang Mathematica>mystring="WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"; RunLengthEncode[mystring] RunLengthDecode[%] %==mystring</lang> gives back: <lang Mathematica>12W1B12W3B24W1B14W WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW True</lang>

MMIX

<lang mmix> LOC Data_Segment GREG @ Buf OCTA 0,0,0,0 integer print buffer Char BYTE 0,0 single char print buffer task BYTE "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWW" BYTE "WWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW",0 len GREG @-1-task

// task should become this tEnc BYTE "12W1B12W3B24W1B14W",0

GREG @ // tuple array for encoding purposes // each tuple is a tetra (4 bytes long or 2 wydes long) // (c,l) in which c is a char and l = number of chars c // high wyde of the tetra contains the char // low wyde .. .. .. contains the length RLE TETRA 0

LOC #100 locate program GREG @ // print number to stdout // destroys input arg $3 ! Prt64 LDA $255,Buf+23 points to LSD // do 2H DIV $3,$3,10 (N,R) = divmod (N,10) GET $13,rR get remainder INCL $13,'0' convert to ascii STBU $13,$255 store ascii digit BZ $3,3F SUB $255,$255,1 move pointer down JMP 2B While N !=0 3H TRAP 0,Fputs,StdOut print number to standard out GO $127,$127,0 return

GREG @ // print char to stdout PChar LDA $255,Char STBU $4,$255 TRAP 0,Fputs,StdOut GO $127,$127,0

GREG @ // encode routine // $0 string pointer // $1 index var // $2 pointer to tuple array // $11 temp var tuple Encode SET $1,0 initialize index = 0 SET $11,0 postion in string = 0 LDBU $3,$0,$1 get first char ADDU $6,$3,0 remember it

do

1H INCL $1,1 repeat incr index LDBU $3,$0,$1 get a char BZ $3,2F if EOS then finish CMP $7,$3,$6 PBZ $7,1B while new == old XOR $4,$4,$4 new tuple ADDU $4,$6,0 SLU $4,$4,16 old char to tuple -> (c,_) SUB $7,$1,$11 length = index - previous position ADDU $11,$1,0 incr position OR $4,$4,$7 length l to tuple -> (c,l) STT $4,$2 put tuple in array ADDU $6,$3,0 remember new char INCL $2,4 incr 'tetra' pointer JMP 1B loop 2H XOR $4,$4,$4 put last tuple in array ADDU $4,$6,0 SLU $4,$4,16 SUB $7,$1,$11 ADDU $11,$1,0 OR $4,$4,$7 STT $4,$2 GO $127,$127,0 return

GREG @ Main LDA $0,task pointer uncompressed string LDA $2,RLE pointer tuple array GO $127,Encode encode string LDA $2,RLE points to start tuples SET $5,#ffff mask for extracting length 1H LDTU $3,$2 while not End of Array BZ $3,2F SRU $4,$3,16 char = (c,_) AND $3,$3,$5 length = (_,l) GO $127,Prt64 print length GO $127,PChar print char INCL $2,4 incr tuple pointer JMP 1B wend 2H SET $4,#a print NL GO $127,PChar

// decode using the RLE tuples LDA $2,RLE pointer tuple array SET $5,#ffff mask 1H LDTU $3,$2 while not End of Array BZ $3,2F SRU $4,$3,16 char = (c,_) AND $3,$3,$5 length = (_,l) // for (i=0;i<length;i++) { 3H GO $127,PChar print a char SUB $3,$3,1 PBNZ $3,3B INCL $2,4 JMP 1B } 2H SET $4,#a print NL GO $127,PChar TRAP 0,Halt,0 EXIT</lang> Example run encode --> decode:

~/MIX/MMIX/Rosetta> mmix rle
12W1B12W3B24W1B14W
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

Objective-C

See Run-length encoding/Objective-C

OCaml

<lang ocaml>let encode str =

 let len = String.length str in
 let rec aux i acc =
   if i >= len then List.rev acc
   else
     let c1 = str.[i] in
     let rec aux2 j =
       if j >= len then (c1, j-i)
       else
         let c2 = str.[j] in
         if c1 = c2
         then aux2 (j+1)
         else (c1, j-i)
     in
     let (c,n) as t = aux2 (i+1) in
     aux (i+n) (t::acc)
 in
 aux 0 []

let decode lst =

 let l = List.map (fun (c,n) -> String.make n c) lst in
 (String.concat "" l)</lang>

<lang ocaml>let () =

 let e = encode "aaaaahhhhhhmmmmmmmuiiiiiiiaaaaaa" in
 List.iter (fun (c,n) ->
   Printf.printf " (%c, %d);\n" c n;
 ) e;
 print_endline (decode [('a', 5); ('h', 6); ('m', 7); ('u', 1); ('i', 7); ('a', 6)]);

</lang>

Using regular expressions

<lang ocaml>#load "str.cma";;

open Str

let encode =

 global_substitute (Str.regexp "\\(.\\)\\1*")
   (fun s -> string_of_int (String.length (matched_string s)) ^
             matched_group 1 s)

let decode =

 global_substitute (Str.regexp "\\([0-9]+\\)\\([^0-9]\\)")
   (fun s -> String.make (int_of_string (matched_group 1 s))
                         (matched_group 2 s).[0])

let () =

 print_endline (encode "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW");
 print_endline (decode "12W1B12W3B24W1B14W");</lang>

Oz

<lang oz>declare

 fun {RLEncode Xs}
    for G in {Group Xs} collect:C do

{C {Length G}#G.1}

    end
 end

 fun {RLDecode Xs}
    for C#Y in Xs append:Ap do

{Ap {Replicate Y C}}

    end
 end

 %% Helpers
 %% e.g. "1122" -> ["11" "22"]
 fun {Group Xs}
    case Xs of nil then nil
    [] X|Xr then

Ys Zs

       {List.takeDropWhile Xr fun {$ W} W==X end ?Ys ?Zs}
    in
       (X|Ys) | {Group Zs}
    end
 end
 %% e.g. 3,4 -> [3 3 3 3] 
 fun {Replicate X N}
    case N of 0 then nil
    else X|{Replicate X N-1}
    end
 end
 
 Data = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
 Enc = {RLEncode Data}

in

 {System.showInfo Data}
 {Show Enc}
 {System.showInfo {RLDecode Enc}}</lang>

Perl

<lang perl>sub encode

{my $str = shift;
 $str =~ s {(.)(\1*)} {(length($2) + 1) . $1 . ';'}gse;
 return $str;}

sub decode

{my $str = shift;
 $str =~ s {(\d+)(.);} {$2 x $1}gse;
 return $str;}</lang>

The following modified versions of the previous one, encode/decode a bytes sequence in a way compatible with the functions of the C version.

<lang perl>sub encode

{my $str = shift;
 $str =~ s {(.)(\1{0,254})} {pack("C",(length($2) + 1)) . $1 }gse;
 return $str;}

sub decode {

    my @str = split //, shift;
    my $r = "";
    foreach my $i (0 .. scalar(@str)/2-1) {

$r .= $str[2*$i + 1] x unpack("C", $str[2*$i]);

    }
    return $r;

}</lang>

PHP

<lang php><?php function encode($str) {

 return preg_replace('/(.)\1*/e', 'strlen($0) . $1', $str);

}

function decode($str) {

 return preg_replace('/(\d+)(\D)/e', 'str_repeat($2, $1)', $str);

}

echo encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'), "\n"; echo decode('12W1B12W3B24W1B14W'), "\n"; ?></lang>

PicoLisp

<lang PicoLisp>(de encode (Str)

  (pack
     (make
        (for (Lst (chop Str) Lst)
           (let (N 1  C)
              (while (= (setq C (pop 'Lst)) (car Lst))
                 (inc 'N) )
              (link (format N) C) ) ) ) ) )

(de decode (Str)

  (pack
     (make
        (let N 0
           (for C (chop Str)
              (if (>= "9" C "0")
                 (setq N (+ (format C) (* 10 N)))
                 (do N (link C))
                 (zero N) ) ) ) ) ) )

(and

  (prinl "Data:    " "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW")
  (prinl "Encoded: " (encode @))
  (prinl "Decoded: " (decode @)) )</lang>

Output:

Data:    WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
Encoded: 12W1B12W3B24W1B14W
Decoded: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

PL/I

<lang PL/I> declare (c1, c2) character (1); declare run_length fixed binary; declare input file;

open file (input) title ('/RLE.DAT,type(text),recsize(20000)'); on endfile (input) go to epilog;

get file (input) edit (c1) (a(1)); run_length = 1; do forever;

  get file (input) edit (c2) (a(1));
  if c1 = c2 then
     run_length = run_length + 1;
  else
     do; put edit (trim(run_length), c1) (a); run_length=1; end;
  c1 = c2;

end; epilog:

  put edit (trim(run_length), c1) (a);
  put skip;

/* The reverse of the above operation: */ declare c character (1); declare i fixed binary; declare new file;

open file (new) title ('/NEW.DAT,type(text),recsize(20000)'); on endfile (new) stop; do forever;

  run_length = 0;
  do forever;
     get file (new) edit (c) (a(1));
     if index('0123456789', c) = 0 then leave;
     run_length = run_length*10 + c;
  end;
  put edit ((c do i = 1 to run_length)) (a);

end; </lang>

PowerBASIC

This version can handle any arbitrary string that doesn't contain numbers (not just letters). (A flag value could be added which would allow the inclusion of any character, but such a flag isn't in this example.)

<lang powerbasic>FUNCTION RLDecode (i AS STRING) AS STRING

   DIM Loop0 AS LONG, rCount AS STRING, outP AS STRING, m AS STRING

   FOR Loop0 = 1 TO LEN(i)
       m = MID$(i, Loop0, 1)
       SELECT CASE m
           CASE "0" TO "9"
               rCount = rCount & m
           CASE ELSE
               IF LEN(rCount) THEN
                   outP = outP & STRING$(VAL(rCount), m)
                   rCount=""
               ELSE
                   outP = outP & m
               END IF
       END SELECT
   NEXT
   FUNCTION = outP

END FUNCTION

FUNCTION RLEncode (i AS STRING) AS STRING

   DIM tmp1 AS STRING, tmp2 AS STRING, outP AS STRING
   DIM Loop0 AS LONG, rCount AS LONG

   tmp1 = MID$(i, 1, 1)
   tmp2 = tmp1
   rCount = 1

   FOR Loop0 = 2 TO LEN(i)
       tmp1 = MID$(i, Loop0, 1)
       IF tmp1 <> tmp2 THEN
           outP = outP & TRIM$(STR$(rCount)) & tmp2
           tmp2 = tmp1
           rCount = 1
       ELSE
           INCR rCount
       END IF
   NEXT

   outP = outP & TRIM$(STR$(rCount))
   outP = outP & tmp2
   FUNCTION = outP

END FUNCTION

FUNCTION PBMAIN () AS LONG

   DIM initial AS STRING, encoded AS STRING, decoded AS STRING
   initial = INPUTBOX$("Type something.")
   encoded = RLEncode(initial)
   decoded = RLDecode(encoded)
   'in PB/Win, "?" = MSGBOX; in PB/DOS & PB/CC. "?" = PRINT
   ? initial & $CRLF & encoded & $CRLF & decoded

END FUNCTION</lang>

Outputs are similar to those in BASIC, above.

PowerShell

<lang powershell>function Compress-RLE ($s) {

   $re = [regex] '(.)\1*'
   $ret = ""
   foreach ($m in $re.Matches($s)) {
       $ret += $m.Length
       $ret += $m.Value[0]
   }
   return $ret

}

function Expand-RLE ($s) {

   $re = [regex] '(\d+)(.)'
   $ret = ""
   foreach ($m in $re.Matches($s)) {
       $ret += [string] $m.Groups[2] * [int] [string] $m.Groups[1]
   }
   return $ret

}</lang> Output:

PS> Compress-RLE "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
12W1B12W3B24W1B14W
PS> Expand-RLE "12W1B12W3B24W1B14W"
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

Python

<lang python>def encode(input_string):

   count = 1
   prev = 
   lst = []
   for character in input_string:
       if character != prev:
           if prev:
               entry = (prev,count)
               lst.append(entry)
               #print lst
           count = 1
           prev = character
       else:
           count += 1
   else:
       entry = (character,count)
       lst.append(entry)
   return lst

def decode(lst):

   q = ""
   for character, count in lst:
       q += character * count
   return q

Method call

encode("aaaaahhhhhhmmmmmmmuiiiiiiiaaaaaa") decode([('a', 5), ('h', 6), ('m', 7), ('u', 1), ('i', 7), ('a', 6)])</lang>

Functional

Works with: Python version 2.4

<lang python>from itertools import groupby def encode(input_string):

   return [(len(list(g)), k) for k,g in groupby(input_string)]

def decode(lst):

   return .join(c * n for n,c in lst)

encode("aaaaahhhhhhmmmmmmmuiiiiiiiaaaaaa") decode([(5, 'a'), (6, 'h'), (7, 'm'), (1, 'u'), (7, 'i'), (6, 'a')])</lang>

By regular expression
The simplified input range of only uppercase characters allows a simple regular expression to be applied repeatedly for encoding, and another for decoding: <lang python>from re import sub

def encode(text):

   
   Doctest:
       >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
       '12W1B12W3B24W1B14W'    
   
   return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
              text)

def decode(text):

   
   Doctest:
       >>> decode('12W1B12W3B24W1B14W')
       'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
   
   return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
              text)

textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" assert decode(encode(textin)) == textin</lang>

R

R has a built-in function, rle, for run length encoding. This modification allows input and output in the forms specified above. <lang R>runlengthencoding <- function(x) {

  splitx <- unlist(strsplit(input, ""))
  rlex <- rle(splitx)
  paste(with(rlex, as.vector(rbind(lengths, values))), collapse="")

}

input <- "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" runlengthencoding(input)</lang> Similarly, inverse.rle provides decompression after a run length encoding. <lang R>inverserunlengthencoding <- function(x) {

   lengths <- as.numeric(unlist(strsplit(output, "alpha:")))
   values <- unlist(strsplit(output, "digit:"))
   values <- values[values != ""]
   uncompressed <- inverse.rle(list(lengths=lengths, values=values))
   paste(uncompressed, collapse="")

}

output <- "12W1B12W3B24W1B14W" inverserunlengthencoding(output)</lang>

Ruby

<lang ruby>def encode(string)

 string.scan(/(.)(\1*)/).collect do |char, repeat|
   [char, 1 + repeat.length] 
 end

end

def decode(encoding)

 encoding.collect { |char, length| char * length }.join

end

orig = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" enc = encode(orig) # => [["W", 12], ["B", 1], ["W", 12], ["B", 3], ["W", 24], ["B", 1], ["W", 14]] dec = decode(enc) puts "success!" if dec == orig</lang>

This usage also seems to be idiomatic, and perhaps less cryptic: <lang ruby>def encode(string)

 encoding = []
 for char, repeat in string.scan(/(.)(\1*)/)
   encoding << [char, 1 + repeat.length]
 end
 encoding

end

def decode(encoding)

 decoding = ""
 for char, length in encoding
   decoding << char * length
 end
 decoding

end</lang>

By regular expression
The simplified input range of only uppercase characters allows a simple regular expression to be applied repeatedly for encoding, and another for decoding: <lang ruby>def encode(str)

   str.gsub(/(.)\1*/) {$&.length.to_s + $1}

end

def decode(str)

   str.gsub(/(\d+)(\D)/) {$2 * $1.to_i}

end

encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW') #=> "12W1B12W3B24W1B14W" decode('12W1B12W3B24W1B14W') #=> "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"</lang>

Scala

Care is taken to use StringBuilder for performance reasons.

<lang scala>def encode(s: String) = (1 until s.size).foldLeft((1, s(0), new StringBuilder)) {

 case ((len, c, sb), index) if c != s(index) => sb.append(len); sb.append(c); (1, s(index), sb)
 case ((len, c, sb), _) => (len + 1, c, sb)

} match {

 case (len, c, sb) => sb.append(len); sb.append(c); sb.toString

}

def decode(s: String) = {

 val sb = new StringBuilder
 val Code = """(\d+)([A-Z])""".r
 for (Code(len, c) <- Code findAllIn s) sb.append(c * len.toInt)
 sb.toString

}</lang>

A simpler (?) encoder: <lang scala>def encode(s:String) = {

 s.foldLeft((0,s(0),""))( (t,c) => t match {case (i,p,s) => if (p==c) (i+1,p,s) else (1,c,s+i+p)})
   match {case (i,p,s) => s+i+p}

}</lang>

To make it faster (it's also faster than the longer implementation above) just replace "" with new StringBuilder and s+i+p with {s.append(i);s.append(p)}

Smalltalk

See Run-length encoding/Smalltalk

Tcl

The encoding is an even-length list with elements {count char ...} <lang tcl>proc encode {string} {

   set encoding {}
   # use a regular expression to match runs of one character
   foreach {run -} [regexp -all -inline {(.)\1+|.} $string] {
       lappend encoding [string length $run] [string index $run 0]
   }
   return $encoding

}

proc decode {encoding} {

   foreach {count char} $encoding  {
       append decoded [string repeat $char $count]
   }
   return $decoded

}</lang>

<lang tcl>set str "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" set enc [encode $str] ;# ==> {12 W 1 B 12 W 3 B 24 W 1 B 14 W} set dec [decode $enc] if {$str eq $dec} {

   puts "success"

}</lang>

Ursala

A standard library function, rlc, does most of the work for this task, which is a second order function taking a binary predicate that decides when consecutive items of an input list belong to the same run. <lang Ursala>#import std

import nat

encode = (rlc ==); *= ^lhPrNCT\~&h %nP+ length

decode = (rlc ~&l-=digits); *=zyNCXS ^|DlS/~& iota+ %np

test_data = 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'

show+

example =

<

  encode test_data,
  decode encode test_data></lang>

The output shows an encoding of the test data, and a decoding of the encoding, which matches the original test data.

12W1B12W3B24W1B14W
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

Vedit macro language

The following example encodes/decodes an entire file. Each run is coded with two bytes. The first byte is the run length with high bit set, the second byte is the character code. ASCII characters with run length of 1 are left unchanged. Character codes above 127 are always coded with run length. Newlines are not converted (the regular expression does not count newlines). This methods supports any type of input. <lang vedit>:RL_ENCODE: BOF While (!At_EOF) {

   if (At_EOL) { Line(1) Continue }    // skip newlines
   #1 = Cur_Char                       // #1 = character
   Match("(.)\1*", REGEXP)             // count run length
   #2 = Chars_Matched                  // #2 = run length
   if (#2 > 127) { #2 = 127 }          // can be max 127
   if (#2 > 1 || #1 > 127) {
       Del_Char(#2)
       Ins_Char(#2 | 128)              // run length (high bit set)
       Ins_Char(#1)                    // character
   } else {                            // single ASCII char
       Char                            // skip
   }

} Return

RL_DECODE:

BOF While (!At_EOF) {

   #2 = Cur_Char
   if (#2 > 127) {                     // is this run length?
       #1 = Cur_Char(1)                // #1 = character value
       Del_Char(2)
       Ins_Char(#1, COUNT, #2 & 127)
   } else {                            // single ASCII char
       Char
   }

} Return</lang>