Longest common subsequence: Difference between revisions

From Rosetta Code
Content added Content deleted
(→‎{{header|Java}}: added dynamic programming solution)
(Added Haskell example)
Line 7: Line 7:


In this puzzle, your code only needs to deal with strings. Write a function which returns an LCS of two strings (case-sensitive). You don't need to show multiple LCS's.
In this puzzle, your code only needs to deal with strings. Write a function which returns an LCS of two strings (case-sensitive). You don't need to show multiple LCS's.

=={{header|Haskell}}==

The [http://en.wikipedia.org/wiki/Longest_common_subsequence#Solution_for_two_sequences Wikipedia solution] translates directly into Haskell, with the only difference that equal characters are added in front:

<pre>
longest xs ys = if length xs > length ys then xs else ys

lcs [] _ = []
lcs _ [] = []
lcs (x:xs) (y:ys)
| x == y = x : lcs xs ys
| otherwise = longest (lcs (x:xs) ys) (lcs xs (y:ys))
</pre>

Memoization (aka dynamic programming) of that uses ''zip'' to make both the index and the character available:
<pre>
import Data.Array

lcs xs ys = a!(0,0) where
n = length xs
m = length ys
a = array ((0,0),(n,m)) $ l1 ++ l2 ++ l3
l1 = [((i,m),[]) | i <- [0..n]]
l2 = [((n,j),[]) | j <- [0..m]]
l3 = [((i,j), f x y i j) | (x,i) <- zip xs [0..], (y,j) <- zip ys [0..]]
f x y i j
| x == y = x : a!(i+1,j+1)
| otherwise = longest (a!(i,j+1)) (a!(i+1,j))
</pre>
Both solutions work of course not only with strings, but also with any other list. Example:
<pre>
*Main> lcs "thisisatest" "testing123testing"
"tsitest"
</pre>


=={{header|Java}}==
=={{header|Java}}==

Revision as of 10:32, 8 May 2008

Longest common subsequence is a programming puzzle. It lays out a problem which Rosetta Code users are encouraged to solve, using languages and techniques they know. Multiple approaches are not discouraged, so long as the puzzle guidelines are followed. For other Puzzles, see Category:Puzzles.

The longest common subsequence (or LCS) of groups A and B is the longest group of elements from A and B that are common between the two groups and in the same order in each group. For example, the sequences "1234" and "1224533324" have an LCS of "1234":

1234
1224533324

For a string example, consider the sequences "thisisatest" and "testing123testing". An LCS would be "tsitest":

thisisatest
testing123testing

In this puzzle, your code only needs to deal with strings. Write a function which returns an LCS of two strings (case-sensitive). You don't need to show multiple LCS's.

Haskell

The Wikipedia solution translates directly into Haskell, with the only difference that equal characters are added in front:

longest xs ys = if length xs > length ys then xs else ys

lcs [] _ = []
lcs _ [] = []
lcs (x:xs) (y:ys) 
  | x == y    = x : lcs xs ys
  | otherwise = longest (lcs (x:xs) ys) (lcs xs (y:ys))

Memoization (aka dynamic programming) of that uses zip to make both the index and the character available:

import Data.Array

lcs xs ys = a!(0,0) where
  n = length xs
  m = length ys
  a = array ((0,0),(n,m)) $ l1 ++ l2 ++ l3
  l1 = [((i,m),[]) | i <- [0..n]]
  l2 = [((n,j),[]) | j <- [0..m]]
  l3 = [((i,j), f x y i j) | (x,i) <- zip xs [0..], (y,j) <- zip ys [0..]]
  f x y i j 
    | x == y    = x : a!(i+1,j+1)
    | otherwise = longest (a!(i,j+1)) (a!(i+1,j))

Both solutions work of course not only with strings, but also with any other list. Example:

*Main> lcs "thisisatest" "testing123testing"
"tsitest"

Java

This is not a particularly fast algorithm, but it gets the job done eventually. The speed is a result of many recursive functions calls.

<java>public static String lcs(String a, String b){

   if(a.length() == 0 || b.length() == 0){
       return "";
   }else if(a.charAt(a.length()-1) == b.charAt(b.length()-1)){
       return lcs(a.substring(0,a.length()-1),b.substring(0,b.length()-1))
           + a.charAt(a.length()-1);
   }else{
       String x = lcs(a, b.substring(0,b.length()-1));
       String y = lcs(a.substring(0,a.length()-1), b);
       return (x.length() > y.length()) ? x : y;
   }

}</java>

A dynamic programming solution: <java>public static String lcs(String a, String b) {

   int[][] lengths = new int[a.length()+1][b.length()+1];
   // row 0 and column 0 are initialized to 0 already
   for (int i = 0; i < a.length(); i++)
       for (int j = 0; j < b.length(); j++)
           if (a.charAt(i) == b.charAt(j))
               lengths[i+1][j+1] = lengths[i][j] + 1;
           else
               lengths[i+1][j+1] =
                   Math.max(lengths[i+1][j], lengths[i][j+1]);
   // read the substring out from the matrix
   StringBuffer sb = new StringBuffer();
   for (int x = a.length(), y = b.length();
        x != 0 && y != 0; ) {
       if (lengths[x][y] == lengths[x-1][y])
           x--;
       else if (lengths[x][y] == lengths[x][y-1])
           y--;
       else {
           assert a.charAt(x-1) == b.charAt(y-1);
           sb.append(a.charAt(x-1));
           x--;
           y--;
       }
   }
   return sb.reverse().toString();

}</java>