Longest common subsequence: Difference between revisions
Content added Content deleted
m (Reformatted References) |
Thundergnat (talk | contribs) (Undo revision 344177 by CNHume ([[Userp formatting) |
||
Line 1: | Line 1: | ||
{{task}}[[Category:Recursion]][[Category:Memoization]] |
{{task}}[[Category:Recursion]][[Category:Memoization]] |
||
=== Introduction === |
|||
Define a subsequence to be any string obtained by deleting zero or more symbols from an input string. |
Define a subsequence to be any string obtained by deleting zero or more symbols from an input string. |
||
The [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem |
The '''Longest Common Subsequence''' (or [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''LCS''']) is a subsequence of maximum length common to two (or more) strings. |
||
Let A = A[0]… A[m-1] and B = B[0]… B[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in A + B. |
Let A = A[0]… A[m-1] and B = B[0]… B[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in A + B. |
||
Line 12: | Line 10: | ||
Define the strict Cartesian product-order (<) over matches, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2. |
Define the strict Cartesian product-order (<) over matches, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2. |
||
If i1 <= j1 and i2 >= j2 (or if i1 >= i2 and i2 <= j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are ''incomparable''. Defining (#) to denote this case, we can write m1 # m2. |
If i1 <= j1 and i2 >= j2 (or if i1 >= i2 and i2 <= j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are said to be ''incomparable''. Defining (#) to denote this case, we can write m1 # m2. |
||
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where either m1 < m2 or m2 < m1 for every pair of distinct elements m1 and m2 of '''C'''. |
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where either m1 < m2 or m2 < m1 for every pair of distinct elements m1 and m2 of '''C'''. |
||
Line 22: | Line 20: | ||
Then any chain '''C''' can be visualized as a monotonically increasing curve through those match bits which are set to True. |
Then any chain '''C''' can be visualized as a monotonically increasing curve through those match bits which are set to True. |
||
'''Background''' |
|||
Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards quadratic, O(m*n) growth. This occurs, for example, in the Bioinformatics application of nucleotide and protein sequencing. |
Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards quadratic, O(m*n) growth. This occurs, for example, in the Bioinformatics application of nucleotide and protein sequencing. |
||
Line 36: | Line 34: | ||
Claus Rick has also described a linear-space algorithm with a time bound of O(n*s + min(p*m, p*(n-p))). |
Claus Rick has also described a linear-space algorithm with a time bound of O(n*s + min(p*m, p*(n-p))). |
||
'''References''' |
|||
"A linear space algorithm for computing maximal common subsequences" by Daniel S. Hirschberg, published June 1975<br /> |
|||
Communications of the ACM [Volume 18, Number 6, pp. 341–343] |
|||
⚫ | |||
Computing Science Technical Report, Bell Laboratories 41 |
|||
⚫ | |||
Communications of the ACM [Volume 20, Number 5, pp. 350-353] |
|||
"Simple and fast linear space computation of longest common subsequences" by Claus Rick, received 17 March 2000, Information Processing Letters,<br /> |
|||
Elsevier Science [Volume 75, pp. 275–281] |
|||
⚫ | |||
⚫ | |||
# "Simple and fast linear space computation of longest common subsequences" by Claus Rick, received 17 March 2000, Information Processing Letters, Elsevier Science [Volume 75, pp. 275–281] |
|||
'''Examples''' |
|||
The sequences "1234" and "1224533324" have an LCS of "1234": |
The sequences "1234" and "1224533324" have an LCS of "1234": |