Longest common subsequence: Difference between revisions

Undo revision 344177 by CNHume ([[Userp formatting
m (Reformatted References)
(Undo revision 344177 by CNHume ([[Userp formatting)
Line 1:
{{task}}[[Category:Recursion]][[Category:Memoization]]
=== Introduction ===
 
Define a subsequence to be any string obtained by deleting zero or more symbols from an input string.
 
The '''Longest Common Subsequence''' (or [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''Longest Common Subsequence'''] (or '''LCS''']) is a subsequence of maximum length common to two (or more) strings.
 
Let A = A[0]… A[m-1] and B = B[0]… B[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in A + B.
Line 12 ⟶ 10:
Define the strict Cartesian product-order (<) over matches, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2.
 
If i1 <= j1 and i2 >= j2 (or if i1 >= i2 and i2 <= j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are said to be ''incomparable''. Defining (#) to denote this case, we can write m1 # m2.
 
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where either m1 < m2 or m2 < m1 for every pair of distinct elements m1 and m2 of '''C'''.
Line 22 ⟶ 20:
Then any chain '''C''' can be visualized as a monotonically increasing curve through those match bits which are set to True.
 
=== '''Background ==='''
 
Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards quadratic, O(m*n) growth. This occurs, for example, in the Bioinformatics application of nucleotide and protein sequencing.
Line 36 ⟶ 34:
Claus Rick has also described a linear-space algorithm with a time bound of O(n*s + min(p*m, p*(n-p))).
 
=== '''References ==='''
 
"A linear space algorithm for computing maximal common subsequences" by Daniel S. Hirschberg, published June 1975<br />
Communications of the ACM [Volume 18, Number 6, pp. 341–343]
 
# "An Algorithm for Differential File Comparison" by James W. Hunt and M. Douglas McIlroy, June 1976<br Computing Science Technical Report, Bell Laboratories 41/>
Computing Science Technical Report, Bell Laboratories 41
 
# "A Fast Algorithm for Computing Longest Common Subsequences" by James W. Hunt and Thomas G. Szymanski, published May 1977<br Communications of the ACM [Volume 20, Number 5, pp. 350-353]/>
Communications of the ACM [Volume 20, Number 5, pp. 350-353]
 
# "ASimple and fast linear space algorithmcomputation forof computing maximallongest common subsequences" by DanielClaus S. HirschbergRick, publishedreceived June17 1975March Communications2000, ofInformation theProcessing ACM [Volume 18Letters,<br Number 6, pp. 341–343]/>
Elsevier Science [Volume 75, pp. 275–281]
# "An Algorithm for Differential File Comparison" by James W. Hunt and M. Douglas McIlroy, June 1976 Computing Science Technical Report, Bell Laboratories 41
# "A Fast Algorithm for Computing Longest Common Subsequences" by James W. Hunt and Thomas G. Szymanski, published May 1977 Communications of the ACM [Volume 20, Number 5, pp. 350-353]
# "Simple and fast linear space computation of longest common subsequences" by Claus Rick, received 17 March 2000, Information Processing Letters, Elsevier Science [Volume 75, pp. 275–281]
 
=== '''Examples ==='''
 
The sequences "1234" and "1224533324" have an LCS of "1234":
10,327

edits