Jump to content

Longest common subsequence: Difference between revisions

m
Reformatted References
(Moved Background and References sections from the C++ sample to the main Introduction for the page.)
m (Reformatted References)
Line 1:
{{task}}[[Category:Recursion]][[Category:Memoization]]
=== Introduction ===
 
Define a subsequence to be any string obtained by deleting zero or more symbols from an input string.
 
The '''Longest Common Subsequence''' (or [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem '''LCSLongest Common Subsequence'''] (or '''LCS''') is a subsequence of maximum length common to two (or more) strings.
 
Let A = A[0]… A[m-1] and B = B[0]… B[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in A + B.
Line 10 ⟶ 12:
Define the strict Cartesian product-order (<) over matches, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2.
 
If i1 <= j1 and i2 >= j2 (or if i1 >= i2 and i2 <= j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are said to be ''incomparable''. Defining (#) to denote this case, we can write m1 # m2.
 
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where either m1 < m2 or m2 < m1 for every pair of distinct elements m1 and m2 of '''C'''.
Line 20 ⟶ 22:
Then any chain '''C''' can be visualized as a monotonically increasing curve through those match bits which are set to True.
 
'''=== Background''' ===
 
Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards quadratic, O(m*n) growth. This occurs, for example, in the Bioinformatics application of nucleotide and protein sequencing.
Line 34 ⟶ 36:
Claus Rick has also described a linear-space algorithm with a time bound of O(n*s + min(p*m, p*(n-p))).
 
'''=== References''' ===
 
"A linear space algorithm for computing maximal common subsequences" by Daniel S. Hirschberg, published June 1975<br />
Communications of the ACM [Volume 18, Number 6, pp. 341–343]
 
"An Algorithm for Differential File Comparison" by James W. Hunt and M. Douglas McIlroy, June 1976<br />
Computing Science Technical Report, Bell Laboratories 41
 
"A Fast Algorithm for Computing Longest Common Subsequences" by James W. Hunt and Thomas G. Szymanski, published May 1977<br />
Communications of the ACM [Volume 20, Number 5, pp. 350-353]
 
# "Simple and fastA linear space computationalgorithm offor longestcomputing maximal common subsequences" by ClausDaniel RickS. Hirschberg, receivedpublished 17June March1975 2000,Communications Informationof Processingthe LettersACM [Volume 18,<br />Number 6, pp. 341–343]
# "An Algorithm for Differential File Comparison" by James W. Hunt and M. Douglas McIlroy, June 1976<br />Computing Science Technical Report, Bell Laboratories 41
Elsevier Science [Volume 75, pp. 275–281]
# "A Fast Algorithm for Computing Longest Common Subsequences" by James W. Hunt and Thomas G. Szymanski, published May 1977<br />Communications of the ACM [Volume 20, Number 5, pp. 350-353]
# "Simple and fast linear space computation of longest common subsequences" by Claus Rick, received 17 March 2000, Information Processing Letters, Elsevier Science [Volume 75, pp. 275–281]
 
'''=== Examples''' ===
 
The sequences "1234" and "1224533324" have an LCS of "1234":
159

edits

Cookies help us deliver our services. By using our services, you agree to our use of cookies.