Longest common subsequence: Difference between revisions
m
Reformatted References
(Moved Background and References sections from the C++ sample to the main Introduction for the page.) |
m (Reformatted References) |
||
Line 1:
{{task}}[[Category:Recursion]][[Category:Memoization]]
=== Introduction ===
Define a subsequence to be any string obtained by deleting zero or more symbols from an input string.
The
Let A = A[0]… A[m-1] and B = B[0]… B[n-1], m <= n be strings drawn from an alphabet '''Σ''' of size s, containing every distinct symbol in A + B.
Line 10 ⟶ 12:
Define the strict Cartesian product-order (<) over matches, such that (i1, j1) < (i2, j2) iff i1 < j1 and i2 < j2. Defining (>) similarly, we can write m2 < m1 as m1 > m2.
If i1 <= j1 and i2 >= j2 (or if i1 >= i2 and i2 <= j2) then neither m1 < m2 nor m1 > m2 are possible; and m1, m2 are
Given a product-order over the set of matches '''M''', a chain '''C''' is any subset of '''M''' where either m1 < m2 or m2 < m1 for every pair of distinct elements m1 and m2 of '''C'''.
Line 20 ⟶ 22:
Then any chain '''C''' can be visualized as a monotonically increasing curve through those match bits which are set to True.
Where the number of symbols appearing in matches is small relative to the length of the input strings, reuse of the symbols increases; and the number of matches will tend towards quadratic, O(m*n) growth. This occurs, for example, in the Bioinformatics application of nucleotide and protein sequencing.
Line 34 ⟶ 36:
Claus Rick has also described a linear-space algorithm with a time bound of O(n*s + min(p*m, p*(n-p))).
"An Algorithm for Differential File Comparison" by James W. Hunt and M. Douglas McIlroy, June 1976<br />▼
"A Fast Algorithm for Computing Longest Common Subsequences" by James W. Hunt and Thomas G. Szymanski, published May 1977<br />▼
# "
▲# "An Algorithm for Differential File Comparison" by James W. Hunt and M. Douglas McIlroy, June 1976
▲# "A Fast Algorithm for Computing Longest Common Subsequences" by James W. Hunt and Thomas G. Szymanski, published May 1977
# "Simple and fast linear space computation of longest common subsequences" by Claus Rick, received 17 March 2000, Information Processing Letters, Elsevier Science [Volume 75, pp. 275–281]
The sequences "1234" and "1224533324" have an LCS of "1234":
|