Talk:Suffix tree: Difference between revisions

← Older edit

Talk:Suffix tree (view source)

Revision as of 23:33, 20 August 2015

700 bytes added , 8 years ago

→‎Valid implementations?: new section

Rdm

6,951

edits

Revision as of 13:58, 27 May 2013 (view source) Rdm (talk \| contribs) m (→‎definition (take 2)) ← Older edit		Latest revision as of 23:33, 20 August 2015 (view source) Rdm (talk \| contribs) (→‎Valid implementations?: new section)
(2 intermediate revisions by 2 users not shown)
Line 7: The wikipedia definition for a suffix tree currently looks like this: :The suffix tree for the string <math>S</math> of length <math>n</math> is defined as a tree such that:<ref>~~{{harvtxt\|~~Gusfield\|, 1999}}, p.90.</ref> :* the paths from the root to the leaves have a one-to-one relationship with the suffixes of <math>S</math>, :* edges spell non-empty strings, Line 72: :::: Anyways, the wikipedia definition does mention that every parent node other than the root has at least two edges leading out. Which (now that the defect in my thinking about what an edge is has been fixed) seems to address the "must be longest possible" issue. But emphasizing the point might not hurt. (As an aside, note that I almost never use trees with low children counts in coding because for my applications the constant multipliers on their costs almost always mean that another approach is better - roughly speaking, if I need a tree at all, for me "better" is something like sequence of trees (new content in a small mutable tree, old content in a larger more constant tree) where a node occupies most of an L1 cache and where the "edges" have roughly fixed size - in other words, lots of readers and very few writers - but that kind of reasoning does not seem to apply here. So things which should be just obvious for someone used to working with tall narrow tree implementations on a regular basis can easily escape my notice (similarly, things which seem obvious to me seem to be routinely overlooked by people specifying algorithms which favor skinny trees - and it's not that either approach is universally wrong it's just a reflection of different kinds information from different kinds of applications). Meanwhile, reading the wikipedia page was less than fruitful because that turned my attention to things like insertion operations (which probably means merging two of these trees, but maybe not) and I needed to focus on more basic issues.) :::: That said, my current impression is that the wikipedia article is confusing because it claims linear time for O(n log n) mechanisms. Average time might be linear (I do not know enough to determine that) but time is proportional to space and we need an additional copy of the text for every prefix variant that needs to be treated. I may be wrong here (I do not have proof) but thinking about using this mechanism to encode long, random bit strings seems to support this way of thinking. (And, if I am wrong, I would be very interested in seeing ''proof that the algorithm is O(n)'' that adequately covers space needed for random bit strings.) --[[User:Rdm\|Rdm]] ([[User talk:Rdm\|talk]]) 13:57, 27 May 2013 (UTC) ==Definition (take 3)== So the definition has been updated, both on Wikipedia and on RC. I used http://www.cs.uoi.gr/~kblekas/courses/bioinformatics/Suffix_Trees1.pdf as a reference.--[[User:Grondilu\|Grondilu]] ([[User talk:Grondilu\|talk]]) 17:37, 26 August 2013 (UTC) == Valid implementations? == None of the displayed current implementations number their leaves, but numbering of leaves was a task requirement. Also, they are not displayed such that implicit numbering can be used (for example: node 0 first, node 1 second, node 2 third, ...) So can any of these implementations be considered valid for the current task definition? --[[User:Rdm\|Rdm]] ([[User talk:Rdm\|talk]]) 23:33, 20 August 2015 (UTC)