Suffix tree
A suffix tree is a data structure commonly used in string algorithms. Basically, for any string, its suffix tree is a rooted tree where each edge is labelled, and where the concatenation of all the labels from the root to a leaf uniquely identifies a suffix of the string.
For this task, build the suffix tree of the string "banana$", and show that its edges are:
$ $ $ $ a banana$ na na na$ na$
Adding a $ sign at the end of the string is a common practice. Here we shall not try to explain the rationale behind it, though.
Extra-credit: use the visualize a tree task in order to show the whole tree.
Perl
<lang Perl>use strict; use warnings; use Data::Dumper;
sub classify {
my ($f, $h) = (shift, {}); for (@_) { push @{$h->{$f->($_)}}, $_ } return $h;
} sub suffixes {
my $str = shift; map { substr $str, $_ } 0 .. length($str) - 1;
} sub suffix_tree {
return +{} if @_ == 0; return +{ $_[0] => +{} } if @_ == 1; my $h = {}; my $classif = classify sub { substr shift, 0, 1 }, @_; for my $key (sort keys %$classif) { my $subtree = suffix_tree( grep "$_", map { substr $_, 1 } @{$classif->{$key}} ); my @subkeys = keys %$subtree; if (@subkeys == 1) { my $subkey = shift @subkeys; $h->{"$key$subkey"} = $subtree->{$subkey}; } else { $h->{$key} = $subtree } } return $h;
}
print +Dumper suffix_tree suffixes 'banana$';</lang>
- Output:
$VAR1 = { '$' => {}, 'a' => { '$' => {}, 'na' => { 'na$' => [], '$' => {} } }, 'banana$' => [], 'na' => { 'na$' => [], '$' => {} } };
Perl 6
<lang Perl 6>multi suffix-tree(Str $str) { suffix-tree map &flip, [\~] $str.flip.comb } multi suffix-tree(@a) {
hash @a == 0 ?? () !! @a == 1 ?? @a[0] => [] !! gather for @a.classify(*.substr(0, 1)) { my $subtree = suffix-tree(grep *.chars, map *.substr(1), .value[]); if $subtree == 1 { my $pair = $subtree.pick; take .key ~ $pair.key => $pair.value; } else { take .key => $subtree; } }
}
sub edges($tree) {
gather for $tree[] { .take for .key, edges .value; }
}
say sort edges suffix-tree 'banana$';</lang>
Output matches the one in the task description.
Extra credit: <lang Perl 6>my $tree = root => suffix-tree 'banana$'; .say for visualize-tree $tree, *.key, *.value.list;</lang>
- Output:
root ├─$ ├─a │ ├─$ │ └─na │ ├─$ │ └─na$ ├─na │ ├─$ │ └─na$ └─banana$
Racket
See Suffix trees with Ukkonen’s algorithm by Danny Yoo for more information on how to use suffix trees in Racket.
<lang racket>
- lang racket
(require (planet dyoo/suffixtree)) (define tree (make-tree)) (tree-add! tree (string->label "rosettacode$")) (for ([i (in-naturals)]
[c (node-children (tree-root tree))]) (printf "~a: ~a\n" i (label->string (node-up-label c))))
</lang> Output: <lang racket> 0: $ 1: e 2: de$ 3: o 4: code$ 5: acode$ 6: t 7: settacode$ 8: rosettacode$ </lang>