Suffix tree: Difference between revisions
(note about the $ sign) |
(Perl section) |
||
Line 12: | Line 12: | ||
Extra-credit: use the [[visualize a tree]] task in order to show the whole tree. |
Extra-credit: use the [[visualize a tree]] task in order to show the whole tree. |
||
=={{header|Perl}}== |
|||
{{trans|Perl 6}} |
|||
<lang Perl>use strict; |
|||
use warnings; |
|||
use Data::Dumper; |
|||
sub classify { |
|||
my ($f, $h) = (shift, {}); |
|||
for (@_) { push @{$h->{$f->($_)}}, $_ } |
|||
return $h; |
|||
} |
|||
sub suffixes { |
|||
my $str = shift; |
|||
map { substr $str, $_ } 0 .. length($str) - 1; |
|||
} |
|||
sub suffix_tree { |
|||
return +{} if @_ == 0; |
|||
return +{ $_[0] => [] } if @_ == 1; |
|||
my $h = {}; |
|||
my $classif = classify sub { substr shift, 0, 1 }, @_; |
|||
for my $key (sort keys %$classif) { |
|||
my $subtree = suffix_tree( |
|||
grep "$_", map { substr $_, 1 } @{$classif->{$key}} |
|||
); |
|||
my @subkeys = keys %$subtree; |
|||
if (@subkeys == 1) { |
|||
my $subkey = shift @subkeys; |
|||
$h->{"$key$subkey"} = $subtree->{$subkey}; |
|||
} else { $h->{$key} = $subtree } |
|||
} |
|||
return $h; |
|||
} |
|||
print +Dumper suffix_tree suffixes 'banana$';</lang> |
|||
{{out}} |
|||
<pre>$VAR1 = { |
|||
'$' => {}, |
|||
'a' => { |
|||
'$' => {}, |
|||
'na' => { |
|||
'na$' => [], |
|||
'$' => {} |
|||
} |
|||
}, |
|||
'banana$' => [], |
|||
'na' => { |
|||
'na$' => [], |
|||
'$' => {} |
|||
} |
|||
};</pre> |
|||
=={{header|Perl 6}}== |
=={{header|Perl 6}}== |
||
<lang Perl 6>multi suffix-tree(Str $str) { suffix-tree map &flip, [\~] $str.flip.comb } |
<lang Perl 6>multi suffix-tree(Str $str) { suffix-tree map &flip, [\~] $str.flip.comb } |
Revision as of 19:41, 3 August 2013
A suffix tree is a data structure commonly used in string algorithms. Basically, for any string, its suffix tree is a rooted tree where each edge is labelled, and where the concatenation of all the labels from the root to a leaf uniquely identifies a suffix of the string.
For this task, build the suffix tree of the string "banana$", and show that its edges are:
$ $ $ $ a banana$ na na na$ na$
Adding a $ sign at the end of the string is a common practice. Here we shall not try to explain the rationale behind it, though.
Extra-credit: use the visualize a tree task in order to show the whole tree.
Perl
<lang Perl>use strict; use warnings; use Data::Dumper;
sub classify {
my ($f, $h) = (shift, {}); for (@_) { push @{$h->{$f->($_)}}, $_ } return $h;
} sub suffixes {
my $str = shift; map { substr $str, $_ } 0 .. length($str) - 1;
} sub suffix_tree {
return +{} if @_ == 0; return +{ $_[0] => [] } if @_ == 1; my $h = {}; my $classif = classify sub { substr shift, 0, 1 }, @_; for my $key (sort keys %$classif) { my $subtree = suffix_tree( grep "$_", map { substr $_, 1 } @{$classif->{$key}} ); my @subkeys = keys %$subtree; if (@subkeys == 1) { my $subkey = shift @subkeys; $h->{"$key$subkey"} = $subtree->{$subkey}; } else { $h->{$key} = $subtree } } return $h;
}
print +Dumper suffix_tree suffixes 'banana$';</lang>
- Output:
$VAR1 = { '$' => {}, 'a' => { '$' => {}, 'na' => { 'na$' => [], '$' => {} } }, 'banana$' => [], 'na' => { 'na$' => [], '$' => {} } };
Perl 6
<lang Perl 6>multi suffix-tree(Str $str) { suffix-tree map &flip, [\~] $str.flip.comb } multi suffix-tree(@a) {
hash @a == 0 ?? () !! @a == 1 ?? @a[0] => [] !! gather for @a.classify(*.substr(0, 1)) { my $subtree = suffix-tree(grep *.chars, map *.substr(1), .value[]); if $subtree == 1 { my $pair = $subtree.pick; take .key ~ $pair.key => $pair.value; } else { take .key => $subtree; } }
}
sub edges($tree) {
gather for $tree[] { .take for .key, edges .value; }
}
say sort edges suffix-tree 'banana$';</lang>
Output matches the one in the task description.
Extra credit: <lang Perl 6>my $tree = root => suffix-tree 'banana$'; .say for visualize-tree $tree, *.key, *.value.list;</lang>
- Output:
root ├─$ ├─a │ ├─$ │ └─na │ ├─$ │ └─na$ ├─na │ ├─$ │ └─na$ └─banana$
Racket
See Suffix trees with Ukkonen’s algorithm by Danny Yoo for more information on how to use suffix trees in Racket.
<lang racket>
- lang racket
(require (planet dyoo/suffixtree)) (define tree (make-tree)) (tree-add! tree (string->label "rosettacode$")) (for ([i (in-naturals)]
[c (node-children (tree-root tree))]) (printf "~a: ~a\n" i (label->string (node-up-label c))))
</lang> Output: <lang racket> 0: $ 1: e 2: de$ 3: o 4: code$ 5: acode$ 6: t 7: settacode$ 8: rosettacode$ </lang>