Suffix tree: Difference between revisions

Content added Content deleted

Inline

Revision as of 19:41, 3 August 2013

A suffix tree is a data structure commonly used in string algorithms. Basically, for any string, its suffix tree is a rooted tree where each edge is labelled, and where the concatenation of all the labels from the root to a leaf uniquely identifies a suffix of the string.

For this task, build the suffix tree of the string "banana$", and show that its edges are:

$ $ $ $ a banana$ na na na$ na$

Adding a $ sign at the end of the string is a common practice. Here we shall not try to explain the rationale behind it, though.

Extra-credit: use the visualize a tree task in order to show the whole tree.

Perl

Translation of: Perl 6

<lang Perl>use strict; use warnings; use Data::Dumper;

sub classify {

   my ($f, $h) = (shift, {});
   for (@_) { push @{$h->{$f->($_)}}, $_ }
   return $h;

} sub suffixes {

   my $str = shift;
   map { substr $str, $_ } 0 .. length($str) - 1;

} sub suffix_tree {

   return +{} if @_ == 0;
   return +{ $_[0] => [] } if @_ == 1;
   my $h = {};
   my $classif = classify sub { substr shift, 0, 1 }, @_;
   for my $key (sort keys %$classif) {
       my $subtree = suffix_tree(
           grep "$_", map { substr $_, 1 } @{$classif->{$key}}
       );
       my @subkeys = keys %$subtree;
       if (@subkeys == 1) {
           my $subkey = shift @subkeys;
           $h->{"$key$subkey"} = $subtree->{$subkey};
       } else { $h->{$key} = $subtree }
   }
   return $h;

}

print +Dumper suffix_tree suffixes 'banana$';</lang>

Output:

$VAR1 = {
          '$' => {},
          'a' => {
                   '$' => {},
                   'na' => {
                             'na$' => [],
                             '$' => {}
                           }
                 },
          'banana$' => [],
          'na' => {
                    'na$' => [],
                    '$' => {}
                  }
        };

Perl 6

<lang Perl 6>multi suffix-tree(Str $str) { suffix-tree map &flip, [\~] $str.flip.comb } multi suffix-tree(@a) {

   hash
   @a == 0 ?? () !!
   @a == 1 ?? @a[0] => [] !!
   gather for @a.classify(*.substr(0, 1)) {
       my $subtree = suffix-tree(grep *.chars, map *.substr(1), .value[]);
       if $subtree == 1 {
           my $pair = $subtree.pick;
           take .key ~ $pair.key => $pair.value;
       } else {
           take .key => $subtree;
       }
   }

}

sub edges($tree) {

   gather for $tree[] {
       .take for .key, edges .value;
   }

}

say sort edges suffix-tree 'banana$';</lang>

Output matches the one in the task description.

Extra credit: <lang Perl 6>my $tree = root => suffix-tree 'banana$'; .say for visualize-tree $tree, *.key, *.value.list;</lang>

Output:

root
├─$
├─a
│ ├─$
│ └─na
│   ├─$
│   └─na$
├─na
│ ├─$
│ └─na$
└─banana$

Racket

See Suffix trees with Ukkonen’s algorithm by Danny Yoo for more information on how to use suffix trees in Racket.

lang racket

(require (planet dyoo/suffixtree)) (define tree (make-tree)) (tree-add! tree (string->label "rosettacode$")) (for ([i (in-naturals)]

     [c (node-children (tree-root tree))])
 (printf "~a: ~a\n" i (label->string (node-up-label c))))

</lang> Output: <lang racket> 0: $ 1: e 2: de$ 3: o 4: code$ 5: acode$ 6: t 7: settacode$ 8: rosettacode$ </lang>