Tree datastructures

From Rosetta Code
Revision as of 08:45, 16 October 2019 by Hout (talk | contribs) (→‎{{header|Python}}: Added a functionally composed variant.)
Tree datastructures is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The following shows a tree of data with nesting denoted by visual levels of indentation:

RosettaCode
    rocks
        code
        comparison
        wiki
    mocks
        golfing

A common datastructure for trees is to define node structures having a name and a, (possibly empty), list of child nodes. The nesting of nodes captures the indentation of the tree. Lets call this the nest form.

Another datastructure for trees is to construct from the root an ordered list of the nodes level of indentation and the name of that node. The indentation for the root node is zero; node 'rocks is indented by one level from the left, and so on. Lets call this the indent form.

0 RosettaCode
1 rocks
2 code
...
Task
  1. Create/use a nest datastructure format and textual representation for arbitrary trees.
  2. Create/use an indent datastructure format and textual representation for arbitrary trees.
  3. Create methods/classes/proceedures/routines etc to:
    1. Change from a nest tree datastructure to an indent one.
    2. Change from an indent tree datastructure to a nest one
  4. Use the above to encode the example at the start into the nest format, and show it.
  5. transform the initial nest format to indent format and show it.
  6. transform the indent format to final nest format and show it.
  7. Compare initial and final nest formats which should be the same.

Show all output on this page.

JavaScript

Parses the initial tree from outline text, and writes out the flat and nested structures in a minimal JSON format: <lang javascript>(() => {

   'use strict';
   // main :: IO ()
   const main = () => {
       // (INDENT, STRING) PAIRS FROM OUTLINE ------------
       const
           indentLevelTuplesA = indentLevelsFromLines(
               lines(strOutlineB)
           );
       // LIST OF TREES FROM LIST OF (INDENT, STRING) PAIRS
       const
           forestA = forestFromIndentLevels(
               indentLevelTuplesA
           );
       // (INDENT, STRING) PAIRS FROM LIST OF TREES ------
       const
           indentLevelTuplesB = indentLevelsFromForest(forestA);
       // LIST OF TREES FROM SECONDARY (INDENT, STRING) PAIRS
       const forestB = forestFromIndentLevels(
           indentLevelTuplesB
       );
       // JSON OUTPUT OF FORESTS AND INDENT TUPLES -------
       console.log('Tree structure from outline:\n')
       console.log(jsonFromForest(forestA));
       console.log('\n\nIndent levels from tree structure:\n')
       console.log(jsonFromIndentLevels(indentLevelTuplesB));
       console.log('\nTree structure from indent levels:\n')
       console.log(jsonFromForest(forestB));
   };
   // CONVERSIONS BETWEEN OUTLINES, TREES, AND (LEVEL, VALUE) PAIRS
   // indentLevelsFromLines :: [String] -> [(Int, String)]
   const indentLevelsFromLines = xs => {
       const
           indentTextPairs = xs.map(compose(
               firstArrow(length), span(isSpace)
           )),
           indentUnit = minimum(indentTextPairs.flatMap(pair => {
               const w = fst(pair);
               return 0 < w ? [w] : [];
           }));
       return indentTextPairs.map(
           firstArrow(flip(div)(indentUnit))
       );
   };
   // forestFromIndentLevels :: [(Int, String)] -> [Tree String]
   const forestFromIndentLevels = tuples => {
       const go = xs =>
           0 < xs.length ? (() => {
               const [n, s] = Array.from(xs[0]);
               // Lines indented under this line,
               // tupled with all the rest.
               const [firstTreeLines, rest] = Array.from(
                   span(x => n < x[0])(xs.slice(1))
               );
               // This first tree, and then the rest.
               return [Node(s)(go(firstTreeLines))]
                   .concat(go(rest));
           })() : [];
       return go(tuples);
   };
   // indentLevelsFromForest :: [Tree a] -> [(Int, a)]
   const indentLevelsFromForest = trees => {
       const go = n => node => [
               [n, node.root]
           ]
           .concat(node.nest.flatMap(go(1 + n)))
       return trees.flatMap(go(0));
   };
   // JSON RENDERING OF NESTED LINES AND (LEVEL, VALUE) PAIRS
   // jsonFromForest :: [Tree a] -> JSON String
   const jsonFromForest = trees =>
       JSON.stringify(
           nestedListsFromForest(trees),
           null, 2
       );
   // nestedListsFromForest :: [Tree a] -> NestedList a
   const nestedListsFromForest = xs => {
       const go = node => [node.root, node.nest.map(go)];
       return xs.map(go);
   };
   // jsonFromIndentLevels :: [(Int, String)] -> JSON String
   const jsonFromIndentLevels = xs =>
       JSON.stringify(
           xs.map(x => Array.from(x)),
           null, 2
       );


   // GENERIC FUNCTIONS ----------------------------
   // Node :: a -> [Tree a] -> Tree a
   const Node = v => xs => ({
       type: 'Node',
       root: v, // any type of value (consistent across tree)
       nest: xs || []
   });
   // Tuple (,) :: a -> b -> (a, b)
   const Tuple = a => b => ({
       type: 'Tuple',
       '0': a,
       '1': b,
       length: 2
   });
   // compose (<<<) :: (b -> c) -> (a -> b) -> a -> c
   const compose = (...fs) =>
       x => fs.reduceRight((a, f) => f(a), x);
   // div :: Int -> Int -> Int
   const div = x => y => Math.floor(x / y);
   // firstArrow :: (a -> b) -> ((a, c) -> (b, c))
   const firstArrow = f => xy => Tuple(f(xy[0]))(
       xy[1]
   );
   // flip :: (a -> b -> c) -> b -> a -> c
   const flip = f =>
       1 < f.length ? (
           (a, b) => f(b, a)
       ) : (x => y => f(y)(x));
   // foldl1 :: (a -> a -> a) -> [a] -> a
   const foldl1 = f => xs =>
       1 < xs.length ? xs.slice(1)
       .reduce(uncurry(f), xs[0]) : xs[0];
   // fst :: (a, b) -> a
   const fst = tpl => tpl[0];
   // isSpace :: Char -> Bool
   const isSpace = c => /\s/.test(c);
   // Returns Infinity over objects without finite length.
   // This enables zip and zipWith to choose the shorter
   // argument when one is non-finite, like cycle, repeat etc
   // length :: [a] -> Int
   const length = xs =>
       (Array.isArray(xs) || 'string' === typeof xs) ? (
           xs.length
       ) : Infinity;
   // lines :: String -> [String]
   const lines = s => s.split(/[\r\n]/);
   // minimum :: Ord a => [a] -> a
   const minimum = xs =>
       0 < xs.length ? (
           foldl1(a => x => x < a ? x : a)(xs)
       ) : undefined;
   // span :: (a -> Bool) -> [a] -> ([a], [a])
   const span = p => xs => {
       const iLast = xs.length - 1;
       return splitAt(
           until(i => iLast < i || !p(xs[i]))(
               succ
           )(0)
       )(xs);
   };
   // splitAt :: Int -> [a] -> ([a], [a])
   const splitAt = n => xs =>
       Tuple(xs.slice(0, n))(
           xs.slice(n)
       );
   // succ :: Enum a => a -> a
   const succ = x => 1 + x;
   // uncurry :: (a -> b -> c) -> ((a, b) -> c)
   const uncurry = f =>
       (x, y) => f(x)(y);
   // until :: (a -> Bool) -> (a -> a) -> a -> a
   const until = p => f => x => {
       let v = x;
       while (!p(v)) v = f(v);
       return v;
   };
   // SAMPLE OUTLINES ------------------------------------
   const strOutlineA = `Heilmeier catechism
   Objectives and benefits
       What are you trying to do?
           Articulate your objectives using absolutely no jargon.
           What are the problems you address ?
               How is it done today,
               and what are the limits of current practice?
           What is your solution ?
               What is new in your approach
               and why do you think it will be successful?
       Who cares? If you are successful, what difference will it make?
   Costs
       What are the risks?
       How much will it cost?
       How long will it take?
   Indicators
       What are the mid-term and final “exams” to check for success?`;
   const strOutlineB = `Rosetta stone
   is a granodiorite stele
       engraved
           with Greek and Egyptian texts
       in different scripts.
   which shed new light
       on various homologies.`;
   // MAIN ---
   return main();

})();</lang>

Output:
Tree structure from outline:

[
  [
    "Rosetta stone",
    [
      [
        "is a granodiorite stele",
        [
          [
            "engraved",
            [
              [
                "with Greek and Egyptian texts",
                []
              ]
            ]
          ],
          [
            "in different scripts.",
            []
          ]
        ]
      ],
      [
        "which shed new light",
        [
          [
            "on various homologies.",
            []
          ]
        ]
      ]
    ]
  ]
]

Indent levels from tree structure:

[
  [
    0,
    "Rosetta stone"
  ],
  [
    1,
    "is a granodiorite stele"
  ],
  [
    2,
    "engraved"
  ],
  [
    3,
    "with Greek and Egyptian texts"
  ],
  [
    2,
    "in different scripts."
  ],
  [
    1,
    "which shed new light"
  ],
  [
    2,
    "on various homologies."
  ]
]

Tree structure from indent levels:

[
  [
    "Rosetta stone",
    [
      [
        "is a granodiorite stele",
        [
          [
            "engraved",
            [
              [
                "with Greek and Egyptian texts",
                []
              ]
            ]
          ],
          [
            "in different scripts.",
            []
          ]
        ]
      ],
      [
        "which shed new light",
        [
          [
            "on various homologies.",
            []
          ]
        ]
      ]
    ]
  ]
]

Python

Procedural

Just arranges the standard lists and tuples for the datastructures allowing pprint to show the different arrangement of storage.

<lang python>from pprint import pprint as pp from collections import namedtuple

def to_indent(node, depth=0, flat=None):

   if flat is None:
       flat = []
   if node:
       flat.append((depth, node[0]))
   for child in node[1]:
       to_indent(child, depth + 1, flat)
   return flat

def to_nest(lst, depth=0, level=None):

   if level is None:
       level = []
   while lst:
       d, name = lst[0]
       if d == depth:
           children = []
           level.append((name, children))
           lst.pop(0)
       elif d > depth:  # down
           to_nest(lst, d, children)
       elif d < depth:  # up
           return
   return level[0] if level else None
                   

if __name__ == '__main__':

   print('Start Nest format:')
   nest = ('RosettaCode', [('rocks', [('code', []), ('comparison', []), ('wiki', [])]), 
                           ('mocks', [('golfing', [])])])
   pp(nest, width=25)
   print('\n... To Indent format:')
   as_ind = to_indent(nest)
   pp(as_ind, width=25)
   print('\n... To Nest format:')
   as_nest = to_nest(as_ind)
   pp(as_nest, width=25)
   if nest != as_nest:
       print("Whoops round-trip issues")</lang>
Output:
Start Nest format:
('RosettaCode',
 [('rocks',
   [('code', []),
    ('comparison', []),
    ('wiki', [])]),
  ('mocks',
   [('golfing', [])])])

... To Indent format:
[(0, 'RosettaCode'),
 (1, 'rocks'),
 (2, 'code'),
 (2, 'comparison'),
 (2, 'wiki'),
 (1, 'mocks'),
 (2, 'golfing')]

... To Nest format:
('RosettaCode',
 [('rocks',
   [('code', []),
    ('comparison', []),
    ('wiki', [])]),
  ('mocks',
   [('golfing', [])])])

Functional

Using a Node constructor with root and nest keys for the value and sub-forest of each tree node, and serialising both trees and nesting-level lists to JSON-compatible formats.

(Initial tree constructed as the parse of an outline text)

Works with: Python version 3.7

<lang python>Tree data structures

from itertools import chain, takewhile import json


  1. Node :: a -> [Tree a] -> Tree a

def Node(v):

   Constructor for a Tree node which connects a
      value of some kind to a list of zero or
      more child trees.
   
   return lambda xs: {'type': 'Tree', 'root': v, 'nest': xs}


  1. forestFromNesttLevels :: [(Int, String)] -> [Tree String]

def forestFromNestLevels(tuples):

   A list of trees derived from a list of lines paired
      with integers giving their levels of indentation.
   
   def go(xs):
       if xs:
           (intIndent, txt) = xs[0]
           (firstTreeLines, rest) = span(
               lambda x: intIndent < x[0]
           )(xs[1:])
           return [Node(txt)(go(firstTreeLines))] + go(rest)
       else:
           return []
   return go(tuples)


  1. nestLevelsFromForest :: [Tree a] -> [(Int, a)]

def nestLevelsFromForest(xs):

   A flat list of (nest level, value) tuples,
      representing a series of trees.
   
   def go(level):
       return lambda node: [(level, node['root'])] + concatMap(
           go(1 + level)
       )(node['nest'])
   return concatMap(go(0))(xs)


  1. INITIAL TREE FROM PARSE OF OUTLINE TEXT -----------------
  1. indentLevelsFromLines :: [String] -> [(Int, String)]

def indentLevelsFromLines(xs):

   Each input line stripped of leading
      white space, and tupled with a preceding integer
      giving its level of indentation from 0 upwards.
   
   indentTextPairs = [
       (n, s[n:]) for (n, s)
       in ((len(list(takewhile(isSpace, x))), x) for x in xs)
   ]
   indentUnit = min(concatMap(
       lambda x: [x[0]] if x[0] else []
   )(indentTextPairs))
   return [
       (x[0] // indentUnit, x[1])
       for x in indentTextPairs
   ]


  1. JSON SERIALISATION --------------------------------------
  1. forestJSON :: [Tree a] -> JSON String

def forestJSON(trees):

   A simple JSON serialisation of a list of trees, with
      each tree node represented as a [value, nodes] pair.
   
   return json.dumps(
       forestAsNestedPairs(trees),
       indent=2
   )


  1. forestAsNestedPairs :: [Tree a] -> NestedPair [(a, [NestedPair])]

def forestAsNestedPairs(xs):

   A simple nested pair representation of a tree.
   def go(node):
       return [node['root'], [go(x) for x in node['nest']]]
   return [go(x) for x in xs]


  1. TEST -----------------------------------------------------------
  2. main :: IO ()

def main():

   Conversion from trees to flat lists of nest levels,
      and back again, with each stage shown as a JSON
      string.
   
   forestA = forestFromNestLevels(
       indentLevelsFromLines(OUTLINE.splitlines())
   )
   nestLevels = nestLevelsFromForest(forestA)
   forestB = forestFromNestLevels(nestLevels)
   for x in [
           'Parse tree from outline text:\n',
           forestJSON(forestA),
           '\nNesting level list from tree:\n',
           json.dumps(nestLevels, indent=2),
           '\nTree rebuilt from nesting level list:\n',
           forestJSON(forestB),
   ]:
       print(x)


  1. GENERIC -------------------------------------------------


  1. concatMap :: (a -> [b]) -> [a] -> [b]

def concatMap(f):

   A concatenated list or string over which a function f
      has been mapped.
      The list monad can be derived by using an (a -> [b])
      function which wraps its output in a list (using an
      empty list to represent computational failure).
   
   return lambda xs: (.join if isinstance(xs, str) else list)(
       chain.from_iterable(map(f, xs))
   )


  1. isSpace :: Char -> Bool
  2. isSpace :: String -> Bool

def isSpace(s):

   True if s is not empty, and
      contains only white space.
   
   return s.isspace()


  1. span :: (a -> Bool) -> [a] -> ([a], [a])

def span(p):

   The longest (possibly empty) prefix of xs
      that contains only elements satisfying p,
      tupled with the remainder of xs.
      span p xs is equivalent to (takeWhile p xs, dropWhile p xs).
   
   def go(xs):
       prefix = list(takewhile(p, xs))
       return (prefix, xs[len(prefix):])
   return lambda xs: go(xs)


  1. MAIN ---

if __name__ == '__main__':

   OUTLINE = Rosetta stone
   is a granodiorite stele
       engraved
           with Greek and Egyptian texts
       in different scripts.
   which shed new light
       on various homologies.
   main()</lang>
Output:
Parse tree from outline text:

[
  [
    "Rosetta stone",
    [
      [
        "is a granodiorite stele",
        [
          [
            "engraved",
            [
              [
                "with Greek and Egyptian texts",
                []
              ]
            ]
          ],
          [
            "in different scripts.",
            []
          ]
        ]
      ],
      [
        "which shed new light",
        [
          [
            "on various homologies.",
            []
          ]
        ]
      ]
    ]
  ]
]

Nesting level list from tree:

[
  [
    0,
    "Rosetta stone"
  ],
  [
    1,
    "is a granodiorite stele"
  ],
  [
    2,
    "engraved"
  ],
  [
    3,
    "with Greek and Egyptian texts"
  ],
  [
    2,
    "in different scripts."
  ],
  [
    1,
    "which shed new light"
  ],
  [
    2,
    "on various homologies."
  ]
]

Tree rebuilt from nesting level list:

[
  [
    "Rosetta stone",
    [
      [
        "is a granodiorite stele",
        [
          [
            "engraved",
            [
              [
                "with Greek and Egyptian texts",
                []
              ]
            ]
          ],
          [
            "in different scripts.",
            []
          ]
        ]
      ],
      [
        "which shed new light",
        [
          [
            "on various homologies.",
            []
          ]
        ]
      ]
    ]
  ]
]