Sort an outline at every level

From Rosetta Code
Revision as of 10:41, 19 August 2020 by PureFox (talk | contribs) (→‎{{header|Wren}}: Correction to mixed whitespace code.)
Sort an outline at every level is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Write and test a function over an indented plain text outline which either:

  1. Returns a copy of the outline in which the sub-lists at every level of indentation are sorted, or
  2. reports that the indentation characters or widths are not consistent enough to make the outline structure clear.


Your code should detect and warn of at least two types of inconsistent indentation:

  • inconsistent use of whitespace characters (e.g. mixed use of tabs and spaces)
  • inconsistent indent widths. For example, an indentation with an odd number of spaces in an outline in which the unit indent appears to be 2 spaces, or 4 spaces.


Your code should be able to detect and handle both tab-indented, and space-indented (e.g. 4 space, 2 space etc) outlines, without being given any advance warning of the indent characters used, or the size of the indent units.

You should also be able to specify different types of sort, for example, as a minimum, both ascending and descending lexical sorts.

Your sort should not alter the type or size of the indentation units used in the input outline.


(For an application of Indent Respectful Sort, see the Sublime Text package of that name. The Python source text [1] is available for inspection on Github).


Tests

  • Sort every level of the (4 space indented) outline below lexically, once ascending and once descending.
zeta
    beta
    gamma
        lambda
        kappa
        mu
    delta
alpha
    theta
    iota
    epsilon
  • Do the same with a tab-indented equivalent of the same outline.
zeta
	gamma
		mu
		lambda
		kappa
	delta
	beta
alpha
	theta
	iota
	epsilon


The output sequence of an ascending lexical sort of each level should be:

alpha
    epsilon
    iota
    theta
zeta
    beta
    delta
    gamma
        kappa
        lambda
        mu

The output sequence of a descending lexical sort of each level should be:

zeta
    gamma
        mu
        lambda
        kappa
    delta
    beta
alpha
    theta
    iota
    epsilon
  • Attempt to separately sort each of the following two outlines, reporting any inconsistencies detected in their indentations by your validation code.
alpha
    epsilon
	iota
    theta
zeta
    beta
    delta
    gamma
    	kappa
        lambda
        mu
zeta
    beta
   gamma
        lambda
         kappa
        mu
    delta
alpha
    theta
    iota
    epsilon



Haskell

<lang haskell>{-# LANGUAGE OverloadedStrings #-}

import Data.Tree (Tree(..), foldTree) import qualified Data.Text.IO as T import qualified Data.Text as T import qualified Data.List as L import Data.Bifunctor (first) import Data.Ord (comparing) import Data.Char (isSpace)



OUTLINE SORTED AT EVERY LEVEL --------------

sortedOutline :: (Tree T.Text -> Tree T.Text -> Ordering)

             -> T.Text
             -> Either T.Text T.Text

sortedOutline cmp outlineText =

 let xs = T.lines outlineText
 in consistentIndentUnit (nonZeroIndents xs) >>=
    \indentUnit ->
       let forest = forestFromLineIndents $ indentLevelsFromLines xs
           sortedForest =
             subForest $
             foldTree (\x xs -> Node x (L.sortBy cmp xs)) (Node "" forest)
       in Right $ outlineFromForest (<>) indentUnit sortedForest



TESTS --------------------------

main :: IO () main =

 mapM_ (T.putStrLn . (<> "\n")) $
 concat $
 [ \cmp ->
      either id id . sortedOutline cmp <$>
      [spacedOutline, tabbedOutline, confusedOutline, raggedOutline]
 ] <*>
 -- Two lexical sort comparators: AZ and ZA
 [comparing rootLabel, flip (comparing rootLabel)]

spacedOutline, tabbedOutline, confusedOutline, raggedOutline :: T.Text spacedOutline =

 "zeta\n\
   \    beta\n\
   \    gamma\n\
   \        lambda\n\
   \        kappa\n\
   \        mu\n\
   \    delta\n\
   \alpha\n\
   \    theta\n\
   \    iota\n\
   \    epsilon"

tabbedOutline =

 "zeta\n\
   \\tbeta\n\
   \\tgamma\n\
   \\t\tlambda\n\
   \\t\tkappa\n\
   \\t\tmu\n\
   \\tdelta\n\
   \alpha\n\
   \\ttheta\n\
   \\tiota\n\
   \\tepsilon"

confusedOutline =

 "zeta\n\
   \    beta\n\
   \  gamma\n\
   \        lambda\n\
   \  \t    kappa\n\
   \        mu\n\
   \    delta\n\
   \alpha\n\
   \    theta\n\
   \    iota\n\
   \    epsilon"

raggedOutline =

 "zeta\n\
   \    beta\n\
   \   gamma\n\
   \        lambda\n\
   \         kappa\n\
   \        mu\n\
   \    delta\n\
   \alpha\n\
   \    theta\n\
   \    iota\n\
   \    epsilon"



OUTLINE TREES :: SERIALIZED AND DESERIALIZED ------

forestFromLineIndents :: [(Int, T.Text)] -> [Tree T.Text] forestFromLineIndents = go

 where
   go [] = []
   go ((n, s):xs) = Node s (go subOutline) : go rest
     where
       (subOutline, rest) = span ((n <) . fst) xs

indentLevelsFromLines :: [T.Text] -> [(Int, T.Text)] indentLevelsFromLines xs = first (`div` indentUnit) <$> pairs

 where
   pairs = first T.length . T.span isSpace <$> xs
   indentUnit = maybe 1 fst (L.find ((0 <) . fst) pairs)

outlineFromForest :: (T.Text -> a -> T.Text) -> T.Text -> [Tree a] -> T.Text outlineFromForest showRoot tabString forest = T.unlines $ forest >>= go ""

 where
   go indent node =
     showRoot indent (rootLabel node) :
     (subForest node >>= go (T.append tabString indent))



OUTLINE CHECKING - INDENT CHARACTERS AND WIDTHS -----

consistentIndentUnit :: [T.Text] -> Either T.Text T.Text consistentIndentUnit prefixes = minimumIndent prefixes >>= flip checked prefixes

 where
   checked indentUnit xs
     | all ((0 ==) . (`rem` unitLength) . T.length) xs = Right indentUnit
     | otherwise =
       Left
         ("Inconsistent indent depths: " <>
          T.pack (show (T.length <$> prefixes)))
     where
       unitLength = T.length indentUnit

minimumIndent :: [T.Text] -> Either T.Text T.Text minimumIndent prefixes = go $ T.foldr newChar "" $ T.concat prefixes

 where
   newChar c seen
     | c `L.elem` seen = seen
     | otherwise = c : seen
   go cs
     | 1 < length cs =
       Left $ "Mixed indent characters used: " <> T.pack (show cs)
     | otherwise = Right $ L.minimumBy (comparing T.length) prefixes

nonZeroIndents :: [T.Text] -> [T.Text] nonZeroIndents textLines =

 [ s
 | x <- textLines 
 , s <- [T.takeWhile isSpace x]
 , 0 /= T.length s ]</lang>
Output:
alpha
    epsilon
    iota
    theta
zeta
    beta
    delta
    gamma
        kappa
        lambda
        mu


alpha
    epsilon
    iota
    theta
zeta
    beta
    delta
    gamma
        kappa
        lambda
        mu


Mixed indent characters used: "\t "

Inconsistent indent depths: [4,3,8,9,8,4,4,4,4]

zeta
    gamma
        mu
        lambda
        kappa
    delta
    beta
alpha
    theta
    iota
    epsilon


zeta
    gamma
        mu
        lambda
        kappa
    delta
    beta
alpha
    theta
    iota
    epsilon


Mixed indent characters used: "\t "

Inconsistent indent depths: [4,3,8,9,8,4,4,4,4]

Wren

Library: Wren-sort
Library: Wren-fmt

<lang ecmascript>import "/sort" for Sort import "/fmt" for Fmt

var sortedOutline = Fn.new { |originalOutline, ascending|

   var outline = originalOutline.toList // make copy in case we mutate it
   var indent = ""
   var del = "\x7f"
   var sep = "\0"
   var messages = []
   if (outline[0].trimStart(" \t") != outline[0]) {
       System.print("    outline structure is unclear")
       return
   }
   for (i in 1...outline.count) {
       var line = outline[i]
       var lc = line.count
       if (line.startsWith("  ") || line.startsWith(" \t") || line.startsWith("\t")) {
           var lc2 = line.trimStart(" \t").count
           if (lc2 < lc) {
               var currIndent = line[0...lc-lc2]
               if (indent == "") {
                   indent = currIndent
               } else {
                   var correctionNeeded = false
                   if ((currIndent.contains("\t") && !indent.contains("\t")) ||
                       (!currIndent.contains("\t") && indent.contains("\t"))) {
                       messages.add(indent + "corrected inconsistent whitespace use at line '%(line)'")
                       correctionNeeded = true
                   } else if (currIndent.count % indent.count != 0) {
                       messages.add(indent + "corrected inconsistent indent width at line '%(line)'")
                       correctionNeeded = true
                   }
                   if (correctionNeeded) {
                       var mult = (currIndent.count / indent.count).round
                       outline[i] = (indent * mult) + line[lc-lc2..-1]
                   }
               }
           }
       }
   }
   var levels = List.filled(outline.count, 0)
   levels[0] = 1
   var level = 1
   var margin = ""
   while (!levels.all { |l| l > 0 }) {
       var mc = margin.count
       for (i in 1...outline.count) {
           if (levels[i] == 0) {
               var line = outline[i]
               if (line.startsWith(margin) && line[mc] != " " && line[mc] != "\t") levels[i] = level
           }
       }
       margin = margin + indent
       level = level + 1
   }
   var lines = List.filled(outline.count, "")
   lines[0] = outline[0]
   var nodes = []
   for (i in 1...outline.count) {
       if (levels[i] > levels[i-1]) {
           nodes.add((nodes.count == 0) ? outline[i - 1] : sep + outline[i-1])
       } else if (levels[i] < levels[i-1]) {
           var j = levels[i-1] - levels[i]
           for (k in 1..j) nodes.removeAt(-1)
       }
       if (nodes.count > 0) {
           lines[i] = nodes.join() + sep + outline[i]
       } else {
           lines[i] = outline[i]
       }
   }
   if (ascending) {
       Sort.insertion(lines)
   } else {
       var maxLen = lines.reduce(0) { |max, l| (l.count > max) ? l.count : max }
       for (i in 0...lines.count) lines[i] = Fmt.ljust(maxLen, lines[i], del)
       Sort.insertion(lines, true)
   }
   for (i in 0...lines.count) {
       var s = lines[i].split(sep)
       lines[i] = s[-1]
       if (!ascending) lines[i] = lines[i].trimEnd(del)
   }
   if (messages.count > 0) {
       System.print(messages.join("\n"))
       System.print()
   }
   System.print(lines.join("\n"))

}

var outline = [

   "zeta",
   "    beta",
   "    gamma",
   "        lambda",
   "        kappa",
   "        mu",
   "    delta",
   "alpha",
   "    theta",
   "    iota",
   "    epsilon"

]

var outline2 = outline.map { |s| s.replace(" ", "\t") }.toList

var outline3 = [

   "alpha",
   "    epsilon",

" iota",

   "    theta",
   "zeta",
   "    beta",
   "    delta",
   "    gamma",
   "    \t   kappa", // same length but \t instead of space
   "        lambda",
   "        mu"

]

var outline4 = [

   "zeta",
   "    beta",
   "   gamma",
   "        lambda",
   "         kappa",
   "        mu",
   "    delta",
   "alpha",
   "    theta",
   "    iota",
   "    epsilon"

]

System.print("Four space indented outline, ascending sort:") sortedOutline.call(outline, true)

System.print("\nFour space indented outline, descending sort:") sortedOutline.call(outline, false)

System.print("\nTab indented outline, ascending sort:") sortedOutline.call(outline2, true)

System.print("\nTab indented outline, descending sort:") sortedOutline.call(outline2, false)

System.print("\nFirst unspecified outline, ascending sort:") sortedOutline.call(outline3, true)

System.print("\nFirst unspecified outline, descending sort:") sortedOutline.call(outline3, false)

System.print("\nSecond unspecified outline, ascending sort:") sortedOutline.call(outline4, true)

System.print("\nSecond unspecified outline, descending sort:") sortedOutline.call(outline4, false)</lang>

Output:
Four space indented outline, ascending sort:
alpha
    epsilon
    iota
    theta
zeta
    beta
    delta
    gamma
        kappa
        lambda
        mu

Four space indented outline, descending sort:
zeta
    gamma
        mu
        lambda
        kappa
    delta
    beta
alpha
    theta
    iota
    epsilon

Tab indented outline, ascending sort:
alpha
	epsilon
	iota
	theta
zeta
	beta
	delta
	gamma
		kappa
		lambda
		mu

Tab indented outline, descending sort:
zeta
	gamma
		mu
		lambda
		kappa
	delta
	beta
alpha
	theta
	iota
	epsilon

First unspecified outline, ascending sort:
    corrected inconsistent whitespace use at line '    	   kappa'

alpha
    epsilon
        iota
    theta
zeta
    beta
    delta
    gamma
        kappa
        lambda
        mu

First unspecified outline, descending sort:
    corrected inconsistent whitespace use at line '    	   kappa'

zeta
    gamma
        mu
        lambda
        kappa
    delta
    beta
alpha
    theta
    epsilon
        iota

Second unspecified outline, ascending sort:
    corrected inconsistent indent width at line '   gamma'
    corrected inconsistent indent width at line '         kappa'

alpha
    epsilon
    iota
    theta
zeta
    beta
    delta
    gamma
        kappa
        lambda
        mu

Second unspecified outline, descending sort:
    corrected inconsistent indent width at line '   gamma'
    corrected inconsistent indent width at line '         kappa'

zeta
    gamma
        mu
        lambda
        kappa
    delta
    beta
alpha
    theta
    iota
    epsilon