Brace expansion

From Rosetta Code
Revision as of 16:52, 31 January 2014 by rosettacode>TimToady (→‎{{header|Haskell}}: mark incomplete, needs output)
Brace expansion is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Brace expansion is a type of parameter expansion made popular by Unix shells, where it allows users to specify multiple similar string parameters without having to type them all out. E.g. the parameter enable_{audio,video} would be interpreted as if both enable_audio and enable_video had been specified.

The Task

Write a function that can perform brace expansion on any input string.
Demonstrate how it would be used, and that it passes the four test cases given below.

Specification

For the purposes of this task, "brace expansion" shall be defined according to the following specification. Please try to comply with it exactly, even when inconvenient, to ensure that all implementations are comparable. (See #Perl for a reference implementation.)

  • These characters (and only these!) are to be interpreted as potential meta-characters in the input string:
    { opens an alternation group
    , separates alternatives inside an alternation group
    } closes the last opened alternation group
    \ escapes the immediately following character, to prevent it from being treated as a meta-character
  • A valid alternation group consists of a balanced pair of braces containing two or more comma-separated alternatives (e.g. {,} or {aa,aa} or {,aa,bb} or {aa,bb,,cc}). The brace and comma characters which make up such a group are not passed along to the output; instead, the number of outputs is multiplied by the number of alternatives, each copy featuring one of the alternatives at that position in the string  (e.g. aa{,1,2,1}b --> aab  aa1b  aa2b  aa1b ).
  • The multiplication of outputs caused by two or more alternation groups in the same string is cumulative  (e.g. {a,b}{1,2} --> a1  a2  b1  b2 ).
  • The rules for parsing/expanding the whole string also apply recursively to each alternative of an alternation group, so nested groups result in nested expansion  (e.g. {a,b{1,2}} --> a  b1  b2 ).
  • Extraneous brace and comma characters which are not part of a valid alternation group, must be passed along to the output unchanged. These are:
    • commas which are not contained by any balanced brace pair  (e.g. a,b{1,2} --> a,b1  a,b2 )
    • brace pairs which have fewer than two alternatives, that is, contain no commas or braces  (e.g. a{b{1,2}c{}}) --> a{b1c{}}  a{b2c{}} ) Neither of the braces is eligible to participate in delimiting a valid comma group; both braces and any contents are considered literal.
    • opening braces which don't have a matching closing brace, or vice versa  (e.g. a{b{1,2}c --> a{b1c  a{b2c ) For this purpose, the innermost braces that produce a valid alternating group are taken as the intended pair, and the outer extra left or right brace is taken as literal.
  • All backslashes must be passed along to the output unchanged, despite being interpreted as meta-characters  (e.g. a\\{1,2\,c} --> a\\1  a\\2\,c ). A single backslash at the end of the string is taken as a literal.
  • The code should be able to handle inputs with arbitrary numbers of brace groups and nesting levels (limited only by the computer's memory or by a reasonably high recursion limit).
  • Results should be returned in the form of an array/list-like data structure (suitable for further processing within the same program), and in the order demonstrated below (i.e. as if the original brace groups were used as sort keys for lexicographic ordering).

Test Cases

Input Ouput

~/{Downloads,Pictures}/*.{jpg,gif,png}

~/Downloads/*.jpg
~/Downloads/*.gif
~/Downloads/*.png
~/Pictures/*.jpg
~/Pictures/*.gif
~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.

Itemized, please.
Itemize, please.
Italicized, please.
Italicize, please.
Iterated, please.
Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!

cowbell!
more cowbell!
gotta have more cowbell!
gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}

{}} some {\\edge }{ cases, here\\\}
{}} some {\\edgy }{ cases, here\\\}

Motivation

The point of this task is to show how different programming languages deal with a real-life problem that combines:

  • simple lexing (to deal with backslash escapes)
  • fault-tolerant parsing (i.e. handling all inputs, even partially invalid ones, without aborting)
  • list transformation (Conceptually, this task involves parsing linear information into a tree, and then recursively flattening it while applying the Cartesian product to each level. But feel free to take shortcuts!)

So if your language provides ready-to-use brace expansion functionality, then by all means mention it, but please also show a solution that implements it manually.

Haskell

This example is incomplete. Please show output of test cases Please ensure that it meets all task requirements and remove this message.

"Here is a direct translation to Haskell using parsec" (of an earlier version of the Perl 6 solution):

<lang haskell>import Control.Applicative (pure, (<$>), (<*>)) import Control.Monad (forever) import Text.Parsec

parser :: Parsec String u [String] parser = expand <$> many (try alts <|> try alt1 <|> escape <|> pure . pure <$> anyChar)

   where alts = concat <$> between (char '{') (char '}') (alt `sepBy2` char ',')
         alt1 = (\s -> ["{" ++ s ++ "}"]) <$> between (char '{') (char '}') (many $ noneOf ",{}")
         alt = expand <$> many (try alts <|> try alt1 <|> escape <|> pure . pure <$> noneOf ",}")
         escape = pure <$> sequence [char '\\', anyChar]
         expand = foldr (\x xs -> (++) <$> x <*> xs) [""]
         p `sepBy2` sep = (:) <$> p <*> many1 (sep >> p)

main :: IO () main = forever $ parse parser [] <$> getLine >>= either print (mapM_ putStrLn)</lang>

Perl

Perl has a built-in glob function which does brace expansion, but it can't be used to solve this task because it also does shell-like word splitting, wildcard expansion, and tilde expansion at the same time. The File::Glob core module gives access to the more low-level bsd_glob function which actually supports exclusive brace expansion, however it does not comply with this task's specification when it comes to preserving backslashes and handling unbalanced brace characters.

So here is a manual solution that implements the specification precisely:

<lang perl>sub brace_expand {

   my $input = shift;
   my @stack = ([my $current = []]);
   
   while ($input =~ /\G ((?:[^\\{,}]++ | \\(?:.|\z))++ | . )/gx) {
       if ($1 eq '{') {
           push @stack, [$current = []];
       }
       elsif ($1 eq ',' && @stack > 1) {
           push @{$stack[-1]}, ($current = []);
       }
       elsif ($1 eq '}' && @stack > 1) {
           my $group = pop @stack;
           $current = $stack[-1][-1];
           
           # handle the case of brace pairs without commas:
           @{$group->[0]} = map { "{$_}" } @{$group->[0]} if @$group == 1;
           
           @$current = map {
               my $c = $_;
               map { map { $c . $_ } @$_ } @$group;
           } @$current;
       }
       else { $_ .= $1 for @$current; }
   }
   
   # handle the case of missing closing braces:
   while (@stack > 1) {
       my $right = pop @{$stack[-1]};
       my $sep;
       if (@{$stack[-1]}) { $sep = ',' }
       else               { $sep = '{'; pop @stack }
       $current = $stack[-1][-1];
       @$current = map {
           my $c = $_;
           map { $c . $sep . $_ } @$right;
       } @$current;
   }
   
   return @$current;

}</lang>

Usage demonstration: <lang perl>while (my $input = ) {

   chomp($input);
   print "$input\n";
   print "    $_\n" for brace_expand($input);
   print "\n";

}

__DATA__ ~/{Downloads,Pictures}/*.{jpg,gif,png} It{{em,alic}iz,erat}e{d,}, please. {,{,gotta have{ ,\, again\, }}more }cowbell! {}} some {\\{edge,edgy} }{ cases, here\\\}</lang>

Output:
~/{Downloads,Pictures}/*.{jpg,gif,png}
    ~/Downloads/*.jpg
    ~/Downloads/*.gif
    ~/Downloads/*.png
    ~/Pictures/*.jpg
    ~/Pictures/*.gif
    ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
    Itemized, please.
    Itemize, please.
    Italicized, please.
    Italicize, please.
    Iterated, please.
    Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
    cowbell!
    more cowbell!
    gotta have more cowbell!
    gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
    {}} some {\\edge }{ cases, here\\\}
    {}} some {\\edgy }{ cases, here\\\}

Perl 6

The task description allows the taking of shortcuts, but please note that we are not taking any shortcuts here. The solution is short because this particular problem happens to map quite naturally to the strengths of Perl 6.

First, the parsing is handled with a grammar that can backtrack in the few places this problem needs it. The 2..* is the exact quantifier needed for valid items, and the % operator requires the comma between each quantified item to its left. Note by the way that the * quantifiers do not backtrack, because the token keyword suppresses that; all the backtracking here fails over to a different alternative (that is, things separated by the | character in the grammar.)

On the other end, recursively walking the parse tree and doing the cartesian concatenation is handled by use of the X~ operator, which is a "cross" metaoperator used on a simple ~ concatenation. As a list infix operator, X~ does not care how many items are on either side, which is just what you want in this case. Most of these cross-concats have a single literal item on one side or the other; notably we start off the @sofar array with a single empty string, which is the identity value for a cross-concat. <lang perl6>grammar BraceExpansion {

   token TOP { ( <meta> | . )* }
   token meta {
       | '{' <alts> '}'
       | '{' <-[ { , } ]>*? '}'
       | \\ .
   }
   token alts { <alt> ** 2..* % ',' }
   token alt { ( <meta> | <-[ , } ]> )* }

}

sub crosswalk($/) {

   my @sofar = ;
   for $0[] -> $/ {
       @sofar = @sofar X~ ($<meta><alts> ?? $<meta><alts><alt>».&crosswalk !! ~$/);
   }
   @sofar;

}

sub brace-expand($s) { crosswalk BraceExpansion.parse($s) }</lang> And to test... <lang perl6>sub bxtest(*@s) {

   for @s -> $s {
       say "\n$s";
       for brace-expand($s) {
           say "    ", $_;
       }
   }

}

bxtest Q:to/END/.lines;

   ~/{Downloads,Pictures}/*.{jpg,gif,png}
   It{{em,alic}iz,erat}e{d,}, please.
   {,{,gotta have{ ,\, again\, }}more }cowbell!
   {}} some {\\{edge,edgy} }{ cases, here\\\}
   a{b{1,2}c
   a{1,2}b}c
   a{1,{2},3}b
   a{b{1,2}c{}}
   END</lang>
Output:
~/{Downloads,Pictures}/*.{jpg,gif,png}
    ~/Downloads/*.jpg
    ~/Downloads/*.gif
    ~/Downloads/*.png
    ~/Pictures/*.jpg
    ~/Pictures/*.gif
    ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
    Itemized, please.
    Itemize, please.
    Italicized, please.
    Italicize, please.
    Iterated, please.
    Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
    cowbell!
    more cowbell!
    gotta have more cowbell!
    gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
    {}} some {\\edge }{ cases, here\\\}
    {}} some {\\edgy }{ cases, here\\\}

a{b{1,2}c
    a{b1c
    a{b2c

a{1,2}b}c
    a1b}c
    a2b}c

a{1,{2},3}b
    a1b
    a{2}b
    a3b

a{b{1,2}c{}}
    a{b1c{}}
    a{b2c{}}

Oh, and if you're sufficiently steeped in functional programming to dis-admire that single mutable variable, @sofar, here's an equivalent version of crosswalk that brings the X~ out as a reduction, er, I mean, a fold operation. (If you were wondering, or even if you weren't, it doesn't matter whether it's a foldl or a foldr because cross operators are list associative, not left or right associative.) <lang perl6>sub crosswalk($/) {

   [X~] $0.map: -> $/ { $<meta><alts> ?? [$<meta><alts><alt>».&crosswalk] !! ~$/ }

}</lang>

Python

<lang python>def getitem(s, depth=0):

   out = [""]
   while s:
       c = s[0]
       if depth and (c == ',' or c == '}'):
           return out,s
       if c == '{':
           x = getgroup(s[1:], depth+1)
           if x:
               out,s = [a+b for a in out for b in x[0]], x[1]
               continue
       if c == '\\' and len(s) > 1:
           s, c = s[1:], c + s[1]
       out, s = [a+c for a in out], s[1:]
   return out,s

def getgroup(s, depth):

   out, comma = [], False
   while s:
       g,s = getitem(s, depth)
       if not s: break
       out += g
       if s[0] == '}':
           if comma: return out, s[1:]
           return ['{' + a + '}' for a in out], s[1:]
       if s[0] == ',':
           comma,s = True, s[1:]
   return None
  1. stolen cowbells from perl6 example

for s in ~/{Downloads,Pictures}/*.{jpg,gif,png} It{{em,alic}iz,erat}e{d,}, please. {,{,gotta have{ ,\, again\, }}more }cowbell! {}} some {\\\\{edge,edgy} }{ cases, here\\\\\} a{b{1,2}c a{1,2}b}c a{1,{2},3}b a{b{1,2}c{}}.split('\n'):

   print "\n\t".join([s] + getitem(s)[0]) + "\n"</lang>
Output:
~/{Downloads,Pictures}/*.{jpg,gif,png}
        ~/Downloads/*.jpg
        ~/Downloads/*.gif
        ~/Downloads/*.png
        ~/Pictures/*.jpg
        ~/Pictures/*.gif
        ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
        Itemized, please.
        Itemize, please.
        Italicized, please.
        Italicize, please.
        Iterated, please.
        Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
        cowbell!
        more cowbell!
        gotta have more cowbell!
        gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
        {}} some {\\edge }{ cases, here\\\}
        {}} some {\\edgy }{ cases, here\\\}

a{b{1,2}c
        a{b1c
        a{b2c

a{1,2}b}c
        a1b}c
        a2b}c

a{1,{2},3}b
        a1b
        a{2}b
        a3b

a{b{1,2}c{}}
        a{b1c{}}
        a{b2c{}}