Brace expansion: Difference between revisions

Content added Content deleted

Inline

Revision as of 01:03, 1 February 2014

Brace expansion is a type of parameter expansion made popular by Unix shells, where it allows users to specify multiple similar string parameters without having to type them all out. E.g. the parameter enable_{audio,video} would be interpreted as if both enable_audio and enable_video had been specified.

The Task

Write a function that can perform brace expansion on any input string.
Demonstrate how it would be used, and that it passes the four test cases given below.

Specification

For the purposes of this task, "brace expansion" shall be defined according to the following specification. Please try to comply with it exactly, even when inconvenient, to ensure that all implementations are comparable. (See #Perl for a reference implementation.)

These characters (and only these!) are to be interpreted as potential meta-characters in the input string:

`{`	opens an alternation group
`,`	separates alternatives inside an alternation group
`}`	closes the last opened alternation group
`\`	escapes the immediately following character, to prevent it from being treated as a meta-character

A valid alternation group consists of a balanced pair of braces containing two or more comma-separated alternatives (e.g. {,} or {aa,aa} or {,aa,bb} or {aa,bb,,cc}). The brace and comma characters which make up such a group are not passed along to the output; instead, the number of outputs is multiplied by the number of alternatives (which need not be unique), each copy featuring one of them in turn at that position in the string (e.g. aa{,1,2,1}b --> aab aa1b aa2b aa1b ).
The multiplication of outputs caused by two or more alternation groups in the same string is cumulative (e.g. {a,b}{1,2} --> a1 a2 b1 b2 ).
The rules for parsing/expanding the whole string also apply recursively to each alternative of an alternation group, so nested groups result in nested expansion (e.g. {a,b{1,2}} --> a b1 b2 ).
Extraneous brace and comma characters which are not part of a valid alternation group, must be passed along to the output unchanged. These are:
- commas which are not contained by any balanced brace pair (e.g. a,b{1,2} --> a,b1 a,b2 )
- brace pairs which have fewer than two alternatives, that is, contain no commas or braces (e.g. a{b{1,2}c{}}) --> a{b1c{}} a{b2c{}} ) Neither of the braces is eligible to participate in delimiting a valid comma group; both braces and any contents are considered literal.
- opening braces which don't have a matching closing brace, or vice versa (e.g. a{b{1,2}c --> a{b1c a{b2c ) For this purpose, the innermost braces that produce a valid alternating group are taken as the intended pair, and the outer extra left or right brace is taken as literal.
All backslashes must be passed along to the output unchanged, despite being interpreted as meta-characters (e.g. a\\{1,2\,c} --> a\\1 a\\2\,c ). A single backslash at the end of the string is taken as a literal.
The code should be able to handle inputs with arbitrary numbers of brace groups and nesting levels (limited only by the computer's memory or by a reasonably high recursion limit).
Results should be returned in the form of an array/list-like data structure (suitable for further processing within the same program), and in the order demonstrated below (i.e. as if the original brace groups were used as sort keys for lexicographic ordering).

Test Cases

Input	Ouput
`~/{Downloads,Pictures}/*.{jpg,gif,png}`	`~/Downloads/.jpg` `~/Downloads/.gif` `~/Downloads/.png` `~/Pictures/.jpg` `~/Pictures/.gif` `~/Pictures/.png`
`It{{em,alic}iz,erat}e{d,}, please.`	`Itemized, please.` `Itemize, please.` `Italicized, please.` `Italicize, please.` `Iterated, please.` `Iterate, please.`
`{,{,gotta have{ ,\, again\, }}more }cowbell!`	`cowbell!` `more cowbell!` `gotta have more cowbell!` `gotta have\, again\, more cowbell!`
`{}} some {\\{edge,edgy} }{ cases, here\\\}`	`{}} some {\\edge }{ cases, here\\\}` `{}} some {\\edgy }{ cases, here\\\}`

Motivation

The point of this task is to show how different programming languages deal with a real-life problem that combines:

simple lexing (to deal with backslash escapes)
fault-tolerant parsing (i.e. handling all inputs, even partially invalid ones, without aborting)
list transformation (Conceptually, this task involves parsing linear information into a tree, and then recursively flattening it while applying the Cartesian product to each level. But feel free to take shortcuts!)

So if your language provides ready-to-use brace expansion functionality, then by all means mention it, but please also show a solution that implements it manually.

Haskell

"Here is a direct translation to Haskell using parsec" (of an earlier version of the Perl 6 solution):

<lang haskell>import Control.Applicative (pure, (<$>), (<*>)) import Control.Monad (forever) import Text.Parsec

parser :: Parsec String u [String] parser = expand <$> many (try alts <|> try alt1 <|> escape <|> pure . pure <$> anyChar)

   where alts = concat <$> between (char '{') (char '}') (alt `sepBy2` char ',')
         alt1 = (\s -> ["{" ++ s ++ "}"]) <$> between (char '{') (char '}') (many $ noneOf ",{}")
         alt = expand <$> many (try alts <|> try alt1 <|> escape <|> pure . pure <$> noneOf ",}")
         escape = pure <$> sequence [char '\\', anyChar]
         expand = foldr (\x xs -> (++) <$> x <*> xs) [""]
         p `sepBy2` sep = (:) <$> p <*> many1 (sep >> p)

main :: IO () main = forever $ parse parser [] <$> getLine >>= either print (mapM_ putStrLn)</lang>

Output:

$ ./bracex
~/{Downloads,Pictures}/*.{jpg,gif,png}
~/Downloads/*.jpg
~/Downloads/*.gif
~/Downloads/*.png
~/Pictures/*.jpg
~/Pictures/*.gif
~/Pictures/*.png
It{{em,alic}iz,erat}e{d,}, please.
Itemized, please.
Itemize, please.
Italicized, please.
Italicize, please.
Iterated, please.
Iterate, please.
{,{,gotta have{ ,\, again\, }}more }cowbell!
cowbell!
more cowbell!
gotta have more cowbell!
gotta have\, again\, more cowbell!
{}} some {\\{edge,edgy} }{ cases, here\\\}
{}} some {\\edge }{ cases, here\\\}
{}} some {\\edgy }{ cases, here\\\}
a{b{1,2}c
a{b1c
a{b2c
a{1,2}b}c
a1b}c
a2b}c
a{1,{2},3}b
a1b
a{2}b
a3b
a{b{1,2}c{}}
a{b1c{}}
a{b2c{}}
^D
bracex: <stdin>: hGetLine: end of file

Perl

Perl has a built-in glob function which does brace expansion, but it can't be used to solve this task because it also does shell-like word splitting, wildcard expansion, and tilde expansion at the same time. The File::Glob core module gives access to the more low-level bsd_glob function which actually supports exclusive brace expansion, however it does not comply with this task's specification when it comes to preserving backslashes and handling unbalanced brace characters.

So here is a manual solution that implements the specification precisely:

<lang perl>sub brace_expand {

   my $input = shift;
   my @stack = ([my $current = []]);
   
   while ($input =~ /\G ((?:[^\\{,}]++ | \\(?:.|\z))++ | . )/gx) {
       if ($1 eq '{') {
           push @stack, [$current = []];
       }
       elsif ($1 eq ',' && @stack > 1) {
           push @{$stack[-1]}, ($current = []);
       }
       elsif ($1 eq '}' && @stack > 1) {
           my $group = pop @stack;
           $current = $stack[-1][-1];
           
           # handle the case of brace pairs without commas:
           @{$group->[0]} = map { "{$_}" } @{$group->[0]} if @$group == 1;
           
           @$current = map {
               my $c = $_;
               map { map { $c . $_ } @$_ } @$group;
           } @$current;
       }
       else { $_ .= $1 for @$current; }
   }
   
   # handle the case of missing closing braces:
   while (@stack > 1) {
       my $right = pop @{$stack[-1]};
       my $sep;
       if (@{$stack[-1]}) { $sep = ',' }
       else               { $sep = '{'; pop @stack }
       $current = $stack[-1][-1];
       @$current = map {
           my $c = $_;
           map { $c . $sep . $_ } @$right;
       } @$current;
   }
   
   return @$current;

}</lang>

Usage demonstration: <lang perl>while (my $input = ) {

   chomp($input);
   print "$input\n";
   print "    $_\n" for brace_expand($input);
   print "\n";

}

__DATA__ ~/{Downloads,Pictures}/*.{jpg,gif,png} It{{em,alic}iz,erat}e{d,}, please. {,{,gotta have{ ,\, again\, }}more }cowbell! {}} some {\\{edge,edgy} }{ cases, here\\\}</lang>

Output:

~/{Downloads,Pictures}/*.{jpg,gif,png}
    ~/Downloads/*.jpg
    ~/Downloads/*.gif
    ~/Downloads/*.png
    ~/Pictures/*.jpg
    ~/Pictures/*.gif
    ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
    Itemized, please.
    Itemize, please.
    Italicized, please.
    Italicize, please.
    Iterated, please.
    Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
    cowbell!
    more cowbell!
    gotta have more cowbell!
    gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
    {}} some {\\edge }{ cases, here\\\}
    {}} some {\\edgy }{ cases, here\\\}

Perl 6

The task description allows the taking of shortcuts, but please note that we are not taking any shortcuts here. The solution is short because this particular problem happens to map quite naturally to the strengths of Perl 6.

First, the parsing is handled with a grammar that can backtrack in the few places this problem needs it. The 2..* is the exact quantifier needed for valid items, and the % operator requires the comma between each quantified item to its left. Note that the * quantifiers do not backtrack here, because the token keyword suppresses that; all the backtracking here fails over to a different alternative in an outer alternation (that is, things separated by the | character in the grammar, which means match order is determined by longest token matching.)

On the other end, we recursively walk the parse tree returning expanded sublists, and we do the cartesian concatenation of sublists at each level by use of the X~ operator, which is a "cross" metaoperator used on a simple ~ concatenation. As a list infix operator, X~ does not care how many items are on either side, which is just what you want in this case, since some of the arguments are strings and some are lists. Here we use a fold or reduction form in square brackets to interpose the cross-concat between each value generated by the map, which returns a mixture of lists and literal strings. One other thing that might not be obvious: if we bind to the match variable, $/, we automatically get all the syntactic sugar for its submatches. In this case, $0 is short for $/[0], and represents all the submatches captured by 0th set of parens in either TOP or alt. $<meta> is likewise short for $/<meta>, and retrieves what was captured by that named submatch. <lang perl6>grammar BraceExpansion {

   token TOP { ( <meta> | . )* }

   token meta {
       | '{' <alts> '}'
       | '{' <-[ { , } ]>*? '}'
       | \\ .
   }

   token alts { <alt> ** 2..* % ',' }

   token alt { ( <meta> | <-[ , } ]> )* }

}

sub crosswalk($/) {

   [X~] , $0.map: -> $/ { [$<meta><alts><alt>».&crosswalk] or ~$/ }

}

sub brace-expand($s) { crosswalk BraceExpansion.parse($s) }</lang> And to test... <lang perl6>sub bxtest(*@s) {

   for @s -> $s {
       say "\n$s";
       for brace-expand($s) {
           say "    ", $_;
       }
   }

}

bxtest Q:to/END/.lines;

   ~/{Downloads,Pictures}/*.{jpg,gif,png}
   It{{em,alic}iz,erat}e{d,}, please.
   {,{,gotta have{ ,\, again\, }}more }cowbell!
   {}} some {\\{edge,edgy} }{ cases, here\\\}
   a{b{1,2}c
   a{1,2}b}c
   a{1,{2},3}b
   a{b{1,2}c{}}
   END</lang>

Output:

~/{Downloads,Pictures}/*.{jpg,gif,png}
    ~/Downloads/*.jpg
    ~/Downloads/*.gif
    ~/Downloads/*.png
    ~/Pictures/*.jpg
    ~/Pictures/*.gif
    ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
    Itemized, please.
    Itemize, please.
    Italicized, please.
    Italicize, please.
    Iterated, please.
    Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
    cowbell!
    more cowbell!
    gotta have more cowbell!
    gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
    {}} some {\\edge }{ cases, here\\\}
    {}} some {\\edgy }{ cases, here\\\}

a{b{1,2}c
    a{b1c
    a{b2c

a{1,2}b}c
    a1b}c
    a2b}c

a{1,{2},3}b
    a1b
    a{2}b
    a3b

a{b{1,2}c{}}
    a{b1c{}}
    a{b2c{}}

Python

<lang python>def getitem(s, depth=0):

   out = [""]
   while s:
       c = s[0]
       if depth and (c == ',' or c == '}'):
           return out,s
       if c == '{':
           x = getgroup(s[1:], depth+1)
           if x:
               out,s = [a+b for a in out for b in x[0]], x[1]
               continue
       if c == '\\' and len(s) > 1:
           s, c = s[1:], c + s[1]

       out, s = [a+c for a in out], s[1:]

   return out,s

def getgroup(s, depth):

   out, comma = [], False
   while s:
       g,s = getitem(s, depth)
       if not s: break
       out += g

       if s[0] == '}':
           if comma: return out, s[1:]
           return ['{' + a + '}' for a in out], s[1:]

       if s[0] == ',':
           comma,s = True, s[1:]

   return None

stolen cowbells from perl6 example

for s in ~/{Downloads,Pictures}/*.{jpg,gif,png} It{{em,alic}iz,erat}e{d,}, please. {,{,gotta have{ ,\, again\, }}more }cowbell! {}} some {\\\\{edge,edgy} }{ cases, here\\\\\} a{b{1,2}c a{1,2}b}c a{1,{2},3}b a{b{1,2}c{}}.split('\n'):

   print "\n\t".join([s] + getitem(s)[0]) + "\n"</lang>

Output:

~/{Downloads,Pictures}/*.{jpg,gif,png}
        ~/Downloads/*.jpg
        ~/Downloads/*.gif
        ~/Downloads/*.png
        ~/Pictures/*.jpg
        ~/Pictures/*.gif
        ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
        Itemized, please.
        Itemize, please.
        Italicized, please.
        Italicize, please.
        Iterated, please.
        Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
        cowbell!
        more cowbell!
        gotta have more cowbell!
        gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
        {}} some {\\edge }{ cases, here\\\}
        {}} some {\\edgy }{ cases, here\\\}

a{b{1,2}c
        a{b1c
        a{b2c

a{1,2}b}c
        a1b}c
        a2b}c

a{1,{2},3}b
        a1b
        a{2}b
        a3b

a{b{1,2}c{}}
        a{b1c{}}
        a{b2c{}}

@@ Line 153: / Line 153: @@
 ^D
 bracex: <stdin>: hGetLine: end of file</pre>
-=={{header|J}}==
-Caution: task ambiguity - see talk page.
-<lang J>legit=: 1,_1}.4>(3;(_2[\"1".;._2]0 :0);('\';a.);0 _1 0 1)&;:
-1   1 1
-2   1 2
-2   1 2
-)
-comma=: legit * =&','
-left=:  legit * =&'{'
-right=: legit * =&'}'
-expand=:3 :0
-  candidates=: i.0 2
-  for_check.I. comma y do.
-    begin=. >./I. left check{. y
-    end=. check+<./I. right check}. y
-    if. (#y)>:end-begin do. candidates=. candidates,begin,end end.
-  end.
-  if. 0=#candidates do.,<y return. end.
-  'begin end'=. candidates{~(i.>./) -/"1 candidates
-  options=. (<;._1~ comma) ',',(1+begin)}.end{.y
-  ;expand each (<begin{.y),each options,each <(1+end)}.y
-)</lang>
-Example use:
-<lang j>   >expand '~/{Downloads,Pictures}/*.{jpg,gif,png}'
-~/Downloads/*.jpg
-~/Pictures/*.jpg
-~/Downloads/*.gif
-~/Pictures/*.gif
-~/Downloads/*.png
-~/Pictures/*.png
-   > expand 'It{{em,alic}iz,erat}e{d,}, please.'
-Itemized, please.
-Iterated, please.
-Italicized, please.
-Iterated, please.
-Itemize, please.
-Iterate, please.
-Italicize, please.
-Iterate, please.
-   >expand '{,{,gotta have{ ,\, again\, }}more }cowbell!'
-cowbell!
-more cowbell!
-cowbell!
-gotta have more cowbell!
-cowbell!
-more cowbell!
-cowbell!
-gotta have\, again\, more cowbell!
-   >expand '{}} some {\\{edge,edgy} }{ cases, here\\\}'
-{}} some {\\edge }{ cases, here\\\}
-{}} some {\\edgy }{ cases, here\\\}</lang>
 =={{header|Perl}}==