Brace expansion

From Rosetta Code
Revision as of 05:17, 26 January 2014 by rosettacode>TimToady (clarify that {a} forms contain neither commas nor braces)
Brace expansion is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Brace expansion is a type of parameter expansion made popular by Unix shells, where it allows users to specify multiple similar string parameters without having to type them all out. E.g. the parameter enable_{audio,video} would be interpreted as if both enable_audio and enable_video had been specified.

The Task

Write a function that can perform brace expansion on any input string.
Briefly show how it would be used.

Specification

For the purposes of this task, "brace expansion" shall be defined according to the following specification. Please try to comply with it exactly, even when inconvenient, to ensure that all implementations are comparable. (See #Perl for a reference implementation.)

  • These characters (and only these!) are to be interpreted as potential meta-characters in the input string:
    { opens an alternation group
    , separates alternatives inside an alternation group
    } closes the last opened alternation group
    \ escapes the immediately following character, to prevent it from being treated as a meta-character
  • A valid alternation group consists of a balanced pair of braces containing two or more comma-separated alternatives (e.g. {,} or {aa,aa} or {,aa,bb} or {aa,bb,,cc}). The brace and comma characters which make up such a group are not passed along to the output; instead, the number of outputs is multiplied by the number of alternatives, each copy featuring one of the alternatives at that position in the string  (e.g. aa{,1,2,1}b --> aab  aa1b  aa2b  aa1b ).
  • The multiplication of outputs caused by two or more alternation groups in the same string is cumulative  (e.g. {a,b}{1,2} --> a1  a2  b1  b2 ).
  • The rules for parsing/expanding the whole string also apply recursively to each alternative of an alternation group, so nested groups result in nested expansion  (e.g. {a,b{1,2}} --> a  b1  b2 ).
  • Extraneous brace and comma characters which are not part of a valid alternation group, must be passed along to the output unchanged. These are:
    • commas which are not contained by any balanced brace pair  (e.g. a,b{1,2} --> a,b1  a,b2 )
    • brace pairs which have fewer than two alternatives, that is, contain no commas or braces  (e.g. a{b{1,2}c{}}) --> a{b1c{}}  a{b2c{}} ) Neither of the braces is eligible to participate in delimiting a valid comma group; both braces and any contents are considered literal.
    • opening braces which don't have a matching closing brace, or vice versa  (e.g. a{b{1,2}c --> a{b1c  a{b2c ) For this purpose, the innermost braces that produce a valid alternating group are taken as the intended pair, and the outer extra left or right brace is taken as literal.
  • All backslashes must be passed along to the output unchanged, despite being interpreted as meta-characters  (e.g. a\\{1,2\,c} --> a\\1  a\\2\,c ). A single backslash at the end of the string is taken as a literal.
  • The code should be able to handle inputs with arbitrary numbers of brace groups and nesting levels (limited only by the computer's memory or by a reasonably high recursion limit).
  • Results should be returned in the form of an array/list-like data structure (suitable for further processing within the same program), and in the order demonstrated below (i.e. as if the original brace groups were used as sort keys for lexicographic ordering).

Test Cases

Input Ouput

~/{Downloads,Pictures}/*.{jpg,gif,png}

~/Downloads/*.jpg
~/Downloads/*.gif
~/Downloads/*.png
~/Pictures/*.jpg
~/Pictures/*.gif
~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.

Itemized, please.
Itemize, please.
Italicized, please.
Italicize, please.
Iterated, please.
Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!

cowbell!
more cowbell!
gotta have more cowbell!
gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}

{}} some {\\edge }{ cases, here\\\}
{}} some {\\edgy }{ cases, here\\\}

Motivation

The point of this task is to show how different programming languages deal with a real-life problem that combines:

  • simple lexing (to deal with backslash escapes)
  • fault-tolerant parsing (i.e. handling all inputs, even partially invalid ones, without aborting)
  • list transformation (Conceptually, this task involves parsing linear information into a tree, and then recursively flattening it while applying the Cartesian product to each level. But feel free to take shortcuts!)

So if your language provides ready-to-use brace expansion functionality, then by all means mention it, but please also show a solution that implements it manually.


Perl

Perl has a built-in glob function which does brace expansion, but it can't be used to solve this task because it also does shell-like word splitting, wildcard expansion, and tilde expansion at the same time. The File::Glob core module gives access to the more low-level bsd_glob function which actually supports exclusive brace expansion, however it does not comply with this task's specification when it comes to preserving backslashes and handling unbalanced brace characters.

So here is a manual solution that implements the specification precisely:

<lang perl6>sub brace_expand {

   my $input = shift;
   my @stack = ([my $current = []]);
   
   while ($input =~ /\G ((?:[^\\{,}]++ | \\(?:.|\z))++ | . )/gx) {
       if ($1 eq '{') {
           push @stack, [$current = []];
       }
       elsif ($1 eq ',' && @stack > 1) {
           push @{$stack[-1]}, ($current = []);
       }
       elsif ($1 eq '}' && @stack > 1) {
           my $group = pop @stack;
           $current = $stack[-1][-1];
           
           # handle the case of brace pairs without commas:
           @{$group->[0]} = map { "{$_}" } @{$group->[0]} if @$group == 1;
           
           @$current = map {
               my $c = $_;
               map { map { $c . $_ } @$_ } @$group;
           } @$current;
       }
       else { $_ .= $1 for @$current; }
   }
   
   # handle the case of missing closing braces:
   while (@stack > 1) {
       my $right = pop @{$stack[-1]};
       my $sep;
       if (@{$stack[-1]}) { $sep = ',' }
       else               { $sep = '{'; pop @stack }
       $current = $stack[-1][-1];
       @$current = map {
           my $c = $_;
           map { $c . $sep . $_ } @$right;
       } @$current;
   }
   
   return @$current;

}

  1. Usage demonstration:

print "And for desert:\n"; print $_ for brace_expand("- {apple,{rasp,straw}berry} pie\n");</lang>

Output:
And for desert:
- apple pie
- raspberry pie
- strawberry pie

Of course it also passes the four test cases listed in the task description.

Perl 6

<lang perl6>grammar BraceExpansion {

   token TOP {
       (
       | '{' <alts> '}'
       | '{' <-[ { , } ]>*? '}'
       | \\ .
       | .
       )*
   }
   token alts { <alt> ** 2..* % ',' }
   token alt {
       (
       | '{' <alts> '}'
       | '{' <-[ { , } ]>*? '}'
       | \\ .
       | <-[ , } ]>
       )*
   }

}

sub crosswalk($/) {

   my @sofar = ;
   for $0[] -> $/ {
       @sofar = @sofar X~ ($<alts> ?? $<alts><alt>».&crosswalk !! ~$/);
   }
   @sofar;

}

sub brace-expand($s) { crosswalk BraceExpansion.parse($s) }</lang> And to test... <lang perl6>sub bxtest(*@s) {

   for @s -> $s {
       say "\n$s";
       for brace-expand($s) {
           say "    ", $_;
       }
   }

}

bxtest Q:to/END/.lines;

   ~/{Downloads,Pictures}/*.{jpg,gif,png}
   It{{em,alic}iz,erat}e{d,}, please.
   {,{,gotta have{ ,\, again\, }}more }cowbell!
   {}} some {\\{edge,edgy} }{ cases, here\\\}
   a{{1,2}b
   a{1,2}}b
   a{1,{2},3}b
   END</lang>
Output:
~/{Downloads,Pictures}/*.{jpg,gif,png}
    ~/Downloads/*.jpg
    ~/Downloads/*.gif
    ~/Downloads/*.png
    ~/Pictures/*.jpg
    ~/Pictures/*.gif
    ~/Pictures/*.png

It{{em,alic}iz,erat}e{d,}, please.
    Itemized, please.
    Itemize, please.
    Italicized, please.
    Italicize, please.
    Iterated, please.
    Iterate, please.

{,{,gotta have{ ,\, again\, }}more }cowbell!
    cowbell!
    more cowbell!
    gotta have more cowbell!
    gotta have\, again\, more cowbell!

{}} some {\\{edge,edgy} }{ cases, here\\\}
    {}} some {\\edge }{ cases, here\\\}
    {}} some {\\edgy }{ cases, here\\\}

a{{1,2}b
    a{1b
    a{2b

a{1,2}}b
    a1}b
    a2}b

a{1,{2},3}b
    a1b
    a{2}b
    a3b