S-expressions

S-Expressions are one convenient way to parse and store data.

Write a simple reader and writer for S-Expressions that handles quoted and unquoted strings, integers and floats.

The reader should read a single but nested S-Expression from a string and store it in a suitable data structure (list, array, etc.). Newlines and other whitespace may be ignored unless contained within a quoted string. Parentheses inside quoted strings are not interpreted, but treated as part of the string. Handling escaped quotes inside a string is optional; thus (foo"bar) may be treated as the string 'foo"bar', or as an error.

Languages that support it may treat unquoted strings as symbols.
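Python has no built-in symbol type, so the following is only a minimal sketch of how a reader might keep unquoted tokens apart from quoted strings. The names `Symbol` and `make_atom` are hypothetical and not part of the task; number detection simply relies on Python's own `int` and `float` constructors.

```python
class Symbol(str):
    """Hypothetical marker type for unquoted tokens; otherwise behaves like str."""

    def __repr__(self):
        return f"Symbol({str.__repr__(self)})"


def make_atom(text, quoted):
    """Turn one token into a Python value.

    Quoted tokens always stay plain strings. Unquoted tokens become an int
    or a float when Python can parse them as such, and a Symbol otherwise.
    """
    if quoted:
        return text
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return Symbol(text)
```

With this distinction the quoted token `"123"` stays a string while the bare token `123` becomes an integer, and a writer can later tell which strings were originally symbols.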

The reader should be able to read the following input <lang lisp>((data "quoted data" 123 4.5)
 (data (123 (4.5) "(more" "data)")))</lang>

and, e.g. in Python, produce a list such as:

<lang python>[["data", "quoted data", 123, 4.5],
 ["data", [123, [4.5], "(more", "data)"]]]</lang>
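The reader described above could be sketched in Python roughly as follows. This is one possible approach under stated assumptions, not a reference solution: the names `TOKEN`, `to_atom` and `read_sexp` are made up for illustration, escaped quotes are not supported (so `(foo"bar)` is reported as an error, which the task allows), and unquoted tokens become numbers whenever Python's `int` or `float` accepts them.

```python
import re

# One token per match: a parenthesis, a double-quoted string, or a bare atom.
# Leading whitespace is skipped; parentheses inside quotes stay in the string.
TOKEN = re.compile(
    r'\s*(?:(?P<paren>[()])|"(?P<quoted>[^"]*)"|(?P<atom>[^\s()"]+))'
)


def to_atom(text):
    """Convert an unquoted token to int or float if possible, else keep it."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def read_sexp(source):
    """Parse a single, possibly nested, S-expression into Python lists."""
    stack = []                      # lists that are still open
    pos = 0
    end = len(source.rstrip())      # ignore trailing whitespace
    while pos < end:
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        paren, quoted, atom = match.group("paren", "quoted", "atom")
        if paren == "(":
            stack.append([])
            continue
        if paren == ")":
            if not stack:
                raise ValueError("unmatched close parenthesis")
            value = stack.pop()
        elif quoted is not None:
            value = quoted
        else:
            value = to_atom(atom)
        if not stack:               # the outermost expression is complete
            if pos < end:
                raise ValueError("trailing input after the expression")
            return value
        stack[-1].append(value)
    raise ValueError("unexpected end of input")
```

An explicit stack of open lists is used instead of recursion so that deeply nested input does not run into Python's recursion limit.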

The writer should be able to take the produced list and turn it into a new S-Expression. Strings that don't contain whitespace or parentheses () don't need to be quoted in the resulting S-Expression, but as a simplification, any string may be quoted.
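A writer along these lines might look like the following Python sketch. The names `needs_quotes` and `write_sexp` are hypothetical. As an extra assumption beyond the task text, strings that would be re-read as numbers (and empty strings) are also quoted so that the output round-trips, and strings with embedded double quotes are rejected because escaping is not handled.

```python
def needs_quotes(text):
    """Return True when a string cannot safely be written as a bare token."""
    if text == "" or any(ch.isspace() or ch in "()" for ch in text):
        return True
    try:
        float(text)     # would be re-read as a number, so keep the quotes
        return True
    except ValueError:
        return False


def write_sexp(value):
    """Serialise nested lists of strings, ints and floats as an S-expression."""
    if isinstance(value, list):
        return "(" + " ".join(write_sexp(item) for item in value) + ")"
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("embedded double quotes are not supported")
        return f'"{value}"' if needs_quotes(value) else value
    return repr(value)  # int or float
```

For the list shown in the task, this yields `((data "quoted data" 123 4.5) (data (123 (4.5) "(more" "data)")))`.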

Pike

This version does not yet handle integers and floats, and it does not remove unneeded quotes from simple strings. <lang pike>string input = "((data \"quoted data\" 123 4.5)\n (data (123 (45) \"(more\" \"data)\")))";

array tokenizer(string input) {

   array output = ({}); 
   for(int i=0; i<sizeof(input); i++)
   { 
       switch(input[i])
       { 
           case '(': output+= ({"("}); break; 
           case ')': output += ({")"}); break; 
           case '"': output+=array_sscanf(input[++i..], "%s\"%[ \t\n]")[0..0]; 
                     i+=sizeof(output[-1]); 
                     break; 
           case ' ': 
           case '\t': 
           case '\n': break; 
           default: output+=array_sscanf(input[i..], "%s%[) \t\n]")[0..0]; 
                    i+=sizeof(output[-1])-1; break; 
       }
   }
   return output;

}

// this function is based on the logic in Parser.C.group() in the Pike library
array group(array tokens) {

   ADT.Stack stack=ADT.Stack();
   array ret =({});

   foreach(tokens;; string token)
   {
       switch(token)
       {
           case "(": stack->push(ret); ret=({}); break;
           case ")":
                   if (!stack->ptr) // no open list left to close; an empty list "()" is valid
                   {
                     // Mismatch
                       werror ("unmatched close parenthesis\n");
                       return ret;
                   }
                   ret=stack->pop()+({ ret }); 
                   break;
           default: ret+=({token}); break;
       }
   }
   return ret;

}

string sexp(array input) {

   array output = ({});
   foreach(input;; mixed item)
   {
       if (arrayp(item))
           output += ({ sexp(item) });
       else
           output += ({ sprintf("%O", item) });
   }
   return "("+output*" "+")";

}

array data = group(tokenizer(input))[0];
string output = sexp(data);</lang>

Output:

({({"data", "quoted data", "123", "4.5"}), ({"data", ({"123", ({"45"}), "(more", "data)"})})})
(("data" "quoted data" "123" "4.5") ("data" ("123" ("45") "(more" "data)")))