S-expressions
S-Expressions are one convenient way to parse and store data.
Write a simple reader and writer for S-Expressions that handles quoted and unquoted strings, integers and floats.
The reader should read a single, possibly nested, S-Expression from a string and store it in a suitable data structure (list, array, etc.). Newlines and other whitespace may be ignored unless contained within a quoted string. Parentheses inside quoted strings are not interpreted, but treated as part of the string. Handling escaped quotes inside a string is optional; thus (foo"bar) may be treated as a string 'foo"bar', or as an error.
Languages that support symbols may treat unquoted strings as symbols.
The reader should be able to read the following input <lang lisp>((data "quoted data" 123 4.5)
(data (123 (4.5) "(more" "data)")))</lang>
and, e.g. in Python, produce a list such as:
<lang python>[["data", "quoted data", 123, 4.5],
["data", [123, [4.5], "(more", "data)"]]]</lang>
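As a rough illustration of the reader described above, here is a minimal Python sketch. The name `parse_sexp` and the two regular expressions used to recognise numbers are assumptions of this sketch, not part of the task. It treats unquoted atoms as plain strings, does not support escaped quotes, and reports (foo"bar) as an error, which the task permits.

```python
import re

# Hypothetical patterns for this sketch: what counts as an int or a float.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def parse_sexp(text):
    """Parse one (possibly nested) S-expression into nested Python lists."""
    pos = 0
    n = len(text)

    def skip_ws():
        nonlocal pos
        while pos < n and text[pos].isspace():
            pos += 1

    def read_atom(token):
        # Integers first, then floats; anything else stays a string.
        if INT_RE.fullmatch(token):
            return int(token)
        if FLOAT_RE.fullmatch(token):
            return float(token)
        return token

    def read_expr():
        nonlocal pos
        skip_ws()
        if pos >= n:
            raise ValueError("unexpected end of input")
        ch = text[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= n:
                    raise ValueError("missing closing parenthesis")
                if text[pos] == ")":
                    pos += 1
                    return items
                items.append(read_expr())
        if ch == ")":
            raise ValueError("unmatched closing parenthesis")
        if ch == '"':
            # Quoted string: everything up to the next quote, parentheses included.
            end = text.find('"', pos + 1)
            if end < 0:
                raise ValueError("unterminated quoted string")
            value = text[pos + 1:end]
            pos = end + 1
            return value
        # Unquoted atom: runs until whitespace, a parenthesis or a quote.
        start = pos
        while pos < n and not text[pos].isspace() and text[pos] not in '()"':
            pos += 1
        return read_atom(text[start:pos])

    result = read_expr()
    skip_ws()
    if pos != n:
        raise ValueError("trailing data after S-expression")
    return result


sample = '((data "quoted data" 123 4.5)\n (data (123 (4.5) "(more" "data)")))'
print(parse_sexp(sample))
```

A recursive-descent reader is used here only because it mirrors the nesting of the input directly; a tokenizer followed by a stack-based grouping step would work equally well.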
The writer should be able to take the produced list and turn it into a new S-Expression. Strings that don't contain whitespace or parentheses () don't need to be quoted in the resulting S-Expression, but as a simplification, any string may be quoted.
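The writer can be sketched in a similar way. The following Python function is an assumption-laden illustration, not a reference solution: the names `write_sexp` and `looks_numeric` are hypothetical, the input is assumed to consist only of nested lists, strings, integers and floats, and strings containing a double quote are rejected because escaping is optional in the task. Strings that look like numbers are quoted as well, so that a reader would not turn them back into numbers.

```python
def looks_numeric(s):
    """Return True if the string would be read back as a number."""
    try:
        float(s)
        return True
    except ValueError:
        return False


def write_sexp(value):
    """Serialise nested lists of strings, ints and floats as an S-expression."""
    if isinstance(value, list):
        return "(" + " ".join(write_sexp(item) for item in value) + ")"
    if isinstance(value, str):
        if '"' in value:
            # Escaped quotes are optional in the task; this sketch does not support them.
            raise ValueError("cannot write a string containing a double quote")
        needs_quotes = (
            value == ""
            or any(c.isspace() or c in "()" for c in value)
            or looks_numeric(value)
        )
        return '"' + value + '"' if needs_quotes else value
    # Integers and floats are written with their default string form.
    return str(value)


data = [["data", "quoted data", 123, 4.5], ["data", [123, [4.5], "(more", "data)"]]]
print(write_sexp(data))
# prints: ((data "quoted data" 123 4.5) (data (123 (4.5) "(more" "data)")))
```

Quoting every string unconditionally would be a valid simplification; the extra checks only keep the output closer to the original input.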
Pike
This version doesn't yet handle int and float, and it doesn't remove unneeded quotes from simple strings.
<lang pike>string input = "((data \"quoted data\" 123 4.5)\n (data (123 (45) \"(more\" \"data)\")))";

// split the input into "(", ")" and string tokens
array tokenizer(string input)
{
    array output = ({});
    for(int i=0; i<sizeof(input); i++)
    {
        switch(input[i])
        {
            case '(': output += ({"("}); break;
            case ')': output += ({")"}); break;
            case '"':
                // quoted string: take everything up to the closing quote
                output += array_sscanf(input[++i..], "%s\"%[ \t\n]")[0..0];
                i += sizeof(output[-1]);
                break;
            case ' ':
            case '\t':
            case '\n': break;
            default:
                // unquoted string: ends at whitespace or a closing parenthesis
                output += array_sscanf(input[i..], "%s%[) \t\n]")[0..0];
                i += sizeof(output[-1])-1;
                break;
        }
    }
    return output;
}

// this function is based on the logic in Parser.C.group() in the pike library
array group(array tokens)
{
    ADT.Stack stack = ADT.Stack();
    array ret = ({});

    foreach(tokens;; string token)
    {
        switch(token)
        {
            case "(": stack->push(ret); ret = ({}); break;
            case ")":
                // only an empty stack is a mismatch; an empty list "()" is valid
                if (!stack->ptr)
                {
                    werror("unmatched close parenthesis\n");
                    return ret;
                }
                ret = stack->pop() + ({ ret });
                break;
            default: ret += ({token}); break;
        }
    }
    return ret;
}

// write a nested array back out as an S-Expression, quoting every string
string sexp(array input)
{
    array output = ({});
    foreach(input;; mixed item)
    {
        if (arrayp(item))
            output += ({ sexp(item) });
        else
            output += ({ sprintf("%O", item) });
    }
    return "("+output*" "+")";
}

array data = group(tokenizer(input))[0];
string output = sexp(data);</lang>
Output:
({({"data", "quoted data", "123", "4.5"}), ({"data", ({"123", ({"45"}), "(more", "data)"})})})
(("data" "quoted data" "123" "4.5") ("data" ("123" ("45") "(more" "data)")))