S-expressions: Difference between revisions

From Rosetta Code

Revision as of 18:18, 15 October 2011

S-expressions is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

S-Expressions are one convenient way to parse and store data.

Write a simple reader and writer for S-Expressions that handles quoted and unquoted strings, integers and floats.

The reader should read a single but nested S-Expression from a string and store it in a suitable data structure (list, array, etc.). Newlines and other whitespace may be ignored unless contained within a quoted string. Parentheses inside quoted strings are not interpreted, but treated as part of the string. Handling escaped quotes inside a string is optional; thus (foo"bar) may be treated as a string 'foo"bar', or as an error.

Languages that support it may treat unquoted strings as symbols.

The reader should be able to read the following input:

<lang lisp>((data "quoted data" 123 4.5)
 (data (123 (4.5) "(more" "data)")))</lang>

and, e.g. in Python, produce a list such as:

<lang python>[["data", "quoted data", 123, 4.5],
 ["data", [123, [4.5], "(more", "data)"]]]</lang>
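The reader described above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference solution: the names `tokenize`, `to_value` and `read_sexp` are made up for this example, escaped quotes are not handled, and an unquoted token such as foo"bar is simply kept as one string. Unquoted atoms are converted with Python's built-in `int()` and `float()`, which also accept spellings such as `inf` or `1_000` that a stricter reader might reject.

```python
def tokenize(text):
    """Split text into '(' / ')' markers, quoted strings and bare atoms."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                          # whitespace outside strings is ignored
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            end = text.index('"', i + 1)    # ValueError if the string is unterminated
            tokens.append(("str", text[i + 1:end]))
            i = end + 1
        else:
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in "()":
                i += 1
            tokens.append(("atom", text[start:i]))
    return tokens


def to_value(atom):
    """Convert an unquoted atom to int or float if possible, else keep the string."""
    for convert in (int, float):
        try:
            return convert(atom)
        except ValueError:
            pass
    return atom


def read_sexp(text):
    """Parse exactly one (possibly nested) S-expression into Python lists."""
    stack = [[]]                            # stack[0] collects top-level items
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unmatched close parenthesis")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            kind, value = tok
            stack[-1].append(value if kind == "str" else to_value(value))
    if len(stack) != 1:
        raise ValueError("unmatched open parenthesis")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one S-expression")
    return stack[0][0]
```

Calling `read_sexp` on the sample input above yields the nested list shown in the task description, with `123` read as an integer and `4.5` as a float.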

The writer should be able to take the produced list and turn it into a new S-Expression. Strings that don't contain whitespace or parentheses () don't need to be quoted in the resulting S-Expression, but as a simplification, any string may be quoted.
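A writer along these lines might look like the following Python sketch. The function name `write_sexp` is hypothetical, and the sketch assumes the data consists only of nested lists, strings, integers and floats. It quotes a string only when it is empty or contains whitespace or parentheses, and it does not escape embedded quote characters. A string that looks like a number (for example "123") is written unquoted here and would therefore be read back as a number.

```python
def write_sexp(value):
    """Render nested lists, strings, ints and floats as an S-expression string."""
    if isinstance(value, list):
        return "(" + " ".join(write_sexp(item) for item in value) + ")"
    if isinstance(value, str):
        # Quote only when needed: empty strings, whitespace or parentheses.
        needs_quotes = value == "" or any(c.isspace() or c in "()" for c in value)
        return '"' + value + '"' if needs_quotes else value
    return repr(value)                      # ints and floats


data = [["data", "quoted data", 123, 4.5],
        ["data", [123, [4.5], "(more", "data)"]]]
print(write_sexp(data))
# prints: ((data "quoted data" 123 4.5) (data (123 (4.5) "(more" "data)")))
```

Quoting every string unconditionally, as the task permits, would remove the number ambiguity at the cost of noisier output.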

Pike

This version does not yet handle integers and floats, and it does not remove unneeded quotes from simple strings.

<lang pike>string input = "((data \"quoted data\" 123 4.5)\n (data (123 (45) \"(more\" \"data)\")))";

array tokenizer(string input) {

   array output = ({}); 
   for(int i=0; i<sizeof(input); i++)
   { 
       switch(input[i])
       { 
           case '(': output+= ({"("}); break; 
           case ')': output += ({")"}); break; 
           case '"': output+=array_sscanf(input[++i..], "%s\"%[ \t\n]")[0..0]; 
                     i+=sizeof(output[-1]); 
                     break; 
           case ' ': 
           case '\t': 
           case '\n': break; 
           default: output+=array_sscanf(input[i..], "%s%[) \t\n]")[0..0]; 
                    i+=sizeof(output[-1])-1; break; 
       }
   }
   return output;

}

// this function is based on the logic in Parser.C.group() in the Pike library
array group(array tokens) {

   ADT.Stack stack=ADT.Stack();
   array ret =({});
   foreach(tokens;; string token)
   {
       switch(token)
       {
           case "(": stack->push(ret); ret=({}); break;
           case ")":
                   if (!stack->ptr) // nothing left to close; an empty list () is still valid
                   {
                     // Mismatch
                       werror ("unmatched close parenthesis\n");
                       return ret;
                   }
                   ret=stack->pop()+({ ret }); 
                   break;
           default: ret+=({token}); break;
       }
   }
   return ret;

}

string sexp(array input) {

   array output = ({});
   foreach(input;; mixed item)
   {
       if (arrayp(item))
           output += ({ sexp(item) });
       else
           output += ({ sprintf("%O", item) });
   }
   return "("+output*" "+")";

}

array data = group(tokenizer(input))[0];
string output = sexp(data);</lang>

Output:

({({"data", "quoted data", "123", "4.5"}), ({"data", ({"123", ({"45"}), "(more", "data)"})})})
(("data" "quoted data" "123" "4.5") ("data" ("123" ("45") "(more" "data)")))