Talk:S-expressions: Difference between revisions

From Rosetta Code

Revision as of 02:00, 18 October 2011

The Goal of this task

is reliable parsing of input data that doesn't break no matter what is thrown at it (as long as it follows the rules) into any native data structure in your language and back.--eMBee 00:14, 18 October 2011 (UTC)

the purpose of this task is to read and write structured data, in order to be able to use it to save or exchange data between systems. the vehicle chosen for this data is s-expressions. just like you would use json, xml or other formats.

it really does not matter how the data is translated in a language, as long as it is in a format that is of practical use in the language. in other words, it should be possible to access the values and manipulate them. for this reason, i am not sure how useful it is to represent symbols as such, other than to help differentiate between quoted and unquoted strings as Ledrug pointed out very early on in the discussion. if the language has no support for symbols then care needs to be taken that the value of the symbol is actually accessible.

the OCaml solution is for example on the border. it doesn't handle numbers and it does not even encode the difference between quoted and unquoted strings.--eMBee 01:59, 18 October 2011 (UTC)

Symbols and strings

To be more generally useful, it's probably better to distinguish between quoted and unquoted strings instead of giving numbers special treatment. 0x1, 1d0, 13#4bc, 1.3f, 1_000 may or may not be parsed as numbers depending on what the definition of literal numbers is, and that decision can be deferred to a separate step -- as long as the parser remembers that they are not quoted. On the other hand, it's more likely than not that "data" and data mean completely different things, so the parser had better remember that information instead of making it optional. --Ledrug 10:48, 16 October 2011 (UTC)

you are of course right, i just didn't want to make the task too hard. in languages that don't support symbols, an object would need to be created, if that can be done. otherwise, how can a symbol be represented?
That's the task implementor's problem, isn't it? What do you think the S stands for in S-expression? String expression? I think not. To call symbols unquoted strings is beyond laughable. A correct implementation of this task replaces A with pointers to the same object in (A A A). There is no S-expression without interning.24.85.131.247 04:51, 17 October 2011 (UTC)
that is not even true in lisp: in common lisp (A A A), the first A is a function and the other two A are a variable.
You are incorrect. The expression (A A A) denotes a list containing three repetitions of the same object, the symbol a. Whether this denotes a function call or something else is entirely up to the context. For instance, it could be embedded inside (quote (a a a)).192.139.122.42 22:29, 17 October 2011 (UTC)
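The interning point made above can be illustrated with a minimal Python sketch. The Symbol class and read_flat_list helper are hypothetical names invented for this illustration, not part of any solution on the task page, and the reader handles only a flat list of unquoted words. The sketch keeps names case-sensitive, which is an assumption of its own.

```python
class Symbol:
    """A hypothetical interned symbol: one object per distinct name."""

    _table = {}  # name -> the unique Symbol created for that name

    def __new__(cls, name):
        # Hand back the existing object if this name was seen before.
        existing = cls._table.get(name)
        if existing is None:
            existing = super().__new__(cls)
            existing.name = name
            cls._table[name] = existing
        return existing

    def __repr__(self):
        return self.name


def read_flat_list(text):
    """Read a flat list such as "(A A A)" into a list of Symbols.

    Deliberately tiny: no nesting, no quoted strings, no numbers.
    """
    inner = text.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("expected a parenthesised list")
    return [Symbol(word) for word in inner[1:-1].split()]


items = read_flat_list("(A A A)")
# All three elements are the very same object, not three equal copies.
same_object = items[0] is items[1] is items[2]
# A different name gives a different object.
different = Symbol("A") is not Symbol("B")
```

Whether a given task solution needs this identity guarantee, or can get by with plain strings, is exactly what the thread is arguing about.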
s-expressions are just data. the interpretation of the meaning of atoms in an s-expression is entirely up to the application. in this application they are strings.--eMBee 05:37, 17 October 2011 (UTC)
S-expressions are a notation which denotes structure. Treating them as characters is unsatisfactory. That's like saying that XML is just a bunch of characters and angle brackets. It's a bunch of characters and angle brackets, plus some rules which give them meaning and make it all correspond to some kind of structure. You contradict yourself anyway because you're using the term atom, which is structural. If there is no structure, then (a b c) is just "(a b c)": a string consisting of parentheses, spaces and letters.

192.139.122.42 22:29, 17 October 2011 (UTC)

where am i contradicting myself? of course there is structure. structure is denoted by parentheses. atoms are denoted by strings.--eMBee 01:17, 18 October 2011 (UTC)
whether it is useful to distinguish between quoted and unquoted strings also depends on what is done with the input. unless you are writing an interpreter of sorts, the input is just data. and if the language can only handle strings as data, then what good is it to have a special representation for unquoted strings?
but if anyone wants to distinguish between quoted and unquoted strings and skip numbers instead, they are free to do so--eMBee 12:00, 16 October 2011 (UTC)
Well, without context, it's not like you can actually expect the code to do something useful to the symbols anyway. All the parser needs to do is distinguish between "123" and 123, "data" and data, just stick a is_quoted flag somewhere on the strings. If your usage later needs to tell symbols from strings, look that flag up; if not, it does no harm. For numbers, just assume you can check the unquoted strings and see if they match some patterns later. It's probably simpler this way and more language neutral (parsing numbers is likely language dependent). --Ledrug 13:37, 16 October 2011 (UTC)
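A rough Python sketch of the is_quoted suggestion above, assuming that only ( ) " and whitespace are special and that quoted strings have no escape sequences. The Atom record, the TOKEN and INTEGER patterns, and the parse and looks_like_integer functions are hypothetical names made up for the illustration. The integer pattern is one arbitrary choice of what counts as a number.

```python
import re
from typing import NamedTuple


class Atom(NamedTuple):
    """A hypothetical token record: the raw text plus a quoted flag."""
    text: str
    is_quoted: bool


# A token is a parenthesis, a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\s*(?:([()])|"([^"]*)"|([^\s()"]+))')


def parse(source):
    """Parse one S-expression into nested lists of Atom records."""
    stack = [[]]  # stack[0] collects the top-level result
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        paren, quoted, bare = match.groups()
        if paren == "(":
            stack.append([])
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif quoted is not None:
            stack[-1].append(Atom(quoted, True))
        else:
            stack[-1].append(Atom(bare, False))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one expression")
    return stack[0][0]


# A later, optional step: decide which unquoted atoms look like numbers.
INTEGER = re.compile(r"[+-]?\d+\Z")


def looks_like_integer(atom):
    """True only for unquoted atoms whose text matches the INTEGER pattern."""
    return not atom.is_quoted and INTEGER.match(atom.text) is not None


tree = parse('(data "data" 123 "123")')
flags = [looks_like_integer(atom) for atom in tree]
```

Here the parser never decides what a number is. It only records that 123 was unquoted and "123" was quoted, and the pattern check runs afterwards on whichever atoms the caller cares about.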
that is not as easy as it sounds, not every language can associate flags with strings without creating a new class, in which case it usually isn't a string anymore.
Not every language has to solve every task. There is a way in Rosetta Code to exclude a language from a task when it's starkly obvious that it can't be done.192.139.122.42 22:31, 17 October 2011 (UTC)
True, but if the task is changed so that most languages cannot implement the task, that would be a problem with the specification of the task and not with the languages. --Rdm 00:41, 18 October 2011 (UTC)
right. the purpose of this task is to read and write structured data, in order to be able to use it to save or exchange data between systems. the vehicle chosen for this data is s-expressions. just like you would use json, xml or other formats. that should be more or less doable in any language that's of practical use.
take a look at the pike example, now that i introduced the Symbol class without handling numbers in the parser, all but quoted strings become Symbols and i find myself having to emulate not only strings but numbers (still incomplete) as well. if i would parse numbers upfront i could store them as such and the Symbol class would be simpler. i will still have to deal with strings in the Symbol class, but with numbers out of the way i could require explicit casting to use Symbols as strings. as long as Symbols can contain numbers the Symbol class has to tell if it is a string, int or float and behave accordingly, because if i have to check what the type of the symbol is before i can use it that would just be too cumbersome.--eMBee 14:03, 16 October 2011 (UTC)
i could of course stick all tokens into a token class and handle the conversion at a later step. but then in order to use the input that conversion step is mandatory and i just end up with more complicated code for something that was supposed to be simple.--eMBee 14:15, 16 October 2011 (UTC)

So... what is the point here?

Is the point here to represent data in a way which is natural to the language (thus, for example, allowing the language to throw errors for unquoted character sequences which have been reserved), or is it to emulate another system? And are we producing a result which displays pleasantly, or do we want type annotations? And if we want type annotations, what types do we support and when do we use them? (In my experience, S expressions are simple only if you ignore most of the details of how they are implemented, and the task description seems ambivalent about where to draw the line. That said, any concept is simple once you understand it, but here I am focusing on the task description and not on the abstract concepts.) --Rdm 17:34, 17 October 2011 (UTC)

It seemed straightforward for Python as I was given sample input and a specific example of what the intermediate Python data structure has to be. (Nested lists with ints as ints, floats as floats, strings for the rest). Maybe it needs to be emphasized for other languages too? --Paddy3118 19:16, 17 October 2011 (UTC)
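For the writing half of the task, a structure of that shape (nested lists holding ints, floats and strings) can be turned back into text along the following lines. This is only a sketch: write_sexp and the sample data are invented for the illustration, and the rule of quoting a string only when it is empty or contains whitespace or parentheses is an assumption, not something the task is known to require.

```python
def write_sexp(value):
    """Serialise nested lists of int, float and str to S-expression text."""
    if isinstance(value, list):
        return "(" + " ".join(write_sexp(item) for item in value) + ")"
    if isinstance(value, str):
        if '"' in value:
            # No escape convention is assumed, so this cannot be written.
            raise ValueError("cannot write a string containing a double quote")
        # Quote only when the bare text would not read back as one word.
        needs_quotes = value == "" or any(
            ch.isspace() or ch in "()" for ch in value
        )
        return f'"{value}"' if needs_quotes else value
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    raise TypeError(f"cannot write {value!r}")


# Made-up sample data in the shape described above.
sample = [["data", "quoted data", 123, 4.5], ["more", ["nested", "(x"]]]
text = write_sexp(sample)
```

Note that with plain strings the string "123" would be written bare and would read back as a number, which is the information loss the quoted/unquoted discussion above is about.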
So am I supposed to emulate python's data language (and, if so, where is the specification for that)? Or am I supposed to use native types (which do not precisely match the word syntax being asked for, but would allow support for things like symbols, rational numbers, complex numbers and representations of functions)? Anyways, for now, I am not implementing any data language, since none was specified. --Rdm 19:22, 17 October 2011 (UTC)
you are supposed to use native types where it makes sense. the python example was just something to get people started. now that we have a bunch of implementations i am considering removing that.--eMBee 01:26, 18 October 2011 (UTC)
If your language has native support for nested lists of floats, ints and strings, then wouldn't that be enough to emulate the python? (Or nested lists of a variant type that could hold a string/float/int)? --Paddy3118 20:13, 17 October 2011 (UTC)
Yes, the data structures are doable. But think about what each of these should be represented as:
  1. 1
  2. 0.1
  3. 127.0.0.1
  4. Do
  5. Don't
  6. 1e4
  7. 1r4
  8. 1j4
  9. ...
  10. :::
  11. "food"
  12. food
  13. 1efood
  14. 0.food
  15. 'food
  16. `food
and so on... The task implies that some of these should be treated differently, but it also says that there are no special characters outside of ( ) " and whitespace... Anyways, right now, I see several courses of action: I can attempt to emulate Python's data language -- but I do not know the syntax or rules of that language. I can use a native data language with a special case for quote handling -- and some of the above unquoted words would result in errors if I did that. Or, I can not implement any data language and leave the issue to some other module. And, for now, despite the task's suggestion that I should implement a data language, I am implementing the null data language (where every data element is preserved as a sequence of characters even if those characters are numeric). --Rdm 20:26, 17 October 2011 (UTC)
use a native data format, optionally with a special case for quote handling. anything that you can not natively represent may be stored as a string. so if there is a language that doesn't have numbers, then it's strings all the way.--eMBee 01:26, 18 October 2011 (UTC)
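As one possible reading of that rule in Python, the sketch below tries the native int and float conversions on each unquoted word and falls back to a string when neither applies. The to_native function is a hypothetical helper, and "number" here simply means whatever Python's own int() and float() accept, which is a Python-specific choice that other languages would make differently.

```python
def to_native(word, is_quoted=False):
    """Return an int or float if Python accepts the word, else the string."""
    if is_quoted:
        return word  # quoted text always stays a string
    for convert in (int, float):
        try:
            return convert(word)
        except ValueError:
            continue
    return word


# The unquoted words from the list above.
samples = ["1", "0.1", "127.0.0.1", "Do", "Don't", "1e4", "1r4", "1j4",
           "...", ":::", "food", "1efood", "0.food", "'food", "`food"]
kinds = {word: type(to_native(word)).__name__ for word in samples}

numeric = sorted(word for word, kind in kinds.items() if kind != "str")
```

With this rule only 1, 0.1 and 1e4 become numbers, and everything else in the list stays a string. Words such as 1r4 or 1j4 that are numeric in some other language are kept as text rather than raising errors.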
Ok, that helps, though it's still ambiguous. For example, should words that can represent functions be treated as representations of those functions or as symbols? (This is in a language where symbols are a valid data type -- which, in effect, implements intern -- but does not grant any special syntax for symbols). --Rdm 01:32, 18 October 2011 (UTC)