Talk:S-expressions: Difference between revisions

 
(31 intermediate revisions by 10 users not shown)
Line 15:
 
Common Lisp: ANSI Standard available in the form of the [[http://www.lispworks.com/documentation/HyperSpec/Front/index.htm Common Lisp HyperSpec]]. The reader syntax is described in [[http://www.lispworks.com/documentation/HyperSpec/Body/02_.htm Syntax]].
:this page describing the [http://www.lispworks.com/documentation/HyperSpec/Body/02_ad.htm Character Syntax Types] seems most interestunginteresting.--[[User:EMBee|eMBee]] 02:07, 20 October 2011 (UTC)
::Yes. In Common Lisp, the reading is very simple. Characters are assigned to categories in a read-table (in a context-independent way). The reader takes a character and based on what category it is in, it assembles it into a token, or dispatches a function associated with that character. Donald Knuth's TeX has a similar approach: it also has character categories (which TeX code can manipulate).[[Special:Contributions/24.85.131.247|24.85.131.247]] 02:22, 20 October 2011 (UTC)
:::indeed, i like the approach, it also demonstrates the simplicity of s-expressions (and lisp) syntax. you could not do this as easely with xml or json.--[[User:EMBee|eMBee]] 05:22, 20 October 2011 (UTC)
Line 32:
it seems there is no formal definition of S-Expressions, and much of the arguments in the discussion on this task seems to revolve around what is required for S-Expressions and what is not.
this [http://people.csail.mit.edu/rivest/Sexp.txt draft RFC] makes an attempt at a definition, but it was not accepted for reasons that are unknown.
: the [http://www-formal.stanford.edu/jmc/recursive/node3.html original definition of S-Expressions] by John McCarthy can be found on his homepage.--[[User:EMBee|eMBee]] 05:28, 4 November 2011 (UTC)
 
for a definition i'd like to separate two aspects:
Line 46 ⟶ 47:
* atom: a symbol, string or number
* S-Expression: a sequence of atoms or S-Expressions separated by whitespace and surrounded by parentheses ()
 
 
do we need others? the rfc above introduces special syntax for hex, and base64 encodings, as well as length-prefix encodings, all of which i consider not needed. there are other standards for hex, and there are other encodings, so i don't see why they need to be treated special.
Line 70 ⟶ 72:
:::::::they are bothering, but i can for example work around unbalanced parenthesis by adding a few extra at the beginning or end, to make the parser happy, and then later analyze where the real error is. try that with xml.
::::::::Decent editors have a Lisp mode for automatically indenting S-expressions, matching parentheses, or selecting subexpressions and moving them around, etc.[[Special:Contributions/192.139.122.42|192.139.122.42]] 22:03, 19 October 2011 (UTC)
:::::::::And there's nXML mode for emacs, for example, which automates indentation and tag closing and so on when editing xml files. --[[User:Rdm|Rdm]]
:::::::of course this has nothing to do with the task or the definition of s-expressions, it just shows that s-expressions are easier to manipulate programmatically which probably is one of the reasons why lisp is as powerful as it is.--[[User:EMBee|eMBee]] 17:09, 19 October 2011 (UTC)
::''[... CONTINUATION] so this mechanism only makes sense for interoperability. And interoperability only makes sense when it's defined in such a way that both my implementation and that of another language are doing the same thing. And that's just not happening here, because that is not how the task is written. --[[User:Rdm|Rdm]] 10:39, 19 October 2011 (UTC)''
Line 93 ⟶ 96:
::i have been searching for a definition for s-expressions, and i could not find anything. this one raises some interesting points and has things i disagree with.--[[User:EMBee|eMBee]] 06:02, 19 October 2011 (UTC)
:::Common Lisp (ANSI standard), ISLisp (ISO) and Scheme are programing languages which are based on S-expressions. There are differences between them but they have a lot in common. Rivest's RFC is awful.[[Special:Contributions/192.139.122.42|192.139.122.42]] 22:07, 19 October 2011 (UTC)
 
 
== Symbols and strings ==
Line 172 ⟶ 174:
::There has been too much of the Python taken out I think. The intermediate language specific list structure should have been left in but instead of saying it was Python, just say enough so that implementers knew to use their languages native ints, floats, strings and nestable lists. (They might have to create a [[Linked list|linked list]] data-structure if their language or a library does not produce one).
::Also removed is the notion of taking the S-expression through to the internal representation, then back to an S-expression of the original which I thought was good too. --[[User:Paddy3118|Paddy3118]] 07:08, 20 October 2011 (UTC)
:::huh? the only thing ever removed was: "must, in addition, recognise integers and floating point literals (that are distinguished as digits followed by a decimal point, followed by digits)." replacing it with "should, in addition, recognize numbers if the language has appropriate datatypes." i have also moved that paragraph in the process. this was the last edit on the description [http://rosettacode.org/mw/index.php?title=S-Expressions&diff=next&oldid=123268 (see diff)]
:::i have checked the complete history to make sure i didn't delete something by accident.--[[User:EMBee|eMBee]] 09:06, 20 October 2011 (UTC)
 
::::Hi eMBee, This got lost:
:::::<lang python>[["data", "quoted data", 123, 4.5]
["data", ["!@#", [4.5], "(more", "data)"]]]</lang>
::::--[[User:Paddy3118|Paddy3118]] 18:42, 20 October 2011 (UTC)
:::::oh, now i understand what you mean: keep the list but describe it as a generic nested list, not a python list.
:::::i removed the list intentionally because i think there are now enough examples of how a native structure might look like. at least the lisp solutions, pike, python and ruby produce directly native results. (for the other languages i can't tell how native lists look like.)--[[User:EMBee|eMBee]] 00:22, 21 October 2011 (UTC)
 
== Lisp Solutions ==
 
The lisp solutions demonstrate the steps needed to actually read an s-expression from a string, but for those interested in learning how to write a simple parser in lisp it would also be nice if we could also have solutions that re-implement a parser with basic lisp tools. assume for example the s-expressions are written with [] instead of ().
 
how would a parser look like that turns
<lang lisp>(setq input "[[data \"quoted data\" 123 4.5]
[data [!@# [4.5] \"[more\" \"data]\"]]]")</lang>
into
<lang lisp>((data "quoted data" 123 4.5)
(data (|!@#| (4.5) "[more" "data]")))</lang>
and back?--[[User:EMBee|eMBee]] 09:49, 20 October 2011 (UTC)
:Unfortunately the task isn't to show a parser but to get the S-expression into an internal form that just happens to be native Lisp.[[User:Paddy3118|Paddy3118]] 08:51, 23 June 2012 (UTC)
::the intention of the task was to create a collection of s-expression parsers as well as demonstrate how to write a parser. if the lisp solutions could show how to implement an actual parser similar to what needs to be done in most other languages, i would consider this a nice addition to the current solution. so, yes, if someone wants to implement a full parser go ahead.--[[User:EMBee|eMBee]] 15:18, 24 June 2012 (UTC)
:You should show how easy it is to do in Lisp. S-expressions have been used as a data format where much of the processing is done in languages other than Lisp - for example in the Electronics industry. --[[User:Paddy3118|Paddy3118]] 08:51, 23 June 2012 (UTC)
::Using S-Expressions as a data format in a language other than Lisp?! Those godless heathens!!! =) --[[User:Lhignight|Larry Hignight]] 11:38, 23 June 2012 (UTC)
 
::: I've just looked up the formats I was thinking of. They are [[wp:EDIF|EDIF]] and [[wp:Standard Delay Format|Standard Delay Format]] which are heavily S-expression inspired but not the same as Lisp. --[[User:Paddy3118|Paddy3118]] 21:37, 23 June 2012 (UTC)
 
::Hi Donald, I think the Common Lisp section is solid now. Check it out and let me know what you think. --[[User:Lhignight|Larry Hignight]] 11:38, 23 June 2012 (UTC)
 
:::Hi Larry, I think the Lisp does the task and a bit more which is fine as the extra is explained and close enough to the topic to belong. --[[User:Paddy3118|Paddy3118]] 21:28, 23 June 2012 (UTC)
 
== Writer in separate subtask? ==
Aside from the Python and Ruby implementations, I haven't seen any other languages writing native code back to S-Exps. Perhaps the writer should be spunoff as a separate sub-task? Also, many of the implementations don't show any kind of output. --[[User:Lhignight|Larry Hignight]] 22:28, 22 June 2012 (UTC)
: (moved this out of the lisp section as it seems rather unrelated to it.
: one goal of this task was to provide a collection of code to handle s-expressions. for anyone writing a library both sides should be covered and seperating them may not be worth it. consider the java solution for example. that is real code in production use. splitting out the writer from it would just needlessly make it harder. also in most cases i believe the writer is not very hard so there wouldn't be much gain from separating it out.--[[User:EMBee|eMBee]] 15:18, 24 June 2012 (UTC)
: if any solution is missing a writer then it is incomplete. showing the output would of course also be nice.--[[User:EMBee|eMBee]] 15:23, 24 June 2012 (UTC)
 
== TCL native types ==
 
what would a native structure look like? i don't quite understand the comment: ''A “native” data structure could also be generated, but then that would turn things into lists that are not in the original.''
also, since tcl doesn't have a type system i think the tags are not really needed, except for the quotes.
 
since tcl doesn't need quotes for strings can't the structure be like:
{{data "quoted data" 123 4.5} {data {!@# {4.5} "(more" "data)"}}}
using " as part of the string value?--[[User:EMBee|eMBee]] 10:53, 3 November 2011 (UTC)
: In TCL, "everything is a string" (or list, depending on how you look at it). <code>{a b c}</code> is the same as <code>"a b c"</code>, which will lead to some unfortunate ambiguities. --[[User:Ledrug|Ledrug]] 16:25, 3 November 2011 (UTC)
:: ah, yes, that's a problem. actually, i found that the real problem is that a single element list is indistinguishable from the element itself:
<blockquote><pre>% puts [list [list [list a]]]
a</pre></blockquote>
:: therefore tagging is already needed just to distinguish lists from atoms.--[[User:EMBee|eMBee]] 17:37, 3 November 2011 (UTC)
::: There are dirty ways to distinguish (which I use in the task on JSON) but generally speaking Tcl programmers write their code to not need to ask whether a particular word is a list or terminal; it's not a distinction that they draw. (It's a different attitude.) The "native" representation of the above would be:
::: <pre>{{data {quoted data} 123 4.5} {data {!@# 4.5 (more data)}}}</pre>
:::But as you can see there's a fair loss of information because lists ''are'' a subtype of strings and singleton lists can be rendered like their first element under (common) circumstances. The alternative to putting the tagging inline would have been to have returned a separate type descriptor. (Generating everything from the fully parsed representation is left as an exercise.) –[[User:Dkf|Donal Fellows]] 10:47, 4 November 2011 (UTC)
 
==A bit late to add the extra credit?==
:''"'''Extra Credit:''' Let the writer produce pretty printed output with indenting and line-breaks."''
There are already a large number of solutions to the task. I was thinking of arguing for the removal of the extra credit that has just been added for that reason, but would others give their opinions. Thanks. --[[User:Paddy3118|Paddy3118]] 06:41, 26 October 2012 (UTC)
:: if it wasn't so late i would have added it to the task directly. being late is the reason for it being extra credit. this way the exiting tasks are not invalidated.
:: one could argue to make this a separate task though, but it's debatable which is better. --[[User:EMBee|eMBee]] 07:18, 26 October 2012 (UTC)
 
== extend task for Associative Arrays, Mappings, Dictionarys? ==
 
I'd like to extend this task to support Associative Arrays using the Lisp method of Cons Pairs:
((key1 . value1) (key2 . value2)) for { key1:value1, key2:value2 }
 
i think it is better to add it as Extra Credit to this task instead of creating a new task for it because a new task would require all of the code that this task needs, and then we'll get people wondering if the tasks should be merged.
 
i am aware that (a b) is equivalent to (a . (b)) in Lisp and therefore any list could be treated as an associative array, however for the purpose of this task (and for the usability as an interchangeable data format, we could just require that a pair with a list as value be written as (a . (b))
 
for the same reason a solution in lisp better ought to use a hash type when reading the data.
--[[User:EMBee|eMBee]] 11:07, 26 October 2012 (UTC)
 
== Haskell version failing with GHC 7.10.3 and current Stack ==
Compilation failing with two errors at the moment:
# 'between' is not in scope. (Not included in the imports)
# "Precedence parsing error cannot mix ‘<?>’ [infix 0] and ‘<?>’ [infix 0] in the same infix expression"-- [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 22:12, 27 April 2017 (UTC)
 
== JavaScript version bugfix for \" and \n in strings ==
I found that the present solution doesn't treat escaped double quotes and newlines in strings right (used in KiCad)
 
(property "ki_description" "Power symbol creates a global label with name \"GND\" ,\nexample new line")
 
See below for my "hack" for the procedural version. The functional version is too difficult to read for me - in any case, I hope that the original author comes back to fix it correctly, right now it just works for my purpose
String.prototype.parseSexpr = function () {
//schweick: modified first option in regexp to catch string containing \"
//I just trusted the expression from https://stackoverflow.com/a/30737232 (some others would not work):
var t = this.match(/\s*("(?:[^"\\]*(?:\\.)?)*"|\(|\)|"|[^\s()"]+)/g)
for (var o, c = 0, i = t.length - 1; i >= 0; i--) {
var n, ti = t[i].trim()
if (ti == '"') return
else if (ti == '(') t[i] = '[', c += 1
else if (ti == ')') t[i] = ']', c -= 1
else if ((n = +ti) == ti) t[i] = n
// schweick: inserted two replacements to double escape \", \n in strings
// which otherwise would loose their backslash in toPretty():
else t[i] = '\'' + ti.replace('\'', '\\\'').replace(/\\"/g, '\\\\"').replace(/\\n/g, '\\\\n') + '\''
if (i > 0 && ti != ']' && t[i - 1].trim() != '(') t.splice(i, 0, ',')
if (!c) if (!o) o = true; else return
}
return c ? undefined : eval(t.join(''))
}
Please delete my comment above when taken care of on the main page
[[User:Schweick|Schweick]] ([[User talk:Schweick|talk]]) 09:17, 25 January 2023 (UTC)
3

edits