Category:TXR: Difference between revisions

Describe some Lisp innovations.
mNo edit summary
(Describe some Lisp innovations.)
Line 8:
<code>eval $(txr <txr-program> <args> ...)</code>
 
TXR remains close to these roots: its main language is the pattern-based text extraction notation well -suited for matching large regions of
entire text documents. That language has become powerful enough to neatly parse grammars, while the tool developed a Lisp dialect to round out the functionality and increase its power.
entire text documents.
 
=== What's with all that <code>@</code> stuff? ===
Line 27:
 
The second unusual feature in TXR Lisp is that the "tokens" in the pattern matching language are essentially themselves Lisp symbols and expressions. These "tokens" are used to create a block-structured language. This is quite odd. For instance a construct might begin with a <code>@(collect :vars (foo))</code>. This is a Lisp expression with interior structure, but to the parser of the pattern language, it's also basically just a token, like a giant keyword. IT begins a collect clause, and is followed by some optional material which may just be literal text, and must be terminated by the <code>@(end)</code> directive: another token-expression entity.
 
=== Dual Personality ===
 
From the solutions below, it is obvious that some of them make heavy use of the pattern language and little or no TXR Lisp. Others are quite the other way around, based entirely or almost entirely on Lisp, whereas others follow strategies of mixing the approaches.
 
=== Lisp Innovation ===
 
Although TXR Lisp is heavily based on prior work and strongly influenced by Common Lisp (for instance, there is a real <code>nil</code> symbol which is a list terminator and means "false", and it's primarily Lisp-2 dialect), TXR Lisp nevertheless manages to innovate within the world of Lisp.
 
==== Can't We All Just Get Along? ====
 
One major innovation is the way in which TXR Lisp closes the ages-old gap between Lisp-1 and Lisp-2, and the associated religious sectarian division it has caused. TXR Lisp recognizes that there are advantages both in having separate function and variable namespaces, and in having a single namespace: and it provides support for both approaches. The idea is devilishly simple. A special operator called "dwim" (do what I mean) is provided such that <code>(dwim forms ...)</code> basically denotes <code>(forms ...)</code>, but with Lisp-1 style evaluation: any of the forms which are symbols are resolved according to a flattened namespace that combines variable and function bindings (with conflicts resolved in favor of variables). TXR Lisp programmers do not use the <code>dwim</code> operator directly, but rather the square bracket notation <code>[a b c ...] -> (dwim a b c)</code>. So for instance, <code>[mapcar list '(1 2 3) '(a b c)]</code> zips together two lists. Using round brackets, it would have to be <code>(mapcar (fun list) '(1 2 3) '(a b c))</code>. Moreover, if <code>foo</code> is a variable that holds a function of one argument, it has to be called using <code>(call foo arg)</code>. But with the "dwim brackets", it is just <code>[foo arg]</code>. The "dwim brackets" also support various funcall-extensions, like indexing into arrays and slicing and so on. If <code>h</code> is a hash then <code>[h "a" 42]</code> looks up the string <code>a</code> in the hash and returns the value, or else returns 42 if the key is not found. If <code>a</code> is an array, list or string, then <code>[a 2..3]</code> extracts a slice. On the other hand, "dwim brackets" do not support operators: <code>[let (a b c) ...]</code> is not valid; or rather, it expects a variable or function <code>let</code> to exist, ignoring the <code>let</code> operator. Operators are strictly in the Lisp-2 domain, which is fine because Lisp-2 is a cleaner model with regard to operators anyway!
 
==== Polymorphizing the Classics ====
 
In TXR Lisp, classic Lisp operations like <code>mapcar</code>, <code>cdr</code> or <code>append</code> work exactly like they are supposed to, when they operate on lists. However, most of the operations have been extended to also work with strings and vectors. So for instance, <code>(mapcar (op + 1) "abc")</code> nicely yields the string <code>"bcd"</code> where <code>1</code> has been added to every character code of the original string. <code>(car "abc")</code> yields <code>#\a</code>, a character, <code>(cdr "abc")</code> yields <code>"bc"</code>, and <code>(cdr "c")</code> yields <code>nil</code> (rather than the empty string!) (Almost) everything proceeds from there.
 
==== One Quote to Bind Them ====
 
TXR Lisp dispenses with the distinction between quoted lists and quasi-quoted lists, in favor of an experiment to combine them. There one quoted syntax denoted by forward quote. This syntax supports unquoting. If you don't unquote anything, you get a regular quote. Splices are not spelled <code>,@</code> because we have had enough of the <code>@</code> character, and because it is ambiguous: does <code>,@x</code> mean "unquote (meta x)" or "splice x?" So the splice operator is denoted by an asterisk. Because there is only one kind of quote, there is some reduction in power in regard to mixing quotes and quasiquotes in one expression due to the ambiguity. The quasi-quote expander contains some hacks to make all the common cases work fine. Notably, the combinations <code>,'form</code> and <code>,*'form</code> (unquote quoted expression and splice quoted expression) are treated specially: the "comma cancels the quote", and so any unquotes inside <code>form</code> do not belong to the inner quote, but to the surrounding quote.
 
==== Good Hacker Borrow; Great Ones Steal ====
 
TXR steals ideas from libraries and languages related to Lisp. The <code>op</code> operator is inspired by Goo's <code>op</code>, and the cl-op library for Common Lisp which is also inspired by Goo's <code>op</code>.
 
Array and list slices are straight out of Python, right down to the half-open ranges (not including the upper endpoint), and negative values indexing from the tail of the array. The concept is further improved upon by allowing the Lisp symbols <code>nil</code> and <code>t</code> to be used as range indexes.
 
Slices, even empty ones, are assignable places in TXR Lisp: we can easily replace a possibly empty section of a string, lisp or vector, with a sequence that is of different length, also possibly empty: for instance, if <code>x</code> contains <code>"abc"</code> then (set [x 1..2] "foo")</code> changes <code>x</code> to contain <code>"afooc"</code>.
543

edits