{{stub}}{{language
|site=http://www.nongnu.org/txr/}}
{{language programming paradigm|functional}}
{{language programming paradigm|procedural}}
{{language programming paradigm|object-oriented}}
{{language programming paradigm|imperative}}
{{language programming paradigm|declarative}}
 
TXR is a new text extraction language implemented in [[C]], running on POSIX platforms such as [[Linux]], [[Mac OS X]] and [[Solaris]], as well as on [[Microsoft Windows]]. It is a dynamic, high level language originally intended for "data munging" tasks in Unix-like environments, particularly tasks requiring accurate, robust text scraping from loosely structured documents.
 
The Rosetta Code TXR solutions can be viewed in color, all on one page with a convenient navigation pane, [http://www.nongnu.org/txr/rosetta-solutions.html here].

TXR started as a language for "reversing here-documents": evaluating a template of text containing variables, plus useful pattern matching directives, against some body of text, and binding the pieces of text that match the variables. The variable bindings were output in POSIX shell variable assignment syntax, allowing for shell code like

<code>eval $(txr <txr-program> <args> ...)</code>

From the beginning, TXR was internally built on a Lisp-based data model, and it eventually exposed a Lisp dialect that came to be known as TXR Lisp. TXR Lisp at first complemented the pattern extraction language, extending its power, but eventually became distinct. Programs can be written in TXR Lisp with no traces of the TXR pattern language, or vice versa.

TXR Lisp is an original dialect that contains many innovative features, which work together to express neat, compact solutions to everyday data processing problems. Programmers familiar with Common Lisp will be comfortable with TXR Lisp, and there is much to like for those who use Scheme, Racket or Clojure. TXR Lisp also incorporates ideas from contemporary scripting languages; a key motivation in many of its developments is the promotion of succinctness, which is something that often isn't associated with languages in the Lisp family.

The source of a TXR query is literal text except for directives and variables preceded by the <code>@</code> character.

Computation evolves by textual pattern matching with implicit backtracking. Non-pattern matching activities are embedded into a pattern matching paradigm. For instance, the line

<pre>Four score and seven years ago,</pre>

is a TXR directive which matches a line of that exact text, or else fails. The following is also a directive:

<lang txr>@(bind foo "abc")</lang>

which succeeds if <code>foo</code> has no prior binding, or already contains <code>"abc"</code>, but fails if <code>foo</code> has a binding to something other than <code>"abc"</code>.

The success of a directive means that computation proceeds to the next directive (and, if this is a pattern match, the input position advances). Failure means that the enclosing query fails, triggering back-tracking behaviors and possibly failure of the entire query.
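
The binding rule and the success/failure rule described above can be modelled outside TXR. The following is a minimal Python sketch, not TXR code and not a description of how TXR is implemented. The names <code>bind</code> and <code>run_directives</code>, and the use of a dictionary for bindings, are hypothetical and chosen only for illustration. Backtracking is deliberately left out: the sketch only shows that a bind succeeds on an unbound or equal variable, and that the first failing directive fails the enclosing sequence.

```python
def bind(bindings, name, value):
    """Model of @(bind name value): return the new bindings, or None on failure."""
    if name not in bindings:
        # No prior binding: succeed and record the new one.
        return {**bindings, name: value}
    if bindings[name] == value:
        # Already bound to the same value: succeed, nothing changes.
        return bindings
    # Bound to something else: the directive fails.
    return None


def run_directives(directives, bindings=None):
    """Run directives in order; the first failure fails the whole sequence."""
    bindings = {} if bindings is None else bindings
    for directive in directives:
        bindings = directive(bindings)
        if bindings is None:
            return None
    return bindings


if __name__ == "__main__":
    bind_foo_abc = lambda b: bind(b, "foo", "abc")
    print(run_directives([bind_foo_abc]))                  # foo was unbound
    print(run_directives([bind_foo_abc], {"foo": "abc"}))  # same value
    print(run_directives([bind_foo_abc], {"foo": "xyz"}))  # conflicting value
```

Returning a fresh dictionary instead of mutating the old one means a failed directive leaves earlier bindings untouched, which is the property a backtracking matcher would need.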
 
==Simple Query==
 
Here is a very basic hello-world-type TXR query that re-implements the "free" utility:
 
<lang txr>#!/usr/bin/txr -f
@(next "/proc/meminfo")
@(skip)
MemTotal:@/ +/@TOTAL kB
MemFree:@/ +/@FREE kB
Buffers:@/ +/@BUFS kB
Cached:@/ +/@CACHED kB
@(skip)
SwapTotal:@/ +/@SWTOT kB
SwapFree:@/ +/@SWFRE kB
@(next `!echo $(( @TOTAL - @FREE ))`)
@USED
@(next `!echo $(( @USED - @BUFS - @CACHED ))`)
@RUSED
@(next `!echo $(( @FREE + @BUFS + @CACHED ))`)
@RFREE
@(next `!echo $(( @SWTOT - @SWFRE ))`)
@SWUSE
@(output)
TOTAL USED FREE BUFFERS CACHED
Mem: @{TOTAL -12} @{USED -12} @{FREE -12} @{BUFS -12} @{CACHED -12}
+/- buffers/cache: @{RUSED -12} @{RFREE -12}
Swap: @{SWTOT -12} @{SWUSE -12} @{SWFRE -12}
@(end)</lang>
 
Sample run:
 
<pre>$ ./meminfo.txr
TOTAL USED FREE BUFFERS CACHED
Mem: 769280 647752 121528 160108 286844
+/- buffers/cache: 200800 568480
Swap: 1048568 18200 1030368
</pre>
 
Arithmetic is not implemented in TXR as of version 035. The above example instead continues the pattern matching across invocations of <code>echo</code>, borrowing the shell to do the math. The command
 
<pre>@(next `!echo $(( @TOTAL - @FREE ))`)
@USED
</pre>
 
means, "Next, please switch to scanning the output of this echo command with some variables substituted in. Then capture the entire first line of that output into the USED variable."
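
To check the figures in the sample run, the four shell computations can be reproduced directly. This is a Python sketch rather than TXR; the function name <code>derive</code> is hypothetical, and the input values are copied from the sample run above.

```python
def derive(total, free, bufs, cached, swtot, swfre):
    """Compute the derived figures that the query obtains through echo $(( ... ))."""
    used = total - free              # @USED
    rused = used - bufs - cached     # @RUSED
    rfree = free + bufs + cached     # @RFREE
    swuse = swtot - swfre            # @SWUSE
    return {"USED": used, "RUSED": rused, "RFREE": rfree, "SWUSE": swuse}


if __name__ == "__main__":
    # Values in kB, taken from the sample run.
    print(derive(769280, 121528, 160108, 286844, 1048568, 1030368))
```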
 
== Complex Query ==
 
Here is a TXR query for computing the complete dependencies of a C source file (including system and compiler headers) on a typical GNU/Linux system, demonstrating features like parallel clauses, recursion and exception handling.
 
<lang txr>@(define process_file (dir file already_visited visited_out))
@ (local this_file next_file header_dir next_dir next_dir2 visited_out_next)
@ (bind this_file `@dir@file`)
@ (none)
@ (bind already_visited this_file)
@ (end)
@ (merge already_visited already_visited this_file)
@ (next this_file)
@ (collect)
@ (cases)
#include "@*header_dir/@next_file"
@ (bind next_dir `@dir@header_dir/`)
@ (or)
#include "@next_file"
@ (bind next_dir dir)
@ (or)
#@/ */include <@*header_dir/@next_file>
@ (bind next_dir `@sys_includes@header_dir/`)
@ (bind next_dir2 `@gcc_includes@header_dir/`)
@ (or)
#@/ */include <@next_file>
@ (bind next_dir sys_includes)
@ (bind next_dir2 gcc_includes)
@ (or)
#@/ */include_next <@*header_dir/@next_file>
@ (bind next_dir `@gcc_includes@header_dir/`)
@ (or)
#@/ */include_next <@next_file>
@ (bind next_dir gcc_includes)
@ (end)
@ (try)
@ (process_file next_dir next_file already_visited visited_out_next)
@ (merge already_visited already_visited visited_out_next)
@ (catch file_error)
@ (try)
@ (process_file next_dir2 next_file already_visited visited_out_next)
@ (merge already_visited already_visited visited_out_next)
@ (catch file_error)
@ (end)
@ (end)
@ (end)
@ (bind visited_out this_file)
@ (try)
@ (flatten visited_out_next)
@ (merge visited_out visited_out visited_out_next)
@ (catch)
@ (end)
@(end)
@(next :args)
@(cases)
@*dir/@*file.@suffix
@ (bind directory `@dir/`)
@(or)
@*file.@suffix
@ (bind directory "")
@(end)
@(next `!gcc -print-search-dirs`)
install: @gcc_install
@(bind gcc_includes `@{gcc_install}include/`)
@(bind sys_includes "/usr/include/")
@(process_file directory `@file.@suffix` nil list_out)
@(output)
@(rep) @list_out@(first)@file.o:@(end)
@(end)</lang>
 
Sample run:
 
<pre>$ txr dep.txr match.c
match.o: /usr/include/stdio.h /usr/include/features.h /usr/include/sys/cdefs.h /usr/include/bits/wordsize.h /usr/include/gnu/stubs.h /usr/include/gnu/stubs-32.h /usr/lib/gcc/i586-redhat-linux/4.4.1/include/stddef.h /usr/include/bits/types.h /usr/include/libio.h /usr/include/_G_config.h /usr/include/wchar.h /usr/lib/gcc/i586-redhat-linux/4.4.1/include/stdarg.h /usr/include/bits/wchar.h /usr/include/wctype.h /usr/include/endian.h /usr/include/bits/endian.h /usr/include/bits/byteswap.h /usr/include/xlocale.h /usr/include/bits/wchar2.h /usr/include/bits/wchar-ldbl.h /usr/include/gconv.h /usr/include/bits/stdio-lock.h /usr/include/bits/libc-lock.h /usr/include/pthread.h /usr/include/sched.h /usr/include/time.h /usr/include/bits/time.h /usr/include/bits/sched.h /usr/include/signal.h /usr/include/bits/signum.h /usr/include/bits/siginfo.h /usr/include/bits/sigaction.h /usr/include/bits/sigcontext.h /usr/include/asm/sigcontext.h /usr/include/linux/types.h /usr/include/linux/posix_types.h /usr/include/linux/stddef.h /usr/include/asm/posix_types.h /usr/include/asm/types.h /usr/include/asm-generic/int-ll64.h /usr/include/bits/sigstack.h /usr/include/sys/ucontext.h /usr/include/bits/pthreadtypes.h /usr/include/bits/sigthread.h /usr/include/bits/setjmp.h /usr/include/bits/libio-ldbl.h /usr/include/bits/stdio_lim.h /usr/include/bits/sys_errlist.h /usr/include/getopt.h /usr/include/ctype.h /usr/include/bits/stdio.h /usr/include/bits/stdio2.h /usr/include/bits/stdio-ldbl.h /usr/include/stdlib.h /usr/include/bits/waitflags.h /usr/include/bits/waitstatus.h /usr/include/alloca.h /usr/include/bits/stdlib.h /usr/include/bits/stdlib-ldbl.h /usr/include/string.h /usr/include/bits/string.h /usr/include/bits/string2.h /usr/include/bits/string3.h /usr/include/assert.h /usr/include/errno.h /usr/include/bits/errno.h /usr/include/linux/errno.h /usr/include/asm/errno.h /usr/include/asm-generic/errno.h /usr/include/asm-generic/errno-base.h /usr/include/dirent.h /usr/include/bits/dirent.h 
/usr/include/bits/posix1_lim.h /usr/include/bits/local_lim.h /usr/include/linux/limits.h /usr/include/setjmp.h config.h lib.h gc.h unwind.h regex.h /usr/include/limits.h /usr/lib/gcc/i586-redhat-linux/4.4.1/include/limits.h /usr/lib/gcc/i586-redhat-linux/4.4.1/include/syslimits.h /usr/include/bits/posix2_lim.h /usr/include/bits/xopen_lim.h stream.h parser.h txr.h utf8.h filter.h match.h</pre>
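
The control flow of the query (skip files already visited, scan for include lines, recurse, and fall back to a second directory when a file cannot be opened) can be sketched in Python for comparison. This is an approximation under stated assumptions, not a translation of the TXR query. The regular expression is a simplification, the function signature is hypothetical, and the two include directories are passed as parameters rather than obtained from <code>gcc -print-search-dirs</code>.

```python
import os
import re

# Matches lines such as  #include "a/b.h",  # include <x.h>  and  #include_next <y.h>.
INCLUDE_RE = re.compile(r'^#\s*(include_next|include)\s*(["<])([^">]+)[">]')


def process_file(directory, name, visited, sys_includes, gcc_includes):
    """Return the files reachable from directory+name, in depth-first order."""
    this_file = os.path.join(directory, name)
    if this_file in visited:
        return []                      # already handled: contributes nothing
    with open(this_file) as source:    # raises OSError if the file is missing
        lines = source.readlines()
    visited.add(this_file)
    found = [this_file]
    for line in lines:
        match = INCLUDE_RE.match(line)
        if not match:
            continue
        kind, quote, path = match.groups()
        sub_dir, header = os.path.split(path)
        if quote == '"':
            candidates = [os.path.join(directory, sub_dir)]
        elif kind == "include":
            candidates = [os.path.join(sys_includes, sub_dir),
                          os.path.join(gcc_includes, sub_dir)]
        else:  # include_next
            candidates = [os.path.join(gcc_includes, sub_dir)]
        for candidate in candidates:
            try:
                found += process_file(candidate, header, visited,
                                      sys_includes, gcc_includes)
                break                  # first directory that works wins
            except OSError:
                continue               # fall back to the next directory
    return found
```

A call such as <code>process_file("", "match.c", set(), "/usr/include/", gcc_dir)</code>, where <code>gcc_dir</code> stands for the compiler's include directory, would return a list comparable to the dependency line shown above. Headers that cannot be found in any candidate directory are silently skipped, mirroring the empty <code>catch</code> clauses in the query.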