Talk:Execute a Markov algorithm


Rules re-write needed

The example rules should be in the format of the WP article, as they are hard to follow as given. (Is there any reason not to use one of the rule sets worked out from the WP article?) --Paddy3118 07:55, 15 December 2009 (UTC)

I put it in Extended BNF notation, which is more precise than the format in the WP article, and it is a standard for representing grammars. There are only two main differences: the WP version has quotes, which aren't really necessary; and this one allows for comments using #. --Rob.s.brit 16:21, 15 December 2009 (UTC)
(I originally wrote this early this morning, but I didn't notice an edit conflict with the J talk addition before I went to work.) While EBNF is pretty common (and interesting on its own), I can't help but wonder if the parsing of the Markov rules should be a distinct piece of functionality, i.e. a Parse EBNF task and an Execute Markov Algorithm task. The particulars of parsing EBNF seem irrelevant to Markov chains themselves. While there are broad tasks on RC (such as RCRPG and a few interpreters), they're usually not created until after the individual concepts have their places elsewhere. --Michael Mol 08:17, 16 December 2009 (UTC)
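For what it's worth, here is a minimal sketch of a lexer for that rule format in Python (hypothetical code, not one of the task's entries; it assumes whole lines starting with # are comments and that a replacement prefixed with '.' marks a terminating rule, as in the J lexer further down this page):
<lang python>import re

def parse_rules(text):
    """Parse a Markov ruleset into (pattern, replacement, terminating) triples:
    skip comment and blank lines, split each rule on the first
    whitespace-delimited '->', and treat a leading '.' on the replacement
    as the terminating-rule marker."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.startswith('#'):
            continue                                  # comment or blank line
        pattern, replacement = re.split(r'\s+->\s+', line, maxsplit=1)
        terminating = replacement.startswith('.')
        if terminating:
            replacement = replacement[1:]             # drop the '.' marker
        rules.append((pattern, replacement, terminating))
    return rules</lang>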

J

explicit vs tacit

Returning to this just now, I realized from the WP description (and its talk page) that the Markov Algorithm has a very strict order of evaluation -- much stricter than I originally considered. In particular, the entire ruleset must be applied after every successful match.

That means, for example, if rule 0 matches, then make one and only one substitution (to the leftmost match) and reapply rule 0 until it no longer matches. Then apply rule 1. If that matches, then start all over and apply rule 0. And so on. At any point, if a terminating rule matches, make the single substitution and quit.
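To pin that down, here is a minimal sketch of that evaluation order in Python (hypothetical code, not any of the task's entries), assuming the rules have already been parsed into (pattern, replacement, terminating) triples:
<lang python>def run_markov(rules, text):
    """Apply Markov rules with the strict order described above: find the
    first rule whose pattern occurs in the text, substitute only its leftmost
    match, then restart from rule 0; stop after a terminating rule, or when
    no rule matches at all."""
    while True:
        for pattern, replacement, terminating in rules:
            index = text.find(pattern)
            if index != -1:
                text = text[:index] + replacement + text[index + len(pattern):]
                if terminating:
                    return text
                break              # a substitution was made: restart from rule 0
        else:
            return text            # no rule matched: the string is stable</lang>
Restarting from rule 0 after every substitution gives exactly the behaviour described: rule 0 keeps being reapplied while it matches, and later rules only get a turn once the earlier ones stop matching. (Incidentally, the for/else here is the same idiom Paddy mentions using in the Python entry further down.)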

I haven't read the other implementations yet, but at least for the J solution it means we must take a different approach. In particular, we cannot use stringreplace because it replaces all matches (even for a single rule) simultaneously. We must be more granular and iterative.

So I suggest we use an explicit while. loop, as distasteful as that is*. Maybe something along the following lines:

<lang j>markovLexer =: verb define
  rules =. LF cut TAB&=`(,:&' ')}y
  rules =. a: -.~ (dltb@:{.~ i:&'#')&.> rules
  rules =. 0 _1 {"1 '\s+->\s+' (rxmatch rxcut ])S:0 rules
  NB. Determine which rules are terminating
  (,. ] (}.&.>~ ,. ]) ('.'={.)&.>)/ |: rules
)

replace =: dyad define
  'index patternLength replacement'=. x
  'head tail' =. index split y
  head, replacement, patternLength }. tail
)

matches =: E. i. 1:

markovStrict =: dyad define
  rules =. markovLexer x
  ruleIdx =. 0
  while. ruleIdx < #rules do.
    'pattern replacement terminating' =. ruleIdx { rules
    if. (#y) > index =. pattern matches y do.
      y =. (index ; (#pattern) ; replacement) replace y
      if. terminating do.
        NB. If terminating rule, just return current string
        ruleIdx =. #rules
      else.
        NB. Else reevaluate from the beginning after every match
        ruleIdx =. 0
      end.
    else.
      ruleIdx =. 1 + ruleIdx
    end.
  end.
  y
)</lang>

Of course, this can be translated fairly mechanically to tacit code:

<lang j>markovStrictT =: [: finalize markovLexer@:[ evalMarkov^:( noMoreRules )^:_ initialize

 initialize       =.  0 ; ]                            NB.  First rule is 0 obviously
 finalize         =.  >@:{:                            NB.  Don't need ruleNo when finished
 noMoreRules      =.  >@:{.@:] < #@:[                  NB.  Done when rule index = # of rules
 evalMarkov       =.  ruleNo  nextRule`applyRule@.ruleMatched checkRule
   nextRule       =.  >:@:[ ; {.@:]
   ruleMatched    =.  [: (>~ #)~&>/ 2 {. ]
   applyRule      =.  (terminated ; replaceArg replace >@:{.)@:]
     terminated   =.  _ * >@:{:@:]                     NB.  (_ * terminated) = _ when terminated, 0 otherwise
     replaceArg   =.  1&{ , #&.>@:(2&{) , 3&{          NB.  index ; (#pattern) ; replacement
     replace      =.  2 ;@:A. >@:{:@:[ ; (0 , 1 {:: [) }.&.> (split~ >@:{.)~
   checkRule      =.  rule match txt
     txt          =.  1 {:: ]
     rule         =.  {~ >@:{.
     match        =.  ] ; (>@:{.@:[ index ]) ; [       NB.  txt ; index ; pattern ; replacement ; terminating
       index      =.  E. i. 1:
     ruleNo       =.  0 {:: ]
 markovLexer      =.  (,. ] (}.&.>~ ,. ]) ('.'={.)&.>)/@:|:@:lexed@:normal@:lines 
   lines          =.  LF cut TAB&=`(,:&' ')}
   normal         =.  a: -.~ (dltb@:{.~ i:&'#')&.> 
   lexed          =.  0 _1 {"1 '\s+->\s+' (rxmatch rxcut ])S:0 ]</lang>

But I do not think the tacit version can claim any advantage over the explicit one in this case, unless someone can come up with a clearer way of expressing the algorithm tacitly (maybe using a different approach).

Comments?

--DanBron 00:18, 16 December 2009 (UTC)

* I agree with "Physis" on WP who thinks the Markov Algorithm specification is ill considered, and who probably came to that conclusion the same way -- by having to bend over backwards in an implementation to accommodate its spec -- fighting the natural expression.
I was new to this Markov stuff and just took it as read - that is the spec, now implement it. I did enjoy the task, as getting the implementation right made me use the else clause of the Python for statement, which I hardly ever do.
I guess, on reflection, the logic in the replace function of the Python example is not straightforward, but at the time I just took the spec as being correct; I had no reason to question it. --Paddy3118 05:03, 16 December 2009 (UTC)
So did I, until I re-read the WP page, and in particular Physis' "Example" section on the Talk page. The primary surprise (for me) is that you may only do one replacement at a time, then you must start all over from the beginning, until you hit a terminating rule or you stop doing replacements (i.e. the output is stable).
So, e.g., if you have the rules baa->def, a->b, then applied to the input aaa your result must be def. This was an eye-opener for me, because I would expect the result bbb. But then, J is an array-oriented language and likes to do things "all at once" or "in bulk", so maybe this single-match iterative approach is natural to scalar languages. --DanBron 12:48, 16 December 2009 (UTC)
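To see the contrast concretely, here is a small self-contained Python sketch of the two behaviours on that ruleset (not one of the task's entries; the bulk version below only approximates what stringreplace does):
<lang python>RULES = [("baa", "def"), ("a", "b")]        # no terminating rules in this example

def strict(text):
    """One leftmost substitution at a time, then restart from the first rule."""
    while True:
        for pattern, replacement in RULES:
            if pattern in text:
                text = text.replace(pattern, replacement, 1)
                break
        else:
            return text

def all_at_once(text):
    """Replace every occurrence of the first matching rule in a single pass."""
    while True:
        for pattern, replacement in RULES:
            if pattern in text:
                text = text.replace(pattern, replacement)
                break
        else:
            return text

print(strict("aaa"))        # def
print(all_at_once("aaa"))   # bbb</lang>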
Would the all-at-once version of the language still be Turing-complete? --Ce 17:49, 16 December 2009 (UTC)
Do you mean the all-at-once version of the Markov algorithm written in J, or the all-at-once flavor of J? Certainly the all-at-once flavor of J is Turing complete (a well-known Jer implemented a Turing machine in it). I don't know about the Markov algorithm, but I suspect it is TC because, as Paddy points out, Markov is just a constricted term rewriter (and term rewriters are Turing complete in general).
The all-at-once Markov algorithm (I assumed that J is Turing complete, because general-purpose programming languages usually are). I've now figured out the answer myself: it is. Indeed, it is quite easy to write a Turing machine in it (it made me catch a bug in my C++ implementation, though). It behaves the same under both the all-at-once and the first-match-only versions, because it only operates at the head position, and there's only one head. --Ce 23:17, 17 December 2009 (UTC)
While it might well be TC, it would also be a different algorithm; you'd have to write things differently. Solve this problem, and then maybe write the other one up as another {{task}}... –Donal Fellows 10:26, 18 December 2009 (UTC)

Markov a specific form of Rewriting scheme?

I got lost in this, and this, and this. After I surfaced, it seemed to me that the Markov algorithm is a specific case of more general rewriting schemes, so maybe you had been expecting the more general case? --Paddy3118 13:07, 16 December 2009 (UTC)

original version

I just slapped together the J solution in 20 seconds, so it doesn't fully implement a Markov evaluator. In particular, it doesn't handle terminating rules; there may be other small wrinkles, but that depends on how one interprets the (underspecified) BNF grammar. All would be trivial to fix, but I don't have time now. Anyway, here are the gaps I can think of:

  1. Terminating rules are not handled. The fix is easy but would involve modification or duplication of a standard function, stringreplace.
  2. Similarly, according to WP,

    if [a rule is matched], replace the leftmost matching text in the input string with its replacement in the first applicable rule.

    But stringreplace will replace all instances of matches at once. Again, a trivial fix as long as we're willing to reprise stringreplace.
  3. The standard function rxcut will cut on all \s+->\s+ though the grammar calls for ignoring any of these after the first (so the replacement string may carry a ->). Fix is trivial, where we'd probably use rxE instead. Silly me, we can just use rxmatch rather than rxmatches. --DanBron 17:47, 15 December 2009 (UTC)
  4. Note also that leading blanks in the rule and trailing blanks in the replacement are removed -- the grammar doesn't call for this. Very trivial fix.

I just realized it took me longer to write up the description of these gaps than it would've taken me to fix them. :P --DanBron 16:27, 15 December 2009 (UTC)

The above comments applied to the original version:

<lang j>require'strings regex'

NB. Lex a Markov program
markovLexer =: verb define
  rules =. LF cut TAB&=`(,:&' ')}y
  rules =. a: -.~ (dltb@:{.~ i:&'#')&.> rules
  0 _1 {"1 '\s+->\s+' (rxmatch rxcut ])S:0 rules
)

NB. Given ruleset and target string, output
NB. result.
markov =: markovLexer@[ stringreplace^:_ ]

NB. Same as above, but output all intermediate
NB. evaluations
markovDebug =: markovLexer@[ stringreplace^:a: ]</lang>

... which has since been replaced.

Multiplication

To show the power of a Markov Algorithm engine, I've included a sample test case that performs unary multiplication. If you can do this, you've got a good implementation. (I'm not sure if this indicates that such engines are Turing complete, but I suspect it does.) –Donal Fellows 09:51, 17 December 2009 (UTC)

lede too long

The task description (everything prior to the TOC) is long enough to be distracting -- if I didn't know RC's conventions, I might never scroll past it all to see the implementations. Can we somehow wrap the lede around the TOC? Or put the rulesets on a subpage and link to them? Or have fewer rulesets? I'm actually surprised we need so many; though annoying, the implementation is trivial, and the (distasteful) while loop I tossed together passed all the tests without modification (well, I keep modifying it for aesthetic reasons, but it hasn't changed functionally). Can we just keep the "hardest" or "most comprehensive" one?

--DanBron 14:33, 18 December 2009 (UTC)

Easiest and hardest ones? --Paddy3118 14:52, 18 December 2009 (UTC)
I was thinking the hardest one, because doing that implies you can do all the rest. The full "test suite" could be linked to as a sub page. -DanBron 18:48, 18 December 2009 (UTC)
I think that should solve the problem... --Michael Mol 19:38, 18 December 2009 (UTC)
This leads me to suspect we should start discussing task layout and style. --Michael Mol 19:41, 18 December 2009 (UTC)

Is the pattern greedy or conservative?

In the rule:

foo -> -> bar

Is the pattern foo ->, and the replacement bar; or is the pattern foo, and the replacement -> bar?
What should be the right answer, and does the BNF express it? --Paddy3118 22:30, 18 December 2009 (UTC)

You'd think BNF would consistently be either minimal or greedy, but I couldn't find anything with Wikipedia or Google. I guess we just shouldn't consider that case. There are a number of related flaws in the ruleset syntax; there's no way to end a pattern with whitespace, for example. —Underscore (Talk) 17:56, 20 December 2009 (UTC)
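Just to make the two readings concrete, a tiny Python illustration (using a plain ' -> ' separator in place of the grammar's general whitespace); note that the J lexer above happens to give pattern foo and replacement -> bar, because rxmatch cuts on the first \s+->\s+ only:
<lang python>rule = "foo -> -> bar"
print(rule.split(' -> ', 1))    # ['foo', '-> bar']   pattern 'foo',    replacement '-> bar'
print(rule.rsplit(' -> ', 1))   # ['foo ->', 'bar']   pattern 'foo ->', replacement 'bar'</lang>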

Task addition

Paddy3118 today added to the task description:

In order to promote flexibility, the interpreter should load the set of rules from one file, take the string to operate on from a second file, and write the output to a third.

IMHO that's the antithesis of flexibility. You can redirect or pipe standard input/standard output. You can't easily do that with files. --Ce 22:32, 18 December 2009 (UTC)

Hi Ce, I didn't make the addition to the task description. In fact, I too thought it was less flexible so wrote the following in the Python example:
"The example gains flexibility by not being tied to specific files. The functions may be imported into other programs which then can provide textual input from their sources without the need to pass 'file handles' around."
--Paddy3118 05:09, 19 December 2009 (UTC)
Oops, sorry for the mis-attribution. I hadn't noticed the sentence before and your latest edit had a change comment about flexibility, so I mistakenly thought you had added it. I should have checked the history more carefully. --Ce 09:03, 19 December 2009 (UTC)
Yeah, I completely ignored that rule when I wrote the Perl and Haskell implementations. —Underscore (Talk) 18:15, 19 December 2009 (UTC)
I excised the rule. If anyone wants to put it back, we can discuss that here, first. --DanBron 18:41, 19 December 2009 (UTC)
I wrote that bit as part of an effort to characterize the original implementation and build up the task description. (The task description was very rough to start out with!) I still think it makes sense to load the rulesets from an external file. After all, they're a program (a nearly pure Turing machine) and they can get rather long, and it's useful to load them from a file as this encourages separation of the interpretation engine from the ruleset itself. –Donal Fellows 09:35, 20 December 2009 (UTC)
Plus the fact that the ruleset definition includes a comment syntax (which I didn't write, just tidied up) is highly indicative that it was originally intended to be file-hosted. (I'm not sorry to lose the requirement for input and output data though.) –Donal Fellows 09:37, 20 December 2009 (UTC)
Hi Donal, on some operating systems, like flavours of Unix, stipulating the data source to be a file makes it very flexible, as most things are made to look like a file by the OS: FTP, HTTP, pipes, ... if your program accepts files, then all is well. This may not be the case on other operating systems. I think it is OK to treat the source of the data as a separate concern that is not part of the task. --Paddy3118 23:12, 20 December 2009 (UTC)
The old Unix way would be to have a couple of flags: -f rule_file and, optionally, -o output_file. The input would be any additional files specified on the command line or, if none were specified, stdin. The output would go to stdout unless the -o option was specified.
examples:
process1 | markov -f myRules | process2
markov -f myRules -o myOut.txt myInput.txt myInput2.txt myInput3.txt
markov -f myRules < myInput.txt > myOutput.txt
--Rldrenth 00:25, 21 December 2009 (UTC)
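For illustration, here is a self-contained Python sketch of a filter along those lines (hypothetical code following the flag convention described above, not one of the task's entries; it inlines compact versions of the lexer and the strict evaluator sketched earlier on this page):
<lang python>#!/usr/bin/env python3
# Hypothetical 'markov' filter: -f names the rule file, -o optionally names an
# output file, remaining arguments are input files (default: stdin), and
# output goes to stdout unless -o is given.
import argparse
import re
import sys

def parse_rules(text):
    """Skip comments/blanks, split each rule on the first whitespace-delimited
    '->', and flag terminating rules (replacement prefixed with '.')."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.startswith('#'):
            continue
        pattern, replacement = re.split(r'\s+->\s+', line, maxsplit=1)
        terminating = replacement.startswith('.')
        rules.append((pattern, replacement[1:] if terminating else replacement, terminating))
    return rules

def run_markov(rules, text):
    """Substitute only the leftmost match of the first matching rule, restart
    from rule 0 after each substitution, and stop on a terminating rule or
    when the text is stable."""
    while True:
        for pattern, replacement, terminating in rules:
            index = text.find(pattern)
            if index != -1:
                text = text[:index] + replacement + text[index + len(pattern):]
                if terminating:
                    return text
                break
        else:
            return text

def main():
    ap = argparse.ArgumentParser(prog="markov")
    ap.add_argument("-f", dest="rules", required=True, help="ruleset file")
    ap.add_argument("-o", dest="output", help="output file (default: stdout)")
    ap.add_argument("inputs", nargs="*", help="input files (default: stdin)")
    args = ap.parse_args()

    with open(args.rules) as f:
        rules = parse_rules(f.read())

    if args.inputs:
        text = "".join(open(name).read() for name in args.inputs)
    else:
        text = sys.stdin.read()

    result = "\n".join(run_markov(rules, line) for line in text.splitlines())

    out = open(args.output, "w") if args.output else sys.stdout
    out.write(result + "\n")
    if out is not sys.stdout:
        out.close()

if __name__ == "__main__":
    main()</lang>
With that, the three example invocations above work as written (each input line is treated as a separate string to rewrite).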