Talk:Execute a Markov algorithm

From Rosetta Code

Rules re-write needed

The example rules should be in the format of the WP article, as they are hard to follow as given. (Is their any reason not to use one of the rule sets worked out from the WP article)? --Paddy3118 07:55, 15 December 2009 (UTC)

I put it in Extended BNF notation which is more precise than the format in the WP article, and it is a standard for representing grammars. There's only two main differences: the WP version has quotes, which aren't really necessary; and this one allows for comments using #. --Rob.s.brit 16:21, 15 December 2009 (UTC)

J

I just slapped together the J solution in 20 seconds, so it doesn't fully implement a Markov evaluator. In particular, it doesn't handle terminating rules; there may be other small wrinkles, but that depends on how one interprets the (underspecified) BNF grammar. All would be trivial to fix, but I don't have time now. Anyway, here are the gaps I think of:

  1. Terminating rules are not handled. The fix is easy but would involve modification or duplication of a standard function, stringreplace.
  2. Similarly, according to WP,

    if [a rule is matched], replace the leftmost matching text in the input string with its replacement in the first applicable rule.

    But stringreplace will replace all instances of matches at once. Again, a trivial fix as long as we're willing to reprise stringreplace.
  3. The standard function rxcut will cut on all \s+->\s+ though the grammar calls for ignoring any of these after the first (so the replacement string may carry a ->). Fix is trivial, where we'd probably use rxEinstead. Silly, we can just use rxmatch rather than rxmatches. --DanBron 17:47, 15 December 2009 (UTC)
  4. Note also that leading blanks in the rule and trailing blanks in the replacement are removed -- the grammar doesn't call for this. Very trivial fix.

I just realized it took me longer to write up the description of these gaps than it would've taken me to fix them. :P --DanBron 16:27, 15 December 2009 (UTC)

explicit vs tacit

Returning to this just now, I realized from the WP description (and its talk page) that the Markov Algorithm has a very strict order of evaluation -- much stricter than I originally considered. In particular, the entire ruleset must be applied after every successful match.

That means, for example, if rule 0 matches, then make one and only one substitution (to the leftmost match) and reapply rule 0 until it no longer matches. Then apply rule 1. If that matches, then start all over and apply rule 0. And so on. At any point, if a terminating rule matches, make the single substitution and quit.

I haven't read the other implementations yet, but at least for the J it means we must take a different approach. In particular, we cannot use stringreplace because it replaces all matches (for even a single rule) simultaneously. We must take a much more iterative approach.

So I suggest we use an explicit while loop, as distasteful as it is (I agree with "Physis" on WP who thinks the Markov Algorithm specification is ill considered, and who probably came to that conclusion the same way -- by having to bend over backwards in an implementation to accommodate its spec -- fighting the natural expression).

Maybe something along these lines: <lang j>markovLexer =: verb define rules =. LF cut TAB&=`(,:&' ')}y rules =. a: -.~ (dltb@:{.~ i:&'#')&.> rules rules =. 0 _1 {"1 '\s+->\s+' (rxmatch rxcut ])S:0 rules

NB. Determine which rules are terminating (,. ] (}.&.>~ ,. ]) ('.'={.)&.>)/ |: rules )


replace =: dyad define 'index patternLength replacement'=. x 'head tail' =. index split y

head, replacement, patternLength }. tail )

matches =: E. i. 1:

markovStrict =: dyad define rules =. markovLexer x

ruleIdx =. 0

while. ruleIdx < #rules do. 'pattern replacement terminating' =. ruleIdx { rules

if. (#y) > index =. pattern matches y do.

y =. (index ; (#pattern) ; replacement) replace y

if. terminating do. NB. If terminating rule, just return current string ruleIdx =. #rules else. NB. Else reevaluate from the beginning after every match ruleIdx =. 0 end. else. ruleIdx =. 1 + ruleIdx end. end.

y )</lang>

Of course, this can be translated fairly mechanically to tacit code:<lang j>markovStrictT =: [: finalize markovLexer@:[ evalMarkov^:( noMoreRules )^:_ initialize

 initialize       =.  0 ; ]                            NB.  First rule is 0 obviously
 finalize         =.  >@:{:                            NB.  Don't need ruleNo when finished
 noMoreRules      =.  >@:{.@:] < #@:[                  NB.  Done when rule index = # of rules
 evalMarkov       =.  ruleNo  nextRule`applyRule@.ruleMatched checkRule
   nextRule       =.  >:@:[ ; {.@:]
   ruleMatched    =.  [: (>~ #)~&>/ 2 {. ]
   applyRule      =.  (terminated ; replaceArg replace >@:{.)@:]
     terminated   =.  _ * >@:{:@:]                     NB.  (_ * terminated) = _ when terminated, 0 otherwise
     replaceArg   =.  1&{ , #&.>@:(2&{) , 3&{          NB.  index ; (#pattern) ; replacement
   checkRule      =.  rule match txt

txt =. 1 {:: ]

     rule         =.  {~ >@:{.
     match        =.  ] ; (>@:{.@:[ index ]) ; [       NB.  txt ; index ; pattern ; replacement ; terminating
       index      =.  E. i. 1:
     ruleNo       =.  0 {:: ]</lang>

But I do not think that the tacit can claim any advantage over the explicit, in this case. Unless someone can come up with a clearer way of expressing the algorithm tacitly (maying using a different approach).

Comments?

--DanBron 00:18, 16 December 2009 (UTC)


http://rosettacode.org/wiki/Markov_Algorithm#J