Talk:Best shuffle: Difference between revisions

Content added Content deleted

Inline

Revision as of 19:54, 15 December 2010

This was a fun algorithm to write --- lots of twists and turns. Gerard Schildberger

Task Description

I think it would improve the task description if we said something along the lines of "Shuffle the characters of a string to produce a string as different as possible from the original. That is, the maximum # of characters in the output differ from the characters at the corresponding positions at the input". Only more concise :). I say that because the current wording lends itself to a trivial solution - reversing the string (because it doesn't make clear that a "character" is different only if its value, not its index, is different). In fact, that's how I initially interpreted the task (character=index), until I saw the solutions were a lot more complex than reverse().

--DanBron 18:28, 15 December 2010 (UTC)

@@ Line 1: / Line 1: @@
 This was a fun algorithm to write --- lots of twists and turns. [[User:Gerard Schildberger|Gerard Schildberger]]
-== J solution ==
+== Task Description ==
+I think it would improve the task description if we said something along the lines of "Shuffle the characters of a string to produce a string as different as possible from the original.  That is, the maximum # of characters in the output differ from the characters at the corresponding positions at the input".  Only more concise :).  I say that because the current wording lends itself to a trivial solution - reversing the string (because it doesn't make clear that a "character" is different only if its value, not its index, is different).  In fact, that's how I initially interpreted the task (character=index), until I saw the solutions were a lot more complex than reverse().
-[http://rosettacode.org/mw/index.php?title=Best_shuffle&diff=97364&oldid=97362 Previously], the J solution was <lang j>bestShuf=:3 :0
-  target=. 0 >. ({. - +/@}.) \:~ #/.~y
-  n=._1 [ lim=.!#y
-  while.lim > n=.n+1 do.
-    r=.n A. y
-    if.target=+/r=y do.return.end.
-  end.
-)</lang>
-I just [http://rosettacode.org/mw/index.php?title=Best_shuffle&diff=97419&oldid=97384 changed this] to <lang j>bestShuf1 =: verb define
-	yy=.(\:#&>)@:(<@I.@=) y
-	z=.0 $ a:
-	while. 1 < #yy do.
-		 r  =. {.&>{.yy
-		 yy =. (}.&.>{.yy),}.yy
-		 q  =. {.&> {. }.yy
-		 yy =. ({.yy),(}.&.>{.}.yy),}.}.yy
-		 z  =. z,<r,q
-		 yy =. yy-.a:
-	end.
-	z=.  z , > {. 'b f'=.((2#a:)&, </.~ 1 0 , 1<#&>) yy
-	w=.  y C.~ z
-	w C.~ :: [ f ,&.> w (~:/ i. 1:)"1 0 y {~ f =. ;f
-)</lang> While the new version still could use a lot of cleanup, it is significantly faster than the previous, even on the test inputs provided with the task.
-I do agree that an <tt>A.</tt>-based or <tt>A.</tt>-family approach seems warranted, and that was my first instinct as well.  However, I couldn't find a way to directly calculate the anagram index, which perhaps defeats the purpose (<tt>A.</tt> et al. are best and clearest when you know what you're looking for in advance).  Given the nature of the task, and that the anagram index will depend on the character frequency and will likely not be unique, I'm not convinced there is a closed-form solution.  It seems Raul ran into the same issue, hence the O(n!) brute-force approach.
-The new approach is O(n), but it has its own issues.  In particular, I wonder if there's a better way to handle the "remainder" issue, so that the current last line may be omitted as superfluous and the final assignment to <tt>z</tt> could be cleaned up.
-Ideally, I'd also like to replace the explicit loop with a more natural, larger-scope array approach (rather than operating scalar-by-scalar).  That done, we could convert the whole thing to a simple tacit verb.  I don't think that would be hard, but I don't have time to do it now.
-Finally, and if possible, I'd prefer to use <tt>A.</tt> rather than <tt>C.</tt> (without applying the trivial identity <tt>C. -: (A.~ A.)~</tt>).
-Since I didn't comment the code, it may be useful to sketch the approach, for anyone else who'd like to clean it up:
-*  Generate the list of "equivalent indices" (i.e. a vector of boxes, each box containing a list of indices where all the characters are identical).  Obviously (#yy)=#~.y  .
-*  Sort this vector so that the boxes containing the longest lists come first. (Not sure this step is required, but it seemed sensible at the time).
-*  Loop over the index vector, moving indices to a permutation vector, until the index vector is empty or mostly empty (less than 2 boxes).
-*  In each iteration of the loop, work with the first 2 boxes of the index vector, and update the permutation vector.
-*  Remove the first index from the first box of the index vector.
-*  Remove the second index from the second box of the index vector.
-*  Pair these (put them in a box), and append to the permutation vector.
-The concept is that we know all the indices in each box of the index vector represent identical values, and that therefore indexes from different boxes of the index vector represent different values.  So we pair off indexes from different boxes, until there are (almost) no indexes left.
-Then we use <tt>C.</tt> to swap each of these pairs in the original input, producing the tentative output.  now we must handle the remainder - the unpaired indexes.  Since I haven't studied the potential characteristics of these remainders, and since at least in the test inputs the remainder is at most one index, I simply swap this remainder with the first character in the tentative output that's different from it, producing the final output.
-It's this last step, and the scalar nature of the explicit loop, and the explicit loop itself, that I would like to see addressed first.  One obvious improvement would be to pair off more than one index at a time.  Again, that wouldn't be hard, but I don't have time now.
 --[[User:DanBron|DanBron]] 18:28, 15 December 2010 (UTC)
-:  PS:  I think it would improve the task description if we said something along the lines of "Shuffle the characters of a string to produce a string as different as possible from the original.  That is, the maximum # of characters in the output differ from the characters at the corresponding positions at the input".  Only more concise :).  I say that because the current wording lends itself to a trivial solution - reversing the string (because it doesn't make clear that a "character" is different only if its value, not its index, is different).  In fact, that's how I initially interpreted the task (character=index), until I saw the solutions were a lot more complex than reverse().