Talk:Best shuffle

This was a fun algorithm to write --- lots of twists and turns. Gerard Schildberger

Task Description

I think it would improve the task description if we said something along the lines of "Shuffle the characters of a string to produce a string as different as possible from the original. That is, the maximum # of characters in the output differ from the characters at the corresponding positions at the input". Only more concise :). I say that because the current wording lends itself to a trivial solution - reversing the string (because it doesn't make clear that a "character" is different only if its value, not its index, is different). In fact, that's how I initially interpreted the task (character=index), until I saw the solutions were a lot more complex than reverse().

--DanBron 18:28, 15 December 2010 (UTC)
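To make the point about reversal concrete, here is a minimal sketch in Python. It is not taken from any entry on the task page; the name `score` is made up for illustration, and it assumes the score counts positions whose character value is unchanged. Reversing "abracadabra" moves almost every index, yet five positions still hold the same value.

```python
def score(original, shuffled):
    """Count positions whose character VALUE is the same in both strings."""
    return sum(a == b for a, b in zip(original, shuffled))


word = "abracadabra"
reversed_word = word[::-1]  # every index moves except the middle one

# Five positions still hold an 'a', so reverse() is not a best shuffle here.
print(word, reversed_word, score(word, reversed_word))
```

So a reversal only solves the task under the "character = index" reading.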

Well, it says "characters" not "indexes". But feel free to change the wording as you see fit. Fwend 20:16, 15 December 2010 (UTC)

I recently compiled and ran the D codes that had replaced my own version and noticed that they produced the same result every time. This isn't what I had in mind when I used the word "shuffle". I'm assuming that the other "deterministic" codes have the same limitation. It isn't a problem as far as I'm concerned. It's still interesting and educational to see this approach. Nevertheless I'm putting back my own code because it produces the result that I was looking for. Fwend 09:39, 15 May 2011 (UTC)

If I read the task description correctly, the task is actually looking for "an organization as distant from the original organization as possible". The result is probably provable to that end. If the non-deterministic code doesn't result in an optimum solution, it's probably incorrect. I may be wrong, though. --Michael Mol 10:49, 15 May 2011 (UTC)
I'm going to play devil's advocate on the words here. The task says "as many of the character values are in a different position as possible", which seems clear to me: it means as few coincidences of characters by position as possible. "An organization as distant from the original organization as possible", on the other hand, could be taken as an arrangement where the characters are also far away from their original positions. I don't see any of the solutions addressing that metric. I think either approach could be deterministic. I do admit that "shuffle" implies some randomness. (The Icon solution has a commented-out line for this since it wasn't clear at the time.) Some of the other solutions may have trouble adapting (not sure). If it's late in the game to clarify, could randomness be an extra-points type of requirement? --Dgamey 13:53, 15 May 2011 (UTC)
I think that's a good idea. Having more than one approach is more interesting anyway. English is not my native language. Maybe somebody could adjust the task description. I don't want to risk creating more confusion :) Fwend 14:24, 15 May 2011 (UTC)
Okay, how about something like this: "Shuffle the characters of a string in such a way that as many of the character values are in a different position as possible. Print the result as follows: original string, shuffled string, (score). \n\nWhere the score is the number of characters that remain in their original position. The better the shuffle, the smaller the score. \n\nAs there are often multiple solutions that achieve the lowest score, for extra points include or show how to achieve randomization amongst these lowest scoring results." --Dgamey 03:12, 16 May 2011 (UTC)
How about: "Shuffle the characters of a string in such a way that as many of the character values are in a different position as possible. Print the result as follows: original string, shuffled string, (score). \n\n. The score gives the number of positions whose character value did not change. A shuffle that produces a randomized result is to be preferred. A deterministic approach that produces the same sequence every time is acceptable as an alternative." Fwend 20:13, 16 May 2011 (UTC)
Very clear. Go for it! --Dgamey 20:27, 16 May 2011 (UTC)
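As a rough illustration of the wording agreed on above, the following Python sketch picks at random among all lowest-scoring arrangements and prints them in the "original, shuffled, (score)" format. It is a brute-force sketch, not one of the posted solutions: the names `score` and `random_best_shuffle` are hypothetical, and trying every permutation is only practical for short words.

```python
import random
from itertools import permutations


def score(original, shuffled):
    """Number of positions whose character value did not change."""
    return sum(a == b for a, b in zip(original, shuffled))


def random_best_shuffle(word, rng=random):
    """Brute force: enumerate all arrangements, keep the lowest-scoring ones,
    and choose one of them at random. Cost grows as n!, so short words only."""
    candidates = {"".join(p) for p in permutations(word)}
    best = min(score(word, c) for c in candidates)
    winners = sorted(c for c in candidates if score(word, c) == best)
    return rng.choice(winners), best


for word in ["tree", "elk", "up", "a"]:
    shuffled, s = random_best_shuffle(word)
    print(f"{word}, {shuffled}, ({s})")
```

A deterministic entry would correspond to always returning the same member of `winners`.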

J implementation notes

<lang j>bestShuf2 =: verb define
  equivs=. (\:#&>)@:(<@I.@=) y
  y C.~ (;equivs) </.~ (i.#y) |~ #>{. equivs
)</lang>

This mechanism has two steps.

First, we group indices for multiple instances of the same character, with indices for the most frequently occurring character appearing first. In other words:

<lang j>   (\:#&>)@:(<@I.@=) 'abracadabra'
┌──────────┬───┬───┬─┬─┐
│0 3 5 7 10│1 8│2 9│4│6│
└──────────┴───┴───┴─┴─┘</lang>

Here, 'a' is the most frequently occurring character and it appears at character indices 0, 3, 5, 7 and 10.

Next we find the number of occurrences of the most frequently occurring character and we group these rearranged indices into that many distinct groups. In other words if equivs=: (\:#&>)@:(<@I.@=) y=:'abracadabra', we need 5 distinct groups of indices to separate the five instances of the letter 'a'. We do this by taking the grouped character indices from before, ignoring the grouping and counting to 5, repeatedly (the first index goes in the first group, the second goes in the second group, and so on):

<lang j>   (i.#y) |~ #>{. equivs
0 1 2 3 4 0 1 2 3 4 0
   (;equivs) </.~ (i.#y) |~ #>{. equivs
┌─────┬───┬───┬───┬────┐
│0 1 6│3 8│5 2│7 9│10 4│
└─────┴───┴───┴───┴────┘</lang>

These new groupings represent cycles and are used to permute the original sequence of characters.

Maybe it's late or I'm just missing something (and my J just sucks), but if I read this right this should lead to the permutation "baadcbaraar" with position 4 unchanged for a score of 1. It certainly isn't obvious how the posted solution arrives at "bdabararaac". --Dgamey 02:00, 18 April 2011 (UTC)
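For readers who do not speak J, the two steps can be sketched in Python. This is an illustrative translation, not the posted J code: the function name `best_shuffle_cycles` is made up, and it assumes that each position in a cycle receives the character from the next position in that cycle (wrapping around). That reading is the one that reproduces the "bdabararaac" quoted above.

```python
from collections import defaultdict


def best_shuffle_cycles(s):
    """Group indices by character, deal them round-robin into k groups
    (k = count of the most frequent character), and rotate each group."""
    if not s:
        return s
    positions = defaultdict(list)  # keeps first-appearance order of characters
    for i, ch in enumerate(s):
        positions[ch].append(i)
    # Most frequent character first; sorted() is stable, so ties keep their order.
    groups = sorted(positions.values(), key=len, reverse=True)
    k = len(groups[0])
    flat = [i for g in groups for i in g]
    # Counting 0..k-1 repeatedly over the flattened indices gives the cycles.
    cycles = [flat[start::k] for start in range(k)]
    out = list(s)
    for cyc in cycles:
        # Assumed convention: position cyc[i] takes the character at cyc[i+1].
        for pos, src in zip(cyc, cyc[1:] + cyc[:1]):
            out[pos] = s[src]
    return "".join(out)


print(best_shuffle_cycles("abracadabra"))
```

For "abracadabra" the cycles come out as [0, 1, 6], [3, 8], [5, 2], [7, 9] and [10, 4], matching the boxes shown above.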

In degenerate cases, where more than half of the characters are the same, some of these cycles will only involve one character, which can not move to another position.
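The degenerate case can be quantified with a simple counting argument, offered here as a sketch rather than something stated by the J entry. If the most frequent character occurs m times in a string of length n, only n - m positions originally hold a different character, so at least 2m - n copies must land on a position that already held that character. The helper name `minimum_score` is hypothetical.

```python
from collections import Counter


def minimum_score(word):
    """Lower bound on the score: max(0, 2*m - n), where m is the count of
    the most frequent character and n is the length of the word."""
    if not word:
        return 0
    most_common = max(Counter(word).values())
    return max(0, 2 * most_common - len(word))


for word in ["abracadabra", "seesaw", "grrrrrr", "up", "a"]:
    print(word, minimum_score(word))
```

For "grrrrrr" this gives 5, which is the score the single-character cycles produce.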

Note that we can accomplish this without sorting. The important issue is that each cycle involving the same character be distinct from other cycles.

<lang j>bestShuf3 =: verb define
  y C.~ (;eqs) </.~ (i.#y) |~ {:$> eqs=. <@I.= y
)</lang>

Or, for fans of zero-point code: bestShuf4=: (C.~ ; </.~ {:@$@:> | i.@#@;) <@I.@=

Javascript implementation

abracadabra, bdabararaca, (1)

Thank you for implementing a Javascript version. I'm afraid it needs tweaking, however. The word "abracadabra" can be shuffled without any character in the original position. Fwend 19:01, 17 December 2010 (UTC)

Oops, thank you, I should have paid attention to the results I was getting. --Rdm 19:15, 17 December 2010 (UTC)


C entry

"... every character is considered unique in a word and that's why grrrrr also has a score of 0."

The score is supposed to give the number of positions that have the same character value as before. It's not very important, but makes it easier to quickly see if you've achieved the optimal result.

Also the code doesn't shuffle the characters. Perhaps you should imagine the routine being used in a word game or puzzle. The results that this code produces, although clever in the way it solves the problem, wouldn't be very satisfactory. Fwend 18:53, 6 January 2011 (UTC)

The problem with the C entry is that it does not work in the general case. Consider, for example:

<lang>Enter String : aabbbbaa

aabbbbaa, aabbbbaa, (0)</lang>

(The correct answer, here, would be bbaaaabb). --Rdm 20:24, 6 January 2011 (UTC)
Never mind, I fixed it. --Rdm 17:31, 7 January 2011 (UTC)
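A small checker makes this kind of problem easy to spot. The Python sketch below is hypothetical (the name `check_entry` and its return format are made up for illustration): it verifies that the output is a rearrangement of the input and recomputes the score instead of trusting the printed one.

```python
from collections import Counter


def check_entry(original, shuffled, claimed_score):
    """Return a list of problems found in one 'original, shuffled, (score)' line."""
    problems = []
    if Counter(original) != Counter(shuffled):
        problems.append("not a rearrangement of the original")
    actual = sum(a == b for a, b in zip(original, shuffled))
    if actual != claimed_score:
        problems.append(f"claimed score {claimed_score}, actual {actual}")
    return problems


print(check_entry("aabbbbaa", "aabbbbaa", 0))  # the output quoted above
print(check_entry("aabbbbaa", "bbaaaabb", 0))  # the suggested correct answer
```

The first call reports a score mismatch (all 8 positions are unchanged); the second returns an empty list.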

On the Python "Swap if it is locally better algorithm"

I had visited this problem around two weeks before and wrote a solution based on testing all permutations for the one with the best shuffle, but I could not wait for an answer when it was given 'abracadabra' and decided to 'sit on it'.

Today someone made an addition and I glanced through the AWK entry which mentioned swapping. That got me thinking of trying swapping characters and the first result of note that I came up with was:
<lang python>>>> def bs(w):
    w2 = list(w)
    rangew = range(len(w))
    for i in rangew:
        for j in rangew:
            if i != j and w[i] != w[j]:
                w2[j], w2[i] = w2[i], w2[j]
    w2 = ''.join(w2)
    return w2, count(w, w2)

>>> bs('elk')
('kel', 0)
>>> for w in 'tree abracadabra seesaw elk grrrrrr up a'.split():
    print(w, bs(w))


tree ('reet', 1)
abracadabra ('aacrdrbaaab', 2)
seesaw ('seesaw', 6)
elk ('kel', 0)
grrrrrr ('rrgrrrr', 5)
up ('up', 2)
a ('a', 1)</lang>

I then thought that for better results I needed to go through the swapping twice, then multiple times, but that might loop forever, so I added n, the maximum number of times through the for loops. The condition on when to swap needed refinement too (I never did check whether this refinement would work without the outer while loop):
<lang python>>>> def bs(w):
    w2 = list(w)
    w2old = []
    n = len(w)
    rangew = range(n)
    while w2old != w2 and n:
        n -= 1
        w2old = w2[:]
        for i in rangew:
            for j in rangew:
                if i != j and w2[j] != w2[i] and w[i] != w2[j] and w[j] != w2[i]:
                    w2[j], w2[i] = w2[i], w2[j]
    w2 = ''.join(w2)
    return w2, count(w, w2)
</lang>

At this point I shoved this code in a file from the idle shell that I had been developing on, so intermediate results are harder to recreate.

The main changes developed in the file were to further refine the inner-most if statement determining when to swap. I then decided to add some randomness by the simple means of shuffling the range of integers that variables i and j iterate over.

Since the code was no longer based on testing all permutations, I gave it a word guaranteed to stress those types of solutions: antidisestablishmentarianism ;-) then just made sure it worked with the variety of words shown on the page, and tidied my output formatting. --Paddy3118 08:38, 22 May 2011 (UTC)
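The randomization step described above might look roughly like the following self-contained sketch. It is not the final entry on the task page: `bs_random` is a made-up name, `count` is assumed to return the number of positions where the two words coincide (which is consistent with the scores in the sessions above), and the swap condition is the one from the second interpreter session rather than the further refined one.

```python
import random


def count(w1, w2):
    """Assumed helper: number of positions where the two words coincide."""
    return sum(c1 == c2 for c1, c2 in zip(w1, w2))


def bs_random(w, rng=random):
    """Swap-if-locally-better, visiting i and j in shuffled order."""
    w2 = list(w)
    n = len(w)
    order_i = list(range(n))
    order_j = list(range(n))
    w2old = []
    while w2old != w2 and n:  # at most len(w) passes, or stop when stable
        n -= 1
        w2old = w2[:]
        rng.shuffle(order_i)  # randomness comes only from the visiting order
        rng.shuffle(order_j)
        for i in order_i:
            for j in order_j:
                # Swap only if both positions end up differing from the original.
                if (i != j and w2[j] != w2[i]
                        and w[i] != w2[j] and w[j] != w2[i]):
                    w2[j], w2[i] = w2[i], w2[j]
    w2 = "".join(w2)
    return w2, count(w, w2)


for w in "tree abracadabra seesaw elk grrrrrr up a".split():
    print(w, bs_random(w))
```

Because a swap is only made when both swapped positions then differ from the original, the score never gets worse from one swap to the next; whether every run reaches the optimum is not checked here.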