Talk:Multisplit: Difference between revisions

Discard verbose description -- no one found it useful, and implementation has been replaced
(Discard verbose description -- no one found it useful, and implementation has been replaced)
 
(16 intermediate revisions by 7 users not shown)
Line 23:
 
:I do not know if it needs four tasks, but the API needed for an implementation of both the task and the extra credit could be annoyingly complicated to implement, for some languages and/or implementers. But I suppose the extra credit part currently does have enough ambiguity to spawn several tasks? --[[User:Rdm|Rdm]] 20:30, 26 April 2011 (UTC)
 
::Actually, I was referring only to the ambiguity in the interpretation of the non-extra credit part. <tt>:)</tt> The degrees of freedom (some since patched in the description) seem to have included 1) whether patterns are to be applied left-to-right in parallel vs each pattern does a split and then each substring is split on subsequent patterns, 2) whether the patterns are meant to be re-used vs the patterns are used once (or maybe even cyclically), and 3) whether the list should be assumed to be ordered vs the function should order them to ensure longest-token matching. Most of this dithering would have been avoided if the task description had been written unambiguously in the first place, but seeing the ambiguities of one's own writing is difficult for most folks... --[[User:TimToady|TimToady]] 23:20, 26 April 2011 (UTC)
 
==Correct result==
Line 39 ⟶ 41:
 
Vincent> Program does exactly what you describe, except your mistake: spearators doesn't reused once they are finished, so for “a!===b=!=c” ("!=", "==", "=") it produces “a <!=> <==> b <=> !=c” - note that '!=' separator doesn't used AGAIN.
 
=== String modified ===
I noticed another present from the argument earlier. "==d was added by anonymous and not caught making every answer wrong. I will reset it. --[[User:Dgamey|Dgamey]] 01:19, 27 April 2011 (UTC)
 
===Clarification in order===
Line 91 ⟶ 96:
This is a reason to tighten the task description, as I think the task description relies too much on the original Python implementation at the moment. --[[User:Paddy3118|Paddy3118]] 08:16, 28 February 2011 (UTC)
:I'm not tempted in a formulation of Rosetta tasks, therefore feel free to rewrite the task as you see fit.--[[User:DSblizzard|DSblizzard]] 09:33, 28 February 2011 (UTC)
 
== J implementation ==
 
===implementation===
 
The J implementation currently looks like this:
 
<lang j>multisplit=:4 :0
'sep begin'=.|:t=. y /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) x
end=. begin + sep { #@>y
last=.next=.0
r=.2 0$0
while.next<#begin do.
r=.r,.(last}.x{.~next{begin);next{t
last=.next{end
next=.1 i.~(begin>next{begin)*.begin>:last
end.
r=.r,.'';~last}.x
)</lang>
 
The first line uses a fair bit of point free code (because it was convenient, and easy for me) and many Rosetta readers are likely to have problems reading it. And task currently has people puzzled, so perhaps a fuller discussion would also be helpful.
 
===sep begin===
 
The first line of this multisplit defines two variables: <code>sep</code> and <code>begin</code>
 
<lang j> y=. '==';'!=';'='
x=. 'a!===b=!=c'
'sep begin'=.|:t=. y /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) x
sep
1 0 2 0 2 2 2 1 2
begin
1 2 2 3 3 4 6 7 8</lang>
 
(Note: its normally bad practice to use variables named 'x' and 'y' because they get values bound to them in functions. However, for purposes of illustration I think it's fine to pull code out of a function and use that with explicitly assigned values.)
 
Here, <code>sep</code> is a list of 0, 1 or 2, with 0 corresponding to '==', 1 corresponding to '!=' and 2 corresponding to '='. And, <code>begin</code> is a list of corresponding character positions. The values in <code>begin</code> are in non-decreasing order, and the values in <code>sep</code> are ascending when they correspond to equal values in <code>begin</code>.
 
So, how does that work?
 
<code>E.</code> finds locations where substrings match:
 
<lang j> '==' E. 'a!===b=!=c'
0 0 1 1 0 0 0 0 0 0</lang>
 
In other words, for each character in the right argument, we have a 1 if the substring in the left argument begins there.
 
Or, we can get indices instead of a bit mask:
 
<lang j> '==' I.@E. 'a!===b=!=c'
2 3</lang>
 
Except we need to deal with multiple separators, and they can be differing lengths which means that representing them as a homogeneous array would be bad (every item in a homogeneous array must have the same length). But we can put them in boxes, and have an array of boxes:
 
<lang j> '==';'!=';'='
┌──┬──┬─┐
│==│!=│=│
└──┴──┴─┘</lang>
 
But now we need to tell I.@E. that it is not supposed to work on the boxes directly, instead it needs to work inside the boxes.
 
<lang j> ('==';'!=';'=') I.@E.L:0 'a!===b=!=c'
┌───┬───┬─────────┐
│2 3│1 7│2 3 4 6 8│
└───┴───┴─────────┘</lang>
 
Ok, so great, but now we need to combine the results from those different boxes so we can work with them. Except, before that, right now the boxes distinguish which separator we are using, and we are going to need to add that, explicitly, so we do not lose track of that information after the boxes are gone. In other words, we have three boxes so we need three different values to label their contents with:
 
<lang j> i.@# '==';'!=';'='
0 1 2
('==';'!=';'=') i.@#@[ 'a!===b=!=c'
0 1 2</lang>
 
So, let's add the labels to each of the values in each of the boxes:
 
<lang j> 0 1 2 (,.L:0"0) 2 3;1 7;2 3 4 6 8
┌───┬───┬───┐
│0 2│1 1│2 2│
│0 3│1 7│2 3│
│ │ │2 4│
│ │ │2 6│
│ │ │2 8│
└───┴───┴───┘
('==';'!=';'=') (i.@#@[ ,.L:0"0 I.@E.L:0) 'a!===b=!=c'
┌───┬───┬───┐
│0 2│1 1│2 2│
│0 3│1 7│2 3│
│ │ │2 4│
│ │ │2 6│
│ │ │2 8│
└───┴───┴───┘</lang>
 
(In the first example, we needed parenthesis to tell the computer that 0 2 3 was not meant to be combined. In the second example, we need parenthesis to tell the computer that we want the result of the left and right verbs to be the arguments for the verb in the middle.)
 
So, now we are ready to combine them:
 
<lang j> ;('==';'!=';'=') (i.@#@[ ,.L:0"0 I.@E.L:0) 'a!===b=!=c'
0 2
0 3
1 1
1 7
2 2
2 3
2 4
2 6
2 8</lang>
 
and sort them:
 
<lang j> ('==';'!=';'=') /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) 'a!===b=!=c'
1 1
0 2
2 2
0 3
2 3
2 4
2 6
1 7
2 8</lang>
 
This suggests a simplification -- The &.:(."1) means we are reversing each row before we sort and then reversing them back after each sort. But this busy work could be eliminated by reversing the order of the columns. I might change the code (and this writeup), but I am not going to do that now -- and someone else can (and delete this paragraph) if they get to that before I do.
 
Finally, to assign this array to two different values, we need an array of two items, so we transpose the array.
 
<lang j> |:('==';'!=';'=') /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) 'a!===b=!=c'
1 0 2 0 2 2 2 1 2
1 2 2 3 3 4 6 7 8</lang>
 
But before transposing it we save a copy in the variable <code>t</code> because we need that for the "extra credit" part of this task.
 
===other pre-loop setup===
 
<lang j> end=. begin + sep { #@>y
last=.next=.0
r=.2 0$0</lang>
 
<code>end</code> has the same structure as <code>begin</code> and <code>sep</code>, and gives the character position where each separator ends. It's just the beginning character location plus the length of each corresponding separator.
 
<code>next</code> will be our loop variable -- it's the index of the next value from <code>begin</code>/<code>sep</code>/<code>t</code>/<code>end</code> to be used.
 
<code>last</code> will be our "the last separator ended here" value, which we will be using to search for the next value for <code>next</code>
 
Finally, <code>r</code> will hold our result. It's arranged horizontally rather than vertically, because that uses less screen real estate on rosettacode. And, fortuitously, that makes it easier to select out just the string values (the useful part) from the "extra credit" result.</code>
 
===while. loop===
 
<code>next</code> is an index into any of <code>begin</code>/<code>sep</code>/<code>t</code>/<code>end</code> and when it goes off the end, we are done.
 
<lang j>r=.r,.(last}.x{.~next{begin);next{t</lang>
 
This has three important parts:
 
# <code>r=.r,.</code>''value'' (the result builder).
# <code>last}.x{.~next{begin</code> (the substring extractor), and
# <code>next{t</code> (the extra credit assignment)
 
The first time through the loop, <code>last</code> and <next> will be 0, and the first value from begin is 1, so <code>last}.x{.~next{begin</code> is equivalent to <code>0}.'a!===b=!=c'{.~1</code> or: take the first 1 character from that string and drop the first 0 characters from it. That's the substring containing the single character 'a'. Likewise, <code>0{t</code> will be the value <code>1 1</code>.
 
Next, we save the value where this delimiter instance ends. In other words <code>last</code> takes on the value 3.
 
Finally, we use that ending location to find our next delimiter instance. <code>begin>:last</code>, the first time through the loop, is equivalent to
 
<lang j> 1 2 2 3 3 4 6 7 8 >: 3
0 0 0 1 1 1 1 1 1</lang>
 
And, that would be enough, but I did not want an infinite loop if someone happened to use an empty delimiter, so I added an explicit test for that. It's just a safety measure, and does not actually matter for the required test case. But next{begin is still 1, so:
 
<lang j> 1 2 2 3 3 4 6 7 8 > 1
0 1 1 1 1 1 1 1 1</lang>
 
Anyways, I find the index of the first value where these two bitmasks are both set. (<code>*.</code> is J's boolean and operator.)
<lang j> 0 1 1 1 1 1 1 1 1 *. 0 0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1
(i.&1)0 1 1 1 1 1 1 1 1 *. 0 0 0 1 1 1 1 1 1
3</lang>
 
The next time through the loop, <code>next</code> will be 3. And, when we are done, one of these bitmasks will be all zeros, so next wil l be larger than the the bitmask length.
 
===ending===
 
Finally, we have one last bit of code to execute:
<lang j>r=.r,.'';~last}.x</lang>
 
This gives us the "tail end" -- everything appearing after the last valid separator (or the entire argument string if none were valid). Since there is no separator terminating this instance, I use an empty value to represent its details.
 
== F# incorrect ==
Line 301 ⟶ 122:
whatever remains of inputString after the last separator gets included in the result
</lang> --[[User:Rdm|Rdm]] 19:57, 21 April 2011 (UTC)
 
I do not understand the expected results statement in the task description. The task description says "The function should take an input string and an ordered collection of separator strings, and split the string into pieces representing the various substrings."
 
Given an input string a!===b=!=c, and three separators ==,!= and =, I would expect a result as follows:
 
a! =b= c
 
With the operators matching as follows:
 
a! (==) =b= (!=) c
Note that the third separator is not matched.
 
--[[User:Markhobley|Markhobley]] 21:43, 9 June 2011 (UTC)
 
: There has been a lot of confusion over this task. A couple of people took the position that the delimiters were to be matched and not reused. Others took the position that the delimiter order represented priority in matching and they could be reused. This later is the consensus opinion. Having said that at each position left to right see if a delimiter matches. If not advance and try again. Hence: "a" is not a delimiter, "!=" is, now at "==b", "==" (and not "=") is the delimiter. Does that help? --[[User:Dgamey|Dgamey]] 22:46, 9 June 2011 (UTC)
 
With so many caveats should the F# be just dumped here on the talk page and maybe resurrected as an alternative solution once we have a more compliant F# solution? --[[User:Paddy3118|Paddy3118]] 14:47, 21 July 2011 (UTC)
 
==No longer draft==
The updated wording looks good. I vote to promote. --[[User:Dgamey|Dgamey]] 00:50, 6 July 2011 (UTC)
:How about waiting for one more good implementation? I have just read the talk page again which makes me want to be sure. --[[User:Paddy3118|Paddy3118]] 06:00, 6 July 2011 (UTC)
:: The task description is much better and the implementations are now pretty consistent. But ok. --[[User:Dgamey|Dgamey]] 14:13, 6 July 2011 (UTC)
 
== Desired output ==
 
So what is the desired output of the program? Some solutions on this page only print the fields between the delimiters and don't print anything about the delimiters. Some solutions print a list of alternating fields and delimiters. Others alternate the fields with a pair indicating the type and location of the delimiter. It seems fairly inconsistent. --[[Special:Contributions/208.80.119.68|208.80.119.68]] 23:16, 29 December 2011 (UTC)
: (sigh) This task has had a sordid history and was the subject of some vandalism relating to the intended way to parse it. It appears that the current description could have had a better reviewed after that all died down. The description starting "For these inputs the string should be parsed ..." talks about the expected parsing and output. The output description was added later. As it stands an output of "a b c" works, but this was a late addition to the description. Based on the discussions I would have expected the string to (i) include the substrings and separators so something like "a != == b = != c" works , or (ii) to show the substrings including the null substrings so that something like "a,,b,,c" works. Now where does that leave examples that output "a b c"? My preference would have been for (i) as the separators are at least as interesting as the substrings. However, I'm not sure if reopening this is useful. The main point is does the output clearly show the input was parsed correctly. Keep in mind that any change to the task description that invalidates existing solutions would need to be marked with a template that indicates the task description changed and the example needs improvement. (I forget the template name for this). --[[User:Dgamey|Dgamey]] 04:20, 30 December 2011 (UTC)
6,951

edits