Talk:Symmetric difference: Difference between revisions

From Rosetta Code

Revision as of 16:43, 10 February 2010

The symmetric difference should give one list which is the union of the two differences of the lists. The Perl example shows two lists. --Mwn3d 23:19, 2 December 2009 (UTC)

  • I agree with your statement about what “symmetric difference” means.
  • All of the current examples produce two sets.
  • The task as written tends to the two-lists interpretation.

Conclusion: Either the task should be renamed, or the task description should be clarified and the examples revised. --Kevin Reid 02:02, 3 December 2009 (UTC)

I think the task name is good. Symmetric difference is actually an exercise I did in CS classes a few times. I think the task description should be changed to match the task name. --Mwn3d 02:17, 3 December 2009 (UTC)
Description updated, Perl example fixed, J example marked. (I saw that the Python example already provided the symmetric difference.) --Michael Mol 04:23, 3 December 2009 (UTC)
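For illustration, the single-result definition agreed on above (one collection that is the union of the two differences) can be sketched in Python. This is only a minimal sketch and is not taken from any of the task's examples; the function name `symmetric_difference` and the sample data are assumptions made up for this note.

```python
def symmetric_difference(a, b):
    """Return the elements that are in exactly one of the two sets."""
    # Union of the two one-sided differences: (a - b) | (b - a).
    return (a - b) | (b - a)


# Hypothetical sample data, chosen only for this sketch.
set_a = {"John", "Bob", "Mary", "Serena"}
set_b = {"Jim", "Mary", "John", "Bob"}

result = symmetric_difference(set_a, set_b)

# Python's built-in ^ operator on sets computes the same thing.
assert result == set_a ^ set_b
```

The two-lists interpretation would instead return `a - b` and `b - a` separately; the sketch combines them into one result.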

Set type

I noticed that the Ruby example was using lists rather than a set datatype. Although its use of lists satisfied the original task description, in that it gave the correct answer, it would fall down if, for example, certain duplicates existed in the input lists. No duplicate values would ever exist in any result from a set-based solution. Since the task is about sets rather than lists (and has the Ritzy set expressions to prove it), I modified the task description to force a set-type result, without duplicates. It should hopefully be a small update to affected implementations.

If you think an example falls foul of this then maybe you could fix/flag it? Thanks. --Paddy3118 06:12, 30 January 2010 (UTC)

Yeah but to deal with the duplicates issue all you have to do is run the Create a Sequence of unique elements task on the inputs to get rid of duplicates. --Spoon! 05:42, 2 February 2010 (UTC)
Oh, I don't think it is hard to do, I just think that because we are dealing with sets then it isn't right to have the chance of duplicates in outputs. --Paddy3118 07:23, 2 February 2010 (UTC)
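The suggestion above, removing duplicates from the inputs before applying a list-based algorithm, might look something like the following. It is a rough sketch under stated assumptions: the helper names `unique` and `list_symmetric_difference` are hypothetical, and the elements are assumed to be hashable.

```python
def unique(items):
    """Return the items with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # assumes items are hashable
            seen.add(item)
            result.append(item)
    return result


def list_symmetric_difference(xs, ys):
    """Symmetric difference of two lists that are treated as sets."""
    # De-duplicate first, so the output cannot contain duplicates either.
    xs, ys = unique(xs), unique(ys)
    return [x for x in xs if x not in ys] + [y for y in ys if y not in xs]


print(list_symmetric_difference(list("ABCCD"), list("AAEEC")))
# prints ['B', 'D', 'E']
```

Because both inputs are duplicate-free after `unique`, each element of the result appears exactly once.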
Alright, this doesn't make sense to me. I understand what you said in the log entry, and I'll admit I was unfamiliar with that requirement of "set", but if we're insisting on a strict definition of "set", I don't see how it makes sense to hold examples in languages without a set type to a different requirement. If the data passed into the program has duplicates within a list, then that list isn't a set. I would recommend leaving the note, but reducing it to an "optionally, verify your inputs," rather than a language-attribute-conditional requirement. --Michael Mol 15:12, 9 February 2010 (UTC)
Hi Michael, I wrote the note because of personal experience. Before Python had an explicit set type I had learned to not use lists as sets because of the duplicates issue. In Python the idiom then was to use the keys of a dictionary (map or hash) and code around that to make it look like a set. It was quicker to find out if a key is in a dict than in a list, and the keys of a dict are unique. Seeing more than one of the RC examples using lists, and knowing how easy it is to have duplicates in a list, made me check the algorithms used.
I reasoned that the task is about sets. If I use a set type then the type ensures there are no duplicates. Isn't it fair that if another example is using lists then either they show how their lists are further constrained to work as sets, or that the algorithm will give a set-like answer if such checks are not shown? --Paddy3118 23:39, 9 February 2010 (UTC)
I understand where you're coming from; I couldn't earn my paycheck if I didn't deal with practical concerns when dealing with my code. However, when I write code, I work to keep input validation a component distinct from operating on that input; it improves my own and my coworkers' comprehension of my code, as well as keeping the algorithms themselves visible and distinct. With this task, as I understand it and your understanding of it, the task description specifies that the input has the properties of a set, and requires that an algorithm be applied to that input. Verifying and ensuring that the data passed in meets the constraints of being a set falls under data validation and sanitization, and, for clarity's sake, I believe that such things should be a distinct component of the program where clarity is key.
I'm not saying it must be a separate example, but rather that if it's included, it should not be confused with the actual implementation of the algorithm itself. By all means, point out practical considerations and caveats; add them to the example's prologue, add them as an identified component of example code, or some other means, but ensure that input validation isn't confused with algorithm implementation.
By changing the task to require input sanitization, it became necessary to mark a number of examples as incorrect, adding templates to identify those examples as requiring attention. If one were to change that requirement to allow noting input constraints as an alternate requirement, the ENAs aren't required, observers of the code are warned of caveats, and the core algorithm is still demonstrated. Does that make sense? --Michael Mol 05:50, 10 February 2010 (UTC)
The task asks for an operation on sets. It is reasonable to expect answers to be general over their inputs when those inputs are sets. If we give an answer where the input types are lists and not sets then that is a substitution of one well understood type by another well understood type, whose major differences are that a set is unordered and a list may have duplicates.
I think it is reasonable, when given a task where you are substituting input types, that you check it works 'as generally'. I think the list solutions would not depend on order, but may fail given duplication. If input types are being substituted then is it obvious that any input lists should/should not have duplicates and should/should not have an order imposed for the algorithm to work?
When comparing languages you may be doing a disservice to those that have a set type which automatically works in a more general manner over its inputs.
Probably gratuitous example that won't help my case :-)
<lang python>>>> s0 = list('ABCCD')
>>> # Lists as sets, not handling duplicates
>>> s1 = list('AAEEC')
>>> s0
['A', 'B', 'C', 'C', 'D']
>>> s1
['A', 'A', 'E', 'E', 'C']
>>> ans1 = ( [x for x in s0 if x not in s1] + [y for y in s1 if y not in s0] )
>>> ans1
['B', 'D', 'E', 'E']
>>>
>>> # Dictionary keys as sets giving the right answer
>>> s0 = dict((k,None) for k in 'ABCCD')
>>> s1 = dict((k,None) for k in 'AAEEC')
>>> s0
{'A': None, 'C': None, 'B': None, 'D': None}
>>> s1
{'A': None, 'C': None, 'E': None}
>>> ans2 = ( [x for x in s0 if x not in s1] + [y for y in s1 if y not in s0] )
>>> ans2
['B', 'D', 'E']
>>>
>>> # Using sets as inputs
>>> s0 = set('ABCCD')
>>> s1 = set('AAEEC')
>>> ans3 = s0 ^ s1
>>> ans3
{'B', 'E', 'D'}</lang>
--Paddy3118 07:40, 10 February 2010 (UTC)

If input types are being substituted then is it obvious that any input lists should/should not have duplicates and should/should not have an order imposed for the algorithm to work?
Obvious or not, the concern can be resolved by simply noting the caveat in each language's example area, rather than requiring that a workaround be demonstrated in code. --Michael Mol 13:44, 10 February 2010 (UTC)
Paddy3118 wrote:
> Oh, I don't think it is hard to do, I just think that because we are dealing with sets then it
> isn't right to have the chance of duplicates in outputs.
My programming philosophy would be that if the purpose of the function is to return the symmetric difference of two sets, then it is an error to call the function with a list containing duplicate elements. The result of calling a function with invalid arguments is undefined, and the function can do whatever it pleases, including withdrawing all the money in your bank accounts. The symmetric difference function is not in error; the calling function is in error.
Now, however, if the stated function is to convert the inputs to sets, and return the symmetric difference of the sets, the function should do just that, and duplicate elements in each list are allowed as valid inputs. It's a matter of what you define as the purpose of your function.
Furthermore, you can define a function to return the symmetric difference of two sets if two sets are provided as input, or to throw an exception if lists are provided (containing one or more duplicates). This is a different definition of the function, and its definition includes lists containing duplicate elements as valid inputs to the function. It is just that its response differs from the one given when two valid sets are provided as inputs. --Rldrenth 16:43, 10 February 2010 (UTC)
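The three function definitions described above could be sketched side by side as follows. This is only an illustration of the distinction being drawn, not code from any of the task's examples; the names `symdiff_trusting`, `symdiff_coercing` and `symdiff_strict` are hypothetical, and the inputs are assumed to be lists of hashable elements.

```python
def symdiff_trusting(a, b):
    """Definition 1: the caller promises duplicate-free inputs.

    Nothing is checked. If the promise is broken, the result is
    whatever falls out, possibly including duplicates.
    """
    return [x for x in a if x not in b] + [y for y in b if y not in a]


def symdiff_coercing(a, b):
    """Definition 2: convert the inputs to sets, so duplicates are valid input."""
    return set(a) ^ set(b)


def symdiff_strict(a, b):
    """Definition 3: duplicates are handled input, answered with an exception."""
    for name, items in (("a", a), ("b", b)):
        if len(set(items)) != len(items):
            raise ValueError(f"input {name!r} contains duplicate elements")
    return set(a) ^ set(b)


s0, s1 = list("ABCCD"), list("AAEEC")
print(symdiff_trusting(s0, s1))   # contract broken by the caller
print(symdiff_coercing(s0, s1))   # duplicates silently collapsed
try:
    symdiff_strict(s0, s1)
except ValueError as err:
    print("rejected:", err)
```

All three use the same inputs; they differ only in what each definition treats as valid input and how it responds.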