Talk:Symmetric difference: Difference between revisions

From Rosetta Code
(→‎Set type: clarity, data validation and sanitation.)
Revision as of 05:50, 10 February 2010

The symmetric difference should give one list which is the union of the two differences of the lists. The Perl example shows two lists. --Mwn3d 23:19, 2 December 2009 (UTC)

  • I agree with your statement about what “symmetric difference” means.
  • All of the current examples produce two sets.
  • The task as written tends to the two-lists interpretation.

Conclusion: Either the task should be renamed, or the task description should be clarified and the examples revised. --Kevin Reid 02:02, 3 December 2009 (UTC)

I think the task name is good. Symmetric difference is actually an exercise I did in CS classes a few times. I think the task description should be changed to match the task name. --Mwn3d 02:17, 3 December 2009 (UTC)
Description updated, Perl example fixed, J example marked. (I saw that the Python example already provided the symmetric difference.) --Michael Mol 04:23, 3 December 2009 (UTC)
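
For readers following the thread, the distinction between "two differences" and "the symmetric difference" can be sketched in Python. This is only an illustration: the sample names and variable names below are assumptions chosen for the example, not taken from any particular entry on the task page.

```python
# Illustrative inputs (assumed sample data).
a = {"John", "Bob", "Mary", "Serena"}
b = {"Jim", "Mary", "John", "Bob"}

# The two one-sided differences, which the early examples reported separately.
only_in_a = a - b
only_in_b = b - a

# The symmetric difference is the union of those two differences.
sym_diff = only_in_a | only_in_b

# Python's set type also provides this directly via the ^ operator.
assert sym_diff == a ^ b

print(sorted(sym_diff))
```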

Set type

I noticed that the Ruby example was using lists rather than a set datatype. Although its use of lists satisfied the original task description, in that it gave the correct answer, its use of lists would fall down if, for example, certain duplicates existed in the input lists. No duplicate values would ever exist in any result from a set-based solution. Since the task is about sets rather than lists (and has the Ritzy set expressions to prove it), I modified the task description to force a set-type result, without duplicates. It should hopefully be a small update for affected implementations.

If you think an example falls foul of this then maybe you could fix/flag them? Thanks. --Paddy3118 06:12, 30 January 2010 (UTC)
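
The duplicates concern can be illustrated with a small Python sketch. The function `list_sym_diff` is a hypothetical stand-in for a list-based solution; it is not the Ruby entry or any other entry from the task page, and the inputs are made up to contain duplicates.

```python
def list_sym_diff(xs, ys):
    """Naive list-based version; silently assumes neither list has duplicates."""
    return [x for x in xs if x not in ys] + [y for y in ys if y not in xs]

# Inputs that are lists but not sets: each contains a repeated name.
a = ["John", "Serena", "Bob", "Mary", "Serena"]
b = ["Jim", "Mary", "John", "Jim", "Bob"]

naive = list_sym_diff(a, b)   # duplicates in the input survive into the output
proper = set(a) ^ set(b)      # a set type cannot hold duplicates

print(naive)
print(sorted(proper))
```

Under these assumed inputs the list-based result repeats "Serena" and "Jim", while the set-based result contains each name once.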

Yeah but to deal with the duplicates issue all you have to do is run the Create a Sequence of unique elements task on the inputs to get rid of duplicates. --Spoon! 05:42, 2 February 2010 (UTC)
Oh, I don't think it is hard to do, I just think that because we are dealing with sets then it isn't right to have the chance of duplicates in outputs. --Paddy3118 07:23, 2 February 2010 (UTC)
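
A minimal sketch of the suggestion above, assuming a list-based solution: remove duplicates from each input first, then apply the unchanged algorithm. The helper names `unique` and `list_sym_diff` are hypothetical, and the sketch assumes the elements are hashable.

```python
def unique(items):
    """Return the items in first-seen order with duplicates removed."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

def list_sym_diff(xs, ys):
    """List-based symmetric difference; expects duplicate-free inputs."""
    return [x for x in xs if x not in ys] + [y for y in ys if y not in xs]

a = ["John", "Serena", "Bob", "Mary", "Serena"]
b = ["Jim", "Mary", "John", "Jim", "Bob"]

result = list_sym_diff(unique(a), unique(b))
print(result)
```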
Alright, this doesn't make sense to me. I understand what you said in the log entry, and I'll admit I was unfamiliar with that requirement of "set", but if we're insisting on a strict definition of "set", I don't see how it makes sense to hold examples in languages without a set type to a different requirement. If the data passed into the program has duplicates within a list, then that list isn't a set. I would recommend leaving the note, but reducing it to an "optionally, verify your inputs," rather than a language-attribute-conditional requirement. --Michael Mol 15:12, 9 February 2010 (UTC)
Hi Michael, I wrote the note because of personal experience. Before Python had an explicit set type I had learned to not use lists as sets because of the duplicates issue. In Python the idiom then was to use the keys of a dictionary (map or hash) and code around that to make it look like a set. It was quicker to find out if a key is in a dict than in a list, and the keys of a dict are unique. Seeing more than one of the RC examples using lists, and knowing how easy it is to have duplicates in a list, made me check the algorithms used.
I reasoned that the task is about sets. If I use a set type then the type ensures there are no duplicates. Isn't it fair that if another example is using lists then either they show how their lists are further constrained to work as sets, or that the algorithm will give a set-like answer if such checks are not shown? --Paddy3118 23:39, 9 February 2010 (UTC)
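
The dictionary-keys idiom described above might have looked roughly like the following. This is a reconstruction for illustration only: the helper names `as_keyset` and `keyset_sym_diff` are hypothetical, and the dummy value `True` is an arbitrary choice.

```python
def as_keyset(items):
    """Emulate a set with a dict: keys are the members, values are ignored."""
    keyset = {}
    for item in items:
        keyset[item] = True   # repeated items simply overwrite the same key
    return keyset

def keyset_sym_diff(a, b):
    """Symmetric difference of two dict-based 'sets'."""
    result = {}
    for key in a:
        if key not in b:      # dict membership is a hash lookup, not a scan
            result[key] = True
    for key in b:
        if key not in a:
            result[key] = True
    return result

a = as_keyset(["John", "Serena", "Bob", "Mary", "Serena"])
b = as_keyset(["Jim", "Mary", "John", "Jim", "Bob"])

diff = keyset_sym_diff(a, b)
print(sorted(diff.keys()))
```

Because dictionary keys are unique, duplicates in the input lists disappear as soon as the key sets are built.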
I understand where you're coming from; I couldn't earn my paycheck if I didn't deal with practical concerns when dealing with my code. However, when I write code, I work to keep input validation a component distinct from operating on that input; it improves my and my coworkers' comprehension of my code, as well as keeping the algorithms themselves visible and distinct. With this task, as I understand it and your understanding of it, the task description specifies that the input has the properties of a set, and requires that an algorithm be applied to that input. Verifying and ensuring that the data passed in meets the constraints of being a set falls under data validation and sanitation, and, for clarity's sake, I believe that such things should be a distinct component of the program where clarity is key.
I'm not saying it must be a separate example, but rather that if it's included, it should not be confused with the actual implementation of the algorithm itself ($(A \setminus B) \cup (B \setminus A)$). By all means, point out practical considerations and caveats; add them to the example's prologue, add them as an identified component of example code, or some other means, but ensure that input validation isn't confused with algorithm implementation.
By changing the task to require input sanitation, it became necessary to mark a number of examples as incorrect, adding templates to identify those examples as requiring attention. If one were to change that requirement to allow noting input constraints as an alternate requirement, the ENAs aren't required, observers of the code are warned of caveats, and the core algorithm is still demonstrated. Does that make sense? --Michael Mol 05:50, 10 February 2010 (UTC)
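
One way to picture the separation argued for above is a sketch in which input validation and the algorithm are distinct, clearly labelled components. The function names `require_no_duplicates` and `sym_diff` are hypothetical, and the validation step assumes hashable elements.

```python
# --- Input validation (separate from the algorithm) ---
def require_no_duplicates(items, name="input"):
    """Raise ValueError if the list cannot be treated as a set."""
    if len(set(items)) != len(items):
        raise ValueError(f"{name} contains duplicates, so it is not a set")
    return items

# --- Algorithm: (A \ B) union (B \ A), assuming set-like inputs ---
def sym_diff(xs, ys):
    return [x for x in xs if x not in ys] + [y for y in ys if y not in xs]

a = require_no_duplicates(["John", "Bob", "Mary", "Serena"], "a")
b = require_no_duplicates(["Jim", "Mary", "John", "Bob"], "b")

result = sym_diff(a, b)
print(result)
```

With this layout an example could show only `sym_diff` and note the input constraint in its prologue, or include the validation step as an identified extra.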