Talk:Closest-pair problem

Since we only compare distances and the actual smallest distance never needs to be reported, could the calculation be sped up by not using a square root in the distance calculation? I'm not sure how the current examples are calculating the distance, but they may be optimized by calculating the square of the distance (d^2 = (x2-x1)^2+(y2-y1)^2). I don't think there is any use for the actual distance anywhere in the pseudocode, but I may be wrong. --Mwn3d 19:03, 9 May 2009 (UTC)

I think the actual distance is meant to be the result of the function. Still, that would let you optimize to reduce the number of square root operations performed. (I do question whether we're doing a “how fast can we go” exercise though; clarity is better here, yes?) — Dkf 21:10, 9 May 2009 (UTC)

I believe it would be a reasonable optimization. It would be enough to compute sqrt on the return value (we need a "non recursive front-end" function anyway). But maybe this is up to implementors. Pseudocode should not think about it. --ShinTakezou 09:21, 11 May 2009 (UTC)

About this task

I am working on it; currently I have less time than before. The pseudocode needs to be cleaned up and corrected; the Smalltalk impl works fine in all my tests, but I've done a C and Perl impl too... and they work 1/6 times... meaning that once a while the result between the "brute force" method and "fast" method disagree; at the very beginning I thought the fault was in the C and Perl impls, since Smalltalk worked always... but now I think it is indeed a deeper problem and the error does not exhibit in Smalltalk because of the distribution of the random number (this is just an idea... I am trying to figure out a set of tests to understand it better what's going wrong and why, and why not in Smalltalk... using the same set of points could be enough, it's in my plan). Once C and Perl code will work, I will fix the pseudocode and the Smalltalk impl. Of course, if anyone can fix the pseudocode (it is deduced by the Smalltalk impl indeed!)... or doing a new impl finds it work properly (this would be my C and Perl impl are wrong someway... but I can't see where!)... I would like to read any idea on it. (First I am going to use the same set of points, then I will take paper and pencil and try to reinvent the wheel in the hope I understand where I get the algorithm explanation wrong!) --ShinTakezou 09:17, 11 May 2009 (UTC)

A note: should be the pseudocode marked as "problematic" (or whatever)? (References are there even to allow any good guy to make it better; sorry for this, but sadly as said before the Smalltalk impl does not exhibit a wrong behaviour, so I was deceived by it :D) --ShinTakezou 09:23, 11 May 2009 (UTC)

It seems weird to only return the distance of the closest pair. Wouldn't want the actual pair of points rather than just the distance between them? That's what the description seems to say, but the examples only return the distance. Are they missing the point? --Mwn3d 16:15, 11 May 2009 (UTC)

Implementations (at least Smalltalk and new Perl and C) do so! Just I haven't specified it into the pseudocode, since it would make it less clear (the point is not to keep track of the closest points: it is about the "smaller distance" so to say). Indeed it is not hard to keep track of that information even in the pseudocode, so maybe I can add it after all. But my first effort was to provide a working pseudocode... which points can come after. (Now everything seems fixed, so I am going to add it to the pseudocode too...) --ShinTakezou 17:06, 11 May 2009 (UTC)

I've updated the pseudocode to explicitly store the closest pair; however, after updating I thought after all it was not so useful: since a distance is always computed between two points, the information could be considered as hidden inside the computed distance... However now the implementors are suggested to give that information too... --ShinTakezou 18:09, 11 May 2009 (UTC)

I found This reference helped me with my Python example, which seems to work. --Paddy3118 21:56, 11 May 2009 (UTC)

C and Perl and also Smalltalk (updated), based now on the new pseudocode, work, so I suppose the algorithm now is correct! (What written before referred to the previous pseudocode and first Smalltalk implementation) --ShinTakezou 09:09, 12 May 2009 (UTC)

Comment on Algorithms Given

Critique of pseudocode explanations of the Closest Pair Algorithm --Paddy3118 05:39, 12 May 2009 (UTC)

The new code (id est, as it is now) works. The code that had problems was the previous one (in your blog you report the most recent version, that, as said, works). Maybe I've chosen a not so communicative pseudolanguage... but I thought we were familiar with such notation like "ceil" or "floor", or the meaning of {elements...} (which resembles a set; here we intend an ordered, indexable, "set"); and it seemed to me intuitive that a P(i)_x refers to the point i-th, x coordinate, even though it would be better to write (P(i))_x maybe. I would have used P_i,x but at the end I preferred the P(index) form in general; I could have written P_x(index) instead... maybe it is clearer? Using the last reference, I've corrected the last part of the pseudocode (and in fact the while loop is fundamentally the one you can see there). The syntax I am dissatisfied with is the last I've added to explicitly take care of the points' coordinates, e.g. (closest, closestPair)... but it is the faster modification I was able to think to add that without too many efforts.

Language apart, the algorithm now works (at least, Smalltalk, C and Perl implementations, based on that, works).

Anyone interested in writing better pseudocode (even in changing the current "language" keeping the same "code") is welcome of course! --ShinTakezou 09:04, 12 May 2009 (UTC)

Hi, You've done sterling work. I was dumb. It just took a different explanation before I could 'get' it. On the Ceiling thing, I totally missed the the fact that the bottom of the brackets was missing and so mis-read it as plain old square brackets using a crap font.

Their is still one outstanding point though: in the ref. I found, they stated that you had to pre-sort only once, for both X and Y before you entered the recursive routine or the sorts for Y would make the algorithm n(logn)**2 rather than nlogn. But I'm no expert in deriving O(n) notation. --Paddy3118 12:27, 12 May 2009 (UTC)

Of course you're right: sorting must be done once (since once you sort, e.g. by x, splitting won't mess up the order...) That's also why the refs pass sorted sets as arguments... but while implementing it the first time, I've disregarded the question since I was more concentrated on other details... Argh. I should rewrite it all a lot better :( --ShinTakezou 13:03, 12 May 2009 (UTC)