Closest-pair problem
You are encouraged to solve this task according to the task description, using any language you may know.
This page uses content from Wikipedia. The original article was at Closest-pair problem. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
The aim of this task is to provide a function to find the closest two points among a set of given points in two dimensions, i.e. to solve the Closest pair of points problem in the planar case.
The straightforward solution is a O(n2) algorithm (which we can call brute-force algorithm); the pseudocode (using indexes) could be simply:
bruteForceClosestPair of P(1), P(2), ... P(N) if N < 2 then return ∞ else minDistance ← |P(1) - P(2)| minPoints ← { P(1), P(2) } foreach i ∈ [1, N-1] foreach j ∈ [i+1, N] if |P(i) - P(j)| < minDistance then minDistance ← |P(i) - P(j)| minPoints ← { P(i), P(j) } endif return minDistance, minPoints endif
A better algorithm is based on the recursive divide&conquer approach, as explained also at Wikipedia, which is O(n log n); a pseudocode could be:
closestPair of P(1), P(2), ... P(N) if N ≤ 3 then return closest points of P using brute-force algorithm else xP ← P ordered by the x coordinate, in ascending order PL ← points of xP from 1 to ⌈N/2⌉ PR ← points of xP from ⌈N/2⌉+1 to N (dL, pairL) ← closestPair of PL (dR, pairR) ← closestPair of PR (dmin, pairmin) ← (dR, pairR) if dL < dR then (dmin, pairmin) ← (dL, pairL) endif xM ← xP(⌈N/2⌉)x S ← { p ∈ xP : |xM - px| < dmin } yP ← S ordered by the y coordinate, in ascending order nP ← number of points in yP (closest, closestPair) ← (dmin, pairmin) for i from 1 to nP - 1 k ← i + 1 while k ≤ nP and yP(k)y - yP(k)y < dmin if |yP(k) - yP(i)| < closest then (closest, closestPair) ← (|yP(k) - yP(i)|, {yP(k), yP(i)}) endif k ← k + 1 endwhile return closest, closestPair endif
References
C
<lang c>#include <stdio.h>
- include <stdlib.h>
- include <string.h>
- include <math.h>
- include <assert.h>
typedef struct {
double x, y;
} point_t;
inline double distance(point_t *p1, point_t *p2) {
return sqrt((p1->x - p2->x)*(p1->x - p2->x) +
(p1->y - p2->y)*(p1->y - p2->y)); }</lang>
We need an ad hoc sort function; here is a modified version of what you can find at Quicksort.
<lang c>typedef int(*comparator)(const void *, const void *); void quick(point_t *P, int *left, int *right, comparator compar) {
if (right > left) { int pivot = left[(right-left)/2]; int *r = right, *l = left; do { while (compar(&P[*l], &P[pivot]) < 0) l++; while (compar(&P[*r], &P[pivot]) > 0) r--; if (l <= r) { int t = *l; *l++ = *r; *r-- = t; } } while (l <= r); quick(P, left, r, compar); quick(P, l, right, compar); }
} void m_qsort(point_t *P, int *array, int length, comparator compar) {
quick(P, array, array+length-1, compar);
}
int sort_index_by_x(const void *ra, const void *rb) {
const point_t *a = ra, *b = rb; double d = a->x - b->x; return (d<0) ? -1 : ( (d==0) ? 0 : 1);
}
int sort_index_by_y(const void *ra, const void *rb) {
const point_t *a = ra, *b = rb; double d = a->y - b->y; return (d<0) ? -1 : ( (d==0) ? 0 : 1 );
}</lang>
<lang c>double closest_pair_simple(point_t *P, int *pts, int num, int *pia, int *pib) {
int i, j; if ( num < 2 ) return HUGE_VAL; double ird = distance(&P[pts[0]], &P[pts[1]]); *pia = pts[0]; *pib = pts[1]; for(i=0; i < (num-1); i++) { for(j=i+1; j < num; j++) { double t; t = distance(&P[pts[i]], &P[pts[j]]); if ( t < ird ) {
ird = t; *pia = pts[i]; *pib = pts[j];
} } } return ird;
}</lang>
<lang c>double closest_pair(point_t *P, int *xP, int N, int *iA, int *iB) {
int *yP, i, k; int *PL, *PR; int lA, lB, rA, rB, midx; double dL, dR, dmin, xm, closest; int nS;
if ( N <= 3 ) return closest_pair_simple(P, xP, N, iA, iB);
m_qsort(P, xP, N, sort_index_by_x);
midx = ceil((double)N/2.0) - 1; PL = malloc(sizeof(int)*(midx+1)); assert(PL != NULL); PR = malloc(sizeof(int)*(N-(midx+1))); assert(PR != NULL);
for(i=0;i <= midx; i++) PL[i] = xP[i]; for(i=midx+1; i < N; i++) PR[i-(midx+1)] = xP[i];
dL = closest_pair(P, PL, midx+1, &lA, &lB); dR = closest_pair(P, PR, (N-(midx+1)), &rA, &rB);
if ( dL < dR ) { dmin = dL; *iA = lA; *iB = lB; } else { dmin = dR; *iA = rA; *iB = rB; }
xm = P[xP[midx]].x;
yP = malloc(sizeof(int)*N); assert(yP != NULL); nS = 0; for(i=0; i < N; i++) { if ( fabs(xm - P[xP[i]].x) < dmin ) { yP[nS++] = xP[i]; } }
closest = dmin; if (nS > 1) {
m_qsort(P, yP, nS, sort_index_by_y);
for(i=0; i < (nS-1); i++) { k = i + 1; while( (k < nS) && ( (P[yP[k]].y - P[yP[i]].y) < dmin ) ) {
double d = distance(&P[yP[i]], &P[yP[k]]); if ( d < closest ) { closest = d; *iA = yP[i]; *iB = yP[k]; } k++;
} } }
free(PR); free(PL); free(yP); return closest;
}</lang>
Testing
<lang c>#define NP 10000
int main() {
point_t *points; int i; int p[2], *indexes; double c;
srand(123451);
points = malloc(sizeof(point_t)*NP); indexes = malloc(sizeof(int)*NP);
for(i=0; i < NP; i++) { points[i].x = 20.0*((double)rand()/(RAND_MAX+1.0)) - 10.0; points[i].y = 20.0*((double)rand()/(RAND_MAX+1.0)) - 10.0; indexes[i] = i; }
c = closest_pair_simple(points, indexes, NP, p, p+1); printf("%lf %d %d (%lf)\n", c, p[0], p[1], distance(&points[p[0]], &points[p[1]])); c = closest_pair(points, indexes, NP, p, p+1); printf("%lf %d %d (%lf)\n", c, p[0], p[1], distance(&points[p[0]], &points[p[1]]));
free(points); free(indexes); return EXIT_SUCCESS;
}</lang>
The closest_pair_simple gave 1.85user 0.00system 0:01.86elapsed, while the closest_pair gave 0.04user 0.00system 0:00.04elapsed, using the time command.
Perl
<lang perl>#! /usr/bin/perl use strict; use List::Util qw(min); use POSIX qw(ceil);
sub dist {
my ( $a, $b) = @_; return sqrt( (${$a}[0] - @{$b}[0])**2 + (@{$a}[1] - @{$b}[1])**2 );
}
sub closest_pair_simple {
my $ra = shift; my @arr = @$ra; my $inf = 1e600; return $inf if (scalar(@arr) < 2); my ( $a, $b, $d ) = ($arr[0], $arr[1], dist($arr[0], $arr[1])); while( scalar(@arr) > 0 ) {
my $p = pop @arr; foreach my $l (@arr) { my $t = dist($p, $l); ($a, $b, $d) = ($p, $l, $t) if $t < $d; }
} return ($a, $b, $d);
}
sub closest_pair {
my $ra = shift; my @arr = @$ra; my $N = @arr; return closest_pair_simple($ra) if ( scalar(@arr) <= 3 ); my $inf = 1e600; my @xP = sort { ${$a}[0] <=> ${$b}[0] } @arr; my $midx = ceil($N/2)-1; my @PL = @xP[0 .. $midx]; my @PR = @xP[$midx+1 .. $N-1]; my ($al, $bl, $dL) = closest_pair(\@PL); my ($ar, $br, $dR) = closest_pair(\@PR); my ($m1, $m2, $dmin) = ($al, $bl, $dL); ($m1, $m2, $dmin) = ($ar, $br, $dR) if ( $dR < $dL ); my $xm = ${$xP[$midx]}[0]; my @S = (); foreach my $p (@xP) {
push @S, $p if ( abs($xm - ${$p}[0]) < $dmin );
} my @yP = sort { ${$a}[1] <=> ${$b}[1] } @S; if ( scalar(@yP) > 0 ) {
my ( $w1, $w2, $closest ) = ($m1, $m2, $dmin); foreach my $i (0 .. ($#yP - 1)) {
my $k = $i + 1; while ( ($k <= $#yP) && ( (${$yP[$k]}[1] - ${$yP[$i]}[1]) < $dmin) ) { my $d = dist($yP[$k], $yP[$i]); ($w1, $w2, $closest) = ($yP[$k], $yP[$i], $d) if ($d < $closest); $k++; } } return ($w1, $w2, $closest);
} else {
return ($m1, $m2, $dmin);
}
}
my @points = (); my $N = 5000;
foreach my $i (1..$N) {
push @points, [rand(20)-10.0, rand(20)-10.0];
}
- my ($a, $b, $d) = closest_pair_simple(\@points);
- print "$d\n";
my ($a1, $b1, $d1) = closest_pair(\@points); print "$d1\n";
exit 0;</lang>
Time for the brute-force algorithm gave 40.63user 0.12system 0:41.06elapsed, while the divide&conqueer algorithm gave 0.38user 0.00system 0:00.38elapsed with 5000 points.
Python
<lang python>"""
Compute nearest pair of points using two algorithms First algorithm is 'brute force' comparison of every possible pair. Second, 'divide and conquer', is based on: www.cs.iupui.edu/~xkzou/teaching/CS580/Divide-and-conquer-closestPair.ppt
"""
from random import randint
infinity = float('inf')
- Note the use of complex numbers to represent 2D points making distance == abs(P1-P2)
def bruteForceClosestPair(point):
numPoints = len(point) if numPoints < 2: return infinity, (None, None) return min( ((abs(point[i] - point[j]), (point[i], point[j])) for i in range(numPoints-1) for j in range(i+1,numPoints)), key=lambda x: x[0])
def closestPair(point):
xP = sorted(point, key= lambda p: p.real) yP = sorted(point, key= lambda p: p.imag) return _closestPair(xP, yP)
def _closestPair(xP, yP):
numPoints = len(xP) if numPoints <= 3: return bruteForceClosestPair(xP) Pl = xP[:numPoints/2] Pr = xP[numPoints/2:] Yl, Yr = [], [] xDivider = Pl[-1].real for p in yP: if p.real <= xDivider: Yl.append(p) else: Yr.append(p) dl, pairl = _closestPair(Pl, Yl) dr, pairr = _closestPair(Pr, Yr) dm, pairm = (dl, pairl) if dl < dr else (dr, pairr) # Points within dm of xDivider sorted by Y coord closeY = [p for p in yP if abs(p.real - xDivider) < dm] numCloseY = len(closeY) if numCloseY > 1: # There is a proof that you only need compare a max of 7 next points closestY = min( ((abs(closeY[i] - closeY[j]), (closeY[i], closeY[j])) for i in range(numCloseY-1) for j in range(i+1,min(i+8, numCloseY))), key=lambda x: x[0]) return (dm, pairm) if dm <= closestY[0] else closestY else: return dm, pairm
def times():
Time the different josephus functions import timeit
functions = [bruteForceClosestPair, closestPair] for f in functions: print 'Time for', f.__name__, timeit.Timer( '%s(pointList)' % f.__name__, 'from closestpair import %s, pointList' % f.__name__).timeit(number=1)
pointList = [randint(0,1000)+1j*randint(0,1000) for i in range(2000)]
if __name__ == '__main__':
pointList = [(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)] print pointList print ' bruteForceClosestPair:', bruteForceClosestPair(pointList) print ' closestPair:', closestPair(pointList) for i in range(10): pointList = [randint(0,10)+1j*randint(0,10) for i in range(10)] print '\n', pointList print ' bruteForceClosestPair:', bruteForceClosestPair(pointList) print ' closestPair:', closestPair(pointList) print '\n' times() times() times()
</lang>
Sample output followed by timing comparisons
[(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)] bruteForceClosestPair: (1.0, ((8+4j), (7+4j))) closestPair: (1.0, ((8+4j), (7+4j))) [(10+6j), (7+0j), (9+4j), (4+8j), (7+5j), (6+4j), (1+9j), (6+4j), (1+3j), (5+0j)] bruteForceClosestPair: (0.0, ((6+4j), (6+4j))) closestPair: (0.0, ((6+4j), (6+4j))) [(4+10j), (8+5j), (10+3j), (9+7j), (2+5j), (6+7j), (6+2j), (9+6j), (3+8j), (5+1j)] bruteForceClosestPair: (1.0, ((9+7j), (9+6j))) closestPair: (1.0, ((9+7j), (9+6j))) [(10+0j), (3+10j), (10+7j), (1+8j), (5+10j), (8+8j), (4+7j), (6+2j), (6+10j), (9+3j)] bruteForceClosestPair: (1.0, ((5+10j), (6+10j))) closestPair: (1.0, ((5+10j), (6+10j))) [(3+7j), (5+3j), 0j, (2+9j), (2+5j), (9+6j), (5+9j), (4+3j), (3+8j), (8+7j)] bruteForceClosestPair: (1.0, ((3+7j), (3+8j))) closestPair: (1.0, ((4+3j), (5+3j))) [(4+3j), (10+9j), (2+7j), (7+8j), 0j, (3+10j), (10+2j), (7+10j), (7+3j), (1+4j)] bruteForceClosestPair: (2.0, ((7+8j), (7+10j))) closestPair: (2.0, ((7+8j), (7+10j))) [(9+2j), (9+8j), (6+4j), (7+0j), (10+2j), (10+0j), (2+7j), (10+7j), (9+2j), (1+5j)] bruteForceClosestPair: (0.0, ((9+2j), (9+2j))) closestPair: (0.0, ((9+2j), (9+2j))) [(3+3j), (8+2j), (4+0j), (1+1j), (9+10j), (5+0j), (2+3j), 5j, (5+0j), (7+0j)] bruteForceClosestPair: (0.0, ((5+0j), (5+0j))) closestPair: (0.0, ((5+0j), (5+0j))) [(1+5j), (8+3j), (8+10j), (6+8j), (10+9j), (2+0j), (2+7j), (8+7j), (8+4j), (1+2j)] bruteForceClosestPair: (1.0, ((8+3j), (8+4j))) closestPair: (1.0, ((8+3j), (8+4j))) [(8+4j), (8+6j), (8+0j), 0j, (10+7j), (10+6j), 6j, (1+3j), (1+8j), (6+9j)] bruteForceClosestPair: (1.0, ((10+7j), (10+6j))) closestPair: (1.0, ((10+7j), (10+6j))) [(6+8j), (10+1j), 3j, (7+9j), (4+10j), (4+7j), (5+7j), (6+10j), (4+7j), (2+4j)] bruteForceClosestPair: (0.0, ((4+7j), (4+7j))) closestPair: (0.0, ((4+7j), (4+7j))) Time for bruteForceClosestPair 4.57953371169 Time for closestPair 0.122539596513 Time for bruteForceClosestPair 5.13221177552 Time for closestPair 0.124602707886 Time for bruteForceClosestPair 4.83609397284 Time for closestPair 0.119326618327 >>>
Smalltalk
These class methods return a three elements array, where the first two items are the points, while the third is the distance between them (which having the two points, can be computed; but it was easier to keep it already computed in the third position of the array).
<lang smalltalk>Object subclass: ClosestPair [
ClosestPair class >> raiseInvalid: arg [ SystemExceptions.InvalidArgument signalOn: arg reason: 'specify at least two points' ]
ClosestPair class >> bruteForce: pointsList [ |dist from to points| (pointsList size < 2) ifTrue: [ ^ FloatD infinity ]. points := pointsList asOrderedCollection. from := points at: 1. to := points at: 2. dist := from dist: to. [ points isEmpty ] whileFalse: [ |p0| p0 := points removeFirst. points do: [ :p | ((p0 dist: p) <= dist) ifTrue: [ from := p0. to := p. dist := p0 dist: p. ] ] ]. ^ { from. to. from dist: to } ]
ClosestPair class >> orderByX: points [ ^ points asSortedCollection: [:a :b| (a x) < (b x) ] ] ClosestPair class >> orderByY: points [ ^ points asSortedCollection: [:a :b| (a y) < (b y) ] ]
ClosestPair class >> splitLeft: pointsList [ ^ pointsList copyFrom: 1 to: ((pointsList size / 2) ceiling) ] ClosestPair class >> splitRight: pointsList [ |s| ^ pointsList copyFrom: (((pointsList size / 2) ceiling) + 1) to: (pointsList size). ]
ClosestPair class >> minBetween: a and: b [ (a at: 3) < (b at: 3) ifTrue: [ ^a ] ifFalse: [ ^b ] ]
ClosestPair class >> recursiveDAndC: pointsList [ |orderedByX pR pL minL minR minDist middleVLine joiningStrip tDist nP| (pointsList size <= 3) ifTrue: [ ^ self bruteForce: pointsList ]. orderedByX := self orderByX: pointsList. pR := self splitLeft: orderedByX. pL := self splitRight: orderedByX. minR := self recursiveDAndC: pR. minL := self recursiveDAndC: pL. minDist := self minBetween: minR and: minL.
middleVLine := (orderedByX at: ((orderedByX size / 2) ceiling)) x.
joiningStrip := self orderByY: (pointsList select: [ :p | ((middleVLine - (p x)) abs) < (minDist at: 3) ] ). tDist := minDist. nP := joiningStrip size. 1 to: (nP - 1) do: [ :i | |k| k := i + 1. [ (k <= nP) & ( ((joiningStrip at: (k min: nP)) y - (joiningStrip at: i) y) < (minDist at: 3) ) ] whileTrue: [ ((joiningStrip at: i) dist: (joiningStrip at: k)) < (tDist at: 3) ifTrue: [ tDist := { joiningStrip at: i. joiningStrip at: k. (joiningStrip at: i) dist: (joiningStrip at: k) } ]. k := k + 1. ] ]. ^ tDist ]
].</lang>
Testing
<lang smalltalk>|coll cp ran| "Let's use the same seed to be sure of the results..." ran := Random seed: 100.
coll := (1 to: 10000 collect: [ :a |
Point x: (ran next)*10 y: (ran next)*10 ]).
cp := ClosestPair bruteForce: coll. ((cp at: 3) asScaledDecimal: 7) displayNl.
"or"
cp := ClosestPair recursiveDAndC: coll. ((cp at: 3) asScaledDecimal: 7) displayNl.</lang>
The brute-force approach with 10000 points, run with the time tool, gave
224.21user 1.31system 3:46.84elapsed 99%CPU
while the recursive divide&conquer algorithm gave
2.37user 0.01system 0:02.56elapsed 93%CPU
(Of course these results must be considered relative and taken cum grano salis; time counts also the time taken to initialize the Smalltalk environment, and I've taken no special measures to avoid the system load falsifying the results)
Tcl
Each point is represented as a list of two floating-point numbers, the first being the x coordinate, and the second being the y. <lang Tcl># A trivial helper... proc distance {p1 p2} {
lassign $p1 x1 y1 lassign $p2 x2 y2 expr { hypot($x1-$x2, $y1-$y2) }
}
proc bruteForceClosestPair {points} {
if {[package provide Tcl] >= "8.5"} { set min Inf } else { set min [distance [lindex $points 0] [lindex $points 1]] }
for {set i 0} {$i < [llength $points]-1} {incr i} { for {set j $i} {[incr j] < [llength $points]} {} { set dist [distance [lindex $points $i] [lindex $points $j]] if {$dist < $min} { set min $dist } } } return $min
}
- Some testing code:
set points {} for {set n 0} {$n < 10000} {incr n} {
lappend points [list [expr {10*rand()}] [expr {10*rand()}]]
} set t [time {
set d [bruteForceClosestPair $points]
}] puts "closest was $d, found in [lindex $t 0] microseconds"</lang>Example output
closest was 0.0006785546466922088, found in 94942709 microseconds
Note that the lindex
and llength
commands are both O(1).