# p-median on a Real Line

Given a set of n points S on the real line and a positive integer k, design an O($p n^2$) time algorithm to find a subset P of S with size k, such that the total distance from each point in S to the nearest point in P is minimized.

Solution:

We refer the i-th point in S as S[i] and use d(x, y) to denote the distance between two points x and y. The main idea is dynamic programming. First, sort the points in S in increasing order. Let F[q][j] be the optimal objective value for points S[j..n] choosing at most q points, G[q][j] be the optimal objective value for points S[j..n] choosing at most q points, provided that S[j] is chosen. Hence, we get a recurrence relation G[q][j] = $\min_{j < k \leq n+1} \sum_{t=j}^{k-1}$ d(S[t], S[j]) + F[q-1][k] and F[q][j] = $\min_{j \leq k \leq n} \sum_{t=j}^k$ d(S[t], S[k]) + G[q][k]. F[p][1] is the answer. By using this recurrence relation, we get an O($pn^2$) time algorithm.

In order to speed up, let w[j][k] = $\sum_{t=j}^k$ d(S[t], S[j]) and u[j][k] = $\sum_{t=j}^k$ d(S[t], S[k]). The values in w and u satisfy the quadrangle inequality, that is w[i][k] + w[j][l] ≤ w[i][l] + w[j][k] for all i ≤ j ≤ k ≤ l. Thus, we can speed-up the dynamic programming and get an O(pn) time algorithm[1].

If the points is on a tree, there is an O($pn^2$) time algorithm[2].

Note: p-median on plane is NP-hard. It also can be shown that even we can use points outside S, we cannot get a better solution.

Reference:

[1] R. Hassin, A. Tamir, “Improved complexity bounds for location problems on the real line,” Operations Research Letters Volume 10, Issue 7, October 1991, Pages 395–402.

[2] Arie Tamir, “An O(pn2) algorithm for the p-median and related problems on tree graphs,” Operations Research Letters Volume 19, Issue 2, August 1996, Pages 59–64