Given a sequence of n words and the limit of the length of each line M, let the cost of putting k words with total m characters in a line be , design an O(n lg n) time algorithm to find the optimal way to break the sequence of words into lines with minimum cost.
Solution:
The solution is based on dynamic programming. Let p[i] be the total length for the first i words. Let c[i][j] be the cost of putting the i-th word to j-th word in the same line. By using the prefix sum of the length, each element of c[i][j] can be computed in constant time. Let F[i] be the optimal objective value for the subproblem containing the first i words. We get a recurrence relation F[j] = F[i] + c[i][j]. By using the recurrence relation, we get an O(
) time algorithm.
In order to speed up, note that c[i][j] satisfies quadrangle inequality, that is c[i][k] + c[j][l] ≤ c[i][l] + c[j][k] for all i ≤ j ≤ k ≤ l. Furthermore, let d[i][j] = F[i] + c[i][j], the problem becomes finding the minimum for each column of matrix d. Note that the matrix d also satisfies quadrangle inequality. We will use a queue = (), …, (
) to maintain the candidates, where
and (
) is in the front of the queue. When processing to the j-th column, one element (
) in the queue means that in the submatrix d[1..j-2][j..n], row
is the column minimum for columns
to
. Initially, the queue only contains (1, 1). When consider the column j, if d[j-1][n] is smaller then d[
][n], then we need to dequeue until the d[
][
] < d[j-1][
], where
is the rear of the queue. Then, we find the smallest h >
, such that d[
][h] < d[j-1][h] and enqueue (j-1, h+1). Finding such h needs O(lg n) time by using binary search. Since each index can be enqueue and dequeue once, the total complexity is O(n lg n).
Note: This problem can be solved in linear time by using slope optimization.
解法:
解法是基於動態規劃。令p[i]為第一個字到第i個字的總長度。令c[i][j]為把第i個字到第j個字放到一行的費用。利用陣列p,每一個c[i][j]都可以在常數時間內計算出。令F[i]為前i個字的最佳解之花費,我們可得一遞迴關係式F[j] = F[i] + c[i][j]。 利用此遞迴關係,我們可得到O(
)時間的演算法。
有一個方法可以加速,因為c[i][j]滿足四邊形不等式,意即c[i][k] + c[j][l] ≤ c[i][l] + c[j][k]對所有的 i ≤ j ≤ k ≤ l。令d[i][j] = F[i] + c[i][j],問題就變成是在矩陣d中找出每列的最小值。 此時矩陣d同樣也滿足四邊形不等式。我們可以利用一個queue (), …, (
)來維護可能的最佳解範圍,其中
且(
)是queue的頭。當處理到第j列時,queue中的元素(
)表示在子矩陣d[1..j-2][j..n]中,第
行是第
列到第
列的最小值。一開始queue中只包含(1, 1)。當處理到第j列時,如果d[j-1][n]比d[
][n]小,那麼我們必須把元素從queue中移除直到d[
][
] < d[j-1][
],
是queue的尾。然後我們找出最小的h >
,使得d[
][h] < d[j-1][h],然後把(j-1, h+1) enqueue。利用二分搜尋可以在O(lg n)的時間內找到此h。因為每一個索引只能被enqueue和dequeue一次,所以總時間複雜度為O(n lg n)。
Reference: Zvi Galil and Kunsoo Park, “Dynamic programming with convexity, concavity and sparsity,” Theoretical Computer Science Volume 92, Issue 1, 6 January 1992, Pages 49–76.