Word Wrap

Given a sequence of n words and the limit of the length of each line M, let the cost of putting k words with total m characters in a line be (M - m - k + 1)^3, design an O(n lg n) time algorithm to find the optimal way to break the sequence of words into lines with minimum cost.

Solution:

The solution is based on dynamic programming. Let p[i] be the total length for the first i words. Let c[i][j] be the cost of putting the i-th word to j-th word in the same line. By using the prefix sum of the length, each element of c[i][j] can be computed in constant time. Let F[i] be the optimal objective value for the subproblem containing the first i words. We get a recurrence relation F[j] = \min_{1 \leq i < j} F[i] + c[i][j]. By using the recurrence relation, we get an O(n^2) time algorithm.

In order to speed up, note that c[i][j] satisfies quadrangle inequality, that is c[i][k] + c[j][l] ≤ c[i][l] + c[j][k] for all i ≤ j ≤ k ≤ l. Furthermore, let d[i][j] = F[i] + c[i][j], the problem becomes finding the minimum for each column of matrix d. Note that the matrix d also satisfies quadrangle inequality. We will use a queue = (i_1, h_i), …, (i_k, h_k) to maintain the candidates, where i_1 \leq i_2 \dots \leq i_k and (i_1, h_1) is in the front of the queue. When processing to the j-th column, one element (i_r, h_r) in the queue means that in the submatrix d[1..j-2][j..n], row i_r is the column minimum for columns h_r to h_{r+1} - 1. Initially, the queue only contains (1, 1). When consider the column j, if d[j-1][n] is smaller then d[i_k][n], then we need to dequeue until the d[i_r][h_r] < d[j-1][h_r], where i_r is the rear of the queue. Then, we find the smallest h > h_r, such that d[i_r][h] < d[j-1][h] and enqueue (j-1, h+1). Finding such h needs O(lg n) time by using binary search. Since each index can be enqueue and dequeue once, the total complexity is O(n lg n).

Note: This problem can be solved in linear time by using slope optimization.

解法

解法是基於動態規劃。令p[i]為第一個字到第i個字的總長度。令c[i][j]為把第i個字到第j個字放到一行的費用。利用陣列p,每一個c[i][j]都可以在常數時間內計算出。令F[i]為前i個字的最佳解之花費,我們可得一遞迴關係式F[j] = \min_{1 \leq i < j} F[i] + c[i][j]。 利用此遞迴關係,我們可得到O(n^2)時間的演算法。

有一個方法可以加速,因為c[i][j]滿足四邊形不等式,意即c[i][k] + c[j][l] ≤ c[i][l] + c[j][k]對所有的 i ≤ j ≤ k ≤ l。令d[i][j] = F[i] + c[i][j],問題就變成是在矩陣d中找出每列的最小值。 此時矩陣d同樣也滿足四邊形不等式。我們可以利用一個queue (i_1, h_i), …, (i_k, h_k)來維護可能的最佳解範圍,其中i_1 \leq i_2 \dots \leq i_k且(i_1, h_1)是queue的頭。當處理到第j列時,queue中的元素(i_r, h_r)表示在子矩陣d[1..j-2][j..n]中,第i_r行是第h_r列到第h_{r+1} - 1列的最小值。一開始queue中只包含(1, 1)。當處理到第j列時,如果d[j-1][n]比d[i_k][n]小,那麼我們必須把元素從queue中移除直到d[i_r][h_r] < d[j-1][h_r],i_r是queue的尾。然後我們找出最小的h > h_r,使得d[i_r][h] < d[j-1][h],然後把(j-1, h+1) enqueue。利用二分搜尋可以在O(lg n)的時間內找到此h。因為每一個索引只能被enqueue和dequeue一次,所以總時間複雜度為O(n lg n)。

Reference: Zvi Galil and Kunsoo Park, “Dynamic programming with convexity, concavity and sparsity,” Theoretical Computer Science Volume 92, Issue 1, 6 January 1992, Pages 49–76.

Geeksforgeeks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s