PALSTAR

Given a string S with length n, design a linear time algorithm to decompose S into several palindromic substrings that their concatenation is S.

Solution:

Let F(S, i) be the smallest index j, such that S[i..j] is an even palindrome. This function can be computed in a way similar to Manacher algorithm. Then, we can greedily decompose S into several even palindromic substrings.

In general, greedy decomposition may not work. However, we can prove that if S can be decompose, then the length of the first palindromic substring is one of F(S, 1), 2F(S, 1) – 1, or 2F(S, 1) + 1, where F(S, i) is the smallest index j that S[i..j] is a palindrome. Consequently, we can solved this problem in linear time by dynamic programming.

It is also possible to test whether a string is a composition of two, three, or four palindromes in linear time.

解法

F(Si)為最小的索引j使得S[i..j]是一個偶回文。此函數可藉由Manacher algorithm計算。借此可以利用貪心法把S分解成數個偶回文。

一般而言,貪心分解可能不存在。但是我們可以證明如果S存在一組分解,那第一個切點只會是F(S, 1)、2F(S, 1) – 1或是2F(S, 1) + 1,F(S, i)為最小的索引j使得S[i..j]是一回文。利用動態規劃的技巧就可在線性時間內建構出分解。

Reference: Geeksforgeeks

Zvi Galil and Joel Seiferas, “A Linear-Time On-Line Recognition Algorithm for “Palstar,” Journal of the ACM (JACM) JACM Homepage archive Volume 25 Issue 1, Jan. 1978  Pages 102-111

String Matching with k-mismatches

Given a text T with length n and a pattern P with length m and an integer k, design an O(kn) time algorithm to find a substring of T that matches P with at most k mismatches.

Solution:

We can use sliding window technique. Let Q(i) be the set of locations of first at most k + 1 mismatches when matching P to T[i..i+m-1]. If the size of Q(i) is smaller than k, then we find the answer. If Q(i) is larger then k, then we need to shift P to align with T[i+j..j+m-1] and compute Q(i+j) for some j accordingly.

Let G(j) be the set of locations of first at most 2k + 1 mismatches when matching P to P[j..m]. G(j) can be computed in O(m^2) time before hand. By merging the elements in G(j) and Q(i), we can compute a subset of Q(i+j) in O(k) time. We can try j incrementally until that subset of Q(i+j) is smaller than k and then compute Q(i + j) completely. The total complexity is O(kn).

This problem can also be solved in O(kn) time by using suffix tree.

解法

Q(i)為當把P對應到T[i..i+m-1]時最前面且至多k + 1的錯誤匹配位置。如果Q(i)大小小於k,那麼我們找到了答案。如果Q(i)比k來得大,那我們就必須找出一個特定的j然後把P平移j位去跟T[i+j..j+m-1]配對同時計算出Q(i+j)。

G(j)為當把P對應到P[j..m]時最前面且至多2k + 1的錯誤匹配位置。G(j)可以在利用O(m^2)的前處理時間計算出。把G(j)和Q(i)合併,我們可以在O(k)的時間內計算出Q(i+j)的一個子集。我們可以遞增的去嘗試j直到找到計算出一個Q(i+j)的子集小於k,然後再完整計算出Q(i + j)。總時間複雜度是O(kn)。

利用suffix tree這問題也可以在O(kn)時間內解出。

Reference: Stackoverflow

Gad M. Landau, Uzi Vishkin, “Efficient string matching with k mismatches,” Theoretical Computer Science Volume 43, 1986, Pages 239–249