Блог пользователя Libraion

Автор Libraion, история, 3 года назад, По-английски

Given a string $$$S (|S| \le 100)$$$ and $$$n (n \le 1000)$$$ strings $$$T_1, T_2, \dots, T_n (|T_i| \le 30)$$$. Each string $$$T_i$$$ has a corresponding point $$$p_i (1 \le p_i \le 10000)$$$.

You can do the following operation any number of times (until you cannot choose any substring): Choose a substring in $$$S$$$ that is equal to one of $$$T_i$$$, delete that substring in $$$S$$$, you get $$$p_i$$$ points and the remaining characters of $$$S$$$ concatenate.

What is the maximum points you can get by doing the operations?

Sample input

I was thinking about a solution using DP and Aho Corasick.

Specifically, let $$$dp(i, j) = $$$ the maximum points you can get with the first $$$i$$$ characters of $$$S$$$ and you are standing at node $$$j$$$ in the Trie after doing some operations.

But I came across a problem. When we move from $$$dp(i, j)$$$ to $$$dp(i + 1, k)$$$, if we jump from node $$$j$$$ to node $$$k$$$ with suffix link, we lost some information of the prefix that we haven't used in any operation yet.

Could you guys suggest any alternatives?

Thanks in advance.

  • Проголосовать: нравится
  • +63
  • Проголосовать: не нравится

»
3 года назад, # |
Rev. 4   Проголосовать: нравится +32 Проголосовать: не нравится

I think that we need an interval dp solution. Here's mine in $$$O(S^3 \cdot N \cdot T)$$$, which is too slow but should be a good start.

Let $$$B[L][R]$$$ be the best score we can get by erasing the substring $$$S[L..R]$$$. In order to compute this value, let's iterate what is the last move, i.e. which string $$$T_k$$$ we use last. The substring $$$S[L..R]$$$ must contain $$$T_k$$$ as a subsequence, split by some shorter intervals $$$[l, r]$$$, which were erased earlier. We need $$$dp[is][itk]$$$ where $$$is$$$ is position in $$$S$$$, and $$$itk$$$ is position in $$$T_k$$$. Transitions are either eating the next character from $$$T_k$$$ (if matching) or jumping a short interval $$$[l, r]$$$ by using $$$B[l][r]$$$. Here's pseudocode.

for L = len(S)-1 .. 0:
  for R = L .. len(S)-1: // you can get rid of this for-loop because dp[][] just depends on L
    B[L][R] = -INF
    for k = 0 .. n-1:
       // consider T[k] to be the last scored string in substring S[L, R]
       int dp[len(S)+1][len(Tk)+1] = -INF
       dp[L][0] = 0
       for is = L..R:
         for itk = 0..len(Tk)-1:
           if S[is] == Tk[itk]:
             dp[is][itk] -> dp[is+1][itk+1]
           for r = is..R:
             dp[is][itk] + B[is][r] -> dp[r+1][itk]
       B[L][R] = max(B[L][R], score[k] + dp[len(S)][len(Tk)])
»
3 года назад, # |
  Проголосовать: нравится +16 Проголосовать: не нравится

It seems, I've found better algorithm, but I'm not good in Aho-Corasick.

Let's preprocess for every $$$pos$$$ in string $$$S$$$ the set of lengthes of substrings, which are in $$$T$$$, and their last position is $$$pos$$$. To do it, we bring the subtree of trie, which contain only terminal vertices and inversed links, and for every vertice we want to be able to check, if there is an ancestor with label $$$x$$$, where label is a depth of vertice in original trie. Notice, that when we are moving to root, the labels decrease, so we can apply binary lifting. We can do it in $$$O(n \cdot T \cdot \log(n \cdot T))$$$. Then using the binary lifting, we can find the sets for $$$S$$$, which takes is $$$O(S \cdot \log(n \cdot T))$$$.

Now, we can in $$$O(set)$$$ check if substring of $$$S$$$ is in $$$T$$$. Do simple $$$DP$$$ on substrings in $$$O(S^4 \cdot set)$$$.

If I did not make mistake in algorithm, the total complexity is $$$O(n \cdot T \cdot \log(n \cdot T) + S^4 \cdot set)$$$.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +48 Проголосовать: не нравится

    Could you elaborate a little bit more? Like with a real example. Thanks

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Oh, it seems I was yesterday a little tired, I tried to check for every substring of $$$S$$$ isn't it some $$$T_i$$$, but it is easy to make using hashes.)

      But anyway, this algorithm probably correct, I hope I explained it understandable.

      Recall what is Aho-Corasich trie: it is a trie and suffix links from every vertice to vertice with smaller deep. Obviously, the suffix links form tree. We assume here, that root doesn't have suffix link. What do we do, after we built tree and before we scan our text? We want to check for every vertice, isn't it contain some $$$T_i$$$ as suffix. It is not only the vertices, which are at the end of path from root with characters of $$$T_i$$$, it is also every vertice, such that there is path from it through inversed suffix links to one of vertices, which are at the end of path from root with characters of $$$T_i$$$. So we do propagation from every such vertice through inversed suffix links. We call all this vertices terminal.

      Back to our algorithm. Now look at tree with suffix links of trie. Let's look at subset of vertices, which are terminal. Every descendant of a terminal vertice is terminal. Also, let's write on every vertice its depth in trie. If we move to root, the depth is decreases (but not by $$$-1$$$). Also, if we move to root, firstly we stay on only terminal vertices, then only on non-terminal vertices. All ancestors of vertice are some $$$T_i$$$, which are suffix of this vertice. All we have to do, is to be able to check for vertice $$$v$$$ and number $$$d$$$ is there an ancestor with such number, and if there is, is it terminal.

      Now, we can for every vertice check, whether there is $$$T_i$$$ with length $$$d$$$ which is suffix of it. We want to implement DP. We firstly preprocess for every position in $$$S$$$ the corresponding vertice of trie. Now we want to check, if the substring of $$$S$$$ is some $$$T_i$$$. We bring the vertice of trie, corresponding to the last index of $$$S$$$ and ask, if it has terminal ansestor with label $$$length$$$.

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        I moreover hadn't read task correctly: how am I going to find answer for string after deleting substring from middle? Sorry for waisting time, I should think first.