MissTolochin's blog

By MissTolochin, history, 5 years ago, translation, In English

Today I tried to solve this problem (https://codeforces.me/gym/102069/problem/J).

There are two strings s, t. Given q queries. Each request must be answered how many times the string s[l1..r1] lies on the segment t[l2..r2] (l1, r1,l2, r2 we read in each request).

It seems to be obvious that in this case, you can use a suffix array, and then use the bin.search for all the necessary sub-sections (and for O (the length of the string to be found), check whether this sub-section is suitable), then use the segment tree to calculate the answer. However, this takes place in the first two subgroups (takes 25 points). There is one problem, it is that there is no guarantee that the sum of all the substrings that we have to check will not exceed any constant (for example, the length of the string). This means that you need to optimize the binary search (that is, find the desired sub-sections for a constant less than O(the length of the string to be found)). What data structure can be used to do this?

P.S: I was able to optimize my solution (in binary search, you can do a check using another binary search, checking whether this sub-section is suitable, using hashes). But it still takes only 60 points (passes only if the length of string does not exceed 100,000). What else can be optimized?

  • Vote: I like it
  • +29
  • Vote: I do not like it

| Write comment?
»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

If you preprocess the lcp array on the suffix array, you can get lcp of two suffixes in O(1) as it is just RMQ, which can be achieved with sparse table. Now you have lcp of any two substrings in O(1), so comparing any two substrings lexicographicly is also in O(1), what leads to binsearch in just O(logn), not including the length of substring in the complexity.

More details

»
5 years ago, # |
Rev. 5   Vote: I like it +5 Vote: I do not like it

I solved it with a different approach.

Concatenate the two string with a diameter in between (S $ T) and build both suffix array and LCP array. Now for each query, find the range in the suffix array which contain the the substring from S, this can be done with binary search and sparse table.

After you find such range for each query, the answer for that query is the number of indices in its range that have values between [L, R-len+1], where L and R are represent the range from T for that query, and len is the length of the substring we're counting.

To count those values, you can either build a segment tree with each node being a sorted vector, and do a binary search on it ( O(n*log^2(n) + m*log^2(n), probably won't get 100 score), or you can process the queries offline. the answer for some query with corresponding range [a, b] in the suffix array, is :

Ans = (Number of values >= L in [a,b]) + (Number of values <= R-len+1 in [a,b]) — ( Length of [a,b] )

So, you can update values from 0 to |S|+|T| in order and find the (Number of values >= L), and similarly for the (Number of values <= R-len+1)

My Submission

P.S: I'm using an O(n) suffix array algorithm, and its code is just unreadable XD.

  • »
    »
    5 years ago, # ^ |
      Vote: I like it +14 Vote: I do not like it

    When will you drop your next album?

    • »
      »
      »
      5 years ago, # ^ |
        Vote: I like it +16 Vote: I do not like it

      Tss, he can't hear you

      • »
        »
        »
        »
        5 years ago, # ^ |
          Vote: I like it +4 Vote: I do not like it

        Er kann lesen. Wann wirst du dein nächstes Album fallen lassen?

      • »
        »
        »
        »
        5 years ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        But I've been waiting for near 2 centuries... QAQ

    • »
      »
      »
      5 years ago, # ^ |
        Vote: I like it +7 Vote: I do not like it

      I'm retired now XD. Solving problems turned out to be my passion after all.