1k_trash's blog

By 1k_trash, 11 years ago, translation, In English

You are given a string (|S| <= 10^5) and have to find such substring that repeats one after another maximum number of times. Or, if there are some such substrings, you must choose the longest/shortest/lexicografically_something.

For example: for test bacacacb answer is ac.

How to solve it? Thanks!

  • Vote: I like it
  • +6
  • Vote: I do not like it

»
11 years ago, # |
Rev. 3   Vote: I like it +2 Vote: I do not like it

Make a suffix array, and an LCP array. Now your task is to find the longest non-zero contiguous subsequence in the LCP array, which can easily be done in linear time.

edit: sorry, I thought it just said maximum number of occurrences.

  • »
    »
    11 years ago, # ^ |
    Rev. 2   Vote: I like it +1 Vote: I do not like it

    aclwawalaciac
    Your solution will give ac as an answer, but the correct answer is wa.
    It seems that you didn't notice "such substring that repeats one after another maximum number of times"

»
6 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

This post mentioned this paper which I think solves the problem.

  • »
    »
    5 hours ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Nevermind, a easier solution can be found here:

    Iterate through all possible "period" (the length of the repeated string) of the answer $$$p$$$ from $$$1$$$ to $$$|S|$$$. For each $$$p$$$, check LCP(i * p, (i + 1) * p) and LCS(i * p - 1, (i + 1) * p - 1) to get the length of the whole repetition substring.

    Iterating through all $$$i$$$ takes $$$O(n / i)$$$, thus the total complexity is $$$O(n \log n)$$$ since LCP and LCS can be calculated from SA (LCS's calculation is based on reversed string) and sparse table.