Блог пользователя kAc

Автор kAc, 12 лет назад, По-английски

This algorithm can solve some difficult data structure problems. Like this: Give an array a[] of size N, we query Q times. Each time we want to get the mode number(the value that appears most often in a set of data) of subsequence a[l], a[l + 1] .. a[r].

Maybe you want to solve it by segment tree or some other data structures. However, if you know the answer of [l, middle], [middle + 1, r], it's difficult to get the answer of [l, r].

Here is an offline algorithm which can solve this problem in O(N * sqrt(N) * log(N) + Q log Q).

First we need a data structure that can support: insert a number. delete a number. get the mode number. It can be solved easily by binary heap / balanced binary search tree. We call this data structure DS.

Secondly, if we have all the numbers in [l, r] organized in DS, we can transform to [l', r'] in O((|l-l'| + |r-r'|) * log N). We need a good tranform order.

S = the max integer number which is less than sqrt(N);
bool cmp(Query A, Query B)
{
  if (A.l / S != B.l / S) return A.l / S < B.l / S;
  return A.r > B.r
}

We sort all the queries by this comparator, and it can be proofed easily that the total transform costs is N sqrt(N) log N.

  • Проголосовать: нравится
  • +72
  • Проголосовать: не нравится

»
12 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Thanks in advance.

»
12 лет назад, # |
Rev. 2   Проголосовать: нравится +12 Проголосовать: не нравится

....So why does this post keep getting negative feedback...

EDIT : Now it starts to get positive votes

  • »
    »
    12 лет назад, # ^ |
      Проголосовать: нравится +21 Проголосовать: не нравится

    I think, that it's just nice usage of SQRT-decomposition idea, but not algo that deserves to be named in honor of person.

    • »
      »
      »
      12 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      I'm so sorry but this algorithm is famous as Mo's algorithm in Chinese high school student. Here I just use the ordinary name.

»
12 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

How can this problem be solved with less time complexity?

»
12 лет назад, # |
Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

is there any problems in oj?

»
12 лет назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

I got a couple of questions: - Which value would S take if there wasn't any element having a value less than sqrt(N)? - Could you please describe the proof of the overall time complexity? EDIT: When you say the max integer which is less than sqrt(N) you mean floor(sqrt(N)), then it makes sense and the proof is easy as you already said, sorry for the misunderstading.

»
12 лет назад, # |
  Проголосовать: нравится +6 Проголосовать: не нравится

Why A.r > B.r? I can't see it, shouldn't it be < instead?

  • »
    »
    12 лет назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    Both them are right. let l' = l / S For all the queries whose l' is the same, the movement of right pointer is monotonous, so the total costs is O(N). And there can be at most S kinds of queries whose l' is different, here S = sqrt(N), so the total costs of right pointer movement is at most N * sqrt(N)

    • »
      »
      »
      12 лет назад, # ^ |
        Проголосовать: нравится +1 Проголосовать: не нравится

      Yeah i just realized that it doesn't matter on my way back to home. Thanks for the reply.

»
12 лет назад, # |
Rev. 2   Проголосовать: нравится -16 Проголосовать: не нравится

thanks for good algorithm.

»
12 лет назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

I think it can be solved in .

  1. Sort all queries with your comparator except A.r < B.r in the last line.
  2. Make array cnt[] where cnt[x] is how many times number x was met. Also we will maintain the number maxNum which was met maximal number of times (cnt[maxNum] is this number of times).
  3. Process the queries group by group. Queries A and B are in the same group if A.l == B.l. So there are such groups. Clear array cnt, then...
  4. Start with r = k*S (k = 1, 2, ...) (it's the first element of the next block in sqrt-decomposition) and move it to the right, updating cnt and maxNum. If some query in this block has the right border equal to r, then...
  5. Store number maxNum in the temp variable and move pointer l from position k*S to the left while it is not equal to current query's left border. It will move to distance no more than . Of course, we updating array cnt and variable maxNum during this process.
  6. If l is equal to the query's left border we can answer the query — it is maxNum. Then move pointer l back to the right, to its initial place k*S, updating cnt with subtractions, not additions. Finally, variable maxNum — the most appearing number — is now wrong — we cannot maintain it when we subtract from cnt. But we stored its actual value (for segment [kS, r]) at the beginning of step 5. Our data structure became consistent and we can continue processing the queries.
»
12 лет назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

Here is idea.

First of all, lets sort all queries using your comparator.

Let array S[N] be an array, which consist of N hashsets. S[i] contains all numbers, which appear in our current segment exactly i times. Let A[n] be an array(or hashmap), where A[i] contains the number of appearances of number i in our current segment. Let max be the biggest index of nonempty set in S

Using this structures, we can add or delete number in O(1) time:

1) Add some number v: delete v from S[A[v]], increase A[v] by 1, add v to S[A[v]], if A[v] > max — increase max by 1.

2) Delete some number v: delete v from S[A[v]], decrease A[v] by 1, add v to S[A[v]], if S[max].size() =  = 0 — decrease max by 1.

To get answer for the query we just take any number from S[max]

  • »
    »
    6 лет назад, # ^ |
    Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

    Since S[i] is storing numbers, won't it take O(logN) time for each add/delete operation and that too when we use some container like set. Otherwise delete operation could take O(N) time. Please correct me if I'm wrong.
    Although we can solve this question if instead of storing numbers that appear exactly i times, we store the count of such numbers in S[i], I found this technique really useful. Edit: Thought we just need to output the mode number's count. Thanks!

»
12 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

The total transformation cost is (N sqrt(N) + Q sqrt(N) )log N instead of N sqrt(N) log N, because for each query in the worst case you will have to go through a whole block of length sqrt(N).

»
11 лет назад, # |
Rev. 23   Проголосовать: нравится +33 Проголосовать: не нравится

There's one cool adaption of that: Say you have a data structure that supports insert trivially but not delete. There are lots of examples where this is the case.

Assume however that we can implement a snapshot() and rollback() function for our data structure, so that a rollback() brings us back to the state of the latest snapshot in , where k is the number of inserts since the snapshot. Every persistent data structure makes this possible trivially, but that's actually more than we need and often we can find tricks to implement these operations in a much simpler way, possibly even by using memcpy

We can still use the technique presented here with runtime . init() can have complexity .

In the following code, we assume zero-based indexing of the queries and inclusive right borders:

   rt = sqrt(n)
   init()  // this initializes our data structure (clears it)
   snapshot()
   sort queries (l, r) by (l / rt, r) ascending
   for all queries q
       if q.r - q.l + 1 <= rt + 1 // we process light queries
           for j := q.l to q.r
               insert(j)
           store answer for query q
           rollback()
   last_bucket = -1
   for all queries q 
       if q.r - q.l + 1 <= rt: continue
       bucket = q.l / rt

       if bucket != last_bucket
           init()
           l = (bucket + 1) * rt // right border of the bucket
           r = q.r
           for j := l to r
               insert(j)
       last_bucket = bucket

       while r < q.r 
           insert(++r)
       snapshot()
       for j := q.l to l - 1
           insert(j)
       store answer for query q
       rollback()
  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Will you please elaborate a bit? Will you do rollback, if you need O(n) or O(nlgn) to do it?

    • »
      »
      »
      11 лет назад, # ^ |
      Rev. 4   Проголосовать: нравится 0 Проголосовать: не нравится

      You do that Ω(Q) times, so to stay within the time bound, we have to be able to charge it the inserts that happened since the last snapshot. That's why I said it must be bounded by

  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится +16 Проголосовать: не нравится

    Now that the contest is finally over, would you mind answering why you posted the solution to this problem while the contest was midway?

    • »
      »
      »
      11 лет назад, # ^ |
      Rev. 17   Проголосовать: нравится +20 Проголосовать: не нравится

      Someone asked a question about the problem on Stack Overflow (obviously without reference to the source). I came up the idea (or better reinvented it, since it seems to be quite well-known), and because it seems like a generally useful construction I posted it here. When I was notified by a fellow user that it is from a running contest, I removed the references to DSU, which seems to be the core idea, but well you can always see the old revisions. As you might know, there is no option to delete your posts on Codeforces.

      Unfortunately it often happens that we get questions about running contests on Stack Overflow. It is not against SO's rules to post them there, so they cannot be deleted and often get good answers as well. I have seen at least 20 questions about the MIKE3 and SSTORY problem, some had good answers, and I have also unknowingly described a solution for STREETTA there towards the beginning of the contest. Even if you realize this later, you cannot delete your answers once they are marked as accepted. After you see the same question 3 times in a row you can usually figure that something is probably up, but then it's often too late. What can I say. Just one more reason to have more fellow competitive programmers on Stack Overflow to spot these types of questions and mark them in the comments.

      The same goes for cs.stackexchange.com I suppose, although it is a bit less frequented.

      I guess in the end this is all about fun, so whoever used this approach blindly without thinking about it further has missed a good opportunity to come up with the much more elegant approach using Link-cut trees... It will probably haunt you in the future :)

      • »
        »
        »
        »
        11 лет назад, # ^ |
          Проголосовать: нравится +8 Проголосовать: не нравится

        What surprised me initially was that you had submissions to the problems of this contest before you posted this solution. Now I see that all the initial submissions were to the problem ANUGCD, so I assume you only had that one problem read at the time of this post.

        • »
          »
          »
          »
          »
          11 лет назад, # ^ |
          Rev. 5   Проголосовать: нравится +8 Проголосовать: не нравится

          Yes indeed, the reason I looked at ANUGCD was yet another Stack Overflow post that made me interested in that particular problem.

»
10 лет назад, # |
  Проголосовать: нравится -8 Проголосовать: не нравится

"First we need a data structure that can support: insert a number. delete a number. get the mode number. It can be solved easily by binary heap / balanced binary search tree."
Another easy way would be to keep a map of (number, frequency) and a set of pair (frequency,number). Insertion/Deletion are O(log N). Getting mode is O(1).

»
9 лет назад, # |
  Проголосовать: нравится +4 Проголосовать: не нравится

Thanks for sharing.It's really helpful for me to learn Mo's algorithm. I can hardly find a paper about Mo's algorithm on Internet.

»
9 лет назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

It seems that this cmp is faster than your cmp:

bool cmp(Query A, Query B)
{
  if (A.l / S != B.l / S) return A.l< B.l;
  return (A.r < B.r)^(A.l/S%2);
}

2182 MS with above cmp: 15551689

3586 MS with your cmp: 15558775

  • »
    »
    9 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    This is very specific on tests. I'm pretty sure that tests in this specific task are bad enough to be optimized nearly twice this way.

    Your comparator is completely identical to the one presented in this blog if all queries are in the same bucket of size S.

    UPD: one can prove that this comparator is better like this

  • »
    »
    6 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Your cmp is for <= operator. So it can gain RE.

    • »
      »
      »
      6 лет назад, # ^ |
        Проголосовать: нравится +3 Проголосовать: не нравится

      Yes. You're right. Use this function instead:

      bool cmp(Query A, Query B){
        if (A.l / S != B.l / S) return A.l < B.l;
        return A.l / S % 2 ? A.r > B.r : A.r < B.r;
      }
      
      • »
        »
        »
        »
        5 лет назад, # ^ |
          Проголосовать: нравится +5 Проголосовать: не нравится

        It is too awesome. My TLE converted to Accepted by just changing Mo comparator But may I know why? What is the reason behind the efficiency of this comparator?

        • »
          »
          »
          »
          »
          5 лет назад, # ^ |
            Проголосовать: нравится +8 Проголосовать: не нравится

          Visualize it.

          • »
            »
            »
            »
            »
            »
            5 лет назад, # ^ |
              Проголосовать: нравится +8 Проголосовать: не нравится

            Oh nice, It was amazing, Please tell me if I am right that while right pointer is returning back towards the block for queries in odd block then changing sorting order in even block (every time direction changes for right pointer) helps to solve the query in next block while going away from even block (towards right). thus reducing significant amount of left to right iterations for right pointer and instead using left to right movement and right to left movement of right pointer effectively.

»
9 лет назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

If we had all queries with l in the same segment, except that in sorted order they alternate with one starting at the beginning of the segment, the next starting at the end of the segment. Wouldn't the time complexity for queries be O(Q log Q + Q * sqrt N)?
By segment I mean the sqrt N partition.

»
9 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Тут предлагалось несколько вариантов избавления от множителя lgn в оценке времени. Еще один вариант: можно хранить кол-во текущих появлений в любой пирамиде. Т.к. в ней будут целые числа, то при изменении любого значения на 1 произойдет не более одного перемещения. Но чтобы по числу за O(1) найти его узел в пирамиде придется также заводить либо массив размером max{a[i]}, либо хештаблицу и тогда время будет ожидаемым O(1).

»
5 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

what is mos algo + dp my sol is not passing in 2 sec by only mo's algorithm where both n and q is 5e5

»
3 года назад, # |
  Проголосовать: нравится -21 Проголосовать: не нравится

how to sort like this in python?