Sort before insert — A small, yet powerful extension to Merge sort tree

Правка en7, от darkkcyan, 2020-12-12 11:45:13

Hello Codeforces!

Today I wanted to gain contributions share my small invention during my upsolving. I don't know if there existed a similar idea yet, but as far as I can tell, the editorial for the problem does not use my idea. I think it would be nice to share with you guys and have your opinions on this idea.

Idea explanation

How do we build Merge-sort tree again?

Merge sort tree is a Segment tree, each node of whose contains a sorted set/vector of every element in its range. We build it in a bottom-up manner. To build the parent nodes, we merge 2 sets/vectors of the children the same as we do in the merge-sort algorithm in $$$O(n)$$$, hence the name Merge-sort tree.

But there is another way.

Let A be the array we are building the merge-sort tree on, it[i] be the vector that contains sorted elements in the range conquer by the i-th node. Let's create another array B that stores the following pairs: B[i].first = A[i], B[i].second = i. We sort the array B in the increasing order of first. Then we iterate each pair element value, position of B in that sorted order, push the value to the back of every it[i] where i is the node of the merge-sort tree that contains position.

We can see that I sort the array before inserting, so I decided to call this trick sort before insert. Do you guys have a better name?

Some advantages of "sort before insert" over the classical way

  • We can handle the case where we have multiple element located in the same position.
  • With efficient style we can build this tree iteratively.

Here is the small snippet for building the tree.

const int N = 1e5;  // limit for array size
vector<int> it[2 * N];
void build(const vector<int>& a) {
  vector<pair<int, int>> b;
  for (int i = 0; i < (int)a.size(); ++i) {
    b.emplace_back(a[i], i);
  }
  sort(b.begin(), b.end());
  for (auto [val, p]: b) {
    for (p += (int)a.size(); p > 0; p >>= 1)
      it[p].push_back(val);
  }
}

So now we can call merge-sort tree — quick-sort tree from now :))

Can we do with ranges?

As we can see, each node in the merge-sort tree contains only elements of its range, and in the sort before insert version, we used point-update to build the tree. Can we do the same with range-like update? Yes, yes we can with sort before insert!

But what does each node contains exactly? We already know that for every interval, we can break it into $$$O(\log n)$$$ sub-intervals, each of them will correspond to a node of the segment tree. So if we have some intervals with some associated value, we can add these values to every node corresponding to the sub-interval of the considering interval, so that the vectors of the nodes will still be sorted.

Here is the implementation of the idea.

const int N = 1e5;  // limit for array size
vector<int> it[2 * N];
struct Range {
  int l, r, value;  // [l, r)
};
void build(vector<Range> a) {
  sort(a.begin(), a.end(), [](const Range& u, const Range& v) { return u.value < v.value});
  int n = (int)a.size();
  for (auto [l, r, val]: b) {
    for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
      if (l & 1) it[l++].push_back(val);
      if (r & 1) it[--r].push_back(val);
    }
  }
}

Basic application

Disclaimer: the following problems are only for demonstrating the usage of this trick and is only compared with the classical merge-sort tree. These problems already have their solutions with similar complexity.

SPOJ KQUERY

This is a classical problem and can be solved with a Fenwick tree with coordinate compression in $$$O((n + q) \log n)$$$. But if the merge-sort tree is used instead, the complexity is $$$O(n \log n + q \log^2 n)$$$ with binary searching on each node of the sub-intervals for each query. Using sort before insert, we can change the binary searching part to two pointer technique by the following:

  • Build 2 trees with sort before insert, one for array's elements, one for the query.
  • We go from top to bottom in both trees simultaneously. Considering the nodes corresponding to the same interval on both trees. Here we find the answer locally for all queries contains this sub-interval with two pointer technique. Since queries that contain this sub-interval is sorted by their value and the array elements contained in this sub-interval is also sorted, we can maintain 2 pointers, one point to the current query, the other point to the last array's element that is bigger (or first element that is smaller or equals) than the current value of the query.

The complexity of this approach is $$$O((n + q) \log n + q \log q)$$$. Firstly, we must sort both array elements and the queries. Secondly, we can see that the number of times we visit each array's element equals the number of the node that contains it, which is $$$O(n \log n)$$$. And finally, the number of times we visit each query equals the number of sub-intervals of its query range.

The implementation is here

SPOJ MKTHNUM

There is already a solution using the merge-sort tree in $$$O((n + q) \log^2 n)$$$. But I wanted to discuss a little bit naive solution with the merge-sort tree. Let's do binary searching for the answer. If $$$F(x, l, r)$$$ is the number of elements in the range $$$[l, r)$$$ that is smaller or equals to x, then the answer for the query $$$l, r, k$$$ is the smallest number $$$v$$$ such that $$$F(v, l, r) >= k$$$. We can already see that this is exactly the same as the problem SPOJ KQUERY. If we use the above approach, then for the single element (q = 1), we have the complexity of $$$O(n \log n \log (10^9))$$$. But doing that $$$q > 1$$$ times, we need to do multiply the complexity by $$$q$$$, which is very bad.

But the in problem KQUERY, we actually can check $$$q$$$ numbers simultaneously. So the obvious optimization here is to use parallel binary search. Implementation is here.

Теги #segment tree, #merge sort tree

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en25 Английский darkkcyan 2021-07-06 10:35:36 0 (published)
en24 Английский darkkcyan 2021-07-06 10:34:17 270 Add an applicable problem
en23 Английский darkkcyan 2021-07-06 10:29:28 454 (saved to drafts)
en22 Английский darkkcyan 2020-12-12 18:43:21 11 Fix typo.
en21 Английский darkkcyan 2020-12-12 18:40:33 12 Fix typo.
en20 Английский darkkcyan 2020-12-12 17:15:56 2 Fix typo.
en19 Английский darkkcyan 2020-12-12 17:13:46 417 Add short problem statements for the first 2 problems.
en18 Английский darkkcyan 2020-12-12 17:08:46 285 Add prerequisite.
en17 Английский darkkcyan 2020-12-12 16:58:54 2 Fix copy and paste code.
en16 Английский darkkcyan 2020-12-12 16:52:06 1 Fix typo.
en15 Английский darkkcyan 2020-12-12 16:46:04 0 (published)
en14 Английский darkkcyan 2020-12-12 16:45:54 64 Tiny change.
en13 Английский darkkcyan 2020-12-12 16:43:24 16 Tiny changes.
en12 Английский darkkcyan 2020-12-12 16:38:49 411 Fix typo.
en11 Английский darkkcyan 2020-12-12 16:18:51 672 Add summary.
en10 Английский darkkcyan 2020-12-12 16:10:52 1918 Add GP of NorthBeach G application.
en9 Английский darkkcyan 2020-12-12 15:42:19 2714 Add H 2020 HongKong regional application.
en8 Английский darkkcyan 2020-12-12 13:15:41 2122 Add 2D range sum application.
en7 Английский darkkcyan 2020-12-12 11:45:13 1027 Add MKTHNUM application.
en6 Английский darkkcyan 2020-12-12 11:14:41 54 Minor changes for the bulding code.
en5 Английский darkkcyan 2020-12-12 11:09:29 1866 Add application for KQUERY.
en4 Английский darkkcyan 2020-12-12 10:34:28 26 Minor changes in the building code.
en3 Английский darkkcyan 2020-12-12 10:29:33 1324 Add building the tree with range.
en2 Английский darkkcyan 2020-12-12 10:08:56 1795 Added building part for idea explaination.
en1 Английский darkkcyan 2020-12-12 09:45:32 423 Initial revision (saved to drafts)