chinesedfan's blog

By chinesedfan, history, 10 days ago, In English

As is well-known, segment tree allows answering range queries over an array efficiently, while still being flexible enough to allow quick modification of the array. -- https://cp-algorithms.com/

This blog is about how to solve segment tree problems generally, including not only the classic range sum/minimum/maximum, but also some other complex cases. If are already confident enough of your skills, you can skip reading most of the content and go to practice the last problem directly.

Basic Example

Almost everyone learns segment tree from calculating range sums with single element updates. The formal definition is:

Given an array $$$A$$$, for each query range $$$[L,R]$$$, returns its sum $$$\sum_{i=L}^R A[i]$$$. And also handle assignments of the form $$$A[i] = x$$$.

The main idea is to divide the whole range $$$[1,N]$$$ into 2 parts. Obviously, if we know sums of each part, the whole range sum can be calculated in $$$O(1)$$$ by a simple add operation.

sum[1,N] = sum[1, N/2] + sum[N/2 + 1, N]

Keep the divide process recursively until the range size is 1, then we get a binary tree, while each node represents for an interval. For any range $$$[1,X]$$$, its sum can be calculated by $$$O(logN)$$$ nodes with several add operations. Because the length of $$$X$$$'s binary form won't exceed $$$O(logN)$$$.

sum[1,3] = sum[1,2] + sum[3,3]
sum[1,11] = sum[1,8] + sum[9,10] + sum[11,11]

When querying range $$$[L,R]$$$, you can image it as the difference between $$$[1,R]$$$ with $$$[1,L-1]$$$, as well as prefix sums. In fact, it can also turn into several add operations. We need $$$O(logN)$$$ nodes, so the time complexity is $$$O(logN)$$$.

  sum[4,11]
= sum[1,11] - sum[1,3]
= (sum[1,8] + sum[9,10] + sum[11,11]) - (sum[1,2] + sum[3,3])
= (sum[1,2] + sum[3,3] + sum[4,4] + sum[5,8] + sum[9,10] + sum[11,11]) - (sum[1,2] + sum[3,3])
= sum[4,4] + sum[5,8] + sum[9,10] + sum[11,11]

When updating position $$$X$$$, it affects $$$O(logN)$$$ nodes. We can change $$$sum[X,X]$$$ at first, then do lots of add operations again.

*When updating range $$$[L,R]$$$, the above method is not very efficient. The worst case is updating $$$[1,N]$$$, all $$$O(NlogN)$$$ nodes need to be updated. Using lazy propagation can optimize to $$$O(logN)$$$, but it is a little off-topic here.

Now you should find that the add operation + is a key part for using segment trees to calculate range sums. That's so-called "conquer" in divide and conquer principle.

Advanced Example

Let's try more examples.

  • If want to query the minimum value in range, just change + to min.
  • If want to query the maximum value in range, just change + to max.
  • If want to query the and sum of elements in range, just change + to &.
  • If want to query the or sum of elements in range, just change + to |.
  • If want to query the xor sum of elements in range, just change + to ^.
  • If want to query the gcd of elements in range, just change + to gcd.
  • ...

See? We can replace the add operation with other operations, as long as they accept two values of the same type. To maintain generality, define a structure S and a function op. And we need the identity element e as initial value before any calculating. For any S instance a, it should have $$$op(a, e) = op(e, a) = a$$$. (It may be used in different places, so design e as a function)

struct S {};
S op(S a, S b);
S e();

That's all 3 things we need to implement to solve a segment tree problem. Great news is that, atcoder has provided a convenient library. Refer detailed usages in ACL.

A Common way to implement op is considering prefixes and suffixes. For example, how to find the size of the longest continuous 0s in range $$$[L,R]$$$ for a binary string s that contains only 0 and 1? Say s="0010011", L=2 and R=5, then the answer should be 2, because s[L,R]="0100".

Suppose we have the longest 0-prefix and the longest 0-suffix of two intervals a and b. When they are concated, new continuous 0s will be created, i.e. '1100' with '0111'. And if one of them is all 0s, the longest 0-prefix or the longest 0-suffix of the other will increase.

Then we can solve it like following,

struct S {
  int prefix, suffix, ans, sz;
};
S op(S a, S b) {
  int ans = max(max(a.ans, b.ans), a.suffix + b.prefix);
  int prefix = a.prefix;
  if (a.prefix == a.sz) prefix += b.prefix;
  int suffix = b.suffix;
  if (b.suffix == b.sz) suffix += a.suffix;
  return { prefix, suffix, ans, a.sz + b.sz };
}
S e() {
  return { 0, 0, 0, 0 };
}

Now you will never be afraid of changing your algorithm and introducing strange bugs. This is the power of templates.

Complex example

The original problem is not written in English, which comes from IT MARAFON 2024 3-sinov bosqichi Reytingli in robocontest.uz (another cool CP website). The reason why I write this blog is also due to I was the unique participant who passed this problem in contest time, even though participants were much less than in codeforces.

I translate and give a short problem statements with my words:

Given an array $$$A$$$, for each query range $$$[L,R]$$$, returns the number of pairs $$$[l,r](L \leq l \leq r \leq R)$$$ whose gcd sum, $$$gcd(A_l, A_{l+1}, ..., A_r)$$$, is larger than 1. And also handle assignments of the form $$$A[i] = x$$$.

I will not show full codes, the more important thing is how you think when facing segment tree problems.

Expand the following editorial when you are ready.

Wrote in last

Thanks for your time reading to here. This is my first blog and hope it can help readers understand how segment trees work better. If find any incorrect places, feel free to let me know.

Read more segment tree problems in this classic cf blog.

Full text and comments »

  • Vote: I like it
  • +15
  • Vote: I do not like it