on minimizing the expected number of recursive calls for segment tree

#	User	Rating
1	tourist	3857
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3463
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	166
2	-is-this-fft-	161
3	Qingyu	160
4	Dominater069	158
5	atcoder_official	157
6	adamant	155
7	Um_nik	152
8	djm03178	151
9	luogu_official	150
10	awoo	148

For the longest time, this comment https://codeforces.me/blog/entry/112755#comment-1004966 was bugging me (the part about how the 'complete' segment tree is 20% slower (complete, because it’s a complete binary tree)). Instead of benchmarking, this blog calculates the expected number of recursive calls visited between segment tree implementations.↵
↵
Thanks to [user:camc,2025-03-03] for giving valuable feedback.↵
↵
This blog will study 2 implementations of segment trees.↵
↵
<spoiler summary="'standard' segment tree">↵
~~~~~↵
void dfs(int v, int l, int r) { // [l, r)↵
  if (r - l == 1) {↵
    return;↵
  }↵
  int m = (l + r + 1) / 2;↵
  dfs(2 * v, l, m);↵
  dfs(2 * v + 1, m, r);↵
}↵
// dfs(1, 0, n);↵
~~~~~↵
</spoiler>↵
↵
And↵
↵
<spoiler summary="'complete' segment tree">↵
~~~~~↵
int split(int l, int r) {↵
  int pw2 = 1 << __lg(r - l);↵
  return min(l + pw2, r - pw2 / 2);↵
}↵
void dfs(int v, int l, int r) { // [l, r)↵
  if (r - l == 1) {↵
    return;↵
  }↵
  int m = split(l, r);↵
  dfs(2 * v, l, m);↵
  dfs(2 * v + 1, m, r);↵
}↵
// dfs(1, 0, n);↵
~~~~~↵
</spoiler>↵
↵
(complete, because it’s a complete binary tree); Also tourist uses [the iterative version of this](https://github.com/the-tourist/algo/blob/master/segtree/lazy.cpp).↵
↵
↵
Structure↵
------------------↵
↵
Consider the structures of 'standard' segment trees on n=1,2,3,...↵
↵
![ ](/predownloaded/60/aa/60aa955a4d3dfd6107da2f9b050870271930567d.png)↵
↵
(credit to [this tool](https://codeforces.me/blog/entry/132771) for making the pictures; [here](https://drive.google.com/drive/folders/1LoOD3Z9kCdVSjOixHs4DdfjLZAlhqQNg?usp=sharing) are the pictures in higher resolution)↵
↵
Notice going from one tree to the next, exactly one leaf node turns into an internal node, and “grows” 2 new leaves. Well consider the sequence of the nodes which “grow” 2 new leaves:↵
↵
1, 2, 3, 4, 6, 5, 7, 8, 12, 10, 14, 9, 13, 11, 15, …↵
↵
It is exactly this sequence https://oeis.org/A059893. But this sequence is about reversing bits, so how is it related?↵
↵
As you increment n, the child (left or right) of the root whose range-size increments will alternate (so it’s based on parity/LSB of n), then the child’s range-size is half of the root’s range-size and you repeat to determine which grandchild’s range-size increments (it also alternates), and so on. Eventually you get to a leaf, and this leaf’s range-size increments (i.e. “grows” 2 new leaves).↵
↵
If you split up the sequence into subarrays `[1,2)`, `[2,4)`, `[4,8)`, ..., `[2^(i-1),2^i)`,..., then:↵
↵
- the `i`-th subarray represents all nodes at depth `i`↵
- Every node at depth `i` grows its leaves before all nodes at depth `(i+1)`, and after all nodes at depth `(i-1)`↵
- So as n increases, the standard segment tree grows its leaves layer by layer, i.e. each layer is completed fully before the next layer’s leaves start growing.↵
- `abs(depth(leaf_i)-depth(leaf_j))<=1` for any pairs of leaves↵
↵
![ ](/predownloaded/9c/19/9c19348a41c51caaba10e73439d5e498e4c4638f.png)↵
↵
Point updates/queries↵
------------------↵
↵
Expected number of recursive calls of point update↵
↵
= expected depth of leaf node (where depth of root=1)↵
↵
= sum_of_leaf_depths/n↵
↵
So how to work out the sum of depths of leaves?↵
↵
Well there are `x` leaves of depth `__lg(n)+1` and `y` leaves of depth `__lg(n)+2`. We have:↵
↵
- `x+y=n`↵
- `y = 2*(bit_floor(n)-x)` because there are `bit_floor(n)-x` internal nodes at depth `__lg(n)+1`, each having 2 leaf-childs.↵
↵
Then sum of depths of leaves = `x*(__lg(n)+1)+y*(__lg(n)+2)`.↵
↵
Finally notice, this math is exactly the same for both the complete segment tree and the standard segment tree, so they have the same sum of leaf depths.↵
↵
Also for merge sort tree: (sum of array lengths) = (sum of leaf depths). So the respective merge sort tree’s have the same sum of array lengths.↵
↵
Range updates/queries↵
------------------↵
↵
Expected number of recursive calls of range update↵
↵
= total_recursive_calls_over_all_possible_updates/(n*(n+1)/2)↵
↵
<spoiler summary="Test comparing total number of recursive calls">↵
~~~~~↵
#include <bits/stdc++.h>↵
using namespace std;↵
↵
int split(int tl, int tr) {↵
  int pw2 = 1 << __lg(tr - tl);↵
  return min(tl + pw2, tr - pw2 / 2);↵
}↵
↵
void update_complete_tree(int l, int r, int tl, int tr, int& num_calls) {↵
  num_calls++;↵
  if (r <= tl || tr <= l) return;↵
  if (l <= tl && tr <= r) return;↵
  int tm = split(tl, tr);↵
  update_complete_tree(l, r, tl, tm, num_calls);↵
  update_complete_tree(l, r, tm, tr, num_calls);↵
}↵
↵
void update_standard_tree(int l, int r, int tl, int tr, int& num_calls) {↵
  num_calls++;↵
  if (r <= tl || tr <= l) return;↵
  if (l <= tl && tr <= r) return;↵
  int tm = (tl + tr + 1) / 2;↵
  update_standard_tree(l, r, tl, tm, num_calls);↵
  update_standard_tree(l, r, tm, tr, num_calls);↵
}↵
↵
int main() {↵
  for (int n = 1; n < 1100; n++) {↵
    int complete_calls = 0, standard_calls = 0;↵
    for (int l = 0; l < n; l++) {↵
      for (int r = l + 1; r <= n; r++) {↵
        update_complete_tree(l, r, 0, n, complete_calls);↵
        update_standard_tree(l, r, 0, n, standard_calls);↵
      }↵
    }↵
    assert(complete_calls <= standard_calls);  // unexpected!!!↵
    cout << n << ' ' << complete_calls << ' ' << standard_calls << endl;↵
  }↵
↵
  return 0;↵
}↵
~~~~~↵
</spoiler>↵
↵
I was blown away by this. I thought that the complete segment tree should have more recursive calls because the ranges aren’t split in the middle. But somehow this is wrong? Let’s see why.↵
↵
I was able to come up with this formula for the total number of recursive calls (derivation left to the reader haha).↵
↵
↵
<spoiler summary="formula">↵
~~~~~↵
#include <bits/stdc++.h>↵
using namespace std;↵
↵
int split(int tl, int tr) {↵
  // return (tl + tr + 1) / 2;  // try this too!↵
  // return tl + 1;  // the formula also works for unbalanced trees↵
  int pw2 = 1 << __lg(tr - tl);↵
  return min(tl + pw2, tr - pw2 / 2);↵
}↵
↵
void update(int l, int r, int tl, int tr, int& num_calls_naive) {↵
  num_calls_naive++;↵
  if (r <= tl || tr <= l) return;↵
  if (l <= tl && tr <= r) return;↵
  int tm = split(tl, tr);↵
  update(l, r, tl, tm, num_calls_naive);↵
  update(l, r, tm, tr, num_calls_naive);↵
}↵
↵
void dfs(int tl, int tr, int depth, int n, int& num_calls_fast) {↵
  num_calls_fast -= (tr - tl) * (tr - tl);↵
  if (tr - tl == 1) {↵
    num_calls_fast += depth * (2 * n + 3);↵
    return;↵
  }↵
  int tm = split(tl, tr);↵
  dfs(tl, tm, depth + 1, n, num_calls_fast);↵
  dfs(tm, tr, depth + 1, n, num_calls_fast);↵
}↵
↵
int main() {↵
  for (int n = 1; n <= 1000; n++) {↵
    int num_calls_naive = 0;↵
    for (int l = 0; l < n; l++) {↵
      for (int r = l + 1; r <= n; r++) {↵
        update(l, r, 0, n, num_calls_naive);↵
      }↵
    }↵
    int num_calls_fast = (-7 * n * n - 3 * n + 4) / 2;↵
    dfs(0, n, 1, n, num_calls_fast);↵
    assert(num_calls_naive == num_calls_fast);↵
    cout << n << "    " << num_calls_naive << ' ' << num_calls_fast << endl;↵
  }↵
  return 0;↵
}↵
~~~~~↵
</spoiler>↵
↵
Comparing this formula for complete versus standard segment tree:↵
↵
- `(-7n^2-3n+4)/2` is the same↵
- sum of `[depth*(2n+3)]` over leaves is the same as they both have the same number of leaves at depths `__lg(n)+1` and `__lg(n)+2`↵
↵
The only difference is we subtract `f(n) = n^2 + f(left child range-size) + f(right child range-size)` from the total number of calls. And `f(n)` increases when `abs(left child range-size - right child range-size)` increases, i.e. as the tree becomes less balanced.↵
↵
![ ](/predownloaded/36/0d/360d92a8f3d340b7491fdfa319a573b2471b65ec.png)↵
↵
To me, this result is unintuitive. I’d be interested if anyone can come up with some intuition for this.↵
↵
But then why is the complete segment tree slower...↵
------------------↵
↵
Calculating the midpoint has a larger constant.↵
↵
- https://judge.yosupo.jp/submission/270739 288 ms 15.82 Mib — complete segment tree↵
- https://judge.yosupo.jp/submission/270737 298 ms 16.27 Mib — standard segment tree, but with artificially inflated constant factor of calculating midpoint↵

	Rev.	Lang.	By	When	Δ	Comment
	en2		lrvideckis	2025-03-03 04:24:24	132
	en1		lrvideckis	2025-03-03 04:13:49	8029	Initial revision (published)

History