An efficient way to strengthen up your segment tree.

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	156
6	adamant	152
6	djm03178	152
8	Qingyu	151
9	luogu_official	149
10	awoo	147

I'm going to talking about a coding style in segment tree which I used for a long time. Not only is it more systematically, but it also support more kind of segment tree.

1. Old style

Assume that we are making a segment tree support operators assign elements in a range (assign-range L R X) and get max element in a range (max-range L R).

I found a large number of people use the following way to build segment tree:

void lazy_update(int u, int ll, int rr) {
  if (Lazy[u]==-1) return; // assume that -1 is unused value
  if (ll!=rr) Lazy[u*2]=Lazy[u*2+1]=Lazy[u];
  Max[u]=Lazy[u]; Lazy[u]=-1;
}

void assign_range(int u, int ll, int rr, int L, int R, int X) {
  lazy_update(u, ll, rr); // call a function to lazy-update node u (which operate segment ll..rr)
  if (ll>R || L>rr) return;
  if (L<=ll && rr<=R) { 
    Lazy[u]=X; 
    lazy_update(u, ll, rr); 
    return; 
  }
  assign_range(2*u, ll, (ll+rr)/2, L, R, X);
  assign_range(2*u+1, (ll+rr)/2+1, rr, L, R, X);
  Max[u] = max(Max[2*u], Max[2*u+1]);
}

int max_range(int u, int ll, int rr, int L, int R) {
  lazy_update(u, ll, rr);
  if (ll>R || L>rr) return -oo;
  if (L<=ll && rr<=R) return Max[u];
  int Max1 = max_range(2*u, ll, (ll+rr)/2, L, R);
  int Max2 = max_range(2*u+1, (ll+rr)/2+1, rr, L, R);
  return max(Max1, Max2);
}

int main(){
  ...
  cin >> L >> R >> X;
  assign_range(1, 1, n, L, R, X);
  ...
  cin >> L >> R;
  cout << max_range(1, 1, n, L, R) << endl;
  ...
}

There are some duplicated code which need to be reduce:

int u, int ll, int rr : appear in both three function declarations
lazy_update(u, ll, rr); : appear in beginning of both two functions
(2*u, ll, (ll+rr)/2 and 2*u+1, (ll+rr)/2+1, rr : they appear four times.

They will be reduced efficienly in new style.

2. New style

Forcing people to use a personal style is a bad idea. I share with you this style to give you a suggestion in coding segment tree.

struct node {
  int ll, rr, id;

  node(int L, int R, int X) { 
    ll=L, rr=R, id=X; 
    lazy_update(); 
  }

  node left() 
    { return node(ll, (ll+rr)/2, id*2); }
  node right() 
    { return node((ll+rr)/2+1, rr, id*2+1); }

  void lazy_update() { 
    if (Lazy[id]==-1) return; // assume that -1 is unused value
    if (ll!=rr) Lazy[id*2]=Lazy[id*2+1]=Lazy[id];
    Max[id]=Lazy[id]; Lazy[id]=-1;
  }
  
  void assign_range(int L, int R, int X){ // don't need to call lazy_update() at the beginning
    if (ll>R || L>rr) return ;
    if (L<=ll && rr<=R) 
      { Lazy[id]=X; lazy_update(); return ; }
    left().assign_range(L, R, X); // easier to read
    right().assign_range(L, R, X);
    Max[id]=max(Max[id*2], Max[id*2+1]);
  }
  
  int max_range(int L, int R) {
    if (ll>R || L>rr) return -oo;
    if (L<=ll && rr<=R) return Max[id];
    int Max1 = left().max_range(L, R);
    int Max2 = right().max_range(L, R);
    return max(Max1, Max2);
  }
};

int main(){
  ...
  cin >> L >> R >> X;
  node(1, n, 1).assign_range(L, R, X); // easier to read
  ...
  node Root(1, n, 1); // use this if we have to write to much node(1, n, 1)
  cin >> L >> R;
  cout << Root.max_range(L, R) << endl;
  ...
}

Let's list advantages of new style:

Reduce duplicated code
Average lines' length and functions' length is reduced.
Easier to read

The more complex problem, the more efficient this new style.

3. Additionally feature in new style

Assume that we are facing this problem:

There is a string S which contains lowercase letters. Execute following operators on this string:

check i j k: Check if S[i..i+k-1] is equal to S[j..j+k-1].
delete k: Delete k-th element.

To solve above problem, we need to build a segment tree which support two operators : hash-range L R return hash value of S[L..R], and remove X remove X-th element. I called the following structure indexable segment tree. In each node, we maintain hash value and size of this node.

#define long long long

struct node {
  int ll, rr, id;
  node(int L, int X) 
    { ll=L, rr=L+Size[X]-1, id=X; }
  node left() 
    { return node(ll, id*2); }
  node right() 
    { return node(ll+Size[id*2], id*2+1); }

  void update() {
    Hash[id]=sum_hash(Hash[id*2], Hash[id*2+1], Size[id*2+1]);
    Size[id]=Size[id*2] + Size[id*2+1];
  }
  
  long hash_range(int L, int R) {
    if (ll>R || L>rr) return 0LL;
    if (L<=ll && rr<=R) return Hash[id];
    long Hash1 = left().hash_range(L, R);
    long Hash2 = right().hash_range(L, R);
    return sum_hash(Hash1, Hash2, Size[id*2+1]); // easy to implement sum_hash()
  }
  
  void remove(int X){
    if (ll>X || X>rr) return ;
    if (ll==rr) { Size[id]=0; Hash[id]=0; return; }
    right().remove(X); // if call left().remove(X) first and succeeded, ...
    left().remove(X); // ... we are not allowed to call right().remove(X)
    update();
  }

  void build(char a[], int L, int R) { // Note: ll, rr are not valid now but id
    if (L==R) 
      { Size[id]=1; Hash[id]=a[L]; return ; }
    left().build(a, L, (L+R)/2);
    right().build(a, (L+R)/2+1, R);
    update();
  }
};

char a[N]; // zero-based;

main(){
  cin >> a; n=strlen(a);
  node(0, 1).build(a, 0, n-1); // initialize segment tree
  ...
  cin >> i >> j >> k;
  long Hash1 = node(0, 1).hash_range(i, i+k-1);
  long Hash2 = node(0, 1).hash_range(j, j+k-1);
  cout << (Hash1 == Hash2 ? "Yes" : "No") << endl;
  ...
  cin >> k;
  node(0, 1).remove(k);
  ...
}

In the above code, managed segment of each nodes in segment tree is consecutively changed. And struct node become very flexible and efficient.

In this example, we are not allowed to use node Root(0, 1) because Root content is always change (rr field). If you want to use Root variable, you should update Root after every remove query: node(0, 1).remove(k); Root=node(0,1); (and I don't like this).

4. Conclusion

It is hard to write both long and detailed blog. Therefore, you can comment anything which you didn't understand well. I will reply (or update this blog if it is necessary).

Comments (17)

Show archived | Write comment?

dnk

10 years ago, # |

+13

Your approach is worth a try. As I see, this style will cost us three more integers, won't it? So, we should not use it in problems which have strict memory limit.

It's not relevant to your topic. However, your code seems to be short mostly because you put many statements and all the brackets in one line, which make me feel pretty uncomfortable. It's impossible to manipulate other people's coding style but I think that you should expand your code before posting to make it easier to read.

→ Reply

kien_coi_1997

10 years ago, # ^ |

+16

"3 ints", it is 3*log(n) ints, not 3*n ints, because depth of each query is log(n).

← Rev. 4 →

I will expand my code now.

P/S: Done

Truba

Good blog! Is it possible to implement 2nd problem with normal SegTree?

Yes. There are two typical solution.

1) Create another operator to find real position of indexes

For example, if we have 5 elements, XXXXX, after operator delete 3, it become XX-XX. From this time, operator hash-range 1 3 is actually hash-range 1 4. To do this, we can add a function into segment tree or create a new BIT. It is easy to code but expensive.

2) Still use old style

long hash_range(int u, int ll, int rr, int L, int R) {
  ...
  hash_range(u*2, ll, ll+Size[u*2]-1, L, R);
  hash_range(u*2+1, ll+Size[u*2], Size[u*2+1], L, R);
}

Actually, it seems not too bad. However, I don't like to copy-paste them into function remove again.

Thanks for your answer! What does the function sum_hash or how can i implement it?

Assume P and Q is a string.

P have length m and hash value is p.
Q have length n and hash value is q.

Assume that we use module 1000000007 and use M as factor. (e.g. p = (P[0]*M^(m-1) + P[1]*M^(m-2) + ... + P[p-1] ) % 1000000007)

Hash value of P+Q is (p*(M^n)+q) % 1000000007.

YuukaKazami

Why not just use pointer? Like this http://codeforces.me/contest/438/submission/6770765 .

You used additional 2*n pointers (pl, pr) and 2*n integers (l, r). Used memory is doubled (for precise, 7/3).

Umm... If you like to optimize the constant then you have done a great job.

But actually unlike in OI competition, in CF or TC the running time constant is not that important．It is more important to code them fast and correctly. I used to code Segment Tree in the way like you when I was young, but then I realized that it scarfice the 'user-experience' for speedness.

And by pointer you can also easier generalized it to support functional operation too.

hust.111qqz

9 years ago, # ^ |

Is there any limit using this style ? I get MLE on hdu 1754 hdu 1754 TAT

__builtin__wolfy

← Rev. 2 →

One way to handle the int u, int ll, int rr that appear in function calls is using default arguments: int query(int L, int R, int u = 1, int ll = 0, int rr = n - 1);. A call to the function looks like: query(L, R);

does it work?. n is undetermined in compile time.

It works.

It's called default argument.

HekpoMaH

← Rev. 3 →

Hi, I really enjoy that coding style, but the root should be "node root(1,n,1);", because the left and right elements are 1 and n and the id is 1. what do you think?

Oh, yes. You are right. Thank you. I will update now.

EDIT: Done.

kien_coi_1997's blog

1. Old style

2. New style

3. Additionally feature in new style

4. Conclusion