maomao90's blog

By maomao90, history, 21 month(s) ago, In English

Recently, while doing Universal Cup Round 5, I got stuck on problem A, a tree problem, because I realized that my solution required way too much memory. However, after the contest, I realized that there was a way to reduce the memory usage significantly using HLD. So here I am with my idea...

Structure of Tree DP

Most tree DP problems follow the structure below.

struct S {
    // return value of DP
};
S init(int u) {
    // initialise the base state of dp[u]
}
S merge(S left, S right) {
    // returns the state obtained by merging the accumulated state `left`
    // with the state `right` of the next child
}
S dp(int u, int p) {
    S res = init(u);
    for (int v : adj[u]) {
        if (v == p) continue;
        res = merge(res, dp(v, u));
    }
    return res;
}
int main() {
    dp(1, -1);
}

An example of a tree DP using this structure is maximum independent set (MIS).

Code

Suppose struct $$$S$$$ requires $$$|S|$$$ bytes and our tree has $$$N$$$ vertices. Then this naive implementation of tree DP requires $$$O(N\cdot |S|)$$$ memory, as the res of every ancestor is stored in the recursion stack as we recurse down to the leaves. This is fine for many problems since usually $$$|S| = O(1)$$$. However, in the problem above, $$$|S| = 25^2\cdot 24$$$ bytes and $$$N = 10^5$$$, which requires around $$$1.5$$$ gigabytes of memory, far exceeding the memory limit of $$$256$$$ megabytes. Below, I will show a way to use only $$$O(N + |S|\log N)$$$ memory.

Optimization

We try to make use of the idea of HLD and visit the vertex with the largest subtree size first.

struct S {
    // return value of DP
};
S init(int u) {
    // initialise the base state of dp[u]
}
S merge(S left, S right) {
    // returns the state obtained by merging the accumulated state `left`
    // with the state `right` of the next child
}
int sub[MAXN];
void getSub(int u, int p) {
    sub[u] = 1;
    pair<int, int> heavy = {-1, -1};
    for (int i = 0; i < (int) adj[u].size(); i++) {
        int v = adj[u][i];
        if (v == p) continue;
        getSub(v, u);
        sub[u] += sub[v];
        heavy = max(heavy, {sub[v], i});
    }
    // make the vertex with the largest subtree size the first
    if (heavy.first != -1) {
        swap(adj[u][0], adj[u][heavy.second]);
    }
}
S dp(int u, int p) {
    // do not initialize res yet
    S res;
    bool hasInit = false;
    for (int v : adj[u]) {
        if (v == p) continue;
        S tmp = dp(v, u);
        if (!hasInit) {
            res = init(u);
            hasInit = true;
        }
        res = merge(res, tmp);
    }
    if (!hasInit) {
        // u is a leaf
        res = init(u);
    }
    return res;
}
int main() {
    getSub(1, -1);
    dp(1, -1);
}

If we analyze the memory complexity properly, we will realize that it becomes $$$O(N + |S|\log N)$$$. The $$$O(N)$$$ comes from storing the subtree sizes, and the $$$O(|S|\log N)$$$ comes from the DP itself. (Note that this assumes an uninitialized S is cheap, e.g. its bulk lives in heap-allocated containers such as vector that only init allocates; a fixed-size array member would still occupy $$$|S|$$$ bytes of stack space in every frame.)

Proof

The two main changes from our naive DP are that we initialize res only after visiting the first child, and that we visit the child with the largest subtree size first. Recalling the definitions used in HLD, we call an edge $$$p_u \rightarrow u$$$ a heavy edge if $$$u$$$ is the child with the largest subtree size among all children of $$$p_u$$$; all other edges are light edges. It is well known that the path from the root to any vertex of the tree may traverse arbitrarily many heavy edges, but at most $$$O(\log N)$$$ light edges.

Making use of this idea: since we visit the heavy child first and only initialize res after returning from the first child, the res of a vertex is never live on the recursion stack while we are inside its heavy child. A vertex's res is only live while we recurse into its later children, i.e. across light edges. Since any root-to-vertex path contains at most $$$O(\log N)$$$ light edges, at most $$$O(\log N)$$$ copies of res are live at once, giving $$$O(|S|\log N)$$$ memory.

My solution to the above problem

Conclusion

I have not seen this idea anywhere before and only came up with it recently during the Universal Cup. Please tell me if it is a well-known technique or if there are some flaws in my analysis. Hope that this will be helpful to everyone!



»
21 month(s) ago, # |

This is one of the most beautiful techniques I've ever seen, hands down. I would have never thought you could do something like this.

»
21 month(s) ago, # |

Nice! But shouldn't we first go to the child, and only then initialise our base state?

auto save = dp(v, u);
if (!hasInit) {
    res = init(u);
    hasInit = true;
}
res = merge(res, save);
»
21 month(s) ago, # |

This technique is also mentioned in the editorial of 103119I - Nim Cheater, and it's the intended solution for this problem.

Also, I believe this should be the correct source code.

S dp(int u, int p) {
    // do not initialize yet
    S res;
    bool hasInit = false;
    for (int v : adj[u]) {
        if (v == p) continue;
        if (!hasInit) {
            res = dp(v, u);
            res = merge(init(u), res);
            hasInit = true;
            continue;
        }
        res = merge(res, dp(v, u));
    }
    if (!hasInit) {
        res = init(u);
    }
    return res;
}

If the tree is a bamboo, you would otherwise call init() for all nodes before returning from the leaf; therefore, it would be $$$O(N)$$$ memory instead of the $$$O(\log N)$$$ we are targeting.

»
21 month(s) ago, # |

Is trivial == well-known?

»
21 month(s) ago, # |

Thanks a lot for sharing this with us. At first, it might look like it doesn't have much of a use case. However, it is actually helpful. For example, there was a tree DP problem in my second NOI (National OI). It was quite simple, but it required $$$O(nk)$$$ memory where $$$n \leq 10^5$$$, $$$k \leq 200$$$, and the memory limit was 32 MB, which is why the main problem was reducing memory usage. Even though other people managed to squeeze in wrong solutions because of bad test cases, I tried many different methods, but I couldn't get AC. If I knew this back then, I would've solved it easily and won a gold medal.

  • »
    »
    21 month(s) ago, # ^ |

    okaragul not giving a fuck and inventing this in like 30 seconds (In my defense I was the one lucky enough to teach him HLD in the first place, so I'm taking full credit)

»
21 month(s) ago, # |

Always wrote tree dp using hld, but never thought it actually optimized something

»
21 month(s) ago, # |

A maybe related blog post (as for solving problem A in ucup): https://codeforces.me/blog/entry/67001
In some sense, centroid decomposition can also help.

»
21 month(s) ago, # |

Why are we calling getSub() for the root only?

  • »
    »
    21 month(s) ago, # ^ |

Oops, I forgot to call the function inside the for loop. It was meant to be a DFS, by the way.

»
21 month(s) ago, # |

I think it's more-or-less intended solution of ICPC WF 2020 problem B

»
21 month(s) ago, # |

Wasn't this trick used in IOI 2009 — Regions? Also, as Alperent said, in our national OI.

»
21 month(s) ago, # |

I think your getSub function doesn't really calculate subtree sizes :)

»
17 months ago, # |

I tried to solve the problem using choose(25, 2) DFSs, and instead of the memory problem you faced, I got stuck being unable to make the code fast enough to pass the time limit. One trick I came up with is to re-index the nodes according to the pre-order of the tree. I noticed that my code does many parent-child lookups, and if the indices are not randomly assigned, some time can be saved.

TLE code

AC code

In short, I added the following:

	int tin[NMAX], tim;

	void dfs(int u, int p){
		tin[u] = tim++;
		for(auto v : adj[u]) if(v != p){
			dfs(v, u);
		}
	}

{
	dfs(0, -1);
			
	vi nvec(n);
	vector<vi> nadj(n);

	for(int i = 0; i < n; i++){
		nvec[tin[i]] = vec[i];
	}

	for(int i = 0; i < n; i++){
		for(auto v : adj[i]){
			nadj[tin[i]].push_back(tin[v]);
		}
	}

	vec.swap(nvec);
	adj.swap(nadj);
}