maomao90's blog

By maomao90, history, 21 month(s) ago, In English

Recently, while doing Universal Cup Round 5, I got stuck on problem A, a tree problem, because I realized that my solution required way too much memory. However, after the contest, I realized that there was a way to reduce the memory usage significantly using HLD. So here I am with my idea...

Structure of Tree DP

Most tree DP problems follow the structure below.

struct S {
    // return value of DP
};
S init(int u) {
    // initialise the base state of dp[u]
}
S merge(S left, S right) {
    // returns the state obtained by merging the accumulated state `left`
    // with the state `right` of the next child
}
S dp(int u, int p) {
    S res = init(u);
    for (int v : adj[u]) {
        if (v == p) continue;
        res = merge(res, dp(v, u));
    }
    return res;
}
int main() {
    dp(1, -1);
}

An example of a tree DP using this structure is maximum independent set (MIS).

Code

Suppose struct $$$S$$$ requires $$$|S|$$$ bytes and our tree has $$$N$$$ vertices. Then this naive implementation of tree DP requires $$$O(N\cdot |S|)$$$ memory, as the res of every ancestor is stored in the recursion stack as we recurse down to the leaves. This is fine for many problems since usually $$$|S| = O(1)$$$. However, in the problem above, $$$|S| = 25^2\cdot 24$$$ bytes and $$$N = 10^5$$$, which requires around $$$1.5$$$ gigabytes of memory, far exceeding the memory limit of $$$256$$$ megabytes. Below, I will show a way to use only $$$O(N + |S|\log N)$$$ memory.

Optimization

We try to make use of the idea of HLD and visit the vertex with the largest subtree size first.

struct S {
    // return value of DP
};
S init(int u) {
    // initialise the base state of dp[u]
}
S merge(S left, S right) {
    // returns the state obtained by merging the accumulated state `left`
    // with the state `right` of the next child
}
int sub[MAXN];
void getSub(int u, int p) {
    sub[u] = 1;
    pair<int, int> heavy = {-1, -1};
    for (int i = 0; i < (int) adj[u].size(); i++) {
        int v = adj[u][i];
        if (v == p) continue;
        getSub(v, u);
        sub[u] += sub[v];
        heavy = max(heavy, {sub[v], i});
    }
    // make the vertex with the largest subtree size the first
    if (heavy.first != -1) {
        swap(adj[u][0], adj[u][heavy.second]);
    }
}
S dp(int u, int p) {
    // do not initialize res yet
    S res;
    bool hasInit = false;
    for (int v : adj[u]) {
        if (v == p) continue;
        S tmp = dp(v, u);
        if (!hasInit) {
            res = init(u);
            hasInit = true;
        }
        res = merge(res, tmp);
    }
    if (!hasInit) {
        // u is a leaf
        res = init(u);
    }
    return res;
}
int main() {
    getSub(1, -1);
    dp(1, -1);
}

If we analyze the memory complexity properly, we will realize that it becomes $$$O(N + |S|\log N)$$$. The $$$O(N)$$$ comes from storing the subtree sizes, and the $$$O(|S|\log N)$$$ comes from the DP itself. (Note that this assumes an uninitialized S is cheap, e.g. its bulk lives in heap-allocated containers such as vector that only init allocates; a fixed-size array member would still occupy $$$|S|$$$ bytes of stack space in every frame.)

Proof

The two main changes from our naive DP are that we initialize res only after visiting the first child, and that we visit the child with the largest subtree size first. Recalling the definitions used in HLD, we call an edge $$$p_u \rightarrow u$$$ a heavy edge if $$$u$$$ is the child with the largest subtree size among all children of $$$p_u$$$; all other edges are light edges. It is well known that the path from the root to any vertex of the tree may traverse arbitrarily many heavy edges, but at most $$$O(\log N)$$$ light edges.

Making use of this idea: since we visit the heavy child first and only initialize res after returning from the first child, the res of a vertex is never live on the recursion stack while we are inside its heavy child. A vertex's res is only live while we recurse into its later children, i.e. across light edges. Since any root-to-vertex path contains at most $$$O(\log N)$$$ light edges, at most $$$O(\log N)$$$ copies of res are live at once, giving $$$O(|S|\log N)$$$ memory.

My solution to the above problem

Conclusion

I have not seen this idea anywhere before and only came up with it recently during the Universal Cup. Please tell me if it is a well-known technique or if there are some flaws in my analysis. Hope that this will be helpful to everyone!



»
21 month(s) ago, # |

This is one of the most beautiful techniques I've ever seen, hands down. I would have never thought you could do something like this.

»
21 month(s) ago, # |

Nice! But shouldn't we first go to the child, and only then initialise our base state?

auto save = dp(v, u);
if (!hasInit) {
    res = init(u);
    hasInit = true;
}
res = merge(res, save);
»
21 month(s) ago, # |

This technique is also mentioned in the editorial of 103119I - Nim Cheater, and it's the intended solution for this problem.

Also, I believe this should be the correct source code.

S dp(int u, int p) {
    // do not initialize yet
    S res;
    bool hasInit = false;
    for (int v : adj[u]) {
        if (v == p) continue;
        if (!hasInit) {
            res = dp(v, u);
            res = merge(init(u), res);
            hasInit = true;
            continue;
        }
        res = merge(res, dp(v, u));
    }
    if (!hasInit) {
        res = init(u);
    }
    return res;
}

If the tree is a bamboo, you would otherwise call init() for all nodes before returning from the leaf; therefore, it would be $$$O(N)$$$ memory instead of the $$$O(\log N)$$$ we are targeting.

»
21 month(s) ago, # |

Is trivial == well-known?

»
21 month(s) ago, # |

Thanks a lot for sharing this with us. At first, it might look like it doesn't have much of a use case. However, it is actually helpful. For example, there was a tree DP problem in my second NOI (National OI). It was quite simple, but it required $$$O(nk)$$$ memory where $$$n \leq 10^5$$$, $$$k \leq 200$$$, and the memory limit was 32 MB, which is why the main problem was reducing memory usage. Even though other people managed to squeeze in wrong solutions because of bad test cases, I tried many different methods, but I couldn't get AC. If I knew this back then, I would've solved it easily and won a gold medal.

  • »
    »
    21 month(s) ago, # ^ |

    okaragul not giving a fuck and inventing this in like 30 seconds (In my defense I was the one lucky enough to teach him HLD in the first place, so I'm taking full credit)

»
21 month(s) ago, # |

Always wrote tree dp using hld, but never thought it actually optimized something

»
21 month(s) ago, # |

A maybe related blog post (as for solving problem A in ucup): https://codeforces.me/blog/entry/67001
In some sense, centroid decomposition can also help.

»
21 month(s) ago, # |

Why are we calling getSub() for the root only?

  • »
    »
    21 month(s) ago, # ^ |

Oops, I forgot to call the function inside the for loop. It was meant to be a DFS, by the way.

»
21 month(s) ago, # |

I think it's more-or-less intended solution of ICPC WF 2020 problem B

»
21 month(s) ago, # |

Wasn't this trick used in IOI 2009 — Regions? Also, as Alperent said, in our national OI.

»
21 month(s) ago, # |

I think your getSub function doesn't really calculate subtree sizes :)

»
17 months ago, # |

I tried to solve the problem using choose(25, 2) DFSs, and instead of the memory problem you faced, I got stuck being unable to make the code fast enough to pass the time limit. One trick I came up with is to re-index the nodes according to the pre-order of the tree. I noticed that my code does many parent-child lookups, and if the indices are not randomly assigned, some time can be saved.

TLE code

AC code

In short, I added the following:

	int tin[NMAX], tim;

	void dfs(int u, int p){
		tin[u] = tim++;
		for(auto v : adj[u]) if(v != p){
			dfs(v, u);
		}
	}

{
	dfs(0, -1);
			
	vi nvec(n);
	vector<vi> nadj(n);

	for(int i = 0; i < n; i++){
		nvec[tin[i]] = vec[i];
	}

	for(int i = 0; i < n; i++){
		for(auto v : adj[i]){
			nadj[tin[i]].push_back(tin[v]);
		}
	}

	vec.swap(nvec);
	adj.swap(nadj);
}