spike1236's blog

By spike1236, history, 4 days ago, In English

Hey Codeforces community!

I wanted to share something cool with you all — I've been working on WnSOJ (Work and Solve Online Judge), a platform that is not just another online judge where you solve problems — you can also find a job here! Yes, you read that right. Solve problems, learn algorithms, and get hired!

What’s in WnSOJ?

  • Robust testing system – run your solutions in isolated environments.
  • Categorized problemset – find tasks based on topics and difficulty.
  • Editorials & solutions – each problem comes with a C++ solution and explanation.
  • Submissions & statistics – track your progress and improve.
  • Job search feature – developers can find jobs, companies can post them.
  • User profiles – keep track of your submissions and achievements.

Check it out: wnsoj.xyz!

Spoiler

A Bit About the Project

The story of this project spans four years — originally, it was built with Flask, but this year, I decided to switch to Django, and for the past three months, I’ve been working extensively on it to make it better, more scalable, and feature-rich.

I believe the Codeforces community can benefit from this journey because my commit history is essentially a real-time record of how I learned Django and Celery (a framework that helps run a testing system in parallel with the server). If you're interested in back-end development, you can explore the code and follow along to learn it yourself.

If you're curious, the full source code is available here, so feel free to explore, learn, or contribute. If you like the project, consider starring the repo and following me on GitHub!

Would love to hear your thoughts! Thank you for reading!

Full text and comments »

  • Vote: I like it
  • +47
  • Vote: I do not like it

By spike1236, history, 12 days ago, In English

Hello, Codeforces community! Today, I want to share my solution to a problem from recent contest: Binary Subsequence Value Sum. Rather than using polynomials and FFT, I tried to approach this problem from a different viewpoint — probability! Now, let's get started with the solution itself.


1. Deterministic definition of the score

Suppose you have a binary string $$$v$$$ of length $$$m$$$. Interpret each character as a “contribution”: assign $$$+1$$$ for a ‘1’ and $$$-1$$$ for a ‘0’. Define

$$$ d = \text{(number of 1's)} - \text{(number of 0's)} $$$

so that $$$d$$$ is the overall “imbalance” of $$$v$$$.

The score of $$$v$$$ is defined as

$$$ \text{score}(v) = \max_{1 \le i < m} \Bigl( F(v,1,i) \cdot F(v,i+1, m) \Bigr), $$$

where $$$F(v,1,i)$$$ is imbalance of prefix and $$$F(v,i+1,m)$$$ is imbalance of suffix; if the prefix imbalance is $$$S$$$, then the suffix is $$$d-S$$$, and the product is

$$$ P(S) = S\cdot (d-S). $$$

For fixed $$$d$$$, the product $$$S(d-S)$$$ is a quadratic function in $$$S$$$. It is maximized when

$$$ S = \frac{d}{2}. $$$

However, since $$$S$$$ must be an integer (and may not be exactly $$$d/2$$$ when $$$d$$$ is odd), the best we can do is to take

$$$ S = \left\lfloor\frac{d}{2}\right\rfloor \quad \text{and} \quad d-S = \left\lceil\frac{d}{2}\right\rceil. $$$

Thus the maximum product is

$$$ \left\lfloor\frac{d}{2}\right\rfloor \cdot \left\lceil\frac{d}{2}\right\rceil. $$$

So in one compact expression, the score of $$$v$$$ can be written as

$$$ \text{score}(v) = \frac{d^2 - (d \bmod 2)}{4}. $$$

2. Counting scores over all subsequences

Now, consider a fixed binary string $$$s$$$ of length $$$n$$$ with $$$R$$$ ones and $$$Z = n-R$$$ zeros.

A nonempty subsequence of $$$s$$$ is obtained by choosing, independently for each position, whether to include that bit. (There are $$$2^n$$$ subsequences in total, including the empty one—but the empty subsequence contributes nothing to the score.)

Imagine that for each bit in $$$s$$$ we make an independent choice with probability $$$1/2$$$ of including it. Then every subsequence occurs with equal probability $$$(\frac{1}{2})^n$$$.

For any subsequence $$$v$$$, let its imbalance be

$$$ d' = \text{(# of ones chosen)} - \text{(# of zeros chosen)}. $$$

Then the score of that subsequence is, by our previous argument,

$$$ \text{score}(v) = \frac{(d')^2 - ((d') \bmod 2)}{4}. $$$

We want to sum the scores over all nonempty subsequences. Rather than summing directly, we use the idea of expectation.

Recall that for any function $$$f(v)$$$ defined on the set of subsequences (with the uniform probability distribution), the average value is:

$$$ E[f(v)] = \frac{1}{2^n} \sum_{v} f(v). $$$

Multiplying by $$$2^n$$$ recovers the sum over all subsequences:

$$$ \sum_{v} f(v) = 2^n \cdot E[f(v)]. $$$

3. Calculating the Expectation $$$E[(d')^2]$$$

For each bit of $$$s$$$, define a random variable $$$X_i$$$ as follows:

  • If the $$$i$$$th bit is a ‘1’ and is chosen, $$$X_i=+1$$$; if it’s not chosen, $$$X_i=0$$$.
  • If the $$$i$$$th bit is a ‘0’ and is chosen, $$$X_i=-1$$$; if it’s not chosen, $$$X_i=0$$$.

Then the imbalance for a given subsequence is

$$$ d' = \sum_{i=1}^n X_i. $$$
  • For ‘1’: $$$E[X_i]= \frac{1}{2}$$$ (since it contributes $$$+1$$$ with probability $$$1/2$$$).
  • For ‘0’: $$$E[X_i]= -\frac{1}{2}$$$.

Thus, if there are $$$R$$$ ones and $$$Z$$$ zeros,

$$$ E[d'] = R\cdot\frac{1}{2} + Z\cdot\left(-\frac{1}{2}\right) = \frac{R - Z}{2} = \frac{2R-n}{2}. $$$

Each $$$X_i$$$ is a Bernoulli-type variable with two outcomes (nonzero with probability $$$1/2$$$ and zero with probability $$$1/2$$$). In either case (for ‘1’ or ‘0’), one can show that

$$$ \operatorname{Var}(X_i) = \frac{1}{4}. $$$

Since the choices are independent,

$$$ \operatorname{Var}(d') = \sum_{i=1}^n \operatorname{Var}(X_i) = \frac{n}{4}. $$$

Recall that

$$$ E[(d')^2] = \operatorname{Var}(d') + (E[d'])^2. $$$

Thus,

$$$ E[(d')^2] = \frac{n}{4} + \left(\frac{2R-n}{2}\right)^2 = \frac{n}{4} + \frac{(2R-n)^2}{4} = \frac{(2R-n)^2 + n}{4}. $$$

Then the expected value of $$$\frac{(d')^2}{4}$$$ is

$$$ E\!\left[\frac{(d')^2}{4}\right] = \frac{1}{4} \cdot \frac{(2R-n)^2+n}{4} = \frac{(2R-n)^2+n}{16}. $$$

So if we (naively) approximated the score of every subsequence by $$$\frac{(d')^2}{4}$$$, then the total over all $$$2^n$$$ subsequences would be:

$$$ 2^n \cdot \frac{(2R-n)^2+n}{16} = 2^{n-4}\Bigl((2R-n)^2+n\Bigr). $$$

4. $$$d'$$$ parity correction

The true score of a subsequence with imbalance $$$d'$$$ is

$$$ \frac{(d')^2 - (d'\bmod 2)}{4}, $$$

which means that whenever $$$d'$$$ is odd, our naive expression $$$\frac{(d')^2}{4}$$$ is an overestimate by $$$\frac{1}{4}$$$.

A classical combinatorial fact (which one may also derive via the bruteforce check, as I did) is that the total number of subsequences with an odd imbalance is exactly half of all subsequences. Thus, the total “overcount” when summing $$$\frac{(d')^2}{4}$$$ is:

$$$ \frac{1}{4} \cdot \bigl(\text{number of subsequences with odd } d'\bigr) = \frac{1}{4} \cdot 2^{n-1} = 2^{n-3}. $$$

Therefore, by linearity of expectation, the correct total sum $$$S$$$ over all nonempty subsequences is:

$$$ S = \left(2^n \cdot E\!\left[\frac{(d')^2}{4}\right]\right) - 2^{n-3}. $$$

Substitute the expectation we computed:

$$$ S = 2^n \cdot \frac{(2R-n)^2+n}{16} - 2^{n-3}. $$$

5. Simplifying to the Final Formula

$$$ S = 2^n \cdot \frac{(2R-n)^2+n}{16} - 2^{n-3} = $$$
$$$ 2^{n-4} \cdot ((2R-n)^2+n) - 2^{n-3}. $$$

Factor $$$2^{n-4}$$$ out of both terms, and we obtain the final closed-form formula:

$$$ S = 2^{n-4} ((2R-n)^2+n-2). $$$

My solution: 309852412. Thank you for reading!

Full text and comments »

  • Vote: I like it
  • +112
  • Vote: I do not like it

By spike1236, history, 4 weeks ago, translation, In English

This article explores the concept of virtual trees, proves their key properties, describes an efficient algorithm for their construction, and then applies the method to solve a specific problem using dynamic programming (DP) on trees.

Introduction

Many tree-related problems require working with a subset of vertices while preserving the tree structure induced by their pairwise lowest common ancestors (LCA). The concept of a virtual tree allows us to transition from the original tree $$$T$$$ with $$$N$$$ vertices to a substructure whose size linearly depends on the size of the selected set $$$X$$$. This significantly accelerates algorithms, particularly for DP computations.

Definition of a Virtual Tree

Let $$$T$$$ be a given rooted tree, and $$$X$$$ be some subset of its vertices. A virtual tree $$$A(X)$$$ is defined as follows:

$$$ A(X) = \{ \operatorname{lca}(x,y) \mid x,y \in X \}, $$$

where $$$\operatorname{lca}(x,y)$$$ denotes the lowest common ancestor of vertices $$$x$$$ and $$$y$$$ in tree $$$T$$$. In this tree, an edge is drawn between each pair of vertices if one of them is an ancestor of the other in $$$T$$$.

This definition ensures that the structure includes all vertices "important" for analyzing $$$X$$$ (i.e., all LCAs for pairs of vertices from $$$X$$$), and connectivity is inherited from the original tree.

Key Properties and Their Proofs

Property 1: Size Estimation

Statement.
For any set $$$X$$$ of vertices of tree $$$T$$$, the following inequality holds:

$$$ \vert A(X) \vert \le 2 \vert X \vert - 1. $$$

Proof.
Let's choose a depth-first search (DFS) traversal order and consider the vertices of set $$$X$$$ arranged in traversal order:

$$$ x_1, x_2, \dots, x_m. $$$

Denote

$$$ l = \operatorname{lca}(x_1, x_2). $$$

We need to prove the following lemma.

Lemma 1. For any integer $$$n \ge 3$$$, if $$$\operatorname{lca}(x_1,x_n) \neq l$$$, then

$$$ \operatorname{lca}(x_1,x_n) = \operatorname{lca}(x_2,x_n). $$$

Proof (by contradiction).
Suppose that for some $$$n \ge 3$$$, both $$$\operatorname{lca}(x_1,x_n) \neq l$$$ and $$$\operatorname{lca}(x_1,x_n) \neq \operatorname{lca}(x_2,x_n)$$$ are true. Let $$$k = \operatorname{lca}(x_1,x_n)$$$.
If $$$k$$$ is an ancestor of $$$l$$$, then by definition both $$$\operatorname{lca}(x_1,x_n)$$$ and $$$\operatorname{lca}(x_2,x_n)$$$ equal $$$k$$$, which contradicts our assumption. Therefore, $$$k$$$ must be a descendant of $$$l$$$. But then in the DFS order, we would get the sequence $$$x_1, x_n, x_2$$$, which contradicts our chosen traversal order.
This contradiction completes the proof of the lemma.

Returning to the proof of the property, we note that according to the lemma, each LCA arising from sequential consideration of the vertices in $$$X$$$ equals either: 1. the vertex $$$x_1$$$ itself, 2. $$$\operatorname{lca}(x_1,x_2)$$$, or 3. $$$\operatorname{lca}(x_2,x_n)$$$ for some $$$n$$$.

Thus, we can recursively write:

$$$ A(\{x_1,\dots,x_m\}) = A(\{x_2,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2) \}. $$$

From this, it follows that

$$$ \vert A(X) \vert \le \vert A(\{x_2,\dots,x_m\}) \vert + 2. $$$

Applying induction on the size of set $$$X$$$, we obtain the final estimate:

$$$ \vert A(X) \vert \le 2 \vert X \vert - 1. $$$

Moreover, if tree $$$T$$$ is a perfect binary tree and $$$X$$$ consists of its leaves, the inequality is achieved exactly.

Property 2: Representation through Sequential LCAs

Statement.
Let the vertices of $$$X$$$ be ordered by DFS traversal order: $$$x_1, x_2, \dots, x_m$$$. Then

$$$ A(X) = X \cup \{ \operatorname{lca}(x_i, x_{i+1}) \mid 1 \le i < m \}. $$$

Proof.
We start with the recursive representation:

$$$ A(\{x_1,\dots,x_m\}) = A(\{x_2,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2) \}. $$$

Let's expand it recursively:

$$$ \begin{aligned} A(\{x_1,\dots,x_m\}) &= A(\{x_2,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2) \} \\ &= A(\{x_3,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2), x_2, \operatorname{lca}(x_2,x_3) \} \\ &\quad \vdots \\ &= \{ x_1, \operatorname{lca}(x_1,x_2), x_2, \dots, x_{m-1}, \operatorname{lca}(x_{m-1},x_m), x_m \}. \end{aligned} $$$

Thus, we obtain the required representation.

Construction of a Virtual Tree

Given a set of vertices $$$X$$$, the virtual tree $$$A(X)$$$ can be constructed in time

$$$ O\left(|X| (\log |X| + \log N)\right), $$$

with preprocessing of the original tree $$$T$$$ in $$$O(N \log N)$$$ to ensure fast LCA queries.

Main construction steps:

  1. Preprocessing.
    Using depth-first search (DFS) or methods such as binary lifting or HLD, we compute entry times for vertices and LCAs for any two vertices.

  2. Vertex Sorting.
    We sort the vertices of set $$$X$$$ in the order of their appearance during DFS traversal (using entry times).

  3. Adding LCAs.
    For consecutive vertices $$$x_i$$$ and $$$x_{i+1}$$$, we compute $$$\operatorname{lca}(x_i,x_{i+1})$$$. The union of $$$X$$$ and these LCAs gives the set of vertices $$$A(X)$$$ (according to Property 2).

  4. Tree Construction.
    Having obtained the set of vertices $$$A(X)$$$ sorted by DFS order, we can traverse the sequence using a stack to restore the tree structure. During traversal, we maintain a stack whose top element is the current vertex, and if a new vertex is not a descendant of the vertex at the top of the stack, elements are "popped" until the corresponding ancestor is found, after which connecting occurs.

This algorithm guarantees that the virtual tree contains $$$O(|X|)$$$ vertices, allowing efficient application of DP.

Application: Counting Subtrees in a Colored Tree

Problem Statement

Given a tree $$$T$$$ with $$$N$$$ vertices numbered from $$$1$$$ to $$$N$$$. Edge $$$i$$$ connects vertices $$$u[i]$$$ and $$$v[i]$$$. Each vertex $$$i$$$ is colored with color $$$A[i]$$$.
It is necessary to find (modulo 998244353) the number of (non-empty) subsets $$$S$$$ of vertices of tree $$$T$$$ satisfying the condition: - The induced graph $$$G[S]$$$ is a tree. - All vertices of $$$G[S]$$$ with degree 1 have the same color.

Main Solution Idea

The main idea is to break down the problem by colors. For each color $$$c$$$, we consider the set of vertices $$$X$$$ colored with $$$c$$$ and build a virtual tree for $$$X$$$. Then, on the resulting tree, we perform DP to count valid subtrees where all vertices with degree 1 have color $$$c$$$. The final answer is obtained by summing the results for all colors.

Thanks to the construction of the virtual tree, although the original tree contains $$$N$$$ vertices, each virtual tree for a specific color has size $$$O(|X|)$$$, which allows DP to be performed in an acceptable time.

We get that the total complexity of preprocessing and building virtual trees does not exceed $$$O(N \log N + \sum_{c \in C} (|X_c| (\log |X_c| + \log N)) = O(N \log N)$$$, where $$$C$$$ is the set of different vertex colors and $$$X_c$$$ is the set of vertices with color $$$c$$$.

Solution Implementation

Below is the complete C++ code with comments describing the main components of the algorithm:

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 2e5 + 10;
const int LOGN = 20;
const int MOD = 998244353;

vector<int> g[MAXN];         // Adjacency list of the given graph
vector<int> ver[MAXN];       // For each color c, ver[c] stores vertices with color c
vector<int> aux_g[MAXN];     // Adjacency list for the virtual tree
int col[MAXN];               // Color of each vertex
int tmr, n;                  // Global time counter and number of vertices
int up[LOGN][MAXN], dep[MAXN], tin[MAXN]; // For computing LCA and entry time

// DP arrays for dynamic programming on the virtual tree
int dp[MAXN][2], sum[MAXN];

// Preprocessing function: DFS to compute tin, up array, and dep
void dfs_precalc(int v, int p) {
    tin[v] = ++tmr;
    up[0][v] = p;
    dep[v] = dep[p] + 1;
    for (int i = 1; i < LOGN; ++i)
        up[i][v] = up[i - 1][up[i - 1][v]];
    for (auto to : g[v])
        if (to != p)
            dfs_precalc(to, v);
}

// Function to compute LCA using binary lifting
int getlca(int x, int y) {
    if (dep[x] < dep[y]) swap(x, y);
    for (int i = LOGN - 1; i >= 0; --i)
        if (dep[up[i][x]] >= dep[y])
            x = up[i][x];
    if (x == y) return x;
    for (int i = LOGN - 1; i >= 0; --i)
        if (up[i][x] != up[i][y]) {
            x = up[i][x];
            y = up[i][y];
        }
    return up[0][x];
}

// DFS on the virtual tree to perform DP.
// Parameter c — target color for which counting is performed.

void dfs_calc(int v, int p, int c, int &ans) {
    dp[v][0] = dp[v][1] = 0;
    sum[v] = 0;
    for(auto to : aux_g[v]) {
        if(to == p) continue;
        dfs_calc(to, v, c, ans);
        // DP transitions: combining current state with result from subtree.
        int nxt0 = (dp[v][0] + sum[to]) % MOD;
        int nxt1 = ((dp[v][0] + dp[v][1]) * 1ll * sum[to] % MOD + dp[v][1]) % MOD;
        dp[v][0] = nxt0;
        dp[v][1] = nxt1;
    }
    sum[v] = (dp[v][0] + dp[v][1]) % MOD;
    if(col[v] == c) {
        // If the vertex has the target color, it can participate in a valid subtree.
        sum[v] = (sum[v] + 1) % MOD;
        ans = (ans + sum[v]) % MOD;
    } else {
        ans = (ans + dp[v][1]) % MOD;
    }
}

// Function to build a virtual tree for color c and perform DP.
void calc_aux(int c, int &ans) {
    auto p = ver[c];
    if (p.empty()) return;
    // Sort vertices by entry time (tin) — DFS traversal order.
    sort(p.begin(), p.end(), [&](const int a, const int b) { return tin[a] < tin[b]; });
    vector<int> verstk = {1}; // Initialize stack with the root of tree T (vertex 1).
    aux_g[1].clear();
    auto add = [&](int u, int v) {
        aux_g[u].push_back(v);
        aux_g[v].push_back(u);
    };
    // Process each vertex from set p, maintaining a stack to build the virtual tree.
    for (auto u : p) {
        if (u == 1) continue;
        int lca = getlca(u, verstk.back());
        if (lca != verstk.back()) {
            while (verstk.size() >= 2 && tin[lca] < tin[verstk[verstk.size() - 2]]) {
                add(verstk.back(), verstk[verstk.size() - 2]);
                verstk.pop_back();
            }
            if (verstk.size() >= 2 && tin[lca] != tin[verstk[verstk.size() - 2]]) {
                aux_g[lca].clear();
                add(verstk.back(), lca);
                verstk.back() = lca;
            } else {
                add(verstk.back(), lca);
                verstk.pop_back();
            }
        }
        aux_g[u].clear();
        verstk.push_back(u);
    }
    while (verstk.size() > 1) {
        add(verstk.back(), verstk[verstk.size() - 2]);
        verstk.pop_back();
    }
    // Perform DP on the virtual tree, starting from root 1.
    return dfs_calc(1, 0, c, ans);
}

int main() {
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    cin >> n;
    // Read colors and group vertices by color.
    for (int i = 1; i <= n; ++i) {
        cin >> col[i];
        ver[col[i]].push_back(i);
    }
    for (int i = 1; i < n; ++i) {
        int u, v;
        cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    dfs_precalc(1, 0);
    int ans = 0;
    // Process the corresponding virtual tree for each possible color.
    for (int i = 1; i <= n; ++i)
        calc_aux(i, ans);
    cout << ans;
    return 0;
}

Implementation Explanation

  • Preprocessing.
    The dfs_precalc function performs a depth-first traversal of tree $$$T$$$, computing entry time (tin), depth, and filling the binary lifting table up for fast LCA queries.

  • Computing LCA.
    The getlca function implements binary lifting for quickly finding the lowest common ancestor of two vertices.

  • Building the Virtual Tree.
    The calc_aux function takes a color $$$c$$$, extracts the set of vertices ver[c], sorts it by tin, and uses a stack to build the virtual tree. For each pair of consecutive vertices, the LCA is computed, corresponding to Property 2.

  • Dynamic Programming.
    The dfs_calc function traverses the virtual tree and combines the results from subtrees according to DP transitions. The DP states are calculated in such a way that the contribution of a vertex is accounted for if it has the target color, and only those subtrees where all leaves have the same color are counted.

  • Collecting the Result.
    The main function reads the input data, performs preprocessing, and then sums up the results for each color, outputting the final answer modulo 998244353.

Conclusion

In this article, we have examined in detail the concept of a virtual tree, proved its main properties, described an efficient construction algorithm, and demonstrated the application of this technique to solve a DP problem on trees. The advantage of this approach is that it reduces the problem to working with a substructure of size $$$O(|X|)$$$, which significantly accelerates computations even with a large size of the original tree.

The article is based on editorial of Atcoder problem ABC340G by Nyaan.

Full text and comments »

  • Vote: I like it
  • +108
  • Vote: I do not like it