Alternative Solution to 2077C Using Probability

#	User	Rating
1	jiangly	3846
2	tourist	3799
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3590
6	Ormlis	3533
7	Benq	3468
8	Radewoosh	3463
9	ecnerwala	3451
9	Um_nik	3451

#	User	Contrib.
1	cry	165
2	-is-this-fft-	161
3	Qingyu	160
4	atcoder_official	156
4	Dominater069	156
6	adamant	154
7	djm03178	151
8	luogu_official	149
9	Um_nik	148
10	awoo	147

Hello, Codeforces community! Today, I want to share my solution to a problem from recent contest: Binary Subsequence Value Sum. Rather than using polynomials and FFT, I tried to approach this problem from a different viewpoint — probability! Now, let's get started with the solution itself.

1. Deterministic definition of the score

Suppose you have a binary string $$$v$$$ of length $$$m$$$. Interpret each character as a “contribution”: assign $$$+1$$$ for a ‘1’ and $$$-1$$$ for a ‘0’. Define

$$$ d = \text{(number of 1's)} - \text{(number of 0's)} $$$

so that $$$d$$$ is the overall “imbalance” of $$$v$$$.

The score of $$$v$$$ is defined as

$$$ \text{score}(v) = \max_{1 \le i < m} \Bigl( F(v,1,i) \cdot F(v,i+1, m) \Bigr), $$$

where $$$F(v,1,i)$$$ is imbalance of prefix and $$$F(v,i+1,m)$$$ is imbalance of suffix; if the prefix imbalance is $$$S$$$, then the suffix is $$$d-S$$$, and the product is

$$$ P(S) = S\cdot (d-S). $$$

For fixed $$$d$$$, the product $$$S(d-S)$$$ is a quadratic function in $$$S$$$. It is maximized when

$$$ S = \frac{d}{2}. $$$

However, since $$$S$$$ must be an integer (and may not be exactly $$$d/2$$$ when $$$d$$$ is odd), the best we can do is to take

$$$ S = \left\lfloor\frac{d}{2}\right\rfloor \quad \text{and} \quad d-S = \left\lceil\frac{d}{2}\right\rceil. $$$

Thus the maximum product is

$$$ \left\lfloor\frac{d}{2}\right\rfloor \cdot \left\lceil\frac{d}{2}\right\rceil. $$$

So in one compact expression, the score of $$$v$$$ can be written as

$$$ \text{score}(v) = \frac{d^2 - (d \bmod 2)}{4}. $$$

2. Counting scores over all subsequences

Now, consider a fixed binary string $$$s$$$ of length $$$n$$$ with $$$R$$$ ones and $$$Z = n-R$$$ zeros.

A nonempty subsequence of $$$s$$$ is obtained by choosing, independently for each position, whether to include that bit. (There are $$$2^n$$$ subsequences in total, including the empty one—but the empty subsequence contributes nothing to the score.)

Imagine that for each bit in $$$s$$$ we make an independent choice with probability $$$1/2$$$ of including it. Then every subsequence occurs with equal probability $$$(\frac{1}{2})^n$$$.

For any subsequence $$$v$$$, let its imbalance be

$$$ d' = \text{(# of ones chosen)} - \text{(# of zeros chosen)}. $$$

Then the score of that subsequence is, by our previous argument,

$$$ \text{score}(v) = \frac{(d')^2 - ((d') \bmod 2)}{4}. $$$

We want to sum the scores over all nonempty subsequences. Rather than summing directly, we use the idea of expectation.

Recall that for any function $$$f(v)$$$ defined on the set of subsequences (with the uniform probability distribution), the average value is:

$$$ E[f(v)] = \frac{1}{2^n} \sum_{v} f(v). $$$

Multiplying by $$$2^n$$$ recovers the sum over all subsequences:

$$$ \sum_{v} f(v) = 2^n \cdot E[f(v)]. $$$

3. Calculating the Expectation $$$E[(d')^2]$$$

For each bit of $$$s$$$, define a random variable $$$X_i$$$ as follows:

If the $$$i$$$th bit is a ‘1’ and is chosen, $$$X_i=+1$$$; if it’s not chosen, $$$X_i=0$$$.
If the $$$i$$$th bit is a ‘0’ and is chosen, $$$X_i=-1$$$; if it’s not chosen, $$$X_i=0$$$.

Then the imbalance for a given subsequence is

$$$ d' = \sum_{i=1}^n X_i. $$$

For ‘1’: $$$E[X_i]= \frac{1}{2}$$$ (since it contributes $$$+1$$$ with probability $$$1/2$$$).
For ‘0’: $$$E[X_i]= -\frac{1}{2}$$$.

Thus, if there are $$$R$$$ ones and $$$Z$$$ zeros,

$$$ E[d'] = R\cdot\frac{1}{2} + Z\cdot\left(-\frac{1}{2}\right) = \frac{R - Z}{2} = \frac{2R-n}{2}. $$$

Each $$$X_i$$$ is a Bernoulli-type variable with two outcomes (nonzero with probability $$$1/2$$$ and zero with probability $$$1/2$$$). In either case (for ‘1’ or ‘0’), one can show that

$$$ \operatorname{Var}(X_i) = \frac{1}{4}. $$$

Since the choices are independent,

$$$ \operatorname{Var}(d') = \sum_{i=1}^n \operatorname{Var}(X_i) = \frac{n}{4}. $$$

Recall that

$$$ E[(d')^2] = \operatorname{Var}(d') + (E[d'])^2. $$$

Thus,

$$$ E[(d')^2] = \frac{n}{4} + \left(\frac{2R-n}{2}\right)^2 = \frac{n}{4} + \frac{(2R-n)^2}{4} = \frac{(2R-n)^2 + n}{4}. $$$

Then the expected value of $$$\frac{(d')^2}{4}$$$ is

$$$ E\!\left[\frac{(d')^2}{4}\right] = \frac{1}{4} \cdot \frac{(2R-n)^2+n}{4} = \frac{(2R-n)^2+n}{16}. $$$

So if we (naively) approximated the score of every subsequence by $$$\frac{(d')^2}{4}$$$, then the total over all $$$2^n$$$ subsequences would be:

$$$ 2^n \cdot \frac{(2R-n)^2+n}{16} = 2^{n-4}\Bigl((2R-n)^2+n\Bigr). $$$

4. $$$d'$$$ parity correction

The true score of a subsequence with imbalance $$$d'$$$ is

$$$ \frac{(d')^2 - (d'\bmod 2)}{4}, $$$

which means that whenever $$$d'$$$ is odd, our naive expression $$$\frac{(d')^2}{4}$$$ is an overestimate by $$$\frac{1}{4}$$$.

A classical combinatorial fact (which one may also derive via the bruteforce check, as I did) is that the total number of subsequences with an odd imbalance is exactly half of all subsequences. Thus, the total “overcount” when summing $$$\frac{(d')^2}{4}$$$ is:

$$$ \frac{1}{4} \cdot \bigl(\text{number of subsequences with odd } d'\bigr) = \frac{1}{4} \cdot 2^{n-1} = 2^{n-3}. $$$

Therefore, by linearity of expectation, the correct total sum $$$S$$$ over all nonempty subsequences is:

$$$ S = \left(2^n \cdot E\!\left[\frac{(d')^2}{4}\right]\right) - 2^{n-3}. $$$

Substitute the expectation we computed:

$$$ S = 2^n \cdot \frac{(2R-n)^2+n}{16} - 2^{n-3}. $$$

5. Simplifying to the Final Formula

$$$ S = 2^n \cdot \frac{(2R-n)^2+n}{16} - 2^{n-3} = $$$ $$$ 2^{n-4} \cdot ((2R-n)^2+n) - 2^{n-3}. $$$

Factor $$$2^{n-4}$$$ out of both terms, and we obtain the final closed-form formula:

$$$ S = 2^{n-4} ((2R-n)^2+n-2). $$$

My solution: 309852412. Thank you for reading!

spike1236's blog

1. Deterministic definition of the score

2. Counting scores over all subsequences

3. Calculating the Expectation $$$E[(d')^2]$$$

4. $$$d'$$$ parity correction

5. Simplifying to the Final Formula