Recently, I got a request asking me to write down my thought process while solving questions. So, here is the promised blog.
I would like to thank IceKnight1093, qwexd, Sana, Everule and NovusStellachan for proofreading and suggesting edits to the blog. Special thanks to satyam343 for discussing most of the blog with me.
1. Overview
The blog contains my solutions to $$$7$$$ problems in a wide range of ratings, starting from $$$1200$$$ all the way up to $$$2700$$$. Each problem has a step-by-step solution, and you can notice how there are no large jumps in logic; everything comes naturally. I do not claim that this is always possible in every problem, however I solve the majority of CF problems in such a manner.
There are certainly other high rated people who will have completely different methods of solving. However, this is about what works for me. There are some meta ideas taken from parts of the solution and listed at the end in a "What can we learn?" section. I hope the blog is useful for you.
2. Some General Question Solving Strategies
Here is a non-exhaustive list of techniques that are useful to keep in mind. These are internalized by most experienced participants already but maybe it helps you.
- Figuring out the Nature of an Optimal Solution. One of the most important things in problems imo.
- Solving subtasks of the original problem and then trying to extend/generalize your solution.
- Fixing a parameter and then trying to maximise the result with respect to that fixed parameter.
- Finding necessary and sufficient conditions. Sometimes, your necessary conditions themselves become sufficient, or vice versa.
- Identifying Lower and Upper bounds, and constructing them.
- Reducing a problem into smaller subproblems, commonly used with Dynamic Programming or Inductive Proofs.
- Considering the Decision Version of the Problem instead. Especially useful in Counting problems, where you need to count the number of good subsequences, for example.
- Formalizing the Problem.
3. Questions and Solutions Section
Question 1. 1979C - Earning on Bets
First, let's formalize the statement.
Suppose $$$a_i$$$ is the amount you bet on the $$$i$$$-th outcome. Let $$$S = \sum a_i$$$.
You are asked whether there exists an array $$$a$$$ such that $$$a_i \ge 0$$$ and $$$(a_i \cdot k_i) - S > 0$$$ for all $$$i$$$.
Assuming the solution is possible, we want $$$a_i \cdot k_i > S$$$ to hold.
Forget about $$$a_i$$$ needing to be an integer, and let's fix $$$min(a_i \cdot k_i) = c$$$; we want to minimize $$$S$$$.
Naturally, the optimal choice is to have each $$$a_i \cdot k_i$$$ equal to exactly $$$c$$$, i.e. $$$a_i = \frac{c}{k_i}$$$.
$$$S$$$ can be computed as $$$\sum \frac{c}{k_i}$$$. This solution works when $$$S < c$$$, i.e. $$$\sum \frac{1}{k_i} < 1$$$. It is not too hard to see that when $$$\sum \frac{1}{k_i} \ge 1$$$, the solution is impossible. A final detail is that we can choose $$$c = LCM(k_i)$$$, and this guarantees that all $$$a_i$$$ are integers as required by the problem.
For completeness, here is a simple proof of the impossibility :
The rewritten condition is $$$a_i > \frac{S}{k_i}$$$ for each $$$i$$$.
Adding the inequalities up over all $$$i$$$, we get $$$S = \sum a_i > S \cdot \sum \frac{1}{k_i}$$$,
giving us our desired result : a solution can exist only when $$$\sum \frac{1}{k_i} < 1$$$.
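Here is a rough, untested sketch of the full solution, just to make the steps concrete (the output format is recalled from the statement, so double-check it):

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    while (t--) {
        int n;
        cin >> n;
        vector<long long> k(n);
        for (auto &x : k) cin >> x;

        // c = LCM of all k_i. With small multipliers (2..20 in this problem)
        // this comfortably fits in a long long.
        long long c = 1;
        for (auto x : k) c = lcm(c, x);

        // a_i = c / k_i; the construction works iff S = sum(a_i) < c,
        // i.e. sum(1 / k_i) < 1.
        vector<long long> a(n);
        long long S = 0;
        for (int i = 0; i < n; i++) {
            a[i] = c / k[i];
            S += a[i];
        }

        if (S >= c) {
            cout << -1 << "\n";
        } else {
            for (int i = 0; i < n; i++) cout << a[i] << " ";
            cout << "\n";
        }
    }
}
```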
- Take a small amount of time to formalize the conditions. What is the problem exactly asking for?
- Don't be afraid of a little bit of algebra.
- Rephrasing the problem slightly might lead you to an obvious solution, for example just fixing $$$min(a_i \cdot k_i)$$$ as above, and then asking for minimum $$$S$$$.
- How do we motivate ourselves to fix $$$min(a_i \cdot k_i)$$$ though? Well, look at the equation. It is $$$(a_i \cdot k_i) > S$$$, which implies $$$min(a_i \cdot k_i) > S$$$. We can fix either $$$S$$$ or $$$min(a_i \cdot k_i)$$$ and then maximize or minimize the other quantity respectively, to have the largest chance of satisfying the given inequality.
Question 2. 1987D - World is Mine
In a lot of asymmetric game questions, it is useful to consider Alice and Bob's strategies separately.
For example, here Alice's strategy is pretty trivial. Once you have identified a simple strategy, it also becomes easier to analyze the opponent's strategy.
At each step, Alice simply takes the element with the smallest value of $$$a_i$$$ which doesn't break her sorted condition.
This is pretty obvious as it doesn't worsen any of her future choices. You can formally prove it with an Exchange Argument. Note that while I pass it off as something trivial, if you do not understand Exchange Arguments, OR do not immediately see how to prove this with Exchange Arguments, I highly recommend you to read up on them. It will help you understand which greedy algorithms are logical and which are baseless.
Let's now analyze Bob's moves. Bob will want to take all occurrences of some element $$$x$$$ himself before Alice reaches it, so as to deny Alice that element $$$x$$$.
Here is a very important pitfall. You might think : can't Alice play against Bob's strategy specifically? No, as we discussed before, Alice's strategy is fixed.
For the unconvinced, here is a reasoning you can use : if she tried to take $$$x$$$ herself when she sees Bob taking them, she becomes unable to take some integer $$$y < x$$$. However, you should be convinced of the claim without needing such reasoning; Alice's optimal move at each step should only depend on the current setup and not on Bob's previous moves.
Let's remove all elements with $$$freq_i = 0$$$ for convenience, and renumber the remaining elements into $$$1...k$$$. Now, consider the case when Bob wants to deny the sequence $$$x_1, x_2, ..., x_m$$$ from Alice. What is Bob's strategy and when is it possible? Both are very important questions.
Reminder that we have fixed the sequence $$$x_1, x_2, ..., x_m$$$ Bob wants to remove, and we will be discussing optimal strategy wrt that.
Claim 1 : Bob will remove the elements in sorted order, i.e. $$$x_i < x_{i + 1}$$$.
This is not hard to see since Alice will approach $$$x_1$$$ first, then $$$x_2$$$, and so on. Thus, if Bob wants to be able to delete these elements, his best chance is removing them in the same order.
Now, we know Bob's strategy too. Let's look at when he can successfully deny Alice all those elements.
For the first element, $$$freq_{x_1} < x_1$$$ must hold, otherwise Alice reaches $$$x_1$$$ too early (reminder : we have renumbered the elements present in the array to always be $$$1...k$$$ to make these formulae easier)
Similarly, for the second element $$$freq_{x_2} + freq_{x_1} < x_2 - 1$$$ must hold, and in general $$$freq_{x_1} + freq_{x_2} + ... + freq_{x_i} < (x_i - i + 1)$$$ must hold. This is pretty obvious once we know Alice and Bob's strategy.
Ok, so we know what sequences are removable. Now, the dynamic programming becomes sort of obvious.
Let's look at what information we need to store. Consider the subproblem till $$$i - 1$$$ and we want to check if we can add $$$i$$$ to Bob's sequence $$$x$$$ or not. We need $$$2$$$ pieces of information :
- $$$\sum freq_{x}$$$ till now
- $$$|x|$$$ till now
Naively, this is still $$$O(N^3)$$$ states, but we can move either of the $$$2$$$ additional parameters into the $$$dp$$$ value instead of keeping it in the state. For example, $$$dp_{(i, j)} = $$$ the minimum value of $$$\sum freq_{x}$$$ considering the first $$$i$$$ elements and $$$|x| = j$$$. (You can do it the other way too, for example $$$dp_{(i, j)} =$$$ the maximum value of $$$|x|$$$ under the constraint that $$$\sum freq_{x}$$$ is $$$j$$$).
If you understood the question, transitions are pretty simple to come up with and are left as an exercise.
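If you want to check your version against something, here is a rough, untested sketch of one way to write this dp. It assumes the quantity to print is the number of cakes Alice eats, i.e. $$$k$$$ minus the size of the largest denial set Bob can achieve.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    while (t--) {
        int n;
        cin >> n;
        vector<int> a(n);
        for (auto &x : a) cin >> x;

        // Compress the distinct values to 1..k and compute their frequencies.
        vector<int> vals = a;
        sort(vals.begin(), vals.end());
        vals.erase(unique(vals.begin(), vals.end()), vals.end());
        int k = vals.size();
        vector<int> freq(k + 1, 0);
        for (auto x : a)
            freq[lower_bound(vals.begin(), vals.end(), x) - vals.begin() + 1]++;

        const int INF = 1e9;
        // dp[j] = minimum sum of freq over a valid denial set of size j,
        //         among the values considered so far.
        vector<int> dp(k + 1, INF);
        dp[0] = 0;
        for (int i = 1; i <= k; i++) {
            for (int j = i; j >= 1; j--) {
                if (dp[j - 1] == INF) continue;
                int s = dp[j - 1] + freq[i];
                // Condition derived above: the prefix sum of frequencies of the
                // denial set must stay < i - j + 1 when value i is its j-th member.
                if (s < i - j + 1) dp[j] = min(dp[j], s);
            }
        }

        int best = 0;
        for (int j = 0; j <= k; j++)
            if (dp[j] < INF) best = j;

        // Alice eats one cake per distinct value that Bob fails to deny.
        cout << k - best << "\n";
    }
}
```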
- The general idea of separately considering Alice and Bob's strategy is pretty useful.
- When we are asked to find an optimal "possible" sequence, it helps to change the problem into a decision problem first. Writing a dp afterwards becomes much easier.
- Break the problem into several small parts : for example, here I did $$$4$$$ main things : Alice's strategy, Bob's strategy, the decision problem for a sequence $$$x$$$, and finally the dynamic programming. If you guess one of the steps, be careful, as an error in one step can make the entire solution wrong from that point on. It helps to carefully analyze each step instead. Making small motivated steps like these helps immensely.
- In dynamic programming problems, the method I use isn't to just randomly guess the correct states, but rather to look at what information we need to store that is sufficient to calculate the answer.
- Considering suboptimal solutions and then speeding them up is very important. Here, specifically, I used a dp-specific trick where I went from $$$dp_{(i, j, k)} = $$$ "is state $$$(i, j, k)$$$ possible?" to $$$dp_{(i, j)} = $$$ "min/max $$$k$$$ over possible states".
All of the following problems were authored by me :
https://www.codechef.com/problems/ARRAYGAME (quite similar statements, completely different solutions)
1943E1 - MEX Game 2 (Easy Version) and 1943E2 - MEX Game 2 (Hard Version) (much harder problems but utilizes very similar concepts to the ones used here)
Question 3. 1977C - Nikita and LCM
We want to find the length of the longest special subsequence of $$$a$$$. A very natural thought is to first check if it is $$$n$$$, i.e. whether the whole array itself works.
Let's sort the array. Now, we can easily verify that the only case when the answer is not $$$n$$$ is when $$$a_i | a_n$$$ ($$$a_i$$$ divides $$$a_n$$$) for all $$$i$$$. Otherwise, the LCM exceeds $$$a_n$$$ and hence doesn't belong to the array.
This powerful idea highly restricts the nature of "interesting" testcases.
So, assume the answer is not $$$n$$$ now.
Let's look at the divisors of $$$a_n$$$. The LCM of any subsequence of $$$a$$$ must naturally be a divisor of $$$a_n$$$. We know that the number of divisors isn't too large even for a number of the order $$$10^9$$$ (we can prove it is definitely $$$\le 2 \cdot \sqrt{10^9}$$$, but in practice it is much lower; the maximum for numbers up to $$$10^9$$$ is $$$1344$$$). Since the number of divisors isn't large, we can try them all.
Let's try to fix the LCM of the subsequence to be $$$x$$$ and then calculate the longest such subsequence. Only try $$$x$$$ such that $$$x$$$ doesn't belong to $$$a$$$, as required by the problem. So, we want to find the longest subsequence $$$b$$$ of $$$a$$$ such that $$$LCM(b_i) = x$$$.
Obviously, we can simply take all those $$$a_i$$$ which are divisors of $$$x$$$. This guarantees that LCM will be a divisor of $$$x$$$, and this is the longest subsequence since we cannot take non-divisors of $$$x$$$. If even this subsequence fails to have LCM $$$x$$$, it is impossible to obtain it for any subsequence.
Bruteforcing on the LCM solves the problem in $$$O(n \cdot d(A))$$$ where $$$d(A)$$$ denotes the maximum number of divisors of any number $$$\le A (= 10^9)$$$
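A rough, untested sketch of this bruteforce (note that the empty subsequence has LCM $$$0$$$, which is never in $$$a$$$, so an answer of $$$0$$$ is always valid as a fallback):

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    while (t--) {
        int n;
        cin >> n;
        vector<long long> a(n);
        for (auto &x : a) cin >> x;
        sort(a.begin(), a.end());

        // If some element does not divide a[n-1], the LCM of the whole array
        // exceeds a[n-1] and hence is not in the array -> answer is n.
        bool all_divide = true;
        for (auto x : a) all_divide &= (a.back() % x == 0);
        if (!all_divide) {
            cout << n << "\n";
            continue;
        }

        // Otherwise, brute force over divisors x of a[n-1] that are not in a.
        set<long long> present(a.begin(), a.end());
        long long A = a.back();
        vector<long long> divs;
        for (long long d = 1; d * d <= A; d++) {
            if (A % d == 0) {
                divs.push_back(d);
                if (d != A / d) divs.push_back(A / d);
            }
        }

        int ans = 0;
        for (auto x : divs) {
            if (present.count(x)) continue;
            // Take every a_i dividing x; check that their LCM is exactly x.
            long long l = 1;
            int cnt = 0;
            for (auto v : a) {
                if (x % v == 0) {
                    l = lcm(l, v);
                    cnt++;
                }
            }
            if (l == x) ans = max(ans, cnt);
        }
        cout << ans << "\n";
    }
}
```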
- Eliminate obvious cases (for example, when the answer is $$$n$$$ above). These might even simplify the problem by constraining what the non-obvious tests look like.
- Fixing a parameter and then calculating the answer for that value of the parameter is a useful technique.
Question 4. 1889B - Doremy's Connecting Plan
Define $$$B_i$$$ as the sum of $$$a_j$$$ where $$$j$$$ belongs to the same component as $$$i$$$.
The given equation is $$$(B_i + B_j) \ge (i \cdot j \cdot c)$$$
Considering all edges seems hard. So, we should try to first consider some special edges. What edges should we try first? Well naturally, an idea is to try those edges with a small value of $$$i \cdot j \cdot c$$$.
Here, you will need to make a claim that you only need edges of the form $$$(1, i)$$$. More specifically, if it is possible to make an edge between $$$(i, j)$$$, it is also possible to make an edge between either $$$(1, i)$$$ or $$$(1, j)$$$. The intuition is there in the previous step that we want to consider edges with the least value of $$$i \cdot j \cdot c$$$, and you must be able to make this small jump.
Proving it is quite simple :
If $$$B_i \ge (i \cdot c)$$$, or $$$B_j \ge (j \cdot c)$$$, we are done.
Otherwise, $$$B_i + B_j < (i + j) \cdot c \le (i \cdot j \cdot c)$$$
for all $$$i, j > 1$$$ (using $$$i + j \le i \cdot j$$$ when both are at least $$$2$$$).
Hence, it would be impossible to make an edge $$$(i, j)$$$ if you cannot make edges $$$(1, i)$$$ and $$$(1, j)$$$.
Since, we are only considering edges $$$(1, i)$$$, we want to essentially find an ordering of the $$$n - 1$$$ edges $$$(1, i)$$$ for all $$$i \ge 2$$$ where we are able to add edges in that order.
The condition becomes $$$B_1 + a_i \ge (i \cdot c)$$$
at the time of adding edge $$$(1, i)$$$ (node $$$i$$$ is still in its own component at that point, so its component sum is just $$$a_i$$$), i.e. $$$B_1 \ge (i \cdot c) - a_i$$$.
Let's define a new quantity $$$d_i = (i \cdot c) - a_i$$$. Then, if $$$(1, i)$$$ edge is possible right now, and $$$d_j < d_i$$$, then $$$(1, j)$$$ edge is also possible.
This gives us the insight to add edges in increasing order of $$$d_i$$$, since $$$B_1$$$ only ever increases. Thus, sort the edges $$$(1, i)$$$ according to $$$d_i$$$, maintain $$$B_1$$$, and check whether each edge is possible or not.
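A rough, untested sketch of this greedy (it assumes the required output is just Yes/No per test case):

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    while (t--) {
        long long n, c;
        cin >> n >> c;
        vector<long long> a(n + 1);
        for (int i = 1; i <= n; i++) cin >> a[i];

        // Only consider edges (1, i). Sort nodes 2..n by d_i = i * c - a_i.
        vector<int> order(n - 1);
        iota(order.begin(), order.end(), 2);
        sort(order.begin(), order.end(), [&](int i, int j) {
            return i * c - a[i] < j * c - a[j];
        });

        long long B1 = a[1];  // sum of a over the component containing node 1
        bool ok = true;
        for (int i : order) {
            // Edge (1, i) needs B1 + a_i >= i * c at the moment it is added.
            if (B1 + a[i] >= i * c) B1 += a[i];
            else { ok = false; break; }
        }
        cout << (ok ? "Yes" : "No") << "\n";
    }
}
```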
- The most important thing is the motivation behind making claims like we only need edges $$$(1, i)$$$. It might seem like a big jump in logic, but I do not believe it so. It feels very natural to me that we want some equation to hold, we should try to minimize one of the quantities independently of the other. Since we have to make at least one edge $$$(x, i)$$$ for all $$$i$$$, we should consider the case when $$$(x \cdot i \cdot c)$$$ is minimized i.e. $$$x = 1$$$. Proving it is quite easy after we make the claim
- Try to use arguments about the Nature of the Optimal Solution. For example, we used it twice here. First to see that we only need edges $$$(1, i)$$$. Then, to see that we should consider edges in increasing order of $$$d_i = (i \cdot c) - a_i$$$. $$$d_i$$$ is not a randomly plucked quantity, we manipulated some equations and found out $$$B_i \ge d_i$$$ is the condition for adding edge $$$(1, i)$$$. (This is called Separation of Variables fyi, where you split the equation into $$$2$$$ parts, the $$$2$$$ being independent of each other, so that only either the LHS or RHS is dependent on $$$i$$$ while the other part is independent of it)
- Again, formalize(here, simplify) the statement and don't be afraid to do a little algebra. Those are essential skills.
Question 5. 2002D1 - DFS Checker (Easy Version)
First, let us understand when a permutation is a valid DFS order.
The permutation must begin with $$$1$$$ obviously. After that, the next element must be $$$2$$$ or $$$3$$$. Assuming it is $$$2$$$, we will go on to visit the subtree of $$$2$$$ before coming back up and then going to visit subtree of $$$3$$$.
In general, after visiting a non-leaf node $$$x$$$, we will visit the entire subtree of one of the children of $$$x$$$, and after visiting the entire subtree of its child, we will go on to visit the other child of $$$x$$$.
This gives us the following condition :
- Condition $$$x$$$ : Let $$$p_1$$$ be the position of the non-leaf node $$$x$$$, and $$$p_2$$$, $$$p_3$$$ be the positions of the children of $$$x$$$ with $$$p_2 < p_3$$$. Let $$$s$$$ be the subtree size of a child of $$$x$$$ (both children have the same subtree size since the tree is a perfect binary tree here). Then, $$$p_2 = p_1 + 1$$$ and $$$p_3 = p_2 + s$$$ must hold.
It is not too hard to see that if the above condition holds for all $$$x$$$, the permutation is a valid DFS order.
We can now solve the problem in $$$O(N \cdot Q)$$$ but we need something faster.
The important note is that our permutation doesn't change significantly between queries. Only $$$2$$$ elements get swapped, so not too many things should be affected.
Let's define $$$bad_x = 1$$$ if Condition $$$x$$$ is false, and $$$bad_x = 0$$$ otherwise. Then, the answer is Yes iff $$$\sum bad_x = 0$$$.
When we swap $$$p_x = b$$$ and $$$p_y = c$$$, we can note that only $$$bad_{b}$$$, $$$bad_{c}$$$, $$$bad_{a_b}$$$, $$$bad_{a_c}$$$ get affected, as $$$bad_x$$$ only depends on the relative positions of $$$x$$$ and its children ($$$a_i$$$ is the parent of $$$i$$$).
Thus, we only need to recalculate these values, and this can be easily done in $$$O(1)$$$ time.
This solution generalizes to D2, albeit with much more implementation details.
Also, be careful while implementing this solution. I made a small bug during the contest, because of which I skipped straight to implementing D2 (which I had already mindsolved while implementing D1) instead of wasting more time here.
The bug is you need to specifically consider only distinct elements in $$$(b, c, a_b, a_c)$$$. Otherwise, it might happen that you add/subtract $$$bad_x$$$ twice and thus get an incorrect result.
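Here is a rough, untested sketch for the easy version putting all of this together (it leans on the perfect-binary-tree structure, so the children of $$$x$$$ are $$$2x$$$ and $$$2x + 1$$$ and both subtrees have equal size; the distinct-elements pitfall above is handled with a set):

```cpp
#include <bits/stdc++.h>
using namespace std;

int n, q;
vector<int> p, pos, sz;
vector<int> bad;            // bad[x] = 1 if "Condition x" fails
long long total_bad = 0;

// Condition x: the children of a non-leaf x must sit at positions
// pos[x] + 1 and pos[x] + 1 + s, where s is a child's subtree size.
int check(int x) {
    int l = 2 * x, r = 2 * x + 1;   // perfect binary tree (easy version)
    if (l > n) return 0;            // leaf, nothing to check
    int q1 = min(pos[l], pos[r]), q2 = max(pos[l], pos[r]);
    int s = sz[l];                  // both subtrees have the same size here
    return !(q1 == pos[x] + 1 && q2 == q1 + s);
}

void recalc(int x) {
    total_bad -= bad[x];
    bad[x] = check(x);
    total_bad += bad[x];
}

int main() {
    int t;
    cin >> t;
    while (t--) {
        cin >> n >> q;
        p.assign(n + 1, 0); pos.assign(n + 1, 0);
        sz.assign(n + 1, 1); bad.assign(n + 1, 0);
        for (int i = 2; i <= n; i++) { int par; cin >> par; }  // parents, = i/2 here
        for (int i = 1; i <= n; i++) { cin >> p[i]; pos[p[i]] = i; }
        for (int i = n; i >= 2; i--) sz[i / 2] += sz[i];

        total_bad = 0;
        for (int x = 1; x <= n; x++) { bad[x] = check(x); total_bad += bad[x]; }

        while (q--) {
            int x, y;
            cin >> x >> y;
            swap(p[x], p[y]);
            pos[p[x]] = x; pos[p[y]] = y;
            // Only these nodes' conditions can change; deduplicate them
            // so that no bad value is added or subtracted twice.
            set<int> affected = {p[x], p[y], p[x] / 2, p[y] / 2};
            affected.erase(0);
            for (int v : affected) recalc(v);
            // The DFS order must also start at the root; checking it explicitly is cheap.
            bool ok = (total_bad == 0) && (pos[1] == 1);
            cout << (ok ? "YES" : "NO") << "\n";
        }
    }
}
```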
- Try to solve the problem in simpler settings, for example allowing yourself $$$O(N)$$$ time to check the permutation each time. Not all solutions to the easier version generalize, however; only through experience can you tell which ones can be adapted to solve the full problem.
- One tip about extending solutions of simpler versions to queries : consider mathematical conditions instead of "processes". For example, I considered conditions on the positions of $$$x$$$ and its $$$2$$$ children instead of, say, using DFS itself to check the permutation. Those solutions are usually not generalizable.
- Once we have the condition, it becomes much easier to solve the queries. Usually, it is a matter of being "Chinese" enough, and knowing sufficient tricks.
- The concept of a update not changing much (for example, here only $$$O(1)$$$ items) is very prevalent in a lot of query problems.
Question 6. 1930E - 2..3...4.... Wonderful! Wonderful!
We first convert the problem to its decision variant.
Given $$$n$$$ and $$$k$$$ and a subsequence $$$s$$$ of $$$[1, 2, ... n]$$$, we need to tell if it's achievable.
In each operation, we choose an element $$$x$$$ of the current sequence (call it the 'pivot'), then $$$k$$$ remaining elements smaller than $$$x$$$ and $$$k$$$ remaining elements larger than $$$x$$$, and delete those $$$2 \cdot k$$$ elements (the pivot itself stays).
I made a mistake in contest because I forgot that elements not part of the final sequence may have acted as pivots too.
Let's write down a few necessary conditions :
$$$n - |s|$$$ must be divisible by $$$2 \cdot k$$$, since we delete $$$2 \cdot k$$$ elements each time.
There must exist at least one element part of $$$s$$$ which can act as a "pivot" for at least one operation. This is because of the nature of the operation. We must have a last operation where the pivot is chosen among the elements of the sequence $$$s$$$. Minor exception : $$$s = [1, 2, .. n]$$$ because we do not perform any operations in this case. More formally, there exists $$$x$$$ in $$$s$$$ such that :
- There are at least $$$k$$$ elements smaller than $$$x$$$ and not in $$$s$$$
- There are at least $$$k$$$ elements larger than $$$x$$$ and not in $$$s$$$
Consider only sequences that satisfy the above $$$2$$$ conditions. So, let's assume that a possible pivot of the last operation is $$$i$$$ and there are $$$x$$$ larger elements to be deleted (i.e. not part of $$$s$$$), and $$$y$$$ smaller elements. Let $$$X$$$ and $$$Y$$$ denote the respective sets of larger and smaller elements which haven't been deleted yet (but need to be deleted)
Because $$$i$$$ can act as a pivot, $$$x, y \ge k$$$. Note that $$$(x + y)$$$ is a multiple of $$$2 \cdot k$$$.
If $$$x + y = 2 \cdot k$$$, we are done, because $$$x = y = k$$$, and we can choose subsequence = pivot $$$i$$$ + all remaining elements to be deleted.
Otherwise, without loss of generality, let's assume $$$x \le y$$$.
Case 1 : $$$x \ge 2 \cdot k$$$ : Choose subsequence = pivot $$$i$$$ + arbitrary $$$k$$$ elements of $$$X$$$ + arbitrary $$$k$$$ elements of $$$Y$$$. This will reduce each of $$$x$$$ and $$$y$$$ by $$$k$$$.
Case 2 : $$$x < 2 \cdot k$$$ : This is my solution and not necessarily the most optimum one.
Case 2.1 : $$$(x + y) \ge (6 \cdot k)$$$ : This means $$$y > 4 \cdot k$$$. Choose any $$$2 \cdot k + 1$$$ elements of $$$Y$$$ and do operation, thus reducing $$$y$$$ by $$$2 \cdot k$$$.
Case 2.2 : $$$(x + y) = 4 \cdot k$$$ : Take $$$(x - k)$$$ elements of $$$X$$$, and $$$(y - k + 1)$$$ elements of $$$Y$$$ (which adds to exactly $$$2 \cdot k + 1$$$). Then, both $$$x$$$ and $$$y$$$ become $$$k$$$.
Every pair $$$(x, y)$$$ is reduced to a smaller pair which still satisfies $$$x, y \ge k$$$. Thus, by Induction, this necessary condition becomes sufficient as it is possible to achieve every subsequence which satisfies the $$$2$$$ conditions.
Let $$$T$$$ denote the sequence of elements to be deleted. Note that we will only count sequences $$$T$$$ such that $$$|T| \mod (2 \cdot k) = 0$$$. Let's assume $$$|T| = x$$$.
Counting the number of valid sequences $$$T$$$ directly is hard. So, let's count its complement instead, the number of invalid $$$T$$$.
$$$T$$$ is good if there exists at least $$$1$$$ element $$$y$$$ such that :
- $$$y$$$ doesn't belong to $$$T$$$
- There exists $$$\ge k$$$ elements in $$$T$$$ which are $$$< y$$$.
- There exists $$$\ge k$$$ elements in $$$T$$$ which are $$$> y$$$.
$$$T$$$ is bad if there exists no such element $$$y$$$.
What does the bad condition imply? The elements $$$T_k$$$ through $$$T_{x - k + 1}$$$ must form a contiguous block of consecutive integers, i.e. all elements $$$y$$$ satisfying $$$T_k \le y \le T_{x - k + 1}$$$ belong to $$$T$$$; otherwise such a $$$y$$$ would be a possible pivot.
Let $$$d_i = T_{i + 1} - T_i$$$; then the condition implies $$$d_k = 1, d_{k + 1} = 1, \dots, d_{x - k} = 1$$$. Define $$$d_0 = T_1 - 1$$$ and $$$d_x = n - T_x$$$. The remaining constraints are $$$d_0, d_x \ge 0$$$, $$$d_i \ge 1$$$ for $$$1 \le i \le x - 1$$$, and $$$\sum d_i = n - 1$$$; these are all the constraints on the difference array $$$d$$$.
Now, it's a standard Stars and Bars approach to find the number of difference arrays $$$d$$$.
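Concretely (this is my own writing-out of that computation, so double check the indices) : $$$d_k, \dots, d_{x - k}$$$ are fixed to $$$1$$$, which accounts for $$$x - 2k + 1$$$ of the $$$x + 1$$$ differences and for $$$x - 2k + 1$$$ out of the total sum $$$n - 1$$$. The remaining $$$2k$$$ differences are free : $$$d_0$$$ and $$$d_x$$$ are $$$\ge 0$$$, while the other $$$2k - 2$$$ of them are $$$\ge 1$$$. Substituting $$$d_i' = d_i - 1$$$ for those $$$2k - 2$$$ values, we need $$$2k$$$ non-negative integers summing to $$$(n - 1) - (x - 2k + 1) - (2k - 2) = n - x$$$, so the number of bad $$$T$$$ with $$$|T| = x$$$ is $$$\binom{n - x + 2k - 1}{2k - 1}$$$, and the number of good ones is $$$\binom{n}{x}$$$ minus that.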
The solution fixes $$$k$$$ and then $$$|T|$$$, but since we only consider $$$|T|$$$ as multiples of $$$2 \cdot k$$$, the solution is $$$O(\sum \frac{n}{2 \cdot i}) = O(n \cdot log(n))$$$
- Again, considering the decision problem first is a very essential step
- Finding necessary conditions and then analyzing the structure of the problem is a very nice skill. Oftentimes, the necessary conditions become sufficient, as we see here (or the reverse : sufficient conditions become necessary).
- Note that I did not assume that the necessary condition would become sufficient. I was analyzing the structure of sequences which satisfy the necessary condition.
- Reducing problems to subproblems is a very useful method. It is very commonly used alongside dynamic programming especially, but here we used it to apply Induction to prove all such sequences would be reachable
- This is a somewhat guessable problem. I don't deny that. However, I think analyzing structures like I did is much more useful for gaining rating.
Counting the number of bad ways instead of the number of good ways is a very common trick in combinatorics problems. An even stronger version of the trick is Principle of Inclusion-Exclusion or PIE for short.
Question 7. 2003E1 - Turtle and Inversions (Easy Version)
Disclaimer : My solution differs from the editorial completely. You can read the editorial for their approach. I will present mine. Try to extend to the hard version of the problem from this.
Consider the subproblem when the union of the set of intervals is $$$[1, n]$$$, i.e. for every $$$i (1 \le i \le n)$$$, there exists exactly one $$$j$$$ such that $$$l_j \le i \le r_j$$$.
That is, the set of intervals satisfy :
- $$$l_1 = 1$$$
- $$$l_i = r_{i - 1} + 1$$$
- $$$r_m = n$$$
First, let's rewrite the condition given in the statement.
Let $$$k = max(a_i)$$$. Then, the permutation is divided into $$$2$$$ parts : $$$p_i \le k$$$ part and $$$p_i > k$$$ part. Let's assign a $$$0$$$ to all elements satisfying $$$p_i \le k$$$ and a $$$1$$$ to all elements satisfying $$$p_i > k$$$.
The condition given in the statement essentially means that for every interval $$$(l_i, r_i)$$$, there exists an $$$x_i$$$ such that for all $$$l_i \le j \le x_i$$$, $$$p_j \le k$$$, and for all $$$x_i < j \le r_i$$$, $$$p_j > k$$$.
Suppose we have a binary sequence $$$a$$$ of $$$0$$$s and $$$1$$$s which satisfies the condition. How do we get the maximum inversions in $$$p$$$ corresponding to the sequence $$$a$$$?
Claim 1 : The number of inversions in the optimal $$$p$$$ corresponding to $$$a$$$ is $$$\frac{n \cdot (n - 1)}{2}$$$ minus the number of pairs $$$(i, j)$$$ such that $$$i < j, a_i = 0, a_j = 1$$$.
This is trivially an upper bound : if $$$i < j$$$ with $$$a_i = 0$$$ and $$$a_j = 1$$$, then $$$p_i \le k < p_j$$$, so the pair $$$(i, j)$$$ cannot be an inversion.
We now prove that we can construct this. Note that $$$k$$$ is forced to be the number of $$$0$$$s.
- if $$$a_i = 0$$$, assign $$$p_i = \sum\limits_{j > i} [a_j = 0] + 1$$$
- if $$$a_i = 1$$$, assign $$$p_i = \sum\limits_{j > i} [a_j = 1] + k + 1$$$
You can easily notice that all other pairs become an inversion here.
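To make the construction concrete, here is a tiny illustrative snippet implementing exactly these two rules (the function name and example values are mine, just for this blog):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Build a permutation attaining the bound from a 0/1 string a
// (a direct transcription of the two assignment rules above).
vector<int> build(const vector<int> &a) {
    int n = a.size(), k = count(a.begin(), a.end(), 0);
    vector<int> p(n);
    int zeros_after = 0, ones_after = 0;
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] == 0) p[i] = ++zeros_after;   // values 1..k, decreasing from left to right
        else p[i] = k + (++ones_after);        // values k+1..n, decreasing from left to right
    }
    return p;
}

int main() {
    // Example: a = 0 1 1 0 0 1 gives p = 3 6 5 2 1 4.
    for (int v : build({0, 1, 1, 0, 0, 1})) cout << v << " ";
}
```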
Just with the previous observation on the maximum number of inversions corresponding to a binary sequence $$$a$$$, we can write a dynamic programming, as in the editorial, and we are done. However, please continue reading as I find my approach cool and it can solve the problem with higher constraints.
Suppose we fixed the rest of the sequence $$$a$$$, and we must decide the partition point within an interval $$$(l_i, r_i)$$$, i.e. we must decide an $$$x$$$ such that $$$1 \le x < (r_i - l_i + 1)$$$ and make the first $$$x$$$ elements in the interval $$$(l_i, r_i)$$$ equal to $$$0$$$ and the rest equal to $$$1$$$.
Let's figure out the cost function $$$f(x)$$$ which represents the number of pairs of indices which cannot be inversions with respect to the specific choice of $$$0$$$ and $$$1$$$.
Let $$$b$$$ denote the number of $$$0$$$s in the prefix $$$[a_1, \dots, a_{l_i - 1}]$$$ and $$$c$$$ denote the number of $$$1$$$s in the suffix $$$[a_{r_i + 1}, \dots, a_n]$$$ (reminder that we have fixed the entire sequence except for the interval $$$(l_i, r_i)$$$). Let $$$d = r_i - l_i + 1$$$.
Then, the cost function is $$$f(x) = x \cdot (d - x) + b \cdot (d - x) + c \cdot x$$$ (pairs inside the interval, pairs of a $$$0$$$ before the interval with a $$$1$$$ inside it, and pairs of a $$$0$$$ inside it with a $$$1$$$ after it),
and we need to find the minimum value of this function varying $$$x$$$ from $$$1$$$ to $$$d - 1$$$.
This is a quadratic function in $$$x$$$ with a negative coefficient of $$$x^2$$$. This means it has a central maximum and decreases on both sides of it, depending on the distance from that maximum. With some casework on the position of the maximum, we can conclude that over the allowed interval $$$1 \le x \le (d - 1)$$$, this function is minimized at one of the $$$2$$$ endpoints, i.e. either $$$1$$$ or $$$(d - 1)$$$.
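Concretely, comparing the two endpoints of $$$f$$$ (my own quick check) : the $$$x \cdot (d - x)$$$ term is equal at $$$x = 1$$$ and $$$x = d - 1$$$, and $$$f(1) - f(d - 1) = \big(b \cdot (d - 1) + c\big) - \big(b + c \cdot (d - 1)\big) = (d - 2) \cdot (b - c)$$$. So $$$x = 1$$$ is at least as good whenever $$$b \le c$$$, and $$$x = d - 1$$$ is at least as good otherwise.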
This means that it is optimal to have either $$$a_{l_i} = 0$$$, and $$$a_j = 1$$$ for all $$$(l_i < j \le r_i)$$$, OR $$$a_{r_i} = 1$$$ and $$$a_j = 0$$$ for all $$$(l_i \le j < r_i)$$$.
Applying this for each interval iteratively, we know that an Optimal Solution satisfies this criterion for all of its intervals.
Let's call an interval $$$P$$$-type if it has exactly one $$$0$$$ (i.e. $$$x = 1$$$), otherwise $$$S$$$-type (exactly one $$$1$$$, i.e. $$$x = d - 1$$$).
Claim 2 : There exists a prefix of intervals which will be $$$P$$$-type and then the suffix of intervals will be $$$S$$$-type
Remember our cost function $$$f(x) = x \cdot (d - x) + b \cdot (d - x) + c \cdot x$$$.
Note that $$$x \cdot (d - x)$$$ is a fixed value, as it gives the same result for both $$$x = 1$$$ and $$$x = (d - 1)$$$. Let's remove this term from the cost function since it is a constant, and also subtract the constant $$$b \cdot d$$$.
Thus, we can modify the cost function to $$$g(x) = (c - b) \cdot x$$$.
Now, it is obvious that we should select an interval to be $$$P$$$-type if $$$(c - b) \ge 0$$$, and $$$S$$$-type otherwise.
Further, we can easily note that for intervals $$$I_j$$$ and $$$I_k$$$ where $$$k > j$$$, the coefficient $$$c$$$ is smaller for $$$I_k$$$ than it is for $$$I_j$$$, and the coefficient $$$b$$$ is larger for $$$I_k$$$ than it is for $$$I_j$$$. This implies that $$$(c - b)$$$ decreases for $$$I_k$$$ as compared to $$$I_j$$$ since $$$k > j$$$.
Thus, we can easily see that a prefix of intervals will have $$$(c - b) \ge 0$$$ and will be $$$P$$$-type. The rest will be $$$S$$$-type.
With this in hand, we can finally solve the problem. Since $$$O(m^2)$$$ works, you can directly bruteforce the prefix of $$$P$$$ type intervals and calculate value according to that.
Returning to the actual problem, there may exist some elements which belong to no intervals. However, they are easy to handle : assign them values maximising the number of inversions. More precisely, check whether assigning them $$$0$$$ or $$$1$$$ gets you more inversions.
Let $$$S$$$ be the set of elements which belong to no intervals. The above argument works only because in the Optimal Solution for all $$$x < y$$$ where $$$x, y$$$ in $$$S$$$, $$$a_x \ge a_y$$$ will hold and they will be an inversion. Otherwise, this argument might not have been correct as the optimum choices might depend on the choices of the other elements in $$$S$$$ themselves.
The above part felt a bit confusing, so I tried to rewrite what it says exactly :
We first fix the $$$a_i$$$ values of those indices $$$i$$$ which belong to an interval, call this set $$$T$$$.
Then, we traverse over the elements of $$$S$$$ (which belong to no interval), and we use the greedy idea of trying both $$$0$$$ or $$$1$$$ and maximising inversions of these elements in $$$S$$$ with respect to $$$T$$$.
It is easy to see that the elements in $$$S$$$ will be assigned $$$1, 1, .... 1, 0, 0, .... 0$$$ and all pairs within $$$S$$$ will be inversions. Thus, this approach works.
Thus, we can still solve this version in $$$O(m^2)$$$.
This can be extended to the hard version of the problem. Further, this can probably be optimized to $$$O(m)$$$ by quickly calculating the change in answers from prefix of $$$i$$$ intervals being $$$P$$$-type to $$$(i + 1)$$$ intervals being $$$P$$$-type.
The Quadratic function being minimized at the boundaries is a more general observation and can be applied to E2 except now, the boundaries are no longer $$$1$$$ and $$$d - 1$$$. I will leave you to figure out how to extend the solution. There are $$$2$$$ hints below.
Consider the graph where there exists an edge $$$j - k$$$ if intervals $$$I_j$$$ and $$$I_k$$$ have non-empty intersection. Work with the connected components of this graph.
Reduce the problem to this problem :
- The intervals are disjoint (use the previous hint for this)
- Each interval has $$$2$$$ values associated with it, $$$lb_i$$$ and $$$ub_i$$$ such that the valid range of $$$x_i$$$ (the partition point of $$$(l_i, r_i)$$$) must lie in $$$[lb_i, ub_i]$$$.
- Solving subproblems is very useful. Even this entire problem is essentially a subproblem of the main problem. We can find a solution to this and then generalize/extend to there.
- Reducing the problem into something simpler, for example we went from a permutation to a binary array.
- Finding upper/lower bounds on the answer, and then constructing the bounds, proving they are the answer.
- Analyzing the nature of the Optimal Solution. We did it at least twice here
- Algebra, observing stuff about cost functions.
4. About Stress Testing
Something which a lot of low rated participants do wrong : They don't stress test.
There have even been contests where I stress tested $$$3$$$ problems. I tend to make a lot of errors while writing code and it is not easy to always catch those errors with manually reading or debugging. Learning the skill of stress testing is insanely useful (and you don't need any tools for it, I do it all in the same piece of code without taking much time).
This was the first problem (I think) that I had stress tested during a contest : 1854A1 - Dual (Easy Version). I was still done within $$$12$$$ minutes, and it took me less than $$$5$$$ mins to stress test and fix my bug (this is an easy example though, as I didn't have to write a brute). My bug only occurred when all elements of the array were equal and negative; I don't think I would have caught that manually.
1987F1 - Interesting Problem (Easy Version) In the initial submission to this problem, I missed so many if-cases. I was very careless. But I managed to stress test and avoid more WAs. I think it took me 3-4 iterations of finding bugs with the stress test, fixing them and running the stress test again before my code was actually correct.
I highly recommend stress testing for you. It is simpler than you think : all you need is a brute function, and a generator. I usually replace my input function with a generator, and use a lambda function for the brute. It takes barely $$$5$$$ minutes to setup for most problems.
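For reference, here is roughly what such a setup can look like (a generic template, not my exact contest code; `fast_solve` and `brute` are stubs you would fill in for your problem):

```cpp
#include <bits/stdc++.h>
using namespace std;
mt19937 rng(12345);

// Model solution and brute force for some hypothetical problem:
// both are stubs here and should return the same answer on every input.
long long fast_solve(const vector<int> &a) { /* your real solution */ return 0; }
long long brute(const vector<int> &a)      { /* slow but obviously correct */ return 0; }

int main() {
    for (int iter = 1; iter <= 100000; iter++) {
        // Small random tests: tiny sizes and values make failing cases easy to read.
        int n = rng() % 6 + 1;
        vector<int> a(n);
        for (auto &x : a) x = rng() % 10 + 1;

        long long got = fast_solve(a), expected = brute(a);
        if (got != expected) {
            cout << "Mismatch on: ";
            for (auto x : a) cout << x << " ";
            cout << "\ngot " << got << ", expected " << expected << "\n";
            return 0;
        }
    }
    cout << "All tests passed\n";
}
```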
Dominater069 orz Thanks for the CodeChef rounds and for writing comments in your solutions. I hope to see more educational blogs and contests from you.
bro stop wasting your times writing this blogs and instead practice some problems.
we are desperately waiting for first LGM from bharat
I dont think it is possible sorry to say mate
Thanks for the blog. Also congrats Dominater069 on reaching 2800+ and LGM performance this contest.
thanks
Thanks for the great blog! I think it's worth mention that writing a few example on paper and making conjecture/observation based on them is also extremely useful and used a lot when solving ad-hoc problems.
I guess Radewoosh was incorrect on this one /j
It is intentionally done due to that :)
If you look deeper into the blog, i always mention "problems" (because i was lazy to change them) only the headings have questions
you don't solve questions you answer them
you might become the first "question" lgm
I really wanted you to do this <3. As they always target lower-rated individuals.
God, it's like staring in the mirror...
Thats high praise! Thanks
this strategies are really very helpful
can you elaborate what you mean by "Nature of an Optimal Solution" ?
what are some properties that at least one best solution of the problem may contain?
for example, read the solution to q4 and its what can we learn section
I think for Q1, you meant to write $$$\sum \frac{1}{k_i} < 1$$$.
Other than that, this is a nice blog as I can see how a top competitor approaches a problem. Please do more of these write ups!
Thanks, fixed
It would be nice if someone added this blog to the catalog
done
That's of great help! Hope it can be read by more!
https://usaco.guide/ is a great resource.
Learn that topic by reading blogs and solving some problems on them? Idk what else advice you're looking for.
Thanks for the guide!!
Thank you for your helpful blog! I still have a questions(perhaps there are many others who have the same question): how do you complete a stress test in 5 minutes? Can you explain it in more detail?
278921664 This is what I do
Of course time taken depends on the time to write the brute mainly, and so longer brute => longer time.
“Great work!@Dominater069 I really appreciate the valuable time you put into this blog. I believe that combining all these with other great blogs in the catalog will help us improve to a certain extent.”
"(You can do it the other way too, for example dp(i,j)= the maximum value of |x| under the constraint that $$$∑freq_x$$$ is j ).
If you understood the question, transitions are pretty simple to come up with(no its not) and are left as an exercise."
How? Don't we need the value of the dp ( |x| ) to do the transitions?
Thanks a lot for the tips, also where can I find a more detailed guide for stress testing? thanks again :)
https://ali-ibrahim137.github.io/competitive/programming/2020/08/23/Stress-Testing.html
thanks , we need more of this !
I see most of your explanations are heavily relied on mathematics(algebra). If you don't mind can you please suggest us the topics that we need to master in mathematics in order to come up and write the solutions as you did in the blog. Thank you!!
Dominater069
Thanks for this really helpful blog!
Waiting for, How to Create Questions Dominater069 Version
Please write a blog about stress testing