So, I was solving this problem today which I couldn't solve on my own. Then I saw the unofficial editorial by neal. Neal has given an amazing algorithm to solve this problem but unfortunately no proof was provided. So, I took upon myself to prove that algorithm. I would also like to ask neal to please check my proof if it is correct. Thank you. Here is my proof.
Editorial for 1400E — Clear the Multiset
Algorithm
You can only do two actions for an optimal solution. Let's say we want to find out answer for $$$[L,R]$$$.
So, first option is to take $$$R-L+1$$$ and second is: $$$a_m + f(l_1,r_1) + ... + f(l_z,r_z)$$$. So, the answer is
$$$f(L,R) = min(R-L+1,a_m + f(l_1,r_1) + ... + f(l_z,r_z))$$$, where $$$a_m$$$ is the minimum element in the segment $$$[L,R]$$$ and $$$[l_i,r_i]$$$ is the range which does not contain $$$a_m$$$ and each element of this range is decreased by $$$a_m$$$. Now, the main point for this algorithm is that when we perform operation 1, then we do so $$$a_m$$$ times and not less. Otherwise we don't perform operation 1.
Proof
First of all, we observe that if we don't apply operation 1 to $$$[L,R]$$$ and apply operation 1 to any $$$[l_i,r_i]$$$ at any later point of time, then it is better that we instead apply operation 1 to $$$[L,R]$$$ first and apply remaining operations to $$$[l_i,r_i]$$$. This will not make our answer worse. So, we can say that we only have two choice, first is that we apply operation 1 to $$$[L,R]$$$ and then recurse for remaining $$$[l_i,r_i]$$$ or we don't apply operation 1 at all and take our answer as $$$R-L+1$$$.
At first, let's consider the case when all elements are equal (valued $$$x$$$) in our array. For this case, our answer will be, $$$ans = min(x,len)$$$, where $$$len$$$ is the length of array. Let's prove this with contradiction, let's say we decrease all elements by $$$y$$$ which is smaller than $$$x$$$. So the total will be $$$len + y$$$ (after applying operation 2 on each element) which is greater than $$$len$$$. So, we either decrease the whole array by $$$x$$$ (when $$$x \leq len$$$) or we just take the length of whole array (when $$$len<x$$$).
Now, we will prove the above algorithm with the help of induction.
Step 1
Base Case: When we have two types of elements (valued $$$x$$$ and $$$x+m$$$).
First we will prove that taking elements of equal value (valued $$$m$$$) in one block will give us better value than if we take those equal elements in multiple blocks.
Let $$$len$$$ be length of the whole block and $$$len_1, len_2, ..., len_z$$$ be the lengths of multiple blocks. So, $$$len = len_1 + len_2 + ... + len_z $$$. Now, the answer for one block will be $$$ans_{one} = min(m,len)$$$.
Now, if $$$ans_{one} = len$$$, then it would mean that $$$ans_{multiple} = len_1 + len_2 + ... + len_z$$$, where $$$ans_{multiple}$$$ is the answer for multiple blocks. So, in this case, $$$ans_{one} = ans_{multiple}$$$.
And when $$$ans_{one}=m$$$, then $$$ans_{multiple} = w.m + len_{left}$$$, where $$$w$$$ is the number of blocks for which $$$m$$$ is smaller than the length of block and $$$len_{left}$$$ is the sum of lengths of remaining blocks for which $$$m$$$ is greater than or equal to the length of block.
So, we can finally conclude that $$$ans_{one} \leq ans_{multiple}$$$.
From now on, we will only consider $$$ans_{one}$$$, which will denote the worst case.
So, to the final proof for base case. Let $$$t$$$ the number of elements which are equal to $$$x$$$, so $$$len-t$$$ elements will be equal to $$$x+m$$$. Now, we consider two cases, first is when we decrease all elements by $$$x$$$, and second is when we decrease all elements by $$$y$$$ which is smaller than $$$x$$$.
For case 1,
$$$ans_{case1} = min(len,x+min(len-t,m))$$$.
For case 2,
$$$ans_{case2} = min(len,t+y+min(len-t,x+m-y))$$$
Let's analyze the second term in case 2, i.e. $$$t+y+min(len-t,x+m-y)$$$. When $$$min(len-t,x+m-y) = len-t$$$, our term will become $$$t + y + len - t$$$ equals to $$$ len + y$$$ which is worse than $$$len$$$ (since our answer cannot exceed $$$len$$$). And when $$$min(len-t,x+m-y) = x + m - y$$$, our term will become $$$ x + t + m$$$ which is worse than $$$x+m$$$ in the first case, since $$$x-y \geq 1$$$. Hence $$$ans_{case1}$$$ is always better.
So, we can conclude that if we apply operation 1, then we should apply it $$$a_m$$$ times, otherwise we should not apply it at all.
Step 2
Case for k+1 type elements: When we have elements whose values equals to $$$x, x+m_1, x+m_1 + m_2, ..., x+m_1+m_2+...+m_k$$$. We assume that our answer is:
$$$ans_k = min(len_k, x + W_{k-1})$$$, where $$$len_i$$$ is the length of array when there are $$$i+1$$$ types of elements, and $$$W_{k-1}$$$ is the minimum cost for case of $$$k$$$ types elements.
Step 3
Case for k+2 type elements: When we have elements whose values equals to $$$x, x+m_1, ..., x+ m_1 + ... + m_{k+1}$$$.
Now, we consider two cases, first is when we decrease all elements by $$$x$$$, second is when we decrease all elements by $$$y$$$ which is smaller than $$$x$$$. Here, $$$t$$$ is number of occurrences of $$$x$$$ in the array.
$$$ans_{case1} = min(len_{k+1}, x + min(len_k, m_1 + W_{k-1}))$$$
$$$ans_{case2} = min(len_{k+1}, t + y + min(len_k,x + m_1 - y + W_{k-1})) $$$
Now, let's analyze the second term in case 2. In one case, it will become, $$$ t + y + len_k$$$, we know that $$$t+ len_k = len_{k+1}$$$, so it will become, $$$ y + len_{k+1}$$$, which is worse than $$$len_{k+1}$$$. And in the other case, it will become $$$ x + m_1 + W_{k-1} + t$$$ which is worse then second term in case 1, i.e. $$$ x + m_1 + W_{k-1}$$$.
So, we can see that $$$ans_{case1}$$$ is always better than $$$ans_{case2}$$$.
So, we can finally conclude that it is always better to either decrease all terms by $$$a_m$$$ or not decrease at all.