Yandex.Algorithm 2020 D — my solution

#	User	Rating
1	jiangly	3898
2	tourist	3840
3	orzdevinwang	3706
4	ksun48	3691
5	jqdai0815	3682
6	ecnerwala	3525
7	gamegame	3477
8	Benq	3468
9	Ormlis	3381
10	maroonrk	3379

#	User	Contrib.
1	cry	168
2	-is-this-fft-	165
3	Dominater069	160
3	Um_nik	160
5	atcoder_official	159
6	djm03178	157
7	adamant	153
8	luogu_official	150
9	awoo	149
10	TheScrasse	146

This is a continuation of https://codeforces.me/blog/entry/81365#comment-719271. I'll use what we proved there.

Let's assign all consecutive '0'-s to the cloest non-'0' before them. We get a sequence of tokens between which we may add "**" arbitrarily.

How should we pick the first base? There are special cases if the whole string is "102" and "1d0...01" where we don't split at all. Otherwise, the first base won't be "102" or "1d0...01", only a 1-digit or 2-digit number, "d0...01", "1d0...0" or "d0...0".

If the first token is $$$1$$$, it'll be "1d0...0". We can't add more tokens and we obviously don't want $$$1$$$ as a base. From now on we can eliminate "1d0...0" as an option.
If the second token is at least $$$10$$$, the first token forms the first base. This also follows directly from Step 2.
If the second token is at least $$$2$$$ and we include it in the first base:
- The first token must be one digit.
- If the exponent (rest of the string) has at least two digits, then $$$e \ge 10$$$, the first token/digit is $$$a \ge 2$$$, the second token/digit is $$$b \ge 2$$$ and we need $$$a^{b^e} \lt (10a+b)^e$$$.
- Since $$$10a+b \le a^5$$$ (proof by bruteforce) and $$$b^e \ge 5e$$$ in this case, then $$$a^{b^e} \ge a^{5e} \ge (10a+b)^e$$$.
- The only possible case is when the exponent is just one digit, i.e. one token. Then we only have three one-digit tokens and this can be solved easily since there's at most one option greater than $$$9^{99}$$$. If you're worried about time complexity, precompute.
- In all other cases, the first base is just the first token.
Now the second token is $$$1$$$. Do we put it at the end of the first base or the start of the second base? Probably the second, right?

Once again, let's use the rapid increase of the remaining exponent. We set two conditions that cover the vast majority of the remaining cases:

There are at least three digits in the third+ tokens.
The remaining tokens don't form only one (the last) base. They form an exponential "h**r", where $$$r \ge 2$$$ (Step 1) and "h" is the second base, with length $$$k$$$.

We want to prove that putting the second token in the second base instead would be better, i.e. that $$$(10a+1)^{h^r} \le a^{(10^k + h)^r}$$$. Since $$$(10a+1)^{h^r} \le (21a/2)^{h^r}$$$, and $$$a \ge 2$$$, we can simplify that to $$$\left((10^k+h)^r-h^r\right)/h^r \gt \ln (21/2) / \ln a$$$ and further to $$$(10^k/h+1)^r \ge \frac{\ln (21/2)}{\ln 2} + 1$$$.

Since $$$h \lt 10^k$$$, it obviously holds for $$$r \ge 3$$$, where the left side is greater than $$$8$$$ and the right side smaller than $$$5$$$.
For $$$r = 2$$$, we can further simplify to $$$h/10^k \le 91/100$$$, i.e. $$$h \le 91 \cdot 10^{k-2}$$$.
Let's keep in mind that "h" must be a valid base that's not at the end and has at least two digits, so it's either an exactly 2-digit number ($$$k = 2$$$), "d0...01", "1d0...0" or "d0...0".
The last three options obviously satisfy $$$h \le 91 \cdot 10^{k-2}$$$.
For $$$92 \le h \le 99$$$, though, we can immediately see that $$$92^2 \lt 9^{22}$$$ and $$$99 \gt 9^3$$$, so these are always suboptimal bases if they aren't at the end of the string.

All that remains is dealing with remaining, quite special, cases.

First, if our string is "d0...01c", where $$$c$$$ and $$$d$$$ are non-zero digits and there are $$$k$$$ '0'-s in the first token (which isn't "1"), then "d0...0**1c" and "d0...01**c" are the only options (we eliminate the case with no "**" since the first token isn't "1" and there are $$$3$$$ non-zero digits). We're putting the token "1" in the second base if $$$(d \cdot 10^k)^{10+c} \ge (d \cdot 10^{k+1} + 1)^c$$$.

Let's simplify it to $$$(d \cdot 10^k)^{10} \ge \left(10 + \frac{1}{d \cdot 10^{k+1}}\right)^c$$$.
The right side is $$$\le \left(10 + \frac{1}{20}\right)^c \lt 11 \cdot 10^8$$$.
For $$$k \ge 1$$$, the left side is $$$\ge 10^10$$$, so the inequality holds.
For $$$k = 0$$$, we can bruteforce. We don't get a clean rule for when "1" goes to the first base, but since we're dealing with 3-digit strings, we don't need it.

If our string is "d0...01cg", where we just added an arbitrary digit $$$g$$$, then "d0...0**1cg", "d0...0**1c**g", "d0...01**cg" and "d0...01**c**g" are the only options.

Let's start with leaving $$$k = 0$$$ to bruteforce/precomputation just like before. That shit's ugly.
We can eliminate "d0...01**c**g", it's worse than "d0...0**1c**g". Proof just above.
The same proof works for $$$(d \cdot 10^k)^{100+10c+g} \ge (d \cdot 10^{k+1} + 1)^{10c+g}$$$: $$$(d \cdot 10^k)^{100} \ge 10^{100} \gt \left(10 + \frac{1}{20}\right)^{99} \gt \left(10 + \frac{1}{20}\right)^{10c+g}$$$. We eliminate "d0...01**cg" too.
Now we know that the first base is just the first token.

Finally, if there are at least three digits in the third+ tokens, but they can form one base, then it's a suffix "102", "1c0...01", "c00...01", "1c00...0" or "c000...0" (the whole string minus the prefix "d0...01", where the first token isn't "1").

With "102", we can form "d0...0**110**2", i.e. $$$(d \cdot 10^k)^{12,100}$$$. Among other options, only "d0...01**102" isn't completely obviously bad, and since $$$\left(10+1/(d \cdot 10^k)\right)^{102} \le (10+1/2)^{102} \lt 2^{12,100-102}$$$, that has a smaller value for any valid $$$k$$$ and $$$d$$$. We can throw out this case.
With "1c0...01", the only options are "d0...01**1c0...01" and "d0...0**11**c0...01". Here, $$$(d \cdot 10^{k+1} + 1)^{(10+c) \cdot 10^{l+1} + 1} \lt (d \cdot 10^k)^{11^{c \cdot 10^{l+1} + 1}}$$$ holds since it simplifies to

$$$\left(10 + 1/(d \cdot 10^k)\right)^{(10+c) \cdot 10^{l+1} + 1} \lt (d \cdot 10^k)^{11^{c \cdot 10^{l+1} + 1} - (10+c) \cdot 10^{l+1} - 1}\,;$$$

the exponent on the right side is $$$11^{c \cdot 10^{l+1} + 1} - (10+c) \cdot 10^{l+1} - 1 \gt \left((10+c) \cdot 10^{l+1} + 1\right) \cdot 10$$$ (prove that it's increasing, by induction on $$$c$$$ and $$$l$$$), so even for $$$d \cdot 10^k = 2$$$, the right side is at least $$${2^{10}}^{(10+c) \cdot 10^{l+1} + 1}$$$. We can also throw out this case.

With "1c00...0" (at least one '0'), only "d0...0**11**c00...0" and "d0...01**1c0...0" are options. Since there's at least one '0', this is also bad and the proof is almost identical to the one just above — we get an exponent that's at least $$$11^{10}$$$ instead of $$$11^{11}$$$, but that's still a huge number and everything works the same way.
With "c000...0", only "d0...0**1c000...0" and "d0...01**c000...0" are options. The number of '0'-s at the end doesn't matter since multiplying the exponent by $$$10$$$ is here just taking the $$$10$$$-th power of the previous exponential's value, for both options.
- It's easy to prove that if $$$d \cdot 10^k \ge 11$$$, then $$$(d \cdot 10^k)^{10+c} \ge (d \cdot 10^{k+1}+1)^c$$$ since $$$(d \cdot 10^k)^{10} \ge \left(10+1/(d \cdot 10^k)\right)^c$$$, where the left side is $$$\ge 11^10$$$ and the right side $$$\le 11^c$$$.
- If $$$d \cdot 10^k = 10$$$, then $$$10^{10} \ge \left(10+1/10\right)^9 \ge \left(10+1/10\right)^c$$$ still holds.
- We can see that for $$$k \ge 1$$$, the second token ("1") doesn't go to the first base. For $$$k = 0$$$, we bruteforce all remaining cases.
With "c00...01", this suffix must stay together, so the only options are "d0...0**1c0...01" and "d0...01**c0...01". This looks similar to the previous case. For $$$k \ge 1$$$, we'd still expect $$$(d \cdot 10^k)^{(10+c) \cdot 10^{l+1} + 1} \ge (d \cdot 10^{k+1} + 1)^{c \cdot 10^{l+1} + 1}$$$, i.e.

$$$(d \cdot 10^k)^{10^{l+2}} \ge \left(10 + 1/(d \cdot 10^k)\right)^{9 \cdot 10^{l+1} + 1} \ge \left(10 + 1/(d \cdot 10^k)\right)^{c \cdot 10^{l+1} + 1} \,;$$$

for $$$d \cdot 10^k \gt 10$$$, it obviously holds, and for $$$= 10$$$, $$$10^{10^{l+1}-1} \ge \left(1 + 1/100\right)^{10^{l+2}-10} \gt \left(1 + 1/100\right)^{9 \cdot 10^{l+1} + 1}$$$ still holds since $$$l \ge 1$$$ and $$$10 \ge (1+1/100)^{10}$$$.

Finally, if it's between "d**1c0...01" and "d1**c0...01", the first option is better if $$$10 \ln d / \ln(10+1/d) \ge c + 10^{-l-1}$$$; even though any $$$l \ge 1$$$ is possible, the last term is so small ($$$\le 1/100$$$) that it doesn't matter — we decide based on $$$c$$$ and $$$d$$$ in the same way as with "c000...0".

This is a huge proof (I didn't find a shorter one that I'd be sure of), so I'm sure to have made some small mistakes at least. Of course, this isn't how you solve the problem during a contest — you can precompute more cases, use intuition to decide that all bases except the last few must follow some nice rules, just pray that you didn't miss anything and still submit... or test your solution on small diverse cases. Nevertheless, this ensures that no failing test cases are missed.

	Rev.	Lang.	By	When	Δ	Comment
	en2		Xellos	2020-11-10 15:49:29	3718
	en1		Xellos	2020-11-10 01:59:09	8576	Initial revision (published)

History