Two days ago, the Div. 3 round (Codeforces Round 938 (Div. 3)) suffered from severe issues with hacks, because problem G (1955G - GCD on a grid) did not properly validate the sum of $$$nm$$$ in hacks, which was specified as at most $$$2 \cdot 10^5$$$ in the statement. I won't ask why that happened; that is not a constructive discussion. Instead, I will discuss something very interesting about the task itself.
During that incident, I kept wondering: there has to be a way to solve this without the constraint on $$$n \cdot m$$$, right? Of course, $$$7\times 10^8$$$ bytes of input is impossible anyway, but if we ignore that, $$$10^8$$$ is not a very "dreaded" number in this age of optimizations. There had to be a way to do this.
Then the idea came to me.
Before we cover the solution, let us review a few basic facts from number theory — they are not strictly necessary for understanding the solution, but they will be helpful. Every integer can be seen as a point on a grid of infinitely many dimensions: each dimension represents a prime, and the coordinate along that dimension is the exponent of that prime. If we restrict the domain to divisors of some integer $$$x$$$, only $$$O(\log x)$$$ dimensions remain, because an integer has only $$$O(\log x)$$$ distinct prime factors. On this grid, $$$\gcd$$$ and $$$\text{lcm}$$$ become much more tractable to deal with, because they are simply the coordinate-wise $$$\min$$$ and $$$\max$$$. The same goes for divisibility: $$$a$$$ is divisible by $$$b$$$ if and only if each exponent of $$$a$$$ is no less than the corresponding exponent of $$$b$$$.
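To make the "exponent grid" picture concrete, here is a small illustrative sketch (not part of the solution; the helper names `factorize` and `gcd_via_exponents` are mine) computing a gcd as the per-prime minimum of exponents:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Factor x by trial division: prime -> exponent (the "coordinates" of x).
map<long long, int> factorize(long long x) {
    map<long long, int> f;
    for (long long p = 2; p * p <= x; ++p)
        while (x % p == 0) { f[p]++; x /= p; }
    if (x > 1) f[x]++;
    return f;
}

// gcd computed as the coordinate-wise min of the exponent vectors.
long long gcd_via_exponents(long long a, long long b) {
    auto fa = factorize(a), fb = factorize(b);
    long long g = 1;
    for (auto [p, e] : fa) {
        int common = min(e, fb.count(p) ? fb[p] : 0);
        while (common--) g *= p;
    }
    return g;
}
```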
Now to the solution. The grid of divisors of $$$a_{1,1}$$$ has $$$O(\log a)$$$ dimensions and $$$d(a)$$$ points, so using the same idea by which one flattens a multidimensional array into one dimension, we can map each divisor (point) to an index in a one-dimensional array. This lets us use a bitset over divisors, so each cell in the DP table can comfortably store the status of every divisor.
Let us define a bitmask $$$mask_d$$$ for each divisor $$$d$$$ as the union of (the bits of) all divisors of $$$d$$$. Let $$$mult_p$$$ be the multiplier on prime $$$p$$$ used while flattening the multidimensional grid (from the facts above, one can see this is essentially the product of $$$\text{exponent}+1$$$ over all smaller primes). Then $$$mask_1=\texttt{0000...0001}$$$, and $$$mask_d=mask_{(d/p)}|(mask_{(d/p)} \ll mult_p)$$$ for any prime $$$p$$$ dividing $$$d$$$. A preprocessed sieve gives us such a $$$p$$$ for every value, so this can be computed nicely as well.
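As a concrete sketch of this construction (my own toy version: a plain `uint64_t` stands in for the bitset, so it assumes $$$d(A) \le 64$$$, and trial division replaces the sieve for brevity; the names $$$mult$$$ and $$$mask$$$ follow the notation above, everything else is hypothetical):

```cpp
#include <bits/stdc++.h>
using namespace std;

struct DivisorMasks {
    vector<long long> divs;        // the divisor sitting at each bit index
    map<long long, uint64_t> mask; // mask_d = union of bits of divisors of d
};

DivisorMasks build_masks(long long A) {
    // Factor A; primes in increasing order with their exponents.
    vector<pair<long long, int>> f;
    long long x = A;
    for (long long p = 2; p * p <= x; ++p)
        if (x % p == 0) {
            int e = 0;
            while (x % p == 0) { x /= p; ++e; }
            f.push_back({p, e});
        }
    if (x > 1) f.push_back({x, 1});

    // mult_p = product of (exponent + 1) over all smaller primes.
    vector<long long> mult(f.size());
    long long cur = 1;
    for (size_t k = 0; k < f.size(); ++k) { mult[k] = cur; cur *= f[k].second + 1; }

    // Enumerate divisors so that the divisor placed at bit index i is exactly
    // the one whose flattened index (sum of exponent * mult_p) equals i.
    DivisorMasks dm;
    dm.divs = {1};
    for (size_t k = 0; k < f.size(); ++k) {
        vector<long long> nxt;
        long long pe = 1;
        for (int e = 0; e <= f[k].second; ++e) {
            for (long long d : dm.divs) nxt.push_back(d * pe);
            pe *= f[k].first;
        }
        dm.divs = nxt;
    }

    // mask_1 = ...0001, mask_d = mask_{d/p} | (mask_{d/p} << mult_p) for p | d.
    // Processing divisors in increasing order guarantees mask_{d/p} is ready.
    vector<long long> order = dm.divs;
    sort(order.begin(), order.end());
    dm.mask[1] = 1;
    for (long long d : order)
        if (d > 1)
            for (size_t k = 0; k < f.size(); ++k)
                if (d % f[k].first == 0) {
                    dm.mask[d] = dm.mask[d / f[k].first]
                               | (dm.mask[d / f[k].first] << mult[k]);
                    break;
                }
    return dm;
}
```

For $$$A=12$$$ this yields the bit order $$$1,2,4,3,6,12$$$, and for instance $$$mask_6$$$ has exactly the bits of $$$1,2,3,6$$$ set.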
Now we assume WLOG that all values in $$$a$$$ are divisors of $$$a_{1,1}$$$ (if a value isn't, we can take its GCD with $$$a_{1,1}$$$ to make it so). Let $$$b_{i,j}$$$ be the $$$mask$$$ corresponding to the value of $$$a_{i,j}$$$. Then the DP transition becomes as follows —

$$$dp_{i,j} = (dp_{i-1,j} \mid dp_{i,j-1}) \mathbin{\&} b_{i,j}$$$

where $$$\mid$$$ and $$$\&$$$ are bitwise OR and AND on the bitsets.
And of course, the base condition is $$$dp_{1,1}=b_{1,1}$$$.
After computing $$$dp_{n,m}$$$, observe that if $$$mask_d$$$ for some $$$d$$$ is completely contained in $$$dp_{n,m}$$$, then there exists a path whose GCD is divisible by $$$d$$$. So we test this for each $$$d$$$, and take the maximum $$$d$$$ for which the containment holds.
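Putting the pieces together, here is a self-contained toy version of the whole method (my own sketch, not the actual submission): plain 64-bit words stand in for bitsets, so it assumes $$$d(a_{1,1}) \le 64$$$, and the masks are built directly as unions of divisor bits rather than via the shift recurrence, since any fixed divisor-to-bit assignment works for the DP and the final check.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Returns the maximum possible GCD over all down/right paths.
long long solve(vector<vector<long long>>& a) {
    int n = a.size(), m = a[0].size();
    long long A = a[0][0];
    // Reduce every cell to a divisor of a_{1,1} by taking a gcd.
    for (auto& row : a) for (auto& v : row) v = __gcd(v, A);
    // Enumerate the divisors of A in O(sqrt A).
    vector<long long> divs;
    for (long long d = 1; d * d <= A; ++d)
        if (A % d == 0) { divs.push_back(d); if (d != A / d) divs.push_back(A / d); }
    sort(divs.begin(), divs.end());
    unordered_map<long long, int> idx;
    for (int i = 0; i < (int)divs.size(); ++i) idx[divs[i]] = i;
    // mask[i] = bits of all divisors of divs[i].
    vector<uint64_t> mask(divs.size(), 0);
    for (int i = 0; i < (int)divs.size(); ++i)
        for (int j = 0; j <= i; ++j)
            if (divs[i] % divs[j] == 0) mask[i] |= 1ULL << j;
    // dp_{i,j} = (dp_{i-1,j} | dp_{i,j-1}) & b_{i,j}, with dp_{1,1} = b_{1,1}.
    vector<vector<uint64_t>> dp(n, vector<uint64_t>(m, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            uint64_t b = mask[idx[a[i][j]]];
            if (i == 0 && j == 0) { dp[i][j] = b; continue; }
            uint64_t in = 0;
            if (i > 0) in |= dp[i - 1][j];
            if (j > 0) in |= dp[i][j - 1];
            dp[i][j] = in & b;
        }
    // d is achievable iff mask_d is fully contained in dp_{n,m}.
    long long ans = 1;
    for (int i = 0; i < (int)divs.size(); ++i)
        if ((dp[n - 1][m - 1] & mask[i]) == mask[i]) ans = max(ans, divs[i]);
    return ans;
}
```

On the grid $$$\begin{pmatrix}12 & 6\\ 9 & 12\end{pmatrix}$$$ the two paths have GCDs $$$6$$$ and $$$3$$$, and the sketch returns $$$6$$$.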
The time complexity analysis is simple. It takes $$$\mathcal{O}(\sqrt{a})$$$ time to enumerate the divisors of $$$a_{1,1}$$$, and processing the $$$mask$$$-s takes $$$\mathcal{O}(\frac{d(a)^2}{w})$$$ time, so we spend $$$\mathcal{O}(\sqrt{a}+\frac{d(a)^2}{w})$$$ time per test case. Then there are $$$\mathcal{O}(nm)$$$ transitions in the DP, each taking $$$\mathcal{O}(\frac{d(a)}{w})$$$ time, so the DP takes $$$\mathcal{O}(nm\frac{d(a)}{w})$$$ time. Also, since we take a $$$\gcd$$$ for each cell, that costs $$$\mathcal{O}(nm\log(a))$$$. Finally, testing the divisibility condition for each $$$d$$$ takes $$$\mathcal{O}(\frac{d(a)^2}{w})$$$ again, but that is already counted in the per-test-case cost, so we are fine. The final time complexity is $$$\mathcal{O}(t(\sqrt{a}+\frac{d(a)^2}{w})+\sum{nm}({\log(a)+\frac{d(a)}{w}}))$$$. Because $$$\frac{d(a)}{w}$$$ is such a small constant (precisely $$$4$$$), it should scale well to much larger values of $$$nm$$$, and possibly even run with no constraint on the sum of $$$nm$$$ at all, i.e. $$$\sum{nm}=10^8$$$ in the worst case.
255932465 is the accepted submission. $$$a_{i,j}=720\,720$$$ was used for all cells in every case, because that is the worst case for $$$d(a)$$$ (though the $$$\gcd$$$ part might be worse for other values). Only the values of $$$n$$$ and $$$m$$$ were read per test case, to minimize the effect of being IO-bound. All benchmark results are from custom invocations. The results were as follows.
Case | Runtime |
---|---|
$$$t=100,n=100,m=100$$$ | $$$46\text{ms}$$$ |
$$$t=1000,n=100,m=100$$$ | $$$217\text{ms}$$$ |
$$$t=10000,n=100,m=100$$$ | $$$1358\text{ms}$$$ |
$$$t=100,n=300,m=300$$$ | $$$171\text{ms}$$$ |
$$$t=1,n=1000,m=1000$$$ | $$$92\text{ms}$$$ |
$$$t=1,n=2000,m=2000$$$ | $$$187\text{ms}$$$ |
$$$t=1,n=3000,m=3000$$$ | $$$359\text{ms}$$$ $$$^\dagger$$$ |
$$$t=1,n=4000,m=4000$$$ | MLE $$$^\dagger$$$ |
$$$^\dagger$$$ The stack overflowed, so I had to move the array `dp` to static (global) scope.
If you have any questions, please ask in the comments! Thank you for reading this far.
Why not use BFS, with the target set being every common divisor of $$$a(0, 0)$$$ and $$$a(n-1, m-1)$$$, checking for each whether there's a path? That's much easier, though a bit slow, but if you optimize it you can avoid TLE. I only realised this right after the contest, when my computer came back after crashing for 2 hours.
The BFS takes $$$\mathcal{O}(nmd(a))$$$, and there isn't a very nice way to optimize it further with bitsets; at least none that I know of. If you mean using $$$\gcd(a_{1,1},a_{n,m})$$$ instead of $$$a_{1,1}$$$, I tried it (255928562), and the difference was marginal.
I just replaced `vector<vector<bool>> vis(n, vector<bool>(m))` inside the BFS function with `bitset<10005> vis` and got AC, thank you for the inspiration. If you are wondering: 255935755 is the TL using the 2d-vector, 255935445 is the AC using the bitset.

To be honest, it's so stupid. Isn't it the problem setters' responsibility to catch this kind of silly "optimization" with pretests? Who would think of this issue after passing pretests? The only reason 2/3 of the ACs disappeared after systests is not the difficulty of the problem itself but the weak pretests.
nice
orz sir it was a great opportunity working with you
Segment trees, sparse table, square root should all just submit to bitset supremacy at this point. It is the master data-structure to rule over all the subservient data-structures.
Agreed. As the representative of the bitset community, I suggest that we ban all non-bitset data structures. That way, we will be able to focus on solving the problems with a bitset rather than developing a huge data structure which will ofc fail to work correctly.
You can get rid of MLE by processing row by row (example)
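For the record, a minimal sketch of that row-by-row idea (my own illustration: `uint64_t` stands in for the bitset, and `b` is the grid of per-cell divisor masks $$$b_{i,j}$$$). Only one row of DP states is kept, cutting the memory from $$$\mathcal{O}(nm\frac{d(a)}{w})$$$ down to $$$\mathcal{O}(m\frac{d(a)}{w})$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Row-by-row DP: dp[j] holds the state of cell (i, j) for the current row i;
// until it is overwritten it still holds row i-1, so no full table is needed.
uint64_t dp_last_cell(const vector<vector<uint64_t>>& b) {
    int n = b.size(), m = b[0].size();
    vector<uint64_t> dp(m, 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            uint64_t in;
            if (i == 0 && j == 0) in = ~0ULL;  // start cell: keep b_{1,1} as-is
            else {
                in = 0;
                if (i > 0) in |= dp[j];        // dp[j] is still row i-1 here
                if (j > 0) in |= dp[j - 1];    // dp[j-1] was updated for row i
            }
            dp[j] = in & b[i][j];
        }
    return dp[m - 1];  // == dp_{n,m}
}
```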
"Simply use bitset" -- https://codeforces.me/blog/entry/53168
Nice find, although I’d say that referring to it as ‘bitset where you least expect’ is a bit of an overstatement. Basically, the intended solution is to compute for each $$$d$$$ a 0/1 dp indicating whether or not you reached cell $$$(i, j)$$$ going through only multiples of $$$d$$$. You just parallelized the computations over all $$$d$$$ (as the required dp tables are binary matrices).
Sorry, that was intentionally a clickbait title