a_little_cute's blog

By a_little_cute, history, 10 months ago, In English

Can we solve problem 1920C - Partitioning the Array faster than O(n*d(n)) ≈ O(n^(4/3))?

I'm getting interested in it.

  • Vote: I like it
  • +16
  • Vote: I do not like it

»
10 months ago, # |
  Vote: I like it +1 Vote: I do not like it

Of course.

»
10 months ago, # |
  Vote: I like it +8 Vote: I do not like it

How could we solve the problem in O(N * d(N))? For each divisor we need to calculate the GCD of the differences of each index [0..k-1] which would result in a final complexity of O(N log N d(N)) if I am not mistaken.

  • »
    »
    10 months ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

    Taking the gcd (using the Euclidean algorithm) of elements in an array of size $$$n$$$ and max element $$$x$$$ takes $$$O(n + \log x)$$$ time.

    • »
      »
      »
      10 months ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      Ah, my ignorance of basic Number Theory shows once again. Maybe I could have passed with much better time if I implemented the GCD function using the Euclidean algorithm instead of using __gcd, because I got close 1.5 seconds runtime in-con (which seems around N log N d(N)).

      Upd: According to https://stackoverflow.com/questions/24981380/implementation-of-the-in-built-gcd-method-in-c, __gcd is already implemented using the Euclidean algorithm... Maybe the constants are blowing up runtime

      • »
        »
        »
        »
        10 months ago, # ^ |
        Rev. 2   Vote: I like it +11 Vote: I do not like it

        Your solution is $$$O(n\log n\ d(n))$$$ since you're sorting all elements for every divisor.

        • »
          »
          »
          »
          »
          10 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          You both are completely correct, it appears that I may have been a bit too drunk off of the buzzer-beater AC on C to properly analyze my solution.

      • »
        »
        »
        »
        10 months ago, # ^ |
          Vote: I like it +5 Vote: I do not like it

        The most recent gcd implementation (at least in the libstdc++ version I have) uses binary GCD, which doesn't enjoy this nice property. Binary GCD in general is pretty fast, though.

        (Here's a reference for why the claim I mentioned earlier is true).

        I went through your implementation, and it looked like you were doing some other operations and sorting too, which could have had an extra overhead.

        For references, here are some links to my implementations with Euclidean GCD (300ms), inbuilt GCD (650ms), inbuilt GCD without pragmas (700ms). The fact that there is a slight speedup with pragmas might mean that the binary GCD part is getting sped up using specific BMI/BMI2 instructions.

        • »
          »
          »
          »
          »
          10 months ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          btw __gcd is probably always euclidean

          std::gcd241431575 ~950 ms

          __gcd: — 241568039 ~400 ms

          • »
            »
            »
            »
            »
            »
            10 months ago, # ^ |
              Vote: I like it +5 Vote: I do not like it

            I find this quite interesting, since std::__gcd is libstdc++'s own implementation of GCD, and a divergence between std::gcd and std::__gcd points to either forgetting to update it, or some other more important reason that they might have found.

        • »
          »
          »
          »
          »
          10 months ago, # ^ |
          Rev. 2   Vote: I like it +10 Vote: I do not like it

          Can you explain how my most naive GCD implementation runs in 187ms? I can only imagine that comparing elements in different order breaks faster in particular test suite.

        • »
          »
          »
          »
          »
          10 months ago, # ^ |
            Vote: I like it +10 Vote: I do not like it

          Thanks for the analysis, I have removed the mentioned overheads and now it indeed performs much better (259 ms).

»
10 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Can you please explain how calculating d(n) is O(n^(1/3))?

»
10 months ago, # |
  Vote: I like it +5 Vote: I do not like it

I think the problem is solvable in something like $$$n\cdot\log(n)^2$$$, more precisely, it's $$$\sigma(n)\cdot\log(n)\cdot\omega(n)$$$, where $$$\sigma(n)$$$ is the sum of divisors of $$$n$$$ and $$$\omega(n)$$$ is the number of prime factors of $$$n$$$.

First, we come up with a function $$$g(x,d) \ \forall \ d \ |\ n, 0\le x < d$$$ (there are $$$\sigma(n)$$$ such values we need to calculate), and we'll calculate these values in decreasing order of the divisors.

We define

$$$g(x,d) = \gcd(a_x - a_{d+x}, \cdots, a_{n-2\cdot d+x} - a_{n-d+x})$$$

Now, notice that if $d = p \cdot q$ for some prime factor $$$p$$$ of $$$d$$$, then

$$$g(y,q) = \gcd(a_y - a_{y+q}, \cdots , a_{p-2\cdot q + y} - a_{p - q + y}, g(y, d), \cdots , g(p - q + y, d))$$$

which takes $O(p)$ time to compute for a given $$$y$$$. Since there are $$$q$$$ such values, the total computation time is $$$O(p\cdot q) = O(d)$$$ So, when iterating over the divisors in decreasing order, after we've computed the values of $$$g(x,d)$$$ for all $$$0 \le x < d$$$, we'll be doing this computation for all prime factors of $$$d$$$, which makes the overall complexity $$$O(d\cdot \omega(d))$$$. Since we're iterating in descending order of divisors, when we reach a given divisor, we would have already computed its set of values, and we can do the similar thing for all its prime factors.

Total complexity hence turns out to be $$$\sum_{d \ | \ n} O(d \cdot \omega(d)) = \sigma(n) * \omega(n)$$$. I believe the complexity could even be brought further down if we are somehow smarter about which prime factors to consider, as one divisor would have been calculated multiple times in this approach.