Hi everyone!
There is an article on cp-algorithms about how to compute $$$n!$$$ modulo prime number $$$p$$$. Today I was wondering, how to do it if we need to compute modulo $$$p^k$$$ rather than just $$$p$$$. Turns out one can do it efficiently if $$$p$$$ is small. Great thanks to Endagorion for sharing it with me!
Task formulation and outline of result
To clarify the task a bit, our ultimate goal here is to be able to compute e.g. binomial coefficients modulo $$$p^k$$$. Thus, what we really need is to represent $$$n! = p^t a$$$, where $$$\gcd(a, p)=1$$$, and then report $$$t$$$ and $$$a$$$ modulo $$$p^k$$$. This is sufficient to also compute $$$\binom{n}{r}$$$ modulo $$$p^k$$$.
We will show that, assuming that polynomial multiplication of size $$$n$$$ requires $$$O(n \log n)$$$ operations, and assuming that arithmetic operations modulo $$$p^k$$$ take $$$O(1)$$$, we can find $$$t$$$ and $$$a$$$ in $$$O(d^2 + dk\log k)$$$, where $$$d=\log_p n$$$. It requires $$$O(pk\log^2 k)$$$ pre-computation that takes $$$O(pk \log k)$$$ memory.
Motivational example
Xiaoxu Guo Contest 3 — Binomial Coefficient. Given $$$1 \leq n,k \leq 10^{18}$$$, find $$$\binom{n}{k}$$$ modulo $$$2^{32}$$$.
There are also some projecteuler problems that require it, including with several queries of distinct $$$n$$$ and $$$k$$$.
Algorithm with polynomials
Let $$$k=\lfloor \frac{n}{p} \rfloor$$$, then we can represent $$$n!$$$ as
So, there is a part contributed by numbers divisible by $$$p$$$, which can be reduced to computing $$$k!$$$, and a part contributed by numbers not divisible by $$$p$$$. Let $$$n = a_0 + a_1 p + \dots + a_d p^d$$$, then the later can be represented as follows:
To compute it quickly, we can define a family of polynomial $$$P_{i,a}(x)$$$ for $$$0 \leq a \leq p$$$ such that
so the value of the factorial would be represented as
and expanding $$$k!$$$ it then rewrites into
which simplifies as
Now, what would it take us to use this setup? First of all, notice that $$$P_{i,a}(x)$$$ can be computed from one another:
Note that for shorter and more consistent implementation, this recurrent formula also mostly works for $$$i=1$$$ if we put $$$P_{0, p}(x) = x+1$$$, but for $$$P_{1,p}$$$ we should go up to $$$p-2$$$ instead of $$$p-1$$$. We should also note that for $$$P_{i,a}(x)$$$, we only care for coefficients up to $$$x^{\lfloor x/i \rfloor}$$$, as the larger ones are divisible by $$$p^k$$$.
This allows to compute $$$P_{i,a}(x)$$$ in $$$O(\frac{pk \log k}{i})$$$ for all $$$a$$$ for a given $$$i$$$. Over all $$$i$$$ from $$$1$$$ to $$$k$$$ it sums up to $$$O(pk \log^2 k)$$$ time and $$$O(p k \log k)$$$ memory. Then, evaluating all the polynomials for any specific $$$a$$$ requires $$$O(d^2 + dk \log k)$$$ operations, where $$$d = \log_p n$$$.
As it requires some manipulations with polynomials, I implemented it in Python with sympy just as a proof of concept:
Algorithm with $$$p$$$-adic logarithms and exponents
The algorithm above requires some heavy machinery on polynomials. I also found out an alternative approach that allows to compute $$$t$$$ and $$$a$$$ for any given $$$n$$$ divisible by $$$p$$$ in $$$O(pk)$$$. It doesn't rely on polynomial operations, except for Lagrange interpolation.
Let $$$n=pt+b$$$, where $$$0 \leq b < p$$$, then we can represent its factorial as
We can further rewrite it as
Let's learn how to compute the product
as with it we can represent the factorial as
We can take a $$$p$$$-adic logarithm from the product to get
We can precompute sums of $$$i^{-z}$$$ for each $$$z$$$ up to $$$k$$$ and $$$b$$$ up to $$$p-1$$$ in $$$O(pk)$$$, and we can find the sum of $$$j^z$$$ for $$$j$$$ from $$$0$$$ to $$$t$$$ in $$$O(z)$$$ via Lagrange interpolation. Therefore, with $$$O(pk)$$$ precomputation, we can compute $$$\log A(b, t)$$$ in $$$O(k^2)$$$, which then allows to find $$$n!$$$ in $$$O(dk^2)$$$, where $$$d = \log_p n$$$. I implemented it in Python, as a proof of concept, but I didn't bother with faster sums over $$$i$$$ and $$$j$$$:
Note: Due to inherent properties of $$$p$$$-adic logarithms and exponents, the algorithm only works with $$$p>2$$$, and it might return $$$-n!$$$ instead with $$$p=2$$$. This is because $$$\exp \log z = -z$$$ with $$$p=2$$$ when $$$z$$$ has remainder $$$3$$$ modulo $$$4$$$, and it should be tracked separately.
Some other approaches
There is also a blog by prabowo, and a scholarly article by Andrew Granville. I'm not sure whether they talk about the same algorithms as described here.