Montgomery Multiplication Explained (Fast Modular Multiplication)

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

Hi CF! During this past weekend I was reading up on Montgomery transformation, which is a really interesting and useful technique to do fast modular multiplication. However, all of the explanations I could find online felt very unintuitive for me, so I decided to write my own blog on the subject. A big thanks to kostia244, nor, nskybytskyi and -is-this-fft- for reading this blog and giving me some feedback =).

Fast modular multiplication

Let $$$P=10^9+7$$$ and let $$$a$$$ and $$$b$$$ be two numbers in $$$[0,P)$$$. Our goal is to calculate $$$a \cdot b \, \% \, P$$$ without ever actually calling $$$\% \, P$$$. This is because calling $$$\% \, P$$$ is very costly.

If you haven't noticed that calling $$$\% \, P$$$ is really slow, then the reason you haven't noticed it is likely because the compiler automatically optimizes away the $$$\% \, P$$$ call if $$$P$$$ is known at compile time. But if $$$P$$$ is not known at compile time, then the compiler will have to call $$$\% \, P$$$, which is really really slow.

Montgomery reduction of $$$a \cdot b$$$

It turns out that the trick to calculate $$$a \cdot b \, \% \, P$$$ efficently is to calculate $$$a \cdot b \cdot 2^{-32} \, \% \, P$$$ efficiently. So the goal for this section will be to figure out how to calculate $$$a \cdot b \cdot 2^{-32} \, \% \, P$$$ efficently. $$$a \cdot b \cdot 2^{-32} \, \% \, P$$$ is called the Montgomery reduction of $$$a \cdot b$$$, denoted by $$$\text{m_reduce}(a \cdot b)$$$.

Idea (easy case)

Suppose that $$$a \cdot b$$$ just happens to be divisible by $$$2^{32}$$$. Then $$$(a \cdot b \cdot 2^{-32}) \, \% \, P = (a \cdot b) \gg 32$$$, which runs super fast!

Idea (general case)

Can we do something similar if $$$a \cdot b$$$ is not divisible by $$$2^{32}$$$? The answer is yes! The trick is to find some integer $$$m$$$ such that $$$(a \cdot b + m \cdot P)$$$ is divisible by $$$2^{32}$$$. Then $$$a \cdot b \cdot 2^{-32} \, \% \, P = (a \cdot b + m \cdot P) \cdot 2^{-32} \, \% \, P = (a \cdot b + m \cdot P) \gg 32$$$.

So how do we find such an integer $$$m$$$? We want $$$ (a \cdot b + m \cdot P) \,\%\, 2^{32} = 0$$$ so $$$m = (-a \cdot b \cdot P^{-1}) \,\%\, 2^{32}$$$. So if we precalculate $$$(-P^{-1}) \,\%\, 2^{32}$$$ then calculating $$$m$$$ can be done blazingly fast.

Montgomery transformation

Since the Montgomery reduction divides $$$a \cdot b$$$ by $$$2^{32}$$$, we would like some some way of multiplying by $$$2^{32}$$$ modulo $$$P$$$. The operation $$$x \cdot 2^{32} \, \% \, P$$$ is called the Montgomery transform of $$$x$$$, denoted by $$$\text{m_transform}(x)$$$.

The trick to implement $$$\text{m_transform}$$$ efficently is to make use of the Montgomery reduction. Note that $$$\text{m_transform}(x) = \text{m_reduce}(x \cdot (2^{64} \, \% \, P))$$$, so if we precalculate $$$2^{64} \, \% \, P$$$, then $$$\text{m_transform}$$$ also runs blazingly fast.

Montgomery multiplication

Using $$$\text{m_reduce}$$$ and $$$\text{m_transform}$$$ there are multiple different ways of calculating $$$a \cdot b \, \% \, P$$$ effectively. One way is to run $$$\text{m_transform}(\text{m_reduce}(a \cdot b))$$$. This results in two calls to $$$\text{m_reduce}$$$ per multiplication.

Another common way to do it is to always keep all integers transformed in the so called Montgomery space. If $$$a' = \text{m_transform}(a)$$$ and $$$b' = \text{m_transform}(b)$$$ then $$$\text{m_transform}(a \cdot b \, \% \, P) = \text{m_reduce}(a' \cdot b')$$$. This effectively results in one call to $$$\text{m_reduce}$$$ per multiplication, however you now have to pay to move integers in to and out of the Montgomery space.

Example implementation

Here is a Python 3.8 implementation of Montgomery multiplication. This implementation is just meant to serve as a basic example. Implement it in C++ if you want it to run fast.

P = 10**9 + 7
r = 2**32
r2 = r * r % P
Pinv = pow(-P, -1, r) # (-P^-1) % r

def m_reduce(ab):
  m = ab * Pinv % r
  return (ab + m * P) // r

def m_transform(a):
  return m_reduce(a * r2)

# Example of how to use it
a = 123456789
b = 35
a_prim = m_transform(a) # mult a by 2^32
b_prim = m_transform(b) # mult b by 2^32
prod_prim = m_reduce(a_prim * b_prim) # divide a' * b' by 2^32
prod = m_reduce(prod_prim) # divide prod' by 2^32
print('%d * %d %% %d = %d' % (a, b, P, prod)) # prints 123456789 * 35 % 1000000007 = 320987587

Final remarks

One important issue that I've so far swept under the rug is that the output of m_reduce is actually in $$$[0, 2 P)$$$ and not $$$[0, P)$$$. I just want end by discussing this issue. I can see two ways of handling this:

Alternative 1. You can force $$$\text{m_reduce}(a \cdot b)$$$ to be in $$$[0, P)$$$ for $$$a$$$ and $$$b$$$ in $$$[0, P)$$$ by adding an if-stament to the output of m_reduce. This will work for any odd integer $$$P < 2^{31}$$$.

Fixed implementation of m_reduce

def m_reduce(ab):
  m = ab * Pinv % r
  y = (ab + m * P) // r
  return y if y < P else y - P

Alternative 2. Assuming $$$P$$$ is an odd integer $$$< 2^{30}$$$ then if $$$a$$$ and $$$b$$$ $$$\in [0, 2 P)$$$ you can show that the output of $$$\text{m_reduce}(a \cdot b)$$$ is also in $$$[0,2 P)$$$. So if you are fine working with $$$[0, 2 P) \vphantom]$$$ everywhere then you don't need any if-statements. Nyaan's github has a nice C++ implementation of Montgomery multiplication using this style of implementation.

Rev.	Кто	Когда	Δ	Комментарий
en8	pajenegod	2022-05-31 23:47:08	0	(published)
en7	pajenegod	2022-05-31 23:34:44	34	Tiny change: ' $[0, 2 P)$ everywhe' -> ' $[0, 2 P) \vphantom]$ everywhe'
en6	pajenegod	2022-05-31 22:35:15	108	Tiny change: '\cdot P) \, \% \, 2^{32} = ' -> '\cdot P) \% 2^{32} = '
en5	pajenegod	2022-05-31 22:21:45	654	Tiny change: 'ssuming $P < 2^{30}$ ' -> 'ssuming $P$ is an odd integer $< 2^{30}$ '
en4	pajenegod	2022-05-31 21:12:43	1098	Tiny change: 'stament:\n```\ndef' -> 'stament:\n\n```\ndef'
en3	pajenegod	2022-05-31 18:27:10	24
en2	pajenegod	2022-05-31 17:54:47	3	Tiny change: 'Hi CF!\nDuring t' -> 'Hi CF!\n\nDuring t'
en1	pajenegod	2022-05-31 17:46:24	3716	Initial revision (saved to drafts)