Today, I want to talk a bit about the relationship between two well-known methods for calculating the $$$n$$$-th term of a linear recurrence: generating functions and matrix exponentiation. More specifically, I want to explore how these two things are related to one another or how they can be thought of as variants of one another.
This blog shouldn't be treated as a tutorial. If you know these algorithms, I doubt that I can teach you to solve some new problem in this blog. Rather, I'm kind of thinking out loud. I think it is a good idea to try to reflect and find connections between different things we have learned. This kind of thinking tends to strengthen the understanding of both methods and makes it easier to improve upon them to solve problems.
And I'm sure that what I'm about to present is already quite well-known to some people ;). But when I realized this it was cool to me, so I want to share it.
Problem. You have two sequences $$$b_1, b_2, \ldots, b_m$$$ (where $$$b_m \ne 0$$$) and $$$c_0, c_1, \ldots, c_{m - 1}$$$. The infinite sequence $$$a_0, a_1, a_2, \ldots$$$ is defined as
Calculate $$$a_n$$$.
As a simple example, you can consider the following simple variant: the Fibonacci sequence is defined by $$$F_0 = 0$$$, $$$F_1 = 1$$$ and $$$F_i = F_{i - 1} + F_{i - 2}$$$ for any $$$i \ge 2$$$. Calculate $$$F_n$$$.
Matrix exponentiation method
Fibonacci case. Notice that for any $$$i \ge 2$$$, we have
Thus, we can calculate the $$$n$$$-th term by calculating
using binary exponentiation and reading the answer from the bottom entry of the resulting vector.
General case. Define the matrix $$$B$$$ as
Notice that for any $$$i \ge m$$$,
Thus we can calculate
using binary exponentiation and read the answer from the bottom entry in the resulting vector.
Generating function method, case without multiple roots
Fibonacci case. First, recall that the generating function of the Fibonacci sequence, $$$\mathcal{F}(x) = \sum_{i = 0}^\infty F_i x^i$$$, can be expressed using the closed form formula $$$\mathcal{F}(x) = \frac{-x}{x^2 + x - 1}$$$.
The polynomial $$$x^2 + x - 1$$$ can be factored as $$$x^2 + x - 1 = (x - \phi)(x - \psi)$$$; you can calculate $$$\phi$$$ and $$$\psi$$$ using the quadratic formula, they are $$$\phi = \frac{-1 + \sqrt{5}}{2}$$$ and $$$\psi = \frac{-1 - \sqrt{5}}{2}$$$.
We want to represent $$$\mathcal{F}(x)$$$ as $$$\frac{A}{\phi - x} + \frac{B}{\psi - x}$$$, where $$$A$$$ and $$$B$$$ are numbers. To find $$$A$$$ and $$$B$$$, we note that
must hold. Thus we must have $$$A+ B = 1$$$ and $$$A \psi + B \phi = 0$$$. This is a system of linear equations, the solutions are $$$A = \frac{\phi}{\phi - \psi}$$$ and $$$B = \frac{\psi}{\psi - \phi}$$$.
We know that $$$\frac{1}{a - x} = \frac{1}{a} + \frac{x}{a^2} + \frac{x^2}{a^3} + \cdots$$$ for any $$$a \ne 0$$$. Thus
So, to get the $$$n$$$-th term of the Fibonacci sequence, all you have to do is calculate $$$\frac{A}{\phi^{n + 1}} + \frac{B}{\psi^{n + 1}}$$$.
General case. Let's turn our sequences into generating functions: define $$$\mathcal{A}(x) = \sum_{i = 0}^\infty a_i x^i$$$, $$$\mathcal{B}(x) = \sum_{i = 1}^m b_i x^i$$$ and $$$\mathcal{C}(x) = \sum_{i = 0}^{m - 1} c_i x^i$$$. The problem statement basically says that $$$\mathcal{A}(x) = \mathcal{A}(x) \mathcal{B}(x) + \mathcal{C}(x)$$$. Rearranging this, we get
We must devise a fast way to get the $$$n$$$-th coefficient of this infinite series. I'll also introduce the notation $$$\mathcal{B}^-(x) = \mathcal{B}(x) - 1$$$.
We now proceed as before. The polynomial $$$\mathcal{B}^-(x)$$$ can be factored as
Again, I assume we are working in a field where the $$$\lambda_i$$$ exist and if not, we can always move to a field extension. However, in this blog, we assume that $$$\lambda_i$$$ are all distinct.
As before, we want to find the partial fraction decomposition of $$$\frac{\mathcal{C}(x)}{\mathcal{B}^-(x)}$$$: to find $$$k_1, k_2, \ldots, k_m$$$ such that
To do that, multiply out the polynomials $$$b_m k_1 (x - \lambda_2) \cdots (x - \lambda_m)$$$, $$$b_m (x - \lambda_1) k_2 \cdots (x - \lambda_m)$$$ and so on, and add them all together. You need to find values of $$$k_i$$$ such that the resulting polynomial equals $$$\mathcal{C}(x)$$$. For each $$$i = 0, 1, \ldots, m - 1$$$, you have an equation of the form $$$\alpha_1 k_1 x^i + \alpha_2 k_2 x^i + \cdots + \alpha_m k_m x^i = c_i x^i$$$. Divide it by $$$x^i$$$ and you have a linear equation. In other words, you have a system of $$$k$$$ linear equations and $$$k$$$ variables; this can be solved by Gaussian elimination. Solve it for $$$k_i$$$. It can be proven that it always has an unique solution (this can be derived from Bezout's lemma for polynomials).
Now we have shown that
Thus, we can calculate $$$a_n$$$ with the formula
The polynomial $$$\mathcal{B}^-(x)$$$ has a coefficient of 1 at $$$x^0$$$, so we know that $$$\lambda_i \ne 0$$$ for all $$$i$$$ — thus there will be no annoying issues with division by zero here.
The cool bit I want to show you, i.e. their relationship
The reason for writing all of the above was basically to make sure we are all on the page about what the methods exactly are, to agree on notation etc. The point of the blog is this next bit.
Generating functions are intriguing. It seems completely outlandish that you can do anything useful at all by converting such sequences into series — objects that on the surface, seem analytic. And we really aren't even converting them to functions. We are converting them to formal power series, which don't make sense as functions since they may not converge. So, it's all just a really strange notation and we have been working with sequences all along. But the really weird part is that this strange notation actually gave us so much insight. When I see such solutions, I wonder if it is really useful at all and whether we are not just overcomplicating things with this notation. If I try to think about the same solution without working with generating functions, the part where we move to the partial fraction decomposition is roughly the point where I can't intuitively understand what's going on with the sequence anymore.
Well, but let's try to somehow describe what we did anyway. We calculated $$$k_1, \ldots, k_m$$$ by solving
for some matrix $$$T$$$. Because the equation has exactly one solution, $$$T$$$ must be invertible and we can write
After that, we multiplied each of the $$$k_i$$$ by $$$\lambda^{-n - 1}$$$. In matrix form, you could say we calculated $$$\Lambda^{n + 1} T^{-1} c$$$, where
Do you see it? Here, let me clean that up a bit. If we write $$$U = \Lambda T^{-1}$$$, then the vector we effectively calculated was $$$\Lambda^n U c$$$. After that, we took the sum of all the elements of the vector.
Now, let's make the simple concept of summing up elements of a vector more complicated. Consider the matrix $$$U^{-1} = (\Lambda T^{-1})^{-1} = T \Lambda^{-1}$$$. From the construction of $$$T$$$ we can see that the last row of $$$T$$$ is of the form
This is because the $$$i$$$-th column of the matrix $$$T$$$ is the polynomial $$$\frac{b_m (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_m)}{x - \lambda_i}$$$, with the last row corresponding to the free term.
After multiplying $$$T$$$ with $$$\Lambda^{-1}$$$ from the right, the $$$i$$$-th column is multiplied by $$$\lambda_i$$$, giving the last row the form
From Viète's formula and the fact that $$$\lambda_i$$$ are the roots of $$$\mathcal{B}^-(x)$$$, we know that $$$(-1)^m \lambda_1 \lambda_2 \cdots \lambda_m$$$ is the free term of the polynomial $$$\mathcal{B}^-(x)$$$, that is, $$$-1$$$. Putting it together, we see that the last row of $$$U^{-1}$$$ is in fact
In other words, the generating function method calculates $$$U^{-1} \Lambda^n U c$$$ and returns the bottom entry of the resulting vector. The actual computation in the generating function method pretty closely mirrors this matrix expression, with the caveat that instead of calculating $$$Uc$$$ we solved a system of linear equations, and instead of multiplying with $$$U^{-1}$$$, we only multiplied with the last row because that was all we needed.
The expression $$$U^{-1} \Lambda^n U$$$ heavily reminds of similar matrices and diagonalization. Recall that matrices $$$X$$$ and $$$Y$$$ are said to be similar if there exists some invertible matrix $$$Z$$$ such that $$$X = Z^{-1} Y Z$$$. Also, recall that if all eigenvalues of a matrix are distinct, then it is similar to a diagonal matrix whose diagonal elements are its eigenvalues. Returning to the matrix exponentiation method, what happens if we try to transform the matrix $$$B$$$ this way? To start off, what are its eigenvalues? Well, we usually calculate eigenvalues by factoring the characteristic polynomial. It turns out that the characteristic polynomial of $$$B$$$ is (up to sign) $$$\mathcal{B}^R(x) = x^m - b_1 x^{m - 1} - b_2 x^{m - 2} - \cdots - b_m$$$.
Not exactly $$$\mathcal{B}^-(x)$$$, but very similar — it is basically the same, except the order of the coefficient is reversed. What are its roots? You might have a hunch already, let's test it. Suppose that $$$\mathcal{B}^-(\lambda) = 0$$$. Then
Because all roots of $$$\mathcal{B}^-(x)$$$ are nonzero, we can divide the last equation by $$$\lambda^m$$$ and conclude that the eigenvalues of $$$B$$$ are precisely the reciprocals of the roots of $$$\mathcal{B}^-(x)$$$. That also means that they are all different. Hence, there exists an invertible matrix $$$V$$$ such that $$$B = V^{-1} \Lambda V$$$. Here $$$\Lambda$$$ has the same meaning as above.
In the matrix exponentiation method we calculated $$$B^n c$$$ and read the bottom entry of the resulting vector. This is equivalent to calculating $$$(V^{-1} \Lambda V)^n c = V^{-1} \Lambda^n V c$$$ and reading the bottom entry of the resulting vector. Now the relationship between the generating function and matrix exponentiation methods is hopefully clear. The generating function method calculates the same thing, but it optimizes the exponentiation part by diagonalizing $$$B$$$ and applying a few "ad hoc optimizations". In particular, the partial fraction decomposition corresponds to diagonalization.