On linear recurrences and the math behind them

Revision en15, by adamant, 2022-02-22 15:40:16

Hi everyone!

There are already dozens of blogs on linear recurrences, why not make another one? In this article, my main goal is to highlight the possible approaches to solving linear recurrence relations, their applications and implications. I will try to derive the results with different approaches independently from each other, but will also highlight similarities between them after they're derived.

Definitions

Def. 1. An order $$$d$$$ homogeneous linear recurrence with constant coefficients (or linear recurrence) is an equation of the form

$$$ F_n = \sum\limits_{k=1}^d a_k F_{n-k}. $$$

Def. 2. In the equation above, the coefficients $$$a_1, \dots, a_d \in R$$$ are called the recurrence parameters,

Def. 3. and a sequence $$$F_0, F_1, \dots \in R$$$ is called an order $$$d$$$ linear recurrence sequence.


The most common task with linear recurrences is, given initial coefficients $$$F_0, F_1, \dots, F_{d-1}$$$, to find the value of $$$F_n$$$.
Example 1. A famous Fibonacci sequence $$$F_n = F_{n-1} + F_{n-2}$$$ is an order 2 linear recurrence sequence.

Example 2. Let $$$F_n = n^2$$$. One can prove that $$$F_n = 3 F_{n-1} - 3 F_{n-2} + F_{n-3}$$$.

Example 3. Moreover, for $$$F_n = P(n)$$$, where $$$P(n)$$$ is a degree $$$d$$$ polynomial, it holds that

$$$ F_n = \sum\limits_{k=1}^{d+1} (-1)^{k+1}\binom{d+1}{k} F_{n-k}. $$$

If this fact is not obvious to you, do not worry as it will be explained further below.


Finally, before proceeding to next sections, we'll need one more definition.
Def. 4. A polynomial
$$$ A(x) = x^d - \sum\limits_{k=1}^d a_k x^{d-k} $$$

is called the characteristic polynomial of the linear recurrence defined by $$$a_1, \dots, a_d$$$.

Example 4. For Fibonacci sequence, the characteristic polynomial is $$$A(x) = x^2-x-1$$$.

Matrix approach

For linear recurrence sequences it holds that

$$$ \mathbf{F}_{n} = \begin{bmatrix}F_{n+d-1} \\ F_{n+d-2} \\ \dots \\ F_{n}\end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \dots & a_{d-1} & a_d \\ 1 & 0 & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix} \begin{bmatrix}F_{n+d-2} \\ F_{n+d-3} \\ \dots \\ F_{n-1}\end{bmatrix} =

\left(\begin{bmatrix} a_1 & a_2 & \dots & a_{d-1} & a_d \\ 1 & 0 & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix} \right)^{n} \begin{bmatrix}F_{d-1} \\ F_{d-2} \\ \dots \\ F_{0}\end{bmatrix} = A^n \mathbf{F}_0. $$$

This representation allows to find $$$F_n$$$ in $$$O(d^3 \log n)$$$ with binary matrix exponentiation.

Improving the complexity of this solution is possible with the following theorem:


Theorem 1 (Cayley–Hamilton theorem). Every square matrix over a commutative ring satisfies its own characteristic polynomial.
For the matrix $$$A$$$ used above, it can be proven that its characteristic polynomial is $$$(-1)^dA(x)$$$, thus
$$$ A^d = \sum\limits_{k=1}^d a_k A^{d-k}. $$$

For the purpose of calculating $$$F_n$$$ it means that we may find $$$b_0, b_1, \dots, b_{d-1}$$$ such that

$$$ x^n \equiv b_0 + b_1 x + \dots + b_{d-1} x^{d-1} \mod{A(x)}, $$$

and this equivalence would also hold if we substitute $$$x$$$ with $$$A$$$. This also implies that

$$$ F_n = (\mathbf F_n)_d = (A^n \mathbf F_0)_d= \sum\limits_{k=0}^{d-1} b_k \left(A^k \mathbf{F}_0\right)_d = \sum\limits_{k=0}^{d-1} b_k F_k, $$$

thus allowing to compute $$$F_n$$$ in $$$O(d \log d \log n)$$$ using the binary lifting modulo $$$A(x)$$$ (see on CP-Algorithms).


Approach above is intuitive and good if you want just to find the $$$n$$$-th element of the recurrence. However it might be hard to analyze sequences with it (e.g. estimate the asymptotic behavior or find a recurrence for a sum of sequences).

Problem example: CodeChef — Random Number Generator. Just find $$$F_n$$$, where $$$F_k$$$ is a given linear recurrence.

Generating function approach

Consider the generating function $$$G(x) = F_0 + F_1 x + F_2 x^2 + \dots$$$ of the sequence. For $$$G(x)$$$ it holds that

$$$ G(x) = P(x) + \sum\limits_{k=1}^d a_k x^k G(x), $$$

where $$$P(x)$$$ is some polynomial of degree less than $$$d$$$ used to calibrate the fact that $$$F_0, \dots, F_{d-1}$$$ are pretty much arbitrary and might not adhere to the recurrence with lower terms. From this equation we see that

$$$ G(x) = \frac{P(x)}{Q(x)}, $$$

where $$$Q(x)$$$ is a polynomial obtained by reversing the coefficients of $$$A(x)$$$, that is

$$$ Q(x) = 1 - \sum\limits_{k=1}^d a_k x^k = x^d A(x^{-1}). $$$

Side note: If you know $$$F_0, \dots, F_{d-1}$$$ and $$$a_1, \dots, a_d$$$, you can find $$$P(x)$$$ from

$$$ P(x) \equiv G(x) Q(x) \pmod{x^d}, $$$

in other words,

$$$ G(x) = \frac{G(x) Q(x) \bmod x^d}{Q(x)}. $$$

Now, if we allow negative powers near $$$x$$$, the following equation will hold:

$$$ F_n = [x^0] x^n G(x^{-1}), $$$

where $$$[x^k] P(x)$$$ is the coefficient near $$$x^k$$$ in the series $$$P(x)$$$. We should notice that

$$$ A(x) G(x^{-1}) = x^d P(x^{-1}) $$$

has only positive coefficients, so is $$$x^k A(x) G(x^{-1})$$$ and so is all their linear combinations. In other words,

$$$ [x^0] X(x) G(x^{-1}) = [x^0] Y(x) G(x^{-1}) $$$

whenever $$$X(x)-Y(x)$$$ is divisible by $$$A(x)$$$. From this, we conclude that

$$$ F_n = [x^0] (x^n \bmod A)G(x^{-1}) = \sum\limits_{k=0}^{d-1} b_k [x^0] x^k G(x^{-1}) = \sum\limits_{k=0}^{d-1} b_k F_k, $$$

which is essentially the same result as in the matrix approach.

Graeffe's method

This approach was described by Golovanov399 in this comment.

Alternatively to finding $$$x^n \bmod A(x)$$$, you may consider the following transform:

$$$ F_n = [x^n]\frac{P(x)}{Q(x)}=[x^n]\frac{P(x)Q(-x)}{Q(x)Q(-x)}=[x^n]\frac{S_0(x^2)+xS_1(x^2)}{T(x^2)}. $$$

For even $$$n$$$, it is equal to $$$[x^{n/2}]\frac{S_0(x)}{T(x)}$$$ and for odd $$$k$$$ it is equal to $$$[x^{(n-1)/2}]\frac{S_1(x)}{T(x)}$$$.

It works with the same complexity, but has a better constant.

Side note: Same approach could be used to find the polynomial inverse in $$$O(n \log n)$$$:

$$$\frac{1}{Q(x)}=\frac{Q(-x)}{Q(x)Q(-x)}=\frac{Q(-x)}{T(x^2)}.$$$

Closed-form solution

While it could be hard to analyze the asymptotic behavior of $$$F_n$$$ with matrices (you probably can, but you'd need to work with eigenvalues of $$$A$$$), it is way simpler with the generating function representation, thanks to the existence of the partial fraction decomposition:

$$$ \frac{P(x)}{Q(x)} = \sum\limits_{i=1}^m \frac{C_i(x)}{(1-\lambda_i x)^{d_i}}, $$$

where $$$\lambda_1, \dots, \lambda_m$$$ are the roots of $$$A(x)$$$ with multiplicities $$$d_1,\dots,d_m$$$, and $$$C_i(x)$$$ is a polynomial of degree less than $$$d_i$$$. In other words,

$$$ \begin{gather} Q(x) = \prod\limits_{i=1}^m (1-\lambda_i x)^{d_i},\\ A(x) = \prod\limits_{i=1}^m (x-\lambda_i)^{d_i}.\\ \end{gather} $$$

It is generally known that

$$$ \frac{1}{(1-\lambda x)^d} = \sum\limits_{k=0}^\infty \binom{k+d-1}{d-1} \lambda^k x^k, $$$

from which we may conclude that the general solution to the linear recurrence is given by

$$$ F_n = \sum\limits_{i=1}^m p_i(n)\lambda_i^n, $$$

where $$$p_i(n)$$$ is a polynomial of degree less than $$$d_i$$$, determined from $$$F_0, \dots, F_{d-1}$$$.

Example 3 (follow-up): in particular, the formula above means that generating function for $$$p(0), p(1), \dots$$$, where $$$p(x)$$$ is a polynomial of degree $$$d$$$, has a denominator $$$(1-x)^{d+1}$$$. This proves the formula from the example 3.

Problem example: Timus — 2125. Continue the Sequence. You're given a sequence $$$a_1,\dots,a_n$$$ and a number $$$m$$$. Let $$$p(x)$$$ be a minimum-degree polynomial such that $$$p(1)=a_1,\dots, p(n)=a_n$$$. Find $$$p(n+1),\dots,p(n+m)$$$ in $$$O(n \log n)$$$.

Umbral calculus approach

Thanks to ftiasch for letting me know about umbral calculus! This approach makes so much more sense to me now.

Let's do something esoteric. We're given the recurrence

$$$ F_n = \sum\limits_{k=1}^d a_k F_{n-k}. $$$

Consider formal variable $$$b$$$ such that

$$$ b^n = \sum\limits_{k=1}^d a_k b^{n-k}. $$$

Using the formula above we may express any $$$b^n$$$ as a linear combination of $$$b^0, \dots, b^{d-1}$$$ by calculating it modulo $$$A(b)$$$. If, after that, we do a back-substitution of $$$F_k$$$ instead of $$$b^k$$$, we will have a formula that is true for any sequence $$$F$$$ adhering to the recurrence.

What we did is known as the umbral calculus. Formal theory behind it is as follows.

Consider a linear functional $$$L : R[b] \to R$$$ such that

$$$ L(b^n) = F_n. $$$

By the definition of the functional,

$$$ L(b^{n-d} A(b)) = F_n - \sum\limits_{k=1}^d a_k F_{n-k} = 0. $$$

As it is a linear functional, all possible linear combinations of $$$b^k A(b)$$$ would also give zero. In other words,

$$$ L(X)-L(Y)=0 $$$

when $$$X(b)-Y(b)$$$ is divisible by $$$A(b)$$$. In other words, $$$\langle A(b) \rangle \subset \ker L$$$, where $$$\langle A(b) \rangle$$$ is the ideal generated by $$$A(b)$$$.

In particular, it is true for $$$X(b) = b^n$$$ and $$$Y(b) = b^n \bmod A(b)$$$, leading us to the $$$b^n \bmod A(b)$$$ solution again.

Side note: Comparing this approach with the genfunc one, you may notice that $$$L(P)=[x^0]P(x)G(x^{-1})$$$.

This is the approach you could've seen earlier in TLE's blog. Special thanks to islingr for a very clear explanation.

I wonder if it can be used to derive closed-form solution...

Shift operator approach

It is of little use in competitive programming, but I like it because of its genericity.

Let $$$\Delta : R^\mathbb{N} \to R^\mathbb{N}$$$ be a linear operator defined on sequences over $$$R$$$, such that $$$(\Delta a)_n = a_{n+1}$$$.

With this operator, the recurrence rewrites as

$$$ \Delta^d F = \sum\limits_{k=1}^d a_k \Delta^{d-k} F, $$$

or, which is equivalent, as

$$$ A(\Delta) F = \left(\Delta^d - \sum\limits_{k=1}^d a_k \Delta^{d-k}\right)F = 0. $$$

Finding all possible $$$F$$$ is equivalent to parametrizing the elements of $$$\ker A(\Delta)$$$. The following result holds:

Theorem 2. Let $$$A: X \to X$$$ be a linear operator defined on a complex vector space $$$X$$$. Let $$$\lambda_1, \dots, \lambda_m$$$ be different complex numbers and $$$d_1, \dots, d_m$$$ be positive integer numbers. Then the following holds:

$$$ \ker \prod\limits_{i=1}^m (A-\lambda_i)^{d_i} = \sum\limits_{i=1}^m \ker (A-\lambda_i)^{d_i}. $$$

Here $$$\ker A = \{x : Ax = 0\}$$$ is the kernel of the linear operator $$$A$$$ and $$$\Sigma$$$ denotes the Minkowski sum of corresponding subspaces.

Proof sketch: If $$$X$$$ is finite-dimensional space, one can prove this by considering the Jordan normal form of the corresponding matrix. For infinite-dimension case, it is clear that right-hand side is a subset of left-hand side. For any vector $$$x$$$ from the left-hand side, the span of $$$x, Ax, A^2 x, \dots$$$ (known as a Krylov subspace) is finite-dimensional and you can prove that $$$x$$$ is also in the right-hand side by reducing all operators to this finite-dimensional subspace (it's invariant for all operators in the equation).

Thanks to tranquility for kindly proving this result to me!

What this means for us is that we may solve the problem independently for all $$$(\Delta-\lambda)^{d}$$$, where $$$\lambda$$$ are roots of $$$A(x)$$$ and $$$d$$$ are their multiplicities, and the final solution will be a linear combination of these basic solutions.

For $$$d = 1$$$, the equation here is of form

$$$ \Delta F = \lambda F, $$$

or, equivalently,

$$$ F_{n+1} = \lambda F_n. $$$

The solution to this is simply $$$F_n = C \cdot \lambda^n$$$, where $$$C$$$ is arbitrary constant.

As for higher-dimensional case, it can be reduced to the lower-dimensional as

$$$ (\Delta-\lambda)^d F = 0 \iff (\Delta - \lambda)^{d-1}[(\Delta - \lambda) F] = 0. $$$

In other words, let's denote by $$$F^{(k)}$$$ the generic solution to $$$(\Delta-\lambda)^k F^{(k)} = 0$$$. Then the equation above rewrites as

$$$ (\Delta - \lambda)F^{(k)} = F^{(k-1)} $$$

with the base case $$$F^{(1)}_n = C_1 \cdot \lambda^n$$$. For example,

$$$ (\Delta - \lambda)F^{(2)} = F^{(1)} = C_1 \lambda^n, $$$

which has a generic solution of form $$$F^{(2)}_n = (a+bn)\lambda^n$$$.

For this series of equations it can be proven that $$$F_n^{(k)} = C_k(n) \cdot \lambda^n$$$, where $$$C_k(n)$$$ is a polynomial of degree less than $$$k$$$. This essentially proves the closed-form solution to linear recurrences.

If you have some intuition about Jordan normal form and generalized eigenvectors, you can compare them to the construction above!

As you may see, $$$F^{(k)}$$$ is the generalized eigenvector of $$$\Delta$$$ of rank $$$k$$$, and the construction itself is very similar with Jordan chains.

In a very similar manner and using the same theorem you may prove that the solution to the linear differential equation

$$$ f^{(d)}(x) = \sum\limits_{k=1}^d a_k f^{(d-k)}(x) $$$

is given as

$$$ f(x) = \sum\limits_{i=1}^m p_i(x) e^{\lambda_i x}, $$$

where $$$\lambda_i$$$ are the roots of differential equation's characteristic polynomial and $$$p_i(x)$$$ are polynomials of degree less than $$$d_i$$$.

To get it, one should consider operator $$$A = \frac{\partial}{\partial x}$$$, hence the equation is

$$$ \left(\frac{\partial^n}{\partial x^n} - \sum\limits_{k=1}^d a_k \frac{\partial^{d-k}}{\partial x^{d-k}}\right)f(x) = 0. $$$
Tags tutorial, linear recurrence, umbral calculus, generating function, matrix, matrix exponentiation

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en15 English adamant 2022-02-22 15:40:16 195
en14 English adamant 2022-02-22 02:36:24 14 (-1)^d for char. polynomial
en13 English adamant 2022-02-21 15:21:45 20
en12 English adamant 2022-02-21 05:21:39 721
en11 English adamant 2022-02-21 05:04:45 47
en10 English adamant 2022-02-21 05:02:02 229
en9 English adamant 2022-02-21 05:00:00 20
en8 English adamant 2022-02-21 04:58:19 144
en7 English adamant 2022-02-21 04:51:51 220
en6 English adamant 2022-02-21 02:47:54 192 links
en5 English adamant 2022-02-21 02:30:18 4
en4 English adamant 2022-02-21 02:22:36 2
en3 English adamant 2022-02-21 02:21:20 145
en2 English adamant 2022-02-21 02:14:19 5631 (published)
en1 English adamant 2022-02-20 22:53:50 6455 Initial revision (saved to drafts)