LorentzianExpanders's blog

By LorentzianExpanders, history, 3 weeks ago, In English

Hello everyone!

Convolution is perhaps the first class of algorithms that one encounters in competitive programming requiring algebraic knowledge. There are many types of convolutions that we are interested in, including:

  • Sequential convolution (i.e., polynomial multiplication),
  • XOR convolution,
  • AND/OR convolution,
  • GCD/LCM convolution,
  • Subset convolution, and others.

One of the most well-known ways to understand fast algorithms for these convolutions is through their associated algebraic structures. These algebraic structures can often be decomposed using the Chinese Remainder Theorem (CRT) into simpler components, i.e., direct products. For instance, in the case of XOR convolution, the algebraic structure arises from the computation rules of the group algebra:

$$$ R[\mathbb Z_2^n]\colon \left(\sum_{g\in \mathbb Z_2^n} a_g g\right) \left(\sum_{g\in \mathbb Z_2^n} b_g g\right) = \left(\sum_{g,h\in \mathbb Z_2^n} a_g b_h (g+h) \right), $$$

and we have the following isomorphism of algebras:

$$$ \begin{align*} R[\mathbb Z_2^n] &\cong R[X_1,\dots,X_n]/(X_1^2-1, \dots, X_n^2 - 1) \\ &= \prod_{e\in \\\{\pm 1\\\}^n } R[X_1,\dots,X_n]/(X_1-e_1,\dots,X_n-e_n)\\ &\cong R^{2^n}, \end{align*} $$$

assuming 2 is invertible in $$$R$$$.

Such isomorphism induced by CRT is essentially the associated "transform" of the convolution. For sequential convolution, the corresponding transform is the Fourier transform, and for XOR convolution, the corresponding transform is the Walsh-Hadamard transform, etc. The isomorphism induced by CRT essentially defines the "transform" associated with the convolution. For example:

  • For sequential convolution, the corresponding transform is the Fourier transform.
  • For XOR convolution, the corresponding transform is the Walsh-Hadamard transform.

All the convolutions listed earlier can be understood in this framework, although subset convolution is somewhat different. However, not all convolutions fit neatly into this algebraic perspective. For example, consider the following problem from adamant:

Mex Convolution. Compute

$$$ c_k = \sum_{i\circ j = k} a_i b_j, $$$

where $$$i,j,k\in \{0,1,2\}^n$$$ and $$$k = i\circ j$$$ is taking $$$k_\ell = \operatorname{mex} \{ i_\ell, j_\ell \}$$$ for all $$$\ell$$$.

One can verify that this convolution does not form an associative algebra, so the rich theory of algebras does not apply here. The intended solution for Mex convolution has a complexity of $$$N^{\log_3 4} \approx N^{1.262}$$$.

In this blog, I will introduce a new perspective for understanding convolution: 3-dimensional tensors, and discuss some insights provided by this perspective.

Bilinear Algorithms, Its Rotations, and Tensors

Now we suppose we are working on a field $$$F$$$. (Most algorithms conceptually work on arbitrary rings, but some characterization theorems require the field assumption.)

The first thing I want to define is called a bilinear problem. Given a 3-dimensional array $$$T_{ijk}$$$, the bilinear problem takes arrays $$$X_i$$$ and $$$Y_j$$$ and computes the array $$$Z_k$$$ such that

$$$ Z_k = \sum_{i,j} X_i Y_j T_{ijk}. $$$

Clearly, this class of problems includes all the convolutions mentioned above. The convolution is a bilinear problem with $$$T_{ijk} = [i \circ j = k]$$$.

Given one bilinear problem, the only information defining the problem is the 3-dimensional array $$$T_{ijk}$$$. Then here naturally come two rotations of the problem:

  • The first rotation is the problem of computing $$$X_i$$$ given $$$Y_j$$$ and $$$Z_k$$$, i.e.,
$$$ X_i = \sum_{j,k} Y_j Z_k T_{ijk}. $$$
  • The second rotation is the problem of computing $$$Y_j$$$ given $$$X_i$$$ and $$$Z_k$$$, i.e.,
$$$ Y_j = \sum_{i,k} X_i Z_k T_{ijk}. $$$

We remark that there is no relation between the arrays $$$X_i$$$, $$$Y_j$$$, and $$$Z_k$$$ between the original problem and the two rotations. Those familiar with the transposition principle may be accustomed to this kind of rotation.

Finally, we define the trilinear problem as the problem of computing

$$$ \sum_{i,j,k} X_i Y_j Z_k T_{ijk}. $$$

We usually call such a trilinear form $$$T(X,Y,Z)$$$ the tensor.

Therefore, the convolution problem can be written as tensors

$$$ T = \sum_{i,j} X_i Y_j Z_{i \circ j}. $$$

For XOR convolution and subset convolution, it's also convenient to write them in a symmetric form: For XOR convolution, we have

$$$ T_{\mathbb Z_2^n} = \sum_{\substack{i,j,k\in\mathbb Z_2^n \\ i+j+k=0}} X_i Y_j Z_k. $$$

For subset convolution, we have

$$$ T = \sum_{A \sqcup B \sqcup C = [n]} X_A Y_B Z_C. $$$

The important observation is that the bilinear problem, its rotations, and the trilinear problem are all equivalent.

Claim. Given a tensor $$$T$$$, its associated bilinear problem, its two rotations, and the trilinear problem can be computed in the same time complexity.

One direction is straightforward: Suppose we have an algorithm computing the bilinear problem (or its rotations), then we can compute the trilinear problem by running the algorithm to compute

$$$ z_k = \sum_{i,j} X_i Y_j T_{ijk}, $$$

and then compute the trilinear form by inner product:

$$$ T(X,Y,Z) = \sum_k z_k Z_k. $$$

The other direction is more interesting. We need the following result:

Baur-Strassen's Algorithm. Let $$$f(X_1, \dots, X_n)$$$ be a polynomial that can be computed in $$$M$$$ operations. Then we can compute all the partial derivatives

$$$ \frac{\partial}{\partial X_i} f(X_1, \dots, X_n) $$$

in $$$O(M)$$$ operations.

Those familiar with the transposition principle can see that Baur-Strassen's algorithm is a vast generalization. In fact, the algorithm itself is also a generalization of the transposition principle. The key idea is to first write down a sequence of arithmetic operations $$$O_1, \dots, O_M$$$ that ends with the desired polynomial. Considering only the last $$$k$$$ operations, this is also a polynomial where the variables are some intermediate variables $$$g_k = g_k(Y_1, \dots, Y_r)$$$. We could consider computing the partial derivatives of $$$g_k$$$ with $$$k$$$ growing from $$$1$$$ to $$$M$$$, thus gradually computing the partial derivatives of $$$f$$$.

Now here are two possibilities:

  • If $$$O_{M-k}$$$ is an addition operator, without loss of generality, assume $$$Y_1 = Y'_1 + Y'_2$$$. Then we have $$$g_{k+1}(Y'_1, Y'_2, Y_2, \dots, Y_r) = g_k(Y'_1 + Y'_2, Y_2, \dots, Y_r)$$$. Thus,
$$$ \frac{\partial}{\partial Y'_i} g_{k+1} = \frac{\partial}{\partial Y_1} g_k, $$$

and for $$$i > 1$$$,

$$$ \frac{\partial}{\partial Y_i} g_{k+1} = \frac{\partial}{\partial Y_i} g_k. $$$
  • If $$$O_{M-k}$$$ is a multiplication operator, without loss of generality, assume $$$Y_1 = Y'_1 Y'_2$$$. Then we have $$$g_{k+1}(Y'_1, Y'_2, Y_2, \dots, Y_r) = g_k(Y'_1 Y'_2, Y_2, \dots, Y_r)$$$. Thus,
$$$ \frac{\partial}{\partial Y'_i} g_{k+1} = Y'_{3-i} \frac{\partial}{\partial Y_1} g_k, $$$

and for $$$i > 1$$$,

$$$ \frac{\partial}{\partial Y_i} g_{k+1} = \frac{\partial}{\partial Y_i} g_k. $$$

So, passing from $$$g_1$$$ to $$$g_M$$$, we can compute all the partial derivatives of $$$f$$$, with each step requiring only $$$O(1)$$$ operations. Thus, the total complexity is $$$O(M)$$$.

Having Baur-Strassen's algorithm, we can compute the bilinear problem by computing the partial derivatives of the trilinear form:

$$$ \frac{\partial}{\partial Z_k} T(X,Y,Z) = \sum_{i,j} X_i Y_j T_{ijk}. $$$

This also gives an alternative explanation of the algorithm of those "transposed convolutions" we usually encounter in practice.

Tensor Rank Decomposition

Let us now revisit the perspective of bilinear algorithms computing $$$Z_k = \sum_{i,j} X_i Y_j T_{ijk}$$$. We introduce the following assumption about the algorithm:

Assumption. For each operation $$$O_i$$$, let $$$g_i$$$ denote the polynomial computing the intermediate variable. Then $$$g_i$$$ is either a constant, a linear form in $$$X$$$, a linear form in $$$Y$$$, or a bilinear form in $$$X$$$ and $$$Y$$$.

This assumption is not restrictive because any algorithm can be rewritten to satisfy it by splitting each operation into its homogeneous components. This process, called homogenization, simplifies the algorithm's structure, though we omit the technical details here.

Under this assumption, the algorithm naturally divides into three phases:

  1. The first phase computes a linear transformation $$$R_X = A X$$$.
  2. The second phase computes a linear transformation $$$R_Y = B Y$$$.
  3. In the third phase, with $$$(R_Z)_i = (R_X)_i (R_Y)_i$$$, the algorithm computes a linear transformation $$$Z = C^{\mathsf T} R_Z$$$.

Rewriting the algorithm in terms of the trilinear form, the final step involves taking the inner product of $$$C^{\mathsf T} R_Z$$$ and $$$Z$$$:

$$$ T(X,Y,Z) = (C Z)^{\mathsf T} R_Z = \sum_{i=1}^r (AX)_i (BY)_i (CZ)_i. $$$

In other words, the tensor $$$T$$$ can be expressed as a sum of $$$r$$$ products of linear forms in $$$X$$$, $$$Y$$$, and $$$Z$$$:

$$$ T(X,Y,Z) = \sum_{i=1}^r u_i(X) v_i(Y) w_i(Z). $$$

This representation is known as the rank decomposition of the tensor $$$T$$$. The smallest $$$r$$$ for which such a decomposition exists is called the tensor rank of $$$T$$$, denoted $$$\mathbf{R}(T)$$$.

The concept of tensor rank extends to $$$d$$$-dimensional tensors, where the case $$$d=2$$$ corresponds to matrix rank. However, for $$$d=3$$$, many properties of matrix rank no longer hold, and determining the tensor rank becomes an NP-hard problem.

Tensor rank serves as a relaxed measure of the complexity of bilinear problems. While it provides a lower bound on complexity—since any algorithm must use at least this many multiplications—it ignores the costs of the linear transformations $$$A$$$, $$$B$$$, and $$$C$$$. Thus, for general bilinear problems, tensor rank is not always an accurate measure of complexity.

However, for many convolution problems discussed earlier, this limitation is not significant because these problems exhibit a tensor power structure. Specifically, let $$$T$$$ and $$$T'$$$ be two tensors. The tensor product $$$T \otimes T'$$$ is defined as:

$$$ (T \otimes T')_{ii', jj', kk'} = T_{i,j,k} T'_{i',j',k'}. $$$

The tensor product of $$$n$$$ copies of $$$T$$$ is denoted $$$T^{\otimes n}$$$.

A key observation is that, for XOR/AND/OR convolutions and subset convolution, the tensor $$$T_n$$$ can be written as $$$T_1^{\otimes n}$$$ for some base tensor $$$T_1$$$. For instance:

$$$ T_{\mathbb{Z}_2^n} = T_{\mathbb{Z}_2}^{\otimes n}. $$$

In such cases, tensor rank becomes a meaningful complexity measure because the linear transformations $$$A$$$, $$$B$$$, and $$$C$$$ in $$$T_n$$$ derive from those in $$$T_1$$$. By the mixed-product property of the tensor product, the computation of $$$A^{\otimes n}$$$ can be decomposed into:

$$$ A^{\otimes n} = (A \otimes I \cdots I) \cdot (I \otimes A \otimes I \cdots I) \cdots (I \cdots I \otimes A). $$$

If $$$A$$$ is an $$$r \times a$$$ matrix with $$$r = a$$$, the time complexity is $$$O(n \cdot a^n)$$$; if $$$r > a$$$, the complexity is $$$O(r^n)$$$. This method is often referred to as Yates' algorithm.

For example, the XOR convolution has a tensor rank decomposition:

$$$ T_{\mathbb{Z}_2} = \frac{1}{2} \left[ (X_0 + X_1)(Y_0 + Y_1)(Z_0 + Z_1) + (X_0 - X_1)(Y_0 - Y_1)(Z_0 - Z_1) \right], $$$

where the linear transformation $$$A$$$ is:

$$$ A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. $$$

Hence, the linear transformation $$$A^{\otimes n}$$$ corresponds to the Walsh-Hadamard transform.

Border Rank

The concept of tensor rank successfully explains the fast transforms associated with XOR/AND/OR convolution. However, a subtle issue arises with subset convolution. The tensor rank of the subset convolution tensor $$$T_1$$$ is not $$$2$$$ but $$$3$$$. This issue can be resolved using the concept of border rank.

Consider the subset convolution tensor $$$T_1$$$:

$$$ T_1 = X_0Y_0Z_1 + X_0Y_1Z_0 + X_1Y_0Z_0. $$$

Although its tensor rank is $$$3$$$, we can perturb it slightly to approximate a tensor of rank $$$2$$$:

$$$ \frac{1}{\epsilon} \left[ (X_0 + \epsilon X_1)(Y_0 + \epsilon Y_1)(Z_0 + \epsilon Z_1) - X_0Y_0Z_0 \right] = T_1 + O(\epsilon). $$$

This might seem nonsensical from a computational perspective, but it can be converted into an exact algorithm. The key idea is to treat $$$\epsilon$$$ as a formal variable. Rewriting the equation gives:

$$$ T_1' = (X_0 + \epsilon X_1)(Y_0 + \epsilon Y_1)(Z_0 + \epsilon Z_1) - X_0Y_0Z_0 = \epsilon T_1 + O(\epsilon^2). $$$

Here, the terms in the decomposition are polynomials in $$$\epsilon$$$. The tensor power of $$$T_1'$$$ then becomes:

$$$ T_1'^{\otimes n} = \epsilon^n T_1^{\otimes n} + O(\epsilon^{n+1}). $$$

Thus, we can compute the tensor power of $$$T_1$$$ exactly by maintaining the formal variable $$$\epsilon$$$ to a precision of $$$\epsilon^n$$$.

This method leads to the standard $$$O(n^2 2^n)$$$ algorithm for subset convolution, which is essentially equivalent with maintaining the so-called rank vector.

Formally, the border rank $$$\underline{\mathbf{R}}(T)$$$ is defined as the smallest $$$r$$$ such that a tensor can be expressed as:

$$$ \sum_{i=1}^r u_i(X) v_i(Y) w_i(Z) = \epsilon^d T + O(\epsilon^{d+1}). $$$

The above algorithm demonstrates the use of border rank, as it enables an algorithm with a time complexity of $$$\tilde{O}(\underline{\mathbf{R}}(T)^n)$$$.

An intriguing result by Alder (1981) states that if $$$F$$$ is an algebraically closed field, and $$$R_{\leq r}$$$ denotes the set of tensors with rank at most $$$r$$$, then $$$\underline{\mathbf{R}}(T) \leq r$$$ if and only if $$$T$$$ lies in the Zariski closure of $$$R_{\leq r}$$$. This result provides a geometric interpretation of border rank, justifying that $$$\epsilon$$$ is really an "infinitesimal" in the algebraic sense.

More Topics

In this blog, tensor rank and border rank appear as alternative perspectives to describe well-known algorithms. However, perhaps the tensor that has garnered the most attention in the study of tensor rank is the matrix multiplication tensor. It is commonly written as:

$$$ \langle n, m, p \rangle = \sum_{i \in [n], j \in [m], k \in [p]} X_{ij} Y_{jk} Z_{ki}. $$$

The famous Strassen's algorithm can be viewed as finding a decomposition with $$$\mathbf{R}(\langle 2, 2, 2 \rangle) \leq 7$$$.

Interestingly, to compute the tensor power $$$T^{\otimes n}$$$, one does not necessarily need to start with the tensor rank decomposition of $$$T$$$. Instead, one can start with a fixed tensor power $$$T^{\otimes k}$$$ and compute $$$(T^{\otimes k})^{\otimes (n/k)}$$$ using the same method. This leads to the notion of asymptotic tensor rank:

$$$ \widetilde{\mathbf{R}}(T) = \lim_{k \to \infty} \mathbf{R}(T^{\otimes k})^{1/k}. $$$

Thus, the central question in algebraic complexity can be rephrased as determining $$$\widetilde{\mathbf{R}}(\langle 2, 2, 2 \rangle) = 2^\omega$$$, where $$$\omega$$$ is the exponent of matrix multiplication.

This is an area full of open questions, but there are still some interesting results:

For any $$$n \times n \times n$$$ tensor $$$T$$$, a trivial decomposition yields $$$\mathbf{R}(T) \leq n^2$$$. However, Strassen's asymptotic submultiplicativity shows that $$$\widetilde{\mathbf{R}}(T) \leq n^{2\omega / 3}$$$. This means that matrix multiplication is universal in the sense that it can speed up the computation of any tensor power. The current world record is $$$\omega \leq 2.371339$$$ (arXiv:2404.16349). For more on this topic, see the great lecture notes by Virginia Vassilevska Williams (starting from Lecture 19), which discuss key ideas in this line of research.

Now fix a finite group $$$G$$$, and let $$$T_G$$$ be the tensor of the group algebra of $$$G$$$ with coefficients in $$$\mathbb C$$$. When $$$G$$$ is abelian, we know that $$$\widetilde{\mathbf{R}}(T_G) = |G|$$$. What about the non-abelian case? This can be fully determined by $$$\omega$$$. Suppose the irreducible representations of $$$G$$$ have dimensions $$$d_1, \dots, d_k$$$. Then:

$$$ \widetilde{\mathbf{R}}(T_G) = \sum_i d_i^\omega. $$$

The inequality $$$\leq$$$ is relatively straightforward to observe, while the reverse direction follows from a deeper result known as Schönhage's asymptotic sum inequality. However, even the trivial bound $$$\omega \leq 3$$$ already leads to non-trivial algorithms in some sense.

Full text and comments »

  • Vote: I like it
  • +238
  • Vote: I do not like it

By LorentzianExpanders, history, 12 months ago, In English

Hello everyone!

I'd like to share a simple algorithm for computing the Frobenius form (or called rational canonical form) of a matrix. Before stating what is the algorithm, let me first give some applications.

Matrix Exponentiation

We know that one can compute the matrix exponential of a matrix $$$A^K$$$ in $$$O(N^3 \log K)$$$ time, via binary exponentiation.

If one knows how to compute the characteristic polynomial $$$p(T)$$$ of $$$A$$$ in $$$O(N^3)$$$ time, then one can first use the Cayley–Hamilton theorem to reduce the problem into computing

$$$ \sum_i r_i A^i, $$$

where $$$r_i$$$ is the coefficient of

$$$ \sum_i r_i T^i = T^K \bmod p(T). $$$

This reduces the time to $$$O(N^4 + N^2\log K)$$$, or even $$$O(N^4 + N\log N\log K)$$$ if you use FFT to speed up the computation of $$$T^K \bmod p(T)$$$.

Some sqrt tricks can further reduce the time to $$$O(N^{3.5} + N\log N \log K)$$$, but it is still not good enough.

When you attend some linear algebra course, you may learn that, if one can find the Jordan form of $$$A$$$, then one can compute $$$A^K$$$ faster. However, the Jordan form is usually not a good choice for computation, because it at least needs to find the root of the characteristic polynomial, which in general needs to consider the field extension.

The Frobenius form (or called rational canonical form) is a better choice. It states that any matrix $$$A$$$ can be uniquely written as $$$A = Q F Q^{-1}$$$, where $$$F$$$ has the form

$$$ F = \begin{pmatrix} C_1 \\ & C_2 \\ & & \ddots \\ & & & C_k \end{pmatrix}, $$$

where $$$C_i$$$ is the companion matrix of polynomial $$$p_i(T)$$$, satisfying $$$p_k \mid \cdots \mid p_2 \mid p_1$$$.

Note that the Frobenius form $$$F$$$ is very sparse (only contains $$$< 2N$$$ non-zero entries), then we can compute $$$A^K$$$

$$$ A^K = \sum r_i A^i = Q\left( \sum r_i F^i \right)Q^{-1}, $$$

where can be computed in $$$O(N^3 + N\log N \log K)$$$ time.

The Algorithm

To state the algorithm, we first consider a proof of the Cayley–Hamilton theorem that directly gives an algorithm to compute the characteristic polynomial.

Let $$$A$$$ be a matrix, we first pick an arbitrary nonzero vector $$$v$$$, and consider the sequence $$$v, Av, A^2 v, \dots$$$, these vectors are linearly independent until some point, say $$$A^k v$$$, then we have a relation

$$$ A^kv = \sum_{i=0}^{k-1} c_i A^i v. $$$

Let $$$p(T) = T^k - \sum_{i=0}^{k-1} c_i T^i$$$, we have $$$p(A)v = 0$$$.

We call the above vector $$$v_1$$$, say $$$V_1$$$ be the span of $$${v_1,\dots, A^{k-1} v_1}$$$, and then pick another vector $$$v_2$$$ such that $$$v_2$$$ is not spanned by $$$v_1, Av_1, \dots, A^{k-1}v_1$$$, then consider the sequence $$$v_1,\dots, A^{k-1}v_1, v_2, Av_2, A^2 v_2, \dots$$$, we can find another polynomial $$$p_2(T)$$$ such that $$$p_2(A)v_2$$$ is in $$$V_1$$$. Then we take $$$V_2$$$ to be the span of $$${v_1,\dots, A^{k-1}v_1, v_2, \dots, A^{k-1}v_2}$$$, and we repeat such process until

$$$ 0 = V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_\ell = V, $$$

where $$$V$$$ is the whole space. Let $$$\beta = {v_1,Av_1,\dots,}$$$ be the basis.

Under our chosen basis, the matrix looks like

$$$ [A]_\beta = \begin{pmatrix} C_1 & 0|* & \cdots & 0|*\\ & C_2 & \cdots & \ldots \\ & & \ddots & 0|*\\ & & & C_\ell \end{pmatrix}, $$$

where $$$C_i$$$ is the companion matrix of $$$p_i$$$.

Then for each vector $$$v$$$, we show that $$$p_1(A)\cdots p_\ell(A) v = 0$$$. First consider $$$p_\ell(A)v$$$, this eliminates all components of the basis elements given by $$$v_\ell$$$, so $$$p_\ell(A) \in V_{\ell - 1}$$$, then we can inductively show $$$p_i(A) \cdots p_\ell(A) v \subset V_{i-1}$$$, in conclusion, we have $$$p_1(A)\cdots p_\ell(A) v = 0$$$. Note that by direct computation, we have $$$p(T) = \det(T I - A) = p_1(T) \cdots p_\ell(T)$$$, so the characteristic polynomial annihilates all vectors, the Cayley–Hamilton theorem is proved.

Note that there is a difference that outside the diagonal blocks, there are still some terms. For each $$$A^j v_i$$$, if $$$j$$$ is not the largest one, then $$$A(A^j v_i)$$$ takes the vector to $$$A^{j+1} v_i$$$, so this columns only contains one $$$1$$$ in $$$[A]_\beta$$$. The only problem is that the column for each maximal $$$j$$$ such that $$$A^j v_i \in\beta$$$ may contain some other terms (this corresponds to the $$$*$$$ in $$$0|*$$$).

A simple randomized strategy may help us to remedy this problem. What if we randomly choose a vector $$$v_i$$$ each time? Actually, it can be shown that, when the base field is $$$\mathbb F_q$$$, we have probability $$$\geq 1 - O(N / q)$$$ that $$$p_1, \dots , p_\ell$$$ are exactly those polynomials in the rational canonical form! (They are also called the elementary divisors when we give $$$V$$$ the $$$\mathbb F_q[T]$$$-module structure by $$$f(T) v = f(A) v$$$, and use the structure theorem of finitely generated module over PID.)

So what happens when we are guaranteed that $$$p_\ell \mid \cdots \mid p_1$$$? We first look at what happened at $$$v_2$$$.

Firstly, we have

$$$ p_2(A) v_2 = r(A) v_1, $$$

and since $$$p_1$$$ is the minimal polynomial $$$A$$$ of $$$v_1$$$, let $$$q(T) = p_1(T) / p_2(T)$$$, we have

$$$ 0 = p_1(A)v_2 = q(A) p_2(A)v_2 = q(A) r(A) v_1, $$$

since $$$v_1$$$ is only annihilated by a multiple of $$$p_1(T)$$$, we have $$$p_1 \mid q r = p_1 r / p_2$$$, thus we have $$$p_2 \mid r$$$. This shows that we can have some $$$s = r / p_2$$$, then consider

$$$ v_2' = v_2 - s(A)v_1, $$$

we would have

$$$ p_2(A) v_2' = p_2(A)v_2 - r(A)v_1 = 0. $$$

This really looks like the "orthogonalization" step as what was done in the Gram–Schmidt process!

This process can also be done at larger $$$i$$$ s. Now we can state the algorithm.

[Datas] Maintain a basis $$$B$$$ of vectors $$${ A^j v_i }$$$ for $$$0\leq j < d_i$$$, where $$$d_i$$$ is the degree of the elementary divisor $$$p_i$$$. Initially $$$i=0$$$.

  1. [Initial vector] Increase $$$i$$$ by $$$1$$$. Uniformly pick some random vector $$$v_i$$$.

  2. [Iteration] Take $$$B' = B$$$, repeatedly insert $$$B'$$$ with vectors $$$v_i, Av_i, \dots$$$ until finds a relation $$$p_i(A) v_i = \sum_{j < i} g_{ji}(A) v_j$$$.

  3. [Orthogonalization] For each $$$j < i$$$, compute $$$s_{ji} = g_{ji} / p_i$$$, then take $$$v_i \gets v_i - \sum_j s_{ji}(A) v_j$$$.

  4. [Update] For the new orthogonalized vector $$$v_i$$$, insert $$$B$$$ with vectors $$$v_i, Av_i, \dots, A^{d_i - 1} v_i$$$.

  5. [Finish] If $$$B$$$ has $$$n$$$ vectors, halt, otherwise go to 1.

After a careful implementation, the algorithm can be implemented in $$$O(N^3)$$$ time, and the probability that the algorithm fails is $$$O(N / q)$$$, where $$$q$$$ is the size of the base field.

Although the ideas behind need some further analysis (for example I didn't prove the probability of success), but the final algorithm is quite clean. I wrote a code for this algorithm, for the Pow of Matrix problem on Library Checker. You can refer to this to have a more precise understanding of how this algorithm works.

Full text and comments »

  • Vote: I like it
  • +288
  • Vote: I do not like it