#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	156
6	adamant	152
6	djm03178	152
8	Qingyu	151
9	luogu_official	149
10	awoo	147

Elegia's blog

Sums involving powers

Let's consider the following problem of summation

$$$ \sum_{0\leq n \leq M} f(n) \cdot n^k, $$$

where $$$M$$$ can be finite or infinite. We want to separate a large class of $$$f$$$ such that such summation can be solved in $$$O(k)$$$.

You may already know some examples:

For the constant function $$$f(n) = 1$$$, known as 662F, can be solved in linear time through Lagrange interpolation on consecutive sampling points.
For the geometric series $$$f(n) = r^n$$$, you can find the finite case and infinite case (assume $$$|r| < 1$$$) in Library Checker. One well-known way to solve this is through the partial fraction.
For the binomial case $$$f(n) = \binom N n$$$, known as 932E (though we only have $$$k\leq 5000$$$), or some similar problem 1528F, such kind of summation are usually dealed with Stirling number $$$n^k = \sum_j {k\brace j} j! \binom n j$$$, but such solution involving a whole row of Stirling number costs $$$O(k\log k)$$$ time. I think it was Min25 who first published a deduction which leads to a $$$O(k + \log M)$$$ solution (Web Archive).
The following examples might not be related to the summation obviously, but we will explain this later. We focus on some famous sequences, such as the Bell number $$$k![x^k] e^{e^x-1}$$$, Bernoulli number $$$k![x^k] \frac{x}{e^x-1}$$$, and the number of alternating permutations $$$k![x^k] \tan x + \sec x$$$. Clearly, these things can all be done in $$$O(k\log k)$$$ by elementary operations on formal power series. But actually, the single term can be evaluated in $$$O(k)$$$.

So how to solve all the above problems in a unified way? The answer is D-Finite functions.

We consider the problem of extracting the coefficient

$$$ [x^k] F(e^x) $$$

for some D-Finite function $$$F(t)$$$.

The favorable case is that $$$F(t)$$$ itself only has a degree up to $$$k$$$, then the problem will become trivial. So the general method is to transfer any $$$F(t)$$$ into such case. Consider the expansion

$$$ F(e^x) = F((e^x-1)+1) = \sum_{j\geq 0} \frac{F^{j}(1)}{j!} (e^x-1)^j, $$$

note that $$$[ x^k ] ( e^x - 1 )^j$$$ vanishes when $$$j > k$$$, so we can truncate the sum to $$$j\leq k$$$. Let $$$F_0(t) = F(t+1) \bmod t^{k+1}$$$ and $$$F_1(t) = F_0(t-1)$$$, then we have

$$$ [x^k]F(e^x) = [x^k]F_1(e^x), $$$

and $$$F_1(t)$$$ has degree at most $$$k$$$.

We first show that $$$F_0(t)$$$ is D-Finite. Since $$$F(t)$$$ is D-Finite, clearly $$$F(t+1)$$$ is D-Finite with relations

$$$ \sum_i q_i(t) \partial_t^i F(t+1) = 0. $$$

let $$$a_n$$$ be the sequence associated with $$$F(t+1)$$$, then the D-Finite relation of $$$F(t+1)$$$ induces a P-recursive relation for $$$a_n$$$:

$$$ \sum_{j\leq \ell} P_j(n) a_{n-j} = 0 \tag{1} $$$

for some polynomials $$$P_j$$$. Let $$$b_n$$$ be the sequence associated with $$$F_0(t)$$$, then we have

$$$ b_n = a_n \cdot [n \leq k]. $$$

Note that (1) still holds when $$$a_n$$$ is substituded by $$$b_n$$$, when $$$n \leq k$$$ and $$$n > k + \ell$$$, this corresponds to an equation in generating function:

$$$ \sum_i q_i(t) \partial_t^i F_0 = \sum_{1\leq j\leq \ell} \epsilon_j t^{k+j}. $$$

Then plug in $$$t-1$$$, we get the D-Finite relation for $$$F_1(t)$$$:

$$$ \sum_i q_i(t-1) \partial_t^i F_1 = \sum_{1\leq j\leq \ell} \epsilon_j (t-1)^{k+j}. $$$

This could be used to compute the coefficient of $$$F_1(t)$$$ in $$$O(k)$$$ time. Therefore, we can compute the coefficient $$$[x^k]F(e^x)$$$ in $$$O(k)$$$ time.

Now we can see how to solve the above problems in a unified way:

For the constant function $$$f(n) = 1$$$, we have $$$F(t) = \frac{1-t^{M+1}}{1-t}$$$, which is D-Finite.
For the geometric series $$$f(n) = r^n$$$, we have $$$F(t) = \frac{1}{1-rt}$$$, which is D-Finite.
For the binomial case $$$f(n) = \binom N n$$$, we have $$$F(t) = (1+t)^N$$$, which is D-Finite.
For the Bell number, we have $$$F_0(t) = e^t \bmod t^{k+1}$$$.
For the Bernoulli number, we have $$$F_0(t) = \frac{\ln(1+t)}{t} \bmod t^{k+1}$$$.
...

Multipoint Evaluation for some 2-dimensional Arrays

Perhaps the following example is quite famous:

Problem. Given $$$N$$$ points $$$(n, m)$$$ ($$$n,m\leq N$$$), compute the sum

$$$ a_{n,m} = \sum_{k=0}^m \binom n k . $$$

A well-known solution is to use the fact that $$$a_{n,m}$$$ can be efficiently maintained when the coordinates are increased/decreased by one. Then Mo's algorithm can solve this problem in $$$O(N\sqrt N)$$$ time.

In the summer of 2021, djq_cpp told me this problem can be solved in $$$O(N \log^2 N)$$$ time. Since I didn't publish this solution, it was later independently discovered by orzdevinwang and Nyaan.

Consider a sequence of vectors of polynomials

$$$ v_{m}(x) = \begin{bmatrix} \binom x m \\ \sum_{k< m} \binom x k \end{bmatrix}, $$$

our goal becomes to compute the value of $$$v_m(n)$$$ for several $$$m$$$ and $$$n$$$. We can see that the sequence of vectors $$$v_m(x)$$$ has a transition matrix of constant degree

$$$ v_m = \begin{bmatrix} \frac{x-m+1}{m} & 0\\ 1 & 1 \end{bmatrix} v_{m-1} =: T_m v_{m-1}. $$$

Then the problem becomes carefully modifying the subproduct tree in the $$$O(n\log^2n)$$$ algorithm of the Vandermonde matrix determinant.

Let $$$[l, r]$$$ be an interval appearing in the segment tree of $$$[1, n]$$$. We maintain all the matrix products

$$$ M_{l,r} = T_l T_{l+1} \cdots T_r, $$$

which are polynomial matrices of degree $$$r-l+1$$$.

For each $$$[l, r]$$$ we consider the residue

$$$ r_{l, r} = v_{l-1} \bmod R_{l, r}(x), $$$

where

$$$ R_{l, r}(x) = \prod_{m_i \in [l, r]} (x - n_i). $$$

When recursing down the segment tree, we can maintain

$$$ r_{l, mid} = r_{l, r} \bmod R_{l, mid} $$$

and

$$$ r_{mid+1, r} =( M_{l,mid}\cdot r_{l, r}) \bmod R_{mid+1, r}. $$$

Then for each leaf node, we can do multipoint evaluation of $$$v_m(x) \bmod R_{m,m}(x)$$$.

It's not hard to see that the above algorithm can be generalized to the case that every $$$T_m$$$ is a polynomial matrix where the sum of degrees is $$$O(N)$$$. This encapsulates a lot of 2-dimensional arrays that can be computed in $$$O(N\log^2 N)$$$ time, for example, the coefficients of 2 variables D-Finite functions. For instance, solving Yukicoder No.2166 Paint and Fill in $$$O(N\log^2 N)$$$ time.

Remark: As we know, there are ways to avoid polynomial modulo operations in multipoint evaluation of polynomials. The situation is similar for the above problem, see this remark (in Chinese) by aleph1022.

The $$$q$$$-analogue of D-Finite Functions

Recently there have been more and more problems related to $$$q$$$-binomials. One of the ways to derive basic identities such as the coefficient

$$$ f(z) = (1-z)(1-qz)(1-q^2z) \cdots $$$

is to use the $$$q$$$-analogue of differential operators. Compare $$$f(qz)$$$ and $$$f(z)$$$, we have

$$$ (1-z)f(qz) = f(z), $$$

then compare the coefficients of two sides, we have

$$$ f_n = q^n f_n - q^{n-1}f_{n-1}, $$$

thus

$$$ f_n = \frac{-q^{n-1}}{1-q^n} f_{n-1} = (-1)^n\frac{q^{n(n-1)/2}}{(1-q)\cdots(1-q^n)}. $$$

These tricks could be further developed to compute sequences that are $$$q$$$-holonomic, i.e., the recurrence relation has coefficient rational in $$$q^i$$$.

Uoj Long Round #2, Hash Killer. There is a polynomial $$$f(x)$$$ of degree $$$n-1$$$, we know that its value at $$$f(q^i)$$$ for $$$0\leq i < n$$$, in particular, $$$f(q^i) \neq 0$$$ for only $$$k$$$ positions. Given these data, compute the $$$m$$$th coefficient of $$$f(x)$$$. $$$n$$$ can be very large.

By Lagrange interpolation we have

$$$ f(x) = \sum_{i=1}^{n-1} f(q^i) \frac{\prod_{j=0}^{n-1} (x-q^j)}{(x-q^i) \prod_{j\in \{0,\dots,\overline i,\dots,n-1\}} (q^i-q^j)}, $$$

since $$$f(q^i)$$$ is nonzero for only $$$k$$$ positions, we only need to compute

$$$ [x^m]\frac{\prod_{j=0}^{n-1} (x-q^j)}{(x-q^i) \prod_{j\in \{0,\dots,\overline i,\dots,n-1\}} (q^i-q^j)} $$$

for this many $$$i$$$.

After some calculation, one can derive that the sequence

$$$ u_i = [x^m]\frac{\prod_{j=0}^{n-1} (x-q^j)}{(x-q^i)} $$$

satisfies a $$$q$$$-holonomic relation. How to compute $$$k$$$ values of the sequence $$$u_i$$$ efficiently? One could set a block size $$$B\leq \sqrt n$$$, and compute the sequence $$$u_{iB}$$$ for all $$$i\leq n/B$$$ by:

Compute the consecutive matrix products of the recurrence relation.
Use the Chirp-Z transform to compute the actual value of the transition matrix.

Then we can compute the sequence $$$u_i$$$ in $$$O((n/B)\log n + k B)$$$ time.

Actually, this problem can be seen as a $$$q$$$-analogue of the multipoint evaluation of P-recursive 2-d arrays. So the theoretical complexity of this problem can be further reduced to $$$\tilde O(\sqrt n + k)$$$.

It's also useful when $$$q$$$ is a formal variable, for example, see AGC069 E.

Graph Enumeration Problems

PA 2019 final, Grafy Count the $$$n$$$-vertex simple digraphs such that every vertex has in-degree and out-degree exactly $$$2$$$. $$$n\leq 500$$$.

The intended solution might be some inclusion-exclusion principle to solve in $$$O(n^3)$$$ time. However, this sequence is P-recursive! So we could compute in $$$O(n)$$$ or $$$O(\sqrt n\log n)$$$ time.

Here are several ways to prove this, the first one is by first writing down the formula obtained by the inclusion-exclusion principle:

$$$ a_n = \frac1{2^{2n}}\sum_{i,j,k} 2^i (-1)^j (-1)^k \binom{n}{i,j,k,n-i-j-k}2^i2^{2j}\binom{n-i-j}{k} k! 2^k (2n-2i-j-2k)! $$$

The precise formula is not very important, but the key observation is that the above formula is a sum of products of binomials and factorials, then as we argued before, the closure properties of D-Finite functions imply that $$$a_n$$$ is P-recursive.

Another way is to derive the D-Finite via the structure of 2-regular graphs, we first demonstrate a toy example.

Problem. Count the number of $$$n$$$-vertex simple 2-regular graphs.

Note that any 2-regular graph is a disjoint union of cycles. We have its exponential generating function is

$$$ \exp\left( \sum_{k\geq 3} \frac{(n-1)!x^n/2}{n!} \right) = \exp\left(-\frac x 2 - \frac{x^2}4\right)\frac 1{\sqrt{1-x}}, $$$

which is D-Finite.

The analysis of grafy is much more technical. See a treatment (in Chinese) by Isonan. Roughly speaking, his strategy is the following:

Let $$$G$$$ be a simple digraph, we construct the graph $$$G'$$$ by the following way: - For each vertex $$$u$$$ of $$$G$$$, it has two out-edges $$$(u, x)$$$ and $$$(u, y)$$$. We add a colored edge $$$(x, y)$$$ with color $$$u$$$ in $$$G'$$$.

Then $$$2$$$-biregular simple digraphs $$$G$$$ are in bijection with $$$n$$$-vertex colored graphs $$$G'$$$ in the following way:

$$$G'$$$ can have multiple edges, but no self-loop.
$$$G'$$$ is 2-regular.
The $$$n$$$ edges are colored by $$$[n]$$$, and each colored appears exactly once.
For each vertex, its adjacent edges are not colored by its own index.

Then we could apply the inclusion-exclusion principle in the following way. Let $$$b_{n,k}$$$ be the number of $$$n$$$-vertex graphs that satisfy 1-2, and have $$$n-k$$$ specified vertices matching an adjacent edge. Then we have

$$$ a_n = \sum_k (-1)^{n-k} k! b_{n,k}. $$$

The gf of $$$b$$$ should be the $$$\exp$$$ of "single cycles", which are easy to write down. Then we could derive the gf of $$$b$$$, and finally the gf of $$$a$$$.

I also want to remark that the above argument explicitly relies on the structure of 2-regular graphs. But this seems not necessary. In 1986, Goulden and Jackson proved that the number of $$$3$$$-regular graphs and $$$4$$$-regular graphs are P-recursive. Later, Ira M. Gessel proved that for $$$k$$$-regular graphs for any $$$k$$$ are P-recursive. See Symmetric functions and P-recursiveness. His argument doesn't rely on the structure of regular graphs, but on the theory of symmetric functions.

Change of Basis

Here are several attempts to generalize the ideas in Little Q's sequence (see also the blog).

China Team Selection 2023, Yet Another Eulerian Number Problem. Given $$$n, m, n_0, m_0$$$, consider an 2-d array with initial value $$$F_{n_0, k} = [k = m_0]$$$ and the recurrence relation $$$F_{n, m} = (\alpha\cdot (n-1) + 1 - m) F_{n-1,m-1} + (m+1)F_{n-1,m}$$$, compute the value of $$$F_{n, m}$$$.

Remark: The case $$$\alpha = 1$$$ is the Eulerian number problem.

Let $$$F_n(x)$$$ be the gf of the $$$n$$$-th row, we have the relation

$$$ F_{n+1} = (\alpha n x + 1)F_n + x (1-x) F_n', $$$

it's not very clear how this equation helps us solve the problem.

Consider the change of basis

$$$ G_n = \frac{x F_n}{(1-x)^{\alpha n + 1}}, $$$

the iteration of $$$G$$$ is simplified to

$$$ G_{n+1} = \frac{x}{(1-x)^{\alpha - 1}} G_n'. $$$

Remark: This step makes the iteration formula independent of $$$n$$$.

When $$$\alpha = 1$$$, this transform is simply taking the coefficient $$${a_k}$$$ to $$${k a_k}$$$, which is easy to perform multiple times. For the general case, we need to perform another change of basis.

Suppose we have some $$$u(x)$$$, take $$$H_n(u) = G_n(x)$$$ and $$$H_n$$$ satisfies

$$$ H_{n+1} = u H_n'. $$$

Then we can easily perform the iteration of $$$H$$$ multiple times.

Plug in the iteration rule of $$$G$$$ and $$$H$$$, we obtain the differential equation

$$$ u = u' \frac{x}{(1-x)^{\alpha-1}}. $$$

Now we can solve the problem in the following way:

From $$$F_{n_0}(x) = x^{m_0}$$$, we can derive the initial value of $$$G_{n_0}(x) = x^{m_0+1}/(1-x)^{\alpha n_0+1}$$$.
Then $$$H_{n_0}(x) = G_{n_0}(u^{\langle -1\rangle})$$$ can be computed in $$$O(n\log n)$$$ time. We can first solve $$$u^{\langle -1\rangle}$$$ by solving the differential equation, then use the explicit formula of $$$G_{n_0}$$$.
Then we can compute $$$H_{n}(x)$$$ from $$$H_{n_0}(x)$$$ by taking the coefficient $$${a_k}$$$ to $$${k ^{n-n_0}a_k}$$$.
Finally we convert back to $$$F_{n}(x)$$$ by

$$$F_{n}(x) = (1-x)^{\alpha n + 1} H_n(u(x)) = [(1-u^{\langle -1\rangle})^{\alpha n+1}H(x)] \circ u, $$$

since we only want to extract the coefficient of $$$x^m$$$, this can be done via Lagrange Inversion, in $$$O(n\log n)$$$ time.

Remark: Since now we have a general way to compose power series in $$$O(n\log^2 n)$$$ time, the above idea actually leads to an algorithm that takes arbitrary initial value $$$F_{n_0}(x)$$$ as input, and compute the value of $$$F_{n}(x)$$$ in $$$O(n\log^2 n)$$$ time.

Luogu P8555 Liar. Given $$$n$$$, for all $$$m$$$, count the number of permutations $$$\sigma$$$ of $$$[n]$$$ such that there exists distinct positions $$$p_m,\dots,p_{n-1}$$$ such that $$$p_t\leq t$$$ and $$$\sigma(p_t)$$$ is increasing.

The original solution is more combinatorial, here I present a solution using the idea of basis change.

We first consider a greedy procedure, first try to maximize the value of $$$\sigma(p_{n-1})$$$, then $$$\sigma(p_{n-2})$$$, and so on.

Consider dynamic programming starting with initial condition $$$f(n,0) = 1$$$. For $$$f(i,j)$$$ we are counting something that $$$p_{\geq i}$$$ are determined, and there are $$$j$$$ positions $$$p_t$$$ such that $$$t\geq i$$$ and $$$p_t \leq i$$$. We'll see what does this mean later.

For $$$f(i, j)$$$ ($$$i > m$$$), we need to determine $$$p_{i-1}$$$. There are two cases:

Case 1: The position $$$i$$$ is already chosen by some $$$p_t$$$, there are $$$j$$$ possibilities. This gives

$$$ f(i-1,j) \gets f(i-1,j) + j f(i,j). $$$

Case 2: that $$$i$$$ is not chosen by any $$$p_t$$$, the rest of unused values are $$$1,\dots,i-j$$$, thus there are $$$i-j$$$ choices for $$$\sigma(i)$$$, and then we must find a position for $$$p_{i-1}\leq i-1$$$, this increases $$$j$$$ by $$$1$$$. This gives

$$$ f(i-1,j+1) \gets f(i-1,j+1) + (i-j) f(i,j). $$$

Finally, we have the answer is

$$$ m!\sum_{j\leq m} f(m,j), $$$

because we first need to assign those $$$j$$$ positions, and then the rest of the values are determined.

We let $$$f_i(x)$$$ be the gf of $$$f(i, \cdot)$$$, then the above iteration can be written as

$$$ f_{i-1} = (x-x^2) f'_i + i x f_i. $$$

Consider the change of basis

$$$ g_i = \frac{f_i}{(1-x)^{i}}, $$$

then we have

$$$ g_{i-1} = x(1-x)^2 g_i'. $$$

Then we do another change of basis

$$$ h_i(u) = g_i(x), $$$

where $$$u = te^t, t=x/(1-x)$$$. One can verify that

$$$ h_{i-1} = u h_i'. $$$

This again reduces the iteration to a simple form.

In order to extract the answer, we first define the inner product

$$$ \langle f, g\rangle = \sum_k [x^k] f(x) \cdot [x^k]g(x). $$$

Our goal is to compute

$$$ \begin{align*} &\quad \langle f_m(x), 1 + \cdots + x^m \rangle\\ &= \langle g_m(x)(1-x)^m, 1 + \cdots + x^m \rangle, \end{align*} $$$

observed that

$$$ \begin{align*} &\quad \langle x^k (1-x)^m, 1 + \cdots + x^m \rangle\\ &= [x^m] \frac{x^k (1-x)^m}{1-x}\\ &= [x^{m-k}] (1-x)^{m-1} \\ &= [x^k] x(x-1)^{m-1}, \end{align*} $$$

we have

$$$ \begin{align*} &\quad \langle g_m(x)(1-x)^m, 1 + \cdots + x^m \rangle\\ &= \sum g_{m,k} \langle x^k(1-x)^m, 1+\cdots+x^m \rangle\\ &= \sum g_{m,k} [x^k] x(x-1)^{m-1}\\ &= \langle g_m(x), x(x-1)^{m-1} \rangle\\ &= \langle h_m(u), x(x-1)^{m-1} \rangle\\ &= \langle h_m(te^t), x(x-1)^{m-1} \rangle. \end{align*} $$$

Recall that $$$t = x/(1-x)$$$, we have

$$$ \begin{align*} &\quad \langle t^k, x(x-1)^{m-1} \rangle\\ &= \left\langle \left(\frac{x}{1-x}\right)^k, x(x-1)^{m-1} \right\rangle\\ &= [x^m]\left(\frac{x}{1-x}\right)^k \cdot (1-x)^{m-1}\\ &= [x^m] x^k (1-x)^{m-k-1}\\ &= [k=m]. \end{align*} $$$

Hence

$$$ \begin{align*} &\quad \langle h_m(te^t), x(x-1)^{m-1} \rangle\\ &= [t^m] h_m(te^t)\\ &= \sum_{j} ([u^j] h_m) \frac{j^{m-j}}{(m-j)!}\\ &= \sum_{j} ([u^j] h_n) \cdot j^{n-m} \frac{j^{m-j}}{(m-j)!}\\ &= \sum_{j} ([u^j] h_n)j^{n-j} \cdot \frac{1}{(m-j)!}. \end{align*} $$$

We can first compute the coefficient of $$$h_n(u)$$$, then use convolution to compute the answer for all $$$n$$$.

Since $$$u = t e^t$$$, by Lagrange inversion, we have

$$$ g_n(x)=(1-x)^{-n}=(1+t)^n = \left( 1 + \sum_{k\geq 1} \frac{(-k)^{k-1}}{k!}u^k \right)^n = h_n(u). $$$

Therefore, we can compute the answer in $$$O(n\log n)$$$ time.

Full text and comments »

+113

Elegia
7 weeks ago
2

Appetizer

We start with this toy problem.

$$$n$$$-King Problem. Given $$$n$$$, count the number of permutations $$$\sigma$$$ of $$$[n]$$$ such that for any $$$i$$$, $$$|\sigma(i) - \sigma(i+1)| \neq 1$$$.

Of course, this sequence has already been studied before, it is the A002464 in OEIS. You can find its recurrence relation

$$$ a_n = (n+1)a_{n-1} - (n-2)a_{n-2} - (n-5)a_{n-3}+ (n-3)a_{n-4}. $$$

But how to prove this? A combinatorial proof is possible, but not that easy. Here I will show a different way to prove this.

We first consider the inclusion-exclusion principle. Suppose we force a subset $$$S$$$ of $$$[n-1]$$$ such that $$$|\sigma(i) - \sigma(i+1)| = 1$$$ for all $$$i \in S$$$. Then we can see that the number of permutations is determined by the following: First arbitrarily permute order the contiguous blocks of $$$[n]$$$ determined by $$$S$$$, then for each block of length $$$\geq 2$$$, it can be ascending or descending, this contributes a factor of $$$2$$$. Therefore, we could write down the generating function of the sequence $$$a_n$$$ as

$$$ \begin{align*} \sum_{n\geq 0} a_n x^n &= \left(\sum_{n\geq 0} n! T^n\right) \circ \left(x - 2 x^2 + 2x^3 - 2x^4 + \cdots\right)\\ &= \left(\sum_{n\geq 0} n! T^n\right) \circ \left(x\frac{1-x}{1+x}\right). \end{align*} $$$

Let $$$A(x)$$$ be the generating function of $$$a_n$$$, if we want to derive a P-recursive relation for $$$a_n$$$, this is equivalent to finding the D-Finite relation for $$$A(x)$$$. The idea is to first analyze the generating function of $$$n! T^n$$$: let

$$$ S(T) = \sum_{n\geq 0} n!T^n, $$$

let $$$s_n = n!$$$ be the sequence of $$$S(T)$$$, then by the recurrence relation

$$$ s_n = n s_{n-1} + [n=0], $$$

we can turn this into a D-finite relation

$$$ S(T) = (S\cdot T)' \cdot T +1 = 1 + TS + T^2S'. $$$

Therefore, since $$$A(x) = S(g(x))$$$, where $$$g(x) = x\frac{1-x}{1+x}$$$, we can derive the D-Finite relation for $$$A(x)$$$:

$$$ A(x) = S \circ g = 1 + gA + g^2 S'\circ g. $$$

recall that $$$A' = g' S'(g)$$$, we can plug in $$$S'\circ g = A' / g'$$$, and we can get the D-Finite relation for $$$A(x)$$$:

$$$ A = 1 + gA + \frac{g^2 A'}{g'}. $$$

After some simplification, we can get the desired P-recursive relation for $$$a_n$$$. I omit the computation details here.

The derivation above is a typical example of the application of the following basic property of D-Finite functions:

Let $$$f$$$ be a D-Finite function, and $$$q$$$ be a rational function, then $$$f\circ q$$$ is D-Finite.

This leads to a fundamental reason why D-Finite functions are useful: They satisfy so many nice closure properties, and these properties are almost complete for explaining the ubiquity of P-recursive sequences in combinatorics.

Multivariate D-Finite Functions

We are already more-or-less familiar with one variable D-Finite functions, whose associated sequence is P-recursive (or called holonomic). A natural question is to ask the correct generalization of D-Finite functions to multiple variables.

The correct definition is due to Richard P. Stanley.

Def 1. Let $$$K$$$ be a field of characteristic zero. A formal power series $$$f \in K[ [ X_1,\dots,X_d ] ]$$$ is called D-finite if the $$$K(X_1,\dots,X_d)$$$-vector space spanned by all derivatives of $$$f$$$ is finite-dimensional.

A caveat is that for multiple variables, the description of the property of the associated sequence of a D-Finite series, is not as nice as in the univariate case. However, we will see why the definition is still useful later.

Similar to the univariate case, multivariate D-Finite functions have nice closure properties.

Theorem. If $$$f, g \in K[ [ X_1,\dots,X_d ] ]$$$ is D-Finite, then

For any $$$a,b \in K(X_1,\dots,X_d)$$$, if $$$af+bg$$$ is still a formal power series, it is D-Finite.
The product $$$fg$$$ is D-Finite.
If $$$u_1,\dots,u_d \in K[ [ Y_1,\dots,Y_e ] ]$$$ is algebraic, and the composition $$$f(u_1,\dots,u_d)$$$ is well-defined in some sense, then it is D-Finite.
(Lipshitz, 1988) If $$$f$$$ is D-Finite, then the diagonal operator $$$\Delta_{X_{d-1},X_d}\colon K[ [ X_1,\dots,X_d ] ] \to K[ [ X_1,\dots,X_{d-2},U ] ]$$$

$$$ \left(\sum_{i_1,\dots,i_d\geq 0} f_{i_1,\dots,i_d} X_1^{i_1}\cdots X_d^{i_d}\right) \mapsto \sum_{i_1,\dots,i_{d-2},j\geq 0} f_{i_1,\dots,i_{d-2},j,j} X_1^{i_1}\cdots X_{d-2}^{i_{d-2}} U^j. $$$

maps D-Finite functions to D-Finite functions.

The first two properties shows that D-Finite functions in $$$K[ [ X_1,\dots,X_d ] ] \otimes_{K[X_1,\dots,X_d]} K(X_1,\dots,X_d)$$$ form a $$$K(X_1,\dots,X_d)$$$-algebra.

The first three properties are easy to prove and the last one needs more insights (but is also the most useful one).

I'll not present the proofs here; interested readers can refer to Manuel Kauers's recent book D-Finite Functions. However, I want to remark that all these properties can be made constructive.

Make Practical

One could prove that Def 1 is equivalent to the following definition:

Def 2. A formal power series $$$f \in K[ [ X_1,\dots,X_d ] ]$$$ is D-Finite if for any $$$1\leq \ell \leq d$$$, there exists $$$k_\ell$$$ such that $$$\partial_{X_\ell}^{k_\ell} f$$$ can be $$$K(X_1,\dots,X_d)$$$-linearly expressed by $$$\partial_{X_\ell}^{j} f$$$ for $$$0\leq j < k_\ell$$$.

Therefore, the constructiveness means that:

We could use data of the linear relation in Def 2 to represent a D-Finite function. (Rigorously speaking, the solution of this system of equations didn't fully determine the D-Finite function, but it is enough for most purposes.)
The above closure properties can be proved by an algorithm that takes the linear relations as input, and outputs the linear relations of the output D-Finite function.

Now we take the simplest example of the sum of two univariate D-Finite functions. Let $$$f,g$$$ be D-Finite, and suppose they satisfy the linear relations

$$$ \partial^n_X f = \sum_{j < n} p_j(X) \partial^j_X f, \quad \partial^m_X g = \sum_{j < m} q_j(X) \partial^j_X g. $$$

Then we can derive the linear relation for $$$f+g$$$: Recursively compute the representation of $$$\partial_X^i (f+g)$$$, in terms of $$$\partial_X^j f$$$ and $$$\partial_X^k g$$$ for $$$j < n$$$ and $$$k < m$$$. Since they form an $$$(n+m)$$$-dimensional vector space, we can always find a linear relation for $$$\partial_X^i (f+g)$$$ between $$$0\leq i\leq n+m$$$.

For example, in jqdai0815's blog, he showed that the power series involved in the problem Chinese Elephant Chess is D-Finite. He suspected that it would be non-practical to compute the sequence via the P-recursive relation, but in fact, I implemented the above algorithm to automatically compute the relation, and then compute the sequence in linear time. Turns out that this code is, in fact, quite efficient. The code is attached below as a reference.

Code

#include <bits/stdc++.h>

#define LOG(FMT...) fprintf(stderr, FMT)

using namespace std;

typedef long long ll;
typedef unsigned long long ull;

// mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

const uint Mod = 998244353;

constexpr uint norm(uint v) { return v >= Mod ? v - Mod : v; }

struct Z {
  uint v;

  constexpr Z(uint v = 0) : v(v) {}

  constexpr Z(int v) : v(norm(static_cast<uint>(v % Mod + Mod))) {}

  inline constexpr Z operator+(const Z &rhs) const { return norm(v + rhs.v); }

  inline constexpr Z operator-(const Z &rhs) const { return norm(v + Mod - rhs.v); }

  inline constexpr Z operator-() const { return norm(Mod - v); }

  inline constexpr Z operator*(const Z &rhs) const { return static_cast<uint>(static_cast<ull>(v) * rhs.v % Mod); }

  inline constexpr Z inv() const;

  inline constexpr Z operator/(const Z &rhs) const { return *this * rhs.inv(); }

  constexpr operator uint() const { return v; }
};

inline constexpr Z &operator+=(Z &lhs, const Z &rhs) { return lhs = lhs + rhs; }

inline constexpr Z &operator-=(Z &lhs, const Z &rhs) { return lhs = lhs - rhs; }

inline constexpr Z &operator*=(Z &lhs, const Z &rhs) { return lhs = lhs * rhs; }

inline constexpr Z &operator/=(Z &lhs, const Z &rhs) { return lhs = lhs / rhs; }

inline constexpr Z pow(Z base, uint exp) {
  Z res = 1;
  for (; exp; base *= base, exp >>= 1)
    if (exp & 1)
      res *= base;
  return res;
}

inline constexpr Z Z::inv() const { return pow(*this, Mod - 2); }

int nPeak;

struct P : vector<Z> {
  P() : vector(1) {}

  P(unsigned long n) : vector(n) {}

  P(const initializer_list<value_type> &il) : vector(il) {}

  ~P() { nPeak = max(nPeak, (int) size()); }

  int deg() const { return (int) size() - 1; }

  bool trim() {
    int k = size();
    while (k && !operator[](k - 1)) --k;
    resize(k);
    return k;
  }

  P operator-() const {
    P ret(size());
    for (int i = 0; i < size(); ++i) ret[i] = -operator[](i);
    return ret;
  }
};

P operator*(const P &a, const P &b) {
  int n = a.deg(), m = b.deg();
  if (n == -1 || m == -1) return P();
  P c(n + m + 1);
  for (int i = 0; i <= n + m; ++i)
    for (int j = max(0, i - m); j <= min(i, n); ++j)
      c[i] += a[j] * b[i - j];
  return c;
}

P operator*(const P &a, const Z &z) {
  P c(a);
  for (Z &x : c) x *= z;
  return c;
}

P operator+(const P &a, const P &b) {
  if (a.size() >= b.size()) {
    P c = a;
    for (int i = 0; i < b.size(); ++i) c[i] += b[i];
    return c;
  }
  P c = b;
  for (int i = 0; i < a.size(); ++i) c[i] += a[i];
  return c;
}

P operator-(const P &a, const P &b) { return a + -b; }

bool operator==(const P &a, const P &b) {
  if (a.size() < b.size())
    return equal(a.begin(), a.end(), b.begin()) && count(b.begin() + a.size(), b.end(), Z(0)) == b.size() - a.size();
  return equal(b.begin(), b.end(), a.begin()) && count(a.begin() + b.size(), a.end(), Z(0)) == a.size() - b.size();
}

P gcd(P a, P b) {
  if (!a.trim()) return b;
  if (!b.trim()) return a;
  if (a.size() < b.size()) swap(a, b);
  while (b.trim()) {
    Z in = b.back().inv();
    for (Z &x : b) x *= in;
    int n = a.deg(), m = b.deg();
    for (int i = n; i >= m; --i) {
      for (int j = 1; j <= m; ++j) a[i - j] -= a[i] * b[m - j];
      a[i] = 0;
    }
    swap(a, b);
  }
  return a;
}

P div(P a, P b) {
  Z in = b.back().inv();
  for (Z &x : b) x *= in;
  int n = a.deg(), m = b.deg();
  P ret(n - m + 1);
  for (int i = n; i >= m; --i) {
    ret[i - m] = a[i] * b[m];
    for (int j = 1; j <= m; ++j) a[i - j] -= a[i] * b[m - j];
  }
  for (Z &x : ret) x = x * in;
  return ret;
}

struct Q {
  P x, y;

  Q(const P &x = P(), const P &y = {Z(1)}) : x(x), y(y) {}

  Q operator+(const Q &rhs) const { return Q(x * rhs.y + y * rhs.x, y * rhs.y); }

  Q operator-() const { return Q(-x, y); }

  Q operator-(const Q &rhs) const { return *this + -rhs; }

  Q operator*(const Q &rhs) const { return Q(x * rhs.x, y * rhs.y); }

  Q operator*(const Z &rhs) const { return Q(x * rhs, y * rhs); }

  Q inv() const { return Q(y, x); }

  Q operator/(const Q &rhs) const { return *this * rhs.inv(); }

  bool operator==(const Q &rhs) const { return x * rhs.y == y * rhs.x; }

  bool operator!=(const Q &rhs) const { return !operator==(rhs); }

  void simplify() {
    y.trim();
    if (!x.trim()) {
      y = P{Z(1)};
      return;
    }
    P g = gcd(x, y);
    x = div(x, g);
    y = div(y, g);
  }
};

P der(P a) {
  if (a.empty()) return a;
  for (int i = 1; i < a.size(); ++i) a[i] *= i;
  a.erase(a.begin());
  return a;
}

Z eval(const P &a, const Z &z) {
  Z v = 0;
  for (int i = a.deg(); i >= 0; --i)
    v = v * z + a[i];
  return v;
}

vector<Z> invs(const vector<Z>& vec) {
  vector<Z> pre(vec.size()), ret(vec.size());
  pre[0] = 1;
  for (int i = 1; i < vec.size(); ++i)
    pre[i] = pre[i - 1] * vec[i - 1];
  Z tot = accumulate(vec.begin(), vec.end(), Z(1), multiplies<Z>()).inv();
  for (int i = (int)vec.size() - 1; i >= 0; --i) {
    ret[i] = tot * pre[i];
    tot *= vec[i];
  }
  return ret;
}

Q der(const Q &a) {
  return Q(der(a.x) * a.y - der(a.y) * a.x, a.y * a.y);
}

struct Q_Basis {
  int dim, id;
  vector<vector<Q>> basis, augment;

  Q_Basis(int dim) : dim(dim), id(), basis(dim), augment(dim) {}

  vector<Q> insert(vector<Q> vec) {
    vector<Q> rep(dim + 1);
    rep[id++] = Q({Z(1)});
    for (int i = 0; i < dim; ++i) {
      if (vec[i] != Q()) {
        if (basis[i].empty()) {
          for (int j = i + 1; j < dim; ++j) {
            vec[j] = vec[j] / vec[i];
            vec[j].simplify();
          }
          for (int j = 0; j < id; ++j) {
            rep[j] = rep[j] / vec[i];
            rep[j].simplify();
          }
          vec[i] = Q({Z(1)});
          basis[i] = vec;
          augment[i] = rep;
          return {};
        } else {
          for (int j = i + 1; j < dim; ++j)
            vec[j] = vec[j] - vec[i] * basis[i][j];
          for (int j = 0; j < id; ++j)
            rep[j] = rep[j] - vec[i] * augment[i][j];
          vec[i] = Q();
        }
      }
    }
    return rep;
  }
};

// m = size(), sum_m A_m(x)P^(m) = 0
using ODE = vector<P>;
using PRec = vector<P>;

PRec DFinite_genPRec(const ODE &ode) {
  int offset = numeric_limits<int>::max();
  int n = ode.size() - 1, m = numeric_limits<int>::min();
  for (int i = 0; i <= n; ++i) {
    if (ode[i].empty()) continue;
    m = max(m, (int) ode[i].size() - 1 - i);
    int j = 0;
    while (ode[i][j] == 0) ++j;
    offset = min(offset, j - i);
  }
  m -= offset;
  PRec rec(m + 1, P(n + 1));
  P fall{Z(1)};
  for (int i = 0; i <= n; ++i) {
    P coe = fall;
    for (int j = 0; j < (int) ode[i].size() - i - offset; ++j) {
      if (j + i + offset >= 0) rec[j] = rec[j] + coe * ode[i][j + i + offset];
      coe = div(coe * P{-Z(i + j), Z(1)}, P{-Z(j), Z(1)});
    }
    fall = fall * P{-Z(i), Z(1)};
  }
  for (int i = 0; i <= m; ++i) rec[i].trim();
  return rec;
}

struct Eval {
  PRec prec;
  vector<Z> coeff;

  Eval(const PRec &prec) : prec(prec) {}

  int pre(int n) {
    coeff.resize(n + 1);
    for (int i = 0; i <= n; ++i) coeff[i] = eval(prec[0], i);
    int r = 0;
    for (int i = n; i; --i)
      if (coeff[i] == 0) {
        r = i;
        break;
      }
    return r;
  }

  vector<Z> post(vector<Z> init) {
    int start = init.size();
    auto nvs = invs(vector<Z>(coeff.begin() + start, coeff.end()));
    init.resize(coeff.size());
    for (int i = start; i < coeff.size(); ++i) {
      for (int j = 1; j < min(i + 1, (int)prec.size()); ++j)
        init[i] += init[i - j] * eval(prec[j], i);
      init[i] = init[i] * -nvs[i - start];
    }
    return init;
  }
};

ostream &operator<<(ostream &os, const P &p) {
  if (p.empty()) return os << 0;
  os << p[0];
  for (int i = 1; i < p.size(); ++i)
    os << " + " << p[i] << "x^" << i;
  return os;
}

ostream &operator<<(ostream &os, const Q &q) {
  return os << '(' << q.x << ") / (" << q.y << ')';
}

ostream &operator<<(ostream &os, const vector<P> &ode) {
  for (int i = 0; i < ode.size(); ++i) {
    if (i) os << " + ";
    os << '(' << ode[i] << ")P^(" << i << ")";
  }
  return os << " = 0";
}

template<class T>
istream &operator>>(istream &is, vector<T> &v) {
  for (T &x : v)
    is >> x;
  return is;
}

template<class T>
ostream &operator<<(ostream &os, const vector<T> &v) {
  if (!v.empty()) {
    os << v.front();
    for (int i = 1; i < v.size(); ++i)
      os << ' ' << v[i];
  }
  return os;
}

void simplify(ODE &ode) {
  P g = P();
  for (P &x : ode) {
    x.trim();
    g = gcd(g, x);
  }
  for (P &x : ode) if (!x.empty()) x = div(x, g);
}

ODE DFinite_sum(const ODE &op, const ODE &oq) {
  int n = op.size() - 1, m = oq.size() - 1;
  Q_Basis basis(n + m);
  vector<Q> pd(n + 1), qd(m + 1);
  pd[0] = qd[0] = Q({Z(1)});
  for (int dim = 0; dim <= n + m; ++dim) {
    cerr << "INSERT " << pd << " | " << qd << '\n';
    {
      vector<Q> vec(n + m);
      copy(pd.begin(), pd.begin() + n, vec.begin());
      copy(qd.begin(), qd.begin() + m, vec.begin() + n);
      auto ret = basis.insert(vec);
      if (!ret.empty()) {
        cerr << "OK dim = " << dim << "\n";
        ODE ode(dim + 1);
        P prod = {Z(1)};
        for (int i = 0; i < dim; ++i) prod = prod * ret[i].y;
        ode[dim] = prod;
        for (int i = 0; i < dim; ++i) ode[i] = ret[i].x * div(prod, ret[i].y);
        simplify(ode);
        return ode;
      }
    }
    pd[n] = qd[m] = Q();
    for (int j = n - 1; j >= 0; --j) {
      pd[j + 1] = pd[j + 1] + pd[j];
      pd[j] = der(pd[j]);
    }
    for (int j = m - 1; j >= 0; --j) {
      qd[j + 1] = qd[j + 1] + qd[j];
      qd[j] = der(qd[j]);
    }
    for (int j = 0; j < n; ++j) {
      pd[j] = pd[j] - pd[n] * op[j] / op[n];
      pd[j].simplify();
    }
    for (int j = 0; j < m; ++j) {
      qd[j] = qd[j] - qd[m] * oq[j] / oq[m];
      qd[j].simplify();
    }
  }
  cerr << "!FAILED\n";
  assert(false);
}

ODE DFinite_prod(const ODE &op, const ODE &oq) {
  int n = op.size() - 1, m = oq.size() - 1;
  Q_Basis basis(n * m);
  vector<vector<Q>> p(n + 1, vector<Q>(m + 1));
  p[0][0] = Q({Z(1)});
  for (int dim = 0; dim <= n * m; ++dim) {
    {
      vector<Q> vec(n * m);
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
          vec[i * m + j] = p[i][j];
      cerr << "INSERT " << vec << '\n';
      auto ret = basis.insert(vec);
      if (!ret.empty()) {
        cerr << "OK dim = " << dim << "\n";
        ODE ode(dim + 1);
        P prod = {Z(1)};
        for (int i = 0; i < dim; ++i) prod = prod * ret[i].y;
        ode[dim] = prod;
        for (int i = 0; i < dim; ++i) ode[i] = ret[i].x * div(prod, ret[i].y);
        simplify(ode);
        return ode;
      }
    }
    for (int i = 0; i < n; ++i) p[i][m] = Q();
    for (int j = 0; j < m; ++j) p[n][j] = Q();
    for (int i = n - 1; i >= 0; --i)
      for (int j = m - 1; j >= 0; --j) {
        p[i + 1][j] = p[i + 1][j] + p[i][j];
        p[i][j + 1] = p[i][j + 1] + p[i][j];
        p[i][j] = der(p[i][j]);
      }
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < m; ++j) {
        p[i][j] = p[i][j] - p[n][j] * op[i] / op[n] - p[i][m] * oq[j] / oq[m];
        p[i][j].simplify();
      }
  }
  cerr << "!FAILED\n";
  assert(false);
}

ODE DFinite_compQ(const ODE &op, Q q) {
  q.simplify();
  int n = op.size() - 1;
  vector<vector<Q>> tri(n + 1, vector<Q>(n + 1));
  tri[0][0] = Q({Z(1)});
  Q d = der(q);
  d.simplify();
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j <= i; ++j) {
      tri[i + 1][j] = tri[i + 1][j] + der(tri[i][j]);
      tri[i + 1][j + 1] = tri[i][j] * d;
    }
    for (int j = 0; j <= i + 1; ++j) tri[i + 1][j].simplify();
  }
  vector<Q> vec(n + 1);
  for (int i = 0; i <= n; ++i) {
    for (int j = op[i].deg(); j >= 0; --j)
      vec[i] = vec[i] * q + Q({op[i][j]});
    vec[i].simplify();
  }
  for (int i = n; i >= 0; --i) {
    cerr << tri[i][i] << '\n';
    vec[i] = vec[i] / tri[i][i];
    vec[i].simplify();
    for (int j = 0; j < i; ++j) {
      vec[j] = vec[j] - vec[i] * tri[i][j];
    }
  }
  P prod = P{Z(1)};
  for (int i = 0; i <= n; ++i) prod = prod * vec[i].y;
  ODE ret(n + 1);
  for (int i = 0; i <= n; ++i) ret[i] = vec[i].x * div(prod, vec[i].y);
  simplify(ret);
  return ret;
}

vector<Z> apply(vector<Z> vec, const ODE &ode) {
  vector<Z> ret(vec.size());
  for (int i = 0; i < ode.size(); ++i) {
    for (int j = 0; j < ode[i].size(); ++j)
      for (int k = j; k < vec.size(); ++k)
        ret[k] += vec[k - j] * ode[i][j];
    for (int j = 1; j < vec.size(); ++j)
      vec[j - 1] = vec[j] * Z(j);
    vec.back() = 0;
  }
  return ret;
}

const ODE ODE_EXP = {{-Z(1)},
                     {Z(1)}};
const ODE ODE_LN = {{Z()},
                    {-Z(1)},
                    {Z(1), -Z(1)}};
const ODE ODE_FACT = {{Z(1)},
                      {-Z(1), Z(3)},
                      {Z(),   Z(), Z(1)}};

const Q A1x({Z(), Z(1), Z(1)}, {Z(2), -Z(2)});
const Q B1x({Z(2), -Z(1)}, {Z(2), -Z(2)});

ODE ode_power(const Z &k) {
  return ODE{{-Z(k)},
             {Z(), Z(1)}};
}

ODE ode_bessel(const Z &k) {
  return ODE({{-Z(1)},
              {k + Z(1)},
              {Z(), Z(1)}});
}

int main() {
#ifdef ELEGIA
  freopen("test.in", "r", stdin);
  int nol_cl = clock();
#endif
  ios::sync_with_stdio(false);
  cin.tie(nullptr);

  int n, m;
  cin >> n >> m;
  if (n > m) swap(n, m);
  Z k = m - n;
  ODE A = DFinite_prod(DFinite_compQ(ODE_EXP, A1x), DFinite_compQ(ode_power(-Z(2).inv()), Q({Z(1), -Z(1)})));
  cerr << "A: " << A << '\n';
  ODE B = DFinite_compQ(ode_bessel(k), B1x * B1x * Q({Z(), Z(1)}));
  cerr << "B: " << B << '\n';
  ODE C = DFinite_compQ(ode_power(k), B1x);
  cerr << "C: " << C << '\n';

  ODE tot = DFinite_prod(B, DFinite_prod(A, C));
  cerr << "TOT ODE: " << tot << '\n';

  PRec prec = DFinite_genPRec(tot);
  cerr << prec << '\n';
  Eval eval(prec);
  cerr << eval.coeff << '\n';
  eval.pre(n);
  auto res = eval.post({Z(1)});
  Z ans = res[n];
  Z v = 1;
  for (int i = 1; i <= k; ++i) v *= i; ans /= v;
  for (int i = 1; i <= n; ++i) ans *= i;
  for (int i = 1; i <= n + k; ++i) ans *= i;
  cout << ans << '\n';

  cerr << "peak size = " << nPeak << '\n';

#ifdef ELEGIA
  LOG("Time: %dms\n", int ((clock()
          -nol_cl) / (double)CLOCKS_PER_SEC * 1000));
#endif
  return 0;
}

More Implications

Then, let me demonstrate a more interesting example, this might explain why D-Finite functions are everywhere.

We are often concerned with a summation involving binomials, and we want to simplify the expression. For example, the following sum

$$$ a_n = \sum_k (-1)^k \binom{n}{k}^3. $$$

Well, this actually can be simplified and is known as Dixon's identity, which has a lot of beautiful proofs.

But in a computational perspective, suppose I just want to compute the sequence $$$a_n$$$ efficiently,

do we need to know such a beautiful identity?

The answer is no. It's very general that such kind of sequences are P-recursive, here is a proof:

We first write down the generating function of the binomial:

$$$ Q(X,Y)= \sum_{n,k} \binom{n}{k} X^n Y^k = \sum_n X^n (1+Y)^n = \frac 1{1-X-XY}, $$$

which is clearly D-Finite. Then consider the generating function

$$$ Q(X_1,Y_1)Q(X_2,Y_2)Q(X_3,Y_3), $$$

is clearly D-Finite. Then we use the diagonal operator to glue the variables $$$X_1,X_2,X_3$$$ together, and $$$Y_1,Y_2,Y_3$$$ together, and we get a D-Finite function

$$$ R(X,Y) = \sum_{n,k}\binom{n}{k}^3 X^n Y^k. $$$

Finally, we plug in $$$Y=-1$$$, and we get the generating function of $$$a_n$$$:

$$$ R(X,-1) = \sum_n a_n X^n. $$$

Since $$$R(X,-1)$$$ is D-Finite, $$$a_n$$$ is P-recursive.

Therefore, we could compute $$$a_n$$$ in linear time by just using the recurrence, or even in $$$O(\sqrt n \log n)$$$ time by the fast evaluation algorithm of P-recursive sequences. Knowing the simplification of the expression doesn't help reducing the time complexity in this sense.

This argument has a vast generalization. In the paper Multiple Binomial Sums, the authors formalized a class of summation involving binomials that captures almost all the known identities. One can mimic the above argument to prove that all these sequences are D-Finite. (In the paper they proved some stronger characterization.)

Then here comes another question:

If we just want to solve a problem in competition, do we need to know the proof of P-Recursiveness?

The answer is also no! If a priori we know that the sequence is P-recursive, we can just compute the first several terms in some brute-force way, and then use Gaussian elimination to find the P-recursive relation. This is usually called Min25-BM since the BM algorithm tells that linear recurrence sequences can be reconstructed efficiently.

Remark: The problem of finding the P-recursive relation can be reduced to the Hermite-Padé approximation problem. While Gaussian elimination takes $$$O(n^3)$$$ time, the Hermite-Padé approximation can be solved $$$O(n^2)$$$ time or even $$$O(n\log^2 n)$$$ time. (I'm a little bit hand-waving here about what $$$n$$$ actually means), though this might not be very useful if one just wants to find a P-recursive relation of constant size.

Full text and comments »

+123

Elegia
7 weeks ago
0

How to composite (some) polynomials faster?

By Elegia, history, 12 months ago, In English

Hi everyone, today I'd like to elaborate something more on the problem I on the recently ended div 1 contest.

As described in the official solution, a bottleneck of the problem is the following: Given a sequence $$${c_j}_{-n \leq j \leq n}$$$, compute

$$$ g_k = \sum_{-k\leq j\leq k} ([x^j] (x + 1/x)^k) c_j $$$

for all $$$0 \leq k \leq n$$$.

In the official solution, an $$$O(n\log^2 n)$$$ time algorithm is presented, and we also claimed that it is possible to compute $$$g_k$$$ in $$$O(n\log n)$$$ time. In this post, I'd like to elaborate on how to do that.

By the way, another bottleneck of the problem is the relaxed convolution problem, where the usual implementation also takes $$$O(n\log^2 n)$$$ time. I mentioned in comment that the relaxed convolution has a somehow practical solution that has time complexity

$$$ O\left(\frac{n\log^2 n}{\log \log n}\right), $$$

and there are more intricated optimization that have theoretical complexity $$$O(n(\log n)^{1+o(1)})$$$, we have theoretically faster algorithms for problem I.

Well, let's now focus on the previous problem. If you know that methodology called transposition principle (aka. Tellegen's principle), you will immediately see that the problem is equivalent to the following: Given a polynomial $$$A(x)$$$ of degree $$$n$$$, compute the coefficients of $$$A(x + 1/x)$$$.

Well. If you don't know that, hope there will be someone writing a blog post about that. And just forget the previous problem, and focus on the problem of computing the coefficients of $$$A(x + 1/x)$$$. :)

So here comes our protagonist of this post: the composition problem.

Composition problem

The problem is described as follows. Given a polynomial $$$A(x)$$$ of degree $$$n$$$, and a rational function $$$P(x)/Q(x)$$$, where $$$P, Q$$$ has a constant degree, compute the coefficients of the polynomial $$$A(P(x)/Q(x))$$$. Here we mean that, since $$$Q(x)^n A(P(x)/Q(x))$$$ must be a polynomial, the goal is to compute its coefficients.

Another intuitive interpretation is that, $$$A(P(x)/Q(x))$$$ is a rational function, and we want to compute the coefficients of its numerator. This naturally leads to the following general algorithm: Consider a divide-and-conquer algorithm, let $$$A(x) = A_0(x) + x^{n/2} A_1(x)$$$, we compute $$$A_0(P/Q)$$$ and $$$A_1(P/Q)$$$ separately, and merge them via

$$$ A(P/Q) = A_0(P/Q) + (P/Q)^{n/2}A_1(P/Q). $$$

The time complexity of this algorithm is given by the recurrence relation $$$T(n) = 2T(n/2) + O(n\log n)$$$, which has solution $$$O(n\log^2 n)$$$.

Several years ago I started to think about this problem. Unfortunately, I don't know how to do better than this in general. However, I do discover some special cases that have $$$O(n\log n)$$$ algorithms, where indeed handle the case $$$x+1/x = (1 + x^2)/x$$$.

You might suspect that there is some magic theory behind this, but actually I don't have. The only thing I did was to find some algebraic tricks, and applying them just works.

Building Blocks

We first start with several building blocks that are useful for the composition problem.

The first case is $$$A(x+c)$$$. Actually this is almost the only nontrivial tool we can use to speed up the composition problem. I guess most of the audience already knows how to do this in $$$O(n\log n)$$$, but for completeness I will still present the idea here. Just look at the expansion

$$$ b_k = \sum_j a_j \binom j k = \frac 1{k!}\sum_j a_j j! \cdot \frac{c^{j-k}}{(j-k)!}, $$$

which is a convolution.

The second case is $$$A(x^k)$$$. This only requires $$$O(n)$$$ time, since it essentially needs no arithmetic operations.

The third case is $$$A(\alpha x)$$$, where $$$\alpha$$$ is a constant. This can be done by scaling the coefficients of $$$A(x)$$$, which also takes $$$O(n)$$$ time.

Basic Tricks

With the three building blocks above, we can already handle something that seems to be much more.

The first problem is $$$A(x^2 + ax + b)$$$, i.e., the composition of $$$A(x)$$$ with a quadratic polynomial. This can be done by an algebraic trick:

$$$ x^2 + ax + b = \left(x + \frac a 2\right)^2 + \left(b - \frac{a^2}{4}\right). $$$

What does this mean? This means that we can compute $$$A(x^2 + ax + b)$$$ by computing the compositions

$$$ A(x^2 + ax + b) = A(x) \circ \left(x +\left( b - \frac{a^2}{4}\right)\right) \circ x^2 \circ \left(x + \frac a 2\right) $$$

in the order from left to right.

The second problem is $$$A((ax+b)/(cx+d))$$$, i.e., the Mobius transformation. Just look at

$$$ \frac{ax+b}{cx+d} = \frac ac + \frac{bc-ad}{c(cx+d)}, $$$

and we can compute $$$A((ax+b)/(cx+d))$$$ by computing the compositions

$$$ A\left(\frac{ax+b}{cx+d}\right) = A(x) \circ \left(x + \frac ac\right) \circ \frac{bc-ad}{c}x^{-1} \circ (cx+d). $$$

Well, there are some technical details on how to compute the compositions after composing $$$x^{-1}$$$, but these are just tedious and I think you clever readers can figure them out by yourself. :)

The Trick for $$$A(x + 1/x)$$$

Now we are going to solve $$$A(x + 1/x)$$$, and here is the trick:

$$$ x + 1/x = 2\frac{x+1}{x-1} \circ x^2 \circ \frac{x+1}{x-1}. $$$

Using this special case, we actually can handle general $$$A(P(x)/Q(x))$$$ in $$$O(n\log n)$$$ time, where $$$P, Q$$$ are quadratic polynomials.

For $$$(ax^2 + bx + c) / (dx^2 + ex + f)$$$, we can first move out a constant term, then reduce to the cases $$$(b' x + c') / (dx^2 + ex + f)$$$.

Then substitute by $$$x - c'/b'$$$, we reduce to the case $$$b'x / (d'x^2 + e'x + f')$$$.

Then take reciprocal, we reduce to the case $$$(d'x^2 + e'x + f') / b'x = (d'x^2 + f') / b'x + e'/b'$$$.

Seems that $$$d'x/b' + f'x^{-1}/b'$$$ is different with $$$x+x^{-1}$$$, but after scaling, we can again reduce to the case $$$x+x^{-1}$$$.

Well, for some special cases, we cannot do division, but then it can also be reduced to other cases, such as composing a quadratic polynomial, which is also already settled.

(Acknowledgement: This blog (Chinese) reminded me that it's possible to settle all $$$(ax^2 + bx + c) / (dx^2 + ex + f)$$$).

The Trick for $$$A(x^3 + ax^2 + bx + c)$$$

Actually, we can also handle $$$A(x^3 + ax^2 + bx + c)$$$ in $$$O(n\log n)$$$ time.

The first two tricks are just applying $$$x+c$$$, then we can figure out that the only thing we need to handle is to compute $$$A(x^3 + cx)$$$. Since we can also do scaling, we only need to solve some specific $$$c$$$.

The game ends with the following trick:

$$$ (x^3 - 3x) \circ \frac{x+x^{-1}}{2} = \frac{x^3+x^{-3}}{2}. $$$

This means that

$$$ x^3-3x = \frac{x+x^{-1}}{2} \circ x^3 \circ \left(\frac{x+x^{-1}}{2}\right)^{\langle -1\rangle}. $$$

You may ask, how can I composite $$$\left(\frac{x+x^{-1}}{2}\right)^{\langle -1\rangle}$$$? Well, the problem is that it will occur the composition $$$x^{1/2}$$$ when you expand the composition chain of $$$x+x^{-1}$$$, but at that time, it can be proved that all terms of the polynomial have the form $$$x^{2k}$$$, then the composition $$$x^{1/2}$$$ has clear meanings.

Well, there are many technical details that I omitted, for example, it will occur in some essential cases that we need to do scaling with a parameter is a square root of something. This can be handled by introducing a formal element $$$R = \sqrt \alpha$$$, and there might be some other technical details that I omitted.

Whatever, the long chain of composition and the technical details make the $$$O(n\log n)$$$ algorithm not so practical according to my experimental test, but I think it is still an interesting result.

Can we go further?

Is it possible to solve the composition problem in $$$O(n\log n)$$$ time for general rational functions? At least for the current techniques we have, the answer is no.

Here is an argument. What we have done before is to reduce the composition problem to the composition chain with the Mobius transform $$$(ax+b)/(cx+d)$$$, and $$$x^k$$$, and $$$x^{1/k}$$$. Note that if we want to solve the equation $$$P(x) = 0$$$. Suppose $$$P(x)$$$ has such a composition chain, then we can solve the equation $$$P(x) = 0$$$ by solving the composition chain step by step, this will result in an expression of the roots of $$$P(x)$$$ in radicals. However, by the celebrated Abel–Ruffini theorem, we know that there are polynomials (actually general polynomials of degree $$$\geq 5$$$) that cannot be solved by radicals, and this means that these polynomials definitely cannot be represented by the composition chain we want!

So there might be some interesting questions: Try to find some characterization of the polynomials that can be represented by the composition chain. This seems to let me recall the Galois theory, or a characterization of the composition chain of a polynomial by polynomials, known as the "Ritt decomposition theorem". However, they seem to not be directly related to the composition chain we are talking about, I wonder if there might or might not be some well-developed mathematical theory behind this problem.

Another question is, can we find some other building blocks that can be used to solve the composition problem, thus extending our ability to solve the composition problem for more cases? Or more ambitiously, can we find an entirely different algorithm that can solve the composition problem in $$$O(n\log n)$$$ time for general rational functions?

I'll be happy to see if any of these questions have positive or negative answers.

Anyway, have a nice day!

Full text and comments »

Discussion of think-cell Round 1

+269

Elegia
12 months ago
8

Optimal Algorithm on Polynomial Composite Set Power Series

By Elegia, history, 4 years ago, In English

This is a part of my report on China Team Selection 2021. I might translate some other parts later if you are interested.

I'm going to show that we can calculate $$$f(G)$$$ for any polynomial $$$f\in R[x]$$$ and set power series $$$G: 2^{n} \rightarrow R$$$ under $$$\Theta(n^2 2^n)$$$ operations on $$$R$$$.

Let's first consider a special case: calculate $$$F = \exp G$$$. We consider set power series as truncated multivariate polynomial $$$R[x_1,\dots,x_n]/(x_1^2,\dots,x_n^2)$$$. We can see that $$$F' = FG'$$$ whichever partial difference $$$\frac{\partial}{\partial x_k}$$$ we take. Then we can rewrite this equation as $$$[x_n^1] F = ([x_n^1]G)\cdot [x_n^0]F$$$. Hence we reduced the problem into calculating $$$[x_n^0] F$$$, and then one subset convolution gives the rest part. The time complexity is $$$T(n)=\Theta(n^22^n) + T(n-1)$$$, which is exactly $$$\Theta(n^22^n)$$$. I call this method "Point-wise Newton Iteration".

This method sounds more reasonable than calculating the $$$\exp$$$ on each rank vector. Because $$$\frac{G^k}{k!}$$$ counts all "partitions" with $$$k$$$ nonempty sets in $$$G$$$. So it exists whenever $$$1\sim n$$$ is invertible in $$$R$$$.

Now we try to apply this kind of iteration into arbitrary polynomial $$$f$$$. Let $$$F=f(G)$$$, then $$$F'=f'(G)G'$$$. Then we reduced the problem into calculating $$$[x_n^0]f(G)$$$ and $$$[x_n^0]f'(G)$$$. This becomes two problems! But don't worry. In the second round of recursion, it becomes $$$[x_{n-1}^0 x_n^0]$$$ of $$$f(G), f'(G), f"(G)$$$. The number of problem grows linearly. You will find out that the complexity is $$$\sum_{0\le k\le n} (n-k) \cdot \Theta(k^2 2^k)$$$.

It's not hard to see that the complexity is still $$$\Theta(n^2 2^n)$$$, because $$$\sum_{k\ge 0} k\cdot 2^{-k}$$$ converges.

Noted that $$$\frac{(A+B)^2}2-\frac{A^2}2-\frac{B^2}2=AB$$$, we proved the equivalence between the composition and subset convolution.

Here are some remarks:

If $$$G$$$ has no constant term, $$$f$$$ is better to be comprehended as an EGF.
This algorithm holds the same complexity on calculating $$$f(A, B)$$$ and so on.
We can optimize the constant of this algorithm by always using the rank vectors during the iteration. Then it can beat lots of methods that work for some specialized $$$f$$$.
If you have some knowledge of the "Transposition Principle", you will find out that, for any fixed $$$F, G: 2^n \rightarrow R$$$, the transposed algorithm gives $$$\sum_S F_S \cdot (G^k)_S$$$ for all $$$k$$$. It immediately gives an $$$\Theta(n^22^n)$$$ algorithm to calculate the whole chromatic polynomial of a graph with $$$n$$$ vertices.

Full text and comments »

+312

Elegia
4 years ago
21

IOI2021 China Team

By Elegia, 4 years ago, In English

The China team selection ended today, the top 4 are:

Yu Haoxiang (yhx-12243),
Deng Mingyang (heuristica),
Qian Yi (skip2004) and
Dai Chenxin (gtrhetr).

They will participate in IOI2021. Good luck!

Full text and comments »

+692

Elegia
4 years ago
59

Another solution of CF#641 F2

By Elegia, history, 5 years ago, In English

We will get the bivariate generating function $$$\widehat F(z, t) = \sum_{n\ge 1}\sum_{k=1}^n A_{n,k}\frac{z^nt^k}{n!}$$$, where $$$A_{n,k}$$$ is the sum of occurrences of $$$k$$$ in all good sequence of length $$$n$$$. Consider the inclusion-exclution principle. For all contribution with $$$\max k = m$$$, we have

$$$ \sum_{j=1}^m \sum_{S \subseteq \{1, 2, \cdots m-1\}} (-1)^{|S|} Q(S, z, t, j) $$$

This means that for all $$$i \in S$$$, we force all recurrences of $$$i$$$ are before $$$i + 1$$$, and the occurrences of $$$j$$$ are counted.

After some tough calculation, we will found out that this equates to $$$\frac{t(\mathrm e^{z(1-t)}-1)}{(1-z) (1-t \mathrm e^{z(1-t)})}$$$. Here are the details:

Details

This part will be kind of hard to understand because I used some of my own symbols in solving combinatorial problems. I tried to remove some of them unfortunately it's still so abstract.

We consider changing the order of calculation. Because those $$$(-1)$$$ connected several consecutive "$$$k$$$"s, we enumerate the consecutive "segment" of $$$k$$$. Then we should count the occurrences of $$$k$$$ and multiply it with $$$t^k$$$. So this could be represented with a sum of this kind of structure:

$$$ \overbrace{\left.\bigcirc {\color{blue}(\cdot t)} \frac{\tiny \color{red} -1}{}\bigcirc {\color{blue}(\cdot t)} \frac{\tiny \color{red} -1}{} \cdots \frac{\tiny \color{red} -1}{} \bigcirc {\color{blue}(\cdot t)} \middle\vert \cdots \middle\vert \bigcirc {\color{blue}(\cdot t)} \frac{\tiny \color{red} -1}{}\cdots \frac{\tiny \color{red} -1}{} {\color{green}\bigcirc} {\color{blue}(\cdot t)}{\color{green}(\cdot \mathrm{Len})} \right.}^{\mathrm{with} {\color{blue}(\cdot t)}} \left. \frac{\tiny \color{red} -1}{} \cdots \frac{\tiny \color{red} -1}{} \bigcirc \middle\vert \cdots \middle\vert \bigcirc \frac{\tiny \color{red} -1}{}\bigcirc \frac{\tiny \color{red} -1}{} \cdots \frac{\tiny \color{red} -1}{} \bigcirc \right. $$$

Here $$$\bigcirc$$$ means all elements with number $$$k$$$, and $$$-$$$ means the restriction between two numbers, $$$|$$$ means no restriction.

For a consecutive segment of "$$$k$$$"s with restriction, it's easy to be represented by OGF. The last combination between them could be expressed by EGF.

The OGF have 3 parts. The first part contains $$$t$$$s. The second part contains the ending of $$$t$$$ and counts the occurrence. The third part doesn't contain $$$t$$$.

For the first part:

$$$ \begin{aligned} L(z,t) &=\frac{-1}{1+\frac z{1-z}t} + 1 \\ &= \frac{t z}{1-(1-t) z}\\ &= \sum_{k\ge 0} z^{k+1} t(1-z)^k\\ \widehat L(z,t) &= \frac t{1-t}\sum_{k\ge 0} \frac{z^{k+1}}{(k+1)!} (1-t)^{k+1}\\ &= \frac t{1-t} (\mathrm e^ {z(1-t)} - 1) \end{aligned} $$$

For the second part:

$$$ \begin{aligned} U(z,t) &= \frac1{1+\frac z{1-z}t} \frac {zt}{(1-z)^2} \frac1{1+\frac z{1-z}}\\ &= \frac{tz}{1-(1-t)z} = L(z,t)\\ \widehat U(z,t)&= \widehat L(z,t) \end{aligned} $$$

For the third part it's easy to know that $$$L(z, 1) = z = \widehat L(z, 1)$$$. So the EGF of our destination is

$$$ \frac1{1-\widehat L(z,t)} \widehat U(z, t) \frac 1{1 - \widehat L(z, 1)} $$$

After the simplication we have $$$\frac{t(\mathrm e^{z(1-t)}-1)}{(1-z) (1-t \mathrm e^{z(1-t)})}$$$.

Now our goal is to calculate

$$$ [z^n]\frac{t(\mathrm e^{z(1-t)}-1)}{(1-z) (1-t \mathrm e^{z(1-t)})} $$$

We consider $$$\left([z^n]\frac{t(\mathrm e^{z(1-t)}-1)}{(1-z) (1-t \mathrm e^{z(1-t)})}\right) + 1 = (1-t)[z^n] \frac 1{(1-z)(1-t\mathrm e^{z(1-t)})}$$$, which looks simpler. We let $$$z = \frac u{1-t}$$$, then

$$$ [z^n]\frac 1{(1-z)(1-t\mathrm e^{z(1-t)})} = (1-t)^n[u^n] \frac1{(1-\frac u{1-t})(1-t\mathrm{e}^u)} $$$

Hence we have

$$$ \begin{aligned} (1-t)[z^n] \frac 1{(1-z)(1-t\mathrm e^{z(1-t)})} &= [u^n]\frac{(1-t)^{n+2}}{(1-\frac{t}{1-u})(1-t\mathrm e^u)(1-u)}\\&= (1-t)^{n+2} [u^n] \left(\frac{-\mathrm e^u}{\left(\mathrm e^u u-\mathrm e^u+1\right) \left(1-t \mathrm e^u\right)}+\frac{\frac{1}{1-u}}{\left(\mathrm e^u u-\mathrm e^u+1\right) (1-\frac{t}{1-u})}\right)\end{aligned} $$$

Noticed that $$$[u^m]\mathrm{e}^{ku} = \frac{k^m}{m!}$$$, this could be calculated straightly through multipoint evaluation with time complexity of $$$\Theta(n\log^2 n)$$$. And $$$[u^m] \frac1{(1-u)^k} = \binom{n+k-1}{n} = \frac{(n+k-1)!}{n!(k-1)!}$$$ so this part could be calculated through a convolution. It will pass if your implementation doesn't have a big constant.

It could also be improved to $$$\Theta(n\log n)$$$ through the Lagrange Inversion formula similar to the original solution, I leave this as an exercise.

UPD1: simplified some deduction.

Full text and comments »

+172

Elegia
5 years ago
15