Hi everyone!
In this blog, I would like to present a somewhat novel concept in competitive programming, which is, while being relatively simple, allows us to enumerate directed graphs with specific type of strongly connected components. This allows us to easily and uniformly compute generating functions e.g. for acyclic digraphs or strongly connected digraphs. The formalism introduced here is also very intuitive at that.
I initially wanted to make a problem based on the content of this blog to appear in today's round, but after some discussions we decided to exclude it, as the contest already had a sufficient amount of difficult problems.
Prerequisites
You are expected to be familiar with exponential generating functions, and the exponential formula for connected structures, that is, that if $$$A(x)$$$ is the EGF for combinatorial structures $$$A$$$, then $$$\exp A(x)$$$ is the EGF corresponding to sets of isolated instances of $$$A$$$.
Directed acyclic graphs
To warm-up, consider the following problem:
You're given an integer $$$n$$$. You need to count the number of distinct directed acyclic graphs on $$$n$$$ vertices.
In other words, the problem asks you to find the $$$n$$$-th entry of the sequence A003024. One direct and somewhat well-known way to solve this problem is to use the inclusion-exclusion principle based on the number of the source vertices of the graph:
In the formula above, we choose $$$i$$$ out of $$$n$$$ vertices to be the source vertices in $$$\binom{n}{i}$$$ ways, and then we choose how we draw the arcs from the source vertices to all the other in $$$2^{i(n-i)}$$$ ways, and we build a DAG on the remaining $$$n-i$$$ vertices. To avoid double-counting, we do inclusion-exclusion on the number of source vertices $$$i$$$.
Is this fft?
The formula above, used as it is, allows to compute $$$a_n$$$ in $$$O(n^2)$$$. But what if we want to do it faster? Let's rewrite it as
It almost looks like a convolution, except for the nasty $$$2^{i(n-i)}$$$ part. Luckily, we have a trick up our sleeve to deal with it! Recall the following formula from the chirp Z-transform that allows to rewrite $$$ab$$$ in terms of $$$a+b$$$, $$$a$$$ and $$$b$$$:
Applying this formula, we get
and moving the sum to the right, we may write it in a bit more uniform way as
In terms of generating functions, it rewrites in a much simpler form as
where
is the exponential generating function of the acyclic digraphs, and $$$L(\cdot)$$$ is a function defined on formal power series, which divides the $$$k$$$-th coefficient of the formal power series by $$$2^{\binom{k}{2}}$$$. In other words, we get the formula
and first $$$n$$$ coefficients of the right-hand side can be computed in $$$O(n \log n)$$$ with power series inversion.
Arrow product
Let's stop for a moment and think a bit more generically about what just happened. We found out that the product
corresponds to a special kind of convolution defined as
The formula above is very similar to general convolution as sum of $$$a_i b_j$$$, which is known to enumerate the pairs of an element described by the generating function $$$A$$$ and an element described by the generating function $$$B$$$. And the additional $$$2^{ij}$$$ multiplier means that, in addition to picking an element of type $$$A$$$ on $$$i$$$ "atoms" and an element of type $$$B$$$ on $$$j$$$ atoms, we connect each pair of atoms of these elements in one of $$$2$$$ ways. This can be interpreted e.g as follows:
Element of the arrow product of digraphs (first interpretation)
Illustration from the paper Symbolic method and directed graph enumeration
- We either draw, or do not draw a (directed) edge from the vertices of an object of type $$$A$$$ to the vertices of an object of type $$$B$$$;
- We either draw, or do not draw an (undirected) edge between the vertices of an object of type $$$A$$$ and an object of type $$$B$$$;
- We always draw a (directed) edge between each pair of vertices, and choose a direction in one of $$$2$$$ ways;
- We always draw an (undirected) edge between each pair of vertices, and color it in one of $$$2$$$ colors.
A combinatorial structure built on top of structures of types $$$A$$$ and $$$B$$$ in accordance to one of the above interpretations is called the arrow product of the structures $$$A$$$ and $$$B$$$. This interpretation allows to think of the task above from a different perspective.
As a notational convenience, we will use the symbol $$$\curvearrowright$$$ for such product, so that
where $$$a_i = [x^i] A(x)$$$ and $$$b_j = [x^j] B(x)$$$.
Enumerating DAGs
Let's mark the sources of a digraph with special variable $$$u$$$. In other words, let $$$D(x, u)$$$ be a generating function such that the coefficient near $$$x^n u^k$$$ corresponds to the number $$$a_{n,k}$$$ of DAGs with $$$n$$$ vertices and $$$k$$$ source vertices, divided by $$$n!$$$ (so it is EGF in terms of the number of vertices, as they're considered pairwise distinct, and OGF in terms of the number of sources, as they're considered indistinguishable). With definition above in mind, we can write the following equation for $$$D(x, u)$$$:
In the formula above, $$$u+1$$$ after expansion can be interpreted in such way that now the coefficient near $$$x^n u^k$$$ corresponds to the number of DAGs that have $$$n$$$ vertices overall and $$$k$$$ of their sources are specifically marked. In other way, $$$(u+1)^t$$$ expands in a way that expanding it into $$$u$$$ corresponds to marking the vertex, while expanding it into $$$1$$$ corresponds to leaving it as is.
Then, we can represent the class of objects enumerated by $$$D(x, u+1)$$$ as the arrow product of a set of marked source vertices, corresponding to the generating function $$$e^{ux}$$$, and of a directed graph without any marked vertices, corresponding to $$$D(x)$$$. Please refer to the diagram below for better understanding, the vertices on the left part will become marked source vertices and enumerated by $$$u$$$, while the source vertices on the right part will remain unmarked.
$$$D(x,u+1)$$$ represented as an arrow product
Illustration from the paper Symbolic method and directed graph enumeration
Now, using the definition of $$$D(x, u)$$$ we should note that $$$D(x, 1) = D(x)$$$ and $$$D(x, 0) = 1$$$. Thus, substituting $$$u=-1$$$, we get
which is equivalent to the already known formula
Enumerating digraphs with given type of SCCs
Now, thinking about this problem a bit more generically, we may thing of DAGs as directed graphs such that each of their strongly connected components is a single vertex. So, the class of all possible strongly connected components has an EGF $$$A(x) = x$$$. But the same reasoning applies to any other kind of digraphs with specified type of strongly connected components.
For examples we can use $$$A(x) = \log \frac{1}{1-x}$$$ for graphs whose SCCs are simple cycles, or we can put $$$A(x)$$$ to be an EGF of strongly connected graphs, to obtain EGF for all possible digraphs. Let $$$G(x, u)$$$ be generating function for digraphs having only strongly connected components enumerated by the EGF $$$A(x)$$$, such that the coefficient near $$$x^n u^k$$$ corresponds to the number of digraphs of type $$$G$$$ such that they have $$$n$$$ vertices and $$$k$$$ "source" strongly connected components (i.e. components without incoming edges).
$$$G(x,u+1)$$$ represented as the arrow product of $$$e^{A(x) u}$$$ and $$$G(x)$$$
Illustration from the paper Symbolic method and directed graph enumeration
Then, with the same logic, we can write that
from which one may again use the substitution $$$u=-1$$$, and get
Which leads us to the general formula
Finding EGF for strongly connected digraphs
The formula above can also be used to find $$$A(x)$$$ when $$$G(x)$$$ is known. If we use $$$A(x) = S(x)$$$, where $$$S(x)$$$ is the EGF of all strongly connected digraphs, then $$$G(x)$$$ would be the EGF of all possible digraphs. On the other hand, it is known that the number of digraphs on $$$n$$$ vertices is known to be $$$2^{n(n-1)}$$$ (for each of $$$n(n-1)$$$ ordered pairs we either draw a directed edge or not). In terms of $$$L(\cdot)$$$ it means
from which one gets that
which in turn rewrites as
which can also be computed in $$$O(n \log n)$$$ with formal power series inversion and logarithms.
Enumerating with edges
Now, what if we want a generating function $$$A(x, w)$$$ such that a coefficient near $$$x^n w^m$$$ corresponds to the number of digraphs of type $$$A$$$ that have $$$n$$$ vertices and $$$m$$$ edges? Again, we would divide this coefficient by $$$n!$$$, but not $$$m!$$$, as we generally consider vertices distinct (enumerated/labeled), while edges indistinguishable from each other. Proper way to define an arrow product in this case would be
where $$$a_i(w) = [x^i]A(x, w)$$$ and $$$b_j(w) = [x^j] B(x, w)$$$. Here, $$$1+w$$$ semantically means that we decide to either draw (or refrain from it) an edge, which is either undirected, or is always directed from left to right (depending on the context). Then, expanding into $$$w$$$ means drawing an edge, while expanding into $$$1$$$ would mean not doing it.
We can also define $$$L_w(A(x, w))$$$ to be a function that divides $$$[x^k] A(x, w)$$$ by $$$(1+w)^{\binom{k}{2}}$$$, in which case we will get the familiar identity
In this notion, the formula for SCCs still holds:
assuming that $$$G(x, w)$$$ enumerates graphs whose strongly connected components only belong to $$$A(x, w)$$$.
Enumerating directed cacti for free
Consider the problem 104234L - Directed Vertex Cacti. It essentially asks us to find $$$[x^n w^m] G(x, w)$$$ of the generating function of graphs $$$G$$$ such that each of their strongly connected components is a simple cycle, and $$$w$$$ only enumerates edges outside the SCCs. What would be a generating function $$$A(x)$$$ for such SCCs? Nothing else than
as it is a well-known generating function for simple cycles. Then, using the formula above, we get
hence
where $$$L_w(\cdot)$$$ divides $$$[x^k]G(x, w)$$$ by $$$(1+w)^{\binom{k}{2}}$$$. Therefore, we get
meaning that $$$[x^n y^m] G(x, w) = \binom{\binom{n}{2}}{m}$$$.