Hi everyone!
if you read about Dinic algorithm on CP-Algorithms, especially the section about how it works on unit networks, you'll notice that its supposed runtime for bipartite matching is $$$O(E \sqrt V)$$$, where $$$E$$$ is the total number of edges, and $$$V$$$ is the total number of vertices of a bipartite graph. Let's try it out on the following problem:
Library Checker — matching on Bipartite Graph. Given a bipartite graph with $$$V, E \leq 2 \cdot 10^5$$$, find the maximum matching.
The implementation of the Dinitz algorithm that I typically use looks as follows:
Now, if we make a submission with this implementation, we get... Time Limit Exceeded?! Okay, that's weird. Apparently, roughly the fastest practical algorithm for maximum bipartite matching is so-called Hopcroft-Karp algorithm. Maybe it has a better complexity? Nope, Wikipedia page says it's $$$O(E \sqrt V)$$$. Maybe it's somehow fundamentally different from Dinitz? No, not really — Wikipedia page explicitly states that it is essentially a special case of Dinitz algorithm. So... What gives such bad performance?
Of course, there can simply be a bug in my Dinitz algorithm implementation, but after some checks, it seems that it's not the case. On particular failing test-case, we can see that the algorithm really just executes around $$$200$$$ iterations of traversing the whole flow network of roughly $$$8 \cdot 10^5$$$ edges. So, it seems that the core reason is just the enormous constant-time overhead.
So... If Hopcroft-Karp is just the special case of Dinitz algorithm, applied to bipartite graphs, how do we make it 50x faster?
Blocking Kuhn algorithm
It may seem a bit weird, but let's first talk about something different. One standard and pretty well-known algorithm for bipartite matching is Kuhn's algorithm. It's known to run $$$O(VE)$$$, which wouldn't work as is for this problem, but there are some optimizations to it that will actually allow to get AC on this problem. Yes, you heard it right, with proper heuristics, you can get AC on this problem with presumable $$$O(VE)$$$ runtime, while proven $$$O(E \sqrt V)$$$ runtime algorithm fails. So... What is standard Kuhn's algorithm? It looks like this:
For each vertex in the right part, we keep an array match
, so that match[u]
is $$$-1$$$, if it's not matched with anything, and match[u] = v
if it is matched with the vertex $$$v$$$ from the left part. Then, for each vertex $$$v$$$ from the left part, we try to match it with some vertex $$$u$$$ from the right part. If $$$u$$$ is not matched — that's great, we can just match it with $$$v$$$. Otherwise, we try to recursively match the vertex match[u]
with something different. if we succeed, the vertex $$$u$$$ is free now, and we can match $$$v$$$ with it. Otherwise, we go to the next $$$u$$$ in the adjacency list of $$$v$$$.
Of course, using this implementation, we get Time Limit Exceeded. But! We can change just a few lines to get Accepted instead. Here:
for(int i = 0; i < n; i++) {
visited.assign(n, 0);
kuhn(i);
}
Written as it is now, the algorithm is essentially equivalent to Ford-Fulkerson algorithm, as we find augmenting paths one by one and push the flow alongside the path. After that, we fully reset visited
tags and run everything from scratch. But the core idea of Dinitz algorithm is to push as much flow as possible before resetting visited
tags and trying pushing it again. A flow found this way is called "blocking flow", as it blocks as from finding further augmenting paths without updating the residual network. So... let's do just that. Try to match as much vertices as possible before resetting visited
:
vector<bool> paired(n);
for(bool repeat = true; repeat;) {
repeat = false;
visited.assign(n, 0);
for(int i = 0; i < n; i++) {
if(!paired[i]) {
paired[i] = kuhn(i);
repeat |= paired[i];
}
}
}
And doing just that is enough to get Accepted on the problem using just 1583ms out of 5s! Not bad for a $$$O(VE)$$$ algorithm, huh? This heuristic was already described 8 years ago in this Russian blog by Burunduk1, but I haven't seen its highlight in English yet.
Anyway, adding a bit more optimizations and heuristics, you can bring it down to 986ms, while still using Kuhn's algorithm.
Dinitz on bipartite graphs
But what if we still want something faster? After all, we still didn't use the Dinitz's algorithm most important concept. Layered networks. The implementation above finds blocking flow in some network, which is defined by depth-first search. But to provably constrain the number of iterations of finding blocking flow, we need to make sure that it is a specific network. Namely, the layered network defined by shortest paths from source to each vertex. Without constructing the flow network explicitly, we still can do it like this:
vector<int> dist;
bool bfs() {
dist.assign(n, n);
int que[n];
int st = 0, fi = 0;
for(int v = 0; v < n; v++) {
if(!paired[v]) {
dist[v] = 0;
que[fi++] = v;
}
}
bool rep = false;
while(st < fi) {
int v = que[st++];
for(auto e: g[v]) {
int u = match[e];
rep |= u == -1;
if(u != -1 && dist[v] + 1 < dist[u]) {
dist[u] = dist[v] + 1;
que[fi++] = u;
}
}
}
return rep;
}
In this implementation, rather than initiating the bfs queue with source, we initiate it with all vertices that are directly reachable from it (that is, vertices of the left part that are not paired yet). After that, we find the shortest distance from such vertices to each vertex in the left part, and we can use this information to implicitly only traverse the layered network:
vector<size_t> ptr;
bool kuhn(int v) {
for(size_t &i = ptr[v]; i < size(g[v]); i++) {
int &u = match[g[v][i]];
if(u == -1 || (dist[u] == dist[v] + 1 && kuhn(u))) {
u = v;
paired[v] = true;
return true;
}
}
return false;
}
This time, similar to Dinitz algorithm, we maintain a pointer to the last checked edge in each vertex. Then, the main loop looks like
while(bfs()) {
ptr.assign(n, 0);
for(int v = 0; v < n; v++) {
if(!paired[v]) {
kuhn(v);
}
}
}
Implemented this way, it gets Accepted in just 160ms, which is a huge improvement over blocking Kuhn's algorithm which ran in 960ms at best, and certainly over Dinitz algorithm in explicitly constructed flow network, which couldn't even pass under 5s... Note: This is my interpretation of how Dinitz on bipartite graphs should look like if we store the network implicitly. I'm not entirely sure if it coincides with Hopcroft-Karp algorithm, because I don't know it.