Chilli's blog

By Chilli, history, 6 years ago, In English

In various places(12), people have talked about "Dinic's With Scaling" having $$$O(VElog(U))$$$ complexity, where U is the max edge capacity. However, in just as many places, people have cast doubt about this complexity or worried that this algorithm ends up being slower in practice(12).

I had the same doubts. In fact, a couple months ago I benchmarked Dinic's with scaling on some hard test cases and thought that Dinic's with/without scaling performed similarly.

As it turns out, I benchmarked it again yesterday and found that Dinic's with scaling definitely has significantly better asymptotic complexity (by a factor of V).

Synthetic Benchmarking

I used the hard test case generator at 2 posted by ko_osaga here. The one posted at 1 creates a graph with only V edges and moreover, doesn't work for me... Notably, given a V, it generates a graph with V nodes and $$$O(V^2)$$$ edges.

The code I used was my own Dinic's implementation (originally from emaxx) found here. Notably, the SCALING variable toggles between Dinic's with scaling and without. It's a small change, as you can see.

I ran the two code across a variety of input sizes, averaging across 10 runs, then taking the log of input size and times, plotting them in a chart where blue is the runtime of Dinic's without scaling and orange is the runtime of Dinic's with scaling. Since the important thing to look at is the slope of the line, I normalized the lines to start at the same point.

Fitting a linear regression to the points, we see that the slope of Dinic's without scaling is 3.204, while the slope of Dinic's with scaling is 1.93, which correspond to complexities of $$$O(V^3)$$$ and $$$O(V^2)$$$ respectively.

In terms of actual runtime, here are some numbers on specific sizes.

Benchmarking on Problems

There are 3 flow problems, I benchmarked on — SPOJ FASTFLOW, (VNSPOJ FASTFLOW)[https://vn.spoj.com/problems/FFLOW/), and a Chinese LOJ Flow Problem (it's intended to solve this problem with Highest Label Preflow Push, which has $$$O(V^2\sqrt{E})$$$ complexity).

Also, to vouch for my implementation, here's some benchmarks with other implementations :^) Mine (to enable scaling change the SCALING boolean), Emaxxs dinic,LOJ fastest (HLPP, the fastest submission on the Chinese flow problem). As I was doing these benchmarks I realized the LOJ submission is actually faster than mine on all of my benchmarks :'(. It is substantially longer than my implementation though :^)

SPOJ (V<=5000, E<=3e4)
Mine: 0.04
Emaxx Dinic: 0.11
Mine w/ scaling: 0.27
LOJ fastest: 0.00

VNSPOJ (V<=5000, E<=3e4, harder test cases):
Mine: TLE
Emaxx dinic: TLE
Mine w/ scaling: 0.00
LOJ fastest: 0.00

LOJ (V<=1200, E<=120000):
Mine: 56 points
Mine w/ scaling: 88 points
Emaxx dinic: 44 points
LOJ fastest: AC, 1176 ms total, #1 on leaderboard

Synthetic Test(N=2000, E=4e6):
Mine: 66.56
Emaxx dinic: 247.40
Mine w/ scaling: 0.14
LOJ fastest: 0.10

Number of non-whitespace characters:
Mine: 1164
Emaxx dinic: 1326
LOJ fastest: 2228

Conclusions:

First off, HLPP seems to be superior to Dinic's in terms of runtime, so perhaps the real lesson is that people should be using HLPP. Also, I hope to provide a solid well-tested implementation of Dinic's with scaling that people can use. Other than that, however, I hope it's clear that Dinic's with scaling really does have a significant improvement in runtime (approximately a factor of V), and that translates to significantly better performance on real problems.

PS: If anyone has a flow implementation they think is better than mine, I'd like to see it :^)

  • Vote: I like it
  • +149
  • Vote: I do not like it

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Chilli (previous revision, new revision, compare).

»
6 years ago, # |
Rev. 4   Vote: I like it +29 Vote: I do not like it

I've just been made aware of dacin21'simplementation, and although I haven't had the time to properly compare it yet — it does seem to be fairly fast — 0.05 on SPOJ, 0.03 on VNSPOJ, and 2000 ms on LOJ. It also seems to be reasonably short — seems very impressive.

  • »
    »
    6 years ago, # ^ |
      Vote: I like it +21 Vote: I do not like it

    It's not really my own implementation, I think I got it from the kactl codebook. If I recall correctly, the only thing I changed in this version was fixing a runtime error when the source node is isolated. (I also have a modified version of it for fast bipartite matching.)

    I'll try some of my other codes on the LOJ problem and see how they fare. (I couldn't really benchmark on the SPOJ problem anymore, as the runtime were really low. Moving to static memory allocation reduces the runtime from $$$0.05$$$ to $$$0.02$$$ for example. I didn't want to do that in my codebook for obvious reasons.)

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +13 Vote: I do not like it

      Kactl notebook truly is a gold mine of great implementations...

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +11 Vote: I do not like it

      Does push-relabel provide a noticeable speed improvement in bipartite matching over Hopcroft-Karp?

      • »
        »
        »
        »
        6 years ago, # ^ |
          Vote: I like it +13 Vote: I do not like it

        From my experience (from a few problems I solved) it does, although I have never benchmarked it (I never wrote Hopcroft-Karp actually). My $$$0.00$$$ on https://www.spoj.com/problems/MATCHING/ uses Push-Relabel.

        One thing to note for matching problem is that you should pick the vertex with the lowest label to discharge (for flow you pick the one with the highest label) and do global relabeling, both to get the theoretical $$$O(\sqrt{V} E)$$$ bound and for it to run fast in practice. There were also quite a few constant-factor optimizations, such as keeping all the excess on one side of the graph.

»
6 years ago, # |
Rev. 2   Vote: I like it +4 Vote: I do not like it

nice

»
6 years ago, # |
  Vote: I like it +24 Vote: I do not like it

The LOJ submissions was really interesting to have a look at. Here's my quick analysis of it:

  • It uses the highest label push-relabel with buckets.
  • Edges are stored in a 2D adjacency list. (The alternative is to compress the graph into a 1D list.)
  • It uses gap relabeling. Gap relabeling is optimized by maintaining all nodes (active or inactive) in an additional bucket datastructure. This is crucial on SPOJ, where repeated gab-relabellings are the bottleneck. (Normal gap relabeling runs in $$$O(n)$$$ every time it occurs, with this is O(n) in total. This does not matter on LOJ that much, as the graphs there are a lot denser.) This optimization was new to me.
  • It only computes the min-cut and not the full max flow, as vertices with heights $$$\geq n$$$ are ignored. This only works on tasks where you only need the total flow and not the flow on specific edges.
  • It does not maintain the current edge if each node. This leads to more arc-scans on LOJ. (The advantage of this is that you don't need a second pass to relabel, but the increase in arc-scans is also significant.)
  • It does a single global relabeling step at the beginning. This helps in many cases. If we connect every node to both the source and the sink with edges of capacity $$$1$$$, then this won't help at all.

With the improved gap relabeling, I now also got my $$$0.00$$$ on SPOJ, but getting that is rather inconsistent (I submitted a few times and the runtime varied between $$$0.00$$$, $$$0.01$$$ and $$$0.02$$$.)

  • »
    »
    6 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks for the analysis of that implementation! I've noticed that for my implementation of Dinic's, storing the edges in a 2d adjacency list performs better than in a flat structure.

    The single global relabeling is interesting too — I've always thought it was something you do repeatedly. Is there any way to test its performance without that single global relabeling at the beginning? Removing it for me causes the code to fail.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      Yeah, running global updates every $$$\sim 4 n$$$ relabels is what the papers suggest. With some heuristics in the global relabeling, I got the following runtimes on LOJ (SPOJ is still $$$0.00$$$) (with static memory)

      $$$\require{siunitx}$$$

      $$$ \begin{array}[lr]\\ \text{global updates} & \text{runtime}\\ \hline \text{yes} & 947 \,\mathrm{ms}\\ \text{initial only} & 1334\,\mathrm{ms}\\ \text{no} & 1465\,\mathrm{ms} \end{array} $$$

      so it does indeed help. I also tried partial augmentation, but that was a lot slower. (Maybe I understood it wrong, the notes about it "The Partial Augment-Relabel Algorithm for the Maximum Flow Problem" are a bit vague about the multi-push operation.)

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        Well, I get slightly less drastic results, but it does seem to be superior to do global updates every $$$4n$$$ relabels. One note, I'm counting each gap relabel as multiple relabels, while you seem to count them as just one — counting them as multiple relabels seems to help for me.

        With global updates, I get a best of $$$913$$$ ms, and with only an initial update, I get a best of $$$1050$$$ ms.

»
6 years ago, # |
Rev. 2   Vote: I like it +16 Vote: I do not like it

If anyone wants to use the really fast implementation from LOJ in their library, I simplified it and cleaned up some stuff here: https://gist.github.com/Chillee/ad2110fc17af453fb6fc3357a78cfd28#file-hlpp-cpp

It's currently the fastest/tied for 0.00 seconds on all 3 of SPOJ FASTFLOW, VNSPOJ FASTFLOW, and LOJ flow.

It comes in at 87 lines and 1687 non whitespace characters, so decently long, but about 30% shorter than the original implementation and a lot easier to understand for me.

  • »
    »
    6 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    There is an error in the code. In line 41, you try to get the back element from an empty queue.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Well... Perhaps undefined behavior is the correct word. At least in my compiler (gcc c++14), it returns a reference to the last element that was in the queue.

      I do agree that it's pretty jank(and should be changed), but it's not an "error".

      • »
        »
        »
        »
        6 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Sorry, I meant undefined behavior too. It surprisingly doesn't work on my compiler and gives a runtime error.

        • »
          »
          »
          »
          »
          6 years ago, # ^ |
            Vote: I like it +11 Vote: I do not like it

          Alright, I fixed it. Thanks for pointing out that it fails.

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    In line 50 of your dinic with scaling implementation:

    for (lim = SCALING ? (1 << 30) : 1; lim > 0; lim >>= 1)

    If capacity is long long, is 1 << 30 enough? Shouldn't it be higher, like 1 << 62?

    • »
      »
      »
      5 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      as capacity scaling says "the highest bit of max capacity".

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Hi, thanks for the nice implementation first! In SPOJ, I use my java version based on your hlpp code, and I got a rte(nzec). And when i use 2 * MAXN to create these arrays and vectors then I got an ac. So can you show your spoj submission? I want to check whether there are errors in my java implementation.

»
5 years ago, # |
  Vote: I like it +10 Vote: I do not like it

Hi, I think you have made a mistake in testing ko_osaga's synthetic test. I think that test is made for source = 1 and sink = n, maybe you tested for some other source or sink. Your dinic with scaling takes quite a lot of time in that case. Even for N = 1000, M = 1e6, it takes 3.5 second on ideone: https://ideone.com/vh6Uxm

»
3 years ago, # |
  Vote: I like it +18 Vote: I do not like it

Chilli Hi, I believe your github HLPP code fails on the following test case: https://ideone.com/HbPNMR

The correct answer is 40000024.

  • »
    »
    3 years ago, # ^ |
    Rev. 3   Vote: I like it +18 Vote: I do not like it

    Found a simpler test case

    8 8 1 7
    1 7 1
    1 2 2
    2 3 1
    3 4 1
    4 5 1
    5 6 1
    6 7 1
    8 3 1
    

    The correct answer is 2, but the github HLPP outputs 1.

    One really weird thing is that the directed edge 8->3 really shouldn't matter, but removing it somehow makes HLPP find the correct max flow.