
Anthony_Smith's blog

By Anthony_Smith, history, 2 years ago, In English

UPDATE: I actually managed to hack the Rust code with this test case. It is the same as CSES test case #27, but with the order of the first two edges switched. The code fails if it sees the 1 2 -100 edge after the 1 3 1 edge: it traverses edges in reverse order of how they appear in the input, so it traverses the 1 2 -100 edge first, and apparently that is enough to make it output 3 instead of the correct answer, -1. According to the CSES guidelines, all submissions will be regraded. I wonder how this will turn out!

Problem Summary: Given a directed and weighted graph, find the longest path from node 1 to node N, given that edges can be traversed multiple times. If there is an arbitrarily long path, then print -1. The problem link is here.

My method to solve this problem came straight from the CSES Book: use the Bellman-Ford algorithm to determine which nodes (if any) are on a positive cycle. If any positive cycle is reachable from node 1 and can itself reach node N, then return -1. Otherwise, just return the longest distance from node 1 to node N. Another method is to topologically sort the graph and find the longest distance from node 1 to node N that way.
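Roughly, that Bellman-Ford approach looks like the sketch below (my own illustrative code, not the book's): relax with max for n-1 rounds, then keep relaxing for another n rounds and flag every node that still improves, since such improvements can only come from a positive cycle reachable from node 1; the flag is propagated forward, so the answer is -1 exactly when node N ends up flagged.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, m;
    cin >> n >> m;
    vector<array<long long, 3>> edges(m);
    for (auto &[a, b, w] : edges) cin >> a >> b >> w;

    const long long UNSEEN = LLONG_MIN;      // not yet reachable from node 1
    vector<long long> dist(n + 1, UNSEEN);
    vector<bool> boosted(n + 1, false);      // distance can grow without bound
    dist[1] = 0;

    // Rounds 0..n-2 settle every simple path; the later rounds detect nodes
    // that still improve (only possible via a positive cycle) and propagate
    // that flag to everything reachable from such a cycle.
    for (int round = 0; round < 2 * n; round++) {
        for (auto &[a, b, w] : edges) {
            if (dist[a] == UNSEEN) continue;
            if (boosted[a] || dist[a] + w > dist[b]) {
                dist[b] = max(dist[b], dist[a] + w);
                if (round >= n - 1) boosted[b] = true;
            }
        }
    }

    cout << (boosted[n] ? -1LL : dist[n]) << '\n';
}
```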

However, the top-performing code for this problem comes from Rust, and seems to use a completely different method. Here is the code. My main language is C++, so I can only partially interpret the code. The program seems to keep extra variables first_edge and first_reverse_edge to access edges, instead of using an adjacency list. The function mark_useful_edges() seems to mark all edges that can reach node N as useful. But I have no clue how the search() function works.

Can someone interpret this code, or reveal the algorithm or logic the code-writer used?

Note: I understand that caring only about raw performance is a narrow-minded way of learning algorithms. I am interested in this solution because it seems to be a simple approach I have never seen before, and it still solves the problem quickly. If this code involves a new technique, I want to learn it.


»
2 years ago, # |

Before I say anything, I want to thank you for one of the best-written questions on Codeforces.

The method used in that solution is called (at least by me) an "adjacency edge list", and I learned about it from a blog that called it the same thing. It isn't anything more special than a normal adjacency list: you can think of it as storing the edges out of each vertex in a linked list. I'll find the blog and link it here. Found it!

I've gotten the fastest solutions on some problems with this method, and I've seen it used on other judges as well (I believe https://judge.yosupo.jp). Found a submission of mine! I encourage people reading this to try to beat my time on that problem; it seems quite unoptimized.

I'll show the general idea here with a DFS floodfill implementation.

Code
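In outline it looks like the sketch below (the array names head, nxt, and to are just what I happen to use; any names work): add_edge links each new edge to the previous edge out of the same node, and the DFS walks that chain until it hits -1.

```cpp
#include <cstdio>
#include <cstring>

const int N = 200005, M = 200005;   // adjust to the problem's limits

int head[N];               // head[v] = last edge added out of v, or -1 if none
int nxt[M], to[M];         // nxt[e] = the edge added out of the same node before e
int edge_cnt;
bool vis[N];

void add_edge(int a, int b) {
    to[edge_cnt] = b;
    nxt[edge_cnt] = head[a];    // link to the previous edge out of a
    head[a] = edge_cnt++;       // edges end up traversed in reverse input order
}

void dfs(int v) {
    vis[v] = true;
    for (int j = head[v]; ~j; j = nxt[j])   // ~j == 0 exactly when j == -1
        if (!vis[to[j]])
            dfs(to[j]);
}

int main() {
    memset(head, -1, sizeof head);  // every adjacency list starts out empty
    int n, m;
    scanf("%d %d", &n, &m);
    for (int i = 0; i < m; i++) {
        int a, b;
        scanf("%d %d", &a, &b);
        add_edge(a, b);
    }
    dfs(1);
    // vis[v] now tells whether v is reachable from node 1
    return 0;
}
```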

The key idea is that, for each edge, you store an index pointing to the previous edge you added for the same node. That index starts out as -1, which is why we memset it at the beginning, and why the stopping condition in the loop uses ~j (you can also write j != -1, but I like how it makes the code more mysterious :)

How do you use :) at the end of a parenthesized statement?

Note: It's also worth mentioning that this idea can be used to speed up any situation where you use vector<T>[N] and keep adding elements to the vectors, as long as you don't need random-access references to the elements. Just replace the to array with data storing whatever you want. For example, when I use this on weighted graphs, I often add an extra array cost.
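A weighted version only needs one extra parallel array next to to (again, the names are just illustrative):

```cpp
const int N = 200005, M = 200005;

int head[N], nxt[M], to[M], edge_cnt;   // same arrays as before; head starts filled with -1
long long cost[M];                      // extra per-edge payload, here the weight

void add_edge(int a, int b, long long w) {
    to[edge_cnt] = b;
    cost[edge_cnt] = w;          // any extra data just rides along in a parallel array
    nxt[edge_cnt] = head[a];
    head[a] = edge_cnt++;
}

// Visiting the weighted edges out of v:
// for (int j = head[v]; ~j; j = nxt[j]) { /* use to[j] and cost[j] */ }
```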

  • »
    »
    2 years ago, # ^ |

Whether it's faster or not is doubtful. For example, if you use it in min cost max flow problems, I think it'll end up being slower because of cache misses.