Recently I've read about minimum number of needed edges to construct SCC from a DAG. And it was the MAX(x,y) where x is the number vertices with 0 in-degree and y is the number of vertices with 0 out-degree. But i am wondering how to find such combination of edges in O(N) time. Thanks in advance :)
edit : solved. thank you very much for all people who helped me :)
Let $$$s_1, s_2, ..., s_n$$$ will be $$$0$$$ in-degree vertices and $$$e_1, e_2, ..., e_m$$$ vertices with $$$0$$$ out-degree. Assume that $$$m \geq n$$$ (if it's wrong we can, for example, change direction of all edges).
Notice, that out-degree of every $$$e_i$$$ vertex should be at least one after adding edges. But we can add only $$$m$$$ edges. So, every edge should begins in one of $$$e$$$ vertices and out-degree of every vertex from $$$e$$$ will be exactly $$$1$$$. Let's build these edges.
Run dfs from $$$s_1$$$ and find all $$$0$$$ out-degree vertices, which achievable from $$$s_1$$$. Let's call these vertices $$$c_1, c_2, ..., c_k$$$. Now add edges $$$(c_1, c_2)$$$, $$$(c_2, c_3), ..., (c_{k-1}, c_{k})$$$ and $$$(c_k, s_2)$$$. Then repeat this process with other vertices from $$$s$$$. On every step we shouldn't touch vertices from $$$e$$$, which were visited on previous steps. And don't forget about edge $$$(c_{nk_n}, s_1)$$$ in the end.
Total complexity is $$$O(N+M)$$$.
Ok it looks very clear for me thank you very much except this part (cnkn,s1) which is the last edge, but how can we deal with it if we had multiple DAGs in the same graph?
It also works with multiple DAGs in the same graph. After adding edges there will be cycle $$$(s_1, c_{11}), (c_{11}, c_{12}), ... (c_{1k_{1}-1}, c_{1k_1}), (c_{1k_1}, s_2), (s_2, c_{21}), (c_{21}, c_{22}), ...,$$$ $$$(c_{2k_{2}-1}, c_{2k_2}), (c_{2k_2}, s_3), ..., (c_{(n-1)k_{n-1}}, s_n), ... , (c_{nk_{n}}, s_1)$$$. This cycle covers all $$$0$$$-in degree verteces, from which we can reach any vertex.
What's your response for Radewoosh's counter example?
His counter example works :)
why ? won't you connect 3 -> 4 , 4 -> 1 which is wrong ?
There are will be 2 CSS: (2) and (1, 3, 4)
each vertex should be in a different SCC in the example 1,3,4 can't reach each other out only 1 can reach 3,4 but 3,4 can't reach anything
Graph after adding 3->4 and 4->1 edges.
i mean it's still not a SCC but i need to be an SCC
the solution of this example should be 4 -> 1 , 3 -> 2 while your idea's logic say the answer is 3 -> 4 , 4 -> 1 which is wrong
I meant that Radewoosh's counter example works against my solution. So, this approach doesn`t work.
ohh ok i misunderstoond you sorry :)
but i appreciate you helping me :)
thank you very much
And what do you propose we do if all vertices reachable from $$$s_i$$$ were previously visited?
n=4, m=4, edges:
Also thank you for the edge case lol
Any other ideas of how to make it work :)?
I have a solution which I think should work. Let
Number of nodes with 0 in-degree = x
Number of nodes with 0 out-degree = y
I am considering y $$$\geq$$$ x. If y $$$<$$$ x then we can reverse the edges.
First, colour all the x nodes with colours from 0 to x-1. Now do a dfs from each of these x nodes. While doing dfs from ith node, colour all nodes you visit with the colour of the root node (the node with 0 in-degree from which we are running dfs currently). In dfs, whenever you encounter a node which has already been coloured, simply return as it is guaranteed that all nodes in it's subtree are already coloured. Now our graph will have these properties-
Here,
root node of ith colour is node which has colour i and in-degree 0 and,
leaf nodes of ith colour are nodes which have colour i and out-degree 0.
Now let us consider all colours c such that there exist atleast 1 node with colour c and out-degree = 0. Let there are p such colours f1, f2....fp. Also, select one leaf node of each of these p colours. Now add edges like this-
leaf(f1) -> root(f2), leaf(f2) -> root(f3), .... leaf(fp) -> root(f1). Total p edges would be added like this.
Now, there are x-p colours left which do not have any leaf. Let these be c1, c2 ... cx-p. We have y-p leaf nodes left. Now out of these y-p leaves, choose any x-p leaves and let those be l1, l2....lx-p. Add edges like this-
l1 -> root(c1), l2 -> root(c2) ... lx-p -> root(cx-p). Total x-p edges will be added like this.
Total edges added till now = p+x-p = x.
We still have y-x leaves left. Add edges from these leaves to root of any colour. So total y edges would be added. It can easily be shown that we can reach from any node of colour fi to any node of colour fj, from any node of colour fi to any node of colour cj and from any node of colour ci to any node of colour cj.
I do feel that it might fail on a case like this-
We have to go from some node of colour ci to some node of colour fj. It is possible in my algorithm that from each node of colour ci, I am only able to reach a leaf node of colour fj which is connected to a root node of colour ck and thus I won't be able to reach every node of colour fk. In this case, I will choose a colour k such that it has a leaf node connected to root(fi) and at least 1 leaf node connected to root(cj). I will swap these two leaf nodes and the problem would be solved.
It is a very long solution, I am sorry for any typos in between or any mistake. I think this should solve your problem in linear time. Please correct me if I am wrong somewhere.
Hello, this is known as the strong connectivity augmentation problem. You can find a solution described in this paper.
Actually, I've read this paper more than once but the idea isn't very clear to me. Also I tried to just implement it, And it didn't work, probably there is something I missed, But I am not able to figure it out.
There is a pretty nice and neat algorithm for this.
The basic idea of the algorithm is to greedily try to match sources with sinks using dfs. Then when no more matches can be found, the sources/sinks that never found a match must be able to reach/be reachable by some node in matched_sinks/matched_sources. You can then strongly connect everything using $$$\text{max}(\text{size}(\text{sources}), \text{size}(\text{sinks}))$$$ edges.
Properties of the greedy matching
The process of greedily matching source and sinks has the following properties:
How to add the edges
To make the DAG strongly connected.
so as i did understand
firstly:
for any source i will try to connect it to exactly one sink which it can reach if not i will add it to never_matched_sources
secondly:
i will find all of the unmatched sinks and put them in never_matched_sinks
thirdly:
strongly connect the never_matched_sinks to the never_matched_sources
fourthly:
i have to strongly connect the rest
is it this ?
"firstly: for any source i will try to connect it to exactly one sink which it can reach if not i will add it to never_matched_sources"
Note that the dfs never visits the same node twice. So the formulation should be "it can reach using unvisited nodes"
"thirdly: strongly connect the never_matched_sinks to the never_matched_sources"
I start by strongly connecting all of the matched sources and sinks into one big SCC. Think of it like making one big cycle.
Once I've made the big SCC, I then strongly connect the unmatched sources and sinks to it.
Yeah it's very clear to me i did implement it and it looks to work for single component graph, but can you solution be generalized for several DAGs in the graph ?
Still not sure what you are talking about. What do you even mean by single component graph when it comes to directed graphs? The algorithm works for any DAG.
take this for example
n = 4 , m = 2
1 2
2 3
you will connect 3 -> 1 then you will have 4 as a sink and source at the same time and you can add only 1 more edge what would you do in this case ?
For that graph
matched_sources
will become [1,4] andmatched_sinks
will become [3,4] (note this means (source) 1 is matched with (sink) 3, and (source) 4 is matched with (sink) 4, so 4 is matched with itself). There are no unmatched sources or sinks.To create one big SCC out of the matched sources and sinks, I run this code
which connects 3->4 and 4->1. It does not connect 3->1. Think of this process as making one big cycle.
ohh ok
is it possible to explain the last 4 loops cause i am not used to python :)
this is the same as
which in C++ would look something like
ohh yeah i just got it all right now thank you very much :))
is my point clear to you ?
Also how would you deal with it if the graph is made of multiple DAGs ?
The definition of DAG does not require the nodes to be connected. For example a graph with no edges is technically a DAG. So multiple DAGs is still a DAG.
Auto comment: topic has been updated by faresbasbas (previous revision, new revision, compare).