Strongly Connected Components in Graph Data Structure

Strongly Connected Components (SCCs) are subsets of vertices in a directed graph where each vertex in the subset is reachable from every other vertex in the same subset. In other words, if there is a path from each vertex in the subset to all other vertices within the subset, then those vertices form a strongly connected component. Before you understand what strongly connected components are, you need to understand what a strongly connected graph is.

Strongly Connected Graph

A strongly connected graph is a directed graph where every node is reachable from every other node. In formal terms, for any pair of vertices u and v in the graph, there exists a path from u to v and also a path from v to u. For example, consider the graph shown below:

From node A there is a direct path to node B. Also from node B, there is an indirect path to node A via node C. Here nodes A and B are strongly connected because both can travel to each other. Similarly, you can travel from A to C and also from C to A. Same for nodes B and C. This kind of a graph where each node in the graph can reach any other node in the graph directly or indirectly is termed as a strongly connected graph.

In contrast, consider the directed graph shown below:

If you consider nodes A and D, there is a path from node A to node D but there is no path back from node D to node A. So nodes A and D are not strongly connected and hence the graph is not a strongly connected graph.

Strongly Connected Components

Strongly Connected Components (SCCs) represent a partitioning (division) of a directed graph into maximal subgraphs where each node within the subgraph can reach every other node in that subgraph. This division creates groups of nodes (subgraphs) that are mutually reachable from one another, either directly or through intermediate nodes present in the same subgraph. Each SCC forms a strongly connected graph on its own, where every node can reach every other node within that component.

For example, consider the graph shown below:

If you look at the subset of nodes {A, B, C}, each node in the subset can reach every other node in the subset. From node D, there is no other node to which there is a back and forth path. So node D belongs to another subset. Similarly, the subset of nodes {E,F,G,H} are strongly connected to each other.

After partitioning the graph into multiple strongly connected sub-graphs, below are how the strongly connected components (SCC) look like:

A
A
B
B
C
C
D
D
E
E
H
H
G
G
F
F
SCC-1
SCC-1
SCC-2
SCC-2
SCC-3
SCC-3
Text is not SVG - cannot display

The above graph had a total of 3 Strongly Connected Components viz SCC1, SCC2, SCC3. Notice that not even a single node in each strongly connected component can travel to another node in a different strongly connected component and come back to itself. For example, you can travel from node E in SCC3 to node D in SCC2 but you can’t come back from node D to node E.

Properties of Strongly Connected Components

The properties of strongly connected components are as follows:

  1. Reachability: Within an Strongly Connected Component, all vertices are mutually reachable, meaning that there is a path from any vertex to any other vertex in the same SCC.
  2. Maximality: An SCC is a maximal set of vertices that satisfies the reachability property. It cannot be part of a larger set of vertices that also satisfies the reachability property. For example, in the graph below, nodes A, B, C, D can all reach one another. So all of them have to be part of the same connected component and they can’t be divided into different strongly connected components.
  1. Disjointness: The SCCs of a directed graph are disjoint, meaning that they do not overlap. Each vertex in the graph belongs to exactly one SCC.

Applications and Uses of Strongly Connected Components

SCCs are useful in graph theory and algorithm design because they can help identify the structure and relationships within a directed graph. By identifying the SCCs, you can break down a complex graph into smaller, more manageable components, which can simplify various graph-related problems and algorithms.

A
A
B
B
C
C
D
D
E
E
H
H
G
G
F
F
SCC-1
SCC-1
SCC-2
SCC-2
SCC-3
SCC-3
Text is not SVG - cannot display

If we consider the previous example, we have 3 different strongly connected components. We can reduce the graph to a smaller graph in which each strongly connected component is a node which has connections to other strongly connected components. This allows us to parallelise graph algorithms by performing them on each component simultaneously and then consolidating all the results to get the overall result.

One other real-world scenario where SCCs can be applied is in the analysis of social networks. In a social network, users can be represented as vertices, and the connections between them (e.g., friendships, follows, or interactions) can be represented as directed edges. By identifying the SCCs within the social network, you can discover groups of users who are closely connected to each other, which can be useful for targeted marketing, community detection, or understanding the dynamics of information flow within the network.

Another example is in the analysis of web pages and hyperlinks. The World Wide Web can be modeled as a directed graph, where web pages are vertices and hyperlinks are directed edges. By identifying the SCCs within this graph, you can discover groups of web pages that are strongly connected, which can be useful for search engine optimization, web crawling, or understanding the structure and organization of the web.

Algorithms for Finding SCCs

Two popular algorithms for identifying these strongly connected components are Kosaraju’s algorithm and Tarjan’s algorithm. Kosaraju’s algorithm works by first performing two depth-first searchs on the graph: one depth first search in forward order and another one in reverse order. Tarjan’s algorithm, on the other hand, uses a single depth-first search to identify the strongly connected components directly, by keeping track of the minimum reachable vertex in the current component. Both algorithms have a time complexity of O(V+E), making them efficient methods for finding the strongly connected components in a directed graph.