Group Connected Components in an UnDirected Graph

In graph theory, a connected component refers to a subset of vertices in a graph where each vertex is connected to every other vertex in the subset, either directly or indirectly. Grouping all connected components in a graph is the process of identifying these subsets and assigning a unique identifier or label to each subset/group of vertices.

Labeling Connected Components Using Depth-First Search (DFS):

Depth-First Search (DFS) is a graph traversal algorithm that can be used to identify and label connected components in a graph. The process of using DFS for this purpose can be summarized as follows:

  1. Start by iterating through all the nodes in the graph.
  2. For each unvisited node, begin a DFS traversal starting from that node.
  3. As the DFS traversal progresses, assign a unique identifier (e.g., a number) to all the nodes that are directly or indirectly connected to the starting node.
  4. Repeat steps 2 and 3 for all the unvisited nodes in the graph.

By the end of this process, all the nodes in the graph will be assigned a unique identifier, and each set of connected nodes will have the same identifier. This effectively groups and labels the connected components in the graph.

Below is the implementation to find and label connected components in a given un-directed graph using Depth First Search:


# Function to label all connected nodes with a component ID using DFS
def label_nodes_dfs(current_node, component_id, graph, node_to_component_id):
  # Mark the current node as visited by assigning the component ID
  node_to_component_id[current_node] = component_id
  
  # Traverse all the neighbors of the current node
  for neighbor in graph[current_node]:
    # If the neighbor is not yet visited, recursively call the function
    if neighbor not in node_to_component_id:
      label_nodes_dfs(neighbor, component_id, graph, node_to_component_id)

# Function to find all connected components in the graph
def find_connected_components(graph):
  # Initialize the node to component ID mapping
  node_to_component_id = {}
  
  # Initialize the component ID
  component_id = 0
  
  # Iterate through all the nodes in the graph
  for node in graph:
    # If the node is not yet visited, start a new DFS traversal
    if node not in node_to_component_id:
      label_nodes_dfs(node, component_id, graph, node_to_component_id)
      component_id += 1
  
  return node_to_component_id

# Initialize a sample graph as an adjacency list
graph = {
  'A' : ['B', 'C'],
  'B' : ['A', 'C'],
  'C' : ['A', 'B'],
  'D' : ['E'],
  'E' : ['D'],
  'F' : []
}

node_to_component_id = find_connected_components(graph)
print("Below is the component ID given to each node:")
print(node_to_component_id)

# You can also group components based on ID and store them together
nodes_by_component_id = {}
for node,component_id in node_to_component_id.items():
  if component_id not in nodes_by_component_id:
    nodes_by_component_id[component_id] = []
  nodes_by_component_id[component_id].append(node)
print("Below are all the nodes in graph grouped by component ID:")
print(nodes_by_component_id)

Labeling Connected Components Using Breadth-First Search (BFS):

Breadth-First Search (BFS) technique can also be used to group connected components. The BFS approach explores all the neighboring vertices at the current depth before moving on to the vertices at the next depth level. Below is the implementation of this approach:


from collections import deque

# Function to label all connected nodes with a component ID using BFS
def label_nodes_bfs(start_node, component_id, graph, node_to_component_id):
  # Initialize a queue for BFS
  queue = deque([start_node])
  
  # Mark the start node as visited by assigning the component ID
  node_to_component_id[start_node] = component_id
  
  while queue:
    current_node = queue.popleft()
    
    # Traverse all the neighbors of the current node
    for neighbor in graph[current_node]:
      # If the neighbor is not yet visited, mark it and add it to the queue
      if neighbor not in node_to_component_id:
        node_to_component_id[neighbor] = component_id
        queue.append(neighbor)

# Function to find all connected components in the graph
def find_connected_components(graph):
  # Initialize the node to component ID mapping
  node_to_component_id = {}
  
  # Initialize the component ID
  component_id = 0
  
  # Iterate through all the nodes in the graph
  for node in graph:
    # If the node is not yet visited, start a new BFS traversal
    if node not in node_to_component_id:
      label_nodes_bfs(node, component_id, graph, node_to_component_id)
      component_id += 1
  
  return node_to_component_id

# Initialize a sample graph as an adjacency list
graph = {
  'A': ['B', 'C'],
  'B': ['A', 'C'],
  'C': ['A', 'B'],
  'D': ['E'],
  'E': ['D'],
  'F': []
}

node_to_component_id = find_connected_components(graph)
print("Below is the component ID given to each node:")
print(node_to_component_id)

# Group components based on ID and store them together
nodes_by_component_id = {}
for node, component_id in node_to_component_id.items():
  if component_id not in nodes_by_component_id:
    nodes_by_component_id[component_id] = []
  nodes_by_component_id[component_id].append(node)

print("Below are all the nodes in graph grouped by component ID:")
print(nodes_by_component_id)

Applications of Grouping Connected Components:

Grouping connected components in a graph has various applications, such as:

  • Social network analysis: Identifying communities or clusters of users in a social network.
  • Recommendation systems: Grouping related items or products to provide better recommendations.
  • Image segmentation: Separating an image into distinct regions or objects.
  • Bioinformatics: Analyzing protein-protein interaction networks or gene regulatory networks.
  • Network routing and connectivity: Identifying independent network segments or subnetworks.