Topological Sorting

All about Topological Sort by WilliamFiset
🔗
Topological Sort DFS Algorithm Wallkthrough
🔗
Walkthrough of Kahn's Algorithm
🔗
Click here to suggest a better video or view suggestions..

Topological sorting is a concept in computer science, which is used in scenarios where a set of tasks must be performed in a specific order such that all the required conditions are satisfied. This concept is widely applied in task scheduling, dependency management, and graph theory.

Example of Topological Sorting

Let’s consider a sample example using academic subjects and their prerequisites/dependencies to demonstrate topological sorting.

Assume that below are the list of subjects a student has to complete:

  1. Introduction to Programming (IP)
  2. Data Structures (DS)
  3. Algorithms (ALG)
  4. Database Systems (DB)
  5. Web Development (WD)
  6. Machine Learning (ML)

Below are the dependencies/requirements before enrolling for each subject:

  • “Data Structures” requires completion of “Introduction to Programming”
  • “Algorithms” requires completion of “Data Structures”
  • “Database Systems” requires “Introduction to Programming”
  • “Web Development” requires “Data Structures” and “Database Systems”
  • “Machine Learning” requires “Algorithms” and “Web Development”

We can represent the dependencies as a directed acyclic graph (DAG):

IP ---> DS ---> ALG ---> ML
 |       |              ^
 |       |             /
 v       v            /
DB ----> WD ----------

Now, let’s find a valid topological ordering for these subjects:

  1. Start with Introduction to Programming (IP) as it has no prerequisites.
  2. We can then take either Data Structures (DS) or Database Systems (DB), as both only require IP.
  3. Let’s choose DS next.
  4. Now we can take ALG (which requires DS) or DB.
  5. Let’s choose DB.
  6. At this point, we can take either ALG or WD.
  7. We’ll choose ALG.
  8. Now we can only take WD as ML needs WD.
  9. Let’s finish with WD, then ML.

So, a valid topological ordering would be:

IP -> DS -> DB -> ALG -> WD -> ML

This ordering ensures that each subject is taken only after its prerequisites have been completed.

It’s important to note that this is not the only valid topological ordering. For example, another valid ordering could be:

IP -> DB -> DS -> WD -> ALG -> ML

In this alternative ordering, we took DB before DS, which is also valid since both only depend on IP. We also took WD before ALG, which is fine because WD doesn’t depend on ALG.

The key point is that in any valid topological ordering:

  1. IP will always come first
  2. ML will always come last
  3. DS will always come before ALG
  4. DB will always come before WD and ML
  5. ALG and WD will always come before ML

Depth-First Search (DFS) recursively explores a graph, fully visiting all connected nodes before backtracking. This property is useful for topological sorting, which orders nodes in a directed graph such that for every edge A -> B, A comes before B.

In the DFS-based topological sort, we maintain a list and add each node to this list after exploring all its neighbors. This ensures that a node is added only after all nodes depending on it are added. However, this process actually creates a reverse topological order.

For example, in a graph with A -> B -> D, DFS exploration would add nodes to the list in the order [D, B, A]. To get the correct topological order, we must reverse this list, resulting in [A, B, D]. This final reversed list ensures that each node appears before any nodes that depend on it, satisfying the definition of a topological sort.

Below is the implementation of topological sort using DFS:


def dfs(graph, node, visited, sorted_nodes):
  """
  Perform a Depth-First Search (DFS) starting from a given node.

  Args:
    graph: A dictionary representing the adjacency list of the graph.
    node: The starting node for the DFS.
    visited: A set to keep track of visited nodes.
    sorted_nodes: A list to store the topologically sorted nodes.
  """
  # Mark the current node as visited
  visited.add(node)

  # Iterate through the neighbors of the current node
  for neighbor in graph[node]:
    # If the neighbor has not been visited, recursively call DFS
    if neighbor not in visited:
      dfs(graph, neighbor, visited, sorted_nodes)

  # Add the current node to the sorted list
  sorted_nodes.append(node)

def topological_sort(graph):
  """
  Perform a topological sort on a directed graph.

  Args:
    graph (dict): A dictionary representing the adjacency list of the graph.

  Returns:
    list: A list of nodes in topologically sorted order.
  """
  # Initialize an empty list to store the sorted nodes
  sorted_nodes = []

  # Initialize a set to keep track of visited nodes
  visited = set()

  # Iterate through all the nodes in the graph
  for node in graph:
    # If the node has not been visited, perform DFS
    if node not in visited:
      dfs(graph, node, visited, sorted_nodes)

  # Reverse the list to get the correct topological order
  sorted_nodes.reverse()

  return sorted_nodes

# Let's test the function with courses dependencies discussed before
courses_graph = {
  "IP" : ["DS", "DB"],
  "DS" : ["ALG", "WD"],
  "DB" : ["WD"],
  "ALG" : ["ML"],
  "WD" : ["ML"],
  "ML" : []
}

topological_order = topological_sort(courses_graph)
print("Below is one valid order in which courses can be taken:")
print(topological_order)

Topological Sorting Using Khan’s Algorithm

Kahn’s algorithm for topological sorting works by repeatedly finding and removing nodes that have no dependencies. We start by identifying all nodes with no prerequisites (no incoming edges to the node). These are our “ready to process” nodes. We add them to our result list and remove them from the graph. As we remove each node, we also remove its outgoing dependencies. If this causes any other nodes to lose all their dependencies, we add those to our “ready to process” list. We repeat this process until all nodes are completed.

Below is the implementation of topological sort using Kahn’s algorithm:


def topological_sort(graph):
    """
    Perform a topological sort on a directed graph using Kahn's algorithm.

    Args:
        graph (dict): A dictionary representing the adjacency list of the graph.

    Returns:
        list: A list of nodes in topologically sorted order
    """
    # Create a list to store nodes with 0 in-degree
    zero_indegree_nodes = []
    
    # Create a dictionary to store the in-degree of each node
    in_degree = {}
    
    # Initialize in-degree of all nodes to 0
    for node in graph:
        in_degree[node] = 0
    
    # Calculate the in-degree of each node
    for node in graph:
        for neighbor in graph[node]:
            in_degree[neighbor] += 1
    
    # Add nodes with 0 in-degree to the zero_indegree_nodes list
    for node in in_degree:
        if in_degree[node] == 0:
            zero_indegree_nodes.append(node)
    
    # Perform topological sort
    topological_order = []
    while zero_indegree_nodes:
        # Remove a node from the end of the list
        current_node = zero_indegree_nodes.pop()
        topological_order.append(current_node)
        
        # Decrement the in-degree of its neighbors
        for neighbor in graph[current_node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                zero_indegree_nodes.append(neighbor)
    
    # Check if there is a cycle in the graph
    if len(topological_order) != len(graph):
        print("Error: Cycle detected in the graph")
        return []
    
    return topological_order

# Let's test the function with courses dependencies discussed before
courses_graph = {
    "IP": ["DS", "DB"],
    "DS": ["ALG", "WD"],
    "DB": ["WD"],
    "ALG": ["ML"],
    "WD": ["ML"],
    "ML": []
}

topological_order = topological_sort(courses_graph)
print("Below is one valid order in which courses can be taken:")
print(topological_order)

Applications of Topological Sorting

Topological sorting has numerous real-world applications:

  • Task Scheduling and Dependency Management: Ensuring tasks are executed in the correct order based on dependencies.
  • Compiler Design and Build Systems: Determining the order of compilation for source code files.
  • Ordering of Courses in an Academic Curriculum: Ensuring prerequisites are completed before advanced courses.
  • Dependency Analysis in Software Engineering: Managing package dependencies in software projects.
  • Sequencing Problems in Various Domains: Solving problems where a sequence must respect certain constraints.