Union Find (Disjoint Set) Data Structure for Graphs

Union Find Data Structure
Disjoint Sets Lecture

Union Find data structure, also known as Disjoint set data structure, is a data structure used to find the total number of connected components in graph. It can be used to partition nodes in a graph into multiple disjoint sets based on their connectivity. It can also be used to find metrics like maximum/minimum number of nodes in a connected component etc.

How is this Data Structure useful?

Let’s say we have a graph with some nodes and edges as shown below:

1--2--3   6--7   9-10
|  |      |      |
4  5      8     11

The requirement is to label each group of connected nodes with a common identifier/representative. In the above example, the total connected node groups are 3. So every node in the graph will be under one of the 3 identifiers. How can we do this programatically?

The disjoint set data structure helps categorizing each node based on the group it belongs to. It makes sure that every node that is connected to some other node has the same identifier as the other node. After processing all the nodes in the example graph, the union find data structure stores sets as below:

{1, 2, 3, 4, 5}
{6, 7, 8}
{9, 10, 11}

For each set, an element contained in the set is chosen as the representative. For example, 1 could be the reprentative of the set containing elements {1, 2, 3, 4, 5}.

Operations Supported By Disjoint Set Data Structure

Initialize a set for an element

This operation initializes a new set for the input element. The same element will the the identifier of this set. For example, if sets are initialized for elements 1, 2, 3, 4, 5, the disjoint set data structure will initialize and store 5 different sets: {1}, {2}, {3}, {4}, {5}. Also since all the sets have a single element, each set’s representative will be the only element in the set. So for set {1}, the representative will be element 1, for set {2}, the representative will be element 2 etc.

Find representative for an element

This operation finds the representative or identifier of the set in which the given element exists. For example, if we try to find the identifier for set {1, 5, 6}, the answer will be 1 (Assuming that element 1 is the first one added to the list).

Union (Combine) two different sets

This operation takes in two different elements and merges both the sets into a single set. All the elements in both the sets now contain a single representative corresponding to the merged set.

TODO: Implementation.

Uses of Union Find Data Structure

  • For finding metrics of a graph like the number of connected components, minimum/maximum nodes among all the connected components etc.
  • In a Krusal’s Algorithm’s routine
  • To find cycles in an undirected graph