Find Duplicates Using Hash Table

Finding out duplicate entries in a list and analyzing or fixing them is one of the most common requirements in data analysis or cleansing. In this article we will look at how hash tables can be used to quickly get the list of all elements which are repeated more than once in a list.

In the code example below, find_duplicates function takes in a list of numbers as an input and returns list of all numbers which are repeated more than once (duplicates).

def find_duplicates(nums):
    # Initialize Python Dictionary/Hash Table to store frequencies
    frequency = {}
    # Initialize list to store duplicate numbers
    duplicates = []

    for num in nums:
        if num in frequency:
            # If the element already exists in the frequency dictionary
            # increment its count
            frequency[num] += 1
            # If the count reaches 2, then add it to the duplicates list
            if frequency[num] == 2:
            # If the element is encountered for the first time
            # Set its count to 1
            frequency[num] = 1

    return duplicates

# Example usage
input_list = [3, 5, 2, 7, 5, 8, 2, 2, 7]
result = find_duplicates(input_list)
print("Duplicates found:", result)