Finding out duplicate entries in a list and analyzing or fixing them is one of the most common requirements in data analysis or cleansing. In this article we will look at how hash tables can be used to quickly get the list of all elements which are repeated more than once in a list.
In the code example below, find_duplicates
function takes in a list of numbers as an input and returns list of all numbers which are repeated more than once (duplicates).
def find_duplicates(nums):
# Initialize Python Dictionary/Hash Table to store frequencies
frequency = {}
# Initialize list to store duplicate numbers
duplicates = []
for num in nums:
if num in frequency:
# If the element already exists in the frequency dictionary
# increment its count
frequency[num] += 1
# If the count reaches 2, then add it to the duplicates list
if frequency[num] == 2:
duplicates.append(num)
else:
# If the element is encountered for the first time
# Set its count to 1
frequency[num] = 1
return duplicates
# Example usage
input_list = [3, 5, 2, 7, 5, 8, 2, 2, 7]
result = find_duplicates(input_list)
print("Duplicates found:", result)