Hash tables are incredibly useful tools for storing and retrieving data quickly. Imagine them like a set of labeled bins. A special function, called a hash function, tells you exactly which bin to put an item in or look for it. However, sometimes, two different items might be assigned to the same bin! This is called a collision.
Open Addressing is one family of techniques used to solve this collision problem. Instead of putting multiple items in the same bin (like using a list inside the bin, which is called Separate Chaining), open addressing says: “If the target bin is full, just find another empty bin within the table itself.” Linear Probing is the simplest way to find that next empty bin. Let’s explore how it works.
What is Linear Probing?
Think about parking your car in a parking lot with numbered spots. You have a preferred spot (calculated by your “hash function”). If you arrive and your spot is taken, what’s the simplest thing to do? You’d probably just check the very next spot (spot + 1
). If that’s taken, you check the one after that (spot + 2
), and so on.
Linear probing works exactly like this! When a collision occurs at a certain index (bin) in the hash table, linear probing looks for the next available slot in a linear sequence: index + 1, index + 2, index + 3, and so on. If it reaches the end of the table, it wraps around to the beginning.
The sequence of locations checked is called the probe sequence. In linear probing, this sequence is determined by simply adding 1, then 2, then 3, etc., to the original hash index (modulo the table size to handle wrap-around).
How Linear Probing Works
Let’s see how the basic operations work using linear probing. We represent the hash table as an array. The hash function hash(key)
gives us the initial index for a given key. Let tableSize
be the number of slots in the table.
Insertion
- Calculate Initial Index: Compute the index
i = hash(key) % tableSize
. - Check Slot: Look at the slot
table[i]
. - Insert if Empty: If
table[i]
is empty (or marked as deleted), insert the new key-value pair there. - Probe if Occupied: If
table[i]
is already occupied by a different key, check the next slot:(i + 1) % tableSize
. - Continue Probing: Keep checking the subsequent slots
(i + 2) % tableSize
,(i + 3) % tableSize
, etc., until an empty or deleted slot is found. - Insert in Found Slot: Place the new key-value pair in the first empty/deleted slot encountered.
- Table Full: If you probe through the entire table and return to the starting index
i
without finding an empty slot, the table is full, and the insertion fails (or triggers a table resize, called rehashing).
Searching
- Calculate Initial Index: Compute the index
i = hash(key) % tableSize
. - Check Slot: Examine the slot
table[i]
. - Found: If
table[i]
contains the key you’re looking for, the search is successful. - Not Found (Empty Slot): If
table[i]
is empty, it means the key cannot be in the table (because if it were inserted, it would either be here or in a subsequent probed slot before this empty one). The search fails. - Probe if Different Key: If
table[i]
contains a different key, probe the next slot:(i + 1) % tableSize
. - Continue Probing: Keep checking subsequent slots
(i + 2) % tableSize
,(i + 3) % tableSize
, etc. In each slot:- If the key is found, return success.
- If an empty slot is found, the key is not in the table, so return failure.
- If the slot is occupied by a different key (or marked deleted), continue probing.
- Search Exhausted: If you probe through the entire table and return to the starting index
i
without finding the key or an empty slot, the key is not present.
Deletion
Simply emptying a slot when deleting an item can cause problems. Why? Imagine you inserted keys A, B, and C. hash(A)
points to index 5. hash(B)
also points to index 5, so B goes into index 6 (due to linear probing). hash(C)
points to index 5, so it probes past A (index 5) and B (index 6) and goes into index 7.
Now, if you delete B by emptying slot 6, and then search for C, the search starts at index 5 (occupied by A), probes to index 6 (now empty!), and incorrectly concludes that C is not in the table because it hit an empty slot.
To fix this, instead of truly emptying the slot, we use a special marker (often called a “tombstone” or “deleted” marker).
- Find the Item: Use the search process described above to locate the item to be deleted.
- Mark as Deleted: If the item is found at index
j
, replace it with the special “deleted” marker instead of making the slot empty.
Note
When searching, treat slots marked “deleted” as occupied โ you need to continue probing past them. When inserting, you can insert a new item into a slot marked “deleted”. This helps reuse space.
Advantages and Disadvantages
Advantages
- Simplicity: It’s the easiest open addressing technique to understand and implement.
- Cache Performance: Because it probes adjacent slots, it often benefits from CPU caching (spatial locality), which can make it fast in practice when the table isn’t too full.
Disadvantages
- Primary Clustering: Linear probing suffers from a problem called primary clustering. This means that as keys collide and probe linearly, long sequences of occupied slots tend to build up. When a new key hashes into this cluster, it has to probe further down the sequence, potentially making the cluster even longer. This significantly slows down insertions and searches as the table gets fuller. Think of it like traffic jams forming on a highway โ once one starts, it tends to grow.
Complexity and Performance
- Average Case: If the table is sparsely populated (low load factor), insertion, deletion, and search operations have an average time complexity close to
O(1)
. - Worst Case: In the worst case (e.g., extreme clustering or a nearly full table), these operations can degrade to
O(n)
, wheren
is the number of elements in the table. This happens because you might have to scan almost the entire table.
Performance heavily depends on the load factor (the ratio of stored elements to the total number of slots). Linear probing works well for low load factors (e.g., below 0.5 or 0.6), but performance drops sharply as the load factor increases due to clustering.
What’s Next?
Now that you understand how linear probing works, you might be interested in exploring other related concepts:
- Open Addressing Overview: Revisit the main concept of open addressing where collisions are resolved by finding alternative slots within the table itself.
- Quadratic Probing: Explore another open addressing technique that uses a quadratic step size (like
index + 1^2
,index + 2^2
,index + 3^2
, …) to probe for empty slots, which helps reduce the primary clustering problem seen in linear probing. - Double Hashing: Learn about a more sophisticated open addressing method that uses a second hash function to determine the step size for probing, effectively minimizing both primary and secondary clustering.
- Separate Chaining: Understand the main alternative approach to collision resolution, where each table slot points to a list (or chain) of elements that hash to the same index.
- Comparing Collision Resolution Techniques: Compare the pros and cons of different methods like separate chaining, linear probing, quadratic probing, and double hashing to see when each performs best.