Hash tables are fantastic tools for storing and retrieving data quickly. They work by using a special function, called a hash function, to convert a key (like a name or ID) into an index (a slot number) in an array. Ideally, each key gets its own unique slot. However, sometimes the hash function generates the same index for two or more different keys. This is called a collision.
In our previous discussions, we saw Separate Chaining which uses linked lists outside the table to handle collisions. Open Addressing takes a different approach: if the target slot is full, we simply look for another empty slot within the table itself to place the item.
What is Open Addressing?
Think of the hash table as a parking lot with numbered parking spots. When you arrive (insert an item), the hash function gives you an assigned spot number. Let’s say you’re assigned spot #5 based on your license plate (hash function). By the time you arrive, spot #5 has already been taken (a collision!). With Open Addressing, instead of parking somewhere else entirely (like Separate Chaining might suggest), you decide to check the next spot, #6. If #6 is taken, you check #7, and so on, until you find an empty spot in the main parking lot.
That’s the core idea of Open Addressing: all elements are stored directly inside the hash table array. When a collision occurs at a particular index, we systematically “probe” or search for the next available slot in the table according to a predefined sequence.
How Open Addressing Works (The Probing Concept)
The process of finding an alternative empty slot when a collision happens is called probing. Here’s the general idea:
- Calculate Initial Index: Use the hash function
h(key)
to find the initial indexi
where the element should ideally go. - Check the Slot: Look at the slot
table[i]
. - Collision? Probe!:
- If
table[i]
is empty, insert the new element there. Done! - If
table[i]
is occupied, we have a collision. We need to find the next slot to check. The decision on which slot to check next depends on the probing strategy used (Which we will cover in the upcoming sections). Let’s say the next slot to check is indexj
.
- If
- Check the Next Slot: Look at
table[j]
. - Repeat:
- If
table[j]
is empty, insert the element there. Done! - If
table[j]
is also occupied, calculate the next slot in the probing sequence (say, indexk
) and repeat the check.
- If
- Continue: Keep probing until an empty slot is found.
Note
The way we calculate the “next slot” (the probing sequence) is crucial. There are several methods like Linear Probing, Quadratic Probing, and Double Hashing, each with its own way of determining the sequence. We’ll explore these in detail in separate articles.
Operations with Open Addressing
Now let’s look at how we perform basic hash table operations when using Open Addressing:
Insertion:
- Calculate the initial hash index for the key you’re searching for.
- If the slot is empty or marked as “deleted” (more on this below), insert the item.
- If the slot is occupied, follow the probing sequence (check the next slot, then the next, etc.) until an empty or “deleted” slot is found. Insert the item there.
- If the table becomes too full, you might need to perform rehashing.
Deletion: Deletion is a bit tricky in Open Addressing. If you simply remove an element and leave the slot empty, it might break the probing chain for searching other elements. Imagine you inserted A, then B (which collided with A and went to the next slot), and then C (which collided with both and went further). If you delete B and mark its slot as empty, searching for C might stop prematurely at B’s now-empty slot, incorrectly concluding C isn’t there.
- To handle this, instead of truly emptying the slot, we mark it with a special “deleted” or “tombstone” marker.
- During insertion, “deleted” slots can be treated as empty and reused.
- During search, “deleted” slots are treated as occupied, and the probing continues.
Search:
- Calculate the initial hash index
i
for the key you want to find. - Start probing at index
i
. - Examine the slot
table[i]
:- If
table[i]
contains the exact key you are searching for, great! You’ve found it. - If
table[i]
is empty, you can stop searching. The key isn’t in the table. (If it existed, it would have been placed either here or in a slot probed before). - If
table[i]
contains a different key, OR if it’s marked as “deleted” (tombstone), you haven’t found your key yet, but it might still be further along the probe path. You need to continue probing.
- If
- Calculate the next index
j
using the exact same probing sequence that was used during insertion. - Repeat step 3, examining the slot
table[j]
. Keep probing until you either find the key or encounter an truly empty slot.
- Calculate the initial hash index
Note
The use of “tombstones” for deletion adds some complexity and can slightly slow down searches over time if many deletions occur without subsequent insertions reusing those slots.
Advantages and Disadvantages of Open Addressing
Advantages:
- Space Efficiency: It doesn’t require extra storage for pointers or linked lists, as all elements reside within the table array itself.
- Cache Performance: Elements are often stored close to each other in memory (especially with techniques like Linear Probing), which can lead to better cache utilization and potentially faster access times compared to Separate Chaining where elements might be scattered in memory via linked lists.
Disadvantages:
- Clustering: Elements can tend to clump together in the table (primary clustering with Linear Probing, secondary clustering with Quadratic Probing). This means long sequences of occupied slots can form, increasing search and insertion times as probes have to travel further.
- Sensitivity to Load Factor: Performance degrades significantly as the table fills up (i.e., as the load factor gets high). It’s crucial to keep the load factor low (often below 0.7 or even 0.5) for good performance, which might mean using a larger table than needed.
- Deletion Complexity: Handling deletions requires the special “tombstone” marker, adding complexity to the implementation and potentially impacting search performance over time.
- Hash Function Importance: The quality of the hash function and the chosen probing strategy are critical to minimize clustering and maintain good performance.
Open Addressing offers a compelling alternative to Separate Chaining for collision resolution, particularly when memory is a primary concern or cache performance is critical. However, it comes with its own set of challenges, especially related to clustering and deletion.
What’s Next?
Now that you understand the general concept of Open Addressing, you can dive deeper into the specific techniques used for probing:
- Linear Probing: The simplest probing technique, where we check consecutive slots.
- Quadratic Probing: Uses a quadratic step to check slots further apart, aiming to reduce clustering.
- Double Hashing: Uses a second hash function to determine the step size for probing, often providing the best distribution.
You might also be interested in comparing these techniques directly:
Understanding these specific methods will give you a complete picture of how Open Addressing works in practice.