Imagine a hash table as a set of labelled boxes (or slots). When we want to store an item, a hash function tells us which box to use. But what happens if that box is already full? This situation is called a collision. Open addressing is one way to handle collisions: instead of putting the item somewhere else entirely, we look for the next available slot within the table itself.
We’ve seen techniques like Linear Probing (check the very next box) and Quadratic Probing (check boxes farther and farther away quadratically). Double Hashing is another, often more efficient, way to find that next empty box when a collision happens. It uses a second hash function to decide how far to jump to find the next spot.
How Double Hashing Works
The core idea of double hashing is simple yet clever: instead of using a fixed step size (like 1 in linear probing) or a quadratically increasing step size, the step size itself depends on the key you’re trying to insert. This is achieved by using two different hash functions:
- `h1(key)`: This is your primary hash function. It calculates the initial index where you try to place the key, just like in other hashing methods: `initial_index = h1(key) % table_size`.
- `h2(key)`: This is the second hash function. It calculates the step size, or interval, you’ll use to jump through the table if the initial slot is already taken: `step_size = h2(key) % table_size`.
When you need to find the next slot (after `i` collisions), the formula looks like this:

`index = (h1(key) + i * h2(key)) % table_size`

Here:
- `h1(key)` is the result of the first hash function.
- `h2(key)` is the result of the second hash function (the step size).
- `i` is the probe number (0 for the first attempt, 1 for the second, 2 for the third, and so on).
- `table_size` is the total number of slots in your hash table.
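To make the formula concrete, here is a minimal Python sketch of how the index for probe number `i` could be computed. The name `probe_index` and the idea of passing `h1` and `h2` in as functions are just illustrative choices, not part of any particular library:

```python
def probe_index(key, i, table_size, h1, h2):
    """Index checked on probe number i for this key under double hashing."""
    return (h1(key) + i * h2(key)) % table_size

# Probe 0 lands on h1(key), probe 1 on h1(key) + h2(key),
# probe 2 on h1(key) + 2 * h2(key), and so on (all taken mod table_size).
```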
Think of it like this: Imagine trying to find a parking spot in a numbered lot (`table_size`). Your car’s license plate (`key`) tells you where to look first (`h1(key)`). If that spot is taken, your car’s model (`h2(key)`) tells you how many spots to skip to find the next potential spot. If that’s also taken, you skip the same number of spots again, and so on. Different car models might have different skip numbers, leading to different search patterns.
Note
For double hashing to work well, the second hash function `h2(key)` must have two crucial properties:
- `h2(key)` must never return 0. If the step size were 0, you’d keep probing the same spot forever!
- Ideally, the value of `h2(key)` should be relatively prime to the `table_size`. This helps ensure that your probing sequence eventually checks every slot in the table if necessary, preventing you from getting stuck in a smaller loop. A common way to achieve this is to make `table_size` a prime number and choose `h2(key)` such that it’s always between 1 and `table_size - 1`. For example, `h2(key) = R - (key % R)`, where `R` is a prime number smaller than `table_size`.
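One minimal way to write such a pair of hash functions in Python, assuming integer keys and using a prime table size of 7 with `R = 5` purely for illustration, might look like this:

```python
TABLE_SIZE = 7  # a prime table size, so every step in 1..TABLE_SIZE-1 is relatively prime to it
R = 5           # a prime smaller than TABLE_SIZE

def h1(key):
    # primary hash: picks the initial slot
    return key % TABLE_SIZE

def h2(key):
    # secondary hash: picks the step size, always in 1..R, never 0
    return R - (key % R)
```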
Example of Double Hashing
Let’s say our hash table has `table_size = 7`.
Our hash functions are:
- `h1(key) = key % 7`
- `h2(key) = 5 - (key % 5)` (Note: `h2` will never be 0 here, as `key % 5` ranges from 0 to 4, so `5 - (key % 5)` ranges from 1 to 5.)
Let’s insert the keys: 15, 22, 8.
Insert 15:
- `h1(15) = 15 % 7 = 1`. Index 1 is empty. Place 15 at index 1.
- Table: `[_, 15, _, _, _, _, _]`

Insert 22:
- `h1(22) = 22 % 7 = 1`. Index 1 is occupied by 15 (Collision!).
- Calculate step size: `h2(22) = 5 - (22 % 5) = 5 - 2 = 3`.
- Probe 1 (`i=1`): `index = (h1(22) + 1 * h2(22)) % 7 = (1 + 1 * 3) % 7 = 4 % 7 = 4`. Index 4 is empty. Place 22 at index 4.
- Table: `[_, 15, _, _, 22, _, _]`

Insert 8:
- `h1(8) = 8 % 7 = 1`. Index 1 is occupied by 15 (Collision!).
- Calculate step size: `h2(8) = 5 - (8 % 5) = 5 - 3 = 2`.
- Probe 1 (`i=1`): `index = (h1(8) + 1 * h2(8)) % 7 = (1 + 1 * 2) % 7 = 3 % 7 = 3`. Index 3 is empty. Place 8 at index 3.
- Table: `[_, 15, _, 8, 22, _, _]`
Notice how keys 22 and 8, which initially collided at index 1, had different step sizes (3 and 2 respectively) calculated by `h2`, leading them to different subsequent probe locations. This helps spread out keys more effectively than linear or quadratic probing.
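If you want to reproduce the walkthrough in code, here is a small self-contained Python sketch. It is insert-only (no search, deletion, or resizing) and uses `None` to mark empty slots, which are simplifying assumptions rather than a full implementation:

```python
TABLE_SIZE = 7

def h1(key):
    return key % TABLE_SIZE

def h2(key):
    return 5 - (key % 5)  # always between 1 and 5, never 0

def insert(table, key):
    """Probe with double hashing until an empty (None) slot is found."""
    for i in range(TABLE_SIZE):
        index = (h1(key) + i * h2(key)) % TABLE_SIZE
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")

table = [None] * TABLE_SIZE
for key in (15, 22, 8):
    insert(table, key)

print(table)  # [None, 15, None, 8, 22, None, None]
```

Running it reproduces the final table from the example: 15 at index 1, 8 at index 3, and 22 at index 4.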
Advantages and Disadvantages
Double hashing is often considered one of the best open addressing methods.
Advantages:
- Reduces Clustering: It significantly minimizes both primary clustering (long runs of occupied slots caused by linear probing) and secondary clustering (where different keys hashing to the same initial slot follow the same probe sequence, as in quadratic probing). Because the step size depends on the key itself, keys that collide initially are likely to have different probe sequences.
- More Uniform Distribution: Keys tend to be spread out more evenly across the table compared to linear or quadratic probing, especially as the table gets fuller.
- Good Performance: On average, it requires fewer probes than linear or quadratic probing to find an empty slot or an element, leading to faster `insert`, `search`, and `delete` operations (closer to `O(1)`), provided the load factor is kept reasonably low.
Disadvantages:
- Computational Cost: It requires computing a second hash function (`h2`) whenever a collision occurs. This adds a small amount of extra computation compared to the simpler fixed-step calculations in linear or quadratic probing.
- Complexity of `h2`: Choosing a good second hash function `h2` that never returns 0 and produces step sizes relatively prime to the table size can be more complex than implementing the fixed steps of linear or quadratic probing.
- Performance Degradation: Like all open addressing techniques, performance can still degrade significantly as the table becomes nearly full (high load factor). The average number of probes increases dramatically.
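For a rough sense of how quickly probing gets expensive, under the idealized uniform hashing assumption (which double hashing approximates well), an unsuccessful search is expected to take about `1 / (1 - load_factor)` probes. The snippet below simply evaluates that rule of thumb for a few load factors:

```python
# Expected probes for an unsuccessful search under uniform hashing: about 1 / (1 - load_factor)
for load_factor in (0.5, 0.75, 0.9, 0.95):
    print(f"load factor {load_factor:.2f}: ~{1 / (1 - load_factor):.0f} probes")

# load factor 0.50: ~2 probes
# load factor 0.75: ~4 probes
# load factor 0.90: ~10 probes
# load factor 0.95: ~20 probes
```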
What’s Next?
You’ve now explored double hashing, a powerful technique for handling collisions in hash tables. To continue building your understanding of hash tables and related concepts, consider exploring these topics:
- Comparing Collision Resolution Techniques: See how double hashing stacks up against other methods like separate chaining, linear probing, and quadratic probing in terms of performance and trade-offs.
- Load Factor and Rehashing: Learn how the “fullness” of a hash table affects performance and how resizing (rehashing) helps maintain efficiency, which is crucial for open addressing techniques.
- Hash Tables in Programming Languages: Discover how hash tables (often called dictionaries, maps, or hash maps) are implemented and used in popular languages like Python, Java, or JavaScript.
- Basic Hash Table Examples: Apply your knowledge by looking at simple problems that can be solved effectively using hash tables, such as counting word frequencies or finding duplicates.