Hash Table Data Structure

Introduction to Hash Tables
🔗
Hash Tables, Hashing Functions and Collisions
🔗

A Hash table (also known as a hash map) is a data structure used to store a set of items and retrieve any stored item quickly using key associated with the item. Every item stored in a hash table is stored against a key unique to the item.

For example, if we were to store details of students, we allocate an id to each student and then store the student details against his id. Here id corresponds to the key in hash table and student data is the actual data. Any time we require details of a student, we can give any student’s id to the hash map and the hash map can quickly return data of the student.

How a Hash Table Works

Let’s say we have 5 students enrolled for a course. In simple scenario, we can assign ids starting from 0 to 4 for each of the student. So the data looks like below:

Now, if we wish to look at the details of student with id-3, we can directly go to position 3 in the array and access the corresponding student details. Similarly changing any attributes of any student is as simple as going to the slot corresponding to student’s id and change the attributes.

The above data structure is a simple array and we are accessing element using the index where the element is stored. But what if the id/key is not a number? in this case the id can’t be used to access a position in the array. This is where hash tables come into picture.

A basic hash table internally uses an array to store data. It takes in the key/id of any object and tries to map the key to a number which lies in the range of size of the array. This mapping is done using a hash function.

For example if the hash table needs to store details of a student with key stud22, it uses a hashing function to convert that key to a number, for example stud22 is mapped to 4. Then the student’s data is stored at position 4 in the array. Next time when we wish to access the data of student with key/id stud22, the hash table uses the same hashing function to get the index position (which is 4). Then the data at position 4 in the array is accessed and returned to the requester.

Note: The hash function should consistently return the same index for a key. For example, the hash function when applied in stud22 should always return a same index (eg: 4) at any time.

Hashing Collisions and Resolution Techniques

In the previous section, we discussed how a hash function can be used to map any string or an object to a number/index. This index can be used to locate data in the storage array. But what if the hash function returns the same number/index for two different strings?

For example, if the hash function is a simple one which maps a to 1, b to 2, c to 3 and so on. If the string contains multiple characters, let’s say the hash function adds the values of all characters in the string. For the string “ab”, the hash value will be 1 + 2 which is 3. Also for the string “ba”, the hash value will be 2 + 1 which is 3!.

So the hash function we chose has decided that the index for the key “ab” is same as the index for the string “ba”. This is termed as hash collision and can result in data being overwritten for two different objects.

To handle scenarios like these, we use different hashing collision resolution techniques to identify the correct location and prevent overwriting the data corresponding to another key.

Below are some of the hashing collision techniques:

Collision resolution by Separate Chaining

In this method, the actual data is stored in a linked list present at the hashed index location. So if two different keys have the same hashed index, both of them will be added to a linked list present at that index. While retrieving the values, we go through each object in the linked list present at the index position and find the correct one. Below is a sample illustration of how a hash table implementing separate chaining looks (source: wikipedia):

In the above example, both “John Smith” and “Sandra Dee” are mapped to the same index “152”. If each index of the array stores only a single object, the data of one person will be overwritten by the other. But instead a linked list is used at position “152” into which the details of both the persons are added.

Collision resolution by Open Addressing

In this method, if the slot corresponding to the key where the object needs to be stored is filled, the next unfilled slot is used to store the data. While retrieving the object, we start from the index where it should ideally be stored and move ahead until we find the object or an empty slot. Below is a sample illustration of how a hash table implementing open addressing looks (source: wikipedia):

In the above example, both “John Smith” and “Sandra Dee” are mapped to the same index “152”. But by the time the data of “Sandra Dee” is about to be stored, the slot at index “152” is already filled with data of “John Smith”. So the data of “Sandra Dee” is stored in the next available slot “153”.

TODO: Other collision resolution techniques, Time space complexities and other implementation info.

References