Word Frequency Counter using Hash Tables

In text analysis and natural language processing, a fundamental operation is counting word frequencies: determining how often each word appears in a given text. An efficient and common way to do this is with a hash table (called a dictionary, or dict, in Python and an unordered_map in C++), which maps each word to its running count and supports lookups and updates in average constant time.

The function below takes a string and prints the number of times each unique word appears in it.


def print_word_frequency(text):
    # Dictionary (hash table) mapping each word to its count
    word_count = {}
    
    # Split the text on whitespace into a list of words
    words = text.split()

    # Iterate through each word in the text
    for word in words:
        # Remove punctuation and convert word to lowercase for accurate counting
        word = word.strip('.,!?').lower()

        # Skip tokens that were only punctuation (e.g. a lone "!")
        if word:
            # Get count of this word so far
            prev_count = word_count.get(word, 0)
            # Add one more to it and update count
            word_count[word] = prev_count + 1

    # Print word frequencies
    for word, count in word_count.items():
        print(f"'{word}': {count}")

# Example usage:
sample_text = "This is a sample text. This text will be used for word frequency counting."
print_word_frequency(sample_text)

# Example Output:
# 'this': 2
# 'is': 1
# 'a': 1
# 'sample': 1
# 'text': 2
# 'will': 1
# 'be': 1
# 'used': 1
# 'for': 1
# 'word': 1
# 'frequency': 1
# 'counting': 1

The function counts how often each word appears in the input text by using a hash table (a Python dict) that stores each word as a key and its frequency as the value. Before counting, it lowercases each word and strips periods, commas, exclamation marks, and question marks from its ends, so that "This" and "this." are counted as the same word.
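The strip('.,!?') call only removes those four characters from the ends of a word, so tokens such as "stop;" or "'quoted'" keep their punctuation and are counted separately from "stop" and "quoted". One alternative, sketched here as an assumption rather than part of the original code, is to extract words with a regular expression. The names WORD_RE and count_words_regex are hypothetical, and the pattern only recognizes ASCII letters and digits, keeping apostrophes that sit between them (as in "don't"):

```python
import re

# Assumed pattern: runs of ASCII letters/digits, with apostrophes allowed only
# between them, so "don't" stays one word while surrounding quotes are dropped.
# Non-ASCII letters (e.g. accented characters) are not matched.
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def count_words_regex(text):
    """Hypothetical helper: count the words WORD_RE finds in the lowercased text."""
    word_count = {}
    for word in WORD_RE.findall(text.lower()):
        # Same hash-table update as print_word_frequency
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


print(count_words_regex("Don't stop; 'quoted' words: stop!"))
# {"don't": 1, 'stop': 2, 'quoted': 1, 'words': 1}
```

With the original strip('.,!?') approach, the same input would produce separate entries for "stop;" and "stop", and the quotes around "'quoted'" would remain.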

Here is how print_word_frequency works, step by step:

  • The print_word_frequency function takes a string, text, and prints the frequency of each unique word in it.
  • It initializes an empty hash table (a Python dictionary), word_count, to store the word frequencies.
  • It splits the text into a list of words with text.split(), which splits on any run of whitespace.
  • It iterates through each word in the list:
    • Removes leading and trailing punctuation marks (periods, commas, exclamation marks, and question marks) using strip('.,!?').
    • Converts the word to lowercase using lower() for uniformity.
    • Skips the word if nothing is left after stripping; otherwise, increments its count with word_count.get(word, 0) + 1, where get returns 0 for a word not seen yet.
  • Finally, it loops over word_count.items() and prints each word along with its frequency, in the order the words were first seen (dictionaries preserve insertion order since Python 3.7).
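Python's standard library also provides collections.Counter, a dict subclass built for exactly this kind of counting, which can replace the manual get-and-increment loop. The sketch below is an alternative rather than the original code: the function name print_word_frequency_counter is hypothetical, it reuses the same punctuation stripping and lowercasing, and it returns the Counter so the counts can be reused after printing:

```python
from collections import Counter


def print_word_frequency_counter(text):
    """Hypothetical helper: print word frequencies, most common first, and return them."""
    # Same normalization as print_word_frequency: strip .,!? and lowercase
    words = (word.strip('.,!?').lower() for word in text.split())
    # Counter tallies each word; empty strings (punctuation-only tokens) are skipped
    word_count = Counter(word for word in words if word)
    # most_common() returns (word, count) pairs sorted by count, highest first
    for word, count in word_count.most_common():
        print(f"'{word}': {count}")
    return word_count


sample_text = "This is a sample text. This text will be used for word frequency counting."
counts = print_word_frequency_counter(sample_text)
```

Because most_common() sorts by count, 'this' and 'text' are printed first; words with equal counts keep the order in which they were first encountered. Looking up a missing word in a Counter returns 0 instead of raising KeyError.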