When dealing with text analysis or natural language processing tasks, one fundamental operation is counting the frequency of words within a given text, that is, determining how often each word appears. An efficient and commonly used way to do this is with a hash table (known as a dictionary in Python and as unordered_map in C++).
The function below takes a text and prints the number of times each unique word appears in it.
def print_word_frequency(text):
    # Initialize Python dictionary (hash table) to store frequencies
    word_count = {}
    # Create a list of all words in the text
    words = text.split()
    # Iterate through each word in the text
    for word in words:
        # Strip leading/trailing punctuation and lowercase the word for accurate counting
        word = word.strip('.,!?').lower()
        if word:
            # Get the count of this word so far
            prev_count = word_count.get(word, 0)
            # Add one to it and update the count
            word_count[word] = prev_count + 1
    # Print word frequencies
    for word, count in word_count.items():
        print(f"'{word}': {count}")

# Example usage:
sample_text = "This is a sample text. This text will be used for word frequency counting."
print_word_frequency(sample_text)

# Example Output:
# 'this': 2
# 'is': 1
# 'a': 1
# 'sample': 1
# 'text': 2
# 'will': 1
# 'be': 1
# 'used': 1
# 'for': 1
# 'word': 1
# 'frequency': 1
# 'counting': 1
The code counts the frequency of each word in the input text by using a hash table (a dictionary in Python) to store words as keys and their frequencies as values. It strips leading and trailing punctuation (. , ! ?) and lowercases each word, so that "text." and "Text" are counted as the same word.
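For comparison, Python's standard library offers the same word-to-count mapping ready-made as collections.Counter, a dict subclass meant for tallying. The sketch below is an alternative, not the function above: the name count_words is hypothetical, and it returns the counts instead of printing them.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # Same cleanup as the original function: strip . , ! ? from both ends, then lowercase
    cleaned = (word.strip('.,!?').lower() for word in text.split())
    # Skip tokens that consisted only of punctuation and are now empty
    return Counter(word for word in cleaned if word)


counts = count_words("This is a sample text. This text will be used for word frequency counting.")
print(counts.most_common(2))  # [('this', 2), ('text', 2)]
```

Returning the counts rather than printing them lets the caller decide what to do next, for example listing the most frequent words with most_common().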
The code in more detail:
- The print_word_frequency function takes a string text as input and prints the frequency of each unique word in the text.
- It initializes an empty hash table (Python dictionary) word_count to store word frequencies.
- It splits the text into individual words using text.split(), creating a list of words.
- It iterates through each word in the list:
  - Removes punctuation marks (periods, commas, exclamation marks, and question marks) from the start and end of the word using strip('.,!?').
  - Converts the word to lowercase using lower() for uniformity.
  - Checks that the word is not empty, then updates its count in the dictionary.
- Finally, it iterates over the word_count dictionary and prints each word along with its frequency using a loop over the dictionary’s items.
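One limitation of the stripping step is worth noting: strip('.,!?') removes only those four characters, and only from the ends of a token, so quotes, semicolons, or parentheses stay attached to the word. The following is a minimal sketch of one possible alternative, assuming a word is a run of ASCII letters, digits, or apostrophes. The name word_frequencies and the pattern are illustrative choices, not part of the original code.

```python
import re


def word_frequencies(text):
    """Return a dict mapping each lowercased word to its frequency."""
    word_count = {}
    # Assumed word definition: runs of ASCII letters, digits, or apostrophes
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        # Drop apostrophes at the ends (e.g. used as quote marks), keep inner ones
        word = word.strip("'")
        if word:
            word_count[word] = word_count.get(word, 0) + 1
    return word_count


print(word_frequencies('"Hello," she said; (hello again).'))
# {'hello': 2, 'she': 1, 'said': 1, 'again': 1}
```

With the original stripping, the tokens "Hello," (with its opening quote), said; and (hello would each keep a punctuation character and be counted as separate keys. Text containing non-ASCII letters would need a broader pattern than the one assumed here.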