In this tutorial, you’ll learn how to use the Counter object from Python’s collections module.
When you’re working with long sequences in Python, say, Python lists or strings, you may sometimes need to store the items that appear in the sequence and the number of times they appear.
A Python dictionary is a suitable built-in data structure for such applications. However, the Counter class from Python’s collections module can simplify this by constructing a counter, which is a dictionary of items and their counts in the sequence.
Over the next few minutes, you’ll learn the following:
- Use Python’s counter object
- Create a Python dictionary to store count values of items in an iterable
- Rewrite the dictionary using Python’s counter with a simplified syntax
- Perform operations such as updating and subtracting elements, and finding the intersection between two counter objects
- Get the most frequent items in the counter using the most_common() method
Let’s get started!
Python Collections Module and Counter Class
You’ll often use a Python dictionary to store the items and their counts in an iterable. The items and the counts are stored as keys and values, respectively.
As the Counter class is part of Python’s built-in collections module, you can import it in your Python script like so:
from collections import Counter
After importing the Counter class as mentioned, you can instantiate a counter object as shown:
<counter_object> = Counter(iterable)
Here:
- iterable is any valid Python iterable, such as a Python list, string, or tuple.
- The items in the iterable should be hashable.
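To make the “any iterable of hashable items” requirement concrete, here is a small sketch that builds counters from a list and a tuple, and shows what happens when the items are unhashable. The sample data (fruits, digits, pairs) is made up for illustration.

```python
from collections import Counter

# Counting items in a list of strings (illustrative sample data)
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]
fruit_count = Counter(fruits)
print(fruit_count)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})

# A tuple of integers works the same way
digit_count = Counter((1, 2, 2, 3, 3, 3))
print(digit_count)  # Counter({3: 3, 2: 2, 1: 1})

# Lists are unhashable, so a list of lists cannot be counted directly
try:
    Counter([[1, 2], [1, 2]])
except TypeError as err:
    print("TypeError:", err)

# Converting each inner list to a tuple makes the items hashable
pair_count = Counter(tuple(pair) for pair in [[1, 2], [1, 2]])
print(pair_count)  # Counter({(1, 2): 2})
```

Converting inner lists to tuples is one common workaround; any hashable representation of the items would do.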
Now that we know how to use Counter to create counter objects from any Python iterable, let’s start coding.
The examples used in this tutorial can be found in this GitHub gist.
How to Create a Counter Object from Python Iterables
Let’s create a Python string, say, ‘renaissance’, and call it word.
>>> word = "renaissance"
Our goal is to create a dictionary where each letter in the word string is mapped to the number of times it occurs in the string. One approach is to use a for loop as shown:
>>> letter_count = {}
>>> for letter in word:
...     if letter not in letter_count:
...         letter_count[letter] = 0
...     letter_count[letter] += 1
...
>>> letter_count
{'r': 1, 'e': 2, 'n': 2, 'a': 2, 'i': 1, 's': 2, 'c': 1}
Let’s parse what the above code snippet does:
- Initializes letter_count to an empty Python dictionary.
- Loops through the word string.
- Checks if letter is present in the letter_count dictionary.
- If letter is not present, it adds it with a value of 0 and subsequently increments the value by 1.
- For each occurrence of letter in word, the value corresponding to letter is incremented by 1.
- This continues until we loop through the entire string.
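The steps above can be wrapped in a reusable function. This is a minimal sketch; the name count_items is hypothetical and not part of any library. Comparing its result with Counter shows that both approaches produce the same counts.

```python
from collections import Counter


def count_items(iterable):
    """Count hashable items with a plain dict (hypothetical helper)."""
    counts = {}
    for item in iterable:
        if item not in counts:
            counts[item] = 0  # first occurrence: start the count at 0
        counts[item] += 1     # every occurrence adds 1
    return counts


word = "renaissance"
manual = count_items(word)
print(manual)  # {'r': 1, 'e': 2, 'n': 2, 'a': 2, 'i': 1, 's': 2, 'c': 1}

# A Counter compares equal to a dict that holds the same counts
print(manual == Counter(word))  # True
```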
We constructed the letter_count dictionary ourselves, using a for loop to iterate through the string word.
Now let’s use the Counter class from the collections module. We only need to pass the word string to Counter() to get letter_count, without writing the loop ourselves.
>>> from collections import Counter
>>> letter_count = Counter(word)
>>> letter_count
Counter({'e': 2, 'n': 2, 'a': 2, 's': 2, 'r': 1, 'i': 1, 'c': 1})
The counter object is also a Python dictionary: the Counter class is a subclass of dict. We can use the built-in isinstance() function to verify this:
>>> isinstance(letter_count, dict)
True
As seen, isinstance(letter_count, dict) returns True, indicating that the counter object letter_count is an instance of the Python dict class.
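Because Counter is a dict subclass, ordinary dictionary operations work on it. The sketch below illustrates this and one notable difference: looking up a missing key in a Counter returns 0 instead of raising KeyError, and the lookup does not add the key.

```python
from collections import Counter

letter_count = Counter("renaissance")

# Ordinary dict operations work on a Counter
print(letter_count["s"])             # 2
print(sorted(letter_count.keys()))   # ['a', 'c', 'e', 'i', 'n', 'r', 's']

# A missing key returns 0 instead of raising KeyError
print(letter_count["z"])             # 0
print("z" in letter_count)           # False: the lookup did not insert 'z'

# A plain dict with the same contents raises KeyError for the same lookup
plain = dict(letter_count)
try:
    plain["z"]
except KeyError:
    print("KeyError for the plain dict")
```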
Modifying the Counter Object
So far, we’ve learned to create counter objects from Python strings.
You can also modify counter objects by updating them with elements from another iterable or subtracting another iterable from them.
Updating a Counter with Elements from Another Iterable
Let’s initialize another string, another_word:
>>> another_word = "effervescence"
Suppose we’d like to update the letter_count counter object with the items from the another_word string.
We can use the update() method on the counter object letter_count.
>>> letter_count.update(another_word)
>>> letter_count
Counter({'e': 7, 'n': 3, 's': 3, 'c': 3, 'r': 2, 'a': 2, 'f': 2, 'i': 1, 'v': 1})
In the output, we see that the counter object has been updated to also include the letters and their number of occurrences from another_word.
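Counter.update() adds counts rather than replacing them, which is where it differs from dict.update(). It also accepts a mapping of counts or keyword arguments instead of an iterable. The counts used below ({"e": 3, "z": 1} and a=10) are arbitrary values chosen for illustration.

```python
from collections import Counter

letter_count = Counter("renaissance")

# update() with a mapping adds the given counts to the existing ones
letter_count.update({"e": 3, "z": 1})
print(letter_count["e"], letter_count["z"])  # 5 1

# Keyword arguments are also accepted
letter_count.update(a=10)
print(letter_count["a"])  # 12

# By contrast, dict.update() overwrites values instead of adding to them
plain = dict(Counter("renaissance"))
plain.update({"e": 3})
print(plain["e"])  # 3
```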
Subtracting Elements from Another Iterable
Now let’s subtract the letter counts of another_word from the letter_count object. To do so, we can use the subtract() method. Calling <counter-object>.subtract(<some-iterable>) subtracts the counts of the items in <some-iterable> from the corresponding values in <counter-object>.
Let’s subtract another_word from letter_count.
>>> letter_count.subtract(another_word)
>>> letter_count
Counter({'e': 2, 'n': 2, 'a': 2, 's': 2, 'r': 1, 'i': 1, 'c': 1, 'f': 0, 'v': 0})
We see that the values corresponding to the letters in another_word
have been subtracted, but the added keys ‘f’ and ‘v’ are not removed. They now map to a value of 0.
Note: Here, we have passed in another_word, a Python string, to the subtract() method call. We can also pass in a Python counter object or another iterable.
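subtract() never removes keys, so counts can drop to zero or below. This sketch subtracts a Counter (rather than a string) and shows two ways to keep only the positive counts: unary + on a counter, and the binary - operator, both of which discard results that are zero or negative.

```python
from collections import Counter

letter_count = Counter("renaissance")

# Subtracting more occurrences than exist produces zero or negative counts
letter_count.subtract(Counter("effervescence"))
print(letter_count["e"], letter_count["f"])  # -3 -2

# Unary plus returns a new Counter with only the positive counts
positive_only = +letter_count
print(sorted(positive_only))  # ['a', 'i', 'n', 's']

# The binary - operator also drops results that are zero or negative
print(Counter("renaissance") - Counter("effervescence"))
# Counter({'a': 2, 'n': 1, 'i': 1, 's': 1})
```

Use subtract() when you need to keep zero and negative counts, and the - operator when you only care about what remains.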
Intersection Between Two Counter Objects in Python
You may sometimes want to find the intersection between two Python counter objects to identify which keys are common between the two.
Let’s create a counter object, say, letter_count_2, from the another_word string ‘effervescence’.
>>> another_word = "effervescence"
>>> letter_count_2 = Counter(another_word)
>>> letter_count_2
Counter({'e': 5, 'f': 2, 'c': 2, 'r': 1, 'v': 1, 's': 1, 'n': 1})
We can use the simple & operator to find the intersection between letter_count and letter_count_2.
>>> letter_count & letter_count_2
Counter({'e': 2, 'r': 1, 'n': 1, 's': 1, 'c': 1})
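Notice how you get the keys common to the two words, each mapped to the smaller of its two counts. ‘renaissance’ contains two occurrences of ‘e’ and ‘effervescence’ contains five, so ‘e’ maps to 2; ‘r’, ‘n’, ‘s’, and ‘c’ each occur at least once in both words, so they map to 1. Keys whose smaller count is zero, such as ‘a’, are dropped from the result.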
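To make the “smaller of the two counts” rule explicit, this sketch computes the same intersection by hand and compares it with the & operator. It starts from fresh counters for both words rather than the modified letter_count above, and the names first, second, and by_hand are illustrative.

```python
from collections import Counter

first = Counter("renaissance")
second = Counter("effervescence")

# & keeps each shared key with the smaller of its two counts
common = first & second
print(common)  # Counter({'e': 2, 'r': 1, 'n': 1, 's': 1, 'c': 1})

# The same result by hand: minimum count per key, keeping only positive minimums
by_hand = {}
for key in first:
    smaller = min(first[key], second[key])  # second[key] is 0 if the key is missing
    if smaller > 0:
        by_hand[key] = smaller

print(common == Counter(by_hand))  # True
```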
Find the Most Frequent Items Using most_common
Another common operation on the Python counter object is to find the most frequently occurring items.
To get the top k most common items in the counter, you can use the most_common() method on the counter object. Here, we call most_common() on letter_count to find the three most frequently occurring letters.
>>> letter_count.most_common(3)
[('e', 2), ('n', 2), ('a', 2)]
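We see that the letters ‘e’, ‘n’, and ‘a’ occur twice in the word ‘renaissance’. The letter ‘s’ also occurs twice but is not included: when counts are tied, most_common() returns elements in the order they were first encountered.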
This is especially helpful if the counter contains a large number of entries and you’re interested in working with the most common keys.
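Called without an argument, most_common() returns every item sorted from most to least common. The sketch below shows that, plus a typical use case: finding the most frequent words in a text. The sample sentence is made up, and splitting on whitespace is a simplifying assumption (it ignores punctuation and case).

```python
from collections import Counter

letter_count = Counter("renaissance")

# With no argument, most_common() returns every (item, count) pair, highest first;
# ties keep the order in which the items were first encountered
print(letter_count.most_common())
# [('e', 2), ('n', 2), ('a', 2), ('s', 2), ('r', 1), ('i', 1), ('c', 1)]

# A typical use: the most frequent words in a text (naive whitespace split)
text = "the cat sat on the mat and the cat slept"
word_count = Counter(text.split())
print(word_count.most_common(2))  # [('the', 3), ('cat', 2)]
```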
Conclusion
Here’s a quick review of what we’ve learned in this tutorial:
- The Counter class from Python’s built-in collections module can be used to get a dictionary of count values of all items in any iterable. You should make sure that all the items in the iterable are hashable.
- You can update the contents of one Python counter object with contents from another counter object or any other iterable using the update() method with the syntax: counter1.update(counter2). Note that you can use any iterable in place of counter2.
- If you want to remove the contents of one of the iterables from the updated counter, you can use the subtract() method: counter1.subtract(counter2).
- To find the common elements between two counter objects, you can use the & operator. Given two counters counter1 and counter2, counter1 & counter2 returns the intersection of these two counter objects.
- To get the k most frequent items in a counter, you can use the most_common() method. counter.most_common(k) gives the k most common items and their respective counts.
Next, learn how to use defaultdict, another class in the collections module. You can use defaultdict instead of a regular Python dictionary to handle missing keys.