Interested in learning how Python manages memory internally? If so, this guide to garbage collection in Python is for you.
When programming in Python, you don’t generally have to worry about memory allocation and deallocation. However, it’s helpful to understand how Python handles this under the hood through the process of garbage collection.
In this tutorial, we’ll explore garbage collection and its importance. We’ll then look at how Python uses reference counting to remove objects that are no longer in use.
What Is Garbage Collection, And Why Is It Important?
In Python, you may not run into out-of-memory errors often, but it is possible.
When you create objects in a program, they take up memory. And there will likely be many objects that are no longer in use.
What happens if the memory occupied by such objects is never freed? Well, you’ll eventually run out of memory. Enter garbage collection.
Garbage collection is the process that is responsible for automatically identifying and reclaiming memory occupied by objects that are no longer in use—preventing memory leaks and improving the overall efficiency of memory usage.
How Garbage Collection Works in Python
Now that we understand garbage collection and its significance, let’s see how Python uses garbage collection.
Understanding Variables and Objects in Python
You’re probably used to thinking of variables as containers that hold values. In Python, however, it’s helpful to describe variables as “names” or “labels” for objects rather than containers that hold objects. This idea is fundamental to understanding how variables work in Python.
Let’s take a closer look.
When you create a variable in Python, you’re essentially creating a reference to an object in memory. The variable is a label or name that points to a memory location where the object is stored. It does not store the object itself, but it “references” or “points to” the object.
Here’s an example:
a = 27
When you create a = 27:
An object of type int is created in memory.
It takes a value of 27.
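The variable a points to, or references, that object. Conceptually, the reference count associated with the object is now 1. (We’ll discuss reference counting in detail in the next section.) In practice, CPython caches small integers such as 27 and shares them across the interpreter, so the count it actually reports for this object is higher.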
Note: You can run hex(id(a)) to get the address in memory of the object that the label a points to.
What happens if you change the value of a, for example by running a = 7? Now a no longer points to the previous memory address.
The variable a now points to an integer object with a value of 7.
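The rebinding described above can be checked with a minimal sketch. The names b, addr_27, addr_7, and same_object are chosen here for illustration, and the addresses printed will differ from run to run:

```python
a = 27
addr_27 = id(a)        # identity of the int object that the name a labels
b = a                  # a second label for the same object, not a copy
same_object = b is a   # both names refer to one object

a = 7                  # rebinding: a now labels a different int object
addr_7 = id(a)

print(hex(addr_27), hex(addr_7))  # two different addresses
print(same_object, a, b)          # b still refers to the object with value 27
```

Rebinding a leaves b untouched, which is what you would expect if names are labels attached to objects rather than boxes holding values.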
Reference counting is a memory management technique used in Python to keep track of the number of references to an object.
The idea is to assign a reference count to each object and increment or decrement this count as references to the object are created or deleted. When the reference count drops to zero—indicating that there are no more references to the object—the memory occupied by the object can be reclaimed.
So what is the reference count? As discussed, each Python object has a reference count associated with it, which is the number of references (such as variables or container entries) pointing to it.
When we create a new reference to an object, the reference count is incremented. When we remove a reference (a variable goes out of scope, is deleted with del, or is rebound to another value such as None), the reference count is decremented.
And how does garbage collection occur?
When the reference count of an object drops to zero, it means there are no more references to that object.
The memory occupied by the object is then eligible for reclamation.
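In CPython, an object whose reference count reaches zero is deallocated right away. In addition, Python uses a garbage collector that periodically identifies and collects objects that are unreachable but still hold references to each other, and frees the associated memory.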
import sys

a = [1, 2, 3]
print(sys.getrefcount(a)) # Prints 2 because getrefcount itself creates a reference
b = a # reference count increases by 1
print(sys.getrefcount(a)) # Prints 3
del b # one reference removed
print(sys.getrefcount(a)) # Prints 2
You’ll get the following output:
2
3
2
How this works is pretty straightforward, but let’s go over the steps:
Initially, the reference count of the list [1, 2, 3] is 1.
Because there’s a temporary reference to a as the argument to getrefcount(), the count is 2 (one higher than expected).
When we assign a to b, the reference count becomes 3.
When we delete b, the reference count becomes 2 again.
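To watch the memory actually being reclaimed once the last reference is gone, here is a small sketch using weakref.finalize from the standard library. The Tracked class and the events list are hypothetical helpers introduced only for this illustration, and the immediate cleanup shown is CPython behavior; other Python implementations may reclaim the object later.

```python
import weakref


class Tracked:
    """Hypothetical helper class; instances can be weakly referenced."""


events = []

obj = Tracked()
# Register a callback that runs when the object is reclaimed.
# finalize() does not hold a strong reference to obj.
weakref.finalize(obj, events.append, "reclaimed")

alias = obj                    # second reference to the same object
del obj                        # one reference removed, alias remains
after_first_del = list(events)

del alias                      # last reference removed
after_second_del = list(events)

print(after_first_del)   # nothing recorded while alias is still alive
print(after_second_del)  # callback has run after the last reference is gone
```

The callback fires only after the second del, which matches the rule that an object is reclaimed when its reference count drops to zero.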
While reference counting is fundamental to Python’s memory management, it’s worth mentioning that Python also uses cyclic reference detection to handle more complex cases, such as circular references (more on this later).
Python’s Built-in gc Module
Let’s now learn about the gc module for garbage collection.
You can explicitly trigger the garbage collector in your Python script like so:
import gc

class MyClass:
    def __init__(self, name):
        self.name = name

# Create some objects
obj1 = MyClass('Object 1')
obj2 = MyClass('Object 2')
obj3 = MyClass('Object 3')

# Get all objects tracked by the garbage collector
all_objects = gc.get_objects()

# Print information about each object
for obj in all_objects:
    if isinstance(obj, MyClass):
        # Show the name and the default representation of the instance
        print(f"{obj.name}:{obj}")

# Manually trigger garbage collection
gc.collect()
Using get_objects() from the gc module will give you the list of all objects known to the garbage collector. Because this list can be prohibitively large, we’ve filtered it to show only objects belonging to MyClass. The output looks something like this (the memory addresses will differ on your machine):
Object 1:<__main__.MyClass object at 0x7f96e7421510>
Object 2:<__main__.MyClass object at 0x7f96e74219d0>
Object 3:<__main__.MyClass object at 0x7f96e7421990>
Note: You can use gc.disable() to manually disable automatic garbage collection. But it is not recommended unless you need granular control over your script.
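If you do need that kind of control, one cautious pattern is to switch automatic collection off only around a specific piece of work and then restore the previous state. The sketch below uses gc.isenabled(), gc.disable(), gc.enable(), and gc.collect(); the variable names and the sample workload are made up for illustration.

```python
import gc

was_enabled = gc.isenabled()   # remember the current setting
gc.disable()                   # pause automatic cyclic collection
try:
    # Stand-in for allocation-heavy work
    data = [[i] for i in range(10_000)]
    enabled_during_work = gc.isenabled()
finally:
    if was_enabled:
        gc.enable()            # restore the previous setting

# Run one explicit collection now that the work is done
unreachable = gc.collect()
print(enabled_during_work, gc.isenabled(), unreachable)
```

Note that gc.disable() only pauses the cyclic collector. Reference counting keeps working, so most objects are still freed as soon as their last reference goes away.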
Cyclic (or circular) references occur when a group of objects references each other in a way that forms a cycle.
If no external references exist to this group, reference counting alone cannot reclaim it: the reference count of each object never reaches zero, so the objects would never be freed and the program would leak memory.
Here’s a simple example of a circular reference:
# Create an empty list
my_list = []
# Add circular reference: the list now contains itself
my_list.append(my_list)
Now my_list contains a reference to itself, creating a circular reference. In addition to reference counting, Python supports cyclic garbage collection to detect such circular references.
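Here is a hedged sketch of the cyclic collector at work. The Node class, its partner attribute, and the probe variable are hypothetical names for this example. Automatic collection is paused while the cycle is built so that the explicit gc.collect() call is the one that reclaims it.

```python
import gc
import weakref


class Node:
    """Hypothetical class whose instances can point at each other."""

    def __init__(self):
        self.partner = None


was_enabled = gc.isenabled()
gc.disable()                  # keep an automatic run from interfering

first, second = Node(), Node()
first.partner = second        # first -> second
second.partner = first        # second -> first, completing the cycle

probe = weakref.ref(first)    # lets us observe the object without owning it
del first, second             # no external references remain

alive_before = probe() is not None   # cycle keeps both objects alive
found = gc.collect()                 # cyclic collector finds the cycle
alive_after = probe() is not None    # objects have been reclaimed

if was_enabled:
    gc.enable()

print(alive_before, found, alive_after)
```

Deleting both names is not enough to free the objects, because each one is still referenced by the other. Only the pass made by the cyclic collector reclaims them.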
Python also uses a generational approach to garbage collection.
Generational Approach to Garbage Collection
Generational garbage collection in Python is based on the observation that most objects in a program have a short lifespan. Objects are categorized into different generations based on how long they have been alive—how many collection sweeps they’ve survived—and the garbage collector applies different collection strategies to these generations.
Python’s generational garbage collection typically divides objects into three generations:
#1. Generation 0
Newly created objects start in this generation.
Objects that survive a garbage collection cycle in this young generation move to the next older generation.
#2. Generation 1
Objects that survive a garbage collection cycle in the young generation move to this middle generation.
Garbage collection is less frequent in this generation as compared to generation 0.
#3. Generation 2
Objects that survive a garbage collection cycle in Generation 1 move to Generation 2, the oldest generation.
Garbage collection in this generation is even less frequent.
Generational garbage collection is, therefore, based on the generational hypothesis, which states that young objects are more likely to become garbage than older objects.
By collecting the young generation more frequently and the older generations less frequently, the garbage collector can achieve better performance.
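You can inspect the generations from a script with the gc module. The sketch below assumes CPython and uses gc.get_threshold(), gc.get_count(), and gc.collect(). The exact numbers depend on your Python version and on what the program has allocated so far, so no particular output is shown.

```python
import gc

# Thresholds that control how often each generation is collected
thresholds = gc.get_threshold()

# Current collection counts for generations 0, 1, and 2
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the youngest generation, then run a full collection
found_young = gc.collect(0)
found_full = gc.collect()
print("unreachable objects found:", found_young, found_full)
```

Passing a generation number to gc.collect() limits the work to that generation and the younger ones, while calling it with no argument runs a full collection.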
Advantages of Garbage Collection
We’ve already discussed why garbage collection is important. Let’s restate the advantages:
Garbage collection automates the process of reclaiming memory occupied by objects that are no longer in use, relieving developers from manual memory management.
It helps prevent memory leaks by identifying and cleaning up unreachable objects.
Garbage collection makes Python applications more robust, reducing the risk of crashes and unexpected behavior due to memory issues.
Best Practices for Garbage Collection in Python
Let’s list some best practices to leverage automatic garbage collection while also writing efficient Pythonic code:
✅ Use Automatic Garbage Collection: Python has built-in automatic garbage collection, so avoid manually managing memory in most cases. Avoid disabling the garbage collector unless absolutely necessary.
✅ Use Context Managers for Resources: When working with external resources like files or network connections, use context managers in with statements to ensure resources are properly closed and released. This reduces the chances of resource leaks.
✅ Monitor and Profile Memory Usage: Use the gc and tracemalloc modules to monitor and profile memory usage in your Python applications to identify performance bottlenecks and potential memory leaks.
✅ Optimize Memory Usage: Minimize the creation of unnecessary objects, and consider using suitable built-in data structures and memory-efficient iterators like generators. Additionally, use appropriate data types to avoid unnecessary memory overhead.
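As a rough illustration of several of these practices together, the sketch below wraps tracemalloc in a context manager and compares the peak memory of building a full list against streaming values from a generator. The traced() helper, the stats dictionary, and the value of n are assumptions made for this example, not part of any library.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def traced():
    """Hypothetical helper: trace allocations made inside the with block."""
    stats = {}
    tracemalloc.start()
    try:
        yield stats
        # get_traced_memory() returns (current, peak) in bytes
        stats["peak"] = tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()    # always stop tracing, even on errors


n = 100_000

with traced() as list_stats:
    total_from_list = sum([i * i for i in range(n)])   # builds the whole list

with traced() as gen_stats:
    total_from_gen = sum(i * i for i in range(n))      # one value at a time

print("list peak bytes:     ", list_stats["peak"])
print("generator peak bytes:", gen_stats["peak"])
```

Both versions compute the same total, but the generator version never holds all the values in memory at once, so its peak should be far lower. The with statement guarantees that tracing is stopped in either case.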
This article explained how garbage collection and memory management work in Python. Let’s review what we’ve learned.
We learned how Python uses reference counting to reclaim objects that are no longer in use. Then, we looked at how Python handles cyclic references and the generational approach to garbage collection.
We then went over the advantages of garbage collection and wrapped up by discussing some of the best practices for garbage collection in Python.