Python Topics : Garbage Collection
Strategies
Python's memory allocation and deallocation are automatic
the user does not have to preallocate or deallocate memory, unlike manual dynamic memory management in languages such as C or C++
Python uses two strategies for memory management
  1. reference counting
  2. garbage collection
Reference Counting
automatically manages memory by tracking how many references point to each object
in CPython, a reference count is stored on every object
when an object's reference count reaches zero, nothing can reach it any more and its memory is freed

Simple Reference Counting
# Create an object
x = [1, 2, 3]

# Increment reference count
y = x

# Decrement reference count
y = None
Reference Counting with Cyclic Reference
# Create two objects that refer to each other
x = [1, 2, 3]
y = [4, 5, 6]
x.append(y)
y.append(x)
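A minimal sketch of why such a cycle defeats reference counting, assuming CPython. Plain lists cannot be weakly referenced, so the hypothetical Node class below stands in for the two lists; the weak reference is only a probe that shows whether the object still exists.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                      # keep automatic collection out of the way

a, b = Node(), Node()
a.other, b.other = b, a           # a and b now refer to each other
probe = weakref.ref(a)            # does not add to a's reference count

del a, b                          # the names are gone, the cycle remains
alive_after_del = probe() is not None

gc.collect()                      # the cycle detector reclaims both objects
alive_after_collect = probe() is not None

print("alive after del:    ", alive_after_del)
print("alive after collect:", alive_after_collect)

if was_enabled:
    gc.enable()
```

Deleting both names leaves each object holding a reference to the other, so neither count reaches zero until the collector runs.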
Using the sys.getrefcount() function
import sys

# Create an object
x = [1, 2, 3]

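# Get reference count (generally one higher than you might expect,
# because passing x to getrefcount() creates a temporary reference)
ref_count = sys.getrefcount(x)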

print("Reference count of x:", ref_count)
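The increment and decrement steps from the simple example can be observed with the same function. This is a small sketch assuming CPython; the absolute numbers are implementation details, so only the differences between the readings are meaningful.

```python
import sys

x = [1, 2, 3]
base = sys.getrefcount(x)           # reading with only the name x bound

y = x                               # a second name for the same list
after_alias = sys.getrefcount(x)

y = None                            # the second reference is dropped
after_release = sys.getrefcount(x)

print("change after y = x:   ", after_alias - base)
print("change after y = None:", after_release - base)
```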
Garbage Collection
garbage collection is a memory management technique used in programming languages
automatically reclaims memory that is no longer accessible or in use by the application
helps prevent memory leaks, optimize memory usage, and ensure efficient memory allocation for the program
Generational Garbage Collection
reference counting alone cannot handle a reference cycle (cyclical reference), where objects refer to each other or to themselves
the reference counts of the objects in a cycle can never reach 0
so reference counting can never destroy them
in situations like this the generational garbage collector is used
it detects unreachable cycles and releases the memory they use
the generational garbage collector is exposed through the standard library's gc module
Automatic Garbage Collection of Cycles
reference cycles take computational work to discover
so garbage collection of cycles is a scheduled activity rather than something done on every deallocation
Python schedules garbage collection based upon a threshold of object allocations and object deallocations
when the number of allocations minus the number of deallocations is greater than the threshold number, the garbage collector is run
the threshold for new objects (known in Python as generation 0 objects) can be inspected by importing the gc module and asking for the garbage collection thresholds
# loading gc
import gc

# get the current collection 
# thresholds as a tuple
print("Garbage collection thresholds:", gc.get_threshold())
output
Garbage collection thresholds: (700, 10, 10) 
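the default generation 0 threshold on the above system is 700 (the first value in the tuple; defaults can differ between Python versions)
this means that when the number of allocations exceeds the number of deallocations by more than 700, the automatic garbage collector will run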
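The counter that is compared against the threshold can be read with gc.get_count(). The sketch below assumes a standard CPython build and disables automatic collection for the duration of the measurement, so the counter is free to grow past the threshold; the exact values printed will vary from run to run.

```python
import gc

was_enabled = gc.isenabled()
gc.disable()                        # no automatic collection while measuring
gc.collect()                        # start from a freshly collected state

before = gc.get_count()[0]          # generation 0 counter
keep = [[] for _ in range(1000)]    # 1000 new container objects, all kept alive
after = gc.get_count()[0]

threshold0 = gc.get_threshold()[0]  # generation 0 threshold
print("generation 0 counter before:", before)
print("generation 0 counter after: ", after)
print("generation 0 threshold:     ", threshold0)

if was_enabled:
    gc.enable()
```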
Manual Garbage Collection
any portion of the code which frees up large blocks of memory is a good candidate for running manual garbage collection
garbage collection can be invoked manually
# Importing gc module
import gc

# Returns the number of
# objects it has collected
# and deallocated
collected = gc.collect()

# Print how many unreachable objects were found
# (often 0 when no reference cycles are waiting to be collected)
print("Garbage collector: collected %d objects." % collected)
output
Garbage collector: collected 0 objects.
garbage collection after several cycles
import gc
i = 0

# create a reference cycle: on each call x is a new dictionary
# that stores itself under the key i + 1 (i is the module-level counter)
def create_cycle():
    x = {}
    x[i + 1] = x
    print(x)

# run a full collection first (same as gc.collect(2), the highest
# generation) so that earlier garbage is not counted below
collected = gc.collect()
print("Garbage collector: collected %d objects." % (collected))

print("Creating cycles...")
for i in range(10):
    create_cycle()

# the 10 self-referencing dictionaries are unreachable by now
collected = gc.collect()

print("Garbage collector: collected %d objects." % (collected))
output
Garbage collector: collected 0 objects.
Creating cycles...
{1: {...}}
{2: {...}}
{3: {...}}
{4: {...}}
{5: {...}}
{6: {...}}
{7: {...}}
{8: {...}}
{9: {...}}
{10: {...}}
Garbage collector: collected 10 objects.
there are two ways of scheduling manual garbage collection:

time-based garbage collection is the simpler one
the garbage collector is called after a fixed time interval
event-based garbage collection calls the garbage collector when an event occurs
examples
  • user exits the application
  • application enters into an idle state
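Both approaches can be sketched in a few lines. The IntervalCollector class and the on_idle() hook below are hypothetical names introduced only for illustration, and the clock is injectable so the example can advance time without sleeping.

```python
import gc
import time


class IntervalCollector:
    """Hypothetical helper: runs gc.collect() at most once per interval."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.last_run = clock()

    def tick(self):
        """Time-based: collect only if the interval has elapsed."""
        now = self.clock()
        if now - self.last_run >= self.interval:
            self.last_run = now
            return gc.collect()     # number of unreachable objects found
        return None                 # too early, nothing was done


def on_idle():
    """Event-based: hypothetical hook an application calls when it goes idle."""
    return gc.collect()


# Drive the time-based collector with a fake clock instead of real waiting.
fake_now = [0.0]
collector = IntervalCollector(60, clock=lambda: fake_now[0])
first = collector.tick()            # no time has passed yet
fake_now[0] = 61.0
second = collector.tick()           # the interval has elapsed
idle_result = on_idle()
```

In a real application tick() would be called from a main loop or timer, and on_idle() from whatever idle or shutdown notification the framework provides.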
Forced Garbage Collection
in Python the garbage collector runs automatically and periodically
it cleans up objects that are no longer referenced and thus are eligible for garbage collection
in some cases you may want to force garbage collection to occur immediately
this can be done using the gc.collect() function provided by the gc module
import gc

# Create some objects
obj1 = [1, 2, 3]
obj2 = {"a": 1, "b": 2}
obj3 = "Hello, world!"

# Delete references to objects
del obj1
del obj2
del obj3

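# Force a garbage collection
# (in CPython the list and dict above were already freed by reference
# counting when their names were deleted; gc.collect() mainly matters
# for reference cycles)
gc.collect()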
Disabling Garbage Collection
in some cases you may want to disable the garbage collector to prevent it from running automatically
this can be done using the gc.disable() function provided by the gc module
import gc

# Disable the garbage collector
gc.disable()

# Create some objects
obj1 = [1, 2, 3]
obj2 = {"a": 1, "b": 2}
obj3 = "Hello, world!"

# Delete references to objects
del obj1
del obj2
del obj3

# The cyclic garbage collector is disabled, so automatic collection will not run
# (reference counting still frees objects whose count reaches zero)
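Because gc.disable() stays in effect until gc.enable() is called, it can be convenient to pause the collector only around a specific piece of work. The gc_paused() context manager below is a hypothetical helper, not part of the gc module; it is a sketch that restores the previous state even if the body raises.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical helper: disable automatic collection inside a with-block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:             # only re-enable if it was on before
            gc.enable()


state_before = gc.isenabled()
with gc_paused():
    state_inside = gc.isenabled()           # automatic collection is off here
    data = [[i] for i in range(1000)]       # allocation-heavy work
state_after = gc.isenabled()                # previous state is restored
```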
Interacting with Python Garbage Collector
Enabling and disabling the garbage collector
import gc

# Disable the garbage collector
gc.disable()

# Enable the garbage collector
gc.enable()
Forcing garbage collection
import gc

# Trigger a garbage collection
gc.collect()
Inspecting garbage collector settings
import gc

# Get the current garbage collector thresholds
thresholds = gc.get_threshold()
print(thresholds)
Setting garbage collector thresholds
import gc

gc.set_threshold(500, 5, 5)
print("Garbage collector thresholds set")

# Get the current garbage collector thresholds
thresholds = gc.get_threshold()
print("Current thresholds:", thresholds)
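Changed thresholds apply to the whole process, so a cautious pattern is to remember the original values and put them back afterwards. This is a small sketch using only gc.get_threshold() and gc.set_threshold(); the values 500, 5, 5 are the ones from the example above, and how the later values in the tuple are reported may depend on the Python version.

```python
import gc

original = gc.get_threshold()       # remember the current settings

gc.set_threshold(500, 5, 5)         # temporarily collect more eagerly
changed = gc.get_threshold()
print("Temporary thresholds:", changed)

gc.set_threshold(*original)         # put the original settings back
restored = gc.get_threshold()
print("Restored thresholds:", restored)
```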
Advantages and Disadvantages
Advantages
automated memory management: to avoid memory leaks and lower the chance of running out of memory, the Python garbage collector automatically removes objects which are no longer referenced
memory management made easier: the garbage collector frees developers from having to manually manage memory
efficient memory cleanup: the garbage collector is designed to minimise performance effects while swiftly identifying and collecting short-lived objects via generational garbage collection
customizable settings: the garbage collector provides options to customize its settings, which allows developers to fine-tune the garbage collection process based on their specific application requirements

Disadvantages
impact on performance: although the garbage collector is designed to clean up unused memory efficiently, there may still be some CPU consumption and execution time overhead
the difficulty of memory management: the garbage collector makes managing memory easier, but using it successfully may still necessitate knowledge of other concepts
  • object lifetimes
  • object references
  • garbage collection algorithms
limited control over memory management: developers have little control over the precise timing and behaviour of memory cleanup, which may not be ideal for application scenarios where fine-grained control over memory management is necessary
bug potential: the garbage collector is intended to be dependable and effective, but it is not impervious to errors or atypical behaviour, which could lead to memory leaks or improper object cleanup