Python Topics: LRU Cache Strategy
Caching and Its Uses

caching is an optimization technique
you can use it in applications to keep recent or often-used data in memory locations that are faster or computationally cheaper to access than their source
apps such as a newsreader use caching to keep recently accessed remote files available locally

Implementing a Cache Using a Python Dictionary
simple cache example (caching.py)
import requests

cache = dict()

def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def get_article(url):
    print("Getting article...")
    if url not in cache:
        cache[url] = get_article_from_server(url)

    return cache[url]

get_article("https://realpython.com/sorting-algorithms-python/")
get_article("https://realpython.com/sorting-algorithms-python/")
install requests and run the script to see the following output
$ pip install requests
$ python caching.py
Getting article...
Fetching article from server...
Getting article...
Caching Strategies

Strategy | Eviction Policy | Use Case
First-In/First-Out (FIFO) | Evicts the oldest entry | Newer entries are most likely to be reused
Last-In/First-Out (LIFO) | Evicts the newest entry | Older entries are most likely to be reused
Least Recently Used (LRU) | Evicts the least recently used entry | Recently used entries are most likely to be reused
Most Recently Used (MRU) | Evicts the most recently used entry | Least recently used entries are most likely to be reused
Least Frequently Used (LFU) | Evicts the least often accessed entry | Entries with many hits are more likely to be reused

Diving Into the Least Recently Used (LRU) Cache Strategy
a cache implemented using the LRU strategy organizes its items in order of use
every time you access an entry, the LRU algorithm will move it to the top of the cache
the algorithm can quickly identify the entry that's gone unused the longest by looking at the bottom of the list
the LRU strategy assumes that the more recently an object has been used, the more likely it is to be needed in the future

Peeking Behind the Scenes of the LRU Cache
one way to implement an LRU cache in Python is to use a combination of a doubly linked list and a hash map
using the hash map, you can ensure fast access to every item in the cache by mapping each entry to its specific location in the doubly linked list
this strategy is very fast
accessing the least recently used item and updating the cache are operations with a runtime of O(1)
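below is a minimal sketch of that idea; the class name LRUCache and its get()/put() methods are illustrative choices, not a standard library API

# lru_sketch.py
# a minimal sketch of the hash map + doubly linked list design described above

class _Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                     # key -> node, O(1) lookup
        # sentinel nodes simplify insertion and removal at both ends
        self.head = _Node(None, None)     # most recently used side
        self.tail = _Node(None, None)     # least recently used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return None
        node = self.map[key]
        # accessing an entry moves it to the front (most recently used)
        self._remove(node)
        self._add_to_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._remove(self.map[key])
        node = _Node(key, value)
        self.map[key] = node
        self._add_to_front(node)
        if len(self.map) > self.capacity:
            # evict the least recently used entry, found at the back in O(1)
            lru = self.tail.prev
            self._remove(lru)
            del self.map[lru.key]

both get() and put() only touch a constant number of pointers and one hash map entry, which is where the O(1) runtime comes from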

Measuring Efficiency With Big O Notation

Big O | Complexity | Description
O(1) | constant | the runtime is constant regardless of the size of the input; finding an element in a hash table is an example of an operation that can be performed in constant time
O(n) | linear | the runtime grows linearly with the size of the input; a function that checks a condition on every item of a list is an example of an O(n) algorithm
O(n²) | quadratic | the runtime is a quadratic function of the size of the input; a naive implementation of finding duplicate values in a list, in which each item has to be checked twice, is an example of a quadratic algorithm
O(2ⁿ) | exponential | the runtime grows exponentially with the size of the input; these algorithms are considered extremely inefficient
O(log n) | logarithmic | the runtime grows linearly while the size of the input grows exponentially; if it takes one second to process one thousand elements, then it will take two seconds to process ten thousand, three seconds to process one hundred thousand, and so on; binary search is an example of a logarithmic runtime algorithm
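
as a rough illustration of the first two rows, a dictionary membership test stays constant-time while a list scan grows with the input; the names below are arbitrary choices for the example

# rough illustration of O(1) vs O(n) lookups
items = list(range(100_000))
lookup = {value: True for value in items}

def in_list(value):
    return value in items      # O(n): may scan every element of the list

def in_dict(value):
    return value in lookup     # O(1): a single hash table probe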

Python's functools module provides the @lru_cache decorator, which implements the LRU strategy
can use this decorator to wrap functions and cache their results up to a maximum number of entries
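a minimal sketch of wrapping a function; the function name square and the maxsize value are arbitrary choices for illustration

from functools import lru_cache

@lru_cache(maxsize=32)
def square(number):
    # computed once per distinct argument, then served from the cache
    return number * number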

Using @lru_cache to Implement an LRU Cache in Python
Playing With Stairs
suppose you want to determine all the different ways you can reach a specific stair in a staircase by hopping one, two, or three stairs at a time
with three steps, there are four different ways to reach the third step
  • 0 - 1 - 2 - 3
  • 0 - 1 - 3
  • 0 - 2 - 3
  • 0 - 3
the example below uses recursion to determine the number of possible combinations to reach the fourth step
# stairs.py

def steps_to(stair):
    if stair == 1:
        # You can reach the first stair with only a single step
        # from the floor.
        return 1
    elif stair == 2:
        # You can reach the second stair by jumping from the
        # floor with a single two-stair hop or by jumping a single
        # stair a couple of times.
        return 2
    elif stair == 3:
        # You can reach the third stair using four possible
        # combinations:
        # 1. Jumping all the way from the floor
        # 2. Jumping two stairs, then one
        # 3. Jumping one stair, then two
        # 4. Jumping one stair three times
        return 4
    else:
        # You can reach your current stair from three different places:
        # 1. From three stairs down
        # 2. From two stairs down
        # 3. From one stair down
        #
        # If you add up the number of ways of getting to those
        # three positions, then you should have your solution.
        return (
            steps_to(stair - 3)
            + steps_to(stair - 2)
            + steps_to(stair - 1)
        )

print(steps_to(4))
the result shows there are 7 different combinations to reach the fourth step
when there are 30 steps the number of possible combinations is 53,798,080

Timing Your Code
can measure how long the code takes to run by using the timeit module
at the top of stairs.py, import repeat from the timeit module
from timeit import repeat
at the end of the file add
setup_code = "from __main__ import steps_to"
stmt = "steps_to(30)"
times = repeat(setup=setup_code, stmt=stmt, repeat=3, number=10)
print(f"Minimum execution time: {min(times)}")
the four added lines do the following:
  1. imports the name of steps_to() so that timeit.repeat() knows how to call it
  2. prepares the call to the function with the stair number to reach
    in this case the stair number is 30
    this is the statement that will be executed and timed
  3. calls timeit.repeat() with the setup code and the statement
    the syntax of repeat():
    timeit.repeat(stmt='pass', setup='pass', timer=timeit.default_timer, repeat=3, number=1000000)
    the number param is the number of times to execute the statement in each repetition
    the repeat param is the number of times to repeat that batch of measurements
    above, steps_to() is called ten times per repetition
    that batch of ten calls is repeated three times
  4. identifies and prints the shortest time returned
Using Memoization to Improve the Solution
memoization ensures that a function doesn't run for the same inputs more than once
it stores the result in memory and references it later when necessary
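a minimal hand-rolled sketch of the same idea (the memoize name is an arbitrary choice); @lru_cache does this bookkeeping for you

from functools import wraps

def memoize(func):
    cache = {}

    @wraps(func)
    def wrapper(*args):
        # compute the result only on the first call with these arguments
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

decorating steps_to() with @memoize would give a similar speedup, because once the name is rebound to the wrapper, the recursive calls also hit the cache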
changes to make to stairs.py
from functools import lru_cache
from timeit import repeat

@lru_cache
def steps_to(stair):
    if stair == 1:
        ...
timeit makes 30 calls to steps_to() in total (ten per repetition, repeated three times)
without memoization the script takes over 12 seconds to complete
with the LRU cache only the first call actually computes a result; the remaining calls are served from the cache
the script now completes in around 0.0002 seconds

Unpacking the Functionality of @lru_cache
the @lru_cache decorator accepts a maxsize argument
it defines the maximum number of entries the cache can hold before it starts evicting old items
from functools import lru_cache
from timeit import repeat

@lru_cache(maxsize=16)
def steps_to(stair):
    if stair == 1:
        ...
can use cache_info() to inspect the number of hits and misses and the current size of the cache
print(steps_to.cache_info())
output
Minimum execution time: 3.9998849388211966e-07
CacheInfo(hits=81, misses=30, maxsize=16, currsize=16)
CacheInfo values

hits: the number of calls that @lru_cache returned directly from memory because they existed in the cache
misses: the number of calls that didn't come from memory and were computed; it makes sense that each of these thirty calls (one per stair) missed the cache the first time it was made
maxsize: the size of the cache as defined by the maxsize argument of the decorator
currsize: the current size of the cache

can use steps_to.cache_clear() to clear the cache and reset the hit and miss counters
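a quick sketch of clearing the cache; the numbers in the first comment are just the ones from the run above

print(steps_to.cache_info())   # CacheInfo(hits=81, misses=30, maxsize=16, currsize=16)
steps_to.cache_clear()         # empty the cache and reset the hit/miss counters
print(steps_to.cache_info())   # CacheInfo(hits=0, misses=0, maxsize=16, currsize=0)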

Adding Cache Expiration

below is a script which monitors a website
the example monitors Real Python and prints the number of characters of any article whose title contains the word python
the monitor checks the feed every five seconds

import feedparser
import requests
import ssl
import time

# this is a workaround to an issue when feedparser tries to access content served over HTTPS
if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context

def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

# this function runs continuously
def monitor(url):
    maxlen = 45
    # monitor
    while True:
        print("\nChecking feed...")
        # loads and parses the feed from Real Python
        feed = feedparser.parse(url)
        # goes through the first 5 entries on the list
        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                truncated_title = (
                    entry.title[:maxlen] + "..."
                    if len(entry.title) > maxlen
                    else entry.title
                )
                print(
                    "Match found:",
                    truncated_title,
                    len(get_article_from_server(entry.link)),
                )

        time.sleep(5)

monitor("https://realpython.com/atom.xml")
feedparser and requests modules are required
$ pip install feedparser requests
use Ctrl+C to kill the script

can cache an article's contents and avoid hitting the network every five seconds
could use the @lru_cache decorator, but it won't recognize any updates to articles
the first time an article is accessed, the decorator will store its content and return the same data every time after
to fix that, you can make the cache entries expire

Evicting Cache Entries Based on Both Time and Space
the @lru_cache decorator evicts existing entries only when there's no more space to store new entries
with sufficient space, entries in the cache will live forever and never get refreshed

to solve the issue, you can update the cache implementation so entries expire after a specific time
can implement this idea in a new decorator that extends @lru_cache
timed_lru_cache.py

from functools import lru_cache, wraps
from datetime import datetime, timedelta

def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        #  wraps the decorated function with the lru_cache decorator
        # allows using the cache functionality already provided by lru_cache
        func = lru_cache(maxsize=maxsize)(func)
        # lifetime of the cache
        func.lifetime = timedelta(seconds=seconds)
        # expiration date-time
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            # check if cache entry is still valid
            if datetime.utcnow() >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime

            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache
Caching Articles With the New Decorator
decorated_monitor.py
from timed_lru_cache import timed_lru_cache
import feedparser
import requests
import ssl
import time

if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context

@timed_lru_cache(10)
def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def monitor(url):
    maxlen = 45
    while True:
        print("\nChecking feed...")
        feed = feedparser.parse(url)

        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                truncated_title = (
                    entry.title[:maxlen] + "..."
                    if len(entry.title) > maxlen
                    else entry.title
                )
                print(
                    "Match found:",
                    truncated_title,
                    len(get_article_from_server(entry.link)),
                )

        time.sleep(5)

monitor("https://realpython.com/atom.xml")