Caching and Its Uses
Caching is a technique you can use in applications to keep recent or often-used data in memory locations that are faster or computationally cheaper to access than their source. Apps such as newsreaders use caching to keep recently accessed remote files available locally.

Implementing a Cache Using a Python Dictionary
The example below shows a simple cache built on a Python dictionary. Fetching an article a second time returns the cached copy instead of hitting the server:

```python
# caching.py
import requests

cache = dict()

def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def get_article(url):
    print("Getting article...")
    if url not in cache:
        cache[url] = get_article_from_server(url)
    return cache[url]

get_article("https://realpython.com/sorting-algorithms-python/")
get_article("https://realpython.com/sorting-algorithms-python/")
```

Since requests isn't part of the standard library, install it before running the script:

```console
$ pip install requests
$ python caching.py
Getting article...
Fetching article from server...
Getting article...
```

Notice that even though get_article() runs twice, "Fetching article from server..." prints only once: the second call is served from the dictionary. The catch is that this cache has no eviction policy, so it grows without bound and never refreshes stale entries. That's where caching strategies come in.

Caching Strategies
Diving Into the Least Recently Used (LRU) Cache Strategy
A cache implemented using the LRU strategy organizes its items in order of use. Every time you access an entry, the LRU algorithm moves it to the top of the cache. This way, the algorithm can quickly identify the entry that's gone unused the longest by looking at the bottom of the list. The LRU strategy assumes that the more recently an object has been used, the more likely it will be needed in the future.
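To see the reordering in action, here's a tiny sketch using collections.OrderedDict. This only illustrates the ordering policy; it isn't how functools implements its cache, and the cache variable and its keys are made up for the example:

```python
from collections import OrderedDict

# Three cached entries, ordered from least to most recently used.
cache = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# Accessing "a" makes it the most recently used entry, so move it
# to the end of the ordering.
value = cache["a"]
cache.move_to_end("a")
print(list(cache))  # ['b', 'c', 'a']

# The least recently used entry now sits at the front, so that's
# the one to evict when the cache is full.
cache.popitem(last=False)  # removes "b"
print(list(cache))  # ['c', 'a']
```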
Peeking Behind the Scenes of the LRU Cache

One way to implement an LRU cache in Python is to use a combination of a doubly linked list and a hash map. The hash map guarantees fast access to every item in the cache by mapping each entry to its specific location in the doubly linked list. This strategy is very fast: accessing the least recently used item and updating the cache are operations with a runtime of O(1).
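Here's a minimal sketch of that structure, assuming a capacity-bounded cache with get() and put() operations. The LRUCache and _Node names are made up for this example; this is not the implementation behind functools.lru_cache:

```python
class _Node:
    """Doubly linked list node holding one cache entry."""
    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}           # key -> node: the hash map part
        self.head = _Node()     # sentinel on the most recently used side
        self.tail = _Node()     # sentinel on the least recently used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return None
        node = self.map[key]
        self._remove(node)       # O(1): unlink from its current spot
        self._add_to_front(node) # O(1): mark as most recently used
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._remove(self.map[key])
        node = _Node(key, value)
        self.map[key] = node
        self._add_to_front(node)
        if len(self.map) > self.capacity:
            lru = self.tail.prev  # least recently used entry
            self._remove(lru)
            del self.map[lru.key]

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recently used entry
cache.put("c", 3)      # evicts "b", the least recently used entry
print(cache.get("b"))  # None
```

Both get() and put() touch only a fixed number of node pointers plus one dictionary operation, which is where the O(1) behavior comes from.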
Measuring Efficiency With Big O Notation

Big O notation describes how an algorithm's cost grows as its input grows. A runtime of O(1) means constant time: the LRU cache can look up, reorder, and evict entries in the same amount of time no matter how many entries it holds.
Using @lru_cache to Implement an LRU Cache in Python

Python's functools module ships with an LRU implementation, the @lru_cache decorator. You can use this decorator to wrap functions and cache their results up to a maximum number of entries.
Playing With Stairs
Imagine you want to determine all the different ways you can reach a specific stair in a staircase by hopping one, two, or three stairs at a time. For example, there are 4 different ways to reach the third stair: three single hops, one stair then two, two stairs then one, or a single three-stair jump.
```python
# stairs.py

def steps_to(stair):
    if stair == 1:
        # You can reach the first stair with only a single step
        # from the floor.
        return 1
    elif stair == 2:
        # You can reach the second stair by jumping from the
        # floor with a single two-stair hop or by jumping a single
        # stair a couple of times.
        return 2
    elif stair == 3:
        # You can reach the third stair using four possible
        # combinations:
        # 1. Jumping all the way from the floor
        # 2. Jumping two stairs, then one
        # 3. Jumping one stair, then two
        # 4. Jumping one stair three times
        return 4
    else:
        # You can reach your current stair from three different places:
        # 1. From three stairs down
        # 2. From two stairs down
        # 3. From one stair down
        #
        # If you add up the number of ways of getting to those
        # three positions, then you should have your solution.
        return (
            steps_to(stair - 3)
            + steps_to(stair - 2)
            + steps_to(stair - 1)
        )

print(steps_to(4))
```

Running the script shows that there are 7 different combinations to reach the fourth stair. When there are 30 stairs, the number of possible combinations climbs to 53,798,080, and the recursion does a staggering amount of repeated work to count them, as the quick experiment below shows.
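To get a feel for why the naive recursion is so expensive, you can count how many times the function runs. This instrumented copy is only for illustration; call_count and counted_steps_to aren't part of stairs.py:

```python
call_count = 0

def counted_steps_to(stair):
    global call_count
    call_count += 1  # count every invocation, including recursive ones
    if stair == 1:
        return 1
    elif stair == 2:
        return 2
    elif stair == 3:
        return 4
    return (
        counted_steps_to(stair - 3)
        + counted_steps_to(stair - 2)
        + counted_steps_to(stair - 1)
    )

print(counted_steps_to(20), call_count)
```

Each call for a stair above 3 spawns three more calls, and the same subproblems get recomputed again and again, so even for stair 20 the function already runs tens of thousands of times.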
Timing Your Code

You can measure how long the code takes to run by using the timeit module. At the top of stairs.py, import repeat from the timeit module:

```python
from timeit import repeat
```

At the end of the file, add the following:

```python
setup_code = "from __main__ import steps_to"
stmt = "steps_to(30)"
times = repeat(setup=setup_code, stmt=stmt, repeat=3, number=10)
print(f"Minimum execution time: {min(times)}")
```

These added four lines time steps_to(30) over three repetitions of ten calls each and print the minimum execution time of the three.
Using Memoization to Improve the Solution
Memoization ensures that a function doesn't run for the same inputs more than once by storing its result in memory and referencing it later when necessary. Here are the changes to make to stairs.py:

```python
from functools import lru_cache
from timeit import repeat

@lru_cache
def steps_to(stair):
    if stair == 1:
    ...
```

Note that using @lru_cache without arguments works in Python 3.8 and later; on earlier versions, use @lru_cache() with parentheses.

The unedited script makes 30 calls to steps_to() (three repetitions of ten calls each) and takes over 12 seconds to complete. Using the LRU cache, only the first call actually computes the result; the rest are served from memory, and the script completes in 0.0002 seconds.
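Conceptually, @lru_cache is doing something like the following hand-rolled decorator, minus the size limit and eviction logic. The memoize name is illustrative, and this sketch only handles hashable positional arguments:

```python
from functools import wraps

def memoize(func):
    results = {}  # maps argument tuples to previously computed results

    @wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)  # compute once per distinct input
        return results[args]

    return wrapper
```

Applied to steps_to(), a decorator like this computes each distinct stair value once, which is what collapses the runtime from seconds to microseconds.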
Unpacking the Functionality of @lru_cache

The @lru_cache decorator offers a maxsize argument that defines the maximum number of entries the cache holds before it starts evicting old items:

```python
from functools import lru_cache
from timeit import repeat

@lru_cache(maxsize=16)
def steps_to(stair):
    if stair == 1:
    ...
```

You can use cache_info() to inspect the number of hits and misses, as well as the current size of the cache:

```python
print(steps_to.cache_info())
```

Running the script again produces output like this:

```console
Minimum execution time: 3.9998849388211966e-07
CacheInfo(hits=81, misses=30, maxsize=16, currsize=16)
```

The CacheInfo values tell you:
- hits=81 is the number of calls that @lru_cache returned directly from memory because they already existed in the cache.
- misses=30 is the number of calls that didn't come from memory and had to be computed.
- maxsize=16 is the size of the cache, as defined with the maxsize argument of the decorator.
- currsize=16 is the current size of the cache. In this case, it shows that the cache is full.
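Here's a small, self-contained demonstration of eviction with a tiny cache. The square function is made up for this example; with maxsize=2, the third distinct argument pushes out the least recently used entry:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(x):
    print(f"Computing {x} squared")
    return x * x

square(1)  # miss: computed and cached
square(2)  # miss: computed and cached
square(1)  # hit: returned from the cache
square(3)  # miss: cache is full, so the LRU entry (2) is evicted
print(square.cache_info())  # CacheInfo(hits=1, misses=3, maxsize=2, currsize=2)
```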
Adding Cache Expiration
Below is a script that monitors a website. The example monitors Real Python and prints the number of characters in any article that contains the word python. The monitor runs every five seconds:

```python
import feedparser
import requests
import ssl
import time

# This is a workaround to an issue when feedparser tries to access
# content served over HTTPS.
if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context

def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

# This function runs continuously.
def monitor(url):
    maxlen = 45
    while True:
        print("\nChecking feed...")
        # Loads and parses the feed from Real Python.
        feed = feedparser.parse(url)

        # Goes through the first 5 entries on the list.
        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                truncated_title = (
                    entry.title[:maxlen] + "..."
                    if len(entry.title) > maxlen
                    else entry.title
                )
                print(
                    "Match found:",
                    truncated_title,
                    len(get_article_from_server(entry.link)),
                )

        time.sleep(5)

monitor("https://realpython.com/atom.xml")
```

The feedparser and requests modules are required:

```console
$ pip install feedparser requests
```

Use Ctrl+C to kill the script.

You can cache each article's contents and avoid hitting the network every five seconds. You could use the @lru_cache decorator, but it won't recognize any updates to the articles: the first time an article is accessed, the decorator will store its content and return the same data every time after. Instead, you can set the cache entries to expire.

Evicting Cache Entries Based on Both Time and Space
The @lru_cache decorator evicts existing entries only when there's no more space to store new listings. With sufficient space, entries in the cache will live forever and never get refreshed. To solve this issue, you can update the cache implementation so that it expires after a specific time. You can implement this idea in a new decorator that extends @lru_cache:

```python
# timed_lru_cache.py
from datetime import datetime, timedelta
from functools import lru_cache, wraps

def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        # Wraps the decorated function with the lru_cache decorator,
        # reusing the cache functionality it already provides.
        func = lru_cache(maxsize=maxsize)(func)

        # Lifetime of the cache.
        func.lifetime = timedelta(seconds=seconds)

        # Expiration date and time.
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            # Check whether the cache entries are still valid. If not,
            # clear the cache and compute a new expiration time.
            if datetime.utcnow() >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime

            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache
```

Note that when the lifetime elapses, this decorator clears the entire cache rather than expiring entries one by one, which keeps the implementation simple.
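Before wiring the decorator into the monitor, you can sanity-check it on its own. The slow_double function is made up for this example; its cache entries expire after three seconds:

```python
import time

from timed_lru_cache import timed_lru_cache

@timed_lru_cache(3)
def slow_double(n):
    print(f"Computing {n} * 2")
    return n * 2

slow_double(7)  # computed and cached
slow_double(7)  # served from the cache, so nothing is printed
time.sleep(4)   # wait for the cache lifetime to elapse
slow_double(7)  # the cache has expired, so this is computed again
```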
Caching Articles With the New Decorator

You can now use the new decorator in the monitoring script. Save the updated version as decorated_monitor.py:
```python
# decorated_monitor.py
import feedparser
import requests
import ssl
import time

from timed_lru_cache import timed_lru_cache

if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context

# Cache the downloaded article for 10 seconds.
@timed_lru_cache(10)
def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def monitor(url):
    maxlen = 45
    while True:
        print("\nChecking feed...")
        feed = feedparser.parse(url)

        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                truncated_title = (
                    entry.title[:maxlen] + "..."
                    if len(entry.title) > maxlen
                    else entry.title
                )
                print(
                    "Match found:",
                    truncated_title,
                    len(get_article_from_server(entry.link)),
                )

        time.sleep(5)

monitor("https://realpython.com/atom.xml")
```

With the decorator in place, repeated checks within ten seconds reuse the cached article contents instead of fetching them from the server again.