Rediscovering Dictionary Order in Python |
a Python dictionary is an implementation of the hash table
a hash table is traditionally an unordered data structure
in Python 3.6 dictionaries started to preserve insertion order
from 3.7 the insertion order has been guaranteed
dictionaries preserve insertion order but they're not considered a sequence
a dictionary is like a set of key-value pairs, and sets are unordered
dictionaries also don't have much reordering functionality
an alternative for storing ordered key-value pair data is to store the pairs as a list of tuples |
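A minimal sketch of the points above, assuming Python 3.7 or later. The names `same_items` and `pairs` are only illustrative. It shows that a dictionary remembers insertion order, that the order plays no part in dictionary equality, and that a list of tuples behaves like a real sequence.

```python
# Insertion order is preserved (guaranteed from Python 3.7 onward).
people = {}
people[3] = "Jim"
people[1] = "Jill"
people[2] = "Jack"
print(list(people))  # [3, 1, 2]

# Order does not affect equality between two dictionaries.
same_items = {1: "Jill", 2: "Jack", 3: "Jim"}
print(people == same_items)  # True

# A list of tuples is a sequence, so order matters and indexing works.
pairs = list(people.items())
print(pairs == list(same_items.items()))  # False
print(pairs[0])  # (3, 'Jim')
```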
Understanding What Sorting a Dictionary Really Means |
sorting a dictionary is rarely done in-place
there are no methods for explicitly moving items in a dictionary
can use del and then re-insert a key to manually reorder a dictionary in-place (dictionaries have no .append() method)
the OrderedDict class has a specific method, .move_to_end(), to move an item to the end or the start
may make OrderedDict preferable for keeping a sorted dictionary
not very common
isn't very performant
the typical method for sorting dictionaries is
|
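A short sketch of the two manual reordering approaches mentioned above, reusing the `people` data that appears later in these notes. It is only an illustration of the mechanics, not a recommended way to sort.

```python
from collections import OrderedDict

people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

# Plain dict: delete the key, then insert it again so it lands at the end.
value = people[3]
del people[3]
people[3] = value
print(list(people))  # [2, 4, 1, 3]

# OrderedDict: move_to_end() handles both directions.
ordered = OrderedDict({3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"})
ordered.move_to_end(3)              # move key 3 to the end
ordered.move_to_end(1, last=False)  # move key 1 to the start
print(list(ordered))  # [1, 2, 4, 3]
```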
Sorting Dictionaries in Python |
Using the sorted() Function
the critical function used to sort dictionaries is the built-in sorted() function
the function takes an iterable as the main argument
it also takes two optional keyword-only arguments, key and reverse
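A quick sketch of the call signature described above. sorted() accepts any iterable and always returns a new list, and the two optional arguments must be passed by keyword. The variable names are illustrative only.

```python
letters = ("c", "a", "b")  # a tuple, not a list

print(sorted(letters))                       # ['a', 'b', 'c']
print(sorted("hello"))                       # ['e', 'h', 'l', 'l', 'o']
print(sorted({3, 1, 2}, reverse=True))       # [3, 2, 1]
print(sorted(["bb", "a", "ccc"], key=len))   # ['a', 'bb', 'ccc']

# key and reverse are keyword-only: passing key positionally fails.
positional_error = None
try:
    sorted(["bb", "a"], len)
except TypeError as error:
    positional_error = error
    print(f"TypeError: {error}")
```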
it sorts numbers in ascending order and strings in alphabetical order
    >>> numbers = [5, 3, 4, 3, 6, 7, 3, 2, 3, 4, 1]
    >>> sorted(numbers)
    [1, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7]
    >>> words = ["aa", "ab", "ac", "ba", "cb", "ca"]
    >>> sorted(words)
    ['aa', 'ab', 'ac', 'ba', 'ca', 'cb']
one optional argument to sorted() is a callback function
the function acts as a sort key
the function gets called for each element in the iterable
    >>> def select_second_character(word):
    ...     return word[1]
    ...
    >>> sorted(words, key=select_second_character)
    ['aa', 'ba', 'ca', 'ab', 'cb', 'ac']
sorted() vs. sort()
the arguments are identical
the list method .sort() sorts the list in-place
sorted() returns a new list, leaving the original unchanged
the other argument, reverse, determines whether the result is sorted in descending order
    >>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
    >>> sorted(people)
    [1, 2, 3, 4]
    >>> sorted(people, reverse=True)
    [4, 3, 2, 1]
can also use the built-in reversed() function
    >>> list(reversed([1, 2, 3]))
    [3, 2, 1]
Getting Keys, Values, or Both From a Dictionary
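Dictionary views stay linked to the dictionary they come from. A minimal sketch of that behaviour, with illustrative variable names:

```python
people = {3: "Jim", 2: "Jack"}

# Views are created once...
items = people.items()
keys = people.keys()
values = people.values()

# ...and later changes to the dictionary show up in them.
people[1] = "Jill"
print(list(items))   # [(3, 'Jim'), (2, 'Jack'), (1, 'Jill')]
print(list(keys))    # [3, 2, 1]
print(list(values))  # ['Jim', 'Jack', 'Jill']

# sorted() accepts a view and returns an independent list.
print(sorted(keys))  # [1, 2, 3]
```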
the dict.items() method returns a read-only dictionary view object
the view is an iterable of tuples representing key-value pairs
view is not a copy or a list
view is linked to the dictionary
any updates to the dictionary are reflected in the view
can use the .values() method to get a view of only the values
can use the .keys() method to get a view of only the keys
can use the sorted() function with views
results in a sorted list of tuples
each tuple represents a key-value pair of the dictionary
result is sorted by keys
    >>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
    >>> sorted(people.items())
    [(1, 'Jill'), (2, 'Jack'), (3, 'Jim'), (4, 'Jane')]
Understanding How Python Sorts Tuples
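Python uses lexicographical sorting when sorting tuples
it compares the first elements, and only looks at the next elements when there is a tie
so a list of (key, value) tuples is sorted by key unless a callback function (sort key) is passed as an argument to sorted() or .sort()
Using the key Parameter and Lambda Functions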
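Because tuples compare element by element, a sort key can itself return a tuple to sort by more than one criterion. A small sketch, assuming an extra illustrative entry (`5: "Al"`) that is not part of the original data:

```python
# Tuples compare lexicographically: first elements, then ties fall through.
print((1, "b") < (2, "a"))  # True, decided by the first elements
print((1, "a") < (1, "b"))  # True, tie on the first, decided by the second

people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill", 5: "Al"}

# Sort by name length first, then alphabetically by name.
by_length_then_name = sorted(
    people.items(), key=lambda item: (len(item[1]), item[1])
)
print(by_length_then_name)
# [(5, 'Al'), (3, 'Jim'), (2, 'Jack'), (4, 'Jane'), (1, 'Jill')]
```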
when sorting, the key argument has nothing to do with dictionary keys
the key argument can be a defined function or a lambda function
    >>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
    >>> # Sort key
    >>> def value_getter(item):
    ...     return item[1]
    ...
    >>> sorted(people.items(), key=value_getter)
    [(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]
    >>> # Or with a lambda function
    >>> sorted(people.items(), key=lambda item: item[1])
    [(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]
the purpose of both is to return something which can be compared
above, the returned value is the second element of each tuple
Selecting a Nested Value With a Sort Key
can use a sort key to select nested values which may or may not be present
the sort key returns a default value if they're not present
    data = {
        193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
        209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
        746: {"name": "Jane", "age": 58, "skills": {"js": 2, "python": 5}},
        109: {"name": "Jill", "age": 83, "skills": {"java": 10}},
        984: {"name": "Jack", "age": 28, "skills": {"c": 8, "assembly": 7}},
        765: {"name": "Penelope", "age": 76, "skills": {"python": 8, "go": 5}},
        598: {"name": "Sylvia", "age": 62, "skills": {"bash": 8, "java": 7}},
        483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
        277: {"name": "Beatriz", "age": 26, "skills": {"python": 2, "js": 4}},
    }

    def get_relevant_skills(item):
        """Get the sum of Python and JavaScript skill"""
        skills = item[1]["skills"]

        # Return default value that is equivalent to no skill
        return skills.get("python", 0) + skills.get("js", 0)

    print(sorted(data.items(), key=get_relevant_skills, reverse=True))
Converting Back to a Dictionary
can iterate over the result with a for loop and populate a dictionary on each iteration
provides absolute control and flexibility
    >>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
    >>> sorted_people = sorted(people.items(), key=lambda item: item[1])
    >>> sorted_people_dict = {}
    >>> for key, value in sorted_people:
    ...     sorted_people_dict[key] = value
    ...
    >>> sorted_people_dict
    {2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}
if there aren't any special requirements for constructing the dictionary, then can use a dictionary constructor
    >>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
    >>> sorted_people = sorted(people.items(), key=lambda item: item[1])
    >>> dict(sorted_people)
    {2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}
can use a dictionary comprehension
only makes sense if changing the shape of the dictionary or swapping the keys and values
    >>> {
    ...     value: key
    ...     for key, value in sorted(people.items(), key=lambda item: item[1])
    ... }
    {'Jack': 2, 'Jane': 4, 'Jill': 1, 'Jim': 3} |
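The sort-then-rebuild steps above can be wrapped in a small helper. This is a sketch only: `sort_dict_by_value` is a hypothetical name, not something from the standard library, and it assumes Python 3.7+ so that the rebuilt dictionary keeps the sorted order.

```python
def sort_dict_by_value(mapping, reverse=False):
    """Return a new dict whose items are ordered by value."""
    sorted_items = sorted(
        mapping.items(), key=lambda item: item[1], reverse=reverse
    )
    return dict(sorted_items)


people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

print(sort_dict_by_value(people))
# {2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}
print(sort_dict_by_value(people, reverse=True))
# {3: 'Jim', 1: 'Jill', 4: 'Jane', 2: 'Jack'}

# Sorting by key needs no sort key at all.
print(dict(sorted(people.items())))
# {1: 'Jill', 2: 'Jack', 3: 'Jim', 4: 'Jane'}
```

The original dictionary is left untouched, which mirrors how sorted() itself behaves.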
Considering Strategic and Performance Issues |
Using Special Getter Functions to Increase Performance and Readability
the sort key functions used so far don't do much
all the function does is get a value from a tuple
this common pattern led to Python providing a way to create special getter functions
special functions get values faster than regular functions
the itemgetter() function can produce highly efficient versions of getter functions
pass itemgetter() an argument, typically the key or index position to select
the itemgetter() function will then return a getter object
call the getter object like a function
the getter object will call the .__getitem__() method
a call to .__getitem__() needs the key or index of what to get
the argument that's used for .__getitem__() is the same argument passed to itemgetter()
    >>> item = ("name", "Guido")
    >>> from operator import itemgetter
    >>> getter = itemgetter(0)
    >>> getter(item)
    'name'
    >>> getter = itemgetter(1)
    >>> getter(item)
    'Guido'
make the first getter by passing 0 as an argument to itemgetter()
when the resultant getter receives the tuple, it returns the first item in the tuple
call itemgetter() with an argument of 1, and it gets the value at index position 1
can use the itemgetter as a key for the sorted() function
an index that is out of range for the tuples raises an exception
    >>> from operator import itemgetter
    >>> fruit_inventory = [
    ...     ("banana", 5), ("orange", 15), ("apple", 3), ("kiwi", 0)
    ... ]
    >>> # Sort by key
    >>> sorted(fruit_inventory, key=itemgetter(0))
    [('apple', 3), ('banana', 5), ('kiwi', 0), ('orange', 15)]
    >>> # Sort by value
    >>> sorted(fruit_inventory, key=itemgetter(1))
    [('kiwi', 0), ('apple', 3), ('banana', 5), ('orange', 15)]
    >>> sorted(fruit_inventory, key=itemgetter(2))
    Traceback (most recent call last):
      File "<input>", line 1, in
Measuring Performance When Using itemgetter()
can use the timeit module to compare performance of sorted() using a lambda expression versus using itemgetter() as the sort key
    from timeit import timeit

    dict_to_order = {
        1: "requests",
        2: "pip",
        3: "jinja",
        4: "setuptools",
        5: "pandas",
        6: "numpy",
        7: "black",
        8: "pillow",
        9: "pyparsing",
        10: "boto3",
        11: "botocore",
        12: "urllib3",
        13: "s3transfer",
        14: "six",
        15: "python-dateutil",
        16: "pyyaml",
        17: "idna",
        18: "certifi",
        19: "typing-extensions",
        20: "charset-normalizer",
        21: "awscli",
        22: "wheel",
        23: "rsa",
    }

    sorted_with_lambda = "sorted(dict_to_order.items(), key=lambda item: item[1])"
    sorted_with_itemgetter = "sorted(dict_to_order.items(), key=itemgetter(1))"

    sorted_with_lambda_time = timeit(stmt=sorted_with_lambda, globals=globals())
    sorted_with_itemgetter_time = timeit(
        stmt=sorted_with_itemgetter,
        setup="from operator import itemgetter",
        globals=globals(),
    )

    print(
        f"""\
    {sorted_with_lambda_time=:.2f} seconds
    {sorted_with_itemgetter_time=:.2f} seconds
    itemgetter is {(sorted_with_lambda_time / sorted_with_itemgetter_time):.2f} times faster"""
    )
running this script from the shell should give similar results to below
    $ python compare_lambda_vs_getter.py
    sorted_with_lambda_time=1.81 seconds
    sorted_with_itemgetter_time=1.29 seconds
    itemgetter is 1.41 times faster
Judging Whether You Want to Use a Sorted Dictionary
if data is to be added to a dictionary and it needs to stay sorted, might be better off using a structure like a list of tuples or a list of dictionaries
    # Dictionary
    people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

    # List of tuples
    people = [
        (3, "Jim"),
        (2, "Jack"),
        (4, "Jane"),
        (1, "Jill"),
    ]

    # List of dictionaries
    people = [
        {"id": 3, "name": "Jim"},
        {"id": 2, "name": "Jack"},
        {"id": 4, "name": "Jane"},
        {"id": 1, "name": "Jill"},
    ]
a list of dictionaries is the most widespread pattern because of its cross-language compatibility
another option is to simply not worry about ordering the data if it's not needed
Comparing the Performance of Different Data Structures
if performance is a consideration then should carefully consider what will be done with the dictionary
need to consider
Comparing the Performance of Sorting
use timeit to compare the time it takes to sort the two data structures by the age attribute
    from timeit import timeit
    from samples import dictionary_of_dictionaries, list_of_dictionaries

    sorting_list = "sorted(list_of_dictionaries, key=lambda item: item['age'])"
    sorting_dict = """
    dict(
        sorted(
            dictionary_of_dictionaries.items(), key=lambda item: item[1]['age']
        )
    )
    """

    sorting_list_time = timeit(stmt=sorting_list, globals=globals())
    sorting_dict_time = timeit(stmt=sorting_dict, globals=globals())

    print(
        f"""\
    {sorting_list_time=:.2f} seconds
    {sorting_dict_time=:.2f} seconds
    list is {(sorting_dict_time/sorting_list_time):.2f} times faster"""
    )
code imports the sample data structures for sorting on the age attribute
it's necessary for these samples to be in the global namespace so that the timeit context has access to them
the script's result
    $ python compare_sorting_dict_vs_list.py
    sorting_list_time=1.15 seconds
    sorting_dict_time=2.26 seconds
    list is 1.95 times faster
Comparing the Performance of Lookups
if the plan is to sort the data once and then use the dictionary mainly for lookups, then a dictionary will definitely make more sense than a list
    from timeit import timeit
    from samples import dictionary_of_dictionaries, list_of_dictionaries

    lookups = [15, 18, 19, 16, 6, 12, 5, 3, 9, 20, 2, 10, 13, 17, 4, 14, 11, 7, 8]

    list_setup = """
    def get_key_from_list(key):
        for item in list_of_dictionaries:
            if item["id"] == key:
                return item
    """

    lookup_list = """
    for key in lookups:
        get_key_from_list(key)
    """

    lookup_dict = """
    for key in lookups:
        dictionary_of_dictionaries[key]
    """

    lookup_list_time = timeit(stmt=lookup_list, setup=list_setup, globals=globals())
    lookup_dict_time = timeit(stmt=lookup_dict, globals=globals())

    print(
        f"""\
    {lookup_list_time=:.2f} seconds
    {lookup_dict_time=:.2f} seconds
    dict is {(lookup_list_time / lookup_dict_time):.2f} times faster"""
    )
this code makes a series of lookups to both the list and the dictionary
note that with the list a special function must be written to make a lookup
the function to make the list lookup iterates the list elements until the target element is found
the script result
    $ python compare_lookup_dict_vs_list.py
    lookup_list_time=6.73 seconds
    lookup_dict_time=0.38 seconds
    dict is 17.83 times faster |
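When the data arrives as a list of dictionaries but lookups dominate, one option is to build a dictionary index once and use that for lookups. A minimal sketch: the small `list_of_dictionaries` here is stand-in data (the contents of the `samples` module are not shown in these notes), and `by_id` is an illustrative name.

```python
# Stand-in data; the real samples module is assumed to look similar.
list_of_dictionaries = [
    {"id": 3, "name": "Jim"},
    {"id": 2, "name": "Jack"},
    {"id": 4, "name": "Jane"},
    {"id": 1, "name": "Jill"},
]


def get_key_from_list(key):
    """Linear search: checks items one by one until the id matches."""
    for item in list_of_dictionaries:
        if item["id"] == key:
            return item
    return None


# Build the index once; each later lookup is a single hash lookup.
by_id = {item["id"]: item for item in list_of_dictionaries}

print(get_key_from_list(4))  # {'id': 4, 'name': 'Jane'}
print(by_id[4])              # {'id': 4, 'name': 'Jane'}
print(by_id.get(99))         # None, the id is not present
```

The index holds references to the same inner dictionaries, so it costs one pass over the list and some extra memory for the keys.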