Sorting a Dictionary

Rediscovering Dictionary Order in Python

a Python dictionary is an implementation of the hash table
a hash table is traditionally an unordered data structure
in Python 3.6 (CPython) dictionaries started to preserve insertion order as an implementation detail
from 3.7 insertion order is guaranteed by the language
dictionaries preserve insertion order but they're not considered a sequence
a dictionary is like a set of key-value pairs, and sets are unordered
dictionaries also don't have much reordering functionality

an alternative for storing ordered key-value data is to store the pairs as a list of tuples
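
a minimal sketch of the points above, assuming Python 3.7 or later; the variable names are illustrative only. it shows that a dictionary keeps insertion order but ignores it when comparing, while a list of tuples is a real sequence where order counts

```python
# Plain dicts keep insertion order (guaranteed from Python 3.7),
# but equality between dicts ignores that order.
first = {"b": 2, "a": 1}
second = {"a": 1, "b": 2}

keys_in_insertion_order = list(first)  # ['b', 'a']
dicts_are_equal = first == second      # True: order is not compared

# The same pairs as a list of tuples form a sequence,
# so order matters and positional indexing works.
pairs = [("b", 2), ("a", 1)]
lists_are_equal = pairs == [("a", 1), ("b", 2)]  # False: order differs
first_pair = pairs[0]                            # ('b', 2)

print(keys_in_insertion_order, dicts_are_equal, lists_are_equal, first_pair)
```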

Understanding What Sorting a Dictionary Really Means

sorting a dictionary is rarely done in-place
there are no methods for explicitly moving items in a dictionary
can use del and then re-add the item (which places it at the end) to manually sort a dictionary in-place
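
a rough sketch of what sorting in-place by deleting and re-adding items could look like. this is only an illustration of the idea, not a recommended technique

```python
people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

# sorted(people) returns a separate list of keys, so it is safe
# to modify the dictionary while looping over that list.
for key in sorted(people):
    value = people[key]
    del people[key]      # remove the item from its current position
    people[key] = value  # re-adding it places it at the end

print(people)  # {1: 'Jill', 2: 'Jack', 3: 'Jim', 4: 'Jane'}
```

the dictionary object is the same one throughout, only the order of its items changes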

the OrderedDict class has a specific method, .move_to_end(), to move an item to the end or the start
this may make OrderedDict preferable for keeping a sorted dictionary
but keeping a dictionary sorted this way is not very common
and it isn't very performant
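
a small sketch of moving items in an OrderedDict with .move_to_end(), using the same sample data as elsewhere in these notes

```python
from collections import OrderedDict

people = OrderedDict({3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"})

people.move_to_end(3)              # move key 3 to the end
people.move_to_end(1, last=False)  # move key 1 to the start

key_order = list(people)
print(key_order)  # [1, 2, 4, 3]
```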

the typical method for sorting dictionaries is

get a dictionary view
sort it
cast the resulting list back into a dictionary

depending on the use case, it may not be necessary to convert the list back into a dictionary

Sorting Dictionaries in Python

Using the sorted() Function
sorted() vs. sort()
Getting Keys, Values, or Both From a Dictionary
Understanding How Python Sorts Tuples
Using the key Parameter and Lambda Functions
Selecting a Nested Value With a Sort Key
Converting Back to a Dictionary

Using the sorted() Function

the critical function used to sort dictionaries is the built-in sorted() function
function takes an iterable as the main argument
two optional keyword-only arguments

a key function
a reverse Boolean value

sorts comparable elements like numbers in ascending order
it sorts strings in alphabetical order

>>> numbers = [5, 3, 4, 3, 6, 7, 3, 2, 3, 4, 1]
>>> sorted(numbers)
[1, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7]

>>> words = ["aa", "ab", "ac", "ba", "cb", "ca"]
>>> sorted(words)
['aa', 'ab', 'ac', 'ba', 'ca', 'cb']

one optional argument to sorted() is a callback function, passed as key
function acts as a sort key
the function gets called for each element in the iterable

>>> def select_second_character(word):
...     return word[1]
...
>>> sorted(words, key=select_second_character)
['aa', 'ba', 'ca', 'ab', 'cb', 'ac']

sorted() vs. sort()

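the key and reverse arguments are identical
.sort() is a list method that sorts the list in-place and returns None
sorted() is a built-in function that accepts any iterable and returns a new list, leaving the original unchanged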
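
a short sketch of the difference, with illustrative variable names

```python
numbers = [5, 3, 4, 1]

new_list = sorted(numbers)  # builds and returns a new sorted list
unchanged = list(numbers)   # copy taken now: the original is still unsorted

result = numbers.sort()     # sorts the list in place and returns None

print(new_list, unchanged, numbers, result)
```

since .sort() returns None, writing something like numbers = numbers.sort() would lose the list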

the reverse argument determines whether the result is in ascending or descending order

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted(people)
[1, 2, 3, 4]

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted(people, reverse=True)
[4, 3, 2, 1]

can also use the built-in reversed() function, which returns an iterator

>>> list(reversed([1, 2, 3]))
[3, 2, 1]

Getting Keys, Values, or Both From a Dictionary

the dict.items() method returns a read-only dictionary view object
an iterable of tuples representing key-value pairs
view is not a copy or a list
view is linked to the dictionary
any updates to the dictionary are reflected in the view
can use the .values() method to get a view of only the values
can use the .keys() method to get a view of only the keys
can use the sorted() function with views
results in a sorted list of tuples
each tuple represents a key-value pair of the dictionary
result is sorted by keys

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted(people.items())
[(1, 'Jill'), (2, 'Jack'), (3, 'Jim'), (4, 'Jane')]
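
a minimal sketch of how a view stays linked to its dictionary while the list returned by sorted() does not; the variable names are illustrative only

```python
people = {3: "Jim", 2: "Jack"}
items_view = people.items()

people[1] = "Jill"  # change the dictionary after creating the view

view_now = list(items_view)           # the view already includes the new item
sorted_snapshot = sorted(items_view)  # a plain list, no longer linked

people[4] = "Jane"  # the view sees this change, the sorted list does not

print(view_now, sorted_snapshot, len(items_view), len(sorted_snapshot))
```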

Understanding How Python Sorts Tuples

Python uses lexicographical ordering when comparing tuples: the first elements are compared, and later elements only break ties
so sorting dictionary items sorts by key, unless a callback function (sort key) is passed as an argument to sorted() or .sort()
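
a small sketch of how tuples compare element by element, which is why sorting key-value tuples without a sort key orders them by the first element

```python
# Tuples are compared left to right, one element at a time.
first_decides = (1, "Jill") < (2, "Jack")  # True: 1 < 2 settles it
tie_broken = (1, "Jill") < (1, "Jim")      # True: 1 == 1, then "Jill" < "Jim"

pairs = [(2, "b"), (1, "z"), (2, "a"), (1, "a")]
sorted_pairs = sorted(pairs)

print(first_decides, tie_broken, sorted_pairs)
# True True [(1, 'a'), (1, 'z'), (2, 'a'), (2, 'b')]
```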

Using the key Parameter and Lambda Functions

when sorting, the key argument has nothing to do with dictionary keys
the key argument can be a defined function or a lambda function

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

>>> # Sort key
>>> def value_getter(item):
...     return item[1]
...

>>> sorted(people.items(), key=value_getter)
[(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]

>>> # Or with a lambda function
>>> sorted(people.items(), key=lambda item: item[1])
[(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]

the purpose of both is to return something which can be compared
in the examples above, the returned value is the second element of each tuple

Selecting a Nested Value With a Sort Key

can use a sort key to select nested values which may or may not be present
returns a default value if they're not present

data = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    746: {"name": "Jane", "age": 58, "skills": {"js": 2, "python": 5}},
    109: {"name": "Jill", "age": 83, "skills": {"java": 10}},
    984: {"name": "Jack", "age": 28, "skills": {"c": 8, "assembly": 7}},
    765: {"name": "Penelope", "age": 76, "skills": {"python": 8, "go": 5}},
    598: {"name": "Sylvia", "age": 62, "skills": {"bash": 8, "java": 7}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
    277: {"name": "Beatriz", "age": 26, "skills": {"python": 2, "js": 4}},
}

def get_relevant_skills(item):
    """Get the sum of Python and JavaScript skill"""
    skills = item[1]["skills"]

    # Return default value that is equivalent to no skill
    return skills.get("python", 0) + skills.get("js", 0)

print(sorted(data.items(), key=get_relevant_skills, reverse=True))

Converting Back to a Dictionary

can iterate over the result with a for loop and populate a dictionary on each iteration
provides absolute control and flexibility

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted_people = sorted(people.items(), key=lambda item: item[1])

>>> sorted_people_dict = {}
>>> for key, value in sorted_people:
...     sorted_people_dict[key] = value
...

>>> sorted_people_dict
{2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}

if there aren't any special requirements for constructing the dictionary, then can use a dictionary constructor

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted_people = sorted(people.items(), key=lambda item: item[1])
>>> dict(sorted_people)
{2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}

can use a dictionary comprehension
only makes sense if changing the shape of the dictionary or swapping the keys and values

>>> {
...     value: key
...     for key, value in sorted(people.items(), key=lambda item: item[1])
... }
...
{'Jack': 2, 'Jane': 4, 'Jill': 1, 'Jim': 3}

Considering Strategic and Performance Issues

Using Special Getter Functions to Increase Performance and Readability
Measuring Performance When Using itemgetter()
Judging Whether You Want to Use a Sorted Dictionary
Comparing the Performance of Different Data Structures
Comparing the Performance of Sorting
Comparing the Performance of Lookups

Using Special Getter Functions to Increase Performance and Readability

the sort key functions used so far don't do much
all the function does is get a value from a tuple
this pattern is common enough that Python provides a way to create special getter functions
these special functions get values faster than regular functions

the itemgetter() function can produce highly efficient versions of getter functions
pass itemgetter() an argument
typically the key or index position to select
the itemgetter() function will then return a getter object
call the getter object like a function

the getter object will call the .__getitem__() method
a call to .__getitem__() needs the key or index of what to get
the argument that's used for .__getitem__() is the same argument passed to itemgetter()

>>> item = ("name", "Guido")

>>> from operator import itemgetter

>>> getter = itemgetter(0)
>>> getter(item)
'name'
>>> getter = itemgetter(1)
>>> getter(item)
'Guido'

make the first getter by passing 0 as an argument to itemgetter()
when the resultant getter receives the tuple, it returns the first item in the tuple
when itemgetter() is called with an argument of 1, the resulting getter returns the value at index position 1
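
since the argument can be a key as well as an index, a getter also works on dictionaries. a minimal sketch, reusing sample names from these notes, comparing the getter with a direct .__getitem__() call and using it as a sort key for a list of dictionaries

```python
from operator import itemgetter

person = {"name": "Jill", "age": 83}

get_age = itemgetter("age")
age_via_getter = get_age(person)
age_via_dunder = person.__getitem__("age")  # the same lookup the getter performs
age_via_subscript = person["age"]           # the usual spelling of that lookup

people = [
    {"id": 3, "name": "Jim"},
    {"id": 2, "name": "Jack"},
    {"id": 4, "name": "Jane"},
    {"id": 1, "name": "Jill"},
]
# Sort the list of dictionaries by the value stored under "name".
ids_by_name = [p["id"] for p in sorted(people, key=itemgetter("name"))]

print(age_via_getter, age_via_dunder, age_via_subscript, ids_by_name)
```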

can use the itemgetter as a key for the sorted() function

>>> from operator import itemgetter

>>> fruit_inventory = [
...     ("banana", 5), ("orange", 15), ("apple", 3), ("kiwi", 0)
... ]

>>> # Sort by key
>>> sorted(fruit_inventory, key=itemgetter(0))
[('apple', 3), ('banana', 5), ('kiwi', 0), ('orange', 15)]

>>> # Sort by value
>>> sorted(fruit_inventory, key=itemgetter(1))
[('kiwi', 0), ('apple', 3), ('banana', 5), ('orange', 15)]

>>> sorted(fruit_inventory, key=itemgetter(2))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    sorted(fruit_inventory, key=itemgetter(2))
IndexError: tuple index out of range

Measuring Performance When Using itemgetter()

can use the timeit module to compare performance of sorted() using a lambda expression versus using itemgetter() as the sort key

from timeit import timeit

dict_to_order = {
    1: "requests",
    2: "pip",
    3: "jinja",
    4: "setuptools",
    5: "pandas",
    6: "numpy",
    7: "black",
    8: "pillow",
    9: "pyparsing",
    10: "boto3",
    11: "botocore",
    12: "urllib3",
    13: "s3transfer",
    14: "six",
    15: "python-dateutil",
    16: "pyyaml",
    17: "idna",
    18: "certifi",
    19: "typing-extensions",
    20: "charset-normalizer",
    21: "awscli",
    22: "wheel",
    23: "rsa",
}

sorted_with_lambda = "sorted(dict_to_order.items(), key=lambda item: item[1])"
sorted_with_itemgetter = "sorted(dict_to_order.items(), key=itemgetter(1))"

sorted_with_lambda_time = timeit(stmt=sorted_with_lambda, globals=globals())
sorted_with_itemgetter_time = timeit(
    stmt=sorted_with_itemgetter,
    setup="from operator import itemgetter",
    globals=globals(),
)

print(
    f"""\
{sorted_with_lambda_time=:.2f} seconds
{sorted_with_itemgetter_time=:.2f} seconds
itemgetter is {(
    sorted_with_lambda_time / sorted_with_itemgetter_time
):.2f} times faster"""
)

running this script from the shell should give similar results to below

$ python compare_lambda_vs_getter.py
sorted_with_lambda_time=1.81 seconds
sorted_with_itemgetter_time=1.29 seconds
itemgetter is 1.41 times faster

Judging Whether You Want to Use a Sorted Dictionary

if data is to be added to a dictionary and it needs to stay sorted, might be better off using a structure like

a list of tuples
a list of dictionaries

# Dictionary
people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

# List of tuples
people = [
    (3, "Jim"),
    (2, "Jack"),
    (4, "Jane"),
    (1, "Jill"),
]

# List of dictionaries
people = [
    {"id": 3, "name": "Jim"},
    {"id": 2, "name": "Jack"},
    {"id": 4, "name": "Jane"},
    {"id": 1, "name": "Jill"},
]

a list of dictionaries is the most widespread pattern because of its cross-language compatibility
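
one possible way to keep a list of tuples sorted as items are added is the standard library's bisect module. this is an assumption on my part, the notes above don't mention bisect, so treat it as a sketch only

```python
from bisect import insort

# Assumption: the list is already sorted by id (the first tuple element).
people = [(1, "Jill"), (2, "Jack"), (4, "Jane")]

# insort() finds the right position with a binary search and inserts there,
# so the list stays sorted without re-sorting everything.
insort(people, (3, "Jim"))

print(people)  # [(1, 'Jill'), (2, 'Jack'), (3, 'Jim'), (4, 'Jane')]
```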

another option is to simply not worry about ordering the data if it's not needed

Comparing the Performance of Different Data Structures

if performance is a consideration, then carefully consider what will be done with the dictionary
questions to consider

will the dictionary be sorted once and then used for lots of lookups?
will the dictionary be sorted many times with very few lookups?

once the likely usage patterns for the data structure are known, can use the timeit module to test the performance

Comparing the Performance of Sorting

use timeit to compare the time it takes to sort a list of dictionaries and a dictionary of dictionaries by the age attribute

from timeit import timeit
from samples import dictionary_of_dictionaries, list_of_dictionaries

sorting_list = "sorted(list_of_dictionaries, key=lambda item:item['age'])"
sorting_dict = """
dict(
    sorted(
        dictionary_of_dictionaries.items(), key=lambda item: item[1]['age']
    )
)
"""

sorting_list_time = timeit(stmt=sorting_list, globals=globals())
sorting_dict_time = timeit(stmt=sorting_dict, globals=globals())

print(
    f"""\
{sorting_list_time=:.2f} seconds
{sorting_dict_time=:.2f} seconds
list is {(sorting_dict_time/sorting_list_time):.2f} times faster"""
)

code imports the sample data structures for sorting on the age attribute
it's necessary for these samples to be in the global namespace so that the timeit context has access to them
the script's result

$ python compare_sorting_dict_vs_list.py
sorting_list_time=1.15 seconds
sorting_dict_time=2.26 seconds
list is 1.95 times faster
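
the samples module itself isn't shown in these notes. a hypothetical stand-in, with made-up placeholder names and ages, could look like the sketch below; it only assumes that each record has an id, a name, and an age, and that ids run from 1 to 20

```python
# samples.py (hypothetical stand-in, NOT the original sample data)
import random

random.seed(0)  # fixed seed so repeated runs use the same placeholder ages

# A list of records, each carrying its own id.
list_of_dictionaries = [
    {"id": i, "name": f"person_{i}", "age": random.randint(18, 90)}
    for i in range(1, 21)
]

# The same records keyed by id, so the id is the dictionary key.
dictionary_of_dictionaries = {
    item["id"]: {"name": item["name"], "age": item["age"]}
    for item in list_of_dictionaries
}
```

timings obtained with this placeholder data will differ from the results recorded in these notes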

Comparing the Performance of Lookups

if the data is sorted once and then used mainly for lookups, a dictionary will definitely make more sense than a list

from timeit import timeit
from samples import dictionary_of_dictionaries, list_of_dictionaries

lookups = [15, 18, 19, 16, 6, 12, 5, 3, 9, 20, 2, 10, 13, 17, 4, 14, 11, 7, 8]

list_setup = """
def get_key_from_list(key):
    for item in list_of_dictionaries:
        if item["id"] == key:
            return item
"""

lookup_list = """
for key in lookups:
    get_key_from_list(key)
"""

lookup_dict = """
for key in lookups:
    dictionary_of_dictionaries[key]
"""

lookup_list_time = timeit(stmt=lookup_list, setup=list_setup, globals=globals())
lookup_dict_time = timeit(stmt=lookup_dict, globals=globals())

print(
    f"""\
{lookup_list_time=:.2f} seconds
{lookup_dict_time=:.2f} seconds
dict is {(lookup_list_time / lookup_dict_time):.2f} times faster"""
)

this code makes a series of lookups to both the list and the dictionary
note that with the list a special function must be written to make a lookup
the function to make the list lookup iterates the list elements until the target element is found
the script result

$ python compare_lookup_dict_vs_list.py
lookup_list_time=6.73 seconds
lookup_dict_time=0.38 seconds
dict is 17.83 times faster