Python Topics : TOML - Tom's Obvious Minimal Language
Use TOML as a Configuration Format
TOML: Tom's Obvious Minimal Language
TOML is a fairly new format
the first format specification, version 0.1.0, was released in 2013
focused on being a minimal configuration file format that's human-readable
[user]
player_x.color = "blue"
player_o.color = "green"

[constant]
board_size = 3

[server]
url = "https://tictactoe.example.com"
TOML has a specification which spells out precisely what's allowed and how different values should be interpreted
TOML is restrictive in a few aspects
  • all keys are interpreted as strings
  • TOML has no null type
  • some whitespace is significant
    this makes it harder to compress (minify) TOML documents
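A minimal sketch of the first two restrictions, assuming Python 3.11+ where tomllib is in the standard library (with tomli as a fallback). The key names are made up for illustration.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# A bare key made of digits is still loaded as a string key.
doc = tomllib.loads('1 = "one"')
print(doc)

# TOML has no null type, so a bare `null` value is rejected by the parser.
null_error = None
try:
    tomllib.loads("missing = null")
except tomllib.TOMLDecodeError as err:
    null_error = err
    print(f"rejected: {err}")
```

A common workaround for the missing null type is to leave the key out and handle the default in the application.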
TOML Schema Validation
can validate the TOML configuration manually in simple applications
below, the config file has already been parsed into a dictionary named config
the code shows how the dictionary is validated with structural pattern matching
match config:
    case {
        "user": {"player_x": {"color": str()}, "player_o": {"color": str()}},
        "constant": {"board_size": int()},
        "server": {"url": str()},
    }:
        pass
    case _:
        raise ValueError(f"invalid configuration: {config}")
this approach may not scale well if the TOML document is more complicated
a better alternative is to use pydantic
utilizes type annotations to do data validation at runtime
one advantage of pydantic is that it has precise and helpful error messages built in

Get to Know TOML: Key-Value Pairs

TOML is built around key-value pairs that map nicely to hash table data structures
TOML values have different types
each value must have one of the following types
  • String
  • Integer
  • Float
  • Boolean
  • Offset date-time
  • Local date-time
  • Local date
  • Local time
  • Array
  • Inline table
can use tables and arrays of tables as collections that organize several key-value pairs
key-value pairs are the basic building blocks in a TOML document
keys are always interpreted as strings even without quotation marks
bare keys consist only of ASCII letters and numbers as well as underscores and dashes
all such keys can be written without quotation marks

can avoid the limits of bare keys by using quoted strings as keys, which may contain any Unicode characters
generally want to use bare keys

dots (.) play a special role in TOML keys
can use dots in unquoted keys
in that case, they'll trigger grouping by splitting the dotted key at each dot

player_x.symbol = "X"
player_x.color = "purple"
both keys start with player_x
the keys symbol and color will be grouped together inside a section named player_x
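A short check of the grouping described above, assuming Python 3.11+ with tomllib (tomli as a fallback). The second document is an added illustration: quoting the key keeps the dot as part of a single key.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Dots in bare keys split the key and group the values in a nested table.
dotted = tomllib.loads("""
player_x.symbol = "X"
player_x.color = "purple"
""")
print(dotted)

# A quoted key is not split, so the dot stays inside the key name.
quoted = tomllib.loads('"player_x.color" = "purple"')
print(quoted)
```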

Strings, Numbers, and Booleans
one difference between TOML and Python is that TOML's Boolean values are lowercase (true/false)

a TOML string should typically use double quotation marks (")
inside strings can escape special characters with the help of backslashes

"\u03c0 is less than four"
\u03c0 denotes the Unicode character with codepoint U+03C0, the Greek letter π
string will be interpreted as "π is less than four"

can also specify TOML strings using single quotation marks (')
single-quoted strings are called literal strings and behave similarly to raw strings in Python
nothing is escaped or interpreted in a literal string
'\u03c0 is the Unicode codepoint of π'
the string above starts with the literal \u03c0 characters
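The difference between basic and literal strings can be checked with a small sketch, assuming Python 3.11+ with tomllib (tomli as a fallback). The key names basic and literal are made up, and the document is a Python raw string so that the backslashes reach the TOML parser untouched.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

doc = tomllib.loads(r"""
basic = "\u03c0 is less than four"
literal = '\u03c0 is the Unicode codepoint of π'
""")

# The basic string has its escape sequence interpreted by TOML.
print(doc["basic"])
# The literal string keeps the backslash and the characters after it.
print(doc["literal"])
```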
TOML strings can also be specified using triple quotation marks (""" or ''')
triple-quoted strings allow writing a string over multiple lines
similar to Python multiline strings
partly_zen = """
Flat is better than nested.
Sparse is better than dense.
"""
control characters, including literal newlines, aren't allowed in basic strings
can use \n to represent a newline inside a basic string
must use a multiline string to format a string over several lines
can also use triple-quoted literal strings
this is the only way to include a single quotation mark inside a literal string
'''Use '\u03c0' to represent π'''
be careful with special characters when creating TOML documents inside Python code
Python will also interpret the special characters
the following is a valid TOML document
numbers = "one\ntwo\nthree"
the value of numbers is a string that's split over three lines
try to represent the same document in Python
>>> 'numbers = "one\ntwo\nthree"'
'numbers = "one\ntwo\nthree"'
Python parses the \n characters and creates an invalid TOML document
need to keep the special characters away from Python
example using raw strings
>>> r'numbers = "one\ntwo\nthree"'
'numbers = "one\\ntwo\\nthree"'
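A sketch of what happens when each version is handed to a parser, assuming Python 3.11+ with tomllib (tomli as a fallback). The variable names are hypothetical.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

raw_doc = r'numbers = "one\ntwo\nthree"'   # TOML sees the \n escapes
plain_doc = 'numbers = "one\ntwo\nthree"'  # Python already inserted newlines

# The raw string is valid TOML; the parser turns \n into newlines.
parsed = tomllib.loads(raw_doc)
print(parsed["numbers"].splitlines())

# The plain string has literal newlines inside a basic string, which is invalid.
plain_error = None
try:
    tomllib.loads(plain_doc)
except tomllib.TOMLDecodeError as err:
    plain_error = err
    print(f"invalid TOML: {err}")
```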
Integers
integers represent whole numbers and are specified as plain, numeric characters
as in Python can use underscores to enhance readability
number = 42
negative = -8
large = 60_481_729
Floating point numbers
represent decimal numbers and include an integer part, a dot representing the decimal point, and a fractional part
floats can use scientific notation to represent very small or very large numbers
TOML also supports special float values like infinity and not a number (NaN)
number = 3.11
googol = 1e100
mole = 6.022e23
negative_infinity = -inf
not_a_number = nan
the TOML specification requires that integers are represented as at least 64-bit signed integers
Python handles arbitrarily large integers
only integers with up to about 19 digits are guaranteed to work on all TOML implementations

non-negative integer values may also be represented as hexadecimal, octal, or binary values by using a 0x, 0o, or 0b prefix, respectively
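The number formats above can be checked in one go. This is a sketch assuming Python 3.11+ with tomllib (tomli as a fallback); the keys hexadecimal, octal, and binary are made up for illustration.

```python
import math

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

doc = tomllib.loads("""
large = 60_481_729
hexadecimal = 0xff
octal = 0o17
binary = 0b101
googol = 1e100
negative_infinity = -inf
not_a_number = nan
""")

# Integers in any base and floats all arrive as plain Python int and float.
for key, value in doc.items():
    print(f"{key:18} {value!r:>12} {type(value).__name__}")
```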

Tables
a TOML document consists of one or more key-value pairs
when represented in a programming language these should be stored in a hash table data structure
in Python that would be a dictionary or another dictionary-like data structure
to organize key-value pairs, can use tables

TOML supports three different ways of specifying tables
the different tables do have slightly different use cases

  • use regular tables with headers in most cases
  • use dotted key tables when you need to specify a few key-value pairs
    that are closely tied to their parent table
  • use inline tables only for very small tables with up to three key-value pairs
    where the data makes up a clearly defined entity
the different table representations are mostly interchangeable
should default to regular tables
only use dotted key tables or inline tables if it improves the configuration's readability or clarifies the intent

regular tables are defined by adding a table header above the key-value pairs
a header is a key without a value, wrapped inside square brackets ([])
the example below defines three tables

[user]
player_x.color = "blue"
player_o.color = "green"

[constant]
board_size = 3

[server]
url = "https://tictactoe.example.com"
the [user] table contains two dotted key tables, player_x and player_o
[user]
player_x.color = "blue"
player_o.color = "green"
the dot (.) in the keys creates a table named by the part of the key before the dot
the same configuration using regular tables
[user]

    [user.player_x]
    color = "blue"

    [user.player_o]
    color = "green"
indentation isn't important in TOML
use it here to represent the nesting of the tables

table examples

# nested regular tables
[user]

    [user.player_x]
    symbol = "X"
    color = "blue"

    [user.player_o]
    symbol = "O"
    color = "green"

# nested dotted key tables
[user]
player_x.symbol = "X"
player_x.color = "blue"
player_o.symbol = "O"
player_o.color = "green"

# inline table
[user]
player_x = { symbol = "X", color = "blue" }
player_o = { symbol = "O", color = "green" }
an inline table is defined with curly braces ({}) wrapped around comma-separated key-value pairs
in this example, the inline table brings a nice balance of readability and compactness
the grouping of the player tables becomes clear
inline tables are intentionally limited compared to regular tables
an inline table must be written on one line in the TOML file
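To confirm that the three table styles describe the same data, the sketch below parses each of them and compares the results. It assumes Python 3.11+ with tomllib (tomli as a fallback); the variable names are hypothetical.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

regular = tomllib.loads("""
[user]

    [user.player_x]
    symbol = "X"
    color = "blue"

    [user.player_o]
    symbol = "O"
    color = "green"
""")

dotted = tomllib.loads("""
[user]
player_x.symbol = "X"
player_x.color = "blue"
player_o.symbol = "O"
player_o.color = "green"
""")

inline = tomllib.loads("""
[user]
player_x = { symbol = "X", color = "blue" }
player_o = { symbol = "O", color = "green" }
""")

# All three documents load into the same nested dictionary.
print(regular == dotted == inline)
```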

a TOML document is represented by a nameless root table that contains all other tables and key-value pairs
key-value pairs written at the top of the TOML configuration (before any table header) are stored directly in the root table

title = "Tic-Tac-Toe"

[constant]
board_size = 3
a table includes all key-value pairs written between its header and the next table header
below, background_color doesn't appear to be part of the [user.player_o] table
but it is, because indentation isn't important in TOML
[user]

    [user.player_x]
    color = "blue"

    [user.player_o]
    color = "green"

background_color = "white"
to eliminate any confusion, background_color should be defined before the nested tables
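Parsing the document shows where background_color really ends up. A sketch assuming Python 3.11+ with tomllib (tomli as a fallback):

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

doc = tomllib.loads("""
[user]

    [user.player_x]
    color = "blue"

    [user.player_o]
    color = "green"

background_color = "white"
""")

# The key belongs to the most recent header, regardless of indentation.
print(doc["user"]["player_o"])
print("background_color" in doc)
```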

Times and Dates
four representations of date-time
  • offset date-time - a timestamp with time zone information
    representing a specific instant in time
  • local date-time - a timestamp without time zone information
  • local date - a date without any time zone information
    typically used to represent a full day
  • local time - a time without any date or time zone information
    use a local time to represent a time of day
a fully defined timestamp
2021-01-12T01:23:45.654321+01:00
timestamp fields

Field          Example   Details
year           2021
month          01        two digits from 01 (January) to 12 (December)
day            12        two digits, zero padded when below ten
hour           01        two digits from 00 to 23
minute         23        two digits from 00 to 59
seconds        45        two digits from 00 to 59
microseconds   654321    six digits from 000000 to 999999
offset         +01:00    time zone as offset from UTC, with Z representing UTC

the microsecond field is optional for all date-time and time types
can replace the T that separates the date and time with a space
examples of each of the timestamp-related types

offset_date-time     = 2021-01-12 01:23:45+01:00
offset_date-time_utc = 2021-01-12 00:23:45Z
local_date-time      = 2021-01-12 01:23:45
local_date           = 2021-01-12
local_time           = 01:23:45
local_time_with_us   = 01:23:45.654321
Arrays
TOML arrays represent an ordered list of values
specify them using square brackets ([])
they resemble Python's lists
packages = ["tomllib", "tomli", "tomli_w", "tomlkit"]
can use any TOML data type, including other arrays, inside arrays
one array can contain different data types
allowed to specify an array over several lines
can use a trailing comma after the last element in the array
the following examples are valid TOML arrays
potpourri = ["flower", 1749, { symbol = "X", color = "blue" }, 1994-02-14]
skiers = ["Thomas", "Bjørn", "Mika"]
players = [
    { symbol = "X", color = "blue", ai = true },
    { symbol = "O", color = "green", ai = false },
]
players is an array containing two inline tables
in general should express an array of tables by writing table headers inside double square brackets ([[]])
syntax isn't necessarily pretty, but it's quite effective
below the array of tables is equivalent to the array of inline tables above
[[players]]
symbol = "X"
color = "blue"
ai = true

[[players]]
symbol = "O"
color = "green"
ai = false
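The equivalence of the two notations can be verified by parsing both. A sketch assuming Python 3.11+ with tomllib (tomli as a fallback); the variable names are hypothetical.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

inline = tomllib.loads("""
players = [
    { symbol = "X", color = "blue", ai = true },
    { symbol = "O", color = "green", ai = false },
]
""")

headers = tomllib.loads("""
[[players]]
symbol = "X"
color = "blue"
ai = true

[[players]]
symbol = "O"
color = "green"
ai = false
""")

# Both notations produce a list of dictionaries under the key "players".
print(inline == headers)
print(headers["players"][1])
```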
excerpt from a TOML document
[python]
label = "Python"

[[python.questions]]
question = "Which built-in function can get information from the user"
answers = ["input"]
alternatives = ["get", "print", "write"]

[[python.questions]]
question = "What's the purpose of the built-in zip() function"
answers = ["To iterate over two or more sequences at the same time"]
alternatives = [
    "To combine several strings into one",
    "To compress several files into one archive",
    "To get information from the user",
]
the snippet shows the python table holds two keys
  • label
  • questions
questions is an array of tables with two elements
each element is a table with three keys
  • question
  • answers
  • alternatives
Load TOML With Python
Read TOML Documents With tomli and tomllib
tomli and its sibling tomllib are great libraries when you only want to load a TOML document into Python
create the following TOML file and save it as tic_tac_toe.toml
# tic_tac_toe.toml

[user]
player_x.color = "blue"
player_o.color = "green"

[constant]
board_size = 3

[server]
url = "https://tictactoe.example.com"
create a virtual environment in the same folder as tic_tac_toe.toml
activate the virtual environment and install tomli
(venv) $ python -m pip install tomli
tomli exposes two functions
  • load() - loads a TOML doc from a file
  • loads() - loads a TOML doc from a string
to load and view tic_tac_toe.toml
>>> import tomli
>>> with open("tic_tac_toe.toml", mode="rb") as fp:
...     config = tomli.load(fp)
...
>>> config
{'user': {'player_x': {'color': 'blue'}, 'player_o': {'color': 'green'}},
 'constant': {'board_size': 3},
 'server': {'url': 'https://tictactoe.example.com'}}

>>> config["user"]["player_o"]
{'color': 'green'}

>>> config["server"]["url"]
'https://tictactoe.example.com'
the TOML document is represented as a Python dictionary
all the tables and sub-tables in the TOML file show up as nested dictionaries in the config variable
can pick out individual values by following the keys into the nested dictionary

if the TOML document is represented as a string, then use loads()
the 's' in loads() stands for string

>>> import tomli
>>> toml_str = """
... offset_date-time_utc = 2021-01-12 00:23:45Z
... potpourri = ["flower", 1749, { symbol = "X", color = "blue" }, 1994-02-14]
... """

>>> tomli.loads(toml_str)
{'offset_date-time_utc': datetime.datetime(2021, 1, 12, 0, 23, 45, tzinfo=datetime.timezone.utc),
 'potpourri': ['flower',
               1749,
               {'symbol': 'X', 'color': 'blue'},
               datetime.date(1994, 2, 14)]}
Compare TOML Types and Python Types
the TOML specification mentions some requirements on its own types
  • a TOML file must be a valid UTF-8 encoded Unicode document
  • arbitrary 64-bit signed integers (from −2^63 to 2^63−1) should be accepted and handled losslessly
  • floats should be implemented as IEEE 754 binary64 values
the mapping between TOML's data types and Python's data types is quite natural

TOML               Python
string             str
integer            int
float              float
boolean            bool
table              dict
offset date-time   datetime.datetime (.tzinfo is an instance of datetime.timezone)
local date-time    datetime.datetime (.tzinfo is None)
local date         datetime.date
local time         datetime.time
array              list
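The mapping can be inspected directly by loading one value of each type. This is a sketch assuming Python 3.11+ with tomllib (tomli as a fallback); the key names are made up to match the TOML type they hold.

```python
import datetime

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

doc = tomllib.loads("""
string = "text"
integer = 42
float = 3.11
boolean = true
offset_date_time = 2021-01-12 01:23:45+01:00
local_date_time = 2021-01-12 01:23:45
local_date = 2021-01-12
local_time = 01:23:45.654321
array = [1, 2, 3]
table = { board_size = 3 }
""")

# Print the Python type that each TOML value was converted to.
for key, value in doc.items():
    print(f"{key:18} {type(value).__name__}")
```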

Use Configuration Files in Projects
a config file should be read only once
wrap the config file in a module
when a module is imported it is cached for later use

in the folder where tic_tac_toe.toml is located, add a subfolder named config

# tic_tac_toe.toml

[user]
player_x.color = "blue"
player_o.color = "green"

[constant]
board_size = 3

[server]
url = "https://tictactoe.example.com"
move tic_tac_toe.toml to the new folder
add a file named __init__.py to the config folder
the folder structure should look like
config/
├── __init__.py
└── tic_tac_toe.toml
add the code below to __init__.py
# __init__.py

import pathlib
import tomli

path = pathlib.Path(__file__).parent / "tic_tac_toe.toml"
with path.open(mode="rb") as fp:
    tic_tac_toe = tomli.load(fp)
in a REPL session
>>> import config
>>> config.path
PosixPath('/home/realpython/config/tic_tac_toe.toml')

>>> config.tic_tac_toe
{'user': {'player_x': {'color': 'blue'}, 'player_o': {'color': 'green'}},
 'constant': {'board_size': 3},
 'server': {'url': 'https://tictactoe.example.com'}}

>>> config.tic_tac_toe["server"]["url"]
'https://tictactoe.example.com'

>>> config.tic_tac_toe["constant"]["board_size"]
3

>>> config.tic_tac_toe["user"]["player_o"]
{'color': 'green'}

>>> config.tic_tac_toe["user"]["player_o"]["color"]
'green'
may want to alias the config import
from config import tic_tac_toe as CFG

color_x = CFG["user"]["player_x"]["color"]
Dump Python Objects as TOML
Convert Dictionaries to TOML
code a basic TOML writer

_dumps_value() is a helper function
takes a value and returns its TOML representation based on the value type
uses pattern matching for type checking

# to_toml.py

def _dumps_value(value):
    match value:
        case bool():  # must come before int(), since bool is a subclass of int
            return "true" if value else "false"
        case float() | int():
            return str(value)
        case str():
            return f'"{value}"'  # note: special characters are not escaped
        case list():
            return f"[{', '.join(_dumps_value(v) for v in value)}]"
        case _:
            raise TypeError(f"{type(value).__name__} {value!r} is not supported")
if the value is a list _dumps_value() calls itself recursively

dumps() iterates over the dictionary, converting each item to a key-value pair
if an item is another dictionary, dumps() calls itself recursively
add this function to to_toml.py

def dumps(toml_dict, table=""):
    def tables_at_end(item):
        _, value = item
        return isinstance(value, dict)

    toml = []
    for key, value in sorted(toml_dict.items(), key=tables_at_end):
        if isinstance(value, dict):
            table_key = f"{table}.{key}" if table else key
            toml.append(f"\n[{table_key}]\n{dumps(value, table_key)}")
        else:
            toml.append(f"{key} = {_dumps_value(value)}")
    return "\n".join(toml)
the mule used to run to_toml.py
# toml_mule.py

import to_toml

config = {
    "user": {
        "player_x": {"symbol": "X", "color": "blue", "ai": True},
        "player_o": {"symbol": "O", "color": "green", "ai": False},
        "ai_skill": 0.85,
    },
    "board_size": 3,
    "server": {"url": "https://tictactoe.example.com"},
}

print(to_toml.dumps(config))
Write TOML Documents With tomli_w
install tomli_w into the venv using command
(venv) $ python -m pip install tomli_w
a new mule using tomli_w
tomli_w_mule.py produces the same output as toml_mule.py did using to_toml.py
# tomli_w_mule.py

import tomli_w

config = {
    "user": {
        "player_x": {"symbol": "X", "color": "blue", "ai": True},
        "player_o": {"symbol": "O", "color": "green", "ai": False},
        "ai_skill": 0.85,
    },
    "board_size": 3,
    "server": {"url": "https://tictactoe.example.com"},
}

print(tomli_w.dumps(config))
tomli_w supports all the features which weren't implemented in to_toml.py
includes times and dates, inline tables, and arrays of tables

dumps() writes to a string so processing can continue
to store the new TOML document directly to disk, call dump() instead
as with load() need to pass in a file pointer opened in binary mode

>>> with open("tic-tac-toe-config.toml", mode="wb") as fp:
...     tomli_w.dump(config, fp)
...
tomli discards comments
can't distinguish between literal strings, multiline strings, and regular strings in the dictionary returned by load() or loads()
lose some metainformation when parsing a TOML document and then writing it back
>>> import tomli, tomli_w
>>> toml_data = """
... [nested]  # Not necessary
...
...     [nested.table]
...     string       = "Hello, TOML!"
...     weird_string = '''Literal
...         Multiline'''
... """
>>> print(tomli_w.dumps(tomli.loads(toml_data)))
[nested.table]
string = "Hello, TOML!"
weird_string = "Literal\n        Multiline"
Create New TOML Documents
Format and Style TOML Documents
generally whitespace is ignored in TOML files
can take advantage of this to make configuration files well organized, readable, and intuitive
a hash symbol (#) marks the rest of the line as a comment

there's no style guide for TOML documents
some features in TOML are quite flexible

  • can define tables in any order
  • can even define a sub-table before its parent
  • whitespace is ignored around keys
    for example, the headers [nested.table] and [ nested . table] start the same nested table
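The flexibility listed above can be demonstrated by parsing a deliberately untidy document and a tidy one. A sketch assuming Python 3.11+ with tomllib (tomli as a fallback); the keys value and other are made up.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

untidy = tomllib.loads("""
[ nested . table ]   # whitespace around the key parts is ignored
value = 1

[nested]             # the parent table is defined after its sub-table
other = 2
""")

tidy = tomllib.loads("""
[nested]
other = 2

[nested.table]
value = 1
""")

# Both documents describe the same data, but the tidy one is easier to read.
print(untidy == tidy)
```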
focus on consistency and readability
Create TOML From Scratch With tomlkit
install package into the venv
(venv) $ python -m pip install tomlkit
the 'round trip' below shows tomlkit preserves all string types, indentations, comments, and alignments
>>> import tomlkit
>>> toml_data = """
... [nested]  # Not necessary
...
...     [nested.table]
...     string       = "Hello, TOML!"
...     weird_string = '''Literal
...         Multiline'''
... """
>>> print(tomlkit.dumps(tomlkit.loads(toml_data)))

[nested]  # Not necessary

    [nested.table]
    string       = "Hello, TOML!"
    weird_string = '''Literal
        Multiline'''

>>> tomlkit.dumps(tomlkit.loads(toml_data)) == toml_data
True
create a TOML document from scratch
>>> from tomlkit import comment, document, nl, table

>>> toml = document()
>>> toml.add(comment("Written by TOML Kit"))
>>> toml.add(nl()) # empty line
>>> toml.add("board_size", 3)
convert toml to an actual TOML document by using dump() or dumps()
can also call .as_string() on the toml object
>>> print(toml.as_string())
# Written by TOML Kit

board_size = 3
complete tomlkit_test.py
from tomlkit import comment, document, nl, table

toml = document()
toml.add(comment("Written by TOML Kit"))
toml.add(nl())
toml.add("board_size", 3)
player_x = table()
player_x.add("symbol", "X")
player_x.add("color", "blue")
player_x.comment("Start player")
toml.add("player_x", player_x)

player_o = table()
player_o.update({"symbol": "O", "color": "green"})
toml["player_o"] = player_o

print(toml.as_string())
Update Existing TOML Documents
Represent TOML as tomlkit Objects
tic-tac-toe-config.toml
# tic-tac-toe-config.toml

board_size = 3

[user]
ai_skill = 0.85  # A number between 0 (random) and 1 (expert)

    [user.player_x]
    symbol = "X"
    color = "blue"
    ai = true

    [user.player_o]
    symbol = "O"
    color = "green"
    ai = false

# Settings used when deploying the application
[server]
url = "https://tictactoe.example.com"
in a REPL session load tic-tac-toe-config.toml
>>> import tomlkit
>>> with open("tic-tac-toe-config.toml", mode="rt", encoding="utf-8") as fp:
...     config = tomlkit.load(fp)
...
>>> config
{'board_size': 3, 'user': {'ai_skill': 0.85, 'player_x': { ... }}}

>>> type(config)
<class 'tomlkit.toml_document.TOMLDocument'>
when using tomlkit the file must be opened in the text mode (mode="rt")
the encoding must be encoding="utf-8"

a TOMLDocument works like a dictionary or a dictionary of dictionaries

>>> config["user"]["player_o"]["color"]
'green'

>>> type(config["user"]["player_o"]["color"])
<class 'tomlkit.items.String'>

>>> config["user"]["player_o"]["color"].upper()
'GREEN'
values are also special tomlkit data types, but can be worked with as if they were regular Python types
one advantage of these special data types is that they provide access to metainformation about the document
>>> config["user"]["ai_skill"]
0.85

>>> config["user"]["ai_skill"].trivia.comment
'# A number between 0 (random) and 1 (expert)'

>>> config["user"]["player_x"].trivia.indent
'    '
can mostly treat these special objects as if they were native Python objects
they inherit from their native counterparts
can use .unwrap() to convert them to plain Python
>>> config["board_size"] ** 2
9

>>> isinstance(config["board_size"], int)
True

>>> config["board_size"].unwrap()
3

>>> type(config["board_size"].unwrap())
<class 'int'>
Read and Write TOML Losslessly
a TOMLDocument can be treated just as any other object
can use .add() to add new elements to it
can't use .add() to update the value of existing keys
>>> config.add("app_name", "Tic-Tac-Toe")
{'board_size': 3, 'app_name': 'Tic-Tac-Toe', 'user': { ... }}

>>> config["user"].add("ai_skill", 0.6)
Traceback (most recent call last):
  ...
KeyAlreadyPresent: Key "ai_skill" already exists.
can assign the new value as if config were a regular dictionary
>>> config["user"]["ai_skill"] = 0.6
>>> print(config["user"].as_string())
ai_skill = 0.6  # A number between 0 (random) and 1 (expert)

    [user.player_x]
    symbol = "X"
    color = "blue"
    ai = true

    [user.player_o]
    symbol = "O"
    color = "green"
    ai = false
parts of tomlkit support a fluent interface
operations like .add() return the updated object so can chain another call to .add() onto it
>>> from tomlkit import aot, comment, inline_table, nl, table
>>> player_data = [
...     {"user": "gah", "first_name": "Geir Arne", "last_name": "Hjelle"},
...     {"user": "tompw", "first_name": "Tom", "last_name": "Preston-Werner"},
... ]

>>> players = aot()
>>> for player in player_data:
...     players.append(
...         table()
...         .add("username", player["user"])
...         .add("name",
...             inline_table()
...             .add("first", player["first_name"])
...             .add("last", player["last_name"])
...         )
...     )
...
>>> config.add(nl()).add(comment("Players")).add("players", players)