Use TOML as a Configuration Format
TOML: Tom's Obvious Minimal Language
TOML is a fairly new format; the first format specification, version 0.1.0, was released in 2013
focused on being a minimal configuration file format that's human-readable

    [user]
    player_x.color = "blue"
    player_o.color = "green"

    [constant]
    board_size = 3

    [server]
    url = "https://tictactoe.example.com"

TOML has a specification which spells out precisely what's allowed and how different values should be interpreted
TOML is restrictive in a few aspects
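One visible consequence of having a precise specification is that parsers reject ambiguous documents instead of guessing. The sketch below is only an illustration: it assumes Python 3.11+ (where tomllib is in the standard library) or an installed tomli as a fallback, and both document strings are made up for the example. It shows that defining the same key twice is an error.

```python
try:
    import tomllib  # Standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Assumed fallback on older Python versions

valid_doc = """
[constant]
board_size = 3
"""

# The same key is defined twice, which the TOML specification forbids
duplicate_doc = """
[constant]
board_size = 3
board_size = 4
"""

config = tomllib.loads(valid_doc)

try:
    tomllib.loads(duplicate_doc)
except tomllib.TOMLDecodeError as err:
    duplicate_rejected = True
    print(f"rejected: {err}")
else:
    duplicate_rejected = False
```

The exact wording of the error message depends on the parser, so the example only records whether an error was raised.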
TOML Schema Validation
can validate the TOML configuration manually in a simple application
below, the config file has already been parsed into a dictionary named config; the code shows how that dictionary is validated

    match config:
        case {
            "user": {"player_x": {"color": str()}, "player_o": {"color": str()}},
            "constant": {"board_size": int()},
            "server": {"url": str()},
        }:
            pass
        case _:
            raise ValueError(f"invalid configuration: {config}")

this approach may not scale well if the TOML document is more complicated
a better alternative is to use pydantic, which utilizes type annotations to do data validation at runtime
one advantage of pydantic is that it has precise and helpful error messages built in
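The match statement above only reports that something is wrong, not what. As a rough middle ground between it and pydantic, the sketch below uses only the standard library to report each problem by key path. It is not how pydantic works; the function name find_problems and the schema dictionary are hypothetical names introduced for this illustration.

```python
def find_problems(config):
    """Return a list of human-readable problems in a parsed config dict."""
    # Hypothetical schema for the tic-tac-toe example: key path -> required type
    schema = {
        ("user", "player_x", "color"): str,
        ("user", "player_o", "color"): str,
        ("constant", "board_size"): int,
        ("server", "url"): str,
    }
    problems = []
    for path, expected in schema.items():
        name = ".".join(path)
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                problems.append(f"{name}: missing")
                break
            node = node[key]
        else:
            # Compare exact types so that a bool isn't accepted as an int
            if type(node) is not expected:
                problems.append(
                    f"{name}: expected {expected.__name__}, "
                    f"got {type(node).__name__}"
                )
    return problems


good = {
    "user": {"player_x": {"color": "blue"}, "player_o": {"color": "green"}},
    "constant": {"board_size": 3},
    "server": {"url": "https://tictactoe.example.com"},
}
bad = {
    "user": {"player_x": {"color": "blue"}},
    "constant": {"board_size": "3"},
    "server": {"url": "https://tictactoe.example.com"},
}

print(find_problems(good))
print(find_problems(bad))
```

An empty list means the configuration matches the schema. Returning all problems at once, rather than raising on the first one, is a design choice made for this sketch.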
Get to Know TOML: Key-Value Pairs
TOML is built around key-value pairs that map nicely to hash table data structures
TOML values have different types; each value must have one of the following types
key-value pairs are the basic building blocks in a TOML document
keys are always interpreted as strings, even without quotation marks
bare keys consist only of ASCII letters and numbers as well as underscores and dashes; all such keys can be written without quotation marks
can avoid the limits of bare keys by using quoted Unicode strings as keys
generally want to use bare keys
dots (.) play a special role in TOML keys
can use dots in unquoted keys; in that case, they'll trigger grouping by splitting the dotted key at each dot

    player_x.symbol = "X"
    player_x.color = "purple"

both keys start with player_x, so the keys symbol and color will be grouped together inside a section named player_x

Strings, Numbers, and Booleans
one difference between TOML and Python is that TOML's Boolean values are lowercase (true/false)
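A quick way to see the difference is to parse both spellings. This is a minimal sketch assuming Python 3.11+ (tomllib) or tomli as a fallback; the key names ai and verbose are made up for the example.

```python
try:
    import tomllib  # Standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Assumed fallback on older Python versions

# Lowercase TOML Booleans become Python's True and False
flags = tomllib.loads("ai = true\nverbose = false")
print(flags)

# Python-style capitalization is not valid TOML
try:
    tomllib.loads("ai = True")
except tomllib.TOMLDecodeError:
    capitalized_is_valid = False
else:
    capitalized_is_valid = True
```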
a TOML string should typically use double quotation marks (")
inside strings, can escape special characters with the help of backslashes

    "\u03c0 is less than four"

\u03c0 denotes the Unicode character with the codepoint for the Greek letter π, so the string will be interpreted as "π is less than four"
can also specify TOML strings using single quotation marks (')
single-quoted strings are called literal strings and behave similarly to raw strings in Python
nothing is escaped or interpreted in a literal string, so the example below starts with the literal \u03c0 characters

    '\u03c0 is the Unicode codepoint of π'

TOML strings can also be specified using triple quotation marks (""" or ''')
triple-quoted strings allow writing a string over multiple lines, similar to Python multiline strings

    partly_zen = """
    Flat is better than nested.
    Sparse is better than dense.
    """

control characters, including literal newlines, aren't allowed in basic strings
can use \n to represent a newline inside a basic string
must use a multiline string to format strings over several lines
can also use triple-quoted literal strings; they're the only way to include a single quotation mark inside a literal string

    '''Use '\u03c0' to represent π'''

be careful with special characters when creating TOML documents inside Python code, because Python will also interpret the special characters
the following is a valid TOML document

    numbers = "one\ntwo\nthree"

the value of numbers is a string that's split over three lines
try to represent the same document in Python

    >>> 'numbers = "one\ntwo\nthree"'
    'numbers = "one\ntwo\nthree"'

Python parses the \n characters and creates an invalid TOML document
need to keep the special characters away from Python, for example by using raw strings

    >>> r'numbers = "one\ntwo\nthree"'
    'numbers = "one\\ntwo\\nthree"'

Integers
integers represent whole numbers and are specified as plain, numeric characters
as in Python, can use underscores to enhance readability

    number = 42
    negative = -8
    large = 60_481_729

floating point numbers represent decimal numbers and include an integer part, a dot representing the decimal point, and a fractional part
floats can use scientific notation to represent very small or very large numbers
TOML also supports special float values like infinity and not a number (NaN)

    number = 3.11
    googol = 1e100
    mole = 6.22e23
    negative_infinity = -inf
    not_a_number = nan

the TOML specification requires that integers are represented as at least 64-bit signed integers
Python handles arbitrarily large integers, but only integers with up to about 19 digits are guaranteed to work on all TOML implementations
non-negative integer values may also be represented as hexadecimal, octal, or binary values by using a 0x, 0o, or 0b prefix, respectively

Tables
a TOML document consists of one or more key-value pairs
when represented in a programming language, these should be stored in a hash table data structure; in Python that would be a dictionary or another dictionary-like data structure
to organize key-value pairs, can use tables
TOML supports three different ways of specifying tables: regular tables, dotted key tables, and inline tables
the different tables do have slightly different use cases
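The three table styles covered in this section are different spellings of the same data. The sketch below, which assumes Python 3.11+ (tomllib) or tomli as a fallback, parses one small configuration written in each style and compares the resulting dictionaries. The variable names are only illustrative.

```python
try:
    import tomllib  # Standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Assumed fallback on older Python versions

regular = """
[user]

    [user.player_x]
    symbol = "X"
    color = "blue"

    [user.player_o]
    symbol = "O"
    color = "green"
"""

dotted = """
[user]
player_x.symbol = "X"
player_x.color = "blue"
player_o.symbol = "O"
player_o.color = "green"
"""

inline = """
[user]
player_x = { symbol = "X", color = "blue" }
player_o = { symbol = "O", color = "green" }
"""

# Each style should produce the same nested dictionary
parsed = [tomllib.loads(doc) for doc in (regular, dotted, inline)]
all_equal = parsed[0] == parsed[1] == parsed[2]
print(all_equal)
```

Because the parsed result is identical, the choice between the styles comes down to readability.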
should default to regular tables; only use dotted key tables or inline tables if it improves the configuration's readability or clarifies the intent
regular tables are defined by adding a table header above the key-value pairs
a header is a key without a value, wrapped inside square brackets ([])
the example below defines three tables

    [user]
    player_x.color = "blue"
    player_o.color = "green"

    [constant]
    board_size = 3

    [server]
    url = "https://tictactoe.example.com"

the [user] table contains dotted key tables

    [user]
    player_x.color = "blue"
    player_o.color = "green"

the dot (.) in the keys creates a table named by the part of the key before the dot
the same configuration using regular tables

    [user]

        [user.player_x]
        color = "blue"

        [user.player_o]
        color = "green"

indentation isn't important in TOML; it's used here to represent the nesting of the tables
table examples (three alternative ways to write the same tables, not one document)

    # nested regular tables
    [user]

        [user.player_x]
        symbol = "X"
        color = "blue"

        [user.player_o]
        symbol = "O"
        color = "green"

    # nested dotted key tables
    [user]
    player_x.symbol = "X"
    player_x.color = "blue"
    player_o.symbol = "O"
    player_o.color = "green"

    # inline table
    [user]
    player_x = { symbol = "X", color = "blue" }
    player_o = { symbol = "O", color = "green" }

an inline table is defined with curly braces ({}) wrapped around comma-separated key-value pairs
in this example, the inline table brings a nice balance of readability and compactness; the grouping of the player tables becomes clear
inline tables are intentionally limited compared to regular tables; an inline table must be written on one line in the TOML file
a TOML document is represented by a nameless root table that contains all other tables and key-value pairs
key-value pairs written at the top of the TOML configuration (before any table header) are stored directly in the root table

    title = "Tic-Tac-Toe"

    [constant]
    board_size = 3

a table includes all key-value pairs written between its header and the next table header
below, background_color doesn't appear to be part of the [user.player_o] table, but it is, because indentation isn't important in TOML

    [user]

        [user.player_x]
        color = "blue"

        [user.player_o]
        color = "green"

    background_color = "white"

to eliminate any confusion, background_color should be defined before the nested tables

Times and Dates
four representations of date-time
    2021-01-12T01:23:45.654321+01:00

timestamp fields
can replace the T that separates the date and time with a space
examples of each of the timestamp-related types

    offset_date-time = 2021-01-12 01:23:45+01:00
    offset_date-time_utc = 2021-01-12 00:23:45Z
    local_date-time = 2021-01-12 01:23:45
    local_date = 2021-01-12
    local_time = 01:23:45
    local_time_with_us = 01:23:45.654321

Arrays
TOML arrays represent an ordered list of values
specify them using square brackets ([]); they resemble Python's lists

    packages = ["tomllib", "tomli", "tomli_w", "tomlkit"]

can use any TOML data type, including other arrays, inside arrays
one array can contain different data types
allowed to specify an array over several lines
can use a trailing comma after the last element in the array
the following examples are valid TOML arrays

    potpourri = ["flower", 1749, { symbol = "X", color = "blue" }, 1994-02-14]

    skiers = ["Thomas", "Bjørn", "Mika"]

    players = [
        { symbol = "X", color = "blue", ai = true },
        { symbol = "O", color = "green", ai = false },
    ]

players is an array containing two inline tables
in general, should express an array of tables by writing table headers inside double square brackets ([[]])
the syntax isn't necessarily pretty, but it's quite effective
the array of tables below is equivalent to the array of inline tables above

    [[players]]
    symbol = "X"
    color = "blue"
    ai = true

    [[players]]
    symbol = "O"
    color = "green"
    ai = false

excerpt from a TOML document

    [python]
    label = "Python"

    [[python.questions]]
    question = "Which built-in function can get information from the user"
    answers = ["input"]
    alternatives = ["get", "print", "write"]

    [[python.questions]]
    question = "What's the purpose of the built-in zip() function"
    answers = ["To iterate over two or more sequences at the same time"]
    alternatives = [
        "To combine several strings into one",
        "To compress several files into one archive",
        "To get information from the user",
    ]

the snippet shows the python table holds two keys, label and questions
each element of questions is a table with three keys: question, answers, and alternatives
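The claim that the [[players]] form is equivalent to the array of inline tables can be checked by parsing both. This is a small sketch assuming Python 3.11+ (tomllib) or tomli as a fallback; the variable names are only illustrative.

```python
try:
    import tomllib  # Standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Assumed fallback on older Python versions

inline_style = """
players = [
    { symbol = "X", color = "blue", ai = true },
    { symbol = "O", color = "green", ai = false },
]
"""

header_style = """
[[players]]
symbol = "X"
color = "blue"
ai = true

[[players]]
symbol = "O"
color = "green"
ai = false
"""

from_inline = tomllib.loads(inline_style)
from_headers = tomllib.loads(header_style)

# Both spellings yield a list of dictionaries under the "players" key
print(from_inline == from_headers)
print(from_headers["players"][1]["color"])
```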
Load TOML With Python
Read TOML Documents With tomli and tomllib
tomli and its sibling tomllib are great libraries when you only want to load a TOML document into Python
create the following TOML file and save it as tic_tac_toe.toml

    # tic_tac_toe.toml

    [user]
    player_x.color = "blue"
    player_o.color = "green"

    [constant]
    board_size = 3

    [server]
    url = "https://tictactoe.example.com"

create a virtual environment in the same folder as tic_tac_toe.toml
activate the virtual environment and install tomli

    (venv) $ python -m pip install tomli

tomli exposes two functions, load() and loads()

    >>> import tomli
    >>> with open("tic_tac_toe.toml", mode="rb") as fp:
    ...     config = tomli.load(fp)
    ...
    >>> config
    {'user': {'player_x': {'color': 'blue'}, 'player_o': {'color': 'green'}},
     'constant': {'board_size': 3},
     'server': {'url': 'https://tictactoe.example.com'}}
    >>> config["user"]["player_o"]
    {'color': 'green'}
    >>> config["server"]["url"]
    'https://tictactoe.example.com'

the TOML document is represented as a Python dictionary
all the tables and sub-tables in the TOML file show up as nested dictionaries in the config variable
can pick out individual values by following the keys into the nested dictionary
if the TOML document is represented as a string, then use loads(); the 's' in loads() stands for string

    >>> import tomli
    >>> toml_str = """
    ... offset_date-time_utc = 2021-01-12 00:23:45Z
    ... potpourri = ["flower", 1749, { symbol = "X", color = "blue" }, 1994-02-14]
    ... """
    >>> tomli.loads(toml_str)
    {'offset_date-time_utc': datetime.datetime(2021, 1, 12, 0, 23, 45, tzinfo=datetime.timezone.utc),
     'potpourri': ['flower', 1749, {'symbol': 'X', 'color': 'blue'}, datetime.date(1994, 2, 14)]}

Compare TOML Types and Python Types
the TOML specification mentions some requirements on its own types
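To see which Python type each TOML type ends up as, one option is to parse a small document and inspect the results. The sketch below assumes Python 3.11+ (tomllib) or tomli as a fallback; the key names are invented for the example, and the mapping shown is what this particular parser produces rather than something the TOML specification prescribes for Python.

```python
try:
    import tomllib  # Standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Assumed fallback on older Python versions

# A raw string keeps \u03c0 away from Python so that TOML interprets it
doc = r'''
text = "\u03c0 is less than four"
answer = 42
hexadecimal = 0xff
ratio = 0.85
infinite = inf
enabled = true
offset = 2021-01-12 01:23:45+01:00
local_dt = 2021-01-12 01:23:45
day = 2021-01-12
clock = 01:23:45
packages = ["tomllib", "tomli"]
player = { symbol = "X" }
'''

values = tomllib.loads(doc)

# Map each key to the name of the Python type that the parser produced
python_types = {key: type(value).__name__ for key, value in values.items()}
for key, type_name in python_types.items():
    print(f"{key:12} -> {type_name}")
```

Offset date-times come back as timezone-aware datetime objects, while local date-times are naive.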
Use Configuration Files in Projects
a config file should be read only once
wrap the config file in a module; when a module is imported, it is cached for later use
go to the folder where tic_tac_toe.toml is located and add a subfolder named config

    # tic_tac_toe.toml

    [user]
    player_x.color = "blue"
    player_o.color = "green"

    [constant]
    board_size = 3

    [server]
    url = "https://tictactoe.example.com"

move tic_tac_toe.toml to the new folder
add a file named __init__.py to the config folder
the folder structure should look like

    config/
    ├── __init__.py
    └── tic_tac_toe.toml

add the code below to __init__.py

    # __init__.py

    import pathlib
    import tomli

    path = pathlib.Path(__file__).parent / "tic_tac_toe.toml"
    with path.open(mode="rb") as fp:
        tic_tac_toe = tomli.load(fp)

in a REPL session

    >>> import config
    >>> config.path
    PosixPath('/home/realpython/config/tic_tac_toe.toml')

    >>> config.tic_tac_toe
    {'user': {'player_x': {'color': 'blue'}, 'player_o': {'color': 'green'}},
     'constant': {'board_size': 3},
     'server': {'url': 'https://tictactoe.example.com'}}

    >>> config.tic_tac_toe["server"]["url"]
    'https://tictactoe.example.com'

    >>> config.tic_tac_toe["constant"]["board_size"]
    3

    >>> config.tic_tac_toe["user"]["player_o"]
    {'color': 'green'}

    >>> config.tic_tac_toe["user"]["player_o"]["color"]
    'green'

may want to alias the config import

    from config import tic_tac_toe as CFG

    color_x = CFG["user"]["player_x"]["color"]
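The "read only once" behavior comes from Python's import cache. The sketch below tries to demonstrate it in isolation: it builds a throwaway package in a temporary directory and imports it twice. The package name demo_config is hypothetical (chosen to avoid clashing with a real config package), and tomllib with a tomli fallback is assumed in place of a plain tomli import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the package's __init__.py, mirroring the module described above
INIT_SOURCE = '''
import pathlib

try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

path = pathlib.Path(__file__).parent / "tic_tac_toe.toml"
with path.open(mode="rb") as fp:
    tic_tac_toe = tomllib.load(fp)
'''

TOML_SOURCE = """
[constant]
board_size = 3
"""

with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "demo_config"  # Hypothetical package name
    package.mkdir()
    (package / "__init__.py").write_text(INIT_SOURCE, encoding="utf-8")
    (package / "tic_tac_toe.toml").write_text(TOML_SOURCE, encoding="utf-8")

    sys.path.insert(0, tmp)
    try:
        first = importlib.import_module("demo_config")
        # The second import is served from sys.modules, so the file isn't reread
        second = importlib.import_module("demo_config")
    finally:
        sys.path.remove(tmp)

same_module = first is second
board_size = first.tic_tac_toe["constant"]["board_size"]
print(same_module, board_size)
```

Because both imports return the same module object, every part of an application that imports the package shares one parsed dictionary.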
Dump Python Objects as TOML
Convert Dictionaries to TOML
code a basic TOML writer
_dumps_value() is a helper function that takes a value and returns its TOML representation based on the value type
uses pattern matching for type checking

    # to_toml.py

    def _dumps_value(value):
        match value:
            # bool must come before int because bool is a subclass of int
            case bool():
                return "true" if value else "false"
            case float():
                return str(value)
            case int():
                return str(value)
            case str():
                return f'"{value}"'
            case list():
                return f"[{', '.join(_dumps_value(v) for v in value)}]"
            case _:
                raise TypeError(f"{type(value).__name__} {value!r} is not supported")

if the value is a list, _dumps_value() calls itself recursively
dumps() walks the dictionary, converting each item to a key-value pair
if an item is another dictionary, the dumps() function recurses on itself
add this function to to_toml.py

    def dumps(toml_dict, table=""):
        def tables_at_end(item):
            # Sort key: plain values (False) come before nested tables (True)
            _, value = item
            return isinstance(value, dict)

        toml = []
        for key, value in sorted(toml_dict.items(), key=tables_at_end):
            if isinstance(value, dict):
                table_key = f"{table}.{key}" if table else key
                toml.append(f"\n[{table_key}]\n{dumps(value, table_key)}")
            else:
                toml.append(f"{key} = {_dumps_value(value)}")
        return "\n".join(toml)

the mule (driver script) used to run to_toml.py

    # toml_mule.py

    import to_toml

    config = {
        "user": {
            "player_x": {"symbol": "X", "color": "blue", "ai": True},
            "player_o": {"symbol": "O", "color": "green", "ai": False},
            "ai_skill": 0.85,
        },
        "board_size": 3,
        "server": {"url": "https://tictactoe.example.com"},
    }

    print(to_toml.dumps(config))

Write TOML Documents With tomli_w
install tomli_w into the venv using the command

    (venv) $ python -m pip install tomli_w

a new mule using tomli_w; tomli_w_mule.py produces the same output as toml_mule.py did using to_toml.py

    # tomli_w_mule.py

    import tomli_w

    config = {
        "user": {
            "player_x": {"symbol": "X", "color": "blue", "ai": True},
            "player_o": {"symbol": "O", "color": "green", "ai": False},
            "ai_skill": 0.85,
        },
        "board_size": 3,
        "server": {"url": "https://tictactoe.example.com"},
    }

    print(tomli_w.dumps(config))

tomli_w supports all the features which weren't implemented in to_toml.py; this includes times and dates, inline tables, and arrays of tables
dumps() writes to a string so processing can continue
to store the new TOML document directly to disk, call dump() instead
as with load(), need to pass in a file pointer opened in binary mode

    >>> with open("tic-tac-toe-config.toml", mode="wb") as fp:
    ...     tomli_w.dump(config, fp)
    ...

tomli discards comments
can't distinguish between literal strings, multiline strings, and regular strings in the dictionary returned by load() or loads()
lose some metainformation when parsing a TOML document and then writing it back

    >>> import tomli, tomli_w
    >>> toml_data = """
    ... [nested] # Not necessary
    ...
    ... [nested.table]
    ... string = "Hello, TOML!"
    ... weird_string = '''Literal
    ...  Multiline'''
    ... """
    >>> print(tomli_w.dumps(tomli.loads(toml_data)))
    [nested.table]
    string = "Hello, TOML!"
    weird_string = "Literal\n Multiline"
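The loss of metainformation happens at parse time, before anything is written back. One way to see it with only the standard library (assuming Python 3.11+ for tomllib, or tomli as a fallback) is to parse two differently styled documents and compare the dictionaries. Both document strings below are adapted from the round-trip example, and the variable names are only illustrative.

```python
try:
    import tomllib  # Standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Assumed fallback on older Python versions

# Uses a comment, an explicit parent table, and a multiline literal string
styled = """
[nested] # Not necessary

[nested.table]
string = "Hello, TOML!"
weird_string = '''Literal
 Multiline'''
"""

# Same data with no comment and only basic strings; the raw Python string
# keeps the \\n escape intact so that TOML interprets it
plain = r'''
[nested.table]
string = "Hello, TOML!"
weird_string = "Literal\n Multiline"
'''

parsed_styled = tomllib.loads(styled)
parsed_plain = tomllib.loads(plain)

# The dictionaries are equal, so a writer can't tell which style was used
print(parsed_styled == parsed_plain)
```

Since the dictionaries compare equal, nothing in the parsed data records the comment or the original string style.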
Create New TOML Documents
Format and Style TOML Documents
generally, whitespace is ignored in TOML files
can take advantage of this to make configuration files well organized, readable, and intuitive
a hash symbol (#) marks the rest of the line as a comment
there's no style guide for TOML documents
some features in TOML are quite flexible
Create TOML From Scratch With tomlkit
install the tomlkit package into the venv

    (venv) $ python -m pip install tomlkit

the round trip below shows that tomlkit preserves all string types, indentations, comments, and alignments

    >>> import tomlkit
    >>> toml_data = """
    ... [nested] # Not necessary
    ...
    ... [nested.table]
    ... string = "Hello, TOML!"
    ... weird_string = '''Literal
    ...  Multiline'''
    ... """
    >>> print(tomlkit.dumps(tomlkit.loads(toml_data)))

    [nested] # Not necessary

    [nested.table]
    string = "Hello, TOML!"
    weird_string = '''Literal
     Multiline'''

    >>> tomlkit.dumps(tomlkit.loads(toml_data)) == toml_data
    True

create a TOML document from scratch

    >>> from tomlkit import comment, document, nl, table

    >>> toml = document()
    >>> toml.add(comment("Written by TOML Kit"))
    >>> toml.add(nl())  # empty line
    >>> toml.add("board_size", 3)

convert toml to an actual TOML document by using dump() or dumps()
can also use .as_string() with the toml object

    >>> print(toml.as_string())
    # Written by TOML Kit

    board_size = 3

complete tomlkit_test.py

    from tomlkit import comment, document, nl, table

    toml = document()
    toml.add(comment("Written by TOML Kit"))
    toml.add(nl())
    toml.add("board_size", 3)

    player_x = table()
    player_x.add("symbol", "X")
    player_x.add("color", "blue")
    player_x.comment("Start player")
    toml.add("player_x", player_x)

    player_o = table()
    player_o.update({"symbol": "O", "color": "green"})
    toml["player_o"] = player_o

    print(toml.as_string())
Update Existing TOML Documents
Represent TOML as tomlkit Objects
tic-tac-toe-config.toml
    # tic-tac-toe-config.toml

    board_size = 3

    [user]
    ai_skill = 0.85 # A number between 0 (random) and 1 (expert)

        [user.player_x]
        symbol = "X"
        color = "blue"
        ai = true

        [user.player_o]
        symbol = "O"
        color = "green"
        ai = false

    # Settings used when deploying the application
    [server]
    url = "https://tictactoe.example.com"

in a REPL session, load tic-tac-toe-config.toml

    >>> import tomlkit
    >>> with open("tic-tac-toe-config.toml", mode="rt", encoding="utf-8") as fp:
    ...     config = tomlkit.load(fp)
    ...
    >>> config
    {'board_size': 3, 'user': {'ai_skill': 0.85, 'player_x': { ... }}}

    >>> type(config)
    <class 'tomlkit.toml_document.TOMLDocument'>

when using tomlkit, the file must be opened in text mode (mode="rt") and the encoding must be encoding="utf-8"
a TOMLDocument works like a dictionary, or a dictionary of dictionaries

    >>> config["user"]["player_o"]["color"]
    'green'

    >>> type(config["user"]["player_o"]["color"])
    <class 'tomlkit.items.String'>

    >>> config["user"]["player_o"]["color"].upper()
    'GREEN'

values are also special tomlkit data types, but can work with them as if they're regular Python types
the advantage of these special tomlkit data types is that they provide access to metainformation about the document

    >>> config["user"]["ai_skill"]
    0.85

    >>> config["user"]["ai_skill"].trivia.comment
    '# A number between 0 (random) and 1 (expert)'

    >>> config["user"]["player_x"].trivia.indent
    '    '

can mostly treat these special objects as if they were native Python objects, because they inherit from their native counterparts
can use .unwrap() to convert them to plain Python

    >>> config["board_size"] ** 2
    9

    >>> isinstance(config["board_size"], int)
    True

    >>> config["board_size"].unwrap()
    3

    >>> type(config["board_size"].unwrap())
    <class 'int'>

Read and Write TOML Losslessly
a TOMLDocument can be treated just like any other object
can use .add() to add new elements to it
can't use .add() to update the value of existing keys

    >>> config.add("app_name", "Tic-Tac-Toe")
    {'board_size': 3, 'app_name': 'Tic-Tac-Toe', 'user': { ... }}

    >>> config["user"].add("ai_skill", 0.6)
    Traceback (most recent call last):
      ...
    KeyAlreadyPresent: Key "ai_skill" already exists.

can instead assign the new value as if config were a regular dictionary

    >>> config["user"]["ai_skill"] = 0.6
    >>> print(config["user"].as_string())
    ai_skill = 0.6 # A number between 0 (random) and 1 (expert)

        [user.player_x]
        symbol = "X"
        color = "blue"
        ai = true

        [user.player_o]
        symbol = "O"
        color = "green"
        ai = false

parts of tomlkit support a fluent interface
operations like .add() return the updated object, so can chain another call to .add() onto it

    >>> from tomlkit import aot, comment, inline_table, nl, table
    >>> player_data = [
    ...     {"user": "gah", "first_name": "Geir Arne", "last_name": "Hjelle"},
    ...     {"user": "tompw", "first_name": "Tom", "last_name": "Preston-Werner"},
    ... ]

    >>> players = aot()
    >>> for player in player_data:
    ...     players.append(
    ...         table()
    ...         .add("username", player["user"])
    ...         .add("name",
    ...             inline_table()
    ...             .add("first", player["first_name"])
    ...             .add("last", player["last_name"])
    ...         )
    ...     )
    ...
    >>> config.add(nl()).add(comment("Players")).add("players", players)