Performance Analysis of Python’s dict() and {}
Some time ago, during a code review, I had a discussion with a colleague of mine about preferring dict() over {} in new Python code. They argued that dict() is more readable — and expresses intent more clearly — and therefore should be preferred. I wasn’t convinced by that, but at the time I didn’t have any counterarguments, so I let it go. Yet that made me wonder: what’s the difference between the dict type and the {} literal expression?
Let’s go down the rabbit hole.
Python’s dictionaries
First, let’s check the performance difference between the two methods. I’ll use the timeit module for that.
$ python -m timeit "dict()"
10000000 loops, best of 5: 40 nsec per loop
$ python -m timeit "{}"
20000000 loops, best of 5: 19.6 nsec per loop
On my MacBook M1 the difference is almost 2x. A non-trivial difference, especially knowing that both of these expressions produce the exact same object.
Where does the difference come from?
Before we get into that, we need to sidetrack a little bit and talk about what happens when you execute Python code. Python is an interpreted language — it needs an interpreter. CPython is the most widely used Python interpreter; it’s the reference implementation, meaning that other interpreters use it to determine the “correct” behavior. If you’ve installed Python from a default distribution, you have CPython installed on your machine.
Interestingly, CPython is both a compiler and an interpreter. When executing Python code, it first compiles it into bytecode instructions and then interprets them.
To better understand the performance difference between dict() and {}, let’s compare the bytecode instructions they compile into. Python has a built-in module called dis which does exactly that.
>>> import dis
>>> def a():
...     return dict()
...
>>> def b():
...     return {}
...
>>> dis.dis(a)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + dict)
             12 CALL                     0
             20 RETURN_VALUE
>>> dis.dis(b)
  1           0 RESUME                   0

  2           2 BUILD_MAP                0
              4 RETURN_VALUE
Although they produce the same end result, these two expressions quite clearly execute different code.
Bytecode instructions
The output of dis.dis isn’t very hard to understand.
 (1) | (2) | (3)          | (4) | (5)
-----|-----|--------------|-----|---------------
  1  |  0  | RESUME       |  0  |
     |     |              |     |
  2  |  2  | LOAD_GLOBAL  |  1  | (NULL + dict)
     | 12  | CALL         |  0  |
     | 20  | RETURN_VALUE |     |
Each column has a purpose:
- Line number in the source code.
- The address of the instruction — byte index in the compiled bytecode.
- The bytecode name (opcode).
- Operation parameters.
- Human-readable interpretation of the operation parameters.
Alright, equipped with this knowledge and by cross-referencing the opcode set, we know that:
- RESUME — does nothing; it performs internal tracking, debugging, and optimization checks.
- LOAD_GLOBAL — loads a global variable onto the stack. From the human-readable interpretation of the opcode parameters we know that it loads dict (ignore NULL for now).
- CALL — calls a callable object with the number of arguments specified by argc — in our case it’s zero.
- RETURN_VALUE — returns the last element from the stack to the caller.
Compiling return {} yields one opcode we haven’t seen yet:
- BUILD_MAP — pushes a new dictionary object onto the stack. Pops 2 * count items off the stack so that the dictionary holds count entries.
Already we have a few obvious differences between the two cases. The {} expression builds a dictionary directly, while dict() delegates that to a callable object. Before that can even happen, it first needs to load the global value (dict) onto the stack — and it does that every single time we call the function.
Why?
Because dict is not a constant: it can be overwritten and can therefore produce a different value. It can happen. This is why Python has to pay the overhead of loading and calling a callable on every call of the function.
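To see what “it can happen” means in practice, here is a contrived example of my own (not something you’d write in real code): rebinding the name dict changes what dict() produces, while the {} literal is unaffected.

# Shadowing the built-in name `dict` (don't do this in real code).
def dict(**kwargs):
    return "not a dict at all"

print(dict(a=1))  # "not a dict at all": the call goes to our function
print({})         # {}: the literal still builds a real dictionary

del dict          # remove the shadowing name; the built-in is visible again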
OK, it all sounds very neat. Calling a callable does create overhead, and it’s reasonable to assume that the difference we saw at the beginning of this post is a result of it. However, are we sure that dict() internally runs the same code as {} does? dict is a class and, luckily, the dis.dis function can also disassemble classes.
Example
Let’s try this for dict:
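Something along these lines in a Python 3.12 REPL:

>>> import dis
>>> dis.dis(dict)
>>>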
… it doesn’t print anything? Why?
Well, the dis module isn’t very useful for internal types, and dict is one of those — it’s implemented in C within the interpreter’s source code, so there is no Python bytecode to show.
Getting lost in the CPython source code
To tap into the forbidden magic, we need to clone the CPython repository first. We don’t need all the
history, so let’s --depth 1
this.
git clone --depth 1 --branch v3.12.0 https://github.com/python/cpython.git
# or
git clone --depth 1 --branch v3.12.0 git@github.com:python/cpython.git
Then we need to find what we’re actually looking for — the place where opcode instructions are interpreted. A cup of coffee and a lot of greps later, we find the Python/bytecodes.c file. It consists of a single switch statement, and it seems like all bytecode instructions are interpreted there.
The dict type
Let’s tackle dict first. All internal types are defined as PyTypeObject objects, and the dict type is defined in the dictobject.c file. It has a lot of attributes defined, but only two are of interest for us:
PyTypeObject PyDict_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "dict",
    sizeof(PyDictObject),
    // ...
    dict_init,                /* tp_init */
    // ...
    dict_new,                 /* tp_new */
    // ...
};
The pair (dict_new and dict_init) will tell us what happens when someone creates a new dictionary.
Note
The constructor in Python is called __new__. It’s a static method that returns a new object of its type. The initializer method is called __init__. It takes a newly created object and initializes its attributes. The __init__ method is called after the __new__ method.
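A quick illustration in plain Python (a toy example of my own, not related to dict internals):

class Point:
    def __new__(cls, *args, **kwargs):
        # Runs first: allocate and return a bare, uninitialized instance.
        print("__new__ called")
        return super().__new__(cls)

    def __init__(self, x, y):
        # Runs second: fill in the attributes of the freshly created instance.
        print("__init__ called")
        self.x, self.y = x, y

p = Point(1, 2)  # prints "__new__ called", then "__init__ called"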
The dict_new function is defined here: dictobject.c#L3751.
 1  static PyObject *
 2  dict_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
 3  {
 4      assert(type != NULL);
 5      assert(type->tp_alloc != NULL);
 6      // dict subclasses must implement the GC protocol
 7      assert(_PyType_IS_GC(type));
 8
 9      PyObject *self = type->tp_alloc(type, 0);
10      if (self == NULL) {
11          return NULL;
12      }
13      PyDictObject *d = (PyDictObject *)self;
14
15      d->ma_used = 0;
16      d->ma_version_tag = DICT_NEXT_VERSION(
17          _PyInterpreterState_GET());
18      dictkeys_incref(Py_EMPTY_KEYS);
19      d->ma_keys = Py_EMPTY_KEYS;
20      d->ma_values = NULL;
21      ASSERT_CONSISTENT(d);
22
23      // ...
24
25      return self;
26  }
We see that, first, it allocates a new object via the provided allocator (9th line) and then sets some of its internal fields (15th, 19th, and 20th). Interestingly for us, it does not pre-allocate the dictionary’s internal memory (it merely points ma_keys at the shared, empty Py_EMPTY_KEYS object, 18th and 19th lines). It might seem strange at first, but please remember: an object’s initialization — like memory pre-allocation — is not a responsibility of the __new__ method; that’s what the __init__ method is for.
With the new object in memory, the dict_init function can insert entries into it. The insertion logic is delegated to the dict_update_common function, which can be found here: dictobject.c#L2678.
static int
dict_update_common(PyObject *self, PyObject *args, PyObject *kwds,
                   const char *methname)
{
    PyObject *arg = NULL;
    int result = 0;

    if (!PyArg_UnpackTuple(args, methname, 0, 1, &arg)) {
        result = -1;
    }
    else if (arg != NULL) {
        result = dict_update_arg(self, arg);
    }

    if (result == 0 && kwds != NULL) {
        if (PyArg_ValidateKeywordArguments(kwds))
            result = PyDict_Merge(self, kwds, 1);
        else
            result = -1;
    }
    return result;
}
It updates the dictionary from both args and kwargs.
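In Python terms, both paths can be exercised in a single call (a trivial example of my own): the positional mapping goes through dict_update_arg, the keyword arguments go through PyDict_Merge.

# The positional argument takes the dict_update_arg path,
# the keyword arguments take the PyDict_Merge path.
d = dict({"a": 1}, b=2, c=3)
print(d)  # {'a': 1, 'b': 2, 'c': 3}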
args & dict_update_arg
For args it calls the dict_update_arg function.
static int
dict_update_arg(PyObject *self, PyObject *arg)
{
    if (PyDict_CheckExact(arg)) {
        return PyDict_Merge(self, arg, 1);
    }
    PyObject *func;
    if (_PyObject_LookupAttr(arg, &_Py_ID(keys), &func) < 0) {
        return -1;
    }
    if (func != NULL) {
        Py_DECREF(func);
        return PyDict_Merge(self, arg, 1);
    }
    return PyDict_MergeFromSeq2(self, arg, 1);
}
The function checks whether arg is a dictionary (or at least has a keys method); if so, it merges the two (PyDict_Merge), otherwise it adds new entries from a sequence of key-value pairs (PyDict_MergeFromSeq2).
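Both branches are reachable from everyday Python; a small illustration of my own:

# arg is a dict (or exposes a keys() method): merged via PyDict_Merge
print(dict({"a": 1, "b": 2}))      # {'a': 1, 'b': 2}

# arg is a plain sequence of key-value pairs: PyDict_MergeFromSeq2
print(dict([("a", 1), ("b", 2)]))  # {'a': 1, 'b': 2}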
kwargs & PyDict_Merge
The kwargs path ends in the same place — PyDict_Merge. Let’s take a look.
Internally, it delegates all the logic to the dict_merge function.
static int
dict_merge(PyInterpreterState *interp, PyObject *a, PyObject *b, int override)
{
    PyDictObject *mp, *other;

    assert(0 <= override && override <= 2);

    /* We accept for the argument either a concrete dictionary object,
     * or an abstract "mapping" object.  For the former, we can do
     * things quite efficiently.  For the latter, we only require that
     * PyMapping_Keys() and PyObject_GetItem() be supported.
     */
    if (a == NULL || !PyDict_Check(a) || b == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }
    mp = (PyDictObject*)a;
    if (PyDict_Check(b) && (Py_TYPE(b)->tp_iter == (getiterfunc)dict_iter)) {
        // ...
    }
    else {
        /* Do it the generic, slower way */
        // ...
    }
    ASSERT_CONSISTENT(a);
    return 0;
}
From the comment, we know that the function has been optimized for being called with another dictionary as a parameter. If the parameter is not a dictionary, then it’s treated as a generic Mapping — a container object that supports arbitrary key lookups.
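Anything that quacks like a mapping ends up on that slower, generic path. A minimal, hypothetical example of my own — just enough keys() and __getitem__ for dict() to accept it:

class UpperMapping:
    # A toy mapping-like object: keys() plus item access is all dict() needs.
    def __init__(self, data):
        self._data = data

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key].upper()

print(dict(UpperMapping({"a": "x", "b": "y"})))  # {'a': 'X', 'b': 'Y'}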
The {} expression
The literal expression should be easier to reason about. Let’s go back to the Python/bytecodes.c file and find the mapping for the BUILD_MAP opcode.
inst(BUILD_MAP, (values[oparg*2] -- map)) {
    map = _PyDict_FromItems(
            values, 2,
            values+1, 2,
            oparg);
    if (map == NULL)
        goto error;

    DECREF_INPUTS();
    ERROR_IF(map == NULL, error);
}
See sources here: bytecodes.c#L1541.
We see that the dictionary is fully constructed and returned by _PyDict_FromItems. Let’s go there.
PyObject *
_PyDict_FromItems(PyObject *const *keys, Py_ssize_t keys_offset,
                  PyObject *const *values, Py_ssize_t values_offset,
                  Py_ssize_t length)
{
    bool unicode = true;
    PyObject *const *ks = keys;
    PyInterpreterState *interp = _PyInterpreterState_GET();

    for (Py_ssize_t i = 0; i < length; i++) {
        if (!PyUnicode_CheckExact(*ks)) {
            unicode = false;
            break;
        }
        ks += keys_offset;
    }

    PyObject *dict = dict_new_presized(interp, length, unicode);
    if (dict == NULL) {
        return NULL;
    }

    ks = keys;
    PyObject *const *vs = values;

    for (Py_ssize_t i = 0; i < length; i++) {
        PyObject *key = *ks;
        PyObject *value = *vs;
        if (PyDict_SetItem(dict, key, value) < 0) {
            Py_DECREF(dict);
            return NULL;
        }
        ks += keys_offset;
        vs += values_offset;
    }

    return dict;
}
The most interesting line is the call to dict_new_presized: in contrast to dict_new, it pre-allocates the dictionary, so it already has capacity for all of its entries. After that it inserts the key-value pairs one-by-one.
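As a rough Python-level sketch of the same idea (not real CPython code; presize below is a hypothetical stand-in for dict_new_presized):

def presize(n):
    # Stand-in for dict_new_presized: the real helper reserves room for
    # n entries up front, while a plain Python dict has to grow as it fills.
    return {}

def dict_from_items(keys, values):
    d = presize(len(keys))
    for k, v in zip(keys, values):
        d[k] = v  # one PyDict_SetItem per entry
    return d

print(dict_from_items(["a", "b"], [1, 2]))  # {'a': 1, 'b': 2}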
Conclusions
To sum things up.
When you do dict(a=1, b=2), Python needs to:
- allocate a new PyObject,
- construct a dict via the __new__ method (which, as we saw, does not pre-allocate any storage),
- call its __init__ method, which internally calls PyDict_Merge,
- PyDict_Merge then copies the keyword arguments into the new dictionary entry by entry.
Whereas doing {} causes Python to:
- construct a new, pre-allocated dictionary,
- insert entries one-by-one.
To be fair, unless you construct dictionaries in a loop, I don’t believe there’s a lot to gain performance-wise by switching from dict() to {}. There’s one thing I’d like you to remember after reading this post:
tl;dr
{} is always faster than dict().
Appendix
Similar analysis can be done for lists, sets, and (possibly) tuples. However, this post is already too long, so I’ll drop all the other resources I’ve gathered here.
Lists
$ python -m timeit "list((1, 2, 'a'))"
5000000 loops, best of 5: 53.4 nsec per loop
$ python -m timeit "[1, 2, 'a']"
10000000 loops, best of 5: 29.7 nsec per loop
The list type
Python’s list type is defined in the listobject.c file:
PyTypeObject PyList_Type = {
    // ...
    (initproc)list___init__,  /* tp_init */
    PyType_GenericAlloc,      /* tp_alloc */
    PyType_GenericNew,        /* tp_new */
    // ...
};
PyType_GenericNew does almost nothing — it ignores args and kwargs, and returns an object allocated by type->tp_alloc. Source:
PyType_GenericAlloc allocates a new object and (if it’s garbage-collectable) marks it as tracked by the garbage collector.
list___init__ is a convenience wrapper around the list___init___impl function, which does the actual initialization. It does some basic pre-steps (illustrated below):
- fails if any kwarg has been passed,
- fails if args has more than 1 element,
- converts the first argument of args to an iterable.
list___init___impl clears the list (if not empty) and extends it with the provided iterable (by calling the list_extend function).
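Those pre-steps are easy to poke at from the REPL (tracebacks trimmed; the exact messages may differ slightly between versions):

>>> list(1)
TypeError: 'int' object is not iterable
>>> list([1], [2])
TypeError: list expected at most 1 argument, got 2
>>> list(seq=[1])
TypeError: list() takes no keyword arguments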
The [] literal expression
Opcode BUILD_LIST uses the _PyList_FromArraySteal function to construct a list object and return it.
Note
_PyList_FromArraySteal is used since 3.12. See the 3.12 changelog and gh-100146. Before 3.12, the opcode repeatedly called POP() to get items from the stack and then inserted them into the list.
The _PyList_FromArraySteal helper creates an empty list (if there are no items), or pre-allocates the list before inserting items into it.
Sets
Warning
The set() call below also has to build a tuple out of 1 and 2 before the set itself is constructed, so part of the measured difference comes from that extra work.
$ python -m timeit "set((1, 2))"
5000000 loops, best of 5: 82.6 nsec per loop
$ python -m timeit "{1, 2}"
5000000 loops, best of 5: 45.5 nsec per loop
The set type
Python’s set type is defined in the setobject.c file:
PyTypeObject PySet_Type = {
    // ...
    (initproc)set_init,       /* tp_init */
    PyType_GenericAlloc,      /* tp_alloc */
    set_new,                  /* tp_new */
    // ...
};
set_new uses the allocator to create a new, empty set object. Internally it calls the make_new_set function.
set_init performs some pre-steps:
- fails if any kwarg has been passed,
- fails if args has more than 1 element,
- converts the first argument of args to an iterable,
- if the set object has already been filled, it clears it,
… and then calls the set_update_internal function to insert the values into the set object.
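set_update_internal accepts any iterable, which is why all of these work (a quick illustration of my own; element order in the output may vary):

print(set("aab"))             # {'a', 'b'}: duplicates collapse
print(set(range(3)))          # {0, 1, 2}
print(set({"a": 1, "b": 2}))  # {'a', 'b'}: iterating a dict yields its keys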
The {1, 2} literal expression
The opcode handler has no dedicated helpers. It constructs a new (empty) set, then iterates over all the items from the stack and inserts them one-by-one.
inst(BUILD_SET, (values[oparg] -- set)) {
    set = PySet_New(NULL);
    if (set == NULL)
        GOTO_ERROR(error);
    int err = 0;
    for (int i = 0; i < oparg; i++) {
        PyObject *item = values[i];
        if (err == 0)
            err = PySet_Add(set, item);
        Py_DECREF(item);
    }
    if (err != 0) {
        Py_DECREF(set);
        ERROR_IF(true, error);
    }
}
Source:
Tuples
Warning
This one’s tricky. Constructing a tuple with the tuple type requires an iterable. The easiest way of getting one is to create a tuple literal, which is obviously less efficient than using the tuple literal syntax in the first place.
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=1, releaselevel='final', serial=0)
>>>
>>> def a():
...     return tuple((1, 2, []))
>>>
>>> def b():
...     return (1, 2, [])
>>>
>>> import dis
>>>
>>> dis.dis(a)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + tuple)
             12 LOAD_CONST               1 (1)
             14 LOAD_CONST               2 (2)
             16 BUILD_LIST               0
             18 BUILD_TUPLE              3
             20 CALL                     1
             28 RETURN_VALUE
>>> dis.dis(b)
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (1)
              4 LOAD_CONST               2 (2)
              6 BUILD_LIST               0
              8 BUILD_TUPLE              3
             10 RETURN_VALUE
It might not make much sense to include this example in the article.
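One detail worth noting (my own observation): the [] in the example above is what keeps the compiler from folding the whole tuple into a constant. With only constant elements there is no BUILD_TUPLE at all; the tuple is baked into the code object as a single constant, which you can check yourself:

>>> dis.dis(lambda: (1, 2, 3))

On 3.12 the output is just a RESUME followed by a return of the pre-built constant (1, 2, 3) — no BUILD_TUPLE in sight.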
The tuple type
Python’s tuple type is defined in tupleobject.c. Interestingly, it has neither tp_alloc nor tp_init.
PyTypeObject PyTuple_Type = {
    // ...
    0,                        /* tp_init */
    0,                        /* tp_alloc */
    tuple_new,                /* tp_new */
    // ...
};
The tuple_new function is responsible for allocating a new tuple object and filling it with the items passed as a parameter.