Deep Dive into Python’s dict type

Table Of Contents

Let’s continue analysis from Performance Analysis of Python’s dict() and {} and dive deeper into the Python’s internal types for dict.

Research

 1>>> import sys
 2>>> sys.version_info
 3sys.version_info(major=3, minor=12, micro=1, releaselevel='final', serial=0)
 4>>>
 5>>> def a():
 6...   return dict()
 7>>>
 8>>> def b():
 9...   return {}
10>>>
11>>> import dis
12>>>
13>>> dis.dis(a)
14  1           0 RESUME                   0
15
16  2           2 LOAD_GLOBAL              1 (NULL + dict)
17             12 CALL                     0
18             20 RETURN_VALUE
19>>> dis.dis(b)
20  1           0 RESUME                   0
21
22  2           2 BUILD_MAP                0
23              4 RETURN_VALUE
1$ python -m timeit "dict(a='value', another='value')"
25000000 loops, best of 5: 71 nsec per loop
3
4$ python -m timeit "{'a': 'value','another': 'value'}"
55000000 loops, best of 5: 51.1 nsec per loop

C code from cpython

dict type

Python’s dict type is defined inside the interpreter:

1PyTypeObject PyDict_Type = {
2 // ...
3 dict_init, /* tp_init */
4 _PyType_AllocNoTrack, /* tp_alloc */
5 dict_new, /* tp_new */
6 // ...
7}

See: https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3962

When a new dict object is created (dict()), it gets created by the dict_new function, which internally uses set tp_alloc to allocate a new memory region, and then initialises with the dict_init function.

Most of the dict_init logic is contained in the dict_update_common function. First it gets filled up with unpacked args, and then with kwargs. In both cases it uses the dict_merge function, which does a lot.

{} literal expression

Python’s {} literal expression gets compiled into BUILD_MAP opcode. The opcode is interpreted by one of the following:

Python/bytecodes.c#BUILD_MAP, Python/executor_cases.c.h#_BUILD_MAP, Python/generated_cases.c.h#BUILD_MAP

Note

The top of the executor_cases.c.h and generated_cases.c.h files says:

1// This file is generated by ...
2// from:
3//   Python/bytecodes.c
4// Do not edit!

Therefore, we should really look only at the true source: bytecodes.c.

The main logic of the opcode is contained in _PyDict_FromItems function form the dictobject.c file.

1// Include/internal/pycore_dict.h
2extern PyObject *_PyDict_FromItems(
3 PyObject *const *keys, Py_ssize_t keys_offset,
4 PyObject *const *values, Py_ssize_t values_offset,
5 Py_ssize_t length);

The _PyDict_FromItems function:

  • checks if all keys on the stack are “valid” (see PyUnicode_CheckExact function)
  • constructs a pre-sized dict object (see dict_new_presized function),
  • inserts key-values one-by-one from the stack,
  • returns the dict object.

Case for list() vs. []

A similar analysis can be done for Python’s lists.

 1>>> import sys
 2>>> sys.version_info
 3sys.version_info(major=3, minor=12, micro=1, releaselevel='final', serial=0)
 4>>>
 5>>> def a():
 6...   return list((1, 2))
 7>>>
 8>>> def b():
 9...   return [1, 2]
10>>>
11>>> import dis
12>>>
13>>> dis.dis(a)
14  1           0 RESUME                   0
15
16  2           2 LOAD_GLOBAL              1 (NULL + list)
17             12 LOAD_CONST               1 ((1, 2))
18             14 CALL                     1
19             22 RETURN_VALUE
20>>> dis.dis(b)
21  1           0 RESUME                   0
22
23  2           2 LOAD_CONST               1 (1)
24              4 LOAD_CONST               2 (2)
25              6 BUILD_LIST               2
26              8 RETURN_VALUE
1$ python -m timeit "list((1, 2, 'a'))"
25000000 loops, best of 5: 53 nsec per loop
3$ python -m timeit "[(1, 2, 'a')]"
410000000 loops, best of 5: 30.4 nsec per loop

list type

The Python’s list type is defined in the listobject.c file:

1PyTypeObject PyList_Type = {
2 // ...
3 (initproc)list___init__, /* tp_init */
4 PyType_GenericAlloc, /* tp_alloc */
5 PyType_GenericNew, /* tp_new */
6 // ...
7}

PyType_GenericNew does nothing – it ignores args and kwargs, and return an object allocated by type->typ_alloc. Source: https://github.com/python/cpython/blob/main/Objects/typeobject.c#L1768

PyType_GenericAlloc allocated a new object and (if it’s garbage-collectable) marks it as “to-be-collected”.

list___init__ is a convenience wrapper around the list___init___impl function, which does the actual initialisation. It does some basic pre-steps:

  • fails if any kwarg has been passed,
  • fails if args has more then 1 element,
  • converts the first argument of args to an iterable.

list___init___impl clears the list (if not empty) and extends it by the provided iterable (by calling the list_extend function).

Question

I think the two:

1list((1, 2, 3))
2
3_tmp = list()
4_tmp.extend((1, 2, 3))

are equivalent in every way (but the generated opcodes sequence) and are calling the same underlying C code.

[] literal expression

Opcode BUILD_LIST uses the _PyList_FromArraySteal function to construct a list object and return it.

Note

_PyList_FromArraySteal is used since 3.12. See 3.12 changelog and gh-100146. Before 3.12, the opcode was repeatedly calling POP() to get items from the stack, and then inserting them into the list.

1while (--oparg >= 0) {
2    PyObject *item = POP();
3    PyList_SET_ITEM(list, oparg, item);
4}

See change in gh-100146 PR.

The _PyList_FromArraySteal helper creates an empty list (if there are no items), or preallocates the list before inserting items to it.

1PyObject **dst = list->ob_item;
2memcpy(dst, src, n * sizeof(PyObject *));

Case of set((1, 2)) vs. {1, 2}

… and similar analysis can be done for sets.

Warning
AFAIK there’s no way to construct a new set with a literal expression, hence the following analysis is performed for sets with two elements: 1 and 2.
1$ python -m timeit "set((1, 2))"
25000000 loops, best of 5: 82.6 nsec per loop
3$ python -m timeit "{1, 2}"
45000000 loops, best of 5: 45.5 nsec per loop
 1>>> import sys
 2>>> sys.version_info
 3sys.version_info(major=3, minor=12, micro=1, releaselevel='final', serial=0)
 4>>>
 5>>> def a():
 6...   return set((1, 2))
 7>>>
 8>>> def b():
 9...   return {1, 2}
10>>>
11>>> import dis
12>>>
13>>> dis.dis(a)
14  1           0 RESUME                   0
15
16  2           2 LOAD_GLOBAL              1 (NULL + set)
17             12 LOAD_CONST               1 ((1, 2))
18             14 CALL                     1
19             22 RETURN_VALUE
20>>> dis.dis(b)
21  1           0 RESUME                   0
22
23  2           2 LOAD_CONST               1 (1)
24              4 LOAD_CONST               2 (2)
25              6 BUILD_SET                2
26              8 RETURN_VALUE

set type

The Python’s set type is defined in the setobject.c file:

1PyTypeObject PySet_Type = {
2 // ...
3 (initproc)set_init, /* tp_init */
4 PyType_GenericAlloc, /* tp_alloc */
5 set_new, /* tp_new */
6 // ...
7}

set_new uses the allocated to create a new empty set object. Internally it calls the make_new_set function.

set_init performs some pre-steps:

  • fails if any kwarg has been passed,
  • fails if args has more then 1 element,
  • converts the first argument of args to an iterable,
  • if the set object has already been filled, then it clears it

… and calls the set_update_internal function, to insert values into the set object.

{1, 2} literal expression

The opcode handler has no dedicated helpers. It constructs a new set (empty), and then iterates over all items from the stack and insets them one-by-one.

 1inst(BUILD_SET, (values[oparg] -- set)) {
 2 set = PySet_New(NULL);
 3 if (set == NULL)
 4  GOTO_ERROR(error);
 5 int err = 0;
 6 for (int i = 0; i < oparg; i++) {
 7  PyObject *item = values[i];
 8  if (err == 0)
 9   err = PySet_Add(set, item);
10  Py_DECREF(item);
11 }
12 if (err != 0) {
13  Py_DECREF(set);
14  ERROR_IF(true, error);
15 }
16}

Source: https://github.com/python/cpython/blob/main/Python/bytecodes.c#L1627

Warning

The generated bytecode is different if we add another argument to the literal expression: {1, 2, 3}. Then it becomes:

1   2 BUILD_SET                0
2   4 LOAD_CONST               1 (frozenset({1, 2, 3}))
3   6 SET_UPDATE               1
4   8 RETURN_VALUE

Case of tuple((1, 2, []) vs. (1, 2, [])

This one’s tricky. Constructing a tuple with the tuple type requires an iterable. The easiest way of getting one it to create a tuple literal, which is obviously less efficient, than using a tuple literal syntax in the first place.

 1>>> import sys
 2>>> sys.version_info
 3sys.version_info(major=3, minor=12, micro=1, releaselevel='final', serial=0)
 4>>>
 5>>> def a():
 6...   return tuple((1, 2, []))
 7>>>
 8>>> def b():
 9...   return (1, 2, [])
10>>>
11>>> import dis
12>>>
13>>> dis.dis(a)
14  1           0 RESUME                   0
15
16  2           2 LOAD_GLOBAL              1 (NULL + tuple)
17             12 LOAD_CONST               1 (1)
18             14 LOAD_CONST               2 (2)
19             16 BUILD_LIST               0
20             18 BUILD_TUPLE              3
21             20 CALL                     1
22             28 RETURN_VALUE
23>>> dis.dis(b)
24  1           0 RESUME                   0
25
26  2           2 LOAD_CONST               1 (1)
27              4 LOAD_CONST               2 (2)
28              6 BUILD_LIST               0
29              8 BUILD_TUPLE              3
30             10 RETURN_VALUE

It might not make much sense to include this example in the article.

tuple type

The Python’s tuple type is defined in tupleobject.c. Interestingly, it has no tp_alloc, nor tp_init.

1PyTypeObject PyTuple_Type = {
2 // ...
3 0, /* tp_init */
4 0, /* tp_alloc */
5 tuple_new, /* tp_new */
6 // ...
7}

The tuple_new function is responsible for allocating a new tuple object, with the items passed as a parameter.

Questions

References


Interested in my work?

Consider subscribing to the RSS Feed or joining my mailing list: madebyme-notifications on Google Groups .


Disclaimer: Only group owner (i.e. me) can view e-mail addresses of group members. I will not share your e-mail with any third-parties — it will be used exclusively to notify you about new blog posts.