Commit e55c2b7f authored by Stefan Behnel's avatar Stefan Behnel Committed by GitHub

Merge pull request #2852 from jdemeyer/trashcan

Document trashcan
parents f781880b e2d73ad2
......@@ -33,6 +33,10 @@ Features added
* ``--no-capture`` added to ``runtests.py`` to prevent stdout/stderr capturing
during srctree tests. Patch by Matti Picus.
* ``@cython.trashcan(True)`` can be used on an extension type to enable the
CPython trashcan. This allows deallocating deeply recursive objects without
overflowing the stack. Patch by Jeroen Demeyer. (Github issue #2842)
Bugs fixed
----------
......
......@@ -611,10 +611,95 @@ object called :attr:`__weakref__`. For example,::
cdef object __weakref__
Controlling cyclic garbage collection in CPython
================================================
Controlling deallocation and garbage collection in CPython
==========================================================
By default each extension type will support the cyclic garbage collector of
.. NOTE::
This section only applies to the usual CPython implementation
of Python. Other implementations like PyPy work differently.
.. _dealloc_intro:
Introduction
------------
First of all, it is good to understand that there are two ways to
trigger deallocation of Python objects in CPython:
CPython uses reference counting for all objects and any object with a
reference count of zero is immediately deallocated. This is the most
common way of deallocating an object. For example, consider ::
>>> x = "foo"
>>> x = "bar"
After executing the second line, the string ``"foo"`` is no longer referenced,
so it is deallocated. This is done using the ``tp_dealloc`` slot, which can be
customized in Cython by implementing ``__dealloc__``.
The second mechanism is the cyclic garbage collector.
This is meant to resolve cyclic reference cycles such as ::
>>> class Object:
... pass
>>> def make_cycle():
... x = Object()
... y = [x]
... x.attr = y
When calling ``make_cycle``, a reference cycle is created since ``x``
references ``y`` and vice versa. Even though neither ``x`` or ``y``
are accessible after ``make_cycle`` returns, both have a reference count
of 1, so they are not immediately deallocated. At regular times, the garbage
collector runs, which will notice the reference cycle
(using the ``tp_traverse`` slot) and break it.
Breaking a reference cycle means taking an object in the cycle
and removing all references from it to other Python objects (we call this
*clearing* an object). Clearing is almost the same as deallocating, except
that the actual object is not yet freed. For ``x`` in the example above,
the attributes of ``x`` would be removed from ``x``.
Note that it suffices to clear just one object in the reference cycle,
since there is no longer a cycle after clearing one object. Once the cycle
is broken, the usual refcount-based deallocation will actually remove the
objects from memory. Clearing is implemented in the ``tp_clear`` slot.
As we just explained, it is sufficient that one object in the cycle
implements ``tp_clear``.
Enabling the deallocation trashcan
----------------------------------
In CPython, it is possible to create deeply recursive objects. For example::
>>> L = None
>>> for i in range(2**20):
... L = [L]
Now imagine that we delete the final ``L``. Then ``L`` deallocates
``L[0]``, which deallocates ``L[0][0]`` and so on until we reach a
recursion depth of ``2**20``. This deallocation is done in C and such
a deep recursion will likely overflow the C call stack, crashing Python.
CPython invented a mechanism for this called the *trashcan*. It limits the
recursion depth of deallocations by delaying some deallocations.
By default, Cython extension types do not use the trashcan but it can be
enabled by setting the ``trashcan`` directive to ``True``. For example::
cimport cython
@cython.trashcan(True)
cdef class Object:
cdef dict __dict__
Trashcan usage is inherited by subclasses
(unless explicitly disabled by ``@cython.trashcan(False)``).
Some builtin types like ``list`` use the trashcan, so subclasses of it
use the trashcan by default.
Disabling cycle breaking (``tp_clear``)
---------------------------------------
By default, each extension type will support the cyclic garbage collector of
CPython. If any Python objects can be referenced, Cython will automatically
generate the ``tp_traverse`` and ``tp_clear`` slots. This is usually what you
want.
......@@ -622,13 +707,13 @@ want.
There is at least one reason why this might not be what you want: If you need
to cleanup some external resources in the ``__dealloc__`` special function and
your object happened to be in a reference cycle, the garbage collector may
have triggered a call to ``tp_clear`` to drop references. This is the way that
reference cycles are broken so that the garbage can actually be reclaimed.
have triggered a call to ``tp_clear`` to clear the object
(see :ref:`dealloc_intro`).
In that case any object references have vanished by the time when
``__dealloc__`` is called. Now your cleanup code lost access to the objects it
has to clean up. In that case you can disable the cycle breaker ``tp_clear``
by using the ``no_gc_clear`` decorator ::
In that case, any object references have vanished when ``__dealloc__``
is called. Now your cleanup code lost access to the objects it has to clean up.
To fix this, you can disable clearing instances of a specific class by using
the ``no_gc_clear`` directive::
@cython.no_gc_clear
cdef class DBCursor:
......@@ -641,17 +726,21 @@ by using the ``no_gc_clear`` decorator ::
This example tries to close a cursor via a database connection when the Python
object is destroyed. The ``DBConnection`` object is kept alive by the reference
from ``DBCursor``. But if a cursor happens to be in a reference cycle, the
garbage collector may effectively "steal" the database connection reference,
garbage collector may delete the database connection reference,
which makes it impossible to clean up the cursor.
Using the ``no_gc_clear`` decorator this can not happen anymore because the
references of a cursor object will not be cleared anymore.
If you use ``no_gc_clear``, it is important that any given reference cycle
contains at least one object *without* ``no_gc_clear``. Otherwise, the cycle
cannot be broken, which is a memory leak.
Disabling cyclic garbage collection
-----------------------------------
In rare cases, extension types can be guaranteed not to participate in cycles,
but the compiler won't be able to prove this. This would be the case if
the class can never reference itself, even indirectly.
In that case, you can manually disable cycle collection by using the
``no_gc`` decorator, but beware that doing so when in fact the extension type
``no_gc`` directive, but beware that doing so when in fact the extension type
can participate in cycles could cause memory leaks ::
@cython.no_gc
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment