bigfile/py: Properly untrack PyVMA from GC before dealloc

On a testing instance we started to see segfaults in pyvma_dealloc() with inside calls to vma_unmap but with NULL pyvma->fileh. That was strange, becuse before calling vma_unmap(), the code explicitly checks whether pyvma->fileh is !NULL. That was, as it turned out, due to pyvma_dealloc being called twice at the same time from two python threads. Here is how that was possible: T1 decrefs pyvma and finds its reference count drops to zero. It calls pyvma_dealloc. From there vma_unmap() is called, which calls virt_lock() and that releases GIL first. Another thread T2 was waiting for GIL, it acquires it, does some work at python level and somehow triggers GC. Since PyVMA supports cyclic GC, it was on GC list and thus GC calls dealloc for the same vma again. Here is how it looks in the backtraces: T1: #0 0x00007f6aefc57827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x1e011d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 #1 do_futex_wait (sem=sem@entry=0x1e011d0, abstime=0x0) at sem_waitcommon.c:111 #2 0x00007f6aefc578d4 in __new_sem_wait_slow (sem=0x1e011d0, abstime=0x0) at sem_waitcommon.c:181 #3 0x00007f6aefc5797a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29 #4 0x00000000004ffbc4 in PyThread_acquire_lock () #5 0x00000000004dbe8a in PyEval_RestoreThread () #6 0x00007f6ac6d3b8fc in py_gil_retake_if_waslocked (arg=0x4f18f00) at bigfile/_bigfile.c:1048 #7 0x00007f6ac6d3dcfc in virt_gil_retake_if_waslocked (gilstate=0x4f18f00) at bigfile/virtmem.c:78 #8 0x00007f6ac6d3dd30 in virt_lock () at bigfile/virtmem.c:92 #9 0x00007f6ac6d3e724 in vma_unmap (vma=0x7f6a7e0c4100) at bigfile/virtmem.c:271 #10 0x00007f6ac6d3a0bc in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:284 ... #13 0x00000000004d76b0 in PyEval_EvalFrameEx () T2: #5 0x00007f6ac6d3a081 in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:276 #6 0x0000000000500450 in ?? () #7 0x00000000004ffd82 in _PyObject_GC_New () #8 0x0000000000485392 in PyList_New () #9 0x00000000004d3bff in PyEval_EvalFrameEx () T2 does the work of vma_unmap and clears C-level vma. Then, when T1 wakes up and returns to vma_unmap, it sees vma->file and all other fields cleared -> oops segfault. Fix it by removing pyvma from GC list before going to do actual destruction. This way if a concurrent GC triggers, it won't see the vma object on its list, and thus won't have a chance to invoke its destructor the second time. The bug was introduced in 450ad804 (bigarray: ArrayRef support for BigArray) when PyVMA was changed to be cyclic-GC aware. However at that time, even Python documentation itself was not saying PyObject_GC_UnTrack is needed, as it was added only in 2.7.15 after finding that many types in CPython itself are vulnerable to similar segfaults: https://github.com/python/cpython/commit/4cde4bdcc86 https://bugs.python.org/issue31095 It is pity, that CPython took the approach to force all type authors to care to invoke PyObject_GC_UnTrack explicitly, instead of doing that automatically in Python runtime before calling tp_dealloc. /cc @Tyagov, @klaus /reviewed-on nexedi/wendelin.core!11

bigfile/py: Properly untrack PyVMA from GC before dealloc
On a testing instance we started to see segfaults in pyvma_dealloc() with inside calls to vma_unmap but with NULL pyvma->fileh. That was strange, becuse before calling vma_unmap(), the code explicitly checks whether pyvma->fileh is !NULL. That was, as it turned out, due to pyvma_dealloc being called twice at the same time from two python threads. Here is how that was possible: T1 decrefs pyvma and finds its reference count drops to zero. It calls pyvma_dealloc. From there vma_unmap() is called, which calls virt_lock() and that releases GIL first. Another thread T2 was waiting for GIL, it acquires it, does some work at python level and somehow triggers GC. Since PyVMA supports cyclic GC, it was on GC list and thus GC calls dealloc for the same vma again. Here is how it looks in the backtraces: T1: #0 0x00007f6aefc57827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x1e011d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 #1 do_futex_wait (sem=sem@entry=0x1e011d0, abstime=0x0) at sem_waitcommon.c:111 #2 0x00007f6aefc578d4 in __new_sem_wait_slow (sem=0x1e011d0, abstime=0x0) at sem_waitcommon.c:181 #3 0x00007f6aefc5797a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29 #4 0x00000000004ffbc4 in PyThread_acquire_lock () #5 0x00000000004dbe8a in PyEval_RestoreThread () #6 0x00007f6ac6d3b8fc in py_gil_retake_if_waslocked (arg=0x4f18f00) at bigfile/_bigfile.c:1048 #7 0x00007f6ac6d3dcfc in virt_gil_retake_if_waslocked (gilstate=0x4f18f00) at bigfile/virtmem.c:78 #8 0x00007f6ac6d3dd30 in virt_lock () at bigfile/virtmem.c:92 #9 0x00007f6ac6d3e724 in vma_unmap (vma=0x7f6a7e0c4100) at bigfile/virtmem.c:271 #10 0x00007f6ac6d3a0bc in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:284 ... #13 0x00000000004d76b0 in PyEval_EvalFrameEx () T2: #5 0x00007f6ac6d3a081 in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:276 #6 0x0000000000500450 in ?? () #7 0x00000000004ffd82 in _PyObject_GC_New () #8 0x0000000000485392 in PyList_New () #9 0x00000000004d3bff in PyEval_EvalFrameEx () T2 does the work of vma_unmap and clears C-level vma. Then, when T1 wakes up and returns to vma_unmap, it sees vma->file and all other fields cleared -> oops segfault. Fix it by removing pyvma from GC list before going to do actual destruction. This way if a concurrent GC triggers, it won't see the vma object on its list, and thus won't have a chance to invoke its destructor the second time. The bug was introduced in 450ad804 (bigarray: ArrayRef support for BigArray) when PyVMA was changed to be cyclic-GC aware. However at that time, even Python documentation itself was not saying PyObject_GC_UnTrack is needed, as it was added only in 2.7.15 after finding that many types in CPython itself are vulnerable to similar segfaults: https://github.com/python/cpython/commit/4cde4bdcc86 https://bugs.python.org/issue31095 It is pity, that CPython took the approach to force all type authors to care to invoke PyObject_GC_UnTrack explicitly, instead of doing that automatically in Python runtime before calling tp_dealloc. /cc @Tyagov, @klaus /reviewed-on nexedi/wendelin.core!11
d97641d2 · Kirill Smelkov · 32ca80e2 · d97641d2 · d97641d2
Commit d97641d2 authored Jan 09, 2019 by Kirill Smelkov
Hide whitespace changes
Inline Side-by-side

Showing with 28 additions and 1 deletion

bigfile/_bigfile.c bigfile/_bigfile.c +17 -1

bigfile/tests/test_basic.py bigfile/tests/test_basic.py +11 -0

No files found.
--- a/bigfile/_bigfile.c
+++ b/bigfile/_bigfile.c
 /* Wendelin.bigfile | Python interface to memory/files
- * Copyright (C) 2014-2018  Nexedi SA and Contributors.
+ * Copyright (C) 2014-2019  Nexedi SA and Contributors.
 *                          Kirill Smelkov <kirr@nexedi.com>
 *
 * This program is free software: you can Use, Study, Modify and Redistribute
@@ -26,6 +26,13 @@
 *   VMA with a buffer/memoryview interface
 *   BigFileH  with mmap (-> vma) and writeout control
 *   BigFile   base class (to allow implementing BigFile backends in python)
+ *
+ * NOTE virtmem/bigfile functions release/reacquire GIL (see virt_lock()) -
+ * thus functions that use them cannot assume they run mutually exclusive to
+ * other Python threads. See Py_CLEAR documentation which explain what could go
+ * wrong if code is not careful to protect itself against concurrent GC:
+ *
+ * https://github.com/python/cpython/blob/v2.7.15-310-g112e4afd582/Include/object.h#L790-L798
 */

 #include <Python.h>
@@ -269,6 +276,13 @@ PyFunc(pyvma_pagesize, "pagesize() -> pagesize -- page size of RAM underlying th
 static void
 pyvma_dealloc(PyObject *pyvma0)
 {
+    /* PyVMA supports cyclic GC - first, before destructing, remove it from GC
+     * list to avoid segfaulting on double entry here - e.g. if GC is triggered
+     * from a weakref callback, or concurrently from another thread.
+     *
+     * See https://bugs.python.org/issue31095 for details */
+    PyObject_GC_UnTrack(pyvma0);
+
    PyVMA       *pyvma = upcast(PyVMA *, pyvma0);
    BigFileH    *fileh = pyvma->fileh;

@@ -460,6 +474,8 @@ PyFunc(pyfileh_invalidate_page, "invalidate_page(pgoffset) - invalidate fileh pa
 static void
 pyfileh_dealloc(PyObject *pyfileh0)
 {
+    /* PyBigFileH does not support cyclic GC - no need to PyObject_GC_UnTrack it */
+
    PyBigFileH  *pyfileh = upcast(PyBigFileH *, pyfileh0);
    BigFile     *file = pyfileh->file;
    PyBigFile   *pyfile;

--- a/bigfile/tests/test_basic.py
+++ b/bigfile/tests/test_basic.py
@@ -413,6 +413,17 @@ def test_gc_from_sighandler():
    assert f3.marker_list == [2]


+# test that vma dealloc is safe wrt concurrent GC.
+def test_vma_del_vs_gc():
+    f   = XBigFile(b'abcd', PS)
+    fh  = f.fileh_open()
+    vma = fh.mmap(0, 1)
+
+    # del vma, but make sure vma.dealloc calls another code, which eventually calls GC.
+    # this will segfault, if vma.dealloc does not properly untrack vma from GC first.
+    wvma = weakref.ref(vma, lambda _: gc.collect())
+    del vma
+
 # test that there is no crash after first store error on writeout
 class StoreError(Exception):
    pass