Commit 4228d8b6 authored by Kirill Smelkov's avatar Kirill Smelkov

bigfile/py: Don't forget to clear exception state after retrieving pybuf referrers

A buffer object (pybuf) is passed by C-level loadblk to python loadblk
implementation. Since pybuf points to memory that will go away after
loadblk call returns to virtmem, PyBigFile tries hard to make sure
nothing stays referencing pybuf so it can be released.

It tries to:

1. automatically GC cycles referencing pybuf (9aa6a5d7 "bigfile/py: Teach
   loadblk() to automatically break reference cycles to pybuf")
2. replace pybuf with stub object if a calling frame referencing it still
   stays alive (61b18a40 "bigfile/py/loadblk: Replace pybuf with a stub
   object in calling frame in case it stays alive")
3. and as a last resort unpins pybuf from original buffer memeory to
   point it to NULL (024c246c "bigfile/py/loadblk: Resort to pybuf
   unpinning, if nothing helps")

Step #1 invokes GC. Step #2 calls gc.get_referrers(pybuf) and looks for
frames in there.

The gc.get_referrers() call happens at python level with allocating some
objects, e.g. tuple to pass arguments, resulting list etc. And we all
know that any object allocation might cause automatic garbage
collection, and GC'ing can in turn ran arbitrary code due to __del__ in
release objects and weakrefs callbacks.

At a first glance the scenario that GC will be triggered at step #2
looks unrealistic because the GC was just run at step #1 and it is only
a few objects being allocated for the call at step #2. However if
arbitrary code runs from under GC it can create new garbage and thus
upon returning from gc.collect() the garbage list is not empty as the
following program demonstrates:

    ---- 8< ----
    import gc

    # just an object we can set attributes on
    class X:
        pass

    # call f on __del__
    class DelCall:
        def __init__(self, f):
            self.f = f

        def __del__(self):
            self.f()

    # _mkgarbage creates n objects of garbage kept referenced from an object cycle
    # so that only cyclic GC can free them.
    def _mkgarbage(n):
        # cycle
        a, b = X(), X()
        a.b, b.a = b, a

        # cycle references [n] garbage
        a.objv = [X() for _ in range(n)]
        return a

    # mkgarbage creates cycled garbage and arranges for twice more garbage to be
    # created when original garbage is collected
    def mkgarbage(n):
        a = _mkgarbage(n)
        a.ondel = DelCall(lambda : _mkgarbage(2*n))

    def main():
        for i in xrange(10):
            mkgarbage(1000)
            print '> %s' % (gc.get_count(),)
            n = gc.collect()
            print '< %s' % (gc.get_count(),)

    main()
    ---- 8< ----

    kirr@deco:~/tmp/trashme/t$ ./gcmoregarbage.py
    > (482, 11, 0)
    < (1581, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)

here lines starting with "<" show amount of live garbage objects after
gc.collect() call has been finished.

This way on a busy server there could be arrangements when GC is
explicitly ran at step #1 and then automatically run at step #2 (because of
gc.get_referrers() python-level call) and from under GC #2 arbitrary code runs
thus potentially mutating exception state which shows in logs as

    bigfile/_bigfile.c:685: pybigfile_loadblk: Assertion `!(ts->exc_type || ts->exc_value || ts->exc_traceback)' failed.

----

So don't assume we end with clean exception state after collecting pybuf
referrers and just clear exception state once again as we do after explicit GC.
Don't make a similar assumption for buffer unpinning as an object is
decrefed there and in theory this can run some code.

A test is added to automatically exercise exception state clearing for
get_referrers code path via approach similar to demonstrated in above -
- we generate more garbage from under garbage and also arrange for
finalizers, which mutate exceptions state, to be run at GC #2.

The test without the fix applied fails like this:

    bigfile/_bigfile.c:710 pybigfile_loadblk WARN: python thread-state found with handled but not cleared exception state
    bigfile/_bigfile.c:711 pybigfile_loadblk WARN: I will dump it and then crash
    ts->exc_type:   None
    ts->exc_value:  <nil>
    ts->exc_traceback:      <nil>
    Segmentation fault (core dumped)

The None in ts->exc_type and nil value and traceback are probably coming from
here in cpython runtime:

    https://github.com/python/cpython/blob/883520a8/Python/ceval.c#L3717

Since this took some time to find, more diagnostics is also added before
BUG_ONs corresponding to finding unclean exception state.
parent 3804cc39
......@@ -551,6 +551,16 @@ out:
* removing references to pybuf.
*/
if (pybuf->ob_refcnt != 1) {
/* NOTE this could be noop if GC was already started from another
* thread, called some python code via decref and then while python
* code there was executing - python thread switch happens to us to
* come here with gc.collecting=1
*
* NOTE also: while collecting garbage even more garbage can be
* created due to arbitrary code run from undel __del__ of released
* objects and weakref callbacks. This way after here GC collect
* even a single allocation could trigger GC, and thus arbitrary
* python code run, again */
PyGC_Collect();
/* garbage collection could result in running arbitraty code
......@@ -610,6 +620,14 @@ out:
}
Py_DECREF(pybuf_users);
/* see note ^^^ around PyGC_Collect() call that we can land here
* with arbitrary python code ran again (because e.g.
* XPyObject_GetReferrers() allocates at least a list and that
* might trigger automatic GC again */
if (PyErr_Occurred())
PyErr_PrintEx(0);
XPyErr_FullClear();
}
/* above attempts were "best effort" to unreference pybuf. However we
......@@ -640,6 +658,12 @@ out:
PyMemoryViewObject *pybufm = (PyMemoryViewObject *)pybuf;
XPyBuffer_Unpin(&pybufm->view);
#endif
/* we released pybuf->base object which might run some code in __del__
* clear exception state again */
if (PyErr_Occurred())
PyErr_PrintEx(0);
XPyErr_FullClear();
}
#if 0
......@@ -661,7 +685,21 @@ out:
* NOTE in theory decref could run arbitrary code, but buffer_dealloc() is
* simply PyObject_DEL which does not lead to running python code. */
Py_XDECREF(pybuf);
if (PyErr_Occurred()) {
WARN("python thread-state found with exception set; but should not");
WARN("I will dump the exception and then crash");
PyErr_PrintEx(0);
}
BUG_ON(ts->curexc_type || ts->curexc_value || ts->curexc_traceback);
if (ts->exc_type) {
WARN("python thread-state found with handled but not cleared exception state");
WARN("I will dump it and then crash");
fprintf(stderr, "ts->exc_type:\t"); PyObject_Print(ts->exc_type, stderr, 0);
fprintf(stderr, "\nts->exc_value:\t"); PyObject_Print(ts->exc_value, stderr, 0);
fprintf(stderr, "\nts->exc_traceback:\t"); PyObject_Print(ts->exc_traceback, stderr, 0);
fprintf(stderr, "\n");
PyErr_Display(ts->exc_type, ts->exc_value, ts->exc_traceback);
}
BUG_ON(ts->exc_type || ts->exc_value || ts->exc_traceback);
BUG_ON(ts->async_exc);
......
......@@ -25,6 +25,7 @@ import gc
from pytest import raises
from six import PY2
from six.moves import range as xrange
# read-only attributes are reported differently on py2 & py3
ROAttributeError = (PY2 and TypeError or AttributeError)
......@@ -164,6 +165,28 @@ def test_pagefault_savestate():
# a case where only f_locals dict is kept alive
self.keep_f_locals(buf)
# arrange so that automatic GC is triggered several times from under
# pybigfile_loadblk() with modifying thread exception state - thus
# testing various codepaths that need to restore it there.
#gc.set_debug(gc.DEBUG_STATS)
g0 = gc.get_threshold()[0] # N(obj) in generation 0 to trigger automatic GC
a = mkgarbage(g0)
a.buf = buf # cycle references buf
# when garbage of a will be collected new even more garbage will be created
# and when that garbage will be collected an exception will be raised
def mkraisinggarbagealot():
# disable automatic GC while we are generating a lot of garbage
# so it is not collected automatically while we prepare it.
gc.disable()
mkraisinggarbage(2*g0);
# enable automatic GC near end so that it will be pybigfile_loadblk()
# to trigger next automatic GC with e.g. allocating new list for
# XPyObject_GetReferrers() call
gc.enable()
a.ondel = DelCall(mkraisinggarbagealot)
del a
self.loadblk_run = 1
......@@ -183,6 +206,48 @@ def test_pagefault_savestate():
except:
keepg.append(sys.exc_info()[2].tb_frame.f_locals)
# class without finalizer (objects with finalizers if cycled are not collected)
class X:
pass
# call f on __del__
class DelCall:
def __init__(self, f):
self.f = f
def __del__(self):
#print 'delcall', self.f
self.f()
# mkgarbage creates n objects of garbage kept referenced from an object cycle
# so that only cyclic GC can free them.
def mkgarbage(n):
#print 'mkgarbage', n
# cycle
a, b = X(), X()
a.b, b.a = b, a
# cycle references [n] garbage objects
a.objv = [X() for _ in xrange(n)]
return a
# mkraisinggarbage like mkgarbage creates cycle with n objects of garbage
# but in addition prepares things so that when the cycle is collected
# exceptions are raise from under finalizers in various ways
def mkraisinggarbage(n):
#print 'mkraisinggarbage', n
a = mkgarbage(n)
a.boom = DelCall(lambda: 1/0) # raise without catch
a.boom2 = DelCall(raiseandcatch) # leaves exception object in "exc_handled" state
def raiseandcatch():
#print 'raiseandcatch'
try:
1/0
except:
pass
f = BadFile(PS)
fh = f.fileh_open()
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment