• Kirill Smelkov's avatar
    bigfile/py: Don't forget to clear exception state after retrieving pybuf referrers · 4228d8b6
    Kirill Smelkov authored
    A buffer object (pybuf) is passed by C-level loadblk to python loadblk
    implementation. Since pybuf points to memory that will go away after
    loadblk call returns to virtmem, PyBigFile tries hard to make sure
    nothing stays referencing pybuf so it can be released.
    
    It tries to:
    
    1. automatically GC cycles referencing pybuf (9aa6a5d7 "bigfile/py: Teach
       loadblk() to automatically break reference cycles to pybuf")
    2. replace pybuf with stub object if a calling frame referencing it still
       stays alive (61b18a40 "bigfile/py/loadblk: Replace pybuf with a stub
       object in calling frame in case it stays alive")
    3. and as a last resort unpins pybuf from original buffer memeory to
       point it to NULL (024c246c "bigfile/py/loadblk: Resort to pybuf
       unpinning, if nothing helps")
    
    Step #1 invokes GC. Step #2 calls gc.get_referrers(pybuf) and looks for
    frames in there.
    
    The gc.get_referrers() call happens at python level with allocating some
    objects, e.g. tuple to pass arguments, resulting list etc. And we all
    know that any object allocation might cause automatic garbage
    collection, and GC'ing can in turn ran arbitrary code due to __del__ in
    release objects and weakrefs callbacks.
    
    At a first glance the scenario that GC will be triggered at step #2
    looks unrealistic because the GC was just run at step #1 and it is only
    a few objects being allocated for the call at step #2. However if
    arbitrary code runs from under GC it can create new garbage and thus
    upon returning from gc.collect() the garbage list is not empty as the
    following program demonstrates:
    
        ---- 8< ----
        import gc
    
        # just an object we can set attributes on
        class X:
            pass
    
        # call f on __del__
        class DelCall:
            def __init__(self, f):
                self.f = f
    
            def __del__(self):
                self.f()
    
        # _mkgarbage creates n objects of garbage kept referenced from an object cycle
        # so that only cyclic GC can free them.
        def _mkgarbage(n):
            # cycle
            a, b = X(), X()
            a.b, b.a = b, a
    
            # cycle references [n] garbage
            a.objv = [X() for _ in range(n)]
            return a
    
        # mkgarbage creates cycled garbage and arranges for twice more garbage to be
        # created when original garbage is collected
        def mkgarbage(n):
            a = _mkgarbage(n)
            a.ondel = DelCall(lambda : _mkgarbage(2*n))
    
        def main():
            for i in xrange(10):
                mkgarbage(1000)
                print '> %s' % (gc.get_count(),)
                n = gc.collect()
                print '< %s' % (gc.get_count(),)
    
        main()
        ---- 8< ----
    
        kirr@deco:~/tmp/trashme/t$ ./gcmoregarbage.py
        > (482, 11, 0)
        < (1581, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
        > (531, 3, 0)
        < (2070, 0, 0)
    
    here lines starting with "<" show amount of live garbage objects after
    gc.collect() call has been finished.
    
    This way on a busy server there could be arrangements when GC is
    explicitly ran at step #1 and then automatically run at step #2 (because of
    gc.get_referrers() python-level call) and from under GC #2 arbitrary code runs
    thus potentially mutating exception state which shows in logs as
    
        bigfile/_bigfile.c:685: pybigfile_loadblk: Assertion `!(ts->exc_type || ts->exc_value || ts->exc_traceback)' failed.
    
    ----
    
    So don't assume we end with clean exception state after collecting pybuf
    referrers and just clear exception state once again as we do after explicit GC.
    Don't make a similar assumption for buffer unpinning as an object is
    decrefed there and in theory this can run some code.
    
    A test is added to automatically exercise exception state clearing for
    get_referrers code path via approach similar to demonstrated in above -
    - we generate more garbage from under garbage and also arrange for
    finalizers, which mutate exceptions state, to be run at GC #2.
    
    The test without the fix applied fails like this:
    
        bigfile/_bigfile.c:710 pybigfile_loadblk WARN: python thread-state found with handled but not cleared exception state
        bigfile/_bigfile.c:711 pybigfile_loadblk WARN: I will dump it and then crash
        ts->exc_type:   None
        ts->exc_value:  <nil>
        ts->exc_traceback:      <nil>
        Segmentation fault (core dumped)
    
    The None in ts->exc_type and nil value and traceback are probably coming from
    here in cpython runtime:
    
        https://github.com/python/cpython/blob/883520a8/Python/ceval.c#L3717
    
    Since this took some time to find, more diagnostics is also added before
    BUG_ONs corresponding to finding unclean exception state.
    4228d8b6
test_basic.py 12.5 KB