    Hopefully fix test_cancel_from_signal flakiness on testnodes · c9355bec
    Kirill Smelkov authored
    After inspecting nxdtest status on testnodes, I see frequent failures of
    test_cancel_from_signal, e.g. from https://erp5js.nexedi.net/#/test_result_module/20250218-F232A924/2 :
    
        =================================== FAILURES ===================================
        _________________ test_cancel_from_signal[userns_default-sig0] _________________
    
            def _():
                proc.terminate()
                if proc.poll() is None:
        >           time.sleep(1)
        E           Failed: Timeout >3.0s
    
        nxdtest/nxdtest_test.py:385: Failed
    
    which means that, since _() is executed upon exiting from the
    test_cancel_from_signal function, the test was still inside time.sleep(1)
    when it exceeded the 3 seconds of total time budget we gave it to run.
    
    Now if I look at b0cf277d (Cancel test run on SIGINT/SIGTERM), which
    introduced this test, I can see that I even put another
    pytest.mark.timeout(timeout=10) there in commented-out form, and probably
    forgot to switch back from the pytest.mark.timeout(timeout=3) that I used
    during local debugging.
    
    The other tests that use timeouts all use a 10s time budget out of the box:
    
        test_run_procleak        0ad45a9c (Detect if a test leaks processes and terminate them)
        test_cancel_from_master  5d656ccf (Add test for cancel propagation)
    
    so, given that OS load is high on testnode machines, I think we should
    do the same for test_cancel_from_signal as well.
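    
    As a minimal sketch of the change described above, assuming the marker is
    the one provided by the pytest-timeout plugin: the test name and body below
    are hypothetical stand-ins, only the marker line mirrors what the message
    describes.

```python
import pytest

# Hypothetical stand-in for test_cancel_from_signal; the real test body is
# not reproduced here. Only the marker below reflects the described change.
@pytest.mark.timeout(timeout=10)    # previously timeout=3, too tight on loaded testnodes
def test_cancel_from_signal_sketch():
    # Placeholder body (assumption): the real test spawns nxdtest and signals it.
    assert True
```

    With the larger budget, the time.sleep(1) in the cleanup function has room
    to complete even when the machine is busy.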
    
    This should hopefully fix the timeouts we currently see for this test
    on testnodes.
    
    /reviewed-by @jerome
    /reviewed-on nexedi/nxdtest!20