Hopefully fix test_cancel_from_signal flakiness on testnodes (!20) · Merge requests · nexedi / nxdtest

After inspecting nxdtest status on testnodes, I see frequent failures of test_cancel_from_signal, e.g. from https://erp5js.nexedi.net/#/test_result_module/20250218-F232A924/2 :

=================================== FAILURES ===================================
_________________ test_cancel_from_signal[userns_default-sig0] _________________

    def _():
        proc.terminate()
        if proc.poll() is None:
>           time.sleep(1)
E           Failed: Timeout >3.0s

nxdtest/nxdtest_test.py:385: Failed

which means that, since _() is executed upon exiting from the test_cancel_from_signal function, the test was still inside time.sleep(1) when it exceeded the 3 seconds of total time budget we gave it to run.
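
To illustrate the reasoning above, here is a minimal toy model of a per-test time budget. It is only a sketch: the function name, phase names and durations are hypothetical and are not measurements from testnodes. It shows why the failure can be reported inside the cleanup sleep even though the sleep itself is short, and why the same run fits into a 10s budget.

```python
# Toy model of a per-test time budget: the timeout fires in whichever
# phase is running when the cumulative elapsed time crosses the budget.
# Phase names and durations below are hypothetical, not measured values.

def phase_hit_by_timeout(phases, budget):
    """Return the name of the phase during which `budget` seconds run out,
    or None if all phases complete within the budget.

    phases: list of (name, duration_in_seconds), run one after another.
    """
    elapsed = 0.0
    for name, duration in phases:
        elapsed += duration
        if elapsed > budget:
            return name
    return None


# Hypothetical loaded machine: the test body alone consumes most of a 3s
# budget, so the budget runs out inside the 1s sleep of the deferred cleanup.
loaded = [("test body", 2.5), ("cleanup: time.sleep(1)", 1.0)]

print(phase_hit_by_timeout(loaded, budget=3.0))   # cleanup: time.sleep(1)
print(phase_hit_by_timeout(loaded, budget=10.0))  # None
```

In this model the traceback points at the cleanup only because that is where the clock happened to run out; the time was spent earlier in the test.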

Now if I look at b0cf277d (Cancel test run on SIGINT/SIGTERM), which introduced this test, I can see that I even put another pytest.mark.timeout(timeout=10) there in commented-out form, and probably forgot to switch back from the pytest.mark.timeout(timeout=3) I used during local debugging.

The other tests that use timeouts all use a 10s time budget out of the box:

test_run_procleak 0ad45a9c (Detect if a test leaks processes and terminate them)
test_cancel_from_master 5d656ccf (Add test for cancel propagation)

so, given that OS load is high on testnode machines, I think we should do the same for test_cancel_from_signal as well.

This should hopefully fix the timeouts we currently see for this test on testnodes.

/cc @jerome

Edited Feb 21, 2025 by Kirill Smelkov