Commit 0ad45a9c authored by Kirill Smelkov

Detect if a test leaks processes and terminate them

For every TestCase nxdtest spawns a test process with stdout/stderr
redirected to pipes that nxdtest reads. Nxdtest, in turn, tees those
pipes to its stdout/stderr until the pipes reach EOF. If the test
process spawns other processes, those processes inherit the opened
pipes, and so the pipes won't reach EOF until _all_ spawned test
processes (the main test process + other processes that it spawns)
exit. Thus, if any process spawned by the main test process does not
terminate when the main test process exits, nxdtest gets stuck waiting
for the pipes to reach EOF, which never happens for as long as that
spawned subprocess keeps running.
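A small POSIX-only Python sketch that reproduces the effect described
above. The function name is hypothetical and not part of nxdtest: `sh`
plays the "main test process", prints a line and exits at once, but
first starts a background `sleep` that inherits the same stdout pipe.
Reading the pipe therefore blocks until `sleep` exits, long after the
main process is gone.

```python
import subprocess
import time


def demo_pipe_held_by_grandchild(linger=2.0):
    """Show that a pipe does not reach EOF while any inheriting process lives.

    Hypothetical helper for illustration only (POSIX, needs `sh` and `sleep`).
    Returns (output, seconds until main exit, seconds until pipe EOF).
    """
    t0 = time.monotonic()
    p = subprocess.Popen(
        # background `sleep` inherits stdout; `echo` runs, then sh exits
        ["sh", "-c", f"sleep {linger} & echo main-done"],
        stdout=subprocess.PIPE,
    )
    p.wait()  # the main process exits almost immediately
    t_main = time.monotonic() - t0

    # read() returns only at EOF, i.e. after *every* writer closed the pipe;
    # the orphaned `sleep` still holds it open, so this blocks ~linger seconds
    out = p.stdout.read()
    t_eof = time.monotonic() - t0
    p.stdout.close()
    return out, t_main, t_eof
```

With `linger=2.0` the main process finishes in well under a second,
while EOF arrives only after roughly two seconds. With a subprocess that
never exits, that read would block forever, which is exactly how
nxdtest gets stuck.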

I hit this problem for real on a Wendelin.core 2 test - there the main
test process was segfaulting and so did not instruct the other spawned
processes (ZEO, WCFS, ...) to terminate. As a result the whole test
got stuck instead of being promptly reported as failed:

    runTestSuite: Makefile:175: recipe for target 'test.wcfs' failed
    runTestSuite: make: *** [test.wcfs] Segmentation fault
    runTestSuite: wcfs: 2021/08/09 17:32:09 zlink [::1]:52052 - [::1]:23386: recvPkt: EOF
    runTestSuite: E0809 17:32:09.376800   38082 wcfs.go:2574] zwatch zeo://localhost:23386: zlink [::1]:52052 - [::1]:23386: recvPkt: EOF
    runTestSuite: E0809 17:32:09.377431   38082 wcfs.go:2575] zwatcher failed -> switching filesystem to EIO mode (TODO)
    <LONG WAIT>
    runTestSuite: PROCESS TOO LONG OR DEAD, GOING TO BE TERMINATED

-> Fix it.
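A minimal sketch of one way to detect leaked processes and terminate
them, assuming the test is started as leader of its own process group
and that leaked processes do not move out of that group (e.g. via
setsid). This process-group technique is swapped in for illustration
and is not necessarily how nxdtest implements the fix; `run_and_reap`
and its `grace` parameter are hypothetical.

```python
import os
import signal
import subprocess
import threading


def run_and_reap(argv, grace=1.0):
    """Run argv, then kill any processes it leaked. Returns (rc, output, leaked).

    Hypothetical sketch: assumes leaked processes stay in the test's
    process group. One that calls setsid() itself would escape detection.
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its pgid equals its pid and its descendants inherit that pgid
    p = subprocess.Popen(argv, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, start_new_session=True)
    pgid = p.pid
    chunks = []

    def drain():
        # read until EOF; EOF arrives only after all holders of the pipe exit
        for chunk in iter(lambda: p.stdout.read1(4096), b""):
            chunks.append(chunk)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()

    rc = p.wait()  # the main test process has exited and is reaped
    try:
        os.killpg(pgid, 0)  # signal 0 sends nothing; it only checks for members
        leaked = True
    except ProcessLookupError:
        leaked = False

    if leaked:
        for sig in (signal.SIGTERM, signal.SIGKILL):
            try:
                os.killpg(pgid, sig)
            except ProcessLookupError:
                break  # the group is already gone
            reader.join(grace)  # pipe EOF means every writer has exited
            if not reader.is_alive():
                break

    reader.join()
    p.stdout.close()
    return rc, b"".join(chunks), leaked
```

For example, `run_and_reap(["sh", "-c", "sleep 30 & echo main-done"])`
returns promptly with `leaked` set to true instead of waiting 30 seconds.
Waiting for pipe EOF after each signal, rather than polling for the
process group to disappear, avoids being fooled by killed processes that
linger as zombies.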

/reviewed-by @jerome
/reviewed-on !9
parent b5a74214
Pipeline #16923 passed with stage in 0 seconds