Detect if a test leaks processes and terminate them
For every TestCase, nxdtest spawns a test process with stdout/stderr redirected to pipes that nxdtest reads. Nxdtest, in turn, tees those pipes to its own stdout/stderr until the pipes reach EOF. If the test process itself spawns other processes, those processes inherit the open pipes, so the pipes do not reach EOF until _all_ spawned test processes (the main test process plus any processes it spawns) exit. Thus, if the main test process spawns a process but does not terminate it upon its own exit, nxdtest gets stuck waiting for the pipes to reach EOF, which never happens as long as the spawned subprocess keeps running.

I hit this problem for real on a Wendelin.core 2 test: there the main test process was segfaulting and so did not instruct the other spawned processes (ZEO, WCFS, ...) to terminate. As a result, the whole test got stuck instead of being promptly reported as failed:

    runTestSuite: Makefile:175: recipe for target 'test.wcfs' failed
    runTestSuite: make: *** [test.wcfs] Segmentation fault
    runTestSuite: wcfs: 2021/08/09 17:32:09 zlink [::1]:52052 - [::1]:23386: recvPkt: EOF
    runTestSuite: E0809 17:32:09.376800 38082 wcfs.go:2574] zwatch zeo://localhost:23386: zlink [::1]:52052 - [::1]:23386: recvPkt: EOF
    runTestSuite: E0809 17:32:09.377431 38082 wcfs.go:2575] zwatcher failed -> switching filesystem to EIO mode (TODO)
    <LONG WAIT>
    runTestSuite: PROCESS TOO LONG OR DEAD, GOING TO BE TERMINATED

-> Fix it.

/reviewed-by @jerome
/reviewed-on !9
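The hang described above, and one possible way to detect and terminate leaked processes, can be sketched in Python as follows. This is only an illustration under stated assumptions, not necessarily how nxdtest implements the fix: the helper name `run_and_reap` is hypothetical, and it assumes a POSIX system where putting the test into its own session (and thus its own process group) is enough to find what it leaves behind. A leaked process that moves itself into another session or process group would escape this check.

```python
import os
import signal
import subprocess
import threading


def run_and_reap(argv):
    """Run argv, collect its output, and kill any processes it leaks.

    Hypothetical helper for illustration only. Returns a tuple
    (returncode, output, leaked), where leaked is True if the process
    group was still populated after the main process had exited.
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its pid doubles as the process-group id.
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )
    pgid = proc.pid

    chunks = []

    def drain():
        # Read until EOF. EOF arrives only after *every* holder of the
        # write end of the pipe has exited or closed it.
        fd = proc.stdout.fileno()
        while True:
            data = os.read(fd, 4096)
            if not data:
                break
            chunks.append(data)

    reader = threading.Thread(target=drain)
    reader.start()

    # Wait for the main test process only, not for pipe EOF.
    returncode = proc.wait()

    # Signal 0 only probes: it succeeds if the group still has members.
    leaked = True
    try:
        os.killpg(pgid, 0)
    except ProcessLookupError:
        leaked = False

    if leaked:
        try:
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the leaked processes exited on their own meanwhile

    # With all writers gone the pipe reaches EOF and the reader finishes.
    reader.join()
    proc.stdout.close()
    return returncode, b"".join(chunks), leaked
```

Calling `run_and_reap` on a command that starts a long-running background process and then exits returns as soon as the main process is gone, with `leaked` set to `True`, instead of blocking until the background process finishes. The key design choice is to wait for the main process rather than for pipe EOF, and only afterwards drain the pipe.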
Stage: External

Status | Job ID | Name | Duration | Tags
---|---|---|---|---
passed | #253810 | nxdtest.UnitTest-Master.Python2 | 00:00:47 | external
passed | #253895 | nxdtest.UnitTest-Master.Python3 | 01:07:26 | external
failed | #253745 | nxdtest.UnitTest-Master.Python2 | 00:39:44 | external, retried
passed | #253782 | nxdtest.UnitTest-Master.Python2 | 00:05:26 | external, retried
failed | #253743 | nxdtest.UnitTest-Master.Python3 | 01:03:23 | external, retried