Run each testcase with its own /tmp and /dev/shm
and detect leaked temporary files and mount entries after each test run.
Background
Currently we have several testing-related problems that are all connected to /tmp and similar directories:
Problem 1: many tests create temporary files for each run. Usually tests are careful to remove them on teardown, but due to bugs, the many kinds of tests, test processes being hard-killed (SIGKILL or SIGSEGV) and other reasons, in practice this cleanup does not work 100% reliably, and files leaked in /tmp steadily accumulate on testnodes.
Problem 2: because /tmp and /dev/shm are shared, isolation between different test runs, potentially of different users, is weak. For example @jerome reports that leaked faketime shared segments make separate test runs affect each other and fail: https://erp5.nexedi.net/bug_module/20211125-1C8FE17
Problem 3: many tests depend on /tmp being a tmpfs instance. These are, for example, wendelin.core tests, which write intensively to the database and, if /tmp resides on disk, time out due to disk IO stalls in fsync on every commit. The stalls can exceed 30s and lead to ~2.5x overall slowdown for test runs. However the main problem is the latency spikes which, with close to 100% probability, make some test miss its deadline. This topic is covered in https://erp5.com/group_section/forum/Using-tmpfs-for--tmp-on-testnodes-JTocCtJjOd
There are many ways to try to address each problem separately, but they all come with limitations and drawbacks. We discussed things with @tomo and @jerome, and it looks like all those problems can be addressed in one go if we run tests under user namespaces with private mounts for /tmp and /dev/shm.
Even though namespaces are generally a no-go in Nexedi, they seem to be ok to use in tests. For example they are already used via the private_tmpfs option in SlapOS:
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/slapos/recipe/librecipe/execute.py#L87-103
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo-input-schema.json#L121-124
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L11-16
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L30-34
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L170-177
...
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/stack/erp5/instance-zope.cfg.in#L227-230
Thomas says that using a private tmpfs for each test would be a better solution than implementing tmpfs for the whole /tmp on testnodes. He also reports that @jp is OK with using namespaces for tests as long as there is a fallback if namespaces aren't available.
-> So let's do that: teach nxdtest to run each test case in its own private environment with privately-mounted /tmp and /dev/shm if we can detect that user namespaces are available. In an environment where user namespaces are indeed available this addresses all 3 problems because isolation and being-tmpfs are there by design, and even if some files leak, the kernel will free everything when the test terminates and the filesystem is automatically unmounted. We also detect such leakage and report a warning so that such problems do not go completely unnoticed.
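To illustrate the leak detection mentioned above, here is a minimal sketch, not the code from trun.py. It assumes leaked files are simply whatever remains in the private /tmp after the test exits, and that leaked mounts are mount points below that directory as listed in /proc/self/mountinfo. The names leaked_files and leaked_mounts are hypothetical.

```python
import os

def leaked_files(tmpdir):
    """Return sorted paths of entries still present in tmpdir (hypothetical helper)."""
    return sorted(os.path.join(tmpdir, name) for name in os.listdir(tmpdir))

def leaked_mounts(mountinfo_text, prefix):
    """Return mount points located strictly below prefix (hypothetical helper).

    mountinfo_text is the content of /proc/<pid>/mountinfo, where the 5th
    whitespace-separated field of each line is the mount point. Octal escapes
    such as \\040 in mount points are not decoded in this sketch.
    """
    prefix = prefix.rstrip("/")
    leaked = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        mountpoint = fields[4]
        # the private /tmp itself is expected; only mounts below it count as leaks
        if mountpoint.startswith(prefix + "/"):
            leaked.append(mountpoint)
    return leaked
```

A caller could run both helpers right after the test process exits and print one warning line per returned entry.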
Implementation
We leverage unshare(1) for simplicity. I decided to preserve uid/gid
instead of becoming uid=0 (= unshare -Umr) for better traceability, so
that it is clear from test output under which real slapuser a test is
run(*). Not changing uid requires activating ambient capabilities so
that mounting filesystems, including FUSE-based ones needed by
wendelin.core, continues to work under a regular non-zero uid. Please see
https://git.kernel.org/linus/58319057b784 for details on this topic. And
please refer to the added trun.py for details on how the per-test
namespace is set up.
Using FUSE inside user namespaces requires Linux >= 4.18 (see https://git.kernel.org/linus/da315f6e0398 and https://git.kernel.org/linus/8cb08329b080), so if we really are to use this patch we'll have to upgrade the kernel on our testnodes, at least where wendelin.core is used in tests.
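To see which testnodes would need the upgrade, the kernel release string can be compared against 4.18. The following is only a sketch: kernel_at_least is a hypothetical helper, it looks at the major.minor prefix only, and it cannot know about features backported by distribution kernels.

```python
import re

def kernel_at_least(release, major, minor):
    """Return True if a release string like '5.10.0-9-amd64' is >= major.minor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        return False  # unrecognized format: be conservative
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)
```

On a testnode this could be called as kernel_at_least(platform.release(), 4, 18) after importing platform.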
"no namespaces" detection is implemented via first running unshare ... true
with the same unshare options that are going to be used to create
and enter the new user namespace for real. If that fails, we fall back to
"no namespaces" mode where no private /tmp and /dev/shm are mounted(%).
(*) for example nxdtest logs information about the system on startup:
date: Mon, 29 Nov 2021 17:27:04 MSK
xnode: slapuserX@test.node
...
(%) Here is how nxdtest is run in fallback mode on my Debian 11 with
user namespaces disabled via sysctl kernel.unprivileged_userns_clone=0:
(neo) (z-dev) (g.env) kirr@deca:~/src/wendelin/nxdtest$ nxdtest
date: Thu, 02 Dec 2021 14:04:30 MSK
xnode: kirr@deca.navytux.spb.ru
uname: Linux deca 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>>> pytest
$ python -m pytest
# user namespaces not available. isolation and many checks will be deactivated. <--- NOTE
===================== test session starts ======================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.10.0, pluggy-0.13.1
rootdir: /home/kirr/src/wendelin/nxdtest
plugins: timeout-1.4.2
collected 23 items
nxdtest/nxdtest_pylint_test.py .... [ 17%]
nxdtest/nxdtest_pytest_test.py ... [ 30%]
nxdtest/nxdtest_test.py ......xx [ 65%]
nxdtest/nxdtest_unittest_test.py ........ [100%]
============= 21 passed, 2 xfailed in 2.67 seconds =============
ok pytest 3.062s # 23t 0e 0f 0s
# ran 1 test case: 1·ok
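
The probe behind this fallback can be sketched as below. This is an illustration under assumptions, not the code from trun.py: userns_available is a hypothetical name, and the unshare options are passed in by the caller because the exact options used for real are defined in trun.py.

```python
import os
import subprocess

def userns_available(unshare_argv):
    """Run `<unshare_argv> true` and report whether it succeeded.

    unshare_argv is the unshare command with the same options that would
    later be used to enter the namespace for real.
    """
    try:
        with open(os.devnull, "wb") as devnull:
            rc = subprocess.call(list(unshare_argv) + ["true"],
                                 stdout=devnull, stderr=devnull)
    except OSError:
        return False  # e.g. unshare is not installed
    return rc == 0

# Illustrative call; "-Um" (new user + mount namespaces) is an assumption here:
#   if not userns_available(["unshare", "-Um"]):
#       print("# user namespaces not available. ...")
```

Probing with the real options rather than a generic check means that the probe fails for exactly the same reasons the real invocation would.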