Run each testcase with its own /tmp and /dev/shm

and detect leaked temporary files and mount entries after each test run.

Background

Currently we have several testing-related problems that are all connected to /tmp and similar directories:

Problem 1: many tests create temporary files on each run. Usually tests are careful to remove them on teardown, but due to bugs, the many kinds of tests, test processes being hard-killed (SIGKILL or SIGSEGV) and other reasons, in practice this cleanup is not 100% reliable, and files leaked in /tmp steadily accumulate on testnodes.

Problem 2: because /tmp and /dev/shm are shared, isolation between different test runs, potentially of different users, is weak. For example @jerome reports that, due to leakage of faketime's shared segments, separate test runs affect each other and fail: https://erp5.nexedi.net/bug_module/20211125-1C8FE17

Problem 3: many tests depend on /tmp being a tmpfs instance. These are, for example, wendelin.core tests, which write intensively to the database and, if /tmp resides on disk, time out due to disk IO stalls in fsync on every commit. The stalls can last more than 30s and lead to ~2.5x overall slowdown for test runs. However the main problem is the latency spikes which, with close to 100% probability, make some test miss its deadline. This topic is covered in https://erp5.com/group_section/forum/Using-tmpfs-for--tmp-on-testnodes-JTocCtJjOd

--------

There are many ways to try to address each problem separately, but they all come with limitations and drawbacks. We discussed things with @tomo and @jerome, and it looks like all those problems can be addressed in one go if we run tests under user namespaces with private mounts for /tmp and /dev/shm.

Even though namespaces are generally a no-go in Nexedi, they seem to be ok to use in tests. For example they are already used via the private_tmpfs option in SlapOS:

    https://lab.nexedi.com/nexedi/slapos/blob/1876c150/slapos/recipe/librecipe/execute.py#L87-103
    https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo-input-schema.json#L121-124
    https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L11-16
    https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L30-34
    https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L170-177
    ...
    https://lab.nexedi.com/nexedi/slapos/blob/1876c150/stack/erp5/instance-zope.cfg.in#L227-230

Thomas says that using a private tmpfs for each test would be a better solution than implementing tmpfs for the whole /tmp on testnodes. He also reports that @jp is OK with using namespaces for tests as long as there is a fallback if namespaces aren't available.

-> So let's do that: teach nxdtest to run each test case in its own private environment with privately-mounted /tmp and /dev/shm if we can detect that user namespaces are available. In an environment where user namespaces are indeed available this addresses all 3 problems, because isolation and being-tmpfs are there by design, and even if some files leak, the kernel will free everything when the test terminates and the filesystem is automatically unmounted. We also detect such leakage and report a warning so that such problems do not go completely unnoticed.

Implementation

We leverage unshare(1) for simplicity. I decided to preserve uid/gid instead of becoming uid=0 (= `unshare -Umr`) for better traceability, so that it is clear from test output under which real slapuser a test is run(*). Not changing uid requires activating ambient capabilities so that mounting filesystems, including the FUSE-based ones needed by wendelin.core, continues to work under a regular non-zero uid. Please see https://git.kernel.org/linus/58319057b784 for details on this topic. Please also refer to the added trun.py for details on how the per-test namespace is set up.

Using FUSE inside user namespaces requires Linux >= 4.18 (see https://git.kernel.org/linus/da315f6e0398 and https://git.kernel.org/linus/8cb08329b080), so if we are really to use this patch we'll have to upgrade the kernel on our testnodes, at least where wendelin.core is used in tests.

"no namespaces" detection is implemented by first running `unshare ... true` with the same unshare options that are going to be used to create and enter the new user namespace for real. If that fails, we fall back to "no namespaces" mode where no private /tmp and /dev/shm are mounted(%).

(*) for example nxdtest logs information about the system on startup:

    date:   Mon, 29 Nov 2021 17:27:04 MSK
    xnode:  slapuserX@test.node
    ...

(%) Here is how nxdtest is run in fallback mode on my Debian 11 with user namespaces disabled via `sysctl kernel.unprivileged_userns_clone=0`

    (neo) (z-dev) (g.env) kirr@deca:~/src/wendelin/nxdtest$ nxdtest
    date:   Thu, 02 Dec 2021 14:04:30 MSK
    xnode:  kirr@deca.navytux.spb.ru
    uname:  Linux deca 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
    cpu:    Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

    >>> pytest
    $ python -m pytest
    # user namespaces not available. isolation and many checks will be deactivated.    <--- NOTE
    ===================== test session starts ======================
    platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.10.0, pluggy-0.13.1
    rootdir: /home/kirr/src/wendelin/nxdtest
    plugins: timeout-1.4.2
    collected 23 items

    nxdtest/nxdtest_pylint_test.py ....                       [ 17%]
    nxdtest/nxdtest_pytest_test.py ...                        [ 30%]
    nxdtest/nxdtest_test.py ......xx                          [ 65%]
    nxdtest/nxdtest_unittest_test.py ........                 [100%]

    ============= 21 passed, 2 xfailed in 2.67 seconds =============
    ok      pytest  3.062s  # 23t 0e 0f 0s
    # ran 1 test case:  1·ok

/helped-by @tomo
/helped-and-reviewed-by @jerome
/reviewed-on nexedi/nxdtest!13
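
The "no namespaces" detection and fallback described in the Implementation part can be sketched in isolation as follows. This is only an illustration, not the code of trun.py: the names `UNSHARE_ARGV`, `userns_available` and `wrap_argv` are hypothetical, and the unshare options are assumed to be the `-Umc --keep-caps` set that appears in the diff.

```python
import os
import subprocess

# Options assumed to be the ones passed to unshare(1) for the real run
# (taken from the diff; the constant name is hypothetical).
UNSHARE_ARGV = ["-Umc", "--keep-caps"]


def userns_available(unshare="unshare"):
    """Return True if `unshare <options> true` succeeds, False otherwise.

    The probe fails both when the kernel refuses to create a user namespace
    and when unshare is missing or does not know one of the options.
    """
    try:
        with open(os.devnull, "w") as null:
            subprocess.check_call([unshare] + UNSHARE_ARGV + ["true"],
                                  stdout=null, stderr=null)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True


def wrap_argv(argv, unshare="unshare"):
    """Prefix argv with unshare if the probe succeeds, else return it as is."""
    if userns_available(unshare):
        return [unshare] + UNSHARE_ARGV + list(argv)
    return list(argv)   # fallback: "no namespaces" mode
```

Probing with exactly the options used later means that an old unshare rejecting `--keep-caps` is handled the same way as a kernel with user namespaces disabled: in both cases the test still runs, just without isolation.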

Commit a191468f authored Nov 29, 2021 by Kirill Smelkov
5 changed files
--- a/nxdtest/__init__.py
+++ b/nxdtest/__init__.py
@@ -60,11 +60,15 @@ from subprocess import Popen, PIPE
 from time import time, sleep, strftime, gmtime, localtime
 import os, sys, argparse, logging, traceback, re, pwd, socket
 from errno import ESRCH, EPERM
+from os.path import dirname
 import six
 from golang import b, defer, func, select, default
 from golang import context, sync
 import psutil

+# trun.py is a helper via which we run tests.
+trun_py = "%s/trun.py" % dirname(__file__)
+
 # loadNXDTestFile loads .nxdtest file located @path.
 def loadNXDTestFile(path): # -> TestEnv
    t = TestEnv()
@@ -249,7 +253,7 @@ def main():
            # TODO session -> cgroup, because a child process could create another new session.
            def newsession():
                os.setsid()
-            p = Popen(t.argv, env=env, stdin=devnull, stdout=PIPE, stderr=PIPE, bufsize=0, preexec_fn=newsession, **kw)
+            p = Popen([sys.executable, trun_py] + t.argv, env=env, stdin=devnull, stdout=PIPE, stderr=PIPE, bufsize=0, preexec_fn=newsession, **kw)
        except:
            stdout, stderr = b'', b(traceback.format_exc())
            bstderr.write(stderr)

--- a/nxdtest/nxdtest_test.py
+++ b/nxdtest/nxdtest_test.py
@@ -19,10 +19,14 @@

 # verify general functionality

+import os
 import sys
 import re
 import time
-from os.path import dirname
+import tempfile
+import shutil
+import subprocess
+from os.path import dirname, exists, devnull
 from golang import chan, select, default, func, defer
 from golang import context, sync

@@ -52,6 +56,44 @@ def run_nxdtest(tmpdir):
    return _run_nxdtest


+# run all tests twice:
+# 1) with user namespaces disabled,
+# 2) with user namespaces potentially enabled.
+@pytest.fixture(autouse=True, params=('userns_disabled', 'userns_default'))
+def with_and_without_userns(tmp_path, monkeypatch, request):
+    if request.param == 'userns_disabled':
+        if request.node.get_closest_marker("userns_only"):
+            pytest.skip("test is @userns_only")
+        with open(str(tmp_path / 'unshare'), 'w') as f:
+            f.write('#!/bin/sh\nexit 1')
+        os.chmod(f.name, 0o755)
+        monkeypatch.setenv("PATH", str(tmp_path), prepend=os.pathsep)
+
+    else:
+        assert request.param == 'userns_default'
+        request.node.add_marker(
+            pytest.mark.xfail(not userns_works,
+                reason="this functionality needs user-namespaces to work"))
+
+# @userns_only marks test as requiring user-namespaces to succeed.
+try:
+    with open(devnull, 'w') as null:
+        # since trun uses unshare(1) instead of direct system calls, use all
+        # those unshare options used by trun to verify that we indeed have
+        #
+        #   1) userns support from kernel, and
+        #   2) recent enough unshare that won't fail due to "unknown option".
+        #
+        # change this back to plain `unshare -U` when/if trun is reworked to
+        # use system calls directly.
+        subprocess.check_call(['unshare', '-Umc', '--keep-caps', 'true'], stdout=null, stderr=null)
+except (OSError, subprocess.CalledProcessError):
+    userns_works = False
+else:
+    userns_works = True
+userns_only = pytest.mark.userns_only
+
+
 def test_main(run_nxdtest, capsys):
    run_nxdtest(
        """\
@@ -68,7 +110,7 @@ TestCase('TESTNAME', ['echo', 'TEST OUPUT'])
    assert re.match(u"# ran 1 test case:  1·ok", output_lines[-1])


-def test_error_invoking_command(run_nxdtest, capsys):
+def test_command_does_not_exist(run_nxdtest, capsys):
    run_nxdtest(
        """\
 TestCase('TESTNAME', ['not exist command'])
@@ -76,7 +118,21 @@ TestCase('TESTNAME', ['not exist command'])
    )

    captured = capsys.readouterr()
-    assert "No such file or directory" in captured.err
+    assert 'Traceback' not in captured.out
+    assert 'Traceback' not in captured.err
+    assert captured.err == "not exist command: No such file or directory\n"
+
+
+def test_command_exit_with_non_zero(run_nxdtest, capsys):
+    run_nxdtest(
+        """\
+TestCase('TESTNAME', ['false'])
+"""
+    )
+
+    captured = capsys.readouterr()
+    assert 'Traceback' not in captured.out
+    assert 'Traceback' not in captured.err


 def test_error_invoking_summary(run_nxdtest, capsys):
@@ -165,3 +221,41 @@ TestCase('TEST_WITH_PROCLEAK', ['%s', 'AAA', 'BBB', 'CCC'])
    assert "AAA: terminating" in captured.out
    assert "BBB: terminating" in captured.out
    assert "CCC: terminating" in captured.out
+
+
+# verify that files leaked on /tmp are detected.
+@userns_only
+@func
+def test_run_tmpleak(run_nxdtest, capsys):
+    xtouch = "%s/testprog/xtouch" % (dirname(__file__),)
+
+    tmpd = tempfile.mkdtemp("", "nxdtest-leak.", "/tmp")
+    def _():
+        shutil.rmtree(tmpd)
+    defer(_)
+
+    tmpleakv = list('%s/%d' % (tmpd, i) for i in range(10))
+    for f in tmpleakv:
+        assert not exists(f)
+
+    run_nxdtest(
+        """
+TestCase('TESTCASE', ['%s'] + %r)
+""" % (xtouch, tmpleakv,)
+    )
+    captured = capsys.readouterr()
+
+    for f in tmpleakv:
+        assert ("# leaked %s" % f) in captured.out
+        assert not exists(f)
+
+
+# verify that leaked mounts are detected.
+@userns_only
+def test_run_mountleak(run_nxdtest, capsys):
+    run_nxdtest(
+        """
+TestCase('TESTCASE', ['mount', '-t', 'tmpfs', 'none', '/etc'])
+""")
+    captured = capsys.readouterr()
+    assert "# leaked mount: none /etc tmpfs" in captured.out
--- a/nxdtest/testprog/xtouch
+++ b/nxdtest/testprog/xtouch
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright (C) 2021  Nexedi SA and Contributors.
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""Program xtouch helps to verify that nxdtest detects files leaked on /tmp.
+
+It is similar to touch(1), but creates leading directories automatically.
+It also always exits with non-zero status to simulate failure.
+"""
+
+from __future__ import print_function, absolute_import
+
+import os, sys
+from os.path import dirname
+from errno import EEXIST
+
+def main():
+    for f in sys.argv[1:]:
+        mkdir_p(dirname(f))
+        with open(f, "a"):
+            pass
+
+    sys.exit(1)
+
+
+# mkdir_p mimics `mkdir -p`
+def mkdir_p(path):
+    try:
+        os.makedirs(path)
+    except OSError as e:
+        if e.errno != EEXIST:
+            raise
+
+
+if __name__ == '__main__':
+    main()
--- a/nxdtest/trun.py
+++ b/nxdtest/trun.py
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright (C) 2021  Nexedi SA and Contributors.
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+""" `trun ...` - run test specified by `...`
+
+The test is run in dedicated environment, which, after test completes, is
+checked for leaked files, leaked mount entries, etc.
+
+The environment is activated only if user namespaces are available(*).
+If user namespaces are not available, the test is still run but without the checks.
+
+(*) see https://man7.org/linux/man-pages/man7/user_namespaces.7.html
+"""
+
+from __future__ import print_function, absolute_import
+
+import errno, os, sys, stat, difflib
+from subprocess import check_call as xrun, CalledProcessError
+from os.path import join, devnull
+from golang import func, defer
+
+def main():
+    # Try to respawn ourselves in user-namespace where we can mount things, e.g. new /tmp.
+    # Keep current uid/gid the same for better traceability. In other words current user
+    # stays the same. Activate ambient capabilities(*) so that mounting filesystems,
+    # including FUSE-based ones for wendelin.core, still works under regular non-zero uid.
+    #
+    # (*) see https://man7.org/linux/man-pages/man7/capabilities.7.html
+    #     and git.kernel.org/linus/58319057b784.
+    in_userns = True
+    mypid = str(os.getpid())
+    _ = os.environ.get("_NXDTEST_TRUN_RESPAWNED", "")
+    if mypid != _:
+        uargv = ["-Umc", "--keep-caps"] # NOTE keep this in sync with @userns_only in nxdtest_test.py
+        try:
+            # check if user namespaces are available
+            with open(devnull, "w") as null:
+                xrun(["unshare"] + uargv + ["true"], stdout=null, stderr=null)
+        except (OSError, CalledProcessError):
+            in_userns = False
+            print("# user namespaces not available. isolation and many checks will be deactivated.")
+        else:
+            os.environ["_NXDTEST_TRUN_RESPAWNED"] = mypid
+            os.execvp("unshare", ["unshare"] + uargv + [sys.executable] + sys.argv)
+            raise AssertionError("unreachable")
+
+    # either respawned in new namespace, or entered here without respawn with in_userns=n.
+    # run the test via corresponding driver.
+    run = run_in_userns if in_userns else run_no_userns
+    def _():
+        try:
+            xrun(sys.argv[1:])
+        except OSError as e:
+            if e.errno != errno.ENOENT:
+                raise
+            #print(e.strerror, file=sys.stderr)   # e.strerror does not include filename on py2
+            print("%s: %s" % (sys.argv[1], os.strerror(e.errno)), # e.filename is also ø on py2
+                    file=sys.stderr)
+            sys.exit(127)
+        except CalledProcessError as e:
+            sys.exit(e.returncode)
+    run(_)
+
+
+# run_in_userns runs f with checks assuming that we are in a user namespace.
+@func
+def run_in_userns(f):
+    # mount new /tmp and /dev/shm to isolate this run from other programs and to detect
+    # leaked temporary files at the end.
+    tmpreg = {
+        "/tmp":     [], # mountpoint -> extra options
+        "/dev/shm": []
+    }
+    for tmp, optv in tmpreg.items():
+        xrun(["mount", "-t", "tmpfs", "none", tmp] + optv)
+
+    # in the end: check file leakage on /tmp and friends.
+    def _():
+        for root in tmpreg:
+            for d, dirs, files in os.walk(root):
+                if d != root:
+                    st = os.stat(d)
+                    if st.st_mode & stat.S_ISVTX:
+                        # sticky wcfs/ alike directories are used as top of registry for
+                        # multiple users. It is kind of normal not to delete such
+                        # directories by default.
+                        print("# found sticky %s/" % d)
+                    else:
+                        print("# leaked %s/" % d)
+                for f in files:
+                    print("# leaked %s" % join(d, f))
+    defer(_)
+
+    # in the end: check fstab changes.
+    fstab_before = mounts()
+    def _():
+        fstab_after = mounts()
+        for d in difflib.ndiff(fstab_before, fstab_after):
+            if d.startswith("- "):
+                print("# gone mount: %s" % d[2:])
+            if d.startswith("+ "):
+                print("# leaked mount: %s" % d[2:])
+    defer(_)
+
+    # run the test
+    f()
+
+
+# run_no_userns runs f assuming that we are not in a user namespace.
+def run_no_userns(f):
+    f()
+
+
+# mounts returns current mount entries.
+def mounts(): # -> []str
+    return readfile("/proc/mounts").split('\n')
+
+
+# readfile returns content of file @path.
+def readfile(path): # -> str
+    with open(path, "r") as f:
+        return f.read()
+
+
+if __name__ == '__main__':
+    main()
--- a/pytest.ini
+++ b/pytest.ini
+[pytest]
+markers =
+    userns_only: test is run only when user namespaces are available
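
For readers who want to try the two leak checks outside of a namespace, here is a small standalone sketch. It mirrors the logic of `run_in_userns` in trun.py above but returns report lines instead of printing them. The function names `find_leaked_files` and `diff_mounts` are hypothetical and not part of nxdtest.

```python
import difflib
import os
import stat


def find_leaked_files(root):
    """Return report lines for everything found under root, root excluded."""
    report = []
    for d, dirs, files in os.walk(root):
        if d != root:
            if os.stat(d).st_mode & stat.S_ISVTX:
                # sticky directories are treated as shared registries,
                # so they are reported but not called a leak
                report.append("# found sticky %s/" % d)
            else:
                report.append("# leaked %s/" % d)
        for f in sorted(files):
            report.append("# leaked %s" % os.path.join(d, f))
    return report


def diff_mounts(before, after):
    """Compare two lists of mount entries, e.g. lines of /proc/mounts."""
    report = []
    for d in difflib.ndiff(before, after):
        if d.startswith("- "):
            report.append("# gone mount: %s" % d[2:])
        elif d.startswith("+ "):
            report.append("# leaked mount: %s" % d[2:])
    return report
```

A possible use is to take `before = open("/proc/mounts").read().split("\n")` ahead of the test, the same again afterwards, and print whatever `diff_mounts(before, after)` returns.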