- 08 Nov, 2020 2 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Since ZBigFile keeps references to the fileh objects that are created through it, it forms a file <=> fileh cycle that is not collected without cyclic GC:

    https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-52-ga702d41/bigfile/file_zodb.py#L497
    https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-52-ga702d41/bigfile/file_zodb.py#L566-571

We did not notice this leak until now because it is small, but with the upcoming wendelin.core 2 it is important to release a fileh: there is a WCFS connection associated with the fileh, and if the fileh is not released, that connection also stays alive, keeping on-WCFS resources in use and preventing WCFS from being unmounted cleanly.

-> Add cyclic GC support to PyBigFile / PyBigFileH

NOTE: we still don't allow PyVMA <=> PyBigFileH cycles to be collected, because fileh_close, called from fileh.__del__, asserts that there are no live mappings left. See added comments for details. There is no known practical need to use such cycles, so this should be ok.

See also other patches on the cyclic GC topic:

    450ad804 (bigarray: ArrayRef support for BigArray) // adds cyclic GC support for PyVMA
    d97641d2 (bigfile/py: Properly untrack PyVMA from GC before dealloc)
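The file <=> fileh cycle described above can be reproduced in pure Python. This is only a sketch: `File` and `FileH` are hypothetical stand-ins, not the real PyBigFile / PyBigFileH extension types, and the automatic collector is switched off so that the effect of reference counting alone is visible.

```python
import gc
import weakref


class File:
    """Hypothetical stand-in for a file that remembers the handles it opened."""

    def __init__(self):
        self.filehs = []

    def fileh_open(self):
        fh = FileH(self)
        self.filehs.append(fh)   # file -> fileh
        return fh


class FileH:
    """Hypothetical stand-in for a file handle."""

    def __init__(self, file):
        self.file = file         # fileh -> file, which closes the cycle


def demo():
    gc_was_enabled = gc.isenabled()
    gc.disable()                 # leave only reference counting active
    try:
        f = File()
        fh = f.fileh_open()
        probe = weakref.ref(fh)  # lets us observe whether fh is still alive
        del f, fh
        alive_after_del = probe() is not None
        gc.collect()             # explicit run of the cyclic collector
        alive_after_gc = probe() is not None
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_after_del, alive_after_gc
```

Dropping the last external references does not free the pair; only the cyclic collector does. Pure-Python classes take part in cyclic GC automatically, whereas an extension type has to opt in explicitly, which is what the patch adds.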
-
- 05 Nov, 2020 4 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
X bigfile/_file_zodb: Fix ZSync to close not only wconn, but also wconn.wc through which wconn was created

pywconnOf, before creating wconn, performs wc=wcfs.join(zurl), which creates a new filesystem-level connection to the WCFS server. This wc is used only to create wconn. So if we do not close wc after releasing wconn, it will leak an open file descriptor, e.g. to .wcfs/zurl, and prevent tests from finishing cleanly.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
It's just a debugging print: helpful to debug zwatcher, but not helpful to understand which events the system was observing.
-
- 04 Nov, 2020 3 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Helps to understand why wcfs cannot be unmounted, when that happens.
-
Kirill Smelkov authored
Don't use a regular mutex to protect _zsyncReg updates, as this can deadlock: one of the _zsyncReg mutators (on_zconn_dealloc) is invoked by automatic GC, which can be triggered at any time.
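The hazard can be illustrated with a small sketch. All names here (`registry_lock`, `registry`, `on_dealloc`) are hypothetical, and the GC-triggered call is simulated by calling the callback directly while the lock is held. A non-blocking acquire is used as a probe, so the example reports the problem instead of hanging.

```python
import threading

registry_lock = threading.Lock()   # hypothetical regular (non-reentrant) mutex
registry = {}                      # hypothetical registry protected by the lock


def on_dealloc(key):
    """Hypothetical callback that the GC may run at any point in any thread.

    A plain blocking acquire() here would hang forever if the current thread
    already holds registry_lock, so the lock is only probed.
    """
    if not registry_lock.acquire(blocking=False):
        return False               # a blocking acquire would have deadlocked
    try:
        registry.pop(key, None)
        return True
    finally:
        registry_lock.release()


def mutate_and_get_interrupted():
    """Update the registry and simulate GC firing the callback meanwhile."""
    with registry_lock:
        registry["zconn"] = "zsync"
        # Simulated: GC decides to run right here, in this very thread.
        return on_dealloc("zconn")
```

`mutate_and_get_interrupted()` returns `False`, i.e. the callback could not take the lock its own thread was holding. The sketch only demonstrates the problem; it does not show how the patch itself protects _zsyncReg.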
-
- 03 Nov, 2020 13 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
The logic inside ZSync was correct, but it was incorrect to attach zsync to zconn in order to stay alive and react when that zconn is garbage collected: zsync._on_zconn_dealloc was not called because zsync itself was garbage collected too. This fixes many failures where wconn and the associated pinner were not released even though the ZODB DB was correctly closed.
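The underlying rule is that a weak-reference callback fires only if the weak reference object itself is still alive when its referent dies. A minimal sketch of that rule follows, with hypothetical names (`ZConn`, `watch_lost`, `watch_kept`, `registry`) that are not the actual ZSync code. It relies on CPython's immediate reference-count deallocation.

```python
import weakref


class ZConn:
    """Hypothetical stand-in for a ZODB connection."""


calls = []      # records which callbacks actually fired
registry = []   # keeps watchers alive independently of the watched object


def watch_lost(zconn):
    # The weakref object is dropped as soon as this function returns,
    # so its callback can never be invoked.
    weakref.ref(zconn, lambda _: calls.append("lost"))


def watch_kept(zconn):
    # The weakref object stays alive in the registry, so its callback
    # is invoked when zconn is deallocated.
    registry.append(weakref.ref(zconn, lambda _: calls.append("kept")))


def demo():
    del calls[:]
    z = ZConn()
    watch_lost(z)
    watch_kept(z)
    del z       # last strong reference goes away
    return list(calls)
```

Only the watcher that is held by something other than the watched object gets notified, which is why the watcher's lifetime must not depend on the object it is watching.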
-
Kirill Smelkov authored
This makes sure to clean up /proc/mounts from a stale / broken FUSE connection, and stops the uninformative `assert not is_mountpoint` from being raised, which only added more noise to the already very verbose wcfs-kill-dump.
-
Kirill Smelkov authored
Exercise the logic that keeps wconn <-> zconn in sync.
-
Kirill Smelkov authored
To know to which DB state a WCFS connection corresponds. This is similar to zodb.Connection.At() in ZODB/go and to zconn_at in ZODB/py. wconn.at() will be used in the next patch to verify ZSync.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Manually, because there is no automatic dependency tracking in setuptools... Dependency tracking is needed to avoid miscompilation after an incremental update under SlapOS/buildout/testnode/... when e.g. only a .h file was changed.
-
Kirill Smelkov authored
Tests inside wcfs/ take care to do this, but e.g. test.py/fs-wcfs autospawns wcfs servers during regular bigfile tests. If we don't stop the spawned wcfs, those processes will leak, and they will also keep `nxdtest test.py/*-wcfs` in a "hung" state, because nxdtest is waiting for wcfs to stop, as wcfs stdout is connected to nxdtest input. Currently this kills wcfs in an abrupt way, because graceful pinner shutdown is not yet implemented there.
-
Kirill Smelkov authored
tWCFS is responsible for starting/mounting/unmounting/stopping wcfs, while tDB uses tWCFS and provides commit/test service on top. We'll use tWCFS in the next patch to unmount/stop WCFS processes that are automatically spawned during test.py.
-
Kirill Smelkov authored
Once a WCFS instance is created, use wc.mountpoint to refer to where this wcfs is mounted. It does not change anything right now, but in follow-up patches we'll reuse the code from wcfs_test to work on any wc, not necessarily mounted on testmntpt.
-
Kirill Smelkov authored
Else, when running tests in-tree, `import wcfs` and `import wendelin.wcfs` will give two different modules, and inspecting e.g. wendelin.wcfs at teardown will see fresh module state (_wcregistry), because it was wcfs that was actually used. Also, plain `import wcfs` will raise ImportError when run out of tree.
-
Kirill Smelkov authored
Starting from 3f83469c, Conn and WatchLink inherit from an interface, which makes them use virtual functions. Without the destructor also being virtual, this emits the following warnings:

    wcfs/client/wcfs.cpp: In member function ‘virtual void wcfs::_Conn::decref()’:
    wcfs/client/wcfs.cpp:1531:16: warning: deleting object of polymorphic class type ‘wcfs::_Conn’ which has non-virtual destructor might cause undefined behavior [-Wdelete-non-virtual-dtor]
             delete this;
                    ^~~~
    wcfs/client/wcfs_watchlink.cpp: In member function ‘virtual void wcfs::_WatchLink::decref()’:
    wcfs/client/wcfs_watchlink.cpp:514:16: warning: deleting object of polymorphic class type ‘wcfs::_WatchLink’ which has non-virtual destructor might cause undefined behavior [-Wdelete-non-virtual-dtor]
             delete this;
                    ^~~~
-
- 02 Nov, 2020 1 commit
-
-
Kirill Smelkov authored
FUSE puts X as st_dev's minor, which, for minors <= 255, is the same as st_dev. However, when there are many connections and the minor goes above 255, minor becomes != st_dev:

    In [2]: os.makedev(0, 254)
    Out[2]: 254

    In [3]: os.makedev(0, 255)
    Out[3]: 255

    In [5]: os.makedev(0, 256)
    Out[5]: 1048576

As a result we were constructing the wrong path, and if wcfs was failing we were also failing to kill it, with something like:

    t = <wcfs.wcfs_test.tDB object at 0x7fef78043260>

        @func
        def __init__(t):
            t.root = testdb.dbopen()
            def _(): # close/unlock db if __init__ fails
                exc = sys.exc_info()[1]
                if exc is not None:
                    dbclose(t.root)
            defer(_)

            assert not os.path.exists(testmntpt)
            t.wc = wcfs.join(testzurl, autostart=True)
            assert os.path.exists(testmntpt)
            assert is_mountpoint(testmntpt)

            # force-unmount wcfs on timeout to unstuck current test and let it fail.
            # Force-unmount can be done reliably only by writing into
            # /sys/fs/fuse/connections/<X>/abort. For everything else there are
            # cases, when wcfs, even after receiving `kill -9`, will be stuck in kernel.
            # ( git.kernel.org/linus/a131de0a482a makes in-kernel FUSE client to
            #   still wait for request completion even after fatal signal )
    >       t._wcfuseabort = open("/sys/fs/fuse/connections/%d/abort" % os.stat(testmntpt).st_dev, "w")
    E       IOError: [Errno 2] No such file or directory: '/sys/fs/fuse/connections/2097264/abort'

    wcfs/wcfs_test.py:236: IOError

In the above failure st_dev=2097264 corresponds to X=624:

    In [6]: os.minor(2097264)
    Out[6]: 624
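A minimal sketch of building the abort path from the minor number rather than from the raw st_dev. The helper name `fuse_abort_path` is hypothetical and is not taken from the patch.

```python
import os


def fuse_abort_path(st_dev):
    """Return the FUSE abort control path for a mount with the given st_dev.

    The connection number X is the minor part of st_dev, so it has to be
    extracted with os.minor() instead of using st_dev as-is.
    """
    return "/sys/fs/fuse/connections/%d/abort" % os.minor(st_dev)
```

For a mounted filesystem this would be called as `fuse_abort_path(os.stat(mountpoint).st_dev)`, where `mountpoint` is whatever directory the FUSE filesystem is mounted on.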
-
- 01 Nov, 2020 2 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 30 Oct, 2020 5 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Without special care a forked child may interfere in parent-wcfs exchange via Python GC -> PyFileH.__del__ -> FileH.close -> message to WCFS sent from the child. This actually happens for real when running test.py/neo-wcfs because NEO test cluster spawns master and storage nodes with just fork without exec. -> detach from wcfs in child right after fork and deactivate all mappings in order not to provide stale data. See top-level comments added to wcfs/client/wcfs.cpp for details.
-
Kirill Smelkov authored
Currently in wcfs_test.py there is only waiting for a proc (subprocess.Popen instance) to become ready. However in the next patch we'll need to wait via polling for another condition. -> Generalize the pollwait code into waitfor* variants, and make procwait* use waitfor* internally.
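The generalization can be sketched as follows. The signatures are hypothetical and do not reproduce the waitfor* / procwait* helpers from wcfs_test.py. In particular, this sketch assumes that `procwait` waits for the process to finish.

```python
import time


def waitfor(condf, timeout=1.0, poll=0.01):
    """Poll condf() until it returns true or timeout seconds have passed.

    Returns True if the condition became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condf():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def procwait(proc, timeout=1.0):
    """Wait for a subprocess.Popen-like object, expressed via waitfor.

    Assumption: "waiting for proc" means waiting until proc.poll() reports
    an exit status.
    """
    return waitfor(lambda: proc.poll() is not None, timeout)
```

Once the polling loop lives in one place, waiting for any other condition only requires passing a different `condf`.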
-
Kirill Smelkov authored
Currently the code to convert `int err` or errno into a string is used only in _pathError, but in the next patches we'll need it to also handle errors from pthread_atfork. -> Factor it out into a separate function.
-
- 27 Oct, 2020 1 commit
-
-
Kirill Smelkov authored
-
- 25 Oct, 2020 1 commit
-
-
Kirill Smelkov authored
-
- 23 Oct, 2020 3 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 22 Oct, 2020 5 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-