- 16 Nov, 2021 12 commits
-
-
Kirill Smelkov authored
* t2: *: Cosmetics . . lib/zodb: Mark test_zconn_at as xfail on plain ZODB4 . . wcfs: Server.stop: Make sure to remove mount entry even if we had to use FUSE abort tests: Don't leak WCFS log files tests: Remove test NEO database after test run is over nxdtest: Don't run test.go for multiple GOMAXPROCS wcfs: Make sure to remove mountpoint directory on Server.stop nxdtest: Run WCFS-related tests in verbose mode on testnodes setup: Fix egg_info after addition of δbtail.go
-
Kirill Smelkov authored
* master: *: Cosmetics
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* master: lib/zodb: Mark test_zconn_at as xfail on plain ZODB4
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
This way on plain ZODB4 the following non-wcfs tests will continue to pass test.py/fs-!wcfs test.py/zeo-!wcfs test.py/neo-!wcfs instead of failing as e.g. in here: https://nexedijs.erp5.net/#/test_result_module/20211116-123A66706 On plain ZODB4 WCFS-related functionality - which uses zconn_at - will continue to raise corresponding assertion in WCFS-related tests, as e.g. in https://nexedijs.erp5.net/#/test_result_module/20211116-123A66706/6
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* master: wcfs: Server.stop: Make sure to remove mount entry even if we had to use FUSE abort tests: Don't leak WCFS log files tests: Remove test NEO database after test run is over nxdtest: Don't run test.go for multiple GOMAXPROCS wcfs: Make sure to remove mountpoint directory on Server.stop nxdtest: Run WCFS-related tests in verbose mode on testnodes setup: Fix egg_info after addition of δbtail.go
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* t2: . fixup! wcfs: Handle ZODB invalidations . . wcfs/internal/mm: Complete the package fixup! wcfs: client: Provide client package to care about isolation protocol details lib/zodb: zconn_at: Fix how ZODB4 is asserted to be patched . . . . .
-
Kirill Smelkov authored
-
- 15 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
Server.stop currently tries to unmount, and if that fails invokes FUSE abort and kills wcfs.go. However, it does not call unmount a second time after such an abort, so the filesystem remains mounted (in ENOTCONN state) and rmdir(mountpoint) fails. -> Fix it by calling unmount a second time if we had to abort the FUSE connection. In that second try use lazy unmounting, because a regular unmount can still fail with "Device or resource busy" since there could still be client file descriptors left pointing to the mounted filesystem. With lazy mode, unmounting + followup rmdir hopefully always succeeds. Here is an example test run where one test timed out and the FUSE connection was aborted, but the filesystem was not unmounted and the mountpoint directory was not deleted, which led to all followup tests failing in the setup assert that testmountpoint does not exist: https://nexedijs.erp5.net/#/test_result_module/20211112-1ACEA62D/22 This patch should fix those followup failures + fix another leakage of WCFS mounts in real services.
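The retry logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual Server.stop code: the function names, the `abort_fuse` callback and the injectable `run`/`rmdir` parameters are assumptions made so the flow can be exercised without a real mount. The only external command assumed is `fusermount -u` with its `-z` (lazy) flag.

```python
import os
import subprocess


def unmount(mountpoint, lazy=False, run=subprocess.call):
    """Try to unmount a FUSE filesystem; return True on success."""
    cmd = ["fusermount", "-u"]
    if lazy:
        cmd.append("-z")  # lazy: detach now, finish cleanup once no longer busy
    cmd.append(mountpoint)
    return run(cmd) == 0


def stop(mountpoint, abort_fuse, run=subprocess.call, rmdir=os.rmdir):
    """Unmount mountpoint; on failure abort FUSE and retry lazily (sketch)."""
    if not unmount(mountpoint, run=run):
        abort_fuse()  # hypothetical callback that aborts the FUSE connection
        # After the abort the mount entry still exists (ENOTCONN state),
        # so a second, lazy unmount is needed before rmdir can succeed.
        if not unmount(mountpoint, lazy=True, run=run):
            raise RuntimeError("cannot unmount %s" % mountpoint)
    rmdir(mountpoint)


# Demo with fakes, so that no real mount is touched:
# the first unmount attempt fails, the second (lazy) one succeeds.
calls, events = [], []

def fake_run(cmd):
    calls.append(cmd)
    return 1 if len(calls) == 1 else 0

stop("/mnt/x",
     abort_fuse=lambda: events.append("abort"),
     run=fake_run,
     rmdir=lambda path: events.append(("rmdir", path)))
```

Injecting `run` and `rmdir` keeps the ordering (unmount, abort, lazy unmount, rmdir) testable in isolation.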
-
- 12 Nov, 2021 4 commits
-
-
Kirill Smelkov authored
By default every WCFS run creates several files in /tmp/wcfs.*.log.* and without explicit cleanup those files are left hanging on testnodes. Over last ~6 months we accumulated ~ 300K such files. Don't allow those files to be leaked by instructing WCFS to log to stderr during test run. This should be also useful to see details in the test output.
-
Kirill Smelkov authored
With NEO we were creating the test database in /tmp but were not deleting it at the end. As a result many non-empty /tmp/neo_XXXXXX directories were being leaked. -> Fix it by creating the testdb directory ourselves and removing it at the end, similarly to FileStorage and ZEO. Fixes: 7fc4ec66 (tests: Allow to test with ZEO & NEO ZODB storages)
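A minimal sketch of the "create the directory ourselves and remove it at the end" pattern, using only the standard library. The helper name `scratch_db_dir` and the `neo_` prefix are illustrative assumptions; the actual test harness code is not shown in this log.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_db_dir(prefix="neo_"):
    """Create a temporary database directory and always remove it afterwards."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # rmtree, not rmdir: the storage leaves files behind in the directory.
        shutil.rmtree(path)


# Demo: the directory exists during the "test run" and is gone afterwards.
with scratch_db_dir() as path:
    existed_inside = os.path.isdir(path)
    open(os.path.join(path, "data"), "w").close()  # leave the directory non-empty
left_behind = os.path.exists(path)
```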
-
Kirill Smelkov authored
We run tests with different GOMAXPROCS because some WCFS bugs are only likely to trigger when there are only 1 or 2 main OS threads in WCFS. However test.go does not exercise filesystem functionality - it runs unit tests for ZBlk decoding, ΔBtail and similar. At the same time test.go:* currently occupies ~50% of the whole time to run the full testsuite, with the main consumer being ΔBtail random testing. -> Run test.go only once. This should save ~1000s for each run and lower the whole time to run the wendelin.core testsuite on a testnode from ~60 minutes to ~40 minutes.
-
Kirill Smelkov authored
Else every time test.py/wcfs is run several empty directories are left in /dev/shm/wcfs - each corresponding to WCFS server that was automatically spawned and stopped at the end of the test. Over time this can accumulate to some big number as e.g. ~20000 of such directories were left on the testnode during last 6 months.
-
- 09 Nov, 2021 2 commits
-
-
Kirill Smelkov authored
These are the early days of WCFS - we want full details, which in the default configuration might not be available, to see if WCFS gets stuck for one reason or another. See added comments for details.
-
Kirill Smelkov authored
`python setup.py egg_info` stopped working after we added non-ASCII files, e.g. δbtail.go in 2ab4be93 (wcfs: xbtree: ΔBtail) and δftail.go in f980471f (wcfs: zdata: ΔFtail):

    (neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/nexedi/wendelin.core$ python setup.py egg_info
    running egg_info
    writing requirements to wendelin.core.egg-info/requires.txt
    writing wendelin.core.egg-info/PKG-INFO
    writing top-level names to wendelin.core.egg-info/top_level.txt
    writing dependency_links to wendelin.core.egg-info/dependency_links.txt
    writing entry points to wendelin.core.egg-info/entry_points.txt
    package init file '__init__.py' not found (or not a regular file)
    /usr/lib/python2.7/distutils/filelist.py:64: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      sortable_files.sort()
    Traceback (most recent call last):
      File "setup.py", line 416, in <module>
        """.splitlines()]
      File "/home/kirr/src/tools/go/pygolang/golang/pyx/build.py", line 118, in setup
        setuptools_dso.setup(**kw)
      File "/home/kirr/src/wendelin/venv/z-dev/lib/python2.7/site-packages/setuptools_dso/__init__.py", line 37, in setup
        _setup(**kws)
      File "/home/kirr/src/wendelin/venv/z-dev/lib/python2.7/site-packages/setuptools/__init__.py", line 162, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
        dist.run_commands()
      File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
        cmd_obj.run()
      File "/home/kirr/src/wendelin/venv/z-dev/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 296, in run
        self.find_sources()
      File "/home/kirr/src/wendelin/venv/z-dev/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 303, in find_sources
        mm.run()
      File "/home/kirr/src/wendelin/venv/z-dev/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 538, in run
        self.filelist.sort()
      File "/usr/lib/python2.7/distutils/filelist.py", line 64, in sort
        sortable_files.sort()
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

This happens because by default setuptools collects filenames as str, not unicode, while our git_lsfiles - also registered into the setuptools.file_finders entrypoint - collects filenames as unicode. Previously everything worked because there were no non-ASCII filenames, and so unicode vs str coercion worked automatically. But now that there is a filename like 'δbtail.go', it stopped working and raises UnicodeDecodeError. -> Fix it by adjusting git_lsfiles to collect filenames as UTF-8 encoded strings instead of unicode.
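The idea of the fix, normalizing every collected filename to one representation before sorting, can be sketched as below. This is written for Python 3, where mixing text and byte strings in a sort fails with TypeError rather than with the Python 2 UnicodeDecodeError shown above; the helper names are hypothetical and this is not the actual git_lsfiles code.

```python
def to_utf8_bytes(name):
    """Return name as UTF-8 encoded bytes, whether it was given as text or bytes."""
    if isinstance(name, bytes):
        return name
    return name.encode("utf-8")


def collect_filenames(*sources):
    """Merge filenames from several finders into one sortable, uniform list."""
    names = []
    for src in sources:
        names.extend(to_utf8_bytes(n) for n in src)
    names.sort()  # safe: every element is bytes now
    return names


# One finder yields text, another yields bytes, as in the failure above.
# "\u03b4" is 'δ'; its UTF-8 encoding starts with byte 0xce, the byte
# mentioned in the traceback.
merged = collect_filenames(["\u03b4btail.go"], [b"setup.py"])
```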
-
- 08 Nov, 2021 8 commits
-
-
Kirill Smelkov authored
* master: (40 commits) fixup! wcfs: Handle ZODB invalidations wcfs/internal/mm: Complete the package fixup! wcfs: client: Provide client package to care about isolation protocol details lib/zodb: zconn_at: Fix how ZODB4 is asserted to be patched lib/zodb: zstor_2zurl: Explicitly reject MappingStorage bigfile/zodb: Teach ZBigFile backend to use WCFS wcfs: client: Provide virtmem integration wcfs: client: Add wczsync package to maintain WCFS connection in sync to ZODB connection lib/zodb: Teach zconn_at to work on ZODB4 lib/zodb: Add ZODB.Connection.onShutdownCallback lib/zodb: Teach Connection.onResyncCallback to work on ZODB4 bigfile/py: Allow PyBigFile backend to expose "mmap overlay" functionality bigfile/virtmem: Introduce "mmap overlay" mode wcfs: client: Provide client package to care about isolation protocol details wcfs: Provide isolation to clients wcfs: Handle ZODB invalidations wcfs: Add FileSock FUSE utility wcfs: zdata: ΔFtail wcfs: xbtree: ΔBtail wcfs: xbtree: BTree-diff algorithm ...
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Fix last-minute error that crept in during kirr/wendelin.core@4af54da9 : (neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs$ go test # lab.nexedi.com/nexedi/wendelin.core/wcfs ./wcfs.go:957:4: Errorf format %s has arg sk of wrong type *lab.nexedi.com/nexedi/wendelin.core/wcfs.FileSock Amends 4430de41.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Add two functions, that were developed during wendelin.core 2 α, to the package for completeness: - map_zero_into_ro complements map_zero_ro, but mmaps into user-provided buffer. - sync calls msync on the provided memory.
-
Kirill Smelkov authored
Remove outdated TODO because test_wcfs_watch_before_create passes these days. It was fixed after ΔFtail was taught about epochs and the fix was reflected in kirr/wendelin.core@63ae8326. Amends 10f7153a.
-
Kirill Smelkov authored
Fix how unpatched ZODB4 is reported to lack required patch:

Before:

    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 251, in test_zconn_at
        assert zconn_at(conn1) == at0
      File "/home/kirr/src/wendelin/wendelin.core/lib/zodb.py", line 162, in zconn_at
        assert 'conn:MVCC-via-loadBefore-only' in ZODB.nxd_patches, \
    AttributeError: 'module' object has no attribute 'nxd_patches'

After:

    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 251, in test_zconn_at
        assert zconn_at(conn1) == at0
      File "/home/kirr/src/wendelin/wendelin.core/lib/zodb.py", line 163, in zconn_at
        "nexedi/ZODB!1")
      File "/home/kirr/src/wendelin/wendelin.core/lib/zodb.py", line 191, in _zassertHasNXDPatch
        (zmajor, patch, details_link))
    AssertionError: ZODB4 is not patched with required Nexedi patch 'conn:MVCC-via-loadBefore-only'
        See nexedi/ZODB!1 for details

Fixes 1f866c00 (lib/zodb: Teach zconn_at to work on ZODB4).
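The difference between the two tracebacks comes down to looking up the patch list defensively instead of assuming the attribute exists. A rough sketch follows; the function names and signatures here are illustrative assumptions (only the `nxd_patches` attribute name and the error text come from the tracebacks), and plain namespace objects stand in for the ZODB module.

```python
import types


def missing_patch_error(zodb_module, zmajor, patch, details_link):
    """Return an error message if `patch` is not applied, else None."""
    # getattr with a default avoids AttributeError on a completely unpatched ZODB.
    patches = getattr(zodb_module, "nxd_patches", ())
    if patch in patches:
        return None
    return ("ZODB%d is not patched with required Nexedi patch %r\n\tSee %s for details"
            % (zmajor, patch, details_link))


def zassert_has_nxd_patch(zodb_module, zmajor, patch, details_link):
    """Raise AssertionError with an explanatory message if the patch is missing."""
    err = missing_patch_error(zodb_module, zmajor, patch, details_link)
    if err is not None:
        raise AssertionError(err)


# Stand-ins for an unpatched and a patched ZODB module (assumption for the demo).
PATCH = "conn:MVCC-via-loadBefore-only"
unpatched = types.SimpleNamespace()
patched = types.SimpleNamespace(nxd_patches={PATCH})
```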
-
- 28 Oct, 2021 13 commits
-
-
Kirill Smelkov authored
* master: (36 commits) lib/zodb: zstor_2zurl: Explicitly reject MappingStorage bigfile/zodb: Teach ZBigFile backend to use WCFS wcfs: client: Provide virtmem integration wcfs: client: Add wczsync package to maintain WCFS connection in sync to ZODB connection lib/zodb: Teach zconn_at to work on ZODB4 lib/zodb: Add ZODB.Connection.onShutdownCallback lib/zodb: Teach Connection.onResyncCallback to work on ZODB4 bigfile/py: Allow PyBigFile backend to expose "mmap overlay" functionality bigfile/virtmem: Introduce "mmap overlay" mode wcfs: client: Provide client package to care about isolation protocol details wcfs: Provide isolation to clients wcfs: Handle ZODB invalidations wcfs: Add FileSock FUSE utility wcfs: zdata: ΔFtail wcfs: xbtree: ΔBtail wcfs: xbtree: BTree-diff algorithm wcfs: xbtree: blib += PPTreeSubSet, ΔPPTreeSubSet wcfs: xbtree: blib += RangedMap, RangedKeySet wcfs: tests: Tree-based testing environment wcfs: Set package ...
-
Kirill Smelkov authored
It is not possible for WCFS to access data of in-RAM storage of another process. But without explicit explanation the error message is confusing - it was something like: NotImplementedError: don't know how to extract zurl from <ZODB.MappingStorage.MappingStorage object at 0x7f28f04cea10> which suggests it was just not implemented.
-
Kirill Smelkov authored
By using WCFS as mmap-overlay for base data(*). WCFS-mode is still opt-in with default remaining to use old full user-space virtual memory manager mode as initially introduced in 2015. Wendelin.core should be draftly usable in WCFS mode now. This patch is organized as follows: - file_zodb.cpp provides mmap-overlay operations for WCFS implemented via WCFS client library. - file_zodb.py is adjusted accordingly to use WCFS if requested. Low-level things specific to gluing to file_zodb.cpp are moved to _file_zodb.pyx. - the rest of the changes are drive-by by main ones. (*) see the following patches for what is mmap-overlay: - fae045cc (bigfile/virtmem: Introduce "mmap overlay" mode) - 23362204 (bigfile/py: Allow PyBigFile backend to expose "mmap overlay" functionality) Some preliminary history: kirr/wendelin.core@01916f09 X Draft demo that reading data through wcfs works kirr/wendelin.core@fd58082a X Fix build on old GCC kirr/wendelin.core@f622e751 X tests: Stop wcfs spawned during tests kirr/wendelin.core@f118617b X tests: Don't try to stop wcfs that is already exited
-
Kirill Smelkov authored
Provide integration with virtmem, so that WCFS Mapping can be associated and managed under virtmem VMA. In other words provide support so that WCFS can be used as ZBigFile backend in "mmap overlay" mode (see fae045cc "bigfile/virtmem: Introduce "mmap overlay" mode" for description of mmap-overlay mode). We'll need this functionality for ZBigFile + WCFS client integration. Virtmem integration will be tested via running whole wendelin.core functional testsuite in wcfs-mode after the next patch. Quoting added description: ---- 8< ---- Integration with wendelin.core virtmem layer ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This client package can be used standalone, but additionally provides integration with wendelin.core userspace virtual memory manager: when a Mapping is created, it can be associated as serving base layer for a particular virtmem VMA via FileH.mmap(vma=...). In that case, since virtmem itself adds another layer of dirty pages over read-only base provided by Mapping(+) ┌──┐ ┌──┐ │RW│ │RW│ ← virtmem VMA dirty pages └──┘ └──┘ + VMA base = X@at view provided by Mapping: ___ /@revA/bigfile/X __ /@revB/bigfile/X _ /@revC/bigfile/X + ... ─── ───── ────────────────────────── ───── /head/bigfile/X the Mapping will interact with virtmem layer to coordinate updates to mapping virtual memory. How it works ~~~~~~~~~~~~ Wcfs client integrates with virtmem layer to support virtmem handle dirtying pages of read-only base-layer that wcfs client provides via isolated Mapping. For wcfs-backed bigfiles every virtmem VMA is interlinked with Mapping: VMA -> BigFileH -> ZBigFile -----> Z ↑↓ O Mapping -> FileH -> wcfs server --> DB When a page is write-accessed, virtmem mmaps in a page of RAM in place of accessed virtual memory, copies base-layer content provided by Mapping into there, and marks that page as read-write. 
Upon receiving pin message, the pinner consults virtmem, whether corresponding page was already dirtied in virtmem's BigFileH (call to __fileh_page_isdirty), and if it was, the pinner does not remmap Mapping part to wcfs/@revX/f and just leaves dirty page in its place, remembering pin information in fileh._pinned. Once dirty pages are no longer needed (either after discard/abort or writeout/commit), virtmem asks wcfs client to remmap corresponding regions of Mapping in its place again via calls to Mapping.remmap_blk for previously dirtied blocks. The scheme outlined above does not need to split Mapping upon dirtying an inner page. See bigfile_ops interface (wendelin/bigfile/file.h) that explains base-layer and overlaying from virtmem point of view. For wcfs this interface is provided by small wcfs client wrapper in bigfile/file_zodb.cpp. (+) see bigfile_ops interface (wendelin/bigfile/file.h) that gives virtmem point of view on layering. ---------------------------------------- Some preliminary history: kirr/wendelin.core@f330bd2f X wcfs/client: Overview += interaction with virtmem layer
-
Kirill Smelkov authored
For ZBigFile + WCFS client integration we'll need to open WCFS connections that observe the database at the same state as the current ZODB connection. Later that WCFS connection needs to adjust its on-WCFS view in accordance with how the ZODB connection adjusts its own. Wczsync provides a function to do so: pywconnOf(zconn) will open a WCFS connection and maintain it in sync with ZODB connection zconn. Some preliminary history: kirr/wendelin.core@8bf8f23b X bigfile/_file_zodb: Fix logic around ZSync usage kirr/wendelin.core@571cb737 fixup! X bigfile/_file_zodb: Fix logic around ZSync usage kirr/wendelin.core@a9a82d5a X bigfile/_file_zodb: Fix ZSync to close not only wconn, but also wconn.wc through which wconn was created kirr/wendelin.core@cf92937f X wcfs: Move wconn<->zconn sync functionality into wcfs.client._wczsync kirr/wendelin.core@7203d7ab X wcfs: Fix ZSync to close wconn on zdb.close, even if zconn stays alive
-
Kirill Smelkov authored
In 3bd82127 (lib/zodb: Add zconn_at draft (ZODB5 only)) we added zconn_at function to find out as of which state a ZODB connection is viewing the database. That was ZODB5-only however. Let's add support for ZODB4 now - by requiring ZODB4-wc2 - a version of ZODB4 with MVCC backported from ZODB5: nexedi/ZODB!1 This makes wendelin.core to work on either ZODB5 or ZODB4-wc2, but not plain ZODB4. However as zconn_at will be used only for WCFS-integration, non-wcfs mode will continue to work on all ZODB5, ZODB4-wc2 and plain ZODB4. ZBigFile + WCFS client integration will use zconn_at to open WCFS connection that corresponds to ZODB connection. Preliminary history: kirr/wendelin.core@1c3b7750 X zconn_at for ZODB4
-
Kirill Smelkov authored
Add patch to ZODB.Connection to support a callback that is run after the database is closed. ZBigFile + WCFS client integration will use this callback to close the WCFS connection when the corresponding ZODB.DB is closed. Preliminary history: kirr/wendelin.core@a26d9659 X lib/zodb: Connection += onShutdownCallback
-
Kirill Smelkov authored
In 959ae2d0 (lib/zodb: Add patch to ZODB.Connection to support callback on connection DB view change) we added patch for ZODB.Connection to support callback when database view of the connection changes. At that time the patch was working for ZODB5 and ZODB4 was TODO. Let's add support for ZODB4 (both ZODB4 and ZODB4-wc2) now. As a reminder: ZBigFile + WCFS client integration will use this callback to keep WCFS connection in sync with ZODB connection. Preliminary history: kirr/wendelin.core@533a4cfa X onResyncCallback for ZODB4
-
Kirill Smelkov authored
This patch logically continues previous change `bigfile/virtmem: Introduce "mmap overlay" mode` and exposes mmap-overlay functionality to Python: if PyBigFile backend provides .blkmmapper PyCapsule the mmap-related methods will be extracted from it and passed on through to virtmem - see _bigfile.h for details. ZBigFile will use this to hook into using WCFS.
-
Kirill Smelkov authored
with the intention to later use WCFS through it.

Before this patch virtmem had only one mode: a BigFile backend was providing loadblk and storeblk methods, and on every block access loadblk was called to load block data into an allocated RAM page. However with WCFS virtmem won't need to do anything to load data, because loading from head/bigfile/f mmaped through the OS will be handled by the OS directly. Thus for wcfs, that leaves virtmem only to handle dirtying and writeout.

-> Introduce "mmap overlay" mode into virtmem to handle WCFS-like BigFile backends - those that can provide a read-only base layer suitable for mmapping.

This patch is organized as follows:

- fileh_open gains a flags argument to indicate which mode to use for the opened fileh. BigFileH correspondingly gains the .mmap_overlay bitfield. (virtmem.h)
- struct bigfile_ops is extended with 3 optional methods that a BigFile backend might provide to support mmap-overlay mode:
  * mmap_setup_read,
  * remmap_blk_read, and
  * munmap
  (see file.h changes for documentation of this new interface)
- if opened with the MMAP_OVERLAY flag, virtmem uses those methods to organize VMA views backed by the read-only base mmap layer, and writeout for such VMAs (virtmem.c)
- a test is added to exercise MMAP_OVERLAY virtmem mode (test_virtmem.c)
- everything else, including bigfile.py, is switched to use DONT_MMAP_OVERLAY unconditionally for now.

In internal comments inside virtmem the new mode is interchangeably called "mmap overlay" and "wcfs", even though wcfs is not yet hooked up for mmap-overlaying.

Some preliminary history: kirr/wendelin.core@fb6932a2 X Split PAGE_LOADED -> PAGE_LOADED, PAGE_LOADED_FOR_WRITE kirr/wendelin.core@4a20a573 X Settled on what should happen after writeout for wcfs case kirr/wendelin.core@f084ff9b X Transition to all VMA under 1 fileh to be either all based on wcfs or all based on !wcfs
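As a rough mental model of the "mmap overlay" mode described above (read-only base layer, with virtmem handling only dirtying and writeout), consider the toy sketch below. It is not the virtmem implementation, which works on real memory mappings in C; the class and method names are invented for illustration and pages are modelled as byte strings.

```python
class OverlayVMA:
    """Toy model: dirty pages overlaid on top of a read-only base layer."""

    def __init__(self, base_pages):
        self.base = base_pages   # read-only base layer (list of bytes objects)
        self.dirty = {}          # page index -> bytearray with modified content

    def read(self, pgidx):
        # Reads come straight from the base unless the page was dirtied.
        if pgidx in self.dirty:
            return bytes(self.dirty[pgidx])
        return self.base[pgidx]

    def write(self, pgidx, offset, data):
        # On first write access copy the base content into a private page,
        # then modify the copy; the base layer itself is never touched.
        page = self.dirty.get(pgidx)
        if page is None:
            page = bytearray(self.base[pgidx])
            self.dirty[pgidx] = page
        page[offset:offset + len(data)] = data

    def discard(self):
        # After abort (or once writeout completes) the dirty pages are dropped
        # and reads fall back to the base layer again.
        self.dirty.clear()


vma = OverlayVMA([b"aaaa", b"bbbb"])
vma.write(1, 0, b"XY")
after_write = (vma.read(0), vma.read(1))
vma.discard()
after_discard = (vma.read(0), vma.read(1))
```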
-
Kirill Smelkov authored
This patch follows-up on previous patch, that added server-side part of isolation protocol handling, and adds client package that takes care about WCFS isolation protocol details and provides to clients simple interface to isolated view of bigfile data on WCFS similar to regular files: given a particular revision of database @at, it provides synthetic read-only bigfile memory mappings with data corresponding to @at state, but using /head/bigfile/* most of the time to build and maintain the mappings. The patch is organized as follows: - wcfs.h and wcfs.cpp brings in usage documentation, internal overview and the main part of the implementation. - wcfs/client/client_test.py is tests. - The rest of the changes in wcfs/client/ are to support the implementation and tests. Quoting package documentation for the reference: ---- 8< ---- Package wcfs provides WCFS client. This client package takes care about WCFS isolation protocol details and provides to clients simple interface to isolated view of bigfile data on WCFS similar to regular files: given a particular revision of database @at, it provides synthetic read-only bigfile memory mappings with data corresponding to @at state, but using /head/bigfile/* most of the time to build and maintain the mappings. For its data a mapping to bigfile X mostly reuses kernel cache for /head/bigfile/X with amount of data not associated with kernel cache for /head/bigfile/X being proportional to δ(bigfile/X, at..head). In the usual case where many client workers simultaneously serve requests, their database views are a bit outdated, but close to head, which means that in practice the kernel cache for /head/bigfile/* is being used almost 100% of the time. A mapping for bigfile X@at is built from OS-level memory mappings of on-WCFS files as follows: ___ /@revA/bigfile/X __ /@revB/bigfile/X _ /@revC/bigfile/X + ... 
─── ───── ────────────────────────── ───── /head/bigfile/X where @revR mmaps are being dynamically added/removed by this client package to maintain X@at data view according to WCFS isolation protocol(*). API overview - `WCFS` represents filesystem-level connection to wcfs server. - `Conn` represents logical connection that provides view of data on wcfs filesystem as of particular database state. - `FileH` represent isolated file view under Conn. - `Mapping` represents one memory mapping of FileH. A path from WCFS to Mapping is as follows: WCFS.connect(at) -> Conn Conn.open(foid) -> FileH FileH.mmap([blk_start +blk_len)) -> Mapping A connection can be resynced to another database view via Conn.resync(at'). Documentation for classes provides more thorough overview and API details. -------- (*) see wcfs.go documentation for WCFS isolation protocol overview and details. . Wcfs client organization ~~~~~~~~~~~~~~~~~~~~~~~~ Wcfs client provides to its users isolated bigfile views backed by data on WCFS filesystem. In the absence of Isolation property, wcfs client would reduce to just directly using OS-level file wcfs/head/f for a bigfile f. On the other hand there is a simple, but inefficient, way to support isolation: for @at database view of bigfile f - directly use OS-level file wcfs/@at/f. The latter works, but is very inefficient because OS-cache for f data is not shared in between two connections with @at1 and @at2 views. The cache is also lost when connection view of the database is resynced on transaction boundary. To support isolation efficiently, wcfs client uses wcfs/head/f most of the time, but injects wcfs/@revX/f parts into mappings to maintain f@at view driven by pin messages that wcfs server sends to client in accordance to WCFS isolation protocol(*). Wcfs server sends pin messages synchronously triggered by access to mmaped memory. 
That means that a client thread, that is accessing wcfs/head/f mmap, is completely blocked while wcfs server sends pins and waits to receive acks from all clients. In other words on-client handling of pins has to be done in separate thread, because wcfs server can also send pins to client that triggered the access. Wcfs client implements pins handling in so-called "pinner" thread(+). The pinner thread receives pin requests from wcfs server via watchlink handle opened through wcfs/head/watch. For every pin request the pinner finds corresponding Mappings and injects wcfs/@revX/f parts via Mapping._remmapblk appropriately. The same watchlink handle is used to send client-originated requests to wcfs server. The requests are sent to tell wcfs that client wants to observe a particular bigfile as of particular revision, or to stop watching it. Such requests originate from regular client threads - not pinner - via entry points like Conn.open, Conn.resync and FileH.close. Every FileH maintains fileh._pinned {} with currently pinned blk -> rev. This dict is updated by pinner driven by pin messages, and is used when new fileh Mapping is created (FileH.mmap). In wendelin.core a bigfile has semantic that it is infinite in size and reads as all zeros beyond region initialized with data. Memory-mapping of OS-level files can also go beyond file size, however accessing memory corresponding to file region after file.size triggers SIGBUS. To preserve wendelin.core semantic wcfs client mmaps-in zeros for Mapping regions after wcfs/head/f.size. For simplicity it is assumed that bigfiles only grow and never shrink. It is indeed currently so, but will have to be revisited if/when wendelin.core adds bigfile truncation. Wcfs client restats wcfs/head/f at every transaction boundary (Conn.resync) and remembers f.size in FileH._headfsize for use during one transaction(%). -------- (*) see wcfs.go documentation for WCFS isolation protocol overview and details. 
(+) currently, for simplicity, there is one pinner thread for each connection. In the future, for efficiency, it might be reworked to be one pinner thread that serves all connections simultaneously. (%) see _headWait comments on how this has to be reworked. Wcfs client locking organization Wcfs client needs to synchronize regular user threads vs each other and vs pinner. A major lock Conn.atMu protects updates to changes to Conn's view of the database. Whenever atMu.W is taken - Conn.at is changing (Conn.resync), and contrary whenever atMu.R is taken - Conn.at is stable (roughly speaking Conn.resync is not running). Similarly to wcfs.go(*) several locks that protect internal data structures are minor to Conn.atMu - they need to be taken only under atMu.R (to synchronize e.g. multiple fileh open running simultaneously), but do not need to be taken at all if atMu.W is taken. In data structures such locks are noted as follows sync::Mutex xMu; // atMu.W | atMu.R + xMu After atMu, Conn.filehMu protects registry of opened file handles (Conn._filehTab), and FileH.mmapMu protects registry of created Mappings (FileH.mmaps) and FileH.pinned. Several locks are RWMutex instead of just Mutex not only to allow more concurrency, but, in the first place for correctness: pinner thread being core element in handling WCFS isolation protocol, is effectively invoked synchronously from other threads via messages coming through wcfs server. For example Conn.resync sends watch request to wcfs server and waits for the answer. Wcfs server, in turn, might send corresponding pin messages to the pinner and _wait_ for the answer before answering to resync: - - - - - - | .···|·····. ----> = request pinner <------.↓ <···· = response | | wcfs resync -------^↓ | `····|····· - - - - - - client process This creates the necessity to use RWMutex for locks that pinner and other parts of the code could be using at the same time in synchronous scenarios similar to the above. 
These locks are:

- Conn.atMu
- Conn.filehMu

Note that FileH.mmapMu is a regular - not RW - mutex, since nothing in wcfs client calls into wcfs server via watchlink with mmapMu held.

The ordering of locks is:

    Conn.atMu > Conn.filehMu > FileH.mmapMu

The pinner takes the following locks:

- wconn.atMu.R
- wconn.filehMu.R
- fileh.mmapMu (to read .mmaps + write .pinned)

(*) see "Wcfs locking organization" in wcfs.go

Handling of fork

When a process calls fork, the OS copies its memory and creates a child process with only 1 thread. That child inherits file descriptors and memory mappings from the parent. To correctly continue using Conn, FileH and Mappings, the child must recreate the pinner thread and reconnect to wcfs via a reopened watchlink. The reason is that without reconnection - by using the watchlink file descriptor inherited from the parent - the child would interfere with the parent-wcfs exchange, and neither parent nor child could continue normal protocol communication with WCFS.

For simplicity, since fork is seldom used for things besides a followup exec, wcfs client currently takes the straightforward approach of disabling mappings and detaching from the WCFS server in the child right after fork. This ensures that there is no interference with the parent-wcfs exchange should the child decide not to exec and to continue running in the forked thread. Without this protection the interference might come even automatically via e.g. Python GC -> PyFileH.__del__ -> FileH.close -> message to WCFS.

----------------------------------------

Some preliminary history: kirr/wendelin.core@a8fa9178 X wcfs: move client tests into client/ kirr/wendelin.core@990afac1 X wcfs/client: Package overview (draft) kirr/wendelin.core@3f83469c X wcfs: client: Handle fork kirr/wendelin.core@0ed6b8b6 fixup! X wcfs: client: Handle fork kirr/wendelin.core@24378c46 X wcfs: client: Provide Conn.at()
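The way a Mapping for X@at is composed, as described in the client package documentation quoted above, can be modelled very roughly as a per-block lookup: pinned blocks come from wcfs/@rev/f, blocks past the head file size read as zeros, and everything else comes from wcfs/head/f. The sketch below is a simplified model under those assumptions; `block_source` and its parameters are hypothetical names, with `pinned` playing the role of fileh._pinned and `head_size_blocks` the role of FileH._headfsize expressed in blocks.

```python
def block_source(blk, pinned, head_size_blocks):
    """Tell which layer serves block `blk` of a mapping (toy model)."""
    if blk in pinned:
        # The pinner injected a revision-specific part for this block.
        return "@%s" % pinned[blk]
    if blk >= head_size_blocks:
        # Bigfiles read as zeros beyond the initialized region; mapping in
        # zeros avoids SIGBUS on access past the end of the OS-level file.
        return "zero"
    return "head"


# Example: head file has 4 blocks; blocks 1 and 2 were changed after @at,
# so they are pinned to the revisions that X@at should observe.
pinned = {1: "revB", 2: "revA"}
layout = [block_source(blk, pinned, head_size_blocks=4) for blk in range(6)]
```

In this model most blocks resolve to "head", which matches the stated goal of reusing the kernel cache for /head/bigfile/X most of the time.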
-
Kirill Smelkov authored
Via custom isolation protocol that both server and clients must cooperatively follow. This is the core change that enables file cache to be practically shared while each client can still be provided with isolated view of the database. This patch brings only server changes, tests + the minimum client bits to support the tests. The client library, that will implement isolation protocol on client side, will come next. This patch is organized as follows: - wcfs.go brings in description of the protocol, overview of how server implements that protocol and the implementation itself. See also notes.txt - wcfs_test.py brings in tests for server implementation. tWCFS._abort_ontimeout had to be moved into nogil mode into wcfs_test.pyx to avoid deadlock on the GIL (see comments in wcfs_test.pyx for details). - files added in wcfs/client/ are needed to provide client-side implementation of WatchLink - the message exchange protocol over opened head/watch file - for tests. Client-side watchlink implementation lives in wcfs/client/wcfs_watchlink.{h,cpp}. The other additions in wcfs/client/ are to support that and to expose the WatchLink to Python. Client-side bits are done right in C++ because upcoming WCFS client library will be implemented in C++ to work in nogil mode in order to avoid deadlock on the GIL because client-side pinner thread might be woken-up synchronously by WCFS server at any moment, including when another client thread already holds the GIL and is paused by WCFS. 
Some preliminary history: kirr/wendelin.core@9b4a42a3 X invalidation design draftly settled kirr/wendelin.core@27d91d47 X δFtail settled kirr/wendelin.core@c27c1940 X mmap over under pagefault to this mmapping works kirr/wendelin.core@d36b171f X ptrace when client is under pagefault or syscall won't work kirr/wendelin.core@c1f5bb19 X notes on why lazy-invalidate approach was taken kirr/wendelin.core@4fbdd270 X Proof that that it is possible to change mmapping while under pagefault to it kirr/wendelin.core@33e0dfce X ΔTail draftly done kirr/wendelin.core@12628943 X make sure "bye" is always processed immediately - even if a handleWatch is currently blocked kirr/wendelin.core@af0a64cb X test for "bye" canceling blocked handlers kirr/wendelin.core@996dc6a8 X Fix race in test kirr/wendelin.core@43915fe9 X wcfs: Don't forbid simultaneous watch requests kirr/wendelin.core@941dc54b X wcfs: threading.Lock -> sync.Mutex kirr/wendelin.core@d75b2304 X wcfs: Move _abort_ontimeout to pyx/nogil kirr/wendelin.core@79234659 X Notes on why eagier invalidation was rejected kirr/wendelin.core@f05271b1 X Test that sysread(/head/watch) can be interrupted kirr/wendelin.core@5ba816da X restore test_wcfs_watch_robust after f05271b1. kirr/wendelin.core@4bd88564 X "Invalidation protocol" -> "Isolation protocol" kirr/wendelin.core@f7b54ca4 X avoid fmt::vsprintf (now compils again with latest pygolang@master) kirr/wendelin.core@0a8fcd9d X wcfs/client: Move EOF -> pygolang kirr/wendelin.core@153e02e6 X test_wcfs_watch_setup and test_wcfs_watch_setup_ahead work again kirr/wendelin.core@17f98edc X wcfs: client: os: Factor syserr -> string into _sysErrString kirr/wendelin.core@7b0c301c X wcfs: tests: Fix tFile.assertBlk not to segfault on a test failure kirr/wendelin.core@b74dda09 X Start switching Track from Track(key) to Track(keycov) kirr/wendelin.core@8b5d8523 X Move tracking of which blocks were accessed from wcfs to ΔFtail
-
Kirill Smelkov authored
Use ΔFtail.Track on every READ, and upon receiving a ZODB invalidation query the accumulated ΔFtail about which blocks of which files have been changed. Then invalidate those blocks in the OS file cache. See added documentation in wcfs.go and notes.txt for details. Now the filesystem is no longer stale: it provides a view of data that is up to date with respect to changes on ZODB storage. Some preliminary history: kirr/wendelin.core@9b4a42a3 X invalidation design draftly settled kirr/wendelin.core@27d91d47 X δFtail settled kirr/wendelin.core@33e0dfce X ΔTail draftly done kirr/wendelin.core@822366a7 X keeping fd to root opened prevents the filesystem from being unmounted kirr/wendelin.core@89ad3a79 X Don't keep ZBigFile activated during whole current transaction kirr/wendelin.core@245511ac X Give pointer on from where to get nxd-fuse.ko kirr/wendelin.core@d1cd128c X Hit FUSE-related deadlock kirr/wendelin.core@d134ee44 X FUSE lookup deadlock should be hopefully fixed kirr/wendelin.core@0e60e9ff X wcfs: Don't noise ZWatcher trace logs with "select ..." kirr/wendelin.core@bf9a7405 X No longer rely on ZODB cache invariant for invalidations
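The track-on-read, query-on-invalidation flow can be illustrated with a deliberately simplified model. The real ΔFtail works on BTree differences and revisions; the `ReadTracker` class below is a hypothetical stand-in that only remembers which blocks were read and intersects them with a set of changed blocks.

```python
class ReadTracker:
    """Toy model of tracking reads and answering 'what must be invalidated?'."""

    def __init__(self):
        self.tracked = {}  # file id -> set of block numbers that were read

    def track(self, foid, blk):
        # Called on every READ: remember that this block may sit in the OS cache.
        self.tracked.setdefault(foid, set()).add(blk)

    def to_invalidate(self, changed):
        # `changed` maps file id -> set of blocks changed by a transaction.
        # Only blocks that were actually read need invalidation in the cache.
        result = {}
        for foid, blks in changed.items():
            hit = blks & self.tracked.get(foid, set())
            if hit:
                result[foid] = hit
        return result


tracker = ReadTracker()
tracker.track("f1", 0)
tracker.track("f1", 3)
tracker.track("f2", 7)
# A transaction changed blocks 3 and 5 of f1, and block 1 of f3.
stale = tracker.to_invalidate({"f1": {3, 5}, "f3": {1}})
```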
-