wcfs: zdata: ΔFtail

ΔFtail builds on ΔBtail and provides ZBigFile-level history that WCFS
will use to compute which blocks of a ZBigFile need to be invalidated in
OS file cache given raw ZODB changes on ZODB invalidation message.

It also will be used by WCFS to implement isolation protocol, where on
every FUSE READ request WCFS will query ΔFtail to find out revision of
corresponding file block.

Quoting ΔFtail documentation:

---- 8< ----

ΔFtail provides ZBigFile-level history tail.

It translates ZODB object-level changes to information about which blocks of
which ZBigFile were modified, and provides service to query that information.

ΔFtail class documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~

ΔFtail represents tail of revisional changes to files.

It semantically consists of

    []δF        ; rev ∈ (tail, head]

where δF represents a change in files space

    δF:
        .rev↑
        {} file -> {}blk | EPOCH

Only files and blocks explicitly requested to be tracked are guaranteed to
be present. In particular a block that was not explicitly requested to be
tracked, even if it was changed in δZ, is not guaranteed to be present in δF.

After file epoch (file creation, deletion, or any other change to file
object) previous track requests for that file become forgotten and have no
further effect.

ΔFtail provides the following operations:

    .Track(file, blk, path, zblk)            - add file and block reached via BTree path to tracked set.

    .Update(δZ) -> δF                        - update files δ tail given raw ZODB changes
    .ForgetPast(revCut)                      - forget changes ≤ revCut
    .SliceByRev(lo, hi) -> []δF              - query for all files changes with rev ∈ (lo, hi]
    .SliceByFileRev(file, lo, hi) -> []δfile - query for changes of a file with rev ∈ (lo, hi]
    .BlkRevAt(file, #blk, at) -> blkrev      - query for what is last revision that changed
                                               file[#blk] as of @at database state.

where δfile represents a change to one file

    δfile:
        .rev↑
        {}blk | EPOCH

See also zodb.ΔTail and xbtree.ΔBtail

Concurrency

ΔFtail is safe to use in single-writer / multiple-readers mode.
That is at any time there should be either only sole writer, or, potentially
several simultaneous readers. The table below classifies operations:

    Writers:  Update, ForgetPast
    Readers:  Track + all queries (SliceByRev, SliceByFileRev, BlkRevAt)

Note that, in particular, it is correct to run multiple Track and queries
requests simultaneously.

ΔFtail organization
~~~~~~~~~~~~~~~~~~~

ΔFtail leverages:

    - ΔBtail to track changes to ZBigFile.blktab BTree, and
    - ΔZtail to track changes to ZBlk objects and to ZBigFile object itself.

then every query merges ΔBtail and ΔZtail data on the fly to provide
ZBigFile-level result.

Merging on the fly, contrary to computing and maintaining vδF data, is done
to avoid complexity of recomputing vδF when tracking set changes. Most of
ΔFtail complexity is, thus, located in ΔBtail, which implements BTree diff
and handles complexity of recomputing vδB when set of tracked blocks
changes after new track requests.

Changes to ZBigFile object indicate epochs. Epochs could be:

    - file creation or deletion,
    - change of ZBigFile.blksize,
    - change of ZBigFile.blktab to point to another BTree.

Epochs represent major changes to file history where file is assumed to
change so dramatically, that practically it can be considered to be a
"whole" change. In particular, WCFS, upon seeing a ZBigFile epoch,
invalidates all data in corresponding OS-level cache for the file.

The only historical data, that ΔFtail maintains by itself, is history of
epochs. That history does not need to be recomputed when more blocks become
tracked and is thus easy to maintain. It also can be maintained only in
ΔFtail because ΔBtail and ΔZtail does not "know" anything about ZBigFile.

Concurrency

In order to allow multiple Track and queries requests to be served in
parallel, ΔFtail bases its concurrency promise on ΔBtail guarantees +
snapshot-style access for vδE and ztrackInBlk in queries:

1. Track calls ΔBtail.Track and quickly updates .byFile, .byRoot and
   _RootTrack indices under a lock.

2. BlkRevAt queries ΔBtail.GetAt and then combines retrieved information
   about zblk with vδE and δZ.

3. SliceByFileRev queries ΔBtail.SliceByRootRev and then merges retrieved
   vδT data with vδZ, vδE and ztrackInBlk.

4. In queries vδE is retrieved/built in snapshot style similarly to how vδT
   is built in ΔBtail. Note that vδE needs to be built only the first time,
   and does not need to be further rebuilt, so the logic in ΔFtail is simpler
   compared to ΔBtail.

5. for ztrackInBlk - that is used by SliceByFileRev query - an atomic
   snapshot is retrieved for objects of interest. This allows to hold
   δFtail.mu lock for relatively brief time without blocking other parallel
   Track/queries requests for long.

Combined this organization allows non-overlapping queries/track-requests
to run simultaneously. (This property is essential to WCFS because otherwise
WCFS would not be able to serve several non-overlapping READ requests to one
file in parallel.)

See also "Concurrency" in ΔBtail organization for more details.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some preliminary history:

kirr/wendelin.core@ef74aebc    X ΔFtail: Keep reference to ZBigFile via Oid, not via *ZBigFile
kirr/wendelin.core@bf9a7405    X No longer rely on ZODB cache invariant for invalidations
kirr/wendelin.core@46340069    X found by Random
kirr/wendelin.core@e7b598c6    X start of ΔFtail.SliceByFileRev rework to function via merging δB and δZ histories on the fly
kirr/wendelin.core@59c83009    X ΔFtail.SliceByFileRoot tests started to work draftly after "on-the-fly" rework
kirr/wendelin.core@210e9b07    X Fix ΔBtail.SliceByRootRev (lo,hi] handling
kirr/wendelin.core@bf3ace66    X ΔFtail: Rebuild vδE after first track
kirr/wendelin.core@46624787    X ΔFtail: `go test -failfast -short -v -run Random -randseed=1626793016249041295` discovered problems
kirr/wendelin.core@786dd336    X Size no longer tracks [0,∞) since we start tracking when zfile is non-empty
kirr/wendelin.core@4f707117    X test that shows problem of SliceByRootRev where untracked blocks are not added uniformly into whole history
kirr/wendelin.core@c0b7e4c3    X ΔFtail.SliceByFileRev: Fix untracked entries to be present uniformly in result
kirr/wendelin.core@aac37c11    X zdata: Introduce T to start removing duplication in tests
kirr/wendelin.core@bf411aa9    X zdata: Deduplicate zfile loading
kirr/wendelin.core@b74dda09    X Start switching Track from Track(key) to Track(keycov)
kirr/wendelin.core@aa0288ce    X Switch SliceByRootRev to vδTSnapForTracked
kirr/wendelin.core@588a512a    X zdata: Switch SliceByFileRev not to clone Zinblk
kirr/wendelin.core@8b5d8523    X Move tracking of which blocks were accessed from wcfs to ΔFtail
kirr/wendelin.core@30f5ddc7    ΔFtail += .Epoch in δf
kirr/wendelin.core@22f5f096    X Rework ΔFtail so that BlkRevAt works with ZBigFile checkout from any at ∈ (tail, head]
kirr/wendelin.core@0853cc9f    X ΔFtail + tests
kirr/wendelin.core@124688f9    X ΔFtail fixes
kirr/wendelin.core@d85bb82c    ΔFtail concurrency
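
For illustration only, the query semantics quoted above (changes kept for
rev ∈ (tail, head], writers Update/ForgetPast, readers SliceByFileRev/BlkRevAt)
can be sketched in a few lines of Go. This is not the wendelin.core
implementation: ToyFtail, DeltaFile, Rev, Oid and the blocks helper are
hypothetical names, tracking and the on-the-fly ΔBtail/ΔZtail merge are left
out, a plain RWMutex stands in for the single-writer / multiple-readers rule,
and returning tail from BlkRevAt when no change is recorded is an assumption
of this toy.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Rev stands in for a ZODB revision, Oid for a ZBigFile object ID (hypothetical).
type Rev uint64
type Oid uint64

// DeltaFile mirrors δfile: blocks of one file changed at Rev, or an epoch.
type DeltaFile struct {
	Rev    Rev
	Blocks map[int64]struct{}
	Epoch  bool
}

// ToyFtail keeps per-file changes with rev ∈ (tail, head], sorted by Rev.
type ToyFtail struct {
	mu         sync.RWMutex
	tail, head Rev
	byFile     map[Oid][]DeltaFile
}

func NewToyFtail(at Rev) *ToyFtail {
	return &ToyFtail{tail: at, head: at, byFile: map[Oid][]DeltaFile{}}
}

// Update is a writer: it records changes for a new head revision.
func (t *ToyFtail) Update(rev Rev, changes map[Oid]DeltaFile) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if rev <= t.head {
		panic("Update: rev must be > head")
	}
	for file, d := range changes {
		d.Rev = rev
		t.byFile[file] = append(t.byFile[file], d)
	}
	t.head = rev
}

// ForgetPast is a writer: it drops changes with rev ≤ revCut.
func (t *ToyFtail) ForgetPast(revCut Rev) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for file, v := range t.byFile {
		i := sort.Search(len(v), func(k int) bool { return v[k].Rev > revCut })
		t.byFile[file] = v[i:]
	}
	if revCut > t.tail {
		t.tail = revCut
	}
}

// SliceByFileRev is a reader: changes of file with rev ∈ (lo, hi].
func (t *ToyFtail) SliceByFileRev(file Oid, lo, hi Rev) []DeltaFile {
	t.mu.RLock()
	defer t.mu.RUnlock()
	v := t.byFile[file]
	i := sort.Search(len(v), func(k int) bool { return v[k].Rev > lo })
	j := sort.Search(len(v), func(k int) bool { return v[k].Rev > hi })
	if j < i {
		j = i // empty range when hi < lo
	}
	return append([]DeltaFile(nil), v[i:j]...)
}

// BlkRevAt is a reader: last revision ≤ at that changed file[blk].
// An epoch counts as a change of every block; tail means "nothing recorded".
func (t *ToyFtail) BlkRevAt(file Oid, blk int64, at Rev) Rev {
	t.mu.RLock()
	defer t.mu.RUnlock()
	v := t.byFile[file]
	for k := len(v) - 1; k >= 0; k-- {
		if v[k].Rev > at {
			continue
		}
		if _, ok := v[k].Blocks[blk]; ok || v[k].Epoch {
			return v[k].Rev
		}
	}
	return t.tail
}

// blocks builds a block set from block numbers.
func blocks(v ...int64) map[int64]struct{} {
	m := map[int64]struct{}{}
	for _, b := range v {
		m[b] = struct{}{}
	}
	return m
}

func main() {
	const file Oid = 1
	ft := NewToyFtail(10)
	ft.Update(20, map[Oid]DeltaFile{file: {Blocks: blocks(0, 3)}})
	ft.Update(30, map[Oid]DeltaFile{file: {Blocks: blocks(3)}})
	fmt.Println(len(ft.SliceByFileRev(file, 10, 20)), ft.BlkRevAt(file, 0, 30), ft.BlkRevAt(file, 3, 25))
}
```

The half-open ranges are the part worth noting: both slice bounds are found
with the same "first entry with Rev > x" search, which yields (lo, hi]
directly.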