Kirill Smelkov authored
ΔFtail builds on ΔBtail and provides ZBigFile-level history that WCFS
will use to compute which blocks of a ZBigFile need to be invalidated in
OS file cache given raw ZODB changes on ZODB invalidation message.

It also will be used by WCFS to implement isolation protocol, where on
every FUSE READ request WCFS will query ΔFtail to find out revision of
corresponding file block.

Quoting ΔFtail documentation:

---- 8< ----

ΔFtail provides ZBigFile-level history tail.

It translates ZODB object-level changes to information about which blocks of
which ZBigFile were modified, and provides service to query that information.

ΔFtail class documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~

ΔFtail represents tail of revisional changes to files.

It semantically consists of

    []δF            ; rev ∈ (tail, head]

where δF represents a change in files space

    δF:
        .rev↑
        {} file -> {}blk | EPOCH

Only files and blocks explicitly requested to be tracked are guaranteed to
be present. In particular a block that was not explicitly requested to be
tracked, even if it was changed in δZ, is not guaranteed to be present in δF.

After file epoch (file creation, deletion, or any other change to file
object) previous track requests for that file become forgotten and have no
further effect.

ΔFtail provides the following operations:

    .Track(file, blk, path, zblk)            - add file and block reached via BTree path to tracked set.

    .Update(δZ) -> δF                        - update files δ tail given raw ZODB changes
    .ForgetPast(revCut)                      - forget changes ≤ revCut
    .SliceByRev(lo, hi) -> []δF              - query for all files changes with rev ∈ (lo, hi]
    .SliceByFileRev(file, lo, hi) -> []δfile - query for changes of a file with rev ∈ (lo, hi]
    .BlkRevAt(file, #blk, at) -> blkrev      - query for what is last revision that changed
                                               file[#blk] as of @at database state.

where δfile represents a change to one file

    δfile:
        .rev↑
        {}blk | EPOCH

See also zodb.ΔTail and xbtree.ΔBtail


Concurrency

ΔFtail is safe to use in single-writer / multiple-readers mode.
That is at any time there should be either only sole writer, or, potentially
several simultaneous readers. The table below classifies operations:

    Writers:  Update, ForgetPast
    Readers:  Track + all queries (SliceByRev, SliceByFileRev, BlkRevAt)

Note that, in particular, it is correct to run multiple Track and queries
requests simultaneously.

ΔFtail organization
~~~~~~~~~~~~~~~~~~~

ΔFtail leverages:

    - ΔBtail to track changes to ZBigFile.blktab BTree, and
    - ΔZtail to track changes to ZBlk objects and to ZBigFile object itself.

then every query merges ΔBtail and ΔZtail data on the fly to provide
ZBigFile-level result.

Merging on the fly, contrary to computing and maintaining vδF data, is done
to avoid complexity of recomputing vδF when tracking set changes. Most of
ΔFtail complexity is, thus, located in ΔBtail, which implements BTree diff
and handles complexity of recomputing vδB when set of tracked blocks
changes after new track requests.

Changes to ZBigFile object indicate epochs. Epochs could be:

    - file creation or deletion,
    - change of ZBigFile.blksize,
    - change of ZBigFile.blktab to point to another BTree.

Epochs represent major changes to file history where file is assumed to
change so dramatically, that practically it can be considered to be a
"whole" change. In particular, WCFS, upon seeing a ZBigFile epoch,
invalidates all data in corresponding OS-level cache for the file.

The only historical data, that ΔFtail maintains by itself, is history of
epochs. That history does not need to be recomputed when more blocks become
tracked and is thus easy to maintain. It also can be maintained only in
ΔFtail because ΔBtail and ΔZtail does not "know" anything about ZBigFile.


Concurrency

In order to allow multiple Track and queries requests to be served in
parallel, ΔFtail bases its concurrency promise on ΔBtail guarantees +
snapshot-style access for vδE and ztrackInBlk in queries:

1. Track calls ΔBtail.Track and quickly updates .byFile, .byRoot and
   _RootTrack indices under a lock.

2. BlkRevAt queries ΔBtail.GetAt and then combines retrieved information
   about zblk with vδE and δZ.

3. SliceByFileRev queries ΔBtail.SliceByRootRev and then merges retrieved
   vδT data with vδZ, vδE and ztrackInBlk.

4. In queries vδE is retrieved/built in snapshot style similarly to how vδT
   is built in ΔBtail. Note that vδE needs to be built only the first time,
   and does not need to be further rebuilt, so the logic in ΔFtail is simpler
   compared to ΔBtail.

5. for ztrackInBlk - that is used by SliceByFileRev query - an atomic
   snapshot is retrieved for objects of interest. This allows to hold
   δFtail.mu lock for relatively brief time without blocking other parallel
   Track/queries requests for long.

Combined this organization allows non-overlapping queries/track-requests
to run simultaneously. (This property is essential to WCFS because otherwise
WCFS would not be able to serve several non-overlapping READ requests to one
file in parallel.)

See also "Concurrency" in ΔBtail organization for more details.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some preliminary history:

ef74aebc X ΔFtail: Keep reference to ZBigFile via Oid, not via *ZBigFile
bf9a7405 X No longer rely on ZODB cache invariant for invalidations
46340069 X found by Random
e7b598c6 X start of ΔFtail.SliceByFileRev rework to function via merging δB and δZ histories on the fly
59c83009 X ΔFtail.SliceByFileRoot tests started to work draftly after "on-the-fly" rework
210e9b07 X Fix ΔBtail.SliceByRootRev (lo,hi] handling
bf3ace66 X ΔFtail: Rebuild vδE after first track
46624787 X ΔFtail: `go test -failfast -short -v -run Random -randseed=1626793016249041295` discovered problems
786dd336 X Size no longer tracks [0,∞) since we start tracking when zfile is non-empty
4f707117 X test that shows problem of SliceByRootRev where untracked blocks are not added uniformly into whole history
c0b7e4c3 X ΔFtail.SliceByFileRev: Fix untracked entries to be present uniformly in result
aac37c11 X zdata: Introduce T to start removing duplication in tests
bf411aa9 X zdata: Deduplicate zfile loading
b74dda09 X Start switching Track from Track(key) to Track(keycov)
aa0288ce X Switch SliceByRootRev to vδTSnapForTracked
588a512a X zdata: Switch SliceByFileRev not to clone Zinblk
8b5d8523 X Move tracking of which blocks were accessed from wcfs to ΔFtail
30f5ddc7 ΔFtail += .Epoch in δf
22f5f096 X Rework ΔFtail so that BlkRevAt works with ZBigFile checkout from any at ∈ (tail, head]
0853cc9f X ΔFtail + tests
124688f9 X ΔFtail fixes
d85bb82c ΔFtail concurrency
f980471f