Commit bf0c04bd authored by Kirill Smelkov's avatar Kirill Smelkov

.

parent f38caef7
......@@ -6,6 +6,30 @@ This file contains notes additional to usage documentation and internal
organization overview in wcfs.go .
Notes on OS pagecache control
=============================
The cache of snapshotted bigfile can be pre-made hot, if invalidated region
was already in pagecache of head/bigfile/file:
- we can retrieve a region from pagecache of head/file with FUSE_NOTIFY_RETRIEVE.
- we can store that retrieved data into pagecache region of @<revX>/ with FUSE_NOTIFY_STORE.
- we can invalidate a region from pagecache of head/file with FUSE_NOTIFY_INVAL_INODE.
we have to disable FUSE_AUTO_INVAL_DATA to tell the kernel we are fully
responsible for invalidating pagecache. If we don't, the kernel will be
clearing whole cache of head/file on e.g. its mtime change.
XXX FUSE_AUTO_INVAL_DATA does not fully prevent kernel from automatically
invalidating pagecache - e.g. it will invalidate whole cache on file size changes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=e0bc833d10#n233
we can currently workaround it with using writeback mode (see !is_wb in the
link above), but better we have proper FUSE flag for filesystem server to
tell the kernel it is fully responsible for invalidating pagecache.
Invalidations to wcfs clients are delayed until block access
============================================================
......@@ -116,3 +140,51 @@ to wcfs, and the above shows it won't work - trying to ptrace the
client from under wcfs will just block forever (the kernel will be
waiting for read operation to finish for ptrace, and read will be first
waiting on ptrace stopping to complete = deadlock)
δ(BTree) notes (XXX -> btreediff package)
=========================================
input: BTree, (@new, []oid) -> find out δ(BTree) i.e. {-k(v), +k'(v'), ...}
- oid ∈ Bucket
- oid ∈ BTree
Bucket:
old = {k -> v}
new = {k' -> v'}
Δ = -k(v), +k(v), ...
=> for all buckets
Δ accumulates to []δk(v)[n+,n-] n+ ∈ {0,1}, n- ∈ {0,1}, if n+=n- - cancel
BTree:
old = {k -> B} or {k -> T}
new = {k' -> B'} or {k' -> T'}
Δ = -k(B), +k(B), -k(T), +K(T), ...
we translate (in top-down order):
k(B) -> {} of k(v)
k(T) -> {} of k(B) -> {} of k(v)
which gives
Δ = k(v), +k(v), ...
i.e. exactly as for buckets and it accumulates to global Δ.
The globally-accumulated Δ is the answer for δ(BTree, (@new, []oid))
XXX -> internal/btreediff ?
δ(BTree) in wcfs context:
. -k(blk) -> invalidate #blk
. +k(blk) -> invalidate #blk (e.g. if blk was previously read as hold)
......@@ -277,7 +277,7 @@ package main
//
// 4.4) for all file/blk to invalidate we do:
//
// - try to retrieve head/bigfile/file[blk] from OS file cache;
// - try to retrieve head/bigfile/file[blk] from OS file cache(*);
// - if retrieved successfully -> store retrieved data back into OS file
// cache for @<rev>/bigfile/file[blk], where
//
......@@ -290,7 +290,7 @@ package main
// won't be served from OS file cache and instead will trigger a FUSE read
// request to wcfs.
//
// 4.5) no invalidation messages are sent to wcfs clients at this point(*).
// 4.5) no invalidation messages are sent to wcfs clients at this point(+).
//
// 4.6) processing ZODB invalidations and serving file reads (see 7) are
// organized to be mutually exclusive.
......@@ -350,7 +350,7 @@ package main
// remmapping is done via "invalidation protocol" exchange with client.
// ( one could imagine adjusting mappings synchronously via running
// wcfs-trusted code via ptrace that wcfs injects into clients, but ptrace
// won't work when client thread is blocked under pagefault or syscall(~) )
// won't work when client thread is blocked under pagefault or syscall(^) )
//
// in order to support remmapping for each head/bigfile/file
//
......@@ -362,11 +362,12 @@ package main
//
// Thus a client that wants latest data on pagefault will get latest data,
// and a client that wants @rev data will get @rev data, even if it was this
// "old" client that triggered the pagefault(+).
// "old" client that triggered the pagefault(~).
//
// (*) see "Invalidations to wcfs clients are delayed until block access" in notes.txt
// (+) see "Changing mmapping while under pagefault is possible" in notes.txt
// (~) see "Client cannot be ptraced while under pagefault" in notes.txt
// (*) see notes.txt -> "Notes on OS pagecache control"
// (+) see notes.txt -> "Invalidations to wcfs clients are delayed until block access"
// (~) see notes.txt -> "Changing mmapping while under pagefault is possible"
// (^) see notes.txt -> "Client cannot be ptraced while under pagefault"
//
//
// XXX 8) serving read from @<rev>/data + zconn(s) for historical state
......@@ -375,80 +376,6 @@ package main
//
// XXX(integrate place=?) ZData - no need to keep track -> ZBlk1 is always
// marked as changed on blk data change.
//
// ----------------------------------------
//
// δ(BTree) notes
//
//
// input: BTree, (@new, []oid) -> find out δ(BTree) i.e. {-k(v), +k'(v'), ...}
//
// - oid ∈ Bucket
// - oid ∈ BTree
//
// Bucket:
//
// old = {k -> v}
// new = {k' -> v'}
//
// Δ = -k(v), +k(v), ...
//
// => for all buckets
//
// Δ accumulates to []δk(v)[n+,n-] n+ ∈ {0,1}, n- ∈ {0,1}, if n+=n- - cancel
//
//
// BTree:
//
// old = {k -> B} or {k -> T}
// new = {k' -> B'} or {k' -> T'}
//
// Δ = -k(B), +k(B), -k(T), +K(T), ...
//
// we translate (in top-down order):
//
// k(B) -> {} of k(v)
// k(T) -> {} of k(B) -> {} of k(v)
//
// which gives
//
// Δ = k(v), +k(v), ...
//
// i.e. exactly as for buckets and it accumulates to global Δ.
//
// The globally-accumulated Δ is the answer for δ(BTree, (@new, []oid))
//
// XXX -> internal/btreediff ?
//
// δ(BTree) in wcfs context:
//
// . -k(blk) -> invalidate #blk
// . +k(blk) -> invalidate #blk (e.g. if blk was previously read as hold)
//
//
// ----------------------------------------
//
// Notes on OS pagecache control:
//
// the cache of snapshotted bigfile can be pre-made hot, if invalidated region
// was already in pagecache of head/data:
//
// - we can retrieve a region from pagecache of head/data with FUSE_NOTIFY_RETRIEVE.
// - we can store that retrieved data into pagecache region of @<tidX>/ with FUSE_NOTIFY_STORE.
// - we can invalidate a region from pagecache of head/data with FUSE_NOTIFY_INVAL_INODE.
//
// we have to disable FUSE_AUTO_INVAL_DATA to tell the kernel we are fully
// responsible for invalidating pagecache. If we don't, the kernel will be
// clearing whole cache of head/data on e.g. its mtime change.
//
// XXX FUSE_AUTO_INVAL_DATA does not fully prevent kernel from automatically
// invalidating pagecache - e.g. it will invalidate whole cache on file size changes:
//
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=e0bc833d10#n233
//
// we can currently workaround it with using writeback mode (see !is_wb in the
// link above), but better we have proper FUSE flag for filesystem server to
// tell the kernel it is fully responsible for invalidating pagecache.
import (
"context"
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment