Commit bf0c04bd authored by Kirill Smelkov

.

parent f38caef7
@@ -6,6 +6,30 @@ This file contains notes additional to usage documentation and internal
 organization overview in wcfs.go .
+Notes on OS pagecache control
+=============================
+
+The cache of snapshotted bigfile can be pre-made hot, if the invalidated region
+was already in pagecache of head/bigfile/file:
+
+- we can retrieve a region from pagecache of head/file with FUSE_NOTIFY_RETRIEVE.
+- we can store that retrieved data into pagecache region of @<revX>/ with FUSE_NOTIFY_STORE.
+- we can invalidate a region from pagecache of head/file with FUSE_NOTIFY_INVAL_INODE.
+
+we have to disable FUSE_AUTO_INVAL_DATA to tell the kernel we are fully
+responsible for invalidating pagecache. If we don't, the kernel will
+clear the whole cache of head/file on e.g. its mtime change.
+
+XXX disabling FUSE_AUTO_INVAL_DATA does not fully prevent the kernel from
+automatically invalidating pagecache - e.g. it will invalidate the whole cache
+on file size changes:
+
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=e0bc833d10#n233
+
+we can currently work around it by using writeback mode (see !is_wb in the
+link above), but it would be better to have a proper FUSE flag with which the
+filesystem server tells the kernel that the server is fully responsible for
+invalidating pagecache.
+
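Note: the retrieve -> store -> invalidate sequence listed above can be sketched in Go as follows. This is only a minimal illustration under stated assumptions: `pageCache`, its methods and `invalidateBlk` are hypothetical names, and the in-memory map merely stands in for the kernel pagecache. A real server would issue FUSE_NOTIFY_RETRIEVE, FUSE_NOTIFY_STORE and FUSE_NOTIFY_INVAL_INODE through its FUSE library instead.

```go
package main

import "fmt"

// pageCache is a hypothetical in-memory stand-in for the kernel pagecache,
// keyed by file path and block number.
type pageCache map[string]map[int64][]byte

// retrieve models FUSE_NOTIFY_RETRIEVE: it returns the cached block, if any.
func (c pageCache) retrieve(file string, blk int64) ([]byte, bool) {
	data, ok := c[file][blk]
	return data, ok
}

// store models FUSE_NOTIFY_STORE: it puts data into the cache of file.
func (c pageCache) store(file string, blk int64, data []byte) {
	if c[file] == nil {
		c[file] = map[int64][]byte{}
	}
	c[file][blk] = data
}

// invalidate models FUSE_NOTIFY_INVAL_INODE for a single block.
func (c pageCache) invalidate(file string, blk int64) {
	delete(c[file], blk)
}

// invalidateBlk pre-heats the cache of the snapshot file rev with the block
// that is about to be dropped from the cache of head. It reports whether the
// block was cached (and thus copied).
func invalidateBlk(c pageCache, head, rev string, blk int64) bool {
	data, ok := c.retrieve(head, blk)
	if ok {
		c.store(rev, blk, data)
	}
	c.invalidate(head, blk)
	return ok
}

func main() {
	c := pageCache{}
	c.store("head/bigfile/file", 2, []byte("old data"))

	copied := invalidateBlk(c, "head/bigfile/file", "@rev1/bigfile/file", 2)
	_, stillCached := c.retrieve("head/bigfile/file", 2)
	fmt.Println(copied, stillCached) // prints: true false
}
```

If the block was not in the head cache, nothing is stored and the first later read of the snapshot simply goes to the filesystem server.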
 Invalidations to wcfs clients are delayed until block access
 ============================================================
@@ -116,3 +140,51 @@ to wcfs, and the above shows it won't work - trying to ptrace the
 client from under wcfs will just block forever (the kernel will be
 waiting for read operation to finish for ptrace, and read will be first
 waiting on ptrace stopping to complete = deadlock)
+δ(BTree) notes (XXX -> btreediff package)
+=========================================
+
+input: BTree, (@new, []oid) -> find out δ(BTree) i.e. {-k(v), +k'(v'), ...}
+
+- oid ∈ Bucket
+- oid ∈ BTree
+
+Bucket:
+
+old = {k -> v}
+new = {k' -> v'}
+
+Δ = -k(v), +k(v), ...
+
+=> for all buckets
+
+Δ accumulates to []δk(v)[n+,n-] n+ ∈ {0,1}, n- ∈ {0,1}, if n+ = n- -> cancel
+
+
+BTree:
+
+old = {k -> B} or {k -> T}
+new = {k' -> B'} or {k' -> T'}
+
+Δ = -k(B), +k(B), -k(T), +k(T), ...
+
+we translate (in top-down order):
+
+k(B) -> {} of k(v)
+k(T) -> {} of k(B) -> {} of k(v)
+
+which gives
+
+Δ = -k(v), +k(v), ...
+
+i.e. exactly as for buckets and it accumulates to global Δ.
+
+The globally-accumulated Δ is the answer for δ(BTree, (@new, []oid))
+
+XXX -> internal/btreediff ?
+
+δ(BTree) in wcfs context:
+
+. -k(blk) -> invalidate #blk
+. +k(blk) -> invalidate #blk (e.g. if blk was previously read as hole)
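Note: the bucket-level part of the scheme above (per-bucket Δ accumulated into a global Δ with cancellation) can be sketched in Go as follows. This is an illustration under assumptions, not the btreediff implementation: `entry`, `delta`, `apply` and `diffBucket` are hypothetical names, values are plain strings, and the pair [n+,n-] is folded into a single counter n+ - n-, which is zero exactly when n+ = n-.

```go
package main

import "fmt"

// entry is one k(v) pair. Hypothetical type: keys are block numbers and
// values are plain strings only to keep the sketch short.
type entry struct {
	key   int64
	value string
}

// delta is the accumulated Δ: for every k(v) it keeps n+ - n-.
// Entries whose additions and removals are equal are dropped (they cancel).
type delta map[entry]int

// apply records one +k(v) (sign = +1) or one -k(v) (sign = -1).
func (d delta) apply(e entry, sign int) {
	d[e] += sign
	if d[e] == 0 {
		delete(d, e)
	}
}

// diffBucket adds to d the difference between the old and the current state
// of one bucket: -k(v) for pairs that disappeared, +k(v) for pairs that appeared.
func diffBucket(d delta, old, cur map[int64]string) {
	for k, v := range old {
		if nv, ok := cur[k]; !ok || nv != v {
			d.apply(entry{k, v}, -1)
		}
	}
	for k, v := range cur {
		if ov, ok := old[k]; !ok || ov != v {
			d.apply(entry{k, v}, +1)
		}
	}
}

func main() {
	d := delta{}
	// Key 2 moves from the first bucket to the second one; key 3 is new.
	diffBucket(d, map[int64]string{1: "a", 2: "b"}, map[int64]string{1: "a"})
	diffBucket(d, map[int64]string{}, map[int64]string{2: "b", 3: "c"})
	fmt.Println(d) // prints: map[{3 c}:1]
}
```

The -2(b) from the first bucket and the +2(b) from the second cancel, so in wcfs terms only block 3 would need invalidation in this example.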
@@ -277,7 +277,7 @@ package main
 //
 // 4.4) for all file/blk to invalidate we do:
 //
-// - try to retrieve head/bigfile/file[blk] from OS file cache;
+// - try to retrieve head/bigfile/file[blk] from OS file cache(*);
 // - if retrieved successfully -> store retrieved data back into OS file
 // cache for @<rev>/bigfile/file[blk], where
 //
@@ -290,7 +290,7 @@ package main
 // won't be served from OS file cache and instead will trigger a FUSE read
 // request to wcfs.
 //
-// 4.5) no invalidation messages are sent to wcfs clients at this point(*).
+// 4.5) no invalidation messages are sent to wcfs clients at this point(+).
 //
 // 4.6) processing ZODB invalidations and serving file reads (see 7) are
 // organized to be mutually exclusive.
@@ -350,7 +350,7 @@ package main
 // remmapping is done via "invalidation protocol" exchange with client.
 // ( one could imagine adjusting mappings synchronously via running
 // wcfs-trusted code via ptrace that wcfs injects into clients, but ptrace
-// won't work when client thread is blocked under pagefault or syscall(~) )
+// won't work when client thread is blocked under pagefault or syscall(^) )
 //
 // in order to support remmapping for each head/bigfile/file
 //
@@ -362,11 +362,12 @@ package main
 //
 // Thus a client that wants latest data on pagefault will get latest data,
 // and a client that wants @rev data will get @rev data, even if it was this
-// "old" client that triggered the pagefault(+).
+// "old" client that triggered the pagefault(~).
 //
-// (*) see "Invalidations to wcfs clients are delayed until block access" in notes.txt
-// (+) see "Changing mmapping while under pagefault is possible" in notes.txt
-// (~) see "Client cannot be ptraced while under pagefault" in notes.txt
+// (*) see notes.txt -> "Notes on OS pagecache control"
+// (+) see notes.txt -> "Invalidations to wcfs clients are delayed until block access"
+// (~) see notes.txt -> "Changing mmapping while under pagefault is possible"
+// (^) see notes.txt -> "Client cannot be ptraced while under pagefault"
 //
 //
 // XXX 8) serving read from @<rev>/data + zconn(s) for historical state
@@ -375,80 +376,6 @@ package main
 //
 // XXX(integrate place=?) ZData - no need to keep track -> ZBlk1 is always
 // marked as changed on blk data change.
-//
-// ----------------------------------------
-//
-// δ(BTree) notes
-//
-//
-// input: BTree, (@new, []oid) -> find out δ(BTree) i.e. {-k(v), +k'(v'), ...}
-//
-// - oid ∈ Bucket
-// - oid ∈ BTree
-//
-// Bucket:
-//
-// old = {k -> v}
-// new = {k' -> v'}
-//
-// Δ = -k(v), +k(v), ...
-//
-// => for all buckets
-//
-// Δ accumulates to []δk(v)[n+,n-] n+ ∈ {0,1}, n- ∈ {0,1}, if n+ = n- -> cancel
-//
-//
-// BTree:
-//
-// old = {k -> B} or {k -> T}
-// new = {k' -> B'} or {k' -> T'}
-//
-// Δ = -k(B), +k(B), -k(T), +k(T), ...
-//
-// we translate (in top-down order):
-//
-// k(B) -> {} of k(v)
-// k(T) -> {} of k(B) -> {} of k(v)
-//
-// which gives
-//
-// Δ = -k(v), +k(v), ...
-//
-// i.e. exactly as for buckets and it accumulates to global Δ.
-//
-// The globally-accumulated Δ is the answer for δ(BTree, (@new, []oid))
-//
-// XXX -> internal/btreediff ?
-//
-// δ(BTree) in wcfs context:
-//
-// . -k(blk) -> invalidate #blk
-// . +k(blk) -> invalidate #blk (e.g. if blk was previously read as hole)
-//
-//
-// ----------------------------------------
-//
-// Notes on OS pagecache control:
-//
-// the cache of snapshotted bigfile can be pre-made hot, if the invalidated region
-// was already in pagecache of head/data:
-//
-// - we can retrieve a region from pagecache of head/data with FUSE_NOTIFY_RETRIEVE.
-// - we can store that retrieved data into pagecache region of @<tidX>/ with FUSE_NOTIFY_STORE.
-// - we can invalidate a region from pagecache of head/data with FUSE_NOTIFY_INVAL_INODE.
-//
-// we have to disable FUSE_AUTO_INVAL_DATA to tell the kernel we are fully
-// responsible for invalidating pagecache. If we don't, the kernel will
-// clear the whole cache of head/data on e.g. its mtime change.
-//
-// XXX disabling FUSE_AUTO_INVAL_DATA does not fully prevent the kernel from
-// automatically invalidating pagecache - e.g. it will invalidate the whole cache
-// on file size changes:
-//
-// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=e0bc833d10#n233
-//
-// we can currently work around it by using writeback mode (see !is_wb in the
-// link above), but it would be better to have a proper FUSE flag with which the
-// filesystem server tells the kernel that the server is fully responsible for
-// invalidating pagecache.
 import (
 "context"
......