wendelin.core: commit bf0c04bd, authored Dec 25, 2018 by Kirill Smelkov
.
parent f38caef7
Showing 2 changed files with 80 additions and 81 deletions
wcfs/notes.txt +72 -0
wcfs/wcfs.go +8 -81
wcfs/notes.txt
...
...
@@ -6,6 +6,30 @@ This file contains notes additional to usage documentation and internal
organization overview in wcfs.go .
Notes on OS pagecache control
=============================
The cache of snapshotted bigfile can be pre-made hot, if invalidated region
was already in pagecache of head/bigfile/file:
- we can retrieve a region from pagecache of head/file with FUSE_NOTIFY_RETRIEVE.
- we can store that retrieved data into pagecache region of @<revX>/ with FUSE_NOTIFY_STORE.
- we can invalidate a region from pagecache of head/file with FUSE_NOTIFY_INVAL_INODE.
we have to disable FUSE_AUTO_INVAL_DATA to tell the kernel we are fully
responsible for invalidating pagecache. If we don't, the kernel will be
clearing whole cache of head/file on e.g. its mtime change.
XXX FUSE_AUTO_INVAL_DATA does not fully prevent kernel from automatically
invalidating pagecache - e.g. it will invalidate whole cache on file size changes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=e0bc833d10#n233
we can currently workaround it with using writeback mode (see !is_wb in the
link above), but better we have proper FUSE flag for filesystem server to
tell the kernel it is fully responsible for invalidating pagecache.
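The retrieve / store / invalidate sequence described above can be sketched as follows. This is only a toy model under stated assumptions: the `pageCache` type and its `retrieve`, `store` and `invalidate` methods are hypothetical stand-ins for the kernel pagecache and for FUSE_NOTIFY_RETRIEVE, FUSE_NOTIFY_STORE and FUSE_NOTIFY_INVAL_INODE; no real FUSE library is used, and this is not the wcfs implementation.

```go
package main

import "fmt"

// pageCache is a hypothetical in-memory stand-in for the kernel pagecache
// of one file: block number -> cached block data.
type pageCache map[int64][]byte

// retrieve models FUSE_NOTIFY_RETRIEVE: it returns the cached data of blk,
// if that block is present in the cache.
func (c pageCache) retrieve(blk int64) ([]byte, bool) {
	data, ok := c[blk]
	return data, ok
}

// store models FUSE_NOTIFY_STORE: it puts a copy of data into the cache.
func (c pageCache) store(blk int64, data []byte) {
	c[blk] = append([]byte(nil), data...)
}

// invalidate models FUSE_NOTIFY_INVAL_INODE restricted to one block.
func (c pageCache) invalidate(blk int64) {
	delete(c, blk)
}

// invalidateBlk pre-warms the cache of the snapshot file (rev) with whatever
// head had cached for blk, and then drops blk from the cache of head.
// It reports whether the snapshot cache was pre-warmed.
func invalidateBlk(head, rev pageCache, blk int64) (prewarmed bool) {
	if data, ok := head.retrieve(blk); ok {
		rev.store(blk, data)
		prewarmed = true
	}
	head.invalidate(blk)
	return prewarmed
}

func main() {
	head := pageCache{1: []byte("old data")}
	rev := pageCache{}

	// blk 1 was cached in head: its data moves to the snapshot cache.
	fmt.Println(invalidateBlk(head, rev, 1), string(rev[1]), len(head))
	// blk 2 was never cached: nothing to pre-warm.
	fmt.Println(invalidateBlk(head, rev, 2), len(rev))
}
```

In this model the first call prints `true old data 0` and the second prints `false 1`: only blocks that were already hot in head end up hot in the snapshot.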
Invalidations to wcfs clients are delayed until block access
============================================================
...
...
@@ -116,3 +140,51 @@ to wcfs, and the above shows it won't work - trying to ptrace the
client from under wcfs will just block forever (the kernel will be
waiting for read operation to finish for ptrace, and read will be first
waiting on ptrace stopping to complete = deadlock)
δ(BTree) notes (XXX -> btreediff package)
=========================================
input: BTree, (@new, []oid) -> find out δ(BTree) i.e. {-k(v), +k'(v'), ...}
- oid ∈ Bucket
- oid ∈ BTree
Bucket:
old = {k -> v}
new = {k' -> v'}
Δ = -k(v), +k(v), ...
=> for all buckets
Δ accumulates to []δk(v)[n+,n-] n+ ∈ {0,1}, n- ∈ {0,1}, if n+=n- - cancel
BTree:
old = {k -> B} or {k -> T}
new = {k' -> B'} or {k' -> T'}
Δ = -k(B), +k(B), -k(T), +K(T), ...
we translate (in top-down order):
k(B) -> {} of k(v)
k(T) -> {} of k(B) -> {} of k(v)
which gives
Δ = k(v), +k(v), ...
i.e. exactly as for buckets and it accumulates to global Δ.
The globally-accumulated Δ is the answer for δ(BTree, (@new, []oid))
XXX -> internal/btreediff ?
δ(BTree) in wcfs context:
. -k(blk) -> invalidate #blk
. +k(blk) -> invalidate #blk (e.g. if blk was previously read as hold)
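The accumulation rule from the notes above can be illustrated with a small sketch. It assumes that BTree nodes were already translated top-down into their buckets, so only bucket-level old/new pairs are diffed. All names here (`Key`, `Value`, `Delta`, `diffBucket`, `blocksToInvalidate`) are hypothetical and are not taken from wcfs or from a btreediff package.

```go
package main

import (
	"fmt"
	"sort"
)

// Key and Value are hypothetical stand-ins for BTree keys (block numbers
// in the wcfs context) and values.
type Key int64
type Value string

// kv identifies one k(v) entry.
type kv struct {
	k Key
	v Value
}

// counts is the [n+,n-] pair for one k(v); both n+ and n- are in {0,1}.
type counts struct{ add, del bool }

// Delta is the globally accumulated Δ: k(v) -> [n+,n-].
type Delta map[kv]counts

// diffBucket records -k(v) for every entry of the old bucket state and
// +k'(v') for every entry of the new bucket state.
func (d Delta) diffBucket(old, cur map[Key]Value) {
	for k, v := range old {
		c := d[kv{k, v}]
		c.del = true
		d[kv{k, v}] = c
	}
	for k, v := range cur {
		c := d[kv{k, v}]
		c.add = true
		d[kv{k, v}] = c
	}
}

// blocksToInvalidate returns, in sorted order, the keys that have at least
// one surviving -k(v) or +k(v). Entries with n+ = n- cancel out.
func blocksToInvalidate(d Delta) []Key {
	seen := map[Key]bool{}
	for e, c := range d {
		if c.add != c.del {
			seen[e.k] = true
		}
	}
	blks := make([]Key, 0, len(seen))
	for k := range seen {
		blks = append(blks, k)
	}
	sort.Slice(blks, func(i, j int) bool { return blks[i] < blks[j] })
	return blks
}

func main() {
	d := Delta{}
	// Bucket 1 lost key 2; bucket 2 gained key 2 with the same value and
	// changed the value of key 3.
	d.diffBucket(map[Key]Value{1: "a", 2: "b"}, map[Key]Value{1: "a"})
	d.diffBucket(map[Key]Value{3: "c"}, map[Key]Value{2: "b", 3: "d"})
	fmt.Println(blocksToInvalidate(d))
}
```

Here 1(a) cancels inside the first bucket, 2(b) cancels across the two buckets because it only moved, and only key 3 keeps an unbalanced -3(c) and +3(d), so the program prints `[3]`.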
wcfs/wcfs.go
...
...
@@ -277,7 +277,7 @@ package main
//
// 4.4) for all file/blk to invalidate we do:
//
-// - try to retrieve head/bigfile/file[blk] from OS file cache;
+// - try to retrieve head/bigfile/file[blk] from OS file cache(*);
// - if retrieved successfully -> store retrieved data back into OS file
// cache for @<rev>/bigfile/file[blk], where
//
...
...
@@ -290,7 +290,7 @@ package main
// won't be served from OS file cache and instead will trigger a FUSE read
// request to wcfs.
//
-// 4.5) no invalidation messages are sent to wcfs clients at this point(*).
+// 4.5) no invalidation messages are sent to wcfs clients at this point(+).
//
// 4.6) processing ZODB invalidations and serving file reads (see 7) are
// organized to be mutually exclusive.
...
...
@@ -350,7 +350,7 @@ package main
// remmapping is done via "invalidation protocol" exchange with client.
// ( one could imagine adjusting mappings synchronously via running
// wcfs-trusted code via ptrace that wcfs injects into clients, but ptrace
-// won't work when client thread is blocked under pagefault or syscall(~) )
+// won't work when client thread is blocked under pagefault or syscall(^) )
//
// in order to support remmapping for each head/bigfile/file
//
...
...
@@ -362,11 +362,12 @@ package main
//
// Thus a client that wants latest data on pagefault will get latest data,
// and a client that wants @rev data will get @rev data, even if it was this
-// "old" client that triggered the pagefault(+).
+// "old" client that triggered the pagefault(~).
//
-// (*) see "Invalidations to wcfs clients are delayed until block access" in notes.txt
-// (+) see "Changing mmapping while under pagefault is possible" in notes.txt
-// (~) see "Client cannot be ptraced while under pagefault" in notes.txt
+// (*) see notes.txt -> "Notes on OS pagecache control"
+// (+) see notes.txt -> "Invalidations to wcfs clients are delayed until block access"
+// (~) see notes.txt -> "Changing mmapping while under pagefault is possible"
+// (^) see notes.txt -> "Client cannot be ptraced while under pagefault"
//
//
// XXX 8) serving read from @<rev>/data + zconn(s) for historical state
...
...
@@ -375,80 +376,6 @@ package main
//
// XXX(integrate place=?) ZData - no need to keep track -> ZBlk1 is always
// marked as changed on blk data change.
//
// ----------------------------------------
//
// δ(BTree) notes
//
//
// input: BTree, (@new, []oid) -> find out δ(BTree) i.e. {-k(v), +k'(v'), ...}
//
// - oid ∈ Bucket
// - oid ∈ BTree
//
// Bucket:
//
// old = {k -> v}
// new = {k' -> v'}
//
// Δ = -k(v), +k(v), ...
//
// => for all buckets
//
// Δ accumulates to []δk(v)[n+,n-] n+ ∈ {0,1}, n- ∈ {0,1}, if n+=n- - cancel
//
//
// BTree:
//
// old = {k -> B} or {k -> T}
// new = {k' -> B'} or {k' -> T'}
//
// Δ = -k(B), +k(B), -k(T), +K(T), ...
//
// we translate (in top-down order):
//
// k(B) -> {} of k(v)
// k(T) -> {} of k(B) -> {} of k(v)
//
// which gives
//
// Δ = k(v), +k(v), ...
//
// i.e. exactly as for buckets and it accumulates to global Δ.
//
// The globally-accumulated Δ is the answer for δ(BTree, (@new, []oid))
//
// XXX -> internal/btreediff ?
//
// δ(BTree) in wcfs context:
//
// . -k(blk) -> invalidate #blk
// . +k(blk) -> invalidate #blk (e.g. if blk was previously read as hold)
//
//
// ----------------------------------------
//
// Notes on OS pagecache control:
//
// the cache of snapshotted bigfile can be pre-made hot, if invalidated region
// was already in pagecache of head/data:
//
// - we can retrieve a region from pagecache of head/data with FUSE_NOTIFY_RETRIEVE.
// - we can store that retrieved data into pagecache region of @<tidX>/ with FUSE_NOTIFY_STORE.
// - we can invalidate a region from pagecache of head/data with FUSE_NOTIFY_INVAL_INODE.
//
// we have to disable FUSE_AUTO_INVAL_DATA to tell the kernel we are fully
// responsible for invalidating pagecache. If we don't, the kernel will be
// clearing whole cache of head/data on e.g. its mtime change.
//
// XXX FUSE_AUTO_INVAL_DATA does not fully prevent kernel from automatically
// invalidating pagecache - e.g. it will invalidate whole cache on file size changes:
//
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=e0bc833d10#n233
//
// we can currently workaround it with using writeback mode (see !is_wb in the
// link above), but better we have proper FUSE flag for filesystem server to
// tell the kernel it is fully responsible for invalidating pagecache.
import
(
"context"
...
...