wendelin.core
Commit c40f3831 authored Dec 24, 2018 by Kirill Smelkov
.
parent 901e9fc1
Showing 1 changed file with 87 additions and 80 deletions
wcfs/wcfs.go
@@ -28,7 +28,7 @@
 // file that represents whole ZBigFile's data.
 //
 // For a client, the primary way to access a bigfile should be to mmap
-// bigfile/<bigfileX>/head/data which represents always latest bigfile data.
+// head/bigfile/<bigfileX> which represents always latest bigfile data.
 // Clients that want to get isolation guarantee should subscribe for
 // invalidations and re-mmap invalidated regions to file with pinned bigfile revision for
 // the duration of their transaction. See "Invalidation protocol" for details.
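
The path change in this hunk can be sketched as a pair of Go helpers. This is only an illustration, not code from wcfs: the names `Oid`, `Tid`, `headPath` and `revPath` are hypothetical, the `%016x` formatting of object ids is taken from the comment text that this commit removes, and using the same formatting for revisions is an assumption.

```go
package main

import "fmt"

// Oid stands in for a ZODB object id (hypothetical type for this sketch).
type Oid uint64

// Tid stands in for a ZODB transaction id, i.e. a revision (hypothetical type).
type Tid uint64

// headPath returns the path, relative to the wcfs mountpoint, of the file
// that always represents the latest data of a bigfile: head/bigfile/<oid>.
func headPath(oid Oid) string {
	return fmt.Sprintf("head/bigfile/%016x", uint64(oid))
}

// revPath returns the path of the same bigfile pinned at revision rev:
// @<rev>/bigfile/<oid>. Formatting rev with %016x is an assumption.
func revPath(rev Tid, oid Oid) string {
	return fmt.Sprintf("@%016x/bigfile/%016x", uint64(rev), uint64(oid))
}

func main() {
	fmt.Println(headPath(0x1))
	fmt.Println(revPath(0x03cf7850500b5f66, 0x1))
}
```

Before this commit the equivalent paths would have been bigfile/<bigfileX>/head/data and bigfile/<bigfileX>/@<tidX>/data, so the revision now selects a whole database view rather than a view of a single file.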
@@ -42,119 +42,125 @@
 //
 // Top-level structure of provided filesystem is as follows:
 //
-//    bigfile/
-//        <oid(bigfile1)>/
+//    head/            ; latest database view
 //        ...
-//        <oid(bigfile2)>/
+//    @<rev1>/         ; database view as of revision <revX>
 //        ...
-//        ...
-//
-// where for a bigfileX there is bigfile/<oid(bigfileX)>/ directory, with
-// oid(bigfileX) being ZODB object-id of corresponding ZBigFile object formatted with %016x.
-//
-// Each bigfileX/ has the following structure:
-//
-//    bigfile/<bigfileX>/
-//        head/        ; latest bigfile revision
-//            ...
-//        @<tid1>/     ; bigfile revision as of transaction <tidX>
-//            ...
-//        @<tid2>/
+//    @<rev2>/
 //        ...
 //    ...
 //
-// where head/ represents latest bigfile as stored in upstream ZODB, and
-// @<tidX>/ represents bigfile as of transaction <tidX>.
+// where head/ represents latest data as stored in upstream ZODB, and
+// @<revX>/ represents data as of revision <revX>.
 //
 // head/ has the following structure:
 //
-//    bigfile/<bigfileX>/head/
-//        data             ; latest bigfile data
-//        at               ; data is bigfile view as of this ZODB transaction
-//        invalidations    ; channel that describes invalidated data regions
+//    head/
+//        at               ; data inside head/ is as of this ZODB transaction
+//        watch            ; channel for bigfile invalidations
+//        bigfile/         ; bigfiles' data
+//            <oid(bigfile1)>
+//            <oid(bigfile2)>
+//            ...
 //
-// where /data represents latest bigfile data as stored in upstream ZODB. As
-// there can be some lag receiving updates from the database, /at describes
-// precisely ZODB state for which bigfile data is currently exposed. Whenever
-// bigfile data is changed in upstream ZODB, information about the changes is
-// first propagated to /invalidations, and only after that /data is
-// updated. See "Invalidation protocol" for details.
+// where /bigfile/<bigfileX> represents latest bigfile data as stored in
+// upstream ZODB. As there can be some lag receiving updates from the database,
+// /at describes precisely ZODB state for which bigfile data is currently
+// exposed. Whenever bigfile data is changed in upstream ZODB, information
+// about the changes is first propagated to /watch, and only after that
+// /bigfile/<bigfileX> is updated. See "Invalidation protocol" for details.
 //
-// @<tidX>/ has the following structure:
+// @<revX>/ has the following structure:
 //
-//    bigfile/<bigfileX>/@<tidX>/
-//        data             ; bigfile data as of transaction <tidX>
+//    @<revX>/
+//        at
+//        bigfile/         ; bigfiles' data as of revision <revX>
+//            <oid(bigfile1)>
+//            <oid(bigfile2)>
+//            ...
 //
-// where /data represents bigfile data as of transaction <tidX>.
+// where /bigfile/<bigfileX> represent bigfile data as of revision <revX>.
 //
-// bigfile/<bigfileX>/ should be created by client via mkdir. Unless explicitly
-// created bigfile/<bigfileX>/ are not automatically visible in wcfs
-// filesystem. Similarly bigfile/<bigfileX>/@<tidX>/ should be too created by
-// client.
+// Unless accessed {head,@<revX>}/bigfile/<bigfileX> are not automatically visible in
+// wcfs filesystem. Similarly @<revX>/ should be explicitly created by client via mkdir.
 //
 //
 // Invalidation protocol
 //
-// XXX invalidations will be done via ptrace because we need them to be
-// synchronous (see "wcfs organization")
-//
-// In order to support isolation wcfs implements invalidation protocol that
+// In order to support isolation, wcfs implements invalidation protocol that
 // must be cooperatively followed by both wcfs and client.
 //
-// First, before client wants to mmap bigfile, it opens
-// bigfile/<bigfileX>/head/invalidations and tells wcfs through it for which
-// ZODB state it wants to get bigfile view. The server in turn reports for
-// which ZODB state head/data is current, δ describing changed bigfile region
-// between those revisions, or "wait" flag if server state is earlier compared
-// to what client wants:
+// First, client mmaps latest bigfile, but does not access it
 //
-//    C: want <Cat>
-//    S: have <Sat>, wait             ; Sat < Cat
-//    S: have <Sat>, δR(Cat,Sat)      ; Sat ≥ Cat
+//    mmap(head/bigfile/<bigfileX>)
 //
-// If server reply was "wait" the client does nothing and waits for next server
-// message which must come without "wait" flag set. When client receives have
-// message with δR(Cat,Sat) it has the guarantee from wcfs that head/data
-// content is for Sat ZODB revision and won't change until client sends ack
-// back to the server. The client in turn now can mmap head/data and
-// @<Cat>/data to get bigfile view as of Cat:
+// Then client opens head/watch and tells wcfs through it for which ZODB state
+// it wants to get bigfile's view.
 //
-//    mmap(bigfile/<bigfileX>/head/data)
-//    mmap(bigfile/<bigfileX>/@<Cat>/data, δR(Cat,Sat), MAP_FIXED)   # mmaped at addresses corresponding to δR(Cat,Sat)
+//    C: 1 watch <bigfileX> @<at>
 //
-// When client completes its initial mmapping it sends ack back to the server:
+// The server then, after potentially sending initial pin messages (see below),
+// reports either success or failure:
 //
-//    C: ack
+//    S: 1 ok
+//    S: 1 error ...      ; if <at> is too far away back from head/at
 //
-// From now on the server will be processing updates to bigfile coming from
-// ZODB as follows:
+// The server sends "ok" reply only after head/at is ≥ requested <at>, and
+// only after all initial pin messages are fully acknowledged by the client.
+// The client can start to use mmapped data after it gets "ok".
+// The server sends "error" reply if requested <at> is too far away back from
+// head/at.
 //
+// Upon watch request, either initially, or after sending "ok", the server will be notifying the
+// client about file blocks that client needs to pin in order to observe file's
+// data as of <at> revision:
 //
-// The filesystem server itself receives information about changed data
-// from ZODB server through regular ZODB invalidation channel (as it is ZODB
-// client itself). Then, before actually updating bigfile/<bigfileX>/head/data
-// content in changed part, it notifies through bigfile/<bigfileX>/head/invalidations
-// to clients that had opened this file (separately to each client) about the changes:
+// The filesystem server itself receives information about changed data from
+// ZODB server through regular ZODB invalidation channel (as it is ZODB client
+// itself). Then, separately for each changed file block, before actually
+// updating head/bigfile/<bigfileX> content, it notifies through head/watch to
+// clients, that had requested it (separately to each client), about the
+// changes:
 //
-//    S: have <Sat>, δR(Sat_prev, Sat)
+//    S: 2 pin <bigfileX> #<blk> @<rev_max>
 //
-// where Sat_prev is ZODB revision last reported to client for this bigfile,
-// and waits until they all confirm that changed file part can be updated in
-// global OS cache.
+// and waits until all clients confirm that changed file block can be updated
+// in global OS cache.
 //
-// The client in turn can now re-mmap invalidated regions to bigfile@Cat
+// The client in turn should now re-mmap requested to be pinned block to bigfile@<rev_max>
 //
-//    # mmapped at addresses corresponding to δR(Sat_prev, Sat)
-//    mmap(bigfile/<bigfileX>/@<Cat>/data, δR(Sat_prev, Sat), MAP_FIXED)
+//    # mmapped at address corresponding to #blk
+//    mmap(@<rev_max>/bigfile/<bigfileX>, #blk, MAP_FIXED)
 //
 // and must send ack back to the server when it is done:
 //
-//    C: ack
+//    C: 2 ack
+//
+// The server sends pin notifications only for file blocks, that are known to
+// be potentially changed after client's <at>, and <rev_max> describes the
+// upper bound for the block revision:
+//
+//    <at> < <rev_max>
+//
+// The server maintains short history tail of file changes to be able to
+// support openings with <at> being slightly in the past compared to current
+// head/at. The server might reject a watch request if <at> is too far away in
+// the past from head/at. The client is advised to restart its transaction with
+// more uptodate database view if it gets watch setup error.
+//
+// A later request from the client for the same <bigfileX> but with different
+// <at>, overrides previous watch request for that file. A client can use "-"
+// instead of "@<at>" to stop watching the file.
+//
+// A single client can send several watch requests through single head/watch
+// open, as well as it can use several head/watch opens simultaneously.
+// The server sends pin notifications for all files requested to be watched via
+// every head/watch open.
 //
-// When clients are done with bigfile/<bigfileX>/@<Cat>/data (i.e. Cat
+// When clients are done with @<revX>/bigfile/<bigfileX> (i.e. client's
 // transaction ends and array is unmapped), the server sees number of opened
-// files to bigfile/<bigfileX>/@<Cat>/data drops to zero, and automatically
-// destroys bigfile/<bigfileX>/@<Cat>/ directory after reasonable timeout.
+// files to @<revX>/bigfile/<bigfileX> drops to zero, and automatically
+// destroys @<revX>/bigfile/<bigfileX> after reasonable timeout.
 //
 //
 // Protection against slow or faulty clients
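
A minimal sketch, assuming the textual message shapes shown in the invalidation protocol comment, of how a client might format a watch request, parse a pin notification and reply with an ack. It is not the wcfs implementation: the Go names (`Pin`, `watchRequest`, `parsePin`, `ack`) are hypothetical, whitespace-separated fields and a decimal block number are assumptions, and the file and revision are kept as opaque strings because the comment does not specify their encoding.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Pin is a parsed "<reqID> pin <bigfileX> #<blk> @<rev_max>" notification.
type Pin struct {
	ReqID uint64 // id chosen by the server; echoed back in the ack
	File  string // <bigfileX>, opaque in this sketch
	Blk   int64  // block number (decimal encoding is an assumption)
	Rev   string // <rev_max>, opaque in this sketch
}

// watchRequest formats "<reqID> watch <bigfileX> @<at>".
func watchRequest(reqID uint64, file, at string) string {
	return fmt.Sprintf("%d watch %s @%s", reqID, file, at)
}

// parsePin parses one pin notification line sent by the server.
func parsePin(line string) (Pin, error) {
	f := strings.Fields(line)
	if len(f) != 5 || f[1] != "pin" {
		return Pin{}, fmt.Errorf("not a pin message: %q", line)
	}
	if !strings.HasPrefix(f[3], "#") || !strings.HasPrefix(f[4], "@") {
		return Pin{}, fmt.Errorf("malformed pin message: %q", line)
	}
	id, err := strconv.ParseUint(f[0], 10, 64)
	if err != nil {
		return Pin{}, fmt.Errorf("bad request id in %q: %v", line, err)
	}
	blk, err := strconv.ParseInt(f[3][1:], 10, 64)
	if err != nil {
		return Pin{}, fmt.Errorf("bad block number in %q: %v", line, err)
	}
	return Pin{ReqID: id, File: f[2], Blk: blk, Rev: f[4][1:]}, nil
}

// ack formats the "<reqID> ack" reply the client must send after re-mmapping.
func ack(reqID uint64) string {
	return fmt.Sprintf("%d ack", reqID)
}

func main() {
	fmt.Println(watchRequest(1, "0000000000000001", "03cf7850500b5f66"))

	p, err := parsePin("2 pin 0000000000000001 #5 @03cf7850500b5f66")
	if err != nil {
		panic(err)
	}
	// A real client would re-mmap block p.Blk from @<p.Rev>/bigfile/<p.File>
	// with MAP_FIXED here, and only then acknowledge.
	fmt.Println(ack(p.ReqID))
}
```

Carrying the request id in every message is what lets several watch requests and pin notifications share one head/watch open, as the new comment describes.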
@@ -293,6 +299,7 @@ package main
 // δFtail.by allows to quickly lookup information by #blk.
 //
 // min(rev) in δFtail is min(@at) at which head/data is currently mmapped (see below).
+// XXX min(10 minutes) of history to support initial openenings
 //
 // 7) when we receive a FUSE read(#blk) request to a file/head/data we process it as follows:
 //