Commit f34ea31d authored by Kirill Smelkov

.

parent d1b58568
==============================================
Additional notes to documentation in wcfs.go
==============================================
This file contains notes that supplement the usage documentation and the
internal organization overview in wcfs.go.
Changing mmapping while under pagefault is possible
===================================================
We can change a mapping while a page from it is under pagefault:
- the kernel, upon handling a pagefault, queues a read request to the filesystem
server. As of Linux 4.20 this is done _while_ holding client->mm->mmap_sem:
kprobe:fuse_readpages (client->mm->mmap_sem.count: 1)
fuse_readpages+1
read_pages+109
__do_page_cache_readahead+401
filemap_fault+635
__do_fault+31
__handle_mm_fault+3403
handle_mm_fault+220
__do_page_fault+598
page_fault+30
- however the read request is queued to be performed asynchronously -
the kernel does not wait for it in fuse_readpages, because
* git.kernel.org/linus/c1aa96a5,
* git.kernel.org/linus/9cd68455,
* and go-fuse initially negotiating CAP_ASYNC_READ with the kernel.
- the kernel then _releases_ client->mm->mmap_sem and then waits
for to-read pages to become ready:
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n2411
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n2457
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n1301
- the filesystem server, upon receiving the read request, can manipulate
the client's address space. This requires write-locking client->mm->mmap_sem,
but we can be sure it won't deadlock because the kernel releases it
before waiting (see previous point).
In practice the manipulation is done by another client thread, because
on Linux it is not possible to change the mm of another process. However
the main point here is that the manipulation is possible because
there will be no deadlock on client->mm->mmap_sem.
For reference, here is how the filesystem server's reply looks under trace:
kprobe:fuse_readpages_end
fuse_readpages_end+1
request_end+188
fuse_dev_do_write+1921
fuse_dev_write+78
do_iter_readv_writev+325
do_iter_write+128
vfs_writev+152
do_writev+94
do_syscall_64+85
entry_SYSCALL_64_after_hwframe+68
And here is a test program that demonstrates that it is possible to change
a mapping while a pagefault to it is in progress:
https://lab.nexedi.com/kirr/go-fuse/commit/f822c9db
In the future mmap_sem might be released while doing any IO:
https://lwn.net/Articles/768857
but before that the analysis remains FUSE-specific.
Client cannot be ptraced while under pagefault
==============================================
We cannot use ptrace to run code on a client thread that is under pagefault:
The kernel sends SIGSTOP to interrupt the tracee, but the signal will be
processed only when the process returns from kernel space, e.g. here
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/common.c?id=v4.19-rc8-151-g23469de647c4#n160
As a result the tracer never receives the required notification that the
tracee has stopped (via wait...), and even though ptrace(ATTACH) succeeds,
all other ptrace commands will fail:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/ptrace.c?id=v4.19-rc8-151-g23469de647c4#n1140
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/ptrace.c?id=v4.19-rc8-151-g23469de647c4#n207
My original idea was to use ptrace to run code in the client process to change
its memory mappings while the triggering thread is under pagefault/read
to wcfs. The above shows this won't work: trying to ptrace the
client from under wcfs will just block forever (the kernel will wait
for the read operation to finish before ptrace can proceed, and the read
will first wait for the ptrace stop to complete = deadlock).
digraph {
wcfs -> wcfs_simple;
wcfs -> ZODB_go_inv;
wcfs -> Sinvtree;
wcfs -> δR;
// wcfs -> wcfs_simple;
// wcfs -> Sinvtree;
// wcfs -> δR;
wcfs -> autoexit;
wcfs_simple -> Btree_read;
wcfs_simple -> ZBlk_read;
wcfs_simple -> autoexit;
wcfs -> wcfsInvProcess;
wcfs -> wcfsRead;
client -> wcfs_spawn;
client -> δR;
wcfsInvProcess -> ZODB_go_inv;
wcfsInvProcess -> zconnCacheGet;
wcfsInvProcess -> zobj2file;
wcfsInvProcess -> δFtail;
wcfsInvProcess -> fuseRetrieveCache;
wcfsRead -> blktabGet;
wcfsRead -> δFtail;
wcfsRead -> mappingRegister;
wcfsRead -> headInv;
zobj2file -> zblk2file;
zobj2file -> zbtree2file;
zbtree2file -> δBTree;
// wcfs_simple -> Btree_read;
// wcfs_simple -> ZBlk_read;
// wcfs_simple -> autoexit;
client -> wcfsRead;
client -> mappingRegister;
client -> clientInvHandle;
// client -> δR;
client -> nowcfs;
client -> zodburl;
// client -> zodburl;
// client -> wcfs_spawn;
Btree_read -> ZODB_read;
ZBlk_read -> ZODB_read;
ZODB_read -> ZODB_binary;
ZODB_read -> ogorek_persref;
clientInvHandle -> headInv;
// Btree_read -> ZODB_read;
// ZBlk_read -> ZODB_read;
// ZODB_read -> ogorek_persref;
wcfs [label="wcfs"]
wcfs_simple [label="wcfs no\ninvalidations", style=filled fillcolor=grey95]
// wcfs_simple [label="wcfs no\ninvalidations", style=filled fillcolor=grey95]
client [label="client"]
wcfs_spawn [label="spawn wcfs", style=filled fillcolor=lightyellow]
// wcfs_spawn [label="spawn wcfs", style=filled fillcolor=lightyellow]
nowcfs [label="!wcfs mode"]
wcfsInvProcess [label="process\nZODB invalidations"]
zconnCacheGet [label="zconn.Cache.Get"]
zobj2file [label="Z* → file/[]#blk"]
zblk2file [label="ZBlk* → file/[]#blk"]
zbtree2file [label="BTree/Bucket → file/[]#blk"]
δBTree [label="δ(BTree)"]
fuseRetrieveCache [label="FUSE:\nretrieve cache"]
wcfsRead [label="read(#blk)"]
blktabGet [label="blktab.Get(#blk):\nmanually + → ⌈rev(#blk)⌉"]
mappingRegister [label="mmappings:\nregister/maint"]
clientInvHandle [label="process\n#blk invalidations"]
headInv [label="#blk ← head/inv."]
ZODB_go_inv [label="ZODB/go\ninvalidations"]
Btree_read [label="BTree read", style=filled fillcolor=lightyellow]
ZBlk_read [label="ZBigFile / ZBlk* read", style=filled fillcolor=lightyellow]
ZODB_read [label="ZODB deserialize object", style=filled fillcolor=lightyellow]
ZODB_binary [label="Adapt to zodbpickle.binary"];
ogorek_persref [label="ogórek:\npersistent references", style=filled fillcolor=lightyellow];
// Btree_read [label="BTree read", style=filled fillcolor=lightyellow]
// ZBlk_read [label="ZBigFile / ZBlk* read", style=filled fillcolor=lightyellow]
// ZODB_read [label="ZODB deserialize object", style=filled fillcolor=lightyellow]
// ogorek_persref [label="ogórek:\npersistent references", style=filled fillcolor=lightyellow];
Sinvtree [label="server: inv. tree"]
δR [label="δR encoding"]
// Sinvtree [label="server: inv. tree"]
// δR [label="δR encoding"]
test [label="? tests"]
zodburl [label="zstor -> zurl", style=filled fillcolor=grey95]
// zodburl [label="zstor -> zurl", style=filled fillcolor=grey95]
autoexit [label="autoexit\nif !activity"]
}
@@ -4,206 +4,245 @@
<!-- Generated by graphviz version 2.40.1 (20161225.0304)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="1115pt" height="385pt"
viewBox="0.00 0.00 1114.84 385.22" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 381.2203)">
<svg width="1321pt" height="367pt"
viewBox="0.00 0.00 1321.46 367.48" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 363.4802)">
<title>%3</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-381.2203 1110.8356,-381.2203 1110.8356,4 -4,4"/>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-363.4802 1317.4565,-363.4802 1317.4565,4 -4,4"/>
<!-- wcfs -->
<g id="node1" class="node">
<title>wcfs</title>
<ellipse fill="none" stroke="#000000" cx="500.2405" cy="-359.2203" rx="27.0966" ry="18"/>
<text text-anchor="middle" x="500.2405" y="-355.5203" font-family="Times,serif" font-size="14.00" fill="#000000">wcfs</text>
<ellipse fill="none" stroke="#000000" cx="447.8112" cy="-341.4802" rx="27.0966" ry="18"/>
<text text-anchor="middle" x="447.8112" y="-337.7802" font-family="Times,serif" font-size="14.00" fill="#000000">wcfs</text>
</g>
<!-- wcfs_simple -->
<!-- autoexit -->
<g id="node2" class="node">
<title>wcfs_simple</title>
<ellipse fill="#f2f2f2" stroke="#000000" cx="215.2405" cy="-278.3503" rx="60.623" ry="26.7407"/>
<text text-anchor="middle" x="215.2405" y="-282.1503" font-family="Times,serif" font-size="14.00" fill="#000000">wcfs no</text>
<text text-anchor="middle" x="215.2405" y="-267.1503" font-family="Times,serif" font-size="14.00" fill="#000000">invalidations</text>
<title>autoexit</title>
<ellipse fill="none" stroke="#000000" cx="287.8112" cy="-260.6102" rx="52.1524" ry="26.7407"/>
<text text-anchor="middle" x="287.8112" y="-264.4102" font-family="Times,serif" font-size="14.00" fill="#000000">autoexit</text>
<text text-anchor="middle" x="287.8112" y="-249.4102" font-family="Times,serif" font-size="14.00" fill="#000000">if !activity</text>
</g>
<!-- wcfs&#45;&gt;wcfs_simple -->
<!-- wcfs&#45;&gt;autoexit -->
<g id="edge1" class="edge">
<title>wcfs&#45;&gt;wcfs_simple</title>
<path fill="none" stroke="#000000" d="M474.1444,-353.7321C433.4028,-344.9257 352.3914,-326.4565 285.2405,-305.2203 280.1207,-303.6013 274.8334,-301.8063 269.5655,-299.9342"/>
<polygon fill="#000000" stroke="#000000" points="270.7386,-296.6365 260.1446,-296.5023 268.3426,-303.2137 270.7386,-296.6365"/>
<title>wcfs&#45;&gt;autoexit</title>
<path fill="none" stroke="#000000" d="M425.8993,-330.4052C402.4756,-318.5659 364.6367,-299.4407 334.487,-284.2019"/>
<polygon fill="#000000" stroke="#000000" points="335.9051,-280.997 325.4015,-279.6097 332.7474,-287.2444 335.9051,-280.997"/>
</g>
<!-- ZODB_go_inv -->
<!-- wcfsInvProcess -->
<g id="node3" class="node">
<title>ZODB_go_inv</title>
<ellipse fill="none" stroke="#000000" cx="355.2405" cy="-278.3503" rx="60.623" ry="26.7407"/>
<text text-anchor="middle" x="355.2405" y="-282.1503" font-family="Times,serif" font-size="14.00" fill="#000000">ZODB/go</text>
<text text-anchor="middle" x="355.2405" y="-267.1503" font-family="Times,serif" font-size="14.00" fill="#000000">invalidations</text>
<title>wcfsInvProcess</title>
<ellipse fill="none" stroke="#000000" cx="447.8112" cy="-260.6102" rx="89.6056" ry="26.7407"/>
<text text-anchor="middle" x="447.8112" y="-264.4102" font-family="Times,serif" font-size="14.00" fill="#000000">process</text>
<text text-anchor="middle" x="447.8112" y="-249.4102" font-family="Times,serif" font-size="14.00" fill="#000000">ZODB invalidations</text>
</g>
<!-- wcfs&#45;&gt;ZODB_go_inv -->
<!-- wcfs&#45;&gt;wcfsInvProcess -->
<g id="edge2" class="edge">
<title>wcfs&#45;&gt;ZODB_go_inv</title>
<path fill="none" stroke="#000000" d="M479.1562,-347.4611C459.0614,-336.2538 428.1093,-318.991 402.1675,-304.5226"/>
<polygon fill="#000000" stroke="#000000" points="403.6649,-301.3502 393.2265,-299.536 400.2552,-307.4637 403.6649,-301.3502"/>
<title>wcfs&#45;&gt;wcfsInvProcess</title>
<path fill="none" stroke="#000000" d="M447.8112,-323.1296C447.8112,-315.5597 447.8112,-306.5002 447.8112,-297.6583"/>
<polygon fill="#000000" stroke="#000000" points="451.3113,-297.4808 447.8112,-287.4808 444.3113,-297.4808 451.3113,-297.4808"/>
</g>
<!-- Sinvtree -->
<!-- wcfsRead -->
<g id="node4" class="node">
<title>Sinvtree</title>
<ellipse fill="none" stroke="#000000" cx="500.2405" cy="-278.3503" rx="66.0889" ry="18"/>
<text text-anchor="middle" x="500.2405" y="-274.6503" font-family="Times,serif" font-size="14.00" fill="#000000">server: inv. tree</text>
<title>wcfsRead</title>
<ellipse fill="none" stroke="#000000" cx="863.8112" cy="-260.6102" rx="47.3916" ry="18"/>
<text text-anchor="middle" x="863.8112" y="-256.9102" font-family="Times,serif" font-size="14.00" fill="#000000">read(#blk)</text>
</g>
<!-- wcfs&#45;&gt;Sinvtree -->
<!-- wcfs&#45;&gt;wcfsRead -->
<g id="edge3" class="edge">
<title>wcfs&#45;&gt;Sinvtree</title>
<path fill="none" stroke="#000000" d="M500.2405,-340.8697C500.2405,-330.8401 500.2405,-318.1956 500.2405,-306.9067"/>
<polygon fill="#000000" stroke="#000000" points="503.7406,-306.6283 500.2405,-296.6283 496.7406,-306.6284 503.7406,-306.6283"/>
<title>wcfs&#45;&gt;wcfsRead</title>
<path fill="none" stroke="#000000" d="M474.1494,-336.3601C541.6501,-323.238 721.8143,-288.2143 811.5369,-270.7722"/>
<polygon fill="#000000" stroke="#000000" points="812.3476,-274.1803 821.4959,-268.8362 811.0117,-267.3089 812.3476,-274.1803"/>
</g>
<!-- δR -->
<!-- ZODB_go_inv -->
<g id="node5" class="node">
<title>δR</title>
<ellipse fill="none" stroke="#000000" cx="678.2405" cy="-278.3503" rx="55.7903" ry="18"/>
<text text-anchor="middle" x="678.2405" y="-274.6503" font-family="Times,serif" font-size="14.00" fill="#000000">δR encoding</text>
<title>ZODB_go_inv</title>
<ellipse fill="none" stroke="#000000" cx="60.8112" cy="-170.8701" rx="60.623" ry="26.7407"/>
<text text-anchor="middle" x="60.8112" y="-174.6701" font-family="Times,serif" font-size="14.00" fill="#000000">ZODB/go</text>
<text text-anchor="middle" x="60.8112" y="-159.6701" font-family="Times,serif" font-size="14.00" fill="#000000">invalidations</text>
</g>
<!-- wcfs&#45;&gt;δR -->
<!-- wcfsInvProcess&#45;&gt;ZODB_go_inv -->
<g id="edge4" class="edge">
<title>wcfs&#45;&gt;δR</title>
<path fill="none" stroke="#000000" d="M522.7843,-348.9781C551.295,-336.025 600.9447,-313.4678 636.4659,-297.3296"/>
<polygon fill="#000000" stroke="#000000" points="637.9682,-300.4914 645.6249,-293.1684 635.0727,-294.1183 637.9682,-300.4914"/>
<title>wcfsInvProcess&#45;&gt;ZODB_go_inv</title>
<path fill="none" stroke="#000000" d="M382.8073,-241.9601C371.5207,-239.011 359.8613,-236.1456 348.8112,-233.7401 252.8573,-212.852 225.5821,-223.4682 130.8112,-197.7401 125.5506,-196.312 120.138,-194.6321 114.7639,-192.8211"/>
<polygon fill="#000000" stroke="#000000" points="115.7682,-189.4641 105.1734,-189.4456 113.4442,-196.0671 115.7682,-189.4641"/>
</g>
<!-- autoexit -->
<!-- zconnCacheGet -->
<g id="node6" class="node">
<title>autoexit</title>
<ellipse fill="none" stroke="#000000" cx="524.2405" cy="-188.6102" rx="52.1524" ry="26.7407"/>
<text text-anchor="middle" x="524.2405" y="-192.4102" font-family="Times,serif" font-size="14.00" fill="#000000">autoexit</text>
<text text-anchor="middle" x="524.2405" y="-177.4102" font-family="Times,serif" font-size="14.00" fill="#000000">if !activity</text>
<title>zconnCacheGet</title>
<ellipse fill="none" stroke="#000000" cx="210.8112" cy="-170.8701" rx="71.4873" ry="18"/>
<text text-anchor="middle" x="210.8112" y="-167.1701" font-family="Times,serif" font-size="14.00" fill="#000000">zconn.Cache.Get</text>
</g>
<!-- wcfs&#45;&gt;autoexit -->
<!-- wcfsInvProcess&#45;&gt;zconnCacheGet -->
<g id="edge5" class="edge">
<title>wcfs&#45;&gt;autoexit</title>
<path fill="none" stroke="#000000" d="M523.6041,-349.6048C541.3543,-340.9016 564.4877,-326.2123 575.2405,-305.2203 586.1295,-283.9624 583.2229,-273.9913 575.2405,-251.4802 571.2079,-240.1079 564.1443,-229.2925 556.5362,-219.9395"/>
<polygon fill="#000000" stroke="#000000" points="558.9615,-217.3934 549.7726,-212.1194 553.6671,-221.9725 558.9615,-217.3934"/>
<title>wcfsInvProcess&#45;&gt;zconnCacheGet</title>
<path fill="none" stroke="#000000" d="M391.9456,-239.4567C351.9531,-224.3135 298.6039,-204.1128 260.174,-189.5613"/>
<polygon fill="#000000" stroke="#000000" points="261.3337,-186.258 250.7423,-185.99 258.8549,-192.8044 261.3337,-186.258"/>
</g>
<!-- wcfs_simple&#45;&gt;autoexit -->
<g id="edge8" class="edge">
<title>wcfs_simple&#45;&gt;autoexit</title>
<path fill="none" stroke="#000000" d="M260.5453,-260.133C268.7109,-257.0866 277.1874,-254.0765 285.2405,-251.4802 346.3654,-231.7741 417.7214,-213.6181 466.4456,-201.9369"/>
<polygon fill="#000000" stroke="#000000" points="467.2668,-205.3393 476.183,-199.6164 465.644,-198.53 467.2668,-205.3393"/>
</g>
<!-- Btree_read -->
<!-- zobj2file -->
<g id="node7" class="node">
<title>Btree_read</title>
<ellipse fill="#ffffe0" stroke="#000000" cx="136.2405" cy="-188.6102" rx="50.0912" ry="18"/>
<text text-anchor="middle" x="136.2405" y="-184.9102" font-family="Times,serif" font-size="14.00" fill="#000000">BTree read</text>
<title>zobj2file</title>
<ellipse fill="none" stroke="#000000" cx="370.8112" cy="-170.8701" rx="70.3881" ry="18"/>
<text text-anchor="middle" x="370.8112" y="-167.1701" font-family="Times,serif" font-size="14.00" fill="#000000">Z* → file/[]#blk</text>
</g>
<!-- wcfs_simple&#45;&gt;Btree_read -->
<!-- wcfsInvProcess&#45;&gt;zobj2file -->
<g id="edge6" class="edge">
<title>wcfs_simple&#45;&gt;Btree_read</title>
<path fill="none" stroke="#000000" d="M192.8136,-252.8744C181.9617,-240.5473 168.981,-225.8018 158.1924,-213.5465"/>
<polygon fill="#000000" stroke="#000000" points="160.6985,-211.0963 151.4637,-205.903 155.4443,-215.7217 160.6985,-211.0963"/>
<title>wcfsInvProcess&#45;&gt;zobj2file</title>
<path fill="none" stroke="#000000" d="M425.1251,-234.1706C414.8887,-222.2405 402.847,-208.2064 392.7065,-196.3881"/>
<polygon fill="#000000" stroke="#000000" points="395.2035,-193.9233 386.0354,-188.6132 389.891,-198.4816 395.2035,-193.9233"/>
</g>
<!-- ZBlk_read -->
<!-- δFtail -->
<g id="node8" class="node">
<title>ZBlk_read</title>
<ellipse fill="#ffffe0" stroke="#000000" cx="294.2405" cy="-188.6102" rx="89.8845" ry="18"/>
<text text-anchor="middle" x="294.2405" y="-184.9102" font-family="Times,serif" font-size="14.00" fill="#000000">ZBigFile / ZBlk* read</text>
<title>δFtail</title>
<ellipse fill="none" stroke="#000000" cx="638.8112" cy="-170.8701" rx="31.6951" ry="18"/>
<text text-anchor="middle" x="638.8112" y="-167.1701" font-family="Times,serif" font-size="14.00" fill="#000000">δFtail</text>
</g>
<!-- wcfs_simple&#45;&gt;ZBlk_read -->
<!-- wcfsInvProcess&#45;&gt;δFtail -->
<g id="edge7" class="edge">
<title>wcfs_simple&#45;&gt;ZBlk_read</title>
<path fill="none" stroke="#000000" d="M237.6674,-252.8744C248.3696,-240.7173 261.1423,-226.2081 271.8409,-214.055"/>
<polygon fill="#000000" stroke="#000000" points="274.5487,-216.2759 278.5293,-206.4573 269.2946,-211.6506 274.5487,-216.2759"/>
</g>
<!-- ZODB_read -->
<g id="node13" class="node">
<title>ZODB_read</title>
<ellipse fill="#ffffe0" stroke="#000000" cx="215.2405" cy="-107.7401" rx="98.5829" ry="18"/>
<text text-anchor="middle" x="215.2405" y="-104.0401" font-family="Times,serif" font-size="14.00" fill="#000000">ZODB deserialize object</text>
</g>
<!-- Btree_read&#45;&gt;ZODB_read -->
<g id="edge13" class="edge">
<title>Btree_read&#45;&gt;ZODB_read</title>
<path fill="none" stroke="#000000" d="M152.9944,-171.4597C163.8716,-160.325 178.2398,-145.6167 190.4308,-133.1371"/>
<polygon fill="#000000" stroke="#000000" points="193.2052,-135.3057 197.6894,-125.7066 188.1979,-130.4142 193.2052,-135.3057"/>
<title>wcfsInvProcess&#45;&gt;δFtail</title>
<path fill="none" stroke="#000000" d="M504.5002,-239.696C533.0616,-228.4505 567.8907,-213.6423 597.8112,-197.7401 601.8951,-195.5696 606.0928,-193.1097 610.1718,-190.5812"/>
<polygon fill="#000000" stroke="#000000" points="612.3018,-193.3728 618.8218,-185.0218 608.5171,-187.4841 612.3018,-193.3728"/>
</g>
<!-- ZBlk_read&#45;&gt;ZODB_read -->
<g id="edge14" class="edge">
<title>ZBlk_read&#45;&gt;ZODB_read</title>
<path fill="none" stroke="#000000" d="M276.7078,-170.6625C265.8663,-159.5643 251.7875,-145.1522 239.8398,-132.9218"/>
<polygon fill="#000000" stroke="#000000" points="242.2157,-130.3452 232.7241,-125.6376 237.2084,-135.2367 242.2157,-130.3452"/>
</g>
<!-- client -->
<!-- fuseRetrieveCache -->
<g id="node9" class="node">
<title>client</title>
<ellipse fill="none" stroke="#000000" cx="865.2405" cy="-359.2203" rx="30.5947" ry="18"/>
<text text-anchor="middle" x="865.2405" y="-355.5203" font-family="Times,serif" font-size="14.00" fill="#000000">client</text>
<title>fuseRetrieveCache</title>
<ellipse fill="none" stroke="#000000" cx="523.8112" cy="-170.8701" rx="65.1077" ry="26.7407"/>
<text text-anchor="middle" x="523.8112" y="-174.6701" font-family="Times,serif" font-size="14.00" fill="#000000">FUSE:</text>
<text text-anchor="middle" x="523.8112" y="-159.6701" font-family="Times,serif" font-size="14.00" fill="#000000">retrieve cache</text>
</g>
<!-- wcfsInvProcess&#45;&gt;fuseRetrieveCache -->
<g id="edge8" class="edge">
<title>wcfsInvProcess&#45;&gt;fuseRetrieveCache</title>
<path fill="none" stroke="#000000" d="M470.2026,-234.1706C478.149,-224.7876 487.1975,-214.1031 495.5685,-204.2188"/>
<polygon fill="#000000" stroke="#000000" points="498.3107,-206.3965 502.1026,-196.5034 492.969,-201.8726 498.3107,-206.3965"/>
</g>
<!-- client&#45;&gt;δR -->
<!-- wcfsRead&#45;&gt;δFtail -->
<g id="edge10" class="edge">
<title>client&#45;&gt;δR</title>
<path fill="none" stroke="#000000" d="M840.4083,-348.4814C810.0384,-335.3476 758.127,-312.898 721.2218,-296.938"/>
<polygon fill="#000000" stroke="#000000" points="722.2814,-293.583 711.7136,-292.8261 719.5028,-300.0079 722.2814,-293.583"/>
<title>wcfsRead&#45;&gt;δFtail</title>
<path fill="none" stroke="#000000" d="M824.3537,-250.4755C786.0592,-239.8896 726.8474,-221.5129 678.8112,-197.7401 674.7366,-195.7236 670.5893,-193.3524 666.584,-190.8679"/>
<polygon fill="#000000" stroke="#000000" points="668.4114,-187.8811 658.1254,-185.3419 664.5829,-193.7414 668.4114,-187.8811"/>
</g>
<!-- wcfs_spawn -->
<!-- blktabGet -->
<g id="node10" class="node">
<title>wcfs_spawn</title>
<ellipse fill="#ffffe0" stroke="#000000" cx="804.2405" cy="-278.3503" rx="51.9908" ry="18"/>
<text text-anchor="middle" x="804.2405" y="-274.6503" font-family="Times,serif" font-size="14.00" fill="#000000">spawn wcfs</text>
<title>blktabGet</title>
<ellipse fill="none" stroke="#000000" cx="802.8112" cy="-170.8701" rx="114.6026" ry="26.7407"/>
<text text-anchor="middle" x="802.8112" y="-174.6701" font-family="Times,serif" font-size="14.00" fill="#000000">blktab.Get(#blk):</text>
<text text-anchor="middle" x="802.8112" y="-159.6701" font-family="Times,serif" font-size="14.00" fill="#000000">manually + → ⌈rev(#blk)⌉</text>
</g>
<!-- client&#45;&gt;wcfs_spawn -->
<!-- wcfsRead&#45;&gt;blktabGet -->
<g id="edge9" class="edge">
<title>client&#45;&gt;wcfs_spawn</title>
<path fill="none" stroke="#000000" d="M852.6012,-342.464C844.2652,-331.4126 833.1808,-316.7176 823.7271,-304.1844"/>
<polygon fill="#000000" stroke="#000000" points="826.3128,-301.8002 817.4965,-295.9244 820.7243,-306.0156 826.3128,-301.8002"/>
<title>wcfsRead&#45;&gt;blktabGet</title>
<path fill="none" stroke="#000000" d="M851.7595,-242.8804C844.594,-232.3388 835.2523,-218.5957 826.6529,-205.9448"/>
<polygon fill="#000000" stroke="#000000" points="829.3703,-203.7165 820.854,-197.4138 823.5811,-207.6516 829.3703,-203.7165"/>
</g>
<!-- nowcfs -->
<!-- mappingRegister -->
<g id="node11" class="node">
<title>nowcfs</title>
<ellipse fill="none" stroke="#000000" cx="927.2405" cy="-278.3503" rx="52.7911" ry="18"/>
<text text-anchor="middle" x="927.2405" y="-274.6503" font-family="Times,serif" font-size="14.00" fill="#000000">!wcfs mode</text>
<title>mappingRegister</title>
<ellipse fill="none" stroke="#000000" cx="1000.8112" cy="-170.8701" rx="65.1077" ry="26.7407"/>
<text text-anchor="middle" x="1000.8112" y="-174.6701" font-family="Times,serif" font-size="14.00" fill="#000000">mmappings:</text>
<text text-anchor="middle" x="1000.8112" y="-159.6701" font-family="Times,serif" font-size="14.00" fill="#000000">register/maint</text>
</g>
<!-- client&#45;&gt;nowcfs -->
<!-- wcfsRead&#45;&gt;mappingRegister -->
<g id="edge11" class="edge">
<title>client&#45;&gt;nowcfs</title>
<path fill="none" stroke="#000000" d="M878.0869,-342.464C886.5596,-331.4126 897.8257,-316.7176 907.4344,-304.1844"/>
<polygon fill="#000000" stroke="#000000" points="910.4604,-305.99 913.7671,-295.9244 904.9051,-301.731 910.4604,-305.99"/>
<title>wcfsRead&#45;&gt;mappingRegister</title>
<path fill="none" stroke="#000000" d="M887.6748,-244.9786C906.8447,-232.4216 934.2755,-214.4534 957.3257,-199.3547"/>
<polygon fill="#000000" stroke="#000000" points="959.4718,-202.133 965.9191,-193.7257 955.6361,-196.2774 959.4718,-202.133"/>
</g>
<!-- zodburl -->
<!-- headInv -->
<g id="node12" class="node">
<title>zodburl</title>
<ellipse fill="#f2f2f2" stroke="#000000" cx="1052.2405" cy="-278.3503" rx="54.6905" ry="18"/>
<text text-anchor="middle" x="1052.2405" y="-274.6503" font-family="Times,serif" font-size="14.00" fill="#000000">zstor &#45;&gt; zurl</text>
<title>headInv</title>
<ellipse fill="none" stroke="#000000" cx="1156.8112" cy="-170.8701" rx="73.387" ry="18"/>
<text text-anchor="middle" x="1156.8112" y="-167.1701" font-family="Times,serif" font-size="14.00" fill="#000000">#blk ← head/inv.</text>
</g>
<!-- client&#45;&gt;zodburl -->
<!-- wcfsRead&#45;&gt;headInv -->
<g id="edge12" class="edge">
<title>client&#45;&gt;zodburl</title>
<path fill="none" stroke="#000000" d="M889.7468,-348.1053C894.859,-345.8112 900.2197,-343.4235 905.2405,-341.2203 915.3589,-336.7804 969.7634,-313.5397 1009.4466,-296.6045"/>
<polygon fill="#000000" stroke="#000000" points="1010.8525,-299.81 1018.6764,-292.666 1008.1051,-293.3717 1010.8525,-299.81"/>
<title>wcfsRead&#45;&gt;headInv</title>
<path fill="none" stroke="#000000" d="M901.6392,-249.6738C943.9378,-237.3671 1014.4354,-216.6233 1074.8112,-197.7401 1084.0434,-194.8526 1093.8351,-191.7123 1103.3168,-188.6304"/>
<polygon fill="#000000" stroke="#000000" points="1104.6291,-191.8839 1113.0489,-185.4529 1102.4565,-185.2295 1104.6291,-191.8839"/>
</g>
<!-- zblk2file -->
<g id="node13" class="node">
<title>zblk2file</title>
<ellipse fill="none" stroke="#000000" cx="264.8112" cy="-90" rx="83.3857" ry="18"/>
<text text-anchor="middle" x="264.8112" y="-86.3" font-family="Times,serif" font-size="14.00" fill="#000000">ZBlk* → file/[]#blk</text>
</g>
<!-- zobj2file&#45;&gt;zblk2file -->
<g id="edge13" class="edge">
<title>zobj2file&#45;&gt;zblk2file</title>
<path fill="none" stroke="#000000" d="M348.3313,-153.7196C333.1164,-142.1118 312.8112,-126.6204 296.0317,-113.8189"/>
<polygon fill="#000000" stroke="#000000" points="297.6975,-110.6875 287.6241,-107.4045 293.4515,-116.2528 297.6975,-110.6875"/>
</g>
<!-- ZODB_binary -->
<!-- zbtree2file -->
<g id="node14" class="node">
<title>ZODB_binary</title>
<ellipse fill="none" stroke="#000000" cx="107.2405" cy="-26.8701" rx="107.4815" ry="18"/>
<text text-anchor="middle" x="107.2405" y="-23.1701" font-family="Times,serif" font-size="14.00" fill="#000000">Adapt to zodbpickle.binary</text>
<title>zbtree2file</title>
<ellipse fill="none" stroke="#000000" cx="475.8112" cy="-90" rx="109.6807" ry="18"/>
<text text-anchor="middle" x="475.8112" y="-86.3" font-family="Times,serif" font-size="14.00" fill="#000000">BTree/Bucket → file/[]#blk</text>
</g>
<!-- ZODB_read&#45;&gt;ZODB_binary -->
<g id="edge15" class="edge">
<title>ZODB_read&#45;&gt;ZODB_binary</title>
<path fill="none" stroke="#000000" d="M191.8061,-90.1925C176.4855,-78.7205 156.2705,-63.5836 139.4468,-50.9861"/>
<polygon fill="#000000" stroke="#000000" points="141.1046,-47.8549 131.0021,-44.6627 136.9089,-53.4582 141.1046,-47.8549"/>
<!-- zobj2file&#45;&gt;zbtree2file -->
<g id="edge14" class="edge">
<title>zobj2file&#45;&gt;zbtree2file</title>
<path fill="none" stroke="#000000" d="M393.079,-153.7196C408.0478,-142.1908 427.9906,-126.831 444.5456,-114.0805"/>
<polygon fill="#000000" stroke="#000000" points="447.0627,-116.5596 452.8496,-107.6848 442.7914,-111.0138 447.0627,-116.5596"/>
</g>
<!-- ogorek_persref -->
<!-- δBTree -->
<g id="node15" class="node">
<title>ogorek_persref</title>
<ellipse fill="#ffffe0" stroke="#000000" cx="323.2405" cy="-26.8701" rx="90.5193" ry="26.7407"/>
<text text-anchor="middle" x="323.2405" y="-30.6701" font-family="Times,serif" font-size="14.00" fill="#000000">ogórek:</text>
<text text-anchor="middle" x="323.2405" y="-15.6701" font-family="Times,serif" font-size="14.00" fill="#000000">persistent references</text>
<title>δBTree</title>
<ellipse fill="none" stroke="#000000" cx="475.8112" cy="-18" rx="43.5923" ry="18"/>
<text text-anchor="middle" x="475.8112" y="-14.3" font-family="Times,serif" font-size="14.00" fill="#000000">δ(BTree)</text>
</g>
<!-- ZODB_read&#45;&gt;ogorek_persref -->
<g id="edge16" class="edge">
<title>ZODB_read&#45;&gt;ogorek_persref</title>
<path fill="none" stroke="#000000" d="M238.6749,-90.1925C251.1213,-80.8726 266.7981,-69.134 281.2625,-58.3031"/>
<polygon fill="#000000" stroke="#000000" points="283.6984,-60.8515 289.6052,-52.056 279.5027,-55.2483 283.6984,-60.8515"/>
<!-- zbtree2file&#45;&gt;δBTree -->
<g id="edge15" class="edge">
<title>zbtree2file&#45;&gt;δBTree</title>
<path fill="none" stroke="#000000" d="M475.8112,-71.8314C475.8112,-64.131 475.8112,-54.9743 475.8112,-46.4166"/>
<polygon fill="#000000" stroke="#000000" points="479.3113,-46.4132 475.8112,-36.4133 472.3113,-46.4133 479.3113,-46.4132"/>
</g>
<!-- test -->
<!-- client -->
<g id="node16" class="node">
<title>test</title>
<ellipse fill="none" stroke="#000000" cx="946.2405" cy="-359.2203" rx="32.4942" ry="18"/>
<text text-anchor="middle" x="946.2405" y="-355.5203" font-family="Times,serif" font-size="14.00" fill="#000000">? tests</text>
<title>client</title>
<ellipse fill="none" stroke="#000000" cx="1054.8112" cy="-341.4802" rx="30.5947" ry="18"/>
<text text-anchor="middle" x="1054.8112" y="-337.7802" font-family="Times,serif" font-size="14.00" fill="#000000">client</text>
</g>
<!-- client&#45;&gt;wcfsRead -->
<g id="edge16" class="edge">
<title>client&#45;&gt;wcfsRead</title>
<path fill="none" stroke="#000000" d="M1029.8413,-330.9079C998.1468,-317.4884 943.0353,-294.1539 905.0558,-278.0733"/>
<polygon fill="#000000" stroke="#000000" points="906.367,-274.8277 895.7937,-274.1517 903.6377,-281.2737 906.367,-274.8277"/>
</g>
<!-- client&#45;&gt;mappingRegister -->
<g id="edge17" class="edge">
<title>client&#45;&gt;mappingRegister</title>
<path fill="none" stroke="#000000" d="M1041.5213,-324.9601C1033.9839,-314.7434 1025.04,-301.0299 1019.8112,-287.4802 1010.0131,-262.0902 1005.2677,-231.6953 1002.9694,-208.3035"/>
<polygon fill="#000000" stroke="#000000" points="1006.4343,-207.7489 1002.0763,-198.0919 999.4609,-208.3588 1006.4343,-207.7489"/>
</g>
<!-- clientInvHandle -->
<g id="node17" class="node">
<title>clientInvHandle</title>
<ellipse fill="none" stroke="#000000" cx="1109.8112" cy="-260.6102" rx="80.7205" ry="26.7407"/>
<text text-anchor="middle" x="1109.8112" y="-264.4102" font-family="Times,serif" font-size="14.00" fill="#000000">process</text>
<text text-anchor="middle" x="1109.8112" y="-249.4102" font-family="Times,serif" font-size="14.00" fill="#000000">#blk invalidations</text>
</g>
<!-- client&#45;&gt;clientInvHandle -->
<g id="edge18" class="edge">
<title>client&#45;&gt;clientInvHandle</title>
<path fill="none" stroke="#000000" d="M1066.2072,-324.7239C1072.0085,-316.1939 1079.286,-305.4933 1086.2244,-295.2914"/>
<polygon fill="#000000" stroke="#000000" points="1089.2033,-297.135 1091.9329,-286.8978 1083.415,-293.1984 1089.2033,-297.135"/>
</g>
<!-- nowcfs -->
<g id="node18" class="node">
<title>nowcfs</title>
<ellipse fill="none" stroke="#000000" cx="1260.8112" cy="-260.6102" rx="52.7911" ry="18"/>
<text text-anchor="middle" x="1260.8112" y="-256.9102" font-family="Times,serif" font-size="14.00" fill="#000000">!wcfs mode</text>
</g>
<!-- client&#45;&gt;nowcfs -->
<g id="edge19" class="edge">
<title>client&#45;&gt;nowcfs</title>
<path fill="none" stroke="#000000" d="M1081.1838,-332.1823C1110.3792,-321.768 1158.7442,-304.1555 1199.8112,-287.4802 1206.2226,-284.8769 1212.967,-282.0237 1219.5391,-279.1794"/>
<polygon fill="#000000" stroke="#000000" points="1221.2256,-282.262 1228.9857,-275.0487 1218.4211,-275.8483 1221.2256,-282.262"/>
</g>
<!-- clientInvHandle&#45;&gt;headInv -->
<g id="edge20" class="edge">
<title>clientInvHandle&#45;&gt;headInv</title>
<path fill="none" stroke="#000000" d="M1123.9127,-233.6852C1129.8276,-222.3916 1136.6935,-209.282 1142.6181,-197.9698"/>
<polygon fill="#000000" stroke="#000000" points="1145.7928,-199.452 1147.3318,-188.9696 1139.5917,-196.2043 1145.7928,-199.452"/>
</g>
</g>
</svg>
......@@ -337,90 +337,8 @@ package main
// and a client that wants @rev data will get @rev data, even if it was this
// "old" client that triggered the pagefault(*).
//
// (*) we can change a mapping while a page from it is under pagefault:
//
// - the kernel, upon handling a pagefault, queues a read request to the
// filesystem server. As of Linux 4.20 this is done _while_ holding
// client->mm->mmap_sem:
//
// kprobe:fuse_readpages (client->mm->mmap_sem.count: 1)
// fuse_readpages+1
// read_pages+109
// __do_page_cache_readahead+401
// filemap_fault+635
// __do_fault+31
// __handle_mm_fault+3403
// handle_mm_fault+220
// __do_page_fault+598
// page_fault+30
//
// - however the read request is queued to be performed asynchronously -
// the kernel does not wait for it in fuse_readpages, because
//
// * git.kernel.org/linus/c1aa96a5,
// * git.kernel.org/linus/9cd68455,
// * and go-fuse initially negotiating CAP_ASYNC_READ with the kernel.
//
// - the kernel then _releases_ client->mm->mmap_sem and then waits
// for to-read pages to become ready:
//
// * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n2411
// * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n2457
// * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n1301
//
// - the filesystem server, upon receiving the read request, can manipulate
// the client's address space. This requires write-locking
// client->mm->mmap_sem, but we can be sure it won't deadlock because the
// kernel releases it before waiting (see previous point).
//
// In practice the manipulation is done by another client thread, because
// on Linux it is not possible to change the mm of another process. However
// the main point here is that the manipulation is possible because
// there will be no deadlock on client->mm->mmap_sem.
//
// For reference, here is how the filesystem server's reply looks under trace:
//
// kprobe:fuse_readpages_end
// fuse_readpages_end+1
// request_end+188
// fuse_dev_do_write+1921
// fuse_dev_write+78
// do_iter_readv_writev+325
// do_iter_write+128
// vfs_writev+152
// do_writev+94
// do_syscall_64+85
// entry_SYSCALL_64_after_hwframe+68
//
// and here is a test program demonstrating that it is possible to change
// a mmapping while it is under pagefault:
//
// https://lab.nexedi.com/kirr/go-fuse/commit/f822c9db
//
// In the future mmap_sem might be released while doing any IO:
//
// https://lwn.net/Articles/768857
//
// but before that the analysis remains FUSE-specific.
//
//
// (+) the kernel sends SIGSTOP to interrupt tracee, but the signal will be
// processed only when the process returns from kernel space, e.g. here
//
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/common.c?id=v4.19-rc8-151-g23469de647c4#n160
//
// This way the tracer won't receive the obligatory notification that the
// tracee stopped (via wait...), and even though ptrace(ATTACH) succeeds, all
// other ptrace commands will fail:
//
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/ptrace.c?id=v4.19-rc8-151-g23469de647c4#n1140
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/ptrace.c?id=v4.19-rc8-151-g23469de647c4#n207
//
// My original idea was to use ptrace to run code in the process to change its
// memory mappings while the triggering process is under pagefault/read
// to wcfs. The above shows it won't work: trying to ptrace the
// client from under wcfs will just block forever (the kernel will be
// waiting for the read operation to finish before ptrace can proceed, and
// the read will first be waiting for the ptrace stop to complete = deadlock).
// (*) see "Changing mmapping while under pagefault is possible" in notes.txt
// (+) see "Client cannot be ptraced while under pagefault" in notes.txt
//
//
// XXX mmap(@at) open
......