Commit 4fbdd270 authored by Kirill Smelkov

X Proof that it is possible to change mmapping while under pagefault to it

Even though the kernel does not generally release mmap_sem on IO caused by
a pagefault.
parent e1f05973
@@ -335,7 +335,72 @@ package main
//
// Thus a client that wants latest data on pagefault will get latest data,
// and a client that wants @rev data will get @rev data, even if it was this
-// "old" client that triggered the pagefault.
+// "old" client that triggered the pagefault(*).
//
// (*) we can change a mapping while a page from it is under pagefault:
//
// - the kernel, upon handling pagefault, queues read request to filesystem
// server. As of Linux 4.20 this is done _with_ holding client->mm->mmap_sem:
//
// kprobe:fuse_readpages (client->mm->mmap_sem.count: 1)
// fuse_readpages+1
// read_pages+109
// __do_page_cache_readahead+401
// filemap_fault+635
// __do_fault+31
// __handle_mm_fault+3403
// handle_mm_fault+220
// __do_page_fault+598
// page_fault+30
//
// - however the read request is queued to be performed asynchronously -
// the kernel does not wait for it in fuse_readpages, because
//
// * git.kernel.org/linus/c1aa96a5,
// * git.kernel.org/linus/9cd68455,
// * and go-fuse initially negotiating CAP_ASYNC_READ to the kernel.
//
// - the kernel then _releases_ client->mm->mmap_sem and then waits
// for to-read pages to become ready:
//
// * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n2411
// * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n2457
// * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?id=v4.20-rc3-83-g06e68fed3282#n1301
//
// - the filesystem server upon receiving the read request can manipulate
// client's address space. This requires write-locking client->mm->mmap_sem,
// but we can be sure it won't deadlock because the kernel releases it
// before waiting (see previous point).
//
// in practice the manipulation is done by another client thread, because
// on Linux it is not possible to change mm of another process. However
// the main point here is that the manipulation is possible because
// there will be no deadlock on client->mm->mmap_sem.
//
// For reference, here is how the filesystem server reply looks under trace:
//
// kprobe:fuse_readpages_end
// fuse_readpages_end+1
// request_end+188
// fuse_dev_do_write+1921
// fuse_dev_write+78
// do_iter_readv_writev+325
// do_iter_write+128
// vfs_writev+152
// do_writev+94
// do_syscall_64+85
// entry_SYSCALL_64_after_hwframe+68
//
// and a test program that demonstrates that it is possible to change
// mmapping while under pagefault to it:
//
// https://lab.nexedi.com/kirr/go-fuse/commit/f822c9db
//
// In the future mmap_sem might be released while doing any IO:
//
// https://lwn.net/Articles/768857
//
// but before that the analysis remains FUSE-specific.
//
//
// XXX mmap(@at) open
...