1. 25 Apr, 2019 10 commits
    • . · 9d5f7e99
      Kirill Smelkov authored
    • fuse: request explicit control over data cache if filesystem asks for it · 1d6f9351
      Kirill Smelkov authored
      This complements commit "fuse: allow filesystems to disable
      CAP_AUTO_INVAL_DATA" and teaches go-fuse to request explicit data cache
      invalidation mode if fuse.MountOptions.ExplicitDataCacheControl is set.
      
      See https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/commit/?id=ad2ba64dd489
      and https://lwn.net/ml/linux-fsdevel/20190315212556.9315-1-kirr%40nexedi.com/ for rationale and details.
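      
      A rough sketch of the flag negotiation this enables, assuming the bit
      values from include/uapi/linux/fuse.h (FUSE_AUTO_INVAL_DATA is 1<<12,
      FUSE_EXPLICIT_INVAL_DATA is 1<<25, protocol 7.30). initReplyFlags and its
      parameters are hypothetical names for illustration, not go-fuse's actual
      handshake code.

```go
package main

// Capability bits from include/uapi/linux/fuse.h.
const (
	capAutoInvalData     uint32 = 1 << 12 // FUSE_AUTO_INVAL_DATA
	capExplicitInvalData uint32 = 1 << 25 // FUSE_EXPLICIT_INVAL_DATA (protocol 7.30+)
)

// initReplyFlags is a hypothetical helper computing the capability flags a
// filesystem sends back in its INIT reply. kernelFlags are the flags the
// kernel advertised in INIT; wanted are the capabilities the filesystem asks
// for. With explicitDataCacheControl set, automatic invalidation is never
// requested, and explicit invalidation mode is requested instead.
func initReplyFlags(kernelFlags, wanted uint32, explicitDataCacheControl bool) uint32 {
	if explicitDataCacheControl {
		wanted &^= capAutoInvalData
		wanted |= capExplicitInvalData
	}
	// Only request what the kernel actually advertised.
	return wanted & kernelFlags
}
```

      On a kernel without FUSE_EXPLICIT_INVAL_DATA the sketch requests neither
      bit, so the kernel still falls back to its size-change invalidation.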
    • fuse/test: Fix TestFopenKeepCache · 904ef0cc
      Kirill Smelkov authored
      This test was flaky even before ce2558b4 (fuse/test: disable
      TestFopenKeepCache) because, after the file changed, the filesystem was
      reporting to the kernel both a different size (before=6, after=5) and a
      potentially different mtime. The kernel is known to invalidate a file's
      data cache on size change, and also on mtime change if
      CAP_AUTO_INVAL_DATA was negotiated during the FUSE handshake. Until
      recently go-fuse always used CAP_AUTO_INVAL_DATA mode.
      
      The test was somehow passing on kernels < Linux 4.20 because the write
      came soon after the previous lookup/getattr, and thus likely before the
      attributes timed out. The kernel was still seeing the old mtime and size
      and so was not invalidating the file cache. If I add just the "sleep
      enough time for file attributes to expire..." + followup stat from this
      patch, the test fails reliably even on older kernels where it used to
      pass:
      
      	=== RUN   TestFopenKeepCache
      	17:44:33.397329 rx 1: INIT i0 {7.27 Ra 0x20000 SPLICE_READ,HANDLE_KILLPRIV,IOCTL_DIR,READDIRPLUS,PARALLEL_DIROPS,ABORT_ERROR,POSIX_LOCKS,DONT_MASK,SPLICE_WRITE,SPLICE_MOVE,ASYNC_READ,FLOCK_LOCKS,WRITEBACK_CACHE,POSIX
      	17:44:33.397413 tx 1:     OK, {7.23 Ra 0x20000 ASYNC_READ,NO_OPEN_SUPPORT,BIG_WRITES,AUTO_INVAL_DATA,READDIRPLUS,PARALLEL_DIROPS 0/0 Wr 0x10000 Tg 0x0}
      	17:44:33.398589 rx 2: LOOKUP i1 [".go-fuse-epoll-hack"] 20b
      	17:44:33.398644 tx 2:     2=no such file or directory, {i0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
      	17:44:33.398686 rx 3: CREATE i1 {0100100 [WRONLY,TRUNC,CREAT,0x8000] (022)} [".go-fuse-epoll-hack"] 20b
      	17:44:33.398710 tx 3:     OK, {i18446744073709551615 g0 {M0100644 SZ=0 L=1 0:0 B0*0 i0:18446744073709551615 A 0.000000 M 0.000000 C 0.000000} &{18446744073709551615 0 0}}
      	17:44:33.398745 rx 4: POLL i18446744073709551615
      	17:44:33.398753 tx 4:     38=function not implemented
      	17:44:33.399446 rx 5: FLUSH i18446744073709551615 {Fh 18446744073709551615}
      	17:44:33.399466 tx 5:     5=input/output error
      	17:44:33.399540 rx 6: RELEASE i18446744073709551615 {Fh 18446744073709551615 WRONLY,0x8000  L0}
      	17:44:33.399548 tx 6:     5=input/output error
      	17:44:33.399567 rx 7: LOOKUP i1 ["file.txt"] 9b
      	17:44:33.399648 tx 7:     OK, {i3 g2 tE=1s tA=0.01s {M0100644 SZ=6 L=1 1000:1000 B8*4096 i0:841351 A 1552833873.399069 M 1552833873.399069 C 1552833873.399069}}
      	17:44:33.399936 rx 8: OPEN i3 {O_RDONLY,0x8000}
      	17:44:33.399976 tx 8:     OK, {Fh 2 CACHE}
      	17:44:33.400045 rx 9: READ i3 {Fh 2 [0 +4096)  L 0 NONBLOCK,0x8000}
      	17:44:33.400065 tx 9:     OK,  4096b data (fd data)
      	17:44:33.400185 rx 10: GETATTR i3 {Fh 2}
      	17:44:33.400261 tx 10:     OK, {tA=0.01s {M0100644 SZ=6 L=1 1000:1000 B8*4096 i0:841351 A 1552833873.399069 M 1552833873.399069 C 1552833873.399069}}
      	17:44:33.400296 rx 11: FLUSH i3 {Fh 2}
      	17:44:33.400305 tx 11:     OK
      	17:44:33.400324 rx 12: RELEASE i3 {Fh 2 NONBLOCK,0x8000  L0}
      	17:44:33.400334 tx 12:     OK
      
      		sleep here
      
      	17:44:33.500843 rx 13: GETATTR i3 {Fh 0}
      	17:44:33.500939 tx 13:     OK, {tA=0.01s {M0100644 SZ=5 L=1 1000:1000 B8*4096 i0:841351 A 1552833873.399069 M 1552833873.399069 C 1552833873.399069}}
      	17:44:33.501118 rx 14: OPEN i3 {O_RDONLY,0x8000}
      	17:44:33.501195 tx 14:     OK, {Fh 2 CACHE}
      	17:44:33.501468 rx 15: READ i3 {Fh 2 [0 +4096)  L 0 NONBLOCK,0x8000}
      	17:44:33.501500 tx 15:     OK,  4096b data (fd data)
      	17:44:33.501582 rx 16: GETATTR i3 {Fh 2}
      	17:44:33.501625 tx 16:     OK, {tA=0.01s {M0100644 SZ=5 L=1 1000:1000 B8*4096 i0:841351 A 1552833873.499071 M 1552833873.399069 C 1552833873.399069}}
      	17:44:33.502176 rx 17: FLUSH i3 {Fh 2}
      	17:44:33.502210 tx 17:     OK
      	17:44:33.502268 rx 18: RELEASE i3 {Fh 2 NONBLOCK,0x8000  L0}
      	17:44:33.502296 tx 18:     OK
      	17:44:33.547469 received ENODEV (unmount request), thread exiting
      	17:44:33.547471 received ENODEV (unmount request), thread exiting
      	17:44:33.547469 received ENODEV (unmount request), thread exiting
      	--- FAIL: TestFopenKeepCache (0.15s)
      	    cache_test.go:147: ReadFile: got "after", want cached "before"
      
      In other words, the test was racy and passed only because conditions in
      a particular environment made it likely to win the race. Here is an
      example debug trace when those conditions are met:
      
      	17:52:00.119419 rx 7: LOOKUP i1 ["file.txt"] 9b
      	17:52:00.119818 tx 7:     OK, {i3 g2 tE=1s tA=0.01s {M0100644 SZ=6 L=1 1000:1000 B8*4096 i0:853832 A 1552834320.116131 M 1552834320.116131 C 1552834320.116131}}
      	17:52:00.122865 rx 8: OPEN i3 {O_RDONLY,0x8000}
      	17:52:00.122889 tx 8:     OK, {Fh 2 CACHE}
      	17:52:00.122933 rx 9: READ i3 {Fh 2 [0 +4096)  L 0 NONBLOCK,0x8000}
      	17:52:00.122957 tx 9:     OK,  4096b data (fd data)
      	17:52:00.123014 rx 10: GETATTR i3 {Fh 2}
      	17:52:00.123031 tx 10:     OK, {tA=0.01s {M0100644 SZ=6 L=1 1000:1000 B8*4096 i0:853832 A 1552834320.116131 M 1552834320.116131 C 1552834320.116131}}
      	17:52:00.123050 rx 11: FLUSH i3 {Fh 2}
      	17:52:00.123056 tx 11:     OK
      	17:52:00.123071 rx 12: RELEASE i3 {Fh 2 NONBLOCK,0x8000  L0}
      	17:52:00.123082 tx 12:     OK
      
      	17:52:00.123105 rx 13: OPEN i3 {O_RDONLY,0x8000} 		<-- NOTE: OPEN, but no GETATTR around
      	17:52:00.123124 tx 13:     OK, {Fh 2 CACHE}
      	17:52:00.123146 rx 14: FLUSH i3 {Fh 2}
      	17:52:00.123152 tx 14:     OK
      	17:52:00.123164 rx 15: RELEASE i3 {Fh 2 NONBLOCK,0x8000  L0}
      	17:52:00.123183 tx 0:     NOTIFY_INVAL_ENTRY, {parent i1 sz 8} "file.txt"
      	17:52:00.123186 tx 15:     OK
      
      However, starting from Linux 4.20 the kernel always issues a GETATTR
      request around the second OPEN, for example:
      
      	18:34:22.323238 rx 26: LOOKUP i1 ["file.txt"] 9b
      	18:34:22.323309 tx 26:     OK, {i3 g2 tE=1s tA=1s {M0100644 SZ=6 L=1 1000:1000 B8*4096 i0:1531145 A 1550252062.321237 M 1550252062.321237 C 1550252062.321237}}
      	18:34:22.323339 rx 28: OPEN i3 {O_RDONLY,0x8000}
      	18:34:22.323384 tx 28:     OK, {Fh 2 CACHE}
      	18:34:22.323441 rx 30: READ i3 {Fh 2 [0 +4096)  L 0 NONBLOCK,0x8000}
      	18:34:22.323477 tx 30:     OK,  4096b data (fd data)
      	18:34:22.323534 rx 32: FLUSH i3 {Fh 2}
      	18:34:22.323546 tx 32:     OK
      	18:34:22.323577 rx 34: RELEASE i3 {Fh 2 NONBLOCK,0x8000  L0}
      	18:34:22.323594 tx 34:     OK
      	18:34:22.323611 rx 36: OPEN i3 {O_RDONLY,0x8000}		<-- NOTE: OPEN with GETATTR around
      	18:34:22.323636 tx 36:     OK, {Fh 2 CACHE}
      	18:34:22.323661 rx 38: GETATTR i3 {Fh 0}
      	18:34:22.323684 tx 38:     OK, {tA=1s {M0100644 SZ=5 L=1 1000:1000 B8*4096 i0:1531145 A 1550252062.322237 M 1550252062.322237 C 1550252062.322237}}
      	18:34:22.323729 rx 40: READ i3 {Fh 2 [0 +4096)  L 0 NONBLOCK,0x8000}
      	18:34:22.323740 tx 40:     OK,  4096b data (fd data)
      
      which almost always triggers the conditions for invalidating the data
      cache on the kernel side (different size and different mtime).
      
      The kernel is not doing anything wrong here: it is allowed to issue a
      GETATTR request at any time. This is thus only a change in kernel
      behaviour, still valid from the FUSE protocol point of view, and not a
      kernel regression.
      
      -> Fix the test
      
      - by disabling CAP_AUTO_INVAL_DATA via ExplicitDataCacheControl. This
        should stop the kernel from dropping the data cache on mtime change;
      - by using the same size for the before and after states. This avoids
        the cache being dropped when the kernel sees the file size change.
      
      Also make the test more picky, so that it tries to hit the pain points:
      
      - make sure that the mtime of file.txt differs between the before and
        after states. Without the added sleep, the panic on mtime δ == 0 is
        triggered reliably on my notebook.
      - issue an explicit stat before the second open to force the kernel to
        re-lookup/re-getattr the file and reread its attributes.
      
      Hopefully finally fixes https://github.com/hanwen/go-fuse/issues/168.
    • fuse/test: TestFopenKeepCache: denoise · 0d80af72
      Kirill Smelkov authored
      We have many calls to ioutil.WriteFile and ioutil.ReadFile, each followed
      by an error check. Move those calls into utility functions that call
      t.Fatal on any error. There is no need for an additional error prefix, as
      both ioutil.WriteFile and ioutil.ReadFile produce an os.PathError, which
      always includes the operation and the path on which it was performed.
    • nodefs += Mount · 39953190
      Kirill Smelkov authored
      Add nodefs.Mount, a utility to mount a root node over a mountpoint with
      the given options. We already had nodefs.MountRoot, but it took only
      nodefs.Options and there was no way to pass fuse.MountOptions in. The
      new utility accepts both fuse.MountOptions and nodefs.Options, each
      covering its own level.
      
      This should be generally useful(*), and it will also be used in the next
      patch in TestFopenKeepCache, where fuse.MountOptions.PreciseDataCacheControl
      needs to be set.
      
      (*) See e.g. https://lab.nexedi.com/kirr/wendelin.core/blob/8f497094/wcfs/misc.go#L258
      for an example of users rolling their own version of Mount to be able to
      pass in fuse.MountOptions.
    • fuse: allow filesystems to disable CAP_AUTO_INVAL_DATA · 48c07fc7
      Kirill Smelkov authored
      CAP_AUTO_INVAL_DATA is a capability of the kernel, but from the point of
      view of a filesystem it is not a capability but a behaviour request: if
      set, it asks the kernel (the filesystem client) to perform data cache
      invalidations based on heuristics. The current heuristic is to drop the
      data cache for a file if the kernel sees the file's mtime change:
      
      	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/inode.c?id=v5.0-0-g1c163f4c7b3f#n238
      	https://git.kernel.org/linus/eed2179efe
      
      However, this can be unwanted behaviour if the filesystem takes care to
      invalidate the local file cache explicitly: despite its attempts to
      precisely preserve the data cache, the whole cache of the file is
      needlessly dropped.
      
      -> To fix this, add a new mount option for filesystems to indicate that
      they are careful with respect to data cache invalidations and are thus
      fully responsible for invalidating the data cache. Teach go-fuse not to
      send CAP_AUTO_INVAL_DATA to the kernel on FUSE handshake in this mode.
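      
      A sketch of the handshake side, assuming the FUSE_* bit values from
      include/uapi/linux/fuse.h and the default flag set visible in the
      "tx 1" INIT reply of the TestFopenKeepCache trace above;
      handshakeFlags is a hypothetical helper, not go-fuse's actual code.

```go
package main

// Capability bits from include/uapi/linux/fuse.h.
const (
	capAsyncRead      uint32 = 1 << 0  // FUSE_ASYNC_READ
	capBigWrites      uint32 = 1 << 5  // FUSE_BIG_WRITES
	capAutoInvalData  uint32 = 1 << 12 // FUSE_AUTO_INVAL_DATA
	capReaddirplus    uint32 = 1 << 13 // FUSE_DO_READDIRPLUS
	capNoOpenSupport  uint32 = 1 << 17 // FUSE_NO_OPEN_SUPPORT
	capParallelDirops uint32 = 1 << 18 // FUSE_PARALLEL_DIROPS
)

// defaultInitFlags mirrors the flags listed in the traced INIT reply.
const defaultInitFlags = capAsyncRead | capNoOpenSupport | capBigWrites |
	capAutoInvalData | capReaddirplus | capParallelDirops

// handshakeFlags returns the flags to send in the INIT reply. A filesystem
// that manages its data cache explicitly simply does not ask for
// CAP_AUTO_INVAL_DATA, so the kernel no longer drops the cache on mtime change.
func handshakeFlags(explicitDataCacheControl bool) uint32 {
	flags := defaultInitFlags
	if explicitDataCacheControl {
		flags &^= capAutoInvalData
	}
	return flags
}
```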
      
      Note that as of the upcoming Linux 5.1 (expected to be released mid 2019),
      the FUSE client in the kernel still automatically and unconditionally
      drops the whole data cache of a file if it sees a size change. Kernel and
      go-fuse patches to fix that are here:
      
      	https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/commit/?id=ad2ba64dd489
      	https://github.com/hanwen/go-fuse/pull/273
      
      The kernel patch is likely to enter the mainline kernel when the 5.2
      merge window opens. We'll remove the XXX in a follow-up go-fuse patch
      that adds support for CAP_EXPLICIT_INVAL_DATA.
    • fuse: Add FOPEN_STREAM · 98229206
      Kirill Smelkov authored
      FOPEN_STREAM, together with FOPEN_NONSEEKABLE, must be used on
      stream-like file handles that provide both read and write, to avoid
      hitting a deadlock in the kernel.
      
      Please see the following kernel patch for details on how the deadlock
      can happen:
      
      	https://git.kernel.org/linus/10dce8af3422
      
      The patch adding FOPEN_STREAM to kernel FUSE is now in fuse.git#for-next
      and is likely to enter the mainline kernel, one way or another, when the
      5.2 merge window opens:
      
      	https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/commit/?id=bbd84f33652
      	https://lore.kernel.org/linux-fsdevel/CAHk-=whQQdoQsgEx1vO7OkfPDcV5hurPnMRLgzfXAPN63n5Sbg@mail.gmail.com/
      	https://lore.kernel.org/linux-fsdevel/CAHk-=wjEOyba5As1PEMk6RitNVOJH9oJ_Jbg4y=5B0fcX1iKGw@mail.gmail.com/
      	https://lore.kernel.org/linux-fsdevel/CAHk-=wgh234SyBG810=vB360PCzVkAhQRqGg8aFdATZd+daCFw@mail.gmail.com/
      	https://lore.kernel.org/linux-fsdevel/20190424183012.GB3798@deco.navytux.spb.ru/
      
      Here is an example of FOPEN_STREAM usage:
      
      	https://lab.nexedi.com/kirr/wendelin.core/blob/7783ecf4/wcfs/misc.go#L327-335
      	https://lab.nexedi.com/kirr/wendelin.core/blob/7783ecf4/wcfs/misc.go#L276-428
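      
      A sketch of what marking a stream-like handle could look like in the
      OPEN reply. The FOPEN_* values follow include/uapi/linux/fuse.h
      (FOPEN_STREAM is 1<<4); streamOpenFlags and isStreamLike are
      hypothetical helpers, not go-fuse API.

```go
package main

// FOPEN_* flags of the OPEN reply, from include/uapi/linux/fuse.h.
const (
	fopenKeepCache   uint32 = 1 << 1 // FOPEN_KEEP_CACHE
	fopenNonseekable uint32 = 1 << 2 // FOPEN_NONSEEKABLE
	fopenStream      uint32 = 1 << 4 // FOPEN_STREAM
)

// streamOpenFlags adds FOPEN_STREAM and FOPEN_NONSEEKABLE to base, the flags
// the filesystem would otherwise return from OPEN for this handle.
func streamOpenFlags(base uint32) uint32 {
	return base | fopenStream | fopenNonseekable
}

// isStreamLike reports whether both flags required for a read+write
// stream-like handle are set.
func isStreamLike(flags uint32) bool {
	const both = fopenStream | fopenNonseekable
	return flags&both == both
}
```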
    • fs: tweak debug String method · 698ea1ab
      Han-Wen Nienhuys authored
    • 2ac66408
      Han-Wen Nienhuys authored
    • 815ee9aa
      Han-Wen Nienhuys authored
  2. 23 Apr, 2019 3 commits
    • Merge branch 'master' into t · 8e7f68ca
      Kirill Smelkov authored
      * master:
        fuse: Don't skip WRITE details in debug output
    • fuse: Don't skip WRITE details in debug output · 19ede699
      Kirill Smelkov authored
      When the kernel sends a WRITE request, it includes the file handle,
      offset, size, etc. We were not printing any of this. Compare, e.g., the
      debug output for TestUtimesNano before and after the patch:
      
      before:
      
      	rx 8: CREATE i1 {0100600 [CREAT,TRUNC,WRONLY,0x8000] (00)} ["hello.txt"] 10b
      	tx 8:     OK, {i3 g2 {M0100600 SZ=0 L=1 1000:1000 B0*4096 i0:733462 A 1556038355.426630 M 1556038355.426630 C 1556038355.426630} &{2 0 0}}
      	rx 9: GETXATTR i3 {sz 0} ["security.capability"] 20b
      	tx 9:     61=no data available
      	rx 10: WRITE i3  3b							<-- NOTE
      	tx 10:     OK
      
      after:
      
      	rx 8: CREATE i1 {0100600 [WRONLY,CREAT,TRUNC,0x8000] (00)} ["hello.txt"] 10b
      	tx 8:     OK, {i3 g2 {M0100600 SZ=0 L=1 1000:1000 B0*4096 i0:736300 A 1556038379.359197 M 1556038379.359197 C 1556038379.359197} &{2 0 0}}
      	rx 9: GETXATTR i3 {sz 0} ["security.capability"] 20b
      	tx 9:     61=no data available
      	rx 10: WRITE i3 {Fh 2 [0 +3)  L 0 WRONLY,NONBLOCK,0x8000}  3b		<-- NOTE
      	tx 10:     OK
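      
      The "after" format can be reproduced with a small String method. This is
      a sketch assuming a hypothetical writeIn struct and x86-64 open(2) flag
      values; it is not go-fuse's actual WriteIn type.

```go
package main

import (
	"fmt"
	"strings"
)

// Linux open(2) flag values on x86-64 that appear in the trace. 0x8000
// (O_LARGEFILE) is deliberately left unnamed, as in the trace.
var openFlagNames = []struct {
	bit  uint32
	name string
}{
	{0x1, "WRONLY"},
	{0x2, "RDWR"},
	{0x400, "APPEND"},
	{0x800, "NONBLOCK"},
}

// flagString names known bits in ascending order and prints leftovers in hex.
func flagString(flags uint32) string {
	var parts []string
	for _, f := range openFlagNames {
		if flags&f.bit != 0 {
			parts = append(parts, f.name)
			flags &^= f.bit
		}
	}
	if flags != 0 {
		parts = append(parts, fmt.Sprintf("0x%x", flags))
	}
	return strings.Join(parts, ",")
}

// writeIn is a hypothetical subset of the WRITE request header.
type writeIn struct {
	Fh        uint64
	Offset    uint64
	Size      uint32
	LockOwner uint64
	Flags     uint32 // open flags of the file handle
}

// String renders handle, byte range, lock owner and flags.
func (in *writeIn) String() string {
	return fmt.Sprintf("{Fh %d [%d +%d)  L %d %s}",
		in.Fh, in.Offset, in.Size, in.LockOwner, flagString(in.Flags))
}
```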
    • . · ae0f3969
      Kirill Smelkov authored
  3. 17 Apr, 2019 5 commits
  4. 16 Apr, 2019 2 commits
    • Add TestParallelDiropsHang / emulate gvfs-udisks2-volume-monitor · 6560fb0d
      Jakob Unterwurzacher authored
      There is a hang that appears when enabling CAP_PARALLEL_DIROPS on Linux
      4.15.0: https://github.com/hanwen/go-fuse/issues/281
      
      The hang was originally triggered by gvfs-udisks2-volume-monitor. This
      test emulates what gvfs-udisks2-volume-monitor does.
      
      On 4.15.0 kernels, the test will get stuck, and after 120 seconds you
      get a kernel backtrace like this:
      
      [ 1813.463679] INFO: task nodefs.test:2357 blocked for more than 120 seconds.
      [ 1813.463685]       Not tainted 4.15.0-45-generic #48~16.04.1-Ubuntu
      [ 1813.463687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 1813.463689] nodefs.test     D    0  2357   2311 0x00000004
      [ 1813.463691] Call Trace:
      [ 1813.463709]  __schedule+0x3d6/0x8b0
      [ 1813.463712]  schedule+0x36/0x80
      [ 1813.463714]  schedule_preempt_disabled+0xe/0x10
      [ 1813.463716]  __mutex_lock.isra.2+0x2ae/0x4e0
      [ 1813.463720]  ? ___slab_alloc+0x223/0x4e0
      [ 1813.463722]  ? _cond_resched+0x1a/0x50
      [ 1813.463724]  __mutex_lock_slowpath+0x13/0x20
      [ 1813.463725]  ? __mutex_lock_slowpath+0x13/0x20
      [ 1813.463727]  mutex_lock+0x2f/0x40
      [ 1813.463729]  fuse_lock_inode+0x2a/0x30
      [ 1813.463732]  fuse_lookup+0x31/0x140
      [ 1813.463735]  ? d_alloc_parallel+0xc1/0x4c0
      [ 1813.463738]  fuse_atomic_open+0x6d/0xf0
      [ 1813.463740]  path_openat+0xc5d/0x13f0
      [ 1813.463744]  do_filp_open+0x99/0x110
      [ 1813.463747]  ? __check_object_size+0xfc/0x1a0
      [ 1813.463749]  ? __alloc_fd+0x46/0x170
      [ 1813.463752]  do_sys_open+0x12d/0x290
      [ 1813.463754]  ? do_sys_open+0x12d/0x290
      [ 1813.463756]  SyS_openat+0x14/0x20
      [ 1813.463759]  do_syscall_64+0x73/0x130
      [ 1813.463762]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    • dirstream_linux: fix zero terminator search · 25ee0996
      Jakob Unterwurzacher authored
      The old code searched for the first non-null byte from the end of
      the slice. This assumes that all bytes after the name are initialized
      to zero, which does not hold true on Linux 4.15.
      
      Instead, search for the first null byte from the start of the slice;
      readdir(3) guarantees that d_name is null-terminated.
      
      Fixes https://github.com/hanwen/go-fuse/issues/287
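      
      The difference between the two searches, sketched on a raw d_name buffer
      whose bytes after the terminator are not zero. nameOld approximates the
      previous behaviour; both function names are hypothetical.

```go
package main

import "bytes"

// nameOld approximates the old approach: strip NUL bytes from the end of the
// buffer. Any non-zero garbage after the terminator stays in the name.
func nameOld(dName []byte) string {
	return string(bytes.TrimRight(dName, "\x00"))
}

// nameNew cuts the buffer at the first NUL from the start, relying on
// d_name being NUL-terminated.
func nameNew(dName []byte) string {
	if i := bytes.IndexByte(dName, 0); i >= 0 {
		return string(dName[:i])
	}
	return string(dName)
}
```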
  5. 09 Apr, 2019 6 commits
  6. 08 Apr, 2019 10 commits
  7. 07 Apr, 2019 3 commits
    • nodefs: protect dirstream overflow with lock · 7572e9d8
      Han-Wen Nienhuys authored
      Appeases the race detector.
    • nodefs: add TestReadDirStress · 623db2fc
      Jakob Unterwurzacher authored
      This currently fails on Linux 5.0 and may be related to
      https://github.com/hanwen/go-fuse/issues/287 .
      
      1 jakob@brikett:~/go/src/github.com/hanwen/go-fuse/nodefs$ go test
      21:42:47.356529 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
      21:42:47.598309 writer: Write/Writev failed, err: 22=invalid argument. opcode: READDIRPLUS
      21:42:47.604424 writer: Write/Writev failed, err: 22=invalid argument. opcode: READDIRPLUS
      21:42:47.606073 writer: Write/Writev failed, err: 22=invalid argument. opcode: READDIRPLUS
      --- FAIL: TestReadDirStress (0.36s)
          simple_test.go:270: goroutine 2 iteration 5: readdirent: input/output error
          simple_test.go:270: goroutine 1 iteration 9: readdirent: input/output error
          simple_test.go:270: goroutine 3 iteration 10: readdirent: input/output error
          simple_test.go:43: /usr/bin/fusermount: entry for /tmp/TestReadDirStress639795934/mnt not found in /etc/mtab
               (code exit status 1)
      
      FAIL
      exit status 1
      FAIL	github.com/hanwen/go-fuse/nodefs	0.994s
    • nodefs: protect against double close · a856a74e
      Han-Wen Nienhuys authored
      Access file descriptors under a lock, and set them to -1 on close. This
      avoids confusing errors if Close() is called twice.
  8. 06 Apr, 2019 1 commit