  1. 04 Sep, 2024 1 commit
  2. 29 Aug, 2024 2 commits
  3. 15 Apr, 2024 1 commit
    • fuse: fix wrong ff->iomode state changes from parallel dio write · 4864a6dd
      Amir Goldstein authored
      There is a confusion with fuse_file_uncached_io_{start,end} interface.
      These helpers do two things when called from passthrough open()/release():
      1. Take/drop negative refcount of fi->iocachectr (inode uncached io mode)
      2. State change ff->iomode IOM_NONE <-> IOM_UNCACHED (file uncached open)
      
      The calls from parallel dio write path need to take a reference on
      fi->iocachectr, but they should not be changing ff->iomode state, because
      in this case, the fi->iocachectr reference does not stick around until file
      release().
      
      Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from
      parallel dio write path and rename fuse_file_*cached_io_{start,end} helpers
      to fuse_file_*cached_io_{open,release} to clarify the difference.
      
      Fixes: 205c1d80 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
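The split described in the commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_inode`, `model_file`, the helper names and the `-EBUSY` error code are hypothetical stand-ins, not the kernel's definitions. It shows inode-level helpers that touch only the counter, and file-level helpers that also flip the per-file io mode.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-ins for the kernel's fuse_inode / fuse_file. */
enum iomode { IOM_NONE, IOM_UNCACHED };

struct model_inode {
	int iocachectr;              /* negative: uncached io users on the inode */
};

struct model_file {
	struct model_inode *inode;
	enum iomode iomode;          /* per-file open state */
};

/* Inode-level helpers: counter only, as wanted by the parallel dio write path. */
static int inode_uncached_io_start(struct model_inode *fi)
{
	if (fi->iocachectr > 0)
		return -EBUSY;       /* assumption: error code chosen for illustration */
	fi->iocachectr--;            /* take a negative reference */
	return 0;
}

static void inode_uncached_io_end(struct model_inode *fi)
{
	assert(fi->iocachectr < 0);
	fi->iocachectr++;            /* drop the negative reference */
}

/* File-level helpers: counter plus the IOM_NONE <-> IOM_UNCACHED state change. */
static int file_uncached_io_open(struct model_file *ff)
{
	int err = inode_uncached_io_start(ff->inode);

	if (err)
		return err;
	ff->iomode = IOM_UNCACHED;   /* this reference sticks around until release */
	return 0;
}

static void file_uncached_io_release(struct model_file *ff)
{
	assert(ff->iomode == IOM_UNCACHED);
	ff->iomode = IOM_NONE;
	inode_uncached_io_end(ff->inode);
}
```

In this model a parallel dio write would call only the inode-level pair, so the file's `iomode` is left untouched while the write is in flight.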
  4. 06 Mar, 2024 2 commits
    • fuse: get rid of ff->readdir.lock · cdf6ac2a
      Miklos Szeredi authored
      The same protection is provided by file->f_pos_lock.
      
      Note, this relies on the fact that file->f_mode has FMODE_ATOMIC_POS.
      This flag is cleared by stream_open(), which would prevent locking of
      f_pos_lock.
      
      Prior to commit 7de64d52 ("fuse: break up fuse_open_common()")
      FOPEN_STREAM on a directory would cause stream_open() to be called.
      After this commit this is not done anymore, so f_pos_lock will always
      be locked.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Fix missing FOLL_PIN for direct-io · 738adade
      Lei Huang authored
      Our user space filesystem relies on fuse to provide POSIX interface.
      In our test, a known string is written into a file and the content
      is read back later to verify correct data returned. We observed wrong
      data returned in read buffer in rare cases although correct data are
      stored in our filesystem.
      
      The fuse kernel module calls iov_iter_get_pages2() to get the physical
      pages of the user-space read buffer passed in read(). The pages are
      not pinned, so page migration is not prevented. When page migration
      occurs, the consequences are twofold.
      
      1) Applications do not receive correct data in read buffer.
      2) fuse kernel writes data into a wrong place.
      
      Using iov_iter_extract_pages() to pin pages fixes the issue in our
      test.
      
      An auxiliary variable "struct page **pt_pages" is used in the patch
      to prepare the 2nd parameter for iov_iter_extract_pages() since
      iov_iter_get_pages2() uses a different type for the 2nd parameter.
      
      [SzM] add iov_iter_extract_will_pin(ii) and unpin only if true.
      Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  5. 05 Mar, 2024 7 commits
    • fuse: don't unhash root · b1fe686a
      Miklos Szeredi authored
      The root inode is assumed to be always hashed.  Do not unhash the root
      inode even if it is marked BAD.
      
      Fixes: 5d069dbe ("fuse: fix bad inode")
      Cc: <stable@vger.kernel.org> # v5.11
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: implement passthrough for mmap · fda0b98e
      Amir Goldstein authored
      An mmap request for a file open in passthrough mode maps the memory
      directly to the backing file.
      
      An mmap of a file in direct io mode, usually uses cached mmap and puts
      the inode in caching io mode, which denies new passthrough opens of that
      inode, because caching io mode is conflicting with passthrough io mode.
      
      For the same reason, trying to mmap a direct io file, while there is
      a passthrough file open on the same inode will fail with -ENODEV.
      
      An mmap of a file in direct io mode, also needs to wait for parallel
      dio writes in-progress to complete.
      
      If a passthrough file is opened, while an mmap of another direct io
      file is waiting for parallel dio writes to complete, the wait is aborted
      and mmap fails with -ENODEV.
      
      A FUSE server that uses passthrough and direct io opens on the same inode
      that may also be mmaped, is advised to provide a backing fd also for the
      files that are open in direct io mode (i.e. use the flags combination
      FOPEN_DIRECT_IO | FOPEN_PASSTHROUGH), so that mmap will always use the
      backing file, even if read/write do not passthrough.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: implement splice read/write passthrough · 5ca73468
      Amir Goldstein authored
      This allows passing fstests generic/249 and generic/591.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: implement read/write passthrough · 57e1176e
      Amir Goldstein authored
      Use the backing file read/write helpers to implement read/write
      passthrough to a backing file.
      
      After read/write, we invalidate a/c/mtime/size attributes.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: implement open in passthrough mode · 4a90451b
      Amir Goldstein authored
      After getting a backing file id with FUSE_DEV_IOC_BACKING_OPEN ioctl,
      a FUSE server can reply to an OPEN request with flag FOPEN_PASSTHROUGH
      and the backing file id.
      
      The FUSE server should reuse the same backing file id for all the open
      replies of the same FUSE inode and open will fail (with -EIO) if the
      server attempts to open the same inode with conflicting io modes or to
      set up passthrough to two different backing files for the same FUSE inode.
      Using the same backing file id for several different inodes is allowed.
      
      Opening a new file with FOPEN_DIRECT_IO for an inode that is already
      open for passthrough is allowed, but only if the FOPEN_PASSTHROUGH flag
      and correct backing file id are specified as well.
      
      The read/write IO of such files will not use passthrough operations to
      the backing file, but mmap, which does not support direct_io, will use
      the backing file instead of using the page cache as it always did.
      
      Even though all FUSE passthrough files of the same inode use the same
      backing file as a backing inode reference, each FUSE file opens a unique
      instance of a backing_file object to store the FUSE path that was used
      to open the inode and the open flags of the specific open file.
      
      The per-file, backing_file object is released along with the FUSE file.
      The inode associated fuse_backing object is released when the last FUSE
      passthrough file of that inode is released AND when the backing file id
      is closed by the server using the FUSE_DEV_IOC_BACKING_CLOSE ioctl.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
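The "same backing file per inode" rule from the commit above can be sketched as a small userspace model. Everything here is an assumption made for illustration: `model_inode`, `passthrough_open()` and `passthrough_release()` are hypothetical names, backing ids are assumed to be positive integers, and comparing ids stands in for the kernel's comparison of backing files.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-inode bookkeeping; not the kernel's fuse_inode. */
struct model_inode {
	int open_files;   /* number of open passthrough files on this inode */
	int backing_id;   /* id in use; only meaningful while open_files > 0 */
};

/* Returns 0 on success, -EIO if the open conflicts with existing opens. */
static int passthrough_open(struct model_inode *fi, int backing_id)
{
	assert(fi != NULL);
	if (backing_id <= 0)
		return -EIO;  /* assumption: valid ids are positive */
	if (fi->open_files > 0 && fi->backing_id != backing_id)
		return -EIO;  /* a different backing file for the same inode */
	fi->backing_id = backing_id;
	fi->open_files++;
	return 0;
}

static void passthrough_release(struct model_inode *fi)
{
	assert(fi != NULL && fi->open_files > 0);
	fi->open_files--;
}
```

Once the last passthrough file is released in this model, the inode accepts a different backing id again, which loosely mirrors the release rule in the commit message.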
    • fuse: prepare for opening file in passthrough mode · fc8ff397
      Amir Goldstein authored
      In preparation for opening file in passthrough mode, store the
      fuse_open_out argument in ff->args to be passed into fuse_file_io_open()
      with the optional backing_id member.
      
      This will be used for setting up passthrough to backing file on open
      reply with FOPEN_PASSTHROUGH flag and a valid backing_id.
      
      Opening a file in passthrough mode may fail for several reasons, such as
      missing capability, conflicting open flags or inode in caching mode.
      Return EIO from fuse_file_io_open() in those cases.
      
      The combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO is allowed -
      it means that read/write operations will go directly to the server,
      but mmap will be done to the backing file.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: implement ioctls to manage backing files · 44350256
      Amir Goldstein authored
      FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file
      descriptor.  If the call succeeds, a backing file identifier is returned.
      
      A later change will be using this backing file id in a reply to OPEN
      request with the flag FOPEN_PASSTHROUGH to setup passthrough of file
      operations on the open FUSE file to the backing file.
      
      The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the
      backing file by its id.
      
      This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH
      flag is still in progress, the open may fail if the backing file is
      closed before the fuse file was opened.
      
      Setting up backing files requires a server with CAP_SYS_ADMIN privileges.
      For the backing file to be successfully setup, the backing file must
      implement both read_iter and write_iter file operations.
      
      The limitation on the level of filesystem stacking allowed for the
      backing file is enforced before setting up the backing file.
      Signed-off-by: Alessio Balsini <balsini@android.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  6. 25 Feb, 2024 1 commit
    • fuse: fix UAF in rcu pathwalks · 053fc4f7
      Al Viro authored
      ->permission(), ->get_link() and ->inode_get_acl() might dereference
      ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
      as well) when called from rcu pathwalk.
      
      Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
      and dropping ->user_ns rcu-delayed too.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 23 Feb, 2024 6 commits
  8. 04 Dec, 2023 2 commits
    • fuse: share lookup state between submount and its parent · c4d361f6
      Krister Johansen authored
      Fuse submounts do not perform a lookup for the nodeid that they inherit
      from their parent.  Instead, the code decrements the nlookup on the
      submount's fuse_inode when it is instantiated, and no forget is
      performed when a submount root is evicted.
      
      Trouble arises when the submount's parent is evicted despite the
      submount itself being in use.  In this author's case, the submount was
      in a container and detached from the initial mount namespace via a
      MNT_DETACH operation.  When memory pressure triggered the shrinker, the
      inode from the parent was evicted, which triggered enough forgets to
      render the submount's nodeid invalid.
      
      Since submounts should still function, even if their parent goes away,
      solve this problem by sharing refcounted state between the parent and
      its submount.  When all of the references on this shared state reach
      zero, it's safe to forget the final lookup of the fuse nodeid.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
      Cc: stable@vger.kernel.org
      Fixes: 1866d779 ("fuse: Allow fuse_fill_super_common() for submounts")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
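The shared refcounted state described in the commit above can be illustrated with a minimal userspace sketch. The struct and function names are hypothetical, and incrementing a counter stands in for actually sending a forget to the server.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared state; one instance is shared by a parent inode
 * and the submount root that inherited its nodeid. */
struct shared_lookup {
	uint64_t nodeid;
	int refs;            /* holders of this shared state */
	int forgets_sent;    /* stand-in for forgets sent to the server */
};

static void shared_lookup_get(struct shared_lookup *s)
{
	assert(s->refs > 0); /* only an existing holder can hand out a reference */
	s->refs++;
}

static void shared_lookup_put(struct shared_lookup *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->forgets_sent++; /* last holder gone: safe to forget the nodeid */
}
```

With this shape, evicting the parent while the submount is still in use only drops one reference, so the nodeid stays valid until the submount lets go as well.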
    • fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP · c55e0a55
      Tyler Fanelli authored
      Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its
      description indicates a purpose of reducing memory footprint. This
      may imply that it could be further used to relax other DIRECT_IO
      operations in the future.
      
      Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing,
      allow shared mmap of DIRECT_IO files while still bypassing the cache
      on regular reads and writes.
      
      [Miklos] Also keep DIRECT_IO_RELAX definition for backward compatibility.
      Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
      Fixes: e78662e8 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
      Cc: <stable@vger.kernel.org> # v6.6
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  9. 09 Oct, 2023 1 commit
  10. 21 Aug, 2023 2 commits
    • fuse: cache btime · 972f4c46
      Miklos Szeredi authored
      Not all inode attributes are supported by all filesystems, but for the
      basic stats (which are returned by stat(2) and friends) all of them will
      have some value, even if that doesn't reflect a real attribute of the file.
      
      Btime is different, in that filesystems are free to report or not report a
      value in statx.  If the value is available, then STATX_BTIME bit is set in
      stx_mask.
      
      When caching the value of btime, remember the availability of the attribute
      as well as the value (if available).  This is done by using the
      FUSE_I_BTIME bit in fuse_inode->state to indicate availability, while using
      fuse_inode->inval_mask & STATX_BTIME to indicate the state of the cache
      itself (i.e. set if cache is invalid, and cleared if cache is valid).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
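The two independent pieces of btime state described in the commit above (availability versus cache validity) can be modelled roughly as follows. This is a sketch: `model_inode`, `MODEL_I_BTIME` and the helper names are hypothetical, and only the `0x800` mask value is taken from the STATX_BTIME definition in the Linux UAPI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_STATX_BTIME 0x00000800U  /* same value as STATX_BTIME */
#define MODEL_I_BTIME     (1UL << 0)   /* hypothetical bit in the state word */

/* Hypothetical stand-in for the relevant fuse_inode fields. */
struct model_inode {
	unsigned long state;   /* MODEL_I_BTIME set: server reported a btime */
	uint32_t inval_mask;   /* MODEL_STATX_BTIME set: cached btime is stale */
	int64_t btime_sec;
};

/* Record a fresh answer from the server, whether or not btime was reported. */
static void cache_btime(struct model_inode *fi, bool available, int64_t sec)
{
	if (available) {
		fi->state |= MODEL_I_BTIME;
		fi->btime_sec = sec;
	} else {
		fi->state &= ~MODEL_I_BTIME;
	}
	fi->inval_mask &= ~MODEL_STATX_BTIME;  /* cache is now valid */
}

static bool btime_cache_valid(const struct model_inode *fi)
{
	return (fi->inval_mask & MODEL_STATX_BTIME) == 0;
}

static bool btime_available(const struct model_inode *fi)
{
	return (fi->state & MODEL_I_BTIME) != 0;
}

static int64_t cached_btime(const struct model_inode *fi)
{
	assert(btime_cache_valid(fi) && btime_available(fi));
	return fi->btime_sec;
}
```

Keeping the two bits separate lets the model cache the answer "this file has no btime" without asking the server again.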
    • fuse: implement statx · d3045530
      Miklos Szeredi authored
      Allow querying btime.  When btime is requested in mask, then FUSE_STATX
      request is sent.  Otherwise keep using FUSE_GETATTR.
      
      The userspace interface for statx matches that of the statx(2) API.
      However there are limitations on how this interface is used:
      
       - returned basic stats and btime are used, stx_attributes, etc. are
         ignored
      
       - always query basic stats and btime, regardless of what was requested
      
       - requested sync type is ignored, the default is passed to the server
      
       - if server returns with some attributes missing from the result_mask,
         then no attributes will be cached
      
       - btime is not cached yet (next patch will fix that)
      
      For new inodes initialize fi->inval_mask to "all invalid", instead of "all
      valid" as previously.  Also only clear basic stats from inval_mask when
      caching attributes.  This will result in the caching logic not thinking
      that btime is cached.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  11. 16 Aug, 2023 2 commits
    • fuse: add ATTR_TIMEOUT macro · 9dc10a54
      Miklos Szeredi authored
      Next patch will introduce yet another type of attribute reply.  Add a macro
      that can handle attribute timeouts for all of the structs.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: add a new fuse init flag to relax restrictions in no cache mode · e78662e8
      Hao Xu authored
      FOPEN_DIRECT_IO is usually set by fuse daemon to indicate need of strong
      coherency, e.g. network filesystems.  Thus shared mmap is disabled since it
      leverages page cache and may write to it, which may cause inconsistency.
      
      But FOPEN_DIRECT_IO can be used not for coherency but to reduce memory
      footprint as well, e.g. reduce guest memory usage with virtiofs.
      Therefore, add a new fuse init flag FUSE_DIRECT_IO_RELAX to relax
      restrictions in that mode, currently, it allows shared mmap.  One thing to
      note is to make sure it doesn't break coherency in your use case.
      Signed-off-by: Hao Xu <howeyxu@tencent.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  12. 26 Jan, 2023 2 commits
    • fuse: optional supplementary group in create requests · 8ed7cb3f
      Miklos Szeredi authored
      Permission to create an object (create, mkdir, symlink, mknod) needs to
      take supplementary groups into account.
      
      Add a supplementary group request extension.  This can contain an arbitrary
      number of group IDs and can be added to any request.  This extension is not
      added to any request by default.
      
      Add FUSE_CREATE_SUPP_GROUP init flag to enable supplementary group info in
      creation requests.  This adds just a single supplementary group that
      matches the parent group in the case described above.  In other cases the
      extension is not added.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: add request extension · 15d937d7
      Miklos Szeredi authored
      Will need to add supplementary groups to create messages, so add the
      general concept of a request extension.  A request extension is appended to
      the end of the main request.  It has a header indicating the size and type
      of the extension.
      
      The create security context (fuse_secctx_*) is similar to the generic
      request extension, so include that as well in a backward compatible manner.
      
      Add the total extension length to the request header.  The offset of the
      extension block within the request can be calculated by:
      
        inh->len - inh->total_extlen * 8
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
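The offset formula quoted in the commit above, `inh->len - inh->total_extlen * 8`, can be worked through with a small sketch. The struct below is a hypothetical two-field subset used only for illustration, and the field widths are assumptions; the formula itself implies that `total_extlen` counts 8-byte units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the request header: only the fields the formula uses. */
struct model_in_header {
	uint32_t len;           /* total request length in bytes, extensions included */
	uint16_t total_extlen;  /* total extension length, in 8-byte units */
};

/* Byte offset of the extension block from the start of the request. */
static size_t ext_offset(const struct model_in_header *inh)
{
	size_t ext_bytes = (size_t)inh->total_extlen * 8;

	assert(ext_bytes <= inh->len);  /* extensions cannot exceed the request */
	return inh->len - ext_bytes;
}
```

For example, a 120-byte request with `total_extlen` of 2 carries 16 bytes of extensions, so the extension block starts at byte 104. A request with `total_extlen` of 0 has no extension block and the computed offset equals `len`.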
  13. 24 Jan, 2023 1 commit
    • fuse: fixes after adapting to new posix acl api · facd6105
      Christian Brauner authored
      This cycle we ported all filesystems to the new posix acl api. While
      looking at further simplifications in this area to remove the last
      remnants of the generic dummy posix acl handlers we realized that we
      regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
      of posix acls.
      
      With the change to a dedicated posix acl api, interacting with posix acls
      doesn't go through the old xattr codepaths anymore and instead only
      relies on the get acl and set acl inode operations.
      
      Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
      to get and set posix acl albeit with two caveats. First, that posix acls
      aren't cached. And second, that they aren't used for permission checking
      in the vfs.
      
      We regressed that use-case as we currently refuse to retrieve any posix
      acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
      would see a change in behavior.
      
      We can restore the old behavior in multiple ways. We could change the
      new posix acl api and look for a dedicated xattr handler and if we find
      one prefer that over the dedicated posix acl api. That would break the
      consistency of the new posix acl api so we would very much prefer not to
      do that.
      
      We could introduce a new ACL_*_CACHE sentinel that would instruct the
      vfs permission checking codepath to not call into the filesystem and
      ignore acls.
      
      But a more straightforward fix for v6.2 is to do the same thing that
      Overlayfs does and give fuse a separate get acl method for permission
      checking. Overlayfs uses this to express different needs for vfs
      permission lookup and acl based retrieval via the regular system call
      path as well. Let fuse do the same for now. This way fuse can continue
      to refuse to retrieve posix acls for daemons that don't set
      FUSE_POSIX_ACL for permission checking while allowing a fuse server to
      retrieve it via the usual system calls.
      
      In the future, we could extend the get acl inode operation to not just
      pass a simple boolean to indicate rcu lookup but instead make it a flag
      argument. Then in addition to passing the information that this is an
      rcu lookup to the filesystem we could also introduce a flag that tells
      the filesystem that this is a request from the vfs to use these acls for
      permission checking. Then fuse could refuse the get acl request for
      permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
      the same get acl method. This would also help Overlayfs and allow us to
      remove the second method for it as well.
      
      But since that change is more invasive as we need to update the get acl
      inode operation for multiple filesystems we should not do this as a fix
      for v6.2. Instead we will do this for the v6.3 merge window.
      
      Fwiw, since posix acls are now always correctly translated in the new
      posix acl api we could also allow them to be used for daemons without
      FUSE_POSIX_ACL that are not mounted on the host. But this is a behavioral
      change and again, if done, should be done for v6.3. For now, let's just
      restore the original behavior.
      
      A nice side-effect of this change is that for fuse daemons with and
      without FUSE_POSIX_ACL the same code is used for posix acls in a
      backwards compatible way. This also means we can remove the legacy xattr
      handlers completely. We've also added comments to explain the expected
      behavior for daemons without FUSE_POSIX_ACL into the code.
      
      Fixes: 318e6685 ("xattr: use posix acl api")
      Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  14. 19 Jan, 2023 2 commits
    • fs: port ->fileattr_set() to pass mnt_idmap · 8782a9ae
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevant on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: port ->set_acl() to pass mnt_idmap · 13e83a49
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevant on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  15. 23 Nov, 2022 2 commits
    • fuse: Rearrange fuse_allow_current_process checks · b1387777
      Dave Marchevsky authored
      This is a followup to a previous commit of mine [0], which added the
      allow_sys_admin_access && capable(CAP_SYS_ADMIN) check.  This patch
      rearranges the order of checks in fuse_allow_current_process without
      changing functionality.
      
      Commit 9ccf47b2 ("fuse: Add module param for CAP_SYS_ADMIN access
      bypassing allow_other") added allow_sys_admin_access &&
      capable(CAP_SYS_ADMIN) check to the beginning of the function, with the
      reasoning that allow_sys_admin_access should be an 'escape hatch' for users
      with CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
      
      However, placing this new check first results in many capable() calls when
      allow_sys_admin_access is set, where another check would've also returned
      1.  This can be problematic when a BPF program is tracing capable() calls.
      
      At Meta we ran into such a scenario recently.  On a host where
      allow_sys_admin_access is set but most of the FUSE access is from processes
      which would pass other checks - i.e.  they don't need CAP_SYS_ADMIN 'escape
      hatch' - this results in an unnecessary capable() call for each fs op.  We
      also have a daemon tracing capable() with BPF and doing some data
      collection, so tracing these extraneous capable() calls has the potential
      to regress performance for an application doing many FUSE ops.
      
      So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
      hatch' is checked last.  Add a small helper, fuse_permissible_uidgid, to
      make the logic easier to understand.  Previously, if allow_other is set on
      the fuse_conn, uid/gid checking doesn't happen as current_in_userns result
      is returned.  These semantics are maintained here: fuse_permissible_uidgid
      check only happens if allow_other is not set.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Suggested-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
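The reordering described in the commit above can be sketched as a userspace model that counts how often the capability check runs. The struct, its boolean fields and `model_capable()` are hypothetical simplifications: the real checks inspect the current task and the fuse connection rather than precomputed flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts calls to the (simulated) capability check, to make the cost visible. */
static int capable_calls;

static bool model_capable(bool has_cap)
{
	capable_calls++;
	return has_cap;
}

/* Hypothetical snapshot of the inputs to the access decision. */
struct model_ctx {
	bool allow_other;            /* allow_other set on the connection */
	bool in_userns;              /* result of the current_in_userns check */
	bool uidgid_match;           /* result of the fuse_permissible_uidgid check */
	bool allow_sys_admin_access; /* module parameter */
	bool has_cap_sys_admin;
};

static bool allow_current_process(const struct model_ctx *c)
{
	bool allow;

	assert(c != NULL);
	if (c->allow_other)
		allow = c->in_userns;     /* uid/gid check is skipped in this case */
	else
		allow = c->uidgid_match;

	/* The 'escape hatch' is consulted last, and only if still denied. */
	if (!allow && c->allow_sys_admin_access &&
	    model_capable(c->has_cap_sys_admin))
		allow = true;

	return allow;
}
```

In this model a process that already passes the uid/gid check never reaches `model_capable()`, which is the property the commit is after for hosts that trace capable() calls.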
    • fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY · 4f8d3702
      Miklos Szeredi authored
      Add a flag to entry expiration that lets the filesystem expire a dentry
      without kicking it out from the cache immediately.
      
      This makes a difference for overmounted dentries, where plain invalidation
      would detach all submounts before dropping the dentry from the cache.  If
      only expiry is set on the dentry, then any overmounts are left alone
      until ->d_revalidate() is called.
      
      Note: ->d_revalidate() is not called for the case of following a submount,
      so invalidation will only be triggered for the non-overmounted case.  The
      dentry could also be mounted in a different mount instance, in which case
      any submounts will still be detached.
      Suggested-by: Jakob Blomer <jblomer@cern.ch>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  16. 19 Oct, 2022 1 commit
    • fs: pass dentry to set acl method · 138060ba
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fixup posix acls before storing or reporting it to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      Since some filesystems rely on the dentry being available to them when
      setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode
      operation. But since ->set_acl() is required in order to use the generic
      posix acl xattr handlers filesystems that do not implement this inode
      operation cannot use the handler and need to implement their own
      dedicated posix acl handlers.
      
      Update the ->set_acl() inode method to take a dentry argument. This
      allows all filesystems to rely on ->set_acl().
      
      As far as I can tell all codepaths can be switched to rely on the dentry
      instead of just the inode. Note that the original motivation for passing
      the dentry separate from the inode instead of just the dentry in the
      xattr handlers was because of security modules that call
      security_d_instantiate(). This hook is called during
      d_instantiate_new(), d_add(), __d_instantiate_anon(), and
      d_splice_alias() to initialize the inode's security context and possibly
      to set security.* xattrs. Since this only affects security.* xattrs this
      is completely irrelevant for posix acls.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  17. 24 Sep, 2022 1 commit
  18. 01 Apr, 2022 1 commit
  19. 07 Mar, 2022 1 commit
    • fuse: fix pipe buffer lifetime for direct_io · 0c4bcfde
      Miklos Szeredi authored
      In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
      fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
      imports the write buffer with fuse_get_user_pages(), which uses
      iov_iter_get_pages() to grab references to userspace pages instead of
      actually copying memory.
      
      On the filesystem device side, these pages can then either be read to
      userspace (via fuse_dev_read()), or splice()d over into a pipe using
      fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
      
      This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
      the userspace filesystem can mark the request as completed, causing write()
      to return. At that point, the userspace filesystem should no longer have
      access to the pipe buffer.
      
      Fix by copying pages coming from the user address space to new pipe
      buffers.
      Reported-by: Jann Horn <jannh@google.com>
      Fixes: c3021629 ("fuse: support splice() reading from fuse device")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  20. 14 Dec, 2021 2 commits
    • fuse: mark inode DONT_CACHE when per inode DAX hint changes · c3cb6f93
      Jeffle Xu authored
      When the per inode DAX hint changes while the file is still *opened*, it
      is quite complicated and maybe fragile to dynamically change the DAX
      state.
      
      Hence mark the inode and corresponding dentries as DONT_CACHE once the
      per inode DAX hint changes, so that the inode instance will be evicted
      and freed as soon as possible once the file is closed and the last
      reference to the inode is put. And then when the file gets reopened next
      time, the new instantiated inode will reflect the new DAX state.
      
      In summary, when the per inode DAX hint changes for an *opened* file, the
      DAX state of the file won't be updated until this file is closed and
      reopened later.
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: negotiate per inode DAX in FUSE_INIT · 2ee019fa
      Jeffle Xu authored
      During the FUSE_INIT phase, client shall advertise per inode DAX if it's
      mounted with "dax=inode". Then server is aware that client is in per
      inode DAX mode, and will construct per-inode DAX attribute accordingly.
      
      Server shall also advertise support for per inode DAX. If server doesn't
      support it while client is mounted with "dax=inode", client will
      silently fall back to "dax=never" since "dax=inode" is advisory only.
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>