1. 12 Jan, 2023 2 commits
  2. 07 Jan, 2023 1 commit
    • Chuck Lever's avatar
      NFSD: Use set_bit(RQ_DROPME) · 5304930d
      Chuck Lever authored
      The premise that "Once an svc thread is scheduled and executing an
      RPC, no other processes will touch svc_rqst::rq_flags" is false.
      svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
      threads when determining which thread to wake up next.
      
      Fixes: 93155647 ("NFSD: Use only RQ_DROPME to signal the need to drop a reply")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      5304930d
  3. 06 Jan, 2023 2 commits
    • Chuck Lever's avatar
      Revert "SUNRPC: Use RMW bitops in single-threaded hot paths" · 7827c81f
      Chuck Lever authored
      The premise that "Once an svc thread is scheduled and executing an
      RPC, no other processes will touch svc_rqst::rq_flags" is false.
      svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
      threads when determining which thread to wake up next.
      
      Found via KCSAN.
      
      Fixes: 28df0988 ("SUNRPC: Use RMW bitops in single-threaded hot paths")
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      7827c81f
    • Jeff Layton's avatar
      nfsd: fix handling of cached open files in nfsd4_open codepath · 0b3a551f
      Jeff Layton authored
      Commit fb70bf12 ("NFSD: Instantiate a struct file when creating a
      regular NFSv4 file") added the ability to cache an open fd over a
      compound. There are a couple of problems with the way this currently
      works:
      
      It's racy, as a newly-created nfsd_file can end up with its PENDING bit
      cleared while the nf is hashed, and the nf_file pointer is still zeroed
      out. Other tasks can find it in this state and they expect to see a
      valid nf_file, and can oops if nf_file is NULL.
      
      Also, there is no guarantee that we'll end up creating a new nfsd_file
      if one is already in the hash. If an extant entry is in the hash with a
      valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with
      the value of op_file and the old nf_file will leak.
      
      Fix both issues by making a new nfsd_file_acquirei_opened variant that
      takes an optional file pointer. If one is present when this is called,
      we'll take a new reference to it instead of trying to open the file. If
      the nfsd_file already has a valid nf_file, we'll just ignore the
      optional file and pass the nfsd_file back as-is.
      
      Also rework the tracepoints a bit to allow for an "opened" variant and
      don't try to avoid counting acquisitions in the case where we already
      have a cached open file.
      
      Fixes: fb70bf12 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file")
      Cc: Trond Myklebust <trondmy@hammerspace.com>
      Reported-by: default avatarStanislav Saner <ssaner@redhat.com>
      Reported-and-Tested-by: default avatarRuben Vestergaard <rubenv@drcmr.dk>
      Reported-and-Tested-by: default avatarTorkil Svensgaard <torkil@drcmr.dk>
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      0b3a551f
  4. 02 Jan, 2023 1 commit
  5. 22 Dec, 2022 1 commit
  6. 14 Dec, 2022 1 commit
  7. 12 Dec, 2022 1 commit
    • Dan Aloni's avatar
      nfsd: under NFSv4.1, fix double svc_xprt_put on rpc_create failure · 3bc8edc9
      Dan Aloni authored
      On error situation `clp->cl_cb_conn.cb_xprt` should not be given
      a reference to the xprt otherwise both client cleanup and the
      error handling path of the caller call to put it. Better to
      delay handing over the reference to a later branch.
      
      [   72.530665] refcount_t: underflow; use-after-free.
      [   72.531933] WARNING: CPU: 0 PID: 173 at lib/refcount.c:28 refcount_warn_saturate+0xcf/0x120
      [   72.533075] Modules linked in: nfsd(OE) nfsv4(OE) nfsv3(OE) nfs(OE) lockd(OE) compat_nfs_ssc(OE) nfs_acl(OE) rpcsec_gss_krb5(OE) auth_rpcgss(OE) rpcrdma(OE) dns_resolver fscache netfs grace rdma_cm iw_cm ib_cm sunrpc(OE) mlx5_ib mlx5_core mlxfw pci_hyperv_intf ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nft_counter xt_addrtype nft_compat br_netfilter bridge stp llc nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set overlay nf_tables nfnetlink crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xfs serio_raw virtio_net virtio_blk net_failover failover fuse [last unloaded: sunrpc]
      [   72.540389] CPU: 0 PID: 173 Comm: kworker/u16:5 Tainted: G           OE     5.15.82-dan #1
      [   72.541511] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+1084+97b81f61 04/01/2014
      [   72.542717] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
      [   72.543575] RIP: 0010:refcount_warn_saturate+0xcf/0x120
      [   72.544299] Code: 55 00 0f 0b 5d e9 01 50 98 00 80 3d 75 9e 39 08 00 0f 85 74 ff ff ff 48 c7 c7 e8 d1 60 8e c6 05 61 9e 39 08 01 e8 f6 51 55 00 <0f> 0b 5d e9 d9 4f 98 00 80 3d 4b 9e 39 08 00 0f 85 4c ff ff ff 48
      [   72.546666] RSP: 0018:ffffb3f841157cf0 EFLAGS: 00010286
      [   72.547393] RAX: 0000000000000026 RBX: ffff89ac6231d478 RCX: 0000000000000000
      [   72.548324] RDX: ffff89adb7c2c2c0 RSI: ffff89adb7c205c0 RDI: ffff89adb7c205c0
      [   72.549271] RBP: ffffb3f841157cf0 R08: 0000000000000000 R09: c0000000ffefffff
      [   72.550209] R10: 0000000000000001 R11: ffffb3f841157ad0 R12: ffff89ac6231d180
      [   72.551142] R13: ffff89ac6231d478 R14: ffff89ac40c06180 R15: ffff89ac6231d4b0
      [   72.552089] FS:  0000000000000000(0000) GS:ffff89adb7c00000(0000) knlGS:0000000000000000
      [   72.553175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   72.553934] CR2: 0000563a310506a8 CR3: 0000000109a66000 CR4: 0000000000350ef0
      [   72.554874] Call Trace:
      [   72.555278]  <TASK>
      [   72.555614]  svc_xprt_put+0xaf/0xe0 [sunrpc]
      [   72.556276]  nfsd4_process_cb_update.isra.11+0xb7/0x410 [nfsd]
      [   72.557087]  ? update_load_avg+0x82/0x610
      [   72.557652]  ? cpuacct_charge+0x60/0x70
      [   72.558212]  ? dequeue_entity+0xdb/0x3e0
      [   72.558765]  ? queued_spin_unlock+0x9/0x20
      [   72.559358]  nfsd4_run_cb_work+0xfc/0x270 [nfsd]
      [   72.560031]  process_one_work+0x1df/0x390
      [   72.560600]  worker_thread+0x37/0x3b0
      [   72.561644]  ? process_one_work+0x390/0x390
      [   72.562247]  kthread+0x12f/0x150
      [   72.562710]  ? set_kthread_struct+0x50/0x50
      [   72.563309]  ret_from_fork+0x22/0x30
      [   72.563818]  </TASK>
      [   72.564189] ---[ end trace 031117b1c72ec616 ]---
      [   72.566019] list_add corruption. next->prev should be prev (ffff89ac4977e538), but was ffff89ac4763e018. (next=ffff89ac4763e018).
      [   72.567647] ------------[ cut here ]------------
      
      Fixes: a4abc6b1 ("nfsd: Fix svc_xprt refcnt leak when setup callback client failed")
      Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarDan Aloni <dan.aloni@vastdata.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      3bc8edc9
  8. 11 Dec, 2022 1 commit
    • Jeff Layton's avatar
      nfsd: rework refcounting in filecache · ac3a2585
      Jeff Layton authored
      The filecache refcounting is a bit non-standard for something searchable
      by RCU, in that we maintain a sentinel reference while it's hashed. This
      in turn requires that we have to do things differently in the "put"
      depending on whether its hashed, which we believe to have led to races.
      
      There are other problems in here too. nfsd_file_close_inode_sync can end
      up freeing an nfsd_file while there are still outstanding references to
      it, and there are a number of subtle ToC/ToU races.
      
      Rework the code so that the refcount is what drives the lifecycle. When
      the refcount goes to zero, then unhash and rcu free the object. A task
      searching for a nfsd_file is allowed to bump its refcount, but only if
      it's not already 0. Ensure that we don't make any other changes to it
      until a reference is held.
      
      With this change, the LRU carries a reference. Take special care to deal
      with it when removing an entry from the list, and ensure that we only
      repurpose the nf_lru list_head when the refcount is 0 to ensure
      exclusive access to it.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      ac3a2585
  9. 10 Dec, 2022 21 commits
  10. 28 Nov, 2022 9 commits
    • Jeff Layton's avatar
      nfsd: reorganize filecache.c · 82141185
      Jeff Layton authored
      In a coming patch, we're going to rework how the filecache refcounting
      works. Move some code around in the function to reduce the churn in the
      later patches, and rename some of the functions with (hopefully) clearer
      names: nfsd_file_flush becomes nfsd_file_fsync, and
      nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue.
      
      Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free,
      to better match the name of the function from which it's called.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      82141185
    • Jeff Layton's avatar
      nfsd: remove the pages_flushed statistic from filecache · 1f696e23
      Jeff Layton authored
      We're counting mapping->nrpages, but not all of those are necessarily
      dirty. We don't really have a simple way to count just the dirty pages,
      so just remove this stat since it's not accurate.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      1f696e23
    • Chuck Lever's avatar
      MAINTAINERS: NFSD should be responsible for fs/exportfs · 296f3d7f
      Chuck Lever authored
      We recently received a patch for fs/exportfs/expfs.c, but there
      isn't a subsystem maintainer listed for fs/exportfs:
      
      Christian Brauner <brauner@kernel.org> (commit_signer:2/2=100%,authored:1/2=50%,added_lines:3/6=50%,removed_lines:2/6=33%)
      Al Viro <viro@zeniv.linux.org.uk> (commit_signer:1/2=50%,authored:1/2=50%,added_lines:3/6=50%,removed_lines:4/6=67%)
      Miklos Szeredi <mszeredi@redhat.com> (commit_signer:1/2=50%)
      Amir Goldstein <amir73il@gmail.com> (commit_signer:1/2=50%)
      linux-kernel@vger.kernel.org (open list)
      
      Neil says:
      > Looking at recent commits, patches come in through multiple
      > different trees.
      > nfsd certainly has an interest in expfs.c.  The only other user is
      > name_to_handle/open_by_handle API.
      > I see it as primarily nfsd functionality which is useful enough to
      > be exported directly to user-space.
      > (It was created by me when I was nfsd maintainer - does that
      > count?)
      Suggested-by: default avatarNeil Brown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Acked-by: default avatarJeff Layton <jlayton@kernel.org>
      296f3d7f
    • Chuck Lever's avatar
      NFSD: Fix licensing header in filecache.c · 3f054211
      Chuck Lever authored
      Add a missing SPDX header.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      3f054211
    • Chuck Lever's avatar
      NFSD: Use rhashtable for managing nfs4_file objects · d47b295e
      Chuck Lever authored
      fh_match() is costly, especially when filehandles are large (as is
      the case for NFSv4). It needs to be used sparingly when searching
      data structures. Unfortunately, with common workloads, I see
      multiple thousands of objects stored in file_hashtbl[], which has
      just 256 buckets, making its bucket hash chains quite lengthy.
      
      Walking long hash chains with the state_lock held blocks other
      activity that needs that lock. Sizable hash chains are a common
      occurrance once the server has handed out some delegations, for
      example -- IIUC, each delegated file is held open on the server by
      an nfs4_file object.
      
      To help mitigate the cost of searching with fh_match(), replace the
      nfs4_file hash table with an rhashtable, which can dynamically
      resize its bucket array to minimize hash chain length.
      
      The result of this modification is an improvement in the latency of
      NFSv4 operations, and the reduction of nfsd CPU utilization due to
      eliminating the cost of multiple calls to fh_match() and reducing
      the CPU cache misses incurred while walking long hash chains in the
      nfs4_file hash table.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      d47b295e
    • Chuck Lever's avatar
      NFSD: Refactor find_file() · 15424748
      Chuck Lever authored
      find_file() is now the only caller of find_file_locked(), so just
      fold these two together.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      15424748
    • Chuck Lever's avatar
      NFSD: Clean up find_or_add_file() · 9270fc51
      Chuck Lever authored
      Remove the call to find_file_locked() in insert_nfs4_file(). Tracing
      shows that over 99% of these calls return NULL. Thus it is not worth
      the expense of the extra bucket list traversal. insert_file() already
      deals correctly with the case where the item is already in the hash
      bucket.
      
      Since nfsd4_file_hash_insert() is now just a wrapper around
      insert_file(), move the meat of insert_file() into
      nfsd4_file_hash_insert() and get rid of it.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      9270fc51
    • Chuck Lever's avatar
      NFSD: Add a nfsd4_file_hash_remove() helper · 3341678f
      Chuck Lever authored
      Refactor to relocate hash deletion operation to a helper function
      that is close to most other nfs4_file data structure operations.
      
      The "noinline" annotation will become useful in a moment when the
      hlist_del_rcu() is replaced with a more complex rhash remove
      operation. It also guarantees that hash remove operations can be
      traced with "-p function -l remove_nfs4_file_locked".
      
      This also simplifies the organization of forward declarations: the
      to-be-added rhashtable and its param structure will be defined
      /after/ put_nfs4_file().
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      3341678f
    • Chuck Lever's avatar
      NFSD: Clean up nfsd4_init_file() · 81a21fa3
      Chuck Lever authored
      Name this function more consistently. I'm going to use nfsd4_file_
      and nfsd4_file_hash_ for these helpers.
      
      Change the @fh parameter to be const pointer for better type safety.
      
      Finally, move the hash insertion operation to the caller. This is
      typical for most other "init_object" type helpers, and it is where
      most of the other nfs4_file hash table operations are located.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      81a21fa3