1. 20 Apr, 2024 1 commit
    • Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain" · 32cf5a4e
      Chuck Lever authored
      Performance regression reported with NFS/RDMA using Omnipath,
      bisected to commit e084ee67 ("svcrdma: Add Write chunk WRs to
      the RPC's Send WR chain").
      
      Tracing on the server reports:
      
        nfsd-7771  [060]  1758.891809: svcrdma_sq_post_err:
      	cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12
      
      sq_post_err reports ENOMEM, and the rdma->sc_sq_avail (13643) is
      larger than rdma->sc_sq_depth (851). The number of available Send
      Queue entries is always supposed to be smaller than the Send Queue
      depth. That seems like a Send Queue accounting bug in svcrdma.
      
      As it's getting to be late in the 6.9-rc cycle, revert this commit.
      It can be revisited in a subsequent kernel release.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743
      Fixes: e084ee67 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
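The invariant described in this commit message (available Send Queue entries never exceed the Send Queue depth) can be illustrated with a small userspace model. This is only a sketch: `sq_model`, `sq_reserve`, `sq_release`, and `sq_invariant_ok` are hypothetical names, not svcrdma functions, and the commit does not identify the actual cause of the imbalance. The model assumes one plausible failure mode, namely returning more entries on completion than were reserved when posting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a Send Queue budget; not svcrdma code. */
struct sq_model {
    int depth;         /* total SQ entries, fixed at setup (cf. sc_sq_depth) */
    atomic_int avail;  /* entries currently free (cf. sc_sq_avail) */
};

static void sq_init(struct sq_model *sq, int depth)
{
    assert(depth > 0);
    sq->depth = depth;
    atomic_init(&sq->avail, depth);
}

/* Reserve n entries before posting; fail if the queue lacks room. */
static bool sq_reserve(struct sq_model *sq, int n)
{
    if (atomic_fetch_sub(&sq->avail, n) - n < 0) {
        atomic_fetch_add(&sq->avail, n);  /* undo the failed reservation */
        return false;
    }
    return true;
}

/* Return n entries on completion; n must match what was reserved. */
static void sq_release(struct sq_model *sq, int n)
{
    atomic_fetch_add(&sq->avail, n);
}

/* The invariant from the commit message: 0 <= avail <= depth. */
static bool sq_invariant_ok(struct sq_model *sq)
{
    int avail = atomic_load(&sq->avail);
    return avail >= 0 && avail <= sq->depth;
}
```

In this model, a balanced `sq_reserve(&sq, 3)` followed by `sq_release(&sq, 3)` keeps the invariant, while an extra `sq_release()` pushes `avail` above `depth`, which is the shape of the `sc_sq_avail=13643/851` reading in the trace.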
  2. 11 Apr, 2024 1 commit
  3. 10 Apr, 2024 1 commit
    • SUNRPC: Fix rpcgss_context trace event acceptor field · a4833e3a
      Steven Rostedt (Google) authored
      The rpcgss_context trace event acceptor field is a dynamically sized
      string that records the "data" parameter. But this parameter is also
      dependent on the "len" field to determine the size of the data.
      
      It needs to use the __string_len() helper macro, where the length can be passed
      in. It also incorrectly uses strncpy() to save it instead of
      __assign_str(). As these macros can change, it is not wise to open code
      them in trace events.
      
      As of commit c759e609 ("tracing: Remove __assign_str_len()"),
      __assign_str() can be used for both __string() and __string_len() fields.
      Before that commit, __assign_str_len() is required to be used. This needs
      to be noted for backporting. (In actuality, commit c1fa617c ("tracing:
      Rework __assign_str() and __string() to not duplicate getting the string")
      is the commit that makes __assign_str_len() obsolete).
      
      Cc: stable@vger.kernel.org
      Fixes: 0c77668d ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
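The point about the "len" field can be shown outside the kernel. When a buffer's size is given by a separate length rather than by a NUL terminator, the copy has to be bounded by that length and terminated explicitly. The sketch below is a userspace analogue only: `copy_counted` is a hypothetical helper, not a tracing macro, and it does not reproduce what `__string_len()` and `__assign_str()` do internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel API: copy a length-delimited buffer
 * (which may lack a NUL terminator) into dst and always terminate it.
 * Returns the number of bytes copied, excluding the terminator.
 */
static size_t copy_counted(char *dst, size_t dst_size,
                           const char *data, size_t len)
{
    if (dst_size == 0)
        return 0;
    if (len > dst_size - 1)
        len = dst_size - 1;   /* leave room for the terminator */
    memcpy(dst, data, len);   /* copy exactly len bytes, never scan for NUL */
    dst[len] = '\0';
    return len;
}
```

A caller passes the recorded length alongside the data, for example `copy_counted(out, sizeof(out), data, len)`, so bytes past `len` are never read even when no terminator is present.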
  4. 05 Apr, 2024 1 commit
    • nfsd: hold a lighter-weight client reference over CB_RECALL_ANY · 10396f4d
      Jeff Layton authored
      Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
      client. While a callback job is technically an RPC, that counter is
      really more for client-driven RPCs, and this has the effect of
      preventing the client from being unhashed until the callback completes.
      
      If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
      can end up in a situation where the callback can't complete on the (now
      dead) callback channel, but the new client can't connect because the old
      client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
      return on the CREATE_SESSION operation.
      
      The job is only holding a reference to the client so it can clear a flag
      after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a
      reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of
      reference when dealing with the nfsdfs info files, but it should work
      appropriately here to ensure that the nfs4_client doesn't disappear.
      
      Fixes: 44df6f43 ("NFSD: add delegation reaper to react to low memory condition")
      Reported-by: Vladimir Benes <vbenes@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
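The difference between the two kinds of reference can be sketched with a toy model. Everything here is an assumption made for illustration: `client_model` and its helpers are hypothetical and greatly simplified, `rpc_users` merely stands in for cl_rpc_users (a nonzero count blocks unhashing), and `lifetime_ref` stands in for cl_nfsdfs.cl_ref (it only keeps the object from being freed).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an nfs4_client's two counters. */
struct client_model {
    int rpc_users;     /* stands in for cl_rpc_users: blocks unhashing */
    int lifetime_ref;  /* stands in for cl_nfsdfs.cl_ref: blocks freeing */
    bool hashed;
    bool freed;
};

/* Unhashing is refused while any RPC-style reference is held. */
static bool client_try_unhash(struct client_model *c)
{
    if (c->rpc_users > 0)
        return false;
    c->hashed = false;
    return true;
}

static void client_get_ref(struct client_model *c)
{
    c->lifetime_ref++;
}

/* Dropping the last lifetime reference frees the object. */
static void client_put_ref(struct client_model *c)
{
    assert(c->lifetime_ref > 0);
    if (--c->lifetime_ref == 0)
        c->freed = true;
}
```

In this model, a callback that holds only a lifetime reference does not stop `client_try_unhash()` from succeeding, yet the object stays allocated until the callback calls `client_put_ref()`. A callback that bumps `rpc_users` instead makes `client_try_unhash()` fail for as long as the callback is stuck.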
  5. 04 Apr, 2024 1 commit
  6. 27 Mar, 2024 1 commit
    • NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies · 99dc2ef0
      Chuck Lever authored
      There are one or two cases where CREATE_SESSION returns
      NFS4ERR_DELAY in order to force the client to wait a bit and try
      CREATE_SESSION again. However, after commit e4469c6c ("NFSD: Fix
      the NFSv4.1 CREATE_SESSION operation"), NFSD caches that response in
      the CREATE_SESSION slot. Thus, when the client resends the
      CREATE_SESSION, the server always returns the cached NFS4ERR_DELAY
      response rather than actually executing the request and properly
      recording its outcome. This blocks the client from making further
      progress.
      
      RFC 8881 Section 15.1.1.3 says:
      > If NFS4ERR_DELAY is returned on an operation other than SEQUENCE
      > that validly appears as the first operation of a request ... [t]he
      > request can be retried in full without modification. In this case
      > as well, the replier MUST avoid returning a response containing
      > NFS4ERR_DELAY as the response to an initial operation of a request
      > solely on the basis of its presence in the reply cache.
      
      Neither the original NFSD code nor the discussion in section 18.36.4
      refers explicitly to this important requirement, so I missed it.
      
      Note also that not only must the server not cache NFS4ERR_DELAY, but
      it has to not advance the CREATE_SESSION slot sequence number so
      that it can properly recognize and accept the client's retry.
      
      Reported-by: Dai Ngo <dai.ngo@oracle.com>
      Fixes: e4469c6c ("NFSD: Fix the NFSv4.1 CREATE_SESSION operation")
      Tested-by: Dai Ngo <dai.ngo@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
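The two rules in this commit message (do not cache an NFS4ERR_DELAY reply, and do not advance the slot sequence number when returning one) can be sketched as a tiny single-slot reply cache. This is a simplified model under stated assumptions, not the NFSD implementation: `cs_slot`, `cs_handle`, and the demo executor `cs_exec_busy` are hypothetical names, and the status values are symbolic rather than the protocol's numeric codes.

```c
#include <assert.h>
#include <stdint.h>

/* Symbolic statuses for the model; not the on-the-wire NFSv4 values. */
enum cs_status { CS_OK, CS_DELAY, CS_SEQ_MISORDERED };

/* Hypothetical single-slot reply cache for CREATE_SESSION. */
struct cs_slot {
    uint32_t seqid;         /* sequence id of the last cached request */
    enum cs_status cached;  /* reply cached for that sequence id */
};

typedef enum cs_status (*cs_exec_fn)(void *ctx);

static enum cs_status cs_handle(struct cs_slot *slot, uint32_t seqid,
                                cs_exec_fn exec, void *ctx)
{
    enum cs_status st;

    if (seqid == slot->seqid)
        return slot->cached;        /* replay: answer from the cache */
    if (seqid != slot->seqid + 1)
        return CS_SEQ_MISORDERED;

    st = exec(ctx);
    if (st == CS_DELAY)
        return st;                  /* neither cache nor advance the slot */

    slot->seqid = seqid;            /* final outcome: record and advance */
    slot->cached = st;
    return st;
}

/* Demo executor: reports CS_DELAY until the counter in ctx reaches zero. */
static enum cs_status cs_exec_busy(void *ctx)
{
    int *busy = ctx;

    if (*busy > 0) {
        (*busy)--;
        return CS_DELAY;
    }
    return CS_OK;
}
```

Because a delayed request leaves `slot->seqid` untouched, the client's retry with the same sequence number is treated as a new request and executed again, rather than being answered from the cache or rejected as misordered.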
  7. 22 Mar, 2024 2 commits
  8. 09 Mar, 2024 1 commit
    • NFSD: Clean up nfsd4_encode_replay() · 9b350d3e
      Chuck Lever authored
      Replace open-coded encoding logic with the use of conventional XDR
      utility functions. Add a tracepoint to make replays observable in
      field troubleshooting situations.
      
      The WARN_ON is removed. A stack trace is of little use, as there is
      only one call site for nfsd4_encode_replay(), and a buffer length
      shortage here is unlikely.
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
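As a rough illustration of what encoding through utility functions means (as opposed to open-coded pointer arithmetic), the sketch below reserves space before writing and reports a short buffer as an ordinary error. The names `xdr_buf_model`, `xdr_reserve`, and `xdr_put_u32` are hypothetical userspace stand-ins, not the kernel's XDR API. The only protocol fact relied on is that XDR encodes a 32-bit integer as four big-endian bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace encode cursor; not the kernel's xdr_stream. */
struct xdr_buf_model {
    uint8_t *p;    /* next byte to write */
    uint8_t *end;  /* one past the last writable byte */
};

/* Reserve n bytes, or return NULL if the buffer is too short. */
static uint8_t *xdr_reserve(struct xdr_buf_model *x, size_t n)
{
    uint8_t *q = x->p;

    if ((size_t)(x->end - x->p) < n)
        return NULL;
    x->p += n;
    return q;
}

/* Encode a 32-bit value as four big-endian bytes; -1 on short buffer. */
static int xdr_put_u32(struct xdr_buf_model *x, uint32_t v)
{
    uint8_t *q = xdr_reserve(x, 4);

    if (!q)
        return -1;
    q[0] = (uint8_t)(v >> 24);
    q[1] = (uint8_t)(v >> 16);
    q[2] = (uint8_t)(v >> 8);
    q[3] = (uint8_t)v;
    return 0;
}
```

With this shape, a buffer shortage surfaces as a return value that the single caller can handle, and the cursor is left unchanged on failure.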
  9. 05 Mar, 2024 2 commits
  10. 01 Mar, 2024 29 commits