1. 23 Sep, 2024 18 commits
  2. 20 Sep, 2024 22 commits
    • Chuck Lever's avatar
      xdrgen: Prevent reordering of encoder and decoder functions · 509abfc7
      Chuck Lever authored
      I noticed that "xdrgen source" reorders the procedure encoder and
      decoder functions every time it is run. I would prefer that the
      generated code be more deterministic: it enables a reader to better
      see exactly what has changed between runs of the tool.
      
      The problem is that Python sets are not ordered. I use a Python set
      to ensure that, when multiple procedures use a particular argument or
      result type, the encoder/decoder for that type is emitted only once.
      
      Sets aren't ordered, but I can use Python dictionaries for this
      purpose to ensure the procedure functions are always emitted in the
      same order if the .x file does not change.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      509abfc7
    • Chuck Lever's avatar
      xdrgen: typedefs should use the built-in string and opaque functions · fed8a17c
      Chuck Lever authored
      'typedef opaque yada<XYZ>' should use xdrgen's built-in opaque
      encoder and decoder, to enable better compiler optimization.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      fed8a17c
    • Chuck Lever's avatar
      xdrgen: Fix return code checking in built-in XDR decoders · 663ad8b1
      Chuck Lever authored
      xdr_stream_encode_u32() returns XDR_UNIT on success.
      xdr_stream_decode_u32() returns zero or -EMSGSIZE, but never
      XDR_UNIT.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      663ad8b1
    • Chuck Lever's avatar
      tools: Add xdrgen · 4b132aac
      Chuck Lever authored
      Add a Python-based tool for translating XDR specifications into XDR
      encoder and decoder functions written in the Linux kernel's C coding
      style. The generator attempts to match the usual C coding style of
      the Linux kernel's SunRPC consumers.
      
      This approach is similar to the netlink code generator in
      tools/net/ynl .
      
      The maintainability benefits of machine-generated XDR code include:
      
      - Stronger type checking
      - Reduces the number of bugs introduced by human error
      - Makes the XDR code easier to audit and analyze
      - Enables rapid prototyping of new RPC-based protocols
      - Hardens the layering between protocol logic and marshaling
      - Makes it easier to add observability on demand
      - Unit tests might be built for both the tool and (automatically)
        for the generated code
      
      In addition, converting the XDR layer to use memory-safe languages
      such as Rust will be easier if much of the code can be converted
      automatically.
      Tested-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      4b132aac
    • NeilBrown's avatar
      nfsd: fix delegation_blocked() to block correctly for at least 30 seconds · 45bb63ed
      NeilBrown authored
      The pair of bloom filtered used by delegation_blocked() was intended to
      block delegations on given filehandles for between 30 and 60 seconds.  A
      new filehandle would be recorded in the "new" bit set.  That would then
      be switch to the "old" bit set between 0 and 30 seconds later, and it
      would remain as the "old" bit set for 30 seconds.
      
      Unfortunately the code intended to clear the old bit set once it reached
      30 seconds old, preparing it to be the next new bit set, instead cleared
      the *new* bit set before switching it to be the old bit set.  This means
      that the "old" bit set is always empty and delegations are blocked
      between 0 and 30 seconds.
      
      This patch updates bd->new before clearing the set with that index,
      instead of afterwards.
      Reported-by: default avatarOlga Kornievskaia <okorniev@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 6282cd56 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      45bb63ed
    • Jeff Layton's avatar
      nfsd: fix initial getattr on write delegation · bf92e500
      Jeff Layton authored
      At this point in compound processing, currentfh refers to the parent of
      the file, not the file itself. Get the correct dentry from the delegation
      stateid instead.
      
      Fixes: c5967721 ("NFSD: handle GETATTR conflict with write delegation")
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      bf92e500
    • NeilBrown's avatar
      nfsd: untangle code in nfsd4_deleg_getattr_conflict() · a078a7dc
      NeilBrown authored
      The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.
      
      With this patch we:
       - properly handle non-nfsd leases.  We must not assume flc_owner is a
          delegation unless fl_lmops == &nfsd_lease_mng_ops
       - move the main code out of the for loop
       - have a single exit which calls nfs4_put_stid()
         (and other exits which don't need to call that)
      
      [ jlayton: refactored on top of Neil's other patch: nfsd: fix
      	   nfsd4_deleg_getattr_conflict in presence of third party lease ]
      
      Fixes: c5967721 ("NFSD: handle GETATTR conflict with write delegation")
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      a078a7dc
    • Scott Mayhew's avatar
      nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall() · 5559c157
      Scott Mayhew authored
      This patch is intended to go on top of "nfsd: return -EINVAL when
      namelen is 0" from Li Lingfeng.  Li's patch checks for 0, but we should
      be enforcing an upper bound as well.
      
      Note that if nfsdcld somehow gets an id > NFS4_OPAQUE_LIMIT in its
      database, it'll truncate it to NFS4_OPAQUE_LIMIT when it does the
      downcall anyway.
      Signed-off-by: default avatarScott Mayhew <smayhew@redhat.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      5559c157
    • Li Lingfeng's avatar
      nfsd: return -EINVAL when namelen is 0 · 22451a16
      Li Lingfeng authored
      When we have a corrupted main.sqlite in /var/lib/nfs/nfsdcld/, it may
      result in namelen being 0, which will cause memdup_user() to return
      ZERO_SIZE_PTR.
      When we access the name.data that has been assigned the value of
      ZERO_SIZE_PTR in nfs4_client_to_reclaim(), null pointer dereference is
      triggered.
      
      [ T1205] ==================================================================
      [ T1205] BUG: KASAN: null-ptr-deref in nfs4_client_to_reclaim+0xe9/0x260
      [ T1205] Read of size 1 at addr 0000000000000010 by task nfsdcld/1205
      [ T1205]
      [ T1205] CPU: 11 PID: 1205 Comm: nfsdcld Not tainted 5.10.0-00003-g2c1423731b8d #406
      [ T1205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [ T1205] Call Trace:
      [ T1205]  dump_stack+0x9a/0xd0
      [ T1205]  ? nfs4_client_to_reclaim+0xe9/0x260
      [ T1205]  __kasan_report.cold+0x34/0x84
      [ T1205]  ? nfs4_client_to_reclaim+0xe9/0x260
      [ T1205]  kasan_report+0x3a/0x50
      [ T1205]  nfs4_client_to_reclaim+0xe9/0x260
      [ T1205]  ? nfsd4_release_lockowner+0x410/0x410
      [ T1205]  cld_pipe_downcall+0x5ca/0x760
      [ T1205]  ? nfsd4_cld_tracking_exit+0x1d0/0x1d0
      [ T1205]  ? down_write_killable_nested+0x170/0x170
      [ T1205]  ? avc_policy_seqno+0x28/0x40
      [ T1205]  ? selinux_file_permission+0x1b4/0x1e0
      [ T1205]  rpc_pipe_write+0x84/0xb0
      [ T1205]  vfs_write+0x143/0x520
      [ T1205]  ksys_write+0xc9/0x170
      [ T1205]  ? __ia32_sys_read+0x50/0x50
      [ T1205]  ? ktime_get_coarse_real_ts64+0xfe/0x110
      [ T1205]  ? ktime_get_coarse_real_ts64+0xa2/0x110
      [ T1205]  do_syscall_64+0x33/0x40
      [ T1205]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
      [ T1205] RIP: 0033:0x7fdbdb761bc7
      [ T1205] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 514
      [ T1205] RSP: 002b:00007fff8c4b7248 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ T1205] RAX: ffffffffffffffda RBX: 000000000000042b RCX: 00007fdbdb761bc7
      [ T1205] RDX: 000000000000042b RSI: 00007fff8c4b75f0 RDI: 0000000000000008
      [ T1205] RBP: 00007fdbdb761bb0 R08: 0000000000000000 R09: 0000000000000001
      [ T1205] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000042b
      [ T1205] R13: 0000000000000008 R14: 00007fff8c4b75f0 R15: 0000000000000000
      [ T1205] ==================================================================
      
      Fix it by checking namelen.
      Signed-off-by: default avatarLi Lingfeng <lilingfeng3@huawei.com>
      Fixes: 74725959 ("nfsd: un-deprecate nfsdcld")
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Reviewed-by: default avatarScott Mayhew <smayhew@redhat.com>
      Tested-by: default avatarScott Mayhew <smayhew@redhat.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      22451a16
    • Chuck Lever's avatar
      NFSD: Wrap async copy operations with trace points · 0505de96
      Chuck Lever authored
      Add an nfsd_copy_async_done to record the timestamp, the final
      status code, and the callback stateid of an async copy.
      
      Rename the nfsd_copy_do_async tracepoint to match that naming
      convention to make it easier to enable both of these with a
      single glob.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      0505de96
    • Chuck Lever's avatar
    • Chuck Lever's avatar
      NFSD: Record the callback stateid in copy tracepoints · e1d2697c
      Chuck Lever authored
      Match COPY operations up with CB_OFFLOAD operations.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      e1d2697c
    • Chuck Lever's avatar
      NFSD: Display copy stateids with conventional print formatting · 11848e98
      Chuck Lever authored
      Make it easier to grep for s2s COPY stateids in trace logs: Use the
      same display format in nfsd_copy_class as is used to display other
      stateids.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      11848e98
    • Chuck Lever's avatar
      NFSD: Limit the number of concurrent async COPY operations · aadc3bbe
      Chuck Lever authored
      Nothing appears to limit the number of concurrent async COPY
      operations that clients can start. In addition, AFAICT each async
      COPY can copy an unlimited number of 4MB chunks, so can run for a
      long time. Thus IMO async COPY can become a DoS vector.
      
      Add a restriction mechanism that bounds the number of concurrent
      background COPY operations. Start simple and try to be fair -- this
      patch implements a per-namespace limit.
      
      An async COPY request that occurs while this limit is exceeded gets
      NFS4ERR_DELAY. The requesting client can choose to send the request
      again after a delay or fall back to a traditional read/write style
      copy.
      
      If there is need to make the mechanism more sophisticated, we can
      visit that in future patches.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      aadc3bbe
    • Chuck Lever's avatar
      NFSD: Async COPY result needs to return a write verifier · 9ed666eb
      Chuck Lever authored
      Currently, when NFSD handles an asynchronous COPY, it returns a
      zero write verifier, relying on the subsequent CB_OFFLOAD callback
      to pass the write verifier and a stable_how4 value to the client.
      
      However, if the CB_OFFLOAD never arrives at the client (for example,
      if a network partition occurs just as the server sends the
      CB_OFFLOAD operation), the client will never receive this verifier.
      Thus, if the client sends a follow-up COMMIT, there is no way for
      the client to assess the COMMIT result.
      
      The usual recovery for a missing CB_OFFLOAD is for the client to
      send an OFFLOAD_STATUS operation, but that operation does not carry
      a write verifier in its result. Neither does it carry a stable_how4
      value, so the client /must/ send a COMMIT in this case -- which will
      always fail because currently there's still no write verifier in the
      COPY result.
      
      Thus the server needs to return a normal write verifier in its COPY
      result even if the COPY operation is to be performed asynchronously.
      
      If the server recognizes the callback stateid in subsequent
      OFFLOAD_STATUS operations, then obviously it has not restarted, and
      the write verifier the client received in the COPY result is still
      valid and can be used to assess a COMMIT of the copied data, if one
      is needed.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      9ed666eb
    • NeilBrown's avatar
      nfsd: avoid races with wake_up_var() · 15392c8c
      NeilBrown authored
      wake_up_var() needs a barrier after the important change is made in the
      var and before wake_up_var() is called, else it is possible that a wake
      up won't be sent when it should.
      
      In each case here the var is changed in an "atomic" manner, so
      smb_mb__after_atomic() is sufficient.
      
      In one case the important change (removing the lease) is performed
      *after* the wake_up, which is backwards.  The code survives in part
      because the wait_var_event is given a timeout.
      
      This patch adds the required barriers and calls destroy_delegation()
      *before* waking any threads waiting for the delegation to be destroyed.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      15392c8c
    • NeilBrown's avatar
      nfsd: use clear_and_wake_up_bit() · 985eeae9
      NeilBrown authored
      nfsd has two places that open-code clear_and_wake_up_bit().  One has
      the required memory barriers.  The other does not.
      
      Change both to use clear_and_wake_up_bit() so we have the barriers
      without the noise.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      985eeae9
    • Yan Zhen's avatar
      sunrpc: xprtrdma: Use ERR_CAST() to return · aeddf8e6
      Yan Zhen authored
      Using ERR_CAST() is more reasonable and safer, When it is necessary
      to convert the type of an error pointer and return it.
      Signed-off-by: default avatarYan Zhen <yanzhen@vivo.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      aeddf8e6
    • Thorsten Blum's avatar
      NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by() · 2869b3a0
      Thorsten Blum authored
      Add the __counted_by compiler attribute to the flexible array member
      volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
      CONFIG_FORTIFY_SOURCE.
      
      Use struct_size() instead of manually calculating the number of bytes to
      allocate for a pnfs_block_deviceaddr with a single volume.
      Signed-off-by: default avatarThorsten Blum <thorsten.blum@toblux.com>
      Reviewed-by: default avatarGustavo A. R. Silva <gustavoars@kernel.org>
      Acked-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      2869b3a0
    • Guoqing Jiang's avatar
      nfsd: call cache_put if xdr_reserve_space returns NULL · d078cbf5
      Guoqing Jiang authored
      If not enough buffer space available, but idmap_lookup has triggered
      lookup_fn which calls cache_get and returns successfully. Then we
      missed to call cache_put here which pairs with cache_get.
      
      Fixes: ddd1ea56 ("nfsd4: use xdr_reserve_space in attribute encoding")
      Signed-off-by: default avatarGuoqing Jiang <guoqing.jiang@linux.dev>
      Reviwed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      d078cbf5
    • Jeff Layton's avatar
      nfsd: add more nfsd_cb tracepoints · ba017fd3
      Jeff Layton authored
      Add some tracepoints in the callback client RPC operations. Also
      add a tracepoint to nfsd4_cb_getattr_done.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      ba017fd3
    • Jeff Layton's avatar
      nfsd: track the main opcode for callbacks · c1c9f3ea
      Jeff Layton authored
      Keep track of the "main" opcode for the callback, and display it in the
      tracepoint. This makes it simpler to discern what's happening when there
      is more than one callback in flight.
      
      The one special case is the CB_NULL RPC. That's not a CB_COMPOUND
      opcode, so designate the value 0 for that.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      c1c9f3ea