1. 07 Apr, 2022 5 commits
  2. 30 Mar, 2022 3 commits
  3. 28 Mar, 2022 1 commit
  4. 26 Mar, 2022 1 commit
  5. 25 Mar, 2022 2 commits
  6. 24 Mar, 2022 3 commits
  7. 23 Mar, 2022 1 commit
    • SUNRPC: avoid race between mod_timer() and del_timer_sync() · 3848e96e
      NeilBrown authored
      xprt_destroy() claims XPRT_LOCKED and then calls del_timer_sync().
      Both xprt_unlock_connect() and xprt_release() call
       ->release_xprt()
      which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
      which calls mod_timer().
      
      This may result in mod_timer() being called *after* del_timer_sync().
      When this happens, the timer may fire long after the xprt has been freed,
      and run_timer_softirq() will probably crash.
      
      The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
      always called under ->transport_lock.  So if we take ->transport_lock to
      call del_timer_sync(), we can be sure that mod_timer() will run first
      (if it runs at all).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
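The ordering argument in the commit above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `struct model_xprt`, `model_release()`, `model_destroy()` and `model_timer_armed_after_destroy()` are hypothetical names, a pthread mutex stands in for ->transport_lock, an atomic flag stands in for XPRT_LOCKED, and a plain boolean stands in for the pending timer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant fields of an xprt. */
struct model_xprt {
    pthread_mutex_t transport_lock; /* models ->transport_lock */
    atomic_bool locked;             /* models the XPRT_LOCKED bit */
    bool timer_armed;               /* models a pending timer; guarded by transport_lock */
};

/* Models ->release_xprt() followed by xprt_schedule_autodisconnect(). */
static void *model_release(void *arg)
{
    struct model_xprt *x = arg;

    pthread_mutex_lock(&x->transport_lock);
    atomic_store(&x->locked, false); /* drop XPRT_LOCKED first ... */
    x->timer_armed = true;           /* ... and only then "mod_timer()" */
    pthread_mutex_unlock(&x->transport_lock);
    return NULL;
}

/* Models the teardown ordering described in the commit message. */
static void model_destroy(struct model_xprt *x)
{
    bool expected = false;

    /* Claim XPRT_LOCKED; this succeeds once the releaser has dropped it. */
    while (!atomic_compare_exchange_weak(&x->locked, &expected, true))
        expected = false;

    /* Taking the lock waits for any in-flight release/arm pair to finish. */
    pthread_mutex_lock(&x->transport_lock);
    x->timer_armed = false;          /* "del_timer_sync()" */
    pthread_mutex_unlock(&x->transport_lock);
}

/* Runs one release/destroy race; returns whether a timer is left armed. */
static bool model_timer_armed_after_destroy(void)
{
    struct model_xprt x;
    pthread_t releaser;
    int ret;
    bool armed;

    pthread_mutex_init(&x.transport_lock, NULL);
    atomic_init(&x.locked, true);    /* the releasing task holds XPRT_LOCKED */
    x.timer_armed = false;

    ret = pthread_create(&releaser, NULL, model_release, &x);
    assert(ret == 0);
    model_destroy(&x);
    pthread_join(releaser, NULL);

    armed = x.timer_armed;
    pthread_mutex_destroy(&x.transport_lock);
    return armed;
}
```

In this model the destroyer can only claim the flag after the releaser has cleared it while holding the mutex, so the "del" step always runs after the "mod" step. If `model_destroy()` skipped the mutex, the timer could be armed after it was cancelled, which is the race the commit describes.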
  8. 22 Mar, 2022 16 commits
  9. 21 Mar, 2022 1 commit
  10. 13 Mar, 2022 7 commits
    • SUNRPC: change locking for xs_swap_enable/disable · 693486d5
      NeilBrown authored
      It is not in general safe to wait for XPRT_LOCKED to clear.
      A wakeup is only sent when
       - connection completes
       - sock close completes
      so during normal operations, this can wait indefinitely.
      
      The event we need to protect against is ->inet being set to NULL, and
      that happens under the recv_mutex lock.
      
      So drop the handling of XPRT_LOCKED and use recv_mutex instead.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • NFS: swap-out must always use STABLE writes. · c265de25
      NeilBrown authored
      The commit handling code is not safe against memory-pressure deadlocks
      when writing to swap.  In particular, nfs_commitdata_alloc() blocks
      indefinitely waiting for memory, and this can consume all available
      workqueue threads.
      
      swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
      that a stable write should be used if the write fits in a single
      request, and it normally does.  However if we ever swap with a small
      wsize, or gather unusually large numbers of pages for a single write,
      this might change.
      
      For safety, make it explicit in the code that direct writes used for swap
      must always use FLUSH_STABLE.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
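The difference between COND_STABLE and an explicit FLUSH_STABLE, as described in the commit above, can be sketched as a simplified decision function. The enum and function names below are hypothetical and only model the behaviour stated in the commit message; they are not the NFS client's actual code paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags modelled on the names used in the commit message. */
enum model_flush { MODEL_FLUSH_STABLE, MODEL_FLUSH_COND_STABLE };

/* Flag a direct write is issued with in this model: swap is always stable. */
static enum model_flush model_direct_write_flag(bool is_swap)
{
    return is_swap ? MODEL_FLUSH_STABLE : MODEL_FLUSH_COND_STABLE;
}

/* Whether the write goes out stable, i.e. needs no later COMMIT. */
static bool model_write_is_stable(enum model_flush how, unsigned int nr_requests)
{
    assert(nr_requests > 0);
    if (how == MODEL_FLUSH_STABLE)
        return true;
    /* COND_STABLE: stable only if the whole write fits in one request. */
    return nr_requests == 1;
}
```

Under these assumptions a swap write split across several requests still avoids the commit path, whereas a COND_STABLE write of the same size would not.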
    • NFS: swap IO handling is slightly different for O_DIRECT IO · 64158668
      NeilBrown authored
      1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
         possible deadlocks with "fs_reclaim".  These deadlocks could, I believe,
         eventuate if a buffered read on the swapfile was attempted.
      
         We don't need coherence with the page cache for a swap file, and
         buffered writes are forbidden anyway.  There is no other need for
         i_rwsem during direct IO.  So never take it for swap_rw().
      
      2/ generic_write_checks() explicitly forbids writes to swap, and
         performs checks that are not needed for swap.  So bypass it
         for swap_rw().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • NFSv4: keep state manager thread active if swap is enabled · 4dc73c67
      NeilBrown authored
      If we are swapping over NFSv4, we may not be able to allocate memory to
      start the state-manager thread at the time when we need it.
      So keep it always running when swap is enabled, and just signal it to
      start.
      
      This requires updating and testing the cl_swapper count on the root
      rpc_clnt after following all ->cl_parent links.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
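The idea of keeping the swap count on the root client, reached by following ->cl_parent links, can be sketched as follows. `struct model_clnt` and the helper names are hypothetical, the counter is a plain int rather than an atomic, and the sketch assumes the root client is its own parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RPC client; the root is assumed to be its own parent. */
struct model_clnt {
    struct model_clnt *cl_parent;
    int cl_swapper;                /* swap users, counted on the root only */
};

/* Follow ->cl_parent links until the root client is reached. */
static struct model_clnt *model_root(struct model_clnt *clnt)
{
    assert(clnt != NULL);
    while (clnt != clnt->cl_parent)
        clnt = clnt->cl_parent;
    return clnt;
}

/* Returns 1 when this call is the first to enable swap on the root. */
static int model_swap_activate(struct model_clnt *clnt)
{
    struct model_clnt *root = model_root(clnt);

    return ++root->cl_swapper == 1;
}

/* Tests the count on the root, whichever client in the chain is passed in. */
static int model_swap_enabled(struct model_clnt *clnt)
{
    return model_root(clnt)->cl_swapper > 0;
}
```

Because both the update and the test resolve to the same root, any client in the chain sees the same answer to "is swap enabled?".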
    • SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC · 8db55a03
      NeilBrown authored
      rpc tasks can be marked as RPC_TASK_SWAPPER.  This causes GFP_MEMALLOC
      to be used for some allocations.  This is needed in some cases, but not
      in all where it is currently provided, and in some where it isn't
      provided.
      
      Currently *all* tasks associated with a rpc_client on which swap is
      enabled get the flag and hence some GFP_MEMALLOC support.
      
      GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it.
      However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does
      need it.
      
      xdr_alloc_bvec is called while XPRT_LOCKED is held.  If this blocks,
      then it blocks all other queued tasks.  So this allocation needs
      GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used
      for any swap writes.
      
      Similarly, if the transport is not connected, that will block all
      requests including swap writes, so memory allocations should get
      GFP_MEMALLOC if swap writes are possible.
      
      So with this patch:
       1/ we ONLY set RPC_TASK_SWAPPER for swap writes.
       2/ __rpc_execute() sets PF_MEMALLOC while handling any task
          with RPC_TASK_SWAPPER set, or when handling any task that
          holds the XPRT_LOCKED lock on an xprt used for swap.
          This removes the need for the RPC_IS_SWAPPER() test
          in ->buf_alloc handlers.
       3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking
          any task to a swapper xprt.  __rpc_execute() will clear it.
       4/ PF_MEMALLOC is set for all the connect workers.
      
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> (for xprtrdma parts)
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
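The rule for when a task runs with PF_MEMALLOC, as listed in the commit above, can be condensed into a predicate. This is a sketch of the stated policy only: the struct and field names are hypothetical, and the connect-worker case is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a transport: true if it is used for swap. */
struct model_xprt {
    bool swapper;
};

/* Hypothetical model of an RPC task. */
struct model_task {
    bool is_swap_write;            /* models RPC_TASK_SWAPPER (swap writes only) */
    bool holds_xprt_lock;          /* task currently owns XPRT_LOCKED */
    const struct model_xprt *xprt; /* transport the task is bound to, if any */
};

/* Should this task be executed with PF_MEMALLOC set, per the listed rules? */
static bool model_needs_memalloc(const struct model_task *t)
{
    assert(t != NULL);
    if (t->is_swap_write)
        return true;
    /* Any task holding the lock on a swap xprt can block swap writes. */
    return t->holds_xprt_lock && t->xprt != NULL && t->xprt->swapper;
}
```

The second branch is the key change in reasoning: an ordinary read that holds the transport lock on a swap xprt needs the same allocation guarantees as a swap write, because it stands in the way of one.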
    • NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS · 89c2be8a
      NeilBrown authored
      NFS_RPC_SWAPFLAGS is only used for READ requests.
      It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
      requests.  This is not needed for swap READ - though it is for writes
      where it is set via a different mechanism.
      
      RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
      This is not needed as the root credential is saved when the swap file is
      opened, and this is used for all IO.
      
      So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
      RPC_TASK_ROOTCREDS, that isn't needed either.
      
      Remove both.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • SUNRPC: remove scheduling boost for "SWAPPER" tasks. · a80a8461
      NeilBrown authored
      Currently, tasks marked as "swapper" tasks get put to the front of
      non-priority rpc_queues, and are sorted earlier than non-swapper tasks on
      the transport's ->xmit_queue.
      
      This is pointless as currently *all* tasks for a mount that has swap
      enabled on *any* file are marked as "swapper" tasks.  So the net result
      is that the non-priority rpc_queues are reverse-ordered (LIFO).
      
      This scheduling boost is not necessary to avoid deadlocks, and hurts
      fairness, so remove it.  If there were a need to expedite some requests,
      the tk_priority mechanism is a more appropriate tool.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
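The LIFO effect described in the commit above is easy to reproduce with a toy queue. The sketch below is a hypothetical array-backed model, not the rpc_wait_queue implementation: boosted tasks are inserted at the head and all others are appended at the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_QUEUE_MAX 16

/* Hypothetical array-backed queue; index 0 is the next task to run. */
struct model_queue {
    int ids[MODEL_QUEUE_MAX];
    size_t len;
};

/* Old behaviour: boosted ("swapper") tasks jump to the head, others go to the tail. */
static void model_enqueue(struct model_queue *q, int id, bool swapper_boost)
{
    assert(q->len < MODEL_QUEUE_MAX);
    if (swapper_boost) {
        /* Shift existing entries up by one slot and insert at the front. */
        memmove(&q->ids[1], &q->ids[0], q->len * sizeof(q->ids[0]));
        q->ids[0] = id;
    } else {
        q->ids[q->len] = id;
    }
    q->len++;
}
```

If every task is boosted, as happens when all tasks on a swap-enabled mount carry the flag, tasks queued in the order 1, 2, 3 are served as 3, 2, 1. Without the boost the queue stays FIFO.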