1. 12 Jan, 2023 1 commit
  2. 11 Jan, 2023 5 commits
  3. 10 Jan, 2023 23 commits
  4. 09 Jan, 2023 9 commits
  5. 07 Jan, 2023 2 commits
    • David S. Miller's avatar
      Merge tag 'rxrpc-fixes-20230107' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 571f3dd0
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Fix race between call connection, data transmit and call disconnect
      
      Here are patches to fix an oops[1] caused by a race between call
      connection, initial packet transmission and call disconnection which
      results in something like:
      
              kernel BUG at net/rxrpc/peer_object.c:413!
      
      when the syzbot test is run.  The problem is that the connection procedure
      is effectively split across two threads and can get expanded by taking an
      interrupt, thereby adding the call to the peer error distribution list
      *after* it has been disconnected (say by the rxrpc socket shutting down).
      
      The easiest solution is to look at the fourth set of I/O thread
      conversion/SACK table expansion patches that didn't get applied[2] and take
      from it those patches that move call connection and disconnection into the
      I/O thread.  Moving these things into the I/O thread means that the
      sequencing is managed by all being done in the same thread - and the race
      can no longer happen.
      
      This is preferable to introducing an extra lock as adding an extra lock
      would make the I/O thread have to wait for the app thread in yet another
      place.
      
      The changes can be considered as a number of logical parts:
      
       (1) Move all of the call state changes into the I/O thread.
      
       (2) Make client connection ID space per-local endpoint so that the I/O
           thread doesn't need locks to access it.
      
       (3) Move actual abort generation into the I/O thread and clean it up.  If
           sendmsg or recvmsg want to cause an abort, they have to delegate it.
      
       (4) Offload the setting up of the security context on a connection to the
           thread of one of the apps that's starting a call.  We don't want to be
           doing any sort of crypto in the I/O thread.
      
       (5) Connect calls (ie. assign them to channel slots on connections) in the
           I/O thread.  Calls are set up by sendmsg/kafs and passed to the I/O
           thread to connect.  Connections are allocated in the I/O thread after
           this.
      
       (6) Disconnect calls in the I/O thread.
      
      I've also added a patch for an unrelated bug that cropped up during
      testing, whereby a race can occur between an incoming call and socket
      shutdown.
      
      Note that whilst this fixes the original syzbot bug, another bug may get
      triggered if this one is fixed:
      
              INFO: rcu detected stall in corrupted
              rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T
              rcu: blocking rcu_node structures (internal RCU debug):
      
      It doesn't look this should be anything to do with rxrpc, though, as I've
      tested an additional patch[3] that removes practically all the RCU usage
      from rxrpc and it still occurs.  It seems likely that it is being caused by
      something in the tunnelling setup that the syzbot test does, but there's
      not enough info to go on.  It also seems unlikely to be anything to do with
      the afs driver as the test doesn't use that.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      571f3dd0
    • David Howells's avatar
      rxrpc: Fix incoming call setup race · 42f229c3
      David Howells authored
      An incoming call can race with rxrpc socket destruction, leading to a
      leaked call.  This may result in an oops when the call timer eventually
      expires:
      
         BUG: kernel NULL pointer dereference, address: 0000000000000874
         RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50
         Call Trace:
          <IRQ>
          try_to_wake_up+0x59/0x550
          ? __local_bh_enable_ip+0x37/0x80
          ? rxrpc_poke_call+0x52/0x110 [rxrpc]
          ? rxrpc_poke_call+0x110/0x110 [rxrpc]
          ? rxrpc_poke_call+0x110/0x110 [rxrpc]
          call_timer_fn+0x24/0x120
      
      with a warning in the kernel log looking something like:
      
         rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)!
      
      incurred during rmmod of rxrpc.  The 1061d is the call flags:
      
         RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED,
         IS_SERVICE, RELEASED
      
      but no DISCONNECTED flag (0x800), so it's an incoming (service) call and
      it's still connected.
      
      The race appears to be that:
      
       (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state
           and allocates a call - then pauses, possibly for an interrupt.
      
       (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer,
           discards the prealloc and releases all calls attached to the socket.
      
       (3) rxrpc_new_incoming_call() resumes, launching the new call, including
           its timer and attaching it to the socket.
      
      Fix this by read-locking local->services_lock to access the AF_RXRPC socket
      providing the service rather than RCU in rxrpc_new_incoming_call().
      There's no real need to use RCU here as local->services_lock is only
      write-locked by the socket side in two places: when binding and when
      shutting down.
      
      Fixes: 5e6ef4f1 ("rxrpc: Make the I/O thread take over the call and local processor work")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: linux-afs@lists.infradead.org
      42f229c3