• Wen Gu's avatar
    net/smc: Forward wakeup to smc socket waitqueue after fallback · 341adeec
    Wen Gu authored
    When we replace TCP with SMC and a fallback occurs, there may be
    some socket waitqueue entries remaining in smc socket->wq, such
    as eppoll_entries inserted by userspace applications.
    
    After the fallback, data flows over TCP/IP and only clcsocket->wq
    will be woken up. Applications can't be notified by the entries
    which were inserted in smc socket->wq before fallback. So we need
    a mechanism to wake up smc socket->wq at the same time if some
    entries remaining in it.
    
    The current workaround is to transfer the entries from smc socket->wq
    to clcsock->wq during the fallback. But this may cause a crash
    like this:
    
     general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
     CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E     5.16.0+ #107
     RIP: 0010:__wake_up_common+0x65/0x170
     Call Trace:
      <IRQ>
      __wake_up_common_lock+0x7a/0xc0
      sock_def_readable+0x3c/0x70
      tcp_data_queue+0x4a7/0xc40
      tcp_rcv_established+0x32f/0x660
      ? sk_filter_trim_cap+0xcb/0x2e0
      tcp_v4_do_rcv+0x10b/0x260
      tcp_v4_rcv+0xd2a/0xde0
      ip_protocol_deliver_rcu+0x3b/0x1d0
      ip_local_deliver_finish+0x54/0x60
      ip_local_deliver+0x6a/0x110
      ? tcp_v4_early_demux+0xa2/0x140
      ? tcp_v4_early_demux+0x10d/0x140
      ip_sublist_rcv_finish+0x49/0x60
      ip_sublist_rcv+0x19d/0x230
      ip_list_rcv+0x13e/0x170
      __netif_receive_skb_list_core+0x1c2/0x240
      netif_receive_skb_list_internal+0x1e6/0x320
      napi_complete_done+0x11d/0x190
      mlx5e_napi_poll+0x163/0x6b0 [mlx5_core]
      __napi_poll+0x3c/0x1b0
      net_rx_action+0x27c/0x300
      __do_softirq+0x114/0x2d2
      irq_exit_rcu+0xb4/0xe0
      common_interrupt+0xba/0xe0
      </IRQ>
      <TASK>
    
    The crash is caused by privately transferring waitqueue entries from
    smc socket->wq to clcsock->wq. The owners of these entries, such as
    epoll, have no idea that the entries have been transferred to a
    different socket wait queue and still use original waitqueue spinlock
    (smc socket->wq.wait.lock) to make the entries operation exclusive,
    but it doesn't work. The operations to the entries, such as removing
    from the waitqueue (now is clcsock->wq after fallback), may cause a
    crash when clcsock waitqueue is being iterated over at the moment.
    
    This patch tries to fix this by no longer transferring wait queue
    entries privately, but introducing own implementations of clcsock's
    callback functions in fallback situation. The callback functions will
    forward the wakeup to smc socket->wq if clcsock->wq is actually woken
    up and smc socket->wq has remaining entries.
    
    Fixes: 2153bd1e ("net/smc: Transfer remaining wait queue entries during fallback")
    Suggested-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
    Acked-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    341adeec
af_smc.c 79.1 KB