• Marek Vasut's avatar
    net: ks8851: Handle softirqs at the end of IRQ thread to fix hang · be0384bf
    Marek Vasut authored
    The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
    any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
    implementation is guarded by local_bh_disable() and local_bh_enable().
    The local_bh_enable() may call do_softirq() to run softirqs in case
    any are pending. One of the softirqs is net_rx_action, which ultimately
    reaches the driver .start_xmit callback. If that happens, the system
    hangs. The entire call chain is below:
    
    ks8851_start_xmit_par from netdev_start_xmit
    netdev_start_xmit from dev_hard_start_xmit
    dev_hard_start_xmit from sch_direct_xmit
    sch_direct_xmit from __dev_queue_xmit
    __dev_queue_xmit from __neigh_update
    __neigh_update from neigh_update
    neigh_update from arp_process.constprop.0
    arp_process.constprop.0 from __netif_receive_skb_one_core
    __netif_receive_skb_one_core from process_backlog
    process_backlog from __napi_poll.constprop.0
    __napi_poll.constprop.0 from net_rx_action
    net_rx_action from __do_softirq
    __do_softirq from call_with_stack
    call_with_stack from do_softirq
    do_softirq from __local_bh_enable_ip
    __local_bh_enable_ip from netif_rx
    netif_rx from ks8851_irq
    ks8851_irq from irq_thread_fn
    irq_thread_fn from irq_thread
    irq_thread from kthread
    kthread from ret_from_fork
    
    The hang happens because ks8851_irq() first locks a spinlock in
    ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
    and with that spinlock locked, calls netif_rx(). Once the execution
    reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
    which attempts to claim the already locked spinlock again, and the
    hang happens.
    
    Move the do_softirq() call outside of the spinlock protected section
    of ks8851_irq() by disabling BHs around the entire spinlock protected
    section of ks8851_irq() handler. Place local_bh_enable() outside of
    the spinlock protected section, so that it can trigger do_softirq()
    without the ks8851_par.c ks8851_lock_par() spinlock being held, and
    safely call ks8851_start_xmit_par() without attempting to lock the
    already locked spinlock.
    
    Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
    now, replace netif_rx() with __netif_rx() which is not duplicating the
    local_bh_disable()/local_bh_enable() calls.
    
    Fixes: 797047f8 ("net: ks8851: Implement Parallel bus operations")
    Signed-off-by: default avatarMarek Vasut <marex@denx.de>
    Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    be0384bf
ks8851_common.c 30.7 KB