    net: don't let netpoll invoke NAPI if in xmit context · 275b471e
    Jakub Kicinski authored
    Commit 0db3dc73 ("[NETPOLL]: tx lock deadlock fix") narrowed
    down the region under netif_tx_trylock() inside netpoll_send_skb().
    (At that point in time netif_tx_trylock() would lock all queues of
    the device.) Taking the tx lock was problematic because the driver's
    cleanup method may take the same lock. So the change made us hold
    the xmit lock only around xmit, and expected the driver to take
    care of locking within ->ndo_poll_controller().
    
    Unfortunately this only works if netpoll isn't itself called with
    the xmit lock already held. Netpoll code is careful and uses
    trylock(). The drivers, however, may be using plain lock().
    Printing while holding the xmit lock is going to result in rare
    deadlocks.
    
    Luckily we record the xmit lock owners, so we can scan all the queues,
    the same way we scan NAPI owners. If any of the xmit locks is held
    by the local CPU, we had better not attempt any polling.
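
The scan described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code from this commit: `struct model_txq`, `MODEL_NO_OWNER`, `model_local_xmit_active()` and `model_may_poll()` are hypothetical names, and the CPU id is passed in explicitly instead of being read from the running CPU.

```c
#include <stddef.h>

/* Hypothetical marker meaning "no CPU currently owns this xmit lock". */
#define MODEL_NO_OWNER (-1)

/* Simplified model of one TX queue: only the recorded lock owner is kept. */
struct model_txq {
    int xmit_lock_owner; /* id of the CPU holding the lock, or MODEL_NO_OWNER */
};

/* Return 1 if any queue's xmit lock is recorded as held by `cpu`, else 0. */
static int model_local_xmit_active(const struct model_txq *queues,
                                   size_t nqueues, int cpu)
{
    for (size_t i = 0; i < nqueues; i++) {
        if (queues[i].xmit_lock_owner == cpu)
            return 1;
    }
    return 0;
}

/*
 * Polling is attempted only when the local CPU holds none of the xmit
 * locks; otherwise a driver that takes a plain lock() could deadlock
 * against the lock this CPU already holds.
 */
static int model_may_poll(const struct model_txq *queues,
                          size_t nqueues, int cpu)
{
    return !model_local_xmit_active(queues, nqueues, cpu);
}
```

In this model the check covers every queue of the device, which mirrors the limitation the message mentions: it is not narrowed to the one queue being used.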
    
    It would be nice if we could narrow down the check to only the NAPIs
    and the queue we're trying to use. I don't see a way to do that now.

    Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
    Fixes: 0db3dc73 ("[NETPOLL]: tx lock deadlock fix")
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>