    dump_stack: avoid potential deadlocks · e31e4672
    Eric Dumazet authored
    commit d7ce3692 upstream.
    
    Some servers experienced fatal deadlocks because a combination of
    bugs led multiple CPUs to call dump_stack().
    
    The checksumming bug was fixed in commit 34ae6a1a ("ipv6: update
    skb->csum when CE mark is propagated").
    
    The second problem is faulty locking in dump_stack().
    
    CPU1 runs in process context and calls dump_stack(), grabs dump_lock.
    
       CPU2 receives a TCP packet under softirq, grabs the socket spinlock,
       and calls dump_stack() from netdev_rx_csum_fault().
    
       dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since
       dump_lock is owned by CPU1.
    
    While dumping its stack, CPU1 is interrupted by a softirq, and happens
    to process a packet for the TCP socket locked by CPU2.
    
    CPU1 spins forever in spin_lock(): deadlock.
    
    The stack trace on CPU1 looked like this:
    
        NMI backtrace for cpu 1
        RIP: _raw_spin_lock+0x25/0x30
        ...
        Call Trace:
          <IRQ>
          tcp_v6_rcv+0x243/0x620
          ip...