• Chengfeng Ye's avatar
    scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock · 1a197555
    Chengfeng Ye authored
    There is a long call chain that &fip->ctlr_lock is acquired by isr
    fnic_isr_msix_wq_copy() under hard IRQ context. Thus other process context
    code acquiring the lock should disable IRQ, otherwise deadlock could happen
    if the IRQ preempts the execution while the lock is held in process context
    on the same CPU.
    
    [ISR]
    fnic_isr_msix_wq_copy()
     -> fnic_wq_copy_cmpl_handler()
     -> fnic_fcpio_cmpl_handler()
     -> fnic_fcpio_flogi_reg_cmpl_handler()
     -> fnic_flush_tx()
     -> fnic_send_frame()
     -> fcoe_ctlr_els_send()
     -> spin_lock_bh(&fip->ctlr_lock)
    
    [Process Context]
    1. fcoe_ctlr_timer_work()
     -> fcoe_ctlr_flogi_send()
     -> spin_lock_bh(&fip->ctlr_lock)
    
    2. fcoe_ctlr_recv_work()
     -> fcoe_ctlr_recv_handler()
     -> fcoe_ctlr_recv_els()
     -> fcoe_ctlr_announce()
     -> spin_lock_bh(&fip->ctlr_lock)
    
    3. fcoe_ctlr_recv_work()
     -> fcoe_ctlr_recv_handler()
     -> fcoe_ctlr_recv_els()
     -> fcoe_ctlr_flogi_retry()
     -> spin_lock_bh(&fip->ctlr_lock)
    
    4. -> fcoe_xmit()
     -> fcoe_ctlr_els_send()
     -> spin_lock_bh(&fip->ctlr_lock)
    
    spin_lock_bh() is not enough since fnic_isr_msix_wq_copy() is a
    hardirq.
    
    These flaws were found by an experimental static analysis tool I am
    developing for irq-related deadlock.
    
    The patch fix the potential deadlocks by spin_lock_irqsave() to disable
    hard irq.
    
    Fixes: 794d98e7 ("[SCSI] libfcoe: retry rejected FLOGI to another FCF if possible")
    Signed-off-by: default avatarChengfeng Ye <dg573847474@gmail.com>
    Link: https://lore.kernel.org/r/20230817074708.7509-1-dg573847474@gmail.comReviewed-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    1a197555
fcoe_ctlr.c 87.8 KB