    fcoe: Fix deadlock while deleting FCoE interface with NPIV ports · 94aa743a
    Neerav Parikh authored
    This patch fixes the following deadlock, caused by destroying an FCoE
    interface with active NPIV ports on that interface: fcoe_destroy_work()
    holds fcoe_config_mutex while fc_remove_host() waits for the fc_host
    work queue to drain, but fcoe_vport_destroy(), running from a work item
    on that same queue, is itself blocked waiting for fcoe_config_mutex.
    
        Call Trace:
        [<ffffffff814b7e88>] schedule+0x64/0x66
        [<ffffffff814b6b4f>] schedule_timeout+0x36/0xe3
        [<ffffffff81070c55>] ? update_curr+0xd6/0x110
        [<ffffffff81071f6b>] ? hrtick_update+0x1b/0x4d
        [<ffffffff81072405>] ? dequeue_task_fair+0x1ca/0x1d9
        [<ffffffff8106a369>] ? need_resched+0x1e/0x28
        [<ffffffff814b7d14>] wait_for_common+0x9b/0xf1
        [<ffffffff8106e7be>] ? try_to_wake_up+0x1e0/0x1e0
        [<ffffffff814b7e22>] wait_for_completion+0x1d/0x1f
        [<ffffffff8105ae82>] flush_workqueue+0x116/0x2a1
        [<ffffffff8105b357>] drain_workqueue+0x66/0x14c
        [<ffffffff8105b8ef>] destroy_workqueue+0x1a/0xcf
        [<ffffffffa009211e>] fc_remove_host+0x154/0x17f [scsi_transport_fc]
        [<ffffffffa00edbb8>] fcoe_if_destroy+0x184/0x1c9 [fcoe]
        [<ffffffffa00edc28>] fcoe_destroy_work+0x2b/0x44 [fcoe]
        [<ffffffff8105a82a>] process_one_work+0x1a8/0x2a4
        [<ffffffffa00edbfd>] ? fcoe_if_destroy+0x1c9/0x1c9 [fcoe]
        [<ffffffff8105c396>] worker_thread+0x1db/0x268
        [<ffffffff810604a3>] ? wake_up_bit+0x2a/0x2a
        [<ffffffff8105c1bb>] ? manage_workers.clone.16+0x1f6/0x1f6
        [<ffffffff8105ffd6>] kthread+0x6f/0x77
        [<ffffffff814c0304>] kernel_thread_helper+0x4/0x10
        [<ffffffff8105ff67>] ? kthread_freezable_should_stop+0x4b/0x4b
    
        Call Trace:
        [<ffffffff814b7e88>] schedule+0x64/0x66
        [<ffffffff814b8041>] schedule_preempt_disabled+0xe/0x10
        [<ffffffff814b70a1>] __mutex_lock_common.clone.5+0x117/0x17a
        [<ffffffff814b7117>] __mutex_lock_slowpath+0x13/0x15
        [<ffffffff814b6f76>] mutex_lock+0x23/0x37
        [<ffffffff8125b890>] ? list_del+0x11/0x30
        [<ffffffffa00edc84>] fcoe_vport_destroy+0x43/0x5f [fcoe]
        [<ffffffffa009130a>] fc_vport_terminate+0x48/0x110 [scsi_transport_fc]
        [<ffffffffa00913ef>] fc_vport_sched_delete+0x1d/0x79 [scsi_transport_fc]
        [<ffffffff8105a82a>] process_one_work+0x1a8/0x2a4
        [<ffffffffa00913d2>] ? fc_vport_terminate+0x110/0x110 [scsi_transport_fc]
        [<ffffffff8105c396>] worker_thread+0x1db/0x268
        [<ffffffff8105c1bb>] ? manage_workers.clone.16+0x1f6/0x1f6
        [<ffffffff8105ffd6>] kthread+0x6f/0x77
        [<ffffffff814c0304>] kernel_thread_helper+0x4/0x10
        [<ffffffff8105ff67>] ? kthread_freezable_should_stop+0x4b/0x4b
        [<ffffffff814c0300>] ? gs_change+0x13/0x13
    
    A prior attempt to fix this issue is posted here:
    http://lists.open-fcoe.org/pipermail/devel/2012-October/012318.html
    or
    http://article.gmane.org/gmane.linux.scsi.open-fcoe.devel/11924
    
    Based on feedback from and discussion with Neil Horman, it appears the
    above patch leaves a window in which fcoe_vport_destroy() and
    fcoe_destroy_work() can race; that patch has therefore been withdrawn in
    favor of this one, which solves the same problem in a different way.
    
    With the current approach, instead of removing fcoe_config_mutex from
    the vport_delete callback, I've chosen to delete all the NPIV ports on a
    given root lport first, before continuing with the removal of the root
    lport itself, as sketched below.
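    
    A minimal sketch of that ordering in fcoe_destroy_work(), assuming the
    scsi_transport_fc vport bookkeeping (fc_host->vports, vport->peers,
    vport_delete_work, fc_host_work_q()) and the existing fcoe.c helpers
    fcoe_if_destroy()/fcoe_interface_cleanup(); the actual patch may differ
    in detail:
    
        static void fcoe_destroy_work(struct work_struct *work)
        {
                struct fcoe_port *port;
                struct fcoe_interface *fcoe;
                struct Scsi_Host *shost;
                struct fc_host_attrs *fc_host;
                struct fc_vport *vport, *next_vport;
                unsigned long flags;
    
                port = container_of(work, struct fcoe_port, destroy_work);
                shost = port->lport->host;
                fc_host = shost_to_fc_host(shost);
    
                /*
                 * Queue a delete for every NPIV port hanging off this root
                 * lport.  Vports already being created or deleted are left
                 * alone.
                 */
                spin_lock_irqsave(shost->host_lock, flags);
                list_for_each_entry_safe(vport, next_vport,
                                         &fc_host->vports, peers) {
                        if (vport->flags & (FC_VPORT_DEL | FC_VPORT_CREATING))
                                continue;
                        vport->flags |= FC_VPORT_DELETING;
                        queue_work(fc_host_work_q(shost),
                                   &vport->vport_delete_work);
                }
                spin_unlock_irqrestore(shost->host_lock, flags);
    
                /*
                 * Wait for the queued vport deletes to finish before taking
                 * fcoe_config_mutex, so fcoe_vport_destroy() can acquire
                 * the mutex without deadlocking against this work item.
                 */
                flush_workqueue(fc_host_work_q(shost));
    
                mutex_lock(&fcoe_config_mutex);
                fcoe = port->priv;
                fcoe_if_destroy(port->lport);
                fcoe_interface_cleanup(fcoe);
                mutex_unlock(&fcoe_config_mutex);
        }
    
    The vport_delete callback keeps taking fcoe_config_mutex as before; the
    ordering change alone breaks the wait-for-workqueue versus
    wait-for-mutex cycle shown in the traces above.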
    Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
    Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: Robert Love <robert.w.love@intel.com>