• Mika Westerberg's avatar
    bdi: Do not use freezable workqueue · a2b90f11
    Mika Westerberg authored
    A removable block device, such as NVMe or SSD connected over Thunderbolt
    can be hot-removed any time including when the system is suspended. When
    device is hot-removed during suspend and the system gets resumed, kernel
    first resumes devices and then thaws the userspace including freezable
    workqueues. What happens in that case is that the NVMe driver notices
    that the device is unplugged and removes it from the system. This ends
    up calling bdi_unregister() for the gendisk which then schedules
    wb_workfn() to be run one more time.
    
    However, since the bdi_wq is still frozen flush_delayed_work() call in
    wb_shutdown() blocks forever halting system resume process. User sees
    this as hang as nothing is happening anymore.
    
    Triggering sysrq-w reveals this:
    
      Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
      Call Trace:
       ? __schedule+0x2c5/0x630
       ? wait_for_completion+0xa4/0x120
       schedule+0x3e/0xc0
       schedule_timeout+0x1c9/0x320
       ? resched_curr+0x1f/0xd0
       ? wait_for_completion+0xa4/0x120
       wait_for_completion+0xc3/0x120
       ? wake_up_q+0x60/0x60
       __flush_work+0x131/0x1e0
       ? flush_workqueue_prep_pwqs+0x130/0x130
       bdi_unregister+0xb9/0x130
       del_gendisk+0x2d2/0x2e0
       nvme_ns_remove+0xed/0x110 [nvme_core]
       nvme_remove_namespaces+0x96/0xd0 [nvme_core]
       nvme_remove+0x5b/0x160 [nvme]
       pci_device_remove+0x36/0x90
       device_release_driver_internal+0xdf/0x1c0
       nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
       process_one_work+0x1c2/0x3f0
       worker_thread+0x48/0x3e0
       kthread+0x100/0x140
       ? current_work+0x30/0x30
       ? kthread_park+0x80/0x80
       ret_from_fork+0x35/0x40
    
    This is not limited to NVMes so exactly same issue can be reproduced by
    hot-removing SSD (over Thunderbolt) while the system is suspended.
    
    Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
    Reported-by: default avatarAceLan Kao <acelan.kao@canonical.com>
    Link: https://marc.info/?l=linux-kernel&m=138695698516487
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
    Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#tAcked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    a2b90f11
backing-dev.c 28.4 KB