1. 13 Jul, 2021 3 commits
    • nvme-pci: do not call nvme_dev_remove_admin from nvme_remove · 251ef6f7
      Casey Chen authored
      nvme_dev_remove_admin could free dev->admin_q and the admin_tagset
      while they are being accessed by nvme_dev_disable(), which can be called
      by nvme_reset_work via nvme_remove_dead_ctrl.
      
      Commit cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      intended to avoid requests being stuck on a removed controller by killing
      the admin queue. But the later fix c8e9e9b7 ("nvme-pci: unquiesce
      admin queue on shutdown"), together with nvme_dev_disable(dev, true)
      right before nvme_dev_remove_admin(), can dispatch requests and fail
      them early, so we don't need nvme_dev_remove_admin() any more.
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Casey Chen <cachen@purestorage.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: fix multiple races in nvme_setup_io_queues · e4b9852a
      Casey Chen authored
      The two paths below can overlap if a drive is powered off shortly
      after being powered on. There are multiple races in
      nvme_setup_io_queues() because shutdown_lock is not taken and the
      NVMEQ_ENABLED bit is used improperly.
      
      nvme_reset_work()                                nvme_remove()
        nvme_setup_io_queues()                           nvme_dev_disable()
        ...                                              ...
      A1  clear NVMEQ_ENABLED bit for admin queue          lock
          retry:                                       B1  nvme_suspend_io_queues()
      A2    pci_free_irq() admin queue                 B2  nvme_suspend_queue() admin queue
      A3    pci_free_irq_vectors()                         nvme_pci_disable()
      A4    nvme_setup_irqs();                         B3    pci_free_irq_vectors()
            ...                                            unlock
      A5    queue_request_irq() for admin queue
            set NVMEQ_ENABLED bit
            ...
            nvme_create_io_queues()
      A6      result = queue_request_irq();
              set NVMEQ_ENABLED bit
            ...
            fail to allocate enough IO queues:
      A7      nvme_suspend_io_queues()
              goto retry
      
      If B3 runs in between A1 and A2, it will crash if irqaction hasn't
      been freed by A2. B2 is supposed to free the admin queue IRQ, but it
      can't do so because A1 has already cleared the NVMEQ_ENABLED bit.
      
      Fix: combine A1 and A2 so the IRQ gets freed as soon as the
      NVMEQ_ENABLED bit gets cleared.
      
      Once the first race above is fixed, A2 could still race with B3 if A2
      is freeing the IRQ while B3 is checking irqaction. A3 could also race
      with B2 if B2 is freeing the IRQ while A3 is checking irqaction.
      
      Fix: A2 and A3 take lock for mutual exclusion.
      
      A3 could race with B3 since they could run free_msi_irqs() in parallel.
      
      Fix: A3 takes lock for mutual exclusion.
      
      A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
      interrupted by B3.
      
      Fix: A4 takes lock for mutual exclusion.
      
      If A5/A6 happen after B2/B1, B3 will crash since irqaction is not
      NULL: the IRQs were just allocated by A5/A6.
      
      Fix: hold the lock across queue_request_irq() and the setting of the
      NVMEQ_ENABLED bit.
      
      A7 could call pci_free_irq() for an IO queue while B3 is checking
      irqaction.
      
      Fix: A7 takes lock.
      
      nvme_dev->online_queues needs to be protected by shutdown_lock. Since
      it is not atomic, both paths could modify it using their own copy.
      Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com>
      Signed-off-by: Casey Chen <cachen@purestorage.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
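The first race described in the commit above (B3 running between A1 and A2) can be replayed deterministically in a small userspace model. This is only a sketch, not kernel code: `model_queue`, `model_suspend_queue`, `model_free_irq_vectors`, `interleave_split`, and `interleave_combined` are hypothetical names, the interleaving is written out by hand rather than produced by real threads, and a `false` return value stands in for the crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an NVMe queue: one flag and one IRQ action. */
struct model_queue {
    bool enabled;        /* models the NVMEQ_ENABLED bit */
    bool irq_registered; /* models a non-NULL irqaction */
};

/* Models B2: frees the IRQ only if the queue is still marked enabled. */
void model_suspend_queue(struct model_queue *q)
{
    if (!q->enabled)
        return;
    q->enabled = false;
    q->irq_registered = false;
}

/* Models B3: returns false (the "crash") if an IRQ action is still registered. */
bool model_free_irq_vectors(const struct model_queue *q)
{
    return !q->irq_registered;
}

/* Old ordering: A1 clears the bit, then B2 and B3 run before A2 frees the IRQ. */
bool interleave_split(void)
{
    struct model_queue q = { .enabled = true, .irq_registered = true };

    q.enabled = false;                 /* A1: bit cleared, IRQ still registered */
    model_suspend_queue(&q);           /* B2: does nothing, bit is already clear */
    return model_free_irq_vectors(&q); /* B3: finds the IRQ still registered */
}

/* Fixed ordering: A1 and A2 are one step, so the bit and the IRQ stay in sync. */
bool interleave_combined(void)
{
    struct model_queue q = { .enabled = true, .irq_registered = true };

    if (q.enabled) {                   /* A1 + A2 combined */
        q.enabled = false;
        q.irq_registered = false;
    }
    model_suspend_queue(&q);           /* B2: nothing left to free */
    return model_free_irq_vectors(&q); /* B3: safe */
}
```

In this model `interleave_split()` reports the failure and `interleave_combined()` does not. The remaining races in the commit need real mutual exclusion (shutdown_lock), which this single-threaded sketch does not attempt to show.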
    • nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE · 8b43ced6
      Prabhakar Kushwaha authored
      dev_get_by_name() finds a network device by name, but it also
      increments the device's reference count.
      
      If an nvme-tcp queue is present and the network device driver is
      removed before nvme_tcp, the following message is logged repeatedly:
      
        "kernel:unregister_netdevice: waiting for <eth> to become free. Usage count = 2"
      
      rmmod then hangs. A similar case arises during reboot/shutdown with
      an nvme-tcp queue present, and neither ever completes.
      
      To fix this, use __dev_get_by_name(), which finds a network device by
      name without incrementing any reference counter.
      
      Fixes: 3ede8f72 ("nvme-tcp: allow selecting the network interface for connections")
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: remove the ->ndev member entirely]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
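The reference-count difference behind the nvme-tcp commit above can be illustrated with a small userspace model. Everything here is hypothetical (`model_netdev`, `model_lookup`, `model_lookup_get`, `model_put`, `model_can_unregister`, and the single "eth0" entry); it only mimics the behavior the commit message describes, where one lookup takes a reference the caller must later drop and the other does not. In the kernel, the non-counting lookup additionally relies on the caller holding the appropriate lock, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a network device with a usage count. */
struct model_netdev {
    const char *name;
    int refcnt; /* starts at 1: the registry's own reference */
};

struct model_netdev model_devices[] = {
    { "eth0", 1 },
};

/* Models __dev_get_by_name(): find by name, do not touch the count. */
struct model_netdev *model_lookup(const char *name)
{
    size_t n = sizeof(model_devices) / sizeof(model_devices[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(model_devices[i].name, name) == 0)
            return &model_devices[i];
    }
    return NULL;
}

/* Models dev_get_by_name(): find by name and take a reference. */
struct model_netdev *model_lookup_get(const char *name)
{
    struct model_netdev *dev = model_lookup(name);

    if (dev)
        dev->refcnt++;
    return dev;
}

/* Models dropping a reference taken by model_lookup_get(). */
void model_put(struct model_netdev *dev)
{
    dev->refcnt--;
}

/* Unregistration can finish only when no extra references remain. */
bool model_can_unregister(const struct model_netdev *dev)
{
    return dev->refcnt == 1;
}
```

In this model, a caller that uses `model_lookup_get()` and never calls `model_put()` keeps `model_can_unregister()` false indefinitely, which corresponds to the "waiting for <eth> to become free" message quoted in the commit. A caller that uses `model_lookup()` leaves the count untouched.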
  2. 07 Jul, 2021 3 commits
  3. 05 Jul, 2021 1 commit
  4. 02 Jul, 2021 1 commit
  5. 01 Jul, 2021 5 commits
  6. 30 Jun, 2021 27 commits