    PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors · 085a9f43
    Hans de Goede authored
    Use down_read_nested() and down_write_nested() when taking the
    ctrl->reset_lock rwsem, passing the number of PCIe hotplug controllers in
    the path to the PCI root bus as the lock subclass parameter.
    
    This fixes the following false-positive lockdep report when unplugging a
    Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:
    
      pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
      pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
      ============================================
      WARNING: possible recursive locking detected
      5.16.0-rc2+ #621 Not tainted
      --------------------------------------------
      irq/124-pciehp/86 is trying to acquire lock:
      ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
    
      but task is already holding lock:
      ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
    
       other info that might help us debug this:
       Possible unsafe locking scenario:
    
    	 CPU0
    	 ----
        lock(&ctrl->reset_lock);
        lock(&ctrl->reset_lock);
    
       *** DEADLOCK ***
    
       May be due to missing lock nesting notation
    
      3 locks held by irq/124-pciehp/86:
       #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
       #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
       #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
    
      stack backtrace:
      CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
      Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
      Call Trace:
       <TASK>
       dump_stack_lvl+0x59/0x73
       __lock_acquire.cold+0xc5/0x2c6
       lock_acquire+0xb5/0x2b0
       down_read+0x3e/0x50
       pciehp_check_presence+0x23/0x80
       pciehp_runtime_resume+0x5c/0xa0
       device_for_each_child+0x45/0x70
       pcie_port_device_runtime_resume+0x20/0x30
       pci_pm_runtime_resume+0xa7/0xc0
       __rpm_callback+0x41/0x110
       rpm_callback+0x59/0x70
       rpm_resume+0x512/0x7b0
       __pm_runtime_resume+0x4a/0x90
       __device_release_driver+0x28/0x240
       device_release_driver+0x26/0x40
       pci_stop_bus_device+0x68/0x90
       pci_stop_bus_device+0x2c/0x90
       pci_stop_and_remove_bus_device+0xe/0x20
       pciehp_unconfigure_device+0x6c/0x110
       pciehp_disable_slot+0x5b/0xe0
       pciehp_handle_presence_or_link_change+0xc3/0x2f0
       pciehp_ist+0x179/0x180
    
    This lockdep warning is triggered because with Thunderbolt, hotplug ports
    are nested. When removing multiple devices in a daisy chain, a hotplug
    port's reset_lock may be acquired while the reset_lock of a port above it
    is already held. It's never the same lock instance (note the two different
    lock addresses in the report above), so the lockdep splat is a false
    positive.
    
    Because locks at the same hierarchy level are never acquired recursively, a
    per-level lockdep class is sufficient to fix the lockdep warning.
    
    The choice to use one lockdep subclass per PCIe hotplug controller in the
    path to the root bus was made to conserve class keys, because their number
    is limited and the complexity grows quadratically with the number of keys
    according to Documentation/locking/lockdep-design.rst.
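
The subclass computation described above can be sketched in plain userspace C. This is only an illustrative model, not the code from the patch: struct toy_port and toy_hotplug_depth() are hypothetical names, and the real driver walks the kernel's PCI bus hierarchy instead of a hand-built parent chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a PCIe port; not a kernel structure. */
struct toy_port {
	struct toy_port *parent;   /* upstream port, NULL at the root */
	bool is_hotplug_bridge;    /* true if this port has a hotplug controller */
};

/*
 * Count the hotplug-capable ports strictly above 'port' on the path to
 * the root. In this model, the result plays the role of the lock
 * subclass: ports at different nesting levels get different values.
 */
int toy_hotplug_depth(const struct toy_port *port)
{
	int depth = 0;

	assert(port != NULL);
	for (port = port->parent; port != NULL; port = port->parent) {
		if (port->is_hotplug_bridge)
			depth++;
	}
	return depth;
}
```

In the kernel, a value computed along these lines would be passed as the second argument of down_read_nested() or down_write_nested(). A hotplug port nested below another hotplug port then takes its reset_lock with a different subclass than its ancestor, so lockdep no longer treats the nested acquisition as recursive.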
    
    Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
    Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
    Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
    Reported-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Lukas Wunner <lukas@wunner.de>
    Cc: stable@vger.kernel.org