    PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device · b440bde7
    Bjorn Helgaas authored
    Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
    normally generates a hot-remove event that unbinds the driver.
    
    Some drivers expect to remain bound to a device even while they power it
    off and back on again.  This can be dangerous, because if the device is
    removed or replaced while it is powered off, the driver doesn't know that
    anything changed.  But some drivers accept that risk.
    
    Add pci_ignore_hotplug() for use by drivers that know their device cannot
    be removed.  Using pci_ignore_hotplug() tells the PCI core that hot-plug
    events for the device should be ignored.
    
    The radeon and nouveau drivers use this to switch between a low-power,
    integrated GPU and a higher-power, higher-performance discrete GPU.  They
    power off the unused GPU, but they want to remain bound to it.
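
    The opt-out described above can be modelled in user space.  The sketch
    below is only an illustration under stated assumptions: every mock_*
    name is hypothetical, nothing here is kernel code, and the real helper
    operates on struct pci_dev inside the PCI core.  It shows a per-device
    flag that the hot-remove path checks before unbinding the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical power states for the model (not the kernel's pci_power_t). */
enum mock_power_state { MOCK_D0, MOCK_D3COLD };

/* Hypothetical user-space stand-in for a PCI device. */
struct mock_pci_dev {
    bool driver_bound;               /* a driver is currently bound */
    unsigned int ignore_hotplug : 1; /* set: hotplug events are ignored */
    enum mock_power_state state;     /* current power state */
};

/* Model of the opt-out helper: the driver asserts its device cannot be
 * removed, so hotplug events for it should be ignored. */
void mock_pci_ignore_hotplug(struct mock_pci_dev *dev)
{
    assert(dev != NULL);
    dev->ignore_hotplug = 1;
}

/* Model of hot-remove handling.  Returns true if the driver was unbound,
 * false if the event was ignored because the driver opted out. */
bool mock_hot_remove_event(struct mock_pci_dev *dev)
{
    assert(dev != NULL);
    if (dev->ignore_hotplug)
        return false;
    dev->driver_bound = false;
    return true;
}

/* Model of a GPU driver powering off the unused GPU.  In this model the
 * power-off always produces a hot-remove event, as the commit describes. */
void mock_gpu_power_off(struct mock_pci_dev *dev, bool opt_out)
{
    assert(dev != NULL);
    if (opt_out)
        mock_pci_ignore_hotplug(dev);
    dev->state = MOCK_D3COLD;
    (void)mock_hot_remove_event(dev);
}
```

    In this model, calling mock_gpu_power_off() with opt_out set leaves
    driver_bound true after the device enters MOCK_D3COLD; without the
    opt-out, the same power-off unbinds the driver.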
    
    This is a reimplementation of f244d8b6 ("ACPIPHP / radeon / nouveau:
    Fix VGA switcheroo problem related to hotplug") but extends it to work with
    both acpiphp and pciehp.
    
    This fixes problems on dual GPU systems where the radeon driver becomes
    unusable, freezing every few seconds (see bugzillas below).  The failure
    can occur while suspending the device, as in bug 79701:
    
        [drm] radeon: finishing device.
        radeon 0000:01:00.0: Userspace still has active objects !
        radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free
        ...
        WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
        trying to unbind memory from uninitialized GART !
    
    or while resuming it, as in bug 77261:
    
        radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
        radeon 0000:01:00.0: GPU lockup ...
        radeon 0000:01:00.0: GPU pci config reset
        pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
        radeon 0000:01:00.0: GPU reset succeeded, trying to resume
        *ERROR* radeon: dpm resume failed
        radeon 0000:01:00.0: Wait for MC idle timedout !
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701
    Reported-by: Shawn Starr <shawn.starr@rogers.com>
    Reported-by: Jose P. <lbdkmjdf@sharklasers.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Acked-by: Rajat Jain <rajatxjain@gmail.com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Dave Airlie <airlied@redhat.com>
    CC: stable@vger.kernel.org	# v3.15+
nouveau_drm.c 28.1 KB