    PCI/PM: Extend D3hot delay for NVIDIA HDA controllers · a5a6dd26
    Alex Williamson authored
    Assignment of NVIDIA Ampere-based GPUs has regressed since the commit
    referenced below: the reduced D3hot transition delay appears to open a
    small window in which a D3hot->D0 transition followed by a bus reset can
    wedge the device.  The entire device is subsequently unavailable, returns
    -1 on config space reads, and is unrecoverable without a host reset.
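
A minimal sketch of what "returning -1 on config space read" looks like to
software: a read from a device that no longer responds comes back as all
ones. The helper names below are hypothetical and this is plain userspace C
for illustration, not the kernel's own accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 32-bit config space read came back as all ones (i.e. -1). */
static bool sketch_config_read_failed(uint32_t value)
{
    return value == UINT32_MAX;
}

/*
 * The low 16 bits of config dword 0 hold the vendor ID. 0xffff is not a
 * valid vendor ID, so seeing it there suggests the device is not responding.
 */
static bool sketch_device_unreachable(uint32_t id_dword)
{
    return (id_dword & 0xffffu) == 0xffffu;
}
```

A wedged function as described above would fail both checks on every read
until the host is reset.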
    
    This has been observed with RTX A2000 and A5000 GPU and audio functions
    assigned to a Windows VM, where shutdown of the VM places the devices in
    D3hot prior to vfio-pci performing a bus reset when userspace releases the
    devices.  The issue has roughly a 2-3% chance of occurring per shutdown.
    
    Restoring the HDA controller d3hot_delay to the effective value before the
    below commit has been shown to resolve the issue.  NVIDIA confirms this
    change should be safe for all of their HDA controllers.
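
The shape of such a quirk can be sketched as below. This is a self-contained
userspace model, not the patch itself: the struct, the `sketch_` names, and
the 20 ms value are assumptions made for illustration (the message above only
says the delay is restored to its earlier effective value). The NVIDIA vendor
ID (0x10de), the HD audio class code (0x0403), and the 10 ms default D3hot
delay are standard values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the fields used here. */
struct pci_dev_sketch {
    unsigned short vendor;
    unsigned int class_code;  /* 24 bits: base class, subclass, prog-if */
    unsigned int d3hot_delay; /* ms to wait after a D3hot <-> D0 transition */
};

#define SKETCH_VENDOR_NVIDIA          0x10de
#define SKETCH_CLASS_HD_AUDIO         0x0403 /* base class + subclass */
#define SKETCH_DEFAULT_D3HOT_DELAY_MS 10
#define SKETCH_NVIDIA_HDA_DELAY_MS    20     /* assumed value, see lead-in */

/* Raise the per-device delay to at least delay_ms; never lower it. */
static void sketch_d3hot_delay(struct pci_dev_sketch *dev, unsigned int delay_ms)
{
    assert(dev != NULL);
    if (dev->d3hot_delay < delay_ms)
        dev->d3hot_delay = delay_ms;
}

/* Apply the longer delay to every NVIDIA HD audio function, and only those. */
static void sketch_quirk_nvidia_hda_pm(struct pci_dev_sketch *dev)
{
    assert(dev != NULL);
    if (dev->vendor == SKETCH_VENDOR_NVIDIA &&
        (dev->class_code >> 8) == SKETCH_CLASS_HD_AUDIO)
        sketch_d3hot_delay(dev, SKETCH_NVIDIA_HDA_DELAY_MS);
}
```

Matching on vendor and class rather than on individual device IDs mirrors the
statement above that the change should be safe for all NVIDIA HDA controllers.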
    
    Fixes: 3e347969 ("PCI/PM: Reduce D3hot delay with usleep_range()")
    Link: https://lore.kernel.org/r/20230413194042.605768-1-alex.williamson@redhat.com
    Reported-by: Zhiyi Guo <zhguo@redhat.com>
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Tarun Gupta <targupta@nvidia.com>
    Cc: Abhishek Sahu <abhsahu@nvidia.com>
    Cc: Tarun Gupta <targupta@nvidia.com>
quirks.c 215 KB