    vfio/pci: Check the device set open count on reset · e806e223
    Anthony DeRossi authored
    vfio_pci_dev_set_needs_reset() inspects the open_count of every device
    in the set to determine whether a reset is allowed. The current device
    always has open_count == 1 within vfio_pci_core_disable(), effectively
    disabling the reset logic. This field is also documented as private in
    vfio_device, so it should not be used to determine whether other devices
    in the set are open.
    
    Checking for vfio_device_set_open_count() > 1 on the device set fixes
    both issues.
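
The difference between the two checks can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct model_device`, `struct model_dev_set`, `model_set_open_count()`, `needs_reset_old()` and `needs_reset_new()` are hypothetical names, and the set-level count is assumed to be the sum of the per-device open counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct model_device {
	unsigned int open_count;	/* per-device count, private in the real vfio_device */
	bool needs_reset;
};

struct model_dev_set {
	struct model_device *devs;
	size_t ndevs;
};

/* Assumed behavior of a set-level count: total opens across the set. */
static unsigned int model_set_open_count(const struct model_dev_set *set)
{
	unsigned int total = 0;

	assert(set != NULL);
	for (size_t i = 0; i < set->ndevs; i++)
		total += set->devs[i].open_count;
	return total;
}

/*
 * Old-style check: bail out if any device is open. The device being
 * disabled still has open_count == 1, so this never allows a reset.
 */
static bool needs_reset_old(const struct model_dev_set *set)
{
	bool needs_reset = false;

	for (size_t i = 0; i < set->ndevs; i++) {
		if (set->devs[i].open_count)
			return false;
		needs_reset |= set->devs[i].needs_reset;
	}
	return needs_reset;
}

/*
 * New-style check: tolerate exactly one open (the current device) and
 * refuse only when something else in the set is also open.
 */
static bool needs_reset_new(const struct model_dev_set *set)
{
	bool needs_reset = false;

	if (model_set_open_count(set) > 1)
		return false;

	for (size_t i = 0; i < set->ndevs; i++)
		needs_reset |= set->devs[i].needs_reset;
	return needs_reset;
}
```

With a two-device set where only the current device is open and flagged for reset, the old-style check refuses the reset while the new-style check allows it. Once the second device is also open, the new-style check refuses as well.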
    
    After commit 2cd8b14a ("vfio/pci: Move to the device set
    infrastructure"), failure to create a new file for a device would cause
    the reset to be skipped due to open_count being decremented after
    calling close_device() in the error path.
    
    After commit eadd86f8 ("vfio: Remove calls to
    vfio_group_add_container_user()"), releasing a device would always skip
    the reset due to an ordering change in vfio_device_fops_release().
    
    Failing to reset the device leaves it in an unknown state, potentially
    causing errors when it is accessed later or bound to a different driver.
    
    This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
    assigned to a Windows guest. After shutting down the guest, unbinding
    the device from vfio-pci, and binding the device to amdgpu:
    
    [  548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
    [  548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
    [  548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
    [  548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
    [  548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init
    
    Fixes: 2cd8b14a ("vfio/pci: Move to the device set infrastructure")
    Fixes: eadd86f8 ("vfio: Remove calls to vfio_group_add_container_user()")
    Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20221110014027.28780-4-ajderossi@gmail.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_pci_core.c 67.4 KB