• Dexuan Cui's avatar
    PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary · 340d4556
    Dexuan Cui authored
    When we hot-remove a device, usually the host sends us a PCI_EJECT message,
    and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
    
    When we execute the quick hot-add/hot-remove test, the host may not send
    us the PCI_EJECT message if the guest has not fully finished the
    initialization by sending the PCI_RESOURCES_ASSIGNED* message to the
    host, so it's potentially unsafe to only depend on the
    pci_destroy_slot() in hv_eject_device_work() because the code path
    
    create_root_hv_pci_bus()
     -> hv_pci_assign_slots()
    
    is not called in this case. Note: in this case, the host still sends the
    guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
    
    In the quick hot-add/hot-remove test, we can have such a race before
    the code path
    
    pci_devices_present_work()
     -> new_pcichild_device()
    
    adds the new device into the hbus->children list, we may have already
    received the PCI_EJECT message, and since the tasklet handler
    
    hv_pci_onchannelcallback()
    
    may fail to find the "hpdev" by calling
    
    get_pcichild_wslot(hbus, dev_message->wslot.slot)
    
    hv_pci_eject_device() is not called; Later, by continuing execution
    
    create_root_hv_pci_bus()
     -> hv_pci_assign_slots()
    
    creates the slot and the PCI_BUS_RELATIONS message with
    bus_rel->device_count == 0 removes the device from hbus->children, and
    we end up being unable to remove the slot in
    
    hv_pci_remove()
     -> hv_pci_remove_slots()
    
    Remove the slot in pci_devices_present_work() when the device
    is removed to address this race.
    
    pci_devices_present_work() and hv_eject_device_work() run in the
    singled-threaded hbus->wq, so there is not a double-remove issue for the
    slot.
    
    We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback()
    to the workqueue, because we need the hv_pci_onchannelcallback()
    synchronously call hv_pci_eject_device() to poll the channel
    ringbuffer to work around the "hangs in hv_compose_msi_msg()" issue
    fixed in commit de0aa7b2 ("PCI: hv: Fix 2 hang issues in
    hv_compose_msi_msg()")
    
    Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
    Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
    [lorenzo.pieralisi@arm.com: rewritten commit log]
    Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
    Reviewed-by: default avatarStephen Hemminger <stephen@networkplumber.org>
    Reviewed-by: default avatarMichael Kelley <mikelley@microsoft.com>
    Cc: stable@vger.kernel.org
    340d4556
pci-hyperv.c 74.5 KB