- 15 Mar, 2022 8 commits
-
-
Longfang Liu authored
VMs assigned with HiSilicon ACC VF devices can now perform live migration if the VF devices are bound to the hisi_acc_vfio_pci driver. Just like the ACC PF/VF drivers, this VFIO driver also makes use of the HiSilicon QM interface. QM stands for Queue Management, which is a generic IP used by ACC devices. It provides a generic PCIe interface for the CPU and the ACC devices to share a group of queues. QM integrated into an accelerator provides queue management service. Queues can be assigned to PF and VFs, and queues can be controlled by unified mailboxes and doorbells. The QM driver (drivers/crypto/hisilicon/qm.c) provides generic interfaces to ACC drivers to manage the QM. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-9-shameerali.kolothum.thodi@huawei.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Longfang Liu authored
We use the VF QM state register to record the status of the QM configuration state. This will be used in the ACC migration driver to determine whether we can safely save and restore the QM data. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Acked-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-8-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Shameer Kolothum authored
The struct pci_driver pointer is an input to pci_iov_get_pf_drvdata(). Introduce helpers to retrieve the ACC PF dev struct pci_driver pointers, as we use them in the ACC vfio migration driver. Acked-by: Zhou Wang <wangzhou1@hisilicon.com> Acked-by: Kai Ye <yekai13@huawei.com> Acked-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-7-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Shameer Kolothum authored
HiSilicon ACC VF device BAR2 region consists of both functional register space and migration control register space. Unnecessarily exposing the migration BAR region to the Guest has the potential to prevent/corrupt the Guest migration. Hence, introduce a separate struct vfio_device_ops for migration support which will override the ioctl/read/write/mmap methods to hide the migration region and limit the Guest access only to the functional register space. This will be used in subsequent patches when we add migration support to the driver. Please note that it is OK to export the entire VF BAR if migration is not supported or required as this cannot affect the PF configurations. Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220308184902.2242-6-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
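The idea of limiting Guest access to the functional register space can be illustrated with a small user-space sketch. It is not the driver's code: the function name `clamp_bar_access` and the parameter `functional_end` (the first byte offset that belongs to the migration control space) are hypothetical, and whether the real driver trims or rejects an access that straddles the boundary is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a BAR access check.
 * pos            - byte offset of the access inside the BAR
 * count          - number of bytes requested
 * functional_end - first offset that is NOT functional register space
 *
 * Returns the number of bytes the access may touch, or -1 when the
 * access starts inside the hidden migration control region.
 */
static long clamp_bar_access(uint64_t pos, size_t count, uint64_t functional_end)
{
    if (pos >= functional_end)
        return -1;                      /* entirely in the hidden region */

    if (count > functional_end - pos)   /* trim an access that straddles the boundary */
        count = (size_t)(functional_end - pos);

    return (long)count;
}
```

With a 0x1000-byte functional window, a 16-byte access at offset 0xff8 would be trimmed to 8 bytes, and any access starting at 0x1000 or beyond would be refused.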
-
Shameer Kolothum authored
Add a vendor-specific vfio_pci driver for HiSilicon ACC devices. This will be extended in subsequent patches to add support for the VFIO live migration feature. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220308184902.2242-5-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Shameer Kolothum authored
Move the PCI Device IDs of HiSilicon ACC VF devices to a common header and also use a uniform naming convention. This will be useful when we introduce the vfio PCI HiSilicon ACC live migration driver in subsequent patches. Cc: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Zhou Wang <wangzhou1@hisilicon.com> Acked-by: Longfang Liu <liulongfang@huawei.com> Acked-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Link: https://lore.kernel.org/r/20220308184902.2242-4-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Longfang Liu authored
Move Doorbell and Mailbox definitions to a common header file. Also export QM mailbox functions. This will be useful when we introduce the VFIO PCI HiSilicon ACC live migration driver. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Acked-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-3-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Shameer Kolothum authored
Since we are going to introduce the VFIO PCI HiSilicon ACC driver for live migration in subsequent patches, move the ACC QM header file to a common include dir. Acked-by: Zhou Wang <wangzhou1@hisilicon.com> Acked-by: Longfang Liu <liulongfang@huawei.com> Acked-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-2-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
- 09 Mar, 2022 1 commit
-
-
Yishai Hadas authored
Fix sparse warning to not use plain integer (i.e. 0) as NULL pointer. Reported-by: kernel test robot <lkp@intel.com> Fixes: 6fadb021 ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/202203090703.kxvZumJg-lkp@intel.com Link: https://lore.kernel.org/r/20220309080217.94274-1-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
- 07 Mar, 2022 1 commit
-
-
Leon Romanovsky authored
Replace "-" with ":" in the comment section to align with the kernel-doc format. drivers/pci/iov.c:67: warning: Function parameter or member 'dev' not described in 'pci_iov_get_pf_drvdata' drivers/pci/iov.c:67: warning: Function parameter or member 'pf_driver' not described in 'pci_iov_get_pf_drvdata' Fixes: a7e9f240 ("PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/8cecf7df45948a256dc56148cf9e87b2f2bb4198.1646652504.git.leonro@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
- 03 Mar, 2022 10 commits
-
-
Alex Williamson authored
Merge branches 'v5.18/vfio/next/mlx5-migration-v10', 'v5.18/vfio/next/pm-fixes' and 'v5.18/vfio/next/uml-build-fix' into v5.18/vfio/next/next
-
Alex Williamson authored
Merge tag 'mlx5-vfio-v10' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.18/vfio/next/mlx5-migration-v10

Add mlx5 live migration driver and v2 migration protocol

This series adds mlx5 live migration driver for VFs that are migration capable and includes the v2 migration protocol definition and mlx5 implementation.

The mlx5 driver uses the vfio_pci_core split to create a specific VFIO PCI driver that matches the mlx5 virtual functions. The driver provides the same experience as normal vfio-pci with the addition of migration support.

In HW the migration is controlled by the PF function, using its mlx5_core driver, and the VFIO PCI VF driver co-ordinates with the PF to execute the migration actions.

The bulk of the v2 migration protocol is semantically the same as v1, however it has been recast into a FSM for the device_state and the actual syscall interface uses normal ioctl(), read() and write() instead of building a syscall interface using the region.

Several bits of infrastructure work are included here:
  - pci_iov_vf_id() to help drivers like mlx5 figure out the VF index from a BDF
  - pci_iov_get_pf_drvdata() to clarify the tricky locking protocol when a VF reaches into its PF's driver
  - mlx5_core uses the normal SRIOV lifecycle and disables SRIOV before driver remove, to be compatible with pci_iov_get_pf_drvdata()
  - Lifting VFIO_DEVICE_FEATURE into core VFIO code

This series comes after a lot of discussion. Some major points:
  - v1 ABI compatible migration defined using the same FSM approach: https://lore.kernel.org/all/0-v1-a4f7cab64938+3f-vfio_mig_states_jgg@nvidia.com/
  - Attempts to clarify how the v1 API works: Alex's: https://lore.kernel.org/kvm/163909282574.728533.7460416142511440919.stgit@omen/ Jason's: https://lore.kernel.org/all/0-v3-184b374ad0a8+24c-vfio_mig_doc_jgg@nvidia.com/
  - Etherpad exploring the scope and questions of general VFIO migration: https://lore.kernel.org/kvm/87mtm2loml.fsf@redhat.com/

NOTE: As this series touched mlx5_core parts we need to send this in a pull request format to VFIO to avoid conflicts.

Matching qemu changes can be previewed here: https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2

Link: https://lore.kernel.org/all/20220224142024.147653-1-yishaih@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
Register its own handler for pci_error_handlers.reset_done and update state accordingly. Link: https://lore.kernel.org/all/20220224142024.147653-16-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
Expose vfio_pci_core_aer_err_detected() to be used by drivers as part of their pci_error_handlers structure. The next patch for the mlx5 driver will use it. Link: https://lore.kernel.org/all/20220224142024.147653-15-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
This patch adds support for vfio_pci driver for mlx5 devices. It uses vfio_pci_core to register to the VFIO subsystem and then implements the mlx5 specific logic in the migration area. The migration implementation follows the definition from uapi/vfio.h and uses the mlx5 VF->PF command channel to achieve it. This patch implements the suspend/resume flows. Link: https://lore.kernel.org/all/20220224142024.147653-14-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
Expose migration commands over the device, including: suspend, resume, get vhca id, query/save/load state. As part of this, add the APIs and data structures that are needed to manage the migration data. Link: https://lore.kernel.org/all/20220224142024.147653-13-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Jason Gunthorpe authored
v1 was never implemented and is replaced by v2. The old uAPI documentation is removed from the header file. The old uAPI definitions are still kept in the header file to ease transition for userspace copying these headers. They will be fully removed down the road. Link: https://lore.kernel.org/all/20220224142024.147653-12-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Jason Gunthorpe authored
The RUNNING_P2P state is designed to support multiple devices in the same VM that are doing P2P transactions between themselves. When in RUNNING_P2P the device must be able to accept incoming P2P transactions but should not generate outgoing P2P transactions. As an optional extension to the mandatory states it is defined as in between STOP and RUNNING: STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP For drivers that are unable to support RUNNING_P2P the core code silently merges RUNNING_P2P and RUNNING together. Unless driver support is present, the new state cannot be used in SET_STATE. Drivers that support this will be required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM arcs become combination transitions. Compared to the v1 clarification, NDMA is redefined into FSM states and is described in terms of the desired P2P quiescent behavior, noting that halting all DMA is an acceptable implementation. Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
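How RUNNING_P2P slots in between STOP and RUNNING, and how it disappears for drivers without support, might be modelled roughly as below. This is a user-space illustration only: the enum `p2p_state` and the function `p2p_path` are hypothetical names, and the model covers just the STOP / RUNNING_P2P / RUNNING chain quoted in the message, not the kernel's actual transition table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the three states ordered as in the commit message. */
enum p2p_state { P_STOP, P_RUNNING_P2P, P_RUNNING };

/*
 * Fill out[] with the states visited (after `from`) on the way to `to`
 * and return how many there are (at most 2).
 *
 * With has_p2p, STOP -> RUNNING becomes the combination transition
 * STOP -> RUNNING_P2P -> RUNNING. Without it, RUNNING_P2P is merged
 * into RUNNING, so it is never visited and cannot be requested.
 */
static size_t p2p_path(enum p2p_state from, enum p2p_state to,
                       bool has_p2p, enum p2p_state out[2])
{
    size_t n = 0;
    int step;

    if (!has_p2p && (from == P_RUNNING_P2P || to == P_RUNNING_P2P))
        return 0;                       /* state not usable without driver support */
    if (from == to)
        return 0;                       /* nothing to do */

    step = (to > from) ? 1 : -1;
    for (int s = (int)from + step; ; s += step) {
        if (has_p2p || s != P_RUNNING_P2P)
            out[n++] = (enum p2p_state)s;
        if (s == (int)to)
            break;
    }
    return n;
}
```

In this model a P2P-capable driver sees two arcs for STOP -> RUNNING, while a driver without support sees the single basic arc.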
-
Jason Gunthorpe authored
Replace the existing region based migration protocol with an ioctl based protocol. The two protocols have the same general semantic behaviors, but the way the data is transported is changed.

This is the STOP_COPY portion of the new protocol, it defines the 5 states for basic stop and copy migration and the protocol to move the migration data in/out of the kernel.

Compared to the clarification of the v1 protocol Alex proposed: https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen This has a few deliberate functional differences:
  - ERROR arcs allow the device function to remain unchanged.
  - The protocol is not required to return to the original state on transition failure. Instead userspace can execute an unwind back to the original state, reset, or do something else without needing kernel support. This simplifies the kernel design and should userspace choose a policy like always reset, avoids doing useless work in the kernel on error handling paths.
  - PRE_COPY is made optional, userspace must discover it before using it. This reflects the fact that the majority of drivers we are aware of right now will not implement PRE_COPY.
  - segmentation is not part of the data stream protocol, the receiver does not have to reproduce the framing boundaries.

The hybrid FSM for the device_state is described as a Mealy machine by documenting each of the arcs the driver is required to implement. Defining the remaining set of old/new device_state transitions as 'combination transitions' which are naturally defined as taking multiple FSM arcs along the shortest path within the FSM's digraph allows a complete matrix of transitions.

A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is defined to replace writing to the device_state field in the region. This allows returning a brand new FD whenever the requested transition opens a data transfer session.

The VFIO core code implements the new feature and provides a helper function to the driver. Using the helper the driver only has to implement 6 of the FSM arcs and the other combination transitions are elaborated consistently from those arcs.

A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to report the capability for migration and indicate which set of states and arcs are supported by the device. The FSM provides a lot of flexibility to make backwards compatible extensions but the VFIO_DEVICE_FEATURE also allows for future breaking extensions for scenarios that cannot support even the basic STOP_COPY requirements.

The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state of the VFIO device.

Data transfer sessions are now carried over a file descriptor, instead of the region. The FD functions for the lifetime of the data transfer session. read() and write() transfer the data with normal Linux stream FD semantics. This design allows future expansion to support poll(), io_uring, and other performance optimizations.

The complicated mmap mode for data transfer is discarded as current qemu doesn't take meaningful advantage of it, and the new qemu implementation avoids substantially all the performance penalty of using a read() on the region.

Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
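The notion of a combination transition, taking several FSM arcs along the shortest path, can be sketched in user space as follows. This is an illustration under assumptions, not the kernel helper: the state names and the six arcs in `base_arc` are my reading of the basic STOP_COPY FSM (RUNNING<->STOP, STOP<->STOP_COPY, STOP<->RESUMING), `next_hop` is a hypothetical name, and a breadth-first search is used here for clarity where the core code may well use a precomputed table.

```c
/* Hypothetical model of the five basic device states. */
enum mig_state { ST_ERROR, ST_STOP, ST_RUNNING, ST_STOP_COPY, ST_RESUMING, ST_COUNT };

/* Assumed set of six driver-implemented arcs: base_arc[from][to] != 0. */
static const int base_arc[ST_COUNT][ST_COUNT] = {
    [ST_RUNNING][ST_STOP]   = 1,
    [ST_STOP][ST_RUNNING]   = 1,
    [ST_STOP][ST_STOP_COPY] = 1,
    [ST_STOP_COPY][ST_STOP] = 1,
    [ST_STOP][ST_RESUMING]  = 1,
    [ST_RESUMING][ST_STOP]  = 1,
};

/*
 * Return the state reached by the first arc on a shortest path from
 * cur to target. Returns cur when no move is needed and ST_ERROR when
 * target cannot be reached through the base arcs.
 */
static enum mig_state next_hop(enum mig_state cur, enum mig_state target)
{
    int prev[ST_COUNT];
    int queue[ST_COUNT];
    int head = 0, tail = 0;
    int step;

    if (cur == target)
        return cur;

    for (int i = 0; i < ST_COUNT; i++)
        prev[i] = -1;                   /* -1 marks "not visited yet" */
    prev[cur] = (int)cur;
    queue[tail++] = (int)cur;

    /* Breadth-first search records each state's predecessor. */
    while (head < tail) {
        int s = queue[head++];
        for (int n = 0; n < ST_COUNT; n++) {
            if (!base_arc[s][n] || prev[n] != -1)
                continue;
            prev[n] = s;
            queue[tail++] = n;
        }
    }

    if (prev[target] == -1)
        return ST_ERROR;                /* unreachable via the base arcs */

    /* Walk back from target to the state directly after cur. */
    step = (int)target;
    while (prev[step] != (int)cur)
        step = prev[step];
    return (enum mig_state)step;
}
```

Calling `next_hop` repeatedly until it returns the target yields the full combination transition; for example, RUNNING to STOP_COPY passes through STOP in this model, so a driver only ever executes one base arc at a time.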
-
Jason Gunthorpe authored
Invoke a new device op 'device_feature' to handle just the data array portion of the command. This lifts the ioctl validation to the core code and makes it simpler for either the core code, or layered drivers, to implement their own feature values. Provide vfio_check_feature() to consolidate checking the flags/etc against what the driver supports. Link: https://lore.kernel.org/all/20220224142024.147653-9-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
- 27 Feb, 2022 7 commits
-
-
Yishai Hadas authored
Update mlx5 command list and error return function to handle migration commands. Link: https://lore.kernel.org/all/20220224142024.147653-8-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
Introduce migration IFC related stuff to enable migration commands. Link: https://lore.kernel.org/all/20220224142024.147653-7-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
Expose an API to get the mlx5 core device from a given VF PCI device if mlx5_core is its driver. On the get API the intf_state_mutex stays locked to make sure that the device cannot go away or be unloaded until the caller completes its job over the device; the lock is expected to be held only for a short period of time in any flow that takes it. On the put API we unlock the intf_state_mutex. The use case for those APIs is the migration flow of a VF over VFIO PCI. In that case the VF doesn't ride on mlx5_core, because the device is driving *two* different PCI devices, the PF owned by mlx5_core and the VF owned by the vfio driver. The mlx5_core of the PF is accessed only during the narrow window of the VF's ioctl that requires its services. This allows the PF driver to be more independent of the VF driver, so long as it doesn't reset the FW. Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Jason Gunthorpe authored
There are some cases where a SR-IOV VF driver will need to reach into and interact with the PF driver. This requires accessing the drvdata of the PF. Provide a function pci_iov_get_pf_drvdata() to return this PF drvdata in a safe way. Normally accessing a drvdata of a foreign struct device would be done using the device_lock() to protect against device driver probe()/remove() races. However, due to the design of pci_enable_sriov() this will result in an ABBA deadlock on the device_lock as the PF's device_lock is held during PF sriov_configure() while calling pci_enable_sriov() which in turn holds the VF's device_lock while calling VF probe(), and similarly for remove. This means the VF driver can never obtain the PF's device_lock. Instead use the implicit locking created by pci_enable/disable_sriov(). A VF driver can access its PF drvdata only while its own driver is attached, and the PF driver can control access to its own drvdata based on when it calls pci_enable/disable_sriov(). To use this API the PF driver will set up the PF drvdata in the probe() function. pci_enable_sriov() is only called from sriov_configure() which cannot happen until probe() completes, ensuring no VF races with drvdata setup. For removal, the PF driver must call pci_disable_sriov() in its remove function before destroying any of the drvdata. This ensures that all VF drivers are unbound before returning, fencing concurrent access to the drvdata. The introduction of a new function to do this access makes clear the special locking scheme and documents the requirements on the PF/VF drivers using this. Link: https://lore.kernel.org/all/20220224142024.147653-5-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Yishai Hadas authored
Virtual functions depend on the physical function for device access (for example firmware host PAGE management), so make sure to disable SR-IOV once the PF is gone. This will also prevent the warning below if the PF has gone before disabling SR-IOV. "driver left SR-IOV enabled after remove" The next patch from this series will rely on that when the VF may need to safely access the PF 'driver data'. Link: https://lore.kernel.org/all/20220224142024.147653-4-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Leon Romanovsky authored
Instead of open-coding the iteration to compare the virtfn internal index, use the newly introduced pci_iov_vf_id() call. Link: https://lore.kernel.org/all/20220224142024.147653-3-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Jason Gunthorpe authored
The PCI core uses the VF index internally, often called the vf_id, during the setup of the VF, e.g. pci_iov_add_virtfn(). This index is needed for device drivers that implement live migration for their internal operations that configure/control their VFs. Specifically, the mlx5_vfio_pci driver that is introduced in coming patches from this series needs it and not the bus/device/function which is exposed today. Add pci_iov_vf_id() which computes the vf_id by reversing the math that was used to create the bus/device/function. Link: https://lore.kernel.org/all/20220224142024.147653-2-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
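"Reversing the math" can be made concrete with a small user-space model. It is a sketch, not the kernel function: `struct pf_sriov_model`, `model_vf_rid` and `model_vf_id` are hypothetical names, and the forward formula assumed here is the SR-IOV rule that a VF's routing ID (bus << 8 | devfn) equals the PF's routing ID plus First VF Offset plus VF Stride times a zero-based vf_id.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the SR-IOV data of a PF. */
struct pf_sriov_model {
    uint8_t  bus;     /* PF bus number */
    uint8_t  devfn;   /* PF device/function byte */
    uint16_t offset;  /* First VF Offset */
    uint16_t stride;  /* VF Stride */
};

/* Forward math: routing ID of the VF with zero-based index vf_id. */
static uint16_t model_vf_rid(const struct pf_sriov_model *pf, unsigned int vf_id)
{
    uint16_t pf_rid = (uint16_t)((pf->bus << 8) | pf->devfn);

    return (uint16_t)(pf_rid + pf->offset + pf->stride * vf_id);
}

/*
 * Reverse math: recover vf_id from a VF routing ID.
 * Returns -1 when the routing ID cannot belong to a VF of this PF.
 */
static int model_vf_id(const struct pf_sriov_model *pf, uint16_t vf_rid)
{
    int pf_rid = (pf->bus << 8) | pf->devfn;
    int delta = (int)vf_rid - pf_rid - (int)pf->offset;

    if (pf->stride == 0 || delta < 0 || delta % pf->stride != 0)
        return -1;
    return delta / pf->stride;
}
```

Because the routing ID is treated as one 16-bit number, the model also handles VFs that spill over onto the next bus number.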
-
- 22 Feb, 2022 3 commits
-
-
Abhishek Sahu authored
If 'vfio_pci_core_device::needs_pm_restore' is set (PCI device does not have No_Soft_Reset bit set in its PMCSR config register), then the current PCI state will be saved locally in 'vfio_pci_core_device::pm_save' during D0->D3hot transition and same will be restored back during D3hot->D0 transition.

For reset-related functionalities, vfio driver uses PCI reset API's. These API's internally change the PCI power state back to D0 first if the device power state is non-D0. This state change to D0 will happen without the involvement of vfio driver.

Let's consider the following example:
  1. The device is in D3hot.
  2. User invokes VFIO_DEVICE_RESET ioctl.
  3. pci_try_reset_function() will be called which internally invokes pci_dev_save_and_disable().
  4. pci_set_power_state(dev, PCI_D0) will be called first.
  5. pci_save_state() will happen then.

Now, for the devices which have NoSoftRst-, the pci_set_power_state() can trigger soft reset and the original PCI config state will be lost at step (4) and this state cannot be restored again. This original PCI state can include any setting which is performed by SBIOS or host linux kernel (for example LTR, ASPM L1 substates, etc.). When this soft reset is triggered, all these settings will be reset, and the device state saved at step (5) will also have these settings cleared so they cannot be restored. Since the vfio driver only exposes limited PCI capabilities to its user, the vfio driver user also won't have the option to save and restore these capabilities' state either and these original settings will be permanently lost.

For pci_reset_bus() also, we can have the above situation. The other functions/devices can be in D3hot and the reset will change the power state of all devices to D0 without the involvement of vfio driver.

So, before calling any reset-related API's, we need to make sure that the device state is D0. This is mainly to preserve the state around soft reset.

For vfio_pci_core_disable(), we use __pci_reset_function_locked() which internally can use pci_pm_reset() for the function reset. pci_pm_reset() requires the device power state to be in D0, otherwise it returns error.

This patch changes the device power state to D0 by invoking vfio_pci_set_power_state() explicitly before calling any reset related API's.

Fixes: 51ef3a00 ("vfio/pci: Restore device state on PM transition") Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220217122107.22434-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Abhishek Sahu authored
If 'vfio_pci_core_device::needs_pm_restore' is set (PCI device does not have No_Soft_Reset bit set in its PMCSR config register), then the current PCI state will be saved locally in 'vfio_pci_core_device::pm_save' during D0->D3hot transition and same will be restored back during D3hot->D0 transition. For saving the PCI state locally, pci_store_saved_state() is being used and the pci_load_and_free_saved_state() will free the allocated memory. But for reset related IOCTLs, vfio driver calls PCI reset-related API's which will internally change the PCI power state back to D0. So, when the guest resumes, then it will get the current state as D0 and it will skip the call to vfio_pci_set_power_state() for changing the power state to D0 explicitly. In this case, the memory pointed by 'pm_save' will never be freed. In a malicious sequence, the state changing to D3hot followed by VFIO_DEVICE_RESET/VFIO_DEVICE_PCI_HOT_RESET can be run in a loop and it can cause an OOM situation. This patch frees the earlier allocated memory first before overwriting 'pm_save' to prevent the mentioned memory leak. Fixes: 51ef3a00 ("vfio/pci: Restore device state on PM transition") Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220217122107.22434-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
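The leak pattern and its fix, freeing the earlier snapshot before overwriting the pointer, can be shown with a user-space analogue. This is only a model: `struct dev_model` and `save_state` are hypothetical, and malloc()/free() stand in for the kernel's saved-state allocation and release.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the device structure holding the snapshot. */
struct dev_model {
    void *pm_save;    /* last saved state, or NULL if none is pending */
};

/*
 * Save a copy of `state` (len bytes) into d->pm_save.
 * Any earlier snapshot that was never restored is released first, so
 * repeated saves without a matching restore do not accumulate memory.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int save_state(struct dev_model *d, const void *state, size_t len)
{
    void *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, state, len);

    free(d->pm_save);   /* free(NULL) is a no-op, so the first save is fine */
    d->pm_save = copy;
    return 0;
}
```

Without the free() call, every save performed while an unrestored snapshot is still pending would orphan the previous allocation, which is the loop the commit message describes.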
-
Alex Williamson authored
Resolve build errors reported against UML build for undefined ioport_map() and ioport_unmap() functions. Without this config option a device cannot have vfio_pci_core_device.has_vga set, so the existing function would always return -EINVAL anyway. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20220123125737.2658758-1-geert@linux-m68k.org Link: https://lore.kernel.org/r/164306582968.3758255.15192949639574660648.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
- 20 Feb, 2022 10 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull locking fix from Borislav Petkov: "Fix a NULL ptr dereference when dumping lockdep chains through /proc/lockdep_chains" * tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Correct lock_classes index mapping
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fixes from Borislav Petkov: - Fix the ptrace regset xfpregs_set() callback to behave according to the ABI - Handle poisoned pages properly in the SGX reclaimer code * tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing x86/sgx: Fix missing poison handling in reclaimer
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull scheduler fix from Borislav Petkov: "Fix task exposure order when forking tasks" * tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix yet more sched_fork() races
-
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Linus Torvalds authored
Pull EDAC fix from Borislav Petkov: "Fix a long-standing struct alignment bug in the EDAC struct allocation code" * tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds authored
Pull SCSI fixes from James Bottomley: "Three fixes, all in drivers. The ufs and qedi fixes are minor; the lpfc one is a bit bigger because it involves adding a heuristic to detect and deal with common but not standards compliant behaviour" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix divide by zero in ufshcd_map_queues() scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
-
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Linus Torvalds authored
Pull dmaengine fixes from Vinod Koul: "A bunch of driver fixes for: - ptdma error handling in init - lock fix in at_hdmac - error path and error num fix for sh dma - pm balance fix for stm32" * tag 'dmaengine-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: shdma: Fix runtime PM imbalance on error dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe dmaengine: sh: rcar-dmac: Check for error num after setting mask dmaengine: at_xdmac: Fix missing unlock in at_xdmac_tasklet() dmaengine: ptdma: Fix the error handling path in pt_core_init()
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Linus Torvalds authored
Pull i2c fixes from Wolfram Sang: "Some driver updates, a MAINTAINERS fix, and additions to COMPILE_TEST (so we won't miss build problems again)" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: remove duplicate entry for i2c-qcom-geni i2c: brcmstb: fix support for DSL and CM variants i2c: qup: allow COMPILE_TEST i2c: imx: allow COMPILE_TEST i2c: cadence: allow COMPILE_TEST i2c: qcom-cci: don't put a device tree node before i2c_add_adapter() i2c: qcom-cci: don't delete an unregistered adapter i2c: bcm2835: Avoid clock stretching timeouts
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds authored
Pull input fixes from Dmitry Torokhov: - a fix for Synaptics touchpads in RMI4 mode failing to suspend/resume properly because I2C client devices are now being suspended and resumed asynchronously which changed the ordering - a change to make sure we do not set right and middle buttons capabilities on touchpads that are "buttonpads" (i.e. do not have separate physical buttons) - a change to zinitix touchscreen driver adding more compatible strings/IDs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: psmouse - set up dependency between PS/2 and SMBus companions Input: zinitix - add new compatible strings Input: clear BTN_RIGHT/MIDDLE on buttonpads
-
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Linus Torvalds authored
Pull power supply fixes from Sebastian Reichel: "Three regression fixes for the 5.17 cycle: - build warning fix for power-supply documentation - pointer size fix in cw2015 battery driver - OOM handling in bq256xx charger driver" * tag 'for-v5.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: supply: bq256xx: Handle OOM correctly power: supply: core: fix application of sizeof to pointer power: supply: fix table problem in sysfs-class-power
-