- 28 Jul, 2023 6 commits
-
-
Nicolin Chen authored
Add a new IOMMU_TEST_OP_ACCESS_REPLACE_IOAS to allow replacing the access->ioas, corresponding to the iommufd_access_replace() helper. Then add replace coverage as a part of user_copy test case, which basically repeats the copy test after replacing the old ioas with a new one. Link: https://lore.kernel.org/r/a4897f93d41c34b972213243b8dbf4c3832842e4.1690523699.git.nicolinc@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
Taking advantage of the new iommufd_access_change_ioas_id helper, add an iommufd_access_replace() API for the VFIO emulated pathway to use. Link: https://lore.kernel.org/r/a3267b924fd5f45e0d3a1dd13a9237e923563862.1690523699.git.nicolinc@nvidia.comSuggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
Update iommufd_access_destroy_object() to call the new iommufd_access_change_ioas() helper. It is impossible to legitimately race iommufd_access_destroy_object() with iommufd_access_change_ioas() as iommufd_access_destroy_object() is only called once the refcount reache zero, so any concurrent iommufd_access_change_ioas() is already UAFing the memory. Link: https://lore.kernel.org/r/f9fbeca2cde7f8515da18d689b3e02a6a40a5e14.1690523699.git.nicolinc@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
The complication of the mutex and refcount will be amplified after we introduce the replace support for accesses. So, add a preparatory change of a constitutive helper iommufd_access_change_ioas() and its wrapper iommufd_access_change_ioas_id(). They can simply take care of existing iommufd_access_attach() and iommufd_access_detach(), properly sequencing the refcount puts so that they are truely at the end of the sequence after we know the IOAS pointer is not required any more. Link: https://lore.kernel.org/r/da0c462532193b447329c4eb975a596f47e49b70.1690523699.git.nicolinc@nvidia.comSuggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
This is a preparatory change for ioas replacement support for accesses. The replacement routine does an iopt_add_access() for a new IOAS first and then iopt_remove_access() for the old IOAS upon the success of the first call. However, the first call overrides the iopt_access_list_id in the access struct, resulting in iopt_remove_access() being unable to work on the old IOAS. Add an iopt_access_list_id as a parameter to iopt_remove_access, so the replacement routine can save the id before it gets overwritten. Pass the id in iopt_remove_access() for a proper cleanup. The existing callers should just pass in access->iopt_access_list_id. Link: https://lore.kernel.org/r/7bb939b9e0102da0c099572bb3de78ab7622221e.1690523699.git.nicolinc@nvidia.comSuggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
A driver that doesn't implement ops->dma_unmap shouldn't be allowed to do vfio_pin/unpin_pages(), though it can use vfio_dma_rw() to access an iova range. Deny !ops->dma_unmap cases in vfio_pin/unpin_pages(). Link: https://lore.kernel.org/r/85d622729d8f2334b35d42f1c568df1ededb9171.1690523699.git.nicolinc@nvidia.comSuggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 26 Jul, 2023 20 commits
-
-
Jason Gunthorpe authored
Test the basic flow. Link: https://lore.kernel.org/r/19-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Now that we actually call iommufd_device_bind() we can return the idev_id from that function to userspace for use in other APIs. Link: https://lore.kernel.org/r/18-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This allows userspace to manually create HWPTs on IOAS's and then use those HWPTs as inputs to iommufd_device_attach/replace(). Following series will extend this to allow creating iommu_domains with driver specific parameters. Link: https://lore.kernel.org/r/17-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
Allow the selftest to call the function on the mock idev, add some tests to exercise it. Link: https://lore.kernel.org/r/16-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The selftest invokes things like replace under the object lock of its idev which protects the idev in a similar way to a real user. Unfortunately this triggers lockdep. A lock class per type will solve the problem. Link: https://lore.kernel.org/r/15-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Replace allows all the devices in a group to move in one step to a new HWPT. Further, the HWPT move is done without going through a blocking domain so that the IOMMU driver can implement some level of non-distruption to ongoing DMA if that has meaning for it (eg for future special driver domains) Replace uses a lot of the same logic as normal attach, except the actual domain change over has different restrictions, and we are careful to sequence things so that failure is going to leave everything the way it was, and not get trapped in a blocking domain or something if there is ENOMEM. Link: https://lore.kernel.org/r/14-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Nicolin Chen authored
qemu has a need to replace the translations associated with a domain when the guest does large-scale operations like switching between an IDENTITY domain and, say, dma-iommu.c. Currently, it does this by replacing all the mappings in a single domain, but this is very inefficient and means that domains have to be per-device rather than per-translation. Provide a high-level API to allow replacements of one domain with another. This is similar to a detach/attach cycle except it doesn't force the group to go to the blocking domain in-between. By removing this forced blocking domain the iommu driver has the opportunity to implement a non-disruptive replacement of the domain to the greatest extent its hardware allows. This allows the qemu emulation of the vIOMMU to be more complete, as real hardware often has a non-distruptive replacement capability. It could be possible to address this by simply removing the protection from the iommu_attach_group(), but it is not so clear if that is safe for the few users. Thus, add a new API to serve this new purpose. All drivers are already required to support changing between active UNMANAGED domains when using their attach_dev ops. This API is expected to be used only by IOMMUFD, so add to the iommu-priv header and mark it as IOMMUFD_INTERNAL. Link: https://lore.kernel.org/r/13-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comSuggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The code flow for first time attaching a PT and replacing a PT is very similar except for the lowest do_attach step. Reorganize this so that the do_attach step is a function pointer. Replace requires destroying the old HWPT once it is replaced. This destruction cannot be done under all the locks that are held in the function pointer, so the signature allows returning a HWPT which will be destroyed by the caller after everything is unlocked. Link: https://lore.kernel.org/r/12-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Due to the auto_domains mechanism the ioas->mutex must be held until the hwpt is completely setup by iommufd_object_abort_and_destroy() or iommufd_object_finalize(). This prevents a concurrent iommufd_device_auto_get_domain() from seeing an incompletely initialized object through the ioas->hwpt_list. To make this more consistent move the unlock until after finalize. Fixes: e8d57210 ("iommufd: Add kAPI toward external drivers for physical devices") Link: https://lore.kernel.org/r/11-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
During creation the hwpt must have the ioas->mutex held until the object is finalized. This means we need to be able to call iommufd_object_abort_and_destroy() while holding the mutex. Since iommufd_hw_pagetable_destroy() also needs the mutex this is problematic. Fix it by creating a special abort op for the object that can assume the caller is holding the lock, as required by the contract. The next patch will add another iommufd_object_abort_and_destroy() for a hwpt. Fixes: e8d57210 ("iommufd: Add kAPI toward external drivers for physical devices") Link: https://lore.kernel.org/r/10-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Logically the HWPT should have the coherency set properly for the device that it is being created for when it is created. This was happening implicitly if the immediate_attach was set because iommufd_hw_pagetable_attach() does it as the first thing. Do it unconditionally so !immediate_attach works properly. Link: https://lore.kernel.org/r/9-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Next patch will need to call this from two places. Link: https://lore.kernel.org/r/8-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The sw_msi_start is only set by the ARM drivers and it is always constant. Due to the way vfio/iommufd allow domains to be re-used between devices we have a built in assumption that there is only one value for sw_msi_start and it is global to the system. To make replace simpler where we may not reparse the iommu_get_resv_regions() move the sw_msi_start to the iommufd_group so it is always available once any HWPT has been attached. Link: https://lore.kernel.org/r/7-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This only needs to be done once per group, not once per device. The once per device was a way to make the device list work. Since we are abandoning this we can optimize things a bit. Link: https://lore.kernel.org/r/6-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The driver facing API in the iommu core makes the reserved regions per-device. An algorithm in the core code consolidates the regions of all the devices in a group to return the group view. To allow for devices to be hotplugged into the group iommufd would re-load the entire group's reserved regions for each device, just in case they changed. Further iommufd already has to deal with duplicated/overlapping reserved regions as it must union all the groups together. Thus simplify all of this to just use the device reserved regions interface directly from the iommu driver. Link: https://lore.kernel.org/r/5-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comSuggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
iommufd wants to use this in the next patch. For some reason the iommu_put_resv_regions() was already exported. Link: https://lore.kernel.org/r/4-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT. The iommufd_group stores the currently assigned HWPT for the entire group and we can manage the per-device attach/detach with a list in the iommufd_group. For destruction the flow is organized to make the following patches easier, the actual call to iommufd_object_destroy_user() is done at the top of the call chain without holding any locks. The HWPT to be destroyed is returned out from the locked region to make this possible. Later patches create locking that requires this. Link: https://lore.kernel.org/r/3-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
When the hwpt to device attachment is fairly static we could get away with the simple approach of keeping track of the groups via a device list. But with replace this is infeasible. Add an automatically managed struct that is 1:1 with the iommu_group per-ictx so we can store the necessary tracking information there. Link: https://lore.kernel.org/r/2-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
With the recent rework this no longer needs to be done at domain attachment time, we know if the device is usable by iommufd when we bind it. The value of msi_device_has_isolated_msi() is not allowed to change while a driver is bound. Link: https://lore.kernel.org/r/1-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.comReviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
https://github.com/awilliam/linux-vfioJason Gunthorpe authored
Shared branch with VFIO containing the enablement for VFIO "cdev" devices. This is required by following iommufd patches which add new ioctls to the VFIO cdev. ======= Existing VFIO provides group-centric user APIs for userspace. Userspace opens the /dev/vfio/$group_id first before getting device fd and hence getting access to device. This is not the desired model for iommufd. Per the conclusion of community discussion[1], iommufd provides device-centric kAPIs and requires its consumer (like VFIO) to be device-centric user APIs. Such user APIs are used to associate device with iommufd and also the I/O address spaces managed by the iommufd. This series first introduces a per device file structure to be prepared for further enhancement and refactors the kvm-vfio code to be prepared for accepting device file from userspace. After this, adds a mechanism for blocking device access before iommufd bind. Then refactors the vfio to be able to handle cdev paths (e.g. iommufd binding, no-iommufd, [de]attach ioas). This refactor includes making the device_open exclusive between the group and the cdev path, only allow single device open in cdev path; vfio-iommufd code is also refactored to support cdev. e.g. split the vfio_iommufd_bind() into two steps. Eventually, adds the cdev support for vfio device and the new ioctls, then makes group infrastructure optional as it is not needed when vfio device cdev is compiled. This series is based on some preparation works done to vfio emulated devices[2] and vfio pci hot reset enhancements[3]. Per discussion[4], this series does not support cdev for physical devices that do not have IOMMU. Such devices only have group-centric user APIs. This series is a prerequisite for iommu nesting for vfio device[5] [6]. [1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/ [2] https://lore.kernel.org/kvm/20230327093351.44505-1-yi.l.liu@intel.com/ - merged [3] https://lore.kernel.org/kvm/20230718105542.4138-1-yi.l.liu@intel.com/ [4] https://lore.kernel.org/kvm/20230525095939.37ddb8ce.alex.williamson@redhat.com/ [5] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.com/ [6] https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.com/#t ======= * 'v6.6/vfio/cdev' of https://github.com/awilliam/linux-vfio: (36 commits) docs: vfio: Add vfio device cdev description vfio: Compile vfio_group infrastructure optionally vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev() vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT vfio: Add VFIO_DEVICE_BIND_IOMMUFD vfio: Avoid repeated user pointer cast in vfio_device_fops_unl_ioctl() iommufd: Add iommufd_ctx_from_fd() vfio: Test kvm pointer in _vfio_device_get_kvm_safe() vfio: Add cdev for vfio_device vfio: Move device_del() before waiting for the last vfio_device registration refcount vfio: Move vfio_device_group_unregister() to be the first operation in unregister vfio-iommufd: Add detach_ioas support for emulated VFIO devices iommufd/device: Add iommufd_access_detach() API vfio-iommufd: Add detach_ioas support for physical VFIO devices vfio: Record devid in vfio_device_file vfio-iommufd: Split bind/attach into two steps vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind() vfio: Make vfio_df_open() single open for device cdev path vfio: Add cdev_device_open_cnt to vfio_group vfio: Block device access via device fd until device is opened vfio: Pass struct vfio_device_file * to vfio_device_open/close() kvm/vfio: Accept vfio device file from userspace kvm/vfio: Prepare for accepting vfio device fd vfio: Accept vfio device file in the KVM facing kAPI vfio: Refine vfio file kAPIs for KVM vfio: Allocate per device file structure vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET vfio/pci: Copy hot-reset device info to userspace in the devices loop vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev vfio: Add helper to search vfio_device in a dev_set ...
-
- 25 Jul, 2023 14 commits
-
-
Yi Liu authored
This gives notes for userspace applications on device cdev usage. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-27-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
vfio_group is not needed for vfio device cdev, so with vfio device cdev introduced, the vfio_group infrastructures can be compiled out if only cdev is needed. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-26-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
The IOMMU_CAP_CACHE_COHERENCY check only applies to the physical devices that are IOMMU-backed. But it is now in the group code. If want to compile vfio_group infrastructure out, this check needs to be moved out of the group code. Another reason for this change is to fail the device registration for the physical devices that do not have IOMMU if the group code is not compiled as the cdev interface does not support such devices. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-25-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This adds ioctl for userspace to attach device cdev fd to and detach from IOAS/hw_pagetable managed by iommufd. VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS or hw_pagetable managed by iommufd. Attach can be undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close. VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached IOAS or hw_pagetable managed by iommufd. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-24-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This adds ioctl for userspace to bind device cdev fd to iommufd. VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA control provided by the iommufd. open_device op is called after bind_iommufd op. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230718135551.6592-23-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This adds a local variable to store the user pointer cast result from arg. It avoids the repeated casts in the code when more ioctls are added. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-22-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
It's common to get a reference to the iommufd context from a given file descriptor. So adds an API for it. Existing users of this API are compiled only when IOMMUFD is enabled, so no need to have a stub for the IOMMUFD disabled case. Tested-by: Yanting Jiang <yanting.jiang@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-21-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This saves some lines when adding the kvm get logic for the vfio_device cdev path. This also renames _vfio_device_get_kvm_safe() to be vfio_device_get_kvm_safe(). Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-20-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This adds cdev support for vfio_device. It allows the user to directly open a vfio device w/o using the legacy container/group interface, as a prerequisite for supporting new iommu features like nested translation and etc. The device fd opened in this manner doesn't have the capability to access the device as the fops open() doesn't open the device until the successful VFIO_DEVICE_BIND_IOMMUFD ioctl which will be added in a later patch. With this patch, devices registered to vfio core would have both the legacy group and the new device interfaces created. - group interface : /dev/vfio/$groupID - device interface: /dev/vfio/devices/vfioX - normal device ("X" is a unique number across vfio devices) For a given device, the user can identify the matching vfioX by searching the vfio-dev folder under the sysfs path of the device. Take PCI device (0000:6a:01.0) as an example, /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfioX implies the matching vfioX under /dev/vfio/devices/, and vfio-dev/vfioX/dev contains the major:minor number of the matching /dev/vfio/devices/vfioX. The user can get device fd by opening the /dev/vfio/devices/vfioX. The vfio_device cdev logic in this patch: *) __vfio_register_dev() path ends up doing cdev_device_add() for each vfio_device if VFIO_DEVICE_CDEV configured. *) vfio_unregister_group_dev() path does cdev_device_del(); cdev interface does not support noiommu devices, so VFIO only creates the legacy group interface for the physical devices that do not have IOMMU. noiommu users should use the legacy group interface. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-19-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
device_del() destroys the vfio-dev/vfioX under the sysfs for vfio_device. There is no reason to keep it while the device is going to be unregistered. This movement is also a preparation for adding vfio_device cdev. Kernel should remove the cdev node of the vfio_device to avoid new registration refcount increment while the device is going to be unregistered. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-18-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This avoids endless vfio_device refcount increment by userspace, which would keep blocking the vfio_unregister_group_dev(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-17-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This prepares for adding DETACH ioctl for emulated VFIO devices. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-16-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Nicolin Chen authored
Previously, the detach routine is only done by the destroy(). And it was called by vfio_iommufd_emulated_unbind() when the device runs close(), so all the mappings in iopt were cleaned in that setup, when the call trace reaches this detach() routine. Now, there's a need of a detach uAPI, meaning that it does not only need a new iommufd_access_detach() API, but also requires access->ops->unmap() call as a cleanup. So add one. However, leaving that unprotected can introduce some potential of a race condition during the pin_/unpin_pages() call, where access->ioas->iopt is getting referenced. So, add an ioas_lock to protect the context of iopt referencings. Also, to allow the iommufd_access_unpin_pages() callback to happen via this unmap() call, add an ioas_unpin pointer, so the unpin routine won't be affected by the "access->ioas = NULL" trick. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-15-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This prepares for adding DETACH ioctl for physical VFIO devices. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-14-yi.l.liu@intel.comSigned-off-by: Alex Williamson <alex.williamson@redhat.com>
-