Commit ec0e2dc8 authored by Linus Torvalds

Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - VFIO direct character device (cdev) interface support. This extracts
   the vfio device fd from the container and group model, and is
   intended to be the native uAPI for use with IOMMUFD (Yi Liu)

 - Enhancements to the PCI hot reset interface in support of cdev usage
   (Yi Liu)

 - Fix a potential race between registering and unregistering vfio files
   in the kvm-vfio interface and extend use of a lock to avoid extra
   lock drops and acquires (Dmitry Torokhov)

 - A new vfio-pci variant driver for the AMD/Pensando Distributed
   Services Card (PDS) Ethernet device, supporting live migration (Brett
   Creeley)

 - Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
   and simplify driver init/exit in fsl code (Li Zetao)

 - Fix uninitialized hole in data structure and pad capability
   structures for alignment (Stefan Hajnoczi)

* tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits)
  vfio/pds: Send type for SUSPEND_STATUS command
  vfio/pds: fix return value in pds_vfio_get_lm_file()
  pds_core: Fix function header descriptions
  vfio: align capability structures
  vfio/type1: fix cap_migration information leak
  vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code
  vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver
  vfio/pds: Add Kconfig and documentation
  vfio/pds: Add support for firmware recovery
  vfio/pds: Add support for dirty page tracking
  vfio/pds: Add VFIO live migration support
  vfio/pds: register with the pds_core PF
  pds_core: Require callers of register/unregister to pass PF drvdata
  vfio/pds: Initial support for pds VFIO driver
  vfio: Commonize combine_ranges for use in other VFIO drivers
  kvm/vfio: avoid bouncing the mutex when adding and deleting groups
  kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add()
  docs: vfio: Add vfio device cdev description
  vfio: Compile vfio_group infrastructure optionally
  vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev()
  ...
parents b6f6167e 642265e2
......@@ -239,6 +239,137 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
IOMMUFD and vfio_iommu_type1
----------------------------
IOMMUFD is the new user API to manage I/O page tables from userspace.
It is intended to be the portal for delivering advanced userspace DMA
features (nested translation [5]_, PASID [6]_, etc.) while also providing
a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
cases. Eventually the vfio_iommu_type1 driver, as well as the legacy
vfio container and group model, is intended to be deprecated.
The IOMMUFD backwards compatibility interface can be enabled in two
ways. In the first method, the kernel can be configured with
CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
transparently provides the entire infrastructure for the VFIO
container and IOMMU backend interfaces. The compatibility mode can
also be accessed if the VFIO container interface, i.e. /dev/vfio/vfio,
is simply symlinked to /dev/iommu. Note that at the time of writing,
the compatibility mode is not entirely feature complete relative to
VFIO_TYPE1v2_IOMMU (e.g. DMA mapping of MMIO) and does not attempt to
provide compatibility with the VFIO_SPAPR_TCE_IOMMU interface. Therefore
it is not generally advisable at this time to switch from native VFIO
implementations to the IOMMUFD compatibility interfaces.
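A minimal userspace sketch of the second (symlink) method, assuming root
privileges and a kernel built with CONFIG_IOMMUFD::

    /* Minimal sketch: redirect the VFIO container path to iommufd's
     * compatibility interface. Error handling kept minimal.
     */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* Remove the native VFIO container node, if present */
            unlink("/dev/vfio/vfio");

            /* Point the container path at the iommufd character device */
            if (symlink("/dev/iommu", "/dev/vfio/vfio")) {
                    perror("symlink");
                    return 1;
            }
            return 0;
    }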
Long term, VFIO users should migrate to device access through the cdev
interface described below, and native access through the IOMMUFD
provided interfaces.
VFIO Device cdev
----------------
Traditionally, a user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
in a VFIO group.

With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX, where
"X" is the number allocated uniquely by VFIO for registered devices.

The cdev interface does not support noiommu devices, so users should
use the legacy group interface if noiommu support is needed.
The cdev only works with IOMMUFD. Both VFIO drivers and applications
must adapt to the new cdev security model, which requires using
VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before actually
using the device. Once BIND succeeds, the VFIO device is fully
accessible to the user.

The VFIO device cdev doesn't rely on the VFIO group/container/iommu
drivers, hence those modules can be fully compiled out in an
environment where no legacy VFIO application exists.

SPAPR does not yet support IOMMUFD, so it cannot support the device
cdev either.
VFIO device cdev access is still bound by IOMMU group semantics, i.e.
there can be only one DMA owner for the group. Devices belonging to the
same group cannot be bound to multiple iommufd_ctx instances or shared
between the native kernel and a vfio bus driver or any other driver
supporting the driver_managed_dma flag. A violation of this ownership
requirement will fail at the VFIO_DEVICE_BIND_IOMMUFD ioctl, which
gates full device access.
Device cdev Example
-------------------
Assume the user wants to access PCI device 0000:6a:01.0::
$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
vfio0
This device is therefore represented as vfio0. The user can verify
its existence::
$ ls -l /dev/vfio/devices/vfio0
crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
511:0
$ ls -l /dev/char/511\:0
lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
Then provide the user with access to the device if unprivileged
operation is desired::
$ chown user:user /dev/vfio/devices/vfio0
Finally the user can get the cdev fd by::
cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
An opened cdev_fd doesn't give the user any permission to access the
device other than binding the cdev_fd to an iommufd. After that point
the device is fully accessible, including attaching it to an
IOMMUFD IOAS/HWPT to enable userspace DMA::
    struct vfio_device_bind_iommufd bind = {
            .argsz = sizeof(bind),
            .flags = 0,
    };
    struct iommu_ioas_alloc alloc_data  = {
            .size = sizeof(alloc_data),
            .flags = 0,
    };
    struct vfio_device_attach_iommufd_pt attach_data = {
            .argsz = sizeof(attach_data),
            .flags = 0,
    };
    struct iommu_ioas_map map = {
            .size = sizeof(map),
            .flags = IOMMU_IOAS_MAP_READABLE |
                     IOMMU_IOAS_MAP_WRITEABLE |
                     IOMMU_IOAS_MAP_FIXED_IOVA,
            .__reserved = 0,
    };

    iommufd = open("/dev/iommu", O_RDWR);

    bind.iommufd = iommufd;
    ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);

    ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
    attach_data.pt_id = alloc_data.out_ioas_id;
    ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);

    /* Allocate some space and setup a DMA mapping */
    map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
    map.iova = 0; /* 1MB starting at 0x0 from device view */
    map.length = 1024 * 1024;
    map.ioas_id = alloc_data.out_ioas_id;

    ioctl(iommufd, IOMMU_IOAS_MAP, &map);

    /* Other device operations as stated in "VFIO Usage Example" */
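Assuming the bind and attach calls above succeeded, the usual device
ioctls work on cdev_fd; a minimal sketch querying basic device info::

    struct vfio_device_info device_info = {
            .argsz = sizeof(device_info),
    };

    /* Valid only after VFIO_DEVICE_BIND_IOMMUFD succeeded above;
     * before that the cdev fd refuses device access.
     */
    ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, &device_info);
    printf("regions: %u, irqs: %u\n",
           device_info.num_regions, device_info.num_irqs);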
VFIO User API
-------------------------------------------------------------------------------
......@@ -279,6 +410,7 @@ similar to a file operations structure::
                           struct iommufd_ctx *ictx, u32 *out_device_id);
        void    (*unbind_iommufd)(struct vfio_device *vdev);
        int     (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
        void    (*detach_ioas)(struct vfio_device *vdev);
        int     (*open_device)(struct vfio_device *vdev);
        void    (*close_device)(struct vfio_device *vdev);
        ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
......@@ -315,9 +447,10 @@ container_of().
- The [un]bind_iommufd callbacks are issued when the device is bound to
and unbound from iommufd.
- The attach_ioas callback is issued when the device is attached to an
IOAS managed by the bound iommufd. The attached IOAS is automatically
detached when the device is unbound from iommufd.
- The [de]attach_ioas callback is issued when the device is attached to
and detached from an IOAS managed by the bound iommufd. However, the
attached IOAS can also be automatically detached when the device is
unbound from iommufd.
- The read/write/mmap callbacks implement the device region access defined
by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
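As a minimal sketch, a physical (non-mdev) variant driver would typically
wire these callbacks to the generic helpers; emulated/mdev drivers use the
vfio_iommufd_emulated_* equivalents instead. The driver name here is
hypothetical::

    /* Sketch for a hypothetical physical variant driver; the
     * region-access and open/close callbacks are driver specific
     * and omitted here.
     */
    static const struct vfio_device_ops my_vfio_pci_ops = {
            .name           = "my-vfio-pci",
            .bind_iommufd   = vfio_iommufd_physical_bind,
            .unbind_iommufd = vfio_iommufd_physical_unbind,
            .attach_ioas    = vfio_iommufd_physical_attach_ioas,
            .detach_ioas    = vfio_iommufd_physical_detach_ioas,
    };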
......@@ -564,3 +697,11 @@ This implementation has some specifics:
\-0d.1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
.. [5] Nested translation is an IOMMU feature which supports two-stage
   address translation. This improves address translation efficiency in
   IOMMU virtualization.
.. [6] PASID stands for Process Address Space ID, introduced by PCI
Express. It is a prerequisite for Shared Virtual Addressing (SVA)
and Scalable I/O Virtualization (Scalable IOV).
.. SPDX-License-Identifier: GPL-2.0+
.. note: can be edited and viewed with /usr/bin/formiko-vim
==========================================================
PCI VFIO driver for the AMD/Pensando(R) DSC adapter family
==========================================================
AMD/Pensando Linux VFIO PCI Device Driver
Copyright(c) 2023 Advanced Micro Devices, Inc.
Overview
========
The ``pds-vfio-pci`` module is a PCI driver that supports Live Migration
capable Virtual Function (VF) devices in the DSC hardware.
Using the device
================
The pds-vfio-pci device is enabled via multiple configuration steps and
depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
Function devices.
Shown below are the steps to bind the driver to a VF and also to the
associated auxiliary device created by the ``pds_core`` driver. This
example assumes the pds_core and pds-vfio-pci modules are already
loaded.
.. code-block:: bash
  :name: example-setup-script

  #!/bin/bash

  PF_BUS="0000:60"
  PF_BDF="0000:60:00.0"
  VF_BDF="0000:60:00.1"

  # Prevent non-vfio VF driver from probing the VF device
  echo 0 > /sys/class/pci_bus/$PF_BUS/device/$PF_BDF/sriov_drivers_autoprobe

  # Create single VF for Live Migration via pds_core
  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs

  # Allow the VF to be bound to the pds-vfio-pci driver
  echo "pds-vfio-pci" > /sys/class/pci_bus/$PF_BUS/device/$VF_BDF/driver_override

  # Bind the VF to the pds-vfio-pci driver
  echo "$VF_BDF" > /sys/bus/pci/drivers/pds-vfio-pci/bind
After performing the steps above, a file in /dev/vfio/<iommu_group>
should have been created.
Enabling the driver
===================
The driver is enabled via the standard kernel configuration system,
using the make command::
make oldconfig/menuconfig/etc.
The driver is located in the menu structure at:
-> Device Drivers
-> VFIO Non-Privileged userspace driver framework
-> VFIO support for PDS PCI devices
Support
=======
For general Linux networking support, please use the netdev mailing
list, which is monitored by Pensando personnel::
netdev@vger.kernel.org
For more specific support needs, please use the Pensando driver support
email::
drivers@pensando.io
......@@ -16,6 +16,7 @@ Contents:
altera/altera_tse
amd/pds_core
amd/pds_vdpa
amd/pds_vfio_pci
aquantia/atlantic
chelsio/cxgb
cirrus/cs89x0
......
......@@ -9,22 +9,34 @@ Device types supported:
- KVM_DEV_TYPE_VFIO
Only one VFIO instance may be created per VM. The created device
tracks VFIO groups in use by the VM and features of those groups
important to the correctness and acceleration of the VM. As groups
are enabled and disabled for use by the VM, KVM should be updated
about their presence. When registered with KVM, a reference to the
VFIO-group is held by KVM.
tracks VFIO files (group or device) in use by the VM and features
of those groups/devices important to the correctness and acceleration
of the VM. As groups/devices are enabled and disabled for use by the
VM, KVM should be updated about their presence. When registered with
KVM, a reference to the VFIO file is held by KVM.
Groups:
KVM_DEV_VFIO_GROUP
KVM_DEV_VFIO_GROUP attributes:
KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
kvm_device_attr.addr points to an int32_t file descriptor
for the VFIO group.
KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
kvm_device_attr.addr points to an int32_t file descriptor
for the VFIO group.
KVM_DEV_VFIO_FILE
alias: KVM_DEV_VFIO_GROUP
KVM_DEV_VFIO_FILE attributes:
KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
tracking
kvm_device_attr.addr points to an int32_t file descriptor for the
VFIO file.
KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
device tracking
kvm_device_attr.addr points to an int32_t file descriptor for the
VFIO file.
KVM_DEV_VFIO_GROUP (legacy kvm device group restricted to the handling of VFIO group fd):
KVM_DEV_VFIO_GROUP_ADD: same as KVM_DEV_VFIO_FILE_ADD for group fd only
KVM_DEV_VFIO_GROUP_DEL: same as KVM_DEV_VFIO_FILE_DEL for group fd only
KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
allocated by sPAPR KVM.
kvm_device_attr.addr points to a struct::
......@@ -40,7 +52,10 @@ KVM_DEV_VFIO_GROUP attributes:
- @tablefd is a file descriptor for a TCE table allocated via
KVM_CREATE_SPAPR_TCE.
The GROUP_ADD operation above should be invoked prior to accessing the
The FILE/GROUP_ADD operation above should be invoked prior to accessing the
device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
drivers which require a kvm pointer to be set in their .open_device()
callback.
callback. The same applies to device file descriptors obtained via
character device open, which get device access via
VFIO_DEVICE_BIND_IOMMUFD. For such file descriptors, FILE_ADD should be
invoked before VFIO_DEVICE_BIND_IOMMUFD to support the drivers
mentioned above as well.
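A minimal sketch of that ordering, assuming vm_fd is an open KVM VM fd,
cdev_fd is a VFIO device cdev fd, and bind is the struct
vfio_device_bind_iommufd described in the VFIO documentation::

    struct kvm_create_device cd = {
            .type = KVM_DEV_TYPE_VFIO,
    };
    int32_t fd = cdev_fd;   /* attr.addr must point to an int32_t fd */

    ioctl(vm_fd, KVM_CREATE_DEVICE, &cd);   /* fills cd.fd */

    struct kvm_device_attr attr = {
            .group = KVM_DEV_VFIO_FILE,
            .attr = KVM_DEV_VFIO_FILE_ADD,
            .addr = (__u64)(uintptr_t)&fd,
    };
    ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);

    /* Only now bind the cdev fd to an iommufd */
    ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);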
......@@ -22482,6 +22482,13 @@ S: Maintained
P: Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
F: drivers/vfio/pci/*/
VFIO PDS PCI DRIVER
M: Brett Creeley <brett.creeley@amd.com>
L: kvm@vger.kernel.org
S: Maintained
F: Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst
F: drivers/vfio/pci/pds/
VFIO PLATFORM DRIVER
M: Eric Auger <eric.auger@redhat.com>
L: kvm@vger.kernel.org
......
......@@ -1474,6 +1474,7 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
.detach_ioas = vfio_iommufd_emulated_detach_ioas,
};
static int intel_vgpu_probe(struct mdev_device *mdev)
......
......@@ -14,8 +14,8 @@ config IOMMUFD
if IOMMUFD
config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
depends on VFIO && !VFIO_CONTAINER
default VFIO && !VFIO_CONTAINER
depends on VFIO_GROUP && !VFIO_CONTAINER
default VFIO_GROUP && !VFIO_CONTAINER
help
IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
IOMMUFD providing compatibility emulation to give the same ioctls.
......
......@@ -98,6 +98,36 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
/**
* iommufd_ctx_has_group - True if any device within the group is bound
* to the ictx
* @ictx: iommufd file descriptor
* @group: Pointer to a physical iommu_group struct
*
* True if any device within the group has been bound to this ictx, ex. via
* iommufd_device_bind(), therefore implying ictx ownership of the group.
*/
bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
{
struct iommufd_object *obj;
unsigned long index;
if (!ictx || !group)
return false;
xa_lock(&ictx->objects);
xa_for_each(&ictx->objects, index, obj) {
if (obj->type == IOMMUFD_OBJ_DEVICE &&
container_of(obj, struct iommufd_device, obj)->group == group) {
xa_unlock(&ictx->objects);
return true;
}
}
xa_unlock(&ictx->objects);
return false;
}
EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
/**
* iommufd_device_unbind - Undo iommufd_device_bind()
* @idev: Device returned by iommufd_device_bind()
......@@ -113,6 +143,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
{
return idev->ictx;
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
u32 iommufd_device_to_id(struct iommufd_device *idev)
{
return idev->obj.id;
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
static int iommufd_device_setup_msi(struct iommufd_device *idev,
struct iommufd_hw_pagetable *hwpt,
phys_addr_t sw_msi_start)
......@@ -441,6 +483,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
iommufd_ctx_get(ictx);
iommufd_object_finalize(ictx, &access->obj);
*id = access->obj.id;
mutex_init(&access->ioas_lock);
return access;
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
......@@ -457,26 +500,60 @@ void iommufd_access_destroy(struct iommufd_access *access)
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
void iommufd_access_detach(struct iommufd_access *access)
{
struct iommufd_ioas *cur_ioas = access->ioas;
mutex_lock(&access->ioas_lock);
if (WARN_ON(!access->ioas))
goto out;
/*
* Set ioas to NULL to block any further iommufd_access_pin_pages().
* iommufd_access_unpin_pages() can continue using access->ioas_unpin.
*/
access->ioas = NULL;
if (access->ops->unmap) {
mutex_unlock(&access->ioas_lock);
access->ops->unmap(access->data, 0, ULONG_MAX);
mutex_lock(&access->ioas_lock);
}
iopt_remove_access(&cur_ioas->iopt, access);
refcount_dec(&cur_ioas->obj.users);
out:
access->ioas_unpin = NULL;
mutex_unlock(&access->ioas_lock);
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
{
struct iommufd_ioas *new_ioas;
int rc = 0;
if (access->ioas)
mutex_lock(&access->ioas_lock);
if (WARN_ON(access->ioas || access->ioas_unpin)) {
mutex_unlock(&access->ioas_lock);
return -EINVAL;
}
new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
if (IS_ERR(new_ioas))
if (IS_ERR(new_ioas)) {
mutex_unlock(&access->ioas_lock);
return PTR_ERR(new_ioas);
}
rc = iopt_add_access(&new_ioas->iopt, access);
if (rc) {
mutex_unlock(&access->ioas_lock);
iommufd_put_object(&new_ioas->obj);
return rc;
}
iommufd_ref_to_users(&new_ioas->obj);
access->ioas = new_ioas;
access->ioas_unpin = new_ioas;
mutex_unlock(&access->ioas_lock);
return 0;
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
......@@ -531,8 +608,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
void iommufd_access_unpin_pages(struct iommufd_access *access,
unsigned long iova, unsigned long length)
{
struct io_pagetable *iopt = &access->ioas->iopt;
struct iopt_area_contig_iter iter;
struct io_pagetable *iopt;
unsigned long last_iova;
struct iopt_area *area;
......@@ -540,6 +617,17 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
return;
mutex_lock(&access->ioas_lock);
/*
* The driver must be doing something wrong if it calls this before an
* iommufd_access_attach() or after an iommufd_access_detach().
*/
if (WARN_ON(!access->ioas_unpin)) {
mutex_unlock(&access->ioas_lock);
return;
}
iopt = &access->ioas_unpin->iopt;
down_read(&iopt->iova_rwsem);
iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
iopt_area_remove_access(
......@@ -549,6 +637,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
min(last_iova, iopt_area_last_iova(area))));
WARN_ON(!iopt_area_contig_done(&iter));
up_read(&iopt->iova_rwsem);
mutex_unlock(&access->ioas_lock);
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_unpin_pages, IOMMUFD);
......@@ -594,8 +683,8 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
unsigned long length, struct page **out_pages,
unsigned int flags)
{
struct io_pagetable *iopt = &access->ioas->iopt;
struct iopt_area_contig_iter iter;
struct io_pagetable *iopt;
unsigned long last_iova;
struct iopt_area *area;
int rc;
......@@ -610,6 +699,13 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
if (check_add_overflow(iova, length - 1, &last_iova))
return -EOVERFLOW;
mutex_lock(&access->ioas_lock);
if (!access->ioas) {
mutex_unlock(&access->ioas_lock);
return -ENOENT;
}
iopt = &access->ioas->iopt;
down_read(&iopt->iova_rwsem);
iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
unsigned long last = min(last_iova, iopt_area_last_iova(area));
......@@ -640,6 +736,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
}
up_read(&iopt->iova_rwsem);
mutex_unlock(&access->ioas_lock);
return 0;
err_remove:
......@@ -654,6 +751,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
iopt_area_last_iova(area))));
}
up_read(&iopt->iova_rwsem);
mutex_unlock(&access->ioas_lock);
return rc;
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
......@@ -673,8 +771,8 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
void *data, size_t length, unsigned int flags)
{
struct io_pagetable *iopt = &access->ioas->iopt;
struct iopt_area_contig_iter iter;
struct io_pagetable *iopt;
struct iopt_area *area;
unsigned long last_iova;
int rc;
......@@ -684,6 +782,13 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
if (check_add_overflow(iova, length - 1, &last_iova))
return -EOVERFLOW;
mutex_lock(&access->ioas_lock);
if (!access->ioas) {
mutex_unlock(&access->ioas_lock);
return -ENOENT;
}
iopt = &access->ioas->iopt;
down_read(&iopt->iova_rwsem);
iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
unsigned long last = min(last_iova, iopt_area_last_iova(area));
......@@ -710,6 +815,7 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
rc = -ENOENT;
err_out:
up_read(&iopt->iova_rwsem);
mutex_unlock(&access->ioas_lock);
return rc;
}
EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
......@@ -296,6 +296,8 @@ struct iommufd_access {
struct iommufd_object obj;
struct iommufd_ctx *ictx;
struct iommufd_ioas *ioas;
struct iommufd_ioas *ioas_unpin;
struct mutex ioas_lock;
const struct iommufd_access_ops *ops;
void *data;
unsigned long iova_alignment;
......
......@@ -50,7 +50,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
* before calling iommufd_object_finalize().
*/
rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY,
xa_limit_32b, GFP_KERNEL_ACCOUNT);
xa_limit_31b, GFP_KERNEL_ACCOUNT);
if (rc)
goto out_free;
return obj;
......@@ -417,6 +417,30 @@ struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
}
EXPORT_SYMBOL_NS_GPL(iommufd_ctx_from_file, IOMMUFD);
/**
* iommufd_ctx_from_fd - Acquires a reference to the iommufd context
* @fd: File descriptor to obtain the reference from
*
* Returns a pointer to the iommufd_ctx, otherwise ERR_PTR. On success
* the caller is responsible to call iommufd_ctx_put().
*/
struct iommufd_ctx *iommufd_ctx_from_fd(int fd)
{
struct file *file;
file = fget(fd);
if (!file)
return ERR_PTR(-EBADF);
if (file->f_op != &iommufd_fops) {
fput(file);
return ERR_PTR(-EBADFD);
}
/* fget is the same as iommufd_ctx_get() */
return file->private_data;
}
EXPORT_SYMBOL_NS_GPL(iommufd_ctx_from_fd, IOMMUFD);
/**
* iommufd_ctx_put - Put back a reference
* @ictx: Context to put back
......
......@@ -483,6 +483,8 @@ static int iommufd_vfio_iommu_get_info(struct iommufd_ctx *ictx,
rc = cap_size;
goto out_put;
}
cap_size = ALIGN(cap_size, sizeof(u64));
if (last_cap && info.argsz >= total_cap_size &&
put_user(total_cap_size, &last_cap->next)) {
rc = -EFAULT;
......
......@@ -8,24 +8,19 @@
/**
* pds_client_register - Link the client to the firmware
* @pf_pdev: ptr to the PF driver struct
* @pf: ptr to the PF driver's private data struct
* @devname: name that includes service info, e.g. pds_core.vDPA
*
* Return: positive client ID (ci) on success, or
* negative for error
*/
int pds_client_register(struct pci_dev *pf_pdev, char *devname)
int pds_client_register(struct pdsc *pf, char *devname)
{
union pds_core_adminq_comp comp = {};
union pds_core_adminq_cmd cmd = {};
struct pdsc *pf;
int err;
u16 ci;
pf = pci_get_drvdata(pf_pdev);
if (pf->state)
return -ENXIO;
cmd.client_reg.opcode = PDS_AQ_CMD_CLIENT_REG;
strscpy(cmd.client_reg.devname, devname,
sizeof(cmd.client_reg.devname));
......@@ -53,23 +48,18 @@ EXPORT_SYMBOL_GPL(pds_client_register);
/**
* pds_client_unregister - Unlink the client from the firmware
* @pf_pdev: ptr to the PF driver struct
* @pf: ptr to the PF driver's private data struct
* @client_id: id returned from pds_client_register()
*
* Return: 0 on success, or
* negative for error
*/
int pds_client_unregister(struct pci_dev *pf_pdev, u16 client_id)
int pds_client_unregister(struct pdsc *pf, u16 client_id)
{
union pds_core_adminq_comp comp = {};
union pds_core_adminq_cmd cmd = {};
struct pdsc *pf;
int err;
pf = pci_get_drvdata(pf_pdev);
if (pf->state)
return -ENXIO;
cmd.client_unreg.opcode = PDS_AQ_CMD_CLIENT_UNREG;
cmd.client_unreg.client_id = cpu_to_le16(client_id);
......@@ -198,7 +188,7 @@ int pdsc_auxbus_dev_del(struct pdsc *cf, struct pdsc *pf)
padev = pf->vfs[cf->vf_id].padev;
if (padev) {
pds_client_unregister(pf->pdev, padev->client_id);
pds_client_unregister(pf, padev->client_id);
auxiliary_device_delete(&padev->aux_dev);
auxiliary_device_uninit(&padev->aux_dev);
padev->client_id = 0;
......@@ -243,7 +233,7 @@ int pdsc_auxbus_dev_add(struct pdsc *cf, struct pdsc *pf)
*/
snprintf(devname, sizeof(devname), "%s.%s.%d",
PDS_CORE_DRV_NAME, pf->viftype_status[vt].name, cf->uid);
client_id = pds_client_register(pf->pdev, devname);
client_id = pds_client_register(pf, devname);
if (client_id < 0) {
err = client_id;
goto out_unlock;
......@@ -252,7 +242,7 @@ int pdsc_auxbus_dev_add(struct pdsc *cf, struct pdsc *pf)
padev = pdsc_auxbus_dev_register(cf, pf, client_id,
pf->viftype_status[vt].name);
if (IS_ERR(padev)) {
pds_client_unregister(pf->pdev, client_id);
pds_client_unregister(pf, client_id);
err = PTR_ERR(padev);
goto out_unlock;
}
......
......@@ -632,6 +632,7 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
.detach_ioas = vfio_iommufd_emulated_detach_ioas,
};
struct mdev_driver vfio_ccw_mdev_driver = {
......
......@@ -2020,6 +2020,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
.detach_ioas = vfio_iommufd_emulated_detach_ioas,
.request = vfio_ap_mdev_request
};
......
......@@ -4,6 +4,8 @@ menuconfig VFIO
select IOMMU_API
depends on IOMMUFD || !IOMMUFD
select INTERVAL_TREE
select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
select VFIO_DEVICE_CDEV if !VFIO_GROUP
select VFIO_CONTAINER if IOMMUFD=n
help
VFIO provides a framework for secure userspace device drivers.
......@@ -12,9 +14,33 @@ menuconfig VFIO
If you don't know what to do here, say N.
if VFIO
config VFIO_DEVICE_CDEV
bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
depends on IOMMUFD && !SPAPR_TCE_IOMMU
default !VFIO_GROUP
help
The VFIO device cdev is another way for userspace to get device
access. Userspace gets device fd by opening device cdev under
/dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
to set up secure DMA context for device access. This interface does
not support noiommu.
If you don't know what to do here, say N.
config VFIO_GROUP
bool "Support for the VFIO group /dev/vfio/$group_id"
default y
help
VFIO group support provides the traditional model for accessing
devices through VFIO and is used by the majority of userspace
applications and drivers making use of VFIO.
If you don't know what to do here, say Y.
config VFIO_CONTAINER
bool "Support for the VFIO container /dev/vfio/vfio"
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
depends on VFIO_GROUP
default y
help
The VFIO container is the classic interface to VFIO for establishing
......@@ -36,6 +62,7 @@ endif
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
depends on VFIO_GROUP
help
VFIO is built on the ability to isolate devices using the IOMMU.
Only with an IOMMU can userspace access to DMA capable devices be
......
......@@ -2,8 +2,9 @@
obj-$(CONFIG_VFIO) += vfio.o
vfio-y += vfio_main.o \
group.o \
iova_bitmap.o
vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
vfio-$(CONFIG_VFIO_GROUP) += group.o
vfio-$(CONFIG_IOMMUFD) += iommufd.o
vfio-$(CONFIG_VFIO_CONTAINER) += container.o
vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
......
......@@ -223,7 +223,6 @@ static struct cdx_driver vfio_cdx_driver = {
.match_id_table = vfio_cdx_table,
.driver = {
.name = "vfio-cdx",
.owner = THIS_MODULE,
},
.driver_managed_dma = true,
};
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (c) 2023 Intel Corporation.
*/
#include <linux/vfio.h>
#include <linux/iommufd.h>
#include "vfio.h"
static dev_t device_devt;
void vfio_init_device_cdev(struct vfio_device *device)
{
device->device.devt = MKDEV(MAJOR(device_devt), device->index);
cdev_init(&device->cdev, &vfio_device_fops);
device->cdev.owner = THIS_MODULE;
}
/*
* device access via the fd opened by this function is blocked until
* .open_device() is called successfully during BIND_IOMMUFD.
*/
int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
{
struct vfio_device *device = container_of(inode->i_cdev,
struct vfio_device, cdev);
struct vfio_device_file *df;
int ret;
/* Paired with the put in vfio_device_fops_release() */
if (!vfio_device_try_get_registration(device))
return -ENODEV;
df = vfio_allocate_device_file(device);
if (IS_ERR(df)) {
ret = PTR_ERR(df);
goto err_put_registration;
}
filep->private_data = df;
return 0;
err_put_registration:
vfio_device_put_registration(device);
return ret;
}
static void vfio_df_get_kvm_safe(struct vfio_device_file *df)
{
spin_lock(&df->kvm_ref_lock);
vfio_device_get_kvm_safe(df->device, df->kvm);
spin_unlock(&df->kvm_ref_lock);
}
long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
struct vfio_device_bind_iommufd __user *arg)
{
struct vfio_device *device = df->device;
struct vfio_device_bind_iommufd bind;
unsigned long minsz;
int ret;
static_assert(__same_type(arg->out_devid, df->devid));
minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
if (copy_from_user(&bind, arg, minsz))
return -EFAULT;
if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
return -EINVAL;
/* BIND_IOMMUFD only allowed for cdev fds */
if (df->group)
return -EINVAL;
ret = vfio_device_block_group(device);
if (ret)
return ret;
mutex_lock(&device->dev_set->lock);
/* one device cannot be bound twice */
if (df->access_granted) {
ret = -EINVAL;
goto out_unlock;
}
df->iommufd = iommufd_ctx_from_fd(bind.iommufd);
if (IS_ERR(df->iommufd)) {
ret = PTR_ERR(df->iommufd);
df->iommufd = NULL;
goto out_unlock;
}
/*
* Before the device open, get the KVM pointer currently
* associated with the device file (if there is) and obtain
* a reference. This reference is held until device closed.
* Save the pointer in the device for use by drivers.
*/
vfio_df_get_kvm_safe(df);
ret = vfio_df_open(df);
if (ret)
goto out_put_kvm;
ret = copy_to_user(&arg->out_devid, &df->devid,
sizeof(df->devid)) ? -EFAULT : 0;
if (ret)
goto out_close_device;
device->cdev_opened = true;
/*
* Paired with smp_load_acquire() in vfio_device_fops::ioctl/
* read/write/mmap
*/
smp_store_release(&df->access_granted, true);
mutex_unlock(&device->dev_set->lock);
return 0;
out_close_device:
vfio_df_close(df);
out_put_kvm:
vfio_device_put_kvm(device);
iommufd_ctx_put(df->iommufd);
df->iommufd = NULL;
out_unlock:
mutex_unlock(&device->dev_set->lock);
vfio_device_unblock_group(device);
return ret;
}
void vfio_df_unbind_iommufd(struct vfio_device_file *df)
{
struct vfio_device *device = df->device;
/*
* In the time of close, there is no contention with another one
* changing this flag. So read df->access_granted without lock
* and no smp_load_acquire() is ok.
*/
if (!df->access_granted)
return;
mutex_lock(&device->dev_set->lock);
vfio_df_close(df);
vfio_device_put_kvm(device);
iommufd_ctx_put(df->iommufd);
device->cdev_opened = false;
mutex_unlock(&device->dev_set->lock);
vfio_device_unblock_group(device);
}
int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
struct vfio_device_attach_iommufd_pt __user *arg)
{
struct vfio_device *device = df->device;
struct vfio_device_attach_iommufd_pt attach;
unsigned long minsz;
int ret;
minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
if (copy_from_user(&attach, arg, minsz))
return -EFAULT;
if (attach.argsz < minsz || attach.flags)
return -EINVAL;
mutex_lock(&device->dev_set->lock);
ret = device->ops->attach_ioas(device, &attach.pt_id);
if (ret)
goto out_unlock;
if (copy_to_user(&arg->pt_id, &attach.pt_id, sizeof(attach.pt_id))) {
ret = -EFAULT;
goto out_detach;
}
mutex_unlock(&device->dev_set->lock);
return 0;
out_detach:
device->ops->detach_ioas(device);
out_unlock:
mutex_unlock(&device->dev_set->lock);
return ret;
}
int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
struct vfio_device_detach_iommufd_pt __user *arg)
{
struct vfio_device *device = df->device;
struct vfio_device_detach_iommufd_pt detach;
unsigned long minsz;
minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
if (copy_from_user(&detach, arg, minsz))
return -EFAULT;
if (detach.argsz < minsz || detach.flags)
return -EINVAL;
mutex_lock(&device->dev_set->lock);
device->ops->detach_ioas(device);
mutex_unlock(&device->dev_set->lock);
return 0;
}
static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
{
return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
}
int vfio_cdev_init(struct class *device_class)
{
device_class->devnode = vfio_device_devnode;
return alloc_chrdev_region(&device_devt, 0,
MINORMASK + 1, "vfio-dev");
}
void vfio_cdev_cleanup(void)
{
unregister_chrdev_region(device_devt, MINORMASK + 1);
}
......@@ -593,6 +593,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static struct fsl_mc_driver vfio_fsl_mc_driver = {
......@@ -600,23 +601,11 @@ static struct fsl_mc_driver vfio_fsl_mc_driver = {
.remove = vfio_fsl_mc_remove,
.driver = {
.name = "vfio-fsl-mc",
.owner = THIS_MODULE,
},
.driver_managed_dma = true,
};
static int __init vfio_fsl_mc_driver_init(void)
{
return fsl_mc_driver_register(&vfio_fsl_mc_driver);
}
static void __exit vfio_fsl_mc_driver_exit(void)
{
fsl_mc_driver_unregister(&vfio_fsl_mc_driver);
}
module_init(vfio_fsl_mc_driver_init);
module_exit(vfio_fsl_mc_driver_exit);
module_fsl_mc_driver(vfio_fsl_mc_driver);
MODULE_LICENSE("Dual BSD/GPL");
MODULE_DESCRIPTION("VFIO for FSL-MC devices - User Level meta-driver");
......@@ -160,17 +160,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
{
spin_lock(&device->group->kvm_ref_lock);
if (!device->group->kvm)
goto unlock;
_vfio_device_get_kvm_safe(device, device->group->kvm);
unlock:
vfio_device_get_kvm_safe(device, device->group->kvm);
spin_unlock(&device->group->kvm_ref_lock);
}
static int vfio_device_group_open(struct vfio_device *device)
static int vfio_df_group_open(struct vfio_device_file *df)
{
struct vfio_device *device = df->device;
int ret;
mutex_lock(&device->group->group_lock);
......@@ -190,24 +186,62 @@ static int vfio_device_group_open(struct vfio_device *device)
if (device->open_count == 0)
vfio_device_group_get_kvm_safe(device);
ret = vfio_device_open(device, device->group->iommufd);
df->iommufd = device->group->iommufd;
if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
/*
* Require no compat ioas to be assigned to proceed. The basic
* statement is that the user cannot have done something that
* implies they expected translation to exist
*/
if (!capable(CAP_SYS_RAWIO) ||
vfio_iommufd_device_has_compat_ioas(device, df->iommufd))
ret = -EPERM;
else
ret = 0;
goto out_put_kvm;
}
if (device->open_count == 0)
vfio_device_put_kvm(device);
ret = vfio_df_open(df);
if (ret)
goto out_put_kvm;
if (df->iommufd && device->open_count == 1) {
ret = vfio_iommufd_compat_attach_ioas(device, df->iommufd);
if (ret)
goto out_close_device;
}
/*
* Paired with smp_load_acquire() in vfio_device_fops::ioctl/
* read/write/mmap and vfio_file_has_device_access()
*/
smp_store_release(&df->access_granted, true);
mutex_unlock(&device->dev_set->lock);
mutex_unlock(&device->group->group_lock);
return 0;
out_close_device:
vfio_df_close(df);
out_put_kvm:
df->iommufd = NULL;
if (device->open_count == 0)
vfio_device_put_kvm(device);
mutex_unlock(&device->dev_set->lock);
out_unlock:
mutex_unlock(&device->group->group_lock);
return ret;
}
void vfio_device_group_close(struct vfio_device *device)
void vfio_df_group_close(struct vfio_device_file *df)
{
struct vfio_device *device = df->device;
mutex_lock(&device->group->group_lock);
mutex_lock(&device->dev_set->lock);
vfio_device_close(device, device->group->iommufd);
vfio_df_close(df);
df->iommufd = NULL;
if (device->open_count == 0)
vfio_device_put_kvm(device);
......@@ -218,19 +252,28 @@ void vfio_device_group_close(struct vfio_device *device)
static struct file *vfio_device_open_file(struct vfio_device *device)
{
struct vfio_device_file *df;
struct file *filep;
int ret;
ret = vfio_device_group_open(device);
if (ret)
df = vfio_allocate_device_file(device);
if (IS_ERR(df)) {
ret = PTR_ERR(df);
goto err_out;
}
df->group = device->group;
ret = vfio_df_group_open(df);
if (ret)
goto err_free;
/*
* We can't use anon_inode_getfd() because we need to modify
* the f_mode flags directly to allow more than just ioctls
*/
filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
device, O_RDWR);
df, O_RDWR);
if (IS_ERR(filep)) {
ret = PTR_ERR(filep);
goto err_close_device;
......@@ -253,7 +296,9 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
return filep;
err_close_device:
vfio_device_group_close(device);
vfio_df_group_close(df);
err_free:
kfree(df);
err_out:
return ERR_PTR(ret);
}
......@@ -357,6 +402,33 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
}
}
int vfio_device_block_group(struct vfio_device *device)
{
struct vfio_group *group = device->group;
int ret = 0;
mutex_lock(&group->group_lock);
if (group->opened_file) {
ret = -EBUSY;
goto out_unlock;
}
group->cdev_device_open_cnt++;
out_unlock:
mutex_unlock(&group->group_lock);
return ret;
}
void vfio_device_unblock_group(struct vfio_device *device)
{
struct vfio_group *group = device->group;
mutex_lock(&group->group_lock);
group->cdev_device_open_cnt--;
mutex_unlock(&group->group_lock);
}
static int vfio_group_fops_open(struct inode *inode, struct file *filep)
{
struct vfio_group *group =
......@@ -379,6 +451,11 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
goto out_unlock;
}
if (group->cdev_device_open_cnt) {
ret = -EBUSY;
goto out_unlock;
}
/*
* Do we need multiple instances of the group open? Seems not.
*/
......@@ -453,6 +530,7 @@ static void vfio_group_release(struct device *dev)
mutex_destroy(&group->device_lock);
mutex_destroy(&group->group_lock);
WARN_ON(group->iommu_group);
WARN_ON(group->cdev_device_open_cnt);
ida_free(&vfio.group_ida, MINOR(group->dev.devt));
kfree(group);
}
......@@ -604,16 +682,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
if (!iommu_group)
return ERR_PTR(-EINVAL);
/*
* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
* restore cache coherency. It has to be checked here because it is only
* valid for cases where we are using iommu groups.
*/
if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
iommu_group_put(iommu_group);
return ERR_PTR(-EINVAL);
}
mutex_lock(&vfio.group_lock);
group = vfio_group_find_from_iommu(iommu_group);
if (group) {
......@@ -745,6 +813,15 @@ bool vfio_device_has_container(struct vfio_device *device)
return device->group->container;
}
struct vfio_group *vfio_group_from_file(struct file *file)
{
struct vfio_group *group = file->private_data;
if (file->f_op != &vfio_group_fops)
return NULL;
return group;
}
/**
* vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
* @file: VFIO group file
......@@ -755,13 +832,13 @@ bool vfio_device_has_container(struct vfio_device *device)
*/
struct iommu_group *vfio_file_iommu_group(struct file *file)
{
struct vfio_group *group = file->private_data;
struct vfio_group *group = vfio_group_from_file(file);
struct iommu_group *iommu_group = NULL;
if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
return NULL;
if (!vfio_file_is_group(file))
if (!group)
return NULL;
mutex_lock(&group->group_lock);
......@@ -775,33 +852,20 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
/**
* vfio_file_is_group - True if the file is usable with VFIO APIs
* vfio_file_is_group - True if the file is a vfio group file
* @file: VFIO group file
*/
bool vfio_file_is_group(struct file *file)
{
return file->f_op == &vfio_group_fops;
return vfio_group_from_file(file);
}
EXPORT_SYMBOL_GPL(vfio_file_is_group);
/**
* vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
* is always CPU cache coherent
* @file: VFIO group file
*
* Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
* bit in DMA transactions. A return of false indicates that the user has
* rights to access additional instructions such as wbinvd on x86.
*/
bool vfio_file_enforced_coherent(struct file *file)
bool vfio_group_enforced_coherent(struct vfio_group *group)
{
struct vfio_group *group = file->private_data;
struct vfio_device *device;
bool ret = true;
if (!vfio_file_is_group(file))
return true;
/*
* If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
* any domain later attached to it will also not support it. If the cap
......@@ -819,28 +883,13 @@ bool vfio_file_enforced_coherent(struct file *file)
mutex_unlock(&group->device_lock);
return ret;
}
EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
/**
* vfio_file_set_kvm - Link a kvm with VFIO drivers
* @file: VFIO group file
* @kvm: KVM to link
*
* When a VFIO device is first opened the KVM will be available in
* device->kvm if one was associated with the group.
*/
void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
{
struct vfio_group *group = file->private_data;
if (!vfio_file_is_group(file))
return;
spin_lock(&group->kvm_ref_lock);
group->kvm = kvm;
spin_unlock(&group->kvm_ref_lock);
}
EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
/**
* vfio_file_has_dev - True if the VFIO file is a handle for device
......@@ -851,9 +900,9 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
*/
bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
{
struct vfio_group *group = file->private_data;
struct vfio_group *group = vfio_group_from_file(file);
if (!vfio_file_is_group(file))
if (!group)
return false;
return group == device->group;
......
......@@ -10,53 +10,48 @@
MODULE_IMPORT_NS(IOMMUFD);
MODULE_IMPORT_NS(IOMMUFD_VFIO);
int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
struct iommufd_ctx *ictx)
{
u32 ioas_id;
u32 device_id;
int ret;
lockdep_assert_held(&vdev->dev_set->lock);
return !iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
}
if (vfio_device_is_noiommu(vdev)) {
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
int vfio_df_iommufd_bind(struct vfio_device_file *df)
{
struct vfio_device *vdev = df->device;
struct iommufd_ctx *ictx = df->iommufd;
/*
* Require no compat ioas to be assigned to proceed. The basic
* statement is that the user cannot have done something that
* implies they expected translation to exist
*/
if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
return -EPERM;
return 0;
}
lockdep_assert_held(&vdev->dev_set->lock);
ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
if (ret)
return ret;
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
}
ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
if (ret)
goto err_unbind;
ret = vdev->ops->attach_ioas(vdev, &ioas_id);
if (ret)
goto err_unbind;
int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
struct iommufd_ctx *ictx)
{
u32 ioas_id;
int ret;
/*
* The legacy path has no way to return the device id or the selected
* pt_id
*/
lockdep_assert_held(&vdev->dev_set->lock);
/* compat noiommu does not need to do ioas attach */
if (vfio_device_is_noiommu(vdev))
return 0;
err_unbind:
if (vdev->ops->unbind_iommufd)
vdev->ops->unbind_iommufd(vdev);
ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
if (ret)
return ret;
/* The legacy path has no way to return the selected pt_id */
return vdev->ops->attach_ioas(vdev, &ioas_id);
}
void vfio_iommufd_unbind(struct vfio_device *vdev)
void vfio_df_iommufd_unbind(struct vfio_device_file *df)
{
struct vfio_device *vdev = df->device;
lockdep_assert_held(&vdev->dev_set->lock);
if (vfio_device_is_noiommu(vdev))
......@@ -66,6 +61,50 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
vdev->ops->unbind_iommufd(vdev);
}
struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
{
if (vdev->iommufd_device)
return iommufd_device_to_ictx(vdev->iommufd_device);
return NULL;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
static int vfio_iommufd_device_id(struct vfio_device *vdev)
{
if (vdev->iommufd_device)
return iommufd_device_to_id(vdev->iommufd_device);
return -EINVAL;
}
/*
* Return devid for a device.
* valid ID for the device that is owned by the ictx
* -ENOENT = device is owned but there is no ID
* -ENODEV or other error = device is not owned
*/
int vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx)
{
struct iommu_group *group;
int devid;
if (vfio_iommufd_device_ictx(vdev) == ictx)
return vfio_iommufd_device_id(vdev);
group = iommu_group_get(vdev->dev);
if (!group)
return -ENODEV;
if (iommufd_ctx_has_group(ictx, group))
devid = -ENOENT;
else
devid = -ENODEV;
iommu_group_put(group);
return devid;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_get_dev_id);
/*
* The physical standard ops mean that the iommufd_device is bound to the
* physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
......@@ -101,6 +140,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
{
int rc;
lockdep_assert_held(&vdev->dev_set->lock);
if (WARN_ON(!vdev->iommufd_device))
return -EINVAL;
if (vdev->iommufd_attached)
return -EBUSY;
rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
if (rc)
return rc;
......@@ -109,6 +156,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
}
EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
{
lockdep_assert_held(&vdev->dev_set->lock);
if (WARN_ON(!vdev->iommufd_device) || !vdev->iommufd_attached)
return;
iommufd_device_detach(vdev->iommufd_device);
vdev->iommufd_attached = false;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
/*
* The emulated standard ops mean that vfio_device is going to use the
* "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
......@@ -172,3 +231,16 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
return 0;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev)
{
lockdep_assert_held(&vdev->dev_set->lock);
if (WARN_ON(!vdev->iommufd_access) ||
!vdev->iommufd_attached)
return;
iommufd_access_detach(vdev->iommufd_access);
vdev->iommufd_attached = false;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_detach_ioas);
......@@ -63,4 +63,6 @@ source "drivers/vfio/pci/mlx5/Kconfig"
source "drivers/vfio/pci/hisilicon/Kconfig"
source "drivers/vfio/pci/pds/Kconfig"
endmenu
......@@ -11,3 +11,5 @@ obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
obj-$(CONFIG_PDS_VFIO_PCI) += pds/
......@@ -1373,6 +1373,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
......@@ -1391,6 +1392,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
......
......@@ -732,52 +732,6 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
mlx5vf_cmd_dealloc_pd(migf);
}
static void combine_ranges(struct rb_root_cached *root, u32 cur_nodes,
u32 req_nodes)
{
struct interval_tree_node *prev, *curr, *comb_start, *comb_end;
unsigned long min_gap;
unsigned long curr_gap;
/* Special shortcut when a single range is required */
if (req_nodes == 1) {
unsigned long last;
curr = comb_start = interval_tree_iter_first(root, 0, ULONG_MAX);
while (curr) {
last = curr->last;
prev = curr;
curr = interval_tree_iter_next(curr, 0, ULONG_MAX);
if (prev != comb_start)
interval_tree_remove(prev, root);
}
comb_start->last = last;
return;
}
/* Combine ranges which have the smallest gap */
while (cur_nodes > req_nodes) {
prev = NULL;
min_gap = ULONG_MAX;
curr = interval_tree_iter_first(root, 0, ULONG_MAX);
while (curr) {
if (prev) {
curr_gap = curr->start - prev->last;
if (curr_gap < min_gap) {
min_gap = curr_gap;
comb_start = prev;
comb_end = curr;
}
}
prev = curr;
curr = interval_tree_iter_next(curr, 0, ULONG_MAX);
}
comb_start->last = comb_end->last;
interval_tree_remove(comb_end, root);
cur_nodes--;
}
}
static int mlx5vf_create_tracker(struct mlx5_core_dev *mdev,
struct mlx5vf_pci_core_device *mvdev,
struct rb_root_cached *ranges, u32 nnodes)
......@@ -800,7 +754,7 @@ static int mlx5vf_create_tracker(struct mlx5_core_dev *mdev,
int i;
if (num_ranges > max_num_range) {
combine_ranges(ranges, nnodes, max_num_range);
vfio_combine_iova_ranges(ranges, nnodes, max_num_range);
num_ranges = max_num_range;
}
......
......@@ -1320,6 +1320,7 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static int mlx5vf_pci_probe(struct pci_dev *pdev,
......
# SPDX-License-Identifier: GPL-2.0
# Copyright (c) 2023 Advanced Micro Devices, Inc.
config PDS_VFIO_PCI
tristate "VFIO support for PDS PCI devices"
depends on PDS_CORE
select VFIO_PCI_CORE
help
This provides generic PCI support for PDS devices using the VFIO
framework.
More specific information on this driver can be
found in
<file:Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst>.
To compile this driver as a module, choose M here. The module
will be called pds-vfio-pci.
If you don't know what to do here, say N.
# SPDX-License-Identifier: GPL-2.0
# Copyright (c) 2023 Advanced Micro Devices, Inc.
obj-$(CONFIG_PDS_VFIO_PCI) += pds-vfio-pci.o
pds-vfio-pci-y := \
cmds.o \
dirty.o \
lm.o \
pci_drv.o \
vfio_dev.o
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#ifndef _CMDS_H_
#define _CMDS_H_
int pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio);
void pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio);
int pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio, u8 type);
int pds_vfio_resume_device_cmd(struct pds_vfio_pci_device *pds_vfio, u8 type);
int pds_vfio_get_lm_state_size_cmd(struct pds_vfio_pci_device *pds_vfio, u64 *size);
int pds_vfio_get_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio);
int pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio);
void pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio,
enum pds_lm_host_vf_status vf_status);
int pds_vfio_dirty_status_cmd(struct pds_vfio_pci_device *pds_vfio,
u64 regions_dma, u8 *max_regions,
u8 *num_regions);
int pds_vfio_dirty_enable_cmd(struct pds_vfio_pci_device *pds_vfio,
u64 regions_dma, u8 num_regions);
int pds_vfio_dirty_disable_cmd(struct pds_vfio_pci_device *pds_vfio);
int pds_vfio_dirty_seq_ack_cmd(struct pds_vfio_pci_device *pds_vfio,
u64 sgl_dma, u16 num_sge, u32 offset,
u32 total_len, bool read_seq);
#endif /* _CMDS_H_ */
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#ifndef _DIRTY_H_
#define _DIRTY_H_
struct pds_vfio_bmp_info {
unsigned long *bmp;
u32 bmp_bytes;
struct pds_lm_sg_elem *sgl;
dma_addr_t sgl_addr;
u16 num_sge;
};
struct pds_vfio_dirty {
struct pds_vfio_bmp_info host_seq;
struct pds_vfio_bmp_info host_ack;
u64 region_size;
u64 region_start;
u64 region_page_size;
bool is_enabled;
};
struct pds_vfio_pci_device;
bool pds_vfio_dirty_is_enabled(struct pds_vfio_pci_device *pds_vfio);
void pds_vfio_dirty_set_enabled(struct pds_vfio_pci_device *pds_vfio);
void pds_vfio_dirty_set_disabled(struct pds_vfio_pci_device *pds_vfio);
void pds_vfio_dirty_disable(struct pds_vfio_pci_device *pds_vfio,
bool send_cmd);
int pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova,
unsigned long length,
struct iova_bitmap *dirty);
int pds_vfio_dma_logging_start(struct vfio_device *vdev,
struct rb_root_cached *ranges, u32 nnodes,
u64 *page_size);
int pds_vfio_dma_logging_stop(struct vfio_device *vdev);
#endif /* _DIRTY_H_ */
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#ifndef _LM_H_
#define _LM_H_
#include <linux/fs.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>
#include <linux/types.h>
#include <linux/pds/pds_common.h>
#include <linux/pds/pds_adminq.h>
struct pds_vfio_lm_file {
struct file *filep;
struct mutex lock; /* protect live migration data file */
u64 size; /* Size with valid data */
u64 alloc_size; /* Total allocated size. Always >= len */
void *page_mem; /* memory allocated for pages */
struct page **pages; /* Backing pages for file */
unsigned long long npages;
struct sg_table sg_table; /* SG table for backing pages */
struct pds_lm_sg_elem *sgl; /* DMA mapping */
dma_addr_t sgl_addr;
u16 num_sge;
struct scatterlist *last_offset_sg; /* Iterator */
unsigned int sg_last_entry;
unsigned long last_offset;
};
struct pds_vfio_pci_device;
struct file *
pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio,
enum vfio_device_mig_state next);
void pds_vfio_put_save_file(struct pds_vfio_pci_device *pds_vfio);
void pds_vfio_put_restore_file(struct pds_vfio_pci_device *pds_vfio);
#endif /* _LM_H_ */
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/types.h>
#include <linux/vfio.h>
#include <linux/pds/pds_common.h>
#include <linux/pds/pds_core_if.h>
#include <linux/pds/pds_adminq.h>
#include "vfio_dev.h"
#include "pci_drv.h"
#include "cmds.h"
#define PDS_VFIO_DRV_DESCRIPTION "AMD/Pensando VFIO Device Driver"
#define PCI_VENDOR_ID_PENSANDO 0x1dd8
static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio)
{
bool deferred_reset_needed = false;
/*
* Documentation states that the kernel migration driver must not
* generate asynchronous device state transitions outside of
* manipulation by the user or the VFIO_DEVICE_RESET ioctl.
*
* Since recovery is an asynchronous event received from the device,
* initiate a deferred reset. Issue a deferred reset in the following
* situations:
* 1. Migration is in progress, which will cause the next step of
* the migration to fail.
* 2. If the device is in a state that will be set to
* VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM is
* shutdown and device is in VFIO_DEVICE_STATE_STOP).
*/
mutex_lock(&pds_vfio->state_mutex);
if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING &&
pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
(pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
pds_vfio_dirty_is_enabled(pds_vfio)))
deferred_reset_needed = true;
mutex_unlock(&pds_vfio->state_mutex);
/*
 * On the next user-initiated state transition, the device will
 * transition to VFIO_DEVICE_STATE_ERROR. At that point it's the user's
 * responsibility to reset the device.
 *
 * If a VFIO_DEVICE_RESET is requested post recovery and before the
 * next state transition, then the deferred reset state is set to
 * VFIO_DEVICE_STATE_RUNNING instead.
 */
if (deferred_reset_needed) {
spin_lock(&pds_vfio->reset_lock);
pds_vfio->deferred_reset = true;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_ERROR;
spin_unlock(&pds_vfio->reset_lock);
}
}
static int pds_vfio_pci_notify_handler(struct notifier_block *nb,
unsigned long ecode, void *data)
{
struct pds_vfio_pci_device *pds_vfio =
container_of(nb, struct pds_vfio_pci_device, nb);
struct device *dev = pds_vfio_to_dev(pds_vfio);
union pds_core_notifyq_comp *event = data;
dev_dbg(dev, "%s: event code %lu\n", __func__, ecode);
/*
 * We don't need to do anything for RESET state==0 as there is no
 * notify or feedback mechanism available, and it is possible that we
 * won't even see a state==0 event while the pds_core recovery is
 * pending.
 *
 * Any requests from VFIO while state==0 will fail, which returns an
 * error and may cause migration to fail.
 */
if (ecode == PDS_EVENT_RESET) {
dev_info(dev, "%s: PDS_EVENT_RESET event received, state==%d\n",
__func__, event->reset.state);
/*
* pds_core device finished recovery and sent us the
* notification (state == 1) to allow us to recover
*/
if (event->reset.state == 1)
pds_vfio_recovery(pds_vfio);
}
return 0;
}
static int
pds_vfio_pci_register_event_handler(struct pds_vfio_pci_device *pds_vfio)
{
struct device *dev = pds_vfio_to_dev(pds_vfio);
struct notifier_block *nb = &pds_vfio->nb;
int err;
if (!nb->notifier_call) {
nb->notifier_call = pds_vfio_pci_notify_handler;
err = pdsc_register_notify(nb);
if (err) {
nb->notifier_call = NULL;
dev_err(dev,
"failed to register pds event handler: %pe\n",
ERR_PTR(err));
return -EINVAL;
}
dev_dbg(dev, "pds event handler registered\n");
}
return 0;
}
static void
pds_vfio_pci_unregister_event_handler(struct pds_vfio_pci_device *pds_vfio)
{
if (pds_vfio->nb.notifier_call) {
pdsc_unregister_notify(&pds_vfio->nb);
pds_vfio->nb.notifier_call = NULL;
}
}
static int pds_vfio_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *id)
{
struct pds_vfio_pci_device *pds_vfio;
int err;
pds_vfio = vfio_alloc_device(pds_vfio_pci_device, vfio_coredev.vdev,
&pdev->dev, pds_vfio_ops_info());
if (IS_ERR(pds_vfio))
return PTR_ERR(pds_vfio);
dev_set_drvdata(&pdev->dev, &pds_vfio->vfio_coredev);
err = vfio_pci_core_register_device(&pds_vfio->vfio_coredev);
if (err)
goto out_put_vdev;
err = pds_vfio_register_client_cmd(pds_vfio);
if (err) {
dev_err(&pdev->dev, "failed to register as client: %pe\n",
ERR_PTR(err));
goto out_unregister_coredev;
}
err = pds_vfio_pci_register_event_handler(pds_vfio);
if (err)
goto out_unregister_client;
return 0;
out_unregister_client:
pds_vfio_unregister_client_cmd(pds_vfio);
out_unregister_coredev:
vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev);
out_put_vdev:
vfio_put_device(&pds_vfio->vfio_coredev.vdev);
return err;
}
static void pds_vfio_pci_remove(struct pci_dev *pdev)
{
struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev);
pds_vfio_pci_unregister_event_handler(pds_vfio);
pds_vfio_unregister_client_cmd(pds_vfio);
vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev);
vfio_put_device(&pds_vfio->vfio_coredev.vdev);
}
static const struct pci_device_id pds_vfio_pci_table[] = {
{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_PENSANDO, 0x1003) }, /* Ethernet VF */
{ 0, }
};
MODULE_DEVICE_TABLE(pci, pds_vfio_pci_table);
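/*
 * AER .reset_done callback: the VF was just reset, so queue a deferred
 * transition back to VFIO_DEVICE_STATE_RUNNING.
 */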
static void pds_vfio_pci_aer_reset_done(struct pci_dev *pdev)
{
struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev);
pds_vfio_reset(pds_vfio);
}
static const struct pci_error_handlers pds_vfio_pci_err_handlers = {
.reset_done = pds_vfio_pci_aer_reset_done,
.error_detected = vfio_pci_core_aer_err_detected,
};
static struct pci_driver pds_vfio_pci_driver = {
.name = KBUILD_MODNAME,
.id_table = pds_vfio_pci_table,
.probe = pds_vfio_pci_probe,
.remove = pds_vfio_pci_remove,
.err_handler = &pds_vfio_pci_err_handlers,
.driver_managed_dma = true,
};
module_pci_driver(pds_vfio_pci_driver);
MODULE_DESCRIPTION(PDS_VFIO_DRV_DESCRIPTION);
MODULE_AUTHOR("Brett Creeley <brett.creeley@amd.com>");
MODULE_LICENSE("GPL");
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#ifndef _PCI_DRV_H
#define _PCI_DRV_H
#include <linux/pci.h>
#endif /* _PCI_DRV_H */
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#include <linux/vfio.h>
#include <linux/vfio_pci_core.h>
#include "lm.h"
#include "dirty.h"
#include "vfio_dev.h"
struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio)
{
return pds_vfio->vfio_coredev.pdev;
}
struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio)
{
return &pds_vfio_to_pci_dev(pds_vfio)->dev;
}
struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev)
{
struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
return container_of(core_device, struct pds_vfio_pci_device,
vfio_coredev);
}
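/*
 * Drop state_mutex, first servicing any reset that was deferred while the
 * mutex was held. The reset_lock/deferred_reset handshake with
 * pds_vfio_reset() ensures a reset requested mid-transition isn't lost.
 */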
void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio)
{
again:
spin_lock(&pds_vfio->reset_lock);
if (pds_vfio->deferred_reset) {
pds_vfio->deferred_reset = false;
if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) {
pds_vfio_put_restore_file(pds_vfio);
pds_vfio_put_save_file(pds_vfio);
pds_vfio_dirty_disable(pds_vfio, false);
}
pds_vfio->state = pds_vfio->deferred_reset_state;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
spin_unlock(&pds_vfio->reset_lock);
goto again;
}
mutex_unlock(&pds_vfio->state_mutex);
spin_unlock(&pds_vfio->reset_lock);
}
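/*
 * Request a reset back to VFIO_DEVICE_STATE_RUNNING. If state_mutex is
 * already held, the holder will observe deferred_reset and perform the
 * reset when it unlocks; otherwise handle the deferred reset here.
 */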
void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio)
{
spin_lock(&pds_vfio->reset_lock);
pds_vfio->deferred_reset = true;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
if (!mutex_trylock(&pds_vfio->state_mutex)) {
spin_unlock(&pds_vfio->reset_lock);
return;
}
spin_unlock(&pds_vfio->reset_lock);
pds_vfio_state_mutex_unlock(pds_vfio);
}
static struct file *
pds_vfio_set_device_state(struct vfio_device *vdev,
enum vfio_device_mig_state new_state)
{
struct pds_vfio_pci_device *pds_vfio =
container_of(vdev, struct pds_vfio_pci_device,
vfio_coredev.vdev);
struct file *res = NULL;
mutex_lock(&pds_vfio->state_mutex);
/*
 * The only way to transition out of VFIO_DEVICE_STATE_ERROR is via
 * VFIO_DEVICE_RESET, so prevent the state machine from running, since
 * vfio_mig_get_next_state() will throw a WARN_ON() when transitioning
 * from VFIO_DEVICE_STATE_ERROR to any other state.
 */
while (pds_vfio->state != VFIO_DEVICE_STATE_ERROR &&
new_state != pds_vfio->state) {
enum vfio_device_mig_state next_state;
int err = vfio_mig_get_next_state(vdev, pds_vfio->state,
new_state, &next_state);
if (err) {
res = ERR_PTR(err);
break;
}
res = pds_vfio_step_device_state_locked(pds_vfio, next_state);
if (IS_ERR(res))
break;
pds_vfio->state = next_state;
if (WARN_ON(res && new_state != pds_vfio->state)) {
res = ERR_PTR(-EINVAL);
break;
}
}
pds_vfio_state_mutex_unlock(pds_vfio);
/* still waiting on a deferred_reset */
if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR)
res = ERR_PTR(-EIO);
return res;
}
static int pds_vfio_get_device_state(struct vfio_device *vdev,
enum vfio_device_mig_state *current_state)
{
struct pds_vfio_pci_device *pds_vfio =
container_of(vdev, struct pds_vfio_pci_device,
vfio_coredev.vdev);
mutex_lock(&pds_vfio->state_mutex);
*current_state = pds_vfio->state;
pds_vfio_state_mutex_unlock(pds_vfio);
return 0;
}
static int pds_vfio_get_device_state_size(struct vfio_device *vdev,
unsigned long *stop_copy_length)
{
*stop_copy_length = PDS_LM_DEVICE_STATE_LENGTH;
return 0;
}
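/* Migration callbacks invoked by the VFIO core's mig_state feature handling */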
static const struct vfio_migration_ops pds_vfio_lm_ops = {
.migration_set_state = pds_vfio_set_device_state,
.migration_get_state = pds_vfio_get_device_state,
.migration_get_data_size = pds_vfio_get_device_state_size
};
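/* Dirty-page tracking callbacks backing the VFIO DMA logging features */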
static const struct vfio_log_ops pds_vfio_log_ops = {
.log_start = pds_vfio_dma_logging_start,
.log_stop = pds_vfio_dma_logging_stop,
.log_read_and_clear = pds_vfio_dma_logging_report,
};
static int pds_vfio_init_device(struct vfio_device *vdev)
{
struct pds_vfio_pci_device *pds_vfio =
container_of(vdev, struct pds_vfio_pci_device,
vfio_coredev.vdev);
struct pci_dev *pdev = to_pci_dev(vdev->dev);
int err, vf_id, pci_id;
vf_id = pci_iov_vf_id(pdev);
if (vf_id < 0)
return vf_id;
err = vfio_pci_core_init_dev(vdev);
if (err)
return err;
pds_vfio->vf_id = vf_id;
vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
vdev->mig_ops = &pds_vfio_lm_ops;
vdev->log_ops = &pds_vfio_log_ops;
pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
dev_dbg(&pdev->dev,
"%s: PF %#04x VF %#04x vf_id %d domain %d pds_vfio %p\n",
__func__, pci_dev_id(pdev->physfn), pci_id, vf_id,
pci_domain_nr(pdev->bus), pds_vfio);
return 0;
}
static int pds_vfio_open_device(struct vfio_device *vdev)
{
struct pds_vfio_pci_device *pds_vfio =
container_of(vdev, struct pds_vfio_pci_device,
vfio_coredev.vdev);
int err;
err = vfio_pci_core_enable(&pds_vfio->vfio_coredev);
if (err)
return err;
mutex_init(&pds_vfio->state_mutex);
pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev);
return 0;
}
static void pds_vfio_close_device(struct vfio_device *vdev)
{
struct pds_vfio_pci_device *pds_vfio =
container_of(vdev, struct pds_vfio_pci_device,
vfio_coredev.vdev);
mutex_lock(&pds_vfio->state_mutex);
pds_vfio_put_restore_file(pds_vfio);
pds_vfio_put_save_file(pds_vfio);
pds_vfio_dirty_disable(pds_vfio, true);
mutex_unlock(&pds_vfio->state_mutex);
mutex_destroy(&pds_vfio->state_mutex);
vfio_pci_core_close_device(vdev);
}
static const struct vfio_device_ops pds_vfio_ops = {
.name = "pds-vfio",
.init = pds_vfio_init_device,
.release = vfio_pci_core_release_dev,
.open_device = pds_vfio_open_device,
.close_device = pds_vfio_close_device,
.ioctl = vfio_pci_core_ioctl,
.device_feature = vfio_pci_core_ioctl_feature,
.read = vfio_pci_core_read,
.write = vfio_pci_core_write,
.mmap = vfio_pci_core_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
};
const struct vfio_device_ops *pds_vfio_ops_info(void)
{
return &pds_vfio_ops;
}
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2023 Advanced Micro Devices, Inc. */
#ifndef _VFIO_DEV_H_
#define _VFIO_DEV_H_
#include <linux/pci.h>
#include <linux/vfio_pci_core.h>
#include "dirty.h"
#include "lm.h"
struct pds_vfio_pci_device {
struct vfio_pci_core_device vfio_coredev;
struct pds_vfio_lm_file *save_file;
struct pds_vfio_lm_file *restore_file;
struct pds_vfio_dirty dirty;
struct mutex state_mutex; /* protect migration state */
enum vfio_device_mig_state state;
spinlock_t reset_lock; /* protect reset_done flow */
u8 deferred_reset;
enum vfio_device_mig_state deferred_reset_state;
struct notifier_block nb;
int vf_id;
u16 client_id;
};
void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio);
const struct vfio_device_ops *pds_vfio_ops_info(void);
struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev);
void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio);
struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio);
struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio);
#endif /* _VFIO_DEV_H_ */
......@@ -141,6 +141,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
......
......@@ -119,6 +119,7 @@ static const struct vfio_device_ops vfio_amba_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static const struct amba_id pl330_ids[] = {
......
......@@ -108,6 +108,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static struct platform_driver vfio_platform_driver = {
......
......@@ -16,14 +16,32 @@ struct iommufd_ctx;
struct iommu_group;
struct vfio_container;
struct vfio_device_file {
struct vfio_device *device;
struct vfio_group *group;
u8 access_granted;
u32 devid; /* only valid when iommufd is valid */
spinlock_t kvm_ref_lock; /* protect kvm field */
struct kvm *kvm;
struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
};
void vfio_device_put_registration(struct vfio_device *device);
bool vfio_device_try_get_registration(struct vfio_device *device);
int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
void vfio_device_close(struct vfio_device *device,
struct iommufd_ctx *iommufd);
int vfio_df_open(struct vfio_device_file *df);
void vfio_df_close(struct vfio_device_file *df);
struct vfio_device_file *
vfio_allocate_device_file(struct vfio_device *device);
extern const struct file_operations vfio_device_fops;
#ifdef CONFIG_VFIO_NOIOMMU
extern bool vfio_noiommu __read_mostly;
#else
enum { vfio_noiommu = false };
#endif
enum vfio_group_type {
/*
* Physical device with IOMMU backing.
......@@ -48,6 +66,7 @@ enum vfio_group_type {
VFIO_NO_IOMMU,
};
#if IS_ENABLED(CONFIG_VFIO_GROUP)
struct vfio_group {
struct device dev;
struct cdev cdev;
......@@ -74,8 +93,11 @@ struct vfio_group {
struct blocking_notifier_head notifier;
struct iommufd_ctx *iommufd;
spinlock_t kvm_ref_lock;
unsigned int cdev_device_open_cnt;
};
int vfio_device_block_group(struct vfio_device *device);
void vfio_device_unblock_group(struct vfio_device *device);
int vfio_device_set_group(struct vfio_device *device,
enum vfio_group_type type);
void vfio_device_remove_group(struct vfio_device *device);
......@@ -83,7 +105,10 @@ void vfio_device_group_register(struct vfio_device *device);
void vfio_device_group_unregister(struct vfio_device *device);
int vfio_device_group_use_iommu(struct vfio_device *device);
void vfio_device_group_unuse_iommu(struct vfio_device *device);
void vfio_device_group_close(struct vfio_device *device);
void vfio_df_group_close(struct vfio_device_file *df);
struct vfio_group *vfio_group_from_file(struct file *file);
bool vfio_group_enforced_coherent(struct vfio_group *group);
void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
......@@ -93,6 +118,82 @@ static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
vdev->group->type == VFIO_NO_IOMMU;
}
#else
struct vfio_group;
static inline int vfio_device_block_group(struct vfio_device *device)
{
return 0;
}
static inline void vfio_device_unblock_group(struct vfio_device *device)
{
}
static inline int vfio_device_set_group(struct vfio_device *device,
enum vfio_group_type type)
{
return 0;
}
static inline void vfio_device_remove_group(struct vfio_device *device)
{
}
static inline void vfio_device_group_register(struct vfio_device *device)
{
}
static inline void vfio_device_group_unregister(struct vfio_device *device)
{
}
static inline int vfio_device_group_use_iommu(struct vfio_device *device)
{
return -EOPNOTSUPP;
}
static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
{
}
static inline void vfio_df_group_close(struct vfio_device_file *df)
{
}
static inline struct vfio_group *vfio_group_from_file(struct file *file)
{
return NULL;
}
static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
{
return true;
}
static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
{
}
static inline bool vfio_device_has_container(struct vfio_device *device)
{
return false;
}
static inline int __init vfio_group_init(void)
{
return 0;
}
static inline void vfio_group_cleanup(void)
{
}
static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
return false;
}
#endif /* CONFIG_VFIO_GROUP */
#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
/**
......@@ -217,20 +318,109 @@ static inline void vfio_container_cleanup(void)
#endif
#if IS_ENABLED(CONFIG_IOMMUFD)
int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
void vfio_iommufd_unbind(struct vfio_device *device);
bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
struct iommufd_ctx *ictx);
int vfio_df_iommufd_bind(struct vfio_device_file *df);
void vfio_df_iommufd_unbind(struct vfio_device_file *df);
int vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
struct iommufd_ctx *ictx);
#else
static inline int vfio_iommufd_bind(struct vfio_device *device,
static inline bool
vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
struct iommufd_ctx *ictx)
{
return false;
}
static inline int vfio_df_iommufd_bind(struct vfio_device_file *fd)
{
return -EOPNOTSUPP;
}
static inline void vfio_iommufd_unbind(struct vfio_device *device)
static inline void vfio_df_iommufd_unbind(struct vfio_device_file *df)
{
}
static inline int
vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
struct iommufd_ctx *ictx)
{
return -EOPNOTSUPP;
}
#endif
int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
struct vfio_device_attach_iommufd_pt __user *arg);
int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
struct vfio_device_detach_iommufd_pt __user *arg);
#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
void vfio_init_device_cdev(struct vfio_device *device);
static inline int vfio_device_add(struct vfio_device *device)
{
/* the cdev path does not support noiommu devices */
if (vfio_device_is_noiommu(device))
return device_add(&device->device);
vfio_init_device_cdev(device);
return cdev_device_add(&device->cdev, &device->device);
}
static inline void vfio_device_del(struct vfio_device *device)
{
if (vfio_device_is_noiommu(device))
device_del(&device->device);
else
cdev_device_del(&device->cdev, &device->device);
}
int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
struct vfio_device_bind_iommufd __user *arg);
void vfio_df_unbind_iommufd(struct vfio_device_file *df);
int vfio_cdev_init(struct class *device_class);
void vfio_cdev_cleanup(void);
#else
static inline void vfio_init_device_cdev(struct vfio_device *device)
{
}
static inline int vfio_device_add(struct vfio_device *device)
{
return device_add(&device->device);
}
static inline void vfio_device_del(struct vfio_device *device)
{
device_del(&device->device);
}
static inline int vfio_device_fops_cdev_open(struct inode *inode,
struct file *filep)
{
return 0;
}
static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
struct vfio_device_bind_iommufd __user *arg)
{
return -ENOTTY;
}
static inline void vfio_df_unbind_iommufd(struct vfio_device_file *df)
{
}
static inline int vfio_cdev_init(struct class *device_class)
{
return 0;
}
static inline void vfio_cdev_cleanup(void)
{
}
#endif /* CONFIG_VFIO_DEVICE_CDEV */
#if IS_ENABLED(CONFIG_VFIO_VIRQFD)
int __init vfio_virqfd_init(void);
void vfio_virqfd_exit(void);
......@@ -244,17 +434,11 @@ static inline void vfio_virqfd_exit(void)
}
#endif
#ifdef CONFIG_VFIO_NOIOMMU
extern bool vfio_noiommu __read_mostly;
#else
enum { vfio_noiommu = false };
#endif
#ifdef CONFIG_HAVE_KVM
void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
void vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
void vfio_device_put_kvm(struct vfio_device *device);
#else
static inline void _vfio_device_get_kvm_safe(struct vfio_device *device,
static inline void vfio_device_get_kvm_safe(struct vfio_device *device,
struct kvm *kvm)
{
}
......
......@@ -2732,7 +2732,7 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu,
static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
struct vfio_info_cap *caps)
{
struct vfio_iommu_type1_info_cap_migration cap_mig;
struct vfio_iommu_type1_info_cap_migration cap_mig = {};
cap_mig.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION;
cap_mig.header.version = 1;
......@@ -2762,27 +2762,20 @@ static int vfio_iommu_dma_avail_build_caps(struct vfio_iommu *iommu,
static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
unsigned long arg)
{
struct vfio_iommu_type1_info info;
struct vfio_iommu_type1_info info = {};
unsigned long minsz;
struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
unsigned long capsz;
int ret;
minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
/* For backward compatibility, cannot require this */
capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
if (copy_from_user(&info, (void __user *)arg, minsz))
return -EFAULT;
if (info.argsz < minsz)
return -EINVAL;
if (info.argsz >= capsz) {
minsz = capsz;
info.cap_offset = 0; /* output, no-recopy necessary */
}
minsz = min_t(size_t, info.argsz, sizeof(info));
mutex_lock(&iommu->lock);
info.flags = VFIO_IOMMU_INFO_PGSIZES;
......
......@@ -16,6 +16,7 @@ struct page;
struct iommufd_ctx;
struct iommufd_access;
struct file;
struct iommu_group;
struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
struct device *dev, u32 *id);
......@@ -24,6 +25,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
void iommufd_device_detach(struct iommufd_device *idev);
struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
u32 iommufd_device_to_id(struct iommufd_device *idev);
struct iommufd_access_ops {
u8 needs_pin_pages : 1;
void (*unmap)(void *data, unsigned long iova, unsigned long length);
......@@ -44,12 +48,15 @@ iommufd_access_create(struct iommufd_ctx *ictx,
const struct iommufd_access_ops *ops, void *data, u32 *id);
void iommufd_access_destroy(struct iommufd_access *access);
int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
void iommufd_access_detach(struct iommufd_access *access);
void iommufd_ctx_get(struct iommufd_ctx *ictx);
#if IS_ENABLED(CONFIG_IOMMUFD)
struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
struct iommufd_ctx *iommufd_ctx_from_fd(int fd);
void iommufd_ctx_put(struct iommufd_ctx *ictx);
bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
unsigned long length, struct page **out_pages,
......
......@@ -1377,6 +1377,7 @@ static const struct vfio_device_ops mbochs_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
.detach_ioas = vfio_iommufd_emulated_detach_ioas,
};
static struct mdev_driver mbochs_driver = {
......
......@@ -666,6 +666,7 @@ static const struct vfio_device_ops mdpy_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
.detach_ioas = vfio_iommufd_emulated_detach_ioas,
};
static struct mdev_driver mdpy_driver = {
......