Commit 47c8846a authored by Vidya Sagar, committed by Bjorn Helgaas

PCI: Extend ACS configurability

PCIe ACS settings control the level of isolation and the possible P2P paths
between devices. With greater isolation the kernel will create smaller
iommu_groups and with less isolation there is more HW that can achieve P2P
transfers. From a virtualization perspective all devices in the same
iommu_group must be assigned to the same VM as they lack security
isolation.

There is no way for the kernel to automatically know the correct ACS
settings for any given system and workload. Existing command line options
(e.g., disable_acs_redir) allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.

Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to set up the right
level of isolation to achieve the desired P2P configuration.  The
definition is future proof; when new ACS bits are added to the spec the
open syntax can be extended.

ACS needs to be set up early in the kernel boot as the ACS settings affect
how iommu_groups are formed. iommu_group formation is a one time event
during initial device discovery, so changing ACS bits after kernel boot can
result in an inaccurate view of the iommu_groups compared to the current
isolation configuration.

ACS applies to PCIe Downstream Ports and multi-function devices.  The
default ACS settings are strict and deny any direct traffic between two
functions. This results in the smallest iommu_group the HW can support.
Frequently these values result in slow or non-working P2PDMA.

ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:

  - Full prevention

  - Translated requests can be direct, with various options

  - Asymmetric direct traffic, A can reach B but not the reverse

  - All traffic can be direct

Along with some other less common ones for special topologies.

The intention is that this option would be used with expert knowledge of
the HW capability and workload to achieve the desired configuration.

Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[bhelgaas: add example, tidy printk formats]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
parent 524e057b
@@ -4619,6 +4619,38 @@
 				bridges without forcing it upstream. Note:
 				this removes isolation between devices and
 				may put more devices in an IOMMU group.
+		config_acs=
+				Format:
+				<ACS flags>@<pci_dev>[; ...]
+				Specify one or more PCI devices (in the format
+				specified above) optionally prepended with flags
+				and separated by semicolons. The respective
+				capabilities will be enabled, disabled or
+				unchanged based on what is specified in
+				flags.
+				ACS Flags is defined as follows:
+				  bit-0 : ACS Source Validation
+				  bit-1 : ACS Translation Blocking
+				  bit-2 : ACS P2P Request Redirect
+				  bit-3 : ACS P2P Completion Redirect
+				  bit-4 : ACS Upstream Forwarding
+				  bit-5 : ACS P2P Egress Control
+				  bit-6 : ACS Direct Translated P2P
+				Each bit can be marked as:
+				  '0' – force disabled
+				  '1' – force enabled
+				  'x' – unchanged
+				For example,
+				  pci=config_acs=10x
+				would configure all devices that support
+				ACS to enable P2P Request Redirect, disable
+				Translation Blocking, and leave Source
+				Validation unchanged from whatever power-up
+				or firmware set it to.
+				Note: this may remove isolation between devices
+				and may put more devices in an IOMMU group.
 		force_floating	[S390] Force usage of floating interrupts.
 		nomio		[S390] Do not use MIO instructions.
 		norid		[S390] ignore the RID field and force use of
......
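As an illustration of the flag syntax documented in the hunk above, here is a minimal user-space sketch in C. It is not the kernel code: the helper name parse_acs_flags and the ACS_NUM_BITS constant are hypothetical and exist only for this example. It reads the flag characters right to left, so the rightmost character maps to bit-0, and produces a mask of the bits to force plus the values to force them to. It assumes the caller has already split off the part before '@'.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant for this sketch: the documentation lists bit-0..bit-6. */
#define ACS_NUM_BITS 7

/*
 * Hypothetical helper (not a kernel function): parse a flag string such as
 * "10x" into a mask of bits to force and the values to force them to.
 * The rightmost character corresponds to bit-0.
 * Returns 0 on success, -1 on an empty, overlong, or malformed string.
 */
static int parse_acs_flags(const char *s, uint16_t *mask, uint16_t *flags)
{
	size_t len = strlen(s);
	uint16_t m = 0, f = 0;

	if (len == 0 || len > ACS_NUM_BITS)
		return -1;

	for (size_t i = 0; i < len; i++) {
		uint16_t bit = (uint16_t)(1u << i);

		switch (s[len - 1 - i]) {
		case '0':		/* force disabled */
			m |= bit;
			break;
		case '1':		/* force enabled */
			m |= bit;
			f |= bit;
			break;
		case 'x':		/* unchanged */
		case 'X':
			break;
		default:
			return -1;
		}
	}

	*mask = m;
	*flags = f;
	return 0;
}
```

Under these assumptions, the string 10x yields mask 0x06 and flags 0x04: Source Validation is left alone, Translation Blocking is forced off, and P2P Request Redirect is forced on.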
@@ -946,30 +946,67 @@ void pci_request_acs(void)
 }
 static const char *disable_acs_redir_param;
+static const char *config_acs_param;
-/**
- * pci_disable_acs_redir - disable ACS redirect capabilities
- * @dev: the PCI device
- *
- * For only devices specified in the disable_acs_redir parameter.
- */
-static void pci_disable_acs_redir(struct pci_dev *dev)
+struct pci_acs {
+	u16 cap;
+	u16 ctrl;
+	u16 fw_ctrl;
+};
+static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
+			     const char *p, u16 mask, u16 flags)
 {
+	char *delimit;
 	int ret = 0;
-	const char *p;
-	int pos;
-	u16 ctrl;
-	if (!disable_acs_redir_param)
+	if (!p)
 		return;
-	p = disable_acs_redir_param;
 	while (*p) {
+		if (!mask) {
+			/* Check for ACS flags */
+			delimit = strstr(p, "@");
+			if (delimit) {
+				int end;
+				u32 shift = 0;
+				end = delimit - p - 1;
+				while (end > -1) {
+					if (*(p + end) == '0') {
+						mask |= 1 << shift;
+						shift++;
+						end--;
+					} else if (*(p + end) == '1') {
+						mask |= 1 << shift;
+						flags |= 1 << shift;
+						shift++;
+						end--;
+					} else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
+						shift++;
+						end--;
+					} else {
+						pci_err(dev, "Invalid ACS flags... Ignoring\n");
+						return;
+					}
+				}
+				p = delimit + 1;
+			} else {
+				pci_err(dev, "ACS Flags missing\n");
+				return;
+			}
+		}
+		if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
+			    PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
+			pci_err(dev, "Invalid ACS flags specified\n");
+			return;
+		}
 		ret = pci_dev_str_match(dev, p, &p);
 		if (ret < 0) {
-			pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
-				     disable_acs_redir_param);
+			pr_info_once("PCI: Can't parse ACS command line parameter\n");
 			break;
 		} else if (ret == 1) {
 			/* Found a match */
@@ -989,56 +1026,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
 	if (!pci_dev_specific_disable_acs_redir(dev))
 		return;
-	pos = dev->acs_cap;
-	if (!pos) {
-		pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
-		return;
-	}
-	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
-	/* P2P Request & Completion Redirect */
-	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
-	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
-	pci_info(dev, "disabled ACS redirect\n");
+	pci_dbg(dev, "ACS mask = %#06x\n", mask);
+	pci_dbg(dev, "ACS flags = %#06x\n", flags);
+	/* If mask is 0 then we copy the bit from the firmware setting. */
+	caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
+	caps->ctrl |= flags;
+	pci_info(dev, "Configured ACS to %#06x\n", caps->ctrl);
 }
 /**
  * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
  * @dev: the PCI device
+ * @caps: default ACS controls
  */
-static void pci_std_enable_acs(struct pci_dev *dev)
+static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
 {
-	int pos;
-	u16 cap;
-	u16 ctrl;
-	pos = dev->acs_cap;
-	if (!pos)
-		return;
-	pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
-	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
 	/* Source Validation */
-	ctrl |= (cap & PCI_ACS_SV);
+	caps->ctrl |= (caps->cap & PCI_ACS_SV);
 	/* P2P Request Redirect */
-	ctrl |= (cap & PCI_ACS_RR);
+	caps->ctrl |= (caps->cap & PCI_ACS_RR);
 	/* P2P Completion Redirect */
-	ctrl |= (cap & PCI_ACS_CR);
+	caps->ctrl |= (caps->cap & PCI_ACS_CR);
 	/* Upstream Forwarding */
-	ctrl |= (cap & PCI_ACS_UF);
+	caps->ctrl |= (caps->cap & PCI_ACS_UF);
 	/* Enable Translation Blocking for external devices and noats */
 	if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
-		ctrl |= (cap & PCI_ACS_TB);
+		caps->ctrl |= (caps->cap & PCI_ACS_TB);
-	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
 }
 /**
@@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
  */
 static void pci_enable_acs(struct pci_dev *dev)
 {
-	if (!pci_acs_enable)
-		goto disable_acs_redir;
-	if (!pci_dev_specific_enable_acs(dev))
-		goto disable_acs_redir;
-	pci_std_enable_acs(dev);
-disable_acs_redir:
+	struct pci_acs caps;
+	int pos;
+	pos = dev->acs_cap;
+	if (!pos)
+		return;
+	pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
+	caps.fw_ctrl = caps.ctrl;
+	/* If an iommu is present we start with kernel default caps */
+	if (pci_acs_enable) {
+		if (pci_dev_specific_enable_acs(dev))
+			pci_std_enable_acs(dev, &caps);
+	}
 	/*
-	 * Note: pci_disable_acs_redir() must be called even if ACS was not
-	 * enabled by the kernel because it may have been enabled by
-	 * platform firmware. So if we are told to disable it, we should
-	 * always disable it after setting the kernel's default
-	 * preferences.
+	 * Always apply caps from the command line, even if there is no iommu.
+	 * Trust that the admin has a reason to change the ACS settings.
 	 */
-	pci_disable_acs_redir(dev);
+	__pci_config_acs(dev, &caps, disable_acs_redir_param,
+			 PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
+			 ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
+	__pci_config_acs(dev, &caps, config_acs_param, 0, 0);
+	pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
 }
 /**
@@ -6840,6 +6869,8 @@ static int __init pci_setup(char *str)
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
 			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
 				disable_acs_redir_param = str + 18;
+			} else if (!strncmp(str, "config_acs=", 11)) {
+				config_acs_param = str + 11;
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
@@ -6864,6 +6895,7 @@ static int __init pci_realloc_setup_params(void)
 	resource_alignment_param = kstrdup(resource_alignment_param,
 					   GFP_KERNEL);
 	disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
+	config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
 	return 0;
 }
......
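To make the documented '0' / '1' / 'x' semantics concrete, the following user-space sketch in C shows one way a mask and flags pair could be combined with the control value left by power-up or firmware. It illustrates the behaviour described in the documentation hunk and is not a copy of the expression in the __pci_config_acs() hunk above. The function name apply_acs_flags is hypothetical, and treating "unchanged" as "take the firmware value" is an assumption based on the wording of the documentation example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function).
 *
 * fw_ctrl: ACS control value as left by power-up or firmware
 * mask:    bits the operator marked '0' or '1' (bits to force)
 * flags:   values for the forced bits ('1' -> set, '0' -> clear)
 *
 * Bits outside the mask keep their firmware value (assumption for this
 * sketch); bits inside the mask take their value from flags.
 */
static uint16_t apply_acs_flags(uint16_t fw_ctrl, uint16_t mask, uint16_t flags)
{
	return (uint16_t)((fw_ctrl & ~mask) | (flags & mask));
}
```

For example, with a firmware value of 0x0003 (Source Validation and Translation Blocking set), mask 0x06 and flags 0x04, the result is 0x0005: Source Validation is untouched, Translation Blocking is cleared, and P2P Request Redirect is set.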