Commit bfed6efb authored by Linus Torvalds

Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Borislav Petkov:

 - Add support for handling hw errors in SGX pages: poisoning,
   recovering from poison memory and error injection into SGX pages

 - A bunch of changes to the SGX selftests to simplify them and allow
   testing of SGX features without the need for a whole SGX software stack

 - Add a sysfs attribute which is supposed to show the amount of SGX
   memory in a NUMA node, similar to what /proc/meminfo is to normal
   memory

 - The usual bunch of fixes and cleanups too

* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/sgx: Fix NULL pointer dereference on non-SGX systems
  selftests/sgx: Fix corrupted cpuid macro invocation
  x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
  x86/sgx: Fix minor documentation issues
  selftests/sgx: Add test for multiple TCS entry
  selftests/sgx: Enable multiple thread support
  selftests/sgx: Add page permission and exception test
  selftests/sgx: Rename test properties in preparation for more enclave tests
  selftests/sgx: Provide per-op parameter structs for the test enclave
  selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
  selftests/sgx: Move setup_test_encl() to each TEST_F()
  selftests/sgx: Encapsulate the test enclave creation
  selftests/sgx: Dump segments and /proc/self/maps only on failure
  selftests/sgx: Create a heap for the test enclave
  selftests/sgx: Make data measurement for an enclave segment optional
  selftests/sgx: Assign source for each segment
  selftests/sgx: Fix a benign linker warning
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Hook arch_memory_failure() into mainline code
  ...
parents d3c20bfb 2056e298
...@@ -176,3 +176,9 @@ Contact: Keith Busch <keith.busch@intel.com>
Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
Description:
The total amount of SGX physical memory in bytes.
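For example, on a node with SGX memory the new attribute reads like any other sysfs file (the byte count shown is illustrative)::

    $ cat /sys/devices/system/node/node0/x86/sgx_total_bytes
    4362076160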
...@@ -181,5 +181,24 @@ You should see something like this in dmesg::
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
Special notes for injection into SGX enclaves:
There may be a separate BIOS setup option to enable SGX injection.
The injection process consists of setting some special memory controller
trigger that will inject the error on the next write to the target
address. But the h/w prevents any software outside of an SGX enclave
from accessing enclave pages (even BIOS SMM mode).
The following sequence can be used (a debugfs sketch follows the list):
1) Determine physical address of enclave page
2) Use "notrigger=1" mode to inject (this will setup
the injection address, but will not actually inject)
3) Enter the enclave
4) Store data to the virtual address matching physical address from step 1
5) Execute CLFLUSH for that virtual address
6) Spin delay for 250ms
7) Read from the virtual address. This will trigger the error
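As a sketch, steps 1 and 2 might look as follows at the EINJ debugfs interface; the physical address below is a placeholder for the enclave page found in step 1::

    # cd /sys/kernel/debug/apei/einj
    # echo 0x10 > error_type             # memory uncorrectable non-fatal
    # echo 0x12345000 > param1           # physical address of the enclave page
    # echo 0xfffffffffffff000 > param2   # address mask
    # echo 1 > notrigger                 # arm the injection without triggering it
    # echo 1 > error_inject

Steps 3-7 must then be performed from inside the test enclave itself, since only enclave code can touch the target page.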
For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.
...@@ -10,7 +10,7 @@ Overview
Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:
* Privileged (ring-0) ENCLS functions orchestrate the construction of the
regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
execute inside the regions.
...@@ -91,7 +91,7 @@ In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process. Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device. Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.
.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
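As a minimal sketch of the first build step, assuming a 4 KiB SECS page prepared by the caller (error handling trimmed)::

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <asm/sgx.h>   /* SGX_IOC_ENCLAVE_CREATE, struct sgx_enclave_create */

    /* ECREATE an enclave from a caller-prepared SECS page. */
    int sgx_create(void *secs_page)
    {
            struct sgx_enclave_create create_ioc = {
                    .src = (unsigned long)secs_page,
            };
            int fd = open("/dev/sgx_enclave", O_RDWR);

            if (fd < 0)
                    return -1;

            return ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create_ioc);
    }

Data pages are then copied in with **SGX_IOC_ENCLAVE_ADD_PAGES**, documented below.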
...@@ -126,13 +126,13 @@ the need to juggle signal handlers.
ksgxd
=====
SGX support includes a kernel thread called *ksgxd*.
EPC sanitization
----------------
ksgxd is started when SGX initializes. Enclave memory is typically ready
for use when the processor powers on or resets. However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state. This might
occur after a crash and kexec() cycle, for instance. At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.
...@@ -147,7 +147,7 @@ Page reclaimer
Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory. If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.
Launch Control
==============
...@@ -156,7 +156,7 @@ SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave. Only after
this can the CPU execute inside the enclave.
The EINIT function takes an RSA-3072 signature of the enclave measurement. The function
checks that the measurement is correct and the signature is signed with the key
hashed to the four **IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs representing the
SHA256 of a public key.
...@@ -184,7 +184,7 @@ CPUs starting from Icelake use Total Memory Encryption (TME) in the place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay-attacks are not mitigated. However, it includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.
DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).
...
...@@ -1312,6 +1312,10 @@ config ARCH_HAS_PARANOID_L1D_FLUSH
config DYNAMIC_SIGFRAME
bool
# Select, if arch has a named attribute group bound to NUMA device nodes.
config HAVE_ARCH_NODE_DEV_GROUP
bool
source "kernel/gcov/Kconfig" source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig" source "scripts/gcc-plugins/Kconfig"
......
...@@ -269,6 +269,7 @@ config X86
select HAVE_ARCH_KCSAN if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
config INSTRUCTION_DECODER
...@@ -1921,6 +1922,7 @@ config X86_SGX
select SRCU
select MMU_NOTIFIER
select NUMA_KEEP_MEMINFO if NUMA
select XARRAY_MULTI
help
Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
that can be used by applications to set aside private regions of code
...
...@@ -855,4 +855,12 @@ enum mds_mitigations {
MDS_MITIGATION_VMWERV,
};
#ifdef CONFIG_X86_SGX
int arch_memory_failure(unsigned long pfn, int flags);
#define arch_memory_failure arch_memory_failure
bool arch_is_platform_page(u64 paddr);
#define arch_is_platform_page arch_is_platform_page
#endif
#endif /* _ASM_X86_PROCESSOR_H */
...@@ -2,6 +2,7 @@
#ifndef _ASM_X86_SET_MEMORY_H
#define _ASM_X86_SET_MEMORY_H
#include <linux/mm.h>
#include <asm/page.h>
#include <asm-generic/set_memory.h>
...@@ -99,6 +100,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
unsigned long decoy_addr;
int rc;
/* SGX pages are not in the 1:1 map */
if (arch_is_platform_page(pfn << PAGE_SHIFT))
return 0;
/*
* We would like to just call:
* set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
...
...@@ -6,11 +6,13 @@
#include <linux/highmem.h>
#include <linux/kthread.h>
#include <linux/miscdevice.h>
#include <linux/node.h>
#include <linux/pagemap.h>
#include <linux/ratelimit.h>
#include <linux/sched/mm.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <linux/sysfs.h>
#include <asm/sgx.h>
#include "driver.h"
#include "encl.h"
...@@ -20,6 +22,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
static int sgx_nr_epc_sections;
static struct task_struct *ksgxd_tsk;
static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
static DEFINE_XARRAY(sgx_epc_address_space);
/*
* These variables are part of the state of the reclaimer, and must be accessed
...@@ -60,6 +63,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
/*
* Checking page->poison without holding the node->lock
* is racy, but losing the race (i.e. poison is set just
* after the check) just means __eremove() will be uselessly
* called for a page that sgx_free_epc_page() will put onto
* the node->sgx_poison_page_list later.
*/
if (page->poison) {
struct sgx_epc_section *section = &sgx_epc_sections[page->section];
struct sgx_numa_node *node = section->node;
spin_lock(&node->lock);
list_move(&page->list, &node->sgx_poison_page_list);
spin_unlock(&node->lock);
continue;
}
ret = __eremove(sgx_get_epc_virt_addr(page));
if (!ret) {
/*
...@@ -471,6 +492,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
list_del_init(&page->list);
page->flags = 0;
spin_unlock(&node->lock);
atomic_long_dec(&sgx_nr_free_pages);
...@@ -624,7 +646,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
spin_lock(&node->lock);
page->owner = NULL;
if (page->poison)
list_add(&page->list, &node->sgx_poison_page_list);
else
list_add_tail(&page->list, &node->free_page_list);
page->flags = SGX_EPC_PAGE_IS_FREE;
spin_unlock(&node->lock);
atomic_long_inc(&sgx_nr_free_pages);
...@@ -648,17 +675,102 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
}
section->phys_addr = phys_addr;
xa_store_range(&sgx_epc_address_space, section->phys_addr,
phys_addr + size - 1, section, GFP_KERNEL);
for (i = 0; i < nr_pages; i++) {
section->pages[i].section = index;
section->pages[i].flags = 0;
section->pages[i].owner = NULL;
section->pages[i].poison = 0;
list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
}
return true;
}
bool arch_is_platform_page(u64 paddr)
{
return !!xa_load(&sgx_epc_address_space, paddr);
}
EXPORT_SYMBOL_GPL(arch_is_platform_page);
static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
{
struct sgx_epc_section *section;
section = xa_load(&sgx_epc_address_space, paddr);
if (!section)
return NULL;
return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
}
/*
* Called in process context to handle a hardware reported
* error in an SGX EPC page.
* If the MF_ACTION_REQUIRED bit is set in flags, then the
* context is the task that consumed the poison data. Otherwise
* this is called from a kernel thread unrelated to the page.
*/
int arch_memory_failure(unsigned long pfn, int flags)
{
struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
struct sgx_epc_section *section;
struct sgx_numa_node *node;
/*
* mm/memory-failure.c calls this routine for all errors
* where there isn't a "struct page" for the address. But that
* includes other address ranges besides SGX.
*/
if (!page)
return -ENXIO;
/*
* If poison was consumed synchronously, send a SIGBUS to
* the task. Hardware has already exited the SGX enclave and
* will not allow re-entry to an enclave that has a memory
* error. The signal may help the task understand why the
* enclave is broken.
*/
if (flags & MF_ACTION_REQUIRED)
force_sig(SIGBUS);
section = &sgx_epc_sections[page->section];
node = section->node;
spin_lock(&node->lock);
/* Already poisoned? Nothing more to do */
if (page->poison)
goto out;
page->poison = 1;
/*
* If the page is on a free list, move it to the per-node
* poison page list.
*/
if (page->flags & SGX_EPC_PAGE_IS_FREE) {
list_move(&page->list, &node->sgx_poison_page_list);
goto out;
}
/*
* TBD: Add additional plumbing to enable pre-emptive
* action for asynchronous poison notification. Until
* then just hope that the poison:
* a) is not accessed - sgx_free_epc_page() will deal with it
* when the user gives it back
* b) results in a recoverable machine check rather than
* a fatal one
*/
out:
spin_unlock(&node->lock);
return 0;
}
/**
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
...@@ -670,6 +782,48 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
((high & GENMASK_ULL(19, 0)) << 32);
}
#ifdef CONFIG_NUMA
static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
{
return sysfs_emit(buf, "%lu\n", sgx_numa_nodes[dev->id].size);
}
static DEVICE_ATTR_RO(sgx_total_bytes);
static umode_t arch_node_attr_is_visible(struct kobject *kobj,
struct attribute *attr, int idx)
{
/* Make all x86/ attributes invisible when SGX is not initialized: */
if (nodes_empty(sgx_numa_mask))
return 0;
return attr->mode;
}
static struct attribute *arch_node_dev_attrs[] = {
&dev_attr_sgx_total_bytes.attr,
NULL,
};
const struct attribute_group arch_node_dev_group = {
.name = "x86",
.attrs = arch_node_dev_attrs,
.is_visible = arch_node_attr_is_visible,
};
static void __init arch_update_sysfs_visibility(int nid)
{
struct node *node = node_devices[nid];
int ret;
ret = sysfs_update_group(&node->dev.kobj, &arch_node_dev_group);
if (ret)
pr_err("sysfs update failed (%d), files may be invisible", ret);
}
#else /* !CONFIG_NUMA */
static void __init arch_update_sysfs_visibility(int nid) {}
#endif
static bool __init sgx_page_cache_init(void)
{
u32 eax, ebx, ecx, edx, type;
...@@ -713,10 +867,16 @@ static bool __init sgx_page_cache_init(void)
if (!node_isset(nid, sgx_numa_mask)) {
spin_lock_init(&sgx_numa_nodes[nid].lock);
INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
node_set(nid, sgx_numa_mask);
sgx_numa_nodes[nid].size = 0;
/* Make SGX-specific node sysfs files visible: */
arch_update_sysfs_visibility(nid);
}
sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
sgx_numa_nodes[nid].size += size;
sgx_nr_epc_sections++;
}
...
...@@ -26,9 +26,13 @@
/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED BIT(0)
/* Pages on free list */
#define SGX_EPC_PAGE_IS_FREE BIT(1)
struct sgx_epc_page {
unsigned int section;
u16 flags;
u16 poison;
struct sgx_encl_page *owner;
struct list_head list;
};
...@@ -39,6 +43,8 @@ struct sgx_epc_page {
*/
struct sgx_numa_node {
struct list_head free_page_list;
struct list_head sgx_poison_page_list;
unsigned long size;
spinlock_t lock;
};
...
...@@ -545,7 +545,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
!= REGION_INTERSECTS) &&
(region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
!= REGION_INTERSECTS) &&
!arch_is_platform_page(base_addr)))
return -EINVAL;
inject:
...
...@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
return false;
pfn = PHYS_PFN(physical_addr);
if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
pr_warn_ratelimited(FW_WARN GHES_PFX
"Invalid address in generic error data: %#llx\n",
physical_addr);
...
...@@ -581,6 +581,9 @@ static const struct attribute_group node_dev_group = {
static const struct attribute_group *node_dev_groups[] = {
&node_dev_group,
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
&arch_node_dev_group,
#endif
NULL
};
...
...@@ -3231,6 +3231,19 @@ extern void shake_page(struct page *p);
extern atomic_long_t num_poisoned_pages __read_mostly;
extern int soft_offline_page(unsigned long pfn, int flags);
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
return -ENXIO;
}
#endif
#ifndef arch_is_platform_page
static inline bool arch_is_platform_page(u64 paddr)
{
return false;
}
#endif
/*
* Error handlers for various types of pages.
...
...@@ -58,4 +58,8 @@ static inline int phys_to_target_node(u64 start)
}
#endif
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
extern const struct attribute_group arch_node_dev_group;
#endif
#endif /* _LINUX_NUMA_H */
...@@ -1646,21 +1646,28 @@ int memory_failure(unsigned long pfn, int flags)
if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
mutex_lock(&mf_mutex);
p = pfn_to_online_page(pfn);
if (!p) {
res = arch_memory_failure(pfn, flags);
if (res == 0)
goto unlock_mutex;
if (pfn_valid(pfn)) {
pgmap = get_dev_pagemap(pfn, NULL);
if (pgmap) {
res = memory_failure_dev_pagemap(pfn, flags,
pgmap);
goto unlock_mutex;
}
}
pr_err("Memory failure: %#lx: memory outside kernel control\n",
pfn);
res = -ENXIO;
goto unlock_mutex;
}
try_again:
if (PageHuge(p)) {
res = memory_failure_hugetlb(pfn, flags);
...
...@@ -45,7 +45,7 @@ $(OUTPUT)/sign_key.o: sign_key.S
$(CC) $(HOST_CFLAGS) -c $< -o $@
$(OUTPUT)/test_encl.elf: test_encl.lds test_encl.c test_encl_bootstrap.S
$(CC) $(ENCL_CFLAGS) -T $^ -o $@ -Wl,--build-id=none
EXTRA_CLEAN := \
$(OUTPUT)/test_encl.elf \
...
...@@ -19,13 +19,38 @@
#include "../../../../arch/x86/include/uapi/asm/sgx.h"
enum encl_op_type {
ENCL_OP_PUT_TO_BUFFER,
ENCL_OP_GET_FROM_BUFFER,
ENCL_OP_PUT_TO_ADDRESS,
ENCL_OP_GET_FROM_ADDRESS,
ENCL_OP_NOP,
ENCL_OP_MAX,
}; };
struct encl_op_header {
uint64_t type;
};
struct encl_op_put_to_buf {
struct encl_op_header header;
uint64_t value;
};
struct encl_op_get_from_buf {
struct encl_op_header header;
uint64_t value;
};
struct encl_op_put_to_addr {
struct encl_op_header header;
uint64_t value;
uint64_t addr;
};
struct encl_op_get_from_addr {
struct encl_op_header header;
uint64_t value;
uint64_t addr;
};
#endif /* DEFINES_H */
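A host-side test might drive these per-op structures as in the sketch below; ENCL_CALL, EXPECT_EQ and the fixture's self->run come from the selftest harness in main.c, and the magic value is arbitrary::

    struct encl_op_put_to_buf put_op;
    struct encl_op_get_from_buf get_op;

    /* Write a value into the enclave buffer... */
    put_op.header.type = ENCL_OP_PUT_TO_BUFFER;
    put_op.value = 0x1122334455667788ULL;
    EXPECT_EQ(ENCL_CALL(&put_op, &self->run, true), 0);

    /* ...then read it back and compare. */
    get_op.header.type = ENCL_OP_GET_FROM_BUFFER;
    get_op.value = 0;
    EXPECT_EQ(ENCL_CALL(&get_op, &self->run, true), 0);
    EXPECT_EQ(get_op.value, put_op.value);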
...@@ -21,6 +21,8 @@
void encl_delete(struct encl *encl)
{
struct encl_segment *heap_seg = &encl->segment_tbl[encl->nr_segments - 1];
if (encl->encl_base)
munmap((void *)encl->encl_base, encl->encl_size);
...@@ -30,6 +32,8 @@ void encl_delete(struct encl *encl)
if (encl->fd)
close(encl->fd);
munmap(heap_seg->src, heap_seg->size);
if (encl->segment_tbl)
free(encl->segment_tbl);
...@@ -107,11 +111,14 @@ static bool encl_ioc_add_pages(struct encl *encl, struct encl_segment *seg)
memset(&secinfo, 0, sizeof(secinfo));
secinfo.flags = seg->flags;
ioc.src = (uint64_t)seg->src;
ioc.offset = seg->offset;
ioc.length = seg->size;
ioc.secinfo = (unsigned long)&secinfo;
if (seg->measure)
ioc.flags = SGX_PAGE_MEASURE;
else
ioc.flags = 0;
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ioc);
if (rc < 0) {
...@@ -122,11 +129,10 @@ static bool encl_ioc_add_pages(struct encl *encl, struct encl_segment *seg)
return true;
}
bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
{
const char device_path[] = "/dev/sgx_enclave";
struct encl_segment *seg;
Elf64_Phdr *phdr_tbl;
off_t src_offset;
Elf64_Ehdr *ehdr;
...@@ -178,6 +184,8 @@ bool encl_load(const char *path, struct encl *encl)
ehdr = encl->bin;
phdr_tbl = encl->bin + ehdr->e_phoff;
encl->nr_segments = 1; /* one for the heap */
for (i = 0; i < ehdr->e_phnum; i++) {
Elf64_Phdr *phdr = &phdr_tbl[i];
...@@ -193,7 +201,6 @@ bool encl_load(const char *path, struct encl *encl)
for (i = 0, j = 0; i < ehdr->e_phnum; i++) {
Elf64_Phdr *phdr = &phdr_tbl[i];
unsigned int flags = phdr->p_flags;
if (phdr->p_type != PT_LOAD)
continue;
...@@ -216,6 +223,7 @@
if (j == 0) {
src_offset = phdr->p_offset & PAGE_MASK;
encl->src = encl->bin + src_offset;
seg->prot = PROT_READ | PROT_WRITE;
seg->flags = SGX_PAGE_TYPE_TCS << 8;
...@@ -228,15 +236,27 @@
seg->offset = (phdr->p_offset & PAGE_MASK) - src_offset;
seg->size = (phdr->p_filesz + PAGE_SIZE - 1) & PAGE_MASK;
seg->src = encl->src + seg->offset;
seg->measure = true;
j++;
}
assert(j == encl->nr_segments - 1);
seg = &encl->segment_tbl[j];
seg->offset = encl->segment_tbl[j - 1].offset + encl->segment_tbl[j - 1].size;
seg->size = heap_size;
seg->src = mmap(NULL, heap_size, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
seg->prot = PROT_READ | PROT_WRITE;
seg->flags = (SGX_PAGE_TYPE_REG << 8) | seg->prot;
seg->measure = false;
if (seg->src == MAP_FAILED)
goto err;
encl->src_size = encl->segment_tbl[j].offset + encl->segment_tbl[j].size;
for (encl->encl_size = 4096; encl->encl_size < encl->src_size; )
encl->encl_size <<= 1;
...
...@@ -6,11 +6,15 @@
#ifndef MAIN_H
#define MAIN_H
#define ENCL_HEAP_SIZE_DEFAULT 4096
struct encl_segment {
void *src;
off_t offset;
size_t size;
unsigned int prot;
unsigned int flags;
bool measure;
};
struct encl {
...@@ -31,7 +35,7 @@ extern unsigned char sign_key[];
extern unsigned char sign_key_end[];
void encl_delete(struct encl *ctx);
bool encl_load(const char *path, struct encl *encl, unsigned long heap_size);
bool encl_measure(struct encl *encl);
bool encl_build(struct encl *encl);
...
...@@ -289,15 +289,17 @@ static bool mrenclave_eextend(EVP_MD_CTX *ctx, uint64_t offset,
static bool mrenclave_segment(EVP_MD_CTX *ctx, struct encl *encl,
struct encl_segment *seg)
{
uint64_t end = seg->size;
uint64_t offset;
for (offset = 0; offset < end; offset += PAGE_SIZE) {
if (!mrenclave_eadd(ctx, seg->offset + offset, seg->flags))
return false;
if (seg->measure) {
if (!mrenclave_eextend(ctx, seg->offset + offset, seg->src + offset))
return false;
}
}
return true;
...
...@@ -4,6 +4,11 @@
#include <stddef.h>
#include "defines.h"
/*
* Data buffer spanning two pages that will be placed first in the .data
* segment. Even if not used internally, the second page is needed by
* external tests manipulating page permissions.
*/
static uint8_t encl_buffer[8192] = { 1 };
static void *memcpy(void *dest, const void *src, size_t n)
...@@ -16,20 +21,51 @@ static void *memcpy(void *dest, const void *src, size_t n)
return dest;
}
static void do_encl_op_put_to_buf(void *op)
{
struct encl_op_put_to_buf *op2 = op;
memcpy(&encl_buffer[0], &op2->value, 8);
}
static void do_encl_op_get_from_buf(void *op)
{
struct encl_op_get_from_buf *op2 = op;
memcpy(&op2->value, &encl_buffer[0], 8);
}
static void do_encl_op_put_to_addr(void *_op)
{
struct encl_op_put_to_addr *op = _op;
memcpy((void *)op->addr, &op->value, 8);
}
static void do_encl_op_get_from_addr(void *_op)
{
struct encl_op_get_from_addr *op = _op;
memcpy(&op->value, (void *)op->addr, 8);
}
static void do_encl_op_nop(void *_op)
{
}
void encl_body(void *rdi, void *rsi)
{
const void (*encl_op_array[ENCL_OP_MAX])(void *) = {
do_encl_op_put_to_buf,
do_encl_op_get_from_buf,
do_encl_op_put_to_addr,
do_encl_op_get_from_addr,
do_encl_op_nop,
};
struct encl_op_header *op = (struct encl_op_header *)rdi;
if (op->type < ENCL_OP_MAX)
(*encl_op_array[op->type])(op);
}
...@@ -12,7 +12,7 @@
.fill 1, 8, 0 # STATE (set by CPU)
.fill 1, 8, 0 # FLAGS
.quad encl_ssa_tcs1 # OSSA
.fill 1, 4, 0 # CSSA (set by CPU)
.fill 1, 4, 1 # NSSA
.quad encl_entry # OENTRY
...@@ -23,10 +23,10 @@
.fill 1, 4, 0xFFFFFFFF # GSLIMIT
.fill 4024, 1, 0 # Reserved
# TCS2
.fill 1, 8, 0 # STATE (set by CPU)
.fill 1, 8, 0 # FLAGS
.quad encl_ssa_tcs2 # OSSA
.fill 1, 4, 0 # CSSA (set by CPU)
.fill 1, 4, 1 # NSSA
.quad encl_entry # OENTRY
...@@ -40,8 +40,9 @@
.text
encl_entry:
# RBX contains the base address for TCS, which is the first address
# inside the enclave for TCS #1 and one page into the enclave for
# TCS #2. By adding the value of encl_stack to it, we get
# the absolute address for the stack.
lea (encl_stack)(%rbx), %rax
xchg %rsp, %rax
...@@ -81,9 +82,15 @@ encl_entry:
.section ".data", "aw" .section ".data", "aw"
encl_ssa: encl_ssa_tcs1:
.space 4096
encl_ssa_tcs2:
.space 4096
.balign 4096
# Stack of TCS #1
.space 4096
encl_stack:
.balign 4096
# Stack of TCS #2
.space 4096