Commit 7fa95f9a authored by Nicholas Piggin, committed by Michael Ellerman

powerpc/64s: system call support for scv/rfscv instructions

Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that LR is not preserved, nor
are volatile CR registers, and error is not indicated with CR0[SO],
but by returning a negative errno.

rfscv is implemented to return from scv type system calls. It cannot
be used to return from sc system calls because those are defined to
preserve LR.

getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix ppc64e build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
parent b2dc2977
@@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI
 syscall
 =======

+Invocation
+----------
+The syscall is made with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the
+scv 0 instruction is an alternative that may provide better performance,
+with some differences to calling sequence.
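The feature test described in the added Invocation text can be sketched in C as follows. This is only an illustration: the helper name `hwcap2_has_scv` is hypothetical, and the bit value 0x00100000 for PPC_FEATURE2_SCV is assumed from the powerpc uapi `asm/cputable.h` header, which this diff does not show.

```c
#include <stdbool.h>

/* Assumed value; the authoritative definition is in the powerpc uapi header. */
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000UL
#endif

/* Hypothetical helper: true if an AT_HWCAP2 word advertises scv support. */
static bool hwcap2_has_scv(unsigned long hwcap2)
{
	return (hwcap2 & PPC_FEATURE2_SCV) != 0;
}
```

On Linux the argument would typically come from `getauxval(AT_HWCAP2)` (declared in `<sys/auxv.h>`); a caller would fall back to sc when the helper returns false.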
+
 syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI
 specification C function calling sequence, including register preservation
 rules, with the following differences.
@@ -12,16 +21,23 @@ rules, with the following differences.
 .. [1] Some syscalls (typically low-level management functions) may have
        different calling sequences (e.g., rt_sigreturn).

-Parameters and return value
----------------------------
+Parameters
+----------
 The system call number is specified in r0.

 There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.

-Both a return value and a return error code are returned. cr0.SO is the return
-error code, and r3 is the return value or error code. When cr0.SO is clear,
-the syscall succeeded and r3 is the return value. When cr0.SO is set, the
-syscall failed and r3 is the error code that generally corresponds to errno.
+Return value
+------------
+- For the sc instruction, both a value and an error condition are returned.
+  cr0.SO is the error condition, and r3 is the return value. When cr0.SO is
+  clear, the syscall succeeded and r3 is the return value. When cr0.SO is set,
+  the syscall failed and r3 is the error value (that normally corresponds to
+  errno).
+
+- For the scv 0 instruction, the return value indicates failure if it is
+  -4095..-1 (i.e., it is >= -MAX_ERRNO (-4095) as an unsigned comparison),
+  in which case the error value is the negated return value.
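The two return conventions described in the Return value text can be compared with a small userspace-style sketch. The struct and function names here are hypothetical and are not part of the patch; the register value r3 and the cr0.SO bit are passed in as plain arguments so the logic can be read without inline assembly.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095UL

/* Decoded syscall outcome; on failure, err holds a positive errno. */
struct sys_result {
	bool failed;
	long value;	/* meaningful when !failed */
	long err;	/* meaningful when failed */
};

/* sc convention: cr0.SO flags failure, r3 is the value or a positive errno. */
static struct sys_result decode_sc(long r3, bool cr0_so)
{
	struct sys_result res = { cr0_so, cr0_so ? 0 : r3, cr0_so ? r3 : 0 };
	return res;
}

/* scv 0 convention: r3 in -4095..-1 means failure and errno is -r3. */
static struct sys_result decode_scv(long r3)
{
	/* Unsigned comparison against -MAX_ERRNO, as the text describes. */
	bool failed = (unsigned long)r3 >= -MAX_ERRNO;
	struct sys_result res = { failed, failed ? 0 : r3, failed ? -r3 : 0 };
	return res;
}
```

The unsigned comparison treats only the top 4095 values of the register as errors, so a large negative result such as -4096 is still reported as success.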

 Stack
 -----
@@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling sequence with the
 following differences:

 =========== ============= ========================================
+--- For the sc instruction, differences with the ELF ABI ---
 r0          Volatile      (System call number.)
 r3          Volatile      (Parameter 1, and return value.)
 r4-r8       Volatile      (Parameters 2-6.)
-cr0         Volatile      (cr0.SO is the return error condition)
+cr0         Volatile      (cr0.SO is the return error condition.)
 cr1, cr5-7  Nonvolatile
 lr          Nonvolatile
+
+--- For the scv 0 instruction, differences with the ELF ABI ---
+r0          Volatile      (System call number.)
+r3          Volatile      (Parameter 1, and return value.)
+r4-r8       Volatile      (Parameters 2-6.)
 =========== ============= ========================================

 All floating point and vector data registers as well as control and status
 registers are nonvolatile.

-Invocation
-----------
-The syscall is performed with the sc instruction, and returns with execution
-continuing at the instruction following the sc instruction.
-
 Transactional Memory
 --------------------
 Syscall behavior can change if the processor is in transactional or suspended
@@ -75,6 +92,7 @@ auxiliary vector.
   returning to the caller. This case is not well defined or supported, so this
   behavior should not be relied upon.

+scv 0 syscalls will always behave as PPC_FEATURE2_HTM_NOSC.

 vsyscall
 ========
......
...@@ -98,7 +98,7 @@ unsigned long __init early_init(unsigned long dt_ptr); ...@@ -98,7 +98,7 @@ unsigned long __init early_init(unsigned long dt_ptr);
void __init machine_init(u64 dt_ptr); void __init machine_init(u64 dt_ptr);
#endif #endif
long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs); long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs); notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv);
notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr); notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr); notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
......
...@@ -123,6 +123,12 @@ ...@@ -123,6 +123,12 @@
hrfid; \ hrfid; \
b hrfi_flush_fallback b hrfi_flush_fallback
#define RFSCV_TO_USER \
STF_EXIT_BARRIER_SLOT; \
RFI_FLUSH_SLOT; \
RFSCV; \
b rfscv_flush_fallback
#endif /* __ASSEMBLY__ */ #endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_EXCEPTION_H */ #endif /* _ASM_POWERPC_EXCEPTION_H */
...@@ -128,7 +128,7 @@ end_##sname: ...@@ -128,7 +128,7 @@ end_##sname:
.if ((start) % (size) != 0); \ .if ((start) % (size) != 0); \
.error "Fixed section exception vector misalignment"; \ .error "Fixed section exception vector misalignment"; \
.endif; \ .endif; \
.if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100); \ .if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100) && ((size) != 0x1000); \
.error "Fixed section exception vector bad size"; \ .error "Fixed section exception vector bad size"; \
.endif; \ .endif; \
.if (start) < sname##_start; \ .if (start) < sname##_start; \
......
...@@ -755,6 +755,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96) ...@@ -755,6 +755,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
#define N_SLINE 68 #define N_SLINE 68
#define N_SO 100 #define N_SO 100
#define RFSCV .long 0x4c0000a4
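The raw encoding above can be sanity-checked by splitting the word into its opcode fields. The sketch below assumes rfscv is an XL-form instruction with primary opcode 19 and extended opcode 82, as given in the Power ISA v3.0; the helper names are hypothetical.

```c
#include <stdint.h>

#define RFSCV_WORD 0x4c0000a4u

/* Primary opcode: the top 6 bits of the instruction word. */
static unsigned int primary_opcode(uint32_t word)
{
	return word >> 26;
}

/* XL-form extended opcode: the 10 bits just above the lowest bit. */
static unsigned int xl_extended_opcode(uint32_t word)
{
	return (word >> 1) & 0x3ff;
}
```

The macro emits the word with `.long`, presumably so that the build does not depend on assembler support for the rfscv mnemonic.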
/* /*
* Create an endian fixup trampoline * Create an endian fixup trampoline
* *
......
@@ -222,9 +222,14 @@ static inline void set_trap(struct pt_regs *regs, unsigned long val)
 	regs->trap = (regs->trap & TRAP_FLAGS_MASK) | (val & ~TRAP_FLAGS_MASK);
 }

+static inline bool trap_is_scv(struct pt_regs *regs)
+{
+	return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x3000);
+}
+
 static inline bool trap_is_syscall(struct pt_regs *regs)
 {
-	return TRAP(regs) == 0xc00;
+	return (trap_is_scv(regs) || TRAP(regs) == 0xc00);
 }

 static inline bool trap_norestart(struct pt_regs *regs)
......
...@@ -30,12 +30,12 @@ void setup_panic(void); ...@@ -30,12 +30,12 @@ void setup_panic(void);
#define ARCH_PANIC_TIMEOUT 180 #define ARCH_PANIC_TIMEOUT 180
#ifdef CONFIG_PPC_PSERIES #ifdef CONFIG_PPC_PSERIES
extern void pseries_enable_reloc_on_exc(void); extern bool pseries_enable_reloc_on_exc(void);
extern void pseries_disable_reloc_on_exc(void); extern void pseries_disable_reloc_on_exc(void);
extern void pseries_big_endian_exceptions(void); extern void pseries_big_endian_exceptions(void);
extern void pseries_little_endian_exceptions(void); extern void pseries_little_endian_exceptions(void);
#else #else
static inline void pseries_enable_reloc_on_exc(void) {} static inline bool pseries_enable_reloc_on_exc(void) { return false; }
static inline void pseries_disable_reloc_on_exc(void) {} static inline void pseries_disable_reloc_on_exc(void) {}
static inline void pseries_big_endian_exceptions(void) {} static inline void pseries_big_endian_exceptions(void) {}
static inline void pseries_little_endian_exceptions(void) {} static inline void pseries_little_endian_exceptions(void) {}
......
...@@ -40,6 +40,7 @@ enum instruction_type { ...@@ -40,6 +40,7 @@ enum instruction_type {
CACHEOP, CACHEOP,
BARRIER, BARRIER,
SYSCALL, SYSCALL,
SYSCALL_VECTORED_0,
MFMSR, MFMSR,
MTMSR, MTMSR,
RFI, RFI,
......
...@@ -98,7 +98,7 @@ _GLOBAL(__setup_cpu_power10) ...@@ -98,7 +98,7 @@ _GLOBAL(__setup_cpu_power10)
_GLOBAL(__setup_cpu_power9) _GLOBAL(__setup_cpu_power9)
mflr r11 mflr r11
bl __init_FSCR bl __init_FSCR_power9
1: bl __init_PMU 1: bl __init_PMU
bl __init_hvmode_206 bl __init_hvmode_206
mtlr r11 mtlr r11
...@@ -128,7 +128,7 @@ _GLOBAL(__restore_cpu_power10) ...@@ -128,7 +128,7 @@ _GLOBAL(__restore_cpu_power10)
_GLOBAL(__restore_cpu_power9) _GLOBAL(__restore_cpu_power9)
mflr r11 mflr r11
bl __init_FSCR bl __init_FSCR_power9
1: bl __init_PMU 1: bl __init_PMU
mfmsr r3 mfmsr r3
rldicl. r0,r3,4,63 rldicl. r0,r3,4,63
...@@ -198,6 +198,12 @@ __init_FSCR_power10: ...@@ -198,6 +198,12 @@ __init_FSCR_power10:
mtspr SPRN_FSCR, r3 mtspr SPRN_FSCR, r3
// fall through // fall through
__init_FSCR_power9:
mfspr r3, SPRN_FSCR
ori r3, r3, FSCR_SCV
mtspr SPRN_FSCR, r3
// fall through
__init_FSCR: __init_FSCR:
mfspr r3,SPRN_FSCR mfspr r3,SPRN_FSCR
ori r3,r3,FSCR_TAR|FSCR_EBB ori r3,r3,FSCR_TAR|FSCR_EBB
......
...@@ -120,7 +120,8 @@ extern void __restore_cpu_e6500(void); ...@@ -120,7 +120,8 @@ extern void __restore_cpu_e6500(void);
#define COMMON_USER2_POWER9 (COMMON_USER2_POWER8 | \ #define COMMON_USER2_POWER9 (COMMON_USER2_POWER8 | \
PPC_FEATURE2_ARCH_3_00 | \ PPC_FEATURE2_ARCH_3_00 | \
PPC_FEATURE2_HAS_IEEE128 | \ PPC_FEATURE2_HAS_IEEE128 | \
PPC_FEATURE2_DARN ) PPC_FEATURE2_DARN | \
PPC_FEATURE2_SCV)
#define COMMON_USER_POWER10 COMMON_USER_POWER9 #define COMMON_USER_POWER10 COMMON_USER_POWER9
#define COMMON_USER2_POWER10 (COMMON_USER2_POWER9 | \ #define COMMON_USER2_POWER10 (COMMON_USER2_POWER9 | \
PPC_FEATURE2_ARCH_3_1 | \ PPC_FEATURE2_ARCH_3_1 | \
......
...@@ -587,6 +587,7 @@ static struct dt_cpu_feature_match __initdata ...@@ -587,6 +587,7 @@ static struct dt_cpu_feature_match __initdata
{"little-endian", feat_enable_le, CPU_FTR_REAL_LE}, {"little-endian", feat_enable_le, CPU_FTR_REAL_LE},
{"smt", feat_enable_smt, 0}, {"smt", feat_enable_smt, 0},
{"interrupt-facilities", feat_enable, 0}, {"interrupt-facilities", feat_enable, 0},
{"system-call-vectored", feat_enable, 0},
{"timer-facilities", feat_enable, 0}, {"timer-facilities", feat_enable, 0},
{"timer-facilities-v3", feat_enable, 0}, {"timer-facilities-v3", feat_enable, 0},
{"debug-facilities", feat_enable, 0}, {"debug-facilities", feat_enable, 0},
......
...@@ -64,15 +64,173 @@ exception_marker: ...@@ -64,15 +64,173 @@ exception_marker:
.section ".text" .section ".text"
.align 7 .align 7
#ifdef CONFIG_PPC_BOOK3S
.macro system_call_vectored name trapnr
.globl system_call_vectored_\name
system_call_vectored_\name:
_ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
BEGIN_FTR_SECTION
extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
bne .Ltabort_syscall
END_FTR_SECTION_IFSET(CPU_FTR_TM)
#endif
INTERRUPT_TO_KERNEL
mr r10,r1
ld r1,PACAKSAVE(r13)
std r10,0(r1)
std r11,_NIP(r1)
std r12,_MSR(r1)
std r0,GPR0(r1)
std r10,GPR1(r1)
std r2,GPR2(r1)
ld r2,PACATOC(r13)
mfcr r12
li r11,0
/* Can we avoid saving r3-r8 in common case? */
std r3,GPR3(r1)
std r4,GPR4(r1)
std r5,GPR5(r1)
std r6,GPR6(r1)
std r7,GPR7(r1)
std r8,GPR8(r1)
/* Zero r9-r12, this should only be required when restoring all GPRs */
std r11,GPR9(r1)
std r11,GPR10(r1)
std r11,GPR11(r1)
std r11,GPR12(r1)
std r9,GPR13(r1)
SAVE_NVGPRS(r1)
std r11,_XER(r1)
std r11,_LINK(r1)
std r11,_CTR(r1)
li r11,\trapnr
std r11,_TRAP(r1)
std r12,_CCR(r1)
std r3,ORIG_GPR3(r1)
addi r10,r1,STACK_FRAME_OVERHEAD
ld r11,exception_marker@toc(r2)
std r11,-16(r10) /* "regshere" marker */
/*
* RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
* would clobber syscall parameters. Also we always enter with IRQs
* enabled and nothing pending. system_call_exception() will call
* trace_hardirqs_off().
*
* scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS. The
* entry vector already sets PACAIRQSOFTMASK to IRQS_ALL_DISABLED.
*/
/* Calling convention has r9 = orig r0, r10 = regs */
mr r9,r0
bl system_call_exception
.Lsyscall_vectored_\name\()_exit:
addi r4,r1,STACK_FRAME_OVERHEAD
li r5,1 /* scv */
bl syscall_exit_prepare
ld r2,_CCR(r1)
ld r4,_NIP(r1)
ld r5,_MSR(r1)
BEGIN_FTR_SECTION
stdcx. r0,0,r1 /* to clear the reservation */
END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
BEGIN_FTR_SECTION
HMT_MEDIUM_LOW
END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
cmpdi r3,0
bne .Lsyscall_vectored_\name\()_restore_regs
/* rfscv returns with LR->NIA and CTR->MSR */
mtlr r4
mtctr r5
/* Could zero these as per ABI, but we may consider a stricter ABI
* which preserves these if libc implementations can benefit, so
* restore them for now until further measurement is done. */
ld r0,GPR0(r1)
ld r4,GPR4(r1)
ld r5,GPR5(r1)
ld r6,GPR6(r1)
ld r7,GPR7(r1)
ld r8,GPR8(r1)
/* Zero volatile regs that may contain sensitive kernel data */
li r9,0
li r10,0
li r11,0
li r12,0
mtspr SPRN_XER,r0
/*
* We don't need to restore AMR on the way back to userspace for KUAP.
* The value of AMR only matters while we're in the kernel.
*/
mtcr r2
ld r2,GPR2(r1)
ld r3,GPR3(r1)
ld r13,GPR13(r1)
ld r1,GPR1(r1)
RFSCV_TO_USER
b . /* prevent speculative execution */
.Lsyscall_vectored_\name\()_restore_regs:
li r3,0
mtmsrd r3,1
mtspr SPRN_SRR0,r4
mtspr SPRN_SRR1,r5
ld r3,_CTR(r1)
ld r4,_LINK(r1)
ld r5,_XER(r1)
REST_NVGPRS(r1)
ld r0,GPR0(r1)
mtcr r2
mtctr r3
mtlr r4
mtspr SPRN_XER,r5
REST_10GPRS(2, r1)
REST_2GPRS(12, r1)
ld r1,GPR1(r1)
RFI_TO_USER
.endm
system_call_vectored common 0x3000
/*
* We instantiate another entry copy for the SIGILL variant, with TRAP=0x7ff0
* which is tested by system_call_exception when r0 is -1 (as set by vector
* entry code).
*/
system_call_vectored sigill 0x7ff0
/*
* Entered via kernel return set up by kernel/sstep.c, must match entry regs
*/
.globl system_call_vectored_emulate
system_call_vectored_emulate:
_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
li r10,IRQS_ALL_DISABLED
stb r10,PACAIRQSOFTMASK(r13)
b system_call_vectored_common
#endif
.balign IFETCH_ALIGN_BYTES
.globl system_call_common .globl system_call_common
system_call_common: system_call_common:
_ASM_NOKPROBE_SYMBOL(system_call_common)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
BEGIN_FTR_SECTION BEGIN_FTR_SECTION
extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */ extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
bne .Ltabort_syscall bne .Ltabort_syscall
END_FTR_SECTION_IFSET(CPU_FTR_TM) END_FTR_SECTION_IFSET(CPU_FTR_TM)
#endif #endif
_ASM_NOKPROBE_SYMBOL(system_call_common)
mr r10,r1 mr r10,r1
ld r1,PACAKSAVE(r13) ld r1,PACAKSAVE(r13)
std r10,0(r1) std r10,0(r1)
...@@ -138,6 +296,7 @@ END_BTB_FLUSH_SECTION ...@@ -138,6 +296,7 @@ END_BTB_FLUSH_SECTION
.Lsyscall_exit: .Lsyscall_exit:
addi r4,r1,STACK_FRAME_OVERHEAD addi r4,r1,STACK_FRAME_OVERHEAD
li r5,0 /* !scv */
bl syscall_exit_prepare bl syscall_exit_prepare
ld r2,_CCR(r1) ld r2,_CCR(r1)
...@@ -224,10 +383,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ...@@ -224,10 +383,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
b . /* prevent speculative execution */ b . /* prevent speculative execution */
#endif #endif
#ifdef CONFIG_PPC_BOOK3S
_GLOBAL(ret_from_fork_scv)
bl schedule_tail
REST_NVGPRS(r1)
li r3,0 /* fork() return value */
b .Lsyscall_vectored_common_exit
#endif
_GLOBAL(ret_from_fork) _GLOBAL(ret_from_fork)
bl schedule_tail bl schedule_tail
REST_NVGPRS(r1) REST_NVGPRS(r1)
li r3,0 li r3,0 /* fork() return value */
b .Lsyscall_exit b .Lsyscall_exit
_GLOBAL(ret_from_kernel_thread) _GLOBAL(ret_from_kernel_thread)
......
...@@ -756,6 +756,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) ...@@ -756,6 +756,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
* guarantee they will be delivered virtually. Some conditions (see the ISA) * guarantee they will be delivered virtually. Some conditions (see the ISA)
* cause exceptions to be delivered in real mode. * cause exceptions to be delivered in real mode.
* *
* The scv instructions are a special case. They get a 0x3000 offset applied.
* scv exceptions have unique reentrancy properties, see below.
*
* It's impossible to receive interrupts below 0x300 via AIL. * It's impossible to receive interrupts below 0x300 via AIL.
* *
* KVM: None of the virtual exceptions are from the guest. Anything that * KVM: None of the virtual exceptions are from the guest. Anything that
...@@ -765,8 +768,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) ...@@ -765,8 +768,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
* We layout physical memory as follows: * We layout physical memory as follows:
* 0x0000 - 0x00ff : Secondary processor spin code * 0x0000 - 0x00ff : Secondary processor spin code
* 0x0100 - 0x18ff : Real mode pSeries interrupt vectors * 0x0100 - 0x18ff : Real mode pSeries interrupt vectors
* 0x1900 - 0x3fff : Real mode trampolines * 0x1900 - 0x2fff : Real mode trampolines
* 0x4000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors * 0x3000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors
* 0x5900 - 0x6fff : Relon mode trampolines * 0x5900 - 0x6fff : Relon mode trampolines
* 0x7000 - 0x7fff : FWNMI data area * 0x7000 - 0x7fff : FWNMI data area
* 0x8000 - .... : Common interrupt handlers, remaining early * 0x8000 - .... : Common interrupt handlers, remaining early
...@@ -777,8 +780,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) ...@@ -777,8 +780,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
* vectors there. * vectors there.
*/ */
OPEN_FIXED_SECTION(real_vectors, 0x0100, 0x1900) OPEN_FIXED_SECTION(real_vectors, 0x0100, 0x1900)
OPEN_FIXED_SECTION(real_trampolines, 0x1900, 0x4000) OPEN_FIXED_SECTION(real_trampolines, 0x1900, 0x3000)
OPEN_FIXED_SECTION(virt_vectors, 0x4000, 0x5900) OPEN_FIXED_SECTION(virt_vectors, 0x3000, 0x5900)
OPEN_FIXED_SECTION(virt_trampolines, 0x5900, 0x7000) OPEN_FIXED_SECTION(virt_trampolines, 0x5900, 0x7000)
#ifdef CONFIG_PPC_POWERNV #ifdef CONFIG_PPC_POWERNV
...@@ -814,6 +817,77 @@ USE_FIXED_SECTION(real_vectors) ...@@ -814,6 +817,77 @@ USE_FIXED_SECTION(real_vectors)
.globl __start_interrupts .globl __start_interrupts
__start_interrupts: __start_interrupts:
/**
* Interrupt 0x3000 - System Call Vectored Interrupt (syscall).
* This is a synchronous interrupt invoked with the "scv" instruction. The
* system call does not alter the HV bit, so it is directed to the OS.
*
* Handling:
* scv instructions enter the kernel without changing EE, RI, ME, or HV.
* In particular, this means we can take a maskable interrupt at any point
* in the scv handler, which is unlike any other interrupt. This is solved
* by treating the instruction addresses below __end_interrupts as being
* soft-masked.
*
* AIL-0 mode scv exceptions go to 0x17000-0x17fff, but we set AIL-3 and
* ensure scv is never executed with relocation off, which means AIL-0
* should never happen.
*
* Before leaving the text below __end_interrupts, at least one of the
* following must be true:
* - MSR[PR]=1 (i.e., return to userspace)
* - MSR_EE|MSR_RI is set (no reentrant exceptions)
* - Standard kernel environment is set up (stack, paca, etc)
*
* Call convention:
*
* syscall register convention is in Documentation/powerpc/syscall64-abi.rst
*/
EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000)
/* SCV 0 */
mr r9,r13
GET_PACA(r13)
mflr r11
mfctr r12
li r10,IRQS_ALL_DISABLED
stb r10,PACAIRQSOFTMASK(r13)
#ifdef CONFIG_RELOCATABLE
b system_call_vectored_tramp
#else
b system_call_vectored_common
#endif
nop
/* SCV 1 - 127 */
.rept 127
mr r9,r13
GET_PACA(r13)
mflr r11
mfctr r12
li r10,IRQS_ALL_DISABLED
stb r10,PACAIRQSOFTMASK(r13)
li r0,-1 /* cause failure */
#ifdef CONFIG_RELOCATABLE
b system_call_vectored_sigill_tramp
#else
b system_call_vectored_sigill
#endif
.endr
EXC_VIRT_END(system_call_vectored, 0x3000, 0x1000)
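The layout of the vector block above (one 8-instruction stub for scv 0 followed by 127 identical stubs, filling 0x1000 bytes from 0x3000) implies a simple mapping from the scv LEV operand to an entry offset. The following sketch states that arithmetic; the helper name is hypothetical, and the assumption that the hardware enters at base + LEV * 0x20 is taken from the Power ISA description of scv rather than from this diff.

```c
#define SCV_VIRT_BASE	0x3000u	/* start of the system_call_vectored block */
#define SCV_ENTRY_SIZE	0x20u	/* 8 instructions of 4 bytes per stub */
#define SCV_NR_VECTORS	128u	/* scv 0 plus the 127 SIGILL stubs */

/* Hypothetical helper: offset of the stub for "scv lev", or -1 if out of range. */
static long scv_entry_offset(unsigned int lev)
{
	if (lev >= SCV_NR_VECTORS)
		return -1;
	return (long)(SCV_VIRT_BASE + lev * SCV_ENTRY_SIZE);
}
```

With these values the last stub starts at 0x3fe0 and the block ends exactly at 0x4000, where the existing virtual vectors begin.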
#ifdef CONFIG_RELOCATABLE
TRAMP_VIRT_BEGIN(system_call_vectored_tramp)
__LOAD_HANDLER(r10, system_call_vectored_common)
mtctr r10
bctr
TRAMP_VIRT_BEGIN(system_call_vectored_sigill_tramp)
__LOAD_HANDLER(r10, system_call_vectored_sigill)
mtctr r10
bctr
#endif
/* No virt vectors corresponding with 0x0..0x100 */ /* No virt vectors corresponding with 0x0..0x100 */
EXC_VIRT_NONE(0x4000, 0x100) EXC_VIRT_NONE(0x4000, 0x100)
...@@ -2963,6 +3037,47 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback) ...@@ -2963,6 +3037,47 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
GET_SCRATCH0(r13); GET_SCRATCH0(r13);
hrfid hrfid
TRAMP_REAL_BEGIN(rfscv_flush_fallback)
/* system call volatile */
mr r7,r13
GET_PACA(r13);
mr r8,r1
ld r1,PACAKSAVE(r13)
mfctr r9
ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
ld r11,PACA_L1D_FLUSH_SIZE(r13)
srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
mtctr r11
DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
/* order ld/st prior to dcbt stop all streams with flushing */
sync
/*
* The load addresses are at staggered offsets within cachelines,
* which suits some pipelines better (on others it should not
* hurt).
*/
1:
ld r11,(0x80 + 8)*0(r10)
ld r11,(0x80 + 8)*1(r10)
ld r11,(0x80 + 8)*2(r10)
ld r11,(0x80 + 8)*3(r10)
ld r11,(0x80 + 8)*4(r10)
ld r11,(0x80 + 8)*5(r10)
ld r11,(0x80 + 8)*6(r10)
ld r11,(0x80 + 8)*7(r10)
addi r10,r10,0x80*8
bdnz 1b
mtctr r9
li r9,0
li r10,0
li r11,0
mr r1,r8
mr r13,r7
RFSCV
USE_TEXT_SECTION() USE_TEXT_SECTION()
MASKED_INTERRUPT MASKED_INTERRUPT
MASKED_INTERRUPT hsrr=1 MASKED_INTERRUPT hsrr=1
......
...@@ -1599,6 +1599,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, ...@@ -1599,6 +1599,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
{ {
struct pt_regs *childregs, *kregs; struct pt_regs *childregs, *kregs;
extern void ret_from_fork(void); extern void ret_from_fork(void);
extern void ret_from_fork_scv(void);
extern void ret_from_kernel_thread(void); extern void ret_from_kernel_thread(void);
void (*f)(void); void (*f)(void);
unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE; unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE;
...@@ -1635,7 +1636,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, ...@@ -1635,7 +1636,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
if (usp) if (usp)
childregs->gpr[1] = usp; childregs->gpr[1] = usp;
p->thread.regs = childregs; p->thread.regs = childregs;
childregs->gpr[3] = 0; /* Result from fork() */ /* 64s sets this in ret_from_fork */
if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
childregs->gpr[3] = 0; /* Result from fork() */
if (clone_flags & CLONE_SETTLS) { if (clone_flags & CLONE_SETTLS) {
if (!is_32bit_task()) if (!is_32bit_task())
childregs->gpr[13] = tls; childregs->gpr[13] = tls;
...@@ -1643,7 +1646,10 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, ...@@ -1643,7 +1646,10 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
childregs->gpr[2] = tls; childregs->gpr[2] = tls;
} }
f = ret_from_fork; if (trap_is_scv(regs))
f = ret_from_fork_scv;
else
f = ret_from_fork;
} }
childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX); childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
sp -= STACK_FRAME_OVERHEAD; sp -= STACK_FRAME_OVERHEAD;
......
...@@ -196,7 +196,10 @@ static void __init configure_exceptions(void) ...@@ -196,7 +196,10 @@ static void __init configure_exceptions(void)
/* Under a PAPR hypervisor, we need hypercalls */ /* Under a PAPR hypervisor, we need hypercalls */
if (firmware_has_feature(FW_FEATURE_SET_MODE)) { if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
/* Enable AIL if possible */ /* Enable AIL if possible */
pseries_enable_reloc_on_exc(); if (!pseries_enable_reloc_on_exc()) {
init_task.thread.fscr &= ~FSCR_SCV;
cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
}
/* /*
* Tell the hypervisor that we want our exceptions to * Tell the hypervisor that we want our exceptions to
......
@@ -205,8 +205,14 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
 		return;

 	/* error signalled ? */
-	if (!(regs->ccr & 0x10000000))
+	if (trap_is_scv(regs)) {
+		/* 32-bit compat mode sign extend? */
+		if (!IS_ERR_VALUE(ret))
+			return;
+		ret = -ret;
+	} else if (!(regs->ccr & 0x10000000)) {
 		return;
+	}

 	switch (ret) {
 	case ERESTART_RESTARTBLOCK:
@@ -239,9 +245,14 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
 			regs->nip -= 4;
 			regs->result = 0;
 		} else {
-			regs->result = -EINTR;
-			regs->gpr[3] = EINTR;
-			regs->ccr |= 0x10000000;
+			if (trap_is_scv(regs)) {
+				regs->result = -EINTR;
+				regs->gpr[3] = -EINTR;
+			} else {
+				regs->result = -EINTR;
+				regs->gpr[3] = EINTR;
+				regs->ccr |= 0x10000000;
+			}
 		}
 	}
......
...@@ -60,6 +60,11 @@ notrace long system_call_exception(long r3, long r4, long r5, ...@@ -60,6 +60,11 @@ notrace long system_call_exception(long r3, long r4, long r5,
local_irq_enable(); local_irq_enable();
if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) { if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
if (unlikely(regs->trap == 0x7ff0)) {
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return regs->gpr[3];
}
/* /*
* We use the return value of do_syscall_trace_enter() as the * We use the return value of do_syscall_trace_enter() as the
* syscall number. If the syscall was rejected for any reason * syscall number. If the syscall was rejected for any reason
...@@ -78,6 +83,11 @@ notrace long system_call_exception(long r3, long r4, long r5, ...@@ -78,6 +83,11 @@ notrace long system_call_exception(long r3, long r4, long r5,
r8 = regs->gpr[8]; r8 = regs->gpr[8];
} else if (unlikely(r0 >= NR_syscalls)) { } else if (unlikely(r0 >= NR_syscalls)) {
if (unlikely(regs->trap == 0x7ff0)) {
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return regs->gpr[3];
}
return -ENOSYS; return -ENOSYS;
} }
...@@ -105,16 +115,20 @@ notrace long system_call_exception(long r3, long r4, long r5, ...@@ -105,16 +115,20 @@ notrace long system_call_exception(long r3, long r4, long r5,
* local irqs must be disabled. Returns false if the caller must re-enable * local irqs must be disabled. Returns false if the caller must re-enable
* them, check for new work, and try again. * them, check for new work, and try again.
*/ */
static notrace inline bool prep_irq_for_enabled_exit(void) static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri)
{ {
/* This must be done with RI=1 because tracing may touch vmaps */ /* This must be done with RI=1 because tracing may touch vmaps */
trace_hardirqs_on(); trace_hardirqs_on();
/* This pattern matches prep_irq_for_idle */ /* This pattern matches prep_irq_for_idle */
__hard_EE_RI_disable(); if (clear_ri)
__hard_EE_RI_disable();
else
__hard_irq_disable();
if (unlikely(lazy_irq_pending_nocheck())) { if (unlikely(lazy_irq_pending_nocheck())) {
/* Took an interrupt, may have more exit work to do. */ /* Took an interrupt, may have more exit work to do. */
__hard_RI_enable(); if (clear_ri)
__hard_RI_enable();
trace_hardirqs_off(); trace_hardirqs_off();
local_paca->irq_happened |= PACA_IRQ_HARD_DIS; local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
...@@ -136,7 +150,8 @@ static notrace inline bool prep_irq_for_enabled_exit(void) ...@@ -136,7 +150,8 @@ static notrace inline bool prep_irq_for_enabled_exit(void)
* because RI=0 and soft mask state is "unreconciled", so it is marked notrace. * because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
*/ */
notrace unsigned long syscall_exit_prepare(unsigned long r3, notrace unsigned long syscall_exit_prepare(unsigned long r3,
struct pt_regs *regs) struct pt_regs *regs,
long scv)
{ {
unsigned long *ti_flagsp = &current_thread_info()->flags; unsigned long *ti_flagsp = &current_thread_info()->flags;
unsigned long ti_flags; unsigned long ti_flags;
...@@ -151,7 +166,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, ...@@ -151,7 +166,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
ti_flags = *ti_flagsp; ti_flags = *ti_flagsp;
if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) { if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) { if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
r3 = -r3; r3 = -r3;
regs->ccr |= 0x10000000; /* Set SO bit in CR */ regs->ccr |= 0x10000000; /* Set SO bit in CR */
...@@ -211,7 +226,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, ...@@ -211,7 +226,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
} }
} }
if (unlikely(!prep_irq_for_enabled_exit())) { /* scv need not set RI=0 because SRRs are not used */
if (unlikely(!prep_irq_for_enabled_exit(!scv))) {
local_irq_enable(); local_irq_enable();
goto again; goto again;
} }
...@@ -282,7 +298,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned ...@@ -282,7 +298,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
} }
} }
if (unlikely(!prep_irq_for_enabled_exit())) { if (unlikely(!prep_irq_for_enabled_exit(true))) {
local_irq_enable(); local_irq_enable();
local_irq_disable(); local_irq_disable();
goto again; goto again;
...@@ -345,7 +361,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign ...@@ -345,7 +361,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
} }
} }
if (unlikely(!prep_irq_for_enabled_exit())) { if (unlikely(!prep_irq_for_enabled_exit(true))) {
/* /*
* Can't local_irq_restore to replay if we were in * Can't local_irq_restore to replay if we were in
* interrupt context. Must replay directly. * interrupt context. Must replay directly.
......
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
#include <asm/disassemble.h> #include <asm/disassemble.h>
extern char system_call_common[]; extern char system_call_common[];
extern char system_call_vectored_emulate[];
#ifdef CONFIG_PPC64 #ifdef CONFIG_PPC64
/* Bits in SRR1 that are copied from MSR */ /* Bits in SRR1 that are copied from MSR */
...@@ -1236,6 +1237,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, ...@@ -1236,6 +1237,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
case 17: /* sc */ case 17: /* sc */
if ((word & 0xfe2) == 2) if ((word & 0xfe2) == 2)
op->type = SYSCALL; op->type = SYSCALL;
else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
(word & 0xfe3) == 1)
op->type = SYSCALL_VECTORED_0;
else else
op->type = UNKNOWN; op->type = UNKNOWN;
return 0; return 0;
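The mask tests in the hunk above distinguish sc from scv 0 within primary opcode 17. A standalone sketch of the same classification is shown below; the enum and function names are hypothetical, and the kernel's IS_ENABLED(CONFIG_PPC_BOOK3S_64) guard is left out so the code can be compiled anywhere.

```c
#include <stdint.h>

enum sc_kind { SC_UNKNOWN, SC_SYSCALL, SC_SYSCALL_VECTORED_0 };

/* Classify an instruction word using the same masks as analyse_instr(). */
static enum sc_kind classify_opcode17(uint32_t word)
{
	if ((word >> 26) != 17)
		return SC_UNKNOWN;
	if ((word & 0xfe2) == 2)	/* sc with LEV = 0 */
		return SC_SYSCALL;
	if ((word & 0xfe3) == 1)	/* scv with LEV = 0 */
		return SC_SYSCALL_VECTORED_0;
	return SC_UNKNOWN;
}
```

For example, 0x44000002 (sc) and 0x44000001 (scv 0) are recognised, while 0x44000021 (scv 1) falls through to the unknown case because its LEV field is nonzero.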
...@@ -3378,6 +3382,18 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr) ...@@ -3378,6 +3382,18 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr)
regs->msr = MSR_KERNEL; regs->msr = MSR_KERNEL;
return 1; return 1;
#ifdef CONFIG_PPC_BOOK3S_64
case SYSCALL_VECTORED_0: /* scv 0 */
regs->gpr[9] = regs->gpr[13];
regs->gpr[10] = MSR_KERNEL;
regs->gpr[11] = regs->nip + 4;
regs->gpr[12] = regs->msr & MSR_MASK;
regs->gpr[13] = (unsigned long) get_paca();
regs->nip = (unsigned long) &system_call_vectored_emulate;
regs->msr = MSR_KERNEL;
return 1;
#endif
case RFI: case RFI:
return -1; return -1;
#endif #endif
......
...@@ -358,7 +358,7 @@ static void pseries_lpar_idle(void) ...@@ -358,7 +358,7 @@ static void pseries_lpar_idle(void)
* to ever be a problem in practice we can move this into a kernel thread to * to ever be a problem in practice we can move this into a kernel thread to
* finish off the process later in boot. * finish off the process later in boot.
*/ */
void pseries_enable_reloc_on_exc(void) bool pseries_enable_reloc_on_exc(void)
{ {
long rc; long rc;
unsigned int delay, total_delay = 0; unsigned int delay, total_delay = 0;
...@@ -369,11 +369,13 @@ void pseries_enable_reloc_on_exc(void) ...@@ -369,11 +369,13 @@ void pseries_enable_reloc_on_exc(void)
if (rc == H_P2) { if (rc == H_P2) {
pr_info("Relocation on exceptions not" pr_info("Relocation on exceptions not"
" supported\n"); " supported\n");
return false;
} else if (rc != H_SUCCESS) { } else if (rc != H_SUCCESS) {
pr_warn("Unable to enable relocation" pr_warn("Unable to enable relocation"
" on exceptions: %ld\n", rc); " on exceptions: %ld\n", rc);
return false;
} }
break; return true;
} }
delay = get_longbusy_msecs(rc); delay = get_longbusy_msecs(rc);
...@@ -382,7 +384,7 @@ void pseries_enable_reloc_on_exc(void) ...@@ -382,7 +384,7 @@ void pseries_enable_reloc_on_exc(void)
pr_warn("Warning: Giving up waiting to enable " pr_warn("Warning: Giving up waiting to enable "
"relocation on exceptions (%u msec)!\n", "relocation on exceptions (%u msec)!\n",
total_delay); total_delay);
return; return false;
} }
mdelay(delay); mdelay(delay);
......
...@@ -1593,6 +1593,7 @@ const char *getvecname(unsigned long vec) ...@@ -1593,6 +1593,7 @@ const char *getvecname(unsigned long vec)
case 0x1300: ret = "(Instruction Breakpoint)"; break; case 0x1300: ret = "(Instruction Breakpoint)"; break;
case 0x1500: ret = "(Denormalisation)"; break; case 0x1500: ret = "(Denormalisation)"; break;
case 0x1700: ret = "(Altivec Assist)"; break; case 0x1700: ret = "(Altivec Assist)"; break;
case 0x3000: ret = "(System Call Vectored)"; break;
default: ret = ""; default: ret = "";
} }
return ret; return ret;
......