Commit 50d5299d authored by David Mosberger

ia64: Light-weight system call support (aka, "fsyscalls"). This does not (yet)

	accelerate normal system calls, but it puts the infrastructure in place
	and lets you write fsyscall-handlers to your hearts content.  A null system-
	call (such as getpid()) can now run in as little as 35 cycles!
parent ef222347
-*-Mode: outline-*-

		Light-weight System Calls for IA-64
		-----------------------------------

			 Started: 13-Jan-2002
		     Last update: 14-Jan-2002

		      David Mosberger-Tang
		       <davidm@hpl.hp.com>
Using the "epc" instruction effectively introduces a new mode of
execution to the ia64 linux kernel. We call this mode the
"fsys-mode". To recap, the normal states of execution are:
- kernel mode:
Both the register stack and the kernel stack have been
switched over to the kernel stack. The user-level state
is saved in a pt-regs structure at the top of the kernel
memory stack.
- user mode:
Both the register stack and the kernel stack are in
user land. The user-level state is contained in the
CPU registers.
- bank 0 interruption-handling mode:
This is the non-interruptible state in that all
interruption-handlers start executing in. The user-level
state remains in the CPU registers and some kernel state may
be stored in bank 0 of registers r16-r31.

Fsys-mode has the following special properties:

  - execution is at privilege level 0 (most-privileged)

  - CPU registers may contain a mixture of user-level and kernel-level
    state (it is the responsibility of the kernel to ensure that no
    security-sensitive kernel-level state is leaked back to
    user-level)

  - execution is interruptible and preemptible (an fsys-mode handler
    can disable interrupts and avoid all other interruption-sources
    to avoid preemption)

  - neither the memory nor the register stack can be trusted while
    in fsys-mode (they point to the user-level stacks, which may
    be invalid)

In summary, fsys-mode is much more similar to running in user-mode
than it is to running in kernel-mode.  Of course, given that the
privilege level is at level 0, this means that fsys-mode requires some
care (see below).

* How to tell fsys-mode

Linux operates in fsys-mode when (a) the privilege level is 0 (most
privileged) and (b) the stacks have NOT been switched to kernel memory
yet.

For convenience, the header file <asm-ia64/ptrace.h> provides three
macros:

	user_mode(regs)
	user_stack(regs)
	fsys_mode(regs)

The "regs" argument is a pointer to a pt_regs structure.  user_mode()
returns TRUE if the CPU state pointed to by "regs" was executing in
user mode (privilege level 3).  user_stack() returns TRUE if the state
pointed to by "regs" was executing on the user-level stack(s).
Finally, fsys_mode() returns TRUE if the CPU state pointed to by
"regs" was executing in fsys-mode.  The fsys_mode() macro corresponds
exactly to the expression:

	!user_mode(regs) && user_stack(regs)
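
For reference, the macro definitions this patch adds to
<asm-ia64/ptrace.h> (they appear again in the diff further below)
amount to the following:

	/* as added to <asm-ia64/ptrace.h> by this patch */
	#define user_mode(regs)    (((struct ia64_psr *) &(regs)->cr_ipsr)->cpl != 0)
	#define user_stack(regs)   (current->thread.on_ustack != 0)
	#define fsys_mode(regs)    (!user_mode(regs) && user_stack(regs))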

* How to write an fsyscall handler

The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
(fsyscall_table).  This table contains one entry for each system call.
By default, a system call is handled by fsys_fallback_syscall().  This
routine takes care of entering (full) kernel mode and calling the
normal Linux system call handler.  For performance-critical system
calls, it is possible to write a hand-tuned fsyscall handler.  For
example, fsys.S contains fsys_getpid(), which is a hand-tuned version
of the getpid() system call.

The entry and exit state of an fsyscall handler is as follows:

** Machine state on entry to fsyscall handler:

  - r11      = saved ar.pfs (a user-level value)
  - r15      = system call number
  - r16      = "current" task pointer (in normal kernel-mode, this is in r13)
  - r32-r39  = system call arguments
  - b6       = return address (a user-level value)
  - ar.pfs   = previous frame-state (a user-level value)
  - PSR.be   = cleared to zero (i.e., little-endian byte order is in effect)
  - all other registers may contain values passed in from user-mode

** Required machine state on exit from fsyscall handler:

  - r11      = saved ar.pfs (as passed into the fsyscall handler)
  - r15      = system call number (as passed into the fsyscall handler)
  - r32-r39  = system call arguments (as passed into the fsyscall handler)
  - b6       = return address (as passed into the fsyscall handler)
  - ar.pfs   = previous frame-state (as passed into the fsyscall handler)

Fsyscall handlers can execute with very little overhead, but with that
speed comes a set of restrictions (a rough C-level sketch of the
resulting control flow follows this list):

  o Fsyscall-handlers MUST check for any pending work in the flags
    member of the thread-info structure and, if any of the
    TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
    doing a full system call (by calling fsys_fallback_syscall).

  o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
    r15, b6, and ar.pfs) because they will be needed in case of a
    system call restart.  Of course, all "preserved" registers also
    must be preserved, in accordance with the normal calling
    conventions.

  o Fsyscall-handlers MUST check argument registers for NaT values
    before using them in any way that could trigger a
    NaT-consumption fault.  If a system call argument is found to
    contain a NaT value, an fsyscall-handler may return immediately
    with r8=EINVAL, r10=-1.

  o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
    any other operation that would trigger mandatory RSE
    (register-stack engine) traffic.

  o Fsyscall-handlers MUST NOT write to any stacked registers because
    it is not safe to assume that user-level called a handler with the
    proper number of arguments.

  o Fsyscall-handlers need to be careful when accessing per-CPU variables:
    unless proper safeguards are taken (e.g., interruptions are avoided),
    execution may be preempted and resumed on another CPU at any given
    time.

  o Fsyscall-handlers must be careful not to leak sensitive kernel
    information back to user-level.  In particular, before returning to
    user-level, care needs to be taken to clear any scratch registers
    that could contain sensitive information (note that the current
    task pointer is not considered sensitive: it's already exposed
    through ar.k6).
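
To make the control flow concrete, here is a rough C-level sketch of
what a hand-tuned handler such as fsys_getpid() boils down to.  This
is purely illustrative: the real handlers are hand-written ia64
assembly in arch/ia64/kernel/fsys.S, and several of the rules above
(the NaT checks, avoiding "alloc" and stacked registers) have no C
equivalent.  The helper name fallback() is made up; the real code
branches to fsys_fallback_syscall.

	extern long fallback(void);	/* stand-in for branching to fsys_fallback_syscall */

	long fsys_getpid_sketch(void)
	{
		/* any pending work forces the full system call path */
		if (current_thread_info()->flags & TIF_ALLWORK_MASK)
			return fallback();

		/* getpid() returns the thread group id (cf. IA64_TASK_TGID_OFFSET below) */
		return current->tgid;
	}
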
The above restrictions may seem draconian, but remember that it's
possible to trade off some of the restrictions by paying a slightly
higher overhead. For example, if an fsyscall-handler could benefit
from the shadow register bank, it could temporarily disable PSR.i and
PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
needed. In other words, following the above rules yields extremely
fast system call execution (while fully preserving system call
semantics), but there is also a lot of flexibility in handling more
complicated cases.

* PSR Handling

The "epc" instruction doesn't change the contents of PSR at all.  This
is in contrast to a regular interruption, which clears almost all
bits.  Because of that, some care needs to be taken to ensure things
work as expected.  The following discussion describes how each PSR bit
is handled.

PSR.be	Cleared when entering fsys-mode.  A srlz.d instruction is used
	to ensure the CPU is in little-endian mode before the first
	load/store instruction is executed.  PSR.be is normally NOT
	restored upon return from an fsys-mode handler.  In other
	words, user-level code must not rely on PSR.be being preserved
	across a system call.
PSR.up	Unchanged.
PSR.ac	Unchanged.
PSR.mfl	Unchanged.  Note: fsys-mode handlers must not write FP registers!
PSR.mfh	Unchanged.  Note: fsys-mode handlers must not write FP registers!
PSR.ic	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
PSR.i	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
PSR.pk	Unchanged.
PSR.dt	Unchanged.
PSR.dfl	Unchanged.  Note: fsys-mode handlers must not write FP registers!
PSR.dfh	Unchanged.  Note: fsys-mode handlers must not write FP registers!
PSR.sp	Unchanged.
PSR.pp	Unchanged.
PSR.di	Unchanged.
PSR.si	Unchanged.
PSR.db	Unchanged.  The kernel prevents user-level from setting a hardware
	breakpoint that triggers at any privilege level other than 3
	(user-mode).
PSR.lp	Unchanged.
PSR.tb	Lazy redirect.  If a taken-branch trap occurs while in
	fsys-mode, the trap-handler modifies the saved machine state
	such that execution resumes in the gate page at
	syscall_via_break(), with privilege level 3.  Note: the
	taken branch would occur on the branch invoking the
	fsyscall-handler, at which point, by definition, a syscall
	restart is still safe.  If the system call number is invalid,
	the fsys-mode handler will return directly to user-level.  This
	return will trigger a taken-branch trap, but since the trap is
	taken _after_ restoring the privilege level, the CPU has already
	left fsys-mode, so no special treatment is needed.
PSR.rt	Unchanged.
PSR.cpl	Cleared to 0.
PSR.is	Unchanged (guaranteed to be 0 on entry to the gate page).
PSR.mc	Unchanged.
PSR.it	Unchanged (guaranteed to be 1).
PSR.id	Unchanged.  Note: the ia64 linux kernel never sets this bit.
PSR.da	Unchanged.  Note: the ia64 linux kernel never sets this bit.
PSR.dd	Unchanged.  Note: the ia64 linux kernel never sets this bit.
PSR.ss	Lazy redirect.  If set, "epc" will cause a Single Step Trap to
	be taken.  The trap handler then modifies the saved machine
	state such that execution resumes in the gate page at
	syscall_via_break(), with privilege level 3.
PSR.ri	Unchanged.
PSR.ed	Unchanged.  Note: This bit could only have an effect if an fsys-mode
	handler performed a speculative load that gets NaTted.  If so, this
	would be the normal & expected behavior, so no special treatment is
	needed.
PSR.bn	Unchanged.  Note: fsys-mode handlers may clear the bit, if needed.
	Doing so requires clearing PSR.i and PSR.ic as well.
PSR.ia	Unchanged.  Note: the ia64 linux kernel never sets this bit.
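
The "lazy redirect" mentioned for PSR.tb and PSR.ss above is
implemented by the ia64_fault() change included in this patch; in
essence it does the following (excerpted from the traps.c hunk further
below):

	if (fsys_mode(regs)) {
		/* re-do the system call via break 0x100000, at privilege level 3: */
		regs->cr_iip = GATE_ADDR + (syscall_via_break - __start_gate_section);
		ia64_psr(regs)->ri = 0;
		ia64_psr(regs)->cpl = 3;
		return;
	}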

@@ -806,6 +806,9 @@ source "arch/ia64/hp/sim/Kconfig"
 menu "Kernel hacking"
 
+config FSYS
+	bool "Light-weight system-call support (via epc)"
+
 choice
 	prompt "Physical memory granularity"
 	default IA64_GRANULE_64MB

@@ -12,6 +12,7 @@ obj-y := acpi.o entry.o gate.o efi.o efi_stub.o ia64_ksyms.o \
 	semaphore.o setup.o \
 	signal.o sys_ia64.o traps.o time.o unaligned.o unwind.o
 
+obj-$(CONFIG_FSYS)		+= fsys.o
 obj-$(CONFIG_IOSAPIC)		+= iosapic.o
 obj-$(CONFIG_IA64_PALINFO)	+= palinfo.o
 obj-$(CONFIG_EFI_VARS)		+= efivars.o

@@ -3,7 +3,7 @@
  *
  * Kernel entry points.
  *
- * Copyright (C) 1998-2002 Hewlett-Packard Co
+ * Copyright (C) 1998-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  * Copyright (C) 1999 VA Linux Systems
  * Copyright (C) 1999 Walt Drummond <drummond@valinux.com>

@@ -22,8 +22,8 @@
 /*
  * Global (preserved) predicate usage on syscall entry/exit path:
  *
- *	pKern:		See entry.h.
- *	pUser:		See entry.h.
+ *	pKStk:		See entry.h.
+ *	pUStk:		See entry.h.
  *	pSys:		See entry.h.
  *	pNonSys:	!pSys
  */

@@ -63,7 +63,7 @@ ENTRY(ia64_execve)
 	sxt4 r8=r8			// return 64-bit result
 	;;
 	stf.spill [sp]=f0
-(p6)	cmp.ne pKern,pUser=r0,r0	// a successful execve() lands us in user-mode...
+(p6)	cmp.ne pKStk,pUStk=r0,r0	// a successful execve() lands us in user-mode...
 	mov rp=loc0
 (p6)	mov ar.pfs=r0			// clear ar.pfs on success
 (p7)	br.ret.sptk.many rp

@@ -193,7 +193,7 @@ GLOBAL_ENTRY(ia64_switch_to)
 	;;
 (p6)	srlz.d
 	ld8 sp=[r21]			// load kernel stack pointer of new task
-	mov IA64_KR(CURRENT)=r20	// update "current" application register
+	mov IA64_KR(CURRENT)=in0	// update "current" application register
 	mov r8=r13			// return pointer to previously running task
 	mov r13=in0			// set "current" pointer
 	;;

@@ -569,11 +569,12 @@ END(ia64_ret_from_syscall)
 	// fall through
 GLOBAL_ENTRY(ia64_leave_kernel)
 	PT_REGS_UNWIND_INFO(0)
-	// work.need_resched etc. mustn't get changed by this CPU before it returns to userspace:
-(pUser)	cmp.eq.unc p6,p0=r0,r0		// p6 <- pUser
-(pUser)	rsm psr.i
+	// work.need_resched etc. mustn't get changed by this CPU before it returns to
+	// user- or fsys-mode:
+(pUStk)	cmp.eq.unc p6,p0=r0,r0		// p6 <- pUStk
+(pUStk)	rsm psr.i
 	;;
-(pUser)	adds r17=TI_FLAGS+IA64_TASK_SIZE,r13
+(pUStk)	adds r17=TI_FLAGS+IA64_TASK_SIZE,r13
 	;;
 .work_processed:
 (p6)	ld4 r18=[r17]			// load current_thread_info()->flags

@@ -635,9 +636,9 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	;;
 	srlz.i				// ensure interruption collection is off
 	mov b7=r15
-	;;
-	bsw.0				// switch back to bank 0
+	bsw.0				// switch back to bank 0 (no stop bit required beforehand...)
 	;;
+(pUStk)	mov r18=IA64_KR(CURRENT)	// Itanium 2: 12 cycle read latency
 	adds r16=16,r12
 	adds r17=24,r12
 	;;

@@ -665,16 +666,21 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	;;
 	ld8.fill r12=[r16],16
 	ld8.fill r13=[r17],16
+(pUStk)	adds r18=IA64_TASK_THREAD_ON_USTACK_OFFSET,r18
 	;;
 	ld8.fill r14=[r16]
 	ld8.fill r15=[r17]
+(pUStk)	mov r17=1
+	;;
+(pUStk)	st1 [r18]=r17			// restore current->thread.on_ustack
 	shr.u r18=r19,16		// get byte size of existing "dirty" partition
 	;;
 	mov r16=ar.bsp			// get existing backing store pointer
 	movl r17=THIS_CPU(ia64_phys_stacked_size_p8)
 	;;
 	ld4 r17=[r17]			// r17 = cpu_data->phys_stacked_size_p8
-(pKern)	br.cond.dpnt skip_rbs_switch
+(pKStk)	br.cond.dpnt skip_rbs_switch
 	/*
 	 * Restore user backing store.
 	 *

@@ -788,12 +794,12 @@ rse_clear_invalid:
 skip_rbs_switch:
 	mov b6=rB6
 	mov ar.pfs=rARPFS
-(pUser)	mov ar.bspstore=rARBSPSTORE
+(pUStk)	mov ar.bspstore=rARBSPSTORE
 (p9)	mov cr.ifs=rCRIFS
 	mov cr.ipsr=rCRIPSR
 	mov cr.iip=rCRIIP
 	;;
-(pUser)	mov ar.rnat=rARRNAT		// must happen with RSE in lazy mode
+(pUStk)	mov ar.rnat=rARRNAT		// must happen with RSE in lazy mode
 	mov ar.rsc=rARRSC
 	mov ar.unat=rARUNAT
 	mov pr=rARPR,-1

@@ -4,8 +4,8 @@
  * Preserved registers that are shared between code in ivt.S and entry.S.  Be
  * careful not to step on these!
  */
-#define pKern		p2	/* will leave_kernel return to kernel-mode? */
-#define pUser		p3	/* will leave_kernel return to user-mode? */
+#define pKStk		p2	/* will leave_kernel return to kernel-stacks? */
+#define pUStk		p3	/* will leave_kernel return to user-stacks? */
 #define pSys		p4	/* are we processing a (synchronous) system call? */
 #define pNonSys		p5	/* complement of pSys */

@@ -2,7 +2,7 @@
  * This file contains the code that gets mapped at the upper end of each task's text
  * region.  For now, it contains the signal trampoline code only.
  *
- * Copyright (C) 1999-2002 Hewlett-Packard Co
+ * Copyright (C) 1999-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  */

@@ -14,6 +14,85 @@
 #include <asm/page.h>
 
 	.section .text.gate, "ax"
+
+.start_gate:
+
+#if CONFIG_FSYS
+
+#include <asm/errno.h>
+
+/*
+ * On entry:
+ *	r11 = saved ar.pfs
+ *	r15 = system call #
+ *	b0  = saved return address
+ *	b6  = return address
+ * On exit:
+ *	r11 = saved ar.pfs
+ *	r15 = system call #
+ *	b0  = saved return address
+ *	all other "scratch" registers:	undefined
+ *	all "preserved" registers:	same as on entry
+ */
+
+GLOBAL_ENTRY(syscall_via_epc)
+	.prologue
+	.altrp b6
+	.body
+	{
+	/*
+	 * Note: the kernel cannot assume that the first two instructions in this
+	 *	 bundle get executed.  The remaining code must be safe even if
+	 *	 they do not get executed.
+	 */
+	adds r17=-1024,r15
+	mov r10=0			// default to successful syscall execution
+	epc
+	}
+	;;
+	rsm psr.be
+	movl r18=fsyscall_table
+	mov r16=IA64_KR(CURRENT)
+	mov r19=255
+	;;
+	shladd r18=r17,3,r18
+	cmp.geu p6,p0=r19,r17		// (syscall > 0 && syscall <= 1024+255)?
+	;;
+	srlz.d				// ensure little-endian byteorder is in effect
+(p6)	ld8 r18=[r18]
+	;;
+(p6)	mov b7=r18
+(p6)	br.sptk.many b7
+
+	mov r10=-1
+	mov r8=ENOSYS
+	br.ret.sptk.many b6
+END(syscall_via_epc)
+
+GLOBAL_ENTRY(syscall_via_break)
+	.prologue
+	.altrp b6
+	.body
+	break 0x100000
+	br.ret.sptk.many b6
+END(syscall_via_break)
+
+GLOBAL_ENTRY(fsys_fallback_syscall)
+	/*
+	 * It would be better/fsyser to do the SAVE_MIN magic directly here, but for now
+	 * we simply fall back on doing a system-call via break.  Good enough
+	 * to get started.  (Note: we have to do this through the gate page again, since
+	 * the br.ret will switch us back to user-level privilege.)
+	 *
+	 * XXX Move this back to fsys.S after changing it over to avoid break 0x100000.
+	 */
+	movl r2=(syscall_via_break - .start_gate) + GATE_ADDR
+	;;
+	mov b7=r2
+	br.ret.sptk.many b7
+END(fsys_fallback_syscall)
+
+#endif /* CONFIG_FSYS */
+
 # define ARG0_OFF	(16 + IA64_SIGFRAME_ARG0_OFFSET)
 # define ARG1_OFF	(16 + IA64_SIGFRAME_ARG1_OFFSET)
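
In C terms, the dispatch performed by syscall_via_epc above is roughly
the following; the type and function names here are illustrative only
(the real table lives in arch/ia64/kernel/fsys.S and its entries
default to fsys_fallback_syscall):

	#include <asm/errno.h>			/* for ENOSYS, as in the gate code above */

	typedef long (*fsyscall_t)(void);	/* illustrative signature */
	extern fsyscall_t fsyscall_table[256];	/* indexed by (syscall # - 1024) */

	long epc_dispatch_sketch(unsigned long num /* r15 */)
	{
		unsigned long index = num - 1024;	/* adds r17=-1024,r15 */

		if (index <= 255)			/* cmp.geu p6,p0=r19,r17: unsigned, so
							   numbers below 1024 wrap and fail */
			return fsyscall_table[index]();	/* shladd, ld8, mov b7, br.sptk.many b7 */

		return -ENOSYS;		/* the assembly sets r8=ENOSYS, r10=-1 and returns via b6 */
	}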

@@ -5,7 +5,7 @@
  * to set up the kernel's global pointer and jump to the kernel
  * entry point.
  *
- * Copyright (C) 1998-2001 Hewlett-Packard Co
+ * Copyright (C) 1998-2001, 2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *	Stephane Eranian <eranian@hpl.hp.com>
  * Copyright (C) 1999 VA Linux Systems

@@ -143,17 +143,14 @@ start_ap:
 	movl r2=init_thread_union
 	cmp.eq isBP,isAP=r0,r0
 #endif
-	;;
-	extr r3=r2,0,61			// r3 == phys addr of task struct
 	mov r16=KERNEL_TR_PAGE_NUM
 	;;
 	// load the "current" pointer (r13) and ar.k6 with the current task
-	mov r13=r2
-	mov IA64_KR(CURRENT)=r3		// Physical address
+	mov IA64_KR(CURRENT)=r2		// virtual address
 	// initialize k4 to a safe value (64-128MB is mapped by TR_KERNEL)
 	mov IA64_KR(CURRENT_STACK)=r16
+	mov r13=r2
 	/*
 	 * Reserve space at the top of the stack for "struct pt_regs".  Kernel threads
 	 * don't store interesting values in that structure, but the space still needs

@@ -30,25 +30,23 @@
  * on interrupts.
  */
 #define MINSTATE_START_SAVE_MIN_VIRT \
-(pUser)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */ \
-	dep r1=-1,r1,61,3;	/* r1 = current (virtual) */ \
+(pUStk)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */ \
 	;; \
-(pUser)	mov.m rARRNAT=ar.rnat; \
-(pUser)	addl rKRBS=IA64_RBS_OFFSET,r1;	/* compute base of RBS */ \
-(pKern)	mov r1=sp;			/* get sp */ \
+(pUStk)	mov.m rARRNAT=ar.rnat; \
+(pUStk)	addl rKRBS=IA64_RBS_OFFSET,r1;	/* compute base of RBS */ \
+(pKStk)	mov r1=sp;			/* get sp */ \
 	;; \
-(pUser)	lfetch.fault.excl.nt1 [rKRBS]; \
-(pUser)	mov rARBSPSTORE=ar.bspstore;	/* save ar.bspstore */ \
-(pUser)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */ \
+(pUStk)	lfetch.fault.excl.nt1 [rKRBS]; \
+(pUStk)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */ \
+(pUStk)	mov rARBSPSTORE=ar.bspstore;	/* save ar.bspstore */ \
 	;; \
-(pUser)	mov ar.bspstore=rKRBS;		/* switch to kernel RBS */ \
-(pKern)	addl r1=-IA64_PT_REGS_SIZE,r1;	/* if in kernel mode, use sp (r12) */ \
+(pUStk)	mov ar.bspstore=rKRBS;		/* switch to kernel RBS */ \
+(pKStk)	addl r1=-IA64_PT_REGS_SIZE,r1;	/* if in kernel mode, use sp (r12) */ \
 	;; \
-(pUser)	mov r18=ar.bsp; \
-(pUser)	mov ar.rsc=0x3;		/* set eager mode, pl 0, little-endian, loadrs=0 */ \
+(pUStk)	mov r18=ar.bsp; \
+(pUStk)	mov ar.rsc=0x3;		/* set eager mode, pl 0, little-endian, loadrs=0 */ \
 
 #define MINSTATE_END_SAVE_MIN_VIRT \
-	or r13=r13,r14;		/* make `current' a kernel virtual address */ \
 	bsw.1;			/* switch back to bank 1 (must be last in insn group) */ \
 	;;

@@ -57,21 +55,21 @@
  * go virtual and dont want to destroy the iip or ipsr.
  */
 #define MINSTATE_START_SAVE_MIN_PHYS \
-(pKern)	movl sp=ia64_init_stack+IA64_STK_OFFSET-IA64_PT_REGS_SIZE; \
-(pUser)	mov ar.rsc=0;	/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */ \
-(pUser)	addl rKRBS=IA64_RBS_OFFSET,r1;	/* compute base of register backing store */ \
+(pKStk)	movl sp=ia64_init_stack+IA64_STK_OFFSET-IA64_PT_REGS_SIZE; \
+(pUStk)	mov ar.rsc=0;	/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */ \
+(pUStk)	addl rKRBS=IA64_RBS_OFFSET,r1;	/* compute base of register backing store */ \
 	;; \
-(pUser)	mov rARRNAT=ar.rnat; \
-(pKern)	dep r1=0,sp,61,3;	/* compute physical addr of sp */ \
-(pUser)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */ \
-(pUser)	mov rARBSPSTORE=ar.bspstore;	/* save ar.bspstore */ \
-(pUser)	dep rKRBS=-1,rKRBS,61,3;	/* compute kernel virtual addr of RBS */ \
+(pUStk)	mov rARRNAT=ar.rnat; \
+(pKStk)	dep r1=0,sp,61,3;	/* compute physical addr of sp */ \
+(pUStk)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */ \
+(pUStk)	mov rARBSPSTORE=ar.bspstore;	/* save ar.bspstore */ \
+(pUStk)	dep rKRBS=-1,rKRBS,61,3;	/* compute kernel virtual addr of RBS */ \
 	;; \
-(pKern)	addl r1=-IA64_PT_REGS_SIZE,r1;	/* if in kernel mode, use sp (r12) */ \
-(pUser)	mov ar.bspstore=rKRBS;	/* switch to kernel RBS */ \
+(pKStk)	addl r1=-IA64_PT_REGS_SIZE,r1;	/* if in kernel mode, use sp (r12) */ \
+(pUStk)	mov ar.bspstore=rKRBS;	/* switch to kernel RBS */ \
 	;; \
-(pUser)	mov r18=ar.bsp; \
-(pUser)	mov ar.rsc=0x3;	/* set eager mode, pl 0, little-endian, loadrs=0 */ \
+(pUStk)	mov r18=ar.bsp; \
+(pUStk)	mov ar.rsc=0x3;	/* set eager mode, pl 0, little-endian, loadrs=0 */ \
 
 #define MINSTATE_END_SAVE_MIN_PHYS \
 	or r12=r12,r14;	/* make sp a kernel virtual address */ \

@@ -79,11 +77,13 @@
 	;;
 
 #ifdef MINSTATE_VIRT
+# define MINSTATE_GET_CURRENT(reg)	mov reg=IA64_KR(CURRENT)
 # define MINSTATE_START_SAVE_MIN	MINSTATE_START_SAVE_MIN_VIRT
 # define MINSTATE_END_SAVE_MIN		MINSTATE_END_SAVE_MIN_VIRT
 #endif
 
 #ifdef MINSTATE_PHYS
+# define MINSTATE_GET_CURRENT(reg)	mov reg=IA64_KR(CURRENT);; dep reg=0,reg,61,3
 # define MINSTATE_START_SAVE_MIN	MINSTATE_START_SAVE_MIN_PHYS
 # define MINSTATE_END_SAVE_MIN		MINSTATE_END_SAVE_MIN_PHYS
 #endif

@@ -110,23 +110,26 @@
  * we can pass interruption state as arguments to a handler.
  */
 #define DO_SAVE_MIN(COVER,SAVE_IFS,EXTRA) \
-	mov rARRSC=ar.rsc; \
-	mov rARPFS=ar.pfs; \
-	mov rR1=r1; \
-	mov rARUNAT=ar.unat; \
-	mov rCRIPSR=cr.ipsr; \
-	mov rB6=b6;			/* rB6 = branch reg 6 */ \
-	mov rCRIIP=cr.iip; \
-	mov r1=IA64_KR(CURRENT);	/* r1 = current (physical) */ \
-	COVER; \
+	mov rARRSC=ar.rsc;		/* M */ \
+	mov rARUNAT=ar.unat;		/* M */ \
+	mov rR1=r1;			/* A */ \
+	MINSTATE_GET_CURRENT(r1);	/* M (or M;;I) */ \
+	mov rCRIPSR=cr.ipsr;		/* M */ \
+	mov rARPFS=ar.pfs;		/* I */ \
+	mov rCRIIP=cr.iip;		/* M */ \
+	mov rB6=b6;			/* I */	/* rB6 = branch reg 6 */ \
+	COVER;				/* B;; (or nothing) */ \
 	;; \
-	invala; \
-	extr.u r16=rCRIPSR,32,2;	/* extract psr.cpl */ \
+	adds r16=IA64_TASK_THREAD_ON_USTACK_OFFSET,r1; \
 	;; \
-	cmp.eq pKern,pUser=r0,r16;	/* are we in kernel mode already? (psr.cpl==0) */ \
+	ld1 r17=[r16];			/* load current->thread.on_ustack flag */ \
+	st1 [r16]=r0;			/* clear current->thread.on_ustack flag */ \
 	/* switch from user to kernel RBS: */ \
 	;; \
+	invala;				/* M */ \
 	SAVE_IFS; \
+	cmp.eq pKStk,pUStk=r0,r17;	/* are we in kernel mode already? (psr.cpl==0) */ \
+	;; \
 	MINSTATE_START_SAVE_MIN \
 	add r17=L1_CACHE_BYTES,r1	/* really: biggest cache-line size */ \
 	;; \

@@ -138,23 +141,23 @@
 	;; \
 	lfetch.fault.excl.nt1 [r17]; \
 	adds r17=8,r1;			/* initialize second base pointer */ \
-(pKern)	mov r18=r0;			/* make sure r18 isn't NaT */ \
+(pKStk)	mov r18=r0;			/* make sure r18 isn't NaT */ \
 	;; \
 	st8 [r17]=rCRIIP,16;		/* save cr.iip */ \
 	st8 [r16]=rCRIFS,16;		/* save cr.ifs */ \
-(pUser)	sub r18=r18,rKRBS;		/* r18=RSE.ndirty*8 */ \
+(pUStk)	sub r18=r18,rKRBS;		/* r18=RSE.ndirty*8 */ \
 	;; \
 	st8 [r17]=rARUNAT,16;		/* save ar.unat */ \
 	st8 [r16]=rARPFS,16;		/* save ar.pfs */ \
 	shl r18=r18,16;			/* compute ar.rsc to be used for "loadrs" */ \
 	;; \
 	st8 [r17]=rARRSC,16;		/* save ar.rsc */ \
-(pUser)	st8 [r16]=rARRNAT,16;		/* save ar.rnat */ \
-(pKern)	adds r16=16,r16;		/* skip over ar_rnat field */ \
+(pUStk)	st8 [r16]=rARRNAT,16;		/* save ar.rnat */ \
+(pKStk)	adds r16=16,r16;		/* skip over ar_rnat field */ \
 	;;				/* avoid RAW on r16 & r17 */ \
-(pUser)	st8 [r17]=rARBSPSTORE,16;	/* save ar.bspstore */ \
+(pUStk)	st8 [r17]=rARBSPSTORE,16;	/* save ar.bspstore */ \
 	st8 [r16]=rARPR,16;		/* save predicates */ \
-(pKern)	adds r17=16,r17;		/* skip over ar_bspstore field */ \
+(pKStk)	adds r17=16,r17;		/* skip over ar_bspstore field */ \
 	;; \
 	st8 [r17]=rB6,16;		/* save b6 */ \
 	st8 [r16]=r18,16;		/* save ar.rsc value for "loadrs" */ \
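
The key change in DO_SAVE_MIN above is that the kernel-vs-user-stack
decision is no longer derived from the interrupted privilege level
(psr.cpl): fsys-mode runs at cpl 0 while still on the user stacks, so
the predicate pair is now computed from current->thread.on_ustack.  In
C terms the new logic is roughly this (a sketch only; the helper name
is made up):

	/* rough C rendering of the new pKStk/pUStk computation */
	static int interrupted_on_user_stacks(struct task_struct *tsk)
	{
		int on_ustack = tsk->thread.on_ustack;	/* ld1 r17=[r16] */

		tsk->thread.on_ustack = 0;		/* st1 [r16]=r0: we are switching
							   to the kernel stacks now */
		return on_ustack;			/* != 0 -> pUStk, == 0 -> pKStk */
	}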

@@ -524,6 +524,23 @@ ia64_fault (unsigned long vector, unsigned long isr, unsigned long ifa,
 	      case 29: /* Debug */
 	      case 35: /* Taken Branch Trap */
 	      case 36: /* Single Step Trap */
+		if (fsys_mode(regs)) {
+			extern char syscall_via_break[], __start_gate_section[];
+			/*
+			 * Got a trap in fsys-mode: Taken Branch Trap and Single Step trap
+			 * need special handling; Debug trap is not supposed to happen.
+			 */
+			if (unlikely(vector == 29)) {
+				die("Got debug trap in fsys-mode---not supposed to happen!",
+				    regs, 0);
+				return;
+			}
+			/* re-do the system call via break 0x100000: */
+			regs->cr_iip = GATE_ADDR + (syscall_via_break - __start_gate_section);
+			ia64_psr(regs)->ri = 0;
+			ia64_psr(regs)->cpl = 3;
+			return;
+		}
 		switch (vector) {
 		      case 29:
 			siginfo.si_code = TRAP_HWBKPT;

@@ -331,12 +331,8 @@ set_rse_reg (struct pt_regs *regs, unsigned long r1, unsigned long val, int nat)
 		return;
 	}
 
-	/*
-	 * Avoid using user_mode() here: with "epc", we cannot use the privilege level to
-	 * infer whether the interrupt task was running on the kernel backing store.
-	 */
-	if (regs->r12 >= TASK_SIZE) {
-		DPRINT("ignoring kernel write to r%lu; register isn't on the RBS!", r1);
+	if (!user_stack(regs)) {
+		DPRINT("ignoring kernel write to r%lu; register isn't on the kernel RBS!", r1);
 		return;
 	}

@@ -406,11 +402,7 @@ get_rse_reg (struct pt_regs *regs, unsigned long r1, unsigned long *val, int *na
 		return;
 	}
 
-	/*
-	 * Avoid using user_mode() here: with "epc", we cannot use the privilege level to
-	 * infer whether the interrupt task was running on the kernel backing store.
-	 */
-	if (regs->r12 >= TASK_SIZE) {
+	if (!user_stack(regs)) {
 		DPRINT("ignoring kernel read of r%lu; register isn't on the RBS!", r1);
 		goto fail;
 	}

 /*
  * Utility to generate asm-ia64/offsets.h.
  *
- * Copyright (C) 1999-2002 Hewlett-Packard Co
+ * Copyright (C) 1999-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *
  * Note that this file has dual use: when building the kernel

@@ -53,7 +53,9 @@ tab[] =
   { "UNW_FRAME_INFO_SIZE",		sizeof (struct unw_frame_info) },
   { "", 0 },				/* spacer */
   { "IA64_TASK_THREAD_KSP_OFFSET",	offsetof (struct task_struct, thread.ksp) },
+  { "IA64_TASK_THREAD_ON_USTACK_OFFSET", offsetof (struct task_struct, thread.on_ustack) },
   { "IA64_TASK_PID_OFFSET",		offsetof (struct task_struct, pid) },
+  { "IA64_TASK_TGID_OFFSET",		offsetof (struct task_struct, tgid) },
   { "IA64_PT_REGS_CR_IPSR_OFFSET",	offsetof (struct pt_regs, cr_ipsr) },
   { "IA64_PT_REGS_CR_IIP_OFFSET",	offsetof (struct pt_regs, cr_iip) },
   { "IA64_PT_REGS_CR_IFS_OFFSET",	offsetof (struct pt_regs, cr_ifs) },

@@ -2,7 +2,7 @@
 #define _ASM_IA64_ASMMACRO_H
 
 /*
- * Copyright (C) 2000-2001 Hewlett-Packard Co
+ * Copyright (C) 2000-2001, 2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  */

@@ -11,6 +11,11 @@
 	.proc name;			\
 name:
 
+#define ENTRY_MIN_ALIGN(name)		\
+	.align 16;			\
+	.proc name;			\
+name:
+
 #define GLOBAL_ENTRY(name)		\
 	.global name;			\
 	ENTRY(name)

@@ -4,10 +4,12 @@
 /*
  * ELF-specific definitions.
  *
- * Copyright (C) 1998, 1999, 2002 Hewlett-Packard Co
+ * Copyright (C) 1998-1999, 2002-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  */
 
+#include <linux/config.h>
+
 #include <asm/fpu.h>
 #include <asm/page.h>

@@ -88,6 +90,11 @@ extern void ia64_elf_core_copy_regs (struct pt_regs *src, elf_gregset_t dst);
    relevant until we have real hardware to play with... */
 #define ELF_PLATFORM	0
 
+/*
+ * This should go into linux/elf.h...
+ */
+#define AT_SYSINFO	32
+
 #ifdef __KERNEL__
 struct elf64_hdr;
 extern void ia64_set_personality (struct elf64_hdr *elf_ex, int ibcs2_interpreter);

@@ -99,7 +106,14 @@ extern int dump_task_fpu (struct task_struct *, elf_fpregset_t *);
 #define ELF_CORE_COPY_TASK_REGS(tsk, elf_gregs) dump_task_regs(tsk, elf_gregs)
 #define ELF_CORE_COPY_FPREGS(tsk, elf_fpregs) dump_task_fpu(tsk, elf_fpregs)
 
+#ifdef CONFIG_FSYS
+#define ARCH_DLINFO						\
+do {								\
+	extern int syscall_via_epc;				\
+	NEW_AUX_ENT(AT_SYSINFO, syscall_via_epc);		\
+} while (0)
 #endif
+#endif /* __KERNEL__ */
 #endif /* _ASM_IA64_ELF_H */
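
The ARCH_DLINFO entry above is how user level learns where the gate
page's syscall_via_epc entry point lives: the kernel passes its address
in the ELF auxiliary vector under AT_SYSINFO.  A user-level runtime
could locate it roughly as follows (a sketch assuming the usual ELF
initial-stack layout; the helper name is made up):

	#include <elf.h>

	#ifndef AT_SYSINFO
	# define AT_SYSINFO 32			/* matches the definition added above */
	#endif

	/* find AT_SYSINFO in the aux vector, which follows the environment strings */
	static unsigned long find_sysinfo(char **envp)
	{
		Elf64_auxv_t *av;

		while (*envp)			/* skip to the end of the environment */
			++envp;
		for (av = (Elf64_auxv_t *) (envp + 1); av->a_type != AT_NULL; ++av)
			if (av->a_type == AT_SYSINFO)
				return av->a_un.a_val;	/* address of syscall_via_epc */
		return 0;			/* kernel built without CONFIG_FSYS */
	}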

@@ -2,7 +2,7 @@
 #define _ASM_IA64_PROCESSOR_H
 
 /*
- * Copyright (C) 1998-2002 Hewlett-Packard Co
+ * Copyright (C) 1998-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *	Stephane Eranian <eranian@hpl.hp.com>
  * Copyright (C) 1999 Asit Mallick <asit.k.mallick@intel.com>

@@ -223,7 +223,10 @@ typedef struct {
 struct siginfo;
 
 struct thread_struct {
-	__u64 flags;		/* various thread flags (see IA64_THREAD_*) */
+	__u32 flags;		/* various thread flags (see IA64_THREAD_*) */
+	/* writing on_ustack is performance-critical, so it's worth spending 8 bits on it... */
+	__u8 on_ustack;		/* executing on user-stacks? */
+	__u8 pad[3];
 	__u64 ksp;		/* kernel stack pointer */
 	__u64 map_base;		/* base address for get_unmapped_area() */
 	__u64 task_size;	/* limit for task size */

@@ -277,6 +280,7 @@ struct thread_struct {
 #define INIT_THREAD {				\
 	.flags =	0,			\
+	.on_ustack =	0,			\
 	.ksp =		0,			\
 	.map_base =	DEFAULT_MAP_BASE,	\
 	.task_size =	DEFAULT_TASK_SIZE,	\

@@ -218,6 +218,8 @@ struct switch_stack {
 # define ia64_task_regs(t)	(((struct pt_regs *) ((char *) (t) + IA64_STK_OFFSET)) - 1)
 # define ia64_psr(regs)		((struct ia64_psr *) &(regs)->cr_ipsr)
 # define user_mode(regs)	(((struct ia64_psr *) &(regs)->cr_ipsr)->cpl != 0)
+# define user_stack(regs)	(current->thread.on_ustack != 0)
+# define fsys_mode(regs)	(!user_mode(regs) && user_stack(regs))
 
 struct task_struct;		/* forward decl */