    [PATCH] force_successful_syscall_return() · a4369a58
    Andrew Morton authored
    From: David Mosberger <davidm@napali.hpl.hp.com>, Christoph Hellwig
    
    I believe this is the last outstanding piece that prevents ia64 from being
    fully in sync with Linus' tree (yes, there are some minor ACPI changes
    outstanding and a toolchain bug that's left to fix, but other than that, I
    think we're clean).
    
    Many architectures (alpha, ia64, ppc, ppc64, sparc, and sparc64 at least)
    use a syscall convention which provides for a return value and a separate
    error flag.  On those architectures, it can be beneficial if the kernel
    provides a mechanism to signal that a syscall has completed
    successfully, even when the returned value is potentially a (small)
    negative number.  The patch below provides a hook for such a mechanism via
    a macro called force_successful_syscall_return().  On x86, this would be
    simply a no-op (because on x86, user-level has to be hacked to handle such
    cases).  On Alpha, it would be something along the lines of:
    
     #define force_successful_syscall_return()  ptregs->r0 = 0
    
    where "ptregs" is a pointer to the user's ptregs structure of the current
    task.  On ia64, we have been using this for a long time:
    
     static inline void force_successful_syscall_return (void) {
    	ia64_task_regs(current)->r8 = 0;
     }
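
    To make the convention concrete, here is a minimal user-space model of a
    syscall exit path with a separate error flag.  Everything in it is an
    assumption made for illustration: struct demo_regs, the force_ok field,
    DEMO_MAX_ERRNO and the demo_* functions are hypothetical names, and no
    real architecture's exit code is implemented this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the saved user registers; the field names are
 * illustrative and do not match any real architecture's pt_regs. */
struct demo_regs {
    long ret;       /* return-value register seen by user level */
    long err_flag;  /* separate error flag seen by user level */
    bool force_ok;  /* set by the hook, consumed on syscall exit */
};

#define DEMO_MAX_ERRNO 4095  /* assumed upper bound on errno values */

/* Model of the hook: mark the pending return as successful. */
static void demo_force_successful_syscall_return(struct demo_regs *regs)
{
    assert(regs);
    regs->force_ok = true;
}

/* Model of the exit path: a small negative result is reported as an error
 * unless the handler forced success. */
static void demo_syscall_exit(struct demo_regs *regs, long result)
{
    bool looks_like_error = result < 0 && result >= -DEMO_MAX_ERRNO;

    assert(regs);
    if (looks_like_error && !regs->force_ok) {
        regs->ret = -result;   /* positive errno value */
        regs->err_flag = 1;
    } else {
        regs->ret = result;    /* value passed through unchanged */
        regs->err_flag = 0;
    }
    regs->force_ok = false;    /* the request applies to one syscall only */
}
```

    In this model, a handler that returns -2 without calling the hook is
    reported as error 2, while the same handler calling the hook first hands
    -2 back to user level with the error flag clear.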
    
    The other architectures (ppc, ppc64, sparc, and sparc64) currently have no
    mechanism to force a syscall return to be successful.  But since the
    syscall convention already provides for a separate error flag, the arch
    maintainers could change this if they wanted to.
    
    There are only 3 places in the platform-independent portion of the kernel
    that need this macro:
    
     - memory_lseek() in drivers/char/mem.c
     - fs/fcntl.c for F_GETOWN
     - lseek for /proc/mem in fs/proc/array.c
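
    A call site in one of these places might look roughly like the sketch
    below.  The #ifndef fallback and the demo_getown() handler are
    hypothetical and only illustrate the pattern: an architecture without a
    separate error flag gets a no-op, and the handler calls the hook just
    before returning a value that may be negative.

```c
/* Fallback for architectures that do not define the hook: a no-op.
 * (Hypothetical arrangement for this sketch.) */
#ifndef force_successful_syscall_return
#define force_successful_syscall_return() do { } while (0)
#endif

/* Hypothetical F_GETOWN-style handler.  A negative owner denotes a process
 * group, so the return value can legitimately be a small negative number. */
static long demo_getown(long owner)
{
    force_successful_syscall_return();
    return owner;
}
```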
    
    Ideally, several other places could also benefit from this
    macro:
    
     - sys_getpriority()
     - sys_shmat()
     - sys_brk()
     - do_mmap2()
     - do_mremap()
    
    but these are not so critical, because they can be worked around in
    platform-specific code (e.g., see arch/ia64/kernel/sys_ia64.c).
    
    Note that for the above 3 cases, handling them in user level is rather
    suboptimal:
    
     - it would affect all lseek() syscalls, even though only /proc/mem and
       /dev/mem need the special treatment (at least until there are
       filesystems that can handle files >= 2^63 bytes)
    
     - all fcntl() calls would be affected, even though only F_GETOWN needs
       the special treatment
    
    so I think handling these in the kernel, for the platforms that can, makes
    tons of sense.
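
    The arithmetic behind the lseek() case can be sketched as follows.  The
    two user-level checks are hypothetical, and DEMO_MAX_ERRNO is an assumed
    bound; the point is only that an offset at or above 2^63, read back as a
    signed 64-bit return value, is negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ERRNO 4095  /* assumed upper bound on errno values */

static_assert(sizeof(int64_t) == sizeof(uint64_t), "same width required");

/* Reinterpret an unsigned 64-bit offset as the signed value that a syscall
 * would hand back in its return register. */
static int64_t demo_as_signed(uint64_t offset)
{
    int64_t value;

    memcpy(&value, &offset, sizeof value);
    return value;
}

/* Hypothetical user-level check: any negative value is an error. */
static bool demo_negative_means_error(uint64_t offset)
{
    return demo_as_signed(offset) < 0;
}

/* Hypothetical narrower check: only [-DEMO_MAX_ERRNO, -1] is an error. */
static bool demo_small_negative_means_error(uint64_t offset)
{
    int64_t value = demo_as_signed(offset);

    return value < 0 && value >= -DEMO_MAX_ERRNO;
}
```

    With the first check, every offset from 2^63 upward would be misread as a
    failure; with the second, only the topmost DEMO_MAX_ERRNO offsets would.
    Either way, user level cannot tell such an offset from an error without
    extra information, which is what the separate error flag supplies.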
    
    The only limitation of force_successful_syscall_return() is that it doesn't
    help with system calls performed by the kernel.  But the kernel does that
    so rarely and for such a limited set of syscalls that this is not a real
    problem.