    [PATCH] ppc64: Implement a vDSO and use it for signal trampoline · 054eb715
    Benjamin Herrenschmidt authored
    This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped
    into every process space, similar to the x86 vsyscall page.  However, the
    implementation is very different (and doesn't use the gate area mechanism).
    Actually, it contains two implementations, a 32-bit and a 64-bit one.
    
    These vDSOs are currently mapped at 0x100000 (+1MB) when possible (when a
    process load section isn't already there).  In the future, we can randomize
    that address, or even imagine a special phdr entry that lets apps that
    want finer control over their address space put it elsewhere (or not at
    all).
    
    The implementation adds a hook to binfmt_elf to let the architecture add a
    real VMA to the process space instead of using the gate area mechanism.
    That mechanism wasn't very suitable for ppc: we couldn't just "shove" PTE
    entries mapping kernel addresses into userland without expensive changes to
    our hash table management.  Instead, I made the vDSO a normal VMA, which
    additionally means it supports copy-on-write semantics if made writable
    via ptrace/mprotect, thus allowing breakpoints in the vDSO code.
    
    The current implementation of the vDSOs contains the signal trampolines
    with appropriate DWARF information, which enables us to use non-executable
    stacks (patches to come later), along with a few more functions that we
    hope glibc will soon make good use of (this is the "hard" part now :)
    Note that the symbols exposed by the vDSO aren't "normal" function symbols,
    so apps can't be expected to link against them directly: both vDSOs are
    seen as if they were linked at 0, and the symbols just contain offsets to
    the various functions.  This is done on purpose to avoid a relocation step
    (ppc64 functions normally have descriptors with absolute addresses in
    them).  When glibc uses those functions, it's expected to use its own
    trampolines that know how to reach them.
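
As a rough illustration of what "symbols just contain offsets" means for a caller, here is a minimal sketch. The helper name vdso_entry and the example offset are hypothetical; only the 0x100000 mapping address comes from the description above. Because the vDSO is seen as linked at 0, the run-time entry address is simply the mapping base plus the symbol value:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: turn a vDSO symbol value
 * into a run-time entry address.  The symbol value is treated as a plain
 * offset because the vDSO is seen as if it were linked at 0.
 */
static uintptr_t vdso_entry(uintptr_t vdso_base, uintptr_t sym_value)
{
    return vdso_base + sym_value;
}
```

For example, a symbol with the (made-up) value 0x4a0 in a vDSO mapped at 0x100000 would resolve to 0x1004a0. The result is a raw code address, not a ppc64 function descriptor, which is why the caller needs its own trampoline rather than an ordinary C function pointer.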
    
    In some cases, the vDSO contains several versions of a given function (for
    various CPUs); the kernel will "patch" the symbol table at boot to make it
    point to the appropriate one transparently.  What is currently implemented
    is:
    
     -  int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);
    
     This is a fully userland implementation of gettimeofday, with no barriers
     and no locks, providing 100% equivalent results to the syscall version.
    
     - void __kernel_sync_dicache(unsigned long start, unsigned long end);
    
     This function syncs the data and instruction caches (for making data
     executable).  Userland loaders are expected to use this instead of
     doing it themselves, as the kernel will provide optimized versions for the
     current CPU.  Currently, the vDSO provides a full one for all CPUs prior
     to POWER5 and a nop one for POWER5, which implements hardware snooping at
     the L1 level.  In the future, an intermediate implementation may be done
     for the POWER4 and 970, which don't need the "dcbst" loop (the L1D cache is
     write-through on those).
    
     - void *__kernel_get_syscall_map(unsigned int *syscall_count);
    
     Returns a pointer to a map of implemented syscalls on the currently
     running kernel.  The map is agnostic to the size of "long": unlike kernel
     bitops, it stores bits from top to bottom, so that memory actually
     contains a linear bitmap.  Check for syscall N by testing bit
     (0x80000000 >> (N & 0x1f)) of the 32-bit int at index N >> 5.
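
The bit test described above can be sketched in C as follows. The helper name syscall_map_test is hypothetical, and treating the returned syscall_count as an upper bound on valid syscall numbers is an assumption, not something the patch description states:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: test whether syscall n is
 * marked in a map laid out as described above.  Bits are stored from the
 * top of each 32-bit word down, so syscall 0 is the most significant bit
 * of word 0.  'count' is assumed to be the number of syscalls the map
 * covers.
 */
static int syscall_map_test(const uint32_t *map, unsigned int count,
                            unsigned int n)
{
    if (n >= count)
        return 0;
    return (map[n >> 5] & (0x80000000u >> (n & 0x1f))) != 0;
}
```

Indexing by 32-bit words rather than by "long" is what keeps the same test valid for both 32-bit and 64-bit callers.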
    
    Note about backward compatibility issues: A bug in the ppc64 libgcc
    unwinder makes it unable to unwind stacks properly across signals if the
    signal trampoline isn't on the stack.  This has been fixed in CVS for gcc
    4.0 and will soon be fixed on the stable branch, but the problem exists
    with all currently used versions.
    
    That means that until glibc gets the patch to enable its use of the vDSO
    symbols for the DWARF unwinder (rather trivial patch that will be pushed to
    glibc CVS soon hopefully), unwinding from a signal handler will not work
    for 64-bit applications.
    
    I consider this a non-issue though, as a patch is about to be produced
    which can easily get pushed to "live" distros like debian, gentoo, fedora,
    etc... soon enough (it breaks compatibility with kernels below 2.4.20
    unfortunately as our signal stack layout changed, crap crap crap), as there
    are few 64-bit applications out there (except gentoo), as it's only really
    an issue with C++ code relying on throwing exceptions out of signal
    handlers (extremely rare it seems), and as "release" distros like SLES or
    RHEL will probably have the vDSO-enabled glibc _and_ the unwinder fix by
    the time they release a version with a 2.6.11 or 2.6.12 kernel anyway :)
    
    So far, I have yet to see an app failing because of that...
    
    Finally, many many many thanks to Alan Modra for writing the DWARF
    information of the signal handlers and debugging the libgcc issues!
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>