-
Benjamin Herrenschmidt authored
This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped into every process space, similar to the x86 vsyscall page. However, the implementation is very different (and doesn't use the gate area mecanism). Actually, it contains two implementations, a 32 bits and a 64 bits one. These vDSO's are currently mapped at 0x100000 (+1Mb) when possible (when a process load section isn't already there). In the future, we can randomize that address, or even imagine having a special phdr entry letting apps that wnat finer control over their address space to put it elsewhere (or not at all). The implementation adds a hook to binfmt_elf to let the architecture add a real VMA to the process space instead of using the gate area mecanism. This mecanism wasn't very suitable for ppc, we couldn't just "shove" PTE entries mapping kernel addresses into userland without expensive changes to our hash table management. Instead, I made the vDSO be a normal VMA which, additionally, means it supports copy-on-write semantics if made writable via ptrace/mprotect, thus allowing breakpoints in the vDSO code. The current implementation of the vDSOs contain the signal trampolines with appropriate DWARF informations, which enable us to use non-executable stacks (patches to come later) along with a few more functions that we hope glibc will soon make good use of (this is the "hard" part now :) Note that the symbols exposed by the vDSO aren't "normal" function symbols, apps can't be expected to link against them directly, the vDSO's are both seen as if they were linked at 0 and the symbols just contain offsets to the various functions. This is done on purpose to avoid a relocation step (ppc64 functions normally have descriptors with abs addresses in them). When glibc uses those functions, it's expected to use it's own trampolines that know how to reach them. In some cases, the vDSO contains several versions of a given function (for various CPUs), the kernel will "patch" the symbol table at boot to make it point to the appropriate one transparently. What is currently implemented is: - int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); This is a fully userland implementation of gettimeofday, with no barriers and no locks, and providing 100% equivalent results to the syscall version - void __kernel_sync_dicache(unsigned long start, unsigned long end) This function sync's the data and instruction caches (for making data executable), it is expected that userland loaders use this instead of doing it themselves, as the kernel will provide optimized versions for the current CPU. Currently, the vDSO procides a full one for all CPUs prior to POWER5 and a nop one for POWER5 which implements hardware snooping at the L1 level. In the future, an intermediate implementation may be done for the POWER4 and 970 which don't need the "dcbst" loop (the L1D cache is write-through on those). - void *__kernel_get_syscall_map(unsigned int *syscall_count); Returns a pointer to a map of implemented syscalls on the currently running kernel. The map is agnostic to the size of "long", unlike kernel bitops, it stores bits from top to bottom so that memory actually contains a linear bitmap check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of * 32 bits int at N >> 5. Note about backward compatibility issues: A bug in the ppc64 libgcc unwinder makes it unable to unwind stacks properly accross signals if the signal trampoline isn't on the stack. This has been fixed in CVS for gcc 4.0 and will be soon on the stable branch, but the problem exist will all currently used versions. That means that until glibc gets the patch to enable it's use of the vDSO symbols for the DWARF unwinder (rather trivial patch that will be pushed to glibc CVS soon hopefully), unwinding from a signal handler will not work for 64 bits applications. I consider this as a non-issue though as a patch is about to be produced, which can easily get pushed to "live" distros like debian, gentoo, fedora, etc... soon enough (it breaks compatilbity with kernels below 2.4.20 unfortunately as our signal stack layout changed, crap crap crap), as there are few 64 bits applications out there (expect gentoo), as it's only really an issue with C++ code relying on throwing exceptions out of signal handlers (extremely rare it seems), and as "release" distros like SLES or RHEL will probably have the vDSO enabled glibc _and_ the unwinder fix by the time they release a version with a 2.6.11 or 2.6.12 kernel anyway :) So far, I yet have to see an app failing because of that... Finally, many many many thanks to Alan Modra for writing the DWARF information of the signal handlers and debugging the libgcc issues ! Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
054eb715