    [ARM] lower overhead with alternative copy_to_user for small copies · cb9dc92c
    Nicolas Pitre authored
    Because the alternate copy_to_user implementation has a higher setup cost
    than the standard implementation, the size of the memory area to copy
    is tested and the standard implementation invoked instead when that size
    is too small.  Still, that test is made after the processor has preserved
    a bunch of registers on the stack which have to be reloaded right away
    needlessly in that case, causing a measurable performance regression
    compared to plain usage of the standard implementation only.
    
    To make the size test overhead negligible, let's factor it out of
    the alternate copy_to_user function into a small wrapper where it is
    clear to the compiler that no stack frame is needed.  Thanks to
    CONFIG_ARM_UNWIND allowing for frame pointers to be disabled and tail
    call optimization to kick in, the overhead in the small copy case
    becomes only 3 assembly instructions.
    
    A similar trick is applied to clear_user as well.
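
A minimal user-space sketch of the dispatch pattern described above, not the kernel code itself. The names `copy_std`, `copy_alt`, `copy_dispatch`, the call counters, and the 64-byte threshold are all assumptions made for illustration. The point is that the wrapper only compares the size and hands off to one of two functions with identical signatures, so an optimizing compiler built without frame pointers can turn both calls into tail calls and skip the register save entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff below which the cheap path is used. */
#define SMALL_COPY_THRESHOLD 64

/* Counters exist only so the sketch can show which path ran. */
static unsigned long std_calls;
static unsigned long alt_calls;

/* Stand-in for the standard implementation: low setup cost.
 * Returns the number of bytes NOT copied (0 means success). */
static unsigned long copy_std(void *to, const void *from, unsigned long n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    std_calls++;
    while (n--)
        *d++ = *s++;
    return 0;
}

/* Stand-in for the alternate implementation: assumed to have a higher
 * setup cost that only pays off for larger copies. */
static unsigned long copy_alt(void *to, const void *from, unsigned long n)
{
    alt_calls++;
    memcpy(to, from, n);
    return 0;
}

/* Thin wrapper: the size test lives here, outside the heavy function.
 * Both branches end in a call with the same arguments and return type,
 * which is the shape a compiler needs to emit a tail call. */
unsigned long copy_dispatch(void *to, const void *from, unsigned long n)
{
    if (n < SMALL_COPY_THRESHOLD)
        return copy_std(to, from, n);
    return copy_alt(to, from, n);
}
```

Calling `copy_dispatch(dst, src, 8)` takes the `copy_std` path, while `copy_dispatch(dst, src, 128)` takes the `copy_alt` path. The same wrapper shape would apply to a clear (zero-fill) routine.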
    Signed-off-by: Nicolas Pitre <nico@marvell.com>
uaccess_with_memcpy.c 3.46 KB