    x86/shstk: Handle vfork clone failure correctly · 33195560
    Rick Edgecombe authored
    Shadow stacks are allocated automatically and freed on exit, depending
    on the clone flags. The two cases where new shadow stacks are not
    allocated are !CLONE_VM (fork()) and CLONE_VFORK (vfork()). For
    !CLONE_VM, although a new stack is not allocated, it can be freed normally
    because the freeing happens in the child's copy of the VM.
    
    However, for CLONE_VFORK the parent and the child are actually using the
    same shadow stack. So the kernel doesn't need to allocate *or* free a
    shadow stack for a CLONE_VFORK child. CLONE_VFORK children already need
    special tracking to avoid returning to userspace until the child exits or
    execs. Shadow stack uses this same tracking to avoid freeing CLONE_VFORK
    shadow stacks.
    
    However, the tracking is not set up until the clone has succeeded
    (internally). This means that if a CLONE_VFORK fails, the existing logic will
    not know it is a CLONE_VFORK and will proceed to unmap the parent's shadow stack.
    This error handling cleanup logic runs via exit_thread() in the
    bad_fork_cleanup_thread label in copy_process(). The issue was seen in
    the glibc test "posix/tst-spawn3-pidfd" while running with shadow stack
    using currently out-of-tree glibc patches.
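
The ordering problem described above can be sketched as a small userspace
model. This is not kernel code: `old_model_task`, `old_model_cleanup_unmaps()`
and `old_model_vfork_unmaps_parent()` are hypothetical names invented for
illustration, and the model assumes only what the text states, namely that the
vfork tracking is set up after the clone has succeeded and that the error
cleanup runs before that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task; not the kernel's task_struct. */
struct old_model_task {
	unsigned long shstk_base; /* shadow stack the task would unmap on cleanup */
	bool vfork_tracked;       /* set only once the clone has succeeded */
};

/* Models the pre-fix cleanup decision: only tracked vfork children skip it. */
static bool old_model_cleanup_unmaps(const struct old_model_task *t)
{
	return !t->vfork_tracked && t->shstk_base != 0;
}

/*
 * Models a CLONE_VFORK clone before the fix. The child starts out with the
 * parent's shadow stack state. Returns true if cleanup of the child would
 * unmap the shadow stack that the parent is still using.
 */
static bool old_model_vfork_unmaps_parent(unsigned long parent_base,
					  bool clone_fails)
{
	struct old_model_task child = { parent_base, false };

	assert(parent_base != 0);

	/* Error path: cleanup runs before the vfork tracking exists. */
	if (clone_fails)
		return old_model_cleanup_unmaps(&child);

	/* Success path: tracking is set up, so the later exit skips cleanup. */
	child.vfork_tracked = true;
	return old_model_cleanup_unmaps(&child);
}
```

In this model a failed vfork reports that the parent's stack would be unmapped,
while a successful one does not, which is the asymmetry the fix removes.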
    
    Fix it by not unmapping the vfork shadow stack in the error case as well.
    Since clone is implemented in core code, it is not ideal to pass the clone
    flags along the error path in order to have shadow stack code have
    symmetric logic in the freeing half of the thread shadow stack handling.
    
    Instead use the existing state for thread shadow stacks to track whether
    the thread is managing its own shadow stack. For CLONE_VFORK, simply set
    shstk->base and shstk->size to 0, and have it mean the thread is not
    managing a shadow stack and so should skip cleanup work. Implement this
    by breaking up the CLONE_VFORK and !CLONE_VM cases in
    shstk_alloc_thread_stack() into separate conditionals, since the logic is
    now different between them. In the case of CLONE_VFORK && !CLONE_VM, the
    existing behavior is to not clean up the shadow stack in the child (which
    should go away quickly with either exit or exec), so maintain that
    behavior by handling the CLONE_VFORK case first in the allocation path.
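
A minimal userspace sketch of the allocation and freeing logic described above,
assuming a simplified state struct. It is a model, not the kernel
implementation: `model_shstk`, `model_map_shadow_stack()`,
`model_alloc_thread_stack()` and `model_free_thread_stack()` are hypothetical
stand-ins, the `MODEL_CLONE_*` constants are illustrative flag values, and the
"mapping" is just a counter rather than a real allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the clone flags discussed in the text. */
#define MODEL_CLONE_VM    0x00000100UL
#define MODEL_CLONE_VFORK 0x00004000UL

/* Hypothetical stand-in for the per-thread shadow stack state. */
struct model_shstk {
	unsigned long base;
	unsigned long size;
};

/* Fake allocator: hands out distinct, non-zero "addresses". */
static unsigned long model_next_base = 0x10000UL;

static unsigned long model_map_shadow_stack(unsigned long size)
{
	unsigned long base = model_next_base;

	assert(size != 0);
	model_next_base += size;
	return base;
}

/*
 * Allocation half. The caller passes child state that starts as a copy of
 * the parent's. CLONE_VFORK is checked first so that CLONE_VFORK without
 * CLONE_VM is also marked as "not managing a shadow stack".
 */
static void model_alloc_thread_stack(struct model_shstk *child,
				     unsigned long clone_flags,
				     unsigned long stack_size)
{
	if (clone_flags & MODEL_CLONE_VFORK) {
		child->base = 0;
		child->size = 0;
		return;
	}

	/* fork(): keep the inherited values; they refer to the child's copy. */
	if (!(clone_flags & MODEL_CLONE_VM))
		return;

	/* New thread sharing the address space: give it its own stack. */
	child->base = model_map_shadow_stack(stack_size);
	child->size = stack_size;
}

/* Freeing half. Returns true if the stack would be unmapped. */
static bool model_free_thread_stack(const struct model_shstk *shstk)
{
	/* base == 0 means the thread is not managing a shadow stack. */
	if (!shstk->base)
		return false;
	return true;
}
```

Because the decision is recorded in the state at allocation time, the freeing
half gives the same answer on the error path and on a normal exit, without
needing the clone flags.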
    
    This new logic also cleanly handles the case of normal, successful
    CLONE_VFORKs skipping cleanup of their shadow stacks on exit.
    So remove the existing vfork shadow stack freeing logic. This is in
    deactivate_mm() where vfork_done is used to tell if it is a vfork child
    that can skip cleaning up the thread shadow stack.
    
    Fixes: b2926a36 ("x86/shstk: Handle thread shadow stack")
    Reported-by: H.J. Lu <hjl.tools@gmail.com>
    Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Tested-by: H.J. Lu <hjl.tools@gmail.com>
    Link: https://lore.kernel.org/all/20230908203655.543765-2-rick.p.edgecombe%40intel.com