1. 10 May, 2022 2 commits
  2. 08 May, 2022 20 commits
    • Mark Rutland's avatar
      lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n · 8c6a490e
      Mark Rutland authored
      Recent rework broke building LKDTM when CONFIG_GCC_PLUGIN_STACKLEAK=n.
      This patch fixes that breakage.
      
      Prior to recent stackleak rework, the LKDTM STACKLEAK_ERASING code could
      be built when the kernel was not built with stackleak support, and would
      run a test that would almost certainly fail (or pass by sheer cosmic
      coincidence), e.g.
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: checking unused part of the thread stack (15560 bytes)...
      | lkdtm: FAIL: the erased part is not found (checked 15560 bytes)
      | lkdtm: FAIL: the thread stack is NOT properly erased!
      | lkdtm: This is probably expected, since this kernel (5.18.0-rc2 aarch64) was built *without* CONFIG_GCC_PLUGIN_STACKLEAK=y
      
      The recent rework to the test made it more accurate by using helpers
      which are only defined when CONFIG_GCC_PLUGIN_STACKLEAK=y, and so when
      building LKDTM when CONFIG_GCC_PLUGIN_STACKLEAK=n, we get a build
      failure:
      
      | drivers/misc/lkdtm/stackleak.c: In function 'check_stackleak_irqoff':
      | drivers/misc/lkdtm/stackleak.c:30:46: error: implicit declaration of function 'stackleak_task_low_bound' [-Werror=implicit-function-declaration]
      |    30 |         const unsigned long task_stack_low = stackleak_task_low_bound(current);
      |       |                                              ^~~~~~~~~~~~~~~~~~~~~~~~
      | drivers/misc/lkdtm/stackleak.c:31:47: error: implicit declaration of function 'stackleak_task_high_bound'; did you mean 'stackleak_task_init'? [-Werror=implicit-function-declaration]
      |    31 |         const unsigned long task_stack_high = stackleak_task_high_bound(current);
      |       |                                               ^~~~~~~~~~~~~~~~~~~~~~~~~
      |       |                                               stackleak_task_init
      | drivers/misc/lkdtm/stackleak.c:33:48: error: 'struct task_struct' has no member named 'lowest_stack'
      |    33 |         const unsigned long lowest_sp = current->lowest_stack;
      |       |                                                ^~
      | drivers/misc/lkdtm/stackleak.c:74:23: error: implicit declaration of function 'stackleak_find_top_of_poison' [-Werror=implicit-function-declaration]
      |    74 |         poison_high = stackleak_find_top_of_poison(task_stack_low, untracked_high);
      |       |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      This patch fixes the issue by not compiling the body of the test when
      CONFIG_GCC_PLUGIN_STACKLEAK=n, and replacing this with an unconditional
      XFAIL message. This means the pr_expected_config() in
      check_stackleak_irqoff() is redundant, and so it is removed.
      
      Where an architecture does not support stackleak, the test will log:
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: XFAIL: stackleak is not supported on this arch (HAVE_ARCH_STACKLEAK=n)
      
      Where an architectures does support stackleak, but this has not been
      compiled in, the test will log:
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: XFAIL: stackleak is not enabled (CONFIG_GCC_PLUGIN_STACKLEAK=n)
      
      Where stackleak has been compiled in, the test behaves as usual:
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: stackleak stack usage:
      |   high offset: 336 bytes
      |   current:     688 bytes
      |   lowest:      1232 bytes
      |   tracked:     1232 bytes
      |   untracked:   672 bytes
      |   poisoned:    14136 bytes
      |   low offset:  8 bytes
      | lkdtm: OK: the rest of the thread stack is properly erased
      
      Fixes: f4cfacd92972cc44 ("lkdtm/stackleak: rework boundary management")
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220506121145.1162908-1-mark.rutland@arm.com
      8c6a490e
    • Mark Rutland's avatar
      arm64: entry: use stackleak_erase_on_task_stack() · 88959a39
      Mark Rutland authored
      On arm64 we always call stackleak_erase() on a task stack, and never
      call it on another stack. We can avoid some redundant work by using
      stackleak_erase_on_task_stack(), telling the stackleak code that it's
      being called on a task stack.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Deacon <will@kernel.org>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-14-mark.rutland@arm.com
      88959a39
    • Mark Rutland's avatar
      stackleak: add on/off stack variants · 8111e67d
      Mark Rutland authored
      The stackleak_erase() code dynamically handles being on a task stack or
      another stack. In most cases, this is a fixed property of the caller,
      which the caller is aware of, as an architecture might always return
      using the task stack, or might always return using a trampoline stack.
      
      This patch adds stackleak_erase_on_task_stack() and
      stackleak_erase_off_task_stack() functions which callers can use to
      avoid on_thread_stack() check and associated redundant work when the
      calling stack is known. The existing stackleak_erase() is retained as a
      safe default.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-13-mark.rutland@arm.com
      8111e67d
    • Mark Rutland's avatar
      lkdtm/stackleak: check stack boundaries · f171d695
      Mark Rutland authored
      The stackleak code relies upon the current SP and lowest recorded SP
      falling within expected task stack boundaries.
      
      Check this at the start of the test.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-12-mark.rutland@arm.com
      f171d695
    • Mark Rutland's avatar
      lkdtm/stackleak: prevent unexpected stack usage · f03a5093
      Mark Rutland authored
      The lkdtm_STACKLEAK_ERASING() test is instrumentable and runs with IRQs
      unmasked, so it's possible for unrelated code to clobber the task stack
      and/or manipulate current->lowest_stack while the test is running,
      resulting in spurious failures.
      
      The regular stackleak erasing code is non-instrumentable and runs with
      IRQs masked, preventing similar issues.
      
      Make the body of the test non-instrumentable, and run it with IRQs
      masked, avoiding such spurious failures.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-11-mark.rutland@arm.com
      f03a5093
    • Mark Rutland's avatar
      lkdtm/stackleak: rework boundary management · 72b61896
      Mark Rutland authored
      There are a few problems with the way the LKDTM STACKLEAK_ERASING test
      manipulates the stack pointer and boundary values:
      
      * It uses the address of a local variable to determine the current stack
        pointer, rather than using current_stack_pointer directly. As the
        local variable could be placed anywhere within the stack frame, this
        can be an over-estimate of the true stack pointer value.
      
      * Is uses an estimate of the current stack pointer as the upper boundary
        when scanning for poison, even though prior functions could have used
        more stack (and may have updated current->lowest stack accordingly).
      
      * A pr_info() call is made in the middle of the test. As the printk()
        code is out-of-line and will make use of the stack, this could clobber
        poison and/or adjust current->lowest_stack. It would be better to log
        the metadata after the body of the test to avoid such problems.
      
      These have been observed to result in spurious test failures on arm64.
      
      In addition to this there are a couple of things which are sub-optimal:
      
      * To avoid the STACK_END_MAGIC value, it conditionally modifies 'left'
        if this contains more than a single element, when it could instead
        calculate the bound unconditionally using stackleak_task_low_bound().
      
      * It open-codes the poison scanning. It would be better if this used the
        same helper code as used by erasing function so that the two cannot
        diverge.
      
      This patch reworks the test to avoid these issues, making use of the
      recently introduced helpers to ensure this is aligned with the regular
      stackleak code.
      
      As the new code tests stack boundaries before accessing the stack, there
      is no need to fail early when the tracked or untracked portions of the
      stack extend all the way to the low stack boundary.
      
      As stackleak_find_top_of_poison() is now used to find the top of the
      poisoned region of the stack, the subsequent poison checking starts at
      this boundary and verifies that stackleak_find_top_of_poison() is
      working correctly.
      
      The pr_info() which logged the untracked portion of stack is now moved
      to the end of the function, and logs the size of all the portions of the
      stack relevant to the test, including the portions at the top and bottom
      of the stack which are not erased or scanned, and the current / lowest
      recorded stack usage.
      
      Tested on x86_64:
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: stackleak stack usage:
      |   high offset: 168 bytes
      |   current:     336 bytes
      |   lowest:      656 bytes
      |   tracked:     656 bytes
      |   untracked:   400 bytes
      |   poisoned:    15152 bytes
      |   low offset:  8 bytes
      | lkdtm: OK: the rest of the thread stack is properly erased
      
      Tested on arm64:
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: stackleak stack usage:
      |   high offset: 336 bytes
      |   current:     656 bytes
      |   lowest:      1232 bytes
      |   tracked:     1232 bytes
      |   untracked:   672 bytes
      |   poisoned:    14136 bytes
      |   low offset:  8 bytes
      | lkdtm: OK: the rest of the thread stack is properly erased
      
      Tested on arm64 with deliberate breakage to the starting stack value and
      poison scanning:
      
      | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
      | lkdtm: Performing direct entry STACKLEAK_ERASING
      | lkdtm: FAIL: non-poison value 24 bytes below poison boundary: 0x0
      | lkdtm: FAIL: non-poison value 32 bytes below poison boundary: 0xffff8000083dbc00
      ...
      | lkdtm: FAIL: non-poison value 1912 bytes below poison boundary: 0x78b4b9999e8cb15
      | lkdtm: FAIL: non-poison value 1920 bytes below poison boundary: 0xffff8000083db400
      | lkdtm: stackleak stack usage:
      |   high offset: 336 bytes
      |   current:     688 bytes
      |   lowest:      1232 bytes
      |   tracked:     576 bytes
      |   untracked:   288 bytes
      |   poisoned:    15176 bytes
      |   low offset:  8 bytes
      | lkdtm: FAIL: the thread stack is NOT properly erased!
      | lkdtm: Unexpected! This kernel (5.18.0-rc1-00013-g1f7b1f1e29e0-dirty aarch64) was built with CONFIG_GCC_PLUGIN_STACKLEAK=y
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-10-mark.rutland@arm.com
      72b61896
    • Mark Rutland's avatar
      lkdtm/stackleak: avoid spurious failure · 4130a61c
      Mark Rutland authored
      The lkdtm_STACKLEAK_ERASING() test scans for a contiguous block of
      poison values between the low stack bound and the stack pointer, and
      fails if it does not find a sufficiently large block.
      
      This can happen legitimately if the scan the low stack bound, which
      could occur if functions called prior to lkdtm_STACKLEAK_ERASING() used
      a large amount of stack. If this were to occur, it means that the erased
      portion of the stack is smaller than the size used by the scan, but does
      not cause a functional problem
      
      In practice this is unlikely to happen, but as this is legitimate and
      would not result in a functional problem, the test should not fail in
      this case.
      
      Remove the spurious failure case.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-9-mark.rutland@arm.com
      4130a61c
    • Mark Rutland's avatar
      stackleak: rework poison scanning · 77cf2b6d
      Mark Rutland authored
      Currently we over-estimate the region of stack which must be erased.
      
      To determine the region to be erased, we scan downwards for a contiguous
      block of poison values (or the low bound of the stack). There are a few
      minor problems with this today:
      
      * When we find a block of poison values, we include this block within
        the region to erase.
      
        As this is included within the region to erase, this causes us to
        redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with
        poison.
      
      * As the loop condition checks 'poison_count <= depth', it will run an
        additional iteration after finding the contiguous block of poison,
        decrementing 'erase_low' once more than necessary.
      
        As this is included within the region to erase, this causes us to
        redundantly overwrite an additional unsigned long with poison.
      
      * As we always decrement 'erase_low' after checking an element on the
        stack, we always include the element below this within the region to
        erase.
      
        As this is included within the region to erase, this causes us to
        redundantly overwrite an additional unsigned long with poison.
      
        Note that this is not a functional problem. As the loop condition
        checks 'erase_low > task_stack_low', we'll never clobber the
        STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll
        never fail to erase the element immediately above the STACK_END_MAGIC.
      
      In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)`
      bytes more than necessary, which is unfortunate.
      
      This patch reworks the logic to find the address immediately above the
      poisoned region, by finding the lowest non-poisoned address. This is
      factored into a stackleak_find_top_of_poison() helper both for clarity
      and so that this can be shared with the LKDTM test in subsequent
      patches.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-8-mark.rutland@arm.com
      77cf2b6d
    • Mark Rutland's avatar
      stackleak: rework stack high bound handling · 0cfa2ccd
      Mark Rutland authored
      Prior to returning to userspace, we reset current->lowest_stack to a
      reasonable high bound. Currently we do this by subtracting the arbitrary
      value `THREAD_SIZE/64` from the top of the stack, for reasons lost to
      history.
      
      Looking at configurations today:
      
      * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The
        pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of
        padding above), and so this covers an additional portion of 44 to 60
        bytes.
      
      * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the
        bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs
        at the top of the stack is 168 bytes, and so this cover an additional
        88 bytes of stack (up to 344 with KASAN).
      
      * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages
        and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with
        KASAN). The pt_regs at the top of the stack is 336 bytes, so this can
        fall within the pt_regs, or can cover an additional 688 bytes of
        stack.
      
      Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the
      worst case, this will cause more than 600 bytes of stack to be erased
      for every syscall, even if actual stack usage were substantially
      smaller.
      
      This patches makes this slightly less nonsensical by consistently
      resetting current->lowest_stack to the base of the task pt_regs. For
      clarity and for consistency with the handling of the low bound, the
      generation of the high bound is split into a helper with commentary
      explaining why.
      
      Since the pt_regs at the top of the stack will be clobbered upon the
      next exception entry, we don't need to poison these at exception exit.
      By using task_pt_regs() as the high stack boundary instead of
      current_top_of_stack() we avoid some redundant poisoning, and the
      compiler can share the address generation between the poisoning and
      resetting of `current->lowest_stack`, making the generated code more
      optimal.
      
      It's not clear to me whether the existing `THREAD_SIZE/64` offset was a
      dodgy heuristic to skip the pt_regs, or whether it was attempting to
      minimize the number of times stackleak_check_stack() would have to
      update `current->lowest_stack` when stack usage was shallow at the cost
      of unconditionally poisoning a small portion of the stack for every exit
      to userspace.
      
      For now I've simply removed the offset, and if we need/want to minimize
      updates for shallow stack usage it should be easy to add a better
      heuristic atop, with appropriate commentary so we know what's going on.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com
      0cfa2ccd
    • Mark Rutland's avatar
      stackleak: clarify variable names · 1723d39d
      Mark Rutland authored
      The logic within __stackleak_erase() can be a little hard to follow, as
      `boundary` switches from being the low bound to the high bound mid way
      through the function, and `kstack_ptr` is used to represent the start of
      the region to erase while `boundary` represents the end of the region to
      erase.
      
      Make this a little clearer by consistently using clearer variable names.
      The `boundary` variable is removed, the bounds of the region to erase
      are described by `erase_low` and `erase_high`, and bounds of the task
      stack are described by `task_stack_low` and `task_stack_high`.
      
      As the same time, remove the comment above the variables, since it is
      unclear whether it's intended as rationale, a complaint, or a TODO, and
      is more confusing than helpful.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-6-mark.rutland@arm.com
      1723d39d
    • Mark Rutland's avatar
      stackleak: rework stack low bound handling · 9ec79840
      Mark Rutland authored
      In stackleak_task_init(), stackleak_track_stack(), and
      __stackleak_erase(), we open-code skipping the STACK_END_MAGIC at the
      bottom of the stack. Each case is implemented slightly differently, and
      only the __stackleak_erase() case is commented.
      
      In stackleak_task_init() and stackleak_track_stack() we unconditionally
      add sizeof(unsigned long) to the lowest stack address. In
      stackleak_task_init() we use end_of_stack() for this, and in
      stackleak_track_stack() we use task_stack_page(). In __stackleak_erase()
      we handle this by detecting if `kstack_ptr` has hit the stack end
      boundary, and if so, conditionally moving it above the magic.
      
      This patch adds a new stackleak_task_low_bound() helper which is used in
      all three cases, which unconditionally adds sizeof(unsigned long) to the
      lowest address on the task stack, with commentary as to why. This uses
      end_of_stack() as stackleak_task_init() did prior to this patch, as this
      is consistent with the code in kernel/fork.c which initializes the
      STACK_END_MAGIC value.
      
      In __stackleak_erase() we no longer need to check whether we've spilled
      into the STACK_END_MAGIC value, as stackleak_track_stack() ensures that
      `current->lowest_stack` stops immediately above this, and similarly the
      poison scan will stop immediately above this.
      
      For stackleak_task_init() and stackleak_track_stack() this results in no
      change to code generation. For __stackleak_erase() the generated
      assembly is slightly simpler and shorter.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-5-mark.rutland@arm.com
      9ec79840
    • Mark Rutland's avatar
      stackleak: remove redundant check · ac7838b4
      Mark Rutland authored
      In __stackleak_erase() we check that the `erase_low` value derived from
      `current->lowest_stack` is above the lowest legitimate stack pointer
      value, but this is already enforced by stackleak_track_stack() when
      recording the lowest stack value.
      
      Remove the redundant check.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-4-mark.rutland@arm.com
      ac7838b4
    • Mark Rutland's avatar
      stackleak: move skip_erasing() check earlier · a12685e2
      Mark Rutland authored
      In stackleak_erase() we check skip_erasing() after accessing some fields
      from current. As generating the address of current uses asm which
      hazards with the static branch asm, this work is always performed, even
      when the static branch is patched to jump to the return at the end of the
      function.
      
      This patch avoids this redundant work by moving the skip_erasing() check
      earlier.
      
      To avoid complicating initialization within stackleak_erase(), the body
      of the function is split out into a __stackleak_erase() helper, with the
      check left in a wrapper function. The __stackleak_erase() helper is
      marked __always_inline to ensure that this is inlined into
      stackleak_erase() and not instrumented.
      
      Before this patch, on x86-64 w/ GCC 11.1.0 the start of the function is:
      
      <stackleak_erase>:
         65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
         00 00
         48 8b 48 20             mov    0x20(%rax),%rcx
         48 8b 80 98 0a 00 00    mov    0xa98(%rax),%rax
         66 90                   xchg   %ax,%ax  <------------ static branch
         48 89 c2                mov    %rax,%rdx
         48 29 ca                sub    %rcx,%rdx
         48 81 fa ff 3f 00 00    cmp    $0x3fff,%rdx
      
      After this patch, on x86-64 w/ GCC 11.1.0 the start of the function is:
      
      <stackleak_erase>:
         0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)  <--- static branch
         65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
         00 00
         48 8b 48 20             mov    0x20(%rax),%rcx
         48 8b 80 98 0a 00 00    mov    0xa98(%rax),%rax
         48 89 c2                mov    %rax,%rdx
         48 29 ca                sub    %rcx,%rdx
         48 81 fa ff 3f 00 00    cmp    $0x3fff,%rdx
      
      Before this patch, on arm64 w/ GCC 11.1.0 the start of the function is:
      
      <stackleak_erase>:
         d503245f        bti     c
         d5384100        mrs     x0, sp_el0
         f9401003        ldr     x3, [x0, #32]
         f9451000        ldr     x0, [x0, #2592]
         d503201f        nop  <------------------------------- static branch
         d503233f        paciasp
         cb030002        sub     x2, x0, x3
         d287ffe1        mov     x1, #0x3fff
         eb01005f        cmp     x2, x1
      
      After this patch, on arm64 w/ GCC 11.1.0 the start of the function is:
      
      <stackleak_erase>:
         d503245f        bti     c
         d503201f        nop  <------------------------------- static branch
         d503233f        paciasp
         d5384100        mrs     x0, sp_el0
         f9401003        ldr     x3, [x0, #32]
         d287ffe1        mov     x1, #0x3fff
         f9451000        ldr     x0, [x0, #2592]
         cb030002        sub     x2, x0, x3
         eb01005f        cmp     x2, x1
      
      While this may not be a huge win on its own, moving the static branch
      will permit further optimization of the body of the function in
      subsequent patches.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-3-mark.rutland@arm.com
      a12685e2
    • Mark Rutland's avatar
      arm64: stackleak: fix current_top_of_stack() · e85094c3
      Mark Rutland authored
      Due to some historical confusion, arm64's current_top_of_stack() isn't
      what the stackleak code expects. This could in theory result in a number
      of problems, and practically results in an unnecessary performance hit.
      We can avoid this by aligning the arm64 implementation with the x86
      implementation.
      
      The arm64 implementation of current_top_of_stack() was added
      specifically for stackleak in commit:
      
        0b3e3366 ("arm64: Add support for STACKLEAK gcc plugin")
      
      This was intended to be equivalent to the x86 implementation, but the
      implementation, semantics, and performance characteristics differ
      wildly:
      
      * On x86, current_top_of_stack() returns the top of the current task's
        task stack, regardless of which stack is in active use.
      
        The implementation accesses a percpu variable which the x86 entry code
        maintains, and returns the location immediately above the pt_regs on
        the task stack (above which x86 has some padding).
      
      * On arm64 current_top_of_stack() returns the top of the stack in active
        use (i.e. the one which is currently being used).
      
        The implementation checks the SP against a number of
        potentially-accessible stacks, and will BUG() if no stack is found.
      
      The core stackleak_erase() code determines the upper bound of stack to
      erase with:
      
      | if (on_thread_stack())
      |         boundary = current_stack_pointer;
      | else
      |         boundary = current_top_of_stack();
      
      On arm64 stackleak_erase() is always called on a task stack, and
      on_thread_stack() should always be true. On x86, stackleak_erase() is
      mostly called on a trampoline stack, and is sometimes called on a task
      stack.
      
      Currently, this results in a lot of unnecessary code being generated for
      arm64 for the impossible !on_thread_stack() case. Some of this is
      inlined, bloating stackleak_erase(), while portions of this are left
      out-of-line and permitted to be instrumented (which would be a
      functional problem if that code were reachable).
      
      As a first step towards improving this, this patch aligns arm64's
      implementation of current_top_of_stack() with x86's, always returning
      the top of the current task's stack. With GCC 11.1.0 this results in the
      bulk of the unnecessary code being removed, including all of the
      out-of-line instrumentable code.
      
      While I don't believe there's a functional problem in practice I've
      marked this as a fix since the semantic was clearly wrong, the fix
      itself is simple, and other code might rely upon this in future.
      
      Fixes: 0b3e3366 ("arm64: Add support for STACKLEAK gcc plugin")
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Deacon <will@kernel.org>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220427173128.2603085-2-mark.rutland@arm.com
      e85094c3
    • Kees Cook's avatar
      randstruct: Enable Clang support · 035f7f87
      Kees Cook authored
      Clang 15 will support randstruct via the -frandomize-layout-seed-file=...
      option. Update the Kconfig and Makefile to recognize this feature.
      
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: linux-kbuild@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220503205503.3054173-7-keescook@chromium.org
      035f7f87
    • Kees Cook's avatar
      randstruct: Move seed generation into scripts/basic/ · be2b34fa
      Kees Cook authored
      To enable Clang randstruct support, move the structure layout
      randomization seed generation out of scripts/gcc-plugins/ into
      scripts/basic/ so it happens early enough that it can be used by either
      compiler implementation. The gcc-plugin still builds its own header file,
      but now does so from the common "randstruct.seed" file.
      
      Cc: linux-hardening@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220503205503.3054173-6-keescook@chromium.org
      be2b34fa
    • Kees Cook's avatar
      randstruct: Split randstruct Makefile and CFLAGS · 613f4b3e
      Kees Cook authored
      To enable the new Clang randstruct implementation[1], move
      randstruct into its own Makefile and split the CFLAGS from
      GCC_PLUGINS_CFLAGS into RANDSTRUCT_CFLAGS.
      
      [1] https://reviews.llvm.org/D121556
      
      Cc: linux-hardening@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220503205503.3054173-5-keescook@chromium.org
      613f4b3e
    • Kees Cook's avatar
      randstruct: Reorganize Kconfigs and attribute macros · 595b893e
      Kees Cook authored
      In preparation for Clang supporting randstruct, reorganize the Kconfigs,
      move the attribute macros, and generalize the feature to be named
      CONFIG_RANDSTRUCT for on/off, CONFIG_RANDSTRUCT_FULL for the full
      randomization mode, and CONFIG_RANDSTRUCT_PERFORMANCE for the cache-line
      sized mode.
      
      Cc: linux-hardening@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220503205503.3054173-4-keescook@chromium.org
      595b893e
    • Kees Cook's avatar
      sancov: Split plugin build from plugin CFLAGS · d3646589
      Kees Cook authored
      When the sancov_plugin is enabled, it gets added to gcc-plugin-y which
      is used to populate both GCC_PLUGIN (for building the plugin) and
      GCC_PLUGINS_CFLAGS (for enabling and options). Instead of adding sancov
      to both and then removing it from GCC_PLUGINS_CFLAGS, create a separate
      list, gcc-plugin-external-y, which is only added to GCC_PLUGIN.
      
      This will also be used by the coming randstruct build changes.
      
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: linux-kbuild@vger.kernel.org
      Cc: linux-hardening@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220503205503.3054173-3-keescook@chromium.org
      d3646589
    • Kees Cook's avatar
      netfs: Eliminate Clang randstruct warning · 3b5eed3c
      Kees Cook authored
      Clang's structure layout randomization feature gets upset when it sees
      struct inode (which is randomized) cast to struct netfs_i_context. This
      is due to seeing the inode pointer as being treated as an array of inodes,
      rather than "something else, following struct inode".
      
      Since netfs can't use container_of() (since it doesn't know what the
      true containing struct is), it uses this direct offset instead. Adjust
      the code to better reflect what is happening: an arbitrary pointer is
      being adjusted and cast to something else: use a "void *" for the math.
      The resulting binary output is the same, but Clang no longer sees an
      unexpected cross-structure cast:
      
      In file included from ../fs/nfs/inode.c:50:
      In file included from ../fs/nfs/fscache.h:15:
      In file included from ../include/linux/fscache.h:18:
      ../include/linux/netfs.h:298:9: error: casting from randomized structure pointer type 'struct inode *' to 'struct netfs_i_context *'
              return (struct netfs_i_context *)(inode + 1);
                     ^
      1 error generated.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220503205503.3054173-2-keescook@chromium.orgReviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Link: https://lore.kernel.org/lkml/7562f8eccd7cc0e447becfe9912179088784e3b9.camel@kernel.org
      3b5eed3c
  3. 13 Apr, 2022 10 commits
  4. 12 Apr, 2022 3 commits
    • Mikulas Patocka's avatar
      stat: fix inconsistency between struct stat and struct compat_stat · 932aba1e
      Mikulas Patocka authored
      struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
      st_dev and st_rdev; struct compat_stat (defined in
      arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
      a 16-bit padding.
      
      This patch fixes struct compat_stat to match struct stat.
      
      [ Historical note: the old x86 'struct stat' did have that 16-bit field
        that the compat layer had kept around, but it was changes back in 2003
        by "struct stat - support larger dev_t":
      
          https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d
      
        and back in those days, the x86_64 port was still new, and separate
        from the i386 code, and had already picked up the old version with a
        16-bit st_dev field ]
      
      Note that we can't change compat_dev_t because it is used by
      compat_loop_info.
      
      Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
      old_valid_dev to test if the value fits into them.  This fixes
      -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
      number 259.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      932aba1e
    • Jason A. Donenfeld's avatar
      gcc-plugins: latent_entropy: use /dev/urandom · c40160f2
      Jason A. Donenfeld authored
      While the latent entropy plugin mostly doesn't derive entropy from
      get_random_const() for measuring the call graph, when __latent_entropy is
      applied to a constant, then it's initialized statically to output from
      get_random_const(). In that case, this data is derived from a 64-bit
      seed, which means a buffer of 512 bits doesn't really have that amount
      of compile-time entropy.
      
      This patch fixes that shortcoming by just buffering chunks of
      /dev/urandom output and doling it out as requested.
      
      At the same time, it's important that we don't break the use of
      -frandom-seed, for people who want the runtime benefits of the latent
      entropy plugin, while still having compile-time determinism. In that
      case, we detect whether gcc's set_random_seed() has been called by
      making a call to get_random_seed(noinit=true) in the plugin init
      function, which is called after set_random_seed() is called but before
      anything that calls get_random_seed(noinit=false), and seeing if it's
      zero or not. If it's not zero, we're in deterministic mode, and so we
      just generate numbers with a basic xorshift prng.
      
      Note that we don't detect if -frandom-seed is being used using the
      documented local_tick variable, because it's assigned via:
         local_tick = (unsigned) tv.tv_sec * 1000 + tv.tv_usec / 1000;
      which may well overflow and become -1 on its own, and so isn't
      reliable: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105171
      
      [kees: The 256 byte rnd_buf size was chosen based on average (250),
       median (64), and std deviation (575) bytes of used entropy for a
       defconfig x86_64 build]
      
      Fixes: 38addce8 ("gcc-plugins: Add latent_entropy plugin")
      Cc: stable@vger.kernel.org
      Cc: PaX Team <pageexec@freemail.hu>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220405222815.21155-1-Jason@zx2c4.com
      c40160f2
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v5.18-2' of... · 7281a59c
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform drivers fixes from Hans de Goede:
      
       - Documentation and compilation warning fixes
      
       - Kconfig dep fixes
      
       - Misc small code cleanups
      
      * tag 'platform-drivers-x86-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86: amd-pmc: Fix compilation without CONFIG_SUSPEND
        platform/x86: acerhdf: Cleanup str_starts_with()
        Documentation/ABI: sysfs-class-firmware-attributes: Misc. cleanups
        Documentation/ABI: sysfs-class-firmware-attributes: Fix Sphinx errors
        Documentation/ABI: sysfs-driver-intel_sdsi: Fix sphinx warnings
        platform/x86: barco-p50-gpio: Fix duplicate included linux/io.h
        platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative
        platform/x86: think-lmi: certificate support clean ups
      7281a59c
  5. 11 Apr, 2022 5 commits