Commits · 61f60bac8c05f8ecd2ae2a6360520b91a45be9a2 · Kirill Smelkov / linux

10 May, 2022 2 commits

gcc-plugins: Change all version strings match kernel · 61f60bac

Kees Cook authored May 10, 2022

It's not meaningful for the GCC plugins to track their versions separately
from the rest of the kernel. Switch all versions to the kernel version.

Fix mismatched indenting while we're at it.

Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>

61f60bac

randomize_kstack: Improve docs on requirements/rationale · 1ff29758

Kees Cook authored May 10, 2022

There were some recent questions about where and why to use the
random_kstack routines when applying them to new architectures[1].
Update the header comments to reflect the design choices for the
routines.

[1] https://lore.kernel.org/lkml/1652173338.7bltwybi0c.astroid@bobo.none

Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>

1ff29758

08 May, 2022 20 commits

lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n · 8c6a490e

Mark Rutland authored May 06, 2022

Recent rework broke building LKDTM when CONFIG_GCC_PLUGIN_STACKLEAK=n.
This patch fixes that breakage.

Prior to recent stackleak rework, the LKDTM STACKLEAK_ERASING code could
be built when the kernel was not built with stackleak support, and would
run a test that would almost certainly fail (or pass by sheer cosmic
coincidence), e.g.

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: checking unused part of the thread stack (15560 bytes)...
| lkdtm: FAIL: the erased part is not found (checked 15560 bytes)
| lkdtm: FAIL: the thread stack is NOT properly erased!
| lkdtm: This is probably expected, since this kernel (5.18.0-rc2 aarch64) was built *without* CONFIG_GCC_PLUGIN_STACKLEAK=y

The recent rework to the test made it more accurate by using helpers
which are only defined when CONFIG_GCC_PLUGIN_STACKLEAK=y, and so when
building LKDTM when CONFIG_GCC_PLUGIN_STACKLEAK=n, we get a build
failure:

| drivers/misc/lkdtm/stackleak.c: In function 'check_stackleak_irqoff':
| drivers/misc/lkdtm/stackleak.c:30:46: error: implicit declaration of function 'stackleak_task_low_bound' [-Werror=implicit-function-declaration]
|    30 |         const unsigned long task_stack_low = stackleak_task_low_bound(current);
|       |                                              ^~~~~~~~~~~~~~~~~~~~~~~~
| drivers/misc/lkdtm/stackleak.c:31:47: error: implicit declaration of function 'stackleak_task_high_bound'; did you mean 'stackleak_task_init'? [-Werror=implicit-function-declaration]
|    31 |         const unsigned long task_stack_high = stackleak_task_high_bound(current);
|       |                                               ^~~~~~~~~~~~~~~~~~~~~~~~~
|       |                                               stackleak_task_init
| drivers/misc/lkdtm/stackleak.c:33:48: error: 'struct task_struct' has no member named 'lowest_stack'
|    33 |         const unsigned long lowest_sp = current->lowest_stack;
|       |                                                ^~
| drivers/misc/lkdtm/stackleak.c:74:23: error: implicit declaration of function 'stackleak_find_top_of_poison' [-Werror=implicit-function-declaration]
|    74 |         poison_high = stackleak_find_top_of_poison(task_stack_low, untracked_high);
|       |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

This patch fixes the issue by not compiling the body of the test when
CONFIG_GCC_PLUGIN_STACKLEAK=n, and replacing this with an unconditional
XFAIL message. This means the pr_expected_config() in
check_stackleak_irqoff() is redundant, and so it is removed.

Where an architecture does not support stackleak, the test will log:

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: XFAIL: stackleak is not supported on this arch (HAVE_ARCH_STACKLEAK=n)

Where an architectures does support stackleak, but this has not been
compiled in, the test will log:

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: XFAIL: stackleak is not enabled (CONFIG_GCC_PLUGIN_STACKLEAK=n)

Where stackleak has been compiled in, the test behaves as usual:

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: stackleak stack usage:
|   high offset: 336 bytes
|   current:     688 bytes
|   lowest:      1232 bytes
|   tracked:     1232 bytes
|   untracked:   672 bytes
|   poisoned:    14136 bytes
|   low offset:  8 bytes
| lkdtm: OK: the rest of the thread stack is properly erased

Fixes: f4cfacd92972cc44 ("lkdtm/stackleak: rework boundary management")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220506121145.1162908-1-mark.rutland@arm.com

8c6a490e

arm64: entry: use stackleak_erase_on_task_stack() · 88959a39

Mark Rutland authored Apr 27, 2022

On arm64 we always call stackleak_erase() on a task stack, and never
call it on another stack. We can avoid some redundant work by using
stackleak_erase_on_task_stack(), telling the stackleak code that it's
being called on a task stack.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-14-mark.rutland@arm.com

88959a39

stackleak: add on/off stack variants · 8111e67d

Mark Rutland authored Apr 27, 2022

The stackleak_erase() code dynamically handles being on a task stack or
another stack. In most cases, this is a fixed property of the caller,
which the caller is aware of, as an architecture might always return
using the task stack, or might always return using a trampoline stack.

This patch adds stackleak_erase_on_task_stack() and
stackleak_erase_off_task_stack() functions which callers can use to
avoid on_thread_stack() check and associated redundant work when the
calling stack is known. The existing stackleak_erase() is retained as a
safe default.

There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-13-mark.rutland@arm.com

8111e67d

lkdtm/stackleak: check stack boundaries · f171d695

Mark Rutland authored Apr 27, 2022

The stackleak code relies upon the current SP and lowest recorded SP
falling within expected task stack boundaries.

Check this at the start of the test.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-12-mark.rutland@arm.com

f171d695

lkdtm/stackleak: prevent unexpected stack usage · f03a5093

Mark Rutland authored Apr 27, 2022

The lkdtm_STACKLEAK_ERASING() test is instrumentable and runs with IRQs
unmasked, so it's possible for unrelated code to clobber the task stack
and/or manipulate current->lowest_stack while the test is running,
resulting in spurious failures.

The regular stackleak erasing code is non-instrumentable and runs with
IRQs masked, preventing similar issues.

Make the body of the test non-instrumentable, and run it with IRQs
masked, avoiding such spurious failures.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-11-mark.rutland@arm.com

f03a5093

lkdtm/stackleak: rework boundary management · 72b61896

Mark Rutland authored Apr 27, 2022

There are a few problems with the way the LKDTM STACKLEAK_ERASING test
manipulates the stack pointer and boundary values:

* It uses the address of a local variable to determine the current stack
  pointer, rather than using current_stack_pointer directly. As the
  local variable could be placed anywhere within the stack frame, this
  can be an over-estimate of the true stack pointer value.

* Is uses an estimate of the current stack pointer as the upper boundary
  when scanning for poison, even though prior functions could have used
  more stack (and may have updated current->lowest stack accordingly).

* A pr_info() call is made in the middle of the test. As the printk()
  code is out-of-line and will make use of the stack, this could clobber
  poison and/or adjust current->lowest_stack. It would be better to log
  the metadata after the body of the test to avoid such problems.

These have been observed to result in spurious test failures on arm64.

In addition to this there are a couple of things which are sub-optimal:

* To avoid the STACK_END_MAGIC value, it conditionally modifies 'left'
  if this contains more than a single element, when it could instead
  calculate the bound unconditionally using stackleak_task_low_bound().

* It open-codes the poison scanning. It would be better if this used the
  same helper code as used by erasing function so that the two cannot
  diverge.

This patch reworks the test to avoid these issues, making use of the
recently introduced helpers to ensure this is aligned with the regular
stackleak code.

As the new code tests stack boundaries before accessing the stack, there
is no need to fail early when the tracked or untracked portions of the
stack extend all the way to the low stack boundary.

As stackleak_find_top_of_poison() is now used to find the top of the
poisoned region of the stack, the subsequent poison checking starts at
this boundary and verifies that stackleak_find_top_of_poison() is
working correctly.

The pr_info() which logged the untracked portion of stack is now moved
to the end of the function, and logs the size of all the portions of the
stack relevant to the test, including the portions at the top and bottom
of the stack which are not erased or scanned, and the current / lowest
recorded stack usage.

Tested on x86_64:

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: stackleak stack usage:
|   high offset: 168 bytes
|   current:     336 bytes
|   lowest:      656 bytes
|   tracked:     656 bytes
|   untracked:   400 bytes
|   poisoned:    15152 bytes
|   low offset:  8 bytes
| lkdtm: OK: the rest of the thread stack is properly erased

Tested on arm64:

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: stackleak stack usage:
|   high offset: 336 bytes
|   current:     656 bytes
|   lowest:      1232 bytes
|   tracked:     1232 bytes
|   untracked:   672 bytes
|   poisoned:    14136 bytes
|   low offset:  8 bytes
| lkdtm: OK: the rest of the thread stack is properly erased

Tested on arm64 with deliberate breakage to the starting stack value and
poison scanning:

| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry STACKLEAK_ERASING
| lkdtm: FAIL: non-poison value 24 bytes below poison boundary: 0x0
| lkdtm: FAIL: non-poison value 32 bytes below poison boundary: 0xffff8000083dbc00
...
| lkdtm: FAIL: non-poison value 1912 bytes below poison boundary: 0x78b4b9999e8cb15
| lkdtm: FAIL: non-poison value 1920 bytes below poison boundary: 0xffff8000083db400
| lkdtm: stackleak stack usage:
|   high offset: 336 bytes
|   current:     688 bytes
|   lowest:      1232 bytes
|   tracked:     576 bytes
|   untracked:   288 bytes
|   poisoned:    15176 bytes
|   low offset:  8 bytes
| lkdtm: FAIL: the thread stack is NOT properly erased!
| lkdtm: Unexpected! This kernel (5.18.0-rc1-00013-g1f7b1f1e29e0-dirty aarch64) was built with CONFIG_GCC_PLUGIN_STACKLEAK=y
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-10-mark.rutland@arm.com

72b61896

lkdtm/stackleak: avoid spurious failure · 4130a61c

Mark Rutland authored Apr 27, 2022

The lkdtm_STACKLEAK_ERASING() test scans for a contiguous block of
poison values between the low stack bound and the stack pointer, and
fails if it does not find a sufficiently large block.

This can happen legitimately if the scan the low stack bound, which
could occur if functions called prior to lkdtm_STACKLEAK_ERASING() used
a large amount of stack. If this were to occur, it means that the erased
portion of the stack is smaller than the size used by the scan, but does
not cause a functional problem

In practice this is unlikely to happen, but as this is legitimate and
would not result in a functional problem, the test should not fail in
this case.

Remove the spurious failure case.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-9-mark.rutland@arm.com

4130a61c

stackleak: rework poison scanning · 77cf2b6d

Mark Rutland authored Apr 27, 2022

Currently we over-estimate the region of stack which must be erased.

To determine the region to be erased, we scan downwards for a contiguous
block of poison values (or the low bound of the stack). There are a few
minor problems with this today:

* When we find a block of poison values, we include this block within
  the region to erase.

  As this is included within the region to erase, this causes us to
  redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with
  poison.

* As the loop condition checks 'poison_count <= depth', it will run an
  additional iteration after finding the contiguous block of poison,
  decrementing 'erase_low' once more than necessary.

  As this is included within the region to erase, this causes us to
  redundantly overwrite an additional unsigned long with poison.

* As we always decrement 'erase_low' after checking an element on the
  stack, we always include the element below this within the region to
  erase.

  As this is included within the region to erase, this causes us to
  redundantly overwrite an additional unsigned long with poison.

  Note that this is not a functional problem. As the loop condition
  checks 'erase_low > task_stack_low', we'll never clobber the
  STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll
  never fail to erase the element immediately above the STACK_END_MAGIC.

In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)`
bytes more than necessary, which is unfortunate.

This patch reworks the logic to find the address immediately above the
poisoned region, by finding the lowest non-poisoned address. This is
factored into a stackleak_find_top_of_poison() helper both for clarity
and so that this can be shared with the LKDTM test in subsequent
patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-8-mark.rutland@arm.com

77cf2b6d

stackleak: rework stack high bound handling · 0cfa2ccd

Mark Rutland authored Apr 27, 2022

Prior to returning to userspace, we reset current->lowest_stack to a
reasonable high bound. Currently we do this by subtracting the arbitrary
value `THREAD_SIZE/64` from the top of the stack, for reasons lost to
history.

Looking at configurations today:

* On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The
  pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of
  padding above), and so this covers an additional portion of 44 to 60
  bytes.

* On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the
  bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs
  at the top of the stack is 168 bytes, and so this cover an additional
  88 bytes of stack (up to 344 with KASAN).

* On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages
  and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with
  KASAN). The pt_regs at the top of the stack is 336 bytes, so this can
  fall within the pt_regs, or can cover an additional 688 bytes of
  stack.

Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the
worst case, this will cause more than 600 bytes of stack to be erased
for every syscall, even if actual stack usage were substantially
smaller.

This patches makes this slightly less nonsensical by consistently
resetting current->lowest_stack to the base of the task pt_regs. For
clarity and for consistency with the handling of the low bound, the
generation of the high bound is split into a helper with commentary
explaining why.

Since the pt_regs at the top of the stack will be clobbered upon the
next exception entry, we don't need to poison these at exception exit.
By using task_pt_regs() as the high stack boundary instead of
current_top_of_stack() we avoid some redundant poisoning, and the
compiler can share the address generation between the poisoning and
resetting of `current->lowest_stack`, making the generated code more
optimal.

It's not clear to me whether the existing `THREAD_SIZE/64` offset was a
dodgy heuristic to skip the pt_regs, or whether it was attempting to
minimize the number of times stackleak_check_stack() would have to
update `current->lowest_stack` when stack usage was shallow at the cost
of unconditionally poisoning a small portion of the stack for every exit
to userspace.

For now I've simply removed the offset, and if we need/want to minimize
updates for shallow stack usage it should be easy to add a better
heuristic atop, with appropriate commentary so we know what's going on.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com

0cfa2ccd

stackleak: clarify variable names · 1723d39d

Mark Rutland authored Apr 27, 2022

The logic within __stackleak_erase() can be a little hard to follow, as
`boundary` switches from being the low bound to the high bound mid way
through the function, and `kstack_ptr` is used to represent the start of
the region to erase while `boundary` represents the end of the region to
erase.

Make this a little clearer by consistently using clearer variable names.
The `boundary` variable is removed, the bounds of the region to erase
are described by `erase_low` and `erase_high`, and bounds of the task
stack are described by `task_stack_low` and `task_stack_high`.

As the same time, remove the comment above the variables, since it is
unclear whether it's intended as rationale, a complaint, or a TODO, and
is more confusing than helpful.

There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-6-mark.rutland@arm.com

1723d39d

stackleak: rework stack low bound handling · 9ec79840

Mark Rutland authored Apr 27, 2022

In stackleak_task_init(), stackleak_track_stack(), and
__stackleak_erase(), we open-code skipping the STACK_END_MAGIC at the
bottom of the stack. Each case is implemented slightly differently, and
only the __stackleak_erase() case is commented.

In stackleak_task_init() and stackleak_track_stack() we unconditionally
add sizeof(unsigned long) to the lowest stack address. In
stackleak_task_init() we use end_of_stack() for this, and in
stackleak_track_stack() we use task_stack_page(). In __stackleak_erase()
we handle this by detecting if `kstack_ptr` has hit the stack end
boundary, and if so, conditionally moving it above the magic.

This patch adds a new stackleak_task_low_bound() helper which is used in
all three cases, which unconditionally adds sizeof(unsigned long) to the
lowest address on the task stack, with commentary as to why. This uses
end_of_stack() as stackleak_task_init() did prior to this patch, as this
is consistent with the code in kernel/fork.c which initializes the
STACK_END_MAGIC value.

In __stackleak_erase() we no longer need to check whether we've spilled
into the STACK_END_MAGIC value, as stackleak_track_stack() ensures that
`current->lowest_stack` stops immediately above this, and similarly the
poison scan will stop immediately above this.

For stackleak_task_init() and stackleak_track_stack() this results in no
change to code generation. For __stackleak_erase() the generated
assembly is slightly simpler and shorter.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-5-mark.rutland@arm.com

9ec79840

stackleak: remove redundant check · ac7838b4

Mark Rutland authored Apr 27, 2022

In __stackleak_erase() we check that the `erase_low` value derived from
`current->lowest_stack` is above the lowest legitimate stack pointer
value, but this is already enforced by stackleak_track_stack() when
recording the lowest stack value.

Remove the redundant check.

There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-4-mark.rutland@arm.com

ac7838b4

stackleak: move skip_erasing() check earlier · a12685e2

Mark Rutland authored Apr 27, 2022

In stackleak_erase() we check skip_erasing() after accessing some fields
from current. As generating the address of current uses asm which
hazards with the static branch asm, this work is always performed, even
when the static branch is patched to jump to the return at the end of the
function.

This patch avoids this redundant work by moving the skip_erasing() check
earlier.

To avoid complicating initialization within stackleak_erase(), the body
of the function is split out into a __stackleak_erase() helper, with the
check left in a wrapper function. The __stackleak_erase() helper is
marked __always_inline to ensure that this is inlined into
stackleak_erase() and not instrumented.

Before this patch, on x86-64 w/ GCC 11.1.0 the start of the function is:

<stackleak_erase>:
   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
   00 00
   48 8b 48 20             mov    0x20(%rax),%rcx
   48 8b 80 98 0a 00 00    mov    0xa98(%rax),%rax
   66 90                   xchg   %ax,%ax  <------------ static branch
   48 89 c2                mov    %rax,%rdx
   48 29 ca                sub    %rcx,%rdx
   48 81 fa ff 3f 00 00    cmp    $0x3fff,%rdx

After this patch, on x86-64 w/ GCC 11.1.0 the start of the function is:

<stackleak_erase>:
   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)  <--- static branch
   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
   00 00
   48 8b 48 20             mov    0x20(%rax),%rcx
   48 8b 80 98 0a 00 00    mov    0xa98(%rax),%rax
   48 89 c2                mov    %rax,%rdx
   48 29 ca                sub    %rcx,%rdx
   48 81 fa ff 3f 00 00    cmp    $0x3fff,%rdx

Before this patch, on arm64 w/ GCC 11.1.0 the start of the function is:

<stackleak_erase>:
   d503245f        bti     c
   d5384100        mrs     x0, sp_el0
   f9401003        ldr     x3, [x0, #32]
   f9451000        ldr     x0, [x0, #2592]
   d503201f        nop  <------------------------------- static branch
   d503233f        paciasp
   cb030002        sub     x2, x0, x3
   d287ffe1        mov     x1, #0x3fff
   eb01005f        cmp     x2, x1

After this patch, on arm64 w/ GCC 11.1.0 the start of the function is:

<stackleak_erase>:
   d503245f        bti     c
   d503201f        nop  <------------------------------- static branch
   d503233f        paciasp
   d5384100        mrs     x0, sp_el0
   f9401003        ldr     x3, [x0, #32]
   d287ffe1        mov     x1, #0x3fff
   f9451000        ldr     x0, [x0, #2592]
   cb030002        sub     x2, x0, x3
   eb01005f        cmp     x2, x1

While this may not be a huge win on its own, moving the static branch
will permit further optimization of the body of the function in
subsequent patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-3-mark.rutland@arm.com

a12685e2

arm64: stackleak: fix current_top_of_stack() · e85094c3

Mark Rutland authored Apr 27, 2022

Due to some historical confusion, arm64's current_top_of_stack() isn't
what the stackleak code expects. This could in theory result in a number
of problems, and practically results in an unnecessary performance hit.
We can avoid this by aligning the arm64 implementation with the x86
implementation.

The arm64 implementation of current_top_of_stack() was added
specifically for stackleak in commit:

  0b3e3366 ("arm64: Add support for STACKLEAK gcc plugin")

This was intended to be equivalent to the x86 implementation, but the
implementation, semantics, and performance characteristics differ
wildly:

* On x86, current_top_of_stack() returns the top of the current task's
  task stack, regardless of which stack is in active use.

  The implementation accesses a percpu variable which the x86 entry code
  maintains, and returns the location immediately above the pt_regs on
  the task stack (above which x86 has some padding).

* On arm64 current_top_of_stack() returns the top of the stack in active
  use (i.e. the one which is currently being used).

  The implementation checks the SP against a number of
  potentially-accessible stacks, and will BUG() if no stack is found.

The core stackleak_erase() code determines the upper bound of stack to
erase with:

| if (on_thread_stack())
|         boundary = current_stack_pointer;
| else
|         boundary = current_top_of_stack();

On arm64 stackleak_erase() is always called on a task stack, and
on_thread_stack() should always be true. On x86, stackleak_erase() is
mostly called on a trampoline stack, and is sometimes called on a task
stack.

Currently, this results in a lot of unnecessary code being generated for
arm64 for the impossible !on_thread_stack() case. Some of this is
inlined, bloating stackleak_erase(), while portions of this are left
out-of-line and permitted to be instrumented (which would be a
functional problem if that code were reachable).

As a first step towards improving this, this patch aligns arm64's
implementation of current_top_of_stack() with x86's, always returning
the top of the current task's stack. With GCC 11.1.0 this results in the
bulk of the unnecessary code being removed, including all of the
out-of-line instrumentable code.

While I don't believe there's a functional problem in practice I've
marked this as a fix since the semantic was clearly wrong, the fix
itself is simple, and other code might rely upon this in future.

Fixes: 0b3e3366 ("arm64: Add support for STACKLEAK gcc plugin")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220427173128.2603085-2-mark.rutland@arm.com

e85094c3

randstruct: Enable Clang support · 035f7f87

Kees Cook authored May 03, 2022

Clang 15 will support randstruct via the -frandomize-layout-seed-file=...
option. Update the Kconfig and Makefile to recognize this feature.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-7-keescook@chromium.org

035f7f87

randstruct: Move seed generation into scripts/basic/ · be2b34fa

Kees Cook authored May 03, 2022

To enable Clang randstruct support, move the structure layout
randomization seed generation out of scripts/gcc-plugins/ into
scripts/basic/ so it happens early enough that it can be used by either
compiler implementation. The gcc-plugin still builds its own header file,
but now does so from the common "randstruct.seed" file.

Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-6-keescook@chromium.org

be2b34fa

randstruct: Split randstruct Makefile and CFLAGS · 613f4b3e

Kees Cook authored May 03, 2022

To enable the new Clang randstruct implementation[1], move
randstruct into its own Makefile and split the CFLAGS from
GCC_PLUGINS_CFLAGS into RANDSTRUCT_CFLAGS.

[1] https://reviews.llvm.org/D121556

Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-5-keescook@chromium.org

613f4b3e

randstruct: Reorganize Kconfigs and attribute macros · 595b893e

Kees Cook authored May 03, 2022

In preparation for Clang supporting randstruct, reorganize the Kconfigs,
move the attribute macros, and generalize the feature to be named
CONFIG_RANDSTRUCT for on/off, CONFIG_RANDSTRUCT_FULL for the full
randomization mode, and CONFIG_RANDSTRUCT_PERFORMANCE for the cache-line
sized mode.

Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-4-keescook@chromium.org

595b893e

sancov: Split plugin build from plugin CFLAGS · d3646589

Kees Cook authored May 03, 2022

When the sancov_plugin is enabled, it gets added to gcc-plugin-y which
is used to populate both GCC_PLUGIN (for building the plugin) and
GCC_PLUGINS_CFLAGS (for enabling and options). Instead of adding sancov
to both and then removing it from GCC_PLUGINS_CFLAGS, create a separate
list, gcc-plugin-external-y, which is only added to GCC_PLUGIN.

This will also be used by the coming randstruct build changes.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: linux-kbuild@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-3-keescook@chromium.org

d3646589

netfs: Eliminate Clang randstruct warning · 3b5eed3c

Kees Cook authored May 03, 2022

Clang's structure layout randomization feature gets upset when it sees
struct inode (which is randomized) cast to struct netfs_i_context. This
is due to seeing the inode pointer as being treated as an array of inodes,
rather than "something else, following struct inode".

Since netfs can't use container_of() (since it doesn't know what the
true containing struct is), it uses this direct offset instead. Adjust
the code to better reflect what is happening: an arbitrary pointer is
being adjusted and cast to something else: use a "void *" for the math.
The resulting binary output is the same, but Clang no longer sees an
unexpected cross-structure cast:

In file included from ../fs/nfs/inode.c:50:
In file included from ../fs/nfs/fscache.h:15:
In file included from ../include/linux/fscache.h:18:
../include/linux/netfs.h:298:9: error: casting from randomized structure pointer type 'struct inode *' to 'struct netfs_i_context *'
        return (struct netfs_i_context *)(inode + 1);
               ^
1 error generated.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-2-keescook@chromium.orgReviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/lkml/7562f8eccd7cc0e447becfe9912179088784e3b9.camel@kernel.org

3b5eed3c

13 Apr, 2022 10 commits

cfi: Use __builtin_function_start · e6f3b3c9

Sami Tolvanen authored Apr 05, 2022

Clang 14 added support for the __builtin_function_start function,
which allows us to implement the function_nocfi macro without
architecture-specific inline assembly and in a way that also works
with static initializers.

Change CONFIG_CFI_CLANG to depend on Clang >= 14, define
function_nocfi using __builtin_function_start, and remove the arm64
inline assembly implementation.

Link: https://github.com/llvm/llvm-project/commit/ec2e26eaf63558934f5b73a6e530edc453cf9508
Link: https://github.com/ClangBuiltLinux/linux/issues/1353Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will@kernel.org> # arm64
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220405221618.633743-1-samitolvanen@google.com

e6f3b3c9

security: don't treat structure as an array of struct hlist_head · 75c1182e

Bill Wendling authored Apr 07, 2022

The initialization of "security_hook_heads" is done by casting it to
another structure pointer type, and treating it as an array of "struct
hlist_head" objects. This requires an exception be made in "randstruct",
because otherwise it will emit an error, reducing the effectiveness of
the hardening technique.

Instead of using a cast, initialize the individual struct hlist_head
elements in security_hook_heads explicitly. This removes the need for
the cast and randstruct exception.
Signed-off-by: Bill Wendling <morbo@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220407175930.471870-1-morbo@google.com

75c1182e

usercopy: Remove HARDENED_USERCOPY_PAGESPAN · 1109a5d9

Matthew Wilcox (Oracle) authored Jan 10, 2022

There isn't enough information to make this a useful check any more;
the useful parts of it were moved in earlier patches, so remove this
set of checks now.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220110231530.665970-5-willy@infradead.org

1109a5d9

mm/usercopy: Detect large folio overruns · ab502103

Matthew Wilcox (Oracle) authored Jan 10, 2022

Move the compound page overrun detection out of
CONFIG_HARDENED_USERCOPY_PAGESPAN and convert it to use folios so it's
enabled for more people.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220110231530.665970-4-willy@infradead.org

ab502103

mm/usercopy: Detect vmalloc overruns · 0aef499f

Matthew Wilcox (Oracle) authored Jan 10, 2022

If you have a vmalloc() allocation, or an address from calling vmap(),
you cannot overrun the vm_area which describes it, regardless of the
size of the underlying allocation. This probably doesn't do much for
security because vmalloc comes with guard pages these days, but it
prevents usercopy aborts when copying to a vmap() of smaller pages.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220110231530.665970-3-willy@infradead.org

0aef499f

mm/usercopy: Check kmap addresses properly · 4e140f59

Matthew Wilcox (Oracle) authored Jan 10, 2022

If you are copying to an address in the kmap region, you may not copy
across a page boundary, no matter what the size of the underlying
allocation.  You can't kmap() a slab page because slab pages always
come from low memory.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org

4e140f59

Merge tag 'hardening-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · a1994480

Linus Torvalds authored Apr 12, 2022

Pull hardening fixes from Kees Cook:

 - latent_entropy: Use /dev/urandom instead of small GCC seed (Jason
   Donenfeld)

 - uapi/stddef.h: add missed include guards (Tadeusz Struk)

* tag 'hardening-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: latent_entropy: use /dev/urandom
  uapi/linux/stddef.h: Add include guards

a1994480

Merge tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · c1488c97

Linus Torvalds authored Apr 12, 2022

Pull nfsd fixes from Chuck Lever:

 - Fix a write performance regression

 - Fix crashes during request deferral on RDMA transports

* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix the svc_deferred_event trace class
  SUNRPC: Fix NFSD's request deferral on RDMA transports
  nfsd: Clean up nfsd_file_put()
  nfsd: Fix a write performance regression
  SUNRPC: Return true/false (not 1/0) from bool functions

c1488c97

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 453096eb

Linus Torvalds authored Apr 12, 2022

Pull kvm fixes from Paolo Bonzini:
 "x86:

   - Miscellaneous bugfixes

   - A small cleanup for the new workqueue code

   - Documentation syntax fix

  RISC-V:

   - Remove hgatp zeroing in kvm_arch_vcpu_put()

   - Fix alignment of the guest_hang() in KVM selftest

   - Fix PTE A and D bits in KVM selftest

   - Missing #include in vcpu_fp.c

  ARM:

   - Some PSCI fixes after introducing PSCIv1.1 and SYSTEM_RESET2

   - Fix the MMU write-lock not being taken on THP split

   - Fix mixed-width VM handling

   - Fix potential UAF when debugfs registration fails

   - Various selftest updates for all of the above"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (24 commits)
  KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU
  KVM: SVM: Do not activate AVIC for SEV-enabled guest
  Documentation: KVM: Add SPDX-License-Identifier tag
  selftests: kvm: add tsc_scaling_sync to .gitignore
  RISC-V: KVM: include missing hwcap.h into vcpu_fp
  KVM: selftests: riscv: Fix alignment of the guest_hang() function
  KVM: selftests: riscv: Set PTE A and D bits in VS-stage page table
  RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put()
  selftests: KVM: Free the GIC FD when cleaning up in arch_timer
  selftests: KVM: Don't leak GIC FD across dirty log test iterations
  KVM: Don't create VM debugfs files outside of the VM directory
  KVM: selftests: get-reg-list: Add KVM_REG_ARM_FW_REG(3)
  KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
  KVM: arm64: selftests: Introduce vcpu_width_config
  KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs
  KVM: arm64: vgic: Remove unnecessary type castings
  KVM: arm64: Don't split hugepages outside of MMU write lock
  KVM: arm64: Drop unneeded minor version check from PSCI v1.x handler
  KVM: arm64: Actually prevent SMC64 SYSTEM_RESET2 from AArch32
  KVM: arm64: Generally disallow SMC64 for AArch32 guests
  ...

453096eb

Merge tag 'media/v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 7083b89e

Linus Torvalds authored Apr 12, 2022

Pull media fixes from Mauro Carvalho Chehab:

 - a regression fix for si2157

 - a Kconfig dependency fix for imx-mipi-csis

 - fix the rockchip/rga driver probing logic

* tag 'media/v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: si2157: unknown chip version Si2147-A30 ROM 0x50
  media: platform: imx-mipi-csis: Add dependency on VIDEO_DEV
  media: rockchip/rga: do proper error checking in probe

7083b89e

12 Apr, 2022 3 commits

stat: fix inconsistency between struct stat and struct compat_stat · 932aba1e

Mikulas Patocka authored Apr 12, 2022

struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
st_dev and st_rdev; struct compat_stat (defined in
arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
a 16-bit padding.

This patch fixes struct compat_stat to match struct stat.

[ Historical note: the old x86 'struct stat' did have that 16-bit field
  that the compat layer had kept around, but it was changes back in 2003
  by "struct stat - support larger dev_t":

    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d

  and back in those days, the x86_64 port was still new, and separate
  from the i386 code, and had already picked up the old version with a
  16-bit st_dev field ]

Note that we can't change compat_dev_t because it is used by
compat_loop_info.

Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
old_valid_dev to test if the value fits into them.  This fixes
-EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
number 259.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

932aba1e

gcc-plugins: latent_entropy: use /dev/urandom · c40160f2

Jason A. Donenfeld authored Apr 06, 2022

While the latent entropy plugin mostly doesn't derive entropy from
get_random_const() for measuring the call graph, when __latent_entropy is
applied to a constant, then it's initialized statically to output from
get_random_const(). In that case, this data is derived from a 64-bit
seed, which means a buffer of 512 bits doesn't really have that amount
of compile-time entropy.

This patch fixes that shortcoming by just buffering chunks of
/dev/urandom output and doling it out as requested.

At the same time, it's important that we don't break the use of
-frandom-seed, for people who want the runtime benefits of the latent
entropy plugin, while still having compile-time determinism. In that
case, we detect whether gcc's set_random_seed() has been called by
making a call to get_random_seed(noinit=true) in the plugin init
function, which is called after set_random_seed() is called but before
anything that calls get_random_seed(noinit=false), and seeing if it's
zero or not. If it's not zero, we're in deterministic mode, and so we
just generate numbers with a basic xorshift prng.

Note that we don't detect if -frandom-seed is being used using the
documented local_tick variable, because it's assigned via:
   local_tick = (unsigned) tv.tv_sec * 1000 + tv.tv_usec / 1000;
which may well overflow and become -1 on its own, and so isn't
reliable: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105171

[kees: The 256 byte rnd_buf size was chosen based on average (250),
 median (64), and std deviation (575) bytes of used entropy for a
 defconfig x86_64 build]

Fixes: 38addce8 ("gcc-plugins: Add latent_entropy plugin")
Cc: stable@vger.kernel.org
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220405222815.21155-1-Jason@zx2c4.com

c40160f2

Merge tag 'platform-drivers-x86-v5.18-2' of... · 7281a59c

Linus Torvalds authored Apr 11, 2022

Merge tag 'platform-drivers-x86-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Hans de Goede:

 - Documentation and compilation warning fixes

 - Kconfig dep fixes

 - Misc small code cleanups

* tag 'platform-drivers-x86-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: amd-pmc: Fix compilation without CONFIG_SUSPEND
  platform/x86: acerhdf: Cleanup str_starts_with()
  Documentation/ABI: sysfs-class-firmware-attributes: Misc. cleanups
  Documentation/ABI: sysfs-class-firmware-attributes: Fix Sphinx errors
  Documentation/ABI: sysfs-driver-intel_sdsi: Fix sphinx warnings
  platform/x86: barco-p50-gpio: Fix duplicate included linux/io.h
  platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative
  platform/x86: think-lmi: certificate support clean ups

7281a59c

11 Apr, 2022 5 commits

KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU · 42dcbe7d

Vitaly Kuznetsov authored Apr 07, 2022

The following WARN is triggered from kvm_vm_ioctl_set_clock():
 WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
 ...
 CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G        W  O      5.16.0.stable #20
 Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021
 RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm]
 ...
 Call Trace:
  <TASK>
  ? kvm_write_guest+0x114/0x120 [kvm]
  kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
  kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
  ? schedule+0x4e/0xc0
  ? __cond_resched+0x1a/0x50
  ? futex_wait+0x166/0x250
  ? __send_signal+0x1f1/0x3d0
  kvm_vm_ioctl+0x747/0xda0 [kvm]
  ...

The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if
mark_page_dirty() is called without an active vCPU") but the change seems
to be correct (unlike Hyper-V TSC page update mechanism). In fact, there's
no real need to actually write to guest memory to invalidate TSC page, this
can be done by the first vCPU which goes through kvm_guest_time_update().
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220407201013.963226-1-vkuznets@redhat.com>

42dcbe7d

KVM: SVM: Do not activate AVIC for SEV-enabled guest · c538dc79

Suravee Suthikulpanit authored Apr 08, 2022

Since current AVIC implementation cannot support encrypted memory,
inhibit AVIC for SEV-enabled guest.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220408133710.54275-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c538dc79

Documentation: KVM: Add SPDX-License-Identifier tag · af105c9c

Like Xu authored Apr 06, 2022

+new file mode 100644
+WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
+#27: FILE: Documentation/virt/kvm/x86/errata.rst:1:

Opportunistically update all other non-added KVM documents and
remove a new extra blank line at EOF for x86/errata.rst.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220406063715.55625-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

af105c9c

selftests: kvm: add tsc_scaling_sync to .gitignore · 0c8b6641

Like Xu authored Apr 06, 2022

The tsc_scaling_sync's binary should be present in the .gitignore
file for the git to ignore it.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220406063715.55625-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0c8b6641

Merge tag 'kvm-riscv-fixes-5.18-1' of https://github.com/kvm-riscv/linux into HEAD · b2c2c21a

Paolo Bonzini authored Apr 11, 2022

KVM/riscv fixes for 5.18, take #1

- Remove hgatp zeroing in kvm_arch_vcpu_put()

- Fix alignment of the guest_hang() in KVM selftest

- Fix PTE A and D bits in KVM selftest

- Missing #include in vcpu_fp.c

b2c2c21a