Commit 9ba19ccd authored by Linus Torvalds

Merge tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - LKMM updates: mostly documentation changes, but also some new litmus
   tests for atomic ops.

 - KCSAN updates: the most important change is that GCC 11 now has all
   fixes in place to support KCSAN, so GCC support can be enabled again.
   Also more annotations.

 - futex updates: minor cleanups and simplifications

 - seqlock updates: merge preparatory changes/cleanups for the
   'associated locks' facilities.

 - lockdep updates:
    - simplify IRQ trace event handling
    - add various new debug checks
    - simplify header dependencies, split out <linux/lockdep_types.h>,
      decouple lockdep from other low level headers some more
    - fix NMI handling

 - misc cleanups and smaller fixes

* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  kcsan: Improve IRQ state trace reporting
  lockdep: Refactor IRQ trace events fields into struct
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  lockdep: Add preemption enabled/disabled assertion APIs
  seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
  seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
  seqlock: Reorder seqcount_t and seqlock_t API definitions
  seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
  seqlock: Properly format kernel-doc code samples
  Documentation: locking: Describe seqlock design and usage
  locking/qspinlock: Do not include atomic.h from qspinlock_types.h
  locking/atomic: Move ATOMIC_INIT into linux/types.h
  lockdep: Move list.h inclusion into lockdep.h
  locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
  futex: Remove unused or redundant includes
  futex: Consistently use fshared as boolean
  futex: Remove needless goto's
  futex: Remove put_futex_key()
  rwsem: fix commas in initialisation
  docs: locking: Replace HTTP links with HTTPS ones
  ...
parents 8f0cb666 992414a1
@@ -85,21 +85,21 @@ smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.

-A subtle detail of atomic_set{}() is that it should be observable to the RMW
-ops. That is:
+A note for the implementation of atomic_set{}() is that it must not break the
+atomicity of the RMW ops. That is:

-  C atomic-set
+  C Atomic-RMW-ops-are-atomic-WRT-atomic_set

  {
-	atomic_set(v, 1);
+	atomic_t v = ATOMIC_INIT(1);
  }

-  P1(atomic_t *v)
+  P0(atomic_t *v)
  {
-	atomic_add_unless(v, 1, 0);
+	(void)atomic_add_unless(v, 1, 0);
  }

-  P2(atomic_t *v)
+  P1(atomic_t *v)
  {
	atomic_set(v, 0);
  }

@@ -233,19 +233,19 @@ as well. Similarly, something like:
is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

-  C strong-acquire
+  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

  {
  }

-  P1(int *x, atomic_t *y)
+  P0(int *x, atomic_t *y)
  {
	r0 = READ_ONCE(*x);
	smp_rmb();
	r1 = atomic_read(y);
  }

-  P2(int *x, atomic_t *y)
+  P1(int *x, atomic_t *y)
  {
	atomic_inc(y);
	smp_mb__after_atomic();
@@ -253,14 +253,14 @@ strictly stronger than ACQUIRE. As illustrated:
  }

  exists
-  (r0=1 /\ r1=0)
+  (0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE. Thus:

-  P1			P2
+  P0			P1

	t = LL.acq *y (0)
	t++;

...
@@ -8,7 +8,8 @@ approach to detect races. KCSAN's primary purpose is to detect `data races`_.
Usage
-----

-KCSAN requires Clang version 11 or later.
+KCSAN is supported by both GCC and Clang. With GCC we require version 11 or
+later, and with Clang also require version 11 or later.

To enable KCSAN configure the kernel with::
...
============
LITMUS TESTS
============
Each subdirectory contains litmus tests that illustrate the semantics of the
respective kernel APIs.
For more information about how to "run" a litmus test or how to generate
a kernel test module based on a litmus test, please see
tools/memory-model/README.
atomic (/atomic directory)
--------------------------
Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
Test that an atomic RMW followed by a smp_mb__after_atomic() is
stronger than a normal acquire: both the read and write parts of
the RMW are ordered before the subsequent memory accesses.
Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
Test that atomic_set() cannot break the atomicity of atomic RMWs.
NOTE: Requires herd7 7.56 or later, which supports "(void)expr".
RCU (/rcu directory)
--------------------
MP+onceassign+derefonce.litmus (under tools/memory-model/litmus-tests/)
Demonstrates the use of rcu_assign_pointer() and rcu_dereference() to
ensure that an RCU reader will not see pre-initialization garbage.
RCU+sync+read.litmus
RCU+sync+free.litmus
Both the above litmus tests demonstrate the RCU grace period guarantee
that an RCU read-side critical section can never span a grace period.
C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
(*
* Result: Never
*
* Test that an atomic RMW followed by a smp_mb__after_atomic() is
* stronger than a normal acquire: both the read and write parts of
* the RMW are ordered before the subsequent memory accesses.
*)
{
}
P0(int *x, atomic_t *y)
{
int r0;
int r1;
r0 = READ_ONCE(*x);
smp_rmb();
r1 = atomic_read(y);
}
P1(int *x, atomic_t *y)
{
atomic_inc(y);
smp_mb__after_atomic();
WRITE_ONCE(*x, 1);
}
exists
(0:r0=1 /\ 0:r1=0)
C Atomic-RMW-ops-are-atomic-WRT-atomic_set
(*
* Result: Never
*
* Test that atomic_set() cannot break the atomicity of atomic RMWs.
* NOTE: This requires herd7 7.56 or later which supports "(void)expr".
*)
{
atomic_t v = ATOMIC_INIT(1);
}
P0(atomic_t *v)
{
(void)atomic_add_unless(v, 1, 0);
}
P1(atomic_t *v)
{
atomic_set(v, 0);
}
exists
(v=2)
C RCU+sync+free
(*
* Result: Never
*
* This litmus test demonstrates that an RCU reader can never see a write that
* follows a grace period, if it did not see writes that precede that grace
* period.
*
* This is a typical pattern of RCU usage, where the write before the grace
* period assigns a pointer, and the writes following the grace period destroy
* the object that the pointer used to point to.
*
* This is one implication of the RCU grace-period guarantee, which says (among
* other things) that an RCU read-side critical section cannot span a grace period.
*)
{
int x = 1;
int *y = &x;
int z = 1;
}
P0(int *x, int *z, int **y)
{
int *r0;
int r1;
rcu_read_lock();
r0 = rcu_dereference(*y);
r1 = READ_ONCE(*r0);
rcu_read_unlock();
}
P1(int *x, int *z, int **y)
{
rcu_assign_pointer(*y, z);
synchronize_rcu();
WRITE_ONCE(*x, 0);
}
exists (0:r0=x /\ 0:r1=0)
C RCU+sync+read
(*
* Result: Never
*
* This litmus test demonstrates that after a grace period, an RCU updater always
* sees all stores done in prior RCU read-side critical sections. Such
* read-side critical sections would have ended before the grace period ended.
*
* This is one implication of the RCU grace-period guarantee, which says (among
* other things) that an RCU read-side critical section cannot span a grace period.
*)
{
int x = 0;
int y = 0;
}
P0(int *x, int *y)
{
rcu_read_lock();
WRITE_ONCE(*x, 1);
WRITE_ONCE(*y, 1);
rcu_read_unlock();
}
P1(int *x, int *y)
{
int r0;
int r1;
r0 = READ_ONCE(*x);
synchronize_rcu();
r1 = READ_ONCE(*y);
}
exists (1:r0=1 /\ 1:r1=0)
@@ -14,6 +14,7 @@ locking
mutex-design
rt-mutex-design
rt-mutex
+seqlock
spinlocks
ww-mutex-design
preempt-locking
...
@@ -18,7 +18,7 @@ as an alternative to these. This new data structure provided a number
of advantages, including simpler interfaces, and at that time smaller
code (see Disadvantages).

-[1] http://lwn.net/Articles/164802/
+[1] https://lwn.net/Articles/164802/

Implementation
--------------
...
======================================
Sequence counters and sequential locks
======================================
Introduction
============
Sequence counters are a reader-writer consistency mechanism with
lockless readers (read-only retry loops), and no writer starvation. They
are used for data that's rarely written to (e.g. system time), where the
reader wants a consistent set of information and is willing to retry if
that information changes.
A data set is consistent when the sequence count at the beginning of the
read side critical section is even and the same sequence count value is
read again at the end of the critical section. The data in the set must
be copied out inside the read side critical section. If the sequence
count has changed between the start and the end of the critical section,
the reader must retry.
Writers increment the sequence count at the start and the end of their
critical section. After starting the critical section the sequence count
is odd and indicates to the readers that an update is in progress. At
the end of the write side critical section the sequence count becomes
even again which lets readers make progress.
A sequence counter write side critical section must never be preempted
or interrupted by read side sections. Otherwise the reader will spin for
the entire scheduler tick due to the odd sequence count value and the
interrupted writer. If that reader belongs to a real-time scheduling
class, it can spin forever and the kernel will livelock.
This mechanism cannot be used if the protected data contains pointers,
as the writer can invalidate a pointer that the reader is following.
.. _seqcount_t:
Sequence counters (``seqcount_t``)
==================================
This is the raw counting mechanism, which does not protect against
multiple writers. Write side critical sections must thus be serialized
by an external lock.
If the write serialization primitive is not implicitly disabling
preemption, preemption must be explicitly disabled before entering the
write side section. If the read section can be invoked from hardirq or
softirq contexts, interrupts or bottom halves must also be respectively
disabled before entering the write section.
If it's desired to automatically handle the sequence counter
requirements of writer serialization and non-preemptibility, use
:ref:`seqlock_t` instead.
Initialization::
/* dynamic */
seqcount_t foo_seqcount;
seqcount_init(&foo_seqcount);
/* static */
static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
/* C99 struct init */
struct {
.seq = SEQCNT_ZERO(foo.seq),
} foo;
Write path::
/* Serialized context with disabled preemption */
write_seqcount_begin(&foo_seqcount);
/* ... [[write-side critical section]] ... */
write_seqcount_end(&foo_seqcount);
Read path::
do {
seq = read_seqcount_begin(&foo_seqcount);
/* ... [[read-side critical section]] ... */
} while (read_seqcount_retry(&foo_seqcount, seq));
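As a combined usage sketch (not part of the kernel sources: the ``foo_data``
structure, ``foo_lock`` and the two helpers below are hypothetical, and
<linux/seqlock.h> plus <linux/spinlock.h> are assumed), writers can be
serialized by an external spinlock, which with the classic non-PREEMPT_RT
spinlock semantics also disables preemption as required above::

	struct foo_data {
		u64 a;
		u64 b;
	};

	static struct foo_data foo;
	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
	static DEFINE_SPINLOCK(foo_lock);	/* serializes writers */

	static void foo_update(u64 a, u64 b)
	{
		/* spin_lock() serializes writers and disables preemption */
		spin_lock(&foo_lock);
		write_seqcount_begin(&foo_seqcount);
		foo.a = a;
		foo.b = b;
		write_seqcount_end(&foo_seqcount);
		spin_unlock(&foo_lock);
	}

	static void foo_read(u64 *a, u64 *b)
	{
		unsigned int seq;

		/* copy the data out; retry if a writer interfered */
		do {
			seq = read_seqcount_begin(&foo_seqcount);
			*a = foo.a;
			*b = foo.b;
		} while (read_seqcount_retry(&foo_seqcount, seq));
	}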
.. _seqlock_t:
Sequential locks (``seqlock_t``)
================================
This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
embedded spinlock for writer serialization and non-preemptibility.
If the read side section can be invoked from hardirq or softirq context,
use the write side function variants which disable interrupts or bottom
halves respectively.
Initialization::
/* dynamic */
seqlock_t foo_seqlock;
seqlock_init(&foo_seqlock);
/* static */
static DEFINE_SEQLOCK(foo_seqlock);
/* C99 struct init */
struct {
.seql = __SEQLOCK_UNLOCKED(foo.seql)
} foo;
Write path::
write_seqlock(&foo_seqlock);
/* ... [[write-side critical section]] ... */
write_sequnlock(&foo_seqlock);
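If the read side can also be invoked from hardirq or softirq context, the
write side must use the variants which disable interrupts or bottom halves,
as noted earlier. A minimal sketch, reusing the ``foo_seqlock`` from the
examples above::

	/* readers may run from hardirq context */
	unsigned long flags;

	write_seqlock_irqsave(&foo_seqlock, flags);
	/* ... [[write-side critical section]] ... */
	write_sequnlock_irqrestore(&foo_seqlock, flags);

	/* readers may run from softirq context */
	write_seqlock_bh(&foo_seqlock);
	/* ... [[write-side critical section]] ... */
	write_sequnlock_bh(&foo_seqlock);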
Read path, three categories:
1. Normal Sequence readers which never block a writer, but must retry
if a writer is in progress; this is detected as a change in the sequence
number. Writers do not wait for a sequence reader::
do {
seq = read_seqbegin(&foo_seqlock);
/* ... [[read-side critical section]] ... */
} while (read_seqretry(&foo_seqlock, seq));
2. Locking readers which will wait if a writer or another locking reader
is in progress. A locking reader in progress will also block a writer
from entering its critical section. This read lock is
exclusive. Unlike rwlock_t, only one locking reader can acquire it::
read_seqlock_excl(&foo_seqlock);
/* ... [[read-side critical section]] ... */
read_sequnlock_excl(&foo_seqlock);
3. Conditional lockless reader (as in 1), or locking reader (as in 2),
according to a passed marker. This is used to avoid lockless reader
starvation (too many retry loops) in case of a sharp spike in write
activity. First, a lockless read is tried (even marker passed). If
that trial fails (odd sequence counter is returned, which is used as
the next iteration marker), the lockless read is transformed to a
full locking read and no retry loop is necessary::
/* marker; even initialization */
int seq = 0;
do {
read_seqbegin_or_lock(&foo_seqlock, &seq);
/* ... [[read-side critical section]] ... */
} while (need_seqretry(&foo_seqlock, seq));
done_seqretry(&foo_seqlock, seq);
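As a usage sketch, the same conditional pattern wrapped in a small helper
which copies out two values; ``foo_a``, ``foo_b`` and ``foo_get()`` are
hypothetical and only serve as illustration::

	/* hypothetical data, updated under write_seqlock(&foo_seqlock) */
	static u64 foo_a, foo_b;

	static void foo_get(u64 *a, u64 *b)
	{
		int seq = 0;	/* even marker: start with a lockless pass */

		do {
			read_seqbegin_or_lock(&foo_seqlock, &seq);
			*a = foo_a;
			*b = foo_b;
		} while (need_seqretry(&foo_seqlock, seq));
		done_seqretry(&foo_seqlock, seq);
	}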
API documentation
=================
.. kernel-doc:: include/linux/seqlock.h
@@ -9981,6 +9981,7 @@ M:	Luc Maranget <luc.maranget@inria.fr>
M:	"Paul E. McKenney" <paulmck@kernel.org>
R:	Akira Yokosawa <akiyks@gmail.com>
R:	Daniel Lustig <dlustig@nvidia.com>
+R:	Joel Fernandes <joel@joelfernandes.org>
L:	linux-kernel@vger.kernel.org
L:	linux-arch@vger.kernel.org
S:	Supported
@@ -9989,6 +9990,7 @@ F:	Documentation/atomic_bitops.txt
F:	Documentation/atomic_t.txt
F:	Documentation/core-api/atomic_ops.rst
F:	Documentation/core-api/refcount-vs-atomic.rst
+F:	Documentation/litmus-tests/
F:	Documentation/memory-barriers.txt
F:	tools/memory-model/
...
@@ -24,7 +24,6 @@
#define __atomic_acquire_fence()
#define __atomic_post_full_fence()
-#define ATOMIC_INIT(i) { (i) }
#define ATOMIC64_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter)
...
@@ -14,8 +14,6 @@
#include <asm/barrier.h>
#include <asm/smp.h>
-#define ATOMIC_INIT(i) { (i) }
#ifndef CONFIG_ARC_PLAT_EZNPS
#define atomic_read(v) READ_ONCE((v)->counter)
...
@@ -15,8 +15,6 @@
#include <asm/barrier.h>
#include <asm/cmpxchg.h>
-#define ATOMIC_INIT(i) { (i) }
#ifdef __KERNEL__
/*
...
@@ -5,7 +5,7 @@
#ifndef _ASM_ARM_PERCPU_H_
#define _ASM_ARM_PERCPU_H_
-#include <asm/thread_info.h>
+register unsigned long current_stack_pointer asm ("sp");
/*
* Same as asm-generic/percpu.h, except that we store the per cpu offset
...
@@ -75,11 +75,6 @@ struct thread_info {
.addr_limit = KERNEL_DS, \
}
-/*
- * how to get the current stack pointer in C
- */
-register unsigned long current_stack_pointer asm ("sp");
/*
* how to get the thread information struct from C
*/
...
@@ -99,8 +99,6 @@ static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
return __lse_ll_sc_body(atomic64_dec_if_positive, v);
}
-#define ATOMIC_INIT(i) { (i) }
#define arch_atomic_read(v) __READ_ONCE((v)->counter)
#define arch_atomic_set(v, i) __WRITE_ONCE(((v)->counter), (i))
...
@@ -12,8 +12,6 @@
* resource counting etc..
*/
-#define ATOMIC_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
...
@@ -12,8 +12,6 @@
#include <asm/cmpxchg.h>
#include <asm/barrier.h>
-#define ATOMIC_INIT(i) { (i) }
/* Normal writes in our arch don't clear lock reservations */
static inline void atomic_set(atomic_t *v, int new)
...
@@ -19,7 +19,6 @@
#include <asm/barrier.h>
-#define ATOMIC_INIT(i) { (i) }
#define ATOMIC64_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter)
...
@@ -16,8 +16,6 @@
* We do not have SMP m68k systems, so we don't have to deal with that.
*/
-#define ATOMIC_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
...
@@ -45,7 +45,6 @@ static __always_inline type pfx##_xchg(pfx##_t *v, type n) \
return xchg(&v->counter, n); \
}
-#define ATOMIC_INIT(i) { (i) }
ATOMIC_OPS(atomic, int)
#ifdef CONFIG_64BIT
...
@@ -136,8 +136,6 @@ ATOMIC_OPS(xor, ^=)
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-#define ATOMIC_INIT(i) { (i) }
#ifdef CONFIG_64BIT
#define ATOMIC64_INIT(i) { (i) }
...
@@ -11,8 +11,6 @@
#include <asm/cmpxchg.h>
#include <asm/barrier.h>
-#define ATOMIC_INIT(i) { (i) }
/*
* Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
* a "bne-" instruction at the end, so an isync is enough as a acquire barrier
...
#ifndef _ASM_POWERPC_DTL_H
#define _ASM_POWERPC_DTL_H
#include <asm/lppaca.h>
#include <linux/spinlock_types.h>
/*
* Layout of entries in the hypervisor's dispatch trace log buffer.
*/
struct dtl_entry {
u8 dispatch_reason;
u8 preempt_reason;
__be16 processor_id;
__be32 enqueue_to_dispatch_time;
__be32 ready_to_enqueue_time;
__be32 waiting_to_ready_time;
__be64 timebase;
__be64 fault_addr;
__be64 srr0;
__be64 srr1;
};
#define DISPATCH_LOG_BYTES 4096 /* bytes per cpu */
#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
/*
* Dispatch trace log event enable mask:
* 0x1: voluntary virtual processor waits
* 0x2: time-slice preempts
* 0x4: virtual partition memory page faults
*/
#define DTL_LOG_CEDE 0x1
#define DTL_LOG_PREEMPT 0x2
#define DTL_LOG_FAULT 0x4
#define DTL_LOG_ALL (DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
extern struct kmem_cache *dtl_cache;
extern rwlock_t dtl_access_lock;
/*
* When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls
* reading from the dispatch trace log. If other code wants to consume
* DTL entries, it can set this pointer to a function that will get
* called once for each DTL entry that gets processed.
*/
extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
extern void register_dtl_buffer(int cpu);
extern void alloc_dtl_buffers(unsigned long *time_limit);
extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity);
#endif /* _ASM_POWERPC_DTL_H */
...@@ -42,7 +42,6 @@ ...@@ -42,7 +42,6 @@
*/ */
#include <linux/cache.h> #include <linux/cache.h>
#include <linux/threads.h> #include <linux/threads.h>
#include <linux/spinlock_types.h>
#include <asm/types.h> #include <asm/types.h>
#include <asm/mmu.h> #include <asm/mmu.h>
#include <asm/firmware.h> #include <asm/firmware.h>
...@@ -146,49 +145,6 @@ struct slb_shadow { ...@@ -146,49 +145,6 @@ struct slb_shadow {
} save_area[SLB_NUM_BOLTED]; } save_area[SLB_NUM_BOLTED];
} ____cacheline_aligned; } ____cacheline_aligned;
/*
* Layout of entries in the hypervisor's dispatch trace log buffer.
*/
struct dtl_entry {
u8 dispatch_reason;
u8 preempt_reason;
__be16 processor_id;
__be32 enqueue_to_dispatch_time;
__be32 ready_to_enqueue_time;
__be32 waiting_to_ready_time;
__be64 timebase;
__be64 fault_addr;
__be64 srr0;
__be64 srr1;
};
#define DISPATCH_LOG_BYTES 4096 /* bytes per cpu */
#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
/*
* Dispatch trace log event enable mask:
* 0x1: voluntary virtual processor waits
* 0x2: time-slice preempts
* 0x4: virtual partition memory page faults
*/
#define DTL_LOG_CEDE 0x1
#define DTL_LOG_PREEMPT 0x2
#define DTL_LOG_FAULT 0x4
#define DTL_LOG_ALL (DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
extern struct kmem_cache *dtl_cache;
extern rwlock_t dtl_access_lock;
/*
* When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls
* reading from the dispatch trace log. If other code wants to consume
* DTL entries, it can set this pointer to a function that will get
* called once for each DTL entry that gets processed.
*/
extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
extern void register_dtl_buffer(int cpu);
extern void alloc_dtl_buffers(unsigned long *time_limit);
extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity); extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity);
#endif /* CONFIG_PPC_BOOK3S */ #endif /* CONFIG_PPC_BOOK3S */
......
...@@ -29,7 +29,6 @@ ...@@ -29,7 +29,6 @@
#include <asm/hmi.h> #include <asm/hmi.h>
#include <asm/cpuidle.h> #include <asm/cpuidle.h>
#include <asm/atomic.h> #include <asm/atomic.h>
#include <asm/rtas-types.h>
#include <asm-generic/mmiowb_types.h> #include <asm-generic/mmiowb_types.h>
...@@ -53,6 +52,7 @@ extern unsigned int debug_smp_processor_id(void); /* from linux/smp.h */ ...@@ -53,6 +52,7 @@ extern unsigned int debug_smp_processor_id(void); /* from linux/smp.h */
#define get_slb_shadow() (get_paca()->slb_shadow_ptr) #define get_slb_shadow() (get_paca()->slb_shadow_ptr)
struct task_struct; struct task_struct;
struct rtas_args;
/* /*
* Defines the layout of the paca. * Defines the layout of the paca.
......
...@@ -183,6 +183,8 @@ static inline unsigned long read_spurr(unsigned long tb) ...@@ -183,6 +183,8 @@ static inline unsigned long read_spurr(unsigned long tb)
#ifdef CONFIG_PPC_SPLPAR #ifdef CONFIG_PPC_SPLPAR
#include <asm/dtl.h>
/* /*
* Scan the dispatch trace log and count up the stolen time. * Scan the dispatch trace log and count up the stolen time.
* Should be called with interrupts disabled. * Should be called with interrupts disabled.
......
...@@ -74,6 +74,7 @@ ...@@ -74,6 +74,7 @@
#include <asm/hw_breakpoint.h> #include <asm/hw_breakpoint.h>
#include <asm/kvm_book3s_uvmem.h> #include <asm/kvm_book3s_uvmem.h>
#include <asm/ultravisor.h> #include <asm/ultravisor.h>
#include <asm/dtl.h>
#include "book3s.h" #include "book3s.h"
......
...@@ -12,6 +12,7 @@ ...@@ -12,6 +12,7 @@
#include <asm/smp.h> #include <asm/smp.h>
#include <linux/uaccess.h> #include <linux/uaccess.h>
#include <asm/firmware.h> #include <asm/firmware.h>
#include <asm/dtl.h>
#include <asm/lppaca.h> #include <asm/lppaca.h>
#include <asm/debugfs.h> #include <asm/debugfs.h>
#include <asm/plpar_wrappers.h> #include <asm/plpar_wrappers.h>
......
...@@ -40,6 +40,7 @@ ...@@ -40,6 +40,7 @@
#include <asm/fadump.h> #include <asm/fadump.h>
#include <asm/asm-prototypes.h> #include <asm/asm-prototypes.h>
#include <asm/debugfs.h> #include <asm/debugfs.h>
#include <asm/dtl.h>
#include "pseries.h" #include "pseries.h"
......
...@@ -70,6 +70,7 @@ ...@@ -70,6 +70,7 @@
#include <asm/idle.h> #include <asm/idle.h>
#include <asm/swiotlb.h> #include <asm/swiotlb.h>
#include <asm/svm.h> #include <asm/svm.h>
#include <asm/dtl.h>
#include "pseries.h" #include "pseries.h"
#include "../../../../drivers/pci/pci.h" #include "../../../../drivers/pci/pci.h"
......
...@@ -11,6 +11,7 @@ ...@@ -11,6 +11,7 @@
#include <asm/svm.h> #include <asm/svm.h>
#include <asm/swiotlb.h> #include <asm/swiotlb.h>
#include <asm/ultravisor.h> #include <asm/ultravisor.h>
#include <asm/dtl.h>
static int __init init_svm(void) static int __init init_svm(void)
{ {
......
...@@ -19,8 +19,6 @@ ...@@ -19,8 +19,6 @@
#include <asm/cmpxchg.h> #include <asm/cmpxchg.h>
#include <asm/barrier.h> #include <asm/barrier.h>
#define ATOMIC_INIT(i) { (i) }
#define __atomic_acquire_fence() \ #define __atomic_acquire_fence() \
__asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory") __asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory")
......
...@@ -15,8 +15,6 @@ ...@@ -15,8 +15,6 @@
#include <asm/barrier.h> #include <asm/barrier.h>
#include <asm/cmpxchg.h> #include <asm/cmpxchg.h>
#define ATOMIC_INIT(i) { (i) }
static inline int atomic_read(const atomic_t *v) static inline int atomic_read(const atomic_t *v)
{ {
int c; int c;
......
...@@ -10,6 +10,7 @@ ...@@ -10,6 +10,7 @@
#include <asm/sigp.h> #include <asm/sigp.h>
#include <asm/lowcore.h> #include <asm/lowcore.h>
#include <asm/processor.h>
#define raw_smp_processor_id() (S390_lowcore.cpu_nr) #define raw_smp_processor_id() (S390_lowcore.cpu_nr)
......
...@@ -24,7 +24,6 @@ ...@@ -24,7 +24,6 @@
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
#include <asm/lowcore.h> #include <asm/lowcore.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/processor.h>
#define STACK_INIT_OFFSET \ #define STACK_INIT_OFFSET \
(THREAD_SIZE - STACK_FRAME_OVERHEAD - sizeof(struct pt_regs)) (THREAD_SIZE - STACK_FRAME_OVERHEAD - sizeof(struct pt_regs))
......
...@@ -19,8 +19,6 @@ ...@@ -19,8 +19,6 @@
#include <asm/cmpxchg.h> #include <asm/cmpxchg.h>
#include <asm/barrier.h> #include <asm/barrier.h>
#define ATOMIC_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter) #define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v,i) WRITE_ONCE((v)->counter, (i)) #define atomic_set(v,i) WRITE_ONCE((v)->counter, (i))
......
...@@ -18,8 +18,6 @@ ...@@ -18,8 +18,6 @@
#include <asm/barrier.h> #include <asm/barrier.h>
#include <asm-generic/atomic64.h> #include <asm-generic/atomic64.h>
#define ATOMIC_INIT(i) { (i) }
int atomic_add_return(int, atomic_t *); int atomic_add_return(int, atomic_t *);
int atomic_fetch_add(int, atomic_t *); int atomic_fetch_add(int, atomic_t *);
int atomic_fetch_and(int, atomic_t *); int atomic_fetch_and(int, atomic_t *);
......
...@@ -12,7 +12,6 @@ ...@@ -12,7 +12,6 @@
#include <asm/cmpxchg.h> #include <asm/cmpxchg.h>
#include <asm/barrier.h> #include <asm/barrier.h>
#define ATOMIC_INIT(i) { (i) }
#define ATOMIC64_INIT(i) { (i) } #define ATOMIC64_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter) #define atomic_read(v) READ_ONCE((v)->counter)
......
...@@ -4,7 +4,9 @@ ...@@ -4,7 +4,9 @@
#include <linux/compiler.h> #include <linux/compiler.h>
#ifndef BUILD_VDSO
register unsigned long __local_per_cpu_offset asm("g5"); register unsigned long __local_per_cpu_offset asm("g5");
#endif
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
......
...@@ -2,6 +2,8 @@ ...@@ -2,6 +2,8 @@
#ifndef _SPARC_TRAP_BLOCK_H #ifndef _SPARC_TRAP_BLOCK_H
#define _SPARC_TRAP_BLOCK_H #define _SPARC_TRAP_BLOCK_H
#include <linux/threads.h>
#include <asm/hypervisor.h> #include <asm/hypervisor.h>
#include <asm/asi.h> #include <asm/asi.h>
......
...@@ -3,6 +3,9 @@ ...@@ -3,6 +3,9 @@
config TRACE_IRQFLAGS_SUPPORT config TRACE_IRQFLAGS_SUPPORT
def_bool y def_bool y
config TRACE_IRQFLAGS_NMI_SUPPORT
def_bool y
config EARLY_PRINTK_USB config EARLY_PRINTK_USB
bool bool
......
...@@ -559,8 +559,7 @@ SYSCALL_DEFINE0(ni_syscall) ...@@ -559,8 +559,7 @@ SYSCALL_DEFINE0(ni_syscall)
} }
/** /**
* idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional * idtentry_enter - Handle state tracking on ordinary idtentries
* RCU handling
* @regs: Pointer to pt_regs of interrupted context * @regs: Pointer to pt_regs of interrupted context
* *
* Invokes: * Invokes:
...@@ -572,6 +571,9 @@ SYSCALL_DEFINE0(ni_syscall) ...@@ -572,6 +571,9 @@ SYSCALL_DEFINE0(ni_syscall)
* - The hardirq tracer to keep the state consistent as low level ASM * - The hardirq tracer to keep the state consistent as low level ASM
* entry disabled interrupts. * entry disabled interrupts.
* *
* As a precondition, this requires that the entry came from user mode,
* idle, or a kernel context in which RCU is watching.
*
* For kernel mode entries RCU handling is done conditional. If RCU is * For kernel mode entries RCU handling is done conditional. If RCU is
* watching then the only RCU requirement is to check whether the tick has * watching then the only RCU requirement is to check whether the tick has
* to be restarted. If RCU is not watching then rcu_irq_enter() has to be * to be restarted. If RCU is not watching then rcu_irq_enter() has to be
...@@ -585,18 +587,21 @@ SYSCALL_DEFINE0(ni_syscall) ...@@ -585,18 +587,21 @@ SYSCALL_DEFINE0(ni_syscall)
* establish the proper context for NOHZ_FULL. Otherwise scheduling on exit * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
* would not be possible. * would not be possible.
* *
* Returns: True if RCU has been adjusted on a kernel entry * Returns: An opaque object that must be passed to idtentry_exit()
* False otherwise
* *
* The return value must be fed into the rcu_exit argument of * The return value must be fed into the state argument of
* idtentry_exit_cond_rcu(). * idtentry_exit().
*/ */
bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs) noinstr idtentry_state_t idtentry_enter(struct pt_regs *regs)
{ {
idtentry_state_t ret = {
.exit_rcu = false,
};
if (user_mode(regs)) { if (user_mode(regs)) {
check_user_regs(regs); check_user_regs(regs);
enter_from_user_mode(); enter_from_user_mode();
return false; return ret;
} }
/* /*
...@@ -634,7 +639,8 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs) ...@@ -634,7 +639,8 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
trace_hardirqs_off_finish(); trace_hardirqs_off_finish();
instrumentation_end(); instrumentation_end();
return true; ret.exit_rcu = true;
return ret;
} }
/* /*
...@@ -649,7 +655,7 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs) ...@@ -649,7 +655,7 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
trace_hardirqs_off(); trace_hardirqs_off();
instrumentation_end(); instrumentation_end();
return false; return ret;
} }
static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched) static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
...@@ -667,10 +673,9 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched) ...@@ -667,10 +673,9 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
} }
/** /**
* idtentry_exit_cond_rcu - Handle return from exception with conditional RCU * idtentry_exit - Handle return from exception that used idtentry_enter()
* handling
* @regs: Pointer to pt_regs (exception entry regs) * @regs: Pointer to pt_regs (exception entry regs)
* @rcu_exit: Invoke rcu_irq_exit() if true * @state: Return value from matching call to idtentry_enter()
* *
* Depending on the return target (kernel/user) this runs the necessary * Depending on the return target (kernel/user) this runs the necessary
* preemption and work checks if possible and reguired and returns to * preemption and work checks if possible and reguired and returns to
...@@ -679,10 +684,10 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched) ...@@ -679,10 +684,10 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
* This is the last action before returning to the low level ASM code which * This is the last action before returning to the low level ASM code which
* just needs to return to the appropriate context. * just needs to return to the appropriate context.
* *
* Counterpart to idtentry_enter_cond_rcu(). The return value of the entry * Counterpart to idtentry_enter(). The return value of the entry
* function must be fed into the @rcu_exit argument. * function must be fed into the @state argument.
*/ */
void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit) noinstr void idtentry_exit(struct pt_regs *regs, idtentry_state_t state)
{ {
lockdep_assert_irqs_disabled(); lockdep_assert_irqs_disabled();
...@@ -695,7 +700,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit) ...@@ -695,7 +700,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
* carefully and needs the same ordering of lockdep/tracing * carefully and needs the same ordering of lockdep/tracing
* and RCU as the return to user mode path. * and RCU as the return to user mode path.
*/ */
if (rcu_exit) { if (state.exit_rcu) {
instrumentation_begin(); instrumentation_begin();
/* Tell the tracer that IRET will enable interrupts */ /* Tell the tracer that IRET will enable interrupts */
trace_hardirqs_on_prepare(); trace_hardirqs_on_prepare();
...@@ -714,7 +719,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit) ...@@ -714,7 +719,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
* IRQ flags state is correct already. Just tell RCU if it * IRQ flags state is correct already. Just tell RCU if it
* was not watching on entry. * was not watching on entry.
*/ */
if (rcu_exit) if (state.exit_rcu)
rcu_irq_exit(); rcu_irq_exit();
} }
} }
...@@ -726,7 +731,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit) ...@@ -726,7 +731,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
* Invokes enter_from_user_mode() to establish the proper context for * Invokes enter_from_user_mode() to establish the proper context for
* NOHZ_FULL. Otherwise scheduling on exit would not be possible. * NOHZ_FULL. Otherwise scheduling on exit would not be possible.
*/ */
void noinstr idtentry_enter_user(struct pt_regs *regs) noinstr void idtentry_enter_user(struct pt_regs *regs)
{ {
check_user_regs(regs); check_user_regs(regs);
enter_from_user_mode(); enter_from_user_mode();
...@@ -744,13 +749,47 @@ void noinstr idtentry_enter_user(struct pt_regs *regs) ...@@ -744,13 +749,47 @@ void noinstr idtentry_enter_user(struct pt_regs *regs)
* *
* Counterpart to idtentry_enter_user(). * Counterpart to idtentry_enter_user().
*/ */
void noinstr idtentry_exit_user(struct pt_regs *regs) noinstr void idtentry_exit_user(struct pt_regs *regs)
{ {
lockdep_assert_irqs_disabled(); lockdep_assert_irqs_disabled();
prepare_exit_to_usermode(regs); prepare_exit_to_usermode(regs);
} }
noinstr bool idtentry_enter_nmi(struct pt_regs *regs)
{
bool irq_state = lockdep_hardirqs_enabled();
__nmi_enter();
lockdep_hardirqs_off(CALLER_ADDR0);
lockdep_hardirq_enter();
rcu_nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
ftrace_nmi_enter();
instrumentation_end();
return irq_state;
}
noinstr void idtentry_exit_nmi(struct pt_regs *regs, bool restore)
{
instrumentation_begin();
ftrace_nmi_exit();
if (restore) {
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
}
instrumentation_end();
rcu_nmi_exit();
lockdep_hardirq_exit();
if (restore)
lockdep_hardirqs_on(CALLER_ADDR0);
__nmi_exit();
}
#ifdef CONFIG_XEN_PV #ifdef CONFIG_XEN_PV
#ifndef CONFIG_PREEMPTION #ifndef CONFIG_PREEMPTION
/* /*
...@@ -800,9 +839,10 @@ static void __xen_pv_evtchn_do_upcall(void) ...@@ -800,9 +839,10 @@ static void __xen_pv_evtchn_do_upcall(void)
__visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs) __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
{ {
struct pt_regs *old_regs; struct pt_regs *old_regs;
bool inhcall, rcu_exit; bool inhcall;
idtentry_state_t state;
rcu_exit = idtentry_enter_cond_rcu(regs); state = idtentry_enter(regs);
old_regs = set_irq_regs(regs); old_regs = set_irq_regs(regs);
instrumentation_begin(); instrumentation_begin();
...@@ -812,13 +852,13 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs) ...@@ -812,13 +852,13 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
set_irq_regs(old_regs); set_irq_regs(old_regs);
inhcall = get_and_clear_inhcall(); inhcall = get_and_clear_inhcall();
if (inhcall && !WARN_ON_ONCE(rcu_exit)) { if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
instrumentation_begin(); instrumentation_begin();
idtentry_exit_cond_resched(regs, true); idtentry_exit_cond_resched(regs, true);
instrumentation_end(); instrumentation_end();
restore_inhcall(inhcall); restore_inhcall(inhcall);
} else { } else {
idtentry_exit_cond_rcu(regs, rcu_exit); idtentry_exit(regs, state);
} }
} }
#endif /* CONFIG_XEN_PV */ #endif /* CONFIG_XEN_PV */
...@@ -14,8 +14,6 @@ ...@@ -14,8 +14,6 @@
* resource counting etc.. * resource counting etc..
*/ */
#define ATOMIC_INIT(i) { (i) }
/** /**
* arch_atomic_read - read atomic variable * arch_atomic_read - read atomic variable
* @v: pointer of type atomic_t * @v: pointer of type atomic_t
......
...@@ -13,8 +13,15 @@ ...@@ -13,8 +13,15 @@
void idtentry_enter_user(struct pt_regs *regs); void idtentry_enter_user(struct pt_regs *regs);
void idtentry_exit_user(struct pt_regs *regs); void idtentry_exit_user(struct pt_regs *regs);
bool idtentry_enter_cond_rcu(struct pt_regs *regs); typedef struct idtentry_state {
void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit); bool exit_rcu;
} idtentry_state_t;
idtentry_state_t idtentry_enter(struct pt_regs *regs);
void idtentry_exit(struct pt_regs *regs, idtentry_state_t state);
bool idtentry_enter_nmi(struct pt_regs *regs);
void idtentry_exit_nmi(struct pt_regs *regs, bool irq_state);
/** /**
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
...@@ -54,12 +61,12 @@ static __always_inline void __##func(struct pt_regs *regs); \ ...@@ -54,12 +61,12 @@ static __always_inline void __##func(struct pt_regs *regs); \
\ \
__visible noinstr void func(struct pt_regs *regs) \ __visible noinstr void func(struct pt_regs *regs) \
{ \ { \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \ idtentry_state_t state = idtentry_enter(regs); \
\ \
instrumentation_begin(); \ instrumentation_begin(); \
__##func (regs); \ __##func (regs); \
instrumentation_end(); \ instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \ idtentry_exit(regs, state); \
} \ } \
\ \
static __always_inline void __##func(struct pt_regs *regs) static __always_inline void __##func(struct pt_regs *regs)
...@@ -101,12 +108,12 @@ static __always_inline void __##func(struct pt_regs *regs, \ ...@@ -101,12 +108,12 @@ static __always_inline void __##func(struct pt_regs *regs, \
__visible noinstr void func(struct pt_regs *regs, \ __visible noinstr void func(struct pt_regs *regs, \
unsigned long error_code) \ unsigned long error_code) \
{ \ { \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \ idtentry_state_t state = idtentry_enter(regs); \
\ \
instrumentation_begin(); \ instrumentation_begin(); \
__##func (regs, error_code); \ __##func (regs, error_code); \
instrumentation_end(); \ instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \ idtentry_exit(regs, state); \
} \ } \
\ \
static __always_inline void __##func(struct pt_regs *regs, \ static __always_inline void __##func(struct pt_regs *regs, \
...@@ -199,7 +206,7 @@ static __always_inline void __##func(struct pt_regs *regs, u8 vector); \ ...@@ -199,7 +206,7 @@ static __always_inline void __##func(struct pt_regs *regs, u8 vector); \
__visible noinstr void func(struct pt_regs *regs, \ __visible noinstr void func(struct pt_regs *regs, \
unsigned long error_code) \ unsigned long error_code) \
{ \ { \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \ idtentry_state_t state = idtentry_enter(regs); \
\ \
instrumentation_begin(); \ instrumentation_begin(); \
irq_enter_rcu(); \ irq_enter_rcu(); \
...@@ -207,7 +214,7 @@ __visible noinstr void func(struct pt_regs *regs, \ ...@@ -207,7 +214,7 @@ __visible noinstr void func(struct pt_regs *regs, \
__##func (regs, (u8)error_code); \ __##func (regs, (u8)error_code); \
irq_exit_rcu(); \ irq_exit_rcu(); \
instrumentation_end(); \ instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \ idtentry_exit(regs, state); \
} \ } \
\ \
static __always_inline void __##func(struct pt_regs *regs, u8 vector) static __always_inline void __##func(struct pt_regs *regs, u8 vector)
...@@ -241,7 +248,7 @@ static void __##func(struct pt_regs *regs); \ ...@@ -241,7 +248,7 @@ static void __##func(struct pt_regs *regs); \
\ \
__visible noinstr void func(struct pt_regs *regs) \ __visible noinstr void func(struct pt_regs *regs) \
{ \ { \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \ idtentry_state_t state = idtentry_enter(regs); \
\ \
instrumentation_begin(); \ instrumentation_begin(); \
irq_enter_rcu(); \ irq_enter_rcu(); \
...@@ -249,7 +256,7 @@ __visible noinstr void func(struct pt_regs *regs) \ ...@@ -249,7 +256,7 @@ __visible noinstr void func(struct pt_regs *regs) \
run_on_irqstack_cond(__##func, regs, regs); \ run_on_irqstack_cond(__##func, regs, regs); \
irq_exit_rcu(); \ irq_exit_rcu(); \
instrumentation_end(); \ instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \ idtentry_exit(regs, state); \
} \ } \
\ \
static noinline void __##func(struct pt_regs *regs) static noinline void __##func(struct pt_regs *regs)
...@@ -270,7 +277,7 @@ static __always_inline void __##func(struct pt_regs *regs); \ ...@@ -270,7 +277,7 @@ static __always_inline void __##func(struct pt_regs *regs); \
\ \
__visible noinstr void func(struct pt_regs *regs) \ __visible noinstr void func(struct pt_regs *regs) \
{ \ { \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \ idtentry_state_t state = idtentry_enter(regs); \
\ \
instrumentation_begin(); \ instrumentation_begin(); \
__irq_enter_raw(); \ __irq_enter_raw(); \
...@@ -278,7 +285,7 @@ __visible noinstr void func(struct pt_regs *regs) \ ...@@ -278,7 +285,7 @@ __visible noinstr void func(struct pt_regs *regs) \
__##func (regs); \ __##func (regs); \
__irq_exit_raw(); \ __irq_exit_raw(); \
instrumentation_end(); \ instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \ idtentry_exit(regs, state); \
} \ } \
\ \
static __always_inline void __##func(struct pt_regs *regs) static __always_inline void __##func(struct pt_regs *regs)
......
...@@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags); ...@@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token) noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
{ {
u32 reason = kvm_read_and_reset_apf_flags(); u32 reason = kvm_read_and_reset_apf_flags();
bool rcu_exit; idtentry_state_t state;
switch (reason) { switch (reason) {
case KVM_PV_REASON_PAGE_NOT_PRESENT: case KVM_PV_REASON_PAGE_NOT_PRESENT:
...@@ -243,7 +243,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token) ...@@ -243,7 +243,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
return false; return false;
} }
rcu_exit = idtentry_enter_cond_rcu(regs); state = idtentry_enter(regs);
instrumentation_begin(); instrumentation_begin();
/* /*
...@@ -264,7 +264,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token) ...@@ -264,7 +264,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
} }
instrumentation_end(); instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit); idtentry_exit(regs, state);
return true; return true;
} }
......
...@@ -330,7 +330,6 @@ static noinstr void default_do_nmi(struct pt_regs *regs) ...@@ -330,7 +330,6 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
__this_cpu_write(last_nmi_rip, regs->ip); __this_cpu_write(last_nmi_rip, regs->ip);
instrumentation_begin(); instrumentation_begin();
trace_hardirqs_off_finish();
handled = nmi_handle(NMI_LOCAL, regs); handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled); __this_cpu_add(nmi_stats.normal, handled);
...@@ -417,8 +416,6 @@ static noinstr void default_do_nmi(struct pt_regs *regs) ...@@ -417,8 +416,6 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
unknown_nmi_error(reason, regs); unknown_nmi_error(reason, regs);
out: out:
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end(); instrumentation_end();
} }
...@@ -478,6 +475,8 @@ static DEFINE_PER_CPU(unsigned long, nmi_dr7); ...@@ -478,6 +475,8 @@ static DEFINE_PER_CPU(unsigned long, nmi_dr7);
DEFINE_IDTENTRY_RAW(exc_nmi) DEFINE_IDTENTRY_RAW(exc_nmi)
{ {
bool irq_state;
if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id())) if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
return; return;
...@@ -491,14 +490,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi) ...@@ -491,14 +490,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
this_cpu_write(nmi_dr7, local_db_save()); this_cpu_write(nmi_dr7, local_db_save());
nmi_enter(); irq_state = idtentry_enter_nmi(regs);
inc_irq_stat(__nmi_count); inc_irq_stat(__nmi_count);
if (!ignore_nmis) if (!ignore_nmis)
default_do_nmi(regs); default_do_nmi(regs);
nmi_exit(); idtentry_exit_nmi(regs, irq_state);
local_db_restore(this_cpu_read(nmi_dr7)); local_db_restore(this_cpu_read(nmi_dr7));
......
...@@ -245,7 +245,7 @@ static noinstr bool handle_bug(struct pt_regs *regs) ...@@ -245,7 +245,7 @@ static noinstr bool handle_bug(struct pt_regs *regs)
DEFINE_IDTENTRY_RAW(exc_invalid_op) DEFINE_IDTENTRY_RAW(exc_invalid_op)
{ {
bool rcu_exit; idtentry_state_t state;
/* /*
* We use UD2 as a short encoding for 'CALL __WARN', as such * We use UD2 as a short encoding for 'CALL __WARN', as such
...@@ -255,11 +255,11 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op) ...@@ -255,11 +255,11 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
if (!user_mode(regs) && handle_bug(regs)) if (!user_mode(regs) && handle_bug(regs))
return; return;
rcu_exit = idtentry_enter_cond_rcu(regs); state = idtentry_enter(regs);
instrumentation_begin(); instrumentation_begin();
handle_invalid_op(regs); handle_invalid_op(regs);
instrumentation_end(); instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit); idtentry_exit(regs, state);
} }
DEFINE_IDTENTRY(exc_coproc_segment_overrun) DEFINE_IDTENTRY(exc_coproc_segment_overrun)
...@@ -405,7 +405,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault) ...@@ -405,7 +405,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
} }
#endif #endif
nmi_enter(); idtentry_enter_nmi(regs);
instrumentation_begin(); instrumentation_begin();
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV); notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
...@@ -651,15 +651,12 @@ DEFINE_IDTENTRY_RAW(exc_int3) ...@@ -651,15 +651,12 @@ DEFINE_IDTENTRY_RAW(exc_int3)
instrumentation_end(); instrumentation_end();
idtentry_exit_user(regs); idtentry_exit_user(regs);
} else { } else {
nmi_enter(); bool irq_state = idtentry_enter_nmi(regs);
instrumentation_begin(); instrumentation_begin();
trace_hardirqs_off_finish();
if (!do_int3(regs)) if (!do_int3(regs))
die("int3", regs, 0); die("int3", regs, 0);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end(); instrumentation_end();
nmi_exit(); idtentry_exit_nmi(regs, irq_state);
} }
} }
...@@ -867,9 +864,8 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user) ...@@ -867,9 +864,8 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
static __always_inline void exc_debug_kernel(struct pt_regs *regs, static __always_inline void exc_debug_kernel(struct pt_regs *regs,
unsigned long dr6) unsigned long dr6)
{ {
nmi_enter(); bool irq_state = idtentry_enter_nmi(regs);
instrumentation_begin(); instrumentation_begin();
trace_hardirqs_off_finish();
/* /*
* If something gets miswired and we end up here for a user mode * If something gets miswired and we end up here for a user mode
...@@ -886,10 +882,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs, ...@@ -886,10 +882,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
handle_debug(regs, dr6, false); handle_debug(regs, dr6, false);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end(); instrumentation_end();
nmi_exit(); idtentry_exit_nmi(regs, irq_state);
} }
static __always_inline void exc_debug_user(struct pt_regs *regs, static __always_inline void exc_debug_user(struct pt_regs *regs,
...@@ -905,6 +899,7 @@ static __always_inline void exc_debug_user(struct pt_regs *regs, ...@@ -905,6 +899,7 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
instrumentation_begin(); instrumentation_begin();
handle_debug(regs, dr6, true); handle_debug(regs, dr6, true);
instrumentation_end(); instrumentation_end();
idtentry_exit_user(regs); idtentry_exit_user(regs);
} }
......
...@@ -1377,7 +1377,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code, ...@@ -1377,7 +1377,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
{ {
unsigned long address = read_cr2(); unsigned long address = read_cr2();
bool rcu_exit; idtentry_state_t state;
prefetchw(&current->mm->mmap_lock); prefetchw(&current->mm->mmap_lock);
...@@ -1412,11 +1412,11 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) ...@@ -1412,11 +1412,11 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
* code reenabled RCU to avoid subsequent wreckage which helps * code reenabled RCU to avoid subsequent wreckage which helps
* debugability. * debugability.
*/ */
rcu_exit = idtentry_enter_cond_rcu(regs); state = idtentry_enter(regs);
instrumentation_begin(); instrumentation_begin();
handle_page_fault(regs, error_code, address); handle_page_fault(regs, error_code, address);
instrumentation_end(); instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit); idtentry_exit(regs, state);
} }
...@@ -135,7 +135,7 @@ static inline void cpa_inc_2m_checked(void) ...@@ -135,7 +135,7 @@ static inline void cpa_inc_2m_checked(void)
static inline void cpa_inc_4k_install(void) static inline void cpa_inc_4k_install(void)
{ {
cpa_4k_install++; data_race(cpa_4k_install++);
} }
static inline void cpa_inc_lp_sameprot(int level) static inline void cpa_inc_lp_sameprot(int level)
......
...@@ -19,8 +19,6 @@ ...@@ -19,8 +19,6 @@
#include <asm/cmpxchg.h> #include <asm/cmpxchg.h>
#include <asm/barrier.h> #include <asm/barrier.h>
#define ATOMIC_INIT(i) { (i) }
/* /*
* This Xtensa implementation assumes that the right mechanism * This Xtensa implementation assumes that the right mechanism
* for exclusion is for locking interrupts to level EXCM_LEVEL. * for exclusion is for locking interrupts to level EXCM_LEVEL.
......
...@@ -159,8 +159,6 @@ ATOMIC_OP(xor, ^) ...@@ -159,8 +159,6 @@ ATOMIC_OP(xor, ^)
* resource counting etc.. * resource counting etc..
*/ */
#define ATOMIC_INIT(i) { (i) }
/** /**
* atomic_read - read atomic variable * atomic_read - read atomic variable
* @v: pointer of type atomic_t * @v: pointer of type atomic_t
......
...@@ -11,6 +11,7 @@ ...@@ -11,6 +11,7 @@
#define __ASM_GENERIC_QSPINLOCK_H #define __ASM_GENERIC_QSPINLOCK_H
#include <asm-generic/qspinlock_types.h> #include <asm-generic/qspinlock_types.h>
#include <linux/atomic.h>
/** /**
* queued_spin_is_locked - is the spinlock locked? * queued_spin_is_locked - is the spinlock locked?
......
...@@ -9,15 +9,7 @@ ...@@ -9,15 +9,7 @@
#ifndef __ASM_GENERIC_QSPINLOCK_TYPES_H #ifndef __ASM_GENERIC_QSPINLOCK_TYPES_H
#define __ASM_GENERIC_QSPINLOCK_TYPES_H #define __ASM_GENERIC_QSPINLOCK_TYPES_H
/*
* Including atomic.h with PARAVIRT on will cause compilation errors because
* of recursive header file incluson via paravirt_types.h. So don't include
* it if PARAVIRT is on.
*/
#ifndef CONFIG_PARAVIRT
#include <linux/types.h> #include <linux/types.h>
#include <linux/atomic.h>
#endif
typedef struct qspinlock { typedef struct qspinlock {
union { union {
......
...@@ -111,32 +111,42 @@ extern void rcu_nmi_exit(void); ...@@ -111,32 +111,42 @@ extern void rcu_nmi_exit(void);
/* /*
* nmi_enter() can nest up to 15 times; see NMI_BITS. * nmi_enter() can nest up to 15 times; see NMI_BITS.
*/ */
#define nmi_enter() \ #define __nmi_enter() \
do { \ do { \
lockdep_off(); \
arch_nmi_enter(); \ arch_nmi_enter(); \
printk_nmi_enter(); \ printk_nmi_enter(); \
lockdep_off(); \
BUG_ON(in_nmi() == NMI_MASK); \ BUG_ON(in_nmi() == NMI_MASK); \
__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \ __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
rcu_nmi_enter(); \ } while (0)
#define nmi_enter() \
do { \
__nmi_enter(); \
lockdep_hardirq_enter(); \ lockdep_hardirq_enter(); \
rcu_nmi_enter(); \
instrumentation_begin(); \ instrumentation_begin(); \
ftrace_nmi_enter(); \ ftrace_nmi_enter(); \
instrumentation_end(); \ instrumentation_end(); \
} while (0) } while (0)
#define __nmi_exit() \
do { \
BUG_ON(!in_nmi()); \
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
printk_nmi_exit(); \
arch_nmi_exit(); \
lockdep_on(); \
} while (0)
#define nmi_exit() \ #define nmi_exit() \
do { \ do { \
instrumentation_begin(); \ instrumentation_begin(); \
ftrace_nmi_exit(); \ ftrace_nmi_exit(); \
instrumentation_end(); \ instrumentation_end(); \
lockdep_hardirq_exit(); \
rcu_nmi_exit(); \ rcu_nmi_exit(); \
BUG_ON(!in_nmi()); \ lockdep_hardirq_exit(); \
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \ __nmi_exit(); \
lockdep_on(); \
printk_nmi_exit(); \
arch_nmi_exit(); \
} while (0) } while (0)
#endif /* LINUX_HARDIRQ_H */ #endif /* LINUX_HARDIRQ_H */
...@@ -14,6 +14,7 @@ ...@@ -14,6 +14,7 @@
#include <linux/typecheck.h> #include <linux/typecheck.h>
#include <asm/irqflags.h> #include <asm/irqflags.h>
#include <asm/percpu.h>
/* Currently lockdep_softirqs_on/off is used only by lockdep */ /* Currently lockdep_softirqs_on/off is used only by lockdep */
#ifdef CONFIG_PROVE_LOCKING #ifdef CONFIG_PROVE_LOCKING
...@@ -31,18 +32,35 @@ ...@@ -31,18 +32,35 @@
#endif #endif
#ifdef CONFIG_TRACE_IRQFLAGS #ifdef CONFIG_TRACE_IRQFLAGS
/* Per-task IRQ trace events information. */
struct irqtrace_events {
unsigned int irq_events;
unsigned long hardirq_enable_ip;
unsigned long hardirq_disable_ip;
unsigned int hardirq_enable_event;
unsigned int hardirq_disable_event;
unsigned long softirq_disable_ip;
unsigned long softirq_enable_ip;
unsigned int softirq_disable_event;
unsigned int softirq_enable_event;
};
DECLARE_PER_CPU(int, hardirqs_enabled);
DECLARE_PER_CPU(int, hardirq_context);
extern void trace_hardirqs_on_prepare(void); extern void trace_hardirqs_on_prepare(void);
extern void trace_hardirqs_off_finish(void); extern void trace_hardirqs_off_finish(void);
extern void trace_hardirqs_on(void); extern void trace_hardirqs_on(void);
extern void trace_hardirqs_off(void); extern void trace_hardirqs_off(void);
# define lockdep_hardirq_context(p) ((p)->hardirq_context) # define lockdep_hardirq_context() (this_cpu_read(hardirq_context))
# define lockdep_softirq_context(p) ((p)->softirq_context) # define lockdep_softirq_context(p) ((p)->softirq_context)
# define lockdep_hardirqs_enabled(p) ((p)->hardirqs_enabled) # define lockdep_hardirqs_enabled() (this_cpu_read(hardirqs_enabled))
# define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled) # define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled)
# define lockdep_hardirq_enter() \ # define lockdep_hardirq_enter() \
do { \ do { \
if (!current->hardirq_context++) \ if (this_cpu_inc_return(hardirq_context) == 1) \
current->hardirq_threaded = 0; \ current->hardirq_threaded = 0; \
} while (0) } while (0)
# define lockdep_hardirq_threaded() \ # define lockdep_hardirq_threaded() \
do { \ do { \
...@@ -50,7 +68,7 @@ do { \ ...@@ -50,7 +68,7 @@ do { \
} while (0) } while (0)
# define lockdep_hardirq_exit() \ # define lockdep_hardirq_exit() \
do { \ do { \
current->hardirq_context--; \ this_cpu_dec(hardirq_context); \
} while (0) } while (0)
# define lockdep_softirq_enter() \ # define lockdep_softirq_enter() \
do { \ do { \
...@@ -104,9 +122,9 @@ do { \ ...@@ -104,9 +122,9 @@ do { \
# define trace_hardirqs_off_finish() do { } while (0) # define trace_hardirqs_off_finish() do { } while (0)
# define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_on() do { } while (0)
# define trace_hardirqs_off() do { } while (0) # define trace_hardirqs_off() do { } while (0)
# define lockdep_hardirq_context(p) 0 # define lockdep_hardirq_context() 0
# define lockdep_softirq_context(p) 0 # define lockdep_softirq_context(p) 0
# define lockdep_hardirqs_enabled(p) 0 # define lockdep_hardirqs_enabled() 0
# define lockdep_softirqs_enabled(p) 0 # define lockdep_softirqs_enabled(p) 0
# define lockdep_hardirq_enter() do { } while (0) # define lockdep_hardirq_enter() do { } while (0)
# define lockdep_hardirq_threaded() do { } while (0) # define lockdep_hardirq_threaded() do { } while (0)
......
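The hardirq trace state moves from per-task fields to the per-CPU hardirqs_enabled/hardirq_context variables, and the lockdep predicates lose their task argument. A small sketch of querying the new per-CPU form; report_irq_ctx() is a made-up helper:

#include <linux/irqflags.h>
#include <linux/printk.h>

/* Illustrative: the predicates now describe the current CPU, not a given task. */
static void report_irq_ctx(void)
{
	pr_debug("hardirq context: %d, hardirqs enabled: %d\n",
		 lockdep_hardirq_context(), lockdep_hardirqs_enabled());
}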
...@@ -10,33 +10,15 @@ ...@@ -10,33 +10,15 @@
#ifndef __LINUX_LOCKDEP_H #ifndef __LINUX_LOCKDEP_H
#define __LINUX_LOCKDEP_H #define __LINUX_LOCKDEP_H
#include <linux/lockdep_types.h>
#include <asm/percpu.h>
struct task_struct; struct task_struct;
struct lockdep_map;
/* for sysctl */ /* for sysctl */
extern int prove_locking; extern int prove_locking;
extern int lock_stat; extern int lock_stat;
#define MAX_LOCKDEP_SUBCLASSES 8UL
#include <linux/types.h>
enum lockdep_wait_type {
LD_WAIT_INV = 0, /* not checked, catch all */
LD_WAIT_FREE, /* wait free, rcu etc.. */
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
#else
LD_WAIT_CONFIG = LD_WAIT_SPIN,
#endif
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
LD_WAIT_MAX, /* must be last */
};
#ifdef CONFIG_LOCKDEP #ifdef CONFIG_LOCKDEP
#include <linux/linkage.h> #include <linux/linkage.h>
...@@ -44,147 +26,6 @@ enum lockdep_wait_type { ...@@ -44,147 +26,6 @@ enum lockdep_wait_type {
#include <linux/debug_locks.h> #include <linux/debug_locks.h>
#include <linux/stacktrace.h> #include <linux/stacktrace.h>
/*
* We'd rather not expose kernel/lockdep_states.h this wide, but we do need
* the total number of states... :-(
*/
#define XXX_LOCK_USAGE_STATES (1+2*4)
/*
* NR_LOCKDEP_CACHING_CLASSES ... Number of classes
* cached in the instance of lockdep_map
*
 * Currently main class (subclass == 0) and single depth subclass
* are cached in lockdep_map. This optimization is mainly targeting
* on rq->lock. double_rq_lock() acquires this highly competitive with
* single depth.
*/
#define NR_LOCKDEP_CACHING_CLASSES 2
/*
* A lockdep key is associated with each lock object. For static locks we use
* the lock address itself as the key. Dynamically allocated lock objects can
* have a statically or dynamically allocated key. Dynamically allocated lock
* keys must be registered before being used and must be unregistered before
* the key memory is freed.
*/
struct lockdep_subclass_key {
char __one_byte;
} __attribute__ ((__packed__));
/* hash_entry is used to keep track of dynamically allocated keys. */
struct lock_class_key {
union {
struct hlist_node hash_entry;
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};
};
extern struct lock_class_key __lockdep_no_validate__;
struct lock_trace;
#define LOCKSTAT_POINTS 4
/*
* The lock-class itself. The order of the structure members matters.
* reinit_class() zeroes the key member and all subsequent members.
*/
struct lock_class {
/*
* class-hash:
*/
struct hlist_node hash_entry;
/*
* Entry in all_lock_classes when in use. Entry in free_lock_classes
* when not in use. Instances that are being freed are on one of the
* zapped_classes lists.
*/
struct list_head lock_entry;
/*
* These fields represent a directed graph of lock dependencies,
* to every node we attach a list of "forward" and a list of
* "backward" graph nodes.
*/
struct list_head locks_after, locks_before;
const struct lockdep_subclass_key *key;
unsigned int subclass;
unsigned int dep_gen_id;
/*
* IRQ/softirq usage tracking bits:
*/
unsigned long usage_mask;
const struct lock_trace *usage_traces[XXX_LOCK_USAGE_STATES];
/*
* Generation counter, when doing certain classes of graph walking,
* to ensure that we check one node only once:
*/
int name_version;
const char *name;
short wait_type_inner;
short wait_type_outer;
#ifdef CONFIG_LOCK_STAT
unsigned long contention_point[LOCKSTAT_POINTS];
unsigned long contending_point[LOCKSTAT_POINTS];
#endif
} __no_randomize_layout;
#ifdef CONFIG_LOCK_STAT
struct lock_time {
s64 min;
s64 max;
s64 total;
unsigned long nr;
};
enum bounce_type {
bounce_acquired_write,
bounce_acquired_read,
bounce_contended_write,
bounce_contended_read,
nr_bounce_types,
bounce_acquired = bounce_acquired_write,
bounce_contended = bounce_contended_write,
};
struct lock_class_stats {
unsigned long contention_point[LOCKSTAT_POINTS];
unsigned long contending_point[LOCKSTAT_POINTS];
struct lock_time read_waittime;
struct lock_time write_waittime;
struct lock_time read_holdtime;
struct lock_time write_holdtime;
unsigned long bounces[nr_bounce_types];
};
struct lock_class_stats lock_stats(struct lock_class *class);
void clear_lock_stats(struct lock_class *class);
#endif
/*
* Map the lock object (the lock instance) to the lock-class object.
* This is embedded into specific lock instances:
*/
struct lockdep_map {
struct lock_class_key *key;
struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES];
const char *name;
short wait_type_outer; /* can be taken in this context */
short wait_type_inner; /* presents this context */
#ifdef CONFIG_LOCK_STAT
int cpu;
unsigned long ip;
#endif
};
static inline void lockdep_copy_map(struct lockdep_map *to, static inline void lockdep_copy_map(struct lockdep_map *to,
struct lockdep_map *from) struct lockdep_map *from)
{ {
...@@ -440,8 +281,6 @@ static inline void lock_set_subclass(struct lockdep_map *lock, ...@@ -440,8 +281,6 @@ static inline void lock_set_subclass(struct lockdep_map *lock,
extern void lock_downgrade(struct lockdep_map *lock, unsigned long ip); extern void lock_downgrade(struct lockdep_map *lock, unsigned long ip);
struct pin_cookie { unsigned int val; };
#define NIL_COOKIE (struct pin_cookie){ .val = 0U, } #define NIL_COOKIE (struct pin_cookie){ .val = 0U, }
extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock); extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
...@@ -520,10 +359,6 @@ static inline void lockdep_set_selftest_task(struct task_struct *task) ...@@ -520,10 +359,6 @@ static inline void lockdep_set_selftest_task(struct task_struct *task)
# define lockdep_reset() do { debug_locks = 1; } while (0) # define lockdep_reset() do { debug_locks = 1; } while (0)
# define lockdep_free_key_range(start, size) do { } while (0) # define lockdep_free_key_range(start, size) do { } while (0)
# define lockdep_sys_exit() do { } while (0) # define lockdep_sys_exit() do { } while (0)
/*
* The class key takes no space if lockdep is disabled:
*/
struct lock_class_key { };
static inline void lockdep_register_key(struct lock_class_key *key) static inline void lockdep_register_key(struct lock_class_key *key)
{ {
...@@ -533,11 +368,6 @@ static inline void lockdep_unregister_key(struct lock_class_key *key) ...@@ -533,11 +368,6 @@ static inline void lockdep_unregister_key(struct lock_class_key *key)
{ {
} }
/*
* The lockdep_map takes no space if lockdep is disabled:
*/
struct lockdep_map { };
#define lockdep_depth(tsk) (0) #define lockdep_depth(tsk) (0)
#define lockdep_is_held_type(l, r) (1) #define lockdep_is_held_type(l, r) (1)
...@@ -549,8 +379,6 @@ struct lockdep_map { }; ...@@ -549,8 +379,6 @@ struct lockdep_map { };
#define lockdep_recursing(tsk) (0) #define lockdep_recursing(tsk) (0)
struct pin_cookie { };
#define NIL_COOKIE (struct pin_cookie){ } #define NIL_COOKIE (struct pin_cookie){ }
#define lockdep_pin_lock(l) ({ struct pin_cookie cookie = { }; cookie; }) #define lockdep_pin_lock(l) ({ struct pin_cookie cookie = { }; cookie; })
...@@ -703,38 +531,58 @@ do { \ ...@@ -703,38 +531,58 @@ do { \
lock_release(&(lock)->dep_map, _THIS_IP_); \ lock_release(&(lock)->dep_map, _THIS_IP_); \
} while (0) } while (0)
#define lockdep_assert_irqs_enabled() do { \ DECLARE_PER_CPU(int, hardirqs_enabled);
WARN_ONCE(debug_locks && !current->lockdep_recursion && \ DECLARE_PER_CPU(int, hardirq_context);
!current->hardirqs_enabled, \
"IRQs not enabled as expected\n"); \
} while (0)
#define lockdep_assert_irqs_disabled() do { \ #define lockdep_assert_irqs_enabled() \
WARN_ONCE(debug_locks && !current->lockdep_recursion && \ do { \
current->hardirqs_enabled, \ WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled)); \
"IRQs not disabled as expected\n"); \ } while (0)
} while (0)
#define lockdep_assert_in_irq() do { \ #define lockdep_assert_irqs_disabled() \
WARN_ONCE(debug_locks && !current->lockdep_recursion && \ do { \
!current->hardirq_context, \ WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled)); \
"Not in hardirq as expected\n"); \ } while (0)
} while (0)
#define lockdep_assert_in_irq() \
do { \
WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context)); \
} while (0)
#define lockdep_assert_preemption_enabled() \
do { \
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
debug_locks && \
(preempt_count() != 0 || \
!this_cpu_read(hardirqs_enabled))); \
} while (0)
#define lockdep_assert_preemption_disabled() \
do { \
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
debug_locks && \
(preempt_count() == 0 && \
this_cpu_read(hardirqs_enabled))); \
} while (0)
#else #else
# define might_lock(lock) do { } while (0) # define might_lock(lock) do { } while (0)
# define might_lock_read(lock) do { } while (0) # define might_lock_read(lock) do { } while (0)
# define might_lock_nested(lock, subclass) do { } while (0) # define might_lock_nested(lock, subclass) do { } while (0)
# define lockdep_assert_irqs_enabled() do { } while (0) # define lockdep_assert_irqs_enabled() do { } while (0)
# define lockdep_assert_irqs_disabled() do { } while (0) # define lockdep_assert_irqs_disabled() do { } while (0)
# define lockdep_assert_in_irq() do { } while (0) # define lockdep_assert_in_irq() do { } while (0)
# define lockdep_assert_preemption_enabled() do { } while (0)
# define lockdep_assert_preemption_disabled() do { } while (0)
#endif #endif
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
# define lockdep_assert_RT_in_threaded_ctx() do { \ # define lockdep_assert_RT_in_threaded_ctx() do { \
WARN_ONCE(debug_locks && !current->lockdep_recursion && \ WARN_ONCE(debug_locks && !current->lockdep_recursion && \
current->hardirq_context && \ lockdep_hardirq_context() && \
!(current->hardirq_threaded || current->irq_config), \ !(current->hardirq_threaded || current->irq_config), \
"Not in threaded context on PREEMPT_RT as expected\n"); \ "Not in threaded context on PREEMPT_RT as expected\n"); \
} while (0) } while (0)
......
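The new lockdep_assert_preemption_disabled()/enabled() helpers let code document (and, under PROVE_LOCKING, verify) its preemption expectations instead of open-coding preempt_count() checks. A minimal sketch guarding a per-CPU counter; widget_events and widget_count_event() are hypothetical:

#include <linux/lockdep.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, widget_events);

static void widget_count_event(void)
{
	/* Callers must have preemption disabled before touching the counter. */
	lockdep_assert_preemption_disabled();
	__this_cpu_inc(widget_events);
}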
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Runtime locking correctness validator
*
* Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
* Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
*
* see Documentation/locking/lockdep-design.rst for more details.
*/
#ifndef __LINUX_LOCKDEP_TYPES_H
#define __LINUX_LOCKDEP_TYPES_H
#include <linux/types.h>
#define MAX_LOCKDEP_SUBCLASSES 8UL
enum lockdep_wait_type {
LD_WAIT_INV = 0, /* not checked, catch all */
LD_WAIT_FREE, /* wait free, rcu etc.. */
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
#else
LD_WAIT_CONFIG = LD_WAIT_SPIN,
#endif
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
LD_WAIT_MAX, /* must be last */
};
#ifdef CONFIG_LOCKDEP
/*
* We'd rather not expose kernel/lockdep_states.h this wide, but we do need
* the total number of states... :-(
*/
#define XXX_LOCK_USAGE_STATES (1+2*4)
/*
* NR_LOCKDEP_CACHING_CLASSES ... Number of classes
* cached in the instance of lockdep_map
*
 * Currently main class (subclass == 0) and single depth subclass
* are cached in lockdep_map. This optimization is mainly targeting
* on rq->lock. double_rq_lock() acquires this highly competitive with
* single depth.
*/
#define NR_LOCKDEP_CACHING_CLASSES 2
/*
* A lockdep key is associated with each lock object. For static locks we use
* the lock address itself as the key. Dynamically allocated lock objects can
* have a statically or dynamically allocated key. Dynamically allocated lock
* keys must be registered before being used and must be unregistered before
* the key memory is freed.
*/
struct lockdep_subclass_key {
char __one_byte;
} __attribute__ ((__packed__));
/* hash_entry is used to keep track of dynamically allocated keys. */
struct lock_class_key {
union {
struct hlist_node hash_entry;
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};
};
extern struct lock_class_key __lockdep_no_validate__;
struct lock_trace;
#define LOCKSTAT_POINTS 4
/*
* The lock-class itself. The order of the structure members matters.
* reinit_class() zeroes the key member and all subsequent members.
*/
struct lock_class {
/*
* class-hash:
*/
struct hlist_node hash_entry;
/*
* Entry in all_lock_classes when in use. Entry in free_lock_classes
* when not in use. Instances that are being freed are on one of the
* zapped_classes lists.
*/
struct list_head lock_entry;
/*
* These fields represent a directed graph of lock dependencies,
* to every node we attach a list of "forward" and a list of
* "backward" graph nodes.
*/
struct list_head locks_after, locks_before;
const struct lockdep_subclass_key *key;
unsigned int subclass;
unsigned int dep_gen_id;
/*
* IRQ/softirq usage tracking bits:
*/
unsigned long usage_mask;
const struct lock_trace *usage_traces[XXX_LOCK_USAGE_STATES];
/*
* Generation counter, when doing certain classes of graph walking,
* to ensure that we check one node only once:
*/
int name_version;
const char *name;
short wait_type_inner;
short wait_type_outer;
#ifdef CONFIG_LOCK_STAT
unsigned long contention_point[LOCKSTAT_POINTS];
unsigned long contending_point[LOCKSTAT_POINTS];
#endif
} __no_randomize_layout;
#ifdef CONFIG_LOCK_STAT
struct lock_time {
s64 min;
s64 max;
s64 total;
unsigned long nr;
};
enum bounce_type {
bounce_acquired_write,
bounce_acquired_read,
bounce_contended_write,
bounce_contended_read,
nr_bounce_types,
bounce_acquired = bounce_acquired_write,
bounce_contended = bounce_contended_write,
};
struct lock_class_stats {
unsigned long contention_point[LOCKSTAT_POINTS];
unsigned long contending_point[LOCKSTAT_POINTS];
struct lock_time read_waittime;
struct lock_time write_waittime;
struct lock_time read_holdtime;
struct lock_time write_holdtime;
unsigned long bounces[nr_bounce_types];
};
struct lock_class_stats lock_stats(struct lock_class *class);
void clear_lock_stats(struct lock_class *class);
#endif
/*
* Map the lock object (the lock instance) to the lock-class object.
* This is embedded into specific lock instances:
*/
struct lockdep_map {
struct lock_class_key *key;
struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES];
const char *name;
short wait_type_outer; /* can be taken in this context */
short wait_type_inner; /* presents this context */
#ifdef CONFIG_LOCK_STAT
int cpu;
unsigned long ip;
#endif
};
struct pin_cookie { unsigned int val; };
#else /* !CONFIG_LOCKDEP */
/*
* The class key takes no space if lockdep is disabled:
*/
struct lock_class_key { };
/*
* The lockdep_map takes no space if lockdep is disabled:
*/
struct lockdep_map { };
struct pin_cookie { };
#endif /* !LOCKDEP */
#endif /* __LINUX_LOCKDEP_TYPES_H */
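The point of the split is that headers which only need to embed the lockdep types (struct lockdep_map, struct lock_class_key) can include <linux/lockdep_types.h> without dragging in the full lockdep API and its header dependencies. A sketch of such a types-only header; foo_lock_types.h and struct foo_lock are illustrative:

#ifndef _FOO_LOCK_TYPES_H
#define _FOO_LOCK_TYPES_H

#include <linux/lockdep_types.h>

struct foo_lock {
	unsigned int	state;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
};

#endif /* _FOO_LOCK_TYPES_H */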
...@@ -248,6 +248,8 @@ static inline void __list_splice_init_rcu(struct list_head *list, ...@@ -248,6 +248,8 @@ static inline void __list_splice_init_rcu(struct list_head *list,
*/ */
sync(); sync();
ASSERT_EXCLUSIVE_ACCESS(*first);
ASSERT_EXCLUSIVE_ACCESS(*last);
/* /*
* Readers are finished with the source list, so perform splice. * Readers are finished with the source list, so perform splice.
......
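The two ASSERT_EXCLUSIVE_ACCESS() annotations tell KCSAN that, once readers are done with the source list, the spliced nodes must not be touched concurrently; any racing access is then reported. A hedged sketch of the same annotation on a hypothetical single-owner handoff (struct widget_buf and consume_buffer() are made up):

#include <linux/kcsan-checks.h>
#include <linux/types.h>

struct widget_buf {
	u64 data[16];
};

/* After the handoff point, *buf is owned exclusively by this context. */
static void consume_buffer(struct widget_buf *buf)
{
	ASSERT_EXCLUSIVE_ACCESS(*buf);
	/* ... single-owner processing of *buf ... */
}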
...@@ -60,39 +60,39 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) ...@@ -60,39 +60,39 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
} }
#define RWSEM_UNLOCKED_VALUE 0L #define RWSEM_UNLOCKED_VALUE 0L
#define __RWSEM_INIT_COUNT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE) #define __RWSEM_COUNT_INIT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
/* Common initializer macros and functions */ /* Common initializer macros and functions */
#ifdef CONFIG_DEBUG_LOCK_ALLOC #ifdef CONFIG_DEBUG_LOCK_ALLOC
# define __RWSEM_DEP_MAP_INIT(lockname) \ # define __RWSEM_DEP_MAP_INIT(lockname) \
, .dep_map = { \ .dep_map = { \
.name = #lockname, \ .name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \ .wait_type_inner = LD_WAIT_SLEEP, \
} },
#else #else
# define __RWSEM_DEP_MAP_INIT(lockname) # define __RWSEM_DEP_MAP_INIT(lockname)
#endif #endif
#ifdef CONFIG_DEBUG_RWSEMS #ifdef CONFIG_DEBUG_RWSEMS
# define __DEBUG_RWSEM_INITIALIZER(lockname) , .magic = &lockname # define __RWSEM_DEBUG_INIT(lockname) .magic = &lockname,
#else #else
# define __DEBUG_RWSEM_INITIALIZER(lockname) # define __RWSEM_DEBUG_INIT(lockname)
#endif #endif
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED #define __RWSEM_OPT_INIT(lockname) .osq = OSQ_LOCK_UNLOCKED,
#else #else
#define __RWSEM_OPT_INIT(lockname) #define __RWSEM_OPT_INIT(lockname)
#endif #endif
#define __RWSEM_INITIALIZER(name) \ #define __RWSEM_INITIALIZER(name) \
{ __RWSEM_INIT_COUNT(name), \ { __RWSEM_COUNT_INIT(name), \
.owner = ATOMIC_LONG_INIT(0), \ .owner = ATOMIC_LONG_INIT(0), \
.wait_list = LIST_HEAD_INIT((name).wait_list), \
.wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock) \
__RWSEM_OPT_INIT(name) \ __RWSEM_OPT_INIT(name) \
__DEBUG_RWSEM_INITIALIZER(name) \ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock),\
.wait_list = LIST_HEAD_INIT((name).wait_list), \
__RWSEM_DEBUG_INIT(name) \
__RWSEM_DEP_MAP_INIT(name) } __RWSEM_DEP_MAP_INIT(name) }
#define DECLARE_RWSEM(name) \ #define DECLARE_RWSEM(name) \
......
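The rwsem initializer changes only move the commas so each optional fragment terminates itself; a statically declared semaphore is used exactly as before. A short usage sketch with hypothetical names:

#include <linux/rwsem.h>

static DECLARE_RWSEM(widget_rwsem);
static int widget_config;

static int widget_get_config(void)
{
	int val;

	down_read(&widget_rwsem);
	val = widget_config;
	up_read(&widget_rwsem);

	return val;
}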
...@@ -18,6 +18,7 @@ ...@@ -18,6 +18,7 @@
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/plist.h> #include <linux/plist.h>
#include <linux/hrtimer.h> #include <linux/hrtimer.h>
#include <linux/irqflags.h>
#include <linux/seccomp.h> #include <linux/seccomp.h>
#include <linux/nodemask.h> #include <linux/nodemask.h>
#include <linux/rcupdate.h> #include <linux/rcupdate.h>
...@@ -980,19 +981,9 @@ struct task_struct { ...@@ -980,19 +981,9 @@ struct task_struct {
#endif #endif
#ifdef CONFIG_TRACE_IRQFLAGS #ifdef CONFIG_TRACE_IRQFLAGS
unsigned int irq_events; struct irqtrace_events irqtrace;
unsigned int hardirq_threaded; unsigned int hardirq_threaded;
unsigned long hardirq_enable_ip;
unsigned long hardirq_disable_ip;
unsigned int hardirq_enable_event;
unsigned int hardirq_disable_event;
int hardirqs_enabled;
int hardirq_context;
u64 hardirq_chain_key; u64 hardirq_chain_key;
unsigned long softirq_disable_ip;
unsigned long softirq_enable_ip;
unsigned int softirq_disable_event;
unsigned int softirq_enable_event;
int softirqs_enabled; int softirqs_enabled;
int softirq_context; int softirq_context;
int irq_config; int irq_config;
...@@ -1193,8 +1184,12 @@ struct task_struct { ...@@ -1193,8 +1184,12 @@ struct task_struct {
#ifdef CONFIG_KASAN #ifdef CONFIG_KASAN
unsigned int kasan_depth; unsigned int kasan_depth;
#endif #endif
#ifdef CONFIG_KCSAN #ifdef CONFIG_KCSAN
struct kcsan_ctx kcsan_ctx; struct kcsan_ctx kcsan_ctx;
#ifdef CONFIG_TRACE_IRQFLAGS
struct irqtrace_events kcsan_save_irqtrace;
#endif
#endif #endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
......
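The per-task IRQ trace fields collapse into one struct irqtrace_events, so consumers would now go through current->irqtrace rather than the individual task_struct members. A hedged sketch (dump_task_irqtrace() is illustrative, not an existing helper):

#include <linux/sched.h>
#include <linux/printk.h>

#ifdef CONFIG_TRACE_IRQFLAGS
static void dump_task_irqtrace(struct task_struct *p)
{
	const struct irqtrace_events *t = &p->irqtrace;

	pr_debug("%s: hardirqs last enabled at %pS, last disabled at %pS\n",
		 p->comm, (void *)t->hardirq_enable_ip,
		 (void *)t->hardirq_disable_ip);
}
#endif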
/* SPDX-License-Identifier: GPL-2.0 */ /* SPDX-License-Identifier: GPL-2.0 */
#ifndef __LINUX_SEQLOCK_H #ifndef __LINUX_SEQLOCK_H
#define __LINUX_SEQLOCK_H #define __LINUX_SEQLOCK_H
/* /*
* Reader/writer consistent mechanism without starving writers. This type of * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
* lock for data where the reader wants a consistent set of information * lockless readers (read-only retry loops), and no writer starvation.
* and is willing to retry if the information changes. There are two types
* of readers:
* 1. Sequence readers which never block a writer but they may have to retry
* if a writer is in progress by detecting change in sequence number.
* Writers do not wait for a sequence reader.
* 2. Locking readers which will wait if a writer or another locking reader
* is in progress. A locking reader in progress will also block a writer
* from going forward. Unlike the regular rwlock, the read lock here is
* exclusive so that only one locking reader can get it.
*
* This is not as cache friendly as brlock. Also, this may not work well
* for data that contains pointers, because any writer could
* invalidate a pointer that a reader was following.
*
* Expected non-blocking reader usage:
* do {
* seq = read_seqbegin(&foo);
* ...
* } while (read_seqretry(&foo, seq));
*
* *
* On non-SMP the spin locks disappear but the writer still needs * See Documentation/locking/seqlock.rst
* to increment the sequence variables because an interrupt routine could
* change the state of the data.
* *
* Based on x86_64 vsyscall gettimeofday * Copyrights:
* by Keith Owens and Andrea Arcangeli * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
*/ */
#include <linux/spinlock.h> #include <linux/spinlock.h>
...@@ -41,8 +20,8 @@ ...@@ -41,8 +20,8 @@
#include <asm/processor.h> #include <asm/processor.h>
/* /*
* The seqlock interface does not prescribe a precise sequence of read * The seqlock seqcount_t interface does not prescribe a precise sequence of
* begin/retry/end. For readers, typically there is a call to * read begin/retry/end. For readers, typically there is a call to
* read_seqcount_begin() and read_seqcount_retry(), however, there are more * read_seqcount_begin() and read_seqcount_retry(), however, there are more
* esoteric cases which do not follow this pattern. * esoteric cases which do not follow this pattern.
* *
...@@ -50,16 +29,30 @@ ...@@ -50,16 +29,30 @@
* via seqcount_t under KCSAN: upon beginning a seq-reader critical section, * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
* pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
* atomics; if there is a matching read_seqcount_retry() call, no following * atomics; if there is a matching read_seqcount_retry() call, no following
* memory operations are considered atomic. Usage of seqlocks via seqlock_t * memory operations are considered atomic. Usage of the seqlock_t interface
* interface is not affected. * is not affected.
*/ */
#define KCSAN_SEQLOCK_REGION_MAX 1000 #define KCSAN_SEQLOCK_REGION_MAX 1000
/* /*
* Version using sequence counter only. * Sequence counters (seqcount_t)
* This can be used when code has its own mutex protecting the *
 * updating starting before the write_seqcount_begin() and ending * This is the raw counting mechanism, without any writer protection.
* after the write_seqcount_end(). *
* Write side critical sections must be serialized and non-preemptible.
*
* If readers can be invoked from hardirq or softirq contexts,
* interrupts or bottom halves must also be respectively disabled before
* entering the write section.
*
* This mechanism can't be used if the protected data contains pointers,
* as the writer can invalidate a pointer that a reader is following.
*
* If it's desired to automatically handle the sequence counter writer
* serialization and non-preemptibility requirements, use a sequential
* lock (seqlock_t) instead.
*
* See Documentation/locking/seqlock.rst
*/ */
typedef struct seqcount { typedef struct seqcount {
unsigned sequence; unsigned sequence;
...@@ -82,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name, ...@@ -82,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
# define SEQCOUNT_DEP_MAP_INIT(lockname) \ # define SEQCOUNT_DEP_MAP_INIT(lockname) \
.dep_map = { .name = #lockname } \ .dep_map = { .name = #lockname } \
/**
* seqcount_init() - runtime initializer for seqcount_t
* @s: Pointer to the seqcount_t instance
*/
# define seqcount_init(s) \ # define seqcount_init(s) \
do { \ do { \
static struct lock_class_key __key; \ static struct lock_class_key __key; \
...@@ -105,13 +102,15 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s) ...@@ -105,13 +102,15 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
# define seqcount_lockdep_reader_access(x) # define seqcount_lockdep_reader_access(x)
#endif #endif
#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)} /**
* SEQCNT_ZERO() - static initializer for seqcount_t
* @name: Name of the seqcount_t instance
*/
#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
/** /**
* __read_seqcount_begin - begin a seq-read critical section (without barrier) * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
* @s: pointer to seqcount_t * @s: Pointer to seqcount_t
* Returns: count to be passed to read_seqcount_retry
* *
* __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb() * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
* barrier. Callers should ensure that smp_rmb() or equivalent ordering is * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
...@@ -120,6 +119,8 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s) ...@@ -120,6 +119,8 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
* *
* Use carefully, only in critical code, and comment how the barrier is * Use carefully, only in critical code, and comment how the barrier is
* provided. * provided.
*
* Return: count to be passed to read_seqcount_retry()
*/ */
static inline unsigned __read_seqcount_begin(const seqcount_t *s) static inline unsigned __read_seqcount_begin(const seqcount_t *s)
{ {
...@@ -136,30 +137,10 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s) ...@@ -136,30 +137,10 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
} }
/** /**
* raw_read_seqcount - Read the raw seqcount * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
* @s: pointer to seqcount_t * @s: Pointer to seqcount_t
* Returns: count to be passed to read_seqcount_retry
* *
* raw_read_seqcount opens a read critical section of the given * Return: count to be passed to read_seqcount_retry()
* seqcount without any lockdep checking and without checking or
* masking the LSB. Calling code is responsible for handling that.
*/
static inline unsigned raw_read_seqcount(const seqcount_t *s)
{
unsigned ret = READ_ONCE(s->sequence);
smp_rmb();
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret;
}
/**
* raw_read_seqcount_begin - start seq-read critical section w/o lockdep
* @s: pointer to seqcount_t
* Returns: count to be passed to read_seqcount_retry
*
* raw_read_seqcount_begin opens a read critical section of the given
* seqcount, but without any lockdep checking. Validity of the critical
* section is tested by checking read_seqcount_retry function.
*/ */
static inline unsigned raw_read_seqcount_begin(const seqcount_t *s) static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
{ {
...@@ -169,13 +150,10 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s) ...@@ -169,13 +150,10 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
} }
/** /**
* read_seqcount_begin - begin a seq-read critical section * read_seqcount_begin() - begin a seqcount_t read critical section
* @s: pointer to seqcount_t * @s: Pointer to seqcount_t
* Returns: count to be passed to read_seqcount_retry
* *
* read_seqcount_begin opens a read critical section of the given seqcount. * Return: count to be passed to read_seqcount_retry()
* Validity of the critical section is tested by checking read_seqcount_retry
* function.
*/ */
static inline unsigned read_seqcount_begin(const seqcount_t *s) static inline unsigned read_seqcount_begin(const seqcount_t *s)
{ {
...@@ -184,32 +162,54 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s) ...@@ -184,32 +162,54 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
} }
/** /**
* raw_seqcount_begin - begin a seq-read critical section * raw_read_seqcount() - read the raw seqcount_t counter value
* @s: pointer to seqcount_t * @s: Pointer to seqcount_t
* Returns: count to be passed to read_seqcount_retry
* *
* raw_seqcount_begin opens a read critical section of the given seqcount. * raw_read_seqcount opens a read critical section of the given
* Validity of the critical section is tested by checking read_seqcount_retry * seqcount_t, without any lockdep checking, and without checking or
* function. * masking the sequence counter LSB. Calling code is responsible for
* handling that.
* *
* Unlike read_seqcount_begin(), this function will not wait for the count * Return: count to be passed to read_seqcount_retry()
* to stabilize. If a writer is active when we begin, we will fail the
* read_seqcount_retry() instead of stabilizing at the beginning of the
* critical section.
*/ */
static inline unsigned raw_seqcount_begin(const seqcount_t *s) static inline unsigned raw_read_seqcount(const seqcount_t *s)
{ {
unsigned ret = READ_ONCE(s->sequence); unsigned ret = READ_ONCE(s->sequence);
smp_rmb(); smp_rmb();
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret & ~1; return ret;
} }
/** /**
* __read_seqcount_retry - end a seq-read critical section (without barrier) * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
* @s: pointer to seqcount_t * lockdep and w/o counter stabilization
* @start: count, from read_seqcount_begin * @s: Pointer to seqcount_t
* Returns: 1 if retry is required, else 0 *
* raw_seqcount_begin opens a read critical section of the given
* seqcount_t. Unlike read_seqcount_begin(), this function will not wait
* for the count to stabilize. If a writer is active when it begins, it
* will fail the read_seqcount_retry() at the end of the read critical
* section instead of stabilizing at the beginning of it.
*
* Use this only in special kernel hot paths where the read section is
* small and has a high probability of success through other external
* means. It will save a single branching instruction.
*
* Return: count to be passed to read_seqcount_retry()
*/
static inline unsigned raw_seqcount_begin(const seqcount_t *s)
{
/*
* If the counter is odd, let read_seqcount_retry() fail
* by decrementing the counter.
*/
return raw_read_seqcount(s) & ~1;
}
/**
* __read_seqcount_retry() - end a seqcount_t read section w/o barrier
* @s: Pointer to seqcount_t
* @start: count, from read_seqcount_begin()
* *
* __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb() * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
* barrier. Callers should ensure that smp_rmb() or equivalent ordering is * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
...@@ -218,6 +218,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s) ...@@ -218,6 +218,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
* *
* Use carefully, only in critical code, and comment how the barrier is * Use carefully, only in critical code, and comment how the barrier is
* provided. * provided.
*
* Return: true if a read section retry is required, else false
*/ */
static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start) static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
{ {
...@@ -226,14 +228,15 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start) ...@@ -226,14 +228,15 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
} }
/** /**
* read_seqcount_retry - end a seq-read critical section * read_seqcount_retry() - end a seqcount_t read critical section
* @s: pointer to seqcount_t * @s: Pointer to seqcount_t
* @start: count, from read_seqcount_begin * @start: count, from read_seqcount_begin()
* Returns: 1 if retry is required, else 0
* *
* read_seqcount_retry closes a read critical section of the given seqcount. * read_seqcount_retry closes the read critical section of given
* If the critical section was invalid, it must be ignored (and typically * seqcount_t. If the critical section was invalid, it must be ignored
* retried). * (and typically retried).
*
* Return: true if a read section retry is required, else false
*/ */
static inline int read_seqcount_retry(const seqcount_t *s, unsigned start) static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
{ {
...@@ -241,8 +244,10 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start) ...@@ -241,8 +244,10 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
return __read_seqcount_retry(s, start); return __read_seqcount_retry(s, start);
} }
/**
* raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
* @s: Pointer to seqcount_t
*/
static inline void raw_write_seqcount_begin(seqcount_t *s) static inline void raw_write_seqcount_begin(seqcount_t *s)
{ {
kcsan_nestable_atomic_begin(); kcsan_nestable_atomic_begin();
...@@ -250,6 +255,10 @@ static inline void raw_write_seqcount_begin(seqcount_t *s) ...@@ -250,6 +255,10 @@ static inline void raw_write_seqcount_begin(seqcount_t *s)
smp_wmb(); smp_wmb();
} }
/**
* raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
* @s: Pointer to seqcount_t
*/
static inline void raw_write_seqcount_end(seqcount_t *s) static inline void raw_write_seqcount_end(seqcount_t *s)
{ {
smp_wmb(); smp_wmb();
...@@ -257,45 +266,104 @@ static inline void raw_write_seqcount_end(seqcount_t *s) ...@@ -257,45 +266,104 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
kcsan_nestable_atomic_end(); kcsan_nestable_atomic_end();
} }
static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
{
raw_write_seqcount_begin(s);
seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
}
/** /**
* raw_write_seqcount_barrier - do a seq write barrier * write_seqcount_begin_nested() - start a seqcount_t write section with
* @s: pointer to seqcount_t * custom lockdep nesting level
* @s: Pointer to seqcount_t
* @subclass: lockdep nesting level
* *
* This can be used to provide an ordering guarantee instead of the * See Documentation/locking/lockdep-design.rst
* usual consistency guarantee. It is one wmb cheaper, because we can */
* collapse the two back-to-back wmb()s. static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
{
lockdep_assert_preemption_disabled();
__write_seqcount_begin_nested(s, subclass);
}
/*
* A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
*
* Use for internal seqlock.h code where it's known that preemption is
* already disabled. For example, seqlock_t write side functions.
*/
static inline void __write_seqcount_begin(seqcount_t *s)
{
__write_seqcount_begin_nested(s, 0);
}
/**
* write_seqcount_begin() - start a seqcount_t write side critical section
* @s: Pointer to seqcount_t
*
* write_seqcount_begin opens a write side critical section of the given
* seqcount_t.
*
* Context: seqcount_t write side critical sections must be serialized and
* non-preemptible. If readers can be invoked from hardirq or softirq
* context, interrupts or bottom halves must be respectively disabled.
*/
static inline void write_seqcount_begin(seqcount_t *s)
{
write_seqcount_begin_nested(s, 0);
}
/**
* write_seqcount_end() - end a seqcount_t write side critical section
* @s: Pointer to seqcount_t
*
* The write section must've been opened with write_seqcount_begin().
*/
static inline void write_seqcount_end(seqcount_t *s)
{
seqcount_release(&s->dep_map, _RET_IP_);
raw_write_seqcount_end(s);
}
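To make the documented write-side rules concrete, here is a minimal seqcount_t sketch: the writer is assumed to be externally serialized (say, by a caller-held mutex) and disables preemption across the section, while the reader retries until it sees a stable snapshot. All names (foo_seq, foo_update(), foo_sum()) are illustrative:

#include <linux/preempt.h>
#include <linux/seqlock.h>
#include <linux/types.h>

static seqcount_t foo_seq = SEQCNT_ZERO(foo_seq);
static u64 foo_a, foo_b;

/* Writer: externally serialized, non-preemptible across the section. */
static void foo_update(u64 a, u64 b)
{
	preempt_disable();
	write_seqcount_begin(&foo_seq);
	foo_a = a;
	foo_b = b;
	write_seqcount_end(&foo_seq);
	preempt_enable();
}

/* Lockless reader: retry until a consistent snapshot is observed. */
static u64 foo_sum(void)
{
	unsigned seq;
	u64 sum;

	do {
		seq = read_seqcount_begin(&foo_seq);
		sum = foo_a + foo_b;
	} while (read_seqcount_retry(&foo_seq, seq));

	return sum;
}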
/**
* raw_write_seqcount_barrier() - do a seqcount_t write barrier
* @s: Pointer to seqcount_t
*
* This can be used to provide an ordering guarantee instead of the usual
* consistency guarantee. It is one wmb cheaper, because it can collapse
* the two back-to-back wmb()s.
* *
* Note that writes surrounding the barrier should be declared atomic (e.g. * Note that writes surrounding the barrier should be declared atomic (e.g.
* via WRITE_ONCE): a) to ensure the writes become visible to other threads * via WRITE_ONCE): a) to ensure the writes become visible to other threads
* atomically, avoiding compiler optimizations; b) to document which writes are * atomically, avoiding compiler optimizations; b) to document which writes are
* meant to propagate to the reader critical section. This is necessary because * meant to propagate to the reader critical section. This is necessary because
* neither writes before and after the barrier are enclosed in a seq-writer * neither writes before and after the barrier are enclosed in a seq-writer
* critical section that would ensure readers are aware of ongoing writes. * critical section that would ensure readers are aware of ongoing writes::
* *
* seqcount_t seq; * seqcount_t seq;
* bool X = true, Y = false; * bool X = true, Y = false;
* *
* void read(void) * void read(void)
* { * {
* bool x, y; * bool x, y;
* *
* do { * do {
* int s = read_seqcount_begin(&seq); * int s = read_seqcount_begin(&seq);
* *
* x = X; y = Y; * x = X; y = Y;
* *
* } while (read_seqcount_retry(&seq, s)); * } while (read_seqcount_retry(&seq, s));
* *
* BUG_ON(!x && !y); * BUG_ON(!x && !y);
* } * }
* *
* void write(void) * void write(void)
* { * {
* WRITE_ONCE(Y, true); * WRITE_ONCE(Y, true);
* *
* raw_write_seqcount_barrier(seq); * raw_write_seqcount_barrier(seq);
* *
* WRITE_ONCE(X, false); * WRITE_ONCE(X, false);
* } * }
*/ */
static inline void raw_write_seqcount_barrier(seqcount_t *s) static inline void raw_write_seqcount_barrier(seqcount_t *s)
...@@ -307,6 +375,37 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s) ...@@ -307,6 +375,37 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
kcsan_nestable_atomic_end(); kcsan_nestable_atomic_end();
} }
/**
* write_seqcount_invalidate() - invalidate in-progress seqcount_t read
* side operations
* @s: Pointer to seqcount_t
*
* After write_seqcount_invalidate, no seqcount_t read side operations
* will complete successfully and see data older than this.
*/
static inline void write_seqcount_invalidate(seqcount_t *s)
{
smp_wmb();
kcsan_nestable_atomic_begin();
s->sequence+=2;
kcsan_nestable_atomic_end();
}
/**
* raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
* @s: Pointer to seqcount_t
*
* Use seqcount_t latching to switch between two storage places protected
* by a sequence counter. Doing so allows having interruptible, preemptible,
* seqcount_t write side critical sections.
*
* Check raw_write_seqcount_latch() for more details and a full reader and
* writer usage example.
*
* Return: sequence counter raw value. Use the lowest bit as an index for
* picking which data copy to read. The full counter value must then be
* checked with read_seqcount_retry().
*/
static inline int raw_read_seqcount_latch(seqcount_t *s) static inline int raw_read_seqcount_latch(seqcount_t *s)
{ {
/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */ /* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
...@@ -315,8 +414,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s) ...@@ -315,8 +414,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
} }
/** /**
* raw_write_seqcount_latch - redirect readers to even/odd copy * raw_write_seqcount_latch() - redirect readers to even/odd copy
* @s: pointer to seqcount_t * @s: Pointer to seqcount_t
* *
* The latch technique is a multiversion concurrency control method that allows * The latch technique is a multiversion concurrency control method that allows
* queries during non-atomic modifications. If you can guarantee queries never * queries during non-atomic modifications. If you can guarantee queries never
...@@ -332,64 +431,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s) ...@@ -332,64 +431,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
* Very simply put: we first modify one copy and then the other. This ensures * Very simply put: we first modify one copy and then the other. This ensures
* there is always one copy in a stable state, ready to give us an answer. * there is always one copy in a stable state, ready to give us an answer.
* *
* The basic form is a data structure like: * The basic form is a data structure like::
* *
* struct latch_struct { * struct latch_struct {
* seqcount_t seq; * seqcount_t seq;
* struct data_struct data[2]; * struct data_struct data[2];
* }; * };
* *
* Where a modification, which is assumed to be externally serialized, does the * Where a modification, which is assumed to be externally serialized, does the
* following: * following::
* *
* void latch_modify(struct latch_struct *latch, ...) * void latch_modify(struct latch_struct *latch, ...)
* { * {
* smp_wmb(); <- Ensure that the last data[1] update is visible * smp_wmb(); // Ensure that the last data[1] update is visible
* latch->seq++; * latch->seq++;
* smp_wmb(); <- Ensure that the seqcount update is visible * smp_wmb(); // Ensure that the seqcount update is visible
* *
* modify(latch->data[0], ...); * modify(latch->data[0], ...);
* *
* smp_wmb(); <- Ensure that the data[0] update is visible * smp_wmb(); // Ensure that the data[0] update is visible
* latch->seq++; * latch->seq++;
* smp_wmb(); <- Ensure that the seqcount update is visible * smp_wmb(); // Ensure that the seqcount update is visible
* *
* modify(latch->data[1], ...); * modify(latch->data[1], ...);
* } * }
* *
* The query will have a form like: * The query will have a form like::
* *
* struct entry *latch_query(struct latch_struct *latch, ...) * struct entry *latch_query(struct latch_struct *latch, ...)
* { * {
* struct entry *entry; * struct entry *entry;
* unsigned seq, idx; * unsigned seq, idx;
* *
* do { * do {
* seq = raw_read_seqcount_latch(&latch->seq); * seq = raw_read_seqcount_latch(&latch->seq);
* *
* idx = seq & 0x01; * idx = seq & 0x01;
* entry = data_query(latch->data[idx], ...); * entry = data_query(latch->data[idx], ...);
* *
* smp_rmb(); * // read_seqcount_retry() includes needed smp_rmb()
* } while (seq != latch->seq); * } while (read_seqcount_retry(&latch->seq, seq));
* *
* return entry; * return entry;
* } * }
* *
* So during the modification, queries are first redirected to data[1]. Then we * So during the modification, queries are first redirected to data[1]. Then we
* modify data[0]. When that is complete, we redirect queries back to data[0] * modify data[0]. When that is complete, we redirect queries back to data[0]
* and we can modify data[1]. * and we can modify data[1].
* *
* NOTE: The non-requirement for atomic modifications does _NOT_ include * NOTE:
* the publishing of new entries in the case where data is a dynamic
* data structure.
* *
* An iteration might start in data[0] and get suspended long enough * The non-requirement for atomic modifications does _NOT_ include
* to miss an entire modification sequence, once it resumes it might * the publishing of new entries in the case where data is a dynamic
* observe the new entry. * data structure.
* *
* NOTE: When data is a dynamic data structure; one should use regular RCU * An iteration might start in data[0] and get suspended long enough
* patterns to manage the lifetimes of the objects within. * to miss an entire modification sequence, once it resumes it might
* observe the new entry.
*
* NOTE:
*
* When data is a dynamic data structure; one should use regular RCU
* patterns to manage the lifetimes of the objects within.
*/ */
static inline void raw_write_seqcount_latch(seqcount_t *s) static inline void raw_write_seqcount_latch(seqcount_t *s)
{ {
...@@ -399,67 +502,48 @@ static inline void raw_write_seqcount_latch(seqcount_t *s) ...@@ -399,67 +502,48 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
} }
/* /*
* Sequence counter only version assumes that callers are using their * Sequential locks (seqlock_t)
* own mutexing.
*/
static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
{
raw_write_seqcount_begin(s);
seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
}
static inline void write_seqcount_begin(seqcount_t *s)
{
write_seqcount_begin_nested(s, 0);
}
static inline void write_seqcount_end(seqcount_t *s)
{
seqcount_release(&s->dep_map, _RET_IP_);
raw_write_seqcount_end(s);
}
/**
* write_seqcount_invalidate - invalidate in-progress read-side seq operations
* @s: pointer to seqcount_t
* *
* After write_seqcount_invalidate, no read-side seq operations will complete * Sequence counters with an embedded spinlock for writer serialization
* successfully and see data older than this. * and non-preemptibility.
*
* For more info, see:
* - Comments on top of seqcount_t
* - Documentation/locking/seqlock.rst
*/ */
static inline void write_seqcount_invalidate(seqcount_t *s)
{
smp_wmb();
kcsan_nestable_atomic_begin();
s->sequence+=2;
kcsan_nestable_atomic_end();
}
typedef struct { typedef struct {
struct seqcount seqcount; struct seqcount seqcount;
spinlock_t lock; spinlock_t lock;
} seqlock_t; } seqlock_t;
/*
* These macros triggered gcc-3.x compile-time problems. We think these are
* OK now. Be cautious.
*/
#define __SEQLOCK_UNLOCKED(lockname) \ #define __SEQLOCK_UNLOCKED(lockname) \
{ \ { \
.seqcount = SEQCNT_ZERO(lockname), \ .seqcount = SEQCNT_ZERO(lockname), \
.lock = __SPIN_LOCK_UNLOCKED(lockname) \ .lock = __SPIN_LOCK_UNLOCKED(lockname) \
} }
#define seqlock_init(x) \ /**
* seqlock_init() - dynamic initializer for seqlock_t
* @sl: Pointer to the seqlock_t instance
*/
#define seqlock_init(sl) \
do { \ do { \
seqcount_init(&(x)->seqcount); \ seqcount_init(&(sl)->seqcount); \
spin_lock_init(&(x)->lock); \ spin_lock_init(&(sl)->lock); \
} while (0) } while (0)
#define DEFINE_SEQLOCK(x) \ /**
seqlock_t x = __SEQLOCK_UNLOCKED(x) * DEFINE_SEQLOCK() - Define a statically allocated seqlock_t
* @sl: Name of the seqlock_t instance
*/
#define DEFINE_SEQLOCK(sl) \
seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
/* /**
* Read side functions for starting and finalizing a read side section. * read_seqbegin() - start a seqlock_t read side critical section
* @sl: Pointer to seqlock_t
*
* Return: count, to be passed to read_seqretry()
*/ */
static inline unsigned read_seqbegin(const seqlock_t *sl) static inline unsigned read_seqbegin(const seqlock_t *sl)
{ {
...@@ -470,6 +554,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl) ...@@ -470,6 +554,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
return ret; return ret;
} }
/**
* read_seqretry() - end a seqlock_t read side section
* @sl: Pointer to seqlock_t
* @start: count, from read_seqbegin()
*
* read_seqretry closes the read side critical section of given seqlock_t.
* If the critical section was invalid, it must be ignored (and typically
* retried).
*
* Return: true if a read section retry is required, else false
*/
static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start) static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
{ {
/* /*
...@@ -481,41 +576,85 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start) ...@@ -481,41 +576,85 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
return read_seqcount_retry(&sl->seqcount, start); return read_seqcount_retry(&sl->seqcount, start);
} }
/* /**
* Lock out other writers and update the count. * write_seqlock() - start a seqlock_t write side critical section
* Acts like a normal spin_lock/unlock. * @sl: Pointer to seqlock_t
* Don't need preempt_disable() because that is in the spin_lock already. *
* write_seqlock opens a write side critical section for the given
* seqlock_t. It also implicitly acquires the spinlock_t embedded inside
* that sequential lock. All seqlock_t write side sections are thus
* automatically serialized and non-preemptible.
*
* Context: if the seqlock_t read section, or other write side critical
* sections, can be invoked from hardirq or softirq contexts, use the
* _irqsave or _bh variants of this function instead.
*/ */
static inline void write_seqlock(seqlock_t *sl) static inline void write_seqlock(seqlock_t *sl)
{ {
spin_lock(&sl->lock); spin_lock(&sl->lock);
write_seqcount_begin(&sl->seqcount); __write_seqcount_begin(&sl->seqcount);
} }
/**
* write_sequnlock() - end a seqlock_t write side critical section
* @sl: Pointer to seqlock_t
*
* write_sequnlock closes the (serialized and non-preemptible) write side
* critical section of given seqlock_t.
*/
static inline void write_sequnlock(seqlock_t *sl) static inline void write_sequnlock(seqlock_t *sl)
{ {
write_seqcount_end(&sl->seqcount); write_seqcount_end(&sl->seqcount);
spin_unlock(&sl->lock); spin_unlock(&sl->lock);
} }
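For comparison with the raw seqcount_t example earlier, a seqlock_t sketch where the embedded spinlock provides writer serialization and non-preemptibility automatically; bar_lock, bar_update() and bar_read() are hypothetical:

#include <linux/seqlock.h>
#include <linux/types.h>

static DEFINE_SEQLOCK(bar_lock);
static u64 bar_x, bar_y;

static void bar_update(u64 x, u64 y)
{
	/* The embedded spin_lock() serializes writers and disables preemption. */
	write_seqlock(&bar_lock);
	bar_x = x;
	bar_y = y;
	write_sequnlock(&bar_lock);
}

static u64 bar_read(void)
{
	unsigned seq;
	u64 sum;

	do {
		seq = read_seqbegin(&bar_lock);
		sum = bar_x + bar_y;
	} while (read_seqretry(&bar_lock, seq));

	return sum;
}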
/**
* write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
* @sl: Pointer to seqlock_t
*
* _bh variant of write_seqlock(). Use only if the read side section, or
* other write side sections, can be invoked from softirq contexts.
*/
static inline void write_seqlock_bh(seqlock_t *sl) static inline void write_seqlock_bh(seqlock_t *sl)
{ {
spin_lock_bh(&sl->lock); spin_lock_bh(&sl->lock);
write_seqcount_begin(&sl->seqcount); __write_seqcount_begin(&sl->seqcount);
} }
/**
* write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
* @sl: Pointer to seqlock_t
*
* write_sequnlock_bh closes the serialized, non-preemptible, and
* softirqs-disabled, seqlock_t write side critical section opened with
* write_seqlock_bh().
*/
static inline void write_sequnlock_bh(seqlock_t *sl) static inline void write_sequnlock_bh(seqlock_t *sl)
{ {
write_seqcount_end(&sl->seqcount); write_seqcount_end(&sl->seqcount);
spin_unlock_bh(&sl->lock); spin_unlock_bh(&sl->lock);
} }
/**
* write_seqlock_irq() - start a non-interruptible seqlock_t write section
* @sl: Pointer to seqlock_t
*
* _irq variant of write_seqlock(). Use only if the read side section, or
* other write sections, can be invoked from hardirq contexts.
*/
static inline void write_seqlock_irq(seqlock_t *sl) static inline void write_seqlock_irq(seqlock_t *sl)
{ {
spin_lock_irq(&sl->lock); spin_lock_irq(&sl->lock);
write_seqcount_begin(&sl->seqcount); __write_seqcount_begin(&sl->seqcount);
} }
/**
* write_sequnlock_irq() - end a non-interruptible seqlock_t write section
* @sl: Pointer to seqlock_t
*
* write_sequnlock_irq closes the serialized and non-interruptible
* seqlock_t write side section opened with write_seqlock_irq().
*/
static inline void write_sequnlock_irq(seqlock_t *sl) static inline void write_sequnlock_irq(seqlock_t *sl)
{ {
write_seqcount_end(&sl->seqcount); write_seqcount_end(&sl->seqcount);
...@@ -527,13 +666,32 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl) ...@@ -527,13 +666,32 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
unsigned long flags; unsigned long flags;
spin_lock_irqsave(&sl->lock, flags); spin_lock_irqsave(&sl->lock, flags);
write_seqcount_begin(&sl->seqcount); __write_seqcount_begin(&sl->seqcount);
return flags; return flags;
} }
/**
* write_seqlock_irqsave() - start a non-interruptible seqlock_t write
* section
* @lock: Pointer to seqlock_t
* @flags: Stack-allocated storage for saving caller's local interrupt
* state, to be passed to write_sequnlock_irqrestore().
*
* _irqsave variant of write_seqlock(). Use it only if the read side
* section, or other write sections, can be invoked from hardirq context.
*/
#define write_seqlock_irqsave(lock, flags) \ #define write_seqlock_irqsave(lock, flags) \
do { flags = __write_seqlock_irqsave(lock); } while (0) do { flags = __write_seqlock_irqsave(lock); } while (0)
/**
* write_sequnlock_irqrestore() - end non-interruptible seqlock_t write
* section
* @sl: Pointer to seqlock_t
* @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
*
* write_sequnlock_irqrestore closes the serialized and non-interruptible
* seqlock_t write section previously opened with write_seqlock_irqsave().
*/
static inline void static inline void
write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags) write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
{ {
...@@ -541,65 +699,79 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags) ...@@ -541,65 +699,79 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
spin_unlock_irqrestore(&sl->lock, flags); spin_unlock_irqrestore(&sl->lock, flags);
} }
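And the _irqsave flavor, for when readers (or other writers) may run from hardirq context; stamp_lock and record_stamp() are made-up names:

#include <linux/seqlock.h>
#include <linux/types.h>

static DEFINE_SEQLOCK(stamp_lock);
static u64 last_stamp;

static void record_stamp(u64 stamp)
{
	unsigned long flags;

	write_seqlock_irqsave(&stamp_lock, flags);
	last_stamp = stamp;
	write_sequnlock_irqrestore(&stamp_lock, flags);
}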
/* /**
* A locking reader exclusively locks out other writers and locking readers, * read_seqlock_excl() - begin a seqlock_t locking reader section
* but doesn't update the sequence number. Acts like a normal spin_lock/unlock. * @sl: Pointer to seqlock_t
* Don't need preempt_disable() because that is in the spin_lock already. *
* read_seqlock_excl opens a seqlock_t locking reader critical section. A
* locking reader exclusively locks out *both* other writers *and* other
* locking readers, but it does not update the embedded sequence number.
*
* Locking readers act like a normal spin_lock()/spin_unlock().
*
* Context: if the seqlock_t write section, *or other read sections*, can
* be invoked from hardirq or softirq contexts, use the _irqsave or _bh
* variant of this function instead.
*
* The opened read section must be closed with read_sequnlock_excl().
*/ */
static inline void read_seqlock_excl(seqlock_t *sl) static inline void read_seqlock_excl(seqlock_t *sl)
{ {
spin_lock(&sl->lock); spin_lock(&sl->lock);
} }
/**
* read_sequnlock_excl() - end a seqlock_t locking reader critical section
* @sl: Pointer to seqlock_t
*/
static inline void read_sequnlock_excl(seqlock_t *sl) static inline void read_sequnlock_excl(seqlock_t *sl)
{ {
spin_unlock(&sl->lock); spin_unlock(&sl->lock);
} }
/** /**
* read_seqbegin_or_lock - begin a sequence number check or locking block * read_seqlock_excl_bh() - start a seqlock_t locking reader section with
* @lock: sequence lock * softirqs disabled
* @seq : sequence number to be checked * @sl: Pointer to seqlock_t
* *
* First try it once optimistically without taking the lock. If that fails, * _bh variant of read_seqlock_excl(). Use this variant only if the
* take the lock. The sequence number is also used as a marker for deciding * seqlock_t write side section, *or other read sections*, can be invoked
* whether to be a reader (even) or writer (odd). * from softirq contexts.
* N.B. seq must be initialized to an even number to begin with.
*/ */
static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
{
if (!(*seq & 1)) /* Even */
*seq = read_seqbegin(lock);
else /* Odd */
read_seqlock_excl(lock);
}
static inline int need_seqretry(seqlock_t *lock, int seq)
{
return !(seq & 1) && read_seqretry(lock, seq);
}
static inline void done_seqretry(seqlock_t *lock, int seq)
{
if (seq & 1)
read_sequnlock_excl(lock);
}
static inline void read_seqlock_excl_bh(seqlock_t *sl) static inline void read_seqlock_excl_bh(seqlock_t *sl)
{ {
spin_lock_bh(&sl->lock); spin_lock_bh(&sl->lock);
} }
/**
* read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
* reader section
* @sl: Pointer to seqlock_t
*/
static inline void read_sequnlock_excl_bh(seqlock_t *sl) static inline void read_sequnlock_excl_bh(seqlock_t *sl)
{ {
spin_unlock_bh(&sl->lock); spin_unlock_bh(&sl->lock);
} }
/**
* read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
* reader section
* @sl: Pointer to seqlock_t
*
* _irq variant of read_seqlock_excl(). Use this only if the seqlock_t
* write side section, *or other read sections*, can be invoked from a
* hardirq context.
*/
static inline void read_seqlock_excl_irq(seqlock_t *sl) static inline void read_seqlock_excl_irq(seqlock_t *sl)
{ {
spin_lock_irq(&sl->lock); spin_lock_irq(&sl->lock);
} }
/**
* read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
* locking reader section
* @sl: Pointer to seqlock_t
*/
static inline void read_sequnlock_excl_irq(seqlock_t *sl) static inline void read_sequnlock_excl_irq(seqlock_t *sl)
{ {
spin_unlock_irq(&sl->lock); spin_unlock_irq(&sl->lock);
...@@ -613,15 +785,117 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl) ...@@ -613,15 +785,117 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
return flags; return flags;
} }
/**
* read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
* locking reader section
* @lock: Pointer to seqlock_t
* @flags: Stack-allocated storage for saving caller's local interrupt
* state, to be passed to read_sequnlock_excl_irqrestore().
*
* _irqsave variant of read_seqlock_excl(). Use this only if the seqlock_t
* write side section, *or other read sections*, can be invoked from a
* hardirq context.
*/
#define read_seqlock_excl_irqsave(lock, flags) \ #define read_seqlock_excl_irqsave(lock, flags) \
do { flags = __read_seqlock_excl_irqsave(lock); } while (0) do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
/**
* read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
* locking reader section
* @sl: Pointer to seqlock_t
* @flags: Caller saved interrupt state, from read_seqlock_excl_irqsave()
*/
static inline void static inline void
read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags) read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
{ {
spin_unlock_irqrestore(&sl->lock, flags); spin_unlock_irqrestore(&sl->lock, flags);
} }
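The same hedged sketch for the _irqsave flavour (hypothetical sl_demo/demo_data again), for when the write side, or another read section, can run in hardirq context:

static int demo_locking_reader_irqsave(void)
{
	unsigned long flags;
	int val;

	read_seqlock_excl_irqsave(&sl_demo, flags);
	val = demo_data;
	read_sequnlock_excl_irqrestore(&sl_demo, flags);

	return val;
}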
/**
* read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
* @lock: Pointer to seqlock_t
* @seq : Marker and return parameter. If the passed value is even, the
* reader will become a *lockless* seqlock_t reader as in read_seqbegin().
* If the passed value is odd, the reader will become a *locking* reader
* as in read_seqlock_excl(). In the first call to this function, the
* caller *must* initialize and pass an even value to @seq; this way, a
* lockless read can be optimistically tried first.
*
* read_seqbegin_or_lock is an API designed to optimistically try a normal
* lockless seqlock_t read section first. If an odd counter is found, the
* lockless read trial has failed, and the next read iteration transforms
* itself into a full seqlock_t locking reader.
*
 * This is typically used to avoid seqlock_t lockless reader starvation
 * (too many retry loops) in the case of a sharp spike in write side
* activity.
*
* Context: if the seqlock_t write section, *or other read sections*, can
* be invoked from hardirq or softirq contexts, use the _irqsave or _bh
* variant of this function instead.
*
* Check Documentation/locking/seqlock.rst for template example code.
*
* Return: the encountered sequence counter value, through the @seq
* parameter, which is overloaded as a return parameter. This returned
 * value must be checked with need_seqretry(). If the read section needs to
* be retried, this returned value must also be passed as the @seq
* parameter of the next read_seqbegin_or_lock() iteration.
*/
static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
{
if (!(*seq & 1)) /* Even */
*seq = read_seqbegin(lock);
else /* Odd */
read_seqlock_excl(lock);
}
/**
* need_seqretry() - validate seqlock_t "locking or lockless" read section
* @lock: Pointer to seqlock_t
* @seq: sequence count, from read_seqbegin_or_lock()
*
* Return: true if a read section retry is required, false otherwise
*/
static inline int need_seqretry(seqlock_t *lock, int seq)
{
return !(seq & 1) && read_seqretry(lock, seq);
}
/**
* done_seqretry() - end seqlock_t "locking or lockless" reader section
* @lock: Pointer to seqlock_t
* @seq: count, from read_seqbegin_or_lock()
*
* done_seqretry finishes the seqlock_t read side critical section started
* with read_seqbegin_or_lock() and validated by need_seqretry().
*/
static inline void done_seqretry(seqlock_t *lock, int seq)
{
if (seq & 1)
read_sequnlock_excl(lock);
}
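A hedged sketch of the lockless-then-locking pattern built from the three helpers above (Documentation/locking/seqlock.rst has the canonical template); sl_demo/demo_data are the hypothetical names from the earlier sketches. On a failed lockless pass, seq is forced odd so the next pass takes the lock, as in-tree users such as the dcache do:

static int demo_mixed_reader(void)
{
	int seq = 0;	/* even: first pass is a lockless reader */
	int val;

again:
	read_seqbegin_or_lock(&sl_demo, &seq);
	val = demo_data;
	if (need_seqretry(&sl_demo, seq)) {
		seq = 1;	/* retry as a locking reader */
		goto again;
	}
	done_seqretry(&sl_demo, seq);

	return val;
}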
/**
* read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
* a non-interruptible locking reader
* @lock: Pointer to seqlock_t
* @seq: Marker and return parameter. Check read_seqbegin_or_lock().
*
* This is the _irqsave variant of read_seqbegin_or_lock(). Use it only if
* the seqlock_t write section, *or other read sections*, can be invoked
* from hardirq context.
*
* Note: Interrupts will be disabled only for "locking reader" mode.
*
* Return:
*
 * 1. The saved local interrupt state in case of a locking reader, to
* be passed to done_seqretry_irqrestore().
*
* 2. The encountered sequence counter value, returned through @seq
* overloaded as a return parameter. Check read_seqbegin_or_lock().
*/
static inline unsigned long static inline unsigned long
read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq) read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
{ {
...@@ -635,6 +909,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq) ...@@ -635,6 +909,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
return flags; return flags;
} }
/**
* done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
* non-interruptible locking reader section
* @lock: Pointer to seqlock_t
* @seq: Count, from read_seqbegin_or_lock_irqsave()
* @flags: Caller's saved local interrupt state in case of a locking
* reader, also from read_seqbegin_or_lock_irqsave()
*
* This is the _irqrestore variant of done_seqretry(). The read section
* must've been opened with read_seqbegin_or_lock_irqsave(), and validated
* by need_seqretry().
*/
static inline void static inline void
done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
{ {
......
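For completeness, a hedged sketch of the _irqsave variant of the same pattern (hypothetical names as before); note that interrupts are disabled only once the reader falls back to locking mode:

static int demo_mixed_reader_irqsave(void)
{
	unsigned long flags;
	int seq = 0;
	int val;

again:
	flags = read_seqbegin_or_lock_irqsave(&sl_demo, &seq);
	val = demo_data;
	if (need_seqretry(&sl_demo, seq)) {
		seq = 1;	/* next pass: locking reader with IRQs disabled */
		goto again;
	}
	done_seqretry_irqrestore(&sl_demo, seq, flags);

	return val;
}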
...@@ -56,6 +56,7 @@ ...@@ -56,6 +56,7 @@
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/stringify.h> #include <linux/stringify.h>
#include <linux/bottom_half.h> #include <linux/bottom_half.h>
#include <linux/lockdep.h>
#include <asm/barrier.h> #include <asm/barrier.h>
#include <asm/mmiowb.h> #include <asm/mmiowb.h>
......
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
# include <linux/spinlock_types_up.h> # include <linux/spinlock_types_up.h>
#endif #endif
#include <linux/lockdep.h> #include <linux/lockdep_types.h>
typedef struct raw_spinlock { typedef struct raw_spinlock {
arch_spinlock_t raw_lock; arch_spinlock_t raw_lock;
......
...@@ -167,6 +167,8 @@ typedef struct { ...@@ -167,6 +167,8 @@ typedef struct {
int counter; int counter;
} atomic_t; } atomic_t;
#define ATOMIC_INIT(i) { (i) }
#ifdef CONFIG_64BIT #ifdef CONFIG_64BIT
typedef struct { typedef struct {
s64 counter; s64 counter;
......
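A hedged one-liner to illustrate the move above (nr_widgets is a made-up name): with ATOMIC_INIT() defined next to atomic_t in <linux/types.h>, a static initializer no longer needs to drag in <linux/atomic.h>:

#include <linux/types.h>

static atomic_t nr_widgets = ATOMIC_INIT(0);	/* only <linux/types.h> required */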
...@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig) ...@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (new) { if (new) {
*new = *orig; ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
/*
* orig->shared.rb may be modified concurrently, but the clone
* will be reinitialized.
*/
*new = data_race(*orig);
INIT_LIST_HEAD(&new->anon_vma_chain); INIT_LIST_HEAD(&new->anon_vma_chain);
new->vm_next = new->vm_prev = NULL; new->vm_next = new->vm_prev = NULL;
} }
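A hedged, generic illustration of the two KCSAN annotations used in the hunk above (struct widget and widget_snapshot() are hypothetical; the macros come from <linux/kcsan-checks.h> and <linux/compiler.h>): ASSERT_EXCLUSIVE_WRITER() has KCSAN report any concurrent writer of the field, while data_race() marks the copy's racy read as intentional:

struct widget {
	unsigned long flags;
};

static void widget_snapshot(struct widget *dst, const struct widget *src)
{
	/* Report it as a bug if anyone else writes src->flags concurrently. */
	ASSERT_EXCLUSIVE_WRITER(src->flags);
	/* The racy read is intentional; tell KCSAN not to report it. */
	dst->flags = data_race(src->flags);
}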
...@@ -1954,8 +1960,8 @@ static __latent_entropy struct task_struct *copy_process( ...@@ -1954,8 +1960,8 @@ static __latent_entropy struct task_struct *copy_process(
rt_mutex_init_task(p); rt_mutex_init_task(p);
lockdep_assert_irqs_enabled();
#ifdef CONFIG_PROVE_LOCKING #ifdef CONFIG_PROVE_LOCKING
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled); DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
#endif #endif
retval = -EAGAIN; retval = -EAGAIN;
...@@ -2035,19 +2041,11 @@ static __latent_entropy struct task_struct *copy_process( ...@@ -2035,19 +2041,11 @@ static __latent_entropy struct task_struct *copy_process(
seqcount_init(&p->mems_allowed_seq); seqcount_init(&p->mems_allowed_seq);
#endif #endif
#ifdef CONFIG_TRACE_IRQFLAGS #ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0; memset(&p->irqtrace, 0, sizeof(p->irqtrace));
p->hardirqs_enabled = 0; p->irqtrace.hardirq_disable_ip = _THIS_IP_;
p->hardirq_enable_ip = 0; p->irqtrace.softirq_enable_ip = _THIS_IP_;
p->hardirq_enable_event = 0; p->softirqs_enabled = 1;
p->hardirq_disable_ip = _THIS_IP_; p->softirq_context = 0;
p->hardirq_disable_event = 0;
p->softirqs_enabled = 1;
p->softirq_enable_ip = _THIS_IP_;
p->softirq_enable_event = 0;
p->softirq_disable_ip = 0;
p->softirq_disable_event = 0;
p->hardirq_context = 0;
p->softirq_context = 0;
#endif #endif
p->pagefault_disabled = 0; p->pagefault_disabled = 0;
......
...@@ -32,30 +32,13 @@ ...@@ -32,30 +32,13 @@
* "But they come in a choice of three flavours!" * "But they come in a choice of three flavours!"
*/ */
#include <linux/compat.h> #include <linux/compat.h>
#include <linux/slab.h>
#include <linux/poll.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/jhash.h> #include <linux/jhash.h>
#include <linux/init.h>
#include <linux/futex.h>
#include <linux/mount.h>
#include <linux/pagemap.h> #include <linux/pagemap.h>
#include <linux/syscalls.h> #include <linux/syscalls.h>
#include <linux/signal.h>
#include <linux/export.h>
#include <linux/magic.h>
#include <linux/pid.h>
#include <linux/nsproxy.h>
#include <linux/ptrace.h>
#include <linux/sched/rt.h>
#include <linux/sched/wake_q.h>
#include <linux/sched/mm.h>
#include <linux/hugetlb.h> #include <linux/hugetlb.h>
#include <linux/freezer.h> #include <linux/freezer.h>
#include <linux/memblock.h> #include <linux/memblock.h>
#include <linux/fault-inject.h> #include <linux/fault-inject.h>
#include <linux/refcount.h>
#include <asm/futex.h> #include <asm/futex.h>
...@@ -476,7 +459,7 @@ static u64 get_inode_sequence_number(struct inode *inode) ...@@ -476,7 +459,7 @@ static u64 get_inode_sequence_number(struct inode *inode)
/** /**
* get_futex_key() - Get parameters which are the keys for a futex * get_futex_key() - Get parameters which are the keys for a futex
* @uaddr: virtual address of the futex * @uaddr: virtual address of the futex
* @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED * @fshared: false for a PROCESS_PRIVATE futex, true for PROCESS_SHARED
* @key: address where result is stored. * @key: address where result is stored.
* @rw: mapping needs to be read/write (values: FUTEX_READ, * @rw: mapping needs to be read/write (values: FUTEX_READ,
* FUTEX_WRITE) * FUTEX_WRITE)
...@@ -500,8 +483,8 @@ static u64 get_inode_sequence_number(struct inode *inode) ...@@ -500,8 +483,8 @@ static u64 get_inode_sequence_number(struct inode *inode)
* *
* lock_page() might sleep, the caller should not hold a spinlock. * lock_page() might sleep, the caller should not hold a spinlock.
*/ */
static int static int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_access rw) enum futex_access rw)
{ {
unsigned long address = (unsigned long)uaddr; unsigned long address = (unsigned long)uaddr;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
...@@ -538,7 +521,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a ...@@ -538,7 +521,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
again: again:
/* Ignore any VERIFY_READ mapping (futex common case) */ /* Ignore any VERIFY_READ mapping (futex common case) */
if (unlikely(should_fail_futex(fshared))) if (unlikely(should_fail_futex(true)))
return -EFAULT; return -EFAULT;
err = get_user_pages_fast(address, 1, FOLL_WRITE, &page); err = get_user_pages_fast(address, 1, FOLL_WRITE, &page);
...@@ -626,7 +609,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a ...@@ -626,7 +609,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
* A RO anonymous page will never change and thus doesn't make * A RO anonymous page will never change and thus doesn't make
* sense for futex operations. * sense for futex operations.
*/ */
if (unlikely(should_fail_futex(fshared)) || ro) { if (unlikely(should_fail_futex(true)) || ro) {
err = -EFAULT; err = -EFAULT;
goto out; goto out;
} }
...@@ -677,10 +660,6 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a ...@@ -677,10 +660,6 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
return err; return err;
} }
static inline void put_futex_key(union futex_key *key)
{
}
/** /**
* fault_in_user_writeable() - Fault in user address and verify RW access * fault_in_user_writeable() - Fault in user address and verify RW access
* @uaddr: pointer to faulting user space address * @uaddr: pointer to faulting user space address
...@@ -1611,13 +1590,13 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) ...@@ -1611,13 +1590,13 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ); ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ);
if (unlikely(ret != 0)) if (unlikely(ret != 0))
goto out; return ret;
hb = hash_futex(&key); hb = hash_futex(&key);
/* Make sure we really have tasks to wakeup */ /* Make sure we really have tasks to wakeup */
if (!hb_waiters_pending(hb)) if (!hb_waiters_pending(hb))
goto out_put_key; return ret;
spin_lock(&hb->lock); spin_lock(&hb->lock);
...@@ -1640,9 +1619,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) ...@@ -1640,9 +1619,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
spin_unlock(&hb->lock); spin_unlock(&hb->lock);
wake_up_q(&wake_q); wake_up_q(&wake_q);
out_put_key:
put_futex_key(&key);
out:
return ret; return ret;
} }
...@@ -1709,10 +1685,10 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, ...@@ -1709,10 +1685,10 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
retry: retry:
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ); ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
if (unlikely(ret != 0)) if (unlikely(ret != 0))
goto out; return ret;
ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE); ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
if (unlikely(ret != 0)) if (unlikely(ret != 0))
goto out_put_key1; return ret;
hb1 = hash_futex(&key1); hb1 = hash_futex(&key1);
hb2 = hash_futex(&key2); hb2 = hash_futex(&key2);
...@@ -1730,13 +1706,13 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, ...@@ -1730,13 +1706,13 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
* an MMU, but we might get them from range checking * an MMU, but we might get them from range checking
*/ */
ret = op_ret; ret = op_ret;
goto out_put_keys; return ret;
} }
if (op_ret == -EFAULT) { if (op_ret == -EFAULT) {
ret = fault_in_user_writeable(uaddr2); ret = fault_in_user_writeable(uaddr2);
if (ret) if (ret)
goto out_put_keys; return ret;
} }
if (!(flags & FLAGS_SHARED)) { if (!(flags & FLAGS_SHARED)) {
...@@ -1744,8 +1720,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, ...@@ -1744,8 +1720,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
goto retry_private; goto retry_private;
} }
put_futex_key(&key2);
put_futex_key(&key1);
cond_resched(); cond_resched();
goto retry; goto retry;
} }
...@@ -1781,11 +1755,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, ...@@ -1781,11 +1755,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
out_unlock: out_unlock:
double_unlock_hb(hb1, hb2); double_unlock_hb(hb1, hb2);
wake_up_q(&wake_q); wake_up_q(&wake_q);
out_put_keys:
put_futex_key(&key2);
out_put_key1:
put_futex_key(&key1);
out:
return ret; return ret;
} }
...@@ -1992,20 +1961,18 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ...@@ -1992,20 +1961,18 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
retry: retry:
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ); ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
if (unlikely(ret != 0)) if (unlikely(ret != 0))
goto out; return ret;
ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
requeue_pi ? FUTEX_WRITE : FUTEX_READ); requeue_pi ? FUTEX_WRITE : FUTEX_READ);
if (unlikely(ret != 0)) if (unlikely(ret != 0))
goto out_put_key1; return ret;
/* /*
* The check above which compares uaddrs is not sufficient for * The check above which compares uaddrs is not sufficient for
* shared futexes. We need to compare the keys: * shared futexes. We need to compare the keys:
*/ */
if (requeue_pi && match_futex(&key1, &key2)) { if (requeue_pi && match_futex(&key1, &key2))
ret = -EINVAL; return -EINVAL;
goto out_put_keys;
}
hb1 = hash_futex(&key1); hb1 = hash_futex(&key1);
hb2 = hash_futex(&key2); hb2 = hash_futex(&key2);
...@@ -2025,13 +1992,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ...@@ -2025,13 +1992,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
ret = get_user(curval, uaddr1); ret = get_user(curval, uaddr1);
if (ret) if (ret)
goto out_put_keys; return ret;
if (!(flags & FLAGS_SHARED)) if (!(flags & FLAGS_SHARED))
goto retry_private; goto retry_private;
put_futex_key(&key2);
put_futex_key(&key1);
goto retry; goto retry;
} }
if (curval != *cmpval) { if (curval != *cmpval) {
...@@ -2090,12 +2055,10 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ...@@ -2090,12 +2055,10 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
case -EFAULT: case -EFAULT:
double_unlock_hb(hb1, hb2); double_unlock_hb(hb1, hb2);
hb_waiters_dec(hb2); hb_waiters_dec(hb2);
put_futex_key(&key2);
put_futex_key(&key1);
ret = fault_in_user_writeable(uaddr2); ret = fault_in_user_writeable(uaddr2);
if (!ret) if (!ret)
goto retry; goto retry;
goto out; return ret;
case -EBUSY: case -EBUSY:
case -EAGAIN: case -EAGAIN:
/* /*
...@@ -2106,8 +2069,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ...@@ -2106,8 +2069,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
*/ */
double_unlock_hb(hb1, hb2); double_unlock_hb(hb1, hb2);
hb_waiters_dec(hb2); hb_waiters_dec(hb2);
put_futex_key(&key2);
put_futex_key(&key1);
/* /*
* Handle the case where the owner is in the middle of * Handle the case where the owner is in the middle of
* exiting. Wait for the exit to complete otherwise * exiting. Wait for the exit to complete otherwise
...@@ -2216,12 +2177,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ...@@ -2216,12 +2177,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
double_unlock_hb(hb1, hb2); double_unlock_hb(hb1, hb2);
wake_up_q(&wake_q); wake_up_q(&wake_q);
hb_waiters_dec(hb2); hb_waiters_dec(hb2);
out_put_keys:
put_futex_key(&key2);
out_put_key1:
put_futex_key(&key1);
out:
return ret ? ret : task_count; return ret ? ret : task_count;
} }
...@@ -2567,7 +2522,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) ...@@ -2567,7 +2522,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
*/ */
if (q->pi_state->owner != current) if (q->pi_state->owner != current)
ret = fixup_pi_state_owner(uaddr, q, current); ret = fixup_pi_state_owner(uaddr, q, current);
goto out; return ret ? ret : locked;
} }
/* /*
...@@ -2580,7 +2535,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) ...@@ -2580,7 +2535,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
*/ */
if (q->pi_state->owner == current) { if (q->pi_state->owner == current) {
ret = fixup_pi_state_owner(uaddr, q, NULL); ret = fixup_pi_state_owner(uaddr, q, NULL);
goto out; return ret;
} }
/* /*
...@@ -2594,8 +2549,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) ...@@ -2594,8 +2549,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
q->pi_state->owner); q->pi_state->owner);
} }
out: return ret;
return ret ? ret : locked;
} }
/** /**
...@@ -2692,12 +2646,11 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, ...@@ -2692,12 +2646,11 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
ret = get_user(uval, uaddr); ret = get_user(uval, uaddr);
if (ret) if (ret)
goto out; return ret;
if (!(flags & FLAGS_SHARED)) if (!(flags & FLAGS_SHARED))
goto retry_private; goto retry_private;
put_futex_key(&q->key);
goto retry; goto retry;
} }
...@@ -2706,9 +2659,6 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, ...@@ -2706,9 +2659,6 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
ret = -EWOULDBLOCK; ret = -EWOULDBLOCK;
} }
out:
if (ret)
put_futex_key(&q->key);
return ret; return ret;
} }
...@@ -2853,7 +2803,6 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ...@@ -2853,7 +2803,6 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
* - EAGAIN: The user space value changed. * - EAGAIN: The user space value changed.
*/ */
queue_unlock(hb); queue_unlock(hb);
put_futex_key(&q.key);
/* /*
* Handle the case where the owner is in the middle of * Handle the case where the owner is in the middle of
* exiting. Wait for the exit to complete otherwise * exiting. Wait for the exit to complete otherwise
...@@ -2961,13 +2910,11 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ...@@ -2961,13 +2910,11 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
put_pi_state(pi_state); put_pi_state(pi_state);
} }
goto out_put_key; goto out;
out_unlock_put_key: out_unlock_put_key:
queue_unlock(hb); queue_unlock(hb);
out_put_key:
put_futex_key(&q.key);
out: out:
if (to) { if (to) {
hrtimer_cancel(&to->timer); hrtimer_cancel(&to->timer);
...@@ -2980,12 +2927,11 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ...@@ -2980,12 +2927,11 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
ret = fault_in_user_writeable(uaddr); ret = fault_in_user_writeable(uaddr);
if (ret) if (ret)
goto out_put_key; goto out;
if (!(flags & FLAGS_SHARED)) if (!(flags & FLAGS_SHARED))
goto retry_private; goto retry_private;
put_futex_key(&q.key);
goto retry; goto retry;
} }
...@@ -3114,16 +3060,13 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) ...@@ -3114,16 +3060,13 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
out_unlock: out_unlock:
spin_unlock(&hb->lock); spin_unlock(&hb->lock);
out_putkey: out_putkey:
put_futex_key(&key);
return ret; return ret;
pi_retry: pi_retry:
put_futex_key(&key);
cond_resched(); cond_resched();
goto retry; goto retry;
pi_faulted: pi_faulted:
put_futex_key(&key);
ret = fault_in_user_writeable(uaddr); ret = fault_in_user_writeable(uaddr);
if (!ret) if (!ret)
...@@ -3265,7 +3208,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, ...@@ -3265,7 +3208,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
*/ */
ret = futex_wait_setup(uaddr, val, flags, &q, &hb); ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
if (ret) if (ret)
goto out_key2; goto out;
/* /*
* The check above which compares uaddrs is not sufficient for * The check above which compares uaddrs is not sufficient for
...@@ -3274,7 +3217,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, ...@@ -3274,7 +3217,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
if (match_futex(&q.key, &key2)) { if (match_futex(&q.key, &key2)) {
queue_unlock(hb); queue_unlock(hb);
ret = -EINVAL; ret = -EINVAL;
goto out_put_keys; goto out;
} }
/* Queue the futex_q, drop the hb lock, wait for wakeup. */ /* Queue the futex_q, drop the hb lock, wait for wakeup. */
...@@ -3284,7 +3227,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, ...@@ -3284,7 +3227,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
spin_unlock(&hb->lock); spin_unlock(&hb->lock);
if (ret) if (ret)
goto out_put_keys; goto out;
/* /*
* In order for us to be here, we know our q.key == key2, and since * In order for us to be here, we know our q.key == key2, and since
...@@ -3374,11 +3317,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, ...@@ -3374,11 +3317,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
ret = -EWOULDBLOCK; ret = -EWOULDBLOCK;
} }
out_put_keys:
put_futex_key(&q.key);
out_key2:
put_futex_key(&key2);
out: out:
if (to) { if (to) {
hrtimer_cancel(&to->timer); hrtimer_cancel(&to->timer);
......
...@@ -7,8 +7,11 @@ CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE) ...@@ -7,8 +7,11 @@ CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_debugfs.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_debugfs.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_report.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_report.o = $(CC_FLAGS_FTRACE)
CFLAGS_core.o := $(call cc-option,-fno-conserve-stack,) \ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
$(call cc-option,-fno-stack-protector,) -fno-stack-protector -DDISABLE_BRANCH_PROFILING
obj-y := core.o debugfs.o report.o obj-y := core.o debugfs.o report.o
obj-$(CONFIG_KCSAN_SELFTEST) += test.o obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
CFLAGS_kcsan-test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
obj-$(CONFIG_KCSAN_TEST) += kcsan-test.o
...@@ -3,8 +3,7 @@ ...@@ -3,8 +3,7 @@
#ifndef _KERNEL_KCSAN_ATOMIC_H #ifndef _KERNEL_KCSAN_ATOMIC_H
#define _KERNEL_KCSAN_ATOMIC_H #define _KERNEL_KCSAN_ATOMIC_H
#include <linux/jiffies.h> #include <linux/types.h>
#include <linux/sched.h>
/* /*
* Special rules for certain memory where concurrent conflicting accesses are * Special rules for certain memory where concurrent conflicting accesses are
...@@ -13,8 +12,7 @@ ...@@ -13,8 +12,7 @@
*/ */
static bool kcsan_is_atomic_special(const volatile void *ptr) static bool kcsan_is_atomic_special(const volatile void *ptr)
{ {
/* volatile globals that have been observed in data races. */ return false;
return ptr == &jiffies || ptr == &current->state;
} }
#endif /* _KERNEL_KCSAN_ATOMIC_H */ #endif /* _KERNEL_KCSAN_ATOMIC_H */
...@@ -291,6 +291,20 @@ static inline unsigned int get_delay(void) ...@@ -291,6 +291,20 @@ static inline unsigned int get_delay(void)
0); 0);
} }
void kcsan_save_irqtrace(struct task_struct *task)
{
#ifdef CONFIG_TRACE_IRQFLAGS
task->kcsan_save_irqtrace = task->irqtrace;
#endif
}
void kcsan_restore_irqtrace(struct task_struct *task)
{
#ifdef CONFIG_TRACE_IRQFLAGS
task->irqtrace = task->kcsan_save_irqtrace;
#endif
}
/* /*
* Pull everything together: check_access() below contains the performance * Pull everything together: check_access() below contains the performance
* critical operations; the fast-path (including check_access) functions should * critical operations; the fast-path (including check_access) functions should
...@@ -336,9 +350,11 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr, ...@@ -336,9 +350,11 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
flags = user_access_save(); flags = user_access_save();
if (consumed) { if (consumed) {
kcsan_save_irqtrace(current);
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE, kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
KCSAN_REPORT_CONSUMED_WATCHPOINT, KCSAN_REPORT_CONSUMED_WATCHPOINT,
watchpoint - watchpoints); watchpoint - watchpoints);
kcsan_restore_irqtrace(current);
} else { } else {
/* /*
* The other thread may not print any diagnostics, as it has * The other thread may not print any diagnostics, as it has
...@@ -396,9 +412,14 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type) ...@@ -396,9 +412,14 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
goto out; goto out;
} }
/*
* Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
* runtime is entered for every memory access, and potentially useful
* information is lost if dirtied by KCSAN.
*/
kcsan_save_irqtrace(current);
if (!kcsan_interrupt_watcher) if (!kcsan_interrupt_watcher)
/* Use raw to avoid lockdep recursion via IRQ flags tracing. */ local_irq_save(irq_flags);
raw_local_irq_save(irq_flags);
watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write); watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
if (watchpoint == NULL) { if (watchpoint == NULL) {
...@@ -539,7 +560,8 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type) ...@@ -539,7 +560,8 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS); kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
out_unlock: out_unlock:
if (!kcsan_interrupt_watcher) if (!kcsan_interrupt_watcher)
raw_local_irq_restore(irq_flags); local_irq_restore(irq_flags);
kcsan_restore_irqtrace(current);
out: out:
user_access_restore(ua_flags); user_access_restore(ua_flags);
} }
...@@ -754,6 +776,7 @@ EXPORT_SYMBOL(__kcsan_check_access); ...@@ -754,6 +776,7 @@ EXPORT_SYMBOL(__kcsan_check_access);
*/ */
#define DEFINE_TSAN_READ_WRITE(size) \ #define DEFINE_TSAN_READ_WRITE(size) \
void __tsan_read##size(void *ptr); \
void __tsan_read##size(void *ptr) \ void __tsan_read##size(void *ptr) \
{ \ { \
check_access(ptr, size, 0); \ check_access(ptr, size, 0); \
...@@ -762,6 +785,7 @@ EXPORT_SYMBOL(__kcsan_check_access); ...@@ -762,6 +785,7 @@ EXPORT_SYMBOL(__kcsan_check_access);
void __tsan_unaligned_read##size(void *ptr) \ void __tsan_unaligned_read##size(void *ptr) \
__alias(__tsan_read##size); \ __alias(__tsan_read##size); \
EXPORT_SYMBOL(__tsan_unaligned_read##size); \ EXPORT_SYMBOL(__tsan_unaligned_read##size); \
void __tsan_write##size(void *ptr); \
void __tsan_write##size(void *ptr) \ void __tsan_write##size(void *ptr) \
{ \ { \
check_access(ptr, size, KCSAN_ACCESS_WRITE); \ check_access(ptr, size, KCSAN_ACCESS_WRITE); \
...@@ -777,12 +801,14 @@ DEFINE_TSAN_READ_WRITE(4); ...@@ -777,12 +801,14 @@ DEFINE_TSAN_READ_WRITE(4);
DEFINE_TSAN_READ_WRITE(8); DEFINE_TSAN_READ_WRITE(8);
DEFINE_TSAN_READ_WRITE(16); DEFINE_TSAN_READ_WRITE(16);
void __tsan_read_range(void *ptr, size_t size);
void __tsan_read_range(void *ptr, size_t size) void __tsan_read_range(void *ptr, size_t size)
{ {
check_access(ptr, size, 0); check_access(ptr, size, 0);
} }
EXPORT_SYMBOL(__tsan_read_range); EXPORT_SYMBOL(__tsan_read_range);
void __tsan_write_range(void *ptr, size_t size);
void __tsan_write_range(void *ptr, size_t size) void __tsan_write_range(void *ptr, size_t size)
{ {
check_access(ptr, size, KCSAN_ACCESS_WRITE); check_access(ptr, size, KCSAN_ACCESS_WRITE);
...@@ -799,6 +825,7 @@ EXPORT_SYMBOL(__tsan_write_range); ...@@ -799,6 +825,7 @@ EXPORT_SYMBOL(__tsan_write_range);
* the size-check of compiletime_assert_rwonce_type(). * the size-check of compiletime_assert_rwonce_type().
*/ */
#define DEFINE_TSAN_VOLATILE_READ_WRITE(size) \ #define DEFINE_TSAN_VOLATILE_READ_WRITE(size) \
void __tsan_volatile_read##size(void *ptr); \
void __tsan_volatile_read##size(void *ptr) \ void __tsan_volatile_read##size(void *ptr) \
{ \ { \
const bool is_atomic = size <= sizeof(long long) && \ const bool is_atomic = size <= sizeof(long long) && \
...@@ -811,6 +838,7 @@ EXPORT_SYMBOL(__tsan_write_range); ...@@ -811,6 +838,7 @@ EXPORT_SYMBOL(__tsan_write_range);
void __tsan_unaligned_volatile_read##size(void *ptr) \ void __tsan_unaligned_volatile_read##size(void *ptr) \
__alias(__tsan_volatile_read##size); \ __alias(__tsan_volatile_read##size); \
EXPORT_SYMBOL(__tsan_unaligned_volatile_read##size); \ EXPORT_SYMBOL(__tsan_unaligned_volatile_read##size); \
void __tsan_volatile_write##size(void *ptr); \
void __tsan_volatile_write##size(void *ptr) \ void __tsan_volatile_write##size(void *ptr) \
{ \ { \
const bool is_atomic = size <= sizeof(long long) && \ const bool is_atomic = size <= sizeof(long long) && \
...@@ -836,14 +864,17 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(16); ...@@ -836,14 +864,17 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(16);
* The below are not required by KCSAN, but can still be emitted by the * The below are not required by KCSAN, but can still be emitted by the
* compiler. * compiler.
*/ */
void __tsan_func_entry(void *call_pc);
void __tsan_func_entry(void *call_pc) void __tsan_func_entry(void *call_pc)
{ {
} }
EXPORT_SYMBOL(__tsan_func_entry); EXPORT_SYMBOL(__tsan_func_entry);
void __tsan_func_exit(void);
void __tsan_func_exit(void) void __tsan_func_exit(void)
{ {
} }
EXPORT_SYMBOL(__tsan_func_exit); EXPORT_SYMBOL(__tsan_func_exit);
void __tsan_init(void);
void __tsan_init(void) void __tsan_init(void)
{ {
} }
......
// SPDX-License-Identifier: GPL-2.0
/*
 * KCSAN test with various race scenarios to test runtime behaviour. Since the
 * interface through which KCSAN's reports are obtained is the console, this is
 * the output we should verify. Each test case checks for the presence (or
* absence) of generated reports. Relies on 'console' tracepoint to capture
* reports as they appear in the kernel log.
*
* Makes use of KUnit for test organization, and the Torture framework for test
* thread control.
*
* Copyright (C) 2020, Google LLC.
* Author: Marco Elver <elver@google.com>
*/
#include <kunit/test.h>
#include <linux/jiffies.h>
#include <linux/kcsan-checks.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/seqlock.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/timer.h>
#include <linux/torture.h>
#include <linux/tracepoint.h>
#include <linux/types.h>
#include <trace/events/printk.h>
/* Points to current test-case memory access "kernels". */
static void (*access_kernels[2])(void);
static struct task_struct **threads; /* Lists of threads. */
static unsigned long end_time; /* End time of test. */
/* Report as observed from console. */
static struct {
spinlock_t lock;
int nlines;
char lines[3][512];
} observed = {
.lock = __SPIN_LOCK_UNLOCKED(observed.lock),
};
/* Setup test checking loop. */
static __no_kcsan inline void
begin_test_checks(void (*func1)(void), void (*func2)(void))
{
kcsan_disable_current();
/*
* Require at least as long as KCSAN_REPORT_ONCE_IN_MS, to ensure at
* least one race is reported.
*/
end_time = jiffies + msecs_to_jiffies(CONFIG_KCSAN_REPORT_ONCE_IN_MS + 500);
/* Signal start; release potential initialization of shared data. */
smp_store_release(&access_kernels[0], func1);
smp_store_release(&access_kernels[1], func2);
}
/* End test checking loop. */
static __no_kcsan inline bool
end_test_checks(bool stop)
{
if (!stop && time_before(jiffies, end_time)) {
/* Continue checking */
might_sleep();
return false;
}
kcsan_enable_current();
return true;
}
/*
* Probe for console output: checks if a race was reported, and obtains observed
* lines of interest.
*/
__no_kcsan
static void probe_console(void *ignore, const char *buf, size_t len)
{
unsigned long flags;
int nlines;
/*
* Note that KCSAN reports under a global lock, so we do not risk the
* possibility of having multiple reports interleaved. If that were the
* case, we'd expect tests to fail.
*/
spin_lock_irqsave(&observed.lock, flags);
nlines = observed.nlines;
if (strnstr(buf, "BUG: KCSAN: ", len) && strnstr(buf, "test_", len)) {
/*
 * A KCSAN report, and it is related to the test.
*
* The provided @buf is not NUL-terminated; copy no more than
* @len bytes and let strscpy() add the missing NUL-terminator.
*/
strscpy(observed.lines[0], buf, min(len + 1, sizeof(observed.lines[0])));
nlines = 1;
} else if ((nlines == 1 || nlines == 2) && strnstr(buf, "bytes by", len)) {
strscpy(observed.lines[nlines++], buf, min(len + 1, sizeof(observed.lines[0])));
if (strnstr(buf, "race at unknown origin", len)) {
if (WARN_ON(nlines != 2))
goto out;
/* No second line of interest. */
strcpy(observed.lines[nlines++], "<none>");
}
}
out:
WRITE_ONCE(observed.nlines, nlines); /* Publish new nlines. */
spin_unlock_irqrestore(&observed.lock, flags);
}
/* Check if a report related to the test exists. */
__no_kcsan
static bool report_available(void)
{
return READ_ONCE(observed.nlines) == ARRAY_SIZE(observed.lines);
}
/* Report information we expect in a report. */
struct expect_report {
/* Access information of both accesses. */
struct {
void *fn; /* Function pointer to expected function of top frame. */
void *addr; /* Address of access; unchecked if NULL. */
size_t size; /* Size of access; unchecked if @addr is NULL. */
int type; /* Access type, see KCSAN_ACCESS definitions. */
} access[2];
};
/* Check observed report matches information in @r. */
__no_kcsan
static bool report_matches(const struct expect_report *r)
{
const bool is_assert = (r->access[0].type | r->access[1].type) & KCSAN_ACCESS_ASSERT;
bool ret = false;
unsigned long flags;
typeof(observed.lines) expect;
const char *end;
char *cur;
int i;
/* Double-checked locking. */
if (!report_available())
return false;
/* Generate expected report contents. */
/* Title */
cur = expect[0];
end = &expect[0][sizeof(expect[0]) - 1];
cur += scnprintf(cur, end - cur, "BUG: KCSAN: %s in ",
is_assert ? "assert: race" : "data-race");
if (r->access[1].fn) {
char tmp[2][64];
int cmp;
/* Expect lexicographically sorted function names in title. */
scnprintf(tmp[0], sizeof(tmp[0]), "%pS", r->access[0].fn);
scnprintf(tmp[1], sizeof(tmp[1]), "%pS", r->access[1].fn);
cmp = strcmp(tmp[0], tmp[1]);
cur += scnprintf(cur, end - cur, "%ps / %ps",
cmp < 0 ? r->access[0].fn : r->access[1].fn,
cmp < 0 ? r->access[1].fn : r->access[0].fn);
} else {
scnprintf(cur, end - cur, "%pS", r->access[0].fn);
/* The exact offset won't match, remove it. */
cur = strchr(expect[0], '+');
if (cur)
*cur = '\0';
}
/* Access 1 */
cur = expect[1];
end = &expect[1][sizeof(expect[1]) - 1];
if (!r->access[1].fn)
cur += scnprintf(cur, end - cur, "race at unknown origin, with ");
/* Access 1 & 2 */
for (i = 0; i < 2; ++i) {
const char *const access_type =
(r->access[i].type & KCSAN_ACCESS_ASSERT) ?
((r->access[i].type & KCSAN_ACCESS_WRITE) ?
"assert no accesses" :
"assert no writes") :
((r->access[i].type & KCSAN_ACCESS_WRITE) ?
"write" :
"read");
const char *const access_type_aux =
(r->access[i].type & KCSAN_ACCESS_ATOMIC) ?
" (marked)" :
((r->access[i].type & KCSAN_ACCESS_SCOPED) ?
" (scoped)" :
"");
if (i == 1) {
/* Access 2 */
cur = expect[2];
end = &expect[2][sizeof(expect[2]) - 1];
if (!r->access[1].fn) {
/* Dummy string if no second access is available. */
strcpy(cur, "<none>");
break;
}
}
cur += scnprintf(cur, end - cur, "%s%s to ", access_type,
access_type_aux);
if (r->access[i].addr) /* Address is optional. */
cur += scnprintf(cur, end - cur, "0x%px of %zu bytes",
r->access[i].addr, r->access[i].size);
}
spin_lock_irqsave(&observed.lock, flags);
if (!report_available())
goto out; /* A new report is being captured. */
/* Finally match expected output to what we actually observed. */
ret = strstr(observed.lines[0], expect[0]) &&
/* Access info may appear in any order. */
((strstr(observed.lines[1], expect[1]) &&
strstr(observed.lines[2], expect[2])) ||
(strstr(observed.lines[1], expect[2]) &&
strstr(observed.lines[2], expect[1])));
out:
spin_unlock_irqrestore(&observed.lock, flags);
return ret;
}
/* ===== Test kernels ===== */
static long test_sink;
static long test_var;
/* @test_array should be large enough to fall into multiple watchpoint slots. */
static long test_array[3 * PAGE_SIZE / sizeof(long)];
static struct {
long val[8];
} test_struct;
static DEFINE_SEQLOCK(test_seqlock);
/*
* Helper to avoid compiler optimizing out reads, and to generate source values
* for writes.
*/
__no_kcsan
static noinline void sink_value(long v) { WRITE_ONCE(test_sink, v); }
static noinline void test_kernel_read(void) { sink_value(test_var); }
static noinline void test_kernel_write(void)
{
test_var = READ_ONCE_NOCHECK(test_sink) + 1;
}
static noinline void test_kernel_write_nochange(void) { test_var = 42; }
/* Suffixed by value-change exception filter. */
static noinline void test_kernel_write_nochange_rcu(void) { test_var = 42; }
static noinline void test_kernel_read_atomic(void)
{
sink_value(READ_ONCE(test_var));
}
static noinline void test_kernel_write_atomic(void)
{
WRITE_ONCE(test_var, READ_ONCE_NOCHECK(test_sink) + 1);
}
__no_kcsan
static noinline void test_kernel_write_uninstrumented(void) { test_var++; }
static noinline void test_kernel_data_race(void) { data_race(test_var++); }
static noinline void test_kernel_assert_writer(void)
{
ASSERT_EXCLUSIVE_WRITER(test_var);
}
static noinline void test_kernel_assert_access(void)
{
ASSERT_EXCLUSIVE_ACCESS(test_var);
}
#define TEST_CHANGE_BITS 0xff00ff00
static noinline void test_kernel_change_bits(void)
{
if (IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) {
/*
 * Avoid a race of unknown origin for this test; just pretend the
 * accesses are atomic.
*/
kcsan_nestable_atomic_begin();
test_var ^= TEST_CHANGE_BITS;
kcsan_nestable_atomic_end();
} else
WRITE_ONCE(test_var, READ_ONCE(test_var) ^ TEST_CHANGE_BITS);
}
static noinline void test_kernel_assert_bits_change(void)
{
ASSERT_EXCLUSIVE_BITS(test_var, TEST_CHANGE_BITS);
}
static noinline void test_kernel_assert_bits_nochange(void)
{
ASSERT_EXCLUSIVE_BITS(test_var, ~TEST_CHANGE_BITS);
}
/* To check that scoped assertions do trigger anywhere in scope. */
static noinline void test_enter_scope(void)
{
int x = 0;
/* Unrelated accesses to scoped assert. */
READ_ONCE(test_sink);
kcsan_check_read(&x, sizeof(x));
}
static noinline void test_kernel_assert_writer_scoped(void)
{
ASSERT_EXCLUSIVE_WRITER_SCOPED(test_var);
test_enter_scope();
}
static noinline void test_kernel_assert_access_scoped(void)
{
ASSERT_EXCLUSIVE_ACCESS_SCOPED(test_var);
test_enter_scope();
}
static noinline void test_kernel_rmw_array(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(test_array); ++i)
test_array[i]++;
}
static noinline void test_kernel_write_struct(void)
{
kcsan_check_write(&test_struct, sizeof(test_struct));
kcsan_disable_current();
test_struct.val[3]++; /* induce value change */
kcsan_enable_current();
}
static noinline void test_kernel_write_struct_part(void)
{
test_struct.val[3] = 42;
}
static noinline void test_kernel_read_struct_zero_size(void)
{
kcsan_check_read(&test_struct.val[3], 0);
}
static noinline void test_kernel_jiffies_reader(void)
{
sink_value((long)jiffies);
}
static noinline void test_kernel_seqlock_reader(void)
{
unsigned int seq;
do {
seq = read_seqbegin(&test_seqlock);
sink_value(test_var);
} while (read_seqretry(&test_seqlock, seq));
}
static noinline void test_kernel_seqlock_writer(void)
{
unsigned long flags;
write_seqlock_irqsave(&test_seqlock, flags);
test_var++;
write_sequnlock_irqrestore(&test_seqlock, flags);
}
/* ===== Test cases ===== */
/* Simple test with normal data race. */
__no_kcsan
static void test_basic(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
static const struct expect_report never = {
.access = {
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
bool match_expect = false;
bool match_never = false;
begin_test_checks(test_kernel_write, test_kernel_read);
do {
match_expect |= report_matches(&expect);
match_never = report_matches(&never);
} while (!end_test_checks(match_never));
KUNIT_EXPECT_TRUE(test, match_expect);
KUNIT_EXPECT_FALSE(test, match_never);
}
/*
* Stress KCSAN with lots of concurrent races on different addresses until
* timeout.
*/
__no_kcsan
static void test_concurrent_races(struct kunit *test)
{
const struct expect_report expect = {
.access = {
/* NULL will match any address. */
{ test_kernel_rmw_array, NULL, 0, KCSAN_ACCESS_WRITE },
{ test_kernel_rmw_array, NULL, 0, 0 },
},
};
static const struct expect_report never = {
.access = {
{ test_kernel_rmw_array, NULL, 0, 0 },
{ test_kernel_rmw_array, NULL, 0, 0 },
},
};
bool match_expect = false;
bool match_never = false;
begin_test_checks(test_kernel_rmw_array, test_kernel_rmw_array);
do {
match_expect |= report_matches(&expect);
match_never |= report_matches(&never);
} while (!end_test_checks(false));
KUNIT_EXPECT_TRUE(test, match_expect); /* Sanity check matches exist. */
KUNIT_EXPECT_FALSE(test, match_never);
}
/* Test the KCSAN_REPORT_VALUE_CHANGE_ONLY option. */
__no_kcsan
static void test_novalue_change(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_write_nochange, test_kernel_read);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
if (IS_ENABLED(CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY))
KUNIT_EXPECT_FALSE(test, match_expect);
else
KUNIT_EXPECT_TRUE(test, match_expect);
}
/*
 * Test that the rules under which the KCSAN_REPORT_VALUE_CHANGE_ONLY option
 * must never apply are honoured.
*/
__no_kcsan
static void test_novalue_change_exception(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write_nochange_rcu, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_write_nochange_rcu, test_kernel_read);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
/* Test that data races of unknown origin are reported. */
__no_kcsan
static void test_unknown_origin(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
{ NULL },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_write_uninstrumented, test_kernel_read);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
if (IS_ENABLED(CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN))
KUNIT_EXPECT_TRUE(test, match_expect);
else
KUNIT_EXPECT_FALSE(test, match_expect);
}
/* Test KCSAN_ASSUME_PLAIN_WRITES_ATOMIC if it is selected. */
__no_kcsan
static void test_write_write_assume_atomic(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
{ test_kernel_write, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_write, test_kernel_write);
do {
sink_value(READ_ONCE(test_var)); /* induce value-change */
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
if (IS_ENABLED(CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC))
KUNIT_EXPECT_FALSE(test, match_expect);
else
KUNIT_EXPECT_TRUE(test, match_expect);
}
/*
* Test that data races with writes larger than word-size are always reported,
* even if KCSAN_ASSUME_PLAIN_WRITES_ATOMIC is selected.
*/
__no_kcsan
static void test_write_write_struct(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_write_struct, test_kernel_write_struct);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
/*
* Test that data races where only one write is larger than word-size are always
* reported, even if KCSAN_ASSUME_PLAIN_WRITES_ATOMIC is selected.
*/
__no_kcsan
static void test_write_write_struct_part(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
{ test_kernel_write_struct_part, &test_struct.val[3], sizeof(test_struct.val[3]), KCSAN_ACCESS_WRITE },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_write_struct, test_kernel_write_struct_part);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
/* Test that races with atomic accesses never result in reports. */
__no_kcsan
static void test_read_atomic_write_atomic(struct kunit *test)
{
bool match_never = false;
begin_test_checks(test_kernel_read_atomic, test_kernel_write_atomic);
do {
match_never = report_available();
} while (!end_test_checks(match_never));
KUNIT_EXPECT_FALSE(test, match_never);
}
/* Test that a race between an atomic and a plain access results in a report. */
__no_kcsan
static void test_read_plain_atomic_write(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
{ test_kernel_write_atomic, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC },
},
};
bool match_expect = false;
if (IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS))
return;
begin_test_checks(test_kernel_read, test_kernel_write_atomic);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
/* Zero-sized accesses should never cause data race reports. */
__no_kcsan
static void test_zero_size_access(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
},
};
const struct expect_report never = {
.access = {
{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
{ test_kernel_read_struct_zero_size, &test_struct.val[3], 0, 0 },
},
};
bool match_expect = false;
bool match_never = false;
begin_test_checks(test_kernel_write_struct, test_kernel_read_struct_zero_size);
do {
match_expect |= report_matches(&expect);
match_never = report_matches(&never);
} while (!end_test_checks(match_never));
KUNIT_EXPECT_TRUE(test, match_expect); /* Sanity check. */
KUNIT_EXPECT_FALSE(test, match_never);
}
/* Test the data_race() macro. */
__no_kcsan
static void test_data_race(struct kunit *test)
{
bool match_never = false;
begin_test_checks(test_kernel_data_race, test_kernel_data_race);
do {
match_never = report_available();
} while (!end_test_checks(match_never));
KUNIT_EXPECT_FALSE(test, match_never);
}
__no_kcsan
static void test_assert_exclusive_writer(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_assert_writer, test_kernel_write_nochange);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
__no_kcsan
static void test_assert_exclusive_access(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_assert_access, test_kernel_read);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
__no_kcsan
static void test_assert_exclusive_access_writer(struct kunit *test)
{
const struct expect_report expect_access_writer = {
.access = {
{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
},
};
const struct expect_report expect_access_access = {
.access = {
{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
},
};
const struct expect_report never = {
.access = {
{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
},
};
bool match_expect_access_writer = false;
bool match_expect_access_access = false;
bool match_never = false;
begin_test_checks(test_kernel_assert_access, test_kernel_assert_writer);
do {
match_expect_access_writer |= report_matches(&expect_access_writer);
match_expect_access_access |= report_matches(&expect_access_access);
match_never |= report_matches(&never);
} while (!end_test_checks(match_never));
KUNIT_EXPECT_TRUE(test, match_expect_access_writer);
KUNIT_EXPECT_TRUE(test, match_expect_access_access);
KUNIT_EXPECT_FALSE(test, match_never);
}
__no_kcsan
static void test_assert_exclusive_bits_change(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_assert_bits_change, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
{ test_kernel_change_bits, &test_var, sizeof(test_var),
KCSAN_ACCESS_WRITE | (IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS) ? 0 : KCSAN_ACCESS_ATOMIC) },
},
};
bool match_expect = false;
begin_test_checks(test_kernel_assert_bits_change, test_kernel_change_bits);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
__no_kcsan
static void test_assert_exclusive_bits_nochange(struct kunit *test)
{
bool match_never = false;
begin_test_checks(test_kernel_assert_bits_nochange, test_kernel_change_bits);
do {
match_never = report_available();
} while (!end_test_checks(match_never));
KUNIT_EXPECT_FALSE(test, match_never);
}
__no_kcsan
static void test_assert_exclusive_writer_scoped(struct kunit *test)
{
const struct expect_report expect_start = {
.access = {
{ test_kernel_assert_writer_scoped, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_SCOPED },
{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
},
};
const struct expect_report expect_anywhere = {
.access = {
{ test_enter_scope, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_SCOPED },
{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
},
};
bool match_expect_start = false;
bool match_expect_anywhere = false;
begin_test_checks(test_kernel_assert_writer_scoped, test_kernel_write_nochange);
do {
match_expect_start |= report_matches(&expect_start);
match_expect_anywhere |= report_matches(&expect_anywhere);
} while (!end_test_checks(match_expect_start && match_expect_anywhere));
KUNIT_EXPECT_TRUE(test, match_expect_start);
KUNIT_EXPECT_TRUE(test, match_expect_anywhere);
}
__no_kcsan
static void test_assert_exclusive_access_scoped(struct kunit *test)
{
const struct expect_report expect_start1 = {
.access = {
{ test_kernel_assert_access_scoped, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_SCOPED },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
const struct expect_report expect_start2 = {
.access = { expect_start1.access[0], expect_start1.access[0] },
};
const struct expect_report expect_inscope = {
.access = {
{ test_enter_scope, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_SCOPED },
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
},
};
bool match_expect_start = false;
bool match_expect_inscope = false;
begin_test_checks(test_kernel_assert_access_scoped, test_kernel_read);
end_time += msecs_to_jiffies(1000); /* This test requires a bit more time. */
do {
match_expect_start |= report_matches(&expect_start1) || report_matches(&expect_start2);
match_expect_inscope |= report_matches(&expect_inscope);
} while (!end_test_checks(match_expect_start && match_expect_inscope));
KUNIT_EXPECT_TRUE(test, match_expect_start);
KUNIT_EXPECT_TRUE(test, match_expect_inscope);
}
/*
* jiffies is special (declared to be volatile) and its accesses are typically
 * not marked; this test ensures that neither the compiler nor KCSAN gets confused about
* jiffies's declaration on different architectures.
*/
__no_kcsan
static void test_jiffies_noreport(struct kunit *test)
{
bool match_never = false;
begin_test_checks(test_kernel_jiffies_reader, test_kernel_jiffies_reader);
do {
match_never = report_available();
} while (!end_test_checks(match_never));
KUNIT_EXPECT_FALSE(test, match_never);
}
/* Test that racing accesses in seqlock critical sections are not reported. */
__no_kcsan
static void test_seqlock_noreport(struct kunit *test)
{
bool match_never = false;
begin_test_checks(test_kernel_seqlock_reader, test_kernel_seqlock_writer);
do {
match_never = report_available();
} while (!end_test_checks(match_never));
KUNIT_EXPECT_FALSE(test, match_never);
}
/*
* Each test case is run with different numbers of threads. Until KUnit supports
* passing arguments for each test case, we encode #threads in the test case
* name (read by get_num_threads()). [The '-' was chosen as a stylistic
* preference to separate test name and #threads.]
*
* The thread counts are chosen to cover potentially interesting boundaries and
* corner cases (range 2-5), and then stress the system with larger counts.
*/
#define KCSAN_KUNIT_CASE(test_name) \
{ .run_case = test_name, .name = #test_name "-02" }, \
{ .run_case = test_name, .name = #test_name "-03" }, \
{ .run_case = test_name, .name = #test_name "-04" }, \
{ .run_case = test_name, .name = #test_name "-05" }, \
{ .run_case = test_name, .name = #test_name "-08" }, \
{ .run_case = test_name, .name = #test_name "-16" }
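/*
 * For example, KCSAN_KUNIT_CASE(test_basic) expands to six kunit_case
 * entries named "test_basic-02" through "test_basic-16"; get_num_threads()
 * below recovers the thread count from the last two characters of the name.
 */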
static struct kunit_case kcsan_test_cases[] = {
KCSAN_KUNIT_CASE(test_basic),
KCSAN_KUNIT_CASE(test_concurrent_races),
KCSAN_KUNIT_CASE(test_novalue_change),
KCSAN_KUNIT_CASE(test_novalue_change_exception),
KCSAN_KUNIT_CASE(test_unknown_origin),
KCSAN_KUNIT_CASE(test_write_write_assume_atomic),
KCSAN_KUNIT_CASE(test_write_write_struct),
KCSAN_KUNIT_CASE(test_write_write_struct_part),
KCSAN_KUNIT_CASE(test_read_atomic_write_atomic),
KCSAN_KUNIT_CASE(test_read_plain_atomic_write),
KCSAN_KUNIT_CASE(test_zero_size_access),
KCSAN_KUNIT_CASE(test_data_race),
KCSAN_KUNIT_CASE(test_assert_exclusive_writer),
KCSAN_KUNIT_CASE(test_assert_exclusive_access),
KCSAN_KUNIT_CASE(test_assert_exclusive_access_writer),
KCSAN_KUNIT_CASE(test_assert_exclusive_bits_change),
KCSAN_KUNIT_CASE(test_assert_exclusive_bits_nochange),
KCSAN_KUNIT_CASE(test_assert_exclusive_writer_scoped),
KCSAN_KUNIT_CASE(test_assert_exclusive_access_scoped),
KCSAN_KUNIT_CASE(test_jiffies_noreport),
KCSAN_KUNIT_CASE(test_seqlock_noreport),
{},
};
/* ===== End test cases ===== */
/* Get number of threads encoded in test name. */
static bool __no_kcsan
get_num_threads(const char *test, int *nthreads)
{
int len = strlen(test);
if (WARN_ON(len < 3))
return false;
*nthreads = test[len - 1] - '0';
*nthreads += (test[len - 2] - '0') * 10;
if (WARN_ON(*nthreads < 0))
return false;
return true;
}
/* Concurrent accesses from interrupts. */
__no_kcsan
static void access_thread_timer(struct timer_list *timer)
{
static atomic_t cnt = ATOMIC_INIT(0);
unsigned int idx;
void (*func)(void);
idx = (unsigned int)atomic_inc_return(&cnt) % ARRAY_SIZE(access_kernels);
/* Acquire potential initialization. */
func = smp_load_acquire(&access_kernels[idx]);
if (func)
func();
}
/* The main loop for each thread. */
__no_kcsan
static int access_thread(void *arg)
{
struct timer_list timer;
unsigned int cnt = 0;
unsigned int idx;
void (*func)(void);
timer_setup_on_stack(&timer, access_thread_timer, 0);
do {
might_sleep();
if (!timer_pending(&timer))
mod_timer(&timer, jiffies + 1);
else {
/* Iterate through all kernels. */
idx = cnt++ % ARRAY_SIZE(access_kernels);
/* Acquire potential initialization. */
func = smp_load_acquire(&access_kernels[idx]);
if (func)
func();
}
} while (!torture_must_stop());
del_timer_sync(&timer);
destroy_timer_on_stack(&timer);
torture_kthread_stopping("access_thread");
return 0;
}
__no_kcsan
static int test_init(struct kunit *test)
{
unsigned long flags;
int nthreads;
int i;
spin_lock_irqsave(&observed.lock, flags);
for (i = 0; i < ARRAY_SIZE(observed.lines); ++i)
observed.lines[i][0] = '\0';
observed.nlines = 0;
spin_unlock_irqrestore(&observed.lock, flags);
if (!torture_init_begin((char *)test->name, 1))
return -EBUSY;
if (!get_num_threads(test->name, &nthreads))
goto err;
if (WARN_ON(threads))
goto err;
for (i = 0; i < ARRAY_SIZE(access_kernels); ++i) {
if (WARN_ON(access_kernels[i]))
goto err;
}
if (!IS_ENABLED(CONFIG_PREEMPT) || !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) {
/*
* Without any preemption, keep 2 CPUs free for other tasks, one
* of which is the main test case function checking for
* completion or failure.
*/
const int min_unused_cpus = IS_ENABLED(CONFIG_PREEMPT_NONE) ? 2 : 0;
const int min_required_cpus = 2 + min_unused_cpus;
if (num_online_cpus() < min_required_cpus) {
pr_err("%s: too few online CPUs (%u < %d) for test",
test->name, num_online_cpus(), min_required_cpus);
goto err;
} else if (nthreads > num_online_cpus() - min_unused_cpus) {
nthreads = num_online_cpus() - min_unused_cpus;
pr_warn("%s: limiting number of threads to %d\n",
test->name, nthreads);
}
}
if (nthreads) {
threads = kcalloc(nthreads + 1, sizeof(struct task_struct *),
GFP_KERNEL);
if (WARN_ON(!threads))
goto err;
threads[nthreads] = NULL;
for (i = 0; i < nthreads; ++i) {
if (torture_create_kthread(access_thread, NULL,
threads[i]))
goto err;
}
}
torture_init_end();
return 0;
err:
kfree(threads);
threads = NULL;
torture_init_end();
return -EINVAL;
}
__no_kcsan
static void test_exit(struct kunit *test)
{
struct task_struct **stop_thread;
int i;
if (torture_cleanup_begin())
return;
for (i = 0; i < ARRAY_SIZE(access_kernels); ++i)
WRITE_ONCE(access_kernels[i], NULL);
if (threads) {
for (stop_thread = threads; *stop_thread; stop_thread++)
torture_stop_kthread(access_thread, *stop_thread);
kfree(threads);
threads = NULL;
}
torture_cleanup_end();
}
static struct kunit_suite kcsan_test_suite = {
.name = "kcsan-test",
.test_cases = kcsan_test_cases,
.init = test_init,
.exit = test_exit,
};
static struct kunit_suite *kcsan_test_suites[] = { &kcsan_test_suite, NULL };
__no_kcsan
static void register_tracepoints(struct tracepoint *tp, void *ignore)
{
check_trace_callback_type_console(probe_console);
if (!strcmp(tp->name, "console"))
WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
}
__no_kcsan
static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
{
if (!strcmp(tp->name, "console"))
tracepoint_probe_unregister(tp, probe_console, NULL);
}
/*
 * We only want to do tracepoint setup and teardown once; therefore we have to
* customize the init and exit functions and cannot rely on kunit_test_suite().
*/
static int __init kcsan_test_init(void)
{
/*
* Because we want to be able to build the test as a module, we need to
* iterate through all known tracepoints, since the static registration
* won't work here.
*/
for_each_kernel_tracepoint(register_tracepoints, NULL);
return __kunit_test_suites_init(kcsan_test_suites);
}
static void kcsan_test_exit(void)
{
__kunit_test_suites_exit(kcsan_test_suites);
for_each_kernel_tracepoint(unregister_tracepoints, NULL);
tracepoint_synchronize_unregister();
}
late_initcall(kcsan_test_init);
module_exit(kcsan_test_exit);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Marco Elver <elver@google.com>");
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
#define _KERNEL_KCSAN_KCSAN_H #define _KERNEL_KCSAN_KCSAN_H
#include <linux/kcsan.h> #include <linux/kcsan.h>
#include <linux/sched.h>
/* The number of adjacent watchpoints to check. */ /* The number of adjacent watchpoints to check. */
#define KCSAN_CHECK_ADJACENT 1 #define KCSAN_CHECK_ADJACENT 1
...@@ -22,6 +23,12 @@ extern unsigned int kcsan_udelay_interrupt; ...@@ -22,6 +23,12 @@ extern unsigned int kcsan_udelay_interrupt;
*/ */
extern bool kcsan_enabled; extern bool kcsan_enabled;
/*
* Save/restore IRQ flags state trace dirtied by KCSAN.
*/
void kcsan_save_irqtrace(struct task_struct *task);
void kcsan_restore_irqtrace(struct task_struct *task);
/* /*
* Initialize debugfs file. * Initialize debugfs file.
*/ */
......
...@@ -308,6 +308,9 @@ static void print_verbose_info(struct task_struct *task) ...@@ -308,6 +308,9 @@ static void print_verbose_info(struct task_struct *task)
if (!task) if (!task)
return; return;
/* Restore IRQ state trace for printing. */
kcsan_restore_irqtrace(task);
pr_err("\n"); pr_err("\n");
debug_show_held_locks(task); debug_show_held_locks(task);
print_irqtrace_events(task); print_irqtrace_events(task);
...@@ -606,10 +609,11 @@ void kcsan_report(const volatile void *ptr, size_t size, int access_type, ...@@ -606,10 +609,11 @@ void kcsan_report(const volatile void *ptr, size_t size, int access_type,
goto out; goto out;
/* /*
* With TRACE_IRQFLAGS, lockdep's IRQ trace state becomes corrupted if * Because we may generate reports when we're in scheduler code, the use
* we do not turn off lockdep here; this could happen due to recursion * of printk() could deadlock. Until such time that all printing code
* into lockdep via KCSAN if we detect a race in utilities used by * called in print_report() is scheduler-safe, accept the risk, and just
* lockdep. * get our message out. As such, also disable lockdep to hide the
* warning, and avoid disabling lockdep for the rest of the kernel.
*/ */
lockdep_off(); lockdep_off();
......
...@@ -395,7 +395,7 @@ void lockdep_init_task(struct task_struct *task) ...@@ -395,7 +395,7 @@ void lockdep_init_task(struct task_struct *task)
static __always_inline void lockdep_recursion_finish(void) static __always_inline void lockdep_recursion_finish(void)
{ {
if (WARN_ON_ONCE(--current->lockdep_recursion)) if (WARN_ON_ONCE((--current->lockdep_recursion) & LOCKDEP_RECURSION_MASK))
current->lockdep_recursion = 0; current->lockdep_recursion = 0;
} }
...@@ -2062,9 +2062,9 @@ print_bad_irq_dependency(struct task_struct *curr, ...@@ -2062,9 +2062,9 @@ print_bad_irq_dependency(struct task_struct *curr,
pr_warn("-----------------------------------------------------\n"); pr_warn("-----------------------------------------------------\n");
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n", pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
curr->comm, task_pid_nr(curr), curr->comm, task_pid_nr(curr),
curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT, lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT,
curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT, curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
curr->hardirqs_enabled, lockdep_hardirqs_enabled(),
curr->softirqs_enabled); curr->softirqs_enabled);
print_lock(next); print_lock(next);
...@@ -3331,9 +3331,9 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this, ...@@ -3331,9 +3331,9 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n", pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
curr->comm, task_pid_nr(curr), curr->comm, task_pid_nr(curr),
lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT, lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT,
lockdep_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT, lockdep_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
lockdep_hardirqs_enabled(curr), lockdep_hardirqs_enabled(),
lockdep_softirqs_enabled(curr)); lockdep_softirqs_enabled(curr));
print_lock(this); print_lock(this);
...@@ -3484,19 +3484,21 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this, ...@@ -3484,19 +3484,21 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this,
void print_irqtrace_events(struct task_struct *curr) void print_irqtrace_events(struct task_struct *curr)
{ {
printk("irq event stamp: %u\n", curr->irq_events); const struct irqtrace_events *trace = &curr->irqtrace;
printk("irq event stamp: %u\n", trace->irq_events);
printk("hardirqs last enabled at (%u): [<%px>] %pS\n", printk("hardirqs last enabled at (%u): [<%px>] %pS\n",
curr->hardirq_enable_event, (void *)curr->hardirq_enable_ip, trace->hardirq_enable_event, (void *)trace->hardirq_enable_ip,
(void *)curr->hardirq_enable_ip); (void *)trace->hardirq_enable_ip);
printk("hardirqs last disabled at (%u): [<%px>] %pS\n", printk("hardirqs last disabled at (%u): [<%px>] %pS\n",
curr->hardirq_disable_event, (void *)curr->hardirq_disable_ip, trace->hardirq_disable_event, (void *)trace->hardirq_disable_ip,
(void *)curr->hardirq_disable_ip); (void *)trace->hardirq_disable_ip);
printk("softirqs last enabled at (%u): [<%px>] %pS\n", printk("softirqs last enabled at (%u): [<%px>] %pS\n",
curr->softirq_enable_event, (void *)curr->softirq_enable_ip, trace->softirq_enable_event, (void *)trace->softirq_enable_ip,
(void *)curr->softirq_enable_ip); (void *)trace->softirq_enable_ip);
printk("softirqs last disabled at (%u): [<%px>] %pS\n", printk("softirqs last disabled at (%u): [<%px>] %pS\n",
curr->softirq_disable_event, (void *)curr->softirq_disable_ip, trace->softirq_disable_event, (void *)trace->softirq_disable_ip,
(void *)curr->softirq_disable_ip); (void *)trace->softirq_disable_ip);
} }
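The accesses above go through a new struct irqtrace_events embedded in task_struct (current->irqtrace). The following is only a sketch of its shape, inferred from the field names and printk formats used in this hunk; the exact field order and the header it lives in (<linux/irqflags.h>) are assumptions, not part of this hunk:

/* Sketch inferred from the trace->... accesses above; not verbatim from the tree. */
struct irqtrace_events {
	unsigned int	irq_events;
	unsigned long	hardirq_enable_ip;
	unsigned long	hardirq_disable_ip;
	unsigned int	hardirq_enable_event;
	unsigned int	hardirq_disable_event;
	unsigned long	softirq_disable_ip;
	unsigned long	softirq_enable_ip;
	unsigned int	softirq_disable_event;
	unsigned int	softirq_enable_event;
};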
static int HARDIRQ_verbose(struct lock_class *class) static int HARDIRQ_verbose(struct lock_class *class)
...@@ -3646,10 +3648,19 @@ static void __trace_hardirqs_on_caller(void) ...@@ -3646,10 +3648,19 @@ static void __trace_hardirqs_on_caller(void)
*/ */
void lockdep_hardirqs_on_prepare(unsigned long ip) void lockdep_hardirqs_on_prepare(unsigned long ip)
{ {
if (unlikely(!debug_locks || current->lockdep_recursion)) if (unlikely(!debug_locks))
return;
/*
* NMIs do not (and cannot) track lock dependencies, nothing to do.
*/
if (unlikely(in_nmi()))
return;
if (unlikely(current->lockdep_recursion & LOCKDEP_RECURSION_MASK))
return; return;
if (unlikely(current->hardirqs_enabled)) { if (unlikely(lockdep_hardirqs_enabled())) {
/* /*
* Neither irq nor preemption are disabled here * Neither irq nor preemption are disabled here
* so this is racy by nature but losing one hit * so this is racy by nature but losing one hit
...@@ -3677,7 +3688,7 @@ void lockdep_hardirqs_on_prepare(unsigned long ip) ...@@ -3677,7 +3688,7 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
* Can't allow enabling interrupts while in an interrupt handler, * Can't allow enabling interrupts while in an interrupt handler,
* that's general bad form and such. Recursion, limited stack etc.. * that's general bad form and such. Recursion, limited stack etc..
*/ */
if (DEBUG_LOCKS_WARN_ON(current->hardirq_context)) if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context()))
return; return;
current->hardirq_chain_key = current->curr_chain_key; current->hardirq_chain_key = current->curr_chain_key;
...@@ -3690,12 +3701,35 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on_prepare); ...@@ -3690,12 +3701,35 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on_prepare);
void noinstr lockdep_hardirqs_on(unsigned long ip) void noinstr lockdep_hardirqs_on(unsigned long ip)
{ {
struct task_struct *curr = current; struct irqtrace_events *trace = &current->irqtrace;
if (unlikely(!debug_locks))
return;
/*
* NMIs can happen in the middle of local_irq_{en,dis}able() where the
* tracking state and hardware state are out of sync.
*
* NMIs must save lockdep_hardirqs_enabled() to restore IRQ state from,
* and not rely on hardware state like normal interrupts.
*/
if (unlikely(in_nmi())) {
if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
return;
/*
* Skip:
* - recursion check, because NMI can hit lockdep;
* - hardware state check, because above;
* - chain_key check, see lockdep_hardirqs_on_prepare().
*/
goto skip_checks;
}
if (unlikely(!debug_locks || curr->lockdep_recursion)) if (unlikely(current->lockdep_recursion & LOCKDEP_RECURSION_MASK))
return; return;
if (curr->hardirqs_enabled) { if (lockdep_hardirqs_enabled()) {
/* /*
* Neither irq nor preemption are disabled here * Neither irq nor preemption are disabled here
* so this is racy by nature but losing one hit * so this is racy by nature but losing one hit
...@@ -3720,10 +3754,11 @@ void noinstr lockdep_hardirqs_on(unsigned long ip) ...@@ -3720,10 +3754,11 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key != DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key !=
current->curr_chain_key); current->curr_chain_key);
skip_checks:
/* we'll do an OFF -> ON transition: */ /* we'll do an OFF -> ON transition: */
curr->hardirqs_enabled = 1; this_cpu_write(hardirqs_enabled, 1);
curr->hardirq_enable_ip = ip; trace->hardirq_enable_ip = ip;
curr->hardirq_enable_event = ++curr->irq_events; trace->hardirq_enable_event = ++trace->irq_events;
debug_atomic_inc(hardirqs_on_events); debug_atomic_inc(hardirqs_on_events);
} }
EXPORT_SYMBOL_GPL(lockdep_hardirqs_on); EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
...@@ -3733,9 +3768,18 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on); ...@@ -3733,9 +3768,18 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
*/ */
void noinstr lockdep_hardirqs_off(unsigned long ip) void noinstr lockdep_hardirqs_off(unsigned long ip)
{ {
struct task_struct *curr = current; if (unlikely(!debug_locks))
return;
if (unlikely(!debug_locks || curr->lockdep_recursion)) /*
* Matching lockdep_hardirqs_on(), allow NMIs in the middle of lockdep;
* they will restore the software state. This ensures the software
* state is consistent inside NMIs as well.
*/
if (in_nmi()) {
if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
return;
} else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
return; return;
/* /*
...@@ -3745,13 +3789,15 @@ void noinstr lockdep_hardirqs_off(unsigned long ip) ...@@ -3745,13 +3789,15 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return; return;
if (curr->hardirqs_enabled) { if (lockdep_hardirqs_enabled()) {
struct irqtrace_events *trace = &current->irqtrace;
/* /*
* We have done an ON -> OFF transition: * We have done an ON -> OFF transition:
*/ */
curr->hardirqs_enabled = 0; this_cpu_write(hardirqs_enabled, 0);
curr->hardirq_disable_ip = ip; trace->hardirq_disable_ip = ip;
curr->hardirq_disable_event = ++curr->irq_events; trace->hardirq_disable_event = ++trace->irq_events;
debug_atomic_inc(hardirqs_off_events); debug_atomic_inc(hardirqs_off_events);
} else { } else {
debug_atomic_inc(redundant_hardirqs_off); debug_atomic_inc(redundant_hardirqs_off);
...@@ -3764,7 +3810,7 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_off); ...@@ -3764,7 +3810,7 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
*/ */
void lockdep_softirqs_on(unsigned long ip) void lockdep_softirqs_on(unsigned long ip)
{ {
struct task_struct *curr = current; struct irqtrace_events *trace = &current->irqtrace;
if (unlikely(!debug_locks || current->lockdep_recursion)) if (unlikely(!debug_locks || current->lockdep_recursion))
return; return;
...@@ -3776,7 +3822,7 @@ void lockdep_softirqs_on(unsigned long ip) ...@@ -3776,7 +3822,7 @@ void lockdep_softirqs_on(unsigned long ip)
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return; return;
if (curr->softirqs_enabled) { if (current->softirqs_enabled) {
debug_atomic_inc(redundant_softirqs_on); debug_atomic_inc(redundant_softirqs_on);
return; return;
} }
...@@ -3785,17 +3831,17 @@ void lockdep_softirqs_on(unsigned long ip) ...@@ -3785,17 +3831,17 @@ void lockdep_softirqs_on(unsigned long ip)
/* /*
* We'll do an OFF -> ON transition: * We'll do an OFF -> ON transition:
*/ */
curr->softirqs_enabled = 1; current->softirqs_enabled = 1;
curr->softirq_enable_ip = ip; trace->softirq_enable_ip = ip;
curr->softirq_enable_event = ++curr->irq_events; trace->softirq_enable_event = ++trace->irq_events;
debug_atomic_inc(softirqs_on_events); debug_atomic_inc(softirqs_on_events);
/* /*
* We are going to turn softirqs on, so set the * We are going to turn softirqs on, so set the
* usage bit for all held locks, if hardirqs are * usage bit for all held locks, if hardirqs are
* enabled too: * enabled too:
*/ */
if (curr->hardirqs_enabled) if (lockdep_hardirqs_enabled())
mark_held_locks(curr, LOCK_ENABLED_SOFTIRQ); mark_held_locks(current, LOCK_ENABLED_SOFTIRQ);
lockdep_recursion_finish(); lockdep_recursion_finish();
} }
...@@ -3804,8 +3850,6 @@ void lockdep_softirqs_on(unsigned long ip) ...@@ -3804,8 +3850,6 @@ void lockdep_softirqs_on(unsigned long ip)
*/ */
void lockdep_softirqs_off(unsigned long ip) void lockdep_softirqs_off(unsigned long ip)
{ {
struct task_struct *curr = current;
if (unlikely(!debug_locks || current->lockdep_recursion)) if (unlikely(!debug_locks || current->lockdep_recursion))
return; return;
...@@ -3815,13 +3859,15 @@ void lockdep_softirqs_off(unsigned long ip) ...@@ -3815,13 +3859,15 @@ void lockdep_softirqs_off(unsigned long ip)
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return; return;
if (curr->softirqs_enabled) { if (current->softirqs_enabled) {
struct irqtrace_events *trace = &current->irqtrace;
/* /*
* We have done an ON -> OFF transition: * We have done an ON -> OFF transition:
*/ */
curr->softirqs_enabled = 0; current->softirqs_enabled = 0;
curr->softirq_disable_ip = ip; trace->softirq_disable_ip = ip;
curr->softirq_disable_event = ++curr->irq_events; trace->softirq_disable_event = ++trace->irq_events;
debug_atomic_inc(softirqs_off_events); debug_atomic_inc(softirqs_off_events);
/* /*
* Whoops, we wanted softirqs off, so why aren't they? * Whoops, we wanted softirqs off, so why aren't they?
...@@ -3843,7 +3889,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check) ...@@ -3843,7 +3889,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
*/ */
if (!hlock->trylock) { if (!hlock->trylock) {
if (hlock->read) { if (hlock->read) {
if (curr->hardirq_context) if (lockdep_hardirq_context())
if (!mark_lock(curr, hlock, if (!mark_lock(curr, hlock,
LOCK_USED_IN_HARDIRQ_READ)) LOCK_USED_IN_HARDIRQ_READ))
return 0; return 0;
...@@ -3852,7 +3898,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check) ...@@ -3852,7 +3898,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
LOCK_USED_IN_SOFTIRQ_READ)) LOCK_USED_IN_SOFTIRQ_READ))
return 0; return 0;
} else { } else {
if (curr->hardirq_context) if (lockdep_hardirq_context())
if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ)) if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ))
return 0; return 0;
if (curr->softirq_context) if (curr->softirq_context)
...@@ -3890,7 +3936,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check) ...@@ -3890,7 +3936,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
static inline unsigned int task_irq_context(struct task_struct *task) static inline unsigned int task_irq_context(struct task_struct *task)
{ {
return LOCK_CHAIN_HARDIRQ_CONTEXT * !!task->hardirq_context + return LOCK_CHAIN_HARDIRQ_CONTEXT * !!lockdep_hardirq_context() +
LOCK_CHAIN_SOFTIRQ_CONTEXT * !!task->softirq_context; LOCK_CHAIN_SOFTIRQ_CONTEXT * !!task->softirq_context;
} }
...@@ -3983,7 +4029,7 @@ static inline short task_wait_context(struct task_struct *curr) ...@@ -3983,7 +4029,7 @@ static inline short task_wait_context(struct task_struct *curr)
* Set appropriate wait type for the context; for IRQs we have to take * Set appropriate wait type for the context; for IRQs we have to take
* into account force_irqthread as that is implied by PREEMPT_RT. * into account force_irqthread as that is implied by PREEMPT_RT.
*/ */
if (curr->hardirq_context) { if (lockdep_hardirq_context()) {
/* /*
* Check if force_irqthreads will run us threaded. * Check if force_irqthreads will run us threaded.
*/ */
...@@ -4826,11 +4872,11 @@ static void check_flags(unsigned long flags) ...@@ -4826,11 +4872,11 @@ static void check_flags(unsigned long flags)
return; return;
if (irqs_disabled_flags(flags)) { if (irqs_disabled_flags(flags)) {
if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) { if (DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())) {
printk("possible reason: unannotated irqs-off.\n"); printk("possible reason: unannotated irqs-off.\n");
} }
} else { } else {
if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) { if (DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())) {
printk("possible reason: unannotated irqs-on.\n"); printk("possible reason: unannotated irqs-on.\n");
} }
} }
......
...@@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock) ...@@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
*/ */
for (;;) { for (;;) {
if (prev->next == node && /*
* cpu_relax() below implies a compiler barrier which would
* prevent this comparison being optimized away.
*/
if (data_race(prev->next) == node &&
cmpxchg(&prev->next, node, NULL) == node) cmpxchg(&prev->next, node, NULL) == node)
break; break;
......
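The data_race() annotation used above tells KCSAN that a racy access is intentional while adding no ordering and no atomicity. A minimal sketch of the same idiom outside osq_lock follows; the struct and function names here are purely illustrative, not from the patch:

#include <linux/compiler.h>	/* data_race() */

struct work_stats {
	int nr_pending;		/* updated concurrently by workers */
};

/*
 * Best-effort diagnostic read: the result may be stale, and that is fine.
 * data_race() documents the intent and suppresses the KCSAN report; it
 * provides no ordering and no atomicity guarantees.
 */
static inline int snapshot_pending(const struct work_stats *stats)
{
	return data_race(stats->nr_pending);
}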
...@@ -107,6 +107,12 @@ static bool ksoftirqd_running(unsigned long pending) ...@@ -107,6 +107,12 @@ static bool ksoftirqd_running(unsigned long pending)
* where hardirqs are disabled legitimately: * where hardirqs are disabled legitimately:
*/ */
#ifdef CONFIG_TRACE_IRQFLAGS #ifdef CONFIG_TRACE_IRQFLAGS
DEFINE_PER_CPU(int, hardirqs_enabled);
DEFINE_PER_CPU(int, hardirq_context);
EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
{ {
unsigned long flags; unsigned long flags;
...@@ -224,7 +230,7 @@ static inline bool lockdep_softirq_start(void) ...@@ -224,7 +230,7 @@ static inline bool lockdep_softirq_start(void)
{ {
bool in_hardirq = false; bool in_hardirq = false;
if (lockdep_hardirq_context(current)) { if (lockdep_hardirq_context()) {
in_hardirq = true; in_hardirq = true;
lockdep_hardirq_exit(); lockdep_hardirq_exit();
} }
......
...@@ -1117,6 +1117,7 @@ config PROVE_LOCKING ...@@ -1117,6 +1117,7 @@ config PROVE_LOCKING
select DEBUG_RWSEMS select DEBUG_RWSEMS
select DEBUG_WW_MUTEX_SLOWPATH select DEBUG_WW_MUTEX_SLOWPATH
select DEBUG_LOCK_ALLOC select DEBUG_LOCK_ALLOC
select PREEMPT_COUNT if !ARCH_NO_PREEMPT
select TRACE_IRQFLAGS select TRACE_IRQFLAGS
default n default n
help help
...@@ -1325,11 +1326,17 @@ config WW_MUTEX_SELFTEST ...@@ -1325,11 +1326,17 @@ config WW_MUTEX_SELFTEST
endmenu # lock debugging endmenu # lock debugging
config TRACE_IRQFLAGS config TRACE_IRQFLAGS
depends on TRACE_IRQFLAGS_SUPPORT
bool bool
help help
Enables hooks to interrupt enabling and disabling for Enables hooks to interrupt enabling and disabling for
either tracing or lock debugging. either tracing or lock debugging.
config TRACE_IRQFLAGS_NMI
def_bool y
depends on TRACE_IRQFLAGS
depends on TRACE_IRQFLAGS_NMI_SUPPORT
config STACKTRACE config STACKTRACE
bool "Stack backtrace support" bool "Stack backtrace support"
depends on STACKTRACE_SUPPORT depends on STACKTRACE_SUPPORT
......
...@@ -4,7 +4,8 @@ config HAVE_ARCH_KCSAN ...@@ -4,7 +4,8 @@ config HAVE_ARCH_KCSAN
bool bool
config HAVE_KCSAN_COMPILER config HAVE_KCSAN_COMPILER
def_bool CC_IS_CLANG && $(cc-option,-fsanitize=thread -mllvm -tsan-distinguish-volatile=1) def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=thread -mllvm -tsan-distinguish-volatile=1)) || \
(CC_IS_GCC && $(cc-option,-fsanitize=thread --param tsan-distinguish-volatile=1))
help help
For the list of compilers that support KCSAN, please see For the list of compilers that support KCSAN, please see
<file:Documentation/dev-tools/kcsan.rst>. <file:Documentation/dev-tools/kcsan.rst>.
...@@ -59,7 +60,28 @@ config KCSAN_SELFTEST ...@@ -59,7 +60,28 @@ config KCSAN_SELFTEST
bool "Perform short selftests on boot" bool "Perform short selftests on boot"
default y default y
help help
Run KCSAN selftests on boot. On test failure, causes the kernel to panic. Run KCSAN selftests on boot. On test failure, causes the kernel to
panic. Recommended to be enabled, ensuring critical functionality
works as intended.
config KCSAN_TEST
tristate "KCSAN test for integrated runtime behaviour"
depends on TRACEPOINTS && KUNIT
select TORTURE_TEST
help
KCSAN test focusing on behaviour of the integrated runtime. Tests
various race scenarios, and verifies the reports generated to
console. Makes use of KUnit for test organization, and the Torture
framework for test thread control.
Each test case may run for at least KCSAN_REPORT_ONCE_IN_MS
milliseconds. Test run duration may be reduced by building the
kernel and KCSAN test with KCSAN_REPORT_ONCE_IN_MS set to a
lower-than-default value.
Say Y here if you want the test to be built into the kernel and run
during boot; say M if you want the test to build as a module; say N
if you are unsure.
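As a rough sketch (not part of this patch), a .config fragment along these lines would build the new test as a module, assuming a KCSAN-capable compiler and that tracepoints are already available in the configuration; lowering KCSAN_REPORT_ONCE_IN_MS is optional and only shortens each test case:

CONFIG_KUNIT=y
CONFIG_KCSAN=y
CONFIG_KCSAN_TEST=m
# Optional: lower the report rate limit so each test case finishes sooner.
CONFIG_KCSAN_REPORT_ONCE_IN_MS=500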
config KCSAN_EARLY_ENABLE config KCSAN_EARLY_ENABLE
bool "Early enable during boot" bool "Early enable during boot"
......
...@@ -6,7 +6,7 @@ ifdef CONFIG_KCSAN ...@@ -6,7 +6,7 @@ ifdef CONFIG_KCSAN
ifdef CONFIG_CC_IS_CLANG ifdef CONFIG_CC_IS_CLANG
cc-param = -mllvm -$(1) cc-param = -mllvm -$(1)
else else
cc-param = --param -$(1) cc-param = --param $(1)
endif endif
# Keep most options here optional, to allow enabling more compilers if absence # Keep most options here optional, to allow enabling more compilers if absence
......
...@@ -2,9 +2,9 @@ ...@@ -2,9 +2,9 @@
#ifndef _LIBLOCKDEP_LINUX_TRACE_IRQFLAGS_H_ #ifndef _LIBLOCKDEP_LINUX_TRACE_IRQFLAGS_H_
#define _LIBLOCKDEP_LINUX_TRACE_IRQFLAGS_H_ #define _LIBLOCKDEP_LINUX_TRACE_IRQFLAGS_H_
# define lockdep_hardirq_context(p) 0 # define lockdep_hardirq_context() 0
# define lockdep_softirq_context(p) 0 # define lockdep_softirq_context(p) 0
# define lockdep_hardirqs_enabled(p) 0 # define lockdep_hardirqs_enabled() 0
# define lockdep_softirqs_enabled(p) 0 # define lockdep_softirqs_enabled(p) 0
# define lockdep_hardirq_enter() do { } while (0) # define lockdep_hardirq_enter() do { } while (0)
# define lockdep_hardirq_exit() do { } while (0) # define lockdep_hardirq_exit() do { } while (0)
......
...@@ -1985,28 +1985,36 @@ outcome undefined. ...@@ -1985,28 +1985,36 @@ outcome undefined.
In technical terms, the compiler is allowed to assume that when the In technical terms, the compiler is allowed to assume that when the
program executes, there will not be any data races. A "data race" program executes, there will not be any data races. A "data race"
occurs when two conflicting memory accesses execute concurrently; occurs when there are two memory accesses such that:
two memory accesses "conflict" if:
they access the same location, 1. they access the same location,
they occur on different CPUs (or in different threads on the 2. at least one of them is a store,
same CPU),
at least one of them is a plain access, 3. at least one of them is plain,
and at least one of them is a store. 4. they occur on different CPUs (or in different threads on the
same CPU), and
The LKMM tries to determine whether a program contains two conflicting 5. they execute concurrently.
accesses which may execute concurrently; if it does then the LKMM says
there is a potential data race and makes no predictions about the
program's outcome.
Determining whether two accesses conflict is easy; you can see that In the literature, two accesses are said to "conflict" if they satisfy
all the concepts involved in the definition above are already part of 1 and 2 above. We'll go a little farther and say that two accesses
the memory model. The hard part is telling whether they may execute are "race candidates" if they satisfy 1 - 4. Thus, whether or not two
concurrently. The LKMM takes a conservative attitude, assuming that race candidates actually do race in a given execution depends on
accesses may be concurrent unless it can prove they cannot. whether they are concurrent.
The LKMM tries to determine whether a program contains race candidates
which may execute concurrently; if it does then the LKMM says there is
a potential data race and makes no predictions about the program's
outcome.
Determining whether two accesses are race candidates is easy; you can
see that all the concepts involved in the definition above are already
part of the memory model. The hard part is telling whether they may
execute concurrently. The LKMM takes a conservative attitude,
assuming that accesses may be concurrent unless it can prove they
are not.
If two memory accesses aren't concurrent then one must execute before If two memory accesses aren't concurrent then one must execute before
the other. Therefore the LKMM decides two accesses aren't concurrent the other. Therefore the LKMM decides two accesses aren't concurrent
...@@ -2169,8 +2177,8 @@ again, now using plain accesses for buf: ...@@ -2169,8 +2177,8 @@ again, now using plain accesses for buf:
} }
This program does not contain a data race. Although the U and V This program does not contain a data race. Although the U and V
accesses conflict, the LKMM can prove they are not concurrent as accesses are race candidates, the LKMM can prove they are not
follows: concurrent as follows:
The smp_wmb() fence in P0 is both a compiler barrier and a The smp_wmb() fence in P0 is both a compiler barrier and a
cumul-fence. It guarantees that no matter what hash of cumul-fence. It guarantees that no matter what hash of
...@@ -2324,12 +2332,11 @@ could now perform the load of x before the load of ptr (there might be ...@@ -2324,12 +2332,11 @@ could now perform the load of x before the load of ptr (there might be
a control dependency but no address dependency at the machine level). a control dependency but no address dependency at the machine level).
Finally, it turns out there is a situation in which a plain write does Finally, it turns out there is a situation in which a plain write does
not need to be w-post-bounded: when it is separated from the not need to be w-post-bounded: when it is separated from the other
conflicting access by a fence. At first glance this may seem race-candidate access by a fence. At first glance this may seem
impossible. After all, to be conflicting the second access has to be impossible. After all, to be race candidates the two accesses must
on a different CPU from the first, and fences don't link events on be on different CPUs, and fences don't link events on different CPUs.
different CPUs. Well, normal fences don't -- but rcu-fence can! Well, normal fences don't -- but rcu-fence can! Here's an example:
Here's an example:
int x, y; int x, y;
...@@ -2365,7 +2372,7 @@ concurrent and there is no race, even though P1's plain store to y ...@@ -2365,7 +2372,7 @@ concurrent and there is no race, even though P1's plain store to y
isn't w-post-bounded by any marked accesses. isn't w-post-bounded by any marked accesses.
Putting all this material together yields the following picture. For Putting all this material together yields the following picture. For
two conflicting stores W and W', where W ->co W', the LKMM says the race-candidate stores W and W', where W ->co W', the LKMM says the
stores don't race if W can be linked to W' by a stores don't race if W can be linked to W' by a
w-post-bounded ; vis ; w-pre-bounded w-post-bounded ; vis ; w-pre-bounded
...@@ -2378,8 +2385,8 @@ sequence, and if W' is plain then they also have to be linked by a ...@@ -2378,8 +2385,8 @@ sequence, and if W' is plain then they also have to be linked by a
w-post-bounded ; vis ; r-pre-bounded w-post-bounded ; vis ; r-pre-bounded
sequence. For a conflicting load R and store W, the LKMM says the two sequence. For race-candidate load R and store W, the LKMM says the
accesses don't race if R can be linked to W by an two accesses don't race if R can be linked to W by an
r-post-bounded ; xb* ; w-pre-bounded r-post-bounded ; xb* ; w-pre-bounded
...@@ -2411,20 +2418,20 @@ is, the rules governing the memory subsystem's choice of a store to ...@@ -2411,20 +2418,20 @@ is, the rules governing the memory subsystem's choice of a store to
satisfy a load request and its determination of where a store will satisfy a load request and its determination of where a store will
fall in the coherence order): fall in the coherence order):
If R and W conflict and it is possible to link R to W by one If R and W are race candidates and it is possible to link R to
of the xb* sequences listed above, then W ->rfe R is not W by one of the xb* sequences listed above, then W ->rfe R is
allowed (i.e., a load cannot read from a store that it not allowed (i.e., a load cannot read from a store that it
executes before, even if one or both is plain). executes before, even if one or both is plain).
If W and R conflict and it is possible to link W to R by one If W and R are race candidates and it is possible to link W to
of the vis sequences listed above, then R ->fre W is not R by one of the vis sequences listed above, then R ->fre W is
allowed (i.e., if a store is visible to a load then the load not allowed (i.e., if a store is visible to a load then the
must read from that store or one coherence-after it). load must read from that store or one coherence-after it).
If W and W' conflict and it is possible to link W to W' by one If W and W' are race candidates and it is possible to link W
of the vis sequences listed above, then W' ->co W is not to W' by one of the vis sequences listed above, then W' ->co W
allowed (i.e., if one store is visible to a second then the is not allowed (i.e., if one store is visible to a second then
second must come after the first in the coherence order). the second must come after the first in the coherence order).
This is the extent to which the LKMM deals with plain accesses. This is the extent to which the LKMM deals with plain accesses.
Perhaps it could say more (for example, plain accesses might Perhaps it could say more (for example, plain accesses might
......
...@@ -126,7 +126,7 @@ However, it is not necessarily the case that accesses ordered by ...@@ -126,7 +126,7 @@ However, it is not necessarily the case that accesses ordered by
locking will be seen as ordered by CPUs not holding that lock. locking will be seen as ordered by CPUs not holding that lock.
Consider this example: Consider this example:
/* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */ /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
void CPU0(void) void CPU0(void)
{ {
spin_lock(&mylock); spin_lock(&mylock);
......
...@@ -73,6 +73,18 @@ o Christopher Pulte, Shaked Flur, Will Deacon, Jon French, ...@@ -73,6 +73,18 @@ o Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
Linux-kernel memory model Linux-kernel memory model
========================= =========================
o Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra.
2019. "Calibrating your fear of big bad optimizing compilers"
Linux Weekly News. https://lwn.net/Articles/799218/
o Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra.
2019. "Who's afraid of a big bad optimizing compiler?"
Linux Weekly News. https://lwn.net/Articles/793253/
o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
Alan Stern. 2018. "Frightening small children and disconcerting Alan Stern. 2018. "Frightening small children and disconcerting
grown-ups: Concurrency in the Linux kernel". In Proceedings of grown-ups: Concurrency in the Linux kernel". In Proceedings of
...@@ -88,6 +100,11 @@ o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and ...@@ -88,6 +100,11 @@ o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
Alan Stern. 2017. "A formal kernel memory-ordering model (part 2)" Alan Stern. 2017. "A formal kernel memory-ordering model (part 2)"
Linux Weekly News. https://lwn.net/Articles/720550/ Linux Weekly News. https://lwn.net/Articles/720550/
o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
Alan Stern. 2017-2019. "A Formal Model of Linux-Kernel Memory
Ordering" (backup material for the LWN articles)
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/
Memory-model tooling Memory-model tooling
==================== ====================
...@@ -110,5 +127,5 @@ Memory-model comparisons ...@@ -110,5 +127,5 @@ Memory-model comparisons
======================== ========================
o Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun o Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
Feng. 2016. "Linux-Kernel Memory Model". (6 June 2016). Feng. 2018. "Linux-Kernel Memory Model". (27 September 2018).
http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0124r2.html. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0124r6.html.
...@@ -28,8 +28,34 @@ downloaded separately: ...@@ -28,8 +28,34 @@ downloaded separately:
See "herdtools7/INSTALL.md" for installation instructions. See "herdtools7/INSTALL.md" for installation instructions.
Note that although these tools usually provide backwards compatibility, Note that although these tools usually provide backwards compatibility,
this is not absolutely guaranteed. Therefore, if a later version does this is not absolutely guaranteed.
not work, please try using the exact version called out above.
For example, a future version of herd7 might not work with the model
in this release. A compatible model will likely be made available in
a later release of the Linux kernel.
If you absolutely need to run the model in this particular release,
please try using the exact version called out above.
klitmus7 is independent of the model provided here. It has its own
dependency on a target kernel release where converted code is built
and executed. Any change in kernel APIs essential to klitmus7 will
necessitate an upgrade of klitmus7.
If you find any compatibility issues in klitmus7, please inform the
memory model maintainers.
klitmus7 Compatibility Table
----------------------------
============ ==========
target Linux herdtools7
------------ ----------
     -- 4.18     7.48 --
4.15 -- 4.19     7.49 --
4.20 -- 5.5      7.54 --
5.6  --          7.56 --
============ ==========
================== ==================
...@@ -207,11 +233,15 @@ The Linux-kernel memory model (LKMM) has the following limitations: ...@@ -207,11 +233,15 @@ The Linux-kernel memory model (LKMM) has the following limitations:
case as a store release. case as a store release.
b. The "unless" RMW operations are not currently modeled: b. The "unless" RMW operations are not currently modeled:
atomic_long_add_unless(), atomic_add_unless(), atomic_long_add_unless(), atomic_inc_unless_negative(),
atomic_inc_unless_negative(), and and atomic_dec_unless_positive(). These can be emulated
atomic_dec_unless_positive(). These can be emulated
in litmus tests, for example, by using atomic_cmpxchg(). in litmus tests, for example, by using atomic_cmpxchg().
One exception to this limitation is atomic_add_unless(),
which is provided directly by herd7 (so there is no corresponding
definition in linux-kernel.def). Because atomic_add_unless() is
modeled by herd7, it can be used in litmus tests.
c. The call_rcu() function is not modeled. It can be c. The call_rcu() function is not modeled. It can be
emulated in litmus tests by adding another process that emulated in litmus tests by adding another process that
invokes synchronize_rcu() and the body of the callback invokes synchronize_rcu() and the body of the callback
......