Commit d089f48f authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "These are the latest RCU updates for v5.12:

   - Documentation updates.

   - Miscellaneous fixes.

   - kfree_rcu() updates: Addition of mem_dump_obj() to provide
     allocator return addresses to more easily locate bugs. This has a
     couple of RCU-related commits, but is mostly MM. Was pulled in with
     akpm's agreement.

   - Per-callback-batch tracking of numbers of callbacks, which enables
     better debugging information and smarter reactions to large numbers
     of callbacks.

   - The first round of changes to allow CPUs to be runtime switched
     from and to callback-offloaded state.

   - CONFIG_PREEMPT_RT-related changes.

   - RCU CPU stall warning updates.

   - Addition of polling grace-period APIs for SRCU.

   - Torture-test and torture-test scripting updates, including a
     "torture everything" script that runs rcutorture, locktorture,
     scftorture, rcuscale, and refscale. Plus does an allmodconfig
     build.

   - nolibc fixes for the torture tests"

* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
  percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
  rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
  mm: Make mem_obj_dump() vmalloc() dumps include start and length
  mm: Make mem_dump_obj() handle vmalloc() memory
  mm: Make mem_dump_obj() handle NULL and zero-sized pointers
  mm: Add mem_dump_obj() to print source of memory block
  tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
  tools/nolibc: Fix position of -lgcc in the documented example
  tools/nolibc: Emit detailed error for missing alternate syscall number definitions
  tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
  tools/nolibc: Get timeval, timespec and timezone from linux/time.h
  tools/nolibc: Implement poll() based on ppoll()
  tools/nolibc: Implement fork() based on clone()
  tools/nolibc: Make getpgrp() fall back to getpgid(0)
  tools/nolibc: Make dup2() rely on dup3() when available
  tools/nolibc: Add the definition for dup()
  rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
  torture: Maintain torture-specific set of CPUs-online books
  torture: Clean up after torture-test CPU hotplugging
  rcutorture: Make object_debug also double call_rcu() heap object
  ...
parents 3f6ec19f 2b392cb1
......@@ -38,7 +38,7 @@ sections.
RCU-preempt Expedited Grace Periods
===================================
``CONFIG_PREEMPT=y`` kernels implement RCU-preempt.
``CONFIG_PREEMPTION=y`` kernels implement RCU-preempt.
The overall flow of the handling of a given CPU by an RCU-preempt
expedited grace period is shown in the following diagram:
......@@ -112,7 +112,7 @@ things.
RCU-sched Expedited Grace Periods
---------------------------------
``CONFIG_PREEMPT=n`` kernels implement RCU-sched. The overall flow of
``CONFIG_PREEMPTION=n`` kernels implement RCU-sched. The overall flow of
the handling of a given CPU by an RCU-sched expedited grace period is
shown in the following diagram:
......
......@@ -70,7 +70,7 @@ over a rather long period of time, but improvements are always welcome!
is less readable and prevents lockdep from detecting locking issues.
Letting RCU-protected pointers "leak" out of an RCU read-side
critical section is every bid as bad as letting them leak out
critical section is every bit as bad as letting them leak out
from under a lock. Unless, of course, you have arranged some
other means of protection, such as a lock or a reference count
-before- letting them out of the RCU read-side critical section.
......@@ -129,9 +129,7 @@ over a rather long period of time, but improvements are always welcome!
accesses. The rcu_dereference() primitive ensures that
the CPU picks up the pointer before it picks up the data
that the pointer points to. This really is necessary
on Alpha CPUs. If you don't believe me, see:
http://www.openvms.compaq.com/wizard/wiz_2637.html
on Alpha CPUs.
The rcu_dereference() primitive is also an excellent
documentation aid, letting the person reading the
......@@ -214,9 +212,9 @@ over a rather long period of time, but improvements are always welcome!
the rest of the system.
7. As of v4.20, a given kernel implements only one RCU flavor,
which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y.
which is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
If the updater uses call_rcu() or synchronize_rcu(),
then the corresponding readers my use rcu_read_lock() and
then the corresponding readers may use rcu_read_lock() and
rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
or any pair of primitives that disables and re-enables preemption,
for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
......
......@@ -9,7 +9,7 @@ RCU (read-copy update) is a synchronization mechanism that can be thought
of as a replacement for read-writer locking (among other things), but with
very low-overhead readers that are immune to deadlock, priority inversion,
and unbounded latency. RCU read-side critical sections are delimited
by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT
by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPTION
kernels, generate no code whatsoever.
This means that RCU writers are unaware of the presence of concurrent
......@@ -329,10 +329,10 @@ Answer: This cannot happen. The reason is that on_each_cpu() has its last
to smp_call_function() and further to smp_call_function_on_cpu(),
causing this latter to spin until the cross-CPU invocation of
rcu_barrier_func() has completed. This by itself would prevent
a grace period from completing on non-CONFIG_PREEMPT kernels,
a grace period from completing on non-CONFIG_PREEMPTION kernels,
since each CPU must undergo a context switch (or other quiescent
state) before the grace period can complete. However, this is
of no use in CONFIG_PREEMPT kernels.
of no use in CONFIG_PREEMPTION kernels.
Therefore, on_each_cpu() disables preemption across its call
to smp_call_function() and also across the local call to
......
......@@ -25,7 +25,7 @@ warnings:
- A CPU looping with bottom halves disabled.
- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
- For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is
really expected and desirable behavior, you might need to add
some calls to cond_resched().
......@@ -44,7 +44,7 @@ warnings:
result in the ``rcu_.*kthread starved for`` console-log message,
which will include additional debugging information.
- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
- A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
......@@ -92,7 +92,9 @@ warnings:
buggy timer hardware through bugs in the interrupt or exception
path (whether hardware, firmware, or software) through bugs
in Linux's timer subsystem through bugs in the scheduler, and,
yes, even including bugs in RCU itself.
yes, even including bugs in RCU itself. It can also result in
the ``rcu_.*timer wakeup didn't happen for`` console-log message,
which will include additional debugging information.
- A bug in the RCU implementation.
......@@ -292,6 +294,25 @@ kthread is waiting for a short timeout, the "state" precedes value of the
task_struct ->state field, and the "cpu" indicates that the grace-period
kthread last ran on CPU 5.
If the relevant grace-period kthread does not wake from FQS wait in a
reasonable time, then the following additional line is printed::
kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
The "23804" indicates that kthread's timer expired more than 23 thousand
jiffies ago. The rest of the line has meaning similar to the kthread
starvation case.
Additionally, the following line is printed::
Possible timer handling issue on cpu=4 timer-softirq=11142
Here "cpu" indicates that the grace-period kthread last ran on CPU 4,
where it queued the fqs timer. The number following the "timer-softirq"
is the current ``TIMER_SOFTIRQ`` count on cpu 4. If this value does not
change on successive RCU CPU stall warnings, there is further reason to
suspect a timer problem.
Multiple Warnings From One Stall
================================
......
......@@ -683,7 +683,7 @@ Quick Quiz #1:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
"classic RCU". It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted.
::
......@@ -739,7 +739,7 @@ Quick Quiz #2:
Quick Quiz #3:
If it is illegal to block in an RCU read-side
critical section, what the heck do you do in
PREEMPT_RT, where normal spinlocks can block???
CONFIG_PREEMPT_RT, where normal spinlocks can block???
:ref:`Answers to Quick Quiz <8_whatisRCU>`
......@@ -1093,7 +1093,7 @@ Quick Quiz #2:
overhead is **negative**.
Answer:
Imagine a single-CPU system with a non-CONFIG_PREEMPT
Imagine a single-CPU system with a non-CONFIG_PREEMPTION
kernel where a routing table is used by process-context
code, but can be updated by irq-context code (for example,
by an "ICMP REDIRECT" packet). The usual way of handling
......@@ -1120,10 +1120,10 @@ Answer:
Quick Quiz #3:
If it is illegal to block in an RCU read-side
critical section, what the heck do you do in
PREEMPT_RT, where normal spinlocks can block???
CONFIG_PREEMPT_RT, where normal spinlocks can block???
Answer:
Just as PREEMPT_RT permits preemption of spinlock
Just as CONFIG_PREEMPT_RT permits preemption of spinlock
critical sections, it permits preemption of RCU
read-side critical sections. It also permits
spinlocks blocking while in RCU read-side critical
......
......@@ -4078,6 +4078,10 @@
value, meaning that RCU_SOFTIRQ is used by default.
Specify rcutree.use_softirq=0 to use rcuc kthreads.
But note that CONFIG_PREEMPT_RT=y kernels disable
this kernel boot parameter, forcibly setting it
to zero.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
tree. This is used by rcutorture, and might
......@@ -4165,12 +4169,6 @@
Set wakeup interval for idle CPUs that have
RCU callbacks (RCU_FAST_NO_HZ=y).
rcutree.rcu_idle_lazy_gp_delay= [KNL]
Set wakeup interval for idle CPUs that have
only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y).
Lazy RCU callbacks are those which RCU can
prove do nothing more than free memory.
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
wake_up() if it sleeps three times longer than
......@@ -4324,6 +4322,14 @@
stress RCU, they don't participate in the actual
test, hence the "fake".
rcutorture.nocbs_nthreads= [KNL]
Set number of RCU callback-offload togglers.
Zero (the default) disables toggling.
rcutorture.nocbs_toggle= [KNL]
Set the delay in milliseconds between successive
callback-offload toggling attempts.
rcutorture.nreaders= [KNL]
Set number of RCU readers. The value -1 selects
N-1, where N is the number of CPUs. A value
......@@ -4456,6 +4462,13 @@
only normal grace-period primitives. No effect
on CONFIG_TINY_RCU kernels.
But note that CONFIG_PREEMPT_RT=y kernels enables
this kernel boot parameter, forcibly setting
it to the value one, that is, converting any
post-boot attempt at an expedited RCU grace
period to instead use normal non-expedited
grace-period processing.
rcupdate.rcu_task_ipi_delay= [KNL]
Set time in jiffies during which RCU tasks will
avoid sending IPIs, starting with the beginning
......@@ -4543,6 +4556,12 @@
refscale.verbose= [KNL]
Enable additional printk() statements.
refscale.verbose_batched= [KNL]
Batch the additional printk() statements. If zero
(the default) or negative, print everything. Otherwise,
print every Nth verbose statement, where N is the value
specified.
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
......@@ -5317,6 +5336,14 @@
are running concurrently, especially on systems
with rotating-rust storage.
torture.verbose_sleep_frequency= [KNL]
Specifies how many verbose printk()s should be
emitted between each sleep. The default of zero
disables verbose-printk() sleeping.
torture.verbose_sleep_duration= [KNL]
Duration of each verbose-printk() sleep in jiffies.
tp720= [HW,PS2]
tpm_suspend_pcr=[HW,TPM]
......
......@@ -111,6 +111,8 @@ static inline void cpu_maps_update_done(void)
#endif /* CONFIG_SMP */
extern struct bus_type cpu_subsys;
extern int lockdep_is_cpus_held(void);
#ifdef CONFIG_HOTPLUG_CPU
extern void cpus_write_lock(void);
extern void cpus_write_unlock(void);
......
......@@ -901,7 +901,7 @@ static inline void hlist_add_before(struct hlist_node *n,
}
/**
* hlist_add_behing - add a new entry after the one specified
* hlist_add_behind - add a new entry after the one specified
* @n: new entry to be added
* @prev: hlist node to add it after, which must be non-NULL
*/
......
......@@ -3177,5 +3177,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
extern int sysctl_nr_trim_pages;
void mem_dump_obj(void *object);
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
......@@ -63,6 +63,122 @@ struct rcu_cblist {
#define RCU_NEXT_TAIL 3
#define RCU_CBLIST_NSEGS 4
/*
* ==NOCB Offloading state machine==
*
*
* ----------------------------------------------------------------------------
* | SEGCBLIST_SOFTIRQ_ONLY |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, without holding nocb_lock. |
* ----------------------------------------------------------------------------
* |
* v
* ----------------------------------------------------------------------------
* | SEGCBLIST_OFFLOADED |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, while holding nocb_lock. Waking up CB and GP kthreads, |
* | allowing nocb_timer to be armed. |
* ----------------------------------------------------------------------------
* |
* v
* -----------------------------------
* | |
* v v
* --------------------------------------- ----------------------------------|
* | SEGCBLIST_OFFLOADED | | | SEGCBLIST_OFFLOADED | |
* | SEGCBLIST_KTHREAD_CB | | SEGCBLIST_KTHREAD_GP |
* | | | |
* | | | |
* | CB kthread woke up and | | GP kthread woke up and |
* | acknowledged SEGCBLIST_OFFLOADED. | | acknowledged SEGCBLIST_OFFLOADED|
* | Processes callbacks concurrently | | |
* | with rcu_core(), holding | | |
* | nocb_lock. | | |
* --------------------------------------- -----------------------------------
* | |
* -----------------------------------
* |
* v
* |--------------------------------------------------------------------------|
* | SEGCBLIST_OFFLOADED | |
* | SEGCBLIST_KTHREAD_CB | |
* | SEGCBLIST_KTHREAD_GP |
* | |
* | Kthreads handle callbacks holding nocb_lock, local rcu_core() stops |
* | handling callbacks. |
* ----------------------------------------------------------------------------
*/
/*
* ==NOCB De-Offloading state machine==
*
*
* |--------------------------------------------------------------------------|
* | SEGCBLIST_OFFLOADED | |
* | SEGCBLIST_KTHREAD_CB | |
* | SEGCBLIST_KTHREAD_GP |
* | |
* | CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core() |
* | ignores callbacks. |
* ----------------------------------------------------------------------------
* |
* v
* |--------------------------------------------------------------------------|
* | SEGCBLIST_KTHREAD_CB | |
* | SEGCBLIST_KTHREAD_GP |
* | |
* | CB/GP kthreads and local rcu_core() handle callbacks concurrently |
* | holding nocb_lock. Wake up CB and GP kthreads if necessary. |
* ----------------------------------------------------------------------------
* |
* v
* -----------------------------------
* | |
* v v
* ---------------------------------------------------------------------------|
* | |
* | SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP |
* | | |
* | GP kthread woke up and | CB kthread woke up and |
* | acknowledged the fact that | acknowledged the fact that |
* | SEGCBLIST_OFFLOADED got cleared. | SEGCBLIST_OFFLOADED got cleared. |
* | | The CB kthread goes to sleep |
* | The callbacks from the target CPU | until it ever gets re-offloaded. |
* | will be ignored from the GP kthread | |
* | loop. | |
* ----------------------------------------------------------------------------
* | |
* -----------------------------------
* |
* v
* ----------------------------------------------------------------------------
* | 0 |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, while holding nocb_lock. Forbid nocb_timer to be armed. |
* | Flush pending nocb_timer. Flush nocb bypass callbacks. |
* ----------------------------------------------------------------------------
* |
* v
* ----------------------------------------------------------------------------
* | SEGCBLIST_SOFTIRQ_ONLY |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, without holding nocb_lock. |
* ----------------------------------------------------------------------------
*/
#define SEGCBLIST_ENABLED BIT(0)
#define SEGCBLIST_SOFTIRQ_ONLY BIT(1)
#define SEGCBLIST_KTHREAD_CB BIT(2)
#define SEGCBLIST_KTHREAD_GP BIT(3)
#define SEGCBLIST_OFFLOADED BIT(4)
struct rcu_segcblist {
struct rcu_head *head;
struct rcu_head **tails[RCU_CBLIST_NSEGS];
......@@ -72,8 +188,8 @@ struct rcu_segcblist {
#else
long len;
#endif
u8 enabled;
u8 offloaded;
long seglen[RCU_CBLIST_NSEGS];
u8 flags;
};
#define RCU_SEGCBLIST_INITIALIZER(n) \
......
......@@ -33,6 +33,8 @@
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
#define ulong2long(a) (*(long *)(&(a)))
#define USHORT_CMP_GE(a, b) (USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
#define USHORT_CMP_LT(a, b) (USHRT_MAX / 2 < (unsigned short)((a) - (b)))
/* Exported common interfaces */
void call_rcu(struct rcu_head *head, rcu_callback_t func);
......@@ -110,8 +112,12 @@ static inline void rcu_user_exit(void) { }
#ifdef CONFIG_RCU_NOCB_CPU
void rcu_init_nohz(void);
int rcu_nocb_cpu_offload(int cpu);
int rcu_nocb_cpu_deoffload(int cpu);
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
static inline void rcu_init_nohz(void) { }
static inline int rcu_nocb_cpu_offload(int cpu) { return -EINVAL; }
static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
/**
......@@ -846,19 +852,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
*/
#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
/*
* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
*/
#define __kvfree_rcu(head, offset) \
do { \
BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \
kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
} while (0)
/**
* kfree_rcu() - kfree an object after a grace period.
* @ptr: pointer to kfree
* @rhf: the name of the struct rcu_head within the type of @ptr.
* @ptr: pointer to kfree for both single- and double-argument invocations.
* @rhf: the name of the struct rcu_head within the type of @ptr,
* but only for double-argument invocations.
*
* Many rcu callbacks functions just call kfree() on the base structure.
* These functions are trivial, but their size adds up, and furthermore
......@@ -871,7 +869,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
* Because the functions are not allowed in the low-order 4096 bytes of
* kernel virtual memory, offsets up to 4095 bytes can be accommodated.
* If the offset is larger than 4095 bytes, a compile-time error will
* be generated in __kvfree_rcu(). If this error is triggered, you can
* be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
* either fall back to use of call_rcu() or rearrange the structure to
* position the rcu_head structure into the first 4096 bytes.
*
......@@ -881,13 +879,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
* The BUILD_BUG_ON check must not involve any function calls, hence the
* checks are done in macros here.
*/
#define kfree_rcu(ptr, rhf) \
do { \
typeof (ptr) ___p = (ptr); \
\
if (___p) \
__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
} while (0)
#define kfree_rcu kvfree_rcu
/**
* kvfree_rcu() - kvfree an object after a grace period.
......@@ -919,7 +911,17 @@ do { \
kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
#define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf)
#define kvfree_rcu_arg_2(ptr, rhf) \
do { \
typeof (ptr) ___p = (ptr); \
\
if (___p) { \
BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
kvfree_call_rcu(&((___p)->rhf), (rcu_callback_t)(unsigned long) \
(offsetof(typeof(*(ptr)), rhf))); \
} \
} while (0)
#define kvfree_rcu_arg_1(ptr) \
do { \
typeof(ptr) ___p = (ptr); \
......
......@@ -186,6 +186,8 @@ void kfree(const void *);
void kfree_sensitive(const void *);
size_t __ksize(const void *);
size_t ksize(const void *);
bool kmem_valid_obj(void *object);
void kmem_dump_obj(void *object);
#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
......
......@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
void synchronize_srcu(struct srcu_struct *ssp);
unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
......
......@@ -15,7 +15,8 @@
struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
short srcu_idx; /* Current reader array element. */
unsigned short srcu_idx; /* Current reader array element in bit 0x2. */
unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
......@@ -59,7 +60,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp)
{
int idx;
idx = READ_ONCE(ssp->srcu_idx);
idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
WRITE_ONCE(ssp->srcu_lock_nesting[idx], ssp->srcu_lock_nesting[idx] + 1);
return idx;
}
......@@ -80,7 +81,7 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
{
int idx;
idx = READ_ONCE(ssp->srcu_idx) & 0x1;
idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n",
tt, tf, idx,
READ_ONCE(ssp->srcu_lock_nesting[!idx]),
......
......@@ -192,6 +192,8 @@ extern int try_to_del_timer_sync(struct timer_list *timer);
#define del_singleshot_timer_sync(t) del_timer_sync(t)
extern bool timer_curr_running(struct timer_list *timer);
extern void init_timers(void);
struct hrtimer;
extern enum hrtimer_restart it_real_fn(struct hrtimer *);
......
......@@ -32,11 +32,27 @@
#define TOROUT_STRING(s) \
pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s)
#define VERBOSE_TOROUT_STRING(s) \
do { if (verbose) pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s); } while (0)
do { \
if (verbose) { \
verbose_torout_sleep(); \
pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s); \
} \
} while (0)
#define VERBOSE_TOROUT_ERRSTRING(s) \
do { if (verbose) pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); } while (0)
do { \
if (verbose) { \
verbose_torout_sleep(); \
pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); \
} \
} while (0)
void verbose_torout_sleep(void);
/* Definitions for online/offline exerciser. */
#ifdef CONFIG_HOTPLUG_CPU
int torture_num_online_cpus(void);
#else /* #ifdef CONFIG_HOTPLUG_CPU */
static inline int torture_num_online_cpus(void) { return 1; }
#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
typedef void torture_ofl_func(void);
bool torture_offline(int cpu, long *n_onl_attempts, long *n_onl_successes,
unsigned long *sum_offl, int *min_onl, int *max_onl);
......@@ -61,6 +77,13 @@ static inline void torture_random_init(struct torture_random_state *trsp)
trsp->trs_count = 0;
}
/* Definitions for high-resolution-timer sleeps. */
int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, struct torture_random_state *trsp);
int torture_hrtimeout_us(u32 baset_us, u32 fuzzt_ns, struct torture_random_state *trsp);
int torture_hrtimeout_ms(u32 baset_ms, u32 fuzzt_us, struct torture_random_state *trsp);
int torture_hrtimeout_jiffies(u32 baset_j, struct torture_random_state *trsp);
int torture_hrtimeout_s(u32 baset_s, u32 fuzzt_ms, struct torture_random_state *trsp);
/* Task shuffler, which causes CPUs to occasionally go idle. */
void torture_shuffle_task_register(struct task_struct *tp);
int torture_shuffle_init(long shuffint);
......
......@@ -241,4 +241,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
int register_vmap_purge_notifier(struct notifier_block *nb);
int unregister_vmap_purge_notifier(struct notifier_block *nb);
#ifdef CONFIG_MMU
bool vmalloc_dump_obj(void *object);
#else
static inline bool vmalloc_dump_obj(void *object) { return false; }
#endif
#endif /* _LINUX_VMALLOC_H */
......@@ -505,6 +505,32 @@ TRACE_EVENT_RCU(rcu_callback,
__entry->qlen)
);
TRACE_EVENT_RCU(rcu_segcb_stats,
TP_PROTO(struct rcu_segcblist *rs, const char *ctx),
TP_ARGS(rs, ctx),
TP_STRUCT__entry(
__field(const char *, ctx)
__array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
__array(long, seglen, RCU_CBLIST_NSEGS)
),
TP_fast_assign(
__entry->ctx = ctx;
memcpy(__entry->seglen, rs->seglen, RCU_CBLIST_NSEGS * sizeof(long));
memcpy(__entry->gp_seq, rs->gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));
),
TP_printk("%s seglen: (DONE=%ld, WAIT=%ld, NEXT_READY=%ld, NEXT=%ld) "
"gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
__entry->seglen[0], __entry->seglen[1], __entry->seglen[2], __entry->seglen[3],
__entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
);
/*
* Tracepoint for the registration of a single RCU callback of the special
* kvfree() form. The first argument is the RCU type, the second argument
......
......@@ -330,6 +330,13 @@ void lockdep_assert_cpus_held(void)
percpu_rwsem_assert_held(&cpu_hotplug_lock);
}
#ifdef CONFIG_LOCKDEP
int lockdep_is_cpus_held(void)
{
return percpu_rwsem_is_held(&cpu_hotplug_lock);
}
#endif
static void lockdep_acquire_cpus_lock(void)
{
rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_);
......
......@@ -27,7 +27,6 @@
#include <linux/moduleparam.h>
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/percpu-rwsem.h>
#include <linux/torture.h>
#include <linux/reboot.h>
......
......@@ -95,6 +95,7 @@ config TASKS_RUDE_RCU
config TASKS_TRACE_RCU
def_bool 0
select IRQ_WORK
help
This option enables a task-based RCU implementation that uses
explicit rcu_read_lock_trace() read-side markers, and allows
......@@ -188,8 +189,8 @@ config RCU_FAST_NO_HZ
config RCU_BOOST
bool "Enable RCU priority boosting"
depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
default n
depends on (RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT) || PREEMPT_RT
default y if PREEMPT_RT
help
This option boosts the priority of preempted RCU readers that
block the current preemptible RCU grace period for too long.
......
......@@ -378,7 +378,11 @@ do { \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_rcu_node(p) raw_spin_unlock(&ACCESS_PRIVATE(p, lock))
#define raw_spin_unlock_rcu_node(p) \
do { \
lockdep_assert_irqs_disabled(); \
raw_spin_unlock(&ACCESS_PRIVATE(p, lock)); \
} while (0)
#define raw_spin_lock_irq_rcu_node(p) \
do { \
......@@ -387,7 +391,10 @@ do { \
} while (0)
#define raw_spin_unlock_irq_rcu_node(p) \
raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
do { \
lockdep_assert_irqs_disabled(); \
raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock)); \
} while (0)
#define raw_spin_lock_irqsave_rcu_node(p, flags) \
do { \
......@@ -396,7 +403,10 @@ do { \
} while (0)
#define raw_spin_unlock_irqrestore_rcu_node(p, flags) \
raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags)
do { \
lockdep_assert_irqs_disabled(); \
raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags); \
} while (0)
#define raw_spin_trylock_rcu_node(p) \
({ \
......
This diff is collapsed.
......@@ -15,6 +15,9 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
return READ_ONCE(rclp->len);
}
/* Return number of callbacks in segmented callback list by summing seglen. */
long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
void rcu_cblist_init(struct rcu_cblist *rclp);
void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
......@@ -50,19 +53,51 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
#endif
}
static inline void rcu_segcblist_set_flags(struct rcu_segcblist *rsclp,
int flags)
{
rsclp->flags |= flags;
}
static inline void rcu_segcblist_clear_flags(struct rcu_segcblist *rsclp,
int flags)
{
rsclp->flags &= ~flags;
}
static inline bool rcu_segcblist_test_flags(struct rcu_segcblist *rsclp,
int flags)
{
return READ_ONCE(rsclp->flags) & flags;
}
/*
* Is the specified rcu_segcblist enabled, for example, not corresponding
* to an offline CPU?
*/
static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
{
return rsclp->enabled;
return rcu_segcblist_test_flags(rsclp, SEGCBLIST_ENABLED);
}
/* Is the specified rcu_segcblist offloaded? */
/* Is the specified rcu_segcblist offloaded, or is SEGCBLIST_SOFTIRQ_ONLY set? */
static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
{
return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;
if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
!rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY))
return true;
return false;
}
static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp)
{
int flags = SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP | SEGCBLIST_OFFLOADED;
if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) && (rsclp->flags & flags) == flags)
return true;
return false;
}
/*
......@@ -75,10 +110,22 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
return !READ_ONCE(*READ_ONCE(rsclp->tails[seg]));
}
/*
* Is the specified segment of the specified rcu_segcblist structure
* empty of callbacks?
*/
static inline bool rcu_segcblist_segempty(struct rcu_segcblist *rsclp, int seg)
{
if (seg == RCU_DONE_TAIL)
return &rsclp->head == rsclp->tails[RCU_DONE_TAIL];
return rsclp->tails[seg - 1] == rsclp->tails[seg];
}
void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
void rcu_segcblist_init(struct rcu_segcblist *rsclp);
void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
......@@ -88,8 +135,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
struct rcu_head *rhp);
bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
struct rcu_head *rhp);
void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp);
void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp);
void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
......
This diff is collapsed.
......@@ -46,6 +46,18 @@
#define VERBOSE_SCALEOUT(s, x...) \
do { if (verbose) pr_alert("%s" SCALE_FLAG s, scale_type, ## x); } while (0)
static atomic_t verbose_batch_ctr;
#define VERBOSE_SCALEOUT_BATCH(s, x...) \
do { \
if (verbose && \
(verbose_batched <= 0 || \
!(atomic_inc_return(&verbose_batch_ctr) % verbose_batched))) { \
schedule_timeout_uninterruptible(1); \
pr_alert("%s" SCALE_FLAG s, scale_type, ## x); \
} \
} while (0)
#define VERBOSE_SCALEOUT_ERRSTRING(s, x...) \
do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! " s, scale_type, ## x); } while (0)
......@@ -57,6 +69,7 @@ module_param(scale_type, charp, 0444);
MODULE_PARM_DESC(scale_type, "Type of test (rcu, srcu, refcnt, rwsem, rwlock.");
torture_param(int, verbose, 0, "Enable verbose debugging printk()s");
torture_param(int, verbose_batched, 0, "Batch verbose debugging printk()s");
// Wait until there are multiple CPUs before starting test.
torture_param(int, holdoff, IS_BUILTIN(CONFIG_RCU_REF_SCALE_TEST) ? 10 : 0,
......@@ -368,14 +381,14 @@ ref_scale_reader(void *arg)
u64 start;
s64 duration;
VERBOSE_SCALEOUT("ref_scale_reader %ld: task started", me);
VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: task started", me);
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
set_user_nice(current, MAX_NICE);
atomic_inc(&n_init);
if (holdoff)
schedule_timeout_interruptible(holdoff * HZ);
repeat:
VERBOSE_SCALEOUT("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id());
VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id());
// Wait for signal that this reader can start.
wait_event(rt->wq, (atomic_read(&nreaders_exp) && smp_load_acquire(&rt->start_reader)) ||
......@@ -392,7 +405,7 @@ ref_scale_reader(void *arg)
while (atomic_read_acquire(&n_started))
cpu_relax();
VERBOSE_SCALEOUT("ref_scale_reader %ld: experiment %d started", me, exp_idx);
VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: experiment %d started", me, exp_idx);
// To reduce noise, do an initial cache-warming invocation, check
......@@ -421,7 +434,7 @@ ref_scale_reader(void *arg)
if (atomic_dec_and_test(&nreaders_exp))
wake_up(&main_wq);
VERBOSE_SCALEOUT("ref_scale_reader %ld: experiment %d ended, (readers remaining=%d)",
VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: experiment %d ended, (readers remaining=%d)",
me, exp_idx, atomic_read(&nreaders_exp));
if (!torture_must_stop())
......
......@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;
......@@ -84,6 +85,8 @@ void cleanup_srcu_struct(struct srcu_struct *ssp)
WARN_ON(ssp->srcu_gp_waiting);
WARN_ON(ssp->srcu_cb_head);
WARN_ON(&ssp->srcu_cb_head != ssp->srcu_cb_tail);
WARN_ON(ssp->srcu_idx != ssp->srcu_idx_max);
WARN_ON(ssp->srcu_idx & 0x1);
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
......@@ -114,7 +117,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
ssp = container_of(wp, struct srcu_struct, srcu_work);
if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
/* Remove recently arrived callbacks and wait for readers. */
......@@ -124,11 +127,12 @@ void srcu_drive_gp(struct work_struct *wp)
ssp->srcu_cb_head = NULL;
ssp->srcu_cb_tail = &ssp->srcu_cb_head;
local_irq_enable();
idx = ssp->srcu_idx;
WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
idx = (ssp->srcu_idx & 0x2) / 2;
WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
WRITE_ONCE(ssp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx]));
WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
/* Invoke the callbacks we removed above. */
while (lh) {
......@@ -146,11 +150,27 @@ void srcu_drive_gp(struct work_struct *wp)
* straighten that out.
*/
WRITE_ONCE(ssp->srcu_gp_running, false);
if (READ_ONCE(ssp->srcu_cb_head))
if (USHORT_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
schedule_work(&ssp->srcu_work);
}
EXPORT_SYMBOL_GPL(srcu_drive_gp);
static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
{
unsigned short cookie;
cookie = get_state_synchronize_srcu(ssp);
if (USHORT_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie))
return;
WRITE_ONCE(ssp->srcu_idx_max, cookie);
if (!READ_ONCE(ssp->srcu_gp_running)) {
if (likely(srcu_init_done))
schedule_work(&ssp->srcu_work);
else if (list_empty(&ssp->srcu_work.entry))
list_add(&ssp->srcu_work.entry, &srcu_boot_list);
}
}
/*
* Enqueue an SRCU callback on the specified srcu_struct structure,
* initiating grace-period processing if it is not already running.
......@@ -166,12 +186,7 @@ void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
*ssp->srcu_cb_tail = rhp;
ssp->srcu_cb_tail = &rhp->next;
local_irq_restore(flags);
if (!READ_ONCE(ssp->srcu_gp_running)) {
if (likely(srcu_init_done))
schedule_work(&ssp->srcu_work);
else if (list_empty(&ssp->srcu_work.entry))
list_add(&ssp->srcu_work.entry, &srcu_boot_list);
}
srcu_gp_start_if_needed(ssp);
}
EXPORT_SYMBOL_GPL(call_srcu);
......@@ -190,6 +205,48 @@ void synchronize_srcu(struct srcu_struct *ssp)
}
EXPORT_SYMBOL_GPL(synchronize_srcu);
/*
* get_state_synchronize_srcu - Provide an end-of-grace-period cookie
*/
unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
{
unsigned long ret;
barrier();
ret = (READ_ONCE(ssp->srcu_idx) + 3) & ~0x1;
barrier();
return ret & USHRT_MAX;
}
EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
/*
* start_poll_synchronize_srcu - Provide cookie and start grace period
*
* The difference between this and get_state_synchronize_srcu() is that
* this function ensures that the poll_state_synchronize_srcu() will
* eventually return the value true.
*/
unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
{
unsigned long ret = get_state_synchronize_srcu(ssp);
srcu_gp_start_if_needed(ssp);
return ret;
}
EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
/*
* poll_state_synchronize_srcu - Has cookie's grace period ended?
*/
bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
{
bool ret = USHORT_CMP_GE(READ_ONCE(ssp->srcu_idx), cookie);
barrier();
return ret;
}
EXPORT_SYMBOL_GPL(poll_state_synchronize_srcu);
/* Lockdep diagnostics. */
void __init rcu_scheduler_starting(void)
{
......
......@@ -807,6 +807,46 @@ static void srcu_leak_callback(struct rcu_head *rhp)
{
}
/*
* Start an SRCU grace period, and also queue the callback if non-NULL.
*/
static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
struct rcu_head *rhp, bool do_norm)
{
unsigned long flags;
int idx;
bool needexp = false;
bool needgp = false;
unsigned long s;
struct srcu_data *sdp;
check_init_srcu_struct(ssp);
idx = srcu_read_lock(ssp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
if (rhp)
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
sdp->srcu_gp_seq_needed = s;
needgp = true;
}
if (!do_norm && ULONG_CMP_LT(sdp->srcu_gp_seq_needed_exp, s)) {
sdp->srcu_gp_seq_needed_exp = s;
needexp = true;
}
spin_unlock_irqrestore_rcu_node(sdp, flags);
if (needgp)
srcu_funnel_gp_start(ssp, sdp, s, do_norm);
else if (needexp)
srcu_funnel_exp_start(ssp, sdp->mynode, s);
srcu_read_unlock(ssp, idx);
return s;
}
/*
* Enqueue an SRCU callback on the srcu_data structure associated with
* the current CPU and the specified srcu_struct structure, initiating
......@@ -838,14 +878,6 @@ static void srcu_leak_callback(struct rcu_head *rhp)
static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
rcu_callback_t func, bool do_norm)
{
unsigned long flags;
int idx;
bool needexp = false;
bool needgp = false;
unsigned long s;
struct srcu_data *sdp;
check_init_srcu_struct(ssp);
if (debug_rcu_head_queue(rhp)) {
/* Probable double call_srcu(), so leak the callback. */
WRITE_ONCE(rhp->func, srcu_leak_callback);
......@@ -853,28 +885,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
return;
}
rhp->func = func;
idx = srcu_read_lock(ssp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
sdp->srcu_gp_seq_needed = s;
needgp = true;
}
if (!do_norm && ULONG_CMP_LT(sdp->srcu_gp_seq_needed_exp, s)) {
sdp->srcu_gp_seq_needed_exp = s;
needexp = true;
}
spin_unlock_irqrestore_rcu_node(sdp, flags);
if (needgp)
srcu_funnel_gp_start(ssp, sdp, s, do_norm);
else if (needexp)
srcu_funnel_exp_start(ssp, sdp->mynode, s);
srcu_read_unlock(ssp, idx);
(void)srcu_gp_start_if_needed(ssp, rhp, do_norm);
}
/**
......@@ -1003,6 +1014,77 @@ void synchronize_srcu(struct srcu_struct *ssp)
}
EXPORT_SYMBOL_GPL(synchronize_srcu);
/**
* get_state_synchronize_srcu - Provide an end-of-grace-period cookie
* @ssp: srcu_struct to provide cookie for.
*
* This function returns a cookie that can be passed to
* poll_state_synchronize_srcu(), which will return true if a full grace
* period has elapsed in the meantime. It is the caller's responsibility
* to make sure that grace period happens, for example, by invoking
* call_srcu() after return from get_state_synchronize_srcu().
*/
unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
{
// Any prior manipulation of SRCU-protected data must happen
// before the load from ->srcu_gp_seq.
smp_mb();
return rcu_seq_snap(&ssp->srcu_gp_seq);
}
EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
/**
* start_poll_synchronize_srcu - Provide cookie and start grace period
* @ssp: srcu_struct to provide cookie for.
*
* This function returns a cookie that can be passed to
* poll_state_synchronize_srcu(), which will return true if a full grace
* period has elapsed in the meantime. Unlike get_state_synchronize_srcu(),
* this function also ensures that any needed SRCU grace period will be
* started. This convenience does come at a cost in terms of CPU overhead.
*/
unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
{
return srcu_gp_start_if_needed(ssp, NULL, true);
}
EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
/**
* poll_state_synchronize_srcu - Has cookie's grace period ended?
* @ssp: srcu_struct to provide cookie for.
* @cookie: Return value from get_state_synchronize_srcu() or start_poll_synchronize_srcu().
*
* This function takes the cookie that was returned from either
* get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and
* returns @true if an SRCU grace period elapsed since the time that the
* cookie was created.
*
* Because cookies are finite in size, wrapping/overflow is possible.
* This is more pronounced on 32-bit systems where cookies are 32 bits,
* where in theory wrapping could happen in about 14 hours assuming
* 25-microsecond expedited SRCU grace periods. However, a more likely
* overflow lower bound is on the order of 24 days in the case of
* one-millisecond SRCU grace periods. Of course, wrapping in a 64-bit
* system requires geologic timespans, as in more than seven million years
* even for expedited SRCU grace periods.
*
* Wrapping/overflow is much more of an issue for CONFIG_SMP=n systems
* that also have CONFIG_PREEMPTION=n, which selects Tiny SRCU. This uses
* a 16-bit cookie, which rcutorture routinely wraps in a matter of a
* few minutes. If this proves to be a problem, this counter will be
* expanded to the same size as for Tree SRCU.
*/
bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
{
if (!rcu_seq_done(&ssp->srcu_gp_seq, cookie))
return false;
// Ensure that the end of the SRCU grace period happens before
// any subsequent code that the caller might execute.
smp_mb(); // ^^^
return true;
}
EXPORT_SYMBOL_GPL(poll_state_synchronize_srcu);
/*
* Callback function for srcu_barrier() use.
*/
......@@ -1160,6 +1242,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
*/
static void srcu_invoke_callbacks(struct work_struct *work)
{
long len;
bool more;
struct rcu_cblist ready_cbs;
struct rcu_head *rhp;
......@@ -1182,6 +1265,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
/* We are on the job! Extract and invoke ready callbacks. */
sdp->srcu_cblist_invoking = true;
rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
len = ready_cbs.len;
spin_unlock_irq_rcu_node(sdp);
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
......@@ -1190,13 +1274,14 @@ static void srcu_invoke_callbacks(struct work_struct *work)
rhp->func(rhp);
local_bh_enable();
}
WARN_ON_ONCE(ready_cbs.len);
/*
* Update counts, accelerate new callbacks, and if needed,
* schedule another round of callback invocation.
*/
spin_lock_irq_rcu_node(sdp);
rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&ssp->srcu_gp_seq));
sdp->srcu_cblist_invoking = false;
......
......@@ -1224,6 +1224,82 @@ void show_rcu_tasks_gp_kthreads(void)
}
#endif /* #ifndef CONFIG_TINY_RCU */
#ifdef CONFIG_PROVE_RCU
struct rcu_tasks_test_desc {
struct rcu_head rh;
const char *name;
bool notrun;
};
static struct rcu_tasks_test_desc tests[] = {
{
.name = "call_rcu_tasks()",
/* If not defined, the test is skipped. */
.notrun = !IS_ENABLED(CONFIG_TASKS_RCU),
},
{
.name = "call_rcu_tasks_rude()",
/* If not defined, the test is skipped. */
.notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
},
{
.name = "call_rcu_tasks_trace()",
/* If not defined, the test is skipped. */
.notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
}
};
static void test_rcu_tasks_callback(struct rcu_head *rhp)
{
struct rcu_tasks_test_desc *rttd =
container_of(rhp, struct rcu_tasks_test_desc, rh);
pr_info("Callback from %s invoked.\n", rttd->name);
rttd->notrun = true;
}
static void rcu_tasks_initiate_self_tests(void)
{
pr_info("Running RCU-tasks wait API self tests\n");
#ifdef CONFIG_TASKS_RCU
synchronize_rcu_tasks();
call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
#endif
#ifdef CONFIG_TASKS_RUDE_RCU
synchronize_rcu_tasks_rude();
call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
#endif
#ifdef CONFIG_TASKS_TRACE_RCU
synchronize_rcu_tasks_trace();
call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
#endif
}
static int rcu_tasks_verify_self_tests(void)
{
int ret = 0;
int i;
for (i = 0; i < ARRAY_SIZE(tests); i++) {
if (!tests[i].notrun) { // still hanging.
pr_err("%s has been failed.\n", tests[i].name);
ret = -1;
}
}
if (ret)
WARN_ON(1);
return ret;
}
late_initcall(rcu_tasks_verify_self_tests);
#else /* #ifdef CONFIG_PROVE_RCU */
static void rcu_tasks_initiate_self_tests(void) { }
#endif /* #else #ifdef CONFIG_PROVE_RCU */
void __init rcu_init_tasks_generic(void)
{
#ifdef CONFIG_TASKS_RCU
......@@ -1237,6 +1313,9 @@ void __init rcu_init_tasks_generic(void)
#ifdef CONFIG_TASKS_TRACE_RCU
rcu_spawn_tasks_trace_kthread();
#endif
// Run the self-tests.
rcu_tasks_initiate_self_tests();
}
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
......
This diff is collapsed.
......@@ -201,6 +201,7 @@ struct rcu_data {
/* 5) Callback offloading. */
#ifdef CONFIG_RCU_NOCB_CPU
struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
struct swait_queue_head nocb_state_wq; /* For offloading state changes */
struct task_struct *nocb_gp_kthread;
raw_spinlock_t nocb_lock; /* Guard following pair of fields. */
atomic_t nocb_lock_contended; /* Contention experienced. */
......@@ -256,6 +257,7 @@ struct rcu_data {
};
/* Values for nocb_defer_wakeup field in struct rcu_data. */
#define RCU_NOCB_WAKE_OFF -1
#define RCU_NOCB_WAKE_NOT 0
#define RCU_NOCB_WAKE 1
#define RCU_NOCB_WAKE_FORCE 2
......
......@@ -545,7 +545,7 @@ static void synchronize_rcu_expedited_wait(void)
data_race(rnp_root->expmask),
".T"[!!data_race(rnp_root->exp_tasks)]);
if (ndetected) {
pr_err("blocking rcu_node structures:");
pr_err("blocking rcu_node structures (internal RCU debug):");
rcu_for_each_node_breadth_first(rnp) {
if (rnp == rnp_root)
continue; /* printed unconditionally */
......
This diff is collapsed.
......@@ -266,6 +266,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
struct task_struct *t;
struct task_struct *ts[8];
lockdep_assert_irqs_disabled();
if (!rcu_preempt_blocked_readers_cgp(rnp))
return 0;
pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
......@@ -290,6 +291,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
".q"[rscr.rs.b.need_qs],
".e"[rscr.rs.b.exp_hint],
".l"[rscr.on_blkd_list]);
lockdep_assert_irqs_disabled();
put_task_struct(t);
ndetected++;
}
......@@ -333,9 +335,12 @@ static void rcu_dump_cpu_stacks(void)
rcu_for_each_leaf_node(rnp) {
raw_spin_lock_irqsave_rcu_node(rnp, flags);
for_each_leaf_node_possible_cpu(rnp, cpu)
if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
if (!trigger_single_cpu_backtrace(cpu))
if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
if (cpu_is_offline(cpu))
pr_err("Offline CPU %d blocking current GP.\n", cpu);
else if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
}
......@@ -449,25 +454,66 @@ static void print_cpu_stall_info(int cpu)
/* Complain about starvation of grace-period kthread. */
static void rcu_check_gp_kthread_starvation(void)
{
int cpu;
struct task_struct *gpk = rcu_state.gp_kthread;
unsigned long j;
if (rcu_is_gp_kthread_starving(&j)) {
cpu = gpk ? task_cpu(gpk) : -1;
pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
rcu_state.name, j,
(long)rcu_seq_current(&rcu_state.gp_seq),
data_race(rcu_state.gp_flags),
gp_state_getname(rcu_state.gp_state), rcu_state.gp_state,
gpk ? gpk->state : ~0, gpk ? task_cpu(gpk) : -1);
gpk ? gpk->state : ~0, cpu);
if (gpk) {
pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
pr_err("RCU grace-period kthread stack dump:\n");
sched_show_task(gpk);
if (cpu >= 0) {
if (cpu_is_offline(cpu)) {
pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
} else {
pr_err("Stack dump where RCU GP kthread last ran:\n");
if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
}
}
wake_up_process(gpk);
}
}
}
/* Complain about missing wakeups from expired fqs wait timer */
static void rcu_check_gp_kthread_expired_fqs_timer(void)
{
struct task_struct *gpk = rcu_state.gp_kthread;
short gp_state;
unsigned long jiffies_fqs;
int cpu;
/*
* Order reads of .gp_state and .jiffies_force_qs.
* Matching smp_wmb() is present in rcu_gp_fqs_loop().
*/
gp_state = smp_load_acquire(&rcu_state.gp_state);
jiffies_fqs = READ_ONCE(rcu_state.jiffies_force_qs);
if (gp_state == RCU_GP_WAIT_FQS &&
time_after(jiffies, jiffies_fqs + RCU_STALL_MIGHT_MIN) &&
gpk && !READ_ONCE(gpk->on_rq)) {
cpu = task_cpu(gpk);
pr_err("%s kthread timer wakeup didn't happen for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx\n",
rcu_state.name, (jiffies - jiffies_fqs),
(long)rcu_seq_current(&rcu_state.gp_seq),
data_race(rcu_state.gp_flags),
gp_state_getname(RCU_GP_WAIT_FQS), RCU_GP_WAIT_FQS,
gpk->state);
pr_err("\tPossible timer handling issue on cpu=%d timer-softirq=%u\n",
cpu, kstat_softirqs_cpu(TIMER_SOFTIRQ, cpu));
}
}
static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
{
int cpu;
......@@ -478,6 +524,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
struct rcu_node *rnp;
long totqlen = 0;
lockdep_assert_irqs_disabled();
/* Kick and suppress, if so configured. */
rcu_stall_kick_kthreads();
if (rcu_stall_is_suppressed())
......@@ -499,6 +547,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
}
}
ndetected += rcu_print_task_stall(rnp, flags); // Releases rnp->lock.
lockdep_assert_irqs_disabled();
}
for_each_possible_cpu(cpu)
......@@ -529,6 +578,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
WRITE_ONCE(rcu_state.jiffies_stall,
jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
rcu_check_gp_kthread_expired_fqs_timer();
rcu_check_gp_kthread_starvation();
panic_on_rcu_stall();
......@@ -544,6 +594,8 @@ static void print_cpu_stall(unsigned long gps)
struct rcu_node *rnp = rcu_get_root();
long totqlen = 0;
lockdep_assert_irqs_disabled();
/* Kick and suppress, if so configured. */
rcu_stall_kick_kthreads();
if (rcu_stall_is_suppressed())
......@@ -564,6 +616,7 @@ static void print_cpu_stall(unsigned long gps)
jiffies - gps,
(long)rcu_seq_current(&rcu_state.gp_seq), totqlen);
rcu_check_gp_kthread_expired_fqs_timer();
rcu_check_gp_kthread_starvation();
rcu_dump_cpu_stacks();
......@@ -598,6 +651,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
unsigned long js;
struct rcu_node *rnp;
lockdep_assert_irqs_disabled();
if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) ||
!rcu_gp_in_progress())
return;
......
......@@ -56,8 +56,10 @@
#ifndef CONFIG_TINY_RCU
module_param(rcu_expedited, int, 0);
module_param(rcu_normal, int, 0);
static int rcu_normal_after_boot;
static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);
#ifndef CONFIG_PREEMPT_RT
module_param(rcu_normal_after_boot, int, 0);
#endif
#endif /* #ifndef CONFIG_TINY_RCU */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
......
......@@ -398,6 +398,7 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
static int scftorture_invoker(void *arg)
{
int cpu;
int curcpu;
DEFINE_TORTURE_RANDOM(rand);
struct scf_statistics *scfp = (struct scf_statistics *)arg;
bool was_offline = false;
......@@ -412,7 +413,10 @@ static int scftorture_invoker(void *arg)
VERBOSE_SCFTORTOUT("scftorture_invoker %d: Waiting for all SCF torturers from cpu %d", scfp->cpu, smp_processor_id());
// Make sure that the CPU is affinitized appropriately during testing.
WARN_ON_ONCE(smp_processor_id() != scfp->cpu);
curcpu = smp_processor_id();
WARN_ONCE(curcpu != scfp->cpu % nr_cpu_ids,
"%s: Wanted CPU %d, running on %d, nr_cpu_ids = %d\n",
__func__, scfp->cpu, curcpu, nr_cpu_ids);
if (!atomic_dec_return(&n_started))
while (atomic_read_acquire(&n_started)) {
......
......@@ -3478,7 +3478,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
/**
* try_invoke_on_locked_down_task - Invoke a function on task in fixed state
* @p: Process for which the function is to be invoked.
* @p: Process for which the function is to be invoked, can be @current.
* @func: Function to invoke.
* @arg: Argument to function.
*
......@@ -3496,12 +3496,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*/
bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
{
bool ret = false;
struct rq_flags rf;
bool ret = false;
struct rq *rq;
lockdep_assert_irqs_enabled();
raw_spin_lock_irq(&p->pi_lock);
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
if (p->on_rq) {
rq = __task_rq_lock(p, &rf);
if (task_rq(p) == rq)
......@@ -3518,7 +3517,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
ret = func(p, arg);
}
}
raw_spin_unlock_irq(&p->pi_lock);
raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
return ret;
}
......
......@@ -1237,6 +1237,20 @@ int try_to_del_timer_sync(struct timer_list *timer)
}
EXPORT_SYMBOL(try_to_del_timer_sync);
bool timer_curr_running(struct timer_list *timer)
{
int i;
for (i = 0; i < NR_BASES; i++) {
struct timer_base *base = this_cpu_ptr(&timer_bases[i]);
if (base->running_timer == timer)
return true;
}
return false;
}
#ifdef CONFIG_PREEMPT_RT
static __init void timer_base_init_expiry_lock(struct timer_base *base)
{
......
This diff is collapsed.
......@@ -5,6 +5,7 @@
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/percpu-refcount.h>
/*
......@@ -168,6 +169,7 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
struct percpu_ref_data, rcu);
struct percpu_ref *ref = data->ref;
unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
static atomic_t underflows;
unsigned long count = 0;
int cpu;
......@@ -191,9 +193,13 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
*/
atomic_long_add((long)count - PERCPU_COUNT_BIAS, &data->count);
WARN_ONCE(atomic_long_read(&data->count) <= 0,
if (WARN_ONCE(atomic_long_read(&data->count) <= 0,
"percpu ref (%ps) <= 0 (%ld) after switching to atomic",
data->release, atomic_long_read(&data->count));
data->release, atomic_long_read(&data->count)) &&
atomic_inc_return(&underflows) < 4) {
pr_err("%s(): percpu_ref underflow", __func__);
mem_dump_obj(data);
}
/* @ref is viewed as dead on all CPUs, send out switch confirmation */
percpu_ref_call_confirm_rcu(rcu);
......
This diff is collapsed.
......@@ -615,4 +615,16 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
return false;
}
#define KS_ADDRS_COUNT 16
struct kmem_obj_info {
void *kp_ptr;
struct page *kp_page;
void *kp_objp;
unsigned long kp_data_offset;
struct kmem_cache *kp_slab_cache;
void *kp_ret;
void *kp_stack[KS_ADDRS_COUNT];
};
void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page);
#endif /* MM_SLAB_H */
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -14,4 +14,5 @@ egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls o
grep -v 'ODEBUG: ' |
grep -v 'This means that this is a DEBUG kernel and it is' |
grep -v 'Warning: unable to open an initial console' |
grep -v 'Warning: Failed to add ttynull console. No stdin, stdout, and stderr.*the init process!' |
grep -v 'NOHZ tick-stop error: Non-RCU local softirq work is pending, handler'
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment