Commit 2e31da75 authored by Paul E. McKenney

Merge branches 'doc.2023.05.10a', 'fixes.2023.05.11a', 'kvfree.2023.05.10a', 'nocb.2023.05.11a', 'rcu-tasks.2023.05.10a', 'torture.2023.05.15a' and 'rcu-urgent.2023.06.06a' into HEAD

doc.2023.05.10a: Documentation updates
fixes.2023.05.11a: Miscellaneous fixes
kvfree.2023.05.10a: kvfree_rcu updates
nocb.2023.05.11a: Callback-offloading updates
rcu-tasks.2023.05.10a: Tasks RCU updates
torture.2023.05.15a: Torture-test updates
rcu-urgent.2023.06.06a: Urgent SRCU fix
@@ -2071,41 +2071,7 @@ call.
 Because RCU avoids interrupting idle CPUs, it is illegal to execute an
 RCU read-side critical section on an idle CPU. (Kernels built with
-``CONFIG_PROVE_RCU=y`` will splat if you try it.) The RCU_NONIDLE()
-macro and ``_rcuidle`` event tracing is provided to work around this
-restriction. In addition, rcu_is_watching() may be used to test
-whether or not it is currently legal to run RCU read-side critical
-sections on this CPU. I learned of the need for diagnostics on the one
-hand and RCU_NONIDLE() on the other while inspecting idle-loop code.
-Steven Rostedt supplied ``_rcuidle`` event tracing, which is used quite
-heavily in the idle loop. However, there are some restrictions on the
-code placed within RCU_NONIDLE():
-
-#. Blocking is prohibited. In practice, this is not a serious
-   restriction given that idle tasks are prohibited from blocking to
-   begin with.
-#. Although nesting RCU_NONIDLE() is permitted, they cannot nest
-   indefinitely deeply. However, given that they can be nested on the
-   order of a million deep, even on 32-bit systems, this should not be a
-   serious restriction. This nesting limit would probably be reached
-   long after the compiler OOMed or the stack overflowed.
-#. Any code path that enters RCU_NONIDLE() must sequence out of that
-   same RCU_NONIDLE(). For example, the following is grossly
-   illegal:
-
-      ::
-
-         1     RCU_NONIDLE({
-         2       do_something();
-         3       goto bad_idea;  /* BUG!!! */
-         4       do_something_else();});
-         5   bad_idea:
-
-   It is just as illegal to transfer control into the middle of
-   RCU_NONIDLE()'s argument. Yes, in theory, you could transfer in
-   as long as you also transferred out, but in practice you could also
-   expect to get sharply worded review comments.
+``CONFIG_PROVE_RCU=y`` will splat if you try it.)
 
 It is similarly socially unacceptable to interrupt an ``nohz_full`` CPU
 running in userspace. RCU must therefore track ``nohz_full`` userspace
......
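With RCU_NONIDLE() removed from the requirements text above, the remaining rule is that idle-loop code must not enter an RCU read-side critical section unless RCU is watching the CPU. A minimal sketch of the rcu_is_watching() check mentioned in the removed paragraph (do_idle_diagnostic() is a hypothetical caller, not part of this merge):

    #include <linux/rcupdate.h>

    /* Hypothetical diagnostic that might run late in the idle path. */
    static void do_idle_diagnostic(void)
    {
            /* RCU readers are legal only while RCU is watching this CPU. */
            if (!rcu_is_watching())
                    return;

            rcu_read_lock();
            /* ... access RCU-protected data here ... */
            rcu_read_unlock();
    }
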
@@ -1117,7 +1117,6 @@ All: lockdep-checked RCU utility APIs::
 	RCU_LOCKDEP_WARN
 	rcu_sleep_check
-	RCU_NONIDLE
 
 All: Unchecked RCU-protected pointer access::
......
@@ -5094,8 +5094,17 @@
 	rcutorture.stall_cpu_block= [KNL]
 			Sleep while stalling if set. This will result
-			in warnings from preemptible RCU in addition
-			to any other stall-related activity.
+			in warnings from preemptible RCU in addition to
+			any other stall-related activity. Note that
+			in kernels built with CONFIG_PREEMPTION=n and
+			CONFIG_PREEMPT_COUNT=y, this parameter will
+			cause the CPU to pass through a quiescent state.
+			Given CONFIG_PREEMPTION=n, this will suppress
+			RCU CPU stall warnings, but will instead result
+			in scheduling-while-atomic splats.
+
+			Use of this module parameter results in splats.
 
 	rcutorture.stall_cpu_holdoff= [KNL]
 			Time to wait (s) after boot before inducing stall.
......
@@ -106,12 +106,22 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 #define RAW_NOTIFIER_INIT(name)	{				\
 		.head = NULL }
 
+#ifdef CONFIG_TREE_SRCU
 #define SRCU_NOTIFIER_INIT(name, pcpu)				\
 	{							\
 		.mutex = __MUTEX_INITIALIZER(name.mutex),	\
 		.head = NULL,					\
+		.srcuu = __SRCU_USAGE_INIT(name.srcuu),		\
 		.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
 	}
+#else
+#define SRCU_NOTIFIER_INIT(name, pcpu)				\
+	{							\
+		.mutex = __MUTEX_INITIALIZER(name.mutex),	\
+		.head = NULL,					\
+		.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
+	}
+#endif
 
 #define ATOMIC_NOTIFIER_HEAD(name)				\
 	struct atomic_notifier_head name =			\
......
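For context, SRCU_NOTIFIER_INIT() above backs the srcu_notifier_head machinery; with the new #else branch, !CONFIG_TREE_SRCU builds simply omit the .srcuu member. A hedged usage sketch of such a chain (my_chain, my_nb, and my_event_cb are hypothetical names; the notifier calls themselves are the stock API from <linux/notifier.h>):

    #include <linux/notifier.h>

    /* Statically define an SRCU-protected notifier chain (hypothetical name). */
    SRCU_NOTIFIER_HEAD_STATIC(my_chain);

    static int my_event_cb(struct notifier_block *nb, unsigned long action, void *data)
    {
            /* Callbacks run under SRCU protection and are therefore allowed to block. */
            return NOTIFY_OK;
    }

    static struct notifier_block my_nb = {
            .notifier_call = my_event_cb,
    };

    static void my_chain_example(void)
    {
            srcu_notifier_chain_register(&my_chain, &my_nb);
            srcu_notifier_call_chain(&my_chain, 0, NULL);
            srcu_notifier_chain_unregister(&my_chain, &my_nb);
    }
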
@@ -156,31 +156,6 @@ static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
 static inline void rcu_nocb_flush_deferred_wakeup(void) { }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
-/**
- * RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
- * @a: Code that RCU needs to pay attention to.
- *
- * RCU read-side critical sections are forbidden in the inner idle loop,
- * that is, between the ct_idle_enter() and the ct_idle_exit() -- RCU
- * will happily ignore any such read-side critical sections. However,
- * things like powertop need tracepoints in the inner idle loop.
- *
- * This macro provides the way out: RCU_NONIDLE(do_something_with_RCU())
- * will tell RCU that it needs to pay attention, invoke its argument
- * (in this example, calling the do_something_with_RCU() function),
- * and then tell RCU to go back to ignoring this CPU. It is permissible
- * to nest RCU_NONIDLE() wrappers, but not indefinitely (but the limit is
- * on the order of a million or so, even on 32-bit systems). It is
- * not legal to block within RCU_NONIDLE(), nor is it permissible to
- * transfer control either into or out of RCU_NONIDLE()'s statement.
- */
-#define RCU_NONIDLE(a) \
-	do { \
-		ct_irq_enter_irqson(); \
-		do { a; } while (0); \
-		ct_irq_exit_irqson(); \
-	} while (0)
-
 /*
  * Note a quasi-voluntary context switch for RCU-tasks's benefit.
  * This is a macro rather than an inline function to avoid #include hell.
@@ -957,9 +932,8 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 
 /**
  * kfree_rcu() - kfree an object after a grace period.
- * @ptr: pointer to kfree for both single- and double-argument invocations.
- * @rhf: the name of the struct rcu_head within the type of @ptr,
- *       but only for double-argument invocations.
+ * @ptr: pointer to kfree for double-argument invocations.
+ * @rhf: the name of the struct rcu_head within the type of @ptr.
  *
  * Many rcu callbacks functions just call kfree() on the base structure.
  * These functions are trivial, but their size adds up, and furthermore
@@ -984,26 +958,18 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * The BUILD_BUG_ON check must not involve any function calls, hence the
  * checks are done in macros here.
  */
-#define kfree_rcu(ptr, rhf...) kvfree_rcu(ptr, ## rhf)
+#define kfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
+#define kvfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
 
 /**
- * kvfree_rcu() - kvfree an object after a grace period.
- *
- * This macro consists of one or two arguments and it is
- * based on whether an object is head-less or not. If it
- * has a head then a semantic stays the same as it used
- * to be before:
- *
- *     kvfree_rcu(ptr, rhf);
- *
- * where @ptr is a pointer to kvfree(), @rhf is the name
- * of the rcu_head structure within the type of @ptr.
+ * kfree_rcu_mightsleep() - kfree an object after a grace period.
+ * @ptr: pointer to kfree for single-argument invocations.
  *
  * When it comes to head-less variant, only one argument
  * is passed and that is just a pointer which has to be
  * freed after a grace period. Therefore the semantic is
  *
- *     kvfree_rcu(ptr);
+ *     kfree_rcu_mightsleep(ptr);
  *
  * where @ptr is the pointer to be freed by kvfree().
 *
@@ -1012,13 +978,9 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * annotation. Otherwise, please switch and embed the
  * rcu_head structure within the type of @ptr.
  */
-#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \
-	kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
-
+#define kfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
 #define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
-#define kfree_rcu_mightsleep(ptr) kvfree_rcu_mightsleep(ptr)
-
-#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
 
 #define kvfree_rcu_arg_2(ptr, rhf) \
 do { \
 	typeof (ptr) ___p = (ptr); \
......
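The rcupdate.h changes above make kfree_rcu()/kvfree_rcu() strictly two-argument and reserve the head-less form for kfree_rcu_mightsleep()/kvfree_rcu_mightsleep(). A brief usage sketch under that API (struct foo and the two helper functions are hypothetical):

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
            int data;
            struct rcu_head rh;     /* Needed for the two-argument form. */
    };

    /* Two-argument form: does not sleep, usable from atomic context. */
    static void foo_free(struct foo *fp)
    {
            kfree_rcu(fp, rh);
    }

    /* Head-less form: no rcu_head required, but the caller must be able to sleep. */
    static void foo_free_headless(struct foo *fp)
    {
            kfree_rcu_mightsleep(fp);
    }
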
@@ -212,7 +212,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
 
 	srcu_check_nmi_safety(ssp, false);
 	retval = __srcu_read_lock(ssp);
-	srcu_lock_acquire(&(ssp)->dep_map);
+	srcu_lock_acquire(&ssp->dep_map);
 	return retval;
 }
 
@@ -229,7 +229,7 @@ static inline int srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp)
 
 	srcu_check_nmi_safety(ssp, true);
 	retval = __srcu_read_lock_nmisafe(ssp);
-	rcu_lock_acquire(&(ssp)->dep_map);
+	rcu_lock_acquire(&ssp->dep_map);
 	return retval;
 }
 
@@ -284,7 +284,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
 {
 	WARN_ON_ONCE(idx & ~0x1);
 	srcu_check_nmi_safety(ssp, false);
-	srcu_lock_release(&(ssp)->dep_map);
+	srcu_lock_release(&ssp->dep_map);
 	__srcu_read_unlock(ssp, idx);
 }
 
@@ -300,7 +300,7 @@ static inline void srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
 {
 	WARN_ON_ONCE(idx & ~0x1);
 	srcu_check_nmi_safety(ssp, true);
-	rcu_lock_release(&(ssp)->dep_map);
+	rcu_lock_release(&ssp->dep_map);
 	__srcu_read_unlock_nmisafe(ssp, idx);
 }
......
@@ -33,24 +33,19 @@
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
 
-torture_param(int, nwriters_stress, -1,
-	     "Number of write-locking stress-test threads");
-torture_param(int, nreaders_stress, -1,
-	     "Number of read-locking stress-test threads");
+torture_param(int, nwriters_stress, -1, "Number of write-locking stress-test threads");
+torture_param(int, nreaders_stress, -1, "Number of read-locking stress-test threads");
+torture_param(int, long_hold, 100, "Do occasional long hold of lock (ms), 0=disable");
 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
-torture_param(int, onoff_interval, 0,
-	     "Time between CPU hotplugs (s), 0=disable");
-torture_param(int, shuffle_interval, 3,
-	     "Number of jiffies between shuffles, 0=disable");
+torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable");
+torture_param(int, shuffle_interval, 3, "Number of jiffies between shuffles, 0=disable");
 torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
-torture_param(int, stat_interval, 60,
-	     "Number of seconds between stats printk()s");
+torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s");
 torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(int, rt_boost, 2,
 	     "Do periodic rt-boost. 0=Disable, 1=Only for rt_mutex, 2=For all lock types.");
 torture_param(int, rt_boost_factor, 50, "A factor determining how often rt-boost happens.");
-torture_param(int, verbose, 1,
-	     "Enable verbose debugging printk()s");
+torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
 torture_param(int, nested_locks, 0, "Number of nested locks (max = 8)");
 /* Going much higher trips "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" errors */
 #define MAX_NESTED_LOCKS 8
@@ -120,7 +115,7 @@ static int torture_lock_busted_write_lock(int tid __maybe_unused)
 
 static void torture_lock_busted_write_delay(struct torture_random_state *trsp)
 {
-	const unsigned long longdelay_ms = 100;
+	const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
 	/* We want a long delay occasionally to force massive contention. */
 	if (!(torture_random(trsp) %
@@ -198,16 +193,18 @@ __acquires(torture_spinlock)
 static void torture_spin_lock_write_delay(struct torture_random_state *trsp)
 {
 	const unsigned long shortdelay_us = 2;
-	const unsigned long longdelay_ms = 100;
+	const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
+	unsigned long j;
 
 	/* We want a short delay mostly to emulate likely code, and
 	 * we want a long delay occasionally to force massive contention.
 	 */
-	if (!(torture_random(trsp) %
-	      (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
+	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2000 * longdelay_ms))) {
+		j = jiffies;
 		mdelay(longdelay_ms);
-	if (!(torture_random(trsp) %
-	      (cxt.nrealwriters_stress * 2 * shortdelay_us)))
+		pr_alert("%s: delay = %lu jiffies.\n", __func__, jiffies - j);
+	}
+	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 200 * shortdelay_us)))
 		udelay(shortdelay_us);
 	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
 		torture_preempt_schedule(); /* Allow test to be preempted. */
@@ -322,7 +319,7 @@ __acquires(torture_rwlock)
 static void torture_rwlock_write_delay(struct torture_random_state *trsp)
 {
 	const unsigned long shortdelay_us = 2;
-	const unsigned long longdelay_ms = 100;
+	const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
 	/* We want a short delay mostly to emulate likely code, and
 	 * we want a long delay occasionally to force massive contention.
@@ -455,14 +452,12 @@ __acquires(torture_mutex)
 static void torture_mutex_delay(struct torture_random_state *trsp)
 {
-	const unsigned long longdelay_ms = 100;
+	const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
 	/* We want a long delay occasionally to force massive contention. */
 	if (!(torture_random(trsp) %
 	      (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
 		mdelay(longdelay_ms * 5);
-	else
-		mdelay(longdelay_ms / 5);
 	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
 		torture_preempt_schedule(); /* Allow test to be preempted. */
 }
@@ -630,7 +625,7 @@ __acquires(torture_rtmutex)
 static void torture_rtmutex_delay(struct torture_random_state *trsp)
 {
 	const unsigned long shortdelay_us = 2;
-	const unsigned long longdelay_ms = 100;
+	const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
 	/*
 	 * We want a short delay mostly to emulate likely code, and
@@ -640,7 +635,7 @@ static void torture_rtmutex_delay(struct torture_random_state *trsp)
 	      (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
 		mdelay(longdelay_ms);
 	if (!(torture_random(trsp) %
-	      (cxt.nrealwriters_stress * 2 * shortdelay_us)))
+	      (cxt.nrealwriters_stress * 200 * shortdelay_us)))
 		udelay(shortdelay_us);
 	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
 		torture_preempt_schedule(); /* Allow test to be preempted. */
@@ -695,14 +690,12 @@ __acquires(torture_rwsem)
 static void torture_rwsem_write_delay(struct torture_random_state *trsp)
 {
-	const unsigned long longdelay_ms = 100;
+	const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
 	/* We want a long delay occasionally to force massive contention. */
 	if (!(torture_random(trsp) %
 	      (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
 		mdelay(longdelay_ms * 10);
-	else
-		mdelay(longdelay_ms / 10);
 	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
 		torture_preempt_schedule(); /* Allow test to be preempted. */
 }
@@ -848,8 +841,8 @@ static int lock_torture_writer(void *arg)
 				lwsp->n_lock_acquired++;
 			}
-		cxt.cur_ops->write_delay(&rand);
 
 		if (!skip_main_lock) {
+			cxt.cur_ops->write_delay(&rand);
 			lock_is_write_held = false;
 			WRITE_ONCE(last_lock_release, jiffies);
 			cxt.cur_ops->writeunlock(tid);
......
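The locktorture changes above replace the hard-coded 100 ms long-hold delay with the new long_hold module parameter; setting it to 0 makes the divisor ULONG_MAX, so the long-delay branch effectively never fires. A standalone hedged restatement of that pattern (maybe_long_hold() and its arguments are hypothetical, not part of this merge):

    #include <linux/delay.h>
    #include <linux/limits.h>
    #include <linux/torture.h>

    /* Occasionally hold for long_hold_ms; long_hold_ms == 0 disables the delay. */
    static void maybe_long_hold(struct torture_random_state *trsp, int nwriters, int long_hold_ms)
    {
            const unsigned long longdelay_ms = long_hold_ms ? long_hold_ms : ULONG_MAX;

            /* Fires roughly once per nwriters * 2000 * longdelay_ms calls when enabled. */
            if (!(torture_random(trsp) % (nwriters * 2000 * longdelay_ms)))
                    mdelay(longdelay_ms);
    }
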
@@ -314,4 +314,22 @@ config RCU_LAZY
 	  To save power, batch RCU callbacks and flush after delay, memory
 	  pressure, or callback list growing too big.
 
+config RCU_DOUBLE_CHECK_CB_TIME
+	bool "RCU callback-batch backup time check"
+	depends on RCU_EXPERT
+	default n
+	help
+	  Use this option to provide more precise enforcement of the
+	  rcutree.rcu_resched_ns module parameter in situations where
+	  a single RCU callback might run for hundreds of microseconds,
+	  thus defeating the 32-callback batching used to amortize the
+	  cost of the fine-grained but expensive local_clock() function.
+
+	  This option rounds rcutree.rcu_resched_ns up to the next
+	  jiffy, and overrides the 32-callback batching if this limit
+	  is exceeded.
+
+	  Say Y here if you need tighter callback-limit enforcement.
+	  Say N here if you are unsure.
+
 endmenu # "RCU Subsystem"
@@ -642,4 +642,10 @@ void show_rcu_tasks_trace_gp_kthread(void);
 static inline void show_rcu_tasks_trace_gp_kthread(void) {}
 #endif
 
+#ifdef CONFIG_TINY_RCU
+static inline bool rcu_cpu_beenfullyonline(int cpu) { return true; }
+#else
+bool rcu_cpu_beenfullyonline(int cpu);
+#endif
+
 #endif /* __LINUX_RCU_H */
@@ -522,89 +522,6 @@ rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
 		 scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown);
 }
 
-static void
-rcu_scale_cleanup(void)
-{
-	int i;
-	int j;
-	int ngps = 0;
-	u64 *wdp;
-	u64 *wdpp;
-
-	/*
-	 * Would like warning at start, but everything is expedited
-	 * during the mid-boot phase, so have to wait till the end.
-	 */
-	if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
-		SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
-	if (rcu_gp_is_normal() && gp_exp)
-		SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
-	if (gp_exp && gp_async)
-		SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
-
-	if (torture_cleanup_begin())
-		return;
-	if (!cur_ops) {
-		torture_cleanup_end();
-		return;
-	}
-
-	if (reader_tasks) {
-		for (i = 0; i < nrealreaders; i++)
-			torture_stop_kthread(rcu_scale_reader,
-					     reader_tasks[i]);
-		kfree(reader_tasks);
-	}
-
-	if (writer_tasks) {
-		for (i = 0; i < nrealwriters; i++) {
-			torture_stop_kthread(rcu_scale_writer,
-					     writer_tasks[i]);
-			if (!writer_n_durations)
-				continue;
-			j = writer_n_durations[i];
-			pr_alert("%s%s writer %d gps: %d\n",
-				 scale_type, SCALE_FLAG, i, j);
-			ngps += j;
-		}
-		pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
-			 scale_type, SCALE_FLAG,
-			 t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
-			 t_rcu_scale_writer_finished -
-			 t_rcu_scale_writer_started,
-			 ngps,
-			 rcuscale_seq_diff(b_rcu_gp_test_finished,
-					   b_rcu_gp_test_started));
-		for (i = 0; i < nrealwriters; i++) {
-			if (!writer_durations)
-				break;
-			if (!writer_n_durations)
-				continue;
-			wdpp = writer_durations[i];
-			if (!wdpp)
-				continue;
-			for (j = 0; j < writer_n_durations[i]; j++) {
-				wdp = &wdpp[j];
-				pr_alert("%s%s %4d writer-duration: %5d %llu\n",
-					 scale_type, SCALE_FLAG,
-					 i, j, *wdp);
-				if (j % 100 == 0)
-					schedule_timeout_uninterruptible(1);
-			}
-			kfree(writer_durations[i]);
-		}
-		kfree(writer_tasks);
-		kfree(writer_durations);
-		kfree(writer_n_durations);
-	}
-
-	/* Do torture-type-specific cleanup operations. */
-	if (cur_ops->cleanup != NULL)
-		cur_ops->cleanup();
-
-	torture_cleanup_end();
-}
-
 /*
  * Return the number if non-negative. If -1, the number of CPUs.
  * If less than -1, that much less than the number of CPUs, but
@@ -624,20 +541,6 @@ static int compute_real(int n)
 	return nr;
 }
 
-/*
- * RCU scalability shutdown kthread. Just waits to be awakened, then shuts
- * down system.
- */
-static int
-rcu_scale_shutdown(void *arg)
-{
-	wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
-	smp_mb(); /* Wake before output. */
-	rcu_scale_cleanup();
-	kernel_power_off();
-	return -EINVAL;
-}
-
 /*
  * kfree_rcu() scalability tests: Start a kfree_rcu() loop on all CPUs for number
  * of iterations and measure total time and number of GP for all iterations to complete.
@@ -874,6 +777,108 @@ kfree_scale_init(void)
 	return firsterr;
 }
 
+static void
+rcu_scale_cleanup(void)
+{
+	int i;
+	int j;
+	int ngps = 0;
+	u64 *wdp;
+	u64 *wdpp;
+
+	/*
+	 * Would like warning at start, but everything is expedited
+	 * during the mid-boot phase, so have to wait till the end.
+	 */
+	if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
+		SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
+	if (rcu_gp_is_normal() && gp_exp)
+		SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
+	if (gp_exp && gp_async)
+		SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
+
+	if (kfree_rcu_test) {
+		kfree_scale_cleanup();
+		return;
+	}
+
+	if (torture_cleanup_begin())
+		return;
+	if (!cur_ops) {
+		torture_cleanup_end();
+		return;
+	}
+
+	if (reader_tasks) {
+		for (i = 0; i < nrealreaders; i++)
+			torture_stop_kthread(rcu_scale_reader,
+					     reader_tasks[i]);
+		kfree(reader_tasks);
+	}
+
+	if (writer_tasks) {
+		for (i = 0; i < nrealwriters; i++) {
+			torture_stop_kthread(rcu_scale_writer,
+					     writer_tasks[i]);
+			if (!writer_n_durations)
+				continue;
+			j = writer_n_durations[i];
+			pr_alert("%s%s writer %d gps: %d\n",
+				 scale_type, SCALE_FLAG, i, j);
+			ngps += j;
+		}
+		pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
+			 scale_type, SCALE_FLAG,
+			 t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
+			 t_rcu_scale_writer_finished -
+			 t_rcu_scale_writer_started,
+			 ngps,
+			 rcuscale_seq_diff(b_rcu_gp_test_finished,
+					   b_rcu_gp_test_started));
+		for (i = 0; i < nrealwriters; i++) {
+			if (!writer_durations)
+				break;
+			if (!writer_n_durations)
+				continue;
+			wdpp = writer_durations[i];
+			if (!wdpp)
+				continue;
+			for (j = 0; j < writer_n_durations[i]; j++) {
+				wdp = &wdpp[j];
+				pr_alert("%s%s %4d writer-duration: %5d %llu\n",
+					 scale_type, SCALE_FLAG,
+					 i, j, *wdp);
+				if (j % 100 == 0)
+					schedule_timeout_uninterruptible(1);
+			}
+			kfree(writer_durations[i]);
+		}
+		kfree(writer_tasks);
+		kfree(writer_durations);
+		kfree(writer_n_durations);
+	}
+
+	/* Do torture-type-specific cleanup operations. */
+	if (cur_ops->cleanup != NULL)
+		cur_ops->cleanup();
+
+	torture_cleanup_end();
+}
+
+/*
+ * RCU scalability shutdown kthread. Just waits to be awakened, then shuts
+ * down system.
+ */
+static int
+rcu_scale_shutdown(void *arg)
+{
+	wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
+	smp_mb(); /* Wake before output. */
+	rcu_scale_cleanup();
+	kernel_power_off();
+	return -EINVAL;
+}
+
 static int __init
 rcu_scale_init(void)
 {
......
@@ -241,7 +241,6 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 	if (rcu_task_enqueue_lim < 0) {
 		rcu_task_enqueue_lim = 1;
 		rcu_task_cb_adjust = true;
-		pr_info("%s: Setting adjustable number of callback queues.\n", __func__);
 	} else if (rcu_task_enqueue_lim == 0) {
 		rcu_task_enqueue_lim = 1;
 	}
@@ -272,7 +271,9 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 		raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
 	}
 	raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
-	pr_info("%s: Setting shift to %d and lim to %d.\n", __func__, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim));
+
+	pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
+			data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim), rcu_task_cb_adjust);
 }
 
 // IRQ-work handler that does deferred wakeup for call_rcu_tasks_generic().
@@ -463,6 +464,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
 {
 	int cpu;
 	int cpunext;
+	int cpuwq;
 	unsigned long flags;
 	int len;
 	struct rcu_head *rhp;
@@ -473,11 +475,13 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
 	cpunext = cpu * 2 + 1;
 	if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
 		rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
-		queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
+		cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND;
+		queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
 		cpunext++;
 		if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
 			rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
-			queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
+			cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND;
+			queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
 		}
 	}
......
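The rcu_tasks_invoke_cbs() change above avoids queueing bound work on CPUs that have never been fully brought online by falling back to WORK_CPU_UNBOUND. The same guard pattern, sketched outside of RCU (queue_on_cpu_if_possible() is hypothetical, and cpu_online() merely stands in for the RCU-internal rcu_cpu_beenfullyonline()):

    #include <linux/cpumask.h>
    #include <linux/workqueue.h>

    /* Queue bound work on @cpu when it can service it, else let any CPU run it. */
    static void queue_on_cpu_if_possible(struct work_struct *work, int cpu)
    {
            int target = cpu_online(cpu) ? cpu : WORK_CPU_UNBOUND;

            queue_work_on(target, system_wq, work);
    }
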
This diff is collapsed.
@@ -643,7 +643,7 @@ static void synchronize_rcu_expedited_wait(void)
 				"O."[!!cpu_online(cpu)],
 				"o."[!!(rdp->grpmask & rnp->expmaskinit)],
 				"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
-				"D."[!!(rdp->cpu_no_qs.b.exp)]);
+				"D."[!!data_race(rdp->cpu_no_qs.b.exp)]);
 		}
 	}
 	pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
......
@@ -1319,13 +1319,22 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 	int cpu;
 	unsigned long count = 0;
 
+	if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+		return 0;
+
+	/* Protect rcu_nocb_mask against concurrent (de-)offloading. */
+	if (!mutex_trylock(&rcu_state.barrier_mutex))
+		return 0;
+
 	/* Snapshot count of all CPUs */
-	for_each_possible_cpu(cpu) {
+	for_each_cpu(cpu, rcu_nocb_mask) {
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
 		count += READ_ONCE(rdp->lazy_len);
 	}
 
+	mutex_unlock(&rcu_state.barrier_mutex);
+
 	return count ? count : SHRINK_EMPTY;
 }
@@ -1336,15 +1345,45 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	unsigned long flags;
 	unsigned long count = 0;
 
+	if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+		return 0;
+
+	/*
+	 * Protect against concurrent (de-)offloading. Otherwise nocb locking
+	 * may be ignored or imbalanced.
+	 */
+	if (!mutex_trylock(&rcu_state.barrier_mutex)) {
+		/*
+		 * But really don't insist if barrier_mutex is contended since we
+		 * can't guarantee that it will never engage in a dependency
+		 * chain involving memory allocation. The lock is seldom contended
+		 * anyway.
+		 */
+		return 0;
+	}
+
 	/* Snapshot count of all CPUs */
-	for_each_possible_cpu(cpu) {
+	for_each_cpu(cpu, rcu_nocb_mask) {
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-		int _count = READ_ONCE(rdp->lazy_len);
+		int _count;
+
+		if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp)))
+			continue;
 
-		if (_count == 0)
+		if (!READ_ONCE(rdp->lazy_len))
 			continue;
+
 		rcu_nocb_lock_irqsave(rdp, flags);
-		WRITE_ONCE(rdp->lazy_len, 0);
+		/*
+		 * Recheck under the nocb lock. Since we are not holding the bypass
+		 * lock we may still race with increments from the enqueuer but still
+		 * we know for sure if there is at least one lazy callback.
+		 */
+		_count = READ_ONCE(rdp->lazy_len);
+		if (!_count) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			continue;
+		}
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		wake_nocb_gp(rdp, false);
 		sc->nr_to_scan -= _count;
@@ -1352,6 +1391,9 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (sc->nr_to_scan <= 0)
 			break;
 	}
+
+	mutex_unlock(&rcu_state.barrier_mutex);
+
 	return count ? count : SHRINK_STOP;
 }
......
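The shrinker changes above follow a common pattern: take barrier_mutex only via mutex_trylock() so reclaim never blocks on it, skip CPUs with a cheap unlocked peek at ->lazy_len, and then recheck under the nocb lock before flushing. A generic, hedged restatement of that peek-then-recheck idea (struct lazy_counter and drain_if_nonempty() are hypothetical, not the RCU data structures):

    #include <linux/compiler.h>
    #include <linux/spinlock.h>

    struct lazy_counter {
            spinlock_t lock;        /* Protects @len against concurrent updates. */
            unsigned long len;
    };

    /* Reclaim-style path: unlocked fast-path check, then recheck under the lock. */
    static unsigned long drain_if_nonempty(struct lazy_counter *lc)
    {
            unsigned long snap;

            if (!READ_ONCE(lc->len))        /* Nothing visible; avoid taking the lock. */
                    return 0;

            spin_lock(&lc->lock);
            snap = lc->len;                 /* May have been drained since the peek. */
            lc->len = 0;
            spin_unlock(&lc->lock);

            return snap;
    }
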
@@ -257,6 +257,8 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 	 * GP should not be able to end until we report, so there should be
 	 * no need to check for a subsequent expedited GP. (Though we are
 	 * still in a quiescent state in any case.)
+	 *
+	 * Interrupts are disabled, so ->cpu_no_qs.b.exp cannot change.
 	 */
 	if (blkd_state & RCU_EXP_BLKD && rdp->cpu_no_qs.b.exp)
 		rcu_report_exp_rdp(rdp);
@@ -941,7 +943,7 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
-	if (rdp->cpu_no_qs.b.exp)
+	if (READ_ONCE(rdp->cpu_no_qs.b.exp))
 		rcu_report_exp_rdp(rdp);
 }
......
@@ -250,7 +250,7 @@ identify_qemu_args () {
 		echo -machine virt,gic-version=host -cpu host
 		;;
 	qemu-system-ppc64)
-		echo -enable-kvm -M pseries -nodefaults
+		echo -M pseries -nodefaults
 		echo -device spapr-vscsi
 		if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC"
 		then
......
@@ -5,4 +5,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
-tree.use_softirq=0
+rcutree.use_softirq=0
@@ -4,4 +4,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
-tree.use_softirq=0
+rcutree.use_softirq=0