Commit df00ccca authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - documentation updates

   - miscellaneous fixes

   - minor reorganization of code

   - torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  rcu: Correctly handle sparse possible cpus
  rcu: sysctl: Panic on RCU Stall
  rcu: Fix a typo in a comment
  rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
  rcu: Disable TASKS_RCU for usermode Linux
  rcu: No ordering for rcu_assign_pointer() of NULL
  rcutorture: Fix error return code in rcu_perf_init()
  torture: Inflict default jitter
  rcuperf: Don't treat gp_exp mis-setting as a WARN
  rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
  rcutorture: Don't specify the cpu type of QEMU on PPC
  rcutorture: Make -soundhw a x86 specific option
  rcutorture: Use vmlinux as the fallback kernel image
  rcutorture/doc: Create initrd using dracut
  torture: Stop onoff task if there is only one cpu
  torture: Add starvation events to error summary
  torture:  Break online and offline functions out of torture_onoff()
  torture: Forgive lengthy trace dumps and preemption
  torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
  torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
  ...
parents dd950695 54d5f16e
...@@ -2391,6 +2391,41 @@ and <tt>RCU_NONIDLE()</tt> on the other while inspecting ...@@ -2391,6 +2391,41 @@ and <tt>RCU_NONIDLE()</tt> on the other while inspecting
idle-loop code. idle-loop code.
Steven Rostedt supplied <tt>_rcuidle</tt> event tracing, Steven Rostedt supplied <tt>_rcuidle</tt> event tracing,
which is used quite heavily in the idle loop. which is used quite heavily in the idle loop.
However, there are some restrictions on the code placed within
<tt>RCU_NONIDLE()</tt>:
<ol>
<li> Blocking is prohibited.
In practice, this is not a serious restriction given that idle
tasks are prohibited from blocking to begin with.
<li> Although nesting <tt>RCU_NONIDLE()</tt> is permitted, they cannot
nest indefinitely deeply.
However, given that they can be nested on the order of a million
deep, even on 32-bit systems, this should not be a serious
restriction.
This nesting limit would probably be reached long after the
compiler OOMed or the stack overflowed.
<li> Any code path that enters <tt>RCU_NONIDLE()</tt> must sequence
out of that same <tt>RCU_NONIDLE()</tt>.
For example, the following is grossly illegal:
<blockquote>
<pre>
1 RCU_NONIDLE({
2 do_something();
3 goto bad_idea; /* BUG!!! */
4 do_something_else();});
5 bad_idea:
</pre>
</blockquote>
<p>
It is just as illegal to transfer control into the middle of
<tt>RCU_NONIDLE()</tt>'s argument.
Yes, in theory, you could transfer in as long as you also
transferred out, but in practice you could also expect to get sharply
worded review comments.
</ol>
<p> <p>
It is similarly socially unacceptable to interrupt an It is similarly socially unacceptable to interrupt an
......
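The control-flow restriction documented in this hunk can be modeled in userspace. What follows is a minimal sketch, not the kernel's code: it assumes that RCU_NONIDLE() brackets its argument with an enter call and an exit call, and every name in it (MOCK_RCU_NONIDLE, mock_nonidle_enter, mock_nonidle_exit, nonidle_depth) is hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for RCU's "pay attention to this CPU" nesting count. */
static int nonidle_depth;

static void mock_nonidle_enter(void)
{
	nonidle_depth++;
}

static void mock_nonidle_exit(void)
{
	assert(nonidle_depth > 0);
	nonidle_depth--;
}

/* Assumed shape: enter, run the argument, exit. */
#define MOCK_RCU_NONIDLE(a) \
	do { \
		mock_nonidle_enter(); \
		do { a; } while (0); \
		mock_nonidle_exit(); \
	} while (0)

static int work_done;

static void do_something(void)
{
	work_done++;
}

/* Legal: control enters at the top and sequences out at the bottom. */
int balanced_use(void)
{
	MOCK_RCU_NONIDLE(do_something());
	return nonidle_depth;	/* enter and exit paired up */
}

/* The illegal pattern from the text: the goto skips the exit half. */
int unbalanced_use(void)
{
	MOCK_RCU_NONIDLE({
		do_something();
		goto bad_idea;	/* BUG: mock_nonidle_exit() never runs */
	});
bad_idea:
	return nonidle_depth;	/* one level is left dangling */
}
```

In this model the stray goto leaves the nesting count one too high, which in the real kernel would presumably mean RCU keeps watching a CPU that has gone back to idle.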
...@@ -49,7 +49,7 @@ rcupdate.rcu_task_stall_timeout ...@@ -49,7 +49,7 @@ rcupdate.rcu_task_stall_timeout
This boot/sysfs parameter controls the RCU-tasks stall warning This boot/sysfs parameter controls the RCU-tasks stall warning
interval. A value of zero or less suppresses RCU-tasks stall interval. A value of zero or less suppresses RCU-tasks stall
warnings. A positive value sets the stall-warning interval warnings. A positive value sets the stall-warning interval
in jiffies. An RCU-tasks stall warning starts wtih the line: in jiffies. An RCU-tasks stall warning starts with the line:
INFO: rcu_tasks detected stalls on tasks: INFO: rcu_tasks detected stalls on tasks:
......
...@@ -5,6 +5,9 @@ to start learning about RCU: ...@@ -5,6 +5,9 @@ to start learning about RCU:
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
2010 Big API Table http://lwn.net/Articles/419086/
5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
2014 Big API Table http://lwn.net/Articles/609973/
What is RCU? What is RCU?
......
...@@ -58,6 +58,7 @@ show up in /proc/sys/kernel: ...@@ -58,6 +58,7 @@ show up in /proc/sys/kernel:
- panic_on_stackoverflow - panic_on_stackoverflow
- panic_on_unrecovered_nmi - panic_on_unrecovered_nmi
- panic_on_warn - panic_on_warn
- panic_on_rcu_stall
- perf_cpu_time_max_percent - perf_cpu_time_max_percent
- perf_event_paranoid - perf_event_paranoid
- perf_event_max_stack - perf_event_max_stack
...@@ -618,6 +619,17 @@ a kernel rebuild when attempting to kdump at the location of a WARN(). ...@@ -618,6 +619,17 @@ a kernel rebuild when attempting to kdump at the location of a WARN().
============================================================== ==============================================================
panic_on_rcu_stall:
When set to 1, calls panic() after RCU stall detection messages. This
is useful for determining the root cause of RCU stalls using a vmcore.
0: do not panic() when RCU stall takes place, default behavior.
1: panic() after printing RCU stall messages.
==============================================================
perf_cpu_time_max_percent: perf_cpu_time_max_percent:
Hints to the kernel how much CPU time it should be allowed to Hints to the kernel how much CPU time it should be allowed to
......
...@@ -451,6 +451,7 @@ extern int panic_on_oops; ...@@ -451,6 +451,7 @@ extern int panic_on_oops;
extern int panic_on_unrecovered_nmi; extern int panic_on_unrecovered_nmi;
extern int panic_on_io_nmi; extern int panic_on_io_nmi;
extern int panic_on_warn; extern int panic_on_warn;
extern int sysctl_panic_on_rcu_stall;
extern int sysctl_panic_on_stackoverflow; extern int sysctl_panic_on_stackoverflow;
extern bool crash_kexec_post_notifiers; extern bool crash_kexec_post_notifiers;
......
...@@ -45,6 +45,7 @@ ...@@ -45,6 +45,7 @@
#include <linux/bug.h> #include <linux/bug.h>
#include <linux/compiler.h> #include <linux/compiler.h>
#include <linux/ktime.h> #include <linux/ktime.h>
#include <linux/irqflags.h>
#include <asm/barrier.h> #include <asm/barrier.h>
...@@ -379,12 +380,13 @@ static inline void rcu_init_nohz(void) ...@@ -379,12 +380,13 @@ static inline void rcu_init_nohz(void)
* in the inner idle loop. * in the inner idle loop.
* *
* This macro provides the way out: RCU_NONIDLE(do_something_with_RCU()) * This macro provides the way out: RCU_NONIDLE(do_something_with_RCU())
* will tell RCU that it needs to pay attending, invoke its argument * will tell RCU that it needs to pay attention, invoke its argument
* (in this example, a call to the do_something_with_RCU() function), * (in this example, calling the do_something_with_RCU() function),
* and then tell RCU to go back to ignoring this CPU. It is permissible * and then tell RCU to go back to ignoring this CPU. It is permissible
* to nest RCU_NONIDLE() wrappers, but the nesting level is currently * to nest RCU_NONIDLE() wrappers, but not indefinitely (but the limit is
* quite limited. If deeper nesting is required, it will be necessary * on the order of a million or so, even on 32-bit systems). It is
* to adjust DYNTICK_TASK_NESTING_VALUE accordingly. * not legal to block within RCU_NONIDLE(), nor is it permissible to
* transfer control either into or out of RCU_NONIDLE()'s statement.
*/ */
#define RCU_NONIDLE(a) \ #define RCU_NONIDLE(a) \
do { \ do { \
...@@ -649,7 +651,16 @@ static inline void rcu_preempt_sleep_check(void) ...@@ -649,7 +651,16 @@ static inline void rcu_preempt_sleep_check(void)
* please be careful when making changes to rcu_assign_pointer() and the * please be careful when making changes to rcu_assign_pointer() and the
* other macros that it invokes. * other macros that it invokes.
*/ */
#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v)) #define rcu_assign_pointer(p, v) \
({ \
uintptr_t _r_a_p__v = (uintptr_t)(v); \
\
if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
else \
smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
_r_a_p__v; \
})
/** /**
* rcu_access_pointer() - fetch RCU pointer with no dereferencing * rcu_access_pointer() - fetch RCU pointer with no dereferencing
......
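The rcu_assign_pointer() change above skips the release ordering when the stored value is a constant NULL, since a reader that fetches NULL has nothing to dereference. A rough userspace analogue with C11 atomics is sketched below. It rests on assumptions: the kernel macro decides at compile time via __builtin_constant_p(), whereas this sketch checks at run time, and the acquire load on the reader side is stronger than what rcu_dereference() needs. The names struct config, current_config, publish_config and read_config_value are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct config {
	int value;
};

/* Hypothetical RCU-protected pointer, modeled with a C11 atomic. */
static _Atomic(struct config *) current_config;

/* Sample structure used to exercise the sketch. */
struct config boot_config = { .value = 42 };

/* Publish a new structure, or NULL to retract the current one. */
void publish_config(struct config *newp)
{
	if (newp == NULL) {
		/* Nothing to order: a reader seeing NULL dereferences nothing. */
		atomic_store_explicit(&current_config, (struct config *)NULL,
				      memory_order_relaxed);
	} else {
		/* Release: initialization of *newp is visible before the pointer. */
		atomic_store_explicit(&current_config, newp,
				      memory_order_release);
	}
}

/* Reader side: returns the payload, or -1 if nothing is published. */
int read_config_value(void)
{
	struct config *p = atomic_load_explicit(&current_config,
						memory_order_acquire);

	return p ? p->value : -1;
}
```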
...@@ -50,6 +50,10 @@ ...@@ -50,6 +50,10 @@
do { if (verbose) pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); } while (0) do { if (verbose) pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); } while (0)
/* Definitions for online/offline exerciser. */ /* Definitions for online/offline exerciser. */
bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
unsigned long *sum_offl, int *min_offl, int *max_offl);
bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
unsigned long *sum_onl, int *min_onl, int *max_onl);
int torture_onoff_init(long ooholdoff, long oointerval); int torture_onoff_init(long ooholdoff, long oointerval);
void torture_onoff_stats(void); void torture_onoff_stats(void);
bool torture_onoff_failures(void); bool torture_onoff_failures(void);
......
...@@ -517,6 +517,7 @@ config SRCU ...@@ -517,6 +517,7 @@ config SRCU
config TASKS_RCU config TASKS_RCU
bool bool
default n default n
depends on !UML
select SRCU select SRCU
help help
This option enables a task-based RCU implementation that uses This option enables a task-based RCU implementation that uses
......
...@@ -58,7 +58,7 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>"); ...@@ -58,7 +58,7 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>");
#define VERBOSE_PERFOUT_ERRSTRING(s) \ #define VERBOSE_PERFOUT_ERRSTRING(s) \
do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0) do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0)
torture_param(bool, gp_exp, true, "Use expedited GP wait primitives"); torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
torture_param(int, nreaders, -1, "Number of RCU reader threads"); torture_param(int, nreaders, -1, "Number of RCU reader threads");
torture_param(int, nwriters, -1, "Number of RCU updater threads"); torture_param(int, nwriters, -1, "Number of RCU updater threads");
...@@ -96,12 +96,7 @@ static int rcu_perf_writer_state; ...@@ -96,12 +96,7 @@ static int rcu_perf_writer_state;
#define MAX_MEAS 10000 #define MAX_MEAS 10000
#define MIN_MEAS 100 #define MIN_MEAS 100
#if defined(MODULE) || defined(CONFIG_RCU_PERF_TEST_RUNNABLE) static int perf_runnable = IS_ENABLED(MODULE);
#define RCUPERF_RUNNABLE_INIT 1
#else
#define RCUPERF_RUNNABLE_INIT 0
#endif
static int perf_runnable = RCUPERF_RUNNABLE_INIT;
module_param(perf_runnable, int, 0444); module_param(perf_runnable, int, 0444);
MODULE_PARM_DESC(perf_runnable, "Start rcuperf at boot"); MODULE_PARM_DESC(perf_runnable, "Start rcuperf at boot");
...@@ -363,8 +358,6 @@ rcu_perf_writer(void *arg) ...@@ -363,8 +358,6 @@ rcu_perf_writer(void *arg)
u64 *wdpp = writer_durations[me]; u64 *wdpp = writer_durations[me];
VERBOSE_PERFOUT_STRING("rcu_perf_writer task started"); VERBOSE_PERFOUT_STRING("rcu_perf_writer task started");
WARN_ON(rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp);
WARN_ON(rcu_gp_is_normal() && gp_exp);
WARN_ON(!wdpp); WARN_ON(!wdpp);
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
sp.sched_priority = 1; sp.sched_priority = 1;
...@@ -631,12 +624,24 @@ rcu_perf_init(void) ...@@ -631,12 +624,24 @@ rcu_perf_init(void)
firsterr = -ENOMEM; firsterr = -ENOMEM;
goto unwind; goto unwind;
} }
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) {
VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
firsterr = -EINVAL;
goto unwind;
}
if (rcu_gp_is_normal() && gp_exp) {
VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
firsterr = -EINVAL;
goto unwind;
}
for (i = 0; i < nrealwriters; i++) { for (i = 0; i < nrealwriters; i++) {
writer_durations[i] = writer_durations[i] =
kcalloc(MAX_MEAS, sizeof(*writer_durations[i]), kcalloc(MAX_MEAS, sizeof(*writer_durations[i]),
GFP_KERNEL); GFP_KERNEL);
if (!writer_durations[i]) if (!writer_durations[i]) {
firsterr = -ENOMEM;
goto unwind; goto unwind;
}
firsterr = torture_create_kthread(rcu_perf_writer, (void *)i, firsterr = torture_create_kthread(rcu_perf_writer, (void *)i,
writer_tasks[i]); writer_tasks[i]);
if (firsterr) if (firsterr)
......
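The rcu_perf_init() hunk above sets firsterr before jumping to the unwind label, so a failed allocation is no longer reported as success. The pattern can be sketched in plain C as follows. This is an illustration under assumptions, not the rcuperf code: slots, alloc_slot and the fail_at fault-injection parameter are hypothetical, and the demo frees its buffers on both paths only to stay leak-free.

```c
#include <errno.h>
#include <stdlib.h>

#define NSLOTS 4

static long *slots[NSLOTS];

/* Hypothetical fault injection: the allocation for slot fail_at returns NULL. */
static long *alloc_slot(int i, int fail_at)
{
	return i == fail_at ? NULL : calloc(16, sizeof(long));
}

/* Returns 0 on success or a negative errno; pass fail_at = -1 for no failure. */
int init_slots(int fail_at)
{
	int firsterr = 0;
	int i;

	for (i = 0; i < NSLOTS; i++) {
		slots[i] = alloc_slot(i, fail_at);
		if (!slots[i]) {
			firsterr = -ENOMEM;	/* without this, the caller would see 0 */
			goto unwind;
		}
	}

unwind:
	/* Demo only: release everything on both the success and error paths. */
	for (i = 0; i < NSLOTS; i++) {
		free(slots[i]);
		slots[i] = NULL;
	}
	return firsterr;
}
```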
...@@ -182,12 +182,7 @@ static const char *rcu_torture_writer_state_getname(void) ...@@ -182,12 +182,7 @@ static const char *rcu_torture_writer_state_getname(void)
return rcu_torture_writer_state_names[i]; return rcu_torture_writer_state_names[i];
} }
#if defined(MODULE) || defined(CONFIG_RCU_TORTURE_TEST_RUNNABLE) static int torture_runnable = IS_ENABLED(MODULE);
#define RCUTORTURE_RUNNABLE_INIT 1
#else
#define RCUTORTURE_RUNNABLE_INIT 0
#endif
static int torture_runnable = RCUTORTURE_RUNNABLE_INIT;
module_param(torture_runnable, int, 0444); module_param(torture_runnable, int, 0444);
MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot"); MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot");
...@@ -1476,7 +1471,7 @@ static int rcu_torture_barrier_cbs(void *arg) ...@@ -1476,7 +1471,7 @@ static int rcu_torture_barrier_cbs(void *arg)
break; break;
/* /*
* The above smp_load_acquire() ensures barrier_phase load * The above smp_load_acquire() ensures barrier_phase load
* is ordered before the folloiwng ->call(). * is ordered before the following ->call().
*/ */
local_irq_disable(); /* Just to test no-irq call_rcu(). */ local_irq_disable(); /* Just to test no-irq call_rcu(). */
cur_ops->call(&rcu, rcu_torture_barrier_cbf); cur_ops->call(&rcu, rcu_torture_barrier_cbf);
......
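Both the rcuperf and rcutorture hunks replace an #if/#else ladder with IS_ENABLED(MODULE), which yields 1 when MODULE is defined to 1 (as it is when the file is built as a module) and 0 otherwise. The preprocessor trick behind such a test can be sketched in isolation. The macro names below are hypothetical simplifications rather than the kernel's definitions, and BUILT_AS_MODULE merely stands in for the MODULE symbol.

```c
/* If the tested symbol expands to 1, this placeholder injects an extra argument. */
#define ARG_PLACEHOLDER_1 0,
#define take_second_arg(ignored, val, ...) val

#define is_defined(x) is_defined_1(x)
#define is_defined_1(val) is_defined_2(ARG_PLACEHOLDER_##val)
/*
 * Defined to 1:  take_second_arg(0, 1, 0, 0)        -> 1
 * Not defined:   take_second_arg(<junk> 1, 0, 0)    -> 0
 */
#define is_defined_2(arg1_or_junk) take_second_arg(arg1_or_junk 1, 0, 0)

/* Stand-in for the symbol a module build defines; assumed for this demo. */
#define BUILT_AS_MODULE 1
/* BUILT_IN_ONLY is deliberately left undefined. */

/* Default for a "runnable" parameter when built as a module. */
int runnable_default_for_module(void)
{
	return is_defined(BUILT_AS_MODULE);
}

/* Default for a "runnable" parameter when built into the kernel image. */
int runnable_default_for_builtin(void)
{
	return is_defined(BUILT_IN_ONLY);
}
```

Because the result is an ordinary integer constant expression, it can initialize a variable directly, which is what lets the hunks drop the RCUPERF_RUNNABLE_INIT and RCUTORTURE_RUNNABLE_INIT helper macros.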
...@@ -253,6 +253,13 @@ struct rcu_node { ...@@ -253,6 +253,13 @@ struct rcu_node {
wait_queue_head_t exp_wq[4]; wait_queue_head_t exp_wq[4];
} ____cacheline_internodealigned_in_smp; } ____cacheline_internodealigned_in_smp;
/*
* Bitmasks in an rcu_node cover the interval [grplo, grphi] of CPU IDs, and
* are indexed relative to this interval rather than the global CPU ID space.
* This generates the bit for a CPU in node-local masks.
*/
#define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))
/* /*
* Do a full breadth-first scan of the rcu_node structures for the * Do a full breadth-first scan of the rcu_node structures for the
* specified rcu_state structure. * specified rcu_state structure.
...@@ -280,6 +287,14 @@ struct rcu_node { ...@@ -280,6 +287,14 @@ struct rcu_node {
for ((rnp) = (rsp)->level[rcu_num_lvls - 1]; \ for ((rnp) = (rsp)->level[rcu_num_lvls - 1]; \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++) (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
/*
* Iterate over all possible CPUs in a leaf RCU node.
*/
#define for_each_leaf_node_possible_cpu(rnp, cpu) \
for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
cpu <= rnp->grphi; \
cpu = cpumask_next((cpu), cpu_possible_mask))
/* /*
* Union to allow "aggregate OR" operation on the need for a quiescent * Union to allow "aggregate OR" operation on the need for a quiescent
* state by the normal and expedited grace periods. * state by the normal and expedited grace periods.
......
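The two macros added above, leaf_node_cpu_bit() and for_each_leaf_node_possible_cpu(), visit only the possible CPUs of a leaf node and derive each CPU's bit from its offset within [grplo, grphi]. The userspace sketch below mimics that idea under simplifying assumptions: the possible-CPU mask is a single unsigned long rather than a cpumask, grphi is assumed to be smaller than the number of bits in that word, and struct leaf, leaf_cpu_bit, next_possible and leaf_possible_mask are hypothetical names.

```c
#include <assert.h>

/* Hypothetical miniature of the grplo/grphi fields of a leaf rcu_node. */
struct leaf {
	int grplo;	/* lowest CPU ID covered by this leaf */
	int grphi;	/* highest CPU ID covered by this leaf */
};

/* Bit for a CPU, indexed relative to the leaf rather than globally. */
unsigned long leaf_cpu_bit(const struct leaf *rnp, int cpu)
{
	assert(cpu >= rnp->grplo && cpu <= rnp->grphi);
	return 1UL << (cpu - rnp->grplo);
}

/* Next set bit strictly after prev, or nbits if there is none. */
static int next_possible(unsigned long possible, int prev, int nbits)
{
	int cpu;

	for (cpu = prev + 1; cpu < nbits; cpu++)
		if (possible & (1UL << cpu))
			return cpu;
	return nbits;
}

/* Build the leaf-local mask of possible CPUs, skipping holes in the ID space. */
unsigned long leaf_possible_mask(const struct leaf *rnp, unsigned long possible)
{
	int nbits = (int)(sizeof(possible) * 8);
	unsigned long mask = 0;
	int cpu;

	for (cpu = next_possible(possible, rnp->grplo - 1, nbits);
	     cpu <= rnp->grphi;
	     cpu = next_possible(possible, cpu, nbits))
		mask |= leaf_cpu_bit(rnp, cpu);
	return mask;
}
```

For a leaf covering CPUs 16 through 31 with only CPUs 16, 18 and 31 possible, the loop body runs three times and sets bits 0, 2 and 15.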
...@@ -79,8 +79,6 @@ static void __init rcu_bootup_announce_oddness(void) ...@@ -79,8 +79,6 @@ static void __init rcu_bootup_announce_oddness(void)
pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n"); pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n");
if (IS_ENABLED(CONFIG_PROVE_RCU)) if (IS_ENABLED(CONFIG_PROVE_RCU))
pr_info("\tRCU lockdep checking is enabled.\n"); pr_info("\tRCU lockdep checking is enabled.\n");
if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_RUNNABLE))
pr_info("\tRCU torture testing starts during boot.\n");
if (RCU_NUM_LVLS >= 4) if (RCU_NUM_LVLS >= 4)
pr_info("\tFour(or more)-level hierarchy is enabled.\n"); pr_info("\tFour(or more)-level hierarchy is enabled.\n");
if (RCU_FANOUT_LEAF != 16) if (RCU_FANOUT_LEAF != 16)
...@@ -681,84 +679,6 @@ void synchronize_rcu(void) ...@@ -681,84 +679,6 @@ void synchronize_rcu(void)
} }
EXPORT_SYMBOL_GPL(synchronize_rcu); EXPORT_SYMBOL_GPL(synchronize_rcu);
/*
* Remote handler for smp_call_function_single(). If there is an
* RCU read-side critical section in effect, request that the
* next rcu_read_unlock() record the quiescent state up the
* ->expmask fields in the rcu_node tree. Otherwise, immediately
* report the quiescent state.
*/
static void sync_rcu_exp_handler(void *info)
{
struct rcu_data *rdp;
struct rcu_state *rsp = info;
struct task_struct *t = current;
/*
* Within an RCU read-side critical section, request that the next
* rcu_read_unlock() report. Unless this RCU read-side critical
* section has already blocked, in which case it is already set
* up for the expedited grace period to wait on it.
*/
if (t->rcu_read_lock_nesting > 0 &&
!t->rcu_read_unlock_special.b.blocked) {
t->rcu_read_unlock_special.b.exp_need_qs = true;
return;
}
/*
* We are either exiting an RCU read-side critical section (negative
* values of t->rcu_read_lock_nesting) or are not in one at all
* (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
* read-side critical section that blocked before this expedited
* grace period started. Either way, we can immediately report
* the quiescent state.
*/
rdp = this_cpu_ptr(rsp->rda);
rcu_report_exp_rdp(rsp, rdp, true);
}
/**
* synchronize_rcu_expedited - Brute-force RCU grace period
*
* Wait for an RCU-preempt grace period, but expedite it. The basic
* idea is to IPI all non-idle non-nohz online CPUs. The IPI handler
* checks whether the CPU is in an RCU-preempt critical section, and
* if so, it sets a flag that causes the outermost rcu_read_unlock()
* to report the quiescent state. On the other hand, if the CPU is
* not in an RCU read-side critical section, the IPI handler reports
* the quiescent state immediately.
*
* Although this is a great improvement over previous expedited
* implementations, it is still unfriendly to real-time workloads, so is
* thus not recommended for any sort of common-case code. In fact, if
* you are using synchronize_rcu_expedited() in a loop, please restructure
* your code to batch your updates, and then use a single synchronize_rcu()
* instead.
*/
void synchronize_rcu_expedited(void)
{
struct rcu_state *rsp = rcu_state_p;
unsigned long s;
/* If expedited grace periods are prohibited, fall back to normal. */
if (rcu_gp_is_normal()) {
wait_rcu_gp(call_rcu);
return;
}
s = rcu_exp_gp_seq_snap(rsp);
if (exp_funnel_lock(rsp, s))
return; /* Someone else did our work for us. */
/* Initialize the rcu_node tree in preparation for the wait. */
sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
/* Wait for ->blkd_tasks lists to drain, then wake everyone up. */
rcu_exp_wait_wake(rsp, s);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
/** /**
* rcu_barrier - Wait until all in-flight call_rcu() callbacks complete. * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
* *
...@@ -882,16 +802,6 @@ static void rcu_preempt_check_callbacks(void) ...@@ -882,16 +802,6 @@ static void rcu_preempt_check_callbacks(void)
{ {
} }
/*
* Wait for an rcu-preempt grace period, but make it happen quickly.
* But because preemptible RCU does not exist, map to rcu-sched.
*/
void synchronize_rcu_expedited(void)
{
synchronize_sched_expedited();
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
/* /*
* Because preemptible RCU does not exist, rcu_barrier() is just * Because preemptible RCU does not exist, rcu_barrier() is just
* another name for rcu_barrier_sched(). * another name for rcu_barrier_sched().
...@@ -1254,8 +1164,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) ...@@ -1254,8 +1164,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
return; return;
if (!zalloc_cpumask_var(&cm, GFP_KERNEL)) if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
return; return;
for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) for_each_leaf_node_possible_cpu(rnp, cpu)
if ((mask & 0x1) && cpu != outgoingcpu) if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
cpu != outgoingcpu)
cpumask_set_cpu(cpu, cm); cpumask_set_cpu(cpu, cm);
if (cpumask_weight(cm) == 0) if (cpumask_weight(cm) == 0)
cpumask_setall(cm); cpumask_setall(cm);
......
...@@ -528,6 +528,7 @@ static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10; ...@@ -528,6 +528,7 @@ static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
module_param(rcu_task_stall_timeout, int, 0644); module_param(rcu_task_stall_timeout, int, 0644);
static void rcu_spawn_tasks_kthread(void); static void rcu_spawn_tasks_kthread(void);
static struct task_struct *rcu_tasks_kthread_ptr;
/* /*
* Post an RCU-tasks callback. First call must be from process context * Post an RCU-tasks callback. First call must be from process context
...@@ -537,6 +538,7 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func) ...@@ -537,6 +538,7 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
{ {
unsigned long flags; unsigned long flags;
bool needwake; bool needwake;
bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);
rhp->next = NULL; rhp->next = NULL;
rhp->func = func; rhp->func = func;
...@@ -545,7 +547,9 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func) ...@@ -545,7 +547,9 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
*rcu_tasks_cbs_tail = rhp; *rcu_tasks_cbs_tail = rhp;
rcu_tasks_cbs_tail = &rhp->next; rcu_tasks_cbs_tail = &rhp->next;
raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
if (needwake) { /* We can't create the thread unless interrupts are enabled. */
if ((needwake && havetask) ||
(!havetask && !irqs_disabled_flags(flags))) {
rcu_spawn_tasks_kthread(); rcu_spawn_tasks_kthread();
wake_up(&rcu_tasks_cbs_wq); wake_up(&rcu_tasks_cbs_wq);
} }
...@@ -790,7 +794,6 @@ static int __noreturn rcu_tasks_kthread(void *arg) ...@@ -790,7 +794,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
static void rcu_spawn_tasks_kthread(void) static void rcu_spawn_tasks_kthread(void)
{ {
static DEFINE_MUTEX(rcu_tasks_kthread_mutex); static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
static struct task_struct *rcu_tasks_kthread_ptr;
struct task_struct *t; struct task_struct *t;
if (READ_ONCE(rcu_tasks_kthread_ptr)) { if (READ_ONCE(rcu_tasks_kthread_ptr)) {
......
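The new condition in call_rcu_tasks() is easier to check as a truth table. The sketch below isolates it as a pure function; should_spawn_and_wake is a hypothetical name, and irqs_disabled stands for the result of irqs_disabled_flags(flags) in the hunk.

```c
#include <stdbool.h>

/*
 * needwake:      the callback list was empty before this enqueue
 * havetask:      the RCU-tasks kthread already exists
 * irqs_disabled: the caller had interrupts disabled
 */
bool should_spawn_and_wake(bool needwake, bool havetask, bool irqs_disabled)
{
	/* Kthread exists: wake it only for the first callback on an empty list. */
	if (havetask)
		return needwake;
	/* No kthread yet: creating it takes a mutex, so interrupts must be enabled. */
	return !irqs_disabled;
}
```

This is the same predicate as (needwake && havetask) || (!havetask && !irqs_disabled), split by case. When the first call arrives with interrupts disabled, the callback is queued and kthread creation is left to a later caller running with interrupts enabled.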
...@@ -1204,6 +1204,17 @@ static struct ctl_table kern_table[] = { ...@@ -1204,6 +1204,17 @@ static struct ctl_table kern_table[] = {
.extra1 = &one, .extra1 = &one,
.extra2 = &one, .extra2 = &one,
}, },
#endif
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
{
.procname = "panic_on_rcu_stall",
.data = &sysctl_panic_on_rcu_stall,
.maxlen = sizeof(sysctl_panic_on_rcu_stall),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &one,
},
#endif #endif
{ } { }
}; };
......
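The table entry above bounds panic_on_rcu_stall to the range [0, 1] through .extra1 and .extra2. A small userspace model of that validation is sketched below, assuming that proc_dointvec_minmax rejects out-of-range writes with -EINVAL and leaves the stored value untouched. The names set_panic_on_rcu_stall_model and panic_on_rcu_stall_model are hypothetical, and the parsing is deliberately simpler than the kernel's.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for sysctl_panic_on_rcu_stall. */
int panic_on_rcu_stall_model;

/* Accepts "0" or "1", optionally followed by a newline as written by echo. */
int set_panic_on_rcu_stall_model(const char *text)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno || end == text)
		return -EINVAL;		/* not a number at all */
	if (*end != '\0' && *end != '\n')
		return -EINVAL;		/* trailing garbage */
	if (val < 0 || val > 1)
		return -EINVAL;		/* outside [extra1, extra2] */
	panic_on_rcu_stall_model = (int)val;
	return 0;
}
```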
...@@ -81,6 +81,104 @@ static unsigned long sum_online; ...@@ -81,6 +81,104 @@ static unsigned long sum_online;
static int min_online = -1; static int min_online = -1;
static int max_online; static int max_online;
/*
* Attempt to take a CPU offline. Return false if the CPU is already
* offline or if it is not subject to CPU-hotplug operations. The
* caller can detect other failures by looking at the statistics.
*/
bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
unsigned long *sum_offl, int *min_offl, int *max_offl)
{
unsigned long delta;
int ret;
unsigned long starttime;
if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
return false;
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offlining %d\n",
torture_type, cpu);
starttime = jiffies;
(*n_offl_attempts)++;
ret = cpu_down(cpu);
if (ret) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offline %d failed: errno %d\n",
torture_type, cpu, ret);
} else {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offlined %d\n",
torture_type, cpu);
(*n_offl_successes)++;
delta = jiffies - starttime;
*sum_offl += delta;
if (*min_offl < 0) {
*min_offl = delta;
*max_offl = delta;
}
if (*min_offl > delta)
*min_offl = delta;
if (*max_offl < delta)
*max_offl = delta;
}
return true;
}
EXPORT_SYMBOL_GPL(torture_offline);
/*
* Attempt to bring a CPU online. Return false if the CPU is already
* online or if it is not subject to CPU-hotplug operations. The
* caller can detect other failures by looking at the statistics.
*/
bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
unsigned long *sum_onl, int *min_onl, int *max_onl)
{
unsigned long delta;
int ret;
unsigned long starttime;
if (cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
return false;
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: onlining %d\n",
torture_type, cpu);
starttime = jiffies;
(*n_onl_attempts)++;
ret = cpu_up(cpu);
if (ret) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: online %d failed: errno %d\n",
torture_type, cpu, ret);
} else {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: onlined %d\n",
torture_type, cpu);
(*n_onl_successes)++;
delta = jiffies - starttime;
*sum_onl += delta;
if (*min_onl < 0) {
*min_onl = delta;
*max_onl = delta;
}
if (*min_onl > delta)
*min_onl = delta;
if (*max_onl < delta)
*max_onl = delta;
}
return true;
}
EXPORT_SYMBOL_GPL(torture_online);
/* /*
* Execute random CPU-hotplug operations at the interval specified * Execute random CPU-hotplug operations at the interval specified
* by the onoff_interval. * by the onoff_interval.
...@@ -89,16 +187,19 @@ static int ...@@ -89,16 +187,19 @@ static int
torture_onoff(void *arg) torture_onoff(void *arg)
{ {
int cpu; int cpu;
unsigned long delta;
int maxcpu = -1; int maxcpu = -1;
DEFINE_TORTURE_RANDOM(rand); DEFINE_TORTURE_RANDOM(rand);
int ret;
unsigned long starttime;
VERBOSE_TOROUT_STRING("torture_onoff task started"); VERBOSE_TOROUT_STRING("torture_onoff task started");
for_each_online_cpu(cpu) for_each_online_cpu(cpu)
maxcpu = cpu; maxcpu = cpu;
WARN_ON(maxcpu < 0); WARN_ON(maxcpu < 0);
if (maxcpu == 0) {
VERBOSE_TOROUT_STRING("Only one CPU, so CPU-hotplug testing is disabled");
goto stop;
}
if (onoff_holdoff > 0) { if (onoff_holdoff > 0) {
VERBOSE_TOROUT_STRING("torture_onoff begin holdoff"); VERBOSE_TOROUT_STRING("torture_onoff begin holdoff");
schedule_timeout_interruptible(onoff_holdoff); schedule_timeout_interruptible(onoff_holdoff);
...@@ -106,69 +207,16 @@ torture_onoff(void *arg) ...@@ -106,69 +207,16 @@ torture_onoff(void *arg)
} }
while (!torture_must_stop()) { while (!torture_must_stop()) {
cpu = (torture_random(&rand) >> 4) % (maxcpu + 1); cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
if (cpu_online(cpu) && cpu_is_hotpluggable(cpu)) { if (!torture_offline(cpu,
if (verbose) &n_offline_attempts, &n_offline_successes,
pr_alert("%s" TORTURE_FLAG &sum_offline, &min_offline, &max_offline))
"torture_onoff task: offlining %d\n", torture_online(cpu,
torture_type, cpu); &n_online_attempts, &n_online_successes,
starttime = jiffies; &sum_online, &min_online, &max_online);
n_offline_attempts++;
ret = cpu_down(cpu);
if (ret) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offline %d failed: errno %d\n",
torture_type, cpu, ret);
} else {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offlined %d\n",
torture_type, cpu);
n_offline_successes++;
delta = jiffies - starttime;
sum_offline += delta;
if (min_offline < 0) {
min_offline = delta;
max_offline = delta;
}
if (min_offline > delta)
min_offline = delta;
if (max_offline < delta)
max_offline = delta;
}
} else if (cpu_is_hotpluggable(cpu)) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: onlining %d\n",
torture_type, cpu);
starttime = jiffies;
n_online_attempts++;
ret = cpu_up(cpu);
if (ret) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: online %d failed: errno %d\n",
torture_type, cpu, ret);
} else {
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: onlined %d\n",
torture_type, cpu);
n_online_successes++;
delta = jiffies - starttime;
sum_online += delta;
if (min_online < 0) {
min_online = delta;
max_online = delta;
}
if (min_online > delta)
min_online = delta;
if (max_online < delta)
max_online = delta;
}
}
schedule_timeout_interruptible(onoff_interval); schedule_timeout_interruptible(onoff_interval);
} }
stop:
torture_kthread_stopping("torture_onoff"); torture_kthread_stopping("torture_onoff");
return 0; return 0;
} }
......
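After this refactoring, torture_offline() and torture_online() update the caller's statistics through pointer parameters instead of touching file-scope variables directly. The bookkeeping on the success path can be sketched on its own as follows. The function names record_success and record_offline_success are hypothetical; the variable names mirror the file-scope statistics passed in by torture_onoff().

```c
/* File-scope statistics, as in the caller; min starts at -1 ("no sample yet"). */
long n_offline_successes;
unsigned long sum_offline;
int min_offline = -1;
int max_offline;

/* Fold one successful operation's latency into the caller's statistics. */
void record_success(unsigned long delta, long *n_successes,
		    unsigned long *sum, int *min, int *max)
{
	(*n_successes)++;
	*sum += delta;	/* must dereference: "sum += delta" would only move the pointer */
	if (*min < 0) {
		/* First sample seeds both extremes. */
		*min = (int)delta;
		*max = (int)delta;
	}
	if ((unsigned long)*min > delta)
		*min = (int)delta;
	if ((unsigned long)*max < delta)
		*max = (int)delta;
}

/* Convenience wrapper for the offline counters. */
void record_offline_success(unsigned long delta)
{
	record_success(delta, &n_offline_successes, &sum_offline,
		       &min_offline, &max_offline);
}
```

Passing the counters by address is what allows other callers to reuse the online and offline helpers with their own statistics.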
...@@ -1307,22 +1307,6 @@ config RCU_PERF_TEST ...@@ -1307,22 +1307,6 @@ config RCU_PERF_TEST
Say M if you want the RCU performance tests to build as a module. Say M if you want the RCU performance tests to build as a module.
Say N if you are unsure. Say N if you are unsure.
config RCU_PERF_TEST_RUNNABLE
bool "performance tests for RCU runnable by default"
depends on RCU_PERF_TEST = y
default n
help
This option provides a way to build the RCU performance tests
directly into the kernel without them starting up at boot time.
You can use /sys/module to manually override this setting.
This /proc file is available only when the RCU performance
tests have been built into the kernel.
Say Y here if you want the RCU performance tests to start during
boot (you probably don't).
Say N here if you want the RCU performance tests to start only
after being manually enabled via /sys/module.
config RCU_TORTURE_TEST config RCU_TORTURE_TEST
tristate "torture tests for RCU" tristate "torture tests for RCU"
depends on DEBUG_KERNEL depends on DEBUG_KERNEL
...@@ -1340,23 +1324,6 @@ config RCU_TORTURE_TEST ...@@ -1340,23 +1324,6 @@ config RCU_TORTURE_TEST
Say M if you want the RCU torture tests to build as a module. Say M if you want the RCU torture tests to build as a module.
Say N if you are unsure. Say N if you are unsure.
config RCU_TORTURE_TEST_RUNNABLE
bool "torture tests for RCU runnable by default"
depends on RCU_TORTURE_TEST = y
default n
help
This option provides a way to build the RCU torture tests
directly into the kernel without them starting up at boot
time. You can use /proc/sys/kernel/rcutorture_runnable
to manually override this setting. This /proc file is
available only when the RCU torture tests have been built
into the kernel.
Say Y here if you want the RCU torture tests to start during
boot (you probably don't).
Say N here if you want the RCU torture tests to start only
after being manually enabled via /proc.
config RCU_TORTURE_TEST_SLOW_PREINIT config RCU_TORTURE_TEST_SLOW_PREINIT
bool "Slow down RCU grace-period pre-initialization to expose races" bool "Slow down RCU grace-period pre-initialization to expose races"
depends on RCU_TORTURE_TEST depends on RCU_TORTURE_TEST
......
...@@ -99,8 +99,9 @@ configfrag_hotplug_cpu () { ...@@ -99,8 +99,9 @@ configfrag_hotplug_cpu () {
# identify_boot_image qemu-cmd # identify_boot_image qemu-cmd
# #
# Returns the relative path to the kernel build image. This will be # Returns the relative path to the kernel build image. This will be
# arch/<arch>/boot/bzImage unless overridden with the TORTURE_BOOT_IMAGE # arch/<arch>/boot/bzImage or vmlinux if bzImage is not a target for the
# environment variable. # architecture, unless overridden with the TORTURE_BOOT_IMAGE environment
# variable.
identify_boot_image () { identify_boot_image () {
if test -n "$TORTURE_BOOT_IMAGE" if test -n "$TORTURE_BOOT_IMAGE"
then then
...@@ -110,11 +111,8 @@ identify_boot_image () { ...@@ -110,11 +111,8 @@ identify_boot_image () {
qemu-system-x86_64|qemu-system-i386) qemu-system-x86_64|qemu-system-i386)
echo arch/x86/boot/bzImage echo arch/x86/boot/bzImage
;; ;;
qemu-system-ppc64)
echo arch/powerpc/boot/bzImage
;;
*) *)
echo "" echo vmlinux
;; ;;
esac esac
fi fi
...@@ -175,7 +173,7 @@ identify_qemu_args () { ...@@ -175,7 +173,7 @@ identify_qemu_args () {
qemu-system-x86_64|qemu-system-i386) qemu-system-x86_64|qemu-system-i386)
;; ;;
qemu-system-ppc64) qemu-system-ppc64)
echo -enable-kvm -M pseries -cpu POWER7 -nodefaults echo -enable-kvm -M pseries -nodefaults
echo -device spapr-vscsi echo -device spapr-vscsi
if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC" if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC"
then then
......
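The boot-image selection in the hunk above can be sketched as a stand-alone function. This is a minimal illustration, not the script's exact code: the name `pick_boot_image` is hypothetical, and the override branch is an assumption because the diff elides the body of the `TORTURE_BOOT_IMAGE` test.

```shell
# Hypothetical stand-alone sketch of the selection logic shown in the diff.
# $1 is the qemu command name, e.g. qemu-system-x86_64.
pick_boot_image () {
	if test -n "$TORTURE_BOOT_IMAGE"
	then
		# Assumption: an explicit override is echoed back unchanged.
		echo "$TORTURE_BOOT_IMAGE"
		return
	fi
	case "$1" in
	qemu-system-x86_64|qemu-system-i386)
		echo arch/x86/boot/bzImage
		;;
	*)
		# Any other architecture falls back to the top-level vmlinux.
		echo vmlinux
		;;
	esac
}
```

With this shape, an architecture without a bzImage target (ppc64, for example) no longer yields an empty string, so the caller's `test -n "$BOOT_IMAGE"` check succeeds and the run proceeds with vmlinux.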
...@@ -8,9 +8,9 @@ ...@@ -8,9 +8,9 @@
# #
# Usage: kvm-test-1-run.sh config builddir resdir seconds qemu-args boot_args # Usage: kvm-test-1-run.sh config builddir resdir seconds qemu-args boot_args
# #
# qemu-args defaults to "-enable-kvm -soundhw pcspk -nographic", along with # qemu-args defaults to "-enable-kvm -nographic", along with arguments
# arguments specifying the number of CPUs and other # specifying the number of CPUs and other options
# options generated from the underlying CPU architecture. # generated from the underlying CPU architecture.
# boot_args defaults to value returned by the per_version_boot_params # boot_args defaults to value returned by the per_version_boot_params
# shell function. # shell function.
# #
...@@ -96,7 +96,8 @@ if test "$base_resdir" != "$resdir" -a -f $base_resdir/bzImage -a -f $base_resdi ...@@ -96,7 +96,8 @@ if test "$base_resdir" != "$resdir" -a -f $base_resdir/bzImage -a -f $base_resdi
then then
# Rerunning previous test, so use that test's kernel. # Rerunning previous test, so use that test's kernel.
QEMU="`identify_qemu $base_resdir/vmlinux`" QEMU="`identify_qemu $base_resdir/vmlinux`"
KERNEL=$base_resdir/bzImage BOOT_IMAGE="`identify_boot_image $QEMU`"
KERNEL=$base_resdir/${BOOT_IMAGE##*/} # use the last component of ${BOOT_IMAGE}
ln -s $base_resdir/Make*.out $resdir # for kvm-recheck.sh ln -s $base_resdir/Make*.out $resdir # for kvm-recheck.sh
ln -s $base_resdir/.config $resdir # for kvm-recheck.sh ln -s $base_resdir/.config $resdir # for kvm-recheck.sh
elif kvm-build.sh $config_template $builddir $T elif kvm-build.sh $config_template $builddir $T
...@@ -110,7 +111,7 @@ then ...@@ -110,7 +111,7 @@ then
if test -n "$BOOT_IMAGE" if test -n "$BOOT_IMAGE"
then then
cp $builddir/$BOOT_IMAGE $resdir cp $builddir/$BOOT_IMAGE $resdir
KERNEL=$resdir/bzImage KERNEL=$resdir/${BOOT_IMAGE##*/}
else else
echo No identifiable boot image, not running KVM, see $resdir. echo No identifiable boot image, not running KVM, see $resdir.
echo Do the torture scripts know about your architecture? echo Do the torture scripts know about your architecture?
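The `${BOOT_IMAGE##*/}` expansion used above strips the longest prefix matching `*/`, leaving only the last path component. A minimal sketch of how the kernel path is derived, using a hypothetical helper name `kernel_path`:

```shell
# Hypothetical helper illustrating the ${var##*/} expansion.
# $1: results directory, $2: boot image path relative to the build tree.
kernel_path () {
	# "##*/" removes everything up to and including the last slash,
	# so arch/x86/boot/bzImage becomes bzImage and vmlinux stays vmlinux.
	echo "$1/${2##*/}"
}
```

For example, `kernel_path /tmp/res arch/x86/boot/bzImage` prints `/tmp/res/bzImage`, while `kernel_path /tmp/res vmlinux` prints `/tmp/res/vmlinux`, which is why the hard-coded `bzImage` name could be dropped.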
...@@ -147,7 +148,7 @@ then ...@@ -147,7 +148,7 @@ then
fi fi
# Generate -smp qemu argument. # Generate -smp qemu argument.
qemu_args="-enable-kvm -soundhw pcspk -nographic $qemu_args" qemu_args="-enable-kvm -nographic $qemu_args"
cpu_count=`configNR_CPUS.sh $config_template` cpu_count=`configNR_CPUS.sh $config_template`
cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"`
vcpus=`identify_qemu_vcpus` vcpus=`identify_qemu_vcpus`
...@@ -229,6 +230,7 @@ fi ...@@ -229,6 +230,7 @@ fi
if test $commandcompleted -eq 0 -a -n "$qemu_pid" if test $commandcompleted -eq 0 -a -n "$qemu_pid"
then then
echo Grace period for qemu job at pid $qemu_pid echo Grace period for qemu job at pid $qemu_pid
oldline="`tail $resdir/console.log`"
while : while :
do do
kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
...@@ -238,13 +240,29 @@ then ...@@ -238,13 +240,29 @@ then
else else
break break
fi fi
if test $kruntime -ge $((seconds + $TORTURE_SHUTDOWN_GRACE)) must_continue=no
newline="`tail $resdir/console.log`"
if test "$newline" != "$oldline" && echo $newline | grep -q ' [0-9]\+us : '
then
must_continue=yes
fi
last_ts="`tail $resdir/console.log | grep '^\[ *[0-9]\+\.[0-9]\+]' | tail -1 | sed -e 's/^\[ *//' -e 's/\..*$//'`"
if test -z "$last_ts"
then
last_ts=0
fi
if test "$newline" != "$oldline" -a "$last_ts" -lt $((seconds + $TORTURE_SHUTDOWN_GRACE))
then
must_continue=yes
fi
if test $must_continue = no -a $kruntime -ge $((seconds + $TORTURE_SHUTDOWN_GRACE))
then then
echo "!!! PID $qemu_pid hung at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1 echo "!!! PID $qemu_pid hung at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1
kill -KILL $qemu_pid kill -KILL $qemu_pid
break break
fi fi
sleep 1 oldline=$newline
sleep 10
done done
elif test -z "$qemu_pid" elif test -z "$qemu_pid"
then then
......
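The timestamp extraction in the grace-period loop above can be tried in isolation. The following is a sketch under stated assumptions: the helper name `last_console_ts` is hypothetical, and it assumes console lines carry the usual `[  seconds.microseconds]` printk prefix.

```shell
# Hypothetical helper: print the whole-second part of the last kernel
# timestamp found in the final lines of a console log, or 0 if none.
# $1: path to the console log.
last_console_ts () {
	ts=$(tail "$1" |
		grep '^\[ *[0-9]\+\.[0-9]\+]' |
		tail -1 |
		sed -e 's/^\[ *//' -e 's/\..*$//')
	if test -z "$ts"
	then
		# No timestamped line in the tail of the log.
		ts=0
	fi
	echo "$ts"
}
```

The loop compares this value against `seconds + TORTURE_SHUTDOWN_GRACE`: as long as the console is still changing and the guest's own clock has not passed that limit, the qemu job is given more time instead of being killed.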
...@@ -48,7 +48,7 @@ resdir="" ...@@ -48,7 +48,7 @@ resdir=""
configs="" configs=""
cpus=0 cpus=0
ds=`date +%Y.%m.%d-%H:%M:%S` ds=`date +%Y.%m.%d-%H:%M:%S`
jitter=0 jitter="-1"
. functions.sh . functions.sh
......
...@@ -33,7 +33,7 @@ if grep -Pq '\x00' < $file ...@@ -33,7 +33,7 @@ if grep -Pq '\x00' < $file
then then
print_warning Console output contains nul bytes, old qemu still running? print_warning Console output contains nul bytes, old qemu still running?
fi fi
egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state' < $file | grep -v 'ODEBUG: ' | grep -v 'Warning: unable to open an initial console' > $1.diags egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for' < $file | grep -v 'ODEBUG: ' | grep -v 'Warning: unable to open an initial console' > $1.diags
if test -s $1.diags if test -s $1.diags
then then
print_warning Assertion failure in $file $title print_warning Assertion failure in $file $title
...@@ -69,6 +69,11 @@ then ...@@ -69,6 +69,11 @@ then
then then
summary="$summary Stalls: $n_stalls" summary="$summary Stalls: $n_stalls"
fi fi
n_starves=`grep -c 'rcu_.*kthread starved for' $1`
if test "$n_starves" -ne 0
then
summary="$summary Starves: $n_starves"
fi
print_warning Summary: $summary print_warning Summary: $summary
else else
rm $1.diags rm $1.diags
......
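The starvation count added to the summary above relies on `grep -c`, which prints the number of matching lines. A minimal sketch, assuming a hypothetical helper name `count_starves` and a console log passed as its only argument:

```shell
# Hypothetical helper: count kthread-starvation messages in a console log.
# $1: path to the console log.
count_starves () {
	# grep -c prints 0 (and exits non-zero) when nothing matches,
	# so the output is always a number suitable for "test -ne".
	grep -c 'rcu_.*kthread starved for' "$1"
}
```

A caller can then append the count to the summary only when it is nonzero, as the script does with `test "$n_starves" -ne 0`.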
...@@ -13,6 +13,22 @@ cd initrd ...@@ -13,6 +13,22 @@ cd initrd
cpio -id < /tmp/initrd.img.zcat cpio -id < /tmp/initrd.img.zcat
------------------------------------------------------------------------ ------------------------------------------------------------------------
Another way to create an initramfs image is to use "dracut"[1], which is
available on many distros. However, the initramfs that dracut generates is
a cpio archive with another cpio archive inside it, so an extra step is
needed to create the initrd directory hierarchy.
Here are the commands to create an initrd directory for rcutorture using
dracut:
------------------------------------------------------------------------
dracut --no-hostonly --no-hostonly-cmdline --module "base bash shutdown" /tmp/initramfs.img
cd tools/testing/selftests/rcutorture
mkdir initrd
cd initrd
/usr/lib/dracut/skipcpio /tmp/initramfs.img | zcat | cpio -id
------------------------------------------------------------------------
Interestingly enough, if you are running rcutorture, you don't really Interestingly enough, if you are running rcutorture, you don't really
need userspace in many cases. Running without userspace has the need userspace in many cases. Running without userspace has the
advantage of allowing you to test your kernel independently of the advantage of allowing you to test your kernel independently of the
...@@ -89,3 +105,9 @@ while : ...@@ -89,3 +105,9 @@ while :
do do
sleep 10 sleep 10
done done
------------------------------------------------------------------------
References:
[1]: https://dracut.wiki.kernel.org/index.php/Main_Page
[2]: http://blog.elastocloud.org/2015/06/rapid-linux-kernel-devtest-with-qemu.html
[3]: https://www.centos.org/forums/viewtopic.php?t=51621