Commit af96134d authored by Linus Torvalds

Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:
 "Documentation updates

  Miscellaneous fixes, perhaps most notably:

   - Remove RCU_NONIDLE(). The new visibility of most of the idle loop
     to RCU has obsoleted this API.

   - Make the RCU_SOFTIRQ callback-invocation time limit also apply to
     the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.

   - Add a jiffies-based callback-invocation time limit to handle
     long-running callbacks. (The local_clock() function is only invoked
     once per 32 callbacks due to its high overhead.)

   - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which
     fixes a bug that can occur on systems with non-contiguous CPU
     numbering.

  kvfree_rcu updates:

   - Eliminate the single-argument variant of k[v]free_rcu() now that
     all uses have been converted to k[v]free_rcu_mightsleep().

   - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too
     soon. Yes, this is closing the barn door after the horse has
     escaped, but Murphy says that there will be more horses.

  Callback-offloading updates:

   - Fix a number of bugs involving the shrinker and lazy callbacks.

  Tasks RCU updates

  Torture-test updates"

* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
  torture: Remove duplicated argument -enable-kvm for ppc64
  doc/rcutorture: Add description of rcutorture.stall_cpu_block
  rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
  rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
  rcutorture: Correct name of use_softirq module parameter
  locktorture: Add long_hold to adjust lock-hold delays
  rcu/nocb: Make shrinker iterate only over NOCB CPUs
  rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
  rcu: Make rcu_cpu_starting() rely on interrupts being disabled
  rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
  rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
  rcu: Employ jiffies-based backstop to callback time limit
  rcu: Check callback-invocation time limit for rcuc kthreads
  rcu: Remove RCU_NONIDLE()
  rcu: Add more RCU files to kernel-api.rst
  rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
  rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
  rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
  rcu/nocb: Fix shrinker race against callback enqueuer
  rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
  ...
parents 1ef6663a 2e31da75
...@@ -2071,41 +2071,7 @@ call.
Because RCU avoids interrupting idle CPUs, it is illegal to execute an
RCU read-side critical section on an idle CPU. (Kernels built with
``CONFIG_PROVE_RCU=y`` will splat if you try it.) The RCU_NONIDLE()
macro and ``_rcuidle`` event tracing is provided to work around this
restriction. In addition, rcu_is_watching() may be used to test
whether or not it is currently legal to run RCU read-side critical
sections on this CPU. I learned of the need for diagnostics on the one
hand and RCU_NONIDLE() on the other while inspecting idle-loop code.
Steven Rostedt supplied ``_rcuidle`` event tracing, which is used quite
heavily in the idle loop. However, there are some restrictions on the
code placed within RCU_NONIDLE():
#. Blocking is prohibited. In practice, this is not a serious
restriction given that idle tasks are prohibited from blocking to
begin with.
#. Although nesting RCU_NONIDLE() is permitted, they cannot nest
indefinitely deeply. However, given that they can be nested on the
order of a million deep, even on 32-bit systems, this should not be a
serious restriction. This nesting limit would probably be reached
long after the compiler OOMed or the stack overflowed.
#. Any code path that enters RCU_NONIDLE() must sequence out of that
same RCU_NONIDLE(). For example, the following is grossly
illegal:
::
1 RCU_NONIDLE({
2 do_something();
3 goto bad_idea; /* BUG!!! */
4 do_something_else();});
5 bad_idea:
It is just as illegal to transfer control into the middle of
RCU_NONIDLE()'s argument. Yes, in theory, you could transfer in
as long as you also transferred out, but in practice you could also
expect to get sharply worded review comments.
It is similarly socially unacceptable to interrupt an ``nohz_full`` CPU
running in userspace. RCU must therefore track ``nohz_full`` userspace
...
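The removed passage above points at rcu_is_watching() as the remaining way to check whether RCU read-side critical sections are currently legal on this CPU. A minimal sketch of that check, with a hypothetical helper name, might look like this:

    #include <linux/rcupdate.h>

    /* Hypothetical diagnostic helper for code that can run where RCU is not watching. */
    static void maybe_report_state(void)
    {
            if (!rcu_is_watching())
                    return;         /* An RCU read-side critical section here would be illegal. */

            rcu_read_lock();
            /* ... dereference RCU-protected state ... */
            rcu_read_unlock();
    }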
...@@ -1117,7 +1117,6 @@ All: lockdep-checked RCU utility APIs::
RCU_LOCKDEP_WARN
rcu_sleep_check
RCU_NONIDLE
All: Unchecked RCU-protected pointer access::
...
...@@ -4731,43 +4731,6 @@
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree.
rcutree.use_softirq= [KNL]
If set to zero, move all RCU_SOFTIRQ processing to
per-CPU rcuc kthreads. Defaults to a non-zero
value, meaning that RCU_SOFTIRQ is used by default.
Specify rcutree.use_softirq=0 to use rcuc kthreads.
But note that CONFIG_PREEMPT_RT=y kernels disable
this kernel boot parameter, forcibly setting it
to zero.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
tree. This is used by rcutorture, and might
possibly be useful for architectures having high
cache-to-cache transfer latencies.
rcutree.rcu_fanout_leaf= [KNL]
Change the number of CPUs assigned to each
leaf rcu_node structure. Useful for very
large systems, which will choose the value 64,
and for NUMA systems with large remote-access
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache allows to reduce the
pressure to page allocator, also it makes the
whole algorithm to behave better in low memory
condition.
rcutree.rcu_delay_page_cache_fill_msec= [KNL]
Set the page-cache refill delay (in milliseconds)
in response to low-memory conditions. The range
of permitted values is in the range 0:100000.
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
...@@ -4806,21 +4769,6 @@
When RCU_NOCB_CPU is set, also adjust the
priority of NOCB callback kthreads.
rcutree.rcu_divisor= [KNL]
Set the shift-right count to use to compute
the callback-invocation batch limit bl from
the number of callbacks queued on this CPU.
The result will be bounded below by the value of
the rcutree.blimit kernel parameter. Every bl
callbacks, the softirq handler will exit in
order to allow the CPU to do other work.
Please note that this callback-invocation batch
limit applies only to non-offloaded callback
invocation. Offloaded callbacks are instead
invoked in the context of an rcuoc kthread, which
scheduler will preempt as it does any other task.
rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
On callback-offloaded (rcu_nocbs) CPUs,
RCU reduces the lock contention that would
...@@ -4834,14 +4782,6 @@
the ->nocb_bypass queue. The definition of "too
many" is supplied by this kernel boot parameter.
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
of the number of CPUs. Larger numbers reduce
the wakeup overhead on the global grace-period
kthread, but increases that same overhead on
each group's NOCB grace-period kthread.
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
batch limiting is disabled.
...@@ -4859,6 +4799,56 @@
on rcutree.qhimark at boot time and to zero to
disable more aggressive help enlistment.
rcutree.rcu_delay_page_cache_fill_msec= [KNL]
Set the page-cache refill delay (in milliseconds)
in response to low-memory conditions. The range
of permitted values is in the range 0:100000.
rcutree.rcu_divisor= [KNL]
Set the shift-right count to use to compute
the callback-invocation batch limit bl from
the number of callbacks queued on this CPU.
The result will be bounded below by the value of
the rcutree.blimit kernel parameter. Every bl
callbacks, the softirq handler will exit in
order to allow the CPU to do other work.
Please note that this callback-invocation batch
limit applies only to non-offloaded callback
invocation. Offloaded callbacks are instead
invoked in the context of an rcuoc kthread, which
scheduler will preempt as it does any other task.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
tree. This is used by rcutorture, and might
possibly be useful for architectures having high
cache-to-cache transfer latencies.
rcutree.rcu_fanout_leaf= [KNL]
Change the number of CPUs assigned to each
leaf rcu_node structure. Useful for very
large systems, which will choose the value 64,
and for NUMA systems with large remote-access
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache allows to reduce the
pressure to page allocator, also it makes the
whole algorithm to behave better in low memory
condition.
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
of the number of CPUs. Larger numbers reduce
the wakeup overhead on the global grace-period
kthread, but increases that same overhead on
each group's NOCB grace-period kthread.
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
wake_up() if it sleeps three times longer than
...@@ -4866,6 +4856,13 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
rcutree.rcu_resched_ns= [KNL]
Limit the time spend invoking a batch of RCU
callbacks to the specified number of nanoseconds.
By default, this limit is checked only once
every 32 callbacks in order to limit the pain
inflicted by local_clock() overhead.
rcutree.rcu_unlock_delay= [KNL]
In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
this specifies an rcu_read_unlock()-time delay
...@@ -4880,6 +4877,16 @@
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
rcutree.use_softirq= [KNL]
If set to zero, move all RCU_SOFTIRQ processing to
per-CPU rcuc kthreads. Defaults to a non-zero
value, meaning that RCU_SOFTIRQ is used by default.
Specify rcutree.use_softirq=0 to use rcuc kthreads.
But note that CONFIG_PREEMPT_RT=y kernels disable
this kernel boot parameter, forcibly setting it
to zero.
rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
...@@ -5082,8 +5089,17 @@
rcutorture.stall_cpu_block= [KNL]
Sleep while stalling if set. This will result
in warnings from preemptible RCU in addition to
any other stall-related activity. Note that
in kernels built with CONFIG_PREEMPTION=n and
CONFIG_PREEMPT_COUNT=y, this parameter will
cause the CPU to pass through a quiescent state.
Given CONFIG_PREEMPTION=n, this will suppress
RCU CPU stall warnings, but will instead result
in scheduling-while-atomic splats.
Use of this module parameter results in splats.
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
...
...@@ -412,3 +412,15 @@ Read-Copy Update (RCU)
.. kernel-doc:: include/linux/rcu_sync.h
.. kernel-doc:: kernel/rcu/sync.c
.. kernel-doc:: kernel/rcu/tasks.h
.. kernel-doc:: kernel/rcu/tree_stall.h
.. kernel-doc:: include/linux/rcupdate_trace.h
.. kernel-doc:: include/linux/rcupdate_wait.h
.. kernel-doc:: include/linux/rcuref.h
.. kernel-doc:: include/linux/rcutree.h
...@@ -17822,7 +17822,7 @@ M: Boqun Feng <boqun.feng@gmail.com>
R: Steven Rostedt <rostedt@goodmis.org>
R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
R: Lai Jiangshan <jiangshanlai@gmail.com>
R: Zqiang <qiang1.zhang@intel.com>
R: Zqiang <qiang.zhang1211@gmail.com>
L: rcu@vger.kernel.org
S: Supported
W: http://www.rdrop.com/users/paulmck/RCU/
...
...@@ -1357,7 +1357,7 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
if (cleanup_err)
rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
kfree_rcu(mr);
kfree_rcu_mightsleep(mr);
return 0;
err_out:
...
...@@ -156,31 +156,6 @@ static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
static inline void rcu_nocb_flush_deferred_wakeup(void) { }
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
/**
* RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
* @a: Code that RCU needs to pay attention to.
*
* RCU read-side critical sections are forbidden in the inner idle loop,
* that is, between the ct_idle_enter() and the ct_idle_exit() -- RCU
* will happily ignore any such read-side critical sections. However,
* things like powertop need tracepoints in the inner idle loop.
*
* This macro provides the way out: RCU_NONIDLE(do_something_with_RCU())
* will tell RCU that it needs to pay attention, invoke its argument
* (in this example, calling the do_something_with_RCU() function),
* and then tell RCU to go back to ignoring this CPU. It is permissible
* to nest RCU_NONIDLE() wrappers, but not indefinitely (but the limit is
* on the order of a million or so, even on 32-bit systems). It is
* not legal to block within RCU_NONIDLE(), nor is it permissible to
* transfer control either into or out of RCU_NONIDLE()'s statement.
*/
#define RCU_NONIDLE(a) \
do { \
ct_irq_enter_irqson(); \
do { a; } while (0); \
ct_irq_exit_irqson(); \
} while (0)
/*
* Note a quasi-voluntary context switch for RCU-tasks's benefit.
* This is a macro rather than an inline function to avoid #include hell.
...@@ -957,9 +932,8 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
/**
* kfree_rcu() - kfree an object after a grace period.
* @ptr: pointer to kfree for both single- and double-argument invocations.
* @rhf: the name of the struct rcu_head within the type of @ptr,
* but only for double-argument invocations.
* @ptr: pointer to kfree for double-argument invocations.
* @rhf: the name of the struct rcu_head within the type of @ptr.
*
* Many rcu callbacks functions just call kfree() on the base structure.
* These functions are trivial, but their size adds up, and furthermore
...@@ -984,26 +958,18 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
* The BUILD_BUG_ON check must not involve any function calls, hence the
* checks are done in macros here.
*/
#define kfree_rcu(ptr, rhf...) kvfree_rcu(ptr, ## rhf)
#define kfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
#define kvfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
/**
* kvfree_rcu() - kvfree an object after a grace period.
*
* This macro consists of one or two arguments and it is
* based on whether an object is head-less or not. If it
* has a head then a semantic stays the same as it used
* to be before:
*
* kvfree_rcu(ptr, rhf);
*
* where @ptr is a pointer to kvfree(), @rhf is the name
* of the rcu_head structure within the type of @ptr.
* kfree_rcu_mightsleep() - kfree an object after a grace period.
* @ptr: pointer to kfree for single-argument invocations.
*
* When it comes to head-less variant, only one argument
* is passed and that is just a pointer which has to be
* freed after a grace period. Therefore the semantic is
*
* kvfree_rcu(ptr);
* kfree_rcu_mightsleep(ptr);
*
* where @ptr is the pointer to be freed by kvfree().
*
...@@ -1012,13 +978,9 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
* annotation. Otherwise, please switch and embed the
* rcu_head structure within the type of @ptr.
*/
#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \
kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
#define kfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
#define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
#define kfree_rcu_mightsleep(ptr) kvfree_rcu_mightsleep(ptr)
#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
#define kvfree_rcu_arg_2(ptr, rhf) \
do { \
typeof (ptr) ___p = (ptr); \
...
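For context on the macro changes above, here is a minimal sketch (hypothetical structure and function names) contrasting the two forms that remain after this series: the two-argument kfree_rcu(), which requires an embedded rcu_head and never sleeps, and kfree_rcu_mightsleep(), which needs no rcu_head but may block waiting for a grace period:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {                            /* hypothetical example structure */
            int data;
            struct rcu_head rh;             /* needed only by the two-argument form */
    };

    static void foo_free(struct foo *fp)
    {
            /* Never sleeps: queues a callback that frees fp after a grace period. */
            kfree_rcu(fp, rh);
    }

    static void foo_free_headless(struct foo *fp)
    {
            /* May sleep waiting for a grace period, so process context only. */
            kfree_rcu_mightsleep(fp);
    }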
...@@ -212,7 +212,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
srcu_check_nmi_safety(ssp, false);
retval = __srcu_read_lock(ssp);
srcu_lock_acquire(&(ssp)->dep_map);
srcu_lock_acquire(&ssp->dep_map);
return retval;
}
...@@ -229,7 +229,7 @@ static inline int srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp
srcu_check_nmi_safety(ssp, true);
retval = __srcu_read_lock_nmisafe(ssp);
rcu_lock_acquire(&(ssp)->dep_map);
rcu_lock_acquire(&ssp->dep_map);
return retval;
}
...@@ -284,7 +284,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
WARN_ON_ONCE(idx & ~0x1);
srcu_check_nmi_safety(ssp, false);
srcu_lock_release(&(ssp)->dep_map);
srcu_lock_release(&ssp->dep_map);
__srcu_read_unlock(ssp, idx);
}
...@@ -300,7 +300,7 @@ static inline void srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
{
WARN_ON_ONCE(idx & ~0x1);
srcu_check_nmi_safety(ssp, true);
rcu_lock_release(&(ssp)->dep_map);
rcu_lock_release(&ssp->dep_map);
__srcu_read_unlock_nmisafe(ssp, idx);
}
...
...@@ -33,24 +33,19 @@
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
torture_param(int, nwriters_stress, -1, "Number of write-locking stress-test threads");
torture_param(int, nreaders_stress, -1, "Number of read-locking stress-test threads");
torture_param(int, long_hold, 100, "Do occasional long hold of lock (ms), 0=disable");
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable");
torture_param(int, shuffle_interval, 3, "Number of jiffies between shuffles, 0=disable");
torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s");
torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
torture_param(int, rt_boost, 2,
"Do periodic rt-boost. 0=Disable, 1=Only for rt_mutex, 2=For all lock types.");
torture_param(int, rt_boost_factor, 50, "A factor determining how often rt-boost happens.");
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, nested_locks, 0, "Number of nested locks (max = 8)");
/* Going much higher trips "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" errors */
#define MAX_NESTED_LOCKS 8
...@@ -120,7 +115,7 @@ static int torture_lock_busted_write_lock(int tid __maybe_unused)
static void torture_lock_busted_write_delay(struct torture_random_state *trsp)
{
const unsigned long longdelay_ms = 100;
const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
/* We want a long delay occasionally to force massive contention. */
if (!(torture_random(trsp) %
...@@ -198,16 +193,18 @@ __acquires(torture_spinlock)
static void torture_spin_lock_write_delay(struct torture_random_state *trsp)
{
const unsigned long shortdelay_us = 2;
const unsigned long longdelay_ms = 100;
const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
unsigned long j;
/* We want a short delay mostly to emulate likely code, and
* we want a long delay occasionally to force massive contention.
*/
if (!(torture_random(trsp) %
(cxt.nrealwriters_stress * 2000 * longdelay_ms)))
mdelay(longdelay_ms);
if (!(torture_random(trsp) %
(cxt.nrealwriters_stress * 2 * shortdelay_us)))
if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2000 * longdelay_ms))) {
j = jiffies;
mdelay(longdelay_ms);
pr_alert("%s: delay = %lu jiffies.\n", __func__, jiffies - j);
}
if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 200 * shortdelay_us)))
udelay(shortdelay_us);
if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
torture_preempt_schedule(); /* Allow test to be preempted. */
...@@ -322,7 +319,7 @@ __acquires(torture_rwlock)
static void torture_rwlock_write_delay(struct torture_random_state *trsp)
{
const unsigned long shortdelay_us = 2;
const unsigned long longdelay_ms = 100;
const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
/* We want a short delay mostly to emulate likely code, and
* we want a long delay occasionally to force massive contention.
...@@ -455,14 +452,12 @@ __acquires(torture_mutex)
static void torture_mutex_delay(struct torture_random_state *trsp)
{
const unsigned long longdelay_ms = 100;
const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
/* We want a long delay occasionally to force massive contention. */
if (!(torture_random(trsp) %
(cxt.nrealwriters_stress * 2000 * longdelay_ms)))
mdelay(longdelay_ms * 5);
else
mdelay(longdelay_ms / 5);
if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
torture_preempt_schedule(); /* Allow test to be preempted. */
}
...@@ -630,7 +625,7 @@ __acquires(torture_rtmutex)
static void torture_rtmutex_delay(struct torture_random_state *trsp)
{
const unsigned long shortdelay_us = 2;
const unsigned long longdelay_ms = 100;
const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
/*
* We want a short delay mostly to emulate likely code, and
...@@ -640,7 +635,7 @@ static void torture_rtmutex_delay(struct torture_random_state *trsp)
(cxt.nrealwriters_stress * 2000 * longdelay_ms)))
mdelay(longdelay_ms);
if (!(torture_random(trsp) %
(cxt.nrealwriters_stress * 2 * shortdelay_us)))
(cxt.nrealwriters_stress * 200 * shortdelay_us)))
udelay(shortdelay_us);
if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
torture_preempt_schedule(); /* Allow test to be preempted. */
...@@ -695,14 +690,12 @@ __acquires(torture_rwsem)
static void torture_rwsem_write_delay(struct torture_random_state *trsp)
{
const unsigned long longdelay_ms = 100;
const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
/* We want a long delay occasionally to force massive contention. */
if (!(torture_random(trsp) %
(cxt.nrealwriters_stress * 2000 * longdelay_ms)))
mdelay(longdelay_ms * 10);
else
mdelay(longdelay_ms / 10);
if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
torture_preempt_schedule(); /* Allow test to be preempted. */
}
...@@ -848,8 +841,8 @@ static int lock_torture_writer(void *arg)
lwsp->n_lock_acquired++;
}
cxt.cur_ops->write_delay(&rand);
if (!skip_main_lock) {
cxt.cur_ops->write_delay(&rand);
lock_is_write_held = false;
WRITE_ONCE(last_lock_release, jiffies);
cxt.cur_ops->writeunlock(tid);
...
...@@ -314,4 +314,22 @@ config RCU_LAZY
To save power, batch RCU callbacks and flush after delay, memory
pressure, or callback list growing too big.
config RCU_DOUBLE_CHECK_CB_TIME
bool "RCU callback-batch backup time check"
depends on RCU_EXPERT
default n
help
Use this option to provide more precise enforcement of the
rcutree.rcu_resched_ns module parameter in situations where
a single RCU callback might run for hundreds of microseconds,
thus defeating the 32-callback batching used to amortize the
cost of the fine-grained but expensive local_clock() function.
This option rounds rcutree.rcu_resched_ns up to the next
jiffy, and overrides the 32-callback batching if this limit
is exceeded.
Say Y here if you need tighter callback-limit enforcement.
Say N here if you are unsure.
endmenu # "RCU Subsystem"
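The help text above says that CONFIG_RCU_DOUBLE_CHECK_CB_TIME rounds rcutree.rcu_resched_ns up to the next jiffy. A sketch of that conversion (hypothetical function name, mirroring the rcu_do_batch() hunk later in this merge) is:

    #include <linux/jiffies.h>
    #include <linux/ktime.h>

    /* Turn a nanosecond callback-time budget into a jiffies deadline, rounded up. */
    static unsigned long resched_ns_to_jiffies_limit(long rrn)
    {
            const long npj = NSEC_PER_SEC / HZ;     /* nanoseconds per jiffy */

            return jiffies + (rrn + npj + 1) / npj;
    }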
...@@ -642,4 +642,10 @@ void show_rcu_tasks_trace_gp_kthread(void);
static inline void show_rcu_tasks_trace_gp_kthread(void) {}
#endif
#ifdef CONFIG_TINY_RCU
static inline bool rcu_cpu_beenfullyonline(int cpu) { return true; }
#else
bool rcu_cpu_beenfullyonline(int cpu);
#endif
#endif /* __LINUX_RCU_H */
...@@ -522,89 +522,6 @@ rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown);
}
static void
rcu_scale_cleanup(void)
{
int i;
int j;
int ngps = 0;
u64 *wdp;
u64 *wdpp;
/*
* Would like warning at start, but everything is expedited
* during the mid-boot phase, so have to wait till the end.
*/
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
if (rcu_gp_is_normal() && gp_exp)
SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
if (gp_exp && gp_async)
SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
if (torture_cleanup_begin())
return;
if (!cur_ops) {
torture_cleanup_end();
return;
}
if (reader_tasks) {
for (i = 0; i < nrealreaders; i++)
torture_stop_kthread(rcu_scale_reader,
reader_tasks[i]);
kfree(reader_tasks);
}
if (writer_tasks) {
for (i = 0; i < nrealwriters; i++) {
torture_stop_kthread(rcu_scale_writer,
writer_tasks[i]);
if (!writer_n_durations)
continue;
j = writer_n_durations[i];
pr_alert("%s%s writer %d gps: %d\n",
scale_type, SCALE_FLAG, i, j);
ngps += j;
}
pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
scale_type, SCALE_FLAG,
t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
t_rcu_scale_writer_finished -
t_rcu_scale_writer_started,
ngps,
rcuscale_seq_diff(b_rcu_gp_test_finished,
b_rcu_gp_test_started));
for (i = 0; i < nrealwriters; i++) {
if (!writer_durations)
break;
if (!writer_n_durations)
continue;
wdpp = writer_durations[i];
if (!wdpp)
continue;
for (j = 0; j < writer_n_durations[i]; j++) {
wdp = &wdpp[j];
pr_alert("%s%s %4d writer-duration: %5d %llu\n",
scale_type, SCALE_FLAG,
i, j, *wdp);
if (j % 100 == 0)
schedule_timeout_uninterruptible(1);
}
kfree(writer_durations[i]);
}
kfree(writer_tasks);
kfree(writer_durations);
kfree(writer_n_durations);
}
/* Do torture-type-specific cleanup operations. */
if (cur_ops->cleanup != NULL)
cur_ops->cleanup();
torture_cleanup_end();
}
/*
* Return the number if non-negative. If -1, the number of CPUs.
* If less than -1, that much less than the number of CPUs, but
...@@ -624,20 +541,6 @@ static int compute_real(int n)
return nr;
}
/*
* RCU scalability shutdown kthread. Just waits to be awakened, then shuts
* down system.
*/
static int
rcu_scale_shutdown(void *arg)
{
wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
smp_mb(); /* Wake before output. */
rcu_scale_cleanup();
kernel_power_off();
return -EINVAL;
}
/*
* kfree_rcu() scalability tests: Start a kfree_rcu() loop on all CPUs for number
* of iterations and measure total time and number of GP for all iterations to complete.
...@@ -874,6 +777,108 @@ kfree_scale_init(void)
return firsterr;
}
static void
rcu_scale_cleanup(void)
{
int i;
int j;
int ngps = 0;
u64 *wdp;
u64 *wdpp;
/*
* Would like warning at start, but everything is expedited
* during the mid-boot phase, so have to wait till the end.
*/
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
if (rcu_gp_is_normal() && gp_exp)
SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
if (gp_exp && gp_async)
SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
if (kfree_rcu_test) {
kfree_scale_cleanup();
return;
}
if (torture_cleanup_begin())
return;
if (!cur_ops) {
torture_cleanup_end();
return;
}
if (reader_tasks) {
for (i = 0; i < nrealreaders; i++)
torture_stop_kthread(rcu_scale_reader,
reader_tasks[i]);
kfree(reader_tasks);
}
if (writer_tasks) {
for (i = 0; i < nrealwriters; i++) {
torture_stop_kthread(rcu_scale_writer,
writer_tasks[i]);
if (!writer_n_durations)
continue;
j = writer_n_durations[i];
pr_alert("%s%s writer %d gps: %d\n",
scale_type, SCALE_FLAG, i, j);
ngps += j;
}
pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
scale_type, SCALE_FLAG,
t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
t_rcu_scale_writer_finished -
t_rcu_scale_writer_started,
ngps,
rcuscale_seq_diff(b_rcu_gp_test_finished,
b_rcu_gp_test_started));
for (i = 0; i < nrealwriters; i++) {
if (!writer_durations)
break;
if (!writer_n_durations)
continue;
wdpp = writer_durations[i];
if (!wdpp)
continue;
for (j = 0; j < writer_n_durations[i]; j++) {
wdp = &wdpp[j];
pr_alert("%s%s %4d writer-duration: %5d %llu\n",
scale_type, SCALE_FLAG,
i, j, *wdp);
if (j % 100 == 0)
schedule_timeout_uninterruptible(1);
}
kfree(writer_durations[i]);
}
kfree(writer_tasks);
kfree(writer_durations);
kfree(writer_n_durations);
}
/* Do torture-type-specific cleanup operations. */
if (cur_ops->cleanup != NULL)
cur_ops->cleanup();
torture_cleanup_end();
}
/*
* RCU scalability shutdown kthread. Just waits to be awakened, then shuts
* down system.
*/
static int
rcu_scale_shutdown(void *arg)
{
wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
smp_mb(); /* Wake before output. */
rcu_scale_cleanup();
kernel_power_off();
return -EINVAL;
}
static int __init
rcu_scale_init(void)
{
...
...@@ -241,7 +241,6 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
if (rcu_task_enqueue_lim < 0) {
rcu_task_enqueue_lim = 1;
rcu_task_cb_adjust = true;
pr_info("%s: Setting adjustable number of callback queues.\n", __func__);
} else if (rcu_task_enqueue_lim == 0) {
rcu_task_enqueue_lim = 1;
}
...@@ -272,7 +271,9 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
}
raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
pr_info("%s: Setting shift to %d and lim to %d.\n", __func__, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim));
pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim), rcu_task_cb_adjust);
}
// IRQ-work handler that does deferred wakeup for call_rcu_tasks_generic().
...@@ -463,6 +464,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
{
int cpu;
int cpunext;
int cpuwq;
unsigned long flags;
int len;
struct rcu_head *rhp;
...@@ -473,11 +475,13 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
cpunext = cpu * 2 + 1;
if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND;
queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
cpunext++;
if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND;
queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
}
}
...
...@@ -2046,19 +2046,35 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
rcu_report_qs_rdp(rdp);
}
/* Return true if callback-invocation time limit exceeded. */
static bool rcu_do_batch_check_time(long count, long tlimit,
bool jlimit_check, unsigned long jlimit)
{
// Invoke local_clock() only once per 32 consecutive callbacks.
return unlikely(tlimit) &&
(!likely(count & 31) ||
(IS_ENABLED(CONFIG_RCU_DOUBLE_CHECK_CB_TIME) &&
jlimit_check && time_after(jiffies, jlimit))) &&
local_clock() >= tlimit;
}
/*
* Invoke any RCU callbacks that have made it to the end of their grace
* period. Throttle as specified by rdp->blimit.
*/
static void rcu_do_batch(struct rcu_data *rdp)
{
long bl;
long count = 0;
int div;
bool __maybe_unused empty;
unsigned long flags;
unsigned long jlimit;
bool jlimit_check = false;
long pending;
struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
struct rcu_head *rhp;
long tlimit = 0;
/* If no callbacks are ready, just return. */
if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
...@@ -2082,11 +2098,15 @@ static void rcu_do_batch(struct rcu_data *rdp)
div = READ_ONCE(rcu_divisor);
div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
bl = max(rdp->blimit, pending >> div);
if (in_serving_softirq() && unlikely(bl > 100)) {
if ((in_serving_softirq() || rdp->rcu_cpu_kthread_status == RCU_KTHREAD_RUNNING) &&
(IS_ENABLED(CONFIG_RCU_DOUBLE_CHECK_CB_TIME) || unlikely(bl > 100))) {
const long npj = NSEC_PER_SEC / HZ;
long rrn = READ_ONCE(rcu_resched_ns);
rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
tlimit = local_clock() + rrn;
jlimit = jiffies + (rrn + npj + 1) / npj;
jlimit_check = true;
}
trace_rcu_batch_start(rcu_state.name,
rcu_segcblist_n_cbs(&rdp->cblist), bl);
...@@ -2126,21 +2146,23 @@ static void rcu_do_batch(struct rcu_data *rdp)
* Make sure we don't spend too much time here and deprive other
* softirq vectors of CPU cycles.
*/
if (unlikely(tlimit)) {
/* only call local_clock() every 32 callbacks */
if (likely((count & 31) || local_clock() < tlimit))
continue;
/* Exceeded the time limit, so leave. */
if (rcu_do_batch_check_time(count, tlimit, jlimit_check, jlimit))
break;
}
} else {
// In rcuoc context, so no worries about depriving
// other softirq vectors of CPU cycles.
// In rcuc/rcuoc context, so no worries about
// depriving other softirq vectors of CPU cycles.
local_bh_enable();
lockdep_assert_irqs_enabled();
cond_resched_tasks_rcu_qs();
lockdep_assert_irqs_enabled();
local_bh_disable();
// But rcuc kthreads can delay quiescent-state
// reporting, so check time limits for them.
if (rdp->rcu_cpu_kthread_status == RCU_KTHREAD_RUNNING &&
rcu_do_batch_check_time(count, tlimit, jlimit_check, jlimit)) {
rdp->rcu_cpu_has_work = 1;
break;
}
}
}
...@@ -2459,12 +2481,12 @@ static void rcu_cpu_kthread(unsigned int cpu)
*statusp = RCU_KTHREAD_RUNNING;
local_irq_disable();
work = *workp;
*workp = 0;
WRITE_ONCE(*workp, 0);
local_irq_enable();
if (work)
rcu_core();
local_bh_enable();
if (*workp == 0) {
if (!READ_ONCE(*workp)) {
trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
*statusp = RCU_KTHREAD_WAITING;
return;
...@@ -2756,7 +2778,7 @@ EXPORT_SYMBOL_GPL(call_rcu);
*/
struct kvfree_rcu_bulk_data {
struct list_head list;
unsigned long gp_snap;
struct rcu_gp_oldstate gp_snap;
unsigned long nr_records;
void *records[];
};
...@@ -2773,6 +2795,7 @@ struct kvfree_rcu_bulk_data {
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
* @head_free: List of kfree_rcu() objects waiting for a grace period
* @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees.
* @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
* @krcp: Pointer to @kfree_rcu_cpu structure
*/
...@@ -2780,6 +2803,7 @@ struct kvfree_rcu_bulk_data {
struct kfree_rcu_cpu_work {
struct rcu_work rcu_work;
struct rcu_head *head_free;
struct rcu_gp_oldstate head_free_gp_snap;
struct list_head bulk_head_free[FREE_N_CHANNELS];
struct kfree_rcu_cpu *krcp;
};
...@@ -2900,6 +2924,9 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
struct llist_node *page_list, *pos, *n;
int freed = 0;
if (!rcu_min_cached_objs)
return 0;
raw_spin_lock_irqsave(&krcp->lock, flags);
page_list = llist_del_all(&krcp->bkvcache);
WRITE_ONCE(krcp->nr_bkv_objs, 0);
...@@ -2920,8 +2947,8 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
unsigned long flags;
int i;
if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
debug_rcu_bhead_unqueue(bnode);
rcu_lock_acquire(&rcu_callback_map);
if (idx == 0) { // kmalloc() / kfree().
trace_rcu_invoke_kfree_bulk_callback(
...@@ -2938,6 +2965,7 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
}
}
rcu_lock_release(&rcu_callback_map);
}
raw_spin_lock_irqsave(&krcp->lock, flags);
if (put_cached_bnode(krcp, bnode))
...@@ -2984,6 +3012,7 @@ static void kfree_rcu_work(struct work_struct *work)
struct rcu_head *head;
struct kfree_rcu_cpu *krcp;
struct kfree_rcu_cpu_work *krwp;
struct rcu_gp_oldstate head_gp_snap;
int i;
krwp = container_of(to_rcu_work(work),
...@@ -2998,6 +3027,7 @@ static void kfree_rcu_work(struct work_struct *work)
// Channel 3.
head = krwp->head_free;
krwp->head_free = NULL;
head_gp_snap = krwp->head_free_gp_snap;
raw_spin_unlock_irqrestore(&krcp->lock, flags);
// Handle the first two channels.
...@@ -3014,6 +3044,7 @@ static void kfree_rcu_work(struct work_struct *work)
* queued on a linked list through their rcu_head structures.
* This list is named "Channel 3".
*/
if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
kvfree_rcu_list(head);
}
...@@ -3081,7 +3112,7 @@ kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
INIT_LIST_HEAD(&bulk_ready[i]);
list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) {
if (!poll_state_synchronize_rcu(bnode->gp_snap))
if (!poll_state_synchronize_rcu_full(&bnode->gp_snap))
break;
atomic_sub(bnode->nr_records, &krcp->bulk_count[i]);
...@@ -3146,6 +3177,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
// objects queued on the linked list.
if (!krwp->head_free) {
krwp->head_free = krcp->head;
get_state_synchronize_rcu_full(&krwp->head_free_gp_snap);
atomic_set(&krcp->head_count, 0);
WRITE_ONCE(krcp->head, NULL);
}
...@@ -3194,7 +3226,7 @@ static void fill_page_cache_func(struct work_struct *work)
nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
1 : rcu_min_cached_objs;
for (i = 0; i < nr_pages; i++) {
for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) {
bnode = (struct kvfree_rcu_bulk_data *)
__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
...@@ -3218,6 +3250,10 @@ static void fill_page_cache_func(struct work_struct *work)
static void
run_page_cache_worker(struct kfree_rcu_cpu *krcp)
{
// If cache disabled, bail out.
if (!rcu_min_cached_objs)
return;
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
!atomic_xchg(&krcp->work_in_progress, 1)) {
if (atomic_read(&krcp->backoff_page_cache_fill)) {
...@@ -3272,7 +3308,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
// scenarios.
bnode = (struct kvfree_rcu_bulk_data *)
__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
*krcp = krc_this_cpu_lock(flags);
raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
}
if (!bnode)
...@@ -3285,7 +3321,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
// Finally insert and update the GP for this page.
bnode->records[bnode->nr_records++] = ptr;
bnode->gp_snap = get_state_synchronize_rcu();
get_state_synchronize_rcu_full(&bnode->gp_snap);
atomic_inc(&(*krcp)->bulk_count[idx]);
return true;
...@@ -4283,7 +4319,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
*/
rnp = rdp->mynode;
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
rdp->beenonline = true; /* We have now been online. */
rdp->gp_seq = READ_ONCE(rnp->gp_seq);
rdp->gp_seq_needed = rdp->gp_seq;
rdp->cpu_no_qs.b.norm = true;
...@@ -4310,6 +4345,16 @@ static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
}
/*
* Has the specified (known valid) CPU ever been fully online?
*/
bool rcu_cpu_beenfullyonline(int cpu)
{
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
return smp_load_acquire(&rdp->beenonline);
}
/*
* Near the end of the CPU-online process. Pretty much all services
* enabled, and the CPU is now very much alive.
...@@ -4368,15 +4413,16 @@ int rcutree_offline_cpu(unsigned int cpu)
* Note that this function is special in that it is invoked directly
* from the incoming CPU rather than from the cpuhp_step mechanism.
* This is because this function must be invoked at a precise location.
* This incoming CPU must not have enabled interrupts yet.
*/
void rcu_cpu_starting(unsigned int cpu)
{
unsigned long flags;
unsigned long mask;
struct rcu_data *rdp;
struct rcu_node *rnp;
bool newcpu;
lockdep_assert_irqs_disabled();
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->cpu_started)
return;
...@@ -4384,7 +4430,6 @@ void rcu_cpu_starting(unsigned int cpu)
rnp = rdp->mynode;
mask = rdp->grpmask;
local_irq_save(flags);
arch_spin_lock(&rcu_state.ofl_lock); arch_spin_lock(&rcu_state.ofl_lock);
rcu_dynticks_eqs_online(); rcu_dynticks_eqs_online();
raw_spin_lock(&rcu_state.barrier_lock); raw_spin_lock(&rcu_state.barrier_lock);
...@@ -4403,17 +4448,17 @@ void rcu_cpu_starting(unsigned int cpu) ...@@ -4403,17 +4448,17 @@ void rcu_cpu_starting(unsigned int cpu)
/* An incoming CPU should never be blocking a grace period. */ /* An incoming CPU should never be blocking a grace period. */
if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */ if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
/* rcu_report_qs_rnp() *really* wants some flags to restore */ /* rcu_report_qs_rnp() *really* wants some flags to restore */
unsigned long flags2; unsigned long flags;
local_irq_save(flags2); local_irq_save(flags);
rcu_disable_urgency_upon_qs(rdp); rcu_disable_urgency_upon_qs(rdp);
/* Report QS -after- changing ->qsmaskinitnext! */ /* Report QS -after- changing ->qsmaskinitnext! */
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags2); rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
} else { } else {
raw_spin_unlock_rcu_node(rnp); raw_spin_unlock_rcu_node(rnp);
} }
arch_spin_unlock(&rcu_state.ofl_lock); arch_spin_unlock(&rcu_state.ofl_lock);
local_irq_restore(flags); smp_store_release(&rdp->beenonline, true);
smp_mb(); /* Ensure RCU read-side usage follows above initialization. */ smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
} }
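Editorial note: rcu_cpu_starting() now documents and asserts its precondition (the caller has interrupts disabled) instead of saving and restoring flags itself, which is why the outer local_irq_save()/local_irq_restore() pair could be dropped. A hypothetical helper following the same convention, not taken from this patch:

#include <linux/lockdep.h>
#include <linux/spinlock.h>

/* Hypothetical bringup step that, like rcu_cpu_starting(), requires irqs off. */
static void my_early_bringup_step(raw_spinlock_t *lock)
{
	lockdep_assert_irqs_disabled();	/* Document and enforce the precondition. */
	raw_spin_lock(lock);		/* No _irqsave needed: irqs are already off. */
	/* ... per-CPU bringup work goes here ... */
	raw_spin_unlock(lock);
}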
...
@@ -643,7 +643,7 @@ static void synchronize_rcu_expedited_wait(void)
 				"O."[!!cpu_online(cpu)],
 				"o."[!!(rdp->grpmask & rnp->expmaskinit)],
 				"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
-				"D."[!!(rdp->cpu_no_qs.b.exp)]);
+				"D."[!!data_race(rdp->cpu_no_qs.b.exp)]);
 		}
 	}
 	pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
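Editorial note: data_race() above tells KCSAN that the unsynchronized load of ->cpu_no_qs.b.exp is intentional; the value feeds only a diagnostic printout, so a stale result is harmless. A hypothetical example of the same annotation (my_dump_flag() is not part of this patch):

#include <linux/compiler.h>
#include <linux/printk.h>

/* Hypothetical diagnostic dump of a flag that other CPUs may update concurrently. */
static void my_dump_flag(int cpu, bool *flagp)
{
	/*
	 * The racy load is annotated with data_race() because any stale value
	 * only affects this debug message, never a control decision.
	 */
	pr_info("cpu %d flag: %c\n", cpu, "D."[!!data_race(*flagp)]);
}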
...
@@ -1319,13 +1319,22 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 	int cpu;
 	unsigned long count = 0;
 
+	if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+		return 0;
+
+	/* Protect rcu_nocb_mask against concurrent (de-)offloading. */
+	if (!mutex_trylock(&rcu_state.barrier_mutex))
+		return 0;
+
 	/* Snapshot count of all CPUs */
-	for_each_possible_cpu(cpu) {
+	for_each_cpu(cpu, rcu_nocb_mask) {
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
 		count += READ_ONCE(rdp->lazy_len);
 	}
 
+	mutex_unlock(&rcu_state.barrier_mutex);
+
 	return count ? count : SHRINK_EMPTY;
 }
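Editorial note: the count callback takes rcu_state.barrier_mutex with mutex_trylock() rather than mutex_lock() because a shrinker runs in memory-reclaim context and must not wait on a lock that could sit in an allocation-dependency chain; when the mutex is contended it simply reports nothing. A sketch of that shape with hypothetical state (my_reconfig_mutex, my_cpu_mask, my_queue_len, my_shrink_count()):

#include <linux/compiler.h>
#include <linux/cpumask.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/shrinker.h>

/* Hypothetical state mirroring the shape of the lazy-callback shrinker. */
static DEFINE_MUTEX(my_reconfig_mutex);
static struct cpumask *my_cpu_mask;
static DEFINE_PER_CPU(unsigned long, my_queue_len);

static unsigned long my_shrink_count(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	unsigned long count = 0;
	int cpu;

	/* Never block in reclaim context: if the mutex is busy, report nothing. */
	if (!mutex_trylock(&my_reconfig_mutex))
		return 0;
	for_each_cpu(cpu, my_cpu_mask)
		count += READ_ONCE(per_cpu(my_queue_len, cpu));
	mutex_unlock(&my_reconfig_mutex);
	return count ? count : SHRINK_EMPTY;
}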
@@ -1336,15 +1345,45 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	unsigned long flags;
 	unsigned long count = 0;
 
+	if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+		return 0;
+
+	/*
+	 * Protect against concurrent (de-)offloading. Otherwise nocb locking
+	 * may be ignored or imbalanced.
+	 */
+	if (!mutex_trylock(&rcu_state.barrier_mutex)) {
+		/*
+		 * But really don't insist if barrier_mutex is contended since we
+		 * can't guarantee that it will never engage in a dependency
+		 * chain involving memory allocation. The lock is seldom contended
+		 * anyway.
+		 */
+		return 0;
+	}
+
 	/* Snapshot count of all CPUs */
-	for_each_possible_cpu(cpu) {
+	for_each_cpu(cpu, rcu_nocb_mask) {
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-		int _count = READ_ONCE(rdp->lazy_len);
+		int _count;
+
+		if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp)))
+			continue;
 
-		if (_count == 0)
+		if (!READ_ONCE(rdp->lazy_len))
 			continue;
+
 		rcu_nocb_lock_irqsave(rdp, flags);
-		WRITE_ONCE(rdp->lazy_len, 0);
+		/*
+		 * Recheck under the nocb lock. Since we are not holding the bypass
+		 * lock we may still race with increments from the enqueuer but still
+		 * we know for sure if there is at least one lazy callback.
+		 */
+		_count = READ_ONCE(rdp->lazy_len);
+		if (!_count) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			continue;
+		}
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		wake_nocb_gp(rdp, false);
 		sc->nr_to_scan -= _count;
@@ -1352,6 +1391,9 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (sc->nr_to_scan <= 0)
 			break;
 	}
+
+	mutex_unlock(&rcu_state.barrier_mutex);
+
 	return count ? count : SHRINK_STOP;
 }
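Editorial note: the scan callback now filters locklessly on ->lazy_len and then rechecks it after taking the nocb lock, because the enqueuer may have changed the queue between the two reads; only when lazy callbacks are confirmed does it flush the bypass list and wake the grace-period kthread. A generic sketch of that recheck-under-lock pattern, with hypothetical names (my_drain_one() and its parameters):

#include <linux/compiler.h>
#include <linux/spinlock.h>

/* Hypothetical drain helper showing the recheck-under-lock pattern above. */
static bool my_drain_one(spinlock_t *lock, unsigned long *lenp)
{
	unsigned long flags;
	unsigned long n;

	if (!READ_ONCE(*lenp))			/* Cheap lockless filter. */
		return false;

	spin_lock_irqsave(lock, flags);
	n = READ_ONCE(*lenp);			/* Recheck now that the lock is held. */
	if (!n) {
		spin_unlock_irqrestore(lock, flags);
		return false;
	}
	/* ... flush the n queued items while holding the lock ... */
	WRITE_ONCE(*lenp, 0);
	spin_unlock_irqrestore(lock, flags);
	return true;
}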
...
@@ -257,6 +257,8 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 	 * GP should not be able to end until we report, so there should be
 	 * no need to check for a subsequent expedited GP.  (Though we are
 	 * still in a quiescent state in any case.)
+	 *
+	 * Interrupts are disabled, so ->cpu_no_qs.b.exp cannot change.
 	 */
 	if (blkd_state & RCU_EXP_BLKD && rdp->cpu_no_qs.b.exp)
 		rcu_report_exp_rdp(rdp);
@@ -941,7 +943,7 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
-	if (rdp->cpu_no_qs.b.exp)
+	if (READ_ONCE(rdp->cpu_no_qs.b.exp))
 		rcu_report_exp_rdp(rdp);
 }
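Editorial note: the two hunks above make the access rule explicit. In rcu_preempt_ctxt_queue() interrupts are disabled, so ->cpu_no_qs.b.exp cannot change and the plain load stays plain; in rcu_preempt_deferred_qs() the field can be updated concurrently, so the load is marked with READ_ONCE(). A hypothetical illustration of that split (my_need_report() is not from this patch):

#include <linux/compiler.h>

/* Hypothetical reader showing when a load must be marked and when it may stay plain. */
static bool my_need_report(bool *flagp, bool updaters_excluded)
{
	if (updaters_excluded)
		return *flagp;		/* Plain load: no concurrent writer is possible here. */
	return READ_ONCE(*flagp);	/* Marked load: may race with a concurrent writer. */
}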
...
@@ -250,7 +250,7 @@ identify_qemu_args () {
 		echo -machine virt,gic-version=host -cpu host
 		;;
 	qemu-system-ppc64)
-		echo -enable-kvm -M pseries -nodefaults
+		echo -M pseries -nodefaults
 		echo -device spapr-vscsi
 		if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC"
 		then
...
@@ -5,4 +5,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
-tree.use_softirq=0
+rcutree.use_softirq=0
@@ -4,4 +4,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
-tree.use_softirq=0
+rcutree.use_softirq=0