Commit d99391ec authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The RCU changes in this cycle were:
   - Expedited grace-period updates
   - kfree_rcu() updates
   - RCU list updates
   - Preemptible RCU updates
   - Torture-test updates
   - Miscellaneous fixes
   - Documentation updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  rcu: Remove unused stop-machine #include
  powerpc: Remove comment about read_barrier_depends()
  .mailmap: Add entries for old paulmck@kernel.org addresses
  srcu: Apply *_ONCE() to ->srcu_last_gp_end
  rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
  rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
  rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
  rcu: Remove the declaration of call_rcu() in tree.h
  rcu: Fix tracepoint tracking RCU CPU kthread utilization
  rcu: Fix harmless omission of "CONFIG_" from #if condition
  rcu: Avoid tick_dep_set_cpu() misordering
  rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
  rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
  rcu: Clear ->rcu_read_unlock_special only once
  rcu: Clear .exp_hint only when deferred quiescent state has been reported
  rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
  rcu: Remove kfree_call_rcu_nobatch()
  rcu: Remove kfree_rcu() special casing and lazy-callback handling
  rcu: Add support for debug_objects debugging for kfree_rcu()
  rcu: Add multiple in-flight batches of kfree_rcu() work
  ...
parents 8b561778 f8a4bb6b
@@ -210,6 +210,10 @@ Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
 Patrick Mochel <mochel@digitalimplant.org>
 Paul Burton <paulburton@kernel.org> <paul.burton@imgtec.com>
 Paul Burton <paulburton@kernel.org> <paul.burton@mips.com>
+Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.ibm.com>
+Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.vnet.ibm.com>
+Paul E. McKenney <paulmck@kernel.org> <paul.mckenney@linaro.org>
+Paul E. McKenney <paulmck@kernel.org> <paulmck@us.ibm.com>
 Peter A Jonsson <pj@ludd.ltu.se>
 Peter Oruba <peter@oruba.de>
 Peter Oruba <peter.oruba@amd.com>
......
+.. _NMI_rcu_doc:
 Using RCU to Protect Dynamic NMI Handlers
+=========================================
 Although RCU is usually used to protect read-mostly data structures,
@@ -9,7 +12,7 @@ work in "arch/x86/oprofile/nmi_timer_int.c" and in
 "arch/x86/kernel/traps.c".
 The relevant pieces of code are listed below, each followed by a
-brief explanation.
+brief explanation::
 static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
 {
@@ -18,12 +21,12 @@ brief explanation.
 The dummy_nmi_callback() function is a "dummy" NMI handler that does
 nothing, but returns zero, thus saying that it did nothing, allowing
-the NMI handler to take the default machine-specific action.
+the NMI handler to take the default machine-specific action::
 static nmi_callback_t nmi_callback = dummy_nmi_callback;
 This nmi_callback variable is a global function pointer to the current
-NMI handler.
+NMI handler::
 void do_nmi(struct pt_regs * regs, long error_code)
 {
@@ -53,11 +56,12 @@ anyway. However, in practice it is a good documentation aid, particularly
 for anyone attempting to do something similar on Alpha or on systems
 with aggressive optimizing compilers.
-Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
-given that the code referenced by the pointer is read-only?
+Quick Quiz:
+Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
+:ref:`Answer to Quick Quiz <answer_quick_quiz_NMI>`
-Back to the discussion of NMI and RCU...
+Back to the discussion of NMI and RCU::
 void set_nmi_callback(nmi_callback_t callback)
 {
@@ -68,7 +72,7 @@ The set_nmi_callback() function registers an NMI handler. Note that any
 data that is to be used by the callback must be initialized up -before-
 the call to set_nmi_callback(). On architectures that do not order
 writes, the rcu_assign_pointer() ensures that the NMI handler sees the
-initialized values.
+initialized values::
 void unset_nmi_callback(void)
 {
@@ -82,7 +86,7 @@ up any data structures used by the old NMI handler until execution
 of it completes on all other CPUs.
 One way to accomplish this is via synchronize_rcu(), perhaps as
-follows:
+follows::
 unset_nmi_callback();
 synchronize_rcu();
@@ -98,24 +102,23 @@ to free up the handler's data as soon as synchronize_rcu() returns.
 Important note: for this to work, the architecture in question must
 invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.
-Answer to Quick Quiz
-Why might the rcu_dereference_sched() be necessary on Alpha, given
-that the code referenced by the pointer is read-only?
-Answer: The caller to set_nmi_callback() might well have
+.. _answer_quick_quiz_NMI:
+Answer to Quick Quiz:
+Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
+The caller to set_nmi_callback() might well have
 initialized some data that is to be used by the new NMI
 handler. In this case, the rcu_dereference_sched() would
 be needed, because otherwise a CPU that received an NMI
 just after the new handler was set might see the pointer
 to the new NMI handler, but the old pre-initialized
 version of the handler's data.
 This same sad story can happen on other CPUs when using
 a compiler with aggressive pointer-value speculation
 optimizations.
 More important, the rcu_dereference_sched() makes it
 clear to someone reading the code that the pointer is
 being protected by RCU-sched.
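The Quick Quiz answer turns on publication ordering. A reader must never see the new handler pointer without also seeing the data initialized before set_nmi_callback(). The following is a minimal userspace sketch of that pattern. It assumes C11 atomics as stand-ins: a release store plays the role of rcu_assign_pointer(), and a (stronger) acquire load plays the role of rcu_dereference_sched(). The names struct nmi_handler, set_handler(), unset_handler() and dispatch() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace analog of an NMI handler plus the data it uses. */
struct nmi_handler {
	int (*fn)(const struct nmi_handler *self, int cpu);
	int threshold;		/* must be initialized before publication */
};

static int dummy_fn(const struct nmi_handler *self, int cpu)
{
	(void)self;
	(void)cpu;
	return 0;		/* "did nothing": caller takes the default action */
}

static int threshold_fn(const struct nmi_handler *self, int cpu)
{
	return cpu >= self->threshold;	/* reads pre-initialized data */
}

static const struct nmi_handler dummy_handler = { dummy_fn, 0 };

/* Global pointer to the current handler, analogous to nmi_callback. */
static _Atomic(const struct nmi_handler *) cur_handler = &dummy_handler;

/* Updater: every field of *h must be initialized before this call. */
static void set_handler(const struct nmi_handler *h)
{
	/* Release store (stand-in for rcu_assign_pointer()): orders the
	 * earlier initialization of *h before the pointer becomes visible. */
	atomic_store_explicit(&cur_handler, h, memory_order_release);
}

static void unset_handler(void)
{
	atomic_store_explicit(&cur_handler, &dummy_handler, memory_order_release);
}

/* Reader: analogous to the dispatch done in do_nmi(). */
static int dispatch(int cpu)
{
	/* Acquire load (stand-in for rcu_dereference_sched()): a reader that
	 * sees the new pointer also sees the fields set before publication. */
	const struct nmi_handler *h =
		atomic_load_explicit(&cur_handler, memory_order_acquire);

	return h->fn(h, cpu);
}
```

unset_handler() only stops new readers from finding the old handler. As the text explains, the old handler's data may be freed only after a grace period such as synchronize_rcu(), and this sketch does not model one.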
-Using RCU to Protect Read-Mostly Arrays
+.. _array_rcu_doc:
+Using RCU to Protect Read-Mostly Arrays
+=======================================
 Although RCU is more commonly used to protect linked lists, it can
 also be used to protect arrays. Three situations are as follows:
-1. Hash Tables
-2. Static Arrays
-3. Resizeable Arrays
+1. :ref:`Hash Tables <hash_tables>`
+2. :ref:`Static Arrays <static_arrays>`
+3. :ref:`Resizable Arrays <resizable_arrays>`
 Each of these three situations involves an RCU-protected pointer to an
 array that is separately indexed. It might be tempting to consider use
 of RCU to instead protect the index into an array, however, this use
-case is -not- supported. The problem with RCU-protected indexes into
+case is **not** supported. The problem with RCU-protected indexes into
 arrays is that compilers can play way too many optimization games with
 integers, which means that the rules governing handling of these indexes
 are far more trouble than they are worth. If RCU-protected indexes into
@@ -24,16 +26,20 @@ to be safely used.
 That aside, each of the three RCU-protected pointer situations are
 described in the following sections.
+.. _hash_tables:
 Situation 1: Hash Tables
+------------------------
 Hash tables are often implemented as an array, where each array entry
 has a linked-list hash chain. Each hash chain can be protected by RCU
 as described in the listRCU.txt document. This approach also applies
 to other array-of-list situations, such as radix trees.
+.. _static_arrays:
 Situation 2: Static Arrays
+--------------------------
 Static arrays, where the data (rather than a pointer to the data) is
 located in each array element, and where the array is never resized,
@@ -41,13 +47,17 @@ have not been used with RCU. Rik van Riel recommends using seqlock in
 this situation, which would also have minimal read-side overhead as long
 as updates are rare.
-Quick Quiz: Why is it so important that updates be rare when
-using seqlock?
+Quick Quiz:
+Why is it so important that updates be rare when using seqlock?
+:ref:`Answer to Quick Quiz <answer_quick_quiz_seqlock>`
-Situation 3: Resizeable Arrays
+.. _resizable_arrays:
+Situation 3: Resizable Arrays
+------------------------------
-Use of RCU for resizeable arrays is demonstrated by the grow_ary()
+Use of RCU for resizable arrays is demonstrated by the grow_ary()
 function formerly used by the System V IPC code. The array is used
 to map from semaphore, message-queue, and shared-memory IDs to the data
 structure that represents the corresponding IPC construct. The grow_ary()
@@ -60,7 +70,7 @@ the remainder of the new, updates the ids->entries pointer to point to
 the new array, and invokes ipc_rcu_putref() to free up the old array.
 Note that rcu_assign_pointer() is used to update the ids->entries pointer,
 which includes any memory barriers required on whatever architecture
-you are running on.
+you are running on::
 static int grow_ary(struct ipc_ids* ids, int newsize)
 {
@@ -112,7 +122,7 @@ a simple check suffices. The pointer to the structure corresponding
 to the desired IPC object is placed in "out", with NULL indicating
 a non-existent entry. After acquiring "out->lock", the "out->deleted"
 flag indicates whether the IPC object is in the process of being
-deleted, and, if not, the pointer is returned.
+deleted, and, if not, the pointer is returned::
 struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
 {
@@ -144,8 +154,10 @@ deleted, and, if not, the pointer is returned.
 return out;
 }
+.. _answer_quick_quiz_seqlock:
 Answer to Quick Quiz:
+Why is it so important that updates be rare when using seqlock?
 The reason that it is important that updates be rare when
 using seqlock is that frequent updates can livelock readers.
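Rik van Riel's seqlock suggestion for static arrays, and the livelock risk named in the answer, can be illustrated with a minimal userspace seqlock written with C11 atomics. This is a sketch under stated assumptions, not the kernel's seqlock_t API. The names are hypothetical, a single writer is assumed, and the protected fields are relaxed atomics so that racing reads are well defined in C11. The reader returns its retry count to make the cost of updates visible: every update that overlaps a read forces that read to start over.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical seqlock-protected pair; assumes a single writer. */
struct seq_pair {
	atomic_uint seq;	/* odd while an update is in progress */
	atomic_int a, b;	/* protected data, accessed with relaxed atomics */
};

static void seq_write(struct seq_pair *s, int a, int b)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);	/* odd seq before data */
	atomic_store_explicit(&s->a, a, memory_order_relaxed);
	atomic_store_explicit(&s->b, b, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release); /* even */
}

/* Copies out a consistent snapshot; returns how many times it retried. */
static unsigned int seq_read(struct seq_pair *s, int *a, int *b)
{
	unsigned int retries = 0, s0, s1;

	for (;;) {
		s0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		if (s0 & 1) {			/* writer active: retry */
			retries++;
			continue;
		}
		*a = atomic_load_explicit(&s->a, memory_order_relaxed);
		*b = atomic_load_explicit(&s->b, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire); /* data before recheck */
		s1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
		if (s0 == s1)
			return retries;		/* no update overlapped the read */
		retries++;
	}
}
```

A reader that keeps overlapping with updates never leaves the loop, which is the livelock described in the answer.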
......
@@ -7,8 +7,13 @@ RCU concepts
 .. toctree::
 :maxdepth: 3
+arrayRCU
+rcubarrier
+rcu_dereference
+whatisRCU
 rcu
 listRCU
+NMI-RCU
 UP
 Design/Memory-Ordering/Tree-RCU-Memory-Ordering
......
@@ -99,7 +99,7 @@ With this change, the rcu_dereference() is always within an RCU
 read-side critical section, which again would have suppressed the
 above lockdep-RCU splat.
-But in this particular case, we don't actually deference the pointer
+But in this particular case, we don't actually dereference the pointer
 returned from rcu_dereference(). Instead, that pointer is just compared
 to the cic pointer, which means that the rcu_dereference() can be replaced
 by rcu_access_pointer() as follows:
......
+.. _rcu_dereference_doc:
 PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
+===============================================================
 Most of the time, you can use values from rcu_dereference() or one of
 the similar primitives without worries. Dereferencing (prefix "*"),
@@ -8,7 +11,7 @@ subtraction of constants, and casts all work quite naturally and safely.
 It is nevertheless possible to get into trouble with other operations.
 Follow these rules to keep your RCU code working properly:
-o You must use one of the rcu_dereference() family of primitives
+- You must use one of the rcu_dereference() family of primitives
 to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
 will complain. Worse yet, your code can see random memory-corruption
 bugs due to games that compilers and DEC Alpha can play.
@@ -25,24 +28,24 @@ o You must use one of the rcu_dereference() family of primitives
 for an example where the compiler can in fact deduce the exact
 value of the pointer, and thus cause misordering.
-o You are only permitted to use rcu_dereference on pointer values.
+- You are only permitted to use rcu_dereference on pointer values.
 The compiler simply knows too much about integral values to
 trust it to carry dependencies through integer operations.
 There are a very few exceptions, namely that you can temporarily
 cast the pointer to uintptr_t in order to:
-o Set bits and clear bits down in the must-be-zero low-order
+- Set bits and clear bits down in the must-be-zero low-order
 bits of that pointer. This clearly means that the pointer
 must have alignment constraints, for example, this does
 -not- work in general for char* pointers.
-o XOR bits to translate pointers, as is done in some
+- XOR bits to translate pointers, as is done in some
 classic buddy-allocator algorithms.
 It is important to cast the value back to pointer before
 doing much of anything else with it.
-o Avoid cancellation when using the "+" and "-" infix arithmetic
+- Avoid cancellation when using the "+" and "-" infix arithmetic
 operators. For example, for a given variable "x", avoid
 "(x-(uintptr_t)x)" for char* pointers. The compiler is within its
 rights to substitute zero for this sort of expression, so that
@@ -54,16 +57,16 @@ o Avoid cancellation when using the "+" and "-" infix arithmetic
 "p+a-b" is safe because its value still necessarily depends on
 the rcu_dereference(), thus maintaining proper ordering.
-o If you are using RCU to protect JITed functions, so that the
+- If you are using RCU to protect JITed functions, so that the
 "()" function-invocation operator is applied to a value obtained
 (directly or indirectly) from rcu_dereference(), you may need to
 interact directly with the hardware to flush instruction caches.
 This issue arises on some systems when a newly JITed function is
 using the same memory that was used by an earlier JITed function.
-o Do not use the results from relational operators ("==", "!=",
+- Do not use the results from relational operators ("==", "!=",
 ">", ">=", "<", or "<=") when dereferencing. For example,
-the following (quite strange) code is buggy:
+the following (quite strange) code is buggy::
 int *p;
 int *q;
@@ -81,11 +84,11 @@ o Do not use the results from relational operators ("==", "!=",
 after such branches, but can speculate loads, which can again
 result in misordering bugs.
-o Be very careful about comparing pointers obtained from
+- Be very careful about comparing pointers obtained from
 rcu_dereference() against non-NULL values. As Linus Torvalds
 explained, if the two pointers are equal, the compiler could
 substitute the pointer you are comparing against for the pointer
-obtained from rcu_dereference(). For example:
+obtained from rcu_dereference(). For example::
 p = rcu_dereference(gp);
 if (p == &default_struct)
@@ -93,7 +96,7 @@ o Be very careful about comparing pointers obtained from
 Because the compiler now knows that the value of "p" is exactly
 the address of the variable "default_struct", it is free to
-transform this code into the following:
+transform this code into the following::
 p = rcu_dereference(gp);
 if (p == &default_struct)
@@ -105,14 +108,14 @@ o Be very careful about comparing pointers obtained from
 However, comparisons are OK in the following cases:
-o The comparison was against the NULL pointer. If the
+- The comparison was against the NULL pointer. If the
 compiler knows that the pointer is NULL, you had better
 not be dereferencing it anyway. If the comparison is
 non-equal, the compiler is none the wiser. Therefore,
 it is safe to compare pointers from rcu_dereference()
 against NULL pointers.
-o The pointer is never dereferenced after being compared.
+- The pointer is never dereferenced after being compared.
 Since there are no subsequent dereferences, the compiler
 cannot use anything it learned from the comparison
 to reorder the non-existent subsequent dereferences.
@@ -124,31 +127,31 @@ o Be very careful about comparing pointers obtained from
 dereferenced, rcu_access_pointer() should be used in place
 of rcu_dereference().
-o The comparison is against a pointer that references memory
+- The comparison is against a pointer that references memory
 that was initialized "a long time ago." The reason
 this is safe is that even if misordering occurs, the
 misordering will not affect the accesses that follow
 the comparison. So exactly how long ago is "a long
 time ago"? Here are some possibilities:
-o Compile time.
-o Boot time.
-o Module-init time for module code.
-o Prior to kthread creation for kthread code.
-o During some prior acquisition of the lock that
+- Compile time.
+- Boot time.
+- Module-init time for module code.
+- Prior to kthread creation for kthread code.
+- During some prior acquisition of the lock that
 we now hold.
-o Before mod_timer() time for a timer handler.
+- Before mod_timer() time for a timer handler.
 There are many other possibilities involving the Linux
 kernel's wide array of primitives that cause code to
 be invoked at a later time.
-o The pointer being compared against also came from
+- The pointer being compared against also came from
 rcu_dereference(). In this case, both pointers depend
 on one rcu_dereference() or another, so you get proper
 ordering either way.
@@ -159,13 +162,13 @@ o Be very careful about comparing pointers obtained from
 of such an RCU usage bug is shown in the section titled
 "EXAMPLE OF AMPLIFIED RCU-USAGE BUG".
-o All of the accesses following the comparison are stores,
+- All of the accesses following the comparison are stores,
 so that a control dependency preserves the needed ordering.
 That said, it is easy to get control dependencies wrong.
 Please see the "CONTROL DEPENDENCIES" section of
 Documentation/memory-barriers.txt for more details.
-o The pointers are not equal -and- the compiler does
+- The pointers are not equal -and- the compiler does
 not have enough information to deduce the value of the
 pointer. Note that the volatile cast in rcu_dereference()
 will normally prevent the compiler from knowing too much.
@@ -175,7 +178,7 @@ o Be very careful about comparing pointers obtained from
 comparison will provide exactly the information that the
 compiler needs to deduce the value of the pointer.
-o Disable any value-speculation optimizations that your compiler
+- Disable any value-speculation optimizations that your compiler
 might provide, especially if you are making use of feedback-based
 optimizations that take data collected from prior runs. Such
 value-speculation optimizations reorder operations by design.
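The uintptr_t exception in the rules above covers setting and clearing must-be-zero low-order bits. A minimal userspace sketch of that exception could look like the following. It assumes C11 atomics, with an acquire load standing in for rcu_dereference(). The names struct node, NODE_MARK, publish() and load_head() are hypothetical. The value is loaded as a pointer and viewed as uintptr_t only long enough to extract and clear the flag bit. It is then cast back to a pointer before being dereferenced. Masking off one bit does not cancel the value, so the result still depends on the loaded pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct node {
	int key;
};

/* Hypothetical flag kept in bit 0, which alignment guarantees is zero. */
#define NODE_MARK ((uintptr_t)1)
_Static_assert(_Alignof(struct node) >= 2, "bit 0 of a node pointer must be free");

/* Stored as void * so a tagged value is never a misaligned struct node *. */
static _Atomic(void *) head;

static void publish(struct node *n, int marked)
{
	uintptr_t bits = (uintptr_t)n | (marked ? NODE_MARK : 0);

	atomic_store_explicit(&head, (void *)bits, memory_order_release);
}

/* Returns the untagged node pointer and reports the flag through *marked. */
static struct node *load_head(int *marked)
{
	void *raw = atomic_load_explicit(&head, memory_order_acquire);
	uintptr_t bits = (uintptr_t)raw;	/* brief integer view only */

	*marked = (int)(bits & NODE_MARK);
	/* Cast back to a pointer before doing anything else with it. */
	return (struct node *)(bits & ~NODE_MARK);
}
```

This works only because struct node is at least 2-byte aligned, which is what the _Static_assert checks. As the rule says, the same trick does not work in general for char * pointers.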
@@ -188,11 +191,12 @@ o Disable any value-speculation optimizations that your compiler
 EXAMPLE OF AMPLIFIED RCU-USAGE BUG
+----------------------------------
 Because updaters can run concurrently with RCU readers, RCU readers can
 see stale and/or inconsistent values. If RCU readers need fresh or
 consistent values, which they sometimes do, they need to take proper
-precautions. To see this, consider the following code fragment:
+precautions. To see this, consider the following code fragment::
 struct foo {
 int a;
@@ -244,7 +248,7 @@ to some reordering from the compiler and CPUs is beside the point.
 But suppose that the reader needs a consistent view?
-Then one approach is to use locking, for example, as follows:
+Then one approach is to use locking, for example, as follows::
 struct foo {
 int a;
@@ -299,6 +303,7 @@ As always, use the right tool for the job!
 EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
+-----------------------------------------
 If a pointer obtained from rcu_dereference() compares not-equal to some
 other pointer, the compiler normally has no clue what the value of the
@@ -308,7 +313,7 @@ guarantees that RCU depends on. And the volatile cast in rcu_dereference()
 should prevent the compiler from guessing the value.
 But without rcu_dereference(), the compiler knows more than you might
-expect. Consider the following code fragment:
+expect. Consider the following code fragment::
 struct foo {
 int a;
@@ -354,6 +359,7 @@ dereference the resulting pointer.
 WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
+------------------------------------------------------------
 First, please avoid using rcu_dereference_raw() and also please avoid
 using rcu_dereference_check() and rcu_dereference_protected() with a
@@ -370,7 +376,7 @@ member of the rcu_dereference() to use in various situations:
 2. If the access might be within an RCU read-side critical section
 on the one hand, or protected by (say) my_lock on the other,
-use rcu_dereference_check(), for example:
+use rcu_dereference_check(), for example::
 p1 = rcu_dereference_check(p->rcu_protected_pointer,
 lockdep_is_held(&my_lock));
@@ -378,14 +384,14 @@ member of the rcu_dereference() to use in various situations:
 3. If the access might be within an RCU read-side critical section
 on the one hand, or protected by either my_lock or your_lock on
-the other, again use rcu_dereference_check(), for example:
+the other, again use rcu_dereference_check(), for example::
 p1 = rcu_dereference_check(p->rcu_protected_pointer,
 lockdep_is_held(&my_lock) ||
 lockdep_is_held(&your_lock));
 4. If the access is on the update side, so that it is always protected
-by my_lock, use rcu_dereference_protected():
+by my_lock, use rcu_dereference_protected()::
 p1 = rcu_dereference_protected(p->rcu_protected_pointer,
 lockdep_is_held(&my_lock));
...@@ -410,18 +416,19 @@ member of the rcu_dereference() to use in various situations: ...@@ -410,18 +416,19 @@ member of the rcu_dereference() to use in various situations:
SPARSE CHECKING OF RCU-PROTECTED POINTERS SPARSE CHECKING OF RCU-PROTECTED POINTERS
-----------------------------------------
The sparse static-analysis tool checks for direct access to RCU-protected The sparse static-analysis tool checks for direct access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing. optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this: For example, suppose someone mistakenly does something like this::
p = q->rcu_protected_pointer; p = q->rcu_protected_pointer;
do_something_with(p->a); do_something_with(p->a);
do_something_else_with(p->b); do_something_else_with(p->b);
If register pressure is high, the compiler might optimize "p" out If register pressure is high, the compiler might optimize "p" out
of existence, transforming the code to something like this: of existence, transforming the code to something like this::
do_something_with(q->rcu_protected_pointer->a); do_something_with(q->rcu_protected_pointer->a);
do_something_else_with(q->rcu_protected_pointer->b); do_something_else_with(q->rcu_protected_pointer->b);
...@@ -435,7 +442,7 @@ Load tearing could of course result in dereferencing a mashup of a pair ...@@ -435,7 +442,7 @@ Load tearing could of course result in dereferencing a mashup of a pair
of pointers, which also might fatally disappoint your code. of pointers, which also might fatally disappoint your code.
These problems could have been avoided simply by making the code instead These problems could have been avoided simply by making the code instead
read as follows: read as follows::
p = rcu_dereference(q->rcu_protected_pointer); p = rcu_dereference(q->rcu_protected_pointer);
do_something_with(p->a); do_something_with(p->a);
...@@ -448,7 +455,7 @@ or as a formal parameter, with "__rcu", which tells sparse to complain if ...@@ -448,7 +455,7 @@ or as a formal parameter, with "__rcu", which tells sparse to complain if
this pointer is accessed directly. It will also cause sparse to complain this pointer is accessed directly. It will also cause sparse to complain
if a pointer not marked with "__rcu" is accessed using rcu_dereference() if a pointer not marked with "__rcu" is accessed using rcu_dereference()
and friends. For example, ->rcu_protected_pointer might be declared as and friends. For example, ->rcu_protected_pointer might be declared as
follows: follows::
struct foo __rcu *rcu_protected_pointer; struct foo __rcu *rcu_protected_pointer;
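The following user-space analogy in C shows the "invented loads" hazard in concrete form. It is only a sketch under stated assumptions. It uses C11 <stdatomic.h> rather than the kernel's rcu_assign_pointer() and rcu_dereference(), and struct holder, new_foo(), publish_foo(), and sum_fields() are hypothetical names. In the sketch, the reader loads the shared pointer exactly once into a local variable and then uses only that local.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/*
 * User-space analogy only (hypothetical names): C11 atomics standing in
 * for rcu_assign_pointer() and rcu_dereference().  Not kernel code.
 */
struct foo {
	int a;
	int b;
};

struct holder {
	_Atomic(struct foo *) rcu_protected_pointer;
};

/* Fully initialize the new element before anyone can see it. */
static struct foo *new_foo(int a, int b)
{
	struct foo *p = malloc(sizeof(*p));

	assert(p != NULL);	/* sketch: no real error handling */
	p->a = a;
	p->b = b;
	return p;
}

/* Publish with a release store, roughly the ordering rcu_assign_pointer() gives. */
static void publish_foo(struct holder *h, struct foo *p)
{
	atomic_store_explicit(&h->rcu_protected_pointer, p,
			      memory_order_release);
}

/*
 * Reader: a single atomic load into the local "p", then only "p" is used,
 * so both field accesses see the same pointer value and the load cannot
 * be torn.
 */
static int sum_fields(struct holder *h)
{
	struct foo *p = atomic_load_explicit(&h->rcu_protected_pointer,
					     memory_order_consume);

	if (p == NULL)
		return -1;
	return p->a + p->b;	/* both fields come from the same snapshot */
}
```

Because p is obtained by one atomic load, the two failure modes described above (re-loading ->rcu_protected_pointer and load tearing) cannot occur in sum_fields(). GCC and Clang currently implement memory_order_consume as memory_order_acquire. As a result, the sketch orders more strongly, and possibly more expensively, than rcu_dereference() does on most architectures.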
...
+.. _rcu_barrier:
+
 RCU and Unloadable Modules
+==========================

 [Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]
@@ -21,7 +24,7 @@ given that readers might well leave absolutely no trace of their
 presence? There is a synchronize_rcu() primitive that blocks until all
 pre-existing readers have completed. An updater wishing to delete an
 element p from a linked list might do the following, while holding an
-appropriate lock, of course:
+appropriate lock, of course::

     list_del_rcu(p);
     synchronize_rcu();
@@ -32,13 +35,13 @@ primitive must be used instead. This primitive takes a pointer to an
 rcu_head struct placed within the RCU-protected data structure and
 another pointer to a function that may be invoked later to free that
 structure. Code to delete an element p from the linked list from IRQ
-context might then be as follows:
+context might then be as follows::

     list_del_rcu(p);
     call_rcu(&p->rcu, p_callback);

 Since call_rcu() never blocks, this code can safely be used from within
-IRQ context. The function p_callback() might be defined as follows:
+IRQ context. The function p_callback() might be defined as follows::

     static void p_callback(struct rcu_head *rp)
     {
@@ -49,6 +52,7 @@ IRQ context. The function p_callback() might be defined as follows:


 Unloading Modules That Use call_rcu()
+-------------------------------------

 But what if p_callback is defined in an unloadable module?
@@ -69,10 +73,11 @@ in realtime kernels in order to avoid excessive scheduling latencies.


 rcu_barrier()
+-------------

 We instead need the rcu_barrier() primitive. Rather than waiting for
 a grace period to elapse, rcu_barrier() waits for all outstanding RCU
-callbacks to complete. Please note that rcu_barrier() does -not- imply
+callbacks to complete. Please note that rcu_barrier() does **not** imply
 synchronize_rcu(), in particular, if there are no RCU callbacks queued
 anywhere, rcu_barrier() is within its rights to return immediately,
 without waiting for a grace period to elapse.
@@ -88,79 +93,79 @@ must match the flavor of rcu_barrier() with that of call_rcu(). If your
 module uses multiple flavors of call_rcu(), then it must also use multiple
 flavors of rcu_barrier() when unloading that module. For example, if
 it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on
-srcu_struct_2(), then the following three lines of code will be required
-when unloading:
+srcu_struct_2, then the following three lines of code will be required
+when unloading::

     1 rcu_barrier();
     2 srcu_barrier(&srcu_struct_1);
     3 srcu_barrier(&srcu_struct_2);

 The rcutorture module makes use of rcu_barrier() in its exit function
-as follows:
+as follows::

      1 static void
      2 rcu_torture_cleanup(void)
      3 {
      4   int i;
      5
      6   fullstop = 1;
      7   if (shuffler_task != NULL) {
      8     VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
      9     kthread_stop(shuffler_task);
     10   }
     11   shuffler_task = NULL;
     12
     13   if (writer_task != NULL) {
     14     VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
     15     kthread_stop(writer_task);
     16   }
     17   writer_task = NULL;
     18
     19   if (reader_tasks != NULL) {
     20     for (i = 0; i < nrealreaders; i++) {
     21       if (reader_tasks[i] != NULL) {
     22         VERBOSE_PRINTK_STRING(
     23           "Stopping rcu_torture_reader task");
     24         kthread_stop(reader_tasks[i]);
     25       }
     26       reader_tasks[i] = NULL;
     27     }
     28     kfree(reader_tasks);
     29     reader_tasks = NULL;
     30   }
     31   rcu_torture_current = NULL;
     32
     33   if (fakewriter_tasks != NULL) {
     34     for (i = 0; i < nfakewriters; i++) {
     35       if (fakewriter_tasks[i] != NULL) {
     36         VERBOSE_PRINTK_STRING(
     37           "Stopping rcu_torture_fakewriter task");
     38         kthread_stop(fakewriter_tasks[i]);
     39       }
     40       fakewriter_tasks[i] = NULL;
     41     }
     42     kfree(fakewriter_tasks);
     43     fakewriter_tasks = NULL;
     44   }
     45
     46   if (stats_task != NULL) {
     47     VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
     48     kthread_stop(stats_task);
     49   }
     50   stats_task = NULL;
     51
     52   /* Wait for all RCU callbacks to fire. */
     53   rcu_barrier();
     54
     55   rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
     56
     57   if (cur_ops->cleanup != NULL)
     58     cur_ops->cleanup();
     59   if (atomic_read(&n_rcu_torture_error))
     60     rcu_torture_print_module_parms("End of test: FAILURE");
     61   else
     62     rcu_torture_print_module_parms("End of test: SUCCESS");
     63 }

 Line 6 sets a global variable that prevents any RCU callbacks from
 re-posting themselves. This will not be necessary in most cases, since
@@ -176,9 +181,14 @@ for any pre-existing callbacks to complete.
 Then lines 55-62 print status and do operation-specific cleanup, and
 then return, permitting the module-unload operation to be completed.

-Quick Quiz #1: Is there any other situation where rcu_barrier() might
+.. _rcubarrier_quiz_1:
+
+Quick Quiz #1:
+    Is there any other situation where rcu_barrier() might
     be required?

+:ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>`
+
 Your module might have additional complications. For example, if your
 module invokes call_rcu() from timers, you will need to first cancel all
 the timers, and only then invoke rcu_barrier() to wait for any remaining
@@ -188,11 +198,12 @@ Of course, if you module uses call_rcu(), you will need to invoke
 rcu_barrier() before unloading. Similarly, if your module uses
 call_srcu(), you will need to invoke srcu_barrier() before unloading,
 and on the same srcu_struct structure. If your module uses call_rcu()
--and- call_srcu(), then you will need to invoke rcu_barrier() -and-
+**and** call_srcu(), then you will need to invoke rcu_barrier() **and**
 srcu_barrier().


 Implementing rcu_barrier()
+--------------------------

 Dipankar Sarma's implementation of rcu_barrier() makes use of the fact
 that RCU callbacks are never reordered once queued on one of the per-CPU
@@ -200,19 +211,19 @@ queues. His implementation queues an RCU callback on each of the per-CPU
 callback queues, and then waits until they have all started executing, at
 which point, all earlier RCU callbacks are guaranteed to have completed.

-The original code for rcu_barrier() was as follows:
+The original code for rcu_barrier() was as follows::

      1 void rcu_barrier(void)
      2 {
      3   BUG_ON(in_interrupt());
      4   /* Take cpucontrol mutex to protect against CPU hotplug */
      5   mutex_lock(&rcu_barrier_mutex);
      6   init_completion(&rcu_barrier_completion);
      7   atomic_set(&rcu_barrier_cpu_count, 0);
      8   on_each_cpu(rcu_barrier_func, NULL, 0, 1);
      9   wait_for_completion(&rcu_barrier_completion);
     10   mutex_unlock(&rcu_barrier_mutex);
     11 }

 Line 3 verifies that the caller is in process context, and lines 5 and 10
 use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
@@ -226,18 +237,18 @@ This code was rewritten in 2008 and several times thereafter, but this
 still gives the general idea.

 The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
-to post an RCU callback, as follows:
+to post an RCU callback, as follows::

      1 static void rcu_barrier_func(void *notused)
      2 {
      3   int cpu = smp_processor_id();
      4   struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
      5   struct rcu_head *head;
      6
      7   head = &rdp->barrier;
      8   atomic_inc(&rcu_barrier_cpu_count);
      9   call_rcu(head, rcu_barrier_callback);
     10 }

 Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
 which contains the struct rcu_head that needed for the later call to
@@ -248,20 +259,25 @@ the current CPU's queue.

 The rcu_barrier_callback() function simply atomically decrements the
 rcu_barrier_cpu_count variable and finalizes the completion when it
-reaches zero, as follows:
+reaches zero, as follows::

     1 static void rcu_barrier_callback(struct rcu_head *notused)
     2 {
     3   if (atomic_dec_and_test(&rcu_barrier_cpu_count))
     4     complete(&rcu_barrier_completion);
     5 }

-Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
+.. _rcubarrier_quiz_2:
+
+Quick Quiz #2:
+    What happens if CPU 0's rcu_barrier_func() executes
     immediately (thus incrementing rcu_barrier_cpu_count to the
     value one), but the other CPU's rcu_barrier_func() invocations
     are delayed for a full grace period? Couldn't this result in
     rcu_barrier() returning prematurely?

+:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`
+
 The current rcu_barrier() implementation is more complex, due to the need
 to avoid disturbing idle CPUs (especially on battery-powered systems)
 and the need to minimally disturb non-idle CPUs in real-time systems.
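A small single-threaded C model can illustrate the FIFO property that the original implementation relies on. It is a hypothetical sketch, not kernel code:

- sim_call_rcu() stands in for call_rcu().
- sim_rcu_barrier_post() stands in for running rcu_barrier_func() on every CPU.
- sim_invoke_callbacks() stands in for callback invocation after a grace period.
- A flag replaces the completion.

Each simulated per-CPU queue is strictly FIFO. As a result, the last barrier callback can only run after every callback queued ahead of it on every CPU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the rcu_barrier() idea:
 * per-CPU FIFO callback queues whose entries are never reordered.
 * Names and sizes are illustrative, not kernel APIs.
 */
#define SIM_NR_CPUS 4
#define SIM_QLEN    16

typedef void (*sim_cb_t)(void);

struct sim_queue {
	sim_cb_t cbs[SIM_QLEN];
	int head;	/* next callback to invoke */
	int tail;	/* next free slot */
};

static struct sim_queue sim_q[SIM_NR_CPUS];
static int sim_barrier_cpu_count;
static bool sim_barrier_done;		/* stands in for the completion */
static int sim_cbs_invoked;		/* ordinary callbacks run so far */
static int sim_cbs_at_completion;	/* value seen when the barrier completes */

/* Stand-in for call_rcu(): append to this CPU's queue in FIFO order. */
static void sim_call_rcu(int cpu, sim_cb_t cb)
{
	struct sim_queue *q = &sim_q[cpu];

	assert((q->tail + 1) % SIM_QLEN != q->head);	/* queue must not be full */
	q->cbs[q->tail] = cb;
	q->tail = (q->tail + 1) % SIM_QLEN;
}

static void sim_ordinary_cb(void)
{
	sim_cbs_invoked++;
}

/* Like rcu_barrier_callback(): the last one to run signals completion. */
static void sim_barrier_cb(void)
{
	if (--sim_barrier_cpu_count == 0) {
		sim_barrier_done = true;
		sim_cbs_at_completion = sim_cbs_invoked;
	}
}

/* Like rcu_barrier_func() on each CPU: count the CPU, then queue a barrier callback. */
static void sim_rcu_barrier_post(void)
{
	int cpu;

	sim_barrier_done = false;
	sim_barrier_cpu_count = 0;
	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		sim_barrier_cpu_count++;
		sim_call_rcu(cpu, sim_barrier_cb);
	}
}

/* Pretend a grace period ended: invoke every queued callback in order. */
static void sim_invoke_callbacks(void)
{
	int cpu;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		struct sim_queue *q = &sim_q[cpu];

		while (q->head != q->tail) {
			sim_cb_t cb = q->cbs[q->head];

			q->head = (q->head + 1) % SIM_QLEN;
			cb();
		}
	}
}
```

For example, suppose sim_ordinary_cb() is queued on CPUs 0, 2, and 3, and then sim_rcu_barrier_post() and sim_invoke_callbacks() are called in that order. This leaves sim_barrier_done set and sim_cbs_at_completion equal to 3. This model increments the counter for every CPU before any callback can run. It therefore sidesteps the premature-completion scenario raised in Quick Quiz #2 rather than demonstrating how the real code avoids it.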
@@ -269,6 +285,7 @@ However, the code above illustrates the concepts.


 rcu_barrier() Summary
+---------------------

 The rcu_barrier() primitive has seen relatively little use, since most
 code using RCU is in the core kernel rather than in modules. However, if
@@ -277,8 +294,12 @@ so that your module may be safely unloaded.


 Answers to Quick Quizzes
+------------------------

-Quick Quiz #1: Is there any other situation where rcu_barrier() might
+.. _answer_rcubarrier_quiz_1:
+
+Quick Quiz #1:
+    Is there any other situation where rcu_barrier() might
     be required?

 Answer: Interestingly enough, rcu_barrier() was not originally
@@ -292,7 +313,12 @@ Answer: Interestingly enough, rcu_barrier() was not originally
         implementing rcutorture, and found that rcu_barrier() solves
         this problem as well.

-Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
+:ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>`
+
+.. _answer_rcubarrier_quiz_2:
+
+Quick Quiz #2:
+    What happens if CPU 0's rcu_barrier_func() executes
     immediately (thus incrementing rcu_barrier_cpu_count to the
     value one), but the other CPU's rcu_barrier_func() invocations
     are delayed for a full grace period? Couldn't this result in
@@ -323,3 +349,5 @@ Answer: This cannot happen. The reason is that on_each_cpu() has its last
         is to add an rcu_read_lock() before line 8 of rcu_barrier()
         and an rcu_read_unlock() after line 8 of this same function. If
         you can think of a better change, please let me know!
+
+:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`
@@ -225,18 +225,13 @@ an estimate of the total number of RCU callbacks queued across all CPUs
 In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
 for each CPU:

-    0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
+    0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1

 The "last_accelerate:" prints the low-order 16 bits (in hex) of the
 jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
 from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
-rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback
-status, so that an "l" indicates that all callbacks were lazy at the start
-of the last idle period and an "L" indicates that there are currently
-no non-lazy callbacks (in both cases, "." is printed otherwise, as
-shown above) and "D" indicates that dyntick-idle processing is enabled
-("." is printed otherwise, for example, if disabled via the "nohz="
-kernel boot parameter).
+rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle
+processing is enabled.

 If the grace period ends just as the stall warning starts printing,
 there will be a spurious stall-warning message, which will include
...
+.. _whatisrcu_doc:
+
 What is RCU? -- "Read, Copy, Update"
+======================================

 Please note that the "What is RCU?" LWN series is an excellent place
 to start learning about RCU:

-1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
-2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
-3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
-4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
-   2010 Big API Table http://lwn.net/Articles/419086/
-5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
-   2014 Big API Table http://lwn.net/Articles/609973/
+| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
+| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
+| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
+| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
+|    2010 Big API Table http://lwn.net/Articles/419086/
+| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
+|    2014 Big API Table http://lwn.net/Articles/609973/


 What is RCU?
@@ -24,14 +27,21 @@ the experience has been that different people must take different paths
 to arrive at an understanding of RCU. This document provides several
 different paths, as follows:

-1. RCU OVERVIEW
-2. WHAT IS RCU'S CORE API?
-3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
-5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
-6. ANALOGY WITH READER-WRITER LOCKING
-7. FULL LIST OF RCU APIs
-8. ANSWERS TO QUICK QUIZZES
+:ref:`1. RCU OVERVIEW <1_whatisRCU>`
+
+:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`
+
+:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`
+
+:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`
+
+:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`
+
+:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
+
+:ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>`
+
+:ref:`8. ANSWERS TO QUICK QUIZZES <8_whatisRCU>`

 People who prefer starting with a conceptual overview should focus on
 Section 1, though most readers will profit by reading this section at
@@ -49,8 +59,10 @@ everything, feel free to read the whole thing -- but if you are really
 that type of person, you have perused the source code and will therefore
 never need this document anyway. ;-)

+.. _1_whatisRCU:

 1. RCU OVERVIEW
+----------------

 The basic idea behind RCU is to split updates into "removal" and
 "reclamation" phases. The removal phase removes references to data items
@@ -116,8 +128,10 @@ So how the heck can a reclaimer tell when a reader is done, given
 that readers are not doing any sort of synchronization operations???
 Read on to learn about how RCU's API makes this easy.

+.. _2_whatisRCU:

 2. WHAT IS RCU'S CORE API?
+---------------------------

 The core RCU API is quite small:
@@ -136,7 +150,7 @@ later. See the kernel docbook documentation for more info, or look directly
 at the function header comments.

 rcu_read_lock()
-
+^^^^^^^^^^^^^^^
     void rcu_read_lock(void);

     Used by a reader to inform the reclaimer that the reader is
@@ -150,7 +164,7 @@ rcu_read_lock()
     longer-term references to data structures.

 rcu_read_unlock()
-
+^^^^^^^^^^^^^^^^^
     void rcu_read_unlock(void);

     Used by a reader to inform the reclaimer that the reader is
@@ -158,15 +172,15 @@ rcu_read_unlock()
     read-side critical sections may be nested and/or overlapping.

 synchronize_rcu()
-
+^^^^^^^^^^^^^^^^^
     void synchronize_rcu(void);

     Marks the end of updater code and the beginning of reclaimer
     code. It does this by blocking until all pre-existing RCU
     read-side critical sections on all CPUs have completed.
-    Note that synchronize_rcu() will -not- necessarily wait for
+    Note that synchronize_rcu() will **not** necessarily wait for
     any subsequent RCU read-side critical sections to complete.
-    For example, consider the following sequence of events:
+    For example, consider the following sequence of events::

                  CPU 0                  CPU 1                 CPU 2
              ----------------- ------------------------- ---------------
@@ -182,7 +196,7 @@ synchronize_rcu()
     any that begin after synchronize_rcu() is invoked.

     Of course, synchronize_rcu() does not necessarily return
-    -immediately- after the last pre-existing RCU read-side critical
+    **immediately** after the last pre-existing RCU read-side critical
     section completes. For one thing, there might well be scheduling
     delays. For another thing, many RCU implementations process
     requests in batches in order to improve efficiencies, which can
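Per-CPU sequence counters can model the rule that synchronize_rcu() waits for pre-existing readers but not for readers that start later. The following is a hypothetical single-threaded sketch. toy_rcu_read_lock(), toy_rcu_read_unlock(), toy_gp_start(), and toy_gp_done() are invented names. The model has no nesting, no real concurrency, and none of the memory barriers an actual implementation needs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model.  Each simulated CPU has a
 * sequence counter: odd means "inside an RCU read-side critical
 * section", even means "not in a reader".
 */
#define TOY_NR_CPUS 3

static unsigned long toy_seq[TOY_NR_CPUS];

static void toy_rcu_read_lock(int cpu)
{
	assert(toy_seq[cpu] % 2 == 0);	/* this toy does not support nesting */
	toy_seq[cpu]++;			/* now odd: reader active */
}

static void toy_rcu_read_unlock(int cpu)
{
	assert(toy_seq[cpu] % 2 == 1);
	toy_seq[cpu]++;			/* now even: reader finished */
}

/* Snapshot taken when the simulated synchronize_rcu() is invoked. */
struct toy_gp {
	unsigned long snap[TOY_NR_CPUS];
};

static void toy_gp_start(struct toy_gp *gp)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		gp->snap[cpu] = toy_seq[cpu];
}

/*
 * The grace period is over once every CPU that was inside a reader at
 * toy_gp_start() time has since left that reader.  Readers that began
 * afterwards are deliberately not waited for.
 */
static bool toy_gp_done(const struct toy_gp *gp)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (gp->snap[cpu] % 2 == 1 && toy_seq[cpu] == gp->snap[cpu])
			return false;
	return true;
}
```

For instance, CPU 0 enters a reader, a grace period then starts, and CPU 2 enters a reader afterwards. toy_gp_done() stays false until CPU 0 calls toy_rcu_read_unlock(0). It then becomes true even though CPU 2 is still inside its reader. A real synchronize_rcu() would block until the equivalent condition held.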
@@ -211,10 +225,10 @@ synchronize_rcu()
     checklist.txt for some approaches to limiting the update rate.

 rcu_assign_pointer()
-
+^^^^^^^^^^^^^^^^^^^^
     void rcu_assign_pointer(p, typeof(p) v);

-    Yes, rcu_assign_pointer() -is- implemented as a macro, though it
+    Yes, rcu_assign_pointer() **is** implemented as a macro, though it
     would be cool to be able to declare a function in this manner.
     (Compiler experts will no doubt disagree.)
@@ -231,7 +245,7 @@ rcu_assign_pointer()
     the _rcu list-manipulation primitives such as list_add_rcu().

 rcu_dereference()
-
+^^^^^^^^^^^^^^^^^
     typeof(p) rcu_dereference(p);

     Like rcu_assign_pointer(), rcu_dereference() must be implemented
@@ -248,13 +262,13 @@ rcu_dereference()
     Common coding practice uses rcu_dereference() to copy an
     RCU-protected pointer to a local variable, then dereferences
-    this local variable, for example as follows:
+    this local variable, for example as follows::

         p = rcu_dereference(head.next);
         return p->data;

     However, in this case, one could just as easily combine these
-    into one statement:
+    into one statement::

         return rcu_dereference(head.next)->data;
@@ -266,8 +280,8 @@ rcu_dereference()
     unnecessary overhead on Alpha CPUs.

     Note that the value returned by rcu_dereference() is valid
-    only within the enclosing RCU read-side critical section [1].
-    For example, the following is -not- legal:
+    only within the enclosing RCU read-side critical section [1]_.
+    For example, the following is **not** legal::

         rcu_read_lock();
         p = rcu_dereference(head.next);
@@ -290,9 +304,9 @@ rcu_dereference()
     at any time, including immediately after the rcu_dereference().
     And, again like rcu_assign_pointer(), rcu_dereference() is
     typically used indirectly, via the _rcu list-manipulation
-    primitives, such as list_for_each_entry_rcu() [2].
+    primitives, such as list_for_each_entry_rcu() [2]_.

-    [1] The variant rcu_dereference_protected() can be used outside
+    .. [1] The variant rcu_dereference_protected() can be used outside
         of an RCU read-side critical section as long as the usage is
         protected by locks acquired by the update-side code. This variant
         avoids the lockdep warning that would happen when using (for
@@ -305,7 +319,7 @@ rcu_dereference()
         a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst
         and the API's code comments for more details and example usage.

-    [2] If the list_for_each_entry_rcu() instance might be used by
+    .. [2] If the list_for_each_entry_rcu() instance might be used by
         update-side code as well as by RCU readers, then an additional
lockdep expression can be added to its list of arguments. lockdep expression can be added to its list of arguments.
For example, given an additional "lock_is_held(&mylock)" argument, For example, given an additional "lock_is_held(&mylock)" argument,
...@@ -315,6 +329,7 @@ rcu_dereference() ...@@ -315,6 +329,7 @@ rcu_dereference()
The following diagram shows how each API communicates among the The following diagram shows how each API communicates among the
reader, updater, and reclaimer. reader, updater, and reclaimer.
::
rcu_assign_pointer() rcu_assign_pointer()
...@@ -375,12 +390,16 @@ c. RCU applied to scheduler and interrupt/NMI-handler tasks. ...@@ -375,12 +390,16 @@ c. RCU applied to scheduler and interrupt/NMI-handler tasks.
Again, most uses will be of (a). The (b) and (c) cases are important Again, most uses will be of (a). The (b) and (c) cases are important
for specialized uses, but are relatively uncommon. for specialized uses, but are relatively uncommon.
.. _3_whatisRCU:
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------
This section shows a simple use of the core RCU API to protect a This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure. More-typical global pointer to a dynamically allocated structure. More-typical
uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
:ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
::
struct foo { struct foo {
int a; int a;
...@@ -440,40 +459,43 @@ uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. ...@@ -440,40 +459,43 @@ uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
So, to sum up: So, to sum up:
o Use rcu_read_lock() and rcu_read_unlock() to guard RCU - Use rcu_read_lock() and rcu_read_unlock() to guard RCU
read-side critical sections. read-side critical sections.
o Within an RCU read-side critical section, use rcu_dereference() - Within an RCU read-side critical section, use rcu_dereference()
to dereference RCU-protected pointers. to dereference RCU-protected pointers.
o Use some solid scheme (such as locks or semaphores) to - Use some solid scheme (such as locks or semaphores) to
keep concurrent updates from interfering with each other. keep concurrent updates from interfering with each other.
o Use rcu_assign_pointer() to update an RCU-protected pointer. - Use rcu_assign_pointer() to update an RCU-protected pointer.
This primitive protects concurrent readers from the updater, This primitive protects concurrent readers from the updater,
-not- concurrent updates from each other! You therefore still **not** concurrent updates from each other! You therefore still
need to use locking (or something similar) to keep concurrent need to use locking (or something similar) to keep concurrent
rcu_assign_pointer() primitives from interfering with each other. rcu_assign_pointer() primitives from interfering with each other.
o Use synchronize_rcu() -after- removing a data element from an - Use synchronize_rcu() **after** removing a data element from an
RCU-protected data structure, but -before- reclaiming/freeing RCU-protected data structure, but **before** reclaiming/freeing
the data element, in order to wait for the completion of all the data element, in order to wait for the completion of all
RCU read-side critical sections that might be referencing that RCU read-side critical sections that might be referencing that
data item. data item.
See checklist.txt for additional rules to follow when using RCU. See checklist.txt for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.txt, And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
arrayRCU.txt, and NMI-RCU.txt. <list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
<NMI_rcu_doc>`.
.. _4_whatisRCU:
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------
In the example above, foo_update_a() blocks until a grace period elapses. In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done. long -- there might be other high-priority work to be done.
In such cases, one uses call_rcu() rather than synchronize_rcu(). In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows: The call_rcu() API is as follows::
void call_rcu(struct rcu_head * head, void call_rcu(struct rcu_head * head,
void (*func)(struct rcu_head *head)); void (*func)(struct rcu_head *head));
...@@ -481,7 +503,7 @@ The call_rcu() API is as follows: ...@@ -481,7 +503,7 @@ The call_rcu() API is as follows:
This function invokes func(head) after a grace period has elapsed. This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context, This invocation might happen from either softirq or process context,
so the function is not permitted to block. The foo struct needs to so the function is not permitted to block. The foo struct needs to
have an rcu_head structure added, perhaps as follows: have an rcu_head structure added, perhaps as follows::
struct foo { struct foo {
int a; int a;
...@@ -490,7 +512,7 @@ have an rcu_head structure added, perhaps as follows: ...@@ -490,7 +512,7 @@ have an rcu_head structure added, perhaps as follows:
struct rcu_head rcu; struct rcu_head rcu;
}; };
The foo_update_a() function might then be written as follows: The foo_update_a() function might then be written as follows::
/* /*
* Create a new struct foo that is the same as the one currently * Create a new struct foo that is the same as the one currently
...@@ -520,7 +542,7 @@ The foo_update_a() function might then be written as follows: ...@@ -520,7 +542,7 @@ The foo_update_a() function might then be written as follows:
call_rcu(&old_fp->rcu, foo_reclaim); call_rcu(&old_fp->rcu, foo_reclaim);
} }
The foo_reclaim() function might appear as follows: The foo_reclaim() function might appear as follows::
void foo_reclaim(struct rcu_head *rp) void foo_reclaim(struct rcu_head *rp)
{ {
...@@ -544,7 +566,7 @@ namely foo_reclaim(). ...@@ -544,7 +566,7 @@ namely foo_reclaim().
The summary of advice is the same as for the previous section, except The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu(): that we are now using call_rcu() rather than synchronize_rcu():
o Use call_rcu() -after- removing a data element from an - Use call_rcu() **after** removing a data element from an
RCU-protected data structure in order to register a callback RCU-protected data structure in order to register a callback
function that will be invoked after the completion of all RCU function that will be invoked after the completion of all RCU
read-side critical sections that might be referencing that read-side critical sections that might be referencing that
...@@ -552,14 +574,16 @@ o Use call_rcu() -after- removing a data element from an ...@@ -552,14 +574,16 @@ o Use call_rcu() -after- removing a data element from an
If the callback for call_rcu() is not doing anything more than calling If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu() kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback: to avoid having to write your own callback::
kfree_rcu(old_fp, rcu); kfree_rcu(old_fp, rcu);
Again, see checklist.txt for additional rules governing the use of RCU. Again, see checklist.txt for additional rules governing the use of RCU.
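The callback pattern above relies on embedding an rcu_head in struct foo and recovering the enclosing structure inside the callback. The following userspace sketch illustrates only that mechanism; `struct toy_rcu_head`, `toy_container_of()`, `toy_call_rcu()` and `toy_end_grace_period()` are hypothetical stand-ins, and the "grace period" is simulated by an explicit call rather than by waiting for readers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

/* Recover a pointer to the enclosing structure from a member pointer. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo {
	int a;
	struct toy_rcu_head rcu;        /* embedded, as in the text above */
};

static struct toy_rcu_head *toy_cb_list;  /* queued callbacks */
static int toy_reclaimed;                 /* number of foo structs freed */

/* Toy call_rcu(): only queues the callback; it never blocks. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	assert(head && func);
	head->func = func;
	head->next = toy_cb_list;
	toy_cb_list = head;
}

/* Simulated end of a grace period: invoke and drain queued callbacks. */
static void toy_end_grace_period(void)
{
	struct toy_rcu_head *head = toy_cb_list;

	toy_cb_list = NULL;
	while (head) {
		struct toy_rcu_head *next = head->next; /* save before free */

		head->func(head);
		head = next;
	}
}

/* Analogue of foo_reclaim(): find the enclosing struct foo and free it. */
static void foo_reclaim(struct toy_rcu_head *rp)
{
	struct foo *fp = toy_container_of(rp, struct foo, rcu);

	free(fp);
	toy_reclaimed++;
}
```

The next pointer is saved before each callback runs because the callback frees the memory that contains it.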
.. _5_whatisRCU:
5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------
One of the nice things about RCU is that it has extremely simple "toy" One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the implementations that are a good first step towards understanding the
...@@ -579,7 +603,7 @@ more details on the current implementation as of early 2004. ...@@ -579,7 +603,7 @@ more details on the current implementation as of early 2004.
5A. "TOY" IMPLEMENTATION #1: LOCKING 5A. "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on This section presents a "toy" RCU implementation that is based on
familiar locking primitives. Its overhead makes it a non-starter for familiar locking primitives. Its overhead makes it a non-starter for
real-life use, as does its lack of scalability. It is also unsuitable real-life use, as does its lack of scalability. It is also unsuitable
...@@ -591,7 +615,7 @@ you allow nested rcu_read_lock() calls, you can deadlock. ...@@ -591,7 +615,7 @@ you allow nested rcu_read_lock() calls, you can deadlock.
However, it is probably the easiest implementation to relate to, so is However, it is probably the easiest implementation to relate to, so is
a good starting point. a good starting point.
It is extremely simple: It is extremely simple::
static DEFINE_RWLOCK(rcu_gp_mutex); static DEFINE_RWLOCK(rcu_gp_mutex);
...@@ -614,7 +638,7 @@ It is extremely simple: ...@@ -614,7 +638,7 @@ It is extremely simple:
[You can ignore rcu_assign_pointer() and rcu_dereference() without missing [You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much. But here are simplified versions anyway. And whatever you do, much. But here are simplified versions anyway. And whatever you do,
don't forget about them when submitting patches making use of RCU!] don't forget about them when submitting patches making use of RCU!]::
#define rcu_assign_pointer(p, v) \ #define rcu_assign_pointer(p, v) \
({ \ ({ \
...@@ -647,18 +671,23 @@ that the only thing that can block rcu_read_lock() is a synchronize_rcu(). ...@@ -647,18 +671,23 @@ that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle. so there can be no deadlock cycle.
Quick Quiz #1: Why is this argument naive? How could a deadlock .. _quiz_1:
Quick Quiz #1:
Why is this argument naive? How could a deadlock
occur when using this algorithm in a real-world Linux occur when using this algorithm in a real-world Linux
kernel? How could this deadlock be avoided? kernel? How could this deadlock be avoided?
:ref:`Answers to Quick Quiz <8_whatisRCU>`
5B. "TOY" EXAMPLE #2: CLASSIC RCU 5B. "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on This section presents a "toy" RCU implementation that is based on
"classic RCU". It is also short on performance (but only for updates) and "classic RCU". It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
kernels. The definitions of rcu_dereference() and rcu_assign_pointer() kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted. are the same as those shown in the preceding section, so they are omitted.
::
void rcu_read_lock(void) { } void rcu_read_lock(void) { }
...@@ -683,14 +712,14 @@ CPU in turn. The run_on() primitive can be implemented straightforwardly ...@@ -683,14 +712,14 @@ CPU in turn. The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive. Of course, a somewhat less in terms of the sched_setaffinity() primitive. Of course, a somewhat less
"toy" implementation would restore the affinity upon completion rather "toy" implementation would restore the affinity upon completion rather
than just leaving all tasks running on the last CPU, but when I said than just leaving all tasks running on the last CPU, but when I said
"toy", I meant -toy-! "toy", I meant **toy**!
So how the heck is this supposed to work??? So how the heck is this supposed to work???
Remember that it is illegal to block while in an RCU read-side critical Remember that it is illegal to block while in an RCU read-side critical
section. Therefore, if a given CPU executes a context switch, we know section. Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections. that it must have completed all preceding RCU read-side critical sections.
Once -all- CPUs have executed a context switch, then -all- preceding Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed. RCU read-side critical sections will have completed.
So, suppose that we remove a data item from its structure and then invoke So, suppose that we remove a data item from its structure and then invoke
...@@ -698,19 +727,32 @@ synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed ...@@ -698,19 +727,32 @@ synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it. to that data item, so we can safely reclaim it.
Quick Quiz #2: Give an example where Classic RCU's read-side .. _quiz_2:
overhead is -negative-.
Quick Quiz #2:
Give an example where Classic RCU's read-side
overhead is **negative**.
:ref:`Answers to Quick Quiz <8_whatisRCU>`
Quick Quiz #3: If it is illegal to block in an RCU read-side .. _quiz_3:
Quick Quiz #3:
If it is illegal to block in an RCU read-side
critical section, what the heck do you do in critical section, what the heck do you do in
PREEMPT_RT, where normal spinlocks can block??? PREEMPT_RT, where normal spinlocks can block???
:ref:`Answers to Quick Quiz <8_whatisRCU>`
.. _6_whatisRCU:
6. ANALOGY WITH READER-WRITER LOCKING 6. ANALOGY WITH READER-WRITER LOCKING
--------------------------------------
Although RCU can be used in many different ways, a very common use of Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking. The following unified RCU is analogous to reader-writer locking. The following unified
diff shows how closely related RCU and reader-writer locking can be. diff shows how closely related RCU and reader-writer locking can be.
::
@@ -5,5 +5,5 @@ struct el { @@ -5,5 +5,5 @@ struct el {
int data; int data;
...@@ -762,7 +804,7 @@ diff shows how closely related RCU and reader-writer locking can be. ...@@ -762,7 +804,7 @@ diff shows how closely related RCU and reader-writer locking can be.
return 0; return 0;
} }
Or, for those who prefer a side-by-side listing: Or, for those who prefer a side-by-side listing::
1 struct el { 1 struct el { 1 struct el { 1 struct el {
2 struct list_head list; 2 struct list_head list; 2 struct list_head list; 2 struct list_head list;
...@@ -774,40 +816,44 @@ Or, for those who prefer a side-by-side listing: ...@@ -774,40 +816,44 @@ Or, for those who prefer a side-by-side listing:
8 rwlock_t listmutex; 8 spinlock_t listmutex; 8 rwlock_t listmutex; 8 spinlock_t listmutex;
9 struct el head; 9 struct el head; 9 struct el head; 9 struct el head;
1 int search(long key, int *result) 1 int search(long key, int *result) ::
2 { 2 {
3 struct list_head *lp; 3 struct list_head *lp; 1 int search(long key, int *result) 1 int search(long key, int *result)
4 struct el *p; 4 struct el *p; 2 { 2 {
5 5 3 struct list_head *lp; 3 struct list_head *lp;
6 read_lock(&listmutex); 6 rcu_read_lock(); 4 struct el *p; 4 struct el *p;
7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { 5 5
8 if (p->key == key) { 8 if (p->key == key) { 6 read_lock(&listmutex); 6 rcu_read_lock();
9 *result = p->data; 9 *result = p->data; 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
10 read_unlock(&listmutex); 10 rcu_read_unlock(); 8 if (p->key == key) { 8 if (p->key == key) {
11 return 1; 11 return 1; 9 *result = p->data; 9 *result = p->data;
12 } 12 } 10 read_unlock(&listmutex); 10 rcu_read_unlock();
13 } 13 } 11 return 1; 11 return 1;
14 read_unlock(&listmutex); 14 rcu_read_unlock(); 12 } 12 }
15 return 0; 15 return 0; 13 } 13 }
16 } 16 } 14 read_unlock(&listmutex); 14 rcu_read_unlock();
15 return 0; 15 return 0;
1 int delete(long key) 1 int delete(long key) 16 } 16 }
2 { 2 {
3 struct el *p; 3 struct el *p; ::
4 4
5 write_lock(&listmutex); 5 spin_lock(&listmutex); 1 int delete(long key) 1 int delete(long key)
6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { 2 { 2 {
7 if (p->key == key) { 7 if (p->key == key) { 3 struct el *p; 3 struct el *p;
8 list_del(&p->list); 8 list_del_rcu(&p->list); 4 4
9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
10 synchronize_rcu(); 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
10 kfree(p); 11 kfree(p); 7 if (p->key == key) { 7 if (p->key == key) {
11 return 1; 12 return 1; 8 list_del(&p->list); 8 list_del_rcu(&p->list);
12 } 13 } 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
13 } 14 } 10 synchronize_rcu();
14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); 10 kfree(p); 11 kfree(p);
15 return 0; 16 return 0; 11 return 1; 12 return 1;
16 } 17 } 12 } 13 }
13 } 14 }
14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
15 return 0; 16 return 0;
16 } 17 }
Either way, the differences are quite small. Read-side locking moves Either way, the differences are quite small. Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
...@@ -825,22 +871,27 @@ delete() can now block. If this is a problem, there is a callback-based ...@@ -825,22 +871,27 @@ delete() can now block. If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu(). be used in place of synchronize_rcu().
.. _7_whatisRCU:
7. FULL LIST OF RCU APIs 7. FULL LIST OF RCU APIs
-------------------------
The RCU APIs are documented in docbook-format header comments in the The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them APIs, since there does not appear to be a way to categorize them
in docbook. Here is the list, by category. in docbook. Here is the list, by category.
RCU list traversal: RCU list traversal::
list_entry_rcu list_entry_rcu
list_entry_lockless
list_first_entry_rcu list_first_entry_rcu
list_next_rcu list_next_rcu
list_for_each_entry_rcu list_for_each_entry_rcu
list_for_each_entry_continue_rcu list_for_each_entry_continue_rcu
list_for_each_entry_from_rcu list_for_each_entry_from_rcu
list_first_or_null_rcu
list_next_or_null_rcu
hlist_first_rcu hlist_first_rcu
hlist_next_rcu hlist_next_rcu
hlist_pprev_rcu hlist_pprev_rcu
...@@ -854,7 +905,7 @@ RCU list traversal: ...@@ -854,7 +905,7 @@ RCU list traversal:
hlist_bl_first_rcu hlist_bl_first_rcu
hlist_bl_for_each_entry_rcu hlist_bl_for_each_entry_rcu
RCU pointer/list update: RCU pointer/list update::
rcu_assign_pointer rcu_assign_pointer
list_add_rcu list_add_rcu
...@@ -864,10 +915,12 @@ RCU pointer/list update: ...@@ -864,10 +915,12 @@ RCU pointer/list update:
hlist_add_behind_rcu hlist_add_behind_rcu
hlist_add_before_rcu hlist_add_before_rcu
hlist_add_head_rcu hlist_add_head_rcu
hlist_add_tail_rcu
hlist_del_rcu hlist_del_rcu
hlist_del_init_rcu hlist_del_init_rcu
hlist_replace_rcu hlist_replace_rcu
list_splice_init_rcu() list_splice_init_rcu
list_splice_tail_init_rcu
hlist_nulls_del_init_rcu hlist_nulls_del_init_rcu
hlist_nulls_del_rcu hlist_nulls_del_rcu
hlist_nulls_add_head_rcu hlist_nulls_add_head_rcu
...@@ -876,7 +929,9 @@ RCU pointer/list update: ...@@ -876,7 +929,9 @@ RCU pointer/list update:
hlist_bl_del_rcu hlist_bl_del_rcu
hlist_bl_set_first_rcu hlist_bl_set_first_rcu
RCU: Critical sections Grace period Barrier RCU::
Critical sections Grace period Barrier
rcu_read_lock synchronize_net rcu_barrier rcu_read_lock synchronize_net rcu_barrier
rcu_read_unlock synchronize_rcu rcu_read_unlock synchronize_rcu
...@@ -885,7 +940,9 @@ RCU: Critical sections Grace period Barrier ...@@ -885,7 +940,9 @@ RCU: Critical sections Grace period Barrier
rcu_dereference_check kfree_rcu rcu_dereference_check kfree_rcu
rcu_dereference_protected rcu_dereference_protected
bh: Critical sections Grace period Barrier bh::
Critical sections Grace period Barrier
rcu_read_lock_bh call_rcu rcu_barrier rcu_read_lock_bh call_rcu rcu_barrier
rcu_read_unlock_bh synchronize_rcu rcu_read_unlock_bh synchronize_rcu
...@@ -896,7 +953,9 @@ bh: Critical sections Grace period Barrier ...@@ -896,7 +953,9 @@ bh: Critical sections Grace period Barrier
rcu_dereference_bh_protected rcu_dereference_bh_protected
rcu_read_lock_bh_held rcu_read_lock_bh_held
sched: Critical sections Grace period Barrier sched::
Critical sections Grace period Barrier
rcu_read_lock_sched call_rcu rcu_barrier rcu_read_lock_sched call_rcu rcu_barrier
rcu_read_unlock_sched synchronize_rcu rcu_read_unlock_sched synchronize_rcu
...@@ -910,7 +969,9 @@ sched: Critical sections Grace period Barrier ...@@ -910,7 +969,9 @@ sched: Critical sections Grace period Barrier
rcu_read_lock_sched_held rcu_read_lock_sched_held
SRCU: Critical sections Grace period Barrier SRCU::
Critical sections Grace period Barrier
srcu_read_lock call_srcu srcu_barrier srcu_read_lock call_srcu srcu_barrier
srcu_read_unlock synchronize_srcu srcu_read_unlock synchronize_srcu
...@@ -918,13 +979,14 @@ SRCU: Critical sections Grace period Barrier ...@@ -918,13 +979,14 @@ SRCU: Critical sections Grace period Barrier
srcu_dereference_check srcu_dereference_check
srcu_read_lock_held srcu_read_lock_held
SRCU: Initialization/cleanup SRCU: Initialization/cleanup::
DEFINE_SRCU DEFINE_SRCU
DEFINE_STATIC_SRCU DEFINE_STATIC_SRCU
init_srcu_struct init_srcu_struct
cleanup_srcu_struct cleanup_srcu_struct
All: lockdep-checked RCU-protected pointer access All: lockdep-checked RCU-protected pointer access::
rcu_access_pointer rcu_access_pointer
rcu_dereference_raw rcu_dereference_raw
...@@ -974,15 +1036,19 @@ g. Otherwise, use RCU. ...@@ -974,15 +1036,19 @@ g. Otherwise, use RCU.
Of course, this all assumes that you have determined that RCU is in fact Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job. the right tool for your job.
.. _8_whatisRCU:
8. ANSWERS TO QUICK QUIZZES 8. ANSWERS TO QUICK QUIZZES
----------------------------
Quick Quiz #1: Why is this argument naive? How could a deadlock Quick Quiz #1:
Why is this argument naive? How could a deadlock
occur when using this algorithm in a real-world Linux occur when using this algorithm in a real-world Linux
kernel? [Referring to the lock-based "toy" RCU kernel? [Referring to the lock-based "toy" RCU
algorithm.] algorithm.]
Answer: Consider the following sequence of events: Answer:
Consider the following sequence of events:
1. CPU 0 acquires some unrelated lock, call it 1. CPU 0 acquires some unrelated lock, call it
"problematic_lock", disabling irq via "problematic_lock", disabling irq via
...@@ -1021,10 +1087,14 @@ Answer: Consider the following sequence of events: ...@@ -1021,10 +1087,14 @@ Answer: Consider the following sequence of events:
approach where tasks in RCU read-side critical sections approach where tasks in RCU read-side critical sections
cannot be blocked by tasks executing synchronize_rcu(). cannot be blocked by tasks executing synchronize_rcu().
Quick Quiz #2: Give an example where Classic RCU's read-side :ref:`Back to Quick Quiz #1 <quiz_1>`
overhead is -negative-.
Quick Quiz #2:
Give an example where Classic RCU's read-side
overhead is **negative**.
Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT Answer:
Imagine a single-CPU system with a non-CONFIG_PREEMPT
kernel where a routing table is used by process-context kernel where a routing table is used by process-context
code, but can be updated by irq-context code (for example, code, but can be updated by irq-context code (for example,
by an "ICMP REDIRECT" packet). The usual way of handling by an "ICMP REDIRECT" packet). The usual way of handling
...@@ -1046,11 +1116,15 @@ Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT ...@@ -1046,11 +1116,15 @@ Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT
even the theoretical possibility of negative overhead for even the theoretical possibility of negative overhead for
a synchronization primitive is a bit unexpected. ;-) a synchronization primitive is a bit unexpected. ;-)
Quick Quiz #3: If it is illegal to block in an RCU read-side :ref:`Back to Quick Quiz #2 <quiz_2>`
Quick Quiz #3:
If it is illegal to block in an RCU read-side
critical section, what the heck do you do in critical section, what the heck do you do in
PREEMPT_RT, where normal spinlocks can block??? PREEMPT_RT, where normal spinlocks can block???
Answer: Just as PREEMPT_RT permits preemption of spinlock Answer:
Just as PREEMPT_RT permits preemption of spinlock
critical sections, it permits preemption of RCU critical sections, it permits preemption of RCU
read-side critical sections. It also permits read-side critical sections. It also permits
spinlocks blocking while in RCU read-side critical spinlocks blocking while in RCU read-side critical
...@@ -1069,6 +1143,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock ...@@ -1069,6 +1143,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
Besides, how does the computer know what pizza parlor Besides, how does the computer know what pizza parlor
the human being went to??? the human being went to???
:ref:`Back to Quick Quiz #3 <quiz_3>`
ACKNOWLEDGEMENTS ACKNOWLEDGEMENTS
......
...@@ -4001,6 +4001,19 @@ ...@@ -4001,6 +4001,19 @@
test until boot completes in order to avoid test until boot completes in order to avoid
interference. interference.
rcuperf.kfree_rcu_test= [KNL]
Set to measure performance of kfree_rcu() flooding.
rcuperf.kfree_nthreads= [KNL]
The number of threads running loops of kfree_rcu().
rcuperf.kfree_alloc_num= [KNL]
Number of allocations and frees done in an iteration.
rcuperf.kfree_loops= [KNL]
Number of loops doing rcuperf.kfree_alloc_num number
of allocations and frees.
rcuperf.nreaders= [KNL] rcuperf.nreaders= [KNL]
Set number of RCU readers. The value -1 selects Set number of RCU readers. The value -1 selects
N, where N is the number of CPUs. A value N, where N is the number of CPUs. A value
......
...@@ -18,8 +18,6 @@ ...@@ -18,8 +18,6 @@
* mb() prevents loads and stores being reordered across this point. * mb() prevents loads and stores being reordered across this point.
* rmb() prevents loads being reordered across this point. * rmb() prevents loads being reordered across this point.
* wmb() prevents stores being reordered across this point. * wmb() prevents stores being reordered across this point.
* read_barrier_depends() prevents data-dependent loads being reordered
* across this point (nop on PPC).
* *
* *mb() variants without smp_ prefix must order all types of memory * *mb() variants without smp_ prefix must order all types of memory
* operations with one another. sync is the only instruction sufficient * operations with one another. sync is the only instruction sufficient
......
...@@ -281,8 +281,8 @@ void mt76_rx_aggr_stop(struct mt76_dev *dev, struct mt76_wcid *wcid, u8 tidno) ...@@ -281,8 +281,8 @@ void mt76_rx_aggr_stop(struct mt76_dev *dev, struct mt76_wcid *wcid, u8 tidno)
{ {
struct mt76_rx_tid *tid = NULL; struct mt76_rx_tid *tid = NULL;
rcu_swap_protected(wcid->aggr[tidno], tid, tid = rcu_replace_pointer(wcid->aggr[tidno], tid,
lockdep_is_held(&dev->mutex)); lockdep_is_held(&dev->mutex));
if (tid) { if (tid) {
mt76_rx_aggr_shutdown(dev, tid); mt76_rx_aggr_shutdown(dev, tid);
kfree_rcu(tid, rcu_head); kfree_rcu(tid, rcu_head);
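The mt76 change replaces rcu_swap_protected() with rcu_replace_pointer(), which publishes the new value and returns the old one. A userspace approximation of those semantics, assuming C11 atomics, might look like the sketch below. `struct tid` and `toy_replace_pointer()` are hypothetical, and the `held` flag only stands in for the lockdep expression such as lockdep_is_held(&dev->mutex).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mt76_rx_tid. */
struct tid {
	int id;
};

/*
 * Sketch of rcu_replace_pointer() semantics:
 *  - the caller holds the update-side lock, so a relaxed load of the
 *    old value suffices (like rcu_dereference_protected());
 *  - the new value is published with release ordering
 *    (like rcu_assign_pointer());
 *  - the old value is returned for the caller to dispose of.
 */
static struct tid *toy_replace_pointer(_Atomic(struct tid *) *slot,
				       struct tid *newp, bool held)
{
	struct tid *old;

	assert(held);   /* stands in for the lockdep condition */
	old = atomic_load_explicit(slot, memory_order_relaxed);
	atomic_store_explicit(slot, newp, memory_order_release);
	return old;
}
```

In the hunk above, the returned old pointer is what the driver then shuts down and hands to kfree_rcu(), so readers still traversing it remain safe until a grace period has elapsed.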
......
...@@ -23,6 +23,13 @@ ...@@ -23,6 +23,13 @@
#define LIST_HEAD(name) \ #define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name) struct list_head name = LIST_HEAD_INIT(name)
/**
* INIT_LIST_HEAD - Initialize a list_head structure
* @list: list_head structure to be initialized.
*
* Initializes the list_head to point to itself. If it is a list header,
* the result is an empty list.
*/
static inline void INIT_LIST_HEAD(struct list_head *list) static inline void INIT_LIST_HEAD(struct list_head *list)
{ {
WRITE_ONCE(list->next, list); WRITE_ONCE(list->next, list);
...@@ -120,12 +127,6 @@ static inline void __list_del_clearprev(struct list_head *entry) ...@@ -120,12 +127,6 @@ static inline void __list_del_clearprev(struct list_head *entry)
entry->prev = NULL; entry->prev = NULL;
} }
/**
* list_del - deletes entry from list.
* @entry: the element to delete from the list.
* Note: list_empty() on entry does not return true after this, the entry is
* in an undefined state.
*/
static inline void __list_del_entry(struct list_head *entry) static inline void __list_del_entry(struct list_head *entry)
{ {
if (!__list_del_entry_valid(entry)) if (!__list_del_entry_valid(entry))
...@@ -134,6 +135,12 @@ static inline void __list_del_entry(struct list_head *entry) ...@@ -134,6 +135,12 @@ static inline void __list_del_entry(struct list_head *entry)
__list_del(entry->prev, entry->next); __list_del(entry->prev, entry->next);
} }
/**
* list_del - deletes entry from list.
* @entry: the element to delete from the list.
* Note: list_empty() on entry does not return true after this, the entry is
* in an undefined state.
*/
static inline void list_del(struct list_head *entry) static inline void list_del(struct list_head *entry)
{ {
__list_del_entry(entry); __list_del_entry(entry);
...@@ -157,8 +164,15 @@ static inline void list_replace(struct list_head *old, ...@@ -157,8 +164,15 @@ static inline void list_replace(struct list_head *old,
new->prev->next = new; new->prev->next = new;
} }
/**
* list_replace_init - replace old entry by new one and initialize the old one
* @old : the element to be replaced
* @new : the new element to insert
*
* If @old was empty, it will be overwritten.
*/
static inline void list_replace_init(struct list_head *old, static inline void list_replace_init(struct list_head *old,
struct list_head *new) struct list_head *new)
{ {
list_replace(old, new); list_replace(old, new);
INIT_LIST_HEAD(old); INIT_LIST_HEAD(old);
...@@ -754,11 +768,36 @@ static inline void INIT_HLIST_NODE(struct hlist_node *h) ...@@ -754,11 +768,36 @@ static inline void INIT_HLIST_NODE(struct hlist_node *h)
h->pprev = NULL; h->pprev = NULL;
} }
/**
* hlist_unhashed - Has node been removed from list and reinitialized?
* @h: Node to be checked
*
* Note that not all removal functions will leave a node in unhashed
* state. For example, hlist_del_init_rcu() does leave the
* node in unhashed state, but hlist_del() does not.
*/
static inline int hlist_unhashed(const struct hlist_node *h) static inline int hlist_unhashed(const struct hlist_node *h)
{ {
return !h->pprev; return !h->pprev;
} }
/**
* hlist_unhashed_lockless - Version of hlist_unhashed for lockless use
* @h: Node to be checked
*
* This variant of hlist_unhashed() must be used in lockless contexts
* to avoid potential load-tearing. The READ_ONCE() is paired with the
* various WRITE_ONCE() in hlist helpers that are defined below.
*/
static inline int hlist_unhashed_lockless(const struct hlist_node *h)
{
return !READ_ONCE(h->pprev);
}
/**
* hlist_empty - Is the specified hlist_head structure an empty hlist?
* @h: Structure to check.
*/
static inline int hlist_empty(const struct hlist_head *h) static inline int hlist_empty(const struct hlist_head *h)
{ {
return !READ_ONCE(h->first); return !READ_ONCE(h->first);
...@@ -771,9 +810,16 @@ static inline void __hlist_del(struct hlist_node *n) ...@@ -771,9 +810,16 @@ static inline void __hlist_del(struct hlist_node *n)
WRITE_ONCE(*pprev, next); WRITE_ONCE(*pprev, next);
if (next) if (next)
next->pprev = pprev; WRITE_ONCE(next->pprev, pprev);
} }
/**
* hlist_del - Delete the specified hlist_node from its list
* @n: Node to delete.
*
* Note that this function leaves the node in hashed state. Use
* hlist_del_init() or similar instead to unhash @n.
*/
static inline void hlist_del(struct hlist_node *n) static inline void hlist_del(struct hlist_node *n)
{ {
__hlist_del(n); __hlist_del(n);
...@@ -781,6 +827,12 @@ static inline void hlist_del(struct hlist_node *n) ...@@ -781,6 +827,12 @@ static inline void hlist_del(struct hlist_node *n)
n->pprev = LIST_POISON2; n->pprev = LIST_POISON2;
} }
/**
* hlist_del_init - Delete the specified hlist_node from its list and initialize
* @n: Node to delete.
*
* Note that this function leaves the node in unhashed state.
*/
static inline void hlist_del_init(struct hlist_node *n) static inline void hlist_del_init(struct hlist_node *n)
{ {
if (!hlist_unhashed(n)) { if (!hlist_unhashed(n)) {
...@@ -789,51 +841,83 @@ static inline void hlist_del_init(struct hlist_node *n) ...@@ -789,51 +841,83 @@ static inline void hlist_del_init(struct hlist_node *n)
} }
} }
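The hashed/unhashed distinction documented in these comments can be exercised outside the kernel. The following is a minimal userspace sketch, not kernel code. READ_ONCE()/WRITE_ONCE() are approximated with volatile casts, which assumes GCC/Clang `__typeof__`. The LIST_POISON1/LIST_POISON2 values are illustrative stand-ins for the kernel's poison pointers. The sketch shows that hlist_del() leaves a node looking hashed, while hlist_del_init() leaves it unhashed.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximations of the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Illustrative poison values; the kernel's differ. */
#define LIST_POISON1 ((void *)0x100)
#define LIST_POISON2 ((void *)0x122)

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void INIT_HLIST_NODE(struct hlist_node *h)
{
	h->next = NULL;
	h->pprev = NULL;
}

static int hlist_unhashed(const struct hlist_node *h)
{
	return !h->pprev;
}

/* Lockless variant: a single, untorn load of ->pprev. */
static int hlist_unhashed_lockless(const struct hlist_node *h)
{
	return !READ_ONCE(h->pprev);
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	WRITE_ONCE(n->next, first);
	if (first)
		WRITE_ONCE(first->pprev, &n->next);
	WRITE_ONCE(h->first, n);
	WRITE_ONCE(n->pprev, &h->first);
}

static void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	WRITE_ONCE(*pprev, next);
	if (next)
		WRITE_ONCE(next->pprev, pprev);
}

/* Unlinks @n but poisons it, so it still looks hashed afterwards. */
static void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}

/* Unlinks @n (if linked) and reinitializes it, leaving it unhashed. */
static void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		__hlist_del(n);
		INIT_HLIST_NODE(n);
	}
}
```

In other words, code that later tests hlist_unhashed() on a removed node should remove it with hlist_del_init() (or hlist_del_init_rcu()), not hlist_del().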
/**
* hlist_add_head - add a new entry at the beginning of the hlist
* @n: new entry to be added
* @h: hlist head to add it after
*
* Insert a new entry after the specified head.
* This is good for implementing stacks.
*/
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h) static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{ {
struct hlist_node *first = h->first; struct hlist_node *first = h->first;
n->next = first; WRITE_ONCE(n->next, first);
if (first) if (first)
first->pprev = &n->next; WRITE_ONCE(first->pprev, &n->next);
WRITE_ONCE(h->first, n); WRITE_ONCE(h->first, n);
n->pprev = &h->first; WRITE_ONCE(n->pprev, &h->first);
} }
/* next must be != NULL */ /**
* hlist_add_before - add a new entry before the one specified
* @n: new entry to be added
* @next: hlist node to add it before, which must be non-NULL
*/
static inline void hlist_add_before(struct hlist_node *n, static inline void hlist_add_before(struct hlist_node *n,
struct hlist_node *next) struct hlist_node *next)
{ {
n->pprev = next->pprev; WRITE_ONCE(n->pprev, next->pprev);
n->next = next; WRITE_ONCE(n->next, next);
next->pprev = &n->next; WRITE_ONCE(next->pprev, &n->next);
WRITE_ONCE(*(n->pprev), n); WRITE_ONCE(*(n->pprev), n);
} }
/**
* hlist_add_behind - add a new entry after the one specified
* @n: new entry to be added
* @prev: hlist node to add it after, which must be non-NULL
*/
static inline void hlist_add_behind(struct hlist_node *n, static inline void hlist_add_behind(struct hlist_node *n,
struct hlist_node *prev) struct hlist_node *prev)
{ {
n->next = prev->next; WRITE_ONCE(n->next, prev->next);
prev->next = n; WRITE_ONCE(prev->next, n);
n->pprev = &prev->next; WRITE_ONCE(n->pprev, &prev->next);
if (n->next) if (n->next)
n->next->pprev = &n->next; WRITE_ONCE(n->next->pprev, &n->next);
} }
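The three insertion helpers above differ only in which existing link they splice into. The userspace sketch below reuses their bodies, with a volatile-cast stand-in for WRITE_ONCE() (an assumption, not the kernel definition). It also adds a hypothetical hlist_check() helper. That helper walks the list and confirms that every ->pprev points back at the link that references its node, which is the invariant all three helpers must preserve.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximation of the kernel's WRITE_ONCE(). */
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	WRITE_ONCE(n->next, first);
	if (first)
		WRITE_ONCE(first->pprev, &n->next);
	WRITE_ONCE(h->first, n);
	WRITE_ONCE(n->pprev, &h->first);
}

/* @next must be non-NULL and already on a list. */
static void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
	assert(next != NULL);
	WRITE_ONCE(n->pprev, next->pprev);
	WRITE_ONCE(n->next, next);
	WRITE_ONCE(next->pprev, &n->next);
	WRITE_ONCE(*(n->pprev), n);
}

/* @prev must be non-NULL and already on a list. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
	assert(prev != NULL);
	WRITE_ONCE(n->next, prev->next);
	WRITE_ONCE(prev->next, n);
	WRITE_ONCE(n->pprev, &prev->next);
	if (n->next)
		WRITE_ONCE(n->next->pprev, &n->next);
}

/* Hypothetical checker: returns the node count, or -1 if any ->pprev
 * does not point at the link that references its node. */
static int hlist_check(const struct hlist_head *h)
{
	struct hlist_node *const *link = &h->first;
	int count = 0;

	for (struct hlist_node *pos = h->first; pos; pos = pos->next) {
		if (pos->pprev != link)
			return -1;
		link = &pos->next;
		count++;
	}
	return count;
}
```

Starting from a single node b, calling hlist_add_before(&a, &b) and then hlist_add_behind(&c, &b) yields the order a, b, c.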
/* after that we'll appear to be on some hlist and hlist_del will work */ /**
* hlist_add_fake - create a fake hlist consisting of a single headless node
* @n: Node to make a fake list out of
*
* This makes @n appear to be its own predecessor on a headless hlist.
* The point of this is to allow things like hlist_del() to work correctly
* in cases where there is no list.
*/
static inline void hlist_add_fake(struct hlist_node *n) static inline void hlist_add_fake(struct hlist_node *n)
{ {
n->pprev = &n->next; n->pprev = &n->next;
} }
/**
* hlist_fake - Is this node a fake hlist?
* @h: Node to check for being a self-referential fake hlist.
*/
static inline bool hlist_fake(struct hlist_node *h) static inline bool hlist_fake(struct hlist_node *h)
{ {
return h->pprev == &h->next; return h->pprev == &h->next;
} }
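Here is why hlist_add_fake() lets hlist_del() work without a real list. Once ->pprev points at the node's own ->next, the unlink step in __hlist_del() only stores into the node itself. The code below is a userspace approximation with illustrative poison values. It assumes the node's ->next is NULL before faking, as it is after INIT_HLIST_NODE(). Otherwise __hlist_del() would also write through that stale ->next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace approximation of the kernel's WRITE_ONCE(). */
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
#define LIST_POISON1 ((void *)0x100)   /* illustrative values */
#define LIST_POISON2 ((void *)0x122)

struct hlist_node { struct hlist_node *next, **pprev; };

static int hlist_unhashed(const struct hlist_node *h)
{
	return !h->pprev;
}

/* Make @n its own predecessor: ->pprev points at @n's own ->next. */
static void hlist_add_fake(struct hlist_node *n)
{
	n->pprev = &n->next;
}

static bool hlist_fake(struct hlist_node *h)
{
	return h->pprev == &h->next;
}

static void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	WRITE_ONCE(*pprev, next);   /* on a fake node this writes n->next */
	if (next)
		WRITE_ONCE(next->pprev, pprev);
}

static void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}
```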
/* /**
* hlist_is_singular_node - is node the only element of the specified hlist?
* @n: Node to check for singularity.
* @h: Header for potentially singular list.
*
* Check whether the node is the only node of the head without * Check whether the node is the only node of the head without
* accessing head: * accessing head, thus avoiding unnecessary cache misses.
*/ */
static inline bool static inline bool
hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h) hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)
...@@ -841,7 +925,11 @@ hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h) ...@@ -841,7 +925,11 @@ hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)
return !n->next && n->pprev == &h->first; return !n->next && n->pprev == &h->first;
} }
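The "without accessing head" remark holds because `&h->first` is only an address computation. The check loads fields of @n alone and compares addresses, so the head's cache line is never read. The sketch below is a single-threaded userspace approximation that uses plain stores in hlist_add_head(), which is adequate only when nothing runs concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Plain-store version of hlist_add_head(), single-threaded use only. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Only @n is dereferenced; &h->first is computed, never loaded. */
static bool hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)
{
	return !n->next && n->pprev == &h->first;
}
```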
/* /**
* hlist_move_list - Move an hlist
* @old: hlist_head for old list.
* @new: hlist_head for new list.
*
* Move a list from one list head to another. Fixup the pprev * Move a list from one list head to another. Fixup the pprev
* reference of the first entry if it exists. * reference of the first entry if it exists.
*/ */
......
...@@ -56,11 +56,33 @@ static inline unsigned long get_nulls_value(const struct hlist_nulls_node *ptr) ...@@ -56,11 +56,33 @@ static inline unsigned long get_nulls_value(const struct hlist_nulls_node *ptr)
return ((unsigned long)ptr) >> 1; return ((unsigned long)ptr) >> 1;
} }
/**
* hlist_nulls_unhashed - Has node been removed and reinitialized?
* @h: Node to be checked
*
* Note that not all removal functions will leave a node in unhashed state.
* For example, hlist_nulls_del_init_rcu() leaves the node in unhashed state,
* but hlist_nulls_del() does not.
*/
static inline int hlist_nulls_unhashed(const struct hlist_nulls_node *h) static inline int hlist_nulls_unhashed(const struct hlist_nulls_node *h)
{ {
return !h->pprev; return !h->pprev;
} }
/**
* hlist_nulls_unhashed_lockless - Has node been removed and reinitialized?
* @h: Node to be checked
*
* Note that not all removal functions will leave a node in unhashed state.
* For example, hlist_nulls_del_init_rcu() leaves the node in unhashed state,
* but hlist_nulls_del() does not. Unlike hlist_nulls_unhashed(), this
* function may be used locklessly.
*/
static inline int hlist_nulls_unhashed_lockless(const struct hlist_nulls_node *h)
{
return !READ_ONCE(h->pprev);
}
static inline int hlist_nulls_empty(const struct hlist_nulls_head *h) static inline int hlist_nulls_empty(const struct hlist_nulls_head *h)
{ {
return is_a_nulls(READ_ONCE(h->first)); return is_a_nulls(READ_ONCE(h->first));
...@@ -72,10 +94,10 @@ static inline void hlist_nulls_add_head(struct hlist_nulls_node *n, ...@@ -72,10 +94,10 @@ static inline void hlist_nulls_add_head(struct hlist_nulls_node *n,
struct hlist_nulls_node *first = h->first; struct hlist_nulls_node *first = h->first;
n->next = first; n->next = first;
n->pprev = &h->first; WRITE_ONCE(n->pprev, &h->first);
h->first = n; h->first = n;
if (!is_a_nulls(first)) if (!is_a_nulls(first))
first->pprev = &n->next; WRITE_ONCE(first->pprev, &n->next);
} }
static inline void __hlist_nulls_del(struct hlist_nulls_node *n) static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
...@@ -85,13 +107,13 @@ static inline void __hlist_nulls_del(struct hlist_nulls_node *n) ...@@ -85,13 +107,13 @@ static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
WRITE_ONCE(*pprev, next); WRITE_ONCE(*pprev, next);
if (!is_a_nulls(next)) if (!is_a_nulls(next))
next->pprev = pprev; WRITE_ONCE(next->pprev, pprev);
} }
static inline void hlist_nulls_del(struct hlist_nulls_node *n) static inline void hlist_nulls_del(struct hlist_nulls_node *n)
{ {
__hlist_nulls_del(n); __hlist_nulls_del(n);
n->pprev = LIST_POISON2; WRITE_ONCE(n->pprev, LIST_POISON2);
} }
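The nulls variant ends each chain with an odd "nulls" value instead of NULL. A lockless reader that finishes a walk can then tell which chain it ended on. The userspace sketch below covers the encoding and the add/delete helpers shown above. NULLS_MARKER() and INIT_HLIST_NULLS_HEAD() are modeled on definitions elsewhere in the kernel's list_nulls.h that fall outside this hunk. The poison value is illustrative. The sketch also relies on pointer/integer casts that are implementation-defined but behave as expected on common platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximations of the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
#define LIST_POISON2     ((void *)0x122)   /* illustrative value */

struct hlist_nulls_node { struct hlist_nulls_node *next, **pprev; };
struct hlist_nulls_head { struct hlist_nulls_node *first; };

/* A nulls end marker is odd; real nodes are pointer-aligned, so bit 0
 * distinguishes the two. */
#define NULLS_MARKER(value) (1UL | (((long)(value)) << 1))
#define INIT_HLIST_NULLS_HEAD(ptr, nulls) \
	((ptr)->first = (struct hlist_nulls_node *)NULLS_MARKER(nulls))

static int is_a_nulls(const struct hlist_nulls_node *ptr)
{
	return ((unsigned long)ptr & 1);
}

static unsigned long get_nulls_value(const struct hlist_nulls_node *ptr)
{
	return ((unsigned long)ptr) >> 1;
}

static int hlist_nulls_empty(const struct hlist_nulls_head *h)
{
	return is_a_nulls(READ_ONCE(h->first));
}

static void hlist_nulls_add_head(struct hlist_nulls_node *n,
				 struct hlist_nulls_head *h)
{
	struct hlist_nulls_node *first = h->first;

	n->next = first;
	WRITE_ONCE(n->pprev, &h->first);
	h->first = n;
	if (!is_a_nulls(first))
		WRITE_ONCE(first->pprev, &n->next);
}

static void __hlist_nulls_del(struct hlist_nulls_node *n)
{
	struct hlist_nulls_node *next = n->next;
	struct hlist_nulls_node **pprev = n->pprev;

	WRITE_ONCE(*pprev, next);
	if (!is_a_nulls(next))
		WRITE_ONCE(next->pprev, pprev);
}

static void hlist_nulls_del(struct hlist_nulls_node *n)
{
	__hlist_nulls_del(n);
	WRITE_ONCE(n->pprev, LIST_POISON2);
}
```

With a head initialized as INIT_HLIST_NULLS_HEAD(&h, 7), every chain hanging off it terminates in a marker for which get_nulls_value() returns 7.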
/** /**
......
...@@ -22,7 +22,6 @@ struct rcu_cblist { ...@@ -22,7 +22,6 @@ struct rcu_cblist {
struct rcu_head *head; struct rcu_head *head;
struct rcu_head **tail; struct rcu_head **tail;
long len; long len;
long len_lazy;
}; };
#define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head } #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
...@@ -73,7 +72,6 @@ struct rcu_segcblist { ...@@ -73,7 +72,6 @@ struct rcu_segcblist {
#else #else
long len; long len;
#endif #endif
long len_lazy;
u8 enabled; u8 enabled;
u8 offloaded; u8 offloaded;
}; };
......
...@@ -40,6 +40,16 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list) ...@@ -40,6 +40,16 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
*/ */
#define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next))) #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
/**
* list_tail_rcu - returns the prev pointer of the head of the list
* @head: the head of the list
*
* Note: This should only be used with the list header, and even then
* only if list_del() and similar primitives are not also used on the
* list header.
*/
#define list_tail_rcu(head) (*((struct list_head __rcu **)(&(head)->prev)))
/* /*
* Check during list traversal that we are within an RCU reader * Check during list traversal that we are within an RCU reader
*/ */
...@@ -173,7 +183,7 @@ static inline void hlist_del_init_rcu(struct hlist_node *n) ...@@ -173,7 +183,7 @@ static inline void hlist_del_init_rcu(struct hlist_node *n)
{ {
if (!hlist_unhashed(n)) { if (!hlist_unhashed(n)) {
__hlist_del(n); __hlist_del(n);
n->pprev = NULL; WRITE_ONCE(n->pprev, NULL);
} }
} }
...@@ -361,7 +371,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, ...@@ -361,7 +371,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* @pos: the type * to use as a loop cursor. * @pos: the type * to use as a loop cursor.
* @head: the head for your list. * @head: the head for your list.
* @member: the name of the list_head within the struct. * @member: the name of the list_head within the struct.
* @cond: optional lockdep expression if called from non-RCU protection. * @cond...: optional lockdep expression if called from non-RCU protection.
* *
* This list-traversal primitive may safely run concurrently with * This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as list_add_rcu() * the _rcu list-mutation primitives such as list_add_rcu()
...@@ -473,7 +483,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, ...@@ -473,7 +483,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
static inline void hlist_del_rcu(struct hlist_node *n) static inline void hlist_del_rcu(struct hlist_node *n)
{ {
__hlist_del(n); __hlist_del(n);
n->pprev = LIST_POISON2; WRITE_ONCE(n->pprev, LIST_POISON2);
} }
/** /**
...@@ -489,11 +499,11 @@ static inline void hlist_replace_rcu(struct hlist_node *old, ...@@ -489,11 +499,11 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
struct hlist_node *next = old->next; struct hlist_node *next = old->next;
new->next = next; new->next = next;
new->pprev = old->pprev; WRITE_ONCE(new->pprev, old->pprev);
rcu_assign_pointer(*(struct hlist_node __rcu **)new->pprev, new); rcu_assign_pointer(*(struct hlist_node __rcu **)new->pprev, new);
if (next) if (next)
new->next->pprev = &new->next; WRITE_ONCE(new->next->pprev, &new->next);
old->pprev = LIST_POISON2; WRITE_ONCE(old->pprev, LIST_POISON2);
} }
/* /*
...@@ -528,10 +538,10 @@ static inline void hlist_add_head_rcu(struct hlist_node *n, ...@@ -528,10 +538,10 @@ static inline void hlist_add_head_rcu(struct hlist_node *n,
struct hlist_node *first = h->first; struct hlist_node *first = h->first;
n->next = first; n->next = first;
n->pprev = &h->first; WRITE_ONCE(n->pprev, &h->first);
rcu_assign_pointer(hlist_first_rcu(h), n); rcu_assign_pointer(hlist_first_rcu(h), n);
if (first) if (first)
first->pprev = &n->next; WRITE_ONCE(first->pprev, &n->next);
} }
/** /**
...@@ -564,7 +574,7 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n, ...@@ -564,7 +574,7 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
if (last) { if (last) {
n->next = last->next; n->next = last->next;
n->pprev = &last->next; WRITE_ONCE(n->pprev, &last->next);
rcu_assign_pointer(hlist_next_rcu(last), n); rcu_assign_pointer(hlist_next_rcu(last), n);
} else { } else {
hlist_add_head_rcu(n, h); hlist_add_head_rcu(n, h);
...@@ -592,10 +602,10 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n, ...@@ -592,10 +602,10 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
static inline void hlist_add_before_rcu(struct hlist_node *n, static inline void hlist_add_before_rcu(struct hlist_node *n,
struct hlist_node *next) struct hlist_node *next)
{ {
n->pprev = next->pprev; WRITE_ONCE(n->pprev, next->pprev);
n->next = next; n->next = next;
rcu_assign_pointer(hlist_pprev_rcu(n), n); rcu_assign_pointer(hlist_pprev_rcu(n), n);
next->pprev = &n->next; WRITE_ONCE(next->pprev, &n->next);
} }
/** /**
...@@ -620,10 +630,10 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n, ...@@ -620,10 +630,10 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
struct hlist_node *prev) struct hlist_node *prev)
{ {
n->next = prev->next; n->next = prev->next;
n->pprev = &prev->next; WRITE_ONCE(n->pprev, &prev->next);
rcu_assign_pointer(hlist_next_rcu(prev), n); rcu_assign_pointer(hlist_next_rcu(prev), n);
if (n->next) if (n->next)
n->next->pprev = &n->next; WRITE_ONCE(n->next->pprev, &n->next);
} }
#define __hlist_for_each_rcu(pos, head) \ #define __hlist_for_each_rcu(pos, head) \
...@@ -636,7 +646,7 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n, ...@@ -636,7 +646,7 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
* @pos: the type * to use as a loop cursor. * @pos: the type * to use as a loop cursor.
* @head: the head for your list. * @head: the head for your list.
* @member: the name of the hlist_node within the struct. * @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection. * @cond...: optional lockdep expression if called from non-RCU protection.
* *
* This list-traversal primitive may safely run concurrently with * This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu() * the _rcu list-mutation primitives such as hlist_add_head_rcu()
......
...@@ -34,13 +34,21 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n) ...@@ -34,13 +34,21 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
{ {
if (!hlist_nulls_unhashed(n)) { if (!hlist_nulls_unhashed(n)) {
__hlist_nulls_del(n); __hlist_nulls_del(n);
n->pprev = NULL; WRITE_ONCE(n->pprev, NULL);
} }
} }
/**
* hlist_nulls_first_rcu - returns the first element of the hash list.
* @head: the head of the list.
*/
#define hlist_nulls_first_rcu(head) \ #define hlist_nulls_first_rcu(head) \
(*((struct hlist_nulls_node __rcu __force **)&(head)->first)) (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
/**
* hlist_nulls_next_rcu - returns the element of the list after @node.
* @node: element of the list.
*/
#define hlist_nulls_next_rcu(node) \ #define hlist_nulls_next_rcu(node) \
(*((struct hlist_nulls_node __rcu __force **)&(node)->next)) (*((struct hlist_nulls_node __rcu __force **)&(node)->next))
...@@ -66,7 +74,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n) ...@@ -66,7 +74,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n) static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n)
{ {
__hlist_nulls_del(n); __hlist_nulls_del(n);
n->pprev = LIST_POISON2; WRITE_ONCE(n->pprev, LIST_POISON2);
} }
/** /**
...@@ -94,10 +102,10 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, ...@@ -94,10 +102,10 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
struct hlist_nulls_node *first = h->first; struct hlist_nulls_node *first = h->first;
n->next = first; n->next = first;
n->pprev = &h->first; WRITE_ONCE(n->pprev, &h->first);
rcu_assign_pointer(hlist_nulls_first_rcu(h), n); rcu_assign_pointer(hlist_nulls_first_rcu(h), n);
if (!is_a_nulls(first)) if (!is_a_nulls(first))
first->pprev = &n->next; WRITE_ONCE(first->pprev, &n->next);
} }
/** /**
...@@ -141,7 +149,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, ...@@ -141,7 +149,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n,
* hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type
* @tpos: the type * to use as a loop cursor. * @tpos: the type * to use as a loop cursor.
* @pos: the &struct hlist_nulls_node to use as a loop cursor. * @pos: the &struct hlist_nulls_node to use as a loop cursor.
* @head: the head for your list. * @head: the head of the list.
* @member: the name of the hlist_nulls_node within the struct. * @member: the name of the hlist_nulls_node within the struct.
* *
* The barrier() is needed to make sure compiler doesn't cache first element [1], * The barrier() is needed to make sure compiler doesn't cache first element [1],
...@@ -161,7 +169,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, ...@@ -161,7 +169,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n,
* iterate over list of given type safe against removal of list entry * iterate over list of given type safe against removal of list entry
* @tpos: the type * to use as a loop cursor. * @tpos: the type * to use as a loop cursor.
* @pos: the &struct hlist_nulls_node to use as a loop cursor. * @pos: the &struct hlist_nulls_node to use as a loop cursor.
* @head: the head for your list. * @head: the head of the list.
* @member: the name of the hlist_nulls_node within the struct. * @member: the name of the hlist_nulls_node within the struct.
*/ */
#define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \ #define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \
......
...@@ -154,7 +154,7 @@ static inline void exit_tasks_rcu_finish(void) { } ...@@ -154,7 +154,7 @@ static inline void exit_tasks_rcu_finish(void) { }
* *
* This macro resembles cond_resched(), except that it is defined to * This macro resembles cond_resched(), except that it is defined to
* report potential quiescent states to RCU-tasks even if the cond_resched() * report potential quiescent states to RCU-tasks even if the cond_resched()
* machinery were to be shut off, as some advocate for PREEMPT kernels. * machinery were to be shut off, as some advocate for PREEMPTION kernels.
*/ */
#define cond_resched_tasks_rcu_qs() \ #define cond_resched_tasks_rcu_qs() \
do { \ do { \
...@@ -167,7 +167,7 @@ do { \ ...@@ -167,7 +167,7 @@ do { \
* TREE_RCU and rcu_barrier_() primitives in TINY_RCU. * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
*/ */
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) #if defined(CONFIG_TREE_RCU)
#include <linux/rcutree.h> #include <linux/rcutree.h>
#elif defined(CONFIG_TINY_RCU) #elif defined(CONFIG_TINY_RCU)
#include <linux/rcutiny.h> #include <linux/rcutiny.h>
...@@ -400,22 +400,6 @@ do { \ ...@@ -400,22 +400,6 @@ do { \
__tmp; \ __tmp; \
}) })
/**
* rcu_swap_protected() - swap an RCU and a regular pointer
* @rcu_ptr: RCU pointer
* @ptr: regular pointer
* @c: the conditions under which the dereference will take place
*
* Perform swap(@rcu_ptr, @ptr) where @rcu_ptr is an RCU-annotated pointer and
* @c is the argument that is passed to the rcu_dereference_protected() call
* used to read that pointer.
*/
#define rcu_swap_protected(rcu_ptr, ptr, c) do { \
typeof(ptr) __tmp = rcu_dereference_protected((rcu_ptr), (c)); \
rcu_assign_pointer((rcu_ptr), (ptr)); \
(ptr) = __tmp; \
} while (0)
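The removed rcu_swap_protected() swapped two pointers in place. The rcu_replace_pointer() macro, whose tail appears just above, instead returns the old value, and the mt76 hunk now uses it that way (`tid = rcu_replace_pointer(...)`). The userspace sketch below models that contract with C11 atomics. A relaxed load stands in for rcu_dereference_protected(), a release store for rcu_assign_pointer(), and a boolean argument for the lockdep condition @c. The name replace_pointer() and struct tid_state are hypothetical, and the sketch does not reproduce the kernel's memory-model details.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tid_state { int id; };   /* hypothetical payload */

/* Publish @newp in *@rcu_ptr and return the previous pointer.
 * @update_side_held plays the role of the lockdep expression @c. */
static struct tid_state *replace_pointer(_Atomic(struct tid_state *) *rcu_ptr,
					 struct tid_state *newp,
					 bool update_side_held)
{
	struct tid_state *old;

	/* rcu_dereference_protected(): no ordering needed, but the caller
	 * must hold whatever excludes concurrent updaters. */
	assert(update_side_held);
	old = atomic_load_explicit(rcu_ptr, memory_order_relaxed);

	/* rcu_assign_pointer(): release store, so a reader that sees @newp
	 * also sees the stores that initialized it. */
	atomic_store_explicit(rcu_ptr, newp, memory_order_release);
	return old;
}
```

As in the mt76 code, the returned old pointer still must not be freed until a grace period has elapsed (there via kfree_rcu()). The sketch omits that step.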
/** /**
* rcu_access_pointer() - fetch RCU pointer with no dereferencing * rcu_access_pointer() - fetch RCU pointer with no dereferencing
* @p: The pointer to read * @p: The pointer to read
...@@ -598,10 +582,10 @@ do { \ ...@@ -598,10 +582,10 @@ do { \
* *
* You can avoid reading and understanding the next paragraph by * You can avoid reading and understanding the next paragraph by
* following this rule: don't put anything in an rcu_read_lock() RCU * following this rule: don't put anything in an rcu_read_lock() RCU
* read-side critical section that would block in a !PREEMPT kernel. * read-side critical section that would block in a !PREEMPTION kernel.
* But if you want the full story, read on! * But if you want the full story, read on!
* *
* In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), * In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU),
* it is illegal to block while in an RCU read-side critical section. * it is illegal to block while in an RCU read-side critical section.
* In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION
* kernel builds, RCU read-side critical sections may be preempted, * kernel builds, RCU read-side critical sections may be preempted,
...@@ -912,4 +896,8 @@ rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f) ...@@ -912,4 +896,8 @@ rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
return false; return false;
} }
/* kernel/ksysfs.c definitions */
extern int rcu_expedited;
extern int rcu_normal;
#endif /* __LINUX_RCUPDATE_H */ #endif /* __LINUX_RCUPDATE_H */
...@@ -85,6 +85,7 @@ static inline void rcu_scheduler_starting(void) { } ...@@ -85,6 +85,7 @@ static inline void rcu_scheduler_starting(void) { }
static inline void rcu_end_inkernel_boot(void) { } static inline void rcu_end_inkernel_boot(void) { }
static inline bool rcu_is_watching(void) { return true; } static inline bool rcu_is_watching(void) { return true; }
static inline void rcu_momentary_dyntick_idle(void) { } static inline void rcu_momentary_dyntick_idle(void) { }
static inline void kfree_rcu_scheduler_running(void) { }
/* Avoid RCU read-side critical sections leaking across. */ /* Avoid RCU read-side critical sections leaking across. */
static inline void rcu_all_qs(void) { barrier(); } static inline void rcu_all_qs(void) { barrier(); }
......
...@@ -38,6 +38,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func); ...@@ -38,6 +38,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier(void); void rcu_barrier(void);
bool rcu_eqs_special_set(int cpu); bool rcu_eqs_special_set(int cpu);
void rcu_momentary_dyntick_idle(void); void rcu_momentary_dyntick_idle(void);
void kfree_rcu_scheduler_running(void);
unsigned long get_state_synchronize_rcu(void); unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate); void cond_synchronize_rcu(unsigned long oldstate);
......
...@@ -109,8 +109,10 @@ enum tick_dep_bits { ...@@ -109,8 +109,10 @@ enum tick_dep_bits {
TICK_DEP_BIT_PERF_EVENTS = 1, TICK_DEP_BIT_PERF_EVENTS = 1,
TICK_DEP_BIT_SCHED = 2, TICK_DEP_BIT_SCHED = 2,
TICK_DEP_BIT_CLOCK_UNSTABLE = 3, TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
TICK_DEP_BIT_RCU = 4 TICK_DEP_BIT_RCU = 4,
TICK_DEP_BIT_RCU_EXP = 5
}; };
#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU_EXP
#define TICK_DEP_MASK_NONE 0 #define TICK_DEP_MASK_NONE 0
#define TICK_DEP_MASK_POSIX_TIMER (1 << TICK_DEP_BIT_POSIX_TIMER) #define TICK_DEP_MASK_POSIX_TIMER (1 << TICK_DEP_BIT_POSIX_TIMER)
...@@ -118,6 +120,7 @@ enum tick_dep_bits { ...@@ -118,6 +120,7 @@ enum tick_dep_bits {
#define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED) #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
#define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE) #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU) #define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
#define TICK_DEP_MASK_RCU_EXP (1 << TICK_DEP_BIT_RCU_EXP)
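The new TICK_DEP_BIT_RCU_EXP gives expedited grace periods a tick dependency of their own, and TICK_DEP_BIT_MAX now names it as the highest bit. The small sketch below shows the mask arithmetic. It uses hypothetical dep_set()/dep_clear() helpers on a plain integer, whereas the kernel updates such dependency words atomically. The practical consequence is that clearing the normal RCU bit leaves an expedited dependency in place.

```c
#include <assert.h>

enum tick_dep_bits {
	TICK_DEP_BIT_POSIX_TIMER	= 0,
	TICK_DEP_BIT_PERF_EVENTS	= 1,
	TICK_DEP_BIT_SCHED		= 2,
	TICK_DEP_BIT_CLOCK_UNSTABLE	= 3,
	TICK_DEP_BIT_RCU		= 4,
	TICK_DEP_BIT_RCU_EXP		= 5
};
#define TICK_DEP_BIT_MAX	TICK_DEP_BIT_RCU_EXP

#define TICK_DEP_MASK_RCU	(1 << TICK_DEP_BIT_RCU)
#define TICK_DEP_MASK_RCU_EXP	(1 << TICK_DEP_BIT_RCU_EXP)

/* Hypothetical helpers on a plain (non-atomic) dependency word. */
static unsigned int dep_set(unsigned int deps, enum tick_dep_bits bit)
{
	return deps | (1U << bit);
}

static unsigned int dep_clear(unsigned int deps, enum tick_dep_bits bit)
{
	return deps & ~(1U << bit);
}
```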
#ifdef CONFIG_NO_HZ_COMMON #ifdef CONFIG_NO_HZ_COMMON
extern bool tick_nohz_enabled; extern bool tick_nohz_enabled;
......
...@@ -41,7 +41,7 @@ TRACE_EVENT(rcu_utilization, ...@@ -41,7 +41,7 @@ TRACE_EVENT(rcu_utilization,
TP_printk("%s", __entry->s) TP_printk("%s", __entry->s)
); );
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) #if defined(CONFIG_TREE_RCU)
/* /*
* Tracepoint for grace-period events. Takes a string identifying the * Tracepoint for grace-period events. Takes a string identifying the
...@@ -432,7 +432,7 @@ TRACE_EVENT_RCU(rcu_fqs, ...@@ -432,7 +432,7 @@ TRACE_EVENT_RCU(rcu_fqs,
__entry->cpu, __entry->qsevent) __entry->cpu, __entry->qsevent)
); );
#endif /* #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) */ #endif /* #if defined(CONFIG_TREE_RCU) */
/* /*
* Tracepoint for dyntick-idle entry/exit events. These take a string * Tracepoint for dyntick-idle entry/exit events. These take a string
...@@ -449,7 +449,7 @@ TRACE_EVENT_RCU(rcu_fqs, ...@@ -449,7 +449,7 @@ TRACE_EVENT_RCU(rcu_fqs,
*/ */
TRACE_EVENT_RCU(rcu_dyntick, TRACE_EVENT_RCU(rcu_dyntick,
TP_PROTO(const char *polarity, long oldnesting, long newnesting, atomic_t dynticks), TP_PROTO(const char *polarity, long oldnesting, long newnesting, int dynticks),
TP_ARGS(polarity, oldnesting, newnesting, dynticks), TP_ARGS(polarity, oldnesting, newnesting, dynticks),
...@@ -464,7 +464,7 @@ TRACE_EVENT_RCU(rcu_dyntick, ...@@ -464,7 +464,7 @@ TRACE_EVENT_RCU(rcu_dyntick,
__entry->polarity = polarity; __entry->polarity = polarity;
__entry->oldnesting = oldnesting; __entry->oldnesting = oldnesting;
__entry->newnesting = newnesting; __entry->newnesting = newnesting;
__entry->dynticks = atomic_read(&dynticks); __entry->dynticks = dynticks;
), ),
TP_printk("%s %lx %lx %#3x", __entry->polarity, TP_printk("%s %lx %lx %#3x", __entry->polarity,
...@@ -481,16 +481,14 @@ TRACE_EVENT_RCU(rcu_dyntick, ...@@ -481,16 +481,14 @@ TRACE_EVENT_RCU(rcu_dyntick,
*/ */
TRACE_EVENT_RCU(rcu_callback, TRACE_EVENT_RCU(rcu_callback,
TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen_lazy, TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen),
long qlen),
TP_ARGS(rcuname, rhp, qlen_lazy, qlen), TP_ARGS(rcuname, rhp, qlen),
TP_STRUCT__entry( TP_STRUCT__entry(
__field(const char *, rcuname) __field(const char *, rcuname)
__field(void *, rhp) __field(void *, rhp)
__field(void *, func) __field(void *, func)
__field(long, qlen_lazy)
__field(long, qlen) __field(long, qlen)
), ),
...@@ -498,13 +496,12 @@ TRACE_EVENT_RCU(rcu_callback, ...@@ -498,13 +496,12 @@ TRACE_EVENT_RCU(rcu_callback,
__entry->rcuname = rcuname; __entry->rcuname = rcuname;
__entry->rhp = rhp; __entry->rhp = rhp;
__entry->func = rhp->func; __entry->func = rhp->func;
__entry->qlen_lazy = qlen_lazy;
__entry->qlen = qlen; __entry->qlen = qlen;
), ),
TP_printk("%s rhp=%p func=%ps %ld/%ld", TP_printk("%s rhp=%p func=%ps %ld",
__entry->rcuname, __entry->rhp, __entry->func, __entry->rcuname, __entry->rhp, __entry->func,
__entry->qlen_lazy, __entry->qlen) __entry->qlen)
); );
/* /*
...@@ -518,15 +515,14 @@ TRACE_EVENT_RCU(rcu_callback, ...@@ -518,15 +515,14 @@ TRACE_EVENT_RCU(rcu_callback,
TRACE_EVENT_RCU(rcu_kfree_callback, TRACE_EVENT_RCU(rcu_kfree_callback,
TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset, TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
long qlen_lazy, long qlen), long qlen),
TP_ARGS(rcuname, rhp, offset, qlen_lazy, qlen), TP_ARGS(rcuname, rhp, offset, qlen),
TP_STRUCT__entry( TP_STRUCT__entry(
__field(const char *, rcuname) __field(const char *, rcuname)
__field(void *, rhp) __field(void *, rhp)
__field(unsigned long, offset) __field(unsigned long, offset)
__field(long, qlen_lazy)
__field(long, qlen) __field(long, qlen)
), ),
...@@ -534,13 +530,12 @@ TRACE_EVENT_RCU(rcu_kfree_callback, ...@@ -534,13 +530,12 @@ TRACE_EVENT_RCU(rcu_kfree_callback,
__entry->rcuname = rcuname; __entry->rcuname = rcuname;
__entry->rhp = rhp; __entry->rhp = rhp;
__entry->offset = offset; __entry->offset = offset;
__entry->qlen_lazy = qlen_lazy;
__entry->qlen = qlen; __entry->qlen = qlen;
), ),
TP_printk("%s rhp=%p func=%ld %ld/%ld", TP_printk("%s rhp=%p func=%ld %ld",
__entry->rcuname, __entry->rhp, __entry->offset, __entry->rcuname, __entry->rhp, __entry->offset,
__entry->qlen_lazy, __entry->qlen) __entry->qlen)
); );
/* /*
...@@ -552,27 +547,24 @@ TRACE_EVENT_RCU(rcu_kfree_callback, ...@@ -552,27 +547,24 @@ TRACE_EVENT_RCU(rcu_kfree_callback,
*/ */
TRACE_EVENT_RCU(rcu_batch_start, TRACE_EVENT_RCU(rcu_batch_start,
TP_PROTO(const char *rcuname, long qlen_lazy, long qlen, long blimit), TP_PROTO(const char *rcuname, long qlen, long blimit),
TP_ARGS(rcuname, qlen_lazy, qlen, blimit), TP_ARGS(rcuname, qlen, blimit),
TP_STRUCT__entry( TP_STRUCT__entry(
__field(const char *, rcuname) __field(const char *, rcuname)
__field(long, qlen_lazy)
__field(long, qlen) __field(long, qlen)
__field(long, blimit) __field(long, blimit)
), ),
TP_fast_assign( TP_fast_assign(
__entry->rcuname = rcuname; __entry->rcuname = rcuname;
__entry->qlen_lazy = qlen_lazy;
__entry->qlen = qlen; __entry->qlen = qlen;
__entry->blimit = blimit; __entry->blimit = blimit;
), ),
TP_printk("%s CBs=%ld/%ld bl=%ld", TP_printk("%s CBs=%ld bl=%ld",
__entry->rcuname, __entry->qlen_lazy, __entry->qlen, __entry->rcuname, __entry->qlen, __entry->blimit)
__entry->blimit)
); );
/* /*
......
...@@ -7,7 +7,7 @@ menu "RCU Subsystem" ...@@ -7,7 +7,7 @@ menu "RCU Subsystem"
config TREE_RCU config TREE_RCU
bool bool
default y if !PREEMPTION && SMP default y if SMP
help help
This option selects the RCU implementation that is This option selects the RCU implementation that is
designed for very large SMP system with hundreds or designed for very large SMP system with hundreds or
...@@ -17,6 +17,7 @@ config TREE_RCU ...@@ -17,6 +17,7 @@ config TREE_RCU
config PREEMPT_RCU config PREEMPT_RCU
bool bool
default y if PREEMPTION default y if PREEMPTION
select TREE_RCU
help help
This option selects the RCU implementation that is This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or designed for very large SMP systems with hundreds or
...@@ -78,7 +79,7 @@ config TASKS_RCU ...@@ -78,7 +79,7 @@ config TASKS_RCU
user-mode execution as quiescent states. user-mode execution as quiescent states.
config RCU_STALL_COMMON config RCU_STALL_COMMON
def_bool ( TREE_RCU || PREEMPT_RCU ) def_bool TREE_RCU
help help
This option enables RCU CPU stall code that is common between This option enables RCU CPU stall code that is common between
the TINY and TREE variants of RCU. The purpose is to allow the TINY and TREE variants of RCU. The purpose is to allow
...@@ -86,13 +87,13 @@ config RCU_STALL_COMMON ...@@ -86,13 +87,13 @@ config RCU_STALL_COMMON
making these warnings mandatory for the tree variants. making these warnings mandatory for the tree variants.
config RCU_NEED_SEGCBLIST config RCU_NEED_SEGCBLIST
def_bool ( TREE_RCU || PREEMPT_RCU || TREE_SRCU ) def_bool ( TREE_RCU || TREE_SRCU )
config RCU_FANOUT config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value" int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT range 2 64 if 64BIT
range 2 32 if !64BIT range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT depends on TREE_RCU && RCU_EXPERT
default 64 if 64BIT default 64 if 64BIT
default 32 if !64BIT default 32 if !64BIT
help help
...@@ -112,7 +113,7 @@ config RCU_FANOUT_LEAF ...@@ -112,7 +113,7 @@ config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value" int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 64 if 64BIT range 2 64 if 64BIT
range 2 32 if !64BIT range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT depends on TREE_RCU && RCU_EXPERT
default 16 default 16
help help
This option controls the leaf-level fanout of hierarchical This option controls the leaf-level fanout of hierarchical
...@@ -187,7 +188,7 @@ config RCU_BOOST_DELAY ...@@ -187,7 +188,7 @@ config RCU_BOOST_DELAY
config RCU_NOCB_CPU config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs" bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || PREEMPT_RCU depends on TREE_RCU
depends on RCU_EXPERT || NO_HZ_FULL depends on RCU_EXPERT || NO_HZ_FULL
default n default n
help help
...@@ -200,8 +201,8 @@ config RCU_NOCB_CPU ...@@ -200,8 +201,8 @@ config RCU_NOCB_CPU
specified at boot time by the rcu_nocbs parameter. For each specified at boot time by the rcu_nocbs parameter. For each
such CPU, a kthread ("rcuox/N") will be created to invoke such CPU, a kthread ("rcuox/N") will be created to invoke
callbacks, where the "N" is the CPU being offloaded, and where callbacks, where the "N" is the CPU being offloaded, and where
the "p" for RCU-preempt (PREEMPT kernels) and "s" for RCU-sched the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
(!PREEMPT kernels). Nothing prevents this kthread from running (!PREEMPTION kernels). Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired. to force the kthreads to run on whatever set of CPUs is desired.
......
...@@ -9,6 +9,5 @@ obj-$(CONFIG_TINY_SRCU) += srcutiny.o ...@@ -9,6 +9,5 @@ obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_TREE_RCU) += tree.o obj-$(CONFIG_TREE_RCU) += tree.o
obj-$(CONFIG_PREEMPT_RCU) += tree.o
obj-$(CONFIG_TINY_RCU) += tiny.o obj-$(CONFIG_TINY_RCU) += tiny.o
obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
...@@ -198,33 +198,6 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head) ...@@ -198,33 +198,6 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
} }
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */ #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
void kfree(const void *);
/*
* Reclaim the specified callback, either by invoking it (non-lazy case)
* or freeing it directly (lazy case). Return true if lazy, false otherwise.
*/
static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
{
rcu_callback_t f;
unsigned long offset = (unsigned long)head->func;
rcu_lock_acquire(&rcu_callback_map);
if (__is_kfree_rcu_offset(offset)) {
trace_rcu_invoke_kfree_callback(rn, head, offset);
kfree((void *)head - offset);
rcu_lock_release(&rcu_callback_map);
return true;
} else {
trace_rcu_invoke_callback(rn, head);
f = head->func;
WRITE_ONCE(head->func, (rcu_callback_t)0L);
f(head);
rcu_lock_release(&rcu_callback_map);
return false;
}
}
#ifdef CONFIG_RCU_STALL_COMMON #ifdef CONFIG_RCU_STALL_COMMON
extern int rcu_cpu_stall_ftrace_dump; extern int rcu_cpu_stall_ftrace_dump;
...@@ -281,7 +254,7 @@ void rcu_test_sync_prims(void); ...@@ -281,7 +254,7 @@ void rcu_test_sync_prims(void);
*/ */
extern void resched_cpu(int cpu); extern void resched_cpu(int cpu);
#if defined(SRCU) || !defined(TINY_RCU) #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU)
#include <linux/rcu_node_tree.h> #include <linux/rcu_node_tree.h>
...@@ -418,7 +391,7 @@ do { \ ...@@ -418,7 +391,7 @@ do { \
#define raw_lockdep_assert_held_rcu_node(p) \ #define raw_lockdep_assert_held_rcu_node(p) \
lockdep_assert_held(&ACCESS_PRIVATE(p, lock)) lockdep_assert_held(&ACCESS_PRIVATE(p, lock))
#endif /* #if defined(SRCU) || !defined(TINY_RCU) */ #endif /* #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU) */
#ifdef CONFIG_SRCU #ifdef CONFIG_SRCU
void srcu_init(void); void srcu_init(void);
...@@ -454,7 +427,7 @@ enum rcutorture_type { ...@@ -454,7 +427,7 @@ enum rcutorture_type {
INVALID_RCU_FLAVOR INVALID_RCU_FLAVOR
}; };
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) #if defined(CONFIG_TREE_RCU)
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
unsigned long *gp_seq); unsigned long *gp_seq);
void do_trace_rcu_torture_read(const char *rcutorturename, void do_trace_rcu_torture_read(const char *rcutorturename,
......
...@@ -20,14 +20,10 @@ void rcu_cblist_init(struct rcu_cblist *rclp) ...@@ -20,14 +20,10 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
rclp->head = NULL; rclp->head = NULL;
rclp->tail = &rclp->head; rclp->tail = &rclp->head;
rclp->len = 0; rclp->len = 0;
rclp->len_lazy = 0;
} }
/* /*
* Enqueue an rcu_head structure onto the specified callback list. * Enqueue an rcu_head structure onto the specified callback list.
* This function assumes that the callback is non-lazy because it
* is intended for use by no-CBs CPUs, which do not distinguish
* between lazy and non-lazy RCU callbacks.
*/ */
void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp) void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
{ {
...@@ -54,7 +50,6 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp, ...@@ -54,7 +50,6 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
else else
drclp->tail = &drclp->head; drclp->tail = &drclp->head;
drclp->len = srclp->len; drclp->len = srclp->len;
drclp->len_lazy = srclp->len_lazy;
if (!rhp) { if (!rhp) {
rcu_cblist_init(srclp); rcu_cblist_init(srclp);
} else { } else {
...@@ -62,16 +57,12 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp, ...@@ -62,16 +57,12 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
srclp->head = rhp; srclp->head = rhp;
srclp->tail = &rhp->next; srclp->tail = &rhp->next;
WRITE_ONCE(srclp->len, 1); WRITE_ONCE(srclp->len, 1);
srclp->len_lazy = 0;
} }
} }
/* /*
* Dequeue the oldest rcu_head structure from the specified callback * Dequeue the oldest rcu_head structure from the specified callback
* list. This function assumes that the callback is non-lazy, but * list.
* the caller can later invoke rcu_cblist_dequeued_lazy() if it
* finds otherwise (and if it cares about laziness). This allows
* different users to have different ways of determining laziness.
*/ */
struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp) struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp)
{ {
...@@ -161,7 +152,6 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp) ...@@ -161,7 +152,6 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
for (i = 0; i < RCU_CBLIST_NSEGS; i++) for (i = 0; i < RCU_CBLIST_NSEGS; i++)
rsclp->tails[i] = &rsclp->head; rsclp->tails[i] = &rsclp->head;
rcu_segcblist_set_len(rsclp, 0); rcu_segcblist_set_len(rsclp, 0);
rsclp->len_lazy = 0;
rsclp->enabled = 1; rsclp->enabled = 1;
} }
...@@ -173,7 +163,6 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp) ...@@ -173,7 +163,6 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
{ {
WARN_ON_ONCE(!rcu_segcblist_empty(rsclp)); WARN_ON_ONCE(!rcu_segcblist_empty(rsclp));
WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp)); WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp));
WARN_ON_ONCE(rcu_segcblist_n_lazy_cbs(rsclp));
rsclp->enabled = 0; rsclp->enabled = 0;
} }
...@@ -253,11 +242,9 @@ bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp) ...@@ -253,11 +242,9 @@ bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp)
* absolutely not OK for it to ever miss posting a callback. * absolutely not OK for it to ever miss posting a callback.
*/ */
void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
struct rcu_head *rhp, bool lazy) struct rcu_head *rhp)
{ {
rcu_segcblist_inc_len(rsclp); rcu_segcblist_inc_len(rsclp);
if (lazy)
rsclp->len_lazy++;
smp_mb(); /* Ensure counts are updated before callback is enqueued. */ smp_mb(); /* Ensure counts are updated before callback is enqueued. */
rhp->next = NULL; rhp->next = NULL;
WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp); WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
...@@ -275,15 +262,13 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, ...@@ -275,15 +262,13 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
* period. You have been warned. * period. You have been warned.
*/ */
bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
struct rcu_head *rhp, bool lazy) struct rcu_head *rhp)
{ {
int i; int i;
if (rcu_segcblist_n_cbs(rsclp) == 0) if (rcu_segcblist_n_cbs(rsclp) == 0)
return false; return false;
rcu_segcblist_inc_len(rsclp); rcu_segcblist_inc_len(rsclp);
if (lazy)
rsclp->len_lazy++;
smp_mb(); /* Ensure counts are updated before callback is entrained. */ smp_mb(); /* Ensure counts are updated before callback is entrained. */
rhp->next = NULL; rhp->next = NULL;
for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--) for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
...@@ -307,8 +292,6 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, ...@@ -307,8 +292,6 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp, void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp) struct rcu_cblist *rclp)
{ {
rclp->len_lazy += rsclp->len_lazy;
rsclp->len_lazy = 0;
rclp->len = rcu_segcblist_xchg_len(rsclp, 0); rclp->len = rcu_segcblist_xchg_len(rsclp, 0);
} }
...@@ -361,9 +344,7 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp, ...@@ -361,9 +344,7 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp, void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp) struct rcu_cblist *rclp)
{ {
rsclp->len_lazy += rclp->len_lazy;
rcu_segcblist_add_len(rsclp, rclp->len); rcu_segcblist_add_len(rsclp, rclp->len);
rclp->len_lazy = 0;
rclp->len = 0; rclp->len = 0;
} }
......
...@@ -15,15 +15,6 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp) ...@@ -15,15 +15,6 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
return READ_ONCE(rclp->len); return READ_ONCE(rclp->len);
} }
/*
* Account for the fact that a previously dequeued callback turned out
* to be marked as lazy.
*/
static inline void rcu_cblist_dequeued_lazy(struct rcu_cblist *rclp)
{
rclp->len_lazy--;
}
void rcu_cblist_init(struct rcu_cblist *rclp); void rcu_cblist_init(struct rcu_cblist *rclp);
void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp); void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp, void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
...@@ -59,18 +50,6 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp) ...@@ -59,18 +50,6 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
#endif #endif
} }
/* Return number of lazy callbacks in segmented callback list. */
static inline long rcu_segcblist_n_lazy_cbs(struct rcu_segcblist *rsclp)
{
return rsclp->len_lazy;
}
/* Return number of lazy callbacks in segmented callback list. */
static inline long rcu_segcblist_n_nonlazy_cbs(struct rcu_segcblist *rsclp)
{
return rcu_segcblist_n_cbs(rsclp) - rsclp->len_lazy;
}
/* /*
* Is the specified rcu_segcblist enabled, for example, not corresponding * Is the specified rcu_segcblist enabled, for example, not corresponding
* to an offline CPU? * to an offline CPU?
...@@ -106,9 +85,9 @@ struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp); ...@@ -106,9 +85,9 @@ struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp); struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp);
bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp); bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp);
void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
struct rcu_head *rhp, bool lazy); struct rcu_head *rhp);
bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
struct rcu_head *rhp, bool lazy); struct rcu_head *rhp);
void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp, void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp); struct rcu_cblist *rclp);
void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp, void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
......
...@@ -86,6 +86,7 @@ torture_param(bool, shutdown, RCUPERF_SHUTDOWN, ...@@ -86,6 +86,7 @@ torture_param(bool, shutdown, RCUPERF_SHUTDOWN,
"Shutdown at end of performance tests."); "Shutdown at end of performance tests.");
torture_param(int, verbose, 1, "Enable verbose debugging printk()s"); torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable"); torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?");
static char *perf_type = "rcu"; static char *perf_type = "rcu";
module_param(perf_type, charp, 0444); module_param(perf_type, charp, 0444);
...@@ -105,8 +106,8 @@ static atomic_t n_rcu_perf_writer_finished; ...@@ -105,8 +106,8 @@ static atomic_t n_rcu_perf_writer_finished;
static wait_queue_head_t shutdown_wq; static wait_queue_head_t shutdown_wq;
static u64 t_rcu_perf_writer_started; static u64 t_rcu_perf_writer_started;
static u64 t_rcu_perf_writer_finished; static u64 t_rcu_perf_writer_finished;
static unsigned long b_rcu_perf_writer_started; static unsigned long b_rcu_gp_test_started;
static unsigned long b_rcu_perf_writer_finished; static unsigned long b_rcu_gp_test_finished;
static DEFINE_PER_CPU(atomic_t, n_async_inflight); static DEFINE_PER_CPU(atomic_t, n_async_inflight);
#define MAX_MEAS 10000 #define MAX_MEAS 10000
...@@ -378,10 +379,10 @@ rcu_perf_writer(void *arg) ...@@ -378,10 +379,10 @@ rcu_perf_writer(void *arg)
if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) { if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) {
t_rcu_perf_writer_started = t; t_rcu_perf_writer_started = t;
if (gp_exp) { if (gp_exp) {
b_rcu_perf_writer_started = b_rcu_gp_test_started =
cur_ops->exp_completed() / 2; cur_ops->exp_completed() / 2;
} else { } else {
b_rcu_perf_writer_started = cur_ops->get_gp_seq(); b_rcu_gp_test_started = cur_ops->get_gp_seq();
} }
} }
...@@ -429,10 +430,10 @@ rcu_perf_writer(void *arg) ...@@ -429,10 +430,10 @@ rcu_perf_writer(void *arg)
PERFOUT_STRING("Test complete"); PERFOUT_STRING("Test complete");
t_rcu_perf_writer_finished = t; t_rcu_perf_writer_finished = t;
if (gp_exp) { if (gp_exp) {
b_rcu_perf_writer_finished = b_rcu_gp_test_finished =
cur_ops->exp_completed() / 2; cur_ops->exp_completed() / 2;
} else { } else {
b_rcu_perf_writer_finished = b_rcu_gp_test_finished =
cur_ops->get_gp_seq(); cur_ops->get_gp_seq();
} }
if (shutdown) { if (shutdown) {
...@@ -515,8 +516,8 @@ rcu_perf_cleanup(void) ...@@ -515,8 +516,8 @@ rcu_perf_cleanup(void)
t_rcu_perf_writer_finished - t_rcu_perf_writer_finished -
t_rcu_perf_writer_started, t_rcu_perf_writer_started,
ngps, ngps,
rcuperf_seq_diff(b_rcu_perf_writer_finished, rcuperf_seq_diff(b_rcu_gp_test_finished,
b_rcu_perf_writer_started)); b_rcu_gp_test_started));
for (i = 0; i < nrealwriters; i++) { for (i = 0; i < nrealwriters; i++) {
if (!writer_durations) if (!writer_durations)
break; break;
...@@ -584,6 +585,159 @@ rcu_perf_shutdown(void *arg) ...@@ -584,6 +585,159 @@ rcu_perf_shutdown(void *arg)
return -EINVAL; return -EINVAL;
} }
/*
* kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number
* of iterations and measure total time and number of GP for all iterations to complete.
*/
torture_param(int, kfree_nthreads, -1, "Number of threads running loops of kfree_rcu().");
torture_param(int, kfree_alloc_num, 8000, "Number of allocations and frees done in an iteration.");
torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num allocations and frees.");
static struct task_struct **kfree_reader_tasks;
static int kfree_nrealthreads;
static atomic_t n_kfree_perf_thread_started;
static atomic_t n_kfree_perf_thread_ended;
struct kfree_obj {
char kfree_obj[8];
struct rcu_head rh;
};
static int
kfree_perf_thread(void *arg)
{
int i, loop = 0;
long me = (long)arg;
struct kfree_obj *alloc_ptr;
u64 start_time, end_time;
VERBOSE_PERFOUT_STRING("kfree_perf_thread task started");
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
set_user_nice(current, MAX_NICE);
start_time = ktime_get_mono_fast_ns();
if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) {
if (gp_exp)
b_rcu_gp_test_started = cur_ops->exp_completed() / 2;
else
b_rcu_gp_test_started = cur_ops->get_gp_seq();
}
do {
for (i = 0; i < kfree_alloc_num; i++) {
alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
if (!alloc_ptr)
return -ENOMEM;
kfree_rcu(alloc_ptr, rh);
}
cond_resched();
} while (!torture_must_stop() && ++loop < kfree_loops);
if (atomic_inc_return(&n_kfree_perf_thread_ended) >= kfree_nrealthreads) {
end_time = ktime_get_mono_fast_ns();
if (gp_exp)
b_rcu_gp_test_finished = cur_ops->exp_completed() / 2;
else
b_rcu_gp_test_finished = cur_ops->get_gp_seq();
pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld\n",
(unsigned long long)(end_time - start_time), kfree_loops,
rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started));
if (shutdown) {
smp_mb(); /* Assign before wake. */
wake_up(&shutdown_wq);
}
}
torture_kthread_stopping("kfree_perf_thread");
return 0;
}
static void
kfree_perf_cleanup(void)
{
int i;
if (torture_cleanup_begin())
return;
if (kfree_reader_tasks) {
for (i = 0; i < kfree_nrealthreads; i++)
torture_stop_kthread(kfree_perf_thread,
kfree_reader_tasks[i]);
kfree(kfree_reader_tasks);
}
torture_cleanup_end();
}
/*
* shutdown kthread. Just waits to be awakened, then shuts down system.
*/
static int
kfree_perf_shutdown(void *arg)
{
do {
wait_event(shutdown_wq,
atomic_read(&n_kfree_perf_thread_ended) >=
kfree_nrealthreads);
} while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads);
smp_mb(); /* Wake before output. */
kfree_perf_cleanup();
kernel_power_off();
return -EINVAL;
}
static int __init
kfree_perf_init(void)
{
long i;
int firsterr = 0;
kfree_nrealthreads = compute_real(kfree_nthreads);
/* Start up the kthreads. */
if (shutdown) {
init_waitqueue_head(&shutdown_wq);
firsterr = torture_create_kthread(kfree_perf_shutdown, NULL,
shutdown_task);
if (firsterr)
goto unwind;
schedule_timeout_uninterruptible(1);
}
kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
GFP_KERNEL);
if (kfree_reader_tasks == NULL) {
firsterr = -ENOMEM;
goto unwind;
}
for (i = 0; i < kfree_nrealthreads; i++) {
firsterr = torture_create_kthread(kfree_perf_thread, (void *)i,
kfree_reader_tasks[i]);
if (firsterr)
goto unwind;
}
while (atomic_read(&n_kfree_perf_thread_started) < kfree_nrealthreads)
schedule_timeout_uninterruptible(1);
torture_init_end();
return 0;
unwind:
torture_init_end();
kfree_perf_cleanup();
return firsterr;
}
static int __init static int __init
rcu_perf_init(void) rcu_perf_init(void)
{ {
...@@ -616,6 +770,9 @@ rcu_perf_init(void) ...@@ -616,6 +770,9 @@ rcu_perf_init(void)
if (cur_ops->init) if (cur_ops->init)
cur_ops->init(); cur_ops->init();
if (kfree_rcu_test)
return kfree_perf_init();
nrealwriters = compute_real(nwriters); nrealwriters = compute_real(nwriters);
nrealreaders = compute_real(nreaders); nrealreaders = compute_real(nreaders);
atomic_set(&n_rcu_perf_reader_started, 0); atomic_set(&n_rcu_perf_reader_started, 0);
......
...@@ -1661,43 +1661,52 @@ static void rcu_torture_fwd_prog_cb(struct rcu_head *rhp) ...@@ -1661,43 +1661,52 @@ static void rcu_torture_fwd_prog_cb(struct rcu_head *rhp)
struct rcu_fwd_cb { struct rcu_fwd_cb {
struct rcu_head rh; struct rcu_head rh;
struct rcu_fwd_cb *rfc_next; struct rcu_fwd_cb *rfc_next;
struct rcu_fwd *rfc_rfp;
int rfc_gps; int rfc_gps;
}; };
static DEFINE_SPINLOCK(rcu_fwd_lock);
static struct rcu_fwd_cb *rcu_fwd_cb_head;
static struct rcu_fwd_cb **rcu_fwd_cb_tail = &rcu_fwd_cb_head;
static long n_launders_cb;
static unsigned long rcu_fwd_startat;
static bool rcu_fwd_emergency_stop;
#define MAX_FWD_CB_JIFFIES (8 * HZ) /* Maximum CB test duration. */ #define MAX_FWD_CB_JIFFIES (8 * HZ) /* Maximum CB test duration. */
#define MIN_FWD_CB_LAUNDERS 3 /* This many CB invocations to count. */ #define MIN_FWD_CB_LAUNDERS 3 /* This many CB invocations to count. */
#define MIN_FWD_CBS_LAUNDERED 100 /* Number of counted CBs. */ #define MIN_FWD_CBS_LAUNDERED 100 /* Number of counted CBs. */
#define FWD_CBS_HIST_DIV 10 /* Histogram buckets/second. */ #define FWD_CBS_HIST_DIV 10 /* Histogram buckets/second. */
#define N_LAUNDERS_HIST (2 * MAX_FWD_CB_JIFFIES / (HZ / FWD_CBS_HIST_DIV))
struct rcu_launder_hist { struct rcu_launder_hist {
long n_launders; long n_launders;
unsigned long launder_gp_seq; unsigned long launder_gp_seq;
}; };
#define N_LAUNDERS_HIST (2 * MAX_FWD_CB_JIFFIES / (HZ / FWD_CBS_HIST_DIV))
static struct rcu_launder_hist n_launders_hist[N_LAUNDERS_HIST];
static unsigned long rcu_launder_gp_seq_start;
static void rcu_torture_fwd_cb_hist(void) struct rcu_fwd {
spinlock_t rcu_fwd_lock;
struct rcu_fwd_cb *rcu_fwd_cb_head;
struct rcu_fwd_cb **rcu_fwd_cb_tail;
long n_launders_cb;
unsigned long rcu_fwd_startat;
struct rcu_launder_hist n_launders_hist[N_LAUNDERS_HIST];
unsigned long rcu_launder_gp_seq_start;
};
struct rcu_fwd *rcu_fwds;
bool rcu_fwd_emergency_stop;
static void rcu_torture_fwd_cb_hist(struct rcu_fwd *rfp)
{ {
unsigned long gps; unsigned long gps;
unsigned long gps_old; unsigned long gps_old;
int i; int i;
int j; int j;
for (i = ARRAY_SIZE(n_launders_hist) - 1; i > 0; i--) for (i = ARRAY_SIZE(rfp->n_launders_hist) - 1; i > 0; i--)
if (n_launders_hist[i].n_launders > 0) if (rfp->n_launders_hist[i].n_launders > 0)
break; break;
pr_alert("%s: Callback-invocation histogram (duration %lu jiffies):", pr_alert("%s: Callback-invocation histogram (duration %lu jiffies):",
__func__, jiffies - rcu_fwd_startat); __func__, jiffies - rfp->rcu_fwd_startat);
gps_old = rcu_launder_gp_seq_start; gps_old = rfp->rcu_launder_gp_seq_start;
for (j = 0; j <= i; j++) { for (j = 0; j <= i; j++) {
gps = n_launders_hist[j].launder_gp_seq; gps = rfp->n_launders_hist[j].launder_gp_seq;
pr_cont(" %ds/%d: %ld:%ld", pr_cont(" %ds/%d: %ld:%ld",
j + 1, FWD_CBS_HIST_DIV, n_launders_hist[j].n_launders, j + 1, FWD_CBS_HIST_DIV,
rfp->n_launders_hist[j].n_launders,
rcutorture_seq_diff(gps, gps_old)); rcutorture_seq_diff(gps, gps_old));
gps_old = gps; gps_old = gps;
} }
...@@ -1711,26 +1720,27 @@ static void rcu_torture_fwd_cb_cr(struct rcu_head *rhp) ...@@ -1711,26 +1720,27 @@ static void rcu_torture_fwd_cb_cr(struct rcu_head *rhp)
int i; int i;
struct rcu_fwd_cb *rfcp = container_of(rhp, struct rcu_fwd_cb, rh); struct rcu_fwd_cb *rfcp = container_of(rhp, struct rcu_fwd_cb, rh);
struct rcu_fwd_cb **rfcpp; struct rcu_fwd_cb **rfcpp;
struct rcu_fwd *rfp = rfcp->rfc_rfp;
rfcp->rfc_next = NULL; rfcp->rfc_next = NULL;
rfcp->rfc_gps++; rfcp->rfc_gps++;
spin_lock_irqsave(&rcu_fwd_lock, flags); spin_lock_irqsave(&rfp->rcu_fwd_lock, flags);
rfcpp = rcu_fwd_cb_tail; rfcpp = rfp->rcu_fwd_cb_tail;
rcu_fwd_cb_tail = &rfcp->rfc_next; rfp->rcu_fwd_cb_tail = &rfcp->rfc_next;
WRITE_ONCE(*rfcpp, rfcp); WRITE_ONCE(*rfcpp, rfcp);
WRITE_ONCE(n_launders_cb, n_launders_cb + 1); WRITE_ONCE(rfp->n_launders_cb, rfp->n_launders_cb + 1);
i = ((jiffies - rcu_fwd_startat) / (HZ / FWD_CBS_HIST_DIV)); i = ((jiffies - rfp->rcu_fwd_startat) / (HZ / FWD_CBS_HIST_DIV));
if (i >= ARRAY_SIZE(n_launders_hist)) if (i >= ARRAY_SIZE(rfp->n_launders_hist))
i = ARRAY_SIZE(n_launders_hist) - 1; i = ARRAY_SIZE(rfp->n_launders_hist) - 1;
n_launders_hist[i].n_launders++; rfp->n_launders_hist[i].n_launders++;
n_launders_hist[i].launder_gp_seq = cur_ops->get_gp_seq(); rfp->n_launders_hist[i].launder_gp_seq = cur_ops->get_gp_seq();
spin_unlock_irqrestore(&rcu_fwd_lock, flags); spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags);
} }
// Give the scheduler a chance, even on nohz_full CPUs. // Give the scheduler a chance, even on nohz_full CPUs.
static void rcu_torture_fwd_prog_cond_resched(unsigned long iter) static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
{ {
if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) { if (IS_ENABLED(CONFIG_PREEMPTION) && IS_ENABLED(CONFIG_NO_HZ_FULL)) {
// Real call_rcu() floods hit userspace, so emulate that. // Real call_rcu() floods hit userspace, so emulate that.
if (need_resched() || (iter & 0xfff)) if (need_resched() || (iter & 0xfff))
schedule(); schedule();
...@@ -1744,23 +1754,23 @@ static void rcu_torture_fwd_prog_cond_resched(unsigned long iter) ...@@ -1744,23 +1754,23 @@ static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
* Free all callbacks on the rcu_fwd_cb_head list, either because the * Free all callbacks on the rcu_fwd_cb_head list, either because the
* test is over or because we hit an OOM event. * test is over or because we hit an OOM event.
*/ */
static unsigned long rcu_torture_fwd_prog_cbfree(void) static unsigned long rcu_torture_fwd_prog_cbfree(struct rcu_fwd *rfp)
{ {
unsigned long flags; unsigned long flags;
unsigned long freed = 0; unsigned long freed = 0;
struct rcu_fwd_cb *rfcp; struct rcu_fwd_cb *rfcp;
for (;;) { for (;;) {
spin_lock_irqsave(&rcu_fwd_lock, flags); spin_lock_irqsave(&rfp->rcu_fwd_lock, flags);
rfcp = rcu_fwd_cb_head; rfcp = rfp->rcu_fwd_cb_head;
if (!rfcp) { if (!rfcp) {
spin_unlock_irqrestore(&rcu_fwd_lock, flags); spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags);
break; break;
} }
rcu_fwd_cb_head = rfcp->rfc_next; rfp->rcu_fwd_cb_head = rfcp->rfc_next;
if (!rcu_fwd_cb_head) if (!rfp->rcu_fwd_cb_head)
rcu_fwd_cb_tail = &rcu_fwd_cb_head; rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head;
spin_unlock_irqrestore(&rcu_fwd_lock, flags); spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags);
kfree(rfcp); kfree(rfcp);
freed++; freed++;
rcu_torture_fwd_prog_cond_resched(freed); rcu_torture_fwd_prog_cond_resched(freed);
...@@ -1774,7 +1784,8 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void) ...@@ -1774,7 +1784,8 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void)
} }
/* Carry out need_resched()/cond_resched() forward-progress testing. */ /* Carry out need_resched()/cond_resched() forward-progress testing. */
static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries) static void rcu_torture_fwd_prog_nr(struct rcu_fwd *rfp,
int *tested, int *tested_tries)
{ {
unsigned long cver; unsigned long cver;
unsigned long dur; unsigned long dur;
...@@ -1804,8 +1815,8 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries) ...@@ -1804,8 +1815,8 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
sd = cur_ops->stall_dur() + 1; sd = cur_ops->stall_dur() + 1;
sd4 = (sd + fwd_progress_div - 1) / fwd_progress_div; sd4 = (sd + fwd_progress_div - 1) / fwd_progress_div;
dur = sd4 + torture_random(&trs) % (sd - sd4); dur = sd4 + torture_random(&trs) % (sd - sd4);
WRITE_ONCE(rcu_fwd_startat, jiffies); WRITE_ONCE(rfp->rcu_fwd_startat, jiffies);
stopat = rcu_fwd_startat + dur; stopat = rfp->rcu_fwd_startat + dur;
while (time_before(jiffies, stopat) && while (time_before(jiffies, stopat) &&
!shutdown_time_arrived() && !shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) { !READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
...@@ -1840,7 +1851,7 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries) ...@@ -1840,7 +1851,7 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
} }
/* Carry out call_rcu() forward-progress testing. */ /* Carry out call_rcu() forward-progress testing. */
static void rcu_torture_fwd_prog_cr(void) static void rcu_torture_fwd_prog_cr(struct rcu_fwd *rfp)
{ {
unsigned long cver; unsigned long cver;
unsigned long flags; unsigned long flags;
...@@ -1864,23 +1875,23 @@ static void rcu_torture_fwd_prog_cr(void) ...@@ -1864,23 +1875,23 @@ static void rcu_torture_fwd_prog_cr(void)
/* Loop continuously posting RCU callbacks. */ /* Loop continuously posting RCU callbacks. */
WRITE_ONCE(rcu_fwd_cb_nodelay, true); WRITE_ONCE(rcu_fwd_cb_nodelay, true);
cur_ops->sync(); /* Later readers see above write. */ cur_ops->sync(); /* Later readers see above write. */
WRITE_ONCE(rcu_fwd_startat, jiffies); WRITE_ONCE(rfp->rcu_fwd_startat, jiffies);
stopat = rcu_fwd_startat + MAX_FWD_CB_JIFFIES; stopat = rfp->rcu_fwd_startat + MAX_FWD_CB_JIFFIES;
n_launders = 0; n_launders = 0;
n_launders_cb = 0; rfp->n_launders_cb = 0; // Hoist initialization for multi-kthread
n_launders_sa = 0; n_launders_sa = 0;
n_max_cbs = 0; n_max_cbs = 0;
n_max_gps = 0; n_max_gps = 0;
for (i = 0; i < ARRAY_SIZE(n_launders_hist); i++) for (i = 0; i < ARRAY_SIZE(rfp->n_launders_hist); i++)
n_launders_hist[i].n_launders = 0; rfp->n_launders_hist[i].n_launders = 0;
cver = READ_ONCE(rcu_torture_current_version); cver = READ_ONCE(rcu_torture_current_version);
gps = cur_ops->get_gp_seq(); gps = cur_ops->get_gp_seq();
rcu_launder_gp_seq_start = gps; rfp->rcu_launder_gp_seq_start = gps;
tick_dep_set_task(current, TICK_DEP_BIT_RCU); tick_dep_set_task(current, TICK_DEP_BIT_RCU);
while (time_before(jiffies, stopat) && while (time_before(jiffies, stopat) &&
!shutdown_time_arrived() && !shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) { !READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
rfcp = READ_ONCE(rcu_fwd_cb_head); rfcp = READ_ONCE(rfp->rcu_fwd_cb_head);
rfcpn = NULL; rfcpn = NULL;
if (rfcp) if (rfcp)
rfcpn = READ_ONCE(rfcp->rfc_next); rfcpn = READ_ONCE(rfcp->rfc_next);
...@@ -1888,7 +1899,7 @@ static void rcu_torture_fwd_prog_cr(void) ...@@ -1888,7 +1899,7 @@ static void rcu_torture_fwd_prog_cr(void)
if (rfcp->rfc_gps >= MIN_FWD_CB_LAUNDERS && if (rfcp->rfc_gps >= MIN_FWD_CB_LAUNDERS &&
++n_max_gps >= MIN_FWD_CBS_LAUNDERED) ++n_max_gps >= MIN_FWD_CBS_LAUNDERED)
break; break;
rcu_fwd_cb_head = rfcpn; rfp->rcu_fwd_cb_head = rfcpn;
n_launders++; n_launders++;
n_launders_sa++; n_launders_sa++;
} else { } else {
...@@ -1900,6 +1911,7 @@ static void rcu_torture_fwd_prog_cr(void) ...@@ -1900,6 +1911,7 @@ static void rcu_torture_fwd_prog_cr(void)
n_max_cbs++; n_max_cbs++;
n_launders_sa = 0; n_launders_sa = 0;
rfcp->rfc_gps = 0; rfcp->rfc_gps = 0;
rfcp->rfc_rfp = rfp;
} }
cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr); cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr);
rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs); rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs);
...@@ -1910,22 +1922,22 @@ static void rcu_torture_fwd_prog_cr(void) ...@@ -1910,22 +1922,22 @@ static void rcu_torture_fwd_prog_cr(void)
} }
} }
stoppedat = jiffies; stoppedat = jiffies;
n_launders_cb_snap = READ_ONCE(n_launders_cb); n_launders_cb_snap = READ_ONCE(rfp->n_launders_cb);
cver = READ_ONCE(rcu_torture_current_version) - cver; cver = READ_ONCE(rcu_torture_current_version) - cver;
gps = rcutorture_seq_diff(cur_ops->get_gp_seq(), gps); gps = rcutorture_seq_diff(cur_ops->get_gp_seq(), gps);
cur_ops->cb_barrier(); /* Wait for callbacks to be invoked. */ cur_ops->cb_barrier(); /* Wait for callbacks to be invoked. */
(void)rcu_torture_fwd_prog_cbfree(); (void)rcu_torture_fwd_prog_cbfree(rfp);
if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop) && if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop) &&
!shutdown_time_arrived()) { !shutdown_time_arrived()) {
WARN_ON(n_max_gps < MIN_FWD_CBS_LAUNDERED); WARN_ON(n_max_gps < MIN_FWD_CBS_LAUNDERED);
pr_alert("%s Duration %lu barrier: %lu pending %ld n_launders: %ld n_launders_sa: %ld n_max_gps: %ld n_max_cbs: %ld cver %ld gps %ld\n", pr_alert("%s Duration %lu barrier: %lu pending %ld n_launders: %ld n_launders_sa: %ld n_max_gps: %ld n_max_cbs: %ld cver %ld gps %ld\n",
__func__, __func__,
stoppedat - rcu_fwd_startat, jiffies - stoppedat, stoppedat - rfp->rcu_fwd_startat, jiffies - stoppedat,
n_launders + n_max_cbs - n_launders_cb_snap, n_launders + n_max_cbs - n_launders_cb_snap,
n_launders, n_launders_sa, n_launders, n_launders_sa,
n_max_gps, n_max_cbs, cver, gps); n_max_gps, n_max_cbs, cver, gps);
rcu_torture_fwd_cb_hist(); rcu_torture_fwd_cb_hist(rfp);
} }
schedule_timeout_uninterruptible(HZ); /* Let CBs drain. */ schedule_timeout_uninterruptible(HZ); /* Let CBs drain. */
tick_dep_clear_task(current, TICK_DEP_BIT_RCU); tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
...@@ -1940,20 +1952,22 @@ static void rcu_torture_fwd_prog_cr(void) ...@@ -1940,20 +1952,22 @@ static void rcu_torture_fwd_prog_cr(void)
static int rcutorture_oom_notify(struct notifier_block *self, static int rcutorture_oom_notify(struct notifier_block *self,
unsigned long notused, void *nfreed) unsigned long notused, void *nfreed)
{ {
struct rcu_fwd *rfp = rcu_fwds;
WARN(1, "%s invoked upon OOM during forward-progress testing.\n", WARN(1, "%s invoked upon OOM during forward-progress testing.\n",
__func__); __func__);
rcu_torture_fwd_cb_hist(); rcu_torture_fwd_cb_hist(rfp);
rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rcu_fwd_startat)) / 2); rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rfp->rcu_fwd_startat)) / 2);
WRITE_ONCE(rcu_fwd_emergency_stop, true); WRITE_ONCE(rcu_fwd_emergency_stop, true);
smp_mb(); /* Emergency stop before free and wait to avoid hangs. */ smp_mb(); /* Emergency stop before free and wait to avoid hangs. */
pr_info("%s: Freed %lu RCU callbacks.\n", pr_info("%s: Freed %lu RCU callbacks.\n",
__func__, rcu_torture_fwd_prog_cbfree()); __func__, rcu_torture_fwd_prog_cbfree(rfp));
rcu_barrier(); rcu_barrier();
pr_info("%s: Freed %lu RCU callbacks.\n", pr_info("%s: Freed %lu RCU callbacks.\n",
__func__, rcu_torture_fwd_prog_cbfree()); __func__, rcu_torture_fwd_prog_cbfree(rfp));
rcu_barrier(); rcu_barrier();
pr_info("%s: Freed %lu RCU callbacks.\n", pr_info("%s: Freed %lu RCU callbacks.\n",
__func__, rcu_torture_fwd_prog_cbfree()); __func__, rcu_torture_fwd_prog_cbfree(rfp));
smp_mb(); /* Frees before return to avoid redoing OOM. */ smp_mb(); /* Frees before return to avoid redoing OOM. */
(*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */ (*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */
pr_info("%s returning after OOM processing.\n", __func__); pr_info("%s returning after OOM processing.\n", __func__);
...@@ -1967,6 +1981,7 @@ static struct notifier_block rcutorture_oom_nb = { ...@@ -1967,6 +1981,7 @@ static struct notifier_block rcutorture_oom_nb = {
/* Carry out grace-period forward-progress testing. */ /* Carry out grace-period forward-progress testing. */
static int rcu_torture_fwd_prog(void *args) static int rcu_torture_fwd_prog(void *args)
{ {
struct rcu_fwd *rfp = args;
int tested = 0; int tested = 0;
int tested_tries = 0; int tested_tries = 0;
...@@ -1978,8 +1993,8 @@ static int rcu_torture_fwd_prog(void *args) ...@@ -1978,8 +1993,8 @@ static int rcu_torture_fwd_prog(void *args)
schedule_timeout_interruptible(fwd_progress_holdoff * HZ); schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
WRITE_ONCE(rcu_fwd_emergency_stop, false); WRITE_ONCE(rcu_fwd_emergency_stop, false);
register_oom_notifier(&rcutorture_oom_nb); register_oom_notifier(&rcutorture_oom_nb);
rcu_torture_fwd_prog_nr(&tested, &tested_tries); rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
rcu_torture_fwd_prog_cr(); rcu_torture_fwd_prog_cr(rfp);
unregister_oom_notifier(&rcutorture_oom_nb); unregister_oom_notifier(&rcutorture_oom_nb);
/* Avoid slow periods, better to test when busy. */ /* Avoid slow periods, better to test when busy. */
...@@ -1995,6 +2010,8 @@ static int rcu_torture_fwd_prog(void *args) ...@@ -1995,6 +2010,8 @@ static int rcu_torture_fwd_prog(void *args)
/* If forward-progress checking is requested and feasible, spawn the thread. */ /* If forward-progress checking is requested and feasible, spawn the thread. */
static int __init rcu_torture_fwd_prog_init(void) static int __init rcu_torture_fwd_prog_init(void)
{ {
struct rcu_fwd *rfp;
if (!fwd_progress) if (!fwd_progress)
return 0; /* Not requested, so don't do it. */ return 0; /* Not requested, so don't do it. */
if (!cur_ops->stall_dur || cur_ops->stall_dur() <= 0 || if (!cur_ops->stall_dur || cur_ops->stall_dur() <= 0 ||
...@@ -2013,8 +2030,12 @@ static int __init rcu_torture_fwd_prog_init(void) ...@@ -2013,8 +2030,12 @@ static int __init rcu_torture_fwd_prog_init(void)
fwd_progress_holdoff = 1; fwd_progress_holdoff = 1;
if (fwd_progress_div <= 0) if (fwd_progress_div <= 0)
fwd_progress_div = 4; fwd_progress_div = 4;
return torture_create_kthread(rcu_torture_fwd_prog, rfp = kzalloc(sizeof(*rfp), GFP_KERNEL);
NULL, fwd_prog_task); if (!rfp)
return -ENOMEM;
spin_lock_init(&rfp->rcu_fwd_lock);
rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head;
return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task);
} }
/* Callback function for RCU barrier testing. */ /* Callback function for RCU barrier testing. */
......
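In the rcu_torture_fwd_prog_nr() hunk above, the run duration is drawn from a window just below the stall timeout. sd4 = (sd + fwd_progress_div - 1) / fwd_progress_div is a ceiling division, so dur = sd4 + torture_random(&trs) % (sd - sd4) always falls in [ceil(sd / fwd_progress_div), sd). The userspace sketch below restates only that arithmetic. fwd_ceil_div() and fwd_pick_dur() are hypothetical helpers, and the parameter r stands in for a torture_random() value.

```c
#include <assert.h>

/* Ceiling division for positive integers: smallest q with q * div >= n. */
static unsigned long fwd_ceil_div(unsigned long n, unsigned long div)
{
	return (n + div - 1) / div;
}

/*
 * Hypothetical helper mirroring the duration choice in
 * rcu_torture_fwd_prog_nr(): sd is the stall duration plus one, div plays
 * the role of fwd_progress_div, and r stands in for torture_random().
 * The result lies in [ceil(sd / div), sd).
 */
static unsigned long fwd_pick_dur(unsigned long sd, unsigned long div,
				  unsigned long r)
{
	unsigned long sd4;

	/* With div >= 2 and sd >= 2, sd4 < sd, so the modulus below is nonzero. */
	assert(div >= 2 && sd >= 2);
	sd4 = fwd_ceil_div(sd, div);
	return sd4 + r % (sd - sd4);
}
```

For example, with sd = 21 and a divisor of 4, sd4 is 6 and every possible r yields a duration from 6 through 20 inclusive.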
...@@ -103,7 +103,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock); ...@@ -103,7 +103,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
/* /*
* Workqueue handler to drive one grace period and invoke any callbacks * Workqueue handler to drive one grace period and invoke any callbacks
* that become ready as a result. Single-CPU and !PREEMPT operation * that become ready as a result. Single-CPU and !PREEMPTION operation
* means that we get away with murder on synchronization. ;-) * means that we get away with murder on synchronization. ;-)
*/ */
void srcu_drive_gp(struct work_struct *wp) void srcu_drive_gp(struct work_struct *wp)
......
...@@ -530,7 +530,7 @@ static void srcu_gp_end(struct srcu_struct *ssp) ...@@ -530,7 +530,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
idx = rcu_seq_state(ssp->srcu_gp_seq); idx = rcu_seq_state(ssp->srcu_gp_seq);
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2); WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
cbdelay = srcu_get_delay(ssp); cbdelay = srcu_get_delay(ssp);
ssp->srcu_last_gp_end = ktime_get_mono_fast_ns(); WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
rcu_seq_end(&ssp->srcu_gp_seq); rcu_seq_end(&ssp->srcu_gp_seq);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq); gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq)) if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
...@@ -762,6 +762,7 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp) ...@@ -762,6 +762,7 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
unsigned long flags; unsigned long flags;
struct srcu_data *sdp; struct srcu_data *sdp;
unsigned long t; unsigned long t;
unsigned long tlast;
/* If the local srcu_data structure has callbacks, not idle. */ /* If the local srcu_data structure has callbacks, not idle. */
local_irq_save(flags); local_irq_save(flags);
...@@ -780,9 +781,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp) ...@@ -780,9 +781,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
/* First, see if enough time has passed since the last GP. */ /* First, see if enough time has passed since the last GP. */
t = ktime_get_mono_fast_ns(); t = ktime_get_mono_fast_ns();
tlast = READ_ONCE(ssp->srcu_last_gp_end);
if (exp_holdoff == 0 || if (exp_holdoff == 0 ||
time_in_range_open(t, ssp->srcu_last_gp_end, time_in_range_open(t, tlast, tlast + exp_holdoff))
ssp->srcu_last_gp_end + exp_holdoff))
return false; /* Too soon after last GP. */ return false; /* Too soon after last GP. */
/* Next, check for probable idleness. */ /* Next, check for probable idleness. */
...@@ -853,7 +854,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp, ...@@ -853,7 +854,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
local_irq_save(flags); local_irq_save(flags);
sdp = this_cpu_ptr(ssp->sda); sdp = this_cpu_ptr(ssp->sda);
spin_lock_rcu_node(sdp); spin_lock_rcu_node(sdp);
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false); rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist, rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq)); rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq); s = rcu_seq_snap(&ssp->srcu_gp_seq);
...@@ -1052,7 +1053,7 @@ void srcu_barrier(struct srcu_struct *ssp) ...@@ -1052,7 +1053,7 @@ void srcu_barrier(struct srcu_struct *ssp)
sdp->srcu_barrier_head.func = srcu_barrier_cb; sdp->srcu_barrier_head.func = srcu_barrier_cb;
debug_rcu_head_queue(&sdp->srcu_barrier_head); debug_rcu_head_queue(&sdp->srcu_barrier_head);
if (!rcu_segcblist_entrain(&sdp->srcu_cblist, if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
&sdp->srcu_barrier_head, 0)) { &sdp->srcu_barrier_head)) {
debug_rcu_head_unqueue(&sdp->srcu_barrier_head); debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
atomic_dec(&ssp->srcu_barrier_cpu_cnt); atomic_dec(&ssp->srcu_barrier_cpu_cnt);
} }
......
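The srcutree.c hunks pair WRITE_ONCE() in srcu_gp_end() with a single READ_ONCE() snapshot, tlast, in srcu_might_be_idle(). Both bounds passed to time_in_range_open() therefore come from the same load rather than from two separate reads of ->srcu_last_gp_end. The sketch below is a userspace approximation. The time_*() macros are simplified copies of the kernel's wraparound-safe comparisons (without their typecheck() guards), the volatile access stands in for READ_ONCE(), and struct fake_srcu and fake_might_be_idle() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified userspace copies of the kernel's wraparound-safe comparisons. */
#define time_after(a, b)	((long)((b) - (a)) < 0)
#define time_before(a, b)	time_after(b, a)
#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)
#define time_in_range_open(a, b, c) \
	(time_after_eq(a, b) && time_before(a, c))

/* Hypothetical stand-in for the srcu_struct field touched by the patch. */
struct fake_srcu {
	unsigned long last_gp_end;
};

/*
 * Sketch of the time-based test in srcu_might_be_idle(): return false if
 * the holdoff is disabled or "now" is still within holdoff of the last
 * grace-period end.  The shared field is loaded exactly once, so both
 * range bounds use the same value even if it is updated concurrently.
 */
static bool fake_might_be_idle(struct fake_srcu *sp, unsigned long now,
			       unsigned long holdoff)
{
	unsigned long tlast = *(volatile unsigned long *)&sp->last_gp_end;

	if (holdoff == 0 || time_in_range_open(now, tlast, tlast + holdoff))
		return false;	/* Too soon after last GP (or holdoff disabled). */
	return true;
}
```

Because the comparisons operate on differences, the check still behaves correctly when tlast is just below ULONG_MAX and the clock value has wrapped past zero.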
...@@ -22,6 +22,7 @@ ...@@ -22,6 +22,7 @@
#include <linux/time.h> #include <linux/time.h>
#include <linux/cpu.h> #include <linux/cpu.h>
#include <linux/prefetch.h> #include <linux/prefetch.h>
#include <linux/slab.h>
#include "rcu.h" #include "rcu.h"
...@@ -73,6 +74,31 @@ void rcu_sched_clock_irq(int user) ...@@ -73,6 +74,31 @@ void rcu_sched_clock_irq(int user)
} }
} }
/*
* Reclaim the specified callback, either by invoking it for non-kfree cases or
* freeing it directly (for kfree). Return true if kfreeing, false otherwise.
*/
static inline bool rcu_reclaim_tiny(struct rcu_head *head)
{
rcu_callback_t f;
unsigned long offset = (unsigned long)head->func;
rcu_lock_acquire(&rcu_callback_map);
if (__is_kfree_rcu_offset(offset)) {
trace_rcu_invoke_kfree_callback("", head, offset);
kfree((void *)head - offset);
rcu_lock_release(&rcu_callback_map);
return true;
}
trace_rcu_invoke_callback("", head);
f = head->func;
WRITE_ONCE(head->func, (rcu_callback_t)0L);
f(head);
rcu_lock_release(&rcu_callback_map);
return false;
}
/* Invoke the RCU callbacks whose grace period has elapsed. */ /* Invoke the RCU callbacks whose grace period has elapsed. */
static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused) static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
{ {
...@@ -100,7 +126,7 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused ...@@ -100,7 +126,7 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
prefetch(next); prefetch(next);
debug_rcu_head_unqueue(list); debug_rcu_head_unqueue(list);
local_bh_disable(); local_bh_disable();
__rcu_reclaim("", list); rcu_reclaim_tiny(list);
local_bh_enable(); local_bh_enable();
list = next; list = next;
} }
......
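rcu_reclaim_tiny() tells kfree_rcu() requests apart from ordinary callbacks by the value stored in ->func. kfree_rcu() stores the offset of the rcu_head within the enclosing object, and __is_kfree_rcu_offset() treats values below 4096 as such offsets. The userspace sketch below mirrors that dispatch using hypothetical names (struct cb_head, reclaim_sketch(), count_cb()). Storing an integer in a function-pointer field is implementation-defined in C, though it works with GCC and Clang on common platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Small ->func values are offsets, not code addresses (4096 as in the kernel). */
#define IS_KFREE_OFFSET(off)	((off) < 4096)

static int invoked;	/* Counts direct callback invocations (demo only). */

static void count_cb(struct cb_head *head)
{
	(void)head;
	invoked++;
}

/*
 * Mirror of the rcu_reclaim_tiny() logic: either free the enclosing object
 * using the offset encoded in ->func, or clear ->func and invoke the
 * callback.  Returns true if the object was freed.
 */
static bool reclaim_sketch(struct cb_head *head)
{
	uintptr_t offset = (uintptr_t)head->func;
	void (*f)(struct cb_head *);

	if (IS_KFREE_OFFSET(offset)) {
		free((char *)head - offset);	/* Back up to the object start. */
		return true;
	}
	f = head->func;
	head->func = NULL;	/* Cleared before invocation, as in the kernel. */
	f(head);
	return false;
}
```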
...@@ -43,7 +43,6 @@ ...@@ -43,7 +43,6 @@
#include <uapi/linux/sched/types.h> #include <uapi/linux/sched/types.h>
#include <linux/prefetch.h> #include <linux/prefetch.h>
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/stop_machine.h>
#include <linux/random.h> #include <linux/random.h>
#include <linux/trace_events.h> #include <linux/trace_events.h>
#include <linux/suspend.h> #include <linux/suspend.h>
...@@ -55,6 +54,7 @@ ...@@ -55,6 +54,7 @@
#include <linux/oom.h> #include <linux/oom.h>
#include <linux/smpboot.h> #include <linux/smpboot.h>
#include <linux/jiffies.h> #include <linux/jiffies.h>
#include <linux/slab.h>
#include <linux/sched/isolation.h> #include <linux/sched/isolation.h>
#include <linux/sched/clock.h> #include <linux/sched/clock.h>
#include "../time/tick-internal.h" #include "../time/tick-internal.h"
...@@ -84,7 +84,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = { ...@@ -84,7 +84,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE, .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR), .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
}; };
struct rcu_state rcu_state = { static struct rcu_state rcu_state = {
.level = { &rcu_state.node[0] }, .level = { &rcu_state.node[0] },
.gp_state = RCU_GP_IDLE, .gp_state = RCU_GP_IDLE,
.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, .gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
...@@ -188,7 +188,7 @@ EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio); ...@@ -188,7 +188,7 @@ EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
* held, but the bit corresponding to the current CPU will be stable * held, but the bit corresponding to the current CPU will be stable
* in most contexts. * in most contexts.
*/ */
unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp) static unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp)
{ {
return READ_ONCE(rnp->qsmaskinitnext); return READ_ONCE(rnp->qsmaskinitnext);
} }
...@@ -294,7 +294,7 @@ static void rcu_dynticks_eqs_online(void) ...@@ -294,7 +294,7 @@ static void rcu_dynticks_eqs_online(void)
* *
* No ordering, as we are sampling CPU-local information. * No ordering, as we are sampling CPU-local information.
*/ */
bool rcu_dynticks_curr_cpu_in_eqs(void) static bool rcu_dynticks_curr_cpu_in_eqs(void)
{ {
struct rcu_data *rdp = this_cpu_ptr(&rcu_data); struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
...@@ -305,7 +305,7 @@ bool rcu_dynticks_curr_cpu_in_eqs(void) ...@@ -305,7 +305,7 @@ bool rcu_dynticks_curr_cpu_in_eqs(void)
* Snapshot the ->dynticks counter with full ordering so as to allow * Snapshot the ->dynticks counter with full ordering so as to allow
* stable comparison of this counter with past and future snapshots. * stable comparison of this counter with past and future snapshots.
*/ */
int rcu_dynticks_snap(struct rcu_data *rdp) static int rcu_dynticks_snap(struct rcu_data *rdp)
{ {
int snap = atomic_add_return(0, &rdp->dynticks); int snap = atomic_add_return(0, &rdp->dynticks);
...@@ -528,16 +528,6 @@ static struct rcu_node *rcu_get_root(void) ...@@ -528,16 +528,6 @@ static struct rcu_node *rcu_get_root(void)
return &rcu_state.node[0]; return &rcu_state.node[0];
} }
/*
* Convert a ->gp_state value to a character string.
*/
static const char *gp_state_getname(short gs)
{
if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
return "???";
return gp_state_names[gs];
}
/* /*
* Send along grace-period-related data for rcutorture diagnostics. * Send along grace-period-related data for rcutorture diagnostics.
*/ */
...@@ -577,7 +567,7 @@ static void rcu_eqs_enter(bool user) ...@@ -577,7 +567,7 @@ static void rcu_eqs_enter(bool user)
} }
lockdep_assert_irqs_disabled(); lockdep_assert_irqs_disabled();
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, rdp->dynticks); trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)); WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
rdp = this_cpu_ptr(&rcu_data); rdp = this_cpu_ptr(&rcu_data);
do_nocb_deferred_wakeup(rdp); do_nocb_deferred_wakeup(rdp);
...@@ -650,14 +640,15 @@ static __always_inline void rcu_nmi_exit_common(bool irq) ...@@ -650,14 +640,15 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
* leave it in non-RCU-idle state. * leave it in non-RCU-idle state.
*/ */
if (rdp->dynticks_nmi_nesting != 1) { if (rdp->dynticks_nmi_nesting != 1) {
trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2, rdp->dynticks); trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2,
atomic_read(&rdp->dynticks));
WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */ WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
rdp->dynticks_nmi_nesting - 2); rdp->dynticks_nmi_nesting - 2);
return; return;
} }
/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */ /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, rdp->dynticks); trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, atomic_read(&rdp->dynticks));
WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */ WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
if (irq) if (irq)
...@@ -744,7 +735,7 @@ static void rcu_eqs_exit(bool user) ...@@ -744,7 +735,7 @@ static void rcu_eqs_exit(bool user)
rcu_dynticks_task_exit(); rcu_dynticks_task_exit();
rcu_dynticks_eqs_exit(); rcu_dynticks_eqs_exit();
rcu_cleanup_after_idle(); rcu_cleanup_after_idle();
trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, rdp->dynticks); trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, atomic_read(&rdp->dynticks));
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)); WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
WRITE_ONCE(rdp->dynticks_nesting, 1); WRITE_ONCE(rdp->dynticks_nesting, 1);
WARN_ON_ONCE(rdp->dynticks_nmi_nesting); WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
...@@ -800,8 +791,8 @@ void rcu_user_exit(void) ...@@ -800,8 +791,8 @@ void rcu_user_exit(void)
*/ */
static __always_inline void rcu_nmi_enter_common(bool irq) static __always_inline void rcu_nmi_enter_common(bool irq)
{ {
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
long incby = 2; long incby = 2;
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
/* Complain about underflow. */ /* Complain about underflow. */
WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0); WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
...@@ -828,12 +819,17 @@ static __always_inline void rcu_nmi_enter_common(bool irq) ...@@ -828,12 +819,17 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
} else if (tick_nohz_full_cpu(rdp->cpu) && } else if (tick_nohz_full_cpu(rdp->cpu) &&
rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE && rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) { READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
rdp->rcu_forced_tick = true; raw_spin_lock_rcu_node(rdp->mynode);
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU); // Recheck under lock.
if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
rdp->rcu_forced_tick = true;
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
}
raw_spin_unlock_rcu_node(rdp->mynode);
} }
trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="), trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting,
rdp->dynticks_nmi_nesting + incby, rdp->dynticks); rdp->dynticks_nmi_nesting + incby, atomic_read(&rdp->dynticks));
WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */ WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */
rdp->dynticks_nmi_nesting + incby); rdp->dynticks_nmi_nesting + incby);
barrier(); barrier();
...@@ -898,6 +894,7 @@ void rcu_irq_enter_irqson(void) ...@@ -898,6 +894,7 @@ void rcu_irq_enter_irqson(void)
*/ */
static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp) static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
{ {
raw_lockdep_assert_held_rcu_node(rdp->mynode);
WRITE_ONCE(rdp->rcu_urgent_qs, false); WRITE_ONCE(rdp->rcu_urgent_qs, false);
WRITE_ONCE(rdp->rcu_need_heavy_qs, false); WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) { if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
...@@ -1934,7 +1931,7 @@ rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags) ...@@ -1934,7 +1931,7 @@ rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags)
struct rcu_node *rnp_p; struct rcu_node *rnp_p;
raw_lockdep_assert_held_rcu_node(rnp); raw_lockdep_assert_held_rcu_node(rnp);
if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPTION)) || if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RCU)) ||
WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) || WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) ||
rnp->qsmask != 0) { rnp->qsmask != 0) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
...@@ -2146,7 +2143,6 @@ static void rcu_do_batch(struct rcu_data *rdp) ...@@ -2146,7 +2143,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
/* If no callbacks are ready, just return. */ /* If no callbacks are ready, just return. */
if (!rcu_segcblist_ready_cbs(&rdp->cblist)) { if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
trace_rcu_batch_start(rcu_state.name, trace_rcu_batch_start(rcu_state.name,
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
rcu_segcblist_n_cbs(&rdp->cblist), 0); rcu_segcblist_n_cbs(&rdp->cblist), 0);
trace_rcu_batch_end(rcu_state.name, 0, trace_rcu_batch_end(rcu_state.name, 0,
!rcu_segcblist_empty(&rdp->cblist), !rcu_segcblist_empty(&rdp->cblist),
...@@ -2168,7 +2164,6 @@ static void rcu_do_batch(struct rcu_data *rdp) ...@@ -2168,7 +2164,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
if (unlikely(bl > 100)) if (unlikely(bl > 100))
tlimit = local_clock() + rcu_resched_ns; tlimit = local_clock() + rcu_resched_ns;
trace_rcu_batch_start(rcu_state.name, trace_rcu_batch_start(rcu_state.name,
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
rcu_segcblist_n_cbs(&rdp->cblist), bl); rcu_segcblist_n_cbs(&rdp->cblist), bl);
rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl); rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
if (offloaded) if (offloaded)
...@@ -2179,9 +2174,19 @@ static void rcu_do_batch(struct rcu_data *rdp) ...@@ -2179,9 +2174,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
tick_dep_set_task(current, TICK_DEP_BIT_RCU); tick_dep_set_task(current, TICK_DEP_BIT_RCU);
rhp = rcu_cblist_dequeue(&rcl); rhp = rcu_cblist_dequeue(&rcl);
for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) { for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
rcu_callback_t f;
debug_rcu_head_unqueue(rhp); debug_rcu_head_unqueue(rhp);
if (__rcu_reclaim(rcu_state.name, rhp))
rcu_cblist_dequeued_lazy(&rcl); rcu_lock_acquire(&rcu_callback_map);
trace_rcu_invoke_callback(rcu_state.name, rhp);
f = rhp->func;
WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
f(rhp);
rcu_lock_release(&rcu_callback_map);
/* /*
* Stop only if limit reached and CPU has something to do. * Stop only if limit reached and CPU has something to do.
* Note: The rcl structure counts down from zero. * Note: The rcl structure counts down from zero.
...@@ -2294,7 +2299,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp)) ...@@ -2294,7 +2299,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
mask = 0; mask = 0;
raw_spin_lock_irqsave_rcu_node(rnp, flags); raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (rnp->qsmask == 0) { if (rnp->qsmask == 0) {
if (!IS_ENABLED(CONFIG_PREEMPTION) || if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
rcu_preempt_blocked_readers_cgp(rnp)) { rcu_preempt_blocked_readers_cgp(rnp)) {
/* /*
* No point in scanning bits because they * No point in scanning bits because they
...@@ -2308,14 +2313,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp)) ...@@ -2308,14 +2313,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
continue; continue;
} }
for_each_leaf_node_possible_cpu(rnp, cpu) { for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
unsigned long bit = leaf_node_cpu_bit(rnp, cpu); rdp = per_cpu_ptr(&rcu_data, cpu);
if ((rnp->qsmask & bit) != 0) { if (f(rdp)) {
rdp = per_cpu_ptr(&rcu_data, cpu); mask |= rdp->grpmask;
if (f(rdp)) { rcu_disable_urgency_upon_qs(rdp);
mask |= bit;
rcu_disable_urgency_upon_qs(rdp);
}
} }
} }
if (mask != 0) { if (mask != 0) {
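The force_qs_rnp() hunk replaces a loop over every possible CPU in the leaf node, which tested each CPU's bit against ->qsmask, with for_each_leaf_node_cpu_mask(), which visits only CPUs whose bit is set. The sketch below shows that style of iteration in plain C. The kernel macro uses its own bit-search helpers; this sketch instead repeatedly clears the lowest set bit using the GCC/Clang builtin __builtin_ctzl(). visit_set_cpus() is hypothetical, and bit i of the mask is assumed to correspond to CPU grplo + i.

```c
#include <assert.h>

/*
 * Hypothetical sketch: visit only the CPUs whose bits are set in mask,
 * where bit i corresponds to CPU grplo + i.  Stores up to max CPU
 * numbers in out[] and returns how many were visited.
 */
static int visit_set_cpus(unsigned long mask, int grplo, int *out, int max)
{
	int n = 0;

	while (mask && n < max) {
		int bit = __builtin_ctzl(mask);	/* Index of lowest set bit. */

		out[n++] = grplo + bit;
		mask &= mask - 1;		/* Clear that bit. */
	}
	return n;
}
```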
...@@ -2474,8 +2476,8 @@ static void rcu_cpu_kthread(unsigned int cpu) ...@@ -2474,8 +2476,8 @@ static void rcu_cpu_kthread(unsigned int cpu)
char work, *workp = this_cpu_ptr(&rcu_data.rcu_cpu_has_work); char work, *workp = this_cpu_ptr(&rcu_data.rcu_cpu_has_work);
int spincnt; int spincnt;
trace_rcu_utilization(TPS("Start CPU kthread@rcu_run"));
for (spincnt = 0; spincnt < 10; spincnt++) { for (spincnt = 0; spincnt < 10; spincnt++) {
trace_rcu_utilization(TPS("Start CPU kthread@rcu_wait"));
local_bh_disable(); local_bh_disable();
*statusp = RCU_KTHREAD_RUNNING; *statusp = RCU_KTHREAD_RUNNING;
local_irq_disable(); local_irq_disable();
...@@ -2583,7 +2585,7 @@ static void rcu_leak_callback(struct rcu_head *rhp) ...@@ -2583,7 +2585,7 @@ static void rcu_leak_callback(struct rcu_head *rhp)
* is expected to specify a CPU. * is expected to specify a CPU.
*/ */
static void static void
__call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy) __call_rcu(struct rcu_head *head, rcu_callback_t func)
{ {
unsigned long flags; unsigned long flags;
struct rcu_data *rdp; struct rcu_data *rdp;
...@@ -2618,18 +2620,17 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy) ...@@ -2618,18 +2620,17 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy)
if (rcu_segcblist_empty(&rdp->cblist)) if (rcu_segcblist_empty(&rdp->cblist))
rcu_segcblist_init(&rdp->cblist); rcu_segcblist_init(&rdp->cblist);
} }
if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
return; // Enqueued onto ->nocb_bypass, so just leave. return; // Enqueued onto ->nocb_bypass, so just leave.
/* If we get here, rcu_nocb_try_bypass() acquired ->nocb_lock. */ /* If we get here, rcu_nocb_try_bypass() acquired ->nocb_lock. */
rcu_segcblist_enqueue(&rdp->cblist, head, lazy); rcu_segcblist_enqueue(&rdp->cblist, head);
if (__is_kfree_rcu_offset((unsigned long)func)) if (__is_kfree_rcu_offset((unsigned long)func))
trace_rcu_kfree_callback(rcu_state.name, head, trace_rcu_kfree_callback(rcu_state.name, head,
(unsigned long)func, (unsigned long)func,
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
rcu_segcblist_n_cbs(&rdp->cblist)); rcu_segcblist_n_cbs(&rdp->cblist));
else else
trace_rcu_callback(rcu_state.name, head, trace_rcu_callback(rcu_state.name, head,
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
rcu_segcblist_n_cbs(&rdp->cblist)); rcu_segcblist_n_cbs(&rdp->cblist));
/* Go handle any RCU core processing required. */ /* Go handle any RCU core processing required. */
...@@ -2679,28 +2680,230 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy) ...@@ -2679,28 +2680,230 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy)
*/ */
void call_rcu(struct rcu_head *head, rcu_callback_t func) void call_rcu(struct rcu_head *head, rcu_callback_t func)
{ {
__call_rcu(head, func, 0); __call_rcu(head, func);
} }
EXPORT_SYMBOL_GPL(call_rcu); EXPORT_SYMBOL_GPL(call_rcu);
/* Maximum number of jiffies to wait before draining a batch. */
#define KFREE_DRAIN_JIFFIES (HZ / 50)
#define KFREE_N_BATCHES 2
/**
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
+ * @head_free: List of kfree_rcu() objects waiting for a grace period
+ * @krcp: Pointer to @kfree_rcu_cpu structure
+ */
+struct kfree_rcu_cpu_work {
+	struct rcu_work rcu_work;
+	struct rcu_head *head_free;
+	struct kfree_rcu_cpu *krcp;
+};
+/**
+ * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
+ * @head: List of kfree_rcu() objects not yet waiting for a grace period
+ * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
+ * @lock: Synchronize access to this structure
+ * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
+ * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
+ * @initialized: The @lock and @rcu_work fields have been initialized
+ *
+ * This is a per-CPU structure. The reason that it is not included in
+ * the rcu_data structure is to permit this code to be extracted from
+ * the RCU files. Such extraction could allow further optimization of
+ * the interactions with the slab allocators.
+ */
+struct kfree_rcu_cpu {
+	struct rcu_head *head;
+	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
+	spinlock_t lock;
+	struct delayed_work monitor_work;
+	bool monitor_todo;
+	bool initialized;
+};
+static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
+/*
+ * This function is invoked in workqueue context after a grace period.
+ * It frees all the objects queued on ->head_free.
+ */
+static void kfree_rcu_work(struct work_struct *work)
+{
+	unsigned long flags;
+	struct rcu_head *head, *next;
+	struct kfree_rcu_cpu *krcp;
+	struct kfree_rcu_cpu_work *krwp;
+	krwp = container_of(to_rcu_work(work),
+			    struct kfree_rcu_cpu_work, rcu_work);
+	krcp = krwp->krcp;
+	spin_lock_irqsave(&krcp->lock, flags);
+	head = krwp->head_free;
+	krwp->head_free = NULL;
+	spin_unlock_irqrestore(&krcp->lock, flags);
+	// List "head" is now private, so traverse locklessly.
+	for (; head; head = next) {
+		unsigned long offset = (unsigned long)head->func;
+		next = head->next;
+		// Potentially optimize with kfree_bulk in future.
+		debug_rcu_head_unqueue(head);
+		rcu_lock_acquire(&rcu_callback_map);
+		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
+		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset))) {
+			/* Could be optimized with kfree_bulk() in future. */
+			kfree((void *)head - offset);
+		}
+		rcu_lock_release(&rcu_callback_map);
+		cond_resched_tasks_rcu_qs();
+	}
+}
 /*
- * Queue an RCU callback for lazy invocation after a grace period.
- * This will likely be later named something like "call_rcu_lazy()",
- * but this change will require some way of tagging the lazy RCU
- * callbacks in the list of pending callbacks. Until then, this
- * function may only be called from __kfree_rcu().
+ * Schedule the kfree batch RCU work to run in workqueue context after a GP.
+ *
+ * This function is invoked by kfree_rcu_monitor() when the KFREE_DRAIN_JIFFIES
+ * timeout has been reached.
+ */
+static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
+{
+	int i;
+	struct kfree_rcu_cpu_work *krwp = NULL;
+	lockdep_assert_held(&krcp->lock);
+	for (i = 0; i < KFREE_N_BATCHES; i++)
+		if (!krcp->krw_arr[i].head_free) {
+			krwp = &(krcp->krw_arr[i]);
+			break;
+		}
+	// If a previous RCU batch is in progress, we cannot immediately
+	// queue another one, so return false to tell caller to retry.
+	if (!krwp)
+		return false;
+	krwp->head_free = krcp->head;
+	krcp->head = NULL;
+	INIT_RCU_WORK(&krwp->rcu_work, kfree_rcu_work);
+	queue_rcu_work(system_wq, &krwp->rcu_work);
+	return true;
+}
+static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
+					  unsigned long flags)
+{
+	// Attempt to start a new batch.
+	krcp->monitor_todo = false;
+	if (queue_kfree_rcu_work(krcp)) {
+		// Success! Our job is done here.
+		spin_unlock_irqrestore(&krcp->lock, flags);
+		return;
+	}
+	// Previous RCU batch still in progress, try again later.
+	krcp->monitor_todo = true;
+	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+	spin_unlock_irqrestore(&krcp->lock, flags);
+}
+/*
+ * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
+ * It invokes kfree_rcu_drain_unlock() to attempt to start another batch.
+ */
+static void kfree_rcu_monitor(struct work_struct *work)
+{
+	unsigned long flags;
+	struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
+						  monitor_work.work);
+	spin_lock_irqsave(&krcp->lock, flags);
+	if (krcp->monitor_todo)
+		kfree_rcu_drain_unlock(krcp, flags);
+	else
+		spin_unlock_irqrestore(&krcp->lock, flags);
+}
+/*
+ * Queue a request for lazy invocation of kfree() after a grace period.
+ *
+ * Each kfree_call_rcu() request is added to a batch. The batch will be drained
+ * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch
+ * will be kfree'd in workqueue context. This allows us to:
+ *
+ * 1.	Batch requests together to reduce the number of grace periods during
+ *	heavy kfree_rcu() load.
+ *
+ * 2.	It makes it possible to use kfree_bulk() on a large number of
+ *	kfree_rcu() requests thus reducing cache misses and the per-object
+ *	overhead of kfree().
  */
 void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	__call_rcu(head, func, 1);
+	unsigned long flags;
+	struct kfree_rcu_cpu *krcp;
+	local_irq_save(flags);	// For safely calling this_cpu_ptr().
+	krcp = this_cpu_ptr(&krc);
+	if (krcp->initialized)
+		spin_lock(&krcp->lock);
+	// Queue the object but don't yet schedule the batch.
+	if (debug_rcu_head_queue(head)) {
+		// Probable double kfree_rcu(), just leak.
+		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
+			  __func__, head);
+		goto unlock_return;
+	}
+	head->func = func;
+	head->next = krcp->head;
+	krcp->head = head;
+	// Set timer to drain after KFREE_DRAIN_JIFFIES.
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
+	    !krcp->monitor_todo) {
+		krcp->monitor_todo = true;
+		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+	}
+unlock_return:
+	if (krcp->initialized)
+		spin_unlock(&krcp->lock);
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
+void __init kfree_rcu_scheduler_running(void)
+{
+	int cpu;
+	unsigned long flags;
+	for_each_online_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+		spin_lock_irqsave(&krcp->lock, flags);
+		if (!krcp->head || krcp->monitor_todo) {
+			spin_unlock_irqrestore(&krcp->lock, flags);
+			continue;
+		}
+		krcp->monitor_todo = true;
+		schedule_delayed_work_on(cpu, &krcp->monitor_work,
+					 KFREE_DRAIN_JIFFIES);
+		spin_unlock_irqrestore(&krcp->lock, flags);
+	}
+}
 /*
  * During early boot, any blocking grace-period wait automatically
- * implies a grace period. Later on, this is never the case for PREEMPT.
+ * implies a grace period. Later on, this is never the case for PREEMPTION.
  *
- * Howevr, because a context switch is a grace period for !PREEMPT, any
+ * Howevr, because a context switch is a grace period for !PREEMPTION, any
  * blocking grace-period wait automatically implies a grace period if
  * there is only one CPU online at any point time during execution of
  * either synchronize_rcu() or synchronize_rcu_expedited(). It is OK to
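The batching added in the hunk above is easier to follow with a concrete model. What follows is a minimal single-threaded userspace sketch, not kernel code. `struct cpu_batcher`, `enqueue()`, `try_queue()`, `monitor()` and `complete_batch()` are hypothetical stand-ins for `struct kfree_rcu_cpu`, kfree_call_rcu(), queue_kfree_rcu_work(), kfree_rcu_monitor() and the post-grace-period kfree_rcu_work(). Locking, the per-CPU lookup and the delayed-work timer are all omitted, and a "grace period" ends only when the caller calls `complete_batch()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_BATCHES 2 /* plays the role of KFREE_N_BATCHES */

struct node {
	struct node *next;
};

/* One in-flight batch; head_free != NULL means "waiting for a GP". */
struct batch {
	struct node *head_free;
};

/* Hypothetical stand-in for struct kfree_rcu_cpu (no lock, no timer). */
struct cpu_batcher {
	struct node *head;            /* not yet handed to any batch */
	struct batch arr[N_BATCHES];
	bool monitor_todo;            /* a drain attempt is still pending */
};

/* Like kfree_call_rcu(): push onto the pending list and arm the monitor. */
static void enqueue(struct cpu_batcher *c, struct node *n)
{
	n->next = c->head;
	c->head = n;
	c->monitor_todo = true;
}

/* Like queue_kfree_rcu_work(): move the pending list into a free slot. */
static bool try_queue(struct cpu_batcher *c)
{
	for (int i = 0; i < N_BATCHES; i++) {
		if (!c->arr[i].head_free) {
			c->arr[i].head_free = c->head;
			c->head = NULL;
			return true;
		}
	}
	return false; /* all slots busy: the caller must retry later */
}

/* Like kfree_rcu_monitor(): if no slot is free, monitor_todo stays set. */
static void monitor(struct cpu_batcher *c)
{
	if (c->monitor_todo)
		c->monitor_todo = !try_queue(c);
}

/* Called when slot i's grace period ends; returns how many objects the
 * real code would now pass to kfree(). */
static int complete_batch(struct cpu_batcher *c, int i)
{
	int n = 0;

	for (struct node *p = c->arr[i].head_free; p; p = p->next)
		n++;
	c->arr[i].head_free = NULL;
	return n;
}
```

With two slots, a third drain attempt finds both busy and leaves `monitor_todo` set. That corresponds to kfree_rcu_drain_unlock() rescheduling itself after KFREE_DRAIN_JIFFIES. Once one batch's grace period completes, the next attempt succeeds.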
@@ -2896,7 +3099,7 @@ static void rcu_barrier_func(void *unused)
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
-	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) {
+	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
 		debug_rcu_head_unqueue(&rdp->barrier_head);
@@ -3557,12 +3760,29 @@ static void __init rcu_dump_rcu_node_tree(void)
 struct workqueue_struct *rcu_gp_wq;
 struct workqueue_struct *rcu_par_gp_wq;
+static void __init kfree_rcu_batch_init(void)
+{
+	int cpu;
+	int i;
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+		spin_lock_init(&krcp->lock);
+		for (i = 0; i < KFREE_N_BATCHES; i++)
+			krcp->krw_arr[i].krcp = krcp;
+		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
+		krcp->initialized = true;
+	}
+}
 void __init rcu_init(void)
 {
 	int cpu;
 	rcu_early_boot_tests();
+	kfree_rcu_batch_init();
 	rcu_bootup_announce();
 	rcu_init_geometry();
 	rcu_init_one();
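kfree_rcu_work() in the first hunk never calls a callback. Instead it treats head->func as the byte offset of the rcu_head within the enclosing object and frees `(void *)head - offset`, after validating the value with __is_kfree_rcu_offset(). The sketch below models that encoding in userspace. `struct cb_head`, `struct widget` and the helper functions are hypothetical. The 4096 limit is an assumption about what __is_kfree_rcu_offset() accepts: offsets small enough that no valid function address can collide with them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct rcu_head (hypothetical). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
};

/* Hypothetical object with the header embedded mid-structure. */
struct widget {
	int id;
	char name[16];
	struct cb_head rh;
};

#define OFFSET_LIMIT 4096UL /* assumption: mirrors __is_kfree_rcu_offset() */

/* An ordinary callback, whose address is far above OFFSET_LIMIT. */
static void real_callback(struct cb_head *h)
{
	(void)h;
}

/* Store the member offset where a callback pointer would normally go. */
static void tag_with_offset(struct cb_head *h, size_t offset)
{
	h->func = (void (*)(struct cb_head *))offset;
}

/* Recover the enclosing allocation, or NULL if func is a real callback. */
static void *enclosing_object(struct cb_head *h)
{
	unsigned long offset = (unsigned long)h->func;

	if (offset >= OFFSET_LIMIT)
		return NULL;
	return (char *)h - offset;
}
```

Because the offset is computed with `offsetof()`, the header can sit anywhere inside the object. The only requirement is that the object start within OFFSET_LIMIT bytes of the header.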
......
@@ -16,7 +16,6 @@
 #include <linux/cpumask.h>
 #include <linux/seqlock.h>
 #include <linux/swait.h>
-#include <linux/stop_machine.h>
 #include <linux/rcu_node_tree.h>
 #include "rcu_segcblist.h"
@@ -182,8 +181,8 @@ struct rcu_data {
 	bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */
 	bool rcu_urgent_qs; /* GP old need light quiescent state. */
 	bool rcu_forced_tick; /* Forced tick to provide QS. */
+	bool rcu_forced_tick_exp; /* ... provide QS to expedited GP. */
 #ifdef CONFIG_RCU_FAST_NO_HZ
-	bool all_lazy; /* All CPU's CBs lazy at idle start? */
 	unsigned long last_accelerate; /* Last jiffy CBs were accelerated. */
 	unsigned long last_advance_all; /* Last jiffy CBs were all advanced. */
 	int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
@@ -368,18 +367,6 @@ struct rcu_state {
 #define RCU_GP_CLEANUP 7 /* Grace-period cleanup started. */
 #define RCU_GP_CLEANED 8 /* Grace-period cleanup complete. */
-static const char * const gp_state_names[] = {
-	"RCU_GP_IDLE",
-	"RCU_GP_WAIT_GPS",
-	"RCU_GP_DONE_GPS",
-	"RCU_GP_ONOFF",
-	"RCU_GP_INIT",
-	"RCU_GP_WAIT_FQS",
-	"RCU_GP_DOING_FQS",
-	"RCU_GP_CLEANUP",
-	"RCU_GP_CLEANED",
-};
 /*
  * In order to export the rcu_state name to the tracing tools, it
  * needs to be added in the __tracepoint_string section.
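This hunk deletes gp_state_names[] from tree.h. According to the merge summary, the table moves to tree_stall.h along with gp_state_getname(). A bounds-checked name lookup of that kind might look like the sketch below. The enum names, the designated initializers and the "???" fallback are assumptions for illustration. Only the RCU_GP_CLEANUP (7) and RCU_GP_CLEANED (8) values appear in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed to follow the order of the removed table; only CLEANUP (7) and
 * CLEANED (8) are visible as #defines in the diff. */
enum gp_state {
	GP_IDLE, GP_WAIT_GPS, GP_DONE_GPS, GP_ONOFF, GP_INIT,
	GP_WAIT_FQS, GP_DOING_FQS, GP_CLEANUP, GP_CLEANED,
};

/* Designated initializers keep each name tied to its state value. */
static const char * const gp_names[] = {
	[GP_IDLE]      = "RCU_GP_IDLE",
	[GP_WAIT_GPS]  = "RCU_GP_WAIT_GPS",
	[GP_DONE_GPS]  = "RCU_GP_DONE_GPS",
	[GP_ONOFF]     = "RCU_GP_ONOFF",
	[GP_INIT]      = "RCU_GP_INIT",
	[GP_WAIT_FQS]  = "RCU_GP_WAIT_FQS",
	[GP_DOING_FQS] = "RCU_GP_DOING_FQS",
	[GP_CLEANUP]   = "RCU_GP_CLEANUP",
	[GP_CLEANED]   = "RCU_GP_CLEANED",
};

/* Never index past the table, even for a corrupted state value. */
static const char *gp_name(short gs)
{
	if (gs < 0 || (size_t)gs >= ARRAY_SIZE(gp_names))
		return "???";
	return gp_names[gs];
}
```

The range check matters because the state is typically printed from stall-warning paths, where the value being reported may itself be suspect.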
@@ -403,8 +390,6 @@ static const char *tp_rcu_varname __used __tracepoint_string = rcu_name;
 #define RCU_NAME rcu_name
 #endif /* #else #ifdef CONFIG_TRACING */
-int rcu_dynticks_snap(struct rcu_data *rdp);
 /* Forward declarations for tree_plugin.h */
 static void rcu_bootup_announce(void);
 static void rcu_qs(void);
@@ -415,7 +400,6 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
 static int rcu_print_task_exp_stall(struct rcu_node *rnp);
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
 static void rcu_flavor_sched_clock_irq(int user);
-void call_rcu(struct rcu_head *head, rcu_callback_t func);
 static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
......
@@ -21,7 +21,7 @@ static void rcu_exp_gp_seq_start(void)
 }
 /*
- * Return then value that expedited-grace-period counter will have
+ * Return the value that the expedited-grace-period counter will have
  * at the end of the current grace period.
  */
 static __maybe_unused unsigned long rcu_exp_gp_seq_endval(void)
@@ -39,7 +39,9 @@ static void rcu_exp_gp_seq_end(void)
 }
 /*
- * Take a snapshot of the expedited-grace-period counter.
+ * Take a snapshot of the expedited-grace-period counter, which is the
+ * earliest value that will indicate that a full grace period has
+ * elapsed since the current time.
  */
 static unsigned long rcu_exp_gp_seq_snap(void)
 {
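The new comment says the snapshot is the earliest counter value that proves a full grace period has elapsed. The arithmetic can be sketched in userspace, assuming the usual RCU sequence-counter layout, in which the low two bits hold the grace-period state and the remaining bits count grace periods. That layout is not shown in this diff, and all names below are hypothetical restatements rather than the kernel helpers themselves.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SEQ_CTR_SHIFT  2                              /* assumption */
#define SEQ_STATE_MASK ((1UL << SEQ_CTR_SHIFT) - 1)   /* low bits: state */
/* Wraparound-safe "a >= b" for free-running counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

static unsigned long seq_ctr(unsigned long s) { return s >> SEQ_CTR_SHIFT; }
static unsigned long seq_state(unsigned long s) { return s & SEQ_STATE_MASK; }

/* Grace period begins: state goes from 0 to 1. */
static void seq_start(unsigned long *sp) { *sp += 1; }

/* Grace period ends: clear the state and bump the counter. */
static void seq_end(unsigned long *sp) { *sp = (*sp | SEQ_STATE_MASK) + 1; }

/* Earliest value that implies a full grace period after this call:
 * round up past any grace period that is already in progress. */
static unsigned long seq_snap(unsigned long s)
{
	return (s + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

static bool seq_done(unsigned long cur, unsigned long snap)
{
	return ULONG_CMP_GE(cur, snap);
}
```

Note the rounding. Taken while idle (for example at 4), the snapshot is 8, one grace period later. Taken while a grace period is already running (for example at 9), the snapshot is 16: the running grace period does not count, because it may have started before the caller's request, so one more full grace period is needed.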
@@ -134,7 +136,7 @@ static void __maybe_unused sync_exp_reset_tree(void)
 	rcu_for_each_node_breadth_first(rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		WARN_ON_ONCE(rnp->expmask);
-		rnp->expmask = rnp->expmaskinit;
+		WRITE_ONCE(rnp->expmask, rnp->expmaskinit);
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 }
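Several hunks in this merge change plain accesses to ->expmask and ->gp_tasks into WRITE_ONCE()/READ_ONCE(). The pattern is that updates are serialized by a lock, while some readers check the value without taking that lock. The kernel macros have no direct userspace equivalent. The sketch below uses C11 relaxed atomics as an approximation (an assumption, not the kernel's implementation) to show why the lockless side needs marked accesses. `expmask`, `clear_bits()`, `peek_mask()` and `reporter()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared mask: updated under 'lock', read without it. */
static _Atomic unsigned long expmask;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Update side: the lock serializes writers, but the store is still a
 * marked (atomic) access because lockless readers may run concurrently. */
static void clear_bits(unsigned long mask)
{
	pthread_mutex_lock(&lock);
	unsigned long old = atomic_load_explicit(&expmask, memory_order_relaxed);
	atomic_store_explicit(&expmask, old & ~mask, memory_order_relaxed);
	pthread_mutex_unlock(&lock);
}

/* Lockless read: a single untorn load that the compiler may not cache,
 * merge with other loads, or hoist out of a loop. */
static unsigned long peek_mask(void)
{
	return atomic_load_explicit(&expmask, memory_order_relaxed);
}

/* Thread body: each simulated CPU reports its bit in turn. */
static void *reporter(void *arg)
{
	(void)arg;
	for (int cpu = 0; cpu < 4; cpu++)
		clear_bits(1UL << cpu);
	return NULL;
}
```

A caller that polls with `while (peek_mask() != 0) ;` gets a fresh load on every iteration. With a plain `unsigned long` and no lock, the compiler could legally hoist the load out of the loop and spin forever.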
@@ -143,31 +145,26 @@ static void __maybe_unused sync_exp_reset_tree(void)
  * Return non-zero if there is no RCU expedited grace period in progress
  * for the specified rcu_node structure, in other words, if all CPUs and
  * tasks covered by the specified rcu_node structure have done their bit
- * for the current expedited grace period. Works only for preemptible
- * RCU -- other RCU implementation use other means.
- *
- * Caller must hold the specificed rcu_node structure's ->lock
+ * for the current expedited grace period.
  */
-static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
+static bool sync_rcu_exp_done(struct rcu_node *rnp)
 {
 	raw_lockdep_assert_held_rcu_node(rnp);
 	return rnp->exp_tasks == NULL &&
 	       READ_ONCE(rnp->expmask) == 0;
 }
 /*
- * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
- * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
- * itself
+ * Like sync_rcu_exp_done(), but where the caller does not hold the
+ * rcu_node's ->lock.
  */
-static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
+static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp)
 {
 	unsigned long flags;
 	bool ret;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	ret = sync_rcu_preempt_exp_done(rnp);
+	ret = sync_rcu_exp_done(rnp);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	return ret;
@@ -181,8 +178,6 @@ static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
  * which the task was queued or to one of that rcu_node structure's ancestors,
  * recursively up the tree. (Calm down, calm down, we do the recursion
  * iteratively!)
- *
- * Caller must hold the specified rcu_node structure's ->lock.
  */
 static void __rcu_report_exp_rnp(struct rcu_node *rnp,
 				 bool wake, unsigned long flags)
@@ -190,8 +185,9 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
 {
 	unsigned long mask;
+	raw_lockdep_assert_held_rcu_node(rnp);
 	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp)) {
+		if (!sync_rcu_exp_done(rnp)) {
 			if (!rnp->expmask)
 				rcu_initiate_boost(rnp, flags);
 			else
@@ -211,7 +207,7 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
 		rnp = rnp->parent;
 		raw_spin_lock_rcu_node(rnp); /* irqs already disabled */
 		WARN_ON_ONCE(!(rnp->expmask & mask));
-		rnp->expmask &= ~mask;
+		WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
 	}
 }
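__rcu_report_exp_rnp() walks up the rcu_node tree iteratively. Once a node has no CPUs left in ->expmask and no blocked tasks, the node's own bit (->grpmask) is cleared in its parent. The walk repeats until either some node is still waiting or the root finishes. Below is a lock-free, single-threaded model of that walk. `struct tnode` and `report_up()` are hypothetical, and the per-node locking, priority boosting and wakeup are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an rcu_node-like combining tree. */
struct tnode {
	struct tnode *parent;
	unsigned long expmask;   /* children/CPUs still owing a QS */
	unsigned long grpmask;   /* this node's bit in parent->expmask */
	bool has_blocked_tasks;  /* stands in for ->exp_tasks != NULL */
};

static bool node_done(const struct tnode *n)
{
	return !n->has_blocked_tasks && n->expmask == 0;
}

/* Clear 'mask' in 'n', then propagate upward while nodes become done.
 * Returns true if the root became done (the waiter could be woken). */
static bool report_up(struct tnode *n, unsigned long mask)
{
	n->expmask &= ~mask;
	for (;;) {
		if (!node_done(n))
			return false;    /* someone below still owes a QS */
		if (!n->parent)
			return true;     /* root finished */
		mask = n->grpmask;
		n = n->parent;
		n->expmask &= ~mask;
	}
}
```

Each report touches at most one node per tree level, so the cost of a report is bounded by the tree height rather than by the number of CPUs.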
@@ -234,14 +230,23 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_node *rnp, bool wake)
 static void rcu_report_exp_cpu_mult(struct rcu_node *rnp,
 				    unsigned long mask, bool wake)
 {
+	int cpu;
 	unsigned long flags;
+	struct rcu_data *rdp;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if (!(rnp->expmask & mask)) {
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return;
 	}
-	rnp->expmask &= ~mask;
+	WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
+	for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
+		rdp = per_cpu_ptr(&rcu_data, cpu);
+		if (!IS_ENABLED(CONFIG_NO_HZ_FULL) || !rdp->rcu_forced_tick_exp)
+			continue;
+		rdp->rcu_forced_tick_exp = false;
+		tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+	}
 	__rcu_report_exp_rnp(rnp, wake, flags); /* Releases rnp->lock. */
 }
@@ -345,8 +350,8 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 	/* Each pass checks a CPU for identity, offline, and idle. */
 	mask_ofl_test = 0;
 	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
-		unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+		unsigned long mask = rdp->grpmask;
 		int snap;
 		if (raw_smp_processor_id() == cpu ||
@@ -372,12 +377,10 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	/* IPI the remaining CPUs for expedited quiescent state. */
-	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
-		unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
+	for_each_leaf_node_cpu_mask(rnp, cpu, mask_ofl_ipi) {
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-		if (!(mask_ofl_ipi & mask))
-			continue;
+		unsigned long mask = rdp->grpmask;
 retry_ipi:
 		if (rcu_dynticks_in_eqs_since(rdp, rdp->exp_dynticks_snap)) {
 			mask_ofl_test |= mask;
@@ -389,10 +392,10 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 		}
 		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
 		put_cpu();
-		if (!ret) {
-			mask_ofl_ipi &= ~mask;
+		/* The CPU will report the QS in response to the IPI. */
+		if (!ret)
 			continue;
-		}
 		/* Failed, raced with CPU hotplug operation. */
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if ((rnp->qsmaskinitnext & mask) &&
@@ -403,13 +406,12 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 			schedule_timeout_uninterruptible(1);
 			goto retry_ipi;
 		}
-		/* CPU really is offline, so we can ignore it. */
-		if (!(rnp->expmask & mask))
-			mask_ofl_ipi &= ~mask;
+		/* CPU really is offline, so we must report its QS. */
+		if (rnp->expmask & mask)
+			mask_ofl_test |= mask;
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 	/* Report quiescent states for those that went offline. */
-	mask_ofl_test |= mask_ofl_ipi;
 	if (mask_ofl_test)
 		rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
 }
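After this change the IPI loop iterates over mask_ofl_ipi directly instead of filtering ->expmask. A CPU found to be offline is added to mask_ofl_test so that its quiescent state is reported on its behalf. The sketch below models only that bookkeeping. `ipi_or_report()`, `send_ipi_fn` and `ipi_ok_if_even()` are hypothetical. Bit `n` stands for CPU `n` (as if the leaf's ->grplo were 0), and the retry for a CPU that is still coming online is omitted. The sketch relies on the GCC/Clang builtin `__builtin_ctzl()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IPI result: true if the target CPU accepted the IPI. */
typedef bool (*send_ipi_fn)(int cpu);

/* Visit only the CPUs in mask_ofl_ipi, lowest bit first. Returns the CPUs
 * whose quiescent state the caller must report itself because the IPI
 * failed (the CPU went offline). */
static unsigned long ipi_or_report(unsigned long mask_ofl_ipi, send_ipi_fn send)
{
	unsigned long mask_ofl_test = 0;
	unsigned long todo = mask_ofl_ipi;

	while (todo) {
		int cpu = __builtin_ctzl(todo);   /* index of lowest set bit */
		unsigned long bit = 1UL << cpu;

		todo &= todo - 1;                 /* clear that bit */
		if (send(cpu))
			continue;                 /* CPU reports from its handler */
		mask_ofl_test |= bit;             /* offline: report for it */
	}
	return mask_ofl_test;
}

/* Example policy: even-numbered CPUs are online, odd ones are offline. */
static bool ipi_ok_if_even(int cpu)
{
	return (cpu & 1) == 0;
}
```

Iterating the set bits directly avoids visiting CPUs that were already handled in the first pass, which is why the old `if (!(mask_ofl_ipi & mask)) continue;` filter disappears.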
@@ -456,29 +458,62 @@ static void sync_rcu_exp_select_cpus(void)
 			flush_work(&rnp->rew.rew_work);
 }
-static void synchronize_sched_expedited_wait(void)
+/*
+ * Wait for the expedited grace period to elapse, within time limit.
+ * If the time limit is exceeded without the grace period elapsing,
+ * return false, otherwise return true.
+ */
+static bool synchronize_rcu_expedited_wait_once(long tlimit)
+{
+	int t;
+	struct rcu_node *rnp_root = rcu_get_root();
+	t = swait_event_timeout_exclusive(rcu_state.expedited_wq,
+					  sync_rcu_exp_done_unlocked(rnp_root),
+					  tlimit);
+	// Workqueues should not be signaled.
+	if (t > 0 || sync_rcu_exp_done_unlocked(rnp_root))
+		return true;
+	WARN_ON(t < 0); /* workqueues should not be signaled. */
+	return false;
+}
+/*
+ * Wait for the expedited grace period to elapse, issuing any needed
+ * RCU CPU stall warnings along the way.
+ */
+static void synchronize_rcu_expedited_wait(void)
 {
 	int cpu;
 	unsigned long jiffies_stall;
 	unsigned long jiffies_start;
 	unsigned long mask;
 	int ndetected;
+	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
-	int ret;
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
 	jiffies_stall = rcu_jiffies_till_stall_check();
 	jiffies_start = jiffies;
+	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
+		if (synchronize_rcu_expedited_wait_once(1))
+			return;
+		rcu_for_each_leaf_node(rnp) {
+			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
+				rdp = per_cpu_ptr(&rcu_data, cpu);
+				if (rdp->rcu_forced_tick_exp)
+					continue;
+				rdp->rcu_forced_tick_exp = true;
+				tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+			}
+		}
+		WARN_ON_ONCE(1);
+	}
 	for (;;) {
-		ret = swait_event_timeout_exclusive(
-				rcu_state.expedited_wq,
-				sync_rcu_preempt_exp_done_unlocked(rnp_root),
-				jiffies_stall);
-		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
+		if (synchronize_rcu_expedited_wait_once(jiffies_stall))
 			return;
-		WARN_ON(ret < 0); /* workqueues should not be signaled. */
 		if (rcu_cpu_stall_suppress)
 			continue;
 		panic_on_rcu_stall();
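synchronize_rcu_expedited_wait_once() factors out the bounded wait so that it can be used twice. On NO_HZ_FULL systems it is first called with a one-jiffy timeout before scheduler ticks are forced; it is then called in a loop with the stall timeout, with a stall warning each time the wait expires. A userspace analogue using a POSIX condition variable is sketched below. `struct gp_waiter`, `wait_once()` and `wait_with_stall_reports()` are hypothetical. Milliseconds stand in for jiffies, and the report cap exists only so that the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct gp_waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;  /* stands in for sync_rcu_exp_done_unlocked(root) */
};

/* Wait up to 'ms' milliseconds; true if done, false on timeout. */
static bool wait_once(struct gp_waiter *w, long ms)
{
	struct timespec ts;
	bool done;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	pthread_mutex_lock(&w->lock);
	while (!w->done) {
		if (pthread_cond_timedwait(&w->cond, &w->lock, &ts) == ETIMEDOUT)
			break;
	}
	done = w->done;  /* recheck after a timeout, as the kernel code does */
	pthread_mutex_unlock(&w->lock);
	return done;
}

/* Wait in 'stall_ms' chunks; each expired chunk counts as a stall report. */
static int wait_with_stall_reports(struct gp_waiter *w, long stall_ms,
				   int max_reports)
{
	int reports = 0;

	while (!wait_once(w, stall_ms)) {
		reports++;  /* the kernel would print a stall warning here */
		if (reports >= max_reports)
			break;
	}
	return reports;
}
```

Rechecking the condition after a timeout matters in both versions: the grace period can complete between the timeout firing and the waiter running again, and that case should count as success rather than as a stall.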
@@ -491,7 +526,7 @@ static void synchronize_sched_expedited_wait(void)
 				struct rcu_data *rdp;
 				mask = leaf_node_cpu_bit(rnp, cpu);
-				if (!(rnp->expmask & mask))
+				if (!(READ_ONCE(rnp->expmask) & mask))
 					continue;
 				ndetected++;
 				rdp = per_cpu_ptr(&rcu_data, cpu);
@@ -503,17 +538,18 @@ static void synchronize_sched_expedited_wait(void)
 		}
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
 			jiffies - jiffies_start, rcu_state.expedited_sequence,
-			rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]);
+			READ_ONCE(rnp_root->expmask),
+			".T"[!!rnp_root->exp_tasks]);
 		if (ndetected) {
 			pr_err("blocking rcu_node structures:");
 			rcu_for_each_node_breadth_first(rnp) {
 				if (rnp == rnp_root)
 					continue; /* printed unconditionally */
-				if (sync_rcu_preempt_exp_done_unlocked(rnp))
+				if (sync_rcu_exp_done_unlocked(rnp))
 					continue;
 				pr_cont(" l=%u:%d-%d:%#lx/%c",
 					rnp->level, rnp->grplo, rnp->grphi,
-					rnp->expmask,
+					READ_ONCE(rnp->expmask),
 					".T"[!!rnp->exp_tasks]);
 			}
 			pr_cont("\n");
@@ -521,7 +557,7 @@ static void synchronize_sched_expedited_wait(void)
 		rcu_for_each_leaf_node(rnp) {
 			for_each_leaf_node_possible_cpu(rnp, cpu) {
 				mask = leaf_node_cpu_bit(rnp, cpu);
-				if (!(rnp->expmask & mask))
+				if (!(READ_ONCE(rnp->expmask) & mask))
 					continue;
 				dump_cpu_task(cpu);
 			}
@@ -540,15 +576,14 @@ static void rcu_exp_wait_wake(unsigned long s)
 {
 	struct rcu_node *rnp;
-	synchronize_sched_expedited_wait();
+	synchronize_rcu_expedited_wait();
-	rcu_exp_gp_seq_end();
-	trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));
-	/*
-	 * Switch over to wakeup mode, allowing the next GP, but -only- the
-	 * next GP, to proceed.
-	 */
+	// Switch over to wakeup mode, allowing the next GP to proceed.
+	// End the previous grace period only after acquiring the mutex
+	// to ensure that only one GP runs concurrently with wakeups.
 	mutex_lock(&rcu_state.exp_wake_mutex);
+	rcu_exp_gp_seq_end();
+	trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));
 	rcu_for_each_node_breadth_first(rnp) {
 		if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
@@ -559,7 +594,7 @@ static void rcu_exp_wait_wake(unsigned long s)
 			spin_unlock(&rnp->exp_lock);
 		}
 		smp_mb(); /* All above changes before wakeup. */
-		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);
+		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
 	}
 	trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
 	mutex_unlock(&rcu_state.exp_wake_mutex);
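The wakeup now selects one of the four per-node wait queues with rcu_seq_ctr(s) & 0x3, instead of deriving the index from the live rcu_state.expedited_sequence. The arithmetic sketch below assumes a two-bit state field in the sequence counter (a layout not shown in this diff). It shows that the two indices can disagree once the shared counter has moved on by a full grace period, which would leave a sleeper on one queue while the wakeup goes to another. `wq_slot()`, `struct wait_slots`, `sleep_on()` and `wake_all()` are hypothetical.

```c
#include <assert.h>

#define SEQ_CTR_SHIFT 2  /* assumption: low two bits hold GP state */
#define N_WAITQ 4        /* four queues per node, hence "& 0x3" */

static unsigned long seq_ctr(unsigned long s)
{
	return s >> SEQ_CTR_SHIFT;
}

/* Queue on which a waiter for sequence value s sleeps. */
static unsigned int wq_slot(unsigned long s)
{
	return seq_ctr(s) & 0x3;
}

/* Hypothetical per-node queues that only count their sleepers. */
struct wait_slots {
	int sleepers[N_WAITQ];
};

static void sleep_on(struct wait_slots *w, unsigned long s)
{
	w->sleepers[wq_slot(s)]++;
}

/* Wake everyone on the queue selected by 'seq'; returns how many woke. */
static int wake_all(struct wait_slots *w, unsigned long seq)
{
	unsigned int i = wq_slot(seq);
	int n = w->sleepers[i];

	w->sleepers[i] = 0;
	return n;
}
```

In this model, a waiter for s == 12 sleeps on queue 3. If the wakeup computed its index from a counter that had already reached 16, it would empty queue 0 and strand that waiter. Using s on both sides keeps the sleeper and the waker on the same queue.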
@@ -610,7 +645,7 @@ static void rcu_exp_handler(void *unused)
 	 * critical section. If also enabled or idle, immediately
 	 * report the quiescent state, otherwise defer.
 	 */
-	if (!t->rcu_read_lock_nesting) {
+	if (!rcu_preempt_depth()) {
 		if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
 		    rcu_dynticks_curr_cpu_in_eqs()) {
 			rcu_report_exp_rdp(rdp);
@@ -634,7 +669,7 @@ static void rcu_exp_handler(void *unused)
 	 * can have caused this quiescent state to already have been
 	 * reported, so we really do need to check ->expmask.
 	 */
-	if (t->rcu_read_lock_nesting > 0) {
+	if (rcu_preempt_depth() > 0) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->expmask & rdp->grpmask) {
 			rdp->exp_deferred_qs = true;
@@ -670,7 +705,7 @@ static void rcu_exp_handler(void *unused)
 	}
 }
-/* PREEMPT=y, so no PREEMPT=n expedited grace period to clean up after. */
+/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
 static void sync_sched_exp_online_cleanup(int cpu)
 {
 }
@@ -785,7 +820,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
  * implementations, it is still unfriendly to real-time workloads, so is
  * thus not recommended for any sort of common-case code. In fact, if
  * you are using synchronize_rcu_expedited() in a loop, please restructure
- * your code to batch your updates, and then Use a single synchronize_rcu()
+ * your code to batch your updates, and then use a single synchronize_rcu()
  * instead.
  *
  * This has the same semantics as (but is more brutal than) synchronize_rcu().
......
@@ -220,7 +220,7 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 	 * blocked tasks.
 	 */
 	if (!rnp->gp_tasks && (blkd_state & RCU_GP_BLKD)) {
-		rnp->gp_tasks = &t->rcu_node_entry;
+		WRITE_ONCE(rnp->gp_tasks, &t->rcu_node_entry);
 		WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq);
 	}
 	if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
@@ -290,8 +290,8 @@ void rcu_note_context_switch(bool preempt)
 	trace_rcu_utilization(TPS("Start context switch"));
 	lockdep_assert_irqs_disabled();
-	WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
-	if (t->rcu_read_lock_nesting > 0 &&
+	WARN_ON_ONCE(!preempt && rcu_preempt_depth() > 0);
+	if (rcu_preempt_depth() > 0 &&
 	    !t->rcu_read_unlock_special.b.blocked) {
 		/* Possibly blocking in an RCU read-side critical section. */
@@ -340,7 +340,7 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  */
 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 {
-	return rnp->gp_tasks != NULL;
+	return READ_ONCE(rnp->gp_tasks) != NULL;
 }
 /* Bias and limit values for ->rcu_read_lock_nesting. */
@@ -348,6 +348,21 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 #define RCU_NEST_NMAX (-INT_MAX / 2)
 #define RCU_NEST_PMAX (INT_MAX / 2)
+static void rcu_preempt_read_enter(void)
+{
+	current->rcu_read_lock_nesting++;
+}
+static void rcu_preempt_read_exit(void)
+{
+	current->rcu_read_lock_nesting--;
+}
+static void rcu_preempt_depth_set(int val)
+{
+	current->rcu_read_lock_nesting = val;
+}
 /*
  * Preemptible RCU implementation for rcu_read_lock().
  * Just increment ->rcu_read_lock_nesting, shared state will be updated
@@ -355,9 +370,9 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
  */
 void __rcu_read_lock(void)
 {
-	current->rcu_read_lock_nesting++;
+	rcu_preempt_read_enter();
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
-		WARN_ON_ONCE(current->rcu_read_lock_nesting > RCU_NEST_PMAX);
+		WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX);
 	barrier(); /* critical section after entry code. */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_lock);
...@@ -373,19 +388,19 @@ void __rcu_read_unlock(void) ...@@ -373,19 +388,19 @@ void __rcu_read_unlock(void)
{ {
struct task_struct *t = current; struct task_struct *t = current;
if (t->rcu_read_lock_nesting != 1) { if (rcu_preempt_depth() != 1) {
--t->rcu_read_lock_nesting; rcu_preempt_read_exit();
} else { } else {
barrier(); /* critical section before exit code. */ barrier(); /* critical section before exit code. */
t->rcu_read_lock_nesting = -RCU_NEST_BIAS; rcu_preempt_depth_set(-RCU_NEST_BIAS);
barrier(); /* assign before ->rcu_read_unlock_special load */ barrier(); /* assign before ->rcu_read_unlock_special load */
if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s))) if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
rcu_read_unlock_special(t); rcu_read_unlock_special(t);
barrier(); /* ->rcu_read_unlock_special load before assign */ barrier(); /* ->rcu_read_unlock_special load before assign */
t->rcu_read_lock_nesting = 0; rcu_preempt_depth_set(0);
} }
if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
int rrln = t->rcu_read_lock_nesting; int rrln = rcu_preempt_depth();
WARN_ON_ONCE(rrln < 0 && rrln > RCU_NEST_NMAX); WARN_ON_ONCE(rrln < 0 && rrln > RCU_NEST_NMAX);
} }
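The reworked __rcu_read_lock() and __rcu_read_unlock() route every access to ->rcu_read_lock_nesting through rcu_preempt_read_enter(), rcu_preempt_read_exit(), rcu_preempt_depth_set() and rcu_preempt_depth(). This presumably makes it easier to change how the counter is stored later. The sketch below models the outermost-unlock protocol in single-threaded user space. While special unlock work runs, the counter is parked at -RCU_NEST_BIAS, and afterwards it is reset to 0. The sketch assumes RCU_NEST_BIAS is INT_MAX, which was its value in tree_plugin.h at the time. The task struct, the `special` flag and all function names are hypothetical. The compiler barriers are omitted because nothing here can interrupt the sequence.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: same value as the kernel's RCU_NEST_BIAS of this period. */
#define NEST_BIAS INT_MAX

/* Hypothetical per-task state. */
struct fake_task {
	int nesting;      /* Stand-in for ->rcu_read_lock_nesting. */
	int special;      /* Stand-in for ->rcu_read_unlock_special.s. */
	int special_runs; /* Times the special-unlock path ran. */
	int depth_seen;   /* Nesting value observed inside that path. */
};

static struct fake_task task0;
static struct fake_task *cur = &task0; /* Stand-in for "current". */

/* Accessors in the spirit of rcu_preempt_read_enter() and friends. */
static void read_enter(void) { cur->nesting++; }
static void read_exit(void) { cur->nesting--; }
static void depth_set(int val) { cur->nesting = val; }
static int depth(void) { return cur->nesting; }

/* Stand-in for rcu_read_unlock_special(). */
static void unlock_special(struct fake_task *t)
{
	t->depth_seen = depth(); /* Negative while the outermost unlock is in progress. */
	t->special = 0;
	t->special_runs++;
}

static void fake_read_lock(void)
{
	read_enter();
}

static void fake_read_unlock(void)
{
	if (depth() != 1) {
		read_exit(); /* Nested unlock: just decrement. */
	} else {
		depth_set(-NEST_BIAS); /* Outermost unlock: park the counter. */
		if (cur->special)
			unlock_special(cur);
		depth_set(0);
	}
	/* Simplified check: with no interrupts, the count is never left negative. */
	assert(depth() >= 0);
}
```

Parking the counter at a large negative value lets code distinguish three states: inside the outermost unlock (negative), inside a reader (positive) and outside any reader (zero).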
...@@ -444,15 +459,9 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) ...@@ -444,15 +459,9 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
local_irq_restore(flags); local_irq_restore(flags);
return; return;
} }
t->rcu_read_unlock_special.b.deferred_qs = false; t->rcu_read_unlock_special.s = 0;
if (special.b.need_qs) { if (special.b.need_qs)
rcu_qs(); rcu_qs();
t->rcu_read_unlock_special.b.need_qs = false;
if (!t->rcu_read_unlock_special.s && !rdp->exp_deferred_qs) {
local_irq_restore(flags);
return;
}
}
/* /*
* Respond to a request by an expedited grace period for a * Respond to a request by an expedited grace period for a
...@@ -460,17 +469,11 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) ...@@ -460,17 +469,11 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
* tasks are handled when removing the task from the * tasks are handled when removing the task from the
* blocked-tasks list below. * blocked-tasks list below.
*/ */
if (rdp->exp_deferred_qs) { if (rdp->exp_deferred_qs)
rcu_report_exp_rdp(rdp); rcu_report_exp_rdp(rdp);
if (!t->rcu_read_unlock_special.s) {
local_irq_restore(flags);
return;
}
}
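rcu_preempt_deferred_qs_irqrestore() now clears the whole ->rcu_read_unlock_special word with a single store to .s and then acts on the local `special` snapshot. Previously it cleared the .b.* flags one at a time and returned early when nothing remained. The sketch below shows the snapshot-then-clear pattern on a union shaped like the kernel's union rcu_special. That shape (four u8 flags overlaying a u32) is an assumption based on the .b/.s usage in this diff. The counters standing in for rcu_qs() and the blocked-task cleanup are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: four byte-sized flags overlaid on one 32-bit word. */
union fake_special {
	struct {
		uint8_t blocked;
		uint8_t need_qs;
		uint8_t exp_hint;
		uint8_t deferred_qs;
	} b;
	uint32_t s;
};

/* Hypothetical counters standing in for the real side effects. */
struct fake_counts {
	int qs_reports; /* Stand-in for rcu_qs(). */
	int unblocks;   /* Stand-in for removing the task from ->blkd_tasks. */
};

/* Snapshot the flags, clear them all with one store, then act on the snapshot. */
static void handle_special(union fake_special *sp, struct fake_counts *c)
{
	union fake_special snap = *sp;

	sp->s = 0; /* Like t->rcu_read_unlock_special.s = 0. */
	if (snap.b.need_qs)
		c->qs_reports++;
	if (snap.b.blocked)
		c->unblocks++;
}
```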
/* Clean up if blocked during RCU read-side critical section. */ /* Clean up if blocked during RCU read-side critical section. */
if (special.b.blocked) { if (special.b.blocked) {
t->rcu_read_unlock_special.b.blocked = false;
/* /*
* Remove this task from the list it blocked on. The task * Remove this task from the list it blocked on. The task
...@@ -485,7 +488,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) ...@@ -485,7 +488,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
empty_norm = !rcu_preempt_blocked_readers_cgp(rnp); empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq && WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
(!empty_norm || rnp->qsmask)); (!empty_norm || rnp->qsmask));
empty_exp = sync_rcu_preempt_exp_done(rnp); empty_exp = sync_rcu_exp_done(rnp);
smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */ smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
np = rcu_next_node_entry(t, rnp); np = rcu_next_node_entry(t, rnp);
list_del_init(&t->rcu_node_entry); list_del_init(&t->rcu_node_entry);
...@@ -493,7 +496,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) ...@@ -493,7 +496,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
trace_rcu_unlock_preempted_task(TPS("rcu_preempt"), trace_rcu_unlock_preempted_task(TPS("rcu_preempt"),
rnp->gp_seq, t->pid); rnp->gp_seq, t->pid);
if (&t->rcu_node_entry == rnp->gp_tasks) if (&t->rcu_node_entry == rnp->gp_tasks)
rnp->gp_tasks = np; WRITE_ONCE(rnp->gp_tasks, np);
if (&t->rcu_node_entry == rnp->exp_tasks) if (&t->rcu_node_entry == rnp->exp_tasks)
rnp->exp_tasks = np; rnp->exp_tasks = np;
if (IS_ENABLED(CONFIG_RCU_BOOST)) { if (IS_ENABLED(CONFIG_RCU_BOOST)) {
...@@ -509,7 +512,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) ...@@ -509,7 +512,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
* Note that rcu_report_unblock_qs_rnp() releases rnp->lock, * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
* so we must take a snapshot of the expedited state. * so we must take a snapshot of the expedited state.
*/ */
empty_exp_now = sync_rcu_preempt_exp_done(rnp); empty_exp_now = sync_rcu_exp_done(rnp);
if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) { if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
trace_rcu_quiescent_state_report(TPS("preempt_rcu"), trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
rnp->gp_seq, rnp->gp_seq,
...@@ -551,7 +554,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t) ...@@ -551,7 +554,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{ {
return (__this_cpu_read(rcu_data.exp_deferred_qs) || return (__this_cpu_read(rcu_data.exp_deferred_qs) ||
READ_ONCE(t->rcu_read_unlock_special.s)) && READ_ONCE(t->rcu_read_unlock_special.s)) &&
t->rcu_read_lock_nesting <= 0; rcu_preempt_depth() <= 0;
} }
/* /*
...@@ -564,16 +567,16 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t) ...@@ -564,16 +567,16 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
static void rcu_preempt_deferred_qs(struct task_struct *t) static void rcu_preempt_deferred_qs(struct task_struct *t)
{ {
unsigned long flags; unsigned long flags;
bool couldrecurse = t->rcu_read_lock_nesting >= 0; bool couldrecurse = rcu_preempt_depth() >= 0;
if (!rcu_preempt_need_deferred_qs(t)) if (!rcu_preempt_need_deferred_qs(t))
return; return;
if (couldrecurse) if (couldrecurse)
t->rcu_read_lock_nesting -= RCU_NEST_BIAS; rcu_preempt_depth_set(rcu_preempt_depth() - RCU_NEST_BIAS);
local_irq_save(flags); local_irq_save(flags);
rcu_preempt_deferred_qs_irqrestore(t, flags); rcu_preempt_deferred_qs_irqrestore(t, flags);
if (couldrecurse) if (couldrecurse)
t->rcu_read_lock_nesting += RCU_NEST_BIAS; rcu_preempt_depth_set(rcu_preempt_depth() + RCU_NEST_BIAS);
} }
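When the nesting count is non-negative, rcu_preempt_deferred_qs() subtracts RCU_NEST_BIAS before reporting a deferred quiescent state. Any RCU reader entered and exited inside the reporting code then sees a negative depth and does not re-enter the special-unlock path. The single-threaded sketch below reproduces that logic with hypothetical names and assumes RCU_NEST_BIAS equals INT_MAX. Interrupts, per-CPU data and the irqrestore handling are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NEST_BIAS INT_MAX /* Assumption: same value as the kernel's RCU_NEST_BIAS. */

static int nesting;     /* Stand-in for current->rcu_read_lock_nesting. */
static bool qs_pending; /* Stand-in for a pending deferred quiescent state. */
static int handler_runs;

static void deferred_qs(void);

static void read_lock(void) { nesting++; }

static void read_unlock(void)
{
	if (nesting != 1) {
		nesting--;
		return;
	}
	nesting = -NEST_BIAS; /* Negative while the deferred work runs. */
	if (qs_pending)
		deferred_qs();
	nesting = 0;
}

/* Analogue of rcu_preempt_need_deferred_qs(). */
static bool need_deferred_qs(void)
{
	return qs_pending && nesting <= 0;
}

/* Stand-in for the work done by rcu_preempt_deferred_qs_irqrestore(). */
static void report_qs(void)
{
	handler_runs++;
	read_lock();   /* A reader nested inside the handler... */
	read_unlock(); /* ...must not re-enter deferred_qs(). */
	qs_pending = false;
}

/* Analogue of rcu_preempt_deferred_qs(). */
static void deferred_qs(void)
{
	bool couldrecurse = nesting >= 0;

	if (!need_deferred_qs())
		return;
	if (couldrecurse)
		nesting -= NEST_BIAS;
	report_qs();
	if (couldrecurse)
		nesting += NEST_BIAS;
}
```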
/* /*
...@@ -610,9 +613,8 @@ static void rcu_read_unlock_special(struct task_struct *t) ...@@ -610,9 +613,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
struct rcu_data *rdp = this_cpu_ptr(&rcu_data); struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode; struct rcu_node *rnp = rdp->mynode;
t->rcu_read_unlock_special.b.exp_hint = false;
exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) || exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
(rdp->grpmask & rnp->expmask) || (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
tick_nohz_full_cpu(rdp->cpu); tick_nohz_full_cpu(rdp->cpu);
// Need to defer quiescent state until everything is enabled. // Need to defer quiescent state until everything is enabled.
if (irqs_were_disabled && use_softirq && if (irqs_were_disabled && use_softirq &&
...@@ -640,7 +642,6 @@ static void rcu_read_unlock_special(struct task_struct *t) ...@@ -640,7 +642,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
local_irq_restore(flags); local_irq_restore(flags);
return; return;
} }
WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
rcu_preempt_deferred_qs_irqrestore(t, flags); rcu_preempt_deferred_qs_irqrestore(t, flags);
} }
...@@ -648,8 +649,7 @@ static void rcu_read_unlock_special(struct task_struct *t) ...@@ -648,8 +649,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
* Check that the list of blocked tasks for the newly completed grace * Check that the list of blocked tasks for the newly completed grace
* period is in fact empty. It is a serious bug to complete a grace * period is in fact empty. It is a serious bug to complete a grace
* period that still has RCU readers blocked! This function must be * period that still has RCU readers blocked! This function must be
* invoked -before- updating this rnp's ->gp_seq, and the rnp's ->lock * invoked -before- updating this rnp's ->gp_seq.
* must be held by the caller.
* *
* Also, if there are blocked tasks on the list, they automatically * Also, if there are blocked tasks on the list, they automatically
* block the newly created grace period, so set up ->gp_tasks accordingly. * block the newly created grace period, so set up ->gp_tasks accordingly.
...@@ -659,11 +659,12 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp) ...@@ -659,11 +659,12 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
struct task_struct *t; struct task_struct *t;
RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n"); RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
raw_lockdep_assert_held_rcu_node(rnp);
if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp))) if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
dump_blkd_tasks(rnp, 10); dump_blkd_tasks(rnp, 10);
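In rcu_preempt_check_blocked_tasks(), the comment saying the caller must hold rnp->lock has been replaced by raw_lockdep_assert_held_rcu_node(rnp). When lockdep is enabled, the documented rule becomes a runtime check. Lockdep tracks held locks per task. The sketch below only approximates the idea: it uses a hypothetical lock type that records a single `held` flag and an assertion macro built on <assert.h>, so it cannot tell which thread holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that remembers whether it is held. */
struct tracked_lock {
	bool held;
};

static void tracked_lock_acquire(struct tracked_lock *l)
{
	assert(!l->held); /* Not recursive. */
	l->held = true;
}

static void tracked_lock_release(struct tracked_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Rough analogue of raw_lockdep_assert_held_rcu_node(): check, don't just document. */
#define assert_held(l) assert((l)->held)

static int checks_done;

/* Hypothetical function with the same "caller holds the lock" contract. */
static void check_blocked_tasks(struct tracked_lock *l)
{
	assert_held(l);
	checks_done++;
}
```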
if (rcu_preempt_has_tasks(rnp) && if (rcu_preempt_has_tasks(rnp) &&
(rnp->qsmaskinit || rnp->wait_blkd_tasks)) { (rnp->qsmaskinit || rnp->wait_blkd_tasks)) {
rnp->gp_tasks = rnp->blkd_tasks.next; WRITE_ONCE(rnp->gp_tasks, rnp->blkd_tasks.next);
t = container_of(rnp->gp_tasks, struct task_struct, t = container_of(rnp->gp_tasks, struct task_struct,
rcu_node_entry); rcu_node_entry);
trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"), trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
...@@ -686,7 +687,7 @@ static void rcu_flavor_sched_clock_irq(int user) ...@@ -686,7 +687,7 @@ static void rcu_flavor_sched_clock_irq(int user)
if (user || rcu_is_cpu_rrupt_from_idle()) { if (user || rcu_is_cpu_rrupt_from_idle()) {
rcu_note_voluntary_context_switch(current); rcu_note_voluntary_context_switch(current);
} }
if (t->rcu_read_lock_nesting > 0 || if (rcu_preempt_depth() > 0 ||
(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) { (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
/* No QS, force context switch if deferred. */ /* No QS, force context switch if deferred. */
if (rcu_preempt_need_deferred_qs(t)) { if (rcu_preempt_need_deferred_qs(t)) {
...@@ -696,13 +697,13 @@ static void rcu_flavor_sched_clock_irq(int user) ...@@ -696,13 +697,13 @@ static void rcu_flavor_sched_clock_irq(int user)
} else if (rcu_preempt_need_deferred_qs(t)) { } else if (rcu_preempt_need_deferred_qs(t)) {
rcu_preempt_deferred_qs(t); /* Report deferred QS. */ rcu_preempt_deferred_qs(t); /* Report deferred QS. */
return; return;
} else if (!t->rcu_read_lock_nesting) { } else if (!rcu_preempt_depth()) {
rcu_qs(); /* Report immediate QS. */ rcu_qs(); /* Report immediate QS. */
return; return;
} }
/* If GP is oldish, ask for help from rcu_read_unlock_special(). */ /* If GP is oldish, ask for help from rcu_read_unlock_special(). */
if (t->rcu_read_lock_nesting > 0 && if (rcu_preempt_depth() > 0 &&
__this_cpu_read(rcu_data.core_needs_qs) && __this_cpu_read(rcu_data.core_needs_qs) &&
__this_cpu_read(rcu_data.cpu_no_qs.b.norm) && __this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
!t->rcu_read_unlock_special.b.need_qs && !t->rcu_read_unlock_special.b.need_qs &&
...@@ -723,11 +724,11 @@ void exit_rcu(void) ...@@ -723,11 +724,11 @@ void exit_rcu(void)
struct task_struct *t = current; struct task_struct *t = current;
if (unlikely(!list_empty(&current->rcu_node_entry))) { if (unlikely(!list_empty(&current->rcu_node_entry))) {
t->rcu_read_lock_nesting = 1; rcu_preempt_depth_set(1);
barrier(); barrier();
WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true); WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true);
} else if (unlikely(t->rcu_read_lock_nesting)) { } else if (unlikely(rcu_preempt_depth())) {
t->rcu_read_lock_nesting = 1; rcu_preempt_depth_set(1);
} else { } else {
return; return;
} }
...@@ -757,7 +758,8 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck) ...@@ -757,7 +758,8 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n", pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
__func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext); __func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
pr_info("%s: ->gp_tasks %p ->boost_tasks %p ->exp_tasks %p\n", pr_info("%s: ->gp_tasks %p ->boost_tasks %p ->exp_tasks %p\n",
__func__, rnp->gp_tasks, rnp->boost_tasks, rnp->exp_tasks); __func__, READ_ONCE(rnp->gp_tasks), rnp->boost_tasks,
rnp->exp_tasks);
pr_info("%s: ->blkd_tasks", __func__); pr_info("%s: ->blkd_tasks", __func__);
i = 0; i = 0;
list_for_each(lhp, &rnp->blkd_tasks) { list_for_each(lhp, &rnp->blkd_tasks) {
...@@ -788,7 +790,7 @@ static void __init rcu_bootup_announce(void) ...@@ -788,7 +790,7 @@ static void __init rcu_bootup_announce(void)
} }
/* /*
* Note a quiescent state for PREEMPT=n. Because we do not need to know * Note a quiescent state for PREEMPTION=n. Because we do not need to know
* how many quiescent states passed, just if there was at least one since * how many quiescent states passed, just if there was at least one since
* the start of the grace period, this just sets a flag. The caller must * the start of the grace period, this just sets a flag. The caller must
* have disabled preemption. * have disabled preemption.
...@@ -838,7 +840,7 @@ void rcu_all_qs(void) ...@@ -838,7 +840,7 @@ void rcu_all_qs(void)
EXPORT_SYMBOL_GPL(rcu_all_qs); EXPORT_SYMBOL_GPL(rcu_all_qs);
/* /*
* Note a PREEMPT=n context switch. The caller must have disabled interrupts. * Note a PREEMPTION=n context switch. The caller must have disabled interrupts.
*/ */
void rcu_note_context_switch(bool preempt) void rcu_note_context_switch(bool preempt)
{ {
...@@ -1262,10 +1264,9 @@ static void rcu_prepare_for_idle(void) ...@@ -1262,10 +1264,9 @@ static void rcu_prepare_for_idle(void)
/* /*
* This code is invoked when a CPU goes idle, at which point we want * This code is invoked when a CPU goes idle, at which point we want
* to have the CPU do everything required for RCU so that it can enter * to have the CPU do everything required for RCU so that it can enter
* the energy-efficient dyntick-idle mode. This is handled by a * the energy-efficient dyntick-idle mode.
* state machine implemented by rcu_prepare_for_idle() below.
* *
* The following three proprocessor symbols control this state machine: * The following preprocessor symbol controls this:
* *
* RCU_IDLE_GP_DELAY gives the number of jiffies that a CPU is permitted * RCU_IDLE_GP_DELAY gives the number of jiffies that a CPU is permitted
* to sleep in dyntick-idle mode with RCU callbacks pending. This * to sleep in dyntick-idle mode with RCU callbacks pending. This
...@@ -1274,21 +1275,15 @@ static void rcu_prepare_for_idle(void) ...@@ -1274,21 +1275,15 @@ static void rcu_prepare_for_idle(void)
* number, be warned: Setting RCU_IDLE_GP_DELAY too high can hang your * number, be warned: Setting RCU_IDLE_GP_DELAY too high can hang your
* system. And if you are -that- concerned about energy efficiency, * system. And if you are -that- concerned about energy efficiency,
* just power the system down and be done with it! * just power the system down and be done with it!
* RCU_IDLE_LAZY_GP_DELAY gives the number of jiffies that a CPU is
* permitted to sleep in dyntick-idle mode with only lazy RCU
* callbacks pending. Setting this too high can OOM your system.
* *
* The values below work well in practice. If future workloads require * The value below works well in practice. If future workloads require
* adjustment, they can be converted into kernel config parameters, though * adjustment, they can be converted into kernel config parameters, though
* making the state machine smarter might be a better option. * making the state machine smarter might be a better option.
*/ */
#define RCU_IDLE_GP_DELAY 4 /* Roughly one grace period. */ #define RCU_IDLE_GP_DELAY 4 /* Roughly one grace period. */
#define RCU_IDLE_LAZY_GP_DELAY (6 * HZ) /* Roughly six seconds. */
static int rcu_idle_gp_delay = RCU_IDLE_GP_DELAY; static int rcu_idle_gp_delay = RCU_IDLE_GP_DELAY;
module_param(rcu_idle_gp_delay, int, 0644); module_param(rcu_idle_gp_delay, int, 0644);
static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
module_param(rcu_idle_lazy_gp_delay, int, 0644);
/* /*
* Try to advance callbacks on the current CPU, but only if it has been * Try to advance callbacks on the current CPU, but only if it has been
...@@ -1327,8 +1322,7 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void) ...@@ -1327,8 +1322,7 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void)
/* /*
* Allow the CPU to enter dyntick-idle mode unless it has callbacks ready * Allow the CPU to enter dyntick-idle mode unless it has callbacks ready
* to invoke. If the CPU has callbacks, try to advance them. Tell the * to invoke. If the CPU has callbacks, try to advance them. Tell the
* caller to set the timeout based on whether or not there are non-lazy * caller about what to set the timeout.
* callbacks.
* *
* The caller must have disabled interrupts. * The caller must have disabled interrupts.
*/ */
...@@ -1354,25 +1348,18 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt) ...@@ -1354,25 +1348,18 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt)
} }
rdp->last_accelerate = jiffies; rdp->last_accelerate = jiffies;
/* Request timer delay depending on laziness, and round. */ /* Request timer and round. */
rdp->all_lazy = !rcu_segcblist_n_nonlazy_cbs(&rdp->cblist); dj = round_up(rcu_idle_gp_delay + jiffies, rcu_idle_gp_delay) - jiffies;
if (rdp->all_lazy) {
dj = round_jiffies(rcu_idle_lazy_gp_delay + jiffies) - jiffies;
} else {
dj = round_up(rcu_idle_gp_delay + jiffies,
rcu_idle_gp_delay) - jiffies;
}
*nextevt = basemono + dj * TICK_NSEC; *nextevt = basemono + dj * TICK_NSEC;
return 0; return 0;
} }
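With lazy callbacks gone, rcu_needs_cpu() always requests a wakeup rcu_idle_gp_delay jiffies out, rounded up to a multiple of that delay. It then converts the result to nanoseconds with TICK_NSEC. The sketch below works through that arithmetic. It uses a division-based round-up because the kernel's round_up() is meant for power-of-two multiples (the default delay of 4 qualifies). `tick_nsec` is a parameter rather than the kernel constant, and jiffies wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of y; unlike the kernel's round_up(), y need not be a power of two. */
static unsigned long round_up_ul(unsigned long x, unsigned long y)
{
	return ((x + y - 1) / y) * y;
}

/* Jiffies until the requested wakeup, following the new rcu_needs_cpu() formula. */
static unsigned long idle_delay_jiffies(unsigned long now, unsigned long gp_delay)
{
	return round_up_ul(gp_delay + now, gp_delay) - now;
}

/* Absolute wakeup time in nanoseconds; tick_nsec is an assumed input. */
static uint64_t next_event_ns(uint64_t basemono, unsigned long dj, uint64_t tick_nsec)
{
	return basemono + (uint64_t)dj * tick_nsec;
}
```

With the default delay of 4, a CPU going idle at jiffy 1001 gets dj = 7 and wakes at jiffy 1008. The requested sleep therefore always falls between 4 and 7 jiffies and ends on a multiple of 4.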
/* /*
* Prepare a CPU for idle from an RCU perspective. The first major task * Prepare a CPU for idle from an RCU perspective. The first major task is to
* is to sense whether nohz mode has been enabled or disabled via sysfs. * sense whether nohz mode has been enabled or disabled via sysfs. The second
* The second major task is to check to see if a non-lazy callback has * major task is to accelerate (that is, assign grace-period numbers to) any
* arrived at a CPU that previously had only lazy callbacks. The third * recently arrived callbacks.
* major task is to accelerate (that is, assign grace-period numbers to)
* any recently arrived callbacks.
* *
* The caller must have disabled interrupts. * The caller must have disabled interrupts.
*/ */
...@@ -1398,17 +1385,6 @@ static void rcu_prepare_for_idle(void) ...@@ -1398,17 +1385,6 @@ static void rcu_prepare_for_idle(void)
if (!tne) if (!tne)
return; return;
/*
* If a non-lazy callback arrived at a CPU having only lazy
* callbacks, invoke RCU core for the side-effect of recalculating
* idle duration on re-entry to idle.
*/
if (rdp->all_lazy && rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)) {
rdp->all_lazy = false;
invoke_rcu_core();
return;
}
/* /*
* If we have not yet accelerated this jiffy, accelerate all * If we have not yet accelerated this jiffy, accelerate all
* callbacks on this CPU. * callbacks on this CPU.
...@@ -2321,6 +2297,8 @@ static void __init rcu_organize_nocb_kthreads(void) ...@@ -2321,6 +2297,8 @@ static void __init rcu_organize_nocb_kthreads(void)
{ {
int cpu; int cpu;
bool firsttime = true; bool firsttime = true;
bool gotnocbs = false;
bool gotnocbscbs = true;
int ls = rcu_nocb_gp_stride; int ls = rcu_nocb_gp_stride;
int nl = 0; /* Next GP kthread. */ int nl = 0; /* Next GP kthread. */
struct rcu_data *rdp; struct rcu_data *rdp;
...@@ -2343,21 +2321,31 @@ static void __init rcu_organize_nocb_kthreads(void) ...@@ -2343,21 +2321,31 @@ static void __init rcu_organize_nocb_kthreads(void)
rdp = per_cpu_ptr(&rcu_data, cpu); rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->cpu >= nl) { if (rdp->cpu >= nl) {
/* New GP kthread, set up for CBs & next GP. */ /* New GP kthread, set up for CBs & next GP. */
gotnocbs = true;
nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls; nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
rdp->nocb_gp_rdp = rdp; rdp->nocb_gp_rdp = rdp;
rdp_gp = rdp; rdp_gp = rdp;
if (!firsttime && dump_tree) if (dump_tree) {
pr_cont("\n"); if (!firsttime)
firsttime = false; pr_cont("%s\n", gotnocbscbs
pr_alert("%s: No-CB GP kthread CPU %d:", __func__, cpu); ? "" : " (self only)");
gotnocbscbs = false;
firsttime = false;
pr_alert("%s: No-CB GP kthread CPU %d:",
__func__, cpu);
}
} else { } else {
/* Another CB kthread, link to previous GP kthread. */ /* Another CB kthread, link to previous GP kthread. */
gotnocbscbs = true;
rdp->nocb_gp_rdp = rdp_gp; rdp->nocb_gp_rdp = rdp_gp;
rdp_prev->nocb_next_cb_rdp = rdp; rdp_prev->nocb_next_cb_rdp = rdp;
pr_alert(" %d", cpu); if (dump_tree)
pr_cont(" %d", cpu);
} }
rdp_prev = rdp; rdp_prev = rdp;
} }
if (gotnocbs && dump_tree)
pr_cont("%s\n", gotnocbscbs ? "" : " (self only)");
} }
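rcu_organize_nocb_kthreads() walks the no-CBs CPUs in ascending order. A CPU at or beyond `nl` starts a new GP-kthread group and advances `nl` to the next multiple of the stride `ls`. Every other CPU joins the current group. The new `gotnocbscbs` flag lets the dump mark groups that contain only their GP-kthread CPU as "(self only)". The sketch below replays the grouping over a plain array of CPU numbers. The function name and the self-only counter are hypothetical, and nothing is printed.

```c
#include <assert.h>
#include <stdbool.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * For each CPU in cpus[] (ascending), store the CPU whose GP kthread it
 * is attached to. Returns how many groups contain only their leader.
 */
static int assign_gp_leaders(const int *cpus, int n, int ls, int *leader)
{
	int nl = 0;                /* First CPU that starts a new group. */
	int cur = -1;              /* Current GP-kthread CPU. */
	bool has_followers = true;
	int self_only = 0;

	for (int i = 0; i < n; i++) {
		if (cpus[i] >= nl) {
			/* Close out the previous group before starting a new one. */
			if (cur >= 0 && !has_followers)
				self_only++;
			cur = cpus[i];
			has_followers = false;
			nl = DIV_ROUND_UP(cpus[i] + 1, ls) * ls;
		} else {
			has_followers = true;
		}
		leader[i] = cur;
	}
	if (cur >= 0 && !has_followers)
		self_only++;
	return self_only;
}
```

For CPUs {0, 1, 2, 5, 9} with a stride of 3, CPU 0 leads CPUs 1 and 2, while CPUs 5 and 9 each lead a group of one. The patched dump would report those two groups as "(self only)".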
/* /*
......
...@@ -163,7 +163,7 @@ static void rcu_iw_handler(struct irq_work *iwp) ...@@ -163,7 +163,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
// //
// Printing RCU CPU stall warnings // Printing RCU CPU stall warnings
#ifdef CONFIG_PREEMPTION #ifdef CONFIG_PREEMPT_RCU
/* /*
* Dump detailed information for all tasks blocking the current RCU * Dump detailed information for all tasks blocking the current RCU
...@@ -215,7 +215,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp) ...@@ -215,7 +215,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
return ndetected; return ndetected;
} }
#else /* #ifdef CONFIG_PREEMPTION */ #else /* #ifdef CONFIG_PREEMPT_RCU */
/* /*
* Because preemptible RCU does not exist, we never have to check for * Because preemptible RCU does not exist, we never have to check for
...@@ -233,7 +233,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp) ...@@ -233,7 +233,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
{ {
return 0; return 0;
} }
#endif /* #else #ifdef CONFIG_PREEMPTION */ #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
/* /*
* Dump stacks of all tasks running on stalled CPUs. First try using * Dump stacks of all tasks running on stalled CPUs. First try using
...@@ -263,11 +263,9 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu) ...@@ -263,11 +263,9 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
{ {
struct rcu_data *rdp = &per_cpu(rcu_data, cpu); struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
sprintf(cp, "last_accelerate: %04lx/%04lx, Nonlazy posted: %c%c%c", sprintf(cp, "last_accelerate: %04lx/%04lx dyntick_enabled: %d",
rdp->last_accelerate & 0xffff, jiffies & 0xffff, rdp->last_accelerate & 0xffff, jiffies & 0xffff,
".l"[rdp->all_lazy], !!rdp->tick_nohz_enabled_snap);
".L"[!rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)],
".D"[!!rdp->tick_nohz_enabled_snap]);
} }
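The lazy and non-lazy indicators no longer exist, so print_cpu_stall_fast_no_hz() now reports only three things: the low 16 bits of ->last_accelerate, the low 16 bits of jiffies, and whether the tick was enabled at the last snapshot. The sketch below reproduces that format string with snprintf(). The parameter names are hypothetical stand-ins for the rcu_data fields, and the kernel itself uses sprintf() into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mimics the new print_cpu_stall_fast_no_hz() output format. */
static int format_fast_no_hz(char *buf, size_t len, unsigned long last_accelerate,
			     unsigned long jiffies_now, int tick_nohz_enabled_snap)
{
	return snprintf(buf, len, "last_accelerate: %04lx/%04lx dyntick_enabled: %d",
			last_accelerate & 0xffff, jiffies_now & 0xffff,
			!!tick_nohz_enabled_snap); /* !! folds any non-zero value to 1. */
}
```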
#else /* #ifdef CONFIG_RCU_FAST_NO_HZ */ #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
...@@ -279,6 +277,28 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu) ...@@ -279,6 +277,28 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
#endif /* #else #ifdef CONFIG_RCU_FAST_NO_HZ */ #endif /* #else #ifdef CONFIG_RCU_FAST_NO_HZ */
static const char * const gp_state_names[] = {
[RCU_GP_IDLE] = "RCU_GP_IDLE",
[RCU_GP_WAIT_GPS] = "RCU_GP_WAIT_GPS",
[RCU_GP_DONE_GPS] = "RCU_GP_DONE_GPS",
[RCU_GP_ONOFF] = "RCU_GP_ONOFF",
[RCU_GP_INIT] = "RCU_GP_INIT",
[RCU_GP_WAIT_FQS] = "RCU_GP_WAIT_FQS",
[RCU_GP_DOING_FQS] = "RCU_GP_DOING_FQS",
[RCU_GP_CLEANUP] = "RCU_GP_CLEANUP",
[RCU_GP_CLEANED] = "RCU_GP_CLEANED",
};
/*
* Convert a ->gp_state value to a character string.
*/
static const char *gp_state_getname(short gs)
{
if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
return "???";
return gp_state_names[gs];
}
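gp_state_getname(), now moved into tree_stall.h, maps ->gp_state to a string using an array built from designated initializers. Anything out of range returns "???". The sketch below shows the same pattern on a hypothetical three-state enum. ARRAY_SIZE() is redefined locally because the kernel headers are not available in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical cut-down state enum. */
enum fake_gp_state {
	FAKE_GP_IDLE,
	FAKE_GP_INIT,
	FAKE_GP_CLEANUP,
};

static const char * const fake_gp_state_names[] = {
	[FAKE_GP_IDLE] = "FAKE_GP_IDLE",
	[FAKE_GP_INIT] = "FAKE_GP_INIT",
	[FAKE_GP_CLEANUP] = "FAKE_GP_CLEANUP",
};

/* Bounds-check before indexing so a bogus state prints "???" instead of faulting. */
static const char *fake_gp_state_getname(short gs)
{
	if (gs < 0 || (size_t)gs >= ARRAY_SIZE(fake_gp_state_names))
		return "???";
	return fake_gp_state_names[gs];
}
```

Designated initializers keep each string next to the constant it names, so reordering the enum cannot silently mislabel a state. A gap in the enum would leave a NULL entry, which a hardened version might also map to "???".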
/* /*
* Print out diagnostic information for the specified stalled CPU. * Print out diagnostic information for the specified stalled CPU.
* *
......
...@@ -40,6 +40,7 @@ ...@@ -40,6 +40,7 @@
#include <linux/rcupdate_wait.h> #include <linux/rcupdate_wait.h>
#include <linux/sched/isolation.h> #include <linux/sched/isolation.h>
#include <linux/kprobes.h> #include <linux/kprobes.h>
#include <linux/slab.h>
#define CREATE_TRACE_POINTS #define CREATE_TRACE_POINTS
...@@ -51,9 +52,7 @@ ...@@ -51,9 +52,7 @@
#define MODULE_PARAM_PREFIX "rcupdate." #define MODULE_PARAM_PREFIX "rcupdate."
#ifndef CONFIG_TINY_RCU #ifndef CONFIG_TINY_RCU
extern int rcu_expedited; /* from sysctl */
module_param(rcu_expedited, int, 0); module_param(rcu_expedited, int, 0);
extern int rcu_normal; /* from sysctl */
module_param(rcu_normal, int, 0); module_param(rcu_normal, int, 0);
static int rcu_normal_after_boot; static int rcu_normal_after_boot;
module_param(rcu_normal_after_boot, int, 0); module_param(rcu_normal_after_boot, int, 0);
...@@ -218,6 +217,7 @@ static int __init rcu_set_runtime_mode(void) ...@@ -218,6 +217,7 @@ static int __init rcu_set_runtime_mode(void)
{ {
rcu_test_sync_prims(); rcu_test_sync_prims();
rcu_scheduler_active = RCU_SCHEDULER_RUNNING; rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
kfree_rcu_scheduler_running();
rcu_test_sync_prims(); rcu_test_sync_prims();
return 0; return 0;
} }
...@@ -435,7 +435,7 @@ struct debug_obj_descr rcuhead_debug_descr = { ...@@ -435,7 +435,7 @@ struct debug_obj_descr rcuhead_debug_descr = {
EXPORT_SYMBOL_GPL(rcuhead_debug_descr); EXPORT_SYMBOL_GPL(rcuhead_debug_descr);
#endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */ #endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) || defined(CONFIG_RCU_TRACE) #if defined(CONFIG_TREE_RCU) || defined(CONFIG_RCU_TRACE)
void do_trace_rcu_torture_read(const char *rcutorturename, struct rcu_head *rhp, void do_trace_rcu_torture_read(const char *rcutorturename, struct rcu_head *rhp,
unsigned long secs, unsigned long secs,
unsigned long c_old, unsigned long c) unsigned long c_old, unsigned long c)
...@@ -853,14 +853,22 @@ static void test_callback(struct rcu_head *r) ...@@ -853,14 +853,22 @@ static void test_callback(struct rcu_head *r)
DEFINE_STATIC_SRCU(early_srcu); DEFINE_STATIC_SRCU(early_srcu);
struct early_boot_kfree_rcu {
struct rcu_head rh;
};
static void early_boot_test_call_rcu(void) static void early_boot_test_call_rcu(void)
{ {
static struct rcu_head head; static struct rcu_head head;
static struct rcu_head shead; static struct rcu_head shead;
struct early_boot_kfree_rcu *rhp;
call_rcu(&head, test_callback); call_rcu(&head, test_callback);
if (IS_ENABLED(CONFIG_SRCU)) if (IS_ENABLED(CONFIG_SRCU))
call_srcu(&early_srcu, &shead, test_callback); call_srcu(&early_srcu, &shead, test_callback);
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
if (!WARN_ON_ONCE(!rhp))
kfree_rcu(rhp, rh);
} }
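The new early-boot test allocates a small structure with an embedded rcu_head and passes it to kfree_rcu(rhp, rh). It skips the call if the allocation failed, because WARN_ON_ONCE() returns its condition. kfree_rcu() needs only the object and the name of its rcu_head member: once the grace period ends, the member's offsetof() is enough to recover the start of the object. The sketch below imitates that in user space. The queue, the "grace period" and every name are hypothetical, and no real RCU is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	size_t offset; /* Offset of this head inside the enclosing object. */
};

static struct fake_rcu_head *pending; /* Objects waiting for a "grace period". */
static int freed;

/* Queue the enclosing object, remembering where the head sits inside it. */
static void fake_kfree_rcu_offset(struct fake_rcu_head *head, size_t offset)
{
	head->offset = offset;
	head->next = pending;
	pending = head;
}

/* Like kfree_rcu(ptr, field): derive the head and its offset from the member name. */
#define fake_kfree_rcu(ptr, field) \
	fake_kfree_rcu_offset(&(ptr)->field, offsetof(__typeof__(*(ptr)), field))

/* Pretend a grace period has elapsed: free every queued object. */
static void fake_grace_period_end(void)
{
	while (pending) {
		struct fake_rcu_head *head = pending;

		pending = head->next;
		free((char *)head - head->offset); /* Back up to the object's start. */
		freed++;
	}
}

/* Hypothetical object with the head not at offset zero. */
struct early_boot_obj {
	int payload;
	struct fake_rcu_head rh;
};
```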
void rcu_early_boot_tests(void) void rcu_early_boot_tests(void)
......
...@@ -1268,7 +1268,7 @@ static struct ctl_table kern_table[] = { ...@@ -1268,7 +1268,7 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_do_static_key, .proc_handler = proc_do_static_key,
}, },
#endif #endif
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) #if defined(CONFIG_TREE_RCU)
{ {
.procname = "panic_on_rcu_stall", .procname = "panic_on_rcu_stall",
.data = &sysctl_panic_on_rcu_stall, .data = &sysctl_panic_on_rcu_stall,
......
...@@ -257,9 +257,6 @@ static char *tipc_key_change_dump(struct tipc_key old, struct tipc_key new, ...@@ -257,9 +257,6 @@ static char *tipc_key_change_dump(struct tipc_key old, struct tipc_key new,
#define tipc_aead_rcu_ptr(rcu_ptr, lock) \ #define tipc_aead_rcu_ptr(rcu_ptr, lock) \
rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock)) rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock))
#define tipc_aead_rcu_swap(rcu_ptr, ptr, lock) \
rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock))
#define tipc_aead_rcu_replace(rcu_ptr, ptr, lock) \ #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock) \
do { \ do { \
typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr), \ typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr), \
...@@ -1189,7 +1186,7 @@ static bool tipc_crypto_key_try_align(struct tipc_crypto *rx, u8 new_pending) ...@@ -1189,7 +1186,7 @@ static bool tipc_crypto_key_try_align(struct tipc_crypto *rx, u8 new_pending)
/* Move passive key if any */ /* Move passive key if any */
if (key.passive) { if (key.passive) {
tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, &rx->lock); tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, lockdep_is_held(&rx->lock));
x = (key.passive - key.pending + new_pending) % KEY_MAX; x = (key.passive - key.pending + new_pending) % KEY_MAX;
new_passive = (x <= 0) ? x + KEY_MAX : x; new_passive = (x <= 0) ? x + KEY_MAX : x;
} }
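The tipc hunk drops its tipc_aead_rcu_swap() wrapper around rcu_swap_protected() and calls rcu_replace_pointer() directly. rcu_replace_pointer() stores the new pointer and returns the old one, so assigning the result back to tmp2 reproduces the old swap. The sketch below approximates those semantics with C11 atomics: a relaxed load stands in for the lock-protected read, and a release store stands in for rcu_assign_pointer(). It is not an atomic exchange. It relies on the caller holding the update-side lock, just as the kernel version expects the lock named in lockdep_is_held() to be held. The function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Return the old pointer and publish the new one; caller must hold the update lock. */
static void *replace_pointer(_Atomic(void *) *slot, void *newp)
{
	void *old = atomic_load_explicit(slot, memory_order_relaxed);

	atomic_store_explicit(slot, newp, memory_order_release);
	return old;
}
```

Usage mirroring the tipc change: `tmp2 = replace_pointer(&slot, tmp2);` leaves the old slot value in `tmp2` and the old `tmp2` in the slot.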
......
...@@ -15,8 +15,15 @@ then ...@@ -15,8 +15,15 @@ then
exit 0 exit 0
fi fi
ncpus=`grep '^processor' /proc/cpuinfo | wc -l` ncpus=`grep '^processor' /proc/cpuinfo | wc -l`
idlecpus=`mpstat | tail -1 | \ if mpstat -V > /dev/null 2>&1
awk -v ncpus=$ncpus '{ print ncpus * ($7 + $NF) / 100 }'` then
idlecpus=`mpstat | tail -1 | \
awk -v ncpus=$ncpus '{ print ncpus * ($7 + $NF) / 100 }'`
else
# No mpstat command, so use all available CPUs.
echo The mpstat command is not available, so greedily using all CPUs.
idlecpus=$ncpus
fi
awk -v ncpus=$ncpus -v idlecpus=$idlecpus < /dev/null ' awk -v ncpus=$ncpus -v idlecpus=$idlecpus < /dev/null '
BEGIN { BEGIN {
cpus2use = idlecpus; cpus2use = idlecpus;
......
...@@ -23,25 +23,39 @@ spinmax=${4-1000} ...@@ -23,25 +23,39 @@ spinmax=${4-1000}
n=1 n=1
starttime=`awk 'BEGIN { print systime(); }' < /dev/null` starttime=`gawk 'BEGIN { print systime(); }' < /dev/null`
nohotplugcpus=
for i in /sys/devices/system/cpu/cpu[0-9]*
do
if test -f $i/online
then
:
else
curcpu=`echo $i | sed -e 's/^[^0-9]*//'`
nohotplugcpus="$nohotplugcpus $curcpu"
fi
done
while : while :
do do
# Check for done. # Check for done.
t=`awk -v s=$starttime 'BEGIN { print systime() - s; }' < /dev/null` t=`gawk -v s=$starttime 'BEGIN { print systime() - s; }' < /dev/null`
if test "$t" -gt "$duration" if test "$t" -gt "$duration"
then then
exit 0; exit 0;
fi fi
# Set affinity to randomly selected online CPU # Set affinity to randomly selected online CPU
cpus=`grep 1 /sys/devices/system/cpu/*/online | if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 |
sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'` sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'`
then
# Do not leave out poor old cpu0 which may not be hot-pluggable :
if [ ! -f "/sys/devices/system/cpu/cpu0/online" ]; then else
cpus="0 $cpus" cpus=
fi fi
# Do not leave out non-hot-pluggable CPUs
cpus="$cpus $nohotplugcpus"
cpumask=`awk -v cpus="$cpus" -v me=$me -v n=$n 'BEGIN { cpumask=`awk -v cpus="$cpus" -v me=$me -v n=$n 'BEGIN {
srand(n + me + systime()); srand(n + me + systime());
......
...@@ -25,6 +25,7 @@ stopstate="`grep 'End-test grace-period state: g' $i/console.log 2> /dev/null | ...@@ -25,6 +25,7 @@ stopstate="`grep 'End-test grace-period state: g' $i/console.log 2> /dev/null |
tail -1 | sed -e 's/^\[[ 0-9.]*] //' | tail -1 | sed -e 's/^\[[ 0-9.]*] //' |
awk '{ print \"[\" $1 \" \" $5 \" \" $6 \" \" $7 \"]\"; }' | awk '{ print \"[\" $1 \" \" $5 \" \" $6 \" \" $7 \"]\"; }' |
tr -d '\012\015'`" tr -d '\012\015'`"
fwdprog="`grep 'rcu_torture_fwd_prog_cr Duration' $i/console.log 2> /dev/null | sed -e 's/^\[[^]]*] //' | sort -k15nr | head -1 | awk '{ print $14 " " $15 }'`"
if test -z "$ngps" if test -z "$ngps"
then then
echo "$configfile ------- " $stopstate echo "$configfile ------- " $stopstate
...@@ -39,7 +40,7 @@ else ...@@ -39,7 +40,7 @@ else
BEGIN { print ngps / dur }' < /dev/null` BEGIN { print ngps / dur }' < /dev/null`
title="$title ($ngpsps/s)" title="$title ($ngpsps/s)"
fi fi
echo $title $stopstate echo $title $stopstate $fwdprog
nclosecalls=`grep --binary-files=text 'torture: Reader Batch' $i/console.log | tail -1 | awk '{for (i=NF-8;i<=NF;i++) sum+=$i; } END {print sum}'` nclosecalls=`grep --binary-files=text 'torture: Reader Batch' $i/console.log | tail -1 | awk '{for (i=NF-8;i<=NF;i++) sum+=$i; } END {print sum}'`
if test -z "$nclosecalls" if test -z "$nclosecalls"
then then
......
...@@ -123,7 +123,7 @@ qemu_args=$5 ...@@ -123,7 +123,7 @@ qemu_args=$5
boot_args=$6 boot_args=$6
cd $KVM cd $KVM
kstarttime=`awk 'BEGIN { print systime() }' < /dev/null` kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null`
if test -z "$TORTURE_BUILDONLY" if test -z "$TORTURE_BUILDONLY"
then then
echo ' ---' `date`: Starting kernel echo ' ---' `date`: Starting kernel
...@@ -133,11 +133,10 @@ fi ...@@ -133,11 +133,10 @@ fi
qemu_args="-enable-kvm -nographic $qemu_args" qemu_args="-enable-kvm -nographic $qemu_args"
cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment` cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment`
cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"`
vcpus=`identify_qemu_vcpus` if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS"
if test $cpu_count -gt $vcpus
then then
echo CPU count limited from $cpu_count to $vcpus | tee -a $resdir/Warnings echo CPU count limited from $cpu_count to $TORTURE_ALLOTED_CPUS | tee -a $resdir/Warnings
cpu_count=$vcpus cpu_count=$TORTURE_ALLOTED_CPUS
fi fi
qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`" qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`"
...@@ -177,7 +176,7 @@ do ...@@ -177,7 +176,7 @@ do
then then
qemu_pid=`cat "$resdir/qemu_pid"` qemu_pid=`cat "$resdir/qemu_pid"`
fi fi
kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1 if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1
then then
if test $kruntime -ge $seconds if test $kruntime -ge $seconds
...@@ -213,7 +212,7 @@ then ...@@ -213,7 +212,7 @@ then
oldline="`tail $resdir/console.log`" oldline="`tail $resdir/console.log`"
while : while :
do do
kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
if kill -0 $qemu_pid > /dev/null 2>&1 if kill -0 $qemu_pid > /dev/null 2>&1
then then
: :
......
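In the per-run hunk above, the guest CPU count is now capped at `$TORTURE_ALLOTED_CPUS` instead of at a fresh `identify_qemu_vcpus` result. The option-parsing hunk applies the same cap when `--cpus` exceeds the host's vCPU count.

The rule reduces to taking the minimum and logging a warning. The sketch below uses a hypothetical `clamp_cpu_count` helper. The real scripts inline the test and append the warning to `$resdir/Warnings` via `tee`.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested guest CPU count at the allotment,
# mirroring the inline test in the per-run script.
# $1: CPU count the config asks for; $2: CPUs allotted (TORTURE_ALLOTED_CPUS).
clamp_cpu_count () {
	requested=$1
	allotted=$2
	if test "$requested" -gt "$allotted"
	then
		# The real script appends this message to $resdir/Warnings via tee.
		echo "CPU count limited from $requested to $allotted" >&2
		requested=$allotted
	fi
	echo "$requested"
}
```

A call such as `cpu_count=$(clamp_cpu_count "$cpu_count" "$TORTURE_ALLOTED_CPUS")` would replace the inline `if` block. Keeping the warning on stderr leaves stdout holding only the resulting count.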
...@@ -24,7 +24,9 @@ dur=$((30*60)) ...@@ -24,7 +24,9 @@ dur=$((30*60))
dryrun="" dryrun=""
KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM
PATH=${KVM}/bin:$PATH; export PATH PATH=${KVM}/bin:$PATH; export PATH
TORTURE_ALLOTED_CPUS="" . functions.sh
TORTURE_ALLOTED_CPUS="`identify_qemu_vcpus`"
TORTURE_DEFCONFIG=defconfig TORTURE_DEFCONFIG=defconfig
TORTURE_BOOT_IMAGE="" TORTURE_BOOT_IMAGE=""
TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD
...@@ -40,8 +42,6 @@ cpus=0 ...@@ -40,8 +42,6 @@ cpus=0
ds=`date +%Y.%m.%d-%H:%M:%S` ds=`date +%Y.%m.%d-%H:%M:%S`
jitter="-1" jitter="-1"
. functions.sh
usage () { usage () {
echo "Usage: $scriptname optional arguments:" echo "Usage: $scriptname optional arguments:"
echo " --bootargs kernel-boot-arguments" echo " --bootargs kernel-boot-arguments"
...@@ -93,6 +93,11 @@ do ...@@ -93,6 +93,11 @@ do
checkarg --cpus "(number)" "$#" "$2" '^[0-9]*$' '^--' checkarg --cpus "(number)" "$#" "$2" '^[0-9]*$' '^--'
cpus=$2 cpus=$2
TORTURE_ALLOTED_CPUS="$2" TORTURE_ALLOTED_CPUS="$2"
max_cpus="`identify_qemu_vcpus`"
if test "$TORTURE_ALLOTED_CPUS" -gt "$max_cpus"
then
TORTURE_ALLOTED_CPUS=$max_cpus
fi
shift shift
;; ;;
--datestamp) --datestamp)
...@@ -198,9 +203,10 @@ fi ...@@ -198,9 +203,10 @@ fi
CONFIGFRAG=${KVM}/configs/${TORTURE_SUITE}; export CONFIGFRAG CONFIGFRAG=${KVM}/configs/${TORTURE_SUITE}; export CONFIGFRAG
defaultconfigs="`tr '\012' ' ' < $CONFIGFRAG/CFLIST`"
if test -z "$configs" if test -z "$configs"
then then
configs="`cat $CONFIGFRAG/CFLIST`" configs=$defaultconfigs
fi fi
if test -z "$resdir" if test -z "$resdir"
...@@ -209,7 +215,7 @@ then ...@@ -209,7 +215,7 @@ then
fi fi
# Create a file of test-name/#cpus pairs, sorted by decreasing #cpus. # Create a file of test-name/#cpus pairs, sorted by decreasing #cpus.
touch $T/cfgcpu configs_derep=
for CF in $configs for CF in $configs
do do
case $CF in case $CF in
...@@ -222,15 +228,21 @@ do ...@@ -222,15 +228,21 @@ do
CF1=$CF CF1=$CF
;; ;;
esac esac
for ((cur_rep=0;cur_rep<$config_reps;cur_rep++))
do
configs_derep="$configs_derep $CF1"
done
done
touch $T/cfgcpu
configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`"
for CF1 in $configs_derep
do
if test -f "$CONFIGFRAG/$CF1" if test -f "$CONFIGFRAG/$CF1"
then then
cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$CF1` cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$CF1`
cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"` cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"`
cpu_count=`configfrag_boot_maxcpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"` cpu_count=`configfrag_boot_maxcpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"`
for ((cur_rep=0;cur_rep<$config_reps;cur_rep++)) echo $CF1 $cpu_count >> $T/cfgcpu
do
echo $CF1 $cpu_count >> $T/cfgcpu
done
else else
echo "The --configs file $CF1 does not exist, terminating." echo "The --configs file $CF1 does not exist, terminating."
exit 1 exit 1
......
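The config-list hunks above split config handling into two passes:

1. Each `--configs` entry is expanded into `$config_reps` copies of its name.
2. Only then is every whole-word `CFLIST` replaced with the contents of `$CONFIGFRAG/CFLIST`.

As a result, an entry such as `2*CFLIST` runs the whole default list twice. The `case` patterns that set `config_reps` and `CF1` are elided in this view, so the sketch below assumes an `N*NAME` repetition syntax. The helper name `derep_configs` is hypothetical. The sketch uses a POSIX `while` loop in place of the bash `for ((...))`. Like the original, it relies on GNU sed for the `\<`/`\>` word anchors.

```shell
#!/bin/sh
# Hypothetical helper sketching the two-pass config expansion.
# $1: default config list (space-separated); remaining args: --configs entries.
derep_configs () {
	defaults=$1
	shift
	out=
	for CF in "$@"
	do
		# Assumed syntax: "N*NAME" means run NAME N times (N up to 3 digits).
		case $CF in
		[0-9]\**|[0-9][0-9]\**|[0-9][0-9][0-9]\**)
			reps=`echo "$CF" | sed -e 's/\*.*$//'`
			CF1=`echo "$CF" | sed -e 's/^[^*]*\*//'`
			;;
		*)
			reps=1
			CF1=$CF
			;;
		esac
		# Pass 1: emit reps copies of the config name.
		i=0
		while test "$i" -lt "$reps"
		do
			out="$out $CF1"
			i=$((i + 1))
		done
	done
	out=${out# }
	# Pass 2: replace each whole-word CFLIST with the default list (GNU sed).
	printf '%s\n' "$out" | sed -e "s/\<CFLIST\>/$defaults/g"
}
```

For example, `derep_configs "TREE01 TREE02" "2*TREE01" CFLIST` prints `TREE01 TREE01 TREE01 TREE02`. Substituting after repetition is what lets a count prefix apply to `CFLIST` as a unit.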
...@@ -20,58 +20,9 @@ if [ -s "$D/initrd/init" ]; then ...@@ -20,58 +20,9 @@ if [ -s "$D/initrd/init" ]; then
exit 0 exit 0
fi fi
T=${TMPDIR-/tmp}/mkinitrd.sh.$$ # Create a C-language initrd/init infinite-loop program and statically
trap 'rm -rf $T' 0 2 # link it. This results in a very small initrd.
mkdir $T echo "Creating a statically linked C-language initrd"
cat > $T/init << '__EOF___'
#!/bin/sh
# Run in userspace a few milliseconds every second. This helps to
# exercise the NO_HZ_FULL portions of RCU. The 192 instances of "a" was
# empirically shown to give a nice multi-millisecond burst of user-mode
# execution on a 2GHz CPU, as desired. Modern CPUs will vary from a
# couple of milliseconds up to perhaps 100 milliseconds, which is an
# acceptable range.
#
# Why not calibrate an exact delay? Because within this initrd, we
# are restricted to Bourne-shell builtins, which as far as I know do not
# provide any means of obtaining a fine-grained timestamp.
a4="a a a a"
a16="$a4 $a4 $a4 $a4"
a64="$a16 $a16 $a16 $a16"
a192="$a64 $a64 $a64"
while :
do
q=
for i in $a192
do
q="$q $i"
done
sleep 1
done
__EOF___
# Try using dracut to create initrd
if command -v dracut >/dev/null 2>&1
then
echo Creating $D/initrd using dracut.
# Filesystem creation
dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img
cd $D
mkdir -p initrd
cd initrd
zcat $T/initramfs.img | cpio -id
cp $T/init init
chmod +x init
echo Done creating $D/initrd using dracut
exit 0
fi
# No dracut, so create a C-language initrd/init program and statically
# link it. This results in a very small initrd, but might be a bit less
# future-proof than dracut.
echo "Could not find dracut, attempting C initrd"
cd $D cd $D
mkdir -p initrd mkdir -p initrd
cd initrd cd initrd
......