Commit 4bbfd746 authored by Ingo Molnar

Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU changes from Paul E. McKenney:

- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

- Replace calls of RCU-bh and RCU-sched update-side functions
  to their vanilla RCU counterparts.  This series is a step
  towards complete removal of the RCU-bh and RCU-sched update-side
  functions.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- Documentation updates, including a number of flavor-consolidation
  updates from Joel Fernandes.

- Miscellaneous fixes.

- Automate generation of the initrd filesystem used for
  rcutorture testing.

- Convert spin_is_locked() assertions to instead use lockdep.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- SRCU updates, especially including a fix from Dennis Krein
  for a bag-on-head-class bug.

- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents 25956467 5ac7cdc2
@@ -160,9 +160,9 @@ was in flight.
 If the CPU is idle, then <tt>sync_sched_exp_handler()</tt> reports
 the quiescent state.
 <p>
-Otherwise, the handler invokes <tt>resched_cpu()</tt>, which forces
-a future context switch.
+Otherwise, the handler forces a future context switch by setting the
+NEED_RESCHED flag of the current task's thread flag and the CPU preempt
+counter.
 At the time of the context switch, the CPU reports the quiescent state.
 Should the CPU go offline first, it will report the quiescent state
 at that time.
......
@@ -77,7 +77,7 @@ The key point is that the lock-acquisition functions, including
 <tt>smp_mb__after_unlock_lock()</tt> immediately after successful
 acquisition of the lock.
-<p>Therefore, for any given <tt>rcu_node</tt> struction, any access
+<p>Therefore, for any given <tt>rcu_node</tt> structure, any access
 happening before one of the above lock-release functions will be seen
 by all CPUs as happening before any access happening after a later
 one of the above lock-acquisition functions.
......
@@ -63,7 +63,7 @@ over a rather long period of time, but improvements are always welcome!
 pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
 rcu_read_lock_sched(), or by the appropriate update-side lock.
 Disabling of preemption can serve as rcu_read_lock_sched(), but
-is less readable.
+is less readable and prevents lockdep from detecting locking issues.
 Letting RCU-protected pointers "leak" out of an RCU read-side
 critical section is every bid as bad as letting them leak out
@@ -285,11 +285,7 @@ over a rather long period of time, but improvements are always welcome!
 here is that superuser already has lots of ways to crash
 the machine.
-d. Use call_rcu_bh() rather than call_rcu(), in order to take
-advantage of call_rcu_bh()'s faster grace periods. (This
-is only a partial solution, though.)
-e. Periodically invoke synchronize_rcu(), permitting a limited
+d. Periodically invoke synchronize_rcu(), permitting a limited
 number of updates per grace period.
 The same cautions apply to call_rcu_bh(), call_rcu_sched(),
@@ -324,37 +320,14 @@ over a rather long period of time, but improvements are always welcome!
 will break Alpha, cause aggressive compilers to generate bad code,
 and confuse people trying to read your code.
-11. Note that synchronize_rcu() -only- guarantees to wait until
-all currently executing rcu_read_lock()-protected RCU read-side
-critical sections complete. It does -not- necessarily guarantee
-that all currently running interrupts, NMIs, preempt_disable()
-code, or idle loops will complete. Therefore, if your
-read-side critical sections are protected by something other
-than rcu_read_lock(), do -not- use synchronize_rcu().
-Similarly, disabling preemption is not an acceptable substitute
-for rcu_read_lock(). Code that attempts to use preemption
-disabling where it should be using rcu_read_lock() will break
-in CONFIG_PREEMPT=y kernel builds.
-If you want to wait for interrupt handlers, NMI handlers, and
-code under the influence of preempt_disable(), you instead
-need to use synchronize_irq() or synchronize_sched().
-This same limitation also applies to synchronize_rcu_bh()
-and synchronize_srcu(), as well as to the asynchronous and
-expedited forms of the three primitives, namely call_rcu(),
-call_rcu_bh(), call_srcu(), synchronize_rcu_expedited(),
-synchronize_rcu_bh_expedited(), and synchronize_srcu_expedited().
-12. Any lock acquired by an RCU callback must be acquired elsewhere
+11. Any lock acquired by an RCU callback must be acquired elsewhere
 with softirq disabled, e.g., via spin_lock_irqsave(),
 spin_lock_bh(), etc. Failing to disable irq on a given
 acquisition of that lock will result in deadlock as soon as
 the RCU softirq handler happens to run your RCU callback while
 interrupting that acquisition's critical section.
-13. RCU callbacks can be and are executed in parallel. In many cases,
+12. RCU callbacks can be and are executed in parallel. In many cases,
 the callback code simply wrappers around kfree(), so that this
 is not an issue (or, more accurately, to the extent that it is
 an issue, the memory-allocator locking handles it). However,
@@ -370,7 +343,7 @@ over a rather long period of time, but improvements are always welcome!
 not the case, a self-spawning RCU callback would prevent the
 victim CPU from ever going offline.)
-14. Unlike other forms of RCU, it -is- permissible to block in an
+13. Unlike other forms of RCU, it -is- permissible to block in an
 SRCU read-side critical section (demarked by srcu_read_lock()
 and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
 Please note that if you don't need to sleep in read-side critical
@@ -414,7 +387,7 @@ over a rather long period of time, but improvements are always welcome!
 Note that rcu_dereference() and rcu_assign_pointer() relate to
 SRCU just as they do to other forms of RCU.
-15. The whole point of call_rcu(), synchronize_rcu(), and friends
+14. The whole point of call_rcu(), synchronize_rcu(), and friends
 is to wait until all pre-existing readers have finished before
 carrying out some otherwise-destructive operation. It is
 therefore critically important to -first- remove any path
@@ -426,13 +399,13 @@ over a rather long period of time, but improvements are always welcome!
 is the caller's responsibility to guarantee that any subsequent
 readers will execute safely.
-16. The various RCU read-side primitives do -not- necessarily contain
+15. The various RCU read-side primitives do -not- necessarily contain
 memory barriers. You should therefore plan for the CPU
 and the compiler to freely reorder code into and out of RCU
 read-side critical sections. It is the responsibility of the
 RCU update-side primitives to deal with this.
-17. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
+16. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
 __rcu sparse checks to validate your RCU code. These can help
 find problems as follows:
@@ -455,7 +428,7 @@ over a rather long period of time, but improvements are always welcome!
 These debugging aids can help you find problems that are
 otherwise extremely difficult to spot.
-18. If you register a callback using call_rcu(), call_rcu_bh(),
+17. If you register a callback using call_rcu(), call_rcu_bh(),
 call_rcu_sched(), or call_srcu(), and pass in a function defined
 within a loadable module, then it in necessary to wait for
 all pending callbacks to be invoked after the last invocation
@@ -469,8 +442,8 @@ over a rather long period of time, but improvements are always welcome!
 You instead need to use one of the barrier functions:
 o call_rcu() -> rcu_barrier()
-o call_rcu_bh() -> rcu_barrier_bh()
-o call_rcu_sched() -> rcu_barrier_sched()
+o call_rcu_bh() -> rcu_barrier()
+o call_rcu_sched() -> rcu_barrier()
 o call_srcu() -> srcu_barrier()
 However, these barrier functions are absolutely -not- guaranteed
......
@@ -176,9 +176,8 @@ causing stalls, and that the stall was affecting RCU-sched. This message
 will normally be followed by stack dumps for each CPU. Please note that
 PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
 the tasks will be indicated by PID, for example, "P3421". It is even
-possible for a rcu_preempt_state stall to be caused by both CPUs -and-
-tasks, in which case the offending CPUs and tasks will all be called
-out in the list.
+possible for an rcu_state stall to be caused by both CPUs -and- tasks,
+in which case the offending CPUs and tasks will all be called out in the list.
 CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
 the RCU core for the past three grace periods. In contrast, CPU 16's "(0
@@ -206,7 +205,7 @@ handlers are no longer able to execute on this CPU. This can happen if
 the stalled CPU is spinning with interrupts are disabled, or, in -rt
 kernels, if a high-priority process is starving RCU's softirq handler.
-The "fps=" shows the number of force-quiescent-state idle/offline
+The "fqs=" shows the number of force-quiescent-state idle/offline
 detection passes that the grace-period kthread has made across this
 CPU since the last time that this CPU noted the beginning of a grace
 period.
......
@@ -266,7 +266,7 @@ rcu_dereference()
 unnecessary overhead on Alpha CPUs.
 Note that the value returned by rcu_dereference() is valid
-only within the enclosing RCU read-side critical section.
+only within the enclosing RCU read-side critical section [1].
 For example, the following is -not- legal:
 rcu_read_lock();
@@ -292,6 +292,19 @@ rcu_dereference()
 typically used indirectly, via the _rcu list-manipulation
 primitives, such as list_for_each_entry_rcu().
+[1] The variant rcu_dereference_protected() can be used outside
+of an RCU read-side critical section as long as the usage is
+protected by locks acquired by the update-side code. This variant
+avoids the lockdep warning that would happen when using (for
+example) rcu_dereference() without rcu_read_lock() protection.
+Using rcu_dereference_protected() also has the advantage
+of permitting compiler optimizations that rcu_dereference()
+must prohibit. The rcu_dereference_protected() variant takes
+a lockdep expression to indicate which locks must be acquired
+by the caller. If the indicated protection is not provided,
+a lockdep splat is emitted. See RCU/Design/Requirements.html
+and the API's code comments for more details and example usage.
 The following diagram shows how each API communicates among the
 reader, updater, and reclaimer.
@@ -322,28 +335,27 @@ to their callers and (2) call_rcu() callbacks may be invoked. Efficient
 implementations of the RCU infrastructure make heavy use of batching in
 order to amortize their overhead over many uses of the corresponding APIs.
-There are no fewer than three RCU mechanisms in the Linux kernel; the
-diagram above shows the first one, which is by far the most commonly used.
-The rcu_dereference() and rcu_assign_pointer() primitives are used for
-all three mechanisms, but different defer and protect primitives are
-used as follows:
+There are at least three flavors of RCU usage in the Linux kernel. The diagram
+above shows the most common one. On the updater side, the rcu_assign_pointer(),
+sychronize_rcu() and call_rcu() primitives used are the same for all three
+flavors. However for protection (on the reader side), the primitives used vary
+depending on the flavor:
-    Defer                  Protect
-a.  synchronize_rcu()      rcu_read_lock() / rcu_read_unlock()
-    call_rcu()             rcu_dereference()
+a.  rcu_read_lock() / rcu_read_unlock()
+    rcu_dereference()
-b.  synchronize_rcu_bh()   rcu_read_lock_bh() / rcu_read_unlock_bh()
-    call_rcu_bh()          rcu_dereference_bh()
+b.  rcu_read_lock_bh() / rcu_read_unlock_bh()
+    local_bh_disable() / local_bh_enable()
+    rcu_dereference_bh()
-c.  synchronize_sched()    rcu_read_lock_sched() / rcu_read_unlock_sched()
-    call_rcu_sched()       preempt_disable() / preempt_enable()
-                           local_irq_save() / local_irq_restore()
-                           hardirq enter / hardirq exit
-                           NMI enter / NMI exit
-                           rcu_dereference_sched()
+c.  rcu_read_lock_sched() / rcu_read_unlock_sched()
+    preempt_disable() / preempt_enable()
+    local_irq_save() / local_irq_restore()
+    hardirq enter / hardirq exit
+    NMI enter / NMI exit
+    rcu_dereference_sched()
-These three mechanisms are used as follows:
+These three flavors are used as follows:
 a. RCU applied to normal data structures.
@@ -867,18 +879,20 @@ RCU: Critical sections Grace period Barrier
 bh:     Critical sections              Grace period                    Barrier
-        rcu_read_lock_bh               call_rcu_bh                     rcu_barrier_bh
-        rcu_read_unlock_bh             synchronize_rcu_bh
-        rcu_dereference_bh             synchronize_rcu_bh_expedited
+        rcu_read_lock_bh               call_rcu                        rcu_barrier
+        rcu_read_unlock_bh             synchronize_rcu
+        [local_bh_disable]             synchronize_rcu_expedited
+        [and friends]
+        rcu_dereference_bh
         rcu_dereference_bh_check
         rcu_dereference_bh_protected
         rcu_read_lock_bh_held
 sched:  Critical sections              Grace period                    Barrier
-        rcu_read_lock_sched            synchronize_sched               rcu_barrier_sched
-        rcu_read_unlock_sched          call_rcu_sched
-        [preempt_disable]              synchronize_sched_expedited
+        rcu_read_lock_sched            call_rcu                        rcu_barrier
+        rcu_read_unlock_sched          synchronize_rcu
+        [preempt_disable]              synchronize_rcu_expedited
         [and friends]
         rcu_read_lock_sched_notrace
         rcu_read_unlock_sched_notrace
@@ -890,8 +904,8 @@ sched: Critical sections Grace period Barrier
 SRCU:   Critical sections              Grace period                    Barrier
-        srcu_read_lock                 synchronize_srcu                srcu_barrier
-        srcu_read_unlock               call_srcu
+        srcu_read_lock                 call_srcu                       srcu_barrier
+        srcu_read_unlock               synchronize_srcu
         srcu_dereference               synchronize_srcu_expedited
         srcu_dereference_check
         srcu_read_lock_held
@@ -1034,7 +1048,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
 spinlocks blocking while in RCU read-side critical
 sections.
-Why the apparent inconsistency? Because it is it
+Why the apparent inconsistency? Because it is
 possible to use priority boosting to keep the RCU
 grace periods short if need be (for example, if running
 short of memory). In contrast, if blocking waiting
......
@@ -289,7 +289,7 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
 	(*batchp)->ptes[(*batchp)->index++] = hugepte;
 	if ((*batchp)->index == HUGEPD_FREELIST_SIZE) {
-		call_rcu_sched(&(*batchp)->rcu, hugepd_free_rcu_callback);
+		call_rcu(&(*batchp)->rcu, hugepd_free_rcu_callback);
 		*batchp = NULL;
 	}
 	put_cpu_var(hugepd_freelist_cur);
......
@@ -352,7 +352,7 @@ void tlb_table_flush(struct mmu_gather *tlb)
 	struct mmu_table_batch **batch = &tlb->batch;
 	if (*batch) {
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+		call_rcu(&(*batch)->rcu, tlb_remove_table_rcu);
 		*batch = NULL;
 	}
 }
......
@@ -53,7 +53,7 @@ static void timer_stop(void)
 {
 	nmi_adjust_hz(1);
 	unregister_die_notifier(&profile_timer_exceptions_nb);
-	synchronize_sched(); /* Allow already-started NMIs to complete. */
+	synchronize_rcu(); /* Allow already-started NMIs to complete. */
 }
 static int op_nmi_timer_init(struct oprofile_operations *ops)
......
@@ -59,7 +59,7 @@ static struct pcibios_fwaddrmap *pcibios_fwaddrmap_lookup(struct pci_dev *dev)
 {
 	struct pcibios_fwaddrmap *map;
-	WARN_ON_SMP(!spin_is_locked(&pcibios_fwaddrmap_lock));
+	lockdep_assert_held(&pcibios_fwaddrmap_lock);
 	list_for_each_entry(map, &pcibios_fwaddrmappings, list)
 		if (map->dev == dev)
......
@@ -382,7 +382,7 @@ static int pcrypt_cpumask_change_notify(struct notifier_block *self,
 	cpumask_copy(new_mask->mask, cpumask->cbcpu);
 	rcu_assign_pointer(pcrypt->cb_cpumask, new_mask);
-	synchronize_rcu_bh();
+	synchronize_rcu();
 	free_cpumask_var(old_mask->mask);
 	kfree(old_mask);
......