Commit f2094107 authored by Paul E. McKenney's avatar Paul E. McKenney

Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' into HEAD

doc.2017.04.12a: Documentation updates
fixes.2017.04.19a: Miscellaneous fixes
srcu.2017.04.21a: Parallelize SRCU callback handling
......@@ -17,7 +17,7 @@ rcu_dereference.txt
rcubarrier.txt
- RCU and Unloadable Modules
rculist_nulls.txt
- RCU list primitives for use with SLAB_DESTROY_BY_RCU
- RCU list primitives for use with SLAB_TYPESAFE_BY_RCU
rcuref.txt
- Reference-count design for elements of lists/arrays protected by RCU
rcu.txt
......
......@@ -1185,6 +1185,9 @@ Its fields are as follows:
1 int dynticks_nesting;
2 int dynticks_nmi_nesting;
3 atomic_t dynticks;
4 bool rcu_need_heavy_qs;
5 unsigned long rcu_qs_ctr;
6 bool rcu_urgent_qs;
</pre>
<p>The <tt>-&gt;dynticks_nesting</tt> field counts the
......@@ -1198,11 +1201,32 @@ NMIs are counted by the <tt>-&gt;dynticks_nmi_nesting</tt>
field, except that NMIs that interrupt non-dyntick-idle execution
are not counted.
</p><p>Finally, the <tt>-&gt;dynticks</tt> field counts the corresponding
</p><p>The <tt>-&gt;dynticks</tt> field counts the corresponding
CPU's transitions to and from dyntick-idle mode, so that this counter
has an even value when the CPU is in dyntick-idle mode and an odd
value otherwise.
</p><p>The <tt>-&gt;rcu_need_heavy_qs</tt> field is used
to record the fact that the RCU core code would really like to
see a quiescent state from the corresponding CPU, so much so that
it is willing to call for heavy-weight dyntick-counter operations.
This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
code, which provide a momentary idle sojourn in response.
</p><p>The <tt>-&gt;rcu_qs_ctr</tt> field is used to record
quiescent states from <tt>cond_resched()</tt>.
Because <tt>cond_resched()</tt> can execute quite frequently, this
must be quite lightweight, as in a non-atomic increment of this
per-CPU field.
</p><p>Finally, the <tt>-&gt;rcu_urgent_qs</tt> field is used to record
the fact that the RCU core code would really like to see a quiescent
state from the corresponding CPU, with the various other fields indicating
just how badly RCU wants this quiescent state.
This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
code, which, if nothing else, non-atomically increment <tt>-&gt;rcu_qs_ctr</tt>
in response.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
......
Using hlist_nulls to protect read-mostly linked lists and
objects using SLAB_DESTROY_BY_RCU allocations.
objects using SLAB_TYPESAFE_BY_RCU allocations.
Please read the basics in Documentation/RCU/listRCU.txt
......@@ -7,7 +7,7 @@ Using special makers (called 'nulls') is a convenient way
to solve following problem :
A typical RCU linked list managing objects which are
allocated with SLAB_DESTROY_BY_RCU kmem_cache can
allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
use following algos :
1) Lookup algo
......@@ -96,7 +96,7 @@ unlock_chain(); // typically a spin_unlock()
3) Remove algo
--------------
Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_DESTROY_BY_RCU, beware a deleted object can be reused
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
very very fast (before the end of RCU grace period)
if (put_last_reference_on(obj) {
......
......@@ -928,7 +928,8 @@ d. Do you need RCU grace periods to complete even in the face
e. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?
If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
named SLAB_DESTROY_BY_RCU). But please be careful!
f. Do you need read-side critical sections that are respected
even though they are in the middle of the idle loop, during
......
......@@ -320,6 +320,9 @@ config HAVE_CMPXCHG_LOCAL
config HAVE_CMPXCHG_DOUBLE
bool
config ARCH_WEAK_RELEASE_ACQUIRE
bool
config ARCH_WANT_IPC_PARSE_VERSION
bool
......
......@@ -99,6 +99,7 @@ config PPC
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
......
......@@ -4552,7 +4552,7 @@ i915_gem_load_init(struct drm_i915_private *dev_priv)
dev_priv->requests = KMEM_CACHE(drm_i915_gem_request,
SLAB_HWCACHE_ALIGN |
SLAB_RECLAIM_ACCOUNT |
SLAB_DESTROY_BY_RCU);
SLAB_TYPESAFE_BY_RCU);
if (!dev_priv->requests)
goto err_vmas;
......
......@@ -493,7 +493,7 @@ static inline struct drm_i915_gem_request *
__i915_gem_active_get_rcu(const struct i915_gem_active *active)
{
/* Performing a lockless retrieval of the active request is super
* tricky. SLAB_DESTROY_BY_RCU merely guarantees that the backing
* tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
* slab of request objects will not be freed whilst we hold the
* RCU read lock. It does not guarantee that the request itself
* will not be freed and then *reused*. Viz,
......
......@@ -1071,7 +1071,7 @@ int ldlm_init(void)
ldlm_lock_slab = kmem_cache_create("ldlm_locks",
sizeof(struct ldlm_lock), 0,
SLAB_HWCACHE_ALIGN |
SLAB_DESTROY_BY_RCU, NULL);
SLAB_TYPESAFE_BY_RCU, NULL);
if (!ldlm_lock_slab) {
kmem_cache_destroy(ldlm_resource_slab);
return -ENOMEM;
......
......@@ -2340,7 +2340,7 @@ static int jbd2_journal_init_journal_head_cache(void)
jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head",
sizeof(struct journal_head),
0, /* offset */
SLAB_TEMPORARY | SLAB_DESTROY_BY_RCU,
SLAB_TEMPORARY | SLAB_TYPESAFE_BY_RCU,
NULL); /* ctor */
retval = 0;
if (!jbd2_journal_head_cache) {
......
......@@ -38,7 +38,7 @@ void signalfd_cleanup(struct sighand_struct *sighand)
/*
* The lockless check can race with remove_wait_queue() in progress,
* but in this case its caller should run under rcu_read_lock() and
* sighand_cachep is SLAB_DESTROY_BY_RCU, we can safely return.
* sighand_cachep is SLAB_TYPESAFE_BY_RCU, we can safely return.
*/
if (likely(!waitqueue_active(wqh)))
return;
......
......@@ -229,7 +229,7 @@ static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)
*
* Function returns NULL if no refcount could be obtained, or the fence.
* This function handles acquiring a reference to a fence that may be
* reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
* reallocated within the RCU grace period (such as with SLAB_TYPESAFE_BY_RCU),
* so long as the caller is using RCU on the pointer to the fence.
*
* An alternative mechanism is to employ a seqlock to protect a bunch of
......@@ -257,7 +257,7 @@ dma_fence_get_rcu_safe(struct dma_fence * __rcu *fencep)
* have successfully acquire a reference to it. If it no
* longer matches, we are holding a reference to some other
* reallocated pointer. This is possible if the allocator
* is using a freelist like SLAB_DESTROY_BY_RCU where the
* is using a freelist like SLAB_TYPESAFE_BY_RCU where the
* fence remains valid for the RCU grace period, but it
* may be reallocated. When using such allocators, we are
* responsible for ensuring the reference we get is to
......
......@@ -375,8 +375,6 @@ struct kvm {
struct mutex slots_lock;
struct mm_struct *mm; /* userspace tied to this vm */
struct kvm_memslots *memslots[KVM_ADDRESS_SPACE_NUM];
struct srcu_struct srcu;
struct srcu_struct irq_srcu;
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
/*
......@@ -429,6 +427,8 @@ struct kvm {
struct list_head devices;
struct dentry *debugfs_dentry;
struct kvm_stat_data **debugfs_stat_data;
struct srcu_struct srcu;
struct srcu_struct irq_srcu;
};
#define kvm_err(fmt, ...) \
......
/*
* RCU node combining tree definitions. These are used to compute
* global attributes while avoiding common-case global contention. A key
* property that these computations rely on is a tournament-style approach
* where only one of the tasks contending a lower level in the tree need
* advance to the next higher level. If properly configured, this allows
* unlimited scalability while maintaining a constant level of contention
* on the root node.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright IBM Corporation, 2017
*
* Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
*/
#ifndef __LINUX_RCU_NODE_TREE_H
#define __LINUX_RCU_NODE_TREE_H
/*
* Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
* CONFIG_RCU_FANOUT_LEAF.
* In theory, it should be possible to add more levels straightforwardly.
* In practice, this did work well going from three levels to four.
* Of course, your mileage may vary.
*/
#ifdef CONFIG_RCU_FANOUT
#define RCU_FANOUT CONFIG_RCU_FANOUT
#else /* #ifdef CONFIG_RCU_FANOUT */
# ifdef CONFIG_64BIT
# define RCU_FANOUT 64
# else
# define RCU_FANOUT 32
# endif
#endif /* #else #ifdef CONFIG_RCU_FANOUT */
#ifdef CONFIG_RCU_FANOUT_LEAF
#define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
#else /* #ifdef CONFIG_RCU_FANOUT_LEAF */
#define RCU_FANOUT_LEAF 16
#endif /* #else #ifdef CONFIG_RCU_FANOUT_LEAF */
#define RCU_FANOUT_1 (RCU_FANOUT_LEAF)
#define RCU_FANOUT_2 (RCU_FANOUT_1 * RCU_FANOUT)
#define RCU_FANOUT_3 (RCU_FANOUT_2 * RCU_FANOUT)
#define RCU_FANOUT_4 (RCU_FANOUT_3 * RCU_FANOUT)
#if NR_CPUS <= RCU_FANOUT_1
# define RCU_NUM_LVLS 1
# define NUM_RCU_LVL_0 1
# define NUM_RCU_NODES NUM_RCU_LVL_0
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0 }
# define RCU_NODE_NAME_INIT { "rcu_node_0" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0" }
#elif NR_CPUS <= RCU_FANOUT_2
# define RCU_NUM_LVLS 2
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1" }
#elif NR_CPUS <= RCU_FANOUT_3
# define RCU_NUM_LVLS 3
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
#elif NR_CPUS <= RCU_FANOUT_4
# define RCU_NUM_LVLS 4
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3)
# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
# define NUM_RCU_LVL_3 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
#else
# error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
#endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
#endif /* __LINUX_RCU_NODE_TREE_H */
This diff is collapsed.
......@@ -509,7 +509,8 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
{
struct hlist_node *i, *last = NULL;
for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
/* Note: write side code, so rcu accessors are not needed. */
for (i = h->first; i; i = i->next)
last = i;
if (last) {
......
......@@ -363,15 +363,20 @@ static inline void rcu_init_nohz(void)
#ifdef CONFIG_TASKS_RCU
#define TASKS_RCU(x) x
extern struct srcu_struct tasks_rcu_exit_srcu;
#define rcu_note_voluntary_context_switch(t) \
#define rcu_note_voluntary_context_switch_lite(t) \
do { \
rcu_all_qs(); \
if (READ_ONCE((t)->rcu_tasks_holdout)) \
WRITE_ONCE((t)->rcu_tasks_holdout, false); \
} while (0)
#define rcu_note_voluntary_context_switch(t) \
do { \
rcu_all_qs(); \
rcu_note_voluntary_context_switch_lite(t); \
} while (0)
#else /* #ifdef CONFIG_TASKS_RCU */
#define TASKS_RCU(x) do { } while (0)
#define rcu_note_voluntary_context_switch(t) rcu_all_qs()
#define rcu_note_voluntary_context_switch_lite(t) do { } while (0)
#define rcu_note_voluntary_context_switch(t) rcu_all_qs()
#endif /* #else #ifdef CONFIG_TASKS_RCU */
/**
......@@ -1127,11 +1132,11 @@ do { \
* if the UNLOCK and LOCK are executed by the same CPU or if the
* UNLOCK and LOCK operate on the same lock variable.
*/
#ifdef CONFIG_PPC
#ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE
#define smp_mb__after_unlock_lock() smp_mb() /* Full ordering for lock. */
#else /* #ifdef CONFIG_PPC */
#else /* #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */
#define smp_mb__after_unlock_lock() do { } while (0)
#endif /* #else #ifdef CONFIG_PPC */
#endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */
#endif /* __LINUX_RCUPDATE_H */
......@@ -33,6 +33,11 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
return 0;
}
static inline bool rcu_eqs_special_set(int cpu)
{
return false; /* Never flag non-existent other CPUs! */
}
static inline unsigned long get_state_synchronize_rcu(void)
{
return 0;
......@@ -87,10 +92,11 @@ static inline void kfree_call_rcu(struct rcu_head *head,
call_rcu(head, func);
}
static inline void rcu_note_context_switch(void)
{
rcu_sched_qs();
}
#define rcu_note_context_switch(preempt) \
do { \
rcu_sched_qs(); \
rcu_note_voluntary_context_switch_lite(current); \
} while (0)
/*
* Take advantage of the fact that there is only one CPU, which
......@@ -212,14 +218,14 @@ static inline void exit_rcu(void)
{
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU)
extern int rcu_scheduler_active __read_mostly;
void rcu_scheduler_starting(void);
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#else /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
static inline void rcu_scheduler_starting(void)
{
}
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#endif /* #else #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE)
......@@ -237,6 +243,10 @@ static inline bool rcu_is_watching(void)
#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
static inline void rcu_request_urgent_qs_task(struct task_struct *t)
{
}
static inline void rcu_all_qs(void)
{
barrier(); /* Avoid RCU read-side critical sections leaking across. */
......
......@@ -30,7 +30,7 @@
#ifndef __LINUX_RCUTREE_H
#define __LINUX_RCUTREE_H
void rcu_note_context_switch(void);
void rcu_note_context_switch(bool preempt);
int rcu_needs_cpu(u64 basem, u64 *nextevt);
void rcu_cpu_stall_reset(void);
......@@ -41,7 +41,7 @@ void rcu_cpu_stall_reset(void);
*/
static inline void rcu_virt_note_context_switch(int cpu)
{
rcu_note_context_switch();
rcu_note_context_switch(false);
}
void synchronize_rcu_bh(void);
......@@ -108,6 +108,7 @@ void rcu_scheduler_starting(void);
extern int rcu_scheduler_active __read_mostly;
bool rcu_is_watching(void);
void rcu_request_urgent_qs_task(struct task_struct *t);
void rcu_all_qs(void);
......
......@@ -28,7 +28,7 @@
#define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
#define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
/*
* SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
* SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
*
* This delays freeing the SLAB page by a grace period, it does _NOT_
* delay object freeing. This means that if you do kmem_cache_free()
......@@ -61,8 +61,10 @@
*
* rcu_read_lock before reading the address, then rcu_read_unlock after
* taking the spinlock within the structure expected at that address.
*
* Note that SLAB_TYPESAFE_BY_RCU was originally named SLAB_DESTROY_BY_RCU.
*/
#define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
#define SLAB_TYPESAFE_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
#define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
#define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
......
......@@ -22,7 +22,7 @@
* Lai Jiangshan <laijs@cn.fujitsu.com>
*
* For detailed explanation of Read-Copy Update mechanism see -
* Documentation/RCU/ *.txt
* Documentation/RCU/ *.txt
*
*/
......@@ -32,35 +32,9 @@
#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/workqueue.h>
#include <linux/rcu_segcblist.h>
struct srcu_array {
unsigned long lock_count[2];
unsigned long unlock_count[2];
};
struct rcu_batch {
struct rcu_head *head, **tail;
};
#define RCU_BATCH_INIT(name) { NULL, &(name.head) }
struct srcu_struct {
unsigned long completed;
struct srcu_array __percpu *per_cpu_ref;
spinlock_t queue_lock; /* protect ->batch_queue, ->running */
bool running;
/* callbacks just queued */
struct rcu_batch batch_queue;
/* callbacks try to do the first check_zero */
struct rcu_batch batch_check0;
/* callbacks done with the first check_zero and the flip */
struct rcu_batch batch_check1;
struct rcu_batch batch_done;
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
struct srcu_struct;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
......@@ -82,46 +56,15 @@ int init_srcu_struct(struct srcu_struct *sp);
#define __SRCU_DEP_MAP_INIT(srcu_name)
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.completed = -300, \
.per_cpu_ref = &name##_srcu_array, \
.queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock), \
.running = false, \
.batch_queue = RCU_BATCH_INIT(name.batch_queue), \
.batch_check0 = RCU_BATCH_INIT(name.batch_check0), \
.batch_check1 = RCU_BATCH_INIT(name.batch_check1), \
.batch_done = RCU_BATCH_INIT(name.batch_done), \
.work = __DELAYED_WORK_INITIALIZER(name.work, process_srcu, 0),\
__SRCU_DEP_MAP_INIT(name) \
}
/*
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
* DEFINE_STATIC_SRCU() within a function. If these rules are too
* restrictive, declare the srcu_struct manually. For example, in
* each file:
*
* static struct srcu_struct my_srcu;
*
* Then, before the first use of each my_srcu, manually initialize it:
*
* init_srcu_struct(&my_srcu);
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_array, name##_srcu_array);\
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
#ifdef CONFIG_TINY_SRCU
#include <linux/srcutiny.h>
#elif defined(CONFIG_TREE_SRCU)
#include <linux/srcutree.h>
#elif defined(CONFIG_CLASSIC_SRCU)
#include <linux/srcuclassic.h>
#else
#error "Unknown SRCU implementation specified to kernel configuration"
#endif
/**
* call_srcu() - Queue a callback for invocation after an SRCU grace period
......@@ -147,9 +90,6 @@ void cleanup_srcu_struct(struct srcu_struct *sp);
int __srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void __srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
void synchronize_srcu_expedited(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
......
/*
* Sleepable Read-Copy Update mechanism for mutual exclusion,
* classic v4.11 variant.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright (C) IBM Corporation, 2017
*
* Author: Paul McKenney <paulmck@us.ibm.com>
*/
#ifndef _LINUX_SRCU_CLASSIC_H
#define _LINUX_SRCU_CLASSIC_H
struct srcu_array {
unsigned long lock_count[2];
unsigned long unlock_count[2];
};
struct rcu_batch {
struct rcu_head *head, **tail;
};
#define RCU_BATCH_INIT(name) { NULL, &(name.head) }
struct srcu_struct {
unsigned long completed;
struct srcu_array __percpu *per_cpu_ref;
spinlock_t queue_lock; /* protect ->batch_queue, ->running */
bool running;
/* callbacks just queued */
struct rcu_batch batch_queue;
/* callbacks try to do the first check_zero */
struct rcu_batch batch_check0;
/* callbacks done with the first check_zero and the flip */
struct rcu_batch batch_check1;
struct rcu_batch batch_done;
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.completed = -300, \
.per_cpu_ref = &name##_srcu_array, \
.queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock), \
.running = false, \
.batch_queue = RCU_BATCH_INIT(name.batch_queue), \
.batch_check0 = RCU_BATCH_INIT(name.batch_check0), \
.batch_check1 = RCU_BATCH_INIT(name.batch_check1), \
.batch_done = RCU_BATCH_INIT(name.batch_done), \
.work = __DELAYED_WORK_INITIALIZER(name.work, process_srcu, 0),\
__SRCU_DEP_MAP_INIT(name) \
}
/*
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
* DEFINE_STATIC_SRCU() within a function. If these rules are too
* restrictive, declare the srcu_struct manually. For example, in
* each file:
*
* static struct srcu_struct my_srcu;
*
* Then, before the first use of each my_srcu, manually initialize it:
*
* init_srcu_struct(&my_srcu);
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_array, name##_srcu_array);\
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
void synchronize_srcu_expedited(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
#endif
/*
* Sleepable Read-Copy Update mechanism for mutual exclusion,
* tiny variant.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright (C) IBM Corporation, 2017
*
* Author: Paul McKenney <paulmck@us.ibm.com>
*/
#ifndef _LINUX_SRCU_TINY_H
#define _LINUX_SRCU_TINY_H
#include <linux/swait.h>
struct srcu_struct {
int srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
struct swait_queue_head srcu_wq;
/* Last srcu_read_unlock() wakes GP. */
unsigned long srcu_gp_seq; /* GP seq # for callback tagging. */
struct rcu_segcblist srcu_cblist;
/* Pending SRCU callbacks. */
int srcu_idx; /* Current reader array element. */
bool srcu_gp_running; /* GP workqueue running? */
bool srcu_gp_waiting; /* GP waiting for readers? */
struct work_struct srcu_work; /* For driving grace periods. */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
void srcu_drive_gp(struct work_struct *wp);
#define __SRCU_STRUCT_INIT(name) \
{ \
.srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq), \
.srcu_cblist = RCU_SEGCBLIST_INITIALIZER(name.srcu_cblist), \
.srcu_work = __WORK_INITIALIZER(name.srcu_work, srcu_drive_gp), \
__SRCU_DEP_MAP_INIT(name) \
}
/*
* This odd _STATIC_ arrangement is needed for API compatibility with
* Tree SRCU, which needs some per-CPU data.
*/
#define DEFINE_SRCU(name) \
struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_STATIC_SRCU(name) \
static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
void synchronize_srcu(struct srcu_struct *sp);
static inline void synchronize_srcu_expedited(struct srcu_struct *sp)
{
synchronize_srcu(sp);
}
static inline void srcu_barrier(struct srcu_struct *sp)
{
synchronize_srcu(sp);
}
static inline unsigned long srcu_batches_completed(struct srcu_struct *sp)
{
return 0;
}
#endif
/*
* Sleepable Read-Copy Update mechanism for mutual exclusion,
* tree variant.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright (C) IBM Corporation, 2017
*
* Author: Paul McKenney <paulmck@us.ibm.com>
*/
#ifndef _LINUX_SRCU_TREE_H
#define _LINUX_SRCU_TREE_H
#include <linux/rcu_node_tree.h>
#include <linux/completion.h>
struct srcu_node;
struct srcu_struct;
/*
* Per-CPU structure feeding into leaf srcu_node, similar in function
* to rcu_node.
*/
struct srcu_data {
/* Read-side state. */
unsigned long srcu_lock_count[2]; /* Locks per CPU. */
unsigned long srcu_unlock_count[2]; /* Unlocks per CPU. */
/* Update-side state. */
spinlock_t lock ____cacheline_internodealigned_in_smp;
struct rcu_segcblist srcu_cblist; /* List of callbacks.*/
unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */
bool srcu_cblist_invoking; /* Invoking these CBs? */
struct delayed_work work; /* Context for CB invoking. */
struct rcu_head srcu_barrier_head; /* For srcu_barrier() use. */
struct srcu_node *mynode; /* Leaf srcu_node. */
int cpu;
struct srcu_struct *sp;
};
/*
* Node in SRCU combining tree, similar in function to rcu_data.
*/
struct srcu_node {
spinlock_t lock;
unsigned long srcu_have_cbs[4]; /* GP seq for children */
/* having CBs, but only */
/* is > ->srcu_gq_seq. */
struct srcu_node *srcu_parent; /* Next up in tree. */
int grplo; /* Least CPU for node. */
int grphi; /* Biggest CPU for node. */
};
/*
* Per-SRCU-domain structure, similar in function to rcu_state.
*/
struct srcu_struct {
struct srcu_node node[NUM_RCU_NODES]; /* Combining tree. */
struct srcu_node *level[RCU_NUM_LVLS + 1];
/* First node at each level. */
struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
spinlock_t gp_lock; /* protect ->srcu_cblist */
struct mutex srcu_gp_mutex; /* Serialize GP work. */
unsigned int srcu_idx; /* Current rdr array element. */
unsigned long srcu_gp_seq; /* Grace-period seq #. */
unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */
atomic_t srcu_exp_cnt; /* # ongoing expedited GPs. */
struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */
unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */
struct mutex srcu_barrier_mutex; /* Serialize barrier ops. */
struct completion srcu_barrier_completion;
/* Awaken barrier rq at end. */
atomic_t srcu_barrier_cpu_cnt; /* # CPUs not yet posting a */
/* callback for the barrier */
/* operation. */
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
/* Values for state variable (bottom bits of ->srcu_gp_seq). */
#define SRCU_STATE_IDLE 0
#define SRCU_STATE_SCAN1 1
#define SRCU_STATE_SCAN2 2
void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.sda = &name##_srcu_data, \
.gp_lock = __SPIN_LOCK_UNLOCKED(name.gp_lock), \
.srcu_gp_seq_needed = 0 - 1, \
__SRCU_DEP_MAP_INIT(name) \
}
/*
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
* DEFINE_STATIC_SRCU() within a function. If these rules are too
* restrictive, declare the srcu_struct manually. For example, in
* each file:
*
* static struct srcu_struct my_srcu;
*
* Then, before the first use of each my_srcu, manually initialize it:
*
* init_srcu_struct(&my_srcu);
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data);\
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
void synchronize_srcu_expedited(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
#endif
......@@ -209,7 +209,7 @@ struct ustat {
* naturally due ABI requirements, but some architectures (like CRIS) have
* weird ABI and we need to ask it explicitly.
*
* The alignment is required to guarantee that bits 0 and 1 of @next will be
* The alignment is required to guarantee that bit 0 of @next will be
* clear under normal conditions -- as long as we use call_rcu(),
* call_rcu_bh(), call_rcu_sched(), or call_srcu() to queue callback.
*
......
......@@ -993,7 +993,7 @@ struct smc_hashinfo;
struct module;
/*
* caches using SLAB_DESTROY_BY_RCU should let .next pointer from nulls nodes
* caches using SLAB_TYPESAFE_BY_RCU should let .next pointer from nulls nodes
* un-modified. Special care is taken when initializing object to zero.
*/
static inline void sk_prot_clear_nulls(struct sock *sk, int size)
......
......@@ -526,6 +526,35 @@ config SRCU
permits arbitrary sleeping or blocking within RCU read-side critical
sections.
config CLASSIC_SRCU
bool "Use v4.11 classic SRCU implementation"
default n
depends on RCU_EXPERT && SRCU
help
This option selects the traditional well-tested classic SRCU
implementation from v4.11, as might be desired for enterprise
Linux distributions. Without this option, the shiny new
Tiny SRCU and Tree SRCU implementations are used instead.
At some point, it is hoped that Tiny SRCU and Tree SRCU
will accumulate enough test time and confidence to allow
Classic SRCU to be dropped entirely.
Say Y if you need a rock-solid SRCU.
Say N if you would like help test Tree SRCU.
config TINY_SRCU
bool
default y if TINY_RCU && !CLASSIC_SRCU
help
This option selects the single-CPU non-preemptible version of SRCU.
config TREE_SRCU
bool
default y if !TINY_RCU && !CLASSIC_SRCU
help
This option selects the full-fledged version of SRCU.
config TASKS_RCU
bool
default n
......@@ -612,11 +641,17 @@ config RCU_FANOUT_LEAF
initialization. These systems tend to run CPU-bound, and thus
are not helped by synchronized interrupts, and thus tend to
skew them, which reduces lock contention enough that large
leaf-level fanouts work well.
leaf-level fanouts work well. That said, setting leaf-level
fanout to a large number will likely cause problematic
lock contention on the leaf-level rcu_node structures unless
you boot with the skew_tick kernel parameter.
Select a specific number if testing RCU itself.
Select the maximum permissible value for large systems.
Select the maximum permissible value for large systems, but
please understand that you may also need to set the skew_tick
kernel boot parameter to avoid contention on the rcu_node
structure's locks.
Take the default if unsure.
......
......@@ -1313,7 +1313,7 @@ void __cleanup_sighand(struct sighand_struct *sighand)
if (atomic_dec_and_test(&sighand->count)) {
signalfd_cleanup(sighand);
/*
* sighand_cachep is SLAB_DESTROY_BY_RCU so we can free it
* sighand_cachep is SLAB_TYPESAFE_BY_RCU so we can free it
* without an RCU grace period, see __lock_task_sighand().
*/
kmem_cache_free(sighand_cachep, sighand);
......@@ -2144,7 +2144,7 @@ void __init proc_caches_init(void)
{
sighand_cachep = kmem_cache_create("sighand_cache",
sizeof(struct sighand_struct), 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_DESTROY_BY_RCU|
SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
SLAB_NOTRACK|SLAB_ACCOUNT, sighand_ctor);
signal_cachep = kmem_cache_create("signal_cache",
sizeof(struct signal_struct), 0,
......
......@@ -1144,10 +1144,10 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
return 0;
printk("\n");
printk("======================================================\n");
printk("[ INFO: possible circular locking dependency detected ]\n");
pr_warn("======================================================\n");
pr_warn("WARNING: possible circular locking dependency detected\n");
print_kernel_ident();
printk("-------------------------------------------------------\n");
pr_warn("------------------------------------------------------\n");
printk("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(check_src);
......@@ -1482,11 +1482,11 @@ print_bad_irq_dependency(struct task_struct *curr,
return 0;
printk("\n");
printk("======================================================\n");
printk("[ INFO: %s-safe -> %s-unsafe lock order detected ]\n",
pr_warn("=====================================================\n");
pr_warn("WARNING: %s-safe -> %s-unsafe lock order detected\n",
irqclass, irqclass);
print_kernel_ident();
printk("------------------------------------------------------\n");
pr_warn("-----------------------------------------------------\n");
printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
curr->comm, task_pid_nr(curr),
curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
......@@ -1711,10 +1711,10 @@ print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
return 0;
printk("\n");
printk("=============================================\n");
printk("[ INFO: possible recursive locking detected ]\n");
pr_warn("============================================\n");
pr_warn("WARNING: possible recursive locking detected\n");
print_kernel_ident();
printk("---------------------------------------------\n");
pr_warn("--------------------------------------------\n");
printk("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(next);
......@@ -2061,10 +2061,10 @@ static void print_collision(struct task_struct *curr,
struct lock_chain *chain)
{
printk("\n");
printk("======================\n");
printk("[chain_key collision ]\n");
pr_warn("============================\n");
pr_warn("WARNING: chain_key collision\n");
print_kernel_ident();
printk("----------------------\n");
pr_warn("----------------------------\n");
printk("%s/%d: ", current->comm, task_pid_nr(current));
printk("Hash chain already cached but the contents don't match!\n");
......@@ -2360,10 +2360,10 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
return 0;
printk("\n");
printk("=================================\n");
printk("[ INFO: inconsistent lock state ]\n");
pr_warn("================================\n");
pr_warn("WARNING: inconsistent lock state\n");
print_kernel_ident();
printk("---------------------------------\n");
pr_warn("--------------------------------\n");
printk("inconsistent {%s} -> {%s} usage.\n",
usage_str[prev_bit], usage_str[new_bit]);
......@@ -2425,10 +2425,10 @@ print_irq_inversion_bug(struct task_struct *curr,
return 0;
printk("\n");
printk("=========================================================\n");
printk("[ INFO: possible irq lock inversion dependency detected ]\n");
pr_warn("========================================================\n");
pr_warn("WARNING: possible irq lock inversion dependency detected\n");
print_kernel_ident();
printk("---------------------------------------------------------\n");
pr_warn("--------------------------------------------------------\n");
printk("%s/%d just changed the state of lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(this);
......@@ -3170,10 +3170,10 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
return 0;
printk("\n");
printk("==================================\n");
printk("[ BUG: Nested lock was not taken ]\n");
pr_warn("==================================\n");
pr_warn("WARNING: Nested lock was not taken\n");
print_kernel_ident();
printk("----------------------------------\n");
pr_warn("----------------------------------\n");
printk("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
print_lock(hlock);
......@@ -3383,10 +3383,10 @@ print_unlock_imbalance_bug(struct task_struct *curr, struct lockdep_map *lock,
return 0;
printk("\n");
printk("=====================================\n");
printk("[ BUG: bad unlock balance detected! ]\n");
pr_warn("=====================================\n");
pr_warn("WARNING: bad unlock balance detected!\n");
print_kernel_ident();
printk("-------------------------------------\n");
pr_warn("-------------------------------------\n");
printk("%s/%d is trying to release lock (",
curr->comm, task_pid_nr(curr));
print_lockdep_cache(lock);
......@@ -3880,10 +3880,10 @@ print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
return 0;
printk("\n");
printk("=================================\n");
printk("[ BUG: bad contention detected! ]\n");
pr_warn("=================================\n");
pr_warn("WARNING: bad contention detected!\n");
print_kernel_ident();
printk("---------------------------------\n");
pr_warn("---------------------------------\n");
printk("%s/%d is trying to contend lock (",
curr->comm, task_pid_nr(curr));
print_lockdep_cache(lock);
......@@ -4244,10 +4244,10 @@ print_freed_lock_bug(struct task_struct *curr, const void *mem_from,
return;
printk("\n");
printk("=========================\n");
printk("[ BUG: held lock freed! ]\n");
pr_warn("=========================\n");
pr_warn("WARNING: held lock freed!\n");
print_kernel_ident();
printk("-------------------------\n");
pr_warn("-------------------------\n");
printk("%s/%d is freeing memory %p-%p, with a lock still held there!\n",
curr->comm, task_pid_nr(curr), mem_from, mem_to-1);
print_lock(hlock);
......@@ -4302,11 +4302,11 @@ static void print_held_locks_bug(void)
return;
printk("\n");
printk("=====================================\n");
printk("[ BUG: %s/%d still has locks held! ]\n",
pr_warn("====================================\n");
pr_warn("WARNING: %s/%d still has locks held!\n",
current->comm, task_pid_nr(current));
print_kernel_ident();
printk("-------------------------------------\n");
pr_warn("------------------------------------\n");
lockdep_print_held_locks(current);
printk("\nstack backtrace:\n");
dump_stack();
......@@ -4371,7 +4371,7 @@ void debug_show_all_locks(void)
} while_each_thread(g, p);
printk("\n");
printk("=============================================\n\n");
pr_warn("=============================================\n\n");
if (unlock)
read_unlock(&tasklist_lock);
......@@ -4401,10 +4401,10 @@ asmlinkage __visible void lockdep_sys_exit(void)
if (!debug_locks_off())
return;
printk("\n");
printk("================================================\n");
printk("[ BUG: lock held when returning to user space! ]\n");
pr_warn("================================================\n");
pr_warn("WARNING: lock held when returning to user space!\n");
print_kernel_ident();
printk("------------------------------------------------\n");
pr_warn("------------------------------------------------\n");
printk("%s/%d is leaving the kernel with locks still held!\n",
curr->comm, curr->pid);
lockdep_print_held_locks(curr);
......@@ -4421,13 +4421,13 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
#endif /* #ifdef CONFIG_PROVE_RCU_REPEATEDLY */
/* Note: the following can be executed concurrently, so be careful. */
printk("\n");
pr_err("===============================\n");
pr_err("[ ERR: suspicious RCU usage. ]\n");
pr_warn("=============================\n");
pr_warn("WARNING: suspicious RCU usage\n");
print_kernel_ident();
pr_err("-------------------------------\n");
pr_err("%s:%d %s!\n", file, line, s);
pr_err("\nother info that might help us debug this:\n\n");
pr_err("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
pr_warn("-----------------------------\n");
printk("%s:%d %s!\n", file, line, s);
printk("\nother info that might help us debug this:\n\n");
printk("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
!rcu_lockdep_current_cpu_online()
? "RCU used illegally from offline CPU!\n"
: !rcu_is_watching()
......
......@@ -102,10 +102,11 @@ void debug_rt_mutex_print_deadlock(struct rt_mutex_waiter *waiter)
return;
}
printk("\n============================================\n");
printk( "[ BUG: circular locking deadlock detected! ]\n");
printk("%s\n", print_tainted());
printk( "--------------------------------------------\n");
pr_warn("\n");
pr_warn("============================================\n");
pr_warn("WARNING: circular locking deadlock detected!\n");
pr_warn("%s\n", print_tainted());
pr_warn("--------------------------------------------\n");
printk("%s/%d is deadlocking current task %s/%d\n\n",
task->comm, task_pid_nr(task),
current->comm, task_pid_nr(current));
......
......@@ -3,7 +3,9 @@
KCOV_INSTRUMENT := n
obj-y += update.o sync.o
obj-$(CONFIG_SRCU) += srcu.o
obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
obj-$(CONFIG_TREE_SRCU) += srcutree.o
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_TREE_RCU) += tree.o
......
......@@ -56,6 +56,83 @@
#define DYNTICK_TASK_EXIT_IDLE (DYNTICK_TASK_NEST_VALUE + \
DYNTICK_TASK_FLAG)
/*
* Grace-period counter management.
*/
#define RCU_SEQ_CTR_SHIFT 2
#define RCU_SEQ_STATE_MASK ((1 << RCU_SEQ_CTR_SHIFT) - 1)
/*
* Return the counter portion of a sequence number previously returned
* by rcu_seq_snap() or rcu_seq_current().
*/
static inline unsigned long rcu_seq_ctr(unsigned long s)
{
return s >> RCU_SEQ_CTR_SHIFT;
}
/*
* Return the state portion of a sequence number previously returned
* by rcu_seq_snap() or rcu_seq_current().
*/
static inline int rcu_seq_state(unsigned long s)
{
return s & RCU_SEQ_STATE_MASK;
}
/*
* Set the state portion of the pointed-to sequence number.
* The caller is responsible for preventing conflicting updates.
*/
static inline void rcu_seq_set_state(unsigned long *sp, int newstate)
{
WARN_ON_ONCE(newstate & ~RCU_SEQ_STATE_MASK);
WRITE_ONCE(*sp, (*sp & ~RCU_SEQ_STATE_MASK) + newstate);
}
/* Adjust sequence number for start of update-side operation. */
static inline void rcu_seq_start(unsigned long *sp)
{
WRITE_ONCE(*sp, *sp + 1);
smp_mb(); /* Ensure update-side operation after counter increment. */
WARN_ON_ONCE(rcu_seq_state(*sp) != 1);
}
/* Adjust sequence number for end of update-side operation. */
static inline void rcu_seq_end(unsigned long *sp)
{
smp_mb(); /* Ensure update-side operation before counter increment. */
WARN_ON_ONCE(!rcu_seq_state(*sp));
WRITE_ONCE(*sp, (*sp | RCU_SEQ_STATE_MASK) + 1);
}
/* Take a snapshot of the update side's sequence number. */
static inline unsigned long rcu_seq_snap(unsigned long *sp)
{
unsigned long s;
s = (READ_ONCE(*sp) + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
smp_mb(); /* Above access must not bleed into critical section. */
return s;
}
/* Return the current value the update side's sequence number, no ordering. */
static inline unsigned long rcu_seq_current(unsigned long *sp)
{
return READ_ONCE(*sp);
}
/*
* Given a snapshot from rcu_seq_snap(), determine whether or not a
* full update-side operation has occurred.
*/
static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
{
return ULONG_CMP_GE(READ_ONCE(*sp), s);
}
/*
* debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally
* by call_rcu() and rcu callback execution, and are therefore not part of the
......@@ -109,12 +186,12 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
rcu_lock_acquire(&rcu_callback_map);
if (__is_kfree_rcu_offset(offset)) {
RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset));
RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset);)
kfree((void *)head - offset);
rcu_lock_release(&rcu_callback_map);
return true;
} else {
RCU_TRACE(trace_rcu_invoke_callback(rn, head));
RCU_TRACE(trace_rcu_invoke_callback(rn, head);)
head->func(head);
rcu_lock_release(&rcu_callback_map);
return false;
......@@ -144,4 +221,76 @@ void rcu_test_sync_prims(void);
*/
extern void resched_cpu(int cpu);
#if defined(SRCU) || !defined(TINY_RCU)
#include <linux/rcu_node_tree.h>
extern int rcu_num_lvls;
extern int num_rcu_lvl[];
extern int rcu_num_nodes;
static bool rcu_fanout_exact;
static int rcu_fanout_leaf;
/*
* Compute the per-level fanout, either using the exact fanout specified
* or balancing the tree, depending on the rcu_fanout_exact boot parameter.
*/
static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
{
int i;
if (rcu_fanout_exact) {
levelspread[rcu_num_lvls - 1] = rcu_fanout_leaf;
for (i = rcu_num_lvls - 2; i >= 0; i--)
levelspread[i] = RCU_FANOUT;
} else {
int ccur;
int cprv;
cprv = nr_cpu_ids;
for (i = rcu_num_lvls - 1; i >= 0; i--) {
ccur = levelcnt[i];
levelspread[i] = (cprv + ccur - 1) / ccur;
cprv = ccur;
}
}
}
/*
* Do a full breadth-first scan of the rcu_node structures for the
* specified rcu_state structure.
*/
#define rcu_for_each_node_breadth_first(rsp, rnp) \
for ((rnp) = &(rsp)->node[0]; \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
/*
* Do a breadth-first scan of the non-leaf rcu_node structures for the
* specified rcu_state structure. Note that if there is a singleton
* rcu_node tree with but one rcu_node structure, this loop is a no-op.
*/
#define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \
for ((rnp) = &(rsp)->node[0]; \
(rnp) < (rsp)->level[rcu_num_lvls - 1]; (rnp)++)
/*
* Scan the leaves of the rcu_node hierarchy for the specified rcu_state
* structure. Note that if there is a singleton rcu_node tree with but
* one rcu_node structure, this loop -will- visit the rcu_node structure.
* It is still a leaf node, even if it is also the root node.
*/
#define rcu_for_each_leaf_node(rsp, rnp) \
for ((rnp) = (rsp)->level[rcu_num_lvls - 1]; \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
/*
* Iterate over all possible CPUs in a leaf RCU node.
*/
#define for_each_leaf_node_possible_cpu(rnp, cpu) \
for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
cpu <= rnp->grphi; \
cpu = cpumask_next((cpu), cpu_possible_mask))
#endif /* #if defined(SRCU) || !defined(TINY_RCU) */
#endif /* __LINUX_RCU_H */
......@@ -559,19 +559,34 @@ static void srcu_torture_barrier(void)
static void srcu_torture_stats(void)
{
int cpu;
int idx = srcu_ctlp->completed & 0x1;
int __maybe_unused cpu;
int idx;
pr_alert("%s%s per-CPU(idx=%d):",
#if defined(CONFIG_TREE_SRCU) || defined(CONFIG_CLASSIC_SRCU)
#ifdef CONFIG_TREE_SRCU
idx = srcu_ctlp->srcu_idx & 0x1;
#else /* #ifdef CONFIG_TREE_SRCU */
idx = srcu_ctlp->completed & 0x1;
#endif /* #else #ifdef CONFIG_TREE_SRCU */
pr_alert("%s%s Tree SRCU per-CPU(idx=%d):",
torture_type, TORTURE_FLAG, idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
struct srcu_array *counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
#ifdef CONFIG_TREE_SRCU
struct srcu_data *counts;
counts = per_cpu_ptr(srcu_ctlp->sda, cpu);
u0 = counts->srcu_unlock_count[!idx];
u1 = counts->srcu_unlock_count[idx];
#else /* #ifdef CONFIG_TREE_SRCU */
struct srcu_array *counts;
counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
u0 = counts->unlock_count[!idx];
u1 = counts->unlock_count[idx];
#endif /* #else #ifdef CONFIG_TREE_SRCU */
/*
* Make sure that a lock is always counted if the corresponding
......@@ -579,14 +594,26 @@ static void srcu_torture_stats(void)
*/
smp_rmb();
#ifdef CONFIG_TREE_SRCU
l0 = counts->srcu_lock_count[!idx];
l1 = counts->srcu_lock_count[idx];
#else /* #ifdef CONFIG_TREE_SRCU */
l0 = counts->lock_count[!idx];
l1 = counts->lock_count[idx];
#endif /* #else #ifdef CONFIG_TREE_SRCU */
c0 = l0 - u0;
c1 = l1 - u1;
pr_cont(" %d(%ld,%ld)", cpu, c0, c1);
}
pr_cont("\n");
#elif defined(CONFIG_TINY_SRCU)
idx = READ_ONCE(srcu_ctlp->srcu_idx) & 0x1;
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%d,%d)\n",
torture_type, TORTURE_FLAG, idx,
READ_ONCE(srcu_ctlp->srcu_lock_nesting[!idx]),
READ_ONCE(srcu_ctlp->srcu_lock_nesting[idx]));
#endif
}
static void srcu_torture_synchronize_expedited(void)
......
......@@ -22,7 +22,7 @@
* Lai Jiangshan <laijs@cn.fujitsu.com>
*
* For detailed explanation of Read-Copy Update mechanism see -
* Documentation/RCU/ *.txt
* Documentation/RCU/ *.txt
*
*/
......@@ -243,8 +243,14 @@ static bool srcu_readers_active(struct srcu_struct *sp)
* cleanup_srcu_struct - deconstruct a sleep-RCU structure
* @sp: structure to clean up.
*
* Must invoke this after you are finished using a given srcu_struct that
* was initialized via init_srcu_struct(), else you leak memory.
* Must invoke this only after you are finished using a given srcu_struct
* that was initialized via init_srcu_struct(). This code does some
* probabalistic checking, spotting late uses of srcu_read_lock(),
* synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu().
* If any such late uses are detected, the per-CPU memory associated with
* the srcu_struct is simply leaked and WARN_ON() is invoked. If the
* caller frees the srcu_struct itself, a use-after-free crash will likely
* ensue, but at least there will be a warning printed.
*/
void cleanup_srcu_struct(struct srcu_struct *sp)
{
......
/*
* Sleepable Read-Copy Update mechanism for mutual exclusion,
* tiny version for non-preemptible single-CPU use.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright (C) IBM Corporation, 2017
*
* Author: Paul McKenney <paulmck@us.ibm.com>
*/
#include <linux/export.h>
#include <linux/mutex.h>
#include <linux/preempt.h>
#include <linux/rcupdate_wait.h>
#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/srcu.h>
#include <linux/rcu_node_tree.h>
#include "rcu.h"
static int init_srcu_struct_fields(struct srcu_struct *sp)
{
sp->srcu_lock_nesting[0] = 0;
sp->srcu_lock_nesting[1] = 0;
init_swait_queue_head(&sp->srcu_wq);
sp->srcu_gp_seq = 0;
rcu_segcblist_init(&sp->srcu_cblist);
sp->srcu_gp_running = false;
sp->srcu_gp_waiting = false;
sp->srcu_idx = 0;
INIT_WORK(&sp->srcu_work, srcu_drive_gp);
return 0;
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
int __init_srcu_struct(struct srcu_struct *sp, const char *name,
struct lock_class_key *key)
{
/* Don't re-initialize a lock while it is held. */
debug_check_no_locks_freed((void *)sp, sizeof(*sp));
lockdep_init_map(&sp->dep_map, name, key, 0);
return init_srcu_struct_fields(sp);
}
EXPORT_SYMBOL_GPL(__init_srcu_struct);
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/*
* init_srcu_struct - initialize a sleep-RCU structure
* @sp: structure to initialize.
*
* Must invoke this on a given srcu_struct before passing that srcu_struct
* to any other function. Each srcu_struct represents a separate domain
* of SRCU protection.
*/
int init_srcu_struct(struct srcu_struct *sp)
{
return init_srcu_struct_fields(sp);
}
EXPORT_SYMBOL_GPL(init_srcu_struct);
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/*
* cleanup_srcu_struct - deconstruct a sleep-RCU structure
* @sp: structure to clean up.
*
* Must invoke this after you are finished using a given srcu_struct that
* was initialized via init_srcu_struct(), else you leak memory.
*/
void cleanup_srcu_struct(struct srcu_struct *sp)
{
WARN_ON(sp->srcu_lock_nesting[0] || sp->srcu_lock_nesting[1]);
flush_work(&sp->srcu_work);
WARN_ON(rcu_seq_state(sp->srcu_gp_seq));
WARN_ON(sp->srcu_gp_running);
WARN_ON(sp->srcu_gp_waiting);
WARN_ON(!rcu_segcblist_empty(&sp->srcu_cblist));
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
/*
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Must be called from process context.
* Returns an index that must be passed to the matching srcu_read_unlock().
*/
int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;
idx = READ_ONCE(sp->srcu_idx);
WRITE_ONCE(sp->srcu_lock_nesting[idx], sp->srcu_lock_nesting[idx] + 1);
return idx;
}
EXPORT_SYMBOL_GPL(__srcu_read_lock);
/*
* Removes the count for the old reader from the appropriate element of
* the srcu_struct. Must be called from process context.
*/
void __srcu_read_unlock(struct srcu_struct *sp, int idx)
{
int newval = sp->srcu_lock_nesting[idx] - 1;
WRITE_ONCE(sp->srcu_lock_nesting[idx], newval);
if (!newval && READ_ONCE(sp->srcu_gp_waiting))
swake_up(&sp->srcu_wq);
}
EXPORT_SYMBOL_GPL(__srcu_read_unlock);
/*
* Workqueue handler to drive one grace period and invoke any callbacks
* that become ready as a result. Single-CPU and !PREEMPT operation
* means that we get away with murder on synchronization. ;-)
*/
void srcu_drive_gp(struct work_struct *wp)
{
int idx;
struct rcu_cblist ready_cbs;
struct srcu_struct *sp;
struct rcu_head *rhp;
sp = container_of(wp, struct srcu_struct, srcu_work);
if (sp->srcu_gp_running || rcu_segcblist_empty(&sp->srcu_cblist))
return; /* Already running or nothing to do. */
/* Tag recently arrived callbacks and wait for readers. */
WRITE_ONCE(sp->srcu_gp_running, true);
rcu_segcblist_accelerate(&sp->srcu_cblist,
rcu_seq_snap(&sp->srcu_gp_seq));
rcu_seq_start(&sp->srcu_gp_seq);
idx = sp->srcu_idx;
WRITE_ONCE(sp->srcu_idx, !sp->srcu_idx);
WRITE_ONCE(sp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
swait_event(sp->srcu_wq, !READ_ONCE(sp->srcu_lock_nesting[idx]));
WRITE_ONCE(sp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
rcu_seq_end(&sp->srcu_gp_seq);
/* Update callback list based on GP, and invoke ready callbacks. */
rcu_segcblist_advance(&sp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
if (rcu_segcblist_ready_cbs(&sp->srcu_cblist)) {
rcu_cblist_init(&ready_cbs);
local_irq_disable();
rcu_segcblist_extract_done_cbs(&sp->srcu_cblist, &ready_cbs);
local_irq_enable();
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
}
local_irq_disable();
rcu_segcblist_insert_count(&sp->srcu_cblist, &ready_cbs);
local_irq_enable();
}
WRITE_ONCE(sp->srcu_gp_running, false);
/*
* If more callbacks, reschedule ourselves. This can race with
* a call_srcu() at interrupt level, but the ->srcu_gp_running
* checks will straighten that out.
*/
if (!rcu_segcblist_empty(&sp->srcu_cblist))
schedule_work(&sp->srcu_work);
}
EXPORT_SYMBOL_GPL(srcu_drive_gp);
/*
* Enqueue an SRCU callback on the specified srcu_struct structure,
* initiating grace-period processing if it is not already running.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
rcu_callback_t func)
{
unsigned long flags;
head->func = func;
local_irq_save(flags);
rcu_segcblist_enqueue(&sp->srcu_cblist, head, false);
local_irq_restore(flags);
if (!READ_ONCE(sp->srcu_gp_running))
schedule_work(&sp->srcu_work);
}
EXPORT_SYMBOL_GPL(call_srcu);
/*
* synchronize_srcu - wait for prior SRCU read-side critical-section completion
*/
void synchronize_srcu(struct srcu_struct *sp)
{
struct rcu_synchronize rs;
init_rcu_head_on_stack(&rs.head);
init_completion(&rs.completion);
call_srcu(sp, &rs.head, wakeme_after_rcu);
wait_for_completion(&rs.completion);
destroy_rcu_head_on_stack(&rs.head);
}
EXPORT_SYMBOL_GPL(synchronize_srcu);
This diff is collapsed.
......@@ -79,7 +79,7 @@ EXPORT_SYMBOL(__rcu_is_watching);
*/
static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
{
RCU_TRACE(reset_cpu_stall_ticks(rcp));
RCU_TRACE(reset_cpu_stall_ticks(rcp);)
if (rcp->donetail != rcp->curtail) {
rcp->donetail = rcp->curtail;
return 1;
......@@ -125,7 +125,7 @@ void rcu_bh_qs(void)
*/
void rcu_check_callbacks(int user)
{
RCU_TRACE(check_cpu_stalls());
RCU_TRACE(check_cpu_stalls();)
if (user)
rcu_sched_qs();
else if (!in_softirq())
......@@ -143,7 +143,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
const char *rn = NULL;
struct rcu_head *next, *list;
unsigned long flags;
RCU_TRACE(int cb_count = 0);
RCU_TRACE(int cb_count = 0;)
/* Move the ready-to-invoke callbacks to a local list. */
local_irq_save(flags);
......@@ -152,7 +152,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
local_irq_restore(flags);
return;
}
RCU_TRACE(trace_rcu_batch_start(rcp->name, 0, rcp->qlen, -1));
RCU_TRACE(trace_rcu_batch_start(rcp->name, 0, rcp->qlen, -1);)
list = rcp->rcucblist;
rcp->rcucblist = *rcp->donetail;
*rcp->donetail = NULL;
......@@ -162,7 +162,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
local_irq_restore(flags);
/* Invoke the callbacks on the local list. */
RCU_TRACE(rn = rcp->name);
RCU_TRACE(rn = rcp->name;)
while (list) {
next = list->next;
prefetch(next);
......@@ -171,9 +171,9 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
__rcu_reclaim(rn, list);
local_bh_enable();
list = next;
RCU_TRACE(cb_count++);
RCU_TRACE(cb_count++;)
}
RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count));
RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count);)
RCU_TRACE(trace_rcu_batch_end(rcp->name,
cb_count, 0, need_resched(),
is_idle_task(current),
......@@ -221,7 +221,7 @@ static void __call_rcu(struct rcu_head *head,
local_irq_save(flags);
*rcp->curtail = head;
rcp->curtail = &head->next;
RCU_TRACE(rcp->qlen++);
RCU_TRACE(rcp->qlen++;)
local_irq_restore(flags);
if (unlikely(is_idle_task(current))) {
......@@ -254,8 +254,8 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
RCU_TRACE(reset_cpu_stall_ticks(&rcu_sched_ctrlblk));
RCU_TRACE(reset_cpu_stall_ticks(&rcu_bh_ctrlblk));
RCU_TRACE(reset_cpu_stall_ticks(&rcu_sched_ctrlblk);)
RCU_TRACE(reset_cpu_stall_ticks(&rcu_bh_ctrlblk);)
rcu_early_boot_tests();
}
......@@ -52,7 +52,7 @@ static struct rcu_ctrlblk rcu_bh_ctrlblk = {
RCU_TRACE(.name = "rcu_bh")
};
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU)
#include <linux/kernel_stat.h>
int rcu_scheduler_active __read_mostly;
......@@ -65,15 +65,16 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
* to RCU_SCHEDULER_RUNNING, skipping the RCU_SCHEDULER_INIT stage.
* The reason for this is that Tiny RCU does not need kthreads, so does
* not have to care about the fact that the scheduler is half-initialized
* at a certain phase of the boot process.
* at a certain phase of the boot process. Unless SRCU is in the mix.
*/
void __init rcu_scheduler_starting(void)
{
WARN_ON(nr_context_switches() > 0);
rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
rcu_scheduler_active = IS_ENABLED(CONFIG_SRCU)
? RCU_SCHEDULER_INIT : RCU_SCHEDULER_RUNNING;
}
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
#ifdef CONFIG_RCU_TRACE
......@@ -162,8 +163,8 @@ static void reset_cpu_stall_ticks(struct rcu_ctrlblk *rcp)
static void check_cpu_stalls(void)
{
RCU_TRACE(check_cpu_stall(&rcu_bh_ctrlblk));
RCU_TRACE(check_cpu_stall(&rcu_sched_ctrlblk));
RCU_TRACE(check_cpu_stall(&rcu_bh_ctrlblk);)
RCU_TRACE(check_cpu_stall(&rcu_sched_ctrlblk);)
}
#endif /* #ifdef CONFIG_RCU_TRACE */
This diff is collapsed.
......@@ -30,80 +30,8 @@
#include <linux/seqlock.h>
#include <linux/swait.h>
#include <linux/stop_machine.h>
/*
* Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
* CONFIG_RCU_FANOUT_LEAF.
* In theory, it should be possible to add more levels straightforwardly.
* In practice, this did work well going from three levels to four.
* Of course, your mileage may vary.
*/
#ifdef CONFIG_RCU_FANOUT
#define RCU_FANOUT CONFIG_RCU_FANOUT
#else /* #ifdef CONFIG_RCU_FANOUT */
# ifdef CONFIG_64BIT
# define RCU_FANOUT 64
# else
# define RCU_FANOUT 32
# endif
#endif /* #else #ifdef CONFIG_RCU_FANOUT */
#ifdef CONFIG_RCU_FANOUT_LEAF
#define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
#else /* #ifdef CONFIG_RCU_FANOUT_LEAF */
# ifdef CONFIG_64BIT
# define RCU_FANOUT_LEAF 64
# else
# define RCU_FANOUT_LEAF 32
# endif
#endif /* #else #ifdef CONFIG_RCU_FANOUT_LEAF */
#define RCU_FANOUT_1 (RCU_FANOUT_LEAF)
#define RCU_FANOUT_2 (RCU_FANOUT_1 * RCU_FANOUT)
#define RCU_FANOUT_3 (RCU_FANOUT_2 * RCU_FANOUT)
#define RCU_FANOUT_4 (RCU_FANOUT_3 * RCU_FANOUT)
#if NR_CPUS <= RCU_FANOUT_1
# define RCU_NUM_LVLS 1
# define NUM_RCU_LVL_0 1
# define NUM_RCU_NODES NUM_RCU_LVL_0
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0 }
# define RCU_NODE_NAME_INIT { "rcu_node_0" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0" }
#elif NR_CPUS <= RCU_FANOUT_2
# define RCU_NUM_LVLS 2
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1" }
#elif NR_CPUS <= RCU_FANOUT_3
# define RCU_NUM_LVLS 3
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
#elif NR_CPUS <= RCU_FANOUT_4
# define RCU_NUM_LVLS 4
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3)
# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
# define NUM_RCU_LVL_3 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
#else
# error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
#endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
extern int rcu_num_lvls;
extern int rcu_num_nodes;
#include <linux/rcu_segcblist.h>
#include <linux/rcu_node_tree.h>
/*
* Dynticks per-CPU state.
......@@ -113,6 +41,9 @@ struct rcu_dynticks {
/* Process level is worth LLONG_MAX/2. */
int dynticks_nmi_nesting; /* Track NMI nesting level. */
atomic_t dynticks; /* Even value for idle, else odd. */
bool rcu_need_heavy_qs; /* GP old, need heavy quiescent state. */
unsigned long rcu_qs_ctr; /* Light universal quiescent state ctr. */
bool rcu_urgent_qs; /* GP old need light quiescent state. */
#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
long long dynticks_idle_nesting;
/* irq/process nesting level from idle. */
......@@ -261,41 +192,6 @@ struct rcu_node {
*/
#define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))
/*
* Do a full breadth-first scan of the rcu_node structures for the
* specified rcu_state structure.
*/
#define rcu_for_each_node_breadth_first(rsp, rnp) \
for ((rnp) = &(rsp)->node[0]; \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
/*
* Do a breadth-first scan of the non-leaf rcu_node structures for the
* specified rcu_state structure. Note that if there is a singleton
* rcu_node tree with but one rcu_node structure, this loop is a no-op.
*/
#define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \
for ((rnp) = &(rsp)->node[0]; \
(rnp) < (rsp)->level[rcu_num_lvls - 1]; (rnp)++)
/*
* Scan the leaves of the rcu_node hierarchy for the specified rcu_state
* structure. Note that if there is a singleton rcu_node tree with but
* one rcu_node structure, this loop -will- visit the rcu_node structure.
* It is still a leaf node, even if it is also the root node.
*/
#define rcu_for_each_leaf_node(rsp, rnp) \
for ((rnp) = (rsp)->level[rcu_num_lvls - 1]; \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
/*
* Iterate over all possible CPUs in a leaf RCU node.
*/
#define for_each_leaf_node_possible_cpu(rnp, cpu) \
for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
cpu <= rnp->grphi; \
cpu = cpumask_next((cpu), cpu_possible_mask))
/*
* Union to allow "aggregate OR" operation on the need for a quiescent
* state by the normal and expedited grace periods.
......@@ -336,34 +232,9 @@ struct rcu_data {
/* period it is aware of. */
/* 2) batch handling */
/*
* If nxtlist is not NULL, it is partitioned as follows.
* Any of the partitions might be empty, in which case the
* pointer to that partition will be equal to the pointer for
* the following partition. When the list is empty, all of
* the nxttail elements point to the ->nxtlist pointer itself,
* which in that case is NULL.
*
* [nxtlist, *nxttail[RCU_DONE_TAIL]):
* Entries that batch # <= ->completed
* The grace period for these entries has completed, and
* the other grace-period-completed entries may be moved
* here temporarily in rcu_process_callbacks().
* [*nxttail[RCU_DONE_TAIL], *nxttail[RCU_WAIT_TAIL]):
* Entries that batch # <= ->completed - 1: waiting for current GP
* [*nxttail[RCU_WAIT_TAIL], *nxttail[RCU_NEXT_READY_TAIL]):
* Entries known to have arrived before current GP ended
* [*nxttail[RCU_NEXT_READY_TAIL], *nxttail[RCU_NEXT_TAIL]):
* Entries that might have arrived after current GP ended
* Note that the value of *nxttail[RCU_NEXT_TAIL] will
* always be NULL, as this is the end of the list.
*/
struct rcu_head *nxtlist;
struct rcu_head **nxttail[RCU_NEXT_SIZE];
unsigned long nxtcompleted[RCU_NEXT_SIZE];
/* grace periods for sublists. */
long qlen_lazy; /* # of lazy queued callbacks */
long qlen; /* # of queued callbacks, incl lazy */
struct rcu_segcblist cblist; /* Segmented callback list, with */
/* different callbacks waiting for */
/* different grace periods. */
long qlen_last_fqs_check;
/* qlen at last check for QS forcing */
unsigned long n_cbs_invoked; /* count of RCU cbs invoked. */
......@@ -482,7 +353,6 @@ struct rcu_state {
struct rcu_node *level[RCU_NUM_LVLS + 1];
/* Hierarchy levels (+1 to */
/* shut bogus gcc warning) */
u8 flavor_mask; /* bit in flavor mask. */
struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */
call_rcu_func_t call; /* call_rcu() flavor. */
int ncpus; /* # CPUs seen so far. */
......@@ -502,14 +372,11 @@ struct rcu_state {
raw_spinlock_t orphan_lock ____cacheline_internodealigned_in_smp;
/* Protect following fields. */
struct rcu_head *orphan_nxtlist; /* Orphaned callbacks that */
struct rcu_cblist orphan_pend; /* Orphaned callbacks that */
/* need a grace period. */
struct rcu_head **orphan_nxttail; /* Tail of above. */
struct rcu_head *orphan_donelist; /* Orphaned callbacks that */
struct rcu_cblist orphan_done; /* Orphaned callbacks that */
/* are ready to invoke. */
struct rcu_head **orphan_donetail; /* Tail of above. */
long qlen_lazy; /* Number of lazy callbacks. */
long qlen; /* Total number of callbacks. */
/* (Contains counts.) */
/* End of fields guarded by orphan_lock. */
struct mutex barrier_mutex; /* Guards barrier fields. */
......@@ -596,6 +463,7 @@ extern struct rcu_state rcu_preempt_state;
#endif /* #ifdef CONFIG_PREEMPT_RCU */
int rcu_dynticks_snap(struct rcu_dynticks *rdtp);
bool rcu_eqs_special_set(int cpu);
#ifdef CONFIG_RCU_BOOST
DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
......@@ -673,6 +541,14 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
static void rcu_dynticks_task_enter(void);
static void rcu_dynticks_task_exit(void);
#ifdef CONFIG_SRCU
void srcu_online_cpu(unsigned int cpu);
void srcu_offline_cpu(unsigned int cpu);
#else /* #ifdef CONFIG_SRCU */
void srcu_online_cpu(unsigned int cpu) { }
void srcu_offline_cpu(unsigned int cpu) { }
#endif /* #else #ifdef CONFIG_SRCU */
#endif /* #ifndef RCU_TREE_NONCORE */
#ifdef CONFIG_RCU_TRACE
......
......@@ -292,7 +292,7 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
rnp->grplo, rnp->grphi,
TPS("wait"));
wait_event(rnp->exp_wq[(s >> 1) & 0x3],
wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
sync_exp_work_done(rsp,
&rdp->exp_workdone2, s));
return true;
......@@ -331,6 +331,8 @@ static void sync_sched_exp_handler(void *data)
return;
}
__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
/* Store .exp before .rcu_urgent_qs. */
smp_store_release(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs), true);
resched_cpu(smp_processor_id());
}
......@@ -531,7 +533,8 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
rnp->exp_seq_rq = s;
spin_unlock(&rnp->exp_lock);
}
wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
smp_mb(); /* All above changes before wakeup. */
wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
}
trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
mutex_unlock(&rsp->exp_wake_mutex);
......@@ -609,9 +612,9 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
/* Wait for expedited grace period to complete. */
rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
rnp = rcu_get_root(rsp);
wait_event(rnp->exp_wq[(s >> 1) & 0x3],
sync_exp_work_done(rsp,
&rdp->exp_workdone0, s));
wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
sync_exp_work_done(rsp, &rdp->exp_workdone0, s));
smp_mb(); /* Workqueue actions happen before return. */
/* Let the next expedited grace period start. */
mutex_unlock(&rsp->exp_mutex);
......@@ -735,15 +738,3 @@ void synchronize_rcu_expedited(void)
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
/*
* Switch to run-time mode once Tree RCU has fully initialized.
*/
static int __init rcu_exp_runtime_mode(void)
{
rcu_test_sync_prims();
rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
rcu_test_sync_prims();
return 0;
}
core_initcall(rcu_exp_runtime_mode);
This diff is collapsed.
......@@ -41,11 +41,11 @@
#include <linux/mutex.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
#include <linux/prefetch.h>
#define RCU_TREE_NONCORE
#include "tree.h"
DECLARE_PER_CPU_SHARED_ALIGNED(unsigned long, rcu_qs_ctr);
#include "rcu.h"
static int r_open(struct inode *inode, struct file *file,
const struct seq_operations *op)
......@@ -121,7 +121,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
cpu_is_offline(rdp->cpu) ? '!' : ' ',
ulong2long(rdp->completed), ulong2long(rdp->gpnum),
rdp->cpu_no_qs.b.norm,
rdp->rcu_qs_ctr_snap == per_cpu(rcu_qs_ctr, rdp->cpu),
rdp->rcu_qs_ctr_snap == per_cpu(rdp->dynticks->rcu_qs_ctr, rdp->cpu),
rdp->core_needs_qs);
seq_printf(m, " dt=%d/%llx/%d df=%lu",
rcu_dynticks_snap(rdp->dynticks),
......@@ -130,17 +130,15 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
rdp->dynticks_fqs);
seq_printf(m, " of=%lu", rdp->offline_fqs);
rcu_nocb_q_lengths(rdp, &ql, &qll);
qll += rdp->qlen_lazy;
ql += rdp->qlen;
qll += rcu_segcblist_n_lazy_cbs(&rdp->cblist);
ql += rcu_segcblist_n_cbs(&rdp->cblist);
seq_printf(m, " ql=%ld/%ld qs=%c%c%c%c",
qll, ql,
".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] !=
rdp->nxttail[RCU_NEXT_TAIL]],
".R"[rdp->nxttail[RCU_WAIT_TAIL] !=
rdp->nxttail[RCU_NEXT_READY_TAIL]],
".W"[rdp->nxttail[RCU_DONE_TAIL] !=
rdp->nxttail[RCU_WAIT_TAIL]],
".D"[&rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL]]);
".N"[!rcu_segcblist_segempty(&rdp->cblist, RCU_NEXT_TAIL)],
".R"[!rcu_segcblist_segempty(&rdp->cblist,
RCU_NEXT_READY_TAIL)],
".W"[!rcu_segcblist_segempty(&rdp->cblist, RCU_WAIT_TAIL)],
".D"[!rcu_segcblist_segempty(&rdp->cblist, RCU_DONE_TAIL)]);
#ifdef CONFIG_RCU_BOOST
seq_printf(m, " kt=%d/%c ktl=%x",
per_cpu(rcu_cpu_has_work, rdp->cpu),
......@@ -278,7 +276,9 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
seq_printf(m, "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld/%ld\n",
rsp->n_force_qs, rsp->n_force_qs_ngp,
rsp->n_force_qs - rsp->n_force_qs_ngp,
READ_ONCE(rsp->n_force_qs_lh), rsp->qlen_lazy, rsp->qlen);
READ_ONCE(rsp->n_force_qs_lh),
rcu_cblist_n_lazy_cbs(&rsp->orphan_done),
rcu_cblist_n_cbs(&rsp->orphan_done));
for (rnp = &rsp->node[0]; rnp - &rsp->node[0] < rcu_num_nodes; rnp++) {
if (rnp->level != level) {
seq_puts(m, "\n");
......
This diff is collapsed.
......@@ -3378,7 +3378,7 @@ static void __sched notrace __schedule(bool preempt)
hrtick_clear(rq);
local_irq_disable();
rcu_note_context_switch();
rcu_note_context_switch(preempt);
/*
* Make sure that signal_pending_state()->signal_pending() below
......
......@@ -1237,7 +1237,7 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
}
/*
* This sighand can be already freed and even reused, but
* we rely on SLAB_DESTROY_BY_RCU and sighand_ctor() which
* we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which
* initializes ->siglock: this slab can't go away, it has
* the same object type, ->siglock can't be reinitialized.
*
......
This diff is collapsed.
......@@ -95,7 +95,7 @@ void kmemcheck_slab_alloc(struct kmem_cache *s, gfp_t gfpflags, void *object,
void kmemcheck_slab_free(struct kmem_cache *s, void *object, size_t size)
{
/* TODO: RCU freeing is unsupported for now; hide false positives. */
if (!s->ctor && !(s->flags & SLAB_DESTROY_BY_RCU))
if (!s->ctor && !(s->flags & SLAB_TYPESAFE_BY_RCU))
kmemcheck_mark_freed(object, size);
}
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment