1. 14 Aug, 2023 1 commit
    • rcu-tasks: Fix boot-time RCU tasks debug-only deadlock · 9d0cce2b
      Paul E. McKenney authored
      In kernels built with CONFIG_PROVE_RCU=y (for example, lockdep kernels),
      the following sequence of events can occur:
      
      o	rcu_init_tasks_generic() is invoked just before init is spawned.
      	It invokes rcu_spawn_tasks_kthread() and friends.
      
      o	rcu_spawn_tasks_kthread() invokes rcu_spawn_tasks_kthread_generic(),
      	which uses kthread_run() to create the needed kthread.
      
      o	Control returns to rcu_init_tasks_generic(), which, because this
      	is a CONFIG_PROVE_RCU=y kernel, invokes the version of the
      	rcu_tasks_initiate_self_tests() function that actually does
      	something, including invoking synchronize_rcu_tasks(), which
      	in turn invokes synchronize_rcu_tasks_generic().
      
      o	synchronize_rcu_tasks_generic() sees that the ->kthread_ptr is
      	still NULL, because the newly spawned kthread has not yet
      	started.
      
      o	The new kthread starts, preempting synchronize_rcu_tasks_generic()
      	just after its check.  This kthread invokes rcu_tasks_one_gp(),
      	which acquires ->tasks_gp_mutex, and, seeing no work, blocks
      	in rcuwait_wait_event().  Note that this step requires either
      	a preemptible kernel or a fault-injection-style sleep at the
      	beginning of mutex_lock().
      
      o	synchronize_rcu_tasks_generic() resumes and invokes rcu_tasks_one_gp().
      
      o	rcu_tasks_one_gp() attempts to acquire ->tasks_gp_mutex, which
      	is still held by the newly spawned kthread's rcu_tasks_one_gp()
      	function.  Deadlock.
      
      Because the only reason for ->tasks_gp_mutex is to handle pre-kthread
      synchronous grace periods, this commit avoids this deadlock by having
      rcu_tasks_one_gp() momentarily release ->tasks_gp_mutex while invoking
      rcuwait_wait_event().  This allows the call to rcu_tasks_one_gp() from
      synchronize_rcu_tasks_generic() to proceed.
      
      Note that it is not necessary to release the mutex anywhere else in
      rcu_tasks_one_gp() because rcuwait_wait_event() is the only function
      that can block indefinitely.
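
The fix described above can be illustrated with a minimal userspace sketch. This is not the kernel code: it assumes POSIX threads, and `struct tasks_model`, `model_kthread()`, and `run_model()` are hypothetical names invented for this illustration. A pthread mutex stands in for ->tasks_gp_mutex and a condition variable stands in for rcuwait_wait_event().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant parts of struct rcu_tasks. */
struct tasks_model {
	pthread_mutex_t gp_mutex;	/* plays the role of ->tasks_gp_mutex */
	pthread_mutex_t wait_lock;	/* protects the two flags below */
	pthread_cond_t wait_cv;		/* plays the role of the rcuwait */
	bool kthread_waiting;
	bool have_work;
	int gps_done;			/* protected by gp_mutex */
};

/* One pass of the modeled grace-period kthread, with the fix applied. */
static void *model_kthread(void *arg)
{
	struct tasks_model *m = arg;

	pthread_mutex_lock(&m->gp_mutex);
	/* No work yet: drop gp_mutex across the indefinite wait. */
	pthread_mutex_unlock(&m->gp_mutex);

	pthread_mutex_lock(&m->wait_lock);
	m->kthread_waiting = true;
	pthread_cond_broadcast(&m->wait_cv);
	while (!m->have_work)
		pthread_cond_wait(&m->wait_cv, &m->wait_lock);
	assert(m->have_work);
	pthread_mutex_unlock(&m->wait_lock);

	/* Work arrived: reacquire gp_mutex for the grace period itself. */
	pthread_mutex_lock(&m->gp_mutex);
	m->gps_done++;
	pthread_mutex_unlock(&m->gp_mutex);
	return NULL;
}

/* Returns the number of modeled grace periods, or -1 if gp_mutex was held. */
static int run_model(void)
{
	struct tasks_model m = {
		.gp_mutex = PTHREAD_MUTEX_INITIALIZER,
		.wait_lock = PTHREAD_MUTEX_INITIALIZER,
		.wait_cv = PTHREAD_COND_INITIALIZER,
	};
	pthread_t tid;
	int ret = -1;

	if (pthread_create(&tid, NULL, model_kthread, &m))
		return -1;

	/* Wait until the modeled kthread is parked in its wait. */
	pthread_mutex_lock(&m.wait_lock);
	while (!m.kthread_waiting)
		pthread_cond_wait(&m.wait_cv, &m.wait_lock);
	pthread_mutex_unlock(&m.wait_lock);

	/* Synchronous caller: gp_mutex must be free while the kthread waits. */
	if (pthread_mutex_trylock(&m.gp_mutex) == 0) {
		m.gps_done++;
		pthread_mutex_unlock(&m.gp_mutex);
		ret = 0;
	}

	/* Hand the modeled kthread some work so that it can finish. */
	pthread_mutex_lock(&m.wait_lock);
	m.have_work = true;
	pthread_cond_broadcast(&m.wait_cv);
	pthread_mutex_unlock(&m.wait_lock);
	pthread_join(tid, NULL);

	return ret ? ret : m.gps_done;
}
```

Calling run_model() from a test program built with -pthread should report two modeled grace periods: one from the synchronous caller and one from the kthread. Had the kthread kept gp_mutex held across its wait, the trylock would fail, which is the userspace analogue of the deadlock.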

      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Reported-by: Roy Hopkins <rhopkins@suse.de>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Tested-by: Roy Hopkins <rhopkins@suse.de>
  2. 31 Jul, 2023 1 commit
    • rcu-tasks: Permit use of debug-objects with RCU Tasks flavors · cb88f7f5
      Paul E. McKenney authored
      Currently, cblist_init_generic() holds a raw spinlock when invoking
      INIT_WORK().  This fails in kernels built with CONFIG_DEBUG_OBJECTS=y
      due to memory allocation being forbidden while holding a raw spinlock.
      But the only reason for holding the raw spinlock is to synchronize
      with early boot calls to call_rcu_tasks(), call_rcu_tasks_rude(), and,
      last but not least, call_rcu_tasks_trace().  These calls also invoke
      cblist_init_generic() in order to support early boot queueing of
      callbacks.
      
      Except that there are no early boot calls to any of these three
      functions, and the BPF guys confirm that they have no plans to add any
      such calls.
      
      This commit therefore removes the synchronization and adds a
      WARN_ON_ONCE() to catch the case of now-prohibited early boot RCU Tasks
      callback queueing.
      
      If early boot queueing is needed, an "initialized" flag may be added to
      the rcu_tasks structure.  Then queueing a callback before this flag is set
      would initialize the callback list (if needed) and queue the callback.
      The decision as to where to queue the callback given the possibility of
      non-zero boot CPUs is left as an exercise for the reader.
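
The warn-once check described above can be sketched in userspace as follows. All names here (`struct cblist_model`, `model_warn_on_once()`, `model_call_rcu_tasks()`) are hypothetical, and the choice to initialize the list after warning is an assumption of this sketch rather than something stated in the commit message. Note also that the kernel's WARN_ON_ONCE() keeps its state per call site, whereas this simplified helper keeps a single shared flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a callback list and its initialization state. */
struct cblist_model {
	bool enabled;	/* set once the list has been initialized */
	int n_cbs;
};

static int model_warnings;	/* number of warnings emitted so far */

/* Simplified approximation of WARN_ON_ONCE(): report only the first hit. */
static bool model_warn_on_once(bool cond, const char *msg)
{
	static bool warned;	/* one shared flag, unlike the kernel macro */

	if (cond && !warned) {
		warned = true;
		model_warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

static void model_cblist_init(struct cblist_model *l)
{
	l->enabled = true;
	l->n_cbs = 0;
}

/* Queue one callback, complaining if the list was not yet initialized. */
static void model_call_rcu_tasks(struct cblist_model *l)
{
	assert(l != NULL);
	/* Assumption: recover by initializing the list after warning. */
	if (model_warn_on_once(!l->enabled, "callback queued before initialization"))
		model_cblist_init(l);
	l->n_cbs++;
}
```

In this model, queueing onto an initialized list is silent, while the first attempt to queue onto an uninitialized list emits a single warning and later attempts do not repeat it.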

      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  3. 24 Jul, 2023 1 commit
    • checkpatch: Complain about unexpected uses of RCU Tasks Trace · 84dd7f19
      Paul E. McKenney authored
      RCU Tasks Trace is quite specialized, having been created specifically
      for sleepable BPF programs.  Because it allows general blocking within
      readers, any new use of RCU Tasks Trace must take current use cases into
      account.  Therefore, update checkpatch.pl to complain about use of any of
      the RCU Tasks Trace API members outside of BPF and outside of RCU itself.
      
      [ paulmck: Apply Joe Perches feedback. ]
      
      Cc: Andy Whitcroft <apw@canonical.com> (maintainer:CHECKPATCH)
      Cc: Joe Perches <joe@perches.com> (maintainer:CHECKPATCH)
      Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> (reviewer:CHECKPATCH)
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: <bpf@vger.kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  4. 14 Jul, 2023 4 commits
    • rcu-tasks: Cancel callback laziness if too many callbacks · db13710a
      Paul E. McKenney authored
      The various RCU Tasks flavors now do lazy grace periods when there are
      only asynchronous grace period requests.  By default, the system will let
      250 milliseconds elapse after the first call_rcu_tasks*() callback is
      queued before starting a grace period.  In contrast, synchronous grace
      period requests such as synchronize_rcu_tasks*() will start a grace
      period immediately.
      
      However, invoking one of the call_rcu_tasks*() functions in a too-tight
      loop can result in a callback flood, which in turn can exhaust memory
      if grace periods are delayed for too long.
      
      This commit therefore sets a limit: when any CPU's callback list
      expands to contain rcupdate.rcu_task_lazy_lim callbacks (defaulting
      to 32, set to -1 to disable), the grace-period kthread will be
      awakened, thus cancelling any ongoing laziness and getting out in
      front of the potential callback flood.
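
The limit check can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct percpu_model`, `model_enqueue_async()`, and `first_wake_count()` are hypothetical names, and the model wakes the kthread at the moment the per-CPU count reaches the limit, treating any negative limit as "disabled".

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LAZY_LIM_DEFAULT 32	/* default stated in the commit message */

/* Hypothetical per-CPU state: just a callback count. */
struct percpu_model {
	int n_cbs;
};

/*
 * Queue one asynchronous callback.  Returns true if this enqueue should
 * awaken the grace-period kthread, cancelling any laziness in progress.
 * A negative lazy_lim disables the check.
 */
static bool model_enqueue_async(struct percpu_model *p, int lazy_lim)
{
	p->n_cbs++;
	return lazy_lim >= 0 && p->n_cbs == lazy_lim;
}

/*
 * Enqueue n callbacks back to back.  Returns the callback count at which
 * the first wakeup was requested, or 0 if no wakeup was requested.
 */
static int first_wake_count(int n, int lazy_lim)
{
	struct percpu_model p = { .n_cbs = 0 };
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		if (model_enqueue_async(&p, lazy_lim))
			return p.n_cbs;
	return 0;
}
```

With the default limit, a flood of 100 callbacks requests a wakeup at the 32nd callback, a burst of 10 callbacks stays lazy, and a limit of -1 never requests a wakeup from this path.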
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Add kernel boot parameters for callback laziness · 450d461a
      Paul E. McKenney authored
      This commit adds kernel boot parameters for callback laziness, allowing
      the RCU Tasks flavors to be individually adjusted.

      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Remove redundant #ifdef CONFIG_TASKS_RCU · 5ae769c6
      Paul E. McKenney authored
      The kernel/rcu/tasks.h file has a #endif immediately followed by an
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Treat only synchronous grace periods urgently · d119357d
      Paul E. McKenney authored
      The performance requirements on RCU Tasks, and in particular on RCU
      Tasks Trace, have evolved over time as the workloads have evolved.
      The current implementation is designed to provide low grace-period
      latencies, and also to accommodate short-duration floods of callbacks.
      
      However, current workloads can also provide a constant background
      callback-queuing rate of a few hundred call_rcu_tasks_trace() invocations
      per second.  This results in continuous back-to-back RCU Tasks Trace
      grace periods, which in turn can consume the better part of 10% of a CPU.
      One could take the attitude that there are several tens of other CPUs on
      the systems running such workloads, but energy efficiency is a thing.
      On these systems, although asynchronous grace-period requests happen
      every few milliseconds, synchronous grace-period requests are quite rare.
      
      This commit therefore arranges for grace periods to be initiated
      immediately in response to calls to synchronize_rcu_tasks*() and
      also to calls to synchronize_rcu_mult() that are passed one of the
      call_rcu_tasks*() functions.  These are recognized by the tell-tale
      wakeme_after_rcu callback function.
      
      In other cases, callbacks are gathered up for up to about 250 milliseconds
      before a grace period is initiated.  This results in more than an order of
      magnitude reduction in RCU Tasks Trace grace periods, with corresponding
      reduction in consumption of CPU time.
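
The urgent-versus-lazy decision can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: `struct lazy_model`, `model_queue()`, `model_timer_expired()`, and the two stand-in callbacks are hypothetical, and time is passed in explicitly as milliseconds rather than read from a kernel timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LAZY_MS 250	/* "about 250 milliseconds" per the commit message */

typedef void (*cb_fn)(void *);

/* Stand-in for wakeme_after_rcu(), the callback used by synchronous waits. */
static void model_wakeme(void *arg) { (void)arg; }
/* Stand-in for an ordinary asynchronous callback. */
static void model_async_cb(void *arg) { (void)arg; }

/* Hypothetical laziness state for one RCU Tasks flavor. */
struct lazy_model {
	bool timer_armed;	/* a lazy deadline is pending */
	long deadline_ms;	/* valid only if timer_armed */
};

/*
 * Queue a callback at time now_ms.  Returns true if the grace-period
 * kthread should be awakened immediately.
 */
static bool model_queue(struct lazy_model *m, cb_fn func, long now_ms)
{
	assert(func != NULL);
	if (func == model_wakeme) {
		m->timer_armed = false;	/* urgency overrides laziness */
		return true;
	}
	if (!m->timer_armed) {
		/* First lazy callback: start the gathering interval. */
		m->timer_armed = true;
		m->deadline_ms = now_ms + MODEL_LAZY_MS;
	}
	return false;
}

/* Returns true if the lazy deadline has expired, consuming the timer. */
static bool model_timer_expired(struct lazy_model *m, long now_ms)
{
	if (!m->timer_armed || now_ms < m->deadline_ms)
		return false;
	m->timer_armed = false;
	return true;
}
```

In this model, asynchronous callbacks queued at 0 ms and 100 ms share one deadline at 250 ms, so they are served by a single grace period, whereas a callback whose function is the wakeme stand-in requests a wakeup at once.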

      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Reported-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  5. 09 Jul, 2023 10 commits
  6. 08 Jul, 2023 23 commits