1. 08 Sep, 2024 6 commits
    • Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a',... · 355debb8
      Neeraj Upadhyay authored
      Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a', 'nocb.09.09.24a', 'rcutorture.14.08.24a', 'rcustall.09.09.24a', 'srcu.12.08.24a', 'rcu.tasks.14.08.24a', 'rcu_scaling_tests.15.08.24a', 'fixes.12.08.24a' and 'misc.11.08.24a' into next.09.09.24a
    • rcu: Defer printing stall-warning backtrace when holding rcu_node lock · 1ecd9d68
      Paul E. McKenney authored
      The rcu_dump_cpu_stacks() function holds the leaf rcu_node structure's
      ->lock when dumping the stacks of any CPUs stalling the current grace
      period.
      This lock is held to prevent confusion that would otherwise occur when
      the stalled CPU reported its quiescent state (and then went on to do
      unrelated things) just as the backtrace NMI was heading towards it.
      
      This has worked well, but on larger systems has recently been observed
      to cause severe lock contention resulting in CSD-lock stalls and other
      general unhappiness.
      
      This commit therefore invokes printk_deferred_enter() before acquiring
      the lock and printk_deferred_exit() after releasing it, thus moving the
      overhead of actually outputting the stack trace out of that lock's
      critical section.
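      As a rough sketch, the resulting pattern looks as follows, assuming
      the per-leaf-rcu_node loop structure of rcu_dump_cpu_stacks() and
      simplifying away surrounding details:

        /* Buffer printk() output so that the slow console work happens
         * outside the rcu_node lock's critical section. */
        printk_deferred_enter();
        raw_spin_lock_irqsave_rcu_node(rnp, flags);
        for_each_leaf_node_possible_cpu(rnp, cpu)
                if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
                        dump_cpu_task(cpu);     /* Backtrace the stalled CPU. */
        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        /* Re-enable normal printk(); buffered output is flushed later. */
        printk_deferred_exit();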
      Reported-by: Rik van Riel <riel@surriel.com>
      Suggested-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
      Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu/nocb: Remove superfluous memory barrier after bypass enqueue · 7562eed2
      Frederic Weisbecker authored
      Pre-GP accesses performed by the update side must be ordered against
      post-GP accesses performed by the readers. This ordering is ensured by
      the bypass or nocb locking at enqueue time, followed by the fully
      ordered rnp locking initiated while callbacks are accelerated, and then
      propagated throughout the whole GP lifecycle associated with the
      callbacks.
      
      Therefore the explicit barrier advertising ordering between bypass
      enqueue and rcuo wakeup is superfluous. If anything, it would order
      only the first bypass callback enqueue against the rcuo wakeup and
      ignore all subsequent ones.
      
      Remove the needless barrier.
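      In abbreviated outline (names shortened for illustration), the enqueue
      path already provides the required ordering through its lock
      operations, which is what makes the commented-out barrier below
      redundant:

        rcu_nocb_bypass_lock(rdp);          /* Acquire orders the enqueue. */
        rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
        rcu_nocb_bypass_unlock(rdp);        /* Release publishes it. */
        /* smp_mb(); -- the superfluous barrier removed by this commit */
        wake_nocb_gp(rdp, false);           /* rcuo wakeup */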
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu/nocb: Conditionally wake up rcuo if not already waiting on GP · 1b022b87
      Frederic Weisbecker authored
      A callback enqueuer currently wakes up the rcuo kthread if it is adding
      the first non-done callback of a CPU, whether the kthread is waiting on
      a grace period or not (unless the CPU is offline).
      
      This looks like desirable behaviour because the rcuo kthread then
      doesn't wait for the end of the current grace period to handle the
      callback. The callback is accelerated right away and assigned to the
      next grace period. The GP kthread is notified about that fact and
      iterates with the upcoming GP without sleeping in between.

      However, this best-case scenario is contradicted by a few details,
      depending on the situation:
      
      1) If the callback is a non-bypass one queued with IRQs enabled, the
         wakeup occurs only if no other pending callbacks are on the list.
         Therefore the theoretical "optimization" actually applies only on
         rare occasions.
      
      2) If the callback is a non-bypass one queued with IRQs disabled, the
         situation is similar, with even more uncertainty due to the
         deferred wakeup.
      
      3) If the callback is lazy, a few jiffies don't make any difference.
      
      4) If the callback is a bypass one, the wakeup timer is programmed 2
         jiffies ahead by rcuo in case the regular pending queue has been
         handled in the meantime. A rare storm of callbacks can otherwise
         wait for the currently elapsing grace period to be flushed and
         handled.
      
      For all those reasons, the optimization is only theoretical and
      occasional. Therefore it is reasonable that callback enqueuers wake up
      the rcuo kthread only when it is not already waiting on a grace period
      to complete.
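      The idea can be pictured with the hypothetical sketch below, where
      rcuo_is_waiting_on_gp() is an illustrative helper rather than the
      actual kernel API:

        /* Hypothetical helper: true if the rcuo kthread is already
         * sleeping while waiting for a grace period to complete. */
        if (!rcuo_is_waiting_on_gp(rdp))
                wake_nocb_gp(rdp, false);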
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu/nocb: Fix RT throttling hrtimer armed from offline CPU · 9139f932
      Frederic Weisbecker authored
      After a CPU is marked offline and until it reaches its final trip to
      idle, rcuo has several opportunities to be woken up, either because
      a callback has been queued in the meantime or because
      rcutree_report_cpu_dead() has issued the final deferred NOCB wake up.
      
      If RCU-boosting is enabled, RCU kthreads are set to SCHED_FIFO policy.
      And if RT-bandwidth is enabled, the related hrtimer might be armed.
      However, this then happens after hrtimers have been migrated at the
      CPUHP_AP_HRTIMERS_DYING stage, which is broken, as reported by the
      following warning:
      
       Call trace:
        enqueue_hrtimer+0x7c/0xf8
        hrtimer_start_range_ns+0x2b8/0x300
        enqueue_task_rt+0x298/0x3f0
        enqueue_task+0x94/0x188
        ttwu_do_activate+0xb4/0x27c
        try_to_wake_up+0x2d8/0x79c
        wake_up_process+0x18/0x28
        __wake_nocb_gp+0x80/0x1a0
        do_nocb_deferred_wakeup_common+0x3c/0xcc
        rcu_report_dead+0x68/0x1ac
        cpuhp_report_idle_dead+0x48/0x9c
        do_idle+0x288/0x294
        cpu_startup_entry+0x34/0x3c
        secondary_start_kernel+0x138/0x158
      
      Fix this by waking up rcuo using an IPI if necessary. Since the
      existing API to deal with this situation only handles swait queues,
      rcuo is woken up from offline CPUs only if it is not already waiting
      on a grace period. In the worst case, some callbacks will just wait
      for a grace period to complete before being assigned to a subsequent
      one.
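      The following is a hedged sketch of such an IPI-based wakeup, close in
      spirit to (but not a verbatim copy of) the actual fix:

        #include <linux/cpumask.h>
        #include <linux/smp.h>
        #include <linux/swait.h>

        static void swake_up_one_online_ipi(void *arg)
        {
                struct swait_queue_head *wqh = arg;

                swake_up_one(wqh);
        }

        static void swake_up_one_online(struct swait_queue_head *wqh)
        {
                int cpu = get_cpu();    /* Disable preemption. */

                if (unlikely(cpu_is_offline(cpu))) {
                        /* An offline CPU must not arm the RT-throttling
                         * hrtimer, so hand the wakeup to an online CPU. */
                        smp_call_function_single(cpumask_any(cpu_online_mask),
                                                 swake_up_one_online_ipi,
                                                 wqh, 0);
                        put_cpu();
                } else {
                        put_cpu();
                        swake_up_one(wqh);
                }
        }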
      Reported-by: "Cheng-Jui Wang (王正睿)" <Cheng-Jui.Wang@mediatek.com>
      Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu/nocb: Simplify (de-)offloading state machine · 1fcb932c
      Frederic Weisbecker authored
      Now that the (de-)offloading process can apply only to offline CPUs,
      there is no longer any concurrency between rcu_core and the nocb
      kthreads. Also, the mutation now happens on empty queues.

      Therefore the state machine can be reduced to a single bit called
      SEGCBLIST_OFFLOADED. Simplify the transitions as follows:
      
      * Upon offloading: queue the rdp to be added to the rcuog list and
        wait for the rcuog kthread to set the SEGCBLIST_OFFLOADED bit. Then
        unpark the rcuo kthread.

      * Upon de-offloading: park the rcuo kthread. Queue the rdp to be
        removed from the rcuog list and wait for the rcuog kthread to clear
        the SEGCBLIST_OFFLOADED bit.
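      In illustrative pseudo-C (helper names marked hypothetical are not
      those of the actual patch), the two transitions reduce to:

        /* Offloading: the rdp's CPU is offline and its queues are empty. */
        add_rdp_to_rcuog_list(rdp);                     /* hypothetical */
        swait_event_exclusive(rdp->nocb_state_wq,
                rcu_segcblist_test_flags(&rdp->cblist, SEGCBLIST_OFFLOADED));
        kthread_unpark(rdp->nocb_cb_kthread);

        /* De-offloading: the reverse order. */
        kthread_park(rdp->nocb_cb_kthread);
        del_rdp_from_rcuog_list(rdp);                   /* hypothetical */
        swait_event_exclusive(rdp->nocb_state_wq,
                !rcu_segcblist_test_flags(&rdp->cblist, SEGCBLIST_OFFLOADED));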
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
  2. 15 Aug, 2024 14 commits
  3. 14 Aug, 2024 20 commits