1. 12 Apr, 2022 9 commits
    • rcu-tasks: Handle sparse cpu_possible_mask in rcu_tasks_invoke_cbs() · ab2756ea
      Paul E. McKenney authored
      If the cpu_possible_mask is sparse (for example, if bits are set only for
      CPUs 0, 4, 8, ...), then rcu_tasks_invoke_cbs() will access per-CPU data
      for a CPU not in cpu_possible_mask.  It makes these accesses while doing
      a workqueue-based binary search for non-empty callback lists.  Although
      this search must pass through CPUs not represented in cpu_possible_mask,
      it has no need to check the callback list for such CPUs.
      
      This commit therefore changes the rcu_tasks_invoke_cbs() function's
      binary search so as to only check callback lists for CPUs present in
      cpu_possible_mask.
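
      As a rough illustration (the variable names below are assumptions, not
      quoted from the patch), the search now skips holes in the mask before
      touching per-CPU state:

          /* Sketch: inside the workqueue-based search over CPU indices. */
          if (!cpu_possible(cpu))
                  continue;  /* Hole in a sparse cpu_possible_mask. */
          rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);  /* CPU is possible, so this exists. */
          /* ... check this CPU's callback list as before ... */
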
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Handle sparse cpu_possible_mask · 07d95c34
      Eric Dumazet authored
      If the rcupdate.rcu_task_enqueue_lim kernel boot parameter is set to
      something greater than 1 and less than nr_cpu_ids, the code attempts to
      use a subset of the CPUs' RCU Tasks callback lists.  This works, but only
      if the cpu_possible_mask is contiguous.  If there are "holes" in this
      mask, the callback-enqueue code might attempt to access a non-existent
      per-CPU ->rtcpu variable for a non-existent CPU.  For example, if only
      CPUs 0, 4, 8, 12, 16 and so on are in cpu_possible_mask, specifying
      rcupdate.rcu_task_enqueue_lim=4 would cause the code to attempt to
      use callback queues for non-existent CPUs 1, 2, and 3.  Because such
      systems have existed in the past and might still exist, the code needs
      to gracefully handle this situation.
      
      This commit therefore checks to see whether the desired CPU is present
      in cpu_possible_mask, and, if not, searches for the next CPU.  This means
      that the systems administrator of a system with a sparse cpu_possible_mask
      will need to account for this sparsity when specifying the value of
      the rcupdate.rcu_task_enqueue_lim kernel boot parameter.  For example,
      setting this parameter to the value 4 will use only CPUs 0 and 4, with
      CPU 4 getting three times the callback load of CPU 0.
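
      The resulting mapping can be illustrated with a small userspace sketch
      (the helper names below are hypothetical, not the kernel's):

          #include <stdbool.h>
          #include <stdio.h>

          #define NR_CPU_IDS 20

          /* Stand-in for a sparse cpu_possible_mask: CPUs 0, 4, 8, 12, 16. */
          static bool cpu_is_possible(int cpu)
          {
                  return cpu < NR_CPU_IDS && (cpu % 4) == 0;
          }

          /* Fall back to the next possible CPU at or after "cpu". */
          static int next_possible_cpu(int cpu)
          {
                  while (cpu < NR_CPU_IDS && !cpu_is_possible(cpu))
                          cpu++;
                  return cpu;
          }

          int main(void)
          {
                  /* rcupdate.rcu_task_enqueue_lim=4 requests queues on CPUs 0-3. */
                  for (int q = 0; q < 4; q++)
                          printf("desired CPU %d -> actual CPU %d\n",
                                 q, next_possible_cpu(q));
                  /* Maps 0 -> 0 and 1, 2, 3 -> 4: CPU 4 carries 3x the load. */
                  return 0;
          }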
      
      This commit assumes that bit (nr_cpu_ids - 1) is always set in
      cpu_possible_mask.
      
      Link: https://lore.kernel.org/lkml/CANn89iKaNEwyNZ=L_PQnkH0LP_XjLYrr_dpyRKNNoDJaWKdrmg@mail.gmail.com/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Make show_rcu_tasks_generic_gp_kthread() check all CPUs · 10b3742f
      Paul E. McKenney authored
      Currently, the show_rcu_tasks_generic_gp_kthread() function only looks
      at CPU 0's callback lists.  Although this is not fatal, it can confuse
      debugging efforts in cases where any of the Tasks RCU flavors are in
      per-CPU queueing mode.  This commit therefore causes this function to
      scan all CPUs' callback queues.
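
      A rough sketch of the resulting scan (struct and field names below are
      assumptions based on context, not quoted from the patch):

          /* Sketch: report a non-empty callback list on any CPU, not just CPU 0. */
          bool havecbs = false;
          int cpu;

          for_each_possible_cpu(cpu) {
                  struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);

                  if (!rcu_segcblist_empty(&rtpcp->cblist)) {
                          havecbs = true;
                          break;
                  }
          }
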
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Restore use of timers for non-RT kernels · bddf7122
      Paul E. McKenney authored
      The use of hrtimers for RCU-tasks grace-period delays works well in
      general, but can result in excessive grace-period delays for some
      corner-case workloads.  This commit therefore reverts to the use of
      timers for non-RT kernels to mitigate those grace-period delays.
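
      One way to picture the result, assuming the split is keyed off
      CONFIG_PREEMPT_RT (a sketch, not the commit's literal diff):

          /* Sketch: choose the grace-period wait primitive by preemption model. */
          if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
                  /* hrtimer-based wait, usable before ksoftirqd is running. */
                  set_current_state(TASK_IDLE);
                  schedule_hrtimeout_range(&timeout, slack_ns, HRTIMER_MODE_REL);
          } else {
                  /* Ordinary timer wheel, avoiding the hrtimer corner cases. */
                  schedule_timeout_idle(fract);
          }
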
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Use schedule_hrtimeout_range() to wait for grace periods · 777570d9
      Sebastian Andrzej Siewior authored
      The synchronous RCU-tasks grace-period-wait primitives invoke
      schedule_timeout_idle() to give readers a chance to exit their
      read-side critical sections.  Unfortunately, this fails during early
      boot on PREEMPT_RT because PREEMPT_RT relies solely on ksoftirqd to run
      timer handlers.  Because ksoftirqd cannot operate until its kthreads
      are spawned, there is a brief period of time following scheduler
      initialization where PREEMPT_RT cannot run the timer handlers that
      schedule_timeout_idle() relies on, resulting in a hang.
      
      To avoid this boot-time hang, this commit replaces schedule_timeout_idle()
      with schedule_hrtimeout_range(), so that the timer expires in hardirq
      context.  This ensures that the timer fires even on PREEMPT_RT throughout
      the irqs-enabled portions of boot as well as during runtime.
      
      The timer is set to expire between fract and fract + HZ / 2 jiffies in
      order to align with any other timers that might expire during that time,
      thus reducing the number of wakeups.
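
      A minimal sketch of such a wait (the variable names and exact conversion
      helpers below are assumptions, not quoted from the patch):

          /* Sketch: sleep at least "fract" jiffies, allowing the hrtimer to
           * fire up to HZ / 2 jiffies late so that it can coalesce with other
           * timers expiring in the same window. */
          ktime_t timeout = ns_to_ktime(jiffies_to_nsecs(fract));

          set_current_state(TASK_IDLE);
          schedule_hrtimeout_range(&timeout, jiffies_to_nsecs(HZ / 2),
                                   HRTIMER_MODE_REL);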
      
      Note that RCU-tasks grace periods are infrequent, so the use of hrtimers
      should be fine.  In contrast, use of hrtimers in common-case code could
      result in performance issues.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Make Tasks RCU account for userspace execution · 5d900708
      Paul E. McKenney authored
      The main Tasks RCU quiescent state is voluntary context switch.  However,
      userspace execution is also a valid quiescent state, and is a valuable one
      for userspace applications that spin repeatedly executing light-weight
      non-sleeping system calls.  Currently, such an application can delay a
      Tasks RCU grace period for many tens of seconds.
      
      This commit therefore enlists the aid of the scheduler-clock interrupt to
      provide a Tasks RCU quiescent state when it interrupts a task executing
      in userspace.
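
      In outline, and assuming the quiescent state is reported straight from
      the tick path (a sketch, not the literal patch):

          /* Sketch, assumed placement in the scheduler-clock interrupt path
           * (rcu_sched_clock_irq()); "user" indicates that the tick interrupted
           * userspace execution. */
          if (user)
                  rcu_tasks_qs(current, false);  /* Report a Tasks RCU QS. */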
      
      [ paulmck: Apply feedback from kernel test robot. ]
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Neil Spring <ntspring@fb.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Use rcuwait for the rcu_tasks_kthread() · 88db792b
      Sebastian Andrzej Siewior authored
      The waitqueue used by rcu_tasks_kthread() only ever has a single waiter.
      With only one waiter guaranteed, the wait_queue_head can be replaced with
      rcuwait, which is smaller and simpler.  With the rcuwait-based wakeup
      counterpart, the irqwork function (call_rcu_tasks_iw_wakeup()) can be
      invoked in hardirq context because it only performs a wakeup and no
      sleeping locks are involved (unlike with a wait_queue_head).
      As a side effect, this is also one piece of the puzzle to pass the RCU
      selftest at early boot on PREEMPT_RT.
      
      Replace wait_queue_head with rcuwait and let the irqwork run in hardirq
      context on PREEMPT_RT.
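
      A minimal sketch of the single-waiter rcuwait pattern (the identifiers
      below are assumptions, not the commit's):

          #include <linux/rcuwait.h>
          #include <linux/sched.h>

          /* Assumed names for this sketch only. */
          static struct rcuwait gp_wait = __RCUWAIT_INITIALIZER(gp_wait);
          static bool have_callbacks;

          /* Kthread side: sleep until there is work to do. */
          static void wait_for_callbacks(void)
          {
                  rcuwait_wait_event(&gp_wait, READ_ONCE(have_callbacks), TASK_IDLE);
          }

          /* Irqwork side: a bare wakeup, safe to issue from hardirq context. */
          static void kick_gp_kthread(void)
          {
                  WRITE_ONCE(have_callbacks, true);
                  rcuwait_wake_up(&gp_wait);
          }
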
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Print pre-stall-warning informational messages · f2539003
      Paul E. McKenney authored
      RCU-tasks stall-warning messages are printed after the grace period is ten
      minutes old.  Unfortunately, most of us will have rebooted the system in
      response to an apparently-hung command long before the ten minutes is up,
      and will thus see what looks to be a silent hang.
      
      This commit therefore adds pr_info() messages that are printed earlier.
      These should avoid being classified as errors, but should give impatient
      users a hint.  These are controlled by new rcupdate.rcu_task_stall_info
      and rcupdate.rcu_task_stall_info_mult kernel-boot parameters.  The former
      defines the initial delay in jiffies (defaulting to 10 seconds) and the
      latter defines the multiplier (defaulting to 3).  Thus, by default, the
      first message will appear 10 seconds into the RCU-tasks grace period,
      the second 40 seconds in, and the third 160 seconds in.  There would be
      a fourth at 640 seconds in, but the stall warning message appears 600
      seconds in, and once a stall warning is printed for a given grace period,
      no further informational messages are printed.
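
      The default progression can be reproduced with a few lines of ordinary
      userspace C (a sketch of the arithmetic described above, not kernel code):

          #include <stdio.h>

          int main(void)
          {
                  int delay = 10;           /* rcu_task_stall_info default, in seconds. */
                  int mult = 3;             /* rcu_task_stall_info_mult default. */
                  int stall_timeout = 600;  /* Ten-minute stall warning, in seconds. */

                  /* Each message is followed by a wait of mult times the time
                   * already elapsed: 10 s, 40 s, 160 s, (640 s). */
                  for (int t = delay; t < stall_timeout; t += mult * t)
                          printf("informational message at %d seconds\n", t);
                  printf("stall warning at %d seconds\n", stall_timeout);
                  return 0;
          }
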
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu-tasks: Fix race in schedule and flush work · f75fd4b9
      Padmanabha Srinivasaiah authored
      While booting secondary CPUs, cpus_read_[lock/unlock] does not keep the
      online cpumask stable.  The transient online mask results in the
      calltrace below.
      
      [    0.324121] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
      [    0.346652] Detected PIPT I-cache on CPU2
      [    0.347212] CPU2: Booted secondary processor 0x0000000002 [0x410fd083]
      [    0.377255] Detected PIPT I-cache on CPU3
      [    0.377823] CPU3: Booted secondary processor 0x0000000003 [0x410fd083]
      [    0.379040] ------------[ cut here ]------------
      [    0.383662] WARNING: CPU: 0 PID: 10 at kernel/workqueue.c:3084 __flush_work+0x12c/0x138
      [    0.384850] Modules linked in:
      [    0.385403] CPU: 0 PID: 10 Comm: rcu_tasks_rude_ Not tainted 5.17.0-rc3-v8+ #13
      [    0.386473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
      [    0.387289] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [    0.388308] pc : __flush_work+0x12c/0x138
      [    0.388970] lr : __flush_work+0x80/0x138
      [    0.389620] sp : ffffffc00aaf3c60
      [    0.390139] x29: ffffffc00aaf3d20 x28: ffffffc009c16af0 x27: ffffff80f761df48
      [    0.391316] x26: 0000000000000004 x25: 0000000000000003 x24: 0000000000000100
      [    0.392493] x23: ffffffffffffffff x22: ffffffc009c16b10 x21: ffffffc009c16b28
      [    0.393668] x20: ffffffc009e53861 x19: ffffff80f77fbf40 x18: 00000000d744fcc9
      [    0.394842] x17: 000000000000000b x16: 00000000000001c2 x15: ffffffc009e57550
      [    0.396016] x14: 0000000000000000 x13: ffffffffffffffff x12: 0000000100000000
      [    0.397190] x11: 0000000000000462 x10: ffffff8040258008 x9 : 0000000100000000
      [    0.398364] x8 : 0000000000000000 x7 : ffffffc0093c8bf4 x6 : 0000000000000000
      [    0.399538] x5 : 0000000000000000 x4 : ffffffc00a976e40 x3 : ffffffc00810444c
      [    0.400711] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
      [    0.401886] Call trace:
      [    0.402309]  __flush_work+0x12c/0x138
      [    0.402941]  schedule_on_each_cpu+0x228/0x278
      [    0.403693]  rcu_tasks_rude_wait_gp+0x130/0x144
      [    0.404502]  rcu_tasks_kthread+0x220/0x254
      [    0.405264]  kthread+0x174/0x1ac
      [    0.405837]  ret_from_fork+0x10/0x20
      [    0.406456] irq event stamp: 102
      [    0.406966] hardirqs last  enabled at (101): [<ffffffc0093c8468>] _raw_spin_unlock_irq+0x78/0xb4
      [    0.408304] hardirqs last disabled at (102): [<ffffffc0093b8270>] el1_dbg+0x24/0x5c
      [    0.409410] softirqs last  enabled at (54): [<ffffffc0081b80c8>] local_bh_enable+0xc/0x2c
      [    0.410645] softirqs last disabled at (50): [<ffffffc0081b809c>] local_bh_disable+0xc/0x2c
      [    0.411890] ---[ end trace 0000000000000000 ]---
      [    0.413000] smp: Brought up 1 node, 4 CPUs
      [    0.413762] SMP: Total of 4 processors activated.
      [    0.414566] CPU features: detected: 32-bit EL0 Support
      [    0.415414] CPU features: detected: 32-bit EL1 Support
      [    0.416278] CPU features: detected: CRC32 instructions
      [    0.447021] Callback from call_rcu_tasks_rude() invoked.
      [    0.506693] Callback from call_rcu_tasks() invoked.
      
      This commit therefore fixes this issue by applying a single-CPU
      optimization to the RCU Tasks Rude grace-period process.  The key point
      here is that the purpose of this RCU flavor is to force a schedule on
      each online CPU since some past event.  But the rcu_tasks_rude_wait_gp()
      function runs in the context of RCU Tasks Rude's grace-period kthread,
      so there must already have been a context switch on the current CPU since
      the call to either synchronize_rcu_tasks_rude() or call_rcu_tasks_rude().
      So if there is only a single CPU online, RCU Tasks Rude's grace-period
      kthread does not need to do anything at all.
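
      A sketch of the resulting fast path (the rcu_tasks_be_rude() worker name
      is taken from the surrounding kernel code and is an assumption here):

          /* Sketch: RCU Tasks Rude grace-period wait with the single-CPU
           * fast path.  The grace-period kthread has already context-switched
           * on this CPU, so with one CPU online there is nothing to force. */
          static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
          {
                  if (num_online_cpus() <= 1)
                          return;  /* Early boot: only the boot CPU is up. */
                  schedule_on_each_cpu(rcu_tasks_be_rude);
          }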
      
      It turns out that the rcu_tasks_rude_wait_gp() function's call to
      schedule_on_each_cpu() causes problems during early boot.  During that
      time, there is only one online CPU, namely the boot CPU.  Therefore,
      applying this single-CPU optimization fixes early-boot instances of
      this problem.
      
      Link: https://lore.kernel.org/lkml/20220210184319.25009-1-treasure4paddy@gmail.com/T/
      Suggested-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Padmanabha Srinivasaiah <treasure4paddy@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>