1. 13 Aug, 2019 8 commits
• rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait() · 9fa471a8
      Paul E. McKenney authored
      This commit adjusts naming to account for the new distinction between
      callback and grace-period no-CBs kthreads.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu/nocb: Provide separate no-CBs grace-period kthreads · 12f54c3a
      Paul E. McKenney authored
      Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
      are divided into groups.  The first rcuo kthread to come online in a
      given group is that group's leader, and the leader both waits for grace
      periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
      only invoke callbacks.
      
      This works well in the real-time/embedded environments for which it was
      intended because such environments tend not to generate all that many
      callbacks.  However, given huge floods of callbacks, it is possible for
      the leader kthread to be stuck invoking callbacks while its followers
      wait helplessly while their callbacks pile up.  This is a good recipe
      for an OOM, and rcutorture's new callback-flood capability does generate
      such OOMs.
      
      One strategy would be to wait until such OOMs start happening in
      production, but similar OOMs have in fact happened starting in 2018.
      It would therefore be wise to take a more proactive approach.
      
This commit therefore features per-CPU rcuo kthreads that do nothing
but invoke callbacks.  Instead of having one of these kthreads act as
leader, each group has a separate rcuog kthread that handles grace
periods for its group.  Because these rcuog kthreads do not invoke
callbacks, callback floods on one CPU no longer block callbacks from
reaching the rcuo callback-invocation kthreads on other CPUs.
      
      This change does introduce additional kthreads, however:
      
      1.	The number of additional kthreads is about the square root of
      	the number of CPUs, so that a 4096-CPU system would have only
      	about 64 additional kthreads.  Note that recent changes
      	decreased the number of rcuo kthreads by a factor of two
      	(CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so
      	this still represents a significant improvement on most systems.
      
      2.	The leading "rcuo" of the rcuog kthreads should allow existing
      	scripting to affinity these additional kthreads as needed, the
      	same as for the rcuop and rcuos kthreads.  (There are no longer
      	any rcuob kthreads.)
      
      3.	A state-machine approach was considered and rejected.  Although
      	this would allow the rcuo kthreads to continue their dual
      	leader/follower roles, it complicates callback invocation
      	and makes it more difficult to consolidate rcuo callback
      	invocation with existing softirq callback invocation.
      
      The introduction of rcuog kthreads should thus be acceptable.
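To make point 1 above concrete, here is a tiny stand-alone C sketch
(illustrative only, assuming the square-root group sizing described in
this message; it is not kernel code) that prints the approximate count
of added rcuog kthreads:

	#include <math.h>
	#include <stdio.h>

	/* Build with: cc sketch.c -lm */
	int main(void)
	{
		int ncpus[] = { 8, 64, 1024, 4096 };

		for (int i = 0; i < 4; i++)
			printf("%4d CPUs -> ~%.0f rcuog kthreads\n",
			       ncpus[i], ceil(sqrt(ncpus[i])));
		return 0;
	}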
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu/nocb: Update comments to prepare for forward-progress work · 6484fe54
      Paul E. McKenney authored
      This commit simply rewords comments to prepare for leader nocb kthreads
      doing only grace-period work and callback shuffling.  This will mean
      the addition of replacement kthreads to invoke callbacks.  The "leader"
      and "follower" thus become less meaningful, so the commit changes no-CB
      comments with these strings to "GP" and "CB", respectively.  (Give or
      take the usual grammatical transformations.)
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu/nocb: Rename rcu_data fields to prepare for forward-progress work · 58bf6f77
      Paul E. McKenney authored
      This commit simply renames rcu_data fields to prepare for leader
      nocb kthreads doing only grace-period work and callback shuffling.
      This will mean the addition of replacement kthreads to invoke callbacks.
      The "leader" and "follower" thus become less meaningful, so the commit
      changes no-CB fields with these strings to "gp" and "cb", respectively.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a',... · 31da0670
      Paul E. McKenney authored
      Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD
      
      consolidate.2019.08.01b: Further consolidation cleanups
      fixes.2019.08.12a: Miscellaneous fixes
      lists.2019.08.13a: Optional lockdep arguments for RCU list macros
      torture.2019.08.01b: Torture-test updates
• acpi: Use built-in RCU list checking for acpi_ioremaps list · bee6f871
      Joel Fernandes (Google) authored
      This commit applies the consolidated list_for_each_entry_rcu() support
      for lockdep conditions.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• x86/pci: Pass lockdep condition to pci_mmcfg_list iterator · 842a56cf
      Joel Fernandes (Google) authored
The pci_mmcfg_list is traversed by list_for_each_entry_rcu() outside
of an RCU read-side critical section, which is safe because the
pci_mmcfg_lock is held.  This commit therefore adds a lockdep expression
to list_for_each_entry_rcu() in order to avoid lockdep warnings.
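The consolidated API accepts the lockdep condition as an optional fourth
argument, so the change presumably looks something like the following
sketch (the exact upstream hunk may differ):

	/* Sketch: tell lockdep why lockless traversal is safe here. */
	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list,
				lockdep_is_held(&pci_mmcfg_lock)) {
		/* ... examine each struct pci_mmcfg_region ... */
	}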
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• driver/core: Convert to use built-in RCU list checking · c2fa1e1b
      Joel Fernandes (Google) authored
      This commit applies the consolidated hlist_for_each_entry_rcu() support
      for lockdep conditions.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  2. 12 Aug, 2019 5 commits
• MAINTAINERS: Update e-mail address for Andrea Parri · ba31ebfa
      Andrea Parri authored
      My @amarulasolutions.com address stopped working this July, so update
      to my @gmail.com address where you'll still be able to reach me.
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Will Deacon <will@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jade Alglave <j.alglave@ucl.ac.uk>
      Cc: Luc Maranget <luc.maranget@inria.fr>
      Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Cc: Akira Yokosawa <akiyks@gmail.com>
      Cc: Daniel Lustig <dlustig@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Fix spelling mistake "greate"->"great" · 511b44f7
      Mukesh Ojha authored
      This commit fixes a spelling mistake in file tree_exp.h.
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• arm: Use common outgoing-CPU-notification code · 1d5087ab
      Paul E. McKenney authored
This commit replaces the open-coded CPU-offline notification with new
common code.  In particular, this change avoids calling scheduler code
using RCU from an offline CPU that RCU is ignoring.  This is a minimal
change.  A more intrusive change might invoke the cpu_check_up_prepare()
and cpu_set_state_online() functions at CPU-online time, which would
allow onlining to throw an error if the CPU did not go offline properly.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
• rcu: Remove redundant "if" condition from rcu_gp_is_expedited() · b823cafa
      Paul E. McKenney authored
      Because rcu_expedited_nesting is initialized to 1 and not decremented
      until just before init is spawned, rcu_expedited_nesting is guaranteed
      to be non-zero whenever rcu_scheduler_active == RCU_SCHEDULER_INIT.
      This commit therefore removes this redundant "if" equality test.
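A hedged sketch of the resulting function (from a reading of
kernel/rcu/update.c around this release; the exact code may differ):

	/* Sketch: the RCU_SCHEDULER_INIT test is gone because
	 * rcu_expedited_nesting is still nonzero whenever it would hold. */
	bool rcu_gp_is_expedited(void)
	{
		return rcu_expedited || atomic_read(&rcu_expedited_nesting);
	}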
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
• idle: Prevent late-arriving interrupts from disrupting offline · e78a7614
      Peter Zijlstra authored
      Scheduling-clock interrupts can arrive late in the CPU-offline process,
      after idle entry and the subsequent call to cpuhp_report_idle_dead().
      Once execution passes the call to rcu_report_dead(), RCU is ignoring
      the CPU, which results in lockdep complaints when the interrupt handler
      uses RCU:
      
      ------------------------------------------------------------------------
      
      =============================
      WARNING: suspicious RCU usage
      5.2.0-rc1+ #681 Not tainted
      -----------------------------
      kernel/sched/fair.c:9542 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      RCU used illegally from offline CPU!
      rcu_scheduler_active = 2, debug_locks = 1
      no locks held by swapper/5/0.
      
      stack backtrace:
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.2.0-rc1+ #681
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       dump_stack+0x5e/0x8b
       trigger_load_balance+0xa8/0x390
       ? tick_sched_do_timer+0x60/0x60
       update_process_times+0x3b/0x50
       tick_sched_handle+0x2f/0x40
       tick_sched_timer+0x32/0x70
       __hrtimer_run_queues+0xd3/0x3b0
       hrtimer_interrupt+0x11d/0x270
       ? sched_clock_local+0xc/0x74
       smp_apic_timer_interrupt+0x79/0x200
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:delay_tsc+0x22/0x50
      Code: ff 0f 1f 80 00 00 00 00 65 44 8b 05 18 a7 11 48 0f ae e8 0f 31 48 89 d6 48 c1 e6 20 48 09 c6 eb 0e f3 90 65 8b 05 fe a6 11 48 <41> 39 c0 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29
      RSP: 0000:ffff8f92c0157ed0 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff13
      RAX: 0000000000000005 RBX: ffff8c861f356400 RCX: ffff8f92c0157e64
      RDX: 000000321214c8cc RSI: 00000032120daa7f RDI: 0000000000260f15
      RBP: 0000000000000005 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff8c861ee18000 R15: ffff8c861ee18000
       cpuhp_report_idle_dead+0x31/0x60
       do_idle+0x1d5/0x200
       ? _raw_spin_unlock_irqrestore+0x2d/0x40
       cpu_startup_entry+0x14/0x20
       start_secondary+0x151/0x170
       secondary_startup_64+0xa4/0xb0
      
      ------------------------------------------------------------------------
      
This happens rarely, but can be forced to happen more often by
placing delays in cpuhp_report_idle_dead() following the call to
rcu_report_dead().  With this in place, the following rcutorture
scenario reproduces the problem within a few minutes:
      
      tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TREE04"
      
      This commit uses the crude but effective expedient of moving the disabling
      of interrupts within the idle loop to precede the cpu_is_offline()
      check.  It also invokes tick_nohz_idle_stop_tick() instead of
      tick_nohz_idle_stop_tick_protected() to shut off the scheduling-clock
      interrupt.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
[ paulmck: Revert tick_nohz_idle_stop_tick_protected() removal, new callers. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  3. 09 Aug, 2019 3 commits
  4. 01 Aug, 2019 20 commits
• rcutorture: Aggressive forward-progress tests shouldn't block shutdown · 60013d5d
      Paul E. McKenney authored
      The more aggressive forward-progress tests can interfere with rcutorture
      shutdown, resulting in false-positive diagnostics.  This commit therefore
      ends any such tests 30 seconds prior to shutdown.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcuperf: Make rcuperf kernel test more robust for !expedited mode · 77e9752c
      Joel Fernandes (Google) authored
      It is possible that the rcuperf kernel test runs concurrently with init
      starting up.  During this time, the system is running all grace periods
      as expedited.  However, rcuperf can also be run for normal GP tests.
Right now, it depends on a holdoff time before starting the test to
ensure grace periods start later.  This works fine with the default
holdoff time; however, it is not robust in situations where init takes
longer than the holdoff time to finish running.  Or, as in my case:
      
      I modified the rcuperf test locally to also run a thread that did
      preempt disable/enable in a loop. This had the effect of slowing down
init.  The end result was that the "batches:" counter in rcuperf was 0,
causing a division-by-0 error in the results.  This counter was 0 because
only expedited GPs were happening, not normal ones, which left the
rcu_state.gp_seq counter constant across grace periods that unexpectedly
turned out to be expedited.  The system was running expedited RCU all
the time because rcu_unexpedited_gp() had not yet been run from init.
In other words, the test was running concurrently with init, while
boot-time expediting of grace periods was still in effect.
      
      To fix this properly, this commit waits until system_state is set to
      SYSTEM_RUNNING before starting the test.  This change is made just
      before kernel_init() invokes rcu_end_inkernel_boot(), and this latter
      is what turns off boot-time expediting of RCU grace periods.
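A minimal sketch of the resulting wait (illustrative; the actual patch's
loop body may differ):

	/* Sketch: hold off until init has booted far enough that
	 * rcu_end_inkernel_boot() has switched off boot-time expediting. */
	while (system_state != SYSTEM_RUNNING)
		schedule_timeout_uninterruptible(HZ / 10);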
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• torture: Remove exporting of internal functions · 21f57546
      Denis Efremov authored
      The functions torture_onoff_cleanup() and torture_shuffle_cleanup()
      are declared static and marked EXPORT_SYMBOL_GPL(), which is at best an
      odd combination.  Because these functions are not used outside of the
      kernel/torture.c file they are defined in, this commit removes their
      EXPORT_SYMBOL_GPL() marking.
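The combination being removed looks like this sketch (simplified; a
static function is invisible outside its translation unit, so exporting
it cannot help other modules):

	static void torture_onoff_cleanup(void)
	{
		/* ... stop the CPU-hotplug torture kthread ... */
	}
	EXPORT_SYMBOL_GPL(torture_onoff_cleanup);	/* dropped by this commit */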
      
      Fixes: cc47ae08 ("rcutorture: Abstract torture-test cleanup")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcutorture: Emulate userspace sojourn during call_rcu() floods · bd1bfc51
      Paul E. McKenney authored
      During an actual call_rcu() flood, there would be frequent trips to
      userspace (in-kernel call_rcu() floods must be otherwise housebroken).
      Userspace execution allows a great many things to interrupt execution,
      and rcutorture needs to also allow such interruptions.  This commit
      therefore causes call_rcu() floods to occasionally invoke schedule(),
      thus preventing spurious rcutorture failures due to other parts of the
      kernel becoming irate at the call_rcu() flood events.
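A hedged sketch of the idea (the callback name and modulus here are
illustrative, not taken from the actual patch):

	/* Sketch: during a call_rcu() flood, occasionally yield the CPU,
	 * emulating the interruptions that userspace execution would allow. */
	for (i = 0; i < n; i++) {
		call_rcu(&rhp[i].rh, flood_cb);	/* flood_cb is hypothetical */
		if ((i % 256) == 0)
			schedule();		/* emulated userspace sojourn */
	}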
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcutorture: Test TREE03 with the threadirqs kernel boot parameter · f4e83529
      Paul E. McKenney authored
      Since commit 05f41571 ("rcu: Speed up expedited GPs when interrupting
      RCU reader") in v5.0 and through v5.1, booting with the threadirqs kernel
      boot parameter caused self-deadlocks, which can be reproduced using the
      following command on an 8-CPU system:
      
      tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs "TREE03" --bootargs "threadirqs"
      
      This commit therefore adds the threadirqs kernel boot parameter to
      the TREE03 rcutorture scenario in order to more quickly detect future
      similar bugs.
      
Link: http://lkml.kernel.org/r/20190626135447.y24mvfuid5fifwjc@linutronix.de
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Joel Fernandes <joel@joelfernandes.org>
• torture: Expand last_ts variable in kvm-test-1-run.sh · 2c667e5e
      Paul E. McKenney authored
      The kvm-test-1-run.sh script says 'test -z "last_ts"' which always
      evaluates to true (AKA zero) regardless of the value of the last_ts shell
      variable.  This commit therefore inserts the needed dollar sign ("$").
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcuperf: Fix perf_type module-parameter description · b3f3886c
      Xiao Yang authored
The rcu_bh rcuperf type was removed by commit 620d2460 ("rcuperf:
      Remove the "rcu_bh" and "sched" torture types"), but it lives on in the
      MODULE_PARM_DESC() of perf_type.  This commit therefore changes that
      module-parameter description to substitute srcu for rcu_bh.
Signed-off-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Remove redundant debug_locks check in rcu_read_lock_sched_held() · 9147089b
      Joel Fernandes (Google) authored
The debug_locks flag can never be true at the end of
rcu_read_lock_sched_held() because it is already checked by the earlier
call to debug_lockdep_rcu_enabled().  This commit therefore removes this
redundant check.
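In outline, the function has roughly this shape (a hedged sketch, not
the exact kernel code):

	int rcu_read_lock_sched_held(void)
	{
		if (!debug_lockdep_rcu_enabled())	/* false when !debug_locks */
			return 1;
		/* ... remaining lockdep-based checks; a second debug_locks
		 * test down here could never change the result ... */
		return lock_is_held(&rcu_sched_lock_map) || !preemptible();
	}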
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• treewide: Rename rcu_dereference_raw_notrace() to _check() · 0a5b99f5
      Joel Fernandes (Google) authored
      The rcu_dereference_raw_notrace() API name is confusing.  It is equivalent
      to rcu_dereference_raw() except that it also does sparse pointer checking.
      
There are only a few users of rcu_dereference_raw_notrace().  This patch
renames all of them to rcu_dereference_raw_check(), with the "_check()"
indicating sparse checking.
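For example (an illustrative fragment; the pointer being loaded is
hypothetical):

	p = rcu_dereference_raw_notrace(head->first);	/* old name */
	p = rcu_dereference_raw_check(head->first);	/* new name, same semantics */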
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Fix checkpatch warnings about parentheses. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• doc: Add rcutree.kthread_prio pointer to stallwarn.txt · 0500873d
      Paul E. McKenney authored
      This commit adds mention of the rcutree.kthread_prio kernel boot parameter
      to the discussion of how high-priority real-time tasks can result in
      RCU CPU stall warnings.  (However, this does not necessarily help when
      the high-priority real-time tasks are using dubious deadlines.)
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Change return type of rcu_spawn_one_boost_kthread() · 3545832f
      Byungchul Park authored
      The return value of rcu_spawn_one_boost_kthread() is not used any longer.
      This commit therefore changes its return type from int to void, and
      removes the cast to void from its callers.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• srcu: Avoid srcutorture security-based pointer obfuscation · 7e210a65
      Paul E. McKenney authored
Because pointer output is now obfuscated, and because what you really
want to know is whether or not the callback lists are empty, this commit
replaces the srcu_data structure's head callback pointer printout with
a single character that is "." if the callback list is empty or "C"
otherwise.
      
      This is the only remaining user of rcu_segcblist_head(), so this
      commit also removes this function's definition.  It also turns out that
      rcu_segcblist_tail() no longer has any callers, so this commit removes
      that function's definition while in the area.  They were both marked
      "Interim", and their end has come.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Add destroy_work_on_stack() to match INIT_WORK_ONSTACK() · fbad01af
      Paul E. McKenney authored
      The synchronize_rcu_expedited() function has an INIT_WORK_ONSTACK(),
      but lacks the corresponding destroy_work_on_stack().  This commit
      therefore adds destroy_work_on_stack().
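The usual pairing looks like this sketch (generic on-stack workqueue
usage with a hypothetical handler, not the exact
synchronize_rcu_expedited() code):

	struct work_struct work;

	INIT_WORK_ONSTACK(&work, my_handler);	/* my_handler is hypothetical */
	schedule_work(&work);
	flush_work(&work);		/* must complete before the frame dies */
	destroy_work_on_stack(&work);	/* pairs with INIT_WORK_ONSTACK() */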
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
• rcu: Add kernel parameter to dump trace after RCU CPU stall warning · cdc694b2
      Paul E. McKenney authored
This commit adds an rcu_cpu_stall_ftrace_dump kernel boot parameter that,
when set, causes the trace buffer to be dumped after an RCU CPU stall
warning is printed.  This kernel boot parameter is disabled by default,
maintaining compatibility with previous behavior.
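Assuming the usual "rcupdate." module-parameter prefix, enabling it
would look something like this on the kernel command line (for example,
via rcutorture's --bootargs):

	rcupdate.rcu_cpu_stall_ftrace_dump=1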
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Restore barrier() to rcu_read_lock() and rcu_read_unlock() · 1f3ebc82
      Paul E. McKenney authored
      Commit bb73c52b ("rcu: Don't disable preemption for Tiny and Tree
      RCU readers") removed the barrier() calls from rcu_read_lock() and
      rcu_write_lock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels.
      Within RCU, this commit was OK, but it failed to account for things like
      get_user() that can pagefault and that can be reordered by the compiler.
      Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock()
      can cause these page faults to migrate into RCU read-side critical
      sections, which in CONFIG_PREEMPT=n kernels could result in too-short
      grace periods and arbitrary misbehavior.  Please see commit 386afc91
      ("spinlocks and preemption points need to be at least compiler barriers")
      and Linus's commit 66be4e66 ("rcu: locking and unlocking need to
      always be at least barriers"), this last of which restores the barrier()
      call to both rcu_read_lock() and rcu_read_unlock().
      
This commit removes barrier() calls that are no longer needed given
their addition in Linus's commit noted above.  The combination of
this commit and Linus's commit effectively reverts commit bb73c52b
("rcu: Don't disable preemption for Tiny and Tree RCU readers").
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix embarrassing typo located by Alan Stern. ]
• time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint · b55bd585
      Paul E. McKenney authored
      The TASKS03 and TREE04 rcutorture scenarios produce the following
      lockdep complaint:
      
      ------------------------------------------------------------------------
      
      ================================
      WARNING: inconsistent lock state
      5.2.0-rc1+ #513 Not tainted
      --------------------------------
      inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      migration/1/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
      (____ptrval____) (tick_broadcast_lock){?...}, at: tick_broadcast_offline+0xf/0x70
      {IN-HARDIRQ-W} state was registered at:
        lock_acquire+0xb0/0x1c0
        _raw_spin_lock_irqsave+0x3c/0x50
        tick_broadcast_switch_to_oneshot+0xd/0x40
        tick_switch_to_oneshot+0x4f/0xd0
        hrtimer_run_queues+0xf3/0x130
        run_local_timers+0x1c/0x50
        update_process_times+0x1c/0x50
        tick_periodic+0x26/0xc0
        tick_handle_periodic+0x1a/0x60
        smp_apic_timer_interrupt+0x80/0x2a0
        apic_timer_interrupt+0xf/0x20
        _raw_spin_unlock_irqrestore+0x4e/0x60
        rcu_nocb_gp_kthread+0x15d/0x590
        kthread+0xf3/0x130
        ret_from_fork+0x3a/0x50
      irq event stamp: 171
      hardirqs last  enabled at (171): [<ffffffff8a201a37>] trace_hardirqs_on_thunk+0x1a/0x1c
      hardirqs last disabled at (170): [<ffffffff8a201a53>] trace_hardirqs_off_thunk+0x1a/0x1c
      softirqs last  enabled at (0): [<ffffffff8a264ee0>] copy_process.part.56+0x650/0x1cb0
      softirqs last disabled at (0): [<0000000000000000>] 0x0
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(tick_broadcast_lock);
        <Interrupt>
          lock(tick_broadcast_lock);
      
       *** DEADLOCK ***
      
      1 lock held by migration/1/14:
       #0: (____ptrval____) (clockevents_lock){+.+.}, at: tick_offline_cpu+0xf/0x30
      
      stack backtrace:
      CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.2.0-rc1+ #513
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0x5e/0x8b
       print_usage_bug+0x1fc/0x216
       ? print_shortest_lock_dependencies+0x1b0/0x1b0
       mark_lock+0x1f2/0x280
       __lock_acquire+0x1e0/0x18f0
       ? __lock_acquire+0x21b/0x18f0
       ? _raw_spin_unlock_irqrestore+0x4e/0x60
       lock_acquire+0xb0/0x1c0
       ? tick_broadcast_offline+0xf/0x70
       _raw_spin_lock+0x33/0x40
       ? tick_broadcast_offline+0xf/0x70
       tick_broadcast_offline+0xf/0x70
       tick_offline_cpu+0x16/0x30
       take_cpu_down+0x7d/0xa0
       multi_cpu_stop+0xa2/0xe0
       ? cpu_stop_queue_work+0xc0/0xc0
       cpu_stopper_thread+0x6d/0x100
       smpboot_thread_fn+0x169/0x240
       kthread+0xf3/0x130
       ? sort_range+0x20/0x20
       ? kthread_cancel_delayed_work_sync+0x10/0x10
       ret_from_fork+0x3a/0x50
      
      ------------------------------------------------------------------------
      
      To reproduce, run the following rcutorture test:
      
              tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TASKS03 TREE04"
      
      It turns out that tick_broadcast_offline() was an innocent bystander.
      After all, interrupts are supposed to be disabled throughout
      take_cpu_down(), and therefore should have been disabled upon entry to
      tick_offline_cpu() and thus to tick_broadcast_offline().  This suggests
      that one of the CPU-hotplug notifiers was incorrectly enabling interrupts,
      and leaving them enabled on return.
      
      Some debugging code showed that the culprit was sched_cpu_dying().
      It had irqs enabled after return from sched_tick_stop().  Which in turn
      had irqs enabled after return from cancel_delayed_work_sync().  Which is a
      wrapper around __cancel_work_timer().  Which can sleep in the case where
      something else is concurrently trying to cancel the same delayed work,
      and as Thomas Gleixner pointed out on IRC, sleeping is a decidedly bad
      idea when you are invoked from take_cpu_down(), regardless of the state
      you leave interrupts in upon return.
      
      Code inspection located no reason why the delayed work absolutely
      needed to be canceled from sched_tick_stop():  The work is not
      bound to the outgoing CPU by design, given that the whole point is
      to collect statistics without disturbing the outgoing CPU.
      
      This commit therefore simply drops the cancel_delayed_work_sync() from
      sched_tick_stop().  Instead, a new ->state field is added to the tick_work
      structure so that the delayed-work handler function sched_tick_remote()
      can avoid reposting itself.  A cpu_is_offline() check is also added to
      sched_tick_remote() to avoid mucking with the state of an offlined CPU
      (though it does appear safe to do so).  The sched_tick_start() and
      sched_tick_stop() functions also update ->state, and sched_tick_start()
      also schedules the delayed work if ->state indicates that it is not
      already in flight.
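A hedged sketch of the resulting repost logic (state names and details
are illustrative; the merged code may differ):

	/* Sketch: ->state lets the handler stop reposting itself. */
	enum { TICK_WORK_OFFLINE, TICK_WORK_RUNNING };	/* hypothetical names */

	static void sched_tick_remote(struct work_struct *work)
	{
		/* tw and dwork would be derived from *work via container_of(). */
		/* ... sample statistics, skipping offlined CPUs ... */
		if (atomic_read(&tw->state) == TICK_WORK_RUNNING)
			queue_delayed_work(system_unbound_wq, dwork, HZ);
	}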
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
[ paulmck: Apply Peter Zijlstra and Frederic Weisbecker atomics feedback. ]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
• lockdep: Make print_lock() address visible · 519248f3
      Paul E. McKenney authored
      Security is a wonderful thing, but so is the ability to debug based on
      lockdep warnings.  This commit therefore makes lockdep lock addresses
      visible in the clear.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Simplify rcu_note_context_switch exit from critical section · cb4dbbfa
      Joel Fernandes (Google) authored
      Because __rcu_read_unlock() can be preempted just before the call to
      rcu_read_unlock_special(), it is possible for a task to be preempted just
      before it would have fully exited its RCU read-side critical section.
      This would result in a needless extension of that critical section until
      that task was resumed, which might in turn result in a needlessly
      long grace period, needless RCU priority boosting, and needless
      force-quiescent-state actions.  Therefore, rcu_note_context_switch()
      invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when
      it detects this situation.  This action by rcu_note_context_switch()
      ends the RCU read-side critical section immediately.
      
      Of course, once the task resumes, it will invoke rcu_read_unlock_special()
      redundantly.  This is harmless because the fact that a preemption
      happened means that interrupts, preemption, and softirqs cannot
      have been disabled, so there would be no deferred quiescent state.
      While ->rcu_read_lock_nesting remains less than zero, none of the
      ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by
      the call to rcu_note_context_switch() at task-preemption time.  Therefore,
      setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.
      
Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore()
would return immediately, with one possible exception: an expedited
grace period that started just as the task was being resumed,
which could leave ->exp_deferred_qs set.  This will cause
      rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(),
      reporting the quiescent state, just as it should.  (Such an expedited
      grace period won't affect the preemption code path due to interrupts
      having already been disabled.)
      
      But when rcu_note_context_switch() invokes __rcu_read_unlock(), it
      is doing so with preemption disabled, hence __rcu_read_unlock() will
      unconditionally defer the quiescent state, only to immediately invoke
      rcu_preempt_deferred_qs(), thus immediately reporting the deferred
      quiescent state.  It turns out to be safe (and faster) to instead
      just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock()
      middleman.
      
      Because this is the invocation during the preemption (as opposed to
      the invocation just after the resume), at least one of the bits in
      ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting
      must be negative.  This means that rcu_preempt_need_deferred_qs() must
      return true, avoiding the early exit from rcu_preempt_deferred_qs().
      Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately,
      as required.
      
      This commit therefore simplifies the CONFIG_PREEMPT=y version of
      rcu_note_context_switch() by removing the "else if" branch of its
      "if" statement.  This change means that all callers that would have
      invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs()
      will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the
      rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.
      
      Cc: rcu@vger.kernel.org
      Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Make rcu_read_unlock_special() checks match raise_softirq_irqoff() · 87446b48
      Paul E. McKenney authored
Threaded interrupts provide additional interesting interactions between
RCU and raise_softirq() that can result in self-deadlocks in v5.0 through
v5.2 of the Linux kernel.  These self-deadlocks can be provoked in
susceptible kernels within a few minutes using the following rcutorture
command on an 8-CPU system:
      
      tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs "TREE03" --bootargs "threadirqs"
      
      Although post-v5.2 RCU commits have at least greatly reduced the
      probability of these self-deadlocks, this was entirely by accident.
      Although this sort of accident should be rowdily celebrated on those
      rare occasions when it does occur, such celebrations should be quickly
      followed by a principled patch, which is what this patch purports to be.
      
      The key point behind this patch is that when in_interrupt() returns
      true, __raise_softirq_irqoff() will never attempt a wakeup.  Therefore,
      if in_interrupt(), calls to raise_softirq*() are both safe and
      extremely cheap.
      
      This commit therefore replaces the in_irq() calls in the "if" statement
      in rcu_read_unlock_special() with in_interrupt() and simplifies the
      "if" condition to the following:
      
      	if (irqs_were_disabled && use_softirq &&
      	    (in_interrupt() ||
      	     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
      		raise_softirq_irqoff(RCU_SOFTIRQ);
      	} else {
      		/* Appeal to the scheduler. */
      	}
      
      The rationale behind the "if" condition is as follows:
      
      1.	irqs_were_disabled:  If interrupts are enabled, we should
      	instead appeal to the scheduler so as to let the upcoming
      	irq_enable()/local_bh_enable() do the rescheduling for us.
      2.	use_softirq: If this kernel isn't using softirq, then
      	raise_softirq_irqoff() will be unhelpful.
      3.	a.	in_interrupt(): If this returns true, the subsequent
      		call to raise_softirq_irqoff() is guaranteed not to
      		do a wakeup, so that call will be both very cheap and
      		quite safe.
      	b.	Otherwise, if !in_interrupt() the raise_softirq_irqoff()
      		might do a wakeup, which is expensive and, in some
      		contexts, unsafe.
      		i.	The "exp" (an expedited RCU grace period is being
      			blocked) says that the wakeup is worthwhile, and:
      		ii.	The !.deferred_qs says that scheduler locks
      			cannot be held, so the wakeup will be safe.
      
      Backporting this requires considerable care, so no auto-backport, please!
      
      Fixes: 05f41571 ("rcu: Speed up expedited GPs when interrupting RCU reader")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
• rcu: Simplify rcu_read_unlock_special() deferred wakeups · d143b3d1
      Paul E. McKenney authored
      In !use_softirq runs, we clearly cannot rely on raise_softirq() and
      its lightweight bit setting, so we must instead do some form of wakeup.
      In the absence of a self-IPI when interrupts are disabled, these wakeups
      can be delayed until the next interrupt occurs.  This means that calling
      invoke_rcu_core() doesn't actually do any expediting.
      
      In this case, it is better to take the "else" clause, which sets the
      current CPU's resched bits and, if there is an expedited grace period
      in flight, uses IRQ-work to force the needed self-IPI.  This commit
      therefore removes the "else if" clause that calls invoke_rcu_core().
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  5. 28 Jul, 2019 4 commits