1. 22 Jan, 2021 7 commits
    • Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b',... · 0d2460ba
      Paul E. McKenney authored
      Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b', 'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD
      
      doc.2021.01.06a: Documentation updates.
      fixes.2021.01.04b: Miscellaneous fixes.
      kfree_rcu.2021.01.04a: kfree_rcu() updates.
      mmdumpobj.2021.01.22a: Dump allocation point for memory blocks.
      nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths.
      rt.2021.01.04a: Real-time updates.
      stall.2021.01.06a: RCU CPU stall warning updates.
      torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API.
      tortureall.2021.01.06a: Torture-test script updates.
      0d2460ba
    • percpu_ref: Dump mem_dump_obj() info upon reference-count underflow · 3375efed
      Paul E. McKenney authored
      Reference-count underflow for percpu_ref is detected in the RCU callback
      percpu_ref_switch_to_atomic_rcu(), and the resulting warning does not
      print anything allowing easy identification of which percpu_ref use
      case is underflowing.  This is of course not normally a problem when
      developing a new percpu_ref use case because it is most likely that
      the problem resides in this new use case.  However, when deploying a
      new kernel to a large set of servers, the underflow might well be a new
      corner case in any of the old percpu_ref use cases.
      
      This commit therefore calls mem_dump_obj() to dump out any additional
      available information on the underflowing percpu_ref instance.
      
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      3375efed
    • rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback · b4b7914a
      Paul E. McKenney authored
      The debug-object double-free checks in __call_rcu() print out the
      RCU callback function, which is usually sufficient to track down the
      double free.  However, all uses of things like queue_rcu_work() will
      have the same RCU callback function (rcu_work_rcufn() in this case),
      so a diagnostic message for a double queue_rcu_work() needs more than
      just the callback function.
      
      This commit therefore calls mem_dump_obj() to dump out any additional
      available information on the double-freed callback.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b4b7914a
    • mm: Make mem_obj_dump() vmalloc() dumps include start and length · bd34dcd4
      Paul E. McKenney authored
      This commit adds the starting address and number of pages to the vmalloc()
      information dumped by way of vmalloc_dump_obj().
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      bd34dcd4
    • mm: Make mem_dump_obj() handle vmalloc() memory · 98f18083
      Paul E. McKenney authored
      This commit adds vmalloc() support to mem_dump_obj().  Note that the
      vmalloc_dump_obj() function combines the checking and dumping, in
      contrast with the split between kmem_valid_obj() and kmem_dump_obj().
      The reason for the difference is that the checking in the vmalloc()
      case involves acquiring a global lock, and redundant acquisitions of
      global locks should be avoided, even on not-so-fast paths.
      
      Note that this change causes on-stack variables to be reported as
      vmalloc() storage from kernel_clone() or similar, depending on the degree
      of inlining that your compiler does.  This is likely more helpful than
      the earlier "non-paged (local) memory".
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      98f18083
    • mm: Make mem_dump_obj() handle NULL and zero-sized pointers · b70fa3b1
      Paul E. McKenney authored
      This commit makes mem_dump_obj() call out NULL and zero-sized pointers
      specially instead of classifying them as non-paged memory.
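
      The special-casing described above can be sketched in userspace C.
      This is only an illustration: classify_ptr() and ptr_class_name()
      are hypothetical helpers, the strings are placeholders rather than
      the kernel's actual output, and ZERO_SIZE_PTR is assumed to have the
      same value as the kernel's definition of that name.

```c
#include <stddef.h>

/* Assumption: same value as the kernel's ZERO_SIZE_PTR, which is the
 * non-NULL pointer handed back for zero-size allocations. */
#define ZERO_SIZE_PTR ((void *)16)

enum ptr_class { PTR_NULL, PTR_ZERO_SIZE, PTR_OTHER };

/* Hypothetical helper: weed out the two special pointers before any
 * attempt is made to look up the page that the pointer refers to. */
static enum ptr_class classify_ptr(const void *p)
{
	if (p == NULL)
		return PTR_NULL;
	if (p == ZERO_SIZE_PTR)
		return PTR_ZERO_SIZE;
	return PTR_OTHER;
}

/* Placeholder descriptions for each class. */
static const char *ptr_class_name(enum ptr_class c)
{
	switch (c) {
	case PTR_NULL:
		return "NULL pointer";
	case PTR_ZERO_SIZE:
		return "zero-size pointer";
	default:
		return "needs further lookup";
	}
}
```

      Only pointers in the PTR_OTHER class would go on to the slab,
      vmalloc(), or non-paged checks.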
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b70fa3b1
    • mm: Add mem_dump_obj() to print source of memory block · 8e7f37f2
      Paul E. McKenney authored
      There are kernel facilities such as per-CPU reference counts that give
      error messages in generic handlers or callbacks, whose messages are
      unenlightening.  In the case of per-CPU reference-count underflow, this
      is not a problem when creating a new use of this facility because in that
      case the bug is almost certainly in the code implementing that new use.
      However, trouble arises when deploying across many systems, which might
      exercise corner cases that were not seen during development and testing.
      Here, it would be really nice to get some kind of hint as to which of
      several uses the underflow was caused by.
      
      This commit therefore exposes a mem_dump_obj() function that takes
      a pointer to memory (which must still be allocated if it has been
      dynamically allocated) and prints available information on where that
      memory came from.  This pointer can reference the middle of the block as
      well as the beginning of the block, as needed by things like RCU callback
      functions and timer handlers that might not know where the beginning of
      the memory block is.  These functions and handlers can use mem_dump_obj()
      to print out better hints as to where the problem might lie.
      
      The information printed can depend on kernel configuration.  For example,
      the allocation return address can be printed only for slab and slub,
      and even then only when the necessary debug has been enabled.  For slab,
      build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
      to the next power of two or use the SLAB_STORE_USER flag when creating the
      kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
      boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
      if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
      to enable printing of the allocation-time stack trace.
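
      The idea of accepting a pointer into the middle of a block can be
      modeled in userspace as follows.  This is a sketch under stated
      assumptions, not the kernel implementation: struct alloc_rec,
      find_owner(), and dump_obj() are hypothetical names, and a simple
      table of live allocations stands in for the slab metadata.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one live allocation and where it was made. */
struct alloc_rec {
	uintptr_t start;	/* first byte of the block */
	size_t len;		/* block length in bytes */
	const char *site;	/* stand-in for the allocation return address */
};

/* Return the record whose block contains addr, or NULL if none does.
 * The address may point anywhere inside the block. */
static const struct alloc_rec *find_owner(const struct alloc_rec *tbl,
					  size_t n, uintptr_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr - tbl[i].start < tbl[i].len)
			return &tbl[i];
	return NULL;
}

/* Format provenance for addr into buf.  Returns the offset of addr
 * within its block, or -1 if no block in the table contains it. */
static long dump_obj(const struct alloc_rec *tbl, size_t n, uintptr_t addr,
		     char *buf, size_t buflen)
{
	const struct alloc_rec *r = find_owner(tbl, n, addr);

	if (!r) {
		snprintf(buf, buflen, "unknown memory");
		return -1;
	}
	snprintf(buf, buflen, "block start=%#lx len=%zu allocated at %s",
		 (unsigned long)r->start, r->len, r->site);
	return (long)(addr - r->start);
}
```

      A callback handed a pointer to a structure embedded 16 bytes into
      its enclosing block would get an offset of 16 back, along with the
      recorded allocation site of that enclosing block.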
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
      [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
      [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
      [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
      [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
      [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8e7f37f2
  2. 12 Jan, 2021 1 commit
  3. 07 Jan, 2021 32 commits
    • torture: Maintain torture-specific set of CPUs-online books · 1afb95fe
      Paul E. McKenney authored
      The TREE01 rcutorture scenario intentionally creates confusion as to the
      number of available CPUs by specifying the "maxcpus=8 nr_cpus=43" kernel
      boot parameters.  This can disable rcutorture's load shedding, which
      currently uses num_online_cpus(), which would count the extra 35 CPUs.
      However, the rcutorture guest OS will be provisioned with only 8 CPUs,
      which means that rcutorture will present full load even when all but one
      of the original 8 CPUs are offline.  This can result in spurious errors
      due to extreme overloading of that single remaining CPU.
      
      This commit therefore keeps a separate set of books on the number of
      usable online CPUs, so that torture_num_online_cpus() is used for load
      shedding instead of num_online_cpus().  Note that initial sizing must
      use num_online_cpus() because torture_num_online_cpus() will return
      NR_CPUS until shortly after torture_onoff_init() is invoked.
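
      The separate set of books can be pictured with the following
      userspace sketch.  All names here are hypothetical, NR_CPUS is set
      to 43 only to match the example above, and the load-shedding policy
      in reader_should_run() is an assumed one chosen for illustration.

```c
#define NR_CPUS 43	/* assumption: matches the nr_cpus=43 example */

/* Separate books: reads as NR_CPUS until explicitly initialized. */
static int torture_online_cpus = NR_CPUS;

/* Hypothetical initializer: record how many CPUs can really run. */
static void torture_books_init(int usable)
{
	torture_online_cpus = usable;
}

/* Called by the torture test's own hotplug code on each transition. */
static void torture_books_offline(void)
{
	torture_online_cpus--;
}

static void torture_books_online(void)
{
	torture_online_cpus++;
}

static int torture_num_online(void)
{
	return torture_online_cpus;
}

/* Assumed load-shedding policy: reader number id runs only while more
 * than id CPUs are online. */
static int reader_should_run(int id, int online)
{
	return online > id;
}
```

      With 8 usable CPUs and 7 of them offlined, the books say 1, so
      reader 5 is shed.  A count inflated by 35 phantom CPUs would say 36
      and keep that reader running.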
      Reported-by: Frederic Weisbecker <frederic@kernel.org>
      [ paulmck: Apply feedback from kernel test robot. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      1afb95fe
    • torture: Clean up after torture-test CPU hotplugging · 0b962c8f
      Paul E. McKenney authored
      This commit puts all CPUs back online at the end of a torture test,
      and also unconditionally puts them online at the beginning of the test,
      rather than just in the case of built-in tests.  This allows torture tests
      to behave in a predictable manner, whether built-in or based on modules.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      0b962c8f
    • rcutorture: Make object_debug also double call_rcu() heap object · edf7b841
      Paul E. McKenney authored
      This commit provides a test for call_rcu() printing the allocation address
      of a double-freed callback by double-freeing a callback allocated via
      kmalloc().  However, this commit does not depend on any other commit.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      edf7b841
    • torture: Throttle VERBOSE_TOROUT_*() output · 8a67a20b
      Paul E. McKenney authored
      This commit adds kernel boot parameters torture.verbose_sleep_frequency
      and torture.verbose_sleep_duration, which allow VERBOSE_TOROUT_*() output
      to be throttled with periodic sleeps on large systems.
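
      One way such throttling could work is sketched below in userspace C.
      The two variables borrow the names of the boot parameters above, but
      the counter, the verbose_sleep_needed() helper, and the treatment of
      zero as "disabled" are assumptions made for this illustration.  The
      units of the duration are left abstract.

```c
/* Stand-ins for the two boot parameters; zero disables throttling
 * (an assumption of this sketch). */
static int verbose_sleep_frequency;	/* sleep once per this many messages */
static int verbose_sleep_duration;	/* length of each sleep */

static unsigned long verbose_sleep_counter;

/* Call once per verbose message.  Returns the length of the sleep the
 * caller should take, or 0 if no sleep is due. */
static int verbose_sleep_needed(void)
{
	if (verbose_sleep_frequency <= 0 || verbose_sleep_duration <= 0)
		return 0;
	if (++verbose_sleep_counter % (unsigned long)verbose_sleep_frequency)
		return 0;
	return verbose_sleep_duration;
}
```

      With a frequency of 3, every third message is followed by a sleep,
      giving the console a chance to drain.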
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8a67a20b
    • torture: Make refscale throttle high-rate printk()s · 414c116e
      Paul E. McKenney authored
      This commit adds a short delay for verbose_batched-throttled printk()s
      to further decrease console flooding.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      414c116e
    • rcutorture: Use hrtimers for reader and writer delays · 1eba0ef9
      Paul E. McKenney authored
      This commit replaces schedule_timeout_uninterruptible() and
      schedule_timeout_interruptible() with torture_hrtimeout_us() and
      torture_hrtimeout_jiffies() to avoid timer-wheel synchronization.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      1eba0ef9
    • torture: Make stutter use torture_hrtimeout_*() functions · ed24affa
      Paul E. McKenney authored
      This commit saves a few lines of code by making the stutter_wait()
      and torture_stutter() functions use torture_hrtimeout_jiffies() and
      torture_hrtimeout_us().
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      ed24affa
    • rcutorture: Use torture_hrtimeout_jiffies() to avoid busy-waits · ea31fd9c
      Paul E. McKenney authored
      Because rcu_torture_writer() and rcu_torture_fakewriter() predate
      hrtimers, they do timer-wheel-decoupled timed waits by using the
      timer-wheel-based schedule_timeout_interruptible() functions in
      conjunction with a random udelay()-based wait.  This latter unnecessarily
      burns CPU time, so this commit instead uses torture_hrtimeout_jiffies()
      to decouple from the timer wheels without busy-waiting.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      ea31fd9c
    • torture: Add fuzzed hrtimer-based sleep functions · ae19aaaf
      Paul E. McKenney authored
      This commit adds torture_hrtimeout_ns(), torture_hrtimeout_us(),
      torture_hrtimeout_ms(), torture_hrtimeout_jiffies(), and
      torture_hrtimeout_s(), each of which uses hrtimers to block for a fuzzed
      time interval.  These functions are intended to be used by the various
      torture tests to decouple wakeups from the timer wheel, thus providing
      more opportunity for Murphy to insert destructive race conditions.
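
      The fuzzing itself amounts to adding a bounded random offset to a
      base interval.  The userspace sketch below computes only the fuzzed
      interval and leaves the actual blocking to the caller.  The names,
      the signatures, and the small linear congruential generator are
      assumptions of this sketch and do not reproduce the kernel
      functions named above.

```c
#include <stdint.h>

/* Tiny generator standing in for the torture tests' own random-number
 * state (an assumption of this sketch). */
struct fuzz_rand {
	uint64_t state;
};

static uint64_t fuzz_next(struct fuzz_rand *r)
{
	r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return r->state >> 33;
}

/* Base interval plus a random offset in [0, fuzz_ns).  No fuzz is
 * applied if fuzz_ns is zero or no random state is supplied. */
static uint64_t fuzzed_timeout_ns(uint64_t base_ns, uint64_t fuzz_ns,
				  struct fuzz_rand *r)
{
	if (r && fuzz_ns)
		base_ns += fuzz_next(r) % fuzz_ns;
	return base_ns;
}

/* Coarser-unit wrapper: the base is in microseconds, and the fuzz stays
 * in nanoseconds so that wakeups do not line up on coarse boundaries. */
static uint64_t fuzzed_timeout_us(uint64_t base_us, uint64_t fuzz_ns,
				  struct fuzz_rand *r)
{
	return fuzzed_timeout_ns(base_us * 1000ULL, fuzz_ns, r);
}
```

      A userspace caller could hand the result to nanosleep(), whereas the
      kernel functions described above block on an hrtimer instead.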
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      ae19aaaf
    • rcutorture: Make rcu_torture_fakewriter() use blocking wait primitives · 682189a3
      Paul E. McKenney authored
      Full testing of the new SRCU polling API requires that the fake
      writers also use it in order to test concurrent calls to all of the API
      members, especially start_poll_synchronize_srcu().  This commit makes
      rcu_torture_fakewriter() use all available blocking grace-period-wait
      primitives available from the RCU flavor under test.
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      682189a3
    • rcutorture: Make synctype[] and nsynctype be static global · 18fbf307
      Paul E. McKenney authored
      Full testing of the new SRCU polling API requires that the fake writers
      also use it in order to test concurrent calls to all of the API members,
      especially start_poll_synchronize_srcu().  This commit prepares the ground
      for this by making the synctype[] and nsynctype variables be static
      globals so that the rcu_torture_fakewriter() function can access them.
      Initialization of these variables is moved from rcu_torture_writer()
      to a new rcu_torture_write_types() function that is invoked from
      rcu_torture_init() just before the first writer kthread is spawned.
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      18fbf307
    • rcutorture: Require entire stutter period be post-boot · 12a910e3
      Paul E. McKenney authored
      Currently, the rcu_torture_writer() function checks that all required
      grace periods elapse during a stutter interval, which is a multi-second
      time period during which the test load is removed.  However, this check
      is suppressed during early boot (that is, before init is spawned) in
      order to avoid false positives that otherwise occur due to heavy load
      on the single boot CPU.
      
      Unfortunately, this approach is insufficient.  It is possible that the
      stutter interval might end just as init is spawned, so that early boot
      conditions prevailed during almost the entire stutter interval.
      
      This commit therefore takes a snapshot of boot-complete state just
      before the stutter interval, thus suppressing the check for failure to
      complete grace periods unless the entire stutter interval took place
      after early boot.
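
      The difference between checking boot state at the end of the
      stutter interval and snapshotting it at the start can be shown with
      a small userspace model.  The structure and function names are
      hypothetical and the grace-period check is reduced to a boolean.

```c
#include <stdbool.h>

/* Hypothetical snapshot taken just before a stutter interval begins. */
struct stutter_check {
	bool booted_at_start;
};

static void stutter_begin(struct stutter_check *sc, bool boot_complete_now)
{
	sc->booted_at_start = boot_complete_now;
}

/* Snapshot approach: complain about incomplete grace periods only if
 * boot had already completed when the interval started. */
static bool stutter_should_complain(const struct stutter_check *sc,
				    bool gps_completed)
{
	return sc->booted_at_start && !gps_completed;
}

/* Earlier approach, for contrast: consult boot state only at the end. */
static bool old_should_complain(bool boot_complete_at_end, bool gps_completed)
{
	return boot_complete_at_end && !gps_completed;
}
```

      If boot completes just as the interval ends, the earlier approach
      complains even though early-boot conditions prevailed throughout,
      while the snapshot approach stays quiet.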
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      12a910e3
    • refscale: Allow summarization of verbose output · e76506f0
      Paul E. McKenney authored
      The refscale test prints enough per-kthread console output to provoke RCU
      CPU stall warnings on large systems.  This commit therefore allows this
      output to be summarized.  For example, the refscale.verbose_batched=32
      boot parameter would cause only every 32nd line of output to be logged.
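
      The effect of such a setting can be illustrated with a few lines of
      userspace C.  The should_log() and count_logged() helpers are
      hypothetical, and treating values of zero or less as "log
      everything" is an assumption of this sketch.

```c
/* Hypothetical filter: with batching off (<= 0), log every line;
 * otherwise log only every verbose_batched-th line. */
static int should_log(int line, int verbose_batched)
{
	return verbose_batched <= 0 || line % verbose_batched == 0;
}

/* Count how many of lines 1..nlines would reach the console. */
static int count_logged(int nlines, int verbose_batched)
{
	int n = 0;

	for (int line = 1; line <= nlines; line++)
		if (should_log(line, verbose_batched))
			n++;
	return n;
}
```

      For 128 lines of per-kthread output, a setting of 32 leaves 4 lines
      on the console instead of 128.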
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      e76506f0
    • torture: Compress KASAN vmlinux files · e3e1a997
      Paul E. McKenney authored
      The sizes of vmlinux files built with KASAN enabled can approach a full
      gigabyte, which can result in disk overflow sooner rather than later.
      Fortunately, the xz command compresses them by almost an order of
      magnitude.  This commit therefore uses xz to compress vmlinux files built
      by torture.sh with KASAN enabled.
      
      However, xz is not the fastest thing in the world.  In fact, it is way
      slower than rotating-rust mass storage.  This commit therefore also adds a
      --compress-kasan-vmlinux argument to specify the degree of xz concurrency,
      which defaults to using all available CPUs if there are that many files in
      need of compression.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      e3e1a997
    • torture: Add --kcsan-kmake-arg to torture.sh for KCSAN · c54e4138
      Paul E. McKenney authored
      In 2020, running KCSAN often requires careful choice of compiler.
      This commit therefore adds a --kcsan-kmake-arg parameter to torture.sh
      to allow specifying (for example) "CC=clang" to the kernel build process
      to correctly build a KCSAN-enabled kernel.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      c54e4138
    • torture: Add command and results directory to torture.sh log · c66c0f94
      Paul E. McKenney authored
      This commit adds the command and arguments to the torture.sh log file, and
      also outputs the results directory.  This latter allows impatient users
      to quickly find the results that are being generated by the current run.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      c66c0f94
    • torture: Allow scenarios to be specified to torture.sh · 8847bd49
      Paul E. McKenney authored
      This commit adds --configs-rcutorture, --configs-locktorture, and
      --configs-scftorture arguments to torture.sh, allowing the desired
      set of scenarios to be passed to each.  The default for each has been
      changed from a large-system-appropriate set to just CFLIST for each.
      Users are encouraged to create scripts that provide appropriate settings
      for their specific systems.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8847bd49
    • torture: Drop log.long generation from torture.sh · 5ae5f745
      Paul E. McKenney authored
      Now that kvm.sh puts all the relevant details in the "log" file,
      there is no need for torture.sh to generate a separate "log.long"
      file.  This commit therefore drops this from torture.sh.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5ae5f745
    • torture: Make torture.sh refuse to do zero-length runs · c679d90b
      Paul E. McKenney authored
      This commit causes torture.sh to check for zero-length runs and to take
      the cowardly option of refusing to run them, logging its cowardice for
      later inspection.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      c679d90b
    • torture: Make torture.sh throttle VERBOSE_TOROUT_*() for refscale · d97addc4
      Paul E. McKenney authored
      This commit causes torture.sh to use the torture.verbose_sleep_frequency
      kernel boot parameter to throttle verbose refscale output on large systems.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      d97addc4
    • torture: Make torture.sh allmodconfig retain and label output · 1fe9cef4
      Paul E. McKenney authored
      This commit places "---" markers in the torture.sh script's allmodconfig
      output, and uses ">>" to avoid overwriting earlier output from this
      build test.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      1fe9cef4
    • torture: Create doyesno helper function for torture.sh · c9a9d8e8
      Paul E. McKenney authored
      This commit saves a few lines of code by creating a doyesno helper bash
      function for argument parsing.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      c9a9d8e8
    • torture: Make torture.sh refscale runs use verbose_batched module parameter · 264da483
      Paul E. McKenney authored
      On large systems, the refscale printk() rate can overrun the file system's
      ability to accept console log messages.  This commit therefore uses the
      new verbose_batched module parameter to rate-limit some of the higher-rate
      printk() calls.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      264da483
    • torture: Make torture.sh rcuscale and refscale deal with allmodconfig · 7a99487c
      Paul E. McKenney authored
      The .mod.c files created by allmodconfig builds interfere with the approach
      torture.sh uses to enumerate types of rcuscale and refscale runs.  This
      commit therefore tightens the pattern matching to avoid this interference.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      7a99487c
    • torture: Enable torture.sh argument checking · 532017b1
      Paul E. McKenney authored
      This commit uncomments the argument checking for the --duration argument
      to torture.sh.  While in the area, it also corrects the duration units
      from seconds to minutes.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      532017b1
    • torture: Auto-size SCF and scaling runs based on number of CPUs · 69d2b33e
      Paul E. McKenney authored
      This commit improves torture.sh flexibility by autoscaling the number
      of CPUs to be used in variable-CPUs torture tests, including scftorture,
      refscale, rcuscale, and kvfree.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      69d2b33e
    • torture: Add "make allmodconfig" to torture.sh · a115a775
      Paul E. McKenney authored
      This commit adds the ability to do "make allmodconfig" to torture.sh,
      given that normal rcutorture runs do not normally catch missing exports.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      a115a775
    • torture: Remove use of "eval" in torture.sh · 197220d4
      Paul E. McKenney authored
      The bash "eval" command enables Bobby Tables attacks, which might not
      be a concern in torture testing by themselves, but one could imagine
      these combined with a cut-and-paste attack.  This commit therefore gets
      rid of them.  This comes at a price in terms of bash quoting not working
      nicely, so the "--bootargs" argument lists are now passed to torture_one
      via a bash-variable side channel.  This might be a bit ugly, but it will
      also allow torture.sh to grow its own --bootargs parameter.
      
      While in the area, add proper header comments for the bash functions.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      197220d4
    • torture: Make torture.sh use common time-duration bash functions · 1adb5d6b
      Paul E. McKenney authored
      This commit makes torture.sh use the new bash functions get_starttime()
      and get_starttime_duration() created for kvm.sh.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      1adb5d6b
    • torture: Add torture.sh torture-everything script · bfc19c13
      Paul E. McKenney authored
      Although tailoring a specific set of kvm.sh runs has served rcutorture
      testing well over many years, it requires a relatively distraction-free
      environment, which is not always available.  This commit therefore
      adds a prototype torture.sh script that by default tortures pretty much
      everything the rcutorture scripting is designed to torture, and which
      can be given command-line arguments to take a more focused approach.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      bfc19c13
    • rcu: Check and report missed fqs timer wakeup on RCU stall · 683954e5
      Neeraj Upadhyay authored
      For a new grace period request, the RCU GP kthread transitions through
      the following states:
      
      a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]
      
      The RCU_GP_WAIT_GPS state is where the GP kthread waits for a request
      for a new GP.  Once it receives a request (for example, when a new RCU
      callback is queued), the GP kthread transitions to RCU_GP_DONE_GPS.
      
      b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]
      
      Grace period initialization starts in rcu_gp_init(), which records the
      start of new GP in rcu_state.gp_seq and transitions to RCU_GP_ONOFF.
      
      c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]
      
      The purpose of the RCU_GP_ONOFF state is to apply the online/offline
      information that was buffered for any CPUs that recently came online or
      went offline.  This state is maintained in per-leaf rcu_node bitmasks,
      with the buffered state in ->qsmaskinitnext and the state for the upcoming
      GP in ->qsmaskinit.  At the end of this RCU_GP_ONOFF state, each bit in
      ->qsmaskinit will correspond to a CPU that must pass through a quiescent
      state before the upcoming grace period is allowed to complete.
      
      However, a leaf rcu_node structure with an all-zeroes ->qsmaskinit
      cannot necessarily be ignored.  In preemptible RCU, there might well be
      tasks still in RCU read-side critical sections that were first preempted
      while running on one of the CPUs managed by this structure.  Such tasks
      will be queued on this structure's ->blkd_tasks list.  Only after this
      list fully drains can this leaf rcu_node structure be ignored, and even
      then only if none of its CPUs have come back online in the meantime.
      Once that happens, the ->qsmaskinit masks further up the tree will be
      updated to exclude this leaf rcu_node structure.
      
      Once the ->qsmaskinitnext and ->qsmaskinit fields have been updated
      as needed, the GP kthread transitions to RCU_GP_INIT.
      
      d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]
      
      The purpose of the RCU_GP_INIT state is to copy each ->qsmaskinit to
      the ->qsmask field within each rcu_node structure.  This copying is done
      breadth-first from the root to the leaves.  Why not just copy directly
      from ->qsmaskinitnext to ->qsmask?  Because the ->qsmaskinitnext masks
      can change in the meantime as additional CPUs come online or go offline.
      Such changes would result in inconsistencies in the ->qsmask fields up and
      down the tree, which could in turn result in too-short grace periods or
      grace-period hangs.  These issues are avoided by snapshotting the leaf
      rcu_node structures' ->qsmaskinitnext fields into their ->qsmaskinit
      counterparts, generating a consistent set of ->qsmaskinit fields
      throughout the tree, and only then copying these consistent ->qsmaskinit
      fields to their ->qsmask counterparts.
      
      Once this initialization step is complete, the GP kthread transitions
      to RCU_GP_WAIT_FQS, where it waits to do a force-quiescent-state scan
      on the one hand or for the end of the grace period on the other.
      
      e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]
      
      The RCU_GP_WAIT_FQS state waits for one of three things:  (1) An
      explicit request to do a force-quiescent-state scan, (2) The end of
      the grace period, or (3) A short interval of time, after which it
      will do a force-quiescent-state (FQS) scan.  The explicit request can
      come from rcutorture or from any CPU that has too many RCU callbacks
      queued (see the qhimark kernel parameter and the RCU_GP_FLAG_OVLD
      flag).  The aforementioned "short interval of time" is specified by the
      jiffies_till_first_fqs boot parameter for a given grace period's first
      FQS scan and by the jiffies_till_next_fqs boot parameter for later FQS scans.
      
      Either way, once the wait is over, the GP kthread transitions to
      RCU_GP_DOING_FQS.
      
      f. [RCU_GP_DOING_FQS] -> [RCU_GP_CLEANUP]
      
      The RCU_GP_DOING_FQS state performs an FQS scan.  Each such scan carries
      out two functions for any CPU whose bit is still set in its leaf rcu_node
      structure's ->qsmask field, that is, for any CPU that has not yet reported
      a quiescent state for the current grace period:
      
        i.  Report quiescent states on behalf of CPUs that have been observed
            to be idle (from an RCU perspective) since the beginning of the
            grace period.
      
        ii. If the current grace period is too old, take various actions to
            encourage holdout CPUs to pass through quiescent states, including
            enlisting the aid of any calls to cond_resched() and might_sleep(),
            and even including IPIing the holdout CPUs.
      
      These checks are skipped for any leaf rcu_node structure with an all-zero
      ->qsmask field.  However, such structures are subject to RCU priority
      boosting if there are tasks on a given structure blocking the current
      grace period.  The end of the grace period is detected when the root
      rcu_node structure's ->qsmask is zero and when there are no longer any
      preempted tasks blocking the current grace period.  (No, this last check
      is not redundant.  To see this, consider an rcu_node tree having exactly
      one structure that serves as both root and leaf.)
      
      Once the end of the grace period is detected, the GP kthread transitions
      to RCU_GP_CLEANUP.
      
      g. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]
      
      The RCU_GP_CLEANUP state marks the end of the grace period by updating the
      rcu_state structure's ->gp_seq field and also all rcu_node structures'
      ->gp_seq fields.  As before, the rcu_node tree is traversed in
      breadth-first order.  Once this update is complete, the GP kthread transitions
      to the RCU_GP_CLEANED state.
      
      h. [RCU_GP_CLEANED] -> [RCU_GP_INIT]
      
      Once in the RCU_GP_CLEANED state, the GP kthread immediately transitions
      into the RCU_GP_INIT state.
      
      i. The role of timers.
      
      If there is at least one idle CPU, and if timers are not firing, the
      transition from RCU_GP_DOING_FQS to RCU_GP_CLEANUP will never happen.
      Timers can fail to fire for a number of reasons, including issues in
      timer configuration, issues in the timer framework, and failure to handle
      softirqs (for example, when there is a storm of interrupts).  Whatever the
      reason, if the timers fail to fire, the GP kthread will never be awakened,
      resulting in RCU CPU stall warnings and eventually in OOM.
      
      However, an RCU CPU stall warning has a large number of potential causes,
      as documented in Documentation/RCU/stallwarn.rst.  This commit therefore
      adds analysis to the RCU CPU stall-warning code to emit an additional
      message if the cause of the stall is likely to be timer failure.
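
      The kind of analysis described above might look roughly like the
      following userspace sketch.  It is not the code added by this
      commit: struct gp_snapshot, FQS_TIMER_SLACK, and
      likely_missed_fqs_timer() are hypothetical, the state names are
      taken from the list above with arbitrary numeric values, and counter
      wraparound is ignored for brevity.

```c
#include <stdbool.h>

/* State names from the list above; the numeric values are arbitrary. */
enum gp_state {
	RCU_GP_WAIT_GPS,
	RCU_GP_DONE_GPS,
	RCU_GP_ONOFF,
	RCU_GP_INIT,
	RCU_GP_WAIT_FQS,
	RCU_GP_DOING_FQS,
	RCU_GP_CLEANUP,
	RCU_GP_CLEANED,
};

/* Hypothetical snapshot of what stall-warning code could observe. */
struct gp_snapshot {
	enum gp_state state;		/* where the GP kthread last was */
	unsigned long now;		/* current time, in ticks */
	unsigned long fqs_deadline;	/* when the FQS wait should end */
	bool kthread_sleeping;		/* GP kthread still blocked? */
};

/* Assumed slack before an overdue timer is considered suspicious. */
#define FQS_TIMER_SLACK 25UL

/* Timer failure is the likely culprit if the GP kthread is still asleep
 * in its FQS wait well after the timer should have awakened it. */
static bool likely_missed_fqs_timer(const struct gp_snapshot *s)
{
	return s->state == RCU_GP_WAIT_FQS &&
	       s->kthread_sleeping &&
	       s->now > s->fqs_deadline &&
	       s->now - s->fqs_deadline > FQS_TIMER_SLACK;
}
```

      A stall in any other state, or one in which the kthread has already
      been awakened, would be left to the other diagnostics.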
      Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      683954e5
    • rcu: Do any deferred nocb wakeups at CPU offline time · 147c6852
      Paul E. McKenney authored
      Because the need to wake a nocb GP kthread ("rcuog") is sometimes
      detected when wakeups cannot be done, these wakeups can be deferred.
      The wakeups are then carried out by calls to do_nocb_deferred_wakeup()
      at various safe points in the code, including RCU's idle hooks.  However,
      when a CPU goes offline, it invokes arch_cpu_idle_dead() without invoking
      any of RCU's idle hooks.
      
      This commit therefore adds a call to do_nocb_deferred_wakeup() in
      rcu_report_dead() in order to handle any deferred wakeups that have been
      requested by the outgoing CPU.
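
      The gap being closed can be modeled in userspace as follows.  The
      per-CPU flag array, the counter, and the function names are all
      hypothetical stand-ins.  The point is only that the offline path
      must flush explicitly because the idle hook never runs there.

```c
#include <stdbool.h>

#define NCPUS 4	/* arbitrary size for this sketch */

static bool wakeup_deferred[NCPUS];
static int wakeups_done;

/* Record that a wakeup is needed but cannot be done right now. */
static void defer_wakeup(int cpu)
{
	wakeup_deferred[cpu] = true;
}

/* Carry out a pending wakeup, if any.  Returns true if one was done. */
static bool do_deferred_wakeup(int cpu)
{
	if (!wakeup_deferred[cpu])
		return false;
	wakeup_deferred[cpu] = false;
	wakeups_done++;
	return true;
}

/* Idle hook: one of the safe points.  An outgoing CPU never gets here. */
static void idle_hook(int cpu)
{
	do_deferred_wakeup(cpu);
}

/* Offline path: flush explicitly, since no idle hook will run. */
static void report_dead(int cpu)
{
	do_deferred_wakeup(cpu);
}
```

      Without the call in report_dead(), a wakeup deferred on the outgoing
      CPU would remain pending with no safe point left to carry it out.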
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      147c6852