1. 22 Jul, 2022 2 commits
    • Merge tag 'rcu-urgent.2022.07.21a' of... · 4ba1329c
      Linus Torvalds authored
      Merge tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
      
      Pull RCU fix from Paul McKenney:
       "This contains a pair of commits that fix 282d8998 ("srcu: Prevent
        expedited GPs and blocking readers from consuming CPU"), which was
        itself a fix to an SRCU expedited grace-period problem that could
        prevent kernel live patching (KLP) from completing.
      
        That SRCU fix for KLP introduced large (as in minutes) boot-time
        delays to embedded Linux kernels running on qemu/KVM. These delays
        were due to the emulation of certain MMIO operations controlling
        memory layout, which were emulated with one expedited grace period per
        access. Common configurations required thousands of boot-time MMIO
        accesses, and thus thousands of boot-time expedited SRCU grace
        periods.
      
        In these configurations, the occasional sleeps that allowed KLP to
        proceed caused excessive boot delays. These commits preserve enough
        sleeps to permit KLP to proceed, but few enough that the virtual
        embedded kernels still boot reasonably quickly.
      
        This represents a regression introduced in the v5.19 merge window, and
        the bug is causing significant inconvenience."
      
      * tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        srcu: Make expedited RCU grace periods block even less frequently
        srcu: Block less aggressively for expedited grace periods
    • mmu_gather: fix the CONFIG_MMU_GATHER_NO_RANGE case · 7fb5e508
      Linus Torvalds authored
      Sudip reports that alpha doesn't build properly, with errors like
      
        include/asm-generic/tlb.h:401:1: error: redefinition of 'tlb_update_vma_flags'
          401 | tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
              | ^~~~~~~~~~~~~~~~~~~~
        include/asm-generic/tlb.h:372:1: note: previous definition of 'tlb_update_vma_flags' with type 'void(struct mmu_gather *, struct vm_area_struct *)'
          372 | tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma) { }
      
      the cause being that we have this odd situation where some architectures
      were never converted to the newer TLB flushing interfaces that have a
      range for the flush.  Instead, people left them alone, and we have them
      select the MMU_GATHER_NO_RANGE config option to make the tlb header
      files account for this.
      
      Peter Zijlstra cleaned some of these nasty header file games up in
      commits
      
        1e9fdf21 ("mmu_gather: Remove per arch tlb_{start,end}_vma()")
        18ba064e ("mmu_gather: Let there be one tlb_{start,end}_vma() implementation")
      
      but tlb_update_vma_flags() was left alone, and then commit b67fbebd
      ("mmu_gather: Force tlb-flush VM_PFNMAP vmas") ended up removing only
      _one_ of the two stale duplicate dummy inline functions.
      
      This removes the other stale one.
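
The failure mode can be reproduced outside the kernel. What follows is a minimal user-space sketch, not the contents of include/asm-generic/tlb.h: the struct layouts, the VM_EXEC value, and the demo_exec_flag_after_update() helper are stand-ins invented for illustration. It shows the intended shape after the fix, with exactly one empty tlb_update_vma_flags() in the no-range branch.

```c
#include <stdbool.h>

/* Stand-in definitions: the real kernel structures carry many more fields. */
#define VM_EXEC 0x4UL /* illustrative flag value for this sketch */

struct vm_area_struct { unsigned long vm_flags; };
struct mmu_gather { bool vma_exec; };

/* Assumption for this sketch: the architecture selects MMU_GATHER_NO_RANGE. */
#define CONFIG_MMU_GATHER_NO_RANGE 1

#ifdef CONFIG_MMU_GATHER_NO_RANGE
/* No ranged flush available: one (and only one) empty definition. */
static inline void
tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma) { }
#else
/* Ranged flush available: record what kind of mapping is being torn down. */
static inline void
tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
}
#endif

/* Hypothetical helper: returns the exec flag after an update on an exec VMA. */
static int demo_exec_flag_after_update(void)
{
	struct mmu_gather tlb = { .vma_exec = false };
	struct vm_area_struct vma = { .vm_flags = VM_EXEC };

	tlb_update_vma_flags(&tlb, &vma);
	return tlb.vma_exec;
}
```

Pasting a second copy of the empty definition into the no-range branch makes the compiler reject the file with a redefinition error of the kind quoted in the report.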
      
      Somebody braver than me should try to remove MMU_GATHER_NO_RANGE
      entirely, but it requires fixing up the oddball architectures that use
      it: alpha, m68k, microblaze, nios2 and openrisc.
      
      The fixups should be fairly straightforward ("fix the build errors it
      exposes by adding the appropriate range arguments"), but the reason this
      wasn't done in the first place is that so few people end up working on
      those architectures.  But it could be done one architecture at a time,
      hint, hint.
      
      Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
      Fixes: b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
      Link: https://lore.kernel.org/all/YtpXh0QHWwaEWVAY@debian/
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 21 Jul, 2022 8 commits
  3. 20 Jul, 2022 28 commits
  4. 19 Jul, 2022 2 commits
    • srcu: Make expedited RCU grace periods block even less frequently · 4f2bfd94
      Neeraj Upadhyay authored
      The purpose of commit 282d8998 ("srcu: Prevent expedited GPs
      and blocking readers from consuming CPU") was to prevent a long
      series of never-blocking expedited SRCU grace periods from blocking
      kernel-live-patching (KLP) progress.  Although it was successful, it also
      resulted in excessive boot times on certain embedded workloads running
      under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
      means increasing the boot time up into the three-to-four minute range.
      This increase in boot time was due to the more than 6000 back-to-back
      invocations of synchronize_srcu_expedited() within the KVM host OS, which
      in turn resulted from qemu's emulation of a long series of MMIO accesses.
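
To see why back-to-back grace periods add up, the cost can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: the function names are hypothetical, and the number of sleeps per grace period is an input because the commit message does not give it.

```c
/* Length of one jiffy in milliseconds for a given CONFIG_HZ value (hz != 0). */
static unsigned long jiffy_ms(unsigned long hz)
{
	return 1000UL / hz;
}

/*
 * Estimated added delay, in milliseconds, when each of nr_gps grace periods
 * sleeps sleeps_per_gp times and every sleep lasts one jiffy.
 */
static unsigned long added_delay_ms(unsigned long nr_gps,
				    unsigned long sleeps_per_gp,
				    unsigned long hz)
{
	return nr_gps * sleeps_per_gp * jiffy_ms(hz);
}
```

With HZ=250 a jiffy is 4 ms, so added_delay_ms(6000, 1, 250) gives 24000 ms: even a single one-jiffy sleep per grace period costs about 24 seconds across 6000 grace periods.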
      
      Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
      periods") did not significantly help this particular use case.
      
      Zhangfei Gao and Shameerali Kolothum Thodi ran experiments varying the
      value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various
      non-sleeping per-phase counts on a system with preemption enabled,
      and observed the following boot times:
      
      +──────────────────────────+────────────────+
      | SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
      +──────────────────────────+────────────────+
      | 100                      | 30.053         |
      | 150                      | 25.151         |
      | 200                      | 20.704         |
      | 250                      | 15.748         |
      | 500                      | 11.401         |
      | 1000                     | 11.443         |
      | 10000                    | 11.258         |
      | 1000000                  | 11.154         |
      +──────────────────────────+────────────────+
      
      Analysis of the experimental results shows additional improvements with
      CPU-bound delays approaching one jiffy in duration. This improvement was
      also seen when the number of per-phase iterations was scaled to one jiffy.
      
      This commit therefore scales the number of non-sleeping polls per
      grace-period phase so that they extend for about one jiffy. In addition,
      the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
      replaced with a simple check for an expedited grace period.  This change
      schedules callback invocation immediately after expedited grace periods
      complete, which results in greatly improved boot times.  Testing done
      by Marc and Zhangfei confirms that this change recovers most of the
      boot-time performance degradation; for the CONFIG_HZ_250 configuration,
      specifically, boot times improve from 3m50s to 41s on Marc's setup
      and from 2m40s to ~9.7s on Zhangfei's setup.
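
The two changes described above can be sketched in user-space C. This is an illustration of the idea, not the code in kernel/rcu/srcutree.c: the function names, the clamp bounds POLLS_LO and POLLS_HI, and the per-poll retry delay passed in by the caller are all assumptions made for this example.

```c
#include <stdbool.h>

#define USEC_PER_SEC 1000000UL

/* Hypothetical bounds on the number of non-sleeping polls per phase. */
#define POLLS_LO 3UL
#define POLLS_HI 1000UL

static unsigned long clamp_ul(unsigned long val, unsigned long lo,
			      unsigned long hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Number of back-to-back non-sleeping polls that together span roughly one
 * jiffy, given the tick rate and the delay between polls in microseconds.
 * Both arguments must be nonzero.
 */
static unsigned long nodelay_polls_per_phase(unsigned long hz,
					     unsigned long retry_delay_us)
{
	unsigned long jiffy_us = USEC_PER_SEC / hz;

	return clamp_ul(jiffy_us / retry_delay_us, POLLS_LO, POLLS_HI);
}

/*
 * Delay before callback invocation at grace-period end: none after an
 * expedited grace period, the usual delay otherwise.
 */
static unsigned long callback_delay_jiffies(bool expedited,
					    unsigned long normal_delay)
{
	return expedited ? 0UL : normal_delay;
}
```

Assuming a 5-microsecond delay between polls, HZ=250 yields 800 polls per phase and HZ=1000 yields 200, so the non-sleeping window stays near one jiffy regardless of the tick rate.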
      
      In addition to the changes to the default per-phase delays, this
      change adds three new kernel parameters: srcutree.srcu_max_nodelay,
      srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
      These allow users to configure the SRCU grace-period scanning delays in
      order to react more quickly to additional use cases.
      
      Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
      Fixes: 282d8998 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
      Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
      Reported-by: yueluck <yueluck@163.com>
      Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
      Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • srcu: Block less aggressively for expedited grace periods · 8f870e6e
      Paul E. McKenney authored
      Commit 282d8998 ("srcu: Prevent expedited GPs and blocking readers
      from consuming CPU") fixed a problem where a long-running expedited SRCU
      grace period could block kernel live patching.  It did so by giving up
      on expediting once a given SRCU expedited grace period grew too old.
      
      Unfortunately, this added excessive delays to boots of virtual embedded
      systems specifying "-bios QEMU_EFI.fd" to qemu.  This commit therefore
      makes the transition away from expediting less aggressive, increasing
      the per-grace-period phase number of non-sleeping polls of readers from
      one to three and increasing the required grace-period age from one jiffy
      (actually from zero to one jiffies) to two jiffies (actually from one
      to two jiffies).
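
The effect of loosening the two thresholds can be modeled in a few lines. This is a simplified, hypothetical model rather than the kernel's logic: the struct, its field names, and the strict less-than comparisons are assumptions chosen to mirror the "one to three polls" and "one to two jiffies" description.

```c
#include <stdbool.h>

/* Hypothetical limits governing when expedited polling must start sleeping. */
struct expedite_limits {
	unsigned long max_nodelay_polls; /* non-sleeping reader polls per phase */
	unsigned long max_gp_age;        /* grace-period age limit, in jiffies */
};

/* Limits as described before and after this commit. */
static const struct expedite_limits limits_before = { 1, 1 };
static const struct expedite_limits limits_after  = { 3, 2 };

/*
 * A poll may proceed without sleeping only while both the per-phase poll
 * budget and the grace-period age limit have not yet been reached.
 */
static bool may_poll_without_sleeping(const struct expedite_limits *lim,
				      unsigned long polls_done,
				      unsigned long gp_age_jiffies)
{
	return polls_done < lim->max_nodelay_polls &&
	       gp_age_jiffies < lim->max_gp_age;
}
```

In this model, a grace period that is one jiffy old and has already polled once would start sleeping under the old limits but keeps polling under the new ones.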
      
      Fixes: 282d8998 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
      Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
      Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/