1. 13 Jul, 2024 1 commit
  2. 10 Jul, 2024 1 commit
    • swiotlb: reduce swiotlb pool lookups · 7296f230
      Michael Kelley authored
      With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
      in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
      places, the pool is found and used in one function, and then must
      be found again in the next function that is called because only the
      tlb_addr is passed as an argument. These are the six call sites:
      
      dma_direct_map_page:
       1. swiotlb_map -> swiotlb_tbl_map_single -> swiotlb_bounce
      
      dma_direct_unmap_page:
       2. dma_direct_sync_single_for_cpu -> is_swiotlb_buffer
       3. dma_direct_sync_single_for_cpu -> swiotlb_sync_single_for_cpu ->
      	swiotlb_bounce
       4. is_swiotlb_buffer
       5. swiotlb_tbl_unmap_single -> swiotlb_del_transient
       6. swiotlb_tbl_unmap_single -> swiotlb_release_slots
      
      Reduce the number of calls by finding the pool at a higher level, and
      passing it as an argument instead of searching again. A key change is
      for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
      and then pass this pool pointer to subsequent swiotlb functions.
      
      There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer
      is a swiotlb buffer before calling a swiotlb function. To reduce code
      duplication in getting the pool pointer and passing it as an argument,
      introduce inline wrappers for this pattern. The generated code is
      essentially unchanged.
      
      Since is_swiotlb_buffer() no longer returns a boolean, rename some
      functions to reflect the change:
      
       * swiotlb_find_pool() becomes __swiotlb_find_pool()
       * is_swiotlb_buffer() becomes swiotlb_find_pool()
       * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()
      
      With these changes, a round-trip map/unmap pair requires only 2 pool
      lookups (listed using the new names and wrappers):
      
      dma_direct_unmap_page:
       1. dma_direct_sync_single_for_cpu -> swiotlb_find_pool
       2. swiotlb_tbl_unmap_single -> swiotlb_find_pool
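
      The pattern described above can be sketched in user-space C. This is
      a toy model under stated assumptions, not the kernel code: struct pool,
      find_pool(), bounce(), del_transient(), release_slots() and the
      counters are hypothetical stand-ins, and the lookup is a plain array
      scan rather than the RCU-protected search used with
      CONFIG_SWIOTLB_DYNAMIC. It only illustrates how returning a pool
      pointer lets one lookup serve both as the "is this a bounce buffer?"
      test and as the argument for the functions called next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a swiotlb pool: one contiguous bounce-buffer range. */
struct pool {
    uintptr_t start;
    uintptr_t end;  /* exclusive */
};

static struct pool pools[] = {
    { 0x1000, 0x2000 },
    { 0x8000, 0xa000 },
};

/* Counters so the effect of the refactoring can be observed. */
static unsigned int lookups, synced, transient_checked, released;

/* Models the non-trivial search; returns NULL for a non-bounce address. */
static struct pool *find_pool(uintptr_t addr)
{
    lookups++;
    for (size_t i = 0; i < sizeof(pools) / sizeof(pools[0]); i++)
        if (addr >= pools[i].start && addr < pools[i].end)
            return &pools[i];
    return NULL;
}

/* Callees receive the pool from the caller instead of searching again. */
static void bounce(struct pool *pool, uintptr_t addr)
{
    assert(pool && addr >= pool->start && addr < pool->end);
    synced++;
}

static void del_transient(struct pool *pool, uintptr_t addr)
{
    assert(pool && addr >= pool->start && addr < pool->end);
    transient_checked++;
}

static void release_slots(struct pool *pool, uintptr_t addr)
{
    assert(pool && addr >= pool->start && addr < pool->end);
    released++;
}

/* Wrapper: a single lookup doubles as the "is this a bounce buffer?" test. */
static inline void sync_single_for_cpu(uintptr_t addr)
{
    struct pool *pool = find_pool(addr);

    if (pool)
        bounce(pool, addr);
}

/* Wrapper: one lookup is shared by both callees that need the pool. */
static inline void tbl_unmap_single(uintptr_t addr)
{
    struct pool *pool = find_pool(addr);

    if (pool) {
        del_transient(pool, addr);
        release_slots(pool, addr);
    }
}

/* One unmap costs two lookups, however many callees need the pool. */
void unmap_page(uintptr_t addr)
{
    sync_single_for_cpu(addr);
    tbl_unmap_single(addr);
}
```

      In this model, calling unmap_page() on an address inside a pool
      performs two lookups in total, where passing only the address down
      to each callee would need one lookup per callee plus one per test.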
      
      These changes come from noticing the inefficiencies in a code review,
      not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
      __swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
      so avoiding the redundant calls helps performance in a hot path.
      When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
      is minimal and the perf benefits are likely negligible, but no
      harm is done.
      
      No functional change is intended.
      Signed-off-by: Michael Kelley <mhklinux@outlook.com>
      Reviewed-by: Petr Tesarik <petr@tesarici.cz>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 09 Jul, 2024 1 commit
    • dma-mapping: benchmark: Don't starve others when doing the test · 54624acf
      Yicong Yang authored
      The test thread starts N benchmark kthreads and then schedules out
      until the test time has elapsed, at which point it notifies the
      benchmark kthreads to stop. The benchmark kthreads keep running
      until notified. The current implementation has a problem when the
      number of benchmark kthreads equals the number of CPUs on a
      non-preemptible kernel: the scheduler balances the kthreads across
      all CPUs, so when the test time runs out the test thread never gets
      a chance to be scheduled on any CPU and cannot notify the benchmark
      kthreads to stop.
      
      This can be easily reproduced on a VM (simulated with 16 CPUs) with
      PREEMPT_VOLUNTARY:
      estuary:/mnt$ ./dma_map_benchmark -t 16 -s 1
       rcu: INFO: rcu_sched self-detected stall on CPU
       rcu:     10-...!: (5221 ticks this GP) idle=ed24/1/0x4000000000000000 softirq=142/142 fqs=0
       rcu:     (t=5254 jiffies g=-559 q=45 ncpus=16)
       rcu: rcu_sched kthread starved for 5255 jiffies! g-559 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=12
       rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
       rcu: RCU grace-period kthread stack dump:
       task:rcu_sched       state:R  running task     stack:0     pid:16    tgid:16    ppid:2      flags:0x00000008
       Call trace
        __switch_to+0xec/0x138
        __schedule+0x2f8/0x1080
        schedule+0x30/0x130
        schedule_timeout+0xa0/0x188
        rcu_gp_fqs_loop+0x128/0x528
        rcu_gp_kthread+0x1c8/0x208
        kthread+0xec/0xf8
        ret_from_fork+0x10/0x20
       Sending NMI from CPU 10 to CPUs 0:
       NMI backtrace for cpu 0
       CPU: 0 PID: 332 Comm: dma-map-benchma Not tainted 6.10.0-rc1-vanilla-LSE #8
       Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
       pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
       pc : arm_smmu_cmdq_issue_cmdlist+0x218/0x730
       lr : arm_smmu_cmdq_issue_cmdlist+0x488/0x730
       sp : ffff80008748b630
       x29: ffff80008748b630 x28: 0000000000000000 x27: ffff80008748b780
       x26: 0000000000000000 x25: 000000000000bc70 x24: 000000000001bc70
       x23: ffff0000c12af080 x22: 0000000000010000 x21: 000000000000ffff
       x20: ffff80008748b700 x19: ffff0000c12af0c0 x18: 0000000000010000
       x17: 0000000000000001 x16: 0000000000000040 x15: ffffffffffffffff
       x14: 0001ffffffffffff x13: 000000000000ffff x12: 00000000000002f1
       x11: 000000000001ffff x10: 0000000000000031 x9 : ffff800080b6b0b8
       x8 : ffff0000c2a48000 x7 : 000000000001bc71 x6 : 0001800000000000
       x5 : 00000000000002f1 x4 : 01ffffffffffffff x3 : 000000000009aaf1
       x2 : 0000000000000018 x1 : 000000000000000f x0 : ffff0000c12af18c
       Call trace:
        arm_smmu_cmdq_issue_cmdlist+0x218/0x730
        __arm_smmu_tlb_inv_range+0xe0/0x1a8
        arm_smmu_iotlb_sync+0xc0/0x128
        __iommu_dma_unmap+0x248/0x320
        iommu_dma_unmap_page+0x5c/0xe8
        dma_unmap_page_attrs+0x38/0x1d0
        map_benchmark_thread+0x118/0x2c0
        kthread+0xec/0xf8
        ret_from_fork+0x10/0x20
      
      Solve this by adding a scheduling point in the kthread loop, so
      that any other threads in the system get a chance to run,
      especially the thread that signals the end of the test. However,
      this may reduce test concurrency, so it is recommended to run the
      benchmark on an idle system.
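
      A user-space analogy of the fix, as a minimal sketch only: the names
      worker(), run_benchmark(), stop_requested and iterations are
      hypothetical, and sched_yield() stands in for a kernel scheduling
      point such as cond_resched(). User-space threads are preempted
      anyway, so this cannot reproduce the stall; it only shows where the
      voluntary yield sits relative to the work and the stop check.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_WORKERS 16

static atomic_bool stop_requested;
static atomic_ulong iterations;

/* Benchmark worker: loops until the test thread asks it to stop. */
static void *worker(void *arg)
{
    (void)arg;
    do {
        atomic_fetch_add(&iterations, 1);  /* stand-in for one map/unmap */
        sched_yield();                     /* scheduling point: let others run */
    } while (!atomic_load(&stop_requested));
    return NULL;
}

/* Runs nthreads workers for roughly ms milliseconds; returns 0 on success. */
int run_benchmark(int nthreads, long ms)
{
    pthread_t threads[MAX_WORKERS];
    struct timespec delay = { ms / 1000, (ms % 1000) * 1000000L };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;

    atomic_store(&stop_requested, false);
    atomic_store(&iterations, 0);

    for (; started < nthreads; started++)
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;

    nanosleep(&delay, NULL);              /* test thread sleeps for the test time */
    atomic_store(&stop_requested, true);  /* then notifies the workers to stop */

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == nthreads ? 0 : -1;
}
```

      The yield costs throughput on every iteration, which mirrors the
      caveat above about reduced test concurrency.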
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Acked-by: Barry Song <baohua@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  4. 07 Jul, 2024 3 commits
    • Linux 6.10-rc7 · 256abd8e
      Linus Torvalds authored
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 5a4bd506
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A set of clk fixes for the Qualcomm, Mediatek, and Allwinner drivers:
      
         - Fix the Qualcomm Stromer Plus PLL set_rate() clk_op to explicitly
           set the alpha enable bit and not set bits that don't exist
      
         - Mark Qualcomm IPQ9574 crypto clks as voted to avoid stuck clk
           warnings
      
         - Fix the parent of some PLLs on Qualcomm sm6350 so their rate is
           correct
      
         - Fix the min/max rate clamping logic in the Allwinner driver that
           got broken in v6.9
      
         - Limit runtime PM enabling in the Mediatek driver to only
           mt8183-mfgcfg so that system wide resume doesn't break on other
           Mediatek SoCs"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: mediatek: mt8183: Only enable runtime PM on mt8183-mfgcfg
        clk: sunxi-ng: common: Don't call hw_to_ccu_common on hw without common
        clk: qcom: gcc-ipq9574: Add BRANCH_HALT_VOTED flag
        clk: qcom: apss-ipq-pll: remove 'config_ctl_hi_val' from Stromer pll configs
        clk: qcom: clk-alpha-pll: set ALPHA_EN bit for Stromer Plus PLLs
        clk: qcom: gcc-sm6350: Fix gpll6* & gpll7 parents
    • Merge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · c6653f49
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix unnecessary copy to 0 when kernel is booted at address 0
      
       - Fix usercopy crash when dumping dtl via debugfs
      
       - Avoid possible crash when PCI hotplug races with error handling
      
       - Fix kexec crash caused by scv being disabled before other CPUs
         call-in
      
       - Fix powerpc selftests build with USERCFLAGS set
      
      Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen,
      Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia.
      
      * tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Fix build with USERCFLAGS set
        powerpc/pseries: Fix scv instruction crash with kexec
        powerpc/eeh: avoid possible crash when edev->pdev changes
        powerpc/pseries: Whitelist dtl slub object for copying to userspace
        powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
  5. 06 Jul, 2024 3 commits
    • Merge tag '6.10-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 · 256fdd4b
      Linus Torvalds authored
      Pull smb client fix from Steve French:
       "Fix for smb3 readahead performance regression"
      
      * tag '6.10-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix read-performance regression by dropping readahead expansion
    • Merge tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 22f902df
      Linus Torvalds authored
      Pull i2c fix from Wolfram Sang:
       "An i2c driver fix"
      
      * tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
    • selftests/powerpc: Fix build with USERCFLAGS set · 8b7f59de
      Michael Ellerman authored
      Currently building the powerpc selftests with USERCFLAGS set to anything
      causes the build to break:
      
        $ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
        ...
        gcc -Wno-error    cache_shape.c ...
        cache_shape.c:18:10: fatal error: utils.h: No such file or directory
           18 | #include "utils.h"
              |          ^~~~~~~~~
        compilation terminated.
      
      This happens because the USERCFLAGS are added to CFLAGS in lib.mk, which
      causes the check of CFLAGS in powerpc/flags.mk to skip setting CFLAGS at
      all, resulting in none of the usual CFLAGS being passed. This is
      visible in the output above: the only flag passed to the compiler is
      -Wno-error.
      
      Fix it by dropping the conditional setting of CFLAGS in flags.mk.
      Instead always set CFLAGS, but also append USERCFLAGS if they are set.
      
      Note that appending to CFLAGS (with +=) wouldn't work, because flags.mk
      is included by multiple Makefiles (to support partial builds), causing
      CFLAGS to be appended to multiple times. Additionally that would place
      the USERCFLAGS prior to the standard CFLAGS, meaning the USERCFLAGS
      couldn't override the standard flags. Being able to override the
      standard flags is desirable, for example for adding -Wno-error.
      
      With the fix in place, the CFLAGS are set correctly, including the
      USERCFLAGS:
      
        $ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
        ...
        gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v6.10-rc2-7-gdea17e7e56c3"'
        -I/home/michael/linux/tools/testing/selftests/powerpc/include -Wno-error
        cache_shape.c ...
      
      Fixes: 5553a793 ("selftests/powerpc: Add flags.mk to support pmu buildable")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240706120833.909853-1-mpe@ellerman.id.au
  6. 05 Jul, 2024 11 commits
  7. 04 Jul, 2024 20 commits