1. 07 May, 2024 6 commits
    • page_pool: make sure frag API fields don't span between cachelines · 1f20a576
      Alexander Lobakin authored
      After commit 5027ec19 ("net: page_pool: split the page_pool_params
      into fast and slow"), which made &page_pool contain only "hot" params at
      the start, a cacheline boundary once again chops the frag API fields
      group in the middle.
      To avoid revisiting this each time the fast params get expanded or
      shrunk, just align the group to `4 * sizeof(long)`, the closest power
      of 2 above its actual size (2 longs + 1 int). This ensures 16-byte
      alignment on 32-bit architectures and 32-byte alignment on 64-bit
      ones, avoiding unnecessary false sharing.
      ::page_state_hold_cnt is used quite intensively on the hotpath whether
      or not the frag API is used, so move it to the newly created hole in
      the first cacheline.
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • iommu/dma: avoid expensive indirect calls for sync operations · ea01fa70
      Alexander Lobakin authored
      When IOMMU is on, the actual synchronization happens in the same cases
      as with the direct DMA. Advertise %DMA_F_CAN_SKIP_SYNC in IOMMU DMA to
      skip sync ops calls (indirect) for non-SWIOTLB buffers.
      
      perf profile before the patch:
      
          18.53%  [kernel]       [k] gq_rx_skb
          14.77%  [kernel]       [k] napi_reuse_skb
           8.95%  [kernel]       [k] skb_release_data
           5.42%  [kernel]       [k] dev_gro_receive
           5.37%  [kernel]       [k] memcpy
      <*>  5.26%  [kernel]       [k] iommu_dma_sync_sg_for_cpu
           4.78%  [kernel]       [k] tcp_gro_receive
      <*>  4.42%  [kernel]       [k] iommu_dma_sync_sg_for_device
           4.12%  [kernel]       [k] ipv6_gro_receive
           3.65%  [kernel]       [k] gq_pool_get
           3.25%  [kernel]       [k] skb_gro_receive
           2.07%  [kernel]       [k] napi_gro_frags
           1.98%  [kernel]       [k] tcp6_gro_receive
           1.27%  [kernel]       [k] gq_rx_prep_buffers
           1.18%  [kernel]       [k] gq_rx_napi_handler
           0.99%  [kernel]       [k] csum_partial
           0.74%  [kernel]       [k] csum_ipv6_magic
           0.72%  [kernel]       [k] free_pcp_prepare
           0.60%  [kernel]       [k] __napi_poll
           0.58%  [kernel]       [k] net_rx_action
           0.56%  [kernel]       [k] read_tsc
      <*>  0.50%  [kernel]       [k] __x86_indirect_thunk_r11
           0.45%  [kernel]       [k] memset
      
      After the patch, the lines marked <*> no longer show up, and overall
      CPU usage looks much better (~60% instead of ~72%):
      
          25.56%  [kernel]       [k] gq_rx_skb
           9.90%  [kernel]       [k] napi_reuse_skb
           7.39%  [kernel]       [k] dev_gro_receive
           6.78%  [kernel]       [k] memcpy
           6.53%  [kernel]       [k] skb_release_data
           6.39%  [kernel]       [k] tcp_gro_receive
           5.71%  [kernel]       [k] ipv6_gro_receive
           4.35%  [kernel]       [k] napi_gro_frags
           4.34%  [kernel]       [k] skb_gro_receive
           3.50%  [kernel]       [k] gq_pool_get
           3.08%  [kernel]       [k] gq_rx_napi_handler
           2.35%  [kernel]       [k] tcp6_gro_receive
           2.06%  [kernel]       [k] gq_rx_prep_buffers
           1.32%  [kernel]       [k] csum_partial
           0.93%  [kernel]       [k] csum_ipv6_magic
           0.65%  [kernel]       [k] net_rx_action
      
      iavf yields +10% of Mpps on Rx. This also unblocks batched allocations
      of XSk buffers when IOMMU is active.
      Co-developed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • dma: avoid redundant calls for sync operations · f406c8e4
      Alexander Lobakin authored
      Quite often, devices do not need dma_sync operations, on x86_64 at
      least. Indeed, when dev_is_dma_coherent(dev) is true and
      dev_use_swiotlb(dev) is false, iommu_dma_sync_single_for_cpu()
      and friends do nothing.
      
      However, indirectly calling them when CONFIG_RETPOLINE=y consumes about
      10% of cycles on a CPU receiving packets from softirq at a ~100Gbit
      rate. Even when CONFIG_RETPOLINE is not set, the cost is about 3%.
      
      Add a dev->need_dma_sync boolean and turn it off during device
      initialization (dma_set_mask()) depending on the setup:
      dev_is_dma_coherent() for direct DMA, !(sync_single_for_device ||
      sync_single_for_cpu), or the new dma_map_ops flag, %DMA_F_CAN_SKIP_SYNC,
      advertised for non-NULL DMA ops.
      Then later, if/when swiotlb is used for the first time, the flag
      is turned back on, from swiotlb_tbl_map_single().
      
      On iavf, the UDP trafficgen with XDP_DROP in skb mode test shows
      +3-5% increase for direct DMA.
      
      Suggested-by: Christoph Hellwig <hch@lst.de> # direct DMA shortcut
      Co-developed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • dma: compile-out DMA sync op calls when not used · fe7514b1
      Alexander Lobakin authored
      Some platforms do have DMA, but DMA there is always direct and coherent.
      Currently, even on such platforms DMA sync operations are compiled and
      called.
      Add a new hidden Kconfig symbol, DMA_NEED_SYNC, and set it only when
      either sync operations are needed, or there are DMA ops, or swiotlb
      or DMA debug is enabled. Compile the global dma_sync_*() and
      dma_need_sync() only when it is set; otherwise, provide empty inline
      stubs.
      The change allows for future optimizations of DMA sync calls depending
      on runtime conditions.
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • iommu/dma: fix zeroing of bounce buffer padding used by untrusted devices · 2650073f
      Michael Kelley authored
      iommu_dma_map_page() allocates swiotlb memory as a bounce buffer when an
      untrusted device wants to map only part of the memory in a granule.  The
      goal is to prevent the untrusted device from having DMA access to
      unrelated kernel data that may be sharing the granule.  To meet this
      goal, the bounce buffer itself is zeroed, and any additional swiotlb
      memory up to alloc_size after the bounce buffer end (i.e., the
      "post-padding") is also zeroed.
      
      However, as of commit 901c7280 ("Reinstate some of "swiotlb: rework
      "fix info leak with DMA_FROM_DEVICE"""), swiotlb_tbl_map_single() always
      initializes the contents of the bounce buffer to the original memory.
      Zeroing the bounce buffer is redundant and probably wrong per the
      discussion in that commit. Only the post-padding needs to be zeroed.
      
      Also, when the DMA min_align_mask is non-zero, the allocated bounce
      buffer space may not start on a granule boundary.  The swiotlb memory
      from the granule boundary to the start of the allocated bounce buffer
      might belong to some unrelated bounce buffer. So as described in the
      "second issue" in [1], it can't be zeroed to protect against untrusted
      devices. But as of commit af133562 ("swiotlb: extend buffer
      pre-padding to alloc_align_mask if necessary"), swiotlb_tbl_map_single()
      allocates pre-padding slots when necessary to meet min_align_mask
      requirements, making it possible to zero the pre-padding area as well.
      
      Finally, iommu_dma_map_page() uses the swiotlb for untrusted devices
      and also for certain kmalloc() memory. Current code does the zeroing
      for both cases, but it is needed only for the untrusted device case.
      
      Fix all of this by updating iommu_dma_map_page() to zero both the
      pre-padding and post-padding areas, but not the actual bounce buffer.
      Do this only in the case where the bounce buffer is used because
      of an untrusted device.
      
      [1] https://lore.kernel.org/all/20210929023300.335969-1-stevensd@google.com/
      Signed-off-by: Michael Kelley <mhklinux@outlook.com>
      Reviewed-by: Petr Tesarik <petr@tesarici.cz>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • swiotlb: remove alloc_size argument to swiotlb_tbl_map_single() · 327e2c97
      Michael Kelley authored
      Currently swiotlb_tbl_map_single() takes alloc_align_mask and
      alloc_size arguments to specify an swiotlb allocation that is larger
      than mapping_size.  This larger allocation is used solely by
      iommu_dma_map_page() to handle untrusted devices that should not have
      DMA visibility to memory pages that are partially used for unrelated
      kernel data.
      
      Having two arguments to specify the allocation is redundant. While
      alloc_align_mask naturally specifies the alignment of the starting
      address of the allocation, it can also implicitly specify the size
      by rounding up the mapping_size to that alignment.
      
      Additionally, the current approach has an edge case bug.
      iommu_dma_map_page() already does the rounding up to compute the
      alloc_size argument. But swiotlb_tbl_map_single() then calculates the
      alignment offset based on the DMA min_align_mask, and adds that offset to
      alloc_size. If the offset is non-zero, the addition may result in a value
      that is larger than the max the swiotlb can allocate.  If the rounding up
      is done _after_ the alignment offset is added to the mapping_size (and
      the original mapping_size conforms to the value returned by
      swiotlb_max_mapping_size), then the max that the swiotlb can allocate
      will not be exceeded.
      
      In view of these issues, simplify the swiotlb_tbl_map_single() interface
      by removing the alloc_size argument. Most call sites pass the same value
      for mapping_size and alloc_size, and they pass alloc_align_mask as zero.
      Just remove the redundant argument from these callers, as they will see
      no functional change. For iommu_dma_map_page() also remove the alloc_size
      argument, and have swiotlb_tbl_map_single() compute the alloc_size by
      rounding up mapping_size after adding the offset based on min_align_mask.
      This has the side effect of fixing the edge case bug but with no other
      functional change.
      
      Also add a sanity test on the alloc_align_mask. While IOMMU code
      currently ensures the granule is not larger than PAGE_SIZE, if that
      guarantee were to be removed in the future, the downstream effect on the
      swiotlb might go unnoticed until strange allocation failures occurred.
      
      Tested on an ARM64 system with 16K page size and some kernel test-only
      hackery to allow modifying the DMA min_align_mask and the granule size
      that becomes the alloc_align_mask. Tested these combinations with a
      variety of original memory addresses and sizes, including those that
      reproduce the edge case bug:
      
       * 4K granule and 0 min_align_mask
       * 4K granule and 0xFFF min_align_mask (4K - 1)
       * 16K granule and 0xFFF min_align_mask
       * 64K granule and 0xFFF min_align_mask
       * 64K granule and 0x3FFF min_align_mask (16K - 1)
      
      With the changes, all combinations pass.
      Signed-off-by: Michael Kelley <mhklinux@outlook.com>
      Reviewed-by: Petr Tesarik <petr@tesarici.cz>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  2. 02 May, 2024 1 commit
  3. 28 Apr, 2024 6 commits
  4. 27 Apr, 2024 9 commits
    • Merge tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux · 2c815938
      Linus Torvalds authored
      Pull Rust fixes from Miguel Ojeda:
      
       - Soundness: make internal functions generated by the 'module!' macro
         inaccessible, do not implement 'Zeroable' for 'Infallible' and
         require 'Send' for the 'Module' trait.
      
       - Build: avoid errors with "empty" files and work around a 'rustdoc' ICE.
      
       - Kconfig: depend on '!CFI_CLANG' and avoid selecting 'CONSTRUCTORS'.
      
       - Code docs: remove non-existing key from 'module!' macro example.
      
       - Docs: trivial rendering fix in arch table.
      
      * tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux:
        rust: remove `params` from `module` macro example
        kbuild: rust: force `alloc` extern to allow "empty" Rust files
        kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
        rust: kernel: require `Send` for `Module` implementations
        rust: phy: implement `Send` for `Registration`
        rust: make mutually exclusive with CFI_CLANG
        rust: macros: fix soundness issue in `module!` macro
        rust: init: remove impl Zeroable for Infallible
        docs: rust: fix improper rendering in Arch Support page
        rust: don't select CONSTRUCTORS
    • Merge tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 57865f39
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix for TASK_SIZE on rv64/NOMMU, to reflect the lack of user/kernel
         separation
      
       - A fix to avoid loading rv64/NOMMU kernel past the start of RAM
      
       - A fix for RISCV_HWPROBE_EXT_ZVFHMIN on ilp32 to avoid signed integer
         overflow in the bitmask
      
       - The sud_test kselftest has been fixed to properly swizzle the
         syscall number into the return register, since those are not the
         same register on RISC-V
      
       - A fix for a build warning in the perf tools on rv32
      
       - A fix for the CBO selftests, to avoid non-constants leaking into the
         inline asm
      
       - A pair of fixes for T-Head PBMT errata probing, which has been
         renamed MAE by the vendor
      
      * tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2
        perf riscv: Fix the warning due to the incompatible type
        riscv: T-Head: Test availability bit before enabling MAE errata
        riscv: thead: Rename T-Head PBMT to MAE
        selftests: sud_test: return correct emulated syscall value on RISC-V
        riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN
        riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
        riscv: Fix TASK_SIZE on 64-bit NOMMU
    • Merge tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · d43df69f
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
       "Three smb3 client fixes, all also for stable:
      
         - two small locking fixes spotted by Coverity
      
         - FILE_ALL_INFO and network_open_info packing fix"
      
      * tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: fix lock ordering potential deadlock in cifs_sync_mid_result
        smb3: missing lock when picking channel
        smb: client: Fix struct_group() usage in __packed structs
    • Merge tag 'i2c-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5d12ed4b
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Fix a race condition in the at24 eeprom handler, a NULL pointer
        exception in the I2C core for controllers only using target modes,
        drop a MAINTAINERS entry, and fix an incorrect DT binding for at24"
      
      * tag 'i2c-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: smbus: fix NULL function pointer dereference
        MAINTAINERS: Drop entry for PCA9541 bus master selector
        eeprom: at24: fix memory corruption race condition
        dt-bindings: eeprom: at24: Fix ST M24C64-D compatible schema
    • profiling: Remove create_prof_cpu_mask(). · 2e5449f4
      Tetsuo Handa authored
      create_prof_cpu_mask() is no longer used after commit 1f44a225 ("s390:
      convert interrupt handling to use generic hardirq").
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'soundwire-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire · 8a5c3ef7
      Linus Torvalds authored
      Pull soundwire fix from Vinod Koul:
      
       - Single AMD driver fix for wake interrupt handling in clockstop mode
      
      * tag 'soundwire-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
        soundwire: amd: fix for wake interrupt handling for clockstop mode
    • Merge tag 'dmaengine-fix-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine · 6fba14a7
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
      
       - Revert "pl330: issue_pending waits until WFP state" due to a
         regression reported in Bluetooth loading
      
       - Xilinx driver fixes for synchronization, buffer offsets, locking and
         kdoc
      
       - idxd fixes for spinlock and preventing the migration of the perf
         context to an invalid target
      
       - idma driver fix for interrupt handling when powered off
      
       - Tegra driver residual calculation fix
      
       - Owl driver register access fix
      
      * tag 'dmaengine-fix-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
        dmaengine: idxd: Fix oops during rmmod on single-CPU platforms
        dmaengine: xilinx: xdma: Clarify kdoc in XDMA driver
        dmaengine: xilinx: xdma: Fix synchronization issue
        dmaengine: xilinx: xdma: Fix wrong offsets in the buffers addresses in dma descriptor
        dma: xilinx_dpdma: Fix locking
        dmaengine: idxd: Convert spinlock to mutex to lock evl workqueue
        idma64: Don't try to serve interrupts when device is powered off
        dmaengine: tegra186: Fix residual calculation
        dmaengine: owl: fix register access functions
        dmaengine: Revert "dmaengine: pl330: issue_pending waits until WFP state"
    • Merge tag 'phy-fixes-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · 63407d30
      Linus Torvalds authored
      Pull phy fixes from Vinod Koul:
      
       - static checker (array size, bounds) fix for the marvell driver
      
       - Rockchip rk3588 pcie fixes for bifurcation and mux
      
       - Qualcomm qmp-combo fix for VCO, register base and regulator name for
         m31 driver
      
       - charger det crash fix for ti driver
      
      * tag 'phy-fixes-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
        phy: ti: tusb1210: Resolve charger-det crash if charger psy is unregistered
        phy: qcom: qmp-combo: fix VCO div offset on v5_5nm and v6
        phy: phy-rockchip-samsung-hdptx: Select CONFIG_RATIONAL
        phy: qcom: m31: match requested regulator name with dt schema
        phy: qcom: qmp-combo: Fix register base for QSERDES_DP_PHY_MODE
        phy: qcom: qmp-combo: Fix VCO div offset on v3
        phy: rockchip: naneng-combphy: Fix mux on rk3588
        phy: rockchip-snps-pcie3: fix clearing PHP_GRF_PCIESEL_CON bits
        phy: rockchip-snps-pcie3: fix bifurcation on rk3588
        phy: freescale: imx8m-pcie: fix pcie link-up instability
        phy: marvell: a3700-comphy: Fix hardcoded array size
        phy: marvell: a3700-comphy: Fix out of bounds read
    • i2c: smbus: fix NULL function pointer dereference · 91811a31
      Wolfram Sang authored
      Baruch reported an OOPS when using the designware controller as target
      only. Target-only modes break the assumption of one transfer function
      always being available. Fix this by always checking the pointer in
      __i2c_transfer.
      Reported-by: Baruch Siach <baruch@tkos.co.il>
      Closes: https://lore.kernel.org/r/4269631780e5ba789cf1ae391eec1b959def7d99.1712761976.git.baruch@tkos.co.il
      Fixes: 4b1acc43 ("i2c: core changes for slave support")
      [wsa: dropped the simplification in core-smbus to avoid theoretical regressions]
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Tested-by: Baruch Siach <baruch@tkos.co.il>
  5. 26 Apr, 2024 18 commits