1. 26 Jul, 2023 6 commits
  2. 20 Jul, 2023 2 commits
  3. 19 Jul, 2023 1 commit
  4. 14 Jul, 2023 2 commits
  5. 13 Jul, 2023 1 commit
  6. 12 Jul, 2023 2 commits
    • KVM: arm64: Add missing BTI instructions · dcf89d11
      Mostafa Saleh authored
      Some bti instructions were missing from
      commit b53d4a27 ("KVM: arm64: Use BTI for nvhe")
      
      1) kvm_host_psci_cpu_entry
      kvm_host_psci_cpu_entry is called from __kvm_hyp_init_cpu through a "br"
      instruction, because __kvm_hyp_init_cpu resides in the idmap section
      while kvm_host_psci_cpu_entry is in hyp .text, so the offset is larger
      than the 128MB range covered by "b".
      This means that the function should start with a "bti j" instruction.
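
The 128MB limit comes from the A64 "b" encoding, which holds a signed 26-bit word offset. The check below is a minimal sketch of that reachability rule; the helper name b_in_range is hypothetical and used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reach of the A64 "b" instruction: signed 26-bit immediate, scaled by 4. */
#define B_RANGE_BYTES (INT64_C(1) << 27) /* 128MB */

/*
 * Hypothetical helper: returns true if a direct "b" at address `from`
 * can reach `to`. Otherwise an indirect "br" is needed, and with BTI
 * enabled the target must begin with a compatible landing pad ("bti j").
 */
static bool b_in_range(uint64_t from, uint64_t to)
{
	int64_t off = (int64_t)(to - from);

	if (off & 3)			/* branch targets are 4-byte aligned */
		return false;
	return off >= -B_RANGE_BYTES && off < B_RANGE_BYTES;
}
```

When this returns false for a caller/callee pair, the call has to go through a register, which is exactly the case where the landing pad matters.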
      
      LLVM, which is the only compiler supporting BTI for Linux, adds "bti j"
      for jump tables or when the address of a block is taken [1].
      The same behaviour is observed with GCC.
      
      As kvm_host_psci_cpu_entry is a C function, this must be done in
      assembly.
      
      Another solution is to use X16/X17 with "br": according to the ARM
      ARM (DDI0487I.a, RLJHCL/IGMGRS), PACIASP has an implicit branch
      target identification instruction that is compatible with
      PSTATE.BTYPE 0b01, which includes "br X16/X17".
      kvm_host_psci_cpu_entry has PACIASP, as it is an external function.
      However, using an explicit "bti" is clearer than relying on which
      register is used.
      
      A third solution is to clear SCTLR_EL2.BT, which would make PACIASP
      compatible with PSTATE.BTYPE 0b11 ("br" to other registers).
      However, this deviates from the kernel behaviour (in bti_enable()).
      
      2) Spectre vector table
      "br" instructions are generated at runtime for the vector table
      (__bp_harden_hyp_vecs).
      These branches would land on vectors in __kvm_hyp_vector at offset 8.
      As all the macros are defined with valid_vect/invalid_vect, it is
      sufficient to add "bti j" at the correct offset.
      
      [1] https://reviews.llvm.org/D52867
      
      Fixes: b53d4a27 ("KVM: arm64: Use BTI for nvhe")
      Signed-off-by: Mostafa Saleh <smostafa@google.com>
      Reported-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Link: https://lore.kernel.org/r/20230706152240.685684-1-smostafa@google.com
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: Correctly handle page aging notifiers for unaligned memslot · df6556ad
      Oliver Upton authored
      Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
      memory. This is even the case with hugepages, although it is a rather
      suboptimal configuration as PTE level mappings are used at stage-2.
      
      The arm64 page aging handlers assume that the specified range is
      exactly one page/block of memory, which in the aforementioned case is
      not necessarily true. Altogether, this leads to the WARN() in
      kvm_age_gfn() firing.
      
      However, the WARN is only part of the issue as the table walkers visit
      at most a single leaf PTE. For hugepage-backed memory in a memslot that
      isn't hugepage-aligned, page aging entirely misses accesses to the
      hugepage beyond the first page in the memslot.
      
      Add a new walker dedicated to handling page aging MMU notifiers capable
      of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
      walker and drop the WARN that caught the issue in the first place. The
      implementation of this walker was inspired by the test_clear_young()
      implementation by Yu Zhao [*], but repurposed to address a bug in the
      existing aging implementation.
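
The difference between visiting a single leaf and walking a range can be sketched in plain C. This is a simplified model, not the kernel walker: the flat descriptor array and the name age_range are hypothetical, and only the valid bit (bit 0) and access flag (bit 10) are modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID	(UINT64_C(1) << 0)
#define PTE_AF		(UINT64_C(1) << 10)	/* access flag */

/*
 * Hypothetical flat "page table": one descriptor per guest page.
 * Visits every PTE in [start, start + nr) rather than a single leaf.
 * Returns true if any page in the range was young; clears AF when
 * `mkold` is set (age), leaves it untouched otherwise (test only).
 */
static bool age_range(uint64_t *ptes, size_t start, size_t nr, bool mkold)
{
	bool young = false;

	for (size_t i = start; i < start + nr; i++) {
		if (!(ptes[i] & PTE_VALID) || !(ptes[i] & PTE_AF))
			continue;
		young = true;
		if (mkold)
			ptes[i] &= ~PTE_AF;
	}
	return young;
}
```

Calling it with nr set to 1 reproduces the old behaviour, where an access to any page other than the first one in the range goes unnoticed.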
      
      Cc: stable@vger.kernel.org # v5.15
      Fixes: 056aad67 ("kvm: arm/arm64: Rework gpa callback handlers")
      Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
      Co-developed-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reported-by: Reiji Watanabe <reijiw@google.com>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
      Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
  7. 11 Jul, 2023 3 commits
    • KVM: arm64: Disable preemption in kvm_arch_hardware_enable() · 970dee09
      Marc Zyngier authored
      Since 0bf50497 ("KVM: Drop kvm_count_lock and instead protect
      kvm_usage_count with kvm_lock"), hotplugging back a CPU whilst
      a guest is running results in a number of ugly splats as most
      of this code expects to run with preemption disabled, which isn't
      the case anymore.
      
      While the context is preemptable, it isn't migratable, which should
      be enough. But we have plenty of preemptible() checks all over
      the place, and our per-CPU accessors also disable preemption.
      
      Since this affects released versions, let's do the easy fix first,
      disabling preemption in kvm_arch_hardware_enable(). We can always
      revisit this with a more invasive fix in the future.
      
      Fixes: 0bf50497 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
      Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
      Tested-by: Kristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/aeab7562-2d39-e78e-93b1-4711f8cc3fa5@arm.com
      Cc: stable@vger.kernel.org # v6.3, v6.4
      Link: https://lore.kernel.org/r/20230703163548.1498943-1-maz@kernel.org
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm · fa729bc7
      Sudeep Holla authored
      Currently there is no synchronisation between the finalize_pkvm() and
      kvm_arm_init() initcalls. finalize_pkvm() proceeds happily even if
      kvm_arm_init() fails, resulting in the following warning on all the
      CPUs and eventually a HYP panic:
      
        | kvm [1]: IPA Size Limit: 48 bits
        | kvm [1]: Failed to init hyp memory protection
        | kvm [1]: error initializing Hyp mode: -22
        |
        | <snip>
        |
        | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
        | Modules linked in:
        | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
        | Hardware name: FVP Base RevC (DT)
        | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
        | pc : _kvm_host_prot_finalize+0x30/0x50
        | lr : __flush_smp_call_function_queue+0xd8/0x230
        |
        | Call trace:
        |  _kvm_host_prot_finalize+0x3c/0x50
        |  on_each_cpu_cond_mask+0x3c/0x6c
        |  pkvm_drop_host_privileges+0x4c/0x78
        |  finalize_pkvm+0x3c/0x5c
        |  do_one_initcall+0xcc/0x240
        |  do_initcall_level+0x8c/0xac
        |  do_initcalls+0x54/0x94
        |  do_basic_setup+0x1c/0x28
        |  kernel_init_freeable+0x100/0x16c
        |  kernel_init+0x20/0x1a0
        |  ret_from_fork+0x10/0x20
        | Failed to finalize Hyp protection: -22
        |     dtb=fvp-base-revc.dtb
        | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
        | kvm [95]: nVHE call trace:
        | kvm [95]:  [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
        | kvm [95]:  [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
        | kvm [95]:  [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
        | kvm [95]:  [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
        | kvm [95]: ---[ end nVHE call trace ]---
        | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
        | Kernel panic - not syncing: HYP panic:
        | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
        | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
        | VCPU:0000000000000000
        | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G        W          6.4.0 #237
        | Hardware name: FVP Base RevC (DT)
        | Workqueue: rpciod rpc_async_schedule
        | Call trace:
        |  dump_backtrace+0xec/0x108
        |  show_stack+0x18/0x2c
        |  dump_stack_lvl+0x50/0x68
        |  dump_stack+0x18/0x24
        |  panic+0x138/0x33c
        |  nvhe_hyp_panic_handler+0x100/0x184
        |  new_slab+0x23c/0x54c
        |  ___slab_alloc+0x3e4/0x770
        |  kmem_cache_alloc_node+0x1f0/0x278
        |  __alloc_skb+0xdc/0x294
        |  tcp_stream_alloc_skb+0x2c/0xf0
        |  tcp_sendmsg_locked+0x3d0/0xda4
        |  tcp_sendmsg+0x38/0x5c
        |  inet_sendmsg+0x44/0x60
        |  sock_sendmsg+0x1c/0x34
        |  xprt_sock_sendmsg+0xdc/0x274
        |  xs_tcp_send_request+0x1ac/0x28c
        |  xprt_transmit+0xcc/0x300
        |  call_transmit+0x78/0x90
        |  __rpc_execute+0x114/0x3d8
        |  rpc_async_schedule+0x28/0x48
        |  process_one_work+0x1d8/0x314
        |  worker_thread+0x248/0x474
        |  kthread+0xfc/0x184
        |  ret_from_fork+0x10/0x20
        | SMP: stopping secondary CPUs
        | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
        | PHYS_OFFSET: 0x80000000
        | CPU features: 0x00000000,1035b7a3,ccfe773f
        | Memory Limit: none
        | ---[ end Kernel panic - not syncing: HYP panic:
        | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
        | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
        | VCPU:0000000000000000 ]---
      
      Fix it by checking for the successful initialisation of kvm_arm_init()
      in finalize_pkvm() before proceeding any further.
      
      Fixes: 87727ba2 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
      Cc: Will Deacon <will@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: timers: Use CNTHCTL_EL2 when setting non-CNTKCTL_EL1 bits · fe769e6c
      Marc Zyngier authored
      It recently appeared that, when running VHE, there is a notable
      difference between using CNTKCTL_EL1 and CNTHCTL_EL2, despite what
      the architecture documents:
      
      - When accessed from EL2, bits [19:18] and [16:10] of CNTKCTL_EL1 have
        the same assignment as CNTHCTL_EL2
      - When accessed from EL1, bits [19:18] and [16:10] are RES0
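
The two bit ranges above can be expressed as a mask, which makes the rule "use CNTHCTL_EL2 for anything outside the EL1 definition" easy to check. This is a minimal sketch: BITS, EL2_ONLY_BITS and needs_cnthctl_el2 are hypothetical names, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive bit range [hi:lo] as a 64-bit mask (hi < 63). */
#define BITS(hi, lo)	(((UINT64_C(1) << ((hi) - (lo) + 1)) - 1) << (lo))

/* Bits that only carry CNTHCTL_EL2 meaning: [19:18] and [16:10]. */
#define EL2_ONLY_BITS	(BITS(19, 18) | BITS(16, 10))

/* Hypothetical helper: must this update go through CNTHCTL_EL2? */
static bool needs_cnthctl_el2(uint64_t bits_to_change)
{
	return (bits_to_change & EL2_ONLY_BITS) != 0;
}
```

The combined mask works out to 0xDFC00; bit 17 and bits [9:0] fall outside it.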
      
      It is all OK, until you factor in NV, where the EL2 guest runs at EL1.
      In this configuration, CNTKCTL_EL1 doesn't trap, nor ends up in
      the VNCR page. This means that any write from the guest affecting
      CNTHCTL_EL2 using CNTKCTL_EL1 ends up losing some state. Not good.
      
      The fix is obvious: don't use CNTKCTL_EL1 if you want to change bits
      that are not part of the EL1 definition of CNTKCTL_EL1, and use
      CNTHCTL_EL2 instead. This doesn't change anything for a bare-metal OS,
      and fixes it when running under NV. The NV hypervisor will itself
      have to work harder to merge the two accessors.
      
      Note that there is a pending update to the architecture to address
      this issue by making the affected bits UNKNOWN when CNTKCTL_EL1 is
      used from EL2 with VHE enabled.
      
      Fixes: c605ee24 ("KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org # v6.4
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Link: https://lore.kernel.org/r/20230627140557.544885-1-maz@kernel.org
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
  8. 09 Jul, 2023 10 commits
  9. 08 Jul, 2023 13 commits
    • mm: lock newly mapped VMA with corrected ordering · 1c7873e3
      Hugh Dickins authored
      Lockdep is certainly right to complain about
      
        (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
                       but task is already holding lock:
        (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
      
      Invert those to the usual ordering.
      
      Fixes: 33313a74 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
      Cc: stable@vger.kernel.org
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of... · 946c6b59
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull hotfixes from Andrew Morton:
       "16 hotfixes. Six are cc:stable and the remainder address post-6.4
        issues"
      
      The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
      it was all hopefully fixed in mainline.
      
      * tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        lib: dhry: fix sleeping allocations inside non-preemptable section
        kasan, slub: fix HW_TAGS zeroing with slub_debug
        kasan: fix type cast in memory_is_poisoned_n
        mailmap: add entries for Heiko Stuebner
        mailmap: update manpage link
        bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
        MAINTAINERS: add linux-next info
        mailmap: add Markus Schneider-Pargmann
        writeback: account the number of pages written back
        mm: call arch_swap_restore() from do_swap_page()
        squashfs: fix cache race with migration
        mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
        docs: update ocfs2-devel mailing list address
        MAINTAINERS: update ocfs2-devel mailing list address
        mm: disable CONFIG_PER_VMA_LOCK until its fixed
        fork: lock VMAs of the parent process when forking
    • fork: lock VMAs of the parent process when forking · fb49c455
      Suren Baghdasaryan authored
      When forking a child process, the parent write-protects anonymous pages
      and COW-shares them with the child being forked using copy_present_pte().
      
      We must not take any concurrent page faults on the source vma's as they
      are being processed, as we expect both the vma and the pte's behind it
      to be stable.  For example, the anon_vma_fork() expects the parent's
      vma->anon_vma to not change during the vma copy.
      
      A concurrent page fault on a page newly marked read-only by the page
      copy might trigger wp_page_copy() and an anon_vma_prepare(vma) on the
      source vma, defeating the anon_vma_clone() that wasn't done because the
      parent vma originally didn't have an anon_vma, but we now might end up
      copying a pte entry for a page that has one.
      
      Before the per-vma lock based changes, the mmap_lock guaranteed
      exclusion with concurrent page faults.  But now we need to do a
      vma_start_write() to make sure no concurrent faults happen on this vma
      while it is being processed.
      
      This fix can potentially regress some fork-heavy workloads.  Kernel
      build time did not show noticeable regression on a 56-core machine while
      a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
      shows ~5% regression.  If such fork time regression is unacceptable,
      disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
      optimizations are possible if this regression proves to be problematic.
      
      Suggested-by: David Hildenbrand <david@redhat.com>
      Reported-by: Jiri Slaby <jirislaby@kernel.org>
      Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
      Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
      Reported-by: Jacob Young <jacobly.alt@gmail.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
      Fixes: 0bff0aae ("x86/mm: try VMA lock-based page fault handling first")
      Cc: stable@vger.kernel.org
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: lock newly mapped VMA which can be modified after it becomes visible · 33313a74
      Suren Baghdasaryan authored
      mmap_region adds a newly created VMA into VMA tree and might modify it
      afterwards before dropping the mmap_lock.  This poses a problem for page
      faults handled under per-VMA locks because they don't take the mmap_lock
      and can stumble on this VMA while it's still being modified.  Currently
      this does not pose a problem since post-addition modifications are done
      only for file-backed VMAs, which are not handled under per-VMA lock.
      However, once support for handling file-backed page faults with per-VMA
      locks is added, this will become a race.
      
      Fix this by write-locking the VMA before inserting it into the VMA tree.
      Other places where a new VMA is added into VMA tree do not modify it
      after the insertion, so do not need the same locking.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: lock a vma before stack expansion · c137381f
      Suren Baghdasaryan authored
      With recent changes necessitating mmap_lock to be held for write while
      expanding a stack, per-VMA locks should follow the same rules and be
      write-locked to prevent page faults into the VMA being expanded. Add
      the necessary locking.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 7fcd473a
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "A few late arriving patches that missed the initial pull request. It's
        mostly bug fixes (the dt-bindings is a fix for the initial pull)"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Remove unused function declaration
        scsi: target: docs: Remove tcm_mod_builder.py
        scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
        scsi: dt-bindings: ufs: qcom: Fix ICE phandle
        scsi: core: Simplify scsi_cdl_check_cmd()
        scsi: isci: Fix comment typo
        scsi: smartpqi: Replace one-element arrays with flexible-array members
        scsi: target: tcmu: Replace strlcpy() with strscpy()
        scsi: ncr53c8xx: Replace strlcpy() with strscpy()
        scsi: lpfc: Fix lpfc_name struct packing
    • Merge tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 84dc5aa3
      Linus Torvalds authored
      Pull more i2c updates from Wolfram Sang:
      
       - xiic patch should have been in the original pull but slipped through
      
       - mpc patch fixes a build regression
      
       - nomadik cleanup
      
      * tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: mpc: Drop unused variable
        i2c: nomadik: Remove a useless call in the remove function
        i2c: xiic: Don't try to handle more interrupt events after error
    • Merge tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 8fc3b8f0
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - Check for NULL bdev in LoadPin (Matthias Kaehlcke)
      
       - Revert unwanted KUnit FORTIFY build default
      
       - Fix 1-element array causing boot warnings with xhci-hub
      
      * tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
        Revert "fortify: Allow KUnit test to build without FORTIFY"
        dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
    • ntb: hw: amd: Fix debugfs_create_dir error checking · bff6efc5
      Anup Sharma authored
      The debugfs_create_dir function returns ERR_PTR in case of error, and the
      only correct way to check whether an error occurred is the 'IS_ERR'
      inline function. This patch replaces the null-comparison with IS_ERR.
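
Why a NULL comparison misses the error can be shown with userspace stand-ins for the kernel helpers. This is an illustrative sketch only: ERR_PTR and IS_ERR below mimic the kernel's definitions rather than including them, and create_dir_stub is a hypothetical replacement for debugfs_create_dir().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO	4095

/* Userspace stand-ins mirroring the kernel's ERR_PTR()/IS_ERR() helpers. */
static void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical creator that fails the way debugfs_create_dir() does. */
static void *create_dir_stub(bool fail)
{
	static int dir;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&dir;
}
```

A failing call returns a non-NULL pointer, so a check of the form `if (!dir)` never fires, while `if (IS_ERR(dir))` does.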
      
      Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
      Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
    • Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of... · c206353d
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
      
      Pull more perf tools updates from Namhyung Kim:
       "These are remaining changes and fixes for this cycle.
      
        Build:
      
         - Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
           and skip if the vmlinux has no BTF.
      
         - Replace deprecated clang -target xxx option by --target=xxx.
      
        perf record:
      
         - Print event attributes with well known type and config symbols in
           the debug output like below:
      
             # perf record -e cycles,cpu-clock -C0 -vv true
             <SNIP>
             ------------------------------------------------------------
             perf_event_attr:
               type                             0 (PERF_TYPE_HARDWARE)
               size                             136
               config                           0 (PERF_COUNT_HW_CPU_CYCLES)
               { sample_period, sample_freq }   4000
               sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
               read_format                      ID
               disabled                         1
               inherit                          1
               freq                             1
               sample_id_all                    1
               exclude_guest                    1
             ------------------------------------------------------------
             sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
             ------------------------------------------------------------
             perf_event_attr:
               type                             1 (PERF_TYPE_SOFTWARE)
               size                             136
               config                           0 (PERF_COUNT_SW_CPU_CLOCK)
               { sample_period, sample_freq }   4000
               sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
               read_format                      ID
               disabled                         1
               inherit                          1
               freq                             1
               sample_id_all                    1
               exclude_guest                    1
      
         - Update AMD IBS event error message since it now supports per-process
           profiling but no privilege filters.
      
             $ sudo perf record -e ibs_op//k -C 0
             Error:
             AMD IBS doesn't support privilege filtering. Try again without
             the privilege modifiers (like 'k') at the end.
      
        perf lock contention:
      
         - Support CSV style output using -x option
      
             $ sudo perf lock con -ab -x, sleep 1
             # output: contended, total wait, max wait, avg wait, type, caller
             19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
             15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
             4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
             1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
             8, 67608, 27404, 8451, spinlock, __queue_work+0x174
             3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
             3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
             2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
             1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
      
         - Add --output option to save the data to a file so that it is not
           interleaved with other debug messages.
      
        Test:
      
         - Fix event parsing test on ARM, where there's no raw PMU and
           PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
      
         - Update the lock contention test case for CSV output.
      
         - Fix a segfault in the daemon command test.
      
        Vendor events (JSON):
      
         - Add has_event() to check if the given event is available on the
           system at runtime. On Intel machines, some transaction events may
           not be present when TSX extensions are disabled.
      
         - Update Intel event metrics.
      
        Misc:
      
         - Sort symbols by name using an external array of pointers instead of
           an rbtree node in the symbol. This saves 16 or 24 bytes per
           symbol, whether or not the sorting is actually requested.
      
         - Fix unwinding DWARF callstacks using libdw when --symfs option is
           used"
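
The symbol-sorting change listed under "Misc" above follows a common pattern: keep the records free of per-node tree linkage and build a sorted array of pointers only when a by-name lookup is needed. The sketch below illustrates that pattern in isolation; struct symbol is a reduced, hypothetical record, and sort_by_name and find_by_name are illustrative names, not the perf implementation.

```c
#include <stdlib.h>
#include <string.h>

struct symbol {			/* hypothetical, reduced symbol record */
	unsigned long start;
	const char *name;
};

/* Compares two array elements, each a pointer to a symbol. */
static int cmp_by_name(const void *a, const void *b)
{
	const struct symbol *const *sa = a, *const *sb = b;

	return strcmp((*sa)->name, (*sb)->name);
}

/* Build a name-sorted index on demand; the symbols themselves stay untouched. */
static struct symbol **sort_by_name(struct symbol *syms, size_t nr)
{
	struct symbol **idx = malloc(nr * sizeof(*idx));

	if (!idx)
		return NULL;
	for (size_t i = 0; i < nr; i++)
		idx[i] = &syms[i];
	qsort(idx, nr, sizeof(*idx), cmp_by_name);
	return idx;
}

/* Binary search over the index; returns NULL when the name is absent. */
static struct symbol *find_by_name(struct symbol **idx, size_t nr, const char *name)
{
	struct symbol key = { .name = name }, *keyp = &key;
	struct symbol **hit = bsearch(&keyp, idx, nr, sizeof(*idx), cmp_by_name);

	return hit ? *hit : NULL;
}
```

The per-symbol cost is then one pointer in the index, paid only if the index is ever built, and the caller frees the index with free().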
      
      * tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
        perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
        perf test: Fix event parsing test on Arm
        perf evsel amd: Fix IBS error message
        perf: unwind: Fix symfs with libdw
        perf symbol: Fix uninitialized return value in symbols__find_by_name()
        perf test: Test perf lock contention CSV output
        perf lock contention: Add --output option
        perf lock contention: Add -x option for CSV style output
        perf lock: Remove stale comments
        perf vendor events intel: Update tigerlake to 1.13
        perf vendor events intel: Update skylakex to 1.31
        perf vendor events intel: Update skylake to 57
        perf vendor events intel: Update sapphirerapids to 1.14
        perf vendor events intel: Update icelakex to 1.21
        perf vendor events intel: Update icelake to 1.19
        perf vendor events intel: Update cascadelakex to 1.19
        perf vendor events intel: Update meteorlake to 1.03
        perf vendor events intel: Add rocketlake events/metrics
        perf vendor metrics intel: Make transaction metrics conditional
        perf jevents: Support for has_event function
        ...
    • Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linux · ad8258e8
      Linus Torvalds authored
      Pull bitmap updates from Yury Norov:
       "Fixes for different bitmap pieces:
      
         - lib/test_bitmap: increment failure counter properly
      
           The tests that don't use the expect_eq() macro to determine that a
           test has failed must increment failed_tests explicitly.
      
         - lib/bitmap: drop optimization of bitmap_{from,to}_arr64
      
           bitmap_{from,to}_arr64() optimization is overly optimistic
           on 32-bit LE architectures when it's wired to
           bitmap_copy_clear_tail().
      
         - nodemask: Drop duplicate check in for_each_node_mask()
      
           As the return value type of first_node() became unsigned, the node
           >= 0 check became unnecessary.
      
         - cpumask: fix function description kernel-doc notation
      
         - MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record
      
           Add linux/bits.h and linux/bitfield.h for visibility"
      
      * tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
        MAINTAINERS: Add bitfield.h to the BITMAP API record
        MAINTAINERS: Add bits.h to the BITMAP API record
        cpumask: fix function description kernel-doc notation
        nodemask: Drop duplicate check in for_each_node_mask()
        lib/bitmap: drop optimization of bitmap_{from,to}_arr64
        lib/test_bitmap: increment failure counter properly
    • lib: dhry: fix sleeping allocations inside non-preemptable section · 8ba388c0
      Geert Uytterhoeven authored
      The Smatch static checker reports the following warnings:
      
          lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context
          lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context
      
      Indeed, dhry() does sleeping allocations inside the non-preemptable
      section delimited by get_cpu()/put_cpu().
      
      Fix this by using atomic allocations instead.
      Add error handling, as these atomic allocations may fail.
      
      Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be
      Fixes: 13684e96 ("lib: dhry: fix unstable smp_processor_id(_) usage")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Closes: https://lore.kernel.org/r/0469eb3a-02eb-4b41-b189-de20b931fa56@moroto.mountain
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kasan, slub: fix HW_TAGS zeroing with slub_debug · fdb54d96
      Andrey Konovalov authored
      Commit 946fa0db ("mm/slub: extend redzone check to extra allocated
      kmalloc space than requested") added precise kmalloc redzone poisoning to
      the slub_debug functionality.
      
      However, this commit didn't account for HW_TAGS KASAN fully initializing
      the object via its built-in memory initialization feature.  Even though
      HW_TAGS KASAN memory initialization contains special memory initialization
      handling for when slub_debug is enabled, it does not account for in-object
      slub_debug redzones.  As a result, HW_TAGS KASAN can overwrite these
      redzones and cause false-positive slub_debug reports.
      
      To fix the issue, avoid HW_TAGS KASAN memory initialization when
      slub_debug is enabled altogether.  Implement this by moving the
      __slub_debug_enabled check to slab_post_alloc_hook.  Common slab code
      seems like a more appropriate place for a slub_debug check anyway.
      
      Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.1688561016.git.andreyknvl@google.com
      Fixes: 946fa0db ("mm/slub: extend redzone check to extra allocated kmalloc space than requested")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reported-by: Will Deacon <will@kernel.org>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>