Commits · 25a35c1a3d8fd4b10805f05b109a76c9b8b0200c · Kirill Smelkov / linux

30 Oct, 2023 8 commits

Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/next · 25a35c1a

Oliver Upton authored Oct 30, 2023

* kvm-arm64/smccc-filter-cleanups:
  : Cleanup the management of KVM's SMCCC maple tree
  :
  : Avoid the cost of maintaining the SMCCC filter maple tree if userspace
  : hasn't writen a rule to the filter. While at it, rip out the now
  : unnecessary VM flag to indicate whether or not the SMCCC filter was
  : configured.
  KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured
  KVM: arm64: Only insert reserved ranges when SMCCC filter is used
  KVM: arm64: Add a predicate for testing if SMCCC filter is configured
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

25a35c1a

Merge branch kvm-arm64/pmevtyper-filter into kvmarm/next · 7ff7dfe9

Oliver Upton authored Oct 30, 2023

* kvm-arm64/pmevtyper-filter:
  : Fixes to KVM's handling of the PMUv3 exception level filtering bits
  :
  :  - NSH (count at EL2) and M (count at EL3) should be stateful when the
  :    respective EL is advertised in the ID registers but have no effect on
  :    event counting.
  :
  :  - NSU and NSK modify the event filtering of EL0 and EL1, respectively.
  :    Though the kernel may not use these bits, other KVM guests might.
  :    Implement these bits exactly as written in the pseudocode if EL3 is
  :    advertised.
  KVM: arm64: Add PMU event filter bits required if EL3 is implemented
  KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

7ff7dfe9

Merge branch kvm-arm64/feature-flag-refactor into kvmarm/next · d47dcb67

Oliver Upton authored Oct 30, 2023

* kvm-arm64/feature-flag-refactor:
  : vCPU feature flag cleanup
  :
  : Clean up KVM's handling of vCPU feature flags to get rid of the
  : vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu().
  KVM: arm64: Get rid of vCPU-scoped feature bitmap
  KVM: arm64: Remove unused return value from kvm_reset_vcpu()
  KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Prevent NV feature flag on systems w/o nested virt
  KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl
  KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Add generic check for system-supported vCPU features
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

d47dcb67

Merge branch kvm-arm64/misc into kvmarm/next · 054056bf

Oliver Upton authored Oct 30, 2023

* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Put an upper bound on the number of I-cache invalidations by
  :    cacheline to avoid soft lockups
  :
  :  - Get rid of bogus refererence count transfer for THP mappings
  :
  :  - Do a local TLB invalidation on permission fault race
  :
  :  - Fixes for page_fault_test KVM selftest
  :
  :  - Add a tracepoint for detecting MMIO instructions unsupported by KVM
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: arm64: Do not transfer page refcount for THP adjustment
  KVM: arm64: Avoid soft lockups due to I-cache maintenance
  arm64: tlbflush: Rename MAX_TLBI_OPS
  KVM: arm64: Don't use kerneldoc comment for arm64_check_features()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

054056bf

KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 · d11974dc

Oliver Upton authored Oct 26, 2023

It is a pretty well known fact that KVM does not support MMIO emulation
without valid instruction syndrome information (ESR_EL2.ISV == 0). The
current kvm_pr_unimpl() is pretty useless, as it contains zero context
to relate the event to a vCPU.

Replace it with a precise tracepoint that dumps the relevant context
so the user can make sense of what the guest is doing.
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231026205306.3045075-1-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

d11974dc

KVM: arm64: selftest: Perform ISB before reading PAR_EL1 · 06899aa5

Zenghui Yu authored Oct 07, 2023

It looks like a mistake to issue ISB *after* reading PAR_EL1, we should
instead perform it between the AT instruction and the reads of PAR_EL1.

As according to DDI0487J.a IJTYVP,

"When an address translation instruction is executed, explicit
 synchronization is required to guarantee the result is visible to
 subsequent direct reads of PAR_EL1."

Otherwise all guest_at testcases fail on my box with

==== Test Assertion Failure ====
  aarch64/page_fault_test.c:142: par & 1 == 0
  pid=1355864 tid=1355864 errno=4 - Interrupted system call
     1	0x0000000000402853: vcpu_run_loop at page_fault_test.c:681
     2	0x0000000000402cdb: run_test at page_fault_test.c:730
     3	0x0000000000403897: for_each_guest_mode at guest_modes.c:100
     4	0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1105
     5	 (inlined by) main at page_fault_test.c:1131
     6	0x0000ffffb153c03b: ?? ??:0
     7	0x0000ffffb153c113: ?? ??:0
     8	0x0000000000401aaf: _start at ??:?
  0x1 != 0x0 (par & 1 != 0)
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-2-yuzenghui@huawei.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

06899aa5

KVM: arm64: selftest: Add the missing .guest_prepare() · beaf35b4

Zenghui Yu authored Oct 07, 2023

Running page_fault_test on a Cortex A72 fails with

Test: ro_memslot_no_syndrome_guest_cas
Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
Testing memory backing src type: anonymous
==== Test Assertion Failure ====
  aarch64/page_fault_test.c:117: guest_check_lse()
  pid=1944087 tid=1944087 errno=4 - Interrupted system call
     1	0x00000000004028b3: vcpu_run_loop at page_fault_test.c:682
     2	0x0000000000402d93: run_test at page_fault_test.c:731
     3	0x0000000000403957: for_each_guest_mode at guest_modes.c:100
     4	0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1108
     5	 (inlined by) main at page_fault_test.c:1134
     6	0x0000ffff868e503b: ?? ??:0
     7	0x0000ffff868e5113: ?? ??:0
     8	0x0000000000401aaf: _start at ??:?
  guest_check_lse()

because we don't have a guest_prepare stage to check the presence of
FEAT_LSE and skip the related guest_cas testing, and we end-up failing in
GUEST_ASSERT(guest_check_lse()).

Add the missing .guest_prepare() where it's indeed required.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-1-yuzenghui@huawei.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

beaf35b4

KVM: arm64: Always invalidate TLB for stage-2 permission faults · be097997

Oliver Upton authored Sep 22, 2023

It is possible for multiple vCPUs to fault on the same IPA and attempt
to resolve the fault. One of the page table walks will actually update
the PTE and the rest will return -EAGAIN per our race detection scheme.
KVM elides the TLB invalidation on the racing threads as the return
value is nonzero.

Before commit a12ab137 ("KVM: arm64: Use local TLBI on permission
relaxation") KVM always used broadcast TLB invalidations when handling
permission faults, which had the convenient property of making the
stage-2 updates visible to all CPUs in the system. However now we do a
local invalidation, and TLBI elision leads to the vCPU thread faulting
again on the stale entry. Remember that the architecture permits the TLB
to cache translations that precipitate a permission fault.

Invalidate the TLB entry responsible for the permission fault if the
stage-2 descriptor has been relaxed, regardless of which thread actually
did the job.
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230922223229.1608155-1-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

be097997

24 Oct, 2023 2 commits

KVM: arm64: Add PMU event filter bits required if EL3 is implemented · ae8d3522

Oliver Upton authored Oct 19, 2023

Suzuki noticed that KVM's PMU emulation is oblivious to the NSU and NSK
event filter bits. On systems that have EL3 these bits modify the
filter behavior in non-secure EL0 and EL1, respectively. Even though the
kernel doesn't use these bits, it is entirely possible some other guest
OS does. Additionally, it would appear that these and the M bit are
required by the architecture if EL3 is implemented.

Allow the EL3 event filter bits to be set if EL3 is advertised in the
guest's ID register. Implement the behavior of NSU and NSK according to
the pseudocode, and entirely ignore the M bit for perf event creation.
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20231019185618.3442949-3-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

ae8d3522

KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised · bc512d6a

Oliver Upton authored Oct 19, 2023

The NSH bit, which filters event counting at EL2, is required by the
architecture if an implementation has EL2. Even though KVM doesn't
support nested virt yet, it makes no effort to hide the existence of EL2
from the ID registers. Userspace can, however, change the value of PFR0
to hide EL2. Align KVM's sysreg emulation with the architecture and make
NSH RES0 if EL2 isn't advertised. Keep in mind the bit is ignored when
constructing the backing perf event.

While at it, build the event type mask using explicit field definitions
instead of relying on ARMV8_PMU_EVTYPE_MASK. KVM probably should've been
doing this in the first place, as it avoids changes to the
aforementioned mask affecting sysreg emulation.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20231019185618.3442949-2-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

bc512d6a

05 Oct, 2023 3 commits

KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured · 4202bcac

Oliver Upton authored Oct 04, 2023

The smccc_filter maple tree is only populated if userspace attempted to
configure it. Use the state of the maple tree to determine if the filter
has been configured, eliminating the VM flag.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231004234947.207507-4-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

4202bcac

KVM: arm64: Only insert reserved ranges when SMCCC filter is used · d34b7648

Oliver Upton authored Oct 04, 2023

The reserved ranges are only useful for preventing userspace from
adding a rule that intersects with functions we must handle in KVM. If
userspace never writes to the SMCCC filter than this is all just wasted
work/memory.

Insert reserved ranges on the first call to KVM_ARM_VM_SMCCC_FILTER.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231004234947.207507-3-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

d34b7648

KVM: arm64: Add a predicate for testing if SMCCC filter is configured · bb17fb31

Oliver Upton authored Oct 04, 2023

Eventually we can drop the VM flag, move around the existing
implementation for now.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231004234947.207507-2-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

bb17fb31

30 Sep, 2023 1 commit

KVM: arm64: Do not transfer page refcount for THP adjustment · c04bf723

Vincent Donnefort authored Sep 28, 2023

GUP affects a refcount common to all pages forming the THP. There is
therefore no need to move the refcount from a tail to the head page.
Under the hood it decrements and increments the same counter.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230928173205.2826598-2-vdonnefort@google.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

c04bf723

24 Sep, 2023 4 commits

Linux 6.6-rc3 · 6465e260
Linus Torvalds authored Sep 24, 2023

6465e260

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8a511e7e

Linus Torvalds authored Sep 24, 2023

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Fix EL2 Stage-1 MMIO mappings where a random address was used

   - Fix SMCCC function number comparison when the SVE hint is set

  RISC-V:

   - Fix KVM_GET_REG_LIST API for ISA_EXT registers

   - Fix reading ISA_EXT register of a missing extension

   - Fix ISA_EXT register handling in get-reg-list test

   - Fix filtering of AIA registers in get-reg-list test

  x86:

   - Fixes for TSC_AUX virtualization

   - Stop zapping page tables asynchronously, since we don't zap them as
     often as before"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX
  KVM: SVM: Fix TSC_AUX virtualization setup
  KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
  KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
  KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()
  KVM: x86/mmu: Open code leaf invalidation from mmu_notifier
  KVM: riscv: selftests: Selectively filter-out AIA registers
  KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list
  RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions
  RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers
  KVM: selftests: Assert that vasprintf() is successful
  KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID
  KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()

8a511e7e

Merge tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 5edc6bb3

Linus Torvalds authored Sep 24, 2023

Pull tracing fixes from Steven Rostedt:

 - Fix the "bytes" output of the per_cpu stat file

   The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
   accounting was not accurate. It is suppose to show how many used
   bytes are still in the ring buffer, but even when the ring buffer was
   empty it would still show there were bytes used.

 - Fix a bug in eventfs where reading a dynamic event directory (open)
   and then creating a dynamic event that goes into that diretory screws
   up the accounting.

   On close, the newly created event dentry will get a "dput" without
   ever having a "dget" done for it. The fix is to allocate an array on
   dir open to save what dentries were actually "dget" on, and what ones
   to "dput" on close.

* tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Remember what dentries were created on dir open
  ring-buffer: Fix bytes info in per_cpu buffer stats

5edc6bb3

Merge tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 2ad78f8c

Linus Torvalds authored Sep 24, 2023

Pull cxl fixes from Dan Williams:
 "A collection of regression fixes, bug fixes, and some small cleanups
  to the Compute Express Link code.

  The regressions arrived in the v6.5 dev cycle and missed the v6.6
  merge window due to my personal absences this cycle. The most
  important fixes are for scenarios where the CXL subsystem fails to
  parse valid region configurations established by platform firmware.
  This is important because agreement between OS and BIOS on the CXL
  configuration is fundamental to implementing "OS native" error
  handling, i.e. address translation and component failure
  identification.

  Other important fixes are a driver load error when the BIOS lets the
  Linux PCI core handle AER events, but not CXL memory errors.

  The other fixex might have end user impact, but for now are only known
  to trigger in our test/emulation environment.

  Summary:

   - Fix multiple scenarios where platform firmware defined regions fail
     to be assembled by the CXL core.

   - Fix a spurious driver-load failure on platforms that enable OS
     native AER, but not OS native CXL error handling.

   - Fix a regression detecting "poison" commands when "security"
     commands are also defined.

   - Fix a cxl_test regression with the move to centralize CXL port
     register enumeration in the CXL core.

   - Miscellaneous small fixes and cleanups"

* tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/acpi: Annotate struct cxl_cxims_data with __counted_by
  cxl/port: Fix cxl_test register enumeration regression
  cxl/region: Refactor granularity select in cxl_port_setup_targets()
  cxl/region: Match auto-discovered region decoders by HPA range
  cxl/mbox: Fix CEL logic for poison and security commands
  cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native()
  PCI/AER: Export pcie_aer_is_native()
  cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers

2ad78f8c

23 Sep, 2023 14 commits

Merge tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 3aba70ae

Linus Torvalds authored Sep 23, 2023

Pull gpio fixes from Bartosz Golaszewski:

 - fix an invalid usage of __free(kfree) leading to kfreeing an
   ERR_PTR()

 - fix an irq domain leak in gpio-tb10x

 - MAINTAINERS update

* tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sim: fix an invalid __free() usage
  gpio: tb10x: Fix an error handling path in tb10x_gpio_probe()
  MAINTAINERS: gpio-regmap: make myself a maintainer of it

3aba70ae

Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of... · 85eba5f1

Linus Torvalds authored Sep 23, 2023

Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three
  are cc:stable"

* tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: nommu: fix empty /proc/<pid>/maps
  filemap: add filemap_map_order0_folio() to handle order0 folio
  proc: nommu: /proc/<pid>/maps: release mmap read lock
  mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
  pidfd: prevent a kernel-doc warning
  argv_split: fix kernel-doc warnings
  scatterlist: add missing function params to kernel-doc
  selftests/proc: fixup proc-empty-vm test after KSM changes
  revert "scripts/gdb/symbols: add specific ko module load command"
  selftests: link libasan statically for tests with -fsanitize=address
  task_work: add kerneldoc annotation for 'data' argument
  mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
  sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning

85eba5f1

Merge tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 8565bdf8

Linus Torvalds authored Sep 23, 2023

Pull smb client fixes from Steve French:
 "Six smb3 client fixes, including three for stable, from the SMB
  plugfest (testing event) this week:

   - Reparse point handling fix (found when investigating dir
     enumeration when fifo in dir)

   - Fix excessive thread creation for dir lease cleanup

   - UAF fix in negotiate path

   - remove duplicate error message mapping and fix confusing warning
     message

   - add dynamic trace point to improve debugging RDMA connection
     attempts"

* tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix confusing debug message
  smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
  smb3: remove duplicate error mapping
  cifs: Fix UAF in cifs_demultiplex_thread()
  smb3: do not start laundromat thread when dir leases  disabled
  smb3: Add dynamic trace points for RDMA (smbdirect) reconnect

8565bdf8

Merge tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5a4de7dc

Linus Torvalds authored Sep 23, 2023

Pull i2c fixes from Wolfram Sang:
 "A set of I2C driver fixes. Mostly fixing resource leaks or sanity
  checks"

* tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: xiic: Correct return value check for xiic_reinit()
  i2c: mux: gpio: Add missing fwnode_handle_put()
  i2c: mux: demux-pinctrl: check the return value of devm_kstrdup()
  i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
  i2c: i801: unregister tco_pdev in i801_probe() error path

5a4de7dc

mfd: cs42l43: Use correct macro for new-style PM runtime ops · eb72d520

Charles Keepax authored Sep 19, 2023

The code was accidentally mixing new and old style macros, update the
macros used to remove an unused function warning whilst building with
no PM enabled in the config.

Fixes: ace6d144 ("mfd: cs42l43: Add support for cs42l43 core driver")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/all/20230822114914.340359-1-ckeepax@opensource.cirrus.com/Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eb72d520

Merge tag 'loongarch-fixes-6.6-1' of... · 93397d3a

Linus Torvalds authored Sep 23, 2023

Merge tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix lockdep, fix a boot failure, fix some build warnings, fix document
  links, and some cleanups"

* tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  docs/zh_CN/LoongArch: Update the links of ABI
  docs/LoongArch: Update the links of ABI
  LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem()
  kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage
  LoongArch: Set all reserved memblocks on Node#0 at initialization
  LoongArch: Remove dead code in relocate_new_kernel
  LoongArch: Use _UL() and _ULL()
  LoongArch: Fix some build warnings with W=1
  LoongArch: Fix lockdep static memory detection

93397d3a

Merge tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 2e3d3911

Linus Torvalds authored Sep 23, 2023

Pull s390 fixes from Vasily Gorbik:

 - Fix potential string buffer overflow in hypervisor user-defined
   certificates handling

 - Update defconfigs

* tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cert_store: fix string length handling
  s390: update defconfigs

2e3d3911

Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 59c376d6

Linus Torvalds authored Sep 23, 2023

Pull iomap fixes from Darrick Wong:

 - Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
   less poorly with block device io racing with block device resizing

 - Fix a stale page data exposure bug introduced in 6.6-rc1 when
   unsharing a file range that is not in the page cache

* tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: convert iomap_unshare_iter to use large folios
  iomap: don't skip reading in !uptodate folios when unsharing a range
  iomap: handle error conditions more gracefully in iomap_to_bh

59c376d6

Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into HEAD · 5804c19b

Paolo Bonzini authored Sep 23, 2023

KVM/riscv fixes for 6.6, take #1

- Fix KVM_GET_REG_LIST API for ISA_EXT registers
- Fix reading ISA_EXT register of a missing extension
- Fix ISA_EXT register handling in get-reg-list test
- Fix filtering of AIA registers in get-reg-list test

5804c19b

KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX · 916e3e5f

Tom Lendacky authored Sep 15, 2023

When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B"
within the VMSA. This means that the guest value is loaded on VMRUN and
the host value is restored from the host save area on #VMEXIT.

Since the value is restored on #VMEXIT, the KVM user return MSR support
for TSC_AUX can be replaced by populating the host save area with the
current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux
post-boot, the host save area can be set once in svm_hardware_enable().
This eliminates the two WRMSR instructions associated with the user return
MSR support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

916e3e5f

KVM: SVM: Fix TSC_AUX virtualization setup · e0096d01

Tom Lendacky authored Sep 15, 2023

The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.

This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.

Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.

With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.

Fixes: 296d5a17 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0096d01

KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway · e8d93d5d

Paolo Bonzini authored Sep 22, 2023

svm_recalc_instruction_intercepts() is always called at least once
before the vCPU is started, so the setting or clearing of the RDTSCP
intercept can be dropped from the TSC_AUX virtualization support.

Extracted from a patch by Tom Lendacky.

Cc: stable@vger.kernel.org
Fixes: 296d5a17 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e8d93d5d

KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously · 0df9dab8

Sean Christopherson authored Sep 15, 2023

Stop zapping invalidate TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated. Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended during if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).

While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot. Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost. Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.

When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in in a timely manner. In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.

Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.

Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong. Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.

Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events). Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.

Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.
Reported-by: Pattara Teerapong <pteerapong@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Yiwei Zhang<zzyiwei@google.com>
Cc: Paul Hsia <paulhsia@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0df9dab8

KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe() · 441a5dfc

Paolo Bonzini authored Sep 21, 2023

All callers except the MMU notifier want to process all address spaces.
Remove the address space ID argument of for_each_tdp_mmu_root_yield_safe()
and switch the MMU notifier to use __for_each_tdp_mmu_root_yield_safe().

Extracted out of a patch by Sean Christopherson <seanjc@google.com>

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

441a5dfc

22 Sep, 2023 8 commits

Merge tag 'hardening-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · d90b0276

Linus Torvalds authored Sep 22, 2023

Pull hardening fixes from Kees Cook:

 - Fix UAPI stddef.h to avoid C++-ism (Alexey Dobriyan)

 - Fix harmless UAPI stddef.h header guard endif (Alexey Dobriyan)

* tag 'hardening-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  uapi: stddef.h: Fix __DECLARE_FLEX_ARRAY for C++
  uapi: stddef.h: Fix header guard location

d90b0276

Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 3abc79dc

Linus Torvalds authored Sep 22, 2023

Pull xfs fixes from Chandan Babu:

 - Fix an integer overflow bug when processing an fsmap call

 - Fix crash due to CPU hot remove event racing with filesystem mount
   operation

 - During read-only mount, XFS does not allow the contents of the log to
   be recovered when there are one or more unrecognized rcompat features
   in the primary superblock, since the log might have intent items
   which the kernel does not know how to process

 - During recovery of log intent items, XFS now reserves log space
   sufficient for one cycle of a permanent transaction to execute.
   Otherwise, this could lead to livelocks due to non-availability of
   log space

 - On an fs which has an ondisk unlinked inode list, trying to delete a
   file or allocating an O_TMPFILE file can cause the fs to the shutdown
   if the first inode in the ondisk inode list is not present in the
   inode cache. The bug is solved by explicitly loading the first inode
   in the ondisk unlinked inode list into the inode cache if it is not
   already cached

   A similar problem arises when the uncached inode is present in the
   middle of the ondisk unlinked inode list. This second bug is
   triggered when executing operations like quotacheck and bulkstat. In
   this case, XFS now reads in the entire ondisk unlinked inode list

 - Enable LARP mode only on recent v5 filesystems

 - Fix a out of bounds memory access in scrub

 - Fix a performance bug when locating the tail of the log during
   mounting a filesystem

* tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
  xfs: only call xchk_stats_merge after validating scrub inputs
  xfs: require a relatively recent V5 filesystem for LARP mode
  xfs: make inode unlinked bucket recovery work with quotacheck
  xfs: load uncached unlinked inodes into memory on demand
  xfs: reserve less log space when recovering log intent items
  xfs: fix log recovery when unknown rocompat bits are set
  xfs: reload entire unlinked bucket lists
  xfs: allow inode inactivation during a ro mount log recovery
  xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
  xfs: remove CPU hotplug infrastructure
  xfs: remove the all-mounts list
  xfs: use per-mount cpumask to track nonempty percpu inodegc lists
  xfs: fix an agbno overflow in __xfs_getfsmap_datadev
  xfs: fix per-cpu CIL structure aggregation racing with dying cpus
  xfs: fix select in config XFS_ONLINE_SCRUB_STATS

3abc79dc

cxl/acpi: Annotate struct cxl_cxims_data with __counted_by · c66650d2

Kees Cook authored Sep 22, 2023

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cxl_cxims_data.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230922175319.work.096-kees@kernel.orgSigned-off-by: Dan Williams <dan.j.williams@intel.com>

c66650d2

cxl/port: Fix cxl_test register enumeration regression · a76b6251

Dan Williams authored Sep 15, 2023

The cxl_test unit test environment models a CXL topology for
sysfs/user-ABI regression testing. It uses interface mocking via the
"--wrap=" linker option to redirect cxl_core routines that parse
hardware registers with versions that just publish objects, like
devm_cxl_enumerate_decoders().

Starting with:

Commit 19ab69a6 ("cxl/port: Store the port's Component Register mappings in struct cxl_port")

...port register enumeration is moved into devm_cxl_add_port(). This
conflicts with the "cxl_test avoids emulating registers stance" so
either the port code needs to be refactored (too violent), or modified
so that register enumeration is skipped on "fake" cxl_test ports
(annoying, but straightforward).

This conflict has happened previously and the "check for platform
device" workaround to avoid instrusive refactoring was deployed in those
scenarios. In general, refactoring should only benefit production code,
test code needs to remain minimally instrusive to the greatest extent
possible.

This was missed previously because it may sometimes just cause warning
messages to be emitted, but it can also cause test failures. The
backport to -stable is only nice to have for clean cxl_test runs.

Fixes: 19ab69a6 ("cxl/port: Store the port's Component Register mappings in struct cxl_port")
Cc: stable@vger.kernel.org
Reported-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

a76b6251

eventfs: Remember what dentries were created on dir open · ef36b4f9

Steven Rostedt (Google) authored Sep 22, 2023

Using the following code with libtracefs:

	int dfd;

	// create the directory events/kprobes/kp1
	tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1");

	// Open the kprobes directory
	dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY);

	// Do a lookup of the kprobes/kp1 directory (by looking at enable)
	tracefs_file_exists(NULL, "events/kprobes/kp1/enable");

	// Now create a new entry in the kprobes directory
	tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1");

	// Do another lookup to create the dentries
	tracefs_file_exists(NULL, "events/kprobes/kp2/enable"))

	// Close the directory
	close(dfd);

What happened above, the first open (dfd) will call
dcache_dir_open_wrapper() that will create the dentries and up their ref
counts.

Now the creation of "kp2" will add another dentry within the kprobes
directory.

Upon the close of dfd, eventfs_release() will now do a dput for all the
entries in kprobes. But this is where the problem lies. The open only
upped the dentry of kp1 and not kp2. Now the close is decrementing both
kp1 and kp2, which causes kp2 to get a negative count.

Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel
to crash! (due to the messed up accounting of the ref counts).

To solve this, save all the dentries that are opened in the
dcache_dir_open_wrapper() into an array, and use this array to know what
dentries to do a dput on in eventfs_release().

Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the
file->private_data, we need to also add a wrapper around dcache_readdir()
that uses the cursor assigned to the file->private_data. This is because
the dentries need to also be saved in the file->private_data. To do this
create the structure:

  struct dentry_list {
	void		*cursor;
	struct dentry	**dentries;
  };

Which will hold both the cursor and the dentries. Some shuffling around is
needed to make sure that dcache_dir_open() and dcache_readdir() only see
the cursor.

Link: https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 63940449 ("eventfs: Implement eventfs lookup, read, open functions")
Reported-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ef36b4f9

ring-buffer: Fix bytes info in per_cpu buffer stats · 45d99ea4

Zheng Yejian authored Sep 21, 2023

The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of
bytes in cpu buffer that have not been consumed. However, currently
after consuming data by reading file 'trace_pipe', the 'bytes' info
was not changed as expected.

  # cat per_cpu/cpu0/stats
  entries: 0
  overrun: 0
  commit overrun: 0
  bytes: 568             <--- 'bytes' is problematical !!!
  oldest event ts:  8651.371479
  now ts:  8653.912224
  dropped events: 0
  read events: 8

The root cause is incorrect stat on cpu_buffer->read_bytes. To fix it:
  1. When stat 'read_bytes', account consumed event in rb_advance_reader();
  2. When stat 'entries_bytes', exclude the discarded padding event which
     is smaller than minimum size because it is invisible to reader. Then
     use rb_page_commit() instead of BUF_PAGE_SIZE at where accounting for
     page-based read/remove/overrun.

Also correct the comments of ring_buffer_bytes_cpu() in this patch.

Link: https://lore.kernel.org/linux-trace-kernel/20230921125425.1708423-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: c64e148a ("trace: Add ring buffer stats to measure rate of events")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

45d99ea4

Merge tag 'thermal-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 8018e02a

Linus Torvalds authored Sep 22, 2023

Pull thermal control fix from Rafael Wysocki:
 "Unbreak the trip point update sysfs interface that has been broken
  since the 6.3 cycle (Rafael Wysocki)"

* tag 'thermal-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: sysfs: Fix trip_point_hyst_store()

8018e02a

Merge tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b184c040

Linus Torvalds authored Sep 22, 2023

Pull ACPI fixes from Rafael Wysocki:
 "These fix a general ACPI processor driver regression and an ia64 build
  issue, both introduced recently.

  Specifics:

   - Fix recently introduced uninitialized memory access issue in the
     ACPI processor driver (Michal Wilczynski)

   - Fix ia64 build inadvertently broken by recent ACPI processor driver
     changes, which is prudent to do for 6.6 even though ia64 support is
     slated for removal in 6.7 (Ard Biesheuvel)"

* tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Fix uninitialized access of buf in acpi_set_pdc_bits()
  acpi: Provide ia64 dummy implementation of acpi_proc_quirk_mwait_check()

b184c040