Commits · 0f099dc9d1149ceaa9319810dba44ffd06f3aef7 · Kirill Smelkov / linux

03 Apr, 2024 4 commits

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 0f099dc9

Linus Torvalds authored Apr 03, 2024

Pull KVM fixes from Paolo Bonzini:
 "ARM:

   - Ensure perf events programmed to count during guest execution are
     actually enabled before entering the guest in the nVHE
     configuration

   - Restore out-of-range handler for stage-2 translation faults

   - Several fixes to stage-2 TLB invalidations to avoid stale
     translations, possibly including partial walk caches

   - Fix early handling of architectural VHE-only systems to ensure E2H
     is appropriately set

   - Correct a format specifier warning in the arch_timer selftest

   - Make the KVM banner message correctly handle all of the possible
     configurations

  RISC-V:

   - Remove redundant semicolon in num_isa_ext_regs()

   - Fix APLIC setipnum_le/be write emulation

   - Fix APLIC in_clrip[x] read emulation

  x86:

   - Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID
     entries (old vs. new) and ultimately neglects to clear PV_UNHALT
     from vCPUs with HLT-exiting disabled

   - Documentation fixes for SEV

   - Fix compat ABI for KVM_MEMORY_ENCRYPT_OP

   - Fix a 14-year-old goof in a declaration shared by host and guest;
     the enabled field used by Linux when running as a guest pushes the
     size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is
     really unconsequential because KVM never consumes anything beyond
     the first 64 bytes, but the resulting struct does not match the
     documentation

  Selftests:

   - Fix spelling mistake in arch_timer selftest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: arm64: Rationalise KVM banner output
  arm64: Fix early handling of FEAT_E2H0 not being implemented
  KVM: arm64: Ensure target address is granule-aligned for range TLBI
  KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
  KVM: arm64: Don't pass a TLBI level hint when zapping table entries
  KVM: arm64: Don't defer TLB invalidation when zapping table entries
  KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test
  KVM: arm64: Fix out-of-IPA space translation fault handling
  KVM: arm64: Fix host-programmed guest events in nVHE
  RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
  RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
  RISC-V: KVM: Remove second semicolon
  KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
  Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP
  Documentation: kvm/sev: separate description of firmware
  KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
  KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled
  KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
  KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
  KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES
  ...

0f099dc9

security: Place security_path_post_mknod() where the original IMA call was · 701b3899

Roberto Sassu authored Apr 03, 2024

Commit 08abce60 ("security: Introduce path_post_mknod hook")
introduced security_path_post_mknod(), to replace the IMA-specific call
to ima_post_path_mknod().

For symmetry with security_path_mknod(), security_path_post_mknod() was
called after a successful mknod operation, for any file type, rather
than only for regular files at the time there was the IMA call.

However, as reported by VFS maintainers, successful mknod operation does
not mean that the dentry always has an inode attached to it (for
example, not for FIFOs on a SAMBA mount).

If that condition happens, the kernel crashes when
security_path_post_mknod() attempts to verify if the inode associated to
the dentry is private.

Move security_path_post_mknod() where the ima_post_path_mknod() call was,
which is obviously correct from IMA/EVM perspective. IMA/EVM are the only
in-kernel users, and only need to inspect regular files.
Reported-by: Steve French <smfrench@gmail.com>
Closes: https://lore.kernel.org/linux-kernel/CAH2r5msAVzxCUHHG8VKrMPUKQHmBpE6K9_vjhgDa1uAvwx4ppw@mail.gmail.com/Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 08abce60 ("security: Introduce path_post_mknod hook")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

701b3899

x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO · 0e110732

Borislav Petkov (AMD) authored Apr 02, 2024

The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO
case is there only for the altenative in CALL_UNTRAIN_RET to have
a symbol to resolve.

However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
enabled, leads to the warning in patch_return() to fire:

missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826

Put in a plain "ret" there so that gcc doesn't put a return thunk in
in its place which special and gets checked.

In addition:

ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1
make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2
make: *** [Makefile:240: __sub-make] Chyba 2

since !SRSO builds would use the dummy return thunk as reported by
petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.comSigned-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0e110732

Merge tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 3e92c1e6

Linus Torvalds authored Apr 02, 2024

Pull selinux fix from Paul Moore:
 "A single patch for SELinux to fix a problem where we could potentially
  dereference an error pointer if we failed to successfully mount
  selinuxfs"

* tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: avoid dereference of garbage after mount failure

3e92c1e6

02 Apr, 2024 5 commits

Merge tag 'docs-6.9-fixes' of git://git.lwn.net/linux · b1e6ec0a

Linus Torvalds authored Apr 02, 2024

Pull documentation fixes from Jonathan Corbet:
 "Four small documentation fixes"

* tag 'docs-6.9-fixes' of git://git.lwn.net/linux:
  docs: zswap: fix shell command format
  tracing: Fix documentation on tp_printk cmdline option
  docs: Fix bitfield handling in kernel-doc
  Documentation: dev-tools: Add link to RV docs

b1e6ec0a

Merge tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs · 67199a47

Linus Torvalds authored Apr 02, 2024

Pull bcachefs fixes from Kent Overstreet:
 "Lots of fixes for situations with extreme filesystem damage.

  One fix ("Fix journal pins in btree write buffer") applicable to
  normal usage; also a dio performance fix.

  New repair/construction code is in the final stages, should be ready
  in about a week. Anyone that lost btree interior nodes (or a variety
  of other damage) as a result of the splitbrain bug will be able to
  repair then"

* tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits)
  bcachefs: On emergency shutdown, print out current journal sequence number
  bcachefs: Fix overlapping extent repair
  bcachefs: Fix remove_dirent()
  bcachefs: Logged op errors should be ignored
  bcachefs: Improve -o norecovery; opts.recovery_pass_limit
  bcachefs: bch2_run_explicit_recovery_pass_persistent()
  bcachefs: Ensure bch_sb_field_ext always exists
  bcachefs: Flush journal immediately after replay if we did early repair
  bcachefs: Resume logged ops after fsck
  bcachefs: Add error messages to logged ops fns
  bcachefs: Split out recovery_passes.c
  bcachefs: fix backpointer for missing alloc key msg
  bcachefs: Fix bch2_btree_increase_depth()
  bcachefs: Kill bch2_bkey_ptr_data_type()
  bcachefs: Fix use after free in check_root_trans()
  bcachefs: Fix repair path for missing indirect extents
  bcachefs: Fix use after free in bch2_check_fix_ptrs()
  bcachefs: Fix btree node keys accounting in topology repair path
  bcachefs: Check btree ptr min_key in .invalid
  bcachefs: add REQ_SYNC and REQ_IDLE in write dio
  ...

67199a47

Merge tag 'kvm-riscv-fixes-6.9-1' of https://github.com/kvm-riscv/linux into HEAD · 9bc60f73

Paolo Bonzini authored Apr 02, 2024

KVM/riscv fixes for 6.9, take #1

- Fix spelling mistake in arch_timer selftest
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation

9bc60f73

Merge tag 'kvmarm-fixes-6.9-1' of... · 52b761b4

Paolo Bonzini authored Apr 02, 2024

Merge tag 'kvmarm-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.9, part #1

 - Ensure perf events programmed to count during guest execution
   are actually enabled before entering the guest in the nVHE
   configuration.

 - Restore out-of-range handler for stage-2 translation faults.

 - Several fixes to stage-2 TLB invalidations to avoid stale
   translations, possibly including partial walk caches.

 - Fix early handling of architectural VHE-only systems to ensure E2H is
   appropriately set.

 - Correct a format specifier warning in the arch_timer selftest.

 - Make the KVM banner message correctly handle all of the possible
   configurations.

52b761b4

selinux: avoid dereference of garbage after mount failure · 37801a36

Christian Göttsche authored Mar 28, 2024

In case kern_mount() fails and returns an error pointer return in the
error branch instead of continuing and dereferencing the error pointer.

While on it drop the never read static variable selinuxfs_mount.

Cc: stable@vger.kernel.org
Fixes: 0619f0f5 ("selinux: wrap selinuxfs state")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

37801a36

01 Apr, 2024 31 commits

Merge tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux · 026e680b

Linus Torvalds authored Apr 01, 2024

Pull pwm fix from Uwe Kleine-König:
 "This fixes a regression intoduced by an off-by-one in v6.9-rc1 making
  the pwm-pxa and the pwm driver in ti-sn65dsi86 unusable for most
  consumer drivers because the default period wasn't set"

* tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: Fix setting period with #pwm-cells = <1> and of_pwm_single_xlate()

026e680b

KVM: arm64: Rationalise KVM banner output · d96c66ab

Marc Zyngier authored Mar 21, 2024

We are not very consistent when it comes to displaying which mode
we're in (VHE, {n,h}VHE, protected or not). For example, booting
in protected mode with hVHE results in:

[    0.969545] kvm [1]: Protected nVHE mode initialized successfully

which is mildly amusing considering that the machine is VHE only.

We already cleaned this up a bit with commit 1f3ca702 ("KVM:
arm64: print Hyp mode"), but that's still unsatisfactory.

Unify the three strings into one and use a mess of conditional
statements to sort it out (yes, it's a slow day).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240321173706.3280796-1-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

d96c66ab

arm64: Fix early handling of FEAT_E2H0 not being implemented · b3320142

Marc Zyngier authored Mar 21, 2024

Commit 3944382f introduced checks for the FEAT_E2H0 not being
implemented. However, the check is absolutely wrong and makes a
point it testing a bit that is guaranteed to be zero.

On top of that, the detection happens way too late, after the
init_el2_state has done its job.

This went undetected because the HW this was tested on has E2H being
RAO/WI, and not RES1. However, the bug shows up when run as a nested
guest, where HCR_EL2.E2H is not necessarily set to 1. As a result,
booting the kernel in hVHE mode fails with timer accesses being
cought in a trap loop (which was fun to debug).

Fix the check for ID_AA64MMFR4_EL1.E2H0, and set the HCR_EL2.E2H bit
early so that it can be checked by the rest of the init sequence.

With this, hVHE works again in a NV environment that doesn't have
FEAT_E2H0.

Fixes: 3944382f ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240321115414.3169115-1-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

b3320142

KVM: arm64: Ensure target address is granule-aligned for range TLBI · 4c36a156

Will Deacon authored Mar 27, 2024

When zapping a table entry in stage2_try_break_pte(), we issue range
TLB invalidation for the region that was mapped by the table. However,
we neglect to align the base address down to the granule size and so
if we ended up reaching the table entry via a misaligned address then
we will accidentally skip invalidation for some prefix of the affected
address range.

Align 'ctx->addr' down to the granule size when performing TLB
invalidation for an unmapped table in stage2_try_break_pte().

Cc: Raghavendra Rao Ananta <rananta@google.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Quentin Perret <qperret@google.com>
Fixes: defc8cc7 ("KVM: arm64: Invalidate the table entries upon a range")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240327124853.11206-5-will@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

4c36a156

KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range() · 0f0ff097

Will Deacon authored Mar 27, 2024

Commit c910f2b6 ("arm64/mm: Update tlb invalidation routines for
FEAT_LPA2") updated the __tlbi_level() macro to take the target level
as an argument, with TLBI_TTL_UNKNOWN (rather than 0) indicating that
the caller cannot provide level information. Unfortunately, the two
implementations of __kvm_tlb_flush_vmid_range() were not updated and so
now ask for an level 0 invalidation if FEAT_LPA2 is implemented.

Fix the problem by passing TLBI_TTL_UNKNOWN instead of 0 as the level
argument to __flush_s2_tlb_range_op() in __kvm_tlb_flush_vmid_range().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: c910f2b6 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240327124853.11206-4-will@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

0f0ff097

KVM: arm64: Don't pass a TLBI level hint when zapping table entries · 36e00832

Will Deacon authored Mar 27, 2024

The TLBI level hints are for leaf entries only, so take care not to pass
them incorrectly after clearing a table entry.

Cc: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Fixes: 82bb0244 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
Fixes: 6d9d2115 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240327124853.11206-3-will@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

36e00832

KVM: arm64: Don't defer TLB invalidation when zapping table entries · f62d4c3e

Will Deacon authored Mar 27, 2024

Commit 7657ea92 ("KVM: arm64: Use TLBI range-based instructions for
unmap") introduced deferred TLB invalidation for the stage-2 page-table
so that range-based invalidation can be used for the accumulated
addresses. This works fine if the structure of the page-tables remains
unchanged, but if entire tables are zapped and subsequently freed then
we transiently leave the hardware page-table walker with a reference
to freed memory thanks to the translation walk caches. For example,
stage2_unmap_walker() will free page-table pages:

	if (childp)
		mm_ops->put_page(childp);

and issue the TLB invalidation later in kvm_pgtable_stage2_unmap():

	if (stage2_unmap_defer_tlb_flush(pgt))
		/* Perform the deferred TLB invalidations */
		kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);

For now, take the conservative approach and invalidate the TLB eagerly
when we clear a table entry. Note, however, that the existing level
hint passed to __kvm_tlb_flush_vmid_ipa() is incorrect and will be
fixed in a subsequent patch.

Cc: Raghavendra Rao Ananta <rananta@google.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240327124853.11206-2-will@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

f62d4c3e

bcachefs: On emergency shutdown, print out current journal sequence number · b3c7fd35
Kent Overstreet authored Mar 30, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
b3c7fd35

bcachefs: Fix overlapping extent repair · eab3a3ce

Kent Overstreet authored Mar 30, 2024

overlapping extent repair was colliding with extent past end of inode
checks - don't update "extent ends at" until we know we have an extent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

eab3a3ce

bcachefs: Fix remove_dirent() · 8ce1db80

Kent Overstreet authored Apr 01, 2024

We were missing an iter_traverse().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8ce1db80

bcachefs: Logged op errors should be ignored · cecfed9b

Kent Overstreet authored Mar 31, 2024

If something is wrong with a logged op, we just want to delete it -
there's nothing to repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cecfed9b

bcachefs: Improve -o norecovery; opts.recovery_pass_limit · 13c1e583

Kent Overstreet authored Mar 28, 2024

This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.

Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.

When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.

recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

13c1e583

bcachefs: bch2_run_explicit_recovery_pass_persistent() · 060ff30a

Kent Overstreet authored Mar 29, 2024

Flag that we need to run a recovery pass and run it - persistenly, so if
we crash it'll still get run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

060ff30a

bcachefs: Ensure bch_sb_field_ext always exists · 0a34c058

Kent Overstreet authored Mar 30, 2024

This makes bch_sb_field_ext more consistent with the rest of -o
nochanges - we don't want to be varying other codepaths based on -o
nochanges, since it's used for testing in dry run mode; also fixes some
potential null ptr derefs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0a34c058

bcachefs: Flush journal immediately after replay if we did early repair · 4fe0eeea
Kent Overstreet authored Mar 28, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
4fe0eeea

bcachefs: Resume logged ops after fsck · af855a5f

Kent Overstreet authored Mar 23, 2024

Finishing logged ops requires the filesystem to be in a reasonably
consistent state - and other fsck passes don't require it to have
completed, so just run it last.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

af855a5f

bcachefs: Add error messages to logged ops fns · e5aa8046
Kent Overstreet authored Mar 23, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
e5aa8046

bcachefs: Split out recovery_passes.c · d2554263

Kent Overstreet authored Mar 23, 2024

We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d2554263

bcachefs: fix backpointer for missing alloc key msg · 11d5568d
Kent Overstreet authored Mar 28, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
11d5568d

bcachefs: Fix bch2_btree_increase_depth() · 7f9e5080

Kent Overstreet authored Mar 14, 2024

When we haven't yet allocated any btree nodes for a given btree, we
first need to call the regular split path to allocate one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7f9e5080

bcachefs: Kill bch2_bkey_ptr_data_type() · 47d2080e

Kent Overstreet authored Mar 25, 2024

Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

47d2080e

bcachefs: Fix use after free in check_root_trans() · dcc1c045
Kent Overstreet authored Mar 26, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
dcc1c045
bcachefs: Fix repair path for missing indirect extents · 83bb5853
Kent Overstreet authored Mar 26, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
83bb5853
bcachefs: Fix use after free in bch2_check_fix_ptrs() · 6f5869ff
Kent Overstreet authored Mar 26, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
6f5869ff

bcachefs: Fix btree node keys accounting in topology repair path · 812a9297

Kent Overstreet authored Mar 26, 2024

When dropping keys now outside a now because we're changing the node
min/max, we need to redo the node's accounting as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

812a9297

bcachefs: Check btree ptr min_key in .invalid · 805b535a
Kent Overstreet authored Mar 25, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
805b535a

bcachefs: add REQ_SYNC and REQ_IDLE in write dio · bb660099

zhuxiaohui authored Mar 26, 2024

when writing file with direct_IO on bcachefs, then performance is
much lower than other fs due to write back throttle in block layer:

        wbt_wait+1
        __rq_qos_throttle+32
        blk_mq_submit_bio+394
        submit_bio_noacct_nocheck+649
        bch2_submit_wbio_replicas+538
        __bch2_write+2539
        bch2_direct_write+1663
        bch2_write_iter+318
        aio_write+355
        io_submit_one+1224
        __x64_sys_io_submit+169
        do_syscall_64+134
        entry_SYSCALL_64_after_hwframe+110

add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io
Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bb660099

bcachefs: Improved topology repair checks · 79032b07

Kent Overstreet authored Mar 23, 2024

Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

79032b07

bcachefs: Be careful about btree node splits during journal replay · 40cb2623
Kent Overstreet authored Mar 26, 2024
```
Don't pick a pivot that's going to be deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
40cb2623

bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finished · 048f47e8

Kent Overstreet authored Mar 25, 2024

btree_and_journal_iter is now safe to use at runtime, not just during
recovery before journal keys have been freed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

048f47e8

bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc · 36f9ef10

Hongbo Li authored Mar 25, 2024

The old code doesn't consider the mem alloced from mempool when call
krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to
free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX"
is inaccurate when trans->mem was allocated by krealloc function.
Instead, we use used_mempool stuff to record the situation, and realloc
or free the trans->mem in elegant way.

Also, after krealloc failed in __bch2_trans_kmalloc, the old data
should be copied to the new buffer when alloc from mempool_alloc.

Fixes: 31403dca ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

36f9ef10