- 01 Mar, 2024 10 commits
-
-
Saurabh Sengar authored
The current method for signaling the compatibility of a Hyper-V host with MSIs featuring 15-bit APIC IDs relies on a synthetic cpuid leaf. However, for higher VTLs, this leaf is not reported, due to the absence of an IO-APIC. As an alternative, assume that when running at a high VTL, the host supports 15-bit APIC IDs. This assumption is safe, as Hyper-V does not employ any architectural MSIs at higher VTLs This unblocks startup of VTL2 environments with more than 256 CPUs. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1705341460-18394-1-git-send-email-ssengar@linux.microsoft.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1705341460-18394-1-git-send-email-ssengar@linux.microsoft.com>
-
Michael Kelley authored
In a CoCo VM, when transitioning memory from encrypted to decrypted, or vice versa, the caller of set_memory_encrypted() or set_memory_decrypted() is responsible for ensuring the memory isn't in use and isn't referenced while the transition is in progress. The transition has multiple steps, and the memory is in an inconsistent state until all steps are complete. A reference while the state is inconsistent could result in an exception that can't be cleanly fixed up. However, the kernel load_unaligned_zeropad() mechanism could cause a stray reference that can't be prevented by the caller of set_memory_encrypted() or set_memory_decrypted(), so there's specific code to handle this case. But a CoCo VM running on Hyper-V may be configured to run with a paravisor, with the #VC or #VE exception routed to the paravisor. There's no architectural way to forward the exceptions back to the guest kernel, and in such a case, the load_unaligned_zeropad() specific code doesn't work. To avoid this problem, mark pages as "not present" while a transition is in progress. If load_unaligned_zeropad() causes a stray reference, a normal page fault is generated instead of #VC or #VE, and the page-fault-based fixup handlers for load_unaligned_zeropad() resolve the reference. When the encrypted/decrypted transition is complete, mark the pages as "present" again. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20240116022008.1023398-4-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240116022008.1023398-4-mhklinux@outlook.com>
-
Michael Kelley authored
set_memory_p() is currently static. It has parameters that don't match set_memory_p() under arch/powerpc and that aren't congruent with the other set_memory_* functions. There's no good reason for the difference. Fix this by making the parameters consistent, and update the one existing call site. Make the function non-static and add it to include/asm/set_memory.h so that it is completely parallel to set_memory_np() and is usable in other modules. No functional change. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240116022008.1023398-3-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240116022008.1023398-3-mhklinux@outlook.com>
-
Michael Kelley authored
In preparation for temporarily marking pages not present during a transition between encrypted and decrypted, use slow_virt_to_phys() in the hypervisor callback. As long as the PFN is correct, slow_virt_to_phys() works even if the leaf PTE is not present. The existing functions that depend on vmalloc_to_page() all require that the leaf PTE be marked present, so they don't work. Update the comments for slow_virt_to_phys() to note this broader usage and the requirement to work even if the PTE is not marked present. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20240116022008.1023398-2-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240116022008.1023398-2-mhklinux@outlook.com>
-
Michael Kelley authored
Add documentation topic for PCI pass-thru devices in Linux guests on Hyper-V and for the associated PCI controller driver (pci-hyperv.c). Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/20240222200710.305259-1-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240222200710.305259-1-mhklinux@outlook.com>
-
Michael Kelley authored
A previous commit left the indentation in create_gpadl_header() unchanged for ease of review. Update the indentation and remove line wrap in two places where it is no longer necessary. No functional change. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20240111165451.269418-2-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240111165451.269418-2-mhklinux@outlook.com>
-
Michael Kelley authored
create_gpadl_header() creates a message header, and one or more message bodies if the number of GPADL entries exceeds what fits in the header. Currently the code for creating the message header is duplicated in the two halves of the main "if" statement governing whether message bodies are created. Eliminate the duplication by making minor tweaks to the logic and associated comments. While here, simplify the handling of memory allocation errors, and use umin() instead of open coding it. For ease of review, the indentation of sizable chunks of code is *not* changed. A follow-on patch updates only the indentation. No functional change. Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20240111165451.269418-1-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240111165451.269418-1-mhklinux@outlook.com>
-
Michael Kelley authored
A recent commit removing the use of screen_info introduced a logic error. The error causes hvfb_getmem() to always return -ENOMEM for Generation 2 VMs. As a result, the Hyper-V frame buffer device fails to initialize. The error was introduced by removing an "else if" clause, leaving Gen2 VMs to always take the -ENOMEM error path. Fix the problem by removing the error path "else" clause. Gen 2 VMs now always proceed through the MMIO memory allocation code, but with "base" and "size" defaulting to 0. Fixes: 0aa0838c ("fbdev/hyperv_fb: Remove firmware framebuffers with aperture helpers") Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20240201060022.233666-1-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240201060022.233666-1-mhklinux@outlook.com>
-
Michael Kelley authored
The VMBUS_RING_SIZE macro adds space for a ring buffer header to the requested ring buffer size. The header size is always 1 page, and so its size varies based on the PAGE_SIZE for which the kernel is built. If the requested ring buffer size is a large power-of-2 size and the header size is small, the resulting size is inefficient in its use of memory. For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory allocator, and wastes 508 Kbytes of memory. In such situations, the exact size of the ring buffer isn't that important, and it's OK to allocate the 4 Kbyte header at the beginning of the 512 Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted. Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer sizes. "Large" is somewhat arbitrarily defined as 8 times the size of the ring buffer header (which is of size PAGE_SIZE). For example, for 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first 4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer header. In both cases, smaller sizes add space for the header so the ring size isn't reduced too much by using part of the space for the header. For example, with a 64 Kbyte page size, we don't want a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half of the space for the header. In such a case, the memory allocation is less efficient, but it's the best that can be done. While the new algorithm slightly changes the amount of space allocated for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't known to be sensitive to small changes in ring buffer size, so there shouldn't be any effect. Fixes: c1135c7f ("Drivers: hv: vmbus: Introduce types of GPADL") Fixes: 6941f67a ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502 Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Tested-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Link: https://lore.kernel.org/r/20240229004533.313662-1-mhklinux@outlook.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240229004533.313662-1-mhklinux@outlook.com>
-
Peter Martincic authored
Hyper-V hosts can omit the _SYNC flag to due a bug on resume from modern suspend. In such a case, the guest may fail to update its time-of-day to account for the period when it was suspended, and could proceed with a significantly wrong time-of-day. In such a case when the guest is significantly behind, fix it by treating a _SAMPLE the same as if _SYNC was received so that the guest time-of-day is updated. This is hidden behind param hv_utils.timesync_implicit. Signed-off-by: Peter Martincic <pmartincic@microsoft.com> Acked-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20231127213524.52783-1-pmartincic@linux.microsoft.comSigned-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20231127213524.52783-1-pmartincic@linux.microsoft.com>
-
- 25 Feb, 2024 30 commits
-
-
Linus Torvalds authored
-
https://evilpiepirate.org/git/bcachefsLinus Torvalds authored
Pull bcachefs fixes from Kent Overstreet: "Some more mostly boring fixes, but some not User reported ones: - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty performance bug; user reported an untar initially taking two seconds and then ~2 minutes - kill a __GFP_NOFAIL in the buffered read path; this was a leftover from the trickier fix to kill __GFP_NOFAIL in readahead, where we can't return errors (and have to silently truncate the read ourselves). bcachefs can't use GFP_NOFAIL for folio state unlike iomap based filesystems because our folio state is just barely too big, 2MB hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL. additionally, the flags argument was just buggy, we weren't supplying GFP_KERNEL previously (!)" * tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs: bcachefs: fix bch2_save_backtrace() bcachefs: Fix check_snapshot() memcpy bcachefs: Fix bch2_journal_flush_device_pins() bcachefs: fix iov_iter count underflow on sub-block dio read bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree bcachefs: Kill __GFP_NOFAIL in buffered read path bcachefs: fix backpointer_to_text() when dev does not exist
-
Kent Overstreet authored
Missed a call in the previous fix. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
git://git.lwn.net/linuxLinus Torvalds authored
Pull two documentation build fixes from Jonathan Corbet: - The XFS online fsck documentation uses incredibly deeply nested subsection and list nesting; that broke the PDF docs build. Tweak a parameter to tell LaTeX to allow the deeper nesting. - Fix a 6.8 PDF-build regression * tag 'docs-6.8-fixes3' of git://git.lwn.net/linux: docs: translations: use attribute to store current language docs: Instruct LaTeX to cope with deeper nesting
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fixes from Greg KH: "Here are some small USB fixes for 6.8-rc6 to resolve some reported problems. These include: - regression fixes with typec tpcm code as reported by many - cdnsp and cdns3 driver fixes - usb role setting code bugfixes - build fix for uhci driver - ncm gadget driver bugfix - MAINTAINERS entry update All of these have been in linux-next all week with no reported issues and there is at least one fix in here that is in Thorsten's regression list that is being tracked" * tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tpcm: Fix issues with power being removed during reset MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role" usb: gadget: omap_udc: fix USB gadget regression on Palm TE usb: dwc3: gadget: Don't disconnect if not started usb: cdns3: fix memory double free when handle zero packet usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable() usb: roles: don't get/set_role() when usb_role_switch is unregistered usb: roles: fix NULL pointer issue when put module's reference usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers usb: cdnsp: blocked some cdns3 specific code usb: uhci-grlib: Explicitly include linux/platform_device.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty/serial driver fixes from Greg KH: "Here are three small serial/tty driver fixes for 6.8-rc6 that resolve the following reported errors: - riscv hvc console driver fix that was reported by many - amba-pl011 serial driver fix for RS485 mode - stm32 serial driver fix for RS485 mode All of these have been in linux-next all week with no reported problems" * tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: amba-pl011: Fix DMA transmission in RS485 mode serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled tty: hvc: Don't enable the RISC-V SBI console by default
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - Make sure clearing CPU buffers using VERW happens at the latest possible point in the return-to-userspace path, otherwise memory accesses after the VERW execution could cause data to land in CPU buffers again * tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM/VMX: Move VERW closer to VMentry for MDS mitigation KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key x86/entry_32: Add VERW just before userspace transition x86/entry_64: Add VERW just before userspace transition x86/bugs: Add asm helpers for executing VERW
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Borislav Petkov: - Make sure GICv4 always gets initialized to prevent a kexec-ed kernel from silently failing to set it up - Do not call bus_get_dev_root() for the mbigen irqchip as it always returns NULL - use NULL directly - Fix hardware interrupt number truncation when assigning MSI interrupts - Correct sending end-of-interrupt messages to disabled interrupts lines on RISC-V PLIC * tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Do not assume vPE tables are preallocated irqchip/mbigen: Don't use bus_get_dev_root() to find the parent PCI/MSI: Prevent MSI hardware interrupt number truncation irqchip/sifive-plic: Enable interrupt if needed before EOI
-
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofsLinus Torvalds authored
Pull erofs fix from Gao Xiang: - Fix page refcount leak when looking up specific inodes introduced by metabuf reworking * tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix refcount on the metabuf used for inode lookup
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull RCU pathwalk fixes from Al Viro: "We still have some races in filesystem methods when exposed to RCU pathwalk. This series is a result of code audit (the second round of it) and it should deal with most of that stuff. Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a note for NTFS folks - when documentation says that a method may not block, it *does* imply that blocking allocations are to be avoided. Really)" [ More explanations for people who aren't familiar with the vagaries of RCU path walking: most of it is hidden from filesystems, but if a filesystem actively participates in the low-level path walking it needs to make sure the fields involved in that walk are RCU-safe. That "actively participate in low-level path walking" includes things like having its own ->d_hash()/->d_compare() routines, or by having its own directory permission function that doesn't just use the common helpers. Having a ->d_revalidate() function will also have this issue. Note that instead of making everything RCU safe you can also choose to abort the RCU pathwalk if your operation cannot be done safely under RCU, but that obviously comes with a performance penalty. One common pattern is to allow the simple cases under RCU, and abort only if you need to do something more complicated. So not everything needs to be RCU-safe, and things like the inode etc that the VFS itself maintains obviously already are. But these fixes tend to be about properly RCU-delaying things like ->s_fs_info that are maintained by the filesystem and that got potentially released too early. - Linus ] * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4_get_link(): fix breakage in RCU mode cifs_get_link(): bail out in unsafe case fuse: fix UAF in rcu pathwalks procfs: make freeing proc_fs_info rcu-delayed procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() nfs: fix UAF on pathwalk running into umount nfs: make nfs_set_verifier() safe for use in RCU pathwalk afs: fix __afs_break_callback() / afs_drop_open_mmap() race hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper affs: free affs_sb_info with kfree_rcu() rcu pathwalk: prevent bogus hard errors from may_lookup() fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs fixes from Al Viro: "A couple of fixes - revert of regression from this cycle and a fix for erofs failure exit breakage (had been there since way back)" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: erofs: fix handling kern_mount() failure Revert "get rid of DCACHE_GENOCIDE"
-
Al Viro authored
1) errors from ext4_getblk() should not be propagated to caller unless we are really sure that we would've gotten the same error in non-RCU pathwalk. 2) we leak buffer_heads if ext4_getblk() is successful, but bh is not uptodate. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
->d_revalidate() bails out there, anyway. It's not enough to prevent getting into ->get_link() in RCU mode, but that could happen only in a very contrieved setup. Not worth trying to do anything fancy here unless ->d_revalidate() stops kicking out of RCU mode at least in some cases. Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
->permission(), ->get_link() and ->inode_get_acl() might dereference ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns as well) when called from rcu pathwalk. Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info and dropping ->user_ns rcu-delayed too. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns() is still synchronous, but that's not a problem - it does rcu-delay everything that needs to be) Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
that keeps both around until struct inode is freed, making access to them safe from rcu-pathwalk Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
NFS ->d_revalidate(), ->permission() and ->get_link() need to access some parts of nfs_server when called in RCU mode: server->flags server->caps *(server->io_stats) and, worst of all, call server->nfs_client->rpc_ops->have_delegation (the last one - as NFS_PROTO(inode)->have_delegation()). We really don't want to RCU-delay the entire nfs_free_server() (it would have to be done with schedule_work() from RCU callback, since it can't be made to run from interrupt context), but actual freeing of nfs_server and ->io_stats can be done via call_rcu() just fine. nfs_client part is handled simply by making nfs_free_client() use kfree_rcu(). Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
nfs_set_verifier() relies upon dentry being pinned; if that's the case, grabbing ->d_lock stabilizes ->d_parent and guarantees that ->d_parent points to a positive dentry. For something we'd run into in RCU mode that is *not* true - dentry might've been through dentry_kill() just as we grabbed ->d_lock, with its parent going through the same just as we get to into nfs_set_verifier_locked(). It might get to detaching inode (and zeroing ->d_inode) before nfs_set_verifier_locked() gets to fetching that; we get an oops as the result. That can happen in nfs{,4} ->d_revalidate(); the call chain in question is nfs_set_verifier_locked() <- nfs_set_verifier() <- nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate(). We have checked that the parent had been positive, but that's done before we get to nfs_set_verifier() and it's possible for memory pressure to pick our dentry as eviction candidate by that time. If that happens, back-to-back attempts to kill dentry and its parent are quite normal. Sure, in case of eviction we'll fail the ->d_seq check in the caller, but we need to survive until we return there... Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero do queue_work(&vnode->cb_work). In afs_drop_open_mmap() we decrement ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero. The trouble is, there's nothing to prevent __afs_break_callback() from seeing ->cb_nr_mmap before the decrement and do queue_work() after both the decrement and flush_work(). If that happens, we might be in trouble - vnode might get freed before the queued work runs. __afs_break_callback() is always done under ->cb_lock, so let's make sure that ->cb_nr_mmap can change from non-zero to zero while holding ->cb_lock (the spinlock component of it - it's a seqlock and we don't need to mess with the counter). Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
->d_hash() and ->d_compare() use those, so we need to delay freeing them. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem that is in process of getting shut down. Besides, having nls and upcase table cleanup moved from ->put_super() towards the place where sbi is freed makes for simpler failure exits. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
one of the flags in it is used by ->d_hash()/->d_compare() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
If lazy call of ->permission() returns a hard error, check that try_to_unlazy() succeeds before returning it. That both makes life easier for ->permission() instances and closes the race in ENOTDIR handling - it is possible that positive d_can_lookup() seen in link_path_walk() applies to the state *after* unlink() + mkdir(), while nd->inode matches the state prior to that. Normally seeing e.g. EACCES from permission check in rcu pathwalk means that with some timings non-rcu pathwalk would've run into the same; however, running into a non-executable regular file in the middle of a pathname would not get to permission check - it would fail with ENOTDIR instead. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Avoids fun races in RCU pathwalk... Same goes for freeing LSM shite hanging off super_block's arse. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Kent Overstreet authored
check_snapshot() copies the bch_snapshot to a temporary to easily handle older versions that don't have all the fields of the current version, but it lacked a min() to correctly handle keys newer and larger than the current version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
If a journal write errored, the list of devices it was written to could be empty - we're not supposed to mark an empty replicas list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Brian Foster authored
bch2_direct_IO_read() checks the request offset and size for sector alignment and then falls through to a couple calculations to shrink the size of the request based on the inode size. The problem is that these checks round up to the fs block size, which runs the risk of underflowing iter->count if the block size happens to be large enough. This is triggered by fstest generic/361 with a 4k block size, which subsequently leads to a crash. To avoid this crash, check that the shorten length doesn't exceed the overall length of the iter. Fixes: Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the keyspace where no keys are visible in the current snapshot, we have a problem - we'll scan for a very long time before scanning terminates. Awhile back, this was fixed for most cases with peek_upto() (and assertions that enforce that it's being used). But the fix missed the fact that the inodes btree is different - every key offset is in a different snapshot tree, not just the inode field. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the easy one in read_single_folio() (where wa can return an error) was missed - oops. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-