- 06 Aug, 2021 1 commit
-
-
Greg Kroah-Hartman authored
Merge tag 'sysfs_defferred_iomem_get_mapping-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core driver-core-next sysfs: Allow deferred execution of iomem_get_mapping() Tag for toerh trees/branches to pull from in order to have a stable base to build off of for the "Allow deferred execution of iomem_get_mapping()" set of sysfs changes Link: https://lore.kernel.org/r/20210729233235.1508920-1-kw@linux.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> * tag 'sysfs_defferred_iomem_get_mapping-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: Rename struct bin_attribute member to f_mapping sysfs: Invoke iomem_get_mapping() from the sysfs open callback
-
- 05 Aug, 2021 7 commits
-
-
Krzysztof Wilczyński authored
There are two users of iomem_get_mapping(), the struct file and struct bin_attribute. The former has a member called "f_mapping" and the latter has a member called "mapping", and both are poniters to struct address_space. Rename struct bin_attribute member to "f_mapping" to keep both meaning and the usage consistent with other users of iomem_get_mapping(). Acked-by:
Bjorn Helgaas <bhelgaas@google.com> Signed-off-by:
Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20210729233235.1508920-3-kw@linux.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Krzysztof Wilczyński authored
Defer invocation of the iomem_get_mapping() to the sysfs open callback so that it can be executed as needed when the binary sysfs object has been accessed. To do that, convert the "mapping" member of the struct bin_attribute from a pointer to the struct address_space into a function pointer with a signature that requires the same return type, and then updates the sysfs_kf_bin_open() to invoke provided function should the function pointer be valid. Also, convert every invocation of iomem_get_mapping() into a function pointer assignment, therefore allowing for the iomem_get_mapping() invocation to be deferred to when the sysfs open callback runs. Thus, this change removes the need for the fs_initcalls to complete before any other sub-system that uses the iomem_get_mapping() would be able to invoke it safely without leading to a failure and an Oops related to an invalid iomem_get_mapping() access. Suggested-by:
Dan Williams <dan.j.williams@intel.com> Reviewed-by:
Dan Williams <dan.j.williams@intel.com> Acked-by:
Bjorn Helgaas <bhelgaas@google.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20210729233235.1508920-2-kw@linux.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
If a kernel module gets unloaded then it printed report about a leak before commit 275678e7 ("debugfs: Check module state before warning in {full/open}_proxy_open()"). An additional check was added in this commit to avoid this printing. But it was forgotten that the function must return an error in this case because it was not actually opened. As result, the systems started to crash or to hang when a module was unloaded while something was trying to open a file. Fixes: 275678e7 ("debugfs: Check module state before warning in {full/open}_proxy_open()") Cc: Taehee Yoo <ap420073@gmail.com> Reported-by:
Mário Lopes <ml@simonwunderlich.de> Signed-off-by:
Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20210802162444.7848-1-sven@narfation.orgSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Uwe Kleine-König authored
The only actual use is to check in zorro_device_probe() that the device isn't already bound. The driver core already ensures this however so the check can go away which allows to drop the then assigned-only member from struct zorro_dev. If the value was indeed needed somewhere it can always be calculated by to_zorro_driver(z->dev.driver) . Signed-off-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210730191035.1455248-5-u.kleine-koenig@pengutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Uwe Kleine-König authored
The driver core only calls a remove callback when the device was successfully bound (aka probed) before. So dev->driver is never NULL. (And even if it was NULL, to_zorro_driver(NULL) isn't ...) Signed-off-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210730191035.1455248-4-u.kleine-koenig@pengutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Uwe Kleine-König authored
The driver core only calls a remove callback when the device was successfully bound (aka probed) before. So dev->driver is never NULL. (And even if it was NULL, to_superhyway_driver(NULL) isn't ...) Signed-off-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210730191035.1455248-3-u.kleine-koenig@pengutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Uwe Kleine-König authored
The driver core only calls a remove callback when the device was successfully bound (aka probed) before. So dev->driver is never NULL and the respective check can just be dropped. Acked-by:
Finn Thain <fthain@linux-m68k.org> Signed-off-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210730191035.1455248-2-u.kleine-koenig@pengutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 Jul, 2021 1 commit
-
-
Uwe Kleine-König authored
The nubus core ignores the return value of the remove callback (in nubus_device_remove()) and all implementers return 0 anyway. So make it impossible for future drivers to return an unused error code by changing the remove prototype to return void. Signed-off-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by:
Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/20210727080840.3550927-3-u.kleine-koenig@pengutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 27 Jul, 2021 6 commits
-
-
Ian Kent authored
The call to d_splice_alias() in kernfs_iop_lookup() doesn't depend on any kernfs node so there's no reason to hold the kernfs node lock when calling it. Reviewed-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642772000.63632.10672683419693513226.stgit@web.messagingengine.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Kent authored
The inode operations .permission() and .getattr() use the kernfs node write lock but all that's needed is the read lock to protect against partial updates of these kernfs node fields which are all done under the write lock. And .permission() is called frequently during path walks and can cause quite a bit of contention between kernfs node operations and path walks when the number of concurrent walks is high. To change kernfs_iop_getattr() and kernfs_iop_permission() to take the rw sem read lock instead of the write lock an additional lock is needed to protect against multiple processes concurrently updating the inode attributes and link count in kernfs_refresh_inode(). The inode i_lock seems like the sensible thing to use to protect these inode attribute updates so use it in kernfs_refresh_inode(). The last hunk in the patch, applied to kernfs_fill_super(), is possibly not needed but taking the lock was present originally. I prefer to continue to take it to protect against a partial update of the source kernfs fields during the call to kernfs_refresh_inode() made by kernfs_get_inode(). Reviewed-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642771474.63632.16295959115893904470.stgit@web.messagingengine.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Kent authored
The kernfs global lock restricts the ability to perform kernfs node lookup operations in parallel during path walks. Change the kernfs mutex to an rwsem so that, when opportunity arises, node searches can be done in parallel with path walk lookups. Reviewed-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642770946.63632.2218304587223241374.stgit@web.messagingengine.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Kent authored
If there are many lookups for non-existent paths these negative lookups can lead to a lot of overhead during path walks. The VFS allows dentries to be created as negative and hashed, and caches them so they can be used to reduce the fairly high overhead alloc/free cycle that occurs during these lookups. Use the kernfs node parent revision to identify if a change has been made to the containing directory so that the negative dentry can be discarded and the lookup redone. Reviewed-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642770420.63632.15791924970508867106.stgit@web.messagingengine.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Kent authored
Add a revision counter to kernfs directory nodes so it can be used to detect if a directory node has changed during negative dentry revalidation. There's an assumption that sizeof(unsigned long) <= sizeof(pointer) on all architectures and as far as I know that assumption holds. So adding a revision counter to the struct kernfs_elem_dir variant of the kernfs_node type union won't increase the size of the kernfs_node struct. This is because struct kernfs_elem_dir is at least sizeof(pointer) smaller than the largest union variant. It's tempting to make the revision counter a u64 but that would increase the size of kernfs_node on archs where sizeof(pointer) is smaller than the revision counter. Reviewed-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642769895.63632.8356662784964509867.stgit@web.messagingengine.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
We need the driver-core fixes in here as well. Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 Jul, 2021 9 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
gcc doesn't care, but clang quite reasonably pointed out that the recent commit e9ba16e6 ("smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining") did some really odd things: kernel/smpboot.c:50:20: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier] static inline void __always_inline idle_init(unsigned int cpu) ^ which not only has that duplicate inlining specifier, but the new __always_inline was put in the wrong place of the function definition. We put the storage class specifiers (ie things like "static" and "extern") first, and the type information after that. And while the compiler may not care, we put the inline specifier before the types. So it should be just static __always_inline void idle_init(unsigned int cpu) instead. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Fix guest to host memory corruption in H_RTAS due to missing nargs check. - Fix guest triggerable host crashes due to bad handling of nested guest TM state. - Fix possible crashes due to incorrect reference counting in kvm_arch_vcpu_ioctl(). - Two commits fixing some regressions in KVM transactional memory handling introduced by the recent rework of the KVM code. Thanks to Nicholas Piggin, Alexey Kardashevskiy, and Michael Neuling. * tag 'powerpc-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash KVM: PPC: Book3S HV P9: Fix guest TM support
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fixes from Thomas Gleixner: "A small set of timer related fixes: - Plug a race between rearm and process tick in the posix CPU timers code - Make the optimization to avoid recalculation of the next timer interrupt work correctly when there are no timers pending" * tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Fix get_next_timer_interrupt() with no timers pending posix-cpu-timers: Fix rearm racing against process tick
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 jump label fix from Thomas Gleixner: "A single fix for jump labels to prevent the compiler from agressive un-inlining which results in a section mismatch" * tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fixes from Thomas Gleixner: "A set of EFI fixes: - Prevent memblock and I/O reserved resources to get out of sync when EFI memreserve is in use. - Don't claim a non-existing table is invalid - Don't warn when firmware memory is already reserved correctly" * tag 'efi-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/mokvar: Reserve the table only if it is in boot services data efi/libstub: Fix the efi_load_initrd function description firmware/efi: Tell memblock about EFI iomem reservations efi/tpm: Differentiate missing and invalid final event log table.
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull core fix from Thomas Gleixner: "A single update for the boot code to prevent aggressive un-inlining which causes a section mismatch" * tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: - handle vmalloc addresses in dma_common_{mmap,get_sgtable} (Roman Skakun) * tag 'dma-mapping-5.14-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Five cifs/smb3 fixes, including a DFS failover fix, two fallocate fixes, and two trivial coverity cleanups" * tag '5.14-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix fallocate when trying to allocate a hole. CIFS: Clarify SMB1 code for POSIX delete file CIFS: Clarify SMB1 code for POSIX Create cifs: support share failover when remounting cifs: only write 64kb at a time when fallocating a small region of a file
-
- 24 Jul, 2021 16 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: - properly set the memory size, which fixes 32-bit systems - allow initrd to load anywhere in memory, rather that restricting it to the first 256MiB - fix the 'mem=' parameter on 64-bit systems to properly account for the maximum supported memory now that the kernel is outside the linear map - avoid installing mappings into the last 4KiB of memory, which conflicts with error values - avoid the stack from being freed while it is being walked - a handful of fixes to the new copy to/from user routines * tag 'riscv-for-linus-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: __asm_copy_to-from_user: Fix: Typos in comments riscv: __asm_copy_to-from_user: Remove unnecessary size check riscv: __asm_copy_to-from_user: Fix: fail on RV32 riscv: __asm_copy_to-from_user: Fix: overrun copy riscv: stacktrace: pin the task's stack in get_wchan riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE riscv: Make sure the linear mapping does not use the kernel mapping riscv: Fix memory_limit for 64-bit kernel RISC-V: load initrd wherever it fits into memory riscv: Fix 32-bit RISC-V boot failure
-
Linus Torvalds authored
Commit 71f64283 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()") started doing "acpi_dev_put()" on a pointer that was possibly NULL. That fails miserably, because that helper inline function is not set up to handle that case. Just make acpi_dev_put() silently accept a NULL pointer, rather than calling down to put_device() with an invalid offset off that NULL pointer. Link: https://lore.kernel.org/lkml/a607c149-6bf6-0fd0-0e31-100378504da2@kernel.dk/Reported-and-tested-by:
Jens Axboe <axboe@kernel.dk> Tested-by:
Daniel Scally <djrscally@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Four fixes, all in drivers, all of which can lead to user visible problems in certain situations" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: Fix NULL dereference on XCOPY completion scsi: mpt3sas: Transition IOC to Ready state during shutdown scsi: target: Fix protect handling in WRITE SAME(32) scsi: iscsi: Fix iface sysfs attr detection
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: - Fix a memory leak due to a race condition in io_init_wq_offload (Yang) - Poll error handling fixes (Pavel) - Fix early fdput() regression (me) - Don't reissue iopoll requests off release path (me) - Add a safety check for io-wq queue off wrong path (me) * tag 'io_uring-5.14-2021-07-24' of git://git.kernel.dk/linux-block: io_uring: explicitly catch any illegal async queue attempt io_uring: never attempt iopoll reissue from release path io_uring: fix early fdput() of file io_uring: fix memleak in io_init_wq_offload() io_uring: remove double poll entry on arm failure io_uring: explicitly count entries for poll reqs
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: - NVMe pull request (Christoph): - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng) - Increase max blk-cgroup policy size, now that mq-deadline uses it too (Oleksandr) * tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-block: nvme: set the PRACT bit when using Write Zeroes with T10 PI nvme: fix nvme_setup_command metadata trace event nvme: fix refcounting imbalance when all paths are down nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING block: increase BLKCG_MAX_POLS
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "Two bugfixes for the I2C subsystem" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mpc: Poll for MCF misc: eeprom: at24: Always append device id even if label property is set.
-
Linus Torvalds authored
Merge misc mm fixes from Andrew Morton: "15 patches. VM subsystems affected by this patch series: userfaultfd, kfence, highmem, pagealloc, memblock, pagecache, secretmem, pagemap, and hugetlbfs" * akpm: hugetlbfs: fix mount mode command line processing mm: fix the deadlock in finish_fault() mm: mmap_lock: fix disabling preemption directly mm/secretmem: wire up ->set_page_dirty writeback, cgroup: do not reparent dax inodes writeback, cgroup: remove wb from offline list before releasing refcnt memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction mm: use kmap_local_page in memzero_page mm: call flush_dcache_page() in memcpy_to_page() and memzero_page() kfence: skip all GFP_ZONEMASK allocations kfence: move the size check to the beginning of __kfence_alloc() kfence: defer kfence_test_init to ensure that kunit debugfs is created selftest: use mmap instead of posix_memalign to allocate memory userfaultfd: do not untag user pointers
-
Akira Tsukamoto authored
Fixing typos and grammar mistakes and using more intuitive label name. Signed-off-by:
Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Akira Tsukamoto authored
Clean up: The size of 0 will be evaluated in the next step. Not required here. Signed-off-by:
Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Akira Tsukamoto authored
Had a bug when converting bytes to bits when the cpu was rv32. The a3 contains the number of bytes and multiple of 8 would be the bits. The LGREG is holding 2 for RV32 and 3 for RV32, so to achieve multiple of 8 it must always be constant 3. The 2 was mistakenly used for rv32. Signed-off-by:
Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Akira Tsukamoto authored
There were two causes for the overrun memory access. The threshold size was too small. The aligning dst require one SZREG and unrolling word copy requires 8*SZREG, total have to be at least 9*SZREG. Inside the unrolling copy, the subtracting -(8*SZREG-1) would make iteration happening one extra loop. Proper value is -(8*SZREG). Signed-off-by:
Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Mike Kravetz authored
In commit 32021982 ("hugetlbfs: Convert to fs_context") processing of the mount mode string was changed from match_octal() to fsparam_u32. This changed existing behavior as match_octal does not require octal values to have a '0' prefix, but fsparam_u32 does. Use fsparam_u32oct which provides the same behavior as match_octal. Link: https://lkml.kernel.org/r/20210721183326.102716-1-mike.kravetz@oracle.com Fixes: 32021982 ("hugetlbfs: Convert to fs_context") Signed-off-by:
Mike Kravetz <mike.kravetz@oracle.com> Reported-by:
Dennis Camera <bugs+kernel.org@dtnr.ch> Reviewed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Qi Zheng authored
Commit 63f3655f ("mm, memcg: fix reclaim deadlock with writeback") fix the following ABBA deadlock by pre-allocating the pte page table without holding the page lock. lock_page(A) SetPageWriteback(A) unlock_page(A) lock_page(B) lock_page(B) pte_alloc_one shrink_page_list wait_on_page_writeback(A) SetPageWriteback(B) unlock_page(B) # flush A, B to clear the writeback Commit f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") reworked the relevant code but ignored this race. This will cause the deadlock above to appear again, so fix it. Link: https://lkml.kernel.org/r/20210721074849.57004-1-zhengqi.arch@bytedance.com Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by:
Qi Zheng <zhengqi.arch@bytedance.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Muchun Song authored
Commit 832b5072 ("mm: mmap_lock: use local locks instead of disabling preemption") fixed a bug by using local locks. But commit d01079f3 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations") changed those lines back to the original version. I guess it was introduced by fixing conflicts. Link: https://lkml.kernel.org/r/20210720074228.76342-1-songmuchun@bytedance.com Fixes: d01079f3 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations") Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Acked-by:
Mel Gorman <mgorman@techsingularity.net> Reviewed-by:
Yang Shi <shy828301@gmail.com> Reviewed-by:
Pankaj Gupta <pankaj.gupta@ionos.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
Make secretmem up to date with the changes done in commit 0af57378 ("mm: require ->set_page_dirty to be explicitly wired up") so that unconditional call to this method won't cause crashes. Link: https://lkml.kernel.org/r/20210716063933.31633-1-rppt@kernel.org Fixes: 0af57378 ("mm: require ->set_page_dirty to be explicitly wired up") Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
The inode switching code is not suited for dax inodes. An attempt to switch a dax inode to a parent writeback structure (as a part of a writeback cleanup procedure) results in a panic like this: run fstests generic/270 at 2021-07-15 05:54:02 XFS (pmem0p2): EXPERIMENTAL big timestamp feature in use. Use at your own risk! XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk XFS (pmem0p2): EXPERIMENTAL inode btree counters feature in use. Use at your own risk! XFS (pmem0p2): Mounting V5 Filesystem XFS (pmem0p2): Ending clean mount XFS (pmem0p2): Quotacheck needed: Please wait. XFS (pmem0p2): Quotacheck: Done. XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) BUG: unable to handle page fault for address: 0000000005b0f669 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 13 PID: 10479 Comm: kworker/13:16 Not tainted 5.14.0-rc1-master-8096acd7+ #8 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/13/2016 Workqueue: inode_switch_wbs inode_switch_wbs_work_fn RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0 Call Trace: inode_switch_wbs_work_fn+0xb6/0x2a0 process_one_work+0x1e6/0x380 worker_thread+0x53/0x3d0 kthread+0x10f/0x130 ret_from_fork+0x22/0x30 Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit iTCO_wdt irqbypass drm_kms_helper iTCO_vendor_support acpi_ipmi rapl syscopyarea sysfillrect intel_cstate ipmi_si sysimgblt ioatdma dax_pmem_compat fb_sys_fops ipmi_devintf device_dax i2c_i801 pcspkr intel_uncore hpilo nd_pmem cec dax_pmem_core dca i2c_smbus acpi_tad lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel tg3 ghash_clmulni_intel serio_raw hpsa hpwdt scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000005b0f669 ---[ end trace ed2105faff8384f3 ]--- RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x15200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]--- The crash happens on an attempt to iterate over attached pagecache pages and check the dirty flag: a dax inode's xarray contains pfn's instead of generic struct page pointers. This happens for DAX and not for other kinds of non-page entries in the inodes because it's a tagged iteration, and shadow/swap entries are never tagged; only DAX entries get tagged. Fix the problem by bailing out (with the false return value) of inode_prepare_sbs_switch() if a dax inode is passed. [willy@infradead.org: changelog addition] Link: https://lkml.kernel.org/r/20210719171350.3876830-1-guro@fb.com Fixes: c22d70a1 ("writeback, cgroup: release dying cgwbs by switching attached inodes") Signed-off-by:
Roman Gushchin <guro@fb.com> Reported-by:
Murphy Zhou <jencce.kernel@gmail.com> Reported-by:
Darrick J. Wong <djwong@kernel.org> Tested-by:
Darrick J. Wong <djwong@kernel.org> Tested-by:
Murphy Zhou <jencce.kernel@gmail.com> Acked-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-