- 26 Jul, 2021 11 commits
-
-
Isaac J. Manjarres authored
Implement the map_pages() callback for the ARM LPAE io-pgtable format. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-12-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
Implement the unmap_pages() callback for the ARM LPAE io-pgtable format. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-11-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
The PTE methods currently operate on a single entry. In preparation for manipulating multiple PTEs in one map or unmap call, allow them to handle multiple PTEs. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-10-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
Since iommu_pgsize can calculate how many pages of the same size can be mapped/unmapped before the next largest page size boundary, add support for invoking an IOMMU driver's map_pages() callback, if it provides one. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1623850736-389584-9-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Will Deacon authored
Extend iommu_pgsize() to populate an optional 'count' parameter so that we can direct unmapping operation to the ->unmap_pages callback if it has been provided by the driver. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1623850736-389584-8-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Will Deacon authored
The 'addr_merge' parameter to iommu_pgsize() is a fabricated address intended to describe the alignment requirements to consider when choosing an appropriate page size. On the iommu_map() path, this address is the logical OR of the virtual and physical addresses. Subsequent improvements to iommu_pgsize() will need to check the alignment of the virtual and physical components of 'addr_merge' independently, so pass them in as separate parameters and reconstruct 'addr_merge' locally. No functional change. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1623850736-389584-7-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Will Deacon authored
Avoid the potential for shifting values by amounts greater than the width of their type by using a bitmap to compute page size in iommu_pgsize(). Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1623850736-389584-6-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to map a physically contiguous rnage of pages of the same size. For IOMMU drivers that do not specify a map_pages() callback, the existing logic of mapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-5-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
Mapping memory into io-pgtables follows the same semantics that unmapping memory used to follow (i.e. a buffer will be mapped one page block per call to the io-pgtable code). This means that it can be optimized in the same way that unmapping memory was, so add a map_pages() callback to the io-pgtable ops structure, so that a range of pages of the same size can be mapped within the same call. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-4-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to unmap a virtually contiguous range of pages of the same size. For IOMMU drivers that do not specify an unmap_pages() callback, the existing logic of unmapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-3-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Isaac J. Manjarres authored
The io-pgtable code expects to operate on a single block or granule of memory that is supported by the IOMMU hardware when unmapping memory. This means that when a large buffer that consists of multiple such blocks is unmapped, the io-pgtable code will walk the page tables to the correct level to unmap each block, even for blocks that are virtually contiguous and at the same level, which can incur an overhead in performance. Introduce the unmap_pages() page table op to express to the io-pgtable code that it should unmap a number of blocks of the same size, instead of a single block. Doing so allows multiple blocks to be unmapped in one call to the io-pgtable code, reducing the number of page table walks, and indirect calls. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-2-git-send-email-quic_c_gdjako@quicinc.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
- 25 Jul, 2021 9 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
gcc doesn't care, but clang quite reasonably pointed out that the recent commit e9ba16e6 ("smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining") did some really odd things: kernel/smpboot.c:50:20: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier] static inline void __always_inline idle_init(unsigned int cpu) ^ which not only has that duplicate inlining specifier, but the new __always_inline was put in the wrong place of the function definition. We put the storage class specifiers (ie things like "static" and "extern") first, and the type information after that. And while the compiler may not care, we put the inline specifier before the types. So it should be just static __always_inline void idle_init(unsigned int cpu) instead. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Fix guest to host memory corruption in H_RTAS due to missing nargs check. - Fix guest triggerable host crashes due to bad handling of nested guest TM state. - Fix possible crashes due to incorrect reference counting in kvm_arch_vcpu_ioctl(). - Two commits fixing some regressions in KVM transactional memory handling introduced by the recent rework of the KVM code. Thanks to Nicholas Piggin, Alexey Kardashevskiy, and Michael Neuling. * tag 'powerpc-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash KVM: PPC: Book3S HV P9: Fix guest TM support
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fixes from Thomas Gleixner: "A small set of timer related fixes: - Plug a race between rearm and process tick in the posix CPU timers code - Make the optimization to avoid recalculation of the next timer interrupt work correctly when there are no timers pending" * tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Fix get_next_timer_interrupt() with no timers pending posix-cpu-timers: Fix rearm racing against process tick
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 jump label fix from Thomas Gleixner: "A single fix for jump labels to prevent the compiler from agressive un-inlining which results in a section mismatch" * tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fixes from Thomas Gleixner: "A set of EFI fixes: - Prevent memblock and I/O reserved resources to get out of sync when EFI memreserve is in use. - Don't claim a non-existing table is invalid - Don't warn when firmware memory is already reserved correctly" * tag 'efi-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/mokvar: Reserve the table only if it is in boot services data efi/libstub: Fix the efi_load_initrd function description firmware/efi: Tell memblock about EFI iomem reservations efi/tpm: Differentiate missing and invalid final event log table.
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull core fix from Thomas Gleixner: "A single update for the boot code to prevent aggressive un-inlining which causes a section mismatch" * tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: - handle vmalloc addresses in dma_common_{mmap,get_sgtable} (Roman Skakun) * tag 'dma-mapping-5.14-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Five cifs/smb3 fixes, including a DFS failover fix, two fallocate fixes, and two trivial coverity cleanups" * tag '5.14-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix fallocate when trying to allocate a hole. CIFS: Clarify SMB1 code for POSIX delete file CIFS: Clarify SMB1 code for POSIX Create cifs: support share failover when remounting cifs: only write 64kb at a time when fallocating a small region of a file
-
- 24 Jul, 2021 20 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: - properly set the memory size, which fixes 32-bit systems - allow initrd to load anywhere in memory, rather that restricting it to the first 256MiB - fix the 'mem=' parameter on 64-bit systems to properly account for the maximum supported memory now that the kernel is outside the linear map - avoid installing mappings into the last 4KiB of memory, which conflicts with error values - avoid the stack from being freed while it is being walked - a handful of fixes to the new copy to/from user routines * tag 'riscv-for-linus-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: __asm_copy_to-from_user: Fix: Typos in comments riscv: __asm_copy_to-from_user: Remove unnecessary size check riscv: __asm_copy_to-from_user: Fix: fail on RV32 riscv: __asm_copy_to-from_user: Fix: overrun copy riscv: stacktrace: pin the task's stack in get_wchan riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE riscv: Make sure the linear mapping does not use the kernel mapping riscv: Fix memory_limit for 64-bit kernel RISC-V: load initrd wherever it fits into memory riscv: Fix 32-bit RISC-V boot failure
-
Linus Torvalds authored
Commit 71f64283 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()") started doing "acpi_dev_put()" on a pointer that was possibly NULL. That fails miserably, because that helper inline function is not set up to handle that case. Just make acpi_dev_put() silently accept a NULL pointer, rather than calling down to put_device() with an invalid offset off that NULL pointer. Link: https://lore.kernel.org/lkml/a607c149-6bf6-0fd0-0e31-100378504da2@kernel.dk/Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Daniel Scally <djrscally@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Four fixes, all in drivers, all of which can lead to user visible problems in certain situations" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: Fix NULL dereference on XCOPY completion scsi: mpt3sas: Transition IOC to Ready state during shutdown scsi: target: Fix protect handling in WRITE SAME(32) scsi: iscsi: Fix iface sysfs attr detection
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: - Fix a memory leak due to a race condition in io_init_wq_offload (Yang) - Poll error handling fixes (Pavel) - Fix early fdput() regression (me) - Don't reissue iopoll requests off release path (me) - Add a safety check for io-wq queue off wrong path (me) * tag 'io_uring-5.14-2021-07-24' of git://git.kernel.dk/linux-block: io_uring: explicitly catch any illegal async queue attempt io_uring: never attempt iopoll reissue from release path io_uring: fix early fdput() of file io_uring: fix memleak in io_init_wq_offload() io_uring: remove double poll entry on arm failure io_uring: explicitly count entries for poll reqs
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: - NVMe pull request (Christoph): - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng) - Increase max blk-cgroup policy size, now that mq-deadline uses it too (Oleksandr) * tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-block: nvme: set the PRACT bit when using Write Zeroes with T10 PI nvme: fix nvme_setup_command metadata trace event nvme: fix refcounting imbalance when all paths are down nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING block: increase BLKCG_MAX_POLS
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "Two bugfixes for the I2C subsystem" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mpc: Poll for MCF misc: eeprom: at24: Always append device id even if label property is set.
-
Linus Torvalds authored
Merge misc mm fixes from Andrew Morton: "15 patches. VM subsystems affected by this patch series: userfaultfd, kfence, highmem, pagealloc, memblock, pagecache, secretmem, pagemap, and hugetlbfs" * akpm: hugetlbfs: fix mount mode command line processing mm: fix the deadlock in finish_fault() mm: mmap_lock: fix disabling preemption directly mm/secretmem: wire up ->set_page_dirty writeback, cgroup: do not reparent dax inodes writeback, cgroup: remove wb from offline list before releasing refcnt memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction mm: use kmap_local_page in memzero_page mm: call flush_dcache_page() in memcpy_to_page() and memzero_page() kfence: skip all GFP_ZONEMASK allocations kfence: move the size check to the beginning of __kfence_alloc() kfence: defer kfence_test_init to ensure that kunit debugfs is created selftest: use mmap instead of posix_memalign to allocate memory userfaultfd: do not untag user pointers
-
Akira Tsukamoto authored
Fixing typos and grammar mistakes and using more intuitive label name. Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Akira Tsukamoto authored
Clean up: The size of 0 will be evaluated in the next step. Not required here. Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Akira Tsukamoto authored
Had a bug when converting bytes to bits when the cpu was rv32. The a3 contains the number of bytes and multiple of 8 would be the bits. The LGREG is holding 2 for RV32 and 3 for RV32, so to achieve multiple of 8 it must always be constant 3. The 2 was mistakenly used for rv32. Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Akira Tsukamoto authored
There were two causes for the overrun memory access. The threshold size was too small. The aligning dst require one SZREG and unrolling word copy requires 8*SZREG, total have to be at least 9*SZREG. Inside the unrolling copy, the subtracting -(8*SZREG-1) would make iteration happening one extra loop. Proper value is -(8*SZREG). Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa2 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Mike Kravetz authored
In commit 32021982 ("hugetlbfs: Convert to fs_context") processing of the mount mode string was changed from match_octal() to fsparam_u32. This changed existing behavior as match_octal does not require octal values to have a '0' prefix, but fsparam_u32 does. Use fsparam_u32oct which provides the same behavior as match_octal. Link: https://lkml.kernel.org/r/20210721183326.102716-1-mike.kravetz@oracle.com Fixes: 32021982 ("hugetlbfs: Convert to fs_context") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dennis Camera <bugs+kernel.org@dtnr.ch> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Qi Zheng authored
Commit 63f3655f ("mm, memcg: fix reclaim deadlock with writeback") fix the following ABBA deadlock by pre-allocating the pte page table without holding the page lock. lock_page(A) SetPageWriteback(A) unlock_page(A) lock_page(B) lock_page(B) pte_alloc_one shrink_page_list wait_on_page_writeback(A) SetPageWriteback(B) unlock_page(B) # flush A, B to clear the writeback Commit f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") reworked the relevant code but ignored this race. This will cause the deadlock above to appear again, so fix it. Link: https://lkml.kernel.org/r/20210721074849.57004-1-zhengqi.arch@bytedance.com Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Muchun Song authored
Commit 832b5072 ("mm: mmap_lock: use local locks instead of disabling preemption") fixed a bug by using local locks. But commit d01079f3 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations") changed those lines back to the original version. I guess it was introduced by fixing conflicts. Link: https://lkml.kernel.org/r/20210720074228.76342-1-songmuchun@bytedance.com Fixes: d01079f3 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
Make secretmem up to date with the changes done in commit 0af57378 ("mm: require ->set_page_dirty to be explicitly wired up") so that unconditional call to this method won't cause crashes. Link: https://lkml.kernel.org/r/20210716063933.31633-1-rppt@kernel.org Fixes: 0af57378 ("mm: require ->set_page_dirty to be explicitly wired up") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
The inode switching code is not suited for dax inodes. An attempt to switch a dax inode to a parent writeback structure (as a part of a writeback cleanup procedure) results in a panic like this: run fstests generic/270 at 2021-07-15 05:54:02 XFS (pmem0p2): EXPERIMENTAL big timestamp feature in use. Use at your own risk! XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk XFS (pmem0p2): EXPERIMENTAL inode btree counters feature in use. Use at your own risk! XFS (pmem0p2): Mounting V5 Filesystem XFS (pmem0p2): Ending clean mount XFS (pmem0p2): Quotacheck needed: Please wait. XFS (pmem0p2): Quotacheck: Done. XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) BUG: unable to handle page fault for address: 0000000005b0f669 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 13 PID: 10479 Comm: kworker/13:16 Not tainted 5.14.0-rc1-master-8096acd7+ #8 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/13/2016 Workqueue: inode_switch_wbs inode_switch_wbs_work_fn RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0 Call Trace: inode_switch_wbs_work_fn+0xb6/0x2a0 process_one_work+0x1e6/0x380 worker_thread+0x53/0x3d0 kthread+0x10f/0x130 ret_from_fork+0x22/0x30 Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit iTCO_wdt irqbypass drm_kms_helper iTCO_vendor_support acpi_ipmi rapl syscopyarea sysfillrect intel_cstate ipmi_si sysimgblt ioatdma dax_pmem_compat fb_sys_fops ipmi_devintf device_dax i2c_i801 pcspkr intel_uncore hpilo nd_pmem cec dax_pmem_core dca i2c_smbus acpi_tad lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel tg3 ghash_clmulni_intel serio_raw hpsa hpwdt scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000005b0f669 ---[ end trace ed2105faff8384f3 ]--- RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x15200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]--- The crash happens on an attempt to iterate over attached pagecache pages and check the dirty flag: a dax inode's xarray contains pfn's instead of generic struct page pointers. This happens for DAX and not for other kinds of non-page entries in the inodes because it's a tagged iteration, and shadow/swap entries are never tagged; only DAX entries get tagged. Fix the problem by bailing out (with the false return value) of inode_prepare_sbs_switch() if a dax inode is passed. [willy@infradead.org: changelog addition] Link: https://lkml.kernel.org/r/20210719171350.3876830-1-guro@fb.com Fixes: c22d70a1 ("writeback, cgroup: release dying cgwbs by switching attached inodes") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Murphy Zhou <jencce.kernel@gmail.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
Boyang reported that the commit c22d70a1 ("writeback, cgroup: release dying cgwbs by switching attached inodes") causes the kernel to crash while running xfstests generic/256 on ext4 on aarch64 and ppc64le. run fstests generic/256 at 2021-07-12 05:41:40 EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: . Quota mode: none. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp=00000000b0502000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP Modules linked in: dm_flakey dm_snapshot dm_bufio dm_zero dm_mod loop tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc ext4 vfat fat mbcache jbd2 drm fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_net net_failover virtio_console failover virtio_mmio aes_neon_bs [last unloaded: scsi_debug] CPU: 0 PID: 408468 Comm: kworker/u8:5 Tainted: G X --------- --- 5.14.0-0.rc1.15.bx.el9.aarch64 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Workqueue: events_unbound cleanup_offline_cgwbs_workfn pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO BTYPE=--) pc : cleanup_offline_cgwbs_workfn+0x320/0x394 lr : cleanup_offline_cgwbs_workfn+0xe0/0x394 sp : ffff80001554fd10 x29: ffff80001554fd10 x28: 0000000000000000 x27: 0000000000000001 x26: 0000000000000000 x25: 00000000000000e0 x24: ffffd2a2fbe671a8 x23: ffff80001554fd88 x22: ffffd2a2fbe67198 x21: ffffd2a2fc25a730 x20: ffff210412bc3000 x19: ffff210412bc3280 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000040 x11: ffff210481572238 x10: ffff21048157223a x9 : ffffd2a2fa276c60 x8 : ffff210484106b60 x7 : 0000000000000000 x6 : 000000000007d18a x5 : ffff210416a86400 x4 : ffff210412bc0280 x3 : 0000000000000000 x2 : ffff80001554fd88 x1 : ffff210412bc0280 x0 : 0000000000000003 Call trace: cleanup_offline_cgwbs_workfn+0x320/0x394 process_one_work+0x1f4/0x4b0 worker_thread+0x184/0x540 kthread+0x114/0x120 ret_from_fork+0x10/0x18 Code: d63f0020 97f99963 17ffffa6 f8588263 (f9400061) ---[ end trace e250fe289272792a ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs SMP: failed to stop secondary CPUs 0-2 Kernel Offset: 0x52a2e9fa0000 from 0xffff800010000000 PHYS_OFFSET: 0xfff0defca0000000 CPU features: 0x00200251,23200840 Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- The problem happens when cgwb_release_workfn() races with cleanup_offline_cgwbs_workfn(): wb_tryget() in cleanup_offline_cgwbs_workfn() can be called after percpu_ref_exit() is cgwb_release_workfn(), which is basically a use-after-free error. Fix the problem by making removing the writeback structure from the offline list before releasing the percpu reference counter. It will guarantee that cleanup_offline_cgwbs_workfn() will not see and not access writeback structures which are about to be released. Link: https://lkml.kernel.org/r/20210716201039.3762203-1-guro@fb.com Fixes: c22d70a1 ("writeback, cgroup: release dying cgwbs by switching attached inodes") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Boyang Xue <bxue@redhat.com> Suggested-by: Jan Kara <jack@suse.cz> Tested-by: Darrick J. Wong <djwong@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
Commit b10d6bca ("arch, drivers: replace for_each_membock() with for_each_mem_range()") didn't take into account that when there is movable_node parameter in the kernel command line, for_each_mem_range() would skip ranges marked with MEMBLOCK_HOTPLUG. The page table setup code in POWER uses for_each_mem_range() to create the linear mapping of the physical memory and since the regions marked as MEMORY_HOTPLUG are skipped, they never make it to the linear map. A later access to the memory in those ranges will fail: BUG: Unable to handle kernel data access on write at 0xc000000400000000 Faulting instruction address: 0xc00000000008a3c0 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7 NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040 REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000 CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0 GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000 GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200 GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300 GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000 GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0 GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680 NIP clear_user_page+0x50/0x80 LR __handle_mm_fault+0xc88/0x1910 Call Trace: __handle_mm_fault+0xc44/0x1910 (unreliable) handle_mm_fault+0x130/0x2a0 __get_user_pages+0x248/0x610 __get_user_pages_remote+0x12c/0x3e0 get_arg_page+0x54/0xf0 copy_string_kernel+0x11c/0x210 kernel_execve+0x16c/0x220 call_usermodehelper_exec_async+0x1b0/0x2f0 ret_from_kernel_thread+0x5c/0x70 Instruction dump: 79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050 7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec ---[ end trace 490b8c67e6075e09 ]--- Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the traversal fixes this issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100 Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org Fixes: b10d6bca ("arch, drivers: replace for_each_membock() with for_each_mem_range()") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sergei Trofimovich authored
To reproduce the failure we need the following system: - kernel command: page_poison=1 init_on_free=0 init_on_alloc=0 - kernel config: * CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y * CONFIG_INIT_ON_FREE_DEFAULT_ON=y * CONFIG_PAGE_POISONING=y Resulting in: 0000000085629bdd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0000000022861832: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000c597f5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G U O 5.13.1-gentoo-x86_64 #1 Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021 Call Trace: dump_stack+0x64/0x7c __kernel_unpoison_pages.cold+0x48/0x84 post_alloc_hook+0x60/0xa0 get_page_from_freelist+0xdb8/0x1000 __alloc_pages+0x163/0x2b0 __get_free_pages+0xc/0x30 pgd_alloc+0x2e/0x1a0 mm_init+0x185/0x270 dup_mm+0x6b/0x4f0 copy_process+0x190d/0x1b10 kernel_clone+0xba/0x3b0 __do_sys_clone+0x8f/0xb0 do_syscall_64+0x68/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Before commit 51cba1eb ("init_on_alloc: Optimize static branches") init_on_alloc never enabled static branch by default. It could only be enabed explicitly by init_mem_debugging_and_hardening(). But after commit 51cba1eb, a static branch could already be enabled by default. There was no code to ever disable it. That caused page_poison=1 / init_on_free=1 conflict. This change extends init_mem_debugging_and_hardening() to also disable static branch disabling. Link: https://lkml.kernel.org/r/20210714031935.4094114-1-keescook@chromium.org Link: https://lore.kernel.org/r/20210712215816.1512739-1-slyfox@gentoo.org Fixes: 51cba1eb ("init_on_alloc: Optimize static branches") Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Kees Cook <keescook@chromium.org> Co-developed-by: Kees Cook <keescook@chromium.org> Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Reported-by: <bowsingbetee@pm.me> Tested-by: <bowsingbetee@protonmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
The commit message introducing the global memzero_page explicitly mentions switching to kmap_local_page in the commit log but doesn't actually do that. Link: https://lkml.kernel.org/r/20210713055231.137602-3-hch@lst.de Fixes: 28961998 ("iov_iter: lift memzero_page() to highmem.h") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-