- 15 Jun, 2024 16 commits
-
-
Kefeng Wang authored
The large folio is mapped with folio size(not greater PMD_SIZE) aligned virtual address during the pagefault, ie, 'addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE)' in do_anonymous_page(). But after the mremap(), the virtual address only requires PAGE_SIZE alignment. Also pte is moved to new in move_page_tables(), then traversal of the new pte in the numa_rebuild_large_mapping() could hit the following issue, Unable to handle kernel paging request at virtual address 00000a80c021a788 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00002040341a6000 [00000a80c021a788] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP ... CPU: 76 PID: 15187 Comm: git Kdump: loaded Tainted: G W 6.10.0-rc2+ #209 Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.79 08/21/2021 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : numa_rebuild_large_mapping+0x338/0x638 lr : numa_rebuild_large_mapping+0x320/0x638 sp : ffff8000b41c3b00 x29: ffff8000b41c3b30 x28: ffff8000812a0000 x27: 00000000000a8000 x26: 00000000000000a8 x25: 0010000000000001 x24: ffff20401c7170f0 x23: 0000ffff33a1e000 x22: 0000ffff33a76000 x21: ffff20400869eca0 x20: 0000ffff33976000 x19: 00000000000000a8 x18: ffffffffffffffff x17: 0000000000000000 x16: 0000000000000020 x15: ffff8000b41c36a8 x14: 0000000000000000 x13: 205d373831353154 x12: 5b5d333331363732 x11: 000000000011ff78 x10: 000000000011ff10 x9 : ffff800080273f30 x8 : 000000320400869e x7 : c0000000ffffd87f x6 : 00000000001e6ba8 x5 : ffff206f3fb5af88 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : fffffdffc0000000 x0 : 00000a80c021a780 Call trace: numa_rebuild_large_mapping+0x338/0x638 do_numa_page+0x3e4/0x4e0 handle_pte_fault+0x1bc/0x238 __handle_mm_fault+0x20c/0x400 handle_mm_fault+0xa8/0x288 do_page_fault+0x124/0x498 do_translation_fault+0x54/0x80 do_mem_abort+0x4c/0xa8 el0_da+0x40/0x110 el0t_64_sync_handler+0xe4/0x158 el0t_64_sync+0x188/0x190 Fix it by making the start and end not only within the vma range, but also within the page table range. Link: https://lkml.kernel.org/r/20240612122822.4033433-1-wangkefeng.wang@huawei.com Fixes: d2136d74 ("mm: support multi-size THP numa balancing") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); and if DEBUG_VM were off, then pages would be lost on a local list. Our convention is that if migrate_pages() reports complete success (0), then the migratepages list will be empty; but if it reports an error or some pages remaining, then its caller must putback_movable_pages(). There's a new case in which migrate_pages() has been reporting complete success, but returning with pages left on the migratepages list: when migrate_pages_batch() successfully split a folio on the deferred list, but then the "Failure isn't counted" call does not dispose of them all. Since that block is expecting the large folio to have been counted as 1 failure already, and since the return code is later adjusted to success whenever the returned list is found empty, the simple way to fix this safely is to count splitting the deferred folio as "a failure". Link: https://lkml.kernel.org/r/46c948b4-4dd8-6e03-4c7b-ce4e81cfa536@google.com Fixes: 7262f208 ("mm/migrate: split source folio if it is on deferred split list") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Mark Brown authored
KTAP parsers interpret the output of ksft_test_result_*() as being the name of the test. The map_fixed_noreplace test uses a dynamically allocated base address for the mmap()s that it tests and currently includes this in the test names that it logs so the test names that are logged are not stable between runs. It also uses multiples of PAGE_SIZE which mean that runs for kernels with different PAGE_SIZE configurations can't be directly compared. Both these factors cause issues for CI systems when interpreting and displaying results. Fix this by replacing the current test names with fixed strings describing the intent of the mappings that are logged, the existing messages with the actual addresses and sizes are retained as diagnostic prints to aid in debugging. Link: https://lkml.kernel.org/r/20240605-kselftest-mm-fixed-noreplace-v1-1-a235db8b9be9@kernel.org Fixes: 4838cf70 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jeff Xu authored
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it didn't have proper documentation. This led to a lot of confusion, especially about whether or not memfd created with the MFD_NOEXEC_SEAL flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set MFD_ALLOW_SEALING to be sealable, so it's a fair question. As one might have noticed, unlike other flags in memfd_create, MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea is to make it easier to use memfd in the most common way, which is NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl vm.noexec to help existing applications move to a more secure way of using memfd. Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless MFD_ALLOW_SEALING is set, to be consistent with other flags [1], Those are based on the viewpoint that each flag is an atomic unit, which is a reasonable assumption. However, MFD_NOEXEC_SEAL was designed with the intent of promoting the most secure method of using memfd, therefore a combination of multiple functionalities into one bit. Furthermore, the MFD_NOEXEC_SEAL has been added for more than one year, and multiple applications and distributions have backported and utilized it. Altering ABI now presents a degree of risk and may lead to disruption. MFD_NOEXEC_SEAL is a new flag, and applications must change their code to use it. There is no backward compatibility problem. When sysctl vm.noexec == 1 or 2, applications that don't set MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd. And old-application might break, that is by-design, in such a system vm.noexec = 0 shall be used. Also no backward compatibility problem. I propose to include this documentation patch to assist in clarifying the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential future confusion. Finally, I would like to express my gratitude to David Rheinsberg and Barnabás Pőcze for initiating the discussion on the topic of sealability. [1] https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ [jeffxu@chromium.org: updates per Randy] Link: https://lkml.kernel.org/r/20240611034903.3456796-2-jeffxu@chromium.org [jeffxu@chromium.org: v3] Link: https://lkml.kernel.org/r/20240611231409.3899809-2-jeffxu@chromium.org Link: https://lkml.kernel.org/r/20240607203543.2151433-2-jeffxu@google.comSigned-off-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Barnabás Pőcze <pobrn@protonmail.com> Cc: Daniel Verkamp <dverkamp@chromium.org> Cc: David Rheinsberg <david@readahead.eu> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Rafael Aquini authored
An ASLR regression was noticed [1] and tracked down to file-mapped areas being backed by THP in recent kernels. The 21-bit alignment constraint for such mappings reduces the entropy for randomizing the placement of 64-bit library mappings and breaks ASLR completely for 32-bit libraries. The reported issue is easily addressed by increasing vm.mmap_rnd_bits and vm.mmap_rnd_compat_bits. This patch just provides a simple way to set ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values allowed by the architecture at build time. [1] https://zolutal.github.io/aslrnt/ [akpm@linux-foundation.org: default to `y' if 32-bit, per Rafael] Link: https://lkml.kernel.org/r/20240606180622.102099-1-aquini@redhat.com Fixes: 1854bc6e ("mm/readahead: Align file mappings for non-DAX") Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Oberparleiter authored
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte long .gcda files with no usable data. To fix this, update GCOV_COUNTERS to match the value defined by GCC 14. Tested with GCC versions 14.1.0 and 13.2.0. Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.comSigned-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reported-by: Allison Henderson <allison.henderson@oracle.com> Reported-by: Chuck Lever III <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true. That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop. Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com Fixes: 12db8b69 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Wei Fu <fuweid89@gmail.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joel Granados <j.granados@samsung.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Zqiang <qiang.zhang1211@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ran Xiaokai authored
When I did a large folios split test, a WARNING "[ 5059.122759][ T166] Cannot split file folio to non-0 order" was triggered. But the test cases are only for anonmous folios. while mapping_large_folio_support() is only reasonable for page cache folios. In split_huge_page_to_list_to_order(), the folio passed to mapping_large_folio_support() maybe anonmous folio. The folio_test_anon() check is missing. So the split of the anonmous THP is failed. This is also the same for shmem_mapping(). We'd better add a check for both. But the shmem_mapping() in __split_huge_page() is not involved, as for anonmous folios, the end parameter is set to -1, so (head[i].index >= end) is always false. shmem_mapping() is not called. Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() for anon mapping, So we can detect the wrong use more easily. THP folios maybe exist in the pagecache even the file system doesn't support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE is enabled, khugepaged will try to collapse read-only file-backed pages to THP. But the mapping does not actually support multi order large folios properly. Using /sys/kernel/debug/split_huge_pages to verify this, with this patch, large anon THP is successfully split and the warning is ceased. Link: https://lkml.kernel.org/r/202406071740485174hcFl7jRxncsHDtI-Pz-o@zte.com.cn Fixes: c010d47f ("mm: thp: split huge page to any lower order pages") Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Michal Hocko <mhocko@kernel.org> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Suren Baghdasaryan authored
put_page_tag_ref() should be called only when get_page_tag_ref() returns a valid reference because only in that case get_page_tag_ref() enters RCU read section while put_page_tag_ref() will call rcu_read_unlock() even if the provided reference is NULL. Fix pgalloc_tag_get() which does not follow this rule causing RCU imbalance. Add a warning in put_page_tag_ref() to catch any future mistakes. Link: https://lkml.kernel.org/r/20240601233840.617458-1-surenb@google.com Fixes: cc92eba1 ("mm: fix non-compound multi-order memory accounting in __free_pages") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202405271029.6d2f9c4c-lkp@intel.comAcked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kees Cook <keescook@chromium.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Suren Baghdasaryan authored
Memory allocation profiling is trying to register sysctl interface even when CONFIG_SYSCTL=n, resulting in proc_do_static_key() being undefined. Prevent that by skipping sysctl registration for such configurations. Link: https://lkml.kernel.org/r/20240601233831.617124-1-surenb@google.com Fixes: 22d407b1 ("lib: add allocation tagging support for memory allocation profiling") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405280616.wcOGWJEj-lkp@intel.com/Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kees Cook <keescook@chromium.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lorenzo Stoakes authored
I haven't had the bandwidth to review vmalloc patches recently and I suspect I won't be able to do so consistently moving forwards, so I think it's best if I remove myself as reviewer for the time being. Link: https://lkml.kernel.org/r/20240602205510.108807-1-lstoakes@gmail.comSigned-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Hildenbrand authored
There was insufficient review and no agreement that this is the right approach. There are serious flaws with the implementation that make processes using mlock() not even work with simple fork() [1] and we get reliable crashes when rebooting. Further, simply because we might be unmapping a single PTE of a large mlocked folio, we shouldn't zero out the whole folio. ... especially because the code can also *corrupt* urelated memory because kernel_init_pages(page, folio_nr_pages(folio)); Could end up writing outside of the actual folio if we work with a tail page. Let's revert it. Once there is agreement that this is the right approach, the issues were fixed and there was reasonable review and proper testing, we can consider it again. [1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com Fixes: ba42b524 ("mm: init_mlocked_on_free_v3") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/Reported-by: Lance Yang <ioworker0@gmail.com> Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.comAcked-by: Lance Yang <ioworker0@gmail.com> Cc: York Jasper Niebuhr <yjnworkstation@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Not all pages may apply to pgtable check. One example is ZONE_DEVICE pages: they map PFNs directly, and they don't allocate page_ext at all even if there's struct page around. One may reference devm_memremap_pages(). When both ZONE_DEVICE and page-table-check enabled, then try to map some dax memories, one can trigger kernel bug constantly now when the kernel was trying to inject some pfn maps on the dax device: kernel BUG at mm/page_table_check.c:55! While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page fault resolutions, skip all the checks if page_ext doesn't even exist in pgtable checker, which applies to ZONE_DEVICE but maybe more. Link: https://lkml.kernel.org/r/20240605212146.994486-1-peterx@redhat.com Fixes: df4e817b ("mm: page table check") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Yury Norov authored
'-Warray-bounds' is already disabled for gcc-10+. Now that we've merged bitmap_{read,write), I see the following error when building the kernel with gcc-9.4 (Ubuntu 20.04.4 LTS) for x86_64 allmodconfig: drivers/pinctrl/pinctrl-cy8c95x0.c: In function `cy8c95x0_read_regs_mask.isra.0': include/linux/bitmap.h:756:18: error: array subscript [1, 288230376151711744] is outside array bounds of `long unsigned int[1]' [-Werror=array-bounds] 756 | value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits); | ~~~^~~~~~~~~~~ The immediate reason is that the commit b4475970 ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()") switched the bitmap_get_value8() to an alias of bitmap_read(); the same for 'set'. Now; the code that triggers Warray-bounds, calls the function like this: #define MAX_BANK 8 #define BANK_SZ 8 #define MAX_LINE (MAX_BANK * BANK_SZ) DECLARE_BITMAP(tval, MAX_LINE); // 64-bit map: unsigned long tval[1] read_val |= bitmap_get_value8(tval, i * BANK_SZ) & ~bits; bitmap_read() is implemented such that it may conditionally dereference a pointer beyond the boundary like this: unsigned long offset = start % BITS_PER_LONG; unsigned long space = BITS_PER_LONG - offset; if (space >= nbits) return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits); value_low = map[index] & BITMAP_FIRST_WORD_MASK(start); value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits); return (value_low >> offset) | (value_high << space); In case of bitmap_get_value8(), it's impossible to violate the boundary because 'space >= nbits' is never the true for byte-aligned 8-bit access. So, this is clearly a false-positive. The same type of false-positives break my allmodconfig build in many places. gcc-8, is clear, however. Link: https://lkml.kernel.org/r/20240522225830.1201778-1-yury.norov@gmail.com Fixes: b4475970 ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()") Signed-off-by: Yury Norov <yury.norov@gmail.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Yoann Congal <yoann.congal@smile.fr> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Joseph Qi authored
bdev->bd_super has been removed and commit 8887b94d change the usage from bdev->bd_super to b_assoc_map->host->i_sb. Since ocfs2 hasn't set bh->b_assoc_map, it will trigger NULL pointer dereference when calling into ocfs2_abort_trigger(). Actually this was pointed out in history, see commit 74e364ad. But I've made a mistake when reviewing commit 8887b94d and then re-introduce this regression. Since we cannot revive bdev in buffer head, so fix this issue by initializing all types of ocfs2 triggers when fill super, and then get the specific ocfs2 trigger from ocfs2_caching_info when access journal. [joseph.qi@linux.alibaba.com: v2] Link: https://lkml.kernel.org/r/20240602112045.1112708-1-joseph.qi@linux.alibaba.com Link: https://lkml.kernel.org/r/20240530110630.3933832-2-joseph.qi@linux.alibaba.com Fixes: 8887b94d ("ocfs2: stop using bdev->bd_super for journal error logging") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Joseph Qi authored
bdev->bd_super has been removed and commit 8887b94d change the usage from bdev->bd_super to b_assoc_map->host->i_sb. This introduces the following NULL pointer dereference in ocfs2_journal_dirty() since b_assoc_map is still not initialized. This can be easily reproduced by running xfstests generic/186, which simulate no more credits. [ 134.351592] BUG: kernel NULL pointer dereference, address: 0000000000000000 ... [ 134.355341] RIP: 0010:ocfs2_journal_dirty+0x14f/0x160 [ocfs2] ... [ 134.365071] Call Trace: [ 134.365312] <TASK> [ 134.365524] ? __die_body+0x1e/0x60 [ 134.365868] ? page_fault_oops+0x13d/0x4f0 [ 134.366265] ? __pfx_bit_wait_io+0x10/0x10 [ 134.366659] ? schedule+0x27/0xb0 [ 134.366981] ? exc_page_fault+0x6a/0x140 [ 134.367356] ? asm_exc_page_fault+0x26/0x30 [ 134.367762] ? ocfs2_journal_dirty+0x14f/0x160 [ocfs2] [ 134.368305] ? ocfs2_journal_dirty+0x13d/0x160 [ocfs2] [ 134.368837] ocfs2_create_new_meta_bhs.isra.51+0x139/0x2e0 [ocfs2] [ 134.369454] ocfs2_grow_tree+0x688/0x8a0 [ocfs2] [ 134.369927] ocfs2_split_and_insert.isra.67+0x35c/0x4a0 [ocfs2] [ 134.370521] ocfs2_split_extent+0x314/0x4d0 [ocfs2] [ 134.371019] ocfs2_change_extent_flag+0x174/0x410 [ocfs2] [ 134.371566] ocfs2_add_refcount_flag+0x3fa/0x630 [ocfs2] [ 134.372117] ocfs2_reflink_remap_extent+0x21b/0x4c0 [ocfs2] [ 134.372994] ? inode_update_timestamps+0x4a/0x120 [ 134.373692] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2] [ 134.374545] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2] [ 134.375393] ocfs2_reflink_remap_blocks+0xe4/0x4e0 [ocfs2] [ 134.376197] ocfs2_remap_file_range+0x1de/0x390 [ocfs2] [ 134.376971] ? security_file_permission+0x29/0x50 [ 134.377644] vfs_clone_file_range+0xfe/0x320 [ 134.378268] ioctl_file_clone+0x45/0xa0 [ 134.378853] do_vfs_ioctl+0x457/0x990 [ 134.379422] __x64_sys_ioctl+0x6e/0xd0 [ 134.379987] do_syscall_64+0x5d/0x170 [ 134.380550] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 134.381231] RIP: 0033:0x7fa4926397cb [ 134.381786] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01 48 [ 134.383930] RSP: 002b:00007ffc2b39f7b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 134.384854] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa4926397cb [ 134.385734] RDX: 00007ffc2b39f7f0 RSI: 000000004020940d RDI: 0000000000000003 [ 134.386606] RBP: 0000000000000000 R08: 00111a82a4f015bb R09: 00007fa494221000 [ 134.387476] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 134.388342] R13: 0000000000f10000 R14: 0000558e844e2ac8 R15: 0000000000f10000 [ 134.389207] </TASK> Fix it by only aborting transaction and journal in ocfs2_journal_dirty() now, and leave ocfs2_abort() later when detecting an aborted handle, e.g. start next transaction. Also log the handle details in this case. Link: https://lkml.kernel.org/r/20240530110630.3933832-1-joseph.qi@linux.alibaba.com Fixes: 8887b94d ("ocfs2: stop using bdev->bd_super for journal error logging") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Jun, 2024 5 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
Merge tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Update copies of kernel headers, which resulted in support for the new 'mseal' syscall, SUBVOL statx return mask bit, RISC-V and PPC prctls, fcntl's DUPFD_QUERY, POSTED_MSI_NOTIFICATION IRQ vector, 'map_shadow_stack' syscall for x86-32. - Revert perf.data record memory allocation optimization that ended up causing a regression, work is being done to re-introduce it in the next merge window. - Fix handling of minimal vmlinux.h file used with BPF's CO-RE when interrupting the build. * tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event" tools headers arm64: Sync arm64's cputype.h with the kernel sources tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL tools headers UAPI: Update i915_drm.h with the kernel sources tools headers UAPI: Sync kvm headers with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources to pick POSTED_MSI_NOTIFICATION perf beauty: Update copy of linux/socket.h with the kernel sources tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY tools headers UAPI: Sync linux/prctl.h with the kernel sources tools include UAPI: Sync linux/stat.h with the kernel sources
-
git://git.kernel.org/pub/scm/linux/kernel/git/ras/rasLinus Torvalds authored
Pull EDAC fixes from Borislav Petkov: - Convert PCI core error codes to proper error numbers since latter get propagated all the way up to the module loading functions * tag 'edac_urgent_for_v6.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/igen6: Convert PCIBIOS_* return codes to errnos EDAC/amd64: Convert PCIBIOS_* return codes to errnos
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull clk fix from Stephen Boyd: "One fix for the SiFive PRCI clocks so that the device boots again. This driver was registering clkdev lookups that were always going to be useless. This wasn't a problem until clkdev started returning an error in these cases, causing this driver to fail probe, and thus boot to fail because clks are essential for most drivers. The fix is simple, don't use clkdev because this is a DT based system where clkdev isn't used" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sifive: Do not register clkdevs for PRCI clocks
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull smb client fixes from Steve French: "Two small smb3 client fixes: - fix deadlock in umount - minor cleanup due to netfs change" * tag '6.10-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Don't advance the I/O iterator before terminating subrequest smb: client: fix deadlock in smb2_find_smb_tcon()
-
- 08 Jun, 2024 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID fixes from Benjamin Tissoires: - fix potential read out of bounds in hid-asus (Andrew Ballance) - fix endian-conversion on little endian systems in intel-ish-hid (Arnd Bergmann) - A couple of new input event codes (Aseda Aboagye) - errors handling fixes in hid-nvidia-shield (Chen Ni), hid-nintendo (Christophe JAILLET), hid-logitech-dj (José Expósito) - current leakage fix while the device is in suspend on a i2c-hid laptop (Johan Hovold) - other assorted smaller fixes and device ID / quirk entry additions * tag 'for-linus-2024060801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: Ignore battery for ELAN touchscreens 2F2C and 4116 HID: i2c-hid: elan: fix reset suspend current leakage dt-bindings: HID: i2c-hid: elan: add 'no-reset-on-power-off' property dt-bindings: HID: i2c-hid: elan: add Elan eKTH5015M dt-bindings: HID: i2c-hid: add dedicated Ilitek ILI2901 schema input: Add support for "Do Not Disturb" input: Add event code for accessibility key hid: asus: asus_report_fixup: fix potential read out of bounds HID: logitech-hidpp: add missing MODULE_DESCRIPTION() macro HID: intel-ish-hid: fix endian-conversion HID: nintendo: Fix an error handling path in nintendo_hid_probe() HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode() HID: core: remove unnecessary WARN_ON() in implement() HID: nvidia-shield: Add missing check for input_ff_create_memless HID: intel-ish-hid: Fix build error for COMPILE_TEST
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix the initial state of the save button in 'make gconfig' - Improve the Kconfig documentation - Fix a Kconfig bug regarding property visibility - Fix build breakage for systems where 'sed' is not installed in /bin - Fix a false warning about missing MODULE_DESCRIPTION() * tag 'kbuild-fixes-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: do not warn about missing MODULE_DESCRIPTION() for vmlinux.o kbuild: explicitly run mksysmap as sed script from link-vmlinux.sh kconfig: remove wrong expr_trans_bool() kconfig: doc: document behavior of 'select' and 'imply' followed by 'if' kconfig: doc: fix a typo in the note about 'imply' kconfig: gconf: give a proper initial state to the Save button kconfig: remove unneeded code for user-supplied values being out of range
-
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds authored
Pull media fixes from Mauro Carvalho Chehab: - fixes for the new ipu6 driver (and related fixes to mei csi driver) - fix a double debugfs remove logic at mgb4 driver - a documentation fix * tag 'media/v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: intel/ipu6: add csi2 port sanity check in notifier bound media: intel/ipu6: update the maximum supported csi2 port number to 6 media: mei: csi: Warn less verbosely of a missing device fwnode media: mei: csi: Put the IPU device reference media: intel/ipu6: fix the buffer flags caused by wrong parentheses media: intel/ipu6: Fix an error handling path in isys_probe() media: intel/ipu6: Move isys_remove() close to isys_probe() media: intel/ipu6: Fix some redundant resources freeing in ipu6_pci_remove() media: Documentation: v4l: Fix ACTIVE route flag media: mgb4: Fix double debugfs remove
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Ingo Molnar: - Fix possible memory leak the riscv-intc irqchip driver load failures - Fix boot crash in the sifive-plic irqchip driver caused by recently changed boot initialization order - Fix race condition in the gic-v3-its irqchip driver * tag 'irq-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update() irqchip/sifive-plic: Chain to parent IRQ after handlers are ready irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Ingo Molnar: "Miscellaneous fixes: - Fix kexec() crash if call depth tracking is enabled - Fix SMN reads on inaccessible registers on certain AMD systems" * tag 'x86-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd_nb: Check for invalid SMN reads x86/kexec: Fix bug with call depth tracking
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf event fix from Ingo Molnar: "Fix race between perf_event_free_task() and perf_event_release_kernel() that can result in missed wakeups and hung tasks" * tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix missing wakeup when waiting for context reference
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking doc fix from Ingo Molnar: "Fix typos in the kerneldoc of some of the atomic APIs" * tag 'locking-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: scripts: fix ${atomic}_sub_and_test() kerneldoc
-
Linus Torvalds authored
Merge tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "14 hotfixes, 6 of which are cc:stable. All except the nilfs2 fix affect MM and all are singletons - see the chagelogs for details" * tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors mm: fix xyz_noprof functions calling profiled functions codetag: avoid race at alloc_slab_obj_exts mm/hugetlb: do not call vma_add_reservation upon ENOMEM mm/ksm: fix ksm_zero_pages accounting mm/ksm: fix ksm_pages_scanned accounting kmsan: do not wipe out origin when doing partial unpoisoning vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr() mm: page_alloc: fix highatomic typing in multi-block buddies nilfs2: fix potential kernel bug due to lack of writeback flag waiting memcg: remove the lockdep assert from __mod_objcg_mlstate() mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n mm: drop the 'anon_' prefix for swap-out mTHP counters
-
- 07 Jun, 2024 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linuxLinus Torvalds authored
Pull gpio fixes from Bartosz Golaszewski: - interrupt handling and Kconfig fixes for gpio-tqmx86 - add a buffer for storing output values in gpio-tqmx86 as reading back the registers always returns the input values - add missing MODULE_DESCRIPTION()s to several GPIO drivers * tag 'gpio-fixes-for-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: add missing MODULE_DESCRIPTION() macros gpio: tqmx86: fix broken IRQ_TYPE_EDGE_BOTH interrupt type gpio: tqmx86: store IRQ trigger type and unmask status separately gpio: tqmx86: introduce shadow register for GPIO output value gpio: tqmx86: fix typo in Kconfig label
-
git://git.kernel.dk/linuxLinus Torvalds authored
Pull block fixes from Jens Axboe: - Fix for null_blk block size validation (Andreas) - NVMe pull request via Keith: - Use reserved tags for special fabrics operations (Chunguang) - Persistent Reservation status masking fix (Weiwen) * tag 'block-6.10-20240607' of git://git.kernel.dk/linux: null_blk: fix validation of block size nvme: fix nvme_pr_* status code parsing nvme-fabrics: use reserved tag for reg read/write command
-
git://git.kernel.dk/linuxLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: - Fix a locking order issue with setting max async thread workers (Hagar) - Fix for a NULL pointer dereference for failed async flagged requests using ring provided buffers. This doesn't affect the current kernel, but it does affect older kernels, and is being queued up for 6.10 just to make the stable process easier (me) - Fix for NAPI timeout calculations for how long to busy poll, and subsequently how much to sleep post that if a wait timeout is passed in (me) - Fix for a regression in this release cycle, where we could end up using a partially unitialized match value for io-wq (Su) * tag 'io_uring-6.10-20240607' of git://git.kernel.dk/linux: io_uring: fix possible deadlock in io_register_iowq_max_workers() io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue() io_uring/napi: fix timeout calculation io_uring: check for non-NULL file pointer in io_file_can_poll()
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: - fix handling of folio private changes. The private value holds pointer to our extent buffer structure representing a metadata range. Release and create of the range was not properly synchronized when updating the private bit which ended up in double folio_put, leading to all sorts of breakage - fix a crash, reported as duplicate key in metadata, but caused by a race of fsync and size extending write. Requires prealloc target range + fsync and other conditions (log tree state, timing) - fix leak of qgroup extent records after transaction abort * tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: protect folio::private when attaching extent buffer folios btrfs: fix leak of qgroup extent records after transaction abort btrfs: fix crash on racing fsync and size-extending write into prealloc
-
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds authored
Pull nfsd fix from Chuck Lever: - Fix an occasional memory overwrite caused by a fix added in 6.10 * tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix loop termination condition in gss_free_in_token_pages()
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: - Another fix to avoid allocating pages that overlap with ERR_PTR, which manifests on rv32 - A revert for the badaccess patch I incorrectly picked up an early version of * tag 'riscv-for-linus-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Revert "riscv: mm: accelerate pagefault when badaccess" riscv: fix overlap of allocated page and PTR_ERR
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Alexander Gordeev: - Do not create PT_LOAD program header for the kenel image when the virtual memory informaton in OS_INFO data is not available. That fixes stand-alone dump failures against kernels that do not provide the virtual memory informaton - Add KVM s390 shared zeropage selftest * tag 's390-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: KVM: s390x: selftests: Add shared zeropage test s390/crash: Do not use VM info if os_info does not have it
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fixes from Will Deacon: - Fix spurious CPU hotplug warning message from SETEND emulation code - Fix the build when GCC wasn't inlining our I/O accessor internals * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/io: add constant-argument check arm64: armv8_deprecated: Fix warning in isndep cpuhp starting process
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: - Default silead touchscreen driver to 10 fingers and drop 10 finger setting from all DMI quirks. More of a cleanup then a pure fix, but since the DMI quirks always get updated through the fixes branch this avoids conflicts. - Kconfig fix for randconfig builds - dell-smbios: Fix wrong token data in sysfs - amd-hsmp: Fix driver poking unsupported hw when loaded manually * tag 'platform-drivers-x86-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/amd/hsmp: Check HSMP support on AMD family of processors platform/x86: dell-smbios: Simplify error handling platform/x86: dell-smbios: Fix wrong token data in sysfs platform/x86: yt2-1380: add CONFIG_EXTCON dependency platform/x86: touchscreen_dmi: Use 2-argument strscpy() platform/x86: touchscreen_dmi: Drop "silead,max-fingers" property Input: silead - Always support 10 fingers
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull iommu fixes from Joerg Roedel: "Core: - Make iommu-dma code recognize 'force_aperture' again - Fix for potential NULL-ptr dereference from iommu_sva_bind_device() return value AMD IOMMU fixes: - Fix lockdep splat for invalid wait context - Add feature bit check before enabling PPR - Make workqueue name fit into buffer - Fix memory leak in sysfs code" * tag 'iommu-fixes-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix Invalid wait context issue iommu/amd: Check EFR[EPHSup] bit before enabling PPR iommu/amd: Fix workqueue name iommu: Return right value in iommu_sva_bind_device() iommu/dma: Fix domain init iommu/amd: Fix sysfs leak in iommu init
-
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linuxLinus Torvalds authored
Pull ata fix from Niklas Cassel: - Fix a regression for the PATA MacIO driver were it would fail to probe because of the recent changes of initializing the limits in SCSI core * tag 'ata-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K
-