- 18 Aug, 2023 26 commits
-
-
Peng Zhang authored
When expanding a range in two directions, only partially overwriting the previous and next ranges, the number of entries will not be increased, so we can just update the pivots as a fast path. However, it may introduce potential risks in RCU mode, because it updates two pivots. We only enable it in non-RCU mode. Link: https://lkml.kernel.org/r/20230628073657.75314-5-zhangpeng.00@bytedance.comSigned-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peng Zhang authored
When the new range can be completely covered by the original last range without touching the boundaries on both sides, two new entries can be appended to the end as a fast path. We update the original last pivot at the end, and the newly appended two entries will not be accessed before this, so it is also safe in RCU mode. This is useful for sequential insertion, which is what we do in dup_mmap(). Enabling BENCH_FORK in test_maple_tree and just running bench_forking() gives the following time-consuming numbers: before: after: 17,874.83 msec 15,738.38 msec It shows about a 12% performance improvement for duplicating VMAs. Link: https://lkml.kernel.org/r/20230628073657.75314-4-zhangpeng.00@bytedance.comSigned-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peng Zhang authored
Add test for expanding range in RCU mode. If we use the fast path of the slot store to expand range in RCU mode, this test will fail. Link: https://lkml.kernel.org/r/20230628073657.75314-3-zhangpeng.00@bytedance.comSigned-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peng Zhang authored
Patch series "Optimize the fast path of mas_store()", v4. Add fast paths for mas_wr_append() and mas_wr_slot_store() respectively. The newly added fast path of mas_wr_append() is used in fork() and how much it benefits fork() depends on how many VMAs are duplicated. Thanks Liam for the review. This patch (of 4): Add tests for all cases of mas_wr_append() and mas_wr_slot_store(). Link: https://lkml.kernel.org/r/20230628073657.75314-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230628073657.75314-2-zhangpeng.00@bytedance.comSigned-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Haibo Li authored
ra->prev_pos tracks the last visited byte in the previous read request. It is used to check whether it is sequential read in ondemand_readahead and thus affects the readahead window. After commit 06c04442 ("mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig"), update logic of prev_pos is changed. It updates prev_pos after each return from filemap_get_pages(). But the read request from user may be not fully completed at this point. The updated prev_pos impacts the subsequent readahead window. The real problem is performance drop of fsck_msdos between linux-5.4 and linux-5.15(also linux-6.4). Comparing to linux-5.4,It spends about 110% time and read 140% pages. The read pattern of fsck_msdos is not fully sequential. Simplified read pattern of fsck_msdos likes below: 1.read at page offset 0xa,size 0x1000 2.read at other page offset like 0x20,size 0x1000 3.read at page offset 0xa,size 0x4000 4.read at page offset 0xe,size 0x1000 Here is the read status on linux-6.4: 1.after read at page offset 0xa,size 0x1000 ->page ofs 0xa go into pagecache 2.after read at page offset 0x20,size 0x1000 ->page ofs 0x20 go into pagecache 3.read at page offset 0xa,size 0x4000 ->filemap_get_pages read ofs 0xa from pagecache and returns ->prev_pos is updated to 0xb and goto next loop ->filemap_get_pages tends to read ofs 0xb,size 0x3000 ->initial_readahead case in ondemand_readahead since prev_pos is the same as request ofs. ->read 8 pages while async size is 5 pages (PageReadahead flag at page 0xe) 4.read at page offset 0xe,size 0x1000 ->hit page 0xe with PageReadahead flag set,double the ra_size. read 16 pages while async size is 16 pages Now it reads 24 pages while actually uses 5 pages on linux-5.4: 1.the same as 6.4 2.the same as 6.4 3.read at page offset 0xa,size 0x4000 ->read ofs 0xa from pagecache ->read ofs 0xb,size 0x3000 using page_cache_sync_readahead read 3 pages ->prev_pos is updated to 0xd before generic_file_buffered_read returns 4.read at page offset 0xe,size 0x1000 ->initial_readahead case in ondemand_readahead since request ofs-prev_pos==1 ->read 4 pages while async size is 3 pages Now it reads 7 pages while actually uses 5 pages. In above demo, the initial_readahead case is triggered by offset of user request on linux-5.4. While it may be triggered by update logic of prev_pos on linux-6.4. To fix the performance drop, update prev_pos after finishing one read request. Link: https://lkml.kernel.org/r/20230628110220.120134-1-haibo.li@mediatek.comSigned-off-by: Haibo Li <haibo.li@mediatek.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Add a matrix for testing gup based on the current gup_test. Only run the matrix when -a is specified because it's a bit slow. It covers: - Different types of huge pages: thp, hugetlb, or no huge page - Permissions: Write / Read-only - Fast-gup, with/without - Types of the GUP: pin / gup / longterm pins - Shared / Private memories - GUP size: 1 / 512 / random page sizes Link: https://lkml.kernel.org/r/20230628215310.73782-9-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Allows to specify optional tests in run_vmtests.sh, where we can run time consuming test matrix only when user specified "-a". Link: https://lkml.kernel.org/r/20230628215310.73782-8-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Now __get_user_pages() should be well prepared to handle thp completely, as long as hugetlb gup requests even without the hugetlb's special path. Time to retire follow_hugetlb_page(). Tweak misc comments to reflect reality of follow_hugetlb_page()'s removal. Link: https://lkml.kernel.org/r/20230628215310.73782-7-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
The acceleration of THP was done with ctx.page_mask, however it'll be ignored if **pages is non-NULL. The old optimization was introduced in 2013 in 240aadee ("mm: accelerate mm_populate() treatment of THP pages"). It didn't explain why we can't optimize the **pages non-NULL case. It's possible that at that time the major goal was for mm_populate() which should be enough back then. Optimize thp for all cases, by properly looping over each subpage, doing cache flushes, and boost refcounts / pincounts where needed in one go. This can be verified using gup_test below: # chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10 Before: 13992.50 ( +-8.75%) After: 378.50 (+-69.62%) Link: https://lkml.kernel.org/r/20230628215310.73782-6-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
The only path that doesn't use generic "**pages" handling is the gate vma. Make it use the same path, meanwhile tune the next_page label upper to cover "**pages" handling. This prepares for THP handling for "**pages". Link: https://lkml.kernel.org/r/20230628215310.73782-5-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
follow_page() doesn't need it, but we'll start to need it when unifying gup for hugetlb. Link: https://lkml.kernel.org/r/20230628215310.73782-4-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the target of FOLL_WRITE either. However add the checks. Namely, either the need to CoW due to missing write bit, or proper unsharing on !AnonExclusive pages over R/O pins to reject the follow page. That brings this function closer to follow_hugetlb_page(). So we don't care before, and also for now. But we'll care if we switch over slow-gup to use hugetlb_follow_page_mask(). We'll also care when to return -EMLINK properly, as that's the gup internal api to mean "we should unshare". Not really needed for follow page path, though. When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be clear that it just should never fail. When error happens, instead of setting page==NULL, capture the errno instead. Link: https://lkml.kernel.org/r/20230628215310.73782-3-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Patch series "mm/gup: Unify hugetlb, speed up thp", v4. Hugetlb has a special path for slow gup that follow_page_mask() is actually skipped completely along with faultin_page(). It's not only confusing, but also duplicating a lot of logics that generic gup already has, making hugetlb slightly special. This patchset tries to dedup the logic, by first touching up the slow gup code to be able to handle hugetlb pages correctly with the current follow page and faultin routines (where we're mostly there.. due to 10 years ago we did try to optimize thp, but half way done; more below), then at the last patch drop the special path, then the hugetlb gup will always go the generic routine too via faultin_page(). Note that hugetlb is still special for gup, mostly due to the pgtable walking (hugetlb_walk()) that we rely on which is currently per-arch. But this is still one small step forward, and the diffstat might be a proof too that this might be worthwhile. Then for the "speed up thp" side: as a side effect, when I'm looking at the chunk of code, I found that thp support is actually partially done. It doesn't mean that thp won't work for gup, but as long as **pages pointer passed over, the optimization will be skipped too. Patch 6 should address that, so for thp we now get full speed gup. For a quick number, "chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10" gives me 13992.50us -> 378.50us. Gup_test is an extreme case, but just to show how it affects thp gups. This patch (of 8): Firstly, the no_page_table() is meaningless for hugetlb which is a no-op there, because a hugetlb page always satisfies: - vma_is_anonymous() == false - vma->vm_ops->fault != NULL So we can already safely remove it in hugetlb_follow_page_mask(), alongside with the page* variable. Meanwhile, what we do in follow_hugetlb_page() actually makes sense for a dump: we try to fault in the page only if the page cache is already allocated. Let's do the same here for follow_page_mask() on hugetlb. It should so far has zero effect on real dumps, because that still goes into follow_hugetlb_page(). But this may start to influence a bit on follow_page() users who mimics a "dump page" scenario, but hopefully in a good way. This also paves way for unifying the hugetlb gup-slow. Link: https://lkml.kernel.org/r/20230628215310.73782-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230628215310.73782-2-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Collingbourne authored
As a result of the patches "mm: Call arch_swap_restore() from do_swap_page()" and "mm: Call arch_swap_restore() from unuse_pte()", there are no circumstances in which a swapped-in page is installed in a page table without first having arch_swap_restore() called on it. Therefore, we no longer need the logic in set_pte_at() that restores the tags, so remove it. Link: https://lkml.kernel.org/r/20230523004312.1807357-4-pcc@google.com Link: https://linux-review.googlesource.com/id/I8ad54476f3b2d0144ccd8ce0c1d7a2963e5ff6f3Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Hildenbrand <david@redhat.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kasan-dev@googlegroups.com Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Collingbourne authored
We would like to move away from requiring architectures to restore metadata from swap in the set_pte_at() implementation, as this is not only error-prone but adds complexity to the arch-specific code. This requires us to call arch_swap_restore() before calling swap_free() whenever pages are restored from swap. We are currently doing so everywhere except in unuse_pte(); do so there as well. Link: https://lkml.kernel.org/r/20230523004312.1807357-3-pcc@google.com Link: https://linux-review.googlesource.com/id/I68276653e612d64cde271ce1b5a99ae05d6bbc4fSigned-off-by: Peter Collingbourne <pcc@google.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
All callers of show_free_areas() pass 0 and NULL, so we can directly use show_mem() instead of show_free_areas(0, NULL), which could make show_free_areas() a static function. Link: https://lkml.kernel.org/r/20230630062253.189440-2-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
All callers of show_mem() pass 0 and NULL, so we can remove the two arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem(). Link: https://lkml.kernel.org/r/20230630062253.189440-1-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Thomas Weißschuh authored
The memfd_create() syscall, enabled by CONFIG_MEMFD_CREATE, is useful on its own even when not required by CONFIG_TMPFS or CONFIG_HUGETLBFS. Split it into its own proper bool option that can be enabled by users. Move that option into mm/ where the code itself also lies. Also add "select" statements to CONFIG_TMPFS and CONFIG_HUGETLBFS so they automatically enable CONFIG_MEMFD_CREATE as before. Link: https://lkml.kernel.org/r/20230630-config-memfd-v1-1-9acc3ae38b5a@weissschuh.netSigned-off-by: Thomas Weißschuh <linux@weissschuh.net> Tested-by: Zhangjin Wu <falcon@tinylab.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
ZhangPeng authored
After converting the last user to folio_raw_mapping(), we can safely remove the function. Link: https://lkml.kernel.org/r/20230701032853.258697-3-zhangpeng362@huawei.comSigned-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
ZhangPeng authored
We can replace four implicit calls to compound_head() with one by using folio. Link: https://lkml.kernel.org/r/20230701032853.258697-2-zhangpeng362@huawei.comSigned-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ma Wupeng authored
Our test finds a WARN_ON in add_to_avail_list. During add_to_avail_list, avail_lists is already in swap_avail_heads, while leads to this WARN_ON. Here is the simplified calltrace: ------------[ cut here ]------------ Call trace: add_to_avail_list+0xb8/0xc0 swap_range_free+0x110/0x138 swapcache_free_entries+0x100/0x1c0 free_swap_slot+0xbc/0xe0 put_swap_folio+0x1f0/0x2ec delete_from_swap_cache+0x6c/0xd0 folio_free_swap+0xa4/0xe4 __try_to_reclaim_swap+0x9c/0x190 free_swap_and_cache+0x84/0x88 unmap_page_range+0x31c/0x934 unmap_single_vma.isra.0+0x48/0x84 unmap_vmas+0x98/0x10c exit_mmap+0xa4/0x210 mmput+0x88/0x158 do_exit+0x284/0x970 do_group_exit+0x34/0x90 post_copy_siginfo_from_user32+0x0/0x1cc do_notify_resume+0x15c/0x470 el0_svc+0x74/0x84 el0t_64_sync_handler+0xb8/0xbc el0t_64_sync+0x190/0x194 During swapoff, try_to_unuse fails to alloc memory due to memory limit and this leads to the failure of swapoff and causes re-insertion of swap space back into swap_list. During _enable_swap_info, this swap device is added to avail list even this swap device if full. At the same time, one entry in this full swap device in released and we try to add this device into avail list and find it is already in the avail list. This causes this WARN_ON. To fix this. Don't add to avail list is swap is full. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20230627120833.2230766-3-mawupeng1@huawei.comSigned-off-by: Ma Wupeng <mawupeng1@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ma Wupeng authored
Patch series "fix WARN_ON in add_to_avail_list". Empty check for plist_node is checked in add_to_avail_list and plist_add. Drop the duplicate one in add_to_avail_list. Link: https://lkml.kernel.org/r/20230627120833.2230766-1-mawupeng1@huawei.com Link: https://lkml.kernel.org/r/20230627120833.2230766-2-mawupeng1@huawei.comSigned-off-by: Ma Wupeng <mawupeng1@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Sidhartha Kumar authored
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Link: https://lkml.kernel.org/r/20230627174349.491803-1-sidhartha.kumar@oracle.comSigned-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Since commit 633c0666 ("Memoryless nodes: drop one memoryless node boot warning"), the warning for a node with no available memory is removed. Update the corresponding comment. Link: https://lkml.kernel.org/r/20230625033340.1054103-1-linmiaohe@huawei.comSigned-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Thomas Gleixner authored
The documentation of mt_next() claims that it starts the search at the provided index. That's incorrect as it starts the search after the provided index. The documentation of mt_find() is slightly confusing. "Handles locking" is not really helpful as it does not explain how the "locking" works. Also the documentation of index talks about a range, while in reality the index is updated on a succesful search to the index of the found entry plus one. Fix similar issues for mt_find_after() and mt_prev(). Reword the confusing "Note: Will not return the zero entry." comment on mt_for_each() and document @__index correctly. Link: https://lkml.kernel.org/r/87ttw2n556.ffs@tglxSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Charan Teja Kalla authored
A folio turns into a Workingset during: 1) shrink_active_list() placing the folio from active to inactive list. 2) When a workingset transition is happening during the folio refault. And when Workingset is set on a folio, PSI for memory can be accounted during a) That folio is being reclaimed and b) Refault of that folio, for usual reclaims. This accounting of PSI for memory is not consistent for reclaim + refault operation between usual reclaim and madvise(COLD/PAGEOUT) which deactivate or proactively reclaim a folio: a) A folio started at inactive and moved to active as part of accesses. Workingset is absent on the folio thus refault of it when reclaimed through MADV_PAGEOUT operation doesn't account for PSI. b) When the same folio transition from inactive->active and then to inactive through shrink_active_list(). Workingset is set on the folio thus refault of it when reclaimed through MADV_PAGEOUT operation accounts for PSI. c) When the same folio is part of active list directly as a result of folio refault and this was a workingset folio prior to eviction. Workingset is set on the folio thus the refault of it when reclaimed through MADV_PAGEOUT/MADV_COLD operation accounts for PSI. d) MADV_COLD transfers the folio from active list to inactive list. Such folios may not have the Workingset thus refault operation on such folio doesn't account for PSI. As said above, refault operation caused because of MADV_PAGEOUT on a folio is accounts for memory PSI in b) and c) but not in a). Refault caused by the reclaim of a folio on which MADV_COLD is performed accounts memory PSI in c) but not in d). These behaviours are inconsistent w.r.t usual reclaim + refault operation. Make this PSI accounting always consistent by turning a folio into a workingset one whenever it is leaving the active list. Also, accounting of PSI on a folio whenever it leaves the active list as part of the MADV_COLD/PAGEOUT operation helps the users whether they are operating on proper folios[1]. [1] https://lore.kernel.org/all/20230605180013.GD221380@cmpxchg.org/ Link: https://lkml.kernel.org/r/1688393201-11135-1-git-send-email-quic_charante@quicinc.comSigned-off-by: Charan Teja Kalla <quic_charante@quicinc.com> Suggested-by: Suren Baghdasaryan <surenb@google.com> Reported-by: Sai Manobhiram Manapragada <quic_smanapra@quicinc.com> Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Jul, 2023 14 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spiLinus Torvalds authored
Pull spi fixes from Mark Brown: "A bunch of fixes for the Qualcomm QSPI driver, fixing multiple issues with the newly added DMA mode - it had a number of issues exposed when tested in a wider range of use cases, both race condition style issues and issues with different inputs to those that had been used in test" * tag 'spi-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-qcom-qspi: Add mem_ops to avoid PIO for badly sized reads spi: spi-qcom-qspi: Fallback to PIO for xfers that aren't multiples of 4 bytes spi: spi-qcom-qspi: Add DMA_CHAIN_DONE to ALL_IRQS spi: spi-qcom-qspi: Call dma_wmb() after setting up descriptors spi: spi-qcom-qspi: Use GFP_ATOMIC flag while allocating for descriptor spi: spi-qcom-qspi: Ignore disabled interrupts' status in isr
-
Linus Torvalds authored
Merge tag 'regulator-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A couple of small fixes for the the mt6358 driver, fixing error reporting and a bootstrapping issue" * tag 'regulator-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: mt6358: Fix incorrect VCN33 sync error message regulator: mt6358: Sync VCN33_* enable status after checking ID
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fixes from Greg KH: "Here are a set of USB driver fixes for 6.5-rc4. Include in here are: - new USB serial device ids - dwc3 driver fixes for reported issues - typec driver fixes for reported problems - gadget driver fixes - reverts of some problematic USB changes that went into -rc1 All of these have been in linux-next with no reported problems" * tag 'usb-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits) usb: misc: ehset: fix wrong if condition usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config usb: gadget: call usb_gadget_check_config() to verify UDC capability usb: typec: Use sysfs_emit_at when concatenating the string usb: typec: Iterate pds array when showing the pd list usb: typec: Set port->pd before adding device for typec_port usb: typec: qcom: fix return value check in qcom_pmic_typec_probe() Revert "usb: gadget: tegra-xudc: Fix error check in tegra_xudc_powerdomain_init()" Revert "usb: xhci: tegra: Fix error check" USB: gadget: Fix the memory leak in raw_gadget driver usb: gadget: core: remove unbalanced mutex_unlock in usb_gadget_activate Revert "usb: dwc3: core: Enable AutoRetry feature in the controller" Revert "xhci: add quirk for host controllers that don't update endpoint DCS" USB: quirks: add quirk for Focusrite Scarlett usb: xhci-mtk: set the dma max_seg_size MAINTAINERS: drop invalid usb/cdns3 Reviewer e-mail usb: dwc3: don't reset device side if dwc3 was configured as host-only usb: typec: ucsi: move typec_set_mode(TYPEC_STATE_SAFE) to ucsi_unregister_partner() usb: ohci-at91: Fix the unhandle interrupt when resume ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty/serial fixes from Greg KH: "Here are some small TTY and serial driver fixes for 6.5-rc4 for some reported problems. Included in here is: - TIOCSTI fix for braille readers - documentation fix for minor numbers - MAINTAINERS update for new serial files in -rc1 - minor serial driver fixes for reported problems All of these have been in linux-next with no reported problems" * tag 'tty-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_dw: Preserve original value of DLF register tty: serial: sh-sci: Fix sleeping in atomic context serial: sifive: Fix sifive_serial_console_setup() section Documentation: devices.txt: reconcile serial/ucc_uart minor numers MAINTAINERS: Update TTY layer for lists and recently added files tty: n_gsm: fix UAF in gsm_cleanup_mux TIOCSTI: always enable for CAP_SYS_ADMIN
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging driver fixes from Greg KH: "Here are three small staging driver fixes for 6.5-rc4 that resolve some reported problems. These fixes are: - fix for an old bug in the r8712 driver - fbtft driver fix for a spi device - potential overflow fix in the ks7010 driver All of these have been in linux-next with no reported problems" * tag 'staging-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext() staging: fbtft: ili9341: use macro FBTFT_REGISTER_SPI_DRIVER staging: r8712: Fix memory leak in _r8712_init_xmit_priv()
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char driver and Documentation fixes from Greg KH: "Here is a char driver fix and some documentation updates for 6.5-rc4 that contain the following changes: - sram/genalloc bugfix for reported problem - security-bugs.rst update based on recent discussions - embargoed-hardware-issues minor cleanups and then partial revert for the project/company lists All of these have been in linux-next for a while with no reported problems, and the documentation updates have all been reviewed by the relevant developers" * tag 'char-misc-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc/genalloc: Name subpools by of_node_full_name() Documentation: embargoed-hardware-issues.rst: add AMD to the list Documentation: embargoed-hardware-issues.rst: clean out empty and unused entries Documentation: security-bugs.rst: clarify CVE handling Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
-
Linus Torvalds authored
Merge tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe fixes from Masami Hiramatsu: - probe-events: add NULL check for some BTF API calls which can return error code and NULL. - ftrace selftests: check fprobe and kprobe event correctly. This fixes a miss condition of the test command. - kprobes: do not allow probing functions that start with "__cfi_" or "__pfx_" since those are auto generated for kernel CFI and not executed. * tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Prohibit probing on CFI preamble symbol selftests/ftrace: Fix to check fprobe event eneblement tracing/probes: Fix to add NULL check for BTF APIs
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "x86: - Do not register IRQ bypass consumer if posted interrupts not supported - Fix missed device interrupt due to non-atomic update of IRR - Use GFP_KERNEL_ACCOUNT for pid_table in ipiv - Make VMREAD error path play nice with noinstr - x86: Acquire SRCU read lock when handling fastpath MSR writes - Support linking rseq tests statically against glibc 2.35+ - Fix reference count for stats file descriptors - Detect userspace setting invalid CR0 Non-KVM: - Remove coccinelle script that has caused multiple confusion ("debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage", acked by Greg)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: selftests: Expand x86's sregs test to cover illegal CR0 values KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage" KVM: selftests: Verify stats fd is usable after VM fd has been closed KVM: selftests: Verify stats fd can be dup()'d and read KVM: selftests: Verify userspace can create "redundant" binary stats files KVM: selftests: Explicitly free vcpus array in binary stats test KVM: selftests: Clean up stats fd in common stats_test() helper KVM: selftests: Use pread() to read binary stats header KVM: Grab a reference to KVM for VM and vCPU stats file descriptors selftests/rseq: Play nice with binaries statically linked against glibc 2.35+ Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid" KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path KVM: VMX: Make VMREAD error path play nice with noinstr KVM: x86/irq: Conditionally register IRQ bypass consumer again KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv KVM: x86: check the kvm_cpu_get_interrupt result before using it KVM: x86: VMX: set irr_pending in kvm_apic_update_irr ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fix from Borislav Petkov: - Fix a rtmutex race condition resulting from sharing of the sort key between the lock waiters and the PI chain tree (->pi_waiters) of a task by giving each tree their own sort key * tag 'locking_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix task->pi_waiters integrity
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - AMD's automatic IBRS doesn't enable cross-thread branch target injection protection (STIBP) for user processes. Enable STIBP on such systems. - Do not delete (but put the ref instead) of AMD MCE error thresholding sysfs kobjects when destroying them in order not to delete the kernfs pointer prematurely - Restore annotation in ret_from_fork_asm() in order to fix kthread stack unwinding from being marked as unreliable and thus breaking livepatching * tag 'x86_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks x86: Fix kthread unwind
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Borislav Petkov: - Work around an erratum on GIC700, where a race between a CPU handling a wake-up interrupt, a change of affinity, and another CPU going to sleep can result in a lack of wake-up event on the next interrupt - Fix the locking required on a VPE for GICv4 - Enable Rockchip 3588001 erratum workaround for RK3588S - Fix the irq-bcm6345-l1 assumtions of the boot CPU always be the first CPU in the system * tag 'irq_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627 irqchip/gic-v3: Enable Rockchip 3588001 erratum workaround for RK3588S irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull smb client fixes from Steve French: "Four small SMB3 client fixes: - two reconnect fixes (to address the case where non-default iocharset gets incorrectly overridden at reconnect with the default charset) - fix for NTLMSSP_AUTH request setting a flag incorrectly) - Add missing check for invalid tlink (tree connection) in ioctl" * tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add missing return value check for cifs_sb_tlink smb3: do not set NTLMSSP_VERSION flag for negotiate not auth request cifs: fix charset issue in reconnection fs/nls: make load_nls() take a const parameter
-
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds authored
Pull tracing fixes from Steven Rostedt: - Fix to /sys/kernel/tracing/per_cpu/cpu*/stats read and entries. If a resize shrinks the buffer it clears the read count to notify readers that they need to reset. But the read count is also used for accounting and this causes the numbers to be off. Instead, create a separate variable to use to notify readers to reset. - Fix the ref counts of the "soft disable" mode. The wrong value was used for testing if soft disable mode should be enabled or disable, but instead, just change the logic to do the enable and disable in place when the SOFT_MODE is set or cleared. - Several kernel-doc fixes - Removal of unused external declarations * tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix warning in trace_buffered_event_disable() ftrace: Remove unused extern declarations tracing: Fix kernel-doc warnings in trace_seq.c tracing: Fix kernel-doc warnings in trace_events_trigger.c tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.c ring-buffer: Fix kernel-doc warnings in ring_buffer.c ring-buffer: Fix wrong stat of cpu_buffer->read
-