- 06 Mar, 2024 24 commits
-
-
Chengming Zhou authored
Commit bf9b7df2 ("mm/zswap: global lru and shrinker shared by all zswap_pools") introduced a new lock to protect zswap_next_shrink, instead of reusing zswap_pools_lock. But the problem is that it's initialized only when zswap enabled, which causes bug if zswap_memcg_offline_cleanup() called without zswap enabled. Fix it by using DEFINE_SPINLOCK() to statically initialize them and define them as multiple static variables to keep in consistent with the existing global variables in zswap. Link: https://lkml.kernel.org/r/20240305075345.1493214-1-chengming.zhou@linux.dev Fixes: bf9b7df2 ("mm/zswap: global lru and shrinker shared by all zswap_pools") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403051008.a8cf8a94-lkp@intel.comSigned-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
The rounddown_pow_of_two(0) is undefined, so val = 0 is not allowed in the fault_around_bytes_set(), and leads to shift-out-of-bounds, UBSAN: shift-out-of-bounds in include/linux/log2.h:67:13 shift exponent 4294967295 is too large for 64-bit type 'long unsigned int' CPU: 7 PID: 107 Comm: sh Not tainted 6.8.0-rc6-next-20240301 #294 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x78/0x90 dump_stack+0x18/0x24 ubsan_epilogue+0x10/0x44 __ubsan_handle_shift_out_of_bounds+0x98/0x134 fault_around_bytes_set+0xa4/0xb0 simple_attr_write_xsigned.isra.0+0xe4/0x1ac simple_attr_write+0x18/0x24 debugfs_attr_write+0x4c/0x98 vfs_write+0xd0/0x4b0 ksys_write+0x6c/0xfc __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xdc el0t_64_sync_handler+0xc0/0xc4 el0t_64_sync+0x190/0x194 ---[ end trace ]--- Fix it by setting the minimum val to PAGE_SIZE. Link: https://lkml.kernel.org/r/20240302064312.2358924-1-wangkefeng.wang@huawei.com Fixes: 53d36a56 ("mm: prefer fault_around_pages to fault_around_bytes") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Yue Sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/all/CAEkJfYPim6DQqW1GqCiHLdh2-eweqk1fGyXqs3JM+8e1qGge8w@mail.gmail.com/Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Qi Zheng authored
After commit 6326c26c ("s390: convert various pgalloc functions to use ptdescs"), there are still some positions that use page->{lru, index} instead of ptdesc->{pt_list, pt_index}. In order to make the use of ptdesc->{pt_list, pt_index} clearer, it would be better to convert them as well. [zhengqi.arch@bytedance.com: fix build failure] Link: https://lkml.kernel.org/r/20240305072154.26168-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/04beaf3255056ffe131a5ea595736066c1e84756.1709541697.git.zhengqi.arch@bytedance.comSigned-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Qi Zheng authored
In s390, the page->index field is used for gmap (see gmap_shadow_pgt()), so add the corresponding pt_index to struct ptdesc and add a comment to clarify this. Link: https://lkml.kernel.org/r/283624c2af45fb2090b41a6b1b5481bb0a45bad7.1709541697.git.zhengqi.arch@bytedance.comSigned-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Qi Zheng authored
Patch series "minor fixes and supplement for ptdesc". In this series, the [PATCH 1/3] and [PATCH 2/3] are fixes for some issues discovered during code inspection. The [PATCH 3/3] is a supplement to ptdesc conversion in s390, I don't know why this is not done in the commit 6326c26c ("s390: convert various pgalloc functions to use ptdescs"), maybe I missed something. And since I don't have an s390 environment, I hope kernel test robot can help compile and test, and this is why I did not fold [PATCH 2/3] and [PATCH 3/3] into one patch. This patch (of 3): The commit 32cc0b7c ("powerpc: add pte_free_defer() for pgtables sharing page") introduced the use of PageActive flag to page table fragments tracking, so the ptdesc->__page_flags is not unused, so correct the wrong comment. Link: https://lkml.kernel.org/r/cover.1709541697.git.zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/cc42d5915fd98fd802f920de243f535efcfe01db.1709541697.git.zhengqi.arch@bytedance.comSigned-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Thorsten Blum authored
Fixes Coccinelle/coccicheck warning reported by do_div.cocci. Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder. Link: https://lkml.kernel.org/r/20240228224911.1164-2-thorsten.blum@toblux.comSigned-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
We actually add folios to the pagelist already, but then work with them as pages. Removes a call to compound_head() in PageKsm() and removes a reference to page->index. Link: https://lkml.kernel.org/r/20240229153015.1996829-1-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
madvise, mprotect and some others might need folio_pte_batch to check if a range of PTEs are completely mapped to a large folio with contiguous physical addresses. Let's make it available in mm/internal.h. While at it, add proper kernel doc and sanity-check more input parameters using two additional VM_WARN_ON_FOLIO(). [21cnbao@gmail.com: build fix] Link: https://lkml.kernel.org/r/CAGsJ_4wWzG-37D82vqP_zt+Fcbz+URVe5oXLBc4M5wbN8A_gpQ@mail.gmail.com [david@redhat.com: improve the doc for the exported func] Link: https://lkml.kernel.org/r/20240227104201.337988-1-21cnbao@gmail.comSigned-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Now that PF_POISONED_CHECK() can take a const argument, we can drop the cast. Link: https://lkml.kernel.org/r/20240227192337.757313-9-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Constify the flag tests that aren't automatically generated and the tests that look like flag tests but are more complicated. Link: https://lkml.kernel.org/r/20240227192337.757313-8-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Now that dump_page() takes a const argument, we can constify all the page flag tests. Link: https://lkml.kernel.org/r/20240227192337.757313-7-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Now that __dump_page() takes a const argument, we can make dump_page() take a const struct page too. Link: https://lkml.kernel.org/r/20240227192337.757313-6-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Turn __dump_page() into a wrapper around __dump_folio(). Snapshot the page & folio into a stack variable so we don't hit BUG_ON() if an allocation is freed under us and what was a folio pointer becomes a pointer to a tail page. [willy@infradead.org: fix build issue] Link: https://lkml.kernel.org/r/ZeAKCyTn_xS3O9cE@casper.infradead.org [willy@infradead.org: fix __dump_folio] Link: https://lkml.kernel.org/r/ZeJJegP8zM7S9GTy@casper.infradead.org [willy@infradead.org: fix pointer confusion] Link: https://lkml.kernel.org/r/ZeYa00ixxC4k1ot-@casper.infradead.org [akpm@linux-foundation.org: s/printk/pr_warn/] Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All callers have been converted to use folios, so remove the various set/clear/test functions defined on pages. Link: https://lkml.kernel.org/r/20240227192337.757313-4-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All callers have been converted to use folios. This was the only user of PF_ONLY_HEAD, so remove that too. Link: https://lkml.kernel.org/r/20240227192337.757313-3-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Patch series "PageFlags cleanups". We have now successfully removed all of the uses of some of the PageFlags from the kernel, but there's nothing to stop somebody reintroducing them. By splitting out FOLIO_FLAGS from PAGEFLAGS, we can stop defining the old flags; and we do that in some of the later patches. After doing this, I realised that dump_page() was living dangerously; we could end up calling folio_test_foo() on a pointer which no longer pointed to a folio (as dump_page() is not necessarily called when the caller has a reference to the page). So I fixed that up. And then I realised that this was the key to making dump_page() take a const argument, which means we can constify the page flags testing, which means we can remove more cast-away-the-const bad code. And here's where I ended up. This patch (of 8): We've progressed far enough with the folio transition that some flags are now no longer checked on pages, but only on folios. To prevent new users appearing, prepare to only define the folio versions of the flag test/set/clear. Link: https://lkml.kernel.org/r/20240227192337.757313-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240227192337.757313-2-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
Optimizing the initialization speed of 1G huge pages through parallelization. 1G hugetlbs are allocated from bootmem, a process that is already very fast and does not currently require optimization. Therefore, we focus on parallelizing only the initialization phase in `gather_bootmem_prealloc`. Here are some test results: test case no patch(ms) patched(ms) saved ------------------- -------------- ------------- -------- 256c2T(4 node) 1G 4745 2024 57.34% 128c1T(2 node) 1G 3358 1712 49.02% 12T 1G 77000 18300 76.23% [akpm@linux-foundation.org: s/initialied/initialized/, per Alexey] Link: https://lkml.kernel.org/r/20240222140422.393911-9-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
By distributing both the allocation and the initialization tasks across multiple threads, the initialization of 2M hugetlb will be faster, thereby improving the boot speed. Here are some test results: test case no patch(ms) patched(ms) saved ------------------- -------------- ------------- -------- 256c2T(4 node) 2M 3336 1051 68.52% 128c1T(2 node) 2M 1943 716 63.15% Link: https://lkml.kernel.org/r/20240222140422.393911-8-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
Allow hugetlb use padata_do_multithreaded for parallel initialization. Select CONFIG_PADATA in this case. Link: https://lkml.kernel.org/r/20240222140422.393911-7-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
hugetlb parallelization depends on PADATA, and PADATA depends on SMP. PADATA consists of two distinct functionality: One part is padata_do_multithreaded which disregards order and simply divides tasks into several groups for parallel execution. Hugetlb init parallelization depends on padata_do_multithreaded. The other part is composed of a set of APIs that, while handling data in an out-of-order parallel manner, can eventually return the data with ordered sequence. Currently Only `crypto/pcrypt.c` use them. All users of PADATA of non-SMP case currently only use padata_do_multithreaded. It is easy to implement a serial one in include/linux/padata.h. And it is not necessary to implement another functionality unless the only user of crypto/pcrypt.c does not depend on SMP in the future. Link: https://lkml.kernel.org/r/20240222140422.393911-6-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
different nodes Date: Thu, 22 Feb 2024 22:04:17 +0800 When a group of tasks that access different nodes are scheduled on the same node, they may encounter bandwidth bottlenecks and access latency. Thus, numa_aware flag is introduced here, allowing tasks to be distributed across different nodes to fully utilize the advantage of multi-node systems. Link: https://lkml.kernel.org/r/20240222140422.393911-5-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
With parallelization of hugetlb allocation across different threads, each thread works on a differnet node to allocate pages from, instead of all allocating from a common node h->next_nid_to_alloc. To address this, it's necessary to assign a separate next_nid_to_alloc for each thread. Consequently, the hstate_next_node_to_alloc and for_each_node_mask_to_alloc have been modified to directly accept a *next_nid_to_alloc parameter, ensuring thread-specific allocation and avoiding concurrent access issues. Link: https://lkml.kernel.org/r/20240222140422.393911-4-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
1G and 2M huge pages have different allocation and initialization logic, which leads to subtle differences in parallelization. Therefore, it is appropriate to split hugetlb_hstate_alloc_pages into gigantic and non-gigantic. This patch has no functional changes. Link: https://lkml.kernel.org/r/20240222140422.393911-3-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
Patch series "hugetlb: parallelize hugetlb page init on boot", v6. Introduction ------------ Hugetlb initialization during boot takes up a considerable amount of time. For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2 seconds out of 10 seconds. Initializing 11,776 1GB pages on a 12TB Intel host takes more than 1 minute[1]. This is a noteworthy figure. Inspired by [2] and [3], hugetlb initialization can also be accelerated through parallelization. Kernel already has infrastructure like padata_do_multithreaded, this patch uses it to achieve effective results by minimal modifications. [1] https://lore.kernel.org/all/783f8bac-55b8-5b95-eb6a-11a583675000@google.com/ [2] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/ [3] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/ [4] https://lore.kernel.org/all/76becfc1-e609-e3e8-2966-4053143170b6@google.com/ max_threads ----------- This patch use `padata_do_multithreaded` like this: ``` job.max_threads = num_node_state(N_MEMORY) * multiplier; padata_do_multithreaded(&job); ``` To fully utilize the CPU, the number of parallel threads needs to be carefully considered. `max_threads = num_node_state(N_MEMORY)` does not fully utilize the CPU, so we need to multiply it by a multiplier. Tests below indicate that a multiplier of 2 significantly improves performance, and although larger values also provide improvements, the gains are marginal. multiplier 1 2 3 4 5 ------------ ------- ------- ------- ------- ------- 256G 2node 358ms 215ms 157ms 134ms 126ms 2T 4node 979ms 679ms 543ms 489ms 481ms 50G 2node 71ms 44ms 37ms 30ms 31ms Therefore, choosing 2 as the multiplier strikes a good balance between enhancing parallel processing capabilities and maintaining efficient resource management. Test result ----------- test case no patch(ms) patched(ms) saved ------------------- -------------- ------------- -------- 256c2T(4 node) 1G 4745 2024 57.34% 128c1T(2 node) 1G 3358 1712 49.02% 12T 1G 77000 18300 76.23% 256c2T(4 node) 2M 3336 1051 68.52% 128c1T(2 node) 2M 1943 716 63.15% This patch (of 8): The readability of `hugetlb_hstate_alloc_pages` is poor. By cleaning the code, its readability can be improved, facilitating future modifications. This patch extracts two functions to reduce the complexity of `hugetlb_hstate_alloc_pages` and has no functional changes. - hugetlb_hstate_alloc_pages_node_specific() to handle iterates through each online node and performs allocation if necessary. - hugetlb_hstate_alloc_pages_report() report error during allocation. And the value of h->max_huge_pages is updated accordingly. Link: https://lkml.kernel.org/r/20240222140422.393911-1-gang.li@linux.dev Link: https://lkml.kernel.org/r/20240222140422.393911-2-gang.li@linux.devSigned-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 05 Mar, 2024 16 commits
-
-
Chengming Zhou authored
We will save allocated tag in the object header to indicate that it's allocated. handle |= OBJ_ALLOCATED_TAG; So the object header needs to reserve LSB for this tag bit. But the handle itself doesn't need to reserve LSB to save tag, since it's only used to find the position of object, by (pfn + obj_idx). So remove LSB reserve from handle, one more bit can be used as obj_idx. Link: https://lkml.kernel.org/r/20240228023854.3511239-1-chengming.zhou@linux.devSigned-off-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
John Hubbard authored
do_numa_page() is reading from the same page table entry, twice, while holding the page table lock: once while checking that the pte hasn't changed, and again in order to modify the pte. Instead, just read the pte once, and save it in the same old_pte variable that already exists. This has no effect on behavior, other than to provide a tiny potential improvement to performance, by avoiding the redundant memory read (which the compiler cannot elide, due to READ_ONCE()). Also improve the associated comments nearby. Link: https://lkml.kernel.org/r/20240228034151.459370-1-jhubbard@nvidia.comSigned-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Richard Chang authored
alloc_contig_migrate_range has every information to be able to understand big contiguous allocation latency. For example, how many pages are migrated, how many times they were needed to unmap from page tables. This patch adds the trace event to collect the allocation statistics. In the field, it was quite useful to understand CMA allocation latency. [akpm@linux-foundation.org: a/trace_mm_alloc_config_migrate_range_info_enabled/trace_mm_alloc_contig_migrate_range_info_enabled] Link: https://lkml.kernel.org/r/20240228051127.2859472-1-richardycc@google.comSigned-off-by: Richard Chang <richardycc@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org. Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
We already have a folio; use it instead of the head page where reasonable. Saves a couple of calls to compound_head() and elimimnates a few references to page->mapping. Link: https://lkml.kernel.org/r/20240228164326.1355045-1-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Huang Shijie authored
In memory_model.h, if CONFIG_SPARSEMEM_VMEMMAP is configed, kernel will use vmemmap to do the __pfn_to_page/page_to_pfn, and kernel will not use the "classic sparse" to do the __pfn_to_page/page_to_pfn. So export the vmemmap when CONFIG_SPARSEMEM_VMEMMAP is configed. This makes the user applications (crash, etc) get faster pfn_to_page/page_to_pfn operations too. Link: https://lkml.kernel.org/r/20240227014952.3184-1-shijie@os.amperecomputing.comSigned-off-by: Huang Shijie <shijie@os.amperecomputing.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Kazuhito Hagio <k-hagio-ab@nec.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Changbin Du authored
The synchronization here is to ensure the ordering of freeing of a module init so that it happens before W+X checking. It is worth noting it is not that the freeing was not happening, it is just that our sanity checkers raced against the permission checkers which assume init memory is already gone. Commit 1a7b7d92 ("modules: Use vmalloc special flag") moved calling do_free_init() into a global workqueue instead of relying on it being called through call_rcu(..., do_free_init), which used to allowed us call do_free_init() asynchronously after the end of a subsequent grace period. The move to a global workqueue broke the gaurantees for code which needed to be sure the do_free_init() would complete with rcu_barrier(). To fix this callers which used to rely on rcu_barrier() must now instead use flush_work(&init_free_wq). Without this fix, we still could encounter false positive reports in W+X checking since the rcu_barrier() here can not ensure the ordering now. Even worse, the rcu_barrier() can introduce significant delay. Eric Chanudet reported that the rcu_barrier introduces ~0.1s delay on a PREEMPT_RT kernel. [ 0.291444] Freeing unused kernel memory: 5568K [ 0.402442] Run /sbin/init as init process With this fix, the above delay can be eliminated. Link: https://lkml.kernel.org/r/20240227023546.2490667-1-changbin.du@huawei.com Fixes: 1a7b7d92 ("modules: Use vmalloc special flag") Signed-off-by: Changbin Du <changbin.du@huawei.com> Tested-by: Eric Chanudet <echanude@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Xiaoyi Su <suxiaoyi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All but one caller already has a folio, so convert free_page_and_swap_cache() to have a folio and remove the call to page_folio(). Link: https://lkml.kernel.org/r/20240227174254.710559-19-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
These pages are all chained together through the lru list, so we know they're folios. Use the folio APIs to save three hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20240227174254.710559-18-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Process the pages in batch-sized quantities instead of all-at-once. Link: https://lkml.kernel.org/r/20240227174254.710559-17-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
The last user was removed over a year ago; remove the definition. Link: https://lkml.kernel.org/r/20240227174254.710559-16-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All callers now use free_unref_folios() so we can delete this function. Link: https://lkml.kernel.org/r/20240227174254.710559-15-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All users have been converted to mem_cgroup_uncharge_folios() so we can remove this API. Link: https://lkml.kernel.org/r/20240227174254.710559-14-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
The few folios which can't be moved to the LRU list (because their refcount dropped to zero) used to be returned to the caller to dispose of. Make this simpler to call by freeing the folios directly through free_unref_folios(). Link: https://lkml.kernel.org/r/20240227174254.710559-13-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Use free_unref_page_batch() to free the folios. This may increase the number of IPIs from calling try_to_unmap_flush() more often, but that's going to be very workload-dependent. It may even reduce the number of IPIs as we now batch-free large folios instead of freeing them one at a time. Link: https://lkml.kernel.org/r/20240227174254.710559-12-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Hugetlb folios still get special treatment, but normal large folios can now be freed by free_unref_folios(). This should have a reasonable performance impact, TBD. Link: https://lkml.kernel.org/r/20240227174254.710559-11-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Call folio_undo_large_rmappable() if needed. free_unref_page_prepare() destroys the ability to call folio_order(), so stash the order in folio->private for the benefit of the second loop. Link: https://lkml.kernel.org/r/20240227174254.710559-10-willy@infradead.orgSigned-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-