- 29 Apr, 2022 33 commits
-
-
Naoya Horiguchi authored
Reverts commit 888af270 ("mm/memory-failure.c: fix race with changing page compound again") because the page refcount is now fetched under hugetlb_lock in try_memory_failure_hugetlb(), so the race check is no longer necessary. Link: https://lkml.kernel.org/r/20220408135323.1559401-4-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Naoya Horiguchi authored
In the already-hwpoisoned case, memory_failure() is supposed to release the page refcount taken for error handling before returning. But currently the refcount is not released when memory_failure() is called with MF_COUNT_INCREASED, which makes the page refcount inconsistent. This should be rare and non-critical, but it might be inconvenient in testing (unpoison doesn't work). Link: https://lkml.kernel.org/r/20220408135323.1559401-3-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
liqiong authored
There is no need to cast (void *) to (struct hwp_walk *). Link: https://lkml.kernel.org/r/20220322142826.25939-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Zi Yan authored
Whenever the buddy of a page is found from __find_buddy_pfn(), page_is_buddy() should be used to check its validity. Add a helper function find_buddy_page_pfn() to find the buddy page and do the check together. [ziy@nvidia.com: updates per David] Link: https://lkml.kernel.org/r/20220401230804.1658207-2-zi.yan@sent.com Link: https://lore.kernel.org/linux-mm/CAHk-=wji_AmYygZMTsPMdJ7XksMt7kOur8oDfDdniBRMjm4VkQ@mail.gmail.com/ Link: https://lkml.kernel.org/r/7236E7CA-B5F1-4C04-AB85-E86FA3E9A54B@nvidia.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
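The find-then-validate pattern described above can be sketched as a small user-space model. The buddy of a block at a given order is the pfn with bit `order` flipped; everything else here (`free_order`, `model_reset()`, `buddy_is_valid()`, `find_valid_buddy()`) is a hypothetical stand-in for the kernel's page state and page_is_buddy(), not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS  16
#define NOT_FREE (-1)

/*
 * Hypothetical model of page state: free_order[pfn] is the order of the
 * free block starting at pfn, or NOT_FREE if no free block starts there.
 */
int free_order[NR_PFNS];

void model_reset(void)
{
	for (size_t i = 0; i < NR_PFNS; i++)
		free_order[i] = NOT_FREE;
}

/* The buddy of a block differs from it only in bit 'order' of the pfn. */
unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* Stand-in for the validity check: in range and free at the same order. */
bool buddy_is_valid(unsigned long buddy_pfn, unsigned int order)
{
	return buddy_pfn < NR_PFNS && free_order[buddy_pfn] == (int)order;
}

/* Find the buddy and validate it in one step, as the helper is meant to. */
bool find_valid_buddy(unsigned long pfn, unsigned int order,
		      unsigned long *buddy_pfn)
{
	unsigned long candidate = find_buddy_pfn(pfn, order);

	if (!buddy_is_valid(candidate, order))
		return false;
	*buddy_pfn = candidate;
	return true;
}
```

Folding the check into the lookup means a caller cannot forget to validate the candidate before merging with it.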
-
Zi Yan authored
Move the pageblock migratetype check code into the while loop to simplify the logic. It also saves redundant buddy page checking code. Link: https://lkml.kernel.org/r/20220401230804.1658207-1-zi.yan@sent.com Link: https://lore.kernel.org/linux-mm/27ff69f9-60c5-9e59-feb2-295250077551@suse.cz/ Signed-off-by: Zi Yan <ziy@nvidia.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
To order nodes round-robin within the same distance group, we add a penalty to the first node we get in each round. To get a round-robin order in the same distance group, we don't need to decrease the penalty since: * find_next_best_node() always iterates nodes in the same order * distance matters more than penalty in find_next_best_node() * among nodes with the same distance, the first one would be picked up So it is fine to increase the same penalty when we get the first node in the same distance group. Since we just add a constant of 1 to the node penalty, it is not necessary to multiply by MAX_NODE_LOAD for preference. [richard.weiyang@gmail.com: remove MAX_NODE_LOAD, per Vlastimil] Link: https://lkml.kernel.org/r/20220412001319.7462-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20220123013537.20491-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
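The effect of the constant penalty can be illustrated with a simplified user-space model. This is a sketch under assumptions, not the kernel code: the four-node topology, `node_dist`, `LOAD_SCALE`, and `build_node_order()` are invented for illustration, and only the name find_next_best_node() is borrowed from the message above.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_NODES   4
#define LOAD_SCALE (NR_NODES + 1)	/* keeps distance dominant over load */

/* Hypothetical topology: {0,1} are close; {2,3} are equally far from both. */
const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 40 },
	{ 20, 10, 40, 40 },
	{ 40, 40, 10, 20 },
	{ 40, 40, 20, 10 },
};

/* Penalty carried over from one fallback list to the next. */
int node_load[NR_NODES];

/* Pick the unused node with the smallest (distance, load); ties go to the
 * lowest node number because iteration order is fixed. */
int find_next_best_node(int local, bool used[NR_NODES])
{
	int best = -1, best_val = INT_MAX;

	for (int n = 0; n < NR_NODES; n++) {
		int val;

		if (used[n])
			continue;
		val = node_dist[local][n] * LOAD_SCALE + node_load[n];
		if (val < best_val) {
			best_val = val;
			best = n;
		}
	}
	if (best >= 0)
		used[best] = true;
	return best;
}

/* Build the fallback order for 'local'; the first node picked in each new
 * distance group gets a constant penalty of 1. */
void build_node_order(int local, int order[NR_NODES])
{
	bool used[NR_NODES] = { false };
	int prev = local, nr = 0, node;

	while ((node = find_next_best_node(local, used)) >= 0) {
		if (node_dist[local][node] != node_dist[local][prev])
			node_load[node] += 1;
		order[nr++] = node;
		prev = node;
	}
}
```

In this model, building the list for node 0 and then for node 1 makes node 0 fall back to node 2 first and node 1 fall back to node 3 first, so the two equally distant remote nodes are used in rotation without the penalty ever being decreased.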
-
Joel Savitz authored
Commit 5ef64cc8 ("mm: allow a controlled amount of unfairness in the page lock") introduced a new sysctl but no accompanying documentation. Add a simple entry to the documentation. Link: https://lkml.kernel.org/r/20220325164437.120246-1-jsavitz@redhat.com Signed-off-by: Joel Savitz <jsavitz@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "zhangyi (F)" <yi.zhang@huawei.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Yury Norov authored
vmap() takes struct page *pages as one of its arguments, and the user may provide an invalid pointer, which may lead to a corrupted translation table. An example of such behaviour is erroneous usage of virt_to_page():
	vaddr1 = dma_alloc_coherent()
	page = virt_to_page() // Wrong here
	...
	vaddr2 = vmap(page)
	memset(vaddr2) // Faulting here
virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address. The problem is that vmap() successfully populates the PTE with a bad pfn, and the failure is much harder to debug at memory access time. This case would be caught by DEBUG_VIRTUAL if it were enabled, but it's not enabled in popular distros. The kernel already checks the pages against NULL. In the case mentioned above, however, the address is not NULL, and it's big enough that the hardware generated an Address Size Abort on arm64: [ 665.484101] Unhandled fault at 0xffff8000252cd000 [ 665.488807] Mem abort info: [ 665.491617] ESR = 0x96000043 [ 665.494675] EC = 0x25: DABT (current EL), IL = 32 bits [ 665.499985] SET = 0, FnV = 0 [ 665.503039] EA = 0, S1PTW = 0 [ 665.506167] Data abort info: [ 665.509047] ISV = 0, ISS = 0x00000043 [ 665.512882] CM = 0, WnR = 1 [ 665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000 [ 665.522550] [ffff8000252cd000] pgd=000000affcfff003, pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713 [ 665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP [ 665.539936] Modules linked in: [...]
[ 665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P OE 5.4.0-84-generic #94~18.04.1-Ubuntu [ 665.626806] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018 [ 665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO) [ 665.641407] pc : __memset+0x38/0x188 [ 665.645146] lr : test+0xcc/0x3f8 [ 665.650184] sp : ffff8000359bb840 [ 665.653486] x29: ffff8000359bb840 x28: 0000000000000000 [ 665.658785] x27: 0000000000000000 x26: 0000000000231000 [ 665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800 [ 665.669382] x23: 0000000000000001 x22: ffff00af533e5000 [ 665.674680] x21: 0000000000001000 x20: 0000000000000000 [ 665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff [ 665.685276] x17: 00000000588636a5 x16: 0000000000000013 [ 665.690574] x15: ffffffffffffffff x14: 000000000007ffff [ 665.695872] x13: 0000000080000000 x12: 0140000000000000 [ 665.701170] x11: 0000000000000041 x10: ffff8000652cd000 [ 665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000 [ 665.711767] x7 : 0303030303030303 x6 : 0000000000001000 [ 665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000 [ 665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001 [ 665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000 [ 665.732960] Call trace: [ 665.735395] __memset+0x38/0x188 [...] Interestingly, this abort happens even if copy_from_kernel_nofault() is used, which is quite inconvenient for debugging purposes. This patch adds a pfn_valid() check into vmap() path, so that invalid mapping will not be created; WARN_ON() is used to let client code know that something goes wrong, and it's not a regular EINVAL situation. 
Link: https://lkml.kernel.org/r/20220422220410.1308706-1-yury.norov@gmail.com Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Yixuan Cao authored
The sentence "but the mempolcy want to alloc memory by interleaving" should be rephrased as "but the mempolicy wants to alloc memory by interleaving", where "mempolicy" is a struct name. This work is coauthored by Yinan Zhang, Jiajian Ye, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220401064543.4447-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lu Jialin authored
There is no use for the private value, __OOM_TYPE and OOM notifier OOM_CONTROL. Therefore remove them to make the code clean. Link: https://lkml.kernel.org/r/20220421122755.40899-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lu Jialin authored
cgroup_memory_noswap is only used in mm/memcontrol.c, so make it static and remove the export in include/linux/memcontrol.h. Link: https://lkml.kernel.org/r/20220421124736.62180-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Roman Gushchin authored
List memory control and kernel memory control kselftests in the memory resource controller entry. Link: https://lkml.kernel.org/r/20220415000133.3955987-5-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: David Vernet <void@manifault.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Roman Gushchin authored
List cgroup kselftests in the cgroup MAINTAINERS entry. These are tests covering core, freezer and cgroup.kill functionality. Link: https://lkml.kernel.org/r/20220415000133.3955987-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Down <chris@chrisdown.name> Cc: David Vernet <void@manifault.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Roman Gushchin authored
After commit 0e4b01df ("mm, memcg: throttle allocators when failing reclaim over memory.high") allocating memory over memory.high became very time consuming. But it's exactly what the memory.high test from cgroup kselftests is doing: it tries to allocate 100M with 30M memory.high value. It takes forever to complete. In order to keep it passing (or failing) in a reasonable amount of time let's try to allocate only a little over 30M: 31M to be precise. With this change test_memcontrol finishes in a reasonable amount of time: $ time ./test_memcontrol ok 1 test_memcg_subtree_control ok 2 test_memcg_current ok 3 test_memcg_min ok 4 test_memcg_low ok 5 test_memcg_high ok 6 test_memcg_max ok 7 test_memcg_oom_events ok 8 test_memcg_swap_max ok 9 test_memcg_sock ok 10 test_memcg_oom_group_leaf_events ok 11 test_memcg_oom_group_parent_events ok 12 test_memcg_oom_group_score_events real 0m2.273s user 0m0.064s sys 0m0.739s Link: https://lkml.kernel.org/r/20220415000133.3955987-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: David Vernet <void@manifault.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Roman Gushchin authored
Patch series "mm: memcg kselftests fixes". This patch (of 4): Commit 9852ae3f ("mm, memcg: consider subtrees in memory.events") made memory.events recursive: all events are propagated up the tree. It was a change in semantics. It broke the oom group leaf events test, which assumes that after an OOM the oom_kill counter is zero at the parent's level. Let's adjust the test: it should have similar expectations for the child and parent levels. The test passes after this fix. Link: https://lkml.kernel.org/r/20220415000133.3955987-2-roman.gushchin@linux.dev Link: https://lkml.kernel.org/r/20220415000133.3955987-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: David Vernet <void@manifault.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
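The recursive semantics described above can be pictured with a toy model. This is only a sketch: `struct cg_node` and `record_oom_kill()` are hypothetical names, and the real counters live in the memory controller, not in a structure like this.

```c
#include <stddef.h>

/* Hypothetical cgroup node with a per-level oom_kill event counter. */
struct cg_node {
	struct cg_node *parent;
	unsigned long oom_kill;
};

/* With recursive events, an OOM kill in a leaf is counted at every level
 * on the path from the leaf up to the root. */
void record_oom_kill(struct cg_node *cg)
{
	for (; cg; cg = cg->parent)
		cg->oom_kill++;
}
```

Under this model a kill in the child leaves a non-zero counter on the parent as well, which is why a test expecting zero at the parent level no longer holds.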
-
Wei Yang authored
After commit bef8620c ("mm: memcg: deprecate the non-hierarchical mode"), we won't have a NULL parent except root_mem_cgroup. And this case is handled when (memcg == root). Link: https://lkml.kernel.org/r/20220403020833.26164-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
For each round-trip, we assign the generation on the first invocation and compare it on subsequent invocations. Let's move them together to make the code more self-explanatory. This also removes a check on prev. [hannes@cmpxchg.org: better comment to explain reclaim model] Link: https://lkml.kernel.org/r/20220330234719.18340-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
In mem_cgroup_iter(), there are two ways to get the iteration position: reclaim and non-reclaim mode. Let's handle the two modes explicitly. Link: https://lkml.kernel.org/r/20220330234719.18340-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
Patch series "mm/memcg: some cleanup for mem_cgroup_iter()", v2. No functional change, try to make it more readable. This patch (of 3): Instead of resetting memcg when the css either fails verification or its reference cannot be taken, we can set memcg after these steps. No functional change, just simplified the code a little. Link: https://lkml.kernel.org/r/20220330234719.18340-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20220330234719.18340-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
When mz is not NULL, it means mz can either come from mem_cgroup_largest_soft_limit_node or __mem_cgroup_largest_soft_limit_node. And both of them have removed this node by __mem_cgroup_remove_exceeded(). So it is not necessary to call __mem_cgroup_remove_exceeded() again. [mhocko@suse.com: refine changelog] Link: https://lkml.kernel.org/r/20220314233030.12334-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
The local variable nr_scanned is unneeded as mem_cgroup_soft_reclaim always does *total_scanned += nr_scanned. So we can pass total_scanned directly to the mem_cgroup_soft_reclaim to simplify the code and save some cpu cycles of adding nr_scanned to total_scanned. Link: https://lkml.kernel.org/r/20220328114144.53389-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
The return value of shmem_init is never used. So we can make it return void now. [akpm@linux-foundation.org: remove `return;' from void-returning function, per Muchun Song] Link: https://lkml.kernel.org/r/20220328112707.22217-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Chen Wandun authored
In bdi_set_min_ratio(), min_ratio is an unsigned int, so setting min_ratio below bdi->min_ratio makes the subtraction underflow, which is confusing. Rework it; no functional change. Link: https://lkml.kernel.org/r/20220422095159.2858305-1-chenwandun@huawei.com Signed-off-by: Chen Wandun <chenwandun@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
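To see why the wrapping form still yields the right result yet is harder to read, consider the sketch below. It is not the kernel code: `adjust_wrapping()` and `adjust_explicit()` are hypothetical helpers that update a running total when one contributor's minimum changes from `old_min` to `new_min`.

```c
#include <limits.h>

/* Relies on unsigned wrap-around: when new_min < old_min the delta wraps
 * to a huge value, and adding it to total wraps back to the right sum. */
unsigned int adjust_wrapping(unsigned int total, unsigned int old_min,
			     unsigned int new_min)
{
	unsigned int delta = new_min - old_min;	/* may wrap */

	return total + delta;			/* wraps back */
}

/* Same result, with the decrease and increase cases spelled out. */
unsigned int adjust_explicit(unsigned int total, unsigned int old_min,
			     unsigned int new_min)
{
	if (new_min < old_min)
		return total - (old_min - new_min);
	return total + (new_min - old_min);
}
```

Both functions return the same value for every input because unsigned arithmetic is modular, which matches the "no functional change" note; the explicit form simply avoids an intermediate value that looks like an error.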
-
Yixuan Cao authored
When checking whether a page is allocated by vmalloc, the value of the variable "tmp" was tested repeatedly, so the code was adjusted to avoid this. This work is coauthored by Yinan Zhang, Jiajian Ye, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220414042744.13896-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Yixuan Cao authored
An application is suspected of having a memory leak when its memory consumption is high and keeps increasing. There are several commonly used memory allocators: slab, cma, vmalloc, etc. The memory leak identification can be sped up if the page information allocated by an allocator can be analyzed separately. This patch adds support for memory allocator labelling for slab, vmalloc, and cma. The pages allocated by slab and cma can be confirmed from the "PFN" line according to the kernel code, and the label of the vmalloc allocator can be obtained by analyzing the stack trace. Thanks to Vlastimil Babka for his constructive suggestions. Based on Yinan Zhang's study, the call chain of vmalloc() is vmalloc() -> ... -> __vmalloc_node_range() -> __vmalloc_area_node(). __vmalloc_area_node() requests memory through the interface of the buddy allocation system. In the current version, __vmalloc_area_node() uses four interfaces: alloc_pages_bulk_array_mempolicy(), alloc_pages_bulk_array_node(), alloc_pages() and alloc_pages_node(). By disassembling the code, we find that __vmalloc_area_node() is expanded in __vmalloc_node_range(). So __vmalloc_area_node is not in the stack trace. On the test machine, the stack trace of pages allocated by vmalloc has the following four forms: __alloc_pages_bulk+0x230/0x6a0 __vmalloc_node_range+0x19c/0x598 alloc_pages_bulk_array_mempolicy+0xbc/0x278 __vmalloc_node_range+0x1e8/0x598 __alloc_pages+0x160/0x2b0 __vmalloc_node_range+0x234/0x598 alloc_pages+0xac/0x150 __vmalloc_node_range+0x44c/0x598 Therefore, in two consecutive lines of stacktrace, if the first line contains the word "alloc_pages" and the second line contains the word "__vmalloc_node_range", it can be determined that the page is allocated by vmalloc. The function offset and size are not the same on different machines, so there is no need to match them. At the same time, this patch updates the --cull and --sort options to support allocator-based merge statistics and sorting.
The added functions are fully compatible with the original work. You can specify "allocator", or its abbreviation "ator". Relevant updates have also been made in the documentation (Documentation/vm/page_owner.rst). Example: ./page_owner_sort <input> <output> --cull=st,pid,name,allocator ./page_owner_sort <input> <output> --sort=ator,pid,name This work is coauthored by Jiajian Ye, Yinan Zhang, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220410132932.9402-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
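The two-consecutive-lines rule from this message can be sketched as follows. This is a minimal illustration, assuming the stack trace has already been split into one string per frame; `is_vmalloc_trace()` is a hypothetical name and not necessarily how page_owner_sort implements the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if some frame containing "alloc_pages" is immediately
 * followed by a frame containing "__vmalloc_node_range". Offsets and
 * sizes after '+' are ignored because they differ between machines. */
bool is_vmalloc_trace(const char *const lines[], size_t nr_lines)
{
	for (size_t i = 0; i + 1 < nr_lines; i++) {
		if (strstr(lines[i], "alloc_pages") &&
		    strstr(lines[i + 1], "__vmalloc_node_range"))
			return true;
	}
	return false;
}
```

Substring matching covers all four trace forms listed in the message, since __alloc_pages_bulk, alloc_pages_bulk_array_mempolicy, __alloc_pages and alloc_pages all contain "alloc_pages".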
-
Haowen Bai authored
In normal usage, the tool prints a huge parser log and spends a lot of time printing it, so add a "-d" debug option to control this output. Link: https://lkml.kernel.org/r/1649672446-5685-1-git-send-email-baihaowen@meizu.com Signed-off-by: Haowen Bai <baihaowen@meizu.com> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jiajian Ye authored
When viewing page owner information, we may want to sort blocks of information by multiple keys, since one single key does not uniquely identify a block. Therefore, the following adjustments are made: 1. Add a new --sort option to support sorting blocks of information by multiple keys. ./page_owner_sort <input> <output> --sort=<order> ./page_owner_sort <input> <output> --sort <order> <order> is a single argument in the form of a comma-separated list, which offers a way to specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]]. The ascending or descending order can be specified by adding the + (ascending, default) or - (descending) prefix to the key: ./page_owner_sort <input> <output> [option] --sort -key1,+key2,key3... For example, to sort the blocks first by task command name in lexicographic order and then by pid in ascending numerical order, use the following: ./page_owner_sort <input> <output> --sort=name,+pid To sort the blocks first by pid in ascending order and then by timestamp of the page when it is allocated in descending order, use the following: ./page_owner_sort <input> <output> --sort=pid,-alloc_ts 2. Add explanations of the newly added --sort option in the function usage() and the document (Documentation/vm/page_owner.rst). This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220401024856.767-3-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
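The sorting syntax [+|-]key[,[+|-]key[,...]] could be parsed along the lines of the sketch below. It is an illustration under assumptions, not the tool's actual parser: `struct sort_key`, `parse_sort_order()`, and the `MAX_KEYS`/`MAX_KEY_LEN` limits are hypothetical, and key names are not validated against the keys the tool really supports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS    8
#define MAX_KEY_LEN 32

struct sort_key {
	char name[MAX_KEY_LEN];
	bool descending;	/* false: ascending, which is the default */
};

/* Parse "<order>" into keys[]. Returns the number of keys parsed, or -1
 * on an empty key, a trailing comma, an overlong key, or too many keys. */
int parse_sort_order(const char *arg, struct sort_key keys[MAX_KEYS])
{
	const char *p = arg;
	int n = 0;

	while (*p) {
		bool desc = false;
		size_t len;

		/* Optional direction prefix. */
		if (*p == '+' || *p == '-') {
			desc = (*p == '-');
			p++;
		}
		/* The key name runs up to the next comma or end of string. */
		len = strcspn(p, ",");
		if (len == 0 || len >= MAX_KEY_LEN || n == MAX_KEYS)
			return -1;
		memcpy(keys[n].name, p, len);
		keys[n].name[len] = '\0';
		keys[n].descending = desc;
		n++;
		p += len;
		if (*p == ',') {
			p++;
			if (*p == '\0')
				return -1;	/* trailing comma */
		}
	}
	return n;
}
```

With this sketch, "pid,-alloc_ts" yields two keys: pid ascending, then alloc_ts descending. A comparator would then walk the keys in order and return at the first one that differs between two blocks.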
-
Jiajian Ye authored
When viewing page owner information, we may want to select blocks whose PID/TGID/TASK_COMM_NAME appears in a user-specified list for data analysis and aggregation. But currently page_owner_sort only supports selecting blocks associated with only one specified PID/TGID/TASK_COMM_NAME. Therefore, the following adjustments are made to fix the problem: 1. Enhance the selecting function to support the selection of multiple PIDs/TGIDs/TASK_COMM_NAMEs. The enhanced usages are as follows: --pid <pidlist> Select by pid. This selects the blocks whose PID numbers appear in <pidlist>. --tgid <tgidlist> Select by tgid. This selects the blocks whose TGID numbers appear in <tgidlist>. --name <cmdlist> Select by task command name. This selects the blocks whose task command names appear in <cmdlist>. Where <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list, which offers a way to specify individual selecting rules. For example, if you want to select blocks whose tgids are 1, 2 or 3, you currently have to use 4 commands as follows: ./page_owner_sort <input> <output1> --tgid=1 ./page_owner_sort <input> <output2> --tgid=2 ./page_owner_sort <input> <output3> --tgid=3 cat <output1> <output2> <output3> > <output> With this patch, you can use only 1 command to obtain the same result as above: ./page_owner_sort <input> <output1> --tgid=1,2,3 2. Update explanations of --pid, --tgid and --name in the function usage() and the document (Documentation/vm/page_owner.rst).
This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220401024856.767-2-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
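Selecting by a comma-separated list such as --tgid=1,2,3 amounts to parsing the list once and testing each block's id for membership. The sketch below shows one way to do that; `parse_id_list()`, `id_selected()`, and `MAX_IDS` are hypothetical names chosen for illustration rather than the tool's own functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_IDS 16

/* Parse "1,2,3" into ids[]. Returns the number of ids parsed, or -1 on
 * an empty element, a non-numeric element, a negative id, or overflow. */
int parse_id_list(const char *arg, long ids[MAX_IDS])
{
	const char *p = arg;
	int n = 0;

	for (;;) {
		char *end;
		long v;

		errno = 0;
		v = strtol(p, &end, 10);
		if (end == p || errno || v < 0 || n == MAX_IDS)
			return -1;
		ids[n++] = v;
		if (*end == '\0')
			return n;
		if (*end != ',')
			return -1;
		p = end + 1;
	}
}

/* A block is selected when its id appears anywhere in the list. */
bool id_selected(long id, const long ids[], int nr_ids)
{
	for (int i = 0; i < nr_ids; i++) {
		if (ids[i] == id)
			return true;
	}
	return false;
}
```

A linear scan is enough here because the lists given on a command line are short; a larger list could be sorted once and searched with bsearch() instead.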
-
Jiajian Ye authored
Error messages should be sent to stderr using fprintf() instead of printf(). This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220401024856.767-1-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
git://anongit.freedesktop.org/drm/drm
Linus Torvalds authored
Pull drm fixes from Dave Airlie: "Another relatively quiet week, amdgpu leads the way, some i915 display fixes, and a single sunxi fix. amdgpu: - Runtime pm fix - DCN memory leak fix in error path - SI DPM deadlock fix - S0ix fix amdkfd: - GWS fix - GWS support for CRIU i915: - Fix #5284: Backlight control regression on XMG Core 15 e21 - Fix black display plane on Acer One AO532h - Two smaller display fixes sunxi: - Single fix removing applying PHYS_OFFSET twice" * tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend drm/amd/pm: fix the deadlock issue observed on SI drm/amd/display: Fix memory leak in dcn21_clock_source_create drm/amdgpu: don't runtime suspend if there are displays attached (v3) drm/amdkfd: CRIU add support for GWS queues drm/amdkfd: Fix GWS queue count drm/sun4i: Remove obsolete references to PHYS_OFFSET drm/i915/fbc: Consult hw.crtc instead of uapi.crtc drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses drm/i915: Check EDID for HDR static metadata when choosing blc drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
-
Dave Airlie authored
Merge tag 'amd-drm-fixes-5.18-2022-04-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.18-2022-04-27: amdgpu: - Runtime pm fix - DCN memory leak fix in error path - SI DPM deadlock fix - S0ix fix amdkfd: - GWS fix - GWS support for CRIU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
-
Dave Airlie authored
Merge tag 'drm-intel-fixes-2022-04-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix #5284: Backlight control regression on XMG Core 15 e21 - Fix black display plane on Acer One AO532h - Two smaller display fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
-
git://anongit.freedesktop.org/drm/drm-misc
Dave Airlie authored
drm-misc-fixes for v5.18-rc5: - Single fix removing applying PHYS_OFFSET twice in sunxi. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
-
- 28 Apr, 2022 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds authored
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf and netfilter.

  Current release - new code bugs:
   - bridge: switchdev: check br_vlan_group() return value
   - use this_cpu_inc() to increment net->core_stats, fix preempt-rt

  Previous releases - regressions:
   - eth: stmmac: fix write to sgmii_adapter_base

  Previous releases - always broken:
   - netfilter: nf_conntrack_tcp: re-init for syn packets only, resolving issues with TCP fastopen
   - tcp: md5: fix incorrect tcp_header_len for incoming connections
   - tcp: fix F-RTO may not work correctly when receiving DSACK
   - tcp: ensure use of most recently sent skb when filling rate samples
   - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
   - virtio_net: fix wrong buf address calculation when using xdp
   - xsk: fix forwarding when combining copy mode with busy poll
   - xsk: fix possible crash when multiple sockets are created
   - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
   - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
   - wireguard: device: check for metadata_dst with skb_valid_dst()
   - netfilter: update ip6_route_me_harder to consider L3 domain
   - gre: make o_seqno start from 0 in native mode
   - gre: switch o_seqno to atomic to prevent races in collect_md mode

  Misc:
   - add Eric Dumazet to networking maintainers
   - dt: dsa: realtek: remove realtek,rtl8367s string
   - netfilter: flowtable: Remove the empty file"

* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  tcp: fix F-RTO may not work correctly when receiving DSACK
  Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
  net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
  ixgbe: ensure IPsec VF<->PF compatibility
  MAINTAINERS: Update BNXT entry with firmware files
  netfilter: nft_socket: only do sk lookups when indev is available
  net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
  bnx2x: fix napi API usage sequence
  tls: Skip tls_append_frag on zero copy size
  Add Eric Dumazet to networking maintainers
  netfilter: conntrack: fix udp offload timeout sysctl
  netfilter: nf_conntrack_tcp: re-init for syn packets only
  net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
  net: Use this_cpu_inc() to increment net->core_stats
  Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
  Bluetooth: hci_event: Fix creating hci_conn object on error status
  Bluetooth: hci_event: Fix checking for invalid handle on error status
  ice: fix use-after-free when deinitializing mailbox snapshot
  ice: wait 5 s for EMP reset after firmware flash
  ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
  ...
-
Linus Torvalds authored
Merge tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These take back recent changes that started to confuse users and fix up an attr.show callback prototype in a driver.

  Specifics:
   - Stop warning about deprecation of the userspace thermal governor and cooling device status interface, because there are cases in which user space has to drive thermal management with the help of them (Daniel Lezcano)
   - Fix attr.show callback prototype in the int340x thermal driver (Kees Cook)"

* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/governor: Remove deprecated information
  Revert "thermal/core: Deprecate changing cooling device state from userspace"
  thermal: int340x: Fix attr.show callback prototype
-
Linus Torvalds authored
Merge tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix up recent intel_idle driver changes and fix some ARM cpufreq driver issues.

  Specifics:
   - Fix issues with the Qualcomm cpufreq driver (Dmitry Baryshkov, Vladimir Zapolskiy).
   - Fix memory leak in the sun50i cpufreq driver (Xiaobing Luo).
   - Make intel_idle enable C1E promotion on all CPUs when C1E is preferred to C1 (Artem Bityutskiy).
   - Make C6 optimization on Sapphire Rapids added recently work as expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"

* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: Fix SPR C6 optimization
  intel_idle: Fix the 'preferred_cstates' module parameter
  cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
  cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
  cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
  cpufreq: qcom-hw: provide online/offline operations
  cpufreq: qcom-hw: fix the opp entries refcounting
  cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
  cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
-
Linus Torvalds authored
Merge tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix up the ACPI processor driver after a change made during the 5.16 cycle that inadvertently broke falling back to shallower C-states when C3 cannot be used.

  Specifics:
   - Make the ACPI processor driver avoid falling back to C3 type of C-states when C3 cannot be requested (Ville Syrjälä)
   - Revert a quirk that is not necessary any more after fixing the underlying issue properly (Ville Syrjälä)"

* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
  ACPI: processor: idle: Avoid falling back to C3 type C-states
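
The fallback rule described in this pull can be illustrated with a rough sketch. It is not the driver's code: the function name `fallback_index`, the list-of-types representation, and the constants are hypothetical and only model the idea that, when C3 cannot be requested, the fallback must land on a state whose type is shallower than C3.

```python
from typing import List, Optional

# Hypothetical numeric C-state types, used for illustration only.
C1, C2, C3 = 1, 2, 3


def fallback_index(state_types: List[int], index: int) -> Optional[int]:
    """Return the deepest state shallower than `index` whose type is below C3.

    `state_types[i]` is the type of idle state i, ordered shallow to deep.
    Returns None when no such state exists.
    """
    for i in range(index - 1, -1, -1):
        if state_types[i] < C3:
            return i
    return None
```

With two C3-type states at the deep end, for example `fallback_index([C1, C2, C3, C3], 3)`, the sketch skips the other C3-type entry and selects the C2-type state rather than simply stepping down by one index.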
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Highlights:
   - asus-wmi bug-fixes
   - intel-sdsi bug-fixes
   - build (warning) fixes
   - couple of hw-id additions"

* tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel: pmc/core: change pmc_lpm_modes to static
  platform/x86/intel/sdsi: Fix bug in multi packet reads
  platform/x86/intel/sdsi: Poll on ready bit for writes
  platform/x86/intel/sdsi: Handle leaky bucket
  platform/x86: intel-uncore-freq: Prevent driver loading in guests
  platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
  platform/x86: dell-laptop: Add quirk entry for Latitude 7520
  platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
  platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
  tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
-
Linus Torvalds authored
Merge tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "A minor fix for the DT binding documentation of the rt5190a driver"

* tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: dt-bindings: Revise the rt5190a buck/ldo description
-
Pengcheng Yang authored
Currently DSACK is regarded as a dupack, which may cause F-RTO to incorrectly enter "loss was real" when receiving DSACK.

Packetdrill to demonstrate:

    // Enable F-RTO and TLP
    0 `sysctl -q net.ipv4.tcp_frto=2`
    0 `sysctl -q net.ipv4.tcp_early_retrans=3`
    0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`

    // Establish a connection
    +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0

    // RTT 10ms, RTO 210ms
    +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
    +0 > S. 0:0(0) ack 1 <...>
    +.01 < . 1:1(0) ack 1 win 257
    +0 accept(3, ..., ...) = 4

    // Send 2 data segments
    +0 write(4, ..., 2000) = 2000
    +0 > P. 1:2001(2000) ack 1

    // TLP
    +.022 > P. 1001:2001(1000) ack 1

    // Continue to send 8 data segments
    +0 write(4, ..., 10000) = 10000
    +0 > P. 2001:10001(8000) ack 1

    // RTO
    +.188 > . 1:1001(1000) ack 1

    // The original data is acked and new data is sent(F-RTO step 2.b)
    +0 < . 1:1(0) ack 2001 win 257
    +0 > P. 10001:12001(2000) ack 1

    // D-SACK caused by TLP is regarded as a dupack, this results in
    // the incorrect judgment of "loss was real"(F-RTO step 3.a)
    +.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>

    // Never-retransmitted data(3001:4001) are acked and
    // expect to switch to open state(F-RTO step 3.b)
    +0 < . 1:1(0) ack 4001 win 257
    +0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%

Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
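
The decision this commit message describes can be modelled with a small, simplified sketch. It is not the kernel patch: the flag names are borrowed from the naming style of the TCP input path, but the bit values, both functions, and the `dsack_aware` switch are assumptions made purely for illustration.

```python
# Hypothetical flag bits; the values are made up for this sketch.
FLAG_SND_UNA_ADVANCED = 0x1  # cumulative ACK point moved forward
FLAG_NOT_DUP = 0x2           # ACK carried data or a window update
FLAG_DSACKING_ACK = 0x4      # ACK carried a D-SACK block


def counts_as_dupack(flag: int, dsack_aware: bool) -> bool:
    """Return True if an ACK with these flags is treated as a duplicate ACK."""
    mask = FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP
    if dsack_aware:
        # Behaviour the commit message asks for: D-SACK is not a dupack.
        mask |= FLAG_DSACKING_ACK
    return (flag & mask) == 0


def frto_verdict(acks, dsack_aware: bool) -> str:
    """Walk the ACKs that arrive after F-RTO step 2.b and report the outcome."""
    for flag in acks:
        if counts_as_dupack(flag, dsack_aware):
            return "loss was real"     # F-RTO step 3.a
        if flag & FLAG_SND_UNA_ADVANCED:
            return "spurious timeout"  # F-RTO step 3.b
    return "undecided"
```

Feeding the sketch the last two ACKs of the packetdrill script, `[FLAG_DSACKING_ACK, FLAG_SND_UNA_ADVANCED]`, gives "loss was real" when `dsack_aware` is False and "spurious timeout" when it is True. That matches the script's expectation that the connection returns to the open state.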
-