1. 30 Nov, 2022 33 commits
  2. 23 Nov, 2022 7 commits
    • test_kprobes: fix implicit declaration error of test_kprobes · de3db3f8
      Li Hua authored
      If KPROBES_SANITY_TEST and ARCH_CORRECT_STACKTRACE_ON_KRETPROBE are enabled
      but STACKTRACE is not set, the build fails as below:
      
      lib/test_kprobes.c: In function `stacktrace_return_handler':
      lib/test_kprobes.c:228:8: error: implicit declaration of function `stack_trace_save'; did you mean `stacktrace_driver'? [-Werror=implicit-function-declaration]
        ret = stack_trace_save(stack_buf, STACK_BUF_SIZE, 0);
              ^~~~~~~~~~~~~~~~
              stacktrace_driver
      cc1: all warnings being treated as errors
      scripts/Makefile.build:250: recipe for target 'lib/test_kprobes.o' failed
      make[2]: *** [lib/test_kprobes.o] Error 1
      
      To fix this error, select STACKTRACE if ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled.
      
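      The error arises because the prototype of stack_trace_save() is only
      visible when CONFIG_STACKTRACE is set.  A simplified sketch of the
      relevant guard (condensed from include/linux/stacktrace.h, so details
      are approximate):

        /*
         * Without CONFIG_STACKTRACE this declaration is compiled out, so
         * callers such as lib/test_kprobes.c get an implicit declaration
         * error instead of a prototype.
         */
        #ifdef CONFIG_STACKTRACE
        unsigned int stack_trace_save(unsigned long *store, unsigned int size,
                                      unsigned int skipnr);
        #endif

      Selecting STACKTRACE from ARCH_CORRECT_STACKTRACE_ON_KRETPROBE keeps
      the prototype visible whenever the test can be built.
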
      Link: https://lkml.kernel.org/r/20221121030620.63181-1-hucool.lihua@huawei.com
      Fixes: 1f6d3a8f ("kprobes: Add a test case for stacktrace from kretprobe handler")
      Signed-off-by: Li Hua <hucool.lihua@huawei.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty · 512c5ca0
      Chen Zhongjin authored
      When extending segments, nilfs_sufile_alloc() is called to get an
      unassigned segment, then mark it as dirty to avoid accidentally allocating
      the same segment in the future.
      
      But in some special cases, such as a corrupted image, this can be
      unreliable.  If the dirty state of a segment is corrupted in this way,
      nilfs2 may reallocate a segment that is in use and pick the same
      segment for writing twice at the same time.
      
      This will cause the problem reported by syzkaller:
      https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd24
      
      In this case, segbuf1 was constructed with segnum = 3 and nextnum = 4,
      on the assumption that segment 4 had already been allocated and marked
      as dirty.

      However, the dirty state was corrupted and segment 4's usage was not
      dirty, so the first call to nilfs_segctor_extend_segments() allocated
      segment 4 again, leaving segbuf2 and the next segbuf3 with the same
      segment 4.
      
      sb_getblk() then returns the same bh for segbuf2 and segbuf3, and this
      bh is added to the buffer lists of both segbufs.  This breaks the
      lists, which causes a NULL pointer dereference.
      
      Fix the problem by setting the usage as dirty every time in
      nilfs_sufile_mark_dirty(), which is called while constructing the
      current segment to be written out, before the next segment is
      allocated.
      
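      A minimal sketch of the shape of such a fix, using the existing sufile
      helpers and taking the metadata lock per Ryusuke's review (error
      handling is condensed, so details are approximate):

        int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum)
        {
                struct nilfs_segment_usage *su;
                struct buffer_head *bh;
                void *kaddr;
                int ret;

                down_write(&NILFS_MDT(sufile)->mi_sem);
                ret = nilfs_sufile_get_segment_usage_block(sufile, segnum,
                                                           0, &bh);
                if (!ret) {
                        kaddr = kmap_atomic(bh->b_page);
                        su = nilfs_sufile_block_get_segment_usage(sufile,
                                                        segnum, bh, kaddr);
                        /* Set the dirty flag unconditionally on every call. */
                        nilfs_segment_usage_set_dirty(su);
                        kunmap_atomic(kaddr);
                        mark_buffer_dirty(bh);
                        nilfs_mdt_mark_dirty(sufile);
                        brelse(bh);
                }
                up_write(&NILFS_MDT(sufile)->mi_sem);
                return ret;
        }
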
      [chenzhongjin@huawei.com: add lock protection per Ryusuke]
        Link: https://lkml.kernel.org/r/20221121091141.214703-1-chenzhongjin@huawei.com
      Link: https://lkml.kernel.org/r/20221118063304.140187-1-chenzhongjin@huawei.com
      Fixes: 9ff05123 ("nilfs2: segment constructor")
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Reported-by: <syzbot+77e4f0...@syzkaller.appspotmail.com>
      Reported-by: Liu Shixin <liushixin2@huawei.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1 · 81a70c21
      Aneesh Kumar K.V authored
      balance_dirty_pages() does not do the required dirty throttling on
      cgroup v1; see commit 9badce00 ("cgroup, writeback: don't enable cgroup
      writeback on traditional hierarchies").  Instead, the kernel depends on
      writeback throttling in shrink_folio_list() to achieve the same goal.
      On large memory systems, the flusher may not be able to write back
      quickly enough, so pages found in shrink_folio_list() are already under
      writeback.  Hence, for cgroup v1, do a reclaim throttle after waking up
      the flusher.
      
      The test below, which used to fail on a 256GB system, completes until
      the file system is full with this change.
      
      root@lp2:/sys/fs/cgroup/memory# mkdir test
      root@lp2:/sys/fs/cgroup/memory# cd test/
      root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
      root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
      root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
      Killed
      
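      A rough sketch of the approach in the reclaim path (the helpers
      wakeup_flusher_threads(), writeback_throttling_sane() and
      reclaim_throttle() exist in mm; the exact condition and placement here
      are simplified):

        /* All dirty pages taken off the LRU were unqueued: wake the
         * flusher, and on cgroup v1 (where balance_dirty_pages() does not
         * throttle per-cgroup) stall reclaim until writeback progresses.
         */
        if (stat.nr_unqueued_dirty == nr_taken) {
                wakeup_flusher_threads(WB_REASON_VMSCAN);
                if (!writeback_throttling_sane(sc))
                        reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
        }
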
      Link: https://lkml.kernel.org/r/20221118070603.84081-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: zefan li <lizefan.x@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix unexpected changes to {failslab|fail_page_alloc}.attr · ea4452de
      Qi Zheng authored
      When we specify __GFP_NOWARN, we only expect that no warnings will be
      issued for the current caller.  But in __should_failslab() and
      __should_fail_alloc_page(), the local GFP flags alter the global
      {failslab|fail_page_alloc}.attr, which is persistent and shared by all
      tasks.  This is not what we expect; fix it.
      
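      A simplified sketch of the approach: derive a per-call fault flag from
      the caller's GFP mask instead of mutating the shared attr (the
      should_fail_ex() signature and FAULT_NOWARN flag follow this patch,
      but the body is condensed):

        bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags)
        {
                int flags = 0;

                /* Suppress the warning only for this call, rather than
                 * writing to failslab.attr, which all tasks share.
                 */
                if (gfpflags & __GFP_NOWARN)
                        flags |= FAULT_NOWARN;

                return should_fail_ex(&failslab.attr, s->object_size, flags);
        }
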
      [akpm@linux-foundation.org: unexport should_fail_ex()]
      Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com
      Fixes: 3f913fc5 ("mm: fix missing handler for __GFP_NOWARN")
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • swapfile: fix soft lockup in scan_swap_map_slots · de1ccfb6
      Chen Wandun authored
      A soft lockup occurs while scanning for a free swap slot under huge
      memory pressure.  The test scenario is: 64 CPU cores, 64GB memory, and
      28 zram devices, each with a disksize of 50MB.

      LATENCY_LIMIT is used to prevent soft lockups in scan_swap_map_slots(),
      but the real number of loop iterations can exceed LATENCY_LIMIT,
      because the "goto checks" and "goto scan" paths repeat without
      decreasing the latency limit.

      To fix it, decrease latency_ration in advance.
      
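      A sketch of the shape of the fix inside the scan loops of
      scan_swap_map_slots() (simplified; the real function has several such
      sites):

        /* Spend latency budget before looping back via "goto checks" or
         * "goto scan", so that LATENCY_LIMIT is honored and the CPU is
         * yielded on long scans.
         */
        if (unlikely(--latency_ration < 0)) {
                cond_resched();
                latency_ration = LATENCY_LIMIT;
        }
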
      There is also a suspicious place that may cause soft lockups in
      get_swap_pages().  In this function, the "goto start_over" may result
      in continuous scanning of the swap partition.  If there were no
      cond_resched() in scan_swap_map_slots(), it would cause a soft lockup
      (I am not sure about this).
      
      WARN: soft lockup - CPU#11 stuck for 11s! [kswapd0:466]
      CPU: 11 PID: 466 Comm: kswapd0 Kdump: loaded Tainted: G
      dump_backtrace+0x0/0x1e4
      show_stack+0x20/0x2c
      dump_stack+0xd8/0x140
      watchdog_print_info+0x48/0x54
      watchdog_process_before_softlockup+0x98/0xa0
      watchdog_timer_fn+0x1ac/0x2d0
      __hrtimer_run_queues+0xb0/0x130
      hrtimer_interrupt+0x13c/0x3c0
      arch_timer_handler_virt+0x3c/0x50
      handle_percpu_devid_irq+0x90/0x1f4
      __handle_domain_irq+0x84/0x100
      gic_handle_irq+0x88/0x2b0
      el1_irq+0xb8/0x140
      scan_swap_map_slots+0x678/0x890
      get_swap_pages+0x29c/0x440
      get_swap_page+0x120/0x2e0
      add_to_swap+0x20/0x9c
      shrink_page_list+0x5d0/0x152c
      shrink_inactive_list+0x16c/0x500
      shrink_lruvec+0x270/0x304
      
      WARN: soft lockup - CPU#32 stuck for 11s! [stress-ng:309915]
      watchdog_timer_fn+0x1ac/0x2d0
      __run_hrtimer+0x98/0x2a0
      __hrtimer_run_queues+0xb0/0x130
      hrtimer_interrupt+0x13c/0x3c0
      arch_timer_handler_virt+0x3c/0x50
      handle_percpu_devid_irq+0x90/0x1f4
      __handle_domain_irq+0x84/0x100
      gic_handle_irq+0x88/0x2b0
      el1_irq+0xb8/0x140
      get_swap_pages+0x1e8/0x440
      get_swap_page+0x1c8/0x2e0
      add_to_swap+0x20/0x9c
      shrink_page_list+0x5d0/0x152c
      reclaim_pages+0x160/0x310
      madvise_cold_or_pageout_pte_range+0x7bc/0xe3c
      walk_pmd_range.isra.0+0xac/0x22c
      walk_pud_range+0xfc/0x1c0
      walk_pgd_range+0x158/0x1b0
      __walk_page_range+0x64/0x100
      walk_page_range+0x104/0x150
      
      Link: https://lkml.kernel.org/r/20221118133850.3360369-1-chenwandun@huawei.com
      Fixes: 048c27fd ("[PATCH] swap: scan_swap_map latency breaks")
      Signed-off-by: Chen Wandun <chenwandun@huawei.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Nanyong Sun <sunnanyong@huawei.com>
      Cc: <xialonglong1@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: fix __prep_compound_gigantic_page page flag setting · 7fb0728a
      Mike Kravetz authored
      Commit 2b21624f ("hugetlb: freeze allocated pages before creating
      hugetlb pages") changed the order in which page flags are cleared and
      set in the head page: it moved __ClearPageReserved to after
      __SetPageHead.  However, there is a check to make sure
      __ClearPageReserved is never done on a head page.  If
      CONFIG_DEBUG_VM_PGFLAGS is enabled, the following BUG will be hit when
      creating a hugetlb gigantic page:
      
          page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
          ------------[ cut here ]------------
          kernel BUG at include/linux/page-flags.h:500!
          Call Trace will differ depending on whether hugetlb page is created
          at boot time or run time.
      
      Make sure to __ClearPageReserved BEFORE __SetPageHead.
      
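      A simplified sketch of the required ordering in
      __prep_compound_gigantic_page() (loop details condensed, so this is
      approximate):

        /* Clear the reserved flag on the head page before marking it as
         * a head page; __ClearPageReserved() on a PageHead page trips
         * VM_BUG_ON_PAGE under CONFIG_DEBUG_VM_PGFLAGS.
         */
        __ClearPageReserved(page);
        __SetPageHead(page);
        for (i = 0; i < nr_pages; i++) {
                p = nth_page(page, i);
                if (i != 0)     /* head page handled above */
                        __ClearPageReserved(p);
                /* ... remaining tail page setup ... */
        }
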
      Link: https://lkml.kernel.org/r/20221118195249.178319-1-mike.kravetz@oracle.com
      Fixes: 2b21624f ("hugetlb: freeze allocated pages before creating hugetlb pages")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Tested-by: Tarun Sahu <tsahu@linux.ibm.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kfence: fix stack trace pruning · 747c0f35
      Marco Elver authored
      Commit b1405135 ("mm/sl[au]b: generalize kmalloc subsystem")
      refactored large parts of the kmalloc subsystem, with the result that
      the stack trace pruning logic done by KFENCE no longer works.
      
      While b1405135 attempted to fix the situation by including
      '__kmem_cache_free' in the list of functions KFENCE should skip
      through, this only works when the compiler actually optimizes the tail
      call from kfree() to __kmem_cache_free() into a jump (and thus kfree()
      does _not_ appear in the full stack trace to begin with).

      In some configurations, the compiler no longer optimizes the tail call
      into a jump, and __kmem_cache_free() appears in the stack trace.  This
      means that the pruned stack trace shown by KFENCE would include
      kfree(), which is not intended - for example:
      
       | BUG: KFENCE: invalid free in kfree+0x7c/0x120
       |
       | Invalid free of 0xffff8883ed8fefe0 (in kfence-#126):
       |  kfree+0x7c/0x120
       |  test_double_free+0x116/0x1a9
       |  kunit_try_run_case+0x90/0xd0
       | [...]
      
      Fix it by moving __kmem_cache_free() to the list of functions that may be
      tail called by an allocator entry function, making the pruning logic work
      in both the optimized and unoptimized tail call cases.
      
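      A simplified sketch of the pruning logic in KFENCE's
      get_stack_skipnr() (mm/kfence/report.c; condensed, so details are
      approximate):

        /* Functions that allocator entry points may tail-call: match
         * them one frame later, so pruning works whether or not kfree()
         * tail-calls __kmem_cache_free().
         */
        if (str_has_prefix(buf, ARCH_FUNC_PREFIX "__kmem_cache_free") ||
            str_has_prefix(buf, ARCH_FUNC_PREFIX "__slab_free")) {
                /* In case of tail calls from any of the below to any
                 * of the above, prune one more frame.
                 */
                fallback = skipnr + 1;
        }

        /* Allocator entry points: prune everything up to and including
         * this frame.
         */
        if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfree") ||
            str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_free"))
                goto found;
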
      Link: https://lkml.kernel.org/r/20221118152216.3914899-1-elver@google.com
      Fixes: b1405135 ("mm/sl[au]b: generalize kmalloc subsystem")
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>