1. 12 Sep, 2022 6 commits
    • mm/thp: add flag to enforce sysfs THP in hugepage_vma_check() · a7f4e6e4
      Zach O'Keefe authored
      MADV_COLLAPSE is not coupled to the kernel-oriented sysfs THP settings[1].
      
      hugepage_vma_check() is the authority on determining if a VMA is eligible
      for THP allocation/collapse, and currently enforces the sysfs THP
      settings.  Add a flag to disable these checks.  For now, only apply this
      arg to anon and file, which use /sys/kernel/transparent_hugepage/enabled. 
      We can expand this to shmem, which uses
      /sys/kernel/transparent_hugepage/shmem_enabled, later.
      
      Use this flag in collapse_pte_mapped_thp() where previously the VMA flags
      passed to hugepage_vma_check() were OR'd with VM_HUGEPAGE to elide the
      VM_HUGEPAGE check in "madvise" THP mode.  Prior to "mm: khugepaged: check
      THP flag in hugepage_vma_check()", this check also didn't check "never"
      THP mode.  As such, this restores the previous behavior of
      collapse_pte_mapped_thp() where sysfs THP settings are ignored.  See
      comment in code for justification why this is OK.
      
      [1] https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/
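
The effect of such a flag can be illustrated with a small userspace model. This is only a sketch of the idea described above, not the kernel's code: `enum thp_sysfs_mode`, `struct vma_model`, and `vma_thp_eligible()` are hypothetical names, and the real hugepage_vma_check() performs many more checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
enum thp_sysfs_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

struct vma_model {
	bool vm_hugepage;    /* models VM_HUGEPAGE */
	bool vm_nohugepage;  /* models VM_NOHUGEPAGE */
};

/* Returns true if the modeled VMA may be backed by a THP. */
static bool vma_thp_eligible(const struct vma_model *vma,
			     enum thp_sysfs_mode mode, bool enforce_sysfs)
{
	assert(vma != NULL);

	/* An explicit per-VMA opt-out is honored regardless of the flag. */
	if (vma->vm_nohugepage)
		return false;

	/* A caller that ignores sysfs settings passes enforce_sysfs = false. */
	if (!enforce_sysfs)
		return true;

	switch (mode) {
	case THP_ALWAYS:
		return true;
	case THP_MADVISE:
		return vma->vm_hugepage;
	case THP_NEVER:
	default:
		return false;
	}
}
```

With `enforce_sysfs` set, a VMA without VM_HUGEPAGE is rejected in "madvise" and "never" modes; with it cleared, only the per-VMA opt-out still applies.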
      
      Link: https://lkml.kernel.org/r/20220706235936.2197195-8-zokeefe@google.com
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: add flag to predicate khugepaged-only behavior · d8ea7cc8
      Zach O'Keefe authored
      Add .is_khugepaged flag to struct collapse_control so khugepaged-specific
      behavior can be elided by MADV_COLLAPSE context.
      
      Start by protecting khugepaged-specific heuristics by this flag.  In
      MADV_COLLAPSE, the user presumably has reason to believe the collapse will
      be beneficial and khugepaged heuristics shouldn't prevent the user from
      doing so:
      
      1) sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]
      
      2) requirement that some pages in region being collapsed be young or
         referenced
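
The two gated heuristics listed above can be sketched in userspace as follows. This is an illustrative model under stated assumptions: `struct scan_stats`, `scan_allows_collapse()`, and the `max_ptes_none` variable are hypothetical, and the struct below is a one-field stand-in for the kernel's struct collapse_control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct collapse_control. */
struct collapse_control {
	bool is_khugepaged;  /* true for the daemon, false for MADV_COLLAPSE */
};

/* Hypothetical summary of one scanned hugepage-sized region. */
struct scan_stats {
	int none_ptes;   /* empty PTEs found in the region */
	int referenced;  /* pages found young or referenced */
};

/* Models the sysfs knob khugepaged_max_ptes_none; the value is illustrative. */
static int max_ptes_none = 511;

static bool scan_allows_collapse(const struct collapse_control *cc,
				 const struct scan_stats *st)
{
	assert(cc != NULL && st != NULL);

	/* Heuristic 1: cap on empty PTEs, applied only for khugepaged. */
	if (cc->is_khugepaged && st->none_ptes > max_ptes_none)
		return false;

	/* Heuristic 2: require some young/referenced pages, khugepaged only. */
	if (cc->is_khugepaged && st->referenced == 0)
		return false;

	return true;
}
```

In this model a region that khugepaged would skip (too sparse, or entirely cold) is still accepted when the request does not come from khugepaged.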
      
      [zokeefe@google.com: consistently order cc->is_khugepaged and pte_* checks]
        Link: https://lkml.kernel.org/r/20220720140603.1958773-3-zokeefe@google.com
        Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
      Link: https://lkml.kernel.org/r/20220706235936.2197195-7-zokeefe@google.com
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: propagate enum scan_result codes back to callers · 50ad2f24
      Zach O'Keefe authored
      Propagate enum scan_result codes back through return values of
      functions downstream of khugepaged_scan_file() and
      khugepaged_scan_pmd() to inform callers if the operation was
      successful, and if not, why.
      
      Since khugepaged_scan_pmd()'s return value already has a specific meaning
      (whether mmap_lock was unlocked or not), add a bool* argument to
      khugepaged_scan_pmd() to retrieve this information.
      
      Change khugepaged to take action based on the return values of
      khugepaged_scan_file() and khugepaged_scan_pmd() instead of acting deep
      within the collapsing functions themselves.
      
      hugepage_vma_revalidate() now returns SCAN_SUCCEED on success to be more
      consistent with enum scan_result propagation.
      
      Remove dependency on error pointers to communicate to khugepaged that
      allocation failed and it should sleep; instead just use the result of the
      scan (SCAN_ALLOC_HUGE_PAGE_FAIL if allocation fails).
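
The calling convention described above, a result code as the return value plus a bool out-parameter for the lock state, might look like this in a minimal userspace sketch. `scan_one_range()` and `should_sleep_after()` are hypothetical helpers, and the enum lists only a few of the kernel's result codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A small subset of result codes; the kernel's enum scan_result has more. */
enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_ALLOC_HUGE_PAGE_FAIL,
};

/* Hypothetical scan step; *mmap_locked reports whether the lock is held. */
static enum scan_result scan_one_range(bool alloc_ok, bool *mmap_locked)
{
	assert(mmap_locked != NULL);

	/* In this model, attempting a collapse always drops the lock. */
	*mmap_locked = false;
	return alloc_ok ? SCAN_SUCCEED : SCAN_ALLOC_HUGE_PAGE_FAIL;
}

/* Caller-side policy: decide whether to back off from the returned code. */
static bool should_sleep_after(enum scan_result result)
{
	return result == SCAN_ALLOC_HUGE_PAGE_FAIL;
}
```

The point of the design is that the policy (sleep, retry, move on) sits with the caller, which only needs the returned code, instead of being buried in the function that hit the failure.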
      
      Link: https://lkml.kernel.org/r/20220706235936.2197195-6-zokeefe@google.com
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: dedup and simplify hugepage alloc and charging · 9710a78a
      Zach O'Keefe authored
      The following code is duplicated in collapse_huge_page() and
      collapse_file():
      
              gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
      
              new_page = khugepaged_alloc_page(hpage, gfp, node);
              if (!new_page) {
                      result = SCAN_ALLOC_HUGE_PAGE_FAIL;
                      goto out;
              }
      
              if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
                      result = SCAN_CGROUP_CHARGE_FAIL;
                      goto out;
              }
              count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
      
      Also, "node" is passed as an argument to both collapse_huge_page() and
      collapse_file() and obtained the same way, via
      khugepaged_find_target_node().
      
      Move all this into a new helper, alloc_charge_hpage(), and remove the
      duplicate code from collapse_huge_page() and collapse_file().  Also,
      simplify khugepaged_alloc_page() by returning a bool indicating allocation
      success instead of a copy of the allocated struct page *.
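
The shape of such a combined helper can be sketched in userspace. This is an assumption-laden model, not the kernel function: `alloc_charge_hpage_model()`, `struct charge_budget`, and `model_charge()` are hypothetical, malloc() stands in for the page allocator, and a byte budget stands in for the memory cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* A small subset of result codes; the kernel's enum scan_result has more. */
enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_ALLOC_HUGE_PAGE_FAIL,
	SCAN_CGROUP_CHARGE_FAIL,
};

#define MODEL_HPAGE_SIZE ((size_t)2 << 20)  /* assumes 2 MiB hugepages */

/* Hypothetical memcg stand-in: a byte budget that charges draw from. */
struct charge_budget {
	size_t remaining;
};

static bool model_charge(struct charge_budget *b, size_t bytes)
{
	if (b->remaining < bytes)
		return false;
	b->remaining -= bytes;
	return true;
}

/*
 * Allocate and charge in one place, so callers only inspect the result.
 * On any failure *hpage is left NULL.
 */
static enum scan_result alloc_charge_hpage_model(void **hpage,
						 struct charge_budget *b)
{
	assert(hpage != NULL && b != NULL);

	*hpage = malloc(MODEL_HPAGE_SIZE);
	if (!*hpage)
		return SCAN_ALLOC_HUGE_PAGE_FAIL;

	if (!model_charge(b, MODEL_HPAGE_SIZE)) {
		free(*hpage);
		*hpage = NULL;
		return SCAN_CGROUP_CHARGE_FAIL;
	}
	return SCAN_SUCCEED;
}
```

Both former call sites would then reduce to a single call followed by a check for SCAN_SUCCEED.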
      
      Link: https://lkml.kernel.org/r/20220706235936.2197195-5-zokeefe@google.com
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Suggested-by: Peter Xu <peterx@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: add struct collapse_control · 34d6b470
      Zach O'Keefe authored
      Modularize hugepage collapse by introducing struct collapse_control.  This
      structure serves to describe the properties of the requested collapse, as
      well as serve as a local scratch pad to use during the collapse itself.
      
      Start by moving global per-node khugepaged statistics into this new
      structure.  Note that this structure is still statically allocated since
      CONFIG_NODES_SHIFT might be arbitrarily large, and stack-allocating a
      MAX_NUMNODES-sized array could cause -Wframe-larger-than= errors.
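
A rough userspace sketch of per-node statistics living in a statically allocated control structure follows. The names `cc_reset()`, `cc_account_page()`, `cc_busiest_node()`, and `MODEL_MAX_NUMNODES` are hypothetical, and choosing the node with the most scanned pages is an assumed policy used for illustration only.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_NUMNODES 8  /* illustrative; the kernel uses MAX_NUMNODES */

/* Simplified stand-in: per-node scratch counters live in the control struct. */
struct collapse_control {
	unsigned int node_load[MODEL_MAX_NUMNODES];
};

/*
 * Statically allocated, mirroring the concern in the message that a large
 * per-node array should not be placed on the stack.
 */
static struct collapse_control model_cc;

static void cc_reset(struct collapse_control *cc)
{
	memset(cc->node_load, 0, sizeof(cc->node_load));
}

/* Record that one scanned page was found on the given node. */
static void cc_account_page(struct collapse_control *cc, int node)
{
	assert(node >= 0 && node < MODEL_MAX_NUMNODES);
	cc->node_load[node]++;
}

/* Assumed policy: prefer the node holding the most scanned pages. */
static int cc_busiest_node(const struct collapse_control *cc)
{
	int best = 0;

	for (int nid = 1; nid < MODEL_MAX_NUMNODES; nid++)
		if (cc->node_load[nid] > cc->node_load[best])
			best = nid;
	return best;
}
```

Keeping the counters inside the structure means each collapse context carries its own scratch state instead of sharing file-scope globals.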
      
      [zokeefe@google.com: use minimal bits to store num page < HPAGE_PMD_NR]
        Link: https://lkml.kernel.org/r/20220720140603.1958773-2-zokeefe@google.com
        Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
      [sfr@canb.auug.org.au: fix build]
        Link: https://lkml.kernel.org/r/20220721195508.15f1e07a@canb.auug.org.au
      [zokeefe@google.com: fix struct collapse_control load_node definition]
        Link: https://lore.kernel.org/linux-mm/202209021349.F73i5d6X-lkp@intel.com/
        Link: https://lkml.kernel.org/r/20220903021221.1130021-1-zokeefe@google.com
      Link: https://lkml.kernel.org/r/20220706235936.2197195-4-zokeefe@google.com
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA · c6a7f445
      Yang Shi authored
      Patch series "mm: userspace hugepage collapse", v7.
      
      Introduction
      --------------------------------
      
      This series provides a mechanism for userspace to induce a collapse of
      eligible ranges of memory into transparent hugepages in process context,
      thus permitting users to more tightly control their own hugepage
      utilization policy at their own expense.
      
      This idea was introduced by David Rientjes[5].
      
      Interface
      --------------------------------
      
      The proposed interface adds a new madvise(2) mode, MADV_COLLAPSE, and
      leverages the new process_madvise(2) call.
      
      process_madvise(2)
      
      	Performs a synchronous collapse of the native pages
      	mapped by the list of iovecs into transparent hugepages.
      
      	This operation is independent of the system THP sysfs settings,
      	but attempts to collapse VMAs marked VM_NOHUGEPAGE will still fail.
      
      	THP allocation may enter direct reclaim and/or compaction.
      
      	When a range spans multiple VMAs, the semantics of the collapse
      	over each VMA are independent of the others.
      
      	Caller must have CAP_SYS_ADMIN if not acting on self.
      
      	Return value follows existing process_madvise(2) conventions.  A
      	“success” indicates that all hugepage-sized/aligned regions
      	covered by the provided range were either successfully
      	collapsed, or were already pmd-mapped THPs.
      
      madvise(2)
      
      	Equivalent to process_madvise(2) on self, with 0 returned on
      	“success”.
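
From userspace, the madvise(2) form of the interface could be exercised roughly as below. This is a sketch under assumptions: `collapse_whole_hugepages()` and `hpage_align_up()` are hypothetical helpers, the hugepage size is assumed to be 2 MiB, and the fallback value 25 for MADV_COLLAPSE is taken from the Linux uapi headers for libcs that do not define it yet.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25  /* Linux uapi value; older libc headers lack it */
#endif

#define ASSUMED_HPAGE_SIZE ((size_t)2 << 20)  /* assumes 2 MiB PMD-sized THPs */

/* Round an address up to the next assumed hugepage boundary. */
static uintptr_t hpage_align_up(uintptr_t addr)
{
	return (addr + ASSUMED_HPAGE_SIZE - 1) &
	       ~(uintptr_t)(ASSUMED_HPAGE_SIZE - 1);
}

/*
 * Ask the kernel to collapse every whole hugepage inside [buf, buf + len).
 * Returns 0 on success and -1 with errno set otherwise, as madvise(2) does.
 * A range containing no whole hugepage is treated as a no-op.
 */
static int collapse_whole_hugepages(void *buf, size_t len)
{
	uintptr_t start = hpage_align_up((uintptr_t)buf);
	uintptr_t end = ((uintptr_t)buf + len) &
			~(uintptr_t)(ASSUMED_HPAGE_SIZE - 1);

	assert(buf != NULL);
	if (end <= start)
		return 0;

	return madvise((void *)start, end - start, MADV_COLLAPSE);
}
```

A caller would typically invoke this on a hot, already-populated region and treat a failure as advisory, since the memory remains usable with native-sized pages.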
      
      Current Use-Cases
      --------------------------------
      
      (1)	Immediately back executable text by THPs.  Current support provided
      	by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
      	system which might impair services from serving at their full rated
      	load after (re)starting.  Tricks like mremap(2)'ing text onto
      	anonymous memory to immediately realize iTLB performance prevent
      	page sharing and demand paging, both of which increase steady state
      	memory footprint.  With MADV_COLLAPSE, we get the best of both
      	worlds: Peak upfront performance and lower RAM footprints.  Note
      	that subsequent support for file-backed memory is required here.
      
      (2)	malloc() implementations that manage memory in hugepage-sized
      	chunks, but sometimes subrelease memory back to the system in
      	native-sized chunks via MADV_DONTNEED; zapping the pmd.  Later,
      	when the memory is hot, the implementation could
      	madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain
      	hugepage coverage and dTLB performance.  TCMalloc is such an
      	implementation that could benefit from this[6].  A prior study of
      	Google internal workloads during evaluation of Temeraire, a
      	hugepage-aware enhancement to TCMalloc, showed that nearly 20% of
      	all cpu cycles were spent in dTLB stalls, and that increasing
      	hugepage coverage by even a small amount can help with that[7].
      
      (3)	userfaultfd-based live migration of virtual machines satisfies UFFD
      	faults by fetching native-sized pages over the network (to avoid
      	latency of transferring an entire hugepage).  However, after guest
      	memory has been fully copied to the new host, MADV_COLLAPSE can
      	be used to immediately increase guest performance.  Note that
      	subsequent support for file/shmem-backed memory is required here.
      
      (4)	HugeTLB high-granularity mapping allows a HugeTLB page to
      	be mapped at different levels in the page tables[8].  As it's not
      	"transparent" like THP, HugeTLB high-granularity mappings require
      	an explicit user API. It is intended that MADV_COLLAPSE be co-opted
      	for this use case[9].  Note that subsequent support for HugeTLB
      	memory is required here.
      
      Future work
      --------------------------------
      
      Only private anonymous memory is supported by this series. File and
      shmem memory support will be added later.
      
      One possible user of this functionality is a userspace agent that
      attempts to optimize THP utilization system-wide by allocating THPs
      based on, for example, task priority, task performance requirements, or
      heatmaps.  For the latter, one idea that has already surfaced is using
      DAMON to identify hot regions, and driving THP collapse through a new
      DAMOS_COLLAPSE scheme[10].
      
      
      This patch (of 17):
      
      khugepaged has an optimization to reduce huge page allocation calls for
      !CONFIG_NUMA: a huge page that was allocated but failed to collapse is
      carried over to the next loop.  CONFIG_NUMA doesn't do so since the next
      loop may try to collapse a huge page from a different node, so it doesn't
      make much sense to carry it.
      
      But when NUMA=n, the huge page is allocated by khugepaged_prealloc_page()
      before scanning the address space, which means a huge page may be
      allocated even though there is no suitable range for collapsing.  The page
      would then simply be freed if khugepaged has already made enough progress.
      This could make a NUMA=n run have 5 times as much thp_collapse_alloc as a
      NUMA=y run.  The extra, pointless THP allocations actually make things
      worse and defeat the purpose of the optimization.
      
      This could be fixed by carrying the huge page across scans, but it will
      complicate the code further and the huge page may be carried indefinitely.
      But if we take one step back, the optimization itself seems not worth
      keeping nowadays since:
      
        * Not many users build NUMA=n kernels nowadays, even when the kernel is
          actually running on a non-NUMA machine. Some small devices may run NUMA=n
          kernels, but I don't think they actually use THP.
        * Since commit 44042b44 ("mm/page_alloc: allow high-order pages to be
          stored on the per-cpu lists"), THP could be cached by pcp.  This
          effectively does the job of the optimization.
      
      Link: https://lkml.kernel.org/r/20220706235936.2197195-1-zokeefe@google.com
      Link: https://lkml.kernel.org/r/20220706235936.2197195-3-zokeefe@google.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Co-developed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 28 Aug, 2022 25 commits
  3. 27 Aug, 2022 9 commits
    • Merge tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 10d4879f
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "Fix two issues introduced recently and one driver problem leading to a
        NULL pointer dereference in some cases.
      
        Specifics:
      
         - Add missing EXPORT_SYMBOL_GPL in the thermal core and add back the
           required 'trips' property to the thermal zone DT bindings (Daniel
           Lezcano)
      
         - Prevent the int340x_thermal driver from crashing when a package
           with a buffer of 0 length is returned by an ACPI control method
           evaluated by it (Lee, Chun-Yi)"
      
      * tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTR
        dt-bindings: thermal: Fix missing required property
        thermal/core: Add missing EXPORT_SYMBOL_GPL
    • Merge tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b98f602d
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Make __resolve_freq() check the presence of the frequency table
        instead of checking whether or not the ->target_index() callback is
        implemented by the driver, because that need not be the case when
        __resolve_freq() is used (Lukasz Luba)"
      
      * tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: check only freq_table in __resolve_freq()
    • Merge tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 2b1ddb59
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix issues introduced by recent changes related to the handling
        of ACPI device properties and a coding mistake in the exit path of the
        ACPI processor driver.
      
        Specifics:
      
         - Prevent acpi_thermal_cpufreq_exit() from attempting to remove
           the same frequency QoS request multiple times (Riwen Lu)
      
         - Fix type detection for integer ACPI device properties (Stefan
           Binding)
      
         - Avoid emitting false-positive warnings when processing ACPI
           device properties and drop the useless default case from the
           acpi_copy_property_array_uint() macro (Sakari Ailus)"
      
      * tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: property: Remove default association from integer maximum values
        ACPI: property: Ignore already existing data node tags
        ACPI: property: Fix type detection of unified integer reading functions
        ACPI: processor: Remove freq Qos request for all CPUs
    • Merge tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · dee18737
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix double free of guarded storage and runtime instrumentation
         control blocks on fork() failure
      
       - Fix triggering write fault when VMA does not allow VM_WRITE
      
      * tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/mm: do not trigger write fault when vma does not allow VM_WRITE
        s390: fix double free of GS and RI CBs on fork() failure
    • Merge tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 05519f24
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - two minor cleanups
      
       - a fix of the xen/privcmd driver avoiding a possible NULL dereference
         in an error case
      
      * tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/privcmd: fix error exit of privcmd_ioctl_dm_op()
        xen: move from strlcpy with unused retval to strscpy
        xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY
    • Merge tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 17b28d42
      Linus Torvalds authored
      Pull audit fix from Paul Moore:
       "Another small audit patch, this time to fix a bug where the return
        codes were not properly set before the audit filters were run,
        potentially resulting in missed audit records"
      
      * tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: move audit_return_fixup before the filters
    • Merge tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 89b749d8
      Linus Torvalds authored
      Pull fbdev fixes and updates from Helge Deller:
       "Mostly just small patches, with the exception of the bigger indenting
        cleanups in the sisfb and radeonfb drivers.
      
        Two patches should be mentioned though: A fix-up for fbdev if the
        screen resize fails (by Shigeru Yoshida), and a potential divide by
        zero fix in fb_pm2fb (by Letu Ren).
      
        Summary:
      
        Major fixes:
         - Revert the changes for fbcon console when vc_resize() fails
           [Shigeru Yoshida]
         - Avoid a potential divide by zero error in fb_pm2fb [Letu Ren]
      
        Minor fixes:
         - Add missing pci_disable_device() in chipsfb_pci_init() [Yang
           Yingliang]
         - Fix tests for platform_get_irq() failure in omapfb [Yu Zhe]
         - Destroy mutex on freeing struct fb_info in fbsysfs [Shigeru
           Yoshida]
      
        Cleanups:
         - Move fbdev drivers from strlcpy to strscpy [Wolfram Sang]
         - Indenting fixes, comment fixes, ... [Jiapeng Chong & Jilin Yuan]"
      
      * tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
        fbdev: fbcon: Properly revert changes when vc_resize() failed
        fbdev: Move fbdev drivers from strlcpy to strscpy
        fbdev: omap: Remove unnecessary print function dev_err()
        fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init()
        fbdev: fbcon: Destroy mutex on freeing struct fb_info
        fbdev: radeon: Clean up some inconsistent indenting
        fbdev: sisfb: Clean up some inconsistent indenting
        fbdev: fb_pm2fb: Avoid potential divide by zero error
        fbdev: ssd1307fb: Fix repeated words in comments
        fbdev: omapfb: Fix tests for platform_get_irq() failure
    • provide arch_test_bit_acquire for architectures that define test_bit · d6ffe606
      Mikulas Patocka authored
      Some architectures define their own arch_test_bit and they also need
      arch_test_bit_acquire, otherwise they won't compile.  We also clean up
      the code by using the generic test_bit if that is equivalent to the
      arch-specific version.
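
As a rough userspace analogue of an acquire-ordered bit test, the following sketch uses C11 atomics. `model_test_bit_acquire()` and `MODEL_BITS_PER_LONG` are hypothetical names, and the kernel's own implementation relies on its own barrier primitives, not on `<stdatomic.h>`.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Test bit nr in a bitmap of unsigned longs. The word is loaded with
 * acquire ordering, so memory accesses that follow the test in program
 * order cannot be reordered before the load.
 */
static bool model_test_bit_acquire(unsigned long nr,
				   _Atomic unsigned long *addr)
{
	_Atomic unsigned long *word;
	unsigned long val;

	assert(addr);
	word = addr + nr / MODEL_BITS_PER_LONG;
	val = atomic_load_explicit(word, memory_order_acquire);

	return (val >> (nr % MODEL_BITS_PER_LONG)) & 1UL;
}
```

A writer would pair this with a release-ordered store or read-modify-write on the same word, so that data published before setting the bit is visible to a reader that observes the bit set.
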
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 8238b457 ("wait_on_bit: add an acquire memory barrier")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • perf stat: Capitalize topdown metrics' names · 48648548
      Zhengjun Xing authored
      Capitalize topdown metrics' names to follow the Intel SDM.
      
      Before:
      
       # ./perf stat -a  sleep 1
      
       Performance counter stats for 'system wide':
      
              228,094.05 msec cpu-clock                        #  225.026 CPUs utilized
                     842      context-switches                 #    3.691 /sec
                     224      cpu-migrations                   #    0.982 /sec
                      70      page-faults                      #    0.307 /sec
              23,164,105      cycles                           #    0.000 GHz
              29,403,446      instructions                     #    1.27  insn per cycle
               5,268,185      branches                         #   23.097 K/sec
                  33,239      branch-misses                    #    0.63% of all branches
             136,248,990      slots                            #  597.337 K/sec
              32,976,450      topdown-retiring                 #     24.2% retiring
               4,651,918      topdown-bad-spec                 #      3.4% bad speculation
              26,148,695      topdown-fe-bound                 #     19.2% frontend bound
              72,515,776      topdown-be-bound                 #     53.2% backend bound
               6,008,540      topdown-heavy-ops                #      4.4% heavy operations       #     19.8% light operations
               3,934,049      topdown-br-mispredict            #      2.9% branch mispredict      #      0.5% machine clears
              16,655,439      topdown-fetch-lat                #     12.2% fetch latency          #      7.0% fetch bandwidth
              41,635,972      topdown-mem-bound                #     30.5% memory bound           #     22.7% Core bound
      
             1.013634593 seconds time elapsed
      
      After:
      
       # ./perf stat -a  sleep 1
      
       Performance counter stats for 'system wide':
      
              228,081.94 msec cpu-clock                        #  225.003 CPUs utilized
                     824      context-switches                 #    3.613 /sec
                     224      cpu-migrations                   #    0.982 /sec
                      67      page-faults                      #    0.294 /sec
              22,647,423      cycles                           #    0.000 GHz
              28,870,551      instructions                     #    1.27  insn per cycle
               5,167,099      branches                         #   22.655 K/sec
                  32,383      branch-misses                    #    0.63% of all branches
             133,411,074      slots                            #  584.926 K/sec
              32,352,607      topdown-retiring                 #     24.3% Retiring
               4,456,977      topdown-bad-spec                 #      3.3% Bad Speculation
              25,626,487      topdown-fe-bound                 #     19.2% Frontend Bound
              70,955,316      topdown-be-bound                 #     53.2% Backend Bound
               5,834,844      topdown-heavy-ops                #      4.4% Heavy Operations       #     19.9% Light Operations
               3,738,781      topdown-br-mispredict            #      2.8% Branch Mispredict      #      0.5% Machine Clears
              16,286,803      topdown-fetch-lat                #     12.2% Fetch Latency          #      7.0% Fetch Bandwidth
              40,802,069      topdown-mem-bound                #     30.6% Memory Bound           #     22.6% Core Bound
      
             1.013683125 seconds time elapsed
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>