1. 19 Jun, 2023 40 commits
    • mm: zswap: remove page reclaim logic from z3fold · e774a7bc
      Domenico Cerasuolo authored
      Switch z3fold to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-4-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: remove page reclaim logic from zbud · 1be537c6
      Domenico Cerasuolo authored
      Switch zbud to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-3-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: add pool shrinking mechanism · f999f38b
      Domenico Cerasuolo authored
      Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
      
      This series aims to improve the zswap reclaim mechanism by reorganizing
      the LRU management. In the current implementation, the LRU is maintained
      within each zpool driver, resulting in duplicated code across the three
      drivers. The proposed change consists in moving the LRU management from
      the individual implementations up to the zswap layer.
      
      The primary objective of this refactoring effort is to simplify the
      codebase. By unifying the reclaim loop and consolidating LRU handling
      within zswap, we can eliminate redundant code and improve
      maintainability. Additionally, this change enables the reclamation of
      stored pages in their actual LRU order. Presently, the zpool drivers
      link backing pages in an LRU, causing compressed pages with different
      LRU positions to be written back simultaneously.
      
      The series consists of several patches. The first patch implements the
      LRU and the reclaim loop in zswap, but it is not used yet because all
      three driver implementations are marked as zpool_evictable.
      The following three commits modify each zpool driver to be not
      zpool_evictable, allowing the use of the reclaim loop in zswap.
      As the drivers removed their shrink functions, the zpool interface is
      then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
      Finally, the code in zswap is further cleaned up by simplifying the
      writeback function and removing the now unnecessary zswap_header.
      
      
      This patch (of 7):
      
      Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
      function, which is called from zpool_shrink.  However, with this commit, a
      unified shrink function is added to zswap.  The ultimate goal is to
      eliminate the need for zpool_shrink once all zpool implementations have
      dropped their shrink code.
      
      To ensure the functionality of each commit, this change focuses solely on
      adding the mechanism itself.  No modifications are made to the backends,
      meaning that functionally, there are no immediate changes.  The zswap
      mechanism will only come into effect once the backends have removed their
      shrink code.  The subsequent commits will address the modifications needed
      in the backends.
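As an illustration only (not code from this series), the idea of a single LRU kept at the zswap layer, with one shrink loop that reclaims from the least recently used end, can be sketched in userspace C. The names `struct entry`, `struct pool`, `pool_store()` and `pool_shrink()` are hypothetical, and setting `written_back` merely stands in for the real writeback to swap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one compressed entry tracked by the pool. */
struct entry {
    struct entry *prev, *next;  /* LRU links */
    int id;
    int written_back;           /* set when the shrink loop reclaims it */
};

/*
 * Hypothetical pool: a doubly linked LRU with a sentinel head.
 * head.next is the most recently stored entry, head.prev the oldest.
 */
struct pool {
    struct entry head;
    size_t nr_entries;
};

static void pool_init(struct pool *p)
{
    p->head.prev = p->head.next = &p->head;
    p->nr_entries = 0;
}

/* New stores are linked at the most recently used end. */
static void pool_store(struct pool *p, struct entry *e)
{
    e->written_back = 0;
    e->next = p->head.next;
    e->prev = &p->head;
    p->head.next->prev = e;
    p->head.next = e;
    p->nr_entries++;
}

/* Reclaim up to nr entries, oldest first; returns how many were reclaimed. */
static size_t pool_shrink(struct pool *p, size_t nr)
{
    size_t done = 0;

    while (done < nr && p->head.prev != &p->head) {
        struct entry *victim = p->head.prev;

        /* Unlink the oldest entry from the list. */
        victim->prev->next = &p->head;
        p->head.prev = victim->prev;
        victim->prev = victim->next = NULL;

        victim->written_back = 1;  /* placeholder for writeback to swap */
        p->nr_entries--;
        done++;
    }
    return done;
}
```

Because every entry is linked individually, entries leave the pool strictly in the order they were stored, which is the per-entry LRU ordering the series description refers to.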
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
      Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: remove duplicate unneeded defines · 0183d777
      Muhammad Usama Anjum authored
      Remove all defines which aren't needed after correctly including the
      kernel header files.
      
      Link: https://lkml.kernel.org/r/20230612095347.996335-2-usama.anjum@collabora.com
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: remove wrong kernel header inclusion · 1e6d1e36
      Muhammad Usama Anjum authored
      It is wrong to include unprocessed user header files directly.  They are
      processed to "<source_tree>/usr/include" by running "make headers" and
      they are included in selftests by kselftest makefiles automatically with
      the help of the KHDR_INCLUDES variable.  These headers should always be
      built first, before building kselftests.
      
      Link: https://lkml.kernel.org/r/20230612095347.996335-1-usama.anjum@collabora.com
      Fixes: 07115fcc ("selftests/mm: add new selftests for KSM")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: ptep_get() conversion · c33c7948
      Ryan Roberts authored
      Convert all instances of direct pte_t* dereferencing to instead use
      ptep_get() helper.  This means that by default, the accesses change from a
      C dereference to a READ_ONCE().  This is technically the correct thing to
      do since where pgtables are modified by HW (for access/dirty) they are
      volatile and therefore we should always ensure READ_ONCE() semantics.
      
      But more importantly, by always using the helper, it can be overridden by
      the architecture to fully encapsulate the contents of the pte.  Arch code
      is deliberately not converted, as the arch code knows best.  It is
      intended that arch code (arm64) will override the default with its own
      implementation that can (e.g.) hide certain bits from the core code, or
      determine young/dirty status by mixing in state from another source.
      
      Conversion was done using Coccinelle:
      
      ----
      
      // $ make coccicheck \
      //          COCCI=ptepget.cocci \
      //          SPFLAGS="--include-headers" \
      //          MODE=patch
      
      virtual patch
      
      @ depends on patch @
      pte_t *v;
      @@
      
      - *v
      + ptep_get(v)
      
      ----
      
      Then reviewed and hand-edited to avoid multiple unnecessary calls to
      ptep_get(), instead opting to store the result of a single call in a
      variable, where it is correct to do so.  This aims to negate any cost of
      READ_ONCE() and will benefit arch-overrides that may be more complex.
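The "read once, reuse the value" pattern described above can be sketched in userspace C. This is an illustration under assumptions, not the kernel's implementation: `pte_t` is reduced to a single 64-bit field, `READ_ONCE_U64()` is a simplified stand-in for READ_ONCE(), the `PTE_YOUNG`/`PTE_DIRTY` bit positions are hypothetical, and `pte_young_and_dirty()` is an invented caller.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in; the real pte_t is defined per architecture. */
typedef struct { uint64_t val; } pte_t;

/* Simplified stand-in for READ_ONCE(): force exactly one volatile load. */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

/* Hypothetical bit positions, chosen only for this example. */
#define PTE_YOUNG (UINT64_C(1) << 0)
#define PTE_DIRTY (UINT64_C(1) << 1)

/* Default-style helper: one load of the entry, returned by value. */
static inline pte_t ptep_get(const pte_t *ptep)
{
    pte_t pte = { READ_ONCE_U64(ptep->val) };
    return pte;
}

static inline int pte_young(pte_t pte) { return (pte.val & PTE_YOUNG) != 0; }
static inline int pte_dirty(pte_t pte) { return (pte.val & PTE_DIRTY) != 0; }

/*
 * Hypothetical caller: take one snapshot with ptep_get(), then test the
 * snapshot as often as needed instead of re-reading through the pointer.
 */
static int pte_young_and_dirty(const pte_t *ptep)
{
    pte_t pte = ptep_get(ptep);

    return pte_young(pte) && pte_dirty(pte);
}
```

Both checks in `pte_young_and_dirty()` see the same snapshot, so a concurrent hardware update cannot make them disagree, and an architecture override of `ptep_get()` would only run once per call.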
      
      Included is a fix for an issue in an earlier version of this patch that
      was pointed out by kernel test robot.  The issue arose because config
      MMU=n elides definition of the ptep helper functions, including
      ptep_get().  HUGETLB_PAGE=n configs still define a simple
      huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
      So when both configs are disabled, this caused a build error because
      ptep_get() is not defined.  Fix by continuing to do a direct dereference
      when MMU=n.  This is safe because for this config the arch code cannot be
      trying to virtualize the ptes because none of the ptep helpers are
      defined.
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move ptep_get() and pmdp_get() helpers · 6c1d2a07
      Ryan Roberts authored
      There are many call sites that directly dereference a pte_t pointer.  This
      makes it very difficult to properly encapsulate a page table in the arch
      code without having to allocate shadow page tables.
      
      We will shortly solve this by replacing all the call sites with ptep_get()
      calls.  But there are call sites above the function definition in the
      header file, so let's move ptep_get() to an earlier location to solve that
      problem.  And move pmdp_get() at the same time to keep it close to
      ptep_get().
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-3-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: ptdump should use ptep_get_lockless() · 426931e7
      Ryan Roberts authored
      Patch series "Encapsulate PTE contents from non-arch code", v3.
      
      A series to improve the encapsulation of pte entries by disallowing
      non-arch code from directly dereferencing pte_t pointers.
      
      This means that by default, the accesses change from a C dereference to a
      READ_ONCE().  This is technically the correct thing to do since where
      pgtables are modified by HW (for access/dirty) they are volatile and
      therefore we should always ensure READ_ONCE() semantics.
      
      But more importantly, by always using the helper, it can be overridden by
      the architecture to fully encapsulate the contents of the pte.  Arch code
      is deliberately not converted, as the arch code knows best.  It is
      intended that arch code (arm64) will override the default with its own
      implementation that can (e.g.) hide certain bits from the core code, or
      determine young/dirty status by mixing in state from another source.
      
      
      This patch (of 3):
      
      The page table dumper uses walk_page_range_novma() to walk the page
      tables, which does not lock the PTL before calling the pte_entry()
      callback.  Therefore, the page table dumper's callback must use
      ptep_get_lockless() rather than ptep_get() to ensure that the pte it reads
      is not torn or otherwise corrupt when racing with writers.
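To make the "torn read" concern concrete, here is a userspace sketch of one way a lockless reader can obtain a consistent value when an entry is stored as two halves. It is an assumption-laden illustration rather than the kernel's ptep_get_lockless(): `struct split_pte` and `split_pte_read_lockless()` are hypothetical names, C11 atomics replace kernel barriers, and the scheme only works if writers always change the low half whenever they change the entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical entry stored as two 32-bit halves that cannot be read atomically together. */
struct split_pte {
    _Atomic uint32_t low;
    _Atomic uint32_t high;
};

/*
 * Read low, then high, then re-check low.  If the low half changed in
 * between, a writer raced with us and the two halves may not belong to
 * the same entry, so retry.  Acquire loads keep the three reads in order.
 */
static uint64_t split_pte_read_lockless(struct split_pte *p)
{
    uint32_t low, high;

    do {
        low = atomic_load_explicit(&p->low, memory_order_acquire);
        high = atomic_load_explicit(&p->high, memory_order_acquire);
    } while (low != atomic_load_explicit(&p->low, memory_order_acquire));

    return ((uint64_t)high << 32) | low;
}
```

A plain dereference of such an entry without the lock could combine the low half of one value with the high half of another; the retry loop is what rules that out for an unlocked walker such as the page table dumper.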
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-1-ryan.roberts@arm.com
      Link: https://lkml.kernel.org/r/20230612151545.3317766-2-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • sh: move the ARCH_DMA_MINALIGN definition to asm/cache.h · e6926a4d
      Catalin Marinas authored
      The sh architecture defines ARCH_DMA_MINALIGN in asm/page.h.  Move it to
      asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in
      linux/cache.h without redefine errors/warnings.
      
      Link: https://lkml.kernel.org/r/20230613155245.1228274-4-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • microblaze: move the ARCH_{DMA,SLAB}_MINALIGN definitions to asm/cache.h · 4ea57ce4
      Catalin Marinas authored
      The microblaze architecture defines ARCH_DMA_MINALIGN in asm/page.h.  Move
      it to asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in
      linux/cache.h without redefine errors/warnings.
      
      While at it, also move ARCH_SLAB_MINALIGN to asm/cache.h for
      consistency.
      
      Link: https://lkml.kernel.org/r/20230613155245.1228274-3-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • powerpc: move the ARCH_DMA_MINALIGN definition to asm/cache.h · 78615c4d
      Catalin Marinas authored
      Patch series "Move the ARCH_DMA_MINALIGN definition to asm/cache.h".
      
      The ARCH_KMALLOC_MINALIGN reduction series defines a generic
      ARCH_DMA_MINALIGN in linux/cache.h:
      
      https://lore.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com/
      
      Unfortunately, this causes a duplicate definition warning for
      microblaze, powerpc (32-bit only) and sh as these architectures define
      ARCH_DMA_MINALIGN in a different file than asm/cache.h. Move the macro
      to asm/cache.h to avoid this issue and also bring them in line with the
      other architectures.
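The ordering problem can be illustrated with a single translation unit that mimics the include order. This is a sketch under assumptions: the value 32 and the `_Alignof(unsigned long long)` fallback are placeholders chosen for the example, and `dma_minalign()` is a hypothetical accessor.

```c
#include <assert.h>
#include <stddef.h>

/* --- stand-in for an architecture's asm/cache.h --- */
/* Hypothetical value; each architecture picks its own. */
#define ARCH_DMA_MINALIGN 32

/* --- stand-in for the generic linux/cache.h, which includes asm/cache.h first --- */
#ifndef ARCH_DMA_MINALIGN
/* Assumed fallback, used only when the architecture defines nothing. */
#define ARCH_DMA_MINALIGN _Alignof(unsigned long long)
#endif

/* Hypothetical accessor so the effective value can be inspected. */
static size_t dma_minalign(void)
{
    return ARCH_DMA_MINALIGN;
}
```

With the architecture definition visible before the `#ifndef`, the generic fallback is skipped. If the architecture instead defined the macro in a header that is processed after the generic one, the fallback would be defined first and the later architecture definition would redefine it, which is the warning the series removes.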
      
      
      This patch (of 3):
      
      The powerpc architecture defines ARCH_DMA_MINALIGN in asm/page_32.h and
      only if CONFIG_NOT_COHERENT_CACHE is enabled (32-bit platforms only). 
      Move this macro to asm/cache.h to allow a generic ARCH_DMA_MINALIGN
      definition in linux/cache.h without redefine errors/warnings.
      
      Link: https://lkml.kernel.org/r/20230613155245.1228274-1-catalin.marinas@arm.com
      Link: https://lkml.kernel.org/r/20230613155245.1228274-2-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202306131053.1ybvRRhO-lkp@intel.com/
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 · 1c1a429e
      Catalin Marinas authored
      With the DMA bouncing of unaligned kmalloc() buffers now in place, enable
      it for arm64 to allow the kmalloc-{8,16,32,48,96} caches.  In addition,
      always create the swiotlb buffer even when the end of RAM is within the
      32-bit physical address range (the swiotlb buffer can still be disabled on
      the kernel command line).
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-18-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible · b035f5a6
      Catalin Marinas authored
      If an architecture opted in to DMA bouncing of unaligned kmalloc() buffers
      (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc() cache
      alignment below cache-line size to ARCH_KMALLOC_MINALIGN.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-17-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • iommu/dma: force bouncing if the size is not cacheline-aligned · 861370f4
      Catalin Marinas authored
      Similarly to the direct DMA, bounce small allocations as they may have
      originated from a kmalloc() cache not safe for DMA. Unlike the direct
      DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
      non-coherent devices as this would break some cases where the iova is
      expected to be contiguous (dmabuf). Instead, scan the scatterlist for
      any small sizes and only go the swiotlb path if any element of the list
      needs bouncing (note that iommu_dma_map_page() would still only bounce
      those buffers which are not DMA-aligned).
      
      To avoid scanning the scatterlist on the 'sync' operations, introduce an
      SG_DMA_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
      dev_use_swiotlb() function together with the newly added
      dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
      kmalloc() buffers (suggested by Robin Murphy).
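The "scan first, then choose one path for the whole list" logic can be sketched in userspace C. Everything except the `SG_DMA_SWIOTLB` flag name is an assumption made for illustration: `struct sg_elem`, `elem_needs_bounce()`, `sg_map()` and `sg_sync_uses_swiotlb()` are hypothetical, the flag's bit value is a guess, and recording the flag on the first element only is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SG_DMA_SWIOTLB (1u << 1)  /* name from the commit text; bit value assumed */

/* Hypothetical, heavily reduced scatterlist element. */
struct sg_elem {
    size_t length;
    unsigned int dma_flags;
};

/* An element is suspect if it is small and not a multiple of the cache alignment. */
static bool elem_needs_bounce(const struct sg_elem *e, size_t small_limit,
                              size_t cache_align)
{
    return e->length < small_limit && e->length % cache_align != 0;
}

/*
 * Scan the list once at map time.  If any element needs bouncing, take
 * the bounce path for the whole list and record that in a flag, so the
 * later sync operations do not have to rescan.  Returns true when the
 * bounce path was chosen.
 */
static bool sg_map(struct sg_elem *sg, size_t n, size_t small_limit,
                   size_t cache_align)
{
    for (size_t i = 0; i < n; i++) {
        if (elem_needs_bounce(&sg[i], small_limit, cache_align)) {
            sg[0].dma_flags |= SG_DMA_SWIOTLB;
            return true;
        }
    }
    return false;
}

/* Sync-time check: a flag test instead of another scan. */
static bool sg_sync_uses_swiotlb(const struct sg_elem *sg)
{
    return (sg->dma_flags & SG_DMA_SWIOTLB) != 0;
}
```

Lists made up only of large or aligned elements keep the normal path, which preserves the contiguous iova that some users expect.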
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-16-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned · 370645f4
      Catalin Marinas authored
      For direct DMA, if the size is small enough to have originated from a
      kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against
      dma_get_cache_alignment() and bounce if necessary.  For larger sizes, it
      is the responsibility of the DMA API caller to ensure proper alignment.
      
      At this point, the kmalloc() caches are properly aligned but this will
      change in a subsequent patch.
      
      Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC.
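A simplified model of the size check, for illustration only: it assumes power-of-two kmalloc() buckets starting at 8 bytes, a hypothetical `ARCH_DMA_MINALIGN` of 128, and invented helper names `bucket_size()` and `needs_bounce()`. The kernel's actual check differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARCH_DMA_MINALIGN 128  /* hypothetical worst-case alignment for this example */

/* Assumed bucket model: round the request up to the next power of two, minimum 8. */
static size_t bucket_size(size_t size)
{
    size_t b = 8;

    while (b < size)
        b <<= 1;
    return b;
}

/*
 * Requests of ARCH_DMA_MINALIGN bytes or more cannot come from one of the
 * small caches, so alignment is left to the DMA API caller.  Smaller
 * requests are bounced unless the bucket that serves them is a multiple
 * of the cache alignment reported at run time.
 */
static bool needs_bounce(size_t size, size_t cache_align)
{
    if (size >= ARCH_DMA_MINALIGN)
        return false;
    return bucket_size(size) % cache_align != 0;
}
```

With a 64-byte cache line, a 16-byte request would be bounced, while a 48-byte request would not, because under the assumed bucket model it is served from a 64-byte bucket.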
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-15-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • dma-mapping: name SG DMA flag helpers consistently · cb147bbe
      Robin Murphy authored
      sg_is_dma_bus_address() is inconsistent with the naming pattern of its
      corresponding setters and its own kerneldoc, so take the majority vote and
      rename it sg_dma_is_bus_address() (and fix up the missing underscores in
      the kerneldoc too).  This gives us a nice clear pattern where SG DMA flags
      are SG_DMA_<NAME>, and the helpers for acting on them are
      sg_dma_<action>_<name>().
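The naming pattern can be shown with a reduced mock. This is a sketch, not the kernel header: `struct scatterlist` is cut down to a single field, the bit value of `SG_DMA_BUS_ADDRESS` is assumed, and the `mark`/`unmark` setter names are assumed to follow the `sg_dma_<action>_<name>()` pattern described above.

```c
#include <assert.h>
#include <stdbool.h>

#define SG_DMA_BUS_ADDRESS (1u << 0)  /* SG_DMA_<NAME>; bit value assumed */

/* Reduced to the one field this example needs. */
struct scatterlist {
    unsigned int dma_flags;
};

/* sg_dma_<action>_<name>(): test the flag. */
static inline bool sg_dma_is_bus_address(const struct scatterlist *sg)
{
    return (sg->dma_flags & SG_DMA_BUS_ADDRESS) != 0;
}

/* sg_dma_<action>_<name>(): set the flag. */
static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
{
    sg->dma_flags |= SG_DMA_BUS_ADDRESS;
}

/* sg_dma_<action>_<name>(): clear the flag. */
static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
{
    sg->dma_flags &= ~SG_DMA_BUS_ADDRESS;
}
```

A new flag would then add one `SG_DMA_<NAME>` constant and helpers with the same three verbs, so call sites read uniformly.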
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-14-catalin.marinas@arm.com
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • scatterlist: add dedicated config for DMA flags · af2880ec
      Robin Murphy authored
      The DMA flags field will be useful for users beyond PCI P2P, so upgrade to
      its own dedicated config option.
      
      [catalin.marinas@arm.com: use #ifdef CONFIG_NEED_SG_DMA_FLAGS in scatterlist.h]
      [catalin.marinas@arm.com: update PCI_P2PDMA dma_flags comment in scatterlist.h]
      Link: https://lkml.kernel.org/r/20230612153201.554742-13-catalin.marinas@arm.com
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      af2880ec
    • Catalin Marinas's avatar
      arm64: allow kmalloc() caches aligned to the smaller cache_line_size() · 9382bc44
      Catalin Marinas authored
      On arm64, ARCH_DMA_MINALIGN is 128, larger than the cache line size on
      most of the current platforms (typically 64).  Define
      ARCH_KMALLOC_MINALIGN to 8 (the default for architectures without their
      own ARCH_DMA_MINALIGN) and override dma_get_cache_alignment() to return
      cache_line_size(), probed at run-time.  The kmalloc() caches will be
      limited to the cache line size.  This will allow the additional
      kmalloc-{64,192} caches on most arm64 platforms.
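      
      To see why a 64-byte cache line brings back kmalloc-64 and kmalloc-192,
      consider this simplified user-space sketch.  It is not the slab code:
      the helper names and the "size must be a multiple of the minimum
      alignment" rule are assumptions made for illustration, using the values
      quoted above (8, 64 and 128).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values quoted in the commit text; the _SKETCH names are illustrative. */
#define ARCH_DMA_MINALIGN_SKETCH     128
#define ARCH_KMALLOC_MINALIGN_SKETCH 8

/*
 * Effective minimum alignment: the larger of the static kmalloc() minimum
 * and the cache line size probed at run-time (assumed to be a power of two).
 */
static size_t kmalloc_minalign_sketch(size_t cache_line_size)
{
    return cache_line_size > ARCH_KMALLOC_MINALIGN_SKETCH
        ? cache_line_size : ARCH_KMALLOC_MINALIGN_SKETCH;
}

/*
 * Simplifying assumption: a cache for objects of a given size is only
 * usable as-is when that size is a multiple of the minimum alignment.
 */
static bool kmalloc_cache_usable_sketch(size_t size, size_t minalign)
{
    return size >= minalign && (size & (minalign - 1)) == 0;
}
```

      Under these assumptions, sizes 64 and 192 pass the check for a 64-byte
      line but fail it for the static 128-byte value, while 96 fails in both
      cases.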
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-12-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9382bc44
    • Catalin Marinas's avatar
      iio: core: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · 88b216d3
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
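      
      The "static alignment" use case can be pictured with a small user-space
      sketch.  The struct and the _SKETCH macro are hypothetical, and C11
      alignas() stands in for the kernel's __aligned() annotation; the value
      128 is the arm64 figure quoted elsewhere in this series.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the compile-time DMA alignment (128 on arm64). */
#define ARCH_DMA_MINALIGN_SKETCH 128

/* Hypothetical driver-private structure with an embedded DMA buffer. */
struct sensor_priv_sketch {
    int id;
    /*
     * The buffer starts on its own DMA-aligned boundary so that it does
     * not share that aligned region with the CPU-owned fields before it.
     */
    alignas(ARCH_DMA_MINALIGN_SKETCH) uint8_t rx_buf[16];
};
```

      Because the annotation is evaluated at compile time, it has to use the
      static worst-case value rather than anything probed at run-time, which
      is why ARCH_KMALLOC_MINALIGN is the wrong constant once the two can
      differ.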
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-11-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      88b216d3
    • Catalin Marinas's avatar
      dm-crypt: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · 7bc75714
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-10-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7bc75714
    • Catalin Marinas's avatar
      drivers/spi: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · 3cbbb410
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-9-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Mark Brown <broonie@kernel.org>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3cbbb410
    • Catalin Marinas's avatar
      drivers/usb: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · 075efe7c
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-8-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      075efe7c
    • Catalin Marinas's avatar
      drivers/gpu: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · 6716ccaf
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-7-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6716ccaf
    • Catalin Marinas's avatar
      drivers/base: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · be6a5b5e
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-6-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      be6a5b5e
    • Catalin Marinas's avatar
      mm/slab: limit kmalloc() minimum alignment to dma_get_cache_alignment() · 963e84b0
      Catalin Marinas authored
      Do not create kmalloc() caches which are not aligned to
      dma_get_cache_alignment().  There is no functional change since for
      current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN
      equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()).  On
      architectures without a specific ARCH_DMA_MINALIGN,
      dma_get_cache_alignment() is 1, so no change to the kmalloc() caches.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-5-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      963e84b0
    • Catalin Marinas's avatar
      mm/slab: simplify create_kmalloc_cache() args and make it static · 0c474d31
      Catalin Marinas authored
      In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead
      of initialising the kmalloc_caches array directly.  With this,
      create_kmalloc_cache() is now only called from new_kmalloc_cache() in the
      same file, so make it static.  In addition, the useroffset argument is
      always 0 while usersize is the same as size.  Remove them.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0c474d31
    • Catalin Marinas's avatar
      dma: allow dma_get_cache_alignment() to be overridden by the arch code · 8c57da28
      Catalin Marinas authored
      On arm64, ARCH_DMA_MINALIGN is larger than most cache line size
      configurations deployed.  Allow an architecture to override
      dma_get_cache_alignment() in order to return a run-time probed value (e.g.
      cache_line_size()).
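      
      One way such an override can be arranged is a macro hook that the
      generic header checks before providing its default.  The sketch below is
      a user-space illustration under that assumption: cache_line_size_sketch()
      and its value of 64 are hypothetical, and the three sections would live
      in separate headers in a real tree.

```c
#include <assert.h>

/* Static worst-case DMA alignment, as an architecture would define it. */
#define ARCH_DMA_MINALIGN 128

/* Hypothetical run-time probe; a fixed 64 stands in for the probed value. */
static inline int cache_line_size_sketch(void)
{
    return 64;
}

/* "Arch header": opt in to the run-time value by defining the hook. */
#define dma_get_cache_alignment cache_line_size_sketch

/* "Generic header": fall back to the static value only when not overridden. */
#ifndef dma_get_cache_alignment
static inline int dma_get_cache_alignment(void)
{
    return ARCH_DMA_MINALIGN;
}
#endif
```

      Callers keep using dma_get_cache_alignment() unchanged; with the hook
      defined they get the probed 64 instead of the static 128.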
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-3-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8c57da28
    • Catalin Marinas's avatar
      mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN · 4ab5f8ec
      Catalin Marinas authored
      Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7.
      
      A series reducing the kmalloc() minimum alignment on arm64 to 8 (from
      128).
      
      This patch (of 17):
      
      In preparation for supporting a kmalloc() minimum alignment smaller than
      the arch DMA alignment, decouple the two definitions.  This requires that
      either the kmalloc() caches are aligned to a (run-time) cache-line size or
      the DMA API bounces unaligned kmalloc() allocations.  Subsequent patches
      will implement both options.
      
      After this patch, ARCH_DMA_MINALIGN is expected to be used in static
      alignment annotations and defined by an architecture to be the maximum
      alignment for all supported configurations/SoCs in a single Image. 
      Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
      define its value in the arch headers.
      
      Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
      dma_get_cache_alignment() so that there is no change for architectures not
      requiring a minimum DMA alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com
      Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4ab5f8ec
    • Peter Xu's avatar
      mm/hugetlb: fix pgtable lock on pmd sharing · 349d1670
      Peter Xu authored
      Huge pmd sharing operates on the PUD, not the PMD, so huge_pte_lock() is
      not suitable here: it should only be used for last-level pte changes,
      while pmd sharing is always one level higher.
      
      Meanwhile, here we are taking the spte pgtable lock, which is not even a
      lock of the current mm but of someone else's.
      
      It seems even racy on operating on the lock, as after put_page() of the
      spte pgtable page logically the page can be released, so at least the
      spin_unlock() needs to be done after the put_page().
      
      I am not aware of any report.  Taking the spte pmd lock may even happen
      to work: while we hold the i_mmap read lock the vma interval tree is
      probably frozen, so all pte allocators over this pud entry would always
      find the same svma and spte page, and so might serialize on this spte
      page lock.  Even so, that does not seem to be the intended behaviour; it
      looks like an accident of cb900f41.
      
      Fix it with the proper pud lock (which is the mm's page_table_lock).
      
      Link: https://lkml.kernel.org/r/20230612160420.809818-1-peterx@redhat.com
      Fixes: cb900f41 ("mm, hugetlb: convert hugetlbfs to use split pmd lock")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      349d1670
    • Sidhartha Kumar's avatar
      mm: remove set_compound_page_dtor() · b95826c9
      Sidhartha Kumar authored
      All users can use the folio equivalent so this function can be safely
      removed.
      
      Link: https://lkml.kernel.org/r/20230612163405.99345-1-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Tarun Sahu <tsahu@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b95826c9
    • Hugh Dickins's avatar
      perf/core: allow pte_offset_map() to fail · a92cbb82
      Hugh Dickins authored
      In rare transient cases, not yet made possible, pte_offset_map() and
      pte_offset_map_lock() may not find a page table: handle appropriately.
      
      [hughd@google.com: __wp_page_copy_user(): don't call update_mmu_tlb() with NULL]
        Link: https://lkml.kernel.org/r/1a4db221-7872-3594-57ce-42369945ec8d@google.com
      Link: https://lkml.kernel.org/r/a194441b-63f3-adb6-5964-7ca3171ae7c2@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a92cbb82
    • Hugh Dickins's avatar
      mm/swap: swap_vma_readahead() do the pte_offset_map() · 4f8fcf4c
      Hugh Dickins authored
      swap_vma_readahead() has been proceeding in an unconventional way, its
      preliminary swap_ra_info() doing the pte_offset_map() and pte_unmap(),
      then relying on that pte pointer even after the pte_unmap() - in its
      CONFIG_64BIT case (I think !CONFIG_HIGHPTE was intended; whereas 32-bit
      copied ptes to stack while they were mapped, but had to limit how many).
      
      Though it would be difficult to construct a failing testcase, accessing
      page table after pte_unmap() will become bad practice, even on 64-bit: an
      rcu_read_unlock() in pte_unmap() will allow page table to be freed.
      
      Move relevant definitions from include/linux/swap.h to mm/swap_state.c,
      nothing else used them.  Delete the CONFIG_64BIT distinction and buffer,
      delete all reference to ptes from swap_ra_info(), use pte_offset_map()
      repeatedly in swap_vma_readahead(), breaking from the loop if it fails.
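      
      The "map each entry, stop if the map fails" shape of that loop can be
      sketched in user space as follows.  Everything here is a mock: the
      pmd_sketch and pte_sketch_t types and pte_offset_map_sketch() are
      hypothetical stand-ins, with a NULL table modelling a page table that
      has gone away.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pte_sketch_t;

/* Mock pmd entry: either points at a table of ptes or at nothing. */
struct pmd_sketch {
    pte_sketch_t *table;
};

/* Mock of a mapping helper that is allowed to fail by returning NULL. */
static pte_sketch_t *pte_offset_map_sketch(struct pmd_sketch *pmd, size_t index)
{
    if (!pmd->table)
        return NULL;
    return &pmd->table[index];
}

/* Count non-zero entries, mapping each one afresh and stopping on failure. */
static size_t count_present_sketch(struct pmd_sketch *pmd, size_t nr)
{
    size_t present = 0;

    for (size_t i = 0; i < nr; i++) {
        pte_sketch_t *pte = pte_offset_map_sketch(pmd, i);

        if (!pte)
            break;          /* page table not found: give up quietly */
        if (*pte)
            present++;
        /* a real caller would unmap here before the next iteration */
    }
    return present;
}
```

      The point of the shape is that no pte pointer outlives its own
      iteration, so a table that disappears between iterations just ends the
      loop early.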
      
      (Will the repeated "map" and "unmap" show up as a slowdown anywhere?  If
      so, maybe modify __read_swap_cache_async() to do the pte_unmap() only when
      it does not find the page already in the swapcache.)
      
      Use ptep_get_lockless(), mainly for its READ_ONCE().  Correctly advance
      the address passed down to each call of __read_swap_cache_async().
      
      Link: https://lkml.kernel.org/r/b7c64ab3-9e44-aac0-d2b-c57de578af1c@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4f8fcf4c
    • Hugh Dickins's avatar
      mm/pgtable: delete pmd_trans_unstable() and friends · feda5c39
      Hugh Dickins authored
      Delete pmd_trans_unstable(), pmd_none_or_trans_huge_or_clear_bad() and
      pmd_devmap_trans_unstable(), all now unused.
      
      With mixed feelings, delete all the comments on pmd_trans_unstable(). 
      That was very good documentation of a subtle state, and this series does
      not even eliminate that state: but rather, normalizes and extends it,
      asking pte_offset_map[_lock]() callers to anticipate failure, without
      regard for whether mmap_read_lock() or mmap_write_lock() is held.
      
      Retain pud_trans_unstable(), which has one use in __handle_mm_fault(), but
      delete its equivalent pud_none_or_trans_huge_or_dev_or_clear_bad().  While
      there, move the default arch_needs_pgtable_deposit() definition up near
      where pgtable_trans_huge_deposit() and withdraw() are declared.
      
      Link: https://lkml.kernel.org/r/5abdab3-3136-b42e-274d-9c6281bfb79@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      feda5c39
    • Hugh Dickins's avatar
      mm/memory: handle_pte_fault() use pte_offset_map_nolock() · c7ad0880
      Hugh Dickins authored
      handle_pte_fault() use pte_offset_map_nolock() to get the vmf.ptl which
      corresponds to vmf.pte, instead of pte_lockptr() being used later, when
      there's a chance that the pmd entry might have changed, perhaps to none,
      or to a huge pmd, with no split ptlock in its struct page.
      
      Remove its pmd_devmap_trans_unstable() call: pte_offset_map_nolock() will
      handle that case by failing.  Update the "morph" comment above, looking
      forward to when shmem or file collapse to THP may not take mmap_lock for
      write (or not at all).
      
      do_numa_page() use the vmf->ptl from handle_pte_fault() at first, but
      refresh it when refreshing vmf->pte.
      
      do_swap_page()'s pte_unmap_same() (the thing that takes ptl to verify a
      two-part PAE orig_pte) use the vmf->ptl from handle_pte_fault() too; but
      do_swap_page() is also used by anon THP's __collapse_huge_page_swapin(),
      so adjust that to set vmf->ptl by pte_offset_map_nolock().
      
      Link: https://lkml.kernel.org/r/c1107654-3929-60ac-223e-6877cbb86065@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c7ad0880
    • Hugh Dickins's avatar
      mm/memory: allow pte_offset_map[_lock]() to fail · 3db82b93
      Hugh Dickins authored
      copy_pte_range(): use pte_offset_map_nolock(), and allow for it to fail;
      but with a comment on some further assumptions that are being made there.
      
      zap_pte_range() and zap_pmd_range(): adjust their interaction so that a
      pte_offset_map_lock() failure in zap_pte_range() leads to a retry in
      zap_pmd_range(); remove call to pmd_none_or_trans_huge_or_clear_bad().
      
      Allow pte_offset_map_lock() to fail in many functions.  Update comment on
      calling pte_alloc() in do_anonymous_page().  Remove redundant calls to
      pmd_trans_unstable(), pmd_devmap_trans_unstable(), pmd_none() and
      pmd_bad(); but leave pmd_none_or_clear_bad() calls in free_pmd_range() and
      copy_pmd_range(), those do simplify the next level down.
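
      The retry interaction between zap_pte_range() and zap_pmd_range() described above can be illustrated with a small userspace model. This is only a sketch of the control flow, not the kernel code: every name prefixed with model_ or MODEL_ is hypothetical, and the "transient failure" counter is an assumption standing in for a pmd that changed underneath the caller. In the kernel, the retry gives zap_pmd_range() a chance to re-examine the pmd, which this model does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PMD_SPAN 8UL /* hypothetical: addresses covered by one pmd entry */

/* Hypothetical stand-in for a pmd entry and the page table it points to. */
struct model_pmd {
    int transient_failures;      /* times the map attempt fails before succeeding */
    bool zapped[MODEL_PMD_SPAN]; /* one flag per modelled pte */
};

/* Stand-in for pte_offset_map_lock(): NULL means the table could not be mapped. */
static bool *model_map_lock(struct model_pmd *pmd)
{
    if (pmd->transient_failures > 0) {
        pmd->transient_failures--;
        return NULL;
    }
    return pmd->zapped;
}

/* Lower level: returns the address reached, or addr unchanged if mapping failed. */
static unsigned long model_zap_pte_range(struct model_pmd *pmd,
                                         unsigned long addr, unsigned long end)
{
    bool *pte = model_map_lock(pmd);

    if (!pte)
        return addr; /* no progress: tell the caller to look again */
    for (; addr < end; addr++)
        pte[addr % MODEL_PMD_SPAN] = true;
    return addr;
}

/* Upper level: advance only on progress, otherwise retry the same entry. */
static int model_zap_pmd_range(struct model_pmd *pmds,
                               unsigned long addr, unsigned long end)
{
    int retries = 0;

    while (addr < end) {
        unsigned long next = (addr / MODEL_PMD_SPAN + 1) * MODEL_PMD_SPAN;
        unsigned long reached;

        if (next > end)
            next = end;
        reached = model_zap_pte_range(&pmds[addr / MODEL_PMD_SPAN], addr, next);
        if (reached != next)
            retries++; /* lower level failed; same entry is retried */
        addr = reached;
    }
    return retries;
}
```

      The design point the model tries to capture is that the lower level reports failure simply by not advancing the address, so the caller needs no separate error code.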
      
      Link: https://lkml.kernel.org/r/bb548d50-e99a-f29e-eab1-a43bef2a1287@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3db82b93
    • Hugh Dickins's avatar
      mm/khugepaged: allow pte_offset_map[_lock]() to fail · 895f5ee4
      Hugh Dickins authored
      __collapse_huge_page_swapin(): don't drop the map after every pte, it only
      has to be dropped by do_swap_page(); give up if pte_offset_map() fails;
      trace_mm_collapse_huge_page_swapin() at the end, with result; fix comment
      on returned result; fix vmf.pgoff, though it's not used.
      
      collapse_huge_page(): use pte_offset_map_lock() on the _pmd returned from
      clearing; allow failure, but it should be impossible there. 
      hpage_collapse_scan_pmd() and collapse_pte_mapped_thp() allow for
      pte_offset_map_lock() failure.
      
      Link: https://lkml.kernel.org/r/6513e85-d798-34ec-3762-7c24ffb9329@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      895f5ee4
    • Hugh Dickins's avatar
      mm/huge_memory: split huge pmd under one pte_offset_map() · c9c1ee20
      Hugh Dickins authored
      __split_huge_zero_page_pmd() use a single pte_offset_map() to sweep the
      extent: it's already under pmd_lock(), so this is no worse for latency;
      and since it's supposed to have full control of the just-withdrawn page
      table, here choose to VM_BUG_ON if it were to fail.  And please don't
      increment haddr by PAGE_SIZE, that should remain huge aligned: declare a
      separate addr (not a bugfix, but it was deceptive).
      
      __split_huge_pmd_locked() likewise (but it had declared a separate addr);
      and change its BUG_ON(!pte_none) to VM_BUG_ON, for consistency with zero
      (those deposited page tables are sometimes victims of random corruption).
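
      The point about keeping haddr huge aligned while a separate addr walks the extent can be shown with a short userspace sketch. It is not the kernel implementation: the MODEL_ constants assume 4 KiB pages and 512 entries per table, model_split_sweep() is a hypothetical name, and a plain array stands in for the page table mapped once for the whole sweep.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE  4096UL /* assumption: 4 KiB small pages */
#define MODEL_HPAGE_NR   512UL  /* assumption: 512 ptes per page table */
#define MODEL_HPAGE_SIZE (MODEL_PAGE_SIZE * MODEL_HPAGE_NR)

/*
 * Fill each slot of the (already "mapped") table with the address of the
 * small page it covers. haddr is never modified; a separate addr advances.
 * Returns haddr so callers can check that it is still huge aligned.
 */
static unsigned long model_split_sweep(unsigned long haddr, unsigned long *table)
{
    unsigned long addr;
    size_t i;

    assert((haddr & (MODEL_HPAGE_SIZE - 1)) == 0);

    /* One table pointer is used for the entire extent. */
    for (i = 0, addr = haddr; i < MODEL_HPAGE_NR; i++, addr += MODEL_PAGE_SIZE)
        table[i] = addr;

    return haddr;
}
```

      Incrementing haddr itself would produce the same table contents, which is why the commit message calls the old form deceptive rather than buggy: any later use of haddr would silently see an unaligned value.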
      
      Link: https://lkml.kernel.org/r/90cbed7f-90d9-b779-4a46-d2485baf9595@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c9c1ee20
    • Hugh Dickins's avatar
      mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable() · 2378118b
      Hugh Dickins authored
      There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
      distinguish huge_zero_page from a normal THP: follow_page_pte() handles
      any instability, and here it's a good idea to replace any pmd_none(*pmd)
      by a page table a.s.a.p, in the huge_zero_page case as for a normal THP;
      and this removes an unnecessary possibility of -EBUSY failure.
      
      (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
      before?  But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
      
      Link: https://lkml.kernel.org/r/59fd15dd-4d39-5ec-2043-1d5117f7f85@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2378118b
    • Hugh Dickins's avatar
      mm/migrate_device: allow pte_offset_map_lock() to fail · 4b56069c
      Hugh Dickins authored
      migrate_vma_collect_pmd(): remove the pmd_trans_unstable() handling after
      splitting huge zero pmd, and the pmd_none() handling after successfully
      splitting huge page: those are now managed inside pte_offset_map_lock(),
      and by "goto again" when it fails.
      
      But the skip after unsuccessful split_huge_page() must stay: it avoids an
      endless loop.  The skip when pmd_bad()?  Remove that: it will be treated
      as a hole rather than a skip once cleared by pte_offset_map_lock(), but
      with different timing that would be so anyway; and it's arguably best to
      leave the pmd_bad() handling centralized there.
      
      migrate_vma_insert_page(): remove comment on the old pte_offset_map() and
      old locking limitations; remove the pmd_trans_unstable() check and just
      proceed to pte_offset_map_lock(), aborting when it fails (page has been
      charged to memcg, but as in other cases, it's uncharged when freed).
      
      Link: https://lkml.kernel.org/r/1131be62-2e84-da2f-8f45-807b2cbeeec5@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4b56069c
    • Hugh Dickins's avatar
      mm/mglru: allow pte_offset_map_nolock() to fail · 52fc0483
      Hugh Dickins authored
      MGLRU's walk_pte_range() use the safer pte_offset_map_nolock(), rather
      than pte_lockptr(), to get the ptl for its trylock.  Just return false and
      move on to next extent if it fails, like when the trylock fails.  Remove
      the VM_WARN_ON_ONCE(pmd_leaf) since that will happen, rarely.
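
      As a rough userspace illustration of the walk_pte_range() change, the sketch below obtains the entries and their lock together from one lookup, and treats a failed lookup exactly like a failed trylock. All model_ names are hypothetical, the lock is a plain flag rather than a spinlock, and the four-entry table is an arbitrary size chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PTES 4 /* arbitrary table size for the example */

struct model_lock {
    bool held;
};

struct model_table {
    bool present;             /* false: the pmd no longer points to a page table */
    struct model_lock ptl;
    int young[MODEL_NR_PTES]; /* 1 if the modelled pte was recently accessed */
};

static bool model_trylock(struct model_lock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void model_unlock(struct model_lock *lock)
{
    lock->held = false;
}

/* Stand-in for pte_offset_map_nolock(): entries and matching lock, or NULL. */
static int *model_map_nolock(struct model_table *table, struct model_lock **ptlp)
{
    if (!table->present)
        return NULL;
    *ptlp = &table->ptl;
    return table->young;
}

/* Returns true if the extent was walked, false if the caller should move on. */
static bool model_walk_pte_range(struct model_table *table, int *nr_young)
{
    struct model_lock *ptl;
    int *pte = model_map_nolock(table, &ptl);
    size_t i;

    if (!pte)
        return false; /* no page table here any more: skip this extent */
    if (!model_trylock(ptl))
        return false; /* contended: skip, same as before the change */

    for (i = 0; i < MODEL_NR_PTES; i++)
        *nr_young += pte[i];

    model_unlock(ptl);
    return true;
}
```

      Because the lock pointer comes from the same lookup that produced the entries, the model never locks a table other than the one it is about to read, which is the property the safer helper is meant to provide.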
      
      Link: https://lkml.kernel.org/r/51ece73e-7398-2e4a-2384-56708c87844f@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Yu Zhao <yuzhao@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      52fc0483