1. 18 Aug, 2023 40 commits
    • mm/page_io: remove unneeded SetPageError() · 9962ed64
      ZhangPeng authored
      Nobody checks the PageError()/folio_test_error() for the page/folio in
      __end_swap_bio_read/write() and sio_write_complete(). Therefore, we
      don't need to set the error flag. Just drop it.
      
      Link: https://lkml.kernel.org/r/20230721034451.16412-3-zhangpeng362@huawei.com
      Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
      Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Nanyong Sun <sunnanyong@huawei.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_io: remove unneeded ClearPageUptodate() · 479c3304
      ZhangPeng authored
      Patch series "Convert several functions in page_io.c to use a folio", v4.
      
      Convert several functions in page_io.c to use a folio, which can remove
      several implicit calls to compound_head().
      
      
      This patch (of 10):
      
      The VM_BUG_ON_FOLIO in swap_readpage() ensures that the page is already
      !uptodate in __end_swap_bio_read() and sio_read_complete().  Just remove
      unneeded ClearPageUptodate().
      
      Link: https://lkml.kernel.org/r/20230721034451.16412-1-zhangpeng362@huawei.com
      Link: https://lkml.kernel.org/r/20230721034451.16412-2-zhangpeng362@huawei.com
      Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
      Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Nanyong Sun <sunnanyong@huawei.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/compaction: avoid unneeded pageblock_end_pfn when no_set_skip_hint is set · 3c099a2b
      Kemeng Shi authored
      Move the pageblock_end_pfn computation after the no_set_skip_hint check so
      that it is not evaluated needlessly when no_set_skip_hint is set.
      
      Link: https://lkml.kernel.org/r/20230721150957.2058634-3-shikemeng@huawei.com
      Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/compaction: correct comment of candidate pfn in fast_isolate_freepages · e6bd14ec
      Kemeng Shi authored
      Patch series "Two minor cleanups for compaction", v2.
      
      This series contains two random cleanups for compaction.
      
      
      This patch (of 2):
      
      If no preferred one was found, we will use the candidate page with the
      maximum pfn > min_pfn, which is saved in high_pfn.  Correct "minimum" to
      "maximum candidate" in the comment.
      
      Link: https://lkml.kernel.org/r/20230721150957.2058634-1-shikemeng@huawei.com
      Link: https://lkml.kernel.org/r/20230721150957.2058634-2-shikemeng@huawei.com
      Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/mprotect: fix obsolete function name in change_pte_range() · eafcb7a9
      Miaohe Lin authored
      Since commit 79a1971c ("mm: move the copy_one_pte() pte_present check
      into the caller"), the explanation of preserving soft-dirtiness is moved
      into copy_nonpresent_pte().  Update corresponding comment.
      
      Link: https://lkml.kernel.org/r/20230723033114.3224409-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: run all tests from run_vmtests.sh · 05f1edac
      Ryan Roberts authored
      It is very unclear to me how one is supposed to run all the mm selftests
      consistently and get clear results.
      
      Most of the test programs are launched by both run_vmtests.sh and
      run_kselftest.sh:
      
        hugepage-mmap
        hugepage-shm
        map_hugetlb
        hugepage-mremap
        hugepage-vmemmap
        hugetlb-madvise
        map_fixed_noreplace
        gup_test
        gup_longterm
        uffd-unit-tests
        uffd-stress
        compaction_test
        on-fault-limit
        map_populate
        mlock-random-test
        mlock2-tests
        mrelease_test
        mremap_test
        thuge-gen
        virtual_address_range
        va_high_addr_switch
        mremap_dontunmap
        hmm-tests
        madv_populate
        memfd_secret
        ksm_tests
        ksm_functional_tests
        soft-dirty
        cow
      
      However, of this set, when launched by run_vmtests.sh, some of the
      programs are invoked multiple times with different arguments. When
      invoked by run_kselftest.sh, they are invoked without arguments (and as
      a consequence, some fail immediately).
      
      Some test programs are only launched by run_vmtests.sh:
      
        test_vmalloc.sh
      
      And some test programs are only launched by run_kselftest.sh:
      
        khugepaged
        migration
        mkdirty
        transhuge-stress
        split_huge_page_test
        mdwe_test
        write_to_hugetlbfs
      
      Furthermore, run_vmtests.sh is invoked by run_kselftest.sh, so in this
      case all the test programs invoked by both scripts are run twice!
      
      Needless to say, this is a bit of a mess. In the absence of fully
      understanding the history here, it looks to me like the best solution is
      to launch ALL test programs from run_vmtests.sh, and ONLY invoke
      run_vmtests.sh from run_kselftest.sh. This way, we get full control over
      the parameters, each program is only invoked the intended number of
      times, and regardless of which script is used, the same tests get run in
      the same way.
      
      The only drawback is that if using run_kselftest.sh, its top-level TAP
      result reporting reports only a single test and it fails if any of the
      contained tests fail. I don't see this as a big deal though since we
      still see all the nested reporting from multiple layers. The other issue
      with this is that all of run_vmtests.sh must execute within a single
      kselftest timeout period, so let's increase that to something more
      suitable.
      
      In the Makefile, TEST_GEN_PROGS will compile and install the tests and
      will add them to the list of tests that run_kselftest.sh will run.
      TEST_GEN_FILES will compile and install the tests but will not add them
      to the test list. So let's move all the programs from TEST_GEN_PROGS to
      TEST_GEN_FILES so that they are built but not executed by
      run_kselftest.sh. Note that run_vmtests.sh is added to TEST_PROGS, which
      means it ends up in the test list. (The lack of "_GEN" means it won't be
      compiled, but simply copied.)
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-9-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: optionally pass duration to transhuge-stress · e1706210
      Ryan Roberts authored
      Until now, transhuge-stress ran until it was explicitly killed, so when
      invoked by run_kselftest.sh, it would run until the test timeout, then it
      would be killed and the test would be marked as failed.
      
      Add a new, optional command line parameter that allows the user to specify
      the duration in seconds that the program should run.  The program exits
      after this duration with a success (0) exit code.  If the argument is
      omitted, the old behavior remains.
      
      On its own, this doesn't quite solve our problem because run_kselftest.sh
      does not allow passing parameters to the program under test.  But we will
      shortly move this to run_vmtests.sh, which does allow parameter passing.
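
      The optional duration argument described above could be parsed roughly as
      in the sketch below.  This is only an illustration, not the code from the
      patch: the helper name parse_duration() and the assumption that the
      duration is the first argument are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): returns the run time in
 * seconds taken from argv[1], 0 when the argument is omitted (meaning
 * "run until killed", i.e. the old behavior), or -1 when the argument
 * is not a positive integer.
 */
long parse_duration(int argc, char **argv)
{
	char *end;
	long secs;

	if (argc < 2)
		return 0;

	errno = 0;
	secs = strtol(argv[1], &end, 10);
	if (errno || end == argv[1] || *end != '\0' || secs <= 0)
		return -1;

	return secs;
}
```

      A main loop could then compare the elapsed time against the returned
      value and exit with status 0 once the duration has passed.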
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-8-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: make migration test robust to failure · 00030332
      Ryan Roberts authored
      The `migration` test currently has a number of robustness problems that
      cause it to hang and leak resources.
      
      Timeout: There are 3 tests, which each previously ran for 60 seconds. 
      However, the timeout in mm/settings for a single test binary was set to 45
      seconds.  So when run using run_kselftest.sh, the top level timeout would
      trigger before the test binary was finished.  Solve this by meeting in the
      middle; each of the 3 tests now runs for 20 seconds (for a total of 60),
      and the top level timeout is set to 90 seconds.
      
      Leaking child processes: the `shared_anon` test fork()s some children but
      then an ASSERT() fires before the test kills those children.  The assert
      causes immediate exit of the parent and leaking of the children. 
      Furthermore, if run using the run_kselftest.sh wrapper, the wrapper would
      get stuck waiting for those children to exit, which never happens.  Solve
      this by setting the "parent death signal" to SIGHUP in the child, so that
      the child is killed automatically if the parent dies.
      
      With these changes, the test binary now runs to completion on arm64, with
      2 tests passing and the `shared_anon` test failing.
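
      The "parent death signal" mechanism mentioned above is available on Linux
      through prctl(2).  A minimal sketch follows; the function names
      die_with_parent() and parent_death_signal() are hypothetical, and the
      actual patch may structure this differently.

```c
#include <signal.h>
#include <sys/prctl.h>

/*
 * Ask the kernel to send SIGHUP to the calling process when its parent
 * dies.  Intended to be called in the child right after fork().
 * Returns 0 on success, -1 on failure (errno is set by prctl()).
 */
int die_with_parent(void)
{
	return prctl(PR_SET_PDEATHSIG, SIGHUP);
}

/* Read back the configured parent death signal, or -1 on error. */
int parent_death_signal(void)
{
	int sig = 0;

	if (prctl(PR_GET_PDEATHSIG, &sig) != 0)
		return -1;
	return sig;
}
```

      If the parent can exit before the child reaches the prctl() call, the
      child may also want to check getppid() afterwards, since the signal is
      only delivered for a parent death that happens after the setting is made.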
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-7-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: va_high_addr_switch should skip unsupported arm64 configs · 49f09526
      Ryan Roberts authored
      va_high_addr_switch has a mechanism to determine if the tests should be
      run or skipped (supported_arch()).  This currently returns true
      unconditionally for arm64.  However, va_high_addr_switch also requires a large
      virtual address space for the tests to run, otherwise they spuriously
      fail.
      
      Since arm64 can only support VA > 48 bits when the page size is 64K, let's
      decide whether we should skip the test suite based on the page size.  This
      reduces noise when running on 4K and 16K kernels.
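
      A page-size based check of the kind described above might look like the
      following sketch.  Both helper names are hypothetical and the real
      supported_arch() is not reproduced here; the only facts taken from the
      text are that 64K pages are required and that 4K and 16K kernels should
      be skipped.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true only for the 64K page size named in the text. */
bool page_size_allows_large_va(long page_size)
{
	return page_size == 64 * 1024;
}

/* Hypothetical arm64-side check: query the running kernel's page size. */
bool arm64_should_run_tests(void)
{
	return page_size_allows_large_va(sysconf(_SC_PAGESIZE));
}
```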
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-6-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: fix thuge-gen test bugs · 6e16f513
      Ryan Roberts authored
      thuge-gen was previously only munmapping part of the mmapped buffer, which
      caused us to run out of 1G huge pages for a later part of the test.  Fix
      this by munmapping the whole buffer.  Based on the code, it looks like a
      typo rather than an intention to keep some of the buffer mapped.
      
      thuge-gen was also calling mmap with SHM_HUGETLB flag (bit 11 set), which
      is actually MAP_DENYWRITE in mmap context.  The man page says this flag is
      ignored in modern kernels.  I'm pretty sure from the context that the
      author intended to pass the MAP_HUGETLB flag so I've fixed that up too.
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-5-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: enable mrelease_test for arm64 · e515bce9
      Ryan Roberts authored
      mrelease_test defaults to defining __NR_pidfd_open and
      __NR_process_mrelease syscall numbers to -1, if they are not defined
      anywhere else, and the suite would then be marked as skipped as a result.
      
      arm64 (at least the stock debian toolchain that I'm using) requires
      including <sys/syscall.h> to pull in the defines for these syscalls.  So
      let's add this header.  With this in place, the test is passing on arm64.
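
      The fallback pattern described above can be sketched as follows.  This is
      an illustration of the idea rather than the test's actual source; the
      function mrelease_syscalls_defined() is hypothetical.

```c
#include <stdbool.h>
#include <sys/syscall.h>	/* pulls in the __NR_* definitions */

/* Fallbacks for toolchains whose headers lack these syscalls. */
#ifndef __NR_pidfd_open
#define __NR_pidfd_open -1
#endif

#ifndef __NR_process_mrelease
#define __NR_process_mrelease -1
#endif

/* True when both syscall numbers were provided by the headers. */
bool mrelease_syscalls_defined(void)
{
	return __NR_pidfd_open != -1 && __NR_process_mrelease != -1;
}
```

      Without the <sys/syscall.h> include, both fallbacks would take effect and
      a test built this way would have no choice but to skip.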
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-4-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: skip soft-dirty tests on arm64 · f6dd4e22
      Ryan Roberts authored
      arm64 does not support the soft-dirty PTE bit.  However, the `soft-dirty`
      test suite is currently run unconditionally and therefore generates
      spurious test failures on arm64.  There are also some tests in
      `madv_populate` which assume it is supported.
      
      For `soft-dirty`, let's disable the whole suite for arm64; it is no longer
      built and run_vmtests.sh will skip it if it's not present.
      
      For `madv_populate`, we need a runtime mechanism so that the remaining
      tests continue to be run.  Unfortunately, the only way to determine if the
      soft-dirty bit is supported is to write to a page, then see if the
      bit is set in /proc/self/pagemap.  But the tests that we want to
      conditionally execute are testing precisely this.  So if we introduced
      this feature check, we could accidentally turn a real failure (on a system
      that claims to support soft-dirty) into a skip.  So instead, do the check
      based on architecture; for arm64, we report that soft-dirty is not
      supported.
      
      Link: https://lkml.kernel.org/r/20230724082522.1202616-3-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: line buffer test program's stdout · 58e2847a
      Ryan Roberts authored
      Patch series "selftests/mm fixes for arm64", v3.
      
      Given my on-going work on large anon folios and contpte mappings, I
      decided it would be a good idea to start running mm selftests to help
      guard against regressions.  However, it soon became clear that I
      couldn't get the suite to run cleanly on arm64 with a vanilla v6.5-rc1
      kernel (perhaps I'm just doing it wrong??), so got stuck in a rabbit
      hole trying to debug and fix all the issues.  Some were down to
      misconfigurations, but I also found a number of issues with the tests
      and even a couple of issues with the kernel.
      
      
      This patch (of 8):
      
      The selftests runner pipes the test program's stdout to tap_prefix.  The
      presence of the pipe means that the test program sets its stdout to be
      fully buffered (as opposed to line buffered when directly connected to the
      terminal).  The block buffering means that there is often content in the
      buffer at fork() time, which causes the output to end up duplicated.  This
      was causing problems for mm:cow where test results were duplicated 20-30x.
      
      Solve this by using `stdbuf`, when available, to force the test program to
      use line buffered mode.  This means previously printf'ed results are
      flushed out of the program before any fork().
      
      Additionally, explicitly set line buffer mode in ksft_print_header(),
      which means that all test programs that use the ksft framework will
      benefit even if stdbuf is not present on the system.
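
      Setting line buffered mode explicitly, as described above, can be done
      with the standard setvbuf() call.  The sketch below is illustrative only;
      the wrapper name use_line_buffered_stdout() is hypothetical and this is
      not the body of ksft_print_header().

```c
#include <stdio.h>

/*
 * Switch stdout to line-buffered mode.  Should be called before any
 * output is written to the stream.  Returns 0 on success, nonzero on
 * failure.
 */
int use_line_buffered_stdout(void)
{
	return setvbuf(stdout, NULL, _IOLBF, 0);
}
```

      With line buffering, each completed line is written out as soon as it is
      printed, so a later fork() does not copy pending output into the child.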
      
      [ryan.roberts@arm.com: add setvbuf() to set buffering mode]
        Link: https://lkml.kernel.org/r/20230726070655.2713530-1-ryan.roberts@arm.com
      Link: https://lkml.kernel.org/r/20230724082522.1202616-1-ryan.roberts@arm.com
      Link: https://lkml.kernel.org/r/20230724082522.1202616-2-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix obsolete function name above debug_pagealloc_enabled_static() · ea09800b
      Miaohe Lin authored
      Since commit 04013513 ("mm, page_alloc: do not rely on the order of
      page_poison and init_on_alloc/free parameters"), init_debug_pagealloc() is
      converted to init_mem_debugging_and_hardening().  Later it's renamed to
      mem_debugging_and_hardening_init() via commit f2fc4b44 ("mm: move
      init_mem_debugging_and_hardening() to mm/mm_init.c").
      
      Link: https://lkml.kernel.org/r/20230720112806.3851893-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mmu_notifiers: rename invalidate_range notifier · 1af5a810
      Alistair Popple authored
      There are two main use cases for mmu notifiers.  One is by KVM which uses
      mmu_notifier_invalidate_range_start()/end() to manage a software TLB.
      
      The other is to manage hardware TLBs which need to use the
      invalidate_range() callback because HW can establish new TLB entries at
      any time.  Hence using start/end() can lead to memory corruption as these
      callbacks happen too soon/late during page unmap.
      
      mmu notifier users should therefore either use the start()/end() callbacks
      or the invalidate_range() callbacks.  To make this usage clearer, rename
      the invalidate_range() callback to arch_invalidate_secondary_tlbs() and
      update documentation.
      
      Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Andrew Donnellan <ajd@linux.ibm.com>
      Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
      Cc: Frederic Barrat <fbarrat@linux.ibm.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nicolin Chen <nicolinc@nvidia.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zhi Wang <zhi.wang.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end() · ec8832d0
      Alistair Popple authored
      Secondary TLBs are now invalidated from the architecture specific TLB
      invalidation functions.  Therefore there is no need to explicitly notify
      or invalidate as part of the range end functions.  This means we can
      remove mmu_notifier_invalidate_range_end_only() and some of the
      ptep_*_notify() functions.
      
      Link: https://lkml.kernel.org/r/90d749d03cbab256ca0edeb5287069599566d783.1690292440.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Andrew Donnellan <ajd@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
      Cc: Frederic Barrat <fbarrat@linux.ibm.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nicolin Chen <nicolinc@nvidia.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zhi Wang <zhi.wang.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mmu_notifiers: call invalidate_range() when invalidating TLBs · 6bbd42e2
      Alistair Popple authored
      The invalidate_range() is going to become an architecture specific mmu
      notifier used to keep the TLB of secondary MMUs such as an IOMMU in sync
      with the CPU page tables.  Currently it is called from separate code paths
      to the main CPU TLB invalidations.  This can lead to a secondary TLB not
      getting invalidated when required and makes it hard to reason about when
      exactly the secondary TLB is invalidated.
      
      To fix this move the notifier call to the architecture specific TLB
      maintenance functions for architectures that have secondary MMUs requiring
      explicit software invalidations.
      
      This fixes an SMMU bug on ARM64.  On ARM64, PTE permission upgrades require
      a TLB invalidation.  This invalidation is done by the architecture
      specific ptep_set_access_flags() which calls flush_tlb_page() if required.
      However, this doesn't call the notifier, resulting in infinite faults being
      generated by devices using the SMMU if it has previously cached a
      read-only PTE in its TLB.
      
      Moving the invalidations into the TLB invalidation functions ensures all
      invalidations happen at the same time as the CPU invalidation.  The
      architecture specific flush_tlb_all() routines do not call the notifier as
      none of the IOMMUs require this.
      
      Link: https://lkml.kernel.org/r/0287ae32d91393a582897d6c4db6f7456b1001f2.1690292440.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
      Tested-by: SeongJae Park <sj@kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Tested-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Andrew Donnellan <ajd@linux.ibm.com>
      Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
      Cc: Frederic Barrat <fbarrat@linux.ibm.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nicolin Chen <nicolinc@nvidia.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zhi Wang <zhi.wang.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mmu_notifiers: fixup comment in mmu_interval_read_begin() · 57b037db
      Alistair Popple authored
      The comment in mmu_interval_read_begin() refers to a function that doesn't
      exist and uses the wrong call-back name.  The op for mmu interval
      notifiers is mmu_interval_notifier_ops->invalidate() so fix the comment up
      to reflect that.
      
      Link: https://lkml.kernel.org/r/e7a09081b3ac82a03c189409f1262fc2df91071e.1690292440.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Andrew Donnellan <ajd@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
      Cc: Frederic Barrat <fbarrat@linux.ibm.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nicolin Chen <nicolinc@nvidia.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zhi Wang <zhi.wang.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64/smmu: use TLBI ASID when invalidating entire range · 38b14e2e
      Alistair Popple authored
      Patch series "Invalidate secondary IOMMU TLB on permission upgrade", v4.
      
      The main change is to move secondary TLB invalidation mmu notifier
      callbacks into the architecture specific TLB flushing functions. This
      makes secondary TLB invalidation mostly match CPU invalidation while
      still allowing efficient range based invalidations based on the
      existing TLB batching code.
      
      
      This patch (of 5):
      
      The ARM SMMU has a specific command for invalidating the TLB for an entire
      ASID.  Currently this is used for the IO_PGTABLE API but not for ATS when
      called from the MMU notifier.
      
      The current implementation of notifiers does not attempt to invalidate
      such a large address range, instead walking each VMA and invalidating each
      range individually during mmap removal.  However, in the future SMMU TLB
      invalidations are going to be sent as part of the normal flush_tlb_*()
      kernel calls.  To better deal with that, add handling to use TLBI ASID when
      invalidating the entire address space.
      
      Link: https://lkml.kernel.org/r/cover.1eca029b8603ef4eebe5b41eae51facfc5920c41.1690292440.git-series.apopple@nvidia.com
      Link: https://lkml.kernel.org/r/ba5f0ec5fbc2ab188797524d3687e075e2412a2b.1690292440.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Andrew Donnellan <ajd@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
      Cc: Frederic Barrat <fbarrat@linux.ibm.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nicolin Chen <nicolinc@nvidia.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zhi Wang <zhi.wang.linux@gmail.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • maple_tree: Be more strict about locking · 19a462f0
      Liam R. Howlett authored
      Use lockdep to check the write path in the maple tree holds the lock in
      write mode.
      
      Introduce mt_write_lock_is_held() to check if the lock is held for
      writing.  Update the necessary checks for rcu_dereference_protected() to
      use the new write lock check.
      
      Link: https://lkml.kernel.org/r/20230714195551.894800-5-Liam.Howlett@oracle.com
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oliver Sang <oliver.sang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/mmap: change detached vma locking scheme · 02fdb25f
      Liam R. Howlett authored
      Don't set the lock to the mm lock so that the detached VMA tree does not
      complain about being unlocked when the mmap_lock is dropped prior to
      freeing the tree.
      
      Introduce mt_on_stack() for setting the external lock to NULL only when
      LOCKDEP is used.
      
      Move the destroying of the detached tree outside the mmap lock
      altogether.
      
      Link: https://lkml.kernel.org/r/20230719183142.ktgcmuj2pnlr3h3s@revolver
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oliver Sang <oliver.sang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      02fdb25f
    • Liam R. Howlett's avatar
      maple_tree: relax lockdep checks for on-stack trees · 134d153c
      Liam R. Howlett authored
      To support early release of the maple tree locks, do not lockdep check the
      lock if it is set to NULL.  This is intended for the special case on-stack
      use of tracking entries and not for general use.
      
      Link: https://lkml.kernel.org/r/20230714195551.894800-3-Liam.Howlett@oracle.com
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oliver Sang <oliver.sang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      134d153c
    • Liam R. Howlett's avatar
      mm/mmap: clean up validate_mm() calls · 2574d5e4
      Liam R. Howlett authored
      Patch series "More strict maple tree lockdep", v2.
      
      Linus asked for stricter maple tree lockdep checking [1] and for these
      patches to resume the normal path through Andrew's tree.
      
      This series of patches adds checks to ensure the lock is held in write
      mode during the write path of the maple tree instead of checking if it's
      held at all.
      
      It also reduces the validate_mm() calls by consolidating into commonly
      used functions (patch 0001), and removes the necessity of holding the lock
      on the detached tree during munmap() operations.
      
      
      This patch (of 4):
      
      validate_mm() calls are too spread out and duplicated in numerous
      locations.  Also, now that the stack write is done under the write lock,
      it is not necessary to validate the mm prior to write operations.
      
      Add a validate_mm() to the stack expansions, and to vma_complete() so
      that numerous others may be dropped.
      
      Note that vma_link() (and also insert_vm_struct() by call path) already
      call validate_mm().
      
      vma_merge() also had an unnecessary call to vma_iter_free() since the
      logic change to abort earlier if no merging is necessary.
      
      Drop extra validate_mm() calls at the start of functions and error paths
      which won't write to the tree.
      
      Relocate the validate_mm() call in the do_brk_flags() to avoid
      re-running the same test when vma_complete() is used.
      
      The call within the error path of mmap_region() is left intentionally
      because of the complexity of the function and the potential of drivers
      modifying the tree.
      
      Link: https://lkml.kernel.org/r/20230714195551.894800-1-Liam.Howlett@oracle.com
      Link: https://lkml.kernel.org/r/20230714195551.894800-2-Liam.Howlett@oracle.com
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oliver Sang <oliver.sang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2574d5e4
    • Sidhartha Kumar's avatar
      mm/hugetlb: get rid of page_hstate() · affd26b1
      Sidhartha Kumar authored
      Convert the last page_hstate() user to use folio_hstate() so page_hstate()
      can be safely removed.
      
      Link: https://lkml.kernel.org/r/20230719184145.301911-1-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      affd26b1
    • Peng Zhang's avatar
      mm: kfence: allocate kfence_metadata at runtime · cabdf74e
      Peng Zhang authored
      kfence_metadata is currently a static array.  For the purpose of
      allocating scalable __kfence_pool, we first change it to runtime
      allocation of metadata.  Since the size of an object of kfence_metadata is
      1160 bytes, we can save at least 72 pages (with default 256 objects)
      without enabling kfence.
      
      [akpm@linux-foundation.org: restore newline, per Marco]
      Link: https://lkml.kernel.org/r/20230718073019.52513-1-zhangpeng.00@bytedance.com
      Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cabdf74e
    • Miaohe Lin's avatar
      memory tier: use helper macro __ATTR_RW() · 8d3a7d79
      Miaohe Lin authored
      Use helper macro __ATTR_RW to define numa demotion attributes.  Minor
      readability improvement.
      
      Link: https://lkml.kernel.org/r/20230715035111.2656784-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8d3a7d79
    • Mike Rapoport (IBM)'s avatar
      maple_tree: mtree_insert*: fix typo in kernel-doc description · 4445e582
      Mike Rapoport (IBM) authored
      Replace "Insert and entry at a give index" with "Insert an entry at a
      given index"
      
      Link: https://lkml.kernel.org/r/20230715143920.994812-1-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4445e582
    • Zhu, Lipeng's avatar
      fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing. · aee79d4e
      Zhu, Lipeng authored
      When running UnixBench/Shell Scripts, we observed heavy false sharing
      between accesses to i_mmap and i_mmap_rwsem.

      UnixBench/Shell Scripts is a typical load/execute command test scenario
      that concurrently launches, executes, and exits many shell commands.  Many
      processes invoke vma_interval_tree_remove(), which touches 'i_mmap'.  The
      call stack is:
      
      ----vma_interval_tree_remove
          |----unlink_file_vma
          |    free_pgtables
          |    |----exit_mmap
          |    |    mmput
          |    |    |----begin_new_exec
          |    |    |    load_elf_binary
          |    |    |    bprm_execve
      
      Meanwhile, many processes touch 'i_mmap_rwsem' to acquire the semaphore
      in order to access 'i_mmap'.  In the existing 'address_space' layout,
      'i_mmap' and 'i_mmap_rwsem' are in the same cache line.
      
      The patch places the i_mmap and i_mmap_rwsem in separate cache lines to
      avoid this false sharing problem.
      
      With this patch, based on kernel v6.4.0, on an Intel Sapphire Rapids
      112C/224T platform, the score improves by ~5.3%.  The perf c2c tool shows
      that the false sharing is resolved as expected: the symbol
      vma_interval_tree_remove no longer appears in cache line 0 after this
      change.
      
      Baseline:
      =================================================
            Shared Cache Line Distribution Pareto
      =================================================
      -------------------------------------------------------------
          0    3729     5791        0        0  0xff19b3818445c740
      -------------------------------------------------------------
         3.27%    3.02%    0.00%    0.00%   0x18     0       1  0xffffffffa194403b       604       483       389      692       203  [k] vma_interval_tree_insert    [kernel.kallsyms]  vma_interval_tree_insert+75      0  1
         4.13%    3.63%    0.00%    0.00%   0x20     0       1  0xffffffffa19440a2       553       413       415      962       215  [k] vma_interval_tree_remove    [kernel.kallsyms]  vma_interval_tree_remove+18      0  1
         2.04%    1.35%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1d6      1210       855       460     1229       222  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
         0.62%    1.85%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1bf       762       329       577      527       198  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
         0.48%    0.31%    0.00%    0.00%   0x28     0       1  0xffffffffa219a58c      1677      1476       733     1544       224  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
         0.05%    0.07%    0.00%    0.00%   0x28     0       1  0xffffffffa219a21d      1040       819       689       33        27  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+749    0  1
         0.00%    0.05%    0.00%    0.00%   0x28     0       1  0xffffffffa17707db         0      1005       786     1373       223  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
         0.00%    0.02%    0.00%    0.00%   0x28     0       1  0xffffffffa219a064         0       233       778       32        30  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
        33.82%   34.10%    0.00%    0.00%   0x30     0       1  0xffffffffa1770945       779       495       534     6011       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
        17.06%   15.28%    0.00%    0.00%   0x30     0       1  0xffffffffa1770915       593       438       468     2715       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1
         3.54%    3.52%    0.00%    0.00%   0x30     0       1  0xffffffffa2199f84       881       601       583     1421       223  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+84     0  1
      
      With this change:
      -------------------------------------------------------------
         0      556      838        0        0  0xff2780d7965d2780
      -------------------------------------------------------------
          0.18%    0.60%    0.00%    0.00%    0x8     0       1  0xffffffffafff27b8       503       453       569       14        13  [k] do_dentry_open              [kernel.kallsyms]  do_dentry_open+456               0  1
          0.54%    0.12%    0.00%    0.00%    0x8     0       1  0xffffffffaffc51ac       510       199       428       15        12  [k] hugepage_vma_check          [kernel.kallsyms]  hugepage_vma_check+252           0  1
          1.80%    2.15%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1d6      1778       799       343      215       136  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
          0.54%    1.31%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1bf       547       296       528       91        71  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
          0.72%    0.72%    0.00%    0.00%   0x18     0       1  0xffffffffb079a58c      1479      1534       676      288       163  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
          0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffafd707db         0      2381       744      282       158  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
          0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffb079a064         0       239       518        6         6  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
         46.58%   47.02%    0.00%    0.00%   0x20     0       1  0xffffffffafd70945       704       403       499     1137       219  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
         23.92%   25.78%    0.00%    0.00%   0x20     0       1  0xffffffffafd70915       558       413       500      542       185  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1
      
      v1->v2: change padding to exchange fields.
      
      Link: https://lkml.kernel.org/r/20230716145653.20122-1-lipeng.zhu@intel.com
      Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
      Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Yu Ma <yu.ma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      aee79d4e
    • Haifeng Xu's avatar
      mm/mm_init.c: drop node_start_pfn from adjust_zone_range_for_zone_movable() · 0792e47d
      Haifeng Xu authored
      node_start_pfn is not used in adjust_zone_range_for_zone_movable(), so it
      is pointless to waste a function argument.  Drop the parameter.
      
      Link: https://lkml.kernel.org/r/20230717065811.1262-1-haifeng.xu@shopee.com
      Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0792e47d
    • Miaohe Lin's avatar
      mm/memcg: minor cleanup for mc_handle_present_pte() · 58f341f7
      Miaohe Lin authored
      When pagetable lock is held, the page will always be page_mapped().  So
      remove unneeded page_mapped() check.  Also the page can't be freed from
      under us in this case.  So use get_page() to get extra page reference to
      simplify the code.  No functional change intended.
      
      Link: https://lkml.kernel.org/r/20230717113644.3026478-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      58f341f7
    • Barry Song's avatar
      arm64: support batched/deferred tlb shootdown during page reclamation/migration · 43b3dfdd
      Barry Song authored
      On x86, batched and deferred TLB shootdown has led to a 90% performance
      increase on TLB shootdown.  On arm64, hardware can do TLB shootdown
      without a software IPI, but a synchronous tlbi is still quite expensive.

      Even the simplest program that requires swapout demonstrates this:
       #include <sys/types.h>
       #include <unistd.h>
       #include <sys/mman.h>
       #include <string.h>
      
       int main()
       {
       #define SIZE (1 * 1024 * 1024)
               volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
               memset(p, 0x88, SIZE);
      
               for (int k = 0; k < 10000; k++) {
                       /* swap in */
                       for (int i = 0; i < SIZE; i += 4096) {
                               (void)p[i];
                       }
      
                       /* swap out */
                       madvise(p, SIZE, MADV_PAGEOUT);
               }
       }
      
      Perf result on snapdragon 888 with 8 cores by using zRAM
      as the swap block device.
      
       ~ # perf record taskset -c 4 ./a.out
       [ perf record: Woken up 10 times to write data ]
       [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
       ~ # perf report
       # To display the perf.data header info, please use --header/--header-only options.
       # To display the perf.data header info, please use --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 60K of event 'cycles'
       # Event count (approx.): 35706225414
       #
       # Overhead  Command  Shared Object      Symbol
       # ........  .......  .................  ......
       #
          21.07%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irq
           8.23%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           6.67%  a.out    [kernel.kallsyms]  [k] filemap_map_pages
           6.16%  a.out    [kernel.kallsyms]  [k] __zram_bvec_write
           5.36%  a.out    [kernel.kallsyms]  [k] ptep_clear_flush
           3.71%  a.out    [kernel.kallsyms]  [k] _raw_spin_lock
           3.49%  a.out    [kernel.kallsyms]  [k] memset64
           1.63%  a.out    [kernel.kallsyms]  [k] clear_page
           1.42%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock
           1.26%  a.out    [kernel.kallsyms]  [k] mod_zone_state.llvm.8525150236079521930
           1.23%  a.out    [kernel.kallsyms]  [k] xas_load
           1.15%  a.out    [kernel.kallsyms]  [k] zram_slot_lock
      
      ptep_clear_flush() takes 5.36% CPU in the micro-benchmark swapping in/out
      a page mapped by only one process.  If the page is mapped by multiple
      processes (typically more than 100 on a phone), the overhead would be
      much higher, as we have to run the TLB flush 100 times for a single page.
      In addition, TLB flush overhead increases with the number of CPU cores
      because TLB shootdown scales poorly in hardware, so ARM64 servers should
      expect much higher overhead.
      
      Further perf annotate output shows that 95% of the CPU time of
      ptep_clear_flush() is actually spent in the final dsb(), waiting for the
      TLB flush to complete.  This gives us a very good opportunity to leverage
      the existing batched TLB flush in the kernel.  The minimal modification is
      to send only the asynchronous tlbi in the first stage and to issue the dsb
      in the second stage, when we have to synchronize.
      
      With the simple micro-benchmark above, the elapsed time to finish the
      program decreases by around 5%.

      Typical elapsed time w/o patch:
       ~ # time taskset -c 4 ./a.out
       0.21user 14.34system 0:14.69elapsed
      w/ patch:
       ~ # time taskset -c 4 ./a.out
       0.22user 13.45system 0:13.80elapsed
      
      Also tested with benchmark in the commit on Kunpeng920 arm64 server
      and observed an improvement around 12.5% with command
      `time ./swap_bench`.
              w/o             w/
      real    0m13.460s       0m11.771s
      user    0m0.248s        0m0.279s
      sys     0m12.039s       0m11.458s
      
      Originally, a 16.99% overhead from ptep_clear_flush() was observed, which
      this patch eliminates:
      
      [root@localhost yang]# perf record -- ./swap_bench && perf report
      [...]
      16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush
      
      The patch was tested on platforms with 4, 8, and 128 CPUs.  It is
      beneficial on large systems but may bring no improvement on small
      systems such as a 4-CPU platform.
      
      This patch also improves the performance of page migration.  When using
      pmbench and migrating its pages between node 0 and node 1 100 times for
      1G of memory, this patch decreases the time used by around 20%
      (18.338318910 sec before, 13.981866350 sec after) and saves the time
      spent in ptep_clear_flush().
      
      Link: https://lkml.kernel.org/r/20230717131004.12662-5-yangyicong@huawei.com
      Tested-by: Yicong Yang <yangyicong@hisilicon.com>
      Tested-by: Xin Hao <xhao@linux.alibaba.com>
      Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: lipeifeng <lipeifeng@oppo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Tao <prime.zeng@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      43b3dfdd
    • Yicong Yang's avatar
      mm/tlbbatch: introduce arch_flush_tlb_batched_pending() · db6c1f6f
      Yicong Yang authored
      Currently we flush the mm in flush_tlb_batched_pending() to avoid a race
      between reclaim, which unmaps pages using a batched TLB flush, and
      mprotect/munmap/etc.  Other architectures such as arm64 may only need a
      synchronization barrier (dsb) here rather than a full mm flush.  So add
      arch_flush_tlb_batched_pending() to allow an arch-specific implementation
      here.  This intends no functional change on x86, which still does a full
      mm flush.
      
      Link: https://lkml.kernel.org/r/20230717131004.12662-4-yangyicong@huawei.com
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Barry Song <v-songbaohua@oppo.com>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: lipeifeng <lipeifeng@oppo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Punit Agrawal <punit.agrawal@bytedance.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xin Hao <xhao@linux.alibaba.com>
      Cc: Zeng Tao <prime.zeng@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      db6c1f6f
    • Barry Song's avatar
      mm/tlbbatch: rename and extend some functions · f73419bb
      Barry Song authored
      This patch does some preparatory work to extend batched TLB flush to
      arm64, including:
      - Extend set_tlb_ubc_flush_pending() and arch_tlbbatch_add_mm()
        to accept an additional address argument, since architectures
        like arm64 may need it for tlbi.
      - Rename arch_tlbbatch_add_mm() to arch_tlbbatch_add_pending()
        to match its current function.  Architectures like arm64 do not
        need to handle the mm, so add_mm is not an appropriate name;
        add_pending makes sense for both, since on x86 we are pending the
        TLB flush operations while on arm64 we are pending the
        synchronization operations.
      
      This intends no functional changes on x86.
      
      Link: https://lkml.kernel.org/r/20230717131004.12662-3-yangyicong@huawei.com
      Tested-by: Yicong Yang <yangyicong@hisilicon.com>
      Tested-by: Xin Hao <xhao@linux.alibaba.com>
      Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: lipeifeng <lipeifeng@oppo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Tao <prime.zeng@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f73419bb
    • Anshuman Khandual's avatar
      mm/tlbbatch: introduce arch_tlbbatch_should_defer() · 65c8d30e
      Anshuman Khandual authored
      Patch series "arm64: support batched/deferred tlb shootdown during page
      reclamation/migration", v11.
      
      Though ARM64 has the hardware to do TLB shootdown, the hardware
      broadcasting is not free.  A simple micro-benchmark shows that even on
      a Snapdragon 888 with only 8 cores, the overhead of ptep_clear_flush() is
      huge even when paging out one page mapped by only one process:

        5.36% a.out [kernel.kallsyms] [k] ptep_clear_flush

      When pages are mapped by multiple processes or the hardware has more
      CPUs, the cost should become even higher due to the poor scalability of
      TLB shootdown.  The same benchmark can result in 16.99% CPU consumption
      on an ARM64 server with around 100 cores, according to the test on patch
      4/4.
      
      This patchset leverages the existing BATCHED_UNMAP_TLB_FLUSH by
      1. only send tlbi instructions in the first stage -
      	arch_tlbbatch_add_mm()
      2. wait for the completion of tlbi by dsb while doing tlbbatch
      	sync in arch_tlbbatch_flush()
      
      Testing on snapdragon shows the overhead of ptep_clear_flush is removed by
      the patchset.  The micro benchmark becomes 5% faster even for one page
      mapped by single process on snapdragon 888.
      
      Since BATCHED_UNMAP_TLB_FLUSH is implemented only on x86, the patchset
      does some renaming/extension for the current implementation first (Patch
      1-3), then add the support on arm64 (Patch 4).
      		
      
      This patch (of 4):
      
      The entire scheme of deferred TLB flush in the reclaim path rests on the
      fact that the cost to refill TLB entries is less than that of flushing
      out individual entries by sending IPIs to remote CPUs.  But architectures
      can have different ways to evaluate that.  Hence, apart from checking
      TTU_BATCH_FLUSH in the TTU flags, the rest of the decision should be
      architecture specific.
      
      [yangyicong@hisilicon.com: rebase and fix incorrect return value type]
      Link: https://lkml.kernel.org/r/20230717131004.12662-1-yangyicong@huawei.com
      Link: https://lkml.kernel.org/r/20230717131004.12662-2-yangyicong@huawei.com
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      [https://lore.kernel.org/linuxppc-dev/20171101101735.2318-2-khandual@linux.vnet.ibm.com/]
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Barry Song <baohua@kernel.org>
      Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
      Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: lipeifeng <lipeifeng@oppo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Tao <prime.zeng@hisilicon.com>
      Cc: Barry Song <v-songbaohua@oppo.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <namit@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      65c8d30e
    • Baoquan He's avatar
      mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed · 95da27c4
      Baoquan He authored
      Now there are no users of ioremap_allowed and iounmap_allowed, clean
      them up.
      
      Link: https://lkml.kernel.org/r/20230706154520.11257-20-bhe@redhat.com
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      95da27c4
    • Baoquan He's avatar
      arm64 : mm: add wrapper function ioremap_prot() · 8f03d74f
      Baoquan He authored
      Since hook functions ioremap_allowed() and iounmap_allowed() will be
      obsoleted, add wrapper function ioremap_prot() to contain the specific
      handling in addition to generic_ioremap_prot() invocation.
      
      Link: https://lkml.kernel.org/r/20230706154520.11257-19-bhe@redhat.com
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8f03d74f
    • Christophe Leroy's avatar
      powerpc: mm: convert to GENERIC_IOREMAP · 8d05554d
      Christophe Leroy authored
      By adopting the GENERIC_IOREMAP method, the generic generic_ioremap_prot()
      and generic_iounmap(), and their generic wrappers ioremap_prot(), ioremap()
      and iounmap(), all become visible and available to the arch. The arch needs
      to provide wrapper functions to override the generic versions if there is
      arch-specific handling in its ioremap_prot(), ioremap() or iounmap().
      This change simplifies the implementation by removing code duplicated
      with generic_ioremap_prot() and generic_iounmap(), and keeps functionality
      equivalent to before.
      
      Here, add wrapper functions ioremap_prot() and iounmap() for powerpc's
      special handling in ioremap() and iounmap().
      
      Link: https://lkml.kernel.org/r/20230706154520.11257-18-bhe@redhat.com
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8d05554d
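The override pattern described in the powerpc conversion above can be sketched in userspace C. This is only an illustration, not the kernel code: every `mock_`-prefixed name, the `MOCK_PHYS_LIMIT` constant, and the range check are hypothetical stand-ins. It assumes the arch announces its own wrapper with a same-named `#define`, so that an `#ifndef`-guarded default wrapper is skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arch constraint: physical addresses must stay below this. */
#define MOCK_PHYS_LIMIT 0x100000000ULL

/* Fake mapping window; stands in for the area the generic code would map into. */
static char mock_window[4096];

/* Stand-in for generic_ioremap_prot(): no arch-specific checks at all. */
static void *mock_generic_ioremap_prot(uint64_t phys, size_t size,
                                       unsigned long prot)
{
    (void)phys;
    (void)prot;
    if (size == 0 || size > sizeof(mock_window))
        return NULL;
    return mock_window;
}

/* "Arch" side: announce the override, then add the arch-specific handling. */
#define mock_ioremap_prot mock_ioremap_prot
static void *mock_ioremap_prot(uint64_t phys, size_t size, unsigned long prot)
{
    uint64_t last;

    if (size == 0)
        return NULL;
    last = phys + size - 1;
    /* Reject wrap-around and anything beyond the hypothetical limit. */
    if (last < phys || last >= MOCK_PHYS_LIMIT)
        return NULL;
    return mock_generic_ioremap_prot(phys, size, prot);
}

/* "Generic" side: default wrapper, compiled only if the arch gave none. */
#ifndef mock_ioremap_prot
static void *mock_ioremap_prot(uint64_t phys, size_t size, unsigned long prot)
{
    return mock_generic_ioremap_prot(phys, size, prot);
}
#endif
```

With this layout the arch-specific checks live in one wrapper, while the mapping itself stays in the shared generic function.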
    • Baoquan He's avatar
      mm: move is_ioremap_addr() into new header file · 016fec91
      Baoquan He authored
      is_ioremap_addr() is currently used only in kernel/iomem.c and will also
      be used in mm/ioremap.c.  Move it into its own new header file,
      linux/ioremap.h.
      
      Link: https://lkml.kernel.org/r/20230706154520.11257-17-bhe@redhat.com
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      016fec91
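For readers unfamiliar with the helper moved in the commit above, here is a minimal userspace sketch of what an is_ioremap_addr()-style check might look like. It assumes the helper is a half-open range test against the ioremap area; the `mock_` names and the two range constants are hypothetical and chosen only so the example compiles outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of the ioremap area, for illustration only. */
#define MOCK_IOREMAP_START 0x10000000UL
#define MOCK_IOREMAP_END   0x20000000UL

/* True if x lies in [MOCK_IOREMAP_START, MOCK_IOREMAP_END). */
static inline bool mock_is_ioremap_addr(const void *x)
{
    uintptr_t addr = (uintptr_t)x;

    return addr >= MOCK_IOREMAP_START && addr < MOCK_IOREMAP_END;
}
```

Keeping such a small inline helper in its own header lets both callers include it without pulling in unrelated declarations.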