1. 06 Nov, 2019 1 commit
    • Thomas Hellstrom's avatar
      mm: Add write-protect and clean utilities for address space ranges · c5acad84
      Thomas Hellstrom authored
      
      Add two utilities to 1) write-protect and 2) clean all ptes pointing into
      a range of an address space.
      The utilities are intended to aid in tracking dirty pages (either
      driver-allocated system memory or pci device memory).
      The write-protect utility should be used in conjunction with
      page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
      accesses. Typically one would want to use this on sparse accesses into
      large memory regions. The clean utility should be used to utilize
      hardware dirtying functionality and avoid the overhead of page-faults,
      typically on large accesses into small memory regions.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Acked-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      c5acad84
  2. 24 Sep, 2019 2 commits
  3. 20 Aug, 2019 1 commit
  4. 07 Aug, 2019 2 commits
  5. 17 Jul, 2019 1 commit
  6. 15 Jul, 2019 1 commit
  7. 12 Jul, 2019 4 commits
  8. 02 Jul, 2019 5 commits
  9. 18 Jun, 2019 1 commit
    • Thomas Hellstrom's avatar
      mm: Add write-protect and clean utilities for address space ranges · 4fe51e9e
      Thomas Hellstrom authored
      
      Add two utilities to a) write-protect and b) clean all ptes pointing into
      a range of an address space.
      The utilities are intended to aid in tracking dirty pages (either
      driver-allocated system memory or pci device memory).
      The write-protect utility should be used in conjunction with
      page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
      accesses. Typically one would want to use this on sparse accesses into
      large memory regions. The clean utility should be used to utilize
      hardware dirtying functionality and avoid the overhead of page-faults,
      typically on large accesses into small memory regions.
      
      The added file "as_dirty_helpers.c" is initially listed as maintained by
      VMware under our DRM driver. If somebody would like it elsewhere,
      that's of course no problem.
      
      Notable changes since RFC:
      - Added comments to help avoid the usage of these function for VMAs
        it's not intended for. We also do advisory checks on the vm_flags and
        warn on illegal usage.
      - Perform the pte modifications the same way softdirty does.
      - Add mmu_notifier range invalidation calls.
      - Add a config option so that this code is not unconditionally included.
      - Tell the mmu_gather code about pending tlb flushes.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: linux-mm@kvack.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> #v1
      4fe51e9e
  10. 08 Jun, 2019 1 commit
  11. 21 May, 2019 1 commit
  12. 14 May, 2019 6 commits
  13. 28 Dec, 2018 1 commit
  14. 31 Oct, 2018 2 commits
  15. 21 Oct, 2018 1 commit
  16. 20 Sep, 2018 1 commit
    • Pasha Tatashin's avatar
      mm: disable deferred struct page for 32-bit arches · 889c695d
      Pasha Tatashin authored
      Deferred struct page init is needed only on systems with large amount of
      physical memory to improve boot performance.  32-bit systems do not
      benefit from this feature.
      
      Jiri reported a problem where deferred struct pages do not work well with
      x86-32:
      
      [    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
      [    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
      [    0.036269] Initializing CPU#0
      [    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
      [    0.038459] page:f6780000 is uninitialized and poisoned
      [    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
      [    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
      [    0.040038] ------------[ cut here ]------------
      [    0.040399] kernel BUG at include/linux/page-flags.h:293!
      [    0.040823] invalid opcode: 0000 [#1] SMP PTI
      [    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
      [    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [    0.042496] EIP: free_highmem_page+0x64/0x80
      [    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
      [    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
      [    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
      [    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
      [    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
      [    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [    0.046913] DR6: fffe0ff0 DR7: 00000400
      [    0.047220] Call Trace:
      [    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
      [    0.047854]  set_highmem_pages_init+0x5b/0x71
      [    0.048202]  mem_init+0x2b/0x1e8
      [    0.048460]  start_kernel+0x1d2/0x425
      [    0.048757]  i386_start_kernel+0x93/0x97
      [    0.049073]  startup_32_smp+0x164/0x168
      [    0.049379] Modules linked in:
      [    0.049626] ---[ end trace 337949378db0abbb ]---
      
      We free highmem pages before their struct pages are initialized:
      
      mem_init()
       set_highmem_pages_init()
        add_highpages_with_active_regions()
         free_highmem_page()
          .. Access uninitialized struct page here..
      
      Because there is no reason to have this feature on 32-bit systems, just
      disable it.
      
      Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
      Fixes: 2e3ca40f
      
       ("mm: relax deferred struct page requirements")
      Signed-off-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reported-by: default avatarJiri Slaby <jslaby@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      889c695d
  17. 17 Aug, 2018 3 commits
    • Huang Ying's avatar
      mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP · 14fef284
      Huang Ying authored
      CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's unreasonable
      to optimize swapping for THP (Transparent Huge Page) without basic
      swapping support.
      
      In original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
      split_swap_cluster() will not be built because it is in swapfile.c, but
      it will be called in huge_memory.c.  This doesn't trigger a build error
      in practice because the call site is enclosed by PageSwapCache(), which
      is defined to be constant 0 when CONFIG_SWAP=n.  But this is fragile and
      should be fixed.
      
      The comments are fixed too to reflect the latest progress.
      
      Link: http://lkml.kernel.org/r/20180713021228.439-1-ying.huang@intel.com
      Fixes: 38d8b4e6
      
       ("mm, THP, swap: delay splitting THP during swap out")
      Signed-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      14fef284
    • Pavel Tatashin's avatar
      mm/sparse: delete old sparse_init and enable new one · 2a3cb8ba
      Pavel Tatashin authored
      Rename new_sparse_init() to sparse_init() which enables it.  Delete old
      sparse_init() and all the code that became obsolete with.
      
      [pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
        Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
      
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Tested-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a3cb8ba
    • Mike Rapoport's avatar
      mm: make DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM · d39f8fb4
      Mike Rapoport authored
      The deferred memory initialization relies on section definitions, e.g
      PAGES_PER_SECTION, that are only available when CONFIG_SPARSEMEM=y on
      most architectures.
      
      Initially DEFERRED_STRUCT_PAGE_INIT depended on explicit
      ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT configuration option, but since
      the commit 2e3ca40f ("mm: relax deferred struct page
      requirements") this requirement was relaxed and now it is possible to
      enable DEFERRED_STRUCT_PAGE_INIT on architectures that support
      DISCONTINGMEM and NO_BOOTMEM which causes build failures.
      
      For instance, setting SMP=y and DEFERRED_STRUCT_PAGE_INIT=y on arc
      causes the following build failure:
      
          CC      mm/page_alloc.o
        mm/page_alloc.c: In function 'update_defer_init':
        mm/page_alloc.c:321:14: error: 'PAGES_PER_SECTION'
        undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
              (pfn & (PAGES_PER_SECTION - 1)) == 0) {
                      ^~~~~~~~~~~~~~~~~
                      USEC_PER_SEC
        mm/page_alloc.c:321:14: note: each undeclared identifier is reported only once for each function it appears in
        In file included from include/linux/cache.h:5:0,
                         from include/linux/printk.h:9,
                         from include/linux/kernel.h:14,
                         from include/asm-generic/bug.h:18,
                         from arch/arc/include/asm/bug.h:32,
                         from include/linux/bug.h:5,
                         from include/linux/mmdebug.h:5,
                         from include/linux/mm.h:9,
                         from mm/page_alloc.c:18:
        mm/page_alloc.c: In function 'deferred_grow_zone':
        mm/page_alloc.c:1624:52: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
          unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
                                                            ^
        include/uapi/linux/kernel.h:11:47: note: in definition of macro '__ALIGN_KERNEL_MASK'
         #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
                                                       ^~~~
        include/linux/kernel.h:58:22: note: in expansion of macro '__ALIGN_KERNEL'
         #define ALIGN(x, a)  __ALIGN_KERNEL((x), (a))
                              ^~~~~~~~~~~~~~
        mm/page_alloc.c:1624:34: note: in expansion of macro 'ALIGN'
          unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
                                          ^~~~~
        In file included from include/asm-generic/bug.h:18:0,
                         from arch/arc/include/asm/bug.h:32,
                         from include/linux/bug.h:5,
                         from include/linux/mmdebug.h:5,
                         from include/linux/mm.h:9,
                         from mm/page_alloc.c:18:
        mm/page_alloc.c: In function 'free_area_init_node':
        mm/page_alloc.c:6379:50: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
          pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                                                          ^
        include/linux/kernel.h:812:22: note: in definition of macro '__typecheck'
           (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                              ^
        include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
          __builtin_choose_expr(__safe_cmp(x, y), \
                                ^~~~~~~~~~
        include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
         #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                                   ^~~~~~~~~~~~~
        mm/page_alloc.c:6379:29: note: in expansion of macro 'min_t'
          pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                                     ^~~~~
        include/linux/kernel.h:836:2: error: first argument to '__builtin_choose_expr' not a constant
          __builtin_choose_expr(__safe_cmp(x, y), \
          ^
        include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
         #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                                   ^~~~~~~~~~~~~
        mm/page_alloc.c:6379:29: note: in expansion of macro 'min_t'
          pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                                     ^~~~~
        scripts/Makefile.build:317: recipe for target 'mm/page_alloc.o' failed
      
      Let's make the DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM
      as the systems that support DISCONTIGMEM do not seem to have that huge
      amounts of memory that would make DEFERRED_STRUCT_PAGE_INIT relevant.
      
      Link: http://lkml.kernel.org/r/1530279308-24988-1-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d39f8fb4
  18. 01 Aug, 2018 1 commit
  19. 08 Jun, 2018 1 commit
    • Laurent Dufour's avatar
      mm: introduce ARCH_HAS_PTE_SPECIAL · 3010a5ea
      Laurent Dufour authored
      Currently the PTE special supports is turned on in per architecture
      header files.  Most of the time, it is defined in
      arch/*/include/asm/pgtable.h depending or not on some other per
      architecture static definition.
      
      This patch introduce a new configuration variable to manage this
      directly in the Kconfig files.  It would later replace
      __HAVE_ARCH_PTE_SPECIAL.
      
      Here notes for some architecture where the definition of
      __HAVE_ARCH_PTE_SPECIAL is not obvious:
      
      arm
       __HAVE_ARCH_PTE_SPECIAL which is currently defined in
      arch/arm/include/asm/pgtable-3level.h which is included by
      arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
      So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.
      
      powerpc
      __HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
       - arch/powerpc/include/asm/book3s/64/pgtable.h
       - arch/powerpc/include/asm/pte-common.h
      The first one is included if (PPC_BOOK3S & PPC64) while the second is
      included in all the other cases.
      So select ARCH_HAS_PTE_SPECIAL all the time.
      
      sparc:
      __H...
      3010a5ea
  20. 22 May, 2018 1 commit
    • Dan Williams's avatar
      mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS · e7638488
      Dan Williams authored
      In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
      be able to rely on the fact that they will get wakeups on dev_pagemap
      page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
      generic_dax_page_free() as common indicator / infrastructure for dax
      filesytems to require. With this change there are no users of the
      MEMORY_DEVICE_HOST designation, so remove it.
      
      The HMM sub-system extended dev_pagemap to arrange a callback when a
      dev_pagemap managed page is freed. Since a dev_pagemap page is free /
      idle when its reference count is 1 it requires an additional branch to
      check the page-type at put_page() time. Given put_page() is a hot-path
      we do not want to incur that check if HMM is not in use, so a static
      branch is used to avoid that overhead when not necessary.
      
      Now, the FS_DAX implementation wants to reuse this mechanism for
      receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
      static-key into a generic mechanism t...
      e7638488
  21. 19 May, 2018 1 commit
  22. 09 May, 2018 1 commit
  23. 27 Apr, 2018 1 commit