1. 12 Dec, 2012 40 commits
    • mm: use vm_unmapped_area() on sparc32 architecture · a046be3d
      Michel Lespinasse authored
      Update the sparc32 arch_get_unmapped_area function to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: remove now-unused COLOUR_ALIGN()]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use vm_unmapped_area() on sh architecture · b4265f12
      Michel Lespinasse authored
      Update the sh arch_get_unmapped_area[_topdown] functions to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      
      [akpm@linux-foundation.org: remove now-unused COLOUR_ALIGN_DOWN()]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use vm_unmapped_area() on arm architecture · 394ef640
      Michel Lespinasse authored
      Update the arm arch_get_unmapped_area[_topdown] functions to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      
      [akpm@linux-foundation.org: remove now-unused COLOUR_ALIGN_DOWN()]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use vm_unmapped_area() on mips architecture · b6661861
      Michel Lespinasse authored
      Update the mips arch_get_unmapped_area[_topdown] functions to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      
      [akpm@linux-foundation.org: remove now-unused COLOUR_ALIGN_DOWN()]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use vm_unmapped_area() in hugetlbfs on i386 architecture · cdc17344
      Michel Lespinasse authored
      Update the i386 hugetlb_get_unmapped_area function to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use vm_unmapped_area() in hugetlbfs · 08659355
      Michel Lespinasse authored
      Update the hugetlb_get_unmapped_area function to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix cache coloring on x86_64 architecture · 7d025059
      Michel Lespinasse authored
      Fix the x86-64 cache alignment code to take pgoff into account.  Use the
      x86 and MIPS cache alignment code as the basis for a generic cache
      alignment function.
      
      The old x86 code will always align the mmap to aliasing boundaries,
      even if the program mmaps the file with a non-zero pgoff.
      
      Suppose program A mmaps the file with pgoff 0 and program B mmaps the
      file with pgoff 1.  The old code would align both mmaps to the same
      boundary, resulting in misaligned pages:
      
        A:  0123
        B:  123
      
      After this patch, they are aligned so the pages line up:
      
        A: 0123
        B:  123
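
      As an illustration only, the pgoff-aware alignment can be sketched in
      user-space C.  The helper name colour_align(), the 4 KiB page size and
      the mask values are assumptions made for this example; the formula
      follows the general shape of a colour-align macro, not the exact code
      of this patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

/*
 * Round addr up to the next aliasing boundary, then add the colour
 * implied by the file offset, so that equal file pages always land on
 * equal cache colours.  align_mask must be (power of two) - 1.
 */
static uint64_t colour_align(uint64_t addr, uint64_t pgoff, uint64_t align_mask)
{
    assert((align_mask & (align_mask + 1)) == 0);

    uint64_t base = (addr + align_mask) & ~align_mask;

    return base + ((pgoff << EX_PAGE_SHIFT) & align_mask);
}
```

      With a 32 KiB aliasing boundary (mask 0x7fff), a mapping at pgoff 0
      starts on the boundary itself, while a mapping at pgoff 1 starts one
      page above it, so file page 1 has the same colour in both mappings.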
      
      Proposed by Rik van Riel.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use vm_unmapped_area() on x86_64 architecture · f9902472
      Michel Lespinasse authored
      Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use
      of vm_unmapped_area() instead of implementing a brute force search.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vm_unmapped_area() lookup function · db4fbfb9
      Michel Lespinasse authored
      Implement vm_unmapped_area() using the rb_subtree_gap and highest_vm_end
      information to look up suitable virtual address space gaps.
      
      struct vm_unmapped_area_info is used to define the desired allocation
      request:
       - lowest or highest possible address matching the remaining constraints
       - desired gap length
       - low/high address limits that the gap must fit into
       - alignment mask and offset
      
      Also update the generic arch_get_unmapped_area[_topdown] functions to make
      use of vm_unmapped_area() instead of implementing a brute force search.
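
      The request fields listed above can be illustrated with a small
      user-space sketch.  Everything here is hypothetical: struct
      gap_request, fit() and find_gap() are names invented for the example,
      and the search is a plain linear scan over a sorted array instead of
      the rbtree walk that the patch implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mapping { uint64_t start, end; };  /* [start, end), sorted, disjoint */

struct gap_request {                      /* hypothetical request descriptor */
    int      topdown;                     /* 0: lowest address, 1: highest */
    uint64_t length;                      /* desired gap length */
    uint64_t low_limit, high_limit;       /* gap must fit in [low, high) */
    uint64_t align_mask, align_offset;    /* addr & mask == offset & mask */
};

#define NO_GAP UINT64_MAX

/* Try to place the request inside the free range [lo, hi). */
static uint64_t fit(const struct gap_request *r, uint64_t lo, uint64_t hi)
{
    if (lo < r->low_limit)
        lo = r->low_limit;
    if (hi > r->high_limit)
        hi = r->high_limit;
    if (hi < lo || hi - lo < r->length)
        return NO_GAP;

    if (r->topdown) {
        uint64_t addr = hi - r->length;
        uint64_t delta = (addr - r->align_offset) & r->align_mask;

        if (delta > addr)
            return NO_GAP;
        addr -= delta;                    /* round down to the alignment */
        return addr >= lo ? addr : NO_GAP;
    }

    /* round up to the alignment */
    uint64_t addr = lo + ((r->align_offset - lo) & r->align_mask);

    return addr <= hi - r->length ? addr : NO_GAP;
}

/* Linear scan over the gaps between mappings (illustration only). */
static uint64_t find_gap(const struct mapping *m, size_t n,
                         const struct gap_request *r)
{
    assert(r->length > 0);

    if (!r->topdown) {
        uint64_t lo = 0;

        for (size_t i = 0; i <= n; i++) {
            uint64_t hi = (i < n) ? m[i].start : UINT64_MAX;
            uint64_t addr = fit(r, lo, hi);

            if (addr != NO_GAP)
                return addr;
            if (i < n)
                lo = m[i].end;
        }
    } else {
        uint64_t hi = UINT64_MAX;

        for (size_t i = n + 1; i-- > 0; ) {
            uint64_t lo = (i > 0) ? m[i - 1].end : 0;
            uint64_t addr = fit(r, lo, hi);

            if (addr != NO_GAP)
                return addr;
            if (i > 0)
                hi = m[i - 1].start;
        }
    }
    return NO_GAP;
}
```

      The sketch only shows how the constraints combine; it does not
      reproduce the O(log(N)) behaviour that rb_subtree_gap makes possible.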
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: rearrange vm_area_struct for fewer cache misses · e4c6bfd2
      Rik van Riel authored
      The kernel walks the VMA rbtree in various places, including the page
      fault path.  However, the vm_rb node spanned two cache lines on 64-bit
      systems with 64-byte cache lines (most x86 systems).
      
      Rearrange vm_area_struct a little, so all the information we need to do a
      VMA tree walk is in the first cache line.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: check rb_subtree_gap correctness · 5a0768f6
      Michel Lespinasse authored
      When CONFIG_DEBUG_VM_RB is enabled, check that rb_subtree_gap is correctly
      set for every vma and that mm->highest_vm_end is also correct.
      
      Also add an explicit 'bug' variable to track if browse_rb() detected any
      invalid condition.
      
      [akpm@linux-foundation.org: repair innovative coding-style inventions]
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: augment vma rbtree with rb_subtree_gap · d3737187
      Michel Lespinasse authored
      Define vma->rb_subtree_gap as the largest gap between any vma in the
      subtree rooted at that vma, and their predecessor.  Or, for a recursive
      definition, vma->rb_subtree_gap is the max of:
      
       - vma->vm_start - vma->vm_prev->vm_end
       - rb_subtree_gap fields of the vmas pointed by vma->rb.rb_left and
         vma->rb.rb_right
      
      This will allow get_unmapped_area_* to find a free area of the right
      size in O(log(N)) time, instead of potentially having to do a linear
      walk across all the VMAs.
      
      Also define mm->highest_vm_end as the vm_end field of the highest vma,
      so that we can easily check if the following gap is suitable.
      
      This does have the potential to make unmapping VMAs more expensive,
      especially for processes with very large numbers of VMAs, where the VMA
      rbtree can grow quite deep.
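
      The recursive definition can be sketched on a plain binary search
      tree in user-space C.  struct node, update_gaps() and lowest_fit() are
      hypothetical names for this illustration; the kernel keeps the value
      up to date through its augmented rbtree callbacks instead of the full
      post-order recomputation used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    uint64_t start, end;        /* mapping [start, end) */
    uint64_t prev_end;          /* end of the predecessor mapping, 0 if none */
    uint64_t subtree_gap;       /* cached largest gap in this subtree */
    struct node *left, *right;
};

static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t m = a > b ? a : b;

    return m > c ? m : c;
}

/* Recompute the cached value for every node, children first. */
static uint64_t update_gaps(struct node *n)
{
    if (!n)
        return 0;

    uint64_t l = update_gaps(n->left);
    uint64_t r = update_gaps(n->right);

    assert(n->start >= n->prev_end);
    n->subtree_gap = max3(n->start - n->prev_end, l, r);
    return n->subtree_gap;
}

/*
 * Return the lowest-addressed node whose preceding gap holds len bytes,
 * or NULL.  Subtrees whose cached gap is too small are skipped, which
 * bounds the walk by the height of the tree.
 */
static const struct node *lowest_fit(const struct node *n, uint64_t len)
{
    while (n && n->subtree_gap >= len) {
        if (n->left && n->left->subtree_gap >= len) {
            n = n->left;        /* a lower gap exists on the left */
            continue;
        }
        if (n->start - n->prev_end >= len)
            return n;           /* the gap right before this node fits */
        n = n->right;           /* otherwise it must be on the right */
    }
    return NULL;
}
```

      The gap found by lowest_fit() starts at the returned node's prev_end.
      A gap above the highest mapping is not covered by any node, which is
      why the commit also tracks mm->highest_vm_end.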
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftests: add a test program for variable huge page sizes in mmap/shmget · fcc1f2d5
      Andi Kleen authored
      Also remove -Wextra because gcc-4.6 emits lots of irritating
      signed/unsigned comparison warnings.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB · 42d7395f
      Andi Kleen authored
      There was some desire in large applications using MAP_HUGETLB or
      SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
      others.  This is useful together with NUMA policy: use 2MB interleaving
      on some mappings, but 1GB on local mappings.
      
      This patch extends the IPC/SHM syscall interfaces slightly to allow
      specifying the page size.
      
      It borrows some upper bits in the existing flag arguments and allows
      encoding the log of the desired page size in addition to the *_HUGETLB
      flag.  When 0 is specified, the default size is used, which keeps the
      change fully compatible.
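
      A minimal sketch of this encoding, in user-space C.  The EX_* constants
      are assumptions chosen to mirror a 6-bit size field starting at bit 26;
      check the exported headers for the real MAP_HUGE_* definitions before
      relying on them.  encode_huge_size() and decode_huge_size() are
      hypothetical helpers, not kernel or libc functions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_HUGE_SHIFT 26u     /* assumed position of the size field */
#define EX_HUGE_MASK  0x3fu   /* assumed width: 6 bits for log2(size) */

/* Store log2(page_size) in the upper bits of an existing flag word. */
static uint32_t encode_huge_size(uint32_t flags, uint64_t page_size)
{
    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint32_t log2 = 0;

    while ((page_size >> log2) != 1)
        log2++;

    return flags | ((log2 & EX_HUGE_MASK) << EX_HUGE_SHIFT);
}

/* Recover the requested size; 0 means "use the default huge page size". */
static uint64_t decode_huge_size(uint32_t flags)
{
    uint32_t log2 = (flags >> EX_HUGE_SHIFT) & EX_HUGE_MASK;

    return log2 ? (uint64_t)1 << log2 : 0;
}
```

      A caller would OR the encoded value into the flags it already passes
      together with the *_HUGETLB flag; leaving the field at 0 gives the
      old behaviour.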
      
      Extending the internal hugetlb code to handle this is straightforward.
      Instead of a single mount it just keeps an array of them and selects the
      right mount based on the specified page size.  When no page size is
      specified it uses the mount of the default page size.
      
      The change is not visible in /proc/mounts because internal mounts don't
      appear there.  It also has very little overhead: the additional mounts
      just consume a super block, but not more memory when not used.
      
      I also exported the new flags to the user headers (they were previously
      under __KERNEL__).  Right now, symbols for 1GB and 2MB are defined only
      for x86 and some other architectures.  The interface should already
      work for all other architectures though.  Only architectures that define
      multiple hugetlb sizes actually need it (currently x86, tile, and
      powerpc).  However, tile and powerpc have user-configurable hugetlb
      sizes, so it's not easy to add defines.  A program on those
      architectures would need to query sysfs and use the appropriate log2.
      
      [akpm@linux-foundation.org: cleanups]
      [rientjes@google.com: fix build]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hwpoison: fix action_result() to print out dirty/clean · ff604cf6
      Naoya Horiguchi authored
      action_result() fails to print out "dirty" even if an error occurred on
      a dirty pagecache page, because by the time we check PageDirty in
      action_result() the flag has been cleared by page isolation, even if
      the page was dirty before error handling.  This can break applications
      that monitor this message, so it should be fixed.
      
      There are several callers of action_result() other than page_action(),
      but none of them handle LRU pages, only free pages or kernel pages,
      so we don't have to consider dirty or not for them.
      
      Note that PG_dirty can be set outside page locks as described in commit
      6746aff7 ("HWPOISON: shmem: call set_page_dirty() with locked
      page"), so this patch does not completely close the race window, but
      just narrows it.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dmapool: make DMAPOOL_DEBUG detect corruption of free marker · 5de55b26
      Matthieu CASTET authored
      This can help to catch the case where hardware is writing after dma free.
      
      [akpm@linux-foundation.org: tidy code, fix comment, use sizeof(page->offset), use pr_err()]
      Signed-off-by: Matthieu Castet <matthieu.castet@parrot.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, oom: allow exiting threads to have access to memory reserves · 9ff4868e
      David Rientjes authored
      Exiting threads, those with PF_EXITING set, can pagefault and require
      memory before they can make forward progress.  This happens, for instance,
      when a process must fault task->robust_list, a userspace structure, before
      detaching its memory.
      
      These threads also aren't guaranteed to get access to memory reserves
      unless oom killed or killed from userspace.  The oom killer won't grant
      memory reserves if threads other than current are also exiting and
      stalling at the same point.  This prevents needlessly killing processes
      when others are already exiting.
      
      Instead of special casing all the possible situations between PF_EXITING
      getting set and a thread detaching its mm where it may allocate memory,
      which probably wouldn't get updated when a change is made to the exit
      path, the solution is to give all exiting threads access to memory
      reserves if they call the oom killer.  This allows them to quickly
      allocate, detach their mm, and free the memory it represents.
      
      Summary of Luigi's bug report:
      
      : He had an oom condition where threads were faulting on task->robust_list
      : and repeatedly called the oom killer but it would defer killing a thread
      : because it saw other PF_EXITING threads.  This can happen anytime we need
      : to allocate memory after setting PF_EXITING and before detaching our mm;
      : if there are other threads in the same state then the oom killer won't do
      : anything unless one of them happens to be killed from userspace.
      :
      : So instead of only deferring for PF_EXITING and !task->robust_list, it's
      : better to just give them access to memory reserves to prevent a potential
      : livelock so that any other faults that may be introduced in the future in
      : the exit path don't cause the same problem (and hopefully we don't allow
      : too many of those!).
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Tested-by: Luigi Semenzato <semenzato@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Documentation/cgroups/memory.txt: s/mem_cgroup_charge/mem_cgroup_change_common/ · 348b4655
      Jeff Liu authored
      mem_cgroup_charge_common() is invoked as the entry point for cgroup limits
      charge rather than mem_cgroup_charge(), as the latter was removed years
      ago.  Update the cgroup/memory.txt to reflect this change.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: set the accessed flag for old pages on access fault · a1dd450b
      Will Deacon authored
      On x86 memory accesses to pages without the ACCESSED flag set result in
      the ACCESSED flag being set automatically.  With the ARM architecture a
      page access fault is raised instead (and it will continue to be raised
      until the ACCESSED flag is set for the appropriate PTE/PMD).
      
      For normal memory pages, handle_pte_fault will call pte_mkyoung
      (effectively setting the ACCESSED flag).  For transparent huge pages,
      pmd_mkyoung will only be called for a write fault.
      
      This patch ensures that faults on transparent hugepages which do not
      result in a CoW update the access flags for the faulting pmd.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Ni zhan Chen <nizhan.chen@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, highmem: get virtual address of the page using PKMAP_ADDR() · eb2db439
      Joonsoo Kim authored
      In flush_all_zero_pkmaps(), we have an index of the pkmap associated with
      the page.  Using this index, we can simply get the virtual address of the
      page with PKMAP_ADDR(), so do that.
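
      For illustration, the index/address relationship behind PKMAP_NR() and
      PKMAP_ADDR() can be sketched as two inverse helpers.  The base address,
      page shift and slot count below are made-up values for a user-space
      example, and ex_pkmap_nr()/ex_pkmap_addr() are hypothetical stand-ins
      for the per-architecture macros.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12u                      /* assumption: 4 KiB pages */
#define EX_PKMAP_BASE UINT64_C(0xff800000)     /* hypothetical window base */
#define EX_LAST_PKMAP 512u                     /* hypothetical slot count */

/* Slot index of a virtual address inside the pkmap window. */
static unsigned int ex_pkmap_nr(uint64_t vaddr)
{
    assert(vaddr >= EX_PKMAP_BASE);
    return (unsigned int)((vaddr - EX_PKMAP_BASE) >> EX_PAGE_SHIFT);
}

/* Virtual address of the first byte of slot nr. */
static uint64_t ex_pkmap_addr(unsigned int nr)
{
    assert(nr < EX_LAST_PKMAP);
    return EX_PKMAP_BASE + ((uint64_t)nr << EX_PAGE_SHIFT);
}
```

      Because the two helpers are inverses on page-aligned addresses, code
      that already holds the slot index does not need a separate lookup to
      obtain the address.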
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, highmem: remove page_address_pool list · a354e2c8
      Joonsoo Kim authored
      We can find a free page_address_map instance without the
      page_address_pool.  So remove it.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, highmem: remove useless pool_lock · cc33a303
      Joonsoo Kim authored
      The pool_lock protects the page_address_pool from concurrent access.  But,
      access to the page_address_pool is already protected by kmap_lock.  So
      remove it.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, highmem: use PKMAP_NR() to calculate an index of pkmap · 4de22c05
      Joonsoo Kim authored
      To calculate an index of pkmap, using PKMAP_NR() is more understandable
      and maintainable, so change it.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do not call frontswap_init() during swapoff · 6555bc03
      Cesar Eduardo Barros authored
      The call to frontswap_init() was added within enable_swap_info(), which
      was called not only during sys_swapon, but also to reinsert the swap_info
      into the swap_list in case of failure of try_to_unuse() within
      sys_swapoff.  This means that frontswap_init() might be called more than
      once for the same swap area.
      
      While as far as I could see no frontswap implementation has any problem
      with it (and in fact, all the ones I found ignore the parameter passed to
      frontswap_init), this could change in the future.
      
      To prevent future problems, move the call to frontswap_init() to outside
      the code shared between sys_swapon and sys_swapoff.
      Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: refactor reinsert of swap_info in sys_swapoff() · cf0cac0a
      Cesar Eduardo Barros authored
      The block within sys_swapoff() which re-inserts the swap_info into the
      swap_list in case of failure of try_to_unuse() reads a few values outside
      the swap_lock.  While this is safe at that point, it is subtle code.
      
      Simplify the code by moving the reading of these values to a separate
      function, refactoring it a bit so they are read from within the swap_lock.
      This is easier to understand, and better matches the way it worked before
      I unified the insertion of the swap_info from both sys_swapon and
      sys_swapoff.
      
      This change should make no functional difference.  The only real change is
      moving the read of two or three structure fields to within the lock
      (frontswap_map_get() is nothing more than a read of p->frontswap_map).
      Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,vmscan: only evict file pages when we have plenty · e9868505
      Rik van Riel authored
      If we have more inactive file pages than active file pages, we skip
      scanning the active file pages altogether, with the idea that we do not
      want to evict the working set when there is plenty of streaming IO in the
      cache.
      
      However, the code forgot to also skip scanning anonymous pages in that
      situation.  That leads to the curious situation of keeping the active file
      pages protected from being paged out when there are lots of inactive file
      pages, while still scanning and evicting anonymous pages.
      
      This patch fixes that situation, by only evicting file pages when we have
      plenty of them and most are inactive.
      
      [akpm@linux-foundation.org: adjust comment layout]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add comment on storage key dirty bit semantics · e749eb95
      Jan Kara authored
      Add comments noting that the dirty bit in the storage key gets set
      whenever page content is changed.  Hopefully anyone who uses this function
      will look at one of the two places where we comment on this.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug.c: update start_pfn in zone and pg_data when spanned_pages == 0. · 712cd386
      Tang Chen authored
      If we hot-remove memory only and leave the cpus alive, the corresponding
      node will not be removed.  But the node_start_pfn and node_spanned_pages
      in pg_data will be reset to 0.  In this case, when we hot-add the memory
      back next time, the node_start_pfn will always be 0 because no pfn is less
      than 0.  After that, if we hot-remove the memory again, it will cause a
      kernel panic in find_biggest_section_pfn() when it tries to scan
      all the pfns.
      
      The zone will also have the same problem.
      
      This patch sets start_pfn to the start_pfn of the section being added when
      spanned_pages of the zone or pg_data is 0.
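
      A minimal userspace sketch of that rule follows.  struct span and
      grow_span() are hypothetical stand-ins for the zone/pg_data span
      bookkeeping, not the kernel functions themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the span fields of a zone or pg_data. */
struct span {
    unsigned long start_pfn;
    unsigned long spanned_pages;
};

/*
 * Extend the span to cover [start_pfn, end_pfn).  An empty span
 * (spanned_pages == 0) has no meaningful start, so it simply adopts
 * the start of the section being added instead of keeping a stale 0.
 */
void grow_span(struct span *s, unsigned long start_pfn, unsigned long end_pfn)
{
    unsigned long old_end;

    assert(s != NULL && start_pfn < end_pfn);

    if (s->spanned_pages == 0) {
        s->start_pfn = start_pfn;
        s->spanned_pages = end_pfn - start_pfn;
        return;
    }

    old_end = s->start_pfn + s->spanned_pages;
    if (start_pfn < s->start_pfn)
        s->start_pfn = start_pfn;
    if (end_pfn < old_end)
        end_pfn = old_end;
    s->spanned_pages = end_pfn - s->start_pfn;
}
```

      Without the spanned_pages == 0 case, a span that was reset to
      { 0, 0 } would keep start_pfn at 0 after re-adding memory, because no
      added pfn can be lower than 0.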
      
        ---How to reproduce---
      
      1. hot-add a container with some memory and cpus;
      2. hot-remove the container's memory, and leave cpus there;
      3. hot-add this memory again;
      4. hot-remove them again;
      
      then, the kernel will panic.
      
        ---Call trace---
      
        BUG: unable to handle kernel paging request at 00000fff82a8cc38
        IP: [<ffffffff811c0d55>] find_biggest_section_pfn+0xe5/0x180
        ......
        Call Trace:
         [<ffffffff811c1124>] __remove_zone+0x184/0x1b0
         [<ffffffff811c11dc>] __remove_section+0x8c/0xb0
         [<ffffffff811c12e7>] __remove_pages+0xe7/0x120
         [<ffffffff81654f7c>] arch_remove_memory+0x2c/0x80
         [<ffffffff81655bb6>] remove_memory+0x56/0x90
         [<ffffffff813da0c8>] acpi_memory_device_remove_memory+0x48/0x73
         [<ffffffff813da55a>] acpi_memory_device_notify+0x153/0x274
         [<ffffffff813b6786>] acpi_ev_notify_dispatch+0x41/0x5f
         [<ffffffff813a3867>] acpi_os_execute_deferred+0x27/0x34
         [<ffffffff81090589>] process_one_work+0x219/0x680
         [<ffffffff810923be>] worker_thread+0x12e/0x320
         [<ffffffff81098396>] kthread+0xc6/0xd0
         [<ffffffff8167c7c4>] kernel_thread_helper+0x4/0x10
        ......
        ---[ end trace 96d845dbf33fee11 ]---
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      712cd386
    • slub, hotplug: ignore unrelated node's hot-adding and hot-removing · b9d5ab25
      Lai Jiangshan authored
      SLUB only cares about nodes which have normal memory, so it should
      ignore hot-adding and hot-removing on other nodes.
      
      That is: if memory is onlined on a node which had no onlined memory, but
      the newly onlined memory is not normal memory (for example, highmem), we
      should not allocate kmem_cache_node for SLUB.
      
      And if the last normal memory is offlined but the node still has memory,
      we should remove kmem_cache_node for that node.  (The current code delays
      this until all of the node's memory is offlined.)
      
      So we only do something when marg->status_change_nid_normal > 0.
      marg->status_change_nid is not suitable here.
      
      The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
      for every node even if the node doesn't have normal memory; SLAB tolerates
      kmem_list3 on alien nodes.  SLUB only cares about nodes which have
      normal memory and doesn't tolerate an alien kmem_cache_node.  The patch
      makes SLUB self-consistent and avoids WARNs and BUGs in rare conditions.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9d5ab25
    • memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY] · d9713679
      Lai Jiangshan authored
      Currently memory_hotplug only manages node_states[N_HIGH_MEMORY]; it
      forgets to manage node_states[N_NORMAL_MEMORY].  This may cause
      node_states[N_NORMAL_MEMORY] to become incorrect.
      
      For example, suppose a node is empty before onlining and we online memory
      which is in ZONE_NORMAL.  After onlining, node_states[N_HIGH_MEMORY] is
      correct but node_states[N_NORMAL_MEMORY] is not, because the online code
      doesn't set the newly onlined node in node_states[N_NORMAL_MEMORY].
      
      The same thing will happen when offlining (the offline code doesn't clear
      the node from node_states[N_NORMAL_MEMORY] when needed).  Some memory
      management code depends on node_states[N_NORMAL_MEMORY], so we have to
      fix up node_states[N_NORMAL_MEMORY].
      
      We add node_states_check_changes_online() and
      node_states_check_changes_offline() to detect whether
      node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] are changed
      while hotplugging.
      
      Also add @status_change_nid_normal to struct memory_notify, so that the
      memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY] has
      changed.  (We could add a @flags field and reuse @status_change_nid
      instead of introducing @status_change_nid_normal, but that would add much
      more complexity to the memory hotplug callback in every subsystem.  So
      introducing @status_change_nid_normal is better, and it doesn't change the
      semantics of @status_change_nid.)
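
      The online-side check can be illustrated with a small userspace sketch.
      Everything here is hypothetical (struct node_info, struct notify_arg,
      check_changes_online(), and the use of -1 to mean "no change"); it only
      mirrors the idea of reporting the two state changes separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the memory being onlined. */
enum zone_kind { ZONE_KIND_NORMAL, ZONE_KIND_HIGHMEM };

/* Hypothetical per-node page counts before the online operation. */
struct node_info {
    int nid;
    unsigned long normal_pages;
    unsigned long high_pages;
};

/* Assumption: -1 means "this node state does not change". */
struct notify_arg {
    int status_change_nid;        /* node gains its first memory of any kind */
    int status_change_nid_normal; /* node gains its first normal memory */
};

void check_changes_online(const struct node_info *node, enum zone_kind kind,
                          struct notify_arg *arg)
{
    bool had_any, had_normal;

    assert(node != NULL && arg != NULL);
    had_any = node->normal_pages + node->high_pages > 0;
    had_normal = node->normal_pages > 0;

    arg->status_change_nid = had_any ? -1 : node->nid;
    arg->status_change_nid_normal =
        (kind == ZONE_KIND_NORMAL && !had_normal) ? node->nid : -1;
}
```

      A callback that only cares about normal memory can then look at
      status_change_nid_normal alone and ignore nodes that merely gained
      highmem.
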
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9713679
    • memory-hotplug: allocate zone's pcp before onlining pages · 6dcd73d7
      Wen Congyang authored
      We use __free_page() to put a page into the buddy system when onlining
      pages.  __free_page() will store NR_FREE_PAGES in the zone's
      pcp.vm_stat_diff, so we should allocate the zone's pcp before onlining
      pages; otherwise we will lose some free pages.
      
      [mhocko@suse.cz: make zone_pcp_reset independent of MEMORY_HOTREMOVE]
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6dcd73d7
    • memory-hotplug, mm/sparse.c: clear the memory to store struct page · 3ac19f8e
      Wen Congyang authored
      If sparse memory vmemmap is enabled, we can't free the memory used to
      store struct page when a memory device is hot-removed, because that memory
      may hold struct pages that manage memory which doesn't belong to
      this memory device.  When we hot-add this memory device again, we will
      reuse this memory to store struct page, and struct page may contain some
      obsolete information, and we will get a bad-page state:
      
        init_memory_mapping: [mem 0x80000000-0x9fffffff]
        Built 2 zonelists in Node order, mobility grouping on.  Total pages: 547617
        Policy zone: Normal
        BUG: Bad page state in process bash  pfn:9b6dc
        page:ffffea0002200020 count:0 mapcount:0 mapping:          (null) index:0xfdfdfdfdfdfdfdfd
        page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
        Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
        Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
        Call Trace:
         [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
         [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
         [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
         [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
         [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
         [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
         [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
         [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
         [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
         [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
         [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
         [<ffffffff81137977>] ? sys_write+0x47/0x90
         [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
         [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
        Disabling lock debugging due to kernel taint
      
      This patch clears the memory used to store struct page to avoid unexpected errors.
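
      The idea can be shown with a small userspace sketch.  struct fake_page
      and init_reused_memmap() are hypothetical; the 0xfd fill used in the
      usage note below only imitates the stale pattern visible in the report
      above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct fake_page {
    unsigned long flags;
    unsigned long index;
    int count;
};

/*
 * Zero a reused metadata array before it describes newly added memory,
 * so that no field carries over state from its previous use.
 */
void init_reused_memmap(struct fake_page *map, size_t nr)
{
    assert(map != NULL);
    memset(map, 0, nr * sizeof(*map));
}
```

      Filling an array with 0xfd bytes and then calling init_reused_memmap()
      leaves every flags, index and count field at zero, which is the state a
      freshly onlined page is expected to start from in this sketch.
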
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ac19f8e
    • memory-hotplug: suppress "Device nodeX does not have a release() function" warning · 8c7b5b4e
      Yasuaki Ishimatsu authored
      When calling unregister_node(), the function shows the following message
      at device_release().
      
      "Device 'node2' does not have a release() function, it is broken and must
      be fixed."
      
      The reason is that the node's device struct does not have a release()
      function.
      
      So the patch registers node_device_release() as the device's release()
      function to suppress the warning message.  Additionally, the patch
      adds a memset() to register_node() to initialize the node struct, because
      the node struct is part of the node_devices[] array and cannot be freed by
      node_device_release().  Without the memset(), a reused node struct would
      contain garbage.
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c7b5b4e
    • numa: convert static memory to dynamically allocated memory for per node device · 8732794b
      Wen Congyang authored
      We use a static array to store struct node.  In many cases, we don't have
      too many nodes, and some memory will be unused.  Convert it to per-device
      dynamically allocated memory.
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8732794b
    • memory-hotplug: fix NR_FREE_PAGES mismatch · 97d0da22
      Wen Congyang authored
      NR_FREE_PAGES will be wrong after offlining pages.  We add/dec
      NR_FREE_PAGES like this now:
      
      1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
      
      2. don't add NR_FREE_PAGES when a page is freed and its migratetype is
         MIGRATE_ISOLATE
      
      3. dec NR_FREE_PAGES when offlining isolated pages.
      
      4. add NR_FREE_PAGES when undoing isolate pages.
      
      When we come to step 3, all pages are in the MIGRATE_ISOLATE list, and
      NR_FREE_PAGES is right.  When we come to step 4, none of the pages are in
      the buddy system, so we don't change NR_FREE_PAGES in this step, but we
      did change NR_FREE_PAGES in step 3.  So NR_FREE_PAGES is wrong after
      offlining pages.  So there is no need to change NR_FREE_PAGES in step 3.
      
      This patch also fixes a problem in step 2: if the migratetype is
      MIGRATE_ISOLATE, we should not add NR_FREE_PAGES when we remove pages from
      pcppages.
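
      The double decrement can be seen in a toy model of the counters.  This
      is a userspace sketch under stated assumptions: struct zone_counters and
      both functions are hypothetical, and the old_behaviour flag exists only
      to contrast the accounting before and after the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters for one zone. */
struct zone_counters {
    long nr_free;     /* models NR_FREE_PAGES */
    long nr_isolated; /* free pages parked on the isolate list */
};

/* Step 1: isolating free pages already removes them from nr_free. */
void isolate_free_pages(struct zone_counters *z, long n)
{
    assert(z != NULL && n >= 0 && n <= z->nr_free);
    z->nr_free -= n;
    z->nr_isolated += n;
}

/*
 * Step 3: offlining the isolated pages.  With old_behaviour set, the
 * pages are subtracted from nr_free a second time, which is the
 * mismatch described above.  Otherwise nr_free is left untouched.
 */
void offline_isolated_pages(struct zone_counters *z, bool old_behaviour)
{
    assert(z != NULL);
    if (old_behaviour)
        z->nr_free -= z->nr_isolated;
    z->nr_isolated = 0;
}
```

      Starting from 100 free pages and offlining a range that contains 30 of
      them, the model keeps nr_free at 70 when step 3 leaves the counter
      alone, while the old behaviour drops it to 40 even though 70 pages
      outside the range are still free.
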
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Jianguo Wu <wujianguo106@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      97d0da22
    • memory-hotplug: auto offline page_cgroup when onlining memory block failed · 7c72eb32
      Wen Congyang authored
      When a memory block is onlined, we will try to allocate memory on that
      node to store page_cgroup.  If onlining the memory block fails, we don't
      offline the page cgroup, and we have no chance to offline this page cgroup
      unless the memory block is later onlined successfully.  As a result,
      we can't hot-remove the memory device on that node, because some memory is
      used to store page cgroup.  If onlining the memory block fails, there
      is no need to store page cgroup for this memory.  So automatically offline
      page_cgroup when onlining a memory block fails.
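
      The rollback pattern looks roughly like this in userspace C.  All names
      are hypothetical (struct mem_block, online_block(), and the two example
      callbacks); calloc()/free() stand in for allocating and offlining the
      per-page metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory block with per-page metadata attached. */
struct mem_block {
    size_t nr_pages;
    void *page_meta; /* stand-in for page_cgroup storage */
    bool online;
};

typedef bool (*online_fn)(struct mem_block *blk);

/* Example callbacks, used only to exercise both paths. */
bool online_ok(struct mem_block *blk) { (void)blk; return true; }
bool online_fail(struct mem_block *blk) { (void)blk; return false; }

/*
 * Allocate the metadata, then try to online the block.  If onlining
 * fails, release the metadata right away so it does not pin memory
 * on the node.  Returns 0 on success and -1 on failure.
 */
int online_block(struct mem_block *blk, online_fn do_online)
{
    assert(blk != NULL && do_online != NULL && blk->nr_pages > 0);

    blk->page_meta = calloc(blk->nr_pages, sizeof(unsigned long));
    if (blk->page_meta == NULL)
        return -1;

    if (!do_online(blk)) {
        free(blk->page_meta); /* undo the allocation on failure */
        blk->page_meta = NULL;
        return -1;
    }

    blk->online = true;
    return 0;
}
```

      After a failed online_block() call the block holds no metadata, so
      nothing from the failed attempt stands in the way of removing the
      device later.
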
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7c72eb32
    • memory-hotplug: update mce_bad_pages when removing the memory · 95a4774d
      Wen Congyang authored
      When we hot-remove a memory device, we will free the memory used to store
      struct page.  If the page is a hwpoisoned page, we should decrease
      mce_bad_pages.
      
      [akpm@linux-foundation.org: cleanup ifdefs]
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      95a4774d
    • memory-hotplug: skip HWPoisoned page when offlining pages · b023f468
      Wen Congyang authored
      hwpoisoned may be set when we offline a page by the sysfs interface
      /sys/devices/system/memory/soft_offline_page or
      /sys/devices/system/memory/hard_offline_page. If we don't clear
      this flag when onlining pages, this page can't be freed, and will
      not be in the free list. So we can't offline these pages again. So we
      should skip such pages when offlining pages.
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b023f468
    • memory hotplug: suppress "Device memoryX does not have a release() function" warning · fa7194eb
      Yasuaki Ishimatsu authored
      When calling remove_memory_block(), the function shows the following
      message at device_release().
      
      "Device 'memory528' does not have a release() function, it is broken and
      must be fixed."
      
      The reason is that memory_block's device struct does not have a release()
      function.
      
      So the patch registers memory_block_release() as the device's release()
      function to suppress the warning message.  Additionally, the patch
      moves kfree(mem) into the release function, since the release function is
      meant to be the place where a memory_block struct is freed.
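
      The pattern of freeing the containing object from a release callback
      can be sketched in userspace C.  This is a simplified imitation, not
      the driver core: struct device here has only a plain reference count,
      and memory_block_create(), device_get(), device_put() and the
      released_blocks counter are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a refcounted device with a release hook. */
struct device {
    int refcount;
    void (*release)(struct device *dev);
};

struct memory_block {
    int id;
    struct device dev; /* embedded, so dev shares the block's lifetime */
};

static int released_blocks; /* demonstration counter only */

/* Recover the containing memory_block from its device and free it. */
static void memory_block_release(struct device *dev)
{
    struct memory_block *mem = (struct memory_block *)
        ((char *)dev - offsetof(struct memory_block, dev));

    free(mem);
    released_blocks++;
}

struct memory_block *memory_block_create(int id)
{
    struct memory_block *mem = calloc(1, sizeof(*mem));

    if (mem == NULL)
        return NULL;
    mem->id = id;
    mem->dev.refcount = 1;
    mem->dev.release = memory_block_release;
    return mem;
}

void device_get(struct device *dev)
{
    dev->refcount++;
}

/* Drop one reference; the last put triggers the release callback. */
void device_put(struct device *dev)
{
    assert(dev != NULL && dev->refcount > 0);
    if (--dev->refcount == 0 && dev->release != NULL)
        dev->release(dev);
}
```

      Because the free happens in the release callback, the block stays valid
      for as long as anyone still holds a reference to its device, and it is
      freed exactly once when the last reference is dropped.
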
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa7194eb
    • thp: cleanup: introduce mk_huge_pmd() · b3092b3b
      Bob Liu authored
      Introduce mk_huge_pmd() to simplify the code.
      Signed-off-by: Bob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Ni zhan Chen <nizhan.chen@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3092b3b