06 Dec, 2014 (15 commits)
  21 Nov, 2014 (25 commits)
    • Linux 3.14.25 · 2dc25659
      Greg Kroah-Hartman authored
    • mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced · ee78ce5d
      Vlastimil Babka authored
      commit 5bcc9f86 upstream.
      
      For the MIGRATE_RESERVE pages, it is useful when they do not get
      misplaced on the free_list of another migratetype, otherwise they might
      get allocated prematurely and e.g. fragment the MIGRATE_RESERVE
      pageblocks.  While this cannot be avoided completely when allocating new
      MIGRATE_RESERVE pageblocks in the min_free_kbytes sysctl handler, we
      should prevent the misplacement where possible.
      
      Currently, it is possible for the misplacement to happen when a
      MIGRATE_RESERVE page is allocated on pcplist through rmqueue_bulk() as a
      fallback for other desired migratetype, and then later freed back
      through free_pcppages_bulk() without being actually used.  This happens
      because free_pcppages_bulk() uses get_freepage_migratetype() to choose
      the free_list, and rmqueue_bulk() calls set_freepage_migratetype() with
      the *desired* migratetype and not the page's original MIGRATE_RESERVE
      migratetype.
      
      This patch fixes the problem by moving the call to
      set_freepage_migratetype() from rmqueue_bulk() down to
      __rmqueue_smallest() and __rmqueue_fallback() where the actual page's
      migratetype (i.e. the free_list the page is taken from) is used.
      Note that this migratetype might be different from the pageblock's
      migratetype due to freepage stealing decisions.  This is OK, as page
      stealing never uses MIGRATE_RESERVE as a fallback, and also takes care
      to leave all MIGRATE_CMA pages on the correct freelist.
      
      Therefore, as an additional benefit, the call to
      get_pageblock_migratetype() from rmqueue_bulk() when CMA is enabled, can
      be removed completely.  This relies on the fact that MIGRATE_CMA
      pageblocks are created only during system init, and the above.  The
      related is_migrate_isolate() check is also unnecessary, as memory
      isolation has other ways to move pages between freelists, and drain pcp
      lists containing pages that should be isolated.  The buffered_rmqueue()
      can also benefit from calling get_freepage_migratetype() instead of
      get_pageblock_migratetype().
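
      The mechanics are easy to model outside the kernel.  A minimal userspace
      sketch of the before/after tagging (an invented model, not the upstream
      diff; the MT_* names and the two-list setup are hypothetical):

        /* model.c: a freed page goes back to the list named by its tag */
        #include <stdio.h>

        enum mt { MT_MOVABLE, MT_RESERVE };

        struct page { enum mt freepage_mt; };

        static void set_freepage_migratetype(struct page *p, enum mt mt)
        {
                p->freepage_mt = mt;
        }

        int main(void)
        {
                struct page page;
                enum mt desired = MT_MOVABLE;  /* what the caller asked for */
                enum mt actual  = MT_RESERVE;  /* list the page came from   */

                /* old: rmqueue_bulk() tagged with the desired type, so the
                 * free path later put the page on the MOVABLE list */
                set_freepage_migratetype(&page, desired);
                printf("old: freed to list %d (misplaced)\n", page.freepage_mt);

                /* new: __rmqueue_smallest()/__rmqueue_fallback() tag with
                 * the list the page was actually taken from */
                set_freepage_migratetype(&page, actual);
                printf("new: freed to list %d (correct)\n", page.freepage_mt);
                return 0;
        }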
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
      Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Suggested-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: "Wang, Yalin" <Yalin.Wang@sonymobile.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY · 24fa0530
      Mel Gorman authored
      commit 1a501907 upstream.
      
      Commit "mm: vmscan: obey proportional scanning requirements for kswapd"
      ensured that file/anon lists were scanned proportionally for reclaim from
      kswapd but ignored it for direct reclaim.  The intent was to minimise
      direct reclaim latency, but Yuanhan Liu pointed out that it substitutes one
      long stall for many small stalls and distorts aging for normal workloads
      like streaming readers/writers.  Hugh Dickins pointed out that a
      side-effect of the same commit was that when one LRU list dropped to zero,
      the entirety of the other list was shrunk, leading to excessive
      reclaim in memcgs.  This patch scans the file/anon lists proportionally
      for direct reclaim to similarly age pages whether reclaimed by kswapd or
      direct reclaim but takes care to abort reclaim if one LRU drops to zero
      after reclaiming the requested number of pages.
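
      The proportional split itself is simple arithmetic; a standalone sketch
      with illustrative numbers (not taken from the patch):

        /* proportional.c: split one reclaim target across two LRU lists */
        #include <stdio.h>

        int main(void)
        {
                unsigned long anon = 1000, file = 3000;  /* assumed sizes */
                unsigned long nr_to_reclaim = 32;

                unsigned long scan_anon = nr_to_reclaim * anon / (anon + file);
                unsigned long scan_file = nr_to_reclaim * file / (anon + file);

                /* direct reclaim now stops once nr_to_reclaim pages have
                 * been reclaimed, or when one list drops to zero, rather
                 * than finishing a full pass over the other list */
                printf("scan anon=%lu file=%lu\n", scan_anon, scan_file);
                return 0;
        }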
      
      Based on ext4 and using the Intel VM scalability test
      
                                                    3.15.0-rc5            3.15.0-rc5
                                                      shrinker            proportion
      Unit  lru-file-readonce    elapsed      5.3500 (  0.00%)      5.4200 ( -1.31%)
      Unit  lru-file-readonce time_range      0.2700 (  0.00%)      0.1400 ( 48.15%)
      Unit  lru-file-readonce time_stddv      0.1148 (  0.00%)      0.0536 ( 53.33%)
      Unit lru-file-readtwice    elapsed      8.1700 (  0.00%)      8.1700 (  0.00%)
      Unit lru-file-readtwice time_range      0.4300 (  0.00%)      0.2300 ( 46.51%)
      Unit lru-file-readtwice time_stddv      0.1650 (  0.00%)      0.0971 ( 41.16%)
      
      The test cases run multiple dd instances reading sparse files. The results are within
      the noise for the small test machine. The impact of the patch is more noticeable in the vmstats
      
                                  3.15.0-rc5  3.15.0-rc5
                                    shrinker  proportion
      Minor Faults                     35154       36784
      Major Faults                       611        1305
      Swap Ins                           394        1651
      Swap Outs                         4394        5891
      Allocation stalls               118616       44781
      Direct pages scanned           4935171     4602313
      Kswapd pages scanned          15921292    16258483
      Kswapd pages reclaimed        15913301    16248305
      Direct pages reclaimed         4933368     4601133
      Kswapd efficiency                  99%         99%
      Kswapd velocity             670088.047  682555.961
      Direct efficiency                  99%         99%
      Direct velocity             207709.217  193212.133
      Percentage direct scans            23%         22%
      Page writes by reclaim        4858.000    6232.000
      Page writes file                   464         341
      Page writes anon                  4394        5891
      
      Note that there are fewer allocation stalls even though the amount
      of direct reclaim scanning is very approximately the same.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/superblock: avoid locking counting inodes and dentries before reclaiming them · 14261448
      Tim Chen authored
      commit d23da150 upstream.
      
      We remove the call to grab_super_passive in the call to super_cache_count.
      This becomes a scalability bottleneck as multiple threads try to do
      memory reclamation, e.g. when we are doing large amounts of file reads and
      the page cache is under pressure.  The cached objects quickly get reclaimed
      down to 0 and we abort the cache_scan() reclaim, but the counting
      creates a logjam acquiring the sb_lock.
      
      We are holding the shrinker_rwsem which ensures the safety of call to
      list_lru_count_node() and s_op->nr_cached_objects.  The shrinker is
      unregistered now before ->kill_sb() so the operation is safe when we are
      doing unmount.
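
      Roughly the shape of the counting path after the change (reconstructed
      from the description above, not a verbatim quote of the patch):

        static unsigned long super_cache_count(struct shrinker *shrink,
                                               struct shrink_control *sc)
        {
                struct super_block *sb =
                        container_of(shrink, struct super_block, s_shrink);
                long total_objects = 0;

                /* no grab_super_passive(): shrinker_rwsem is held, and the
                 * shrinker is now unregistered before ->kill_sb(), so the
                 * sb cannot go away under us */
                if (sb->s_op && sb->s_op->nr_cached_objects)
                        total_objects = sb->s_op->nr_cached_objects(sb, sc->nid);

                total_objects += list_lru_count_node(&sb->s_dentry_lru, sc->nid);
                total_objects += list_lru_count_node(&sb->s_inode_lru, sc->nid);

                return vfs_pressure_ratio(total_objects);
        }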
      
      The impact will depend heavily on the machine and the workload but for a
      small machine using postmark tuned to use 4xRAM size the results were
      
                                        3.15.0-rc5            3.15.0-rc5
                                           vanilla         shrinker-v1r1
      Ops/sec Transactions         21.00 (  0.00%)       24.00 ( 14.29%)
      Ops/sec FilesCreate          39.00 (  0.00%)       44.00 ( 12.82%)
      Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
      Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
      Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
      Ops/sec DataRead/MB          25.97 (  0.00%)       29.10 ( 12.05%)
      Ops/sec DataWrite/MB         49.99 (  0.00%)       56.02 ( 12.06%)
      
      ffsb running in a configuration that is meant to simulate a mail server showed
      
                                       3.15.0-rc5             3.15.0-rc5
                                          vanilla          shrinker-v1r1
      Ops/sec readall           9402.63 (  0.00%)      9567.97 (  1.76%)
      Ops/sec create            4695.45 (  0.00%)      4735.00 (  0.84%)
      Ops/sec delete             173.72 (  0.00%)       179.83 (  3.52%)
      Ops/sec Transactions     14271.80 (  0.00%)     14482.81 (  1.48%)
      Ops/sec Read                37.00 (  0.00%)        37.60 (  1.62%)
      Ops/sec Write               18.20 (  0.00%)        18.30 (  0.55%)
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/superblock: unregister sb shrinker before ->kill_sb() · e6bed540
      Dave Chinner authored
      commit 28f2cd4f upstream.
      
      This series is aimed at regressions noticed during reclaim activity.  The
      first two patches are shrinker patches that were posted ages ago but never
      merged for reasons that are unclear to me.  I'm posting them again to see
      if there was a reason they were dropped or if they just got lost.  Dave?
      Tim?  The last patch adjusts proportional reclaim.  Yuanhan Liu, can you
      retest the vm scalability test cases on a larger machine?  Hugh, does this
      work for you on the memcg test cases?
      
      Based on ext4, I get the following results but unfortunately my larger
      test machines are all unavailable so this is based on a relatively small
      machine.
      
      postmark
                                        3.15.0-rc5            3.15.0-rc5
                                           vanilla       proportion-v1r4
      Ops/sec Transactions         21.00 (  0.00%)       25.00 ( 19.05%)
      Ops/sec FilesCreate          39.00 (  0.00%)       45.00 ( 15.38%)
      Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
      Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
      Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
      Ops/sec DataRead/MB          25.97 (  0.00%)       30.02 ( 15.59%)
      Ops/sec DataWrite/MB         49.99 (  0.00%)       57.78 ( 15.58%)
      
      ffsb (mail server simulator)
                                       3.15.0-rc5             3.15.0-rc5
                                          vanilla        proportion-v1r4
      Ops/sec readall           9402.63 (  0.00%)      9805.74 (  4.29%)
      Ops/sec create            4695.45 (  0.00%)      4781.39 (  1.83%)
      Ops/sec delete             173.72 (  0.00%)       177.23 (  2.02%)
      Ops/sec Transactions     14271.80 (  0.00%)     14764.37 (  3.45%)
      Ops/sec Read                37.00 (  0.00%)        38.50 (  4.05%)
      Ops/sec Write               18.20 (  0.00%)        18.50 (  1.65%)
      
      dd of a large file
                                      3.15.0-rc5            3.15.0-rc5
                                         vanilla       proportion-v1r4
      WallTime DownloadTar       75.00 (  0.00%)       61.00 ( 18.67%)
      WallTime DD               423.00 (  0.00%)      401.00 (  5.20%)
      WallTime Delete             2.00 (  0.00%)        5.00 (-150.00%)
      
      stutter (times mmap latency during large amounts of IO)
      
                                  3.15.0-rc5            3.15.0-rc5
                                     vanilla       proportion-v1r4
      Unit >5ms Delays  80252.0000 (  0.00%)  81523.0000 ( -1.58%)
      Unit Mmap min         8.2118 (  0.00%)      8.3206 ( -1.33%)
      Unit Mmap mean       17.4614 (  0.00%)     17.2868 (  1.00%)
      Unit Mmap stddev     24.9059 (  0.00%)     34.6771 (-39.23%)
      Unit Mmap max      2811.6433 (  0.00%)   2645.1398 (  5.92%)
      Unit Mmap 90%        20.5098 (  0.00%)     18.3105 ( 10.72%)
      Unit Mmap 93%        22.9180 (  0.00%)     20.1751 ( 11.97%)
      Unit Mmap 95%        25.2114 (  0.00%)     22.4988 ( 10.76%)
      Unit Mmap 99%        46.1430 (  0.00%)     43.5952 (  5.52%)
      Unit Ideal  Tput     85.2623 (  0.00%)     78.8906 (  7.47%)
      Unit Tput min        44.0666 (  0.00%)     43.9609 (  0.24%)
      Unit Tput mean       45.5646 (  0.00%)     45.2009 (  0.80%)
      Unit Tput stddev      0.9318 (  0.00%)      1.1084 (-18.95%)
      Unit Tput max        46.7375 (  0.00%)     46.7539 ( -0.04%)
      
      This patch (of 3):
      
      We would like to unregister the sb shrinker before ->kill_sb().  This will
      allow cached objects to be counted without a call to grab_super_passive() to
      update the ref count on the sb.  We want to avoid locking during memory
      reclamation, especially when we are skipping the reclaim altogether
      because we are out of cached objects.
      
      This is safe because grab_super_passive() now does a try-lock on
      sb->s_umount, and so if we are in the unmount process, it won't ever
      block.  That means the deadlock and the races we used to avoid by
      using grab_super_passive() now play out as:
      
              shrinker                        umount
      
              down_read(shrinker_rwsem)
                                              down_write(sb->s_umount)
                                              shrinker_unregister
                                                down_write(shrinker_rwsem)
                                                  <blocks>
              grab_super_passive(sb)
                down_read_trylock(sb->s_umount)
                  <fails>
              <shrinker aborts>
              ....
              <shrinkers finish running>
              up_read(shrinker_rwsem)
                                                <unblocks>
                                                <removes shrinker>
                                                up_write(shrinker_rwsem)
                                              ->kill_sb()
                                              ....
      
      So it is safe to deregister the shrinker before ->kill_sb().
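
      The non-blocking half of the diagram can be demonstrated with a plain
      rwlock; a small userspace model (pthreads standing in for s_umount,
      single-threaded only for brevity):

        #include <pthread.h>
        #include <stdio.h>

        int main(void)
        {
                pthread_rwlock_t s_umount = PTHREAD_RWLOCK_INITIALIZER;

                /* "umount" takes the lock for writing... */
                pthread_rwlock_wrlock(&s_umount);

                /* ...so the "shrinker" side's try-lock fails rather than
                 * blocking, and the shrinker aborts, as in the diagram */
                if (pthread_rwlock_tryrdlock(&s_umount) != 0)
                        printf("trylock failed: shrinker aborts, no deadlock\n");
                else
                        pthread_rwlock_unlock(&s_umount);

                pthread_rwlock_unlock(&s_umount);
                return 0;
        }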
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix direct reclaim writeback regression · ddb5f1a6
      Hugh Dickins authored
      commit 8bdd6380 upstream.
      
      Shortly before 3.16-rc1, Dave Jones reported:
      
        WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
                 xfs_vm_writepage+0x5ce/0x630 [xfs]()
        CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
        Call Trace:
          xfs_vm_writepage+0x5ce/0x630 [xfs]
          shrink_page_list+0x8f9/0xb90
          shrink_inactive_list+0x253/0x510
          shrink_lruvec+0x563/0x6c0
          shrink_zone+0x3b/0x100
          shrink_zones+0x1f1/0x3c0
          try_to_free_pages+0x164/0x380
          __alloc_pages_nodemask+0x822/0xc90
          alloc_pages_vma+0xaf/0x1c0
          handle_mm_fault+0xa31/0xc50
        etc.
      
       970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
       971                   PF_MEMALLOC))
      
      I did not respond at the time, because a glance at the PageDirty block
      in shrink_page_list() quickly shows that this is impossible: we don't do
      writeback on file pages (other than tmpfs) from direct reclaim nowadays.
      Dave was hallucinating, but it would have been disrespectful to say so.
      
      However, my own /var/log/messages now shows similar complaints
      
        WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
        WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()
      
      from stressing some mmotm trees during July.
      
      Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
      so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
      to mapping->a_ops->writepage()?
      
      Yes, 3.16-rc1's commit 68711a74 ("mm, migration: add destination
      page freeing callback") has provided such a way to compaction: if
      migrating a SwapBacked page fails, its newpage may be put back on the
      list for later use with PageSwapBacked still set, and nothing will clear
      it.
      
      Whether that can do anything worse than issue WARN_ON_ONCEs, and get
      some statistics wrong, is unclear: easier to fix than to think through
      the consequences.
      
      Fixing it here, before the put_new_page(), addresses the bug directly,
      but is probably the worst place to fix it.  Page migration is doing too
      many parts of the job on too many levels: fixing it in
      move_to_new_page() to complement its SetPageSwapBacked would be
      preferable, except why is it (and newpage->mapping and newpage->index)
      done there, rather than down in migrate_page_move_mapping(), once we are
      sure of success? Not a cleanup to get into right now, especially not
      with memcg cleanups coming in 3.17.
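
      The essence of the fix, reconstructed from the description (it sits in
      unmap_and_move() just before the put_new_page() call; not a verbatim
      quote of the patch):

        if (rc != MIGRATEPAGE_SUCCESS && put_new_page) {
                /* don't leak PageSwapBacked back to the caller's freelist */
                ClearPageSwapBacked(newpage);
                put_new_page(newpage, private);
        } else
                putback_lru_page(newpage);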
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB · 5450bba9
      Shaohua Li authored
      commit b13b1d2d upstream.
      
      We use the accessed bit to age a page at page reclaim time,
      and currently we also flush the TLB when doing so.
      
      But in some workloads TLB flush overhead is very heavy. In my
      simple multithreaded app with a lot of swap to several pcie
      SSDs, removing the tlb flush gives about 20% ~ 30% swapout
      speedup.
      
      Fortunately just removing the TLB flush is a valid optimization:
      on x86 CPUs, clearing the accessed bit without a TLB flush
      doesn't cause data corruption.
      
      It could cause incorrect page aging and the (mistaken) reclaim of
      hot pages, but the chance of that should be relatively low.
      
      So as a performance optimization don't flush the TLB when
      clearing the accessed bit, it will eventually be flushed by
      a context switch or a VM operation anyway. [ In the rare
      event of it not getting flushed for a long time the delay
      shouldn't really matter because there's no real memory
      pressure for swapout to react to. ]
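
      On x86 the change boils down to making the "flush" variant not flush;
      approximately (reconstructed, not a verbatim quote of the patch):

        int ptep_clear_flush_young(struct vm_area_struct *vma,
                                   unsigned long address, pte_t *ptep)
        {
                /*
                 * Clearing the accessed bit without a TLB flush is safe on
                 * x86: at worst a hot page is aged a little incorrectly,
                 * and the flush happens anyway on the next context switch
                 * or VM operation.
                 */
                return ptep_test_and_clear_young(vma, address, ptep);
        }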
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: linux-mm@kvack.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
      [ Rewrote the changelog and the code comments. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, compaction: properly signal and act upon lock and need_sched() contention · 4201cb7e
      Vlastimil Babka authored
      commit be976572 upstream.
      
      Compaction uses compact_checklock_irqsave() function to periodically check
      for lock contention and need_resched() to either abort async compaction,
      or to free the lock, schedule and retake the lock.  When aborting,
      cc->contended is set to signal the contended state to the caller.  Two
      problems have been identified in this mechanism.
      
      First, compaction also directly calls cond_resched() in both scanners when
      no lock is yet taken.  This call neither aborts async compaction nor sets
      cc->contended appropriately.  This patch introduces a new
      compact_should_abort() function to achieve both.  In isolate_freepages(),
      the check frequency is reduced to once per SWAP_CLUSTER_MAX pageblocks to
      match what the migration scanner does in the preliminary page checks.  In
      case a pageblock is found suitable for calling isolate_freepages_block(),
      the checks within there are done at a higher frequency.
      
      Second, isolate_freepages() does not check if isolate_freepages_block()
      aborted due to contention, and advances to the next pageblock.  This
      violates the principle of aborting on contention, and might result in
      pageblocks not being scanned completely, since the scanning cursor is
      advanced.  This problem has been noticed in the code by Joonsoo Kim when
      reviewing related patches.  This patch makes isolate_freepages_block()
      check the cc->contended flag and abort.
      
      In case isolate_freepages() has already isolated some pages before
      aborting due to contention, page migration will proceed, which is OK since
      we do not want to waste the work that has been done, and page migration
      has its own checks for contention.  However, we do not want another isolation
      attempt by either of the scanners, so a cc->contended flag check is added
      also to compaction_alloc() and compact_finished() to make sure compaction
      is aborted right after the migration.
      
      The outcome of the patch should be reduced lock contention by async
      compaction and lower latencies for higher-order allocations where direct
      compaction is involved.
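
      The new helper is small; roughly (reconstructed from the description,
      not a verbatim quote of the patch):

        static inline bool compact_should_abort(struct compact_control *cc)
        {
                /* async compaction aborts on need_resched() and reports it */
                if (need_resched()) {
                        if (cc->mode == MIGRATE_ASYNC) {
                                cc->contended = true;
                                return true;
                        }
                        cond_resched();
                }
                return false;
        }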
      
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Tested-by: Shawn Guo <shawn.guo@linaro.org>
      Tested-by: Kevin Hilman <khilman@linaro.org>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/compaction: avoid rescanning pageblocks in isolate_freepages · fb81c5ee
      Vlastimil Babka authored
      commit e9ade569 upstream.
      
      The compaction free scanner in isolate_freepages() currently remembers PFN
      of the highest pageblock where it successfully isolates, to be used as the
      starting pageblock for the next invocation.  The rationale behind this is
      that page migration might return free pages to the allocator when
      migration fails and we don't want to skip them if the compaction
      continues.
      
      Since migration now returns free pages back to compaction code where they
      can be reused, this is no longer a concern.  This patch changes
      isolate_freepages() so that the PFN for restarting is updated with each
      pageblock where isolation is attempted.  Using stress-highalloc from
      mmtests, this resulted in 10% reduction of the pages scanned by the free
      scanner.
      
      Note that the somewhat similar functionality that records highest
      successful pageblock in zone->compact_cached_free_pfn, remains unchanged.
      This cache is used when the whole compaction is restarted, not for
      multiple invocations of the free scanner during single compaction.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/compaction: do not count migratepages when unnecessary · 41c9323c
      Vlastimil Babka authored
      commit f8c9301f upstream.
      
      During compaction, update_nr_listpages() has been used to count remaining
      non-migrated and free pages after a call to migrate_pages().  The
      freepages counting has become unnecessary, and it turns out that
      migratepages counting is also unnecessary in most cases.
      
      The only situation when it's needed to count cc->migratepages is when
      migrate_pages() returns with a negative error code.  Otherwise, the
      non-negative return value is the number of pages that were not migrated,
      which is exactly the count of remaining pages in the cc->migratepages
      list.
      
      Furthermore, any non-zero count is only interesting for the tracepoint of
      mm_compaction_migratepages events, because after that all remaining
      unmigrated pages are put back and their count is set to 0.
      
      This patch therefore removes update_nr_listpages() completely, and changes
      the tracepoint definition so that the manual counting is done only when
      the tracepoint is enabled, and only when migrate_pages() returns a
      negative error code.
      
      Furthermore, migrate_pages() and the tracepoints won't be called when
      there's nothing to migrate.  This potentially avoids some wasted cycles
      and reduces the volume of uninteresting mm_compaction_migratepages events
      where "nr_migrated=0 nr_failed=0".  In the stress-highalloc mmtest, this
      was about 75% of the events.  The mm_compaction_isolate_migratepages event
      is better for determining that nothing was isolated for migration, and
      this one was just duplicating the info.
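
      The counting rule fits in a few lines (an illustrative sketch;
      count_list() is a hypothetical stand-in for walking cc->migratepages):

        err = migrate_pages(&cc->migratepages, compaction_alloc,
                            compaction_free, (unsigned long)cc,
                            cc->mode, MR_COMPACTION);
        /* a non-negative return already is the number of pages that were
         * not migrated; walk the list only on error, and only when the
         * mm_compaction_migratepages tracepoint is enabled */
        nr_failed = (err >= 0) ? err : count_list(&cc->migratepages);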
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, compaction: terminate async compaction when rescheduling · 1c99371f
      David Rientjes authored
      commit aeef4b83 upstream.
      
      Async compaction terminates prematurely when need_resched(), see
      compact_checklock_irqsave().  This can never trigger, however, if the
      cond_resched() in isolate_migratepages_range() always takes care of the
      scheduling.
      
      If the cond_resched() actually triggers, then terminate this pageblock
      scan for async compaction as well.
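
      The check itself is tiny; a sketch of the idea, relying on
      cond_resched() returning nonzero when it actually rescheduled
      (reconstructed, not a verbatim quote of the patch):

        /* in the migration scanner's per-page loop */
        if (cond_resched()) {
                /* we did reschedule: async compaction gives up on this
                 * pageblock instead of pretending it was uncontended */
                if (cc->mode == MIGRATE_ASYNC)
                        return 0;
        }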
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, compaction: embed migration mode in compact_control · 102a6230
      David Rientjes authored
      commit e0b9daeb upstream.
      
      We're going to want to manipulate the migration mode for compaction in the
      page allocator, and currently compact_control's sync field is only a bool.
      
      Currently, we only do MIGRATE_ASYNC or MIGRATE_SYNC_LIGHT compaction
      depending on the value of this bool.  Convert the bool to enum
      migrate_mode and pass the migration mode in directly.  Later, we'll want
      to avoid MIGRATE_SYNC_LIGHT for thp allocations in the pagefault patch to
      avoid unnecessary latency.
      
      This also alters compaction triggered from sysfs, either for the entire
      system or for a node, to force MIGRATE_SYNC.
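
      The type change itself (enum values as in include/linux/migrate_mode.h;
      the struct layout is abbreviated):

        enum migrate_mode {
                MIGRATE_ASYNC,          /* never block */
                MIGRATE_SYNC_LIGHT,     /* may block, but not on writeback */
                MIGRATE_SYNC,           /* may block and wait on writeback */
        };

        struct compact_control {
                /* ...other fields elided... */
                enum migrate_mode mode; /* was: bool sync; */
        };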
      
      [akpm@linux-foundation.org: fix build]
      [iamjoonsoo.kim@lge.com: use MIGRATE_SYNC in alloc_contig_range()]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Suggested-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, compaction: add per-zone migration pfn cache for async compaction · 3793816b
      David Rientjes authored
      commit 35979ef3 upstream.
      
      Each zone has a cached migration scanner pfn for memory compaction so that
      subsequent calls to memory compaction can start where the previous call
      left off.
      
      Currently, the compaction migration scanner only updates the per-zone
      cached pfn when pageblocks were not skipped for async compaction.  This
      creates a dependency on calling sync compaction to avoid having subsequent
      calls to async compaction from scanning an enormous amount of non-MOVABLE
      pageblocks each time it is called.  On large machines, this could be
      potentially very expensive.
      
      This patch adds a per-zone cached migration scanner pfn only for async
      compaction.  It is updated every time a pageblock has been scanned in its
      entirety and when no pages from it were successfully isolated.  The cached
      migration scanner pfn for sync compaction is updated only when called for
      sync compaction.
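
      The cache split amounts to one field becoming a two-element array,
      indexed by whether the scan is sync (field names as described by the
      commit; the surrounding struct layout is elided):

        struct zone {
                /* ...other fields elided... */
                /* pfn where the migration scanner should restart:
                 * [0] for async compaction, [1] for sync compaction */
                unsigned long compact_cached_migrate_pfn[2];
                unsigned long compact_cached_free_pfn;
        };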
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, compaction: return failed migration target pages back to freelist · 20f0d30f
      David Rientjes authored
      commit d53aea3d upstream.
      
      Greg reported that he found isolated free pages were returned back to the
      VM rather than the compaction freelist.  This will cause holes behind the
      free scanner and cause it to reallocate additional memory if necessary
      later.
      
      He detected the problem at runtime seeing that ext4 metadata pages (esp
      the ones read by "sbi->s_group_desc[i] = sb_bread(sb, block)") were
      constantly visited by compaction calls of migrate_pages().  These pages
      had a non-zero b_count which caused fallback_migrate_page() ->
      try_to_release_page() -> try_to_free_buffers() to fail.
      
      Memory compaction works by having a "freeing scanner" scan from one end of
      a zone which isolates pages as migration targets while another "migrating
      scanner" scans from the other end of the same zone which isolates pages
      for migration.
      
      When page migration fails for an isolated page, the target page is
      returned to the system rather than the freelist built by the freeing
      scanner.  This may require the freeing scanner to continue scanning memory
      after suitable migration targets have already been returned to the system
      needlessly.
      
      This patch returns destination pages to the freeing scanner freelist when
      page migration fails.  This prevents unnecessary work done by the freeing
      scanner but also encourages memory to be as compacted as possible at the
      end of the zone.
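
      The callback implementing this is only a few lines; roughly
      (reconstructed, not a verbatim quote of the patch):

        /* hand a failed migration target back to the free scanner's list */
        static void compaction_free(struct page *page, unsigned long data)
        {
                struct compact_control *cc = (struct compact_control *)data;

                list_add(&page->lru, &cc->freepages);
                cc->nr_freepages++;
        }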
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Greg Thelen <gthelen@google.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, migration: add destination page freeing callback · a527e8d4
      David Rientjes authored
      commit 68711a74 upstream.
      
      Memory migration uses a callback defined by the caller to determine how to
      allocate destination pages.  When migration fails for a source page,
      however, it frees the destination page back to the system.
      
      This patch adds a memory migration callback defined by the caller to
      determine how to free destination pages.  If a caller, such as memory
      compaction, builds its own freelist for migration targets, this can reuse
      already freed memory instead of scanning additional memory.
      
      If the caller provides a function to handle freeing of destination pages,
      it is called when page migration fails.  If the caller passes NULL then
      freeing back to the system will be handled as usual.  This patch
      introduces no functional change.
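
      The callback's shape, alongside the existing allocation callback
      (signatures as I recall them from this commit; treat as a sketch):

        typedef struct page *new_page_t(struct page *page,
                                        unsigned long private, int **result);
        typedef void free_page_t(struct page *page, unsigned long private);

        /* migrate_pages() gains put_new_page; passing NULL preserves the
         * old behaviour of freeing failed targets back to the system */
        int migrate_pages(struct list_head *from, new_page_t get_new_page,
                          free_page_t put_new_page, unsigned long private,
                          enum migrate_mode mode, int reason);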
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/compaction: cleanup isolate_freepages() · 5721949c
      Vlastimil Babka authored
      commit c96b9e50 upstream.
      
      isolate_freepages() is currently somewhat hard to follow thanks to many
      look-alike variables.  In particular, the 'high_pfn' variable
      looks like it is related to the 'low_pfn' variable, but in fact it is not.
      
      This patch renames the 'high_pfn' variable to a hopefully less confusing name,
      and slightly changes its handling without a functional change. A comment made
      obsolete by recent changes is also updated.
      
      [akpm@linux-foundation.org: comment fixes, per Minchan]
      [iamjoonsoo.kim@lge.com: cleanups]
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/compaction: clean up unused code lines · 46504e57
      Heesub Shin authored
      commit 13fb44e4 upstream.
      
      Remove code lines currently not in use or never called.
      Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/readahead.c: inline ra_submit · aa64050a
      Fabian Frederick authored
      commit 29f175d1 upstream.
      
      Commit f9acc8c7 ("readahead: sanify file_ra_state names") left
      ra_submit with a single function call.
      
      Move ra_submit to internal.h and inline it to save some stack.  Thanks
      to Andrew Morton for commenting on different versions.
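
      After the move, the helper is approximately (reconstructed, not a
      verbatim quote of the mm/internal.h version):

        static inline unsigned long ra_submit(struct file_ra_state *ra,
                                              struct address_space *mapping,
                                              struct file *filp)
        {
                return __do_page_cache_readahead(mapping, filp,
                                                 ra->start, ra->size,
                                                 ra->async_size);
        }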
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • callers of iov_copy_from_user_atomic() don't need pagecache_disable() · 9fb77c77
      Al Viro authored
      commit 9e8c2af9 upstream.
      
      ... it does that itself (via kmap_atomic())
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: remove read_cache_page_async() · 034c4b3e
      Sasha Levin authored
      commit 67f9fd91 upstream.
      
      This patch removes read_cache_page_async() which wasn't really needed
      anywhere and simplifies the code around it a bit.
      
      read_cache_page_async() is useful when we want to read a page into the
      cache without waiting for it to complete.  This happens when the
      appropriate callback 'filler' doesn't complete its read operation and
      releases the page lock immediately, and instead queues a different
      completion routine to do that.  This never actually happened anywhere in
      the code.
      
      read_cache_page_async() had 3 different callers:
      
      - read_cache_page() which is the sync version, it would just wait for
        the requested read to complete using wait_on_page_read().
      
      - JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
        function it supplied doesn't do any async reads, and would complete
        before the filler function returns - making it actually a sync read.
      
      - CRAMFS would call it using the read_mapping_page_async() wrapper, with
        a similar story to JFFS2 - the filler function doesn't do anything that
        resembles async reads and would always complete before the filler function
        returns.
      
      To sum it up, the code in mm/filemap.c never took advantage of having
      read_cache_page_async().  While there are filler callbacks that do async
      reads (such as the block one), we always called it via
      read_cache_page().
      
      This patch adds a mandatory wait for read to complete when adding a new
      page to the cache, and removes read_cache_page_async() and its wrappers.
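
      What remains after the removal is the synchronous entry point, which
      now always waits for the read (signature as I recall it; a sketch):

        /* reads the page in via 'filler' and waits for the read to
         * complete, returning an ERR_PTR on failure */
        struct page *read_cache_page(struct address_space *mapping,
                                     pgoff_t index,
                                     int (*filler)(void *, struct page *),
                                     void *data);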
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: madvise: fix MADV_WILLNEED on shmem swapouts · 30fe6d33
      Johannes Weiner authored
      commit 55231e5c upstream.
      
      MADV_WILLNEED currently does not read swapped out shmem pages back in.
      
      Commit 0cd6144a ("mm + fs: prepare for non-page entries in page
      cache radix trees") made find_get_page() filter exceptional radix tree
      entries but failed to convert all find_get_page() callers that WANT
      exceptional entries over to find_get_entry().  One of them is shmem swap
      readahead in madvise, which now skips over any swap-out records.
      
      Convert it to find_get_entry().
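
      A sketch of the converted readahead step (reconstructed from the
      description; the surrounding loop is elided):

        page = find_get_entry(mapping, index);
        if (radix_tree_exceptional_entry(page)) {
                /* a shmem swap-out record, not a page: start the read */
                swp_entry_t swap = radix_to_swp_entry(page);
                page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
                                             NULL, 0);
        }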
      
      Fixes: 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm + fs: prepare for non-page entries in page cache radix trees · 414af56f
      Johannes Weiner authored
      commit 0cd6144a upstream.
      
      shmem mappings already contain exceptional entries where swap slot
      information is remembered.
      
      To be able to store eviction information for regular page cache, prepare
      every site dealing with the radix trees directly to handle entries other
      than pages.
      
      The common lookup functions will filter out non-page entries and return
      NULL for page cache holes, just as before.  But provide a raw version of
      the API which returns non-page entries as well, and switch shmem over to
      use it.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: filemap: move radix tree hole searching here · d141bb0e
      Johannes Weiner authored
      commit e7b563bb upstream.
      
      The radix tree hole searching code is only used for page cache, for
      example the readahead code trying to get a picture of the area
      surrounding a fault.
      
      It sufficed to rely on the radix tree definition of holes, which is
      "empty tree slot".  But this is about to change, as shadow page
      descriptors will be stored in the page cache after the actual pages get
      evicted from memory.
      
      Move the functions over to mm/filemap.c and make them native page cache
      operations, where they can later be adapted to handle the new definition
      of "page cache hole".
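
      The relocated helpers become page cache operations with this shape (as
      best I recall the commit; a sketch):

        /* find the first hole at or after 'index', scanning at most
         * 'max_scan' slots; "hole" can now be redefined along with the
         * page cache instead of the radix tree */
        pgoff_t page_cache_next_hole(struct address_space *mapping,
                                     pgoff_t index, unsigned long max_scan);
        pgoff_t page_cache_prev_hole(struct address_space *mapping,
                                     pgoff_t index, unsigned long max_scan);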
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: shmem: save one radix tree lookup when truncating swapped pages · c2667299
      Johannes Weiner authored
      commit 6dbaf22c upstream.
      
      Page cache radix tree slots are usually stabilized by the page lock, but
      shmem's swap cookies have no such thing.  Because the overall truncation
      loop is lockless, the swap entry is currently confirmed by a tree lookup
      and then deleted by another tree lookup under the same tree lock region.
      
      Use radix_tree_delete_item() instead, which does the verification and
      deletion with only one lookup.  This also allows removing the
      delete-only special case from shmem_radix_tree_replace().
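
      Roughly the resulting helper (reconstructed, not a verbatim quote):
      verify and delete in a single lookup under tree_lock, and free the
      swap entry only if the slot still held our cookie:

        static int shmem_free_swap(struct address_space *mapping,
                                   pgoff_t index, void *radswap)
        {
                void *old;

                spin_lock_irq(&mapping->tree_lock);
                old = radix_tree_delete_item(&mapping->page_tree, index,
                                             radswap);
                spin_unlock_irq(&mapping->tree_lock);
                if (old != radswap)
                        return -ENOENT;  /* slot changed under us */
                free_swap_and_cache(radix_to_swp_entry(radswap));
                return 0;
        }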
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib: radix-tree: add radix_tree_delete_item() · d35a6232
      Johannes Weiner authored
      commit 53c59f26 upstream.
      
      Provide a function that does not just delete an entry at a given index,
      but also allows passing in an expected item.  Delete only if that item
      is still located at the specified index.
      
      This is handy when lockless tree traversals want to delete entries as
      well because they don't have to do a second, locked lookup to verify
      the slot has not changed under them before deleting the entry.
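
      The new primitive's contract (signature as added by the commit;
      semantics per the description above):

        /* delete the entry at 'index' only if it still equals 'item'
         * (or unconditionally if 'item' is NULL); returns the deleted
         * entry, or NULL if nothing matched */
        void *radix_tree_delete_item(struct radix_tree_root *root,
                                     unsigned long index, void *item);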
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>