1. 03 May, 2017 28 commits
    • mm: reclaim MADV_FREE pages · 802a3a92
      Shaohua Li authored
      When memory pressure is high, we free MADV_FREE pages.  If the pages are
      not dirty in the pte, they can be freed immediately.  Otherwise we
      can't reclaim them.  We put the pages back on the anonymous LRU list (by
      setting the SwapBacked flag) and the pages will be reclaimed in the
      normal swapout way.
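
      A minimal sketch of the decision described above, written as a toy
      Python model rather than kernel code.  The Page class and its field
      names are hypothetical stand-ins for the SwapBacked flag and the pte
      dirty bit:

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for an anonymous page; not the kernel's struct page."""
    swap_backed: bool  # False marks a lazyfree (MADV_FREE) page in this model
    pte_dirty: bool    # True if the page was written after MADV_FREE


def reclaim_lazyfree(page: Page) -> str:
    """Return what reclaim does with a lazyfree page in this toy model."""
    assert not page.swap_backed, "only lazyfree pages are modelled here"
    if not page.pte_dirty:
        return "freed"           # clean: discard without pageout
    page.swap_backed = True      # dirty: treat as a normal anonymous page again
    return "back to anon LRU"    # will go through the normal swapout path


clean = Page(swap_backed=False, pte_dirty=False)
dirty = Page(swap_backed=False, pte_dirty=True)
print(reclaim_lazyfree(clean))
print(reclaim_lazyfree(dirty), dirty.swap_backed)
```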
      
      We use normal page reclaim policy.  Since MADV_FREE pages are put into
      the inactive file list, such pages and inactive file pages are reclaimed
      according to their age.  This is expected, because we don't want to
      reclaim too many MADV_FREE pages before used once pages.
      
      Based on Minchan's original patch
      
      [minchan@kernel.org: clean up lazyfree page handling]
        Link: http://lkml.kernel.org/r/20170303025237.GB3503@bbox
      Link: http://lkml.kernel.org/r/14b8eb1d3f6bf6cc492833f183ac8c304e560484.1487965799.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move MADV_FREE pages into LRU_INACTIVE_FILE list · f7ad2a6c
      Shaohua Li authored
      madvise()'s MADV_FREE indicates pages are 'lazyfree'.  They are still
      anonymous pages, but they can be freed without pageout.  To distinguish
      these from normal anonymous pages, we clear their SwapBacked flag.
      
      MADV_FREE pages can be freed without pageout, so they are pretty much
      like used once file pages.  For such pages, we'd like to reclaim them
      once there is memory pressure.  It might be unfair to always reclaim
      MADV_FREE pages before used once file pages, but we definitely want to
      reclaim them before other anonymous and file pages.
      
      To speed up MADV_FREE pages reclaim, we put the pages into
      LRU_INACTIVE_FILE list.  The rationale is LRU_INACTIVE_FILE list is tiny
      nowadays and should be full of used once file pages.  Reclaiming
      MADV_FREE pages will not interfere much with anonymous and active
      file pages.  And the inactive file pages and MADV_FREE pages will be
      reclaimed according to their age, so we don't reclaim too many MADV_FREE
      pages either.  Putting the MADV_FREE pages into LRU_INACTIVE_FILE list
      also means we can reclaim the pages without swap support.  This idea is
      suggested by Johannes.
      
      This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet,
      to avoid bisect failure; the next patch will do it.
      
      The patch is based on Minchan's original patch.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: don't assume anonymous pages have SwapBacked flag · d44d363f
      Shaohua Li authored
      There are a few places the code assumes anonymous pages should have
      SwapBacked flag set.  MADV_FREE pages are anonymous pages but we are
      going to add them to LRU_INACTIVE_FILE list and clear SwapBacked flag
      for them.  The assumption doesn't hold any more, so fix them.
      
      Link: http://lkml.kernel.org/r/3945232c0df3dd6c4ef001976f35a95f18dcb407.1487965799.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: delete unnecessary TTU_* flags · a128ca71
      Shaohua Li authored
      Patch series "mm: fix some MADV_FREE issues", v5.
      
      We are trying to use MADV_FREE in jemalloc.  Several issues are found.
      Without solving the issues, jemalloc can't use the MADV_FREE feature.
      
       - Doesn't support systems without swap enabled. If swap is off, we
         can't age anonymous pages, or can't age them efficiently. And since
         MADV_FREE pages are mixed with other anonymous pages, we can't
         reclaim MADV_FREE pages. In the current implementation, MADV_FREE
         falls back to MADV_DONTNEED without swap enabled. But in our
         environment, a lot of machines don't enable swap. This prevents
         our setup from using MADV_FREE.
      
       - Increases memory pressure. Page reclaim is biased toward reclaiming
         file pages over anonymous pages. This doesn't make sense for
         MADV_FREE pages, because those pages can be freed easily and
         refilled with very slight penalty. Even if page reclaim weren't
         biased toward file pages, there would still be an issue, because
         MADV_FREE pages and other anonymous pages are mixed together. To
         reclaim a MADV_FREE page, we probably must scan a lot of other
         anonymous pages, which is inefficient. In our tests, we usually see
         OOM with MADV_FREE enabled and none without it.
      
       - Accounting. There are two accounting problems. We don't have global
         accounting. If the system is abnormal, we don't know whether the
         problem comes from the MADV_FREE side. The other problem is RSS
         accounting. MADV_FREE pages are accounted as normal anon pages and
         reclaimed lazily, so the application's RSS becomes bigger. This
         confuses our workloads. We have a monitoring daemon running, and if
         it finds an application's RSS becomes abnormal, the daemon will kill
         the application even though the kernel can reclaim the memory easily.
      
      To address the first two issues, we can either put MADV_FREE pages
      into a separate LRU list (Minchan's previous patches and V1 patches), or
      put them into LRU_INACTIVE_FILE list (suggested by Johannes).  The
      patchset uses the second idea.  The reason is LRU_INACTIVE_FILE list is
      tiny nowadays and should be full of used once file pages.  So we can
      still efficiently reclaim MADV_FREE pages there without interference
      with other anon and active file pages.  Putting the pages into inactive
      file list also has an advantage which allows page reclaim to prioritize
      MADV_FREE pages and used once file pages.  MADV_FREE pages are put into
      the lru list with their SwapBacked flag cleared, so PageAnon(page) &&
      !PageSwapBacked(page) will indicate a MADV_FREE page.  These pages will
      be directly freed without pageout if they are clean, otherwise normal
      swap will reclaim them.
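
      The flag test above can be sketched as a small Python model.  This is
      an illustration only: the boolean arguments stand in for PageAnon() and
      PageSwapBacked(), and choosing the inactive list purely from the
      SwapBacked bit is a simplifying assumption of the sketch:

```python
def is_lazyfree(anon: bool, swap_backed: bool) -> bool:
    """Model of PageAnon(page) && !PageSwapBacked(page)."""
    return anon and not swap_backed


def inactive_list(swap_backed: bool) -> str:
    """Pick the inactive LRU list in this toy model (assumption: the
    SwapBacked bit alone decides between the anon and file lists)."""
    return "LRU_INACTIVE_ANON" if swap_backed else "LRU_INACTIVE_FILE"


# (anon, swap_backed) combinations for three kinds of page
pages = {
    "normal anon": (True, True),
    "MADV_FREE anon": (True, False),
    "file cache": (False, False),
}
for name, (anon, swap_backed) in pages.items():
    print(name, is_lazyfree(anon, swap_backed), inactive_list(swap_backed))
```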
      
      For the third issue, the previous post adds global accounting and a
      separate RSS count for MADV_FREE pages.  The problem is we never get
      accurate accounting for MADV_FREE pages.  The pages are mapped to
      userspace and can be dirtied without notice from the kernel side.  To
      get accurate accounting, we could write protect the page, but then there
      is extra page fault overhead, which people don't want to pay.  Jemalloc
      guys have concerns about the inaccurate accounting, so this post drops
      the accounting patches temporarily.  The info exported to
      /proc/pid/smaps for MADV_FREE pages is kept, which is the only place we
      can get accurate accounting right now.
      
      This patch (of 6):
      
      Johannes pointed out TTU_LZFREE is unnecessary.  It's true because we
      always have the flag set if we want to do an unmap.  For cases we don't
      do an unmap, the TTU_LZFREE part of code should never run.
      
      Also the TTU_UNMAP is unnecessary.  If no other flags set (for example,
      TTU_MIGRATION), an unmap is implied.
      
      The patch includes Johannes's cleanup and dead TTU_ACTION macro removal
      code
      
      Link: http://lkml.kernel.org/r/4be3ea1bc56b26fd98a54d0a6f70bec63f6d8980.1487965799.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback.c: use setup_deferrable_timer · 0a372d09
      Geliang Tang authored
      Use setup_deferrable_timer() instead of init_timer_deferrable() to
      simplify the code.
      
      Link: http://lkml.kernel.org/r/e8e3d4280a34facbc007346f31df833cec28801e.1488070291.git.geliangtang@gmail.com
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove unnecessary back-off function when retrying page reclaim · 491d79ae
      Johannes Weiner authored
      The backoff mechanism is not needed.  If we have MAX_RECLAIM_RETRIES
      loops without progress, we'll OOM anyway; backing off might cut one or
      two iterations off that in the rare OOM case.  If we have intermittent
      success reclaiming a few pages, the backoff function gets reset also,
      and so is of little help in these scenarios.
      
      We might want a backoff function for when there IS progress, but not
      enough to be satisfactory.  But this isn't that.  Remove it.
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-10-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "mm, vmscan: account for skipped pages as a partial scan" · 3db65812
      Johannes Weiner authored
      This reverts commit d7f05528.
      
      Now that reclaimability of a node is no longer based on the ratio
      between pages scanned and theoretically reclaimable pages, we can remove
      accounting tricks for pages skipped due to zone constraints.
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-9-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: delete NR_PAGES_SCANNED and pgdat_reclaimable() · c822f622
      Johannes Weiner authored
      NR_PAGES_SCANNED counts number of pages scanned since the last page free
      event in the allocator.  This was used primarily to measure the
      reclaimability of zones and nodes, and determine when reclaim should
      give up on them.  In that role, it has been replaced in the preceding
      patches by a different mechanism.
      
      Being implemented as an efficient vmstat counter, it was automatically
      exported to userspace as well.  It's however unlikely that anyone
      outside the kernel is using this counter in any meaningful way.
      
      Remove the counter and the unused pgdat_reclaimable().
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-8-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: don't avoid high-priority reclaim on memcg limit reclaim · 688035f7
      Johannes Weiner authored
      Commit 246e87a9 ("memcg: fix get_scan_count() for small targets")
      sought to avoid high reclaim priorities for memcg by forcing it to scan
      a minimum amount of pages when lru_pages >> priority yielded nothing.
      This was done at a time when reclaim decisions like dirty throttling
      were tied to the priority level.
      
      Nowadays, the only meaningful thing still tied to priority dropping
      below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
      allowed to write.  But that is from an era where direct reclaim was
      still allowed to call ->writepage, and kswapd nowadays avoids writes
      until it's scanned every clean page in the system.  Potential changes to
      how quick sc->may_writepage could trigger are of little concern.
      
      Remove the force_scan stuff, as well as the ugly multi-pass target
      calculation that it necessitated.
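
      To illustrate why lru_pages >> priority can yield nothing for a small
      list, here is a rough Python sketch of the arithmetic.  DEF_PRIORITY is
      taken as 12; the helper names are hypothetical and the real
      get_scan_count() does considerably more than this:

```python
DEF_PRIORITY = 12  # starting reclaim priority; a lower value means more urgent


def scan_target(lru_pages: int, priority: int) -> int:
    """Pages to scan from one LRU list at the given priority."""
    return lru_pages >> priority


def first_productive_priority(lru_pages: int):
    """Largest priority value (least urgent) that gives a nonzero target."""
    for priority in range(DEF_PRIORITY, -1, -1):
        if scan_target(lru_pages, priority) > 0:
            return priority
    return None  # empty list: nothing to scan at any priority


print(scan_target(1000, DEF_PRIORITY))     # a 1000-page list at the start
print(first_productive_priority(1000))     # priority must drop before scanning
print(first_productive_priority(1 << 20))  # a large list scans right away
```

      In this model a 1000-page list is only scanned once the priority has
      dropped a few levels, which is the situation force_scan used to paper
      over.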
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-7-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: don't avoid high-priority reclaim on unreclaimable nodes · a2d7f8e4
      Johannes Weiner authored
      Commit 246e87a9 ("memcg: fix get_scan_count() for small targets")
      sought to avoid high reclaim priorities for kswapd by forcing it to scan
      a minimum amount of pages when lru_pages >> priority yielded nothing.
      
      Commit b95a2f2d ("mm: vmscan: convert global reclaim to per-memcg
      LRU lists"), due to switching global reclaim to a round-robin scheme
      over all cgroups, had to restrict this forceful behavior to
      unreclaimable zones in order to prevent massive overreclaim with many
      cgroups.
      
      The latter patch effectively neutered the behavior completely for all
      but extreme memory pressure.  But in those situations we might as well
      drop the reclaimers to lower priority levels.  Remove the check.
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-6-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove unnecessary reclaimability check from NUMA balancing target · 15038d0d
      Johannes Weiner authored
      NUMA balancing already checks the watermarks of the target node to
      decide whether it's a suitable balancing target.  Whether the node is
      reclaimable or not is irrelevant when we don't intend to reclaim.
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-5-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove seemingly spurious reclaimability check from laptop_mode gating · 047d72c3
      Johannes Weiner authored
      Commit 1d82de61 ("mm, vmscan: make kswapd reclaim in terms of
      nodes") allowed laptop_mode=1 to start writing not just when the
      priority drops to DEF_PRIORITY - 2 but also when the node is
      unreclaimable.
      
      That appears to be a spurious change in this patch as I doubt the series
      was tested with laptop_mode, and neither is that particular change
      mentioned in the changelog.  Remove it, it's still recent.
      
      Link: http://lkml.kernel.org/r/20170228214007.5621-4-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix check for reclaimable pages in PF_MEMALLOC reclaim throttling · d450abd8
      Johannes Weiner authored
      PF_MEMALLOC direct reclaimers get throttled on a node when the sum of
      all free pages in each zone falls below half the min watermark.  During
      the summation, we want to exclude zones that don't have reclaimable
      pages.  Checking the same pgdat over and over again doesn't make sense.
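
      A simplified Python sketch of the per-zone summation described above.
      The function name and the tuple layout are hypothetical, and the
      comparison is reduced to "free pages below half the summed min
      watermarks":

```python
def should_throttle(zones) -> bool:
    """zones: iterable of (free_pages, min_watermark, reclaimable_pages).

    Zones without reclaimable pages are left out of both sums, so that a
    zone reclaim cannot help does not influence the decision.
    """
    free = 0
    reserve = 0
    for free_pages, min_watermark, reclaimable_pages in zones:
        if reclaimable_pages == 0:
            continue  # the per-zone check this commit is about
        free += free_pages
        reserve += min_watermark
    return free < reserve / 2


# One zone under pressure, plus one zone with free memory but nothing
# reclaimable on its LRU lists.
node = [(10, 100, 50), (1000, 100, 0)]
print(should_throttle(node))
```

      If the second zone were counted, its free pages would hide the
      shortage in the first zone and the reclaimer would not be throttled.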
      
      Fixes: 599d0c95 ("mm, vmscan: move LRU lists to node")
      Link: http://lkml.kernel.org/r/20170228214007.5621-3-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jia He <hejianet@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix 100% CPU kswapd busyloop on unreclaimable nodes · c73322d0
      Johannes Weiner authored
      Patch series "mm: kswapd spinning on unreclaimable nodes - fixes and
      cleanups".
      
      Jia reported a scenario in which the kswapd of a node indefinitely spins
      at 100% CPU usage.  We have seen similar cases at Facebook.
      
      The kernel's current method of judging its ability to reclaim a node (or
      whether to back off and sleep) is based on the amount of scanned pages
      in proportion to the amount of reclaimable pages.  In Jia's and our
      scenarios, there are no reclaimable pages in the node, however, and the
      condition for backing off is never met.  Kswapd busyloops in an attempt
      to restore the watermarks while having nothing to work with.
      
      This series reworks the definition of an unreclaimable node based not on
      scanning but on whether kswapd is able to actually reclaim pages in
      MAX_RECLAIM_RETRIES (16) consecutive runs.  This is the same criterion
      the page allocator uses for giving up on direct reclaim and invoking the
      OOM killer.  If it cannot free any pages, kswapd will go to sleep and
      leave further attempts to direct reclaim invocations, which will either
      make progress and re-enable kswapd, or invoke the OOM killer.
      
      Patch #1 fixes the immediate problem Jia reported, the remainder are
      smaller fixlets, cleanups, and overall phasing out of the old method.
      
      Patch #6 is the odd one out.  It's a nice cleanup to get_scan_count(),
      and directly related to #5, but in itself not relevant to the series.
      
      If the whole series is too ambitious for 4.11, I would consider the
      first three patches fixes, the rest cleanups.
      
      This patch (of 9):
      
      Jia He reports a problem with kswapd spinning at 100% CPU when
      requesting more hugepages than memory available in the system:
      
      $ echo 4000 >/proc/sys/vm/nr_hugepages
      
      top - 13:42:59 up  3:37,  1 user,  load average: 1.09, 1.03, 1.01
      Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  0.0 us, 12.5 sy,  0.0 ni, 85.5 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
      KiB Mem:  31371520 total, 30915136 used,   456384 free,      320 buffers
      KiB Swap:  6284224 total,   115712 used,  6168512 free.    48192 cached Mem
      
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         76 root      20   0       0      0      0 R 100.0 0.000 217:17.29 kswapd3
      
      At that time, there are no reclaimable pages left in the node, but as
      kswapd fails to restore the high watermarks it refuses to go to sleep.
      
      Kswapd needs to back away from nodes that fail to balance.  Up until
      commit 1d82de61 ("mm, vmscan: make kswapd reclaim in terms of
      nodes") kswapd had such a mechanism.  It considered zones whose
      theoretically reclaimable pages it had reclaimed six times over as
      unreclaimable and backed away from them.  This guard was erroneously
      removed as the patch changed the definition of a balanced node.
      
      However, simply restoring this code wouldn't help in the case reported
      here: there *are* no reclaimable pages that could be scanned until the
      threshold is met.  Kswapd would stay awake anyway.
      
      Introduce a new and much simpler way of backing off.  If kswapd runs
      through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
      page, make it back off from the node.  This is the same number of shots
      direct reclaim takes before declaring OOM.  Kswapd will go to sleep on
      that node until a direct reclaimer manages to reclaim some pages, thus
      proving the node reclaimable again.
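
      The back-off rule can be sketched as a small Python state machine.
      This is a toy model under stated assumptions: NodeState and its method
      names are hypothetical, and resetting the counter on any successful
      reclaim is how the sketch models "consecutive" failures:

```python
MAX_RECLAIM_RETRIES = 16


class NodeState:
    """Toy per-node failure counter; not the kernel's pglist_data."""

    def __init__(self):
        self.failures = 0

    def kswapd_cycle(self, nr_reclaimed: int) -> None:
        """Record one kswapd balancing cycle."""
        if nr_reclaimed > 0:
            self.failures = 0
        else:
            self.failures += 1

    def direct_reclaim(self, nr_reclaimed: int) -> None:
        """A direct reclaimer that frees pages proves the node reclaimable."""
        if nr_reclaimed > 0:
            self.failures = 0

    def kswapd_should_sleep(self) -> bool:
        return self.failures >= MAX_RECLAIM_RETRIES


node = NodeState()
for _ in range(MAX_RECLAIM_RETRIES):
    node.kswapd_cycle(0)           # nothing reclaimable on this node
print(node.kswapd_should_sleep())  # kswapd backs off
node.direct_reclaim(32)            # direct reclaim makes progress
print(node.kswapd_should_sleep())  # kswapd may run again
```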
      
      [hannes@cmpxchg.org: check kswapd failure against the cumulative nr_reclaimed count]
        Link: http://lkml.kernel.org/r/20170306162410.GB2090@cmpxchg.org
      [shakeelb@google.com: fix condition for throttle_direct_reclaim]
        Link: http://lkml.kernel.org/r/20170314183228.20152-1-shakeelb@google.com
      Link: http://lkml.kernel.org/r/20170228214007.5621-2-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reported-by: Jia He <hejianet@gmail.com>
      Tested-by: Jia He <hejianet@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slab: avoid IPIs when creating kmem caches · a87c75fb
      Greg Thelen authored
      Each slab kmem cache has per cpu array caches.  The array caches are
      created when the kmem_cache is created, either via kmem_cache_create()
      or lazily when the first object is allocated in context of a kmem
      enabled memcg.  Array caches are replaced by writing to /proc/slabinfo.
      
      Array caches are protected by holding slab_mutex or disabling
      interrupts.  Array cache allocation and replacement is done by
      __do_tune_cpucache() which holds slab_mutex and calls
      kick_all_cpus_sync() to interrupt all remote processors which confirms
      there are no references to the old array caches.
      
      IPIs are needed when replacing array caches.  But when creating a new
      array cache, there's no need to send IPIs because there cannot be any
      references to the new cache.  Outside of memcg kmem accounting these
      IPIs occur at boot time, so they're not a problem.  But with memcg kmem
      accounting each container can create kmem caches, so the IPIs are
      wasteful.
      
      Avoid unnecessary IPIs when creating array caches.
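
      The idea can be sketched in a few lines of Python.  This is a toy model
      with hypothetical names (KmemCache, tune_cpucache, and the
      kick_all_cpus callback), not the slab code itself: remote CPUs are only
      interrupted when an existing array cache is being replaced.

```python
class KmemCache:
    """Toy cache holding a reference to its per-CPU array cache."""

    def __init__(self):
        self.cpu_cache = None  # no array cache until the first tune


def tune_cpucache(cache, new_cpu_cache, kick_all_cpus):
    """Install new_cpu_cache; synchronize remote CPUs only on replacement."""
    prev = cache.cpu_cache
    cache.cpu_cache = new_cpu_cache
    if prev is not None:
        # Another CPU may still hold a reference to the old array cache.
        kick_all_cpus()


ipis = []
cache = KmemCache()
tune_cpucache(cache, ["first"], lambda: ipis.append("sync"))   # creation
created = len(ipis)
tune_cpucache(cache, ["second"], lambda: ipis.append("sync"))  # replacement
replaced = len(ipis)
print(created, replaced)
```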
      
      Test which reports the IPI count of allocating slab in 10000 memcg:
      
      	import os
      
      	def ipi_count():
      		with open("/proc/interrupts") as f:
      			for l in f:
      				if 'Function call interrupts' in l:
      					return int(l.split()[1])
      
      	def echo(val, path):
      		with open(path, "w") as f:
      			f.write(val)
      
      	n = 10000
      	os.chdir("/mnt/cgroup/memory")
      	pid = str(os.getpid())
      	a = ipi_count()
      	for i in range(n):
      		os.mkdir(str(i))
      		echo("1G\n", "%d/memory.limit_in_bytes" % i)
      		echo("1G\n", "%d/memory.kmem.limit_in_bytes" % i)
      		echo(pid, "%d/cgroup.procs" % i)
      		open("/tmp/x", "w").close()
      		os.unlink("/tmp/x")
      	b = ipi_count()
      	print("%d loops: %d => %d (+%d ipis)" % (n, a, b, b-a))
      	echo(pid, "cgroup.procs")
      	for i in range(n):
      		os.rmdir(str(i))
      
      patched:   10000 loops: 1069 => 1170 (+101 ipis)
      unpatched: 10000 loops: 1192 => 48933 (+47741 ipis)
      
      Link: http://lkml.kernel.org/r/20170416214544.109476-1-gthelen@google.com
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ocfs2/cluster: use offset_in_page() macro · d47736fa
      Geliang Tang authored
      Use offset_in_page() macro instead of open-coding.
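
      For readers unfamiliar with the macro, the open-coded form it replaces
      is a mask of the address with the low page bits.  A Python sketch of
      the arithmetic, assuming a 4 KiB page size for illustration:

```python
PAGE_SIZE = 4096                # assumption: 4 KiB pages
PAGE_MASK = ~(PAGE_SIZE - 1)    # clears the in-page bits of an address


def offset_in_page(addr: int) -> int:
    """Byte offset of addr within its page."""
    return addr & ~PAGE_MASK


print(hex(offset_in_page(0x12345)))
print(offset_in_page(PAGE_SIZE))
```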
      
      Link: http://lkml.kernel.org/r/4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: revert hb threshold to keep compatible · 33496c3c
      Junxiao Bi authored
      Configfs is the interface ocfs2-tools uses to pass configuration to the
      kernel, and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold
      is the attribute used to configure the heartbeat dead threshold.  The
      kernel has a default value for it, but the user can set
      O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it.
      
      Commit 45b99773 ("ocfs2/cluster: use per-attribute show and store
      methods") changed heartbeat dead threshold name while ocfs2-tools did
      not, so ocfs2-tools won't set this configurable and the default value is
      always used.  So revert it.
      
      Fixes: 45b99773 ("ocfs2/cluster: use per-attribute show and store methods")
      Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Acked-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • blackfin: bf609: let clk_disable() return immediately if clk is NULL · accce8e7
      Masahiro Yamada authored
      In many of clk_disable() implementations, it is a no-op for a NULL
      pointer input, but this is one of the exceptions.
      
      Making it treewide consistent will allow clock consumers to call
      clk_disable() without NULL pointer check.
      
      Link: http://lkml.kernel.org/r/1490692624-11931-4-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Michael Turquette <mturquette@baylibre.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • scripts/spelling.txt: add several more common spelling mistakes · 672934d2
      Colin Ian King authored
      Here are some of the more common spelling mistakes that I've found while
      fixing up spelling mistakes in kernel error message text.  They probably
      should be added to this list so we don't keep on seeing them appearing
      again.
      
      Link: http://lkml.kernel.org/r/20170421122534.5378-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/dma-debug.c: make locking work for RT · 6a5cd60b
      Pankaj Gupta authored
      Enabling/disabling interrupts separately from taking the spinlock is
      not a valid operation for RT, as it can make executing tasks sleep from
      a non-sleepable context.  So convert it to spin_lock_irqsave() and
      spin_unlock_irqrestore().
      
      Link: http://lkml.kernel.org/r/1492065666-3816-1-git-send-email-pagupta@redhat.com
      Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Miles Chen <miles.chen@mediatek.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a5cd60b
    • Linus Torvalds's avatar
      Merge branch 'stable-4.12' of git://git.infradead.org/users/pcmoore/audit · 46f0537b
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Fourteen audit patches for v4.12 that span the full range of fixes,
        new features, and internal cleanups.
      
        We have patches to move to 64-bit timestamps, convert refcounts from
        atomic_t to refcount_t, track PIDs using the pid struct instead of
        pid_t, convert our own private audit buffer cache to a standard
        kmem_cache, log kernel module names when they are unloaded, and
        normalize the NETFILTER_PKT to make the userspace folks happier.
      
        From a fixes perspective, the most important is likely the auditd
        connection tracking RCU fix; it was a rather brain dead bug that I'll
        take the blame for, but thankfully it didn't seem to affect many
        people (only one report).
      
        I think the patch subject lines and commit descriptions do a pretty
        good job of explaining the details and why the changes are important
        so I'll point you there instead of duplicating it here; as usual, if
        you have any questions you know where to find us.
      
        We also manage to take out more code than we put in this time, that
        always makes me happy :)"
      
      * 'stable-4.12' of git://git.infradead.org/users/pcmoore/audit:
        audit: fix the RCU locking for the auditd_connection structure
        audit: use kmem_cache to manage the audit_buffer cache
        audit: Use timespec64 to represent audit timestamps
        audit: store the auditd PID as a pid struct instead of pid_t
        audit: kernel generated netlink traffic should have a portid of 0
        audit: combine audit_receive() and audit_receive_skb()
        audit: convert audit_watch.count from atomic_t to refcount_t
        audit: convert audit_tree.count from atomic_t to refcount_t
        audit: normalize NETFILTER_PKT
        netfilter: use consistent ipv4 network offset in xt_AUDIT
        audit: log module name on delete_module
        audit: remove unnecessary semicolon in audit_watch_handle_event()
        audit: remove unnecessary semicolon in audit_mark_handle_event()
        audit: remove unnecessary semicolon in audit_field_valid()
      46f0537b
    • Linus Torvalds's avatar
      Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 0302e28d
      Linus Torvalds authored
      Pull security subsystem updates from James Morris:
       "Highlights:
      
        IMA:
         - provide ">" and "<" operators for fowner/uid/euid rules
      
        KEYS:
         - add a system blacklist keyring
      
         - add KEYCTL_RESTRICT_KEYRING, exposes keyring link restriction
           functionality to userland via keyctl()
      
        LSM:
         - harden LSM API with __ro_after_init
      
         - add prlimit security hook, implement for SELinux
      
         - revive security_task_alloc hook
      
        TPM:
         - implement contextual TPM command 'spaces'"
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits)
        tpm: Fix reference count to main device
        tpm_tis: convert to using locality callbacks
        tpm: fix handling of the TPM 2.0 event logs
        tpm_crb: remove a cruft constant
        keys: select CONFIG_CRYPTO when selecting DH / KDF
        apparmor: Make path_max parameter readonly
        apparmor: fix parameters so that the permission test is bypassed at boot
        apparmor: fix invalid reference to index variable of iterator line 836
        apparmor: use SHASH_DESC_ON_STACK
        security/apparmor/lsm.c: set debug messages
        apparmor: fix boolreturn.cocci warnings
        Smack: Use GFP_KERNEL for smk_netlbl_mls().
        smack: fix double free in smack_parse_opts_str()
        KEYS: add SP800-56A KDF support for DH
        KEYS: Keyring asymmetric key restrict method with chaining
        KEYS: Restrict asymmetric key linkage using a specific keychain
        KEYS: Add a lookup_restriction function for the asymmetric key type
        KEYS: Add KEYCTL_RESTRICT_KEYRING
        KEYS: Consistent ordering for __key_link_begin and restrict check
        KEYS: Add an optional lookup_restriction hook to key_type
        ...
      0302e28d
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 89c9fea3
      Linus Torvalds authored
      Pull trivial tree updates from Jiri Kosina.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        tty: fix comment for __tty_alloc_driver()
        init/main: properly align the multi-line comment
        init/main: Fix double "the" in comment
        Fix dead URLs to ftp.kernel.org
        drivers: Clean up duplicated email address
        treewide: Fix typo in xml/driver-api/basics.xml
        tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall"
        selftests/timers: Spelling s/privledges/privileges/
        HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/
        net: phy: dp83848: Fix Typo
        UBI: Fix typos
        Documentation: ftrace.txt: Correct nice value of 120 priority
        net: fec: Fix typo in error msg and comment
        treewide: Fix typos in printk
      89c9fea3
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 76f1948a
      Linus Torvalds authored
      Pull livepatch updates from Jiri Kosina:
      
       - a per-task consistency model is being added for architectures that
         support reliable stack dumping (extending this set, which is
         currently rather small, is in the works).
      
         This extends the nature of the types of patches that can be applied
         by live patching infrastructure. The code stems from the design
         proposal made [1] back in November 2014. It's a hybrid of SUSE's
         kGraft and RH's kpatch, combining advantages of both: it uses
         kGraft's per-task consistency and syscall barrier switching combined
         with kpatch's stack trace switching. There are also a number of
         fallback options which make it quite flexible.
      
         Most of the heavy lifting done by Josh Poimboeuf with help from
         Miroslav Benes and Petr Mladek
      
         [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
      
       - module load time patch optimization from Zhou Chengming
      
       - a few assorted small fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: add missing printk newlines
        livepatch: Cancel transition a safe way for immediate patches
        livepatch: Reduce the time of finding module symbols
        livepatch: make klp_mutex proper part of API
        livepatch: allow removal of a disabled patch
        livepatch: add /proc/<pid>/patch_state
        livepatch: change to a per-task consistency model
        livepatch: store function sizes
        livepatch: use kstrtobool() in enabled_store()
        livepatch: move patching functions into patch.c
        livepatch: remove unnecessary object loaded check
        livepatch: separate enabled and patched states
        livepatch/s390: add TIF_PATCH_PENDING thread flag
        livepatch/s390: reorganize TIF thread flag bits
        livepatch/powerpc: add TIF_PATCH_PENDING thread flag
        livepatch/x86: add TIF_PATCH_PENDING thread flag
        livepatch: create temporary klp_update_patch_state() stub
        x86/entry: define _TIF_ALLWORK_MASK flags explicitly
        stacktrace/x86: add function for detecting reliable stack traces
      76f1948a
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 7af4c727
      Linus Torvalds authored
      Pull HID subsystem updates from Jiri Kosina:
      
       - The need for HID_QUIRK_NO_INIT_REPORTS per-device quirk has been
         growing dramatically during past years, so the time has come to
         switch over the default, and perform the pro-active reading only in
         cases where it's really needed (multitouch, wacom).
      
         The only place where this behavior is (in some form) preserved is
         hiddev so that we don't introduce userspace-visible change of
         behavior.
      
         From Benjamin Tissoires
      
       - HID++ support for power_supply / battery reporting.
      
         From Benjamin Tissoires and Bastien Nocera
      
       - Vast improvements / rework of DS3 and DS4 in Sony driver.
      
         From Roderick Colenbrander
      
       - Improvement (in terms of getting closer to Microsoft's
         interpretation of a slightly ambiguous specification) of logical range
         interpretation in case null-state is set in the rdesc.
      
         From Valtteri Heikkilä and Tomasz Kramkowski
      
       - A lot of newly supported device IDs and small assorted fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (71 commits)
        HID: usbhid: Add HID_QUIRK_NOGET for Aten CS-1758 KVM switch
        HID: asus: support backlight on USB keyboards
        HID: wacom: Move wacom_remote_irq and wacom_remote_status_irq
        HID: wacom: generic: sync pad events only for actual packets
        HID: sony: remove redundant check for -ve err
        HID: sony: Make sure to unregister sensors on failure
        HID: sony: Make DS4 bt poll interval adjustable
        HID: sony: Set proper bit flags on DS4 output report
        HID: sony: DS4 use brighter LED colors
        HID: sony: Improve navigation controller axis/button mapping
        HID: sony: Use DS3 MAC address as unique identifier on USB
        HID: logitech-hidpp: add a sysfs file to tell we support power_supply
        HID: logitech-hidpp: enable HID++ 1.0 battery reporting
        HID: logitech-hidpp: add support for battery status for the K750
        HID: logitech-hidpp: battery: provide CAPACITY_LEVEL
        HID: logitech-hidpp: rename battery level into capacity
        HID: logitech-hidpp: battery: provide ONLINE property
        HID: logitech-hidpp: notify battery on connect
        HID: logitech-hidpp: return an error if the queried feature is not present
        HID: logitech-hidpp: create the battery for all types of HID++ devices
        ...
      7af4c727
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 68fed41e
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "This is the bulk of pin control changes for the v4.12 cycle.
      
        The extra week before the merge window actually resulted in some of
        the type of fixes that usually arrive after the merge window already
        starting to trickle in from eager developers using -next, I'm
        impressed.
      
        I have recruited a Samsung subsubsystem maintainer (Krzysztof) to deal
        with the onset of Samsung patches. It works great.
      
        Apart from that it is a boring round, just incremental updates and
        fixes all over the place, no serious core changes or anything exciting
        like that. The most pleasing to see is Julia Cartwright's work to audit
        the irqchip-providing drivers for realtime locking compliance. It's
        one of those "I should really get around to looking into that" things
        that have been on my TODO list since forever.
      
        Summary:
      
        Core changes:
      
         - add bi-directional and output-enable pin configurations to the
           generic bindings and generic pin controlling core.
      
        New drivers or subdrivers:
      
         - Armada 37xx SoC pin controller and GPIO support.
      
         - Axis ARTPEC-6 SoC pin controller support.
      
         - AllWinner A64 R_PIO controller support, and opening up the
           AllWinner sunxi driver for ARM64 use.
      
         - Rockchip RK3328 support.
      
         - Renesas R-Car H3 ES2.0 support.
      
         - STM32F469 support in the STM32 driver.
      
         - Aspeed G4 and G5 pin controller support.
      
        Improvements:
      
         - a whole slew of realtime improvements to drivers implementing
           irqchips: BCM, AMD, SiRF, sunxi, rockchip.
      
         - switch meson driver to get the GPIO ranges from the device tree.
      
         - input schmitt trigger support on the Rockchip driver.
      
         - enable the sunxi (AllWinner) driver to also be used on ARM64
           silicon.
      
         - name the Qualcomm QDF2xxx GPIO lines.
      
         - support GMMR GPIO regions on the Intel Cherryview. This fixes a
           serialization problem on these platforms.
      
         - pad retention support for the Samsung Exynos 5433.
      
         - handle suspend-to-ram in the AT91-pio4 driver.
      
         - pin configuration support in the Aspeed driver.
      
        Cleanups:
      
         - the final name of Rockchip RK1108 was RV1108 so rename the driver
           and variables to stay consistent"
      
      * tag 'pinctrl-v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (80 commits)
        pinctrl: mediatek: Add missing pinctrl bindings for mt7623
        pinctrl: artpec6: Fix return value check in artpec6_pmx_probe()
        pinctrl: artpec6: Remove .owner field for driver
        pinctrl: tegra: xusb: Silence sparse warnings
        ARM: at91/at91-pinctrl documentation: fix spelling mistake: "contoller" -> "controller"
        pinctrl: make artpec6 explicitly non-modular
        pinctrl: aspeed: g5: Add pinconf support
        pinctrl: aspeed: g4: Add pinconf support
        pinctrl: aspeed: Add core pinconf support
        pinctrl: aspeed: Document pinconf in devicetree bindings
        pinctrl: Add st,stm32f469-pinctrl compatible to stm32-pinctrl
        pinctrl: stm32: Add STM32F469 MCU support
        Documentation: dt: Remove ngpios from stm32-pinctrl binding
        pinctrl: stm32: replace device_initcall() with arch_initcall()
        pinctrl: stm32: add possibility to use gpio-ranges to declare bank range
        pinctrl: armada-37xx: Add gpio support
        pinctrl: armada-37xx: Add pin controller support for Armada 37xx
        pinctrl: dt-bindings: Add documentation for Armada 37xx pin controllers
        pinctrl: core: Make pinctrl_init_controller() static
        pinctrl: generic: Add bi-directional and output-enable
        ...
      68fed41e
    • Linus Torvalds's avatar
      Merge tag 'mmc-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · be580e75
      Linus Torvalds authored
      Pull MMC updates from Ulf Hansson:
       "MMC core:
         - Continue to re-factor code to prepare for eMMC CMDQ and blkmq support
         - Introduce queue semantics to prepare for eMMC CMDQ and blkmq support
         - Add helper functions to manage temporary enable/disable of eMMC CMDQ
         - Improve wait-busy detection for SDIO
      
        MMC host:
         - cavium: Add driver to support Cavium controllers
         - cavium: Extend Cavium driver to support Octeon and ThunderX SOCs
         - bcm2835: Add new driver for Broadcom BCM2835 controller
         - sdhci-xenon: Add driver to support Marvell Xenon SDHCI controller
         - sdhci-tegra: Add support for the Tegra186 variant
         - sdhci-of-esdhc: Support for UHS-I SD cards
         - sdhci-of-esdhc: Support for eMMC HS200 cards
         - sdhci-cadence: Add eMMC HS400 enhanced strobe support
         - sdhci-esdhc-imx: Reset tuning circuit when needed
         - sdhci-pci: Modernize and clean-up some PM related code
         - sdhci-pci: Avoid re-tuning at runtime PM for some Intel devices
         - sdhci-pci|acpi: Use aggressive PM for some Intel BYT controllers
         - sdhci: Re-factoring and modernizations
         - sdhci: Optimize delay loops
         - sdhci: Improve register dump print format
         - sdhci: Add support for the Command Queue Engine
         - meson-gx: Various improvements and clean-ups
         - meson-gx: Add support for CMD23
         - meson-gx: Basic tuning support to avoid CRC errors
         - s3cmci: Enable probing via DT
         - mediatek: Improve tuning support for eMMC HS200 and HS400 mode
         - tmio: Improve DMA support
         - tmio: Use correct response for CMD12
         - dw_mmc: Minor improvements and clean-ups"
      
      * tag 'mmc-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (148 commits)
        mmc: sdhci-of-esdhc: limit SD clock for ls1012a/ls1046a
        mmc: sdhci-of-esdhc: poll ESDHC_CLOCK_STABLE bit with udelay
        mmc: sdhci-xenon: Fix default value of LOGIC_TIMING_ADJUST for eMMC5.0 PHY
        mmc: sdhci-xenon: Fix the work flow in xenon_remove().
        MIPS: Octeon: cavium_octeon_defconfig: Enable Octeon MMC
        mmc: sdhci-xenon: Remove redundant dev_err call in get_dt_pad_ctrl_data()
        mmc: cavium: Use module_pci_driver to simplify the code
        mmc: cavium: Add MMC support for Octeon SOCs.
        mmc: cavium: Fix detection of block or byte addressing.
        mmc: core: Export API to allow hosts to get the card address
        mmc: sdio: Fix sdio wait busy implement limitation
        mmc: sdhci-esdhc-imx: reset tuning circuit when power on mmc card
        clk: apn806: fix spelling mistake: "mising" -> "missing"
        mmc: sdhci-of-esdhc: add delay between tuning cycles
        mmc: sdhci: Control the delay between tuning commands
        mmc: sdhci-of-esdhc: add tuning support
        mmc: sdhci-of-esdhc: add support for signal voltage switch
        mmc: sdhci-of-esdhc: add peripheral clock support
        mmc: sdhci-pci: Allow for 3 bytes from Intel DSM
        mmc: cavium: Fix a shift wrapping bug
        ...
      be580e75
  2. 02 May, 2017 12 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · 8d65b08d
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "Here are some highlights from the 2065 networking commits that
        happened this development cycle:
      
         1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
      
         2) Add a generic XDP driver, so that anyone can test XDP even if they
            lack a networking device whose driver has explicit XDP support
            (me).
      
         3) Sparc64 now has an eBPF JIT too (me)
      
         4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
            Starovoitov)
      
         5) Make netfilter network namespace teardown less expensive (Florian
            Westphal)
      
         6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
      
         7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
      
         8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
      
         9) Multiqueue support in stmmac driver (Joao Pinto)
      
        10) Remove TCP timewait recycling, it never really could possibly work
            well in the real world and timestamp randomization really zaps any
            hint of usability this feature had (Soheil Hassas Yeganeh)
      
        11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
            Aleksandrov)
      
        12) Add socket busy poll support to epoll (Sridhar Samudrala)
      
        13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
            and several others)
      
        14) IPSEC hw offload infrastructure (Steffen Klassert)"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
        tipc: refactor function tipc_sk_recv_stream()
        tipc: refactor function tipc_sk_recvmsg()
        net: thunderx: Optimize page recycling for XDP
        net: thunderx: Support for XDP header adjustment
        net: thunderx: Add support for XDP_TX
        net: thunderx: Add support for XDP_DROP
        net: thunderx: Add basic XDP support
        net: thunderx: Cleanup receive buffer allocation
        net: thunderx: Optimize CQE_TX handling
        net: thunderx: Optimize RBDR descriptor handling
        net: thunderx: Support for page recycling
        ipx: call ipxitf_put() in ioctl error path
        net: sched: add helpers to handle extended actions
        qed*: Fix issues in the ptp filter config implementation.
        qede: Fix concurrency issue in PTP Tx path processing.
        stmmac: Add support for SIMATIC IOT2000 platform
        net: hns: fix ethtool_get_strings overflow in hns driver
        tcp: fix wraparound issue in tcp_lp
        bpf, arm64: fix jit branch offset related to ldimm64
        bpf, arm64: implement jiting of BPF_XADD
        ...
      8d65b08d
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 5a0387a8
      Linus Torvalds authored
      Pull crypto updates from Herbert Xu:
       "Here is the crypto update for 4.12:
      
        API:
         - Add batch registration for acomp/scomp
         - Change acomp testing to non-unique compressed result
         - Extend algorithm name limit to 128 bytes
         - Require setkey before accept(2) in algif_aead
      
        Algorithms:
         - Add support for deflate rfc1950 (zlib)
      
        Drivers:
         - Add accelerated crct10dif for powerpc
         - Add crc32 in stm32
         - Add sha384/sha512 in ccp
         - Add 3des/gcm(aes) for v5 devices in ccp
         - Add Queue Interface (QI) backend support in caam
         - Add new Exynos RNG driver
         - Add ThunderX ZIP driver
         - Add driver for hardware random generator on MT7623 SoC"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (101 commits)
        crypto: stm32 - Fix OF module alias information
        crypto: algif_aead - Require setkey before accept(2)
        crypto: scomp - add support for deflate rfc1950 (zlib)
        crypto: scomp - allow registration of multiple scomps
        crypto: ccp - Change ISR handler method for a v5 CCP
        crypto: ccp - Change ISR handler method for a v3 CCP
        crypto: crypto4xx - rename ce_ring_contol to ce_ring_control
        crypto: testmgr - Allow ecb(cipher_null) in FIPS mode
        Revert "crypto: arm64/sha - Add constant operand modifier to ASM_EXPORT"
        crypto: ccp - Disable interrupts early on unload
        crypto: ccp - Use only the relevant interrupt bits
        hwrng: mtk - Add driver for hardware random generator on MT7623 SoC
        dt-bindings: hwrng: Add Mediatek hardware random generator bindings
        crypto: crct10dif-vpmsum - Fix missing preempt_disable()
        crypto: testmgr - replace compression known answer test
        crypto: acomp - allow registration of multiple acomps
        hwrng: n2 - Use devm_kcalloc() in n2rng_probe()
        crypto: chcr - Fix error handling related to 'chcr_alloc_shash'
        padata: get_next is never NULL
        crypto: exynos - Add new Exynos RNG driver
        ...
      5a0387a8
    • David S. Miller's avatar
      Merge branch 'tipc-refactor-socket-receive-functions' · 5d15af67
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: refactor socket receive functions
      
      We try to make the functions tipc_sk_recvmsg() and
      tipc_sk_recvstream() more readable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d15af67
    • Jon Paul Maloy's avatar
      tipc: refactor function tipc_sk_recv_stream() · ec8a09fb
      Jon Paul Maloy authored
      We try to make this function more readable by improving variable names
      and comments, using more stack variables, and making some smaller changes
      to the logic. We also rename the function to make it consistent with
      naming conventions used elsewhere in the code.
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec8a09fb
    • Jon Paul Maloy's avatar
      tipc: refactor function tipc_sk_recvmsg() · e9f8b101
      Jon Paul Maloy authored
      We try to make this function more readable by improving variable names
      and comments, plus some minor changes to the logic.
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9f8b101
    • David S. Miller's avatar
      Merge branch 'thunderx-xdp' · b0e92279
      David S. Miller authored
      Sunil Goutham says:
      
      ====================
      net: thunderx: Adds XDP support
      
      This patch series adds support for XDP to the ThunderX NIC driver,
      which is used on CN88xx, CN81xx and CN83xx platforms.
      
      Patches 1-4 are performance improvement and cleanup patches
      which are done keeping XDP performance bottlenecks in view.
      The rest of the patches add the actual XDP support.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0e92279
    • Sunil Goutham's avatar
      net: thunderx: Optimize page recycling for XDP · 77322538
      Sunil Goutham authored
      The driver takes one extra reference on the page for recycling,
      which is fine in the usual packet path, where each 64KB page is
      segmented into multiple receive buffers.
      
      But in XDP mode, since there is just one receive buffer per
      page, taking the extra page reference itself becomes a big bottleneck,
      consuming ~50% of CPU cycles due to atomic operations.
      
      This patch adds an internal ref count in pgcache for each
      page, and additional page references are taken in a batch
      instead of just one at a time. The internal counter, i.e.
      'pgcache->ref_count', and the page's counter, i.e. 'page->_refcount',
      are compared to check the page's recyclability.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77322538
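The batching idea in the commit above can be modelled in plain C. This is a simplified sketch under stated assumptions, not the driver's code: `REF_BATCH`, `struct page_model`, `struct pgcache_model` and all function names are hypothetical, the batch size of 4 is arbitrary, and a plain `int` stands in for the atomic `page->_refcount` (the `atomic_ops` field only counts how often the expensive update would happen).

```c
#include <assert.h>
#include <stdbool.h>

#define REF_BATCH 4 /* assumed batch size, chosen only for illustration */

/* Models a page: 'refcount' stands in for page->_refcount. */
struct page_model {
    int refcount;
    int atomic_ops; /* how many times the (costly) shared counter was bumped */
};

/* Models a pgcache entry: 'ref_count' is the driver's private counter. */
struct pgcache_model {
    struct page_model *page;
    int ref_count;
};

/* Take REF_BATCH references in one update once the private count runs out. */
static void take_batch_if_empty(struct pgcache_model *pc)
{
    if (pc->ref_count == 0) {
        pc->page->refcount += REF_BATCH;
        pc->page->atomic_ops++;
        pc->ref_count = REF_BATCH;
    }
}

/* Fresh page: one reference travels with the first buffer handed out. */
void pgcache_init(struct pgcache_model *pc, struct page_model *pg)
{
    pg->refcount = 1;
    pg->atomic_ops = 0;
    pc->page = pg;
    pc->ref_count = 0;
    take_batch_if_empty(pc);
}

/* The consumer (stack or XDP program) is done with the buffer. */
void consumer_put(struct page_model *pg)
{
    assert(pg->refcount > 0);
    pg->refcount--;
}

/* Recyclable only when nobody but the driver still holds a reference. */
bool pgcache_try_recycle(struct pgcache_model *pc)
{
    if (pc->page->refcount != pc->ref_count)
        return false;
    pc->ref_count--; /* this reference leaves with the recycled buffer */
    take_batch_if_empty(pc);
    return true;
}
```

In this model the shared counter is updated once per `REF_BATCH` recycles instead of once per packet, which is the effect the commit message attributes to batching; equality of the two counters signals that the previously handed-out buffer has been returned.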
    • Sunil Goutham's avatar
      net: thunderx: Support for XDP header adjustment · e3d06ff9
      Sunil Goutham authored
      When in XDP mode, reserve XDP_PACKET_HEADROOM bytes at the start
      of the receive buffer for the XDP program to modify headers and adjust
      the packet start. Additional code changes handle such packets.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3d06ff9
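The headroom reservation described above can be illustrated with offsets into a receive buffer. This is a sketch only: `struct rx_buf_model` and both functions are hypothetical, and the bounds check is deliberately simpler than what the kernel's head-adjust helper enforces. The value 256 for `XDP_PACKET_HEADROOM` matches the kernel's UAPI definition.

```c
#include <assert.h>
#include <stddef.h>

#define XDP_PACKET_HEADROOM 256

/* Hypothetical model of one receive buffer, tracked as byte offsets. */
struct rx_buf_model {
    size_t data_off; /* where the packet currently starts */
    size_t end_off;  /* one past the last packet byte */
};

/* Place the received packet after the reserved headroom. */
void rx_buf_init(struct rx_buf_model *b, size_t pkt_len)
{
    assert(b != NULL);
    b->data_off = XDP_PACKET_HEADROOM;
    b->end_off = b->data_off + pkt_len;
}

/*
 * Move the packet start by 'delta' bytes: negative grows the header into
 * the headroom, positive strips bytes from the front.
 * Returns 0 on success, -1 if the new start would leave the buffer.
 */
int rx_buf_adjust_head(struct rx_buf_model *b, long delta)
{
    long new_off = (long)b->data_off + delta;

    if (new_off < 0 || (size_t)new_off >= b->end_off)
        return -1;
    b->data_off = (size_t)new_off;
    return 0;
}
```

Because the packet starts 256 bytes into the buffer, a program that wants to prepend an encapsulation header can move the start backwards without any copy, up to the size of the reserved area.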
    • Sunil Goutham's avatar
      net: thunderx: Add support for XDP_TX · 16f2bccd
      Sunil Goutham authored
      Adds support for XDP_TX, i.e. transmits the packet out of
      the XDP TX queue mapped to the corresponding Rx queue
      on which the packet was received.
      
      Since an SQ for XDP TX will be used only on a single CPU, i.e.
      SQ description creation and freeing, using an atomic free count
      is not necessary and would become a bottleneck. Hence a separate
      'xdp_free_cnt' is added for SQs designated for XDP
      to track the descriptor free count.
      
      Changes also include
      - A new entry 'xdp_page' is added to save the transmitted packet's
        page pointer for later cleanup.
      - The XDP Tx SQ's doorbell is rung once per NAPI instance.
      - Retrieving the designated SQ for packets being sent out by the stack
        via 'nicvf_xmit'.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      16f2bccd
    • Sunil Goutham's avatar
      net: thunderx: Add support for XDP_DROP · c56d91ce
      Sunil Goutham authored
      Adds support for XDP_DROP.
      Also, since in XDP mode there is just a single buffer per page,
      the DMA mapping info is now recycled along with the pages.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c56d91ce
    • Sunil Goutham's avatar
      net: thunderx: Add basic XDP support · 05c773f5
      Sunil Goutham authored
      Adds basic XDP support, i.e. attaching a BPF program to an
      interface. Also takes care of allocating separate Tx queues
      for the XDP path and for network stack packet transmission.
      
      This patch doesn't support handling of any of the XDP actions;
      all are treated as XDP_PASS, i.e. packets will be handed over to
      the network stack.
      
      Changes also involve allocating one receive buffer per page in XDP
      mode and multiple in normal mode, i.e. when no BPF program is attached.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05c773f5
    • Sunil Goutham's avatar
      net: thunderx: Cleanup receive buffer allocation · 927987f3
      Sunil Goutham authored
      Get rid of unnecessary double pointer references and type casting
      in receive buffer allocation code.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      927987f3