1. 10 Oct, 2014 40 commits
    • shmem: fix init_page_accessed use to stop !PageLRU bug · d6105cda
      Hugh Dickins authored
      Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU)
      in isolate_lru_pages() at mm/vmscan.c:1281!
      
      Commit 2457aec6 ("mm: non-atomically mark page accessed during page
      cache allocation where possible") looks like interrupted work-in-progress.
      
      mm/filemap.c's call to init_page_accessed() is fine, but not mm/shmem.c's
      - shmem_write_begin() is clearly wrong to use it after shmem_getpage(),
      when the page is always visible in radix_tree, and often already on LRU.
      
      Revert change to shmem_write_begin(), and use init_page_accessed() or
      mark_page_accessed() appropriately for SGP_WRITE in shmem_getpage_gfp().
      
      SGP_WRITE also covers shmem_symlink(), which did not mark_page_accessed()
      before; but since many other filesystems use [__]page_symlink(), which did
      and does mark the page accessed, consider this as rectifying an oversight.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Prabhakar Lad <prabhakar.csengg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 66d2f4d2)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: avoid unnecessary atomic operations during end_page_writeback() · 0873b697
      Mel Gorman authored
      If a page is marked for immediate reclaim then it is moved to the tail of
      the LRU list.  This occurs when the system is under enough memory pressure
      for pages under writeback to reach the end of the LRU but we test for this
      using atomic operations on every writeback.  This patch uses an optimistic
      non-atomic test first.  It'll miss some pages in rare cases but the
      consequences are not severe enough to warrant such a penalty.
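
      The "optimistic non-atomic test first" idea can be sketched in userspace
      with C11 atomics.  This is only an illustration, not the kernel code: the
      flag name and both helper functions are hypothetical, and a relaxed load
      stands in for the kernel's plain page-flag test.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_RECLAIM (1UL << 0)   /* hypothetical flag bit */

/* Baseline: always pays for an atomic read-modify-write. */
static bool test_and_clear_always(atomic_ulong *flags)
{
    unsigned long old = atomic_fetch_and(flags, ~FLAG_RECLAIM);
    return (old & FLAG_RECLAIM) != 0;
}

/* Optimistic: cheap relaxed load first, atomic RMW only if the flag looks set. */
static bool test_and_clear_optimistic(atomic_ulong *flags)
{
    if (!(atomic_load_explicit(flags, memory_order_relaxed) & FLAG_RECLAIM))
        return false;   /* may miss a flag set concurrently; tolerated here */

    unsigned long old = atomic_fetch_and(flags, ~FLAG_RECLAIM);
    return (old & FLAG_RECLAIM) != 0;
}
```

      In the common case where the flag is clear, the optimistic variant never
      issues the atomic read-modify-write at all.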
      
      While the function does not dominate profiles during a simple dd test the
      cost of it is reduced.
      
      73048     0.7428  vmlinux-3.15.0-rc5-mmotm-20140513 end_page_writeback
      23740     0.2409  vmlinux-3.15.0-rc5-lessatomic     end_page_writeback
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 888cf2db)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • fs: buffer: do not use unnecessary atomic operations when discarding buffers · 9e4b51e6
      Mel Gorman authored
      Discarding buffers uses a bunch of atomic operations because ......  I
      can't think of a reason.  Use a cmpxchg loop to clear all the necessary
      flags.  In most (all?) cases this will be a single atomic operation.
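
      A compare-and-exchange loop that clears several flag bits in one atomic
      update might look like the sketch below.  It uses C11 atomics rather than
      the kernel's cmpxchg(), and the flag names are hypothetical stand-ins for
      the buffer state bits.

```c
#include <stdatomic.h>

/* Hypothetical flag bits standing in for buffer state bits. */
#define F_MAPPED      (1UL << 0)
#define F_NEW         (1UL << 1)
#define F_DELAY       (1UL << 2)
#define F_LOCKED      (1UL << 3)
#define FLAGS_DISCARD (F_MAPPED | F_NEW | F_DELAY)

/* Clear every bit in FLAGS_DISCARD with one successful compare-exchange. */
static void discard_flags(atomic_ulong *state)
{
    unsigned long old = atomic_load(state);

    /* On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(state, &old, old & ~FLAGS_DISCARD))
        ;
}
```

      Without contention the loop body runs once, which matches the "single
      atomic operation" expectation in the changelog.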
      
      [akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit e7470ee8)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: shmem: avoid atomic operation during shmem_getpage_gfp · 382ee384
      Mel Gorman authored
      shmem_getpage_gfp uses an atomic operation to set the SwapBacked field
      before the page is even added to the LRU or visible.  This is unnecessary:
      what could it possibly race against?  Use an unlocked variant.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 07a42788)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: page_alloc: only check the alloc flags and gfp_mask for dirty once · 0011f433
      Mel Gorman authored
      Currently it's calculated once per zone in the zonelist.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit a6e21b14)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: page_alloc: only check the zone id check if pages are buddies · fd51043d
      Mel Gorman authored
      A node/zone index is used to check if pages are compatible for merging
      but this happens unconditionally even if the buddy page is not free. Defer
      the calculation as long as possible. Ideally we would check the zone boundary
      but nodes can overlap.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit d34c5fa0)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: page_alloc: do not treat a zone that cannot be used for dirty pages as "full" · e9845e8a
      Mel Gorman authored
      If a zone cannot be used for a dirty page then it gets marked "full" which
      is cached in the zlc and later potentially skipped by allocation requests
      that have nothing to do with dirty zones.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 800a1e75)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: page_alloc: do not update zlc unless the zlc is active · 6b8731fd
      Mel Gorman authored
      The zlc is used on NUMA machines to quickly skip over zones that are full.
      However it is always updated, even for the first zone scanned when the
      zlc might not even be active.  As it's a write to a bitmap that
      potentially bounces a cache line it's deceptively expensive and most
      machines will not care.  Only update the zlc if it was active.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 65bb3719)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB · 9654946f
      Shaohua Li authored
      We use the accessed bit to age a page at page reclaim time,
      and currently we also flush the TLB when doing so.
      
      But in some workloads TLB flush overhead is very heavy. In my
      simple multithreaded app with a lot of swap to several pcie
      SSDs, removing the tlb flush gives about 20% ~ 30% swapout
      speedup.
      
      Fortunately just removing the TLB flush is a valid optimization:
      on x86 CPUs, clearing the accessed bit without a TLB flush
      doesn't cause data corruption.
      
      It could cause incorrect page aging and the (mistaken) reclaim of
      hot pages, but the chance of that should be relatively low.
      
      So as a performance optimization don't flush the TLB when
      clearing the accessed bit, it will eventually be flushed by
      a context switch or a VM operation anyway. [ In the rare
      event of it not getting flushed for a long time the delay
      shouldn't really matter because there's no real memory
      pressure for swapout to react to. ]
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: linux-mm@kvack.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
      [ Rewrote the changelog and the code comments. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>

      (cherry picked from commit b13b1d2d)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm, compaction: terminate async compaction when rescheduling · 847fe19d
      David Rientjes authored
      Async compaction terminates prematurely when need_resched(), see
      compact_checklock_irqsave().  This can never trigger, however, if the
      cond_resched() in isolate_migratepages_range() always takes care of the
      scheduling.
      
      If the cond_resched() actually triggers, then terminate this pageblock
      scan for async compaction as well.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit aeef4b83)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm/compaction: clean up unused code lines · 3336b192
      Heesub Shin authored
      Remove code lines currently not in use or never called.
      Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 13fb44e4)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm/readahead.c: inline ra_submit · cba97a14
      Fabian Frederick authored
      Commit f9acc8c7 ("readahead: sanify file_ra_state names") left
      ra_submit with a single function call.
      
      Move ra_submit to internal.h and inline it to save some stack.  Thanks
      to Andrew Morton for commenting on different versions.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 29f175d1)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • callers of iov_copy_from_user_atomic() don't need pagefault_disable() · 22cefd43
      Al Viro authored
      ... it does that itself (via kmap_atomic())
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

      (cherry picked from commit 9e8c2af9)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: remove read_cache_page_async() · 66a51f47
      Sasha Levin authored
      This patch removes read_cache_page_async() which wasn't really needed
      anywhere and simplifies the code around it a bit.
      
      read_cache_page_async() is useful when we want to read a page into the
      cache without waiting for it to complete.  This happens when the
      appropriate callback 'filler' doesn't complete its read operation and
      releases the page lock immediately, and instead queues a different
      completion routine to do that.  This never actually happened anywhere in
      the code.
      
      read_cache_page_async() had 3 different callers:
      
      - read_cache_page() which is the sync version, it would just wait for
        the requested read to complete using wait_on_page_read().
      
      - JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
        function it supplied doesn't do any async reads, and would complete
        before the filler function returns - making it actually a sync read.
      
      - CRAMFS would call it using the read_mapping_page_async() wrapper, with
        a similar story to JFFS2 - the filler function doesn't do anything that
        resembles async reads and would always complete before the filler function
        returns.
      
      To sum it up, the code in mm/filemap.c never took advantage of having
      read_cache_page_async().  While there are filler callbacks that do async
      reads (such as the block one), we always called them through
      read_cache_page().
      
      This patch adds a mandatory wait for read to complete when adding a new
      page to the cache, and removes read_cache_page_async() and its wrappers.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 67f9fd91)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: filemap: move radix tree hole searching here · 08a37688
      Johannes Weiner authored
      The radix tree hole searching code is only used for page cache, for
      example the readahead code trying to get a picture of the area
      surrounding a fault.

      It sufficed to rely on the radix tree definition of holes, which is
      "empty tree slot".  But this is about to change, as shadow page
      descriptors will be stored in the page cache after the actual pages get
      evicted from memory.
      
      Move the functions over to mm/filemap.c and make them native page cache
      operations, where they can later be adapted to handle the new definition
      of "page cache hole".
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit e7b563bb)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: shmem: save one radix tree lookup when truncating swapped pages · 2948b933
      Johannes Weiner authored
      Page cache radix tree slots are usually stabilized by the page lock, but
      shmem's swap cookies have no such thing.  Because the overall truncation
      loop is lockless, the swap entry is currently confirmed by a tree lookup
      and then deleted by another tree lookup under the same tree lock region.
      
      Use radix_tree_delete_item() instead, which does the verification and
      deletion with only one lookup.  This also allows removing the
      delete-only special case from shmem_radix_tree_replace().
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 6dbaf22c)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • lib: radix-tree: add radix_tree_delete_item() · 5581e1f4
      Johannes Weiner authored
      Provide a function that does not just delete an entry at a given index,
      but also allows passing in an expected item.  Delete only if that item
      is still located at the specified index.
      
      This is handy when lockless tree traversals want to delete entries as
      well because they don't have to do a second, locked lookup to verify
      the slot has not changed under them before deleting the entry.
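
      The "delete only if the expected item is still there" contract can be
      illustrated with a toy slot table.  This is a sketch under loud
      assumptions: the struct and function below are hypothetical, a flat array
      stands in for the radix tree, and no locking is shown (in the kernel the
      caller would hold the tree lock around the call).

```c
#include <stddef.h>

#define NSLOTS 8   /* hypothetical: tiny stand-in for a radix tree */

struct slot_table {
    void *slots[NSLOTS];
};

/*
 * Delete the entry at index only if it still equals expected.
 * Returns the removed entry, or NULL if nothing was removed.
 */
static void *table_delete_item(struct slot_table *t, size_t index, void *expected)
{
    if (index >= NSLOTS)
        return NULL;

    void *entry = t->slots[index];
    if (entry == NULL || entry != expected)
        return NULL;    /* slot changed under us: leave it alone */

    t->slots[index] = NULL;
    return entry;
}
```

      A caller that found an entry during a lockless walk passes that same
      pointer as the expected item, so verification and deletion happen in a
      single lookup.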
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 53c59f26)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm, compaction: determine isolation mode only once · d57ab360
      David Rientjes authored
      The conditions that control the isolation mode in
      isolate_migratepages_range() do not change during the iteration, so
      extract them out and only define the value once.
      
      This actually does have an effect: gcc doesn't optimize it itself because
      of cc->sync.
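
      Hoisting a loop-invariant mode computation out of the scan loop can be
      sketched as follows.  Everything here is hypothetical (the mode bits, the
      control struct and the per-page "allowed modes" array); it only shows the
      shape of the change, not the kernel's isolate_migratepages_range().

```c
#include <stdbool.h>
#include <stddef.h>

#define MODE_ASYNC   0x1u   /* hypothetical mode bits */
#define MODE_UNEVICT 0x2u

struct control {
    bool sync;
    bool unevictable;
};

/* The mode depends only on the control struct, never on the loop index. */
static unsigned int isolation_mode(const struct control *cc)
{
    return (cc->sync ? 0u : MODE_ASYNC) | (cc->unevictable ? MODE_UNEVICT : 0u);
}

/* Count entries whose allowed-mode mask covers the requested mode. */
static size_t scan_range(const struct control *cc,
                         const unsigned int *allowed, size_t n)
{
    const unsigned int mode = isolation_mode(cc);   /* computed once */
    size_t isolated = 0;

    for (size_t i = 0; i < n; i++)
        if ((allowed[i] & mode) == mode)
            isolated++;
    return isolated;
}
```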
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit da1c67a7)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve · 540feba4
      Yasuaki Ishimatsu authored
      Yasuaki Ishimatsu reported that memory hot-add spent more than 5 _hours_
      on a 9TB memory machine because onlining memory sections is too slow.  We
      found that setup_zone_migrate_reserve spent >90% of the time.

      The problem is that setup_zone_migrate_reserve scans all pageblocks
      unconditionally, but this is only necessary if the number of reserved
      blocks was reduced (i.e.  memory hot remove).
      
      Moreover, maximum MIGRATE_RESERVE per zone is currently 2.  It means
      that the number of reserved pageblocks is almost always unchanged.
      
      This patch adds zone->nr_migrate_reserve_block to maintain the number of
      MIGRATE_RESERVE pageblocks and it reduces the overhead of
      setup_zone_migrate_reserve dramatically.  The following table shows the
      time taken to online a memory section.
      
        Amount of memory          | 128GB | 192GB | 256GB |
        ---------------------------------------------------
        linux-3.12                |  23.9 |  31.4 |  44.5 |
        This patch                |   8.3 |   8.3 |   8.6 |
        Mel's proposal patch (*1) |  10.9 |  19.2 |  31.3 |
        ---------------------------------------------------
                                             (milliseconds)
      
        128GB : 4 nodes and each node has 32GB of memory
        192GB : 6 nodes and each node has 32GB of memory
        256GB : 8 nodes and each node has 32GB of memory
      
        (*1) Mel proposed his idea in the following thread.
             https://lkml.org/lkml/2013/10/30/272
      
      [akpm@linux-foundation.org: tweak comment]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 943dca1a)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: compaction: reset scanner positions immediately when they meet · 179c9c61
      Vlastimil Babka authored
      Compaction used to start its migrate and free page scanners at the zone's
      lowest and highest pfn, respectively.  Later, caching was introduced to
      remember the scanners' progress across compaction attempts so that
      pageblocks are not re-scanned uselessly.  Additionally, pageblocks where
      isolation failed are marked to be quickly skipped when encountered again
      in future compactions.
      
      Currently, both the reset of cached pfn's and clearing of the pageblock
      skip information for a zone are done in __reset_isolation_suitable().
      This function gets called when:
      
       - compaction is restarting after being deferred
       - compact_blockskip_flush flag is set in compact_finished() when the scanners
         meet (and not again cleared when direct compaction succeeds in allocation)
         and kswapd acts upon this flag before going to sleep
      
      This behavior is suboptimal for several reasons:
      
       - when direct sync compaction is called after async compaction fails (in the
         allocation slowpath), it will effectively do nothing, unless kswapd
         happens to process the compact_blockskip_flush flag meanwhile. This is racy
         and goes against the purpose of sync compaction to more thoroughly retry
         the compaction of a zone where async compaction has failed.
         The restart-after-deferring path cannot help here as deferring happens only
         after the sync compaction fails. It is also done only for the preferred
         zone, while the compaction might be done for a fallback zone.
      
       - the mechanism of marking pageblock to be skipped has little value since the
         cached pfn's are reset only together with the pageblock skip flags. This
         effectively limits pageblock skip usage to parallel compactions.
      
      This patch changes compact_finished() so that cached pfn's are reset
      immediately when the scanners meet.  Clearing pageblock skip flags is
      unchanged, as well as the other situations where cached pfn's are reset.
      This allows the sync-after-async compaction to retry pageblocks not
      marked as skipped, such as !MIGRATE_MOVABLE blocks that async
      compaction now skips without marking them.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 55b7c4c9)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: compaction: do not mark unmovable pageblocks as skipped in async compaction · 55f9231b
      Vlastimil Babka authored
      Compaction temporarily marks pageblocks where it fails to isolate pages
      as to-be-skipped in further compactions, in order to improve efficiency.
      One of the reasons to fail isolating pages is that isolation is not
      attempted in pageblocks that are not of MIGRATE_MOVABLE (or CMA) type.
      
      The problem is that blocks skipped due to not being MIGRATE_MOVABLE in
      async compaction become skipped due to the temporary mark also in future
      sync compaction.  Moreover, this may follow quite soon during
      __alloc_pages_slowpath, without much time for kswapd to clear the
      pageblock skip marks.  This goes against the idea that sync compaction
      should try to scan these blocks more thoroughly than the async
      compaction.
      
      The fix is to ensure in async compaction that these !MIGRATE_MOVABLE
      blocks are not marked to be skipped.  Note this should not affect
      performance or locking impact of further async compactions, as skipping
      a block due to being !MIGRATE_MOVABLE is done soon after skipping a
      block marked to be skipped, both without locking.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 50b5b094)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: compaction: encapsulate defer reset logic · 0d8266bc
      Vlastimil Babka authored
      Currently there are several functions to manipulate the deferred
      compaction state variables.  The remaining case where the variables are
      touched directly is when a successful allocation occurs in direct
      compaction, or is expected to be successful in the future by kswapd.
      Here, the lowest order that is expected to fail is updated, and in the
      case of successful allocation, the deferred status and counter are reset
      completely.
      
      Create a new function compaction_defer_reset() to encapsulate this
      functionality and make it easier to understand the code.  No functional
      change.
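The reset behaviour described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct demo_zone` and `demo_compaction_defer_reset()` are hypothetical stand-ins, and the field names are chosen for readability rather than copied from the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the deferred-compaction state kept per zone. */
struct demo_zone {
    unsigned int compact_considered;  /* attempts skipped since the last deferral */
    unsigned int compact_defer_shift; /* log2 of the number of attempts to skip */
    int compact_order_failed;         /* lowest order that is expected to fail */
};

/*
 * Update the deferred state after compaction at the given order.
 * On a successful allocation the deferred status and counter are reset
 * completely; in every case the lowest order expected to fail is raised
 * above the order that is now expected to succeed.
 */
static void demo_compaction_defer_reset(struct demo_zone *zone, int order,
                                        bool alloc_success)
{
    if (alloc_success) {
        zone->compact_considered = 0;
        zone->compact_defer_shift = 0;
    }
    if (order >= zone->compact_order_failed)
        zone->compact_order_failed = order + 1;
}
```

Keeping both updates in one helper means callers in direct compaction and kswapd only differ in the value they pass for `alloc_success`.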
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit de6c60a6)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      0d8266bc
    • Damien Ramonda's avatar
      readahead: fix sequential read cache miss detection · 097bcfad
      Damien Ramonda authored
      The kernel's readahead algorithm sometimes interprets random read
      accesses as sequential and triggers unnecessary data prefetching from
      the storage device (impacting random read average latency).
      
      In order to identify sequential cache read misses, the readahead
      algorithm intends to check whether offset - previous offset == 1
      (trivial sequential reads) or offset - previous offset == 0 (sequential
      reads not aligned on page boundary):
      
        if (offset - (ra->prev_pos >> PAGE_CACHE_SHIFT) <= 1UL)
      
      The current offset is stored in the "offset" variable of type "pgoff_t"
      (unsigned long), while previous offset is stored in "ra->prev_pos" of
      type "loff_t" (long long).  Therefore, operands of the if statement are
      implicitly converted to type long long.  Consequently, when previous
      offset > current offset (which happens on random pattern), the if
      condition is true and access is wrongly interpreted as sequential.
      Unnecessary data prefetching is then triggered, impacting the average
      random read latency.
      
      Storing the previous offset value in a "pgoff_t" variable (unsigned
      long) fixes the sequential read detection logic.
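The conversion problem can be reproduced outside the kernel. The sketch below is an illustration, not the kernel code: it assumes a target where unsigned long is 32 bits wide and models that with fixed-width types so it behaves the same on any host. All `demo_` names and `DEMO_PAGE_SHIFT` are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

typedef uint32_t demo_pgoff_t; /* models a 32-bit unsigned long page index */
typedef int64_t demo_loff_t;   /* models loff_t (long long byte position) */

/*
 * Buggy check: the 32-bit unsigned offset is converted to the signed
 * 64-bit type, so a backwards jump yields a negative difference, and a
 * negative value is always <= 1.
 */
static int demo_buggy_is_sequential(demo_pgoff_t offset, demo_loff_t prev_pos)
{
    return offset - (prev_pos >> DEMO_PAGE_SHIFT) <= (demo_pgoff_t)1;
}

/*
 * Fixed check: the previous page index is first stored in the unsigned
 * page-index type, so the subtraction wraps around to a large value
 * when the previous offset is greater than the current one.
 */
static int demo_fixed_is_sequential(demo_pgoff_t offset, demo_loff_t prev_pos)
{
    demo_pgoff_t prev_offset = (demo_pgoff_t)(prev_pos >> DEMO_PAGE_SHIFT);

    return offset - prev_offset <= (demo_pgoff_t)1;
}
```

With a previous position in page 100 and a current offset of page 10, the buggy variant reports a sequential access while the fixed variant does not; both agree for offsets 100 and 101.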
      Signed-off-by: Damien Ramonda <damien.ramonda@intel.com>
      Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
      Acked-by: Pierre Tardy <pierre.tardy@intel.com>
      Acked-by: David Cohen <david.a.cohen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit af248a0c)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      097bcfad
    • Hugh Dickins's avatar
      mm: fix bad rss-counter if remap_file_pages raced migration · 06574fd4
      Hugh Dickins authored
      Fix some "Bad rss-counter state" reports on exit, arising from the
      interaction between page migration and remap_file_pages(): zap_pte()
      must count a migration entry when zapping it.
      
      And yes, it is possible (though very unusual) to find an anon page or
      swap entry in a VM_SHARED nonlinear mapping: coming from that horrid
      get_user_pages(write, force) case which COWs even in a shared mapping.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Dave Jones <davej@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 88784396)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      06574fd4
    • Joonsoo Kim's avatar
      slab: correct pfmemalloc check · f47a2602
      Joonsoo Kim authored
      We check pfmemalloc per slab, not per page, as is_slab_pfmemalloc()
      shows.  So the pfmemalloc flag does not need to be set or cleared on
      the other pages of a slab.
      
      Therefore we should check pfmemalloc in the page flags of the first
      page, but the current implementation doesn't do that.
      virt_to_head_page(obj) just returns the 'struct page' of that object,
      not that of the first page, since SLAB doesn't use __GFP_COMP when
      CONFIG_MMU.  To get the 'struct page' of the first page, we first get
      the slab and then look it up via virt_to_head_page(slab->s_mem).
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@iki.fi>
      
      (cherry picked from commit 73293c2f)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      f47a2602
    • Bob Liu's avatar
      mm: thp: cleanup: mv alloc_hugepage to better place · be33396e
      Bob Liu authored
      Move alloc_hugepage() to a better place; there is no need for a
      separate #ifndef CONFIG_NUMA.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrew Davidoff <davidoff@qedmf.net>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 10dc4155)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      be33396e
    • Prarit Bhargava's avatar
      x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable() · b65c7fde
      Prarit Bhargava authored
      Further discussion here: http://marc.info/?l=linux-kernel&m=139073901101034&w=2
      
      kbuild, 0day kernel build service, outputs the warning:
      
      arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes
      is larger than 2048 bytes [-Wframe-larger-than=]
      
      because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the
      stack.   Fix this by moving the two cpumasks to a global file context.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1390915331-27375-1-git-send-email-prarit@redhat.com
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      
      (cherry picked from commit 39424e89)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b65c7fde
    • Prarit Bhargava's avatar
      x86: Add check for number of available vectors before CPU down · 2d8a1ddb
      Prarit Bhargava authored
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
      
      When a cpu is downed on a system, the irqs on the cpu are assigned to
      other cpus.  It is possible, however, that when a cpu is downed there
      aren't enough free vectors on the remaining cpus to account for the
      vectors from the cpu that is being downed.
      
      This results in an interesting "overflow" condition where irqs are
      "assigned" to a CPU but are not handled.
      
      For example, when downing cpus on a 1-64 logical processor system:
      
      <snip>
      [  232.021745] smpboot: CPU 61 is now offline
      [  238.480275] smpboot: CPU 62 is now offline
      [  245.991080] ------------[ cut here ]------------
      [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
      [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
      [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
      [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
      [  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
      [  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
      [  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
      [  246.082430] Call Trace:
      [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
      [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
      [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
      [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
      [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
      [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
      [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
      [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
      [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
      [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
      [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
      [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
      [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
      [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
      [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
      [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
      [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
      [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
      [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
      [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
      [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
      [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
      [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
      [  246.249610] ---[ end trace fb74fdef54d79039 ]---
      [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
      [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
      Last login: Mon Nov 11 08:35:14 from 10.18.17.119
      [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      
      (last lines keep repeating.  ixgbe driver is dead until module reload.)
      
      If the downed cpu has more vectors than are free on the remaining cpus on the
      system, it is possible that some vectors are "orphaned" even though they are
      assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
      watchdog fired and notified that something was wrong.
      
      This patch adds a function, check_vectors(), that compares the number
      of vectors on the CPU going down with the number of vectors available
      on the system.  If there aren't enough vectors for the CPU to go down,
      an error is returned and propagated back to userspace.
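The comparison can be pictured with a small user-space sketch. It is a simplification under stated assumptions: `demo_check_vectors()` is hypothetical, the per-CPU free-vector counts are passed in as a plain array, and the choice of `-ENOSPC` as the error value is an assumption, since the message only says that an error is returned.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Return 0 if the vectors in use on the CPU going down fit into the
 * free vectors of the remaining CPUs, or a negative errno otherwise.
 * free_vectors[i] is the number of unused vectors on remaining CPU i.
 */
static int demo_check_vectors(unsigned int vectors_to_move,
                              const unsigned int *free_vectors,
                              size_t nr_remaining_cpus)
{
    unsigned int available = 0;
    size_t i;

    for (i = 0; i < nr_remaining_cpus; i++)
        available += free_vectors[i];

    /* Refuse the CPU down request rather than orphan any vector. */
    return vectors_to_move > available ? -ENOSPC : 0;
}
```

Failing the request up front is what lets userspace see an error instead of a device that silently stops receiving interrupts.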
      
      v2: Do not need to look at percpu irqs
      v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
          Priority Mode
      v4: Additional changes suggested by Gong Chen.
      v5/v6/v7/v8: Updated comment text
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
      Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit da6139e4)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      2d8a1ddb
    • Vincent Stehlé's avatar
      usb: host: ohci-spear: fix ohci_dump parameters · 2df2ea47
      Vincent Stehlé authored
      Commit 6a04d05a ("USB: OHCI: fix bugs in debug routines") has removed
      the unused `verbose' argument of the debug function ohci_dump(); adapt
      ohci-spear accordingly.
      
      This fixes the following compilation error:
      
        drivers/usb/host/ohci-spear.c: In function ‘ohci_spear_start’:
        drivers/usb/host/ohci-spear.c:56:2: error: too many arguments to function ‘ohci_dump’
      Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      (cherry picked from commit d8804ba0)
      
      (cherry picked from commit HEAD)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      2df2ea47
    • Takashi Iwai's avatar
      ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co · 07093219
      Takashi Iwai authored
      ALC269 & co have many vendor-specific setups with COEF verbs.
      However, some verbs seem specific to some codec versions and they
      result in the codec stalling.  Typically, such a case can be avoided
      by checking the return value from reading a COEF.  If the return value
      is -1, it implies that the COEF is invalid, thus it shouldn't be
      written.
      
      This patch adds the invalid COEF checks in appropriate places
      accessing ALC269 and its variants.  The patch actually fixes the
      resume problem on Acer AO725 laptop.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=52181
      Tested-by: Francesco Muzio <muziofg@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit f3ee07d8)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      07093219
    • Jiri Kosina's avatar
      HID: logitech: perform bounds checking on device_id early enough · 240e4b05
      Jiri Kosina authored
      device_index is a char type and the size of paired_dj_devices is 7
      elements, therefore proper bounds checking has to be applied to
      device_index before it is used.
      
      We are currently performing the bounds checking in
      logi_dj_recv_add_djhid_device(), which is too late, as a malicious
      device could send REPORT_TYPE_NOTIF_DEVICE_UNPAIRED early enough and
      trigger the problem in one of the report forwarding functions called
      from logi_dj_raw_event().
      
      Fix this by performing the check at the earliest possible occasion in
      logi_dj_raw_event().
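The shape of such an early check can be sketched as follows. This is an illustration only: `struct demo_receiver`, `demo_device_index_valid()` and `demo_lookup()` are hypothetical, and the only facts taken from the text are the char-typed index and the 7-element array.

```c
#include <stddef.h>

#define DEMO_PAIRED_SLOTS 7 /* array size stated in the commit message */

/* Hypothetical receiver state: one slot per paired device. */
struct demo_receiver {
    void *paired[DEMO_PAIRED_SLOTS];
};

/*
 * Validate a device-supplied index before any use.  The value arrives
 * as a plain char, which may be signed, so it is converted to unsigned
 * char first: a negative value then becomes a large one and is rejected
 * by the same comparison.
 */
static int demo_device_index_valid(char device_index)
{
    unsigned char idx = (unsigned char)device_index;

    return idx < DEMO_PAIRED_SLOTS;
}

/* Entry-point style lookup: reject bad indexes before touching the array. */
static void *demo_lookup(const struct demo_receiver *rcv, char device_index)
{
    if (!demo_device_index_valid(device_index))
        return NULL;
    return rcv->paired[(unsigned char)device_index];
}
```

Doing the check once at the entry point means none of the forwarding helpers called afterwards can see an out-of-range index.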
      
      Cc: stable@vger.kernel.org
      Reported-by: Ben Hawkes <hawkes@google.com>
      Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit ad3e14d7)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      240e4b05
    • Bjorn Helgaas's avatar
      PCI: Add pci_upstream_bridge() · d496928f
      Bjorn Helgaas authored
      This adds a pci_upstream_bridge() interface to find the PCI-to-PCI bridge
      upstream from a device.  This is typically just "dev->bus->self", but in
      the case of a VF on a virtual bus, we have to start from the corresponding
      PF.  Returns NULL if there is no upstream PCI bridge, i.e., if the device
      is on a root bus.
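A toy model of the lookup described above, written as a sketch rather than the kernel implementation: the `demo_` structures are hypothetical and reduced to the three relationships the text mentions (a device's bus, a bus's bridge device, and a VF's corresponding PF).

```c
#include <stddef.h>

struct demo_pci_dev;

struct demo_pci_bus {
    struct demo_pci_bus *parent; /* NULL for a root bus */
    struct demo_pci_dev *self;   /* bridge device that leads to this bus */
};

struct demo_pci_dev {
    struct demo_pci_bus *bus;    /* bus the device sits on */
    struct demo_pci_dev *physfn; /* corresponding PF, non-NULL only for a VF */
};

/*
 * Find the bridge upstream from a device.  A VF may sit on a virtual
 * bus with no bridge of its own, so the search starts from its PF.
 * Returns NULL if the device is on a root bus.
 */
static struct demo_pci_dev *demo_upstream_bridge(struct demo_pci_dev *dev)
{
    if (dev->physfn)
        dev = dev->physfn;
    if (dev->bus->parent == NULL)
        return NULL;
    return dev->bus->self;
}
```

For an ordinary device below a bridge this reduces to `dev->bus->self`, which is the common case the message refers to.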
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      
      (cherry picked from commit c6bde215)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      d496928f
    • Lee, Chun-Yi's avatar
      PM / hibernate: avoid unsafe pages in e820 reserved regions · e59b4835
      Lee, Chun-Yi authored
      When a machine does not keep its e820 map persistent across
      hibernation, resuming may cause a page fault while writing the image
      to the snapshot buffer:
      
      [   17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000
      [   17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40
      [   17.933469] PGD 2194067 PUD 77ffff067 PMD 2197067 PTE 0
      [   17.933469] Oops: 0002 [#1] SMP
      ...
      
      The ffff880069d4f000 page is in e820 reserved region of resume boot
      kernel:
      
      [    0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved
      ...
      [    0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff]
      
      So snapshot.c marks the pfn in the forbidden pages map.  But this
      page is also in the memory bitmap of the snapshot image, because it
      is an original page used by the image kernel, so it is also marked as
      an unsafe (free) page in prepare_image().
      
      That means the page in the e820 region is marked both "forbidden" and
      "free" when resuming, which causes get_buffer() to treat it as an
      allocated unsafe page.  snapshot_write_next() then returns this page
      to load_image, and load_image writes content to this address even
      though the page was never really allocated.  So we get a page fault.
      
      Although the root cause is in the BIOS, I think an aggressive check
      and a meaningful message in the kernel are better than a page fault
      for issue tracking, especially when a serial console is unavailable.
      
      This patch adds code to mark_unsafe_pages() to check whether free
      pages fall in a nosave region.  If so, it prints a message and returns
      a fault to stop the whole S4 resume process:
      
      [    8.166004] PM: Image loading progress:   0%
      [    8.658717] PM: 0x6796c000 in e820 nosave region: [mem 0x6796c000-0x6796cfff]
      [    8.918737] PM: Read 2511940 kbytes in 1.04 seconds (2415.32 MB/s)
      [    8.926633] PM: Error -14 resuming
      [    8.933534] PM: Failed to load hibernation image, recovering.
      Reviewed-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
      [rjw: Subject]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      
      (cherry picked from commit 84c91b7a)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      e59b4835
    • Jiang Liu's avatar
      USB: core: hcd-pci: free IRQ before disabling PCI device when shutting down · 81fa1df0
      Jiang Liu authored
      The assigned IRQ should be freed before calling pci_disable_device()
      when shutting down system, otherwise it will cause following warning.
      [  568.879482] ------------[ cut here ]------------
      [  568.884236] WARNING: CPU: 1 PID: 3300 at /home/konrad/ssd/konrad/xtt-i386/bootstrap/linux-usb/fs/proc/generic.c:521 remove_proc_entry+0x165/0x170()
      [  568.897846] remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'ohci_hcd:usb4'
      [  568.907430] Modules linked in: dm_multipath dm_mod iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi libcrc32c crc32c_generic sg sd_mod crct10dif_generic crc_t10dif crct10dif_common radeon fbcon tileblit ttm font bitblit softcursor ata_generic ahci libahci drm_kms_helper skge r8169 libata mii scsi_mod wmi acpi_cpufreq
      [  568.938539] CPU: 1 PID: 3300 Comm: init Tainted: G        W     3.16.0-rc5upstream-01651-g03b9189 #1
      [  568.947946] Hardware name: ECS A780GM-A Ultra/A780GM-A Ultra, BIOS 080015  04/01/2010
      [  568.956008]  00000209 ed0f1cd0 c1617946 c175403c ed0f1d00 c1090c3f c1754084 ed0f1d2c
      [  568.964068]  00000ce4 c175403c 00000209 c11f22a5 c11f22a5 f755e8c0 ed0f1d78 f755e90d
      [  568.972128]  ed0f1d18 c1090cde 00000009 ed0f1d10 c1754084 ed0f1d2c ed0f1d60 c11f22a5
      [  568.980194] Call Trace:
      [  568.982715]  [<c1617946>] dump_stack+0x48/0x60
      [  568.987294]  [<c1090c3f>] warn_slowpath_common+0x7f/0xa0
      [  569.003887]  [<c1090cde>] warn_slowpath_fmt+0x2e/0x30
      [  569.009092]  [<c11f22a5>] remove_proc_entry+0x165/0x170
      [  569.014476]  [<c10da6ca>] unregister_irq_proc+0xaa/0xc0
      [  569.019858]  [<c10d582f>] free_desc+0x1f/0x60
      [  569.024346]  [<c10d58aa>] irq_free_descs+0x3a/0x80
      [  569.029283]  [<c10d9e9d>] irq_dispose_mapping+0x2d/0x50
      [  569.034666]  [<c1078fd3>] mp_unmap_irq+0x73/0xa0
      [  569.039423]  [<c107196b>] acpi_unregister_gsi_ioapic+0x2b/0x40
      [  569.045431]  [<c107180f>] acpi_unregister_gsi+0xf/0x20
      [  569.050725]  [<c1339cad>] acpi_pci_irq_disable+0x4b/0x50
      [  569.056196]  [<c14daa38>] pcibios_disable_device+0x18/0x20
      [  569.061848]  [<c130123d>] do_pci_disable_device+0x4d/0x60
      [  569.067410]  [<c13012b7>] pci_disable_device+0x47/0xb0
      [  569.077814]  [<c14800b1>] usb_hcd_pci_shutdown+0x31/0x40
      [  569.083285]  [<c1304b19>] pci_device_shutdown+0x19/0x50
      [  569.088667]  [<c13fda64>] device_shutdown+0x14/0x120
      [  569.093777]  [<c10ac29d>] kernel_restart_prepare+0x2d/0x30
      [  569.099429]  [<c10ac41e>] kernel_restart+0xe/0x60
      [  569.109028]  [<c10ac611>] SYSC_reboot+0x191/0x220
      [  569.159269]  [<c10ac6ba>] SyS_reboot+0x1a/0x20
      [  569.163843]  [<c161c718>] sysenter_do_call+0x12/0x16
      [  569.168951] ---[ end trace ccc1ec4471c289c9 ]---
      Tested-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Reviewed-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit c5946f9d)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      81fa1df0
    • James P Michels III's avatar
      usb-core bInterval quirk · 8854b6e8
      James P Michels III authored
      This patch adds a usb quirk to support devices with interrupt endpoints
      and bInterval values expressed as microframes. The quirk causes the
      parse endpoint function to modify the reported bInterval to a
      standards-conforming value.
      
      There is currently code in the endpoint parser that checks for
      bIntervals that are outside of the valid range (1-16 for USB 2+ high
      speed and super speed interrupt endpoints). In this case, the code assumes
      the bInterval is being reported in 1ms frames. As well, the correction
      is only applied if the original bInterval value is out of the 1-16 range.
      
      With this quirk applied to the device, the bInterval will be
      accurately adjusted from microframes to an exponent.
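For high-speed interrupt endpoints a conforming bInterval of e encodes a period of 2^(e-1) microframes, so a linear microframe count has to be turned into an exponent. The sketch below shows one way to do that; `demo_fls()` and `demo_uframes_to_binterval()` are hypothetical helpers, and rounding down to a power of two with a clamp to 1-16 is an assumption about how the adjustment could be done, not a statement of what the quirk code does.

```c
/* Position of the highest set bit, counting from 1; 0 for an input of 0. */
static int demo_fls(unsigned int x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Convert a period given as a linear count of microframes into the
 * exponent form, where an exponent e means 2^(e-1) microframes.
 * Counts that are not a power of two are rounded down, and the result
 * is clamped to the valid 1-16 range.
 */
static unsigned int demo_uframes_to_binterval(unsigned int uframes)
{
    int e = demo_fls(uframes);

    if (e < 1)
        e = 1;
    if (e > 16)
        e = 16;
    return (unsigned int)e;
}
```

For example, a device reporting 8 microframes ends up with an exponent of 4, which encodes the same 8-microframe period.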
      Signed-off-by: James P Michels III <james.p.michels@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit cd83ce9e)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      8854b6e8
    • Joonyoung Shim's avatar
      USB: add reset resume quirk for usb3503 · 665118b6
      Joonyoung Shim authored
      The usb device will autoresume from choose_wakeup() if it is
      autosuspended with the wrong wakeup setting, but the errors below occur
      because the usb3503 misc driver switches to standby mode when suspended.
      
      With USB_QUIRK_RESET_RESUME added, autosuspend_check() no longer sets
      the wrong wakeup.
      
      [    7.734717] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
      [    7.854658] usb 1-3: device descriptor read/64, error -71
      [    8.079657] usb 1-3: device descriptor read/64, error -71
      [    8.294664] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
      [    8.414658] usb 1-3: device descriptor read/64, error -71
      [    8.639657] usb 1-3: device descriptor read/64, error -71
      [    8.854667] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
      [    9.264598] usb 1-3: device not accepting address 3, error -71
      [    9.374655] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
      [    9.784601] usb 1-3: device not accepting address 3, error -71
      [    9.784838] usb usb1-port3: device 1-3 not suspended yet
      Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 526a4045)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      665118b6
    • Preston Fick's avatar
      USB: serial: cp210x: Removing unnecessary `usb_reset_device` on startup · 8a4be8ed
      Preston Fick authored
      This `usb_reset_device` command has been around since the driver was
      originally reverse engineered. It doesn't cause much issue on single
      interface CP210x devices, but on the CP2105 and CP2108 with 2 and 4
      interfaces respectively it will cause instability on enumeration and
      delays enumeration noticeably. There should be no reason to reset a device
      at startup, per the CP210x AN571 spec.
      Signed-off-by: Preston Fick <preston.fick@silabs.com>
      Cc: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 934ef5ac)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      8a4be8ed
    • Ales Novak's avatar
      drivers/rtc/interface.c: fix infinite loop in initializing the alarm · b35e915e
      Ales Novak authored
      In __rtc_read_alarm(), if the alarm time retrieved by
      rtc_read_alarm_internal() from the device contains invalid values (e.g.
      month=2,mday=31) and the year not set (=-1), the initialization will
      loop infinitely because the year-fixing loop expects the time being
      invalid due to leap year.
      
      The fix restricts the loop to the leap-year case and adds a final
      validity check.
      Signed-off-by: Ales Novak <alnovak@suse.cz>
      Acked-by: Alessandro Zummo <a.zummo@towertech.it>
      Reported-by: Jiri Bohac <jbohac@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit ee1d9014)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b35e915e
    • Lee, Chun-Yi's avatar
      drivers/rtc/rtc-efi.c: avoid subtracting day twice when computing year days · 1b67d5e9
      Lee, Chun-Yi authored
      Comparing the source of rtc-lib.c::rtc_year_days() with
      efirtc.c::rtc_year_days() shows that the code in rtc-efi decrements the
      day twice when computing year days.  rtc-lib.c::rtc_year_days() already
      decrements the day and returns year days from 0 to 365.
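The off-by-one is easy to see in a stand-alone sketch. The helpers below are hypothetical (`demo_` prefix) and only mirror the behaviour described above: a year-day function that takes a 1-based day and a 0-based month, subtracts one from the day itself, and returns 0 to 365.

```c
/* Days elapsed before the start of each month, for non-leap and leap years. */
static const int demo_ydays[2][12] = {
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 },
    { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 },
};

static int demo_is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Day of the year, 0-based: day is 1-based, month is 0-based. */
static int demo_year_days(int day, int month, int year)
{
    return demo_ydays[demo_is_leap_year(year)][month] + day - 1;
}

/* Caller that subtracts one from the day again, as in the bug described. */
static int demo_buggy_caller(int day, int month, int year)
{
    return demo_year_days(day - 1, month, year);
}

/* Caller that passes the 1-based day through unchanged. */
static int demo_fixed_caller(int day, int month, int year)
{
    return demo_year_days(day, month, year);
}
```

For 1 January the buggy caller produces -1 instead of 0, and every other date comes out one day early.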
      Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 809d9627)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      1b67d5e9
    • Benjamin Tisssoires's avatar
      HID: logitech-dj: Fix USB 3.0 issue · cded09f5
      Benjamin Tisssoires authored
      This fix (not very clean though) should fix the long-standing USB3
      issue that was spotted last year. The rationale has been given by
      Hans de Goede:
      
       ----
      
      I think the most likely cause for this is a firmware bug
      in the unifying receiver, likely a race condition.
      
      The most prominent difference between having a USB-2 device
      plugged into an EHCI (so USB-2 only) port versus an XHCI
      port will be inter packet timing. Specifically if you
      send packets (ie hid reports) one at a time, then with
      the EHCI controller there will be a significant pause
      between them, where with XHCI they will be very close
      together in time.
      
      The reason for this is the difference in EHCI / XHCI
      controller OS <-> driver interfaces.
      
      For non periodic endpoints (control, bulk) the EHCI uses a
      circular linked-list of commands in dma-memory, which it
      follows to execute commands, if the list is empty, it
      will go into an idle state and re-check periodically.
      
      The XHCI uses a ring of commands per endpoint, and if the OS
      places anything new on the ring it will do an ioport write,
      waking up the XHCI making it send the new packet immediately.
      
      For periodic transfers (isoc, interrupt) the delay between
      packets when sending one at a time (rather than queuing them
      up) will be even larger, because they need to be inserted into
      the EHCI schedule 2 ms in the future so the OS driver can be
      sure that the EHCI driver does not try to start executing the
      time slot in question before the insertion has completed.
      
      So a possible fix may be to insert a delay between packets
      being sent to the receiver.
      
       ----
      
      I tested this on a buggy Haswell USB 3.0 motherboard, and I always
      get the notification after adding the msleep.
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit 42c22dbf)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      cded09f5