1. 10 Aug, 2016 14 commits
    • David Rientjes's avatar
      mm, compaction: prevent VM_BUG_ON when terminating freeing scanner · 1a30fc84
      David Rientjes authored
      commit a46cbf3b upstream.
      
      It's possible to isolate some freepages in a pageblock and then fail
      split_free_page() due to the low watermark check.  In this case, we hit
      VM_BUG_ON() because the freeing scanner terminated early without a
      contended lock or enough freepages.
      
      This should never have been a VM_BUG_ON() since it's not a fatal
      condition.  It should have been a VM_WARN_ON() at best, or even handled
      gracefully.
      
      Regardless, we need to terminate anytime the full pageblock scan was not
      done.  The logic belongs in isolate_freepages_block(), so handle its
      state gracefully by terminating the pageblock loop and making a note to
      restart at the same pageblock next time since it was not possible to
      complete the scan this time.
      
      [rientjes@google.com: don't rescan pages in a pageblock]
        Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1607111244150.83138@chino.kir.corp.google.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606291436300.145590@chino.kir.corp.google.comSigned-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarMinchan Kim <minchan@kernel.org>
      Tested-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a30fc84
    • Torsten Hilbrich's avatar
      fs/nilfs2: fix potential underflow in call to crc32_le · bb01babb
      Torsten Hilbrich authored
      commit 63d2f95d upstream.
      
      The value `bytes' comes from the filesystem which is about to be
      mounted.  We cannot trust that the value is always in the range we
      expect it to be.
      
      Check its value before using it to calculate the length for the crc32_le
      call.  It value must be larger (or equal) sumoff + 4.
      
      This fixes a kernel bug when accidentially mounting an image file which
      had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
      The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
      s_bytes value of 1.  This caused an underflow when substracting sumoff +
      4 (20) in the call to crc32_le.
      
        BUG: unable to handle kernel paging request at ffff88021e600000
        IP:  crc32_le+0x36/0x100
        ...
        Call Trace:
          nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
          nilfs_load_super_block+0x142/0x300 [nilfs2]
          init_nilfs+0x60/0x390 [nilfs2]
          nilfs_mount+0x302/0x520 [nilfs2]
          mount_fs+0x38/0x160
          vfs_kern_mount+0x67/0x110
          do_mount+0x269/0xe00
          SyS_mount+0x9f/0x100
          entry_SYSCALL_64_fastpath+0x16/0x71
      
      Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jpSigned-off-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb01babb
    • David Rientjes's avatar
      mm, compaction: abort free scanner if split fails · 8f40a441
      David Rientjes authored
      commit a4f04f2c upstream.
      
      If the memory compaction free scanner cannot successfully split a free
      page (only possible due to per-zone low watermark), terminate the free
      scanner rather than continuing to scan memory needlessly.  If the
      watermark is insufficient for a free page of order <= cc->order, then
      terminate the scanner since all future splits will also likely fail.
      
      This prevents the compaction freeing scanner from scanning all memory on
      very large zones (very noticeable for zones > 128GB, for instance) when
      all splits will likely fail while holding zone->lock.
      
      compaction_alloc() iterating a 128GB zone has been benchmarked to take
      over 400ms on some systems whereas any free page isolated and ready to
      be split ends up failing in split_free_page() because of the low
      watermark check and thus the iteration continues.
      
      The next time compaction occurs, the freeing scanner will likely start
      at the end of the zone again since no success was made previously and we
      get the same lengthy iteration until the zone is brought above the low
      watermark.  All thp page faults can take >400ms in such a state without
      this fix.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.comSigned-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f40a441
    • Lukasz Odzioba's avatar
      mm/swap.c: flush lru pvecs on compound page arrival · af809c0e
      Lukasz Odzioba authored
      commit 8f182270 upstream.
      
      Currently we can have compound pages held on per cpu pagevecs, which
      leads to a lot of memory unavailable for reclaim when needed.  In the
      systems with hundreads of processors it can be GBs of memory.
      
      On of the way of reproducing the problem is to not call munmap
      explicitly on all mapped regions (i.e.  after receiving SIGTERM).  After
      that some pages (with THP enabled also huge pages) may end up on
      lru_add_pvec, example below.
      
        void main() {
        #pragma omp parallel
        {
      	size_t size = 55 * 1000 * 1000; // smaller than  MEM/CPUS
      	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
      		MAP_PRIVATE | MAP_ANONYMOUS , -1, 0);
      	if (p != MAP_FAILED)
      		memset(p, 0, size);
      	//munmap(p, size); // uncomment to make the problem go away
        }
        }
      
      When we run it with THP enabled it will leave significant amount of
      memory on lru_add_pvec.  This memory will be not reclaimed if we hit
      OOM, so when we run above program in a loop:
      
      	for i in `seq 100`; do ./a.out; done
      
      many processes (95% in my case) will be killed by OOM.
      
      The primary point of the LRU add cache is to save the zone lru_lock
      contention with a hope that more pages will belong to the same zone and
      so their addition can be batched.  The huge page is already a form of
      batched addition (it will add 512 worth of memory in one go) so skipping
      the batching seems like a safer option when compared to a potential
      excess in the caching which can be quite large and much harder to fix
      because lru_add_drain_all is way to expensive and it is not really clear
      what would be a good moment to call it.
      
      Similarly we can reproduce the problem on lru_deactivate_pvec by adding:
      madvise(p, size, MADV_FREE); after memset.
      
      This patch flushes lru pvecs on compound page arrival making the problem
      less severe - after applying it kill rate of above example drops to 0%,
      due to reducing maximum amount of memory held on pvec from 28MB (with
      THP) to 56kB per CPU.
      Suggested-by: default avatarMichal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@intel.comSigned-off-by: default avatarLukasz Odzioba <lukasz.odzioba@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Ming Li <mingli199x@qq.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af809c0e
    • Tejun Heo's avatar
      memcg: css_alloc should return an ERR_PTR value on error · b31a27b8
      Tejun Heo authored
      commit ea3a9645 upstream.
      
      mem_cgroup_css_alloc() was returning NULL on failure while cgroup core
      expected it to return an ERR_PTR value leading to the following NULL
      deref after a css allocation failure.  Fix it by return
      ERR_PTR(-ENOMEM) instead.  I'll also update cgroup core so that it
      can handle NULL returns.
      
        mkdir: page allocation failure: order:6, mode:0x240c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO)
        CPU: 0 PID: 8738 Comm: mkdir Not tainted 4.7.0-rc3+ #123
        ...
        Call Trace:
          dump_stack+0x68/0xa1
          warn_alloc_failed+0xd6/0x130
          __alloc_pages_nodemask+0x4c6/0xf20
          alloc_pages_current+0x66/0xe0
          alloc_kmem_pages+0x14/0x80
          kmalloc_order_trace+0x2a/0x1a0
          __kmalloc+0x291/0x310
          memcg_update_all_caches+0x6c/0x130
          mem_cgroup_css_alloc+0x590/0x610
          cgroup_apply_control_enable+0x18b/0x370
          cgroup_mkdir+0x1de/0x2e0
          kernfs_iop_mkdir+0x55/0x80
          vfs_mkdir+0xb9/0x150
          SyS_mkdir+0x66/0xd0
          do_syscall_64+0x53/0x120
          entry_SYSCALL64_slow_path+0x25/0x25
        ...
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
        IP:  init_and_link_css+0x37/0x220
        PGD 34b1e067 PUD 3a109067 PMD 0
        Oops: 0002 [#1] SMP
        Modules linked in:
        CPU: 0 PID: 8738 Comm: mkdir Not tainted 4.7.0-rc3+ #123
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.2-20160422_131301-anatol 04/01/2014
        task: ffff88007cbc5200 ti: ffff8800666d4000 task.ti: ffff8800666d4000
        RIP: 0010:[<ffffffff810f2ca7>]  [<ffffffff810f2ca7>] init_and_link_css+0x37/0x220
        RSP: 0018:ffff8800666d7d90  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: ffffffff810f2499 RSI: 0000000000000000 RDI: 0000000000000008
        RBP: ffff8800666d7db8 R08: 0000000000000003 R09: 0000000000000000
        R10: 0000000000000001 R11: 0000000000000000 R12: ffff88005a5fb400
        R13: ffffffff81f0f8a0 R14: ffff88005a5fb400 R15: 0000000000000010
        FS:  00007fc944689700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f3aed0d2b80 CR3: 000000003a1e8000 CR4: 00000000000006f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
          cgroup_apply_control_enable+0x1ac/0x370
          cgroup_mkdir+0x1de/0x2e0
          kernfs_iop_mkdir+0x55/0x80
          vfs_mkdir+0xb9/0x150
          SyS_mkdir+0x66/0xd0
          do_syscall_64+0x53/0x120
          entry_SYSCALL64_slow_path+0x25/0x25
        Code: 89 f5 48 89 fb 49 89 d4 48 83 ec 08 8b 05 72 3b d8 00 85 c0 0f 85 60 01 00 00 4c 89 e7 e8 72 f7 ff ff 48 8d 7b 08 48 89 d9 31 c0 <48> c7 83 d0 00 00 00 00 00 00 00 48 83 e7 f8 48 29 f9 81 c1 d8
        RIP   init_and_link_css+0x37/0x220
         RSP <ffff8800666d7d90>
        CR2: 00000000000000d0
        ---[ end trace a2d8836ae1e852d1 ]---
      
      Link: http://lkml.kernel.org/r/20160621165740.GJ3262@mtj.duckdns.orgSigned-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b31a27b8
    • Tejun Heo's avatar
      memcg: mem_cgroup_migrate() may be called with irq disabled · 5acd89d3
      Tejun Heo authored
      commit d93c4130 upstream.
      
      mem_cgroup_migrate() uses local_irq_disable/enable() but can be called
      with irq disabled from migrate_page_copy().  This ends up enabling irq
      while holding a irq context lock triggering the following lockdep
      warning.  Fix it by using irq_save/restore instead.
      
        =================================
        [ INFO: inconsistent lock state ]
        4.7.0-rc1+ #52 Tainted: G        W
        ---------------------------------
        inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
        kcompactd0/151 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (&(&ctx->completion_lock)->rlock){+.?.-.}, at: [<000000000038fd96>] aio_migratepage+0x156/0x1e8
        {IN-SOFTIRQ-W} state was registered at:
           __lock_acquire+0x5b6/0x1930
           lock_acquire+0xee/0x270
           _raw_spin_lock_irqsave+0x66/0xb0
           aio_complete+0x98/0x328
           dio_complete+0xe4/0x1e0
           blk_update_request+0xd4/0x450
           scsi_end_request+0x48/0x1c8
           scsi_io_completion+0x272/0x698
           blk_done_softirq+0xca/0xe8
           __do_softirq+0xc8/0x518
           irq_exit+0xee/0x110
           do_IRQ+0x6a/0x88
           io_int_handler+0x11a/0x25c
           __mutex_unlock_slowpath+0x144/0x1d8
           __mutex_unlock_slowpath+0x140/0x1d8
           kernfs_iop_permission+0x64/0x80
           __inode_permission+0x9e/0xf0
           link_path_walk+0x6e/0x510
           path_lookupat+0xc4/0x1a8
           filename_lookup+0x9c/0x160
           user_path_at_empty+0x5c/0x70
           SyS_readlinkat+0x68/0x140
           system_call+0xd6/0x270
        irq event stamp: 971410
        hardirqs last  enabled at (971409):  migrate_page_move_mapping+0x3ea/0x588
        hardirqs last disabled at (971410):  _raw_spin_lock_irqsave+0x3c/0xb0
        softirqs last  enabled at (970526):  __do_softirq+0x460/0x518
        softirqs last disabled at (970519):  irq_exit+0xee/0x110
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
      	 CPU0
      	 ----
          lock(&(&ctx->completion_lock)->rlock);
          <Interrupt>
            lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
        3 locks held by kcompactd0/151:
         #0:  (&(&mapping->private_lock)->rlock){+.+.-.}, at:  aio_migratepage+0x42/0x1e8
         #1:  (&ctx->ring_lock){+.+.+.}, at:  aio_migratepage+0x5a/0x1e8
         #2:  (&(&ctx->completion_lock)->rlock){+.?.-.}, at:  aio_migratepage+0x156/0x1e8
      
        stack backtrace:
        CPU: 20 PID: 151 Comm: kcompactd0 Tainted: G        W       4.7.0-rc1+ #52
        Call Trace:
          show_trace+0xea/0xf0
          show_stack+0x72/0xf0
          dump_stack+0x9a/0xd8
          print_usage_bug.part.27+0x2d4/0x2e8
          mark_lock+0x17e/0x758
          mark_held_locks+0xa2/0xd0
          trace_hardirqs_on_caller+0x140/0x1c0
          mem_cgroup_migrate+0x266/0x370
          aio_migratepage+0x16a/0x1e8
          move_to_new_page+0xb0/0x260
          migrate_pages+0x8f4/0x9f0
          compact_zone+0x4dc/0xdc8
          kcompactd_do_work+0x1aa/0x358
          kcompactd+0xba/0x2c8
          kthread+0x10a/0x110
          kernel_thread_starter+0x6/0xc
          kernel_thread_starter+0x0/0xc
        INFO: lockdep is turned off.
      
      Link: http://lkml.kernel.org/r/20160620184158.GO3262@mtj.duckdns.org
      Link: http://lkml.kernel.org/g/5767CFE5.7080904@de.ibm.com
      Fixes: 74485cf2 ("mm: migrate: consolidate mem_cgroup_migrate() calls")
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5acd89d3
    • Mel Gorman's avatar
      mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask · 521fe1d2
      Mel Gorman authored
      commit e838a45f upstream.
      
      Commit d0164adc ("mm, page_alloc: distinguish between being unable
      to sleep, unwilling to sleep and avoiding waking kswapd") modified
      __GFP_WAIT to explicitly identify the difference between atomic callers
      and those that were unwilling to sleep.  Later the definition was
      removed entirely.
      
      The GFP_RECLAIM_MASK is the set of flags that affect watermark checking
      and reclaim behaviour but __GFP_ATOMIC was never added.  Without it,
      atomic users of the slab allocator strip the __GFP_ATOMIC flag and
      cannot access the page allocator atomic reserves.  This patch addresses
      the problem.
      
      The user-visible impact depends on the workload but potentially atomic
      allocations unnecessarily fail without this path.
      
      Link: http://lkml.kernel.org/r/20160610093832.GK2527@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      521fe1d2
    • Ludovic Desroches's avatar
      dmaengine: at_xdmac: double FIFO flush needed to compute residue · f883b924
      Ludovic Desroches authored
      commit 9295c41d upstream.
      
      Due to the way CUBC register is updated, a double flush is needed to
      compute an accurate residue. First flush aim is to get data from the DMA
      FIFO and second one ensures that we won't report data which are not in
      memory.
      Signed-off-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel
      eXtended DMA Controller driver")
      Reviewed-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f883b924
    • Ludovic Desroches's avatar
      dmaengine: at_xdmac: fix residue corruption · d1a41160
      Ludovic Desroches authored
      commit 53398f48 upstream.
      
      An unexpected value of CUBC can lead to a corrupted residue. A more
      complex sequence is needed to detect an inaccurate value for NCA or CUBC.
      Signed-off-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel
      eXtended DMA Controller driver")
      Reviewed-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1a41160
    • Ludovic Desroches's avatar
      dmaengine: at_xdmac: align descriptors on 64 bits · de462956
      Ludovic Desroches authored
      commit 4a9723e8 upstream.
      
      Having descriptors aligned on 64 bits allows update CNDA and CUBC in an
      atomic way.
      Signed-off-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel
      eXtended DMA Controller driver")
      Reviewed-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de462956
    • Lukas Wunner's avatar
      x86/quirks: Add early quirk to reset Apple AirPort card · 980d99cd
      Lukas Wunner authored
      commit abb2bafd upstream.
      
      The EFI firmware on Macs contains a full-fledged network stack for
      downloading OS X images from osrecovery.apple.com. Unfortunately
      on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
      wireless card on every boot and leaves it enabled even after
      ExitBootServices has been called. The card continues to assert its IRQ
      line, causing spurious interrupts if the IRQ is shared. It also corrupts
      memory by DMAing received packets, allowing for remote code execution
      over the air. This only stops when a driver is loaded for the wireless
      card, which may be never if the driver is not installed or blacklisted.
      
      The issue seems to be constrained to the Broadcom 4331. Chris Milsted
      has verified that the newer Broadcom 4360 built into the MacBookPro11,3
      (2013/2014) does not exhibit this behaviour. The chances that Apple will
      ever supply a firmware fix for the older machines appear to be zero.
      
      The solution is to reset the card on boot by writing to a reset bit in
      its mmio space. This must be done as an early quirk and not as a plain
      vanilla PCI quirk to successfully combat memory corruption by DMAed
      packets: Matthew Garrett found out in 2012 that the packets are written
      to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
      This type of memory is made available to the page allocator by
      efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
      subsys initcall level. In-between a time window would be open for memory
      corruption. Random crashes occurring in this time window and attributed
      to DMAed packets have indeed been observed in the wild by Chris
      Bainbridge.
      
      When Matthew Garrett analyzed the memory corruption issue in 2012, he
      sought to fix it with a grub quirk which transitions the card to D3hot:
      http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
      
      This approach does not help users with other bootloaders and while it
      may prevent DMAed packets, it does not cure the spurious interrupts
      emanating from the card. Unfortunately the card's mmio space is
      inaccessible in D3hot, so to reset it, we have to undo the effect of
      Matthew's grub patch and transition the card back to D0.
      
      Note that the quirk takes a few shortcuts to reduce the amount of code:
      The size of BAR 0 and the location of the PM capability is identical
      on all affected machines and therefore hardcoded. Only the address of
      BAR 0 differs between models. Also, it is assumed that the BCMA core
      currently mapped is the 802.11 core. The EFI driver seems to always take
      care of this.
      
      Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
      towards finding the best solution to this problem.
      
      The following should be a comprehensive list of affected models:
          iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
          iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
          Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
          Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]
      
      For posterity, spurious interrupts caused by the Broadcom 4331 wireless
      card resulted in splats like this (stacktrace omitted):
      
          irq 17: nobody cared (try booting with the "irqpoll" option)
          handlers:
          [<ffffffff81374370>] pcie_isr
          [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
          [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
          Disabling IRQ #17
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
      Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
      Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
      Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
      Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
      Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Acked-by: default avatarRafał Miłecki <zajec5@gmail.com>
      Acked-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris Milsted <cmilsted@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Michael Buesch <m@bues.ch>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: b43-dev@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
      [ Did minor readability edits. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      980d99cd
    • Lukas Wunner's avatar
      x86/quirks: Reintroduce scanning of secondary buses · 7013ef90
      Lukas Wunner authored
      commit 850c3210 upstream.
      
      We used to scan secondary buses until the following commit that
      was applied in 2009:
      
        8659c406 ("x86: only scan the root bus in early PCI quirks")
      
      which commit constrained early quirks to the root bus only. Its
      motivation was to prevent application of the nvidia_bugs quirk
      on secondary buses.
      
      We're about to add a quirk to reset the Broadcom 4331 wireless card on
      2011/2012 Macs, which is located on a secondary bus behind a PCIe root
      port. To facilitate that, reintroduce scanning of secondary buses.
      
      The commit message of 8659c406 notes that scanning only the root bus
      "saves quite some unnecessary scanning work". The algorithm used prior
      to 8659c406 was particularly time consuming because it scanned
      buses 0 to 31 brute force. To avoid lengthening boot time, employ a
      recursive strategy which only scans buses that are actually reachable
      from the root bus.
      
      Yinghai Lu pointed out that the secondary bus number read from a
      bridge's config space may be invalid, in particular a value of 0 would
      cause an infinite loop. The PCI core goes beyond that and recurses to a
      child bus only if its bus number is greater than the parent bus number
      (see pci_scan_bridge()). Since the root bus is numbered 0, this implies
      that secondary buses may not be 0. Do the same on early scanning.
      
      If this algorithm is found to significantly impact boot time or cause
      infinite loops on broken hardware, it would be possible to limit its
      recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
      at depth 0, so the bus need not be scanned deeper than that for now. An
      alternative approach would be to revert to scanning only the root bus,
      and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
      and 8086:1e16. Apple always positioned the card behind either of these
      three ports. The quirk would then check presence of the card in slot 0
      below the root port and do its deed.
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-pci@vger.kernel.org
      Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7013ef90
    • Lukas Wunner's avatar
      x86/quirks: Apply nvidia_bugs quirk only on root bus · 9c170009
      Lukas Wunner authored
      commit 447d29d1 upstream.
      
      Since the following commit:
      
        8659c406 ("x86: only scan the root bus in early PCI quirks")
      
      ... early quirks are only applied to devices on the root bus.
      
      The motivation was to prevent application of the nvidia_bugs quirk on
      secondary buses.
      
      We're about to reintroduce scanning of secondary buses for a quirk to
      reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
      regressions, open code the requirement to apply nvidia_bugs only on the
      root bus.
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c170009
    • Michał Pecio's avatar
      USB: OHCI: Don't mark EDs as ED_OPER if scheduling fails · e043a37e
      Michał Pecio authored
      commit c66f59ee upstream.
      
      Since ed_schedule begins with marking the ED as "operational",
      the ED may be left in such state even if scheduling actually
      fails.
      
      This allows future submission attempts to smuggle this ED to the
      hardware behind the scheduler's back and without linking it to
      the ohci->eds_in_use list.
      
      The former causes bandwidth saturation and data loss on isoc
      endpoints, the latter crashes the kernel when attempt is made
      to unlink such ED from this list.
      
      Fix ed_schedule to update ED state only on successful return.
      Signed-off-by: default avatarMichal Pecio <michal.pecio@gmail.com>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e043a37e
  2. 27 Jul, 2016 26 commits