1. 29 Apr, 2018 7 commits
  2. 24 Apr, 2018 33 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.129 · 8e2def05
      Greg Kroah-Hartman authored
      8e2def05
    • Greg Thelen's avatar
      writeback: safer lock nesting · 6f051f89
      Greg Thelen authored
      commit 2e898e4c upstream.
      
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If both inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      Wang Long said "this deadlock occurs three times in our environment"
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Reported-by: default avatarWang Long <wanglong19@meituan.com>
      Acked-by: default avatarWang Long <wanglong19@meituan.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [natechancellor: Applied to 4.4 based on Greg's backport on lkml.org]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f051f89
    • Amir Goldstein's avatar
      fanotify: fix logic of events on child · 87d7ccbf
      Amir Goldstein authored
      commit 54a307ba upstream.
      
      When event on child inodes are sent to the parent inode mark and
      parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
      will not be delivered to the listener process. However, if the same
      process also has a mount mark, the event to the parent inode will be
      delivered regadless of the mount mark mask.
      
      This behavior is incorrect in the case where the mount mark mask does
      not contain the specific event type. For example, the process adds
      a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
      and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).
      
      A modify event on a file inside that directory (and inside that mount)
      should not create a FAN_MODIFY event, because neither of the marks
      requested to get that event on the file.
      
      Fixes: 1968f5ee ("fanotify: use both marks when possible")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      [natechancellor: Fix small conflict due to lack of 3cd5eca8]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87d7ccbf
    • wangguang's avatar
      ext4: bugfix for mmaped pages in mpage_release_unused_pages() · a529f29a
      wangguang authored
      commit 4e800c03 upstream.
      
      Pages clear buffers after ext4 delayed block allocation failed,
      However, it does not clean its pte_dirty flag.
      if the pages unmap ,in cording to the pte_dirty ,
      unmap_page_range may try to call __set_page_dirty,
      
      which may lead to the bugon at
      mpage_prepare_extent_to_map:head = page_buffers(page);.
      
      This patch just call clear_page_dirty_for_io to clean pte_dirty
      at mpage_release_unused_pages for pages mmaped.
      
      Steps to reproduce the bug:
      
      (1) mmap a file in ext4
      	addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
      	       	            fd, 0);
      	memset(addr, 'i', 4096);
      
      (2) return EIO at
      
      	ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent
      
      which causes this log message to be print:
      
                      ext4_msg(sb, KERN_CRIT,
                              "Delayed block allocation failed for "
                              "inode %lu at logical offset %llu with"
                              " max blocks %u with error %d",
                              inode->i_ino,
                              (unsigned long long)map->m_lblk,
                              (unsigned)map->m_len, -err);
      
      (3)Unmap the addr cause warning at
      
      	__set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page));
      
      (4) wait for a minute,then bugon happen.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarwangguang <wangguang03@zte.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [@nathanchance: Resolved conflict from lack of 09cbfeaf]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a529f29a
    • Matthew Wilcox's avatar
      mm/filemap.c: fix NULL pointer in page_cache_tree_insert() · d47a5ca3
      Matthew Wilcox authored
      commit abc1be13 upstream.
      
      f2fs specifies the __GFP_ZERO flag for allocating some of its pages.
      Unfortunately, the page cache also uses the mapping's GFP flags for
      allocating radix tree nodes.  It always masked off the __GFP_HIGHMEM
      flag, and masks off __GFP_ZERO in some paths, but not all.  That causes
      radix tree nodes to be allocated with a NULL list_head, which causes
      backtraces like:
      
        __list_del_entry+0x30/0xd0
        list_lru_del+0xac/0x1ac
        page_cache_tree_insert+0xd8/0x110
      
      The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through
      if they are ever used.  Fix them all by using GFP_RECLAIM_MASK at the
      innermost location, and remove it from earlier in the callchain.
      
      Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Reported-by: default avatarChris Fries <cfries@google.com>
      Debugged-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d47a5ca3
    • Michal Hocko's avatar
      mm: allow GFP_{FS,IO} for page_cache_read page cache allocation · 820ca577
      Michal Hocko authored
      commit c20cd45e upstream.
      
      page_cache_read has been historically using page_cache_alloc_cold to
      allocate a new page.  This means that mapping_gfp_mask is used as the
      base for the gfp_mask.  Many filesystems are setting this mask to
      GFP_NOFS to prevent from fs recursion issues.  page_cache_read is called
      from the vm_operations_struct::fault() context during the page fault.
      This context doesn't need the reclaim protection normally.
      
      ceph and ocfs2 which call filemap_fault from their fault handlers seem
      to be OK because they are not taking any fs lock before invoking generic
      implementation.  xfs which takes XFS_MMAPLOCK_SHARED is safe from the
      reclaim recursion POV because this lock serializes truncate and punch
      hole with the page faults and it doesn't get involved in the reclaim.
      
      There is simply no reason to deliberately use a weaker allocation
      context when a __GFP_FS | __GFP_IO can be used.  The GFP_NOFS protection
      might be even harmful.  There is a push to fail GFP_NOFS allocations
      rather than loop within allocator indefinitely with a very limited
      reclaim ability.  Once we start failing those requests the OOM killer
      might be triggered prematurely because the page cache allocation failure
      is propagated up the page fault path and end up in
      pagefault_out_of_memory.
      
      We cannot play with mapping_gfp_mask directly because that would be racy
      wrt.  parallel page faults and it might interfere with other users who
      really rely on NOFS semantic from the stored gfp_mask.  The mask is also
      inode proper so it would even be a layering violation.  What we can do
      instead is to push the gfp_mask into struct vm_fault and allow fs layer
      to overwrite it should the callback need to be called with a different
      allocation context.
      
      Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
      because this should be safe from the page fault path normally.  Why do
      we care about mapping_gfp_mask at all then? Because this doesn't hold
      only reclaim protection flags but it also might contain zone and
      movability restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have
      to respect those.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarJan Kara <jack@suse.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      820ca577
    • Ian Kent's avatar
      autofs: mount point create should honour passed in mode · ce98dd37
      Ian Kent authored
      commit 1e630665 upstream.
      
      The autofs file system mkdir inode operation blindly sets the created
      directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
      cause selinux dac_override denials.
      
      But the function also checks if the caller is the daemon (as no-one else
      should be able to do anything here) so there's no point in not honouring
      the passed in mode, allowing the daemon to set appropriate mode when
      required.
      
      Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.netSigned-off-by: default avatarIan Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce98dd37
    • Al Viro's avatar
      Don't leak MNT_INTERNAL away from internal mounts · d10a274a
      Al Viro authored
      commit 16a34adb upstream.
      
      We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
      their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
      somewhere in a new namespace and exiting yields a stack overflow.
      
      Cc: stable@kernel.org
      Reported-by: default avatarAlexander Aring <aring@mojatatu.com>
      Bisected-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: default avatarAlexander Aring <aring@mojatatu.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d10a274a
    • Al Viro's avatar
      rpc_pipefs: fix double-dput() · 20e96d90
      Al Viro authored
      commit 4a3877c4 upstream.
      
      if we ever hit rpc_gssd_dummy_depopulate() dentry passed to
      it has refcount equal to 1.  __rpc_rmpipe() drops it and
      dput() done after that hits an already freed dentry.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20e96d90
    • Al Viro's avatar
      hypfs_kill_super(): deal with failed allocations · 873b214b
      Al Viro authored
      commit a24cd490 upstream.
      
      hypfs_fill_super() might fail to allocate sbi; hypfs_kill_super()
      should not oops on that.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      873b214b
    • Al Viro's avatar
      jffs2_kill_sb(): deal with failed allocations · 2154ecea
      Al Viro authored
      commit c66b23c2 upstream.
      
      jffs2_fill_super() might fail to allocate jffs2_sb_info;
      jffs2_kill_sb() must survive that.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2154ecea
    • Michael Ellerman's avatar
      powerpc/lib: Fix off-by-one in alternate feature patching · 263b8d4e
      Michael Ellerman authored
      commit b8858581 upstream.
      
      When we patch an alternate feature section, we have to adjust any
      relative branches that branch out of the alternate section.
      
      But currently we have a bug if we have a branch that points to past
      the last instruction of the alternate section, eg:
      
        FTR_SECTION_ELSE
        1:     b       2f
               or      6,6,6
        2:
        ALT_FTR_SECTION_END(...)
               nop
      
      This will result in a relative branch at 1 with a target that equals
      the end of the alternate section.
      
      That branch does not need adjusting when it's moved to the non-else
      location. Currently we do adjust it, resulting in a branch that goes
      off into the link-time location of the else section, which is junk.
      
      The fix is to not patch branches that have a target == end of the
      alternate section.
      
      Fixes: d20fe50a ("KVM: PPC: Book3S HV: Branch inside feature section")
      Fixes: 9b1a735d ("powerpc: Add logic to patch alternative feature sections")
      Cc: stable@vger.kernel.org # v2.6.27+
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      263b8d4e
    • Michael Neuling's avatar
      powerpc/eeh: Fix enabling bridge MMIO windows · 286427ed
      Michael Neuling authored
      commit 13a83eac upstream.
      
      On boot we save the configuration space of PCIe bridges. We do this so
      when we get an EEH event and everything gets reset that we can restore
      them.
      
      Unfortunately we save this state before we've enabled the MMIO space
      on the bridges. Hence if we have to reset the bridge when we come back
      MMIO is not enabled and we end up taking an PE freeze when the driver
      starts accessing again.
      
      This patch forces the memory/MMIO and bus mastering on when restoring
      bridges on EEH. Ideally we'd do this correctly by saving the
      configuration space writes later, but that will have to come later in
      a larger EEH rewrite. For now we have this simple fix.
      
      The original bug can be triggered on a boston machine by doing:
        echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
      On boston, this PHB has a PCIe switch on it.  Without this patch,
      you'll see two EEH events, 1 expected and 1 the failure we are fixing
      here. The second EEH event causes the anything under the PHB to
      disappear (i.e. the i40e eth).
      
      With this patch, only 1 EEH event occurs and devices properly recover.
      
      Fixes: 652defed ("powerpc/eeh: Check PCIe link after reset")
      Cc: stable@vger.kernel.org # v3.11+
      Reported-by: default avatarPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Acked-by: default avatarRussell Currey <ruscur@russell.cc>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      286427ed
    • Matt Redfearn's avatar
      MIPS: memset.S: Fix clobber of v1 in last_fixup · d37aca47
      Matt Redfearn authored
      commit c96eebf0 upstream.
      
      The label .Llast_fixup\@ is jumped to on page fault within the final
      byte set loop of memset (on < MIPSR6 architectures). For some reason, in
      this fault handler, the v1 register is randomly set to a2 & STORMASK.
      This clobbers v1 for the calling function. This can be observed with the
      following test code:
      
      static int __init __attribute__((optimize("O0"))) test_clear_user(void)
      {
        register int t asm("v1");
        char *test;
        int j, k;
      
        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);
      
        for (j = 256; j < 512; j++) {
          t = 0xa5a5a5a5;
          if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
              pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, k);
          }
          if (t != 0xa5a5a5a5) {
             pr_err("v1 was clobbered to 0x%x!\n", t);
          }
        }
      
        return 0;
      }
      late_initcall(test_clear_user);
      
      Which demonstrates that v1 is indeed clobbered (MIPS64):
      
      Testing clear_user
      v1 was clobbered to 0x1!
      v1 was clobbered to 0x2!
      v1 was clobbered to 0x3!
      v1 was clobbered to 0x4!
      v1 was clobbered to 0x5!
      v1 was clobbered to 0x6!
      v1 was clobbered to 0x7!
      
      Since the number of bytes that could not be set is already contained in
      a2, the andi placing a value in v1 is not necessary and actively
      harmful in clobbering v1.
      Reported-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19109/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d37aca47
    • Matt Redfearn's avatar
      MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup · af878d51
      Matt Redfearn authored
      commit daf70d89 upstream.
      
      The __clear_user function is defined to return the number of bytes that
      could not be cleared. From the underlying memset / bzero implementation
      this means setting register a2 to that number on return. Currently if a
      page fault is triggered within the memset_partial block, the value
      loaded into a2 on return is meaningless.
      
      The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
      out how many bytes failed to copy, the exception handler should find how
      many bytes left in the partial block (andi a2, STORMASK), add that to
      the partial block end address (a2), and subtract the faulting address to
      get the remainder. Currently it incorrectly subtracts the partial block
      start address (t1), which has additionally been clobbered to generate a
      jump target in memset_partial. Fix this by adding the block end address
      instead.
      
      This issue was found with the following test code:
            int j, k;
            for (j = 0; j < 512; j++) {
              if ((k = clear_user(NULL, j)) != j) {
                 pr_err("clear_user (NULL %d) returned %d\n", j, k);
              }
            }
      Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).
      Suggested-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19108/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af878d51
    • Matt Redfearn's avatar
      MIPS: memset.S: EVA & fault support for small_memset · be204694
      Matt Redfearn authored
      commit 8a8158c8 upstream.
      
      The MIPS kernel memset / bzero implementation includes a small_memset
      branch which is used when the region to be set is smaller than a long (4
      bytes on 32bit, 8 bytes on 64bit). The current small_memset
      implementation uses a simple store byte loop to write the destination.
      There are 2 issues with this implementation:
      
      1. When EVA mode is active, user and kernel address spaces may overlap.
      Currently the use of the sb instruction means kernel mode addressing is
      always used and an intended write to userspace may actually overwrite
      some critical kernel data.
      
      2. If the write triggers a page fault, for example by calling
      __clear_user(NULL, 2), instead of gracefully handling the fault, an OOPS
      is triggered.
      
      Fix these issues by replacing the sb instruction with the EX() macro,
      which will emit EVA compatible instuctions as required. Additionally
      implement a fault fixup for small_memset which sets a2 to the number of
      bytes that could not be cleared (as defined by __clear_user).
      Reported-by: default avatarChuanhua Lei <chuanhua.lei@intel.com>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/18975/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be204694
    • Matt Redfearn's avatar
      MIPS: uaccess: Add micromips clobbers to bzero invocation · 6a5722cb
      Matt Redfearn authored
      commit b3d7e55c upstream.
      
      The micromips implementation of bzero additionally clobbers registers t7
      & t8. Specify this in the clobbers list when invoking bzero.
      
      Fixes: 26c5e07d ("MIPS: microMIPS: Optimise 'memset' core library function.")
      Reported-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.10+
      Patchwork: https://patchwork.linux-mips.org/patch/19110/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a5722cb
    • Rodrigo Rivas Costa's avatar
      HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device · 7c3a5626
      Rodrigo Rivas Costa authored
      commit a955358d upstream.
      
      Doing `ioctl(HIDIOCGFEATURE)` in a tight loop on a hidraw device
      and then disconnecting the device, or unloading the driver, can
      cause a NULL pointer dereference.
      
      When a hidraw device is destroyed it sets 0 to `dev->exist`.
      Most functions check 'dev->exist' before doing its work, but
      `hidraw_get_report()` was missing that check.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRodrigo Rivas Costa <rodrigorivascosta@gmail.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c3a5626
    • David Wang's avatar
      ALSA: hda - New VIA controller suppor no-snoop path · cebd9b67
      David Wang authored
      commit af52f998 upstream.
      
      This patch is used to tell kernel that new VIA HDAC controller also
      support no-snoop path.
      
      [ minor coding style fix by tiwai ]
      Signed-off-by: default avatarDavid Wang <davidwang@zhaoxin.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cebd9b67
    • Takashi Iwai's avatar
      ALSA: rawmidi: Fix missing input substream checks in compat ioctls · fc338748
      Takashi Iwai authored
      commit 8a56ef4f upstream.
      
      Some rawmidi compat ioctls lack of the input substream checks
      (although they do check only for rfile->output).  This many eventually
      lead to an Oops as NULL substream is passed to the rawmidi core
      functions.
      
      Fix it by adding the proper checks before each function call.
      
      The bug was spotted by syzkaller.
      
      Reported-by: syzbot+f7a0348affc3b67bc617@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc338748
    • Fabián Inostroza's avatar
      ALSA: line6: Use correct endpoint type for midi output · 68fc6f74
      Fabián Inostroza authored
      commit 7ecb46e9 upstream.
      
      Sending MIDI messages to a PODxt through the USB connection shows
      "usb_submit_urb failed" in dmesg and the message is not received by
      the POD.
      
      The error is caused because in the funcion send_midi_async() in midi.c
      there is a call to usb_sndbulkpipe() for endpoint 3 OUT, but the PODxt
      USB descriptor shows that this endpoint it's an interrupt endpoint.
      
      Patch tested with PODxt only.
      
      [ The bug has been present from the very beginning in the staging
        driver time, but Fixes below points to the commit moving to sound/
        directory so that the fix can be cleanly applied -- tiwai ]
      
      Fixes: 61864d84 ("ALSA: move line6 usb driver into sound/usb")
      Signed-off-by: default avatarFabián Inostroza <fabianinostroza@udec.cl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68fc6f74
    • Theodore Ts'o's avatar
      ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() · 9b06cce3
      Theodore Ts'o authored
      commit c755e251 upstream.
      
      The xattr_sem deadlock problems fixed in commit 2e81a4ee: "ext4:
      avoid deadlock when expanding inode size" didn't include the use of
      xattr_sem in fs/ext4/inline.c.  With the addition of project quota
      which added a new extra inode field, this exposed deadlocks in the
      inline_data code similar to the ones fixed by 2e81a4ee.
      
      The deadlock can be reproduced via:
      
         dmesg -n 7
         mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
         mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
         mkdir /vdc/a
         umount /vdc
         mount -t ext4 /dev/vdc /vdc
         echo foo > /vdc/a/foo
      
      and looks like this:
      
      [   11.158815]
      [   11.160276] =============================================
      [   11.161960] [ INFO: possible recursive locking detected ]
      [   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W
      [   11.161960] ---------------------------------------------
      [   11.161960] bash/2519 is trying to acquire lock:
      [   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]
      [   11.161960] but task is already holding lock:
      [   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
      [   11.161960]
      [   11.161960] other info that might help us debug this:
      [   11.161960]  Possible unsafe locking scenario:
      [   11.161960]
      [   11.161960]        CPU0
      [   11.161960]        ----
      [   11.161960]   lock(&ei->xattr_sem);
      [   11.161960]   lock(&ei->xattr_sem);
      [   11.161960]
      [   11.161960]  *** DEADLOCK ***
      [   11.161960]
      [   11.161960]  May be due to missing lock nesting notation
      [   11.161960]
      [   11.161960] 4 locks held by bash/2519:
      [   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
      [   11.161960]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
      [   11.161960]  #2:  (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
      [   11.161960]  #3:  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
      [   11.161960]
      [   11.161960] stack backtrace:
      [   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
      [   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
      [   11.161960] Call Trace:
      [   11.161960]  dump_stack+0x72/0xa3
      [   11.161960]  __lock_acquire+0xb7c/0xcb9
      [   11.161960]  ? kvm_clock_read+0x1f/0x29
      [   11.161960]  ? __lock_is_held+0x36/0x66
      [   11.161960]  ? __lock_is_held+0x36/0x66
      [   11.161960]  lock_acquire+0x106/0x18a
      [   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  down_write+0x39/0x72
      [   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  ? _raw_read_unlock+0x22/0x2c
      [   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
      [   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
      [   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
      [   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
      [   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
      [   11.161960]  ext4_try_add_inline_entry+0x69/0x152
      [   11.161960]  ext4_add_entry+0xa3/0x848
      [   11.161960]  ? __brelse+0x14/0x2f
      [   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
      [   11.161960]  ext4_add_nondir+0x17/0x5b
      [   11.161960]  ext4_create+0xcf/0x133
      [   11.161960]  ? ext4_mknod+0x12f/0x12f
      [   11.161960]  lookup_open+0x39e/0x3fb
      [   11.161960]  ? __wake_up+0x1a/0x40
      [   11.161960]  ? lock_acquire+0x11e/0x18a
      [   11.161960]  path_openat+0x35c/0x67a
      [   11.161960]  ? sched_clock_cpu+0xd7/0xf2
      [   11.161960]  do_filp_open+0x36/0x7c
      [   11.161960]  ? _raw_spin_unlock+0x22/0x2c
      [   11.161960]  ? __alloc_fd+0x169/0x173
      [   11.161960]  do_sys_open+0x59/0xcc
      [   11.161960]  SyS_open+0x1d/0x1f
      [   11.161960]  do_int80_syscall_32+0x4f/0x61
      [   11.161960]  entry_INT80_32+0x2f/0x2f
      [   11.161960] EIP: 0xb76ad469
      [   11.161960] EFLAGS: 00000286 CPU: 0
      [   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
      [   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
      [   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      
      Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4ee as a prereq)
      Reported-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b06cce3
    • Jan Kara's avatar
      ext4: fix crashes in dioread_nolock mode · b9b98c26
      Jan Kara authored
      commit 74dae427 upstream.
      
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9b98c26
    • Paul Parsons's avatar
      drm/radeon: Fix PCIe lane width calculation · ba250be9
      Paul Parsons authored
      commit 85e290d9 upstream.
      
      Two years ago I tried an AMD Radeon E8860 embedded GPU with the drm driver.
      The dmesg output included driver warnings about an invalid PCIe lane width.
      Tracking the problem back led to si_set_pcie_lane_width_in_smc().
      The calculation of the lane widths via ATOM_PPLIB_PCIE_LINK_WIDTH_MASK and
      ATOM_PPLIB_PCIE_LINK_WIDTH_SHIFT macros did not increment the resulting
      value, per the comment in pptable.h ("lanes - 1"), and per usage elsewhere.
      Applying the increment silenced the warnings.
      The code has not changed since, so either my analysis was incorrect or the
      bug has gone unnoticed. Hence submitting this as an RFC.
      Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Acked-by: default avatarChunming Zhou <david1.zhou@amd.com>
      Signed-off-by: default avatarPaul Parsons <lost.distance@yahoo.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba250be9
    • Theodore Ts'o's avatar
      ext4: don't allow r/w mounts if metadata blocks overlap the superblock · 4845fefe
      Theodore Ts'o authored
      commit 18db4b4e upstream.
      
      If some metadata block, such as an allocation bitmap, overlaps the
      superblock, it's very likely that if the file system is mounted
      read/write, the results will not be pretty.  So disallow r/w mounts
      for file systems corrupted in this particular way.
      
      Backport notes:
      3.18.y is missing bc98a42c ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
      and e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
      so we simply use the sb MS_RDONLY check from pre bc98a42c in place of the sb_rdonly
      function used in the upstream variant of the patch.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHarsh Shandilya <harsh@prjkt.io>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4845fefe
    • Alex Williamson's avatar
      vfio/pci: Virtualize Maximum Read Request Size · 7b0278ca
      Alex Williamson authored
      commit cf0d53ba upstream.
      
      MRRS defines the maximum read request size a device is allowed to
      make.  Drivers will often increase this to allow more data transfer
      with a single request.  Completions to this request are bound by the
      MPS setting for the bus.  Aside from device quirks (none known), it
      doesn't seem to make sense to set an MRRS value less than MPS, yet
      this is a likely scenario given that user drivers do not have a
      system-wide view of the PCI topology.  Virtualize MRRS such that the
      user can set MRRS >= MPS, but use MPS as the floor value that we'll
      write to hardware.
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b0278ca
    • Alex Williamson's avatar
      vfio/pci: Virtualize Maximum Payload Size · 737e33da
      Alex Williamson authored
      commit 52318497 upstream.
      
      With virtual PCI-Express chipsets, we now see userspace/guest drivers
      trying to match the physical MPS setting to a virtual downstream port.
      Of course a lone physical device surrounded by virtual interconnects
      cannot make a correct decision for a proper MPS setting.  Instead,
      let's virtualize the MPS control register so that writes through to
      hardware are disallowed.  Userspace drivers like QEMU assume they can
      write anything to the device and we'll filter out anything dangerous.
      Since mismatched MPS can lead to AER and other faults, let's add it
      to the kernel side rather than relying on userspace virtualization to
      handle it.
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: default avatarEric Auger <eric.auger@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      737e33da
    • Alex Williamson's avatar
      vfio-pci: Virtualize PCIe & AF FLR · 1639df89
      Alex Williamson authored
      commit ddf9dc0e upstream.
      
      We use a BAR restore trick to try to detect when a user has performed
      a device reset, possibly through FLR or other backdoors, to put things
      back into a working state.  This is important for backdoor resets, but
      we can actually just virtualize the "front door" resets provided via
      PCIe and AF FLR.  Set these bits as virtualized + writable, allowing
      the default write to set them in vconfig, then we can simply check the
      bit, perform an FLR of our own, and clear the bit.  We don't actually
      have the granularity in PCI to specify the type of reset we want to
      do, but generally devices don't implement both PCIe and AF FLR and
      we'll favor these over other types of reset, so we should generally
      lineup.  We do test whether the device provides the requested FLR type
      to stay consistent with hardware capabilities though.
      
      This seems to fix several instance of devices getting into bad states
      with userspace drivers, like dpdk, running inside a VM.
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: default avatarGreg Rose <grose@lightfleet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1639df89
    • Takashi Iwai's avatar
      ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation · 0c8c443e
      Takashi Iwai authored
      commit e15dc99d upstream.
      
      The commit 02a5d692 ("ALSA: pcm: Avoid potential races between OSS
      ioctls and read/write") split the PCM preparation code to a locked
      version, and it added a sanity check of runtime->oss.prepare flag
      along with the change.  This leaded to an endless loop when the stream
      gets XRUN: namely, snd_pcm_oss_write3() and co call
      snd_pcm_oss_prepare() without setting runtime->oss.prepare flag and
      the loop continues until the PCM state reaches to another one.
      
      As the function is supposed to execute the preparation
      unconditionally, drop the invalid state check there.
      
      The bug was triggered by syzkaller.
      
      Fixes: 02a5d692 ("ALSA: pcm: Avoid potential races between OSS ioctls and read/write")
      Reported-by: syzbot+150189c103427d31a053@syzkaller.appspotmail.com
      Reported-by: syzbot+7e3f31a52646f939c052@syzkaller.appspotmail.com
      Reported-by: syzbot+4f2016cf5185da7759dc@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c8c443e
    • Takashi Iwai's avatar
      ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls · a2b3309a
      Takashi Iwai authored
      commit f6d297df upstream.
      
      The previous fix 40cab6e8 ("ALSA: pcm: Return -EBUSY for OSS
      ioctls changing busy streams") introduced some mutex unbalance; the
      check of runtime->oss.rw_ref was inserted in a wrong place after the
      mutex lock.
      
      This patch fixes the inconsistency by rewriting with the helper
      functions to lock/unlock parameters with the stream check.
      
      Fixes: 40cab6e8 ("ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2b3309a
    • Takashi Iwai's avatar
      ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams · 68ba825a
      Takashi Iwai authored
      commit 40cab6e8 upstream.
      
      OSS PCM stream management isn't modal but it allows ioctls issued at
      any time for changing the parameters.  In the previous hardening
      patch ("ALSA: pcm: Avoid potential races between OSS ioctls and
      read/write"), we covered these races and prevent the corruption by
      protecting the concurrent accesses via params_lock mutex.  However,
      this means that some ioctls that try to change the stream parameter
      (e.g. channels or format) would be blocked until the read/write
      finishes, and it may take really long.
      
      Basically changing the parameter while reading/writing is an invalid
      operation, hence it's even more user-friendly from the API POV if it
      returns -EBUSY in such a situation.
      
      This patch adds such checks in the relevant ioctls with the addition
      of read/write access refcount.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68ba825a
    • Takashi Iwai's avatar
      ALSA: pcm: Avoid potential races between OSS ioctls and read/write · f3e4c937
      Takashi Iwai authored
      commit 02a5d692 upstream.
      
      Although we apply the params_lock mutex to the whole read and write
      operations as well as snd_pcm_oss_change_params(), we may still face
      some races.
      
      First off, the params_lock is taken inside the read and write loop.
      This is intentional for avoiding the too long locking, but it allows
      the in-between parameter change, which might lead to invalid
      pointers.  We check the readiness of the stream and set up via
      snd_pcm_oss_make_ready() at the beginning of read and write, but it's
      called only once, by assuming that it remains ready in the rest.
      
      Second, many ioctls that may change the actual parameters
      (i.e. setting runtime->oss.params=1) aren't protected, hence they can
      be processed in a half-baked state.
      
      This patch is an attempt to plug these holes.  The stream readiness
      check is moved inside the read/write inner loop, so that the stream is
      always set up in a proper state before further processing.  Also, each
      ioctl that may change the parameter is wrapped with the params_lock
      for avoiding the races.
      
      The issues were triggered by syzkaller in a few different scenarios,
      particularly the one below appearing as GPF in loopback_pos_update.
      
      Reported-by: syzbot+c4227aec125487ec3efa@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3e4c937
    • Takashi Iwai's avatar
      ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation · 60dd12fd
      Takashi Iwai authored
      commit c64ed5dd upstream.
      
      Fix the last standing EINTR in the whole subsystem.  Use more correct
      ERESTARTSYS for pending signals.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60dd12fd