1. 09 Oct, 2014 8 commits
    • md/raid5: disable 'DISCARD' by default due to safety concerns. · 06905ff8
      NeilBrown authored
      commit 8e0e99ba upstream.
      
      It has come to my attention (thanks Martin) that 'discard_zeroes_data'
      is only a hint.  Some devices in some cases don't do what it
      says on the label.
      
      The use of DISCARD in RAID5 depends on reads from discarded regions
      being predictably zero.  If a write to a previously discarded region
      performs a read-modify-write cycle it assumes that the parity block
      was consistent with the data blocks.  If all were zero, this would
      be the case.  If some are and some aren't this would not be the case.
      This could lead to data corruption after a device failure when
      data needs to be reconstructed from the parity.
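
The failure mode described above can be illustrated with a toy model. This is only a sketch in Python, not md/raid5 code: each block is a single integer, the stripe has two data blocks plus one XOR parity block, and the helper names (`after_discard`, `rmw_write`, `reconstruct`) are hypothetical. It assumes one device returns stale non-zero data for a discarded block while the others return zero.

```python
# Toy RAID5 stripe: two data blocks and one XOR parity block.
# All names are illustrative assumptions, not kernel identifiers.

def after_discard(d0, d1, parity):
    """Stripe contents as read back from the devices after a DISCARD."""
    return {"data": [d0, d1], "parity": parity}

def rmw_write(stripe, idx, new_data):
    """Read-modify-write: update parity from the old data and old parity only."""
    old_data = stripe["data"][idx]
    stripe["parity"] ^= old_data ^ new_data
    stripe["data"][idx] = new_data

def reconstruct(stripe, lost_idx):
    """Rebuild a lost data block from parity and the surviving data block."""
    surviving = stripe["data"][1 - lost_idx]
    return stripe["parity"] ^ surviving

# Trustworthy devices: every discarded block reads back as zero,
# so parity == data[0] ^ data[1] holds before the write.
good = after_discard(0, 0, 0)
rmw_write(good, 0, 0x5A)
good_rebuilt = reconstruct(good, 0)   # pretend device 0 failed

# Untrustworthy device: data block 1 returns stale data (0x33) after the
# discard, so parity was already inconsistent before the write.
bad = after_discard(0, 0x33, 0)
rmw_write(bad, 0, 0x5A)
bad_rebuilt = reconstruct(bad, 0)     # pretend device 0 failed

print(hex(good_rebuilt), hex(bad_rebuilt))
```

In the first case the rebuilt block equals the value that was written; in the second it does not, even though the write itself appeared to succeed.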
      
      As we cannot trust 'discard_zeroes_data', ignore it by default
      and so disallow DISCARD on all raid4/5/6 arrays.
      
      As many devices are trustworthy, and as there are benefits to using
      DISCARD, add a module parameter to over-ride this caution and cause
      DISCARD to work if discard_zeroes_data is set.
      
      If a site wants to enable DISCARD on some arrays but not on others they
      should select DISCARD support at the filesystem level, and set the
      raid456 module parameter.
          raid456.devices_handle_discard_safely=Y
      
      As this is a data-safety issue, I believe this patch is suitable for
      -stable.
      DISCARD support for RAID456 was added in 3.7
      
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Fixes: 620125f2
      Signed-off-by: NeilBrown <neilb@suse.de>
      [bwh: Backported to 3.10: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • media: vb2: fix VBI/poll regression · f5d34b7c
      Hans Verkuil authored
      commit 58d75f4b upstream.
      
      The recent conversion of saa7134 to vb2 uncovered a poll() bug that
      broke the teletext applications alevt and mtt. These applications
      expect that calling poll() without having called VIDIOC_STREAMON will
      cause poll() to return POLLERR. That did not happen in vb2.
      
      This patch fixes that behavior. It also fixes what should happen when
      poll() is called after STREAMON but before any buffers have been
      queued. In that case poll() will also return POLLERR, but only for
      capture queues since output queues will always return POLLOUT
      anyway in that situation.
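
The two rules above can be written down as a small decision function. This is a sketch of the described behavior only, not the vb2 implementation: `poll_result` and its parameters are hypothetical names, the constants mirror the usual Linux values of POLLOUT and POLLERR, and every case the commit message does not describe is returned as `None`.

```python
# Illustrative constants matching the common Linux <poll.h> values.
POLLOUT = 0x004
POLLERR = 0x008

def poll_result(streaming, queued_buffers, is_capture):
    """Return the poll() event mask for the cases described above.

    Hypothetical helper: the real logic lives in the kernel's vb2 code.
    """
    if not streaming:
        # poll() without a prior VIDIOC_STREAMON reports an error.
        return POLLERR
    if queued_buffers == 0:
        # Streaming but nothing queued: error for capture queues only;
        # output queues report that they are writable.
        return POLLERR if is_capture else POLLOUT
    # Normal operation is outside the scope of this sketch.
    return None

print(poll_result(False, 0, True), poll_result(True, 0, False))
```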
      
      This brings the vb2 behavior in line with the old videobuf behavior.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: numa: Do not mark PTEs pte_numa when splitting huge pages · f35407ac
      Mel Gorman authored
      commit abc40bd2 upstream.
      
      This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the
      NUMA type from the pmd to the pte"). If a huge page is being split due
      to a protection change and the tail will be in a PROT_NONE vma then NUMA
      hinting PTEs are temporarily created in the protected VMA.
      
       VM_RW|VM_PROTNONE
      |-----------------|
            ^
            split here
      
      In the specific case above, it should get fixed up by change_pte_range()
      but there is a window of opportunity for weirdness to happen. Similarly,
      if a huge page is shrunk and split during a protection update but before
      pmd_numa is cleared then a pte_numa can be left behind.
      
      Instead of adding complexity trying to deal with the case, this patch
      will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
      will not be triggered which is marginal in comparison to the complexity
      in dealing with the corner cases during THP split.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, thp: move invariant bug check out of loop in __split_huge_page_map · 183c062c
      Waiman Long authored
      commit f8303c25 upstream.
      
      In __split_huge_page_map(), the check for page_mapcount(page) is
      invariant within the for loop.  Because the macro is implemented
      using atomic_read(), the compiler cannot optimize the redundant check
      away, leading to unnecessary reads of the page structure.
      
      This patch moves the invariant bug check out of the loop so that it will
      be done only once.  On a 3.16-rc1 based kernel, a microbenchmark that
      broke up 1000 transparent huge pages using munmap() took 38,245us with
      the patch and 38,548us without it.  The performance gain is about 1%.
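
The quoted gain can be checked directly from the two timings. The short Python calculation below uses only the numbers given above; the variable names are illustrative.

```python
# Timings quoted in the commit message, in microseconds.
with_patch_us = 38_245
without_patch_us = 38_548

# Time saved by hoisting the check, and the gain relative to the
# unpatched run.
saved_us = without_patch_us - with_patch_us
gain_percent = 100.0 * saved_us / without_patch_us

print(f"saved {saved_us} us, gain {gain_percent:.2f}%")
```

The result is a little under 0.8%, which the message rounds to about 1%.
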
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Fix infinite spin in reading buffer · 78a3db11
      Steven Rostedt (Red Hat) authored
      commit 24607f11 upstream.
      
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      it's not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! This
      should never happen as the reset is supposed to set the cache to the
      current value and there are locks that keep a consuming reader from
      having access to the data.
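
The retry loop described above can be modelled with a toy iterator. This is a simplified Python sketch, not the ring buffer code: the classes, the fields (`read`, `reader_page_read`, `cache_read`) and the choice of which wrong value the buggy reset stores are assumptions made for illustration, and the retry limit exists only so the example terminates where the kernel would spin.

```python
class ToyBuffer:
    """Stand-in for the per-CPU buffer state the iterator compares against."""
    def __init__(self):
        self.read = 3               # hypothetical count of consumed events
        self.reader_page_read = 1   # hypothetical offset in the reader page

class ToyIterator:
    def __init__(self, buf, buggy):
        self.buf = buf
        self.buggy = buggy
        self.head = 0
        self.cache_read = None      # nothing cached yet

    def reset(self):
        self.head = self.buf.reader_page_read
        if self.buggy:
            # Assumed bug: the cache stores a value that is never the one
            # the peek loop compares against.
            self.cache_read = self.head
        else:
            # The reset records the value that the peek loop checks.
            self.cache_read = self.buf.read

    def peek(self, max_retries=5):
        """Return the number of resets needed, or None if it never settles."""
        retries = 0
        while self.cache_read != self.buf.read:
            if retries == max_retries:
                return None         # the kernel would loop here forever
            self.reset()
            retries += 1
        return retries

fixed_retries = ToyIterator(ToyBuffer(), buggy=False).peek()
buggy_retries = ToyIterator(ToyBuffer(), buggy=True).peek()
print(fixed_retries, buggy_retries)
```

With a consistent reset the loop settles after one pass; with the mismatched cache value it never does.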
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu · 0000372a
      Josh Triplett authored
      commit 62b4d204 upstream.
      
      Commit 03b8c7b6 ("futex: Allow architectures to skip
      futex_atomic_cmpxchg_inatomic() test") added the
      HAVE_FUTEX_CMPXCHG symbol right below FUTEX.  This placed it right in
      the middle of the options for the EXPERT menu.  However,
      HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
      placing items in the EXPERT menu, and displays the remaining several
      EXPERT items (starting with EPOLL) directly in the General Setup menu.
      
      Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
      HAVE_FUTEX_CMPXCHG itself depend on FUTEX.  With this change, the
      subsequent items display as part of the EXPERT menu again; the EMBEDDED
      menu now appears as the next top-level item in the General Setup menu,
      which makes General Setup much shorter and more usable.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf: fix perf bug in fork() · bee870fc
      Peter Zijlstra authored
      commit 6c72e350 upstream.
      
      Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
      calling perf_event_free_task() when failing sched_fork() we will not yet
      have done the memset() on ->perf_event_ctxp[] and will therefore try and
      'free' the inherited contexts, which are still in use by the parent
      process.  This is bad.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udf: Avoid infinite loop when processing indirect ICBs · 07d209bd
      Jan Kara authored
      commit c03aa9f6 upstream.
      
      We did not implement any bound on the number of indirect ICBs we follow
      when loading an inode. Thus a corrupted medium could cause the kernel to
      go into an infinite loop, possibly causing a stack overflow.

      Fix the possible stack overflow by removing recursion from
      __udf_read_inode() and limit the number of indirect ICBs we follow to
      avoid infinite loops.
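
The shape of the fix, an iterative walk with a hop limit in place of unbounded recursion, can be sketched as follows. This is a Python illustration only, not the UDF code: `resolve_inode`, the dictionary layout and the limit `MAX_INDIRECT_HOPS` (including its value) are assumptions for the example.

```python
MAX_INDIRECT_HOPS = 1024  # hypothetical bound, chosen for illustration

def resolve_inode(icbs, start):
    """Follow indirect ICBs iteratively until a real inode is found.

    `icbs` maps a block address to ("indirect", next_address) or
    ("inode", payload). Raises ValueError when the chain is longer than
    MAX_INDIRECT_HOPS, which is what a cycle on a corrupted medium
    looks like.
    """
    addr = start
    # A loop instead of recursion keeps stack usage constant.
    for _ in range(MAX_INDIRECT_HOPS + 1):
        kind, value = icbs[addr]
        if kind == "inode":
            return value
        addr = value  # follow the indirect ICB to the next block
    raise ValueError("too many indirect ICBs; medium is probably corrupted")

# A short, valid chain resolves normally.
chain = {1: ("indirect", 2), 2: ("inode", "file")}
print(resolve_inode(chain, 1))

# A cycle is reported as an error instead of looping forever.
cycle = {1: ("indirect", 2), 2: ("indirect", 1)}
try:
    resolve_inode(cycle, 1)
except ValueError as err:
    print("rejected:", err)
```
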
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 05 Oct, 2014 32 commits