1. 03 Mar, 2016 4 commits
  2. 02 Mar, 2016 26 commits
  3. 25 Feb, 2016 10 commits
    • Jiri Slaby's avatar
      Linux 3.12.55 · bb47c5ec
      Jiri Slaby authored
      bb47c5ec
    • Dave Chinner's avatar
      xfs: inode recovery readahead can race with inode buffer creation · af8d9716
      Dave Chinner authored
      commit b79f4a1c upstream.
      
      When we do inode readahead in log recovery, we do can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification  (i.e. setting b_error = -EIO at
      the same time as as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      af8d9716
    • Darrick J. Wong's avatar
      libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct · 81184804
      Darrick J. Wong authored
      commit 96f859d5 upstream.
      
      Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
      inside it, gcc will quietly round the structure size up to the nearest
      64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
      macro returning incorrect results for v5 filesystems on 64-bit
      machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
      will see garbage in AGFL item 119 and complain.
      
      Therefore, tell gcc not to pad the structure so that the AGFL size
      calculation is correct.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      81184804
    • Rusty Russell's avatar
      module: wrapper for symbol name. · 9aa1d608
      Rusty Russell authored
      commit 2e7bac53 upstream.
      
      This trivial wrapper adds clarity and makes the following patch
      smaller.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9aa1d608
    • Thomas Gleixner's avatar
      futex: Drop refcount if requeue_pi() acquired the rtmutex · 8a39ca32
      Thomas Gleixner authored
      commit fb75a428 upstream.
      
      If the proxy lock in the requeue loop acquires the rtmutex for a
      waiter then it acquired also refcount on the pi_state related to the
      futex, but the waiter side does not drop the reference count.
      
      Add the missing free_pi_state() call.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Bhuvanesh_Surachari@mentor.com
      Cc: Andy Lowe <Andy_Lowe@mentor.com>
      Link: http://lkml.kernel.org/r/20151219200607.178132067@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8a39ca32
    • Dan Carpenter's avatar
      intel_scu_ipcutil: underflow in scu_reg_access() · 5e903959
      Dan Carpenter authored
      commit b1d353ad upstream.
      
      "count" is controlled by the user and it can be negative.  Let's prevent
      that by making it unsigned.  You have to have CAP_SYS_RAWIO to call this
      function so the bug is not as serious as it could be.
      
      Fixes: 5369c02d ('intel_scu_ipc: Utility driver for intel scu ipc')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5e903959
    • Eric Dumazet's avatar
      dump_stack: avoid potential deadlocks · 535f0e46
      Eric Dumazet authored
      commit d7ce3692 upstream.
      
      Some servers experienced fatal deadlocks because of a combination of
      bugs, leading to multiple cpus calling dump_stack().
      
      The checksumming bug was fixed in commit 34ae6a1a ("ipv6: update
      skb->csum when CE mark is propagated").
      
      The second problem is a faulty locking in dump_stack()
      
      CPU1 runs in process context and calls dump_stack(), grabs dump_lock.
      
         CPU2 receives a TCP packet under softirq, grabs socket spinlock, and
         call dump_stack() from netdev_rx_csum_fault().
      
         dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since
         dump_lock is owned by CPU1
      
      While dumping its stack, CPU1 is interrupted by a softirq, and happens
      to process a packet for the TCP socket locked by CPU2.
      
      CPU1 spins forever in spin_lock() : deadlock
      
      Stack trace on CPU1 looked like :
      
          NMI backtrace for cpu 1
          RIP: _raw_spin_lock+0x25/0x30
          ...
          Call Trace:
            <IRQ>
            tcp_v6_rcv+0x243/0x620
            ip6_input_finish+0x11f/0x330
            ip6_input+0x38/0x40
            ip6_rcv_finish+0x3c/0x90
            ipv6_rcv+0x2a9/0x500
            process_backlog+0x461/0xaa0
            net_rx_action+0x147/0x430
            __do_softirq+0x167/0x2d0
            call_softirq+0x1c/0x30
            do_softirq+0x3f/0x80
            irq_exit+0x6e/0xc0
            smp_call_function_single_interrupt+0x35/0x40
            call_function_single_interrupt+0x6a/0x70
            <EOI>
            printk+0x4d/0x4f
            printk_address+0x31/0x33
            print_trace_address+0x33/0x3c
            print_context_stack+0x7f/0x119
            dump_trace+0x26b/0x28e
            show_trace_log_lvl+0x4f/0x5c
            show_stack_log_lvl+0x104/0x113
            show_stack+0x42/0x44
            dump_stack+0x46/0x58
            netdev_rx_csum_fault+0x38/0x3c
            __skb_checksum_complete_head+0x6e/0x80
            __skb_checksum_complete+0x11/0x20
            tcp_rcv_established+0x2bd5/0x2fd0
            tcp_v6_do_rcv+0x13c/0x620
            sk_backlog_rcv+0x15/0x30
            release_sock+0xd2/0x150
            tcp_recvmsg+0x1c1/0xfc0
            inet_recvmsg+0x7d/0x90
            sock_recvmsg+0xaf/0xe0
            ___sys_recvmsg+0x111/0x3b0
            SyS_recvmsg+0x5c/0xb0
            system_call_fastpath+0x16/0x1b
      
      Fixes: b58d9774 ("dump_stack: serialize the output from dump_stack()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      535f0e46
    • Konstantin Khlebnikov's avatar
      radix-tree: fix oops after radix_tree_iter_retry · 135bf262
      Konstantin Khlebnikov authored
      commit 73204282 upstream.
      
      Helper radix_tree_iter_retry() resets next_index to the current index.
      In following radix_tree_next_slot current chunk size becomes zero.  This
      isn't checked and it tries to dereference null pointer in slot.
      
      Tagged iterator is fine because retry happens only at slot 0 where tag
      bitmask in iter->tags is filled with single bit.
      
      Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Jeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      135bf262
    • Matthew Wilcox's avatar
      radix-tree: fix race in gang lookup · 44e02657
      Matthew Wilcox authored
      commit 46437f9a upstream.
      
      If the indirect_ptr bit is set on a slot, that indicates we need to redo
      the lookup.  Introduce a new function radix_tree_iter_retry() which
      forces the loop to retry the lookup by setting 'slot' to NULL and
      turning the iterator back to point at the problematic entry.
      
      This is a pretty rare problem to hit at the moment; the lookup has to
      race with a grow of the radix tree from a height of 0.  The consequences
      of hitting this race are that gang lookup could return a pointer to a
      radix_tree_node instead of a pointer to whatever the user had inserted
      in the tree.
      
      Fixes: cebbd29e ("radix-tree: rewrite gang lookup using iterator")
      Signed-off-by: default avatarMatthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      44e02657
    • Martijn Coenen's avatar
      memcg: only free spare array when readers are done · 1c2456e3
      Martijn Coenen authored
      commit 6611d8d7 upstream.
      
      A spare array holding mem cgroup threshold events is kept around to make
      sure we can always safely deregister an event and have an array to store
      the new set of events in.
      
      In the scenario where we're going from 1 to 0 registered events, the
      pointer to the primary array containing 1 event is copied to the spare
      slot, and then the spare slot is freed because no events are left.
      However, it is freed before calling synchronize_rcu(), which means
      readers may still be accessing threshold->primary after it is freed.
      
      Fixed by only freeing after synchronize_rcu().
      Signed-off-by: default avatarMartijn Coenen <maco@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1c2456e3