1. 08 Jun, 2016 11 commits
    • locking/rwsem: Protect all writes to owner by WRITE_ONCE() · fb6a44f3
      Waiman Long authored
      Without using WRITE_ONCE(), the compiler can potentially break a
      write into multiple smaller ones (store tearing). So a concurrent
      read of the same data by another task may return a partial result.
      This can result in a kernel crash if the data is a memory address
      that is being dereferenced.
      
      This patch changes all writes to rwsem->owner to use WRITE_ONCE()
      to make sure that store tearing cannot happen. READ_ONCE() may
      not be needed for rwsem->owner as long as the value is only used
      for comparison and never dereferenced.
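
      As a rough illustration of the pattern (a sketch, not the exact
      upstream diff; the helper names follow kernel/locking/rwsem.h of
      that era), the owner setters would look like:

        static inline void rwsem_set_owner(struct rw_semaphore *sem)
        {
                /* single, untearable store of the owning task pointer */
                WRITE_ONCE(sem->owner, current);
        }

        static inline void rwsem_clear_owner(struct rw_semaphore *sem)
        {
                WRITE_ONCE(sem->owner, NULL);
        }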
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1463534783-38814-3-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb6a44f3
    • locking/rwsem: Add reader-owned state to the owner field · 19c5d690
      Waiman Long authored
      Currently, it is not possible to determine for sure if a reader
      owns a rwsem by looking at the content of the rwsem data structure.
      This patch adds a new state RWSEM_READER_OWNED to the owner field
      to indicate that readers currently own the lock. This enables us to
      address the following 2 issues in the rwsem optimistic spinning code:
      
       1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
          the owner field is NULL which can mean either the readers own
          the lock or the owning writer hasn't set the owner field yet.
          In the latter case, we miss the chance to do optimistic spinning.
      
       2) While a writer is waiting in the OSQ and a reader takes the lock,
          the writer will continue to spin after leaving the OSQ, in the
          main rwsem_optimistic_spin() loop, because the owner field is
          NULL, wasting CPU cycles if some of the readers are sleeping.
      
      Adding the new state allows optimistic spinning to go forward as
      long as the owner field is not RWSEM_READER_OWNED and the owner,
      if set, is running, but to stop immediately once that state has
      been reached.
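
      A minimal sketch of how the reader-owned state can be encoded
      (illustrative only; the marker value and helpers below mirror the
      idea of the patch rather than reproducing it verbatim):

        /* special, never-dereferenced owner value: "readers own this lock" */
        #define RWSEM_READER_OWNED      ((struct task_struct *)1UL)

        static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
        {
                /* avoid dirtying the cacheline if the state is already set */
                if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
                        WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
        }

        static inline bool rwsem_owner_is_writer(struct task_struct *owner)
        {
                return owner && owner != RWSEM_READER_OWNED;
        }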
      
      On a 4-socket Haswell machine running a 4.6-rc1 based kernel, fio
      was run with multithreaded randrw and randwrite tests against the
      same file on an XFS partition on top of an NVDIMM. The aggregated
      bandwidths before and after the patch were as follows:
      
        Test      BW before patch     BW after patch  % change
        ----      ---------------     --------------  --------
        randrw         988 MB/s          1192 MB/s      +21%
        randwrite     1513 MB/s          1623 MB/s      +7.3%
      
      The perf profiles of the rwsem_down_write_failed() function in the
      randrw test before and after the patch were:
      
         19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
         14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
      
      The actual CPU cycles spent in rwsem_down_write_failed() dropped
      from 5.88% to 1.52% after the patch.
      
      The xfstests suite was also run and no regression was observed.
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Jason Low <jason.low2@hp.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1463534783-38814-2-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      19c5d690
    • locking/rwsem: Remove rwsem_atomic_add() and rwsem_atomic_update() · d157bd86
      Jason Low authored
      The rwsem-xadd count has been converted to an atomic variable and the
      rwsem code now directly uses atomic_long_add() and
      atomic_long_add_return(), so we can remove the arch implementations of
      rwsem_atomic_add() and rwsem_atomic_update().
      Signed-off-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Waiman Long <Waiman.Long@hpe.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d157bd86
    • locking/rwsem: Convert sem->count to 'atomic_long_t' · 8ee62b18
      Jason Low authored
      Convert the rwsem count variable to an atomic_long_t since we use
      it as an atomic variable. This also allows us to remove the
      rwsem_atomic_{add,update}() "abstraction", which would now be an
      unnecessary level of indirection. In follow-up patches, we also
      remove the rwsem_atomic_{add,update}() definitions across the
      various architectures.
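
      Roughly, the conversion looks like this (an illustrative fragment,
      not the full diff; RWSEM_ACTIVE_READ_BIAS is the existing count
      bias used by the xadd implementation):

        struct rw_semaphore {
                atomic_long_t count;            /* was: long count */
                /* other fields unchanged */
        };

        /* before: arch-specific wrapper */
        rwsem_atomic_add(-RWSEM_ACTIVE_READ_BIAS, sem);

        /* after: the generic atomic op, applied to the count directly */
        atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);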
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jason Low <jason.low2@hpe.com>
      [ Build warning fixes on various architectures. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Waiman Long <Waiman.Long@hpe.com>
      Link: http://lkml.kernel.org/r/1465017963-4839-2-git-send-email-jason.low2@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8ee62b18
    • locking/qspinlock: Add comments · 055ce0fd
      Peter Zijlstra authored
      I figured we need to document the spin_is_locked() and
      spin_unlock_wait() constraints somewhere.
      
      Ideally 'someone' would rewrite Documentation/atomic_ops.txt and we
      could find a place in there. But currently that document is stale to
      the point of hardly being useful.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <waiman.long@hpe.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      055ce0fd
    • locking/qspinlock: Clarify xchg_tail() ordering · 8d53fa19
      Peter Zijlstra authored
      While going over the code I noticed that xchg_tail() is a RELEASE but
      had no obvious pairing commented.
      
      It pairs with a somewhat unique address dependency through
      decode_tail().
      
      So the store-release of xchg_tail() is paired with the address
      dependency formed by the load inside xchg_tail() followed by the
      dereference of the pointer computed from that load.
      
      The @old -> @prev transformation itself is pure, and therefore does
      not depend on external state, so that is immaterial wrt. ordering.
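
      The relevant fragment of the queueing slowpath looks roughly like
      this (a simplified sketch, not the verbatim upstream code):

        old = xchg_tail(lock, tail);            /* RELEASE: publish our node */

        if (old & _Q_TAIL_MASK) {
                prev = decode_tail(old);        /* pure function of 'old' */

                /* the address dependency from 'old' to 'prev' orders this
                 * dereference after the load inside xchg_tail() */
                WRITE_ONCE(prev->next, node);
        }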
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <waiman.long@hpe.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8d53fa19
    • locking/qspinlock: Fix spin_unlock_wait() some more · 2c610022
      Peter Zijlstra authored
      While this prior commit:
      
        54cf809b ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
      
      ... fixes spin_is_locked() and spin_unlock_wait() for the usage
      in ipc/sem and netfilter, it does not in fact work right for the
      usage in task_work and futex.
      
      So while the 2 locks crossed problem:
      
      	spin_lock(A)		spin_lock(B)
      	if (!spin_is_locked(B)) spin_unlock_wait(A)
      	  foo()			foo();
      
      ... works with the smp_mb() injected by both spin_is_locked() and
      spin_unlock_wait(), this is not sufficient for:
      
      	flag = 1;
      	smp_mb();		spin_lock()
      	spin_unlock_wait()	if (!flag)
      				  // add to lockless list
      	// iterate lockless list
      
      ... because in this scenario, the store from spin_lock() can be delayed
      past the load of flag, uncrossing the variables and losing the
      guarantee.
      
      This patch reworks spin_is_locked() and spin_unlock_wait() to work in
      both cases by exploiting the observation that while the lock byte
      store can be delayed, the contender must have registered itself
      visibly in other state contained in the word.
      
      It also allows for architectures to override both functions, as PPC
      and ARM64 have an additional issue for which we currently have no
      generic solution.
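
      A rough sketch of the resulting wait loop, assuming the usual
      queued-spinlock encoding of { locked byte, pending bit, tail } in
      the lock word (simplified; not the exact upstream code):

        void queued_spin_unlock_wait(struct qspinlock *lock)
        {
                u32 val;

                for (;;) {
                        val = atomic_read(&lock->val);

                        if (!val)                       /* fully unlocked */
                                goto done;

                        if (val & _Q_LOCKED_MASK)       /* owner is visible */
                                break;

                        /* pending/tail already set, but the locked byte
                         * store has not landed yet: wait for it */
                        cpu_relax();
                }

                /* wait for the current owner to go away */
                while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
                        cpu_relax();
        done:
                smp_rmb();      /* control dependency + rmb acts as ACQUIRE */
        }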
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Giovanni Gherdovich <ggherdovich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <waiman.long@hpe.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org # v4.2 and later
      Fixes: 54cf809b ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2c610022
    • locking/barriers: Validate lockless_dereference() is used on a pointer type · 331b6d8c
      Peter Zijlstra authored
      Use the type to validate the argument @p is indeed a pointer type.
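
      The trick is to assign the loaded value to a 'const void *' inside
      the macro, which only compiles if @p has pointer type. A sketch
      from memory (not necessarily the exact macro text):

        #define lockless_dereference(p) \
        ({ \
                typeof(p) _________p1 = READ_ONCE(p); \
                /* compile-time pointer check: fails for integer types */ \
                __maybe_unused const void * const _________p2 = _________p1; \
                smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
                (_________p1); \
        })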
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160522104827.GP3193@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      331b6d8c
    • locking/rtmutex: Only warn once on a trylock from bad context · a461d587
      Sebastian Andrzej Siewior authored
      One warning should be enough to get someone motivated to fix this.
      It is possible that this happens more than once and then starts
      flooding the output. Later on the prints will be suppressed, so we
      only get half of it, and depending on the console system used that
      might not be helpful.
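
      The mechanism is the stock one-shot warning; roughly (a sketch,
      assuming the check sits in rt_mutex_trylock(), not the exact diff):

        int __sched rt_mutex_trylock(struct rt_mutex *lock)
        {
                /* warn on the first bad call only, instead of every time */
                if (WARN_ON_ONCE(in_irq() || irqs_disabled()))
                        return 0;

                return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
        }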
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1464356838-1755-1-git-send-email-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a461d587
    • locking/lockdep: Use __jhash_mix() for iterate_chain_key() · dfaaf3fa
      Peter Zijlstra authored
      Use __jhash_mix() to mix the class_idx into the class_key. This
      function provides better mixing than the previously used, home grown
      mix function.
      
      Leave hashing to the professionals :-)
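
      The mixing step then becomes, roughly (shown from memory; see
      kernel/locking/lockdep.c for the authoritative version):

        static inline u64 iterate_chain_key(u64 key, u32 idx)
        {
                u32 k0 = key, k1 = key >> 32;

                __jhash_mix(idx, k0, k1);       /* macro: mixes and updates args */

                return k0 | (u64)k1 << 32;
        }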
      Suggested-by: George Spelvin <linux@sciencehorizons.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dfaaf3fa
  2. 03 Jun, 2016 7 commits
    • percpu, locking: Revert ("percpu: Replace smp_read_barrier_depends() with lockless_dereference()") · ed8ebd1d
      Tejun Heo authored
      lockless_dereference() is planned to grow a sanity check to ensure
      that the input parameter is a pointer.  __ref_is_percpu() passes in an
      unsigned long value which is a combination of a pointer and a flag.
      While it can be cast to a pointer lvalue, the casting looks messy
      and it's a special case anyway.  Let's revert to open-coding
      READ_ONCE() and an explicit barrier.
      
      This doesn't cause any functional changes.
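
      The open-coded form is roughly the following fragment of
      __ref_is_percpu() (a simplified sketch, not the verbatim code):

        unsigned long percpu_ptr;

        /* the value is a pointer plus low flag bits, hence unsigned long */
        percpu_ptr = READ_ONCE(ref->percpu_count_ptr);

        /* order the flag test and later use after the load above */
        smp_read_barrier_depends();

        if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC_DEAD))
                return false;

        *percpu_countp = (unsigned long __percpu *)percpu_ptr;
        return true;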
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@fb.com
      Link: http://lkml.kernel.org/g/20160522185040.GA23664@p183.telecom.by
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ed8ebd1d
    • locking/mutex: Set and clear owner using WRITE_ONCE() · 6e281474
      Jason Low authored
      The mutex owner can be read and written locklessly.
      Use WRITE_ONCE() when setting and clearing the owner field
      in order to avoid optimizations such as store tearing. This
      avoids situations where the owner field is written with
      multiple stores and another thread concurrently reads
      and uses a partially written owner value.
      Signed-off-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Waiman Long <Waiman.Long@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1463782776.2479.9.camel@j-VirtualBox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e281474
    • locking/rwsem: Optimize write lock by reducing operations in slowpath · c0fcb6c2
      Jason Low authored
      When acquiring the rwsem write lock in the slowpath, we first try
      to set count to RWSEM_WAITING_BIAS. When that is successful,
      we then atomically add the RWSEM_WAITING_BIAS in cases where
      there are other tasks on the wait list. This causes write lock
      operations to often issue multiple atomic operations.
      
      We can instead make the list_is_singular() check first, and then
      set the count accordingly, so that we issue at most 1 atomic
      operation when acquiring the write lock and reduce unnecessary
      cacheline contention.
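
      The reworked trylock is roughly of this shape (a simplified sketch
      of rwsem_try_write_lock(), not the verbatim patch):

        if (count != RWSEM_WAITING_BIAS)
                return false;

        /*
         * Pick the target value up front: keep the waiting bias only if
         * other tasks remain queued behind us.
         */
        count = list_is_singular(&sem->wait_list) ?
                        RWSEM_ACTIVE_WRITE_BIAS :
                        RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS;

        /* a single atomic op acquires the lock */
        if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS,
                                        count) == RWSEM_WAITING_BIAS) {
                rwsem_set_owner(sem);
                return true;
        }

        return false;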
      Signed-off-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Waiman Long <Waiman.Long@hpe.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1463445486-16078-2-git-send-email-jason.low2@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c0fcb6c2
    • locking/rwsem: Rework zeroing reader waiter->task · e3851390
      Davidlohr Bueso authored
      Readers that are awoken will expect a nil ->task indicating
      that a wakeup has occurred. Because of the way readers are
      implemented, there's a small chance that the waiter will never
      block in the slowpath (rwsem_down_read_failed()), which therefore
      requires some form of reference counting to avoid the following
      scenario:
      
      rwsem_down_read_failed()		rwsem_wake()
        get_task_struct();
        spin_lock_irq(&wait_lock);
        list_add_tail(&waiter.list)
        spin_unlock_irq(&wait_lock);
      					  raw_spin_lock_irqsave(&wait_lock)
      					  __rwsem_do_wake()
        while (1) {
          set_task_state(TASK_UNINTERRUPTIBLE);
      					    waiter->task = NULL
          if (!waiter.task) // true
            break;
          schedule() // never reached
      
         __set_task_state(TASK_RUNNING);
       do_exit();
      					    wake_up_process(tsk); // boom
      
      ... and therefore race with do_exit() when the caller returns.
      
      There is also a mismatch between the smp_mb() and its documentation,
      in that the serialization is done between reading the task and the
      nil store. Furthermore, in addition to having the overlapping of
      loads and stores to waiter->task guaranteed to be ordered within
      that CPU, both wake_up_process() originally and now wake_q_add()
      already imply barriers upon successful calls, which serves the
      comment.
      
      Now, as an alternative to perhaps inverting the checks in the blocker
      side (which has its own penalty in that schedule is unavoidable),
      with lockless wakeups this situation is naturally addressed and we
      can just use the refcount held by wake_q_add(), instead of doing so
      explicitly. Of course, we must guarantee that the nil store is done
      as the _last_ operation in that the task must already be marked for
      deletion to not fall into the race above. Spurious wakeups are also
      handled transparently in that the task's reference is only removed
      when wake_up_q() is actually called _after_ the nil store.
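
      The wake path then orders things roughly as follows (sketch only,
      not the verbatim patch):

        tsk = waiter->task;

        /* wake_q_add() takes a reference on tsk, so the task cannot be
         * reaped even if it never blocks in the slowpath */
        wake_q_add(wake_q, tsk);

        /* nil store last: once the waiter observes ->task == NULL it may
         * return and exit, so the reference must already be held */
        smp_store_release(&waiter->task, NULL);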
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hpe.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hp.com
      Cc: peter@hurleysoftware.com
      Link: http://lkml.kernel.org/r/1463165787-25937-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e3851390
    • locking/rwsem: Enable lockless waiter wakeup(s) · 133e89ef
      Davidlohr Bueso authored
      As wake_qs gain users, we can teach rwsems about them such that
      waiters can be awoken without the wait_lock. This applies to both
      readers and writers, the former being the ideal candidate
      as we can batch the wakeups, shortening the critical region that
      much more -- i.e. a writer task blocking a bunch of tasks waiting to
      service page-faults (mmap_sem readers).
      
      In general, applying wake_qs to rwsem (xadd) is not difficult as
      the wait_lock is intended to be released soon _anyway_, with
      the exception of when the writer slowpath proactively wakes up
      any queued readers if it sees that the lock is owned by a reader,
      in which case we simply do the wakeups with the lock held (see the
      comment in __rwsem_down_write_failed_common()).
      
      Similar to other locking primitives, delaying the waiter being
      awoken does allow, at least in theory, the lock to be stolen in
      the case of writers; however, no harm was seen in this (in fact
      lock stealing tends to be a _good_ thing in most workloads), and
      this is a tiny window anyway.
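
      The general shape of the wakeup path is then (simplified sketch):

        WAKE_Q(wake_q);

        raw_spin_lock_irqsave(&sem->wait_lock, flags);
        /* pick waiters off sem->wait_list and wake_q_add() each task */
        raw_spin_unlock_irqrestore(&sem->wait_lock, flags);

        /* the actual wakeups happen outside the wait_lock */
        wake_up_q(&wake_q);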
      
      Some page-fault (pft) and mmap_sem intensive benchmarks show a
      fairly constant reduction in systime (by up to ~8% and ~10%) on a
      2-socket, 12-core AMD box. In addition, on an 8-core Westmere doing
      page allocations (page_test):
      
      aim9:
      	 4.6-rc6				4.6-rc6
      						rwsemv2
      Min      page_test   378167.89 (  0.00%)   382613.33 (  1.18%)
      Min      exec_test      499.00 (  0.00%)      502.67 (  0.74%)
      Min      fork_test     3395.47 (  0.00%)     3537.64 (  4.19%)
      Hmean    page_test   395433.06 (  0.00%)   414693.68 (  4.87%)
      Hmean    exec_test      499.67 (  0.00%)      505.30 (  1.13%)
      Hmean    fork_test     3504.22 (  0.00%)     3594.95 (  2.59%)
      Stddev   page_test    17426.57 (  0.00%)    26649.92 (-52.93%)
      Stddev   exec_test        0.47 (  0.00%)        1.41 (-199.05%)
      Stddev   fork_test       63.74 (  0.00%)       32.59 ( 48.86%)
      Max      page_test   429873.33 (  0.00%)   456960.00 (  6.30%)
      Max      exec_test      500.33 (  0.00%)      507.66 (  1.47%)
      Max      fork_test     3653.33 (  0.00%)     3650.90 ( -0.07%)
      
      	     4.6-rc6     4.6-rc6
      			 rwsemv2
      User            1.12        0.04
      System          0.23        0.04
      Elapsed       727.27      721.98
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hpe.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hp.com
      Cc: peter@hurleysoftware.com
      Link: http://lkml.kernel.org/r/1463165787-25937-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      133e89ef
    • locking/ww_mutex: Report recursive ww_mutex locking early · 0422e83d
      Chris Wilson authored
      Recursive locking for ww_mutexes was originally conceived as an
      exception. However, it is heavily used by the DRM atomic modesetting
      code. Currently, the recursive deadlock is checked after we have queued
      up for a busy-spin and as we never release the lock, we spin until
      kicked, whereupon the deadlock is discovered and reported.
      
      A simple solution for the now common problem is to move the recursive
      deadlock discovery to the first action when taking the ww_mutex.
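
      The early check is roughly of this shape (a simplified sketch of
      the fast-path test, not the verbatim patch):

        if (use_ww_ctx) {
                struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);

                /* taking the same ww_mutex twice within one acquire context
                 * is a deadlock by definition: report it before queueing */
                if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
                        return -EALREADY;
        }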
      Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0422e83d
    • locking/seqcount: Re-fix raw_read_seqcount_latch() · 55eed755
      Peter Zijlstra authored
      Commit 50755bc1 ("seqlock: fix raw_read_seqcount_latch()") broke
      raw_read_seqcount_latch().
      
      If you look at the comment that was modified, the thing that
      changes is the seq count, not the latch pointer.
      
       * void latch_modify(struct latch_struct *latch, ...)
       * {
       *	smp_wmb();	<- Ensure that the last data[1] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[0], ...);
       *
       *	smp_wmb();	<- Ensure that the data[0] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[1], ...);
       * }
       *
       * The query will have a form like:
       *
       * struct entry *latch_query(struct latch_struct *latch, ...)
       * {
       *	struct entry *entry;
       *	unsigned seq, idx;
       *
       *	do {
       *		seq = lockless_dereference(latch->seq);
      
      So here we have:
      
      		seq = READ_ONCE(latch->seq);
      		smp_read_barrier_depends();
      
      Which is exactly what we want; the new code:
      
      		seq = ({ p = READ_ONCE(latch);
      			 smp_read_barrier_depends(); p })->seq;
      
      is just wrong, because it loses the volatile read on seq, which can now
      be torn or, worse, 'optimized'. And the read_depend barrier is also
      placed wrong; we want it after the load of seq, to match the above
      data[] up-to-date wmb()s.
      
      Such that when we dereference latch->data[] below, we're guaranteed to
      observe the right data.
      
       *
       *		idx = seq & 0x01;
       *		entry = data_query(latch->data[idx], ...);
       *
       *		smp_rmb();
       *	} while (seq != latch->seq);
       *
       *	return entry;
       * }
      
      So yes, not passing a pointer is not pretty, but the code was correct,
      and now it no longer is.
      
      Change to explicit READ_ONCE()+smp_read_barrier_depends() to avoid
      confusion and allow strict lockless_dereference() checking.
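
      With that change the read side is back to, roughly (sketch only):

        static inline int raw_read_seqcount_latch(seqcount_t *s)
        {
                int seq = READ_ONCE(s->sequence);       /* volatile read, no tearing */

                smp_read_barrier_depends();             /* after the seq load */
                return seq;
        }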
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 50755bc1 ("seqlock: fix raw_read_seqcount_latch()")
      Link: http://lkml.kernel.org/r/20160527111117.GL3192@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      55eed755
  3. 01 Jun, 2016 5 commits
    • Merge tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 719af93a
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Here are three pin control fixes for v4.7.  Not much, and just driver
        fixes:
      
         - add device tree matches to MAINTAINERS
      
         - inversion bug in the Nomadik driver
      
         - dual edge handling bug in the mediatek driver"
      
      * tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: mediatek: fix dual-edge code defect
        MAINTAINERS: Add file patterns for pinctrl device tree bindings
        pinctrl: nomadik: fix inversion of gpio direction
      719af93a
    • Merge tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf · ebb8cb2b
      Linus Torvalds authored
      Pull dma-buf updates from Sumit Semwal:
      
       - use of vma_pages instead of explicit computation
      
       - DocBook and headerdoc updates for dma-buf
      
      * tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
        dma-buf: use vma_pages()
        fence: add missing descriptions for fence
        doc: update/fixup dma-buf related DocBook
        reservation: add headerdoc comments
        dma-buf: headerdoc fixes
      ebb8cb2b
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6b15d665
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.
      
       2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
          properly.  From Ezequiel Garcia.
      
       3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.
      
       4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.
      
       5) Fix deadlock on team->lock when propagating features, from Ivan
          Vecera.
      
       6) Work around buffer offset hw bug in alx chips, from Feng Tang.
      
       7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
          Long.
      
       8) Various statistics bug fixes in mlx4 from Eric Dumazet.
      
       9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.
      
      10) All of l2tp was namespace aware, but the ipv6 support code was not
          doing so.  From Shmulik Ladkani.
      
      11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.
      
      12) Propagate MAC changes properly through VLAN devices, from Mike
          Manning.
      
      13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
        sfc: Track RPS flow IDs per channel instead of per function
        usbnet: smsc95xx: fix link detection for disabled autonegotiation
        virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
        bnx2x: avoid leaking memory on bnx2x_init_one() failures
        fou: fix IPv6 Kconfig options
        openvswitch: update checksum in {push,pop}_mpls
        sctp: sctp_diag should dump sctp socket type
        net: fec: update dirty_tx even if no skb
        vlan: Propagate MAC address to VLANs
        atm: iphase: off by one in rx_pkt()
        atm: firestream: add more reserved strings
        vxlan: Accept user specified MTU value when create new vxlan link
        net: pktgen: Call destroy_hrtimer_on_stack()
        timer: Export destroy_hrtimer_on_stack()
        net: l2tp: Make l2tp_ip6 namespace aware
        Documentation: ip-sysctl.txt: clarify secure_redirects
        sfc: use flow dissector helpers for aRFS
        ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
        net: nps_enet: Disable interrupts before napi reschedule
        net/lapb: tuse %*ph to dump buffers
        ...
      6b15d665
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 58c1f995
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "sparc64 mmu context allocation and trap return bug fixes"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix return from trap window fill crashes.
        sparc: Harden signal return frame checks.
        sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
      58c1f995
    • sfc: Track RPS flow IDs per channel instead of per function · faf8dcc1
      Jon Cooper authored
      Otherwise we get confused when two flows on different channels get the
       same flow ID.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      faf8dcc1
  4. 31 May, 2016 17 commits