1. 14 Jan, 2017 23 commits
    • Chris Wilson's avatar
      locking/ww_mutex: Add kselftests for ww_mutex AA deadlock detection · c22fb380
      Chris Wilson authored
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Nicolai Hähnle <nhaehnle@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161201114711.28697-5-chris@chris-wilson.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c22fb380
    • Chris Wilson's avatar
      locking/ww_mutex: Begin kselftests for ww_mutex · f2a5fec1
      Chris Wilson authored
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Nicolai Hähnle <nhaehnle@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161201114711.28697-4-chris@chris-wilson.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f2a5fec1
    • Chris Wilson's avatar
      locking/ww_mutex: Add ww_mutex to locktorture test · 0186a6cb
      Chris Wilson authored
      Although ww_mutexes degenerate into mutexes, it would be useful to
      torture the deadlock handling between multiple ww_mutexes in addition to
      torturing the regular mutexes.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Nicolai Hähnle <nhaehnle@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161201114711.28697-3-chris@chris-wilson.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0186a6cb
    • Chris Wilson's avatar
      locking/ww_mutex: Fix compilation of __WW_MUTEX_INITIALIZER · af2e859e
      Chris Wilson authored
      From conflicting macro parameters, passing the wrong name to
      __MUTEX_INITIALIZER and a stray '\', #define __WW_MUTEX_INITIALIZER was
      very unhappy.
      
      One unnecessary change was to choose to pass &ww_class instead of
      implicitly taking the address of the class within the macro.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 1b375dc3 ("mutex: Move ww_mutex definitions to ww_mutex.h")
      Link: http://lkml.kernel.org/r/20161201114711.28697-2-chris@chris-wilson.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      af2e859e
    • Nicolai Hähnle's avatar
      locking/ww_mutex/Documentation: Update the design document · 27bd57aa
      Nicolai Hähnle authored
      Document the invariants we maintain for the wait list of ww_mutexes.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-13-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      27bd57aa
    • Nicolai Hähnle's avatar
      locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging · 977625a6
      Nicolai Hähnle authored
      Help catch cases where mutex_lock is used directly on w/w mutexes, which
      otherwise result in the w/w tasks reading uninitialized data.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-12-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      977625a6
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Optimize ww-mutexes by yielding to other waiters from optimistic spin · c516df97
      Nicolai Hähnle authored
      Lock stealing is less beneficial for w/w mutexes since we may just end up
      backing off if we stole from a thread with an earlier acquire stamp that
      already holds another w/w mutex that we also need. So don't spin
      optimistically unless we are sure that there is no other waiter that might
      cause us to back off.
      
      Median timings taken of a contention-heavy GPU workload:
      
      Before:
      
        real    0m52.946s
        user    0m7.272s
        sys     1m55.964s
      
      After:
      
        real    0m53.086s
        user    0m7.360s
        sys     1m46.204s
      
      This particular workload still spends 20%-25% of CPU in mutex_spin_on_owner
      according to perf, but my attempts to further reduce this spinning based on
      various heuristics all lead to an increase in measured wall time despite
      the decrease in sys time.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-11-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c516df97
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop · 25f13b40
      Nicolai Hähnle authored
      In the following scenario, thread #1 should back off its attempt to lock
      ww1 and unlock ww2 (assuming the acquire context stamps are ordered
      accordingly).
      
          Thread #0               Thread #1
          ---------               ---------
                                  successfully lock ww2
          set ww1->base.owner
                                  attempt to lock ww1
                                  confirm ww1->ctx == NULL
                                  enter mutex_spin_on_owner
          set ww1->ctx
      
      What was likely to happen previously is:
      
          attempt to lock ww2
          refuse to spin because
            ww2->ctx != NULL
          schedule()
                                  detect thread #0 is off CPU
                                  stop optimistic spin
                                  return -EDEADLK
                                  unlock ww2
                                  wakeup thread #0
          lock ww2
      
      Now, we are more likely to see:
      
                                  detect ww1->ctx != NULL
                                  stop optimistic spin
                                  return -EDEADLK
                                  unlock ww2
          successfully lock ww2
      
      ... because thread #1 will stop its optimistic spin as soon as possible.
      
      The whole scenario is quite unlikely, since it requires thread #1 to get
      between thread #0 setting the owner and setting the ctx. But since we're
      idling here anyway, the additional check is basically free.
      
      Found by inspection.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-10-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      25f13b40
    • Peter Zijlstra's avatar
      locking/mutex: Improve inlining · 427b1820
      Peter Zijlstra authored
      Instead of inlining __mutex_lock_common() 5 times, once for each
      {state,ww} variant. Reduce this to two, ww and !ww.
      
      Then add __always_inline to mutex_optimistic_spin(), so that that will
      get inlined all 4 remaining times, for all {waiter,ww} variants.
      
         text    data     bss     dec     hex filename
      
         6301       0       0    6301    189d defconfig-build/kernel/locking/mutex.o
         4053       0       0    4053     fd5 defconfig-build/kernel/locking/mutex.o
         4257       0       0    4257    10a1 defconfig-build/kernel/locking/mutex.o
      
      This reduces total text size and better separates the ww and !ww mutex
      code generation.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      427b1820
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Optimize ww-mutexes by waking at most one waiter for backoff... · 659cf9f5
      Nicolai Hähnle authored
      locking/ww_mutex: Optimize ww-mutexes by waking at most one waiter for backoff when acquiring the lock
      
      The wait list is sorted by stamp order, and the only waiting task that may
      have to back off is the first waiter with a context.
      
      The regular slow path does not have to wake any other tasks at all, since
      all other waiters that would have to back off were either woken up when
      the waiter was added to the list, or detected the condition before they
      added themselves.
      
      Median timings taken of a contention-heavy GPU workload:
      
      Without this series:
      
        real    0m59.900s
        user    0m7.516s
        sys     2m16.076s
      
      With changes up to and including this patch:
      
        real    0m52.946s
        user    0m7.272s
        sys     1m55.964s
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-9-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      659cf9f5
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list · 200b1874
      Nicolai Hähnle authored
      While adding our task as a waiter, detect if another task should back off
      because of us.
      
      With this patch, we establish the invariant that the wait list contains
      at most one (sleeping) waiter with ww_ctx->acquired > 0, and this waiter
      will be the first waiter with a context.
      
      Since only waiters with ww_ctx->acquired > 0 have to back off, this allows
      us to be much more economical with wakeups.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-8-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      200b1874
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Add waiters in stamp order · 6baa5c60
      Nicolai Hähnle authored
      Add regular waiters in stamp order. Keep adding waiters that have no
      context in FIFO order and take care not to starve them.
      
      While adding our task as a waiter, back off if we detect that there is
      a waiter with a lower stamp in front of us.
      
      Make sure to call lock_contended even when we back off early.
      
      For w/w mutexes, being first in the wait list is only stable when
      taking the lock without a context. Therefore, the purpose of the first
      flag is split into two: 'first' remains to indicate whether we want to
      spin optimistically, while 'handoff' indicates that we should be
      prepared to accept a handoff.
      
      For w/w locking with a context, we always accept handoffs after the
      first schedule(), to handle the following sequence of events:
      
       1. Task #0 unlocks and hands off to Task #2 which is first in line
      
       2. Task #1 adds itself in front of Task #2
      
       3. Task #2 wakes up and must accept the handoff even though it is no
          longer first in line
      Signed-off-by: default avatarNicolai Hähnle <nicolai.haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <Nicolai.Haehnle@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-7-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6baa5c60
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Remove the __ww_mutex_lock*() inline wrappers · c5470b22
      Nicolai Hähnle authored
      Keep the documentation in the header file since there is no good place
      for it in mutex.c: there are two rather different implementations with
      different EXPORT_SYMBOLs for each function.
      Signed-off-by: default avatarNicolai Hähnle <nicolai.haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <Nicolai.Haehnle@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-6-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c5470b22
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Set use_ww_ctx even when locking without a context · ea9e0fb8
      Nicolai Hähnle authored
      We will add a new field to struct mutex_waiter.  This field must be
      initialized for all waiters if any waiter uses the ww_use_ctx path.
      
      So there is a trade-off: Keep ww_mutex locking without a context on
      the faster non-use_ww_ctx path, at the cost of adding the
      initialization to all mutex locks (including non-ww_mutexes), or avoid
      the additional cost for non-ww_mutex locks, at the cost of adding
      additional checks to the use_ww_ctx path.
      
      We take the latter choice.  It may be worth eliminating the users of
      ww_mutex_lock(lock, NULL), but there are a lot of them.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-5-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ea9e0fb8
    • Nicolai Hähnle's avatar
      locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after() · 3822da3e
      Nicolai Hähnle authored
      The function will be re-used in subsequent patches.
      Signed-off-by: default avatarNicolai Hähnle <Nicolai.Haehnle@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <dev@mblankhorst.nl>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/1482346000-9927-4-git-send-email-nhaehnle@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3822da3e
    • Peter Zijlstra's avatar
      locking/mutex: Fix mutex handoff · e274795e
      Peter Zijlstra authored
      While reviewing the ww_mutex patches, I noticed that it was still
      possible to (incorrectly) succeed for (incorrect) code like:
      
      	mutex_lock(&a);
      	mutex_lock(&a);
      
      This was possible if the second mutex_lock() would block (as expected)
      but then receive a spurious wakeup. At that point it would find itself
      at the front of the queue, request a handoff and instantly claim
      ownership and continue, since owner would point to itself.
      
      Avoid this scenario and simplify the code by introducing a third low
      bit to signal handoff pickup. So once we request handoff, unlock
      clears the handoff bit and sets the pickup bit along with the new
      owner.
      
      This also removes the need for the .handoff argument to
      __mutex_trylock(), since that becomes superfluous with PICKUP.
      
      In order to guarantee enough low bits, ensure task_struct alignment is
      at least L1_CACHE_BYTES (which seems a good ideal regardless).
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d659ae1 ("locking/mutex: Add lock handoff to avoid starvation")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e274795e
    • Davidlohr Bueso's avatar
      locking/percpu-rwsem: Replace waitqueue with rcuwait · 52b94129
      Davidlohr Bueso authored
      The use of any kind of wait queue is an overkill for pcpu-rwsems.
      While one option would be to use the less heavy simple (swait)
      flavor, this is still too much for what pcpu-rwsems needs. For one,
      we do not care about any sort of queuing in that the only (rare) time
      writers (and readers, for that matter) are queued is when trying to
      acquire the regular contended rw_sem. There cannot be any further
      queuing as writers are serialized by the rw_sem in the first place.
      
      Given that percpu_down_write() must not be called after exit_notify(),
      we can replace the bulky waitqueue with rcuwait such that a writer
      can wait for its turn to take the lock. As such, we can avoid the
      queue handling and locking overhead.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1484148146-14210-3-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      52b94129
    • Davidlohr Bueso's avatar
      sched/wait, RCU: Introduce rcuwait machinery · 8f95c90c
      Davidlohr Bueso authored
      rcuwait provides support for (single) RCU-safe task wait/wake functionality,
      with the caveat that it must not be called after exit_notify(), such that
      we avoid racing with rcu delayed_put_task_struct callbacks, task_struct
      being rcu unaware in this context -- for which we similarly have
      task_rcu_dereference() magic, but with different return semantics, which
      can conflict with the wakeup side.
      
      The interfaces are quite straightforward:
      
        rcuwait_wait_event()
        rcuwait_wake_up()
      
      More details are in the comments, but it's perhaps worth mentioning at least,
      that users must provide proper serialization when waiting on a condition, and
      avoid corrupting a concurrent waiter. Also care must be taken between the task
      and the condition for when calling the wakeup -- we cannot miss wakeups. When
      porting users, this is for example, a given when using waitqueues in that
      everything is done under the q->lock. As such, it can remove sources of non
      preemptable unbounded work for realtime.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8f95c90c
    • Davidlohr Bueso's avatar
      sched/core: Remove set_task_state() · 642fa448
      Davidlohr Bueso authored
      This is a nasty interface and setting the state of a foreign task must
      not be done. As of the following commit:
      
        be628be0 ("bcache: Make gc wakeup sane, remove set_task_state()")
      
      ... everyone in the kernel calls set_task_state() with current, allowing
      the helper to be removed.
      
      However, as the comment indicates, it is still around for those archs
      where computing current is more expensive than using a pointer, at least
      in theory. An important arch that is affected is arm64, however this has
      been addressed now [1] and performance is up to par making no difference
      with either calls.
      
      Of all the callers, if any, it's the locking bits that would care most
      about this -- ie: we end up passing a tsk pointer to a lot of the lock
      slowpath, and setting ->state on that. The following numbers are based
      on two tests: a custom ad-hoc microbenchmark that just measures
      latencies (for ~65 million calls) between get_task_state() vs
      get_current_state().
      
      Secondly for a higher overview, an unlink microbenchmark was used,
      which pounds on a single file with open, close,unlink combos with
      increasing thread counts (up to 4x ncpus). While the workload is quite
      unrealistic, it does contend a lot on the inode mutex or now rwsem.
      
      [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():  938 msecs
      Avg runtime set_current_state: 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      
      x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
      and ppc64 (with paca) show similar improvements in the unlink microbenches.
      The small delta for ppc64 (2ms), does not represent the gains on the unlink
      runs. In the case of x86, there was a decent amount of variation in the
      latency runs, but always within a 20 to 50ms increase), ppc was more constant.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      642fa448
    • Davidlohr Bueso's avatar
      kernel/locking: Compute 'current' directly · d269a8b8
      Davidlohr Bueso authored
      This patch effectively replaces the tsk pointer dereference
      (which is obviously == current), to directly use get_current()
      macro. This is to make the removal of setting foreign task
      states smoother and painfully obvious. Performance win on some
      archs such as x86-64 and ppc64. On a microbenchmark that calls
      set_task_state() vs set_current_state() and an inode rwsem
      pounding benchmark doing unlink:
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():  938 msecs
      Avg runtime set_current_state: 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-4-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d269a8b8
    • Davidlohr Bueso's avatar
      drivers/tty: Compute 'current' directly · 5376f2e7
      Davidlohr Bueso authored
      This patch effectively replaces the tsk pointer dereference
      (which is obviously == current), to directly use get_current()
      macro. This is to make the removal of setting foreign task
      states smoother and painfully obvious. Performance win on some
      archs such as x86-64 and ppc64 -- arm64 is no longer an issue.
      Signed-off-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-3-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5376f2e7
    • Davidlohr Bueso's avatar
      kernel/exit: Compute 'current' directly · 0039962a
      Davidlohr Bueso authored
      This patch effectively replaces the tsk pointer dereference (which is
      obviously == current), to directly use get_current() macro. In this
      case, do_exit() always passes current to exit_mm(), hence we can
      simply get rid of the argument. This is also a performance win on some
      archs such as x86-64 and ppc64 -- arm64 is no longer an issue.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-2-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0039962a
    • Waiman Long's avatar
      locking/spinlocks/x86, paravirt: Remove paravirt_ticketlocks_enabled · aef591cd
      Waiman Long authored
      This is a follow-up of commit:
      
        cfd8983f ("x86, locking/spinlocks: Remove ticket (spin)lock implementation")
      
      The static_key structure 'paravirt_ticketlocks_enabled' is now removed as it is
      no longer used.
      
      As a result, the init functions kvm_spinlock_init_jump() and
      xen_init_spinlocks_jump() are also removed.
      
      A simple build and boot test was done to verify it.
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1484252878-1962-1-git-send-email-longman@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      aef591cd
  2. 12 Jan, 2017 3 commits
  3. 11 Jan, 2017 14 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · ba836a6f
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "27 fixes.
      
        There are three patches that aren't actually fixes. They're simple
        function renamings which are nice-to-have in mainline as ongoing net
        development depends on them."
      
      * akpm: (27 commits)
        timerfd: export defines to userspace
        mm/hugetlb.c: fix reservation race when freeing surplus pages
        mm/slab.c: fix SLAB freelist randomization duplicate entries
        zram: support BDI_CAP_STABLE_WRITES
        zram: revalidate disk under init_lock
        mm: support anonymous stable page
        mm: add documentation for page fragment APIs
        mm: rename __page_frag functions to __page_frag_cache, drop order from drain
        mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
        mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
        mm: don't dereference struct page fields of invalid pages
        mailmap: add codeaurora.org names for nameless email commits
        signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
        mm: pmd dirty emulation in page fault handler
        ipc/sem.c: fix incorrect sem_lock pairing
        lib/Kconfig.debug: fix frv build failure
        mm: get rid of __GFP_OTHER_NODE
        mm: fix remote numa hits statistics
        mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
        ocfs2: fix crash caused by stale lvb with fsdlm plugin
        ...
      ba836a6f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · cff3b2c4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix rtlwifi crash, from Larry Finger.
      
       2) Memory disclosure in appletalk ipddp routing code, from Vlad
          Tsyrklevich.
      
       3) r8152 can erroneously split an RX packet into multiple URBs if the
          Rx FIFO is not empty when we suspend. Fix this by waiting for the
          FIFO to empty before suspending. From Hayes Wang.
      
       4) Two GRO fixes (enter slow path when not enough SKB tail room exists,
          disable frag0 optimizations when there are IPV6 extension headers)
          from Eric Dumazet and Herbert Xu.
      
       5) A series of mlx5e bug fixes (do source udp port offloading for
          tunnels properly, Ip fragment matching fixes, handling firmware
          errors properly when installing TC rules, etc.) from Saeed Mahameed,
          Or Gerlitz, Roi Dayan, Hadar Hen Zion, Gil Rockah, and Daniel
          Jurgens.
      
       6) Two VRF fixes from David Ahern (don't skip multipath selection for
          VRF paths, disallow VRF to be configured with table ID 0).
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        net: vrf: do not allow table id 0
        net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
        sctp: Fix spelling mistake: "Atempt" -> "Attempt"
        net: ipv4: Fix multipath selection with vrf
        cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
        gro: use min_t() in skb_gro_reset_offset()
        net/mlx5: Only cancel recovery work when cleaning up device
        net/mlx5e: Remove WARN_ONCE from adaptive moderation code
        net/mlx5e: Un-register uplink representor on nic_disable
        net/mlx5e: Properly handle FW errors while adding TC rules
        net/mlx5e: Fix kbuild warnings for uninitialized parameters
        net/mlx5e: Set inline mode requirements for matching on IP fragments
        net/mlx5e: Properly get address type of encapsulation IP headers
        net/mlx5e: TC ipv4 tunnel encap offload error flow fixes
        net/mlx5e: Warn when rejecting offload attempts of IP tunnels
        net/mlx5e: Properly handle offloading of source udp port for IP tunnels
        gro: Disable frag0 optimization on IPv6 ext headers
        gro: Enter slow-path if there is no tailroom
        mlx4: Return EOPNOTSUPP instead of ENOTSUPP
        net/af_iucv: don't use paged skbs for TX on HiperSockets
        ...
      cff3b2c4
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · a6b6e616
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes a regression in aesni that renders it useless if it's
        built-in with a modular pcbc configuration"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: aesni - Fix failure when built-in with modular pcbc
      a6b6e616
    • David Ahern's avatar
      net: vrf: do not allow table id 0 · 24c63bbc
      David Ahern authored
      Frank reported that vrf devices can be created with a table id of 0.
      This breaks many of the run time table id checks and should not be
      allowed. Detect this condition at create time and fail with EINVAL.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Reported-by: default avatarFrank Kellermann <frank.kellermann@atos.net>
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      24c63bbc
    • Russell King's avatar
      net: phy: marvell: fix Marvell 88E1512 used in SGMII mode · a13c0652
      Russell King authored
      When an Marvell 88E1512 PHY is connected to a nic in SGMII mode, the
      fiber page is used for the SGMII host-side connection.  The PHY driver
      notices that SUPPORTED_FIBRE is set, so it tries reading the fiber page
      for the link status, and ends up reading the MAC-side status instead of
      the outgoing (copper) link.  This leads to incorrect results reported
      via ethtool.
      
      If the PHY is connected via SGMII to the host, ignore the fiber page.
      However, continue to allow the existing power management code to
      suspend and resume the fiber page.
      
      Fixes: 6cfb3bcc ("Marvell phy: check link status in case of fiber link.")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a13c0652
    • Colin Ian King's avatar
      sctp: Fix spelling mistake: "Atempt" -> "Attempt" · eb004603
      Colin Ian King authored
      Trivial fix to spelling mistake in WARN_ONCE message
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb004603
    • David Ahern's avatar
      net: ipv4: Fix multipath selection with vrf · 7a18c5b9
      David Ahern authored
      fib_select_path does not call fib_select_multipath if oif is set in the
      flow struct. For VRF use cases oif is always set, so multipath route
      selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
      check similar to what is done in fib_table_lookup.
      
      Add saddr and proto to the flow struct for the fib lookup done by the
      VRF driver to better match hash computation for a flow.
      
      Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a18c5b9
    • Arnd Bergmann's avatar
      cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig · 73b35147
      Arnd Bergmann authored
      We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
      not right when CONFIG_NET is disabled and there is no socket interface:
      
      warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)
      
      I don't know what the correct solution for this is, but simply removing
      the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
      'if NET' section avoids the warning and does not produce other build
      errors.
      
      Fixes: 483c4933 ("cgroup: Fix CGROUP_BPF config")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73b35147
    • Eric Dumazet's avatar
      gro: use min_t() in skb_gro_reset_offset() · 7cfd5fd5
      Eric Dumazet authored
      On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
      so we shall use min_t() instead of min() to avoid a compiler error.
      
      Fixes: 1272ce87 ("gro: Enter slow-path if there is no tailroom")
      Reported-by: default avatarkernel test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cfd5fd5
    • David S. Miller's avatar
      Merge branch 'mlx5-fixes' · 6c711c86
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox mlx5 fixes and cleanups 2017-01-10
      
      This series includes some mlx5e general cleanups from Daniel, Gil, Hadar
      and myself.
      Also it includes some critical mlx5e TC offloads fixes from Or Gerlitz.
      
      For -stable:
       - net/mlx5e: Remove WARN_ONCE from adaptive moderation code
      
         Although this fix doesn't affect any functionality, I thought it is
         better to clean this -WARN_ONCE- up for -stable in case someone hits
         such corner case.
      
      Please apply and let me know if there's any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c711c86
    • Daniel Jurgens's avatar
      net/mlx5: Only cancel recovery work when cleaning up device · 5e44fca5
      Daniel Jurgens authored
      Do not attempt to drain the health workqueue when unloading the device in
      the recovery flow, this can cause a deadlock when the recovery work
      tries to cancel itself with sync.
      
      Because the work is no longer unconditionally canceled when unloading, it
      must be explicitly canceled in the AER flow.
      
      fixes: 689a248d ("net/mlx5: Cancel recovery work in remove flow")
      Signed-off-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e44fca5
    • Gil Rockah's avatar
      net/mlx5e: Remove WARN_ONCE from adaptive moderation code · 0bbcc0a8
      Gil Rockah authored
      When trying to do interface down or changing interface configuration
      under heavy traffic, some of the adaptive moderation corner cases can
      occur and leave a WARN_ONCE call trace in the kernel log.
      
      Those WARN_ONCE are meant for debug only, and should have been inserted
      only under debug. We avoid such call traces by removing those WARN_ONCE.
      
      Fixes: cb3c7fd4 ("net/mlx5e: Support adaptive RX coalescing")
      Signed-off-by: default avatarGil Rockah <gilr@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0bbcc0a8
    • Saeed Mahameed's avatar
      net/mlx5e: Un-register uplink representor on nic_disable · 3deef8ce
      Saeed Mahameed authored
      The code before this patch registered uplink e-Switch representor
      on nic_enable and unregistered on nic_cleanup, the right place
      for this unregister is in nic_disable.
      
      Fixes: 127ea380 ("net/mlx5: Add Representors registration API")
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: default avatarMohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3deef8ce
    • Or Gerlitz's avatar
      net/mlx5e: Properly handle FW errors while adding TC rules · 5e86397a
      Or Gerlitz authored
      When the firmware returns an error (common example is an attempt to
      add twice the same rule which is refused by the some FWs), we are not
      properly derefing/cleaning few resources allocated on the way.
      Examples are vport vlan deref under eswitch vlan offloads, and encap
      entry/neighbour deref under eswitch encapsulation offloads, fix that.
      
      Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
      Fixes: 8b32580d ('net/mlx5e: Add TC vlan action for SRIOV offloads')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e86397a