1. 14 Jan, 2017 22 commits
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() call in set_user_nice() · 2fb8d367
      Peter Zijlstra authored
      Address this rq-clock update bug:
      
        WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          set_next_entity()
          ? _raw_spin_lock()
          set_curr_task_fair()
          set_user_nice.part.85()
          set_user_nice()
          create_worker()
          worker_thread()
          kthread()
          ret_from_fork()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2fb8d367
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() call for task_hot() · 3bed5e21
      Peter Zijlstra authored
      Add the update_rq_clock() call at the top of the callstack instead of
      at the bottom where we find it missing, this to aid later effort to
      minimize the number of update_rq_lock() calls.
      
        WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          assert_clock_updated.isra.63.part.64()
          can_migrate_task()
          load_balance()
          pick_next_task_fair()
          __schedule()
          schedule()
          worker_thread()
          kthread()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3bed5e21
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() in detach_task_cfs_rq() · 80f5c1b8
      Peter Zijlstra authored
      Instead of adding the update_rq_clock() all the way at the bottom of
      the callstack, add one at the top, this to aid later effort to
      minimize update_rq_lock() calls.
      
        WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          detach_task_cfs_rq()
          switched_from_fair()
          __sched_setscheduler()
          _sched_setscheduler()
          sched_set_stop_task()
          cpu_stop_create()
          __smpboot_create_thread.part.2()
          smpboot_register_percpu_thread_cpumask()
          cpu_stop_init()
          do_one_initcall()
          ? print_cpu_info()
          kernel_init_freeable()
          ? rest_init()
          kernel_init()
          ret_from_fork()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      80f5c1b8
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() in post_init_entity_util_avg() · 4126bad6
      Peter Zijlstra authored
      Address this rq-clock update bug:
      
        WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          __warn()
          post_init_entity_util_avg()
          wake_up_new_task()
          _do_fork()
          kernel_thread()
          rest_init()
          start_kernel()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4126bad6
    • Matt Fleming's avatar
      sched/fair: Push rq lock pin/unpin into idle_balance() · 46f69fa3
      Matt Fleming authored
      Future patches will emit warnings if rq_clock() is called before
      update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.
      
      Since there is only one caller of idle_balance() we can push the
      unpin/repin there.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-7-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      46f69fa3
    • Matt Fleming's avatar
      sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock · 92509b73
      Matt Fleming authored
      rq_clock() is called from sched_info_{depart,arrive}() after resetting
      RQCF_ACT_SKIP but prior to a call to update_rq_clock().
      
      In preparation for pending patches that check whether the rq clock has
      been updated inside of a pin context before rq_clock() is called, move
      the reset of rq->clock_skip_update immediately before unpinning the rq
      lock.
      
      This will avoid the new warnings which check if update_rq_clock() is
      being actively skipped.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-6-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      92509b73
    • Matt Fleming's avatar
      sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming authored
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d8ac8971
    • Frederic Weisbecker's avatar
      sched/cputime: Rename vtime_account_user() to vtime_flush() · c8d7dabf
      Frederic Weisbecker authored
      CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and
      account it on ticks and context switches only through the
      vtime_account_user() function.
      
      Now this model has been generalized on the 3 archs for all kind of
      cputime (system, irq, ...) and all the cputime flushing happens under
      vtime_account_user().
      
      So let's rename this function to better reflect its new role.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c8d7dabf
    • Martin Schwidefsky's avatar
      sched/cputime, s390: Implement delayed accounting of system time · b7394a5f
      Martin Schwidefsky authored
      The account_system_time() function is called with a cputime that
      occurred while running in the kernel. The function detects which
      context the CPU is currently running in and accounts the time to
      the correct bucket. This forces the arch code to account the
      cputime for hardirq and softirq immediately.
      
      Such accounting function can be costly and perform unwelcome divisions
      and multiplications, among others.
      
      The arch code can delay the accounting for system time. For s390
      the accounting is done once per timer tick and for each task switch.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      [ Rebase against latest linus tree and move account_system_index_scaled(). ]
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-10-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b7394a5f
    • Frederic Weisbecker's avatar
      sched/cputime, ia64: Accumulate cputime and account only on tick/task switch · 7dd58230
      Frederic Weisbecker authored
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc...
      
      Calling functions such as account_system_time(), account_user_time()
      and such can be costly, especially if they are called on many fastpath
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.
      
      So lets accumulate the cputime instead and delay the accounting on ticks
      and context switches only.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-9-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7dd58230
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch · a19ff1a2
      Frederic Weisbecker authored
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc...
      
      Calling functions such as account_system_time(), account_user_time()
      and such can be costly, especially if they are called on many fastpath
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.
      
      So lets accumulate the cputime instead and delay the accounting on ticks
      and context switches only.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-8-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a19ff1a2
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc: Migrate stolen_time field to the accounting structure · f828c3d0
      Frederic Weisbecker authored
      That in order to gather all cputime accumulation to the same place.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f828c3d0
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick · 8c8b73c4
      Frederic Weisbecker authored
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, provide finegrained accumulators to
      powerpc in order to store the cputime until flushing.
      
      While at it, normalize the name of several fields according to common
      cputime naming.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8c8b73c4
    • Frederic Weisbecker's avatar
      sched/cputime: Export account_guest_time() · 1213699a
      Frederic Weisbecker authored
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's allow archs to account cputime
      directly to gtime.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-5-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1213699a
    • Frederic Weisbecker's avatar
      sched/cputime: Allow accounting system time using cpustat index · c31cc6a5
      Frederic Weisbecker authored
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's provide APIs to account system
      time to precise contexts: hardirq, softirq, pure system, ...
      Inspired-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-4-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c31cc6a5
    • Frederic Weisbecker's avatar
      sched/cputime, ia64: Fix incorrect start cputime assignment on task switch · 8388d214
      Frederic Weisbecker authored
      On task switch we must initialize the current cputime of the next task
      using the value of the previous task which got freshly updated.
      
      But we are confusing that with doing the opposite, which should result
      in incorrect cputime accounting.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-3-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8388d214
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc32: Fix stale scaled stime on context switch · 90d08ba2
      Frederic Weisbecker authored
      On context switch with powerpc32, the cputime is accumulated in the
      thread_info struct. So the switching-in task must move forward its
      start time snapshot to the current time in order to later compute the
      delta spent in system mode.
      
      This is what we do for the normal cputime by initializing the starttime
      field to the value of the previous task's starttime which got freshly
      updated.
      
      But we are missing the update of the scaled cputime start time. As a
      result we may be accounting too much scaled cputime later.
      
      Fix this by initializing the scaled cputime the same way we do for
      normal cputime.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      90d08ba2
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from  from btrfs_uuid_tree_iterate
      e96f8f18
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
      04e39627
    • Linus Torvalds's avatar
      Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio · af54efa4
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
      
       - Export and make use of has_capability() to fix incorrect use of
         ns_capable() for testing task capabilities (Jike Song)
      
      * tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Remove pid_namespace.h include
        vfio iommu type1: fix the testing of capability for remote task
        capability: export has_capability
        vfio-mdev: remove some dead code
        vfio-mdev: buffer overflow in ioctl()
        vfio-mdev: return -EFAULT if copy_to_user() fails
      af54efa4
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 406732c9
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
      
       - fix for module unload vs deferred jump labels (note: there might be
         other buggy modules!)
      
       - two NULL pointer dereferences from syzkaller
      
       - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
         made worse during this merge window, "just" kernel memory leak on
         releases
      
       - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix emulation of "MOV SS, null selector"
        KVM: x86: fix NULL deref in vcpu_scan_ioapic
        KVM: eventfd: fix NULL deref irqbypass consumer
        KVM: x86: Introduce segmented_write_std
        KVM: x86: flush pending lapic jump label updates on module unload
        jump_labels: API for flushing deferred jump label updates
      406732c9
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a65c9259
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix huge_ptep_set_access_flags() to return "changed" when any of the
         ptes in the contiguous range is changed, not just the last one
      
       - Fix the adr_l assembly macro to work in modules under KASLR
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: assembler: make adr_l work in modules under KASLR
        arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
      a65c9259
  2. 13 Jan, 2017 3 commits
  3. 12 Jan, 2017 15 commits
    • Jike Song's avatar
      vfio iommu type1: fix the testing of capability for remote task · d1b333d1
      Jike Song authored
      Before the mdev enhancement type1 iommu used capable() to test the
      capability of current task; in the course of mdev development a
      new requirement, testing for another task other than current, was
      raised.  ns_capable() was used for this purpose, however it still
      tests current, the only difference is, in a specified namespace.
      
      Fix it by using has_capability() instead, which tests the cap for
      specified task in init_user_ns, the same namespace as capable().
      
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: default avatarJike Song <jike.song@intel.com>
      Reviewed-by: default avatarJames Morris <james.l.morris@oracle.com>
      Reviewed-by: default avatarKirti Wankhede <kwankhede@nvidia.com>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      d1b333d1
    • Linus Torvalds's avatar
      Merge tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 557ed56c
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This time we got a few more fixes than the previous rc's, and most of
        commits were about ASoC.
      
        The only significant change in the core side is the regression fix wrt
        the aux device list handling, and all the rest are driver-specific
        small / trivial fixes"
      
      * tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Add a quirk for Plantronics BT600
        ASoC: rt5645: set sel_i2s_pre_div1 to 2
        ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
        ASoC: Intel: Skylake: Release FW ctx in cleanup
        ASoC: Intel: bytcr-rt5640: fix settings in internal clock mode
        ASoC: fsl_ssi: set fifo watermark to more reliable value
        ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
        ASoC: nau8825: correct the function name of register
        ASoC: Intel: Skylake: Fix to fail safely if module not available in path
        ASoC: tlv320aic3x: Mark the RESET register as volatile
        ASoC: Fix binding and probing of auxiliary components
        ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
        ASoC: Intel: bytcr_rt5640: fallback mechanism if MCLK is not enabled
        ASoC: hdmi-codec: use unsigned type to structure members with bit-field
        ASoC: topology: kfree kcontrol->private_value before freeing kcontrol
        ASoC: rsnd: don't double free kctrl
        ASoC: dwc: Fix PIO mode initialization
      557ed56c
    • Linus Torvalds's avatar
      Merge tag 'xfs-for-linus-4.10-rc4-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e28ac1fc
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "As promised last week, here's some stability fixes from Christoph and
        Jan Kara:
      
         - fix free space request handling when low on disk space
      
         - remove redundant log failure error messages
      
         - free truncated dirty pages instead of letting them build up
           forever"
      
      * tag 'xfs-for-linus-4.10-rc4-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Timely free truncated dirty pages
        xfs: don't print warnings when xfs_log_force fails
        xfs: don't rely on ->total in xfs_alloc_space_available
        xfs: adjust allocation length in xfs_alloc_space_available
        xfs: fix bogus minleft manipulations
        xfs: bump up reserved blocks in xfs_alloc_set_aside
      e28ac1fc
    • Linus Torvalds's avatar
      Merge tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteproc · 9ca277eb
      Linus Torvalds authored
      Pull remoteproc fixes from Bjorn Andersson:
       "This fixes two regressions that have been reported to be introduced in
        v4.10-rc1.
      
         - correct an incorrect usage of the kref api
      
         - revert the change to make the resource table read-only. As the
           space each vdev resource is used as virtio device config space it
           must be shared with the remote"
      
      * tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteproc:
        Revert "remoteproc: Merge table_ptr and cached_table pointers"
        remoteproc: fix vdev reference management
      9ca277eb
    • Linus Torvalds's avatar
      Merge tag 'rpmsg-v4.10-fixes' of git://github.com/andersson/remoteproc · 1d865da7
      Linus Torvalds authored
      Pull rpmsg fixes from Bjorn Andersson:
       "This fixes a regression introduced in v4.10-rc1 that prohibits
        multiple channels with the same name but different endpoint addresses
        to be used"
      
      * tag 'rpmsg-v4.10-fixes' of git://github.com/andersson/remoteproc:
        rpmsg: virtio_rpmsg_bus: fix channel creation
      1d865da7
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 95ce1313
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - device descriptor length validation fix to hid-cypress driver from
         Greg
      
       - introduction of a short delay into i2c-hid, which is not really
         mandated by the spec, but fixes Asus Touchpads
      
       - Petzl USB connectable flashlight quirk from myself
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: i2c-hid: Add sleep between POWER ON and RESET
        HID: hid-cypress: validate length of report
        HID: ignore Petzl USB headlamp
      95ce1313
    • Linus Torvalds's avatar
      Merge branch 'scsi-target-for-v4.10' of... · cb38b453
      Linus Torvalds authored
      Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
      
      Pull scsi target fixes from Bart Van Assche:
      
       - a series of bug fixes for the XCOPY implementation from David
         Disseldorp
      
       - one bug fix for the ibmvscsis driver, a driver that is used for
         communication between partitions on IBM POWER systems.
      
      * 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux:
        ibmvscsis: Fix srp_transfer_data fail return code
        target: support XCOPY requests without parameters
        target: check for XCOPY parameter truncation
        target: use XCOPY segment descriptor CSCD IDs
        target: check XCOPY segment descriptor CSCD IDs
        target: simplify XCOPY wwn->se_dev lookup helper
        target: return UNSUPPORTED TARGET/SEGMENT DESC TYPE CODE sense
        target: bounds check XCOPY total descriptor list length
        target: bounds check XCOPY segment descriptor list
        target: use XCOPY TOO MANY TARGET DESCRIPTORS sense
        target: add XCOPY target/segment desc sense codes
      cb38b453
    • Geng, Jichao's avatar
      ceph: fix get_oldest_context() · 84fcc2d2
      Geng, Jichao authored
      For no snapshot case, we should use ci->truncate_{seq,size}.
      
      Fixes: 5f743e45 ("ceph: record truncate size/seq for snap data writeback")
      Signed-off-by: default avatarGeng, Jichao <geng.jichao@h3c.com>
      Signed-off-by: default avatarYan, Zheng <zyan@redhat.com>
      84fcc2d2
    • Yan, Zheng's avatar
      ceph: fix mds cluster availability check · cc8e8342
      Yan, Zheng authored
      We should apply the check after getting the initial mdsmap.
      
      Fixes: e9e427f0 ("ceph: check availability of mds cluster on mount")
      Link: http://tracker.ceph.com/issues/18161Signed-off-by: default avatarYan, Zheng <zyan@redhat.com>
      cc8e8342
    • Linus Torvalds's avatar
      Merge tag 'md/4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · 607ae5f2
      Linus Torvalds authored
      Pull md fixes from Shaohua Li:
       "Basically one fix for raid5 cache which is merged in this cycle,
        others are trival fixes"
      
      * tag 'md/4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        md/raid5: Use correct IS_ERR() variation on pointer check
        md: cleanup mddev flag clear for takeover
        md/r5cache: fix spelling mistake on "recoverying"
        md/r5cache: assign conf->log before r5l_load_log()
        md/r5cache: simplify handling of sh->log_start in recovery
        md/raid5-cache: removes unnecessary write-through mode judgments
        md/raid10: Refactor raid10_make_request
        md/raid1: Refactor raid1_make_request
      607ae5f2
    • Ard Biesheuvel's avatar
      arm64: assembler: make adr_l work in modules under KASLR · 41c066f2
      Ard Biesheuvel authored
      When CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, the offset between loaded
      modules and the core kernel may exceed 4 GB, putting symbols exported
      by the core kernel out of the reach of the ordinary adrp/add instruction
      pairs used to generate relative symbol references. So make the adr_l
      macro emit a movz/movk sequence instead when executing in module context.
      
      While at it, remove the pointless special case for the stack pointer.
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      41c066f2
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 33ab9110
      Paolo Bonzini authored
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3
      Cc: stable@nongnu.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      33ab9110
    • Jike Song's avatar
      capability: export has_capability · 19c816e8
      Jike Song authored
      has_capability() is sometimes needed by modules to test capability
      for specified task other than current, so export it.
      
      Cc: Kirti Wankhede <kwankhede@nvidia.com>
      Signed-off-by: default avatarJike Song <jike.song@intel.com>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      Acked-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      19c816e8
    • Wanpeng Li's avatar
      KVM: x86: fix NULL deref in vcpu_scan_ioapic · 546d87e5
      Wanpeng Li authored
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
          IP: _raw_spin_lock+0xc/0x30
          PGD 3e28eb067
          PUD 3f0ac6067
          PMD 0
          Oops: 0002 [#1] SMP
          CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
          Call Trace:
           ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
           kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
           ? pick_next_task_fair+0xe1/0x4e0
           ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
           kvm_vcpu_ioctl+0x33a/0x600 [kvm]
           ? hrtimer_try_to_cancel+0x29/0x130
           ? do_nanosleep+0x97/0xf0
           do_vfs_ioctl+0xa1/0x5d0
           ? __hrtimer_init+0x90/0x90
           ? do_nanosleep+0x5b/0xf0
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x6e/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0
      
      The syzkaller folks reported a NULL pointer dereference due to
      ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
      synthetic interrupt controller is activated, resulting in a
      wrong request to rescan the ioapic and a NULL pointer dereference.
      
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/types.h>
          #include <linux/kvm.h>
          #include <pthread.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #ifndef KVM_CAP_HYPERV_SYNIC
          #define KVM_CAP_HYPERV_SYNIC 123
          #endif
      
          void* thr(void* arg)
          {
      	struct kvm_enable_cap cap;
      	cap.flags = 0;
      	cap.cap = KVM_CAP_HYPERV_SYNIC;
      	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
      	return 0;
          }
      
          int main()
          {
      	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	int kvmfd = open("/dev/kvm", 0);
      	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
      	struct kvm_userspace_memory_region memreg;
      	memreg.slot = 0;
      	memreg.flags = 0;
      	memreg.guest_phys_addr = 0;
      	memreg.memory_size = 0x1000;
      	memreg.userspace_addr = (unsigned long)host_mem;
      	host_mem[0] = 0xf4;
      	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
      	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
      	struct kvm_sregs sregs;
      	ioctl(cpufd, KVM_GET_SREGS, &sregs);
      	sregs.cr0 = 0;
      	sregs.cr4 = 0;
      	sregs.efer = 0;
      	sregs.cs.selector = 0;
      	sregs.cs.base = 0;
      	ioctl(cpufd, KVM_SET_SREGS, &sregs);
      	struct kvm_regs regs = { .rflags = 2 };
      	ioctl(cpufd, KVM_SET_REGS, &regs);
      	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
      	pthread_t th;
      	pthread_create(&th, 0, thr, (void*)(long)cpufd);
      	usleep(rand() % 10000);
      	ioctl(cpufd, KVM_RUN, 0);
      	pthread_join(th, 0);
      	return 0;
          }
      
      This patch fixes it by failing ENABLE_CAP if without an irqchip.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller)
      Cc: stable@vger.kernel.org # 4.5+
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      546d87e5
    • Wanpeng Li's avatar
      KVM: eventfd: fix NULL deref irqbypass consumer · 4f3dbdf4
      Wanpeng Li authored
      Reported syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
      
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           process_one_work+0x16b/0x480
           worker_thread+0x4b/0x500
           kthread+0x101/0x140
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
           ret_from_fork+0x25/0x30
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      
      The syzkaller folks reported a NULL pointer dereference that due to
      unregister an consumer which fails registration before. The syzkaller
      creates two VMs w/ an equal eventfd occasionally. So the second VM
      fails to register an irqbypass consumer. It will make irqfd as inactive
      and queue an workqueue work to shutdown irqfd and unregister the irqbypass
      consumer when eventfd is closed. However, the second consumer has been
      initialized though it fails registration. So the token(same as the first
      VM's) is taken to unregister the consumer through the workqueue, the
      consumer of the first VM is found and unregistered, then NULL deref incurred
      in the path of deleting consumer from the consumers list.
      
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      looks for the consumer entry based on consumer pointer itself instead of
      token matching.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Suggested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4f3dbdf4