1. 29 Jun, 2017 40 commits
    • Nicholas Bellinger's avatar
      target: Fix kref->refcount underflow in transport_cmd_finish_abort · 852a8042
      Nicholas Bellinger authored
      commit 73d4e580 upstream.
      
      This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED
      when a fabric driver drops it's second reference from below the
      target_core_tmr.c based callers of transport_cmd_finish_abort().
      
      Recently with the conversion of kref to refcount_t, this bug was
      manifesting itself as:
      
      [705519.601034] refcount_t: underflow; use-after-free.
      [705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 20116.512 msecs
      [705539.719111] ------------[ cut here ]------------
      [705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 refcount_sub_and_test+0x33/0x51
      
      Since the original kref atomic_t based kref_put() didn't check for
      underflow and only invoked the final callback when zero was reached,
      this bug did not manifest in practice since all se_cmd memory is
      using preallocated tags.
      
      To address this, go ahead and propigate the existing return from
      transport_put_cmd() up via transport_cmd_finish_abort(), and
      change transport_cmd_finish_abort() + core_tmr_handle_tas_abort()
      callers to only do their local target_put_sess_cmd() if necessary.
      Reported-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Tested-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: default avatarGary Guo <ghg@datera.io>
      Tested-by: default avatarChu Yuan Lin <cyl@datera.io>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      852a8042
    • Will Deacon's avatar
      arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · 830146b6
      Will Deacon authored
      commit dbb236c1 upstream.
      
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, by accellerating the MONOTONIC_RAW clockid, it
      uncovered potential 1ns time inconsistencies caused by the
      timekeeping core not handing sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarDaniel Mentz <danielmentz@google.com>
      Acked-by: default avatarKevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      830146b6
    • John Stultz's avatar
      time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 102d12f1
      John Stultz authored
      commit 3d88d56c upstream.
      
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarDaniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      102d12f1
    • John Stultz's avatar
      time: Fix clock->read(clock) race around clocksource changes · fa79bdd4
      John Stultz authored
      commit ceea5e37 upstream.
      
      In tests, which excercise switching of clocksources, a NULL
      pointer dereference can be observed on AMR64 platforms in the
      clocksource read() function:
      
      u64 clocksource_mmio_readl_down(struct clocksource *c)
      {
      	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
      }
      
      This is called from the core timekeeping code via:
      
      	cycle_now = tkr->read(tkr->clock);
      
      tkr->read is the cached tkr->clock->read() function pointer.
      When the clocksource is changed then tkr->clock and tkr->read
      are updated sequentially. The code above results in a sequential
      load operation of tkr->read and tkr->clock as well.
      
      If the store to tkr->clock hits between the loads of tkr->read
      and tkr->clock, then the old read() function is called with the
      new clock pointer. As a consequence the read() function
      dereferences a different data structure and the resulting 'reg'
      pointer can point anywhere including NULL.
      
      This problem was introduced when the timekeeping code was
      switched over to use struct tk_read_base. Before that, it was
      theoretically possible as well when the compiler decided to
      reload clock in the code sequence:
      
           now = tk->clock->read(tk->clock);
      
      Add a helper function which avoids the issue by reading
      tk_read_base->clock once into a local variable clk and then issue
      the read function via clk->read(clk). This guarantees that the
      read() function always gets the proper clocksource pointer handed
      in.
      
      Since there is now no use for the tkr.read pointer, this patch
      also removes it, and to address stopping the fast timekeeper
      during suspend/resume, it introduces a dummy clocksource to use
      rather then just a dummy read function.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa79bdd4
    • Arend Van Spriel's avatar
      brcmfmac: unbind all devices upon failure in firmware callback · 0f3ae212
      Arend Van Spriel authored
      commit 7a51461f upstream.
      
      When request firmware fails, brcmf_ops_sdio_remove is being called and
      brcmf_bus freed. In such circumstancies if you do a suspend/resume cycle
      the kernel hangs on resume due a NULL pointer dereference in resume
      function. So in brcmf_sdio_firmware_callback() we need to unbind the
      driver from both sdio_func devices when firmware load failure is indicated.
      Tested-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      Reviewed-by: default avatarHante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: default avatarPieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: default avatarFranky Lin <franky.lin@broadcom.com>
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f3ae212
    • Arend Van Spriel's avatar
      brcmfmac: use firmware callback upon failure to load · 2cb9b88c
      Arend Van Spriel authored
      commit 03fb0e83 upstream.
      
      When firmware loading failed the code used to unbind the device provided
      by the calling code. However, for the sdio driver two devices are bound
      and both need to be released upon failure. The callback has been extended
      with parameter to pass error code so add that in this commit upon firmware
      loading failure.
      Reviewed-by: default avatarHante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: default avatarPieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: default avatarFranky Lin <franky.lin@broadcom.com>
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cb9b88c
    • Arend Van Spriel's avatar
      brcmfmac: add parameter to pass error code in firmware callback · 1e5c7193
      Arend Van Spriel authored
      commit 6d0507a7 upstream.
      
      Extend the parameters in the firmware callback so it can be called
      upon success and failure. This allows the caller to properly clear
      all resources in the failure path. Right now the error code is
      always zero, ie. success.
      Reviewed-by: default avatarHante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: default avatarPieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: default avatarFranky Lin <franky.lin@broadcom.com>
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e5c7193
    • Daniel Drake's avatar
      Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list · b6bfede2
      Daniel Drake authored
      commit 817ae460 upstream.
      
      Without this quirk, the touchpad is not responsive on this product, with
      the following message repeated in the logs:
      
       psmouse serio1: bad data from KBC - timeout
      
      Add it to the notimeout list alongside other similar Fujitsu laptops.
      Signed-off-by: default avatarDaniel Drake <drake@endlessm.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6bfede2
    • Naveen N. Rao's avatar
      powerpc/64s: Handle data breakpoints in Radix mode · fc1c180b
      Naveen N. Rao authored
      commit d89ba535 upstream.
      
      On Power9, trying to use data breakpoints throws the splat shown
      below. This is because the check for a data breakpoint in DSISR is in
      do_hash_page(), which is not called when in Radix mode.
      
        Unable to handle kernel paging request for data at address 0xc000000000e19218
        Faulting instruction address: 0xc0000000001155e8
        cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
        pc: c0000000001155e8: find_pid_ns+0x48/0xe0
        lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
        sp: c0000000ef1e7da0
        msr: 9000000000009033
        dar: c000000000e19218
        dsisr: 400000
      
      Move the check to handle_page_fault() so as to catch data breakpoints
      in both Hash and Radix MMU modes.
      
      We have to change the check in do_hash_page() against 0xa410 to use
      0xa450, so as to include the value of (DSISR_DABRMATCH << 16).
      
      There are two sites that call handle_page_fault() when in Radix, both
      already pass DSISR in r4.
      
      Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
      Reported-by: default avatarShriya R. Kulkarni <shriykul@in.ibm.com>
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Fix the fall-through case on hash, we need to reload DSISR]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc1c180b
    • Naveen N. Rao's avatar
      powerpc/kprobes: Pause function_graph tracing during jprobes handling · 7a6c5053
      Naveen N. Rao authored
      commit a9f8553e upstream.
      
      This fixes a crash when function_graph and jprobes are used together.
      This is essentially commit 237d28db ("ftrace/jprobes/x86: Fix
      conflict between jprobes and function graph tracing"), but for powerpc.
      
      Jprobes breaks function_graph tracing since the jprobe hook needs to use
      jprobe_return(), which never returns back to the hook, but instead to
      the original jprobe'd function. The solution is to momentarily pause
      function_graph tracing before invoking the jprobe hook and re-enable it
      when returning back to the original jprobe'd function.
      
      Fixes: 6794c782 ("powerpc64: port of the function graph tracer")
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a6c5053
    • Eric W. Biederman's avatar
      signal: Only reschedule timers on signals timers have sent · e6281e0d
      Eric W. Biederman authored
      commit 57db7e4a upstream.
      
      Thomas Gleixner  wrote:
      > The CRIU support added a 'feature' which allows a user space task to send
      > arbitrary (kernel) signals to itself. The changelog says:
      >
      >   The kernel prevents sending of siginfo with positive si_code, because
      >   these codes are reserved for kernel.  I think we can allow a task to
      >   send such a siginfo to itself.  This operation should not be dangerous.
      >
      > Quite contrary to that claim, it turns out that it is outright dangerous
      > for signals with info->si_code == SI_TIMER. The following code sequence in
      > a user space task allows to crash the kernel:
      >
      >    id = timer_create(CLOCK_XXX, ..... signo = SIGX);
      >    timer_set(id, ....);
      >    info->si_signo = SIGX;
      >    info->si_code = SI_TIMER:
      >    info->_sifields._timer._tid = id;
      >    info->_sifields._timer._sys_private = 2;
      >    rt_[tg]sigqueueinfo(..., SIGX, info);
      >    sigemptyset(&sigset);
      >    sigaddset(&sigset, SIGX);
      >    rt_sigtimedwait(sigset, info);
      >
      > For timers based on CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID this
      > results in a kernel crash because sigwait() dequeues the signal and the
      > dequeue code observes:
      >
      >   info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0
      >
      > which triggers the following callchain:
      >
      >  do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer()
      >
      > arm_timer() executes a list_add() on the timer, which is already armed via
      > the timer_set() syscall. That's a double list add which corrupts the posix
      > cpu timer list. As a consequence the kernel crashes on the next operation
      > touching the posix cpu timer list.
      >
      > Posix clocks which are internally implemented based on hrtimers are not
      > affected by this because hrtimer_start() can handle already armed timers
      > nicely, but it's a reliable way to trigger the WARN_ON() in
      > hrtimer_forward(), which complains about calling that function on an
      > already armed timer.
      
      This problem has existed since the posix timer code was merged into
      2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to
      inject not just a signal (which linux has supported since 1.0) but the
      full siginfo of a signal.
      
      The core problem is that the code will reschedule in response to
      signals getting dequeued not just for signals the timers sent but
      for other signals that happen to a si_code of SI_TIMER.
      
      Avoid this confusion by testing to see if the queued signal was
      preallocated as all timer signals are preallocated, and so far
      only the timer code preallocates signals.
      
      Move the check for if a timer needs to be rescheduled up into
      collect_signal where the preallocation check must be performed,
      and pass the result back to dequeue_signal where the code reschedules
      timers.   This makes it clear why the code cares about preallocated
      timers.
      Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Reference: 66dd34ad ("signal: allow to send any siginfo to itself")
      Reference: 1669ce53 ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO")
      Fixes: db8b50ba ("[PATCH] POSIX clocks & timers")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6281e0d
    • Jason A. Donenfeld's avatar
      random: silence compiler warnings and fix race · 3a182635
      Jason A. Donenfeld authored
      commit 4a072c71 upstream.
      
      Odd versions of gcc for the sh4 architecture will actually warn about
      flags being used while uninitialized, so we set them to zero. Non crazy
      gccs will optimize that out again, so it doesn't make a difference.
      
      Next, over aggressive gccs could inline the expression that defines
      use_lock, which could then introduce a race resulting in a lock
      imbalance. By using READ_ONCE, we prevent that fate. Finally, we make
      that assignment const, so that gcc can still optimize a nice amount.
      
      Finally, we fix a potential deadlock between primary_crng.lock and
      batched_entropy_reset_lock, where they could be called in opposite
      order. Moving the call to invalidate_batched_entropy to outside the lock
      rectifies this issue.
      
      Fixes: b169c13dSigned-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a182635
    • Sebastian Parschauer's avatar
      HID: Add quirk for Dell PIXART OEM mouse · 1429fdb1
      Sebastian Parschauer authored
      commit 3db28271 upstream.
      
      This mouse is also known under other IDs. It needs the quirk
      ALWAYS_POLL or will disconnect in runlevel 1 or 3.
      Signed-off-by: default avatarSebastian Parschauer <sparschauer@suse.de>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1429fdb1
    • Raju Rangoju's avatar
      cxgb4: notify uP to route ctrlq compl to rdma rspq · abef7afb
      Raju Rangoju authored
      commit dec6b331 upstream.
      
      During the module initialisation there is a possible race
      (basically race between uld and lld) where neither the uld
      nor lld notifies the uP about where to route the ctrl queue
      completions. LLD skips notifying uP as the rdma queues were
      not created by then (will leave it to ULD to notify the uP).
      As the ULD comes up, it also skips notifying the uP as the
      flag FULL_INIT_DONE is not set yet (ULD assumes that the
      interface is not up yet).
      
      Consequently, this race between uld and lld leaves uP
      unnotified about where to send the ctrl queue completions
      to, leading to iwarp RI_RES WR failure.
      
      Here is the race:
      
      CPU 0                                   CPU1
      
      - allocates nic rx queus
      - t4_sge_alloc_ctrl_txq()
      (if rdma rsp queues exists,
      tell uP to route ctrl queue
      compl to rdma rspq)
                                      - acquires the mutex_lock
                                      - allocates rdma response queues
                                      - if FULL_INIT_DONE set,
                                        tell uP to route ctrl queue compl
                                        to rdma rspq
                                      - relinquishes mutex_lock
      - acquires the mutex_lock
      - enable_rx()
      - set FULL_INIT_DONE
      - relinquishes mutex_lock
      
      This patch fixes the above issue.
      
      Fixes: e7519f99('cxgb4: avoid enabling napi twice to the same queue')
      Signed-off-by: default avatarRaju Rangoju <rajur@chelsio.com>
      Acked-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abef7afb
    • Christophe Jaillet's avatar
      CIFS: Fix some return values in case of error in 'crypt_message' · 4eec15d3
      Christophe Jaillet authored
      commit 517a6e43 upstream.
      
      'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we
      should return -ENOMEM instead.
      
      Also remove a useless 'rc' in a debug message as it is meaningless here.
      
      Fixes: 026e93dc ("CIFS: Encrypt SMB3 requests before sending")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      Reviewed-by: default avatarAurelien Aptel <aaptel@suse.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4eec15d3
    • Pavel Shilovsky's avatar
      CIFS: Improve readdir verbosity · dcaa5a53
      Pavel Shilovsky authored
      commit dcd87838 upstream.
      
      Downgrade the loglevel for SMB2 to prevent filling the log
      with messages if e.g. readdir was interrupted. Also make SMB2
      and SMB1 codepaths do the same logging during readdir.
      Signed-off-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dcaa5a53
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Save/restore host values of debug registers · 3ff8eda2
      Paul Mackerras authored
      commit 7ceaa6dc upstream.
      
      At present, HV KVM on POWER8 and POWER9 machines loses any instruction
      or data breakpoint set in the host whenever a guest is run.
      Instruction breakpoints are currently only used by xmon, but ptrace
      and the perf_event subsystem can set data breakpoints as well as xmon.
      
      To fix this, we save the host values of the debug registers (CIABR,
      DAWR and DAWRX) before entering the guest and restore them on exit.
      To provide space to save them in the stack frame, we expand the stack
      frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ff8eda2
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 324df574
      Paul Mackerras authored
      commit 4c3bb4cc upstream.
      
      This restores several special-purpose registers (SPRs) to sane values
      on guest exit that were missed before.
      
      TAR and VRSAVE are readable and writable by userspace, and we need to
      save and restore them to prevent the guest from potentially affecting
      userspace execution (not that TAR or VRSAVE are used by any known
      program that run uses the KVM_RUN ioctl).  We save/restore these
      in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.
      
      FSCR affects userspace execution in that it can prohibit access to
      certain facilities by userspace.  We restore it to the normal value
      for the task on exit from the KVM_RUN ioctl.
      
      IAMR is normally 0, and is restored to 0 on guest exit.  However,
      with a radix host on POWER9, it is set to a value that prevents the
      kernel from executing user-accessible memory.  On POWER9, we save
      IAMR on guest entry and restore it on guest exit to the saved value
      rather than 0.  On POWER8 we continue to set it to 0 on guest exit.
      
      PSPB is normally 0.  We restore it to 0 on guest exit to prevent
      userspace taking advantage of the guest having set it non-zero
      (which would allow userspace to set its SMT priority to high).
      
      UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
      the AMR from being used as a covert channel between userspace
      processes, since the AMR is not context-switched at present.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      324df574
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Context-switch EBB registers properly · 4ae3f358
      Paul Mackerras authored
      commit ca8efa1d upstream.
      
      This adds code to save the values of three SPRs (special-purpose
      registers) used by userspace to control event-based branches (EBBs),
      which are essentially interrupts that get delivered directly to
      userspace.  These registers are loaded up with guest values when
      entering the guest, and their values are saved when exiting the
      guest, but we were not saving the host values and restoring them
      before going back to userspace.
      
      On POWER8 this would only affect userspace programs which explicitly
      request the use of EBBs and also use the KVM_RUN ioctl, since the
      only source of EBBs on POWER8 is the PMU, and there is an explicit
      enable bit in the PMU registers (and those PMU registers do get
      properly context-switched between host and guest).  On POWER9 there
      is provision for externally-generated EBBs, and these are not subject
      to the control in the PMU registers.
      
      Since these registers only affect userspace, we can save them when
      we first come in from userspace and restore them before returning to
      userspace, rather than saving/restoring the host values on every
      guest entry/exit.  Similarly, we don't need to worry about their
      values on offline secondary threads since they execute in the context
      of the idle task, which never executes in userspace.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ae3f358
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · a7d29e27
      Paul Mackerras authored
      commit 3d3efb68 upstream.
      
      POWER9 DD1 has an erratum where writing to the TBU40 register, which
      is used to apply an offset to the timebase, can cause the timebase to
      lose counts.  This results in the timebase on some CPUs getting out of
      sync with other CPUs, which then results in misbehaviour of the
      timekeeping code.
      
      To work around the problem, we make KVM ignore the timebase offset for
      all guests on POWER9 DD1 machines.  This means that live migration
      cannot be supported on POWER9 DD1 machines.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7d29e27
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 4d9c6828
      Paul Mackerras authored
      commit 46a704f8 upstream.
      
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d9c6828
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Cope with host using large decrementer mode · 1e0d7996
      Paul Mackerras authored
      commit 2f272463 upstream.
      
      POWER9 introduces a new mode for the decrementer register, called
      large decrementer mode, in which the decrementer counter is 56 bits
      wide rather than 32, and reads are sign-extended rather than
      zero-extended.  For the decrementer, this new mode is optional and
      controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
      is 56 bits wide on POWER9 and has no mode control.
      
      Since KVM code reads and writes the decrementer and hypervisor
      decrementer registers in a few places, it needs to be aware of the
      need to treat the decrementer value as a 64-bit quantity, and only do
      a 32-bit sign extension when large decrementer mode is not in effect.
      Similarly, the HDEC should always be treated as a 64-bit quantity on
      POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
      test for POWER9 and the sign extension.
      
      To enable the sign extension to be removed in large decrementer mode,
      we test the LPCR_LD bit in the host LPCR image stored in the struct
      kvm for the guest.  If is set then large decrementer mode is enabled
      and the sign extension should be skipped.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e0d7996
    • Heiko Carstens's avatar
      KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows · 5d34bddd
      Heiko Carstens authored
      commit addb63c1 upstream.
      
      For real-space designation asces the asce origin part is only a token.
      The asce token origin must not be used to generate an effective
      address for storage references. This however is erroneously done
      within kvm_s390_shadow_tables().
      
      Furthermore within the same function the wrong parts of virtual
      addresses are used to generate a corresponding real address
      (e.g. the region second index is used as region first index).
      
      Both of the above can result in incorrect address translations. Only
      for real space designations with a token origin of zero and addresses
      below one megabyte the translation was correct.
      
      Furthermore replace a "!asce.r" statement with a "!*fake" statement to
      make it more obvious that a specific condition has nothing to do with
      the architecture, but with the fake handling of real space designations.
      
      Fixes: 3218f709 ("s390/mm: support real-space for gmap shadows")
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d34bddd
    • James Cowgill's avatar
      KVM: MIPS: Fix maybe-uninitialized build failure · 87ee4cdd
      James Cowgill authored
      commit e27a9eca upstream.
      
      This commit fixes a "maybe-uninitialized" build failure in
      arch/mips/kvm/tlb.c when KVM, DYNAMIC_DEBUG and JUMP_LABEL are all
      enabled. The failure is:
      
      In file included from ./include/linux/printk.h:329:0,
                       from ./include/linux/kernel.h:13,
                       from ./include/asm-generic/bug.h:15,
                       from ./arch/mips/include/asm/bug.h:41,
                       from ./include/linux/bug.h:4,
                       from ./include/linux/thread_info.h:11,
                       from ./include/asm-generic/current.h:4,
                       from ./arch/mips/include/generated/asm/current.h:1,
                       from ./include/linux/sched.h:11,
                       from arch/mips/kvm/tlb.c:13:
      arch/mips/kvm/tlb.c: In function ‘kvm_mips_host_tlb_inv’:
      ./include/linux/dynamic_debug.h:126:3: error: ‘idx_kernel’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         __dynamic_pr_debug(&descriptor, pr_fmt(fmt), \
         ^~~~~~~~~~~~~~~~~~
      arch/mips/kvm/tlb.c:169:16: note: ‘idx_kernel’ was declared here
        int idx_user, idx_kernel;
                      ^~~~~~~~~~
      
      There is a similar error relating to "idx_user". Both errors were
      observed with GCC 6.
      
      As far as I can tell, it is impossible for either idx_user or idx_kernel
      to be uninitialized when they are later read in the calls to kvm_debug,
      but to satisfy the compiler, add zero initializers to both variables.
      Signed-off-by: default avatarJames Cowgill <James.Cowgill@imgtec.com>
      Fixes: 57e3869c ("KVM: MIPS/TLB: Generalise host TLB invalidate to kernel ASID")
      Acked-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87ee4cdd
    • Paolo Bonzini's avatar
      KVM: x86: fix singlestepping over syscall · 3af2b32a
      Paolo Bonzini authored
      commit c8401dda upstream.
      
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior, otherwise you could get #DB on a user stack which is not
      nice.  This does not affect Linux guests, as they use an IST or task gate
      for #DB.
      
      This fixes CVE-2017-7518.
      Reported-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3af2b32a
    • Björn Töpel's avatar
      perf probe: Fix probe definition for inlined functions · 75d73538
      Björn Töpel authored
      commit 7598f8bc upstream.
      
      In commit 613f050d ("perf probe: Fix to probe on gcc generated
      functions in modules"), the offset from symbol is, incorrectly, added
      to the trace point address. This leads to incorrect probe trace points
      for inlined functions and when using relative line number on symbols.
      
      Prior this patch:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
      
      After:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
      
      Committer testing:
      
      Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
      the functions that while not being explicitely marked as inline, were inlined
      by the compiler:
      
        # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
        __ew32
        e1000_regdump
        e1000e_dump_ps_pages
        e1000_desc_unused
        e1000e_systim_to_hwtstamp
        e1000e_rx_hwtstamp
        e1000e_update_rdt_wa
        e1000e_update_tdt_wa
        e1000_put_txbuf
        e1000_consume_page
      
      Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
      them:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
      e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
      that function:
      
        $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
        <SNIP>
        <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
          <13e27c>   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
          <13e287>   DW_AT_low_pc      : 0x17a30
        <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e6f0>   DW_AT_abstract_origin: <0x13ed2c>
          <13e6f4>   DW_AT_low_pc      : 0x17be6
        <SNIP>
        <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
           <13ed2e>   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page
      
      So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
      inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
      address, gives us the offset we should use in the probe definition:
      
        0x17be6 - 0x17a30 = 438
      
      but above we have 876, which is twice as much.
      
      Lets see the second inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
        <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e86f>   DW_AT_abstract_origin: <0x13ed2c>
          <13e873>   DW_AT_low_pc      : 0x17d21
      
        0x17d21 - 0x17a30 = 753
      
      So we where adding it at twice the offset from the containing function as we
      should.
      
      And then after this patch:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
        #
      
      Which matches the two first expansions and shows that because we were
      doubling the offset it would spill over the next function:
      
        readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
         673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
         674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps
      
      This is the 3rd inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
         <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13ec78>   DW_AT_abstract_origin: <0x13ed2c>
          <13ec7c>   DW_AT_low_pc      : 0x17f79
      
        0x17f79 - 0x17a30 = 1353
      
       So:
      
         0x17a30 + 2 * 1353 = 0x184c2
      
        And:
      
         0x184c2 - 0x18090 = 1074
      
      Which explains the bogus third expansion for e1000_consume_page() to end up at:
      
         p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      All fixed now :-)
      
      [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Fixes: 613f050d ("perf probe: Fix to probe on gcc generated functions in modules")
      Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75d73538
    • Kan Liang's avatar
      perf/x86/intel: Add 1G DTLB load/store miss support for SKL · d46eda19
      Kan Liang authored
      commit fb3a5055 upstream.
      
      Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and
      4M page size.
      Need to extend the events to support any page size (4K/2M/4M/1G).
      
      The complete DTLB load/store miss events are:
      
        DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
        DTLB_STORE_MISSES.WALK_COMPLETED		0xe49
      Signed-off-by: default avatarKan Liang <Kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d46eda19
    • Ilya Matveychikov's avatar
      lib/cmdline.c: fix get_options() overflow while parsing ranges · 1a640581
      Ilya Matveychikov authored
      commit a91e0f68 upstream.
      
      When using get_options() it's possible to specify a range of numbers,
      like 1-100500.  The problem is that it doesn't track array size while
      calling internally to get_range() which iterates over the range and
      fills the memory with numbers.
      
      Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.comSigned-off-by: default avatarIlya V. Matveychikov <matvejchikov@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a640581
    • Jan Kara's avatar
      fs/dax.c: fix inefficiency in dax_writeback_mapping_range() · 5f83a744
      Jan Kara authored
      commit 1eb643d0 upstream.
      
      dax_writeback_mapping_range() fails to update iteration index when
      searching radix tree for entries needing cache flushing.  Thus each
      pagevec worth of entries is searched starting from the start which is
      inefficient and prone to livelocks.  Update index properly.
      
      Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
      Fixes: 9973c98e ("dax: add support for fsync/sync")
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f83a744
    • NeilBrown's avatar
      autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL · 909c2562
      NeilBrown authored
      commit 9fa4eb8e upstream.
      
      If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
      autofs4_d_automount() will return
      
         ERR_PTR(status)
      
      with that status to follow_automount(), which will then dereference an
      invalid pointer.
      
      So treat a positive status the same as zero, and map to ENOENT.
      
      See comment in systemd src/core/automount.c::automount_send_ready().
      
      Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.nameSigned-off-by: default avatarNeilBrown <neilb@suse.com>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      909c2562
    • Ravi Bangoria's avatar
      powerpc/perf: Fix oops when kthread execs user process · a2063a20
      Ravi Bangoria authored
      commit bf05fc25 upstream.
      
      When a kthread calls call_usermodehelper() the steps are:
        1. allocate current->mm
        2. load_elf_binary()
        3. populate current->thread.regs
      
      While doing this, interrupts are not disabled. If there is a perf
      interrupt in the middle of this process (i.e. step 1 has completed
      but not yet reached to step 3) and if perf tries to read userspace
      regs, kernel oops with following log:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000da0fc
        ...
        Call Trace:
        perf_output_sample_regs+0x6c/0xd0
        perf_output_sample+0x4e4/0x830
        perf_event_output_forward+0x64/0x90
        __perf_event_overflow+0x8c/0x1e0
        record_and_restart+0x220/0x5c0
        perf_event_interrupt+0x2d8/0x4d0
        performance_monitor_exception+0x54/0x70
        performance_monitor_common+0x158/0x160
        --- interrupt: f01 at avtab_search_node+0x150/0x1a0
            LR = avtab_search_node+0x100/0x1a0
        ...
        load_elf_binary+0x6e8/0x15a0
        search_binary_handler+0xe8/0x290
        do_execveat_common.isra.14+0x5f4/0x840
        call_usermodehelper_exec_async+0x170/0x210
        ret_from_kernel_thread+0x5c/0x7c
      
      Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
      pt_regs are not set.
      
      Fixes: ed4a4ef8 ("powerpc/perf: Add support for sampling interrupt register state")
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2063a20
    • Kees Cook's avatar
      fs/exec.c: account for argv/envp pointers · fed07e89
      Kees Cook authored
      commit 98da7d08 upstream.
      
      When limiting the argv/envp strings during exec to 1/4 of the stack limit,
      the storage of the pointers to the strings was not included.  This means
      that an exec with huge numbers of tiny strings could eat 1/4 of the stack
      limit in strings and then additional space would be later used by the
      pointers to the strings.
      
      For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
      single-byte strings would consume less than 2MB of stack, the max (8MB /
      4) amount allowed, but the pointers to the strings would consume the
      remaining additional stack space (1677721 * 4 == 6710884).
      
      The result (1677721 + 6710884 == 8388605) would exhaust stack space
      entirely.  Controlling this stack exhaustion could result in
      pathological behavior in setuid binaries (CVE-2017-1000365).
      
      [akpm@linux-foundation.org: additional commenting from Kees]
      Fixes: b6a2fea3 ("mm: variable length argument support")
      Link: http://lkml.kernel.org/r/20170622001720.GA32173@beastSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fed07e89
    • Takashi Iwai's avatar
      ALSA: hda - Apply quirks to Broxton-T, too · c8bfdd08
      Takashi Iwai authored
      commit c7ecb906 upstream.
      
      Broxton-T was a forgotten child and we didn't apply the quirks for
      Skylake+ properly.  Meanwhile, a quirk for reducing the DMA latency
      seems specific to the early Broxton model, so we leave as is.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8bfdd08
    • Megha Dey's avatar
      ALSA: hda - Add Coffelake PCI ID · 14487f9f
      Megha Dey authored
      commit e79b0006 upstream.
      
      Coffelake is another Intel part, so need to add PCI ID for it.
      Signed-off-by: default avatarMegha Dey <megha.dey@intel.com>
      Signed-off-by: default avatarSubhransu S. Prusty <subhransu.s.prusty@intel.com>
      Acked-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14487f9f
    • Takashi Iwai's avatar
      ALSA: pcm: Don't treat NULL chmap as a fatal error · e6dc243c
      Takashi Iwai authored
      commit 2deaeaf1 upstream.
      
      The standard PCM chmap helper callbacks treat the NULL info->chmap as
      a fatal error and spews the kernel warning with stack trace when
      CONFIG_SND_DEBUG is on.  This was OK, originally it was supposed to be
      always static and non-NULL.  But, as the recent addition of Intel LPE
      audio driver shows, the chmap content may vary dynamically, and it can
      be even NULL when disconnected.  The user still sees the kernel
      warning unnecessarily.
      
      For clearing such a confusion, this patch simply removes the
      snd_BUG_ON() in each place, just returns an error without warning.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6dc243c
    • Takashi Sakamoto's avatar
      ALSA: firewire-lib: Fix stall of process context at packet error · c06718bc
      Takashi Sakamoto authored
      commit 4a9bfafc upstream.
      
      At Linux v3.5, packet processing can be done in process context of ALSA
      PCM application as well as software IRQ context for OHCI 1394. Below is
      an example of the callgraph (some calls are omitted).
      
      ioctl(2) with e.g. HWSYNC
      (sound/core/pcm_native.c)
      ->snd_pcm_common_ioctl1()
        ->snd_pcm_hwsync()
          ->snd_pcm_stream_lock_irq
          (sound/core/pcm_lib.c)
          ->snd_pcm_update_hw_ptr()
            ->snd_pcm_udpate_hw_ptr0()
              ->struct snd_pcm_ops.pointer()
              (sound/firewire/*)
              = Each handler on drivers in ALSA firewire stack
                (sound/firewire/amdtp-stream.c)
                ->amdtp_stream_pcm_pointer()
                  (drivers/firewire/core-iso.c)
                  ->fw_iso_context_flush_completions()
                    ->struct fw_card_driver.flush_iso_completion()
                    (drivers/firewire/ohci.c)
                    = flush_iso_completions()
                      ->struct fw_iso_context.callback.sc
                      (sound/firewire/amdtp-stream.c)
                      = in_stream_callback() or out_stream_callback()
                        ->...
          ->snd_pcm_stream_unlock_irq
      
      When packet queueing error occurs or detecting invalid packets in
      'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
      is called on local CPU with disabled IRQ.
      
      (sound/firewire/amdtp-stream.c)
      in_stream_callback() or out_stream_callback()
      ->amdtp_stream_pcm_abort()
        ->snd_pcm_stop_xrun()
          ->snd_pcm_stream_lock_irqsave()
          ->snd_pcm_stop()
          ->snd_pcm_stream_unlock_irqrestore()
      
      The process is stalled on the CPU due to attempt to acquire recursive lock.
      
      [  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
      [  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0 softirq=38323/38323 fqs=7140
      [  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035, q=5933)
      [  562.630866] Task dump for CPU 2:
      [  562.630867] alsa-source-OXF R  running task        0  6619      1 0x00000008
      [  562.630870] Call Trace:
      [  562.630876]  ? vt_console_print+0x79/0x3e0
      [  562.630880]  ? msg_print_text+0x9d/0x100
      [  562.630883]  ? up+0x32/0x50
      [  562.630885]  ? irq_work_queue+0x8d/0xa0
      [  562.630886]  ? console_unlock+0x2b6/0x4b0
      [  562.630888]  ? vprintk_emit+0x312/0x4a0
      [  562.630892]  ? dev_vprintk_emit+0xbf/0x230
      [  562.630895]  ? do_sys_poll+0x37a/0x550
      [  562.630897]  ? dev_printk_emit+0x4e/0x70
      [  562.630900]  ? __dev_printk+0x3c/0x80
      [  562.630903]  ? _raw_spin_lock+0x20/0x30
      [  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
      [  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
      [  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
      [  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
      [  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
      [  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
      [  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
      [  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
      [  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
      [  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
      [  562.630945]  ? poll_select_copy_remaining+0x150/0x150
      [  562.630947]  ? poll_select_copy_remaining+0x150/0x150
      [  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
      [  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
      [  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
      [  562.630962]  ? futex_wake+0x90/0x170
      [  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
      [  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
      [  562.630972]  ? do_vfs_ioctl+0xa3/0x610
      [  562.630974]  ? vfs_read+0x11b/0x130
      [  562.630976]  ? SyS_ioctl+0x79/0x90
      [  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      
      This commit fixes the above bug. This assumes two cases:
      1. Any error is detected in software IRQ context of OHCI 1394 context.
      In this case, PCM substream should be aborted in packet handler. On the
      other hand, it should not be done in any process context. TO distinguish
      these two context, use 'in_interrupt()' macro.
      2. Any error is detect in process context of ALSA PCM application.
      In this case, PCM substream should not be aborted in packet handler
      because PCM substream lock is acquired. The task to abort PCM substream
      should be done in ALSA PCM core. For this purpose, SNDRV_PCM_POS_XRUN is
      returned at 'struct snd_pcm_ops.pointer()'.
      Suggested-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Fixes: e9148ddd("ALSA: firewire-lib: flush completed packets when reading PCM position")
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c06718bc
    • Jan Beulich's avatar
      xen-blkback: don't leak stack data via response ring · b919d2dc
      Jan Beulich authored
      commit 089bc014 upstream.
      
      Rather than constructing a local structure instance on the stack, fill
      the fields directly on the shared ring, just like other backends do.
      Build on the fact that all response structure flavors are actually
      identical (the old code did make this assumption too).
      
      This is XSA-216.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b919d2dc
    • Juergen Gross's avatar
      xen/blkback: fix disconnect while I/Os in flight · 7bffd6bb
      Juergen Gross authored
      commit 46464411 upstream.
      
      Today disconnecting xen-blkback is broken in case there are still
      I/Os in flight: xen_blkif_disconnect() will bail out early without
      releasing all resources in the hope it will be called again when
      the last request has terminated. This, however, won't happen as
      xen_blkif_free() won't be called on termination of the last running
      request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
      xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
      calls in xen_blkif_disconnect() didn't happen.
      
      To solve this deadlock xen_blkif_disconnect() and
      xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
      xen_blkif_get() but use some other way to do their accounting of
      resources.
      
      This at once fixes another error in xen_blkif_disconnect(): when it
      returned early with -EBUSY for another ring than 0 it would call
      xen_blkif_put() again for already handled rings on a subsequent call.
      This will lead to inconsistencies in the refcnt handling.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarSteven Haigh <netwiz@crc.id.au>
      Acked-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bffd6bb
    • Boris Brezillon's avatar
      clk: sunxi-ng: sun5i: Fix ahb_bist_clk definition · 61ab4c4a
      Boris Brezillon authored
      commit 370d9192 upstream.
      
      AHB BIST gate is actually controlled with bit 7.
      
      This bug was detected while trying to use the NAND controller which is
      using the DMA engine to transfer data to the NAND.
      Since the ahb_bist_clk gate bit conflicts with the ahb_dma_clk gate bit,
      the core was disabling the DMA engine clock as part of its 'disable
      unused clks' procedure, which was causing all DMA transfers to fail after
      this point.
      
      Fixes: 5e737617 ("clk: sunxi-ng: Add sun5i CCU driver")
      Reported-by: default avatarAngus Ainslie <angus@akkea.ca>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Tested-by: default avatarAngus Ainslie <angus@akkea.ca>
      Reviewed-by: default avatarChen-Yu Tsai <wens@csie.org>
      Signed-off-by: default avatarMichael Turquette <mturquette@baylibre.com>
      Link: lkml.kernel.org/r/1495643669-28221-1-git-send-email-boris.brezillon@free-electrons.com
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61ab4c4a
    • Yong Deng's avatar
      clk: sunxi-ng: v3s: Fix usb otg device reset bit · 744b0f61
      Yong Deng authored
      commit 7ffc781e upstream.
      
      V3S's usb otg device reset bit should be 24, not 23.
      Signed-off-by: default avatarYong Deng <iemdey@gmail.com>
      Reviewed-By: default avatarIcenowy Zheng <icenowy@aosc.io>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      744b0f61