1. 29 Jun, 2017 31 commits
    • iscsi-target: Reject immediate data underflow larger than SCSI transfer length · 3900f24a
      Nicholas Bellinger authored
      commit abb85a9b upstream.
      
      When iscsi WRITE underflow occurs there are two different scenarios
      that can happen.
      
      Normally in practice, when an EDTL vs. SCSI CDB TRANSFER LENGTH
      underflow is detected, the iscsi immediate data payload is the
      smaller SCSI CDB TRANSFER LENGTH.
      
      That is, when a host fabric LLD is using a fixed size EDTL for
      a specific control CDB, the SCSI CDB TRANSFER LENGTH and actual
      SCSI payload ends up being smaller than EDTL.  In iscsi, this
      means the received iscsi immediate data payload matches the
      smaller SCSI CDB TRANSFER LENGTH, because there is no more
      SCSI payload to accept beyond SCSI CDB TRANSFER LENGTH.
      
      However, it's possible for a malicious host to send a WRITE
      underflow where EDTL is larger than SCSI CDB TRANSFER LENGTH,
      but incoming iscsi immediate data actually matches EDTL.
      
      In the wild, we've never had an iscsi host environment actually
      try to do this.
      
      For this special case, it's wrong to truncate part of the
      control CDB payload and continue to process the command during
      underflow when the received immediate data payload is larger than
      SCSI CDB TRANSFER LENGTH, so reject the command and drop the
      bogus payload as a defensive action.
      
      Note that this potential bug was originally relaxed by the following
      commit, which allowed WRITE underflow in MSFT FCP host environments:
      
         commit c72c5250
         Author: Roland Dreier <roland@purestorage.com>
         Date:   Wed Jul 22 15:08:18 2015 -0700
      
            target: allow underflow/overflow for PR OUT etc. commands
      
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
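      The rejection rule described in this commit can be sketched as a small predicate. This is a minimal, hypothetical model. The struct and function names are illustrative and are not the iscsi-target data structures, and all lengths are assumed to have already been converted to bytes.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical, simplified view of a WRITE command; not the target's structs. */
      struct write_cmd {
          uint32_t edtl;          /* Expected Data Transfer Length from the iSCSI PDU */
          uint32_t cdb_xfer_len;  /* SCSI CDB TRANSFER LENGTH, converted to bytes */
          uint32_t immediate_len; /* bytes of immediate data actually received */
      };

      /* Returns true when the command should be rejected and its payload dropped. */
      static bool reject_immediate_underflow(const struct write_cmd *cmd)
      {
          /* Only the underflow case (EDTL larger than the CDB length) matters here. */
          if (cmd->edtl <= cmd->cdb_xfer_len)
              return false;
          /*
           * A well-behaved host never sends immediate data beyond the CDB
           * transfer length; accepting more would mean truncating the payload.
           */
          return cmd->immediate_len > cmd->cdb_xfer_len;
      }
      ```

      Consider a command with an EDTL of 4096 and a CDB transfer length of 512. If 512 bytes of immediate data arrive, the command is accepted. If 4096 bytes arrive, it is rejected.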
    • iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP · 463440e6
      Nicholas Bellinger authored
      commit 105fa2f4 upstream.
      
      This patch fixes a BUG() in iscsit_close_session() that could be
      triggered when iscsit_logout_post_handler() did not run from within
      tx thread context for more than SECONDS_FOR_LOGOUT_COMP (15 seconds),
      and the TCP connection had not already closed before then, forcing
      the tx thread context to exit automatically.
      
      This would manifest itself during explicit logout as:
      
      [33206.974254] 1 connection(s) still exist for iSCSI session to iqn.1993-08.org.debian:01:3f5523242179
      [33206.980184] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 2100.772 msecs
      [33209.078643] ------------[ cut here ]------------
      [33209.078646] kernel BUG at drivers/target/iscsi/iscsi_target.c:4346!
      
      Normally, when an explicit logout attempt fails, the tx thread context
      exits and iscsit_close_connection() from rx thread context does the
      extra cleanup once it detects conn->conn_logout_remove has not been
      cleared by the logout type specific post handlers.
      
      To address this special case, if the logout post handler in tx thread
      context detects conn->tx_thread_active has already been cleared, simply
      return and exit so that the existing iscsit_close_connection()
      logic from rx thread context can do the failed logout cleanup.
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Gary Guo <ghg@datera.io>
      Tested-by: Chu Yuan Lin <cyl@datera.io>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
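      The early return described in this commit amounts to a guard at the top of the logout post handler. This is a minimal sketch that assumes a hypothetical connection struct holding only the flag the fix relies on. The real handler does considerably more work.

      ```c
      #include <stdbool.h>

      /* Hypothetical connection state, reduced to what the fix checks. */
      struct conn {
          bool tx_thread_active; /* cleared once the tx thread has exited */
          bool logout_handled;   /* set when the post handler does its work */
      };

      /*
       * If the tx thread is no longer active, return immediately and leave the
       * failed-logout cleanup to the rx thread's close-connection path.
       */
      static void logout_post_handler(struct conn *conn)
      {
          if (!conn->tx_thread_active)
              return;
          /* ... normal logout post-processing would run here ... */
          conn->logout_handled = true;
      }
      ```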
    • target: Fix kref->refcount underflow in transport_cmd_finish_abort · 1f576d53
      Nicholas Bellinger authored
      commit 73d4e580 upstream.
      
      This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED
      when a fabric driver drops its second reference from below the
      target_core_tmr.c based callers of transport_cmd_finish_abort().
      
      Recently with the conversion of kref to refcount_t, this bug was
      manifesting itself as:
      
      [705519.601034] refcount_t: underflow; use-after-free.
      [705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 20116.512 msecs
      [705539.719111] ------------[ cut here ]------------
      [705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 refcount_sub_and_test+0x33/0x51
      
      Since the original kref atomic_t based kref_put() didn't check for
      underflow and only invoked the final callback when zero was reached,
      this bug did not manifest in practice, since all se_cmd memory
      uses preallocated tags.
      
      To address this, propagate the existing return value from
      transport_put_cmd() up via transport_cmd_finish_abort(), and
      change transport_cmd_finish_abort() + core_tmr_handle_tas_abort()
      callers to only do their local target_put_sess_cmd() if necessary.
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Gary Guo <ghg@datera.io>
      Tested-by: Chu Yuan Lin <cyl@datera.io>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
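      The fix propagates the put result so that the caller drops its local reference only when needed. The sketch below models this with a plain counter standing in for cmd_kref. It is hypothetical: the struct and function names are not the target core's. The underflows field counts puts on an already-released command, which is exactly what refcount_t warned about.

      ```c
      #include <stdbool.h>

      /* Hypothetical command with a plain counter standing in for cmd_kref. */
      struct cmd {
          int refcount;
          bool freed;
          int underflows; /* puts on an already-released command (the bug) */
      };

      /* Drop one reference; returns 1 if this put released the command. */
      static int cmd_put(struct cmd *c)
      {
          if (c->refcount <= 0) {
              c->underflows++; /* refcount_t would warn about underflow here */
              return 0;
          }
          if (--c->refcount == 0) {
              c->freed = true;
              return 1;
          }
          return 0;
      }

      /* Hypothetical finish-abort helper that now propagates the put result. */
      static int finish_abort(struct cmd *c)
      {
          return cmd_put(c);
      }

      /* Caller drops its local reference only if finish_abort() did not release it. */
      static void handle_abort(struct cmd *c)
      {
          if (!finish_abort(c))
              cmd_put(c);
      }
      ```

      Suppose the fabric driver has already dropped its second reference, so the count is 1. Then finish_abort() releases the command and the caller skips its put, so no underflow occurs.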
    • arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · 99f66b51
      Will Deacon authored
      commit dbb236c1 upstream.
      
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, accelerating the MONOTONIC_RAW clockid
      uncovered potential 1ns time inconsistencies caused by the
      timekeeping core not handling sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
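      The data-page change can be illustrated with integer arithmetic. This is a minimal sketch that assumes the usual mult/shift fixed-point scheme, in which nanoseconds are (cycles * mult) >> shift. The function names and the example values are hypothetical. Folding the sub-nanosecond xtime_nsec into the stored value lets the reader's final shift carry it into the result.

      ```c
      #include <stdint.h>

      /* Writer side: store raw nanoseconds pre-shifted, plus the sub-ns remainder. */
      static uint64_t populate_raw_nsec(uint64_t raw_tv_nsec, uint64_t raw_xtime_nsec,
                                        unsigned int shift)
      {
          return (raw_tv_nsec << shift) + raw_xtime_nsec;
      }

      /* Reader side: the stored value is already shifted; only the sum is shifted down. */
      static uint64_t vdso_raw_nsec(uint64_t stored, uint64_t cycle_delta,
                                    uint32_t mult, unsigned int shift)
      {
          return (stored + cycle_delta * mult) >> shift;
      }
      ```

      Take shift = 8, tv_nsec = 100, a sub-ns remainder of 200 (in units of 2^-8 ns) and one cycle worth 100 units. The reader then returns 101 ns. Dropping the remainder gives 100 ns, which is the 1ns inconsistency this commit addresses.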
    • time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · a53bfdda
      John Stultz authored
      commit 3d88d56c upstream.
      
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
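      The sub-ns accumulation technique that this commit borrows from CLOCK_MONOTONIC can be sketched with a raw clock that stores nanoseconds scaled by 2^shift. The scaled value is only reduced to whole units when a full second is carried. This illustrates the fixed-point idea and is not the timekeeping core's code. The struct and function names are hypothetical.

      ```c
      #include <stdint.h>

      #define NSEC_PER_SEC 1000000000ULL

      /* Hypothetical raw clock: seconds plus nanoseconds scaled by 2^shift. */
      struct raw_clock {
          uint64_t sec;
          uint64_t xtime_nsec; /* nanoseconds << shift, keeps the sub-ns remainder */
          unsigned int shift;
      };

      /* Add one interval (already in shifted ns) without dropping sub-ns bits. */
      static void accumulate(struct raw_clock *c, uint64_t interval_shifted_ns)
      {
          uint64_t sec_shifted = NSEC_PER_SEC << c->shift;

          c->xtime_nsec += interval_shifted_ns;
          while (c->xtime_nsec >= sec_shifted) {
              c->xtime_nsec -= sec_shifted; /* carry a whole second, keep the rest */
              c->sec++;
          }
      }

      /* Whole nanoseconds within the current second. */
      static uint64_t whole_nsec(const struct raw_clock *c)
      {
          return c->xtime_nsec >> c->shift;
      }
      ```

      Accumulating 1.5 ns (384 units at shift 8) twice yields 3 whole nanoseconds. Truncating to whole nanoseconds at each step would yield only 2.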
    • time: Fix clock->read(clock) race around clocksource changes · 02a37ccd
      John Stultz authored
      commit ceea5e37 upstream.
      
      In tests which exercise switching of clocksources, a NULL
      pointer dereference can be observed on ARM64 platforms in the
      clocksource read() function:
      
      u64 clocksource_mmio_readl_down(struct clocksource *c)
      {
      	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
      }
      
      This is called from the core timekeeping code via:
      
      	cycle_now = tkr->read(tkr->clock);
      
      tkr->read is the cached tkr->clock->read() function pointer.
      When the clocksource is changed then tkr->clock and tkr->read
      are updated sequentially. The code above results in a sequential
      load operation of tkr->read and tkr->clock as well.
      
      If the store to tkr->clock hits between the loads of tkr->read
      and tkr->clock, then the old read() function is called with the
      new clock pointer. As a consequence the read() function
      dereferences a different data structure and the resulting 'reg'
      pointer can point anywhere including NULL.
      
      This problem was introduced when the timekeeping code was
      switched over to use struct tk_read_base. Before that, it was
      theoretically possible as well when the compiler decided to
      reload clock in the code sequence:
      
           now = tk->clock->read(tk->clock);
      
      Add a helper function which avoids the issue by reading
      tk_read_base->clock once into a local variable clk and then issue
      the read function via clk->read(clk). This guarantees that the
      read() function always gets the proper clocksource pointer handed
      in.
      
      Since there is now no use for the tkr.read pointer, this patch
      also removes it, and to address stopping the fast timekeeper
      during suspend/resume, it introduces a dummy clocksource to use
      rather than just a dummy read function.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
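      The helper described in this commit can be modelled in user space with C11 atomics. In this minimal sketch, an atomic load stands in for the kernel's single read of tk_read_base->clock. That mapping is an assumption about how to express the idea outside the kernel. The struct layouts, the demo read callback and the helper name are hypothetical. Because the pointer is loaded once into clk, the read() function and its argument always come from the same clocksource.

      ```c
      #include <stdatomic.h>
      #include <stdint.h>

      struct clocksource {
          uint64_t (*read)(struct clocksource *cs);
          uint64_t value; /* hypothetical backing data for the demo read() */
      };

      /* Only the clock pointer is cached; there is no separate read pointer. */
      struct tk_read_base {
          _Atomic(struct clocksource *) clock;
      };

      /* Demo read() callback: reads state from the clocksource it is handed. */
      static uint64_t demo_read(struct clocksource *cs)
      {
          return cs->value;
      }

      /* Load the clocksource pointer once, then call its read() with that same pointer. */
      static uint64_t tkr_read_clock(struct tk_read_base *tkr)
      {
          struct clocksource *clk = atomic_load(&tkr->clock);

          return clk->read(clk);
      }
      ```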
    • brcmfmac: unbind all devices upon failure in firmware callback · c81d034b
      Arend Van Spriel authored
      commit 7a51461f upstream.
      
      When the firmware request fails, brcmf_ops_sdio_remove is called and
      brcmf_bus is freed. In such circumstances, if you do a suspend/resume
      cycle, the kernel hangs on resume due to a NULL pointer dereference in
      the resume function. So in brcmf_sdio_firmware_callback() we need to
      unbind the driver from both sdio_func devices when a firmware load
      failure is indicated.
      Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: Franky Lin <franky.lin@broadcom.com>
      Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • brcmfmac: use firmware callback upon failure to load · ba2d8d67
      Arend Van Spriel authored
      commit 03fb0e83 upstream.
      
      When firmware loading failed the code used to unbind the device provided
      by the calling code. However, for the sdio driver two devices are bound
      and both need to be released upon failure. The callback has been extended
      with a parameter to pass an error code, so this commit passes that error
      code upon firmware loading failure.
      Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: Franky Lin <franky.lin@broadcom.com>
      Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • brcmfmac: add parameter to pass error code in firmware callback · 1dd15bd6
      Arend Van Spriel authored
      commit 6d0507a7 upstream.
      
      Extend the parameters in the firmware callback so it can be called
      upon success and failure. This allows the caller to properly clear
      all resources in the failure path. Right now the error code is
      always zero, i.e. success.
      Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: Franky Lin <franky.lin@broadcom.com>
      Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
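      An error-carrying firmware callback of the kind described in this commit can be sketched as follows. This is a hypothetical model, not the brcmfmac API. The callback type, context struct and loader are made up, and the error is assumed to be a negative errno-style value, as is conventional in the kernel. Because the same callback runs on both success and failure, the failure path can release both SDIO functions.

      ```c
      #include <stddef.h>

      /* Hypothetical callback: err is 0 on success or a negative error code. */
      typedef void (*fw_done_cb)(void *ctx, int err, const void *fw, size_t len);

      /* Hypothetical driver context; SDIO binds two functions. */
      struct sdio_ctx {
          int funcs_bound;
      };

      static void sdio_fw_done(void *ctx, int err, const void *fw, size_t len)
      {
          struct sdio_ctx *sdio = ctx;

          (void)fw;
          (void)len;
          if (err) {
              /* Failure: unbind every function the driver bound, not just one. */
              sdio->funcs_bound = 0;
              return;
          }
          /* Success: bus setup would continue here using fw/len. */
      }

      /* Hypothetical loader that reports the outcome through the callback. */
      static void request_fw(fw_done_cb done, void *ctx, int simulated_err)
      {
          static const unsigned char blob[] = { 0x01, 0x02 };

          if (simulated_err)
              done(ctx, simulated_err, NULL, 0);
          else
              done(ctx, 0, blob, sizeof(blob));
      }
      ```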
    • Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list · 20d8f785
      Daniel Drake authored
      commit 817ae460 upstream.
      
      Without this quirk, the touchpad is not responsive on this product, with
      the following message repeated in the logs:
      
       psmouse serio1: bad data from KBC - timeout
      
      Add it to the notimeout list alongside other similar Fujitsu laptops.
      Signed-off-by: Daniel Drake <drake@endlessm.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/64s: Handle data breakpoints in Radix mode · 8eaa481d
      Naveen N. Rao authored
      commit d89ba535 upstream.
      
      On Power9, trying to use data breakpoints throws the splat shown
      below. This is because the check for a data breakpoint in DSISR is in
      do_hash_page(), which is not called when in Radix mode.
      
        Unable to handle kernel paging request for data at address 0xc000000000e19218
        Faulting instruction address: 0xc0000000001155e8
        cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
        pc: c0000000001155e8: find_pid_ns+0x48/0xe0
        lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
        sp: c0000000ef1e7da0
        msr: 9000000000009033
        dar: c000000000e19218
        dsisr: 400000
      
      Move the check to handle_page_fault() so as to catch data breakpoints
      in both Hash and Radix MMU modes.
      
      We have to change the check in do_hash_page() against 0xa410 to use
      0xa450, so as to include the value of (DSISR_DABRMATCH << 16).
      
      There are two sites that call handle_page_fault() when in Radix, and
      both already pass DSISR in r4.
      
      Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
      Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Fix the fall-through case on hash, we need to reload DSISR]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
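      The change from 0xa410 to 0xa450 adds a single bit. This small check rests on two assumptions. First, DSISR_DABRMATCH is taken to be 0x00400000, consistent with the "dsisr: 400000" line in the splat above. Second, the mask is assumed to be tested against the upper halfword of DSISR, as an andis. instruction does. The macro and function names other than DSISR_DABRMATCH are hypothetical.

      ```c
      #include <stdint.h>

      /* Assumed value, consistent with "dsisr: 400000" in the splat above. */
      #define DSISR_DABRMATCH 0x00400000u

      /* Masks as applied to the upper 16 bits of DSISR (andis.-style test). */
      #define OLD_MASK 0xa410u
      #define NEW_MASK (OLD_MASK | (DSISR_DABRMATCH >> 16))

      /* Returns nonzero if any masked upper-halfword bit is set in dsisr. */
      static int dsisr_hits_mask(uint32_t dsisr, uint32_t mask16)
      {
          return (dsisr & (mask16 << 16)) != 0;
      }
      ```

      Under these assumptions, NEW_MASK evaluates to 0xa450. A DSISR of 0x00400000 matches the new mask but not the old one.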
    • powerpc/kprobes: Pause function_graph tracing during jprobes handling · 414f51ce
      Naveen N. Rao authored
      commit a9f8553e upstream.
      
      This fixes a crash when function_graph and jprobes are used together.
      This is essentially commit 237d28db ("ftrace/jprobes/x86: Fix
      conflict between jprobes and function graph tracing"), but for powerpc.
      
      Jprobes breaks function_graph tracing since the jprobe hook needs to use
      jprobe_return(), which never returns back to the hook, but instead to
      the original jprobe'd function. The solution is to momentarily pause
      function_graph tracing before invoking the jprobe hook and re-enable it
      when returning back to the original jprobe'd function.
      
      Fixes: 6794c782 ("powerpc64: port of the function graph tracer")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • signal: Only reschedule timers on signals timers have sent · f719f20a
      Eric W. Biederman authored
      commit 57db7e4a upstream.
      
      Thomas Gleixner wrote:
      > The CRIU support added a 'feature' which allows a user space task to send
      > arbitrary (kernel) signals to itself. The changelog says:
      >
      >   The kernel prevents sending of siginfo with positive si_code, because
      >   these codes are reserved for kernel.  I think we can allow a task to
      >   send such a siginfo to itself.  This operation should not be dangerous.
      >
      > Quite contrary to that claim, it turns out that it is outright dangerous
      > for signals with info->si_code == SI_TIMER. The following code sequence in
      > a user space task allows to crash the kernel:
      >
      >    id = timer_create(CLOCK_XXX, ..... signo = SIGX);
      >    timer_set(id, ....);
      >    info->si_signo = SIGX;
      >    info->si_code = SI_TIMER;
      >    info->_sifields._timer._tid = id;
      >    info->_sifields._timer._sys_private = 2;
      >    rt_[tg]sigqueueinfo(..., SIGX, info);
      >    sigemptyset(&sigset);
      >    sigaddset(&sigset, SIGX);
      >    rt_sigtimedwait(sigset, info);
      >
      > For timers based on CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID this
      > results in a kernel crash because sigwait() dequeues the signal and the
      > dequeue code observes:
      >
      >   info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0
      >
      > which triggers the following callchain:
      >
      >  do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer()
      >
      > arm_timer() executes a list_add() on the timer, which is already armed via
      > the timer_set() syscall. That's a double list add which corrupts the posix
      > cpu timer list. As a consequence the kernel crashes on the next operation
      > touching the posix cpu timer list.
      >
      > Posix clocks which are internally implemented based on hrtimers are not
      > affected by this because hrtimer_start() can handle already armed timers
      > nicely, but it's a reliable way to trigger the WARN_ON() in
      > hrtimer_forward(), which complains about calling that function on an
      > already armed timer.
      
      This problem has existed since the posix timer code was merged into
      2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to
      inject not just a signal (which Linux has supported since 1.0) but the
      full siginfo of a signal.
      
      The core problem is that the code will reschedule in response to
      signals getting dequeued not just for signals the timers sent but
      for other signals that happen to have a si_code of SI_TIMER.
      
      Avoid this confusion by testing whether the queued signal was
      preallocated, as all timer signals are preallocated, and so far
      only the timer code preallocates signals.
      
      Move the check for whether a timer needs to be rescheduled up into
      collect_signal, where the preallocation check must be performed,
      and pass the result back to dequeue_signal, where the code reschedules
      timers. This makes it clear why the code cares about preallocated
      timers.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Reference: 66dd34ad ("signal: allow to send any siginfo to itself")
      Reference: 1669ce53 ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO")
      Fixes: db8b50ba ("[PATCH] POSIX clocks & timers")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
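      The decision that this commit moves into collect_signal can be sketched as a predicate: reschedule only for a preallocated SI_TIMER signal with a nonzero private field. This is a hypothetical, simplified model. SIG_CODE_TIMER stands in for SI_TIMER with an arbitrary value, SIGQ_PREALLOC stands in for the preallocation flag, and the struct and function names are made up.

      ```c
      #include <stdbool.h>

      #define SIG_CODE_TIMER (-2) /* hypothetical stand-in for SI_TIMER */
      #define SIGQ_PREALLOC  0x1u /* hypothetical "preallocated by timer code" flag */

      /* Hypothetical queued signal with only the fields the check needs. */
      struct queued_sig {
          int si_code;
          int sys_private; /* nonzero asks for the timer to be rescheduled */
          unsigned int flags;
      };

      /* Reschedule only for signals the timer code itself preallocated. */
      static bool needs_timer_resched(const struct queued_sig *q)
      {
          return (q->flags & SIGQ_PREALLOC) &&
                 q->si_code == SIG_CODE_TIMER &&
                 q->sys_private != 0;
      }
      ```

      A forged siginfo queued from user space carries the SI_TIMER code but was never preallocated. It therefore no longer triggers a reschedule.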
    • HID: Add quirk for Dell PIXART OEM mouse · 99afebe8
      Sebastian Parschauer authored
      commit 3db28271 upstream.
      
      This mouse is also known under other IDs. It needs the quirk
      ALWAYS_POLL or will disconnect in runlevel 1 or 3.
      Signed-off-by: Sebastian Parschauer <sparschauer@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cxgb4: notify uP to route ctrlq compl to rdma rspq · cdf300d6
      Raju Rangoju authored
      commit dec6b331 upstream.
      
      During the module initialisation there is a possible race
      (basically race between uld and lld) where neither the uld
      nor lld notifies the uP about where to route the ctrl queue
      completions. LLD skips notifying uP as the rdma queues were
      not created by then (will leave it to ULD to notify the uP).
      As the ULD comes up, it also skips notifying the uP as the
      flag FULL_INIT_DONE is not set yet (ULD assumes that the
      interface is not up yet).
      
      Consequently, this race between uld and lld leaves uP
      unnotified about where to send the ctrl queue completions
      to, leading to iwarp RI_RES WR failure.
      
      Here is the race:
      
      CPU 0                                   CPU1
      
      - allocates nic rx queues
      - t4_sge_alloc_ctrl_txq()
      (if rdma rsp queues exists,
      tell uP to route ctrl queue
      compl to rdma rspq)
                                      - acquires the mutex_lock
                                      - allocates rdma response queues
                                      - if FULL_INIT_DONE set,
                                        tell uP to route ctrl queue compl
                                        to rdma rspq
                                      - relinquishes mutex_lock
      - acquires the mutex_lock
      - enable_rx()
      - set FULL_INIT_DONE
      - relinquishes mutex_lock
      
      This patch fixes the above issue.
      
      Fixes: e7519f99 ("cxgb4: avoid enabling napi twice to the same queue")
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • CIFS: Improve readdir verbosity · fb6dc831
      Pavel Shilovsky authored
      commit dcd87838 upstream.
      
      Downgrade the loglevel for SMB2 to prevent filling the log
      with messages if e.g. readdir was interrupted. Also make SMB2
      and SMB1 codepaths do the same logging during readdir.
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: PPC: Book3S HV: Context-switch EBB registers properly · 2f1527e3
      Paul Mackerras authored
      commit ca8efa1d upstream.
      
      This adds code to save the values of three SPRs (special-purpose
      registers) used by userspace to control event-based branches (EBBs),
      which are essentially interrupts that get delivered directly to
      userspace.  These registers are loaded up with guest values when
      entering the guest, and their values are saved when exiting the
      guest, but we were not saving the host values and restoring them
      before going back to userspace.
      
      On POWER8 this would only affect userspace programs which explicitly
      request the use of EBBs and also use the KVM_RUN ioctl, since the
      only source of EBBs on POWER8 is the PMU, and there is an explicit
      enable bit in the PMU registers (and those PMU registers do get
      properly context-switched between host and guest).  On POWER9 there
      is provision for externally-generated EBBs, and these are not subject
      to the control in the PMU registers.
      
      Since these registers only affect userspace, we can save them when
      we first come in from userspace and restore them before returning to
      userspace, rather than saving/restoring the host values on every
      guest entry/exit.  Similarly, we don't need to worry about their
      values on offline secondary threads since they execute in the context
      of the idle task, which never executes in userspace.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 468aa930
      Paul Mackerras authored
      commit 46a704f8 upstream.
      
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows · df3a787b
      Heiko Carstens authored
      commit addb63c1 upstream.
      
      For real-space designation asces the asce origin part is only a token.
      The asce token origin must not be used to generate an effective
      address for storage references. This however is erroneously done
      within kvm_s390_shadow_tables().
      
      Furthermore within the same function the wrong parts of virtual
      addresses are used to generate a corresponding real address
      (e.g. the region second index is used as region first index).
      
      Both of the above can result in incorrect address translations. The
      translation was correct only for real-space designations with a token
      origin of zero and addresses below one megabyte.
      
      Furthermore replace a "!asce.r" statement with a "!*fake" statement to
      make it more obvious that a specific condition has nothing to do with
      the architecture, but with the fake handling of real space designations.
      
      Fixes: 3218f709 ("s390/mm: support real-space for gmap shadows")
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
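      For reference, the index fields that this commit refers to can be extracted as shown below. The sketch assumes the z/Architecture layout of a 64-bit virtual address. That layout has 11-bit region-first, region-second, region-third and segment indexes, followed by an 8-bit page index and a 12-bit byte index. The helper names are hypothetical. For an address below one megabyte, every region and segment index is zero. So mixing up those indexes cannot change the result in that range, which is consistent with the commit's note about where translation was still correct.

      ```c
      #include <stdint.h>

      /* Index fields of a 64-bit z/Architecture virtual address (bit 0 = MSB). */
      static unsigned int rfx(uint64_t va) { return (va >> 53) & 0x7ff; } /* region first  */
      static unsigned int rsx(uint64_t va) { return (va >> 42) & 0x7ff; } /* region second */
      static unsigned int rtx(uint64_t va) { return (va >> 31) & 0x7ff; } /* region third  */
      static unsigned int sx(uint64_t va)  { return (va >> 20) & 0x7ff; } /* segment       */
      static unsigned int px(uint64_t va)  { return (va >> 12) & 0xff; }  /* page          */
      ```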
    • perf/x86/intel: Add 1G DTLB load/store miss support for SKL · 5220378b
      Kan Liang authored
      commit fb3a5055 upstream.
      
      Current DTLB load/store miss events (0x608/0x649) only count 4K, 2M and
      4M page sizes. The events need to be extended to support any page size
      (4K/2M/4M/1G).
      
      The complete DTLB load/store miss events are:
      
        DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
        DTLB_STORE_MISSES.WALK_COMPLETED		0xe49
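      The two event encodings differ only in the unit mask. A minimal sketch,
      assuming the Skylake unit masks 0x02 (4K), 0x04 (2M/4M) and 0x08 (1G)
      from Intel's event tables and a raw layout of unit mask in bits 15:8
      and event select in bits 7:0; the macro and function names are
      hypothetical:

```c
#include <stdint.h>

/* Assumed Skylake unit masks for DTLB walk completions (not from the patch). */
#define WALK_COMPLETED_4K     0x02
#define WALK_COMPLETED_2M_4M  0x04
#define WALK_COMPLETED_1G     0x08

#define DTLB_LOAD_MISSES_EVENT  0x08
#define DTLB_STORE_MISSES_EVENT 0x49

/* Raw config value: unit mask in bits 15:8, event select in bits 7:0. */
static inline uint64_t raw_event(uint8_t event, uint8_t umask)
{
	return ((uint64_t)umask << 8) | event;
}
```

      Under these assumptions the old codes (umask 0x06) omit the 1G bit,
      while 0xe08/0xe49 (umask 0x0e) set all three page-size bits.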
      
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ilya Matveychikov
      lib/cmdline.c: fix get_options() overflow while parsing ranges · 7c679fe7
      Ilya Matveychikov authored
      commit a91e0f68 upstream.
      
      When using get_options() it's possible to specify a range of numbers,
      like 1-100500.  The problem is that it does not track the array size
      when it internally calls get_range(), which iterates over the range and
      fills the memory with numbers.
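      A minimal sketch of the missing bound, assuming a simplified
      comma-separated grammar ("1,3,5-9"); parse_int_list() is a hypothetical
      helper, not the lib/cmdline.c code. The point is that the array bound
      is checked inside the range expansion, not only between items:

```c
#include <stdlib.h>

/* Hypothetical sketch: parse "1,3,5-9" into ints[], storing at most
 * nints values.  Returns the number of values stored. */
static int parse_int_list(const char *s, int *ints, int nints)
{
	int n = 0;

	while (*s && n < nints) {
		char *end;
		long lo = strtol(s, &end, 10), hi = lo;

		if (end == s)
			break;                  /* not a number: stop */
		if (*end == '-') {              /* range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				break;
		}
		/* Bound check inside the range loop: "1-100500" cannot
		 * write past the end of ints[]. */
		for (long long v = lo; v <= hi && n < nints; v++)
			ints[n++] = (int)v;
		if (*end != ',')
			break;
		s = end + 1;
	}
	return n;
}
```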
      
      Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.com
      Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NeilBrown
      autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL · bc6eecff
      NeilBrown authored
      commit 9fa4eb8e upstream.
      
      If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
      autofs4_d_automount() will return
      
         ERR_PTR(status)
      
      with that status to follow_automount(), which will then dereference an
      invalid pointer.
      
      So treat a positive status the same as zero, and map to ENOENT.
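      Why a positive status is dangerous can be sketched with a
      re-implementation of the ERR_PTR()/IS_ERR() convention: only values in
      [-MAX_ERRNO, -1] are recognized as errors, so ERR_PTR(5) looks like a
      valid pointer. The helper names below are hypothetical, and
      sanitize_fail_status() only models the rule described above:

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Illustrative re-implementation of the kernel's error-pointer idea. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Rule from the text: zero or positive becomes -ENOENT; a negative
 * errno is passed through unchanged. */
static long sanitize_fail_status(long status)
{
	return status < 0 ? status : -ENOENT;
}
```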
      
      See comment in systemd src/core/automount.c::automount_send_ready().
      
      Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ravi Bangoria
      powerpc/perf: Fix oops when kthread execs user process · 4b660fcb
      Ravi Bangoria authored
      commit bf05fc25 upstream.
      
      When a kthread calls call_usermodehelper() the steps are:
        1. allocate current->mm
        2. load_elf_binary()
        3. populate current->thread.regs
      
      While doing this, interrupts are not disabled. If a perf interrupt
      arrives in the middle of this process (i.e. step 1 has completed but
      step 3 has not yet been reached) and perf tries to read the userspace
      regs, the kernel oopses with the following log:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000da0fc
        ...
        Call Trace:
        perf_output_sample_regs+0x6c/0xd0
        perf_output_sample+0x4e4/0x830
        perf_event_output_forward+0x64/0x90
        __perf_event_overflow+0x8c/0x1e0
        record_and_restart+0x220/0x5c0
        perf_event_interrupt+0x2d8/0x4d0
        performance_monitor_exception+0x54/0x70
        performance_monitor_common+0x158/0x160
        --- interrupt: f01 at avtab_search_node+0x150/0x1a0
            LR = avtab_search_node+0x100/0x1a0
        ...
        load_elf_binary+0x6e8/0x15a0
        search_binary_handler+0xe8/0x290
        do_execveat_common.isra.14+0x5f4/0x840
        call_usermodehelper_exec_async+0x170/0x210
        ret_from_kernel_thread+0x5c/0x7c
      
      Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
      pt_regs are not set.
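      A simplified model of the fix, using hypothetical stand-in structures
      rather than the kernel's struct perf_regs and struct pt_regs; the enum
      values mirror PERF_SAMPLE_REGS_ABI_NONE (0) and
      PERF_SAMPLE_REGS_ABI_64 (2):

```c
#include <stddef.h>

enum { SAMPLE_REGS_ABI_NONE = 0, SAMPLE_REGS_ABI_64 = 2 };

/* Hypothetical stand-ins for the kernel structures. */
struct pt_regs_stub { unsigned long gpr[32]; };
struct regs_user_stub { struct pt_regs_stub *regs; int abi; };

/* While an exec'ing kthread has not yet populated its user pt_regs,
 * report "no user registers" instead of a usable ABI. */
static void get_regs_user(struct regs_user_stub *ru,
			  struct pt_regs_stub *task_regs)
{
	ru->regs = task_regs;
	ru->abi = task_regs ? SAMPLE_REGS_ABI_64 : SAMPLE_REGS_ABI_NONE;
}

/* Sampling side: only dereference regs when an ABI was reported. */
static unsigned long sample_gpr1(const struct regs_user_stub *ru)
{
	if (ru->abi == SAMPLE_REGS_ABI_NONE)
		return 0;
	return ru->regs->gpr[1];
}
```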
      
      Fixes: ed4a4ef8 ("powerpc/perf: Add support for sampling interrupt register state")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kees Cook
      fs/exec.c: account for argv/envp pointers · 3d6848e4
      Kees Cook authored
      commit 98da7d08 upstream.
      
      When limiting the argv/envp strings during exec to 1/4 of the stack limit,
      the storage of the pointers to the strings was not included.  This means
      that an exec with huge numbers of tiny strings could eat 1/4 of the stack
      limit in strings and then additional space would be later used by the
      pointers to the strings.
      
      For example, on 32-bit with an 8MB stack rlimit, an exec with 1677721
      single-byte strings would consume less than 2MB of stack, the max (8MB /
      4) amount allowed, but the pointers to the strings would consume the
      remaining additional stack space (1677721 * 4 == 6710884).
      
      The result (1677721 + 6710884 == 8388605) would exhaust stack space
      entirely.  Controlling this stack exhaustion could result in
      pathological behavior in setuid binaries (CVE-2017-1000365).
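      The arithmetic above can be checked with a small sketch of the
      corrected accounting rule. args_fit() is hypothetical and takes the
      pointer size as a parameter (4 on 32-bit); it is not the fs/exec.c
      code:

```c
#include <stdbool.h>

/* Hypothetical check: the strings AND one pointer per string must fit
 * within 1/4 of the stack rlimit. */
static bool args_fit(unsigned long long string_bytes,
		     unsigned long long nstrings,
		     unsigned long long ptr_size,
		     unsigned long long stack_rlimit)
{
	unsigned long long need = string_bytes + nstrings * ptr_size;

	return need <= stack_rlimit / 4;
}
```

      With the example numbers, the strings alone (1677721 bytes) pass a
      strings-only 2MB check, but strings plus pointers (8388605 bytes) do
      not.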
      
      [akpm@linux-foundation.org: additional commenting from Kees]
      Fixes: b6a2fea3 ("mm: variable length argument support")
      Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Takashi Iwai
      ALSA: pcm: Don't treat NULL chmap as a fatal error · 552a14a5
      Takashi Iwai authored
      commit 2deaeaf1 upstream.
      
      The standard PCM chmap helper callbacks treat a NULL info->chmap as
      a fatal error and spew a kernel warning with a stack trace when
      CONFIG_SND_DEBUG is on.  This was OK originally, since the chmap was
      supposed to be always static and non-NULL.  But, as the recent addition
      of the Intel LPE audio driver shows, the chmap content may vary
      dynamically, and it can even be NULL when disconnected.  The user then
      sees the kernel warning unnecessarily.
      
      To clear up such confusion, this patch simply removes the
      snd_BUG_ON() in each place and just returns an error without a warning.
      
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Takashi Sakamoto
      ALSA: firewire-lib: Fix stall of process context at packet error · 8c9c55a0
      Takashi Sakamoto authored
      commit 4a9bfafc upstream.
      
      Since Linux v3.5, packet processing can be done in the process context
      of an ALSA PCM application as well as in the software IRQ context of
      OHCI 1394. Below is an example of the callgraph (some calls are omitted).
      
      ioctl(2) with e.g. HWSYNC
      (sound/core/pcm_native.c)
      ->snd_pcm_common_ioctl1()
        ->snd_pcm_hwsync()
          ->snd_pcm_stream_lock_irq
          (sound/core/pcm_lib.c)
          ->snd_pcm_update_hw_ptr()
            ->snd_pcm_update_hw_ptr0()
              ->struct snd_pcm_ops.pointer()
              (sound/firewire/*)
              = Each handler on drivers in ALSA firewire stack
                (sound/firewire/amdtp-stream.c)
                ->amdtp_stream_pcm_pointer()
                  (drivers/firewire/core-iso.c)
                  ->fw_iso_context_flush_completions()
                    ->struct fw_card_driver.flush_iso_completion()
                    (drivers/firewire/ohci.c)
                    = flush_iso_completions()
                      ->struct fw_iso_context.callback.sc
                      (sound/firewire/amdtp-stream.c)
                      = in_stream_callback() or out_stream_callback()
                        ->...
          ->snd_pcm_stream_unlock_irq
      
      When a packet queueing error occurs or invalid packets are detected in
      'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
      is called on the local CPU with IRQs disabled.
      
      (sound/firewire/amdtp-stream.c)
      in_stream_callback() or out_stream_callback()
      ->amdtp_stream_pcm_abort()
        ->snd_pcm_stop_xrun()
          ->snd_pcm_stream_lock_irqsave()
          ->snd_pcm_stop()
          ->snd_pcm_stream_unlock_irqrestore()
      
      The process stalls on the CPU because it attempts to acquire the lock recursively.
      
      [  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
      [  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0 softirq=38323/38323 fqs=7140
      [  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035, q=5933)
      [  562.630866] Task dump for CPU 2:
      [  562.630867] alsa-source-OXF R  running task        0  6619      1 0x00000008
      [  562.630870] Call Trace:
      [  562.630876]  ? vt_console_print+0x79/0x3e0
      [  562.630880]  ? msg_print_text+0x9d/0x100
      [  562.630883]  ? up+0x32/0x50
      [  562.630885]  ? irq_work_queue+0x8d/0xa0
      [  562.630886]  ? console_unlock+0x2b6/0x4b0
      [  562.630888]  ? vprintk_emit+0x312/0x4a0
      [  562.630892]  ? dev_vprintk_emit+0xbf/0x230
      [  562.630895]  ? do_sys_poll+0x37a/0x550
      [  562.630897]  ? dev_printk_emit+0x4e/0x70
      [  562.630900]  ? __dev_printk+0x3c/0x80
      [  562.630903]  ? _raw_spin_lock+0x20/0x30
      [  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
      [  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
      [  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
      [  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
      [  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
      [  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
      [  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
      [  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
      [  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
      [  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
      [  562.630945]  ? poll_select_copy_remaining+0x150/0x150
      [  562.630947]  ? poll_select_copy_remaining+0x150/0x150
      [  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
      [  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
      [  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
      [  562.630962]  ? futex_wake+0x90/0x170
      [  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
      [  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
      [  562.630972]  ? do_vfs_ioctl+0xa3/0x610
      [  562.630974]  ? vfs_read+0x11b/0x130
      [  562.630976]  ? SyS_ioctl+0x79/0x90
      [  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      
      This commit fixes the above bug. It distinguishes two cases:
      1. An error is detected in the software IRQ context of OHCI 1394.
      In this case, the PCM substream should be aborted in the packet handler.
      This must not be done in process context, so the 'in_interrupt()' macro
      is used to distinguish the two contexts.
      2. An error is detected in the process context of the ALSA PCM
      application. In this case, the PCM substream should not be aborted in
      the packet handler because the PCM substream lock is already held. The
      task of aborting the PCM substream is left to the ALSA PCM core; for
      this purpose, SNDRV_PCM_POS_XRUN is returned from
      'struct snd_pcm_ops.pointer()'.
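      The two cases can be modelled in a few lines. This is a sketch with
      hypothetical names: 'in_irq' stands in for in_interrupt(), POS_XRUN for
      SNDRV_PCM_POS_XRUN ((snd_pcm_uframes_t)-1), and no real locking is
      modelled:

```c
#include <stdbool.h>

#define POS_XRUN ((unsigned long)-1)   /* stand-in for SNDRV_PCM_POS_XRUN */

struct stream_stub {
	bool packet_error;     /* set by the packet handler on a bad packet */
	bool aborted;          /* substream stopped by the packet handler */
	unsigned long hw_ptr;
};

/* Packet handler: runs in softirq context, or in process context via
 * pointer() while the PCM stream lock is already held. */
static void packet_handler(struct stream_stub *s, bool bad_packet, bool in_irq)
{
	if (!bad_packet) {
		s->hw_ptr++;
		return;
	}
	s->packet_error = true;
	if (in_irq)
		s->aborted = true;   /* case 1: lock not held, abort here */
	/* case 2: do nothing here; pointer() reports the xrun instead */
}

/* pointer() callback: flushes completions in process context, then lets
 * the PCM core stop the stream if an error was seen. */
static unsigned long pcm_pointer(struct stream_stub *s, bool bad_packet)
{
	packet_handler(s, bad_packet, false);
	return s->packet_error ? POS_XRUN : s->hw_ptr;
}
```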
      
      Suggested-by: Clemens Ladisch <clemens@ladisch.de>
      Fixes: e9148ddd ("ALSA: firewire-lib: flush completed packets when reading PCM position")
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jan Beulich
      xen-blkback: don't leak stack data via response ring · 4ae2cb91
      Jan Beulich authored
      commit 089bc014 upstream.
      
      Rather than constructing a local structure instance on the stack, fill
      the fields directly on the shared ring, just like other backends do.
      Build on the fact that all response structure flavors are actually
      identical (the old code did make this assumption too).
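      The leak and the fix can be sketched with a hypothetical response
      struct (not the real blkif ABI) whose layout includes compiler padding.
      Copying a stack-built instance also copies its uninitialized padding
      bytes; assigning the fields directly in the ring slot never copies
      kernel stack memory into the shared page:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 'operation' is followed by a padding byte. */
struct resp {
	uint64_t id;
	uint8_t  operation;
	int16_t  status;
};

/* Old pattern: build on the stack, then copy every byte, including the
 * uninitialized padding, into the shared ring slot. */
static void respond_via_stack(struct resp *ring_slot, uint64_t id,
			      uint8_t op, int16_t st)
{
	struct resp r;          /* padding holds stale stack contents */

	r.id = id;
	r.operation = op;
	r.status = st;
	memcpy(ring_slot, &r, sizeof(r));
}

/* Fixed pattern: write the fields straight into the ring slot. */
static void respond_in_place(struct resp *ring_slot, uint64_t id,
			     uint8_t op, int16_t st)
{
	ring_slot->id = id;
	ring_slot->operation = op;
	ring_slot->status = st;
}
```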
      
      This is XSA-216.
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Juergen Gross
      xen/blkback: fix disconnect while I/Os in flight · e5c49c17
      Juergen Gross authored
      commit 46464411 upstream.
      
      Today disconnecting xen-blkback is broken in case there are still
      I/Os in flight: xen_blkif_disconnect() will bail out early without
      releasing all resources in the hope it will be called again when
      the last request has terminated. This, however, won't happen, as
      xen_blkif_free() won't be called on termination of the last running
      request: xen_blkif_put() won't decrement the blkif refcnt to 0,
      because xen_blkif_disconnect() did not finish earlier and thus some
      of its xen_blkif_put() calls never happened.
      
      To solve this deadlock xen_blkif_disconnect() and
      xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
      xen_blkif_get() but use some other way to do their accounting of
      resources.
      
      This at once fixes another error in xen_blkif_disconnect(): when it
      returned early with -EBUSY for a ring other than 0, it would call
      xen_blkif_put() again for already handled rings on a subsequent call.
      This leads to inconsistencies in the refcnt handling.
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Tested-by: Steven Haigh <netwiz@crc.id.au>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Chen-Yu Tsai
      clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset · 0e051f17
      Chen-Yu Tsai authored
      commit 38b8f823 upstream.
      
      The register offset for the lcd1-ch1 clock was incorrectly pointing to
      the lcd0-ch1 clock. This resulted in the lcd0-ch1 clock being disabled
      when the clk core disables unused clocks. This then stops the simplefb
      HDMI output path.
      
      Reported-by: Bob Ham <rah@settrans.net>
      Fixes: c6e6c96d ("clk: sunxi-ng: Add A31/A31s clocks")
      Signed-off-by: Chen-Yu Tsai <wens@csie.org>
      Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 24 Jun, 2017 9 commits
    • Greg Kroah-Hartman
      Linux 4.9.34 · 493ecd5c
      Greg Kroah-Hartman authored
    • Hugh Dickins
      mm: fix new crash in unmapped_area_topdown() · ce7fe859
      Hugh Dickins authored
      commit f4cb767d upstream.
      
      Trinity hits "kernel BUG at mm/mmap.c:1963!" within about 3 minutes of
      mmap testing.  That's the VM_BUG_ON(gap_end < gap_start) at the
      end of unmapped_area_topdown().  Linus points out how MAP_FIXED
      (which does not have to respect our stack guard gap intentions)
      could result in gap_end below gap_start there.  Fix that, and
      the similar case in its alternative, unmapped_area().
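      The failure mode can be sketched with plain unsigned arithmetic. Both
      helpers are hypothetical and assume gap_start/gap_end were already
      adjusted for the guard gap; the extra gap_end > gap_start test is the
      kind of check described above, not the upstream diff:

```c
#include <stdbool.h>

/* Without the ordering check, gap_end < gap_start wraps to a huge
 * unsigned "free size" and the gap looks big enough. */
static bool gap_fits_buggy(unsigned long gap_start, unsigned long gap_end,
			   unsigned long length)
{
	return gap_end - gap_start >= length;
}

/* With MAP_FIXED a mapping may sit inside another vma's guard gap, so
 * reject inverted gaps before subtracting. */
static bool gap_fits(unsigned long gap_start, unsigned long gap_end,
		     unsigned long length)
{
	return gap_end > gap_start && gap_end - gap_start >= length;
}
```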
      
      Fixes: 1be7107f ("mm: larger stack guard gap, between vmas")
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Helge Deller
      Allow stack to grow up to address space limit · 5d10ad62
      Helge Deller authored
      commit bd726c90 upstream.
      
      Fix expand_upwards() on architectures with an upward-growing stack (parisc,
      metag and partly IA-64) to allow the stack to reliably grow exactly up to
      the address space limit given by TASK_SIZE.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hugh Dickins
      mm: larger stack guard gap, between vmas · cfc0eb40
      Hugh Dickins authored
      commit 1be7107f upstream.
      
      The stack guard page is a useful feature to reduce the risk of the stack
      smashing into a different mapping. We have been using a single page gap
      which is sufficient to prevent having the stack adjacent to a different
      mapping. But this seems to be insufficient in the light of the stack
      usage in userspace. E.g. glibc uses alloca() allocations as large as
      64kB in many commonly used functions. Others use constructs like
      gid_t buffer[NGROUPS_MAX], which is 256kB, or stack strings with
      MAX_ARG_STRLEN.
      
      This becomes especially dangerous for suid binaries combined with the
      default of no stack size limit, because those applications can be
      tricked into consuming a large portion of the stack, and a single glibc
      call could then jump over the guard page. These attacks are not
      theoretical, unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somewhat inherent, but it should reduce the attack surface a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping the gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
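      A sketch of what vm_start_gap()/vm_end_gap() compute, under stated
      assumptions: a simplified vma struct, hypothetical flag values, and 4K
      pages so that the 1MB default gap is 256 pages. Arithmetic overflow is
      clamped rather than allowed to wrap:

```c
/* Hypothetical flag values and simplified vma. */
#define VM_GROWSDOWN 0x1UL
#define VM_GROWSUP   0x2UL

struct vma_stub { unsigned long vm_start, vm_end, vm_flags; };

/* 256 pages of 4K = 1MB, expressed in bytes. */
static unsigned long stack_guard_gap = 256UL << 12;

/* Lowest address the gap-respecting code treats as occupied. */
static unsigned long vm_start_gap(const struct vma_stub *vma)
{
	unsigned long start = vma->vm_start;

	if (vma->vm_flags & VM_GROWSDOWN) {
		start -= stack_guard_gap;
		if (start > vma->vm_start)   /* wrapped below 0: clamp */
			start = 0;
	}
	return start;
}

/* Highest address (exclusive) treated as occupied for grow-up stacks. */
static unsigned long vm_end_gap(const struct vma_stub *vma)
{
	unsigned long end = vma->vm_end;

	if (vma->vm_flags & VM_GROWSUP) {
		end += stack_guard_gap;
		if (end < vma->vm_end)       /* wrapped past the top: clamp */
			end = -(1UL << 12);
	}
	return end;
}
```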
      
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [wt: backport to 4.11: adjust context]
      [wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thomas Gleixner
      alarmtimer: Rate limit periodic intervals · 04651048
      Thomas Gleixner authored
      commit ff86bf0c upstream.
      
      The alarmtimer code has another source of potentially rearming itself too
      fast. Interval timers with a very small interval have a similar CPU hog
      effect as the previously fixed overflow issue.
      
      The reason is that alarmtimers do not implement the normal protection
      against this kind of problem which the other POSIX timers use:
      
        timer expires -> queue signal -> deliver signal -> rearm timer
      
      This scheme brings the rearming under scheduler control and prevents
      permanently firing timers which hog the CPU.
      
      Bringing this scheme to the alarm timer code is a major overhaul because it
      lacks all the necessary mechanisms completely.
      
      So, as a quick fix, limit the interval to one jiffy. This is not
      problematic in practice, as alarmtimers are usually backed by an RTC for
      suspend, which has 1 second resolution. It could therefore be argued that
      the resolution of this clock should be set to 1 second in general, but
      that's outside the scope of this fix.
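      A sketch of the quick fix, assuming HZ=250 (so one jiffy is 4 ms) and
      treating an interval of 0 as a one-shot timer that is left alone; the
      function name is hypothetical:

```c
#include <stdint.h>

/* Assumed tick rate; the kernel derives TICK_NSEC from HZ. */
#define HZ 250
#define TICK_NSEC (1000000000LL / HZ)

/* Never rearm a periodic alarm more often than once per jiffy. */
static int64_t clamp_alarm_interval(int64_t interval_ns)
{
	if (interval_ns > 0 && interval_ns < TICK_NSEC)
		return TICK_NSEC;
	return interval_ns;
}
```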
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • David Miller
      crypto: Work around deallocated stack frame reference gcc bug on sparc. · b355b899
      David Miller authored
      commit d41519a6 upstream.
      
      On sparc, if we have an alloca() like situation, as is the case with
      SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
      memory.  The result can be that the value is clobbered if a trap
      or interrupt arrives at just the right instruction.
      
      It only occurs if the function ends by returning a value from that
      alloca() area and that value can be placed into the return value
      register using a single instruction.
      
      For example, in lib/libcrc32c.c:crc32c() we end up with a return
      sequence like:
      
              return  %i7+8
               lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],
      
      %o5 holds the base of the on-stack area allocated for the shash
      descriptor.  But the return released the stack frame and the
      register window.
      
      So if an interrupt arrives between 'return' and 'lduw', then
      the value read at %o5+16 can be corrupted.
      
      Add a data compiler barrier to work around this problem.  This is
      exactly what the gcc fix will end up doing as well, and it absolutely
      should not change the code generated for other cpus (unless gcc
      on them has the same bug :-)
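      A sketch of where such a barrier goes. The barrier_data() macro below
      matches the GCC definition the kernel uses (an empty asm that consumes
      the pointer and clobbers memory); the descriptor struct and function
      are hypothetical stand-ins for SHASH_DESC_ON_STACK() users. It needs
      GCC or Clang:

```c
#include <stdint.h>
#include <string.h>

/* Empty asm that "uses" ptr and clobbers memory: the compiler must
 * complete accesses to *ptr before this point. */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

struct desc_stub { uint32_t pad[4]; uint32_t ctx; };  /* hypothetical */

static uint32_t read_ctx_from_stack(uint32_t seed)
{
	struct desc_stub desc;        /* plays the on-stack descriptor role */
	uint32_t ret;

	memset(&desc, 0, sizeof(desc));
	desc.ctx = seed ^ 0xffffffffu;  /* stand-in for a computed digest */
	ret = desc.ctx;
	barrier_data(&desc);          /* load of ctx cannot sink past here */
	return ret;
}
```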
      
      With crucial insight from Eric Sandeen.
      
      Reported-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hon Ching (Vicky) Lo
      vTPM: Fix missing NULL check · 7dfe7ca9
      Hon Ching (Vicky) Lo authored
      commit 31574d32 upstream.
      
      The current code passes the address of tpm_chip as the argument to
      dev_get_drvdata() without a prior NULL check in
      tpm_ibmvtpm_get_desired_dma.  This resulted in an oops during kernel
      boot when vTPM is enabled in a Power partition configured in active
      memory sharing mode.
      
      The vio_driver's get_desired_dma() is called before the probe(), which
      for vtpm is tpm_ibmvtpm_probe, and it's this latter function that
      initializes the driver and sets the data.  Attempting to get the data
      before the probe() caused the problem.
      
      This patch adds a NULL check to the tpm_ibmvtpm_get_desired_dma.
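      A sketch of the guard, using hypothetical stand-in structs for the
      device/driver-data chain and a made-up fallback size; what the real
      driver returns when the data is not yet set is not shown here:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the device/driver-data relationship. */
struct device_stub { void *driver_data; };
struct chip_stub { struct device_stub dev; };
struct vtpm_stub { unsigned long rtce_size; };

#define FALLBACK_DMA_ESTIMATE 8192UL   /* hypothetical value */

/* get_desired_dma() can run before probe() has set the driver data, so
 * the first lookup may be NULL; return an estimate instead of taking
 * the address of a member of a NULL chip. */
static unsigned long get_desired_dma(struct device_stub *vdev)
{
	struct chip_stub *chip = vdev->driver_data;
	struct vtpm_stub *vtpm;

	if (!chip)
		return FALLBACK_DMA_ESTIMATE;
	vtpm = chip->dev.driver_data;
	return vtpm ? vtpm->rtce_size : FALLBACK_DMA_ESTIMATE;
}
```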
      
      Fixes: 9e0d39d8 ("tpm: Remove useless priv field in struct tpm_vendor_specific")
      Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Paul Burton
      MIPS: .its targets depend on vmlinux · ecae4733
      Paul Burton authored
      commit bcd7c45e upstream.
      
      The .its targets require information about the kernel binary, such as
      its entry point, which is extracted from the vmlinux ELF. We therefore
      require that the ELF is built before the .its files are generated.
      Declare this requirement in the Makefile so that make always ensures
      this ordering; otherwise, in corner cases, the .its can be generated
      with an incorrect (either invalid or stale) entry point.
      
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: cf2a5e0b ("MIPS: Support generating Flattened Image Trees (.itb)")
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16179/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Paul Burton
      MIPS: Fix bnezc/jialc return address calculation · 6b706cbb
      Paul Burton authored
      commit 1a73d931 upstream.
      
      The code handling the pop76 opcode (i.e. bnezc & jialc instructions) in
      __compute_return_epc_for_insn() needs to set the value of $31 in the
      jialc case, which is encoded with rs = 0. However its check to
      differentiate bnezc (rs != 0) from jialc (rs = 0) was unfortunately
      backwards, meaning that if we emulate a bnezc instruction we clobber $31
      & if we emulate a jialc instruction it actually behaves like a jic
      instruction.
      
      Fix this by inverting the check of rs to match the way the instructions
      are actually encoded.
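      The corrected decode can be sketched as follows. Only the rs test is
      modelled; the link value epc + 4 and the helper names are assumptions
      of this sketch, and 0x3e (the R6 POP76 major opcode) is used only to
      build test words:

```c
#include <stdint.h>

/* rs field: bits 25..21 of the instruction word. */
static unsigned insn_rs(uint32_t insn) { return (insn >> 21) & 0x1f; }

/* Returns 1 and writes $31 only for JIALC (rs == 0); BNEZC (rs != 0)
 * leaves $31 untouched.  The buggy version had this test inverted. */
static int emulate_pop76_link(uint32_t insn, unsigned long epc,
			      unsigned long regs[32])
{
	if (insn_rs(insn) == 0) {        /* JIALC: link */
		regs[31] = epc + 4;
		return 1;
	}
	return 0;                        /* BNEZC: no link */
}
```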
      
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 28d6f93d ("MIPS: Emulate the new MIPS R6 BNEZC and JIALC instructions")
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16178/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>