1. 06 Aug, 2019 2 commits
  2. 04 Aug, 2019 34 commits
  3. 31 Jul, 2019 4 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.19.63 · 9a9de33a
      Greg Kroah-Hartman authored
      9a9de33a
    • Linus Torvalds's avatar
      access: avoid the RCU grace period for the temporary subjective credentials · 408af823
      Linus Torvalds authored
      commit d7852fbd upstream.
      
      It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
      work because it installs a temporary credential that gets allocated and
      freed for each system call.
      
      The allocation and freeing overhead is mostly benign, but because
      credentials can be accessed under the RCU read lock, the freeing
      involves a RCU grace period.
      
      Which is not a huge deal normally, but if you have a lot of access()
      calls, this causes a fair amount of seconday damage: instead of having a
      nice alloc/free patterns that hits in hot per-CPU slab caches, you have
      all those delayed free's, and on big machines with hundreds of cores,
      the RCU overhead can end up being enormous.
      
      But it turns out that all of this is entirely unnecessary.  Exactly
      because access() only installs the credential as the thread-local
      subjective credential, the temporary cred pointer doesn't actually need
      to be RCU free'd at all.  Once we're done using it, we can just free it
      synchronously and avoid all the RCU overhead.
      
      So add a 'non_rcu' flag to 'struct cred', which can be set by users that
      know they only use it in non-RCU context (there are other potential
      users for this).  We can make it a union with the rcu freeing list head
      that we need for the RCU case, so this doesn't need any extra storage.
      
      Note that this also makes 'get_current_cred()' clear the new non_rcu
      flag, in case we have filesystems that take a long-term reference to the
      cred and then expect the RCU delayed freeing afterwards.  It's not
      entirely clear that this is required, but it makes for clear semantics:
      the subjective cred remains non-RCU as long as you only access it
      synchronously using the thread-local accessors, but you _can_ use it as
      a generic cred if you want to.
      
      It is possible that we should just remove the whole RCU markings for
      ->cred entirely.  Only ->real_cred is really supposed to be accessed
      through RCU, and the long-term cred copies that nfs uses might want to
      explicitly re-enable RCU freeing if required, rather than have
      get_current_cred() do it implicitly.
      
      But this is a "minimal semantic changes" change for the immediate
      problem.
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Glauber <jglauber@marvell.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      408af823
    • Dan Williams's avatar
      libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl() · 1a547d24
      Dan Williams authored
      commit b70d31d0 upstream.
      
      In preparation for fixing a deadlock between wait_for_bus_probe_idle()
      and the nvdimm_bus_list_mutex arrange for __nd_ioctl() without
      nvdimm_bus_list_mutex held. This also unifies the 'dimm' and 'bus' level
      ioctls into a common nd_ioctl() preamble implementation.
      
      Marked for -stable as it is a pre-requisite for a follow-on fix.
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: default avatarJane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341209518.292348.7183897251740665198.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a547d24
    • Michael Neuling's avatar
      powerpc/tm: Fix oops on sigreturn on systems without TM · b993a66d
      Michael Neuling authored
      commit f16d80b7 upstream.
      
      On systems like P9 powernv where we have no TM (or P8 booted with
      ppc_tm=off), userspace can construct a signal context which still has
      the MSR TS bits set. The kernel tries to restore this context which
      results in the following crash:
      
        Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ff #69
        NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
        REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
        MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
        CFAR: c0000000000022e0 IRQMASK: 0
        GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
        GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
        GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
        GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
        GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
        GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
        NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
        LR [00007fffb2d67e48] 0x7fffb2d67e48
        Call Trace:
        Instruction dump:
        e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
        e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
      
      The problem is the signal code assumes TM is enabled when
      CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
      with P9 powernv or if `ppc_tm=off` is used on P8.
      
      This means any local user can crash the system.
      
      Fix the problem by returning a bad stack frame to the user if they try
      to set the MSR TS bits with sigreturn() on systems where TM is not
      supported.
      
      Found with sigfuz kernel selftest on P9.
      
      This fixes CVE-2019-13648.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9
      Reported-by: default avatarPraveen Pandey <Praveen.Pandey@in.ibm.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b993a66d