1. 15 Dec, 2016 13 commits
    • sched/autogroup: Fix 64-bit kernel nice level adjustment · b0224f36
      Mike Galbraith authored
      commit 83929cce upstream.
      
      Michael Kerrisk reported:
      
      > Regarding the previous paragraph...  My tests indicate
      > that writing *any* value to the autogroup [nice priority level]
      > file causes the task group to get a lower priority.
      
      Because autogroup didn't call the then meaningless scale_load()...
      
      Autogroup nice level adjustment has been broken ever since load
      resolution was increased for 64-bit kernels.  Use scale_load() to
      scale group weight.
      
      Michael Kerrisk tested this patch and confirmed that it fixes the problem:
      
      > Applied and tested against 4.9-rc6 on an Intel u7 (4 cores).
      > Test setup:
      >
      > Terminal window 1: running 40 CPU burner jobs
      > Terminal window 2: running 40 CPU burner jobs
      > Terminal window 3: running  1 CPU burner job
      >
      > Demonstrated that:
      > * Writing "0" to the autogroup file for TW1 now causes no change
      >   to the rate at which the processes on the terminal consume CPU.
      > * Writing -20 to the autogroup file for TW1 caused those processes
      >   to get the lion's share of CPU while TW2 and TW3 get a tiny amount.
      > * Writing -20 to the autogroup files for TW1 and TW3 allowed the
      >   process on TW3 to get as much CPU as it was getting when
      >   the autogroup nice values for both terminals were 0.
      Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-man <linux-man@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
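The effect of the missing scaling can be sketched in a few lines of user-space C. This is a simplified model and not the kernel code: the 10-bit shift, the three table entries, and the helper names `shares_unscaled()` and `shares_scaled()` are assumptions chosen for illustration.

```c
#include <assert.h>

/* Assumption: 64-bit kernels keep 10 extra bits of load resolution. */
#define FIXEDPOINT_SHIFT 10

/* Scale a user-visible weight up to the internal load resolution. */
unsigned long scale_load(unsigned long weight)
{
    return weight << FIXEDPOINT_SHIFT;
}

/* Illustrative three-entry subset of a nice-to-weight table. */
unsigned long nice_to_weight(int nice)
{
    assert(nice == -20 || nice == 0 || nice == 19);
    if (nice == -20)
        return 88761;
    if (nice == 0)
        return 1024;
    return 15;
}

/* Weight every group starts with in this model: nice 0, already scaled. */
unsigned long default_shares(void)
{
    return scale_load(nice_to_weight(0));
}

/* Before the fix (as modelled here): the table value is used unscaled. */
unsigned long shares_unscaled(int nice)
{
    return nice_to_weight(nice);
}

/* After the fix: the table value is scaled like every other group weight. */
unsigned long shares_scaled(int nice)
{
    return scale_load(nice_to_weight(nice));
}
```

In this model even nice -20 yields an unscaled weight far below the scaled default, which is consistent with the report that writing any value lowered the group's priority.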
    • scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put() · 2a477999
      Mauricio Faria de Oliveira authored
      commit 2319f847 upstream.
      
      The BUG_ON() recently introduced in lpfc_sli_ringtxcmpl_put() is hit in
      the lpfc_els_abort() > lpfc_sli_issue_abort_iotag() >
      lpfc_sli_abort_iotag_issue() function path [similar names], due to
      'piocb->vport == NULL':
      
      	BUG_ON(!piocb || !piocb->vport);
      
      This happens because lpfc_sli_abort_iotag_issue() doesn't set the
      'abtsiocbp->vport' pointer -- but this is not the problem.
      
      Previously, lpfc_sli_ringtxcmpl_put() accessed 'piocb->vport' only if
      'piocb->iocb.ulpCommand' is neither CMD_ABORT_XRI_CN nor
      CMD_CLOSE_XRI_CN, which are the only possible values for
      lpfc_sli_abort_iotag_issue():
      
          lpfc_sli_ringtxcmpl_put():
      
              if ((unlikely(pring->ringno == LPFC_ELS_RING)) &&
                 (piocb->iocb.ulpCommand != CMD_ABORT_XRI_CN) &&
                 (piocb->iocb.ulpCommand != CMD_CLOSE_XRI_CN) &&
                  (!(piocb->vport->load_flag & FC_UNLOADING)))
      
          lpfc_sli_abort_iotag_issue():
      
              if (phba->link_state >= LPFC_LINK_UP)
                      iabt->ulpCommand = CMD_ABORT_XRI_CN;
              else
                      iabt->ulpCommand = CMD_CLOSE_XRI_CN;
      
      So, this function path would not have hit this possible NULL pointer
      dereference before.
      
      In order to fix this regression, move the second part of the BUG_ON()
      check prior to the pointer dereference that it does check for.
      
      For reference, this is the stack trace observed. The problem happened
      because an unsolicited event was received - a PLOGI was received after
      our PLOGI was issued but not yet complete, so the discovery state
      machine goes on to sw-abort our PLOGI.
      
          kernel BUG at drivers/scsi/lpfc/lpfc_sli.c:1326!
          Oops: Exception in kernel mode, sig: 5 [#1]
          <...>
          NIP [...] lpfc_sli_ringtxcmpl_put+0x1c/0xf0 [lpfc]
          LR  [...] __lpfc_sli_issue_iocb_s4+0x188/0x200 [lpfc]
          Call Trace:
          [...] [...] __lpfc_sli_issue_iocb_s4+0xb0/0x200 [lpfc] (unreliable)
          [...] [...] lpfc_sli_issue_abort_iotag+0x2b4/0x350 [lpfc]
          [...] [...] lpfc_els_abort+0x1a8/0x4a0 [lpfc]
          [...] [...] lpfc_rcv_plogi+0x6d4/0x700 [lpfc]
          [...] [...] lpfc_rcv_plogi_plogi_issue+0xd8/0x1d0 [lpfc]
          [...] [...] lpfc_disc_state_machine+0xc0/0x2b0 [lpfc]
          [...] [...] lpfc_els_unsol_buffer+0xcc0/0x26c0 [lpfc]
          [...] [...] lpfc_els_unsol_event+0xa8/0x220 [lpfc]
          [...] [...] lpfc_complete_unsol_iocb+0xb8/0x138 [lpfc]
          [...] [...] lpfc_sli4_handle_received_buffer+0x6a0/0xec0 [lpfc]
          [...] [...] lpfc_sli_handle_slow_ring_event_s4+0x1c4/0x240 [lpfc]
          [...] [...] lpfc_sli_handle_slow_ring_event+0x24/0x40 [lpfc]
          [...] [...] lpfc_do_work+0xd88/0x1970 [lpfc]
          [...] [...] kthread+0x108/0x130
          [...] [...] ret_from_kernel_thread+0x5c/0xbc
          <...>
      
      Fixes: 22466da5 ("lpfc: Fix possible NULL pointer dereference")
      Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
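The idea of moving the 'vport' check next to the dereference it protects can be sketched as follows. This is a user-space model under stated assumptions: the types `iocb_model` and `vport_model`, the function `txcmpl_put_check()`, and the return value -1 standing in for a BUG are all hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

enum cmd_model { CMD_OTHER, CMD_ABORT_XRI_CN, CMD_CLOSE_XRI_CN };

struct vport_model { int unloading; };

struct iocb_model {
    enum cmd_model command;
    struct vport_model *vport;   /* may legitimately be NULL for aborts */
};

/*
 * Returns 1 if the (modelled) timer would be armed, 0 if not,
 * and -1 where the real code would hit a BUG.
 */
int txcmpl_put_check(const struct iocb_model *piocb, int is_els_ring)
{
    if (piocb == NULL)
        return -1;                       /* first half of the check stays up front */

    if (is_els_ring &&
        piocb->command != CMD_ABORT_XRI_CN &&
        piocb->command != CMD_CLOSE_XRI_CN) {
        if (piocb->vport == NULL)
            return -1;                   /* second half, right before the dereference */
        return !piocb->vport->unloading;
    }
    return 0;                            /* abort/close path never touches vport */
}
```

With this ordering, an abort request that carries no 'vport' passes through, while any other command without a 'vport' is still caught before the pointer is used.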
    • device-dax: fix private mapping restriction, permit read-only · ac65fe0b
      Dan Williams authored
      commit 325896ff upstream.
      
      Hugh notes in response to commit 4cb19355 "device-dax: fail all
      private mapping attempts":
      
        "I think that is more restrictive than you intended: haven't tried, but I
        believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no
        way to mmap /dev/dax without write permission to it."
      
      Indeed it does restrict read-only mappings.  Switch to checking
      VM_MAYSHARE, not VM_SHARED.
      
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Fixes: 4cb19355 ("device-dax: fail all private mapping attempts")
      Reported-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
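The difference between the two flag checks can be illustrated with a small model. It assumes, as a simplification, that a shared mapping of a descriptor without write permission keeps VM_MAYSHARE but loses VM_SHARED; the flag values and the helpers `vm_flags_for()`, `allowed_old()` and `allowed_new()` are illustrative names, not kernel interfaces.

```c
#include <assert.h>

/* Flag values chosen for this sketch only. */
#define VM_SHARED   0x08u
#define VM_MAYSHARE 0x80u

/* Model of how the mapping flags end up for a given mmap() request. */
unsigned int vm_flags_for(int map_shared, int fd_writable)
{
    unsigned int flags = 0;

    if (map_shared) {
        flags |= VM_SHARED | VM_MAYSHARE;
        if (!fd_writable)
            flags &= ~VM_SHARED;   /* read-only fd: shared, but never writable */
    }
    return flags;
}

/* Old check: rejects everything that is not VM_SHARED. */
int allowed_old(unsigned int flags)
{
    return (flags & VM_SHARED) != 0;
}

/* New check: rejects only mappings that are truly private. */
int allowed_new(unsigned int flags)
{
    return (flags & VM_MAYSHARE) != 0;
}
```

Under these assumptions a MAP_SHARED mapping of an O_RDONLY descriptor fails the old check and passes the new one, while private mappings are rejected by both.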
    • locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() · 2386c6b1
      Thomas Gleixner authored
      commit 1be5d4fa upstream.
      
      While debugging the rtmutex unlock vs. dequeue race, Will suggested using
      READ_ONCE() in rt_mutex_owner() as it might race against the
      cmpxchg_release() in unlock_rt_mutex_safe().
      
      Will: "It's a minor thing which will most likely not matter in practice"
      
      A careful search did not unearth an actual problem in today's code, but it's
      better to be safe than surprised.
      Suggested-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • locking/rtmutex: Prevent dequeue vs. unlock race · 7b2347c8
      Thomas Gleixner authored
      commit dbb26055 upstream.
      
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters(), which does
      the RMW to clear the waiters bit unconditionally when there are no waiters
      in the rtmutex's rbtree.
      
      This can be fatal: A concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in the
      middle of the RMW operation, then the previous owner, which just unlocked
      the rtmutex, is set as the owner again when the write takes place after
      the successful cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
      
      It's remarkable that the test program provided by David triggers on ARM64
      and MIPS64 really quickly, but it refuses to reproduce on x86-64, while the
      problem exists there as well. That refusal might explain why this was not
      discovered earlier despite the bug existing from day one of the rtmutex
      implementation more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing the
      information which made it possible to decode this subtle problem.
      Reported-by: David Daney <ddaney@caviumnetworks.com>
      Tested-by: David Daney <david.daney@cavium.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
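The race and the fix can be replayed deterministically in a single-threaded model. This is only a sketch: the `rtmutex_model` type, the split of the old fixup into a read half and a write half, and the plain comparison standing in for cmpxchg() are assumptions made so the interleaving can be written down step by step.

```c
#include <assert.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)

struct rtmutex_model { uintptr_t owner; };

/* Fast-path unlock: succeeds only if owner is exactly `task`, waiters bit clear. */
int fast_unlock(struct rtmutex_model *l, uintptr_t task)
{
    if (l->owner != task)
        return 0;
    l->owner = 0;
    return 1;
}

/* Old fixup, split in two so an unlock can be placed between read and write. */
uintptr_t old_fixup_read(const struct rtmutex_model *l)
{
    return l->owner & ~HAS_WAITERS;
}

void old_fixup_write(struct rtmutex_model *l, uintptr_t value)
{
    l->owner = value;
}

/* New fixup: write only if the waiters bit is actually set. */
void new_fixup(struct rtmutex_model *l)
{
    if (l->owner & HAS_WAITERS)
        l->owner &= ~HAS_WAITERS;
}

/* Replay of the bad interleaving: returns the owner word left behind. */
uintptr_t replay_old_fixup(void)
{
    struct rtmutex_model l = { 0x1000 };      /* owner T1, waiters bit clear */
    uintptr_t tmp = old_fixup_read(&l);       /* second fixup reads T1 */

    fast_unlock(&l, 0x1000);                  /* T1 unlocks in between */
    old_fixup_write(&l, tmp);                 /* stale T1 written back */
    return l.owner;
}

/* Same starting state with the new fixup: no write, so nothing to corrupt. */
uintptr_t replay_new_fixup(void)
{
    struct rtmutex_model l = { 0x1000 };

    new_fixup(&l);                            /* bit already clear: no store */
    fast_unlock(&l, 0x1000);
    return l.owner;
}
```

In the model, the old sequence leaves the released lock marked as owned by T1 again, while the new fixup performs no store when the bit is clear. When the bit is set, the fast-path unlock cannot succeed, so the two paths do not overlap.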
    • zram: restrict add/remove attributes to root only · bed4eef2
      Sergey Senozhatsky authored
      commit 5c7e9ccd upstream.
      
      The zram hot_add sysfs attribute is a very 'special' attribute: reading
      from it creates a new uninitialized zram device.  At the moment this file
      can, by mistake, be read by a 'normal' user, while only root must be able
      to create a new zram device.  Therefore the hot_add attribute must have
      S_IRUSR mode, not S_IRUGO.
      
      [akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols]
      Fixes: 6566d1a3 ("zram: add dynamic device add/remove functionality")
      Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: Steven Allen <steven@stebalien.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
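The difference between the two modes comes down to which read bits are set. A minimal user-space sketch, assuming S_IRUGO is the union of the three POSIX read bits (it is rebuilt here as `MODE_IRUGO` because user-space headers do not provide it) and ignoring capabilities entirely:

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stand-in for the kernel's S_IRUGO: readable by user, group and others. */
#define MODE_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Nonzero if the owning user may read a file with this mode. */
int owner_can_read(mode_t mode)
{
    return (mode & S_IRUSR) != 0;
}

/* Nonzero if a user who is neither owner nor group member may read. */
int other_can_read(mode_t mode)
{
    return (mode & S_IROTH) != 0;
}
```

With `MODE_IRUGO` any user passes the permission-bit check, whereas with S_IRUSR only the owner does; for a sysfs attribute the owner is normally root.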
    • parisc: Fix TLB related boot crash on SMP machines · 4fb7569c
      Helge Deller authored
      commit 24d0492b upstream.
      
      At bootup we run measurements to calculate the best threshold for when we
      should be using full TLB flushes instead of just flushing a specific number
      of TLB entries.  This performance test is run over the kernel text segment.
      
      But running this TLB performance test on the kernel text segment turned out to
      crash some SMP machines when the kernel text pages were mapped as huge pages.
      
      To avoid those crashes this patch simply skips this test on some SMP machines
      and calculates an optimal threshold based on the maximum number of available
      TLB entries and number of online CPUs.
      
      On the technical side, this seems to happen:
      The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
      entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
      instruction seems to work without problems even if the pages were mapped as
      huge pages.  But on SMP systems the TLB purge instruction is broadcast to
      other CPUs. Those CPUs then crash the machine because the page size is not as
      expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
      problem, because the required cache coherency prohibits the use of huge pages
      at all.  Sadly I didn't find any documentation about this behaviour, so this
      finding is purely based on testing with physical SMP machines (A500-44 and
      J5000, both were 2-way boxes).
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asm · b81e5db4
      John David Anglin authored
      commit febe4296 upstream.
      
      We have four routines in pacache.S that use temporary alias pages:
      copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and
      flush_icache_page_asm().  copy_user_page_asm() and clear_user_page_asm()
      don't purge the TLB entry used for the operation.
      flush_dcache_page_asm() and flush_icache_page_asm() do purge the entry.
      
      Presumably, this was thought to optimize TLB use.  However, the
      operation is quite heavyweight on PA 1.X processors as we need to take
      the TLB lock and a TLB broadcast is sent to all processors.
      
      This patch removes the purges from flush_dcache_page_asm() and
      flush_icache_page_asm().
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Purge TLB before setting PTE · 7e8f68aa
      John David Anglin authored
      commit c78e710c upstream.
      
      The attached change interchanges the order of purging the TLB and
      setting the corresponding page table entry.  TLB purges are strongly
      ordered.  It occurred to me one night that setting the PTE first might
      have subtle ordering issues on SMP machines and cause random memory
      corruption.
      
      A TLB lock guards the insertion of user TLB entries.  So after the TLB
      is purged, a new entry can't be inserted until the lock is released.
      This ensures that the new PTE value is used when the lock is released.
      
      Since making this change, no random segmentation faults have been
      observed on the Debian hppa buildd servers.
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fuse: fix clearing suid, sgid for chown() · 6e284445
      Miklos Szeredi authored
      commit c01638f5 upstream.
      
      Basically, the pjdfstests set the mode of a file to 06555, and then
      chown it (as root) to a new uid/gid. Prior to commit a09f99ed ("fuse:
      fix killing s[ug]id in setattr"), fuse would send down a setattr with both
      the uid/gid change and a new mode.  Now, it just sends down the uid/gid
      change.
      
      Technically this is NOTABUG, since POSIX doesn't _require_ that we clear
      these bits for a privileged process, but Linux (wisely) has done that and I
      think we don't want to change that behavior here.
      
      This is caused by the use of should_remove_suid(), which will always return
      0 when the process has CAP_FSETID.
      
      In fact we really don't need to be calling should_remove_suid() at all,
      since we've already been told that we should remove the suid; we just
      don't want to use a (very) stale mode for that.
      
      This patch should fix the above as well as simplify the logic.
      Reported-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: a09f99ed ("fuse: fix killing s[ug]id in setattr")
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
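The simplified logic described above, clearing the bits from a fresh mode once the caller has already decided they must go, might look like this in a user-space sketch. The flag names `KILL_SUID` and `KILL_SGID` and the function `mode_after_kill()` are hypothetical stand-ins and not the fuse code.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical request flags: the caller has already decided what to clear. */
#define KILL_SUID 0x1u
#define KILL_SGID 0x2u

/*
 * Compute the mode to send down, starting from a freshly fetched mode.
 * No privilege check is made here: the decision was taken by the caller.
 */
mode_t mode_after_kill(mode_t fresh_mode, unsigned int kill)
{
    mode_t mode = fresh_mode;

    if (kill & KILL_SUID)
        mode &= ~(mode_t)S_ISUID;
    if (kill & KILL_SGID)
        mode &= ~(mode_t)S_ISGID;
    return mode;
}
```

For the test case in the commit message, a file with mode 06555 would end up as 0555 when both flags are requested, regardless of the capabilities of the calling process.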
    • powerpc/boot: Fix build failure in 32-bit boot wrapper · e70d6d2d
      Ben Hutchings authored
      commit 10c77dba upstream.
      
      OPAL is not callable from 32-bit mode and the assembly code for it
      may not even build (depending on how binutils was configured).
      
      References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpcspe&ver=4.8.7-1&stamp=1479203712
      Fixes: 656ad58e ("powerpc/boot: Add OPAL console to epapr wrappers")
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/mm: Fix lazy icache flush on pre-POWER5 · a82ad493
      Benjamin Herrenschmidt authored
      commit dd7b2f03 upstream.
      
      On 64-bit CPUs with no-execute support and non-snooping icache, such as
      970 or POWER4, we have a software mechanism to ensure coherency of the
      cache (using exec faults when needed).
      
      This was broken due to a logic error when the code was rewritten
      from assembly to C.  Previously, the assembly code did:
      
        BEGIN_FTR_SECTION
               mr      r4,r30
               mr      r5,r7
               bl      hash_page_do_lazy_icache
        END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE)
      
      Which tests that:
         (cpu_features & (NOEXECUTE | COHERENT_ICACHE)) == NOEXECUTE
      
      Which says that the current cpu does have NOEXECUTE, but does not have
      COHERENT_ICACHE.
      
      Fixes: 91f1da99 ("powerpc/mm: Convert 4k hash insert to C")
      Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C")
      Fixes: a43c0eb8 ("powerpc/mm: Convert 4k insert from asm to C")
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log verbosification]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
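The mask-and-compare test quoted above is equivalent to "has NOEXECUTE and does not have COHERENT_ICACHE". A small sketch makes that explicit; the bit values and both function names are made up for illustration and do not match the kernel's feature constants.

```c
#include <assert.h>

/* Hypothetical bit positions, for illustration only. */
#define FTR_NOEXECUTE        0x1ul
#define FTR_COHERENT_ICACHE  0x2ul

/* Mask-and-compare form, mirroring the feature-section test. */
int needs_lazy_icache_flush(unsigned long features)
{
    unsigned long mask = FTR_NOEXECUTE | FTR_COHERENT_ICACHE;

    return (features & mask) == FTR_NOEXECUTE;
}

/* The same condition spelled out as two separate feature tests. */
int needs_lazy_icache_flush_spelled_out(unsigned long features)
{
    return (features & FTR_NOEXECUTE) != 0 &&
           (features & FTR_COHERENT_ICACHE) == 0;
}
```

Both forms agree on all four combinations of the two bits; only a CPU with no-execute support and a non-coherent icache takes the lazy flush path.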
    • powerpc/eeh: Fix deadlock when PE frozen state can't be cleared · 84b36287
      Andrew Donnellan authored
      commit 409bf7f8 upstream.
      
      In eeh_reset_device(), we take the pci_rescan_remove_lock immediately
      after we call eeh_reset_pe() to reset the PCI controller. We then call
      eeh_clear_pe_frozen_state(), which can return an error. In this case, we
      bail out of eeh_reset_device() without calling pci_unlock_rescan_remove().
      
      Add a call to pci_unlock_rescan_remove() in the eeh_clear_pe_frozen_state()
      error path so that we don't cause a deadlock later on.
      Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
      Fixes: 78954700 ("powerpc/eeh: Avoid I/O access during PE reset")
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
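The shape of the fix, releasing the lock on the early-return error path, can be sketched with a counter standing in for the lock. Everything here is a model under stated assumptions: `rescan_lock()`, `rescan_unlock()`, `rescan_lock_held()` and `reset_device_model()` are hypothetical names, and the error code passed in simulates a failing frozen-state clear.

```c
#include <assert.h>

/* A counter stands in for the real lock; nonzero means "held". */
static int rescan_lock_depth;

void rescan_lock(void)
{
    rescan_lock_depth++;
}

void rescan_unlock(void)
{
    assert(rescan_lock_depth > 0);
    rescan_lock_depth--;
}

int rescan_lock_held(void)
{
    return rescan_lock_depth > 0;
}

/*
 * Model of the reset path: `clear_frozen_rc` simulates the result of
 * clearing the frozen state (0 on success, negative on error).
 */
int reset_device_model(int clear_frozen_rc)
{
    rescan_lock();

    if (clear_frozen_rc != 0) {
        rescan_unlock();        /* the call the error path was missing */
        return clear_frozen_rc;
    }

    /* ... restoring and rescanning devices would happen here ... */

    rescan_unlock();
    return 0;
}
```

Without the unlock in the error branch, the counter would stay above zero after a failed reset, which models the later deadlock the commit describes.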
  2. 10 Dec, 2016 27 commits