1. 11 Oct, 2019 40 commits
    • arm64: Add sysfs vulnerability show for spectre-v1 · 047aac35
      Mian Yousaf Kaukab authored
      [ Upstream commit 3891ebcc ]
      
      spectre-v1 has been mitigated and the mitigation is always active.
      Report this to userspace via sysfs.

      Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
      Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
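      Userspace reads this status from the generic CPU vulnerabilities directory in sysfs. A minimal sketch of such a reader, assuming the standard path /sys/devices/system/cpu/vulnerabilities/spectre_v1; the helper name read_vuln_status() is hypothetical, and the exact status string depends on the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysfs vulnerability file.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_vuln_status(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

      Called with "/sys/devices/system/cpu/vulnerabilities/spectre_v1" on a kernel carrying this patch, it should return a short status line describing the mitigation.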
    • arm64: fix SSBS sanitization · edfc0266
      Mark Rutland authored
      [ Upstream commit f54dada8 ]
      
      In valid_user_regs() we treat SSBS as a RES0 bit, and consequently it is
      unexpectedly cleared when we restore a sigframe or fiddle with GPRs via
      ptrace.
      
      This patch fixes valid_user_regs() to account for this, updating the
      function to refer to the latest ARM ARM (ARM DDI 0487D.a). For AArch32
      tasks, SSBS appears in bit 23 of SPSR_EL1, matching its position in the
      AArch32-native PSR format, and we don't need to translate it as we have
      to for DIT.
      
      There are no other bit assignments that we need to account for today.
      As the recent documentation describes the DIT bit, we can drop our
      comment regarding DIT.
      
      While removing SSBS from the RES0 masks, existing inconsistent
      whitespace is corrected.
      
      Fixes: d71be2b6 ("arm64: cpufeature: Detect SSBS and advertise to userspace")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
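      To see why listing SSBS in a RES0 mask silently clears it, consider a minimal sketch of RES0 sanitization. The mask values here are hypothetical and simplified, not the kernel's real SPSR masks; only the SSBS position for AArch64 tasks (bit 12, the value of PSR_SSBS_BIT in the arm64 uapi headers) is real.

```c
#include <assert.h>
#include <stdint.h>

/* PSTATE.SSBS as seen in SPSR_EL1 for AArch64 tasks. */
#define SSBS_BIT_AARCH64  (UINT64_C(1) << 12)

/* Hypothetical masks: the buggy one still treats SSBS as RES0. */
#define RES0_MASK_FIXED   UINT64_C(0xffffffff0000e000)
#define RES0_MASK_BUGGY   (RES0_MASK_FIXED | SSBS_BIT_AARCH64)

/* Clear every bit the mask declares reserved-as-zero, as a
 * valid_user_regs()-style check would before accepting user state. */
static uint64_t sanitize_spsr(uint64_t spsr, uint64_t res0_mask)
{
	return spsr & ~res0_mask;
}
```

      With the buggy mask, an SPSR carrying SSBS comes back with bit 12 cleared; with SSBS removed from the mask, the value passes through unchanged.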
    • arm64: docs: Document SSBS HWCAP · 09c22781
      Will Deacon authored
      [ Upstream commit ee911761 ]
      
      We advertise the MRS/MSR instructions for toggling SSBS at EL0 using an
      HWCAP, so document it along with the others.

      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
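      Userspace is expected to test this HWCAP before using the SSBS MRS/MSR instructions. A minimal sketch, assuming glibc's getauxval() and the arm64 uapi value HWCAP_SSBS = (1 << 28); the helper name is hypothetical, and on non-arm64 builds it simply reports 0.

```c
#include <assert.h>
#include <sys/auxv.h>

#if defined(__aarch64__) && !defined(HWCAP_SSBS)
#define HWCAP_SSBS (1 << 28)	/* arm64 uapi value, for older headers */
#endif

/* Hypothetical helper: 1 if the kernel advertises SSBS control at EL0. */
static int cpu_has_ssbs(void)
{
#if defined(__aarch64__)
	return (getauxval(AT_HWCAP) & HWCAP_SSBS) != 0;
#else
	return 0;	/* HWCAP_SSBS is only meaningful on arm64 */
#endif
}
```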
    • KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe · a59d42ac
      Will Deacon authored
      [ Upstream commit 7c36447a ]
      
      When running without VHE, it is necessary to set SCTLR_EL2.DSSBS if SSBD
      has been forcefully disabled on the kernel command-line.

      Acked-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3 · 1eaff33e
      Will Deacon authored
      [ Upstream commit 8f04e8e6 ]
      
      On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD
      state without needing to call into firmware.
      
      This patch hooks into the existing SSBD infrastructure so that SSBS is
      used on CPUs that support it, but it's all made horribly complicated by
      the very real possibility of big/little systems that don't uniformly
      provide the new capability.

      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • riscv: Avoid interrupts being erroneously enabled in handle_exception() · d286a374
      Vincent Chen authored
      [ Upstream commit c82dd6d0 ]
      
      When handle_exception() handles an exception, interrupts are
      unconditionally enabled after the context save finishes. However,
      this may erroneously enable interrupts if they were disabled
      before entering handle_exception().
      
      For example, one of the WARN_ON() conditions is satisfied in the scheduler
      while interrupts are disabled and rq.lock is held. The WARN_ON will
      trigger a break exception, and handle_exception() will enable
      interrupts before entering do_trap_break(). If a timer interrupt is
      pending at that point, it will be taken as soon as interrupts are enabled.
      This may cause a deadlock if the timer ISR tries to take rq.lock
      again.
      
      Hence, handle_exception() should only enable interrupts when
      sstatus.SPIE is 1.
      
      This patch was tested on the HiFive Unleashed board.

      Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
      Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
      [paul.walmsley@sifive.com: updated to apply]
      Fixes: bcae803a ("RISC-V: Enable IRQ during exception handling")
      Cc: David Abdurachmanov <david.abdurachmanov@sifive.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
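      The decision handle_exception() now makes can be modeled in C. This is a minimal sketch, not the actual assembly change: it assumes the standard RISC-V sstatus bit positions (SIE is bit 1, SPIE is bit 5), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_SIE   (UINT64_C(1) << 1)  /* supervisor interrupt enable */
#define SR_SPIE  (UINT64_C(1) << 5)  /* SIE value before the trap was taken */

/* Hypothetical model of sstatus after the context save: SIE is set
 * again only if interrupts were enabled when the trap was taken. */
static uint64_t sstatus_after_save(uint64_t sstatus_at_trap)
{
	if (sstatus_at_trap & SR_SPIE)
		return sstatus_at_trap | SR_SIE;
	return sstatus_at_trap;	/* interrupted code had IRQs off: keep them off */
}
```

      On trap entry the hardware copies SIE into SPIE and clears SIE, so SPIE is the right bit to consult: a WARN_ON() hit with interrupts disabled arrives with SPIE = 0 and leaves them disabled.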
    • perf stat: Reset previous counts on repeat with interval · 5b67a472
      Srikar Dronamraju authored
      [ Upstream commit b63fd11c ]
      
      When using 'perf stat' with the repeat and interval options, it shows wrong
      values for events.
      
      The wrong values will be shown for the first interval on the second and
      subsequent repetitions.
      
      Without the fix:
      
        # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
      
           2.000282489                 53      faults
           2.000282489                513      sched:sched_switch
           4.005478208              3,721      faults
           4.005478208              2,666      sched:sched_switch
           5.025470933                395      faults
           5.025470933              1,307      sched:sched_switch
           2.009602825 1,84,46,74,40,73,70,95,47,520      faults 		<------
           2.009602825 1,84,46,74,40,73,70,95,49,568      sched:sched_switch  <------
           4.019612206              4,730      faults
           4.019612206              2,746      sched:sched_switch
           5.039615484              3,953      faults
           5.039615484              1,496      sched:sched_switch
           2.000274620 1,84,46,74,40,73,70,95,47,520      faults		<------
           2.000274620 1,84,46,74,40,73,70,95,47,520      sched:sched_switch	<------
           4.000480342              4,282      faults
           4.000480342              2,303      sched:sched_switch
           5.000916811              1,322      faults
           5.000916811              1,064      sched:sched_switch
        #
      
      prev_raw_counts is allocated when using intervals. This is used when
      calculating the difference in the counts of events when using interval.
      
      The current counts are stored in prev_raw_counts to calculate the
      differences in the next iteration.
      
      On the first interval of the second and subsequent repetitions,
      prev_raw_counts would be the values stored in the last interval of the
      previous repetitions, while the current counts will only be for the
      first interval of the current repetition.
      
      Hence events can show up as huge numbers: the unsigned difference wraps
      around when the current count is smaller than the stored one.
      
      Fix this by resetting prev_raw_counts whenever perf stat repeats the
      command.
      
      With the fix:
      
        # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
      
           2.019349347              2,597      faults
           2.019349347              2,753      sched:sched_switch
           4.019577372              3,098      faults
           4.019577372              2,532      sched:sched_switch
           5.019415481              1,879      faults
           5.019415481              1,356      sched:sched_switch
           2.000178813              8,468      faults
           2.000178813              2,254      sched:sched_switch
           4.000404621              7,440      faults
           4.000404621              1,266      sched:sched_switch
           5.040196079              2,458      faults
           5.040196079                556      sched:sched_switch
           2.000191939              6,870      faults
           2.000191939              1,170      sched:sched_switch
           4.000414103                541      faults
           4.000414103                902      sched:sched_switch
           5.000809863                450      faults
           5.000809863                364      sched:sched_switch
        #
      
      Committer notes:
      
      This has been broken since the cset that introduced the --interval
      feature, i.e. --repeat + --interval wasn't tested at that point. Add the
      Fixes tag so that automatic scripts can pick this up.
      
      Fixes: 13370a9b ("perf stat: Add interval printing")
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: stable@vger.kernel.org # v3.9+
      Link: http://lore.kernel.org/lkml/20190904094738.9558-2-srikar@linux.vnet.ibm.com
      [ Fixed up conflicts with libperf, i.e. some perf_{evsel,evlist} lost the 'perf' prefix ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
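      The bogus readings are just below 2^64, the signature of unsigned 64-bit wraparound. A minimal sketch of interval deltas with a stale "previous" snapshot; the struct and function names are hypothetical, not perf's actual data structures, and the sample values in the usage are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_EVENTS 2

/* Hypothetical stand-in for perf's per-event running totals. */
struct counts {
	uint64_t val[NR_EVENTS];
};

/* Interval delta: current minus previous, then remember current.
 * The unsigned subtraction wraps if prev is larger than cur. */
static void interval_delta(const struct counts *cur, struct counts *prev,
			   struct counts *out)
{
	for (int i = 0; i < NR_EVENTS; i++) {
		out->val[i] = cur->val[i] - prev->val[i];
		prev->val[i] = cur->val[i];
	}
}

/* The fix, in spirit: forget the previous run's totals on each repeat. */
static void reset_prev(struct counts *prev)
{
	memset(prev, 0, sizeof(*prev));
}
```

      If the previous repetition ended with totals {5000, 3000} and the new run's first interval sees {904, 952}, the deltas come out as 2^64 - 4096 and 2^64 - 2048; calling reset_prev() when a repetition starts yields {904, 952}.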
    • perf tools: Fix segfault in cpu_cache_level__read() · 15c57bf9
      Jiri Olsa authored
      [ Upstream commit 0216234c ]
      
      We release the wrong pointer on the error path in the
      cpu_cache_level__read() function, leading to a segfault:
      
        (gdb) r record ls
        Starting program: /root/perf/tools/perf/perf record ls
        ...
        [ perf record: Woken up 1 times to write data ]
        double free or corruption (out)
      
        Thread 1 "perf" received signal SIGABRT, Aborted.
        0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6
        (gdb) bt
        #0  0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6
        #1  0x00007ffff7443bac in abort () from /lib64/power9/libc.so.6
        #2  0x00007ffff74af8bc in __libc_message () from /lib64/power9/libc.so.6
        #3  0x00007ffff74b92b8 in malloc_printerr () from /lib64/power9/libc.so.6
        #4  0x00007ffff74bb874 in _int_free () from /lib64/power9/libc.so.6
        #5  0x0000000010271260 in __zfree (ptr=0x7fffffffa0b0) at ../../lib/zalloc..
        #6  0x0000000010139340 in cpu_cache_level__read (cache=0x7fffffffa090, cac..
        #7  0x0000000010143c90 in build_caches (cntp=0x7fffffffa118, size=<optimiz..
        ...
      
      Release the proper pointer instead.
      
      Fixes: 720e98b5 ("perf tools: Add perf data cache feature")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org # v4.6+
      Link: http://lore.kernel.org/lkml/20190912105235.10689-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
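      The rule behind this fix is that an error path must free exactly the buffers that were allocated, through the same pointers that received them. A minimal sketch of that pattern; the struct, its fields and zfree_ptr() are hypothetical, loosely modeled on perf's zfree(), and fail_at exists only to inject failures for testing.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache descriptor with heap-allocated string fields. */
struct cache_info {
	char *type;
	char *size;
	char *map;
};

/* Free *pp and clear it, so a repeated free becomes a no-op. */
static void zfree_ptr(char **pp)
{
	free(*pp);
	*pp = NULL;
}

/* Fill all fields; on failure, unwind only what was allocated.
 * fail_at selects which allocation "fails" (0 = none). */
static int cache_info_read(struct cache_info *c, int fail_at)
{
	memset(c, 0, sizeof(*c));

	c->type = fail_at == 1 ? NULL : strdup("Data");
	if (!c->type)
		return -1;

	c->size = fail_at == 2 ? NULL : strdup("32K");
	if (!c->size)
		goto out_free_type;

	c->map = fail_at == 3 ? NULL : strdup("0-1");
	if (!c->map)
		goto out_free_size;

	return 0;

out_free_size:
	zfree_ptr(&c->size);
out_free_type:
	zfree_ptr(&c->type);
	return -1;
}
```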
    • tick: broadcast-hrtimer: Fix a race in bc_set_next · e5331c37
      Balasubramani Vivekanandan authored
      [ Upstream commit b9023b91 ]
      
      When a cpu requests broadcasting, before starting the tick broadcast
      hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
      using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
      the required synchronization when the callback is active on another core.
      
      The callback could have already executed tick_handle_oneshot_broadcast()
      and could have also returned, but there is still a small time window in
      which hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
      without doing anything, but the next_event of the tick broadcast clock
      device is already set to a timeout value.
      
      In the race condition diagram below, CPU #1 is running the timer callback
      and CPU #2 is entering idle state and so calls bc_set_next().
      
      In the worst case, the next_event will contain an expiry time, but the
      hrtimer will not be started, which happens when the racing callback returns
      HRTIMER_NORESTART. The hrtimer might never recover if all further requests
      from the CPUs to subscribe to tick broadcast have a timeout greater than the
      next_event of the tick broadcast clock device. This leads to cascading
      failures, which are finally noticed as RCU stall warnings.

      Here is a depiction of the race condition:
      
      CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                        and subscribe to
                                                        tick broadcast)
      ---------------------                             ---------------------
      
      __run_hrtimer()                                   tick_broadcast_enter()
      
        bc_handler()                                      __tick_broadcast_oneshot_control()
      
          tick_handle_oneshot_broadcast()
      
            raw_spin_lock(&tick_broadcast_lock);
      
            dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
            //next_event for tick broadcast clock
            set to KTIME_MAX since no other cores
            subscribed to tick broadcasting
      
            raw_spin_unlock(&tick_broadcast_lock);
      
          if (dev->next_event == KTIME_MAX)
            return HRTIMER_NORESTART
          // callback function exits without
             restarting the hrtimer                      //tick_broadcast_lock acquired
                                                         raw_spin_lock(&tick_broadcast_lock);
      
                                                         tick_broadcast_set_event()
      
                                                           clockevents_program_event()
      
                                                             dev->next_event = expires;
      
                                                             bc_set_next()
      
                                                               hrtimer_try_to_cancel()
                                                               //returns -1 since the timer
                                                               callback is active. Exits without
                                                               restarting the timer
        cpu_base->running = NULL;
      
      The comment that hrtimer cannot be armed from within the callback is
      wrong. It is fine to start the hrtimer from within the callback. Also it is
      safe to start the hrtimer from the enter/exit idle code while the broadcast
      handler is active. The enter/exit idle code and the broadcast handler are
      synchronized using tick_broadcast_lock. So there is no need for the
      existing try to cancel logic. All this can be removed which will eliminate
      the race condition as well.
      
      Fixes: 5d1638ac ("tick: Introduce hrtimer based broadcast")
      Originally-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure · 140acbb0
      Steven Rostedt (VMware) authored
      [ Upstream commit e0d2615856b2046c2e8d5bfd6933f37f69703b0b ]
      
      If the re-allocation of tep->cmdlines succeeds, then the previous
      allocation of tep->cmdlines will be freed. If we later fail in
      add_new_comm(), we must not free cmdlines, and also should assign
      tep->cmdlines to the new allocation. Otherwise when freeing tep, the
      tep->cmdlines will be pointing to garbage.
      
      Fixes: a6d2a61a ("tools lib traceevent: Remove some die() calls")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: linux-trace-devel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190828191819.970121417@goodmis.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
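      The pattern in question: once realloc() succeeds, the old block may already be gone, so the new pointer must be stored back immediately and must not be freed on a later, unrelated failure. A minimal sketch with a hypothetical, simplified structure (not libtraceevent's actual struct tep_handle or add_new_comm()):

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cmdline {
	char *comm;
	int pid;
};

/* Hypothetical, simplified stand-in for the tep handle. */
struct handle {
	struct cmdline *cmdlines;
	int cmdline_count;
};

static int add_comm(struct handle *h, const char *comm, int pid)
{
	struct cmdline *tmp;

	tmp = realloc(h->cmdlines, (h->cmdline_count + 1) * sizeof(*tmp));
	if (!tmp)
		return -1;	/* old block is still valid and still owned by h */

	/* realloc() succeeded: the old block may be freed, keep the new one. */
	h->cmdlines = tmp;

	tmp[h->cmdline_count].comm = strdup(comm);
	if (!tmp[h->cmdline_count].comm)
		return -1;	/* do NOT free h->cmdlines here */

	tmp[h->cmdline_count].pid = pid;
	h->cmdline_count++;
	return 0;
}
```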
    • powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag · d1e4b4cc
      Aneesh Kumar K.V authored
      commit 09ce98ca upstream.
      
      Rename the #define to indicate that this is related to the store vs tlbie
      ordering issue. In the next patch, we will be adding another feature
      flag that is used to handle the ERAT flush vs tlbie ordering issue.
      
      Fixes: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt() · f5f31a6e
      Gautham R. Shenoy authored
      [ Upstream commit c784be43 ]
      
      The calls to arch_add_memory()/arch_remove_memory() are always made
      with the read-side cpu_hotplug_lock acquired via mem_hotplug_begin().
      On pSeries, arch_add_memory()/arch_remove_memory() eventually call
      resize_hpt() which in turn calls stop_machine() which acquires the
      read-side cpu_hotplug_lock again, thereby resulting in the recursive
      acquisition of this lock.
      
      In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
      lockup during a memory hotplug operation because cpus_read_lock() is a
      per-cpu rwsem read, which, in the fast-path (in the absence of the
      writer, which in our case is a CPU-hotplug operation) simply
      increments the read_count on the semaphore. Thus a recursive read in
      the fast-path doesn't cause any problems.
      
      However, we can hit this problem in practice if there is a concurrent
      CPU-Hotplug operation in progress which is waiting to acquire the
      write-side of the lock. This will cause the second recursive read to
      block until the writer finishes, while the writer is blocked because the
      first read holds the lock. Thus neither the reader nor the writer can
      make any progress, blocking both CPU-Hotplug and Memory-Hotplug
      operations.
      
      Memory-Hotplug				CPU-Hotplug
      CPU 0					CPU 1
      ------                                  ------
      
      1. down_read(cpu_hotplug_lock.rw_sem)
         [memory_hotplug_begin]
      					2. down_write(cpu_hotplug_lock.rw_sem)
      					[cpu_up/cpu_down]
      3. down_read(cpu_hotplug_lock.rw_sem)
         [stop_machine()]
      
      Lockdep complains as follows in these code-paths.
      
       swapper/0/1 is trying to acquire lock:
       (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
      
      but task is already holding lock:
      (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(cpu_hotplug_lock.rw_sem);
         lock(cpu_hotplug_lock.rw_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       3 locks held by swapper/0/1:
        #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
        #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
        #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
      
      stack backtrace:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
       Call Trace:
         dump_stack+0xe8/0x164 (unreliable)
         __lock_acquire+0x1110/0x1c70
         lock_acquire+0x240/0x290
         cpus_read_lock+0x64/0xf0
         stop_machine+0x2c/0x60
         pseries_lpar_resize_hpt+0x19c/0x2c0
         resize_hpt_for_hotplug+0x70/0xd0
         arch_add_memory+0x58/0xfc
         devm_memremap_pages+0x5e8/0x8f0
         pmem_attach_disk+0x764/0x830
         nvdimm_bus_probe+0x118/0x240
         really_probe+0x230/0x4b0
         driver_probe_device+0x16c/0x1e0
         __driver_attach+0x148/0x1b0
         bus_for_each_dev+0x90/0x130
         driver_attach+0x34/0x50
         bus_add_driver+0x1a8/0x360
         driver_register+0x108/0x170
         __nd_driver_register+0xd0/0xf0
         nd_pmem_driver_init+0x34/0x48
         do_one_initcall+0x1e0/0x45c
         kernel_init_freeable+0x540/0x64c
         kernel_init+0x2c/0x160
         ret_from_kernel_thread+0x5c/0x68
      
      Fix this issue by
        1) Requiring all the calls to pseries_lpar_resize_hpt() be made
           with cpu_hotplug_lock held.
      
        2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
           as a consequence of 1)
      
        3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
           with cpu_hotplug_lock held.
      
      Fixes: dbcf929c ("powerpc/pseries: Add support for hash table resizing")
      Cc: stable@vger.kernel.org # v4.11+
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
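      The fix follows the common "_cpuslocked" convention: the outermost caller takes the lock once, and the inner helpers assert that it is held instead of taking it again. A minimal single-threaded sketch; the lock is a hypothetical non-recursive stand-in that trips an assertion on recursive acquisition (where the real percpu rwsem could deadlock against a waiting writer), and the function names only mirror the roles described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock: recursion is treated as a bug. */
static bool hotplug_lock_held;

static void hotplug_lock(void)
{
	assert(!hotplug_lock_held && "recursive acquisition");
	hotplug_lock_held = true;
}

static void hotplug_unlock(void)
{
	assert(hotplug_lock_held);
	hotplug_lock_held = false;
}

/* Role of stop_machine_cpuslocked(): caller must already hold the lock. */
static int run_stopped_locked(int (*fn)(void))
{
	assert(hotplug_lock_held);
	return fn();
}

static int rehash(void) { return 0; }

/* Role of pseries_lpar_resize_hpt() after the fix: requires the lock. */
static int resize_hpt_locked(void)
{
	return run_stopped_locked(rehash);
}

/* Role of hpt_order_set(): takes the lock once around the resize. */
static int set_hpt_order(void)
{
	int rc;

	hotplug_lock();
	rc = resize_hpt_locked();
	hotplug_unlock();
	return rc;
}
```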
    • nbd: fix crash when the blksize is zero · c688982f
      Xiubo Li authored
      [ Upstream commit 553768d1 ]
      
      This allows the blksize to be set to zero, in which case 1024 is used
      as the default.

      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      [fix to use goto out instead of return in genl_connect]
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
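      A minimal sketch of the defaulting behavior described above; the macro and helper names are hypothetical, and only the 1024-byte default comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_BLKSIZE 1024u	/* default taken from the commit message */

/* Hypothetical helper: a zero block size means "use the default"
 * rather than propagating 0 into later size calculations. */
static uint32_t effective_blksize(uint32_t requested)
{
	return requested ? requested : DEFAULT_BLKSIZE;
}
```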
    • KVM: nVMX: Fix consistency check on injected exception error code · 63bb8b76
      Sean Christopherson authored
      [ Upstream commit 567926cc ]
      
      Current versions of Intel's SDM incorrectly state that "bits 31:15 of
      the VM-Entry exception error-code field" must be zero.  In reality, bits
      31:16 must be zero, i.e. error codes are 16-bit values.
      
      The bogus error code check manifests as an unexpected VM-Entry failure
      due to an invalid code field (error number 7) in L1, e.g. when injecting
      a #GP with error_code=0x9f00.
      
      Nadav previously reported the bug[*], both to KVM and Intel, and fixed
      the associated kvm-unit-test.
      
      [*] https://patchwork.kernel.org/patch/11124749/

      Reported-by: Nadav Amit <namit@vmware.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
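      The two checks differ only in their mask. A minimal sketch using the #GP example from the text; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check as the SDM text states it: bits 31:15 must be zero. */
static bool error_code_valid_sdm(uint32_t error_code)
{
	return (error_code & 0xffff8000u) == 0;
}

/* Corrected check: error codes are 16-bit, so only bits 31:16 must be zero. */
static bool error_code_valid_fixed(uint32_t error_code)
{
	return (error_code & 0xffff0000u) == 0;
}
```

      For error_code = 0x9f00, bit 15 is set, so the SDM-style check rejects it while the corrected check accepts it.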
    • KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP · 34b13ff6
      Cédric Le Goater authored
      [ Upstream commit 237aed48 ]
      
      When a vCPU is brought down, the XIVE VP (Virtual Processor) is first
      disabled and then the event notification queues are freed. When freeing
      the queues, we check for possible escalation interrupts and free them
      as well.
      
      But when a XIVE VP is disabled, the underlying XIVE ENDs are also
      disabled in OPAL. When an END (Event Notification Descriptor) is
      disabled, its ESB pages (ESn and ESe) are disabled and loads return all
      1s, which means that any access to the ESB page of the escalation
      interrupt will return invalid values.
      
      When an interrupt is freed, the shutdown handler computes a 'saved_p'
      field from the value returned by a load in xive_do_source_set_mask().
      This value is incorrect for escalation interrupts for the reason
      described above.
      
      This has no impact on Linux/KVM today because we don't make use of it,
      but future changes will introduce a xive_get_irqchip_state() handler.
      This handler will use the 'saved_p' field to return the state of an
      interrupt, and with 'saved_p' being incorrect, a softlockup will occur.
      
      Fix the vCPU cleanup sequence by first freeing the escalation interrupts,
      if any, then disabling the XIVE VP, and finally freeing the queues.
      
      Fixes: 90c73795 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed · 1b155b4f
      Hans de Goede authored
      [ Upstream commit 9dbc88d0 ]
      
      Bail from the pci_driver probe function instead of from the drm_driver
      load function.
      
      This avoids /dev/dri/card0 temporarily getting registered and then
      unregistered again, sending unwanted add / remove udev events to
      userspace.
      
      Specifically this avoids triggering the (userspace) bug fixed by this
      plymouth merge-request:
      https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59
      
      Note that despite that being a userspace bug, not sending unnecessary
      udev events is a good idea in general.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490
      Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs · 04e0c84f
      Navid Emamdoost authored
      [ Upstream commit 8ce39eb5 ]
      
      In nfp_flower_spawn_vnic_reprs(), memory is leaked if initialization or
      one of the allocations inside the loop fails. Add the appropriate releases.
      
      Fixes: b9452452 ("nfp: flower: add per repr private data for LAG offload")
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
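      Fixes of this kind typically add an unwind path that releases everything allocated so far, including the partially built entry of the failing iteration. A minimal sketch with hypothetical types and names (not the nfp driver's actual structures); fail_at is a 1-based test hook that makes one private-data allocation fail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-port representor with a private data block. */
struct repr {
	void *priv;
};

/* Allocate n reprs; on any failure, free everything and return NULL. */
static struct repr **spawn_reprs(int n, int fail_at)
{
	struct repr **reprs = calloc((size_t)n, sizeof(*reprs));
	int i;

	if (!reprs)
		return NULL;

	for (i = 0; i < n; i++) {
		reprs[i] = malloc(sizeof(**reprs));
		if (!reprs[i])
			goto err_unwind;

		reprs[i]->priv = (i + 1 == fail_at) ? NULL : malloc(64);
		if (!reprs[i]->priv) {
			free(reprs[i]);	/* release the partially built entry */
			reprs[i] = NULL;
			goto err_unwind;
		}
	}
	return reprs;

err_unwind:
	while (--i >= 0) {	/* release every fully built entry */
		free(reprs[i]->priv);
		free(reprs[i]);
	}
	free(reprs);
	return NULL;
}
```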
    • perf unwind: Fix libunwind build failure on i386 systems · 575a5bb3
      Arnaldo Carvalho de Melo authored
      [ Upstream commit 26acf400 ]
      
      Naresh Kamboju reported that on the i386 build pr_err()
      doesn't get defined properly due to header ordering:
      
        perf-in.o: In function `libunwind__x86_reg_id':
        tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109:
        undefined reference to `pr_err'

      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kernel/elfcore.c: include proper prototypes · b0aaf65b
      Valdis Kletnieks authored
      [ Upstream commit 0f749140 ]
      
      When building with W=1, gcc properly complains that there are no prototypes:
      
        CC      kernel/elfcore.o
      kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes]
          7 | Elf_Half __weak elf_core_extra_phdrs(void)
            |                 ^~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes]
         12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes]
         17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes]
         22 | size_t __weak elf_core_extra_data_size(void)
            |               ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Provide the include file so gcc is happy and we avoid potential code drift.
      
      Link: http://lkml.kernel.org/r/29875.1565224705@turing-police
      Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Thomas Richter
      perf build: Add detection of java-11-openjdk-devel package · bab46480
      Thomas Richter authored
      [ Upstream commit 815c1560 ]
      
      With Java 11 there is no separate JRE anymore.
      
      Details:
      
        https://coderanch.com/t/701603/java/JRE-JDK
      
      Therefore the detection of the JRE needs to be adapted.
      
      This change works for s390 and x86.  I have not tested other platforms.
      
      Committer testing:
      
      Continues to work with the OpenJDK 8:
      
        $ rm -f ~acme/lib64/libperf-jvmti.so
        $ rpm -qa | grep jdk-devel
        java-1.8.0-openjdk-devel-1.8.0.222.b10-0.fc30.x86_64
        $ git log --oneline -1
        a51937170f33 (HEAD -> perf/core) perf build: Add detection of java-11-openjdk-devel package
        $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install > /dev/null 2>&1
        $ ls -la ~acme/lib64/libperf-jvmti.so
        -rwxr-xr-x. 1 acme acme 230744 Sep 24 16:46 /home/acme/lib64/libperf-jvmti.so
        $
      Suggested-by: Andreas Krebbel <krebbel@linux.ibm.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20190909114116.50469-4-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • KeMeng Shi
      sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() · 46ff0e2f
      KeMeng Shi authored
      [ Upstream commit 714e501e ]
      
      An oops can be triggered in the scheduler when running qemu on arm64:
      
       Unable to handle kernel paging request at virtual address ffff000008effe40
       Internal error: Oops: 96000007 [#1] SMP
       Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
       pstate: 20000085 (nzCv daIf -PAN -UAO)
       pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
       lr : move_queued_task.isra.21+0x124/0x298
       ...
       Call trace:
        __ll_sc___cmpxchg_case_acq_4+0x4/0x20
        __migrate_task+0xc8/0xe0
        migration_cpu_stop+0x170/0x180
        cpu_stopper_thread+0xec/0x178
        smpboot_thread_fn+0x1ac/0x1e8
        kthread+0x134/0x138
        ret_from_fork+0x10/0x18
      
      __set_cpus_allowed_ptr() will choose an active dest_cpu in the affinity mask
      to migrate the process if the process is not currently running on any of the
      CPUs specified in the affinity mask. __set_cpus_allowed_ptr() will choose an
      invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if the
      CPUs in the affinity mask are deactivated by cpu_down after the
      cpumask_intersects check. The subsequent cpumask_test_cpu() of dest_cpu then
      reads past the end of the mask and may pass if the corresponding bit happens
      to be set. As a consequence, the kernel will access an invalid rq address
      associated with the invalid CPU in
      migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
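      A minimal userspace sketch of the failure mode and of a guard against it.
      It assumes a 64-bit integer as a stand-in for struct cpumask and a
      hypothetical NR_CPU_IDS in place of nr_cpu_ids. As with the kernel's
      cpumask_any_and(), "no CPU found" is reported as an out-of-range index, so
      the caller has to reject that value before using it as a CPU number:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids (number of possible CPUs). */
#define NR_CPU_IDS 64u

/* Return the first CPU set in both masks, or NR_CPU_IDS when the
 * intersection is empty (same convention as cpumask_any_and()). */
static unsigned int first_cpu_and(uint64_t a, uint64_t b)
{
    uint64_t both = a & b;

    for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
        if (both & (UINT64_C(1) << cpu))
            return cpu;
    return NR_CPU_IDS;
}

/* Pick a destination CPU, refusing the "none found" sentinel instead of
 * indexing per-CPU data with it. */
static int pick_dest_cpu(uint64_t new_mask, uint64_t active_mask,
                         unsigned int *dest)
{
    unsigned int cpu = first_cpu_and(new_mask, active_mask);

    if (cpu >= NR_CPU_IDS)
        return -EINVAL;   /* every requested CPU went offline meanwhile */
    *dest = cpu;
    return 0;
}
```

      If CPU 1 goes offline between the intersection check and the selection,
      pick_dest_cpu(0x3, 0x1, &d) still succeeds with CPU 0, while
      pick_dest_cpu(0x2, 0x1, &d) returns -EINVAL instead of an index past the mask.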
      
      To reproduce the crash:
      
        1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
        sched_setaffinity.
      
        2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
        and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
      
        3) The Oops appears if the invalid CPU's bit happens to be set in the
        memory just after the tested cpumask.
      Signed-off-by: KeMeng Shi <shikemeng@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Mathieu Desnoyers
      sched/membarrier: Fix private expedited registration check · 6cb7aa1b
      Mathieu Desnoyers authored
      [ Upstream commit fc0d7738 ]
      
      Fix a logic flaw in the way membarrier_register_private_expedited()
      handles ready state checks for private expedited sync core and private
      expedited registrations.
      
      If a private expedited membarrier registration is first performed, and
      then a private expedited sync_core registration is performed, the ready
      state check will skip the second registration when it really should not.
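      The logic flaw can be modelled with two ready bits. This is a sketch with
      hypothetical flag names, not the actual membarrier_register_private_expedited()
      code. The flawed check short-circuits on any private expedited registration,
      while the corrected check only skips work when the ready bit for the
      requested flavour is already set:

```c
#include <stdbool.h>

/* Hypothetical ready-state bits, loosely modelled on the membarrier states. */
enum {
    STATE_PRIV_EXP_READY           = 1u << 0,
    STATE_PRIV_EXP_SYNC_CORE_READY = 1u << 1,
};

/* Flawed: a prior plain private expedited registration makes a later
 * sync_core registration look as if it were already done. */
static bool already_registered_buggy(unsigned int state, bool sync_core)
{
    (void)sync_core;
    return (state & STATE_PRIV_EXP_READY) != 0;
}

/* Fixed: test the ready bit that matches the requested registration. */
static bool already_registered_fixed(unsigned int state, bool sync_core)
{
    unsigned int ready = sync_core ? STATE_PRIV_EXP_SYNC_CORE_READY
                                   : STATE_PRIV_EXP_READY;

    return (state & ready) == ready;
}
```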
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Mathieu Desnoyers
      sched/membarrier: Call sync_core only before usermode for same mm · e250f2b6
      Mathieu Desnoyers authored
      [ Upstream commit 2840cf02 ]
      
      When the prev and next task's mm change, switch_mm() provides the core
      serializing guarantees before returning to usermode. The only case
      where an explicit core serialization is needed is when the scheduler
      keeps the same mm for prev and next.
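      The decision can be sketched as a predicate. The struct and the
      "registered for sync_core" bit are hypothetical stand-ins for
      struct mm_struct and its membarrier state. Requiring that bit is an
      assumption added here; the commit message itself only states the same-mm
      condition:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct mm_struct and a membarrier state bit. */
#define STATE_SYNC_CORE_REGISTERED 0x1u
struct mm { unsigned int membarrier_state; };

/* switch_mm() already serializes the core when the mm changes, so an
 * explicit core-serializing step is only needed when prev and next share
 * the same mm (and, by assumption here, that mm registered for sync_core). */
static bool needs_explicit_sync_core(const struct mm *prev, const struct mm *next)
{
    if (prev != next)
        return false;   /* switch_mm() provides the guarantee */
    return (next->membarrier_state & STATE_SYNC_CORE_REGISTERED) != 0;
}
```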
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Nathan Chancellor
      libnvdimm/nfit_test: Fix acpi_handle redefinition · 9f33b178
      Nathan Chancellor authored
      [ Upstream commit 59f08896 ]
      
      After commit 62974fc3 ("libnvdimm: Enable unit test infrastructure
      compile checks"), clang warns:
      
      In file included from
      ../drivers/nvdimm/../../tools/testing/nvdimm/test/iomap.c:15:
      ../drivers/nvdimm/../../tools/testing/nvdimm/test/nfit_test.h:206:15:
      warning: redefinition of typedef 'acpi_handle' is a C11 feature
      [-Wtypedef-redefinition]
      typedef void *acpi_handle;
                    ^
      ../include/acpi/actypes.h:424:15: note: previous definition is here
      typedef void *acpi_handle;      /* Actually a ptr to a NS Node */
                    ^
      1 warning generated.
      
      The include chain:
      
      iomap.c ->
          linux/acpi.h ->
              acpi/acpi.h ->
                  acpi/actypes.h
          nfit_test.h
      
      Avoid this by including linux/acpi.h in nfit_test.h, which allows us to
      remove both the typedef and the forward declaration of acpi_object.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/660
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/20190918042148.77553-1-natechancellor@gmail.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • zhengbin
      fuse: fix memleak in cuse_channel_open · 7b4f541f
      zhengbin authored
      [ Upstream commit 9ad09b19 ]
      
      If cuse_send_init() fails, we need to call fuse_conn_put() on cc->fc.
      
      cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
                       ->fuse_dev_alloc->fuse_conn_get
                       ->fuse_dev_free->fuse_conn_put
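      The reference counting in the chain above can be traced with a toy model.
      The struct and helpers are hypothetical stand-ins for struct fuse_conn and
      fuse_conn_get()/fuse_conn_put(). Without the extra put on the error path,
      the initial reference is never dropped and the connection leaks:

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct fuse_conn with a plain reference count. */
struct conn { int count; bool freed; };

static void conn_init(struct conn *c) { c->count = 1; c->freed = false; }
static void conn_get(struct conn *c)  { c->count++; }
static void conn_put(struct conn *c)  { if (--c->count == 0) c->freed = true; }

/* Model of the error path when sending the init request fails. */
static void open_fails(struct conn *c, bool with_fix)
{
    conn_init(c);      /* like fuse_conn_init(): count = 1 */
    conn_get(c);       /* like fuse_dev_alloc(): count = 2 */
    /* ... the init request fails here ... */
    conn_put(c);       /* like fuse_dev_free(): count = 1 */
    if (with_fix)
        conn_put(c);   /* the added put: count = 0, connection released */
}
```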
      
      Fixes: cc080e9e ("fuse: introduce per-instance fuse_dev structure")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: zhengbin <zhengbin13@huawei.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Aneesh Kumar K.V
      libnvdimm/region: Initialize bad block for volatile namespaces · 2e93d24a
      Aneesh Kumar K.V authored
      [ Upstream commit c42adf87 ]
      
      We check for bad blocks during namespace init, and that check uses the
      region's bad block list. We need to initialize the bad block list for
      volatile regions for this to work. We also observe the lockdep warning
      below because the lock is not initialized correctly, since we skip bad
      block init for volatile regions.
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Stefan Mavrodiev
      thermal_hwmon: Sanitize thermal_zone type · 9025adf3
      Stefan Mavrodiev authored
      [ Upstream commit 8c7aa184 ]
      
      When calling thermal_add_hwmon_sysfs(), the device type is sanitized by
      replacing '-' with '_'. However, tz->type remains unsanitized, so
      thermal_hwmon_lookup_by_type() returns no device. Without a device,
      thermal_remove_hwmon_sysfs() fails with "hwmon device lookup
      failed!".

      The result is hwmon devices that are never unregistered from sysfs.
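      The mismatch comes from comparing a sanitized name against an unsanitized
      one. A small sketch, with hypothetical helper names, that applies the same
      '-' to '_' mapping on both the add path and the lookup path so that they agree:

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst (at most len bytes including the NUL), replacing
 * '-' with '_' as the hwmon name sanitization does. */
static void sanitize_type(char *dst, size_t len, const char *src)
{
    size_t i;

    if (len == 0)
        return;
    for (i = 0; i + 1 < len && src[i] != '\0'; i++)
        dst[i] = (src[i] == '-') ? '_' : src[i];
    dst[i] = '\0';
}

/* Lookup helper: sanitize the zone type before comparing it with the
 * (already sanitized) name stored at registration time. */
static int type_matches(const char *hwmon_name, const char *tz_type)
{
    char buf[32];

    sanitize_type(buf, sizeof(buf), tz_type);
    return strcmp(hwmon_name, buf) == 0;
}
```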
      
      Fixes: 409ef0ba ("thermal_hwmon: Sanitize attribute name passed to hwmon")
      Signed-off-by: Stefan Mavrodiev <stefan@olimex.com>
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Ido Schimmel
      thermal: Fix use-after-free when unregistering thermal zone device · c01a9dbe
      Ido Schimmel authored
      [ Upstream commit 1851799e ]
      
      thermal_zone_device_unregister() cancels the delayed work that polls the
      thermal zone, but it does not wait for it to finish. This is racy with
      respect to the freeing of the thermal zone device, which can result in a
      use-after-free [1].
      
      Fix this by waiting for the delayed work to finish before freeing the
      thermal zone device. Note that thermal_zone_device_set_polling() is
      never invoked from an atomic context, so it is safe to call
      cancel_delayed_work_sync(), which can block.
      
      [1]
      [  +0.002221] ==================================================================
      [  +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
      [  +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17
      
      [  +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
      [  +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
      [  +0.000012] Call Trace:
      [  +0.000021]  dump_stack+0xa9/0x10e
      [  +0.000020]  print_address_description.cold.2+0x9/0x25e
      [  +0.000018]  __kasan_report.cold.3+0x78/0x9d
      [  +0.000016]  kasan_report+0xe/0x20
      [  +0.000016]  __mutex_lock+0x1076/0x11c0
      [  +0.000014]  step_wise_throttle+0x72/0x150
      [  +0.000018]  handle_thermal_trip+0x167/0x760
      [  +0.000019]  thermal_zone_device_update+0x19e/0x5f0
      [  +0.000019]  process_one_work+0x969/0x16f0
      [  +0.000017]  worker_thread+0x91/0xc40
      [  +0.000014]  kthread+0x33d/0x400
      [  +0.000015]  ret_from_fork+0x3a/0x50
      
      [  +0.000020] Allocated by task 1:
      [  +0.000015]  save_stack+0x19/0x80
      [  +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
      [  +0.000014]  kmem_cache_alloc_trace+0x152/0x320
      [  +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
      [  +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
      [  +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
      [  +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
      [  +0.000013]  mlxsw_pci_probe+0x355/0x4b0
      [  +0.000014]  local_pci_probe+0xc3/0x150
      [  +0.000013]  pci_device_probe+0x280/0x410
      [  +0.000013]  really_probe+0x26a/0xbb0
      [  +0.000013]  driver_probe_device+0x208/0x2e0
      [  +0.000013]  device_driver_attach+0xfe/0x140
      [  +0.000013]  __driver_attach+0x110/0x310
      [  +0.000013]  bus_for_each_dev+0x14b/0x1d0
      [  +0.000013]  driver_register+0x1c0/0x400
      [  +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
      [  +0.000014]  do_one_initcall+0x239/0x4dd
      [  +0.000013]  kernel_init_freeable+0x42b/0x4e8
      [  +0.000012]  kernel_init+0x11/0x18b
      [  +0.000013]  ret_from_fork+0x3a/0x50
      
      [  +0.000015] Freed by task 581:
      [  +0.000013]  save_stack+0x19/0x80
      [  +0.000014]  __kasan_slab_free+0x125/0x170
      [  +0.000013]  kfree+0xf3/0x310
      [  +0.000013]  thermal_release+0xc7/0xf0
      [  +0.000014]  device_release+0x77/0x200
      [  +0.000014]  kobject_put+0x1a8/0x4c0
      [  +0.000014]  device_unregister+0x38/0xc0
      [  +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
      [  +0.000014]  mlxsw_thermal_fini+0x184/0x35a
      [  +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
      [  +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
      [  +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
      [  +0.000014]  genl_family_rcv_msg+0x700/0xee0
      [  +0.000013]  genl_rcv_msg+0xca/0x170
      [  +0.000013]  netlink_rcv_skb+0x137/0x3a0
      [  +0.000012]  genl_rcv+0x29/0x40
      [  +0.000013]  netlink_unicast+0x49b/0x660
      [  +0.000013]  netlink_sendmsg+0x755/0xc90
      [  +0.000013]  __sys_sendto+0x3de/0x430
      [  +0.000013]  __x64_sys_sendto+0xe2/0x1b0
      [  +0.000013]  do_syscall_64+0xa4/0x4d0
      [  +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  +0.000017] The buggy address belongs to the object at ffff8881e48e0008
                     which belongs to the cache kmalloc-2k of size 2048
      [  +0.000012] The buggy address is located 1096 bytes inside of
                     2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
      [  +0.000007] The buggy address belongs to the page:
      [  +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
      [  +0.000020] flags: 0x200000000010200(slab|head)
      [  +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
      [  +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
      [  +0.000007] page dumped because: kasan: bad access detected
      
      [  +0.000012] Memory state around the buggy address:
      [  +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000008]                                                  ^
      [  +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000007] ==================================================================
      
      Fixes: b1569e99 ("ACPI: move thermal trip handling to generic thermal layer")
      Reported-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Sanjay R Mehta
      ntb: point to right memory window index · 55ebeb4e
      Sanjay R Mehta authored
      [ Upstream commit ae89339b ]
      
      The second parameter of ntb_peer_mw_get_addr() points to the wrong memory
      window index because "peer gidx" is passed instead of "local gidx".

      For example, if the "local gidx" value is '0' and the "peer gidx" value is
      '1', then on the peer side ntb_mw_set_trans() is used as below, with gidx
      pointing to the local side gidx, which is '0'. Memory window '0' is
      therefore chosen, and XLAT '0' will be programmed by the peer side.
      
          ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat,
                          peer->inbuf_size);
      
      Now, on the local side, ntb_peer_mw_get_addr() is used as below, with gidx
      pointing to "peer gidx", which is '1'. It therefore refers to memory window
      '1' instead of memory window '0'.
      
          ntb_peer_mw_get_addr(perf->ntb,  peer->gidx, &phys_addr,
                              &peer->outbuf_size);
      
      So this patch passes "local gidx" as the parameter to ntb_peer_mw_get_addr().
      Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Arvind Sankar
      x86/purgatory: Disable the stackleak GCC plugin for the purgatory · 9dabade5
      Arvind Sankar authored
      [ Upstream commit ca14c996 ]
      
      Since commit:
      
        b059f801 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      
      kexec breaks if GCC_PLUGIN_STACKLEAK=y is enabled, as the purgatory
      contains undefined references to stackleak_track_stack.
      
      Attempting to load a kexec kernel results in this failure:
      
        kexec: Undefined symbol: stackleak_track_stack
        kexec-bzImage64: Loading purgatory failed
      
      Fix this by disabling the stackleak plugin for the purgatory.
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: b059f801 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      Link: https://lkml.kernel.org/r/20190923171753.GA2252517@rani.riverdale.lan
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Fabrice Gasnier
      pwm: stm32-lp: Add check in case requested period cannot be achieved · 65348659
      Fabrice Gasnier authored
      [ Upstream commit c91e3234 ]
      
      LPTimer can use a 32 kHz clock for counting, depending on the clock tree
      configuration. In such a case, the PWM output frequency range is limited.
      Although unlikely, nothing prevents the user from requesting a PWM frequency
      above the counting clock (32 kHz, for instance):
      - This causes (prd - 1) = 0xffff to be written to the ARR register later in
        the apply() routine.
      This results in a badly configured PWM period (and also duty_cycle).
      Add a check to report an error in such a case.
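      A sketch of why the check is needed, not the driver's code; the helper name
      and the tick computation are assumptions. The number of counting-clock ticks
      per period rounds down to 0 when the requested period is shorter than one
      clock tick, and (0 - 1) stored in a 16-bit register wraps to 0xffff:

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert a requested period to counter ticks. A result of 0 means the
 * counting clock is too slow for that period; writing (ticks - 1) to a
 * 16-bit ARR register would then wrap to 0xffff, so reject it instead. */
static int period_to_ticks(uint64_t clk_rate_hz, uint64_t period_ns,
                           uint64_t *ticks)
{
    uint64_t div = clk_rate_hz * period_ns / NSEC_PER_SEC;

    if (div == 0)
        return -EINVAL;
    *ticks = div;
    return 0;
}
```

      With a 32768 Hz clock, a 10 us period (100 kHz) yields 0 ticks and is
      rejected, while a 1 ms period yields 32 ticks.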
      Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
      Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Trond Myklebust
      pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors · 19b1c70e
      Trond Myklebust authored
      [ Upstream commit 9c47b18c ]
      
      If the server rejected our layout return with a state error such as
      NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want
      to clear out all the remaining layout segments and mark that stateid
      as invalid.
      
      Fixes: 1c5bd76d ("pNFS: Enable layoutreturn operation for...")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Trek
      drm/amdgpu: Check for valid number of registers to read · 1c70ae6a
      Trek authored
      [ Upstream commit 73d8e6c7 ]
      
      Do not try to allocate any amount of memory requested by the user.
      Instead, limit it to 128 registers. The longest series of consecutive
      allowed registers is actually 48: mmGB_TILE_MODE0-31 and
      mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
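      The pattern is to validate a user-supplied count against a fixed bound
      before sizing an allocation from it. A minimal sketch with a hypothetical
      helper; only the limit of 128 comes from the commit message:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Upper bound taken from the commit message. */
#define MAX_READ_REGS 128u

/* Refuse oversized requests before allocating a buffer sized by them. */
static int alloc_reg_buffer(uint32_t count, uint32_t **out)
{
    if (count > MAX_READ_REGS)
        return -EINVAL;
    *out = calloc(count, sizeof(**out));
    return *out ? 0 : -ENOMEM;
}
```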
      
      Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273
      Signed-off-by: Trek <trek00@inbox.ru>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Felix Kuehling
      drm/amdgpu: Fix KFD-related kernel oops on Hawaii · e0af3b19
      Felix Kuehling authored
      [ Upstream commit dcafbd50 ]
      
      Hawaii needs to flush caches explicitly, submitting an IB in a user
      VMID from kernel mode. There is no s_fence in this case.
      
      Fixes: eb3961a5 ("drm/amdgpu: remove fence context from the job")
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Florian Westphal
      netfilter: nf_tables: allow lookups in dynamic sets · f7ace7f2
      Florian Westphal authored
      [ Upstream commit acab7131 ]
      
      This un-breaks lookups in sets that have the 'dynamic' flag set.
      Given this active example configuration:
      
      table filter {
        set set1 {
          type ipv4_addr
          size 64
          flags dynamic,timeout
          timeout 1m
        }
      
        chain input {
           type filter hook input priority 0; policy accept;
        }
      }
      
      ... this works:
      nft add rule ip filter input add @set1 { ip saddr }
      
      -> whenever rule is triggered, the source ip address is inserted
      into the set (if it did not exist).
      
      This won't work:
      nft add rule ip filter input ip saddr @set1 counter
      Error: Could not process rule: Operation not supported
      
      In other words, we can add entries to the set, but then can't make
      matching decision based on that set.
      
      That is just wrong -- all set backends support lookups (else they would
      not be very useful).
      The failure comes from an explicit rejection in nft_lookup.c.
      
      Looking at the history, it seems like NFT_SET_EVAL used to mean
      'set contains expressions' (aka. "is a meter"), for instance something like
      
       nft add rule ip filter input meter example { ip saddr limit rate 10/second }
       or
       nft add rule ip filter input meter example { ip saddr counter }
      
      The actual meaning of NFT_SET_EVAL however, is
      'set can be updated from the packet path'.
      
      'meters' and packet-path insertions into sets, such as
      'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
      and thus require a set backend that provides the ->update() function.
      
      The only set that provides this also is the only one that has the
      NFT_SET_EVAL feature flag.
      
      Removing the wrong check makes the above example work.
      While at it, also fix the flag check during set instantiation to
      allow supported combinations only.
      
      Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Ryan Chen
      watchdog: aspeed: Add support for AST2600 · f217883b
      Ryan Chen authored
      [ Upstream commit b3528b48 ]
      
      The ast2600 can be supported by the same code as the ast2500.
      Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20190819051738.17370-3-joel@jms.id.au
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Erqi Chen
      ceph: reconnect connection if session hang in opening state · 520c2a64
      Erqi Chen authored
      [ Upstream commit 71a228bc ]
      
      If a client MDS session is evicted in the CEPH_MDS_SESSION_OPENING state,
      the MDS won't send a session message to the client, and delayed_work skips
      sessions in the CEPH_MDS_SESSION_OPENING state, so the session hangs forever.
      
      Allow ceph_con_keepalive to reconnect a session in OPENING to avoid
      session hang. Also, ensure that we skip sessions in RESTARTING and
      REJECTED states since those states can't be resurrected by issuing
      a keepalive.
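      The resulting policy can be sketched as a switch over session states. The
      enum below is a hypothetical subset named after the states in this message,
      not the ceph client's definitions. OPENING now receives a keepalive, while
      RESTARTING and REJECTED are skipped:

```c
#include <stdbool.h>

/* Hypothetical subset of MDS session states mentioned above. */
enum session_state { S_OPENING, S_OPEN, S_RESTARTING, S_REJECTED };

/* Should the periodic work send a keepalive for a session in this state? */
static bool should_send_keepalive(enum session_state s)
{
    switch (s) {
    case S_RESTARTING:
    case S_REJECTED:
        return false;   /* a keepalive cannot resurrect these sessions */
    default:
        return true;    /* includes OPENING, so an evicted open can reconnect */
    }
}
```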
      
      Link: https://tracker.ceph.com/issues/41551
      Signed-off-by: Erqi Chen <chenerqi@gmail.com>
      Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Luis Henriques
      ceph: fix directories inode i_blkbits initialization · 0275113f
      Luis Henriques authored
      [ Upstream commit 75067034 ]
      
      When filling an inode with info from the MDS, i_blkbits is being
      initialized using fl_stripe_unit, which contains the stripe unit in
      bytes.  Unfortunately, this doesn't make sense for directories as they
      have fl_stripe_unit set to '0'.  This means that i_blkbits will be set
      to 0xff, triggering UBSAN undefined behaviour in i_blocksize():
      
        UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
        shift exponent 255 is too large for 32-bit type 'int'
      
      Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
      is zero.
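      How a zero stripe unit turns into 0xff, assuming (not stated above) that
      i_blkbits is derived as fls(stripe_unit) - 1 and stored in an 8-bit field.
      The value 22 used for the fallback is also an assumption about
      CEPH_BLOCK_SHIFT (4 MiB blocks):

```c
#include <stdint.h>

#define BLOCK_SHIFT 22   /* assumed value of CEPH_BLOCK_SHIFT */

/* fls(): 1-based index of the most significant set bit, 0 for 0. */
static int fls_u32(uint32_t v)
{
    int r = 0;

    while (v) {
        r++;
        v >>= 1;
    }
    return r;
}

/* Before: fls(0) - 1 == -1, which becomes 0xff in an 8-bit field. */
static uint8_t blkbits_unsafe(uint32_t stripe_unit)
{
    return (uint8_t)(fls_u32(stripe_unit) - 1);
}

/* After: fall back to a fixed shift when the stripe unit is zero. */
static uint8_t blkbits_fixed(uint32_t stripe_unit)
{
    if (stripe_unit == 0)
        return BLOCK_SHIFT;
    return (uint8_t)(fls_u32(stripe_unit) - 1);
}
```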
      Signed-off-by: Luis Henriques <lhenriques@suse.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Igor Druzhinin
      xen/pci: reserve MCFG areas earlier · 2bc2a90a
      Igor Druzhinin authored
      [ Upstream commit a4098bc6 ]
      
      If an MCFG area is not reserved in E820, Xen by default defers its use
      until Dom0 registers it explicitly, after the ACPI parser recognizes it
      as a reserved resource in the DSDT. Reserving it in E820 is not
      mandatory according to the "PCI Firmware Specification, rev 3.2"
      (par. 4.1.2), and firmware is free to leave a hole in E820 at that
      location. Xen doesn't know exactly what is inside this hole, since it
      lacks a full ACPI view of the platform; it is therefore potentially
      harmful to access the MCFG region without additional checks, as some
      machines are known to provide inconsistent information on the size of
      the region.
      
      Currently, xen_mcfg_late() runs after acpi_init(), which is too late,
      because basic PCI enumeration also starts there. Trying to register a
      device prior to MCFG reservation causes multiple problems with PCIe
      extended capability initialization in Xen (e.g. SR-IOV VF BAR sizing).
      There are no convenient hooks for us to subscribe to, so register MCFG
      areas earlier, upon the first invocation of xen_add_device(). Doing
      this once should be safe, since all boot-time buses must already have
      their MCFG areas in the MCFG table and we don't support PCI bus
      hot-plug.
      Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • 9p: avoid attaching writeback_fid on mmap with type PRIVATE · 18dd2b05
      Chengguang Xu authored
      [ Upstream commit c87a37eb ]
      
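      Currently, with the mmap cache policy, we always attach writeback_fid
      regardless of whether the mmap type is SHARED or PRIVATE. However, in
      the kata-container use case, which combines 9p (guest OS) with
      overlayfs (host OS), this behavior triggers an overlayfs copy-up when
      a command is executed inside the container. Since changes to a PRIVATE
      mapping are never written back to the file, writeback_fid only needs
      to be attached for SHARED mappings.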
      
      Link: http://lkml.kernel.org/r/20190820100325.10313-1-cgxu519@zoho.com.cn
      Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
      Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: Sasha Levin <sashal@kernel.org>