1. 11 Oct, 2019 40 commits
    • Cédric Le Goater's avatar
      KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP · 34b13ff6
      Cédric Le Goater authored
      [ Upstream commit 237aed48 ]
      
      When a vCPU is brought done, the XIVE VP (Virtual Processor) is first
      disabled and then the event notification queues are freed. When freeing
      the queues, we check for possible escalation interrupts and free them
      also.
      
      But when a XIVE VP is disabled, the underlying XIVE ENDs also are
      disabled in OPAL. When an END (Event Notification Descriptor) is
      disabled, its ESB pages (ESn and ESe) are disabled and loads return all
      1s. Which means that any access on the ESB page of the escalation
      interrupt will return invalid values.
      
      When an interrupt is freed, the shutdown handler computes a 'saved_p'
      field from the value returned by a load in xive_do_source_set_mask().
      This value is incorrect for escalation interrupts for the reason
      described above.
      
      This has no impact on Linux/KVM today because we don't make use of it
      but we will introduce in future changes a xive_get_irqchip_state()
      handler. This handler will use the 'saved_p' field to return the state
      of an interrupt and 'saved_p' being incorrect, softlockup will occur.
      
      Fix the vCPU cleanup sequence by first freeing the escalation interrupts
      if any, then disable the XIVE VP and last free the queues.
      
      Fixes: 90c73795 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: default avatarCédric Le Goater <clg@kaod.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.orgSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      34b13ff6
    • Hans de Goede's avatar
      drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed · 1b155b4f
      Hans de Goede authored
      [ Upstream commit 9dbc88d0 ]
      
      Bail from the pci_driver probe function instead of from the drm_driver
      load function.
      
      This avoid /dev/dri/card0 temporarily getting registered and then
      unregistered again, sending unwanted add / remove udev events to
      userspace.
      
      Specifically this avoids triggering the (userspace) bug fixed by this
      plymouth merge-request:
      https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59
      
      Note that despite that being an userspace bug, not sending unnecessary
      udev events is a good idea in general.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490Reviewed-by: default avatarMichel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1b155b4f
    • Navid Emamdoost's avatar
      nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs · 04e0c84f
      Navid Emamdoost authored
      [ Upstream commit 8ce39eb5 ]
      
      In nfp_flower_spawn_vnic_reprs in the loop if initialization or the
      allocations fail memory is leaked. Appropriate releases are added.
      
      Fixes: b9452452 ("nfp: flower: add per repr private data for LAG offload")
      Signed-off-by: default avatarNavid Emamdoost <navid.emamdoost@gmail.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      04e0c84f
    • Arnaldo Carvalho de Melo's avatar
      perf unwind: Fix libunwind build failure on i386 systems · 575a5bb3
      Arnaldo Carvalho de Melo authored
      [ Upstream commit 26acf400 ]
      
      Naresh Kamboju reported, that on the i386 build pr_err()
      doesn't get defined properly due to header ordering:
      
        perf-in.o: In function `libunwind__x86_reg_id':
        tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109:
        undefined reference to `pr_err'
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      575a5bb3
    • Valdis Kletnieks's avatar
      kernel/elfcore.c: include proper prototypes · b0aaf65b
      Valdis Kletnieks authored
      [ Upstream commit 0f749140 ]
      
      When building with W=1, gcc properly complains that there's no prototypes:
      
        CC      kernel/elfcore.o
      kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes]
          7 | Elf_Half __weak elf_core_extra_phdrs(void)
            |                 ^~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes]
         12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes]
         17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes]
         22 | size_t __weak elf_core_extra_data_size(void)
            |               ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Provide the include file so gcc is happy, and we don't have potential code drift
      
      Link: http://lkml.kernel.org/r/29875.1565224705@turing-policeSigned-off-by: default avatarValdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0aaf65b
    • Thomas Richter's avatar
      perf build: Add detection of java-11-openjdk-devel package · bab46480
      Thomas Richter authored
      [ Upstream commit 815c1560 ]
      
      With Java 11 there is no seperate JRE anymore.
      
      Details:
      
        https://coderanch.com/t/701603/java/JRE-JDK
      
      Therefore the detection of the JRE needs to be adapted.
      
      This change works for s390 and x86.  I have not tested other platforms.
      
      Committer testing:
      
      Continues to work with the OpenJDK 8:
      
        $ rm -f ~acme/lib64/libperf-jvmti.so
        $ rpm -qa | grep jdk-devel
        java-1.8.0-openjdk-devel-1.8.0.222.b10-0.fc30.x86_64
        $ git log --oneline -1
        a51937170f33 (HEAD -> perf/core) perf build: Add detection of java-11-openjdk-devel package
        $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install > /dev/null 2>1
        $ ls -la ~acme/lib64/libperf-jvmti.so
        -rwxr-xr-x. 1 acme acme 230744 Sep 24 16:46 /home/acme/lib64/libperf-jvmti.so
        $
      Suggested-by: default avatarAndreas Krebbel <krebbel@linux.ibm.com>
      Signed-off-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20190909114116.50469-4-tmricht@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      bab46480
    • KeMeng Shi's avatar
      sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() · 46ff0e2f
      KeMeng Shi authored
      [ Upstream commit 714e501e ]
      
      An oops can be triggered in the scheduler when running qemu on arm64:
      
       Unable to handle kernel paging request at virtual address ffff000008effe40
       Internal error: Oops: 96000007 [#1] SMP
       Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
       pstate: 20000085 (nzCv daIf -PAN -UAO)
       pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
       lr : move_queued_task.isra.21+0x124/0x298
       ...
       Call trace:
        __ll_sc___cmpxchg_case_acq_4+0x4/0x20
        __migrate_task+0xc8/0xe0
        migration_cpu_stop+0x170/0x180
        cpu_stopper_thread+0xec/0x178
        smpboot_thread_fn+0x1ac/0x1e8
        kthread+0x134/0x138
        ret_from_fork+0x10/0x18
      
      __set_cpus_allowed_ptr() will choose an active dest_cpu in affinity mask to
      migrage the process if process is not currently running on any one of the
      CPUs specified in affinity mask. __set_cpus_allowed_ptr() will choose an
      invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
      CPUS in an affinity mask are deactived by cpu_down after cpumask_intersects
      check. cpumask_test_cpu() of dest_cpu afterwards is overflown and may pass if
      corresponding bit is coincidentally set. As a consequence, kernel will
      access an invalid rq address associate with the invalid CPU in
      migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
      
      The reproduce the crash:
      
        1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
        sched_setaffinity.
      
        2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
        and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
      
        3) Oops appears if the invalid CPU is set in memory after tested cpumask.
      Signed-off-by: default avatarKeMeng Shi <shikemeng@huawei.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarValentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      46ff0e2f
    • Mathieu Desnoyers's avatar
      sched/membarrier: Fix private expedited registration check · 6cb7aa1b
      Mathieu Desnoyers authored
      [ Upstream commit fc0d7738 ]
      
      Fix a logic flaw in the way membarrier_register_private_expedited()
      handles ready state checks for private expedited sync core and private
      expedited registrations.
      
      If a private expedited membarrier registration is first performed, and
      then a private expedited sync_core registration is performed, the ready
      state check will skip the second registration when it really should not.
      Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6cb7aa1b
    • Mathieu Desnoyers's avatar
      sched/membarrier: Call sync_core only before usermode for same mm · e250f2b6
      Mathieu Desnoyers authored
      [ Upstream commit 2840cf02 ]
      
      When the prev and next task's mm change, switch_mm() provides the core
      serializing guarantees before returning to usermode. The only case
      where an explicit core serialization is needed is when the scheduler
      keeps the same mm for prev and next.
      Suggested-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e250f2b6
    • Nathan Chancellor's avatar
      libnvdimm/nfit_test: Fix acpi_handle redefinition · 9f33b178
      Nathan Chancellor authored
      [ Upstream commit 59f08896 ]
      
      After commit 62974fc3 ("libnvdimm: Enable unit test infrastructure
      compile checks"), clang warns:
      
      In file included from
      ../drivers/nvdimm/../../tools/testing/nvdimm/test/iomap.c:15:
      ../drivers/nvdimm/../../tools/testing/nvdimm/test/nfit_test.h:206:15:
      warning: redefinition of typedef 'acpi_handle' is a C11 feature
      [-Wtypedef-redefinition]
      typedef void *acpi_handle;
                    ^
      ../include/acpi/actypes.h:424:15: note: previous definition is here
      typedef void *acpi_handle;      /* Actually a ptr to a NS Node */
                    ^
      1 warning generated.
      
      The include chain:
      
      iomap.c ->
          linux/acpi.h ->
              acpi/acpi.h ->
                  acpi/actypes.h
          nfit_test.h
      
      Avoid this by including linux/acpi.h in nfit_test.h, which allows us to
      remove both the typedef and the forward declaration of acpi_object.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/660Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/20190918042148.77553-1-natechancellor@gmail.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9f33b178
    • zhengbin's avatar
      fuse: fix memleak in cuse_channel_open · 7b4f541f
      zhengbin authored
      [ Upstream commit 9ad09b19 ]
      
      If cuse_send_init fails, need to fuse_conn_put cc->fc.
      
      cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
                       ->fuse_dev_alloc->fuse_conn_get
                       ->fuse_dev_free->fuse_conn_put
      
      Fixes: cc080e9e ("fuse: introduce per-instance fuse_dev structure")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarzhengbin <zhengbin13@huawei.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7b4f541f
    • Aneesh Kumar K.V's avatar
      libnvdimm/region: Initialize bad block for volatile namespaces · 2e93d24a
      Aneesh Kumar K.V authored
      [ Upstream commit c42adf87 ]
      
      We do check for a bad block during namespace init and that use
      region bad block list. We need to initialize the bad block
      for volatile regions for this to work. We also observe a lockdep
      warning as below because the lock is not initialized correctly
      since we skip bad block init for volatile regions.
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2e93d24a
    • Stefan Mavrodiev's avatar
      thermal_hwmon: Sanitize thermal_zone type · 9025adf3
      Stefan Mavrodiev authored
      [ Upstream commit 8c7aa184 ]
      
      When calling thermal_add_hwmon_sysfs(), the device type is sanitized by
      replacing '-' with '_'. However tz->type remains unsanitized. Thus
      calling thermal_hwmon_lookup_by_type() returns no device. And if there is
      no device, thermal_remove_hwmon_sysfs() fails with "hwmon device lookup
      failed!".
      
      The result is unregisted hwmon devices in the sysfs.
      
      Fixes: 409ef0ba ("thermal_hwmon: Sanitize attribute name passed to hwmon")
      Signed-off-by: default avatarStefan Mavrodiev <stefan@olimex.com>
      Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9025adf3
    • Ido Schimmel's avatar
      thermal: Fix use-after-free when unregistering thermal zone device · c01a9dbe
      Ido Schimmel authored
      [ Upstream commit 1851799e ]
      
      thermal_zone_device_unregister() cancels the delayed work that polls the
      thermal zone, but it does not wait for it to finish. This is racy with
      respect to the freeing of the thermal zone device, which can result in a
      use-after-free [1].
      
      Fix this by waiting for the delayed work to finish before freeing the
      thermal zone device. Note that thermal_zone_device_set_polling() is
      never invoked from an atomic context, so it is safe to call
      cancel_delayed_work_sync() that can block.
      
      [1]
      [  +0.002221] ==================================================================
      [  +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
      [  +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17
      
      [  +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
      [  +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
      [  +0.000012] Call Trace:
      [  +0.000021]  dump_stack+0xa9/0x10e
      [  +0.000020]  print_address_description.cold.2+0x9/0x25e
      [  +0.000018]  __kasan_report.cold.3+0x78/0x9d
      [  +0.000016]  kasan_report+0xe/0x20
      [  +0.000016]  __mutex_lock+0x1076/0x11c0
      [  +0.000014]  step_wise_throttle+0x72/0x150
      [  +0.000018]  handle_thermal_trip+0x167/0x760
      [  +0.000019]  thermal_zone_device_update+0x19e/0x5f0
      [  +0.000019]  process_one_work+0x969/0x16f0
      [  +0.000017]  worker_thread+0x91/0xc40
      [  +0.000014]  kthread+0x33d/0x400
      [  +0.000015]  ret_from_fork+0x3a/0x50
      
      [  +0.000020] Allocated by task 1:
      [  +0.000015]  save_stack+0x19/0x80
      [  +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
      [  +0.000014]  kmem_cache_alloc_trace+0x152/0x320
      [  +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
      [  +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
      [  +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
      [  +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
      [  +0.000013]  mlxsw_pci_probe+0x355/0x4b0
      [  +0.000014]  local_pci_probe+0xc3/0x150
      [  +0.000013]  pci_device_probe+0x280/0x410
      [  +0.000013]  really_probe+0x26a/0xbb0
      [  +0.000013]  driver_probe_device+0x208/0x2e0
      [  +0.000013]  device_driver_attach+0xfe/0x140
      [  +0.000013]  __driver_attach+0x110/0x310
      [  +0.000013]  bus_for_each_dev+0x14b/0x1d0
      [  +0.000013]  driver_register+0x1c0/0x400
      [  +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
      [  +0.000014]  do_one_initcall+0x239/0x4dd
      [  +0.000013]  kernel_init_freeable+0x42b/0x4e8
      [  +0.000012]  kernel_init+0x11/0x18b
      [  +0.000013]  ret_from_fork+0x3a/0x50
      
      [  +0.000015] Freed by task 581:
      [  +0.000013]  save_stack+0x19/0x80
      [  +0.000014]  __kasan_slab_free+0x125/0x170
      [  +0.000013]  kfree+0xf3/0x310
      [  +0.000013]  thermal_release+0xc7/0xf0
      [  +0.000014]  device_release+0x77/0x200
      [  +0.000014]  kobject_put+0x1a8/0x4c0
      [  +0.000014]  device_unregister+0x38/0xc0
      [  +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
      [  +0.000014]  mlxsw_thermal_fini+0x184/0x35a
      [  +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
      [  +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
      [  +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
      [  +0.000014]  genl_family_rcv_msg+0x700/0xee0
      [  +0.000013]  genl_rcv_msg+0xca/0x170
      [  +0.000013]  netlink_rcv_skb+0x137/0x3a0
      [  +0.000012]  genl_rcv+0x29/0x40
      [  +0.000013]  netlink_unicast+0x49b/0x660
      [  +0.000013]  netlink_sendmsg+0x755/0xc90
      [  +0.000013]  __sys_sendto+0x3de/0x430
      [  +0.000013]  __x64_sys_sendto+0xe2/0x1b0
      [  +0.000013]  do_syscall_64+0xa4/0x4d0
      [  +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  +0.000017] The buggy address belongs to the object at ffff8881e48e0008
                     which belongs to the cache kmalloc-2k of size 2048
      [  +0.000012] The buggy address is located 1096 bytes inside of
                     2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
      [  +0.000007] The buggy address belongs to the page:
      [  +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
      [  +0.000020] flags: 0x200000000010200(slab|head)
      [  +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
      [  +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
      [  +0.000007] page dumped because: kasan: bad access detected
      
      [  +0.000012] Memory state around the buggy address:
      [  +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000008]                                                  ^
      [  +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000007] ==================================================================
      
      Fixes: b1569e99 ("ACPI: move thermal trip handling to generic thermal layer")
      Reported-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c01a9dbe
    • Sanjay R Mehta's avatar
      ntb: point to right memory window index · 55ebeb4e
      Sanjay R Mehta authored
      [ Upstream commit ae89339b ]
      
      second parameter of ntb_peer_mw_get_addr is pointing to wrong memory
      window index by passing "peer gidx" instead of "local gidx".
      
      For ex, "local gidx" value is '0' and "peer gidx" value is '1', then
      
      on peer side ntb_mw_set_trans() api is used as below with gidx pointing to
      local side gidx which is '0', so memroy window '0' is chosen and XLAT '0'
      will be programmed by peer side.
      
          ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat,
                          peer->inbuf_size);
      
      Now, on local side ntb_peer_mw_get_addr() is been used as below with gidx
      pointing to "peer gidx" which is '1', so pointing to memory window '1'
      instead of memory window '0'.
      
          ntb_peer_mw_get_addr(perf->ntb,  peer->gidx, &phys_addr,
                              &peer->outbuf_size);
      
      So this patch pass "local gidx" as parameter to ntb_peer_mw_get_addr().
      Signed-off-by: default avatarSanjay R Mehta <sanju.mehta@amd.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      55ebeb4e
    • Arvind Sankar's avatar
      x86/purgatory: Disable the stackleak GCC plugin for the purgatory · 9dabade5
      Arvind Sankar authored
      [ Upstream commit ca14c996 ]
      
      Since commit:
      
        b059f801 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      
      kexec breaks if GCC_PLUGIN_STACKLEAK=y is enabled, as the purgatory
      contains undefined references to stackleak_track_stack.
      
      Attempting to load a kexec kernel results in this failure:
      
        kexec: Undefined symbol: stackleak_track_stack
        kexec-bzImage64: Loading purgatory failed
      
      Fix this by disabling the stackleak plugin for the purgatory.
      Signed-off-by: default avatarArvind Sankar <nivedita@alum.mit.edu>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: b059f801 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      Link: https://lkml.kernel.org/r/20190923171753.GA2252517@rani.riverdale.lanSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9dabade5
    • Fabrice Gasnier's avatar
      pwm: stm32-lp: Add check in case requested period cannot be achieved · 65348659
      Fabrice Gasnier authored
      [ Upstream commit c91e3234 ]
      
      LPTimer can use a 32KHz clock for counting. It depends on clock tree
      configuration. In such a case, PWM output frequency range is limited.
      Although unlikely, nothing prevents user from requesting a PWM frequency
      above counting clock (32KHz for instance):
      - This causes (prd - 1) = 0xffff to be written in ARR register later in
      the apply() routine.
      This results in badly configured PWM period (and also duty_cycle).
      Add a check to report an error is such a case.
      Signed-off-by: default avatarFabrice Gasnier <fabrice.gasnier@st.com>
      Reviewed-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: default avatarThierry Reding <thierry.reding@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      65348659
    • Trond Myklebust's avatar
      pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors · 19b1c70e
      Trond Myklebust authored
      [ Upstream commit 9c47b18c ]
      
      IF the server rejected our layout return with a state error such as
      NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want
      to clear out all the remaining layout segments and mark that stateid
      as invalid.
      
      Fixes: 1c5bd76d ("pNFS: Enable layoutreturn operation for...")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      19b1c70e
    • Trek's avatar
      drm/amdgpu: Check for valid number of registers to read · 1c70ae6a
      Trek authored
      [ Upstream commit 73d8e6c7 ]
      
      Do not try to allocate any amount of memory requested by the user.
      Instead limit it to 128 registers. Actually the longest series of
      consecutive allowed registers are 48, mmGB_TILE_MODE0-31 and
      mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
      
      Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273Signed-off-by: default avatarTrek <trek00@inbox.ru>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1c70ae6a
    • Felix Kuehling's avatar
      drm/amdgpu: Fix KFD-related kernel oops on Hawaii · e0af3b19
      Felix Kuehling authored
      [ Upstream commit dcafbd50 ]
      
      Hawaii needs to flush caches explicitly, submitting an IB in a user
      VMID from kernel mode. There is no s_fence in this case.
      
      Fixes: eb3961a5 ("drm/amdgpu: remove fence context from the job")
      Signed-off-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e0af3b19
    • Florian Westphal's avatar
      netfilter: nf_tables: allow lookups in dynamic sets · f7ace7f2
      Florian Westphal authored
      [ Upstream commit acab7131 ]
      
      This un-breaks lookups in sets that have the 'dynamic' flag set.
      Given this active example configuration:
      
      table filter {
        set set1 {
          type ipv4_addr
          size 64
          flags dynamic,timeout
          timeout 1m
        }
      
        chain input {
           type filter hook input priority 0; policy accept;
        }
      }
      
      ... this works:
      nft add rule ip filter input add @set1 { ip saddr }
      
      -> whenever rule is triggered, the source ip address is inserted
      into the set (if it did not exist).
      
      This won't work:
      nft add rule ip filter input ip saddr @set1 counter
      Error: Could not process rule: Operation not supported
      
      In other words, we can add entries to the set, but then can't make
      matching decision based on that set.
      
      That is just wrong -- all set backends support lookups (else they would
      not be very useful).
      The failure comes from an explicit rejection in nft_lookup.c.
      
      Looking at the history, it seems like NFT_SET_EVAL used to mean
      'set contains expressions' (aka. "is a meter"), for instance something like
      
       nft add rule ip filter input meter example { ip saddr limit rate 10/second }
       or
       nft add rule ip filter input meter example { ip saddr counter }
      
      The actual meaning of NFT_SET_EVAL however, is
      'set can be updated from the packet path'.
      
      'meters' and packet-path insertions into sets, such as
      'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
      and thus require a set backend that provides the ->update() function.
      
      The only set that provides this also is the only one that has the
      NFT_SET_EVAL feature flag.
      
      Removing the wrong check makes the above example work.
      While at it, also fix the flag check during set instantiation to
      allow supported combinations only.
      
      Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f7ace7f2
    • Ryan Chen's avatar
      watchdog: aspeed: Add support for AST2600 · f217883b
      Ryan Chen authored
      [ Upstream commit b3528b48 ]
      
      The ast2600 can be supported by the same code as the ast2500.
      Signed-off-by: default avatarRyan Chen <ryan_chen@aspeedtech.com>
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20190819051738.17370-3-joel@jms.id.auSigned-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f217883b
    • Erqi Chen's avatar
      ceph: reconnect connection if session hang in opening state · 520c2a64
      Erqi Chen authored
      [ Upstream commit 71a228bc ]
      
      If client mds session is evicted in CEPH_MDS_SESSION_OPENING state,
      mds won't send session msg to client, and delayed_work skip
      CEPH_MDS_SESSION_OPENING state session, the session hang forever.
      
      Allow ceph_con_keepalive to reconnect a session in OPENING to avoid
      session hang. Also, ensure that we skip sessions in RESTARTING and
      REJECTED states since those states can't be resurrected by issuing
      a keepalive.
      
      Link: https://tracker.ceph.com/issues/41551
      Signed-off-by: Erqi Chen chenerqi@gmail.com
      Reviewed-by: default avatar"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      520c2a64
    • Luis Henriques's avatar
      ceph: fix directories inode i_blkbits initialization · 0275113f
      Luis Henriques authored
      [ Upstream commit 75067034 ]
      
      When filling an inode with info from the MDS, i_blkbits is being
      initialized using fl_stripe_unit, which contains the stripe unit in
      bytes.  Unfortunately, this doesn't make sense for directories as they
      have fl_stripe_unit set to '0'.  This means that i_blkbits will be set
      to 0xff, causing an UBSAN undefined behaviour in i_blocksize():
      
        UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
        shift exponent 255 is too large for 32-bit type 'int'
      
      Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
      is zero.
      Signed-off-by: default avatarLuis Henriques <lhenriques@suse.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0275113f
    • Igor Druzhinin's avatar
      xen/pci: reserve MCFG areas earlier · 2bc2a90a
      Igor Druzhinin authored
      [ Upstream commit a4098bc6 ]
      
      If MCFG area is not reserved in E820, Xen by default will defer its usage
      until Dom0 registers it explicitly after ACPI parser recognizes it as
      a reserved resource in DSDT. Having it reserved in E820 is not
      mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
      and firmware is free to keep a hole in E820 in that place. Xen doesn't know
      what exactly is inside this hole since it lacks full ACPI view of the
      platform therefore it's potentially harmful to access MCFG region
      without additional checks as some machines are known to provide
      inconsistent information on the size of the region.
      
      Now xen_mcfg_late() runs after acpi_init() which is too late as some basic
      PCI enumeration starts exactly there as well. Trying to register a device
      prior to MCFG reservation causes multiple problems with PCIe extended
      capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are
      no convenient hooks for us to subscribe to so register MCFG areas earlier
      upon the first invocation of xen_add_device(). It should be safe to do once
      since all the boot time buses must have their MCFG areas in MCFG table
      already and we don't support PCI bus hot-plug.
      Signed-off-by: default avatarIgor Druzhinin <igor.druzhinin@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2bc2a90a
    • Chengguang Xu's avatar
      9p: avoid attaching writeback_fid on mmap with type PRIVATE · 18dd2b05
      Chengguang Xu authored
      [ Upstream commit c87a37eb ]
      
      Currently on mmap cache policy, we always attach writeback_fid
      whether mmap type is SHARED or PRIVATE. However, in the use case
      of kata-container which combines 9p(Guest OS) with overlayfs(Host OS),
      this behavior will trigger overlayfs' copy-up when excute command
      inside container.
      
      Link: http://lkml.kernel.org/r/20190820100325.10313-1-cgxu519@zoho.com.cnSigned-off-by: default avatarChengguang Xu <cgxu519@zoho.com.cn>
      Signed-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      18dd2b05
    • Lu Shuaibing's avatar
      9p: Transport error uninitialized · 07f3596c
      Lu Shuaibing authored
      [ Upstream commit 0ce772fe ]
      
      The p9_tag_alloc() does not initialize the transport error t_err field.
      The struct p9_req_t *req is allocated and stored in a struct p9_client
      variable. The field t_err is never initialized before p9_conn_cancel()
      checks its value.
      
      KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
      reports this bug.
      
      ==================================================================
      BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0
      Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216
      
      CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      Workqueue: events p9_write_work
      Call Trace:
       dump_stack+0x75/0xae
       __kumsan_report+0x17c/0x3e6
       kumsan_report+0xe/0x20
       p9_conn_cancel+0x2d9/0x3b0
       p9_write_work+0x183/0x4a0
       process_one_work+0x4d1/0x8c0
       worker_thread+0x6e/0x780
       kthread+0x1ca/0x1f0
       ret_from_fork+0x35/0x40
      
      Allocated by task 1979:
       save_stack+0x19/0x80
       __kumsan_kmalloc.constprop.3+0xbc/0x120
       kmem_cache_alloc+0xa7/0x170
       p9_client_prepare_req.part.9+0x3b/0x380
       p9_client_rpc+0x15e/0x880
       p9_client_create+0x3d0/0xac0
       v9fs_session_init+0x192/0xc80
       v9fs_mount+0x67/0x470
       legacy_get_tree+0x70/0xd0
       vfs_get_tree+0x4a/0x1c0
       do_mount+0xba9/0xf90
       ksys_mount+0xa8/0x120
       __x64_sys_mount+0x62/0x70
       do_syscall_64+0x6d/0x1e0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 0:
      (stack is not available)
      
      The buggy address belongs to the object at ffff88805f9b6008
       which belongs to the cache p9_req_t of size 144
      The buggy address is located 4 bytes inside of
       144-byte region [ffff88805f9b6008, ffff88805f9b6098)
      The buggy address belongs to the page:
      page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0
      flags: 0x100000000010200(slab|head)
      raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740
      raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000
      page dumped because: kumsan: bad access detected
      ==================================================================
      
      Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.comSigned-off-by: default avatarLu Shuaibing <shuaibinglu@126.com>
      [dominique.martinet@cea.fr: grouped the added init with the others]
      Signed-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      07f3596c
    • Jia-Ju Bai's avatar
      fs: nfs: Fix possible null-pointer dereferences in encode_attrs() · 448deb13
      Jia-Ju Bai authored
      [ Upstream commit e2751463 ]
      
      In encode_attrs(), there is an if statement on line 1145 to check
      whether label is NULL:
          if (label && (attrmask[2] & FATTR4_WORD2_SECURITY_LABEL))
      
      When label is NULL, it is used on lines 1178-1181:
          *p++ = cpu_to_be32(label->lfs);
          *p++ = cpu_to_be32(label->pi);
          *p++ = cpu_to_be32(label->len);
          p = xdr_encode_opaque_fixed(p, label->label, label->len);
      
      To fix these bugs, label is checked before being used.
      
      These bugs are found by a static analysis tool STCheck written by us.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      448deb13
    • Sascha Hauer's avatar
      ima: fix freeing ongoing ahash_request · 4753e7a8
      Sascha Hauer authored
      [ Upstream commit 4ece3125 ]
      
      integrity_kernel_read() can fail in which case we forward to call
      ahash_request_free() on a currently running request. We have to wait
      for its completion before we can free the request.
      
      This was observed by interrupting a "find / -type f -xdev -print0 | xargs -0
      cat 1>/dev/null" with ctrl-c on an IMA enabled filesystem.
      Signed-off-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4753e7a8
    • Sascha Hauer's avatar
      ima: always return negative code for error · b69c3085
      Sascha Hauer authored
      [ Upstream commit f5e10401 ]
      
      integrity_kernel_read() returns the number of bytes read. If this is
      a short read then this positive value is returned from
      ima_calc_file_hash_atfm(). Currently this is only indirectly called from
      ima_calc_file_hash() and this function only tests for the return value
      being zero or nonzero and also doesn't forward the return value.
      Nevertheless there's no point in returning a positive value as an error,
      so translate a short read into -EINVAL.
      Signed-off-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b69c3085
    • Will Deacon's avatar
      arm64: cpufeature: Detect SSBS and advertise to userspace · 6df3c66d
      Will Deacon authored
      commit d71be2b6 upstream.
      
      Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass
      Safe (SSBS) which can be used as a mitigation against Spectre variant 4.
      
      Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS
      directly, so that userspace can toggle the SSBS control without trapping
      to the kernel.
      
      This patch probes for the existence of SSBS and advertise the new instructions
      to userspace if they exist.
      Reviewed-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6df3c66d
    • Johannes Berg's avatar
      cfg80211: initialize on-stack chandefs · 3a0e6733
      Johannes Berg authored
      commit f43e5210 upstream.
      
      In a few places we don't properly initialize on-stack chandefs,
      resulting in EDMG data to be non-zero, which broke things.
      
      Additionally, in a few places we rely on the driver to init the
      data completely, but perhaps we shouldn't as non-EDMG drivers
      may not initialize the EDMG data, also initialize it there.
      
      Cc: stable@vger.kernel.org
      Fixes: 2a38075c ("nl80211: Add support for EDMG channels")
      Reported-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Tested-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Link: https://lore.kernel.org/r/1569239475-I2dcce394ecf873376c386a78f31c2ec8b538fa25@changeidSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a0e6733
    • Vasily Gorbik's avatar
      s390/cio: avoid calling strlen on null pointer · 16c75eb1
      Vasily Gorbik authored
      commit ea298e6e upstream.
      
      Fix the following kasan finding:
      BUG: KASAN: global-out-of-bounds in ccwgroup_create_dev+0x850/0x1140
      Read of size 1 at addr 0000000000000000 by task systemd-udevd.r/561
      
      CPU: 30 PID: 561 Comm: systemd-udevd.r Tainted: G    B
      Hardware name: IBM 3906 M04 704 (LPAR)
      Call Trace:
      ([<0000000231b3db7e>] show_stack+0x14e/0x1a8)
       [<0000000233826410>] dump_stack+0x1d0/0x218
       [<000000023216fac4>] print_address_description+0x64/0x380
       [<000000023216f5a8>] __kasan_report+0x138/0x168
       [<00000002331b8378>] ccwgroup_create_dev+0x850/0x1140
       [<00000002332b618a>] group_store+0x3a/0x50
       [<00000002323ac706>] kernfs_fop_write+0x246/0x3b8
       [<00000002321d409a>] vfs_write+0x132/0x450
       [<00000002321d47da>] ksys_write+0x122/0x208
       [<0000000233877102>] system_call+0x2a6/0x2c8
      
      Triggered by:
      openat(AT_FDCWD, "/sys/bus/ccwgroup/drivers/qeth/group",
      		O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 16
      write(16, "0.0.bd00,0.0.bd01,0.0.bd02", 26) = 26
      
      The problem is that __get_next_id in ccwgroup_create_dev might set "buf"
      buffer pointer to NULL and explicit check for that is required.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarSebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16c75eb1
    • Johan Hovold's avatar
      ieee802154: atusb: fix use-after-free at disconnect · 3f41e88f
      Johan Hovold authored
      commit 7fd25e6f upstream.
      
      The disconnect callback was accessing the hardware-descriptor private
      data after having having freed it.
      
      Fixes: 7490b008 ("ieee802154: add support for atusb transceiver")
      Cc: stable <stable@vger.kernel.org>     # 4.2
      Cc: Alexander Aring <alex.aring@gmail.com>
      Reported-by: syzbot+f4509a9138a1472e7e80@syzkaller.appspotmail.com
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f41e88f
    • Juergen Gross's avatar
      xen/xenbus: fix self-deadlock after killing user process · 975859bb
      Juergen Gross authored
      commit a8fabb38 upstream.
      
      In case a user process using xenbus has open transactions and is killed
      e.g. via ctrl-C the following cleanup of the allocated resources might
      result in a deadlock due to trying to end a transaction in the xenbus
      worker thread:
      
      [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
      [ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
      [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 2551.528585] xenbus          D    0    37      2 0x80000080
      [ 2551.528590] Call Trace:
      [ 2551.528603]  __schedule+0x2c0/0x870
      [ 2551.528606]  ? _cond_resched+0x19/0x40
      [ 2551.528632]  schedule+0x2c/0x70
      [ 2551.528637]  xs_talkv+0x1ec/0x2b0
      [ 2551.528642]  ? wait_woken+0x80/0x80
      [ 2551.528645]  xs_single+0x53/0x80
      [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
      [ 2551.528651]  xenbus_file_free+0x5a/0x160
      [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
      [ 2551.528657]  xenbus_thread+0x7de/0x880
      [ 2551.528660]  ? wait_woken+0x80/0x80
      [ 2551.528665]  kthread+0x121/0x140
      [ 2551.528667]  ? xb_read+0x1d0/0x1d0
      [ 2551.528670]  ? kthread_park+0x90/0x90
      [ 2551.528673]  ret_from_fork+0x35/0x40
      
      Fix this by doing the cleanup via a workqueue instead.
      Reported-by: default avatarJames Dingwall <james@dingwall.me.uk>
      Fixes: fd8aa909 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
      Cc: <stable@vger.kernel.org> # 4.11
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      975859bb
    • Wanpeng Li's avatar
      Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" · e409b81d
      Wanpeng Li authored
      commit 89340d09 upstream.
      
      This patch reverts commit 75437bb3 (locking/pvqspinlock: Don't
      wait if vCPU is preempted).  A large performance regression was caused
      by this commit.  on over-subscription scenarios.
      
      The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads,
      with three VMs of 80 vCPUs each.  The score of ebizzy -M is reduced from
      13000-14000 records/s to 1700-1800 records/s:
      
                Host                Guest                score
      
      vanilla w/o kvm optimizations     upstream    1700-1800 records/s
      vanilla w/o kvm optimizations     revert      13000-14000 records/s
      vanilla w/ kvm optimizations      upstream    4500-5000 records/s
      vanilla w/ kvm optimizations      revert      14000-15500 records/s
      
      Exit from aggressive wait-early mechanism can result in premature yield
      and extra scheduling latency.
      
      Actually, only 6% of wait_early events are caused by vcpu_is_preempted()
      being true.  However, when one vCPU voluntarily releases its vCPU, all
      the subsequently waiters in the queue will do the same and the cascading
      effect leads to bad performance.
      
      kvm optimizations:
      [1] commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts)
      [2] commit 266e85a5 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
      
      Tested-by: loobinliu@tencent.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: loobinliu@tencent.com
      Cc: stable@vger.kernel.org
      Fixes: 75437bb3 (locking/pvqspinlock: Don't wait if vCPU is preempted)
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e409b81d
    • Russell King's avatar
      mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence · 7ed2867c
      Russell King authored
      commit 121bd08b upstream.
      
      We must not unconditionally set the DMA snoop bit; if the DMA API is
      assuming that the device is not DMA coherent, and the device snoops the
      CPU caches, the device can see stale cache lines brought in by
      speculative prefetch.
      
      This leads to the device seeing stale data, potentially resulting in
      corrupted data transfers.  Commonly, this results in a descriptor fetch
      error such as:
      
      mmc0: ADMA error
      mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
      mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
      mmc0: sdhci: Blk size:  0x00000008 | Blk cnt:  0x00000001
      mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
      mmc0: sdhci: Present:   0x01f50008 | Host ctl: 0x00000038
      mmc0: sdhci: Power:     0x00000003 | Blk gap:  0x00000000
      mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x000040d8
      mmc0: sdhci: Timeout:   0x00000003 | Int stat: 0x00000001
      mmc0: sdhci: Int enab:  0x037f108f | Sig enab: 0x037f108b
      mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
      mmc0: sdhci: Caps:      0x35fa0000 | Caps_1:   0x0000af00
      mmc0: sdhci: Cmd:       0x0000333a | Max curr: 0x00000000
      mmc0: sdhci: Resp[0]:   0x00000920 | Resp[1]:  0x001d8a33
      mmc0: sdhci: Resp[2]:   0x325b5900 | Resp[3]:  0x3f400e00
      mmc0: sdhci: Host ctl2: 0x00000000
      mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr: 0x000000236d43820c
      mmc0: sdhci: ============================================
      mmc0: error -5 whilst initialising SD card
      
      but can lead to other errors, and potentially direct the SDHCI
      controller to read/write data to other memory locations (e.g. if a valid
      descriptor is visible to the device in a stale cache line.)
      
      Fix this by ensuring that the DMA snoop bit corresponds with the
      behaviour of the DMA API.  Since the driver currently only supports DT,
      use of_dma_is_coherent().  Note that device_get_dma_attr() can not be
      used as that risks re-introducing this bug if/when the driver is
      converted to ACPI.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ed2867c
    • Russell King's avatar
      mmc: sdhci: improve ADMA error reporting · 4509a19d
      Russell King authored
      commit d1c536e3 upstream.
      
      ADMA errors are potentially data corrupting events; although we print
      the register state, we do not usefully print the ADMA descriptors.
      Worse than that, we print them by referencing their virtual address
      which is meaningless when the register state gives us the DMA address
      of the failing descriptor.
      
      Print the ADMA descriptors giving their DMA addresses rather than their
      virtual addresses, and print them using SDHCI_DUMP() rather than DBG().
      
      We also do not show the correct value of the interrupt status register;
      the register dump shows the current value, after we have cleared the
      pending interrupts we are going to service.  What is more useful is to
      print the interrupts that _were_ pending at the time the ADMA error was
      encountered.  Fix that too.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4509a19d
    • Xiaolin Zhang's avatar
      drm/i915/gvt: update vgpu workload head pointer correctly · 873f49d6
      Xiaolin Zhang authored
      commit 0a3242bd upstream.
      
      when creating a vGPU workload, the guest context head pointer should
      be updated correctly by comparing with the exsiting workload in the
      guest worklod queue including the current running context.
      
      in some situation, there is a running context A and then received 2 new
      vGPU workload context B and A. in the new workload context A, it's head
      pointer should be updated with the running context A's tail.
      
      v2: walk through guest workload list in backward way.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarXiaolin Zhang <xiaolin.zhang@intel.com>
      Reviewed-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      873f49d6
    • Lyude Paul's avatar
      drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors · 198bc704
      Lyude Paul authored
      commit 698c1aa9 upstream.
      
      On the ThinkPad P71, we have one eDP connector exposed along with 5 DP
      connectors, resulting in a total of 11 TMDS encoders. Since the GPU on
      this system is also capable of MST, we create an additional 4 fake MST
      encoders for each DP port. Unfortunately, we also do this for the eDP
      port as well, resulting in:
      
        1 eDP port: +1 TMDS encoder
                    +4 DPMST encoders
        5 DP ports: +2 TMDS encoders
                    +4 DPMST encoders
      	      *5 ports
      	      == 35 encoders
      
      Which breaks things, since DRM has a hard coded limit of 32 encoders.
      So, fix this by not creating MSTMs for any eDP connectors. This brings
      us down to 31 encoders, although we can do better.
      
      This fixes driver probing for nouveau on the ThinkPad P71.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      198bc704