1. 31 May, 2019 18 commits
    • libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead · e9e27bfc
      Dan Williams authored
      commit 52f476a3 upstream.
      
      Jeff discovered that performance improves from ~375K iops to ~519K iops
      on a simple psync-write fio workload when moving the location of 'struct
      page' from the default PMEM location to DRAM. This result is surprising
      because the expectation is that 'struct page' for dax is only needed for
      third party references to dax mappings. For example, a dax-mapped buffer
      passed to another system call for direct-I/O requires 'struct page' for
      sending the request down the driver stack and pinning the page. There is
      no usage of 'struct page' for first party access to a file via
      read(2)/write(2) and friends.
      
      However, this "no page needed" expectation is violated by
      CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
      copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
      check_heap_object() helper routine assumes the buffer is backed by a
      slab-allocator (DRAM) page and applies some checks.  Those checks are
      both invalid (dax pages do not originate from the slab) and redundant
      (dax_iomap_actor() has already validated that the I/O is within bounds).
      Specifically that routine validates that the logical file offset is
      within bounds of the file, then it does a sector-to-pfn translation
      which validates that the physical mapping is within bounds of the block
      device.
      
      Bypass additional hardened usercopy overhead and call the 'no check'
      versions of the copy_{to,from}_iter operations directly.
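
      A minimal sketch of the shape of that change in the pmem dax_operations
      callbacks (signatures and helper names recalled from kernels of this era
      and offered as an assumption, not quoted from the patch):

        static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
                        void *addr, size_t bytes, struct iov_iter *i)
        {
                /* no check_copy_size()/check_heap_object(): dax pages are not
                 * slab-backed, and dax_iomap_actor() already bounded the I/O */
                return _copy_from_iter_flushcache(addr, bytes, i);
        }

        static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
                        void *addr, size_t bytes, struct iov_iter *i)
        {
                return _copy_to_iter_mcsafe(addr, bytes, i);
        }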
      
      Fixes: 0aed55af ("x86, uaccess: introduce copy_from_iter_flushcache...")
      Cc: <stable@vger.kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9e27bfc
    • KVM: nVMX: Fix using __this_cpu_read() in preemptible context · e3feb4af
      Wanpeng Li authored
      commit 541e886f upstream.
      
       BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
        caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
        CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
        Call Trace:
         dump_stack+0x67/0x95
         __this_cpu_preempt_check+0xd2/0xe0
         nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
         nested_vmx_run+0xda/0x2b0 [kvm_intel]
         handle_vmlaunch+0x13/0x20 [kvm_intel]
         vmx_handle_exit+0xbd/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
         kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
         do_vfs_ioctl+0xa5/0x6e0
         ksys_ioctl+0x6d/0x80
         __x64_sys_ioctl+0x1a/0x20
         do_syscall_64+0x6f/0x6c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Accessing a per-cpu variable requires preemption to be disabled;
      extend the preemption-disabled region so that it covers the
      __this_cpu_read().
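
      A standalone illustration of the pattern the fix applies (not the
      kvm_intel code path itself; the per-cpu variable here is invented for
      the example):

        #include <linux/percpu.h>
        #include <linux/preempt.h>

        static DEFINE_PER_CPU(unsigned long, example_counter);

        static unsigned long read_example_counter(void)
        {
                unsigned long val;

                /* without this the task could migrate between CPUs, and the
                 * debug check behind __this_cpu_read() fires as in the splat
                 * above */
                preempt_disable();
                val = __this_cpu_read(example_counter);
                preempt_enable();

                return val;
        }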
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Fixes: 52017608 ("KVM: nVMX: add option to perform early consistency checks via H/W")
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3feb4af
    • kvm: svm/avic: fix off-by-one in checking host APIC ID · 4a4c222e
      Suthikulpanit, Suravee authored
      commit c9bcd3e3 upstream.
      
      The current logic does not allow a VCPU to be loaded onto a CPU with
      APIC ID 255. This should be allowed, since the host physical APIC ID
      field in the AVIC Physical APIC table entry is an 8-bit value and
      APIC ID 255 is valid in systems with x2APIC enabled.
      Instead, disallow VCPU load only if the host APIC ID cannot be
      represented by an 8-bit value.
      
      Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
      instead of AVIC_MAX_PHYSICAL_ID_COUNT.
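
      A sketch of the described check change (only the two macros above come
      from the commit text; the variable name and surrounding code are
      assumptions for illustration):

        /* before: rejects host APIC ID 255 even though it fits in 8 bits */
        if (WARN_ON(h_physical_id >= AVIC_MAX_PHYSICAL_ID_COUNT))
                return;

        /* after: reject only IDs that the 8-bit host physical APIC ID
         * field cannot represent */
        if (WARN_ON(h_physical_id > AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
                return;
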
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a4c222e
    • kvm: Check irqchip mode before assign irqfd · baaee956
      Peter Xu authored
      commit 654f1f13 upstream.
      
      When assigning a kvm irqfd we didn't check the irqchip mode, so
      KVM_IRQFD succeeded with all irqchip modes.  However, it does not make
      much sense to create an irqfd without an in-kernel irqchip.  Let's
      provide an arch-dependent helper to check whether a specific irqfd is
      allowed by the arch.  At least for x86, it makes sense to check:
      
      - when irqchip mode is NONE, all irqfds should be disallowed, and,
      
      - when irqchip mode is SPLIT, irqfds that are with resamplefd should
        be disallowed.
      
      In either case, we previously ignored the irq or the irq ack event
      silently if the irqchip mode was incorrect.  That can cause mysterious
      guest behavior and is hard to triage.  Fail KVM_IRQFD earlier to
      detect these incorrect configurations.
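
      A sketch of what the x86 side of such a helper could look like,
      following the two rules above (the helper name and placement are
      assumptions; irqchip_in_kernel(), irqchip_kernel() and
      KVM_IRQFD_FLAG_RESAMPLE are existing KVM names):

        bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args)
        {
                bool resample = args->flags & KVM_IRQFD_FLAG_RESAMPLE;

                /* NONE: no irqfds at all; SPLIT: no resampling irqfds */
                return resample ? irqchip_kernel(kvm) : irqchip_in_kernel(kvm);
        }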
      
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Radim Krčmář <rkrcmar@redhat.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      baaee956
    • dax: Arrange for dax_supported check to span multiple devices · e00303be
      Dan Williams authored
      commit 7bf7eac8 upstream.
      
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()" device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
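
      A sketch of the refactored interfaces described above, with the
      signatures given as assumptions recalled from this era of the dax core
      rather than quoted from the patch:

        /* new dax_operations member: lets composite (device-mapper) devices
         * split and route the check per target */
        bool (*dax_supported)(struct dax_device *dax_dev, struct block_device *bdev,
                        int blocksize, sector_t start, sector_t sectors);

        /* the former bdev_dax_supported() internals, now taking a sector
         * range so it can be called from an ->iterate_devices() callback */
        bool generic_fsdax_supported(struct dax_device *dax_dev,
                        struct block_device *bdev, int blocksize,
                        sector_t start, sector_t sectors);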
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reported-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e00303be
    • tracing: Add a check_val() check before updating cond_snapshot() track_val · 269360f1
      Tom Zanussi authored
      commit 9b2ca371 upstream.
      
      Without this check a snapshot is taken whenever a bucket's max is hit,
      rather than only when the global max is hit, as it should be.
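
      A minimal standalone illustration of the intended logic (invented
      names, not the trace_events_hist.c code): the tracked value is only
      updated, and a snapshot only taken, when a new value beats the global
      maximum rather than merely a bucket's maximum.  The before/after runs
      below show the observable difference.

        struct track_data_example {
                u64 track_val;          /* global max seen so far */
        };

        static bool check_val_example(struct track_data_example *td, u64 val)
        {
                return val > td->track_val;
        }

        static bool maybe_snapshot(struct track_data_example *td, u64 val)
        {
                if (!check_val_example(td, val))
                        return false;   /* a bucket max, but not a new global max */

                td->track_val = val;
                return true;            /* caller takes the snapshot */
        }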
      
      Before:
      
        In this example, we do a first run of the workload (cyclictest),
        examine the output, note the max ('triggering value') (347), then do
        a second run and note the max again.
      
        In this case, the max in the second run (39) is below the max in the
        first run, but since we haven't cleared the histogram, the first max
        is still in the histogram and is higher than any other max, so it
        should still be the max for the snapshot.  It isn't, however: the
        triggering value should still be 347 after the second run, but the
        output below reports 39.
      
        # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
        # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio,prev_comm):onmax($wakeup_lat).snapshot() if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
      
        # cyclictest -p 80 -n -s -t 2 -D 2
      
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
      
        { next_pid:       2143 } hitcount:        199
          max:         44  next_prio:        120  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/4
      
        { next_pid:       2145 } hitcount:       1325
          max:         38  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/2
      
        { next_pid:       2144 } hitcount:       1982
          max:        347  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/6
      
        Snapshot taken (see tracing/snapshot).  Details:
            triggering value { onmax($wakeup_lat) }:        347
            triggered by event with key: { next_pid:       2144 }
      
        # cyclictest -p 80 -n -s -t 2 -D 2
      
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
      
        { next_pid:       2143 } hitcount:        199
          max:         44  next_prio:        120  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/4
      
        { next_pid:       2148 } hitcount:        199
          max:         16  next_prio:        120  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/1
      
        { next_pid:       2145 } hitcount:       1325
          max:         38  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/2
      
        { next_pid:       2150 } hitcount:       1326
          max:         39  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/4
      
        { next_pid:       2144 } hitcount:       1982
          max:        347  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/6
      
        { next_pid:       2149 } hitcount:       1983
          max:        130  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/0
      
        Snapshot taken (see tracing/snapshot).  Details:
          triggering value { onmax($wakeup_lat) }:    39
          triggered by event with key: { next_pid:       2150 }
      
      After:
      
        In this example, we do a first run of the workload (cyclictest),
        examine the output, note the max ('triggering value') (375), then do
        a second run and note the max again.
      
        In this case, the max in the second run is still 375, the highest in
        any bucket, as it should be.
      
        # cyclictest -p 80 -n -s -t 2 -D 2
      
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
      
        { next_pid:       2072 } hitcount:        200
          max:         28  next_prio:        120  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/5
      
        { next_pid:       2074 } hitcount:       1323
          max:        375  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/2
      
        { next_pid:       2073 } hitcount:       1980
          max:        153  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/6
      
        Snapshot taken (see tracing/snapshot).  Details:
          triggering value { onmax($wakeup_lat) }:        375
          triggered by event with key: { next_pid:       2074 }
      
        # cyclictest -p 80 -n -s -t 2 -D 2
      
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
      
        { next_pid:       2101 } hitcount:        199
          max:         49  next_prio:        120  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/6
      
        { next_pid:       2072 } hitcount:        200
          max:         28  next_prio:        120  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/5
      
        { next_pid:       2074 } hitcount:       1323
          max:        375  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/2
      
        { next_pid:       2103 } hitcount:       1325
          max:         74  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/4
      
        { next_pid:       2073 } hitcount:       1980
          max:        153  next_prio:         19  next_comm: cyclictest
          prev_pid:          0  prev_prio:        120  prev_comm: swapper/6
      
        { next_pid:       2102 } hitcount:       1981
          max:         84  next_prio:         19  next_comm: cyclictest
          prev_pid:         12  prev_prio:        120  prev_comm: kworker/0:1
      
        Snapshot taken (see tracing/snapshot).  Details:
          triggering value { onmax($wakeup_lat) }:        375
          triggered by event with key: { next_pid:       2074 }
      
      Link: http://lkml.kernel.org/r/95958351329f129c07504b4d1769c47a97b70d65.1555597045.git.tom.zanussi@linux.intel.com
      
      Cc: stable@vger.kernel.org
      Fixes: a3785b7e ("tracing: Add hist trigger snapshot() action")
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      269360f1
    • mmc: sdhci-iproc: Set NO_HISPD bit to fix HS50 data hold time problem · acf49fa4
      Trac Hoang authored
      commit ec0970e0 upstream.
      
      The iproc host eMMC/SD controller hold time does not meet the
      specification in HS50 mode.  This problem can be mitigated by
      disabling the HISPD bit, thus forcing the controller output data
      to be driven on the falling clock edges rather than the rising
      clock edges.
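
      A hedged sketch of the kind of change this describes (the sdhci core's
      SDHCI_QUIRK_NO_HISPD_BIT quirk is an existing flag; the rest of the
      platform data shown here is elided/assumed rather than quoted from the
      patch):

        static const struct sdhci_pltfm_data sdhci_iproc_pltfm_data = {
                /* adding NO_HISPD keeps the core from setting SDHCI_CTRL_HISPD,
                 * so output data is launched on the falling clock edge */
                .quirks = SDHCI_QUIRK_NO_HISPD_BIT /* | existing quirks */,
                /* .ops and the remaining fields unchanged */
        };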
      
      Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
      the change does not produce merge conflicts backporting to older kernel
      versions. In reality, the timing bug existed since the driver was first
      introduced but there is no need for this driver to be supported in kernel
      versions that old.
      
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Trac Hoang <trac.hoang@broadcom.com>
      Signed-off-by: Scott Branden <scott.branden@broadcom.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      acf49fa4
    • mmc: sdhci-iproc: cygnus: Set NO_HISPD bit to fix HS50 data hold time problem · a0514c0a
      Trac Hoang authored
      commit b7dfa695 upstream.
      
      The iproc host eMMC/SD controller hold time does not meet the
      specification in HS50 mode.  This problem can be mitigated by
      disabling the HISPD bit, thus forcing the controller output data
      to be driven on the falling clock edges rather than the rising
      clock edges.
      
      This change applies only to the Cygnus platform.
      
      Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
      the change does not produce merge conflicts backporting to older kernel
      versions. In reality, the timing bug existed since the driver was first
      introduced but there is no need for this driver to be supported in kernel
      versions that old.
      
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Trac Hoang <trac.hoang@broadcom.com>
      Signed-off-by: Scott Branden <scott.branden@broadcom.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0514c0a
    • crypto: vmx - CTR: always increment IV as quadword · 1860a557
      Daniel Axtens authored
      commit 009b30ac upstream.
      
      The kernel self-tests picked up an issue with CTR mode:
      alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
      
      Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
      after 3 increments it should wrap around to 0.
      
      In the aesp8-ppc code from OpenSSL, there are two paths that
      increment IVs: the bulk (8 at a time) path, and the individual
      path which is used when there are fewer than 8 AES blocks to
      process.
      
      In the bulk path, the IV is incremented with vadduqm: "Vector
      Add Unsigned Quadword Modulo", which does 128-bit addition.
      
      In the individual path, however, the IV is incremented with
      vadduwm: "Vector Add Unsigned Word Modulo", which instead
      does 4 32-bit additions. Thus the IV would instead become
      FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
      
      Use vadduqm.
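
      A small userspace demonstration of the difference (not the aesp8-ppc
      assembly, just the arithmetic on the failing test vector's IV; __int128
      is a GCC/Clang extension):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned __int128 iv128 = ~(unsigned __int128)0 - 2;    /* ...FFFD */
                uint32_t iv32[4] = { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFD };

                for (int i = 0; i < 3; i++) {
                        iv128 += 1;     /* vadduqm-style: carries cross all 128 bits */
                        iv32[3] += 1;   /* vadduwm-style: no carry into the next word */
                }

                printf("128-bit add : %016llx%016llx\n",
                       (unsigned long long)(iv128 >> 64), (unsigned long long)iv128);
                printf("4x32-bit add: %08x%08x%08x%08x\n",
                       iv32[0], iv32[1], iv32[2], iv32[3]);
                return 0;
        }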
      
      This was probably a typo originally, what with q and w being
      adjacent. It is a pretty narrow edge case: I am really
      impressed by the quality of the kernel self-tests!
      
      Fixes: 5c380d62 ("crypto: vmx - Add support for VMS instructions by ASM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Acked-by: Nayna Jain <nayna@linux.ibm.com>
      Tested-by: Nayna Jain <nayna@linux.ibm.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1860a557
    • crypto: hash - fix incorrect HASH_MAX_DESCSIZE · 6920fcd3
      Eric Biggers authored
      commit e1354400 upstream.
      
      The "hmac(sha3-224-generic)" algorithm has a descsize of 368 bytes,
      which is greater than HASH_MAX_DESCSIZE (360), which is only enough
      for sha3-224-generic.  The check in shash_prepare_alg() doesn't catch
      this
      because the HMAC template doesn't set descsize on the algorithms, but
      rather sets it on each individual HMAC transform.
      
      This causes a stack buffer overflow when SHASH_DESC_ON_STACK() is used
      with hmac(sha3-224-generic).
      
      Fix it by increasing HASH_MAX_DESCSIZE to the real maximum.  Also add a
      sanity check to hmac_init().
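
      A hedged sketch of the kind of sanity check described (its exact form
      and placement in crypto/hmac.c are assumptions; SHASH_DESC_ON_STACK()
      and crypto_shash_descsize() are existing crypto API names):

        /* SHASH_DESC_ON_STACK() reserves only HASH_MAX_DESCSIZE bytes of
         * request context on the stack, so a transform whose descsize
         * exceeds that (368 > 360 here) overruns the buffer */
        if (WARN_ON(crypto_shash_descsize(hash) > HASH_MAX_DESCSIZE))
                return -EINVAL;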
      
      This was detected by the improved crypto self-tests in v5.2, by loading
      the tcrypt module with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y enabled.  I
      didn't notice this bug when I ran the self-tests by requesting the
      algorithms via AF_ALG (i.e., not using tcrypt), probably because the
      stack layout differs in the two cases and that made a difference here.
      
      KASAN report:
      
          BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:359 [inline]
          BUG: KASAN: stack-out-of-bounds in shash_default_import+0x52/0x80 crypto/shash.c:223
          Write of size 360 at addr ffff8880651defc8 by task insmod/3689
      
          CPU: 2 PID: 3689 Comm: insmod Tainted: G            E     5.1.0-10741-g35c99ffa #11
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
          Call Trace:
           __dump_stack lib/dump_stack.c:77 [inline]
           dump_stack+0x86/0xc5 lib/dump_stack.c:113
           print_address_description+0x7f/0x260 mm/kasan/report.c:188
           __kasan_report+0x144/0x187 mm/kasan/report.c:317
           kasan_report+0x12/0x20 mm/kasan/common.c:614
           check_memory_region_inline mm/kasan/generic.c:185 [inline]
           check_memory_region+0x137/0x190 mm/kasan/generic.c:191
           memcpy+0x37/0x50 mm/kasan/common.c:125
           memcpy include/linux/string.h:359 [inline]
           shash_default_import+0x52/0x80 crypto/shash.c:223
           crypto_shash_import include/crypto/hash.h:880 [inline]
           hmac_import+0x184/0x240 crypto/hmac.c:102
           hmac_init+0x96/0xc0 crypto/hmac.c:107
           crypto_shash_init include/crypto/hash.h:902 [inline]
           shash_digest_unaligned+0x9f/0xf0 crypto/shash.c:194
           crypto_shash_digest+0xe9/0x1b0 crypto/shash.c:211
           generate_random_hash_testvec.constprop.11+0x1ec/0x5b0 crypto/testmgr.c:1331
           test_hash_vs_generic_impl+0x3f7/0x5c0 crypto/testmgr.c:1420
           __alg_test_hash+0x26d/0x340 crypto/testmgr.c:1502
           alg_test_hash+0x22e/0x330 crypto/testmgr.c:1552
           alg_test.part.7+0x132/0x610 crypto/testmgr.c:4931
           alg_test+0x1f/0x40 crypto/testmgr.c:4952
      
      Fixes: b68a7ec1 ("crypto: hash - Remove VLA usage")
      Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Cc: <stable@vger.kernel.org> # v4.20+
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6920fcd3
    • Revert "scsi: sd: Keep disk read-only when re-reading partition" · 204d5350
      Martin K. Petersen authored
      commit 8acf608e upstream.
      
      This reverts commit 20bd1d02.
      
      This patch introduced regressions for devices that come online in
      read-only state and subsequently switch to read-write.
      
      Given how the partition code is currently implemented it is not
      possible to persist the read-only flag across a device revalidate
      call. This may need to get addressed in the future since it is common
      for user applications to proactively call BLKRRPART.
      
      Reverting this commit will re-introduce a regression where a
      device-initiated revalidate event will cause the admin state to be
      forgotten. A separate patch will address this issue.
      
      Fixes: 20bd1d02 ("scsi: sd: Keep disk read-only when re-reading partition")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      204d5350
    • sbitmap: fix improper use of smp_mb__before_atomic() · 15e5e4b9
      Andrea Parri authored
      commit a0934fd2 upstream.
      
      This barrier only applies to the read-modify-write operations; in
      particular, it does not apply to the atomic_set() primitive.
      
      Replace the barrier with an smp_mb().
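
      An illustrative contrast of the two patterns (not the exact sbitmap
      code; the atomic_t here is a stand-in):

        static atomic_t v = ATOMIC_INIT(0);

        static void broken_reset(void)
        {
                smp_mb__before_atomic();        /* only orders before atomic RMW ops */
                atomic_set(&v, 1);              /* plain store, not read-modify-write */
        }

        static void fixed_reset(void)
        {
                smp_mb();                       /* full barrier: orders any access */
                atomic_set(&v, 1);
        }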
      
      Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize")
      Cc: stable@vger.kernel.org
      Reported-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: linux-block@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      15e5e4b9
    • bio: fix improper use of smp_mb__before_atomic() · 01b5e7f8
      Andrea Parri authored
      commit f381c6a4 upstream.
      
      This barrier only applies to the read-modify-write operations; in
      particular, it does not apply to the atomic_set() primitive.
      
      Replace the barrier with an smp_mb().
      
      Fixes: dac56212 ("bio: skip atomic inc/dec of ->bi_cnt for most use cases")
      Cc: stable@vger.kernel.org
      Reported-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: linux-block@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01b5e7f8
    • x86/kvm/pmu: Set AMD's virt PMU version to 1 · e83d85e7
      Borislav Petkov authored
      commit a80c4ec1 upstream.
      
      After commit:
      
        672ff6cf ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
      
      my AMD guests started #GPing like this:
      
        general protection fault: 0000 [#1] PREEMPT SMP
        CPU: 1 PID: 4355 Comm: bash Not tainted 5.1.0-rc6+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        RIP: 0010:x86_perf_event_update+0x3b/0xa0
      
      with Code: pointing at RDPMC.  It is RDPMC because the guest has the
      perf-based hardware watchdog (CONFIG_HARDLOCKUP_DETECTOR_PERF)
      enabled.  Some instrumentation of kvm_pmu_rdpmc() showed that it fails
      due to:
      
        if (!pmu->version)
        	return 1;
      
      which the above commit added. Since AMD's PMU leaves the version at 0,
      that causes the #GP injection into the guest.
      
      Set pmu->version arbitrarily to 1 and move it above the non-applicable
      struct kvm_pmu members.
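
      A hedged sketch of the change described (callback and helper names as
      in KVM's AMD PMU code; the surrounding counter setup is elided):

        static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
        {
                struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

                /* AMD exposes no architectural PMU version; advertise 1 so the
                 * !pmu->version check in kvm_pmu_rdpmc() stops injecting #GP */
                pmu->version = 1;
                /* ...existing gp-counter setup unchanged... */
        }
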
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Mihai Carabas <mihai.carabas@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 672ff6cf ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e83d85e7
    • KVM: x86: fix return value for reserved EFER · d7a74fba
      Paolo Bonzini authored
      commit 66f61c92 upstream.
      
      Commit 11988499 ("KVM: x86: Skip EFER vs. guest CPUID checks for
      host-initiated writes", 2019-04-02) introduced a "return false" in a
      function returning int; in any case set_efer() has a "nonzero on
      error" convention, so it should return 1.
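
      A minimal sketch of the corrected reserved-bits path (the surrounding
      set_efer() code is assumed, not quoted):

        if (efer & efer_reserved_bits)
                return 1;       /* nonzero on error, not "false" */
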
      Reported-by: Pavel Machek <pavel@denx.de>
      Fixes: 11988499 ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7a74fba
    • ext4: wait for outstanding dio during truncate in nojournal mode · 824adfb2
      Jan Kara authored
      commit 82a25b02 upstream.
      
      We didn't wait for outstanding direct IO during truncate in nojournal
      mode (as we skip orphan handling in that case). This can lead to fs
      corruption or stale data exposure if truncate ends up freeing blocks
      and these get reallocated before direct IO finishes. Fix the condition
      determining whether the wait is necessary.
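
      A hedged sketch of the idea, not the verbatim ext4_setattr() change:
      the wait must not depend on orphan handling, which nojournal mode
      skips entirely.

        /* drain outstanding direct IO before truncate can free blocks that
         * might be reallocated, regardless of orphan handling */
        if (attr->ia_size < inode->i_size)
                inode_dio_wait(inode);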
      
      CC: stable@vger.kernel.org
      Fixes: 1c9114f9 ("ext4: serialize unlocked dio reads with truncate")
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      824adfb2
    • ext4: do not delete unlinked inode from orphan list on failed truncate · 5f2e67d3
      Jan Kara authored
      commit ee0ed02c upstream.
      
      It is possible for an unlinked inode to enter ext4_setattr() (e.g. if
      somebody calls ftruncate(2) on an unlinked but still open file). In
      such a case we should not delete the inode from the orphan list if the
      truncate fails. Note that this is mostly a theoretical concern, as the
      filesystem is corrupted if we reach this path anyway, but let's be
      consistent in our orphan handling.
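
      A hedged sketch of the described rule on the failure path
      (ext4_orphan_del() is the real helper; the surrounding error-path code
      is assumed):

        /* drop the orphan record only for inodes that are still linked;
         * an unlinked inode must stay on the orphan list */
        if (orphan && inode->i_nlink)
                ext4_orphan_del(NULL, inode);
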
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f2e67d3
    • x86: Hide the int3_emulate_call/jmp functions from UML · 680ae6ba
      Steven Rostedt (VMware) authored
      commit 693713cb upstream.
      
      User Mode Linux does not have access to the ip or sp fields of the pt_regs,
      and accessing them causes UML to fail to build. Hide the int3_emulate_jmp()
      and int3_emulate_call() functions from UML, as it doesn't need them
      anyway.
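
      A hedged sketch of the approach (the guarding Kconfig symbol here is
      an assumption; the helper body shown is the trivial pt_regs update
      these functions perform):

        #ifndef CONFIG_UML_X86
        static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
        {
                regs->ip = ip;
        }
        /* ...int3_emulate_push() and int3_emulate_call() guarded likewise... */
        #endif
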
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      680ae6ba
  2. 25 May, 2019 22 commits