1. 22 Jul, 2019 3 commits
    • arm64: vdso: Cleanup Makefiles · a88754b2
      Vincenzo Frascino authored
      The recent changes to the vdso library for arm64 and the introduction of
      the compat vdso library have generated some misalignment in the
      Makefiles.
      
      Clean up the Makefiles for the vdso and vdso32 libraries by:
        * Removing unused rules.
        * Unifying the displayed compilation messages.
        * Simplifying the generic library inclusion path for the
          arm64 vdso.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: vdso: fix flip/flop vdso build bug · 2e2f3c9b
      Naohiro Aota authored
      Running "make" on an already compiled kernel tree will rebuild the kernel
      even without any modifications:
      
      $ make ARCH=arm64 CROSS_COMPILE=/usr/bin/aarch64-unknown-linux-gnu-
      arch/arm64/Makefile:58: CROSS_COMPILE_COMPAT not defined or empty, the compat vDSO will not be built
        CALL    scripts/checksyscalls.sh
        CALL    scripts/atomic/check-atomics.sh
        VDSOCHK arch/arm64/kernel/vdso/vdso.so.dbg
        VDSOSYM include/generated/vdso-offsets.h
        CHK     include/generated/compile.h
        CC      arch/arm64/kernel/signal.o
        CC      arch/arm64/kernel/vdso.o
        CC      arch/arm64/kernel/signal32.o
        LD      arch/arm64/kernel/vdso/vdso.so.dbg
        OBJCOPY arch/arm64/kernel/vdso/vdso.so
        AS      arch/arm64/kernel/vdso/vdso.o
        AR      arch/arm64/kernel/vdso/built-in.a
        AR      arch/arm64/kernel/built-in.a
        GEN     .version
        CHK     include/generated/compile.h
        UPD     include/generated/compile.h
        CC      init/version.o
        AR      init/built-in.a
        LD      vmlinux.o
      
      This is the same bug fixed in commit 92a47286 ("x86/boot: Fix
      if_changed build flip/flop bug"): "if_changed" cannot be used twice
      for the same target. Fix this build bug by merging the two commands
      into one function.
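Why two if_changed calls on one target make the build flip-flop: kbuild records the command used to build each target in a .cmd file, and if_changed rebuilds whenever the current command differs from the recorded one. The C sketch below is a model of that rule, not kbuild itself; `struct target`, `if_changed_model()` and the command strings are hypothetical placeholders.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: a target remembers only the last command that
 * built it, like the single saved command in a kbuild .cmd file. */
struct target {
	char saved_cmd[128];
};

/* Model of if_changed: rebuild (return true) if cmd differs from the
 * saved command, then record cmd as the new saved command. */
static bool if_changed_model(struct target *t, const char *cmd)
{
	bool rebuild = strcmp(t->saved_cmd, cmd) != 0;

	if (rebuild) {
		strncpy(t->saved_cmd, cmd, sizeof(t->saved_cmd) - 1);
		t->saved_cmd[sizeof(t->saved_cmd) - 1] = '\0';
	}
	return rebuild;
}

/* One "make" pass with two if_changed calls on the same target: each
 * call overwrites the record the other one compares against. */
static bool make_pass_two_rules(struct target *t)
{
	bool a = if_changed_model(t, "command A");
	bool b = if_changed_model(t, "command B");

	return a || b;
}

/* The shape of the fix: a single merged command per target. */
static bool make_pass_merged_rule(struct target *t)
{
	return if_changed_model(t, "command A; command B");
}
```

With two rules every pass reports a rebuild, even when nothing changed; with the merged rule the second pass is a no-op, which is the behaviour the fix restores.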
      
      Fixes: a7f71a2c ("arm64: compat: Add vDSO")
      Fixes: 28b1a824 ("arm64: vdso: Substitute gettimeofday() with C implementation")
      Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      [will: merged in compat fix from Vincenzo and made rule names consistent]
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: vdso: Fix population of AT_SYSINFO_EHDR for compat vdso · 85751e9e
      Vincenzo Frascino authored
      Prior to the introduction of Unified vDSO support and compat layer for
      vDSO on arm64, AT_SYSINFO_EHDR was not defined for compat tasks.
      In the current implementation, AT_SYSINFO_EHDR is defined even if the
      compat vdso layer is not built, which has been shown to break Android
      applications using bionic:
      
       | 01-01 01:22:14.097   755   755 F libc    : Fatal signal 11 (SIGSEGV),
       | code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 755 (cameraserver),
       | pid 755 (cameraserver)
       | 01-01 01:22:14.112   759   759 F libc    : Fatal signal 11 (SIGSEGV),
       | code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 759
       | (android.hardwar), pid 759 (android.hardwar)
       | 01-01 01:22:14.120   756   756 F libc    : Fatal signal 11 (SIGSEGV)
       | code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 756 (drmserver),
       | pid 756 (drmserver)
      
      Restore the old behaviour by making sure that AT_SYSINFO_EHDR for compat
      tasks is defined only when CONFIG_COMPAT_VDSO is enabled.
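Userspace locates the vDSO through the AT_SYSINFO_EHDR auxiliary-vector entry, so advertising that entry without a mapped vDSO hands libraries an address they will trust and dereference. A minimal sketch, assuming glibc's getauxval() from <sys/auxv.h> (bionic offers the same function); the helpers `vdso_base()` and `vdso_has_elf_magic()` are hypothetical, and this is not how bionic itself parses the vDSO.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO ELF header advertised via AT_SYSINFO_EHDR, or NULL if
 * the entry is absent (getauxval() returns 0 in that case). A non-zero
 * value is taken at face value by callers, which is why the kernel must
 * not publish it when no vDSO is actually mapped. */
static const void *vdso_base(void)
{
	unsigned long base = getauxval(AT_SYSINFO_EHDR);

	return base ? (const void *)base : NULL;
}

/* Sanity check: a real vDSO image starts with the ELF magic bytes. */
static int vdso_has_elf_magic(void)
{
	const unsigned char *p = vdso_base();

	return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```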
      Reported-by: John Stultz <john.stultz@linaro.org>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
  2. 21 Jul, 2019 15 commits
    • Linux 5.3-rc1 · 5f9e832c
      Linus Torvalds authored
    • Merge tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · c7bf0a0f
      Linus Torvalds authored
      Pull Devicetree fixes from Rob Herring:
       "Fix several warnings/errors in validation of binding schemas"
      
      * tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples
        dt-bindings: iio: ad7124: Fix dtc warnings in example
        dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example
        dt-bindings: pinctrl: aspeed: Fix AST2500 example errors
        dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors
        dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes
        dt-bindings: Ensure child nodes are of type 'object'
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d6788eb7
      Linus Torvalds authored
      Pull vfs documentation typo fix from Al Viro.
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        typo fix: it's d_make_root, not d_make_inode...
    • Merge tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 91962d0f
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Two fixes for stable, one that had dependency on earlier patch in this
        merge window and can now go in, and a perf improvement in SMB3 open"
      
      * tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module number
        cifs: flush before set-info if we have writeable handles
        smb3: optimize open to not send query file internal info
        cifs: copy_file_range needs to strip setuid bits and update timestamps
        CIFS: fix deadlock in cached root handling
    • iommu/amd: fix a crash in iova_magazine_free_pfns · 8cf66504
      Qian Cai authored
      Commit b3aa14f0 ("iommu: remove the mapping_error dma_map_ops
      method") incorrectly changed the check of dma_ops_alloc_iova()'s
      return value in map_sg(). dma_ops_alloc_iova() never returns
      DMA_MAPPING_ERROR on failure; it returns 0, so the error handling is
      wrong, which causes a crash under memory pressure:
      
         kernel BUG at drivers/iommu/iova.c:801!
          Workqueue: kblockd blk_mq_run_work_fn
          RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
          Call Trace:
           free_cpu_cached_iovas+0xbd/0x150
           alloc_iova_fast+0x8c/0xba
           dma_ops_alloc_iova.isra.6+0x65/0xa0
           map_sg+0x8c/0x2a0
           scsi_dma_map+0xc6/0x160
           pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
           pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
           scsi_queue_rq+0x79c/0x1200
           blk_mq_dispatch_rq_list+0x4dc/0xb70
           blk_mq_sched_dispatch_requests+0x249/0x310
           __blk_mq_run_hw_queue+0x128/0x200
           blk_mq_run_work_fn+0x27/0x30
           process_one_work+0x522/0xa10
           worker_thread+0x63/0x5b0
           kthread+0x1d2/0x1f0
           ret_from_fork+0x22/0x40
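The mismatch can be reduced to a few lines. A simplified userspace sketch, assuming only what the message states: the allocator signals failure with 0, while DMA_MAPPING_ERROR is the all-ones address (~(dma_addr_t)0, as in include/linux/dma-mapping.h). `fake_alloc_iova()` and the local `dma_addr_t` typedef are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;               /* userspace stand-in */
#define DMA_MAPPING_ERROR (~(dma_addr_t)0)  /* all-ones sentinel */

/* Hypothetical allocator that, like dma_ops_alloc_iova(), reports
 * failure by returning 0 rather than DMA_MAPPING_ERROR. */
static dma_addr_t fake_alloc_iova(bool out_of_space)
{
	return out_of_space ? 0 : 0x100000;
}

/* Buggy check: compares against a value this allocator never returns,
 * so a failed allocation (0) is treated as a valid address. */
static bool alloc_failed_buggy(dma_addr_t addr)
{
	return addr == DMA_MAPPING_ERROR;
}

/* Check that matches this allocator's failure convention. */
static bool alloc_failed_fixed(dma_addr_t addr)
{
	return addr == 0;
}
```

With the buggy check the failure path is never taken, so map_sg() carries on with an invalid IOVA instead of bailing out.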
      
      Fixes: b3aa14f0 ("iommu: remove the mapping_error dma_map_ops method")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hexagon: switch to generic version of pte allocation · 618381f0
      Mike Rapoport authored
      The hexagon implementations of pte_alloc_one(), pte_alloc_one_kernel(),
      pte_free_kernel() and pte_free() are identical to the generic ones,
      except that they lack __GFP_ACCOUNT for user PTE allocations.

      Switch hexagon to use the generic versions of these functions.

      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'ntb-5.3' of git://github.com/jonmason/ntb · bec5545e
      Linus Torvalds authored
      Pull NTB updates from Jon Mason:
       "New feature to add support for NTB virtual MSI interrupts, the ability
        to test and use this feature in the NTB transport layer.
      
        Also, bug fixes for the AMD and Switchtec drivers, as well as some
        general patches"
      
      * tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
        NTB: Describe the ntb_msi_test client in the documentation.
        NTB: Add MSI interrupt support to ntb_transport
        NTB: Add ntb_msi_test support to ntb_test
        NTB: Introduce NTB MSI Test Client
        NTB: Introduce MSI library
        NTB: Rename ntb.c to support multiple source files in the module
        NTB: Introduce functions to calculate multi-port resource index
        NTB: Introduce helper functions to calculate logical port number
        PCI/switchtec: Add module parameter to request more interrupts
        PCI/MSI: Support allocating virtual MSI interrupts
        ntb_hw_switchtec: Fix setup MW with failure bug
        ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
        ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
        NTB: correct ntb_dev_ops and ntb_dev comment typos
        NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
        ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
        NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
        NTB: ntb_hw_amd: set peer limit register
        NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
        NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
        ...
    • typo fix: it's d_make_root, not d_make_inode... · 1b03bc5c
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples · e2297f7c
      Rob Herring authored
      Now that examples are validated against the DT schema, an error about
      the missing required 'clocks' property is exposed:
      
      Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
      pinctrl@40020000: gpio@0: 'clocks' is a required property
      Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
      pinctrl@50020000: gpio@1000: 'clocks' is a required property
      Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
      pinctrl@50020000: gpio@2000: 'clocks' is a required property
      
      Add the missing 'clocks' properties to the examples to fix the errors.
      
      Fixes: 2c9239c1 ("dt-bindings: pinctrl: Convert stm32 pinctrl bindings to json-schema")
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: linux-gpio@vger.kernel.org
      Cc: linux-stm32@st-md-mailman.stormreply.com
      Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: iio: ad7124: Fix dtc warnings in example · 20051f5f
      Rob Herring authored
      With the conversion to DT schema, the examples are now compiled with
      dtc. The ad7124 binding example has the following warning:
      
      Documentation/devicetree/bindings/iio/adc/adi,ad7124.example.dts:19.11-21: \
      Warning (reg_format): /example-0/adc@0:reg: property has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1)
      
      Examples default to #address-cells and #size-cells values of 1.
      Examples that need different values, such as this one on an SPI bus,
      must provide an SPI bus parent node.
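The warning follows from simple arithmetic: a `reg` property must hold a whole number of entries, each `#address-cells + #size-cells` 32-bit cells long. A simplified sketch of that length check (modelled on the condition dtc's reg_format check reports, not dtc's source); `reg_length_ok()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Each cell in a devicetree property is one 32-bit value. */
#define CELL_SIZE sizeof(uint32_t)

/* Return true if a 'reg' property of reg_len bytes splits into whole
 * (address, size) entries for the parent's cell counts. */
static bool reg_length_ok(size_t reg_len, unsigned addr_cells,
			  unsigned size_cells)
{
	size_t entry_len = (size_t)(addr_cells + size_cells) * CELL_SIZE;

	return entry_len != 0 && reg_len % entry_len == 0;
}
```

A one-cell `reg` (4 bytes) fails against the example defaults of 1 and 1 (8-byte entries), matching the warning above, but passes under an SPI bus parent that sets #address-cells = <1> and #size-cells = <0>.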
      
      Fixes: 26ae15e6 ("Convert AD7124 bindings documentation to YAML format.")
      
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: linux-iio@vger.kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example · fbbf2b6e
      Rob Herring authored
      Now that examples are validated against the DT schema, a typo in
      avia-hx711 example generates a warning:
      
      Documentation/devicetree/bindings/iio/adc/avia-hx711.example.dt.yaml: weight: 'avdd-supply' is a required property
      
      Fix the typo.
      
      Fixes: 5150ec3f ("avia-hx711.yaml: transform DT binding to YAML")
      Cc: Andreas Klinger <ak@it-klinger.de>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: linux-iio@vger.kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: pinctrl: aspeed: Fix AST2500 example errors · fcbe7e3c
      Rob Herring authored
      The schema examples are now validated against the schema itself. The
      AST2500 pinctrl schema has a couple of errors:
      
      Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \
      example-0: $nodename:0: 'example-0' does not match '^(bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'
      Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \
      pinctrl: aspeed,external-nodes: [[1, 2]] is too short
      
      Fixes: 0a617de1 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema")
      Cc: Andrew Jeffery <andrew@aj.id.au>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: linux-aspeed@lists.ozlabs.org
      Cc: linux-gpio@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Acked-by: Andrew Jeffery <andrew@aj.id.au>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors · ad21a4ce
      Rob Herring authored
      The Aspeed pinctrl schemas have errors in the 'compatible' schema:
      
      Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.yaml: \
      properties:compatible:enum: ['aspeed', 'ast2400-pinctrl', 'aspeed', 'g4-pinctrl'] has non-unique elements
      Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.yaml: \
      properties:compatible:enum: ['aspeed', 'ast2500-pinctrl', 'aspeed', 'g5-pinctrl'] has non-unique elements
      
      Flow-style sequences have to be quoted if the values contain ','. Fix
      this by using the more common one-entry-per-line formatting.
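Why the unquoted entries turn into duplicates: inside a YAML flow sequence (`[a, b]`) a comma separates items, so an unquoted `aspeed,ast2400-pinctrl` is read as two items. The toy splitter below mimics only that one rule for illustration; it is not a YAML parser, and `split_flow_items()` and `has_duplicates()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 16
#define MAX_LEN 64

/* Split the body of an unquoted flow sequence on ',' and trim spaces,
 * as a YAML parser treats ',' inside [ ... ]. Returns the item count. */
static size_t split_flow_items(const char *body, char out[][MAX_LEN],
			       size_t max_items)
{
	size_t n = 0;

	while (*body != '\0' && n < max_items) {
		const char *end = strchr(body, ',');
		size_t len = end ? (size_t)(end - body) : strlen(body);

		while (len > 0 && *body == ' ') {	/* leading spaces */
			body++;
			len--;
		}
		while (len > 0 && body[len - 1] == ' ')	/* trailing spaces */
			len--;
		if (len >= MAX_LEN)
			len = MAX_LEN - 1;
		memcpy(out[n], body, len);
		out[n][len] = '\0';
		n++;
		if (!end)
			break;
		body = end + 1;
	}
	return n;
}

/* True if any two items are equal, i.e. the enum has non-unique elements. */
static bool has_duplicates(char items[][MAX_LEN], size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (strcmp(items[i], items[j]) == 0)
				return true;
	return false;
}
```

Splitting `aspeed,ast2400-pinctrl, aspeed,g4-pinctrl` this way yields four items with `aspeed` twice, the same list the schema checker printed. In block style (one `- ` entry per line) the comma stays inside a single scalar.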
      
      Fixes: 0a617de1 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema")
      Fixes: 07457937 ("dt-bindings: pinctrl: aspeed: Convert AST2400 bindings to json-schema")
      Cc: Andrew Jeffery <andrew@aj.id.au>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: linux-aspeed@lists.ozlabs.org
      Cc: linux-gpio@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Acked-by: Andrew Jeffery <andrew@aj.id.au>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes · 7d9ef7f3
      Rob Herring authored
      Matching on the 'cpus' node was a bad choice because the schema is
      incorrectly applied to non-RiscV cpus nodes. As we now have a common cpus
      schema which checks the general structure, it is also redundant to do so
      in the Risc-V CPU schema.
      
      The downside is that one could conceivably mix different architectures'
      cpu nodes or have typos in the compatible string. The latter problem
      exists for pretty much every schema.

      Acked-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: Ensure child nodes are of type 'object' · 15ffef1a
      Rob Herring authored
      Properties which are child node definitions need to have an explicit
      type. Otherwise, a matching (DT) property can silently match when an
      error is desired. Fix this up tree-wide. Once this is fixed, the
      meta-schema will enforce this on any child node definitions.
      
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Marek Vasut <marek.vasut@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Vignesh Raghavendra <vigneshr@ti.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: linux-mtd@lists.infradead.org
      Cc: linux-gpio@vger.kernel.org
      Cc: linux-stm32@st-md-mailman.stormreply.com
      Cc: linux-spi@vger.kernel.org
      Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
      Acked-by: Mark Brown <broonie@kernel.org>
      Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
  3. 20 Jul, 2019 22 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · f1a3b43c
      Linus Torvalds authored
      Pull more input updates from Dmitry Torokhov:
      
       - Apple SPI keyboard and trackpad driver for newer Macs
      
       - ALPS driver will ignore trackpoint-only devices to give the
         trackpoint driver a chance to handle them properly
      
       - another Lenovo is switched over to SMbus from PS/2
      
       - assorted driver fixups.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: alps - fix a mismatch between a condition check and its comment
        Input: psmouse - fix build error of multiple definition
        Input: applespi - remove set but not used variables 'sts'
        Input: add Apple SPI keyboard and trackpad driver
        Input: alps - don't handle ALPS cs19 trackpoint-only device
        Input: hyperv-keyboard - remove dependencies on PAGE_SIZE for ring buffer
        Input: adp5589 - initialize GPIO controller parent device
        Input: iforce - remove empty multiline comments
        Input: synaptics - fix misuse of strlcpy
        Input: auo-pixcir-ts - switch to using  devm_add_action_or_reset()
        Input: gtco - bounds check collection indent level
        Input: mtk-pmic-keys - add of_node_put() before return
        Input: sun4i-lradc-keys - add of_node_put() before return
        Input: synaptics - whitelist Lenovo T580 SMBus intertouch
    • Merge tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping · ac60602a
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
       "Fix various regressions:
      
         - force unencrypted dma-coherent buffers if encryption bit can't fit
           into the dma coherent mask (Tom Lendacky)
      
         - avoid limiting request size if swiotlb is not used (me)
      
         - fix swiotlb handling in dma_direct_sync_sg_for_cpu/device (Fugang
           Duan)"
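The first item turns on a simple mask comparison. Under SME the encryption bit is part of the physical address, so a device whose DMA mask cannot reach that bit cannot address encrypted memory, and its coherent buffers must be left unencrypted. A sketch of that condition under this stated assumption, not the kernel's implementation; `must_force_unencrypted()` is hypothetical, and the DMA_BIT_MASK() definition matches the kernel macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* c_bit is the position of the encryption bit in physical addresses
 * (must be <= 64). If the device's mask covers only the bits below it,
 * the device can never set the encryption bit, so it needs unencrypted
 * buffers. */
static bool must_force_unencrypted(uint64_t dev_dma_mask, unsigned c_bit)
{
	return dev_dma_mask <= DMA_BIT_MASK(c_bit);
}
```

With an illustrative encryption bit at position 47, a 32-bit or 47-bit device mask forces unencrypted buffers, while a 48-bit or 64-bit mask does not.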
      
      * tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping:
        dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device
        dma-direct: only limit the mapping size if swiotlb could be used
        dma-mapping: add a dma_addressing_limited helper
        dma-direct: Force unencrypted DMA under SME for certain DMA masks
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c6dd78fc
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of x86 specific fixes and updates:
      
         - The CR2 corruption fixes which store CR2 early in the entry code
           and hand the stored address to the fault handlers.
      
         - Revert a forgotten leftover of the dropped FSGSBASE series.
      
         - Plug a memory leak in the boot code.
      
         - Make the Hyper-V assist functionality robust by zeroing the shadow
           page.
      
         - Remove a useless check for dead processes with LDT
      
         - Update paravirt and VMware maintainers entries.
      
         - A few cleanup patches addressing various compiler warnings"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry/64: Prevent clobbering of saved CR2 value
        x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
        x86, boot: Remove multiple copy of static function sanitize_boot_params()
        x86/boot/compressed/64: Remove unused variable
        x86/boot/efi: Remove unused variables
        x86/mm, tracing: Fix CR2 corruption
        x86/entry/64: Update comments and sanity tests for create_gap
        x86/entry/64: Simplify idtentry a little
        x86/entry/32: Simplify common_exception
        x86/paravirt: Make read_cr2() CALLEE_SAVE
        MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
        x86/process: Delete useless check for dead process with LDT
        x86: math-emu: Hide clang warnings for 16-bit overflow
        x86/e820: Use proper booleans instead of 0/1
        x86/apic: Silence -Wtype-limits compiler warnings
        x86/mm: Free sme_early_buffer after init
        x86/boot: Fix memory leak in default_get_smp_config()
        Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 46f5c0cc
      Linus Torvalds authored
      Pull perf tooling updates from Thomas Gleixner:
       "A set of perf improvements and fixes:
      
        perf db-export:
         - Improvements in how COMM details are exported to databases for post
           processing and use in the sql-viewer.py UI.
      
         - Export switch events to the database.
      
        BPF:
         - Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just
           like selftests/bpf/bpf_rlimit.h does. This makes errors caused by
           exhausting this limit, which are rather cryptic (sometimes EPERM),
           less frequent.
      
        perf version:
         - Fix segfault due to missing OPT_END(), noticed on PowerPC.
      
        perf vendor events:
         - Add JSON files for IBM s/390 machine type 8561.
      
        perf cs-etm (ARM):
         - Fix two cases of error returns not being done properly: invalid
           ERR_PTR() use and loss of propagated error codes"
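The rlimit(MEMLOCK) item relies on the fact that BPF maps and programs were charged against RLIMIT_MEMLOCK, so exhausting it surfaced as an opaque EPERM. A minimal sketch using the POSIX getrlimit()/setrlimit() calls; it only raises the soft limit to the existing hard limit, which an unprivileged process is always allowed to do. This is an illustration, not perf's code, and `bump_memlock_soft_limit()` is a hypothetical name.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_MEMLOCK to the current hard limit.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int bump_memlock_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == rl.rlim_max)
		return 0;	/* already at the ceiling */
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

Raising the hard limit itself requires CAP_SYS_RESOURCE, so a tool that wants more than the hard limit has to handle setrlimit() failing.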
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
        perf version: Fix segfault due to missing OPT_END()
        perf vendor events s390: Add JSON files for machine type 8561
        perf cs-etm: Return errcode in cs_etm__process_auxtrace_info()
        perf cs-etm: Remove errnoeous ERR_PTR() usage in cs_etm__process_auxtrace_info
        perf scripts python: export-to-postgresql.py: Export switch events
        perf scripts python: export-to-sqlite.py: Export switch events
        perf db-export: Export switch events
        perf db-export: Factor out db_export__threads()
        perf script: Add scripting operation process_switch()
        perf scripts python: exported-sql-viewer.py: Use new 'has_calls' column
        perf scripts python: exported-sql-viewer.py: Remove redundant semi-colons
        perf scripts python: export-to-postgresql.py: Add has_calls column to comms table
        perf scripts python: export-to-sqlite.py: Add has_calls column to comms table
        perf db-export: Also export thread's current comm
        perf db-export: Factor out db_export__comm()
        perf scripts python: export-to-postgresql.py: Export comm details
        perf scripts python: export-to-sqlite.py: Export comm details
        perf db-export: Export comm details
        perf db-export: Fix a white space issue in db_export__sample()
        perf db-export: Move export__comm_thread into db_export__sample()
        ...
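The "missing OPT_END()" segfault is the classic sentinel bug: perf's option tables are arrays terminated by an OPT_END() entry, and the parser walks the array until it reaches that terminator. The sketch shows the same pattern with a hypothetical `struct opt`, `OPT_BOOL_SKETCH()` and `OPT_END_SKETCH()`; it is not perf's parse-options API.

```c
#include <stddef.h>

/* Hypothetical option descriptor; OPT_TYPE_END marks the end of a table. */
enum opt_type { OPT_TYPE_END = 0, OPT_TYPE_BOOL };

struct opt {
	enum opt_type type;
	const char *long_name;
};

#define OPT_BOOL_SKETCH(name) { OPT_TYPE_BOOL, (name) }
#define OPT_END_SKETCH()      { OPT_TYPE_END, NULL }

/* Walk the table until the sentinel. Without the sentinel this loop
 * reads past the end of the array, which is undefined behaviour and can
 * show up as a segfault on some platforms but not others. */
static size_t count_options(const struct opt *opts)
{
	size_t n = 0;

	for (; opts->type != OPT_TYPE_END; opts++)
		n++;
	return n;
}

static const struct opt example_opts[] = {
	OPT_BOOL_SKETCH("example-flag"),	/* hypothetical option */
	OPT_END_SKETCH(),			/* the required terminator */
};
```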
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e6023adc
      Linus Torvalds authored
      Pull core fixes from Thomas Gleixner:
      
       - A collection of objtool fixes which address recent fallout partially
         exposed by newer toolchains, clang, BPF and general code changes.
      
       - Force USER_DS for user stack traces
      
      [ Note: the "objtool fixes" are not all to objtool itself, but for
        kernel code that triggers objtool warnings.
      
        Things like missing function size annotations, or code that confuses
        the unwinder etc.   - Linus]
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
        objtool: Support conditional retpolines
        objtool: Convert insn type to enum
        objtool: Fix seg fault on bad switch table entry
        objtool: Support repeated uses of the same C jump table
        objtool: Refactor jump table code
        objtool: Refactor sibling call detection logic
        objtool: Do frame pointer check before dead end check
        objtool: Change dead_end_function() to return boolean
        objtool: Warn on zero-length functions
        objtool: Refactor function alias logic
        objtool: Track original function across branches
        objtool: Add mcsafe_handle_tail() to the uaccess safe list
        bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
        x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
        x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
        x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
        x86/head/64: Annotate start_cpu0() as non-callable
        x86/entry: Fix thunk function ELF sizes
        x86/kvm: Don't call kvm_spurious_fault() from .fixup
        x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
        ...
    • Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b01f5a4
      Linus Torvalds authored
      Pull smp fix from Thomas Gleixner:
       "Add warnings to the smp function calls so callers from wrong contexts
        get detected"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp: Warn on function calls from softirq context
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 70e6e1b9
      Linus Torvalds authored
      Pull CONFIG_PREEMPT_RT stub config from Thomas Gleixner:
       "The real-time preemption patch set exists for almost 15 years now and
        while the vast majority of infrastructure and enhancements have found
        their way into the mainline kernel, the final integration of RT is
        still missing.
      
        Over the course of the last few years, we have worked on reducing the
        intrusiveness of the RT patches by refactoring kernel infrastructure
        to be more real-time friendly. Almost all of these changes were
        beneficial to the mainline kernel on their own, so there was no
        objection to integrating them.
      
        However, except for the still ongoing printk refactoring, the
        remaining changes which are required to make RT a first-class
        mainline citizen can no longer be argued to be immediately
        beneficial for the mainline kernel. Most of them either reorder
        code flows or add RT-specific functionality.
      
        But this has now hit a wall and turned into a classic chicken-and-egg
        problem:
      
           Maintainers are rightfully wary of these changes, as they only
           make sense if the final integration of RT into the mainline
           kernel takes place.
      
        Adding CONFIG_PREEMPT_RT aims to solve this as a clear sign that RT
        will be fully integrated into the mainline kernel. The final
        integration of the missing bits and pieces will be of course done with
        the same careful approach as we have used in the past.
      
        While I'm aware that you are not entirely enthusiastic about that, I
        think that RT should receive the same treatment as any other widely
        used out of tree functionality, which we have accepted into mainline
        over the years.
      
        RT has become the de-facto standard real-time enhancement and is
        shipped by enterprise, embedded and community distros. It's in use
        throughout a wide range of industries: telecommunications, industrial
        automation, professional audio, medical devices, data acquisition,
        automotive - just to name a few major use cases.
      
        RT development is backed by a Linux Foundation project which is
        supported by major stakeholders of this technology. The funding will
        continue beyond the actual inclusion into mainline to make sure that
        the functionality is neither introducing regressions, regressing
        itself, nor becoming subject to bitrot. There is also a lively user
        community around RT, so contrary to the grim situation 5 years ago,
        it's a healthy project.
      
        As RT is still a good vehicle to exercise rarely used code paths and
        to detect hard to trigger issues, you could at least view it as a QA
        tool if nothing else"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 07ab9d5b
      Linus Torvalds authored
      Pull more KVM updates from Paolo Bonzini:
       "Mostly bugfixes, but also:
      
         - s390 support for KVM selftests
      
         - LAPIC timer offloading to housekeeping CPUs
      
         - Extend an s390 optimization for overcommitted hosts to all
           architectures
      
         - Debugging cleanups and improvements"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
        KVM: x86: Add fixed counters to PMU filter
        KVM: nVMX: do not use dangling shadow VMCS after guest reset
        KVM: VMX: dump VMCS on failed entry
        KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
        KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
        KVM: Boost vCPUs that are delivering interrupts
        KVM: selftests: Remove superfluous define from vmx.c
        KVM: SVM: Fix detection of AMD Errata 1096
        KVM: LAPIC: Inject timer interrupt via posted interrupt
        KVM: LAPIC: Make lapic timer unpinned
        KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
        KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
        kvm: x86: ioapic and apic debug macros cleanup
        kvm: x86: some tsc debug cleanup
        kvm: vmx: fix coccinelle warnings
        x86: kvm: avoid constant-conversion warning
        x86: kvm: avoid -Wsometimes-uninitized warning
        KVM: x86: expose AVX512_BF16 feature to guest
        KVM: selftests: enable pgste option for the linker on s390
        KVM: selftests: Move kvm_create_max_vcpus test to generic code
        ...
    • Linus Torvalds
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f65420df
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is the final round of mostly small fixes in our initial submit.
      
        It's mostly minor fixes and driver updates. The only change of note is
        adding a virt_boundary_mask to the SCSI host and host template to
        parametrise this for NVMe devices instead of having them do a call in
        slave_alloc. It's a fairly straightforward conversion except in the
        two NVMe handling drivers that didn't set it, which now have a virtual
        infinity parameter added"
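
      As a rough illustration of what a virtual boundary constrains, here is a
      userspace sketch of the gap rule commonly associated with a virt boundary
      mask in the block layer: two adjacent scatter/gather segments can sit in
      one request only if the previous segment ends on the boundary and the
      next one starts on it. The `struct seg` type and `segments_have_gap()`
      helper are hypothetical simplifications, not kernel APIs, and the 4 KiB
      mask used below is only an example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified scatter/gather segment: a byte address and length. */
struct seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Return true if placing @next right after @prev would violate the virtual
 * boundary described by @mask (e.g. 0xfff for a 4 KiB boundary): either the
 * previous segment does not end on the boundary, or the next one does not
 * start on it. A mask of 0 means "no virtual boundary constraint".
 */
static bool segments_have_gap(const struct seg *prev, const struct seg *next,
			      uint64_t mask)
{
	assert(prev->len > 0);
	if (mask == 0)
		return false;
	return ((prev->addr + prev->len) & mask) != 0 || (next->addr & mask) != 0;
}
```

      For example, a segment ending at 0x2000 followed by one starting at
      0x5000 is fine with a 0xfff mask, while a segment ending at 0x5200 is
      not, because it stops in the middle of a 4 KiB unit.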
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
        scsi: megaraid_sas: set an unlimited max_segment_size
        scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
        scsi: IB/srp: set virt_boundary_mask in the scsi host
        scsi: IB/iser: set virt_boundary_mask in the scsi host
        scsi: storvsc: set virt_boundary_mask in the scsi host template
        scsi: ufshcd: set max_segment_size in the scsi host template
        scsi: core: take the DMA max mapping size into account
        scsi: core: add a host / host template field for the virt boundary
        scsi: core: Fix race on creating sense cache
        scsi: sd_zbc: Fix compilation warning
        scsi: libfc: fix null pointer dereference on a null lport
        scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
        scsi: zfcp: fix request object use-after-free in send path causing wrong traces
        scsi: zfcp: fix request object use-after-free in send path causing seqno errors
        scsi: megaraid_sas: Update driver version to 07.710.50.00
        scsi: megaraid_sas: Add module parameter for FW Async event logging
        scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers
        scsi: megaraid_sas: Fix calculation of target ID
        scsi: lpfc: reduce stack size with CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE
        scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade
        ...
    • Linus Torvalds
      Merge tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 168c7997
      Linus Torvalds authored
      Pull more Kbuild updates from Masahiro Yamada:
      
       - match the directory structure of the linux-libc-dev package to that
         of Debian-based distributions
      
       - fix incorrect include/config/auto.conf generation when Kconfig
         creates it along with the .config file
      
       - remove misleading $(AS) from documents
      
       - clean up precious tag files by distclean instead of mrproper
      
       - add a new coccinelle patch for devm_platform_ioremap_resource
         migration
      
       - refactor module-related scripts to read modules.order instead of
         $(MODVERDIR)/*.mod files to get the list of created modules
      
       - remove MODVERDIR
      
       - update list of header compile-test
      
       - add -fcf-protection=none flag to avoid conflict with the retpoline
         flags when CONFIG_RETPOLINE=y
      
       - misc cleanups
      
      * tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
        kbuild: add -fcf-protection=none when using retpoline flags
        kbuild: update compile-test header list for v5.3-rc1
        kbuild: split out *.mod out of {single,multi}-used-m rules
        kbuild: remove 'prepare1' target
        kbuild: remove the first line of *.mod files
        kbuild: create *.mod with full directory path and remove MODVERDIR
        kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
        kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
        kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
        kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
        scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver
        kbuild: remove duplication from modules.order in sub-directories
        kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
        kbuild: do not create empty modules.order in the prepare stage
        coccinelle: api: add devm_platform_ioremap_resource script
        kbuild: compile-test headers listed in header-test-m as well
        kbuild: remove unused hostcc-option
        kbuild: remove tag files by distclean instead of mrproper
        kbuild: add --hash-style= and --build-id unconditionally
        kbuild: get rid of misleading $(AS) from documents
        ...
    • Linus Torvalds
      Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 18253e03
      Linus Torvalds authored
      Pull dcache and mountpoint updates from Al Viro:
       "Saner handling of refcounts to mountpoints.
      
        Transfer the counting reference from struct mount ->mnt_mountpoint
        over to struct mountpoint ->m_dentry. That allows us to get rid of the
        convoluted games with ordering of mount shutdowns.
      
        The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
        mixed-filesystem shrink lists, which we'll also need for the Slab
        Movable Objects patchset"
      
      * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        switch the remnants of releasing the mountpoint away from fs_pin
        get rid of detach_mnt()
        make struct mountpoint bear the dentry reference to mountpoint, not struct mount
        Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
        fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
        __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
        nfs: dget_parent() never returns NULL
        ceph: don't open-code the check for dead lockref
    • Thomas Gleixner
      x86/entry/64: Prevent clobbering of saved CR2 value · 6879298b
      Thomas Gleixner authored
      The recent fix for CR2 corruption introduced a new way to reliably corrupt
      the saved CR2 value.
      
      CR2 is saved early in the entry code in RDX, which is the third argument to
      the fault handling functions. But the fix missed that enter_from_user_mode()
      can be called between saving CR2 and invoking the fault handler. RDX is a
      caller-saved register, so the invoked function can freely clobber it, with
      the obvious consequences.
      
      The TRACE_IRQS_OFF call is safe, as it calls through the thunk, which
      preserves RDX, but TRACE_IRQS_OFF_DEBUG is not, because it also calls into
      C code outside of the thunk.
      
      Store CR2 in R12 instead, which is a callee-saved register, and move R12 to
      RDX just before calling the fault handler.
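
      To make the ordering problem concrete, here is a small userspace C model,
      not the entry assembly itself: `struct regs`, `call_c_helper()`,
      `entry_buggy()` and `entry_fixed()` are hypothetical stand-ins. A helper
      call may overwrite the caller-saved RDX but must preserve the
      callee-saved R12, so only the fixed ordering hands the original CR2
      value to the fault handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers that matter here. */
struct regs {
	uint64_t rdx;	/* caller-saved: a called C function may overwrite it */
	uint64_t r12;	/* callee-saved: a called C function must preserve it */
};

/* Stand-in for a C call such as enter_from_user_mode(): clobbers RDX. */
static void call_c_helper(struct regs *r)
{
	r->rdx = 0xdeadbeef;
}

/* Buggy ordering: CR2 parked in RDX, then a C call runs before the handler. */
static uint64_t entry_buggy(uint64_t cr2)
{
	struct regs r = { .rdx = cr2, .r12 = 0 };

	call_c_helper(&r);
	return r.rdx;		/* what the fault handler sees as its 3rd argument */
}

/* Fixed ordering: CR2 parked in R12, copied to RDX right before the handler. */
static uint64_t entry_fixed(uint64_t cr2)
{
	struct regs r = { .rdx = 0, .r12 = cr2 };

	call_c_helper(&r);
	assert(r.r12 == cr2);	/* the callee-saved copy survived the call */
	r.rdx = r.r12;
	return r.rdx;
}
```

      For example, `entry_fixed(0x7fff1234)` returns the same address, while
      `entry_buggy(0x7fff1234)` returns the helper's scratch value instead.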
      
      Fixes: a0d14b89 ("x86/mm, tracing: Fix CR2 corruption")
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
    • Peter Zijlstra
      smp: Warn on function calls from softirq context · 19dbdcb8
      Peter Zijlstra authored
      It's clearly documented that smp function calls cannot be invoked from
      softirq handling context. Unfortunately nothing enforces that or emits a
      warning.
      
      A single function call can be invoked from softirq context only via
      smp_call_function_single_async().
      
      The only legit context is task context, so add a warning to that effect.
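
      The rule can be sketched as a check that refuses ordinary SMP function
      calls outside task context while still permitting the async single-CPU
      variant from softirq context. This is a userspace model under
      assumptions: `enum exec_ctx`, `enum smp_call_kind` and
      `smp_call_allowed()` are hypothetical names, only task and softirq
      context are modelled, and the kernel itself expresses such a check with
      its own context helpers and warning macros rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical execution contexts covered by this model. */
enum exec_ctx { CTX_TASK, CTX_SOFTIRQ };

/* Hypothetical flavours of cross-CPU function call. */
enum smp_call_kind { SMP_CALL_SYNC, SMP_CALL_SINGLE_ASYNC };

/*
 * Return true if a call of @kind is permitted from @ctx, printing a warning
 * otherwise: task context may use any call, softirq context only the async
 * single-CPU variant.
 */
static bool smp_call_allowed(enum exec_ctx ctx, enum smp_call_kind kind)
{
	assert(ctx == CTX_TASK || ctx == CTX_SOFTIRQ);
	if (ctx == CTX_TASK)
		return true;
	if (kind == SMP_CALL_SINGLE_ASYNC)
		return true;
	fprintf(stderr, "warning: smp function call from softirq context\n");
	return false;
}
```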
      Reported-by: luferry <luferry@163.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190718160601.GP3402@hirez.programming.kicks-ass.net
    • Eric Hankland
      KVM: x86: Add fixed counters to PMU filter · 30cd8604
      Eric Hankland authored
      Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
      fixed counters.
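
      A minimal sketch of how such a filter could decide whether a fixed
      counter is usable, assuming the semantics described in the KVM API
      documentation for KVM_SET_PMU_EVENT_FILTER: bit i of a bitmap refers to
      fixed counter i, and the bitmap is read as an allow list or a deny list
      depending on the filter's action. `struct pmu_fixed_filter`,
      `enum pmu_filter_action` and `fixed_counter_allowed()` are simplified,
      hypothetical stand-ins, not the UAPI layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter actions: treat the bitmap as allow list or deny list. */
enum pmu_filter_action { PMU_FILTER_ALLOW, PMU_FILTER_DENY };

/* Simplified, hypothetical filter: only the fixed-counter part is modelled. */
struct pmu_fixed_filter {
	enum pmu_filter_action action;
	uint32_t fixed_counter_bitmap;	/* bit i refers to fixed counter i */
};

/* Decide whether fixed counter @idx may be programmed for the guest. */
static bool fixed_counter_allowed(const struct pmu_fixed_filter *f,
				  unsigned int idx)
{
	bool listed;

	assert(idx < 32);
	if (f == NULL)		/* no filter installed: everything is allowed */
		return true;
	listed = ((f->fixed_counter_bitmap >> idx) & 1u) != 0;
	return f->action == PMU_FILTER_ALLOW ? listed : !listed;
}
```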
      Signed-off-by: Eric Hankland <ehankland@google.com>
      [No need to check padding fields for zero. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Paolo Bonzini
      KVM: nVMX: do not use dangling shadow VMCS after guest reset · 88dddc11
      Paolo Bonzini authored
      If a KVM guest is reset while running a nested guest, free_nested will
      disable the shadow VMCS execution control in the vmcs01.  However,
      on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
      the VMCS12 to the shadow VMCS which has since been freed.
      
      This causes a vmptrld of a NULL pointer on my machine, but Jan reports
      the host to hang altogether.  Let's see how much this trivial patch fixes.
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Paolo Bonzini
      KVM: VMX: dump VMCS on failed entry · 3b20e03a
      Paolo Bonzini authored
      This is useful for debugging, and is ratelimited nowadays.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Like Xu
      KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed · 6fc3977c
      Like Xu authored
      If perf_event creation fails for any reason in the host perf
      subsystem, nothing records the corresponding guest event, which may
      cause abnormal sampling data in the guest's results. In debug mode,
      this message helps to understand the state of the vPMC, so we need not
      limit the number of occurrences, as long as it is not printed in a
      spamming style.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Wanpeng Li
      KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup · d9847409
      Wanpeng Li authored
      Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup().
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Wanpeng Li
      KVM: Boost vCPUs that are delivering interrupts · d73eb57b
      Wanpeng Li authored
      Inspired by commit 9cac38dd (KVM/s390: Set preempted flag during
      vcpu wakeup and interrupt delivery), we want to boost not just
      lock holders but also vCPUs that are delivering interrupts. Most
      smp_call_function_many calls are synchronous, so the IPI target vCPUs
      are also good yield candidates.  This patch introduces vcpu->ready to
      boost vCPUs at wakeup and interrupt delivery time; unlike s390, we do
      not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
      into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
      (VT-d PI handles voluntary preemption separately, in pi_pre_block).
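
      The candidate-selection idea can be modelled in a few lines: a spinning
      vCPU scans the others for one to yield to, and a vCPU only counts as a
      candidate if it has been marked ready by a wakeup or interrupt delivery.
      This is a userspace sketch loosely modelled on the directed-yield scan
      in kvm_vcpu_on_spin; `struct toy_vcpu` and `pick_yield_target()` are
      hypothetical, the round-robin start point is an assumption of the
      sketch, and most of what the real code checks is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down vCPU state. */
struct toy_vcpu {
	bool ready;	/* in this model: set on wakeup / interrupt delivery */
};

/*
 * Scan the vCPUs round-robin, starting after @last_boosted, and return the
 * index of the first vCPU other than @me that is marked ready, or -1.
 */
static int pick_yield_target(const struct toy_vcpu *vcpus, size_t n,
			     size_t me, size_t last_boosted)
{
	assert(me < n && last_boosted < n);
	for (size_t step = 1; step <= n; step++) {
		size_t i = (last_boosted + step) % n;

		if (i == me || !vcpus[i].ready)
			continue;
		return (int)i;
	}
	return -1;
}
```

      Skipping vCPUs that are not ready keeps the spinning vCPU from donating
      its time slice to a vCPU that has nothing useful to do, such as one that
      is idle rather than handling an IPI.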
      
      Testing on an 80-HT, 2-socket Xeon Skylake server, with an 80-vCPU, 80GB RAM VM:
      ebizzy -M
      
                  vanilla     boosting    improved
      1VM          21443       23520         9%
      2VM           2800        8000       180%
      3VM           1800        3100        72%
      
      Testing on my 8-HT Haswell desktop with two 8-vCPU, 8GB RAM VMs,
      one running ebizzy -M and the other running 'stress --cpu 2':
      
      w/ boosting + w/o pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1570         4000      155%
      
      w/ boosting + w/ pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1844         5157      179%
      
      w/o boosting, perf top in VM:
      
       72.33%  [kernel]       [k] smp_call_function_many
        4.22%  [kernel]       [k] call_function_i
        3.71%  [kernel]       [k] async_page_fault
      
      w/ boosting, perf top in VM:
      
       38.43%  [kernel]       [k] smp_call_function_many
        6.31%  [kernel]       [k] async_page_fault
        6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
        4.88%  [kernel]       [k] call_function_interrupt
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Thomas Huth
      KVM: selftests: Remove superfluous define from vmx.c · 2417c870
      Thomas Huth authored
      The code in vmx.c does not use "program_invocation_name", so there
      is no need to "#define _GNU_SOURCE" here.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Liran Alon
      KVM: SVM: Fix detection of AMD Errata 1096 · 118154bd
      Liran Alon authored
      When the CPU raises #NPF on a guest data access and guest CR4.SMAP=1, it
      is possible that the CPU microcode implementing DecodeAssist will fail
      to read the bytes of the instruction which caused the #NPF. This is AMD
      erratum 1096, and it happens because the CPU microcode reading instruction
      bytes incorrectly attempts to read code as implicit supervisor-mode data
      accesses (that is, just like it would read e.g. a TSS), which are
      susceptible to SMAP faults. The microcode reads CS:RIP and, if it is
      a user-mode address according to the page tables, the processor
      gives up and returns no instruction bytes.  In this case, the
      GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly
      return 0 instead of the correct guest instruction bytes.
      
      Current KVM code attempts to detect and work around this erratum, but it
      has multiple issues:
      
      1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1,
      which is required for encountering a SMAP fault.
      
      2) It assumes SMAP faults can only occur when guest CPL==3.
      However, if guest CR4.SMEP=0, the guest can execute an instruction
      which resides in a user-accessible page with CPL<3 privilege. If this
      instruction raises a #NPF on its data access, then the CPU DecodeAssist
      microcode will still encounter a SMAP violation.  Even though no sane
      OS will do so (as it's an obvious privilege escalation vulnerability),
      we still need to handle this semantically correctly on the KVM side.
      
      Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easily
      triggerable condition and guests usually enable SMAP together with SMEP.
      If the vCPU has CR4.SMEP=1, the erratum could indeed be encountered only
      at guest CPL==3; otherwise, the CPU would raise a SMEP fault to the guest
      instead of #NPF.  We keep this condition to avoid false positives in
      the detection of the erratum.
      
      In addition, to avoid future confusion and improve code readability,
      include details of the erratum in the code and not just in the commit message.
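
      The corrected detection condition described above can be summarised as a
      small predicate. This is a sketch under the assumptions stated in the
      text: DecodeAssist returned zero instruction bytes, and the guest state
      is reduced to the CR4.SMAP and CR4.SMEP bits plus the current privilege
      level. The function name `erratum_1096_possible()` is hypothetical, and
      the real SVM code has to handle cases this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if a #NPF exit that reported @insn_bytes instruction bytes may
 * be explained by AMD erratum 1096, following the reasoning above:
 *  - the erratum needs a SMAP violation, so guest CR4.SMAP must be 1;
 *  - with CR4.SMEP=1, fetching from a user page at CPL<3 would have raised
 *    a SMEP fault instead, so only CPL==3 qualifies;
 *  - with CR4.SMEP=0, any CPL can hit it.
 */
static bool erratum_1096_possible(unsigned int insn_bytes, bool cr4_smap,
				  bool cr4_smep, unsigned int cpl)
{
	assert(cpl <= 3);
	if (insn_bytes != 0)	/* DecodeAssist worked: nothing to explain */
		return false;
	if (!cr4_smap)
		return false;
	return !cr4_smep || cpl == 3;
}
```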
      
      Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
      Cc: Singh Brijesh <brijesh.singh@amd.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Wanpeng Li
      KVM: LAPIC: Inject timer interrupt via posted interrupt · 0c5f81da
      Wanpeng Li authored
      Dedicated instances are currently disturbed by unnecessary jitter due
      to the emulated lapic timers firing on the same pCPUs where the
      vCPUs reside.  Unlike ARM, Intel has no hardware virtual timer for
      guests, so both programming the timer in the guest and the emulated
      timer firing incur vmexits.  This patch tries to avoid the vmexit when
      the emulated timer fires, at least in the dedicated instance scenario
      when nohz_full is enabled.
      
      In that case, the emulated timers can be offloaded to the nearest busy
      housekeeping CPUs, since APICv has been available in server processors
      for several years. The guest timer interrupt can then be injected via
      posted interrupts, which are delivered by the housekeeping CPU once the
      emulated timer fires.
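
      Posted-interrupt delivery is what lets the housekeeping CPU inject the
      timer interrupt without forcing a vmexit on the target. The following
      userspace model is a simplified sketch of the generic posted-interrupt
      idea, not KVM's code: a descriptor holds a pending-vector bitmap plus an
      outstanding-notification bit; the sender sets the vector bit and only
      needs to send a notification if none is already outstanding, and the
      receiving side later drains the bitmap. `struct toy_pi_desc` and its
      helpers are hypothetical, and C11 atomics stand in for the hardware's
      locked updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical posted-interrupt descriptor: 256 vectors plus a notify bit. */
struct toy_pi_desc {
	_Atomic uint64_t pir[4];	/* pending-interrupt request bitmap */
	atomic_bool on;			/* a notification is outstanding */
};

static void toy_pi_init(struct toy_pi_desc *pi)
{
	for (int i = 0; i < 4; i++)
		atomic_init(&pi->pir[i], 0);
	atomic_init(&pi->on, false);
}

/*
 * Post @vector from another CPU (e.g. a housekeeping CPU whose emulated
 * timer fired). Returns true if the caller should send a notification,
 * i.e. no earlier notification was still outstanding.
 */
static bool toy_pi_post(struct toy_pi_desc *pi, unsigned int vector)
{
	assert(vector < 256);
	atomic_fetch_or(&pi->pir[vector / 64], UINT64_C(1) << (vector % 64));
	return !atomic_exchange(&pi->on, true);
}

/* Receiving side: clear the notify bit, then collect and clear the vectors. */
static unsigned int toy_pi_drain(struct toy_pi_desc *pi, uint64_t out[4])
{
	unsigned int count = 0;

	atomic_store(&pi->on, false);
	for (int i = 0; i < 4; i++) {
		out[i] = atomic_exchange(&pi->pir[i], 0);
		for (uint64_t bits = out[i]; bits; bits &= bits - 1)
			count++;	/* one pending vector per set bit */
	}
	return count;
}
```

      Coalescing on the notify bit is the point of the design: several posts
      before the target drains the descriptor cost a single notification.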
      
      The host should be tuned so that vCPUs are placed on isolated physical
      processors, with several surplus pCPUs for busy housekeeping.
      If mwait/hlt/pause vmexits are disabled to keep the vCPUs in non-root
      mode, a ~3% redis performance benefit can be observed on a Skylake
      server, and the number of external interrupt vmexits drops
      substantially.  Without the patch:
      
      VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT    42916    49.43%  39.30%    0.47us  106.09us  0.71us ( +-   1.09% )
      
      With the patch:
      
      VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT     6871     9.29%   2.96%    0.44us   57.88us  0.72us ( +-   4.02% )
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>