1. 26 Feb, 2018 8 commits
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 85a2d939
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Yet another pile of melted spectrum related changes:
      
         - sanitize the array_index_nospec protection mechanism: Remove the
           overengineered array_index_nospec_mask_check() magic and allow
           const-qualified types as index to avoid temporary storage in a
           non-const local variable.
      
         - make the microcode loader more robust by properly propagating error
           codes. Provide information about new feature bits after microcode
           was updated so administrators can act upon it.
      
         - optimizations of the entry ASM code which reduce code footprint and
           make the code simpler and faster.
      
         - fix the {pmd,pud}_{set,clear}_flags() implementations to work
           properly on paravirt kernels by removing the address translation
           operations.
      
         - revert the harmful vmexit_fill_RSB() optimization
      
         - use IBRS around firmware calls
      
         - teach objtool about retpolines and add annotations for indirect
           jumps and calls.
      
         - explicitly disable jumplabel patching in __init code and handle
           patching failures properly instead of silently ignoring them.
      
         - remove indirect paravirt calls for writing the speculation control
           MSR as these calls obviously provide the same attack vector that
           is being mitigated.
      
         - a few small fixes which address build issues with recent compiler
           and assembler versions"
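The array_index_nospec() protection in the first item replaces an out-of-bounds index with 0 using a branch-free mask, so that even a mispredicted bounds check cannot speculatively use an attacker-controlled index. The Python sketch below models only the arithmetic of the kernel's generic C fallback, array_index_mask_nospec(), assuming 64-bit longs and 0 < size < 2**63. It cannot reproduce the speculation hardening itself, architectures (x86 among them) may supply their own implementation, and to_signed_long() is a hypothetical helper needed only to emulate C integer semantics.

```python
# Model of the generic array_index_mask_nospec() arithmetic, assuming
# BITS_PER_LONG == 64. Python ints are unbounded, so 64-bit wrap-around
# and the sign-extending right shift of a C 'long' are emulated explicitly.
BITS_PER_LONG = 64
ULONG_MASK = (1 << BITS_PER_LONG) - 1


def to_signed_long(value):
    """Reinterpret the low 64 bits of value as a signed long."""
    value &= ULONG_MASK
    if value >> (BITS_PER_LONG - 1):
        return value - (1 << BITS_PER_LONG)
    return value


def array_index_mask_nospec(index, size):
    """Return all ones if 0 <= index < size, else 0 (as an unsigned long)."""
    # size - 1 - index wraps to a huge value (top bit set) when index >= size.
    combined = (index | ((size - 1 - index) & ULONG_MASK)) & ULONG_MASK
    # ~combined has its top bit set only when index < size; an arithmetic
    # shift by 63 then copies that bit across the whole word.
    return (to_signed_long(~combined) >> (BITS_PER_LONG - 1)) & ULONG_MASK


def array_index_nospec(index, size):
    """Clamp index to 0 when out of bounds by ANDing it with the mask."""
    return index & array_index_mask_nospec(index, size)


table = [10, 20, 30, 40]
print(array_index_nospec(2, len(table)))  # -> 2 (in bounds, preserved)
print(array_index_nospec(7, len(table)))  # -> 0 (out of bounds, clamped)
```

The caller still performs its normal bounds check; the mask only ensures that the value used after a mispredicted check is harmless.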
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
        KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
        KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
        objtool, retpolines: Integrate objtool with retpoline support more closely
        x86/entry/64: Simplify ENCODE_FRAME_POINTER
        extable: Make init_kernel_text() global
        jump_label: Warn on failed jump_label patching attempt
        jump_label: Explicitly disable jump labels in __init code
        x86/entry/64: Open-code switch_to_thread_stack()
        x86/entry/64: Move ASM_CLAC to interrupt_entry()
        x86/entry/64: Remove 'interrupt' macro
        x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
        x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
        x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
        x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
        objtool: Add module specific retpoline rules
        objtool: Add retpoline validation
        objtool: Use existing global variables for options
        x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
        x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
        x86/paravirt, objtool: Annotate indirect calls
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d4858aaf
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "s390:
         - optimization for the exitless interrupt support that was merged in 4.16-rc1
         - improve the branch prediction blocking for nested KVM
         - replace some jump tables with switch statements to improve expoline performance
         - fixes for multiple epoch facility
      
        ARM:
         - fix the interaction of userspace irqchip VMs with in-kernel irqchip VMs
         - make sure we can build 32-bit KVM/ARM with gcc-8.
      
        x86:
         - fixes for AMD SEV
         - fixes for Intel nested VMX, emulated UMIP and a dump_stack() on VM startup
         - fixes for async page fault migration
         - small optimization to PV TLB flush (new in 4.16-rc1)
         - syzkaller fixes
      
        Generic:
         - compiler warning fixes
         - syzkaller fixes
         - more improvements to the kvm_stat tool
      
        Two more small Spectre fixes are going to reach you via Ingo"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (40 commits)
        KVM: SVM: Fix SEV LAUNCH_SECRET command
        KVM: SVM: install RSM intercept
        KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command
        include: psp-sev: Capitalize invalid length enum
        crypto: ccp: Fix sparse, use plain integer as NULL pointer
        KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled
        x86/kvm: Make parse_no_xxx __init for kvm
        KVM: x86: fix backward migration with async_PF
        kvm: fix warning for non-x86 builds
        kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
        tools/kvm_stat: print 'Total' line for multiple events only
        tools/kvm_stat: group child events indented after parent
        tools/kvm_stat: separate drilldown and fields filtering
        tools/kvm_stat: eliminate extra guest/pid selection dialog
        tools/kvm_stat: mark private methods as such
        tools/kvm_stat: fix debugfs handling
        tools/kvm_stat: print error on invalid regex
        tools/kvm_stat: fix crash when filtering out all non-child trace events
        tools/kvm_stat: avoid 'is' for equality checks
        tools/kvm_stat: use a more pythonic way to iterate over dictionaries
        ...
    • Linux 4.16-rc3 · 4a3928c6
      Linus Torvalds authored
    • Merge tag 'xtensa-20180225' of git://github.com/jcmvbkbc/linux-xtensa · e1171aca
      Linus Torvalds authored
      Pull Xtensa fixes from Max Filippov:
       "Two fixes for reserved memory/DMA buffers allocation in high memory on
        xtensa architecture
      
         - fix memory accounting when reserved memory is in high memory region
      
         - fix DMA allocation from high memory"
      
      * tag 'xtensa-20180225' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: support DMA buffers in high memory
        xtensa: fix high memory/reserved memory collision
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c23a7575
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A small set of fixes:
      
         - UAPI data type correction for hyperv
      
         - correct the cpu cores field in /proc/cpuinfo on CPU hotplug
      
         - return proper error code in the resctrl file system failure path to
           avoid silent subsequent failures
      
         - correct a subtle accounting issue in the new vector allocation code
           which went unnoticed for a while and caused suspend/resume
           failures"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
        x86/topology: Fix function name in documentation
        x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
        x86/apic/vector: Handle vector release on CPU unplug correctly
        genirq/matrix: Handle CPU offlining proper
        x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e912bf2c
      Linus Torvalds authored
      Pull perf fix from Thomas Gleixner:
       "A single commit which shuts up a bogus GCC-8 warning"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9c897096
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "Three patches to fix memory ordering issues on ALPHA and a comment to
        clarify the usage scope of a mutex internal function"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
        locking/xchg/alpha: Clean up barrier usage by using smp_mb() in place of __ASM__MB
        locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
        locking/mutex: Add comment to __mutex_owner() to deter usage
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 297ea1b7
      Linus Torvalds authored
      Pull cleanup patchlet from Thomas Gleixner:
       "A single commit removing a bunch of bogus double semicolons all over
        the tree"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        treewide/trivial: Remove ';;$' typo noise
  2. 25 Feb, 2018 2 commits
    • Merge tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · c89be524
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
      
       - fix a broken cast in nfs4_callback_recallany()
      
       - fix an Oops during NFSv4 migration events
      
       - make struct nlmclnt_fl_close_lock_ops static
      
      * tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFS: make struct nlmclnt_fl_close_lock_ops static
        nfs: system crashes after NFS4ERR_MOVED recovery
        NFSv4: Fix broken cast in nfs4_callback_recallany()
    • Merge tag 'powerpc-4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 3664ce2d
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Add handling for a missing instruction in our 32-bit BPF JIT so that
         it can be used for seccomp filtering.
      
       - Add a missing NULL pointer check before a function call in new EEH
         code.
      
       - Fix an error path in the new ocxl driver to correctly return EFAULT.
      
       - The support for the new ibm,drc-info device tree property turns out
         to need several fixes, so for now we just stop advertising to
         firmware that we support it until the bugs can be ironed out.
      
       - One fix for the new drmem code which was incorrectly modifying the
         device tree in place.
      
       - Finally two fixes for the RFI flush support, so that firmware can
         advertise to us that it should be disabled entirely so as not to
         affect performance.
      
      Thanks to: Bharata B Rao, Frederic Barrat, Juan J. Alvarez, Mark Lord,
      Michael Bringmann.
      
      * tag 'powerpc-4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Support firmware disable of RFI flush
        powerpc/pseries: Support firmware disable of RFI flush
        powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2
        powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
        powerpc/pseries: Revert support for ibm,drc-info devtree property
        powerpc/pseries: Fix duplicate firmware feature for DRC_INFO
        ocxl: Fix potential bad errno on irq allocation
        powerpc/eeh: Fix crashes in eeh_report_resume()
  3. 24 Feb, 2018 28 commits
    • KVM: SVM: Fix SEV LAUNCH_SECRET command · 9c5e0afa
      Brijesh Singh authored
      The SEV LAUNCH_SECRET command fails with error code 'invalid param'
      because we did not fill in the guest and header system physical
      addresses when issuing the command.
      
      Fixes: 9f5b5b95 (KVM: SVM: Add support for SEV LAUNCH_SECRET command)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: install RSM intercept · 7607b717
      Brijesh Singh authored
      The RSM instruction is used by the SMM handler to return from SMM.
      Currently, RSM causes a #UD, which results in an instruction fetch,
      decode, and emulation. By installing the RSM intercept we can avoid the
      instruction fetch, since we know that the #VMEXIT was due to RSM.
      
      The patch is required for SEV guests: SEV guest memory is encrypted
      with a guest-specific key, so the hypervisor is not able to fetch the
      instruction bytes from guest memory.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command · 3e233385
      Brijesh Singh authored
      Using access_ok() to validate the input before issuing the SEV
      command does not buy us anything in this case. If userland gives us
      a garbage pointer, copy_to_user() will catch it when we try to
      return the measurement.
      Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
      Fixes: 0d0736f7 (KVM: SVM: Add support for KVM_SEV_LAUNCH_MEASURE ...)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • include: psp-sev: Capitalize invalid length enum · 45d0be87
      Brijesh Singh authored
      Commit 1d57b17c ("crypto: ccp: Define SEV userspace ioctl and command
      id") added the invalid length enum but we missed capitalizing it.
      
      Fixes: 1d57b17c (crypto: ccp: Define SEV userspace ioctl ...)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      CC: Gary R Hook <gary.hook@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • crypto: ccp: Fix sparse, use plain integer as NULL pointer · e5699f56
      Brijesh Singh authored
      Fix the sparse warning "Using plain integer as NULL pointer" by
      replacing an assignment of 0 to a pointer with an assignment of NULL.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Gary Hook <gary.hook@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled · 4f2f61fc
      Wanpeng Li authored
      Avoid traversing all the CPUs for PV TLB flush when steal time is
      disabled, since PV TLB flush depends on a field in the steal time
      structure for shared data.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm: Make parse_no_xxx __init for kvm · afdc3f58
      Dou Liyang authored
      Functions registered with early_param() are only called during kernel
      initialization, so Linux marks them with the __init macro to save
      memory.
      
      However, parse_no_kvmapf/stealacc/kvmclock_vsyscall were not marked,
      so make them __init as well.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: rkrcmar@redhat.com
      Cc: kvm@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: x86@kernel.org
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix backward migration with async_PF · fe2a3027
      Radim Krčmář authored
      Guests on new hypervisors might set the KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
      bit when enabling async_PF, but this bit is reserved on old hypervisors,
      which results in a failure upon migration to an old hypervisor.
      
      To avoid breaking different cases, we check for a CPUID feature bit
      before enabling the feature and nothing else.
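On the guest side, the fix amounts to setting KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT only when CPUID advertises support for it. The Python sketch below models that decision. The bit positions are assumptions based on our reading of the KVM paravirt ABI (arch/x86/include/uapi/asm/kvm_para.h), not values stated in this log; async_pf_msr_value() is a hypothetical helper, not a kernel function, and area_pa is assumed to be aligned so its low bits are free for flags.

```python
# Assumed constants (our reading of the KVM paravirt ABI, not quoted above).
KVM_FEATURE_ASYNC_PF_VMEXIT = 10             # CPUID feature bit number
KVM_ASYNC_PF_ENABLED = 1 << 0
KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT = 1 << 2


def async_pf_msr_value(area_pa, cpuid_features):
    """Value a guest would write to enable async PF (hypothetical helper)."""
    value = area_pa | KVM_ASYNC_PF_ENABLED
    # Request delivery as #PF vmexit only if the hypervisor advertises it:
    # older hypervisors treat this bit as reserved and reject the write.
    if cpuid_features & (1 << KVM_FEATURE_ASYNC_PF_VMEXIT):
        value |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
    return value


print(hex(async_pf_msr_value(0x1000, cpuid_features=0)))        # -> 0x1001
print(hex(async_pf_msr_value(0x1000, cpuid_features=1 << 10)))  # -> 0x1005
```

Because the CPUID bit is under the control of the host userspace, a guest started with a CPU model that hides it never sets the reserved bit, and can therefore migrate to an older hypervisor.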
      
      Fixes: 52a5c155 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: fix warning for non-x86 builds · f75e4924
      Sebastian Ott authored
      Fix the following sparse warning by moving the prototype
      of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h:
      
        CHECK   arch/s390/kvm/../../../virt/kvm/kvm_main.c
      arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds · 07646749
      Sebastian Ott authored
      Move the kvm_arch_irq_routing_update() prototype outside of the
      #ifdef CONFIG_HAVE_KVM_EVENTFD guards to fix the following sparse warning:
      
      arch/s390/kvm/../../../virt/kvm/irqchip.c:171:28: warning: symbol 'kvm_arch_irq_routing_update' was not declared. Should it be static?
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: print 'Total' line for multiple events only · 6789af03
      Stefan Raspl authored
      The 'Total' line looks a bit weird when we have only a single event.
      This can happen, e.g., due to filters. Therefore, suppress it when
      there's only a single event in the output.
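A minimal sketch of the rule, assuming the statistics arrive as a plain dictionary of event counts. format_rows() and its column layout are hypothetical and do not reproduce kvm_stat's actual output code.

```python
def format_rows(stats):
    """Render one line per event; add a 'Total' line only for 2+ events."""
    lines = [f"{name:<30} {count:>10}" for name, count in stats.items()]
    # With a single event, a 'Total' line would only repeat its count.
    if len(stats) > 1:
        lines.append(f"{'Total':<30} {sum(stats.values()):>10}")
    return lines


for line in format_rows({'kvm_exit': 5, 'kvm_entry': 7}):
    print(line)
```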
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: group child events indented after parent · df72ecfc
      Stefan Raspl authored
      We keep the current logic that sorts all events (parent and child), but
      re-shuffle the events afterwards, grouping the children after the
      respective parent. Note that the percentage column for child events
      gives the percentage of the parent's total.
      Since we rework the logic anyway, we modify the total average
      calculation to use the raw numbers instead of the (rounded) averages.
      Note that this can result in differing numbers (between total average
      and the sum of the individual averages) due to rounding errors.
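The regrouping can be sketched as follows, assuming child trace events are named 'parent(reason)' (as in kvm_exit(HLT)) and the input list is already sorted. Parent percentages are taken over the total of all parents and child percentages over the parent's own count. group_children() and parent_of() are hypothetical names, and this simplified version silently drops children whose parent is not in the list.

```python
def parent_of(name):
    """Return 'kvm_exit' for 'kvm_exit(HLT)', or None for a parent event."""
    return name.split('(', 1)[0] if '(' in name else None


def group_children(sorted_events):
    """Turn sorted (name, count) pairs into (name, count, percent) rows,
    with each parent's children listed directly after it."""
    parents = [(n, c) for n, c in sorted_events if parent_of(n) is None]
    children = [(n, c) for n, c in sorted_events if parent_of(n) is not None]
    grand_total = sum(c for _, c in parents)
    rows = []
    for name, count in parents:
        rows.append((name, count, 100.0 * count / grand_total if grand_total else 0.0))
        for child, ccount in children:
            if parent_of(child) == name:
                # Child percentage is relative to the parent's own count.
                rows.append((child, ccount, 100.0 * ccount / count if count else 0.0))
    return rows


events = [('kvm_exit', 100), ('kvm_entry', 80), ('kvm_exit(HLT)', 60), ('kvm_exit(IO)', 40)]
for row in group_children(events):
    print(row)
```

Sorting first and regrouping afterwards keeps the existing sort order within each group, which is what the message describes.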
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: separate drilldown and fields filtering · 18e8f410
      Stefan Raspl authored
      Drilldown (i.e. toggle display of child trace events) was implemented by
      overriding the fields filter. This resulted in inconsistencies: E.g. when
      drilldown was not active, adding a filter that also matches child trace
      events would not only filter fields according to the filter, but also add
      in the child trace events matching the filter. E.g. on x86, setting
      'kvm_userspace_exit' as the fields filter after startup would result in
      display of kvm_userspace_exit(DCR), although that wasn't previously
      present - not exactly what one would expect from a filter.
      This patch addresses the issue by keeping drilldown and fields filter
      separate. While at it, we also fix a PEP8 issue by adding a blank line
      at one place (since we're in the area...).
      We implement this by adding a framework that also allows defining a
      taxonomy among the debugfs events to identify child trace events. This
      means that drilldown using 'x' can now also work with debugfs. A
      parent-child relationship is only known for S390 at the moment, but
      could be added for other platforms by adjusting their
      ARCH.dbg_is_child() methods accordingly.
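A sketch of keeping the two concerns separate, assuming a regular-expression fields filter and a predicate standing in for a per-architecture ARCH.dbg_is_child(). visible_events() is a hypothetical helper, not kvm_stat's API; the usage lines reproduce the kvm_userspace_exit(DCR) example from the message.

```python
import re


def visible_events(events, fields_filter, drilldown, is_child):
    """Apply the fields filter and the drilldown toggle independently."""
    pattern = re.compile(fields_filter)
    shown = []
    for name in events:
        # Drilldown alone decides whether child trace events may appear...
        if is_child(name) and not drilldown:
            continue
        # ...and the fields filter only ever narrows the selection.
        if pattern.search(name):
            shown.append(name)
    return shown


def is_child(name):
    # Stand-in for ARCH.dbg_is_child(): treat 'parent(reason)' names as children.
    return '(' in name


events = ['kvm_userspace_exit', 'kvm_userspace_exit(DCR)', 'kvm_exit']
print(visible_events(events, 'kvm_userspace_exit', False, is_child))
print(visible_events(events, 'kvm_userspace_exit', True, is_child))
```

With drilldown off, the filter no longer pulls in kvm_userspace_exit(DCR); it only appears once drilldown is enabled.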
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: eliminate extra guest/pid selection dialog · 516f1190
      Stefan Raspl authored
      We can make do with a single dialog that takes both PIDs and guest
      names. Note that we keep both interactive commands, 'p' and 'g', for
      now to avoid confusing users who are used to a specific key.
      
      While at it, we fix some minor glitches regarding curses usage,
      e.g. the cursor remaining visible when it is not supposed to be.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: mark private methods as such · c0e8c21e
      Stefan Raspl authored
      Reading the code is much easier when it's obvious which methods are
      intended for internal use only.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: fix debugfs handling · 1fd6a708
      Stefan Raspl authored
      The checks for debugfs assumed that debugfs is always mounted at
      /sys/kernel/debug, which is likely but not guaranteed. This is
      addressed by checking /proc/mounts for the actual location.
      Furthermore, when debugfs was mounted but the kvm module was not
      loaded, a misleading error claiming that debugfs was not present
      was given.
      To reproduce,
      (a) run kvm_stat with debugfs mounted at a place different from
          /sys/kernel/debug
      (b) run kvm_stat with debugfs mounted but kvm module not loaded
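Finding the mount point can be sketched by scanning the fstype column of /proc/mounts (device, mount point, fstype, options, dump, pass). find_debugfs() and check_kvm_debugfs() are hypothetical helpers; treating a kvm directory under debugfs as the "module loaded" signal is an assumption of this sketch, and escaped characters in mount points (such as \040 for a space) are not decoded here.

```python
import os


def find_debugfs(mounts_text):
    """Return the first debugfs mount point in /proc/mounts-style text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fstype, options, dump, pass.
        if len(fields) >= 3 and fields[2] == 'debugfs':
            return fields[1]
    return None


def check_kvm_debugfs(mounts_text):
    """Return an error message, or None if kvm statistics look available."""
    mountpoint = find_debugfs(mounts_text)
    if mountpoint is None:
        return 'debugfs is not mounted'
    # Distinguish "no debugfs" from "debugfs present, kvm module not loaded".
    if not os.path.isdir(os.path.join(mountpoint, 'kvm')):
        return f'debugfs is mounted at {mountpoint}, but the kvm module is not loaded'
    return None


if __name__ == '__main__' and os.path.exists('/proc/mounts'):
    with open('/proc/mounts') as f:
        print(check_kvm_debugfs(f.read()) or 'kvm debugfs statistics found')
```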
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: print error on invalid regex · 1cd8bfb1
      Stefan Raspl authored
      So far, entering an invalid regular expression did not produce any
      indication of an error.
      To reproduce, press 'f' and enter 'foo(' (with an unescaped parenthesis).
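Python's re module raises re.error for a malformed pattern such as 'foo(', so reporting the problem amounts to catching that exception and showing its message. compile_filter() is a hypothetical helper sketching the idea, not kvm_stat's code.

```python
import re


def compile_filter(text):
    """Compile a user-supplied fields filter; return (regex, error_message)."""
    try:
        return re.compile(text), None
    except re.error as exc:
        # Report the problem instead of silently keeping the old filter.
        return None, f'"{text}": not a valid regular expression ({exc})'


regex, error = compile_filter('foo(')
print(error)
```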
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: fix crash when filtering out all non-child trace events · 3df33a0f
      Stefan Raspl authored
      When we apply a filter that will only leave child trace events, we
      receive a ZeroDivisionError when calculating the percentages.
      In that case, provide percentages based on child events only.
      To reproduce, run 'kvm_stat -f .*[\(].*'.
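A sketch of the fallback, assuming child events can be recognized by a predicate. percentages() is hypothetical and, for brevity, computes every share against a single total rather than the per-parent percentages kvm_stat uses for child events.

```python
def percentages(rows, is_child):
    """Return {event: percent of total} without risking ZeroDivisionError."""
    total = sum(count for name, count in rows if not is_child(name))
    if total == 0:
        # The filter left only child events: base the shares on them instead.
        total = sum(count for name, count in rows if is_child(name))
    return {name: (100.0 * count / total if total else 0.0) for name, count in rows}


only_children = [('kvm_exit(HLT)', 30), ('kvm_exit(IO)', 10)]
print(percentages(only_children, lambda name: '(' in name))
# -> {'kvm_exit(HLT)': 75.0, 'kvm_exit(IO)': 25.0}
```

The final `if total else 0.0` still guards the case where every remaining count is zero.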
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: avoid 'is' for equality checks · 369d5a85
      Marc Hartmayer authored
      Use '==' for equality checks and 'is' when comparing identities.
      
      An example where '==' and 'is' behave differently:
      >>> a = 4242
      >>> a == 4242
      True
      >>> a is 4242
      False
      Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: use a more pythonic way to iterate over dictionaries · 0eb57800
      Marc Hartmayer authored
      If it's clear that the values of a dictionary will be used, iterate
      with the '.items()' method.
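Iterating over the keys and indexing back into the dictionary costs an extra lookup per entry, while .items() yields key and value together. The two hypothetical render functions below produce the same result.

```python
def render_keys_then_lookup(stats):
    # Iterates over the keys, then indexes back into the dict for each value.
    return [f'{key}: {stats[key]}' for key in stats]


def render_items(stats):
    # Gets key and value together, which also reads more clearly.
    return [f'{key}: {value}' for key, value in stats.items()]


stats = {'kvm_exit': 120, 'kvm_entry': 118}
print(render_items(stats))
```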
      Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      [Include fix for logging mode by Stefan Raspl]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: use a namedtuple for storing the values · 006f1548
      Marc Hartmayer authored
      Use a namedtuple for storing the values, as it allows accessing the
      fields of a tuple by name. This makes the overall code much easier
      to read and understand. Access by index is still possible as
      before.
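A minimal illustration of the namedtuple properties the message relies on. The type name Stat and its fields value and delta are hypothetical and need not match kvm_stat.

```python
from collections import namedtuple

# Hypothetical record type: the current count and the change since last refresh.
Stat = namedtuple('Stat', ['value', 'delta'])

stat = Stat(value=1200, delta=35)
print(stat.value, stat.delta)   # access by field name
print(stat[0], stat[1])         # access by index still works
value, delta = stat             # as does tuple unpacking
```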
      Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tools/kvm_stat: simplify the sortkey function · faa312a5
      Marc Hartmayer authored
      The 'sortkey' function references a value in its enclosing
      scope (closure). This is not common practice for a sort key function,
      so let's replace it. Additionally, the function 'sorted' already has a
      parameter for reversing the result, so inverting the values is
      unnecessary. The check for stats[x][1] is also superfluous, as this
      value is guaranteed to be initialized with 0.
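The simplification can be illustrated with a hypothetical stats dictionary mapping event names to (value, delta) pairs. The first key function is a closure of the kind the message describes (not necessarily the original code); the second works only on the item it receives and lets sorted(..., reverse=True) produce descending order. Because Python's sort remains stable with reverse=True, both yield the same ordering.

```python
# Hypothetical data: event name -> (value, delta).
stats = {'kvm_exit': (120, 4), 'kvm_entry': (118, 9), 'kvm_msr': (7, 0)}


def sortkey_closure(name):
    # Reaches into 'stats' from the enclosing scope and negates by hand;
    # when delta is 0, both branches return the same key anyway.
    if stats[name][1]:
        return (-stats[name][1], -stats[name][0])
    return (0, -stats[name][0])


def sortkey(item):
    # Uses only the (name, (value, delta)) item it is given.
    _, (value, delta) = item
    return (delta, value)


old_order = sorted(stats, key=sortkey_closure)
new_order = [name for name, _ in sorted(stats.items(), key=sortkey, reverse=True)]
print(new_order)  # -> ['kvm_entry', 'kvm_exit', 'kvm_msr']
```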
      Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix SMRAM accessing even if VM is shutdown · 95e057e2
      Wanpeng Li authored
      Reported by syzkaller:
      
         WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
         RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         Call Trace:
          vmx_handle_exit+0xbd/0xe20 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
          kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
          do_vfs_ioctl+0xa4/0x6a0
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x25/0x9c
      
      The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
      a second thread to mmap and operate on the same vCPU.  This triggers a race
      condition when running the testcase with multiple threads. Sometimes one thread
      exits with a triple fault while another thread mmaps and operates on the same
      vCPU.  Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
      results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
      in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
      exit with KVM_EXIT_INTERNAL_ERROR.
      
      Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Don't halt vcpu when L1 is injecting events to L2 · 135a06c3
      Chao Gao authored
      Although L2 is in the halt state, it will be in the active state after
      VM entry if the VM entry is vectoring, according to SDM 26.6.2
      (Activity State). Halting the vcpu here means the event won't be
      injected into L2, and this decision isn't reported to L1. Thus L0
      drops an event that should be injected into L2.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Chao Gao <chao.gao@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmu: Fix overlap between public and private memslots · b28676bb
      Wanpeng Li authored
      Reported by syzkaller:
      
          pte_list_remove: ffff9714eb1f8078 0->BUG
          ------------[ cut here ]------------
          kernel BUG at arch/x86/kvm/mmu.c:1157!
          invalid opcode: 0000 [#1] SMP
          RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
          Call Trace:
           drop_spte+0x83/0xb0 [kvm]
           mmu_page_zap_pte+0xcc/0xe0 [kvm]
           kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
           kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
           kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
           kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
           ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
           __mmu_notifier_release+0x79/0x110
           ? __mmu_notifier_release+0x5/0x110
           exit_mmap+0x15a/0x170
           ? do_exit+0x281/0xcb0
           mmput+0x66/0x160
           do_exit+0x2c9/0xcb0
           ? __context_tracking_exit.part.5+0x4a/0x150
           do_group_exit+0x50/0xd0
           SyS_exit_group+0x14/0x20
           do_syscall_64+0x73/0x1f0
           entry_SYSCALL64_slow_path+0x25/0x25
      
      The reason is that when a new memslot is created, nothing guarantees
      that it does not overlap with private memslots. This can be triggered
      by the following program:
      
         #include <fcntl.h>
         #include <pthread.h>
         #include <setjmp.h>
         #include <signal.h>
         #include <stddef.h>
         #include <stdint.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/ioctl.h>
         #include <sys/stat.h>
         #include <sys/syscall.h>
         #include <sys/types.h>
         #include <unistd.h>
         #include <linux/kvm.h>
      
         long r[16];
      
         int main()
         {
      	void *p = valloc(0x4000);
      
      	r[2] = open("/dev/kvm", 0);
      	r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
      
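      	/* Place KVM's private identity-map page at GPA 0xf000. */
      	uint64_t addr = 0xf000;
      	ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
      	r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
      	/* Place KVM's private TSS pages at GPA 0. */
      	ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
      	ioctl(r[6], KVM_RUN, 0);
      	ioctl(r[6], KVM_RUN, 0);

      	/*
      	 * User slot 0 covers GPA 0xf000-0x12fff and therefore overlaps the
      	 * private identity-map page; before the fix this was not rejected.
      	 */
      	struct kvm_userspace_memory_region mr = {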
      		.slot = 0,
      		.flags = KVM_MEM_LOG_DIRTY_PAGES,
      		.guest_phys_addr = 0xf000,
      		.memory_size = 0x4000,
      		.userspace_addr = (uintptr_t) p
      	};
      	ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
      	return 0;
         }
      
      This patch fixes the bug by refusing to add a new memslot if it
      overlaps with a private memslot.

      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      ---
       virt/kvm/kvm_main.c | 3 +--
       1 file changed, 1 insertion(+), 2 deletions(-)
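The overlap rule that the fix enforces can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the code in virt/kvm/kvm_main.c: `struct memslot`, `ranges_overlap()` and `check_new_slot()` are hypothetical names, slots are modelled as half-open ranges of guest frame numbers, and returning -EEXIST is assumed to mirror what KVM reports for an overlapping region. The point it illustrates is that the loop skips only the slot being replaced (same id), so private slots are checked like any other.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memslot; not the kernel's struct kvm_memory_slot. */
struct memslot {
	int id;
	uint64_t base_gfn;  /* first guest frame number covered */
	uint64_t npages;    /* number of 4 KiB pages covered */
};

/* [a, a+n) and [b, b+m) overlap unless one ends before the other starts. */
static bool ranges_overlap(uint64_t a, uint64_t n, uint64_t b, uint64_t m)
{
	return !(a + n <= b || a >= b + m);
}

/*
 * Check a new slot against every existing slot.  Only a slot with the
 * same id is skipped (it is the one being replaced); private slots are
 * NOT skipped, which is the behavior the fix describes.
 */
static int check_new_slot(const struct memslot *slots, size_t nslots,
			  const struct memslot *new_slot)
{
	/* A zero-sized region means deletion in KVM's API, not creation. */
	assert(new_slot->npages > 0);

	for (size_t i = 0; i < nslots; i++) {
		if (slots[i].id == new_slot->id)
			continue;
		if (ranges_overlap(new_slot->base_gfn, new_slot->npages,
				   slots[i].base_gfn, slots[i].npages))
			return -EEXIST;
	}
	return 0;
}
```

With 4 KiB pages, the reproducer's user slot (GPA 0xf000, size 0x4000) is gfn 0xf with 4 pages, so it collides with a hypothetical one-page private slot at gfn 0xf, while the same slot moved to gfn 0x10 would not.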
    • KVM/x86: remove WARN_ON() for when vm_munmap() fails · 103c763c
      Eric Biggers authored
      On x86, special KVM memslots such as the TSS region have anonymous
      memory mappings created on behalf of userspace, and these mappings are
      removed when the VM is destroyed.
      
      However, removing these mappings via vm_munmap() can fail.  This can
      most easily happen if the thread receives SIGKILL while
      it's waiting to acquire ->mmap_sem.   This triggers the 'WARN_ON(r < 0)'
      in __x86_set_memory_region().  syzkaller was able to hit this, using
      'exit()' to send the SIGKILL.  Note that while the vm_munmap() failure
      results in the mapping not being removed immediately, it is not leaked
      forever but rather will be freed when the process exits.
      
      It's not really possible to handle this failure properly, so almost
      every other caller of vm_munmap() doesn't check the return value.  It's
      a limitation of having the kernel manage these mappings rather than
      userspace.
      
      So just remove the WARN_ON() so that users can't spam the kernel log
      with this warning.
      
      Fixes: f0d648bd ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: preserve SECONDARY_EXEC_DESC without UMIP · 99158246
      Radim Krčmář authored
      L1 might want to use SECONDARY_EXEC_DESC, so we must not clear the VMCS
      bit if UMIP is not being emulated.
      
      We must still set the bit when emulating UMIP, because the feature can
      be passed to L2, where L0 will do the emulation.  Because L2 can also
      change CR4 without a VM exit, we should clear the bit when UMIP is
      disabled.
      
      Fixes: 0367f205 ("KVM: vmx: add support for emulating UMIP")
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
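The rule in the commit message above can be modelled as a small pure function. This is a hedged, simplified sketch rather than KVM's vmx code: `update_desc_exiting()` is a hypothetical helper, the two booleans stand in for KVM's internal state, and `DESC_EXITING` is bit 2 of the secondary processor-based VM-execution controls, which the Intel SDM assigns to descriptor-table exiting. Outside UMIP emulation the bit is left untouched, so a setting L1 relies on is preserved; during emulation the bit simply tracks CR4.UMIP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Descriptor-table exiting: bit 2 of the secondary execution controls. */
#define DESC_EXITING UINT32_C(0x4)

/*
 * Hypothetical model of the CR4-write path described in the commit:
 *  - not emulating UMIP: leave the bit alone, since L1 may want it;
 *  - emulating UMIP: set the bit while CR4.UMIP is on, clear it when off.
 */
static uint32_t update_desc_exiting(uint32_t secondary_ctls,
				    bool emulating_umip, bool cr4_umip)
{
	if (!emulating_umip)
		return secondary_ctls;
	if (cr4_umip)
		return secondary_ctls | DESC_EXITING;
	return secondary_ctls & ~DESC_EXITING;
}
```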
    • KVM: x86: move LAPIC initialization after VMCS creation · 0b2e9904
      Paolo Bonzini authored
      The initial reset of the local APIC is performed before the VMCS has been
      created, but it tries to do a vmwrite:
      
       vmwrite error: reg 810 value 4a00 (err 18944)
       CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G        W I      4.16.0-0.rc2.git0.1.fc28.x86_64 #1
       Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
       Call Trace:
        vmx_set_rvi [kvm_intel]
        vmx_hwapic_irr_update [kvm_intel]
        kvm_lapic_reset [kvm]
        kvm_create_lapic [kvm]
        kvm_arch_vcpu_init [kvm]
        kvm_vcpu_init [kvm]
        vmx_create_vcpu [kvm_intel]
        kvm_vm_ioctl [kvm]
      
      Move it later, after the VMCS has been created.
      
      Fixes: 4191db26 ("KVM: x86: Update APICv on APIC reset")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
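The ordering problem fixed above can be illustrated with a toy model. This is an assumption-laden illustration, not KVM code: `struct vcpu`, `struct vmcs`, `lapic_reset()` and `create_vcpu()` are hypothetical, and the `guest_intr_status` field only loosely stands in for the VMCS field (0x810, whose low byte is RVI) that vmx_set_rvi() writes in the trace. The reset performs a VMCS write and fails when no VMCS exists yet; creating the VMCS first and resetting the APIC afterwards is the order the fix establishes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not KVM's real structures. */
struct vmcs {
	uint16_t guest_intr_status;  /* low byte models RVI */
};

struct vcpu {
	struct vmcs *vmcs;           /* NULL until the VMCS is created */
	uint8_t highest_irr;
};

/* Reset the APIC model; like the reset in the trace, it writes the VMCS. */
static int lapic_reset(struct vcpu *v)
{
	v->highest_irr = 0;
	if (v->vmcs == NULL)
		return -1;  /* corresponds to the "vmwrite error" in the log */
	v->vmcs->guest_intr_status =
		(uint16_t)((v->vmcs->guest_intr_status & 0xff00u) | v->highest_irr);
	return 0;
}

/* Fixed order: allocate the VMCS first, then reset the local APIC. */
static int create_vcpu(struct vcpu *v)
{
	v->vmcs = calloc(1, sizeof(*v->vmcs));
	if (v->vmcs == NULL)
		return -1;
	return lapic_reset(v);
}
```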
  4. 23 Feb, 2018 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9cb9c07d
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh.
      
       2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang.
      
       3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song.
      
       4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang.
      
       5) Fix potential deadlocks in netfilter getsockopt() code paths, from
          Paolo Abeni.
      
       6) Netfilter stackpointer size checks really are needed to validate
          user input, from Florian Westphal.
      
       7) Missing timer init in x_tables, from Paolo Abeni.
      
       8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg.
      
       9) When an ibmvnic device is brought down and then back up again, it
          can be sent queue entries from a previous session; handle this
          properly instead of crashing. From Thomas Falcon.
      
      10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman.
      
      11) When we are dumping filters in cls_api, if the output SKB is empty
          and the filter being dumped is too large for the space in the SKB,
          we should return -EMSGSIZE like other netlink dump operations do.
          Otherwise userland has no signal that it needs to increase the
          size of its read buffer. From Roman Kapl.
      
      12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer.
      
      13) Module refcount leak in netlink when a dump start fails, from Jason
          Donenfeld.
      
      14) Handle sub-optimal GSO sizes better in TCP BBR congestion control,
          from Eric Dumazet.
      
      15) Releasing bpf per-cpu arraymaps can take a long time, so add a
          conditional scheduling point. From Eric Dumazet.
      
      16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From
          Daniel Borkmann.
      
      17) Fix page leak in gianfar driver, from Andy Spencer.
      
      18) Missed clearing of estimator scratch buffer, from Eric Dumazet.
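
Item 11 describes a general rule for netlink-style dumps: if a dump step produces nothing, the reader gets no hint that its buffer was too small. The sketch below models that rule in plain C and is an illustration under assumptions, not cls_api code: `dump_step()` is a hypothetical helper, items are represented only by their sizes, and `*cursor` plays the role of the state a real dump keeps between calls. A partial batch is returned normally and resumed on the next call, but if not even the first item fits, the helper reports -EMSGSIZE so the caller knows to retry with a larger buffer.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical dump step: copy as many items as fit into a buffer of
 * `cap` bytes, starting at *cursor.  Returns the number of bytes used
 * (0 means the dump is finished), or -EMSGSIZE if the buffer is still
 * empty and the next item alone does not fit.
 */
static long dump_step(const size_t *item_sizes, size_t nitems,
		      size_t *cursor, size_t cap)
{
	size_t used = 0;

	while (*cursor < nitems) {
		size_t sz = item_sizes[*cursor];

		if (used + sz > cap) {
			if (used == 0)
				return -EMSGSIZE;  /* reader must use a bigger buffer */
			break;                     /* partial batch; resume later */
		}
		used += sz;
		(*cursor)++;
	}
	return (long)used;
}
```

Returning 0 for an empty buffer would be indistinguishable from the end of the dump, which is exactly the missing signal the fix addresses.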
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
        net_sched: gen_estimator: fix broken estimators based on percpu stats
        gianfar: simplify FCS handling and fix memory leak
        ipv6 sit: work around bogus gcc-8 -Wrestrict warning
        macvlan: fix use-after-free in macvlan_common_newlink()
        bpf, arm64: fix out of bounds access in tail call
        bpf, x64: implement retpoline for tail call
        rxrpc: Fix send in rxrpc_send_data_packet()
        net: aquantia: Fix error handling in aq_pci_probe()
        bpf: fix rcu lockdep warning for lpm_trie map_free callback
        bpf: add schedule points in percpu arrays management
        regulatory: add NUL to request alpha2
        ibmvnic: Fix early release of login buffer
        net/smc9194: Remove bogus CONFIG_MAC reference
        net: ipv4: Set addr_type in hash_keys for forwarded case
        tcp_bbr: better deal with suboptimal GSO
        smsc75xx: fix smsc75xx_set_features()
        netlink: put module reference if dump start fails
        selftests/bpf/test_maps: exit child process without error in ENOMEM case
        selftests/bpf: update gitignore with test_libbpf_open
        selftests/bpf: tcpbpf_kern: use in6_* macros from glibc
        ..
    • Merge branch 'fixes-v4.16-rc3' of... · 2eb02aa9
      Linus Torvalds authored
      Merge branch 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
      
      Pull security subsystem fixes from James Morris:
      
       - keys fixes via David Howells:
            "A collection of fixes for Linux keyrings, mostly thanks to Eric
             Biggers:
      
              - Fix some PKCS#7 verification issues.
      
              - Fix handling of unsupported crypto in X.509.
      
              - Fix too-large allocation in big_key"
      
       - Seccomp updates via Kees Cook:
            "These are fixes for the get_metadata interface that landed during
             -rc1. While the new selftest is strictly not a bug fix, I think
             it's in the same spirit of avoiding bugs"
      
       - an IMA build fix from Randy Dunlap
      
      * 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        integrity/security: fix digsig.c build error with header file
        KEYS: Use individual pages in big_key for crypto buffers
        X.509: fix NULL dereference when restricting key with unsupported_sig
        X.509: fix BUG_ON() when hash algorithm is unsupported
        PKCS#7: fix direct verification of SignerInfo signature
        PKCS#7: fix certificate blacklisting
        PKCS#7: fix certificate chain verification
        seccomp: add a selftest for get_metadata
        ptrace, seccomp: tweak get_metadata behavior slightly
        seccomp, ptrace: switch get_metadata types to arch independent