1. 05 Mar, 2024 3 commits
    • KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery · 66e3cf72
      David Woodhouse authored
      The kvm_xen_inject_vcpu_vector() function has a comment saying "the fast
      version will always work for physical unicast", justifying its use of
      kvm_irq_delivery_to_apic_fast() and the WARN_ON_ONCE() when that fails.
      
      In fact that assumption isn't true if X2APIC isn't in use by the guest
      and there is (8-bit x)APIC ID aliasing. A single "unicast" destination
      APIC ID *may* then be delivered to multiple vCPUs. Remove the warning
      and simply call kvm_irq_delivery_to_apic() instead.
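
The aliasing case can be pictured with a small model. This is only a sketch under stated assumptions, not KVM's code: it assumes aliasing arises because a wider vCPU ID is truncated to its low 8 bits in xAPIC mode, and the names `toy_xapic_id` and `toy_count_xapic_matches` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: in xAPIC mode only the low 8 bits of the vCPU ID are visible. */
uint8_t toy_xapic_id(uint32_t vcpu_id)
{
	return (uint8_t)(vcpu_id & 0xff);
}

/* Count how many vCPUs a physical "unicast" destination ID matches. */
size_t toy_count_xapic_matches(const uint32_t *vcpu_ids, size_t n, uint8_t dest)
{
	size_t matches = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_xapic_id(vcpu_ids[i]) == dest)
			matches++;
	}
	return matches;
}
```

With vCPU IDs 1 and 257 both present, destination 1 matches two vCPUs in this model, which is the situation where a "fast unicast only" path cannot be assumed to succeed.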

      Reported-by: Michal Luczaj <mhal@rbox.co>
      Fixes: fde0451b ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Link: https://lore.kernel.org/r/20240227115648.3104-4-dwmw2@infradead.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled · 8e62bf2b
      David Woodhouse authored
      Linux guests since commit b1c3497e ("x86/xen: Add support for
      HVMOP_set_evtchn_upcall_vector"), i.e. from v6.0 onwards, use the
      per-vCPU upcall vector when it's advertised in the Xen CPUID leaves.
      
      This upcall is injected through the guest's local APIC as an MSI, unlike
      the older system vector which was merely injected by the hypervisor any
      time the CPU was able to receive an interrupt and the upcall_pending
      flag was set in its vcpu_info.
      
      Effectively, that makes the per-CPU upcall edge triggered instead of
      level triggered, which results in the upcall being lost if the MSI is
      delivered when the local APIC is *disabled*.
      
      Xen checks the vcpu_info->evtchn_upcall_pending flag when the local APIC
      for a vCPU is software enabled (in fact, on any write to the SPIV
      register which doesn't disable the APIC). Do the same in KVM since KVM
      doesn't provide a way for userspace to intervene and trap accesses to
      the SPIV register of a local APIC emulated by KVM.
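
The lost-upcall problem and the fix can be sketched with a toy state machine. This is an illustrative model only, not the KVM implementation; `struct toy_vcpu`, `toy_raise_upcall()` and `toy_write_spiv()` are hypothetical names, and the `upcall_pending` field merely stands in for `vcpu_info->evtchn_upcall_pending`.

```c
#include <stdbool.h>

struct toy_vcpu {
	bool apic_enabled;     /* local APIC software-enable state */
	bool upcall_pending;   /* stands in for vcpu_info->evtchn_upcall_pending */
	unsigned int injected; /* number of upcall vectors actually delivered */
};

/* Edge-style delivery: the MSI is simply dropped while the APIC is disabled. */
void toy_raise_upcall(struct toy_vcpu *v)
{
	v->upcall_pending = true;
	if (v->apic_enabled)
		v->injected++;
}

/* On a SPIV write that leaves the APIC enabled, re-check the pending flag. */
void toy_write_spiv(struct toy_vcpu *v, bool enable)
{
	v->apic_enabled = enable;
	if (enable && v->upcall_pending)
		v->injected++;
}
```

In this model an upcall raised while the APIC is disabled is not delivered at that moment, but it is recovered as soon as the APIC is enabled, because the pending flag is consulted again on the SPIV write.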
      
      Fixes: fde0451b ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240227115648.3104-3-dwmw2@infradead.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: x86/xen: improve accuracy of Xen timers · 451a7078
      David Woodhouse authored
      A test program such as http://david.woodhou.se/timerlat.c confirms user
      reports that timers are increasingly inaccurate as the lifetime of a
      guest increases. Reporting the actual delay observed when asking for
      100µs of sleep, it starts off OK on a newly-launched guest but gets
      worse over time, giving incorrect sleep times:
      
      root@ip-10-0-193-21:~# ./timerlat -c -n 5
      00000000 latency 103243/100000 (3.2430%)
      00000001 latency 103243/100000 (3.2430%)
      00000002 latency 103242/100000 (3.2420%)
      00000003 latency 103245/100000 (3.2450%)
      00000004 latency 103245/100000 (3.2450%)
      
      The biggest problem is that get_kvmclock_ns() returns inaccurate values
      when the guest TSC is scaled. The guest sees a TSC value scaled from the
      host TSC by a mul/shift conversion (hopefully done in hardware). The
      guest then converts that guest TSC value into nanoseconds using the
      mul/shift conversion given to it by the KVM pvclock information.
      
      But get_kvmclock_ns() performs only a single conversion directly from
      host TSC to nanoseconds, giving a different result. A test program at
      http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
      over a day.
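
The difference between the two conversion paths can be reproduced with plain fixed-point arithmetic. The sketch below is not taken from the kernel or from tsdrift.c; it assumes a hypothetical 4.5 GHz host and 3 GHz guest, a scaling ratio with 32 fractional bits, and pvclock-style multipliers with a shift of zero. All `toy_` and `TOY_` names are invented for illustration.

```c
#include <stdint.h>

/* floor(a * mul / 2^32), computed without a 128-bit type. */
uint64_t toy_mul_shr32(uint64_t a, uint32_t mul)
{
	uint64_t lo = (a & 0xffffffffULL) * mul;
	uint64_t hi = (a >> 32) * mul;

	return hi + (lo >> 32);
}

/* Truncated 0.32 fixed-point factors for the assumed frequencies. */
#define TOY_TSC_RATIO ((uint32_t)((2ULL << 32) / 3)) /* guest/host = 2/3       */
#define TOY_GUEST_MUL ((uint32_t)((1ULL << 32) / 3)) /* ns per guest tick, 1/3 */
#define TOY_HOST_MUL  ((uint32_t)((2ULL << 32) / 9)) /* ns per host tick, 2/9  */

/* What the guest computes: host TSC -> guest TSC -> nanoseconds. */
uint64_t toy_guest_view_ns(uint64_t host_tsc)
{
	uint64_t guest_tsc = toy_mul_shr32(host_tsc, TOY_TSC_RATIO);

	return toy_mul_shr32(guest_tsc, TOY_GUEST_MUL);
}

/* A single direct conversion: host TSC -> nanoseconds. */
uint64_t toy_host_direct_ns(uint64_t host_tsc)
{
	return toy_mul_shr32(host_tsc, TOY_HOST_MUL);
}
```

Each factor is rounded down independently, so the product of the two guest-side factors is not the same number as the single host-side factor. With these illustrative constants, feeding in one day of host ticks (4.5e9 * 86400) makes the two functions disagree by roughly 40µs, and the gap keeps growing with uptime.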
      
      It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
      that. The actual guest hv_clock is per-CPU, and *theoretically* each
      vCPU could be running at a *different* frequency. But this patch is
      needed anyway because...
      
      The other issue with Xen timers was that the code would snapshot the
      host CLOCK_MONOTONIC at some point in time, and then... after a few
      interrupts may have occurred, some preemption perhaps... would also read
      the guest's kvmclock. Then it would proceed under the false assumption
      that those two happened at the *same* time. Any time which *actually*
      elapsed between reading the two clocks showed up as an error in the
      time at which the timer fired.
      
      Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
      host TSC just *once*, then use the returned TSC value to calculate the
      kvmclock (making sure to do that the way the guest would instead of
      making the same mistake get_kvmclock_ns() does).
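
The idea of deriving both clocks from one TSC sample can be sketched as follows. This is a simplified model rather than the code in the patch: it assumes each clock is a linear function of the TSC with a single 0.32 fixed-point multiplier, and `struct toy_clock`, `toy_clock_at()` and `toy_timer_expiry()` are hypothetical names.

```c
#include <stdint.h>

/* A clock modelled as base_ns + (tsc - base_tsc) * mul / 2^32. */
struct toy_clock {
	uint64_t base_tsc;
	uint64_t base_ns;
	uint32_t mul;
};

/* floor(ticks * mul / 2^32), computed without a 128-bit type. */
uint64_t toy_scale(uint64_t ticks, uint32_t mul)
{
	uint64_t lo = (ticks & 0xffffffffULL) * mul;
	uint64_t hi = (ticks >> 32) * mul;

	return hi + (lo >> 32);
}

uint64_t toy_clock_at(const struct toy_clock *c, uint64_t tsc)
{
	return c->base_ns + toy_scale(tsc - c->base_tsc, c->mul);
}

/*
 * Translate a guest-clock expiry into a host-clock expiry. Both "now"
 * values come from the same TSC sample, so no time can elapse between them.
 */
uint64_t toy_timer_expiry(const struct toy_clock *host,
			  const struct toy_clock *guest,
			  uint64_t tsc_now, uint64_t guest_expiry_ns)
{
	uint64_t host_now = toy_clock_at(host, tsc_now);
	uint64_t guest_now = toy_clock_at(guest, tsc_now);

	if (guest_expiry_ns <= guest_now)
		return host_now; /* already expired: fire immediately */

	return host_now + (guest_expiry_ns - guest_now);
}
```

Only the delta between the guest's "now" and the requested expiry is carried over to the host clock, which is why the absolute values of the two clocks do not need to agree.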
      
      Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
      timers still have to use CLOCK_MONOTONIC. In practice the difference
      between the two won't matter over the timescales involved, as the
      *absolute* values don't matter; just the delta.
      
      This does mean a new variant of kvm_get_time_and_clockread() is needed,
      called kvm_get_monotonic_and_clockread() because that's what it does.
      
      Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Link: https://lore.kernel.org/r/20240227115648.3104-2-dwmw2@infradead.org
      [sean: massage moved comment, tweak if statement formatting]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
  2. 22 Feb, 2024 7 commits
  3. 20 Feb, 2024 11 commits
  4. 08 Feb, 2024 13 commits
  5. 07 Feb, 2024 4 commits
    • Merge tag 'loongarch-fixes-6.8-2' of... · 547ab8fc
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix acpi_core_pic[] array overflow, fix earlycon parameter if KASAN
        enabled, disable UBSAN instrumentation for vDSO build, and two Kconfig
        cleanups"
      
      * tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: vDSO: Disable UBSAN instrumentation
        LoongArch: Fix earlycon parameter if KASAN enabled
        LoongArch: Change acpi_core_pic[NR_CPUS] to acpi_core_pic[MAX_CORE_PIC]
        LoongArch: Select HAVE_ARCH_SECCOMP to use the common SECCOMP menu
        LoongArch: Select ARCH_ENABLE_THP_MIGRATION instead of redefining it
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 5c24ba20
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "x86 guest:
      
         - Avoid false positive for check that only matters on AMD processors
      
        x86:
      
         - Give a hint when Win2016 might fail to boot due to XSAVES &&
           !XSAVEC configuration
      
         - Do not allow creating an in-kernel PIT unless an IOAPIC already
           exists
      
        RISC-V:
      
         - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc,
           scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)
      
        S390:
      
         - fix CC for successful PQAP instruction
      
         - fix a race when creating a shadow page"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM
        x86/kvm: Fix SEV check in sev_map_percpu_data()
        KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum
        KVM: x86: Check irqchip mode before create PIT
        KVM: riscv: selftests: Add Zfa extension to get-reg-list test
        RISC-V: KVM: Allow Zfa extension for Guest/VM
        KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
        RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
        KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
        RISC-V: KVM: Allow Zihintntl extension for Guest/VM
        KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
        RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
        KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test
        RISC-V: KVM: Allow vector crypto extensions for Guest/VM
        KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test
        RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
        KVM: riscv: selftests: Add Zbc extension to get-reg-list test
        RISC-V: KVM: Allow Zbc extension for Guest/VM
        KVM: s390: fix cc for successful PQAP
        KVM: s390: vsie: fix race during shadow creation
    • Merge tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · c8d80f83
      Linus Torvalds authored
      Pull nfsd fix from Chuck Lever:
      
       - Address a deadlock regression in RELEASE_LOCKOWNER
      
      * tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: don't take fi_lock in nfsd_break_deleg_cb()
    • Merge tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6d280f4d
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - two fixes preventing deletion and manual creation of subvolume qgroup
      
       - unify error code returned for unknown send flags
      
       - fix assertion during subvolume creation when anonymous device could
         be allocated by other thread (e.g. due to backref walk)
      
      * tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: do not ASSERT() if the newly created subvolume already got read
        btrfs: forbid deleting live subvol qgroup
        btrfs: forbid creating subvol qgroups
        btrfs: send: return EOPNOTSUPP on unknown flags
  6. 06 Feb, 2024 2 commits
    • x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM · e4596477
      Nathan Chancellor authored
      After commit a9ef2774 ("x86/kvm: Fix SEV check in
      sev_map_percpu_data()"), there is a build error when building
      x86_64_defconfig with GCOV using LLVM:
      
        ld.lld: error: undefined symbol: cc_vendor
        >>> referenced by kvm.c
        >>>               arch/x86/kernel/kvm.o:(kvm_smp_prepare_boot_cpu) in archive vmlinux.a
      
      which corresponds to
      
        if (cc_vendor != CC_VENDOR_AMD ||
            !cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
                  return;
      
      Without GCOV, clang is able to eliminate the use of cc_vendor because
      cc_platform_has() evaluates to false when CONFIG_ARCH_HAS_CC_PLATFORM is
      not set, meaning that the if condition is true no matter what value
      cc_vendor has.
      
      With GCOV, the instrumentation keeps the use of cc_vendor around for
      code coverage purposes but cc_vendor is only declared, not defined,
      without CONFIG_ARCH_HAS_CC_PLATFORM, leading to the build error above.
      
      Provide a macro definition of cc_vendor when CONFIG_ARCH_HAS_CC_PLATFORM
      is not set with a value of CC_VENDOR_NONE, so that the first condition
      can always be evaluated/eliminated at compile time, avoiding the build
      error altogether. This is very similar to the situation prior to
      commit da86eb96 ("x86/coco: Get rid of accessor functions").
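
The macro approach described above can be sketched in isolation. This is a stand-alone illustration rather than the kernel header: `TOY_HAS_CC_PLATFORM` stands in for CONFIG_ARCH_HAS_CC_PLATFORM, and every `toy_`/`TOY_` identifier is hypothetical.

```c
#include <stdbool.h>

enum toy_cc_vendor {
	TOY_CC_VENDOR_NONE,
	TOY_CC_VENDOR_AMD,
	TOY_CC_VENDOR_INTEL,
};

#ifdef TOY_HAS_CC_PLATFORM
/* Real variable and helper, defined in some other translation unit. */
extern enum toy_cc_vendor toy_cc_vendor;
bool toy_cc_platform_has_mem_encrypt(void);
#else
/* No backing variable exists: make the name a compile-time constant. */
#define toy_cc_vendor (TOY_CC_VENDOR_NONE)
static inline bool toy_cc_platform_has_mem_encrypt(void)
{
	return false;
}
#endif

/* Mirrors the shape of the check quoted in the commit message. */
bool toy_needs_percpu_decrypt(void)
{
	if (toy_cc_vendor != TOY_CC_VENDOR_AMD ||
	    !toy_cc_platform_has_mem_encrypt())
		return false;

	return true;
}
```

Because `toy_cc_vendor` expands to a constant when the option is off, the first comparison no longer references any symbol, so there is nothing left for the linker to resolve even if instrumentation keeps the expression alive.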

      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
      Message-Id: <20240202-provide-cc_vendor-without-arch_has_cc_platform-v1-1-09ad5f2a3099@kernel.org>
      Fixes: a9ef2774 ("x86/kvm: Fix SEV check in sev_map_percpu_data()", 2024-01-31)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'bcachefs-2024-02-05' of https://evilpiepirate.org/git/bcachefs · 99bd3cb0
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "Two serious ones here that we'll want to backport to stable: a fix for
        a race in the thread_with_file code, and another locking fixup in the
        subvolume deletion path"
      
      * tag 'bcachefs-2024-02-05' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: time_stats: Check for last_event == 0 when updating freq stats
        bcachefs: install fd later to avoid race with close
        bcachefs: unlock parent dir if entry is not found in subvolume deletion
        bcachefs: Fix build on parisc by avoiding __multi3()