1. 15 Nov, 2020 24 commits
    • Peter Xu's avatar
      KVM: selftests: Add dirty ring buffer test · 84292e56
      Peter Xu authored
      Add the initial dirty ring buffer test.
      
      The current test implements the userspace dirty ring collection, by
      only reaping the dirty ring when the ring is full.
      
      So it's still running synchronously like this:
      
                  vcpu                             main thread
      
        1. vcpu dirties pages
        2. vcpu gets dirty ring full
           (userspace exit)
      
                                             3. main thread waits until full
                                                (so hardware buffers flushed)
                                             4. main thread collects
                                             5. main thread continues vcpu
      
        6. vcpu continues, goes back to 1
      
      We can't directly collects dirty bits during vcpu execution because
      otherwise we can't guarantee the hardware dirty bits were flushed when
      we collect and we're very strict on the dirty bits so otherwise we can
      fail the future verify procedure.  A follow up patch will make this
      test to support async just like the existing dirty log test, by adding
      a vcpu kick mechanism.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012237.6111-1-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      84292e56
    • Peter Xu's avatar
      KVM: selftests: Introduce after_vcpu_run hook for dirty log test · 60f644fb
      Peter Xu authored
      Provide a hook for the checks after vcpu_run() completes.  Preparation
      for the dirty ring test because we'll need to take care of another
      exit reason.
      Reviewed-by: default avatarAndrew Jones <drjones@redhat.com>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012235.6063-1-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      60f644fb
    • Peter Xu's avatar
      KVM: Don't allocate dirty bitmap if dirty ring is enabled · 044c59c4
      Peter Xu authored
      Because kvm dirty rings and kvm dirty log is used in an exclusive way,
      Let's avoid creating the dirty_bitmap when kvm dirty ring is enabled.
      At the meantime, since the dirty_bitmap will be conditionally created
      now, we can't use it as a sign of "whether this memory slot enabled
      dirty tracking".  Change users like that to check against the kvm
      memory slot flags.
      
      Note that there still can be chances where the kvm memory slot got its
      dirty_bitmap allocated, _if_ the memory slots are created before
      enabling of the dirty rings and at the same time with the dirty
      tracking capability enabled, they'll still with the dirty_bitmap.
      However it should not hurt much (e.g., the bitmaps will always be
      freed if they are there), and the real users normally won't trigger
      this because dirty bit tracking flag should in most cases only be
      applied to kvm slots only before migration starts, that should be far
      latter than kvm initializes (VM starts).
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012226.5868-1-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      044c59c4
    • Peter Xu's avatar
      KVM: Make dirty ring exclusive to dirty bitmap log · b2cc64c4
      Peter Xu authored
      There's no good reason to use both the dirty bitmap logging and the
      new dirty ring buffer to track dirty bits.  We should be able to even
      support both of them at the same time, but it could complicate things
      which could actually help little.  Let's simply make it the rule
      before we enable dirty ring on any arch, that we don't allow these two
      interfaces to be used together.
      
      The big world switch would be KVM_CAP_DIRTY_LOG_RING capability
      enablement.  That's where we'll switch from the default dirty logging
      way to the dirty ring way.  As long as kvm->dirty_ring_size is setup
      correctly, we'll once and for all switch to the dirty ring buffer mode
      for the current virtual machine.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012224.5818-1-peterx@redhat.com>
      [Change errno from EINVAL to ENXIO. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b2cc64c4
    • Peter Xu's avatar
      KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Peter Xu authored
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are be dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue will be there for live migration when the guest memory
      is huge while the page dirty procedure is trivial.  In that case for
      each dirty sync we need to pull the whole dirty bitmap to userspace
      and analyse every bit even if it's mostly zeros.
      
      The preferred data structure for above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables dirty ring for X86 only.  However it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/Signed-off-by: default avatarLei Cao <lei.cao@stratus.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fb04a1ed
    • Peter Xu's avatar
      KVM: Pass in kvm pointer into mark_page_dirty_in_slot() · 28bd726a
      Peter Xu authored
      The context will be needed to implement the kvm dirty ring.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012044.5151-5-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      28bd726a
    • Paolo Bonzini's avatar
      KVM: remove kvm_clear_guest_page · 2f541442
      Paolo Bonzini authored
      kvm_clear_guest_page is not used anymore after "KVM: X86: Don't track dirty
      for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]", except from kvm_clear_guest.
      We can just inline it in its sole user.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2f541442
    • Peter Xu's avatar
      KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR] · ff5a983c
      Peter Xu authored
      Originally, we have three code paths that can dirty a page without
      vcpu context for X86:
      
        - init_rmode_identity_map
        - init_rmode_tss
        - kvmgt_rw_gpa
      
      init_rmode_identity_map and init_rmode_tss will be setup on
      destination VM no matter what (and the guest cannot even see them), so
      it does not make sense to track them at all.
      
      To do this, allow __x86_set_memory_region() to return the userspace
      address that just allocated to the caller.  Then in both of the
      functions we directly write to the userspace address instead of
      calling kvm_write_*() APIs.
      
      Another trivial change is that we don't need to explicitly clear the
      identity page table root in init_rmode_identity_map() because no
      matter what we'll write to the whole page with 4M huge page entries.
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012044.5151-4-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ff5a983c
    • Vitaly Kuznetsov's avatar
      KVM: selftests: test KVM_GET_SUPPORTED_HV_CPUID as a system ioctl · 8b460692
      Vitaly Kuznetsov authored
      KVM_GET_SUPPORTED_HV_CPUID is now supported as both vCPU and VM ioctl,
      test that.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200929150944.1235688-3-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8b460692
    • Vitaly Kuznetsov's avatar
      KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl · c21d54f0
      Vitaly Kuznetsov authored
      KVM_GET_SUPPORTED_HV_CPUID is a vCPU ioctl but its output is now
      independent from vCPU and in some cases VMMs may want to use it as a system
      ioctl instead. In particular, QEMU doesn CPU feature expansion before any
      vCPU gets created so KVM_GET_SUPPORTED_HV_CPUID can't be used.
      
      Convert KVM_GET_SUPPORTED_HV_CPUID to 'dual' system/vCPU ioctl with the
      same meaning.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200929150944.1235688-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c21d54f0
    • David Woodhouse's avatar
      kvm/eventfd: Drain events from eventfd in irqfd_wakeup() · b59e00dd
      David Woodhouse authored
      Don't allow the events to accumulate in the eventfd counter, drain them
      as they are handled.
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20201027135523.646811-4-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b59e00dd
    • David Woodhouse's avatar
      vfio/virqfd: Drain events from eventfd in virqfd_wakeup() · b1b397ae
      David Woodhouse authored
      Don't allow the events to accumulate in the eventfd counter, drain them
      as they are handled.
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20201027135523.646811-3-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      b1b397ae
    • David Woodhouse's avatar
      eventfd: Export eventfd_ctx_do_read() · 28f13267
      David Woodhouse authored
      Where events are consumed in the kernel, for example by KVM's
      irqfd_wakeup() and VFIO's virqfd_wakeup(), they currently lack a
      mechanism to drain the eventfd's counter.
      
      Since the wait queue is already locked while the wakeup functions are
      invoked, all they really need to do is call eventfd_ctx_do_read().
      
      Add a check for the lock, and export it for them.
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20201027135523.646811-2-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      28f13267
    • David Woodhouse's avatar
      kvm/eventfd: Use priority waitqueue to catch events before userspace · e8dbf195
      David Woodhouse authored
      As far as I can tell, when we use posted interrupts we silently cut off
      the events from userspace, if it's listening on the same eventfd that
      feeds the irqfd.
      
      I like that behaviour. Let's do it all the time, even without posted
      interrupts. It makes it much easier to handle IRQ remapping invalidation
      without having to constantly add/remove the fd from the userspace poll
      set. We can just leave userspace polling on it, and the bypass will...
      well... bypass it.
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20201026175325.585623-2-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e8dbf195
    • David Woodhouse's avatar
      sched/wait: Add add_wait_queue_priority() · c4d51a52
      David Woodhouse authored
      This allows an exclusive wait_queue_entry to be added at the head of the
      queue, instead of the tail as normal. Thus, it gets to consume events
      first without allowing non-exclusive waiters to be woken at all.
      
      The (first) intended use is for KVM IRQFD, which currently has
      inconsistent behaviour depending on whether posted interrupts are
      available or not. If they are, KVM will bypass the eventfd completely
      and deliver interrupts directly to the appropriate vCPU. If not, events
      are delivered through the eventfd and userspace will receive them when
      polling on the eventfd.
      
      By using add_wait_queue_priority(), KVM will be able to consistently
      consume events within the kernel without accidentally exposing them
      to userspace when they're supposed to be bypassed. This, in turn, means
      that userspace doesn't have to jump through hoops to avoid listening
      on the erroneously noisy eventfd and injecting duplicate interrupts.
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20201027143944.648769-2-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c4d51a52
    • Yadong Qi's avatar
      KVM: x86: emulate wait-for-SIPI and SIPI-VMExit · bf0cd88c
      Yadong Qi authored
      Background: We have a lightweight HV, it needs INIT-VMExit and
      SIPI-VMExit to wake-up APs for guests since it do not monitor
      the Local APIC. But currently virtual wait-for-SIPI(WFS) state
      is not supported in nVMX, so when running on top of KVM, the L1
      HV cannot receive the INIT-VMExit and SIPI-VMExit which cause
      the L2 guest cannot wake up the APs.
      
      According to Intel SDM Chapter 25.2 Other Causes of VM Exits,
      SIPIs cause VM exits when a logical processor is in
      wait-for-SIPI state.
      
      In this patch:
          1. introduce SIPI exit reason,
          2. introduce wait-for-SIPI state for nVMX,
          3. advertise wait-for-SIPI support to guest.
      
      When L1 hypervisor is not monitoring Local APIC, L0 need to emulate
      INIT-VMExit and SIPI-VMExit to L1 to emulate INIT-SIPI-SIPI for
      L2. L2 LAPIC write would be traped by L0 Hypervisor(KVM), L0 should
      emulate the INIT/SIPI vmexit to L1 hypervisor to set proper state
      for L2's vcpu state.
      
      Handle procdure:
      Source vCPU:
          L2 write LAPIC.ICR(INIT).
          L0 trap LAPIC.ICR write(INIT): inject a latched INIT event to target
             vCPU.
      Target vCPU:
          L0 emulate an INIT VMExit to L1 if is guest mode.
          L1 set guest VMCS, guest_activity_state=WAIT_SIPI, vmresume.
          L0 set vcpu.mp_state to INIT_RECEIVED if (vmcs12.guest_activity_state
             == WAIT_SIPI).
      
      Source vCPU:
          L2 write LAPIC.ICR(SIPI).
          L0 trap LAPIC.ICR write(INIT): inject a latched SIPI event to traget
             vCPU.
      Target vCPU:
          L0 emulate an SIPI VMExit to L1 if (vcpu.mp_state == INIT_RECEIVED).
          L1 set CS:IP, guest_activity_state=ACTIVE, vmresume.
          L0 resume to L2.
          L2 start-up.
      Signed-off-by: default avatarYadong Qi <yadong.qi@intel.com>
      Message-Id: <20200922052343.84388-1-yadong.qi@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20201106065122.403183-1-yadong.qi@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bf0cd88c
    • Paolo Bonzini's avatar
      KVM: x86: fix apic_accept_events vs check_nested_events · 1c96dcce
      Paolo Bonzini authored
      vmx_apic_init_signal_blocked is buggy in that it returns true
      even in VMX non-root mode.  In non-root mode, however, INITs
      are not latched, they just cause a vmexit.  Previously,
      KVM was waiting for them to be processed when kvm_apic_accept_events
      and in the meanwhile it ate the SIPIs that the processor received.
      
      However, in order to implement the wait-for-SIPI activity state,
      KVM will have to process KVM_APIC_SIPI in vmx_check_nested_events,
      and it will not be possible anymore to disregard SIPIs in non-root
      mode as the code is currently doing.
      
      By calling kvm_x86_ops.nested_ops->check_events, we can force a vmexit
      (with the side-effect of latching INITs) before incorrectly injecting
      an INIT or SIPI in a guest, and therefore vmx_apic_init_signal_blocked
      can do the right thing.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1c96dcce
    • Sean Christopherson's avatar
      KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2 · 7a873e45
      Sean Christopherson authored
      Extend the KVM_SET_SREGS test to verify that all supported CR4 bits, as
      enumerated by KVM, can be set before KVM_SET_CPUID2, i.e. without first
      defining the vCPU model.  KVM is supposed to skip guest CPUID checks
      when host userspace is stuffing guest state.
      
      Check the inverse as well, i.e. that KVM rejects KVM_SET_REGS if CR4
      has one or more unsupported bits set.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-7-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7a873e45
    • Sean Christopherson's avatar
      KVM: x86: Return bool instead of int for CR4 and SREGS validity checks · ee69c92b
      Sean Christopherson authored
      Rework the common CR4 and SREGS checks to return a bool instead of an
      int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
      clarify the polarity of the return value (which is effectively inverted
      by this change).
      
      No functional changed intended.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ee69c92b
    • Sean Christopherson's avatar
      KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · c2fe3cd4
      Sean Christopherson authored
      Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
      and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
      KVM_SET_SREGS would return success while failing to actually set CR4.
      
      Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
      in __set_sregs() is not a viable option as KVM has already stuffed a
      variety of vCPU state.
      
      Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
      inverted semantics.  This will be remedied in a future patch.
      
      Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c2fe3cd4
    • Sean Christopherson's avatar
      KVM: SVM: Drop VMXE check from svm_set_cr4() · 311a0659
      Sean Christopherson authored
      Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles
      the check by incorporating VMXE into the CR4 reserved bits, via
      kvm_cpu_caps.  SVM obviously does not set X86_FEATURE_VMX.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      311a0659
    • Sean Christopherson's avatar
      KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4() · a447e38a
      Sean Christopherson authored
      Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
      that common x86 handles the check by incorporating VMXE into the CR4
      reserved bits, via kvm_cpu_caps.  X86_FEATURE_VMX is set in kvm_cpu_caps
      (by vmx_set_cpu_caps()), if and only if 'nested' is true.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a447e38a
    • Sean Christopherson's avatar
      KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4() · d3a9e414
      Sean Christopherson authored
      Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
      that common x86 handles the check by incorporating VMXE into the CR4
      reserved bits, i.e. in cr4_guest_rsvd_bits.  This fixes a bug where KVM
      incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
      before KVM_SET_CPUID{,2}.
      
      Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Reported-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d3a9e414
    • Paolo Bonzini's avatar
      kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Paolo Bonzini authored
      In some cases where shadow paging is in use, the root page will
      be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
      not have an associated struct kvm_mmu_page, because it is allocated
      with alloc_page instead of kvm_mmu_alloc_page.
      
      Just return false quickly from is_tdp_mmu_root if the TDP MMU is
      not in use, which also includes the case where shadow paging is
      enabled.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c887c9b9
  2. 13 Nov, 2020 6 commits
  3. 12 Nov, 2020 10 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · db7c9535
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for
           ENETC
      
        Current release - bugs in new features:
      
         - mptcp: provide rmem[0] limit offset to fix oops
      
        Previous release - regressions:
      
         - IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
           calculations
      
         - lan743x: correctly handle chips with internal PHY
      
         - bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
      
         - mlx5e: Fix VXLAN port table synchronization after function reload
      
        Previous release - always broken:
      
         - bpf: Zero-fill re-used per-cpu map element
      
         - fix out-of-order UDP packets when forwarding with UDP GSO fraglists
           turned on:
             - fix UDP header access on Fast/frag0 UDP GRO
             - fix IP header access and skb lookup on Fast/frag0 UDP GRO
      
         - ethtool: netlink: add missing netdev_features_change() call
      
         - net: Update window_clamp if SOCK_RCVBUF is set
      
         - igc: Fix returning wrong statistics
      
         - ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
      
         - tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
      
         - r8169: disable hw csum for short packets on all chip versions
      
         - vrf: Fix fast path output packet handling with async Netfilter
           rules"
      
      * tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        lan743x: fix use of uninitialized variable
        net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
        net: udp: fix UDP header access on Fast/frag0 UDP GRO
        devlink: Avoid overwriting port attributes of registered port
        vrf: Fix fast path output packet handling with async Netfilter rules
        cosa: Add missing kfree in error path of cosa_write
        net: switch to the kernel.org patchwork instance
        ch_ktls: stop the txq if reaches threshold
        ch_ktls: tcb update fails sometimes
        ch_ktls/cxgb4: handle partial tag alone SKBs
        ch_ktls: don't free skb before sending FIN
        ch_ktls: packet handling prior to start marker
        ch_ktls: Correction in middle record handling
        ch_ktls: missing handling of header alone
        ch_ktls: Correction in trimmed_len calculation
        cxgb4/ch_ktls: creating skbs causes panic
        ch_ktls: Update cheksum information
        ch_ktls: Correction in finding correct length
        cxgb4/ch_ktls: decrypted bit is not enough
        net/x25: Fix null-ptr-deref in x25_connect
        ...
      db7c9535
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 200f9d21
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "Stable fixes:
        - Fix failure to unregister shrinker
      
        Other fixes:
        - Fix unnecessary locking to clear up some contention
        - Fix listxattr receive buffer size
        - Fix default mount options for nfsroot"
      
      * tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFS: Remove unnecessary inode lock in nfs_fsync_dir()
        NFS: Remove unnecessary inode locking in nfs_llseek_dir()
        NFS: Fix listxattr receive buffer size
        NFSv4.2: fix failure to unregister shrinker
        nfsroot: Default mount option should ask for built-in NFS version
      200f9d21
    • Marc Zyngier's avatar
      KVM: arm64: Handle SCXTNUM_ELx traps · ed4ffaf4
      Marc Zyngier authored
      As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
      will trap to EL2. Let's handle that as gracefully as possible
      by injecting an UNDEF exception into the guest. This is consistent
      with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
      ed4ffaf4
    • Marc Zyngier's avatar
      KVM: arm64: Unify trap handlers injecting an UNDEF · 338b1793
      Marc Zyngier authored
      A large number of system register trap handlers only inject an
      UNDEF exeption, and yet each class of sysreg seems to provide its
      own, identical function.
      
      Let's unify them all, saving us introducing yet another one later.
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
      338b1793
    • Marc Zyngier's avatar
      KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace · 23711a5e
      Marc Zyngier authored
      We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
      that are immune to Spectre-v2, but that don't have this field set,
      most likely because they predate the specification.
      
      However, this prevents the migration of guests that have started on
      a host the doesn't fake this CSV2 setting to one that does, as KVM
      rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
      what is already there.
      
      In order to fix this, allow userspace to set this field as long as
      this doesn't result in a promising more than what is already there
      (setting CSV2 to 0 is acceptable, but setting it to 1 when it is
      already set to 0 isn't).
      
      Fixes: e1026237 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
      Reported-by: default avatarPeng Liang <liangpeng10@huawei.com>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
      23711a5e
    • Marc Zyngier's avatar
      Merge tag 'v5.10-rc1' into kvmarm-master/next · 4f6b838c
      Marc Zyngier authored
      Linux 5.10-rc1
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      4f6b838c
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · af5043c8
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These are mostly docmentation fixes and janitorial changes plus some
        new device IDs and a new quirk.
      
        Specifics:
      
         - Fix documentation regarding GPIO properties (Andy Shevchenko)
      
         - Fix spelling mistakes in ACPI documentation (Flavio Suligoi)
      
         - Fix white space inconsistencies in ACPI code (Maximilian Luz)
      
         - Fix string formatting in the ACPI Generic Event Device (GED) driver
           (Nick Desaulniers)
      
         - Add Intel Alder Lake device IDs to the ACPI drivers used by the
           Dynamic Platform and Thermal Framework (Srinivas Pandruvada)
      
         - Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI
           button driver (Hans de Goede)"
      
      * tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: DPTF: Support Alder Lake
        Documentation: ACPI: fix spelling mistakes
        ACPI: button: Add DMI quirk for Medion Akoya E2228T
        ACPI: GED: fix -Wformat
        ACPI: Fix whitespace inconsistencies
        ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
        Documentation: firmware-guide: gpio-properties: Clarify initial output state
        Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()
        Documentation: firmware-guide: gpio-properties: Fix factual mistakes
      af5043c8
    • Linus Torvalds's avatar
      Merge tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fcfb6791
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Make the intel_pstate driver behave as expected when it operates in
        the passive mode with HWP enabled and the 'powersave' governor on top
        of it"
      
      * tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
        cpufreq: Add strict_target to struct cpufreq_policy
        cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
        cpufreq: Introduce governor flags
      fcfb6791
    • Sven Van Asbroeck's avatar
      lan743x: fix use of uninitialized variable · edbc2111
      Sven Van Asbroeck authored
      When no devicetree is present, the driver will use an
      uninitialized variable.
      
      Fix by initializing this variable.
      
      Fixes: 902a66e0 ("lan743x: correctly handle chips with internal PHY")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarSven Van Asbroeck <thesven73@gmail.com>
      Link: https://lore.kernel.org/r/20201112152513.1941-1-TheSven73@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      edbc2111
    • Jakub Kicinski's avatar
      Merge branch 'net-udp-fix-fast-frag0-udp-gro' · 5861c8cb
      Jakub Kicinski authored
      Alexander Lobakin says:
      
      ====================
      net: udp: fix Fast/frag0 UDP GRO
      
      While testing UDP GSO fraglists forwarding through driver that uses
      Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
      iperf packets:
      
      [ ID] Interval           Transfer     Bitrate         Jitter
      [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
      
      Simple switch to napi_gro_receive() or any other method without frag0
      shortcut completely resolved them.
      
      I've found two incorrect header accesses in GRO receive callback(s):
       - udp_hdr() (instead of udp_gro_udphdr()) that always points to junk
         in "fast" mode and could probably do this in "regular".
         This was the actual bug that caused all out-of-order delivers;
       - udp{4,6}_lib_lookup_skb() -> ip{,v6}_hdr() (instead of
         skb_gro_network_header()) that potentionally might return odd
         pointers in both modes.
      
      Each patch addresses one of these two issues.
      
      This doesn't cover a support for nested tunnels as it's out of the
      subject and requires more invasive changes. It will be handled
      separately in net-next series.
      
      Credits:
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Willem de Bruijn <willemb@google.com>
      
      Since v4 [0]:
       - split the fix into two logical ones (Willem);
       - replace ternaries with plain ifs to beautify the code (Jakub);
       - drop p->data part to reintroduce it later in abovementioned set.
      
      Since v3 [1]:
       - restore the original {,__}udp{4,6}_lib_lookup_skb() and use
         private versions of them inside GRO code (Willem).
      
      Since v2 [2]:
       - dropped redundant check introduced in v2 as it's performed right
         before (thanks to Eric);
       - udp_hdr() switched to data + off for skbs from list (also Eric);
       - fixed possible malfunction of {,__}udp{4,6}_lib_lookup_skb() with
         Fast/frag0 due to ip{,v6}_hdr() usage (Willem).
      
      Since v1 [3]:
       - added a NULL pointer check for "uh" as suggested by Willem.
      
      [0] https://lore.kernel.org/netdev/Ha2hou5eJPcblo4abjAqxZRzIl1RaLs2Hy0oOAgFs@cp4-web-036.plabs.ch
      [1] https://lore.kernel.org/netdev/MgZce9htmEtCtHg7pmWxXXfdhmQ6AHrnltXC41zOoo@cp7-web-042.plabs.ch
      [2] https://lore.kernel.org/netdev/0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch
      [3] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
      ====================
      
      Link: https://lore.kernel.org/r/hjGOh0iCOYyo1FPiZh6TMXcx3YCgNs1T1eGKLrDz8@cp4-web-037.plabs.chSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5861c8cb