1. 08 Jun, 2022 11 commits
    • Maciej S. Szmigiero's avatar
      KVM: selftests: nSVM: Add svm_nested_soft_inject_test · d8969871
      Maciej S. Szmigiero authored
      Add a KVM self-test that checks whether a nSVM L1 is able to successfully
      inject a software interrupt, a soft exception and a NMI into its L2 guest.
      
      In practice, this tests both the next_rip field consistency and
      L1-injected event with intervening L0 VMEXIT during its delivery:
      the first nested VMRUN (that's also trying to inject a software interrupt)
      will immediately trigger a L0 NPF.
      This L0 NPF will have zero in its CPU-returned next_rip field, which if
      incorrectly reused by KVM will trigger a #PF when trying to return to
      such address 0 from the interrupt handler.
      
      For NMI injection this tests whether the L1 NMI state isn't getting
      incorrectly mixed with the L2 NMI state if a L1 -> L2 NMI needs to be
      re-injected.
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      [sean: check exact L2 RIP on first soft interrupt]
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <d5f3d56528558ad8e28a9f1e1e4187f5a1e6770a.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d8969871
    • Maciej S. Szmigiero's avatar
      KVM: nSVM: Transparently handle L1 -> L2 NMI re-injection · 159fc6fa
      Maciej S. Szmigiero authored
      A NMI that L1 wants to inject into its L2 should be directly re-injected,
      without causing L0 side effects like engaging NMI blocking for L1.
      
      It's also worth noting that in this case it is L1 responsibility
      to track the NMI window status for its L2 guest.
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <f894d13501cd48157b3069a4b4c7369575ddb60e.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      159fc6fa
    • Sean Christopherson's avatar
      KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint · 2d613912
      Sean Christopherson authored
      In the IRQ injection tracepoint, differentiate between Hard IRQs and Soft
      "IRQs", i.e. interrupts that are reinjected after incomplete delivery of
      a software interrupt from an INTn instruction.  Tag reinjected interrupts
      as such, even though the information is usually redundant since soft
      interrupts are only ever reinjected by KVM.  Though rare in practice, a
      hard IRQ can be reinjected.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      [MSS: change "kvm_inj_virq" event "reinjected" field type to bool]
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <9664d49b3bd21e227caa501cff77b0569bebffe2.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2d613912
    • Sean Christopherson's avatar
      KVM: x86: Print error code in exception injection tracepoint iff valid · 21d4c575
      Sean Christopherson authored
      Print the error code in the exception injection tracepoint if and only if
      the exception has an error code.  Define the entire error code sequence
      as a set of formatted strings, print empty strings if there's no error
      code, and abuse __print_symbolic() by passing it an empty array to coerce
      it into printing the error code as a hex string.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <e8f0511733ed2a0410cbee8a0a7388eac2ee5bac.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      21d4c575
    • Sean Christopherson's avatar
      KVM: x86: Trace re-injected exceptions · a61d7c54
      Sean Christopherson authored
      Trace exceptions that are re-injected, not just those that KVM is
      injecting for the first time.  Debugging re-injection bugs is painful
      enough as is, not having visibility into what KVM is doing only makes
      things worse.
      
      Delay propagating pending=>injected in the non-reinjection path so that
      the tracing can properly identify reinjected exceptions.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <25470690a38b4d2b32b6204875dd35676c65c9f2.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a61d7c54
    • Sean Christopherson's avatar
      KVM: SVM: Re-inject INTn instead of retrying the insn on "failure" · 7e5b5ef8
      Sean Christopherson authored
      Re-inject INTn software interrupts instead of retrying the instruction if
      the CPU encountered an intercepted exception while vectoring the INTn,
      e.g. if KVM intercepted a #PF when utilizing shadow paging.  Retrying the
      instruction is architecturally wrong e.g. will result in a spurious #DB
      if there's a code breakpoint on the INT3/O, and lack of re-injection also
      breaks nested virtualization, e.g. if L1 injects a software interrupt and
      vectoring the injected interrupt encounters an exception that is
      intercepted by L0 but not L1.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <1654ad502f860948e4f2d57b8bd881d67301f785.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7e5b5ef8
    • Sean Christopherson's avatar
      KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction · 6ef88d6e
      Sean Christopherson authored
      Re-inject INT3/INTO instead of retrying the instruction if the CPU
      encountered an intercepted exception while vectoring the software
      exception, e.g. if vectoring INT3 encounters a #PF and KVM is using
      shadow paging.  Retrying the instruction is architecturally wrong, e.g.
      will result in a spurious #DB if there's a code breakpoint on the INT3/O,
      and lack of re-injection also breaks nested virtualization, e.g. if L1
      injects a software exception and vectoring the injected exception
      encounters an exception that is intercepted by L0 but not L1.
      
      Due to, ahem, deficiencies in the SVM architecture, acquiring the next
      RIP may require flowing through the emulator even if NRIPS is supported,
      as the CPU clears next_rip if the VM-Exit is due to an exception other
      than "exceptions caused by the INT3, INTO, and BOUND instructions".  To
      deal with this, "skip" the instruction to calculate next_rip (if it's
      not already known), and then unwind the RIP write and any side effects
      (RFLAGS updates).
      
      Save the computed next_rip and use it to re-stuff next_rip if injection
      doesn't complete.  This allows KVM to do the right thing if next_rip was
      known prior to injection, e.g. if L1 injects a soft event into L2, and
      there is no backing INTn instruction, e.g. if L1 is injecting an
      arbitrary event.
      
      Note, it's impossible to guarantee architectural correctness given SVM's
      architectural flaws.  E.g. if the guest executes INTn (no KVM injection),
      an exit occurs while vectoring the INTn, and the guest modifies the code
      stream while the exit is being handled, KVM will compute the incorrect
      next_rip due to "skipping" the wrong instruction.  A future enhancement
      to make this less awful would be for KVM to detect that the decoded
      instruction is not the correct INTn and drop the to-be-injected soft
      event (retrying is a lesser evil compared to shoving the wrong RIP on the
      exception stack).
      Reported-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <65cb88deab40bc1649d509194864312a89bbe02e.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6ef88d6e
    • Sean Christopherson's avatar
      KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported · 3741aec4
      Sean Christopherson authored
      If NRIPS is supported in hardware but disabled in KVM, set next_rip to
      the next RIP when advancing RIP as part of emulating INT3 injection.
      There is no flag to tell the CPU that KVM isn't using next_rip, and so
      leaving next_rip is left as is will result in the CPU pushing garbage
      onto the stack when vectoring the injected event.
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Fixes: 66b7138f ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <cd328309a3b88604daa2359ad56f36cb565ce2d4.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3741aec4
    • Sean Christopherson's avatar
      KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails" · cd9e6da8
      Sean Christopherson authored
      Unwind the RIP advancement done by svm_queue_exception() when injecting
      an INT3 ultimately "fails" due to the CPU encountering a VM-Exit while
      vectoring the injected event, even if the exception reported by the CPU
      isn't the same event that was injected.  If vectoring INT3 encounters an
      exception, e.g. #NP, and vectoring the #NP encounters an intercepted
      exception, e.g. #PF when KVM is using shadow paging, then the #NP will
      be reported as the event that was in-progress.
      
      Note, this is still imperfect, as it will get a false positive if the
      INT3 is cleanly injected, no VM-Exit occurs before the IRET from the INT3
      handler in the guest, the instruction following the INT3 generates an
      exception (directly or indirectly), _and_ vectoring that exception
      encounters an exception that is intercepted by KVM.  The false positives
      could theoretically be solved by further analyzing the vectoring event,
      e.g. by comparing the error code against the expected error code were an
      exception to occur when vectoring the original injected exception, but
      SVM without NRIPS is a complete disaster, trying to make it 100% correct
      is a waste of time.
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Fixes: 66b7138f ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <450133cf0a026cb9825a2ff55d02cb136a1cb111.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cd9e6da8
    • Maciej S. Szmigiero's avatar
      KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0 · f17c31c4
      Maciej S. Szmigiero authored
      Don't BUG/WARN on interrupt injection due to GIF being cleared,
      since it's trivial for userspace to force the situation via
      KVM_SET_VCPU_EVENTS (even if having at least a WARN there would be correct
      for KVM internally generated injections).
      
        kernel BUG at arch/x86/kvm/svm/svm.c:3386!
        invalid opcode: 0000 [#1] SMP
        CPU: 15 PID: 926 Comm: smm_test Not tainted 5.17.0-rc3+ #264
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:svm_inject_irq+0xab/0xb0 [kvm_amd]
        Code: <0f> 0b 0f 1f 00 0f 1f 44 00 00 80 3d ac b3 01 00 00 55 48 89 f5 53
        RSP: 0018:ffffc90000b37d88 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff88810a234ac0 RCX: 0000000000000006
        RDX: 0000000000000000 RSI: ffffc90000b37df7 RDI: ffff88810a234ac0
        RBP: ffffc90000b37df7 R08: ffff88810a1fa410 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
        R13: ffff888109571000 R14: ffff88810a234ac0 R15: 0000000000000000
        FS:  0000000001821380(0000) GS:ffff88846fdc0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f74fc550008 CR3: 000000010a6fe000 CR4: 0000000000350ea0
        Call Trace:
         <TASK>
         inject_pending_event+0x2f7/0x4c0 [kvm]
         kvm_arch_vcpu_ioctl_run+0x791/0x17a0 [kvm]
         kvm_vcpu_ioctl+0x26d/0x650 [kvm]
         __x64_sys_ioctl+0x82/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Fixes: 219b65dc ("KVM: SVM: Improve nested interrupt injection")
      Cc: stable@vger.kernel.org
      Co-developed-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <35426af6e123cbe91ec7ce5132ce72521f02b1b5.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f17c31c4
    • Maciej S. Szmigiero's avatar
      KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02 · 00f08d99
      Maciej S. Szmigiero authored
      The next_rip field of a VMCB is *not* an output-only field for a VMRUN.
      This field value (instead of the saved guest RIP) in used by the CPU for
      the return address pushed on stack when injecting a software interrupt or
      INT3 or INTO exception.
      
      Make sure this field gets synced from vmcb12 to vmcb02 when entering L2 or
      loading a nested state and NRIPS is exposed to L1.  If NRIPS is supported
      in hardware but not exposed to L1 (nrips=0 or hidden by userspace), stuff
      vmcb02's next_rip from the new L2 RIP to emulate a !NRIPS CPU (which
      saves RIP on the stack as-is).
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Co-developed-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <c2e0a3d78db3ae30530f11d4e9254b452a89f42b.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      00f08d99
  2. 07 Jun, 2022 10 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-next-5.19-2' of... · 5552de7b
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: pvdump and selftest improvements
      
      - add an interface to provide a hypervisor dump for secure guests
      - improve selftests to show tests
      5552de7b
    • Paolo Bonzini's avatar
      b31455e9
    • Paolo Bonzini's avatar
      a280e358
    • Maxim Levitsky's avatar
      KVM: SVM: fix tsc scaling cache logic · 11d39e8c
      Maxim Levitsky authored
      SVM uses a per-cpu variable to cache the current value of the
      tsc scaling multiplier msr on each cpu.
      
      Commit 1ab9287a
      ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
      broke this caching logic.
      
      Refactor the code so that all TSC scaling multiplier writes go through
      a single function which checks and updates the cache.
      
      This fixes the following scenario:
      
      1. A CPU runs a guest with some tsc scaling ratio.
      
      2. New guest with different tsc scaling ratio starts on this CPU
         and terminates almost immediately.
      
         This ensures that the short running guest had set the tsc scaling ratio just
         once when it was set via KVM_SET_TSC_KHZ. Due to the bug,
         the per-cpu cache is not updated.
      
      3. The original guest continues to run, it doesn't restore the msr
         value back to its own value, because the cache matches,
         and thus continues to run with a wrong tsc scaling ratio.
      
      Fixes: 1ab9287a ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      11d39e8c
    • Vitaly Kuznetsov's avatar
      KVM: selftests: Make hyperv_clock selftest more stable · eae260be
      Vitaly Kuznetsov authored
      hyperv_clock doesn't always give a stable test result, especially with
      AMD CPUs. The test compares Hyper-V MSR clocksource (acquired either
      with rdmsr() from within the guest or KVM_GET_MSRS from the host)
      against rdtsc(). To increase the accuracy, increase the measured delay
      (done with nop loop) by two orders of magnitude and take the mean rdtsc()
      value before and after rdmsr()/KVM_GET_MSRS.
      Reported-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Tested-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220601144322.1968742-1-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      eae260be
    • Ben Gardon's avatar
      KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging · 5ba7c4c6
      Ben Gardon authored
      Currently disabling dirty logging with the TDP MMU is extremely slow.
      On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
      to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
      the shadow MMU.
      
      When disabling dirty logging, zap non-leaf parent entries to allow
      replacement with huge pages instead of recursing and zapping all of the
      child, leaf entries. This reduces the number of TLB flushes required.
      and reduces the disable dirty log time with the TDP MMU to ~3 seconds.
      
      Opportunistically add a WARN() to catch GFNs that are mapped at a
      higher level than their max level.
      Signed-off-by: default avatarBen Gardon <bgardon@google.com>
      Message-Id: <20220525230904.1584480-1-bgardon@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5ba7c4c6
    • Jan Beulich's avatar
      x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm() · 1df931d9
      Jan Beulich authored
      As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs
      and clobbering of "cc" don't work well together. The compiler appears to
      mean to reject such, but doesn't - in its upstream form - quite manage
      to yet for "cc". Furthermore two similar macros don't clobber "cc", and
      clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler
      always assumes status flags to be clobbered there.
      
      Fixes: 989b5db2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses")
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1df931d9
    • Shaoqin Huang's avatar
      KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots() · cf4a8693
      Shaoqin Huang authored
      When freeing obsolete previous roots, check prev_roots as intended, not
      the current root.
      Signed-off-by: default avatarShaoqin Huang <shaoqin.huang@intel.com>
      Fixes: 527d5cd7 ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped")
      Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cf4a8693
    • Seth Forshee's avatar
      entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set · 3e684903
      Seth Forshee authored
      A livepatch transition may stall indefinitely when a kvm vCPU is heavily
      loaded. To the host, the vCPU task is a user thread which is spending a
      very long time in the ioctl(KVM_RUN) syscall. During livepatch
      transition, set_notify_signal() will be called on such tasks to
      interrupt the syscall so that the task can be transitioned. This
      interrupts guest execution, but when xfer_to_guest_mode_work() sees that
      TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an
      exit to user mode is unnecessary, and guest execution is resumed without
      transitioning the task for the livepatch.
      
      This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal()
      is expected to break tasks out of interruptible kernel loops and cause
      them to return to userspace. Change xfer_to_guest_mode_work() to handle
      TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run
      loop that an exit to userpsace is needed. Any pending task_work will be
      run when get_signal() is called from exit_to_user_mode_loop(), so there
      is no longer any need to run task work from xfer_to_guest_mode_work().
      Suggested-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Signed-off-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3e684903
    • Alexey Kardashevskiy's avatar
      KVM: Don't null dereference ops->destroy · e8bc2427
      Alexey Kardashevskiy authored
      A KVM device cleanup happens in either of two callbacks:
      1) destroy() which is called when the VM is being destroyed;
      2) release() which is called when a device fd is closed.
      
      Most KVM devices use 1) but Book3s's interrupt controller KVM devices
      (XICS, XIVE, XIVE-native) use 2) as they need to close and reopen during
      the machine execution. The error handling in kvm_ioctl_create_device()
      assumes destroy() is always defined which leads to NULL dereference as
      discovered by Syzkaller.
      
      This adds a checks for destroy!=NULL and adds a missing release().
      
      This is not changing kvm_destroy_devices() as devices with defined
      release() should have been removed from the KVM devices list by then.
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e8bc2427
  3. 06 Jun, 2022 3 commits
  4. 05 Jun, 2022 16 commits
    • Linus Torvalds's avatar
      Merge tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · e17fee89
      Linus Torvalds authored
      Pull delay-accounting update from Andrew Morton:
       "A single featurette for delay accounting.
      
        Delayed a bit because, unusually, it had dependencies on both the
        mm-stable and mm-nonmm-stable queues"
      
      * tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        delayacct: track delays from write-protect copy
      e17fee89
    • Linus Torvalds's avatar
      bluetooth: don't use bitmaps for random flag accesses · e1cff700
      Linus Torvalds authored
      The bluetooth code uses our bitmap infrastructure for the two bits (!)
      of connection setup flags, and in the process causes odd problems when
      it converts between a bitmap and just the regular values of said bits.
      
      It's completely pointless to do things like bitmap_to_arr32() to convert
      a bitmap into a u32.  It shoudln't have been a bitmap in the first
      place.  The reason to use bitmaps is if you have arbitrary number of
      bits you want to manage (not two!), or if you rely on the atomicity
      guarantees of the bitmap setting and clearing.
      
      The code could use an "atomic_t" and use "atomic_or/andnot()" to set and
      clear the bit values, but considering that it then copies the bitmaps
      around with "bitmap_to_arr32()" and friends, there clearly cannot be a
      lot of atomicity requirements.
      
      So just use a regular integer.
      
      In the process, this avoids the warnings about erroneous use of
      bitmap_from_u64() which were triggered on 32-bit architectures when
      conversion from a u64 would access two words (and, surprise, surprise,
      only one word is needed - and indeed overkill - for a 2-bit bitmap).
      
      That was always problematic, but the compiler seems to notice it and
      warn about the invalid pattern only after commit 0a97953f ("lib: add
      bitmap_{from,to}_arr64") changed the exact implementation details of
      'bitmap_from_u64()', as reported by Sudip Mukherjee and Stephen Rothwell.
      
      Fixes: fe92ee64 ("Bluetooth: hci_core: Rework hci_conn_params flags")
      Link: https://lore.kernel.org/all/YpyJ9qTNHJzz0FHY@debian/
      Link: https://lore.kernel.org/all/20220606080631.0c3014f2@canb.auug.org.au/
      Link: https://lore.kernel.org/all/20220605162537.1604762-1-yury.norov@gmail.com/Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Reported-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Reviewed-by: default avatarYury Norov <yury.norov@gmail.com>
      Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e1cff700
    • Al Viro's avatar
      fix the breakage in close_fd_get_file() calling conventions change · 40a19260
      Al Viro authored
      It used to grab an extra reference to struct file rather than
      just transferring to caller the one it had removed from descriptor
      table.  New variant doesn't, and callers need to be adjusted.
      
      Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com
      Fixes: 6319194e ("Unify the primitives for file descriptor closing")
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      40a19260
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d717180e
      Linus Torvalds authored
      Pull x86 SGX fix from Thomas Gleixner:
       "A single fix for x86/SGX to prevent that memory which is allocated for
        an SGX enclave is accounted to the wrong memory control group"
      
      * tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sgx: Set active memcg prior to shmem allocation
      d717180e
    • Linus Torvalds's avatar
      Merge tag 'x86-mm-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0b7da15c
      Linus Torvalds authored
      Pull x86 mm cleanup from Thomas Gleixner:
       "Use PAGE_ALIGNED() instead of open coding it in the x86/mm code"
      
      * tag 'x86-mm-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Use PAGE_ALIGNED(x) instead of IS_ALIGNED(x, PAGE_SIZE)
      0b7da15c
    • Linus Torvalds's avatar
      Merge tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9784edd7
      Linus Torvalds authored
      Pull x86 microcode updates from Thomas Gleixner:
      
       - Disable late microcode loading by default. Unless the HW people get
         their act together and provide a required minimum version in the
         microcode header for making a halfways informed decision its just
         lottery and broken.
      
       - Warn and taint the kernel when microcode is loaded late
      
       - Remove the old unused microcode loader interface
      
       - Remove a redundant perf callback from the microcode loader
      
      * tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode: Remove unnecessary perf callback
        x86/microcode: Taint and warn on late loading
        x86/microcode: Default-disable late loading
        x86/microcode: Rip out the OLD_INTERFACE
      9784edd7
    • Linus Torvalds's avatar
      Merge tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a9251280
      Linus Torvalds authored
      Pull x86 cleanups from Thomas Gleixner:
       "A set of small x86 cleanups:
      
         - Remove unused headers in the IDT code
      
         - Kconfig indendation and comment fixes
      
         - Fix all 'the the' typos in one go instead of waiting for bots to
           fix one at a time"
      
      * tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix all occurences of the "the the" typo
        x86/idt: Remove unused headers
        x86/Kconfig: Fix indentation of arch/x86/Kconfig.debug
        x86/Kconfig: Fix indentation and add endif comments to arch/x86/Kconfig
      a9251280
    • Linus Torvalds's avatar
      Merge tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1fd9f4ce
      Linus Torvalds authored
      Pull x86 boot update from Thomas Gleixner:
       "Use strlcpy() instead of strscpy() in arch_setup()"
      
      * tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/setup: Use strscpy() to replace deprecated strlcpy()
      1fd9f4ce
    • Linus Torvalds's avatar
      Merge tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c049ecc5
      Linus Torvalds authored
      Pull clockevent/clocksource updates from Thomas Gleixner:
      
       - Device tree bindings for MT8186
      
       - Tell the kernel that the RISC-V SBI timer stops in deeper power
         states
      
       - Make device tree parsing in sp804 more robust
      
       - Dead code removal and tiny fixes here and there
      
       - Add the missing SPDX identifiers
      
      * tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
        clocksource/drivers/timer-ti-dm: Remove unnecessary NULL check
        clocksource/drivers/timer-sun5i: Convert to SPDX identifier
        clocksource/drivers/timer-sun4i: Convert to SPDX identifier
        clocksource/drivers/pistachio: Convert to SPDX identifier
        clocksource/drivers/orion: Convert to SPDX identifier
        clocksource/drivers/lpc32xx: Convert to SPDX identifier
        clocksource/drivers/digicolor: Convert to SPDX identifier
        clocksource/drivers/armada-370-xp: Convert to SPDX identifier
        clocksource/drivers/mips-gic-timer: Convert to SPDX identifier
        clocksource/drivers/jcore: Convert to SPDX identifier
        clocksource/drivers/bcm_kona: Convert to SPDX identifier
        clocksource/drivers/sp804: Avoid error on multiple instances
        clocksource/drivers/riscv: Events are stopped during CPU suspend
        clocksource/drivers/ixp4xx: Drop boardfile probe path
        dt-bindings: timer: Add compatible for Mediatek MT8186
      c049ecc5
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bc1e02c3
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "Fix the fallout of sysctl code move which placed the init function
        wrong"
      
      * tag 'sched-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/autogroup: Fix sysctl move
      bc1e02c3
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fa11c280
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
      
        - Make the ICL event constraints match reality
      
        - Remove a unused local variable
      
      * tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Remove unused local variable
        perf/x86/intel: Fix event constraints for ICL
      fa11c280
    • Linus Torvalds's avatar
      Merge tag 'perf-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5cc47d4a
      Linus Torvalds authored
      Pull perf fixlet from Thomas Gleixner:
       "Trivial indentation fix in Kconfig"
      
      * tag 'perf-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/Kconfig: Fix indentation in the Kconfig file
      5cc47d4a
    • Linus Torvalds's avatar
      Merge tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 44688ffd
      Linus Torvalds authored
      Pull objtool fixes from Thomas Gleixner:
      
       - Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
         noreturn
      
       - Allow architectures to select uaccess validation
      
       - Use the non-instrumented bit test for test_cpu_has() to prevent
         escape from non-instrumentable regions
      
       - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
         from non-instrumentable regions
      
       - Mark a few tiny inline as __always_inline to prevent GCC from
         bringing them out of line and instrumenting them
      
       - Mark the empty stub context_tracking_enabled() as always inline as
         GCC brings them out of line and instruments the empty shell
      
       - Annotate ex_handler_msr_mce() as dead end
      
      * tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/extable: Annotate ex_handler_msr_mce() as a dead end
        context_tracking: Always inline empty stubs
        x86: Always inline on_thread_stack() and current_top_of_stack()
        jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds
        x86/cpu: Elide KCSAN for cpu_has() and friends
        objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn
        objtool: Add CONFIG_HAVE_UACCESS_VALIDATION
      44688ffd
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · b2c9a83d
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "Mostly small bug fixes plus other trivial updates.
      
        The major change of note is moving ufs out of scsi and a minor update
        to lpfc vmid handling"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
        scsi: qla2xxx: Remove unused 'ql_dm_tgt_ex_pct' parameter
        scsi: qla2xxx: Remove setting of 'req' and 'rsp' parameters
        scsi: mpi3mr: Fix kernel-doc
        scsi: lpfc: Add support for ATTO Fibre Channel devices
        scsi: core: Return BLK_STS_TRANSPORT for ALUA transitioning
        scsi: sd_zbc: Prevent zone information memory leak
        scsi: sd: Fix potential NULL pointer dereference
        scsi: mpi3mr: Rework mrioc->bsg_device model to fix warnings
        scsi: myrb: Fix up null pointer access on myrb_cleanup()
        scsi: core: Unexport scsi_bus_type
        scsi: sd: Don't call blk_cleanup_disk() in sd_probe()
        scsi: ufs: ufshcd: Delete unnecessary NULL check
        scsi: isci: Fix typo in comment
        scsi: pmcraid: Fix typo in comment
        scsi: smartpqi: Fix typo in comment
        scsi: qedf: Fix typo in comment
        scsi: esas2r: Fix typo in comment
        scsi: storvsc: Fix typo in comment
        scsi: ufs: Split the drivers/scsi/ufs directory
        scsi: qla1280: Remove redundant variable
        ...
      b2c9a83d
    • Linus Torvalds's avatar
      Merge tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux · 29814363
      Linus Torvalds authored
      Pull hardware timestamping subsystem from Thierry Reding:
       "This contains the new HTE (hardware timestamping engine) subsystem
        that has been in the works for a couple of months now.
      
        The infrastructure provided allows for drivers to register as hardware
        timestamp providers, while consumers will be able to request events
        that they are interested in (such as GPIOs and IRQs) to be timestamped
        by the hardware providers.
      
        Note that this currently supports only one provider, but there seems
        to be enough interest in this functionality and we expect to see more
        drivers added once this is merged"
      
      [ Linus Walleij mentions the Intel PMC in the Elkhart and Tiger Lake
        platforms as another future timestamp provider ]
      
      * tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
        dt-bindings: timestamp: Correct id path
        dt-bindings: Renamed hte directory to timestamp
        hte: Uninitialized variable in hte_ts_get()
        hte: Fix off by one in hte_push_ts_ns()
        hte: Fix possible use-after-free in tegra_hte_test_remove()
        hte: Remove unused including <linux/version.h>
        MAINTAINERS: Add HTE Subsystem
        hte: Add Tegra HTE test driver
        tools: gpio: Add new hardware clock type
        gpiolib: cdev: Add hardware timestamp clock type
        gpio: tegra186: Add HTE support
        gpiolib: Add HTE support
        dt-bindings: Add HTE bindings
        hte: Add Tegra194 HTE kernel provider
        drivers: Add hardware timestamp engine (HTE) subsystem
        Documentation: Add HTE subsystem guide
      29814363
    • Linus Torvalds's avatar
      Merge tag 'kbuild-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 71e80720
      Linus Torvalds authored
      Pull more Kbuild updates from Masahiro Yamada:
      
       - Fix build regressions for parisc, csky, nios2, openrisc
      
       - Simplify module builds for CONFIG_LTO_CLANG and CONFIG_X86_KERNEL_IBT
      
       - Remove arch/parisc/nm, which was presumably a workaround for old
         tools
      
       - Check the odd combination of EXPORT_SYMBOL and 'static' precisely
      
       - Make external module builds robust against "too long argument error"
      
       - Support j, k keys for moving the cursor in nconfig
      
      * tag 'kbuild-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
        kbuild: Allow to select bash in a modified environment
        scripts: kconfig: nconf: make nconfig accept jk keybindings
        modpost: use fnmatch() to simplify match()
        modpost: simplify mod->name allocation
        kbuild: factor out the common objtool arguments
        kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o
        kbuild: clean .tmp_* pattern by make clean
        kbuild: remove redundant cleanups in scripts/link-vmlinux.sh
        kbuild: rebuild multi-object modules when objtool is updated
        kbuild: add cmd_and_savecmd macro
        kbuild: make *.mod rule robust against too long argument error
        kbuild: make built-in.a rule robust against too long argument error
        kbuild: check static EXPORT_SYMBOL* by script instead of modpost
        parisc: remove arch/parisc/nm
        kbuild: do not create *.prelink.o for Clang LTO or IBT
        kbuild: replace $(linked-object) with CONFIG options
        kbuild: do not try to parse *.cmd files for objects provided by compiler
        kbuild: replace $(if A,A,B) with $(or A,B) in scripts/Makefile.modpost
        modpost: squash if...else-if in find_elf_symbol2()
        modpost: reuse ARRAY_SIZE() macro for section_mismatch()
        ...
      71e80720