1. 09 Mar, 2024 1 commit
    • Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEAD · 1b6c146d
      Paolo Bonzini authored
      KVM x86 fixes for 6.8, round 2:
      
       - When emulating an atomic access, mark the gfn as dirty in the memslot
         to fix a bug where KVM could fail to mark the slot as dirty during live
         migration, ultimately resulting in guest data corruption due to a dirty
         page not being re-copied from the source to the target.
      
       - Check for mmu_notifier invalidation events before faulting in the pfn,
         and before acquiring mmu_lock, to avoid unnecessary work and lock
         contention.  Contending mmu_lock is especially problematic on preemptible
         kernels, as KVM may yield mmu_lock in response to the contention, which
         severely degrades overall performance due to vCPUs making it difficult
         for the task that triggered invalidation to make forward progress.
      
         Note, due to another kernel bug, this fix isn't limited to preemptible
         kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield
         contended rwlocks and spinlocks.
      
         https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
      1b6c146d
  2. 23 Feb, 2024 2 commits
    • KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing · d02c357e
      Sean Christopherson authored
      Retry page faults without acquiring mmu_lock, and without even faulting
      the page into the primary MMU, if the resolved gfn is covered by an active
      invalidation.  Contending for mmu_lock is especially problematic on
      preemptible kernels as the mmu_notifier invalidation task will yield
      mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
      ultimately increase the latency of resolving the page fault.  And in the
      worst case scenario, yielding will be accompanied by a remote TLB flush,
      e.g. if the invalidation covers a large range of memory and vCPUs are
      accessing addresses that were already zapped.
      
      Faulting the page into the primary MMU is similarly problematic, as doing
      so may acquire locks that need to be taken for the invalidation to
      complete (the primary MMU has finer grained locks than KVM's MMU), and/or
      may cause unnecessary churn (getting/putting pages, marking them accessed,
      etc).
      
      Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
      iterators to perform more work before yielding, but that wouldn't solve
      the lock contention and would negatively affect scenarios where a vCPU is
      trying to fault in an address that is NOT covered by the in-progress
      invalidation.
      
      Add a dedicated lockless version of the range-based retry check to avoid
      false positives on the sanity check on start+end WARN, and so that it's
      super obvious that checking for a racing invalidation without holding
      mmu_lock is unsafe (though obviously useful).
      
      Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
      invalidation in a loop won't put KVM into an infinite loop, e.g. due to
      caching the in-progress flag and never seeing it go to '0'.
      
      Force a load of mmu_invalidate_seq as well, even though it isn't strictly
      necessary to avoid an infinite loop, as doing so improves the probability
      that KVM will detect an invalidation that already completed before
      acquiring mmu_lock and bailing anyways.
      
      Do the pre-check even for non-preemptible kernels, as waiting to detect
      the invalidation until mmu_lock is held guarantees the vCPU will observe
      the worst case latency in terms of handling the fault, and can generate
      even more mmu_lock contention.  E.g. the vCPU will acquire mmu_lock,
      detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
      eventually re-acquire mmu_lock.  This behavior is also why there are no
      new starvation issues due to losing the fairness guarantees provided by
      rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
      on mmu_lock doesn't guarantee forward progress in the face of _another_
      mmu_notifier invalidation event.
      
      Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
      may generate a load into a register instead of doing a direct comparison
      (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
      is a few bytes of code and maaaaybe a cycle or three.
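
      The lockless pre-check described above can be sketched in userspace C as
      follows.  This is a minimal model, not the kernel code: struct kvm_model,
      retry_gfn_unsafe() and the local READ_ONCE() definition are hypothetical
      stand-ins, and the only field names taken from the message are
      mmu_invalidate_in_progress and mmu_invalidate_seq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): force a real load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

typedef uint64_t gfn_t;

/* Hypothetical, trimmed-down model of the invalidation state. */
struct kvm_model {
	long mmu_invalidate_in_progress;   /* non-zero while an invalidation runs */
	unsigned long mmu_invalidate_seq;  /* bumped when an invalidation completes */
	gfn_t range_start;                 /* assumed inclusive */
	gfn_t range_end;                   /* assumed exclusive */
};

/*
 * Lockless pre-check: returns true if the fault should be retried.
 * The answer is only a hint because no lock is held; the caller must
 * still repeat the check under the lock before installing a mapping.
 */
static bool retry_gfn_unsafe(struct kvm_model *kvm, unsigned long mmu_seq,
			     gfn_t gfn)
{
	assert(kvm);

	/* An invalidation covering this gfn is in flight: retry now. */
	if (READ_ONCE(kvm->mmu_invalidate_in_progress) &&
	    gfn >= kvm->range_start && gfn < kvm->range_end)
		return true;

	/* An invalidation completed since mmu_seq was sampled: retry too. */
	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
}
```

      A caller would sample mmu_invalidate_seq first, run this check before
      taking any lock, and bail out early when it returns true.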

      Reported-by: Yan Zhao <yan.y.zhao@intel.com>
      Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
      Reported-by: Friedrich Weber <f.weber@proxmox.com>
      Cc: Kai Huang <kai.huang@intel.com>
      Cc: Yan Zhao <yan.y.zhao@intel.com>
      Cc: Yuan Yao <yuan.yao@linux.intel.com>
      Cc: Xu Yilun <yilun.xu@linux.intel.com>
      Acked-by: Kai Huang <kai.huang@intel.com>
      Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
      Link: https://lore.kernel.org/r/20240222012640.2820927-1-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      d02c357e
    • KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() · 5ef1d8c1
      Sean Christopherson authored
      Do the cache flush of converted pages in svm_register_enc_region() before
      dropping kvm->lock to fix use-after-free issues where region and/or its
      array of pages could be freed by a different task, e.g. if userspace has
      __unregister_enc_region_locked() already queued up for the region.
      
      Note, the "obvious" alternative of using local variables doesn't fully
      resolve the bug, as region->pages is also dynamically allocated.  I.e. the
      region structure itself would be fine, but region->pages could be freed.
      
      Flushing multiple pages under kvm->lock is unfortunate, but the entire
      flow is a rare slow path, and the manual flush is only needed on CPUs that
      lack coherency for encrypted memory.
      
      Fixes: 19a23da5 ("Fix unsynchronized access to sev members through svm_register_enc_region")
      Reported-by: Gabe Kirkpatrick <gkirkpatrick@google.com>
      Cc: Josh Eads <josheads@google.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20240217013430.2079561-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5ef1d8c1
  3. 21 Feb, 2024 3 commits
  4. 17 Feb, 2024 1 commit
    • KVM: x86: Mark target gfn of emulated atomic instruction as dirty · 910c57df
      Sean Christopherson authored
      When emulating an atomic access on behalf of the guest, mark the target
      gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault.  This
      fixes a bug where KVM effectively corrupts guest memory during live
      migration by writing to guest memory without informing userspace that the
      page is dirty.
      
      Marking the page dirty got unintentionally dropped when KVM's emulated
      CMPXCHG was converted to do a user access.  Before that, KVM explicitly
      mapped the guest page into kernel memory, and marked the page dirty during
      the unmap phase.
      
      Mark the page dirty even if the CMPXCHG fails, as the old data is written
      back on failure, i.e. the page is still written.  The value written is
      guaranteed to be the same because the operation is atomic, but KVM's ABI
      is that all writes are dirty logged regardless of the value written.  And
      more importantly, that's what KVM did before the buggy commit.
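
      The dirty-logging rule above can be sketched with C11 atomics.  This is
      an illustrative model only: struct memslot_model, mark_gfn_dirty() and
      emulate_cmpxchg() are hypothetical names, one 8-byte word stands in for a
      whole guest page, and the faulting case is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NPAGES 64  /* hypothetical slot size, in pages */

struct memslot_model {
	_Atomic uint64_t page[MODEL_NPAGES]; /* one word per modeled guest page */
	uint64_t dirty_bitmap;               /* bit N set: gfn N must be re-copied */
};

static void mark_gfn_dirty(struct memslot_model *slot, unsigned int gfn)
{
	slot->dirty_bitmap |= UINT64_C(1) << gfn;
}

/* Returns true if the exchange happened, false if the compare failed. */
static bool emulate_cmpxchg(struct memslot_model *slot, unsigned int gfn,
			    uint64_t expected, uint64_t desired)
{
	assert(gfn < MODEL_NPAGES);

	bool swapped = atomic_compare_exchange_strong(&slot->page[gfn],
						      &expected, desired);

	/*
	 * Log the page as dirty whether or not the compare succeeded:
	 * the access was attempted, and every write is dirty logged
	 * regardless of the value written.
	 */
	mark_gfn_dirty(slot, gfn);
	return swapped;
}
```

      The point of the sketch is that the dirty bit is set on both the success
      and the failure path, so a migration pass that consults dirty_bitmap
      never misses a page touched by an emulated atomic.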
      
      Huge kudos to the folks on the Cc list (and many others), who did all the
      actual work of triaging and debugging.
      
      Fixes: 1c2361f6 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
      Cc: stable@vger.kernel.org
      Cc: David Matlack <dmatlack@google.com>
      Cc: Pasha Tatashin <tatashin@google.com>
      Cc: Michael Krebs <mkrebs@google.com>
      base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Link: https://lore.kernel.org/r/20240215010004.1456078-2-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      910c57df
  5. 16 Feb, 2024 2 commits
  6. 14 Feb, 2024 3 commits
    • Merge tag 'kvm-riscv-fixes-6.8-1' of https://github.com/kvm-riscv/linux into HEAD · e67391ca
      Paolo Bonzini authored
      KVM/riscv fixes for 6.8, take #1
      
      - Fix steal-time related sparse warnings
      e67391ca
    • Merge tag 'kvm-x86-selftests-6.8-rcN' of https://github.com/kvm-x86/linux into HEAD · 2f8ebe43
      Paolo Bonzini authored
      KVM selftests fixes/cleanups (and one KVM x86 cleanup) for 6.8:
      
       - Remove redundant newlines from error messages.
      
       - Delete an unused variable in the AMX test (which causes build failures when
         compiling with -Werror).
      
       - Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails with an
         error code other than ENOENT (a Hyper-V selftest bug resulted in an EMFILE,
         and the test eventually got skipped).
      
       - Fix TSC related bugs in several Hyper-V selftests.
      
       - Fix a bug in the dirty ring logging test where a sem_post() could be left
         pending across multiple runs, resulting in incorrect synchronization between
         the main thread and the vCPU worker thread.
      
       - Relax the dirty log split test's assertions on 4KiB mappings to fix false
         positives due to the number of mappings for memslot 0 (used for code and
         data that is NOT being dirty logged) changing, e.g. due to NUMA balancing.
      
       - Have KVM's gtod_is_based_on_tsc() return "bool" instead of an "int" (the
         function generates boolean values, and all callers treat the return value as
         a bool).
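
      The open() policy in the third item above (skip only on ENOENT, fail on
      anything else) might look like the following sketch.  The enum and both
      helper names are hypothetical; they are not the selftest framework's API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum open_outcome { OPEN_OK, OPEN_SKIP, OPEN_FAIL };

/* Decide what a test should do given open()'s return value and errno. */
static enum open_outcome classify_open(int fd, int err)
{
	if (fd >= 0)
		return OPEN_OK;

	/* A missing file means "feature absent": skip.  Anything else,
	 * e.g. EMFILE, points at a real problem and must fail the test. */
	return err == ENOENT ? OPEN_SKIP : OPEN_FAIL;
}

/* Open a path, classify the result, and close the descriptor again. */
static enum open_outcome try_open(const char *path, int flags)
{
	assert(path);

	int fd = open(path, flags);
	int err = errno;  /* capture before any other call can clobber it */
	enum open_outcome out = classify_open(fd, err);

	if (fd >= 0)
		close(fd);
	return out;
}
```

      Keeping the classification separate from the open() call makes the
      policy checkable without having to provoke real errors such as EMFILE.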
      2f8ebe43
    • Merge tag 'kvm-x86-fixes-6.8-rcN' of https://github.com/kvm-x86/linux into HEAD · 22d0bc07
      Paolo Bonzini authored
      KVM x86 fixes for 6.8:
      
       - Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if and only
         if the incoming events->nmi.pending is non-zero.  If the target vCPU is in
         the UNINITIALIZED state, the spurious request will result in KVM exiting to
         userspace, which in turn causes QEMU to constantly acquire and release
         QEMU's global mutex, to the point where the BSP is unable to make forward
         progress.
      
       - Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl being
         incorrectly truncated, and ultimately causes KVM to think a fixed counter
         has already been disabled (KVM thinks the old value is '0').
      
       - Fix a stack leak in KVM_GET_MSRS where a failed MSR read from userspace
         that is ultimately ignored due to ignore_msrs=true doesn't zero the output
         as intended.
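
      The u8 versus u64 goof in the second item can be reproduced in a few
      lines.  This is a sketch under stated assumptions: the helper names are
      hypothetical, and the layout used is the architectural one for the fixed
      counter control register, with one 4-bit field per fixed counter.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_CTR_FIELD_BITS 4  /* one 4-bit control field per fixed counter */

/* Extract the control field for fixed counter idx. */
static uint8_t fixed_ctrl_field(uint64_t ctrl, unsigned int idx)
{
	assert(idx < 16);
	return (ctrl >> (idx * FIXED_CTR_FIELD_BITS)) & 0xf;
}

/* Buggy variant: the old value is stashed in a u8, dropping bits 8..63. */
static uint8_t old_field_buggy(uint64_t fixed_ctr_ctrl, unsigned int idx)
{
	uint8_t old_ctrl = (uint8_t)fixed_ctr_ctrl;

	return fixed_ctrl_field(old_ctrl, idx);
}

/* Fixed variant: keep the full 64-bit value. */
static uint8_t old_field_fixed(uint64_t fixed_ctr_ctrl, unsigned int idx)
{
	uint64_t old_ctrl = fixed_ctr_ctrl;

	return fixed_ctrl_field(old_ctrl, idx);
}
```

      With a control value of 0x300 (counter 2 enabled), the buggy variant
      reports the old field for counter 2 as 0, i.e. "already disabled", while
      counters 0 and 1 are unaffected because their fields fit in the low byte.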
      22d0bc07
  7. 13 Feb, 2024 1 commit
  8. 11 Feb, 2024 3 commits
  9. 10 Feb, 2024 8 commits
    • Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of... · 7521f258
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
        issues or aren't considered to be needed in earlier kernel versions"
      
      * tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
        nilfs2: fix potential bug in end_buffer_async_write
        mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
        nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
        MAINTAINERS: Leo Yan has moved
        mm/zswap: don't return LRU_SKIP if we have dropped lru lock
        fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
        mailmap: switch email address for John Moon
        mm: zswap: fix objcg use-after-free in entry destruction
        mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
        arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
        selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
        mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
        mm: memcg: optimize parent iteration in memcg_rstat_updated()
        nilfs2: fix data corruption in dsync block recovery for small block sizes
        mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
        exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
        fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
        fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
        getrusage: use sig->stats_lock rather than lock_task_sighand()
        getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
        ...
      7521f258
    • Merge tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux · a5b6244c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Keith:
           - Update a potentially stale firmware attribute (Maurizio)
           - Fixes for the recent verbose error logging (Keith, Chaitanya)
           - Protection information payload size fix for passthrough (Francis)
      
       - Fix for a queue freezing issue in virtblk (Yi)
      
       - blk-iocost underflow fix (Tejun)
      
       - blk-wbt task detection fix (Jan)
      
      * tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
        virtio-blk: Ensure no requests in virtqueues before deleting vqs.
        blk-iocost: Fix an UBSAN shift-out-of-bounds warning
        nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
        nvme-core: fix comment to reflect right functions
        nvme: move passthrough logging attribute to head
        blk-wbt: Fix detection of dirty-throttled tasks
        nvme-host: fix the updating of the firmware version
      a5b6244c
    • Merge tag 'firewire-fixes-6.8-rc4' of... · a38ff5bb
      Linus Torvalds authored
      Merge tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
      
      Pull firewire fix from Takashi Sakamoto:
       "A change to accelerate the device detection step in some cases.
      
        In the self-identification step after bus-reset, all nodes in the same
        bus broadcast selfID packet including the value of gap count. The
        value is related to the cable hops between nodes, and used to
        calculate the subaction gap and the arbitration reset gap.
      
        When each node has the different value of the gap count, the
        asynchronous communication between them is unreliable, since an
        asynchronous transaction could be interrupted by another asynchronous
        transaction before completion. The gap count inconsistency can be
        resolved by several ways; e.g. the transfer of PHY configuration
        packet and generation of bus-reset.
      
        The current implementation of firewire stack can correctly detect the
        gap count inconsistency, however the recovery action from the
        inconsistency tends to be delayed after reading configuration ROM of
        root node. This results in the long time to probe devices in some
        combinations of hardware.
      
        Here the stack is changed to schedule the action as soon as possible"
      
      * tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: core: send bus reset promptly on gap count error
      a38ff5bb
    • Merge tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 5a7ec870
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two ksmbd server fixes:
      
         - memory leak fix
      
         - a minor kernel-doc fix"
      
      * tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
        ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
      5a7ec870
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4a7bbe75
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three small driver fixes and one core fix.
      
        The core fix being a fixup to the one in the last pull request which
        didn't entirely move checking of scsi_host_busy() out from under the
        host lock"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare()
        scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd()
        scsi: lpfc: Use unsigned type for num_sge
        scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
      4a7bbe75
    • Merge tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · ca00c700
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - reconnect fix
      
       - multichannel channel selection fix
      
       - minor mount warning fix
      
       - reparse point fix
      
       - null pointer check improvement
      
      * tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: clarify mount warning
        cifs: handle cases where multiple sessions share connection
        cifs: change tcon status when need_reconnect is set on it
        smb: client: set correct d_type for reparse points under DFS mounts
        smb3: add missing null server pointer check
      ca00c700
    • Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client · e1e3f530
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Some fscrypt-related fixups (sparse reads are used only for encrypted
        files) and two cap handling fixes from Xiubo and Rishabh"
      
      * tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
        ceph: always check dir caps asynchronously
        ceph: prevent use-after-free in encode_cap_msg()
        ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
        libceph: just wait for more data to be available on the socket
        libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
        libceph: fail sparse-read if the data length doesn't match
      e1e3f530
    • Merge tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3 · a2343df3
      Linus Torvalds authored
      Pull ntfs3 fixes from Konstantin Komarov:
       "Fixed:
         - size update for compressed file
         - some logic errors, overflows
         - memory leak
         - some code was refactored
      
        Added:
         - implement super_operations::shutdown
      
        Improved:
         - alternative boot processing
         - reduced stack usage"
      
      * tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
        fs/ntfs3: Slightly simplify ntfs_inode_printk()
        fs/ntfs3: Add ioctl operation for directories (FITRIM)
        fs/ntfs3: Fix oob in ntfs_listxattr
        fs/ntfs3: Fix an NULL dereference bug
        fs/ntfs3: Update inode->i_size after success write into compressed file
        fs/ntfs3: Fixed overflow check in mi_enum_attr()
        fs/ntfs3: Correct function is_rst_area_valid
        fs/ntfs3: Use i_size_read and i_size_write
        fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
        fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
        fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
        fs/ntfs3: Disable ATTR_LIST_ENTRY size check
        fs/ntfs3: Fix c/mtime typo
        fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
        fs/ntfs3: Add and fix comments
        fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
        fs/ntfs3: Implement super_operations::shutdown
        fs/ntfs3: Drop suid and sgid bits as a part of fpunch
        fs/ntfs3: Add file_modified
        fs/ntfs3: Correct use bh_read
        ...
      a2343df3
  10. 09 Feb, 2024 16 commits
    • work around gcc bugs with 'asm goto' with outputs · 4356e9f8
      Linus Torvalds authored
      We've had issues with gcc and 'asm goto' before, and we created a
      'asm_volatile_goto()' macro for that in the past: see commits
      3f0116c3 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
      bug") and a9f18034 ("compiler/gcc4: Make quirk for
      asm_volatile_goto() unconditional").
      
      Then, much later, we ended up removing the workaround in commit
      43c249ea ("compiler-gcc.h: remove ancient workaround for gcc PR
      58670") because we no longer supported building the kernel with the
      affected gcc versions, but we left the macro uses around.
      
      Now, Sean Christopherson reports a new version of a very similar
      problem, which is fixed by re-applying that ancient workaround.  But the
      problem in question is limited to only the 'asm goto with outputs'
      cases, so instead of re-introducing the old workaround as-is, let's
      rename and limit the workaround to just that much less common case.
      
      It looks like there are at least two separate issues that all hit in
      this area:
      
       (a) some versions of gcc don't mark the asm goto as 'volatile' when it
           has outputs:
      
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
      
           which is easy to work around by just adding the 'volatile' by hand.
      
       (b) Internal compiler errors:
      
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
      
           which are worked around by adding the extra empty 'asm' as a
           barrier, as in the original workaround.
      
      but the problem Sean sees may be a third thing since it involves bad
      code generation (not an ICE) even with the manually added 'volatile'.
      
      but the same old workaround works for this case, even if this feels a
      bit like voodoo programming and may only be hiding the issue.

      Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Andrew Pinski <quic_apinski@quicinc.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4356e9f8
    • smb3: clarify mount warning · a5cc98eb
      Steve French authored
      When a user tries to use the "sec=krb5p" mount parameter to encrypt
      data on connection to a server (when authenticating with Kerberos), we
      indicate that it is not supported, but do not note the equivalent
      recommended mount parameter ("sec=krb5,seal") which turns on encryption
      for that mount (and uses Kerberos for auth).  Update the warning message.

      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      a5cc98eb
    • cifs: handle cases where multiple sessions share connection · a39c757b
      Shyam Prasad N authored
      Based on our implementation of multichannel, it is entirely
      possible that a server struct may not be found in any channel
      of an SMB session.
      
      In such cases, we should be prepared to move on and search for
      the server struct in the next session.

      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      a39c757b
    • cifs: change tcon status when need_reconnect is set on it · c6e02eef
      Shyam Prasad N authored
      When a tcon is marked for need_reconnect, the intention
      is to have it reconnected.
      
      This change adjusts tcon->status in cifs_tree_connect
      when need_reconnect is set. Also, this change has a minor
      correction in resetting need_reconnect on success. It makes
      sure that it is done with tc_lock held.

      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      c6e02eef
    • Merge tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 9ed18b0b
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - fix missing TLB flush during early boot on SPARSEMEM_VMEMMAP
         configurations
      
       - fixes to correctly implement the break-before-make behavior required
         by the ISA for NAPOT mappings
      
       - fix a missing TLB flush on intermediate mapping changes
      
       - fix build warning about a missing declaration of overflow_stack
      
       - fix performance regression related to incorrect tracking of completed
         batch TLB flushes
      
      * tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix arch_tlbbatch_flush() by clearing the batch cpumask
        riscv: declare overflow_stack as exported from traps.c
        riscv: Fix arch_hugetlb_migration_supported() for NAPOT
        riscv: Flush the tlb when a page directory is freed
        riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled
        riscv: Fix set_huge_pte_at() for NAPOT mapping
        riscv: mm: execute local TLB flush after populating vmemmap
      9ed18b0b
    • Merge tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ca8a6673
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix broken direct trampolines being called when another callback is
         attached to the same function.
      
         ARM 64 does not support FTRACE_WITH_REGS, and when it added direct
         trampoline calls from ftrace, it removed the "WITH_REGS" flag from
         the ftrace_ops for direct trampolines. This broke x86 as x86 requires
         direct trampolines to have WITH_REGS.
      
         This wasn't noticed because direct trampolines work as long as the
         function it is attached to is not shared with other callbacks (like
         the function tracer). When there are other callbacks, a helper
         trampoline is called, to call all the non direct callbacks and when
         it returns, the direct trampoline is called.
      
         For x86, the direct trampoline sets a flag in the regs field to tell
         the x86 specific code to call the direct trampoline. But this only
         works if the ftrace_ops had WITH_REGS set. ARM does things
         differently that does not require this. For now, set WITH_REGS if the
         arch supports WITH_REGS (which ARM does not), and this makes it work
         for both ARM64 and x86.
      
       - Fix wasted memory in the saved_cmdlines logic.
      
         The saved_cmdlines is a cache that maps PIDs to COMMs that tracing
         can use. Most trace events only save the PID in the event. The
         saved_cmdlines file lists PIDs to COMMs so that the tracing tools can
         show an actual name and not just a PID for each event. There's an
         array of PIDs that map to a small set of saved COMM strings. The
         array is set to PID_MAX_DEFAULT which is usually set to 32768. When a
         PID comes in, it will add itself to this array along with the index
         into the COMM array (note if the system allows more than
         PID_MAX_DEFAULT, this cache is similar to cache lines as an update of
         a PID that has the same PID_MAX_DEFAULT bits set will flush out
         another task with the same matching bits set).
      
         A while ago, the size of this cache was changed to be dynamic and the
         array was moved into a structure and created with kmalloc(). But this
         new structure had the size of 131104 bytes, or 0x20020 in hex. As
         kmalloc allocates in powers of two, it was actually allocating
         0x40000 bytes (262144) leaving 131040 bytes of wasted memory. The
         last element of this structure was a pointer to the COMM string array
         which defaulted to just saving 128 COMMs.
      
         By changing the last field of this structure to a variable length
         string, and just having it round up to fill the allocated memory, the
         default size of the saved COMM cache is now 8190. This not only uses
         the wasted space, but actually saves space by removing the extra
         allocation for the COMM names.
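
      The arithmetic in the saved_cmdlines item can be checked with a short
      sketch.  Assumptions, none of which come from the kernel sources here:
      the helper names are hypothetical, allocations are modeled as rounding up
      to the next power of two as the text describes, each COMM takes 16 bytes
      (the kernel's TASK_COMM_LEN), and the fixed part of the structure is
      taken to be 131096 bytes, i.e. the old 131104-byte structure minus its
      trailing 8-byte pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TASK_COMM_LEN 16  /* bytes per saved COMM string */

/* Round n up to the next power of two, modeling the allocator's sizing. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	assert(n <= SIZE_MAX / 2 + 1);  /* keep the shift from overflowing */
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes allocated but unused for a structure of the given size. */
static size_t wasted_bytes(size_t struct_size)
{
	return round_up_pow2(struct_size) - struct_size;
}

/* COMM slots that fit when the tail of the allocation holds the strings. */
static size_t comm_slots(size_t alloc_size, size_t header_size)
{
	assert(header_size <= alloc_size);
	return (alloc_size - header_size) / TASK_COMM_LEN;
}
```

      Under those assumptions, a 131104-byte structure rounds up to 262144
      bytes and wastes 131040, and reusing the tail for strings yields
      (262144 - 131096) / 16 = 8190 slots, matching the figures quoted above.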
      
      * tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix wasted memory in saved_cmdlines logic
        ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
      ca8a6673
    • Merge tag 'probes-fixes-v6.8-rc3' of... · 6dc512a0
      Linus Torvalds authored
      Merge tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull probes fixes from Masami Hiramatsu:
      
       - remove unnecessary initial values of kprobes local variables
      
       - probe-events parser bug fixes:
      
          - calculate the argument size and format string after setting type
            information from BTF, because BTF can change the size and format
            string.
      
          - show $comm parse error correctly instead of failing silently.
      
      * tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        kprobes: Remove unnecessary initial values of variables
        tracing/probes: Fix to set arg size and fmt after setting type from BTF
        tracing/probes: Fix to show a parse error for bad type for $comm
      6dc512a0
    • Merge tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · e6f39a90
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
       "The only notable change here is the patch that changes the way we deal
        with spurious errors from the EFI memory attribute protocol. This will
        be backported to v6.6, and is intended to ensure that we will not
        paint ourselves into a corner when we tighten this further in order to
        comply with MS requirements on signed EFI code.
      
        Note that this protocol does not currently exist in x86 production
        systems in the field, only in Microsoft's fork of OVMF, but it will be
        mandatory for Windows logo certification for x86 PCs in the future.
      
         - Tighten ELF relocation checks on the RISC-V EFI stub
      
         - Give up if the new EFI memory attributes protocol fails spuriously
           on x86
      
         - Take care not to place the kernel in the lowest 16 MB of DRAM on
           x86
      
         - Omit special purpose EFI memory from memblock
      
         - Some fixes for the CXL CPER reporting code
      
         - Make the PE/COFF layout of mixed-mode capable images comply with a
           strict interpretation of the spec"
      
      * tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
        cxl/trace: Remove unnecessary memcpy's
        cxl/cper: Fix errant CPER prints for CXL events
        efi: Don't add memblocks for soft-reserved memory
        efi: runtime: Fix potential overflow of soft-reserved region size
        efi/libstub: Add one kernel-doc comment
        x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
        x86/efistub: Give up if memory attribute protocol returns an error
        riscv/efistub: Tighten ELF relocation check
        riscv/efistub: Ensure GP-relative addressing is not used
      e6f39a90
    • Merge tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 5ddfc246
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Fix an unintentional truncation of DWC MSI-X address to 32 bits and
         update similar MSI code to match (Dan Carpenter)
      
      * tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        PCI: dwc: Clean up dw_pcie_ep_raise_msi_irq() alignment
        PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
      5ddfc246
    • Merge tag 'hwmon-for-v6.8-rc4' of... · 5ca243c2
      Linus Torvalds authored
      Merge tag 'hwmon-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - coretemp: Various fixes, and increase number of supported CPU cores
      
       - aspeed-pwm-tacho: Add missing mutex protection
      
      * tag 'hwmon-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (coretemp) Enlarge per package core count limit
        hwmon: (coretemp) Fix bogus core_id to attr name mapping
        hwmon: (coretemp) Fix out-of-bounds memory access
        hwmon: (aspeed-pwm-tacho) mutex for tach reading
      5ca243c2
    • Merge tag 'mmc-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · eb747bcc
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - Allow non-sleeping read-only slot-gpio
      
        MMC host:
         - sdhci-pci-o2micro: Fix a warm reboot BIOS issue"
      
      * tag 'mmc-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: slot-gpio: Allow non-sleeping GPIO ro
        mmc: sdhci-pci-o2micro: Fix a warm reboot issue that disk can't be detected by BIOS
      eb747bcc
    • Merge tag 'pmdomain-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm · 3760081f
      Linus Torvalds authored
      Pull pmdomain fixes from Ulf Hansson:
       "Core:
         - Move the unused cleanup to a _sync initcall
      
        Providers:
         - mediatek: Fix race conditions at probe/remove with genpd
         - renesas: r8a77980-sysc: CR7 must be always on"
      
      * tag 'pmdomain-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
        pmdomain: mediatek: fix race conditions with genpd
        pmdomain: renesas: r8a77980-sysc: CR7 must be always on
        pmdomain: core: Move the unused cleanup to a _sync initcall
      3760081f
    • Merge tag 'gpio-fixes-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 4a8e4b3c
      Linus Torvalds authored
      Pull gpio fix from Bartosz Golaszewski:
      
       - remove the new GPIO device from the global list unconditionally in
         error path in core GPIOLIB
      
      * tag 'gpio-fixes-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: remove GPIO device from the list unconditionally in error path
      4a8e4b3c
    • Merge tag 'drm-fixes-2024-02-09' of git://anongit.freedesktop.org/drm/drm · c76b766e
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Regular weekly fixes, xe, amdgpu and msm are most of them, with some
        misc in i915, ivpu and nouveau, scattered but nothing too intense at
        this point.
      
        i915:
         - gvt: docs fix, uninit var, MAINTAINERS
      
        ivpu:
         - add aborted job status
         - disable d3 hot delay
         - mmu fixes
      
        nouveau:
         - fix gsp rpc size request
         - fix dma buffer leaks
         - use common code for gsp mem ctor
      
        xe:
         - Fix a loop in an error path
         - Fix a missing dma-fence reference
         - Fix a retry path on userptr REMAP
         - Workaround for a false gcc warning
         - Fix missing map of the usm batch buffer in the migrate vm
         - Fix a memory leak
         - Fix a bad assumption of used page size
         - Fix hitting a BUG() due to zero pages to map
         - Remove some leftover async bind queue relics
      
        amdgpu:
         - Misc NULL/bounds check fixes
         - ODM pipe policy fix
         - Aborted suspend fixes
         - JPEG 4.0.5 fix
         - DCN 3.5 fixes
         - PSP fix
         - DP MST fix
         - Phantom pipe fix
         - VRAM vendor fix
         - Clang fix
         - SR-IOV fix
      
        msm:
         - DPU:
            - fix for kernel doc warnings and smatch warnings in dpu_encoder
            - fix for smatch warning in dpu_encoder
            - fix the bus bandwidth value for SDM670
         - DP:
            - fixes to handle unknown bpc case correctly for DP
            - fix for MISC0 programming
         - GPU:
            - dmabuf vmap fix
            - a610 UBWC corruption fix (incorrect hbb)
            - revert a commit that was making GPU recovery unreliable"
      
      * tag 'drm-fixes-2024-02-09' of git://anongit.freedesktop.org/drm/drm: (43 commits)
        drm/xe: Remove TEST_VM_ASYNC_OPS_ERROR
        drm/xe/vm: don't ignore error when in_kthread
        drm/xe: Assume large page size if VMA not yet bound
        drm/xe/display: Fix memleak in display initialization
        drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool
        drm/xe: circumvent bogus stringop-overflow warning
        drm/xe: Pick correct userptr VMA to repin on REMAP op failure
        drm/xe: Take a reference in xe_exec_queue_last_fence_get()
        drm/xe: Fix loop in vm_bind_ioctl_ops_unwind
        drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
        drm/amd/display: Implement bounds check for stream encoder creation in DCN301
        drm/amd/display: Increase frame-larger-than for all display_mode_vba files
        drm/amd/display: Clear phantom stream count and plane count
        drm/amdgpu: Avoid fetching VRAM vendor info
        drm/amd/display: Disable ODM by default for DCN35
        drm/amd/display: Update phantom pipe enable / disable sequence
        drm/amd/display: Fix MST Null Ptr for RV
        drm/amdgpu: Fix shared buff copy to user
        drm/amd/display: Increase eval/entry delay for DCN35
        drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend
        ...
      c76b766e
    • x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6 · f6a18925
      Aleksander Mazur authored
      The kernel built with MCRUSOE is unbootable on Transmeta Crusoe.  It shows
      the following error message:
      
        This kernel requires an i686 CPU, but only detected an i586 CPU.
        Unable to boot - please use a kernel appropriate for your CPU.
      
      Remove MCRUSOE from the condition introduced by the commit named in Fixes,
      effectively changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which
      matches the CPU family given by CPUID.
      
        [ bp: Massage commit message. ]
      
      Fixes: 25d76ac8 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig")
      Signed-off-by: Aleksander Mazur <deweloper@wp.pl>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      Link: https://lore.kernel.org/r/20240123134309.1117782-1-deweloper@wp.pl
      f6a18925
    • tracing: Fix wasted memory in saved_cmdlines logic · 44dc5c41
      Steven Rostedt (Google) authored
      While looking at improving the saved_cmdlines cache I found a huge amount
      of wasted memory that should be used for the cmdlines.
      
      The tracing data saves pids during the trace. At sched switch, if a trace
      occurred, it will save the comm of the task that did the trace. This is
      saved in a "cache" that maps pids to comms and exposed to user space via
      the /sys/kernel/tracing/saved_cmdlines file. By default, it currently
      caches only 128 comms.
      
      The structure that uses this creates an array to store the pids using
      PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure
      to be 131104 bytes in size on 64-bit machines.
      
      In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
      powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
      this structure. That leaves 131040 bytes of wasted space.
      
      Worse, the structure points to a separately allocated array to store the
      comm names, which takes 16 bytes times the number of names to save
      (currently 128), or 2048 bytes. Instead of allocating a separate array,
      make the structure end with a variable length string and use the extra
      space for that.
      
      This is similar to a recommendation that Linus had made about eventfs_inode names:
      
        https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/
      
      Instead of allocating a separate string array to hold the saved comms,
      have the structure end with: char saved_cmdlines[]; and round up to the
      next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN.
      It will use this extra space for the saved_cmdline portion.
      
      Now, instead of saving only 128 comms by default, by using this wasted
      space at the end of the structure it can save over 8000 comms and even
      saves space by removing the need for allocating the other array.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Vincent Donnefort <vdonnefort@google.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Mete Durlu <meted@linux.ibm.com>
      Fixes: 939c7a4f ("tracing: Introduce saved_cmdlines_size file")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      44dc5c41