1. 19 Mar, 2024 33 commits
  2. 10 Mar, 2024 7 commits
    • Linux 6.8 · e8f897f4
      Linus Torvalds authored
      e8f897f4
    • Merge tag 'trace-ring-buffer-v6.8-rc7' of... · fa4b851b
      Linus Torvalds authored
      Merge tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
      
       - Do not allow large strings (> 4096) as single write to trace_marker
      
         The size of a string written into trace_marker was determined by the
         size of the sub-buffer in the ring buffer. That size is dependent on
         the PAGE_SIZE of the architecture as it can be mapped into user
         space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit of
         a string written into trace_marker 64K.
      
         One of the selftests looks at the size of the ring buffer sub-buffers
         and writes that plus more into the trace_marker. The write will take
         what it can and report back what it consumed so that the user space
         application (like echo) will write the rest of the string. The string
         is stored in the ring buffer and can be read via the "trace" or
         "trace_pipe" files.
      
         The reading of the ring buffer uses vsnprintf(), which uses a
         precision "%.*s" to make sure it only reads what is stored in the
         buffer, as a bug could cause the string to be non-terminated.
      
         With the combination of the precision change and the PAGE_SIZE of 64K
         allowing huge strings to be added into the ring buffer, plus the test
         that would actually stress that limit, a bug was reported that the
         precision used was too big for "%.*s" as the string was close to 64K
         in size and the max precision of vsnprintf is 32K.
      
         Linus suggested not to have that precision as it could hide a bug if
         the string was again stored without a nul byte.
      
         Another issue that was brought up is that the trace_seq buffer is
         also based on PAGE_SIZE even though it is not tied to the
         architecture limit like the ring buffer sub-buffer is. Having it be
         64K * 2 is simply too big and wastes memory on systems with 64K
         page sizes. It is now hardcoded to 8K, which is what architectures
         with a 4K PAGE_SIZE already use.
      
         Finally, the write to trace_marker is now limited to 4K as there is
         no reason to write larger strings into trace_marker.
      
       - ring_buffer_wait() should not loop.
      
         The ring_buffer_wait() does not have the full context (yet) on if it
         should loop or not. Just exit the loop as soon as it's woken up and
         let the callers decide to loop or not (they already do, so it's a bit
         redundant).
      
       - Fix shortest_full field to be the smallest amount in the ring buffer
         that a waiter is waiting for. The "shortest_full" field is updated
         when a new waiter comes in and wants to wait for a smaller amount of
         data in the ring buffer than other waiters. But after all waiters are
         woken up, it's not reset, so if another waiter comes in wanting to
         wait for more data, it will be woken up when the ring buffer has a
         smaller amount than what the previous waiters were waiting for.
      
       - The wake-up of all waiters on close is incorrectly called from
         .release() and not from .flush(), so it will never wake up any waiters,
         as .release() will not get called until all .read() calls are
         finished. And the wakeup is for the waiters in those .read() calls.
      
      * tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Use .flush() call to wake up readers
        ring-buffer: Fix resetting of shortest_full
        ring-buffer: Fix waking up ring buffer readers
        tracing: Limit trace_marker writes to just 4K
        tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE
        tracing: Remove precision vsnprintf() check from print event
      fa4b851b
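      A minimal user-space sketch of the "%.*s" behaviour described in the first
      item above (illustrative only, not the kernel code): the precision caps how
      many bytes are read from a buffer that may not be nul-terminated, which is
      why dropping it is only safe once the stored string is guaranteed to be
      terminated.

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                /* A 16-byte buffer that is deliberately NOT nul-terminated. */
                char raw[16];
                char out[64];

                memset(raw, 'A', sizeof(raw));

                /*
                 * "%.*s" takes the precision as an int argument: at most that
                 * many bytes of 'raw' are read, so the missing terminator is
                 * harmless here. Without the precision, "%s" would keep reading
                 * past the end of 'raw' until it happened to find a 0 byte.
                 */
                snprintf(out, sizeof(out), "%.*s", (int)sizeof(raw), raw);
                printf("bounded copy: '%s' (len=%zu)\n", out, strlen(out));

                return 0;
        }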
    • Merge tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · 210ee636
      Linus Torvalds authored
      Pull phy fixes from Vinod Koul:
      
       - fixes for the Qualcomm qmp-combo driver for the ordering of drm and
         type-c switch registration, since drivers must not probe defer after
         having registered child devices, to avoid triggering a probe deferral
         loop.
      
         This fixes internal display on Lenovo ThinkPad X13s
      
      * tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
        phy: qcom-qmp-combo: fix type-c switch registration
        phy: qcom-qmp-combo: fix drm bridge registration
      210ee636
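      A generic sketch of the ordering rule behind these two fixes (a hypothetical
      driver, not the qmp-combo code; example_register_children() is a stand-in for
      the drm-bridge and type-c switch registration): everything that can return
      -EPROBE_DEFER is acquired before any child device is registered, so a deferral
      can no longer tear down and re-create the children in a loop.

        #include <linux/clk.h>
        #include <linux/device.h>
        #include <linux/err.h>
        #include <linux/platform_device.h>

        /* Stand-in for registering child devices (drm bridge, type-c switch). */
        static int example_register_children(struct device *dev)
        {
                return 0;
        }

        static int example_phy_probe(struct platform_device *pdev)
        {
                struct clk *ref;

                /* 1) Acquire every resource that may defer the probe... */
                ref = devm_clk_get(&pdev->dev, "ref");
                if (IS_ERR(ref))
                        return dev_err_probe(&pdev->dev, PTR_ERR(ref),
                                             "failed to get ref clock\n");

                /* 2) ...and only register child devices once nothing can defer. */
                return example_register_children(&pdev->dev);
        }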
    • tracing: Use .flush() call to wake up readers · e5d7c191
      Steven Rostedt (Google) authored
      The .release() function does not get called until all readers of a file
      descriptor are finished.
      
      If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
      and another thread closes the file descriptor, it will not wake up the
      other thread as ring_buffer_wake_waiters() is called by .release(), and
      that will not get called until the .read() is finished.
      
       The issue originally showed up in trace-cmd, but the readers are actually
       other processes with their own file descriptors. So calling close() would
       wake up the other tasks because they are blocked on a different descriptor
       than the one that was closed. But there are other wake-ups that solve that
       issue.
      
       When a thread is blocked on a read, it can still hang even after another
       thread has closed its descriptor.
      
      This is what the .flush() callback is for. Have the .flush() wake up the
      readers.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linke li <lilinke99@qq.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: f3ddb74a ("tracing: Wake up ring buffer waiters on closing of the file")
       Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      e5d7c191
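      A minimal sketch of the shape of this fix (the example_* names are
      placeholders, not the actual tracing symbols): .flush() is invoked by the VFS
      on every close() of the descriptor, before .release(), so that is where the
      still-blocked readers can be woken.

        #include <linux/fs.h>
        #include <linux/ring_buffer.h>

        /* Placeholder for the per-file reader state (sketch only). */
        struct example_iter {
                struct trace_buffer     *buffer;
                int                     cpu_file;
        };

        static int example_buffers_flush(struct file *file, fl_owner_t id)
        {
                struct example_iter *iter = file->private_data;

                /* Kick any reader still blocked in a .read() on this buffer. */
                ring_buffer_wake_waiters(iter->buffer, iter->cpu_file);
                return 0;
        }

        static const struct file_operations example_buffers_fops = {
                .flush  = example_buffers_flush,
                /* .read, .release, etc. as in the real file_operations */
        };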
    • ring-buffer: Fix resetting of shortest_full · 68282dd9
      Steven Rostedt (Google) authored
      The "shortest_full" variable is used to keep track of the waiter that is
      waiting for the smallest amount on the ring buffer before being woken up.
       When a task waits on the ring buffer, it passes in a "full" value that is
      a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
      100% full buffer.
      
      As all waiters are on the same wait queue, the wake up happens for the
      waiter with the smallest percentage.
      
       The problem is that the shortest_full on the cpu_buffer that stores the
      smallest amount doesn't get reset when all the waiters are woken up. It
      does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
      
       This means that tasks may be woken up more often than they want to
      be. Instead, have the shortest_full field get reset just before waking up
      all the tasks. If the tasks wait again, they will update the shortest_full
      before sleeping.
      
      Also add locking around setting of shortest_full in the poll logic, and
      change "work" to "rbwork" to match the variable name for rb_irq_work
      structures that are used in other places.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.948914369@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linke li <lilinke99@qq.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: 2c2b0a78 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
       Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      68282dd9
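      A simplified sketch of the reset described above (the example_* structures
      are placeholders for the ring-buffer internals): shortest_full is cleared
      just before wake_up_all(), so the next waiter establishes a fresh minimum
      instead of inheriting the old, smaller watermark.

        #include <linux/wait.h>

        struct example_irq_work {
                wait_queue_head_t       full_waiters;
                bool                    full_waiters_pending;
        };

        struct example_cpu_buffer {
                int     shortest_full;  /* smallest % a waiter asked for */
        };

        static void example_wake_full_waiters(struct example_cpu_buffer *cpu_buffer,
                                              struct example_irq_work *rbwork)
        {
                /*
                 * Reset the watermark before waking everyone; waiters that go
                 * back to sleep will set it again from their own "full" value.
                 */
                cpu_buffer->shortest_full = 0;
                rbwork->full_waiters_pending = false;

                wake_up_all(&rbwork->full_waiters);
        }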
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 137e0ec0
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "KVM GUEST_MEMFD fixes for 6.8:
      
         - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
           to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
           writable from userspace, so there would be no way to write to a
           read-only guest_memfd).
      
         - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
           clear that such VMs are purely for development and testing.
      
         - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term
           plan is to support confidential VMs with deterministic private
           memory (SNP and TDX) only in the TDP MMU.
      
         - Fix a bug in a GUEST_MEMFD dirty logging test that caused false
           passes.
      
        x86 fixes:
      
         - Fix missing marking of a guest page as dirty when emulating an
           atomic access.
      
         - Check for mmu_notifier invalidation events before faulting in the
           pfn, and before acquiring mmu_lock, to avoid unnecessary work and
           lock contention with preemptible kernels (including
           CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).
      
         - Disable AMD DebugSwap by default; it breaks VMSA signing and will
           be re-enabled with a better VM creation API in 6.10.
      
         - Do the cache flush of converted pages in svm_register_enc_region()
           before dropping kvm->lock, to avoid a race with unregistering of
           the same region and the consequent use-after-free issue"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        SEV: disable SEV-ES DebugSwap by default
        KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
        KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
        KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
        KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
        KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
        KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
        KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
        KVM: x86: Mark target gfn of emulated atomic instruction as dirty
      137e0ec0
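      A simplified sketch of the kind of flag validation the first GUEST_MEMFD item
      describes (not the exact KVM code; example_check_memslot_flags() is a
      placeholder name): a memslot cannot be both guest_memfd-backed and read-only,
      since guest_memfd memory is not writable from userspace, so the combination
      is rejected up front.

        #include <linux/errno.h>
        #include <linux/kvm_host.h>

        static int example_check_memslot_flags(const struct kvm_userspace_memory_region2 *mem)
        {
                u32 flags = mem->flags;

                /* A read-only guest_memfd slot would be unwritable from either side. */
                if ((flags & KVM_MEM_GUEST_MEMFD) && (flags & KVM_MEM_READONLY))
                        return -EINVAL;

                return 0;
        }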
    • ring-buffer: Fix waking up ring buffer readers · b3594573
      Steven Rostedt (Google) authored
      A task can wait on a ring buffer for when it fills up to a specific
      watermark. The writer will check the minimum watermark that waiters are
      waiting for and if the ring buffer is past that, it will wake up all the
      waiters.
      
      The waiters are in a wait loop, and will first check if a signal is
      pending and then check if the ring buffer is at the desired level where it
      should break out of the loop.
      
      If a file that uses a ring buffer closes, and there's threads waiting on
      the ring buffer, it needs to wake up those threads. To do this, a
      "wait_index" was used.
      
      Before entering the wait loop, the waiter will read the wait_index. On
      wakeup, it will check if the wait_index is different than when it entered
      the loop, and will exit the loop if it is. The waker will only need to
      update the wait_index before waking up the waiters.
      
      This had a couple of bugs. One trivial one and one broken by design.
      
      The trivial bug was that the waiter checked the wait_index after the
      schedule() call. It had to be checked between the prepare_to_wait() and
      the schedule() which it was not.
      
      The main bug is that the first check to set the default wait_index will
      always be outside the prepare_to_wait() and the schedule(). That's because
      the ring_buffer_wait() doesn't have enough context to know if it should
      break out of the loop.
      
       The loop itself is not needed, because all the callers of
       ring_buffer_wait() also have their own loops, as the callers have a better
      sense of what the context is to decide whether to break out of the loop
      or not.
      
      Just have the ring_buffer_wait() block once, and if it gets woken up, exit
      the function and let the callers decide what to do next.
      
      Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linke li <lilinke99@qq.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: e30f53aa ("tracing: Do not busy wait in buffer splice")
       Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      b3594573
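      A sketch of the wait pattern this commit is about (a generic kernel idiom with
      placeholder names, not the exact ring_buffer_wait() code): the condition is
      checked between prepare_to_wait() and schedule(), and the function blocks only
      once, leaving any retry loop to the caller.

        #include <linux/sched.h>
        #include <linux/sched/signal.h>
        #include <linux/wait.h>

        /* Block once; return and let the caller decide whether to wait again. */
        static int example_wait_once(wait_queue_head_t *wq,
                                     bool (*cond)(void *data), void *data)
        {
                DEFINE_WAIT(wait);
                int ret = 0;

                prepare_to_wait(wq, &wait, TASK_INTERRUPTIBLE);

                /*
                 * Check the condition after prepare_to_wait() and before
                 * schedule(): a wakeup racing with this check leaves the task
                 * runnable, so schedule() returns instead of sleeping through it.
                 */
                if (!cond(data)) {
                        if (signal_pending(current))
                                ret = -EINTR;
                        else
                                schedule();
                }

                finish_wait(wq, &wait);
                return ret;
        }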