1. 20 Dec, 2023 2 commits
    • x86/asm: Replace magic numbers in GDT descriptors, preparations · 41ef75c8
      Vegard Nossum authored
      We'd like to replace all the magic numbers in various GDT descriptors
      with new, semantically meaningful, symbolic values.
      
      In order to be able to verify that the change doesn't cause any actual
      changes to the compiled binary code, I've split the change into two
      patches:
      
       - Part 1 (this commit): everything _but_ actually replacing the numbers
       - Part 2 (the following commit): _only_ replacing the numbers
      
      The reason we need this split for verification is that including new
      headers causes some spurious changes to the object files, mostly line
      number changes in the debug info but occasionally other subtle codegen
      changes.

      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
    • x86/asm: Provide new infrastructure for GDT descriptors · 016919c1
      Vegard Nossum authored
      Linus suggested replacing the magic numbers in the GDT descriptors
      using preprocessor macros. Designing the interface properly is actually
      pretty hard -- there are several constraints:
      
      - you want the final expressions to be readable at a glance; something
        like GDT_ENTRY_FLAGS(5, 1, 0, 1, 0, 1, 1, 0) isn't because you need
        to visit the definition to understand what each parameter represents
        and then match up parameters in the user and the definition (which is
        hard when there are so many of them)
      
      - you want the final expressions to be fairly short/information-dense;
        something like GDT_ENTRY_PRESENT | GDT_ENTRY_DATA_WRITABLE |
        GDT_ENTRY_SYSTEM | GDT_ENTRY_DB | GDT_ENTRY_GRANULARITY_4K is a bit
        too verbose to write out every time and is actually hard to read as
        well because of all the repetition
      
      - you may want to assume defaults for some things (e.g. entries are
        DPL-0 a.k.a. kernel segments by default) and allow the user to
        override the default -- but this works best if you can OR in the
        override; if you want DPL-3 by default and override with DPL-0 you
        would need to start masking off bits instead of OR-ing them in and
        that just becomes harder to read
      
      - you may want to parameterize some things (e.g. CODE vs. DATA or
        KERNEL vs. USER) since both values are used and you don't really
        want to prefer either one by default -- or DPL, which always has
        some value and must always be specified
      
      This patch tries to balance these requirements and has two layers of
      definitions -- low-level and high-level:
      
      - the low-level defines are the mapping between human-readable names
        and the actual bit numbers
      
      - the high-level defines are the mapping from high-level intent to
        combinations of low-level flags, representing roughly a tuple
        (data/code/tss, 64/32/16-bits) plus an override for DPL-3 (= USER),
        since that's relatively rare but still very important to mark
        properly for those segments.
      
      - we have *_BIOS variants for 32-bit code and data segments that don't
        have the G flag set and give the limit in terms of bytes instead of
        pages
      
      [ mingo: Improved readability a bit more. ]
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/20231219151200.2878271-2-vegard.nossum@oracle.com
  2. 17 Dec, 2023 10 commits
  3. 16 Dec, 2023 3 commits
    • Merge tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 3b8a9b2e
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix eventfs to check creating new files for events with names greater
         than NAME_MAX. The eventfs lookup needs to check the return result of
         simple_lookup().
      
       - Fix the ring buffer to check the proper max data size. Events must be
         able to fit on the ring buffer sub-buffer; if an event cannot, it
         fails to be written and the logic to add the event is avoided. The
         code to check if an event can fit failed to add the possible absolute
         timestamp, which may make the event not be able to fit. This causes
         the ring buffer to go into an infinite loop trying to find a
         sub-buffer that would fit the event. Luckily, there's a check that
         will bail out if it has looped more than 1000 times, and it also warns.
      
         The real fix is not to add the absolute timestamp to an event that is
         starting at the beginning of a sub-buffer because it uses the
         sub-buffer timestamp.
      
         Avoiding the timestamp at the start of the sub-buffer allows
         events that pass the first check to always find a sub-buffer that
         they can fit on.
      
       - Have large events that do not fit on a trace_seq to print "LINE TOO
         BIG" like it does for the trace_pipe instead of what it does now
         which is to silently drop the output.
      
       - Fix a memory leak of forgetting to free the spare page that is saved
         by a trace instance.
      
       - Update the size of the snapshot buffer when the main buffer is
         updated if the snapshot buffer is allocated.
      
       - Fix ring buffer timestamp logic by removing all the places that tried
         to put the before_stamp back to the write stamp so that the next
         event doesn't add an absolute timestamp. Each of these updates
         added a race: by making the two timestamps equal, it validated the
         write_stamp, which could then be incorrectly used for calculating
         the delta of an event.
      
       - There's a temp buffer used for printing the event that was using the
         event data size for allocation when it needed to use the size of the
         entire event (meta-data and payload data).
      
       - For hardening, use "%.*s" for printing the trace_marker output, to
         limit the amount that is printed by the size of the event. This was
         discovered by development that added a bug that truncated the '\0'
         and caused a crash.
      
       - Fix a use-after-free bug in the use of the histogram files when an
         instance is being removed.
      
       - Remove a useless update in the rb_try_to_discard of the write_stamp.
         The before_stamp was already changed to force the next event to add
         an absolute timestamp, so the write_stamp is not used. But the
         write_stamp is modified again using an unneeded 64-bit cmpxchg.
      
       - Fix several races in the 32-bit implementation of the
         rb_time_cmpxchg() that does a 64-bit cmpxchg.
      
       - While looking at fixing the 64-bit cmpxchg, I noticed that the ring
         buffer uses normal cmpxchg, which can be called in NMI context,
         and that some architectures do not have a working cmpxchg in NMI
         context. For these architectures, fail recording
         events that happen in NMI context.
      
      * tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
        ring-buffer: Have rb_time_cmpxchg() set the msb counter too
        ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
        ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
        ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
        ring-buffer: Do not try to put back write_stamp
        tracing: Fix uaf issue when open the hist or hist_debug file
        tracing: Add size check when printing trace_marker output
        ring-buffer: Have saved event hold the entire event
        ring-buffer: Do not update before stamp when switching sub-buffers
        tracing: Update snapshot buffer on resize if it is allocated
        ring-buffer: Fix memory leak of free page
        eventfs: Fix events beyond NAME_MAX blocking tasks
        tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
        ring-buffer: Fix writing to the buffer with max_data_size
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · c8e97fc6
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Arm CMN perf: fix the DTC allocation failure path which can end up
         erroneously clearing live counters
      
       - arm64/mm: fix hugetlb handling of the dirty page state leading to a
         continuous fault loop in user space on hardware without dirty bit
         management (DBM). That's caused by the dirty+writeable information
         not being properly preserved across a sequence of mprotect(PROT_NONE),
         mprotect(PROT_READ|PROT_WRITE) calls
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
        perf/arm-cmn: Fail DTC counter allocation correctly
    • Merge tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 2e3f280b
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Limit Max_Read_Request_Size (MRRS) on some MIPS Loongson systems
         because they don't all support MRRS > 256, and firmware doesn't
         always initialize it correctly, which meant some PCIe devices didn't
         work (Jiaxun Yang)
      
       - Add and use pci_enable_link_state_locked() to prevent potential
         deadlocks in vmd and qcom drivers (Johan Hovold)
      
       - Revert recent (v6.5) acpiphp resource assignment changes that fixed
         issues with hot-adding devices on a root bus or with large BARs, but
         introduced new issues with GPU initialization and hot-adding SCSI
         disks in QEMU VMs (Bjorn Helgaas)
      
      * tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
        PCI/ASPM: Add pci_disable_link_state_locked() lockdep assert
        PCI/ASPM: Clean up __pci_disable_link_state() 'sem' parameter
        PCI: qcom: Clean up ASPM comment
        PCI: qcom: Fix potential deadlock when enabling ASPM
        PCI: vmd: Fix potential deadlock when enabling ASPM
        PCI/ASPM: Add pci_enable_link_state_locked()
        PCI: loongson: Limit MRRS to 256
  4. 15 Dec, 2023 24 commits
  5. 14 Dec, 2023 1 commit
    • io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation · 1ba0e9d6
      Al Viro authored
      	In 8e9fad0e "io_uring: Add io_uring command support for sockets"
      you've got an include of asm-generic/ioctls.h done in io_uring/uring_cmd.c.
      That had been done for the sake of this chunk -
      +               ret = prot->ioctl(sk, SIOCINQ, &arg);
      +               if (ret)
      +                       return ret;
      +               return arg;
      +       case SOCKET_URING_OP_SIOCOUTQ:
      +               ret = prot->ioctl(sk, SIOCOUTQ, &arg);
      
      SIOC{IN,OUT}Q are defined to symbols (FIONREAD and TIOCOUTQ) that come from
      ioctls.h, all right, but the values vary by the architecture.
      
      FIONREAD is
      	0x467F on mips
      	0x4004667F on alpha, powerpc and sparc
      	0x8004667F on sh and xtensa
      	0x541B everywhere else
      TIOCOUTQ is
      	0x7472 on mips
      	0x40047473 on alpha, powerpc and sparc
      	0x80047473 on sh and xtensa
      	0x5411 everywhere else
      
      ->ioctl() expects the same values it would've gotten from userland; all
      places where we compare with SIOC{IN,OUT}Q are using asm/ioctls.h, so
      they pick the correct values.  io_uring_cmd_sock(), OTOH, ends up
      passing the default ones.
      
      Fixes: 8e9fad0e ("io_uring: Add io_uring command support for sockets")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/r/20231214213408.GT1674809@ZenIV
      Signed-off-by: Jens Axboe <axboe@kernel.dk>