1. 24 Apr, 2024 3 commits
  2. 17 Apr, 2024 5 commits
  3. 15 Apr, 2024 1 commit
  4. 13 Apr, 2024 1 commit
    • vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements · 42bd2af5
      Linus Torvalds authored
         "The definition of insanity is doing the same thing over and over
          again and expecting different results"
      
      We've tried to do this before, most recently with commit bb2314b4
      ("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a
      decade ago.
      
      But the effort goes back even further than that, eg this thread back
      from 1998 that is so old that we don't even have it archived in lore:
      
          https://lkml.org/lkml/1998/3/10/108
      
      which also points out some of the reasons why it's dangerous.
      
      Or, how about then in 2003:
      
          https://lkml.org/lkml/2003/4/6/112
      
      where we went through some of the same arguments, just with different
      people involved.
      
      In particular, having access to a file descriptor does not necessarily
      mean that you have access to the path that was used for lookup, and
      there may be very good reasons why you absolutely must not have access
      to a path to said file.
      
      For example, if we were passed a file descriptor from the outside into
      some limited environment (think chroot, but also user namespaces etc) a
      'flink()' system call could now make that file visible inside a context
      where it's not supposed to be visible.
      
      In the process the user may also be able to re-open it with permissions
      that the original file descriptor did not have (eg a read-only file
      descriptor may be associated with an underlying file that is writable).
      
      Another variation on this is if somebody else (typically root) opens a
      file in a directory that is not accessible to others, and passes the
      file descriptor on as a read-only file.  Again, the access to the file
      descriptor does not imply that you should have access to a path to the
      file in the filesystem.
      
      So while we have tried this several times in the past, it never works.
      
      The last time we did this, that commit bb2314b4 quickly got reverted
      again in commit f0cc6ffb (Revert "fs: Allow unprivileged linkat(...,
      AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once
      the whole discussion about the interface is done".
      
      Well, the discussion is long done, and didn't come to any resolution.
      There's no question that 'flink()' would be a useful operation, but it's
      a dangerous one.
      
      However, it does turn out that since 2008 (commit d76b0d9b: "CRED:
      Use creds in file structs") we have had a fairly straightforward way to
      check whether the file descriptor was opened by the same credentials as
      the credentials of the flink().
      
      That allows the most common patterns that people want to use, which tend
      to be to either open the source carefully (ie using the openat2()
      RESOLVE_xyz flags, and/or checking ownership with fstat() before
      linking), or to use O_TMPFILE and fill in the file contents before it's
      exposed to the world with linkat().
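
      The O_TMPFILE pattern mentioned above can be sketched from user space
      roughly as follows. This is only an illustration: publish_tmpfile() is a
      hypothetical helper name, and the /proc/self/fd fallback is the one
      described in open(2) for kernels where the AT_EMPTY_PATH form is refused.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unnamed file in `dir`, fill it, and only
 * then give it the name `path`. Returns 0 on success, -1 with errno set.
 */
int publish_tmpfile(const char *dir, const char *path,
                    const void *data, size_t len)
{
    /* O_TMPFILE needs write access; leaving out O_EXCL keeps it linkable. */
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* The flink()-style call: name the file that fd itself refers to. */
    int rc = linkat(fd, "", AT_FDCWD, path, AT_EMPTY_PATH);
    if (rc < 0 && errno != EEXIST) {
        /* Fallback from open(2): resolve the fd through /proc/self/fd. */
        char proc[64];
        snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
        rc = linkat(AT_FDCWD, proc, AT_FDCWD, path, AT_SYMLINK_FOLLOW);
    }

    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

      A caller would use it as publish_tmpfile("/tmp", "/tmp/report.txt", buf,
      len); the file has no name, and so cannot be found by path, until the
      final linkat() succeeds.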
      
      But it also means that if the file descriptor was opened by somebody
      else, or we've gone through a credentials change since, the operation no
      longer works (unless we have CAP_DAC_READ_SEARCH capabilities in the
      opener's user namespace, as before).
      
      Note that the credential equality check is done by using pointer
      equality, which means that it's not enough that you have effectively the
      same user - they have to be literally identical, since our credentials
      are using copy-on-write semantics.
      
      So you can't change your credentials to something else and try to change
      it back to the same ones between the open() and the linkat().  This is
      not meant to be some kind of generic permission check, this is literally
      meant as a "the open and link calls are 'atomic' wrt user credentials"
      check.
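
      As a rough user-space model of why pointer equality behaves this way
      (this is not the kernel code; the toy_* names and struct layouts are
      invented for illustration, and the CAP_DAC_READ_SEARCH fallback is left
      out):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT the kernel's struct cred / struct file. */
struct toy_cred {
    unsigned int uid;
    unsigned int gid;
};

struct toy_file {
    const struct toy_cred *f_cred;  /* credentials in effect at open time */
};

/* Copy-on-write: a change yields a fresh object, never an in-place edit. */
struct toy_cred *toy_cred_with_uid(const struct toy_cred *old, unsigned int uid)
{
    struct toy_cred *copy = malloc(sizeof(*copy));
    if (!copy)
        return NULL;
    *copy = *old;
    copy->uid = uid;
    return copy;
}

/* Identity check: the same object, not merely equal field values. */
bool toy_may_flink(const struct toy_file *file, const struct toy_cred *current)
{
    return file->f_cred == current;
}
```

      In this model, changing the uid and then changing it back produces a
      third object whose fields match the original but whose address does
      not, so toy_may_flink() still refuses it.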
      
      It also means that you can't just move things between namespaces,
      because the credentials aren't just a list of uid's and gid's: they
      include the pointer to the user_ns that the capabilities are relative
      to.
      
      So let's try this one more time and see if maybe this approach ends up
      being workable after all.
      
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/20240411001012.12513-1-torvalds@linux-foundation.org
      [brauner: relax capability check to opener of the file]
      Link: https://lore.kernel.org/all/20231113-undenkbar-gediegen-efde5f1c34bc@brauner
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  5. 11 Apr, 2024 1 commit
  6. 09 Apr, 2024 4 commits
  7. 07 Apr, 2024 2 commits
  8. 05 Apr, 2024 12 commits
  9. 26 Mar, 2024 4 commits
  10. 24 Mar, 2024 7 commits
    • Linux 6.9-rc1 · 4cece764
      Linus Torvalds authored
    • Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · ab8de2db
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Fix logic that is supposed to prevent placement of the kernel image
         below LOAD_PHYSICAL_ADDR
      
       - Use the firmware stack in the EFI stub when running in mixed mode
      
       - Clear BSS only once when using mixed mode
      
       - Check efi.get_variable() function pointer for NULL before trying to
         call it
      
      * tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: fix panic in kdump kernel
        x86/efistub: Don't clear BSS twice in mixed mode
        x86/efistub: Call mixed mode boot services on the firmware's stack
        efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
    • Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e74df2f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Ensure that the encryption mask at boot is properly propagated on
         5-level page tables, otherwise the PGD entry is incorrectly set to
         non-encrypted, which causes system crashes during boot.
      
       - Undo the deferred 5-level page table setup as it cannot work with
         memory encryption enabled.
      
       - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
         to the default value but the cached variable is not, so subsequent
         comparisons might yield the wrong result and as a consequence the
         result prevents updating the MSR.
      
       - Register the local APIC address only once in the MPPARSE enumeration
         to prevent triggering the related WARN_ONs() in the APIC and topology
         code.
      
       - Handle the case where no APIC is found gracefully by registering a
         fake APIC in the topology code. That makes all related topology
         functions work correctly and does not affect the actual APIC driver
         code at all.
      
       - Don't evaluate logical IDs during early boot as the local APIC IDs
         are not yet enumerated and the invoked function returns an error
         code. Nothing requires the logical IDs before the final CPUID
         enumeration takes place, which happens after the enumeration.
      
       - Cure the fallout of the per CPU rework on UP which misplaced the
         copying of boot_cpu_data to per CPU data so that the final update to
         boot_cpu_data got lost which caused inconsistent state and boot
         crashes.
      
       - Use copy_from_kernel_nofault() in the kprobes setup as there is no
         guarantee that the address can be safely accessed.
      
       - Reorder struct members in struct saved_context to work around another
         kmemleak false positive
      
       - Remove the buggy code which tries to update the E820 kexec table for
         setup_data as that is never passed to the kexec kernel.
      
       - Update the resource control documentation to use the proper units.
      
       - Fix a Kconfig warning observed with tinyconfig
      
      * tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/64: Move 5-level paging global variable assignments back
        x86/boot/64: Apply encryption mask to 5-level pagetable update
        x86/cpu: Add model number for another Intel Arrow Lake mobile processor
        x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
        Documentation/x86: Document that resctrl bandwidth control units are MiB
        x86/mpparse: Register APIC address only once
        x86/topology: Handle the !APIC case gracefully
        x86/topology: Don't evaluate logical IDs during early boot
        x86/cpu: Ensure that CPU info updates are propagated on UP
        kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
        x86/pm: Work around false positive kmemleak report in msr_build_context()
        x86/kexec: Do not update E820 kexec table for setup_data
        x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
    • Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b136f68e
      Linus Torvalds authored
      Pull scheduler doc clarification from Thomas Gleixner:
       "A single update for the documentation of the base_slice_ns tunable to
        clarify that any value which is less than the tick slice has no effect
        because the scheduler tick is not guaranteed to happen within the set
        time slice"
      
      * tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
    • Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping · 864ad046
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
       "This has a set of swiotlb alignment fixes for sometimes very long
        standing bugs from Will. We've been discussing them for a while and
        they should be solid now"
      
      * tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
        iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
        swiotlb: Fix alignment checks when both allocation and DMA masks are present
        swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
        swiotlb: Enforce page alignment in swiotlb_alloc()
        swiotlb: Fix double-allocation of slots due to broken alignment handling
    • efi: fix panic in kdump kernel · 62b71cd7
      Oleksandr Tymoshenko authored
      Check whether get_next_variable() is actually a valid pointer before
      calling it. In the kdump kernel this method is set to NULL, which causes
      a panic during boot of the kexec-ed kernel.
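
      The shape of such a guard, as a minimal stand-alone sketch (the toy_*
      struct, typedef and function names are invented here and are not the
      kernel's definitions):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy signature; the real EFI service takes different arguments. */
typedef int (*toy_get_next_variable_t)(char *name, size_t len);

struct toy_efi {
    toy_get_next_variable_t get_next_variable;  /* may be NULL */
};

/* Stub used to model firmware that does provide the service. */
int toy_get_next_ok(char *name, size_t len)
{
    if (len > 0)
        name[0] = '\0';
    return 0;
}

/* Probe the service, but never call through a NULL pointer. */
bool toy_variable_services_usable(const struct toy_efi *efi)
{
    char name[8] = "";

    if (!efi->get_next_variable)
        return false;
    return efi->get_next_variable(name, sizeof(name)) == 0;
}
```

      With the pointer left NULL, as described for the kdump case, the probe
      reports the service as unusable instead of jumping to address zero.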
      
      Tested with QEMU and OVMF firmware.
      
      Fixes: bad267f9 ("efi: verify that variable services are supported")
      Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • x86/efistub: Don't clear BSS twice in mixed mode · df7ecce8
      Ard Biesheuvel authored
      Clearing BSS should only be done once, at the very beginning.
      efi_pe_entry() is the entrypoint from the firmware, which may not clear
      BSS and so it is done explicitly. However, efi_pe_entry() is also used
      as an entrypoint by the mixed mode startup code, in which case BSS will
      already have been cleared, and doing it again at this point will corrupt
      global variables holding the firmware's GDT/IDT and segment selectors.
      
      So make the memset() conditional on whether the EFI stub is running in
      native mode.
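
      A toy model of the conditional clear, assuming a plain array in place of
      the real BSS region and a boolean in place of the stub's native-mode test
      (toy_bss and toy_pe_entry() are invented names):

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for BSS; the real region is delimited by linker symbols. */
static unsigned char toy_bss[16];

/* Shared entry point: only the native path still has to zero BSS. */
void toy_pe_entry(bool native_mode)
{
    if (native_mode)
        memset(toy_bss, 0, sizeof(toy_bss));
    /* The rest of the entry path would run here in either mode. */
}
```

      In the mixed-mode case the earlier startup code has already written
      state into the region, so skipping the memset() is what preserves it.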
      
      Fixes: b3810c5a ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>