1. 28 Aug, 2023 19 commits
    • Merge tag 'rcu.2023.08.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 68cadad1
      Linus Torvalds authored
      Pull RCU updates from Paul McKenney:
      
       - Documentation updates
      
       - Miscellaneous fixes, perhaps most notably simplifying
         SRCU_NOTIFIER_INIT() as suggested
      
       - RCU Tasks updates, most notably treating Tasks RCU callbacks as lazy
         while still treating synchronous grace periods as urgent. Also
         includes two fixes: one restores the ability to apply debug-objects
         to RCU Tasks, and the other fixes a race condition that could result
         in false-positive failures of the boot-time self-test code
      
       - RCU-scalability performance-test updates, most notably adding the
         ability to measure the CPU consumption of the RCU Tasks grace-period
         kthread. This proved quite useful for the RCU Tasks work
      
       - Reference-acquisition/release performance-test updates, including a
         fix for an uninitialized wait_queue_head_t
      
       - Miscellaneous torture-test updates
      
       - Torture-test scripting updates, including removal of the
         no-longer-functional formal-verification scripts, test builds of
         individual RCU Tasks flavors, better diagnostics for loss of
         connectivity for distributed rcutorture tests, disabling of reboot
         loops in qemu/KVM-based rcutorture testing, and passing of init
         parameters to rcutorture's init program
      
      * tag 'rcu.2023.08.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (64 commits)
        rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls
        rcu: Make the rcu_nocb_poll boot parameter usable via boot config
        rcu: Mark __rcu_irq_enter_check_tick() ->rcu_urgent_qs load
        srcu,notifier: Remove #ifdefs in favor of SRCU Tiny srcu_usage
        rcutorture: Stop right-shifting torture_random() return values
        torture: Stop right-shifting torture_random() return values
        torture: Move stutter_wait() timeouts to hrtimers
        torture: Move torture_shuffle() timeouts to hrtimers
        torture: Move torture_onoff() timeouts to hrtimers
        torture: Make torture_hrtimeout_*() use TASK_IDLE
        torture: Add lock_torture writer_fifo module parameter
        torture: Add a kthread-creation callback to _torture_create_kthread()
        rcu-tasks: Fix boot-time RCU tasks debug-only deadlock
        rcu-tasks: Permit use of debug-objects with RCU Tasks flavors
        checkpatch: Complain about unexpected uses of RCU Tasks Trace
        torture: Cause mkinitrd.sh to indicate failure on compile errors
        torture: Make init program dump command-line arguments
        torture: Switch qemu from -nographic to -display none
        torture: Add init-program support for loongarch
        torture: Avoid torture-test reboot loops
        ...
    • Merge tag 'hardening-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 727dbda1
      Linus Torvalds authored
      Pull hardening updates from Kees Cook:
       "As has become normal, changes are scattered around the tree (either
        explicitly maintainer Acked or for trivial stuff that went ignored):
      
         - Carve out the new CONFIG_LIST_HARDENED as a more focused subset of
           CONFIG_DEBUG_LIST (Marco Elver)
      
         - Fix kallsyms lookup failure under Clang LTO (Yonghong Song)
      
         - Clarify documentation for CONFIG_UBSAN_TRAP (Jann Horn)
      
         - Flexible array member conversion not carried in other trees
           (Gustavo A. R. Silva)
      
         - Various strlcpy() and strncpy() removals not carried in other trees
           (Azeem Shaikh, Justin Stitt)
      
         - Convert nsproxy.count to refcount_t (Elena Reshetova)
      
         - Add a handful of __counted_by annotations not carried in other
           trees, as well as an LKDTM test
      
         - Fix build failure with gcc-plugins on GCC 14+
      
         - Fix selftests to respect SKIP for signal-delivery tests
      
         - Fix CFI warning for paravirt callback prototype
      
         - Clarify documentation for seq_show_option_n() usage"
      
      * tag 'hardening-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
        LoadPin: Annotate struct dm_verity_loadpin_trusted_root_digest with __counted_by
        kallsyms: Change func signature for cleanup_symbol_name()
        kallsyms: Fix kallsyms_selftest failure
        nsproxy: Convert nsproxy.count to refcount_t
        integrity: Annotate struct ima_rule_opt_list with __counted_by
        lkdtm: Add FAM_BOUNDS test for __counted_by
        Compiler Attributes: counted_by: Adjust name and identifier expansion
        um: refactor deprecated strncpy to memcpy
        um: vector: refactor deprecated strncpy
        alpha: Replace one-element array with flexible-array member
        hardening: Move BUG_ON_DATA_CORRUPTION to hardening options
        list: Introduce CONFIG_LIST_HARDENED
        list_debug: Introduce inline wrappers for debug checks
        compiler_types: Introduce the Clang __preserve_most function attribute
        gcc-plugins: Rename last_stmt() for GCC 14+
        selftests/harness: Actually report SKIP for signal tests
        x86/paravirt: Fix tlb_remove_table function callback prototype warning
        EISA: Replace all non-returning strlcpy with strscpy
        perf: Replace strlcpy with strscpy
        um: Remove strlcpy declaration
        ...
    • Merge tag 'seccomp-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · b03a4342
      Linus Torvalds authored
      Pull seccomp updates from Kees Cook:
      
       - Provide USER_NOTIFY flag for synchronous mode (Andrei Vagin, Peter
         Oskolkov). This touches the scheduler and perf but has been Acked by
         Peter Zijlstra.
      
       - Fix regression in syscall skipping and restart tracing on arm32. This
         touches arch/arm/ but has been Acked by Arnd Bergmann.
      
      * tag 'seccomp-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        seccomp: Add missing kerndoc notations
        ARM: ptrace: Restore syscall skipping for tracers
        ARM: ptrace: Restore syscall restart tracing
        selftests/seccomp: Handle arm32 corner cases better
        perf/benchmark: add a new benchmark for seccom_unotify
        selftest/seccomp: add a new test for the sync mode of seccomp_user_notify
        seccomp: add the synchronous mode for seccomp_unotify
        sched: add a few helpers to wake up tasks on the current cpu
        sched: add WF_CURRENT_CPU and externise ttwu
        seccomp: don't use semaphore and wait_queue together
    • Merge tag 'pstore-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 5b07aaca
      Linus Torvalds authored
      Pull pstore updates from Kees Cook:
      
       - Greatly simplify compression support (Ard Biesheuvel)
      
       - Avoid crashes for corrupted offsets when prz size is 0 (Enlin Mu)
      
       - Expand range of usable record sizes (Yuxiao Zhang)
      
       - Fix kernel-doc warning (Matthew Wilcox)
      
      * tag 'pstore-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        pstore: Fix kernel-doc warning
        pstore: Support record sizes larger than kmalloc() limit
        pstore/ram: Check start of empty przs during init
        pstore: Replace crypto API compression with zlib_deflate library calls
        pstore: Remove worst-case compression size logic
    • Merge tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 547635c6
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "No new features, the bulk of the changes are fixes, refactoring and
        cleanups. The notable fix is the partial restoration of scrub
        performance, which dropped after the rewrite in 6.4.
      
        Fixes:
      
         - scrub performance drop due to rewrite in 6.4 partially restored:
            - do IO grouping by blk_plug/blk_unplug again
            - avoid unnecessary tree searches when processing stripes, in
              extent and checksum trees
            - the drop is noticeable on fast PCIe devices, where it was 66%
              and is now reduced to 33% relative to the original
            - backports to 6.4 planned
      
         - handle more corner cases of transaction commit during orphan
           cleanup or delayed ref processing
      
         - use correct fsid/metadata_uuid when validating super block
      
         - copy directory permissions and time when creating a stub subvolume
      
        Core:
      
         - the integrity checker debugging feature is deprecated, to be
           removed in 6.7

         - in zoned mode, zones are activated just before the write, which
           makes error handling easier; the overcommit mechanism can now be
           enabled again, improving performance by avoiding more frequent
           flushing

         - v0 extent handling, deprecated a long time ago, completely removed
      
         - error handling improvements
      
         - tests:
            - extent buffer bitmap tests
            - pinned extent splitting tests
      
         - cleanups and refactoring:
            - compression writeback
            - extent buffer bitmap
            - space flushing, ENOSPC handling"
      
      * tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits)
        btrfs: zoned: skip splitting and logical rewriting on pre-alloc write
        btrfs: tests: test invalid splitting when skipping pinned drop extent_map
        btrfs: tests: add a test for btrfs_add_extent_mapping
        btrfs: tests: add extent_map tests for dropping with odd layouts
        btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()
        btrfs: scrub: don't go ordered workqueue for dev-replace
        btrfs: scrub: fix grouping of read IO
        btrfs: scrub: avoid unnecessary csum tree search preparing stripes
        btrfs: scrub: avoid unnecessary extent tree search preparing stripes
        btrfs: copy dir permission and time when creating a stub subvolume
        btrfs: remove pointless empty list check when reading delayed dir indexes
        btrfs: drop redundant check to use fs_devices::metadata_uuid
        btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super
        btrfs: use the correct superblock to compare fsid in btrfs_validate_super
        btrfs: simplify memcpy either of metadata_uuid or fsid
        btrfs: add a helper to read the superblock metadata_uuid
        btrfs: remove v0 extent handling
        btrfs: output extra debug info if we failed to find an inline backref
        btrfs: move the !zoned assert into run_delalloc_cow
        btrfs: consolidate the error handling in run_delalloc_nocow
        ...
    • Merge tag 'affs-for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f678c890
      Linus Torvalds authored
      Pull affs updates from David Sterba:
       "Two minor updates for AFFS:
      
         - reimplement writepage() address space callback on top of
           migrate_folio()
      
         - fix a build warning: local parameters named 'toupper' collide
           with the standard ctype.h name"
      
      * tag 'affs-for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: rename local toupper() to fn() to avoid confusion
        affs: remove writepage implementation
    • Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux · 3bb156a5
      Linus Torvalds authored
      Pull fsverity updates from Eric Biggers:
       "Several cleanups for fs/verity/, including two commits that make the
        builtin signature support more cleanly separated from the base
        feature"
      
      * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
        fsverity: skip PKCS#7 parser when keyring is empty
        fsverity: move sysctl registration out of signature.c
        fsverity: simplify handling of errors during initcall
        fsverity: explicitly check that there is no algorithm 0
    • Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux · cc0a38d0
      Linus Torvalds authored
      Pull fscrypt update from Eric Biggers:
       "Just a small documentation improvement"
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
        fscrypt: improve the "Encryption modes and usage" section
    • Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 6016fc91
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "We've got some big changes for this release -- I'm very happy to be
        landing willy's work to enable large folios for the page cache for
        general read and write IOs when the fs can make contiguous space
        allocations, and Ritesh's work to track sub-folio dirty state to
        eliminate the write amplification problems inherent in using large
        folios.
      
        As a bonus, io_uring can now process write completions in the caller's
        context instead of bouncing through a workqueue, which should reduce
        io latency dramatically. IOWs, XFS should see a nice performance bump
        for both IO paths.
      
        Summary:
      
         - Make large writes to the page cache fill sparse parts of the cache
           with large folios, then use large memcpy calls for the large folio.
      
         - Track the per-block dirty state of each large folio so that a
           buffered write to a single byte on a large folio does not result in
           a (potentially) multi-megabyte writeback IO.
      
         - Allow some directio completions to be performed in the initiating
           task's context instead of punting through a workqueue. This will
           reduce latency for some io_uring requests"
      
      * tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
        iomap: support IOCB_DIO_CALLER_COMP
        io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
        fs: add IOCB flags related to passing back dio completions
        iomap: add IOMAP_DIO_INLINE_COMP
        iomap: only set iocb->private for polled bio
        iomap: treat a write through cache the same as FUA
        iomap: use an unsigned type for IOMAP_DIO_* defines
        iomap: cleanup up iomap_dio_bio_end_io()
        iomap: Add per-block dirty state tracking to improve performance
        iomap: Allocate ifs in ->write_begin() early
        iomap: Refactor iomap_write_delalloc_punch() function out
        iomap: Use iomap_punch_t typedef
        iomap: Fix possible overflow condition in iomap_write_delalloc_scan
        iomap: Add some uptodate state handling helpers for ifs state bitmap
        iomap: Drop ifs argument from iomap_set_range_uptodate()
        iomap: Rename iomap_page to iomap_folio_state and others
        iomap: Copy larger chunks from userspace
        iomap: Create large folios in the buffered write path
        filemap: Allow __filemap_get_folio to allocate large folios
        filemap: Add fgf_t typedef
        ...
    • Merge tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · dd2c0198
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "In this cycle, an xattr bloom filter feature is introduced to speed up
        negative xattr lookups, which was originally suggested by Alexander
        for Composefs use cases.
      
        Additionally, the DEFLATE algorithm is now supported, which can be
        used together with hardware accelerators for our cloud workloads. Each
        supported compression algorithm can be selected on a per-file basis
        for specific access patterns too.
      
        There are also some random fixes and cleanups as usual:
      
         - Support xattr bloom filter to optimize negative xattr lookups
      
         - Support DEFLATE compression algorithm as an alternative
      
         - Fix a regression where ztailpacking pclusters were not released
           properly

         - No longer warn about the dedupe and fragments features
      
         - Some folio conversions and cleanups"
      
      * tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: release ztailpacking pclusters properly
        erofs: don't warn dedupe and fragments features anymore
        erofs: adapt folios for z_erofs_read_folio()
        erofs: adapt folios for z_erofs_readahead()
        erofs: get rid of fe->backmost for cache decompression
        erofs: drop z_erofs_page_mark_eio()
        erofs: tidy up z_erofs_do_read_page()
        erofs: move preparation logic into z_erofs_pcluster_begin()
        erofs: avoid obsolete {collector,collection} terms
        erofs: simplify z_erofs_read_fragment()
        erofs: remove redundant erofs_fs_type declaration in super.c
        erofs: add necessary kmem_cache_create flags for erofs inode cache
        erofs: clean up redundant comment and adjust code alignment
        erofs: refine warning messages for zdata I/Os
        erofs: boost negative xattr lookup with bloom filter
        erofs: update on-disk format for xattr name filter
        erofs: DEFLATE compression support
    • Merge tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux · f20ae9cf
      Linus Torvalds authored
      Pull file locking updates from Jeff Layton:
      
       - new functionality for F_OFD_GETLK: requesting a type of F_UNLCK will
         find info about whatever lock happens to be first in the given range,
         regardless of type.
      
       - an OFD lock selftest
      
       - bugfix involving a UAF in a tracepoint
      
       - comment typo fix
      
      * tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
        locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock
        fs/locks: Fix typo
        selftests: add OFD lock tests
        fs/locks: F_UNLCK extension for F_OFD_GETLK
    • Merge tag 'v6.6-fs.proc.uapi' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · b4a04f92
      Linus Torvalds authored
      Pull procfs fixes from Christian Brauner:
       "Mode changes to files under /proc/<pid>/ have not been supported
        since commit 6d76fa58 ("Don't allow chmod() on the /proc/<pid>/ files").

        Due to an oversight in commit 1b3044e3 ("procfs: fix pthread
        cross-thread naming if !PR_DUMPABLE") in switching from REG to NOD,
        mode changes on /proc/thread-self/comm were accidentally allowed.

        Similarly, mode changes for all files beneath /proc/<pid>/net/ are
        blocked, but mode changes on /proc/<pid>/net itself were accidentally
        allowed.
      
        Both issues come down to not using the generic proc_setattr() helper
        which blocks all mode changes. This is rectified with this pull
        request.
      
        This also removes a strange nolibc test that abused /proc/<pid>/net
        for testing mode changes. Using procfs for this test never made a lot
        of sense given procfs has special semantics for almost everything
        anyway.

        Both changes are minor user-visible changes. It is however very
        unlikely that mode changes on /proc/<pid>/net and
        /proc/thread-self/comm are something that userspace relies on"
      
      * tag 'v6.6-fs.proc.uapi' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        procfs: block chmod on /proc/thread-self/comm
        proc: use generic setattr() for /proc/$PID/net
        selftests/nolibc: drop test chmod_net
    • Merge tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 2e0afa7e
      Linus Torvalds authored
      Pull autofs fixes from Christian Brauner:
       "This fixes a memory leak in autofs reported by syzkaller and a missing
        conversion from uninterruptible to interruptible wake up when autofs
        is in catatonic mode"
      
      * tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        autofs: use wake_up() instead of wake_up_interruptible(()
        autofs: fix memory leak of waitqueues in autofs_catatonic_mode
    • Merge tag 'v6.6-vfs.fchmodat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 475d4df8
      Linus Torvalds authored
      Pull fchmodat2 system call from Christian Brauner:
       "This adds the fchmodat2() system call. It is a revised version of the
        fchmodat() system call, adding a missing flag argument. Support for
        both AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH is included.
      
        Adding this system call revision has been a longstanding request but
        so far has always fallen through the cracks. While the kernel
        implementation of fchmodat() does not have a flag argument, the
        libc-provided POSIX-compliant fchmodat(3) version does. Both glibc and
        musl have to implement a workaround in order to support
        AT_SYMLINK_NOFOLLOW (see [1] and [2]).
      
        The workaround is brittle because it relies not just on O_PATH and
        O_NOFOLLOW semantics and procfs magic links but also on our rather
        inconsistent symlink semantics.
      
        This gives userspace a proper fchmodat2() system call that libcs can
        use to properly implement fchmodat(3) and allows them to get rid of
        their hacks. In this case it will immediately benefit them as the
        current workaround is already defunct because of the aforementioned
        inconsistencies.
      
        In addition to AT_SYMLINK_NOFOLLOW, give userspace the ability to use
        AT_EMPTY_PATH with fchmodat2(). This is already possible with
        fchownat() so there's no reason to not also support it for
        fchmodat2().
      
        The implementation is simple and comes with selftests. Implementation
        of the system call and wiring up the system call are done as separate
        patches even though they could arguably be one patch. But in case
        there are merge conflicts from other system call additions it can be
        beneficial to have separate patches"
      
      Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35 [1]
      Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28 [2]
      
      * tag 'v6.6-vfs.fchmodat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        selftests: fchmodat2: remove duplicate unneeded defines
        fchmodat2: add support for AT_EMPTY_PATH
        selftests: Add fchmodat2 selftest
        arch: Register fchmodat2, usually as syscall 452
        fs: Add fchmodat2()
        Non-functional cleanup of a "__user * filename"
    • Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 511fb5ba
      Linus Torvalds authored
      Pull superblock updates from Christian Brauner:
       "This contains the super rework that was ready for this cycle. The
        first part changes the order of how we open block devices and allocate
        superblocks, contains various cleanups, simplifications, and a new
        mechanism to wait on superblock state changes.
      
        This unblocks work to ultimately limit the number of writers to a
        block device. Jan has already scheduled follow-up work that will be
        ready for v6.7 and allows us to restrict the number of writers to a
        given block device. That series builds on this work right here.
      
        The second part contains filesystem freezing updates.
      
        Overview:
      
        The generic superblock changes are roughly organized as follows
        (ignoring additional minor cleanups):
      
         (1) Removal of the bd_super member from struct block_device.
      
             This was a very odd back pointer to struct super_block with
             unclear rules. For all relevant places we have other means to get
             the same information so just get rid of this.
      
         (2) Simplify rules for superblock cleanup.
      
             Roughly, everything that is allocated during fs_context
             initialization and that's stored in fs_context->s_fs_info needs
             to be cleaned up by the fs_context->free() implementation before
             the superblock allocation function has been called successfully.
      
             After sget_fc() has returned, fs_context->s_fs_info has been
             transferred to sb->s_fs_info, at which point sb->kill_sb() is
             fully responsible for cleanup. Adhering to these rules means that
             cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
             brittle and inconsistent.

             Cleanup shouldn't be duplicated between sb->put_super() and
             sb->kill_sb(), as sb->put_super() is only called if sb->s_root
             has been set, aka when the filesystem has been successfully born
             (SB_BORN). That complexity should be avoided.
      
             This also means that block devices are to be closed in
             sb->kill_sb() instead of sb->put_super(). More details in the
             lower section.
      
         (3) Make it possible to look up or create a superblock before
             opening block devices.
      
             There's a subtle dependency on (2) as some filesystems did rely
             on fill_super() to be called in order to correctly clean up
             sb->s_fs_info. All these filesystems have been fixed.
      
         (4) Switch most filesystems to follow the same logic as the generic
             mount code now does as outlined in (3).
      
         (5) Use the superblock as the holder of the block device. We can now
             easily go back from block device to owning superblock.
      
         (6) Export and extend the generic fs_holder_ops and use them as
             holder ops everywhere and remove the filesystem specific holder
             ops.
      
         (7) Call from the block layer up into the filesystem layer when the
             block device is removed, allowing to shut down the filesystem
             without risk of deadlocks.
      
         (8) Get rid of get_super().
      
             We can now easily go back from the block device to owning
             superblock and can call up from the block layer into the
             filesystem layer when the device is removed. So there is no need
             to wade through all registered superblocks to find the owning
             superblock anymore"
      
      Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/
      
      * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
        super: use higher-level helper for {freeze,thaw}
        super: wait until we passed kill super
        super: wait for nascent superblocks
        super: make locking naming consistent
        super: use locking helpers
        fs: simplify invalidate_inodes
        fs: remove get_super
        block: call into the file system for ioctl BLKFLSBUF
        block: call into the file system for bdev_mark_dead
        block: consolidate __invalidate_device and fsync_bdev
        block: drop the "busy inodes on changed media" log message
        dasd: also call __invalidate_device when setting the device offline
        amiflop: don't call fsync_bdev in FDFMTBEG
        floppy: call disk_force_media_change when changing the format
        block: simplify the disk_force_media_change interface
        nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
        xfs use fs_holder_ops for the log and RT devices
        xfs: drop s_umount over opening the log and RT devices
        ext4: use fs_holder_ops for the log device
        ext4: drop s_umount over opening the log device
        ...
    • Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · de16588a
      Linus Torvalds authored
      Pull misc vfs updates from Christian Brauner:
       "This contains the usual miscellaneous features, cleanups, and fixes
        for vfs and individual filesystems.
      
        Features:
      
         - Block mode changes on symlinks and rectify our broken semantics
      
         - Report file modifications via fsnotify() for splice
      
         - Allow specifying an explicit timeout for the "rootwait" kernel
           command line option. This allows the kernel to time out and reboot
           instead of always waiting indefinitely for the root device to show
           up
      
         - Use synchronous fput for the close system call
      
        Cleanups:
      
         - Get rid of open-coded lockdep workarounds for async io submitters
           and replace it all with a single consolidated helper
      
         - Simplify epoll allocation helper
      
         - Convert simple_write_begin and simple_write_end to use a folio
      
         - Convert page_cache_pipe_buf_confirm() to use a folio
      
         - Simplify __range_close to avoid pointless locking
      
         - Disable per-cpu buffer head cache for isolated cpus
      
         - Port ecryptfs to kmap_local_page() api
      
         - Remove redundant initialization of pointer buf in pipe code
      
         - Unexport the d_genocide() function which is only used within core
           vfs
      
         - Replace printk(KERN_ERR) and WARN_ON() with WARN()
      
        Fixes:
      
         - Fix various kernel-doc issues
      
         - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
      
         - Fix a mainly theoretical issue in devpts
      
         - Check the return value of __getblk() in reiserfs
      
         - Fix a racy assert in i_readcount_dec
      
         - Fix integer conversion issues in various functions
      
         - Fix LSM security context handling during automounts that prevented
           NFS superblock sharing"
      
      * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
        cachefiles: use kiocb_{start,end}_write() helpers
        ovl: use kiocb_{start,end}_write() helpers
        aio: use kiocb_{start,end}_write() helpers
        io_uring: use kiocb_{start,end}_write() helpers
        fs: create kiocb_{start,end}_write() helpers
        fs: add kerneldoc to file_{start,end}_write() helpers
        io_uring: rename kiocb_end_write() local helper
        splice: Convert page_cache_pipe_buf_confirm() to use a folio
        libfs: Convert simple_write_begin and simple_write_end to use a folio
        fs/dcache: Replace printk and WARN_ON by WARN
        fs/pipe: remove redundant initialization of pointer buf
        fs: Fix kernel-doc warnings
        devpts: Fix kernel-doc warnings
        doc: idmappings: fix an error and rephrase a paragraph
        init: Add support for rootwait timeout parameter
        vfs: fix up the assert in i_readcount_dec
        fs: Fix one kernel-doc comment
        docs: filesystems: idmappings: clarify from where idmappings are taken
        fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
        vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
        ...
      de16588a
    • Linus Torvalds
      Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ecd7db20
      Linus Torvalds authored
      Pull libfs and tmpfs updates from Christian Brauner:
       "This cycle saw a lot of work for tmpfs that required changes to the
        vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this
        cycle. Things will go back to mm next cycle.
      
        Features
        ========
      
         - By far the biggest work is the quota support for tmpfs. New tmpfs
           quota infrastructure is added to support it and a new QFMT_SHMEM
           uapi option is exposed.
      
           This offers user and group quotas to tmpfs (project quotas will be
           added later). As with other filesystems, tmpfs quotas are not yet
           supported within user namespaces.
      
         - Add support for user xattrs. While tmpfs has long supported security
           xattrs (security.*) and POSIX ACLs, it lacked support for user
           xattrs (user.*). With this pull request tmpfs will be able to
           support a limited number of user xattrs.
      
           This is accompanied by a fix (see below) to limit persistent simple
           xattr allocations.
      
         - Add support for stable directory offsets. Currently tmpfs relies on
           the libfs-provided cursor-based mechanism for readdir. This causes
           issues when a tmpfs filesystem is exported via NFS.

           NFS clients do not open directories. Instead, each server-side
           readdir operation opens the directory, reads it, and then closes
           it. Since the cursor state for that directory is associated with
           the opened file, it is discarded after each readdir operation. Such
           directory offsets are cached not just by NFS clients but also by
           various userspace libraries based on these clients.

           As it stands there is no way to invalidate these caches when
           directory offsets change, and applications depend on directory
           offsets remaining unchanged.
      
           At LSFMM we discussed how to solve this problem and decided to
           support stable directory offsets. libfs now allows filesystems like
           tmpfs to use an xarray to map a directory offset to a dentry. This
           mechanism is currently only used by tmpfs but can be supported by
           others as well.
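To make the stable-offset idea concrete, here is a minimal userspace sketch under stated assumptions: a fixed-size array stands in for the xarray that libfs uses, and every name (struct toy_dir, toy_dir_add() and so on) is hypothetical rather than the libfs API. Each new entry receives a monotonically increasing offset that is never reused, so a readdir resumed from a previously returned offset still lands on the right entry even after earlier entries have been removed.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of stable directory offsets. A fixed array stands in for the
 * xarray used by libfs; all names here are hypothetical. */
#define MAX_ENTRIES 16

struct toy_dir {
	const char *slot[MAX_ENTRIES]; /* slot[off] holds the entry at offset off */
	size_t next_off;               /* next offset to hand out, never reused */
};

/* Offsets 0 and 1 are reserved for "." and "..", as in readdir. */
static void toy_dir_init(struct toy_dir *d)
{
	memset(d, 0, sizeof(*d));
	d->next_off = 2;
}

/* Give a newly created entry a fresh offset; returns it, or -1 if full. */
static long toy_dir_add(struct toy_dir *d, const char *name)
{
	if (d->next_off >= MAX_ENTRIES)
		return -1;
	d->slot[d->next_off] = name;
	return (long)d->next_off++;
}

/* Removing an entry leaves a hole; no other entry's offset changes. */
static void toy_dir_remove(struct toy_dir *d, size_t off)
{
	if (off < MAX_ENTRIES)
		d->slot[off] = NULL;
}

/* Resume readdir from a client-supplied offset: return the first live entry
 * at or after @off and store its offset in *found, or NULL at end of dir. */
static const char *toy_dir_next(const struct toy_dir *d, size_t off, size_t *found)
{
	for (; off < d->next_off; off++) {
		if (d->slot[off]) {
			*found = off;
			return d->slot[off];
		}
	}
	return NULL;
}
```

With a cursor-based scheme, removing the first entry would shift every later position by one between two stateless NFS readdir calls; here a client that has already consumed offset 2 and asks for offset 3 still gets the same next entry.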
      
        Fixes
        =====
      
         - Change persistent simple xattrs allocations in libfs from
           GFP_KERNEL to GFP_KERNEL_ACCOUNT so they're subject to memory
           cgroup limits. Since this is a change to libfs it affects both
           tmpfs and kernfs.
      
         - Correctly verify {g,u}id mount options.
      
           A new filesystem context is created via fsopen() which records the
           namespace that becomes the owning namespace of the superblock when
           fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are
           mountable in namespaces. However, fsconfig() calls can occur in a
           namespace different from the namespace where fsopen() has been
           called.
      
           Currently, when fsconfig() is called to set {g,u}id mount options
           the requested {g,u}id is mapped into a k{g,u}id according to the
           namespace where fsconfig() was called from. The resulting k{g,u}id
           is not guaranteed to be resolvable in the namespace of the
           filesystem (the one that fsopen() was called in).
      
           This means it's possible for an unprivileged user to create files
           owned by any group in a tmpfs mount since it's possible to set the
           setid bits on the tmpfs directory.
      
           The contract for {g,u}id mount options and {g,u}id values in
           general set from userspace has always been that they are translated
           according to the caller's idmapping. In that respect, tmpfs has been
           doing the correct thing. But since tmpfs is mountable in
           unprivileged contexts it is also necessary to verify that the
           resulting k{g,u}id is representable in the namespace of the
           superblock to avoid such bugs.
      
           The new mount api's cross-namespace delegation abilities are
           already widely used. Based on discussions with a number of
           userspace projects, this is the most faithful solution with
           minimal regression risk"
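The two-step check described above (translate with the caller's idmapping, then require the result to be representable in the superblock's namespace) can be sketched in userspace. This is a simplified model, assuming a single-extent idmap per namespace; struct toy_idmap, toy_map_down(), toy_has_mapping() and toy_parse_gid() are hypothetical names. In the kernel, helpers such as make_kgid() and kgid_has_mapping() play the corresponding roles.

```c
#include <errno.h>
#include <stdint.h>

/* Toy single-extent idmap: ids [first, first + count) in the namespace map to
 * kernel ids [lower_first, lower_first + count). Hypothetical type. */
struct toy_idmap {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define TOY_INVALID_ID ((uint32_t)-1)

/* Namespace id -> kernel id (the role make_kgid() plays in the kernel). */
static uint32_t toy_map_down(const struct toy_idmap *m, uint32_t id)
{
	/* Unsigned wrap-around makes ids below m->first fail the range check. */
	if (id - m->first < m->count)
		return m->lower_first + (id - m->first);
	return TOY_INVALID_ID;
}

/* Is kernel id @kid representable in namespace @m? (cf. kgid_has_mapping()) */
static int toy_has_mapping(const struct toy_idmap *m, uint32_t kid)
{
	return kid - m->lower_first < m->count;
}

/* Parse a gid= option: translate according to the caller's idmapping, then
 * reject values the superblock's namespace cannot represent. */
static int toy_parse_gid(const struct toy_idmap *caller, const struct toy_idmap *sb,
			 uint32_t gid, uint32_t *kgid_out)
{
	uint32_t kgid = toy_map_down(caller, gid);

	if (kgid == TOY_INVALID_ID || !toy_has_mapping(sb, kgid))
		return -EINVAL;
	*kgid_out = kgid;
	return 0;
}
```

For example, if fsconfig() is called from a namespace with an identity mapping while the superblock belongs to a namespace mapping only kernel ids 100000-165535, gid=0 translates to kernel gid 0, which the superblock's namespace cannot represent, so the option is rejected instead of silently producing root-group-owned files.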
      
      * tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
        mm: invalidation check mapping before folio_contains
        tmpfs: trivial support for direct IO
        tmpfs,xattr: enable limited user extended attributes
        tmpfs: track free_ispace instead of free_inodes
        xattr: simple_xattr_set() return old_xattr to be freed
        tmpfs: verify {g,u}id mount options correctly
        shmem: move spinlock into shmem_recalc_inode() to fix quota support
        libfs: Remove parent dentry locking in offset_iterate_dir()
        libfs: Add a lock class for the offset map's xa_lock
        shmem: stable directory offsets
        shmem: Refactor shmem_symlink()
        libfs: Add directory operations for stable offsets
        shmem: fix quota lock nesting in huge hole handling
        shmem: Add default quota limit mount options
        shmem: quota support
        shmem: prepare shmem quota infrastructure
        quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks
        shmem: make shmem_get_inode() return ERR_PTR instead of NULL
        shmem: make shmem_inode_acct_block() return error
      ecd7db20
    • Linus Torvalds
      Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 615e9583
      Linus Torvalds authored
      Pull vfs timestamp updates from Christian Brauner:
       "This adds VFS support for multi-grain timestamps and converts tmpfs,
        xfs, ext4, and btrfs to use them. This carries acks from all relevant
        filesystems.
      
        The VFS always uses coarse-grained timestamps when updating the ctime
        and mtime after a change. This has the benefit of allowing filesystems
        to optimize away a lot of metadata updates, down to around 1 per
        jiffy, even when a file is under heavy writes.
      
        Unfortunately, this has always been an issue when we're exporting via
        NFSv3, which relies on timestamps to validate caches. A lot of changes
        can happen in a jiffy, so timestamps aren't sufficient to help the
        client decide to invalidate the cache.
      
        Even with NFSv4, a lot of exported filesystems don't properly support
        a change attribute and are subject to the same problems with timestamp
        granularity. Other applications have similar issues with timestamps
        (e.g., backup applications).
      
        If we were to always use fine-grained timestamps, that would improve
        the situation, but that becomes rather expensive, as the underlying
        filesystem would have to log a lot more metadata updates.
      
        This introduces fine-grained timestamps that are used when they are
        actively queried.
      
        This uses the 31st bit of the ctime tv_nsec field to indicate that
        something has queried the inode for the mtime or ctime. When this flag
        is set, on the next mtime or ctime update, the kernel will fetch a
        fine-grained timestamp instead of the usual coarse-grained one.
      
        As POSIX generally mandates that the ctime must also change whenever
        the mtime changes, a single flag stored in the ctime is enough. The
        kernel always stores normalized ctime values, so only the first 30
        bits of the tv_nsec field are ever used.
      
        Filesystems can opt into this behavior by setting the FS_MGTIME flag in
        the fstype. Filesystems that don't set this flag will continue to use
        coarse-grained timestamps.
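The flag mechanism can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: struct toy_inode, TOY_CTIME_QUERIED and the two helpers are hypothetical, and the real implementation updates the field atomically. Because a normalized tv_nsec is below 10^9, which is less than 2^30, bit 30 (the 31st bit) is never part of a real nanosecond value and can carry the "queried" flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 31st bit (bit 30) of tv_nsec; normalized values are < 1e9 < 2^30, so
 * this bit is free to use as a flag. Hypothetical name for illustration. */
#define TOY_CTIME_QUERIED (1L << 30)

struct toy_inode {
	int64_t ctime_sec;
	long ctime_nsec; /* low 30 bits: nanoseconds; bit 30: queried flag */
};

/* getattr path: report the ctime and remember that someone observed it. */
static long toy_getattr_ctime_nsec(struct toy_inode *inode)
{
	inode->ctime_nsec |= TOY_CTIME_QUERIED;
	return inode->ctime_nsec & ~TOY_CTIME_QUERIED;
}

/* Update path: use the fine-grained clock only if the ctime was queried
 * since the last update. The new value is stored with the flag cleared.
 * Returns true if the fine-grained value was used. */
static bool toy_update_ctime(struct toy_inode *inode,
			     int64_t coarse_sec, long coarse_nsec,
			     int64_t fine_sec, long fine_nsec)
{
	bool fine = inode->ctime_nsec & TOY_CTIME_QUERIED;

	inode->ctime_sec = fine ? fine_sec : coarse_sec;
	inode->ctime_nsec = fine ? fine_nsec : coarse_nsec;
	return fine;
}
```

Unobserved writes keep paying only for coarse-grained updates; the cost of a fine-grained timestamp is incurred only after a query, which is the trade-off the pull request describes.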
      
        Various preparatory changes, fixes and cleanups are included:
      
         - Fix up all relevant places where POSIX requires updating ctime
           together with mtime. This touches a wide range of places and all
           maintainers provided the necessary Acks.
      
         - Add new accessors for inode->i_ctime and change all callers to
           rely on them. Plain accesses to inode->i_ctime are now gone, and
           the field is accordingly renamed to inode->__i_ctime and commented
           as requiring accessors.
      
         - Extend generic_fillattr() to take a request mask, somewhat
           mirroring the statx() uapi. This allows callers to pass in a
           request mask to only get a subset of attributes filled in.
      
         - Rework timestamp updates so it's possible to drop the @now
           parameter from the update_time() inode operation and associated
           helpers.
      
         - Add inode_update_timestamps() and convert all filesystems to it,
           removing a lot of open-coded logic"
      
      * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
        btrfs: convert to multigrain timestamps
        ext4: switch to multigrain timestamps
        xfs: switch to multigrain timestamps
        tmpfs: add support for multigrain timestamps
        fs: add infrastructure for multigrain timestamps
        fs: drop the timespec64 argument from update_time
        xfs: have xfs_vn_update_time gets its own timestamp
        fat: make fat_update_time get its own timestamp
        fat: remove i_version handling from fat_update_time
        ubifs: have ubifs_update_time use inode_update_timestamps
        btrfs: have it use inode_update_timestamps
        fs: drop the timespec64 arg from generic_update_time
        fs: pass the request_mask to generic_fillattr
        fs: remove silly warning from current_time
        gfs2: fix timestamp handling on quota inodes
        fs: rename i_ctime field to __i_ctime
        selinux: convert to ctime accessor functions
        security: convert to ctime accessor functions
        apparmor: convert to ctime accessor functions
        sunrpc: convert to ctime accessor functions
        ...
      615e9583
    • Linus Torvalds
      Merge tag 'v6.6-vfs.fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 84ab1277
      Linus Torvalds authored
      Pull mount API updates from Christian Brauner:
       "This introduces FSCONFIG_CMD_CREATE_EXCL which allows userspace to
        implement something like
      
            $ mount -t ext4 --exclusive /dev/sda /B
      
        which fails if a superblock for the requested filesystem already
        exists instead of silently reusing an existing superblock.
      
        Without it, in the sequence
      
            $ move-mount -f xfs -o       source=/dev/sda4 /A
            $ move-mount -f xfs -o noacl,source=/dev/sda4 /B
      
        the initial mounter will create a superblock. The second mounter will
        reuse the existing superblock, creating a bind-mount (see [1] for the
        source of the move-mount binary).
      
        The problem is that reusing an existing superblock means all mount
        options other than read-only and read-write will be silently ignored
        even if they are incompatible requests. For example, the second mount
        has requested no POSIX ACL support but since the existing superblock
        is reused POSIX ACL support will remain enabled.
      
        Such silent superblock reuse can easily become a security issue.
      
        After adding support for FSCONFIG_CMD_CREATE_EXCL to mount(8) in
        util-linux this can be fixed:
      
            $ move-mount -f xfs --exclusive -o       source=/dev/sda4 /A
            $ move-mount -f xfs --exclusive -o noacl,source=/dev/sda4 /B
            Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed
      
        This requires the new mount api. With the old mount api it would be
        necessary to plumb this through every legacy filesystem's
        file_system_type->mount() method. If they want this feature they are
        most welcome to switch to the new mount api"
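The difference between FSCONFIG_CMD_CREATE and FSCONFIG_CMD_CREATE_EXCL can be modeled in a few lines. This is a toy model of the kernel-side lookup only, assuming one cached superblock per source device; struct toy_sb and toy_get_tree() are hypothetical and do not reflect the real fs_context or sget_fc() code. It reproduces both behaviors from the example: a non-exclusive second mount silently reuses the superblock (so "noacl" is ignored), while an exclusive one fails with EBUSY ("Device or resource busy").

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Toy superblock cache keyed by source device; all names are hypothetical. */
struct toy_sb {
	char source[32];
	bool acl;
	bool in_use;
};

static struct toy_sb toy_sbs[4];

/* Return the superblock to mount, or NULL with *err set.
 * @excl models FSCONFIG_CMD_CREATE_EXCL. */
static struct toy_sb *toy_get_tree(const char *source, bool acl, bool excl, int *err)
{
	struct toy_sb *free_slot = NULL;

	for (size_t i = 0; i < sizeof(toy_sbs) / sizeof(toy_sbs[0]); i++) {
		struct toy_sb *sb = &toy_sbs[i];

		if (!sb->in_use) {
			if (!free_slot)
				free_slot = sb;
			continue;
		}
		if (strcmp(sb->source, source) == 0) {
			if (excl) {          /* exclusive create: refuse reuse */
				*err = -EBUSY;
				return NULL;
			}
			return sb;           /* plain create: reuse, @acl ignored */
		}
	}
	if (!free_slot) {
		*err = -ENOMEM;
		return NULL;
	}
	free_slot->in_use = true;
	free_slot->acl = acl;
	strncpy(free_slot->source, source, sizeof(free_slot->source) - 1);
	return free_slot;
}
```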
      
      Link: https://github.com/brauner/move-mount-beneath [1]
      Link: https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner
      Link: https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner
      Link: https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner
      Link: https://lore.kernel.org/lkml/20230824-anzog-allheilmittel-e8c63e429a79@brauner/
      
      * tag 'v6.6-vfs.fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: add FSCONFIG_CMD_CREATE_EXCL
        fs: add vfs_cmd_reconfigure()
        fs: add vfs_cmd_create()
        super: remove get_tree_single_reconf()
      84ab1277
  2. 27 Aug, 2023 2 commits
  3. 26 Aug, 2023 7 commits
    • Linus Torvalds
      Merge tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 28f20a19
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fix an FPU invalidation bug on exec(), and fix a performance
        regression due to a missing setting of X86_FEATURE_OSXSAVE"
      
      * tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
        x86/fpu: Invalidate FPU state correctly on exec()
      28f20a19
    • Linus Torvalds
      Merge tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3b35375f
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A last minute fix for a regression introduced in the v6.5 merge
        window.
      
        The conversion of the software based interrupt resend mechanism to
        hlist failed to add a check for whether the descriptor is already
        enqueued and dropped the interrupt descriptor lookup for nested
        interrupts.

        The missing check for whether the descriptor is already queued causes
        hlist corruption and can be observed in the wild. The dropped parent
        descriptor lookup has not yet caused problems, but in the worst case
        it would result in a stale interrupt line.
      
        Add the missing enqueued check and bring the descriptor lookup back to
        cure this"
      
      * tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Fix software resend lockup and nested resend
      3b35375f
    • Linus Torvalds
      Merge tag 'loongarch-fixes-6.5-2' of... · c3137613
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix a ptrace bug, a hw_breakpoint bug, some build errors/warnings and
        some trivial cleanups"
      
      * tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Fix hw_breakpoint_control() for watchpoints
        LoongArch: Ensure FP/SIMD registers in the core dump file is up to date
        LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()
        LoongArch: Add identifier names to arguments of die() declaration
        LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP
        LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP
        LoongArch: Remove <asm/export.h>
        LoongArch: Replace #include <asm/export.h> with #include <linux/export.h>
        LoongArch: Remove unneeded #include <asm/export.h>
        LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's
        LoongArch: Remove redundant "source drivers/firmware/Kconfig"
      c3137613
    • Johan Hovold
      genirq: Fix software resend lockup and nested resend · 9f5deb55
      Johan Hovold authored
      The switch to using hlist for managing software resend of interrupts
      broke resend in at least two ways:
      
      First, unconditionally adding interrupt descriptors to the resend list can
      corrupt the list when the descriptor in question has already been
      added. This causes the resend tasklet to loop indefinitely with interrupts
      disabled as was recently reported with the Lenovo ThinkPad X13s after
      threaded NAPI was disabled in the ath11k WiFi driver.
      
      This bug is easily fixed by restoring the old semantics of irq_sw_resend()
      so that it can also be called for descriptors that have already been
      marked for resend.
      
      Second, the offending commit also broke software resend of nested
      interrupts by simply discarding the code that made sure that such
      interrupts are retriggered using the parent interrupt.
      
      Add back the corresponding code that adds the parent descriptor to the
      resend list.
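Why adding an already-queued descriptor corrupts the list, and how an "already enqueued" check prevents it, can be shown with a minimal intrusive list in the style of the kernel's hlist. This is a userspace sketch; struct hnode, struct hhead and the helper names are hypothetical stand-ins, not the genirq code. Adding the node that is already at the head makes it point to itself, so any traversal (like the resend tasklet's) loops forever.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive list in the style of the kernel's hlist (hypothetical). */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

/* A node that is on no list has pprev == NULL. */
static bool hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

static void hlist_add_head_toy(struct hnode *n, struct hhead *h)
{
	n->next = h->first;          /* if n is already first, n->next = n: a cycle */
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Resend enqueue with the restored check: only add if not already queued. */
static void toy_sw_resend(struct hnode *n, struct hhead *h)
{
	if (hnode_unhashed(n))
		hlist_add_head_toy(n, h);
}

/* Count entries, stopping once the count exceeds @limit so that a corrupted
 * (cyclic) list is detected instead of spinning forever. */
static size_t toy_count(const struct hhead *h, size_t limit)
{
	size_t c = 0;

	for (const struct hnode *p = h->first; p && c <= limit; p = p->next)
		c++;
	return c;
}
```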
      
      Fixes: bc06a9e0 ("genirq: Use hlist for managing resend handlers")
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@kernel.org/
      Link: https://lore.kernel.org/r/20230826154004.1417-1-johan+linaro@kernel.org
      9f5deb55
    • Huacai Chen
      LoongArch: Fix hw_breakpoint_control() for watchpoints · 9730870b
      Huacai Chen authored
      In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the
      MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl, so there is no need
      to add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally.

      Doing so makes it impossible to set read and write watchpoints
      separately.
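A few lines of C show why OR-ing both enable bits after encoding breaks read-only and write-only watchpoints. This is an illustrative sketch: the bit positions TOY_LOAD_EN and TOY_STORE_EN are assumed values, and toy_encode_ctrl() only plays the role of encode_ctrl_reg(); none of it is the LoongArch code.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define TOY_LOAD_EN  8
#define TOY_STORE_EN 9

enum toy_wp_type { TOY_WP_READ = 1, TOY_WP_WRITE = 2, TOY_WP_RW = 3 };

/* Encode the requested access type (the role encode_ctrl_reg() plays). */
static uint32_t toy_encode_ctrl(enum toy_wp_type type)
{
	uint32_t ctrl = 0;

	if (type & TOY_WP_READ)
		ctrl |= 1u << TOY_LOAD_EN;
	if (type & TOY_WP_WRITE)
		ctrl |= 1u << TOY_STORE_EN;
	return ctrl;
}

/* Before the fix: both enables are forced on, so every watchpoint is RW. */
static uint32_t toy_ctrl_buggy(uint32_t encoded)
{
	return encoded | (1u << TOY_LOAD_EN | 1u << TOY_STORE_EN);
}

/* After the fix: program exactly what was encoded. */
static uint32_t toy_ctrl_fixed(uint32_t encoded)
{
	return encoded;
}
```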
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      9730870b
    • Huacai Chen
      LoongArch: Ensure FP/SIMD registers in the core dump file is up to date · 656f9aec
      Huacai Chen authored
      This is a port of commit 379eb01c ("riscv: Ensure the value
      of FP registers in the core dump file is up to date").
      
      The values of FP/SIMD registers in the core dump file come from
      thread.fpu. However, the kernel saves the FP/SIMD registers only before
      scheduling out the process. If no process switch happens during the
      exception handling, the kernel will not have a chance to save the latest
      values of the FP/SIMD registers, so their values in the core dump file
      may be incorrect. To solve this problem, force fpr_get()/simd_get() to
      save the FP/SIMD registers into thread.fpu if the target task is the
      current task.
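The "flush live registers when dumping the current task" rule can be modeled in userspace. This is a sketch under stated assumptions: toy_live_fpr stands in for the CPU's FP registers, toy_current for the kernel's current pointer, and toy_fpr_get() only mirrors the idea of the regset getter; all names are hypothetical.

```c
#include <string.h>

/* Toy model: live FP registers vs. the copy saved in the task structure. */
struct toy_task { double saved_fpr[4]; };

static double toy_live_fpr[4];        /* stand-in for the CPU's FP registers */
static struct toy_task *toy_current;  /* stand-in for `current` */

/* Normally done only at context switch. */
static void toy_save_fp(struct toy_task *t)
{
	memcpy(t->saved_fpr, toy_live_fpr, sizeof(t->saved_fpr));
}

/* Regset getter: if the target is the running task, its saved copy may be
 * stale, so flush the live registers first; a switched-out task's copy was
 * already written when it was scheduled out. */
static void toy_fpr_get(struct toy_task *target, double out[4])
{
	if (target == toy_current)
		toy_save_fp(target);
	memcpy(out, target->saved_fpr, sizeof(target->saved_fpr));
}
```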
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      656f9aec
    • Linus Torvalds
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 7d2f353b
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One clk driver fix and two clk framework fixes:
      
         - Fix an OOB access when devm_get_clk_from_child() is used and
           devm_clk_release() casts the void pointer to the wrong type
      
         - Move clk_rate_exclusive_{get,put}() within the correct ifdefs in
           clk.h so that the stubs are used when CONFIG_COMMON_CLK=n
      
         - Register the proper clk provider function depending on the value of
           #clock-cells in the TI keystone driver"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: Fix slab-out-of-bounds error in devm_clk_release()
        clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
        clk: keystone: syscon-clk: Fix audio refclk
      7d2f353b
  4. 25 Aug, 2023 12 commits
    • Kees Cook
      LoadPin: Annotate struct dm_verity_loadpin_trusted_root_digest with __counted_by · 5f536ac6
      Kees Cook authored
      Prepare for the coming implementation by GCC and Clang of the __counted_by
      attribute. Flexible array members annotated with __counted_by can have
      their accesses bounds-checked at run time via CONFIG_UBSAN_BOUNDS
      (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
      functions).
      
      As found with Coccinelle[1], add __counted_by for struct dm_verity_loadpin_trusted_root_digest.
      Additionally, since the element count member must be set before accessing
      the annotated flexible array member, move its initialization earlier.
      
      [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
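The ordering constraint in this change (set the count member before touching the flexible array) can be sketched as follows. This is a simplified, hypothetical stand-in for the real struct; the counted_by_attr macro is an assumption that expands to the compiler attribute only where __has_attribute reports support for it, and to nothing otherwise.

```c
#include <stdlib.h>
#include <string.h>

/* Use the attribute where the compiler supports it, else expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(__counted_by__)
#  define counted_by_attr(member) __attribute__((__counted_by__(member)))
# endif
#endif
#ifndef counted_by_attr
# define counted_by_attr(member)
#endif

/* Simplified, hypothetical stand-in for dm_verity_loadpin_trusted_root_digest. */
struct toy_digest {
	unsigned int len;
	unsigned char data[] counted_by_attr(len);
};

static struct toy_digest *toy_digest_new(const unsigned char *src, unsigned int len)
{
	struct toy_digest *d = malloc(sizeof(*d) + len);

	if (!d)
		return NULL;
	/* Set the count first: bounds checks on data[] are derived from len,
	 * so copying before this assignment would be checked against garbage. */
	d->len = len;
	memcpy(d->data, src, len);
	return d;
}
```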
      
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: dm-devel@redhat.com
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: linux-security-module@vger.kernel.org
      Link: https://lore.kernel.org/r/20230817235955.never.762-kees@kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      5f536ac6
    • Yonghong Song
      kallsyms: Change func signature for cleanup_symbol_name() · 76903a96
      Yonghong Song authored
      None of the callers of cleanup_symbol_name() use its return value,
      so change its return type to 'void' to reflect this usage pattern.

      Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230825202036.441212-1-yonghong.song@linux.dev
      Signed-off-by: Kees Cook <keescook@chromium.org>
      76903a96
    • Helge Deller
      lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels · 382d4cd1
      Helge Deller authored
      On some architectures, gcc translates the 64-bit __builtin_clzll()
      function into a call to the libgcc function __clzdi2(), which should
      take a 64-bit parameter on both 32- and 64-bit platforms.
      
      But in the current kernel code, the built-in __clzdi2() function is
      defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
      32, thus the return values on 32-bit kernels are in the range from
      [0..31] instead of the expected [0..63] range.
      
      This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
      take a 64-bit parameter on 32-bit kernels as well, thus it makes the
      functions identical for 32- and 64-bit kernels.
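The bug and the intended semantics can be contrasted in a short sketch. Both functions below are illustrative and are not the kernel's lib/clz_ctz.c code: clzdi2_broken() mimics a 32-bit-parameter implementation, so its result falls in [0..31] and the high word is ignored, while clzdi2_fixed() builds a full 64-bit count from two 32-bit operations, giving results in [0..63]. Like __builtin_clzll(), both are undefined for an input of 0.

```c
#include <stdint.h>

/* The bug in miniature: the argument is truncated to 32 bits. */
static int clzdi2_broken(uint64_t val)
{
	return __builtin_clz((uint32_t)val);
}

/* 64-bit count of leading zeros from 32-bit operations; undefined for 0. */
static int clzdi2_fixed(uint64_t val)
{
	uint32_t hi = (uint32_t)(val >> 32);

	if (hi)
		return __builtin_clz(hi);          /* leading zeros are in the high word */
	return 32 + __builtin_clz((uint32_t)val); /* high word is all zeros */
}
```

For val = 1 the broken variant returns 31 instead of 63, which is the kind of off-by-32 result that produces the extra leading zeroes shown in the /proc/self/maps output below.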
      
      This bug has gone unnoticed for over 10 years, since kernel 3.11, and
      here are some possible reasons for that:
      
       a) Some architectures have assembly instructions to count the bits,
          which are used instead of calling __clzdi2(), e.g. the bsr
          instruction on x86 and cntlz on ppc. On such architectures the
          wrong __clzdi2() implementation isn't used, so the bug has no
          effect and won't be noticed.
      
       b) Some architectures link to libgcc.a, and the in-kernel weak
          functions get replaced by the correct 64-bit variants from libgcc.a.
      
       c) __builtin_clzll() and __clzdi2() don't seem to be used in many
          places in the kernel, and most likely only in non-critical
          functions, e.g. when printing hex values via seq_put_hex_ll(). The
          wrong return value still prints the correct number, just with the
          wrong formatting (e.g. with too many leading zeroes).
      
       d) 32-bit kernels aren't used that much any longer, so they are less
          tested.
      
      A trivial testcase to verify if the currently running 32-bit kernel is
      affected by the bug is to look at the output of /proc/self/maps:
      
      Here the kernel uses a correct implementation of __clzdi2():
      
        root@debian:~# cat /proc/self/maps
        00010000-00019000 r-xp 00000000 08:05 787324     /usr/bin/cat
        00019000-0001a000 rwxp 00009000 08:05 787324     /usr/bin/cat
        0001a000-0003b000 rwxp 00000000 00:00 0          [heap]
        f7551000-f770d000 r-xp 00000000 08:05 794765     /usr/lib/hppa-linux-gnu/libc.so.6
        ...
      
      and this kernel uses the broken implementation of __clzdi2():
      
        root@debian:~# cat /proc/self/maps
        0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324  /usr/bin/cat
        0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324  /usr/bin/cat
        000000001a000-000000003b000 rwxp 00000000 00:00 0  [heap]
        00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765  /usr/lib/hppa-linux-gnu/libc.so.6
        ...

      Signed-off-by: Helge Deller <deller@gmx.de>
      Fixes: 4df87bb7 ("lib: add weak clz/ctz functions")
      Cc: Chanho Min <chanho.min@lge.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: stable@vger.kernel.org # v3.11+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      382d4cd1
    • Linus Torvalds
      Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of... · 6f0edbb8
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4
        issues or aren't considered suitable for a -stable backport"
      
      * tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        shmem: fix smaps BUG sleeping while atomic
        selftests: cachestat: catch failing fsync test on tmpfs
        selftests: cachestat: test for cachestat availability
        maple_tree: disable mas_wr_append() when other readers are possible
        madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
        madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
        madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
        mm: multi-gen LRU: don't spin during memcg release
        mm: memory-failure: fix unexpected return value in soft_offline_page()
        radix tree: remove unused variable
        mm: add a call to flush_cache_vmap() in vmap_pfn()
        selftests/mm: FOLL_LONGTERM need to be updated to 0x100
        nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
        mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
        selftests: cgroup: fix test_kmem_basic less than error
        mm: enable page walking API to lock vmas during the walk
        smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
        mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
      6f0edbb8
    • Yonghong Song
      kallsyms: Fix kallsyms_selftest failure · 33f0467f
      Yonghong Song authored
      Kernel test robot reported a kallsyms_test failure when clang LTO is
      enabled (thin or full) and CONFIG_KALLSYMS_SELFTEST is also enabled.
      I can reproduce it in my local environment with thin LTO, with the
      following error message:
        [    1.877897] kallsyms_selftest: Test for 1750th symbol failed: (tsc_cs_mark_unstable) addr=ffffffff81038090
        [    1.877901] kallsyms_selftest: abort
      
      It appears that commit 8cc32a9b ("kallsyms: strip LTO-only suffixes
      from promoted global functions") caused the failure. Commit 8cc32a9b
      changed cleanup_symbol_name() to strip symbol names at ".llvm." instead
      of at '.', where ".llvm." is appended to a local symbol name from before
      LTO optimization. We need to propagate this knowledge to
      kallsyms_selftest.c as well.
      
      Furthermore, compare_symbol_name() in kallsyms.c needs to change as well.
      In scripts/kallsyms.c, kallsyms_names and kallsyms_seqs_of_names are used
      to record symbol names themselves and index to symbol names respectively.
      For example:
        kallsyms_names:
          ...
          __amd_smn_rw._entry       <== seq 1000
          __amd_smn_rw._entry.5     <== seq 1001
          __amd_smn_rw.llvm.<hash>  <== seq 1002
          ...
      
      kallsyms_seqs_of_names is sorted based on cleanup_symbol_name(), though, so
      the actual order in kallsyms_seqs_of_names is
      
        index 1000:   seq 1002   <== __amd_smn_rw.llvm.<hash> (actual symbol comparison using '__amd_smn_rw')
        index 1001:   seq 1000   <== __amd_smn_rw._entry
        index 1002:   seq 1001   <== __amd_smn_rw._entry.5
      
      Let us say that at a particular point, at index 1000, symbol '__amd_smn_rw.llvm.<hash>'
      is compared to '__amd_smn_rw._entry', where '__amd_smn_rw._entry' is the one
      being searched for, e.g., with function kallsyms_on_each_match_symbol(). The current
      implementation will find that '__amd_smn_rw._entry' is less than
      '__amd_smn_rw.llvm.<hash>' and then continue to search, e.g., index 999, and never
      find a match although the actual index 1001 is a match.
      
      To fix this issue, let us do cleanup_symbol_name() first and then do the comparison.
      In the above case, '__amd_smn_rw' is compared with '__amd_smn_rw._entry', and since
      '__amd_smn_rw._entry' is greater than '__amd_smn_rw', the next comparison will be
      at an index > 1000; eventually index 1001 will be hit and a match is found.
      
      For symbols without a '.llvm.' substring, compare_symbol_name() behaves
      exactly as before.
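The fix can be illustrated with a userspace binary search that cleans each table name before comparing, so the comparison uses the same key the table was sorted by. This is a sketch under stated assumptions: toy_cleanup(), toy_compare() and toy_lookup() are hypothetical names, the 128-byte buffer is an arbitrary limit, and the real kernel code searches via kallsyms_seqs_of_names rather than a plain string array.

```c
#include <string.h>

/* Strip an LTO-only ".llvm.<hash>" suffix in place (cf. cleanup_symbol_name()). */
static void toy_cleanup(char *s)
{
	char *p = strstr(s, ".llvm.");

	if (p)
		*p = '\0';
}

/* Compare the searched name against a cleaned copy of the table entry. */
static int toy_compare(const char *name, const char *table_name)
{
	char buf[128];

	strncpy(buf, table_name, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	toy_cleanup(buf);
	return strcmp(name, buf);
}

/* Binary search over names sorted by their cleaned form; index or -1. */
static int toy_lookup(const char *const *names, int n, const char *name)
{
	int lo = 0, hi = n - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int ret = toy_compare(name, names[mid]);

		if (ret == 0)
			return mid;
		if (ret < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1;
}
```

With a table ordered as in the example above, a raw strcmp() at the '.llvm.' entry sorts '__amd_smn_rw._entry' before it ('_' is less than 'l'), steering the search away from the match; comparing against the cleaned name '__amd_smn_rw' steers it toward the later entry instead.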
      
      Fixes: 8cc32a9b ("kallsyms: strip LTO-only suffixes from promoted global functions")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202308232200.1c932a90-oliver.sang@intel.com
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Reviewed-by: Song Liu <song@kernel.org>
      Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
      Link: https://lore.kernel.org/r/20230825034659.1037627-1-yonghong.song@linux.dev
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      33f0467f
    • Linus Torvalds
      Merge tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 4942fed8
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
       "This is obviously not ideal, particularly for something this late in
        the cycle.
      
        Unfortunately we found some uABI issues in the vector support while
        reviewing the GDB port, which has triggered a revert. That is
        probably a good sign we should have reviewed GDB before merging
        this; I guess I just dropped the ball because I was so worried about
        the context extension and libc stuff that I forgot. Hence the late
        revert.
      
        There's some risk here as we're still exposing the vector context for
        signal handlers, but changing that would have meant reverting all of
        the vector support. The issues we've found so far have been fixed
        already and they weren't absolute showstoppers, so we're essentially
        just playing it safe by holding ptrace support for another release (or
        until we get through a proper userspace code review).
      
        Summary:
      
         - The vector ucontext extension has been extended with vlenb
      
         - The vector registers ELF core dump note type has been changed to
           avoid aliasing with the CSR type used in embedded systems
      
         - Support for accessing vector registers via ptrace() has been
           reverted
      
         - Another build fix for the ISA spec changes around Zifencei/Zicsr
           that manifests on some systems built with binutils-2.37 and
           gcc-11.2"
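      The summary notes that vlenb (the VLENB CSR, i.e. the vector register length
      VLEN expressed in bytes) now appears in the vector signal context. As a rough
      illustration of why a consumer of the saved state would want that value, the
      sketch below computes how many bytes the 32 architectural vector registers
      occupy for a given vlenb. The helper names are hypothetical and the code does
      not model the actual __sc_riscv_v_state layout.

```c
#include <assert.h>
#include <stddef.h>

/* The RISC-V V extension defines 32 architectural vector registers, v0-v31. */
#define RVV_NUM_VREGS 32

/* VLEN in bits, recovered from vlenb (VLEN expressed in bytes). */
static unsigned long rvv_vlen_bits(unsigned long vlenb)
{
    return vlenb * 8;
}

/* Bytes needed to save all 32 vector registers, each vlenb bytes wide. */
static size_t rvv_regfile_bytes(unsigned long vlenb)
{
    assert(vlenb != 0); /* a hart with vector support reports a non-zero vlenb */
    return (size_t)RVV_NUM_VREGS * vlenb;
}
```

      For example, a hart with VLEN = 128 reports vlenb = 16, so the vector
      register file occupies 32 x 16 = 512 bytes.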
      
      * tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix build errors using binutils2.37 toolchains
        RISC-V: vector: export VLENB csr in __sc_riscv_v_state
        RISC-V: Remove ptrace support for vectors
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 98c6b8a5
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix an irq mapping leak in gpio-sim
      
       - associate the GPIO device's software node with the irq domain in
         gpio-sim
      
      * tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: sim: pass the GPIO device's software node to irq domain
        gpio: sim: dispose of irq mappings before destroying the irq_sim domain
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · a87eaffb
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Here are some Renesas and AMD driver fixes, the AMD fix affects
        important laptops in the wild so this one is pretty important. It
        seems a bit tough to get this right.
      
         - Fix DT parsing and related locking in the Renesas driver.
      
         - Fix wakeup IRQs in the AMD driver once again. Really tricky this
           one"
      
      * tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: amd: Mask wake bits on probe again
        pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
        pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
        pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
    • Linus Torvalds's avatar
      Merge tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ced5bf24
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Hopefully the last bits for 6.5. It's slightly higher LOCs than
        wished, but it doesn't look scary.
      
        The biggest change is MAINTAINERS update for TI; it's good to have the
        update before the final release, so that people can contact to the
        right persons for bug reports (which shouldn't happen of course!)
      
        The rest are all device-specific fixes and quirks, most for various
        ASoC platforms"
      
      * tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
        ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
        ASoC: cs35l41: Correct amp_gain_tlv values
        ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
        ASoC: tas2781: fixed register access error when switching to other chips
        ASoC: cs35l56: Add an ACPI match table
        ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
        ASoC: SOF: ipc4-pcm: fix possible null pointer deference
        MAINTAINERS: Add entries for TEXAS INSTRUMENTS ASoC DRIVERS
    • Tiezhu Yang's avatar
      LoongArch: Put the body of play_dead() into arch_cpu_idle_dead() · c337c849
      Tiezhu Yang authored
      The initial aim is to silence the following objtool warning:
      
      arch/loongarch/kernel/process.o: warning: objtool: arch_cpu_idle_dead() falls through to next function start_thread()
      
      According to tools/objtool/Documentation/objtool.txt, this is because
      the last instruction of arch_cpu_idle_dead() is a call to the noreturn
      function play_dead(). One simple way to silence the warning is to add
      play_dead() to objtool's hard-coded global_noreturns array, that is,
      to put "NORETURN(play_dead)" into tools/objtool/noreturns.h, which
      works well.
      
      However, play_dead() is defined only once and called only by
      arch_cpu_idle_dead(). So an alternative is to move the body of
      play_dead() into its caller arch_cpu_idle_dead() and remove the
      noreturn function play_dead(), which also saves the overhead of a
      function call.
      
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • Tiezhu Yang's avatar
      LoongArch: Add identifier names to arguments of die() declaration · 8879515e
      Tiezhu Yang authored
      Add identifier names to the arguments of the die() declaration in ptrace.h
      to fix the following checkpatch warnings:
      
        WARNING: function definition argument 'const char *' should also have an identifier name
        WARNING: function definition argument 'struct pt_regs *' should also have an identifier name
      
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • Tiezhu Yang's avatar
      LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP · a038ae71
      Tiezhu Yang authored
      After the call to oops_exit(), die() should neither panic nor execute
      the crash kernel if the oops is to be suppressed, i.e. if notify_die()
      returned NOTIFY_STOP.
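      The ordering described above can be modelled in userspace. In this sketch,
      which assumes a simplified die() whose kernel calls are replaced by flags in a
      hypothetical struct die_trace, oops_exit() always runs, and a NOTIFY_STOP
      result from the notifier makes the function return before crash_kexec() or
      panic() is reached. The NOTIFY_* values mirror include/linux/notifier.h;
      die_model() and its parameters are illustrative only, not the LoongArch
      implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Notifier return values, as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical record of what the modelled die() did. */
struct die_trace {
    bool oops_exited;
    bool crash_kexec_called;
    bool panicked;
};

/*
 * Hypothetical model: notify_ret stands in for notify_die()'s result,
 * should_crash for kexec_should_crash(), panic_on_oops for the sysctl.
 */
static void die_model(int notify_ret, bool should_crash, bool panic_on_oops,
                      struct die_trace *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));

    /* ... report the oops: print registers, add taint ... */
    t->oops_exited = true;              /* oops_exit() always runs */

    if (notify_ret == NOTIFY_STOP)      /* oops suppressed: return early */
        return;

    if (should_crash)
        t->crash_kexec_called = true;   /* would call crash_kexec(regs) */
    if (panic_on_oops)
        t->panicked = true;             /* would call panic() */
}
```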
      
      Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>