1. 25 May, 2024 2 commits
    • Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · 2313022e
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Fixes for -Wmissing-prototypes warnings and further cleanup
      
       - Remove callback returning void from rtc and virtio drivers
      
       - Fix bash location
      
      * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
        um: virtio_uml: Convert to platform remove callback returning void
        um: rtc: Convert to platform remove callback returning void
        um: Remove unused do_get_thread_area function
        um: Fix -Wmissing-prototypes warnings for __vdso_*
        um: Add an internal header shared among the user code
        um: Fix the declaration of kasan_map_memory
        um: Fix the -Wmissing-prototypes warning for get_thread_reg
        um: Fix the -Wmissing-prototypes warning for __switch_mm
        um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
        um: Stop tracking host PID in cpu_tasks
        um: process: remove unused 'n' variable
        um: vector: remove unused len variable/calculation
        um: vector: fix bpfflash parameter evaluation
        um: slirp: remove set but unused variable 'pid'
        um: signal: move pid variable where needed
        um: Makefile: use bash from the environment
        um: Add winch to winch_handlers before registering winch IRQ
        um: Fix -Wmissing-prototypes warnings for __warp_* and foo
        um: Fix -Wmissing-prototypes warnings for text_poke*
        um: Move declarations to proper headers
        ...
    • Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel · 56fb6f92
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Some fixes for the end of the merge window, mostly amdgpu and panthor,
        with one nouveau uAPI change that fixes a bad decision we made a few
        months back.
      
        nouveau:
         - fix bo metadata uAPI for vm bind
      
        panthor:
         - Fixes for panthor's heap logical block.
         - Reset on unrecoverable fault
         - Fix VM references.
         - Reset fix.
      
        xlnx:
         - xlnx compile and doc fixes.
      
        amdgpu:
         - Handle vbios table integrated info v2.3
      
        amdkfd:
         - Handle duplicate BOs in reserve_bo_and_cond_vms
         - Handle memory limitations on small APUs
      
        dp/mst:
         - MST null deref fix.
      
        bridge:
         - Don't let next bridge create connector in adv7511 to make probe
           work"
      
      * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
        drm/amdgpu/atomfirmware: add intergrated info v2.3 table
        drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
        drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
        drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
        drm/bridge: adv7511: Attach next bridge without creating connector
        drm/buddy: Fix the warn on's during force merge
        drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
        drm/panthor: Call panthor_sched_post_reset() even if the reset failed
        drm/panthor: Reset the FW VM to NULL on unplug
        drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
        drm/panthor: Force an immediate reset on unrecoverable faults
        drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
        drm/panthor: Fix an off-by-one in the heap context retrieval logic
        drm/panthor: Relax the constraints on the tiler chunk size
        drm/panthor: Make sure the tiler initial/max chunks are consistent
        drm/panthor: Fix tiler OOM handling to allow incremental rendering
        drm: xlnx: zynqmp_dpsub: Fix compilation error
        drm: xlnx: zynqmp_dpsub: Fix few function comments
  2. 24 May, 2024 17 commits
    • Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 0b32d436
      Linus Torvalds authored
      Pull more mm updates from Andrew Morton:
       "Jeff Xu's implementation of the mseal() syscall"
      
      * tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftest mm/mseal read-only elf memory segment
        mseal: add documentation
        selftest mm/mseal memory sealing
        mseal: add mseal syscall
        mseal: wire up mseal syscall
    • Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · f1f9984f
      Linus Torvalds authored
      Pull more RISC-V updates from Palmer Dabbelt:
      
       - The compression format used for boot images is now configurable at
         build time, and these formats are shown in `make help`
      
       - access_ok() has been optimized
      
       - A pair of performance bugs have been fixed in the uaccess handlers
      
       - Various fixes and cleanups, including one for the IMSIC build failure
         and one for the early-boot ftrace illegal NOPs bug
      
      * tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix early ftrace nop patching
        irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
        riscv: selftests: Add signal handling vector tests
        riscv: mm: accelerate pagefault when badaccess
        riscv: uaccess: Relax the threshold for fast path
        riscv: uaccess: Allow the last potential unrolled copy
        riscv: typo in comment for get_f64_reg
        Use bool value in set_cpu_online()
        riscv: selftests: Add hwprobe binaries to .gitignore
        riscv: stacktrace: fixed walk_stackframe()
        ftrace: riscv: move from REGS to ARGS
        riscv: do not select MODULE_SECTIONS by default
        riscv: show help string for riscv-specific targets
        riscv: make image compression configurable
        riscv: cpufeature: Fix extension subset checking
        riscv: cpufeature: Fix thead vector hwcap removal
        riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
        riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
        riscv: Define TASK_SIZE_MAX for __access_ok()
        riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
    • Merge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 9351f138
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
      
       - a small cleanup in the drivers/xen/xenbus Makefile
      
       - a fix of the Xen xenstore driver to improve connecting to a late
         started Xenstore
      
       - an enhancement for better support of ballooning in PVH guests
      
       - a cleanup using try_cmpxchg() instead of open coding it
      
      * tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        drivers/xen: Improve the late XenStore init protocol
        xen/xenbus: Use *-y instead of *-objs in Makefile
        xen/x86: add extra pages to unpopulated-alloc if available
        locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
    • Merge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 02c438bb
      Linus Torvalds authored
      Pull more btrfs updates from David Sterba:
       "A few more updates, mostly stability fixes or user visible changes:
      
         - fix race in zoned mode during device replace that can lead to
           use-after-free
      
         - update return codes and lower message levels for quota rescan where
           it's causing false alerts
      
         - fix unexpected qgroup id reuse under some conditions
      
         - fix condition when looking up extent refs
      
         - add option norecovery (removed in 6.8), the intended replacements
           haven't been used and some applications still rely on the old one
      
         - build warning fixes"
      
      * tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: re-introduce 'norecovery' mount option
        btrfs: fix end of tree detection when searching for data extent ref
        btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
        btrfs: zoned: fix use-after-free due to race with dev replace
        btrfs: qgroup: fix qgroup id collision across mounts
        btrfs: qgroup: update rescan message levels and error codes
    • Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · dcb9f486
      Linus Torvalds authored
      Pull more erofs updates from Gao Xiang:
       "The main ones are metadata API conversion to byte offsets by Al Viro.
      
        Another patch gets rid of unnecessary memory allocation out of DEFLATE
        decompressor. The remaining one is a trivial cleanup.
      
         - Convert metadata APIs to byte offsets
      
         - Avoid allocating DEFLATE streams unnecessarily
      
         - Some erofs_show_options() cleanup"
      
      * tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: avoid allocating DEFLATE streams before mounting
        z_erofs_pcluster_begin(): don't bother with rounding position down
        erofs: don't round offset down for erofs_read_metabuf()
        erofs: don't align offset for erofs_read_metabuf() (simple cases)
        erofs: mechanically convert erofs_read_metabuf() to offsets
        erofs: clean up erofs_show_options()
    • Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs · c40b1994
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "Nothing exciting, just syzbot fixes (except for the one
        FMODE_CAN_ODIRECT patch).
      
        Looks like syzbot reports have slowed down; this is all catch up from
        two weeks of conferences.
      
        Next hardening project is using Thomas's error injection tooling to
        torture test repair"
      
      * tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: Fix race path in bch2_inode_insert()
        bcachefs: Ensure we're RW before journalling
        bcachefs: Fix shutdown ordering
        bcachefs: Fix unsafety in bch2_dirent_name_bytes()
        bcachefs: Fix stack oob in __bch2_encrypt_bio()
        bcachefs: Fix btree_trans leak in bch2_readahead()
        bcachefs: Fix bogus verify_replicas_entry() assert
        bcachefs: Check for subvolues with bogus snapshot/inode fields
        bcachefs: bch2_checksum() returns 0 for unknown checksum type
        bcachefs: Fix bch2_alloc_ciphers()
        bcachefs: Add missing guard in bch2_snapshot_has_children()
        bcachefs: Fix missing parens in drop_locks_do()
        bcachefs: Improve bch2_assert_pos_locked()
        bcachefs: Fix shift overflows in replicas.c
        bcachefs: Fix shift overflow in btree_lost_data()
        bcachefs: Fix ref in trans_mark_dev_sbs() error path
        bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
        bcachefs: Fix rcu splat in check_fix_ptrs()
    • Merge tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 9ea370f3
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - a change to the input core to trim the amount of key data in the
         modalias string when a device declares too many keys to fit in the
         uevent buffer, instead of reporting an error, which results in the
         uevent not being generated at all
      
       - support for Machenike G5 Pro Controller added to xpad driver
      
       - support for FocalTech FT5452 and FT8719 added to edt-ft5x06
      
       - support for new SPMI vibrator added to pm8xxx-vibrator driver
      
       - missing locking added to cyapa touchpad driver
      
       - removal of unused fields in various driver structures
      
       - explicit initialization of i2c_device_id::driver_data to 0 dropped
         from input drivers
      
       - other assorted fixes and cleanups.
      
      * tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
        Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
        dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
        Input: xpad - add support for Machenike G5 Pro Controller
        Input: try trimming too long modalias strings
        Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
        Input: zet6223 - remove an unused field in struct zet6223_ts
        Input: chipone_icn8505 - remove an unused field in struct icn8505_data
        Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
        Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
        Input: matrix_keypad - remove an unused field in struct matrix_keypad
        Input: tca6416-keypad - remove unused struct tca6416_drv_data
        Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
        Input: da7280 - remove an unused field in struct da7280_haptic
        Input: ff-core - prefer struct_size over open coded arithmetic
        Input: cyapa - add missing input core locking to suspend/resume functions
        input: pm8xxx-vibrator: add new SPMI vibrator support
        dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
        input: pm8xxx-vibrator: refactor to support new SPMI vibrator
        Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
        Input: sur40 - convert le16 to cpu before use
        ...
    • Merge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 041c9f71
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes for 6.10-rc1. Most of the changes are various
        device-specific fixes and quirks, while there are a few small changes
        in ALSA core timer and module / built-in fixes"
      
      * tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
        ALSA: core: Enable proc module when CONFIG_MODULES=y
        ALSA: core: Fix NULL module pointer assignment at card init
        ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
        ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
        ASoC: tas2781: Fix wrong loading calibrated data sequence
        ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
        ALSA: usb-audio: Fix for sampling rates support for Mbox3
        Documentation: sound: Fix trailing whitespaces
        ALSA: timer: Set lower bound of start tick time
        ASoC: codecs: ES8326: solve hp and button detect issue
        ASoC: rt5645: mic-in detection threshold modification
        ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
    • Merge tag 'char-misc-6.10-rc1-fix' of... · e292ead0
      Linus Torvalds authored
      Merge tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
      
      Pull char/misc fix from Greg KH:
       "Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
        merge window, and has been sitting in my tree and linux-next for quite
        a while now, but wasn't sent to you (my fault, travels...)
      
        It is a bugfix to resolve an error in the speakup code that could
        overflow a buffer.
      
        It has been in linux-next for a while with no reported problems"
      
      * tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        speakup: Fix sizeof() vs ARRAY_SIZE() bug
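
The class of bug named in the commit title above can be illustrated with a minimal sketch in Python using ctypes. The 16-bit element type, the array length of 8, and the helper name `index_is_safe` are assumptions made for illustration; they are not taken from the speakup code. The sketch shows why a byte count (what C's sizeof yields) is the wrong bound for indexing an array whose elements are wider than one byte.

```python
import ctypes

# Hypothetical array of eight 16-bit elements (illustrative, not the speakup buffer).
WideBuf = ctypes.c_uint16 * 8
buf = WideBuf()

size_in_bytes = ctypes.sizeof(buf)   # analogue of C's sizeof(buf)
element_count = len(buf)             # analogue of the kernel's ARRAY_SIZE(buf)


def index_is_safe(index: int) -> bool:
    """Return True only if buf[index] lies inside the array."""
    return 0 <= index < element_count


# The largest index each bound would admit in an "index < bound" check.
last_index_by_sizeof = size_in_bytes - 1
last_index_by_array_size = element_count - 1

print(size_in_bytes, element_count)
print(index_is_safe(last_index_by_sizeof), index_is_safe(last_index_by_array_size))
```

With two-byte elements the byte count is twice the element count, so a check written against it would admit indexes well past the end of the array.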
    • Merge tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · f6d199c7
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small TTY and Serial driver fixes that missed the
        6.9-final merge window, but have been in my tree for weeks (my fault,
        travel caused me to miss this)
      
        These fixes include:
      
         - more n_gsm fixes for reported problems
      
         - 8520_mtk driver fix
      
         - 8250_bcm7271 driver fix
      
         - sc16is7xx driver fix
      
        All of these have been in linux-next for weeks without any reported
        problems"
      
      * tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
        serial: 8250_bcm7271: use default_mux_rate if possible
        serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
        tty: n_gsm: fix missing receive state reset after mode switch
        tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
    • Merge tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · b0a9ba13
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
         decompression (Stephen Boyd)
      
       - ubsan: Restore dependency on ARCH_HAS_UBSAN
      
       - kunit/fortify: Fix memcmp() test to be amplitude agnostic
      
      * tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        kunit/fortify: Fix memcmp() test to be amplitude agnostic
        ubsan: Restore dependency on ARCH_HAS_UBSAN
        loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
    • Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 0eb03c7e
      Linus Torvalds authored
      Pull tracefs/eventfs updates from Steven Rostedt:
       "Bug fixes:
      
         - The eventfs directories need to have unique inode numbers. Make
           sure that they do not get the default file inode number.
      
         - Update the inode uid and gid fields on remount.
      
           When a remount happens where a uid and/or gid is specified, all the
           tracefs files and directories should get the specified uid and/or
           gid. But this can be sporadic when some uids were assigned already.
           There's already a list of inodes that are allocated. Just update
           their uid and gid fields at the time of remount.
      
         - Update the eventfs_inodes on remount from the top level "events"
           descriptor.
      
           There was a bug where not all the eventfs files or directories
           were getting updated on remount. One fix was to clear the
           SAVED_UID/GID flags from the inode list during the iteration of the
           inodes during the remount. But because the eventfs inodes can be
           freed when the last reference is released, not all the
           eventfs_inodes were being updated. This led to the ownership
           selftest failing if it was run a second time (the first time would
           leave eventfs_inodes with no corresponding tracefs_inode).
      
           Instead, for eventfs_inodes, only process the "events"
           eventfs_inode from the list iteration, as it is guaranteed to have
           a tracefs_inode (it's never freed while the "events" directory
           exists). As it has a list of its children, and the children have a
           list of their children, just iterate all the eventfs_inodes from
           the "events" descriptor and it is guaranteed to get all of them.
      
         - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
      
           Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
           callback. But this is the wrong location. The iput() callback is
           called when the last reference to the dentry inode is hit. There
           could be a case where two dentries have the same inode, and the
           flag will be cleared prematurely. The flag needs to be cleared when
           the last reference of the inode is dropped and that happens in the
           inode's drop_inode() callback handler.
      
        Cleanups:
      
         - Consolidate the creation of a tracefs_inode for an eventfs_inode
      
           A tracefs_inode is created for both files and directories of the
           eventfs system. It is open coded. Instead, consolidate it into a
           single eventfs_get_inode() function call.
      
         - Remove the eventfs getattr and permission callbacks.
      
           The permissions for the eventfs files and directories are updated
           when the inodes are created, on remount, and when the user sets
           them (via setattr). The inodes hold the current permissions so
           there is no need to have custom getattr or permissions callbacks as
           they will more likely cause them to be incorrect. The inode's
           permissions are updated when they should be updated. Remove the
           getattr and permissions inode callbacks.
      
         - Do not update eventfs_inode attributes on creation of inodes.
      
           The eventfs_inodes attribute field is used to store the permissions
           of the directories and files for when their corresponding inodes
           are freed and are created again. But when the creation of the
           inodes happens, the eventfs_inode attributes are recalculated. The
           recalculation should only happen when the permissions change for a
           given file or directory. Currently, the attribute changes are just
           being set to their current files so this is not a bug, but it's
           unnecessary and error prone. Stop doing that.
      
         - The events directory inode is created once when the events
           directory is created and deleted when it is deleted. It is now
           updated on remount and when the user changes the permissions.
           There's no need to use the eventfs_inode of the events directory to
           store the events directory permissions. But using it to store the
           default permissions for the files within the directory that have
           not been updated by the user can simplify the code"
      
      * tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Do not use attributes for events directory
        eventfs: Cleanup permissions in creation of inodes
        eventfs: Remove getattr and permission callbacks
        eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
        tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
        eventfs: Update all the eventfs_inodes from the events descriptor
        tracefs: Update inode permissions on remount
        eventfs: Keep the directories from having the same inode number as files
    • selftest mm/mseal read-only elf memory segment · a52b4f11
      Jeff Xu authored
      Seal the read-only ELF mapping so that it can't be changed by mprotect().
      
      [jeffxu@chromium.org: style change]
        Link: https://lkml.kernel.org/r/20240416220944.2481203-2-jeffxu@chromium.org
      [amer.shanawany@gmail.com: fix linker error for inline function]
        Link: https://lkml.kernel.org/r/20240420202346.546444-1-amer.shanawany@gmail.com
      [jeffxu@chromium.org: fix compile warning]
        Link: https://lkml.kernel.org/r/20240420003515.345982-2-jeffxu@chromium.org
      [jeffxu@chromium.org: fix arm build]
        Link: https://lkml.kernel.org/r/20240502225331.3806279-2-jeffxu@chromium.org
      Link: https://lkml.kernel.org/r/20240415163527.626541-6-jeffxu@chromium.org
      Signed-off-by: Jeff Xu <jeffxu@chromium.org>
      Signed-off-by: Amer Al Shanawany <amer.shanawany@gmail.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: Stephen Röttger <sroettger@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
      Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mseal: add documentation · c010d099
      Jeff Xu authored
      Add documentation for mseal().
      
      Link: https://lkml.kernel.org/r/20240415163527.626541-5-jeffxu@chromium.org
      Signed-off-by: Jeff Xu <jeffxu@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: Stephen Röttger <sroettger@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
      Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftest mm/mseal memory sealing · 4926c7a5
      Jeff Xu authored
      selftest for memory sealing change in mmap() and mseal().
      
      Link: https://lkml.kernel.org/r/20240415163527.626541-4-jeffxu@chromium.org
      Signed-off-by: Jeff Xu <jeffxu@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: Stephen Röttger <sroettger@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
      Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mseal: add mseal syscall · 8be7258a
      Jeff Xu authored
      The new mseal() is a syscall on 64-bit CPUs, with the following signature:
      
      int mseal(void *addr, size_t len, unsigned long flags)
      addr/len: memory range.
      flags: reserved.
      
      mseal() blocks the following operations for the given memory range.
      
      1> Unmapping, moving to another location, and shrinking the size,
         via munmap() and mremap(). These can leave an empty space that can
         then be replaced with a VMA with a new set of attributes.
      
      2> Moving or expanding a different VMA into the current location,
         via mremap().
      
      3> Modifying a VMA via mmap(MAP_FIXED).
      
      4> Size expansion, via mremap(), does not appear to pose any specific
         risks to sealed VMAs. It is included anyway because the use case is
         unclear. In any case, users can rely on merging to expand a sealed VMA.
      
      5> mprotect() and pkey_mprotect().
      
      6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
         memory, when users don't have write permission to the memory. Those
         behaviors can alter region contents by discarding pages, effectively a
         memset(0) for anonymous memory.
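
The blocking behavior listed above can be probed from user space with a minimal sketch in Python using ctypes. Several things here are assumptions rather than content of this commit: the syscall number 462 (the value assigned to mseal on x86-64 and in the generic syscall table for Linux 6.10), the helper name `seal_and_probe`, and the choice of mprotect() as the probed operation. The sketch needs a 64-bit Linux system and reports "unavailable" when the running kernel does not accept the call.

```python
import ctypes
import errno
import mmap

# Assumption: 462 is the mseal syscall number (x86-64 and generic table, Linux 6.10).
SYS_MSEAL = 462

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]


def seal_and_probe():
    """Seal one anonymous page, then try to change its protection.

    Returns ("sealed", errno_from_mprotect) when mseal() succeeded, or
    ("unavailable", errno_from_mseal) when the kernel rejected the call.
    """
    length = mmap.PAGESIZE
    region = mmap.mmap(-1, length,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    # Obtain the mapping's address through the buffer protocol.
    view = ctypes.c_char.from_buffer(region)
    addr = ctypes.addressof(view)

    ctypes.set_errno(0)
    rc = libc.syscall(ctypes.c_long(SYS_MSEAL), ctypes.c_void_p(addr),
                      ctypes.c_size_t(length), ctypes.c_ulong(0))
    if rc != 0:
        return "unavailable", ctypes.get_errno()

    # mprotect() is one of the operations blocked on a sealed range.
    # A sealed page cannot be unmapped either, so it stays mapped until exit.
    ctypes.set_errno(0)
    rc = libc.mprotect(addr, length, mmap.PROT_READ)
    return "sealed", (ctypes.get_errno() if rc != 0 else 0)
```

Calling `seal_and_probe()` on a kernel with mseal() support should return "sealed" together with the errno from the refused mprotect() call; the mseal documentation describes that error as EPERM.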
      
      The following input received during the RFC is incorporated into this patch:
      
      Jann Horn: raising awareness and providing valuable insights on the
      destructive madvise operations.
      Linus Torvalds: assisting in defining system call signature and scope.
      Liam R. Howlett: perf optimization.
      Theo de Raadt: sharing the experiences and insight gained from
        implementing mimmutable() in OpenBSD.
      
      Finally, the idea that inspired this patch comes from Stephen Röttger's
      work in Chrome V8 CFI.
      
      [jeffxu@chromium.org: add branch prediction hint, per Pedro]
        Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
      Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
      Signed-off-by: Jeff Xu <jeffxu@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: Stephen Röttger <sroettger@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
      Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mseal: wire up mseal syscall · ff388fe5
      Jeff Xu authored
      Patch series "Introduce mseal", v10.
      
      This patchset proposes a new mseal() syscall for the Linux kernel.
      
      In a nutshell, mseal() protects the VMAs of a given virtual memory range
      against modifications, such as changes to their permission bits.
      
      Modern CPUs support memory permissions, such as the read/write (RW) and
      no-execute (NX) bits.  Linux has supported NX since the release of kernel
      version 2.6.8 in August 2004 [1].  The memory permission feature improves
      the security stance on memory corruption bugs, as an attacker cannot
      simply write to arbitrary memory and point the code to it.  The memory
      must be marked with the X bit, or else an exception will occur. 
      Internally, the kernel maintains the memory permissions in a data
      structure called VMA (vm_area_struct).  mseal() additionally protects the
      VMA itself against modifications of the selected seal type.
      
      Memory sealing is useful to mitigate memory corruption issues where a
      corrupted pointer is passed to a memory management system.  For example,
      such an attacker primitive can break control-flow integrity guarantees
      since read-only memory that is supposed to be trusted can become writable
      or .text pages can get remapped.  Memory sealing can automatically be
      applied by the runtime loader to seal .text and .rodata pages and
      applications can additionally seal security critical data at runtime.  A
      similar feature already exists in the XNU kernel with the
      VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
      [4].  Also, Chrome wants to adopt this feature for their CFI work [2] and
      this patchset has been designed to be compatible with the Chrome use case.
      
      Two system calls are involved in sealing the map:  mmap() and mseal().
      
      The new mseal() is a syscall on 64-bit CPUs, with the following signature:
      
      int mseal(void *addr, size_t len, unsigned long flags)
      addr/len: memory range.
      flags: reserved.
      
      mseal() blocks the following operations for the given memory range:
      
      1> Unmapping, moving to another location, and shrinking the size,
         via munmap() and mremap(). These can leave an empty space that can
         then be replaced with a VMA with a new set of attributes.
      
      2> Moving or expanding a different VMA into the current location,
         via mremap().
      
      3> Modifying a VMA via mmap(MAP_FIXED).
      
      4> Size expansion, via mremap(), does not appear to pose any specific
         risks to sealed VMAs. It is included anyway because the use case is
         unclear. In any case, users can rely on merging to expand a sealed VMA.
      
      5> mprotect() and pkey_mprotect().
      
      6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
         memory, when users don't have write permission to the memory. Those
         behaviors can alter region contents by discarding pages, effectively a
         memset(0) for anonymous memory.
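
The signature and the list of blocked operations above can be probed from user space. The following is a minimal sketch, not part of the patch: `mseal_range()` and `check_sealed_page()` are hypothetical names, the raw `syscall()` route is used on the assumption that libc has no wrapper, and the fallback number 462 is an assumption that holds for most architectures (use the value from your kernel headers when available). It also assumes a sealed operation fails with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback syscall number when the headers do not define it. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Hypothetical wrapper around the raw syscall. */
static int mseal_range(void *addr, size_t len, unsigned long flags)
{
    return (int)syscall(__NR_mseal, addr, len, flags);
}

/*
 * Seal one anonymous page and probe some of the operations listed above.
 * Returns  1 if the page could not be mapped or sealed (nothing checked),
 *          0 if every probed operation was rejected with EPERM,
 *         -1 if a probed operation succeeded or failed differently.
 */
int check_sealed_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    if (mseal_range(p, page, 0) != 0) {
        /* Older kernel, 32-bit kernel, or a sandbox refusing the call. */
        munmap(p, page);
        return 1;
    }

    /* 5> changing permissions with mprotect() */
    if (mprotect(p, page, PROT_READ | PROT_WRITE) == 0 || errno != EPERM)
        return -1;

    /* 3> replacing the mapping with mmap(MAP_FIXED) */
    void *q = mmap(p, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (q != MAP_FAILED || errno != EPERM)
        return -1;

    /* 1> unmapping with munmap() */
    if (munmap(p, page) == 0 || errno != EPERM)
        return -1;

    return 0; /* the sealed page stays mapped until the process exits */
}
```

A caller would treat a return value of 1 as "mseal not available here" rather than as a failure.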
      
      The idea that inspired this patch comes from Stephen Röttger’s work in
      V8 CFI [5].  Chrome browser in ChromeOS will be the first user of this
      API.
      
      Indeed, the Chrome browser has very specific requirements for sealing,
      which are distinct from those of most applications.  For example, in the
      case of libc, sealing is only applied to read-only (RO) or read-execute
      (RX) memory segments (such as .text and .RELRO) to prevent them from
      becoming writable; the lifetime of those mappings is tied to the lifetime
      of the process.
      
      Chrome wants to seal two large address space reservations that are managed
      by different allocators.  The memory is mapped RW- and RWX respectively
      but write access to it is restricted using pkeys (or in the future ARM
      permission overlay extensions).  The lifetime of those mappings is not
      tied to the lifetime of the process.  Therefore, while the memory is
      sealed, the allocators still need to free or discard the unused memory,
      for example with madvise(DONTNEED).
      
      However, always allowing madvise(DONTNEED) on this range poses a security
      risk.  For example, if a jump instruction crosses a page boundary and the
      second page gets discarded, it will overwrite the target bytes with zeros
      and change the control flow.  Checking write-permission before the discard
      operation allows us to control when the operation is valid.  In this case,
      the madvise will only succeed if the executing thread has PKEY write
      permissions and PKRU changes are protected in software by control-flow
      integrity.
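
The "overwrite with zeros" effect described above can be observed on ordinary, unsealed memory. This is a small sketch under the assumption of a private anonymous mapping on Linux; `byte_after_discard()` is a hypothetical helper, not code from the patch series.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fill a private anonymous page with 0xCC, discard it with MADV_DONTNEED,
 * and return the first byte read back afterwards (-1 on setup failure).
 */
int byte_after_discard(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xCC, page);

    if (madvise(p, page, MADV_DONTNEED) != 0) {
        munmap(p, page);
        return -1;
    }

    /* The next access faults in a fresh zero-filled page. */
    int first = p[0];

    munmap(p, page);
    return first;
}
```

On Linux the function returns 0 rather than 0xCC, which is why the series treats this madvise() behavior as a write to the range.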
      
      Although the initial version of this patch series is targeting the Chrome
      browser as its first user, it became evident during upstream discussions
      that we would also want to ensure that the patch set eventually is a
      complete solution for memory sealing and compatible with other use cases. 
      The specific scenario currently in mind is glibc's use case of loading and
      sealing ELF executables.  To this end, Stephen is working on a change to
      glibc to add sealing support to the dynamic linker, which will seal all
      non-writable segments at startup.  Once this work is completed, all
      applications will be able to automatically benefit from these new
      protections.
      
      In closing, I would like to formally acknowledge the valuable
      contributions received during the RFC process, which were instrumental in
      shaping this patch:
      
      Jann Horn: raising awareness and providing valuable insights on the
        destructive madvise operations.
      Liam R. Howlett: perf optimization.
      Linus Torvalds: assisting in defining system call signature and scope.
      Theo de Raadt: sharing the experiences and insight gained from
        implementing mimmutable() in OpenBSD.
      
      MM perf benchmarks
      ==================
      This patch adds a loop to mprotect/munmap/madvise(DONTNEED) that
      checks the VMAs' sealing flag, so that no partial update is made
      when any segment within the given memory range is sealed.
      
      To measure the performance impact of this loop, two tests were
      developed [8].
      
      The first measures the time taken for a particular system call
      using clock_gettime(CLOCK_MONOTONIC). The second uses
      PERF_COUNT_HW_REF_CPU_CYCLES (excluding user space). Both tests give
      similar results.
      
      The tests roughly follow the sequence below:
      for (i = 0; i < 1000; i++)
          create 1000 mappings (1 page per VMA)
          start the sampling
          for (j = 0; j < 1000; j++)
              mprotect one mapping
          stop and save the sample
          delete 1000 mappings
      calculate the results over all samples.
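
One iteration of the outer loop of the time-based test might look roughly like the sketch below. The actual test code [8] is not shown here, so everything in it is an assumption: the helper name `time_mprotect_ns()`, and the trick of alternating page protections so that each timed page sits in its own VMA.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Average wall-clock nanoseconds per mprotect() call over `count`
 * single-page VMAs, or -1.0 on failure.
 */
double time_mprotect_ns(int count)
{
    if (count <= 0)
        return -1.0;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 2 * (size_t)count * page;

    /* Reserve 2*count pages; every odd page becomes its own VMA below. */
    char *base = mmap(NULL, len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1.0;

    /* Differing protections keep neighbouring pages from merging. */
    for (int j = 0; j < count; j++) {
        if (mprotect(base + (2 * (size_t)j + 1) * page, page, PROT_READ)) {
            munmap(base, len);
            return -1.0;
        }
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);           /* start the sampling */
    for (int j = 0; j < count; j++)
        mprotect(base + (2 * (size_t)j + 1) * page, page,
                 PROT_READ | PROT_WRITE);          /* one VMA per call */
    clock_gettime(CLOCK_MONOTONIC, &t1);           /* stop the sampling */

    munmap(base, len);                             /* delete the mappings */

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / count;
}
```

Calling `time_mprotect_ns(1000)` repeatedly and aggregating the returned values would correspond to the outer loop; the numbers it yields depend on the machine and are not comparable to the tables in this message.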
      
      The tests below were performed on an Intel(R) Pentium(R) Gold 7505 @ 2.00GHz
      Chromebook with 4G of memory.
      
      Based on the latest upstream code:
      The first test (measuring time)
      syscall__	vmas	t	t_mseal	delta_ns	per_vma	%
      munmap__  	1	909	944	35	35	104%
      munmap__  	2	1398	1502	104	52	107%
      munmap__  	4	2444	2594	149	37	106%
      munmap__  	8	4029	4323	293	37	107%
      munmap__  	16	6647	6935	288	18	104%
      munmap__  	32	11811	12398	587	18	105%
      mprotect	1	439	465	26	26	106%
      mprotect	2	1659	1745	86	43	105%
      mprotect	4	3747	3889	142	36	104%
      mprotect	8	6755	6969	215	27	103%
      mprotect	16	13748	14144	396	25	103%
      mprotect	32	27827	28969	1142	36	104%
      madvise_	1	240	262	22	22	109%
      madvise_	2	366	442	76	38	121%
      madvise_	4	623	751	128	32	121%
      madvise_	8	1110	1324	215	27	119%
      madvise_	16	2127	2451	324	20	115%
      madvise_	32	4109	4642	534	17	113%
      
      The second test (measuring cpu cycle)
      syscall__	vmas	cpu	cmseal	delta_cpu	per_vma	%
      munmap__	1	1790	1890	100	100	106%
      munmap__	2	2819	3033	214	107	108%
      munmap__	4	4959	5271	312	78	106%
      munmap__	8	8262	8745	483	60	106%
      munmap__	16	13099	14116	1017	64	108%
      munmap__	32	23221	24785	1565	49	107%
      mprotect	1	906	967	62	62	107%
      mprotect	2	3019	3203	184	92	106%
      mprotect	4	6149	6569	420	105	107%
      mprotect	8	9978	10524	545	68	105%
      mprotect	16	20448	21427	979	61	105%
      mprotect	32	40972	42935	1963	61	105%
      madvise_	1	434	497	63	63	115%
      madvise_	2	752	899	147	74	120%
      madvise_	4	1313	1513	200	50	115%
      madvise_	8	2271	2627	356	44	116%
      madvise_	16	4312	4883	571	36	113%
      madvise_	32	8376	9319	943	29	111%
      
      Based on these results, for the 6.8 kernel the sealing check adds
      20-40 nanoseconds, or around 50-100 CPU cycles, per VMA.
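
The derived columns in the tables appear to follow simple arithmetic: delta is the sealed value minus the baseline, per_vma is delta divided by the number of VMAs, and % is the sealed value relative to the baseline. The column definitions are inferred from the numbers rather than stated in the text, and the helper names below are hypothetical. Because the table entries are rounded averages, recomputing a row can differ by one unit (for example, munmap with 4 VMAs lists a delta of 149 while 2594 - 2444 = 150).

```c
/* Integer division rounded to nearest; handles negative numerators. */
static long round_div(long num, long den)
{
    return (num >= 0) ? (num + den / 2) / den
                      : -((-num + den / 2) / den);
}

/* Extra cost per VMA: (sealed - baseline) / vmas. */
long per_vma_delta(long baseline, long sealed, long vmas)
{
    return round_div(sealed - baseline, vmas);
}

/* Sealed cost expressed as a percentage of the baseline. */
long percent_of_baseline(long baseline, long sealed)
{
    return round_div(100 * sealed, baseline);
}
```

For the munmap row with 2 VMAs (1398 ns baseline, 1502 ns sealed), these give 52 ns per VMA and 107%, matching the table.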
      
      In addition, I applied the sealing to the 5.10 kernel:
      The first test (measuring time)
      syscall__	vmas	t	tmseal	delta_ns	per_vma	%
      munmap__	1	357	390	33	33	109%
      munmap__	2	442	463	21	11	105%
      munmap__	4	614	634	20	5	103%
      munmap__	8	1017	1137	120	15	112%
      munmap__	16	1889	2153	263	16	114%
      munmap__	32	4109	4088	-21	-1	99%
      mprotect	1	235	227	-7	-7	97%
      mprotect	2	495	464	-30	-15	94%
      mprotect	4	741	764	24	6	103%
      mprotect	8	1434	1437	2	0	100%
      mprotect	16	2958	2991	33	2	101%
      mprotect	32	6431	6608	177	6	103%
      madvise_	1	191	208	16	16	109%
      madvise_	2	300	324	24	12	108%
      madvise_	4	450	473	23	6	105%
      madvise_	8	753	806	53	7	107%
      madvise_	16	1467	1592	125	8	108%
      madvise_	32	2795	3405	610	19	122%
      					
      The second test (measuring cpu cycle)
      syscall__	nbr_vma	cpu	cmseal	delta_cpu	per_vma	%
      munmap__	1	684	715	31	31	105%
      munmap__	2	861	898	38	19	104%
      munmap__	4	1183	1235	51	13	104%
      munmap__	8	1999	2045	46	6	102%
      munmap__	16	3839	3816	-23	-1	99%
      munmap__	32	7672	7887	216	7	103%
      mprotect	1	397	443	46	46	112%
      mprotect	2	738	788	50	25	107%
      mprotect	4	1221	1256	35	9	103%
      mprotect	8	2356	2429	72	9	103%
      mprotect	16	4961	4935	-26	-2	99%
      mprotect	32	9882	10172	291	9	103%
      madvise_	1	351	380	29	29	108%
      madvise_	2	565	615	49	25	109%
      madvise_	4	872	933	61	15	107%
      madvise_	8	1508	1640	132	16	109%
      madvise_	16	3078	3323	245	15	108%
      madvise_	32	5893	6704	811	25	114%
      
      For the 5.10 kernel, the sealing check adds 0-15 ns in time, or 10-30
      CPU cycles; some cases even show a decrease.
      
      It might be interesting to compare the 5.10 and 6.8 kernels:
      The first test (measuring time)
      syscall__	vmas	t_5_10	t_6_8	delta_ns	per_vma	%
      munmap__	1	357	909	552	552	254%
      munmap__	2	442	1398	956	478	316%
      munmap__	4	614	2444	1830	458	398%
      munmap__	8	1017	4029	3012	377	396%
      munmap__	16	1889	6647	4758	297	352%
      munmap__	32	4109	11811	7702	241	287%
      mprotect	1	235	439	204	204	187%
      mprotect	2	495	1659	1164	582	335%
      mprotect	4	741	3747	3006	752	506%
      mprotect	8	1434	6755	5320	665	471%
      mprotect	16	2958	13748	10790	674	465%
      mprotect	32	6431	27827	21397	669	433%
      madvise_	1	191	240	49	49	125%
      madvise_	2	300	366	67	33	122%
      madvise_	4	450	623	173	43	138%
      madvise_	8	753	1110	357	45	147%
      madvise_	16	1467	2127	660	41	145%
      madvise_	32	2795	4109	1314	41	147%
      
      The second test (measuring cpu cycle)
      syscall__	vmas	cpu_5_10	c_6_8	delta_cpu	per_vma	%
      munmap__	1	684	1790	1106	1106	262%
      munmap__	2	861	2819	1958	979	327%
      munmap__	4	1183	4959	3776	944	419%
      munmap__	8	1999	8262	6263	783	413%
      munmap__	16	3839	13099	9260	579	341%
      munmap__	32	7672	23221	15549	486	303%
      mprotect	1	397	906	509	509	228%
      mprotect	2	738	3019	2281	1140	409%
      mprotect	4	1221	6149	4929	1232	504%
      mprotect	8	2356	9978	7622	953	423%
      mprotect	16	4961	20448	15487	968	412%
      mprotect	32	9882	40972	31091	972	415%
      madvise_	1	351	434	82	82	123%
      madvise_	2	565	752	186	93	133%
      madvise_	4	872	1313	442	110	151%
      madvise_	8	1508	2271	763	95	151%
      madvise_	16	3078	4312	1234	77	140%
      madvise_	32	5893	8376	2483	78	142%
      
      From 5.10 to 6.8
      munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
      mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
      madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.
      
      In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
      increase from 5.10 to 6.8 is significantly larger, approximately ten times
      greater for munmap and mprotect.
      
      When I discussed the mm performance with Brian Makin, an engineer who
      worked on performance, it was brought to my attention that such
      benchmarks, which measure millions of mm syscalls in a tight loop, may
      not accurately reflect real-world scenarios, such as that of a database
      service.  Also, this was tested on a single piece of hardware running
      ChromeOS; the data from other hardware or distributions might differ.
      It might be best to take this data with a grain of salt.
      
      
      This patch (of 5):
      
      Wire up mseal syscall for all architectures.
      
      Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
      Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
      Signed-off-by: Jeff Xu <jeffxu@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Jann Horn <jannh@google.com> [Bug #2]
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: Stephen Röttger <sroettger@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
      Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ff388fe5
  3. 23 May, 2024 21 commits
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 6d69b6c1
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Stable fixes:
         - nfs: fix undefined behavior in nfs_block_bits()
         - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
      
        Bugfixes:
         - Fix mixing of the lock/nolock and local_lock mount options
         - NFSv4: Fixup smatch warning for ambiguous return
         - NFSv3: Fix remount when using the legacy binary mount api
         - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
         - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
         - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
      
        Features and cleanups:
         - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
         - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
         - pNFS: rework pnfs_generic_pg_check_layout to check IO range
         - NFSv2: Turn off enabling of NFS v2 by default"
      
      * tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: fix undefined behavior in nfs_block_bits()
        pNFS: rework pnfs_generic_pg_check_layout to check IO range
        pNFS/filelayout: check layout segment range
        pNFS/filelayout: fixup pNfs allocation modes
        rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
        NFS: Don't enable NFS v2 by default
        NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
        sunrpc: fix NFSACL RPC retry on soft mount
        SUNRPC: fix handling expired GSS context
        nfs: keep server info for remounts
        NFSv4: Fixup smatch warning for ambiguous return
        NFS: make sure lock/nolock overriding local_lock mount option
        NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
        pNFS/filelayout: Specify the layout segment range in LAYOUTGET
        pNFS/filelayout: Remove the whole file layout requirement
      6d69b6c1
    • Linus Torvalds's avatar
      Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux · b4d88a60
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Followup block updates, mostly due to NVMe being a bit late to the
        party. But nothing major in there, so not a big deal.
      
        In detail, this contains:
      
         - NVMe pull request via Keith:
             - Fabrics connection retries (Daniel, Hannes)
             - Fabrics logging enhancements (Tokunori)
             - RDMA delete optimization (Sagi)
      
         - ublk DMA alignment fix (me)
      
         - null_blk sparse warning fixes (Bart)
      
         - Discard support for brd (Keith)
      
         - blk-cgroup list corruption fixes (Ming)
      
         - blk-cgroup stat propagation fix (Waiman)
      
         - Regression fix for plugging stall with md (Yu)
      
         - Misc fixes or cleanups (David, Jeff, Justin)"
      
      * tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
        null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
        blk-throttle: remove unused struct 'avg_latency_bucket'
        block: fix lost bio for plug enabled bio based device
        block: t10-pi: add MODULE_DESCRIPTION()
        blk-mq: add helper for checking if one CPU is mapped to specified hctx
        blk-cgroup: Properly propagate the iostat update up the hierarchy
        blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
        blk-cgroup: fix list corruption from resetting io stat
        cdrom: rearrange last_media_change check to avoid unintentional overflow
        nbd: Fix signal handling
        nbd: Remove a local variable from nbd_send_cmd()
        nbd: Improve the documentation of the locking assumptions
        nbd: Remove superfluous casts
        nbd: Use NULL to represent a pointer
        brd: implement discard support
        null_blk: Fix two sparse warnings
        ublk_drv: set DMA alignment mask to 3
        nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
        nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
        nvme: do not retry authentication failures
        ...
      b4d88a60
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux · 483a351e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Single fix here for a regression in 6.9, and then a simple cleanup
        removing some dead code"
      
      * tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
        io_uring: remove checks for NULL 'sq_offset'
        io_uring/sqpoll: ensure that normal task_work is also run timely
      483a351e
    • Linus Torvalds's avatar
      Merge tag 'regulator-fix-v6.10-merge-window' of... · c2c80ecd
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A bunch of fixes that came in during the merge window.
      
        Matti found several issues with some of the more complexly configured
        Rohm regulators and the helpers they use and there were some errors in
        the specification of tps6594 when regulators are grouped together"
      
      * tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: tps6594-regulator: Correct multi-phase configuration
        regulator: tps6287x: Force writing VSEL bit
        regulator: pickable ranges: don't always cache vsel
        regulator: rohm-regulator: warn if unsupported voltage is set
        regulator: bd71828: Don't overwrite runtime voltages
      c2c80ecd
    • Linus Torvalds's avatar
      Merge tag 'regmap-fix-v6.10-merge-window' of... · 09f8f2c4
      Linus Torvalds authored
      Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
      
      Pull regmap fix from Mark Brown:
       "Guenter ran with memory sanitisers and found an issue in the new KUnit
        tests that Richard added, where an assumption in older test code was
        exposed; this was fixed quickly by Richard"
      
      * tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: kunit: Fix array overflow in stride() test
      09f8f2c4
    • Linus Torvalds's avatar
      Merge tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 66ad4829
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Quite smaller than usual. Notably it includes the fix for the unix
        regression from the past weeks. The TCP window fix will require some
        follow-up, already queued.
      
        Current release - regressions:
      
         - af_unix: fix garbage collection of embryos
      
        Previous releases - regressions:
      
         - af_unix: fix race between GC and receive path
      
         - ipv6: sr: fix missing sk_buff release in seg6_input_core
      
         - tcp: remove 64 KByte limit for initial tp->rcv_wnd value
      
         - eth: r8169: fix rx hangup
      
         - eth: lan966x: remove ptp traps in case the ptp is not enabled
      
         - eth: ixgbe: fix link breakage vs cisco switches
      
         - eth: ice: prevent ethtool from corrupting the channels
      
        Previous releases - always broken:
      
         - openvswitch: set the skbuff pkt_type for proper pmtud support
      
         - tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
      
        Misc:
      
         - a bunch of selftests stabilization patches"
      
      * tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
        r8169: Fix possible ring buffer corruption on fragmented Tx packets.
        idpf: Interpret .set_channels() input differently
        ice: Interpret .set_channels() input differently
        nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
        net: relax socket state check at accept time.
        tcp: remove 64 KByte limit for initial tp->rcv_wnd value
        net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
        tls: fix missing memory barrier in tls_init
        net: fec: avoid lock evasion when reading pps_enable
        Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
        testing: net-drv: use stats64 for testing
        net: mana: Fix the extra HZ in mana_hwc_send_request
        net: lan966x: Remove ptp traps in case the ptp is not enabled.
        openvswitch: Set the skbuff pkt_type for proper pmtud support.
        selftest: af_unix: Make SCM_RIGHTS into OOB data.
        af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
        tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
        selftests/net: use tc rule to filter the na packet
        ipv6: sr: fix memleak in seg6_hmac_init_algo
        af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
        ...
      66ad4829
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 404001dd
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Minor last minute fixes:
      
         - Fix a very tight race between the ring buffer readers and resizing
           the ring buffer
      
         - Correct some stale comments in the ring buffer code
      
         - Fix kernel-doc in the rv code
      
         - Add a MODULE_DESCRIPTION to preemptirq_delay_test"
      
      * tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rv: Update rv_en(dis)able_monitor doc to match kernel-doc
        tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test
        ring-buffer: Fix a race between readers and resize checks
        ring-buffer: Correct stale comments related to non-consuming readers
      404001dd
    • Linus Torvalds's avatar
      Merge tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e82d2af5
      Linus Torvalds authored
      Pull tracing tool fix from Steven Rostedt:
       "Fix printf format warnings in latency-collector.
      
        Use the printf format string with %s to take a string instead of
        taking in a string directly"
      
      * tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tools/latency-collector: Fix -Wformat-security compile warns
      e82d2af5
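
The kind of change described in the pull message above can be illustrated in isolation. This is a generic sketch of the -Wformat-security pattern, not the code from latency-collector; `format_message()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Warned-about pattern: snprintf(buf, size, msg) treats msg as the format,
 * so any '%' inside it is interpreted as a conversion.
 * Fixed pattern: use a constant "%s" format and pass msg as its argument.
 * Returns the length snprintf() reports for the formatted text.
 */
int format_message(char *buf, size_t size, const char *msg)
{
    return snprintf(buf, size, "%s", msg);
}
```

With the constant format, a message such as "100%s done" is copied verbatim instead of having its "%s" consumed as a conversion specifier.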
    • Linus Torvalds's avatar
      Merge tag 'trace-assign-str-v6.10' of... · d6a326d6
      Linus Torvalds authored
      Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing cleanup from Steven Rostedt:
       "Remove second argument of __assign_str()
      
        The __assign_str() macro logic of the TRACE_EVENT() macro was
        optimized so that it no longer needs the second argument. The
        __assign_str() is always matched with __string() field that takes a
        field name and the source for that field:
      
          __string(field, source)
      
        The TRACE_EVENT() macro logic will save off the source value and then
        use that value to copy into the ring buffer via the __assign_str().
      
        Before commit c1fa617c ("tracing: Rework __assign_str() and
        __string() to not duplicate getting the string"), the __assign_str()
        needed the second argument which would perform the same logic as the
        __string() source parameter did. Not only would this add overhead, but
        it was error prone as if the __assign_str() source produced something
        different, it may not have allocated enough for the string in the ring
        buffer (as the __string() source was used to determine how much to
        allocate)
      
        Now that the __assign_str() just uses the same string that was used in
        __string() it no longer needs the source parameter. It can now be
        removed"
      
      * tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/treewide: Remove second parameter of __assign_str()
      d6a326d6
    • Linus Torvalds's avatar
      Merge tag 'sparc-for-6.10-tag1' of... · bca2a25d
      Linus Torvalds authored
      Merge tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
      
      Pull sparc updates from Andreas Larsson:
      
       - Avoid on-stack cpumask variables in a number of places
      
       - Move struct termio to asm/termios.h, matching other architectures and
         allowing certain user space applications to build also for sparc
      
       - Fix missing prototype warnings for sparc64
      
       - Fix version generation warnings for sparc32
      
       - Fix bug where non-consecutive CPU IDs lead to some CPUs not starting
      
       - Simplification using swap and cleanup using NULL for pointer
      
       - Convert sparc parport and chmc drivers to use remove callbacks
         returning void
      
      * tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
        sparc/leon: Remove on-stack cpumask var
        sparc/pci_msi: Remove on-stack cpumask var
        sparc/of: Remove on-stack cpumask var
        sparc/irq: Remove on-stack cpumask var
        sparc/srmmu: Remove on-stack cpumask var
        sparc: chmc: Convert to platform remove callback returning void
        sparc: parport: Convert to platform remove callback returning void
        sparc: Compare pointers to NULL instead of 0
        sparc: Use swap() to fix Coccinelle warning
        sparc32: Fix version generation failed warnings
        sparc64: Fix number of online CPUs
        sparc64: Fix prototype warning for sched_clock
        sparc64: Fix prototype warnings in adi_64.c
        sparc64: Fix prototype warning for dma_4v_iotsb_bind
        sparc64: Fix prototype warning for uprobe_trap
        sparc64: Fix prototype warning for alloc_irqstack_bootmem
        sparc64: Fix prototype warning for vmemmap_free
        sparc64: Fix prototype warnings in traps_64.c
        sparc64: Fix prototype warning for init_vdso_image
        sparc: move struct termio to asm/termios.h
      bca2a25d
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2b7ced10
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The major fix here is for a filesystem corruption issue reported on
        Apple M1 as a result of buggy management of the floating point
        register state introduced in 6.8. I initially reverted one of the
        offending patches, but in the end Ard cooked a proper fix so there's a
        revert+reapply in the series.
      
        Aside from that, we've got some CPU errata workarounds and misc other
        fixes.
      
         - Fix broken FP register state tracking which resulted in filesystem
           corruption when dm-crypt is used
      
         - Workarounds for Arm CPU errata affecting the SSBS Spectre
           mitigation
      
         - Fix lockdep assertion in DMC620 memory controller PMU driver
      
         - Fix alignment of BUG table when CONFIG_DEBUG_BUGVERBOSE is
           disabled"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/fpsimd: Avoid erroneous elide of user state reload
        Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
        perf/arm-dmc620: Fix lockdep assert in ->event_init()
        Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: errata: Add workaround for Arm errata 3194386 and 3312417
        arm64: cputype: Add Neoverse-V3 definitions
        arm64: cputype: Add Cortex-X4 definitions
        arm64: barrier: Restore spec_bar() macro
      2b7ced10
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2ef32ad2
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "Several new features here:
      
         - virtio-net is finally supported in vduse
      
         - virtio (balloon and mem) interaction with suspend is improved
      
         - vhost-scsi now handles signals better/faster
      
        And fixes, cleanups all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
        virtio-pci: Check if is_avq is NULL
        virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
        MAINTAINERS: add Eugenio Pérez as reviewer
        vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
        vp_vdpa: don't allocate unused msix vectors
        sound: virtio: drop owner assignment
        fuse: virtio: drop owner assignment
        scsi: virtio: drop owner assignment
        rpmsg: virtio: drop owner assignment
        nvdimm: virtio_pmem: drop owner assignment
        wifi: mac80211_hwsim: drop owner assignment
        vsock/virtio: drop owner assignment
        net: 9p: virtio: drop owner assignment
        net: virtio: drop owner assignment
        net: caif: virtio: drop owner assignment
        misc: nsm: drop owner assignment
        iommu: virtio: drop owner assignment
        drm/virtio: drop owner assignment
        gpio: virtio: drop owner assignment
        firmware: arm_scmi: virtio: drop owner assignment
        ...
      2ef32ad2
    • Alexandre Ghiti's avatar
      riscv: Fix early ftrace nop patching · 6ca445d8
      Alexandre Ghiti authored
      Commit c97bf629 ("riscv: Fix text patching when IPI are used")
      converted ftrace_make_nop() to use patch_insn_write(), which does not
      emit any icache flush and relies entirely on __ftrace_modify_code() to
      do that.
      
      But we missed that ftrace_make_nop() was called very early directly when
      converting mcount calls into nops (actually on riscv it converts 2B nops
      emitted by the compiler into 4B nops).
      
      This caused crashes on multiple HW as reported by Conor and Björn since
      the booting core could have half-patched instructions in its icache
      which would trigger an illegal instruction trap: fix this by emitting a
      local flush icache when early patching nops.
      
      Fixes: c97bf629 ("riscv: Fix text patching when IPI are used")
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Reported-by: Conor Dooley <conor.dooley@microchip.com>
      Tested-by: Conor Dooley <conor.dooley@microchip.com>
      Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
      Tested-by: Björn Töpel <bjorn@rivosinc.com>
      Link: https://lore.kernel.org/r/20240523115134.70380-1-alexghiti@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      6ca445d8
    • Shuah Khan's avatar
      tools/latency-collector: Fix -Wformat-security compile warns · df73757c
      Shuah Khan authored
      Fix the following -Wformat-security compile warnings by adding the
      missing format arguments:
      
      latency-collector.c: In function ‘show_available’:
      latency-collector.c:938:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        938 |                 warnx(no_tracer_msg);
            |                 ^~~~~
      
      latency-collector.c:943:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        943 |                 warnx(no_latency_tr_msg);
            |                 ^~~~~
      
      latency-collector.c: In function ‘find_default_tracer’:
      latency-collector.c:986:25: warning: format not a string literal and
      no format arguments [-Wformat-security]
        986 |                         errx(EXIT_FAILURE, no_tracer_msg);
            |                         ^~~~
      latency-collector.c: In function ‘scan_arguments’:
      latency-collector.c:1881:33: warning: format not a string literal and
      no format arguments [-Wformat-security]
       1881 |                                 errx(EXIT_FAILURE, no_tracer_msg);
            |                                 ^~~~
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240404011009.32945-1-skhan@linuxfoundation.org
      
      Cc: stable@vger.kernel.org
      Fixes: e23db805 ("tracing/tools: Add the latency-collector to tools directory")
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      df73757c
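The fix pattern for this class of warning can be sketched as follows. This is a minimal illustration, not the patch itself: format_message() and report_no_tracer() are hypothetical helpers and the message text is made up. warnx() from <err.h> takes a printf-style format, so passing a literal "%s" keeps any '%' in the message from being interpreted as a conversion.

```c
#include <assert.h>
#include <err.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one of the tool's message strings. */
static const char *no_tracer_msg = "no suitable tracer was found";

/*
 * Flagged by -Wformat-security: the message itself is the format string.
 *     warnx(no_tracer_msg);
 * Safe form: a literal format with the message passed as an argument.
 */
void report_no_tracer(void)
{
	warnx("%s", no_tracer_msg);
}

/*
 * Same idea with snprintf(): copy msg into buf without interpreting it.
 * Returns true if the whole message fit into buf.
 */
bool format_message(char *buf, size_t len, const char *msg)
{
	int n;

	assert(buf != NULL && msg != NULL);
	n = snprintf(buf, len, "%s", msg);
	return n >= 0 && (size_t)n < len;
}
```

With this form, a message such as "100%s" is copied verbatim instead of triggering a read of a missing argument.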
    • Ken Milmore's avatar
      r8169: Fix possible ring buffer corruption on fragmented Tx packets. · c71e3a5c
      Ken Milmore authored
      An issue was found on the RTL8125b when transmitting small fragmented
      packets, whereby invalid entries were inserted into the transmit ring
      buffer, subsequently leading to calls to dma_unmap_single() with a null
      address.
      
      This was caused by rtl8169_start_xmit() not noticing changes to nr_frags
      which may occur when small packets are padded (to work around hardware
      quirks) in rtl8169_tso_csum_v2().
      
      To fix this, postpone inspecting nr_frags until after any padding has been
      applied.
      
      Fixes: 9020845f ("r8169: improve rtl8169_start_xmit")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ken Milmore <ken.milmore@gmail.com>
      Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c71e3a5c
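The ordering problem described in the r8169 commit can be illustrated with a toy model. Everything here is hypothetical: struct toy_skb, TOY_MIN_LEN and the helpers are not driver code, and the sketch assumes that padding a short packet collapses its fragments so that nr_frags becomes zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a socket buffer. */
struct toy_skb {
	size_t len;            /* total payload length in bytes */
	unsigned int nr_frags; /* number of paged fragments */
};

#define TOY_MIN_LEN 60 /* assumed minimum frame length for this sketch */

/* Pads short packets; in this model padding also removes all fragments. */
bool toy_pad(struct toy_skb *skb)
{
	assert(skb != NULL);
	if (skb->len >= TOY_MIN_LEN)
		return false;
	skb->len = TOY_MIN_LEN;
	skb->nr_frags = 0;
	return true;
}

/* Buggy order: nr_frags is sampled before padding and may be stale. */
unsigned int descriptors_buggy(struct toy_skb *skb)
{
	unsigned int frags = skb->nr_frags;

	toy_pad(skb);
	return 1 + frags; /* one descriptor for the head plus one per fragment */
}

/* Fixed order: pad first, then inspect nr_frags. */
unsigned int descriptors_fixed(struct toy_skb *skb)
{
	toy_pad(skb);
	return 1 + skb->nr_frags;
}
```

For a 20-byte packet with two fragments, the buggy order still plans three descriptors even though no fragments remain after padding, while the fixed order plans one.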
    • Steven Rostedt (Google)'s avatar
      eventfs: Do not use attributes for events directory · 2dd00ac1
      Steven Rostedt (Google) authored
      The top "events" directory has a static inode (it is created when the
      directory is created and removed when the directory is removed). There's
      no need to use the events
      ei->attr to determine its permissions. But it is used for saving the
      permissions of the "events" directory for when it is created, as that is
      needed for the default permissions for the files and directories
      underneath it.
      
      For example:
      
       # cd /sys/kernel/tracing
       # mkdir instances/foo
       # chown 1001 instances/foo/events
      
      The files under instances/foo/events should still have the same owner as
      instances/foo (which the instances/foo/events ei->attr will hold), but the
      events directory now has owner 1001.
      
      Link: https://lore.kernel.org/lkml/20240522165032.104981011@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      2dd00ac1
    • Steven Rostedt (Google)'s avatar
      eventfs: Cleanup permissions in creation of inodes · 6e3d7c90
      Steven Rostedt (Google) authored
      Setting the permissions during the creation of the inodes was updating the
      eventfs_inode attributes as well. Those attributes should only be touched
      by the setattr or remount operations, not during the creation of inodes.
      The eventfs_inode attributes should only be used to set the inodes and
      should not be modified during the inode creation.
      
      Simplify the code and fix the situation by:
      
       1) Remove eventfs_find_events() and do a simple lookup for
          the events descriptor in eventfs_get_inode()
      
       2) Remove update_events_attr() as the attributes should only be used
          to update the inode and should not be modified here.
      
       3) Add update_inode_attr() that uses the attributes to determine what
          the inode permissions should be.
      
       4) As the parent_inode of the eventfs_root_inode structure is no longer
          needed, remove it.
      
      Now on creation, the inode gets the proper permissions without causing
      side effects to the ei->attr field.
      
      Link: https://lore.kernel.org/lkml/20240522165031.944088388@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      6e3d7c90
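The idea of using saved attributes to set an inode without modifying them can be sketched in plain C. This is an assumption-laden illustration, not the kernel code: the TOY_* flags, struct toy_attr, struct toy_inode and toy_update_inode_attr() are hypothetical names. The attribute pointer is const to make the "read only on creation" rule visible in the signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags recording which fields were explicitly saved. */
#define TOY_SAVE_MODE (1u << 0)
#define TOY_SAVE_UID  (1u << 1)
#define TOY_SAVE_GID  (1u << 2)

struct toy_attr {
	unsigned int saved; /* bitmask of TOY_SAVE_* */
	unsigned int mode, uid, gid;
};

struct toy_inode {
	unsigned int mode, uid, gid;
};

/*
 * Fill in the inode from the saved attributes, falling back to the
 * given defaults for any field that was never saved. The attributes
 * themselves are not modified.
 */
void toy_update_inode_attr(struct toy_inode *inode,
			   const struct toy_attr *attr,
			   unsigned int def_mode,
			   unsigned int def_uid,
			   unsigned int def_gid)
{
	assert(inode != NULL && attr != NULL);
	inode->mode = (attr->saved & TOY_SAVE_MODE) ? attr->mode : def_mode;
	inode->uid = (attr->saved & TOY_SAVE_UID) ? attr->uid : def_uid;
	inode->gid = (attr->saved & TOY_SAVE_GID) ? attr->gid : def_gid;
}
```

In this model, only a setattr-like or remount-like operation would ever write to struct toy_attr; creation just reads it.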
    • Steven Rostedt (Google)'s avatar
      eventfs: Remove getattr and permission callbacks · 37cd0d12
      Steven Rostedt (Google) authored
      Now that inodes have their permissions updated on remount, the only other
      places to update the inode permissions are when they are created and in
      the setattr callback. The getattr and permission callbacks are not needed
      as the inodes should already be set at their proper settings.
      
      Remove the callbacks, as it not only simplifies the code, but also allows
      more flexibility to fix the inconsistencies with various corner cases
      (like changing the permission of an instance directory).
      
      Link: https://lore.kernel.org/lkml/20240522165031.782066021@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      37cd0d12
    • Steven Rostedt (Google)'s avatar
      eventfs: Consolidate the eventfs_inode update in eventfs_get_inode() · 625acf9d
      Steven Rostedt (Google) authored
      To simplify the code, create an eventfs_get_inode() that is used when an
      eventfs file or directory is created. Have the internal tracefs_inode
      updated with the appropriate flags in this function and update the
      inode's mode as well.
      
      Link: https://lore.kernel.org/lkml/20240522165031.624864160@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      625acf9d
    • Steven Rostedt (Google)'s avatar
      tracefs: Clear EVENT_INODE flag in tracefs_drop_inode() · 0bcfd9aa
      Steven Rostedt (Google) authored
      When the inode is being dropped from the dentry, the TRACEFS_EVENT_INODE
      flag needs to be cleared to prevent a remount from calling
      eventfs_remount() on the tracefs_inode private data. There's a race
      between when the inode is dropped (and the dentry freed) and when the
      inode is actually freed. If a remount happens between the two, the eventfs_inode
      could be accessed after it is freed (only the dentry keeps a ref count on
      it).
      
      Currently the TRACEFS_EVENT_INODE flag is cleared from the dentry iput()
      function. But this is incorrect, as it is possible that the inode has
      another reference to it. The flag should only be cleared when the inode is
      really being dropped and has no more references. That happens in the
      drop_inode callback of the inode, as that gets called when the last
      reference of the inode is released.
      
      Remove the tracefs_d_iput() function and move its logic to the more
      appropriate tracefs_drop_inode() callback function.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.908205106@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Fixes: baa23a8d ("tracefs: Reset permissions on remount if permissions are options")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      0bcfd9aa
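The reasoning about where the flag should be cleared can be modelled with a small reference-counting sketch. It is single-threaded and purely illustrative: struct toy_inode, TOY_EVENT_INODE and the toy_* helpers are hypothetical, and real code would need proper locking and the VFS callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EVENT_INODE (1u << 0) /* hypothetical flag bit */

struct toy_inode {
	unsigned int refcount;
	unsigned int flags;
};

/* Runs only when the last reference is released. */
static void toy_drop_inode(struct toy_inode *inode)
{
	inode->flags &= ~TOY_EVENT_INODE;
}

/* Release one reference; clear the flag only on the final release. */
void toy_iput(struct toy_inode *inode)
{
	assert(inode != NULL && inode->refcount > 0);
	if (--inode->refcount == 0)
		toy_drop_inode(inode);
}

/* A remount walks only inodes that still carry the flag. */
bool toy_remount_would_visit(const struct toy_inode *inode)
{
	assert(inode != NULL);
	return (inode->flags & TOY_EVENT_INODE) != 0;
}
```

With two references held, the first toy_iput() leaves the flag set because another user may still rely on it; only the second call clears it.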
    • Steven Rostedt (Google)'s avatar
      eventfs: Update all the eventfs_inodes from the events descriptor · 340f0c70
      Steven Rostedt (Google) authored
      The change to update the permissions of the eventfs_inode had the
      misconception that using the tracefs_inode would find all the
      eventfs_inodes that have been updated and reset them on remount.
      The problem with this approach is that the eventfs_inodes are freed when
      they are no longer used (basically the reason the eventfs system exists).
      When they are freed, the updated eventfs_inodes are not reset on a remount
      because their tracefs_inodes have been freed.
      
      Instead, since the events directory eventfs_inode always has a
      tracefs_inode pointing to it (it is not freed when finished), and the
      events directory has a link to all its children, have the
      eventfs_remount() function only operate on the events eventfs_inode and
      have it descend into its children updating their uid and gids.
      
      Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVwJyVQ@mail.gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: baa23a8d ("tracefs: Reset permissions on remount if permissions are options")
      Reported-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      340f0c70
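Descending from the events directory into its children can be sketched as a simple recursive walk. This is a toy model under stated assumptions: struct toy_ei and toy_remount() are hypothetical, the tree uses first-child and next-sibling pointers, and the walk unconditionally overwrites the ownership, which is simpler than what a real remount would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: one entry in the events directory tree. */
struct toy_ei {
	unsigned int uid, gid;
	struct toy_ei *child;   /* first child, or NULL */
	struct toy_ei *sibling; /* next sibling, or NULL */
};

/*
 * Start at the always-present top-level node and update every
 * descendant, so no entry depends on having its own live inode.
 */
void toy_remount(struct toy_ei *ei, unsigned int uid, unsigned int gid)
{
	struct toy_ei *c;

	assert(ei != NULL);
	ei->uid = uid;
	ei->gid = gid;
	for (c = ei->child; c != NULL; c = c->sibling)
		toy_remount(c, uid, gid);
}
```

Because the walk starts from the one node that is never freed, entries whose inodes were already released are still reached through the parent's child links.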