  1. 24 May, 2024 1 commit
    • Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 0eb03c7e
      Linus Torvalds authored
      Pull tracefs/eventfs updates from Steven Rostedt:
       "Bug fixes:
      
         - The eventfs directories need to have unique inode numbers. Make
           sure that they do not get the default file inode number.
      
         - Update the inode uid and gid fields on remount.
      
           When a remount happens where a uid and/or gid is specified, all the
           tracefs files and directories should get the specified uid and/or
           gid. But this can be sporadic when some uids were assigned already.
           There's already a list of inodes that are allocated. Just update
           their uid and gid fields at the time of remount.
      
         - Update the eventfs_inodes on remount from the top level "events"
           descriptor.
      
           There was a bug where not all the eventfs files or directories
           were getting updated on remount. One fix was to clear the
           SAVED_UID/GID flags from the inode list during the iteration of the
           inodes during the remount. But because the eventfs inodes can be
           freed when the last reference is released, not all the
           eventfs_inodes were being updated. This led to the ownership
           selftest failing if it was run a second time (the first time would
           leave eventfs_inodes with no corresponding tracefs_inode).
      
           Instead, for eventfs_inodes, only process the "events"
           eventfs_inode from the list iteration, as it is guaranteed to have
           a tracefs_inode (it's never freed while the "events" directory
           exists). As it has a list of its children, and the children have a
           list of their children, just iterate all the eventfs_inodes from
           the "events" descriptor and it is guaranteed to get all of them.
      
         - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
      
           Currently the TRACEFS_EVENT_INODE flag is cleared in the
           tracefs_d_iput() callback. But this is the wrong location. The
           iput() callback is called when the last reference to the dentry
           inode is hit. There could be a case where two dentries have the
           same inode, and the flag will be cleared prematurely. The flag
           needs to be cleared when the last reference of the inode is dropped
           and that happens in the inode's drop_inode() callback handler.
      
        Cleanups:
      
         - Consolidate the creation of a tracefs_inode for an eventfs_inode
      
           A tracefs_inode is created for both files and directories of the
           eventfs system. It is open coded. Instead, consolidate it into a
           single eventfs_get_inode() function call.
      
         - Remove the eventfs getattr and permission callbacks.
      
           The permissions for the eventfs files and directories are updated
           when the inodes are created, on remount, and when the user sets
           them (via setattr). The inodes hold the current permissions so
           there is no need to have custom getattr or permissions callbacks as
           they will more likely cause them to be incorrect. The inode's
           permissions are updated when they should be updated. Remove the
           getattr and permissions inode callbacks.
      
         - Do not update eventfs_inode attributes on creation of inodes.
      
           The eventfs_inodes attribute field is used to store the permissions
           of the directories and files for when their corresponding inodes
           are freed and are created again. But when the creation of the
           inodes happens, the eventfs_inode attributes are recalculated. The
           recalculation should only happen when the permissions change for a
           given file or directory. Currently, the attribute changes are just
           being set to their current files so this is not a bug, but it's
           unnecessary and error prone. Stop doing that.
      
         - The events directory inode is created once when the events
           directory is created and deleted when it is deleted. It is now
           updated on remount and when the user changes the permissions.
           There's no need to use the eventfs_inode of the events directory to
           store the events directory permissions. But using it to store the
           default permissions for the files within the directory that have
           not been updated by the user can simplify the code"
      
      * tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Do not use attributes for events directory
        eventfs: Cleanup permissions in creation of inodes
        eventfs: Remove getattr and permission callbacks
        eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
        tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
        eventfs: Update all the eventfs_inodes from the events descriptor
        tracefs: Update inode permissions on remount
        eventfs: Keep the directories from having the same inode number as files
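The remount fix described above walks every eventfs_inode starting from the top-level "events" descriptor instead of relying on the list of allocated tracefs inodes. The following is a minimal user-space sketch of that idea, not the kernel code: `struct ei_node`, its fields, and `ei_remount()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's eventfs_inode. */
struct ei_node {
    unsigned int uid;
    unsigned int gid;
    struct ei_node *first_child;   /* head of this node's child list */
    struct ei_node *next_sibling;  /* next entry in the parent's child list */
};

/*
 * Apply the remount uid/gid to a node and to everything below it.
 * Starting at the root descriptor reaches every node, whether or not
 * a corresponding inode currently exists for it.
 * Returns the number of nodes updated.
 */
size_t ei_remount(struct ei_node *ei, unsigned int uid, unsigned int gid)
{
    size_t updated = 1;

    assert(ei != NULL);
    ei->uid = uid;
    ei->gid = gid;
    for (struct ei_node *c = ei->first_child; c; c = c->next_sibling)
        updated += ei_remount(c, uid, gid);
    return updated;
}
```

Calling `ei_remount()` on the root of a three-level chain updates all three nodes in one pass, which is the property the fix relies on.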
  2. 23 May, 2024 37 commits
    • Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 6d69b6c1
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Stable fixes:
         - nfs: fix undefined behavior in nfs_block_bits()
         - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
      
        Bugfixes:
         - Fix mixing of the lock/nolock and local_lock mount options
         - NFSv4: Fixup smatch warning for ambiguous return
         - NFSv3: Fix remount when using the legacy binary mount api
         - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
         - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
         - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
      
        Features and cleanups:
         - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
         - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
         - pNFS: rework pnfs_generic_pg_check_layout to check IO range
         - NFSv2: Turn off enabling of NFS v2 by default"
      
      * tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: fix undefined behavior in nfs_block_bits()
        pNFS: rework pnfs_generic_pg_check_layout to check IO range
        pNFS/filelayout: check layout segment range
        pNFS/filelayout: fixup pNfs allocation modes
        rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
        NFS: Don't enable NFS v2 by default
        NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
        sunrpc: fix NFSACL RPC retry on soft mount
        SUNRPC: fix handling expired GSS context
        nfs: keep server info for remounts
        NFSv4: Fixup smatch warning for ambiguous return
        NFS: make sure lock/nolock overriding local_lock mount option
        NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
        pNFS/filelayout: Specify the layout segment range in LAYOUTGET
        pNFS/filelayout: Remove the whole file layout requirement
    • Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux · b4d88a60
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Followup block updates, mostly due to NVMe being a bit late to the
        party. But nothing major in there, so not a big deal.
      
        In detail, this contains:
      
         - NVMe pull request via Keith:
             - Fabrics connection retries (Daniel, Hannes)
             - Fabrics logging enhancements (Tokunori)
             - RDMA delete optimization (Sagi)
      
         - ublk DMA alignment fix (me)
      
         - null_blk sparse warning fixes (Bart)
      
         - Discard support for brd (Keith)
      
         - blk-cgroup list corruption fixes (Ming)
      
         - blk-cgroup stat propagation fix (Waiman)
      
         - Regression fix for plugging stall with md (Yu)
      
         - Misc fixes or cleanups (David, Jeff, Justin)"
      
      * tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
        null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
        blk-throttle: remove unused struct 'avg_latency_bucket'
        block: fix lost bio for plug enabled bio based device
        block: t10-pi: add MODULE_DESCRIPTION()
        blk-mq: add helper for checking if one CPU is mapped to specified hctx
        blk-cgroup: Properly propagate the iostat update up the hierarchy
        blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
        blk-cgroup: fix list corruption from resetting io stat
        cdrom: rearrange last_media_change check to avoid unintentional overflow
        nbd: Fix signal handling
        nbd: Remove a local variable from nbd_send_cmd()
        nbd: Improve the documentation of the locking assumptions
        nbd: Remove superfluous casts
        nbd: Use NULL to represent a pointer
        brd: implement discard support
        null_blk: Fix two sparse warnings
        ublk_drv: set DMA alignment mask to 3
        nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
        nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
        nvme: do not retry authentication failures
        ...
    • Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux · 483a351e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Single fix here for a regression in 6.9, and then a simple cleanup
        removing some dead code"
      
      * tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
        io_uring: remove checks for NULL 'sq_offset'
        io_uring/sqpoll: ensure that normal task_work is also run timely
    • Merge tag 'regulator-fix-v6.10-merge-window' of... · c2c80ecd
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A bunch of fixes that came in during the merge window.
      
        Matti found several issues with some of the more complexly configured
        Rohm regulators and the helpers they use and there were some errors in
        the specification of tps6594 when regulators are grouped together"
      
      * tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: tps6594-regulator: Correct multi-phase configuration
        regulator: tps6287x: Force writing VSEL bit
        regulator: pickable ranges: don't always cache vsel
        regulator: rohm-regulator: warn if unsupported voltage is set
        regulator: bd71828: Don't overwrite runtime voltages
    • Merge tag 'regmap-fix-v6.10-merge-window' of... · 09f8f2c4
      Linus Torvalds authored
      Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
      
      Pull regmap fix from Mark Brown:
       "Guenter ran with memory sanitisers and found an issue in the new KUnit
        tests that Richard added where an assumption in older test code was
        exposed; this was fixed quickly by Richard"
      
      * tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: kunit: Fix array overflow in stride() test
    • Merge tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 66ad4829
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Much smaller than usual. Notably it includes the fix for the unix
        regression from the past weeks. The TCP window fix will require some
        follow-up, already queued.
      
        Current release - regressions:
      
         - af_unix: fix garbage collection of embryos
      
        Previous releases - regressions:
      
         - af_unix: fix race between GC and receive path
      
         - ipv6: sr: fix missing sk_buff release in seg6_input_core
      
         - tcp: remove 64 KByte limit for initial tp->rcv_wnd value
      
         - eth: r8169: fix rx hangup
      
         - eth: lan966x: remove ptp traps in case the ptp is not enabled
      
         - eth: ixgbe: fix link breakage vs cisco switches
      
         - eth: ice: prevent ethtool from corrupting the channels
      
        Previous releases - always broken:
      
         - openvswitch: set the skbuff pkt_type for proper pmtud support
      
         - tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
      
        Misc:
      
         - a bunch of selftests stabilization patches"
      
      * tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
        r8169: Fix possible ring buffer corruption on fragmented Tx packets.
        idpf: Interpret .set_channels() input differently
        ice: Interpret .set_channels() input differently
        nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
        net: relax socket state check at accept time.
        tcp: remove 64 KByte limit for initial tp->rcv_wnd value
        net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
        tls: fix missing memory barrier in tls_init
        net: fec: avoid lock evasion when reading pps_enable
        Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
        testing: net-drv: use stats64 for testing
        net: mana: Fix the extra HZ in mana_hwc_send_request
        net: lan966x: Remove ptp traps in case the ptp is not enabled.
        openvswitch: Set the skbuff pkt_type for proper pmtud support.
        selftest: af_unix: Make SCM_RIGHTS into OOB data.
        af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
        tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
        selftests/net: use tc rule to filter the na packet
        ipv6: sr: fix memleak in seg6_hmac_init_algo
        af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
        ...
    • Merge tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 404001dd
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Minor last minute fixes:
      
         - Fix a very tight race between the ring buffer readers and resizing
           the ring buffer
      
         - Correct some stale comments in the ring buffer code
      
         - Fix kernel-doc in the rv code
      
         - Add a MODULE_DESCRIPTION to preemptirq_delay_test"
      
      * tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rv: Update rv_en(dis)able_monitor doc to match kernel-doc
        tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test
        ring-buffer: Fix a race between readers and resize checks
        ring-buffer: Correct stale comments related to non-consuming readers
    • Merge tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e82d2af5
      Linus Torvalds authored
      Pull tracing tool fix from Steven Rostedt:
       "Fix printf format warnings in latency-collector.
      
        Use the printf format string with %s to take a string instead of
        taking in a string directly"
      
      * tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tools/latency-collector: Fix -Wformat-security compile warns
    • Merge tag 'trace-assign-str-v6.10' of... · d6a326d6
      Linus Torvalds authored
      Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing cleanup from Steven Rostedt:
       "Remove second argument of __assign_str()
      
        The __assign_str() macro logic of the TRACE_EVENT() macro was
        optimized so that it no longer needs the second argument. The
        __assign_str() is always matched with __string() field that takes a
        field name and the source for that field:
      
          __string(field, source)
      
        The TRACE_EVENT() macro logic will save off the source value and then
        use that value to copy into the ring buffer via the __assign_str().
      
        Before commit c1fa617c ("tracing: Rework __assign_str() and
        __string() to not duplicate getting the string"), the __assign_str()
        needed the second argument which would perform the same logic as the
        __string() source parameter did. Not only would this add overhead, but
        it was error prone as if the __assign_str() source produced something
        different, it may not have allocated enough for the string in the ring
        buffer (as the __string() source was used to determine how much to
        allocate).
      
        Now that the __assign_str() just uses the same string that was used in
        __string() it no longer needs the source parameter. It can now be
        removed"
      
      * tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/treewide: Remove second parameter of __assign_str()
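The __assign_str() cleanup above depends on the source string being saved when the field is sized, so the copy step no longer needs its own source argument. Below is a small user-space model of that pairing. It is only a sketch: `struct event_rec`, `rec_string()` and `rec_assign_str()` are hypothetical names, `malloc()` stands in for reserving ring buffer space, and the NULL placeholder is a choice made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of one string field in a trace record. */
struct event_rec {
    const char *saved_src;  /* source remembered when the field is sized */
    size_t      len;        /* bytes reserved, including the trailing NUL */
    char       *buf;        /* stand-in for space reserved in the ring buffer */
};

/* Models __string(field, source): size the field and remember the source. */
int rec_string(struct event_rec *r, const char *src)
{
    assert(r != NULL);
    r->saved_src = src ? src : "(null)";  /* placeholder chosen for this sketch */
    r->len = strlen(r->saved_src) + 1;
    r->buf = malloc(r->len);
    return r->buf ? 0 : -1;
}

/*
 * Models the one-argument __assign_str(field): copy the saved source.
 * Because sizing and copying use the same pointer, the copy cannot
 * exceed what was reserved.
 */
void rec_assign_str(struct event_rec *r)
{
    assert(r != NULL && r->buf != NULL);
    memcpy(r->buf, r->saved_src, r->len);
}
```

With a separate source argument at copy time, the two expressions could disagree and the copy could be longer than the reservation; keeping a single saved pointer removes that possibility by construction.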
    • Merge tag 'sparc-for-6.10-tag1' of... · bca2a25d
      Linus Torvalds authored
      Merge tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
      
      Pull sparc updates from Andreas Larsson:
      
       - Avoid on-stack cpumask variables in a number of places
      
       - Move struct termio to asm/termios.h, matching other architectures and
         allowing certain user space applications to build also for sparc
      
       - Fix missing prototype warnings for sparc64
      
       - Fix version generation warnings for sparc32
      
       - Fix bug where non-consecutive CPU IDs lead to some CPUs not starting
      
       - Simplification using swap and cleanup using NULL for pointer
      
       - Convert sparc parport and chmc drivers to use remove callbacks
         returning void
      
      * tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
        sparc/leon: Remove on-stack cpumask var
        sparc/pci_msi: Remove on-stack cpumask var
        sparc/of: Remove on-stack cpumask var
        sparc/irq: Remove on-stack cpumask var
        sparc/srmmu: Remove on-stack cpumask var
        sparc: chmc: Convert to platform remove callback returning void
        sparc: parport: Convert to platform remove callback returning void
        sparc: Compare pointers to NULL instead of 0
        sparc: Use swap() to fix Coccinelle warning
        sparc32: Fix version generation failed warnings
        sparc64: Fix number of online CPUs
        sparc64: Fix prototype warning for sched_clock
        sparc64: Fix prototype warnings in adi_64.c
        sparc64: Fix prototype warning for dma_4v_iotsb_bind
        sparc64: Fix prototype warning for uprobe_trap
        sparc64: Fix prototype warning for alloc_irqstack_bootmem
        sparc64: Fix prototype warning for vmemmap_free
        sparc64: Fix prototype warnings in traps_64.c
        sparc64: Fix prototype warning for init_vdso_image
        sparc: move struct termio to asm/termios.h
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2b7ced10
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The major fix here is for a filesystem corruption issue reported on
        Apple M1 as a result of buggy management of the floating point
        register state introduced in 6.8. I initially reverted one of the
        offending patches, but in the end Ard cooked a proper fix so there's a
        revert+reapply in the series.
      
        Aside from that, we've got some CPU errata workarounds and misc other
        fixes.
      
         - Fix broken FP register state tracking which resulted in filesystem
           corruption when dm-crypt is used
      
         - Workarounds for Arm CPU errata affecting the SSBS Spectre
           mitigation
      
         - Fix lockdep assertion in DMC620 memory controller PMU driver
      
         - Fix alignment of BUG table when CONFIG_DEBUG_BUGVERBOSE is
           disabled"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/fpsimd: Avoid erroneous elide of user state reload
        Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
        perf/arm-dmc620: Fix lockdep assert in ->event_init()
        Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: errata: Add workaround for Arm errata 3194386 and 3312417
        arm64: cputype: Add Neoverse-V3 definitions
        arm64: cputype: Add Cortex-X4 definitions
        arm64: barrier: Restore spec_bar() macro
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2ef32ad2
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "Several new features here:
      
         - virtio-net is finally supported in vduse
      
         - virtio (balloon and mem) interaction with suspend is improved
      
         - vhost-scsi now handles signals better/faster
      
        And fixes, cleanups all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
        virtio-pci: Check if is_avq is NULL
        virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
        MAINTAINERS: add Eugenio Pérez as reviewer
        vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
        vp_vdpa: don't allocate unused msix vectors
        sound: virtio: drop owner assignment
        fuse: virtio: drop owner assignment
        scsi: virtio: drop owner assignment
        rpmsg: virtio: drop owner assignment
        nvdimm: virtio_pmem: drop owner assignment
        wifi: mac80211_hwsim: drop owner assignment
        vsock/virtio: drop owner assignment
        net: 9p: virtio: drop owner assignment
        net: virtio: drop owner assignment
        net: caif: virtio: drop owner assignment
        misc: nsm: drop owner assignment
        iommu: virtio: drop owner assignment
        drm/virtio: drop owner assignment
        gpio: virtio: drop owner assignment
        firmware: arm_scmi: virtio: drop owner assignment
        ...
    • tools/latency-collector: Fix -Wformat-security compile warns · df73757c
      Shuah Khan authored
      Fix the following -Wformat-security compile warnings adding missing
      format arguments:
      
      latency-collector.c: In function ‘show_available’:
      latency-collector.c:938:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        938 |                 warnx(no_tracer_msg);
            |                 ^~~~~
      
      latency-collector.c:943:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        943 |                 warnx(no_latency_tr_msg);
            |                 ^~~~~
      
      latency-collector.c: In function ‘find_default_tracer’:
      latency-collector.c:986:25: warning: format not a string literal and
      no format arguments [-Wformat-security]
        986 |                         errx(EXIT_FAILURE, no_tracer_msg);
            |                         ^~~~
      latency-collector.c: In function ‘scan_arguments’:
      latency-collector.c:1881:33: warning: format not a string literal and
      no format arguments [-Wformat-security]
       1881 |                                 errx(EXIT_FAILURE, no_tracer_msg);
            |                                 ^~~~
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240404011009.32945-1-skhan@linuxfoundation.org
      
      Cc: stable@vger.kernel.org
      Fixes: e23db805 ("tracing/tools: Add the latency-collector to tools directory")
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
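The warnings quoted in the commit above come from passing a non-literal string as the format argument. A minimal sketch of the usual remedy follows: pass "%s" as the format and the message as data. The helper names `format_msg()` and `warn_msg()` are hypothetical, and `<err.h>` is assumed to be available (it is on glibc and the BSDs).

```c
#include <assert.h>
#include <err.h>
#include <stdio.h>

/*
 * Format a runtime string safely: msg is treated as data, never as the
 * format, so any '%' characters in it are copied verbatim.
 * Returns what snprintf() returns (the length that would be written).
 */
int format_msg(char *out, size_t n, const char *msg)
{
    assert(out != NULL && msg != NULL);
    return snprintf(out, n, "%s", msg);
}

/* Same idea for the err.h helpers that the warnings point at. */
void warn_msg(const char *msg)
{
    assert(msg != NULL);
    /* warnx(msg) is the form that triggers -Wformat-security. */
    warnx("%s", msg);
}
```

The behavior for ordinary messages is unchanged; the difference only shows up when the message happens to contain conversion specifiers.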
    • r8169: Fix possible ring buffer corruption on fragmented Tx packets. · c71e3a5c
      Ken Milmore authored
      An issue was found on the RTL8125b when transmitting small fragmented
      packets, whereby invalid entries were inserted into the transmit ring
      buffer, subsequently leading to calls to dma_unmap_single() with a null
      address.
      
      This was caused by rtl8169_start_xmit() not noticing changes to nr_frags
      which may occur when small packets are padded (to work around hardware
      quirks) in rtl8169_tso_csum_v2().
      
      To fix this, postpone inspecting nr_frags until after any padding has been
      applied.
      
      Fixes: 9020845f ("r8169: improve rtl8169_start_xmit")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ken Milmore <ken.milmore@gmail.com>
      Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
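The r8169 fix above boils down to an ordering rule: read nr_frags only after any step that may change it. The sketch below models that rule in user space. `struct pkt`, `MIN_LEN`, `pad_if_short()` and `descs_needed()` are hypothetical, and the assumption that padding collapses the packet to zero fragments is made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified packet; not the kernel's sk_buff. */
struct pkt {
    unsigned int len;
    unsigned int nr_frags;
};

#define MIN_LEN 60u  /* assumed padding threshold for this sketch */

/*
 * Stand-in for a padding step. In this model, padding a short packet
 * also collapses it into a single buffer, which changes nr_frags.
 */
void pad_if_short(struct pkt *p)
{
    assert(p != NULL);
    if (p->len < MIN_LEN) {
        p->len = MIN_LEN;
        p->nr_frags = 0;
    }
}

/* Descriptors to fill: one for the head plus one per fragment. */
unsigned int descs_needed(struct pkt *p)
{
    assert(p != NULL);
    pad_if_short(p);
    return 1 + p->nr_frags;  /* inspected only after padding */
}
```

Caching `nr_frags` before the call to `pad_if_short()` would count fragments that no longer exist, which is the kind of stale value that could lead to invalid ring entries.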
    • eventfs: Do not use attributes for events directory · 2dd00ac1
      Steven Rostedt (Google) authored
      The top "events" directory has a static inode (it's created when the
      directory is created and removed when the directory is removed). There's
      no need to use the events ei->attr to determine its permissions. But it is
      used for saving the permissions of the "events" directory at the time it
      is created, as that is needed for the default permissions for the files
      and directories underneath it.
      
      For example:
      
       # cd /sys/kernel/tracing
       # mkdir instances/foo
       # chown 1001 instances/foo/events
      
      The files under instances/foo/events should still have the same owner as
      instances/foo (which the instances/foo/events ei->attr will hold), but the
      events directory now has owner 1001.
      
      Link: https://lore.kernel.org/lkml/20240522165032.104981011@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • eventfs: Cleanup permissions in creation of inodes · 6e3d7c90
      Steven Rostedt (Google) authored
      Setting the permissions during the creation of the inodes was also
      updating the eventfs_inode attributes. Those attributes should only be
      touched by the setattr or remount operations, not during the creation of
      inodes. The eventfs_inode attributes should only be used to set the inodes
      and should not be modified during the inode creation.
      
      Simplify the code and fix the situation by:
      
       1) Removing eventfs_find_events() and doing a simple lookup for
          the events descriptor in eventfs_get_inode()

       2) Removing update_events_attr(), as the attributes should only be used
          to update the inode and should not be modified here.

       3) Adding update_inode_attr(), which uses the attributes to determine
          what the inode permissions should be.

       4) Removing the parent_inode of the eventfs_root_inode structure, as it
          is no longer needed.
      
      Now on creation, the inode gets the proper permissions without causing
      side effects to the ei->attr field.
      
      Link: https://lore.kernel.org/lkml/20240522165031.944088388@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
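The commit above separates two roles: stored attributes are read to set up a new inode, and creation never writes back to them. A minimal sketch of that one-way flow follows. The flag names, both structs, and `apply_saved_attr()` are hypothetical stand-ins loosely modeled on the SAVED_UID/GID idea, not the kernel's update_inode_attr().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits marking which fields the user set explicitly. */
#define ATTR_SAVED_MODE 0x1u
#define ATTR_SAVED_UID  0x2u
#define ATTR_SAVED_GID  0x4u

struct saved_attr {
    unsigned int flags;
    unsigned int mode;
    unsigned int uid;
    unsigned int gid;
};

struct inode_perm {
    unsigned int mode;
    unsigned int uid;
    unsigned int gid;
};

/*
 * Fill in a freshly created inode: explicitly saved values win,
 * everything else falls back to the defaults. attr is const, so
 * inode creation cannot modify the stored attributes.
 */
void apply_saved_attr(struct inode_perm *inode,
                      const struct saved_attr *attr,
                      const struct inode_perm *def)
{
    assert(inode != NULL && attr != NULL && def != NULL);
    inode->mode = (attr->flags & ATTR_SAVED_MODE) ? attr->mode : def->mode;
    inode->uid  = (attr->flags & ATTR_SAVED_UID)  ? attr->uid  : def->uid;
    inode->gid  = (attr->flags & ATTR_SAVED_GID)  ? attr->gid  : def->gid;
}
```

Making the attribute parameter `const` is the design point here: the compiler then enforces that the creation path has no side effects on the saved state.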
    • eventfs: Remove getattr and permission callbacks · 37cd0d12
      Steven Rostedt (Google) authored
      Now that inodes have their permissions updated on remount, the only other
      places to update the inode permissions are when they are created and in
      the setattr callback. The getattr and permission callbacks are not needed
      as the inodes should already be set at their proper settings.
      
      Remove the callbacks, as it not only simplifies the code, but also allows
      more flexibility to fix the inconsistencies with various corner cases
      (like changing the permission of an instance directory).
      
      Link: https://lore.kernel.org/lkml/20240522165031.782066021@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • eventfs: Consolidate the eventfs_inode update in eventfs_get_inode() · 625acf9d
      Steven Rostedt (Google) authored
      To simplify the code, create an eventfs_get_inode() that is used when an
      eventfs file or directory is created. Have this function update the
      appropriate flags of the internal tracefs_inode and update the inode's
      mode as well.
      
      Link: https://lore.kernel.org/lkml/20240522165031.624864160@goodmis.org
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracefs: Clear EVENT_INODE flag in tracefs_drop_inode() · 0bcfd9aa
      Steven Rostedt (Google) authored
      When the inode is being dropped from the dentry, the TRACEFS_EVENT_INODE
      flag needs to be cleared to prevent a remount from calling
      eventfs_remount() on the tracefs_inode private data. There's a race
      window between when the inode is dropped (and the dentry freed) and when
      the inode is actually freed. If a remount happens between the two, the
      eventfs_inode could be accessed after it is freed (only the dentry keeps
      a ref count on it).
      
      Currently the TRACEFS_EVENT_INODE flag is cleared from the dentry iput()
      function. But this is incorrect, as it is possible that the inode has
      another reference to it. The flag should only be cleared when the inode is
      really being dropped and has no more references. That happens in the
      drop_inode callback of the inode, as that gets called when the last
      reference of the inode is released.
      
      Remove the tracefs_d_iput() function and move its logic to the more
      appropriate tracefs_drop_inode() callback function.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.908205106@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Fixes: baa23a8d ("tracefs: Reset permissions on remount if permissions are options")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      0bcfd9aa
    • Steven Rostedt (Google)'s avatar
      eventfs: Update all the eventfs_inodes from the events descriptor · 340f0c70
      Steven Rostedt (Google) authored
      The change to update the permissions of the eventfs_inode had the
      misconception that using the tracefs_inode would find all the
      eventfs_inodes that have been updated and reset them on remount.
      The problem with this approach is that the eventfs_inodes are freed when
      they are no longer used (basically the reason the eventfs system exists).
      When they are freed, the updated eventfs_inodes are not reset on a remount
      because their tracefs_inodes have been freed.
      
      Instead, since the events directory eventfs_inode always has a
      tracefs_inode pointing to it (it is not freed when finished), and the
      events directory has a link to all its children, have the
      eventfs_remount() function only operate on the events eventfs_inode and
      have it descend into its children updating their uid and gids.
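
      The descent described above can be sketched as a plain tree walk. This
      is a userspace model only: EventfsNode and remount_update() are
      hypothetical names, not the kernel's eventfs_inode or eventfs_remount().

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventfsNode:
    """Hypothetical stand-in for an eventfs_inode (not the kernel struct)."""
    name: str
    uid: int = 0
    gid: int = 0
    children: List["EventfsNode"] = field(default_factory=list)


def remount_update(root: EventfsNode,
                   uid: Optional[int] = None,
                   gid: Optional[int] = None) -> int:
    """Apply the remount uid/gid to root and every descendant.

    Returns the number of nodes visited. Only the options that were
    actually given on the (modelled) remount are applied.
    """
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if uid is not None:
            node.uid = uid
        if gid is not None:
            node.gid = gid
        visited += 1
        # Each node links to its own children, so starting at "events"
        # reaches every descriptor without needing a tracefs_inode.
        stack.extend(node.children)
    return visited


events = EventfsNode("events", children=[
    EventfsNode("sched", children=[EventfsNode("sched_switch")]),
    EventfsNode("irq"),
])
count = remount_update(events, gid=1000)
print(count, events.children[0].children[0].gid)
```

      Because the walk starts from the one node that is never freed, it does
      not matter which children currently have an inode attached.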
      
      Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVwJyVQ@mail.gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: baa23a8d ("tracefs: Reset permissions on remount if permissions are options")
      Reported-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      340f0c70
    • Steven Rostedt (Google)'s avatar
      tracefs: Update inode permissions on remount · 27c04648
      Steven Rostedt (Google) authored
      When a remount happens, if a gid or uid is specified update the inodes to
      have the same gid and uid. This will allow the simplification of the
      permissions logic for the dynamically created files and directories.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.592429986@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Fixes: baa23a8d ("tracefs: Reset permissions on remount if permissions are options")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      27c04648
    • Steven Rostedt (Google)'s avatar
      eventfs: Keep the directories from having the same inode number as files · 8898e7f2
      Steven Rostedt (Google) authored
      The directories require unique inode numbers but all the eventfs files
      have the same inode number. Prevent the directories from having the same
      inode numbers as the files as that can confuse some tooling.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.428826685@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Fixes: 834bf76a ("eventfs: Save directory inodes in the eventfs_inode structure")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      8898e7f2
    • Yu Kuai's avatar
      null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues' · a2db328b
      Yu Kuai authored
      Writing 'power' and 'submit_queues' concurrently will trigger kernel
      panic:
      
      Test script:
      
      modprobe null_blk nr_devices=0
      mkdir -p /sys/kernel/config/nullb/nullb0
      while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
      while true; do echo 1 > power; echo 0 > power; done
      
      Test result:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000148
      Oops: 0000 [#1] PREEMPT SMP
      RIP: 0010:__lock_acquire+0x41d/0x28f0
      Call Trace:
       <TASK>
       lock_acquire+0x121/0x450
       down_write+0x5f/0x1d0
       simple_recursive_removal+0x12f/0x5c0
       blk_mq_debugfs_unregister_hctxs+0x7c/0x100
       blk_mq_update_nr_hw_queues+0x4a3/0x720
       nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
       nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
       configfs_write_iter+0x119/0x1e0
       vfs_write+0x326/0x730
       ksys_write+0x74/0x150
      
      This is because del_gendisk() can run concurrently with
      blk_mq_update_nr_hw_queues():
      
      nullb_device_power_store	nullb_apply_submit_queues
       null_del_dev
       del_gendisk
      				 nullb_update_nr_hw_queues
      				  if (!dev->nullb)
      				  // still set while gendisk is deleted
      				   return 0
      				  blk_mq_update_nr_hw_queues
       dev->nullb = NULL
      
      Fix this problem by reusing the global mutex to protect
      nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.
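
      The idea of serializing both configfs writers on one lock can be
      modelled in userspace as follows. This is a sketch under assumptions:
      NullDevice, config_lock and the helper functions are hypothetical names
      and do not mirror the null_blk code.

```python
import threading


class NullDevice:
    """Toy model of a configfs-backed device (hypothetical, not null_blk)."""

    def __init__(self):
        self.disk = object()  # stands in for dev->nullb
        self.queues = 1


config_lock = threading.Lock()  # plays the role of the global mutex


def power_off(dev):
    with config_lock:
        # Teardown and clearing the pointer are one step under the lock.
        dev.disk = None


def power_on(dev):
    with config_lock:
        if dev.disk is None:
            dev.disk = object()


def set_submit_queues(dev, n):
    """Record the new queue count; return True if a live disk was resized."""
    with config_lock:
        dev.queues = n
        # The liveness test cannot go stale while the lock is held.
        return dev.disk is not None


def stress(dev, rounds=200):
    """Run both writers concurrently, like the two shell loops above."""
    def toggler():
        for _ in range(rounds):
            power_off(dev)
            power_on(dev)

    def resizer():
        for i in range(rounds):
            set_submit_queues(dev, 1 + 3 * (i % 2))

    threads = [threading.Thread(target=toggler),
               threading.Thread(target=resizer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


dev = NullDevice()
stress(dev)
print(dev.disk is not None, dev.queues)
```

      With a single lock, the resize path can never observe a half-removed
      disk, which is the window shown in the diagram above.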
      
      Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
      Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
      Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
      Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a2db328b
    • Paolo Abeni's avatar
      Merge branch 'intel-interpret-set_channels-input-differently' · 3d8597d8
      Paolo Abeni authored
      Jacob Keller says:
      
      ====================
      intel: Interpret .set_channels() input differently
      
      The ice and idpf drivers can trigger a crash with AF_XDP due to incorrect
      interpretation of the asymmetric Tx and Rx parameters in their
      .set_channels() implementations:
      
      1. ethtool -l <IFNAME> -> combined: 40
      2. Attach AF_XDP to queue 30
      3. ethtool -L <IFNAME> rx 15 tx 15
         combined number is not specified, so command becomes {rx_count = 15,
         tx_count = 15, combined_count = 40}.
      4. ethnl_set_channels checks whether AF_XDP is attached to any queue
         between the new count (combined_count + rx_count) and the old one, so
         from 55 to 40; the check does not trigger.
      5. the driver interprets `rx 15 tx 15` as 15 combined channels and deletes
         the queue that AF_XDP is attached to.
      
      This is fundamentally a problem with interpreting a request for asymmetric
      queues as symmetric combined queues.
      
      Fix the ice and idpf drivers to stop interpreting such requests as a
      request for combined queues. Due to current driver design for both ice and
      idpf, it is not possible to support requests of the same count of Tx and Rx
      queues with independent interrupts, (i.e. ethtool -L <IFNAME> rx 15 tx 15)
      so such requests are now rejected.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240521-iwl-net-2024-05-14-set-channels-fixes-v2-0-7aa39e2e99f1@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3d8597d8
    • Larysa Zaremba's avatar
      idpf: Interpret .set_channels() input differently · 5e7695e0
      Larysa Zaremba authored
      Unlike ice, idpf does not check whether the user has requested at least 1
      combined channel. Instead, it relies on a check in the core code.
      Unfortunately, the check does not trigger for us because of the hacky
      .set_channels() interpretation logic that is not consistent with the core
      code.
      
      This naturally leads to the user being able to trigger a crash with an
      invalid input. This is how:
      
      1. ethtool -l <IFNAME> -> combined: 40
      2. ethtool -L <IFNAME> rx 0 tx 0
         combined number is not specified, so command becomes {rx_count = 0,
         tx_count = 0, combined_count = 40}.
      3. ethnl_set_channels checks whether there is at least 1 RX and 1 TX channel,
         comparing (combined_count + rx_count) and (combined_count + tx_count)
         to zero. Obviously, (40 + 0) is greater than zero, so the core code
         deems the input OK.
      4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such
         configuration.
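
      The mismatch in the steps above can be reproduced with a small model.
      It follows only the description in this message; core_accepts() and
      old_idpf_channels() are hypothetical names, not kernel or driver
      functions.

```python
def core_accepts(request):
    """Modelled core check: at least one RX and one TX channel overall."""
    rx_total = request["combined"] + request["rx"]
    tx_total = request["combined"] + request["tx"]
    return rx_total > 0 and tx_total > 0


def old_idpf_channels(request):
    """Modelled old driver view: symmetric rx/tx values are taken as the
    channel count, and the pre-filled combined value is ignored."""
    if request["rx"] == request["tx"]:
        return request["rx"]
    return request["combined"]


# ethtool -L <IFNAME> rx 0 tx 0, with combined pre-filled from the device
request = {"rx": 0, "tx": 0, "combined": 40}

accepted = core_accepts(request)
channels = old_idpf_channels(request)
print(accepted, channels)
```

      The core sees 40 usable channels in each direction and lets the request
      through, while the modelled driver ends up with zero.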
      
      The issue has to be solved fundamentally, as current logic is also known to
      cause AF_XDP problems in ice [0].
      
      Interpret the command in a way that is more consistent with ethtool
      manual [1] (--show-channels and --set-channels) and new ice logic.
      
      Considering that in the idpf driver only the difference between RX and TX
      queues forms dedicated channels, change the correct way to set number of
      channels to:
      
      ethtool -L <IFNAME> combined 10 /* For symmetric queues */
      ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */
      
      [0] https://lore.kernel.org/netdev/20240418095857.2827-1-larysa.zaremba@intel.com/
      [1] https://man7.org/linux/man-pages/man8/ethtool.8.html
      
      Fixes: 02cbfba1 ("idpf: add ethtool callbacks")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5e7695e0
    • Larysa Zaremba's avatar
      ice: Interpret .set_channels() input differently · 05d6f442
      Larysa Zaremba authored
      A bug occurs because a safety check guarding AF_XDP-related queues in
      ethnl_set_channels() does not trigger. This happens because the kernel and
      the ice driver interpret the ethtool command differently.
      
      How the bug occurs:
      1. ethtool -l <IFNAME> -> combined: 40
      2. Attach AF_XDP to queue 30
      3. ethtool -L <IFNAME> rx 15 tx 15
         combined number is not specified, so command becomes {rx_count = 15,
         tx_count = 15, combined_count = 40}.
      4. ethnl_set_channels checks whether AF_XDP is attached to any queue
         between the new count (combined_count + rx_count) and the old one, so
         from 55 to 40; the check does not trigger.
      5. ice interprets `rx 15 tx 15` as 15 combined channels and deletes the
         queue that AF_XDP is attached to.
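
      The five steps above reduce to a range check that looks at the wrong
      interval. The following is a sketch of that arithmetic as described
      here; core_check_rejects() and old_driver_queue_count() are
      hypothetical helpers, not the actual ethnl_set_channels() or ice code.

```python
def core_check_rejects(old_combined, request, xsk_queues):
    """Return True if the modelled core check would reject the request."""
    new_total = request["combined"] + request["rx"]
    # Only queues in [new_total, old_combined) are inspected, so nothing
    # is inspected when new_total is larger than the old count.
    return any(new_total <= q < old_combined for q in xsk_queues)


def old_driver_queue_count(request):
    """Modelled old interpretation: equal rx/tx means that many combined."""
    if request["rx"] == request["tx"] and request["rx"] > 0:
        return request["rx"]
    return request["combined"]


# ethtool -L <IFNAME> rx 15 tx 15, with combined pre-filled as 40
request = {"rx": 15, "tx": 15, "combined": 40}
xsk_queues = [30]  # AF_XDP attached to queue 30

rejected = core_check_rejects(40, request, xsk_queues)
remaining = old_driver_queue_count(request)
lost = [q for q in xsk_queues if q >= remaining]
print(rejected, remaining, lost)
```

      The check passes because 55 is above 40, yet the modelled driver keeps
      only 15 queues, so queue 30 disappears underneath AF_XDP.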
      
      Interpret the command in a way that is more consistent with ethtool
      manual [0] (--show-channels and --set-channels).
      
      Considering that in the ice driver only the difference between RX and TX
      queues forms dedicated channels, change the correct way to set number of
      channels to:
      
      ethtool -L <IFNAME> combined 10 /* For symmetric queues */
      ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */
      
      [0] https://man7.org/linux/man-pages/man8/ethtool.8.html
      
      Fixes: 87324e74 ("ice: Implement ethtool ops for channels")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      05d6f442
    • Ryosuke Yasuoka's avatar
      nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() · 6671e352
      Ryosuke Yasuoka authored
      When nci_rx_work() receives a zero-length payload packet, it should not
      discard the packet and exit the loop. Instead, it should continue
      processing subsequent packets.
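
      The difference between leaving the loop and skipping one packet can be
      sketched as below. This is a simplified model: rx_work() is a
      hypothetical function, and each queue entry stands for a packet payload
      rather than a real sk_buff.

```python
from collections import deque


def rx_work(queue, stop_on_empty):
    """Drain the queue and return the payloads that were handled.

    stop_on_empty=True models leaving the loop at the first zero-length
    payload; stop_on_empty=False models skipping it and carrying on.
    """
    handled = []
    while queue:
        payload = queue.popleft()
        if len(payload) == 0:
            if stop_on_empty:
                break      # later packets stay unprocessed
            continue       # drop only this packet
        handled.append(payload)
    return handled


packets = [b"\x01\x02", b"", b"\x03"]
before = rx_work(deque(packets), stop_on_empty=True)
after = rx_work(deque(packets), stop_on_empty=False)
print(before, after)
```

      In the first call the packet queued after the empty one is never
      handled; in the second it is.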
      
      Fixes: d24b0353 ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
      Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      6671e352
    • Paolo Abeni's avatar
      net: relax socket state check at accept time. · 26afda78
      Paolo Abeni authored
      Christoph reported the following splat:
      
      WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
      Modules linked in:
      CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
      Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
      RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
      RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
      R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
      R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
      FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
       do_accept+0x435/0x620 net/socket.c:1929
       __sys_accept4_file net/socket.c:1969 [inline]
       __sys_accept4+0x9b/0x110 net/socket.c:1999
       __do_sys_accept net/socket.c:2016 [inline]
       __se_sys_accept net/socket.c:2013 [inline]
       __x64_sys_accept+0x7d/0x90 net/socket.c:2013
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      RIP: 0033:0x4315f9
      Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
      RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
      RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
      R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
       </TASK>
      
      The reproducer invokes shutdown() before entering the listener status.
      After commit 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for
      TCP_SYN_RECV sockets"), the above causes the child to reach the accept
      syscall in FIN_WAIT1 status.
      
      Eric noted we can relax the existing assertion in __inet_accept().
      
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Fixes: 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      26afda78
    • Jason Xing's avatar
      tcp: remove 64 KByte limit for initial tp->rcv_wnd value · 378979e9
      Jason Xing authored
      Recently, we upgraded some servers to the latest kernel and noticed that
      a user-side indicator showed worse results than before. The cause is the
      limit on tp->rcv_wnd.
      
      In 2018, commit a337531b ("tcp: up initial rmem to 128KB and SYN rwin
      to around 64KB") limited the initial value of tp->rcv_wnd to 65535. Most
      CDN teams do not benefit from that change because they cannot use a
      large window to receive a big packet, which slows transfers down,
      especially with a long RTT. A small rcv_wnd means a slower transfer
      speed, to some extent. This is a side effect for latency- and
      time-sensitive users.
      
      To avoid future confusion: the current change doesn't affect the initial
      receive window on the wire in a SYN or SYN+ACK packet, which is kept
      within 65535 bytes according to RFC 7323 and also because of the limit in
      __tcp_transmit_skb():
      
          th->window      = htons(min(tp->rcv_wnd, 65535U));
      
      In short, __tcp_transmit_skb() already ensures that the constraint is
      respected, no matter how large tp->rcv_wnd is. The change doesn't violate
      the RFC.
      
      Here is an example with and without the patch:
      Before:
      client   --- SYN: rwindow=65535 ---> server
      client   <--- SYN+ACK: rwindow=65535 ----  server
      client   --- ACK: rwindow=65536 ---> server
      Note: for the last ACK, the calculation is 512 << 7.
      
      After:
      client   --- SYN: rwindow=65535 ---> server
      client   <--- SYN+ACK: rwindow=65535 ----  server
      client   --- ACK: rwindow=175232 ---> server
      Note: I use the following command to make it work:
      ip route change default via [ip] dev eth0 metric 100 initrwnd 120
      For the last ACK, the calculation is 1369 << 7.
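
      The numbers in the example can be checked with a few lines of
      arithmetic. This is only a sketch: the window scale of 7 comes from the
      "<< 7" above, while the MSS of 1460 bytes is an assumption that is not
      stated in this message, and the helper names are hypothetical.

```python
WSCALE = 7            # taken from the "<< 7" in the example
ASSUMED_MSS = 1460    # assumption, not stated in the commit message


def syn_wire_window(rcv_wnd):
    """Window field for SYN/SYN+ACK: clamped, as in the min() shown above."""
    return min(rcv_wnd, 65535)


def scaled_window(units):
    """Bytes advertised by a scaled window field of `units`."""
    return units << WSCALE


def units_for(window_bytes):
    """Smallest window field whose scaled value covers window_bytes."""
    granule = 1 << WSCALE
    return -(-window_bytes // granule)  # ceiling division


before_ack = scaled_window(512)
after_ack = scaled_window(1369)
initrwnd_bytes = 120 * ASSUMED_MSS
print(before_ack, after_ack, initrwnd_bytes, units_for(initrwnd_bytes))
print(syn_wire_window(after_ack))
```

      Under the MSS assumption, initrwnd 120 gives 175200 bytes, which rounds
      up to 1369 units at scale 7 and matches the ACK in the "After" trace,
      while the SYN and SYN+ACK stay clamped at 65535.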
      
      With this patch applied, a user who tweaks this knob gets a large
      rcv_wnd, which can help transfer data more rapidly and save some RTTs.
      
      Fixes: a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      378979e9
    • Romain Gantois's avatar
      net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe() · b31c7e78
      Romain Gantois authored
      In the prueth_probe() function, if one of the calls to emac_phy_connect()
      fails due to of_phy_connect() returning NULL, then the subsequent call to
      phy_attached_info() will dereference a NULL pointer.
      
      Check the return code of emac_phy_connect and fail cleanly if there is an
      error.
      
      Fixes: 128d5874 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Link: https://lore.kernel.org/r/20240521-icssg-prueth-fix-v1-1-b4b17b1433e9@bootlin.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b31c7e78
    • Dae R. Jeong's avatar
      tls: fix missing memory barrier in tls_init · 91e61dd7
      Dae R. Jeong authored
      In tls_init(), a write memory barrier is missing, and store-store
      reordering may cause NULL dereference in tls_{setsockopt,getsockopt}.
      
      CPU0                               CPU1
      -----                              -----
      // In tls_init()
      // In tls_ctx_create()
      ctx = kzalloc()
      ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1)
      
      // In update_sk_prot()
      WRITE_ONCE(sk->sk_prot, tls_prots)     -(2)
      
                                         // In sock_common_setsockopt()
                                         READ_ONCE(sk->sk_prot)->setsockopt()
      
                                         // In tls_{setsockopt,getsockopt}()
                                         ctx->sk_proto->setsockopt()    -(3)
      
      In the above scenario, when (1) and (2) are reordered, (3) can observe
      the NULL value of ctx->sk_proto, causing NULL dereference.
      
      To fix it, we rely on rcu_assign_pointer() which implies the release
      barrier semantic. By moving rcu_assign_pointer() after ctx->sk_proto is
      initialized, we can ensure that ctx->sk_proto is visible when
      changing sk->sk_prot.
      
      Fixes: d5bee737 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE")
      Signed-off-by: Yewon Choi <woni9911@gmail.com>
      Signed-off-by: Dae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/
      Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      91e61dd7
    • Wei Fang's avatar
      net: fec: avoid lock evasion when reading pps_enable · 3b1c92f8
      Wei Fang authored
      The assignment of pps_enable is protected by tmreg_lock, but the read
      of pps_enable is not. The Coverity tool therefore reports a lock evasion
      warning, which may lead to a data race when running in a multithreaded
      environment. Although this issue is very unlikely to occur in practice,
      it is better to fix it: the code becomes more logically consistent, and
      Coverity stops issuing the warning.
      
      Fixes: 278d2404 ("net: fec: ptp: Enable PPS output based on ptp clock")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Link: https://lore.kernel.org/r/20240521023800.17102-1-wei.fang@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3b1c92f8
    • Jacob Keller's avatar
      Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" · b35b1c0b
      Jacob Keller authored
      This reverts commit 56573604.
      
      According to the commit, it implements a manual AN-37 for some
      "troublesome" Juniper MX5 switches. This appears to be a workaround for a
      particular switch.
      
      It has been reported that this causes a severe breakage for other switches,
      including a Cisco 3560CX-12PD-S.
      
      The code appears to be a workaround for a specific switch which fails to
      link in SFI mode. It expects to see AN-37 auto negotiation in order to
      link. The Cisco switch is not expecting AN-37 auto negotiation. When the
      device starts the manual AN-37, the Cisco switch decides that the port is
      confused and stops attempting to link with it. This persists until a power
      cycle. A simple driver unload and reload does not resolve the issue, even
      if loading with a version of the driver which lacks this workaround.
      
      The authors of the workaround commit have not responded with
      clarifications, and the result of the workaround is complete failure to
      connect with other switches.
      
      This appears to be a case where the driver can either "correctly" link with
      the Juniper MX5 switch, at the cost of bricking the link with the Cisco
      switch, or it can behave properly for the Cisco switch, but fail to link
      with the Juniper MX5 switch. I do not know enough about the standards
      involved to clearly determine whether either switch is at fault or behaving
      incorrectly. Nor do I know whether there exists some alternative fix which
      corrects behavior with both switches.
      
      Revert the workaround for the Juniper switch.
      
      Fixes: 56573604 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI")
      Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/
      Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Jeff Daly <jeffd@silicom-usa.com>
      Cc: kernel.org-fo5k2w@ycharbi.fr
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240520-net-2024-05-20-revert-silicom-switch-workaround-v1-1-50f80f261c94@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b35b1c0b
    • Joe Damato's avatar
      testing: net-drv: use stats64 for testing · a61a459f
      Joe Damato authored
      Testing a network device that has large numbers of bytes/packets may
      overflow. Using stats64 when comparing fixes this problem.
      
      I tripped on this while iterating on a qstats patch for mlx5. See below
      for confirmation without my added code that this is a bug.
      
      Before this patch (with added debugging output):
      
      $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
      KTAP version 1
      1..4
      ok 1 stats.check_pause
      ok 2 stats.check_fec
      rstat: 481708634 qstat: 666201639514 key: tx-bytes
      not ok 3 stats.pkt_byte_sum
      ok 4 stats.qstat_by_ifindex
      
      Note the huge delta above ^^^ in the rtnl vs qstats.
      
      After this patch:
      
      $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
      KTAP version 1
      1..4
      ok 1 stats.check_pause
      ok 2 stats.check_fec
      ok 3 stats.pkt_byte_sum
      ok 4 stats.qstat_by_ifindex
      
      It looks like rtnl_fill_stats in net/core/rtnetlink.c will attempt to
      copy the 64bit stats into a 32bit structure which is probably why this
      behavior is occurring.
      
      To show this is happening, you can get the underlying stats that the
      stats.py test uses like this:
      
      $ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \
                 --do getlink --json '{"ifi-index": 7}'
      
      And examine the output (heavily snipped to show relevant fields):
      
       'stats': {
                 'multicast': 3739197,
                 'rx-bytes': 1201525399,
                 'rx-packets': 56807158,
                 'tx-bytes': 492404458,
                 'tx-packets': 1200285371,
      
       'stats64': {
                   'multicast': 3739197,
                   'rx-bytes': 35561263767,
                   'rx-packets': 56807158,
                   'tx-bytes': 666212335338,
                   'tx-packets': 1200285371,
      
      The stats.py test prior to this patch was using the 'stats' structure
      above, which matches the failure output on my system.
      
      Comparing side by side, rx-bytes and tx-bytes, and getting ethtool -S
      output:
      
      rx-bytes stats:    1201525399
      rx-bytes stats64: 35561263767
      rx-bytes ethtool: 36203402638
      
      tx-bytes stats:      492404458
      tx-bytes stats64: 666212335338
      tx-bytes ethtool: 666215360113
      
      Note that the above was taken from a system with an mlx5 NIC, which only
      exposes ndo_get_stats64.
      
      Based on the ethtool output and qstat output, it appears that stats.py
      should be updated to use the 'stats64' structure for accurate
      comparisons when packet/byte counters get very large.
      
      To confirm that this was not related to the qstats code I was iterating
      on, I booted a kernel without my driver changes and re-ran the test
      which shows the qstats are skipped (as they don't exist for mlx5):
      
      $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
      KTAP version 1
      1..4
      ok 1 stats.check_pause
      ok 2 stats.check_fec
      ok 3 stats.pkt_byte_sum # SKIP qstats not supported by the device
      ok 4 stats.qstat_by_ifindex # SKIP No ifindex supports qstats
      
      But, fetching the stats using the CLI
      
      $ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \
                 --do getlink --json '{"ifi-index": 7}'
      
      Shows the same issue (heavily snipped for relevant fields only):
      
       'stats': {
                 'multicast': 105489,
                 'rx-bytes': 530879526,
                 'rx-packets': 751415,
                 'tx-bytes': 2510191396,
                 'tx-packets': 27700323,
       'stats64': {
                   'multicast': 105489,
                   'rx-bytes': 530879526,
                   'rx-packets': 751415,
                   'tx-bytes': 15395093284,
                   'tx-packets': 27700323,
      
      Comparing side by side with ethtool -S on the unmodified mlx5 driver:
      
      tx-bytes stats:    2510191396
      tx-bytes stats64: 15395093284
      tx-bytes ethtool: 17718435810
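
      The 'stats' values above are consistent with the 'stats64' counters
      being truncated to 32 bits. A quick check, using only the numbers quoted
      in this message (truncate_u32() is a hypothetical helper for
      illustration):

```python
U32_MASK = 0xFFFFFFFF


def truncate_u32(value):
    """Keep only the low 32 bits, as a 32-bit counter field would."""
    return value & U32_MASK


# (stats64 value, stats value) pairs quoted in the commit message
samples = {
    "rx-bytes, first system": (35561263767, 1201525399),
    "tx-bytes, first system": (666212335338, 492404458),
    "tx-bytes, second system": (15395093284, 2510191396),
}

matches = {
    key: truncate_u32(wide) == narrow
    for key, (wide, narrow) in samples.items()
}
print(matches)
```

      All three pairs match, which supports comparing against 'stats64' once
      the counters exceed 4 GiB.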
      
      Fixes: f0e6c86e ("testing: net-drv: add a driver test for stats reporting")
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Link: https://lore.kernel.org/r/20240520235850.190041-1-jdamato@fastly.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      a61a459f
    • Linus Torvalds's avatar
      Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of... · c760b372
      Linus Torvalds authored
      Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull more non-mm updates from Andrew Morton:
      
       - A series ("kbuild: enable more warnings by default") from Arnd
         Bergmann which enables a number of additional build-time warnings. We
         fixed all the fallout which we could find, there may still be a few
         stragglers.
      
       - Samuel Holland has developed the series "Unified cross-architecture
         kernel-mode FPU API". This does a lot of consolidation of
         per-architecture kernel-mode FPU usage and enables the use of newer
         AMD GPUs on RISC-V.
      
       - Tao Su has fixed some selftests build warnings in the series
         "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
         definition".
      
       - This pull also includes a nilfs2 fixup from Ryusuke Konishi.
      
      * tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
        nilfs2: make block erasure safe in nilfs_finish_roll_forward()
        selftests/harness: use 1024 in place of LINE_MAX
        Revert "selftests/harness: remove use of LINE_MAX"
        selftests/fpu: allow building on other architectures
        selftests/fpu: move FP code to a separate translation unit
        drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
        drm/amd/display: only use hard-float, not altivec on powerpc
        riscv: add support for kernel-mode FPU
        x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
        arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
        arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
        ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
        x86/fpu: fix asm/fpu/types.h include guard
        kbuild: enable -Wcast-function-type-strict unconditionally
        kbuild: enable -Wformat-truncation on clang
        ...
      c760b372
    • Linus Torvalds's avatar
      Merge tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 5c6f4d68
      Linus Torvalds authored
      Pull more mm updates from Andrew Morton:
       "A series from Dave Chinner which cleans up and fixes the handling of
        nested allocations within stackdepot and page-owner"
      
      * tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/page-owner: use gfp_nested_mask() instead of open coded masking
        stackdepot: use gfp_nested_mask() instead of open coded masking
        mm: lift gfp_kmemleak_mask() to gfp.h
      5c6f4d68
    • Steven Rostedt (Google)'s avatar
      tracing/treewide: Remove second parameter of __assign_str() · 2c92ca84
      Steven Rostedt (Google) authored
      With the rework of how __string() handles dynamic strings, where it
      saves the source string in a field of the helper structure[1], the
      value to assign to the trace event field is already stored in the helper
      structure and does not need to be passed in again.
      
      This means that with:
      
        __string(field, mystring)
      
      which used to be assigned with __assign_str(field, mystring), the second
      parameter is no longer needed and is unused. With this, __assign_str()
      now takes only a single parameter.
      
      There are over 700 users of __assign_str() and, because coccinelle does
      not handle the TRACE_EVENT() macro, I ended up using the following sed script:
      
        git grep -l __assign_str | while read a ; do
            sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
            mv /tmp/test-file $a;
        done
      
      I then searched for __assign_str() calls that did not end with ';', as
      those were multi-line assignments that the sed script above would fail to catch.
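
To make the substitution easier to follow, here is a rough Python rendering of the same regular expression. It is only a sketch: `strip_second_arg` is a hypothetical helper name, and like the sed script it works on one line at a time, so a call whose arguments continue on the next line is not handled correctly.

```python
import re

# Same pattern as the sed expression, translated from basic to Python syntax:
#   group 1: "__assign_str(" followed by the first argument (no trailing space)
#   then:    optional spaces, a comma, and everything up to the next ';'
PATTERN = re.compile(r'(__assign_str\([^,]*[^ ,]) *,[^;]*')


def strip_second_arg(line: str) -> str:
    """Drop everything after the first argument of __assign_str() on one line."""
    return PATTERN.sub(r'\1)', line)


if __name__ == "__main__":
    for src in (
        "__assign_str(field, mystring);",
        "__assign_str(name, dev_name(dev));",
        "__assign_str(field);",  # already converted: no comma, left alone
    ):
        print(f"{src!r:45} -> {strip_second_arg(src)!r}")
```

Lines that contain no comma after the first argument are left unchanged, so running the conversion twice is harmless.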
      
      Note, the same updates will need to be done for:
      
        __assign_str_len()
        __assign_rel_str()
        __assign_rel_str_len()
      
      I tested this with both an allmodconfig and an allyesconfig (build only for both).
      
      [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Julia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
      Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
      Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Darrick J. Wong <djwong@kernel.org>	# xfs
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      2c92ca84
  3. 22 May, 2024 2 commits
    • Linus Torvalds's avatar
      mm: simplify and improve print_vma_addr() output · de7e71ef
      Linus Torvalds authored
      Use '%pD' to print out the filename, and print out the actual offset
      within the file too, rather than just what the virtual address of the
      mapping is (which doesn't tell you anything about any mapping offsets).
      
      Also, use the exact vma_lookup() instead of find_vma() - the latter
      looks up any vma _after_ the address, which is of questionable value
      (yes, maybe you fell off the beginning, but you'd be more likely to fall
      off the end).
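
The difference between the two lookups can be illustrated with a small model. This is a simplified Python sketch, not kernel code: VMAs are represented as half-open (start, end) tuples in a sorted list, and the two functions only mimic the semantics described above (find_vma() returns the first mapping that ends above the address, vma_lookup() returns a mapping only if it actually contains the address).

```python
from bisect import bisect_right

# Hypothetical address space: two non-overlapping mappings, sorted by address.
VMAS = [(0x1000, 0x3000), (0x8000, 0x9000)]


def find_vma(vmas, addr):
    """Return the first VMA whose end is above addr (it may start after addr)."""
    ends = [end for _, end in vmas]
    i = bisect_right(ends, addr)  # index of the first end strictly greater than addr
    return vmas[i] if i < len(vmas) else None


def vma_lookup(vmas, addr):
    """Return the VMA that contains addr, or None if addr is unmapped."""
    vma = find_vma(vmas, addr)
    if vma is not None and vma[0] <= addr:
        return vma
    return None


if __name__ == "__main__":
    hole = 0x5000  # lies in the gap between the two mappings
    print("find_vma:  ", find_vma(VMAS, hole))
    print("vma_lookup:", vma_lookup(VMAS, hole))
```

For an address in the gap, the modelled find_vma() reports the next mapping even though the address is not inside it, while the exact lookup reports nothing.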
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de7e71ef
    • Linus Torvalds's avatar
      Merge local branch 'x86-codegen' · f8a6e48c
      Linus Torvalds authored
      Merge trivial x86 code generation annoyances
      
       - Introduce helper macros for clang asm input problems
      
       - use said macros to improve trivially stupid code generation issues in
         bitops and array_index_mask_nospec
      
       - also improve codegen with 32-bit array index comparisons
      
      None of these really matter, but I look at code generation and profiles
      fairly regularly, and these misfeatures caused the generated code to
      look really odd and distract from the real issues.
      
      * branch 'x86-codegen' of local tree:
        x86: improve bitop code generation with clang
        x86: improve array_index_mask_nospec() code generation
        clang: work around asm input constraint problems
      f8a6e48c