1. 27 May, 2024 1 commit
  2. 23 May, 2024 30 commits
    • uprobes: prevent mutex_lock() under rcu_read_lock() · 69964673
      Andrii Nakryiko authored
      Recent changes made uprobe_cpu_buffer preparation lazy, and moved it
      deeper into __uprobe_trace_func(). This is problematic because
      __uprobe_trace_func() is called inside rcu_read_lock()/rcu_read_unlock()
      block, which then calls prepare_uprobe_buffer() -> uprobe_buffer_get() ->
      mutex_lock(&ucb->mutex), leading to a splat about using mutex under
      non-sleepable RCU:
      
        BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
         in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 98231, name: stress-ng-sigq
         preempt_count: 0, expected: 0
         RCU nest depth: 1, expected: 0
         ...
         Call Trace:
          <TASK>
          dump_stack_lvl+0x3d/0xe0
          __might_resched+0x24c/0x270
          ? prepare_uprobe_buffer+0xd5/0x1d0
          __mutex_lock+0x41/0x820
          ? ___perf_sw_event+0x206/0x290
          ? __perf_event_task_sched_in+0x54/0x660
          ? __perf_event_task_sched_in+0x54/0x660
          prepare_uprobe_buffer+0xd5/0x1d0
          __uprobe_trace_func+0x4a/0x140
          uprobe_dispatcher+0x135/0x280
          ? uprobe_dispatcher+0x94/0x280
          uprobe_notify_resume+0x650/0xec0
          ? atomic_notifier_call_chain+0x21/0x110
          ? atomic_notifier_call_chain+0xf8/0x110
          irqentry_exit_to_user_mode+0xe2/0x1e0
          asm_exc_int3+0x35/0x40
         RIP: 0033:0x7f7e1d4da390
         Code: 33 04 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b9 01 00 00 00 e9 b2 fc ff ff 66 90 f3 0f 1e fa 31 c9 e9 a5 fc ff ff 0f 1f 44 00 00 <cc> 0f 1e fa b8 27 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 6e
         RSP: 002b:00007ffd2abc3608 EFLAGS: 00000246
         RAX: 0000000000000000 RBX: 0000000076d325f1 RCX: 0000000000000000
         RDX: 0000000076d325f1 RSI: 000000000000000a RDI: 00007ffd2abc3690
         RBP: 000000000000000a R08: 00017fb700000000 R09: 00017fb700000000
         R10: 00017fb700000000 R11: 0000000000000246 R12: 0000000000017ff2
         R13: 00007ffd2abc3610 R14: 0000000000000000 R15: 00007ffd2abc3780
          </TASK>
      
      Luckily, it's easy to fix by moving prepare_uprobe_buffer() to be called
      slightly earlier: into uprobe_trace_func() and uretprobe_trace_func(), outside
      of RCU locked section. This still keeps this buffer preparation lazy and helps
      avoid the overhead when it's not needed. E.g., if there is only BPF uprobe
      handler installed on a given uprobe, buffer won't be initialized.
      
      Note, the other user of prepare_uprobe_buffer(), __uprobe_perf_func(), is not
      affected, as it doesn't prepare buffer under RCU read lock.
      
      Link: https://lore.kernel.org/all/20240521053017.3708530-1-andrii@kernel.org/
      
      Fixes: 1b8f85de ("uprobes: prepare uprobe args buffer lazily")
      Reported-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
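The ordering problem described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `fake_rcu_read_lock()`, `buffer_get()`, `trace_buggy()`, and `trace_fixed()` are hypothetical stand-ins rather than kernel functions, and a plain counter takes the place of the kernel's might-sleep check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are kernel APIs. */
static int rcu_depth;         /* models "RCU nest depth" from the splat */
static int sleep_violations;  /* counts would-be "sleeping function" reports */
static bool buffer_locked;    /* models the buffer mutex being held */

static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

/* Taking a mutex may sleep, which is not allowed in a read-side section. */
static void buffer_get(void)
{
    if (rcu_depth > 0)
        sleep_violations++;
    buffer_locked = true;
}

static void buffer_put(void)
{
    assert(buffer_locked);
    buffer_locked = false;
}

/* Problematic ordering: the buffer is prepared inside the read-side section. */
static void trace_buggy(void)
{
    fake_rcu_read_lock();
    buffer_get();
    fake_rcu_read_unlock();
    buffer_put();
}

/* Fixed ordering: the buffer is prepared before entering the section. */
static void trace_fixed(void)
{
    buffer_get();
    fake_rcu_read_lock();
    /* handlers would consume the prepared buffer here */
    fake_rcu_read_unlock();
    buffer_put();
}
```

In this model, `trace_fixed()` leaves `sleep_violations` untouched, while `trace_buggy()` increments it once per call.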
    • Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 6d69b6c1
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Stable fixes:
         - nfs: fix undefined behavior in nfs_block_bits()
         - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
      
        Bugfixes:
         - Fix mixing of the lock/nolock and local_lock mount options
         - NFSv4: Fixup smatch warning for ambiguous return
         - NFSv3: Fix remount when using the legacy binary mount api
         - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
         - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
         - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
      
        Features and cleanups:
         - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
         - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
         - pNFS: rework pnfs_generic_pg_check_layout to check IO range
         - NFSv2: Turn off enabling of NFS v2 by default"
      
      * tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: fix undefined behavior in nfs_block_bits()
        pNFS: rework pnfs_generic_pg_check_layout to check IO range
        pNFS/filelayout: check layout segment range
        pNFS/filelayout: fixup pNfs allocation modes
        rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
        NFS: Don't enable NFS v2 by default
        NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
        sunrpc: fix NFSACL RPC retry on soft mount
        SUNRPC: fix handling expired GSS context
        nfs: keep server info for remounts
        NFSv4: Fixup smatch warning for ambiguous return
        NFS: make sure lock/nolock overriding local_lock mount option
        NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
        pNFS/filelayout: Specify the layout segment range in LAYOUTGET
        pNFS/filelayout: Remove the whole file layout requirement
    • Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux · b4d88a60
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Followup block updates, mostly due to NVMe being a bit late to the
        party. But nothing major in there, so not a big deal.
      
        In detail, this contains:
      
         - NVMe pull request via Keith:
             - Fabrics connection retries (Daniel, Hannes)
             - Fabrics logging enhancements (Tokunori)
             - RDMA delete optimization (Sagi)
      
         - ublk DMA alignment fix (me)
      
         - null_blk sparse warning fixes (Bart)
      
         - Discard support for brd (Keith)
      
         - blk-cgroup list corruption fixes (Ming)
      
         - blk-cgroup stat propagation fix (Waiman)
      
         - Regression fix for plugging stall with md (Yu)
      
         - Misc fixes or cleanups (David, Jeff, Justin)"
      
      * tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
        null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
        blk-throttle: remove unused struct 'avg_latency_bucket'
        block: fix lost bio for plug enabled bio based device
        block: t10-pi: add MODULE_DESCRIPTION()
        blk-mq: add helper for checking if one CPU is mapped to specified hctx
        blk-cgroup: Properly propagate the iostat update up the hierarchy
        blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
        blk-cgroup: fix list corruption from resetting io stat
        cdrom: rearrange last_media_change check to avoid unintentional overflow
        nbd: Fix signal handling
        nbd: Remove a local variable from nbd_send_cmd()
        nbd: Improve the documentation of the locking assumptions
        nbd: Remove superfluous casts
        nbd: Use NULL to represent a pointer
        brd: implement discard support
        null_blk: Fix two sparse warnings
        ublk_drv: set DMA alignment mask to 3
        nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
        nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
        nvme: do not retry authentication failures
        ...
    • Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux · 483a351e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Single fix here for a regression in 6.9, and then a simple cleanup
        removing some dead code"
      
      * tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
        io_uring: remove checks for NULL 'sq_offset'
        io_uring/sqpoll: ensure that normal task_work is also run timely
    • Merge tag 'regulator-fix-v6.10-merge-window' of... · c2c80ecd
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A bunch of fixes that came in during the merge window.
      
        Matti found several issues with some of the more complexly configured
        Rohm regulators and the helpers they use and there were some errors in
        the specification of tps6594 when regulators are grouped together"
      
      * tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: tps6594-regulator: Correct multi-phase configuration
        regulator: tps6287x: Force writing VSEL bit
        regulator: pickable ranges: don't always cache vsel
        regulator: rohm-regulator: warn if unsupported voltage is set
        regulator: bd71828: Don't overwrite runtime voltages
    • Merge tag 'regmap-fix-v6.10-merge-window' of... · 09f8f2c4
      Linus Torvalds authored
      Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
      
      Pull regmap fix from Mark Brown:
       "Guenter ran with memory sanitisers and found an issue in the new KUnit
        tests that Richard added where an assumption in older test code was
        exposed, this was fixed quickly by Richard"
      
      * tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: kunit: Fix array overflow in stride() test
    • Merge tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 66ad4829
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Quite smaller than usual. Notably it includes the fix for the unix
        regression from the past weeks. The TCP window fix will require some
        follow-up, already queued.
      
        Current release - regressions:
      
         - af_unix: fix garbage collection of embryos
      
        Previous releases - regressions:
      
         - af_unix: fix race between GC and receive path
      
         - ipv6: sr: fix missing sk_buff release in seg6_input_core
      
         - tcp: remove 64 KByte limit for initial tp->rcv_wnd value
      
         - eth: r8169: fix rx hangup
      
         - eth: lan966x: remove ptp traps in case the ptp is not enabled
      
         - eth: ixgbe: fix link breakage vs cisco switches
      
         - eth: ice: prevent ethtool from corrupting the channels
      
        Previous releases - always broken:
      
         - openvswitch: set the skbuff pkt_type for proper pmtud support
      
         - tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
      
        Misc:
      
         - a bunch of selftests stabilization patches"
      
      * tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
        r8169: Fix possible ring buffer corruption on fragmented Tx packets.
        idpf: Interpret .set_channels() input differently
        ice: Interpret .set_channels() input differently
        nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
        net: relax socket state check at accept time.
        tcp: remove 64 KByte limit for initial tp->rcv_wnd value
        net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
        tls: fix missing memory barrier in tls_init
        net: fec: avoid lock evasion when reading pps_enable
        Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
        testing: net-drv: use stats64 for testing
        net: mana: Fix the extra HZ in mana_hwc_send_request
        net: lan966x: Remove ptp traps in case the ptp is not enabled.
        openvswitch: Set the skbuff pkt_type for proper pmtud support.
        selftest: af_unix: Make SCM_RIGHTS into OOB data.
        af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
        tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
        selftests/net: use tc rule to filter the na packet
        ipv6: sr: fix memleak in seg6_hmac_init_algo
        af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
        ...
    • Merge tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 404001dd
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Minor last minute fixes:
      
         - Fix a very tight race between the ring buffer readers and resizing
           the ring buffer
      
         - Correct some stale comments in the ring buffer code
      
         - Fix kernel-doc in the rv code
      
         - Add a MODULE_DESCRIPTION to preemptirq_delay_test"
      
      * tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rv: Update rv_en(dis)able_monitor doc to match kernel-doc
        tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test
        ring-buffer: Fix a race between readers and resize checks
        ring-buffer: Correct stale comments related to non-consuming readers
    • Merge tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e82d2af5
      Linus Torvalds authored
      Pull tracing tool fix from Steven Rostedt:
       "Fix printf format warnings in latency-collector.
      
        Use the printf format string with %s to take a string instead of
        taking in a string directly"
      
      * tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tools/latency-collector: Fix -Wformat-security compile warns
    • Merge tag 'trace-assign-str-v6.10' of... · d6a326d6
      Linus Torvalds authored
      Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing cleanup from Steven Rostedt:
       "Remove second argument of __assign_str()
      
        The __assign_str() macro logic of the TRACE_EVENT() macro was
        optimized so that it no longer needs the second argument. The
        __assign_str() is always matched with __string() field that takes a
        field name and the source for that field:
      
          __string(field, source)
      
        The TRACE_EVENT() macro logic will save off the source value and then
        use that value to copy into the ring buffer via the __assign_str().
      
        Before commit c1fa617c ("tracing: Rework __assign_str() and
        __string() to not duplicate getting the string"), the __assign_str()
        needed the second argument which would perform the same logic as the
        __string() source parameter did. Not only would this add overhead, but
        it was error prone as if the __assign_str() source produced something
        different, it may not have allocated enough for the string in the ring
        buffer (as the __string() source was used to determine how much to
        allocate)
      
        Now that the __assign_str() just uses the same string that was used in
        __string() it no longer needs the source parameter. It can now be
        removed"
      
      * tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/treewide: Remove second parameter of __assign_str()
    • Merge tag 'sparc-for-6.10-tag1' of... · bca2a25d
      Linus Torvalds authored
      Merge tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
      
      Pull sparc updates from Andreas Larsson:
      
       - Avoid on-stack cpumask variables in a number of places
      
       - Move struct termio to asm/termios.h, matching other architectures and
         allowing certain user space applications to build also for sparc
      
       - Fix missing prototype warnings for sparc64
      
       - Fix version generation warnings for sparc32
      
       - Fix bug where non-consecutive CPU IDs lead to some CPUs not starting
      
       - Simplification using swap and cleanup using NULL for pointer
      
       - Convert sparc parport and chmc drivers to use remove callbacks
         returning void
      
      * tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
        sparc/leon: Remove on-stack cpumask var
        sparc/pci_msi: Remove on-stack cpumask var
        sparc/of: Remove on-stack cpumask var
        sparc/irq: Remove on-stack cpumask var
        sparc/srmmu: Remove on-stack cpumask var
        sparc: chmc: Convert to platform remove callback returning void
        sparc: parport: Convert to platform remove callback returning void
        sparc: Compare pointers to NULL instead of 0
        sparc: Use swap() to fix Coccinelle warning
        sparc32: Fix version generation failed warnings
        sparc64: Fix number of online CPUs
        sparc64: Fix prototype warning for sched_clock
        sparc64: Fix prototype warnings in adi_64.c
        sparc64: Fix prototype warning for dma_4v_iotsb_bind
        sparc64: Fix prototype warning for uprobe_trap
        sparc64: Fix prototype warning for alloc_irqstack_bootmem
        sparc64: Fix prototype warning for vmemmap_free
        sparc64: Fix prototype warnings in traps_64.c
        sparc64: Fix prototype warning for init_vdso_image
        sparc: move struct termio to asm/termios.h
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2b7ced10
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The major fix here is for a filesystem corruption issue reported on
        Apple M1 as a result of buggy management of the floating point
        register state introduced in 6.8. I initially reverted one of the
        offending patches, but in the end Ard cooked a proper fix so there's a
        revert+reapply in the series.
      
        Aside from that, we've got some CPU errata workarounds and misc other
        fixes.
      
         - Fix broken FP register state tracking which resulted in filesystem
           corruption when dm-crypt is used
      
         - Workarounds for Arm CPU errata affecting the SSBS Spectre
           mitigation
      
         - Fix lockdep assertion in DMC620 memory controller PMU driver
      
         - Fix alignment of BUG table when CONFIG_DEBUG_BUGVERBOSE is
           disabled"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/fpsimd: Avoid erroneous elide of user state reload
        Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
        perf/arm-dmc620: Fix lockdep assert in ->event_init()
        Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: errata: Add workaround for Arm errata 3194386 and 3312417
        arm64: cputype: Add Neoverse-V3 definitions
        arm64: cputype: Add Cortex-X4 definitions
        arm64: barrier: Restore spec_bar() macro
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2ef32ad2
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "Several new features here:
      
         - virtio-net is finally supported in vduse
      
         - virtio (balloon and mem) interaction with suspend is improved
      
         - vhost-scsi now handles signals better/faster
      
        And fixes, cleanups all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
        virtio-pci: Check if is_avq is NULL
        virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
        MAINTAINERS: add Eugenio Pérez as reviewer
        vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
        vp_vdpa: don't allocate unused msix vectors
        sound: virtio: drop owner assignment
        fuse: virtio: drop owner assignment
        scsi: virtio: drop owner assignment
        rpmsg: virtio: drop owner assignment
        nvdimm: virtio_pmem: drop owner assignment
        wifi: mac80211_hwsim: drop owner assignment
        vsock/virtio: drop owner assignment
        net: 9p: virtio: drop owner assignment
        net: virtio: drop owner assignment
        net: caif: virtio: drop owner assignment
        misc: nsm: drop owner assignment
        iommu: virtio: drop owner assignment
        drm/virtio: drop owner assignment
        gpio: virtio: drop owner assignment
        firmware: arm_scmi: virtio: drop owner assignment
        ...
    • tools/latency-collector: Fix -Wformat-security compile warns · df73757c
      Shuah Khan authored
      Fix the following -Wformat-security compile warnings adding missing
      format arguments:
      
      latency-collector.c: In function ‘show_available’:
      latency-collector.c:938:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        938 |                 warnx(no_tracer_msg);
            |                 ^~~~~
      
      latency-collector.c:943:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        943 |                 warnx(no_latency_tr_msg);
            |                 ^~~~~
      
      latency-collector.c: In function ‘find_default_tracer’:
      latency-collector.c:986:25: warning: format not a string literal and
      no format arguments [-Wformat-security]
        986 |                         errx(EXIT_FAILURE, no_tracer_msg);
            |                         ^~~~
      latency-collector.c: In function ‘scan_arguments’:
      latency-collector.c:1881:33: warning: format not a string literal and
      no format arguments [-Wformat-security]
       1881 |                                 errx(EXIT_FAILURE, no_tracer_msg);
            |                                 ^~~~
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240404011009.32945-1-skhan@linuxfoundation.org
      
      Cc: stable@vger.kernel.org
      Fixes: e23db805 ("tracing/tools: Add the latency-collector to tools directory")
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
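The class of warning fixed here can be shown with a minimal sketch. `format_message()` is a hypothetical helper, not code from latency-collector; it uses `snprintf()` rather than `warnx()` so the result can be inspected, but the idea is the same: pass the message as an argument to a literal `"%s"` format instead of using it as the format string.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: copies msg into out without interpreting it as a
 * format string. The unsafe form would be snprintf(out, size, msg), where
 * any '%' sequence inside msg is parsed as a conversion specifier.
 */
static int format_message(char *out, size_t size, const char *msg)
{
    assert(out != NULL && size > 0 && msg != NULL);
    return snprintf(out, size, "%s", msg);
}
```

With this form, a message that happens to contain `%s` or `%d` is emitted verbatim, which is also what silences `-Wformat-security`.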
    • r8169: Fix possible ring buffer corruption on fragmented Tx packets. · c71e3a5c
      Ken Milmore authored
      An issue was found on the RTL8125b when transmitting small fragmented
      packets, whereby invalid entries were inserted into the transmit ring
      buffer, subsequently leading to calls to dma_unmap_single() with a null
      address.
      
      This was caused by rtl8169_start_xmit() not noticing changes to nr_frags
      which may occur when small packets are padded (to work around hardware
      quirks) in rtl8169_tso_csum_v2().
      
      To fix this, postpone inspecting nr_frags until after any padding has been
      applied.
      
      Fixes: 9020845f ("r8169: improve rtl8169_start_xmit")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ken Milmore <ken.milmore@gmail.com>
      Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
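The stale-read pattern behind this bug can be sketched with a simplified model. Everything here is an assumption made for illustration: `struct pkt`, `MIN_LEN`, `pad_if_small()`, and the rule that padding collapses fragments are hypothetical and do not reproduce the driver's real data structures or padding logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a packet; not the kernel's struct sk_buff. */
struct pkt {
    unsigned int len;
    unsigned int nr_frags;
};

#define MIN_LEN 60u /* illustrative padding threshold, an assumption */

/* Assumed behaviour: padding a short packet collapses its fragments. */
static void pad_if_small(struct pkt *p)
{
    assert(p != NULL);
    if (p->len < MIN_LEN) {
        p->len = MIN_LEN;
        p->nr_frags = 0;
    }
}

/* Stale read: nr_frags is sampled before padding may change it. */
static unsigned int descs_stale(struct pkt *p)
{
    unsigned int frags = p->nr_frags;

    pad_if_small(p);
    return frags + 1; /* may count fragments that no longer exist */
}

/* Fixed order: inspect nr_frags only after padding has been applied. */
static unsigned int descs_fixed(struct pkt *p)
{
    pad_if_small(p);
    return p->nr_frags + 1;
}
```

For a short packet with two fragments, the stale variant still plans three descriptors while the fixed variant plans one, which is the kind of mismatch that could leave invalid ring entries behind.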
    • null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues' · a2db328b
      Yu Kuai authored
      Writing 'power' and 'submit_queues' concurrently will trigger kernel
      panic:
      
      Test script:
      
      modprobe null_blk nr_devices=0
      mkdir -p /sys/kernel/config/nullb/nullb0
      while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
      while true; do echo 1 > power; echo 0 > power; done
      
      Test result:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000148
      Oops: 0000 [#1] PREEMPT SMP
      RIP: 0010:__lock_acquire+0x41d/0x28f0
      Call Trace:
       <TASK>
       lock_acquire+0x121/0x450
       down_write+0x5f/0x1d0
       simple_recursive_removal+0x12f/0x5c0
       blk_mq_debugfs_unregister_hctxs+0x7c/0x100
       blk_mq_update_nr_hw_queues+0x4a3/0x720
       nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
       nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
       configfs_write_iter+0x119/0x1e0
       vfs_write+0x326/0x730
       ksys_write+0x74/0x150
      
      This is because del_gendisk() can run concurrently with
      blk_mq_update_nr_hw_queues():
      
      nullb_device_power_store	nullb_apply_submit_queues
       null_del_dev
       del_gendisk
      				 nullb_update_nr_hw_queues
      				  if (!dev->nullb)
      				  // still set while gendisk is deleted
      				   return 0
      				  blk_mq_update_nr_hw_queues
       dev->nullb = NULL
      
      Fix this problem by reusing the global mutex to protect
      nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.
      
      Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
      Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
      Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
      Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'intel-interpret-set_channels-input-differently' · 3d8597d8
      Paolo Abeni authored
      Jacob Keller says:
      
      ====================
      intel: Interpret .set_channels() input differently
      
      The ice and idpf drivers can trigger a crash with AF_XDP due to incorrect
      interpretation of the asymmetric Tx and Rx parameters in their
      .set_channels() implementations:
      
      1. ethtool -l <IFNAME> -> combined: 40
      2. Attach AF_XDP to queue 30
      3. ethtool -L <IFNAME> rx 15 tx 15
         combined number is not specified, so command becomes {rx_count = 15,
         tx_count = 15, combined_count = 40}.
      4. ethnl_set_channels checks whether any AF_XDP sockets are attached to
         queues between the new count (combined_count + rx_count) and the old
         one, so from 55 to 40; the check does not trigger.
      5. the driver interprets `rx 15 tx 15` as 15 combined channels and deletes
         the queue that AF_XDP is attached to.
      
      This is fundamentally a problem with interpreting a request for asymmetric
      queues as symmetric combined queues.
      
      Fix the ice and idpf drivers to stop interpreting such requests as a
      request for combined queues. Due to current driver design for both ice and
      idpf, it is not possible to support requests of the same count of Tx and Rx
      queues with independent interrupts, (i.e. ethtool -L <IFNAME> rx 15 tx 15)
      so such requests are now rejected.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240521-iwl-net-2024-05-14-set-channels-fixes-v2-0-7aa39e2e99f1@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • idpf: Interpret .set_channels() input differently · 5e7695e0
      Larysa Zaremba authored
      Unlike ice, idpf does not check whether the user has requested at least
      1 combined channel. Instead, it relies on a check in the core code.
      Unfortunately, that check does not trigger for us because of the hacky
      .set_channels() interpretation logic that is not consistent with the core
      code.
      
      As a result, a user can trigger a crash with invalid input, as follows:
      
      1. ethtool -l <IFNAME> -> combined: 40
      2. ethtool -L <IFNAME> rx 0 tx 0
         combined number is not specified, so command becomes {rx_count = 0,
         tx_count = 0, combined_count = 40}.
      3. ethnl_set_channels checks whether there is at least 1 RX and 1 TX
         channel by comparing (combined_count + rx_count) and
         (combined_count + tx_count) to zero. Obviously, (40 + 0) is greater
         than zero, so the core code deems the input OK.
      4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such
         configuration.
      
      The issue has to be solved fundamentally, as current logic is also known to
      cause AF_XDP problems in ice [0].
      
      Interpret the command in a way that is more consistent with ethtool
      manual [1] (--show-channels and --set-channels) and new ice logic.
      
      Considering that in the idpf driver only the difference between RX and TX
      queues forms dedicated channels, the correct way to set the number of
      channels becomes:
      
      ethtool -L <IFNAME> combined 10 /* For symmetric queues */
      ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */
      
      [0] https://lore.kernel.org/netdev/20240418095857.2827-1-larysa.zaremba@intel.com/
      [1] https://man7.org/linux/man-pages/man8/ethtool.8.html
      
      Fixes: 02cbfba1 ("idpf: add ethtool callbacks")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
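The mismatch between the core check and the driver's reading of the request can be worked through with a few lines of arithmetic. This is a sketch only: `struct channels` and the three helpers are hypothetical names, `old_driver_queues()` is a simplified guess at the old interpretation based on the steps listed in this commit, and `total_rx()`/`total_tx()` assume that dedicated channels are added on top of combined ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for an ethtool -L request as seen by the kernel. */
struct channels {
    unsigned int rx;
    unsigned int tx;
    unsigned int combined;
};

/* Core-side sanity check as described: at least one RX and one TX channel. */
static bool core_accepts(const struct channels *c)
{
    assert(c != NULL);
    return (c->combined + c->rx) > 0 && (c->combined + c->tx) > 0;
}

/* Simplified assumption of the old reading: equal rx/tx replace combined. */
static unsigned int old_driver_queues(const struct channels *c)
{
    return c->rx == c->tx ? c->rx : c->combined;
}

/* Reading consistent with the core: dedicated channels add to combined. */
static unsigned int total_rx(const struct channels *c)
{
    return c->combined + c->rx;
}

static unsigned int total_tx(const struct channels *c)
{
    return c->combined + c->tx;
}
```

For `rx 0 tx 0` with 40 combined channels left in place, `core_accepts()` returns true while `old_driver_queues()` yields 0, which is the inconsistency the commit describes. Under the additive reading, `combined 8 tx 2 rx 0` gives 8 RX and 10 TX queues.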
    • ice: Interpret .set_channels() input differently · 05d6f442
      Larysa Zaremba authored
      A bug occurs because a safety check guarding AF_XDP-related queues in
      ethnl_set_channels() does not trigger. This happens because the kernel
      and the ice driver interpret the ethtool command differently.
      
      How the bug occurs:
      1. ethtool -l <IFNAME> -> combined: 40
      2. Attach AF_XDP to queue 30
      3. ethtool -L <IFNAME> rx 15 tx 15
         combined number is not specified, so command becomes {rx_count = 15,
         tx_count = 15, combined_count = 40}.
      4. ethnl_set_channels checks whether any AF_XDP sockets are attached to
         queues between the new count (combined_count + rx_count) and the old
         one, so from 55 to 40; the check does not trigger.
      5. ice interprets `rx 15 tx 15` as 15 combined channels and deletes the
         queue that AF_XDP is attached to.
      
      Interpret the command in a way that is more consistent with ethtool
      manual [0] (--show-channels and --set-channels).
      
      Considering that in the ice driver only the difference between RX and TX
      queues forms dedicated channels, the correct way to set the number of
      channels becomes:
      
      ethtool -L <IFNAME> combined 10 /* For symmetric queues */
      ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */
      
      [0] https://man7.org/linux/man-pages/man8/ethtool.8.html
      
      Fixes: 87324e74 ("ice: Implement ethtool ops for channels")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      05d6f442
    • Ryosuke Yasuoka's avatar
      nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() · 6671e352
      Ryosuke Yasuoka authored
      When nci_rx_work() receives a zero-length payload packet, it should not
      discard the packet and exit the loop. Instead, it should continue
      processing subsequent packets.
      
      Fixes: d24b0353 ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
      Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      6671e352
    • Paolo Abeni's avatar
      net: relax socket state check at accept time. · 26afda78
      Paolo Abeni authored
      Christoph reported the following splat:
      
      WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
      Modules linked in:
      CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
      Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
      RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
      RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
      R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
      R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
      FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
       do_accept+0x435/0x620 net/socket.c:1929
       __sys_accept4_file net/socket.c:1969 [inline]
       __sys_accept4+0x9b/0x110 net/socket.c:1999
       __do_sys_accept net/socket.c:2016 [inline]
       __se_sys_accept net/socket.c:2013 [inline]
       __x64_sys_accept+0x7d/0x90 net/socket.c:2013
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      RIP: 0033:0x4315f9
      Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
      RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
      RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
      R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
       </TASK>
      
      The reproducer invokes shutdown() before entering the listener status.
      After commit 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for
      TCP_SYN_RECV sockets"), the above causes the child to reach the accept
      syscall in FIN_WAIT1 status.
      
      Eric noted we can relax the existing assertion in __inet_accept().
      
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Fixes: 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      26afda78
    • Jason Xing's avatar
      tcp: remove 64 KByte limit for initial tp->rcv_wnd value · 378979e9
      Jason Xing authored
      Recently, we upgraded some servers to the latest kernel and noticed
      that a user-side indicator showed worse results than before. This is
      caused by the limit on tp->rcv_wnd.
      
      In 2018, commit a337531b ("tcp: up initial rmem to 128KB and SYN rwin
      to around 64KB") limited the initial value of tp->rcv_wnd to 65535. Most
      CDN teams do not benefit from this change because they cannot use a
      large window to receive big packets, which slows transfers down,
      especially over long RTTs. To some extent, a small rcv_wnd means a slow
      transfer speed, which is a side effect for latency/time-sensitive users.
      
      To avoid future confusion: this change does not affect the initial
      receive window on the wire in a SYN or SYN+ACK packet, which stays within
      65535 bytes according to RFC 7323 and is also limited in
      __tcp_transmit_skb():
      
          th->window      = htons(min(tp->rcv_wnd, 65535U));
      
      In short, __tcp_transmit_skb() already ensures that the constraint is
      respected, no matter how large tp->rcv_wnd is. The change does not violate
      the RFC.
      
      Here is an example with and without the patch:
      Before:
      client   --- SYN: rwindow=65535 ---> server
      client   <--- SYN+ACK: rwindow=65535 ----  server
      client   --- ACK: rwindow=65536 ---> server
      Note: for the last ACK, the calculation is 512 << 7.
      
      After:
      client   --- SYN: rwindow=65535 ---> server
      client   <--- SYN+ACK: rwindow=65535 ----  server
      client   --- ACK: rwindow=175232 ---> server
      Note: I use the following command to make it work:
      ip route change default via [ip] dev eth0 metric 100 initrwnd 120
      For the last ACK, the calculation is 1369 << 7.
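      
      The window arithmetic in the two traces can be checked with a short
      Python sketch. The shift of 7 is taken from the "<< 7" in the notes
      above; the function names are hypothetical and only illustrate the
      calculation, they are not kernel APIs.

```python
WSCALE = 7  # shift count used in the examples above ("512 << 7", "1369 << 7")


def scaled_window(window_field, wscale=WSCALE):
    """Effective receive window announced by an ACK carrying window_field."""
    return window_field << wscale


def syn_window_field(rcv_wnd):
    """Mirror of min(tp->rcv_wnd, 65535U): the value placed in th->window."""
    return min(rcv_wnd, 65535)


before = scaled_window(512)    # last ACK without the patch
after = scaled_window(1369)    # last ACK with the patch and initrwnd 120

print(before, after, syn_window_field(after))
```

      Even with the larger rcv_wnd, the value written into th->window is
      still clamped to 65535 by the min() shown earlier.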
      
      With this patch applied, a user who tweaks this knob gets a larger
      rcv_wnd, which can help transfer data more rapidly and save some RTTs.
      
      Fixes: a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      378979e9
    • Romain Gantois's avatar
      net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe() · b31c7e78
      Romain Gantois authored
      In the prueth_probe() function, if one of the calls to emac_phy_connect()
      fails due to of_phy_connect() returning NULL, then the subsequent call to
      phy_attached_info() will dereference a NULL pointer.
      
      Check the return code of emac_phy_connect and fail cleanly if there is an
      error.
      
      Fixes: 128d5874 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Link: https://lore.kernel.org/r/20240521-icssg-prueth-fix-v1-1-b4b17b1433e9@bootlin.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b31c7e78
    • Dae R. Jeong's avatar
      tls: fix missing memory barrier in tls_init · 91e61dd7
      Dae R. Jeong authored
      In tls_init(), a write memory barrier is missing, and store-store
      reordering may cause NULL dereference in tls_{setsockopt,getsockopt}.
      
      CPU0                               CPU1
      -----                              -----
      // In tls_init()
      // In tls_ctx_create()
      ctx = kzalloc()
      ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1)
      
      // In update_sk_prot()
      WRITE_ONCE(sk->sk_prot, tls_prots)     -(2)
      
                                         // In sock_common_setsockopt()
                                         READ_ONCE(sk->sk_prot)->setsockopt()
      
                                         // In tls_{setsockopt,getsockopt}()
                                         ctx->sk_proto->setsockopt()    -(3)
      
      In the above scenario, when (1) and (2) are reordered, (3) can observe
      the NULL value of ctx->sk_proto, causing NULL dereference.
      
      To fix it, we rely on rcu_assign_pointer(), which implies release
      barrier semantics. By moving rcu_assign_pointer() after ctx->sk_proto is
      initialized, we can ensure that ctx->sk_proto is visible when
      changing sk->sk_prot.
      
      Fixes: d5bee737 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE")
      Signed-off-by: Yewon Choi <woni9911@gmail.com>
      Signed-off-by: Dae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/
      Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      91e61dd7
    • Wei Fang's avatar
      net: fec: avoid lock evasion when reading pps_enable · 3b1c92f8
      Wei Fang authored
      The assignment of pps_enable is protected by tmreg_lock, but the read
      of pps_enable is not. The Coverity tool therefore reports a lock
      evasion warning; the unprotected read may cause a data race in a
      multithreaded environment. Although this issue is almost impossible to
      hit, it is better to fix it: the code becomes more logically consistent,
      and Coverity stops issuing the warning.
      
      Fixes: 278d2404 ("net: fec: ptp: Enable PPS output based on ptp clock")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Link: https://lore.kernel.org/r/20240521023800.17102-1-wei.fang@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3b1c92f8
    • Jacob Keller's avatar
      Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" · b35b1c0b
      Jacob Keller authored
      This reverts commit 56573604.
      
      According to the commit, it implements a manual AN-37 for some
      "troublesome" Juniper MX5 switches. This appears to be a workaround for a
      particular switch.
      
      It has been reported that this causes a severe breakage for other switches,
      including a Cisco 3560CX-12PD-S.
      
      The code appears to be a workaround for a specific switch which fails to
      link in SFI mode. It expects to see AN-37 auto negotiation in order to
      link. The Cisco switch is not expecting AN-37 auto negotiation. When the
      device starts the manual AN-37, the Cisco switch decides that the port is
      confused and stops attempting to link with it. This persists until a power
      cycle. A simple driver unload and reload does not resolve the issue, even
      if loading with a version of the driver which lacks this workaround.
      
      The authors of the workaround commit have not responded with
      clarifications, and the result of the workaround is complete failure to
      connect with other switches.
      
      This appears to be a case where the driver can either "correctly" link with
      the Juniper MX5 switch, at the cost of bricking the link with the Cisco
      switch, or it can behave properly for the Cisco switch, but fail to link
      with the Juniper MX5 switch. I do not know enough about the standards
      involved to clearly determine whether either switch is at fault or behaving
      incorrectly. Nor do I know whether there exists some alternative fix which
      corrects behavior with both switches.
      
      Revert the workaround for the Juniper switch.
      
      Fixes: 56573604 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI")
      Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/
      Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Jeff Daly <jeffd@silicom-usa.com>
      Cc: kernel.org-fo5k2w@ycharbi.fr
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240520-net-2024-05-20-revert-silicom-switch-workaround-v1-1-50f80f261c94@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b35b1c0b
    • Joe Damato's avatar
      testing: net-drv: use stats64 for testing · a61a459f
      Joe Damato authored
      Testing a network device that has large numbers of bytes/packets may
      overflow. Using stats64 when comparing fixes this problem.
      
      I tripped on this while iterating on a qstats patch for mlx5. See below
      for confirmation without my added code that this is a bug.
      
      Before this patch (with added debugging output):
      
      $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
      KTAP version 1
      1..4
      ok 1 stats.check_pause
      ok 2 stats.check_fec
      rstat: 481708634 qstat: 666201639514 key: tx-bytes
      not ok 3 stats.pkt_byte_sum
      ok 4 stats.qstat_by_ifindex
      
      Note the huge delta above ^^^ in the rtnl vs qstats.
      
      After this patch:
      
      $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
      KTAP version 1
      1..4
      ok 1 stats.check_pause
      ok 2 stats.check_fec
      ok 3 stats.pkt_byte_sum
      ok 4 stats.qstat_by_ifindex
      
      It looks like rtnl_fill_stats in net/core/rtnetlink.c will attempt to
      copy the 64bit stats into a 32bit structure which is probably why this
      behavior is occurring.
      
      To show this is happening, you can get the underlying stats that the
      stats.py test uses like this:
      
      $ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \
                 --do getlink --json '{"ifi-index": 7}'
      
      And examine the output (heavily snipped to show relevant fields):
      
       'stats': {
                 'multicast': 3739197,
                 'rx-bytes': 1201525399,
                 'rx-packets': 56807158,
                 'tx-bytes': 492404458,
                 'tx-packets': 1200285371,
      
       'stats64': {
                   'multicast': 3739197,
                   'rx-bytes': 35561263767,
                   'rx-packets': 56807158,
                   'tx-bytes': 666212335338,
                   'tx-packets': 1200285371,
      
      The stats.py test prior to this patch was using the 'stats' structure
      above, which matches the failure output on my system.
      
      Comparing side by side, rx-bytes and tx-bytes, and getting ethtool -S
      output:
      
      rx-bytes stats:    1201525399
      rx-bytes stats64: 35561263767
      rx-bytes ethtool: 36203402638
      
      tx-bytes stats:      492404458
      tx-bytes stats64: 666212335338
      tx-bytes ethtool: 666215360113
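      
      As a quick sanity check, the 'stats' values above are exactly what
      the 'stats64' values become when only the low 32 bits are kept. The
      Python sketch below shows only that arithmetic; truncate_u32() is a
      hypothetical helper, and it makes no claim about how rtnl_fill_stats
      is implemented.

```python
U32_MASK = 0xFFFFFFFF


def truncate_u32(value):
    """Keep only the low 32 bits of a 64-bit counter."""
    return value & U32_MASK


# Values copied from the 'stats64' output above.
stats64 = {"rx-bytes": 35561263767, "tx-bytes": 666212335338}

# Truncated counterparts, to compare against the 'stats' output above.
stats32 = {key: truncate_u32(val) for key, val in stats64.items()}

print(stats32)
```

      Both truncated values match the rx-bytes and tx-bytes reported in the
      'stats' structure.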
      
      Note that the above was taken from a system with an mlx5 NIC, which only
      exposes ndo_get_stats64.
      
      Based on the ethtool output and qstat output, it appears that stats.py
      should be updated to use the 'stats64' structure for accurate
      comparisons when packet/byte counters get very large.
      
      To confirm that this was not related to the qstats code I was iterating
      on, I booted a kernel without my driver changes and re-ran the test
      which shows the qstats are skipped (as they don't exist for mlx5):
      
      $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
      KTAP version 1
      1..4
      ok 1 stats.check_pause
      ok 2 stats.check_fec
      ok 3 stats.pkt_byte_sum # SKIP qstats not supported by the device
      ok 4 stats.qstat_by_ifindex # SKIP No ifindex supports qstats
      
      But, fetching the stats using the CLI
      
      $ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \
                 --do getlink --json '{"ifi-index": 7}'
      
      Shows the same issue (heavily snipped for relevant fields only):
      
       'stats': {
                 'multicast': 105489,
                 'rx-bytes': 530879526,
                 'rx-packets': 751415,
                 'tx-bytes': 2510191396,
                 'tx-packets': 27700323,
       'stats64': {
                   'multicast': 105489,
                   'rx-bytes': 530879526,
                   'rx-packets': 751415,
                   'tx-bytes': 15395093284,
                   'tx-packets': 27700323,
      
      Comparing side by side with ethtool -S on the unmodified mlx5 driver:
      
      tx-bytes stats:    2510191396
      tx-bytes stats64: 15395093284
      tx-bytes ethtool: 17718435810
      
      Fixes: f0e6c86e ("testing: net-drv: add a driver test for stats reporting")
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Link: https://lore.kernel.org/r/20240520235850.190041-1-jdamato@fastly.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      a61a459f
    • Linus Torvalds's avatar
      Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of... · c760b372
      Linus Torvalds authored
      Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull more non-mm updates from Andrew Morton:
      
       - A series ("kbuild: enable more warnings by default") from Arnd
         Bergmann which enables a number of additional build-time warnings. We
         fixed all the fallout which we could find, there may still be a few
         stragglers.
      
       - Samuel Holland has developed the series "Unified cross-architecture
         kernel-mode FPU API". This does a lot of consolidation of
         per-architecture kernel-mode FPU usage and enables the use of newer
         AMD GPUs on RISC-V.
      
       - Tao Su has fixed some selftests build warnings in the series
         "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
         definition".
      
       - This pull also includes a nilfs2 fixup from Ryusuke Konishi.
      
      * tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
        nilfs2: make block erasure safe in nilfs_finish_roll_forward()
        selftests/harness: use 1024 in place of LINE_MAX
        Revert "selftests/harness: remove use of LINE_MAX"
        selftests/fpu: allow building on other architectures
        selftests/fpu: move FP code to a separate translation unit
        drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
        drm/amd/display: only use hard-float, not altivec on powerpc
        riscv: add support for kernel-mode FPU
        x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
        arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
        arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
        ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
        arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
        x86/fpu: fix asm/fpu/types.h include guard
        kbuild: enable -Wcast-function-type-strict unconditionally
        kbuild: enable -Wformat-truncation on clang
        ...
      c760b372
    • Linus Torvalds's avatar
      Merge tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 5c6f4d68
      Linus Torvalds authored
      Pull more mm updates from Andrew Morton:
       "A series from Dave Chinner which cleans up and fixes the handling of
        nested allocations within stackdepot and page-owner"
      
      * tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/page-owner: use gfp_nested_mask() instead of open coded masking
        stackdepot: use gfp_nested_mask() instead of open coded masking
        mm: lift gfp_kmemleak_mask() to gfp.h
      5c6f4d68
    • Steven Rostedt (Google)'s avatar
      tracing/treewide: Remove second parameter of __assign_str() · 2c92ca84
      Steven Rostedt (Google) authored
      With the rework of how __string() handles dynamic strings, where it
      saves off the source string in a field in the helper structure[1], the
      value to assign to the trace event field is already stored in the helper
      structure and does not need to be passed in again.
      
      This means that with:
      
        __string(field, mystring)
      
      which used to be assigned with __assign_str(field, mystring), the second
      parameter is no longer needed and is unused. With this, __assign_str()
      will now only get a single parameter.
      
      There are over 700 users of __assign_str() and, because coccinelle does
      not handle the TRACE_EVENT() macro, I ended up using the following sed
      script:
      
        git grep -l __assign_str | while read a ; do
            sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
            mv /tmp/test-file $a;
        done
      
      I then searched for __assign_str() that did not end with ';' as those
      were multi line assignments that the sed script above would fail to catch.
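      
      To see what the substitution does to a single line, the same pattern
      can be tried with Python's re module. This is only an illustrative
      translation of the sed expression above (sed's \( \) groups become
      plain parentheses and the literal '(' is escaped); drop_second_param()
      is a hypothetical helper, not part of the patch.

```python
import re

# Python equivalent of: s/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/
PATTERN = re.compile(r"(__assign_str\([^,]*[^ ,]) *,[^;]*")


def drop_second_param(line):
    """Rewrite __assign_str(a, b) to __assign_str(a) on a single line."""
    return PATTERN.sub(r"\1)", line)


simple = drop_second_param("__assign_str(field, mystring);")
nested = drop_second_param("__assign_str(name, dev_name(dev));")
already_done = drop_second_param("__assign_str(field);")

print(simple, nested, already_done)
```

      Because the pattern only looks at one line at a time, a call whose
      second argument continues on the next line is not rewritten correctly,
      which is why those cases had to be found separately.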
      
      Note, the same updates will need to be done for:
      
        __assign_str_len()
        __assign_rel_str()
        __assign_rel_str_len()
      
      I tested this with both an allmodconfig and an allyesconfig (build only for both).
      
      [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Julia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
      Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
      Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Darrick J. Wong <djwong@kernel.org>	# xfs
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      2c92ca84
  3. 22 May, 2024 9 commits
    • Linus Torvalds's avatar
      mm: simplify and improve print_vma_addr() output · de7e71ef
      Linus Torvalds authored
      Use '%pD' to print out the filename, and print out the actual offset
      within the file too, rather than just what the virtual address of the
      mapping is (which doesn't tell you anything about any mapping offsets).
      
      Also, use the exact vma_lookup() instead of find_vma() - the latter
      looks up any vma _after_ the address, which is of questionable value
      (yes, maybe you fell off the beginning, but you'd be more likely to fall
      off the end).
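      
      The difference between the two lookups can be illustrated with a toy
      model in Python. It treats each vma as a (start, end) tuple in a list
      sorted by address; the function names mirror the kernel helpers, but
      this is a simplified sketch and not the kernel implementation.

```python
def find_vma(vmas, addr):
    """Return the first vma whose end is above addr (it may start after addr)."""
    for start, end in vmas:
        if addr < end:
            return (start, end)
    return None


def vma_lookup(vmas, addr):
    """Return the vma that actually contains addr, or None."""
    vma = find_vma(vmas, addr)
    if vma is not None and vma[0] <= addr:
        return vma
    return None


vmas = [(0x1000, 0x2000), (0x5000, 0x6000)]  # sorted, non-overlapping

in_gap_find = find_vma(vmas, 0x3000)      # address in the hole between vmas
in_gap_lookup = vma_lookup(vmas, 0x3000)
inside_lookup = vma_lookup(vmas, 0x1800)

print(in_gap_find, in_gap_lookup, inside_lookup)
```

      For an address in the gap, the find_vma() model reports the next
      mapping even though the address is not inside it, while the
      vma_lookup() model reports nothing.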
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de7e71ef
    • Linus Torvalds's avatar
      Merge local branch 'x86-codegen' · f8a6e48c
      Linus Torvalds authored
      Merge trivial x86 code generation annoyances
      
       - Introduce helper macros for clang asm input problems
      
       - use said macros to improve trivially stupid code generation issues in
         bitops and array_index_mask_nospec
      
       - also improve codegen with 32-bit array index comparisons
      
      None of these really matter, but I look at code generation and profiles
      fairly regularly, and these misfeatures caused the generated code to
      look really odd and distract from the real issues.
      
      * branch 'x86-codegen' of local tree:
        x86: improve bitop code generation with clang
        x86: improve array_index_mask_nospec() code generation
        clang: work around asm input constraint problems
      f8a6e48c
    • Linus Torvalds's avatar
      x86: improve bitop code generation with clang · b9b60b31
      Linus Torvalds authored
      This uses the new ASM_INPUT_RM macro to avoid the bad code generation
      issue that clang has with more generic asm inputs.
      
      This ends up avoiding generating code like this:
      
       	mov    %r10,(%rsp)
       	tzcnt  (%rsp),%rcx
      
      which now becomes just
      
       	tzcnt  %r10,%rcx
      
      and in the process ends up also removing a few unnecessary stack frames
      when the only use was that pointless "asm uses memory location off stack".
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9b60b31
    • Linus Torvalds's avatar
      x86: improve array_index_mask_nospec() code generation · 7453b948
      Linus Torvalds authored
      Don't force the inputs to be 'unsigned long', when the comparison can
      easily be done in 32-bit if that's more appropriate.
      
      Note that while we can look at the inputs to choose an appropriate size
      for the compare instruction, the output is fixed at 'unsigned long'.
      That's not technically optimal either, since a 32-bit 'sbbl' would often
      be sufficient.
      
      But for the outgoing mask we don't know how the mask ends up being used
      (ie we have uses that have an incoming 32-bit array index, but end up
      using the mask for other things).  That said, it only costs the extra
      REX prefix to always generate the 64-bit mask.
      
      [ A 'sbbl' also always technically generates a 64-bit mask, but with the
        upper 32 bits clear: that's fine for when the incoming index that will
        be masked is already 32-bit, but not if you use the mask to mask a
        pointer afterwards, like the file table lookup does ]
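      
      The bracketed remark can be made concrete with plain integer
      arithmetic. The Python sketch below models the two mask widths only;
      the function names and the sample pointer value are hypothetical and
      chosen for illustration.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def mask_64bit(index, size):
    """All 64 bits set when index < size, otherwise 0."""
    return MASK64 if index < size else 0


def mask_32bit(index, size):
    """Model of a 32-bit 'sbbl' result: low 32 bits set, upper 32 bits clear."""
    return MASK32 if index < size else 0


index, size = 3, 8
pointer = 0xFFFF888104668000  # example 64-bit kernel-style address

index_ok = index & mask_32bit(index, size)        # 32-bit index survives
pointer_ok = pointer & mask_64bit(index, size)    # pointer survives
pointer_bad = pointer & mask_32bit(index, size)   # upper half is lost

print(hex(index_ok), hex(pointer_ok), hex(pointer_bad))
```

      A 32-bit mask is enough for masking a 32-bit index, but applying it to
      a 64-bit pointer clears the upper half of the address.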
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7453b948
    • Linus Torvalds's avatar
      clang: work around asm input constraint problems · dbaaabd6
      Linus Torvalds authored
      Work around clang problems with asm constraints that have multiple
      possibilities, particularly "g" and "rm".
      
      Clang seems to turn inputs like that into the most generic form, which
      is the memory input - but to make matters worse, clang won't even use a
      possible original memory location, but will spill the value to stack,
      and use the stack for the asm input.
      
      See
      
        https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
      
      for some explanation of why clang has this strange behavior, but the end
      result is that "g" and "rm" really end up generating horrid code.
      
      Link: https://github.com/llvm/llvm-project/issues/20571
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbaaabd6
    • Linus Torvalds's avatar
      Merge tag 'char-misc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 5f16eb05
      Linus Torvalds authored
      Pull char/misc and other driver subsystem updates from Greg KH:
       "Here is the big set of char/misc and other driver subsystem updates
        for 6.10-rc1. Nothing major here, just lots of new drivers and updates
        for apis and new hardware types. Included in here are:
      
         - big IIO driver updates with more devices and drivers added
      
         - fpga driver updates
      
         - hyper-v driver updates
      
         - uio_pruss driver removal, no one uses it, other drivers control the
           same hardware now
      
         - binder minor updates
      
         - mhi driver updates
      
         - extcon driver updates
      
         - counter driver updates
      
         - accessibility driver updates
      
         - coresight driver updates
      
         - other hwtracing driver updates
      
         - nvmem driver updates
      
         - slimbus driver updates
      
         - spmi driver updates
      
         - other smaller misc and char driver updates
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (319 commits)
        misc: ntsync: mark driver as "broken" to prevent from building
        spmi: pmic-arb: Add multi bus support
        spmi: pmic-arb: Register controller for bus instead of arbiter
        spmi: pmic-arb: Make core resources acquiring a version operation
        spmi: pmic-arb: Make the APID init a version operation
        spmi: pmic-arb: Fix some compile warnings about members not being described
        dt-bindings: spmi: Deprecate qcom,bus-id
        dt-bindings: spmi: Add X1E80100 SPMI PMIC ARB schema
        spmi: pmic-arb: Replace three IS_ERR() calls by null pointer checks in spmi_pmic_arb_probe()
        spmi: hisi-spmi-controller: Do not override device identifier
        dt-bindings: spmi: hisilicon,hisi-spmi-controller: clean up example
        dt-bindings: spmi: hisilicon,hisi-spmi-controller: fix binding references
        spmi: make spmi_bus_type const
        extcon: adc-jack: Document missing struct members
        extcon: realtek: Remove unused of_gpio.h
        extcon: usbc-cros-ec: Convert to platform remove callback returning void
        extcon: usb-gpio: Convert to platform remove callback returning void
        extcon: max77843: Convert to platform remove callback returning void
        extcon: max3355: Convert to platform remove callback returning void
        extcon: intel-mrfld: Convert to platform remove callback returning void
        ...
      5f16eb05
    • Linus Torvalds's avatar
      Merge tag 'driver-core-6.10-rc1' of... · d90be6e4
      Linus Torvalds authored
      Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core updates from Greg KH:
       "Here is the small set of driver core and kernfs changes for 6.10-rc1.
      
        Nothing major here at all, just a small set of changes for some driver
        core APIs, and minor fixups. Included in here are:
      
         - sysfs_bin_attr_simple_read() helper added and used
      
         - device_show_string() helper added and used
      
        All usages of these were acked by the various maintainers. Also in
        here are:
      
         - kernfs minor cleanup
      
         - removed unused functions
      
         - typo fix in documentation
      
         - pay attention to sysfs_create_link() failures in module.c finally
      
        All of these have been in linux-next for a very long time with no
        reported problems"
      
      * tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        device property: Fix a typo in the description of device_get_child_node_count()
        kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
        scsi: Use device_show_string() helper for sysfs attributes
        platform/x86: Use device_show_string() helper for sysfs attributes
        perf: Use device_show_string() helper for sysfs attributes
        IB/qib: Use device_show_string() helper for sysfs attributes
        hwmon: Use device_show_string() helper for sysfs attributes
        driver core: Add device_show_string() helper for sysfs attributes
        treewide: Use sysfs_bin_attr_simple_read() helper
        sysfs: Add sysfs_bin_attr_simple_read() helper
        module: don't ignore sysfs_create_link() failures
        driver core: Remove unused platform_notify, platform_notify_remove
    • Merge tag 'staging-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · be81389c
      Linus Torvalds authored
      Pull staging driver updates from Greg KH:
       "Here is the big set of staging driver changes for 6.10-rc1. Not a lot
        of cleanups happening this kernel release, intern applications must be
        out of sync at the moment. But we did delete two drivers, wlan-ng and
        pi433, as they are no longer in use and the developers involved wanted
        them just gone entirely, allowing us to drop 19k lines from the tree.
      
        Other than the normal coding style cleanups here, there has been a lot
        of work on the vc04_services code, with the intent to finally get that
        out of staging hopefully soon. It's getting closer, which is nice to
        see.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'staging-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (98 commits)
        staging: pi433: Remove unused driver
        staging: vchiq_core: Add missing blank lines
        staging: vchiq_core: Drop unnecessary blank lines
        staging: vchiq_core: Add parentheses to VCHIQ_MSG_SRCPORT
        staging: vchiq_core: Use printk messages for devices
        staging: vchiq_arm: Drop unnecessary NULL check
        staging: vc04_services: Delete unnecessary NULL check
        staging: vc04_services: vchiq_arm: Fix NULL ptr dereferences
        Staging: rtl8192e: Rename variable DssCCk
        Staging: rtl8192e: Rename variable ExtHTCapInfo
        Staging: rtl8192e: Rename variable MPDUDensity
        Staging: rtl8192e: Rename variable MaxRxAMPDUFactor
        Staging: rtl8192e: Rename variable MaxAMSDUSize
        Staging: rtl8192e: Rename variable DelayBA
        Staging: rtl8192e: Rename variable RxSTBC
        Staging: rtl8192e: Rename variable TxSTBC
        Staging: rtl8192e: Rename variable GreenField
        Staging: rtl8192e: Rename variable ShortGI20Mhz
        Staging: rtl8192e: Rename variable ShortGI40Mhz
        Staging: rtl8192e: Rename variable MimoPwrSave
        ...
    • Merge tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · f6b8e86b
      Linus Torvalds authored
      Pull tty / serial updates from Greg KH:
       "Here is the big set of tty/serial driver changes for 6.10-rc1.
        Included in here are:
      
         - Usual good set of API cleanups and evolution by Jiri Slaby to make
           the serial interfaces move out of the 1990s by using kfifos
           instead of hand-rolling their own logic.
      
         - 8250_exar driver updates
      
         - max3100 driver updates
      
         - sc16is7xx driver updates
      
         - exar driver updates
      
         - sh-sci driver updates
      
         - tty ldisc API addition to help refuse bindings
      
         - other smaller serial driver updates
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
        serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
        serial: imx: Raise TX trigger level to 8
        serial: 8250_pnp: Simplify "line" related code
        serial: sh-sci: simplify locking when re-issuing RXDMA fails
        serial: sh-sci: let timeout timer only run when DMA is scheduled
        serial: sh-sci: describe locking requirements for invalidating RXDMA
        serial: sh-sci: protect invalidating RXDMA on shutdown
        tty: add the option to have a tty reject a new ldisc
        serial: core: Call device_set_awake_path() for console port
        dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
        tty: serial: uartps: Add support for uartps controller reset
        arm64: zynqmp: Add resets property for UART nodes
        dt-bindings: serial: cdns,uart: Add optional reset property
        serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
        serial: 8250_exar: Keep the includes sorted
        serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
        serial: 8250_exar: Use BIT() in exar_ee_read()
        serial: 8250_exar: Switch to use dev_err_probe()
        serial: 8250_exar: Return directly from switch-cases
        serial: 8250_exar: Decrease indentation level
        ...