1. 25 May, 2024 6 commits
    • Merge branch 'fix-bpf-multi-uprobe-pid-filtering-logic' · 590016ad
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      Fix BPF multi-uprobe PID filtering logic
      
      It turns out that the current implementation of multi-uprobe PID filtering logic
      is broken. It filters by thread, while the promise is filtering by process.
      Patch #1 fixes the logic trivially. The rest is testing and mitigations that
      are necessary for libbpf to not break users of USDT programs.
      
      v1->v2:
        - fix selftest in last patch (CI);
        - use semicolon in patch #3 (Jiri).
      ====================
      
      Link: https://lore.kernel.org/r/20240521163401.3005045-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: extend multi-uprobe tests with USDTs · 198034a8
      Andrii Nakryiko authored
      Validate libbpf's USDT-over-multi-uprobe logic by adding USDTs to
      existing multi-uprobe tests. This checks correct libbpf fallback to
      singular uprobes (when run on older kernels with buggy PID filtering).
      We reuse already established child process and child thread testing
      infrastructure, so additions are minimal. These tests fail on either
      older kernels or older versions of libbpf that don't detect PID
      filtering problems.
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240521163401.3005045-6-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: extend multi-uprobe tests with child thread case · 70342420
      Andrii Nakryiko authored
      Extend existing multi-uprobe tests to test that PID filtering works
      correctly. We already have child *process* tests, but we also need child
      *thread* tests. This patch adds a spawn_thread() helper to start a child
      thread, wait for it to be ready, and then instruct it to trigger the desired
      uprobes.
      
      Additionally, we extend BPF-side code to track thread ID, not just
      process ID. Also we detect whether extraneous triggerings with
      unexpected process IDs happened, and validate that none of that happened
      in practice.
      
      These changes prove that the fixed PID filtering logic for multi-uprobe
      works as expected. These tests fail on old kernels.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20240521163401.3005045-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: detect broken PID filtering logic for multi-uprobe · 04d939a2
      Andrii Nakryiko authored
      Libbpf automatically (and transparently to the user) detects
      multi-uprobe support in the kernel and, if supported, uses
      multi-uprobes to improve USDT attachment speed.
      
      USDTs can be attached system-wide or for a specific process by PID. In
      the latter case, we rely on correct kernel logic of not triggering USDT
      for unrelated processes.
      
      As such, on older kernels that do support multi-uprobes, but still have
      broken PID filtering logic, we need to fall back to singular uprobes.
      
      Unfortunately, whether the user is using PID filtering or not is only
      known at attachment time, which happens after the relevant BPF programs
      were loaded into the kernel. Also unfortunately, we need to make a call
      whether to use multi-uprobes or singular uprobes for SEC("usdt") programs
      during BPF object load time, at which point we have no information about
      possible PID filtering.
      
      The distinction between single and multi-uprobes is small, but important
      for the kernel. Multi-uprobes get the BPF_TRACE_UPROBE_MULTI attach type,
      and the kernel internally substitutes different implementations of some
      BPF helpers (e.g., bpf_get_attach_cookie()) depending on whether the uprobe
      is multi or singular. So, multi-uprobes and singular uprobes cannot be
      intermixed.
      
      All the above implies that we have to make an early and conservative
      call about the use of multi-uprobes. And so this patch modifies libbpf's
      existing feature detector for multi-uprobe support to also check correct
      PID filtering. If PID filtering is not yet fixed, we fall back to
      singular uprobes for USDTs.
      
      This extension to feature detection is simple thanks to kernel's -EINVAL
      addition for pid < 0.
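
The detection idea described above can be sketched as a small model. This is not libbpf's actual code: the function names below are hypothetical, and the two "kernel" functions only imitate the error codes named in this series (-EINVAL on fixed kernels, -ESRCH on unfixed ones) for a negative PID.

```python
import errno


def attach_errno_unfixed_kernel(pid: int) -> int:
    """Hypothetical model of an unfixed kernel: a negative PID is looked up and not found."""
    if pid < 0:
        return -errno.ESRCH
    return 0


def attach_errno_fixed_kernel(pid: int) -> int:
    """Hypothetical model of a fixed kernel: a negative PID is rejected early."""
    if pid < 0:
        return -errno.EINVAL
    return 0


def pid_filtering_is_fixed(attach) -> bool:
    """Probe with pid = -1; only a fixed kernel answers -EINVAL."""
    return attach(-1) == -errno.EINVAL


def choose_usdt_attach_mode(attach) -> str:
    """Make the early, conservative call: multi-uprobes only if PID filtering is fixed."""
    return "multi-uprobe" if pid_filtering_is_fixed(attach) else "singular uprobes"
```

Because the decision has to be made at object load time, the probe is deliberately independent of whether the user later asks for PID filtering.
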
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240521163401.3005045-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic · 4a8f635a
      Andrii Nakryiko authored
      get_pid_task() internally already calls rcu_read_lock() and
      rcu_read_unlock(), so there is no point in doing this one extra time.
      
      This is a drive-by improvement and has no correctness implications.
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240521163401.3005045-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix multi-uprobe PID filtering logic · 46ba0e49
      Andrii Nakryiko authored
      The current implementation of PID filtering logic for multi-uprobes in
      uprobe_prog_run() is filtering down to the exact *thread*, while the intent
      for PID filtering is to filter by *process* instead. The check in
      uprobe_prog_run() also differs from the analogous one in
      uprobe_multi_link_filter() for some reason. The latter is correct,
      checking task->mm, not the task itself.
      
      Fix the check in uprobe_prog_run() to perform the same task->mm check.
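
To make the thread-versus-process distinction concrete, here is a toy model. It is only a sketch: `Task`, `filter_by_thread` and `filter_by_process` are hypothetical names, and an integer `mm` stands in for the address space that all threads of one process share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    tid: int  # thread ID
    mm: int   # stand-in for the address space shared by all threads of a process


def filter_by_thread(current: Task, target: Task) -> bool:
    """Behavior described as broken: match only the exact task."""
    return current.tid == target.tid


def filter_by_process(current: Task, target: Task) -> bool:
    """Behavior described as the fix: match any task sharing the target's mm."""
    return current.mm == target.mm


leader = Task(tid=100, mm=1)   # task the filter was set up with
worker = Task(tid=101, mm=1)   # another thread of the same process
other = Task(tid=200, mm=2)    # a task from an unrelated process
```

With the thread-based check, a uprobe hit in `worker` is dropped even though it belongs to the filtered process; with the mm-based check it is accepted, while `other` is still rejected.
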
      
      While doing this, we also update the get_pid_task() call to use the
      PIDTYPE_TGID type of lookup, given the intent is to get a representative
      task of an entire process. This doesn't change behavior, but seems more
      logical. It would hold the task group leader task now, not any random
      thread task.
      
      Last but not least, given multi-uprobe support is half-broken due to
      this PID filtering logic (depending on whether PID filtering is
      important or not), we need to make it easy for user space consumers
      (including libbpf) to detect whether PID filtering logic was
      already fixed.
      
      We do it here by adding an early check on the passed pid parameter. If it's
      negative (and so has no chance of being a valid PID), we return -EINVAL.
      Previous behavior would eventually return -ESRCH ("No process found"),
      given there can't be any process with negative PID. This subtle change
      won't make any practical change in behavior, but will allow applications
      to detect PID filtering fixes easily. Libbpf fixes take advantage of
      this in the next patch.
      
      Cc: stable@vger.kernel.org
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Fixes: b733eead ("bpf: Add pid filter support for uprobe_multi link")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 24 May, 2024 1 commit
  3. 21 May, 2024 10 commits
    • MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT · 8d00547e
      Xu Kuohai authored
      I have been working on the ARM64 BPF JIT for a while, hence add myself
      as a reviewer.
      Signed-off-by: Xu Kuohai <xukuohai@huaweicloud.com>
      Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
      Link: https://lore.kernel.org/r/20240516020928.156125-1-xukuohai@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • openvswitch: Set the skbuff pkt_type for proper pmtud support. · 30a92c9e
      Aaron Conole authored
      Open vSwitch was originally intended to switch at layer 2, only dealing with
      Ethernet frames.  With the introduction of L3 tunnel support, it crossed
      into the realm of needing to care a bit about some routing details when
      making forwarding decisions.  If an oversized packet would need to be
      fragmented during this forwarding decision, there is a chance for pmtu
      to get involved and generate a routing exception.  This is gated by the
      skbuff->pkt_type field.
      
      When a flow is already loaded into the openvswitch module this field is
      set up and transitioned properly as a packet moves from one port to
      another.  In the case that a packet execute is invoked after a flow is
      newly installed this field is not properly initialized.  This causes the
      pmtud mechanism to omit sending the required exception messages across
      the tunnel boundary and a second attempt needs to be made to make sure
      that the routing exception is properly set up.  To fix this, we set the
      outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
      to the openvswitch module via a port device or packet command.
      
      Even for bridge ports as users, the pkt_type needs to be reset when
      doing the transmit as the packet is truly outgoing and routing needs
      to get involved post packet transformations, in the case of
      VXLAN/GENEVE/udp-tunnel packets.  In general, the pkt_type on output
      gets ignored, since we go straight to the driver, but in the case of
      tunnel ports they go through IP routing layer.
      
      This issue is periodically encountered in complex setups, such as large
      OpenShift deployments, where multiple sets of tunnel traversals occur.
      A way to recreate this is with the ovn-heater project, which can set up
      a networking environment that mimics such large deployments.  We need
      larger environments for this because we need to ensure that flow
      misses occur.  In these environments, without this patch, we can see:
      
        ./ovn_cluster.sh start
        podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
        podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
        podman exec ovn-chassis-1 ip netns exec sw01p1 \
               ping 21.0.0.3 -M do -s 1300 -c2
        PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
        From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)
      
        --- 21.0.0.3 ping statistics ---
        ...
      
      Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
      sent into the server.
      
      With this patch, setting the pkt_type, we see the following:
      
        podman exec ovn-chassis-1 ip netns exec sw01p1 \
               ping 21.0.0.3 -M do -s 1300 -c2
        PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
        From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
        ping: local error: message too long, mtu=1222
      
        --- 21.0.0.3 ping statistics ---
        ...
      
      In this case, the first ping request receives the FRAG_NEEDED message and
      a local routing exception is created.
      Tested-by: Jaime Caamano <jcaamano@redhat.com>
      Reported-at: https://issues.redhat.com/browse/FDP-164
      Fixes: 58264848 ("openvswitch: Add vxlan tunneling support.")
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20240516200941.16152-1-aconole@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'af_unix-fix-gc-and-improve-selftest' · 580acf6c
      Paolo Abeni authored
      Michal Luczaj says:
      
      ====================
      af_unix: Fix GC and improve selftest
      
      Series deals with AF_UNIX garbage collector mishandling some in-flight
      graph cycles. Embryos carrying OOB packets with SCM_RIGHTS cause issues.
      
      Patch 1/2 fixes the memory leak.
      Patch 2/2 tweaks the selftest for better OOB coverage.
      
      v3:
        - Patch 1/2: correct the commit message (Kuniyuki)
      
      v2: https://lore.kernel.org/netdev/20240516145457.1206847-1-mhal@rbox.co/
        - Patch 1/2: remove WARN_ON_ONCE() (Kuniyuki)
        - Combine both patches into a series (Kuniyuki)
      
      v1: https://lore.kernel.org/netdev/20240516103049.1132040-1-mhal@rbox.co/
      ====================
      
      Link: https://lore.kernel.org/r/20240517093138.1436323-1-mhal@rbox.co
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftest: af_unix: Make SCM_RIGHTS into OOB data. · e060e433
      Kuniyuki Iwashima authored
      scm_rights.c covers various test cases for inflight file descriptors
      and garbage collector for AF_UNIX sockets.
      
      Currently, SCM_RIGHTS messages are sent with a 3-byte string, which is
      not good for MSG_OOB cases, as the SCM_RIGHTS cmsg goes with the first
      2 bytes, which are non-OOB data.
      
      Let's send SCM_RIGHTS messages with a 1-byte character to pack SCM_RIGHTS
      into OOB data.
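
The byte split this message relies on can be modeled in a few lines. This is a toy illustration, assuming (as the message implies) that only the last byte of a MSG_OOB stream send is treated as OOB; `split_stream_oob` is a hypothetical helper and not part of the selftest.

```python
from typing import Tuple


def split_stream_oob(payload: bytes) -> Tuple[bytes, bytes]:
    """Return (in-band bytes, OOB byte) for a stream send flagged MSG_OOB."""
    if not payload:
        raise ValueError("payload must not be empty")
    # Everything except the final byte is ordinary in-band data.
    return payload[:-1], payload[-1:]
```

For a 3-byte payload the in-band part is 2 bytes long, which is where the cmsg ends up; for a 1-byte payload the in-band part is empty, so the cmsg has to ride with the OOB byte.
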
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS · 041933a1
      Michal Luczaj authored
      GC attempts to explicitly drop oob_skb's reference before purging the hit
      list.
      
      The problem is with embryos: kfree_skb(u->oob_skb) is never called on an
      embryo socket.
      
      The python script below [0] sends a listener's fd to its embryo as OOB
      data.  While GC does collect the embryo's queue, it fails to drop the OOB
      skb's refcount.  The skb which was in embryo's receive queue stays as
      unix_sk(sk)->oob_skb and keeps the listener's refcount [1].
      
      Tell GC to dispose of the embryo's oob_skb.
      
      [0]:
      from array import array
      from socket import *
      
      addr = '\x00unix-oob'
      lis = socket(AF_UNIX, SOCK_STREAM)
      lis.bind(addr)
      lis.listen(1)
      
      s = socket(AF_UNIX, SOCK_STREAM)
      s.connect(addr)
      scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()]))
      s.sendmsg([b'x'], [scm], MSG_OOB)
      lis.close()
      
      [1]
      $ grep unix-oob /proc/net/unix
      $ ./unix-oob.py
      $ grep unix-oob /proc/net/unix
      0000000000000000: 00000002 00000000 00000000 0001 02     0 @unix-oob
      0000000000000000: 00000002 00000000 00010000 0001 01  6072 @unix-oob
      
      Fixes: 4090fa37 ("af_unix: Replace garbage collection algorithm.")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). · 3ebc46ca
      Kuniyuki Iwashima authored
      In dctcp_update_alpha(), we use a module parameter dctcp_shift_g
      as follows:
      
        alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g);
        ...
        delivered_ce <<= (10 - dctcp_shift_g);
      
      It seems syzkaller started fuzzing module parameters and triggered
      shift-out-of-bounds [0] by setting dctcp_shift_g to 100:
      
        memcpy((void*)0x20000080,
               "/sys/module/tcp_dctcp/parameters/dctcp_shift_g\000", 47);
        res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul,
                      /*flags=*/2ul, /*mode=*/0ul);
        memcpy((void*)0x20000000, "100\000", 4);
        syscall(__NR_write, /*fd=*/r[0], /*val=*/0x20000000ul, /*len=*/4ul);
      
      Let's limit the max value of dctcp_shift_g by param_set_uint_minmax().
      
      With this patch:
      
        # echo 10 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
        # cat /sys/module/tcp_dctcp/parameters/dctcp_shift_g
        10
        # echo 11 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
        -bash: echo: write error: Invalid argument
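
The effect of the range check can be approximated in user space. The sketch below is not the kernel helper: `set_dctcp_shift_g` and `scale_delivered_ce` are hypothetical names, the upper bound of 10 is inferred from the session above (10 accepted, 11 rejected), and the lower bound of 0 is an assumption.

```python
import errno

DCTCP_SHIFT_G_MAX = 10  # inferred from the demo above: 10 is accepted, 11 is not


def set_dctcp_shift_g(value: str):
    """Parse and range-check a written value; return (status, accepted value or None)."""
    try:
        n = int(value, 0)
    except ValueError:
        return -errno.EINVAL, None
    if n < 0 or n > DCTCP_SHIFT_G_MAX:
        return -errno.EINVAL, None
    return 0, n


def scale_delivered_ce(delivered_ce: int, shift_g: int) -> int:
    """Mirror `delivered_ce <<= (10 - dctcp_shift_g)`; valid only for shift_g in 0..10."""
    return delivered_ce << (10 - shift_g)
```

Once the parameter is confined to 0..10, both shift counts used in dctcp_update_alpha() stay below the width of a 32-bit value, which is the property the UBSAN report shows being violated.
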
      
      [0]:
      UBSAN: shift-out-of-bounds in net/ipv4/tcp_dctcp.c:143:12
      shift exponent 100 is too large for 32-bit type 'u32' (aka 'unsigned int')
      CPU: 0 PID: 8083 Comm: syz-executor345 Not tainted 6.9.0-05151-g1b294a1f #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.13.0-1ubuntu1.1 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x201/0x300 lib/dump_stack.c:114
       ubsan_epilogue lib/ubsan.c:231 [inline]
       __ubsan_handle_shift_out_of_bounds+0x346/0x3a0 lib/ubsan.c:468
       dctcp_update_alpha+0x540/0x570 net/ipv4/tcp_dctcp.c:143
       tcp_in_ack_event net/ipv4/tcp_input.c:3802 [inline]
       tcp_ack+0x17b1/0x3bc0 net/ipv4/tcp_input.c:3948
       tcp_rcv_state_process+0x57a/0x2290 net/ipv4/tcp_input.c:6711
       tcp_v4_do_rcv+0x764/0xc40 net/ipv4/tcp_ipv4.c:1937
       sk_backlog_rcv include/net/sock.h:1106 [inline]
       __release_sock+0x20f/0x350 net/core/sock.c:2983
       release_sock+0x61/0x1f0 net/core/sock.c:3549
       mptcp_subflow_shutdown+0x3d0/0x620 net/mptcp/protocol.c:2907
       mptcp_check_send_data_fin+0x225/0x410 net/mptcp/protocol.c:2976
       __mptcp_close+0x238/0xad0 net/mptcp/protocol.c:3072
       mptcp_close+0x2a/0x1a0 net/mptcp/protocol.c:3127
       inet_release+0x190/0x1f0 net/ipv4/af_inet.c:437
       __sock_release net/socket.c:659 [inline]
       sock_close+0xc0/0x240 net/socket.c:1421
       __fput+0x41b/0x890 fs/file_table.c:422
       task_work_run+0x23b/0x300 kernel/task_work.c:180
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0x9c8/0x2540 kernel/exit.c:878
       do_group_exit+0x201/0x2b0 kernel/exit.c:1027
       __do_sys_exit_group kernel/exit.c:1038 [inline]
       __se_sys_exit_group kernel/exit.c:1036 [inline]
       __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x67/0x6f
      RIP: 0033:0x7f6c2b5005b6
      Code: Unable to access opcode bytes at 0x7f6c2b50058c.
      RSP: 002b:00007ffe883eb948 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 00007f6c2b5862f0 RCX: 00007f6c2b5005b6
      RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
      RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
      R10: 0000000000000006 R11: 0000000000000246 R12: 00007f6c2b5862f0
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Yue Sun <samsun1006219@gmail.com>
      Reported-by: xingwei lee <xrivendell7@gmail.com>
      Closes: https://lore.kernel.org/netdev/CAEkJfYNJM=cw-8x7_Vmj1J6uYVCWMbbvD=EFmDPVBGpTsqOxEA@mail.gmail.com/
      Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240517091626.32772-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/net: use tc rule to filter the na packet · ea63ac14
      Hangbin Liu authored
      Test arp_ndisc_untracked_subnets uses tcpdump to filter the unsolicited
      and untracked NA messages. It sets -e before calling tcpdump. But if
      tcpdump filters 0 packets, it returns non-zero and causes the script
      to exit.
      
      Instead of using slow tcpdump to capture packets, let's use a tc rule
      to filter out the NA messages.
      
      At the same time, fix function setup_v6 which only needs one parameter.
      Move all the related helpers from forwarding lib.sh to net lib.sh.
      
      Fixes: 0ea7b0a4 ("selftests: net: arp_ndisc_untracked_subnets: test for arp_accept and accept_untracked_na")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240517010327.2631319-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: sr: fix memleak in seg6_hmac_init_algo · efb9f4f1
      Hangbin Liu authored
      seg6_hmac_init_algo returns without cleaning up the previous allocations
      if one fails, so it's going to leak all that memory and the crypto tfms.
      
      Update seg6_hmac_exit to only free the memory when allocated, so we can
      reuse the code directly.
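
The cleanup pattern described here (make the teardown path tolerate partial initialization, then call it from the init error path) can be sketched generically. Everything below is illustrative: `HmacAlgoTable`, its methods and the algorithm names in the usage note are hypothetical and do not mirror the kernel structures.

```python
class HmacAlgoTable:
    """Toy resource table whose teardown is safe after a partial init."""

    def __init__(self, names):
        self.names = list(names)
        self.allocated = {}  # name -> resource, filled as init progresses

    def _alloc(self, name, fail_on):
        # Simulate an allocation that may fail for one chosen entry.
        if name == fail_on:
            raise MemoryError(f"allocation failed for {name}")
        return object()

    def exit(self):
        """Free only what was actually allocated; return the names that were freed."""
        freed = list(self.allocated)
        self.allocated.clear()
        return freed

    def init(self, fail_on=None):
        """Allocate every entry; on failure, reuse exit() instead of leaking."""
        try:
            for name in self.names:
                self.allocated[name] = self._alloc(name, fail_on)
        except MemoryError:
            self.exit()
            return False
        return True
```

For example, with `HmacAlgoTable(["sha1", "sha256"])`, a simulated failure on the second entry leaves nothing allocated, and calling `exit()` a second time is harmless.
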
      
      Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support")
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Closes: https://lore.kernel.org/netdev/Zj3bh-gE7eT6V6aH@hog/
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/20240517005435.2600277-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock. · 9841991a
      Kuniyuki Iwashima authored
      Billy Jheng Bing-Jhong reported a race between __unix_gc() and
      queue_oob().
      
      __unix_gc() tries to garbage-collect close()d inflight sockets,
      and then if the socket has MSG_OOB in unix_sk(sk)->oob_skb, GC
      will drop the reference and set it to NULL locklessly.
      
      However, the peer socket can still send a MSG_OOB message, and
      queue_oob() can update unix_sk(sk)->oob_skb concurrently, leading
      to a NULL pointer dereference. [0]
      
      To fix the issue, let's update unix_sk(sk)->oob_skb under the
      sk_receive_queue's lock and take it everywhere we touch oob_skb.
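
The locking rule stated above (every reader and writer of oob_skb takes the same lock) can be illustrated with a user-space model. This is only a sketch with hypothetical names (`UnixSockModel`, `run_race`); a `threading.Lock` stands in for the receive-queue lock, and the assumption that the sender path drops a previously queued OOB skb is part of the model, not a claim about the kernel code.

```python
import threading


class UnixSockModel:
    """Toy socket whose oob_skb is only touched under one lock."""

    def __init__(self):
        self.queue_lock = threading.Lock()  # models the sk_receive_queue lock
        self.oob_skb = None
        self.released = 0                   # how many skbs have been dropped

    def queue_oob(self, skb):
        """Sender path: publish a new OOB skb, dropping any previous one."""
        with self.queue_lock:
            old, self.oob_skb = self.oob_skb, skb
            if old is not None:
                self.released += 1

    def gc_drop_oob(self):
        """GC path: take the same lock before clearing oob_skb."""
        with self.queue_lock:
            old, self.oob_skb = self.oob_skb, None
            if old is not None:
                self.released += 1


def run_race(iterations=1000):
    """Run sender and GC concurrently; return (queued, released, remaining)."""
    sk = UnixSockModel()
    sender = threading.Thread(
        target=lambda: [sk.queue_oob(object()) for _ in range(iterations)])
    collector = threading.Thread(
        target=lambda: [sk.gc_drop_oob() for _ in range(iterations)])
    sender.start()
    collector.start()
    sender.join()
    collector.join()
    remaining = 0 if sk.oob_skb is None else 1
    return iterations, sk.released, remaining
```

Because each swap happens under the lock, every queued skb is accounted for exactly once: it is either released by one of the two paths or is the one still installed at the end.
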
      
      Note that we defer kfree_skb() in manage_oob() to silence a lockdep
      false positive (see [1]).
      
      [0]:
      BUG: kernel NULL pointer dereference, address: 0000000000000008
       PF: supervisor write access in kernel mode
       PF: error_code(0x0002) - not-present page
      PGD 8000000009f5e067 P4D 8000000009f5e067 PUD 9f5d067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP PTI
      CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc5-00191-gd091e579 #110
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: events delayed_fput
      RIP: 0010:skb_dequeue (./include/linux/skbuff.h:2386 ./include/linux/skbuff.h:2402 net/core/skbuff.c:3847)
      Code: 39 e3 74 3e 8b 43 10 48 89 ef 83 e8 01 89 43 10 49 8b 44 24 08 49 c7 44 24 08 00 00 00 00 49 8b 14 24 49 c7 04 24 00 00 00 00 <48> 89 42 08 48 89 10 e8 e7 c5 42 00 4c 89 e0 5b 5d 41 5c c3 cc cc
      RSP: 0018:ffffc900001bfd48 EFLAGS: 00000002
      RAX: 0000000000000000 RBX: ffff8880088f5ae8 RCX: 00000000361289f9
      RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff8880088f5b00
      RBP: ffff8880088f5b00 R08: 0000000000080000 R09: 0000000000000001
      R10: 0000000000000003 R11: 0000000000000001 R12: ffff8880056b6a00
      R13: ffff8880088f5280 R14: 0000000000000001 R15: ffff8880088f5a80
      FS:  0000000000000000(0000) GS:ffff88807dd80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 0000000006314000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <TASK>
       unix_release_sock (net/unix/af_unix.c:654)
       unix_release (net/unix/af_unix.c:1050)
       __sock_release (net/socket.c:660)
       sock_close (net/socket.c:1423)
       __fput (fs/file_table.c:423)
       delayed_fput (fs/file_table.c:444 (discriminator 3))
       process_one_work (kernel/workqueue.c:3259)
       worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
       kthread (kernel/kthread.c:388)
       ret_from_fork (arch/x86/kernel/process.c:153)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
       </TASK>
      Modules linked in:
      CR2: 0000000000000008
      
      Link: https://lore.kernel.org/netdev/a00d3993-c461-43f2-be6d-07259c98509a@rbox.co/ [1]
      Fixes: 1279f9d9 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
      Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240516134835.8332-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already" · eabb8a9b
      Heiner Kallweit authored
      This reverts commit 7274c414.
      
      Ken reported that RTL8125b can lock up if gro_flush_timeout has the
      default value of 20000 and napi_defer_hard_irqs is set to 0.
      In this scenario device interrupts aren't disabled, which seems to
      trigger some silicon bug under heavy load. I was able to reproduce this
      behavior on RTL8168h. Fix this by reverting 7274c414.
      
      Fixes: 7274c414 ("r8169: don't try to disable interrupts if NAPI is scheduled already")
      Cc: stable@vger.kernel.org
      Reported-by: Ken Milmore <ken.milmore@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/9b5b6f4c-4f54-4b90-b0b3-8d8023c2e780@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 20 May, 2024 4 commits
  5. 18 May, 2024 10 commits
    • kprobe/ftrace: fix build error due to bad function definition · 4b377b48
      Linus Torvalds authored
      Commit 1a7d0890 ("kprobe/ftrace: bail out if ftrace was killed")
      introduced a bad K&R function definition, which we haven't accepted in a
      long long time.
      
      Gcc seems to let it slide, but clang notices with the appropriate error:
      
        kernel/kprobes.c:1140:24: error: a function declaration without a prototype is deprecated in all >
         1140 | void kprobe_ftrace_kill()
              |                        ^
              |                         void
      
      but this commit was apparently never in linux-next before it was sent
      upstream, so it didn't get the appropriate build test coverage.
      
      Fixes: 1a7d0890 kprobe/ftrace: bail out if ftrace was killed
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f08a1e91
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - virtio_net: fix missed error path rtnl_unlock after control queue
           locking rework
      
        Current release - new code bugs:
      
         - bpf: fix KASAN slab-out-of-bounds in percpu_array_map_gen_lookup,
           caused by missing nested map handling
      
         - drv: dsa: correct initialization order for KSZ88x3 ports
      
        Previous releases - regressions:
      
         - af_packet: do not call packet_read_pending() from
           tpacket_destruct_skb() fix performance regression
      
         - ipv6: fix route deleting failure when metric equals 0, don't assume
           0 means not set / default in this case
      
        Previous releases - always broken:
      
         - bridge: couple of syzbot-driven fixes"
      
      * tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
        selftests: net: local_termination: annotate the expected failures
        net: dsa: microchip: Correct initialization order for KSZ88x3 ports
        MAINTAINERS: net: Update reviewers for TI's Ethernet drivers
        dt-bindings: net: ti: Update maintainers list
        l2tp: fix ICMP error handling for UDP-encap sockets
        net: txgbe: fix to control VLAN strip
        net: wangxun: match VLAN CTAG and STAG features
        net: wangxun: fix to change Rx features
        af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
        virtio_net: Fix missed rtnl_unlock
        netrom: fix possible dead-lock in nr_rt_ioctl()
        idpf: don't skip over ethtool tcp-data-split setting
        dt-bindings: net: qcom: ethernet: Allow dma-coherent
        bonding: fix oops during rmmod
        net/ipv6: Fix route deleting failure when metric equals 0
        selftests/net: reduce xfrm_policy test time
        selftests/bpf: Adjust btf_dump test to reflect recent change in file_operations
        selftests/bpf: Adjust test_access_variable_array after a kernel function name change
        selftests/net/lib: no need to record ns name if it already exist
        net: qrtr: ns: Fix module refcnt
        ...
    • Merge tag 'trace-tools-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 26aa834f
      Linus Torvalds authored
      Pull tracing tool updates from Steven Rostedt:
       "Specific for timerlat:
      
         - Improve the output of timerlat top by adding a missing \n, and by
           avoiding printing color-formatting characters where they are
           translated to regular characters.
      
         - Improve timerlat auto-analysis output by replacing '\t' with spaces
           to avoid copy-and-paste issues when reporting problems.
      
         - Make the user-space (-u) option the default, as it is the most
           complete test. Add a -k option to use the in-kernel workload.
      
         - On timerlat top and hist, add a summary with the overall results.
           For instance, the minimum value for all CPUs, the overall average
           and the maximum value from all CPUs.
      
         - timerlat hist was printing initial values (i.e., 0 as max, and ~0
           as min) if the trace stopped before the first Ret-User event. This
           problem was fixed by printing the " - " no value string to the
           output if that was the case.
      
        For all RTLA tools:
      
         - Add a --warm-up <seconds> option, allowing the workload to run for
           <seconds> before starting to collect results.
      
         - Add a --trace-buffer-size option, allowing the user to set the
           tracing buffer size for -t option. This option is mainly useful for
           reducing the trace file. Now rtla depends on libtracefs >= 1.6.
      
         - Fix the -t [trace_file] parsing so that it no longer requires the
           '=' before the option parameter and better handles the multiple
           ways a user can pass the trace_file.txt"
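The overall summary described for timerlat top and hist (minimum across CPUs, overall average, maximum across CPUs) can be sketched in Python as follows. This is only an illustration, not rtla's code: the `summarize` name and the `(count, min, avg, max)` tuple layout are hypothetical, and weighting the overall average by each CPU's sample count is an assumption.

```python
def summarize(per_cpu):
    """per_cpu maps a CPU id to (count, min, avg, max) latency stats."""
    # CPUs that recorded no samples carry no usable min/avg/max, so skip them.
    active = [stats for stats in per_cpu.values() if stats[0] > 0]
    if not active:
        return None  # nothing was measured on any CPU

    total = sum(count for count, _, _, _ in active)
    return {
        "count": total,
        "min": min(lo for _, lo, _, _ in active),
        # Assumption: the overall average is weighted by per-CPU sample count.
        "avg": sum(count * avg for count, _, avg, _ in active) / total,
        "max": max(hi for _, _, _, hi in active),
    }
```

Skipping CPUs with a zero count mirrors the concern in the list above about printing initial values when no event was recorded.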
      
      * tag 'trace-tools-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rtla: Documentation: Fix -t, --trace
        rtla: Fix -t\--trace[=file]
        rtla/timerlat: Fix histogram report when a cpu count is 0
        rtla: Add --trace-buffer-size option
        rtla/timerlat: Make user-space threads the default
        rtla: Add the --warm-up option
        rtla/timerlat: Add a summary for hist mode
        rtla/timerlat: Add a summary for top mode
        rtla/timerlat: Use pretty formatting only on interactive tty
        rtla/auto-analysis: Replace \t with spaces
        rtla/timerlat: Simplify "no value" printing on top
      26aa834f
    • Linus Torvalds's avatar
      Merge tag 'trace-user-events-v6.10' of... · fa3889d9
      Linus Torvalds authored
      Merge tag 'trace-user-events-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing user-event updates from Steven Rostedt:
      
       - Minor update to the user_events interface
      
        The ABI of creating a user event states that the fields are separated
        by semicolons, and spaces should be ignored.
      
        But the parsing expected at least one space to be there (which was
        incorrect). Fix the reading of the string to handle fields separated
        by semicolons but no space between them.
      
        This does extend the API slightly, as "field;field" will now be
        parsed and not cause an error. But it should not cause any regressions
        as no logic should expect it to fail.
      
        Note that the logic that parses the event fields to create the
        trace_event already works with no spaces after the semicolon. It is
        the logic that tests against existing events that is inconsistent.
        As a result, registering an event without spaces succeeds if the
        event doesn't exist yet, but a second, identical call that registers
        the same event, again without spaces, fails.
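The rule stated by the ABI (fields separated by semicolons, spaces ignored) can be illustrated with a small Python sketch. This is a toy model of the matching rule only, not the kernel parser; `split_fields` and `same_event` are hypothetical names.

```python
def split_fields(decl: str) -> list[str]:
    """Split a field list on ';', ignoring whitespace around each field."""
    fields = []
    for raw in decl.split(";"):
        # Collapse runs of whitespace so "u32   a" compares equal to "u32 a".
        field = " ".join(raw.split())
        if field:
            fields.append(field)
    return fields


def same_event(a: str, b: str) -> bool:
    """Two declarations match when their normalized field lists are equal."""
    return split_fields(a) == split_fields(b)
```

Under this rule, `"u32 a; u32 b"` and `"u32 a;u32 b"` describe the same event, which is the behavior the fix is meant to guarantee for both creation and matching.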
      
      * tag 'trace-user-events-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        selftests/user_events: Add non-spacing separator check
        tracing/user_events: Fix non-spaced field matching
      fa3889d9
    • Linus Torvalds's avatar
      Merge tag 'trace-ringbuffer-v6.10' of... · 53683e40
      Linus Torvalds authored
      Merge tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing ring buffer updates from Steven Rostedt:
       "Add ring_buffer memory mappings.
      
        The tracing ring buffer was created based on being mostly used with
        the splice system call. It is broken up into page ordered sub-buffers
        and the reader swaps a new sub-buffer with an existing sub-buffer
        that's part of the write buffer. It then has total access to the
        swapped out sub-buffer and can do copyless movements of the memory
        into other mediums (file system, network, etc).
      
        The buffer is great for passing around the ring buffer contents in the
        kernel, but is not so good for when the consumer is the user space
        task itself.
      
        A new interface is added that allows user space to memory map the ring
        buffer. It will get all the write sub-buffers as well as the reader
        sub-buffer (that is not written to). It can send an ioctl to change
        which sub-buffer is the new reader sub-buffer.
      
        The ring buffer is read only to user space. It only needs to call the
        ioctl when it is finished with a sub-buffer and needs a new sub-buffer
        that the writer will not write over.
      
        A self test program was also created for testing and can be used as an
        example for the interface to user space. The libtracefs (external to
        the kernel) also has code that interacts with this, although it is
        disabled until the interface is in an official release. It can be
        enabled by compiling the library with a special flag. This was used
        for testing applications that perform better with the buffer being
        mapped.
      
        Memory mapped buffers have limitations. The main one is that they
        cannot be used with the snapshot logic. If the buffer is mapped,
        snapshots will be disabled. If any logic is set to trigger snapshots
        on a buffer, that buffer will not be allowed to be mapped"
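The reader/writer split described above, in which the reader owns one sub-buffer the writer never touches and asks for a new one when it is done, can be pictured with a toy Python model. This is a sketch under simplifying assumptions and not the kernel interface: there is no mmap or ioctl here, and `ToyRingBuffer`, `write`, and `get_reader` are hypothetical names.

```python
from collections import deque


class ToyRingBuffer:
    """Toy model: a bounded set of write sub-buffers plus one reader sub-buffer."""

    def __init__(self, nr_subbufs: int, subbuf_size: int):
        self.subbuf_size = subbuf_size
        # Oldest sub-buffer first. With maxlen set, appending to a full
        # deque drops the oldest entry, which models overwrite mode.
        self.write_subbufs = deque([[]], maxlen=nr_subbufs)
        self.reader = []  # owned by the reader; the writer never touches it

    def write(self, event) -> None:
        if len(self.write_subbufs[-1]) == self.subbuf_size:
            # Current sub-buffer is full: move on to a fresh one.
            self.write_subbufs.append([])
        self.write_subbufs[-1].append(event)

    def get_reader(self) -> list:
        # Stand-in for the "give me a new reader sub-buffer" request:
        # take the oldest write sub-buffer out of the writer's reach.
        self.reader = self.write_subbufs.popleft()
        if not self.write_subbufs:
            # The writer always needs somewhere to write.
            self.write_subbufs.append([])
        return self.reader
```

For example, with two sub-buffers of two events each, writing events 1 through 6 overwrites the oldest sub-buffer, so the first `get_reader()` call hands back `[3, 4]`.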
      
      * tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Add cast to unsigned long addr passed to virt_to_page()
        ring-buffer: Have mmapped ring buffer keep track of missed events
        ring-buffer/selftest: Add ring-buffer mapping test
        Documentation: tracing: Add ring-buffer mapping
        tracing: Allow user-space mapping of the ring-buffer
        ring-buffer: Introducing ring-buffer mapping functions
        ring-buffer: Allocate sub-buffers with __GFP_COMP
      53683e40
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 594d2815
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
      
       - Remove unused ftrace_direct_funcs variables
      
       - Fix a possible NULL pointer dereference race in eventfs
      
       - Update do_div() usage in trace event benchmark test
      
       - Speed up direct function registration with asynchronous RCU callback.
      
         The synchronization was done in the registration code and this caused
         delays when registering direct callbacks. Move the freeing to a
         call_rcu() callback so that registration is no longer delayed.
      
       - Replace simple_strtoul() usage with kstrtoul()
      
      * tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Fix a possible null pointer dereference in eventfs_find_events()
        ftrace: Fix possible use-after-free issue in ftrace_location()
        ftrace: Remove unused global 'ftrace_direct_func_count'
        ftrace: Remove unused list 'ftrace_direct_funcs'
        tracing: Improve benchmark test performance by using do_div()
        ftrace: Use asynchronous grace period for register_ftrace_direct()
        ftrace: Replaces simple_strtoul in ftrace
      594d2815
    • Linus Torvalds's avatar
      Merge tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 70a66320
      Linus Torvalds authored
      Pull probes updates from Masami Hiramatsu:
      
       - tracing/probes: Add new pseudo-types %pd and %pD support for dumping
         dentry name from 'struct dentry *' and file name from 'struct file *'
      
       - uprobes performance optimizations:
          - Speed up the BPF uprobe event by delaying the fetching of the
            uprobe event arguments that are not used in BPF
          - Avoid locking by speculatively checking whether uprobe event is
            valid
          - Reduce lock contention by using read/write_lock instead of
            spinlock for uprobe list operations. This improved the BPF
            uprobe benchmark results by 43% on average
      
       - rethook: Remove non-fatal warning messages when tracing stack from
         BPF and skip rcu_is_watching() validation in rethook if possible
      
       - objpool: Optimize objpool (which is used by kretprobes and fprobe as
         rethook backend storage) by inlining functions and avoiding caching
         nr_cpu_ids because it is a const value
      
       - fprobe: Add entry/exit callbacks types (code cleanup)
      
       - kprobes: Check ftrace was killed in kprobes if it uses ftrace
      
      * tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        kprobe/ftrace: bail out if ftrace was killed
        selftests/ftrace: Fix required features for VFS type test case
        objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
        objpool: enable inlining objpool_push() and objpool_pop() operations
        rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
        ftrace: make extra rcu_is_watching() validation check optional
        uprobes: reduce contention on uprobes_tree access
        rethook: Remove warning messages printed for finding return address of a frame.
        fprobe: Add entry/exit callbacks types
        selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
        selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
        Documentation: tracing: add new type '%pd' and '%pD' for kprobe
        tracing/probes: support '%pD' type for print struct file's name
        tracing/probes: support '%pd' type for print struct dentry's name
        uprobes: add speculative lockless system-wide uprobe filter check
        uprobes: prepare uprobe args buffer lazily
        uprobes: encapsulate preparation of uprobe args buffer
      70a66320
    • Linus Torvalds's avatar
      Merge tag 'bootconfig-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e9d68251
      Linus Torvalds authored
      Pull bootconfig updates from Masami Hiramatsu:
      
       - Do not put unneeded quotes on the extra command line items that were
         inserted from the bootconfig.
      
       - Remove redundant spaces from the extra command line.
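The two bootconfig changes above can be pictured with a short Python sketch. It is illustrative only and not the kernel code: the function names are hypothetical, and treating "contains whitespace" as the condition that makes quotes necessary is an assumption.

```python
def render_cmdline_item(key: str, value: str) -> str:
    """Render key=value, quoting the value only when it contains whitespace."""
    # Assumption: whitespace inside the value is what makes quotes necessary.
    if any(ch.isspace() for ch in value):
        return f'{key}="{value}"'
    return f"{key}={value}"


def render_extra_cmdline(items) -> str:
    """Join items with exactly one space and no leading or trailing space."""
    return " ".join(render_cmdline_item(k, v) for k, v in items)
```

Joining with a single separator, rather than appending a space after every item, is one simple way to avoid the redundant spaces mentioned in the second bullet.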
      
      * tag 'bootconfig-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        init/main.c: Minor cleanup for the setup_command_line() function
        init/main.c: Remove redundant space from saved_command_line
        bootconfig: do not put quotes on cmdline items unless necessary
      e9d68251
    • Linus Torvalds's avatar
      Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl · 91b6163b
      Linus Torvalds authored
      Pull sysctl updates from Joel Granados:
      
       - Remove sentinel elements from ctl_table structs in kernel/*
      
         Removing sentinels in ctl_table arrays reduces the build time size
         and runtime memory consumed by ~64 bytes per array. Removals for
         net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline
         through their respective subsystems making the next release the most
         likely place where the final series that removes the check for
         proc_name == NULL will land.
      
         This adds to removals already in arch/, drivers/ and fs/.
      
       - Adjust ctl_table definitions and references to allow constification
           - Remove unused ctl_table function arguments
           - Move non-const elements from ctl_table to ctl_table_header
           - Make ctl_table pointers const in ctl_table_root structure
      
         Making the static ctl_table structs const will increase safety by
         keeping the pointers to proc_handler functions in .rodata. Though no
         ctl_tables were made const in this PR, the groundwork for making
         that possible has started with these changes sent by Thomas
         Weißschuh.
      
      * tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
        sysctl: drop now unnecessary out-of-bounds check
        sysctl: move sysctl type to ctl_table_header
        sysctl: drop sysctl_is_perm_empty_ctl_table
        sysctl: treewide: constify argument ctl_table_root::permissions(table)
        sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)
        bpf: Remove the now superfluous sentinel elements from ctl_table array
        delayacct: Remove the now superfluous sentinel elements from ctl_table array
        kprobes: Remove the now superfluous sentinel elements from ctl_table array
        printk: Remove the now superfluous sentinel elements from ctl_table array
        scheduler: Remove the now superfluous sentinel elements from ctl_table array
        seccomp: Remove the now superfluous sentinel elements from ctl_table array
        timekeeping: Remove the now superfluous sentinel elements from ctl_table array
        ftrace: Remove the now superfluous sentinel elements from ctl_table array
        umh: Remove the now superfluous sentinel elements from ctl_table array
        kernel misc: Remove the now superfluous sentinel elements from ctl_table array
      91b6163b
    • Linus Torvalds's avatar
      Merge tag 'devicetree-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 06f054b1
      Linus Torvalds authored
      Pull devicetree updates from Rob Herring:
       "DT Bindings:
      
         - Convert samsung,exynos5-dp, atmel,lcdc, aspeed,ast2400-wdt bindings
           to schemas
      
         - Add bindings for Allwinner H616 NMI controller, Renesas r8a779g0
           irqc, Renesas R-Car V4M TMU and CMT timers, Freescale S32G3
           linflexuart, and Mediatek MT7988 XHCI
      
         - Add 'reg' constraints on DSI and SPI display panels
      
         - More dropping of unnecessary quotes in schemas
      
         - Use full paths rather than relative paths in schema $refs
      
         - Drop redundant storing of phandle for reserved memory
      
        DT Core:
      
         - Use scope based cleanups for kfree() and of_node_put()
      
         - Track interrupt-map and power-supplies for fw_devlink
      
         - Add buffer overflow check in of_modalias()
      
         - Add and use __of_prop_free() helper for freeing struct property"
      
      * tag 'devicetree-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (25 commits)
        of: property: Add fw_devlink support for interrupt-map property
        dt-bindings: display: panel: constrain 'reg' in DSI panels
        dt-bindings: display: panel: constrain 'reg' in SPI panels
        dt-bindings: display: samsung,ams495qa01: add missing SPI properties ref
        dt-bindings: Use full path to other schemas
        dt-bindings: PCI: qcom,pcie-sm8350: Drop redundant 'oneOf' sub-schema
        of: module: add buffer overflow check in of_modalias()
        dt-bindings: PCI: microchip: increase number of items in ranges property
        dt-bindings: Drop unnecessary quotes on keys
        dt-bindings: interrupt-controller: mediatek,mt6577-sysirq: Drop unnecessary quotes
        of: property: Use scope based cleanup on port_node
        of: reserved_mem: Remove the use of phandle from the reserved_mem APIs
        of: property: fw_devlink: Add support for "power-supplies" binding
        dt-bindings: watchdog: aspeed,ast2400-wdt: Convert to DT schema
        dt-bindings: irq: sun7i-nmi: Add binding for the H616 NMI controller
        dt-bindings: interrupt-controller: renesas,irqc: Add r8a779g0 support
        dt-bindings: timer: renesas,tmu: Add R-Car V4M support
        dt-bindings: timer: renesas,cmt: Add R-Car V4M support
        of: Use scope based of_node_put() cleanups
        of: Use scope based kfree() cleanups
        ...
      06f054b1
  6. 17 May, 2024 9 commits
    • Jakub Kicinski's avatar
      selftests: net: local_termination: annotate the expected failures · fe56d6e4
      Jakub Kicinski authored
      Vladimir said when adding this test:
      
        The bridge driver fares particularly badly [...] mainly because
        it does not implement IFF_UNICAST_FLT.
      
      See commit 90b9566a ("selftests: forwarding: add a test for
      local_termination.sh").
      
      We don't want to hide the known gaps, but having a test which
      always fails prevents us from catching regressions. Report
      the cases we know may fail as XFAIL.
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20240516152513.1115270-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fe56d6e4
    • Oleksij Rempel's avatar
      net: dsa: microchip: Correct initialization order for KSZ88x3 ports · f0fa8411
      Oleksij Rempel authored
      Adjust the initialization sequence of KSZ88x3 switches to enable
      802.1p priority control on Port 2 before configuring Port 1. This
      change ensures the apptrust functionality on Port 1 operates
      correctly, as it depends on the priority settings of Port 2. The
      prior initialization sequence incorrectly configured Port 1 first,
      which could lead to functional discrepancies.
      
      Fixes: a1ea5771 ("net: dsa: microchip: dcb: add special handling for KSZ88X3 family")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
      Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Link: https://lore.kernel.org/r/20240517050121.2174412-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f0fa8411
    • Ravi Gunasekaran's avatar
      dt-bindings: net: ti: Update maintainers list · ce08eeb5
      Ravi Gunasekaran authored
      Update the list with the current maintainers of TI's CPSW ethernet
      peripheral.
      Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
      Acked-by: Conor Dooley <conor.dooley@microchip.com>
      Acked-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/20240516054932.27597-1-r-gunasekaran@ti.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ce08eeb5
    • Tom Parkin's avatar
      l2tp: fix ICMP error handling for UDP-encap sockets · 6e828dc6
      Tom Parkin authored
      Since commit a36e185e
      ("udp: Handle ICMP errors for tunnels with same destination port on both endpoints")
      UDP's handling of ICMP errors has allowed for UDP-encap tunnels to
      determine socket associations in scenarios where the UDP hash lookup
      could not.
      
      Subsequently, commit d26796ae
      ("udp: check udp sock encap_type in __udp_lib_err")
      subtly tweaked the approach such that UDP ICMP error handling would be
      skipped for any UDP socket which has encapsulation enabled.
      
      In the case of L2TP tunnel sockets using UDP-encap, this latter
      modification effectively broke ICMP error reporting for the L2TP
      control plane.
      
      To a degree this isn't catastrophic inasmuch as the L2TP control
      protocol defines a reliable transport on top of the underlying packet
      switching network which will eventually detect errors and time out.
      
      However, paying attention to the ICMP error reporting allows for more
      timely detection of errors in L2TP userspace, and aids in debugging
      connectivity issues.
      
      Reinstate ICMP error handling for UDP encap L2TP tunnels:
      
       * implement struct udp_tunnel_sock_cfg .encap_err_rcv in order to allow
         the L2TP code to handle ICMP errors;
      
       * only implement error-handling for tunnels which have a managed
         socket: unmanaged tunnels using a kernel socket have no userspace to
         report errors back to;
      
       * flag the error on the socket, which allows for userspace to get an
         error such as -ECONNREFUSED back from sendmsg/recvmsg;
      
       * pass the error into ip[v6]_icmp_error() which allows for userspace to
         get extended error information via MSG_ERRQUEUE.
      
      Fixes: d26796ae ("udp: check udp sock encap_type in __udp_lib_err")
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Link: https://lore.kernel.org/r/20240513172248.623261-1-tparkin@katalix.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6e828dc6
    • Linus Torvalds's avatar
      Merge tag 'parisc-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 7ee332c9
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
      
       -  define sigset_t in parisc uapi header to fix build of util-linux
      
       -  define HAVE_ARCH_HUGETLB_UNMAPPED_AREA to avoid compiler warning
      
       -  drop unused 'exc_reg' struct in math-emu code
      
      * tag 'parisc-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
        parisc/math-emu: Remove unused struct 'exc_reg'
        parisc: Define sigset_t in parisc uapi header
      7ee332c9
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ff2632d7
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
      
       - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
      
       - Allow per-process DEXCR (Dynamic Execution Control Register) settings
         via prctl, notably NPHIE which controls hashst/hashchk for ROP
         protection.
      
       - Install powerpc selftests in sub-directories. Note this changes the
         way run_kselftest.sh needs to be invoked for powerpc selftests.
      
       - Change fadump (Firmware Assisted Dump) to better handle memory
         add/remove.
      
       - Add support for passing additional parameters to the fadump kernel.
      
       - Add support for updating the kdump image on CPU/memory add/remove
         events.
      
       - Other small features, cleanups and fixes.
      
      Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
      Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
      Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
      Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
      Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
      Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
      Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
      Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
      Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
      Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
      Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
      
      * tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
        powerpc/fadump: Fix section mismatch warning
        powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
        powerpc/fadump: update documentation about bootargs_append
        powerpc/fadump: pass additional parameters when fadump is active
        powerpc/fadump: setup additional parameters for dump capture kernel
        powerpc/pseries/fadump: add support for multiple boot memory regions
        selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
        KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
        KVM: PPC: Fix documentation for ppc mmu caps
        KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
        KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
        powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
        powerpc/code-patching: Use dedicated memory routines for patching
        powerpc/code-patching: Test patch_instructions() during boot
        powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
        powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
        powerpc: Fix typos
        powerpc/eeh: Fix spelling of the word "auxillary" and update comment
        macintosh/ams: Fix unused variable warning
        powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
        ...
      ff2632d7
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux · 4853f1f6
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - Updates to AMBA bus subsystem to drop .owner struct device_driver
         initialisations, moving that to code instead.
      
       - Add LPAE privileged-access-never support
      
       - Add support for Clang CFI
      
       - clkdev: report over-sized device or connection strings
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: (36 commits)
        ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y
        clkdev: report over-sized strings when creating clkdev entries
        ARM: 9393/1: mm: Use conditionals for CFI branches
        ARM: 9392/2: Support CLANG CFI
        ARM: 9391/2: hw_breakpoint: Handle CFI breakpoints
        ARM: 9390/2: lib: Annotate loop delay instructions for CFI
        ARM: 9389/2: mm: Define prototypes for all per-processor calls
        ARM: 9388/2: mm: Type-annotate all per-processor assembly routines
        ARM: 9387/2: mm: Rewrite cacheflush vtables in CFI safe C
        ARM: 9386/2: mm: Use symbol alias for cache functions
        ARM: 9385/2: mm: Type-annotate all cache assembly routines
        ARM: 9384/2: mm: Make tlbflush routines CFI safe
        ARM: 9382/1: ftrace: Define ftrace_stub_graph
        ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement
        ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN
        ARM: 9356/2: Move asm statements accessing TTBCR into C functions
        ARM: 9355/2: Add TTBCR_* definitions to pgtable-3level-hwdef.h
        ARM: 9379/1: coresight: tpda: drop owner assignment
        ARM: 9378/1: coresight: etm4x: drop owner assignment
        ARM: 9377/1: hwrng: nomadik: drop owner assignment
        ...
      4853f1f6
    • David S. Miller's avatar
      Merge branch 'wangxun-fixes' · f6f25eeb
      David S. Miller authored
      Jiawen Wu says:
      
      ====================
      Wangxun fixes
      
      Fixed some bugs when using ethtool to operate network devices.
      
      v4 -> v5:
      - Simplify if...else... to fix features.
      
      v3 -> v4:
      - Require both ctag and stag to be enabled or disabled.
      
      v2 -> v3:
      - Drop the first patch.
      
      v1 -> v2:
      - Factor out the same code.
      - Remove statistics printing with more than 64 queues.
      - Detail the commit logs to describe issues.
      - Remove reset flag check in wx_update_stats().
      - Change to set VLAN CTAG and STAG to be consistent.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6f25eeb