1. 21 Aug, 2023 3 commits
  2. 18 Aug, 2023 25 commits
  3. 17 Aug, 2023 2 commits
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · f54a2a13
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2023-08-16
      
      We've added 17 non-merge commits during the last 6 day(s) which contain
      a total of 20 files changed, 1179 insertions(+), 37 deletions(-).
      
      The main changes are:
      
      1) Add a BPF hook in sys_socket() to change the protocol ID
         from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
         applications, from Geliang Tang.
      
      2) Follow-up/fallout fix from the SO_REUSEPORT + bpf_sk_assign work
         to fix a splat on non-fullsock sks in inet[6]_steal_sock,
         from Lorenz Bauer.
      
      3) Improvements to struct_ops links to avoid forcing presence of
         update/validate callbacks. Also add bpf_struct_ops fields documentation,
         from David Vernet.
      
      4) Ensure libbpf sets close-on-exec flag on gzopen, from Marco Vedovati.
      
      5) Several new tcx selftest additions and bpftool link show support for
         tcx and xdp links, from Daniel Borkmann.
      
      6) Fix a smatch warning on uninitialized symbol in
         bpf_perf_link_fill_kprobe, from Yafang Shao.
      
      7) BPF selftest fixes e.g. misplaced break in kfunc_call test,
         from Yipeng Zou.
      
      8) Small cleanup to remove unused declaration bpf_link_new_file,
         from Yue Haibing.
      
      9) Small typo fix to bpftool's perf help message, from Daniel T. Lee.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
        selftests/bpf: Add mptcpify test
        selftests/bpf: Fix error checks of mptcp open_and_load
        selftests/bpf: Add two mptcp netns helpers
        bpf: Add update_socket_protocol hook
        bpftool: Implement link show support for xdp
        bpftool: Implement link show support for tcx
        selftests/bpf: Add selftest for fill_link_info
        bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe()
        net: Fix slab-out-of-bounds in inet[6]_steal_sock
        bpf: Document struct bpf_struct_ops fields
        bpf: Support default .validate() and .update() behavior for struct_ops links
        selftests/bpf: Add various more tcx test cases
        selftests/bpf: Clean up fmod_ret in bench_rename test script
        selftests/bpf: Fix repeat option when kfunc_call verification fails
        libbpf: Set close-on-exec flag on gzopen
        bpftool: fix perf help message
        bpf: Remove unused declaration bpf_link_new_file()
      ====================
      
      Link: https://lore.kernel.org/r/20230816212840.1539-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f54a2a13
    • Revert "net: ethernet: ti: am65-cpsw: add mqprio qdisc offload in channel mode" · 42b118c9
      Jakub Kicinski authored
      This reverts commit 90bc21aa.
      
      Patch was merged too hastily, Vladimir requested changes in:
      https://lore.kernel.org/all/20230816121305.5dio5tk3chge2ndh@skbuf/
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      42b118c9
  4. 16 Aug, 2023 10 commits
    • Merge branch 'bpf: Force to MPTCP' · de405373
      Martin KaFai Lau authored
      Geliang Tang says:
      
      ====================
      As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
      
      "Your app should create sockets with IPPROTO_MPTCP as the proto:
      ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
      forced to create and use MPTCP sockets instead of TCP ones via the
      mptcpize command bundled with the mptcpd daemon."
      
      But the mptcpize (LD_PRELOAD technique) command has some limitations
      [2]:
      
       - it doesn't work if the application is not using libc (e.g. GoLang
      apps)
       - in some envs, it might not be easy to set env vars / change the way
      apps are launched, e.g. on Android
       - mptcpize needs to be launched with all apps that want MPTCP: we could
      have more control from BPF to enable MPTCP only for some apps or all the
      ones of a netns or a cgroup, etc.
       - it is not in BPF, we cannot talk about it at netdev conf.
      
      So this patchset attempts to use BPF to implement functions similar to
      mptcpize.
      
      The main idea is to add a hook in sys_socket() to change the protocol id
      from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
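
      For comparison, this is roughly what an application does when it opts in
      explicitly, as the wiki quote above describes. It is a minimal userspace
      sketch, not part of the patchset: the helper name open_stream_socket() is
      hypothetical, and the fallback to plain TCP is an assumption about how an
      application might handle kernels without MPTCP.

```python
import socket

# IPPROTO_MPTCP is 262 on Linux; socket.IPPROTO_MPTCP exists from Python 3.10.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Try to create an MPTCP socket, falling back to plain TCP on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # The kernel refused the protocol (for example MPTCP is not
        # available), so behave like a legacy TCP application.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)


if __name__ == "__main__":
    with open_stream_socket() as sock:
        print("requested protocol:", sock.proto)
```

      The hook described here aims to make this kind of source change
      unnecessary for legacy applications.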
      
      [1]
      https://github.com/multipath-tcp/mptcp_net-next/wiki
      [2]
      https://github.com/multipath-tcp/mptcp_net-next/issues/79
      
      v14:
       - Use getsockopt(MPTCP_INFO) to verify mptcp protocol instead of using
      nstat command.
      
      v13:
       - drop "Use random netns name for mptcp" patch.
      
      v12:
       - update diag_* log of update_socket_protocol.
       - add 'ip netns show' after 'ip netns del' to check whether a test
      failed to clean up its netns.
       - return libbpf_get_error() instead of -EIO for the error from
      open_and_load().
       - Use getsockopt(SOL_PROTOCOL) to verify mptcp protocol instead of
      using 'ss -tOni'.
      
      v11:
       - add comments about outputs of 'ss' and 'nstat'.
       - use "err = verify_mptcpify()" instead of using =+.
      
      v10:
       - drop "#ifdef CONFIG_BPF_JIT".
       - include vmlinux.h and bpf_tracing_net.h to avoid defining some
      macros.
       - drop unneeded checks for mptcp.
      
      v9:
       - update comment for 'update_socket_protocol'.
      
      v8:
       - drop the additional checks on the 'protocol' value after the
      'update_socket_protocol()' call.
      
      v7:
       - add __weak and __diag_* for update_socket_protocol.
      
      v6:
       - add update_socket_protocol.
      
      v5:
       - add bpf_mptcpify helper.
      
      v4:
       - use lsm_cgroup/socket_create
      
      v3:
       - patch 8: char cmd[128]; -> char cmd[256];
      
      v2:
       - Fix build selftests errors reported by CI
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      de405373
    • selftests/bpf: Add mptcpify test · ddba1224
      Geliang Tang authored
      Implement a new test program mptcpify: if the family is AF_INET or
      AF_INET6, the type is SOCK_STREAM, and the protocol ID is 0 or
      IPPROTO_TCP, set it to IPPROTO_MPTCP. It will be hooked in
      update_socket_protocol().
      
      Extend the MPTCP test base, add a selftest test_mptcpify() for the
      mptcpify case. Open and load the mptcpify test prog to mptcpify the
      TCP sockets dynamically, then use start_server() and connect_to_fd()
      to create a TCP socket, but actually what's created is an MPTCP
      socket, which can be verified through 'getsockopt(SOL_PROTOCOL)'
      and 'getsockopt(MPTCP_INFO)'.
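
      The rewrite rule described above can be modelled in a few lines. This is
      a userspace sketch of the decision only, not the BPF program from the
      selftest; the function name mptcpify() mirrors the test name, and its
      signature is an assumption made for illustration.

```python
import socket

# IPPROTO_MPTCP is 262 on Linux; socket.IPPROTO_MPTCP exists from Python 3.10.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def mptcpify(family, sock_type, protocol):
    """Return the protocol a socket() request should end up with.

    Model of the rule in the commit message: AF_INET or AF_INET6,
    SOCK_STREAM, and protocol 0 or IPPROTO_TCP become IPPROTO_MPTCP.
    Every other request is left untouched.
    """
    if (family in (socket.AF_INET, socket.AF_INET6)
            and sock_type == socket.SOCK_STREAM
            and protocol in (0, socket.IPPROTO_TCP)):
        return IPPROTO_MPTCP
    return protocol


if __name__ == "__main__":
    print(mptcpify(socket.AF_INET, socket.SOCK_STREAM, 0))
    print(mptcpify(socket.AF_INET, socket.SOCK_DGRAM, 0))
```

      Datagram sockets and non-IP families pass through unchanged, which is
      why only TCP stream sockets are affected.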
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/364e72f307e7bb38382ec7442c182d76298a9c41.1692147782.git.geliang.tang@suse.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      ddba1224
    • selftests/bpf: Fix error checks of mptcp open_and_load · 20774655
      Geliang Tang authored
      Return libbpf_get_error(), instead of -EIO, for the error from
      mptcp_sock__open_and_load().
      
      Load success means prog_fd and map_fd are always valid. So drop these
      unneeded ASSERT_GE checks for them in mptcp run_test().
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/db5fcb93293df9ab173edcbaf8252465b80da6f2.1692147782.git.geliang.tang@suse.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      20774655
    • selftests/bpf: Add two mptcp netns helpers · 97c9c652
      Geliang Tang authored
      Add two netns helpers for mptcp tests: create_netns() and
      cleanup_netns(). Use them in test_base().
      
      These new helpers will be re-used in the following commits
      introducing new tests.
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/7506371fb6c417b401cc9d7365fe455754f4ba3f.1692147782.git.geliang.tang@suse.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      97c9c652
    • bpf: Add update_socket_protocol hook · 0dd061a6
      Geliang Tang authored
      Add a hook named update_socket_protocol in __sys_socket(), for bpf
      progs to attach to and update socket protocol. One use case is to
      force legacy TCP apps to create and use MPTCP sockets instead of
      TCP ones.
      
      Define a fmod_ret set named bpf_mptcp_fmodret_ids, add the hook
      update_socket_protocol into this set, and register it in
      bpf_mptcp_kfunc_init().
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
      Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/ac84be00f97072a46f8a72b4e2be46cbb7fa5053.1692147782.git.geliang.tang@suse.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      0dd061a6
    • bpftool: Implement link show support for xdp · 053bbf9b
      Daniel Borkmann authored
      Add support to dump XDP link information to bpftool. This reuses the
      recently added show_link_ifindex_{plain,json}(). The XDP link info only
      exposes the ifindex.
      
      Below shows an example link dump output, and a cgroup link is included
      for comparison, too:
      
        # bpftool link
        [...]
        10: cgroup  prog 2466
              cgroup_id 1  attach_type cgroup_inet6_post_bind
        [...]
        16: xdp  prog 2477
              ifindex enp5s0(3)
        [...]
      
      Equivalent json output:
      
        # bpftool link --json
        [...]
        {
          "id": 10,
          "type": "cgroup",
          "prog_id": 2466,
          "cgroup_id": 1,
          "attach_type": "cgroup_inet6_post_bind"
        },
        [...]
        {
          "id": 16,
          "type": "xdp",
          "prog_id": 2477,
          "devname": "enp5s0",
          "ifindex": 3
        }
        [...]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230816095651.10014-2-daniel@iogearbox.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      053bbf9b
    • bpftool: Implement link show support for tcx · e16e6c6d
      Daniel Borkmann authored
      Add support to dump tcx link information to bpftool. This adds a
      common helper show_link_ifindex_{plain,json}() which can be reused
      also for other link types. The plain text and json device output is
      the same format as in bpftool net dump.
      
      Below shows an example link dump output along with a cgroup link
      for comparison:
      
        # bpftool link
        [...]
        10: cgroup  prog 1977
              cgroup_id 1  attach_type cgroup_inet6_post_bind
        [...]
        13: tcx  prog 2053
              ifindex enp5s0(3)  attach_type tcx_ingress
        14: tcx  prog 2080
              ifindex enp5s0(3)  attach_type tcx_egress
        [...]
      
      Equivalent json output:
      
        # bpftool link --json
        [...]
        {
          "id": 10,
          "type": "cgroup",
          "prog_id": 1977,
          "cgroup_id": 1,
          "attach_type": "cgroup_inet6_post_bind"
        },
        [...]
        {
          "id": 13,
          "type": "tcx",
          "prog_id": 2053,
          "devname": "enp5s0",
          "ifindex": 3,
          "attach_type": "tcx_ingress"
        },
        {
          "id": 14,
          "type": "tcx",
          "prog_id": 2080,
          "devname": "enp5s0",
          "ifindex": 3,
          "attach_type": "tcx_egress"
        }
        [...]
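
      The JSON form is convenient for scripting. The sketch below filters
      link entries by type and device, using only the fields shown in the
      dump above. It assumes the complete output is a JSON array of such
      objects; the helper name links_on_device() is hypothetical, and the
      sample data simply repeats the entries quoted in this message.

```python
import json

# Sample shaped like the `bpftool link --json` entries quoted above.
SAMPLE = """
[
  {"id": 10, "type": "cgroup", "prog_id": 1977, "cgroup_id": 1,
   "attach_type": "cgroup_inet6_post_bind"},
  {"id": 13, "type": "tcx", "prog_id": 2053, "devname": "enp5s0",
   "ifindex": 3, "attach_type": "tcx_ingress"},
  {"id": 14, "type": "tcx", "prog_id": 2080, "devname": "enp5s0",
   "ifindex": 3, "attach_type": "tcx_egress"}
]
"""


def links_on_device(links, devname, link_type="tcx"):
    """Return the links of one type that are attached to one device."""
    return [
        link for link in links
        if link.get("type") == link_type and link.get("devname") == devname
    ]


if __name__ == "__main__":
    for link in links_on_device(json.loads(SAMPLE), "enp5s0"):
        print(link["id"], link["prog_id"], link["attach_type"])
```

      Using .get() keeps the filter safe for link types such as cgroup that
      carry no devname field.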
      Suggested-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Acked-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/r/20230816095651.10014-1-daniel@iogearbox.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      e16e6c6d
    • selftests/bpf: Add selftest for fill_link_info · 23cf7aa5
      Yafang Shao authored
      Add selftest for the fill_link_info of uprobe, kprobe and tracepoint.
      The result:
      
        $ tools/testing/selftests/bpf/test_progs --name=fill_link_info
        #79/1    fill_link_info/kprobe_link_info:OK
        #79/2    fill_link_info/kretprobe_link_info:OK
        #79/3    fill_link_info/kprobe_invalid_ubuff:OK
        #79/4    fill_link_info/tracepoint_link_info:OK
        #79/5    fill_link_info/uprobe_link_info:OK
        #79/6    fill_link_info/uretprobe_link_info:OK
        #79/7    fill_link_info/kprobe_multi_link_info:OK
        #79/8    fill_link_info/kretprobe_multi_link_info:OK
        #79/9    fill_link_info/kprobe_multi_invalid_ubuff:OK
        #79      fill_link_info:OK
        Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
      
      The test case for kprobe_multi won't be run on aarch64, as it is not
      supported.
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/20230813141900.1268-3-laoar.shao@gmail.com
      23cf7aa5
    • bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe() · 0aa35162
      Yafang Shao authored
      The commit 1b715e1b ("bpf: Support ->fill_link_info for perf_event") leads
      to the following Smatch static checker warning:
      
          kernel/bpf/syscall.c:3416 bpf_perf_link_fill_kprobe()
          error: uninitialized symbol 'type'.
      
      That can happen when uname is NULL. So fix it by verifying the uname when we
      really need to fill it.
      
      Fixes: 1b715e1b ("bpf: Support ->fill_link_info for perf_event")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Closes: https://lore.kernel.org/bpf/85697a7e-f897-4f74-8b43-82721bebc462@kili.mountain
      Link: https://lore.kernel.org/bpf/20230813141900.1268-2-laoar.shao@gmail.com
      0aa35162
    • Merge branch 'ipv6-expired-routes' · 950fe358
      David S. Miller authored
      Kui-Feng Lee says:
      
      ====================
      Remove expired routes with a separate list of routes.
      
      FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
      can be expensive if the number of routes in a table is big, even if most of
      them are permanent. Checking only a separate list of routes that have an
      expiration time avoids this potential issue.
      
      Background
      ==========
      
      The size of a Linux IPv6 routing table can become a big problem if not
      managed appropriately.  Now, Linux has a garbage collector to remove
      expired routes periodically.  However, this may lead to a situation in
      which the routing path is blocked for a long period due to an
      excessive number of routes.
      
      For example, years ago there was a commit c7bb4b89 ("ipv6: tcp:
      drop silly ICMPv6 packet too big messages").  The root cause was that
      malicious ICMPv6 packets were sent back for every small packet sent to
      them. These packets add routes with an expiration time that prompts
      the GC to periodically check all routes in the tables, including
      permanent ones.
      
      Why Route Expires
      =================
      
      Users can add IPv6 routes with an expiration time manually. However,
      the Neighbor Discovery protocol may also generate routes that can
      expire.  For example, Router Advertisement (RA) messages may create a
      default route with an expiration time. [RFC 4861] For IPv4, it is not
      possible to set an expiration time for a route, and there is no RA, so
      there is no need to worry about such issues.
      
      Create Routes with Expires
      ==========================
      
      You can create routes with an expiration time using the ip command.
      
      For example,
      
          ip -6 route add 2001:b000:591::3 via fe80::5054:ff:fe12:3457 \
              dev enp0s3 expires 30
      
      The route that has been generated will be deleted automatically in 30
      seconds.
      
      GC of FIB6
      ==========
      
      The function called fib6_run_gc() is responsible for performing
      garbage collection (GC) for the Linux IPv6 stack. It checks for the
      expiration of every route by traversing the trees of routing
      tables. The time taken to traverse a routing table increases with its
      size. Holding the routing table lock during traversal is particularly
      undesirable. Therefore, it is preferable to keep the lock for the
      shortest possible duration.
      
      Solution
      ========
      
      The cause of the issue is keeping the routing table locked during the
      traversal of large trees. To solve this problem, we can create a separate
      list of routes that have expiration. This will prevent GC from checking
      permanent routes.
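
      The idea can be illustrated with a small model. This is not the kernel
      code: the class Fib6TableModel and its methods are hypothetical names,
      and a Python dict and set stand in for the FIB6 tree and the new list.
      The point is only that GC examines the routes that can expire and never
      touches the permanent ones.

```python
class Fib6TableModel:
    """Toy model of a routing table with a separate list for GC."""

    def __init__(self):
        self.routes = {}      # every route: prefix -> expiry time or None
        self.gc_list = set()  # only the prefixes that carry an expiry time

    def add_route(self, prefix, expires_at=None):
        """Install a route; track it for GC only if it can expire."""
        self.routes[prefix] = expires_at
        if expires_at is None:
            self.gc_list.discard(prefix)
        else:
            self.gc_list.add(prefix)

    def run_gc(self, now):
        """Drop expired routes and return how many entries were examined."""
        examined = 0
        for prefix in list(self.gc_list):
            examined += 1
            if self.routes[prefix] <= now:
                del self.routes[prefix]
                self.gc_list.remove(prefix)
        return examined


if __name__ == "__main__":
    table = Fib6TableModel()
    for i in range(9000):
        table.add_route(f"2001:db8::{i:x}/128")       # permanent routes
    table.add_route("2001:b000:591::3/128", expires_at=30)
    print("entries examined by GC:", table.run_gc(now=31))
```

      In this model the work done by run_gc() depends on the number of
      expiring routes only, not on the total table size.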
      
      Result
      ======
      
      We conducted a test to measure the execution times of fib6_gc_timer_cb()
      and observed that it enhances the GC of FIB6. During the test, we added
      permanent routes with the following numbers: 1000, 3000, 6000, and
      9000. Additionally, we added a route with an expiration time.
      
      Here are the average execution times for the kernel without the patch.
       - 120020 ns with 1000 permanent routes
       - 308920 ns with 3000 ...
       - 581470 ns with 6000 ...
       - 855310 ns with 9000 ...
      
      The kernel with the patch consistently takes around 14000 ns to execute,
      regardless of the number of permanent routes that are installed.
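
      A quick calculation on the figures above shows the trend. The sketch
      only reuses the numbers quoted in this message, with 14000 ns taken as
      the approximate patched time; the function names are hypothetical. It
      works out to roughly 92 ns of GC time per additional permanent route
      without the patch.

```python
# (permanent routes, average ns) pairs quoted above for the unpatched kernel.
UNPATCHED = [(1000, 120020), (3000, 308920), (6000, 581470), (9000, 855310)]
PATCHED_NS = 14000  # "around 14000 ns" with the patch, an approximate figure


def cost_per_route(samples):
    """Average extra ns per permanent route between first and last sample."""
    (n0, t0), (n1, t1) = samples[0], samples[-1]
    return (t1 - t0) / (n1 - n0)


def speedups(samples, patched_ns):
    """Ratio of unpatched to patched GC time for each table size."""
    return {n: t / patched_ns for n, t in samples}


if __name__ == "__main__":
    print(f"extra cost per permanent route: {cost_per_route(UNPATCHED):.1f} ns")
    for n, ratio in speedups(UNPATCHED, PATCHED_NS).items():
        print(f"{n} permanent routes: about {ratio:.1f}x faster with the patch")
```

      Because the patched time stays flat, the ratio grows with the number
      of permanent routes.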
      
      Major changes from v7:
      
       - Fix warnings raised by the patchwork.
      
      Major changes from v6:
      
       - Remove unnecessary check of tb6 in fib6_clean_expires_locked().
      
       - Use fib6_clean_expires_locked() instead in fib6_purge_rt().
      
      Major changes from v5:
      
       - Change the order of adding new routes to the GC list and starting
         GC timer.
      
       - Remove time measurements from the test case.
      
       - Stop forcing GC flush.
      
      Major changes from v4:
      
       - Detect existence of 'strace' in the test case.
      
      Major changes from v3:
      
       - Fix the type of arg according to feedback.
      
       - Add 1K temporary routes and 5K permanent routes in the test case.
         Measure time spent on GC with strace.
      
      Major changes from v2:
      
       - Remove unnecessary and incorrect sysctl restoring in the test case.
      
      Major changes from v1:
      
       - Moved gc_link to avoid creating a hole in fib6_info.
      
       - Moved fib6_set_expires*() and fib6_clean_expires*() to the header
         file and inlined. And removed duplicated lines.
      
       - Added a test case.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      950fe358