1. 06 Mar, 2024 18 commits
  2. 04 Mar, 2024 8 commits
  3. 03 Mar, 2024 2 commits
    • bpf, docs: Use IETF format for field definitions in instruction-set.rst · 4e73e1bc
      Dave Thaler authored
      In preparation for publication as an IETF RFC, the WG chairs asked me
      to convert the document to use IETF packet format for field layout, so
      this patch attempts to make it consistent with other IETF documents.
      
      Some fields that are not byte aligned were previously inconsistent
      in how values were defined.  Some were defined as the value of the
      byte containing the field (like 0x20 for a field holding the high
      four bits of the byte), and others were defined as the value of the
      field itself (like 0x2).  This patch makes them consistent by using
      just the values of the field itself, which is the IETF convention.
      
      As a result, some of the defines that used BPF_* would no longer
      match the value in the spec, and so this patch also drops the BPF_*
      prefix to avoid confusion with the defines that are the full-byte
      equivalent values.  For consistency, BPF_* is then dropped from
      other fields too.  BPF_<foo> is thus the Linux implementation-specific
      define for <foo> as it appears in the BPF ISA specification.
      
      The syntax BPF_ADD | BPF_X | BPF_ALU only worked for full-byte
      values so the convention {ADD, X, ALU} is proposed for referring
      to field values instead.
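      To illustrate the difference between field values and full-byte
      values, the sketch below composes an ALU opcode from per-field values
      written in the {ADD, X, ALU} style and compares it with the Linux
      full-byte defines. It assumes the arithmetic/jump opcode layout (a
      4-bit operation code in the most significant bits, a 1-bit source
      flag, and a 3-bit class in the least significant bits). The helper
      names and the LINUX_* constant names are hypothetical, and only a few
      values are shown.

```c
#include <stdint.h>

/* Field values as the ISA document defines them (value of the field itself). */
enum { ADD = 0x0, SUB = 0x1, MOV = 0xb };  /* 4-bit operation code */
enum { K = 0x0, X = 0x1 };                 /* 1-bit source: immediate or register */
enum { ALU = 0x4, ALU64 = 0x7 };           /* 3-bit instruction class */

/* Full-byte equivalents with the same values as the Linux BPF_* defines
 * (renamed here so the sketch does not depend on kernel headers). */
#define LINUX_BPF_ADD   0x00
#define LINUX_BPF_SUB   0x10
#define LINUX_BPF_MOV   0xb0
#define LINUX_BPF_K     0x00
#define LINUX_BPF_X     0x08
#define LINUX_BPF_ALU   0x04
#define LINUX_BPF_ALU64 0x07

/* Build an opcode byte from the field values {code, source, class}. */
static uint8_t make_alu_opcode(unsigned int code, unsigned int source,
                               unsigned int cls)
{
    return (uint8_t)(((code & 0xfu) << 4) | ((source & 0x1u) << 3) |
                     (cls & 0x7u));
}

/* Extract the 4-bit operation code as a field value (SUB == 0x1),
 * rather than as the byte-masked value (0x10). */
static unsigned int alu_code_field(uint8_t opcode)
{
    return (opcode >> 4) & 0xfu;
}
```

      For example, {SUB, K, ALU64} yields the byte 0x17, the same value as
      BPF_SUB | BPF_K | BPF_ALU64, while alu_code_field(0x17) returns the
      field value 0x1 rather than 0x10.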
      
      Also replace the redundant "LSB bits" with "least significant bits".
      
      A preview of what the resulting Internet Draft would look like can
      be seen at:
      https://htmlpreview.github.io/?https://raw.githubusercontent.com/dthaler/ebpf-docs-1/format/draft-ietf-bpf-isa.html
      
      v1->v2: Fix sphinx issue as recommended by David Vernet
      Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20240301222337.15931-1-dthaler1968@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 4b2765ae
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2024-02-29
      
      We've added 119 non-merge commits during the last 32 day(s) which contain
      a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
      
      The main changes are:
      
      1) Extend the BPF verifier to enable static subprog calls in spin lock
         critical sections, from Kumar Kartikeya Dwivedi.
      
      2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
         in BPF global subprogs, from Andrii Nakryiko.
      
      3) Larger batch of riscv BPF JIT improvements and enabling inlining
         of the bpf_kptr_xchg() for RV64, from Pu Lehui.
      
      4) Allow skeleton users to change the values of the fields in struct_ops
         maps at runtime, from Kui-Feng Lee.
      
      5) Extend the verifier's capabilities of tracking scalars when they
         are spilled to stack, especially when the spill or fill is narrowing,
         from Maxim Mikityanskiy & Eduard Zingerman.
      
      6) Various BPF selftest improvements to fix errors under gcc BPF backend,
         from Jose E. Marchesi.
      
      7) Avoid module loading failure when the module trying to register
         a struct_ops has its BTF section stripped, from Geliang Tang.
      
      8) Annotate all kfuncs in .BTF_ids section which eventually allows
         for automatic kfunc prototype generation from bpftool, from Daniel Xu.
      
      9) Several updates to the instruction-set.rst IETF standardization
         document, from Dave Thaler.
      
      10) Shrink the size of struct bpf_map and struct bpf_array,
          from Alexei Starovoitov.
      
      11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
          from Benjamin Tissoires.
      
      12) Fix bpftool to be more portable to musl libc by using POSIX's
          basename(), from Arnaldo Carvalho de Melo.
      
      13) Add GCC support to libbpf's CO-RE macro definitions,
          from Cupertino Miranda.
      
      14) Remove a duplicate type check in perf_event_bpf_event,
          from Florian Lehner.
      
      15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
          with notrace correctly, from Yonghong Song.
      
      16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
          array to fix build warnings, from Kees Cook.
      
      17) Fix resolve_btfids cross-compilation to non host-native endianness,
          from Viktor Malik.
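      For item 16, the general pattern is sketched below. A trailing
      zero-length array (data[0], a GNU extension) becomes a C99 flexible
      array member (data[]), and each object is allocated with room for its
      payload. The struct and function names are hypothetical stand-ins
      modeled on a prefixlen-plus-data key. They are not the actual UAPI
      definitions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key layout, for illustration only. */
struct example_trie_key {
    uint32_t prefixlen;  /* number of significant bits in data[] */
    uint8_t  data[];     /* C99 flexible array member (was: data[0]) */
};

/* Allocate a key with nbytes of trailing data; caller frees with free(). */
static struct example_trie_key *example_key_alloc(uint32_t prefixlen,
                                                  const uint8_t *bytes,
                                                  size_t nbytes)
{
    /* sizeof(*k) covers only the fixed header; add room for data[]. */
    struct example_trie_key *k = malloc(sizeof(*k) + nbytes);

    if (!k)
        return NULL;
    k->prefixlen = prefixlen;
    memcpy(k->data, bytes, nbytes);
    return k;
}
```

      Because the flexible array member contributes nothing to sizeof, the
      allocation size is written explicitly as header plus payload.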
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
        selftests/bpf: Test if shadow types work correctly.
        bpftool: Add an example for struct_ops map and shadow type.
        bpftool: Generated shadow variables for struct_ops maps.
        libbpf: Convert st_ops->data to shadow type.
        libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
        bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
        bpf, arm64: use bpf_prog_pack for memory management
        arm64: patching: implement text_poke API
        bpf, arm64: support exceptions
        arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
        bpf: add is_async_callback_calling_insn() helper
        bpf: introduce in_sleepable() helper
        bpf: allow more maps in sleepable bpf programs
        selftests/bpf: Test case for lacking CFI stub functions.
        bpf: Check cfi_stubs before registering a struct_ops type.
        bpf: Clarify batch lookup/lookup_and_delete semantics
        bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
        bpf, docs: Fix typos in instruction-set.rst
        selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
        bpf: Shrink size of struct bpf_map/bpf_array.
        ...
      ====================
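      Regarding item 12: POSIX basename() is declared in <libgen.h> and may
      modify the string passed to it, so portable callers usually pass it a
      writable copy. The following is a minimal sketch under that
      assumption. last_component() is a hypothetical helper, not bpftool
      code.

```c
#define _POSIX_C_SOURCE 200809L
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the last component of path (caller frees),
 * or NULL on allocation failure. */
static char *last_component(const char *path)
{
    char *copy = strdup(path);  /* POSIX basename() may write into its argument */
    char *result;

    if (!copy)
        return NULL;
    /* The return value may point into copy or into static storage,
     * so duplicate it before freeing copy. */
    result = strdup(basename(copy));
    free(copy);
    return result;
}
```

      With the POSIX version, trailing slashes are ignored, so "/tmp/dir/"
      yields "dir".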
      
      Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 01 Mar, 2024 12 commits
    • Merge branch 'inet_dump_ifaddr-no-rtnl' · e9608257
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      inet: no longer use RTNL to protect inet_dump_ifaddr()
      
      This series converts inet so that a dump of addresses (ip -4 addr)
      no longer requires RTNL.
      ====================
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: use xa_array iterator to implement inet_dump_ifaddr() · cdb2f80f
      Eric Dumazet authored
      1) inet_dump_ifaddr() can run under RCU protection
         instead of RTNL.
      
      2) properly return 0 at the end of a dump, avoiding an
         extra recvmsg() system call.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: prepare inet_base_seq() to run without RTNL · 590e92cd
      Eric Dumazet authored
      In the following patch, inet_base_seq() will no longer be called
      with RTNL held.
      
      Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
      and inet_base_seq().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: annotate data-races around ifa->ifa_flags · 3ddc2231
      Eric Dumazet authored
      ifa->ifa_flags can be read locklessly.
      
      Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
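      The pattern these annotations follow can be sketched in user space as
      shown below. The macros are simplified stand-ins for the kernel's
      READ_ONCE()/WRITE_ONCE(), which also perform type checks. They rely
      on the GCC/Clang __typeof__ extension. struct example_ifa is a
      hypothetical cut-down struct, not struct in_ifaddr. Outside the
      kernel, C11 relaxed atomics are the standard way to express the same
      intent.

```c
#include <stdint.h>

/* Simplified: access through a volatile lvalue so the compiler emits a
 * single load or store and cannot fuse, repeat, or elide it. */
#define SKETCH_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct example_ifa {
    uint32_t ifa_flags;  /* written under a lock, read locklessly */
};

/* Writer: in the real code this is still serialized by the update-side lock. */
static void example_set_flags(struct example_ifa *ifa, uint32_t flags)
{
    SKETCH_WRITE_ONCE(ifa->ifa_flags, flags);
}

/* Lockless reader: takes a single snapshot of the field. */
static uint32_t example_get_flags(struct example_ifa *ifa)
{
    return SKETCH_READ_ONCE(ifa->ifa_flags);
}
```

      The same reader/writer pairing applies to the lifetime and timestamp
      fields annotated in the following commits.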
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: annotate data-races around ifa->ifa_preferred_lft · 9f6fa3c4
      Eric Dumazet authored
      ifa->ifa_preferred_lft can be read locklessly.
      
      Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: annotate data-races around ifa->ifa_valid_lft · a5fcf74d
      Eric Dumazet authored
      ifa->ifa_valid_lft can be read locklessly.
      
      Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp · 3cd3e72c
      Eric Dumazet authored
      ifa->ifa_tstamp can be read locklessly.
      
      Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
      
      Do the same for ifa->ifa_cstamp to prepare upcoming changes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netdevsim-link' · 76f06cbd
      David S. Miller authored
      David Wei says:
      
      ====================
      netdevsim: link and forward skbs between ports
      
      This patchset adds the ability to link two netdevsim ports together and
      forward skbs between them, similar to veth. The goal is to use netdevsim
      for testing features such as zero-copy Rx using io_uring.
      
      This feature was tested locally on QEMU, and a selftest is included.
      
      I ran netdev selftests CI style and all tests but the following passed:
      - gro.sh
      - l2tp.sh
      - ip_local_port_range.sh
      
      gro.sh fails because virtme-ng mounts as read-only and it tries to write
      to log.txt. This issue was reported to virtme-ng upstream.
      
      l2tp.sh and ip_local_port_range.sh both fail for me on net-next/main as
      well.
      
      ---
      v13->v14:
      - implement ndo_get_iflink()
      - fix returning 0 if peer is already linked during linking or not linked
        during unlinking
      - bump dropped counter if nsim_ipsec_tx() fails and generally reorder
        nsim_start_xmit()
      - fix overflowing lines and indentations
      
      v12->v13:
      - wait for socat listening port to be ready before sending data in
        selftest
      
      v11->v12:
      - fix leaked netns refs
      - fix rtnetlink.sh kci_test_ipsec_offload() selftest
      
      v10->v11:
      - add udevadm settle after creating netdevsims in selftest
      
      v9->v10:
      - fix not freeing skb when there is no peer
      - prevent possible id clashes in selftest
      - cleanup selftest on error paths
      
      v8->v9:
      - switch to getting netns using fd rather than id
      - prevent linking a netdevsim to itself
      - update tests
      
      v7->v8:
      - fix not dereferencing RCU ptr using rcu_dereference()
      - remove unused variables in selftest
      
      v6->v7:
      - change link syntax to netnsid:ifidx
      - replace dev_get_by_index() with __dev_get_by_index()
      - check for NULL peer when linking
      - add a sysfs attribute for unlinking
      - only update Tx stats if not dropped
      - update selftest
      
      v5->v6:
      - reworked to link two netdevsims using sysfs attribute on the bus
        device instead of debugfs due to deadlock possibility if a netdevsim
        is removed during linking
      - removed unnecessary patch maintaining a list of probed nsim_devs
      - updated selftest
      
      v4->v5:
      - reduce nsim_dev_list_lock critical section
      - fixed missing mutex unlock during unwind ladder
      - rework nsim_dev_peer_write synchronization to take devlink lock as
        well as rtnl_lock
      - return err msgs to user during linking if port doesn't exist or
        linking to self
      - update tx stats outside of RCU lock
      
      v3->v4:
      - maintain a mutex protected list of probed nsim_devs instead of using
        nsim_bus_dev
      - fixed synchronization issues by taking rtnl_lock
      - track tx_dropped skbs
      
      v2->v3:
      - take lock when traversing nsim_bus_dev_list
      - take device ref when getting a nsim_bus_dev
      - return 0 if nsim_dev_peer_read cannot find the port
      - address code formatting
      - do not hard code values in selftests
      - add Makefile for selftests
      
      v1->v2:
      - renamed debugfs file from "link" to "peer"
      - replaced strsep() with sscanf() for consistency
      - increased char[] buf sz to 22 for copying id + port from user
      - added err msg w/ expected fmt when linking as a hint to user
      - prevent linking port to itself
      - protect peer ptr using RCU
      
      ====================
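      The buffer size in the v1->v2 notes ("22 for copying id + port from
      user") follows from simple arithmetic: two unsigned 32-bit decimal
      values need at most 10 digits each, plus ':' and a terminating NUL,
      so 10 + 1 + 10 + 1 = 22. The sketch below assumes an "<id>:<port>"
      string of unsigned values, and its function name is hypothetical.
      Later revisions changed the syntax (netnsid:ifidx, then a netns fd),
      and the sketch does not model those changes.

```c
#include <stdio.h>
#include <string.h>

#define EXAMPLE_LINK_BUF_SZ 22  /* 10 digits + ':' + 10 digits + NUL */

/* Hypothetical parser for an "<id>:<port>" string copied from userspace.
 * Returns 0 on success, -1 on malformed input. Numeric overflow is not
 * checked in this sketch. */
static int example_parse_link(const char *user, size_t len,
                              unsigned int *id, unsigned int *port)
{
    char buf[EXAMPLE_LINK_BUF_SZ];
    char extra;

    if (len >= sizeof(buf))
        return -1;  /* cannot hold two u32 values plus NUL */
    memcpy(buf, user, len);
    buf[len] = '\0';

    /* The space skips a trailing newline (as written by echo); any other
     * trailing character is matched by %c and the input is rejected. */
    if (sscanf(buf, "%u:%u %c", id, port, &extra) != 2)
        return -1;
    return 0;
}
```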
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: fix rtnetlink.sh selftest · 8ee60f9c
      David Wei authored
      I cleared IFF_NOARP flag from netdevsim dev->flags in order to support
      skb forwarding. This breaks the rtnetlink.sh selftest
      kci_test_ipsec_offload() test because ipsec does not connect to peers it
      cannot transmit to.
      
      Fix the issue by adding a neigh entry manually. The ipsec_offload test
      now passes.
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: add selftest for forwarding skb between connected ports · dfb429ea
      David Wei authored
      Connect two netdevsim ports in different namespaces together, then send
      packets between them using socat.
      Signed-off-by: David Wei <dw@davidwei.uk>
      Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: add ndo_get_iflink() implementation · 8debcf58
      David Wei authored
      Add an implementation for ndo_get_iflink() in netdevsim that shows the
      ifindex of the linked peer, if any.
      Signed-off-by: David Wei <dw@davidwei.uk>
      Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: forward skbs from one connected port to another · 9eb95228
      David Wei authored
      Forward skbs sent from one netdevsim port to its connected netdevsim
      port using dev_forward_skb(), similar to veth.
      
      Add a tx_dropped variable to struct netdevsim, tracking the number of
      skbs that could not be forwarded using dev_forward_skb().
      
      The xmit() function accessing the peer ptr is protected by an RCU read
      critical section. The rcu_read_lock() is functionally redundant because,
      since v5.0, all softirqs are implicitly RCU read critical sections, but
      it is useful for human readers.
      
      If another CPU is concurrently in nsim_destroy(), then it will first set
      the peer ptr to NULL. This does not affect any existing readers that
      dereferenced a non-NULL peer. Then, in unregister_netdevice(), there is
      a synchronize_rcu() before the netdev is actually unregistered and
      freed. This ensures that any readers (i.e. xmit()) that got a non-NULL
      peer will complete before the netdev is freed.
      
      Any readers after the RCU_INIT_POINTER() but before synchronize_rcu()
      will see a NULL peer, making it safe.
      
      The codepath to nsim_destroy() and nsim_create() takes both the newly
      added nsim_dev_list_lock and rtnl_lock. This makes it safe with
      concurrent calls to linking two netdevsims together.
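      A single-threaded user-space model of the xmit logic described above
      may help. The peer pointer is loaded once, and Tx stats are updated
      only when the skb is forwarded. Unlinking publishes NULL. C11
      acquire/release atomics stand in for rcu_dereference() and
      rcu_assign_pointer(). Forwarding is assumed to always succeed. As an
      assumption, a missing peer is counted as a drop. The grace period
      (synchronize_rcu()) is not modeled, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a port; not the netdevsim structs. */
struct example_port {
    _Atomic(struct example_port *) peer;  /* NULL when not linked */
    unsigned long rx_packets;
    unsigned long tx_packets;
    unsigned long tx_dropped;
};

static void example_port_init(struct example_port *p)
{
    atomic_init(&p->peer, NULL);
    p->rx_packets = 0;
    p->tx_packets = 0;
    p->tx_dropped = 0;
}

static void example_link(struct example_port *a, struct example_port *b)
{
    /* Release stores: a reader that sees the pointer also sees the peer's
     * initialized state (the role rcu_assign_pointer() plays). */
    atomic_store_explicit(&a->peer, b, memory_order_release);
    atomic_store_explicit(&b->peer, a, memory_order_release);
}

static void example_unlink(struct example_port *a, struct example_port *b)
{
    /* Storing NULL needs no ordering (like RCU_INIT_POINTER(p, NULL)).
     * Real code must still wait for existing readers before freeing. */
    atomic_store_explicit(&a->peer, NULL, memory_order_relaxed);
    atomic_store_explicit(&b->peer, NULL, memory_order_relaxed);
}

static void example_xmit(struct example_port *dev)
{
    /* Load the peer exactly once; acquire stands in for rcu_dereference(). */
    struct example_port *peer =
        atomic_load_explicit(&dev->peer, memory_order_acquire);

    if (!peer) {
        dev->tx_dropped++;  /* assumption: no peer is counted as a drop */
        return;
    }
    peer->rx_packets++;     /* stands in for a successful dev_forward_skb() */
    dev->tx_packets++;      /* Tx stats only updated when not dropped */
}
```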
      Signed-off-by: David Wei <dw@davidwei.uk>
      Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>