1. 15 May, 2023 12 commits
  2. 13 May, 2023 14 commits
  3. 12 May, 2023 14 commits
    • Merge branch 'sfc-decap' · ba79e9a7
      David S. Miller authored
      Edward Cree says:
      
      ====================
      sfc: more flexible encap matches on TC decap rules
      
      This series extends the TC offload support on EF100 to support optionally
       matching on the IP ToS and UDP source port of the outer header in rules
       performing tunnel decapsulation.  Both of these fields allow masked
       matches if the underlying hardware supports it (current EF100 hardware
       supports masking on ToS, but only exact-match on source port).
      Given that the source port is typically populated from a hash of inner
       header entropy, it's not clear whether filtering on it is useful, but
       since we can support it we may as well expose the capability.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: support TC decap rules matching on enc_src_port · b6583d5e
      Edward Cree authored
      Allow efx_tc_encap_match entries to include a udp_sport and a
       udp_sport_mask.  As with enc_ip_tos, use pseudos to enforce that all
       encap matches within a given <src_ip,dst_ip,udp_dport> tuple have
       the same udp_sport_mask.
      Note that since we use a single layer of pseudos for both fields, two
       matches that differ in (say) udp_sport value aren't permitted to have
       different ip_tos_mask, even though this would technically be safe.
      Current userland TC does not support setting enc_src_port; this patch
       was tested with an iproute2 patched to support it.
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: support TC decap rules matching on enc_ip_tos · 3c9561c0
      Edward Cree authored
      Allow efx_tc_encap_match entries to include an ip_tos and ip_tos_mask.
      To avoid partially-overlapping Outer Rules (which can lead to undefined
       behaviour in the hardware), store extra "pseudo" entries in our
       encap_match hashtable, which are used to enforce that all Outer Rule
       entries within a given <src_ip,dst_ip,udp_dport> tuple (or IPv6
       equivalent) have the same ip_tos_mask.
      The "direct" encap_match entry takes a reference on the "pseudo",
       allowing it to be destroyed when all "direct" entries using it are
       removed.
      efx_tc_em_pseudo_type is an enum rather than just a bool because in
       future an additional pseudo-type will be added to support Conntrack
       offload.
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
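The "pseudo" bookkeeping described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the driver's code: the class and method names (`EncapMatchTable`, `PseudoEntry`, `add`, `remove`) are hypothetical, the hashtable is a plain dict, and every tuple gets a pseudo entry even when the mask is zero, which may differ from what the driver does. It shows why two rules on the same <src_ip,dst_ip,udp_dport> tuple with different ip_tos_mask values are refused (they could partially overlap), and how the pseudo entry goes away once its last "direct" entry is removed.

```python
from dataclasses import dataclass


@dataclass
class PseudoEntry:
    """Records the single ip_tos_mask allowed for one outer-header tuple."""
    ip_tos_mask: int
    refcount: int = 0


class EncapMatchTable:
    """Toy stand-in for an encap_match hashtable (all names are hypothetical)."""

    def __init__(self):
        self.pseudos = {}     # (src_ip, dst_ip, udp_dport) -> PseudoEntry
        self.directs = set()  # (tuple, masked ip_tos, ip_tos_mask)

    def add(self, src_ip, dst_ip, udp_dport, ip_tos=0, ip_tos_mask=0):
        key = (src_ip, dst_ip, udp_dport)
        pseudo = self.pseudos.get(key)
        if pseudo is not None and pseudo.ip_tos_mask != ip_tos_mask:
            # Different masks on one tuple could match the same packet
            # in two rules, i.e. a partial overlap; refuse the new rule.
            raise ValueError("ip_tos_mask conflicts with existing entries")
        direct = (key, ip_tos & ip_tos_mask, ip_tos_mask)
        if direct in self.directs:
            # Simplification: this model does not share identical entries.
            raise ValueError("identical encap match already present")
        if pseudo is None:
            pseudo = self.pseudos[key] = PseudoEntry(ip_tos_mask)
        pseudo.refcount += 1  # the direct entry holds a reference
        self.directs.add(direct)
        return direct

    def remove(self, direct):
        key = direct[0]
        self.directs.remove(direct)
        pseudo = self.pseudos[key]
        pseudo.refcount -= 1
        if pseudo.refcount == 0:
            # Last direct entry gone: the pseudo can be destroyed too.
            del self.pseudos[key]
```

With a shared mask, two entries that differ in ip_tos value are disjoint by construction, so checking the mask alone is enough to rule out partial overlap in this model.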
    • sfc: populate enc_ip_tos matches in MAE outer rules · 56beb35d
      Edward Cree authored
      Currently tc.c will block them before they get here, but the following
       patch will change that.
      Use the extack message from efx_mae_check_encap_match_caps() instead
       of writing a new one, since there's now more being fed in than just
       an IP version.
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: release encap match in efx_tc_flow_free() · 28fa3ac4
      Edward Cree authored
      When force-freeing leftover entries from our match_action_ht, call
       efx_tc_delete_rule(), which releases all the rule's resources, rather
       than open-coding it.  The open-coded version was missing a call to
       release the rule's encap match (if any).
      It probably doesn't matter as everything's being torn down anyway, but
       it's cleaner this way and prevents further error messages potentially
       being logged by efx_tc_encap_match_free() later on.
      Move efx_tc_flow_free() further down the file to avoid introducing a
       forward declaration of efx_tc_delete_rule().
      
      Fixes: 17654d84 ("sfc: add offloading of 'foreign' TC (decap) rules")
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: liquidio: lio_main: Remove unnecessary (void*) conversions · d3616dc7
      wuych authored
      Pointer variables of void * type do not require type cast.
      Signed-off-by: wuych <yunchuan@nfschina.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add bpf_bypass_getsockopt proto callback · 2598619e
      Alexander Mikhalitsyn authored
      Implement ->bpf_bypass_getsockopt proto callback and filter out
      SCTP_SOCKOPT_PEELOFF, SCTP_SOCKOPT_PEELOFF_FLAGS and SCTP_SOCKOPT_CONNECTX3
      socket options from running eBPF hook on them.
      
      SCTP_SOCKOPT_PEELOFF and SCTP_SOCKOPT_PEELOFF_FLAGS options do fd_install(),
      and if the BPF_CGROUP_RUN_PROG_GETSOCKOPT hook returns an error after the
      original handler sctp_getsockopt(...) has succeeded, userspace will receive
      an error from the getsockopt syscall and will not be aware that an fd was
      successfully installed into the fdtable.

      As pointed out by Marcelo Ricardo Leitner, it seems reasonable to skip the
      bpf getsockopt hook for the SCTP_SOCKOPT_CONNECTX3 sockopt too, because
      internally it triggers connect(), and if the error is masked then
      userspace will be confused.
      
      This patch was born as a result of discussion around a new SCM_PIDFD interface:
      https://lore.kernel.org/all/20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com/
      
      Fixes: 0d01da6a ("bpf: implement getsockopt and setsockopt hooks")
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: linux-sctp@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Suggested-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
      Acked-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
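The failure mode this commit avoids can be modelled in a few lines. The sketch below is a toy model, not kernel code: `FakeSock`, `deny_hook` and `do_getsockopt` are hypothetical names, option names are plain strings rather than the real numeric constants, and the "fd table" is just a list. It assumes only what the message states: the handler runs first and may install an fd, and the BPF hook runs afterwards and may replace the return value unless the option is bypassed.

```python
import errno

# Options that the commit message says should skip the BPF getsockopt hook.
BYPASS_OPTNAMES = frozenset({
    "SCTP_SOCKOPT_PEELOFF",
    "SCTP_SOCKOPT_PEELOFF_FLAGS",
    "SCTP_SOCKOPT_CONNECTX3",
})


class FakeSock:
    """Hypothetical socket whose getsockopt handler has side effects."""

    def __init__(self):
        self.fdtable = []  # stands in for the process fd table

    def handler(self, optname):
        if optname in ("SCTP_SOCKOPT_PEELOFF", "SCTP_SOCKOPT_PEELOFF_FLAGS"):
            self.fdtable.append(object())  # models fd_install()
        return 0  # handler succeeded


def deny_hook(optname, retval):
    """Models a BPF program that overrides the result with an error."""
    return -errno.EPERM


def do_getsockopt(sock, optname, hook, bypass=frozenset()):
    """Run the handler, then the hook unless the option is bypassed."""
    retval = sock.handler(optname)
    if optname in bypass:
        return retval
    return hook(optname, retval)
```

Without the bypass, the caller sees an error although an fd was installed; with `bypass=BYPASS_OPTNAMES`, the handler's own return value reaches the caller, so the result and the side effect agree.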
    • Merge branch 'selftests-fcnal' · e7ea5080
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      selftests: fcnal: Test SO_DONTROUTE socket option.
      
      The objective is to cover kernel paths that use the RTO_ONLINK flag
      in .flowi4_tos. This way we'll be able to safely remove this flag in
      the future by properly setting .flowi4_scope instead. With these
      selftests in place, we can make sure this won't introduce regressions.
      
      For more context, the final objective is to convert .flowi4_tos to
      dscp_t, to ensure that ECN bits don't influence route and fib-rule
      lookups (see commit a410a0cf ("ipv6: Define dscp_t and stop taking
      ECN bits into account in fib6-rules")).
      
      These selftests only cover IPv4, as SO_DONTROUTE has no effect on IPv6
      sockets.
      
      v2:
        - Use two different nettest options for setting SO_DONTROUTE either
          on the server or on the client socket.
      
        - Use the above feature to run a single 'nettest -B' instance per
          test (instead of having two nettest processes for server and
          client).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fcnal: Test SO_DONTROUTE on raw and ping sockets. · ceec9f27
      Guillaume Nault authored
      Use ping -r to test the kernel behaviour with raw and ping sockets
      having the SO_DONTROUTE option.
      
      Since ipv4_ping_novrf() is called with different values of
      net.ipv4.ping_group_range, it tests both raw and ping sockets
      (ping uses ping sockets if its user ID belongs to ping_group_range
      and raw sockets otherwise).

      With both socket types, sending packets to a neighbour (on-link) host
      should work. When the host is behind a router, sending should fail.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fcnal: Test SO_DONTROUTE on UDP sockets. · a431327c
      Guillaume Nault authored
      Use nettest --client-dontroute to test the kernel behaviour with UDP
      sockets having the SO_DONTROUTE option. Sending packets to a neighbour
      (on-link) host should work. When the host is behind a router, sending
      should fail.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
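For readers unfamiliar with the option, here is a minimal userspace sketch of what these tests exercise. It is not taken from nettest: `udp_socket_dontroute` and `is_on_link` are hypothetical helper names, and `is_on_link` is only a rough approximation of "neighbour (on-link) host" based on a single interface prefix, whereas the kernel makes the decision through its route lookup.

```python
import ipaddress
import socket


def udp_socket_dontroute():
    """Create an IPv4 UDP socket with SO_DONTROUTE enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask the kernel to send only to directly connected hosts,
    # bypassing gateways.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, 1)
    return sock


def is_on_link(dest, interface_cidr):
    """Approximate check: is dest inside the interface's own subnet?

    interface_cidr is an address with prefix length, e.g. "192.0.2.1/24".
    """
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(dest) in iface.network
```

Under this approximation, a `sendto()` on the socket is expected to succeed for destinations where `is_on_link` is true and to fail for destinations that would need a router, which is the pair of outcomes the selftests check.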
    • selftests: fcnal: Test SO_DONTROUTE on TCP sockets. · dd017c72
      Guillaume Nault authored
      Use nettest --{client,server}-dontroute to test the kernel behaviour
      with TCP sockets having the SO_DONTROUTE option. Sending packets to a
      neighbour (on-link) host should work. When the host is behind a
      router, sending should fail.
      
      Client and server sockets are tested independently, so that we can
      cover different TCP kernel paths.
      
      SO_DONTROUTE also affects the syncookies path. So ipv4_tcp_dontroute()
      is made to work with or without syncookies, to cover both paths.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Add SO_DONTROUTE option to nettest. · aeefbb57
      Guillaume Nault authored
      Add --client-dontroute and --server-dontroute options to nettest. They
      set the SO_DONTROUTE option on the client and server sockets,
      respectively. This will be used by the following patches to test
      the SO_DONTROUTE kernel behaviour with TCP and UDP.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Always assign be16 value to vlan_proto · c1bc7d73
      Simon Horman authored
      The type of the vlan_proto field is __be16, and most users of the
      field treat it as such.

      However, when setting or testing the field for the special VLAN_N_VID
      value, host byte order is used, which seems incorrect.
      
      It also seems somewhat odd to store a VLAN ID value in a field that is
      otherwise used to store Ether types.
      
      Address this issue by defining BOND_VLAN_PROTO_NONE, a big endian value.
      0xffff was chosen somewhat arbitrarily. What is important is that it
      doesn't overlap with any valid VLAN Ether types.
      
      I don't believe the problems described above are a bug because
      VLAN_N_VID in both little-endian and big-endian byte order does not
      conflict with any supported VLAN Ether types in big-endian byte order.
      
      Reported by sparse as:
      
       .../bond_main.c:2857:26: warning: restricted __be16 degrades to integer
       .../bond_main.c:2863:20: warning: restricted __be16 degrades to integer
       .../bond_main.c:2939:40: warning: incorrect type in assignment (different base types)
       .../bond_main.c:2939:40:    expected restricted __be16 [usertype] vlan_proto
       .../bond_main.c:2939:40:    got int
      
      No functional changes intended.
      Compile tested only.
      Signed-off-by: Simon Horman <horms@kernel.org>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
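The byte-order argument above can be checked with a little arithmetic. The sketch below assumes the usual values VLAN_N_VID = 4096, ETH_P_8021Q = 0x8100 and ETH_P_8021AD = 0x88A8, and takes those two Ether types as the "supported VLAN Ether types"; `swab16` and `both_byte_orders` are hypothetical helper names. It confirms that VLAN_N_VID collides with neither Ether type in either byte order, and that 0xffff reads the same in both byte orders.

```python
import struct

ETH_P_8021Q = 0x8100   # 802.1Q VLAN Ether type
ETH_P_8021AD = 0x88A8  # 802.1ad (QinQ) Ether type
VLAN_N_VID = 4096      # number of VLAN IDs, 0x1000
BOND_VLAN_PROTO_NONE = 0xFFFF  # sentinel value named in the commit message


def swab16(value):
    """Swap the two bytes of a 16-bit value."""
    return struct.unpack("<H", struct.pack(">H", value))[0]


def both_byte_orders(value):
    """The value as seen in big-endian and little-endian interpretation."""
    return {value, swab16(value)}


# Every representation a valid VLAN Ether type could take in the field.
VLAN_ETHER_TYPES = both_byte_orders(ETH_P_8021Q) | both_byte_orders(ETH_P_8021AD)
```

Here `both_byte_orders(VLAN_N_VID)` is `{0x1000, 0x0010}`, which is disjoint from `VLAN_ETHER_TYPES`, consistent with the message's view that the old code was not a functional bug. The sentinel 0xffff is byte-order invariant, so it is also absent from that set.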
    • Merge branch 'net-handshake-fixes' · deb2e484
      David S. Miller authored
      Chuck Lever says:
      
      ====================
      Bug fixes for net/handshake
      
      Please consider these for merge via net-next.
      
      Paolo observed that there is a possible leak of sock->file. I
      haven't looked into that yet, but it seems to be separate from
      the fixes in this series, so no need to hold these up.
      
      Changes since v2:
      - Address Paolo's comment regarding handshake_dup()
      
      Changes since v1:
      - Rework "Fix handshake_dup() ref counting"
      - Unpin sock->file when a handshake is cancelled
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>