- 12 Sep, 2024 15 commits
-
-
Mina Almasry authored
Add dmabuf information to page_pool stats:

    $ ./cli.py --spec ../netlink/specs/netdev.yaml --dump page-pool-get
    ...
    {'dmabuf': 10, 'id': 456, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 455, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 454, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 453, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 452, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 451, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 450, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},
    {'dmabuf': 10, 'id': 449, 'ifindex': 3, 'inflight': 1023, 'inflight-mem': 4190208},

And queue stats:

    $ ./cli.py --spec ../netlink/specs/netdev.yaml --dump queue-get
    ...
    {'dmabuf': 10, 'id': 8, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 9, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 10, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 11, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 12, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 13, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 14, 'ifindex': 3, 'type': 'rx'},
    {'dmabuf': 10, 'id': 15, 'ifindex': 3, 'type': 'rx'},

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
ncdevmem is a devmem TCP netcat. It works similarly to netcat, but it sends and receives data using the devmem TCP APIs. It uses udmabuf as the dmabuf provider. It is compatible with a regular netcat running on a peer, or a ncdevmem running on a peer.

In addition to normal netcat support, ncdevmem has a validation mode, where it sends a specific pattern and validates this pattern on the receiver side to ensure data integrity.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240910171458.219195-13-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
Add documentation outlining the usage and details of devmem TCP.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240910171458.219195-12-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
Add an interface for the user to notify the kernel that it is done reading the devmem dmabuf frags returned as cmsg. The kernel will drop the reference on the frags to make them available for reuse.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-11-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
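A minimal userspace sketch of this release step, assuming the interface is exposed as a SO_DEVMEM_DONTNEED socket option taking a token range; the struct layout and the fallback constant value are assumptions, not taken from this commit:

    /* Hedged sketch: hand devmem frag tokens back to the kernel once the
     * application has finished reading them, so the frags can be reused. */
    #include <stdint.h>
    #include <sys/socket.h>

    #ifndef SO_DEVMEM_DONTNEED
    #define SO_DEVMEM_DONTNEED 80      /* assumed value; take it from the kernel uapi headers */
    #endif

    struct dmabuf_token {              /* assumed layout */
            uint32_t token_start;      /* first token in the range to release */
            uint32_t token_count;      /* number of consecutive tokens */
    };

    static int devmem_release(int sock, uint32_t first, uint32_t count)
    {
            struct dmabuf_token tok = { .token_start = first, .token_count = count };

            /* The kernel drops its reference on the covered frags. */
            return setsockopt(sock, SOL_SOCKET, SO_DEVMEM_DONTNEED, &tok, sizeof(tok));
    }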
-
Mina Almasry authored
In tcp_recvmsg_locked(), detect if the skb being received by the user is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM flag - pass it to tcp_recvmsg_devmem() for custom handling.

tcp_recvmsg_devmem() copies any data in the skb header to the linear buffer, and returns a cmsg to the user indicating the number of bytes returned in the linear buffer.

tcp_recvmsg_devmem() then loops over the inaccessible devmem skb frags, and returns to the user a cmsg_devmem indicating the location of the data in the dmabuf device memory. cmsg_devmem contains this information:
1. the offset into the dmabuf where the payload starts: 'frag_offset'.
2. the size of the frag: 'frag_size'.
3. an opaque token 'frag_token' to return to the kernel when the buffer is to be released.

The pages awaiting freeing are stored in the newly added sk->sk_user_frags, and each page passed to userspace is get_page()'d. This reference is dropped once the userspace indicates that it is done reading this page. All pages are released when the socket is destroyed.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-10-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
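A hedged sketch of the receive loop this enables. MSG_SOCK_DEVMEM and the three cmsg fields (frag_offset, frag_size, frag_token) are named in the commit text; the cmsg type, the constant values, and the exact payload layout below are assumptions for illustration only:

    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #ifndef MSG_SOCK_DEVMEM
    #define MSG_SOCK_DEVMEM 0x2000000      /* assumed value; take it from the kernel uapi headers */
    #endif
    #ifndef SCM_DEVMEM_DMABUF
    #define SCM_DEVMEM_DMABUF 79           /* assumed value; take it from the kernel uapi headers */
    #endif

    struct devmem_cmsg {                   /* assumed layout mirroring the commit text */
            uint64_t frag_offset;          /* offset into the dmabuf where the payload starts */
            uint32_t frag_size;            /* size of the frag */
            uint32_t frag_token;           /* token to return when releasing the frag */
    };

    static void devmem_recv_once(int sock)
    {
            char linear[4096];             /* header data copied to the linear buffer lands here */
            char ctrl[CMSG_SPACE(32 * sizeof(struct devmem_cmsg))];
            struct iovec iov = { .iov_base = linear, .iov_len = sizeof(linear) };
            struct msghdr msg = {
                    .msg_iov = &iov, .msg_iovlen = 1,
                    .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
            };

            if (recvmsg(sock, &msg, MSG_SOCK_DEVMEM) < 0)
                    return;

            for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
                    if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_DEVMEM_DMABUF)
                            continue;
                    struct devmem_cmsg *f = (struct devmem_cmsg *)CMSG_DATA(cm);
                    /* Payload sits at f->frag_offset / f->frag_size inside the bound
                     * dmabuf; f->frag_token is later handed back to the kernel via
                     * the release interface described in the previous commit. */
                    (void)f;
            }
    }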
-
Mina Almasry authored
For device memory TCP, we expect the skb headers to be available in host memory for access, and we expect the skb frags to be in device memory and inaccessible to the host. We expect there to be no mixing and matching of device memory frags (inaccessible) with host memory frags (accessible) in the same skb.

Add a skb->devmem flag which indicates whether the frags in this skb are device memory frags or not.

__skb_fill_netmem_desc() now checks frags added to skbs for net_iov, and marks the skb as skb->devmem accordingly.

Add checks through the network stack to avoid accessing the frags of devmem skbs and avoid coalescing devmem skbs with non-devmem skbs.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-9-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
Make skb_frag_page() fail in the case where the frag is not backed by a page, and fix its relevant callers to handle this case.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-8-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
Implement a memory provider that allocates dmabuf devmem in the form of net_iov.

The provider receives a reference to the struct netdev_dmabuf_binding via the pool->mp_priv pointer. The driver needs to set this pointer for the provider in the net_iov.

The provider obtains a reference on the netdev_dmabuf_binding which guarantees the binding and the underlying mapping remains alive until the provider is destroyed.

Usage of PP_FLAG_DMA_MAP is required for this memory provider such that the page_pool can provide the driver with the dma-addrs of the devmem.

Support for PP_FLAG_DMA_SYNC_DEV is omitted for simplicity & p.order != 0.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-7-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
Convert netmem to be a union of struct page and struct net_iov. Overload the LSB of struct netmem* to indicate that it's a net_iov, otherwise it's a page.

Currently these entries in struct page are rented by the page_pool and used exclusively by the net stack:

    struct {
            unsigned long pp_magic;
            struct page_pool *pp;
            unsigned long _pp_mapping_pad;
            unsigned long dma_addr;
            atomic_long_t pp_ref_count;
    };

Mirror these (and only these) entries into struct net_iov and implement netmem helpers that can access these common fields regardless of whether the underlying type is page or net_iov.

Implement checks for net_iov in netmem helpers which delegate to mm APIs, to ensure net_iov are never passed to the mm stack.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
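A hedged, self-contained illustration of the LSB-tagging scheme described above; the helper and type names here are illustrative, not necessarily the ones merged upstream:

    /* A netmem reference is either a struct page * (LSB clear) or a
     * struct net_iov * (LSB set); both are at least word-aligned, so the
     * low bit is free to act as the type tag. */
    #include <stdbool.h>

    struct page;                           /* opaque here; the real types live in the kernel */
    struct net_iov;

    typedef unsigned long netmem_ref_t;    /* illustrative stand-in for the netmem reference */
    #define NETMEM_NET_IOV_BIT 1UL

    static inline bool netmem_ref_is_net_iov(netmem_ref_t netmem)
    {
            return netmem & NETMEM_NET_IOV_BIT;
    }

    static inline struct page *netmem_ref_to_page(netmem_ref_t netmem)
    {
            /* Never hand a net_iov to mm APIs: callers must check first. */
            return netmem_ref_is_net_iov(netmem) ? (struct page *)0
                                                 : (struct page *)netmem;
    }

    static inline struct net_iov *netmem_ref_to_net_iov(netmem_ref_t netmem)
    {
            return netmem_ref_is_net_iov(netmem)
                   ? (struct net_iov *)(netmem & ~NETMEM_NET_IOV_BIT)
                   : (struct net_iov *)0;
    }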
-
Mina Almasry authored
Implement netdev devmem allocator. The allocator takes a given struct netdev_dmabuf_binding as input and allocates net_iov from that binding.

The allocation simply delegates to the binding's genpool for the allocation logic and wraps the returned memory region in a net_iov struct.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-5-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
Add a netdev_dmabuf_binding struct which represents the dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to rx queues on the netdevice. On the binding, the dma_buf_attach & dma_buf_map_attachment will occur. The entries in the sg_table from mapping will be inserted into a genpool to make it ready for allocation.

The chunks in the genpool are owned by a dmabuf_chunk_owner struct which holds the dma-buf offset of the base of the chunk and the dma_addr of the chunk. Both are needed to use allocations that come from this chunk.

We create a new type that represents an allocation from the genpool: net_iov. We set up the net_iov allocation size in the genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally allocated by the page pool and given to the drivers.

The user can unbind the dmabuf from the netdevice by closing the netlink socket that established the binding. We do this so that the binding is automatically unbound even if the userspace process crashes.

The binding and unbinding leave an indicator in struct netdev_rx_queue that the given queue is bound, and the binding is actuated by resetting the rx queue using the queue API.

The netdev_dmabuf_binding struct is refcounted, and releases its resources only when all the refs are released.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # excluding netlink
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
The API takes the dma-buf fd as input, and binds it to the netdevice. The user can specify the rx queues to bind the dma-buf to.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
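A hypothetical invocation with the same ynl CLI used for the stats dumps above; the operation and attribute names (bind-rx, ifindex, fd, queues) are assumptions drawn from the commit text rather than the exact spec, and the fd/queue values are placeholders:

    $ ./cli.py --spec ../netlink/specs/netdev.yaml --do bind-rx \
          --json '{"ifindex": 3, "fd": 100, "queues": [{"type": "rx", "id": 8}]}'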
-
Mina Almasry authored
Add netdev_rx_queue_restart(), which resets an rx queue using the queue API recently merged[1].

The queue API was merged to enable the core net stack to reset individual rx queues to actuate changes in the rx queue's configuration. In later patches in this series, we will use netdev_rx_queue_restart() to reset rx queues after binding or unbinding dmabuf configuration, which will cause reallocation of the page_pool to repopulate its memory using the new configuration.

[1] https://lore.kernel.org/netdev/20240430231420.699177-1-shailend@google.com/T/

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Jakub Kicinski authored
Tony Nguyen says:

====================
idpf: XDP chapter II: convert Tx completion to libeth

Alexander Lobakin says:

XDP for idpf is currently 5 chapters:
* convert Rx to libeth;
* convert Tx completion to libeth (this);
* generic XDP and XSk code changes;
* actual XDP for idpf via libeth_xdp;
* XSk for idpf (^).

Part II does the following:
* adds generic libeth Tx completion routines;
* converts idpf to use generic libeth Tx comp routines;
* fixes Tx queue timeouts and robustifies Tx completion in general;
* fixes Tx event/descriptor flushes (writebacks).

Most idpf patches again remove more lines than adds. Generic Tx completion helpers and structs are needed as libeth_xdp (Ch. III) makes use of them. WB_ON_ITR is needed since XDPSQs don't want to work without it at all. Tx queue timeouts fixes are needed since without them, it's way easier to catch a Tx timeout event when WB_ON_ITR is enabled.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: enable WB_ON_ITR
  idpf: fix netdev Tx queue stop/wake
  idpf: refactor Tx completion routines
  netdevice: add netdev_tx_reset_subqueue() shorthand
  idpf: convert to libeth Tx buffer completion
  libeth: add Tx buffer completion helpers
====================

Link: https://patch.msgid.link/20240909205323.3110312-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Divya Koppera authored
Add support for cable diagnostics in the lan887x PHY. Using this we can diagnose connected/open/short wires, and also the length at which a cable fault occurred.

Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240909114339.3446-1-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
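For context, cable diagnostics of this kind are triggered from userspace through ethtool's cable-test support; the interface name below is only an example:

    $ ethtool --cable-test eth0

The PHY then reports per-pair status (OK/open/short) and, when a fault is found, the estimated distance to it.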
-
- 11 Sep, 2024 25 commits
-
-
Maxime Chevallier authored
When processing the netlink GET requests to get PHY info, the req_info.pdn pointer is NULL when no PHY matches the requested parameters, such as when the phy_index is invalid, or there's simply no PHY attached to the interface.

Therefore, check the req_info.pdn pointer for NULL instead of dereferencing it.

Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iKRW0WpGAh1tKqY345D8WkYCPm3Y9ym--Si42JZrQAu1g@mail.gmail.com/T/#mfced87d607d18ea32b3b4934dfa18d7b36669285
Fixes: 17194be4 ("net: ethtool: Introduce a command to list PHYs on an interface")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910174636.857352-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
If nvmem loads after the ethernet driver, MAC address assignments will not take effect. of_get_ethdev_address returns -EPROBE_DEFER in such a case, so we need to handle that instead of falling back to eth_hw_addr_random.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910220913.14101-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
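A minimal sketch of the probe-path pattern this describes, using the in-kernel of_get_ethdev_address()/eth_hw_addr_random() helpers; the surrounding variable names are illustrative:

    /* Inside the driver's probe path: defer until nvmem (and the MAC
     * address it carries) is available, and only fall back to a random
     * address when no address is provided at all. */
    ret = of_get_ethdev_address(np, ndev);
    if (ret == -EPROBE_DEFER)
            return ret;                 /* retry probe once nvmem has loaded */
    if (ret)
            eth_hw_addr_random(ndev);   /* genuinely no MAC address available */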
-
Jonathan Cooper authored
Add X4 series. Most functionality is the same as previous EF10 NICs, but enough is different to warrant a new nic type struct and revision; for example, legacy interrupts and SRIOV are not supported. Most removed features will be re-added later as new implementations.

Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240910153014.12803-1-jonathan.s.cooper@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Colin Ian King authored
Don't populate the const read-only array 'key' on the stack at run time; instead, make it static.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910120635.115266-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
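An illustrative before/after of the pattern being changed; the array name matches the commit, but the contents are placeholders:

    /* Before: the initializer is copied onto the stack on every call. */
    const u8 key[] = { 0x12, 0x34, 0x56, 0x78 };

    /* After: the data lives in .rodata and is not rebuilt at run time. */
    static const u8 key[] = { 0x12, 0x34, 0x56, 0x78 };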
-
Jakub Kicinski authored
Matthieu Baerts says:

====================
mptcp: fallback to TCP after 3 MPC drop + cache

The SYN + MPTCP_CAPABLE packets could be explicitly dropped by firewalls somewhere in the network, e.g. if they decide to drop packets based on the TCP options, instead of stripping them off.

The idea of this series is to fall back to TCP after 3 SYN+MPC drops (patch 2).

If the connection succeeds after the fallback, it very likely means a blackhole has been detected. In this case (patch 3), MPTCP can be disabled for a certain period of time, 1h by default. If after this period, MPTCP is still blocked, the period is doubled. This technique is inspired by the one used by TCP FastOpen.

This should help applications which want to use MPTCP by default on the client side if available.
====================

Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-0-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
An MPTCP firewall blackhole can be detected if the following SYN retransmission after a fallback to "plain" TCP is accepted.

In case of blackhole, a similar technique to the one in place with TFO is now used: MPTCP can be disabled for a certain period of time, 1h by default. This time period will grow exponentially when more blackhole issues get detected right after MPTCP is re-enabled, and will reset to the initial value when the blackhole issue goes away.

The blackhole period can be modified thanks to a new sysctl knob: blackhole_timeout.

Two new MIB counters help understand what's happening:
- 'Blackhole', incremented when a blackhole is detected.
- 'MPCapableSYNTXDisabled', incremented when an MPTCP connection directly falls back to TCP during the blackhole period.

Because the technique is inspired by the one used by TFO, an important part of the new code is similar to what can be found in tcp_fastopen.c, with some adaptations to the MPTCP case.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
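A hedged example of inspecting the new knob and counters, assuming the sysctl lands under the net.mptcp namespace and is expressed in seconds (3600 matching the 1h default); the exact nstat output format is an assumption:

    $ sysctl net.mptcp.blackhole_timeout
    net.mptcp.blackhole_timeout = 3600
    $ sysctl -w net.mptcp.blackhole_timeout=7200
    $ nstat -az | grep -E 'Blackhole|MPCapableSYNTXDisabled'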
-
Matthieu Baerts (NGI0) authored
Some middleboxes might be nasty with MPTCP, and decide to drop packets with MPTCP options, instead of just dropping the MPTCP options (or letting them pass...). In this case, it sounds better to fall back to "plain" TCP after 2 retransmissions, and try again.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/477
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-2-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
This helper will be used outside protocol.h in the following commit.

While at it, also add a 'pr_fallback()' debug print, to help identify fallbacks.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-1-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Sebastian Andrzej Siewior says:

====================
net: hsr: Use the seqnr lock for frames received via interlink port.

This is a follow-up to the thread at https://lore.kernel.org/all/20240904133725.1073963-1-edumazet@google.com/
====================

Link: https://patch.msgid.link/20240906132816.657485-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Remove interlink_sequence_nr which is unused.

[ bigeasy: split out from Eric's patch ]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240906132816.657485-3-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sebastian Andrzej Siewior authored
syzbot reported that the seqnr_lock is not acquired for frames received over the interlink port. In the interlink case a new seqnr is generated and assigned to the frame. Frames which are received over the slave port already have a sequence number assigned, so the lock is not required there.

Acquire the hsr_priv::seqnr_lock during the invocation of hsr_forward_skb() if a packet has been received from the interlink port.

Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com
Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ
Fixes: 5055cccf ("net: hsr: Provide RedBox support (HSR-SAN)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Tested-by: Lukasz Majewski <lukma@denx.de>
Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.12

The last -next "new features" pull request for v6.12. The stack now supports DFS on MLO but otherwise nothing really standing out.

Major changes:

cfg80211/mac80211
* EHT rate support in AQL airtime
* DFS support for MLO

rtw89
* complete BT-coexistence code for RTL8852BT
* RTL8922A WoWLAN net-detect support

* tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
  wifi: brcmfmac: cfg80211: Convert comma to semicolon
  wifi: rsi: Remove an unused field in struct rsi_debugfs
  wifi: libertas: Cleanup unused declarations
  wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe()
  wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe()
  wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param
  wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()
  wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()
  wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors
  wifi: cfg80211: fix kernel-doc for per-link data
  wifi: mt76: mt7925: replace chan config with extend txpower config for clc
  wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc
  wifi: mt76: mt7615: check devm_kasprintf() returned value
  wifi: mt76: mt7925: convert comma to semicolon
  wifi: mt76: mt7925: fix a potential association failure upon resuming
  wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings
  wifi: mt76: mt7921: Check devm_kasprintf() returned value
  wifi: mt76: mt7915: check devm_kasprintf() returned value
  wifi: mt76: mt7915: avoid long MCU command timeouts during SER
  wifi: mt76: mt7996: fix uninitialized TLV data
  ...
====================

Link: https://patch.msgid.link/20240911084147.A205DC4AF0F@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Raju Lakkaraju says:

====================
Add support to PHYLINK for LAN743x/PCI11x1x chips

This is the follow-up patch series of https://lkml.iu.edu/hypermail/linux/kernel/2310.2/02078.html

Divide the PHYLINK adaptation and SFP modifications into two separate patch series. The current patch series focuses on transitioning the LAN743x driver's PHY support from phylib to phylink.

Tested on PCI11010 Rev-1 Evaluation board

Change List:
============
V5 -> V6:
- Remove the lan743x_find_max_speed( ) function. Not required
- Add EEE enable check before calling lan743x_mac_eee_enable( ) function
V4 -> V5:
- Remove the fixed_phy_unregister( ) function. Not required
- Remove the "phydev->eee_enabled" check to update the MAC EEE enable/disable
- Call lan743x_mac_eee_enable() with true after update tx_lpi_timer.
- Add phy_support_eee() to initialize the EEE flags
V3 -> V4:
- Add fixed-link patch along with this series.
  Note: This code was developed by Mr. Russell King
  Ref: https://lore.kernel.org/netdev/LV8PR11MB8700C786F5F1C274C73036CC9F8E2@LV8PR11MB8700.namprd11.prod.outlook.com/T/#me943adf54f1ea082edf294aba448fa003a116815
- Change phylink fixed-link function header's string from "Returns" to "Returns:"
- Remove the EEE private variable from LAN743x adapter structure and fix the EEE's set/get functions
- set the individual caps (i.e. _RGMII, _RGMII_ID, _RGMII_RXID and __RGMII_TXID) replace with phy_interface_set_rgmii( ) function
- Change lan743x_set_eee( ) to lan743x_mac_eee_enable( )
V2 -> V3:
- Remove the unwanted parens in each of these if() sub-blocks
- Replace "to_net_dev(config->dev)" with "netdev".
- Add GMII_ID/RGMII_TXID/RGMII_RXID in supported_interfaces
- Fix the lan743x_phy_handle_exists( ) return type
V1 -> V2:
- Fix Russell King's comments, i.e. remove the speed, duplex update in lan743x_phylink_mac_config( )
- pre-March 2020 legacy support has been removed
V0 -> V1:
- Integrate with Synopsys DesignWare XPCS drivers
- Based on external review comments,
  - Changes made to SGMII interface support only 1G/100M/10M bps speed
  - Changes made to 2500Base-X interface support only 2.5Gbps speed
- Add check for not is_sgmii_en with is_sfp_support_en support
- Change the "pci11x1x_strap_get_status" function return type from void to int
- Add ethtool phylink wol, eee, pause get/set functions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Add support to ethtool phylink functions:
- get/set settings like speed, duplex etc
- get/set the wake-on-lan (WOL)
- get/set the energy-efficient ethernet (EEE)
- get/set the pause

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Migrate phy support from phylib to phylink.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Create a separate Link Speed Duplex (LSD) update state function from lan743x_sgmii_config() to use as a subroutine.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Create a separate PCS power reset function from lan743x_sgmii_config() to use as a subroutine.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King authored
The function allows for the configuration of a fixed link state for a given phylink instance. This addition is particularly useful for network devices that operate with a fixed link configuration, where the link parameters do not change dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up the fixed link state during initialization or configuration changes.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
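A hedged sketch of how a MAC driver might call the new helper during initialization; the exact prototype should be taken from linux/phylink.h, and the chosen speed/duplex values are only an example:

    #include <linux/phy.h>
    #include <linux/phylink.h>

    static int example_pin_fixed_link(struct phylink *pl)
    {
            struct phylink_link_state state = {
                    .speed  = SPEED_1000,
                    .duplex = DUPLEX_FULL,
                    .link   = 1,
            };

            /* Pin this phylink instance to a fixed link configuration. */
            return phylink_set_fixed_link(pl, &state);
    }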
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Jakub Kicinski authored
Tony Nguyen says:

====================
ice: support devlink subfunction

Michal Swiatkowski says:

Currently ice driver does not allow creating more than one networking device per physical function. The only way to have more hardware backed netdev is to use SR-IOV.

Following patchset adds support for devlink port API. For each new pcisf type port, driver allocates new VSI, configures all resources needed, including dynamically MSIX vectors, program rules and registers new netdev.

This series supports only one Tx/Rx queue pair per subfunction.

Example commands:
devlink port add pci/0000:31:00.1 flavour pcisf pfnum 1 sfnum 1000
devlink port function set pci/0000:31:00.1/1 hw_addr 00:00:00:00:03:14
devlink port function set pci/0000:31:00.1/1 state active
devlink port function del pci/0000:31:00.1/1

Make the port representor and eswitch code generic to support subfunction representor type.

VSI configuration is slightly different between VF and SF. It needs to be reflected in the code.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: subfunction activation and base devlink ops
  ice: basic support for VLAN in subfunctions
  ice: support subfunction devlink Tx topology
  ice: implement netdevice ops for SF representor
  ice: check if SF is ready in ethtool ops
  ice: don't set target VSI for subfunction
  ice: create port representor for SF
  ice: make representor code generic
  ice: implement netdev for subfunction
  ice: base subfunction aux driver
  ice: allocate devlink for subfunction
  ice: treat subfunction VSI the same as PF VSI
  ice: add basic devlink subfunctions support
  ice: export ice ndo_ops functions
  ice: add new VSI type for subfunctions
====================

Link: https://patch.msgid.link/20240906223010.2194591-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Jakub Kicinski authored
Saeed Mahameed says:

====================
mlx5-updates-2024-08-29

HW-Managed Flow Steering in mlx5 driver

Yevgeny Kliteynik says:
=======================

1. Overview
-----------

ConnectX devices support packet matching, modification, and redirection. This functionality is referred to as Flow Steering. To configure a steering rule, the rule is written to the device-owned memory. This memory is accessed and cached by the device when processing a packet.

The first implementation of Flow Steering was done in FW, and it is referred to in the mlx5 driver as Device-Managed Flow Steering (DMFS). Later we introduced SW-managed Flow Steering (SWS or SMFS), where the driver is writing directly to the device's configuration memory (ICM) through RC QP using RDMA operations (RDMA-read and RDMA-write), thus achieving higher rates of rule insertion/deletion.

Now we introduce a new flow steering implementation: HW-Managed Flow Steering (HWS or HMFS).

In this new approach, the driver is configuring steering rules directly to the HW using the WQs with a special new type of WQE. This way we can reach higher rule insertion/deletion rates with much lower CPU utilization compared to SWS.

The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
  - HW calculates CRC for each entry
  - HW handles tree hash collisions
  - HW & FW manage objects refcount
+ HW keeps cache coherency:
  - HW provides tree access locking and synchronization
  - HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
  - Dedicated HW components that handle insertion

2. Performance
--------------

Measuring Connection Tracking with simple IPv4 flows w/o NAT, we are able to get ~5 times more flows offloaded per second using HWS.

3. Configuration
----------------

The enablement of HWS mode in eswitch manager is done using the same devlink param that is already used for switching between FW-managed steering and SW-managed steering modes:

  # devlink dev param set pci/<PCI_ID> name flow_steering_mode value hmfs cmode runtime

4. Upstream Submission
----------------------

HWS support consists of 3 main components:
+ Steering:
  - The lower layer that exposes HWS API to upper layers and implements all the management of flow steering building blocks
+ FS-Core:
  - Implementation of fs_hws layer to enable fs_core to use HWS instead of FW or SW steering
  - Create HW steering action pools to utilize the ability of HWS to share steering actions among different rules
  - Add support for configuring HWS mode through devlink command, similar to configuring SWS mode
+ Connection Tracking:
  - Implementation of CT support for HW steering
  - Hooks up the CT ops for the new steering mode and uses the HWS API to implement connection tracking

Because of the large number of patches, we need to perform the submission in several separate patch series. This series is the first submission that lays the groundwork for the next submissions, where an actual user of HWS will be added.

5. Patches in this series
-------------------------

This patch series contains implementation of the first bullet from above.
=======================

* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: HWS, added API and enabled HWS support
  net/mlx5: HWS, added send engine and context handling
  net/mlx5: HWS, added debug dump and internal headers
  net/mlx5: HWS, added backward-compatible API handling
  net/mlx5: HWS, added memory management handling
  net/mlx5: HWS, added vport handling
  net/mlx5: HWS, added modify header pattern and args handling
  net/mlx5: HWS, added FW commands handling
  net/mlx5: HWS, added matchers functionality
  net/mlx5: HWS, added definers handling
  net/mlx5: HWS, added rules handling
  net/mlx5: HWS, added tables handling
  net/mlx5: HWS, added actions handling
  net/mlx5: Added missing definitions in preparation for HW Steering
  net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================

Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2024-09-10

1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad.

2) Add a copy from skb_seq_state to buffer function. This is needed for the upcoming IPTFS patchset. From Christian Hopps.

3) Spelling fix in xfrm.h. From Simon Horman.

4) Speed up xfrm policy insertions. From Florian Westphal.

5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked.

6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal.

7) Update comments on sdb and xfrm_policy. From Florian Westphal.

8) Fix a null pointer dereference in the new policy insertion code. From Florian Westphal.

9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor.

* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
  xfrm: policy: fix null dereference
  Revert "xfrm: add SA information to the offloaded packet"
  xfrm: minor update to sdb and xfrm_policy comments
  xfrm: policy: use recently added helper in more places
  xfrm: add SA information to the offloaded packet
  xfrm: policy: remove remaining use of inexact list
  xfrm: switch migrate to xfrm_policy_lookup_bytype
  xfrm: policy: don't iterate inexact policies twice at insert time
  selftests: add xfrm policy insertion speed test script
  xfrm: Correct spelling in xfrm.h
  net: add copy from skb_seq_state to buffer function
  xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================

Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Michael Chan says:

====================
bnxt_en: MSIX improvements

This patchset makes some improvements related to MSIX. The first patch adjusts the default MSIX vectors assigned for RoCE. On the PF, the number of MSIX is increased to 64 from the current 9. The second patch allocates additional MSIX vectors ahead of time when changing ethtool channels if dynamic MSIX is supported. The 3rd patch makes sure that the IRQ name is not truncated.
====================

Link: https://patch.msgid.link/20240909202737.93852-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Edwin Peer authored
The name field of struct bnxt_irq is written using snprintf in bnxt_setup_msix(). Make the field large enough to fit the maximal formatted string to prevent truncation. Truncated IRQ names are less meaningful to the user. For example, "enp4s0f0np0-TxRx-0" gets truncated to "enp4s0f0np0-TxRx-" with the existing code.

Make sure we have space for the extra characters added to the IRQ names:
- the characters introduced by the static format string: hyphens
- the maximal static substituted ring type string: "TxRx"
- the maximum length of an integer formatted as a string, even though reasonable ring numbers would never be as long as this.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
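The sizing rule above can be captured directly in the buffer declaration; a generic sketch with illustrative names, not the driver's actual fields:

    /* Room for "<netdev>-<ring type>-<ring number>": IFNAMSIZ already
     * includes a NUL, the sizeof() term covers the two hyphens plus the
     * longest ring-type string ("TxRx"), and 10 digits cover any u32
     * ring number. */
    char name[IFNAMSIZ + sizeof("-TxRx-") + 10];

    snprintf(name, sizeof(name), "%s-%s-%d", netdev_name, ring_type, ring_nr);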
-
Michael Chan authored
bnxt_check_rings() is called to ensure that we have the hardware ring resources before committing to reinitialize with the new number of rings. MSIX vectors are never checked at this point, because up until recently we must first disable MSIX before we can allocate the new set of MSIX vectors.

Now that we support dynamic MSIX allocation, check to make sure we can dynamically allocate the new MSIX vectors as the last step in bnxt_check_rings() if dynamic MSIX is supported. For example, the IOMMU group may limit the number of MSIX vectors for the device. With this patch, the ring change will fail more gracefully when there are not enough MSIX vectors.

It is also better to move bnxt_check_rings() to be called as the last step when changing ethtool rings.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Michael Chan authored
If RoCE is supported on the device, set the number of RoCE MSIX vectors to the number of online CPUs + 1, capped at these maximums:

  VF: 2
  NPAR: 5
  PF: 64

For the PF, the maximum is now increased from the previous value of 9 to get better performance for kernel applications.

Remove the unnecessary check for BNXT_FLAG_ROCE_CAP. bnxt_set_dflt_ulp_msix() will only be called if the flag is set.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-